Lexical Analyzer & Numeric Conversion

Date assigned: 1/4/00
Date due: 1/11/00
Points: 50

Write the following two functions that will ultimately be part of your assembler. Save these two functions and their supporting functions in modules lex.c and evl.c respectively after they have been thoroughly tested.


function FGetTokens

Let 'skip' be any group of consecutive spaces and/or tab characters, and let 'blat' be any group of consecutive characters that does not contain a space, comma, semicolon, tab, or newline character. Write a function to find the starting positions of up to twelve 'blats' in pszSrc, a 256-element character array (indices 0-255) passed as an input parameter. The starting positions are returned in the output parameter rgiw[12]. Note: All positions after a semicolon, a newline character, or an EOF character are ignored.

The first 'blat' can start in the first position or may be preceded by a 'skip'. If it is followed directly by a colon, its starting position is stored in rgiw[0], and rgiw[1] is set to the starting position of the first 'blat' (if present) following the colon. If it is not followed directly by a colon, rgiw[0] is set to negative one and its starting position is stored in rgiw[1]. An optional 'skip' can precede the 'blat' pointed to by rgiw[1].

If a 'blat' follows the 'blat' pointed to by rgiw[1], a 'skip' must separate the two. Zero to ten 'blats' may then follow, separated by commas (each optionally preceded and/or followed by a 'skip'). rgiw[i+1] will point to the ith 'blat' after the one rgiw[1] points to, so the first of them goes in rgiw[2].

It is possible that there is no 'blat' following the 'blat' that rgiw[0] points to; in this case rgiw[1] and all of the cells rgiw[2] through rgiw[11] are set to negative one. Fewer than ten 'blats' may follow the 'blat' that rgiw[1] points to; in this case the remaining rgiw cells are set to negative one.

Any irregularity is reported by returning TRUE from FGetTokens; otherwise, FALSE is returned.
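The 'skip' and 'blat' definitions can be turned into small scanning helpers. The following is only a sketch of one possible starting point, not a required design: the names FIsSkipCh, FIsBlatCh, IchAfterSkip, and IchAfterBlat are hypothetical, and treating the zero terminator as the end of a 'blat' is an assumption made so that scanning cannot run off the end of the string. Note that, read literally, the 'blat' definition does not exclude the colon, so recognizing the colon after the first 'blat' is deliberately left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if ch belongs to a 'skip' (space or tab). */
static int FIsSkipCh(char ch)
{
    return ch == ' ' || ch == '\t';
}

/* Hypothetical helper: nonzero if ch may appear inside a 'blat'.
   Assumption: the zero terminator also ends a 'blat'. */
static int FIsBlatCh(char ch)
{
    return ch != '\0' && ch != ' ' && ch != ',' && ch != ';'
        && ch != '\t' && ch != '\n';
}

/* Return the index of the first character at or after i that is not
   part of a 'skip'. Returns i itself if no 'skip' starts there. */
static int IchAfterSkip(const char *pszSrc, int i)
{
    assert(pszSrc != NULL && i >= 0);
    while (FIsSkipCh(pszSrc[i]))
        i++;
    return i;
}

/* Return the index of the first character at or after i that is not
   part of a 'blat'. Returns i itself if no 'blat' starts there. */
static int IchAfterBlat(const char *pszSrc, int i)
{
    assert(pszSrc != NULL && i >= 0);
    while (FIsBlatCh(pszSrc[i]))
        i++;
    return i;
}
```

For the line "  add r1, r2 ; comment", IchAfterSkip(psz, 0) yields 2 (the start of "add") and IchAfterBlat(psz, 2) yields 5 (the space after it).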

Changes:


function FEvalNum

Write a function to convert a sequence of characters to the value represented. FEvalNum has three parameters:

1) pszSrc: a zero-terminated, 256-element character array

2) puMic: upon entry, pszSrc[*puMic] is the first character of the sequence; upon exit, *puMic (a non-negative integer value) has been incremented so that pszSrc[*puMic] is the character after the last character of the number

3) puVal: upon exit, *puVal is the integer value of the constant (a non-negative integer value less than 2^16)

If *puVal accurately represents the number scanned, FEvalNum returns FALSE. If the actual constant would have been greater than 2^16-1, FEvalNum returns TRUE and *puVal is set to zero.

A constant may be in decimal, octal, binary, or hexadecimal notation (see the project manual for a description of their representations).
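Whatever the notation, the core of the conversion is accumulating digits in some radix while watching for a value above 2^16-1. The sketch below shows that one step only, under stated assumptions: the helper names DigitVal and FAccumDigits are hypothetical, the radix is passed in by the caller (how each notation is recognized is defined in the project manual and is not reproduced here), and accepting both upper- and lower-case hexadecimal digits is an assumption. A sequence with no digits at all is not treated as an error in this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical helper: value of digit ch in the given radix,
   or -1 if ch is not a valid digit of that radix. */
static int DigitVal(char ch, unsigned int uRadix)
{
    int d;

    if (ch >= '0' && ch <= '9')
        d = ch - '0';
    else if (ch >= 'A' && ch <= 'F')
        d = ch - 'A' + 10;
    else if (ch >= 'a' && ch <= 'f')   /* assumption: lower case allowed */
        d = ch - 'a' + 10;
    else
        return -1;
    return (d < (int)uRadix) ? d : -1;
}

/* Hypothetical helper: accumulate the digits starting at pszSrc[*puMic].
   On exit *puMic indexes the first character after the digits.
   Returns TRUE (and sets *puVal to zero) if the value exceeds 2^16-1;
   otherwise returns FALSE with the value in *puVal. */
static int FAccumDigits(const char *pszSrc, unsigned int *puMic,
                        unsigned int uRadix, unsigned int *puVal)
{
    unsigned long ulVal = 0;
    int fOverflow = FALSE;
    int d;

    assert(pszSrc != NULL && puMic != NULL && puVal != NULL);
    assert(uRadix >= 2 && uRadix <= 16);

    while ((d = DigitVal(pszSrc[*puMic], uRadix)) >= 0) {
        if (!fOverflow) {
            /* ulVal <= 65535 here, so this cannot wrap an unsigned long. */
            ulVal = ulVal * uRadix + (unsigned long)d;
            if (ulVal > 65535UL)
                fOverflow = TRUE;
        }
        (*puMic)++;   /* keep consuming digits even after overflow */
    }

    *puVal = fOverflow ? 0 : (unsigned int)ulVal;
    return fOverflow;
}
```

Consuming the remaining digits after an overflow is a design choice here: it keeps *puMic pointing just past the number in both the success and the error case.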


You are to write driver programs (lexdriver & evldriver) and make sure that these two routines work correctly for all sets of data.

The header for FGetTokens is: int FGetTokens(char *pszSrc, int *prgiw)

The header for FEvalNum is: int FEvalNum(char *pszSrc, unsigned int *puVal, unsigned int *puMic)

Turn in source listings of all code comprising both lex.c and evl.c modules.

Program grading will be based on structure, style, documentation, and execution. Most of the grade will be based on execution and how your routines handle several sets of test data.

I would strongly suggest you do two things:

1) In some CFG or flowchart style notation, map out the allowable syntax for the lines described above.

2) Make the test data as you go along. My main goal in the test phase of this assignment is to bomb your program and have it miss as many test cases as possible. I'm very good at achieving my goal!!!


The following is the data file format and output for FGetTokens and FEvalNum.

FGetTokens

Datafile
--------
source[0..255]
source[0..255]
and so on
EOF

Output
------
      rgiw
Line#  [0]  [1]  [2] ...
    1    0    3   -1 ...
    2  Syntax Error
and so on
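A driver has to print one such row per input line. The following is a minimal sketch of a row formatter, not the instructor's driver: the name CchFormatLexRow and the constant CIW are hypothetical, and the field widths (five characters per column, two spaces before "Syntax Error") are inferred from the sample rows above rather than stated anywhere in this handout.

```c
#include <assert.h>
#include <stdio.h>

#define CIW 12   /* number of rgiw cells, per the specification */

/* Hypothetical helper: format one FGetTokens report row into pszOut,
   a buffer of cchOut bytes. fError is the value FGetTokens returned.
   Returns the snprintf-style character count, or a negative value
   if formatting fails. */
static int CchFormatLexRow(char *pszOut, size_t cchOut, int iLine,
                           const int *prgiw, int fError)
{
    int cch, i, n;

    assert(pszOut != NULL && prgiw != NULL);

    if (fError)
        return snprintf(pszOut, cchOut, "%5d  Syntax Error", iLine);

    cch = snprintf(pszOut, cchOut, "%5d", iLine);
    for (i = 0; i < CIW && cch >= 0 && (size_t)cch < cchOut; i++) {
        /* Append the next cell in a five-character column. */
        n = snprintf(pszOut + cch, cchOut - (size_t)cch, "%5d", prgiw[i]);
        if (n < 0)
            return n;
        cch += n;
    }
    return cch;
}
```

With iLine 1 and rgiw beginning 0, 3, -1, the formatted row begins "    1    0    3   -1", matching the first sample row.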

FEvalNum

Datafile
--------
source[0..255]
puMic
source[0..255]
puMic
and so on

Output
------
Line#  puVal  puMic  
    1      5      3
    2      0      7  Error
and so on

Also,

1) Two separate tar files (lex.tar & evl.tar) are to be mailed to ryand on circe, where I will simply use the command tar xvf filename.tar and then make with my driver. Each of your lex.c and evl.c modules is to include the necessary external libraries and files. Do not put these two modules in different directories. That is, place all files in the same directory, tar them up, and send them to me on circe.

2) The driver will be such that the command lexdriver datafile or evldriver datafile will read the data from the file datafile and produce the output discussed above.

If you have questions, drop me a line. Don't go down the wrong road and then have to back up. ALL of us must be operating under the same set of assumptions. Please pay particular attention to detail in this class. For example, I've had someone in the past use flxgettoken instead of FGetTokens. This worked for their driver but did not compile with mine, as the specification above is very specific.


© 2000 Douglas J. Ryan/ryandj.pacificu.edu