Lexical Analyzer (Part 2 of 7)


Date assigned:
2/16/07
Date due: 2/28/07
Points: 50

Write the lexical analyzer for your compiler. Remember, the lexical analyzer returns a (token class, value) pair on demand to the parser. Your lexical analyzer will also produce a source listing where each line is preceded by a line number. There is no restriction on line length.

White space is defined as a blank, tab, newline, and comment delimited by /* and */. White space is to be skipped over and is not to be treated as a token but can delimit a token.

When your lexical analyzer encounters an identifier, look it up in the ST (but don't add it) to make sure it is not a reserved word.

When a constant is encountered, do the following:

  1. Evaluate the constant (i.e. convert the constant from a string to an integer)
  2. Search for the constant in the constant table which is a data structure that has two fields for each distinct constant. The two fields are the constant value and the runtime stack location where the constant has been placed.
  3. If the constant is not found, allocate space in the runtime stack and place the constant in that space. Add the constant to the constant table.
  4. If the constant is found, set the token value to the runtime stack address of that constant.
Token classes and values are as follows:

#  Class       Lexemes                 Values
1  reserved    given in assignment 1   which one
   words       main (1) int (2) if (3)
               else (4) return (5) 
               for (6) input (7)
               output (8)
2  identifiers [A-Za-z][A-Za-z0-9]*    NULL
3  constant    [0-9]+                  runtime stack address
4  relop       == != <= >= < >         which one (1, 2, 3, 4, 5, 6)
5  addop       + -                     which one (1, 2)
6  mulop       * / %                   which one (1, 2, 3)
7  autoop      ++ --                   which one (1, 2)
8  assignop    =                       NULL
9  addressof   &                       NULL
10  boolop     || &&                   which one (1, 2)
11 semicolon   ;                       NULL
12 comma       ,                       NULL
13 parenthesis ( )                     which one (1, 2)
14 bracket     [ ]                     which one (1, 2)
15 brace       { }                     which one (1, 2)

Consider the following C program:
main ()
{
  5
}
The results from Lex will be:
001 main ()


CLASS   LEXEME          VALUE
-----   ------          -----
1       main                1
13      (                   1
13      )                   2


002 {


CLASS   LEXEME          VALUE
-----   ------          -----
15      {                   1


003   5


CLASS   LEXEME          VALUE
-----   ------          -----
3       5                   1


004 }


CLASS   LEXEME          VALUE
-----   ------          -----
15      }                   2   
     

           CONSTANT TABLE

CONSTANT VALUE     RUNTIME STACK ADDRESS
--------------     ---------------------
             5                         1

           RUNTIME STACK
  
       ADDRESS     VALUE 
       -------     ----- 
             0        -1
             1         5
             2        -1
Test your lexical analyzer by constructing a driver that performs the following:
  1. Add all reserved words to the symbol table
  2. Get tokens one at a time until the end of file. Print each token on a line by itself under the headings: CLASS, LEXEME, and VALUE as shown above. Do not add any additional blank lines in your output other than the ones shown. Print the numeric value of the class type and for a value of NULL, output a 0.
  3. Dump the constant table.
  4. Dump the runtime stack up to (and including) the next available location.

IMPORTANT: Lex returns a SINGLE token to your driver program until the end of file is reached. Your driver program must make a call to lexGetToken() which is in the lex module and lexGetToken() returns a single token to the caller. A global data structure can be used to hold the token information.

Note1: Remember, your compiler is to be in a directory called yourlastname. In that directory there is to be a makefile that I simply use to compile your project creating the executable pcc. Continue creating subdirectories within yourlastname directory for each of the modules needed as we go through the compiler creation process; thus, for this second assignment, you will need to create at least a subdirectory called lex where all of the lexical analyzer routines will go. Also, continue to use subversion as we build this project. You won't be disappointed.

Note2: Use the submit script to submit your solution: zeus$ submit cs480s07 user.tar.gz

Note3: On the day the program is due, turn in a copy of the source code in this order:

  1. makefile
  2. main .h/.c
  3. lexical analyzer .h/.c
  4. any new .h/.c source file combinations
  5. all older .h/.c source file combinations

Note4: Remember, the first location of the runtime stack is reserved for a function return value, thus begin saving your constants at location 1 on the runtime stack.

 

Douglas J. Ryan / ryandj@pacificu.edu