Lexical Analyzer (Part 2 of 7)
Date assigned: 2/16/07
Date due: 2/28/07
Points: 50
Write the lexical analyzer for your compiler. Remember, the lexical analyzer returns a (token class, value) pair on demand to the parser. Your lexical analyzer will also produce a source listing where each line is preceded by a line number. There is no restriction on line length.
White space is defined as a blank, tab, newline, or comment delimited by /* and */. White space is skipped and is never returned as a token, but it can delimit a token.
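The white space rule can be sketched as a small helper that advances past blanks, tabs, newlines, and /* ... */ comments. This is only an illustration: the name skipWhiteSpace, the string-plus-index interface, and the choice to stop at end of input on an unterminated comment are assumptions, not requirements of the assignment.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first character at or
   after pos that is not white space as defined above. */
size_t skipWhiteSpace(const char *src, size_t pos)
{
    for (;;) {
        char c = src[pos];

        if (c == ' ' || c == '\t' || c == '\n') {
            pos++;                          /* blank, tab, or newline */
        } else if (c == '/' && src[pos + 1] == '*') {
            pos += 2;                       /* step over the opening delimiter */
            while (src[pos] != '\0' &&
                   !(src[pos] == '*' && src[pos + 1] == '/'))
                pos++;
            if (src[pos] != '\0')
                pos += 2;                   /* step over the closing delimiter */
        } else {
            return pos;                     /* start of a token, or end of input */
        }
    }
}
```

Note that a lone / is left alone, since it is the mulop lexeme rather than the start of a comment.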
When your lexical analyzer encounters an identifier, look it up in the symbol table (ST), but do not add it, to determine whether it is a reserved word.
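A minimal sketch of the reserved word check follows. Your real lookup goes through the symbol table from assignment 1; here a fixed array stands in for it, and the function name reservedWordValue is hypothetical. The returned numbers follow the "which one" values listed for class 1 in the token table.

```c
#include <string.h>

/* Stand-in for the symbol table: reserved words in "which one" order. */
static const char *const reservedWords[] = {
    "main", "int", "if", "else", "return", "for", "input", "output"
};

/* Hypothetical helper: returns 1..8 if lexeme is a reserved word,
   or 0 if it is an ordinary identifier. Nothing is ever added. */
int reservedWordValue(const char *lexeme)
{
    size_t count = sizeof reservedWords / sizeof reservedWords[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(lexeme, reservedWords[i]) == 0)
            return (int)i + 1;
    }
    return 0;
}
```

A result of 0 would mean the lexeme is reported as class 2 (identifier); any other result is the value reported with class 1.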
When a constant is encountered, do the following:
#   Class           Lexemes                 Values
--  --------------  ----------------------  ------------------------------
1   reserved words  given in assignment 1   which one: main (1), int (2),
                                            if (3), else (4), return (5),
                                            for (6), input (7), output (8)
2   identifiers     [A-Za-z][A-Za-z0-9]*    NULL
3   constant        [0-9]+                  runtime stack address
4   relop           == != <= >= < >         which one (1, 2, 3, 4, 5, 6)
5   addop           + -                     which one (1, 2)
6   mulop           * / %                   which one (1, 2, 3)
7   autoop          ++ --                   which one (1, 2)
8   assignop        =                       NULL
9   addressof       &                       NULL
10  boolop          || &&                   which one (1, 2)
11  semicolon       ;                       NULL
12  comma           ,                       NULL
13  parenthesis     ( )                     which one (1, 2)
14  bracket         [ ]                     which one (1, 2)
15  brace           { }                     which one (1, 2)
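Several lexemes in the table are prefixes of others (= and ==, + and ++, < and <=, & and &&), so the scanner has to prefer the longer match. One way to sketch this is a table in which every two-character lexeme is listed before the one-character lexemes. The names OpMatch, opTable, and matchOperator are hypothetical, and representing a NULL value as 0 is an assumption made only for this sketch.

```c
#include <string.h>

struct OpMatch {
    int length;      /* characters consumed, 0 if nothing matched */
    int tokenClass;  /* class number from the table above */
    int value;       /* "which one" value, 0 standing in for NULL */
};

static const struct { const char *lexeme; int tokenClass; int value; } opTable[] = {
    /* two-character lexemes first so the longest match wins */
    {"==", 4, 1}, {"!=", 4, 2}, {"<=", 4, 3}, {">=", 4, 4},
    {"++", 7, 1}, {"--", 7, 2}, {"||", 10, 1}, {"&&", 10, 2},
    /* one-character lexemes */
    {"<", 4, 5}, {">", 4, 6}, {"+", 5, 1}, {"-", 5, 2},
    {"*", 6, 1}, {"/", 6, 2}, {"%", 6, 3}, {"=", 8, 0}, {"&", 9, 0},
    {";", 11, 0}, {",", 12, 0}, {"(", 13, 1}, {")", 13, 2},
    {"[", 14, 1}, {"]", 14, 2}, {"{", 15, 1}, {"}", 15, 2}
};

/* Hypothetical helper: tries to match an operator or punctuation
   lexeme at the start of s. */
struct OpMatch matchOperator(const char *s)
{
    struct OpMatch result = {0, 0, 0};
    size_t count = sizeof opTable / sizeof opTable[0];
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(opTable[i].lexeme);
        if (strncmp(s, opTable[i].lexeme, len) == 0) {
            result.length = (int)len;
            result.tokenClass = opTable[i].tokenClass;
            result.value = opTable[i].value;
            return result;
        }
    }
    return result;
}
```

Comments must be skipped before this helper is tried; otherwise the opening / of a comment would be reported as a mulop.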
Consider the following C program:
main ()
{
5
}
The results from Lex will be:
001 main ()
CLASS LEXEME VALUE
----- ------ -----
1     main   1
13    (      1
13    )      2
002 {
CLASS LEXEME VALUE
----- ------ -----
15    {      1
003 5
CLASS LEXEME VALUE
----- ------ -----
3     5      1
004 }
CLASS LEXEME VALUE
----- ------ -----
15    }      2
CONSTANT TABLE
CONSTANT VALUE RUNTIME STACK ADDRESS
-------------- ---------------------
5              1
RUNTIME STACK
ADDRESS VALUE
------- -----
0       -1
1       5
2       -1
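The constant table and runtime stack in the listing above can be sketched as follows. The sizes, the names, and the decision to reuse the address of a constant that has already been seen are assumptions made for illustration. The -1 fill value mirrors the unused slots in the sample listing, and constants start at location 1 because location 0 is reserved for a function return value (see Note4).

```c
#include <stdlib.h>

#define STACK_SIZE    256   /* assumed size, for this sketch only */
#define MAX_CONSTANTS 128   /* assumed size, for this sketch only */
#define EMPTY_SLOT    (-1)  /* matches the unused entries in the listing */

static int runtimeStack[STACK_SIZE];
static struct { int value; int address; } constTable[MAX_CONSTANTS];
static int constCount = 0;
static int nextFree = 1;    /* location 0 is reserved for a return value */

void stackInit(void)
{
    int i;
    for (i = 0; i < STACK_SIZE; i++)
        runtimeStack[i] = EMPTY_SLOT;
    constCount = 0;
    nextFree = 1;
}

/* Hypothetical helper: returns the runtime stack address of the constant
   spelled by lexeme, storing it on first sight. Returns -1 when full.
   Overflow of very long digit strings is not handled here. */
int constantAddress(const char *lexeme)
{
    int value = (int)strtol(lexeme, NULL, 10);
    int i;

    for (i = 0; i < constCount; i++) {
        if (constTable[i].value == value)
            return constTable[i].address;   /* already on the stack */
    }
    if (nextFree >= STACK_SIZE || constCount >= MAX_CONSTANTS)
        return -1;
    runtimeStack[nextFree] = value;
    constTable[constCount].value = value;
    constTable[constCount].address = nextFree;
    constCount++;
    return nextFree++;
}

int stackValueAt(int address)
{
    return runtimeStack[address];
}
```

With this sketch, the constant 5 in the sample program lands at address 1, which is the value reported for the class 3 token.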
Test your lexical analyzer by constructing a driver that performs the following:
IMPORTANT: Lex returns a SINGLE token per call to your driver program, and it keeps doing so until the end of file is reached. Your driver program must call lexGetToken(), which lives in the lex module and returns a single token to the caller. A global data structure can be used to hold the token information.
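The one-token-per-call protocol might look like the sketch below. Only lexGetToken() is named by the assignment; the Token layout, gToken, lexInit(), runDriver(), the return convention (1 for a token, 0 at end of input), and scanning from a string instead of a file are all assumptions. To stay short, this version recognizes only identifiers, constants, parentheses, and braces, and it leaves out the reserved word lookup, comments, and the constant's runtime stack address.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEXEME 64   /* assumed limit, for this sketch only */

/* Global token record filled in by lexGetToken(). */
struct Token {
    int  tokenClass;
    char lexeme[MAX_LEXEME];
    int  value;
};
struct Token gToken;

static const char *gSource = "";   /* text still to be scanned */

void lexInit(const char *source) { gSource = source; }

/* Fills gToken with exactly one token and returns 1,
   or returns 0 once the end of the input is reached. */
int lexGetToken(void)
{
    size_t len = 0, n;

    while (*gSource == ' ' || *gSource == '\t' || *gSource == '\n')
        gSource++;
    if (*gSource == '\0')
        return 0;

    gToken.value = 0;
    if (isalpha((unsigned char)*gSource)) {
        while (isalnum((unsigned char)gSource[len]))
            len++;
        gToken.tokenClass = 2;              /* identifier */
    } else if (isdigit((unsigned char)*gSource)) {
        while (isdigit((unsigned char)gSource[len]))
            len++;
        gToken.tokenClass = 3;              /* constant */
    } else {
        len = 1;
        switch (*gSource) {
        case '(': gToken.tokenClass = 13; gToken.value = 1; break;
        case ')': gToken.tokenClass = 13; gToken.value = 2; break;
        case '{': gToken.tokenClass = 15; gToken.value = 1; break;
        case '}': gToken.tokenClass = 15; gToken.value = 2; break;
        default:  gToken.tokenClass = 0;  break;  /* not handled here */
        }
    }
    n = len < MAX_LEXEME - 1 ? len : MAX_LEXEME - 1;
    memcpy(gToken.lexeme, gSource, n);      /* truncate overlong lexemes */
    gToken.lexeme[n] = '\0';
    gSource += len;
    return 1;
}

/* Driver loop: one call to lexGetToken() per token printed.
   Returns the number of tokens seen. */
int runDriver(const char *source, FILE *out)
{
    int count = 0;

    lexInit(source);
    while (lexGetToken()) {
        fprintf(out, "%-5d %-6s %d\n",
                gToken.tokenClass, gToken.lexeme, gToken.value);
        count++;
    }
    return count;
}
```

A driver's main() could then call something like runDriver(text, stdout). The point of the design is that the driver, and later the parser, never sees more than the one token currently held in the global record.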
Note1: Remember, your compiler is to be in a directory called yourlastname. In that directory there is to be a makefile that I simply use to compile your project creating the executable pcc. Continue creating subdirectories within yourlastname directory for each of the modules needed as we go through the compiler creation process; thus, for this second assignment, you will need to create at least a subdirectory called lex where all of the lexical analyzer routines will go. Also, continue to use subversion as we build this project. You won't be disappointed.
Note2: Use the submit script to submit your solution: zeus$ submit cs480s07 user.tar.gz
Note3: On the day the program is due, turn in a copy of the source code in this order:
Note4: Remember, the first location of the runtime stack is reserved for a function return value, thus begin saving your constants at location 1 on the runtime stack.
Douglas J. Ryan / ryandj@pacificu.edu