Bottom-up Parser (Part 4 of 7)

Date assigned: 3/9/07
Date due: 3/18/07 (by 5pm)
Points: 50

You are to write the bottom-up portion of the parser which will be an operator-precedence parser.

1) You need to make a few changes to your lexical analyzer as follows:

The lexemes "==" and "!=" must be in a separate class of their own, EQUOP since == and != do not have the same precedence as < , > , ...
The scanner must distinguish between the unary and binary minus. Unary minus will be in a class of its own called NEGOP.
The scanner must distinguish between unary "*" (pointer to) and binary "*" (multiplication). Unary "*" must be in a class of its own.

2) The expression grammar is at the bottom of this page. You must calculate the LEADING and TRAILING sets for each nonterminal in the grammar as well as the precedence relations between each terminal in the grammar. Use the spreadsheet I handed out in class.

3) There is no $ to end expressions and any symbol not in the expression grammar should act like $.

4) There are places in the precedence table that have no relation and are not errors. These are $ ) and $ , because ) and , can follow EXPRESSION in the top-down parser.

5) Your expression stack is to be a stack of (class,value) pairs. Nonterminals will be represented on the stack as dummy class, 99 with no value. Renumber all of your token classes to match the row numbers of the precedence table, starting with zero.

6) As output, print each handle as it is reduced including the dummy nonterminals in the handle.

7) When an error occurs, eat tokens until you get to a symbol not in the expression grammar.

Our expression grammar for the compiler (part 4 of 7) will be the following:

exp        -> lvalue assignop exp | * unexp assignop exp |
              orterm
orterm     -> orterm || andterm | andterm
andterm    -> andterm && eqterm | eqterm
eqterm     -> eqterm equop relterm | relterm
relterm    -> relterm relop term | term
term       -> term addop factor | factor
factor     -> factor mulop unexp | unexp
unexp      -> lvalue autoop | & lvalue | * unexp |
              negop unexp | primary
primary    -> ( exp ) | lvalue | constant | func
lvalue     -> var | ( lvalue )
var        -> id | id [ exp ]
func       -> id | id ( list )
list       -> exp | list , exp

The datafile will consist of lines of expressions separated by a semicolon. Your output is to print each line (exactly) from the datafile followed by each handle on a separate line. Each handle consists of one or more token class values as defined above. Separate each token class value of a given handle by exactly one space and after the last handle is printed for a given line of data, print ---ACCEPT--- if the expression is in an accept state and ---NOT ACCEPT--- if the expression is not in an accept state. Finally, print exactly one blank line and then go to the next line in the file and do the same thing.

As an example, if the datafile contained:

id = 5;
id = id + 1;

Your output would look exactly like:

id = 5;
15
14
99 0 99
---ACCEPT---

id = id + 1;
15
15
14
99 5 99
99 0 99
---ACCEPT---

Note1: Remember, your compiler is to be in a directory called yourlastname. In that directory there is to be a makefile that I simply use to compile your project creating the executable pcc. Continue creating subdirectories within yourlastname directory for each of the modules needed as we go through the compiler creation process; thus, for this third assignment, you will need to create at least a subdirectory called bu where all of the bottom-up parser routines will go. Also, continue to use subversion as we build this project.

Note2: Use the submit script to submit your solution: zeus$ submit cs480s07 user.tar.gz

Note3: On the day the program is due, turn in a colored hard copy of the source code in this order:

1. makefile
2. main .h/.c
3. bu.h / bu.c
4. any new .h/.c source file combinations
5. all older .h/.c source file combinations

Douglas J. Ryan / ryandj@pacificu.edu