The first half of this course focuses on understanding the relationships between the lexical analyzer, the parser, and the symbol table (ST). In general, we have something that looks like the following:
    source code --> Lexical Analyzer <--> Parser
                           \                /
                            \              /
                             Symbol Table
Q#1: What is the purpose of the lexical analyzer?

Q#2: What is the purpose of the parser?

Q#3: What is the purpose of the symbol table?
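To make the diagram concrete, the sketch below shows one conventional arrangement, assuming the parser drives the process by repeatedly asking the lexical analyzer for tokens. The names (get_next_token, st_insert, st_lookup) are illustrative, not a fixed interface:

    /* Hypothetical module interfaces for the pipeline above. */
    typedef struct Token Token;      /* produced by the lexical analyzer */
    typedef struct STEntry STEntry;  /* one symbol-table record */

    /* Parser <--> Lexical Analyzer: each call consumes one lexeme
       from the source code and returns the corresponding token. */
    Token get_next_token(void);

    /* Both modules consult the symbol table. */
    STEntry *st_insert(const char *lexeme);
    STEntry *st_lookup(const char *lexeme);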
In general, the lexical analyzer, the parser, and the ST are three distinct modules within the compiler. It is possible to combine the parser with the lexical analyzer, but this tends to produce less efficient compilers.
Reasons for separating lexical analysis and parsing:
1) simpler design (e.g. it is easier to strip out white space in the lexical analyzer than in the parser)
2) improved efficiency (e.g. specialized buffering and I/O techniques can be applied when the lexical analyzer is not combined with the parser)
3) better portability (e.g. machine-dependent details can be confined to the lexical analyzer)
A few definitions are necessary:
token - identifies a category understood by the parser (e.g. relop, id)
pattern - rules that define strings of a particular token type (e.g. id is a letter followed by letters and digits)
lexeme - a sequence of characters matched by a pattern of a particular token type (e.g. cost is a lexeme for token id)
Our lexical analyzer will return (token, value) pairs.
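One possible C representation of these pairs (a sketch; the particular token names and the choice of union members are assumptions, not part of the course's fixed design):

    #include <stddef.h>

    /* Token categories understood by the parser. */
    enum token_type { ID, NUM, ASSIGN, PLUS, MINUS, SEMI };

    /* A (token, value) pair. The value is meaningful only for some
       tokens: an id carries a symbol-table reference, a num carries
       its numeric value; tokens such as + and ; carry no value. */
    struct token {
        enum token_type type;
        union {
            int    num_value;   /* value of a NUM token              */
            size_t st_index;    /* symbol-table entry of an ID token */
        } value;
    };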
Q#4: Consider the following C statement: sum = 2 + sum - num--; Identify the (token, value) pair for each lexeme in the statement. Note: not every token has a value.
    lexeme | token | value
    -------+-------+------
           |       |
A limited amount of error handling happens during the lexical analysis stage. Consider, for example:

    fi (vals == nums[i]);

Is fi a misspelling of the keyword if, or an undeclared function identifier? The lexical analyzer alone cannot tell, so few errors are even detectable at this stage. If an error does occur, how do we proceed?

Possibilities:
1) panic mode: delete successive characters until a well-formed token is found
2) delete one extraneous character
3) insert a missing character
4) replace an incorrect character with a correct one
5) transpose two adjacent characters
So how might we proceed with implementing a lexical analyzer?
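One starting point is a hand-written scanner. The sketch below, assuming a token set like the struct shown earlier, reads characters from stdin, skips white space, and returns one (token, lexeme) pair per call; computing values and inserting ids into the symbol table are omitted, and a multi-character operator such as -- would need an extra character of lookahead:

    #include <ctype.h>
    #include <stdio.h>

    enum token_type { ID, NUM, PLUS, MINUS, ASSIGN, SEMI, END, ERROR };

    struct token {
        enum token_type type;
        char lexeme[64];          /* the matched characters */
    };

    /* Return the next (token, lexeme) pair from stdin. */
    struct token get_next_token(void)
    {
        struct token t = { ERROR, "" };
        size_t n = 0;
        int c = getchar();

        while (isspace(c))        /* white space separates lexemes */
            c = getchar();

        if (c == EOF) {
            t.type = END;
            return t;
        }

        if (isalpha(c)) {         /* id: a letter, then letters and digits */
            while (isalnum(c) && n < sizeof t.lexeme - 1) {
                t.lexeme[n++] = (char)c;
                c = getchar();
            }
            ungetc(c, stdin);     /* push back the lookahead character */
            t.type = ID;
        } else if (isdigit(c)) {  /* num: one or more digits */
            while (isdigit(c) && n < sizeof t.lexeme - 1) {
                t.lexeme[n++] = (char)c;
                c = getchar();
            }
            ungetc(c, stdin);
            t.type = NUM;
        } else {                  /* single-character tokens */
            t.lexeme[n++] = (char)c;
            switch (c) {
            case '+': t.type = PLUS;   break;
            case '-': t.type = MINUS;  break;
            case '=': t.type = ASSIGN; break;
            case ';': t.type = SEMI;   break;
            default:  t.type = ERROR;  break;  /* error-recovery hook */
            }
        }
        t.lexeme[n] = '\0';
        return t;
    }

A loop such as while ((t = get_next_token()).type != END) would let a parser pull tokens on demand, matching the interface sketched after the diagram above.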