Chapter 4 - Syntax Analysis

Note: This set of notes must be viewed with IE 4.0 or later because of the greek characters.

Reading: pp. 159 - 195

At this point in time we have discussed:

As CS310 is a prerequisite for this course, I am making the assumption at this point that the following problems are easily solved and do not need review in CS480. If you do not find these problems easily solvable, you need to review your notes from CS310 and come up to speed.

P#1: Describe the language denoted by the regular expression ((e|0)1*)*.

P#2: Write regular definitions for: (a) all strings that begin with an aa (b) all strings that contain aa (c) all strings that do not contain aa over the alphabet {a,b}.

P#3: Construct an NFA for the regular expression ((e| a)b*)*.

P#4: Construct a DFA for the regular expression ((e| a)b*)*


Now on to syntax analysis, Chapter 4!!!

By now, we know that all programming languages have rules that describe well-formed programs. In chapter 2 we used CFGs to describe allowable expressions. We will expand the use of CFGs to describe our C subset programming language. It is also the case that certain classes of grammars can help us construct efficient parsers. We now turn our attention to the study of various parsing methods.

Grammar rules, like regular expressions, are defined over some alphabet. In the case of grammar rules, the symbols are usually lexemes whereas the symbols for regular expressions are usually characters.

Several notations can be used for CFGs. We've already seen the following:

expr -> expr op expr | ( expr ) | number | id
op -> + | - | *
Another common notation that you are likely to encounter is BNF (Backus-Naur Form):
< expr > ::= < expr > < op > < expr > | (< expr >) | NUMBER
< op > ::= + | - | *
From here on out we will use the following notational conventions as defined in Compilers Principles, Techniques, and Tools.
  1. Terminal Symbols
  2. Nonterminal Symbols
  3. Strings of terminals
  4. Greek letters
  5. Alternate forms
  6. Start Production
When talking about grammars, the term "token" is a synonym for "terminal".

Derivations
=> signifies "can derive with one application of a production"
=>* signifies "can derive with zero or more applications of any productions"

Consider the grammar: E -> ( E ) | a

P#5: Does E =>* ((a))?
P#6: Does E => ((a))?
p#7: Does E =>* (a)(a)?

Consider the following two grammars: A -> Aa | a and B -> aB | a

P#8: T / F These two grammars describe the same language.
P#9: T / F These two grammars generate the same language as the regular expression a*.
P#10: Are each of these grammars ambiguous? Why or why not?
P#11: Which grammar is left recursive and which is right recursive?
P#12: T / F A =>* e

P#13: Give a CFG which generates sequences of one or more statements separated by semicolons (i.e. L(G) = {s s;s s;s;s ...})

P#14: Give a CFG which generates sequences of one or more statements where the semicolon is a terminator and not a separator (i.e. L(G) = {s; s;s; s;s;s; ...} )


© Douglas J. Ryan / ryandj@pacificu.edu