Must be viewed in IE 4.0 or later for Greek symbols.

Parsing

In chapter 2, we saw an example of an ambiguous grammar and defined an ambiguous grammar as one that generates two different parse trees. An example of such a grammar is:

expr -> expr op expr | ( expr ) | number
op -> + | - | * | /
To understand how various parsers work, we must discuss various derivations as follows:

Defn: A CFG consists of:

  1. A set of terminals, T
  2. A set of nonterminals, N, disjoint from T
  3. A set of productions, P, of the form, A -> α where A is in N and α is in (T U N)*
  4. A start symbol, S, in N
Let G be a CFG, then we say a "derivation step" over G is of the form αAγ => αβγ where α and γ are in (T U N)* and A -> β is in P. The set of symbols of G is (T U N) and α in (T U N)* is called a sentential form.

P#1: Consider the above grammar for expressions. Give the leftmost derivation of the string (id+id). Notationally, this would read, show expr =>*lm (id+id)

Note1: If S =>*lmα, then we say that α is the left-sentential form of grammar G.

Note2: We can do the same derivations with similar notation for the rightmost derivation and the right-sentential form of the grammar G.

So where are we? Well for certain types of parsers, ambiguity is not a good thing and must be eliminated. For others, we can use some disambiguating rules to eliminate unwanted parse trees. Our discussion leads us to discussing these two topics in detail.

P#2: Consider the following grammar:

expr -> expr op expr | term
op -> + | - | *
term -> number
a) Is this grammar ambiguous? If so, why? If not, why not?

b) Rewrite the above grammar so that multiplication has precedence over addition and subtraction and in doing so, is not ambiguous.

Consider the following grammar:

stmt -> ifstmt | other
ifstmt -> if ( expr ) stmt | if ( expr ) stmt else stmt
expr -> T | F
Q#1: What can you say about this grammar?

Many times left recursion is unwanted in CFGs. Two kinds of left recursion can exist:

Immediate Left Recursion:

E -> E + T | T
T -> T * F | F
F -> (E) | id
Nonimmediate Left Recursion:
S -> Aa | b
A -> Ac | Sd | e
Let's begin by describing the elimination of immediate left recursion.

Productions of the form: A -> Aα | β need to be replaced by productions of the form:

A -> βA'
A' -> αA' | e
P#3: Eliminate the immediate left recursion from the above example.

P#4: Eliminate the immediate left recursion from the following example.

E -> E + T | T | Q
T -> T * F | F
F -> ( E ) | id
Q -> number
We need to discuss Algorithm 4.1 on p. 177 in conjunction with the following problem:

P#5: (a) What is L(S)? (b) Eliminate all left recursion in the grammar:

S -> Ba | b
B -> Sa | a

© Douglas J. Ryan / ryandj@pacificu.edu