Chapter 2 - A simple one-pass compiler

Chapter 2 - A Simple One-Pass Compiler

Reading: pp. 25 - 51 We begin by taking a look at a one-pass compiler and topics to include lexical analysis, parsing, and code generation. We will follow the example from Aho which constructs a compiler that translates infix expressions to postfix expressions.

P#1: Consider the infix expression 5 + 2 / 4 + 1. Give the postfix and prefix form for this expression.

Before we can write a lexical analyzer as described in chapter 1, we must be able to describe a language for expressions somehow. We will begin by describing languages using context-free grammars.

Context-free grammars have four basic components:

A set of terminal symbols that represent tokens.
A set of nonterminal symbols
A set of productions
A start symbol

Conventions for our context-free grammars:

Productions listed first will contain the start symbol.
Terminals are bold, symbols such as +, and digits.
Italicized names are nonterminals and nonitalicized names are tokens.

number -> int
int -> int digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

P#2: (a) What language does the above grammar represent? (b) Add the production for real (without scientific notation) to the above context-free grammar.

The set of tokens that can be derived from the grammar form the language described by the grammar. Further, we can show the parse tree that represents the derived string in the language.

Properties for our parse trees:

The root corresponds to the start symbol.
Each leaf is a token or epsilon.
Each interior node is a nonterminal symbol.
The children (from left to right) of a nonterminal form a production.

P#3: Given the above grammar for numbers, draw the parse tree for the number 123.

Q#1: Consider the following grammar:

bin -> bin + bin | bin - bin | 0 | 1

a) Is this grammar ambiguous or not? Why or why not?

b) When we talk about the associativity of operators, what are we referring to?

c) Is + in the above grammar left or right associative? Explain.

Q#2: Consider the following grammar:

 expr -> expr + term | expr - term | term   
 term -> term * factor | term / factor | factor   
 factor -> digit | ( expr )

a) Is this grammar ambiguous or not? Why or why not?

b) Identify each operator as left or right associative.

c) Can the above grammar generate the string 9 + 2 * 5? If so, what productions do so?

Syntax-directed Translation

Consider the problem of converting infix expressions to postfix expressions. Context-free grammars can be used to specify the syntax of the input and semantic rules can be used with each production to provide the meaning of the production.A syntax-directed definition thus consists of the grammar and the symantic rules.

So what is a syntax-directed translation? Answer: Using the syntax-directed definition, the translation maps an input to a specific output given the grammar and semantic rules.

Consider the following syntax-directed definition for translating infix expressions to postfix expressions as defined in Compilers Principles, Techniques, and Tools.

Production               Semantic Rule

expr -> expr1 + term     expr.t := expr1.t || term.t || '+'
expr -> expr1 - term     expr.t := expr1.t || term.t || '-'
expr -> term             expr.t := term.t
term -> 0                term.t := '0'
term -> 1                term.t := '1'
...
term -> 9                term.t := '9'

What do we mean by term.t? Essentially, the t is an attribute of term at a particular node. In the above syntax-directed definition, the t is a string and the || stands for concatenation of strings. For now, we are only going to look at synthesized attributes which means that the attribute of a particular node in a parse-tree is determined from the children's attribute values of the particular node.

P#4: Consider the expression 4 - 3 + 2 - 1. Show the annotated parse tree for this expression.