Chapter 3 - Formal Language Theory

Formal Language Theory Review

Notation for regular expressions from Compilers Principles, Techniques, and Tools

As the prerequisite for this course is Theoretical CS, we should be familiar with the following terms:

Alphabet
String over some alphabet
Empty string
Language
Union
Concatenation
Kleene closure
Positive closure
Regular expressions

We remember that if r and s are regular expressions denoting languages L(r) and L(s) respectively that each of the following is true:

r|s denotes L(r) U L(s)
rs denotes L(r)L(s)
r^* denotes L(r)^*
r|s = s|r
r|(s|t) = (r|s)|t
(rs)t = r(st)
r(s|t) = rs|rt
er = r and re = r where e is a regular expression denoting the
 set containing the empty string
r^* = (r|e)^*
r^** = r^*

Regular Definitions

We may choose to give regular expressions names of the form:

n1 -> r1
n2 -> r2
...

where ni is a distinct name for a regular expression ri 
over the symbols in Σ U {n1, n2, ...}

We might define identifiers as follows over Σ = {A,B,...Z,a,b,...z,0,1,...9}:
(A U B ... U Z U a U b ... U z)(A U B ... U Z U a U b ... U z U 0 U 1 ... U 9)^*

It would be much easier to define identifiers as follows:

upper -> A | B | C | ... | Z
lower -> a | b | c | ... | z
digit -> 0 | 1 | 2 | ... | 9
id -> (upper | lower)(upper | lower | digit)^*

Notational shorthands for regular expressions

 
Notation     Meaning                Example
*            zero or more instances r*
⁺            one or more instances  r⁺ = rr^*
?            zero or one instance   r? = r | e
[a,b,c]      character class        [a,b,c] = a | b | c
                                    [a-z] = a | b | ...| z

P#1: What does -?[0-9]⁺ describe?

P#2: Using these notational shorthands, describe the set of identifiers.

P#3: Using these notational shorthands, describe the set of integer and real numbers.

P#4: Consider the language of all strings where every occurrence of a is followed immediately by a single occurrence of b over the alphabet {a,b}. Construct a regular expression that accepts this language. Construct a DFA that accepts this language.

P#5: Write a regular expression for real numbers where the number of digits on the left of the decimal point is equal to the number of digits on the right. A decimal point is necessary.

P#6: Give a CFG for P#5.

P#7: Give a CFG which accepts the language described in P#5 but where the digit sequence on the left of the decimal point is the same as on the right.