COMS W4115
Programming Languages and Translators
Lecture 4: February 3, 2014
Structure of a Compiler & Lexical Analysis

Overview

  1. Structure of a compiler
  2. The lexical analyzer
  3. Language theory background
  4. Regular expressions
  5. Tokens/patterns/lexemes/attributes

1. Structure of a Compiler

2. The Lexical Analyzer

3. Language Theory Background

4. Regular Expressions

5. Tokens/Patterns/Lexemes/Attributes

6. Practice Problems

  1. What language is denoted by the following regular expressions?
    1. (a*b*)*
    2. a(a|b)*a
    3. (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
    4. a(ba|a)*
    5. ab(a|b*c)*bb*a
  2. Construct Lex-style regular expressions for the following patterns.
    1. All lowercase English words with the five vowels in order.
    2. All lowercase English words with exactly one vowel.
    3. All lowercase English words beginning and ending with the substring "ad".
    4. All lowercase English words in which the letters are in strictly increasing alphabetic order.
    5. Strings of the form abxba where x is a string of as, bs, and cs that does not contain ba as a substring.
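
One way to check an answer to the problems above is to compare a regular expression against a conjectured description on every short string over the alphabet. The sketch below is only an illustration: the helper names strings_up_to and counterexamples are hypothetical, the conjecture shown is one candidate reading of problem 1.2, and Python's re module stands in for Lex (for concatenation, | and * the two notations agree). An empty result means no counterexample up to the length bound, which is evidence, not a proof.

```python
import re
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def counterexamples(pattern, predicate, alphabet="ab", max_len=6):
    """Return the strings on which the regex and the conjecture disagree.

    `pattern` is matched against the whole string (re.fullmatch), which
    mirrors asking whether the string is in the language of the expression.
    """
    regex = re.compile(pattern)
    return [
        s
        for s in strings_up_to(alphabet, max_len)
        if (regex.fullmatch(s) is not None) != predicate(s)
    ]


# Conjecture for a(a|b)*a: strings of length >= 2 that begin and end with a.
def begins_and_ends_with_a(s):
    return len(s) >= 2 and s[0] == "a" and s[-1] == "a"


print(counterexamples(r"a(a|b)*a", begins_and_ends_with_a))  # prints []

# A deliberately wrong conjecture ("begins with a") is exposed quickly.
print(counterexamples(r"a(a|b)*a", lambda s: s.startswith("a"), max_len=2))
# prints ['a', 'ab']
```

The same harness can be pointed at the other expressions in problem 1 by changing the pattern, the predicate, and (for expressions that use c) the alphabet argument.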

7. Reading Assignment



aho@cs.columbia.edu