COMS E6998-2
Advanced Topics in Programming
Languages and Compilers
Lecture 1: September 6, 2011
4:10-6:00pm
Lecture Outline
- Introductions
- Course overview
- Prerequisites
- Course project
- Language desiderata
- Language design principles
- Kinds of languages
- Major application areas
- Untyped lambda calculus
- References
1. Introductions
2. Course Overview
- Course objectives
- Understanding how modern language and compiler technology can be used to make
more reliable software
- Learning the major concepts and design principles underlying programming languages
- Understanding modern program analysis techniques and tools
- Awareness of language and compiler issues in dealing with parallelism
and concurrency
- A highlight of this course is a semester-long
project in which you can explore some of these
concepts in more depth.
- Course syllabus
- Language design
- Lambda calculus and functional languages
- Concurrency and parallelism
- Program analysis techniques
- Interprocedural analysis
- Pointer analysis
- Binary decision diagrams
- Model checking
- Satisfiability modulo theory solvers
- Abstract interpretation
3. Prerequisites
- Fluency in at least one major programming language such as C, C++, C#, Java, or OCaml
- CS4115, Programming Languages and Translators
4. Course Project
- Each student should select a programming language or compiler topic to pursue in
more depth. Students will periodically discuss their projects with the class
and, at the end of the semester, will present their project and hand in a final project
report summarizing their findings.
- The project and classroom discussions will determine the final grade.
5. Language Desiderata
- A programming language should
- Be easy to learn
- Support rapid development
- Encourage reliability
- Promote efficiency
- Facilitate maintenance and evolution
- Language design concerns
- Programming model
- Readability
- Writability
- Expressiveness
- Learnability
- Performance
- Scalability
- Portability
6. Language Design Principles
- Simple, regular programming model for evaluation, data reference, memory management
- Versatile abstraction mechanisms for control, data, types
- Sound type system
- Precise language definition
7. Kinds of Languages
- Imperative
- Program specifies how a computation is to be done.
- Examples: C, C++, C#, Fortran, Java
- Declarative
- Program specifies what computation is to be done.
- Examples: Haskell, ML, Prolog
- von Neumann
- One whose computational model is based on the von Neumann architecture.
- Basic means of computation is through the modification of variables (computing
via side effects).
- Statements influence subsequent computations by changing the value of memory.
- Examples: C, C++, C#, Fortran, Java
- Object-oriented
- Program consists of interacting objects.
- Each object has its own internal state and executable functions (methods)
to manage that state.
- Object-oriented programming is based on encapsulation, modularity,
polymorphism, and inheritance.
- Examples: C++, C#, Java, OCaml, Scala, Simula 67, Smalltalk
- Scripting
- An interpreted language with high-level operators for
"gluing together" computations.
- Examples: AWK, Perl, PHP, Python, Ruby
- Functional
- One whose computational model is based on the recursive definition of functions
(lambda calculus).
- Examples: Haskell, Lisp, ML
- Parallel
- One that allows a computation to run concurrently on multiple processors.
- Examples
- Libraries: POSIX threads, MPI
- Languages: Ada, Cilk, OpenCL, Chapel, X10
- Architecture: CUDA (parallel programming architecture for GPUs)
- Domain specific
- Many areas have special-purpose languages to facilitate the creation of applications.
- Examples
- Yacc for creating parsers
- Lex for creating lexical analyzers
- Matlab for scientific computation
- Markup
- Not programming languages in the sense of being Turing complete, but widely used
for document preparation.
- Examples: HTML, XHTML, XML
- TIOBE programming community index top 10 for May 2011
- Java, C, C++, C#, PHP, Objective-C
- Python, (Visual) Basic, Perl, Ruby
8. Major Application Areas
- Scientific computing
- Scripting applications
- Specialized applications
- LaTeX for typesetting
- SQL for database applications
- VB macros for spreadsheets
- Symbolic programming
- F#, Haskell, Lisp, ML, OCaml
- Systems programming
- Web programming
- CGI, HTML, Java, JavaScript, Perl
9. Introduction to Lambda Calculus
- Lambda calculus was introduced in the 1930s by Alonzo Church
as a mathematical system for defining computable functions.
- Lambda calculus is equivalent in computational power to Turing machines: both define the same class of computable functions.
- Lambda calculus serves as the model for functional programming languages.
- Lisp was developed by John McCarthy in 1958 around lambda calculus.
- ML, a general purpose functional programming language, was developed by
Robin Milner in the late 1970s.
- Haskell, considered by many as one of the purest functional programming languages,
was developed by Simon Peyton Jones, Paul Hudak, Phil Wadler and others in the late 1980s
and early 1990s.
10. Grammar for Lambda Calculus
- The central concept in lambda calculus is an expression
which can denote either a function definition (called a function abstraction)
or a function application.
expr → abstraction | application | (expr) | var
abstraction → λ var . expr
application → expr expr
We can think of a lambda-calculus expression as a program which when
evaluated returns a result consisting of another lambda-calculus expression.
11. Function Abstraction
- A function abstraction, often called a lambda abstraction, is an expression
defining a function.
- It consists of a lambda followed by a variable, a period, and then an expression:
λ var . expr
- In the function λ var . expr, var is the formal parameter and expr the body.
- We say λ var . expr binds var in expr.
- Example
- λx.y is a function abstraction.
- The variable x after the λ is the formal parameter of the function.
- The expression y after the period is the body of the function.
12. Function Application
- A function application, often called a lambda application, consists of
an expression followed by an expression.
- Example 1: if e is a function and f an expression, then ef is a function application.
The expression f is the argument of the function e.
- Example 2: in (λx.y)z, we are applying the function λx.y to the
argument z.
- Function application is left associative and application binds tighter than period.
- Example 3: (λx. λy. xy) λz.z = (λx. (λy. xy)) λz.z
13. Free and Bound Variables
- In lambda calculus all variables are local to function definitions.
- In the function λx.x the variable x in the body of the definition
(the second x) is bound because it lies within the scope of the
λx that begins the definition.
- In the expression (λx.xy), the variable x in the
body of the function is bound and the variable y is free.
- In the expression (λx.x)(λy.yx):
- The variable x in the body of the leftmost
expression is bound to the first lambda.
- The variable y in the body of the second expression is bound to the second lambda.
- The variable x in the body of the second expression is free.
- Note that the x in the second expression is independent of the x in the first expression.
- In the expression (λx.xy)(λy.y):
- The variable y in the body of the leftmost
expression is free.
- The variable y in the body of the second expression is bound to the second lambda.
- Given an expression e, the following rules define FV(e), the set of free variables in e:
- If e is a variable x, then FV(e) = {x}.
- If e is of the form λx.f, where f is an expression, then FV(e) = FV(f) - {x}.
- If e is of the form f g, where f and g are expressions, then FV(e) = FV(f) ∪ FV(g).
- An expression with no free variables is said to be closed.
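The three rules for FV translate almost line for line into a recursive function. This is a sketch under an assumed encoding: expressions are tagged tuples ("var", x), ("lam", x, body) and ("app", f, g), and the names free_vars and is_closed are illustrative.

```python
def free_vars(e):
    """Return FV(e) for an expression encoded as a tagged tuple."""
    tag = e[0]
    if tag == "var":                       # FV(x) = {x}
        return {e[1]}
    if tag == "lam":                       # FV(λx.f) = FV(f) - {x}
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])   # FV(f g) = FV(f) ∪ FV(g)

def is_closed(e):
    """An expression is closed when it has no free variables."""
    return not free_vars(e)

x, y = ("var", "x"), ("var", "y")
# (λx.x)(λy.yx): only the x in the second expression is free
term = ("app", ("lam", "x", x), ("lam", "y", ("app", y, x)))
print(free_vars(term))  # {'x'}
```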
14. Renaming Bound Variables by Alpha Reduction
- The name of the parameter variable in a function definition is arbitrary.
We can use any variable to name a parameter, so that the function
λx.x is equivalent to λy.y and λz.z.
This kind of renaming is called alpha reduction (also known as alpha conversion).
- Note that we cannot rename free variables in expressions.
- Also note that we cannot change the name of a bound variable in an
expression to conflict
with the name of a free variable in that expression.
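One way to test whether two expressions differ only in the names of their bound variables is to compare which binder each variable occurrence refers to. The sketch below assumes the tagged-tuple encoding ("var", x), ("lam", x, body), ("app", f, g); the function name alpha_equiv and its extra parameters are hypothetical.

```python
def alpha_equiv(a, b, env_a=None, env_b=None, depth=0):
    """True if a and b are equal up to renaming of bound variables.

    env_a and env_b map each bound name to the nesting depth of its binder.
    """
    env_a = env_a or {}
    env_b = env_b or {}
    if a[0] != b[0]:
        return False
    if a[0] == "var":
        if a[1] in env_a or b[1] in env_b:
            # Bound occurrences must refer to binders at the same depth.
            return env_a.get(a[1]) == env_b.get(b[1])
        # Free variables cannot be renamed, so the names must agree.
        return a[1] == b[1]
    if a[0] == "lam":
        return alpha_equiv(a[2], b[2],
                           {**env_a, a[1]: depth},
                           {**env_b, b[1]: depth},
                           depth + 1)
    return (alpha_equiv(a[1], b[1], env_a, env_b, depth)
            and alpha_equiv(a[2], b[2], env_a, env_b, depth))

ident_x = ("lam", "x", ("var", "x"))
ident_y = ("lam", "y", ("var", "y"))
print(alpha_equiv(ident_x, ident_y))  # True
# λx.y is not equivalent to λy.y: renaming x to y would capture the free y
print(alpha_equiv(("lam", "x", ("var", "y")), ident_y))  # False
```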
15. Evaluation of Function Applications by Beta Reduction
- A function application fg is evaluated by substituting the argument g for the
formal parameter in the body
of the function definition f.
- The notation [y/x]e is used to indicate that y is to be substituted for all free occurrences
of x in the expression e.
- Example: (λx.x)y → [y/x]x = y
- This substitution in a function application is called a beta reduction
and we use a right arrow
to indicate a beta reduction.
- If expr1 → expr2, we say expr1 reduces to expr2 in one step.
- In general, (λx.e)g → [g/x]e means that applying the function
(λx.e) to the argument expression g reduces to [g/x]e, that is, the function body e
with the argument expression g substituted for the function's formal parameter x.
- A lambda-calculus expression (aka a "program") is "run" by computing a final result by
the application of zero or more beta reductions.
We use →* to denote the reflexive and transitive closure of →.
- Examples
- (λx.x)y → y (illustrating that λx.x is the identity function).
- (λx.xx)(λy.y) → (λy.y)(λy.y) → (λy.y);
thus, we can write (λx.xx)(λy.y) →* (λy.y).
16. Substitutions
- In standard lambda calculus, the only name for a function is an expression
denoting the function.
- When we want to apply a function to an argument, we write down the whole function definition
and then proceed to evaluate it on the argument.
- As an example, let us apply the identity function to itself:
- (λx.x)(λy.y) → [λy.y/x]x = λy.y which
by alpha reduction is the same as λx.x
- Thus, as expected, the identity function applied to itself yields the identity function.
- When performing substitutions, we should be careful to avoid mixing up free
occurrences of a variable with bound ones.
- When we apply the function λx.e to an expression f, we substitute f for all
free occurrences of x in e. If f contains a free variable that is bound by a lambda
inside e, we first rename that bound variable in e so that the free variable is not
captured by the substitution.
- The rules for substitution are as follows. We assume x and y are distinct variables.
- For variables
- [e/x]x = e
- [e/x]y = y
- For function applications
- [e/x](f g) = ([e/x]f) ([e/x]g)
- For function abstractions
- [e/x](λx.f) = λx.f
- [e/x](λy.f) = λy.[e/x]f, provided y is not a free variable in e.
- Examples:
- The expression (λx.(λy.xy))y contains a bound y in the middle
and a free y at the right. We therefore should rename the bound variable y to a new variable,
say z, to evaluate the expression with no name conflicts:
(λx.(λy.xy))y = (λx.(λz.xz))y →
[y/x](λz.xz) = (λz.yz)
- The body of the leftmost expression in (λx.(λy.(x(λx.xy))))y is
(λy.(x(λx.xy))). In this body only the first x is free.
Before substituting, we need to rename the bound variable y to
z, say, to avoid confusing it with its free occurrence. Therefore we get the
evaluation:
(λx.(λy.(x(λx.xy))))y = (λx.(λz.(x(λx.xz))))y →
[y/x](λz.(x(λx.xz))) = (λz.(y(λx.xz)))
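The substitution rules, together with the renaming step used in the two examples, can be sketched as a recursive function. This is an illustration under assumptions: expressions are tagged tuples ("var", x), ("lam", x, body), ("app", f, g); the names free_vars, fresh and subst are hypothetical; fresh variables are named v0, v1, ... rather than z.

```python
import itertools

def free_vars(e):
    """FV(e) for a tagged-tuple expression."""
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lam":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])

def fresh(avoid):
    """Return the first of v0, v1, ... that does not occur in the set avoid."""
    for i in itertools.count():
        name = "v" + str(i)
        if name not in avoid:
            return name

def subst(e, x, f):
    """Compute [e/x]f: substitute e for the free occurrences of x in f."""
    tag = f[0]
    if tag == "var":                      # [e/x]x = e and [e/x]y = y
        return e if f[1] == x else f
    if tag == "app":                      # [e/x](f g) = ([e/x]f) ([e/x]g)
        return ("app", subst(e, x, f[1]), subst(e, x, f[2]))
    y, body = f[1], f[2]
    if y == x:                            # [e/x](λx.f) = λx.f
        return f
    if y in free_vars(e):
        # The side condition fails, so rename the bound y to a fresh name first.
        z = fresh(free_vars(e) | free_vars(body) | {x})
        body = subst(("var", z), y, body)
        y = z
    return ("lam", y, subst(e, x, body))  # [e/x](λy.f) = λy.[e/x]f

# First example: [y/x](λy.xy) with the bound y renamed on the fly
inner = ("lam", "y", ("app", ("var", "x"), ("var", "y")))
result = subst(("var", "y"), "x", inner)
print(result)  # the tuple form of λv0.y v0
```

The result corresponds to (λz.yz) in the first example, with v0 in place of z.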
17. Normal Forms
- An expression containing no more possible beta reductions is called a normal form.
- Any expression not containing a function application in it somewhere is a normal form.
- Examples of normal form expressions:
- x where x is a variable
- xe where x is a variable and e is a normal form expression
- λx.e where x is a variable and e is a normal form expression
- The expression (λz.z z) (λz.z z) does not have a normal form
because it always evaluates to itself. We can think of this expression as
a representation for an infinite loop.
- A remarkable property of lambda calculus is that if an expression has a normal form,
that normal form is unique (up to renaming of bound variables).
- Lambda calculus is also Church-Rosser, meaning that reductions can be applied in
any order. More formally, if w →* x and w →* y, then there always exists
an expression z such that x →* z and y →* z.
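The definition of normal form can be checked mechanically by searching an expression for a redex, that is, an application whose function part is an abstraction. A small sketch, assuming expressions are tagged tuples ("var", x), ("lam", x, body), ("app", f, g) and using the hypothetical names is_redex and is_normal_form:

```python
def is_redex(e):
    """An application whose function part is an abstraction can be beta reduced."""
    return e[0] == "app" and e[1][0] == "lam"

def is_normal_form(e):
    """True if no beta reduction is possible anywhere inside e."""
    tag = e[0]
    if tag == "var":
        return True
    if tag == "lam":
        return is_normal_form(e[2])
    return (not is_redex(e)
            and is_normal_form(e[1])
            and is_normal_form(e[2]))

x = ("var", "x")
self_app = ("lam", "z", ("app", ("var", "z"), ("var", "z")))   # λz.z z
omega = ("app", self_app, self_app)                            # (λz.z z)(λz.z z)

print(is_normal_form(("app", x, self_app)))  # True: x e with e in normal form
print(is_normal_form(omega))                 # False: omega is itself a redex
```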
18. Evaluation Strategies
- A subexpression of an expression where a lambda can be applied to an argument is
called a redex (short for reducible expression).
- If there is more than one redex in an expression,
there will be several evaluation orders for an expression.
- Lambda calculus uses two basic evaluation orders: normal order and applicative order.
- Normal order evaluation
- We always reduce the leftmost outermost redex at each step.
- This corresponds to call by name as in Algol 60.
- If an expression has a normal form, then normal order evaluation will always find it.
- Normal order evaluation is sometimes known as lazy evaluation and is the
usual order of evaluation.
- Applicative order evaluation
- Here we always reduce the leftmost outermost redex whose argument is in normal form.
- This corresponds to call by value as in the programming language C.
- Actual parameters are evaluated before being passed to a function.
Both the function and the argument are reduced before the argument is substituted into
the body of the function.
- Even though an expression may have a normal form, applicative order evaluation
may fail to find it.
- Applicative order is sometimes called eager evaluation.
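The difference between the two orders shows up on the expression (λx.y)((λz.z z)(λz.z z)): normal order discards the argument and reaches y, while applicative order keeps reducing the argument forever. The sketch below is an illustration under assumptions: expressions are tagged tuples, all function names are hypothetical, applicative order is implemented in its common leftmost-innermost formulation (function and argument are reduced before the application itself), and a step limit stands in for detecting non-termination.

```python
import itertools

def free_vars(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lam":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])

def subst(e, x, f):
    """[e/x]f, renaming bound variables where needed to avoid capture."""
    if f[0] == "var":
        return e if f[1] == x else f
    if f[0] == "app":
        return ("app", subst(e, x, f[1]), subst(e, x, f[2]))
    y, body = f[1], f[2]
    if y == x:
        return f
    if y in free_vars(e):
        avoid = free_vars(e) | free_vars(body) | {x}
        z = next(n for n in ("v%d" % i for i in itertools.count()) if n not in avoid)
        y, body = z, subst(("var", z), y, body)
    return ("lam", y, subst(e, x, body))

def step_normal(e):
    """One leftmost-outermost beta step, or None if e is in normal form."""
    if e[0] == "app":
        if e[1][0] == "lam":
            return subst(e[2], e[1][1], e[1][2])
        r = step_normal(e[1])
        if r is not None:
            return ("app", r, e[2])
        r = step_normal(e[2])
        return None if r is None else ("app", e[1], r)
    if e[0] == "lam":
        r = step_normal(e[2])
        return None if r is None else ("lam", e[1], r)
    return None

def step_applicative(e):
    """One leftmost-innermost beta step, or None if e is in normal form."""
    if e[0] == "app":
        r = step_applicative(e[1])
        if r is not None:
            return ("app", r, e[2])
        r = step_applicative(e[2])
        if r is not None:
            return ("app", e[1], r)
        if e[1][0] == "lam":
            return subst(e[2], e[1][1], e[1][2])
        return None
    if e[0] == "lam":
        r = step_applicative(e[2])
        return None if r is None else ("lam", e[1], r)
    return None

def normalize(e, step, limit=50):
    """Apply step repeatedly; return the normal form, or None if not reached within limit steps."""
    for _ in range(limit):
        r = step(e)
        if r is None:
            return e
        e = r
    return e if step(e) is None else None

self_app = ("lam", "z", ("app", ("var", "z"), ("var", "z")))
omega = ("app", self_app, self_app)
term = ("app", ("lam", "x", ("var", "y")), omega)   # (λx.y) omega

print(normalize(term, step_normal))       # ('var', 'y')
print(normalize(term, step_applicative))  # None: gave up after 50 steps
```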
19. Arithmetic
- The numbers can be represented in lambda calculus starting from zero and using a
successor function "succ(0)" to represent 1, "succ(succ(0))" to represent 2, and so on.
- 0 is represented by the function λs.(λz.z) which we will abbreviate as
λsz.z. (Note that this is a function of two arguments, s and z.)
- 1 is defined as λsz.s(z)
- 2 is defined as λsz.s(s(z))
- 3 is defined as λsz.s(s(s(z))), and so on.
- We can define the successor function as λwyx.y(wyx).
- Applying the successor function to zero, we get
(λwyx.y(wyx))(λsz.z) →
λyx.y((λsz.z)yx) → λyx.y((λz.z)x) → λyx.y(x).
The last expression is just the representation for 1 (with y for s and x for z).
20. Lambda Calculus is Turing-complete
- We can translate any Turing-machine program into an equivalent
lambda-calculus program and the other way around.
- We can have lambda-calculus expressions that simulate
arithmetic, booleans, logic, loops, data structures, etc.
21. Church Integers
- We will encode multiple arguments by currying:
- λ(x,y).e will be equivalent to λx.(λy.e).
- e(f,g) will be equivalent to (e f)g.
- The integers can be represented in lambda calculus using functions to represent zero,
1, 2, and so on.
- 0 is defined as the function (λs.λz.z).
- 1 is defined as (λs.λz.s z)
- 2 is defined as (λs.λz.s (s z))
- n is represented by a function that applies s
n times to the zero value z.
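Because Python has first-class functions, the Church integers can be written down directly as curried Python lambdas. This is only an illustration: the helpers to_int and church are hypothetical conveniences for converting to and from ordinary Python integers and are not part of the encoding itself.

```python
# Church integers as curried functions: n applies s to z exactly n times.
zero = lambda s: lambda z: z
one = lambda s: lambda z: s(z)
two = lambda s: lambda z: s(s(z))

def to_int(n):
    """Decode a Church integer by choosing s = (+1) and z = 0."""
    return n(lambda k: k + 1)(0)

def church(k):
    """Build the Church integer for a non-negative Python int k."""
    return lambda s: lambda z: z if k == 0 else s(church(k - 1)(s)(z))

print(to_int(zero), to_int(one), to_int(two))  # 0 1 2
print(to_int(church(5)))                       # 5
```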
22. Arithmetic
- We need a successor function succ such that
- succ 0 = 1
succ 1 = 2
succ n = n+1
- Define succ as
- (λn.λs.λz.s (n s z))
- Here n represents the integer whose successor we want.
- Example succ 0:
- (λn.λs.λz.s (n s z)) (λu.λv.v)
→ (λs.λz.s ((λu.λv.v) s z))
→ (λs.λz.s ((λv.v) z))
→ (λs.λz.s z) which represents one.
- Example succ 1:
- (λn.λs.λz.s (n s z)) (λu.λv.u v)
→ (λs.λz.s ((λu.λv.u v) s z))
→ (λs.λz.s ((λv.s v) z))
→ (λs.λz.s (s z)) which represents two.
- Addition
- To add 2 and 3, apply the successor function twice to 3.
- We can define add as
- (λx.λy.x succ y)
- (For clarity, we have used the name succ to represent the successor function.)
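The definitions of succ and add carry over to Python lambdas unchanged. A minimal sketch, assuming the curried encoding of the Church integers; to_int is a hypothetical decoding helper that is not part of the lecture.

```python
zero = lambda s: lambda z: z

# succ = λn.λs.λz.s (n s z)
succ = lambda n: lambda s: lambda z: s(n(s)(z))

# add = λx.λy.x succ y: apply succ to y, x times
add = lambda x: lambda y: x(succ)(y)

def to_int(n):
    """Decode a Church integer by choosing s = (+1) and z = 0."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = succ(two)

print(to_int(one), to_int(two))   # 1 2
print(to_int(add(two)(three)))    # 5
```

Note that Python evaluates arguments eagerly, so these calls follow applicative order rather than normal order; for these terminating examples the result is the same.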
23. Logic
- The boolean value true can be represented by a function that
always selects the first argument: λx.λy.x
- The boolean value false can be represented by a function that
always selects the second argument: λx.λy.y
- A conditional such as if c then i else e can be represented
by a function λc.λi.λe.c i e
- A test-for-zero predicate can be implemented by the function
isZero defined as
- (λn.n (λx.false) true)
- Example: apply isZero to zero
- (λn.n (λx.false) true) (λs.λz.z)
→ (λs.λz.z) (λx.false) true
→* true
- Example: apply isZero to one
- (λn.n (λx.false) true) (λs.λz.s z)
→ (λs.λz.s z) (λx.false) true
→* (λx.false) true
→ false
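The boolean encodings and isZero can be tried out with Python lambdas. This sketch assumes the curried encodings above; the names if_then_else, is_zero and to_bool are illustrative, and to_bool is a hypothetical helper that converts a Church boolean into a Python bool.

```python
true = lambda x: lambda y: x            # selects the first argument
false = lambda x: lambda y: y           # selects the second argument

# if c then i else e = λc.λi.λe.c i e
if_then_else = lambda c: lambda i: lambda e: c(i)(e)

# isZero = λn.n (λx.false) true
is_zero = lambda n: n(lambda x: false)(true)

zero = lambda s: lambda z: z
one = lambda s: lambda z: s(z)

def to_bool(b):
    """Decode a Church boolean by letting it select between True and False."""
    return b(True)(False)

print(to_bool(is_zero(zero)))   # True
print(to_bool(is_zero(one)))    # False
print(if_then_else(is_zero(zero))("zero")("nonzero"))  # zero
```

Because Python is eager, both branches passed to if_then_else are evaluated before one is selected; that is harmless here but differs from normal order evaluation.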
24. References
- http://www.inf.fu-berlin.de/lehre/WS01/ALPI/lambda.pdf
- http://www.soe.ucsc.edu/classes/cmps112/Spring03/readings/lambdacalculus/project3.html
aho@cs.columbia.edu