COMS W4115
Programming Languages and Translators
Lecture 2: January 27, 2014
Basic Elements of Programming Languages
Overview
- Language processing tools
- Language specifications
- Language categories
- Language design
- Language project
1. Language Processing Tools
- Basic compiler
- Interpreter
- Bytecode interpreter
- Just-in-time compiler
- Linker and loader
- Preprocessor
- For details see ALSU Section 1.1.
2. A Language Specification Defines
- the representation of programs
- the syntax and constraints of the language
- the semantic rules for interpreting programs
- the representation of input data to be processed by programs
- the representation of output data produced by programs
- other restrictions on programs (such as what makes a program portable
across different implementations and platforms)
3. Language Categories
- Imperative
- A program in an imperative languages specifies how a computation is to be done.
- Example: C is a prototypical imperative language.
- Declarative
- A program in a declarative language specifies what computation is to be done.
- Examples: The functional language Haskell and the domain specific language
Yacc are declarative languages.
- Object oriented
- An object-oriented program does its computation with interacting objects.
- C++ and Smalltalk are object-oriented languages.
- Functional
- A functional language is one whose computation model is based on the lambda calculus.
- Haskell is currently considered one of the purest functional languages.
- Scripting
- A scripting language is an interpreted language with high-level operators for
"gluing" computations together.
- AWK and Perl are scripting languages.
- Multi-paradigm
- Many modern languages support more than one programming model.
- Example: Python and Ruby are multi-paradigm languages that support object-oriented
and functional programming.
4. Language Design Issues
- The design of a programming language is an exercise in engineering tradeoffs.
In 1973 Tony Hoare stated that there are so many important but conflicting criteria
in language design that their reconciliation and satisfaction is a major engineering
task.
Here are just some of the elements of a programming language that need to be considered
in the design of a language.
- Domain of application
- What class of problems is the language designed to solve?
- For example, C was initially designed as a general-purpose programming language
for building the Unix operating system.
- Programming model
- What is the programming model of computation encapsulated
into the programming language.
- For example, C is an imperative language, designed around the
von Neumann model of computation.
- Program structure
- What is the basic translation unit for a program?
- In C, a translation unit is a sequence of function definitions and declarations
stored in a file.
- Character set and lexical conventions
- What are the source and target character sets? What are the lexical units of a source program?
- The character set of C source programs is contained within seven-bit ASCII.
- A token is a meaningful sequence of characters in a source program.
- C has six classes of tokens: identifiers, keywords, constants, string literals,
operators, and separators.
- Names, scopes, bindings, and lifetimes
- A name is a string of characters used to identify some element in a program.
- In C identifiers are named with sequences of letters (here, underscore is considered a letter) and digits.
The first character of an identifier name must be a letter. At least the first 31 characters in an
identifier name are significant.
- The scope of a name is the region of the program in
which it is known (visible).
- A binding is an association between two things such as between a
variable and its type or between a symbol and the operation it represents.
The time at which this association is determined is called
the binding time. Bindings can take place at various times ranging
from language design time to run time.
- The lifetime of a variable is the time during which the variable
is bound to a specific memory location.
- Memory management
- One important function of a programming language is to provide facilities
for managing memory either implicitly or explicitly.
- C provides three ways to allocate memory:
- Static allocation where space for an object is provided before the program begins to execute.
- Automatic allocation where temporary objects are stored by the compiler on the runtime stack.
This space is automatically freed by the compiler after the block in which the object is
declared is exited.
- Dynamic allocation where blocks of memory of arbitrary size can be requested
by a programmer using runtime library functions such as malloc from a region of
memory called the heap. These blocks
persist until freed by a call to a runtime library function such as free.
- Data types and operators
- A data type defines a set of data values and the operations allowed
on those values.
- C has a small number of basic types, including
char, int, double, float,
enum, void
.
- C has a potentially infinite number of recursively
defined derived types
such as arrays of objects of some type, functions returning objects of
some type, pointers to objects of some type, structures containing a sequence
of objects of various types, and unions containing any one of several
objects of various types.
- C has a rich set of arithmetic, relational, logical, and assignment operators.
- Expressions and assignment statements
- Expressions are the primary means for specifying computations
in a programming language.
- Assignment statements are basic constructs in imperative programming languages.
Assignment statements allow the programmer to dynamically change
the bindings of values to variables.
- Control flow
- Flow of control refers to the sequence in which the operations specified in
a program are executed at run time. There are flow-of-control issues at
many levels such as flow of control within
expressions, among statements, and among program units.
Most programming languages have control statements and other control
structures for controlling the flow of control within a program.
- C has a variety of flow-of-control constructs such as blocks and
control statements such as
if-else, switch, while, for,
do-while, break, continue
and goto
.
- Functions and process abstraction
- Functions are perhaps the most important building blocks of programs.
Functions are often called procedures, subroutines, or subprograms.
Functions break large computing tasks into smaller ones and facilitate
code reuse. Functions are such an important topic in programming
languages that we will talk about them in much more detail later in
this course.
- Data abstraction and object orientation
- Data abstraction in the form of abstract data types was introduced
into programming languages after process abstraction. The programming
language Simula67 was instrumental
in motivating constructs for supporting object-oriented programming
in modern programming languages such as C++, C#, Java, and Smalltalk.
Object orientation is characterized by encapsulation, polymorphism,
inheritance, and dynamic binding.
- Concurrency
- Concurrent execution of programs has assumed much more importance with
the widespread deployment of multi-core and many-core processors.
- Concurrency
in software execution can occur many levels of granularity: instruction,
statement, subprogram, and program.
- Concurrency can be achieved with libraries (like MPI for Fortran, pthreads
for C) or with direct language support (as in Cilk, X10).
- However, effective exploitation of concurrency is still an open
research area in software.
- Exception and event handling
- Many languages have facilities for reacting to run-time error conditions.
C++ has the
try-catch
construct to catch exceptions raised by the
throw
statement.
- Event handling is like exception handling in that an event handler is called
by the occurence of an event. Implementing reactions to user interactions
with GUI components is a common application of event handling.
- Usability issues
- A good language design should promote readability, writability, and efficiency.
5. Language Project
- Project description
- Form a team of five by February 3, 2014.
- Create an innovative (Turing-complete) little language of your own design.
- Write a compiler or interpreter for your language.
- Each team member must write at least 500 lines of
code of the compiler or interpreter.
- Project constitutes 40% of final grade.
- Project team: elect one person to serve each of the following functions.
- Project manager
- This person sets the project schedule, holds weekly meetings
with the entire team, maintains the project log, and makes
sure the project deliverables get done on time.
- Language and tools guru
- This person defines the baseline process to track language changes
and maintain the intellectual integrity of the language.
- This person teaches the team how to use various tools
used to build the compiler.
- System architect
- This person defines the compiler architecture, modules, and
interfaces.
- System integrator
- This person defines the system integration environment
and makes sure the compiler components work together.
- Tester and validator
- This person defines the test suites and executes them
to make sure the compiler meets the language specification.
- Project due dates and deliverables:
- Feb. 26: Language white paper (written by entire team, 3-4 pages).
- Mar. 26: Language tutorial (written by entire team, 15-20 pages).
- Chapter 1 of Kernighan and Ritchie is a good model of a
language tutorial.
- Describe a few representative programs that illustrate the
nature and scope of your language.
- A "hello, world" program is de rigueur.
- Mar. 26: Language reference manual (written by entire team, 20-25 pages).
- Appendix A of Kernighan and Ritchie is a good model.
- Give a complete description of the lexical and syntactic
structure of your language.
- Include a full grammar for your language.
- May 12-14: Working compiler and demo in Computer Science Conference Room.
- May 12-14: Final project report due at project demo.
6. Practice Problems
- Describe the von Neumann model of computation (computer architecture).
- Compare C and Python with regard to their (a) programming model and
(b) primitive data types.
7. Reading Assignment
8. References
- Brian Kernighan and Dennis Ritchie, The C Programming Language,
2nd edition, Prentice Hall, 1988.
This is the classic reference on ANSI C (C89).
-
Python Tutorial
- Robert W. Sebesta, Concepts of Programming Languages,
tenth edition, Pearson, 2012.
An easy-to-read book describing the important constructs found in many modern programming languages.
- C. A. R. Hoare, "Hints on Programming Language Design,"
Proceedings ACM SIGACT/SIGPLAN Conference on Principles of Programming Languages, 1973.
aho@cs.columbia.edu