COMS W4115
Programming Languages and Translators
Lecture 15: Types
March 23, 2015
Lecture Outline
- Semantic analysis
- Types
- Type systems
- Typing in programming languages
- Type inference rules
- Type conversions
1. Semantic Analysis
- The semantic analyzer uses the information in the syntax tree and the symbol
table to check the source program for semantic consistency with the
language definition.
- Type checking is one important part of semantic analysis. During type
checking we need to check that each operator in the source program has
semantically compatible operands. This type checking may be done at
compile time (static type checking) or at run time (dynamic type checking).
- Run-time storage organization
- Typical subdivision of run-time memory into code and data areas:
    (low address)   Code
                    Static data
                    Run-time heap
                        ↓
                    Free memory
                        ↑
    (high address)  Run-time stack
A compiler uses type and other semantic information for a variable x
to answer such questions as:
- What kind of value is stored in x?
- How big is x?
- What kind of operations can be applied to x?
- Who is responsible for allocating space for x and where should it be located?
- Who is responsible for initializing x?
- How long must the value of x be kept?
- If x is a procedure, what kinds of arguments does it take, how are they to be
passed, and what kind of return value does it have?
2. Types
- Types play a central role in the design and implementation of programming languages.
- We can think of a type as a set of properties associated with a data value.
- Most, but not all, programming languages associate types with values.
The data type of a variable determines the values the variable may contain along with
the operations that may be performed on it.
A programming language may have predefined basic data types and rules for defining
additional types.
- ANSI C includes char, a variety of integers and floating point numbers, enumerations,
and void as its basic data types. Derived types that can be constructed from other
types in a variety of ways include:
- arrays of objects of a given type
- functions returning objects of a given type
- pointers to objects of a given type
- structures containing a sequence of objects of various types
- unions capable of containing any one of several objects of various types
- Python has a very rich set of types. Its built-in data types include
numeric types, sequences, sets, and mappings. Its data types can be
distinguished based on whether objects of a given type are mutable
or immutable. The contents of objects of immutable types cannot be
changed after they are created.
Python has an analog of a null pointer called None. None is not really
a null pointer or a null reference but an object of which there is
only one instance. In addition, Python has utilities to assist in
the creation of new types.
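The two points above (mutability and the single None object) can be seen in a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Lists are mutable: their contents can change after creation.
nums = [1, 2, 3]
nums[0] = 99

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# None is one object; every reference to None refers to the same instance.
a = None
b = None
same_object = a is b
none_type_name = type(None).__name__
```

Here nums ends up as [99, 2, 3], the tuple is left unchanged, and a and b are the same object.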
- Types often provide an implicit context for operations in a program.
For example, in a C program the operation + in the expression x + y will
be integer addition if x and y are of type int, and floating-point
addition if x and y are of type float.
- Types are very useful for catching programming errors by making sure
operators are applied to semantically valid operands. A strong argument
for statically checkable types is that type clashes can be reported
to the programmer at compile time. For example, a Java compiler
can report an error at compile time if x and y are known to be of
type String in the expression x * y.
3. Type Systems
- The type of a construct in a program can be denoted by a type expression.
- A type expression is either a basic type (e.g., integer) or
a type constructor applied to a type expression (e.g., a function from an integer
to an integer).
- See ALSU, Section 6.3.1 (pp. 371-372) for a representation for type expressions.
- A type system is a set of rules for assigning type expressions to the
syntactic constructs of a program and for specifying
- type equivalence (when are the types of two values the same),
- type compatibility (when can a value of a given type be used in
a given context), and
- type inference (rules that determine the type of a language construct based
on how it is used).
- Forms of type equivalence
- Name equivalence: two types are equivalent iff they have the same name.
- Structural equivalence: two types are equivalent iff they have the same structure.
- To test for structural equivalence, a compiler must encode the structure
of a type in its representation. A tree (or type graph) is typically
used.
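As a rough illustration of the two forms of equivalence, the sketch below represents type expressions as small trees and compares them both ways. The class names (Basic, Pointer, Func, Named) and the rule that only basic and named types can be name-equivalent are assumptions made for this example, not a representation taken from ALSU.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:
    name: str            # e.g. "int", "float", "char"

@dataclass(frozen=True)
class Pointer:
    target: object       # type expression pointed to

@dataclass(frozen=True)
class Func:
    param: object        # parameter type expression
    result: object       # result type expression

@dataclass(frozen=True)
class Named:
    name: str            # user-chosen type name
    definition: object   # type expression the name stands for

def unfold(t):
    """Replace a named type by its definition until a constructor or basic type remains."""
    while isinstance(t, Named):
        t = t.definition
    return t

def struct_equiv(t1, t2):
    """Structural equivalence: same constructor and equivalent components."""
    t1, t2 = unfold(t1), unfold(t2)
    if isinstance(t1, Basic) and isinstance(t2, Basic):
        return t1.name == t2.name
    if isinstance(t1, Pointer) and isinstance(t2, Pointer):
        return struct_equiv(t1.target, t2.target)
    if isinstance(t1, Func) and isinstance(t2, Func):
        return (struct_equiv(t1.param, t2.param)
                and struct_equiv(t1.result, t2.result))
    return False

def name_equiv(t1, t2):
    """Name equivalence (assumed rule): both basic or both named, with the same name."""
    if isinstance(t1, (Basic, Named)) and type(t1) is type(t2):
        return t1.name == t2.name
    return False

meters = Named("meters", Basic("float"))
feet = Named("feet", Basic("float"))
```

With these definitions, meters and feet are structurally equivalent (both unfold to float) but not name-equivalent, which is the distinction the two definitions above are meant to capture.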
- A type checker makes sure that a program obeys the type-compatibility rules
of the language.
- We can think about types in several different ways:
- Denotational: a type is a set of values called a domain.
- Constructive: a type is either a primitive type (such as an integer or a character)
or a composite type created by applying a type constructor (such as a structure
or an array) to simpler types.
- Abstraction-based: a type is an interface consisting of a set of operations
with well-defined and mutually consistent semantics.
4. Typing in Programming Languages
- The type system of a language determines whether type checking can
be performed at compile time (statically) or at run time (dynamically).
- A statically typed language is one in which all constructs of a language can be
typed at compile time. C, ML, and Haskell are statically typed.
- A dynamically typed language is one in which some of the constructs of a language
can only be typed at run time. Perl, Python, and Lisp are dynamically typed.
- There is no universally agreed-upon definition for the term "strongly typed language."
We will just say an implementation of a language is strongly typed if its compiler
guarantees that the programs it accepts will run without type errors.
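A small Python sketch of what dynamic typing means in practice: the ill-typed operation below is only detected when the offending line actually executes. The function describe and its arguments are hypothetical names chosen for the example.

```python
def describe(x, verbose):
    if verbose:
        # Subtracting an int from a str is a type error, but nothing
        # checks this until the line is actually executed.
        return x - 1
    return x

# This call never reaches the bad operation, so it runs without complaint.
ok = describe("abc", verbose=False)

# This call executes the bad operation and fails at run time.
try:
    describe("abc", verbose=True)
    error_seen = False
except TypeError:
    error_seen = True
```

A static type checker for a statically typed language would reject the corresponding program before it ran, regardless of which branch is taken.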
5. Type Inference Rules
- Type inference rules specify for each operator the mapping between the types
of the operands and the type of the result.
- E.g., result types for x + y in C (rows: type of x; columns: type of y):
    +     | int   | float
    int   | int   | float
    float | float | float
Type inference templates
- We can specify the type inference rule "if expression e1 has the type int
and expression e2 has the type int, then the expression e1 + e2 has the
type int" with a type inference template of the form
    ⊢ e1: int    ⊢ e2: int
    ______________________
        ⊢ e1 + e2: int
The turnstile symbol ⊢ is read "it is provable that," so the template can be
read as "if it is provable that e1 has type int and it is provable that e2
has type int, then it is provable that e1 + e2 has type int."
Templates of this form provide a compact way of expressing the type
rules of a language.
We say a type system is sound if whenever ⊢ e: T then e evaluates
to a value of type T.
We can apply the type-inference templates by making a bottom-up traversal of the AST.
We determine the types of the leaves using information from the symbol table.
We can then move up the tree determining the type of the interior nodes from the types
of their children by applying the inference rule for the operator at a given
interior node.
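The bottom-up traversal just described might be sketched as follows. The AST node classes (Num, Var, Add), the dictionary standing in for the symbol table, and the use of Python's TypeError to report a type clash are all assumptions of this sketch; the result table is the one for + in C given earlier.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: object   # an int or float literal

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

# Result types for +, keyed by (left operand type, right operand type).
PLUS_RESULT = {
    ("int", "int"): "int",
    ("int", "float"): "float",
    ("float", "int"): "float",
    ("float", "float"): "float",
}

def type_of(node, symtab):
    """Return the type of node, visiting children before their parent."""
    if isinstance(node, Num):
        # Leaf: the type comes from the literal itself.
        return "float" if isinstance(node.value, float) else "int"
    if isinstance(node, Var):
        # Leaf: the type comes from the symbol table.
        if node.name not in symtab:
            raise TypeError(f"undeclared identifier {node.name}")
        return symtab[node.name]
    if isinstance(node, Add):
        # Interior node: apply the rule for + to the children's types.
        left_type = type_of(node.left, symtab)
        right_type = type_of(node.right, symtab)
        try:
            return PLUS_RESULT[(left_type, right_type)]
        except KeyError:
            raise TypeError(f"no rule for {left_type} + {right_type}") from None
    raise TypeError(f"unknown node {node!r}")

symtab = {"x": "int", "y": "float"}
t1 = type_of(Add(Var("x"), Num(1)), symtab)
t2 = type_of(Add(Add(Var("x"), Num(1)), Var("y")), symtab)
```

With x declared int and y declared float, x + 1 is typed int and (x + 1) + y is typed float.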
Type environments
- A type environment is often needed to determine the type of a variable
at a given node in the AST.
A type environment is just a mapping from variables to types that is stored
in the symbol table.
The type environment for each node can be determined by making
a top-down pass over the AST respecting the type scoping rules of the language.
- The type inference rules are augmented with the type environment information.
For example, if E is a type environment, we modify the template to make type
inferences within the context of E:
    E ⊢ e1: int    E ⊢ e2: int
    __________________________
         E ⊢ e1 + e2: int
Static type checking can be done by making a top-down pass to compute the
type environment for each node followed by a bottom-up pass to check the types
at each node.
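One way to picture type environments is the sketch below, which fuses the two passes into a single recursive walk: the environment E is extended on the way down the tree, and types are computed on the way back up. The Let node is a hypothetical scoping construct invented for the example, and the environment is a plain dictionary rather than a real symbol table.

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Let:
    """Hypothetical block: declares `name` with type `ty` inside `body` only."""
    name: str
    ty: str
    body: object

def check(node, env):
    """Return the type of node under environment env (a dict from names to types)."""
    if isinstance(node, Var):
        if node.name not in env:
            raise TypeError(f"{node.name} is not in scope")
        return env[node.name]
    if isinstance(node, Let):
        # Going down: extend E for the body; the outer environment is untouched.
        inner = {**env, node.name: node.ty}
        return check(node.body, inner)
    if isinstance(node, Add):
        # Coming back up: E |- e1: int and E |- e2: int give E |- e1 + e2: int.
        if check(node.left, env) == "int" and check(node.right, env) == "int":
            return "int"
        raise TypeError("operands of + must both be int under this rule")
    raise TypeError(f"unknown node {node!r}")

# let x: int in x + (let y: int in x + y)
prog = Let("x", "int", Add(Var("x"), Let("y", "int", Add(Var("x"), Var("y")))))
prog_type = check(prog, {})
```

Because each Let builds a new dictionary for its body, a name declared in an inner scope is not visible after that scope ends, and an inner declaration shadows an outer one with the same name.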
Operator and function overloading
- In Java the operator + can mean addition or string concatenation
depending on the types of its operands.
- We can choose between two versions of an overloaded function by
looking at the types of their arguments.
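Overload resolution by operand type can be sketched as a table lookup. The table below is a deliberately simplified assumption loosely modeled on Java's +; it ignores the implicit conversions a real Java compiler applies, and the operation names (iadd, dadd, concat) are invented labels for the example.

```python
# Hypothetical overloads of +, keyed by operand types.
# Each entry gives (operation to emit, result type).
PLUS_OVERLOADS = {
    ("int", "int"): ("iadd", "int"),
    ("double", "double"): ("dadd", "double"),
    ("String", "String"): ("concat", "String"),
}

def resolve_plus(left_type, right_type):
    """Pick the version of + that matches the operand types."""
    try:
        return PLUS_OVERLOADS[(left_type, right_type)]
    except KeyError:
        raise TypeError(f"no version of + for {left_type} and {right_type}") from None

numeric = resolve_plus("int", "int")
textual = resolve_plus("String", "String")
```

The same idea extends to overloaded functions: the key becomes the function name together with the tuple of argument types.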
Function calls
- The compiler must check that the type of each actual parameter is compatible with
the type of the corresponding formal parameter. It must check that the
type of the returned value is compatible with the type of the function.
- The type signature of a function specifies the types of the formal
parameters and the type of the return value.
- Example: strlen in C
    size_t strlen(const char *s);
(size_t is an unsigned integer type.)
Type expression:
    strlen: const char * → size_t
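Checking a call against a type signature might look like the sketch below. The signature table, the function max_int, and the small compatibility relation are assumptions made for illustration; a real compiler would take them from the symbol table and the language definition.

```python
# Hypothetical signature table: name -> (formal parameter types, return type).
SIGNATURES = {
    "sqrt": (("double",), "double"),
    "max_int": (("int", "int"), "int"),
}

# Assumed compatibility relation: (actual type, formal type) pairs for which
# an actual may be passed where the formal is expected.
WIDENS_TO = {("int", "float"), ("int", "double"), ("float", "double")}

def compatible(actual, formal):
    return actual == formal or (actual, formal) in WIDENS_TO

def check_call(fname, actual_types):
    """Check the actuals against the signature; return the type of the call."""
    formals, result = SIGNATURES[fname]
    if len(actual_types) != len(formals):
        raise TypeError(f"{fname} expects {len(formals)} argument(s)")
    for position, (actual, formal) in enumerate(zip(actual_types, formals), start=1):
        if not compatible(actual, formal):
            raise TypeError(
                f"argument {position} of {fname}: {actual} is not compatible with {formal}"
            )
    return result

call_type = check_call("sqrt", ["int"])
```

Here an int actual is accepted for the double formal of sqrt, and the call expression is given the function's return type, double.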
Polymorphic functions
- A polymorphic function can manipulate a data structure regardless of
the types of the elements it contains.
- Example: Fig. 6.28 (p. 391) -- an ML program for the length of a list
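For readers without the book at hand, here is a Python sketch in the same spirit: a recursive length function that never inspects the elements, so it works for any element type. It is not a transcription of the ALSU figure, and the type variable T is only an annotation, since Python does not check it at run time.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # stands for an arbitrary element type

def length(xs: Sequence[T]) -> int:
    """Count the elements of xs without ever looking at their values."""
    if not xs:
        return 0
    return 1 + length(xs[1:])

n_ints = length([1, 2, 3])
n_strs = length(["a", "b"])
n_pairs = length(((1, 2), (3, 4)))
```

The same function body serves sequences of integers, strings, and pairs, which is what makes it polymorphic.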
6. Type Conversions
- Implicit type conversions
- In an expression like f + i, where f is a float and i is an integer, a
compiler must first convert the integer to a float before the floating-point
addition is performed. That is, the expression must be transformed into an
intermediate representation like
    t1 = INTTOFLOAT i
    t2 = f FADD t1
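A sketch of how a compiler could insert the conversion while generating three-address code for an addition. The helper gen_add, the temporary-name generator, and the IADD mnemonic for integer addition are assumptions of this example; INTTOFLOAT and FADD are the operations named above.

```python
import itertools

def gen_add(left, left_type, right, right_type, temps):
    """Emit three-address code for left + right, widening an int operand if needed."""
    code = []

    def widen(name):
        # Convert an int operand to float in a fresh temporary.
        temp = next(temps)
        code.append(f"{temp} = INTTOFLOAT {name}")
        return temp

    if left_type == "int" and right_type == "int":
        op = "IADD"          # assumed mnemonic for integer addition
    else:
        op = "FADD"
        if left_type == "int":
            left = widen(left)
        if right_type == "int":
            right = widen(right)

    result = next(temps)
    code.append(f"{result} = {left} {op} {right}")
    return code, result

temps = (f"t{i}" for i in itertools.count(1))
code, result = gen_add("f", "float", "i", "int", temps)
```

For f + i with f a float and i an int, code holds the two instructions "t1 = INTTOFLOAT i" and "t2 = f FADD t1", and result is t2.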
Explicit type conversions
- In C, explicit type conversions can be forced ("coerced") in an expression using a
unary operator called a cast. E.g., sqrt((double) n) converts the value of the
integer n to a double before passing it on to the square root routine sqrt.
7. Practice Problems
- Give some examples of typeless programming languages.
- The following grammar generates programs consisting of
a sequence of declarations D followed by a single expression E.
Each identifier must be declared before its use.
P → D ; E
D → D ; D | T id
T → int | float | T [ num ]
E → num | id | E [ E ] | E + E
- Construct type expressions as in ALSU, Section 6.3.1 (pp. 371-372)
for the following programs:
- int a; int b; a + b
- float[10][20] a; a[1] + a[2]
- Write pseudocode for a function sequiv(exp1, exp2) that will test the
structural equivalence of two type expressions exp1 and exp2.
Show how your function computes
sequiv(array(2, array(2, int)), array(2, array(3, int))).
8. Reading
- ALSU, Sects. 6.1-6.3, 6.5.
aho@cs.columbia.edu