COMS W4115
Programming Languages and Translators
Lecture 15: Types
March 23, 2015

Lecture Outline

Semantic analysis
Types
Type systems
Typing in programming languages
Type inference rules
Type conversions

1. Semantic Analysis

The semantic analyzer uses the information in the syntax tree and the symbol table to check the source program for semantic consistency with the language definition.
Type checking is one important part of semantic analysis. During type checking we need to check that each operator in the source program has semantically compatible operands. This type checking may be done at compile time (static type checking) or at run time (dynamic type checking).
Run-time storage organization

Typical subdivision of run-time memory into code and data areas:


   (low address) Code
                 Static data
                 Run-time heap
                   ↓

                 Free memory

                   ↑                                 
  (high address) Run-time stack

A compiler uses type and other semantic information for a variable x to answer such questions as:

What kind of value is stored in x?
How big is x?
What kind of operations can be applied to x?
Who is responsible for allocating space for x and where should it be located?
Who is responsible for initializing x?
How long must the value of x be kept?
If x is a procedure, what kinds of arguments does it take, how are they to be passed, and what kind of return value does it have?

2. Types

Types play a central role in the design and implementation of programming languages.
We can think of a type as a set of properties associated with a data value.
Most, but not all, programming languages associate types with values. The data type of a variable determines the values the variable may contain along with the operations that may be performed on it. A programming language may have predefined basic data types and rules for defining additional types.

ANSI C includes char, a variety of integers and floating point numbers, enumerations, and void as its basic data types. Derived types that can be constructed from other types in a variety of ways include:

arrays of objects of a given type
functions returning objects of a given type
pointers to objects of a given type
structures containing a sequence of objects of various types
unions capable of containing any one of several objects of various types

Python has a very rich set of types. Its built-in data types include numeric types, sequences, sets, and mappings. Its data types can be distinguished based on whether objects of a given type are mutable or immutable. The contents of objects of immutable types cannot be changed after they are created. Python has an analog of a null pointer called None. None is not really a null pointer or a null reference but an object of which there is only one instance. In addition, Python has utilities to assist in the creation of new types.
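A short Python illustration of the mutable/immutable distinction and of None as a single shared object. The variable and function names are made up for the example.

```python
# Lists are mutable: their contents can change after creation.
nums = [1, 2, 3]
nums[0] = 10                 # allowed: nums is now [10, 2, 3]

# Tuples (like strings) are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# A function with no return statement returns None.
def no_return():
    pass

result = no_return()
same_object = result is None  # identity test works because there is only one None
print(nums, tuple_mutated, same_object)   # [10, 2, 3] False True
```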
Types often provide an implicit context for operations in a program. For example, in a C program the operation + in the expression x + y will be integer addition if x and y are of type int, and floating-point addition if x and y are of type float.
Types are very useful for catching programming errors by making sure operators are applied to semantically valid operands. A strong argument for statically checkable types is that type clashes can be reported to the programmer at compile time. For example, a Java compiler can report an error at compile time if x and y are known to be of type String in the expression x * y.

3. Type Systems

The type of a construct in a program can be denoted by a type expression.

A type expression is either a basic type (e.g., integer) or a type constructor applied to one or more type expressions (e.g., a function from an integer to an integer).
See ALSU, Section 6.3.1 (pp. 371-372) for a representation for type expressions.
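One possible way to represent type expressions as trees, sketched in Python. This is not the ALSU representation; the class names Basic, Array, and Func and the helper show are hypothetical, chosen only to mirror "basic type" and "type constructor applied to type expressions".

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:            # a basic type such as integer or float
    name: str

@dataclass(frozen=True)
class Array:            # type constructor array(size, element type)
    size: int
    elem: "TypeExpr"

@dataclass(frozen=True)
class Func:             # type constructor for functions: argument -> result
    arg: "TypeExpr"
    result: "TypeExpr"

TypeExpr = Union[Basic, Array, Func]

INT = Basic("integer")

def show(t: TypeExpr) -> str:
    """Render a type expression in array(...) / -> notation."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({t.size}, {show(t.elem)})"
    return f"({show(t.arg)} -> {show(t.result)})"

int_to_int = Func(INT, INT)          # a function from integer to integer
matrix = Array(2, Array(3, INT))     # a C-style int[2][3]
print(show(int_to_int))   # (integer -> integer)
print(show(matrix))       # array(2, array(3, integer))
```

Because the classes are frozen dataclasses, == compares them field by field, which already behaves like a structural comparison of the trees.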

A type system is a set of rules for assigning type expressions to the syntactic constructs of a program and for specifying

type equivalence (when are the types of two values the same),
type compatibility (when can a value of a given type be used in a given context), and
type inference (rules that determine the type of a language construct based on how it is used).

Forms of type equivalence

Name equivalence: two types are equivalent iff they have the same name.
Structural equivalence: two types are equivalent iff they have the same structure.
To test for structural equivalence, a compiler must encode the structure of a type in its representation. A tree (or type graph) is typically used.
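A small sketch contrasting the two forms of equivalence. It assumes hypothetical type declarations A, B, and C stored in a dictionary, and encodes type expressions as nested tuples such as ("array", 10, ("int",)); recursive types are not handled.

```python
# Hypothetical declarations: type name -> type expression (nested tuples).
DECLS = {
    "A": ("array", 10, ("int",)),
    "B": ("array", 10, ("int",)),
    "C": ("array", 20, ("int",)),
}

def name_equiv(n1: str, n2: str) -> bool:
    """Name equivalence: equivalent iff the type names are the same."""
    return n1 == n2

def expand(t):
    """Replace a type name by its declared type expression (no recursive types)."""
    return expand(DECLS[t]) if isinstance(t, str) else t

def struct_equiv(t1, t2) -> bool:
    """Structural equivalence: walk both type trees in parallel."""
    t1, t2 = expand(t1), expand(t2)
    if t1[0] != t2[0]:                 # different constructors or basic types
        return False
    if t1[0] == "array":               # same size and equivalent element types
        return t1[1] == t2[1] and struct_equiv(t1[2], t2[2])
    if t1[0] == "pointer":             # ("pointer", elem)
        return struct_equiv(t1[1], t2[1])
    return True                        # same basic type

print(name_equiv("A", "B"), struct_equiv("A", "B"), struct_equiv("A", "C"))
# False True False
```

Under name equivalence A and B are different types even though they are declared identically; under structural equivalence they are the same, while C differs because its array size differs.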

A type checker makes sure that a program obeys the type-compatibility rules of the language.
We can think about types in several different ways:

Denotational: a type is a set of values called a domain.
Constructive: a type is either a primitive type (such as an integer or a character) or a composite type created by applying a type constructor (such as a structure or an array) to simpler types.
Abstraction-based: a type is an interface consisting of a set of operations with well-defined and mutually consistent semantics.

4. Typing in Programming Languages

The type system of a language determines whether type checking can be performed at compile time (statically) or at run time (dynamically).
A statically typed language is one in which all constructs of a language can be typed at compile time. C, ML, and Haskell are statically typed.
A dynamically typed language is one in which some of the constructs of a language can only be typed at run time. Perl, Python, and Lisp are dynamically typed.
There is no universally agreed-upon definition for the term "strongly typed language." We will just say an implementation of a language is strongly typed if its compiler guarantees that the programs it accepts will run without type errors.
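In a dynamically typed language such as Python, the kind of type clash that a Java compiler rejects for String * String is only detected when the offending operation actually executes. The function name multiply is made up for the example.

```python
def multiply(x, y):
    return x * y          # no declared types: the check happens when * executes

print(multiply(6, 7))     # 42
print(multiply("ab", 3))  # ababab (str * int is defined as repetition)

try:
    multiply("abc", "def")    # str * str has no meaning
except TypeError as err:
    print("run-time type error:", err)
```

The definition of multiply is accepted without complaint; the error surfaces only on the call that supplies two strings.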

5. Type Inference Rules

Type inference rules specify for each operator the mapping between the types of the operands and the type of the result.

E.g., result types for x + y in C:

    +     | int    float
    ------+-------------
    int   | int    float
    float | float  float

Type inference templates

We can specify the type inference rule "if expression e1 has the type int and expression e2 has the type int, then the expression e1 + e2 has the type int" with a type inference template of the form

⊢ e1: int     ⊢ e2: int
_______________________

    ⊢ e1 + e2: int

The turnstile symbol ⊢ is read "it is provable that" so the template can be read as "if it is provable that e1 has type int and it is provable that e2 has type int, then it is provable that e1 + e2 has type int."
Templates of this form provide a compact way of expressing the type rules of a language.
We say a type system is sound if whenever ⊢ e: T then e evaluates to a value of type T.
We can apply the type-inference templates by making a bottom-up traversal of the AST. We determine the types of the leaves using information from the symbol table. We can then move up the tree determining the type of the interior nodes from the types of their children by applying the inference rule for the operator at a given interior node.
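A minimal sketch of this bottom-up traversal, assuming a hypothetical AST made of nested tuples and a hypothetical symbol table that maps identifiers to "int" or "float". The result types for + are taken from the table for x + y in C given earlier.

```python
# Result types for x + y: (type of x, type of y) -> type of x + y.
PLUS = {
    ("int", "int"): "int",
    ("int", "float"): "float",
    ("float", "int"): "float",
    ("float", "float"): "float",
}

# Hypothetical symbol table supplying the types of the leaves.
SYMTAB = {"i": "int", "j": "int", "f": "float"}

def type_of(node):
    """Bottom-up: type the children first, then apply the rule at this node."""
    kind = node[0]
    if kind == "intlit":
        return "int"
    if kind == "floatlit":
        return "float"
    if kind == "id":
        return SYMTAB[node[1]]          # leaf: look up the symbol table
    if kind == "+":
        left, right = type_of(node[1]), type_of(node[2])
        if (left, right) not in PLUS:
            raise TypeError(f"no rule for {left} + {right}")
        return PLUS[(left, right)]
    raise ValueError(f"unknown node {kind!r}")

# AST for (i + j) + f
tree = ("+", ("+", ("id", "i"), ("id", "j")), ("id", "f"))
print(type_of(("+", ("id", "i"), ("id", "j"))))  # int
print(type_of(tree))                             # float
```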

Type environments

A type environment is often needed to determine the type of a variable at a given node in the AST. A type environment is just a mapping from variables to types that is stored in the symbol table. The type environment for each node can be determined by making a top-down pass over the AST respecting the type scoping rules of the language.
The type inference rules are augmented with the type environment information. For example, if E is a type environment, we modify the template to make type inferences within the context of E:

E ⊢ e1: int     E ⊢ e2: int
____________________________

    E ⊢ e1 + e2: int

Static type checking can be done by making a top-down pass to compute the type environment for each node followed by a bottom-up pass to check the types at each node.
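A sketch of type checking with a type environment E, here a Python dictionary from variable names to types. For brevity it folds the two passes into one recursive traversal: the environment is passed down the tree and types are returned back up. The AST encoding, including the ("block", declarations, body) node for a nested scope, is a hypothetical choice for the example.

```python
def plus_type(t1: str, t2: str) -> str:
    """int + int is int; if either operand is float the result is float."""
    if t1 not in ("int", "float") or t2 not in ("int", "float"):
        raise TypeError(f"cannot add {t1} and {t2}")
    return "int" if (t1, t2) == ("int", "int") else "float"

def check(node, env: dict) -> str:
    """Return the type of node under type environment env (variable -> type)."""
    kind = node[0]
    if kind == "block":                    # ("block", [(name, type), ...], body)
        inner = dict(env)                  # copy, so inner declarations shadow outer ones
        inner.update(node[1])
        return check(node[2], inner)       # the environment flows down to the body
    if kind == "id":
        if node[1] not in env:
            raise NameError(f"{node[1]} is not declared")
        return env[node[1]]
    if kind == "num":
        return "int"
    if kind == "+":                        # types flow back up from the children
        return plus_type(check(node[1], env), check(node[2], env))
    raise ValueError(f"unknown node {kind!r}")

# { int x; { float x; x + 1 } }: the inner x shadows the outer one
prog = ("block", [("x", "int")],
        ("block", [("x", "float")], ("+", ("id", "x"), ("num", 1))))
print(check(prog, {}))                                                       # float
print(check(("block", [("x", "int")], ("+", ("id", "x"), ("num", 1))), {}))  # int
```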

Operator and function overloading

In Java the operator + can mean addition or string concatenation depending on the types of its operands.
A compiler can choose between versions of an overloaded function by looking at the types of the arguments in the call.
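A sketch of overload resolution for +, using a hypothetical table keyed by operand types. Only exact matches are looked up; real Java also applies numeric promotions (for example, int + double), which this sketch deliberately leaves out.

```python
# Hypothetical overload table for +: operand types -> (operation, result type).
OVERLOADS = {
    ("int", "int"): ("integer addition", "int"),
    ("double", "double"): ("floating-point addition", "double"),
    ("String", "String"): ("string concatenation", "String"),
    ("String", "int"): ("string concatenation", "String"),
    ("int", "String"): ("string concatenation", "String"),
}

def resolve_plus(t1: str, t2: str):
    """Pick the version of + whose parameter types match the operand types exactly."""
    try:
        return OVERLOADS[(t1, t2)]
    except KeyError:
        raise TypeError(f"no version of + accepts ({t1}, {t2})") from None

print(resolve_plus("int", "int"))      # ('integer addition', 'int')
print(resolve_plus("String", "int"))   # ('string concatenation', 'String')
```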

Function calls

The compiler must check that the type of each actual parameter is compatible with the type of the corresponding formal parameter. It must also check that the type of each returned value is compatible with the declared return type of the function.
The type signature of a function specifies the types of the formal parameters and the type of the return value.
Example: strlen in C

Function prototype in C (size_t is the unsigned integer type declared in <string.h>):


    size_t strlen(const char *s);

Type expression:


    strlen: const char * → size_t
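A sketch of the call-site part of this check, assuming a hypothetical signature table and a simplified compatibility rule (identical types, or an int passed where a double is expected, as C allows when a prototype is in scope). The function max here is a hypothetical user-defined function, not a C library routine. Checking return statements inside the function body is not shown.

```python
# Hypothetical signature table: name -> (formal parameter types, return type).
SIGNATURES = {
    "sqrt": (["double"], "double"),
    "max": (["int", "int"], "int"),    # hypothetical user-defined function
}

def compatible(actual: str, formal: str) -> bool:
    """Assumed rule: same type, or an int argument for a double parameter."""
    return actual == formal or (actual, formal) == ("int", "double")

def check_call(name: str, actual_types: list) -> str:
    """Check a call against the callee's signature; return the call's type."""
    formals, ret = SIGNATURES[name]
    if len(actual_types) != len(formals):
        raise TypeError(f"{name} expects {len(formals)} argument(s), got {len(actual_types)}")
    for i, (a, f) in enumerate(zip(actual_types, formals), start=1):
        if not compatible(a, f):
            raise TypeError(f"argument {i} of {name}: {a} is not compatible with {f}")
    return ret    # the type of the call expression is the return type

print(check_call("sqrt", ["int"]))         # double
print(check_call("max", ["int", "int"]))   # int
```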

Polymorphic functions

A polymorphic function can manipulate data structures regardless of the types of the elements in the data structure.
Example: Fig. 6.28 (p. 391) -- an ML program for the length of a list
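A rough Python analog of a polymorphic list-length function (this is not the ML program from the figure). The type variable T plays the role of ML's 'a; Python does not enforce the annotation at run time, but a static checker such as mypy treats T as standing for any element type.

```python
from typing import List, TypeVar

T = TypeVar("T")   # type variable: stands for any element type

def length(xs: List[T]) -> int:
    """Length of a list, written once and usable for lists of any element type."""
    if not xs:
        return 0
    return 1 + length(xs[1:])

print(length([1, 2, 3]))          # 3
print(length(["a", "b"]))         # 2
print(length([[1], [2, 3], []]))  # 3
```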

6. Type Conversions

Implicit type conversions

Conversions that a compiler inserts automatically are called implicit conversions, or coercions. In an expression like f + i, where f is a float and i is an integer, a compiler must first convert the integer to a float before the floating-point addition is performed. That is, the expression must be transformed into an intermediate representation like


         t1 = INTTOFLOAT i
         t2 = f FADD t1
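A sketch of how a code generator might insert the INTTOFLOAT conversion while emitting three-address code for +. The operand types are assumed to be known already from type checking; the function gen_add and the IADD opcode for integer addition are hypothetical names.

```python
import itertools

def gen_add(x: str, tx: str, y: str, ty: str, out: list, temps) -> tuple:
    """Emit code for x + y into out; return (place holding the result, its type).
    tx and ty are the operand types, either "int" or "float"."""
    if tx == ty == "int":
        t = next(temps)
        out.append(f"{t} = {x} IADD {y}")    # IADD: hypothetical integer-add opcode
        return t, "int"
    # At least one operand is float: convert any int operand first.
    if tx == "int":
        t = next(temps)
        out.append(f"{t} = INTTOFLOAT {x}")
        x = t
    if ty == "int":
        t = next(temps)
        out.append(f"{t} = INTTOFLOAT {y}")
        y = t
    t = next(temps)
    out.append(f"{t} = {x} FADD {y}")
    return t, "float"

code = []
temps = (f"t{n}" for n in itertools.count(1))   # fresh temporaries t1, t2, ...
place, typ = gen_add("f", "float", "i", "int", code, temps)
print("\n".join(code))
# t1 = INTTOFLOAT i
# t2 = f FADD t1
```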

Explicit type conversions

In C, a type conversion can be made explicit in an expression using a unary operator called a cast. E.g., sqrt((double) n) converts the value of the integer n to a double before passing it to the square-root routine sqrt.

7. Practice Problems

Give some examples of typeless programming languages.
The following grammar generates programs consisting of a sequence of declarations D followed by a single expression E. Each identifier must be declared before its use.


     P → D ; E
     D → D ; D | T id
     T → int | float | T [ num ]
     E → num | id | E [ E ] | E + E

Construct type expressions as in ALSU, Section 6.3.1 (pp. 371-372) for the following programs:

int a; int b; a + b
float[10][20] a; a[1] + a[2]

Write pseudocode for a function sequiv(exp1, exp2) that will test the structural equivalence of two type expressions exp1 and exp2. Show how your function computes sequiv(array(2, array(2, int)), array(2, array(3, int))).

8. Reading

ALSU, Sects. 6.1-6.3, 6.5.

aho@cs.columbia.edu