

# **Compiling Esterel** into **Better Circuits** and **Faster Simulations**

#### Stephen A. Edwards

Department of Computer Science, Columbia University www.cs.columbia.edu/~sedwards sedwards@cs.columbia.edu

#### Concurrency



- Parallel statements start in same cycle
- Block terminates once all have terminated

# **Esterel for Hardware Synthesis**

#### An Overview of Esterel

Synchronous model of time: implicit global clock

Communication through wire-like signals

Two flavors of statement:

#### Combinational

#### Sequential

Execute in one cycle

Take multiple cycles

emit

present / if loop

pause await sustain

#### The Abort Statement



#### **Esterel for Hardware Specification**

Why consider Esterel?

- Semantics more abstract than RTL More succinct descriptions faster and easier to write
- High-level semantics enable high-level optimizations State assignment a hierarchical problem
- High-level semantics enable more efficient simulation Semantics are more like an imperative program
- · Esterel's semantics are deterministic Simulation-synthesis mismatches eliminated

## **Sequencing and Decisions**

```
emit A;
emit B;
pause;
loop
  present C then emit D end;
 present E then emit F end;
 pause;
end
```

#### **The Suspend Statement**



#### **Applications of Esterel**

Systems with complex (non-pipelined) control-behavior:

- DMA controllers
- Cache controllers
- · Communication protocols

(Not processors)



#### **Verilog More Verbose Than Esterel**

```
case (cur_state) // symopsys parallel_
IDDM: begin // symopsys parallel_
if (volume.codemon a type_a a con-
trol // symopsys con-
late if (volume.code) begin
rasc_state = STANNET_POR_EN;
also if (volid_diag vindow) lbuf_full |
rosc_state = cur_state;
and
                                                                        loop
                                                                                  await
                                                                                        case [icu_miss and
                                                                                                                not cacheable] do
    Take _taxts = cur_state)

also if(icu_minssiconebable) begin

next_state = NC_NEGC_STATE ;

end

else if (icu_minssiconebable) begin

next_state = NC_NEGC_STATE)

else next_state = cur_state ;

dd
                                                                                                 await [normal_ack or error_ack]
                                                                                                               [icu_miss and
   SC_SEQ_STATE: begin
   if(normal_ack| error_ack) begin
   next_state = IDLE;
end
   else   next_state = cur_state;
end
                                                                                                               cacheable] do
                                                                                                 abort
     g_STATE: begin
if (normal_ack) begin
next_state = FILL_20D_ND;
                                                                                                        await 4 normal_ack;
    next state = FILL 20D MD;
end
else if (error ack) begin
next state = IDLR;
end
else next state = cur state
                                                                                                 when error_ack
                                                                                          end
  FILL 20D WD: begin
if(normal_ack) begin
next_state = EEQ_STATE2;
                                                                                          case [pcsu powerdown and
    else if (error_ack) begin

next_state = IDLE;

end

else next_state = cur_state
                                                                                                                 not imp e and
                                                                                                                not valid_diag_window] do
   NEO STATES; begin
if(normal_ack) begin
next_state = FILL_4TH_MD;
end else if (error_ack) begin
next_state = IDLE;
end
else next_state = cur_state;
end
                                                                                                 await [pcsu_powerdown and
                                                                                                                            not jmp_e]
                                                                                        end
  FILL 4TH_MD: begin
if(normal_ack| error_ack) begin
next_state = IDLR;
end
else next_state = cur_state;
                                                                                  end:
                                                                                 pause
    TANDET_PUR_DN: begin

if(!posu_powerdown | jmp_e ) begin

next_state = IDLE;

end

else next_state = STANDET_PUR_DN;
    efault: next state = 7'bx;
```

#### **Basic Circuit Generation**

```
loop
emit A; await C;
emit B; pause
end

A

B
```

#### **A State Assignment Example**

```
abort

[
   await A; await B
||
   await C
]
when D;
emit E;
pause;
[
   await F
||
   await G
]
```

## Why is Esterel More Succinct?



- Esterel provides cross-clock control-flow
- State machine logic represented implicitly
- · Higher-level constructs like await

#### **Basic Circuit Generation**

Berry's technique [1992] works, but is fairly inefficient:

 Many combinational redundancies. E.g., present A then emit B end; present C then emit D end produces two redundant OR gates



Many sequential redundancies
 One flop per pause can be very wasteful
 Touati, Toma, Sentovich, and Berry [1993–1997]
 proposed techniques to eliminate many, but requires reachable state space and only works on circuit.

#### **Hierarchical States**

```
abort
[
await A; await B
await C
]
when D;
emit E;
pause;
[
await F
|
await G
]
```



#### **Generating Fast Circuits**

Esterel's semantics match hardware. Translation is straightforward.

Nice feature: state space is well-defined and hierarchical (e.g., due to abort and concurrency).

Enables a hierarchical state assignment/synthesis procedure.

# Five Simple FSMs



#### **Five Simple FSMs**



Obvious questions:

- How should each state machine be encoded?
- Should state be shared between the AB/F and C/G machines?

#### **Choosing a Good Encoding**

Goal: The smallest circuit that meets a timing constraint

- 1. Start with largest, fastest circuit (one-hot, no sharing)
- 2. Estimate the slack at each state decision point by estimating how much the delay could be increased at that point while still meeting the timing requirement
- 3. Attempt to share states at the lowest decision point with the largest slack or reencode the widest-fanout decision point with sufficient slack.
- 4. Repeat steps/2–3 until no further gain possible

#### **Netlist-based Compilers**



#### **General Problem Statement**

States in an Esterel program an arbitrary tree of sequential and parallel state machines.



# **Simulating Esterel**

#### **Discrete-Event Based Compilers**

SAXO-RT [Weil et al. 2000] Divides Esterel program into event functions dispatched by a fixed scheduler.

```
unsigned curr = 0x1;
                   unsigned next = 0;
                   static void f1() {
                     A =/1:
                     curr &= ~0x1; next |= 0x2;
loop
                   static void f2() {
emit A; await C;
                     if (!C) return;
emit B; pause
                     curr &= ~0x2; next |= 0x1;
end
                   void tick() {
  if (curr & 0x1) f1();
                     if (curr & 0x2) f2();
                     curr |= next;
                     next = 0;
```

#### **Choosing an Encoding**



- How should  $s_1, \ldots, s_4$  be encoded?
- Should  $s_2$  or  $s_3$  be shared with  $s_4$  or  $s_5$ ?

#### **Automata Compilers**

Esterel is a finite-state language, so build an automata:

```
loop
                     switch (s) {
 emit A; await C;
                     case 0: A = 1; s = 1; break;
emit B; pause
                     case 1. if (C) { B = 1; s = 0; } break;
end
```

V1, V2, V3 (INRIA/CMA) [Berry, Gonthier 1992]

Fastest known code; great for programs with few states.

Does not scale; concurrency causes state explosion.

## **Generating Fast Simulations**



## **My Previous Technique**



#### **Comments on Previous Technique**

Much more efficient (can be  $100\times$ ) than netlist simulation.

Currently used within Synopsys' CoCentric System Studio for control-code generation.

Context-switching idea powerful, but does not have much insight into program behavior.

Adheres too closely to control dependencies; many more opportunities to reorder code and simplify scheduling.

# Concurrent Control-Flow and the PDG



## **My Previous Technique**

- 1. Translate Esterel into a concurrent control-flow graph
- 2. Analyze static data dependencies
- 3. Schedule
- Generate sequential control-flow graph by inserting context-switching code
- 5. Translate to C

# New Technique

- 1. Translate Esterel into a concurrent control-flow graph
- 2. Transform into Program Dependence Graph
- 3. Analyze static data dependencies
- 4. Insert control predicates to enable scheduling
- 5. Schedule
- 6. Generate sequential control-flow graph
- 7. Translate to C

## **Splitting the PDG for Scheduling**



## Average Cycle Times on an UltraSparc-II



## A Code-Generation Example



#### **Generating Code**



## **Summary**

Introduction to Esterel

Synchronous, Concurrent, Textual Language

Combinational and Sequential Statements

Esterel for Hardware Synthesis

Higher-level, deterministic

More succinct than Verilog

Generating Fast Circuits

Existing translation produces many redundancies

Hierarchical state assignment problem

## **Summary**

Simulating Esterel

Automata, Netlist, Discrete-Event, and Context-switching techniques

New Technique: Transform to PDG, insert predicates, schedule, generate code