

Compiling Esterel into Better Circuits and Faster Simulations

#### **Stephen A. Edwards**

Department of Computer Science, Columbia University

www.cs.columbia.edu/~sedwards

sedwards@cs.columbia.edu

# **Esterel for Hardware Specification**

Why consider Esterel?

- Semantics more abstract than RTL
   More succinct descriptions faster and easier to write
- High-level semantics enable high-level optimizations
   State assignment a hierarchical problem
- High-level semantics enable more efficient simulation
   Semantics are more like an imperative program
- Esterel's semantics are deterministic
   Simulation-synthesis mismatches eliminated

# **Applications of Esterel**

Systems with complex (non-pipelined) control-behavior:

- DMA controllers
- Cache controllers
- Communication protocols

(Not processors)

#### **Verilog More Verbose Than Esterel**

ase (cur\_state) IDLE: begin // synopsys parallel case case if (pcsu\_powerdown & !jmp\_e & !valid\_diag\_window) begin next\_state = STANDBY\_PWR\_DN; end else if (valid\_diag\_window | ibuf\_full | jmp\_e) begin next\_state = cur\_state; end else if(icu\_miss&!cacheable) begin next\_state = NC\_REQ\_STATE ; end else if (icu miss&cacheable) begin next\_state = REQ\_STATE; end else next\_state = cur\_state ; end NC\_REQ\_STATE: begin if(normal ack| error ack) begin next state = IDLE ; end else next\_state = cur\_state ; end REQ\_STATE: begin if (normal\_ack) begin next\_state = FILL\_2ND\_WD; end else if (error\_ack) begin next\_state = IDLE ; end else next\_state = cur\_state ; end FILL\_2ND\_WD: begin if(normal\_ack) begin next\_state = REQ\_STATE2; end else if (error ack) begin next\_state = IDLE ; end else next\_state = cur\_state ; end REQ\_STATE2: begin
 if(normal\_ack) begin next\_state = FILL\_4TH\_WD; end else if (error\_ack) begin next\_state = IDLE ; end else next\_state = cur\_state ; end FILL\_4TH\_WD: begin if(normal\_ack | error\_ack) begin next\_state = IDLE; end else next\_state = cur\_state ; end STANDBY\_PWR\_DN: begin if(!pcsu\_powerdown | jmp\_e ) begin next\_state = IDLE; end else next\_state = STANDBY PWR DN; end default: next\_state = 7'bx;

loop await case [icu miss and not cacheable] do await [normal ack or error ack] end case [icu miss and cacheablel do abort await 4 normal ack; when error ack end case [pcsu powerdown and not jmp e and not valid diag window] do await [pcsu powerdown and not jmp e] end end; pause end

# Why is Esterel More Succinct?

#### Verilog:

```
REQ STATE2: begin
   if(normal ack) begin
   next state = FILL 4TH WD; when error ack
   end
   else if (error ack) begin
   next state = IDLE ;
   end
   else next_state = cur state;
end
```

**Esterel**:

abort await normal ack

- Esterel provides cross-clock control-flow
- State machine logic represented implicitly
- Higher-level constructs like await



# **Generating Fast Circuits**

#### **Basic Circuit Generation**

loop emit A; await C; emit B; pause end





#### **Basic Circuit Generation**

Berry's technique [1992] works, but is fairly inefficient:

 Many combinational redundancies. E.g., present A then emit B end;
 present C then emit D end
 produces two redundant OR gates



Many sequential redundancies
 One flop per pause can be very wasteful
 Touati, Toma, Sentovich, and Berry [1993–1997]
 proposed techniques to eliminate many, but requires
 reachable state space and only works on circuit.

# **Generating Fast Circuits**

Esterel's semantics match hardware. Translation is straightforward.

Nice feature: state space is well-defined and hierarchical (e.g., due to abort and concurrency).

Enables a hierarchical state assignment/synthesis procedure.

# **Translation to CCFG**

every S do loop await I; weak abort sustain R when immediate A; emit O end loop pause; pause; present R then emit A end end end



#### **Translation to PDG**



#### **Translation to Circuitry**





#### A State Assignment Example

```
abort
  Ľ
    await A; await B
    await C
  1
when D;
emit E;
pause;
  await F
 await G
]
```

# **Hierarchical States**







Obvious questions:

- How should each state machine be encoded?
- Should state be shared between the AB/F and C/G machines?

#### **General Problem Statement**

States in an Esterel program an arbitrary tree of sequential and parallel state machines.





- How should  $s_1, \ldots, s_4$  be encoded?
- Should  $s_2$  or  $s_3$  be shared with  $s_4$  or  $s_5$ ?

# **Choosing a Good Encoding**

Goal: The smallest circuit that meets a timing constraint

- 1. Start with largest, fastest circuit (one-hot, no sharing)
- Estimate the slack at each state decision point by estimating how much the delay could be increased at that point while still meeting the timing requirement
- 3. Attempt to share states at the lowest decision point with the largest slack or reencode the widest-fanout decision point with sufficient slack.
- 4. Repeat steps 2–3 until no further gain possible



# **Simulating Esterel**

#### **Automata Compilers**

Esterel is a finite-state language, so build an automata:

loop switch (s) {
 emit A; await C; case 0: A = 1; s = 1; break;
 emit B; pause case 1: if (C) { B = 1; s = 0; } break;
end }

V1, V2, V3 (INRIA/CMA) [Berry, Gonthier 1992]

Fastest known code; great for programs with few states.

Does not scale; concurrency causes state explosion.

#### **Netlist-based Compilers**



A = entry || s2q; cf = !C && s1q; s1d = cf || A; B = s2d = C && s1q;

Clean semantics, scales well, but inefficient.

Can be 100 times slower than automata code.

#### **Discrete-Event Based Compilers**

SAXO-RT [Weil et al. 2000] Divides Esterel program into event functions dispatched by a fixed scheduler.

```
unsigned curr = 0x1;
                   unsigned next = 0;
                   static void f1() {
                     A = 1;
                     curr &= ^{0}x1; next |= 0x2;
loop
                   static void f2() {
 emit A; await C;
                     if (!C) return;
 emit B; pause
                     B = 1;
                     curr &= ^{\circ}0x2; next |= 0x1;
end
                   void tick() {
                     if (curr & 0x1) f1();
                     if (curr & 0x2) f2();
                     curr |= next;
                     next = 0;
```



# **Generating Fast Simulations**

#### **My Previous Technique**

every R do loop await A; emit B; present C then emit D end; pause end | | loop present B then emit C end; pause end end



if ((s0 & 3) == 1) {
 if (s) {
 s3 = 1; s2 = 1; s1 = 1;
 } else
 if (s1 >> 1)
 s1 = 3;
 else {
 if ((s3 & 3) == 1) {
 s3 = 2; t3 = L1;
 } else {
 t3 = L2;
 }
 C code

EsterelConcurrent SequentialSourceCFGCFGCFG

# **My Previous Technique**

- 1. Translate Esterel into a concurrent control-flow graph
- 2. Analyze static data dependencies
- 3. Schedule
- 4. Generate sequential control-flow graph by inserting context-switching code
- 5. Translate to C

## **Comments on Previous Technique**

Much more efficient (can be  $100 \times$ ) than netlist simulation.

- Currently used within Synopsys' CoCentric System Studio for control-code generation.
- Context-switching idea powerful, but does not have much insight into program behavior.
- Adheres too closely to control dependencies; many more opportunities to reorder code and simplify scheduling.

# **New Technique**

- 1. Translate Esterel into a concurrent control-flow graph
- 2. Transform into Program Dependence Graph
- 3. Analyze static data dependencies
- 4. Insert control predicates to enable scheduling
- 5. Schedule
- 6. Generate sequential control-flow graph
- 7. Translate to C

#### **A Code-Generation Example**



# Concurrent Control-Flow and the PDG





#### **Splitting the PDG for Scheduling**



#### **Generating Code**

