# 4840 Systolic Array Based on FPGA

Speakers: Hang Ye (hy2891), Yu Jia (yj2839), Siyuan Li (sl5590), Sicheng Hua (sh4605)

# Contents

1. Block Diagram
2. Data Flow
3. Synthesize
4. Simulation
5. On Board Test

# Block Diagram



# Control FSM & Data Flow



Figure 4: FSM Diagram



Figure 5: Data Timing Diagram

# Synthesize

| portname   | Avalon Bus interface                                          |
|------------|---------------------------------------------------------------|
| addr       | address                                                       |
| data_in    | writedata<br>0 : imgsize<br>1 : weight_data<br>2 : input_data |
| data_out   | readdata<br>3 : done<br>4 : output_data                       |
| write      | write                                                         |
| read       | read                                                          |
| clk        | clock                                                         |
| reset      | reset                                                         |
| chipselect | chipselect                                                    |
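The address map in the table can be mirrored on the software side as a set of named word addresses. The sketch below is only an illustration, not the project's driver: the enum names, the 32-bit register width, and the in-memory `mock_bus` that stands in for the memory-mapped slave are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word addresses taken from the port table above. */
enum reg_addr {
    REG_IMGSIZE     = 0, /* write: image size  */
    REG_WEIGHT_DATA = 1, /* write: weight data */
    REG_INPUT_DATA  = 2, /* write: input data  */
    REG_DONE        = 3, /* read:  done flag   */
    REG_OUTPUT_DATA = 4, /* read:  output data */
    REG_COUNT       = 5
};

/* Hypothetical stand-in for the slave: one 32-bit word per address.
   The real register width is not stated in the slides. */
struct mock_bus {
    uint32_t regs[REG_COUNT];
};

/* Write path: only addresses 0..2 accept writedata. */
static int bus_write(struct mock_bus *bus, int addr, uint32_t value)
{
    if (addr < REG_IMGSIZE || addr > REG_INPUT_DATA)
        return -1;              /* read-only or out of range */
    bus->regs[addr] = value;
    return 0;
}

/* Read path: only addresses 3..4 drive readdata. */
static int bus_read(const struct mock_bus *bus, int addr, uint32_t *value)
{
    assert(value != NULL);
    if (addr != REG_DONE && addr != REG_OUTPUT_DATA)
        return -1;              /* write-only or out of range */
    *value = bus->regs[addr];
    return 0;
}
```

Keeping the direction check in the accessors makes a misplaced read or write fail loudly in software instead of silently returning stale data.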



Figures: hardware synthesis results

# **Final Connection**

System: soc\_system Path: top\_0.avalon\_slave\_0

| Name              | Description                   | Export   | Clock      | Base        | End         |
|-------------------|-------------------------------|----------|------------|-------------|-------------|
| (clock source)    | Clock Source                  |          |            |             |             |
| clk_in            | Clock Input                   | clk      | exported   |             |             |
| clk_in_reset      | Reset Input                   | reset    |            |             |             |
| clk               | Clock Output                  |          |            |             |             |
| clk_reset         | Reset Output                  |          |            |             |             |
| (HPS)             | Arria V/Cyclone V Hard Proce… |          |            |             |             |
|                   | Clock Output                  |          | hps_0_h2…  |             |             |
|                   | Conduit                       | hps_ddr3 |            |             |             |
|                   | Conduit                       | hps      |            |             |             |
|                   | Reset Output                  |          |            |             |             |
| h2f_axi_clock     | Clock Input                   |          | clk_0      |             |             |
| h2f_axi_master    | AXI Master                    |          | [h2f_axi…  |             |             |
| f2h_axi_clock     | Clock Input                   |          | clk_0      |             |             |
| f2h_axi_slave     | AXI Slave                     |          | [f2h_axi…  |             |             |
| h2f_lw_axi_clock  | Clock Input                   |          | clk_0      |             |             |
| h2f_lw_axi_master | AXI Master                    |          | [h2f_lw_a… |             |             |
| top_0             | TOP                           |          |            |             |             |
| clock             | Clock Input                   |          | clk_0      |             |             |
| reset             | Reset Input                   |          | [clock]    |             |             |
| avalon_slave_0    | Avalon Memory Mapped Slave    |          | [clock]    | 0x0000_0000 | 0x0000_000f |

# Simulation Result



Figure 6: Pattern Comparison

| Θ | 0 | 0 | 127 | 127 | 127 | 0 | 0 | Θ | 127 | 127 | 127 | Θ | 0 | 0 | 127 |
|---|---|---|-----|-----|-----|---|---|---|-----|-----|-----|---|---|---|-----|
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 |   |   | 127 | 127 | 127 |   |   |   | 127 | 127 | 127 |   |   |   | 127 |
| 0 | 0 | Θ | 127 | 127 | 127 | Θ | Θ | 0 | 127 | 127 | 127 | 0 | 0 | 0 | 127 |

Figure 7: Pattern 2 Input Data

| _ |       |       |   |     |     |   |       |       |   |     |     |   |       |  |
|---|-------|-------|---|-----|-----|---|-------|-------|---|-----|-----|---|-------|--|
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
| 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 | 65028 | 0 | 508 | 508 | 0 | 65028 |  |
|   |       |       |   |     |     |   |       |       |   |     |     |   |       |  |

Figure 8: Pattern 2 SV Output Data
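One way to read the values in Figure 8 is as 16-bit two's-complement words: 65536 - 508 = 65028, so 65028 would encode -508, and 508 = 4 × 127. The sketch below is a small software golden model built on that reading. It is an assumption-laden illustration, not the project's model: the slides do not give the weights, so `kernel` is a hypothetical 3x3 choice whose column sums are (4, 0, -4), and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define IN_DIM  16                    /* 16x16 input, as in Figure 7  */
#define K_DIM   3                     /* hypothetical 3x3 kernel      */
#define OUT_DIM (IN_DIM - K_DIM + 1)  /* 14x14 output, as in Figure 8 */

/* Hypothetical weights: NOT given in the slides. */
static const int32_t kernel[K_DIM][K_DIM] = {
    { 1, 0, -1 },
    { 2, 0, -2 },
    { 1, 0, -1 },
};

/* Vertical stripes: three columns of 0, three columns of 127, repeating. */
static void make_stripes(int32_t in[IN_DIM][IN_DIM])
{
    for (int r = 0; r < IN_DIM; r++)
        for (int c = 0; c < IN_DIM; c++)
            in[r][c] = ((c / 3) % 2) ? 127 : 0;
}

/* Valid (no padding) cross-correlation, kept as signed 32-bit values. */
static void conv_valid(int32_t in[IN_DIM][IN_DIM], int32_t out[OUT_DIM][OUT_DIM])
{
    for (int r = 0; r < OUT_DIM; r++) {
        for (int c = 0; c < OUT_DIM; c++) {
            int32_t acc = 0;
            for (int i = 0; i < K_DIM; i++)
                for (int j = 0; j < K_DIM; j++)
                    acc += kernel[i][j] * in[r + i][c + j];
            out[r][c] = acc;
        }
    }
}

/* View a signed result as a 16-bit two's-complement word. */
static uint16_t to_u16_word(int32_t v)
{
    return (uint16_t)((uint32_t)v & 0xFFFFu);
}

/* Inverse view: interpret a 16-bit word as a signed value. */
static int32_t from_u16_word(uint16_t w)
{
    return (w >= 0x8000u) ? (int32_t)w - 65536 : (int32_t)w;
}
```

With these assumed weights, each output row starts 0, 65028, 65028, 0, 508, 508 when printed as unsigned 16-bit words, which is the column pattern shown in Figure 8. Other kernels with the same column sums would give the same result on this striped input, so the match does not identify the actual weights.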



# [Compare] Pattern 1

- Max Abs Error : 0
- Mean Abs Error : 0.00
- Match (zero diff) : 100.00%

# [Compare] Pattern 2

- Max Abs Error : 0
- Mean Abs Error : 0.00
- Match (zero diff) : 100.00%

# [Compare] Pattern 3

- Max Abs Error : 0
- Mean Abs Error : 0.00
- Match (zero diff) : 100.00%



Figure 9: Verification Results

# On Board Test



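I/O reads were a major problem: on-board testing showed that the read signal stays high for more than one clock cycle. The read counter therefore has to be latched so that it holds its value for as long as read is high, and a single read does not advance it more than once.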
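The multi-cycle read issue can be illustrated with a small cycle-by-cycle software model. This is a sketch of one possible fix, not the project's RTL: it assumes the output pointer should advance exactly once per read transfer, and it does so on the falling edge of `read` so that the presented word stays stable while `read` is high. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cycle model of the output read pointer. */
struct read_ptr {
    uint32_t index;     /* which output word the current read returns */
    bool     read_prev; /* value of `read` on the previous clock      */
};

/* Call once per clock. Returns the index presented on this cycle and
   advances the pointer only when `read` falls, i.e. once per transfer,
   no matter how many cycles `read` stayed high. */
static uint32_t read_ptr_step(struct read_ptr *p, bool read)
{
    uint32_t presented = p->index;

    assert(p != 0);
    if (p->read_prev && !read)
        p->index++;          /* falling edge: transfer finished */
    p->read_prev = read;
    return presented;
}
```

A counter that increments on every cycle in which `read` is high would skip words whenever the bus stretches a read; detecting the edge avoids that without needing to know how long the pulse lasts.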

```
printf("Reading output_data:\n");
/* Read back all 14x14 output words, one bus read per word. */
for (i = 0; i < 14*14; i++) {
    output_data[i] = read_output_data();
    printf(" output_data[%d] = %d\n", i, output_data[i]);
}

// Verify output_data
```

Because quantization scaling is applied throughout the golden model's computation, its results must be scaled back before they can be compared with the hardware output.
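A comparison along these lines can be sketched as follows. It reports the same three metrics as the verification results (max absolute error, mean absolute error, and percentage of exact matches). The slides do not give the scale factor or say which side is rescaled, so `scale` is a hypothetical overall quantization factor applied here to the golden values; dividing the hardware output instead would be the equivalent alternative. All names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cmp_stats {
    long   max_abs_err;
    double mean_abs_err;
    double match_pct;    /* percentage of elements with zero difference */
};

/* Round to nearest, halves away from zero, without needing libm. */
static long round_nearest(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Compare hardware words with a floating-point golden result that is
   first brought onto the hardware's integer scale. `scale` is an
   assumed parameter, not a value from the slides. */
static struct cmp_stats compare_scaled(const int32_t *hw, const double *golden,
                                       size_t n, double scale)
{
    struct cmp_stats s = {0, 0.0, 0.0};
    size_t matches = 0;
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        long diff = (long)hw[i] - round_nearest(golden[i] * scale);
        if (diff < 0)
            diff = -diff;
        if (diff == 0)
            matches++;
        if (diff > s.max_abs_err)
            s.max_abs_err = diff;
        sum += (double)diff;
    }
    s.mean_abs_err = sum / (double)n;
    s.match_pct = 100.0 * (double)matches / (double)n;
    return s;
}
```

For the 14x14 output, `n` would be 196, and a bit-exact design should report a max error of 0 and a 100% match.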