## CSEE W3827 Fundamentals of Computer Systems Homework Assignment 6

Prof. Martha A. Kim Columbia University Due December 2, 2014 at **10:10**.

Write your name and UNI on your solutions.

Show your work for each problem; we are more interested in how you get the answer than whether you get the right answer.

Many of the problems in this assignment will involve analysis of the execution of this implementation of countlocalminima. It is derived from the HW#4 solution, modified slightly to use only instructions supported by our MIPS CPU implementations.

| Label     | Op   | Operands             |
|-----------|------|----------------------|
| i0:       | addi | \$a1, \$a1, -1       |
| i1:       | add  | \$a1, \$a1, \$a1     |
| i2:       | add  | \$a1, \$a1, \$a1     |
| i3:       | add  | \$a1, \$a1, \$a0     |
| i4:       | addi | \$a0, \$a0, 4        |
| i5:       | add  | \$v0, \$0, \$0       |
| i6:       | addi | \$t3, \$0, 2         |
| i7_top:   | beq  | \$a0, \$a1, i19_done |
| i8:       | lw   | \$t0, -4(\$a0)       |
| i9:       | lw   | \$t1, 0(\$a0)        |
| i10:      | lw   | \$t2, 4(\$a0)        |
| i11:      | slt  | \$t0, \$t1, \$t0     |
| i12:      | slt  | \$t2, \$t1, \$t2     |
| i13:      | add  | \$t0, \$t0, \$t2     |
| i14:      | beq  | \$t0, \$t3, i17_inc  |
| i15_adv:  | addi | \$a0, \$a0, 4        |
| i16:      | beq  | \$0, \$0, i7_top     |
| i17_inc:  | addi | \$v0, \$v0, 1        |
| i18:      | beq  | \$0, \$0, i15_adv    |
| i19_done: |      |                      |
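As a reading aid, the listing can be paraphrased in a high-level language. The sketch below is one interpretation, not the official HW#4 solution: it assumes `$a0` initially holds the array's base address, `$a1` holds its length (at least 2), and `$v0` returns the count. The function name `count_local_minima` is introduced here for illustration only.

```python
def count_local_minima(values):
    """Count interior elements strictly smaller than both neighbors.

    Assumed mapping to the listing: $v0 is the running count and $a0
    walks from element 1 up to (but not including) the last element.
    """
    count = 0                                        # i5: $v0 = 0
    for i in range(1, len(values) - 1):              # i7: loop until $a0 == $a1
        below_left = values[i] < values[i - 1]       # i11: slt $t0, $t1, $t0
        below_right = values[i] < values[i + 1]      # i12: slt $t2, $t1, $t2
        if below_left + below_right == 2:            # i13, i14: compare sum with $t3 = 2
            count += 1                               # i17: $v0 += 1
    return count


print(count_local_minima([3, 1, 2, 0, 5, 5]))  # prints 2 (the elements 1 and 0)
```

The first and last elements are never tested, which matches the loop bounds set up by i0 through i4.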

1. (20 pts.) Imagine how countlocalminima would execute on a fully bypassed 5-stage MIPS pipeline (i.e., branches resolved in D, forwarding from W-E, M-E, and M-D). List all pairs of instructions between which one or more bubbles would occur. If a bubble occurs between i3 and i4, then you should write i3 → i4. (HINT: Think systematically through all scenarios that result in an empty slot in the pipeline.)

2. (20 pts.) Now list where in the program (still countlocalminima on the fully bypassed 5-stage pipe) data operands would be forwarded, and which forwarding path would be used. If i3 forwards the future value of \$a0 to i4 using the M-E forwarding path, write \$a0, i3 → i4, M-E, per the example below. (HINT: Think through the values consumed by each instruction systematically.)
   - \$a1 for i0: producer unknown, so assume no forwarding
   - \$a1, i0 → i1, M-E

3. (15 pts.) Assuming a very large array such that the repeating loop body dominates the execution of the code snippet, what is the CPI of the countlocalminima code? Assume that the beq in i14 is taken 20% of the time.

4. (10 pts.) Assuming an array of length 6 found at address 0x0000C0D8, list the stream of addresses referenced by countlocalminima.
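One way to check an address stream by hand is to mimic the pointer arithmetic of the listing. The sketch below makes several assumptions: only data references (the three `lw` instructions) are counted, not instruction fetches; words are 4 bytes; and the array has at least two elements. The function name `data_address_stream` and the example base address `0x1000` are hypothetical and chosen only for illustration.

```python
def data_address_stream(base, length, word_bytes=4):
    """Return the data addresses loaded by the loop, in program order."""
    if length < 2:
        raise ValueError("the loop bounds assume at least two elements")
    end = base + word_bytes * (length - 1)   # i0-i3: address of the last element
    ptr = base + word_bytes                  # i4: start at element 1
    stream = []
    while ptr != end:                        # i7: exit when $a0 == $a1
        # i8, i9, i10: left neighbor, current element, right neighbor
        stream.extend([ptr - word_bytes, ptr, ptr + word_bytes])
        ptr += word_bytes                    # i15: advance one word
    return stream


# Hypothetical 4-element array at 0x1000 (not the array from the question).
print([hex(a) for a in data_address_stream(0x1000, 4)])
# prints ['0x1000', '0x1004', '0x1008', '0x1004', '0x1008', '0x100c']
```

Each interior element contributes three loads, so an array of length n produces 3(n - 2) data references under these assumptions.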

5. (15 pts.) For each of the caches listed below, show how a 32-bit address breaks into *tag*, *set index*, and *byte offset* fields.

Cache A: 1024B, 2-way set-associative, 16B lines

Cache B: 4096B, direct-mapped, 16B lines
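The field widths follow from the cache geometry: the byte offset needs log2(line size) bits, the set index needs log2(number of sets) bits, and the tag takes whatever remains. The sketch below illustrates that arithmetic on a hypothetical cache (512B, 4-way, 32B lines) rather than Cache A or Cache B. It assumes all sizes are powers of two, and the helper names `address_fields` and `split_address` are introduced here for illustration.

```python
def address_fields(cache_bytes, ways, line_bytes, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a power-of-two cache."""
    num_sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Hypothetical cache: 512B, 4-way set-associative, 32B lines -> 4 sets.
tag_bits, index_bits, offset_bits = address_fields(512, 4, 32)
print(tag_bits, index_bits, offset_bits)                # prints 25 2 5
print(split_address(0x1234, index_bits, offset_bits))   # prints (36, 1, 20)
```

A direct-mapped cache is the special case `ways=1`, where the number of sets equals the number of lines.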

6. (15 pts.) Assume the address references below are made to an initially empty 2-level cache hierarchy with Cache A as the L1 and Cache B as the L2. Fill in the table below, indicating the set for each reference and whether it resulted in a hit or a miss. If there is no access, just mark the squares with a "-".

| Address    | L1 (Cache A) Set | L1 (Cache A) Result | L2 (Cache B) Set | L2 (Cache B) Result |
|------------|------------------|---------------------|------------------|---------------------|
| 0x0000C0DE |                  |                     |                  |                     |
| 0x0000C0E2 |                  |                     |                  |                     |
| 0x0000C0E6 |                  |                     |                  |                     |
| 0x0000C0EC |                  |                     |                  |                     |
| 0x0000C0DE |                  |                     |                  |                     |
| 0x0000C0E2 |                  |                     |                  |                     |

7. (5 pts.) Assume that Cache A's access time is 10ns with a miss rate of 10%, Cache B's access time is 500ns with a miss rate of 20%, and Memory's access time is 5000ns. What is the expected access time for the overall cache hierarchy?
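A common model for this kind of calculation charges every access the L1 time, charges L1 misses the L2 time on top, and charges L2 misses the memory time on top of that. The sketch below encodes that model. It assumes the miss rates are local (the L2 rate is measured over accesses that reach L2) and that the levels are accessed serially; if the course defines these terms differently, the formula changes. The function name and the example numbers are hypothetical, not the values from the question.

```python
def expected_access_time(l1_time, l1_miss, l2_time, l2_miss, mem_time):
    """Average access time for a serial two-level cache hierarchy.

    Assumes local miss rates: l2_miss is the fraction of L2 accesses
    (that is, of L1 misses) that also miss in L2.
    """
    return l1_time + l1_miss * (l2_time + l2_miss * mem_time)


# Hypothetical hierarchy: 1ns L1 (5% miss), 10ns L2 (50% miss), 100ns memory.
print(expected_access_time(1, 0.05, 10, 0.5, 100))  # prints 4.0
```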