| Bits 31-26 | Bits 25-21 | Bits 20-16 | Bits 15-0 |
|------------|------------|------------|-----------|
| LW         | REG 1      | REG 2      | LOAD ADDRESS OFFSET |
| SW         | REG 1      | REG 2      | STORE ADDRESS OFFSET |
| R-TYPE     | REG 1      | REG 2      | DST (bits 15-11), SHIFT AMOUNT (bits 10-6), ADD/AND/OR/S… (bits 5-0) |
| BEQ/BNE    | REG 1      | REG 2      | BRANCH ADDRESS OFFSET |
| JUMP       | JUMP ADDRESS (spans bits 25-0) | | |
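
The field boundaries in the format table can be checked with a little shifting and masking. The sketch below is only an illustration: the function name `decode_fields` and the dictionary keys are hypothetical, and the sample word assumes the standard MIPS encoding of `lw` (opcode 0x23) with `$t1` as register 9 and `$t0` as register 8.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the bit fields from the format table."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,     # bits 31-26, all formats
        "reg1": (word >> 21) & 0x1F,       # bits 25-21
        "reg2": (word >> 16) & 0x1F,       # bits 20-16
        "dst": (word >> 11) & 0x1F,        # bits 15-11, R-type only
        "shamt": (word >> 6) & 0x1F,       # bits 10-6, R-type shift amount
        "funct": word & 0x3F,              # bits 5-0, R-type operation select
        "offset": word & 0xFFFF,           # bits 15-0, LW/SW/BEQ/BNE
        "jump_address": word & 0x3FFFFFF,  # bits 25-0, JUMP
    }


# Assumed sample: lw $t0, 4($t1) built field by field.
word = (0x23 << 26) | (9 << 21) | (8 << 16) | 4
fields = decode_fields(word)
print(hex(word), fields["opcode"], fields["reg1"], fields["reg2"], fields["offset"])
```

Every key is filled in for every word; which ones are meaningful depends on the instruction type, exactly as the rows of the table differ only in how bits 15-0 (or 25-0 for JUMP) are interpreted.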

| LW:                                 | SW:                                 | R-Type:                             | BR-Type:                            | JMP-Type:    |
|-------------------------------------|-------------------------------------|-------------------------------------|-------------------------------------|--------------|
| 1. READ INST                        | 1. READ INST                        | 1. READ INST                        | 1. READ INST                        | 1. READ INST |
| 2. READ REG 1<br>READ REG 2         | 2. READ REG 1<br>READ REG 2         | 2. READ REG 1<br>READ REG 2         | 2. READ REG 1<br>READ REG 2         | 2.           |
| 3. ADD REG 1 + OFFSET               | 3. ADD REG 1 + OFFSET               | 3. OPERATE on REG 1 / REG 2         | 3. SUB REG 2 from REG 1             | 3.           |
| 4. READ MEM                         | 4. WRITE MEM                        | 4.                                  | 4.                                  | 4.           |
| 5. WRITE REG 2                      | 5.                                  | 5. WRITE DST                        | 5.                                  | 5.           |
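
The step table can also be read as data: each instruction type uses some of the five steps and leaves the rest empty. The sketch below simply transcribes the table (empty cells become `None`); the names `STEPS` and `active_steps` are hypothetical.

```python
# One entry per instruction type, one slot per step; None marks an empty cell.
STEPS = {
    "LW": ["READ INST", "READ REG 1 / REG 2", "ADD REG 1 + OFFSET",
           "READ MEM", "WRITE REG 2"],
    "SW": ["READ INST", "READ REG 1 / REG 2", "ADD REG 1 + OFFSET",
           "WRITE MEM", None],
    "R-Type": ["READ INST", "READ REG 1 / REG 2", "OPERATE on REG 1 / REG 2",
               None, "WRITE DST"],
    "BR-Type": ["READ INST", "READ REG 1 / REG 2", "SUB REG 2 from REG 1",
                None, None],
    "JMP-Type": ["READ INST", None, None, None, None],
}


def active_steps(kind: str) -> int:
    """Count the steps in which this instruction type does work."""
    return sum(step is not None for step in STEPS[kind])


for kind in STEPS:
    print(kind, active_steps(kind))
```

Only LW does work in all five steps, which is why it sets the length of the sequence that every other type is fitted into.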

















## **Have compiler guarantee no hazards**

- Where do we insert the "no-ops"?
  - `sub $2, $1, $3`
  - `and $12, $2, $5`
  - `or $13, $6, $2`
  - `add $14, $2, $2`
  - `sw $15, 100($2)`
- Problem: this really slows us down!
  - Also, the program will always be slow, even if a technique like forwarding is employed later in a newer version
- Hardware can detect dependencies and insert the no-ops itself
  - Hardware detection and no-op insertion is called stalling
  - A stall is a bubble in the pipeline and wastes one cycle at every stage
  - Need two or three bubbles between the write and the read of a register
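
To see where the no-ops would go in the five-instruction sequence, here is a minimal sketch of compiler-style no-op insertion. It assumes a reader must sit at least `gap` slots behind the instruction that writes its register, with `gap` set to 2 or 3 to match the "two or three bubbles" figure. `Instr`, `insert_nops`, and `gap` are hypothetical names, and the destination and source registers are listed by hand rather than parsed.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str              # assembly text as written on the slide
    dest: Optional[str]    # register written, or None (e.g. sw)
    srcs: Tuple[str, ...]  # registers read


def insert_nops(program, gap=2):
    """Insert nops so each reader is separated from its writer by >= gap slots.

    Assumption: gap bubbles (2 or 3) are needed between the write and the read.
    """
    out = []
    last_write = {}  # register -> position in `out` of its most recent writer
    for ins in program:
        needed = 0
        for reg in ins.srcs:
            if reg in last_write:
                distance = len(out) - last_write[reg]  # 1 = directly after writer
                needed = max(needed, gap + 1 - distance)
        out.extend(["nop"] * needed)
        if ins.dest is not None:
            last_write[ins.dest] = len(out)
        out.append(ins.text)
    return out


PROGRAM = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
    Instr("sw $15, 100($2)", None, ("$15", "$2")),
]

for line in insert_nops(PROGRAM, gap=2):
    print(line)
```

Under this model all the no-ops land directly after `sub`: once `and` has been pushed far enough back, the later readers of `$2` are already at a safe distance. The padding stays in the program even on hardware that no longer needs it, which is the slowdown the slide warns about.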





















## **Improving Performance**

- Try to avoid stalls! E.g., reorder these instructions:
  - `lw $t0, 0($t1)`
  - `lw $t2, 4($t1)`
  - `sw $t2, 0($t1)`
  - `sw $t0, 4($t1)`
- Add a "branch delay slot"
  - the next instruction after a branch is always executed
  - rely on compiler to "fill" the slot with something useful
- Superscalar: start more than one instruction in the same cycle
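
The reordering exercise can be checked with a small stall counter. This is a sketch under a simplified assumption: a load followed immediately by an instruction that reads the loaded register costs one stall cycle, and nothing else stalls. The helper names `parse` and `load_use_stalls` are hypothetical, and the parser only understands the `lw`/`sw` forms used in the example.

```python
import re


def parse(line):
    """Return (op, dest, srcs) for 'lw/sw $rt, offset($base)'."""
    match = re.match(r"(\w+)\s+(\$\w+),\s*(-?\d+)\((\$\w+)\)", line)
    op, rt, _offset, base = match.groups()
    if op == "lw":
        return op, rt, (base,)      # lw writes rt and reads the base register
    return op, None, (rt, base)     # sw reads both the data and base registers


def load_use_stalls(program):
    """Count loads whose result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, cur_srcs = parse(cur)
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original = [
    "lw $t0, 0($t1)",
    "lw $t2, 4($t1)",
    "sw $t2, 0($t1)",
    "sw $t0, 4($t1)",
]
# Swap the two stores; they write different addresses, so the result is unchanged.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

In the original order `sw $t2` directly follows the load of `$t2`; after swapping the stores, `sw $t0` fills that slot and the load of `$t2` has an extra cycle to complete.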















## **Execution Time**

- Time of n instructions depends on
  - Number of instructions n
  - # of stages k
  - # of control hazards and penalty of each
  - # of data hazards and penalty for each
- Time = n + k - 1 + load hazard penalty + branch penalty
- Load hazard penalty is 1 or 0 cycles
  - depending on data use with forwarding
- Branch penalty is 3, 2, 1, or zero cycles depending on scheme
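
The formula is easy to try with numbers. The sketch below treats the two penalty terms as (number of hazards) × (cycles per hazard); the function name, parameter names, and sample workload are hypothetical and not taken from the slides.

```python
def execution_time(n, k, load_hazards=0, load_penalty=1,
                   branches=0, branch_penalty=1):
    """Cycles = n + k - 1 + load hazard penalty + branch penalty."""
    return n + k - 1 + load_hazards * load_penalty + branches * branch_penalty


# Hypothetical workload: 100 instructions on a 5-stage pipeline.
ideal = execution_time(100, 5)
with_hazards = execution_time(100, 5, load_hazards=10, load_penalty=1,
                              branches=20, branch_penalty=2)
print(ideal, with_hazards)
```

The `k - 1` term is the one-time cost of filling the pipeline; for large n the hazard terms dominate the difference from the ideal of one instruction per cycle.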


