## Pipelining

- Reconsider the datapath we just built
- Each instruction takes from 3 to 5 clock cycles
- However, parts of the hardware are idle much of the time
- We can reorganize the operation: make each hardware block independent
  1. Instruction Fetch Unit
  2. Register Read Unit
  3. ALU Unit
  4. Data Memory Read/Write Unit
  5. Register Write Unit
- Units 2 and 5 cannot be independent (both use the register file), but their operations can be
- Let each unit just do its required job for each instruction
- If for some instruction, a unit need not do anything, it can simply perform a noop
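
To picture the reorganization, here is a minimal sketch that prints which instruction each unit is working on in every clock cycle. It assumes an ideal five-unit pipeline with one cycle per unit and no hazards; the unit labels and the `occupancy` helper are hypothetical names chosen for illustration. Once the pipeline fills, all five units are busy at once, each on a different instruction.

```python
# The five units, in the order an instruction visits them.
UNITS = ["Fetch", "RegRead", "ALU", "Mem", "RegWrite"]


def occupancy(num_instructions: int) -> list[list[str]]:
    """One row per clock cycle; entry u names the instruction in unit u, or '-' if idle.

    Ideal pipeline with no hazards: instruction i (0-based) is in unit u during cycle i + u.
    """
    k = len(UNITS)
    rows = []
    for cycle in range(num_instructions + k - 1):
        row = []
        for u in range(k):
            i = cycle - u  # the instruction that entered the Fetch unit u cycles earlier
            row.append(f"I{i + 1}" if 0 <= i < num_instructions else "-")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("cycle " + "".join(f"{name:>9}" for name in UNITS))
    for cycle, row in enumerate(occupancy(4), start=1):
        print(f"{cycle:>5} " + "".join(f"{entry:>9}" for entry in row))
```

With four instructions the last one leaves the Register Write Unit in cycle 8, that is, after 4 + 5 - 1 cycles.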




- Improve performance by increasing instruction throughput
- Ideal speedup equals the number of stages in the pipeline
- Do we achieve this? No. Why not?
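
A quick calculation shows one reason the ideal is not reached. Under idealized assumptions (k perfectly balanced one-cycle stages, no pipeline-register overhead, no hazards, and every instruction charged all k steps when unpipelined), n instructions need n·k cycles without pipelining but only k + n - 1 with it, because the first instruction must pass through all k stages before one instruction completes per cycle. The function name `pipeline_speedup` is hypothetical.

```python
from fractions import Fraction


def pipeline_speedup(n: int, k: int) -> Fraction:
    """Speedup of a k-stage pipeline over an unpipelined datapath for n instructions.

    Assumes balanced one-cycle stages, no register overhead, and no hazards.
    """
    unpipelined_cycles = n * k    # each instruction uses all k steps on its own
    pipelined_cycles = k + n - 1  # k cycles to fill, then one completion per cycle
    return Fraction(unpipelined_cycles, pipelined_cycles)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, float(pipeline_speedup(n, 5)))
```

For k = 5 the speedup is 1 for a single instruction, about 2.78 for five, and about 4.998 for 10,000: it approaches 5 but never reaches it. Hazards and unequal stage delays push it lower still.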



# Pipelining

#### What makes it easy?

- all instructions are the same length
- just a few instruction formats
- memory operands appear only in loads and stores

#### What makes it hard?

- structural hazards: suppose we had only one memory
- control hazards: need to worry about branch instructions
- data hazards: an instruction depends on a previous instruction
- We'll study these issues using a simple pipeline
- Other complications:
  - exception handling
  - trying to improve performance with out-of-order execution, etc.
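
Data hazards can be found mechanically by comparing each instruction's source registers with the destinations of the instructions just before it. The sketch below is a minimal read-after-write checker; the `Instr` record and `raw_hazards` function are hypothetical, and the default window of 2 assumes a 5-stage pipeline without forwarding whose register file is written in the first half of a cycle and read in the second half.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    dest: Optional[str]     # register written, or None (e.g. sw, beq)
    srcs: tuple[str, ...]   # registers read


def raw_hazards(program: list[Instr], window: int = 2) -> list[tuple[int, int, str]]:
    """Return (producer, consumer, register) triples for read-after-write
    dependences where the consumer is at most `window` instructions later."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            # $0 is hardwired to zero, so writing it creates no real dependence.
            if producer.dest not in (None, "$0") and producer.dest in consumer.srcs:
                hazards.append((i, j, producer.dest))
    return hazards


PROGRAM = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
    Instr("sw $15, 100($2)", None, ("$15", "$2")),
]

if __name__ == "__main__":
    for i, j, reg in raw_hazards(PROGRAM):
        print(f"{PROGRAM[j].text!r} needs {reg} from {PROGRAM[i].text!r}")
```

Only the `and` and `or` are flagged: by the time `add` reads `$2` in its decode stage, `sub` has already written it back.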







## **Execution Time**

- The time to execute n instructions depends on
  - Number of instructions, n
  - Number of stages, k
  - Number of control hazards and the penalty for each
  - Number of data hazards and the penalty for each
- Time (in clock cycles) = $n + k - 1$ + load hazard penalty + branch penalty
  - Load hazard penalty is 1 or 0 cycles per load, depending on whether the next instruction uses the loaded data (with forwarding)
  - Branch penalty is 3, 2, 1, or 0 cycles per branch, depending on the scheme
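
The timing formula can be evaluated directly. In this sketch the `(opcode, dest, srcs)` tuple format and the function names are hypothetical; it assumes forwarding, so a load costs one bubble only when the very next instruction reads its result. `penalized_branches` (also hypothetical) counts the branches that actually pay the penalty: all of them if the pipeline always stalls, or only mispredicted ones under a prediction scheme.

```python
LOAD_OPS = {"lw"}


def load_use_stalls(program: list[tuple]) -> int:
    """Count loads whose destination is read by the next instruction (one bubble each)."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(program, program[1:]):
        if op in LOAD_OPS and dest in next_srcs:
            stalls += 1
    return stalls


def execution_cycles(n: int, k: int, load_stalls: int,
                     penalized_branches: int, branch_penalty: int) -> int:
    """Time = n + k - 1 + load hazard penalty + branch penalty, in clock cycles."""
    if branch_penalty not in (0, 1, 2, 3):
        raise ValueError("branch penalty is 0 to 3 cycles, depending on the scheme")
    return n + k - 1 + load_stalls + penalized_branches * branch_penalty


PROGRAM = [
    ("lw",  "$2", ("$1",)),
    ("and", "$4", ("$2", "$5")),   # uses the loaded $2 immediately: one bubble
    ("or",  "$8", ("$2", "$6")),
    ("add", "$9", ("$4", "$2")),
    ("beq", None, ("$9", "$0")),
]

if __name__ == "__main__":
    stalls = load_use_stalls(PROGRAM)
    print(execution_cycles(len(PROGRAM), 5, stalls, penalized_branches=1, branch_penalty=1))
```

For this five-instruction example with one load-use bubble and a 1-cycle branch penalty, the total is 5 + 5 - 1 + 1 + 1 = 11 cycles.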

#### **Design and Performance Issues With Pipelining**

- Pipelined processors are not EASY to design
- Technology affects implementation
- Instruction set design affects performance, e.g., beq, bne
- More stages do not necessarily lead to higher performance

[Figure: relative performance versus pipeline depth]




| Opcode (bits 31–26) | Bits 25–21 | Bits 20–16 | Bits 15–0 |
|---------------------|------------|------------|-----------|
| LW | REG 1 | REG 2 | LOAD ADDRESS OFFSET |
| SW | REG 1 | REG 2 | STORE ADDRESS OFFSET |
| R-TYPE | REG 1 | REG 2 | DST (15–11), SHIFT AMOUNT (10–6), ADD/AND/OR/SL (5–0) |
| BEQ/BNE | REG 1 | REG 2 | BRANCH ADDRESS OFFSET |
| JUMP | JUMP ADDRESS (bits 25–0, spanning the remaining columns) | | |
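
Because every instruction is 32 bits with fields at fixed positions, decoding is just shifting and masking. This minimal sketch splits a word into the fields of the table; the field names follow the table, while the `decode` helper and the example words are assumptions based on the standard MIPS encoding (for instance, `lw $t1, 8($t0)` encodes as `0x8D090008`, with opcode 35).

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word: int) -> dict[str, int]:
    """Split a 32-bit instruction word into every field the formats above can use."""
    return {
        "opcode": (word >> 26) & 0x3F,    # bits 31-26
        "reg1": (word >> 21) & 0x1F,      # bits 25-21
        "reg2": (word >> 16) & 0x1F,      # bits 20-16
        "dst": (word >> 11) & 0x1F,       # bits 15-11 (R-type)
        "shamt": (word >> 6) & 0x1F,      # bits 10-6 (R-type)
        "funct": word & 0x3F,             # bits 5-0 (R-type)
        "offset": sign_extend_16(word),   # bits 15-0 (LW, SW, BEQ/BNE), sign-extended
        "jump_addr": word & 0x03FF_FFFF,  # bits 25-0 (JUMP)
    }


if __name__ == "__main__":
    print(decode(0x8D090008))  # lw $t1, 8($t0)
```

A decoder can compute all fields in parallel like this and let the opcode decide which ones matter, which is one reason a fixed instruction length makes pipelining easier.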

| Step | LW | SW | R-Type | BR-Type | JMP-Type |
|------|----|----|--------|---------|----------|
| 1 | READ INST | READ INST | READ INST | READ INST | READ INST |
| 2 | READ REG 1, READ REG 2 | READ REG 1, READ REG 2 | READ REG 1, READ REG 2 | READ REG 1, READ REG 2 | |
| 3 | ADD REG 1 + OFFSET | ADD REG 1 + OFFSET | OPERATE on REG 1 / REG 2 | SUB REG 2 from REG 1 | |
| 4 | READ MEM | WRITE MEM | | | |
| 5 | WRITE REG 2 | | WRITE DST | | |
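
The table combines with the rule that a unit with nothing to do simply performs a noop. This sketch (hypothetical names `STEPS` and `unit_actions`) transcribes the five columns and pads every unlisted step with a noop, so each instruction type occupies all five units in turn. The JMP entries after step 1 become noops only because the table lists nothing there.

```python
# Per-step work for each instruction type, transcribed from the table above.
# A step missing from an entry is one where the table lists no work.
STEPS = {
    "LW":  {1: "READ INST", 2: "READ REG 1, REG 2", 3: "ADD REG 1 + OFFSET",
            4: "READ MEM", 5: "WRITE REG 2"},
    "SW":  {1: "READ INST", 2: "READ REG 1, REG 2", 3: "ADD REG 1 + OFFSET",
            4: "WRITE MEM"},
    "R":   {1: "READ INST", 2: "READ REG 1, REG 2", 3: "OPERATE ON REG 1 / REG 2",
            5: "WRITE DST"},
    "BR":  {1: "READ INST", 2: "READ REG 1, REG 2", 3: "SUB REG 2 FROM REG 1"},
    "JMP": {1: "READ INST"},
}

NUM_UNITS = 5


def unit_actions(kind: str) -> list[str]:
    """Action of each of the five units for one instruction; idle steps become a noop."""
    work = STEPS[kind]
    return [work.get(step, "noop") for step in range(1, NUM_UNITS + 1)]


if __name__ == "__main__":
    for kind in STEPS:
        print(f"{kind:>3}: " + " | ".join(unit_actions(kind)))
```

Padding with noops costs some idle slots, but it keeps every instruction moving through the units in the same order, so a new instruction can enter the Fetch Unit every cycle.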

















































## **Important Facts to Remember**

- Pipelined processors divide execution into multiple steps
- However, pipeline hazards reduce performance
  - Structural, data, and control hazards
- Data forwarding helps resolve data hazards
  - But not all data hazards can be resolved this way
  - Some data hazards require bubble or noop insertion
- The effects of control hazards are reduced by branch prediction
  - Predict always taken, delayed branch slots, branch prediction table
- Structural hazards are resolved by duplicating resources
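
The slides name a branch prediction table without fixing what each entry holds. One common design, assumed here rather than taken from the slides, stores a 2-bit saturating counter per entry, indexed by low bits of the branch's PC. The table size, the initial "weakly not taken" state, and the names `BranchPredictionTable` and `count_mispredictions` are all hypothetical choices for this sketch.

```python
class BranchPredictionTable:
    """Branch prediction table of 2-bit saturating counters, indexed by low PC bits.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        # Every entry starts at 1 ("weakly not taken"); an arbitrary initial choice.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes, so the two low PC bits are always 0; skip them.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(table: BranchPredictionTable, pc: int, outcomes: list[bool]) -> int:
    """Predict each outcome of the branch at `pc`, then train the table on the real outcome."""
    misses = 0
    for taken in outcomes:
        if table.predict(pc) != taken:
            misses += 1
        table.update(pc, taken)
    return misses


if __name__ == "__main__":
    loop_branch = ([True] * 9 + [False]) * 2  # a 10-iteration loop branch, run twice
    print(count_mispredictions(BranchPredictionTable(), 0x40, loop_branch))
```

For a loop branch taken nine times and then not taken, run twice, the table mispredicts 3 of 20 times: once while warming up and once at each loop exit. Because a counter needs two wrong outcomes to flip its prediction, a single loop exit does not cause an extra miss when the loop starts again.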

# **Pipeline control**

- We have 5 stages. What needs to be controlled in each stage?
  - Instruction Fetch and PC Increment
  - Instruction Decode / Register Fetch
  - Execution
  - Memory Stage
  - Write Back
- How would control be handled in an automobile plant?
  - A fancy control center telling everyone what to do?
  - Should we use a finite state machine?
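
One common answer to these questions, used in Patterson and Hennessy's MIPS pipeline, needs no separate finite state machine: the decode stage generates all control signals for an instruction at once, groups them by the stage that will use them, and passes them down the pipeline registers together with the instruction, like a work order traveling with a car down the assembly line. The sketch below models only that hand-off. The signal names and values follow the standard MIPS main-control table (don't-cares written as 0); the `run` helper and its trace format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    # Used by the EX stage
    reg_dst: int
    alu_op: int
    alu_src: int
    # Used by the MEM stage
    branch: int
    mem_read: int
    mem_write: int
    # Used by the WB stage
    reg_write: int
    mem_to_reg: int


# Main control for the MIPS subset; don't-care outputs are written as 0.
CONTROL = {
    "R":   Control(reg_dst=1, alu_op=0b10, alu_src=0, branch=0, mem_read=0, mem_write=0, reg_write=1, mem_to_reg=0),
    "lw":  Control(reg_dst=0, alu_op=0b00, alu_src=1, branch=0, mem_read=1, mem_write=0, reg_write=1, mem_to_reg=1),
    "sw":  Control(reg_dst=0, alu_op=0b00, alu_src=1, branch=0, mem_read=0, mem_write=1, reg_write=0, mem_to_reg=0),
    "beq": Control(reg_dst=0, alu_op=0b01, alu_src=0, branch=1, mem_read=0, mem_write=0, reg_write=0, mem_to_reg=0),
}

STAGE_FIELDS = {
    "EX": ("reg_dst", "alu_op", "alu_src"),
    "MEM": ("branch", "mem_read", "mem_write"),
    "WB": ("reg_write", "mem_to_reg"),
}


def _consume(entry, stage):
    """Signals `stage` reads from its pipeline register (None if the register is empty)."""
    if entry is None:
        return None
    name, ctrl = entry
    return name, {f: getattr(ctrl, f) for f in STAGE_FIELDS[stage]}


def run(program: list[str]) -> list[dict]:
    """Pass each control word ID/EX -> EX/MEM -> MEM/WB, one register per cycle.

    Each trace row holds the signals the EX, MEM, and WB stages consume from
    their pipeline registers during one cycle (contents after the clock edge).
    """
    id_ex = ex_mem = mem_wb = None
    queue = list(program)
    trace = []
    for _ in range(len(program) + 2):  # 2 extra cycles drain EX/MEM and MEM/WB
        mem_wb, ex_mem = ex_mem, id_ex  # every register hands its contents one stage on
        if queue:
            name = queue.pop(0)
            id_ex = (name, CONTROL[name])  # decode (ID) generates the whole control word
        else:
            id_ex = None
        trace.append({stage: _consume(reg, stage)
                      for stage, reg in (("EX", id_ex), ("MEM", ex_mem), ("WB", mem_wb))})
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(run(["lw", "sw", "R", "beq"]), start=1):
        print(cycle, row)
```

Each stage only ever looks at the signals for its own step, so control stays local to the pipeline registers instead of being sequenced by a central state machine.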








