Title: Computer Architecture Lecture 4 17th May, 2006
- Abhinav Agarwal
- Veeramani V.
2. Recap
- Simple pipeline: hazards and solutions
- Data hazards
  - Static compiler techniques: load delay slot, etc.
  - Hardware solutions: data forwarding, out-of-order execution, register renaming
- Control hazards
  - Static compiler techniques
  - Hardware speculation through branch predictors
- Structural hazards
  - Increase hardware resources
  - Superscalar out-of-order execution
- Memory organisation
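The data-hazard solutions recapped above can be illustrated with a toy stall model. This is a sketch of the textbook five-stage pipeline (assumptions: the register file writes in the first half of a cycle and reads in the second, and the stall counts follow the classic MIPS-style model; the function name is illustrative):

```python
def stall_cycles(producer_is_load, forwarding):
    """Stall cycles for an instruction that reads a register written
    by the instruction immediately before it, in a simplified
    five-stage pipeline model."""
    if forwarding:
        # ALU results can be forwarded in time; a load's data is only
        # available after MEM, so one bubble (the load-use hazard) remains.
        return 1 if producer_is_load else 0
    # Without forwarding the consumer must wait until the result is
    # written back to the register file.
    return 2

# ALU-to-ALU dependence with forwarding needs no stall; load-use needs one.
alu_fwd = stall_cycles(producer_is_load=False, forwarding=True)
load_fwd = stall_cycles(producer_is_load=True, forwarding=True)
```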
3. Memory Organization in Processors
- Caches inside the chip
  - Faster, closer to the core
  - Built from SRAM cells
- They contain recently-used data
- They contain data in blocks
4. Rationale behind caches
- Principle of spatial locality
- Principle of temporal locality
- Replacement policy (LRU, LFU, etc.)
- Principle of inclusivity
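The locality principles and LRU replacement above can be demonstrated with a toy cache model (a sketch, not any specific processor's design; the block size and capacity are illustrative):

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # words per cache block (illustrative)

class LRUCache:
    """Toy fully-associative cache with LRU replacement."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def access(self, address):
        block = address // BLOCK_SIZE  # spatial locality: a whole block is cached
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # temporal locality: refresh LRU order
        else:
            self.misses += 1
            if len(self.blocks) >= self.num_blocks:
                self.blocks.popitem(last=False)  # evict the least recently used block
            self.blocks[block] = True

# A sequential sweep over 8 words misses once per block, then hits:
cache = LRUCache(num_blocks=2)
for addr in range(8):
    cache.access(addr)
```

After the sweep the cache has 2 misses and 6 hits: spatial locality turns one miss into three subsequent hits within each block.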
5. Outline
- Instruction Level Parallelism
- Thread-level Parallelism
- Fine-Grain multithreading
- Simultaneous multithreading
  - Sharable vs. non-sharable resources
- Chip Multiprocessor
- Some design issues
6. Instruction Level Parallelism
- Overlap execution of many instructions
- ILP techniques try to reduce data and control dependencies
- Issue independent instructions out of order
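The idea of overlapping independent instructions can be sketched with an idealized model: unit latency per instruction and unlimited issue width, so execution time is just the length of the longest dependence chain (the instruction encoding and register names here are illustrative):

```python
def min_cycles(instrs):
    """Minimum cycles to execute `instrs` under an idealized ILP model:
    every instruction takes one cycle, and any number of independent
    instructions may issue in the same cycle."""
    ready_at = {}  # register -> cycle at which its value is available
    finish = 0
    for dest, srcs in instrs:
        # An instruction starts once all of its source registers are ready.
        start = max((ready_at[s] for s in srcs if s in ready_at), default=0)
        ready_at[dest] = start + 1
        finish = max(finish, ready_at[dest])
    return finish

# r1 and r2 have no dependencies, so they can issue in the same cycle;
# r3 needs both, and r4 needs r3.
program = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]), ("r4", ["r3"])]
```

Executed strictly in order this program takes 4 cycles; the dependence-chain depth is only 3, which is the speedup ILP techniques try to expose.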
7. Thread Level Parallelism
- Instructions from two different threads are largely independent of each other
- Better utilization of functional units
- Multi-thread performance improves drastically
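The independence between threads can be shown with a minimal two-thread example (a sketch using Python's standard `threading` module; the workload and function names are illustrative):

```python
import threading

def partial_sum(chunk, results, slot):
    # Each thread works on a disjoint chunk, so there is no data
    # dependence between the threads: a multithreaded core can
    # overlap their instructions freely.
    results[slot] = sum(chunk)

data = list(range(1000))
results = [0, 0]
threads = [
    threading.Thread(target=partial_sum, args=(data[:500], results, 0)),
    threading.Thread(target=partial_sum, args=(data[500:], results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = results[0] + results[1]
```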
8. A simple pipeline
Source: EV8 DEC Alpha processor, © Intel
9. Superscalar pipeline
Source: EV8 DEC Alpha processor, © Intel
10. Speculative execution
Source: EV8 DEC Alpha processor, © Intel
11. Fine Grained Multithreading
Source: EV8 DEC Alpha processor, © Intel
12. Simultaneous Multithreading
Source: EV8 DEC Alpha processor, © Intel
13. Out of Order Execution
Source: EV8 DEC Alpha processor, © Intel
14. SMT pipeline
Source: EV8 DEC Alpha processor, © Intel
15. Resources: replication required
- Program counters
- Register maps
16. Resources: replication not required
- Register file (rename space)
- Instruction queue
- Branch predictor
- First- and second-level caches, etc.
17. Chip multiprocessor
- The number of transistors keeps going up
- Place more than one core on the chip
- The cores still share the caches
18. Some design issues
- Trade-off in choosing the cache size
  - Power vs. performance
- Super-pipelining trade-off
  - Higher clock frequency vs. speculation penalty
  - Power
- Power consumption
19. Novel techniques for power
- Clock gating
  - Run non-critical elements at a slower clock
- Reduce voltage swings (operating voltage)
- Sleep mode / standby mode
- Dynamic voltage and frequency scaling (DVFS)
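Why DVFS saves so much power follows from the classic switching-power model, P_dyn ≈ C·V²·f: lowering frequency lets voltage drop too, and power falls with the square of voltage. A small sketch (the capacitance, voltages, and frequencies are illustrative numbers, not from any datasheet):

```python
def dynamic_power(c_eff, voltage, freq_hz):
    # Classic switching-power model: P_dyn ≈ C * V^2 * f
    # (ignores static/leakage power; c_eff is the effective
    # switched capacitance of the chip)
    return c_eff * voltage ** 2 * freq_hz

# Halving frequency alone would halve power; scaling voltage down
# with it (DVFS) saves considerably more.
full_speed = dynamic_power(1e-9, 1.2, 2.0e9)   # 2 GHz at 1.2 V
scaled = dynamic_power(1e-9, 0.9, 1.0e9)       # 1 GHz at 0.9 V
savings = 1 - scaled / full_speed              # ~72% power reduction
```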