Title: Final Exam Review
1. Final Exam Review
2. Exam Format
- It will cover the material after the mid-term (caches to multiprocessors)
- It is similar in style to the mid-term exam
- The exam will have 6-7 questions
- One true/false or short-answer question covering general topics
- 5-6 other questions requiring calculation
3. Memory Systems
4. A Typical Memory Hierarchy (With Two Levels of Cache)
[Diagram: memory hierarchy from the L1 cache and second-level (L2) cache (SRAM), through main memory (DRAM), down to virtual memory / secondary storage (disk). Speed ranges from about 1 ns near the CPU to tens of ms (10,000,000s of ns) at the disk, while capacity grows from hundreds of bytes and KBs in the caches to GBs of main memory and TBs of disk storage.]
5. Memory Hierarchy Motivation: The Principle of Locality
- Programs usually access a relatively small portion of their address space (instructions/data) at any instant of time (loops, data arrays).
- Two types of locality:
  - Temporal locality: if an item is referenced, it will tend to be referenced again soon.
  - Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon.
- The presence of locality in program behavior (e.g., loops, data arrays) makes it possible to satisfy a large percentage of program access needs (both instructions and operands) using memory levels with much less capacity than the program address space.
6. Cache Design & Operation Issues
- Q1: Where can a block be placed in the cache? (Block placement strategy / cache organization)
  - Fully associative, set associative, direct mapped.
- Q2: How is a block found if it is in the cache? (Block identification)
  - Tag / block (see the address-breakdown sketch below).
- Q3: Which block should be replaced on a miss? (Block replacement)
  - Random, LRU.
- Q4: What happens on a write? (Cache write policy)
  - Write through, write back.
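As a quick refresher on block identification, here is a minimal Python sketch of how a direct-mapped cache splits an address into tag, index, and offset. The cache size, block size, and address width are assumed values for illustration, not parameters from the slides.

```python
# Hypothetical example: address breakdown for a direct-mapped cache.
# Assumed parameters: 32-bit addresses, 16 KB cache, 16-byte blocks.
cache_size  = 16 * 1024                       # bytes
block_size  = 16                              # bytes
num_blocks  = cache_size // block_size        # 1024 blocks
offset_bits = block_size.bit_length() - 1     # 4 bits
index_bits  = num_blocks.bit_length() - 1     # 10 bits
tag_bits    = 32 - index_bits - offset_bits   # 18 bits

def locate(addr):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = addr & (block_size - 1)
    index  = (addr >> offset_bits) & (num_blocks - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(tag_bits, index_bits, offset_bits)   # 18 10 4
print(locate(0x1234ABCD))
```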
7. Cache Performance: Average Memory Access Time (AMAT), Memory Stall Cycles
- The Average Memory Access Time (AMAT): the number of cycles required to complete an average memory access request by the CPU.
- Memory stall cycles per memory access: the number of stall cycles added to CPU execution cycles for one memory access.
- For an ideal memory, AMAT = 1 cycle, which results in zero memory stall cycles.
- Memory stall cycles per average memory access = AMAT - 1
- Memory stall cycles per average instruction
  = memory stall cycles per average memory access x number of memory accesses per instruction
  = (AMAT - 1) x (1 + fraction of loads/stores)
  (the 1 accounts for the instruction fetch; see the sketch below)
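A minimal Python sketch of these relations. The hit time, miss rate, miss penalty, and load/store fraction are assumed numbers for illustration only.

```python
# Assumed parameters (not from the slides)
hit_time     = 1      # cycles
miss_rate    = 0.05
miss_penalty = 50     # cycles
loads_stores = 0.3    # fraction of instructions that are loads/stores

amat = hit_time + miss_rate * miss_penalty    # cycles per memory access
stalls_per_access = amat - 1                  # ideal memory = 1 cycle
accesses_per_instr = 1 + loads_stores         # instruction fetch + data access
stalls_per_instr = stalls_per_access * accesses_per_instr

print(f"AMAT = {amat} cycles")
print(f"Memory stall cycles per instruction = {stalls_per_instr:.2f}")
```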
8. Cache Performance
- Unified cache: for a CPU with a single level (L1) of cache for both instructions and data and no stalls for cache hits:
  - CPU time = IC x (CPIexecution + Mem stall cycles per instruction) x Clock cycle time
  - CPU time = IC x (CPIexecution + Memory accesses/instruction x Miss rate x Miss penalty) x Clock cycle time
- Split cache: for a CPU with separate (split) level-one (L1) caches for instructions and data and no stalls for cache hits:
  - CPU time = IC x (CPIexecution + Mem stall cycles per instruction) x Clock cycle time
  - Mem stall cycles per instruction = Instruction fetch miss rate x Miss penalty + Data memory accesses per instruction x Data miss rate x Miss penalty
  (both cases are worked in the sketch below)
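A short Python sketch evaluating the unified- and split-cache formulas above. All parameter values (instruction count, base CPI, miss rates, penalty) are assumptions chosen for illustration.

```python
# Assumed parameters (not from the slides)
IC             = 1_000_000   # instruction count
CPI_exec       = 1.1         # base CPI with no memory stalls
clock_cycle    = 1e-9        # seconds
M              = 50          # miss penalty in cycles
data_per_instr = 0.3         # data memory accesses per instruction

# Unified L1: one miss rate for both instructions and data
unified_miss_rate = 0.02
stalls_unified = (1 + data_per_instr) * unified_miss_rate * M
cpu_time_unified = IC * (CPI_exec + stalls_unified) * clock_cycle

# Split L1: separate instruction and data miss rates
i_miss_rate, d_miss_rate = 0.01, 0.04
stalls_split = i_miss_rate * M + data_per_instr * d_miss_rate * M
cpu_time_split = IC * (CPI_exec + stalls_split) * clock_cycle

print(f"Unified: {cpu_time_unified*1e3:.2f} ms, Split: {cpu_time_split*1e3:.2f} ms")
```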
9. Cache Performance (Various Factors)
- Cache impact on performance
  - With and without a cache
  - Processor clock rate
- Which one performs better, unified or split?
  - Assuming the same total size
- What is the effect of cache organization on cache performance? 1-way vs. 8-way set associative
  - Tradeoffs between hit time and hit rate (illustrated in the sketch below)
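A small sketch of the hit-time vs. hit-rate tradeoff: higher associativity tends to lower the miss rate but can lengthen the hit time. The hit times and miss rates below are assumed numbers, not data from the course.

```python
miss_penalty = 50   # cycles (assumed)

# 1-way (direct mapped): fast hit, higher miss rate (assumed values)
amat_1way = 1.0 + 0.040 * miss_penalty
# 8-way set associative: slower hit, lower miss rate (assumed values)
amat_8way = 1.2 + 0.030 * miss_penalty

print(f"AMAT 1-way = {amat_1way:.2f} cycles")   # 3.00
print(f"AMAT 8-way = {amat_8way:.2f} cycles")   # 2.70
# Whichever AMAT is lower wins for that workload; the answer can flip
# as miss rates, hit times, or the clock cycle change.
```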
10. Cache Performance (Various Factors)
- What is the effect of the write policy on cache performance? Write back vs. write through; write allocate vs. no-write allocate
  - Stall cycles per memory access = % reads x (1 - H1) x M + % writes x M
  - Stall cycles per memory access = (1 - H1) x (M x % clean + 2M x % dirty)
- What is the effect of additional cache levels on performance?
  - Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1)(1 - H2) x M
  - Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1) x (1 - H2) x H3 x T3 + (1 - H1)(1 - H2)(1 - H3) x M
  (these formulas are evaluated in the sketch below)
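A Python sketch plugging assumed values into the formulas above, where H1/H2 are hit rates, T2 is the L2 access time, and M is the main-memory penalty. All numeric values are assumptions for illustration.

```python
# Assumed parameters (not from the slides)
M  = 50                       # main-memory access penalty, cycles
H1 = 0.95                     # L1 hit rate
reads, writes = 0.75, 0.25    # fractions of memory accesses

# Write through (no write buffer): every write goes to memory
stalls_wt = reads * (1 - H1) * M + writes * M

# Write back: a miss costs M, plus another M if the victim block is dirty
clean, dirty = 0.7, 0.3
stalls_wb = (1 - H1) * (M * clean + 2 * M * dirty)

# Two cache levels: L2 hit rate H2, L2 access time T2
H2, T2 = 0.80, 10
stalls_l2 = (1 - H1) * H2 * T2 + (1 - H1) * (1 - H2) * M

print(f"Write through: {stalls_wt:.2f} stall cycles per access")
print(f"Write back:    {stalls_wb:.2f} stall cycles per access")
print(f"Two levels:    {stalls_l2:.2f} stall cycles per access")
```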
11. Reducing Misses (3 Cs)
- Classifying cache misses: the 3 Cs
  - Compulsory (misses that occur even with an infinite-size cache)
  - Capacity (misses due to the size of the cache)
  - Conflict (misses due to the associativity and size of the cache)
- How to reduce the 3 Cs (miss rate):
  - Increase block size
  - Increase associativity
  - Use a victim cache
  - Use a pseudo-associative cache
  - Use a prefetching technique
12. Memory Interleaving: Reduce Miss Penalty
- Default (no interleaving): must finish accessing one word before starting the next access
  - Miss penalty = (1 + 25 + 1) x 4 = 108 cycles
- Interleaving: begin accessing one word and, while waiting, start accessing the other three words (pipelining)
  - Miss penalty = 30 cycles
  - Requires 4 separate memory banks, each 1/4 the size
  - Spread out addresses among the memories
  - Interleaving works perfectly with caches (see the sketch below)
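A short Python sketch reproducing the two miss-penalty numbers above, using the timing assumptions stated on the next slide (1 cycle to send the address, 25 cycles of DRAM access per word, 1 cycle to transfer a word, 4-word block).

```python
addr, access, xfer, words = 1, 25, 1, 4

# No interleaving: each word access completes before the next starts
penalty_default = (addr + access + xfer) * words     # (1+25+1)*4 = 108

# 4-way interleaving: accesses overlap, only the word transfers serialize
penalty_interleaved = addr + access + xfer * words   # 1+25+4 = 30

print(penalty_default, penalty_interleaved)          # 108 30
```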
13. Memory Interleaving: An Example
- Given the following system parameters with a single cache level (L1):
  - Block size = 1 word; memory bus width = 1 word; miss rate = 3%; miss penalty = 27 cycles
    (1 cycle to send the address, 25 cycles access time per word, 1 cycle to send a word)
  - Memory accesses/instruction = 1.2; ideal CPI (ignoring cache misses) = 2
  - Miss rate (block size = 2 words) = 2%; miss rate (block size = 4 words) = 1%
- The CPI of the base machine with 1-word blocks = 2 + (1.2 x 0.03 x 27) = 2.97
- Increasing the block size to two words gives the following CPI:
  - 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.02 x 2 x 27) = 3.29
  - 32-bit bus and memory, interleaved: 2 + (1.2 x 0.02 x 28) = 2.67
- Increasing the block size to four words gives the following CPI:
  - 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.01 x 4 x 27) = 3.29
  - 32-bit bus and memory, interleaved: 2 + (1.2 x 0.01 x 30) = 2.36
  (the same calculations are reproduced in code below)
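The same CPI calculations, written out in Python so each case can be checked quickly. All numbers come from the example above.

```python
ideal_cpi = 2
accesses  = 1.2      # memory accesses per instruction
M_word    = 27       # miss penalty per word (1 + 25 + 1 cycles)

cpi_1w     = ideal_cpi + accesses * 0.03 * M_word        # 2.97
cpi_2w     = ideal_cpi + accesses * 0.02 * 2 * M_word    # ~3.3
cpi_2w_int = ideal_cpi + accesses * 0.02 * 28            # ~2.67
cpi_4w     = ideal_cpi + accesses * 0.01 * 4 * M_word    # ~3.3
cpi_4w_int = ideal_cpi + accesses * 0.01 * 30            # 2.36

for name, cpi in [("1 word", cpi_1w), ("2 words", cpi_2w),
                  ("2 words, interleaved", cpi_2w_int),
                  ("4 words", cpi_4w), ("4 words, interleaved", cpi_4w_int)]:
    print(f"{name:22s} CPI = {cpi:.2f}")
```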
14. Cache vs. Virtual Memory
- Motivation for virtual memory (physical memory size, multiprogramming)
- The concept behind VM is almost identical to the concept behind caches
- But different terminology!
  - Cache: block = VM: page
  - Cache: cache miss = VM: page fault
- Caches are implemented completely in hardware; VM is implemented in software, with hardware support from the CPU
- The cache speeds up main memory access, while main memory speeds up VM access
- Translation Look-Aside Buffer (TLB)
- How to calculate the size of the page tables for a given memory system (a sample calculation follows below)
- How to calculate the page size given the size of the page table
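A minimal Python sketch of a page-table size calculation. The address width, page size, and page-table-entry size are assumed values for illustration, not figures from the course.

```python
# Assumed parameters: 32-bit virtual addresses, 4 KB pages, 4-byte PTEs
virtual_bits = 32
page_size    = 4 * 1024      # bytes
pte_size     = 4             # bytes per page table entry

num_pages       = 2**virtual_bits // page_size    # 2^20 pages
page_table_size = num_pages * pte_size            # 4 MB per process

print(f"{num_pages} pages, page table = {page_table_size / 2**20:.0f} MB")
```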
15. I/O Systems
17. I/O Concepts
- Disk performance
  - Disk latency = average seek time + average rotational delay + transfer time + controller overhead (evaluated in the sketch below)
- Interrupt-driven I/O
- Memory-mapped I/O
- I/O channels
- DMA (Direct Memory Access)
- I/O communication protocols
  - Daisy chaining
  - Polling
- I/O buses
  - Synchronous vs. asynchronous
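A Python sketch of the disk-latency formula above. The seek time, rotation speed, transfer rate, and controller overhead are assumed values chosen only to show the arithmetic.

```python
# Assumed parameters (not from the slides)
avg_seek_ms   = 5.0
rpm           = 10_000
avg_rot_ms    = 0.5 * (60_000 / rpm)              # half a rotation = 3 ms
sector_kb     = 4
transfer_ms   = sector_kb / (100 * 1024) * 1000   # at 100 MB/s, ~0.04 ms
controller_ms = 0.2

latency_ms = avg_seek_ms + avg_rot_ms + transfer_ms + controller_ms
print(f"Average disk access time = {latency_ms:.2f} ms")
```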
18. RAID Systems
- Examined various RAID architectures (RAID-0 to RAID-5): cost, performance (bandwidth, I/O request rate)
  - RAID-0: no redundancy
  - RAID-1: mirroring
  - RAID-2: memory-style ECC
  - RAID-3: bit-interleaved parity
  - RAID-4: block-interleaved parity
  - RAID-5: block-interleaved distributed parity
  (a capacity comparison sketch follows below)
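A hypothetical helper (not from the slides) comparing usable capacity for a few of these levels, assuming N equal-size disks: striping uses all N, mirroring halves them, and single-parity schemes lose one disk's worth of capacity.

```python
def usable_disks(level: int, n: int) -> float:
    """Disks' worth of usable capacity for n equal disks (simplified model)."""
    if level == 0:            # RAID-0: striping, no redundancy
        return n
    if level == 1:            # RAID-1: mirroring
        return n / 2
    if level in (3, 4, 5):    # one disk's worth of parity
        return n - 1
    raise ValueError("level not modeled here")

n = 8
for level in (0, 1, 5):
    print(f"RAID-{level}: {usable_disks(level, n):g} of {n} disks usable")
```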
19. Storage Architectures
- Examined various storage architectures (pros and cons):
  - DAS - Directly-Attached Storage
  - NAS - Network-Attached Storage
  - SAN - Storage Area Network
20. Multiprocessors
21. Motivation
- Application needs
- Amdahl's law:
  - T(n) = 1 / (s + p/n)
  - As n → ∞, T(n) → 1/s
- Gustafson's law:
  - T'(n) = s + n·p; as n → ∞, T'(n) → ∞
  (both laws are evaluated in the sketch below)
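A short Python sketch of the two laws as written above, where s is the serial fraction and p = 1 - s is the parallel fraction; the value of s is an assumption for illustration.

```python
s = 0.05          # assumed serial fraction
p = 1 - s         # parallel fraction

def amdahl_speedup(n):
    """T(n) = 1 / (s + p/n); bounded by 1/s as n grows."""
    return 1 / (s + p / n)

def gustafson_speedup(n):
    """T'(n) = s + n*p; grows without bound as n grows."""
    return s + n * p

for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(n), 2), round(gustafson_speedup(n), 2))
print("Amdahl limit:", round(1 / s, 2))
```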
22. Flynn's Taxonomy of Computing
- SISD (Single Instruction, Single Data)
  - Typical uniprocessor systems that we've studied throughout this course.
- SIMD (Single Instruction, Multiple Data)
  - Multiple processors simultaneously executing the same instruction on different data.
  - Specialized applications (e.g., image processing).
- MIMD (Multiple Instruction, Multiple Data)
  - Multiple processors autonomously executing different instructions on different data.
23. Shared Memory Multiprocessors
[Diagram: processors connected to a shared memory.]
24. MPP (Massively Parallel Processing) / Distributed Memory Multiprocessors
[Diagram: processor/cache (P/C) nodes, each with local memory (LM) on a memory bus (MB) and network interface circuitry (NIC), connected by a custom-designed network.]
25. Cluster
[Diagram: processor/cache (P/C) nodes with memory (M) on a memory bus (MB), bridged to an I/O bus (IOB) carrying a local disk (LD) and NIC, connected by a commodity network (Ethernet, ATM, Myrinet).]
26. Grid
[Diagram: processor/cache (P/C) nodes with I/O controllers (IOC), local disks (LD), NICs, and SM modules, connected through hubs/LANs to the Internet.]
27. Multiprocessor Concepts
- SIMD applications (image processing)
- MIMD
  - Shared memory
    - Cache coherence problems
    - Bus scalability problems
  - Distributed memory
    - Interconnection networks
    - Clusters of workstations
28. Preparation Strategy
- Read this review to focus your preparation
  - 1 general question
  - 5-6 other questions
  - Around 50% on memory systems
  - Around 50% on I/O and multiprocessors
- Go through the lecture notes
- Go through the training problems
- We will have more office hours for help
- Good luck!