Title: Final Exam Review
1. Final Exam Review
2. Exam Format
- It will cover the material after the mid-term (caches to multiprocessors)
- It is similar in style to the mid-term exam
- The exam will have 6-7 questions
- One true/false or short-answer question covering general topics
- 5-6 other questions requiring calculation
3. Memory Systems
4. A Typical Memory Hierarchy (With Two Levels of Cache)
[Diagram: memory hierarchy from the L1 cache and second-level (L2) cache (SRAM), through main memory (DRAM), down to virtual memory / secondary storage (disk). Speed ranges from about 1 ns near the CPU to tens of ms (10,000,000s of ns) at the disk, while capacity grows from hundreds of bytes and KBs in the caches to GBs of main memory and TBs of disk storage.]
5. Memory Hierarchy Motivation: The Principle of Locality
- Programs usually access a relatively small portion of their address space (instructions/data) at any instant of time (loops, data arrays).
- Two types of locality:
  - Temporal locality: if an item is referenced, it will tend to be referenced again soon.
  - Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon.
- The presence of locality in program behavior (e.g., loops, data arrays) makes it possible to satisfy a large percentage of program access needs (both instructions and operands) using memory levels with much less capacity than the program address space.
6. Cache Design & Operation Issues
- Q1: Where can a block be placed in the cache? (Block placement strategy / cache organization)
  - Fully associative, set associative, direct mapped.
- Q2: How is a block found if it is in the cache? (Block identification)
  - Tag / block (see the address-breakdown sketch below).
- Q3: Which block should be replaced on a miss? (Block replacement)
  - Random, LRU.
- Q4: What happens on a write? (Cache write policy)
  - Write through, write back.
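As a quick refresher on block identification, here is a minimal Python sketch of how a direct-mapped cache splits an address into tag, index, and offset. The cache size, block size, and address width are assumed values for illustration, not parameters from the slides.

```python
# Hypothetical example: address breakdown for a direct-mapped cache.
# Assumed parameters: 32-bit addresses, 16 KB cache, 16-byte blocks.
cache_size  = 16 * 1024                       # bytes
block_size  = 16                              # bytes
num_blocks  = cache_size // block_size        # 1024 blocks
offset_bits = block_size.bit_length() - 1     # 4 bits
index_bits  = num_blocks.bit_length() - 1     # 10 bits
tag_bits    = 32 - index_bits - offset_bits   # 18 bits

def locate(addr):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = addr & (block_size - 1)
    index  = (addr >> offset_bits) & (num_blocks - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(tag_bits, index_bits, offset_bits)   # 18 10 4
print(locate(0x1234ABCD))
```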
7. Cache Performance: Average Memory Access Time (AMAT), Memory Stall Cycles
- The Average Memory Access Time (AMAT): the number of cycles required to complete an average memory access request by the CPU.
- Memory stall cycles per memory access: the number of stall cycles added to CPU execution cycles for one memory access.
- For an ideal memory, AMAT = 1 cycle, which results in zero memory stall cycles.
- Memory stall cycles per average memory access = AMAT - 1
- Memory stall cycles per average instruction
  = memory stall cycles per average memory access x number of memory accesses per instruction
  = (AMAT - 1) x (1 + fraction of loads/stores)
  (the 1 accounts for the instruction fetch; see the sketch below)
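A minimal Python sketch of these relations. The hit time, miss rate, miss penalty, and load/store fraction are assumed numbers for illustration only.

```python
# Assumed parameters (not from the slides)
hit_time     = 1      # cycles
miss_rate    = 0.05
miss_penalty = 50     # cycles
loads_stores = 0.3    # fraction of instructions that are loads/stores

amat = hit_time + miss_rate * miss_penalty    # cycles per memory access
stalls_per_access = amat - 1                  # ideal memory = 1 cycle
accesses_per_instr = 1 + loads_stores         # instruction fetch + data access
stalls_per_instr = stalls_per_access * accesses_per_instr

print(f"AMAT = {amat} cycles")
print(f"Memory stall cycles per instruction = {stalls_per_instr:.2f}")
```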
8. Cache Performance
- Unified cache: for a CPU with a single level (L1) of cache for both instructions and data and no stalls for cache hits:
  - CPU time = IC x (CPIexecution + Mem stall cycles per instruction) x Clock cycle time
  - CPU time = IC x (CPIexecution + Memory accesses/instruction x Miss rate x Miss penalty) x Clock cycle time
- Split cache: for a CPU with separate (split) level-one (L1) caches for instructions and data and no stalls for cache hits:
  - CPU time = IC x (CPIexecution + Mem stall cycles per instruction) x Clock cycle time
  - Mem stall cycles per instruction = Instruction fetch miss rate x Miss penalty + Data memory accesses per instruction x Data miss rate x Miss penalty
  (both cases are worked in the sketch below)
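A short Python sketch evaluating the unified- and split-cache formulas above. All parameter values (instruction count, base CPI, miss rates, penalty) are assumptions chosen for illustration.

```python
# Assumed parameters (not from the slides)
IC             = 1_000_000   # instruction count
CPI_exec       = 1.1         # base CPI with no memory stalls
clock_cycle    = 1e-9        # seconds
M              = 50          # miss penalty in cycles
data_per_instr = 0.3         # data memory accesses per instruction

# Unified L1: one miss rate for both instructions and data
unified_miss_rate = 0.02
stalls_unified = (1 + data_per_instr) * unified_miss_rate * M
cpu_time_unified = IC * (CPI_exec + stalls_unified) * clock_cycle

# Split L1: separate instruction and data miss rates
i_miss_rate, d_miss_rate = 0.01, 0.04
stalls_split = i_miss_rate * M + data_per_instr * d_miss_rate * M
cpu_time_split = IC * (CPI_exec + stalls_split) * clock_cycle

print(f"Unified: {cpu_time_unified*1e3:.2f} ms, Split: {cpu_time_split*1e3:.2f} ms")
```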
9. Cache Performance (Various Factors)
- Cache impact on performance
  - With and without a cache
  - Processor clock rate
- Which one performs better, unified or split?
  - Assuming the same total size
- What is the effect of cache organization on cache performance? 1-way vs. 8-way set associative
  - Tradeoffs between hit time and hit rate (illustrated in the sketch below)
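A small sketch of the hit-time vs. hit-rate tradeoff: higher associativity tends to lower the miss rate but can lengthen the hit time. The hit times and miss rates below are assumed numbers, not data from the course.

```python
miss_penalty = 50   # cycles (assumed)

# 1-way (direct mapped): fast hit, higher miss rate (assumed values)
amat_1way = 1.0 + 0.040 * miss_penalty
# 8-way set associative: slower hit, lower miss rate (assumed values)
amat_8way = 1.2 + 0.030 * miss_penalty

print(f"AMAT 1-way = {amat_1way:.2f} cycles")   # 3.00
print(f"AMAT 8-way = {amat_8way:.2f} cycles")   # 2.70
# Whichever AMAT is lower wins for that workload; the answer can flip
# as miss rates, hit times, or the clock cycle change.
```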
10. Cache Performance (Various Factors)
- What is the effect of the write policy on cache performance? Write back vs. write through; write allocate vs. no-write allocate
  - Stall cycles per memory access = % reads x (1 - H1) x M + % writes x M
  - Stall cycles per memory access = (1 - H1) x (M x % clean + 2M x % dirty)
- What is the effect of additional cache levels on performance?
  - Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1)(1 - H2) x M
  - Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1) x (1 - H2) x H3 x T3 + (1 - H1)(1 - H2)(1 - H3) x M
  (these formulas are evaluated in the sketch below)
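A Python sketch plugging assumed values into the formulas above, where H1/H2 are hit rates, T2 is the L2 access time, and M is the main-memory penalty. All numeric values are assumptions for illustration.

```python
# Assumed parameters (not from the slides)
M  = 50                       # main-memory access penalty, cycles
H1 = 0.95                     # L1 hit rate
reads, writes = 0.75, 0.25    # fractions of memory accesses

# Write through (no write buffer): every write goes to memory
stalls_wt = reads * (1 - H1) * M + writes * M

# Write back: a miss costs M, plus another M if the victim block is dirty
clean, dirty = 0.7, 0.3
stalls_wb = (1 - H1) * (M * clean + 2 * M * dirty)

# Two cache levels: L2 hit rate H2, L2 access time T2
H2, T2 = 0.80, 10
stalls_l2 = (1 - H1) * H2 * T2 + (1 - H1) * (1 - H2) * M

print(f"Write through: {stalls_wt:.2f} stall cycles per access")
print(f"Write back:    {stalls_wb:.2f} stall cycles per access")
print(f"Two levels:    {stalls_l2:.2f} stall cycles per access")
```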
11. Reducing Misses (3 Cs)
- Classifying cache misses: the 3 Cs
  - Compulsory (misses that occur even with an infinite-size cache)
  - Capacity (misses due to the size of the cache)
  - Conflict (misses due to the associativity and size of the cache)
- How to reduce the 3 Cs (miss rate):
  - Increase block size
  - Increase associativity
  - Use a victim cache
  - Use a pseudo-associative cache
  - Use a prefetching technique
12. Memory Interleaving: Reduce Miss Penalty
- Default (no interleaving): must finish accessing one word before starting the next access
  - Miss penalty = (1 + 25 + 1) x 4 = 108 cycles
- Interleaving: begin accessing one word and, while waiting, start accessing the other three words (pipelining)
  - Miss penalty = 30 cycles
  - Requires 4 separate memory banks, each 1/4 the size
  - Spread out addresses among the memories
  - Interleaving works perfectly with caches (see the sketch below)
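A short Python sketch reproducing the two miss-penalty numbers above, using the timing assumptions stated on the next slide (1 cycle to send the address, 25 cycles of DRAM access per word, 1 cycle to transfer a word, 4-word block).

```python
addr, access, xfer, words = 1, 25, 1, 4

# No interleaving: each word access completes before the next starts
penalty_default = (addr + access + xfer) * words     # (1+25+1)*4 = 108

# 4-way interleaving: accesses overlap, only the word transfers serialize
penalty_interleaved = addr + access + xfer * words   # 1+25+4 = 30

print(penalty_default, penalty_interleaved)          # 108 30
```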
13. Memory Interleaving: An Example
- Given the following system parameters with a single cache level (L1):
  - Block size = 1 word; memory bus width = 1 word; miss rate = 3%; miss penalty = 27 cycles
    (1 cycle to send the address, 25 cycles access time per word, 1 cycle to send a word)
  - Memory accesses/instruction = 1.2; ideal CPI (ignoring cache misses) = 2
  - Miss rate (block size = 2 words) = 2%; miss rate (block size = 4 words) = 1%
- The CPI of the base machine with 1-word blocks = 2 + (1.2 x 0.03 x 27) = 2.97
- Increasing the block size to two words gives the following CPI:
  - 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.02 x 2 x 27) = 3.29
  - 32-bit bus and memory, interleaved: 2 + (1.2 x 0.02 x 28) = 2.67
- Increasing the block size to four words gives the following CPI:
  - 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.01 x 4 x 27) = 3.29
  - 32-bit bus and memory, interleaved: 2 + (1.2 x 0.01 x 30) = 2.36
  (the same calculations are reproduced in code below)
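The same CPI calculations, written out in Python so each case can be checked quickly. All numbers come from the example above.

```python
ideal_cpi = 2
accesses  = 1.2      # memory accesses per instruction
M_word    = 27       # miss penalty per word (1 + 25 + 1 cycles)

cpi_1w     = ideal_cpi + accesses * 0.03 * M_word        # 2.97
cpi_2w     = ideal_cpi + accesses * 0.02 * 2 * M_word    # ~3.3
cpi_2w_int = ideal_cpi + accesses * 0.02 * 28            # ~2.67
cpi_4w     = ideal_cpi + accesses * 0.01 * 4 * M_word    # ~3.3
cpi_4w_int = ideal_cpi + accesses * 0.01 * 30            # 2.36

for name, cpi in [("1 word", cpi_1w), ("2 words", cpi_2w),
                  ("2 words, interleaved", cpi_2w_int),
                  ("4 words", cpi_4w), ("4 words, interleaved", cpi_4w_int)]:
    print(f"{name:22s} CPI = {cpi:.2f}")
```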
14. Cache vs. Virtual Memory
- Motivation for virtual memory (physical memory size, multiprogramming)
- The concept behind VM is almost identical to the concept behind caches
- But different terminology!
  - Cache: block = VM: page
  - Cache: cache miss = VM: page fault
- Caches are implemented completely in hardware; VM is implemented in software, with hardware support from the CPU
- The cache speeds up main memory access, while main memory speeds up VM access
- Translation Look-Aside Buffer (TLB)
- How to calculate the size of the page tables for a given memory system (a sample calculation follows below)
- How to calculate the page size given the size of the page table
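A minimal Python sketch of a page-table size calculation. The address width, page size, and page-table-entry size are assumed values for illustration, not figures from the course.

```python
# Assumed parameters: 32-bit virtual addresses, 4 KB pages, 4-byte PTEs
virtual_bits = 32
page_size    = 4 * 1024      # bytes
pte_size     = 4             # bytes per page table entry

num_pages       = 2**virtual_bits // page_size    # 2^20 pages
page_table_size = num_pages * pte_size            # 4 MB per process

print(f"{num_pages} pages, page table = {page_table_size / 2**20:.0f} MB")
```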
15. I/O Systems
17. I/O Concepts
- Disk performance
  - Disk latency = average seek time + average rotational delay + transfer time + controller overhead (evaluated in the sketch below)
- Interrupt-driven I/O
- Memory-mapped I/O
- I/O channels
- DMA (Direct Memory Access)
- I/O communication protocols
  - Daisy chaining
  - Polling
- I/O buses
  - Synchronous vs. asynchronous
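A Python sketch of the disk-latency formula above. The seek time, rotation speed, transfer rate, and controller overhead are assumed values chosen only to show the arithmetic.

```python
# Assumed parameters (not from the slides)
avg_seek_ms   = 5.0
rpm           = 10_000
avg_rot_ms    = 0.5 * (60_000 / rpm)              # half a rotation = 3 ms
sector_kb     = 4
transfer_ms   = sector_kb / (100 * 1024) * 1000   # at 100 MB/s, ~0.04 ms
controller_ms = 0.2

latency_ms = avg_seek_ms + avg_rot_ms + transfer_ms + controller_ms
print(f"Average disk access time = {latency_ms:.2f} ms")
```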
18. RAID Systems
- Examined various RAID architectures (RAID-0 to RAID-5): cost, performance (bandwidth, I/O request rate)
  - RAID-0: no redundancy
  - RAID-1: mirroring
  - RAID-2: memory-style ECC
  - RAID-3: bit-interleaved parity
  - RAID-4: block-interleaved parity
  - RAID-5: block-interleaved distributed parity
  (a capacity comparison sketch follows below)
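A hypothetical helper (not from the slides) comparing usable capacity for a few of these levels, assuming N equal-size disks: striping uses all N, mirroring halves them, and single-parity schemes lose one disk's worth of capacity.

```python
def usable_disks(level: int, n: int) -> float:
    """Disks' worth of usable capacity for n equal disks (simplified model)."""
    if level == 0:            # RAID-0: striping, no redundancy
        return n
    if level == 1:            # RAID-1: mirroring
        return n / 2
    if level in (3, 4, 5):    # one disk's worth of parity
        return n - 1
    raise ValueError("level not modeled here")

n = 8
for level in (0, 1, 5):
    print(f"RAID-{level}: {usable_disks(level, n):g} of {n} disks usable")
```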
19. Storage Architectures
- Examined various storage architectures (pros and cons):
  - DAS - Directly-Attached Storage
  - NAS - Network-Attached Storage
  - SAN - Storage Area Network
20. Multiprocessors
21. Motivation
- Application needs
- Amdahl's law:
  - T(n) = 1 / (s + p/n)
  - As n → ∞, T(n) → 1/s
- Gustafson's law:
  - T'(n) = s + n·p; as n → ∞, T'(n) → ∞
  (both laws are evaluated in the sketch below)
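A short Python sketch of the two laws as written above, where s is the serial fraction and p = 1 - s is the parallel fraction; the value of s is an assumption for illustration.

```python
s = 0.05          # assumed serial fraction
p = 1 - s         # parallel fraction

def amdahl_speedup(n):
    """T(n) = 1 / (s + p/n); bounded by 1/s as n grows."""
    return 1 / (s + p / n)

def gustafson_speedup(n):
    """T'(n) = s + n*p; grows without bound as n grows."""
    return s + n * p

for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(n), 2), round(gustafson_speedup(n), 2))
print("Amdahl limit:", round(1 / s, 2))
```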
22. Flynn's Taxonomy of Computing
- SISD (Single Instruction, Single Data)
  - Typical uniprocessor systems that we've studied throughout this course.
- SIMD (Single Instruction, Multiple Data)
  - Multiple processors simultaneously executing the same instruction on different data.
  - Specialized applications (e.g., image processing).
- MIMD (Multiple Instruction, Multiple Data)
  - Multiple processors autonomously executing different instructions on different data.
23. Shared Memory Multiprocessors
[Diagram: processors connected to a shared memory.]
24. MPP (Massively Parallel Processing) / Distributed Memory Multiprocessors
[Diagram: processor/cache (P/C) nodes, each with local memory (LM) on a memory bus (MB) and network interface circuitry (NIC), connected by a custom-designed network.]
25. Cluster
[Diagram: processor/cache (P/C) nodes with memory (M) on a memory bus (MB), bridged to an I/O bus (IOB) carrying a local disk (LD) and NIC, connected by a commodity network (Ethernet, ATM, Myrinet).]
26. Grid
[Diagram: processor/cache (P/C) nodes with I/O controllers (IOC), local disks (LD), NICs, and SM modules, connected through hubs/LANs to the Internet.]
27. Multiprocessor Concepts
- SIMD applications (image processing)
- MIMD
  - Shared memory
    - Cache coherence problems
    - Bus scalability problems
  - Distributed memory
    - Interconnection networks
    - Clusters of workstations
28. Preparation Strategy
- Read this review to focus your preparation
  - 1 general question
  - 5-6 other questions
  - Around 50% on memory systems
  - Around 50% on I/O and multiprocessors
- Go through the lecture notes
- Go through the training problems
- We will have more office hours for help
- Good luck!