Final Exam Review - PowerPoint PPT Presentation

1
Final Exam Review
2
Exam Format
  • It will cover material after the mid-term (cache
    to multiprocessors)
  • It is similar in style to the mid-term exam
  • There will be 6-7 questions in the exam
  • One true/false or short-answer question covering
    general topics
  • 5-6 other questions require calculation

3
Memory Systems
4
A Typical Memory Hierarchy (With Two Levels of
Cache)
[Figure: hierarchy from fastest/smallest to slowest/largest: L1 cache, Second Level Cache (SRAM) L2, Main Memory (DRAM), and Virtual Memory / Secondary Storage (Disk). Speed (ns): 1s at the top, through 10s and 100s, to 10,000,000s (10s of ms) for disk. Size (bytes): 100s and Ks at the top, through Ms and Gs, to Ts. Capacity grows toward the bottom.]
5
Memory Hierarchy Motivation: The Principle of
Locality
  • Programs usually access a relatively small portion of their address space (instructions/data) at any instant of time (loops, data arrays).
  • Two types of locality:
  • Temporal Locality: If an item is referenced, it will tend to be referenced again soon.
  • Spatial Locality: If an item is referenced, items whose addresses are close by will tend to be referenced soon.
  • The presence of locality in program behavior (e.g., loops, data arrays) makes it possible to satisfy a large percentage of program access needs (both instructions and operands) using memory levels with much less capacity than the program address space.
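As a concrete illustration (a Python sketch, not from the slides), a simple summation loop exhibits both kinds of locality at once:

```python
# Illustrative sketch (not from the slides): a running-sum loop shows
# both kinds of locality described above.
def sum_array(data):
    total = 0          # 'total' is reused every iteration: temporal locality
    for x in data:     # elements are visited in address order: spatial locality
        total += x
    return total

print(sum_array(list(range(10))))  # prints 45
```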

6
Cache Design and Operation Issues
  • Q1: Where can a block be placed in cache? (Block placement strategy / Cache organization)
  • Fully Associative, Set Associative, Direct Mapped.
  • Q2: How is a block found if it is in cache? (Block identification)
  • Tag/Block.
  • Q3: Which block should be replaced on a miss? (Block replacement)
  • Random, LRU.
  • Q4: What happens on a write? (Cache write policy)
  • Write through, write back.
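For Q1 and Q2, placement and identification follow from splitting the address into tag, index, and offset fields. A sketch with assumed parameters (64-byte blocks, 128 sets, direct mapped; these numbers are illustrative, not from the slides):

```python
# Assumed example geometry (not from the slides): direct-mapped cache,
# 64-byte blocks, 128 sets.
BLOCK_SIZE = 64    # bytes per block
NUM_SETS   = 128   # lines in a direct-mapped cache

def split_address(addr):
    offset = addr % BLOCK_SIZE                 # byte within the block
    index  = (addr // BLOCK_SIZE) % NUM_SETS   # which line the block maps to (Q1)
    tag    = addr // (BLOCK_SIZE * NUM_SETS)   # stored tag used to identify the block (Q2)
    return tag, index, offset

print(split_address(0x12345))  # prints (9, 13, 5)
```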

7
Cache Performance: Average Memory Access Time
(AMAT), Memory Stall Cycles
  • The Average Memory Access Time (AMAT): The number of cycles required to complete an average memory access request by the CPU.
  • Memory stall cycles per memory access: The number of stall cycles added to CPU execution cycles for one memory access.
  • For an ideal memory, AMAT = 1 cycle; this results in zero memory stall cycles.
  • Memory stall cycles per average memory access = AMAT - 1
  • Memory stall cycles per average instruction
    = Memory stall cycles per average memory access x Number of memory accesses per instruction
    = (AMAT - 1) x (1 + fraction of loads/stores)
    (the 1 accounts for the instruction fetch)

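The formulas above can be sketched with assumed example numbers (AMAT = 1.5 cycles and a 30% load/store fraction are illustrative, not from the slides):

```python
# Assumed example values (not from the slides).
amat = 1.5            # average memory access time in cycles
loads_stores = 0.3    # fraction of instructions that are loads/stores

stalls_per_access = amat - 1
# every instruction fetches itself (the "1") plus its data accesses
stalls_per_instruction = stalls_per_access * (1 + loads_stores)
print(stalls_per_access, stalls_per_instruction)  # prints 0.5 0.65
```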
8
Cache Performance
  • Unified cache: For a CPU with a single level (L1) of cache for both instructions and data and no stalls for cache hits:
  • CPUtime = IC x (CPIexecution + Mem Stall cycles per instruction) x Clock cycle time
  • CPUtime = IC x (CPIexecution + Memory accesses/instruction x Miss rate x Miss penalty) x Clock cycle time
  • Split cache: For a CPU with separate (split) level one (L1) caches for instructions and data and no stalls for cache hits:
  • CPUtime = IC x (CPIexecution + Mem Stall cycles per instruction) x Clock cycle time
  • Mem Stall cycles per instruction = Instruction Fetch Miss rate x Miss Penalty + Data Memory Accesses Per Instruction x Data Miss Rate x Miss Penalty
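The split-cache formula can be exercised with assumed numbers (every parameter below is an illustrative example value, not from the slides):

```python
# Assumed example parameters (not from the slides).
IC = 1_000_000               # instruction count
cpi_execution = 1.1
clock_cycle = 1e-9           # 1 GHz clock
miss_penalty = 50            # cycles

ifetch_miss_rate = 0.02
data_accesses_per_instr = 0.3
data_miss_rate = 0.05

# Split-cache stall formula from the slide above
stalls_per_instr = (ifetch_miss_rate * miss_penalty
                    + data_accesses_per_instr * data_miss_rate * miss_penalty)
cpu_time = IC * (cpi_execution + stalls_per_instr) * clock_cycle
print(round(stalls_per_instr, 2), cpu_time)  # 1.75 stall cycles/instruction
```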

9
Cache Performance (various factors)
  • Cache impact on performance
  • With and without cache
  • Processor clock rate
  • Which performs better: unified or split?
  • Assuming the same size
  • What is the effect of cache organization on cache performance: 1-way vs. 8-way set associative?
  • Tradeoffs between hit time and hit rate

10
Cache Performance (various factors)
  • What is the effect of write policy on cache performance: write back or write through, write allocate vs. no-write allocate?
  • Write through: Stall Cycles Per Memory Access = % reads x (1 - H1) x M + % writes x M
  • Write back: Stall Cycles Per Memory Access = (1 - H1) x (M x % clean + 2M x % dirty)
  • What is the effect of cache levels on performance?
  • Two levels: Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1) x (1 - H2) x M
  • Three levels: Stall cycles per memory access = (1 - H1) x H2 x T2 + (1 - H1) x (1 - H2) x H3 x T3 + (1 - H1) x (1 - H2) x (1 - H3) x M
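These stall formulas can be sketched numerically; the hit rates, penalties, and clean/dirty split below are assumed example values, not from the slides:

```python
# Assumed example values (not from the slides).
H1, H2 = 0.90, 0.80   # L1 hit rate; fraction of L1 misses that hit in L2
T2, M = 10, 100       # L2 access penalty and memory access penalty (cycles)

# Write back, single level: misses to clean blocks cost M, dirty blocks 2M
clean, dirty = 0.7, 0.3
wb_stalls = (1 - H1) * (M * clean + 2 * M * dirty)

# Two cache levels
l2_stalls = (1 - H1) * H2 * T2 + (1 - H1) * (1 - H2) * M

print(round(wb_stalls, 1), round(l2_stalls, 1))  # prints 13.0 2.8
```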

11
Reducing Misses (3 Cs)
  • Classifying Cache Misses: 3 Cs
  • Compulsory (misses even in an infinite-size cache)
  • Capacity (misses due to the size of the cache)
  • Conflict (misses due to the associativity and size of the cache)
  • How to reduce the 3 Cs (miss rate):
  • Increase Block Size
  • Increase Associativity
  • Use a Victim Cache
  • Use a Pseudo-Associative Cache
  • Use a prefetching technique

12
Memory Interleaving: Reduce Miss Penalty
  • Default: must finish accessing one word before starting the next access: (1 + 25 + 1) x 4 = 108 cycles.
  • Interleaving: begin accessing one word and, while waiting, start accessing the other three words (pipelining): 30 cycles.
  • Requires 4 separate memories, each 1/4 size.
  • Interleaving works perfectly with caches: spread out addresses among the memories.
13
Memory Interleaving: An Example
  • Given the following system parameters with single cache level L1:
  • Block size = 1 word; Memory bus width = 1 word; Miss rate = 3%; Miss penalty = 27 cycles
  • (1 cycle to send address, 25 cycles access time/word, 1 cycle to send a word)
  • Memory accesses/instruction = 1.2; Ideal CPI (ignoring cache misses) = 2
  • Miss rate (block size = 2 words) = 2%; Miss rate (block size = 4 words) = 1%
  • The CPI of the base machine with 1-word blocks = 2 + (1.2 x 0.03 x 27) = 2.97
  • Increasing the block size to two words gives the following CPI:
  • 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.02 x 2 x 27) = 3.29
  • 32-bit bus and memory, interleaved: 2 + (1.2 x 0.02 x 28) = 2.67
  • Increasing the block size to four words, the resulting CPI:
  • 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.01 x 4 x 27) = 3.29
  • 32-bit bus and memory, interleaved: 2 + (1.2 x 0.01 x 30) = 2.36
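The CPI arithmetic in this example can be checked in code. The interleaved miss penalties (28 and 30 cycles) follow the stated timing: 1 cycle to send the address, 25 cycles of access time, plus 1 cycle per word transferred:

```python
# CPI = base CPI + memory accesses/instr x miss rate x miss penalty
def cpi(base_cpi, accesses_per_instr, miss_rate, miss_penalty):
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty

print(cpi(2, 1.2, 0.03, 27))            # 1-word blocks: 2.97
print(cpi(2, 1.2, 0.02, 2 * 27))        # 2-word, no interleave: 3.29 (2 + 1.296)
print(cpi(2, 1.2, 0.02, 1 + 25 + 2))    # 2-word, interleaved: 2.67
print(cpi(2, 1.2, 0.01, 4 * 27))        # 4-word, no interleave: 3.29 (2 + 1.296)
print(cpi(2, 1.2, 0.01, 1 + 25 + 4))    # 4-word, interleaved: 2.36
```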

14
Cache vs. Virtual Memory
  • Motivation for virtual memory (physical memory size, multiprogramming)
  • Concept behind VM is almost identical to the concept behind cache.
  • But different terminology!
  • Cache: Block / VM: Page
  • Cache: Cache Miss / VM: Page Fault
  • Caches are implemented completely in hardware. VM is implemented in software, with hardware support from the CPU.
  • Cache speeds up main memory access, while main memory speeds up VM access.
  • Translation Look-aside Buffer (TLB)
  • How to calculate the size of page tables for a given memory system
  • How to calculate the size of pages given the size of the page table
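A page-table size calculation can be sketched under assumed parameters (32-bit virtual addresses, 4 KB pages, 4-byte entries; all illustrative, not from the slides):

```python
# Assumed example parameters (not from the slides).
virtual_bits = 32
page_size = 4 * 1024     # bytes per page
pte_size = 4             # bytes per page-table entry

num_pages = 2 ** virtual_bits // page_size   # entries in a flat page table
table_bytes = num_pages * pte_size
print(num_pages, table_bytes)  # prints 1048576 4194304 (a 4 MB table)
```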

15
I/O Systems
16
I/O Systems
17
I/O concepts
  • Disk Performance
  • Disk latency = average seek time + average
    rotational delay + transfer time + controller
    overhead
  • Interrupt-driven I/O
  • Memory-mapped I/O
  • I/O channels
  • DMA (Direct Memory Access)
  • I/O Communication protocols
  • Daisy chaining
  • Polling
  • I/O Buses
  • Synchronous vs. asynchronous
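The disk latency formula above can be sketched with assumed drive parameters (seek time, RPM, transfer rate, request size, and controller overhead below are all illustrative, not from the slides):

```python
# Assumed example drive parameters (not from the slides).
avg_seek_ms = 5.0
rpm = 7200
transfer_mb_s = 100.0
controller_ms = 0.2
request_kb = 4.0

avg_rotational_ms = 0.5 * (60_000 / rpm)              # half a revolution, in ms
transfer_ms = request_kb / 1024 / transfer_mb_s * 1000
latency_ms = avg_seek_ms + avg_rotational_ms + transfer_ms + controller_ms
print(round(latency_ms, 2))
```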

18
RAID Systems
  • Examined various RAID architectures (RAID-0 to
    RAID-5): cost, performance (BW, I/O request rate)
  • RAID-0: No redundancy
  • RAID-1: Mirroring
  • RAID-2: Memory-style ECC
  • RAID-3: Bit-interleaved parity
  • RAID-4: Block-interleaved parity
  • RAID-5: Block-interleaved distributed parity
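The cost side of the comparison can be sketched as usable capacity per level, assuming n identical disks (the array size and disk size below are illustrative, not from the slides):

```python
# Capacity sketch for some of the RAID levels above, assuming n identical disks.
def usable_capacity(level, n_disks, disk_gb):
    if level == 0:              # striping, no redundancy
        return n_disks * disk_gb
    if level == 1:              # mirroring: half the raw capacity
        return n_disks * disk_gb / 2
    if level in (3, 4, 5):      # one disk's worth of parity
        return (n_disks - 1) * disk_gb
    raise ValueError("level not modeled in this sketch")

print(usable_capacity(0, 4, 100),
      usable_capacity(1, 4, 100),
      usable_capacity(5, 4, 100))  # prints 400 200.0 300
```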

19
Storage Architectures
  • Examined various storage architectures (pros
    and cons)
  • DAS - Directly-Attached Storage
  • NAS - Network Attached Storage
  • SAN - Storage Area Network

20
Multiprocessors
21
Motivation
  • Application needs
  • Amdahl's law
  • T(n) = 1 / (s + p/n)
  • As n → ∞, T(n) → 1/s
  • Gustafson's law
  • T'(n) = s + np, so T'(∞) → ∞!

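Both laws can be sketched numerically using the slide's formulas; the serial fraction s = 0.05 and n = 100 below are assumed example values:

```python
# Assumed example values (not from the slides): serial fraction s = 0.05.
s = 0.05
p = 1 - s            # parallel fraction

def amdahl_speedup(n):
    return 1 / (s + p / n)    # bounded by 1/s no matter how large n gets

def gustafson_speedup(n):
    return s + n * p          # scaled speedup grows without bound

print(amdahl_speedup(100), 1 / s, gustafson_speedup(100))
```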
22
Flynn's Taxonomy of Computing
  • SISD (Single Instruction, Single Data)
  • Typical uniprocessor systems that we've studied
    throughout this course.
  • SIMD (Single Instruction, Multiple Data)
  • Multiple processors simultaneously executing the
    same instruction on different data.
  • Specialized applications (e.g., image
    processing).
  • MIMD (Multiple Instruction, Multiple Data)
  • Multiple processors autonomously executing
    different instructions on different data.

23
Shared Memory Multiprocessors
[Figure: processors connected to a common shared memory.]
24
MPP (Massively Parallel Processing): Distributed
Memory Multiprocessors
[Figure: nodes, each containing a processor/cache (P/C), local memory (LM), a memory bus (MB), and network interface circuitry (NIC), connected by a custom-designed network.]
25
Cluster
[Figure: nodes, each with a processor/cache (P/C), memory (M), a bridge, a local disk (LD) on an I/O bus (IOB), and a NIC, connected by a commodity network (Ethernet, ATM, Myrinet).]
26
Grid
[Figure: distributed sites, each with processors/caches (P/C), I/O controllers (IOC), local disks (LD), and shared memory (SM) on a hub/LAN with a NIC, connected over the Internet.]
27
Multiprocessor concepts
  • SIMD Applications (Image processing)
  • MIMD
  • Shared memory
  • Cache coherence problems
  • Bus scalability problems
  • Distributed memory
  • Interconnection networks
  • Cluster of workstations

28
Preparation Strategy
  • Read this review to focus your preparation
  • 1 general question
  • 5-6 other questions
  • Around 50% for memory systems
  • Around 50% for I/O and multiprocessors
  • Go through the lecture notes
  • Go through the training problems
  • We will have more office hours for help
  • Good luck!