Final Exam Review - PowerPoint PPT Presentation
COMP381 by M. Hamdi (39 slides)

Transcript and Presenter's Notes

1
Final Exam Review
2
Exam Format
  • It will cover material after the mid-term (cache
    to multiprocessors)
  • It is similar in style to the mid-term exam
  • The exam will have 6-7 questions
  • One true/false or short-answer question covering
    general topics
  • 5-6 other questions requiring calculation

3
Memory Systems
4
Memory Hierarchy - the Big Picture
  • Problem: memory is too slow and/or too small
  • Solution: a memory hierarchy

[Figure: memory hierarchy pyramid, from the fastest,
smallest, highest-cost level at the top to the slowest,
biggest, lowest-cost level at the bottom, with capacity
growing larger down the hierarchy]
5
Why Hierarchy Works
  • The principle of locality:
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • Temporal locality: recently accessed
    instruction/data is likely to be used again
  • Spatial locality: instruction/data near recently
    accessed instruction/data is likely to be used
    soon
  • Result: the illusion of a large, fast memory

6
Cache Design & Operation Issues
  • Q1: Where can a block be placed in cache?
    (Block placement strategy / cache organization)
  • Fully Associative, Set Associative, Direct
    Mapped.
  • Q2: How is a block found if it is in cache?
    (Block identification)
  • Tag/Block.
  • Q3: Which block should be replaced on a miss?
    (Block replacement)
  • Random, LRU.
  • Q4: What happens on a write? (Cache write
    policy)
  • Write through, write back.

7
Q1: Block Placement
  • Where can a block be placed in cache?
  • In one predetermined place - direct-mapped
  • Use a fragment of the address to calculate the
    block's location in cache
  • Compare the cache block's tag to test if the
    block is present
  • Anywhere in cache - fully associative
  • Compare the tag to every block in cache
  • In a limited set of places - set-associative
  • Use an address fragment to calculate the set
  • Place in any block in the set
  • Compare the tag to every block in the set
  • A hybrid of direct mapped and fully associative

8
Q2: Block Identification
  • Every cache block has an address tag and index
    that identify its location in memory
  • Hit: when the tag and index of the desired word
    match (comparison done by hardware)
  • Q: What happens when a cache block is empty?
    A: Mark this condition with a valid bit

[Figure: example cache line with tag/index 0x00001C0,
valid bit 1, and data 0xff083c2d]
9
Cache Replacement Policy
  • Random
  • Replace a randomly chosen line
  • LRU (Least Recently Used)
  • Replace the least recently used line
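The two policies are easy to see in code. The following is a minimal sketch (not from the slides) of a fully associative cache with LRU replacement; the capacity and block-address trace are made up for illustration:

```python
from collections import OrderedDict

def simulate_lru(capacity, trace):
    """Count misses for a fully associative cache with LRU replacement.
    (Illustrative sketch; capacity and block addresses are made up.)"""
    cache = OrderedDict()  # insertion/access order tracks recency
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark most recently used
        else:
            misses += 1                    # miss: fetch the block
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return misses

# Capacity 2; block 2 is the LRU line when block 3 arrives, so it is evicted
print(simulate_lru(2, [1, 2, 1, 3, 1]))  # -> 3 misses (blocks 1, 2, 3)
```

A random policy would simply evict a randomly chosen line instead of the front of the ordered dict.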

10
Write-through Policy
[Animation: the processor writes 0x5678 over 0x1234;
with write-through, both the cache copy and the memory
copy are updated on every write]
11
Write-back Policy
[Animation: processor writes (0x5678, then 0x9ABC)
update only the cache copy; memory still holds 0x1234
until the dirty block is written back on eviction]
12
Cache Performance: Average Memory Access Time
(AMAT), Memory Stall Cycles
  • The Average Memory Access Time (AMAT): the
    number of cycles required to complete an average
    memory access request by the CPU.
  • Memory stall cycles per memory access: the
    number of stall cycles added to CPU execution
    cycles for one memory access.
  • For an ideal memory, AMAT = 1 cycle; this
    results in zero memory stall cycles.
  • Memory stall cycles per average memory access
    = (AMAT - 1)
  • Memory stall cycles per average instruction
    = Memory stall cycles per average memory access
    x Number of memory accesses per instruction
    = (AMAT - 1) x (1 + fraction of loads/stores),
    where the 1 accounts for the instruction fetch
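The AMAT and stall-cycle formulas above can be sketched numerically. The hit rate, miss penalty, and load/store fraction below are illustrative assumptions, not values from the course:

```python
def amat(h1, m):
    """AMAT = H1 x 1 + (1 - H1) x (M + 1) = 1 + M x (1 - H1)."""
    return 1 + m * (1 - h1)

def stalls_per_instruction(h1, m, load_store_fraction):
    """(AMAT - 1) x (1 + fraction of loads/stores); the 1 is the
    instruction fetch, which is itself a memory access."""
    return (amat(h1, m) - 1) * (1 + load_store_fraction)

# Assumed values: 95% L1 hit rate, 50-cycle miss penalty,
# 30% of instructions are loads/stores
print(round(amat(0.95, 50), 2))                         # -> 3.5
print(round(stalls_per_instruction(0.95, 50, 0.3), 2))  # -> 3.25
```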
13
Cache Performance
  • Unified cache: for a CPU with a single level (L1)
    of cache for both instructions and data and no
    stalls for cache hits:
  • CPUtime = IC x (CPIexecution + Mem Stall
    cycles per instruction) x Clock cycle time
  • CPUtime = IC x (CPIexecution + Memory
    accesses/instruction x Miss rate x Miss penalty)
    x Clock cycle time
  • Split cache: for a CPU with separate (split)
    level one (L1) caches for instructions and
    data and no stalls for cache hits:
  • CPUtime = IC x (CPIexecution + Mem Stall
    cycles per instruction) x Clock cycle time
  • Mem Stall cycles per instruction = Instruction
    Fetch Miss rate x Miss Penalty + Data Memory
    Accesses Per Instruction x Data Miss Rate x Miss
    Penalty
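A quick numeric sketch of the two stall-cycle terms; every miss rate, access count, and penalty below is an assumption for illustration, not a course value:

```python
def unified_stalls(accesses_per_instr, miss_rate, miss_penalty):
    """Mem stall cycles per instruction for a unified L1."""
    return accesses_per_instr * miss_rate * miss_penalty

def split_stalls(ifetch_miss_rate, data_accesses_per_instr,
                 data_miss_rate, miss_penalty):
    """Mem stall cycles per instruction for a split L1:
    instruction-fetch stalls plus data-access stalls."""
    return (ifetch_miss_rate * miss_penalty
            + data_accesses_per_instr * data_miss_rate * miss_penalty)

def cpu_time(ic, cpi_exec, stalls_per_instr, cycle_time):
    """CPUtime = IC x (CPI_execution + stalls/instr) x clock cycle time."""
    return ic * (cpi_exec + stalls_per_instr) * cycle_time

# Assumed rates: 1.3 accesses/instr at a 2% unified miss rate, vs.
# 1% instruction and 4% data miss rates with 0.3 data accesses/instr
print(round(unified_stalls(1.3, 0.02, 50), 2))      # -> 1.3
print(round(split_stalls(0.01, 0.3, 0.04, 50), 2))  # -> 1.1
```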

14
Memory Access Tree for a Unified Level 1 Cache
CPU Memory Access
  • L1 Hit: Hit Rate = H1, Access Time = 1,
    Stalls = H1 x 0 = 0 (no stall)
  • L1 Miss: Miss Rate = (1 - H1),
    Access Time = M + 1,
    Stall cycles per access = M x (1 - H1)
AMAT = H1 x 1 + (1 - H1) x (M + 1) = 1 + M x (1 - H1)
Stall Cycles Per Access = AMAT - 1 = M x (1 - H1)
where M = Miss Penalty, H1 = Level 1 Hit Rate, and
(1 - H1) = Level 1 Miss Rate
15
Memory Access Tree for Separate Level 1 Caches
CPU Memory Access (split into instruction and data
references)
  • Instruction L1 Hit: Access Time = 1, Stalls = 0
  • Instruction L1 Miss: Access Time = M + 1,
    Stalls per access = % instructions x
    (1 - Instruction H1) x M
  • Data L1 Hit: Access Time = 1, Stalls = 0
  • Data L1 Miss: Access Time = M + 1,
    Stalls per access = % data x (1 - Data H1) x M
Stall Cycles Per Access = % instructions x
(1 - Instruction H1) x M + % data x (1 - Data H1) x M
AMAT = 1 + Stall Cycles Per Access

16
Cache Performance (various factors)
  • Cache impact on performance
  • With and without cache
  • Processor clock rate
  • Which one performs better: unified or split?
  • Assuming the same size
  • What is the effect of cache organization on cache
    performance? 1-way vs. 8-way set associative
  • Tradeoffs between hit time and hit rate
17
Cache Performance (various factors)
  • What is the effect of write policy on cache
    performance? Write back vs. write through; write
    allocate vs. no-write allocate
  • Stall Cycles Per Memory Access = % reads x
    (1 - H1) x M + % writes x M
  • Stall Cycles Per Memory Access = (1 - H1) x
    (M x % clean + 2M x % dirty)
  • What is the effect of cache levels on
    performance?
  • Stall cycles per memory access = (1 - H1) x H2 x
    T2 + (1 - H1)(1 - H2) x M
  • Stall cycles per memory access = (1 - H1) x H2
    x T2 + (1 - H1) x (1 - H2) x H3 x T3 +
    (1 - H1)(1 - H2)(1 - H3) x M
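The two- and three-level formulas can be checked with a short sketch. All hit rates, the T2/T3 access times, and M below are assumed numbers; H2 and H3 are local hit rates, as in the formulas:

```python
def two_level_stalls(h1, h2, t2, m):
    """Stalls/access = (1-H1) x H2 x T2 + (1-H1)(1-H2) x M."""
    return (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * m

def three_level_stalls(h1, h2, t2, h3, t3, m):
    """Adds an L3 term: ... + (1-H1)(1-H2)(1-H3) x M."""
    return ((1 - h1) * h2 * t2
            + (1 - h1) * (1 - h2) * h3 * t3
            + (1 - h1) * (1 - h2) * (1 - h3) * m)

# Assumed values: H1 = 95%, local H2 = 60% at T2 = 10 cycles,
# local H3 = 50% at T3 = 30 cycles, main-memory penalty M = 100
print(round(two_level_stalls(0.95, 0.6, 10, 100), 2))             # -> 2.3
print(round(three_level_stalls(0.95, 0.6, 10, 0.5, 30, 100), 2))  # -> 1.6
```

Note how the L3 cuts the stall cycles even though its own access is slow: it intercepts half of the accesses that would otherwise pay the full penalty M.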

18
Performance Equation
To reduce CPUtime, we need to reduce Cache Miss
Rate
19
Reducing Misses (3 Cs)
  • Classifying Cache Misses: the 3 Cs
  • Compulsory (misses even in an infinite-size cache)
  • Capacity (misses due to the size of the cache)
  • Conflict (misses due to the associativity and
    size of the cache)
  • How to reduce the 3 Cs (miss rate):
  • Increase Block Size
  • Increase Associativity
  • Use a Victim Cache
  • Use a Pseudo Associative Cache
  • Use a prefetching technique

20
Performance Equation
To reduce CPUtime, we need to reduce Cache Miss
Penalty
21
Memory Interleaving: Reduce Miss Penalty
  • Default: must finish accessing one word before
    starting the next access:
    (1 + 25 + 1) x 4 = 108 cycles
  • Interleaving: begin accessing one word, and while
    waiting, start accessing the other three words
    (pipelining): 30 cycles
  • Requires 4 separate memory banks, each 1/4 the
    size
  • Interleaving works perfectly with caches: spread
    out addresses among the memories
22
Memory Interleaving: An Example
  • Given the following system parameters with a
    single cache level L1:
  • Block size = 1 word; Memory bus width = 1 word;
    Miss rate = 3%; Miss penalty = 27 cycles
  • (1 cycle to send the address, 25 cycles access
    time/word, 1 cycle to send a word)
  • Memory accesses/instruction = 1.2; Ideal CPI
    (ignoring cache misses) = 2
  • Miss rate (block size = 2 words) = 2%; Miss rate
    (block size = 4 words) = 1%
  • The CPI of the base machine with 1-word blocks =
    2 + (1.2 x 0.03 x 27) = 2.97
  • Increasing the block size to two words gives the
    following CPI:
  • 32-bit bus and memory, no interleaving: 2 + (1.2
    x 0.02 x 2 x 27) = 3.29
  • 32-bit bus and memory, interleaved: 2 + (1.2 x
    0.02 x 28) = 2.67
  • Increasing the block size to four words gives the
    resulting CPI:
  • 32-bit bus and memory, no interleaving: 2 + (1.2
    x 0.01 x 4 x 27) = 3.29
  • 32-bit bus and memory, interleaved: 2 + (1.2 x
    0.01 x 30) = 2.36
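The slide's arithmetic can be verified directly (the slide truncates 3.296 to 3.29; the interleaved penalties 28 and 30 are 1 cycle for the address + 25 cycles access + 1 cycle per word transferred):

```python
def cpi(base_cpi, accesses_per_instr, miss_rate, miss_penalty):
    """CPI = ideal CPI + accesses/instr x miss rate x miss penalty."""
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty

# Parameters from the slide: ideal CPI = 2, 1.2 accesses/instruction
print(round(cpi(2, 1.2, 0.03, 27), 3))          # 1-word blocks        -> 2.972
print(round(cpi(2, 1.2, 0.02, 2 * 27), 3))      # 2 words, default     -> 3.296
print(round(cpi(2, 1.2, 0.02, 1 + 25 + 2), 3))  # 2 words, interleaved -> 2.672
print(round(cpi(2, 1.2, 0.01, 4 * 27), 3))      # 4 words, default     -> 3.296
print(round(cpi(2, 1.2, 0.01, 1 + 25 + 4), 3))  # 4 words, interleaved -> 2.36
```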

23
Cache vs. Virtual Memory
  • Motivation for virtual memory (physical memory
    size, multiprogramming)
  • The concept behind VM is almost identical to the
    concept behind cache.
  • But different terminology!
  • Cache: Block = VM: Page
  • Cache: Cache Miss = VM: Page Fault
  • Caches are implemented completely in hardware; VM
    is implemented in software, with hardware support
    from the CPU.
  • Cache speeds up main memory access, while main
    memory speeds up VM access
  • Translation Look-Aside Buffer (TLB)
  • How to calculate the size of page tables for a
    given memory system
  • How to calculate the size of pages given the size
    of the page table

24
Virtual Memory Definitions
  • Key idea: simulate a larger physical memory than
    is actually available
  • General approach:
  • Break address space up into pages
  • Each program accesses a working set of pages
  • Store pages
  • In physical memory as space permits
  • On disk when no space left in physical memory
  • Access pages using virtual address

[Figure: individual virtual pages mapped through a
memory map either to physical memory or to disk]
25
I/O Systems
26
I/O Systems
27
I/O Concepts
  • Disk performance
  • Disk latency = average seek time + average
    rotational delay + transfer time + controller
    overhead
  • Interrupt-driven I/O
  • Memory-mapped I/O
  • I/O channels
  • DMA (Direct Memory Access)
  • I/O Communication protocols
  • Daisy chaining
  • Polling
  • I/O Buses
  • Synchronous vs. asynchronous
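The disk-latency formula above can be sketched with a short calculation; every drive parameter below is an assumed example, not from the course:

```python
def disk_latency_ms(seek_ms, rpm, bytes_read, transfer_mb_per_s, ctrl_ms):
    """Disk latency = avg seek + avg rotational delay
    + transfer time + controller overhead (all in ms)."""
    rotational_ms = 0.5 * 60_000 / rpm  # average delay = half a revolution
    transfer_ms = bytes_read / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + ctrl_ms

# Assumed drive: 5 ms avg seek, 10,000 RPM, 4 KB read at 40 MB/s,
# 0.1 ms controller overhead
print(round(disk_latency_ms(5, 10_000, 4096, 40, 0.1), 3))  # -> 8.202
```

Seek and rotational delay dominate for small transfers, which is why request scheduling and caching matter so much for disks.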

28
RAID Systems
  • Examined various RAID architectures (RAID-0 to
    RAID-5): cost, performance (BW, I/O request rate)
  • RAID-0: No redundancy
  • RAID-1: Mirroring
  • RAID-2: Memory-style ECC
  • RAID-3: Bit-interleaved parity
  • RAID-4: Block-interleaved parity
  • RAID-5: Block-interleaved distributed parity

29
Storage Architectures
  • Examined various storage architectures (pros and
    cons)
  • DAS - Direct-Attached Storage
  • NAS - Network-Attached Storage
  • SAN - Storage Area Network

30
Multiprocessors
31
Motivation
  • Application needs
  • Amdahl's law:
    T(n) = 1 / (s + p/n)
    As n → ∞, T(n) → 1/s
  • Gustafson's law:
    T'(n) = s + n·p
    T'(∞) → ∞ !!!!
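The two laws can be compared numerically; the 10% serial fraction below is an assumed example (here T(n) and T'(n) denote speedups, with p = 1 - s):

```python
def amdahl_speedup(s, n):
    """T(n) = 1 / (s + p/n), p = 1 - s; as n -> inf, T(n) -> 1/s."""
    return 1 / (s + (1 - s) / n)

def gustafson_speedup(s, n):
    """T'(n) = s + n*p, p = 1 - s; grows without bound with n."""
    return s + n * (1 - s)

# Assumed serial fraction s = 10%
print(round(amdahl_speedup(0.1, 100), 2))     # -> 9.17
print(round(amdahl_speedup(0.1, 10**6), 2))   # -> 10.0 (the limit 1/s)
print(round(gustafson_speedup(0.1, 100), 2))  # -> 90.1
```

Amdahl fixes the problem size, so the serial fraction caps the speedup at 1/s; Gustafson scales the problem with n, so the speedup keeps growing.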
32
Flynn's Taxonomy of Computing
  • SISD (Single Instruction, Single Data)
  • Typical uniprocessor systems that we've studied
    throughout this course.
  • SIMD (Single Instruction, Multiple Data)
  • Multiple processors simultaneously executing the
    same instruction on different data.
  • Specialized applications (e.g., image
    processing).
  • MIMD (Multiple Instruction, Multiple Data)
  • Multiple processors autonomously executing
    different instructions on different data.

33
Shared Memory Multiprocessors
Shared Memory
34
MPP (Massively Parallel Processing): Distributed
Memory Multiprocessors
MB: Memory Bus; NIC: Network Interface Circuitry
[Figure: processor/cache (P/C) nodes, each with local
memory (LM) and a NIC on its memory bus, connected by a
custom-designed network]
35
Cluster
LD: Local Disk; IOB: I/O Bus
[Figure: P/C nodes, each with memory (M) and a bridge
to an I/O bus carrying a local disk (LD) and NIC,
connected by a commodity network (Ethernet, ATM,
Myrinet)]
36
Grid
[Figure: geographically distributed P/C nodes with I/O
controllers (IOC), shared memory (SM), and local disks
(LD), attached to hubs/LANs and interconnected via the
Internet]
37
Multiprocessor concepts
  • SIMD Applications (Image processing)
  • MIMD
  • Shared memory
  • Cache coherence problems
  • Bus scalability problems
  • Distributed memory
  • Interconnection networks
  • Cluster of workstations

38
Preparation Strategy
  • Read this review to focus your preparation
  • 1 general question
  • 5-6 other questions
  • Around 50% on memory systems
  • Around 50% on I/O and multiprocessors
  • Go through the lecture notes
  • Go through the training problems
  • We will have more office hours for help
  • Good luck