CSC324 Machine Organization - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: CSC324 Machine Organization


1
CSC324 Machine Organization
  • Lecture 7
  • Kai Wang
  • Computer Science Department
  • University of South Dakota

http://www.usd.edu/Kai.Wang/csc324/csc324.html
2
5 classic components of any computer
Figure: a personal computer broken into its Processor (CPU), with
Control and Datapath; Memory (passive; where programs and data live
when running); and Devices for Input and Output.
Components of every computer belong to one of these five categories:
Control, Datapath, Memory, Input, and Output.
3
Technology Trends
               Capacity        Speed (latency)
  • Logic:     2x in 3 years   2x in 3 years
  • DRAM:      4x in 3 years   2x in 10 years
  • Disk:      4x in 3 years   2x in 10 years

DRAM generations:
Year   Size     Cycle Time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    120 ns
1998   256 Mb   100 ns
2001   1 Gb     80 ns
Over this period, speed improved only about 2:1,
while capacity improved 1000:1!
4
Memory Trends
  • Users want large and fast memories!
  • SRAM (Static random access memory): the memory
    retains its contents as long as power remains
    applied.
  • DRAM (Dynamic random access memory): it needs to
    be periodically refreshed to keep the data.
  • 2004:
  • SRAM access times are 0.5-5 ns at a cost of $4000
    to $10,000 per GB.
  • DRAM access times are 50-70 ns at a cost of $100
    to $200 per GB.
  • Disk access times are 5 to 20 million ns at a cost
    of $0.50 to $2 per GB.

5
Memory Latency Problem
Processor-DRAM Memory Performance Gap: Motivation for Memory Hierarchy

Figure: performance (log scale, 1 to 1000) versus time, 1980-2000.
CPU performance grows 60%/yr (2X/1.5 yr); DRAM performance grows 5%/yr
(2X/15 yrs). The processor-memory performance gap grows 50%/year.
6
How to Bridge the Processor-DRAM Performance Gap
  • How do we create a memory that is large, cheap,
    and fast (most of the time)?
  • Fact: large memories are slow; fast memories are
    small
  • Hierarchy of Levels
  • Uses smaller and faster memory technologies close
    to the processor
  • Fast access time in highest level of hierarchy
  • Cheap, slow memory furthest from processor
  • The aim of memory hierarchy design is to have
    access time close to the highest level and size
    equal to the lowest level

7
Memory Hierarchy Pyramid
Figure: a pyramid with the Processor (CPU) at the top, connected by
the transfer datapath (bus) to levels 1 through n below. Moving up:
decreasing distance from the CPU and decreasing access time (memory
latency). Moving down: increasing distance from the CPU, decreasing
cost per MB, and increasing size of memory at each level.
8
Why Hierarchy Works: Natural Locality
  • The Principle of Locality:
  • Programs access a relatively small portion of the
    address space at any instant of time

Figure: probability of reference (0 to 1) plotted against memory
address (0 to 2^n - 1); references cluster in narrow regions of the
address space.
9
Why Hierarchy Works: Natural Locality
  • Temporal Locality (Locality in Time): recently
    accessed data tend to be referenced again soon
  • Spatial Locality (Locality in Space): nearby
    items tend to be referenced soon (both kinds are
    illustrated in the sketch below)
  • Data in lower level memory moves to the upper
    level memory
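As a concrete illustration (an assumed example, not from the slides),
a simple C loop that sums an array shows both kinds of locality:

    int sum_array(const int *a, int n) {
        int sum = 0;                 /* sum and i are reused on every
                                        iteration: temporal locality */
        for (int i = 0; i < n; i++)
            sum += a[i];             /* a[0], a[1], ... are consecutive
                                        addresses: spatial locality */
        return sum;
    }

The loop body itself is also local: the same few instructions are
fetched over and over from consecutive addresses.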

10
Today's Situation: Microprocessors
  • Rely on caches to bridge the gap
  • Cache:
  • Originally, the memory between the main memory
    and the processor
  • Today, any storage used to take advantage of
    locality of access
  • Appeared in research computers in the early 1960s
  • Moved to production computers in the late 1960s
  • Example: Alpha
  • 1980: no cache
  • 1997: 2-level cache

11
Memory Hierarchy Terminology
  • Hit: data appear in some block in the upper level
    (example: Block X)
  • Hit Rate: the fraction of memory accesses found in
    the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data need to be retrieved from a block in
    the lower level (Block Y)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level +
  • time to deliver the block to the processor
  • Hit Time << Miss Penalty

Figure: blocks move between the cache (Blk X) and main memory (Blk Y);
data flows to and from the processor through the cache.
12
Current Memory Hierarchy
  • By taking advantage of the principle of locality:
  • Present the user with as much memory as is
    available in the cheapest technology.
  • Provide access at the speed offered by the
    fastest technology.

Figure: processor (control, datapath, registers) connected to the L1
cache, L2 cache, main memory, and secondary memory.

             Regs     L1 cache  L2 Cache  Main Memory  Secondary Memory
Speed (ns)   1 ns     2 ns      6 ns      100 ns       10,000,000 ns
Size (MB)    0.0005   0.1       1-4       100-2000     100,000
Cost ($/MB)  --       $100      $30       $1           $0.05
Technology   Regs     SRAM      SRAM      DRAM         Disk
13
Memory Hierarchy Technology
  • Random Access:
  • "Random" is good: access time is the same for all
    locations
  • DRAM: Dynamic Random Access Memory
  • High density, low power, cheap, slow
  • Dynamic: needs to be refreshed regularly
  • SRAM: Static Random Access Memory
  • Low density, high power, expensive, fast
  • Static: content will last "forever" (until power
    is lost)
  • Not-so-random Access Technology:
  • Access time varies from location to location and
    from time to time
  • Examples: Disk, CDROM
  • Sequential Access Technology: access time linear
    in location (e.g., Tape)

14
Memories overview
  • SRAM:
  • value is stored on a pair of inverting gates
  • very fast, but takes up more space than DRAM (4 to
    6 transistors per bit)
  • DRAM:
  • value is stored as a charge on a capacitor (must
    be refreshed)
  • very small, but slower than SRAM (by a factor of
    5 to 10)

15
How is the hierarchy managed?
  • Registers ↔ Memory:
  • By the compiler (or assembly language programmer)
  • Cache ↔ Main Memory:
  • By hardware
  • Main Memory ↔ Disks:
  • By a combination of hardware and the operating
    system (virtual memory, covered next)
  • By the programmer (files)

16
General Principles of Memory
  • Locality:
  • Temporal Locality: referenced memory is likely to
    be referenced again soon (e.g., code within a
    loop)
  • Spatial Locality: memory close to referenced
    memory is likely to be referenced soon (e.g.,
    data in a sequentially accessed array)
  • Definitions:
  • Upper: memory closer to processor
  • Block: minimum unit that is present or not
    present
  • Block address: location of block in memory
  • Hit: data is found in the desired location
  • Hit time: time to access upper level
  • Miss rate: percentage of time item not found in
    upper level
  • Locality + smaller HW is faster = memory
    hierarchy
  • Levels: each smaller, faster, more expensive/byte
    than level below
  • Inclusive: data found in upper level is also found
    in the lower level

17
Memory Hierarchy
Upper level:  Processor registers (D Flip-Flops)
              L1 Cache (SRAM)
              L2 Cache (SRAM)
              Main Memory (DRAM)
Lower level:  Secondary Storage - Disks (Magnetic)
18
Differences in Memory Levels
19
Measuring Cache Performance
  • Generally, for a program:
  • CPU time = Execution cycles × clock cycle time
  • E.g., 100 cycles to execute a program, each cycle
    takes 100 ns: CPU time = 10000 ns
  • Because of the hierarchical structure of the
    memory, data may not be found by one access
    (cache miss): miss penalty
  • If there are cache misses:
  • CPU time = (Execution cycles + Memory stall
    cycles) × clock cycle time
  • Read-stall cycles = reads × read miss rate ×
    read miss penalty
  • Write-stall cycles = writes × write miss rate ×
    write miss penalty
  • Memory-stall cycles = read-stall + write-stall
  • = memory accesses × miss rate × miss penalty
  • = instructions × misses/instruction × miss
    penalty (sketched in C below)
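A minimal sketch of these formulas in C (the function and variable
names are illustrative, not from the slides):

    /* CPU time with memory stalls, per the formulas above. */
    double cpu_time(double exec_cycles,
                    double reads,  double read_miss_rate,  double read_penalty,
                    double writes, double write_miss_rate, double write_penalty,
                    double cycle_time) {
        double read_stalls  = reads  * read_miss_rate  * read_penalty;
        double write_stalls = writes * write_miss_rate * write_penalty;
        return (exec_cycles + read_stalls + write_stalls) * cycle_time;
    }

With per-instruction counts, the same quantity collapses to the last
form above: instructions × misses/instruction × miss penalty.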

20
Example
  • Q: All instructions take 2.0 cycles in a computer
    with a cache. What is the performance of the
    computer without considering cache misses? Assume
    it has a cache miss penalty of 50 cycles, a cache
    miss rate of 2%, and 1.33 memory references per
    instruction. What is its performance now? How
    about without using the cache?
  • Answer:
  • Without considering the cache misses:
  • CPU time = clock cycles × cycle time = IC ×
    CPI × cycle time
  • = IC × 2.0 × cycle time
  • Performance including cache misses is:
  • CPU time = IC × CPI × cycle time
  • = IC × (2.0 + memory references per instruction
    × miss rate × miss penalty) × cycle time
  • = IC × (2.0 + 1.33 × 0.02 × 50) × cycle time
  • = IC × 3.33 × cycle time
  • Hence, including the memory hierarchy stretches
    CPU time by 1.67
  • Without using the cache, the CPI would increase:
  • CPU time = IC × (2.0 + 50 × 1.33) × cycle
    time
  • = IC × 68.5 × cycle time
  • a factor of over 30 times longer (68.5 vs. 2.0).

21
Four Questions for Memory Hierarchy Designers
  • Q1: Where can a block be placed in the upper
    level? (Block placement)
  • Q2: How is a block found if it is in the upper
    level? (Block identification)
  • Q3: Which block should be replaced on a miss?
    (Block replacement)
  • Q4: What happens on a write? (Write strategy)

22
Cache Organization
  • (1) How do you know if something is in the cache?
  • (2) If it is in the cache, how do you find it?
  • The answers to (1) and (2) depend on the type or
    organization of the cache
  • In a direct mapped cache, each memory address is
    associated with one possible block within the
    cache
  • Therefore, we only need to look in a single
    location in the cache for the data, if it exists
    in the cache

23
Simplest Cache: Direct Mapped
Figure: a 4-block direct mapped cache (cache indices 0-3) beside a
16-block memory (block addresses 0-15; e.g., 0 = 0000two,
4 = 0100two, 8 = 1000two, 12 = 1100two).
  • Block Size: 32/64 Bytes
  • Cache Block 0 can be occupied by data from
  • Memory blocks 0, 4, 8, 12
  • Cache Block 1 can be occupied by data from
  • Memory blocks 1, 5, 9, 13
24
Simplest Cache: Direct Mapped
Figure: the same 4-block direct mapped cache and 16-block main memory;
each memory block address (e.g., 0010, 0110, 1010, 1110) splits into a
tag and an index, and the index selects the cache block.
  • index determines block in cache (see the sketch
    below)
  • index = (address) mod (# blocks)
  • If the number of cache blocks is a power of 2, the
    cache index is just the lower n bits of the memory
    address, n = log2(# blocks)
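A small C sketch of the index computation (the helper names are
assumed, not from the slides):

    #include <stdint.h>

    /* index = (block address) mod (# blocks) */
    unsigned cache_index(uint32_t block_addr, unsigned n_blocks) {
        return block_addr % n_blocks;
    }

    /* When # blocks is a power of 2, the mod reduces to keeping the
       lower n bits of the address, n = log2(# blocks). */
    unsigned cache_index_pow2(uint32_t block_addr, unsigned n_bits) {
        return block_addr & ((1u << n_bits) - 1);
    }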
25
Simplest Cache: Direct Mapped w/Tag
Figure: the direct mapped cache, now with a tag field stored next to
each cache block's data; memory block addresses 0010, 0110, 1010, and
1110 share the same cache index (10) and are distinguished by their
tags.
  • tag determines which memory block occupies a
    cache block
  • tag = left-hand bits of the address
  • hit: cache tag field = tag bits of address
  • miss: cache tag field ≠ tag bits of address
26
Finding Item within Block
  • In reality, a cache block consists of a number of
    bytes/words (32 or 64 bytes) to (1) increase the
    cache hit rate due to the locality property and
    (2) reduce the cache miss time.
  • Mapping: memory block i can be mapped to cache
    block frame i mod x, where x is the number of
    blocks in the cache
  • Called congruent mapping
  • Given the address of an item, the index tells
    which block of the cache to look in
  • Then, how do we find the requested item within the
    cache block?
  • Or, equivalently: what is the byte offset of the
    item within the cache block? (see the sketch below)
27
Issues with Direct-Mapped
  • If block size > 1, the rightmost bits of the
    index are really the offset within the indexed
    block
28
Accessing data in a direct mapped cache
  • Three types of events:
  • cache miss: nothing in the cache in the
    appropriate block, so fetch from memory
  • cache hit: cache block is valid and contains the
    proper address, so read the desired word
  • cache miss, block replacement: wrong data is in
    the cache at the appropriate block, so discard it
    and fetch the desired data from memory
  • Cache Access Procedure: (1) Use the index bits to
    select the cache block. (2) If the valid bit is 1,
    compare the tag bits of the address with the cache
    block tag bits. (3) If they match, use the offset
    to read out the word/byte (see the sketch below).
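A minimal C sketch of this procedure for a direct-mapped cache (the
sizes, struct layout, and names are assumptions for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS      1024   /* power of 2 -> 10 index bits */
    #define WORDS_PER_BLOCK 4      /* 16-byte blocks */

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint32_t data[WORDS_PER_BLOCK];
    };

    static struct cache_line cache[NUM_BLOCKS];

    /* Returns true on a hit and stores the word through word_out. */
    bool cache_read(uint32_t addr, uint32_t *word_out) {
        uint32_t word  = (addr >> 2) & (WORDS_PER_BLOCK - 1); /* word offset */
        uint32_t index = (addr >> 4) & (NUM_BLOCKS - 1);      /* (1) select block */
        uint32_t tag   = addr >> 14;                          /* upper bits */

        struct cache_line *line = &cache[index];
        if (line->valid && line->tag == tag) {  /* (2) valid bit and tag match */
            *word_out = line->data[word];       /* (3) offset selects the word */
            return true;                        /* hit */
        }
        return false;  /* miss: stall, fetch block from memory (not shown) */
    }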

29
Data valid, tag OK, so read offset, return word d
  • Address: 000000000000000000 0000000001 1100
    (18-bit tag, 10-bit index, 4-bit byte offset)

Figure: the cache line at index 1 is valid and its stored tag matches
the address tag (all zeros), so the access is a hit; offset 1100
selects word d from the block's four words a, b, c, d.
30
An Example Cache: DecStation 3100
  • Commercial Workstation, 1985
  • MIPS R2000 Processor (similar to the pipelined
    machine of chapter 6)
  • Separate instruction and data caches:
  • direct mapped
  • 64K Bytes (16K words) each
  • Block Size: 1 Word (low spatial locality)
  • Solution: increase block size (see 2nd example)

31
DecStation 3100 Cache
Figure: the address (showing bit positions) splits into a 16-bit tag
(bits 31-16), a 14-bit index (bits 15-2), and a 2-bit byte offset; the
cache holds 16K entries, each with a valid bit, a 16-bit tag, and 32
bits of data; comparing the stored tag with the address tag produces
the Hit signal. (The field widths are checked below.)
If there is a miss, the cache controller stalls the processor and
loads the data from main memory.
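A quick check of the field widths implied by the figure (written out
as an assumed illustration, not code from the lecture):

    /* 64 KB of data, one 4-byte word per block */
    enum {
        ENTRIES     = (64 * 1024) / 4,               /* 16K = 2^14 blocks   */
        INDEX_BITS  = 14,                            /* log2(16K)           */
        OFFSET_BITS = 2,                             /* log2(4 bytes/block) */
        TAG_BITS    = 32 - INDEX_BITS - OFFSET_BITS  /* = 16, as drawn      */
    };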
32
64KB Cache with 4-word (16-byte) blocks
Figure: the address (showing bit positions 31-0) splits into a 16-bit
tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset;
the cache holds 4K entries, each with a valid bit, a 16-bit tag, and
128 bits (four 32-bit words) of data; the block offset drives a mux
that selects one 32-bit word, and a tag comparator produces the Hit
signal.
33
Miss rates: 1-word vs. 4-word block (cache similar to DecStation 3100)

Block size   Program   I-cache     D-cache     Combined
                       miss rate   miss rate   miss rate
1-word       gcc       6.1%        2.1%        5.4%
1-word       spice     1.2%        1.3%        1.2%
4-word       gcc       2.0%        1.7%        1.9%
4-word       spice     0.3%        0.6%        0.4%
34
Miss Rate Versus Block Size
Figure 7.12 (for a direct mapped cache): miss rate (%, 0 to 40) versus
block size (4 to 256 bytes), with one curve per total cache size
(1 KB, 8 KB, 16 KB, 64 KB, 256 KB). Miss rate falls as block size
grows, but turns back up for large blocks in small caches.
35
Extreme Example: 1-block cache
  • Suppose we choose block size = cache size. Then
    there is only one block in the cache
  • Temporal Locality says if an item is accessed, it
    is likely to be accessed again soon
  • But it is unlikely that it will be accessed again
    immediately!!!
  • The next access is likely to be a miss
  • We continually load data into the cache but are
    forced to discard it before it is used again
  • Worst nightmare of a cache designer: the
    Ping-Pong Effect

36
Block Size and Miss Penalty
  • As block size increases, the cost of a miss
    also increases
  • Miss penalty: time to fetch the block from the
    next lower level of the hierarchy and load it
    into the cache
  • With very large blocks, the increase in miss
    penalty overwhelms the decrease in miss rate
  • Average access time can be minimized if the
    memory system is designed right

37
Block Size Tradeoff
Figure: two curves against block size. Miss rate first falls as larger
blocks exploit spatial locality, then rises when having fewer blocks
compromises temporal locality. Average access time eventually rises
with block size as the increased miss penalty and miss rate take over.