Title: Computer Architecture and Organization, Miles Murdocca and Vincent Heuring
Slide 1: Computer Architecture and Organization
Miles Murdocca and Vincent Heuring
Chapter 7: Memory
Slide 2: Chapter Contents
- 7.1 The Memory Hierarchy
- 7.2 Random-Access Memory
- 7.3 Memory Chip Organization
- 7.4 Case Study: Rambus Memory
- 7.5 Cache Memory
- 7.6 Virtual Memory
- 7.7 Advanced Topics
- 7.8 Case Study: Associative Memory in Routers
- 7.9 Case Study: The Intel Pentium 4 Memory System
Slide 3: The Memory Hierarchy
Slide 4: Functional Behavior of a RAM Cell
Static RAM cell (a) and dynamic RAM cell (b).
Slide 5: Simplified RAM Chip Pinout
Slide 6: A Four-Word Memory with Four Bits per Word in a 2D Organization
Slide 7: A Simplified Representation of the Four-Word by Four-Bit RAM
Slide 8: 2-1/2D Organization of a 64-Word by One-Bit RAM
Slide 9: Two Four-Word by Four-Bit RAMs Are Used in Creating a Four-Word by Eight-Bit RAM
Slide 10: Two Four-Word by Four-Bit RAMs Make up an Eight-Word by Four-Bit RAM
Slide 11: Dual In-Line Memory Module
A 256 MB dual in-line memory module organized for a 64-bit word with sixteen 16M × 8-bit RAM chips (eight chips on each side of the DIMM).
Slide 12: Dual In-Line Memory Module
Schematic diagram of the 256 MB dual in-line memory module. (Source: adapted from http://www-s.ti.com/sc/ds/tm4en64kpu.pdf.)
Slide 13: A ROM Stores Four Four-Bit Words
Slide 14: A Lookup Table (LUT) Implements an Eight-Bit ALU
Slide 15: Flash Memory
(a) External view of flash memory module and (b) flash module internals. (Source: adapted from HowStuffWorks.com.)
Slide 16: Cell Structure for Flash Memory
When a sufficient negative charge is placed on the dielectric material, current flow from source to drain is blocked; this is the logical 0 state. When the dielectric material is not charged, current flows between the bit and word lines, which is the logical 1 state.
Slide 17: Rambus Memory
Comparison of DRAM and RDRAM configurations.
Slide 18: Rambus Memory
Rambus technology on the Nintendo 64
motherboard (left) enables cost savings over the
conventional Sega Saturn motherboard design
(right).
Nintendo 64 game console
Slide 19: Placement of Cache Memory in a Computer System
The locality principle: a recently referenced memory location is likely to be referenced again (temporal locality); a neighbor of a recently referenced memory location is likely to be referenced (spatial locality).
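As an illustration (this loop is not from the slides), the code below exhibits both kinds of locality: the accumulator is referenced on every iteration (temporal), and the array is walked through neighboring addresses (spatial).

```python
# Hypothetical illustration of the locality principle.
data = list(range(1024))

total = 0                 # 'total' is touched on every iteration: temporal locality
for i in range(len(data)):
    total += data[i]      # data[i] and data[i+1] are neighbors: spatial locality
```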
Slide 20: An Associative Mapping Scheme for a Cache Memory
Slide 21: Associative Mapping Example
Consider how an access to memory location (A035F014)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots.
If the addressed word is in the cache, it will be found in word (14)₁₆ of a slot that has tag (501AF80)₁₆, which is made up of the 27 most significant bits of the address. If the addressed word is not in the cache, then the block corresponding to tag field (501AF80)₁₆ is brought into an available slot in the cache from the main memory, and the memory reference is then satisfied from the cache.
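The field values can be checked by decomposing the address in code. A minimal sketch (the variable names are mine, not the slides'):

```python
address = 0xA035F014         # (A035F014)16, a 32-bit main memory address

word = address & 0x1F        # low 5 bits select the word within the 32-word block
tag = address >> 5           # the 27 most significant bits form the tag

print(f"tag  = {tag:07X}")   # -> 501AF80
print(f"word = {word:02X}")  # -> 14
```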
Slide 22: Associative Mapping Area Allocation
Area allocation for associative mapping scheme
based on bits stored
Slide 23: Replacement Policies
When there are no available slots in which to place a block, a replacement policy is implemented. The replacement policy governs the choice of which slot is freed up for the new block. Replacement policies are used for associative and set-associative mapping schemes, and also for virtual memory.
- Least recently used (LRU)
- First-in/first-out (FIFO)
- Least frequently used (LFU)
- Random
- Optimal (used for analysis only: look backward in time and reverse-engineer the best possible strategy for a particular sequence of memory references.)
Slide 24: A Direct Mapping Scheme for Cache Memory
Slide 25: Direct Mapping Example
For a direct mapped cache, each main memory block can be mapped to only one slot, but each slot can receive more than one block. Consider how an access to memory location (A035F014)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots.
If the addressed word is in the cache, it will be found in word (14)₁₆ of slot (2F80)₁₆, which will have a tag of (1406)₁₆.
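The same decomposition for direct mapping, again as a sketch: the middle 14 bits select the slot, so only the leftmost 13 bits need to be stored as the tag.

```python
address = 0xA035F014             # (A035F014)16, a 32-bit main memory address

word = address & 0x1F            # low 5 bits: word within the block
slot = (address >> 5) & 0x3FFF   # next 14 bits: the one slot this block can map to
tag = address >> 19              # leftmost 13 bits: tag stored with the slot

print(f"tag  = {tag:04X}")       # -> 1406
print(f"slot = {slot:04X}")      # -> 2F80
print(f"word = {word:02X}")      # -> 14
```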
Slide 26: Direct Mapping Area Allocation
Area allocation for direct mapping scheme based
on bits stored
Slide 27: A Set-Associative Mapping Scheme for a Cache Memory
Slide 28: Set-Associative Mapping Example
Consider how an access to memory location (A035F014)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, there are two blocks per set, and the cache consists of 2^14 slots.
The leftmost 14 bits form the tag field, followed by 13 bits for the set field, followed by five bits for the word field.
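And the decomposition for the set-associative case; the slide gives only the field widths, so the printed set and tag values below are computed here:

```python
address = 0xA035F014               # (A035F014)16, a 32-bit main memory address

word = address & 0x1F              # low 5 bits: word within the block
set_no = (address >> 5) & 0x1FFF   # next 13 bits: set number (two slots per set)
tag = address >> 18                # leftmost 14 bits: tag

print(f"tag  = {tag:04X}")         # -> 280D
print(f"set  = {set_no:04X}")      # -> 1F80
print(f"word = {word:02X}")        # -> 14
```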
Slide 29: Set-Associative Mapping Area Allocation
Area allocation for set-associative mapping scheme based on bits stored
Slide 30: Cache Read and Write Policies
Slide 31: Hit Ratios and Effective Access Times
Hit ratio and effective access time for a single-level cache.
Hit ratios and effective access time for a multi-level cache.
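The equations on this slide are figures that did not survive extraction. Reconstructed in the standard form consistent with the worked examples on slides 32-36, they are

$$\mathrm{EAT} = H \cdot T_{\text{hit}} + (1 - H)\,T_{\text{miss}}$$

for a single-level cache with hit ratio $H$, and

$$\mathrm{EAT} = H_1 T_{L1} + (1 - H_1)\left[H_2 T_{L2} + (1 - H_2)\,T_{\text{miss}}\right]$$

for a two-level cache, where $H_1$ is the L1 hit ratio and $H_2$ is the hit ratio among the references that reach the L2 cache.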
Slide 32: Direct Mapped Cache Example
Compute hit ratio and effective access time for
a program that executes from memory locations 48
to 95, and then loops 10 times from 15 to 31.
The direct mapped cache has four 16-word slots, a
hit time of 80 ns, and a miss time of 2500 ns.
Load-through is used. The cache is initially
empty.
Slide 33: Table of Events for Example Program
Slide 34: Calculation of Hit Ratio and Effective Access Time for Example Program
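The table of events and the calculation are figures that did not survive extraction. As a reconstruction sketch (the simulator below is mine, but its totals follow from the problem statement on slide 32), replaying the reference pattern against the four-slot direct mapped cache gives 5 misses out of 218 references, a hit ratio of 213/218 ≈ 0.977, and an effective access time of (213 × 80 + 5 × 2500)/218 ≈ 135.5 ns:

```python
SLOTS, BLOCK_WORDS = 4, 16       # four 16-word slots
HIT_NS, MISS_NS = 80, 2500       # hit time and miss time (load-through)

# Reference pattern: locations 48..95 once, then 15..31 ten times.
refs = list(range(48, 96)) + list(range(15, 32)) * 10

cache = [None] * SLOTS           # cache[slot] = main memory block held in that slot
hits = misses = 0
for addr in refs:
    block = addr // BLOCK_WORDS  # main memory block containing this address
    slot = block % SLOTS         # direct mapping: each block has exactly one slot
    if cache[slot] == block:
        hits += 1
    else:
        misses += 1
        cache[slot] = block      # bring the block in, evicting the previous one

total = hits + misses
eat = (hits * HIT_NS + misses * MISS_NS) / total
print(hits, misses, hits / total, eat)   # -> 213 5 0.977... 135.5
```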
Slide 35: Multi-level Cache Memory
As an example, consider a two-level cache in which the L1 hit time is 5 ns, the L2 hit time is 20 ns, and the L2 miss time is 100 ns. There are 10,000 memory references, of which 10 cause L2 misses and 90 cause L1 misses that hit in L2. Compute the hit ratios of the L1 and L2 caches and the overall effective access time. H1 is the ratio of the number of times the accessed word is in the L1 cache to the total number of memory accesses. Every L2 miss is also an L1 miss, so there are a total of 100 L1 misses, and so
H1 = (10,000 - 100) / 10,000 = 0.99
(Continued on next slide.)
Slide 36: Multi-level Cache Memory (Cont)
H2 is the ratio of the number of times the accessed word is in the L2 cache to the number of times the L2 cache is accessed, and so
H2 = 90 / 100 = 0.90
The effective access time is then
EAT = (0.99)(5 ns) + (0.01)[(0.90)(20 ns) + (0.10)(100 ns)] = 5.23 ns per access.
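A short check of the arithmetic above (a sketch using the example's figures):

```python
H1, H2 = 9900 / 10000, 90 / 100     # L1 and L2 hit ratios from the example
T_L1, T_L2, T_miss = 5, 20, 100     # access times in ns
eat = H1 * T_L1 + (1 - H1) * (H2 * T_L2 + (1 - H2) * T_miss)
print(f"{eat:.2f} ns per access")   # -> 5.23 ns per access
```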
Slide 37: Neat Little LRU Algorithm
A sequence is shown for the Neat Little LRU
Algorithm for a cache with four slots. Main
memory blocks are accessed in the sequence 0, 2,
3, 1, 5, 4.
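The sequence itself is a figure. Below is a minimal sketch of the bit-matrix formulation this algorithm is usually given in (an assumption on my part, since only the worked sequence appears on the slide): referencing slot i sets row i to all 1s and then clears column i, and the slot whose row holds the fewest 1s is the least recently used.

```python
N = 4                                  # a cache with four slots
matrix = [[0] * N for _ in range(N)]   # N x N bit matrix, initially all zeros

def touch(slot):
    """Reference 'slot': set its row to all 1s, then clear its column."""
    matrix[slot] = [1] * N
    for row in matrix:
        row[slot] = 0

def lru_slot():
    """The least recently used slot is the one whose row holds the fewest 1s."""
    return min(range(N), key=lambda i: sum(matrix[i]))

# Blocks 0, 2, 3, 1 arrive and fill empty slots 0..3 in order.
for slot in range(N):
    touch(slot)

print(lru_slot())   # -> 0: slot 0 (holding block 0) is LRU, so block 5 replaces it
```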
Slide 38: Cache Coherency
The goal of cache coherence is to ensure that every cache sees the same value for a referenced location, which means making sure that any shared operand that is changed is updated throughout the system. This brings us to the issue of false sharing, which reduces cache performance when two operands that are not shared between processes share the same cache line. The situation is shown below. The problem is that each process will invalidate the other's cache line when writing data without a real need, unless the compiler prevents this.
Slide 39: Overlays
A partition graph for a program with a main
routine and three subroutines
Slide 40: Virtual Memory
Virtual memory is stored in a hard disk image. The physical memory holds a small number of virtual pages in physical page frames.
A mapping between a virtual and a physical memory.
Slide 41: Page Table
The page table maps between virtual memory and
physical memory.
Slide 42: Using the Page Table
A virtual address is translated into a physical address.
Typical page table entry.
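As a sketch of the translation step (the page size and table contents here are assumptions for illustration, not values from the slide):

```python
PAGE_SIZE = 1024   # words per page; chosen for illustration only

# Hypothetical page table: virtual page -> (present bit, page frame)
page_table = {0: (1, 7), 1: (0, None), 2: (1, 3)}

def translate(vaddr):
    """Split the virtual address into page number and offset, then swap the
    page number for a page frame; a cleared present bit means a page fault."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    present, frame = page_table[page]
    if not present:
        raise RuntimeError(f"page fault on virtual page {page}")
    return frame * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 5))   # virtual page 2 maps to frame 3 -> 3077
```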
Slide 43: Using the Page Table (Cont)
The configuration of a page table changes as a
program executes. Initially, the page table is
empty. In the final configuration, four pages are
in physical memory.
Slide 44: Segmentation
A segmented memory allows two users to share
the same word processor code, with different data
spaces
Slide 45: Fragmentation
(a) Free area of memory after initialization; (b) after fragmentation; (c) after coalescing.
Slide 46: Translation Lookaside Buffer
An example TLB holds 8 entries for a system
with 32 virtual pages and 16 page frames.
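As a sketch of the idea (the dict-based structures below are illustrative, not the slide's hardware): the TLB is a small cache of recent page-to-frame translations that is consulted before the page table.

```python
TLB_ENTRIES = 8
tlb = {}   # recent translations: virtual page -> page frame

def lookup(page, page_table):
    """Return the page frame for 'page', trying the TLB before the page table."""
    if page in tlb:
        return tlb[page]            # TLB hit: no page table access needed
    frame = page_table[page]        # TLB miss: consult the page table
    if len(tlb) >= TLB_ENTRIES:
        tlb.pop(next(iter(tlb)))    # evict the oldest entry (FIFO, for simplicity)
    tlb[page] = frame               # cache the translation for next time
    return frame
```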
Slide 47: Putting It All Together
An example TLB holds 8 entries for a system
with 32 virtual pages and 16 page frames.
Slide 48: Content Addressable Memory Addressing
Relationships between random access memory and
content addressable memory
Slide 49: Overview of CAM
(Source: Foster, C. C., Content Addressable Parallel Processors, Van Nostrand Reinhold Company, 1976.)
Slide 50: Addressing Subtrees for a CAM
Slide 51: Associative Memory in Routers
A simple network with three routers.
The use of associative memories in high-end routers reduces the lookup time by allowing a search to be performed in a single operation. The search is based on the destination address, rather than the physical memory address. Access methods for this memory have been standardized into an interface interoperability agreement by the Network Processing Forum.
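As a rough software analogy (a real router uses a ternary CAM that compares all entries in parallel and supports longest-prefix matching; the exact-match table below is a simplification, and all names are mine):

```python
# Software analogy of a CAM-based forwarding table: entries are found by
# content (the destination), not by a physical memory address.
forwarding_table = {
    "192.168.1.0/24": "port 1",
    "10.0.0.0/8": "port 2",
}

def next_hop(destination_prefix):
    return forwarding_table.get(destination_prefix)   # one content-keyed lookup

print(next_hop("10.0.0.0/8"))   # -> port 2
```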
Slide 52: Block Diagram of Dual-Read RAM
A dual-read or dual-port RAM allows any two
words to be simultaneously read from the same
memory.
Slide 53: The Intel Pentium 4 Memory System