Internal Memory


Slides: 84

Provided by: Adria172

1
William Stallings Computer Organization and
Architecture
Chapter 4 Internal Memory
2
The four-level memory hierarchy
Computer memory is organized into a hierarchy.
Going down the hierarchy: decreasing cost/bit, increasing capacity, slower access time, and decreasing frequency of access of the memory by the processor.
The cache automatically retains a copy of some of the recently used words from the DRAM.
3
Memory Hierarchy

Registers
In CPU
Internal or Main memory
May include one or more levels of cache
RAM
External memory
Backing store

4
4.1 COMPUTER MEMORY SYSTEM OVERVIEW

Characteristics of Memory Systems
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation

5
Location

The term location refers to whether memory is
internal or external to the computer.
CPU
The processor requires its own local memory, in
the form of registers.
Internal
Main memory, cache
External
Peripheral storage devices, such as disk and tape

6
Capacity

Internal memory capacity is typically expressed in
terms of bytes (1 byte = 8 bits) or words.
External memory capacity is typically expressed in bytes.
Word
The natural unit of organisation
Common word lengths are 8, 16, and 32 bits
The size of the word is typically equal to the
number of bits used to represent a number and to
the instruction length. Unfortunately, there are
many exceptions.

7
Unit of Transfer

Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
At the word level or byte level
In either case,
$2^A = N$, where A is the length in bits of an address
and N is the number of addressable units

8
Access Methods (1)

Sequential access
Start at the beginning and read through in order
Access time depends on location of data and
previous location
variable
e.g. tape
Direct access
Individual blocks have unique address
Access is by jumping to vicinity plus sequential
search
Access time depends on location and previous
location
variable
e.g. disk

9
Access Methods (2)

Random
Individual addresses identify locations exactly
Access time is independent of location or
previous access and is constant
e.g. RAM
Associative
Data is located by a comparison with contents of
a portion of the store
Access time is independent of location or
previous access and is constant
e.g. cache

10
Performance Parameters

Access time
For random-access memory
the time it takes to perform a read or write
operation.
Time between presenting the address to the memory
and getting the valid data
For non-random-access memory
The time it takes to position the read-write
mechanism at the desired location.
Memory Cycle time
Cycle time is access time plus additional time
Time may be required for the memory to recover
before next access
Transfer Rate
Rate at which data can be moved

11
Physical Types

Semiconductor
RAM
Magnetic
Disk, tape
Optical
CD (Compact Disk), DVD (Digital Video Disk)
Others
Bubble
Hologram

12
Physical Characteristics

Decay
Volatility
In a volatile memory, information decays
naturally or is lost when electrical power is
switched off.
In a nonvolatile memory, no electrical power is
needed to retain information, e.g.
magnetic-surface memory.
Erasable
Power consumption

13
Organisation

Organisation means physical arrangement of bits
into words
Obvious arrangement not always used

14
Memory Hierarchy

Registers
In CPU
Internal or Main memory
May include one or more levels of cache
RAM
External memory
Backing store

15
The Bottom Line

The design constraints on a computer's memory
How much?
Capacity
How fast?
Time is money
How expensive?
A trade-off among the three key characteristics
of memory: cost, capacity, and access time.

16
Hierarchy List

Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape

17
Hierarchy List

Across this spectrum of technologies
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
From top to bottom
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by
the processor

18
So you want fast?

It is possible to build a computer which uses
only static RAM (see later)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount

19
Locality of Reference

During the course of the execution of a program,
memory references tend to cluster
e.g. loops and subroutines
Main memory is usually extended with a
higher-speed, smaller cache. The cache is a device for
staging the movement of data between main memory
and processor registers to improve performance.
External memory, also called secondary or auxiliary
memory, is used to store program and data files
and is visible to the programmer only in terms of
files and records.

20
4.2 Semiconductor Main Memory

Table 4.2 Semiconductor Memory Types

21
Types of Random-Access Semiconductor Memory

RAM
Misnamed, because all of the semiconductor memory
types listed in the table are random access.
Read/Write
Volatile
A RAM must be provided with a constant power
supply.
Temporary storage
Static or dynamic

22
Dynamic RAM (DRAM)

Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory

23
Static RAM (SRAM)

Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache

24
Read Only Memory (ROM)

Permanent storage
Applications
Microprogramming (see later)
Library subroutines
Systems programs (BIOS)
Function tables

25
Types of ROM

Written during manufacture
Very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program
Read mostly
Erasable Programmable (EPROM)
Erased by UV
Electrically Erasable (EEPROM)
Takes much longer to write than read
Flash memory
It is intermediate between EPROM and EEPROM in
both cost and functionality.
Erase whole memory electrically or erase blocks
of memory

26
Organisation in detail

Memory cell
The basic element of a semiconductor memory
Two stable states
being written into to set the state, or being
read to sense the state
Chip Logic
One extreme organization: the physical
arrangement of cells in the array is the same as
the logical arrangement.
The array is organized into W words of B bits
each.
e.g. A 16-Mbit chip can be organised as 1M 16-bit
words
The other extreme: one-bit-per-chip organization,
in which data is read/written one bit at a time
A one-bit-per-chip system has 16 lots of 1-Mbit chips,
with bit 1 of each word in chip 1 and so on

27
Chip Logic

Typical organization of a 16-Mbit DRAM
A 16-Mbit chip can be organised as a 2048 x 2048 x
4-bit array
Reduces number of address pins
Multiplex row address and column address
11 pins to address ($2^{11} = 2048$)
An additional 11 address lines select one of 2048
columns of 4 bits per column. Four data lines are
for the input and output of 4 bits to and from a
data buffer. On write, the bit driver of each bit
line is activated for a 1 or 0 according to the
value of the corresponding data line. On read,
the value of each bit line is passed through a
sense amplifier and presented to the data lines.
The row line selects which row of cells is used
for reading or writing.
Adding one more pin devoted to addressing doubles
the number of rows and columns, and so the size
of the chip memory grows by a factor 4.
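
The pin arithmetic above can be checked with a short sketch. It assumes, purely for illustration, that the row address occupies the high-order half of the cell address; the function names are hypothetical and not taken from the slides.

```python
ADDRESS_PINS = 11        # multiplexed row/column address pins (from the slide)
BITS_PER_COLUMN = 4      # each column position holds 4 bits


def split_dram_address(addr, pins=ADDRESS_PINS):
    """Split a (2 * pins)-bit cell address into (row, column).

    Assumption: the row is taken from the high-order half of the address.
    """
    mask = (1 << pins) - 1
    return (addr >> pins) & mask, addr & mask


def capacity_bits(pins, bits_per_column=BITS_PER_COLUMN):
    """Chip capacity in bits for a square array addressed by `pins` pins."""
    rows = cols = 1 << pins
    return rows * cols * bits_per_column


print(capacity_bits(11) // 2**20, "Mbit")           # 2048 x 2048 x 4
print(capacity_bits(12) // capacity_bits(11))       # growth from one extra pin
```

One extra pin doubles both the row count and the column count, which is where the factor of 4 comes from.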

28
Typical 16 Mb DRAM (4M x 4)
29
Refreshing

Refresh circuit included on chip
Disable chip
Count through rows
Read Write back
Takes time
Slows down apparent performance

30
Chip Packaging
EPROM package: a one-word-per-chip, 8-Mbit chip organized as 1M x 8
The address of the word being accessed: for 1M words, a total of 20 pins ($2^{20}$ = 1M) are needed
The data to be read out: 8 lines (D0 to D7)
The power supply to the chip (Vcc)
A ground pin (Vss)
A chip enable (CE) pin: the CE pin is used to indicate whether or not the address is valid for this chip
A program voltage (Vpp)
31
DRAM package: a 16-Mbit chip organized as 4M x 4
Because a RAM chip can be updated, the data pins are input/output, unlike those of a ROM chip
Write Enable pin (WE)
Output Enable pin (OE)
Row Address Select (RAS)
Column Address Select (CAS)
32
Module Organisation
If a RAM chip contains only 1 bit per word,
clearly a number of chips equal to the number of
bits per word is needed.
e.g. How could a memory module consisting of 256K 8-bit words be organized?
256K = $2^{18}$, so an 18-bit address is needed
The address is presented to eight 256K x 1-bit chips,
each of which provides the input/output of 1 bit.
Figure 4.6 256-Kbyte Memory Organization
33
Module Organisation (2)
Figure 4.7 1-Mbyte Memory Organization
34

(1M x 8 bit) / (256K x 8 bit) = 4 = $2^2$
As shown in Figure 4.7, 1M words by 8 bits per
word is organized as four columns of chips, each
column containing 256K words arranged as in
Figure 4.6.
1M = $2^{20}$
For 1M words, 20 address lines are needed.
The 18 least significant bits are routed to all
32 modules.
The high-order 2 bits are input to a group
select logic module that sends a chip enable
signal to one of the four columns of modules.
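
As a minimal sketch of the group-select step described above (the function name and return shape are illustrative assumptions, not part of the original figure):

```python
CHIP_ADDRESS_BITS = 18   # 256K = 2**18 words per column of chips
GROUP_BITS = 2           # four columns of chips


def decode_module_address(addr):
    """Split a 20-bit address into (group, chip_address).

    group        : which of the four columns receives the chip enable signal
    chip_address : the 18 low-order bits routed to every chip
    """
    if not 0 <= addr < (1 << (CHIP_ADDRESS_BITS + GROUP_BITS)):
        raise ValueError("address must fit in 20 bits")
    group = addr >> CHIP_ADDRESS_BITS
    chip_address = addr & ((1 << CHIP_ADDRESS_BITS) - 1)
    return group, chip_address


print(decode_module_address(0x40000))   # first word of the second column
```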

35
Error Correction

Hard Failure
Permanent defect
Soft Error
Random, non-destructive
No permanent damage to memory
Detected using Hamming error correcting code

36
Error Correcting Code Function
A function f is performed on the data to
produce a code. When the previously stored word
is read out, the code is used to detect and
possibly correct errors. A new set of K code
bits is generated from the M data bits and
compared with the fetched code bits.
37
Even Parity bits
Figure 4.9 Hamming Error-Correcting Code
Figure 4.9 uses Venn diagrams to illustrate the
use of Hamming code on 4-bit words (M = 4). With
three intersecting circles, there are seven
compartments. We assign the 4 data bits to the
inner compartments. The remaining compartments
are filled with parity bits. Each parity bit is
chosen so that the total number of 1s in its
circle is even.
38
Figure 4.8 Error-Correcting Code

The comparison logic receives as input two K-bit
values. A bit-by-bit comparison is done by taking
the exclusive-or of the two inputs. The result
is called the syndrome word.
The syndrome word is therefore K bits wide and
has a range between 0 and $2^K - 1$. The value 0
indicates that no error was detected, leaving
$2^K - 1$ values to indicate, if there is an error,
which bit was in error (the numerical value of
the syndrome indicates the position of the
bit in error).
An error could occur on any of the M data bits or
K check bits, so
$2^K - 1 \ge M + K$
(This inequality gives the number of bits
needed to correct a single bit error in a word
containing M data bits.)
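
The inequality can be solved for the smallest K by simple search. The following is a small sketch (the function name is an illustrative choice):

```python
def min_check_bits(m):
    """Smallest K satisfying 2**K - 1 >= M + K for M data bits."""
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


for m in (8, 16, 32, 64):
    print(m, "data bits need", min_check_bits(m), "check bits")
```

For 8, 16, 32, and 64 data bits this gives 4, 5, 6, and 7 check bits respectively, so the relative overhead shrinks as the word gets longer.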

39
Those bit positions whose position numbers are powers of 2 are designated as check bits.
Each check bit operates on every data bit position whose position number contains a 1 in the corresponding column position.
Bit position n is checked by those bits Ci such that $\sum i = n$.
Check bits: C8, C4, C2, C1
Figure 4.10 Layout of Data bits and Check bits
40
The check bits are calculated as follows, where
the symbol ⊕ designates the exclusive-or
operation.
Assume that the 8-bit input word is 00111001,
with data bit M1 in the right-most position. The
calculations are as follows.
Suppose that data bit 3 sustains an error and is
changed from 0 to 1.
41
When the new check bits are compared with the old
check bits, the syndrome word is formed.
The result is 0110, indicating that bit position
6, which contains data bit 3, is in error.
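
The worked example can be reproduced with a short sketch. It assumes the layout described earlier (check bits at the power-of-2 positions, data bits filling the remaining positions in order) and relies on the observation that XOR-ing together the position numbers of all data bits equal to 1 yields every check bit at once. The function names are illustrative.

```python
def layout(m):
    """Return (data_positions, check_positions), 1-indexed, for m data bits."""
    data_pos, check_pos, pos = [], [], 1
    while len(data_pos) < m:
        if (pos & (pos - 1)) == 0:    # position is a power of 2: check bit
            check_pos.append(pos)
        else:
            data_pos.append(pos)
        pos += 1
    return data_pos, check_pos


def check_bits(word, m=8):
    """Check bits for an m-bit data word.

    Bit 0 of `word` is data bit 1 (right-most). In the result, bit 0 is C1,
    bit 1 is C2, bit 2 is C4, and bit 3 is C8.
    """
    data_pos, _ = layout(m)
    code = 0
    for i, pos in enumerate(data_pos):
        if (word >> i) & 1:
            code ^= pos               # XOR of positions XORs each check bit
    return code


def correct(word, stored_code, m=8):
    """Return (syndrome, corrected_word) for a fetched word and stored code."""
    data_pos, _ = layout(m)
    syndrome = stored_code ^ check_bits(word, m)
    if syndrome in data_pos:          # syndrome names a data bit position
        word ^= 1 << data_pos.index(syndrome)
    return syndrome, word


original = 0b00111001
stored = check_bits(original)             # C8 C4 C2 C1 = 0111
fetched = original ^ (1 << 2)             # data bit 3 flips from 0 to 1
syndrome, fixed = correct(fetched, stored)
print(format(stored, "04b"), format(syndrome, "04b"), fixed == original)
```

A syndrome that is itself a power of 2 would point at a check bit, in which case the data word is left unchanged.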
42
Figure 4.11 Check Bit Generation
a single-error-correction (SEC) code
43
More commonly, semiconductor memory is equipped
with a single-error-correcting, double-error-detecting
(SEC-DED) code. An error-correction code
enhances the reliability of the memory at the
cost of added complexity.
Table 4.3 Increase in Word Length with Error
Correction
44
Figure 4.12 Hamming SEC-DED Code
The sequence shows that if two errors occur
(Figure 4.12 c), the checking procedure goes
astray (d) and worsens the problem by creating a
third error (e). To overcome the problem, an
eighth bit is added that is set so that the total
number of 1s in the diagram is even.
45
4.3 CACHE MEMORY

Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module

46
Cache operation - overview

Figure 4.14 Cache/Main-Memory Structure (P118)
Cache includes tags to identify which block of
main memory is in each cache slot. The tag is
usually a portion of the main memory address.

(a) Cache: lines numbered 0, 1, 2, ..., C-1, each holding a tag and a block of K words
47
(b) Main memory: addresses 0, 1, 2, 3, ..., $2^n - 1$, each holding one word, grouped into blocks of K words
48
Figure 4.15 Cache Read Operation (P119)
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
49
Typical Cache Organization
In this organization, the cache connects to the
processor via data, control, and address
lines. The data and address lines also attach to data
and address buffers, which attach to a system bus
from which main memory is reached.
When a cache hit occurs, the data and address buffers are
disabled and communication is only between
processor and cache, with no system bus traffic.
When a cache miss occurs, the desired
address is loaded onto the system bus and the
data are returned through the data buffer to both
the cache and the processor.
Figure 4.16 Typical Cache Organization
50
Elements of Cache Design

Size
Mapping Function
Direct
Associative
Set Associative
Replacement Algorithm
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
Write Policy
Write through
Write back
Write once
Block Size
Number of Caches
Single or two level
Unified or split

51
Cache Size

A trade-off between cost per bit and access time
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Optimum cache size: suggested between 1K and
512K words.

52
Mapping Function

Three techniques
direct, associative, and set associative
Elements of the example
Cache of 64kByte
Cache block of 4 bytes
Data is transferred between memory and the cache
in blocks of 4 bytes each.
i.e. cache is 16K ($2^{14}$) lines of 4 bytes
16 MBytes main memory
24-bit address ($2^{24}$ = 16M)
Main memory (4M blocks of 4 bytes each)

53
Direct Mapping

Each block of main memory maps to only one cache
line
i.e. if a block is in cache, it must be in one
specific place
Address is in two parts
Least Significant w bits identify unique word or
byte within a block of main memory.
Most Significant s bits specify one memory block
The MSBs are split into a cache line field of r bits and
a tag of s-r bits (most significant)
The line field of r bits identifies one of the m = $2^r$
lines of the cache

54
Direct Mapping Cache Line Table
Every row has the same cache line number. Every
column has the same tag number.

Cache line    Main memory blocks assigned
0             0, m, 2m, ..., $2^s - m$
1             1, m+1, 2m+1, ..., $2^s - m + 1$
...
m-1           m-1, 2m-1, 3m-1, ..., $2^s - 1$

The mapping is expressed as i = j modulo m, where
i = cache line number
j = main memory block number
m = number of lines in the cache
No two blocks in the same line have the same Tag
field!
55
Direct Mapping Cache Organization
The r-bit line number is used as an index into
the cache to access a particular line. If the
(s-r)-bit tag number matches the tag number
currently stored in that line, then the w-bit
word number is used to select one of the $2^w$ bytes
in that line. Otherwise, the s-bit
tag-plus-line field is used to fetch a block from
main memory.
56
Direct Mapping Address Structure
Tag (s-r): 8 bits
Line or Slot (r): 14 bits
Word (w): 2 bits

24-bit address
w = 2-bit word identifier (4-byte block)
s = 22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
No two blocks in the same line have the same Tag
field
Check contents of cache by finding line and
checking Tag
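
A minimal sketch of how a 24-bit address splits into these fields, using the 8/14/2 layout above. The function name and the sample address 0x16339C are illustrative choices.

```python
WORD_BITS = 2    # w: byte within a 4-byte block
LINE_BITS = 14   # r: one of 16K cache lines
TAG_BITS = 8     # s - r


def split_direct(addr):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")

# The line number agrees with i = j modulo m (j = block number, m = 2**14).
block = 0x16339C >> WORD_BITS
assert line == block % (1 << LINE_BITS)
```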

57
Direct Mapping Example
The cache is organized as 16K = $2^{14}$ lines of 4
bytes each. The main memory consists of
16 Mbytes, organized as 4M blocks of 4 bytes
each.
i = j modulo m
i = cache line number
j = main memory block number
m = number of lines in the cache
Note that no two blocks that
map into the same line number have the same tag
number.
Main Memory Address
58
Direct Mapping Pros and Cons

Advantages
Simple
Inexpensive
Disadvantages
Fixed location for given block
If a program repeatedly accesses 2 blocks that map to the
same line, the cache miss rate is very high
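
The disadvantage can be seen in a toy simulation. This is a sketch under simplifying assumptions: only tags are tracked (no data), and the 14-bit line and 2-bit word fields of the running example are used. The class name is hypothetical.

```python
WORD_BITS = 2
LINE_BITS = 14


class DirectMappedCache:
    """Tracks which tag each line holds and counts hits and misses."""

    def __init__(self):
        self.tags = {}       # line number -> tag of the block currently held
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        if self.tags.get(line) == tag:
            self.hits += 1
            return True
        self.tags[line] = tag    # miss: the new block replaces the old one
        self.misses += 1
        return False


# 0x000000 and 0x010000 both map to line 0 but carry different tags.
conflict = DirectMappedCache()
for _ in range(5):
    conflict.read(0x000000)
    conflict.read(0x010000)
print(conflict.hits, conflict.misses)

# 0x000000 and 0x000004 map to different lines and coexist.
friendly = DirectMappedCache()
for _ in range(5):
    friendly.read(0x000000)
    friendly.read(0x000004)
print(friendly.hits, friendly.misses)
```

In the first loop every access evicts the block the next access needs, so no access ever hits; in the second only the first two accesses miss.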

59
Associative Mapping

A main memory block can load into any line of
cache
Memory address is interpreted as a tag and a word
field.
Tag uniquely identifies block of memory
Every line's tag is examined for a match
Disadvantages of associative mapping
Cache searching gets expensive
Complex circuitry required to examine the tags of
all cache lines in parallel.

60
Fully Associative Cache Organization
61
Associative Mapping Address Structure
Tag: 22 bits
Word: 2 bits

22-bit tag stored with each 32-bit (4-byte) block of
data
Compare tag field with tag entry in cache to
check for hit
Least significant 2 bits of address identify
which byte is required from the 32-bit data
block
e.g.
Address   Tag      Data       Cache line
16339C    058CE7   FEDCBA98   0001
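
A small sketch of the tag comparison, assuming the 22-bit tag and 2-bit word fields above. The sequential loop stands in for what the hardware does on all lines in parallel; the function names and the list-of-pairs cache representation are illustrative.

```python
WORD_BITS = 2


def split_associative(addr):
    """Split a 24-bit address into (tag, word) for associative mapping."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def lookup(cache_lines, addr):
    """Return the block whose tag matches, or None on a miss.

    cache_lines is a list of (tag, block) pairs in no particular order,
    because a block may sit in any line.
    """
    tag, _ = split_associative(addr)
    for stored_tag, block in cache_lines:
        if stored_tag == tag:
            return block
    return None


lines = [(0x3FFFFF, "24682468"), (0x058CE7, "FEDCBA98")]
print(f"{split_associative(0x16339C)[0]:06X}", lookup(lines, 0x16339C))
```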

62
Associative Mapping Example
Main Memory Address
63
Set Associative Mapping

Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. Block B can be in any line of set i
e.g. 2 lines per set
2 way associative mapping
A given block can be in one of 2 lines in only
one set

64
Set Associative Mapping

In this case, the cache is divided into v sets,
each of which consists of k lines.
The relationships are
m = v x k
i = j modulo v
where
i = cache set number
j = main memory block number
m = number of lines in the cache
This is referred to as k-way set associative
mapping.

65
Two Way Set Associative Cache Organization
The d set bits specify one of v = $2^d$ sets. The s
bits of the tag and set fields specify one of the
$2^s$ blocks of main memory. With k-way set
associative mapping, the tag in a memory address
is much smaller and is only compared to the k
tags within a single set.
66
Set Associative Mapping Example

13-bit set number
Block number in main memory is taken modulo $2^{13}$
000000, 008000, 010000, 018000 ... map to same set

67
Set Associative Mapping Address Structure
Tag: 9 bits
Set: 13 bits
Word: 2 bits

Use set field to determine cache set to look in
Tag+Set field specifies one of the blocks in the
main memory.
Compare tag field to see if we have a hit
e.g.
Address (tag, set+word)   Tag   Data       Set number
1FF 7FFC                  1FF   24682468   1FFF
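
The field split can be sketched as follows, assuming the 9/13/2 layout above. The example address is written in the slides as a 9-bit tag followed by the remaining 15 bits ("1FF 7FFC"), which corresponds to the full 24-bit address FFFFFC. The function name is illustrative.

```python
WORD_BITS = 2
SET_BITS = 13


def split_set_associative(addr):
    """Split a 24-bit address into (tag, set_number, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_number = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


tag, set_number, word = split_set_associative((0x1FF << 15) | 0x7FFC)
print(f"tag={tag:03X} set={set_number:04X} word={word}")

# Addresses that differ by a multiple of 2**15 fall into the same set.
print({split_set_associative(a)[1] for a in (0x000000, 0x008000, 0x010000)})
```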

68
Two Way Set Associative Mapping Example
e.g.
Address (tag, set+word)   Tag   Data       Set number
1FF 7FFC                  1FF   24682468   1FFF
02C 0004                  02C   11235813   0001
Main Memory Address
69
Replacement Algorithms (1) Direct Mapping

When a new block is brought into the cache, one
of the existing blocks must be replaced.
Direct mapping
No choice
Each block only maps to one line
Replace that line

70
Replacement Algorithms (2) Associative and Set
Associative

Hardware implemented algorithm (speed)
Least Recently used (LRU)
Replace that block in the set which has been in
the cache longest with no reference to it. (hit
ratio time)
e.g. in 2 way set associative
Which of the 2 blocks is LRU?
First in first out (FIFO)
replace block in the set that has been in cache
longest. (time)
Least frequently used
replace block in the set which has had fewest
hits. (hit ratio)
Random
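
As an illustration of LRU within one set, here is a software sketch. Real caches implement this in hardware (for a 2-way set, a single bit per line is enough); the ordered dictionary below is only a convenient model, and the class name is hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines and least-recently-used replacement."""

    def __init__(self, ways=2):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, load the block, evicting the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # now the most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "A", "B"]])
print(list(s.lines))
```

When C arrives, B is evicted rather than A, because A was referenced more recently. FIFO would have evicted A, the block that entered the set first.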

71
Write Policy

Must not overwrite a cache block unless main
memory is up to date
Problems to contend with
More than one device may have access to main
memory.
Data inconsistent between memory and cache
Multiple CPUs may have individual caches
Data inconsistent among caches
Write Policy
Write through
Write back
Write once

72
Write through

All writes go to main memory as well as cache
Any other processor-cache module can monitor traffic to main
memory to keep its local (to CPU) cache up to date.
Disadvantages
Lots of traffic
Slows down writes

73
Write back

Updates initially made in cache only
Update bit for cache slot is set when update
occurs
If block in cache is to be replaced, write to
main memory only if update bit is set
Other caches get out of sync
I/O must access main memory through cache
Because portions of main memory are invalid
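
The update-bit behaviour can be sketched with a toy model. The assumptions are loud ones: one word per line, direct mapping by address modulo the number of lines, and a block is loaded into the cache on a write miss. None of these choices come from the slides, and the class name is hypothetical.

```python
class WriteBackCache:
    """Toy write-back cache: main memory is written only on eviction."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory         # dict: address -> value (main memory)
        self.num_lines = num_lines
        self.lines = {}              # line -> [address, value, update_bit]
        self.memory_writes = 0

    def _load(self, addr):
        line = addr % self.num_lines
        entry = self.lines.get(line)
        if entry is not None and entry[0] == addr:
            return entry                           # hit
        if entry is not None and entry[2]:         # evicting an updated line
            self.memory[entry[0]] = entry[1]
            self.memory_writes += 1
        entry = [addr, self.memory.get(addr, 0), False]
        self.lines[line] = entry
        return entry

    def read(self, addr):
        return self._load(addr)[1]

    def write(self, addr, value):
        entry = self._load(addr)
        entry[1] = value
        entry[2] = True              # set the update bit; memory is now stale


memory = {0: 10, 4: 40}
cache = WriteBackCache(memory)
cache.write(0, 11)
print(memory[0], cache.memory_writes)   # main memory still holds the old value
cache.read(4)                           # same line: forces the write-back
print(memory[0], cache.memory_writes)
```

Between the write and the eviction, main memory holds an out-of-date value, which is why I/O and other caches cannot trust it.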

74
Approaches to cache coherency

Bus watching with write through
Each cache controller monitors the address lines
to detect write operations to memory by other bus
masters.
This strategy depends on the use of a
write-through policy by all cache controllers.
Hardware transparency
Additional hardware is used to ensure that all
the updates to main memory via cache are
reflected in all caches.
Noncachable memory
Only a portion of main memory is shared by more
than one processor.
In such a system, all accesses to shared memory
are cache misses, because the shared memory is
never copied to the cache.
The noncachable memory can be identified using
chip-select logic or high-address bits.

75
Line Size

The principle of locality
Data in the vicinity of a referenced word is
likely to be referenced in the near future.
The relationship between block size and hit ratio
is complex, depending on the locality
characteristics of a particular program, and no
definitive optimum value has been found.
A size of two to eight words seems
reasonably close to optimum.

76
Number of caches

A single cache
Multiple caches
The number of levels of caches
The use of unified versus split caches
Split caches: one dedicated to instructions and
one dedicated to data
Key advantage of split caches: they eliminate
contention for the cache between the instruction
processor and the execution unit.
Unified cache: a single cache used to store
references to both data and instructions
For a given cache size, a unified cache has a
higher hit rate than split caches because it
balances the load between instruction and data
fetches automatically.

77
Number of caches

The on-chip cache: cache and processor on the
same chip
When the requested instruction or data is found
in the on-chip cache, the bus access is
eliminated. Because of the short data paths
internal to the processor, on-chip cache accesses
will complete appreciably faster than would even
zero-wait state bus cycles.
Advantages
Reduce the processor's external bus activity
Speed up execution times
Increase overall system performance
A two-level cache
The internal cache designated as level 1 (L1)
The external cache designated as level 2 (L2)

78
4.4 Pentium Cache

Foreground reading
Find out detail of Pentium II cache systems
NOT just from Stallings!

79
4.5 Newer RAM Technology (1)

Basic DRAM is the same since the first RAM chips
Constraints of the traditional DRAM chip:
its internal architecture and its interface
to the processor's memory bus.
Enhanced DRAM
Contains small SRAM as well
SRAM holds last line read
A comparator stores the 11-bit value of the most
recent row address selection.
Cache DRAM (CDRAM)
Larger SRAM component
Use as cache or serial buffer

80
Newer RAM Technology (2)

Synchronous DRAM (SDRAM)
Access is synchronized with an external clock,
unlike asynchronous DRAM.
Address is presented to RAM
Since SDRAM moves data in time with system clock,
CPU knows when data will be ready
CPU does not have to wait, it can do something
else
Burst mode allows SDRAM to set up stream of data
and fire it out in block

81
Internal logic of the SDRAM
In burst mode, a series of data bits can be
clocked out rapidly after the first bit has been
accessed. Burst mode is useful when all the bits
to be accessed are in sequence and in the same
row of the array as the initial access.
A dual-bank internal architecture improves
opportunities for on-chip parallelism.
The mode register and associated control logic provide a
mechanism to customize the SDRAM to suit specific
system needs.
82
Newer RAM Technology (3)