Internal Memory

1
William Stallings Computer Organization and
Architecture
Chapter 4 Internal Memory
2
The four-level memory hierarchy
  • Computer memory is organized into a hierarchy.
  • Going down the hierarchy: decreasing cost/bit,
    increasing capacity, slower access time, and
    decreasing frequency of access of the memory by
    the processor.
  • The cache automatically retains a copy of some
    of the recently used words from the DRAM.
3
Memory Hierarchy
  • Registers
  • In CPU
  • Internal or Main memory
  • May include one or more levels of cache
  • RAM
  • External memory
  • Backing store

4
4.1 COMPUTER MEMORY SYSTEM OVERVIEW
  • Characteristics of Memory Systems
  • Location
  • Capacity
  • Unit of transfer
  • Access method
  • Performance
  • Physical type
  • Physical characteristics
  • Organisation

5
Location
  • The term location refers to whether memory is
    internal or external to the computer.
  • CPU
  • The processor requires its own local memory, in
    the form of registers.
  • Internal
  • Main memory, cache
  • External
  • Peripheral storage devices, such as disk and tape

6
Capacity
  • Internal memory capacity typically expressed in
    terms of bytes (1 byte = 8 bits) or words.
  • External memory capacity expressed in bytes.
  • Word
  • The natural unit of organisation
  • Word length usually 8, 16, or 32 bits
  • The size of the word is typically equal to the
    number of bits used to represent a number and to
    the instruction length. Unfortunately, there are
    many exceptions.

7
Unit of Transfer
  • Internal
  • Usually governed by data bus width
  • External
  • Usually a block which is much larger than a word
  • Addressable unit
  • Smallest location which can be uniquely addressed
  • At the word level or byte level
  • In any case,
  • 2^A = N, where A is the length in bits of an address
  • and N is the number of addressable units
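
The relationship between address length and the number of addressable units can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and do not come from the slides.

```python
def addressable_units(address_bits: int) -> int:
    """Number of addressable units N for an A-bit address: N = 2^A."""
    return 1 << address_bits


def address_bits_needed(units: int) -> int:
    """Smallest A such that 2^A >= units."""
    return max(units - 1, 0).bit_length()


print(addressable_units(24))            # 16777216 units (16M)
print(address_bits_needed(256 * 1024))  # 18 bits for 256K units
```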

8
Access Methods (1)
  • Sequential access
  • Start at the beginning and read through in order
  • Access time depends on location of data and
    previous location
  • variable
  • e.g. tape
  • Direct access
  • Individual blocks have unique address
  • Access is by jumping to vicinity plus sequential
    search
  • Access time depends on location and previous
    location
  • variable
  • e.g. disk

9
Access Methods (2)
  • Random
  • Individual addresses identify locations exactly
  • Access time is independent of location or
    previous access and is constant
  • e.g. RAM
  • Associative
  • Data is located by a comparison with contents of
    a portion of the store
  • Access time is independent of location or
    previous access and is constant
  • e.g. cache

10
Performance Parameters
  • Access time
  • For random-access memory
  • the time it takes to perform a read or write
    operation.
  • Time between presenting the address to the memory
    and getting the valid data
  • For non-random-access memory
  • The time it takes to position the read-write
    mechanism at the desired location.
  • Memory Cycle time
  • Cycle time is access time plus additional time
  • Time may be required for the memory to recover
    before next access
  • Transfer Rate
  • Rate at which data can be moved

11
Physical Types
  • Semiconductor
  • RAM
  • Magnetic
  • Disk, tape
  • Optical
  • CD (Compact Disk), DVD (Digital Video Disk)
  • Others
  • Bubble
  • Hologram

12
Physical Characteristics
  • Decay
  • Volatility
  • In a volatile memory, information decays
    naturally or is lost when electrical power is
    switched off.
  • In a nonvolatile memory, no electrical power is
    needed to retain information, e.g.
    magnetic-surface memory.
  • Erasable
  • Power consumption

13
Organisation
  • Organisation means physical arrangement of bits
    into words
  • Obvious arrangement not always used

14
Memory Hierarchy
  • Registers
  • In CPU
  • Internal or Main memory
  • May include one or more levels of cache
  • RAM
  • External memory
  • Backing store

15
The Bottom Line
  • The design constraints on a computer's memory
  • How much?
  • Capacity
  • How fast?
  • Time is money
  • How expensive?
  • A trade-off among the three key characteristics
    of memory: cost, capacity, and access time.

16
Hierarchy List
  • Registers
  • L1 Cache
  • L2 Cache
  • Main memory
  • Disk cache
  • Disk
  • Optical
  • Tape

17
Hierarchy List
  • Across this spectrum of technologies
  • Faster access time, greater cost per bit
  • Greater capacity, smaller cost per bit
  • Greater capacity, slower access time
  • From top to bottom
  • Decreasing cost per bit
  • Increasing capacity
  • Increasing access time
  • Decreasing frequency of access of the memory by
    the processor

18
So you want fast?
  • It is possible to build a computer which uses
    only static RAM (see later)
  • This would be very fast
  • This would need no cache
  • How can you cache cache?
  • This would cost a very large amount

19
Locality of Reference
  • During the course of the execution of a program,
    memory references tend to cluster
  • e.g. loops and subroutines
  • Main memory is usually extended with a
    higher-speed, smaller cache. It is a device for
    staging the movement of data between main memory
    and processor registers to improve performance.
  • External memory, also called secondary or
    auxiliary memory, is used to store program and
    data files and is visible to the programmer only
    in terms of files and records.

20
4.2 Semiconductor Main Memory
  • Table 4.2 Semiconductor Memory Types

21
Types of Random-Access Semiconductor Memory
  • RAM
  • Misnamed, because all of the semiconductor
    memory types listed in the table are random
    access.
  • Read/Write
  • Volatile
  • A RAM must be provided with a constant power
    supply.
  • Temporary storage
  • Static or dynamic

22
Dynamic RAM (DRAM)
  • Bits stored as charge in capacitors
  • Charges leak
  • Need refreshing even when powered
  • Simpler construction
  • Smaller per bit
  • Less expensive
  • Need refresh circuits
  • Slower
  • Main memory

23
Static RAM (SRAM)
  • Bits stored as on/off switches
  • No charges to leak
  • No refreshing needed when powered
  • More complex construction
  • Larger per bit
  • More expensive
  • Does not need refresh circuits
  • Faster
  • Cache

24
Read Only Memory (ROM)
  • Permanent storage
  • Applications
  • Microprogramming (see later)
  • Library subroutines
  • Systems programs (BIOS)
  • Function tables

25
Types of ROM
  • Written during manufacture
  • Very expensive for small runs
  • Programmable (once)
  • PROM
  • Needs special equipment to program
  • Read mostly
  • Erasable Programmable (EPROM)
  • Erased by UV
  • Electrically Erasable (EEPROM)
  • Takes much longer to write than read
  • Flash memory
  • It is intermediate between EPROM and EEPROM in
    both cost and functionality.
  • Erase whole memory electrically or erase blocks
    of memory

26
Organisation in detail
  • Memory cell
  • The basic element of a semiconductor memory
  • Two stable states
  • Can be written into to set the state, or
    read to sense the state
  • Chip Logic
  • One extreme organization: the physical
    arrangement of cells in the array is the same as
    the logical arrangement.
  • The array is organized into W words of B bits
    each.
  • e.g. A 16-Mbit chip can be organised as 1M 16-bit
    words
  • The other extreme: one-bit-per-chip organization,
    in which data is read/written one bit at a time
  • A one-bit-per-chip system has 16 lots of 1-Mbit
    chips, with bit 1 of each word in chip 1, and so on

27
Chip Logic
  • Typical organization of a 16-Mbit DRAM
  • A 16-Mbit chip can be organised as a 2048 x 2048 x
    4-bit array
  • Reduces number of address pins
  • Multiplex row address and column address
  • 11 pins to address (2^11 = 2048)
  • An additional 11 address lines select one of 2048
    columns of 4 bits per column. Four data lines are
    used for the input and output of 4 bits to and
    from a data buffer. On write, the bit driver of
    each bit line is activated for a 1 or 0 according
    to the value of the corresponding data line. On
    read, the value of each bit line is passed through
    a sense amplifier and presented to the data lines.
    The row line selects which row of cells is used
    for reading or writing.
  • Adding one more pin devoted to addressing doubles
    the number of rows and columns, and so the size
    of the chip memory grows by a factor 4.
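
The multiplexed addressing described above can be sketched in Python. This is a hypothetical model, not the chip's actual logic: it assumes the upper 11 bits of a 22-bit location number are sent as the row address and the lower 11 bits as the column address, and all names are invented for illustration.

```python
from typing import Tuple

ROW_BITS = 11  # 2^11 = 2048 rows
COL_BITS = 11  # 2^11 = 2048 columns of 4 bits each


def split_dram_address(addr: int) -> Tuple[int, int]:
    """Split a 22-bit location number into (row, column).

    Assumption: the high half is presented first as the row address,
    the low half second as the column address, on the same 11 pins.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4M x 4 chip")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def capacity_bits(address_pins: int, bits_per_location: int = 4) -> int:
    """Chip capacity when the same pins carry the row and the column address."""
    rows = 1 << address_pins
    cols = 1 << address_pins
    return rows * cols * bits_per_location


print(split_dram_address(0x3FFFFF))           # (2047, 2047)
print(capacity_bits(11) // 2**20)             # 16 (Mbit)
print(capacity_bits(12) // capacity_bits(11)) # 4: one more pin quadruples capacity
```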

28
Typical 16 Mb DRAM (4M x 4)
29
Refreshing
  • Refresh circuit included on chip
  • Disable chip
  • Count through rows
  • Read and write back
  • Takes time
  • Slows down apparent performance

30
Chip Packaging
EPROM package: a one-word-per-chip, 8-Mbit chip
organized as 1M x 8
  • The address of the word being accessed. For 1M
    words, a total of 20 pins (2^20 = 1M) are needed.
  • The data lines (D0-D7)
  • The power supply to the chip (VCC)
  • A ground pin (Vss)
  • A chip enable (CE) pin: the CE pin is used to
    indicate whether or not the address is valid for
    this chip.
  • A program voltage (Vpp)
31
DRAM package: a 16-Mbit chip organized as 4M x 4
  • Because a RAM chip can be updated, the data pins
    are input/output, unlike those of a ROM chip
  • Write Enable pin (WE)
  • Output Enable pin (OE)
  • Row Address Select (RAS)
  • Column Address Select (CAS)
32
Module Organisation
If a RAM chip contains only 1 bit per word,
clearly a number of chips equal to the number of
bits per word is needed.
e.g. How could a memory module consisting of
256K 8-bit words be organized?
  • 256K = 2^18, so an 18-bit address is needed
  • The address is presented to 8 256K x 1-bit
    chips, each of which provides the input/output
    of 1 bit.
Figure 4.6 256-Kbyte Memory Organization
33
Module Organisation (2)
Figure 4.7 1-Mbyte Memory Organization
34
  • (1M x 8 bit) / (256K x 8 bit) = 4 = 2^2
  • As shown in Figure 4.7, 1M words of 8 bits per
    word is organized as four columns of chips, each
    column containing 256K words arranged as in
    Figure 4.6.
  • 1M = 2^20
  • For 1M words, 20 address lines are needed.
  • The 18 least significant bits are routed to all
    32 modules.
  • The high-order 2 bits are input to a group
    select logic module that sends a chip enable
    signal to one of the four columns of modules.
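
A minimal Python sketch of the address decoding described above. The function name and the numbering of the four columns (0 to 3) are assumptions made for illustration only.

```python
from typing import Tuple

OFFSET_BITS = 18  # 256K = 2^18 words per column of chips


def decode_module_address(addr: int) -> Tuple[int, int]:
    """Split a 20-bit address into (column group, address within the group).

    The high-order 2 bits pick one of the four columns (chip enable);
    the low-order 18 bits are routed to every chip.
    """
    if not 0 <= addr < (1 << 20):
        raise ValueError("address does not fit in 20 bits")
    group = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return group, offset


print(decode_module_address(0x40000))  # (1, 0): first word of the second column
print(decode_module_address(0xFFFFF))  # (3, 262143): last word of the last column
```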

35
Error Correction
  • Hard Failure
  • Permanent defect
  • Soft Error
  • Random, non-destructive
  • No permanent damage to memory
  • Detected using Hamming error correcting code

36
Error Correcting Code Function
A function f is performed on the data to
produce a code. When the previously stored word
is read out, the code is used to detect and
possibly correct errors. A new set of K code
bits is generated from the M data bits and
compared with the fetched code bits.
37
Even Parity bits
Figure 4.9 Hamming Error-Correcting Code
Figure 4.9 uses Venn diagrams to illustrate the
use of Hamming code on 4-bit words (M = 4). With
three intersecting circles, there are seven
compartments. We assign the 4 data bits to the
inner compartments. The remaining compartments
are filled with parity bits. Each parity bit is
chosen so that the total number of 1s in its
circle is even.
38
Figure 4.8 Error-Correcting Code
  • The comparison logic receives as input two K-bit
    values. A bit-by-bit comparison is done by taking
    the exclusive-or of the two inputs. The result
    is called the syndrome word.
  • The syndrome word is therefore K bits wide and
    has a range between 0 and 2^K - 1. The value 0
    indicates that no error was detected, leaving
    2^K - 1 values to indicate, if there is an error,
    which bit was in error (the numerical value of
    the syndrome indicates the position of the
    bit in error).
  • An error could occur on any of the M data bits or
    K check bits, so
    2^K - 1 >= M + K
  • (This inequality gives the number of check bits
    needed to correct a single-bit error in a word
    containing M data bits.)
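
The smallest K that satisfies the inequality can be found by simple search. The sketch below is illustrative only; the function name is not from the slides.

```python
def check_bits_needed(data_bits: int) -> int:
    """Smallest K with 2^K - 1 >= M + K (single-error correction)."""
    k = 1
    while (1 << k) - 1 < data_bits + k:
        k += 1
    return k


for m in (8, 16, 32, 64):
    print(m, check_bits_needed(m))  # 8 -> 4, 16 -> 5, 32 -> 6, 64 -> 7
```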

39
  • Those bit positions whose position numbers are
    powers of 2 are designated as check bits.
  • Each check bit operates on every data bit
    position whose position number contains a 1 in
    the corresponding column position.
  • Bit position n is checked by those bits Ci such
    that the sum of the i values equals n.
C8 C4 C2 C1
Figure 4.10 Layout of Data bits and Check bits
40
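The check bits are calculated as follows, where
the symbol ⊕ designates the exclusive-or
operation:
  C1 = M1 ⊕ M2 ⊕ M4 ⊕ M5 ⊕ M7
  C2 = M1 ⊕ M3 ⊕ M4 ⊕ M6 ⊕ M7
  C4 = M2 ⊕ M3 ⊕ M4 ⊕ M8
  C8 = M5 ⊕ M6 ⊕ M7 ⊕ M8
Assume that the 8-bit input word is 00111001,
with data bit M1 in the right-most position. The
calculations are as follows:
  C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
  C2 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
  C4 = 0 ⊕ 0 ⊕ 1 ⊕ 0 = 1
  C8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
Suppose that data bit 3 sustains an error and is
changed from 0 to 1.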
41
When the new check bits are compared with the old
check bits, the syndrome word is formed:
  old C8 C4 C2 C1 = 0111
  new C8 C4 C2 C1 = 0001
  syndrome        = 0110
The result is 0110, indicating that bit position
6, which contains data bit 3, is in error.
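
The worked example can be reproduced with a short Python sketch. It assumes the 12-bit layout of Figure 4.10 (check bits at positions 1, 2, 4, 8; data bits M1 to M8 at the remaining positions) and relies on the fact that XOR-ing the position numbers of all data bits equal to 1 yields C8 C4 C2 C1 directly. Function names are illustrative, and the sketch handles single-bit errors only.

```python
# Positions 1..12 that are not powers of two hold data bits M1..M8:
# 3, 5, 6, 7, 9, 10, 11, 12
DATA_POSITIONS = [p for p in range(1, 13) if p & (p - 1)]


def check_bits(data: int) -> int:
    """Return C8 C4 C2 C1 as a 4-bit integer for 8-bit data (M1 = bit 0)."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code ^= pos  # each set data bit toggles the check bits covering it
    return code


def correct(data: int, stored_check: int):
    """Return (corrected data, syndrome) assuming at most one bit is wrong."""
    syndrome = stored_check ^ check_bits(data)
    if syndrome in DATA_POSITIONS:
        data ^= 1 << DATA_POSITIONS.index(syndrome)  # flip the faulty data bit
    # syndrome 0: no error; syndrome 1, 2, 4, 8: a check bit itself is wrong
    return data, syndrome


stored = check_bits(0b00111001)
print(format(stored, "04b"))                  # 0111
fixed, syndrome = correct(0b00111101, stored) # data bit 3 flipped from 0 to 1
print(format(syndrome, "04b"), format(fixed, "08b"))  # 0110 00111001
```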
42
Figure 4.11 Check Bit Generation
a single-error-correction (SEC) code
43
More commonly, semiconductor memory is equipped
with a single-error-correcting,
double-error-detecting (SEC-DED) code. An
error-correction code enhances the reliability of
the memory at the cost of added complexity.
Table 4.3 Increase in Word Length with Error
Correction
44
Figure 4.12 Hamming SEC-DED Code
The sequence shows that if two errors occur
(Figure 4.12 c), the checking procedure goes
astray (d) and worsens the problem by creating a
third error (e). To overcome the problem, an
eighth bit is added that is set so that the total
number of 1s in the diagram is even.
45
4.3 CACHE MEMORY
  • Small amount of fast memory
  • Sits between normal main memory and CPU
  • May be located on CPU chip or module

46
Cache operation - overview
  • Figure 4.14 Cache/Main-Memory Structure (P118)
  • Cache includes tags to identify which block of
    main memory is in each cache slot. The tag is
    usually a portion of the main memory address.

(a) Cache: line numbers 0, 1, 2, ..., C-1; each
line holds a tag and a block (block length K words)
47
(b) Main Memory: memory addresses 0, 1, 2, 3, ...,
2^n - 1, grouped into blocks of K words; each
address holds one word (word length)
48
Figure 4.15 Cache Read Operation (P119)
  • CPU requests contents of memory location
  • Check cache for this data
  • If present, get from cache (fast)
  • If not present, read required block from main
    memory to cache
  • Then deliver from cache to CPU
49
Typical Cache Organization
In this organization, the cache connects to the
processor via data, control, and address
lines. The data and address lines attach to data
and address buffers, which attach to a system bus
from which main memory is reached. When a cache
hit occurs, the data and address buffers are
disabled and communication is only between
processor and cache, with no system bus
traffic. When a cache miss occurs, the desired
address is loaded onto the system bus and the
data are returned through a data buffer to both
the cache and the processor.
Figure 4.16 Typical Cache Organization
50
Elements of Cache Design
  • Size
  • Mapping Function
  • Direct
  • Associative
  • Set Associative
  • Replacement Algorithm
  • Least recently used (LRU)
  • First in first out (FIFO)
  • Least frequently used (LFU)
  • Random
  • Write Policy
  • Write through
  • Write back
  • Write once
  • Block Size
  • Number of Caches
  • Single or two level
  • Unified or split

51
Cache Size
  • A trade-off between cost per bit and access time
  • Cost
  • More cache is expensive
  • Speed
  • More cache is faster (up to a point)
  • Checking cache for data takes time
  • Suggested optimum cache sizes: between 1K and
    512K words.

52
Mapping Function
  • Three techniques
  • direct, associative, and set associative
  • Elements of the example
  • Cache of 64kByte
  • Cache block of 4 bytes
  • Data is transferred between memory and the cache
    in blocks of 4 bytes each.
  • i.e. cache is 16K (2^14) lines of 4 bytes
  • 16 MBytes main memory
  • 24-bit address (2^24 = 16M)
  • Main memory (4M blocks of 4 bytes each)

53
Direct Mapping
  • Each block of main memory maps to only one cache
    line
  • i.e. if a block is in cache, it must be in one
    specific place
  • Address is in two parts
  • Least Significant w bits identify unique word or
    byte within a block of main memory.
  • Most Significant s bits specify one memory block
  • The MSBs are split into a cache line field r and
    a tag of s-r (most significant)
  • The line field of r bits identifies one of the
    m = 2^r lines of the cache

54
Direct Mapping Cache Line Table
Every row has the same cache line number. Every
column has the same tag number.
  Cache line    Main memory blocks assigned
  0             0, m, 2m, ..., 2^s - m
  1             1, m+1, 2m+1, ..., 2^s - m + 1
  ...
  m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1

The mapping is expressed as i = j modulo m, where
  • i = cache line number
  • j = main memory block number
  • m = number of lines in the cache
No two blocks in the same line have the same Tag
field!
55
Direct Mapping Cache Organization
The r-bit line number is used as an index into
the cache to access a particular line. If the
(s-r)-bit tag number matches the tag number
currently stored in that line, then the w-bit
word number is used to select one of the 2^w bytes
in that line. Otherwise, the s-bit
tag-plus-line field is used to fetch a block from
main memory.
56
Direct Mapping Address Structure
  Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
  • 24-bit address
  • w = 2-bit word identifier (4-byte block)
  • s = 22-bit block identifier
  • 8-bit tag (22 - 14)
  • 14-bit slot or line
  • No two blocks in the same line have the same Tag
    field
  • Check contents of cache by finding line and
    checking Tag
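
As a rough illustration, the 24-bit address of this example can be split into its three fields with shifts and masks. The function name is made up for this sketch; the field widths (8-bit tag, 14-bit line, 2-bit word) are the ones given above.

```python
from typing import Tuple

WORD_BITS = 2   # w: byte within a 4-byte block
LINE_BITS = 14  # r: one of 2^14 cache lines


def split_direct(addr: int) -> Tuple[int, int, int]:
    """Return (tag, line, word) for a 24-bit main memory address."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = addr >> WORD_BITS               # j, the main memory block number
    line = block % (1 << LINE_BITS)         # i = j modulo m
    tag = block >> LINE_BITS                # remaining s - r = 8 bits
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(format(tag, "02X"), format(line, "04X"), word)  # 16 0CE7 0
```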

57
Direct Mapping Example
The cache is organized as 16K = 2^14 lines of 4
bytes each. The main memory consists of
16 Mbytes, organized as 4M blocks of 4 bytes
each. The mapping is i = j modulo m, where
i = cache line number, j = main memory block
number, and m = number of lines in the cache.
Note that no two blocks that map into the same
line number have the same tag number.
Main Memory Address
58
Direct Mapping pros & cons
  • Advantages
  • Simple
  • Inexpensive
  • Disadvantages
  • Fixed location for given block
  • If a program repeatedly accesses 2 blocks that
    map to the same line, the miss rate is very high
    because each block keeps evicting the other
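
The disadvantage can be seen in a toy simulation. The class below is a simplified sketch of a direct-mapped cache that tracks only tags and hit/miss counts, not data; the class name and defaults (2^14 lines of 4 bytes, as in the running example) are chosen for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that stores one tag per line."""

    def __init__(self, lines: int = 2**14, block_bytes: int = 4):
        self.lines = lines
        self.block_bytes = block_bytes
        self.tags = [None] * lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Access one byte address; return True on a hit."""
        block = addr // self.block_bytes
        line = block % self.lines    # the only line this block may occupy
        tag = block // self.lines
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag        # miss: the resident block is replaced
        self.misses += 1
        return False


cache = DirectMappedCache()
for _ in range(5):
    cache.access(0x000000)  # block 0      -> line 0, tag 0
    cache.access(0x010000)  # block 0x4000 -> line 0, tag 1
print(cache.hits, cache.misses)  # 0 10
```

Both addresses map to line 0 with different tags, so every access misses even though the rest of the cache is empty.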

59
Associative Mapping
  • A main memory block can load into any line of
    cache
  • Memory address is interpreted as a tag and a word
    field.
  • Tag uniquely identifies block of memory
  • Every line's tag is examined for a match
  • Disadvantages of associative mapping
  • Cache searching gets expensive
  • Complex circuitry required to examine the tags of
    all cache lines in parallel.

60
Fully Associative Cache Organization
61
Associative Mapping Address Structure
  Tag: 22 bits | Word: 2 bits
  • 22-bit tag stored with each 32-bit (4-byte) block
    of data
  • Compare tag field with tag entry in cache to
    check for hit
  • Least significant 2 bits of address identify
    which byte is required from the 32-bit data
    block
  • e.g.
      Address   Tag      Data       Cache line
      16339C    058CE7   FEDCBA98   0001
62
Associative Mapping Example
Main Memory Address
63
Set Associative Mapping
  • Cache is divided into a number of sets
  • Each set contains a number of lines
  • A given block maps to any line in a given set
  • e.g. Block B can be in any line of set i
  • e.g. 2 lines per set
  • 2 way associative mapping
  • A given block can be in one of 2 lines in only
    one set

64
Set Associative Mapping
  • In this case, the cache is divided into v sets,
    each of which consists of k lines.
  • The relationships are
  • m = v x k
  • i = j modulo v
  • where
  • i = cache set number
  • j = main memory block number
  • m = number of lines in the cache
  • This is referred to as k-way set associative
    mapping.

65
Two Way Set Associative Cache Organization
The d set bits specify one of v = 2^d sets. The s
bits of the tag and set fields specify one of the
2^s blocks of main memory. With k-way set
associative mapping, the tag in a memory address
is much smaller and is only compared to the k
tags within a single set.
66
Set Associative Mapping Example
  • 13-bit set number
  • Block number in main memory is modulo 2^13
  • 000000, 008000, 010000, 018000 map to same set
    (their addresses differ only in the 9-bit tag)

67
Set Associative Mapping Address Structure
  Tag: 9 bits | Set: 13 bits | Word: 2 bits
  • Use set field to determine cache set to look in
  • Tag + set field specifies one of the blocks in the
    main memory.
  • Compare tag field to see if we have a hit
  • e.g.
      Address    Tag   Data       Set number
      1FF 7FFC   1FF   24682468   1FFF
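
The field extraction can be sketched as follows. The example address is written on the slide as a 9-bit tag (1FF) followed by the remaining 15 bits (7FFC); the sketch assumes these combine into the 24-bit address FFFFFC. The function name is illustrative.

```python
from typing import Tuple

WORD_BITS = 2
SET_BITS = 13


def split_set_associative(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set number, word) for a 24-bit address."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_number = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


addr = (0x1FF << 15) | 0x7FFC  # tag 1FF followed by the 15 set+word bits
tag, set_number, word = split_set_associative(addr)
print(format(tag, "03X"), format(set_number, "04X"), word)  # 1FF 1FFF 0
```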

68
Two Way Set Associative Mapping Example
e.g.
  Address    Tag   Data       Set number
  1FF 7FFC   1FF   24682468   1FFF
  02C 0004   02C   11235813   0001
Main Memory Address
69
Replacement Algorithms (1): Direct mapping
  • When a new block is brought into the cache, one
    of the existing blocks must be replaced.
  • Direct mapping
  • No choice
  • Each block only maps to one line
  • Replace that line

70
Replacement Algorithms (2): Associative & Set
Associative
  • Hardware implemented algorithm (speed)
  • Least Recently used (LRU)
  • Replace that block in the set which has been in
    the cache longest with no reference to it. (hit
    ratio & time)
  • e.g. in 2 way set associative
  • Which of the 2 blocks is LRU?
  • First in first out (FIFO)
  • replace block in the set that has been in cache
    longest. (time)
  • Least frequently used
  • replace block in the set which has had fewest
    hits. (hit ratio)
  • Random
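
The LRU policy for a single set can be modelled in software as follows. Real caches implement this in hardware; the `OrderedDict`-based class here is only a behavioural sketch with invented names.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement (tags only, no data)."""

    def __init__(self, ways: int = 2):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag: int) -> bool:
        """Reference a block by tag; return True on a hit."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
print([s.access(t) for t in (1, 2, 1, 3, 2)])
# [False, False, True, False, False]
print(list(s.lines))  # [3, 2]
```

In this trace the reference to tag 1 makes tag 2 the LRU block, so tag 3 evicts tag 2. Under FIFO, tag 3 would instead evict tag 1 (the oldest arrival) and the final access to tag 2 would hit.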

71
Write Policy
  • Must not overwrite a cache block unless main
    memory is up to date
  • Problems to contend with
  • More than one device may have access to main
    memory.
  • Data inconsistent between memory and cache
  • Multiple CPUs may have individual caches
  • Data inconsistent among caches
  • Write Policy
  • Write through
  • Write back
  • Write once

72
Write through
  • All writes go to main memory as well as cache
  • Any other processor-cache module can monitor main
    memory traffic to keep its local (to CPU) cache
    up to date.
  • Disadvantages
  • Lots of traffic
  • Slows down writes

73
Write back
  • Updates initially made in cache only
  • Update bit for cache slot is set when update
    occurs
  • If block in cache is to be replaced, write to
    main memory only if update bit is set
  • Other caches get out of sync
  • I/O must access main memory through cache
  • Because portions of main memory are invalid
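
The role of the update bit can be illustrated with a toy model. This sketch assumes a direct-mapped cache of whole blocks, a dictionary standing in for main memory, and allocation of a line on a write miss; none of these details come from the slides, and all names are hypothetical.

```python
class WriteBackCache:
    """Tiny direct-mapped write-back cache over a dict used as main memory."""

    def __init__(self, memory: dict, lines: int = 4):
        self.memory = memory            # block number -> value
        self.lines = lines
        self.slots = [None] * lines     # each slot: [block, value, update_bit]
        self.memory_writes = 0

    def _load(self, block: int) -> list:
        index = block % self.lines
        slot = self.slots[index]
        if slot is not None and slot[0] == block:
            return slot                  # hit
        if slot is not None and slot[2]:
            # The replaced block was updated: write it back first
            self.memory[slot[0]] = slot[1]
            self.memory_writes += 1
        slot = [block, self.memory.get(block, 0), False]
        self.slots[index] = slot
        return slot

    def read(self, block: int) -> int:
        return self._load(block)[1]

    def write(self, block: int, value: int) -> None:
        slot = self._load(block)
        slot[1] = value
        slot[2] = True                   # set the update bit; memory is now stale


memory = {0: 10, 4: 40}
cache = WriteBackCache(memory)
cache.write(0, 11)
print(memory[0], cache.memory_writes)  # 10 0  (main memory still holds the old value)
cache.read(4)                          # block 4 maps to the same slot as block 0
print(memory[0], cache.memory_writes)  # 11 1  (written back on replacement)
```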

74
Approaches to cache coherency
  • Bus watching with write through
  • Each cache controller monitors the address lines
    to detect write operations to memory by other bus
    masters.
  • This strategy depends on the use of a
    write-through policy by all cache controllers.
  • Hardware transparency
  • Additional hardware is used to ensure that all
    the updates to main memory via cache are
    reflected in all caches.
  • Noncacheable memory
  • Only a portion of main memory is shared by more
    than one processor.
  • In such a system, all accesses to shared memory
    are cache misses, because the shared memory is
    never copied to the cache.
  • The noncacheable memory can be identified using
    chip-select logic or high-address bits.

75
Line Size
  • The principle of locality
  • Data in the vicinity of a referenced word is
    likely to be referenced in the near future.
  • The relationship between block size and hit ratio
    is complex, depending on the locality
    characteristics of a particular program, and no
    definitive optimum value has been found.
  • A size of two to eight words seems
    reasonably close to optimum.

76
Number of caches
  • A single cache
  • Multiple caches
  • The number of levels of caches
  • The use of unified versus split caches
  • Split caches: one dedicated to instructions and
    one dedicated to data
  • Key advantage of split caches: they eliminate
    contention for cache between the instruction
    processor and the execution unit.
  • Unified cache: a single cache used to store
    references to both data and instructions
  • For a given cache size, a unified cache has a
    higher hit rate than split caches because it
    balances the load between instruction and data
    fetches automatically.

77
Number of caches
  • The on-chip cache: cache and processor on the
    same chip
  • When the requested instruction or data is found
    in the on-chip cache, the bus access is
    eliminated. Because of the short data paths
    internal to the processor, on-chip cache accesses
    will complete appreciably faster than would even
    zero-wait state bus cycles.
  • Advantages
  • Reduce the processor's external bus activity
  • Speed up execution times
  • Increase overall system performance
  • A two-level cache
  • The internal cache designated as level 1 (L1)
  • The external cache designated as level 2 (L2)

78
4.4 Pentium Cache
  • Foreground reading
  • Find out detail of Pentium II cache systems
  • NOT just from Stallings!

79
4.5 Newer RAM Technology (1)
  • Basic DRAM same since first RAM chips
  • Constraints of the traditional DRAM chip:
  • its internal architecture and its interface
    to the processor's memory bus.
  • Enhanced DRAM
  • Contains small SRAM as well
  • SRAM holds last line read
  • A comparator stores the 11-bit value of the most
    recent row address selection.
  • Cache DRAM (CDRAM)
  • Larger SRAM component
  • Use as cache or serial buffer

80
Newer RAM Technology (2)
  • Synchronous DRAM (SDRAM)
  • Access is synchronized with an external clock,
    unlike conventional DRAM, which is asynchronous.
  • Address is presented to RAM
  • Since SDRAM moves data in time with system clock,
    CPU knows when data will be ready
  • CPU does not have to wait; it can do something
    else
  • Burst mode allows SDRAM to set up stream of data
    and fire it out in block

81
Internal logic of the SDRAM
In burst mode, a series of data bits can be
clocked out rapidly after the first bit has been
accessed. Burst mode is useful when all the bits
to be accessed are in sequence and in the same
row of the array as the initial access. The SDRAM
has a dual-bank internal architecture that improves
opportunities for on-chip parallelism. The mode
register and associated control logic provide a
mechanism to customize the SDRAM to suit specific
system needs.
82
Newer RAM Technology (3)
  • Foreground reading
  • Check out any other RAM you can find
  • See Web site
  • The RAM Guide

83
Exercises
  • P143 4.4, 4.6, 4.7, 4.8
  • P145 4.20
  • Deadline