1
Chapter 6. Memory Organization
2
Transfer between P (the processor) and M (memory) should be such that P can operate at its maximum speed → not feasible with a single memory built from one technology.
CPU registers: a small set of high-speed registers inside P serving as working memory for temporary storage of instructions and data. Single-clock-cycle access.
Main (primary) memory: can be accessed directly and rapidly by the CPU. While its IC technology is similar to that of the CPU registers, access is slower because of the large capacity and the physical separation from the CPU.
Secondary (backup) memory: much larger in capacity, much slower, and much cheaper than main memory.
Cache: an intermediate temporary storage unit between the processor registers and main memory. One-to-three-clock-cycle access.
3
  • The objective of memory design is to provide adequate storage capacity with an acceptable level of performance and cost → memory hierarchy, automatic storage-allocation concepts, virtual memory concepts, and the design of the communication links.
  • Memory Device Characteristics
  • 1. Cost: c = C/S (dollars/bit), where C is the total cost and S the storage capacity in bits.
  • 2. Access time (tA): the average time required to read one word from the memory, measured from the time a read request is received by the memory to the time when all the requested information has been made available at the memory output. It depends on the physical nature of the storage medium and on the access mechanism used. Memory units with fast access are expensive.
  • 3. Access mode:
  •    RAM (Random Access Memory): locations can be accessed in any order, and the access time is independent of the location.
  •    Serial-access memory (e.g., magnetic tape): locations must be accessed in sequence.
4
4. Alterability: ROM (Read-Only Memory), PROM (Programmable ROM), EPROM (Erasable Programmable ROM).
5. Permanence of storage: destructive readout, dynamic storage, and volatility. e.g., dynamic memory (DRAM) requires periodic refreshing; static random-access memory (SRAM) requires no periodic refreshing. DRAM is much cheaper than SRAM. A memory is volatile if the stored information can be destroyed by a power failure.
6. Cycle time (tM): the mean time that must elapse between the initiation of two consecutive access operations. tM can be greater than tA. (A dynamic memory can't initiate a new access until a pending refresh operation completes.)
7. Physical characteristics: storage density, reliability (MTBF).
5
(No Transcript)
6
RAM: the access and cycle times for every location are constant and independent of the location's position.
7
(No Transcript)
8
Array organization: the memory address is partitioned into d components, so that the address Ai of cell Ci becomes a d-dimensional vector Ai = (Ai1, Ai2, …, Aid). Each of the d parts goes to a different address decoder → a d-dimensional array. Usually a 2-dimensional array organization is used: compared with a 1-dimensional organization it requires less access circuitry and less access time, and the 2-D organization matches well the two-dimensional circuit structure produced by IC technology.
Key issues: how to reduce access time; fault-tolerant techniques.
6.2 Memory Systems
A hierarchical storage system managed by the operating system. Its goals:
1. To free programmers from the need to carry out storage allocation and to permit efficient sharing of memory space among different users.
2. To make programs independent of the configuration and the capacity of the memory systems used during their execution.
3. To achieve the high access rates and low cost per bit that are possible with a memory hierarchy → implemented by an automatic address-mapping mechanism.
A typical hierarchy of memories: (M1, M2, …, Mk).
9
Generally, all information held in Mi-1 at any time is also stored in Mi, but not vice versa. Let
  Ci  = cost per bit:       Ci  > Ci+1
  tAi = access time:        tAi < tAi+1
  Si  = storage capacity:   Si  < Si+1
10
If the address that the CPU generates is currently assigned only to Mi for i ≠ 1, execution of the program must be suspended until that information is reassigned from Mi to M1 → very slow.
→ To work efficiently, the addresses generated by the CPU should be found in M1 as often as possible. The memory hierarchy works because of a common characteristic of programs: locality of reference.
11
  • Locality of reference: the addresses generated by a typical program tend to be confined to small regions of its logical address space over the short term.
  • Spatial locality: consecutive memory references are to addresses that are close to one another in the memory-address space → instead of transferring a single instruction I to M1, transfer a page of consecutive words containing I.
  • Temporal locality: instructions in a loop are executed repeatedly, resulting in a high frequency of reference to their addresses.

12
→ Hit ratio H: the probability that a logical address generated by the CPU refers to information held in M1 → we want H to be close to 1. H is estimated by executing a set of representative programs and counting
  N1 = number of address references satisfied by M1
  N2 = number of address references satisfied by M2
  H = N1 / (N1 + N2)
Miss ratio = 1 − H.
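For example, with made-up counts N1 = 9900 and N2 = 100:

```latex
H = \frac{N_1}{N_1 + N_2} = \frac{9900}{9900 + 100} = 0.99,
\qquad \text{miss ratio} = 1 - H = 0.01
```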
13
  • 6.2.2 Address Translation: map the logical addresses into the physical address space P of main memory → performed by the OS while the program is being executed.
  • Static translation: assigns a fixed value to the base address of each block when the program is first loaded.
  • Dynamic translation: allocates storage during execution.
  • Base addressing: Aeff = B + D (or Aeff = B.D, the concatenation of B and D); a minimal sketch follows.
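A minimal sketch of base-plus-displacement translation in C, assuming a simple limit check (the function, constants, and fault handling are illustrative, not from the slides):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of dynamic base addressing: the effective address is Aeff = B + D,
 * where base register B is set by the OS when the block is (re)located and
 * D is the displacement supplied by the program. */
static uint32_t translate(uint32_t base, uint32_t disp, uint32_t limit)
{
    if (disp >= limit) {                      /* displacement out of range */
        fprintf(stderr, "address fault: D=%u >= limit %u\n", disp, limit);
        return UINT32_MAX;
    }
    return base + disp;                       /* Aeff = B + D */
}

int main(void)
{
    uint32_t B = 0x4000;                      /* block relocated to 0x4000 */
    printf("Aeff = %#x\n", translate(B, 0x12C, 0x1000));  /* prints 0x412c */
    return 0;
}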

14
(No Transcript)
15
Translation look-aside buffer (TLB)
16
  • Segments: a segment is a set of logically related, contiguous words, such as a program or a data set.
  • The physical addresses assigned to the segments are kept in a segment table. Each descriptor contains (rendered in C below):
  • A presence bit P that indicates whether the segment is currently assigned to M1.
  • A copy bit C that specifies whether this is the original (master) copy of the descriptor.
  • A 20-bit size field Z that specifies the number of words in the segment.
  • A 20-bit address field S that is the segment's real address in M1 (when P = 1) or M2 (when P = 0).
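As a concrete, hypothetical rendering of such a descriptor in C (the bit-field packing is illustrative; only the field meanings come from the slide):

```c
#include <stdint.h>

/* Sketch of a segment-table entry with the fields listed above. */
struct segment_descriptor {
    unsigned present : 1;   /* P: segment currently resides in M1?          */
    unsigned copy    : 1;   /* C: is this the original (master) descriptor? */
    unsigned size    : 20;  /* Z: number of words in the segment            */
    uint32_t address;       /* S: real address in M1 (P = 1) or M2 (P = 0);
                               only 20 bits of this word are used           */
};
```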

17
  • Pages: fixed-length blocks.
  • Advantage: very simple memory allocation.
  • Logical address = a page address + a displacement within the page.
  • Page table: maps each logical page address to the corresponding physical address.
  • Disadvantage: page boundaries carry no logical significance, so neighboring pages need not be related.
  • Paged segments: divide each segment into pages.
  • Logical address = a segment address + a page address + a displacement (see the sketch after this list).
  • Advantage: a segment need not be stored in a contiguous region of main memory (more flexible memory management).
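A sketch of the resulting translation, under assumed field widths (4096-word pages, 64 pages per segment); nothing here is prescribed by the slides:

```c
#include <stdint.h>

/* Hypothetical paged-segment translation. A logical address is the triple
 * (segment s, page p, displacement d). */
#define PAGE_BITS     12                 /* assume 4096-word pages */
#define PAGES_PER_SEG 64                 /* assumed table shape    */

uint32_t translate(const uint32_t page_table[][PAGES_PER_SEG],
                   uint32_t s, uint32_t p, uint32_t d)
{
    uint32_t frame = page_table[s][p];   /* physical page (frame) number */
    return (frame << PAGE_BITS) | (d & ((1u << PAGE_BITS) - 1));
}
```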

18
  • Optimal page size for paged segments.
  • Sp = page size → affects both storage utilization and memory access rate.
  • Too small an Sp → a large page table → reduced utilization.
  • Too large an Sp → excessive internal fragmentation.
  • S = memory space overhead due to the paged segment (minimized in the derivation below).
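The slide stops at defining S; a standard derivation (assuming an average segment size Ss, one page-table word per page, and an average of half a page wasted per segment) makes the trade-off explicit:

```latex
S(S_p) \approx \frac{S_s}{S_p} + \frac{S_p}{2}
\quad\Rightarrow\quad
\frac{dS}{dS_p} = -\frac{S_s}{S_p^{2}} + \frac{1}{2} = 0
\quad\Rightarrow\quad
S_p^{\mathrm{opt}} = \sqrt{2 S_s}
```

The first term is the page-table overhead (which argues for large pages), the second is internal fragmentation (which argues for small ones).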

19
  • A special processor, the MMU (Memory Management Unit), handles address translation.
  • Main memory allocation:
  • Main memory is divided into regions, each of which has a base address to which a particular block is to be assigned.
  • Main memory allocation is the process of determining the region. It is managed through
  • 1. an occupied-space list: block name, address, size.
  • 2. an available-space list: empty regions.
  • 3. a secondary-memory directory.
  • Deallocation: when a block is no longer required in main memory, its entry is transferred from the occupied-space list to the available-space list.

20
  • Suppose that a block Ki of ni words is transferred from secondary to main memory.
  • Preemptive: the incoming block may be assigned to a region occupied by another block, either by relocating or by expelling that block.
  • Non-preemptive: the incoming block may be placed only in an unoccupied region that is large enough to accommodate it.
  • → Non-preemptive allocation: if no block is to be preempted by the incoming block Ki of ni words, then
  • → find an unoccupied available region of ni or more words,
  • → using the first-fit method or the best-fit method (sketched after this list).
  • → The first-fit method scans the memory map sequentially until an available region of at least ni words is found, then allocates Ki there.
  • → The best-fit method scans the entire map and assigns Ki to a region of nj ≥ ni words such that (nj − ni) is minimized.
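A compact sketch of the two scans over an available-space list (the region representation is an assumption):

```c
#include <stddef.h>

struct region { size_t base, size; };    /* one entry of the free list */

/* First fit: return the first free region with size >= n, or -1. */
int first_fit(const struct region *free_list, int count, size_t n)
{
    for (int i = 0; i < count; i++)
        if (free_list[i].size >= n)
            return i;
    return -1;
}

/* Best fit: return the free region minimizing (size - n), or -1. */
int best_fit(const struct region *free_list, int count, size_t n)
{
    int best = -1;
    for (int i = 0; i < count; i++)
        if (free_list[i].size >= n &&
            (best < 0 || free_list[i].size < free_list[best].size))
            best = i;
    return best;
}
```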

21
(No Transcript)
22
→ Preemptive allocation: under non-preemptive allocation, overflow can occur; reallocation permits more efficient use of memory.
1. The blocks already in M1 can be relocated within M1 to make a gap large enough for the incoming block.
2. More available space can be created by deallocating blocks → which raises the question of how to select the blocks to be replaced.
Dirty blocks (modified blocks): before being overwritten, a dirty block must be copied back into secondary memory → an I/O operation.
Clean blocks (unmodified blocks): can simply be overwritten.
Compaction technique: combine the scattered available regions into a single block by relocation.
Advantage: eliminates the problem of selecting an available region. Disadvantage: compaction time is required.
23
Replacement policies aim to maximize the hit ratio: FIFO and LRU.
Optimal replacement strategy: at time ti, determine for each block K the time tj > ti at which the next reference to K occurs, then replace the block K for which (tj − ti) is maximum. → This requires two passes through the program: the first is a simulation run to determine the sequence SB of virtual block addresses; the second is the execution run, which uses the optimal sequence SB(OPT) to specify the blocks to be replaced. → Not practical.
FIFO: select for replacement the block least recently loaded into main memory.
LRU (Least Recently Used): select for replacement the least recently accessed block, on the assumption that the least recently used block is the one least likely to be referenced in the future.
Implementation: FIFO is much simpler.
Disadvantage of FIFO: a frequently used block, such as one containing a program loop, may be replaced merely because it is the oldest block; LRU avoids replacing frequently used blocks.
Factors affecting H:
1. The type of address streams encountered.
2. The average block size.
3. The capacity of main memory.
4. The replacement policy.
24
Page address stream: 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2
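A small simulation sketch that replays this stream under both policies, assuming three page frames (the slide does not state the frame count); on this particular stream LRU scores more hits than FIFO:

```c
#include <stdio.h>

#define FRAMES 3   /* assumed number of page frames */

/* Replay a page-address stream, counting hits. mode 0 = FIFO, 1 = LRU.
 * age[] holds the load time (FIFO) or the last-use time (LRU); in both
 * policies the frame with the smallest age is the victim. */
static int count_hits(const int *stream, int n, int mode)
{
    int frame[FRAMES], age[FRAMES];
    int used = 0, nhits = 0;
    for (int t = 0; t < n; t++) {
        int slot = -1;
        for (int i = 0; i < used; i++)
            if (frame[i] == stream[t]) { slot = i; break; }
        if (slot >= 0) {                      /* hit */
            nhits++;
            if (mode == 1) age[slot] = t;     /* LRU refreshes on every use */
            continue;
        }
        if (used < FRAMES) {                  /* a frame is still free */
            slot = used++;
        } else {                              /* evict the min-age frame */
            slot = 0;
            for (int i = 1; i < FRAMES; i++)
                if (age[i] < age[slot]) slot = i;
        }
        frame[slot] = stream[t];
        age[slot] = t;                        /* record load / last-use time */
    }
    return nhits;
}

int main(void)
{
    const int stream[] = {2,3,2,1,5,2,4,5,3,2,5,2};
    const int n = (int)(sizeof stream / sizeof *stream);
    printf("FIFO hits: %d of %d\n", count_hits(stream, n, 0), n);
    printf("LRU  hits: %d of %d\n", count_hits(stream, n, 1), n);
    return 0;
}
```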
25
6.3 Caches
High-speed memory. There are several approaches to increasing the effective P-M interface bandwidth:
1. Decrease the memory access time by using a faster technology (limited by cost).
2. Access more than one word during each memory cycle.
3. Insert a cache memory between P and M.
4. Use associative addressing in place of the random-access method.
Cache: a small, fast memory placed between P and M. Many of the techniques developed for virtual memory management have been applied to cache systems.
26
In a multiprocessor system, each processor has its own cache to reduce the effective time required by a processor to access addresses, instructions, or data. The cache stores a set of main-memory addresses Ai and the corresponding words M(Ai). A physical address A is sent from the CPU to the cache at the start of each read or write memory access cycle. The cache compares the address tag A with all the addresses it currently stores. If there is a match (a cache hit), the cache selects M(A). If a cache miss occurs, the main-memory block P(A) containing the desired item M(A) is copied into the cache.
27
Look-aside: the cache and the main memory are directly connected to the system bus.
Look-through: faster, but more expensive. The CPU communicates with the cache via a separate bus, leaving the system bus available for other units to communicate with main memory → a cache access and a main-memory access not involving the CPU can proceed concurrently. Only after a cache miss does the CPU send memory requests to main memory.
28
(No Transcript)
29
(No Transcript)
30
  • Two important issues in cache design:
  • 1. How to map main memory addresses into cache addresses.
  • 2. How to update main memory when a write operation changes the contents of the cache.

31
Updating main memory
Write-back: a cache block into which any write operation has occurred is copied back into main memory only when the block is removed from the cache.
Single-processor case: when such a block is removed, it is copied back into main memory.
Multiprocessor case: inconsistency.
[Figure: processors P1, P2, …, Pk, each with its own cache, sharing main memory M1; a write by one processor leaves stale copies in the other caches.]
The problem arises when there are several processors with independent caches.
32
Write-through: transfer the data word to both the cache and main memory during each write cycle, even when the target address is already assigned to the cache. → More writes to main memory than with write-back.
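A toy model contrasting the two update policies on write hits; the cache-line structure and the printf "memory" are stand-ins for illustration, not the book's design:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct line { uint32_t tag; bool dirty; uint32_t data; };

static void memory_write(uint32_t addr, uint32_t word)
{
    printf("M write: [%#x] <- %u\n", addr, word);
}

/* Write-through: update the cache AND main memory on every write. */
static void write_through(struct line *l, uint32_t addr, uint32_t word)
{
    l->data = word;
    memory_write(addr, word);
}

/* Write-back: update only the cache and mark the line dirty... */
static void write_back(struct line *l, uint32_t addr, uint32_t word)
{
    (void)addr;
    l->data = word;
    l->dirty = true;
}

/* ...copying the line to memory only when it is evicted. */
static void evict(struct line *l, uint32_t addr)
{
    if (l->dirty)
        memory_write(addr, l->data);
    l->dirty = false;
}

int main(void)
{
    struct line l = {0};
    write_through(&l, 0x100, 1);   /* memory written immediately        */
    write_back(&l, 0x100, 2);      /* memory untouched for now          */
    write_back(&l, 0x100, 3);      /* repeated writes absorbed by cache */
    evict(&l, 0x100);              /* single memory write on eviction   */
    return 0;
}
```

The run shows why write-back generates fewer main-memory writes: the two buffered writes collapse into one transfer at eviction time.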
33
6.3.2 Address Mapping
  • When a tag address is presented to the cache, it must be compared quickly with the stored tags.
  • Scanning all tags in sequence → unacceptably slow.
  • The fastest technique: associative (or content) addressing, which compares the incoming tag with all stored tags simultaneously.
  • Associative addressing: any stored item can be accessed by using the contents of the item in question as an address.
  • Associative memory = content-addressable memory (CAM).
  • Items in an associative memory have a two-field format:
  •   (Key, Data) = (stored address, information to be accessed)
  • An associative cache uses the tag as the key: the incoming tag is compared simultaneously with all tags stored in the cache's tag memory.

34
Associative memory
  • Any subfield of the word can serve as the key, as specified by a mask register.
  • Since all words in the memory are required to compare their keys with the input key simultaneously, each word needs its own match circuit.
  • → Much more complex and expensive than conventional memories, but VLSI techniques have made CAM economically feasible.
35
(No Transcript)
36
  • All words share a common set of data and mask lines for each bit position → simultaneous comparison across all words (modeled below).
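In software terms, a masked CAM lookup behaves like the following sketch; the loop models what the hardware match circuits do in parallel:

```c
#include <stdint.h>

/* Masked CAM lookup: only the bit positions selected by 'mask' take
 * part in the match, as chosen by the mask register described above. */
int cam_match(const uint32_t *keys, int n, uint32_t key, uint32_t mask)
{
    for (int i = 0; i < n; i++)               /* hardware: all i at once */
        if (((keys[i] ^ key) & mask) == 0)    /* all masked bits agree   */
            return i;                         /* index of matching word  */
    return -1;                                /* no match                */
}
```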

37
Direct mapping: a simpler address mapping for caches.
  • Simple implementation: the low-order s bits of each block address form a set address (see the decomposition sketch below).
  • Main drawback: if two or more frequently used blocks happen to map onto the same region of the cache, the hit ratio drops sharply.
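A sketch of how a direct-mapped cache splits an address into tag, set index, and byte offset; the widths chosen here (16-byte lines, 32 sets) are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4                       /* assume 16-byte lines */
#define INDEX_BITS  5                       /* assume 32 cache sets */

int main(void)
{
    uint32_t addr   = 0x1A2B3C4D;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    /* Two addresses with equal index but different tags collide: they
     * compete for the same line, which is the drawback noted above.   */
    printf("tag=%#x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```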

38
Set-associative mapping: a combination of associative and direct mapping.
39
6.3.3 Structure vs. Performance
  • Cache types: I-cache and D-cache, reflecting their different access patterns. Programs involve few write accesses and show more temporal and spatial locality than the data they process.
  • Two or more cache levels appear in high-performance systems, owing to
  • the feasibility of including part of the real memory space on a microprocessor chip, and the growth in the size of main memory.
  • L1 cache: on-chip memory.
  • L2 cache: off-chip memory.
  • The desirability of an L2 cache increases with the size of main memory, assuming the L1 cache has a fixed size.

40
Performance
  • tA = tA1 + (1 − H) tB
  • tA  = average access time
  • tA1 = cache access time
  • tA2 = M2 (main-memory) access time
  • tB  = block transfer time from M2 to M1
  • With a sufficiently wide M2-to-M1 data bus, a block can be loaded into the cache in a single M2 read operation, so tB = tA2 and
  • tA = tA1 + (1 − H) tA2
  • Suppose that M2 is six times slower than M1 (tA2 = 6 tA1):
  •   for H = 0.99, tA = 1.06 tA1;
  •   for H = 0.95, tA = 1.30 tA1.
  • A small decrease in the cache's hit ratio H has a disproportionately large impact on performance; the arithmetic is spelled out below.
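Filling in the arithmetic behind those two figures:

```latex
t_A = t_{A1} + (1-H)\,t_{A2}, \quad t_{A2} = 6\,t_{A1}:
\qquad
H = 0.99 \;\Rightarrow\; t_A = (1 + 0.01 \times 6)\,t_{A1} = 1.06\,t_{A1},
\qquad
H = 0.95 \;\Rightarrow\; t_A = (1 + 0.05 \times 6)\,t_{A1} = 1.30\,t_{A1}
```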

41
  • A general approach to choosing a cache's main size parameters S1 (number of sets), K (number of blocks per set), and P1 (number of bytes per block):
  • 1. Select a block (line) size P1. This value is typically the same as the width w of the data path between the CPU and main memory, or a small multiple of w.
  • 2. Select programs representative of the expected workloads and estimate the number of address references to be simulated. Take particular care to ensure that the cache is filled before H is measured.
  • 3. Simulate the possible designs for each set count S1 and associativity degree K of acceptable cost. Methods similar to stack processing (Section 6.2.3) can be used to simulate several cache configurations in a single pass.
  • 4. Plot the resulting data and determine a satisfactory trade-off between performance and cost.

42
  • In many cases, doubling the cache size from S1 to 2S1 cuts the miss ratio (1 − H) by about 30%.