Transcript and Presenter's Notes

Title: Chapter 6: Memory


1
Chapter 6: Memory
  • Memory is organized into a hierarchy
  • Memory near the top of the hierarchy is faster,
    but also more expensive, so we have less of it in
    the computer; this presents a challenge:
  • how do we make use of faster memory without
    having to go down the hierarchy to slower memory?
  • CPU accesses memory at least once per
    fetch-execute cycle
  • Instruction fetch
  • Possible operand reads
  • Possible operand write
  • RAM is much slower than the CPU, so we need a
    compromise
  • Cache
  • We will explore memory here
  • RAM, ROM, Cache, Virtual Memory

2
Types of Memory
  • Cache
  • SRAM (static RAM): made up of flip-flops (like
    registers)
  • Slower than registers because of the added
    circuitry to find the proper cache location, but
    much faster than RAM
  • DRAM is 10-100 times slower than SRAM
  • ROM
  • Read-only memory: contents of memory are fused
    into place
  • Variations
  • PROM: programmable (comes blank and the user can
    program it once)
  • EPROM: erasable PROM, where the entire contents
    of the PROM can be erased using ultraviolet light
  • EEPROM: electrical fields can alter parts of the
    contents, so it is selectively erasable; a newer
    variation, flash memory, provides greater speed
  • RAM
  • stands for random access memory because you
    access into memory by supplying the address
  • it should be called read-write memory (Cache and
    ROMs are also random access memories)
  • Actually known as DRAM (dynamic RAM) and is built
    out of capacitors
  • Capacitors lose their charge, so they must be
    recharged often (every couple of milliseconds);
    reads are destructive, so cells must also be
    recharged after a read

3
Memory Hierarchy Terms
  • The goal of the memory hierarchy is to keep the
    contents that are needed now at or near the top
    of the hierarchy
  • We discuss the performance of the memory
    hierarchy using the following terms
  • Hit: when the datum being accessed is found at
    the current level
  • Miss: when the datum being accessed is not found
    and the next level of the hierarchy must be
    examined
  • Hit rate: how many hits out of all memory
    accesses
  • Miss rate: how many misses out of all memory
    accesses
  • NOTE: hit rate = 1 - miss rate, miss rate = 1 -
    hit rate
  • Hit time: time to access this level of the
    hierarchy
  • Miss penalty: time to access the next level

4
Effective Access Time Formula
  • We want to determine the impact that the memory
    hierarchy has on the CPU
  • In a pipelined machine, we expect 1 instruction
    to leave the pipeline each cycle
  • the system clock is usually set to the speed of
    the cache
  • but a memory access to DRAM takes more time, so
    this impacts the CPU's performance
  • On average, we want to know how long a memory
    access takes (whether it is cache, DRAM or
    elsewhere)
  • effective access time = hit time + miss rate ×
    miss penalty
  • that is, our memory access, on average, is the
    time it takes to access the cache, plus, for a
    miss, how much time it takes to access memory
  • With a 2-level cache, we can expand our formula
  • average memory access time = hit time0 + miss
    rate0 × (hit time1 + miss rate1 × miss penalty1)
  • We can expand the formula further to include
    access to swap space (hard disk); a small numeric
    sketch follows below
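As a quick illustration of the formula, here is a
minimal Python sketch; the function names and the
sample hit times, miss rates, and penalties are
made-up illustration values, not figures from the
slides.

def effective_access_time(hit_time, miss_rate, miss_penalty):
    """One-level form: EAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def two_level_eat(hit0, miss0, hit1, miss1, penalty1):
    """Two-level form: the level-0 miss penalty is itself the
    effective access time of level 1."""
    return hit0 + miss0 * (hit1 + miss1 * penalty1)

# Hypothetical numbers: 2 ns cache, 5% miss rate, 50 ns DRAM
print(effective_access_time(2, 0.05, 50))   # 4.5 ns
print(two_level_eat(2, 0.05, 8, 0.02, 50))  # 2 + .05*(8 + .02*50) = 2.45 ns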

5
Locality of Reference
  • The better the hit rate for level 0, the better
    off we are
  • Similarly, if we use 2 caches, we want the hit
    rate of level 1 to be as high as possible
  • We want to implement the memory hierarchy to
    follow Locality of Reference
  • accesses to memory will generally be near recent
    memory accesses and those in the near future will
    be around this current access
  • Three forms of locality
  • Temporal locality: recently accessed items tend
    to be accessed again in the near future (local
    variables, instructions inside a loop)
  • Spatial locality: accesses tend to be clustered
    (accessing a[i] will probably be followed by
    a[i+1] in the near future)
  • Sequential locality: instructions tend to be
    accessed sequentially
  • How do we support locality of reference? (see
    the short access-pattern sketch below)
  • If we bring something into cache, bring in its
    neighbors as well
  • Keep an item in the cache for a while, as we hope
    to keep using it
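To make the locality idea concrete, here is a small
Python sketch contrasting a sequential access
pattern (good spatial and sequential locality)
against a large-stride pattern; the array size and
stride are arbitrary illustration values.

def sum_sequential(a):
    # Consecutive elements share a cache line, so after the first
    # access in a line the neighbors are likely hits (spatial locality)
    total = 0
    for i in range(len(a)):
        total += a[i]          # a[i] is followed by a[i+1]
    return total

def sum_strided(a, stride):
    # Each access likely lands in a different line, defeating
    # spatial locality even though the same work is done
    total = 0
    for start in range(stride):
        for i in range(start, len(a), stride):
            total += a[i]
    return total

data = list(range(1_000_000))
assert sum_sequential(data) == sum_strided(data, 4096)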

6
Cache
  • Cache is fast memory
  • Used to store instructions and data
  • It is hoped that what is needed will be in cache
    and what isn't needed will be moved out of cache
    back to memory
  • Issues
  • What size cache? How many caches?
  • How do you access what you need?
  • since cache only stores part of what is in
    memory, we need a mechanism to map from the
    memory address to the location in cache
  • this is known as the cache's mapping function
  • If you have to bring in something new, what do
    you discard?
  • this is known as the replacement strategy
  • What happens if you write a new value to cache?
  • we must update the now obsolete value(s) in memory

7
Cache and Memory Organization
  • Group memory locations into lines (or refill
    lines)
  • For instance, 1 line might store 16 bytes or 4
    words
  • The line size varies architecture-to-architecture
  • All main memory addresses are broken into two
    parts
  • the line
  • the location in the line
  • If we have 256 megabytes of memory, word
    addressed, with a word size of 4 bytes and 4
    words per line, we would have 16,777,216 lines,
    so our 26-bit word address has 24 bits for the
    line number and 2 bits for the word within the
    line (see the sketch below)
  • The cache has the same organization but there are
    far fewer line numbers (say 1024 lines of 4 words
    each)
  • So the remainder of the address becomes the tag
  • The tag is used to make sure that the line we
    want is the line we found

The valid bit indicates whether the given line
currently holds valid data; a separate dirty bit
records whether the line has been modified (is the
copy in memory still valid or outdated?)
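The address split can be sketched in Python; the
code below uses the slide's word-addressed 26-bit
address with 4 words per line and assumes a
direct-mapped cache of 1024 lines, so the cache
index is 10 bits and the tag is the remaining 14
bits.

WORD_BITS = 2          # 4 words per line
CACHE_LINE_BITS = 10   # assumed: 1024 lines in the cache
TAG_BITS = 26 - CACHE_LINE_BITS - WORD_BITS   # 14

def split_address(addr):
    """Break a 26-bit word address into (tag, cache line, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << CACHE_LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + CACHE_LINE_BITS)
    return tag, line, word

# Memory has 2**24 = 16,777,216 lines but the cache keeps only 1024,
# so the tag identifies which memory line is actually resident
print(split_address(0b11111111111111_1010101010_01))   # (16383, 682, 1)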
8
Types of Cache
  • The mapping function is based on the type of
    cache
  • Direct-mapped: each entry in memory has 1
    specific place where it can be placed in cache
  • this is a cheap, easy, and fast cache to
    implement; since each memory line has only one
    possible location, no replacement strategy is
    needed, but it has the poorest hit rate
  • Associative: any memory item can be placed in
    any cache line
  • this cache uses associative memory so that an
    entry is searched for in parallel; this is
    expensive and tends to be slower than a
    direct-mapped cache; however, because we are free
    to place an entry anywhere, we can use a
    replacement strategy and thus get the best hit
    rate
  • Set-associative: a compromise between these two
    extremes
  • by grouping lines into sets so that a line is
    mapped into a given set, but within that set, the
    line can go anywhere
  • a replacement strategy is used to determine which
    line within a set should be used, so this cache
    improves on the hit rate of the direct-mapped
    cache
  • while not being as expensive or as slow as the
    associative cache

9
Direct Mapped Cache
  • Assume m refill lines
  • A line j in memory will be found in cache at
    location j mod m
  • Since each line has 1 and only 1 location in
    cache, there is no need for a replacement
    strategy
  • This yields poor hit rate but fast performance
    (and cheap)
  • All addresses are broken into 3 parts
  • a line number (to determine the line in cache)
  • a word number
  • the rest is the tag: compare the tag to make
    sure you have the right line

Assume 24-bit addresses; if the cache has 16,384
lines, each storing 4 words, then the address
breaks down into 2 bits for the word, 14 bits for
the line, and 8 bits for the tag (see the sketch
below)
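Below is a minimal Python model of a direct-mapped
lookup using the slide's 24-bit example (16,384
lines of 4 words each: 2 word bits, 14 line bits,
8 tag bits); the class and method names are my
own, for illustration only.

WORD_BITS, LINE_BITS, TAG_BITS = 2, 14, 8    # 24-bit addresses

class DirectMappedCache:
    def __init__(self):
        n = 1 << LINE_BITS
        self.valid = [False] * n
        self.tag = [0] * n
        self.data = [[0, 0, 0, 0] for _ in range(n)]   # 4 words per line

    def lookup(self, addr):
        word = addr & ((1 << WORD_BITS) - 1)
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        # A line is usable only if it is valid and its tag matches
        if self.valid[line] and self.tag[line] == tag:
            return "hit", self.data[line][word]
        return "miss", None   # go to the next level of the hierarchy

cache = DirectMappedCache()
print(cache.lookup(0x00ABCD))   # miss on an empty cache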
10
(No Transcript)
11
Associative Cache
  • Any line in memory can be placed in any line in
    cache
  • No line number portion of the address, just a tag
    and a word within the line
  • Because the tag is longer, more tag storage space
    is needed in the cache, so these caches need more
    space and so are more costly
  • All tags are searched simultaneously using
    associative memory to find the tag requested
  • This is both more expensive and slower than
    direct-mapped caches but, because there are
    choices of where to place a new line, associative
    caches require a replacement strategy which might
    require additional hardware to implement

Notice how big the tag is; our cache now requires
more space to store these larger tags!
From our previous example, our address now looks
like this: 22 bits of tag and 2 bits for the word
within the line
12
Set Associative Cache
  • In order to provide some degree of variability in
    placement, we need more than a direct-mapped
    cache
  • A 2-way set associative cache provides 2 refill
    lines for each line number
  • Instead of n refill lines, there are now n / 2
    sets, each set storing 2 refill lines
  • We can think of this as having 2 direct-mapped
    caches of half the size
  • Because there are half as many sets as the
    direct-mapped cache had lines, the set number has
    1 fewer bit and the tag has 1 more (see the
    sketch below)
  • We can expand this to
  • 4-way set associative
  • 8-way set associative
  • 16-way set associative, etc
  • As the number increases, the hit rate improves,
    but the expense also increases and the hit time
    gets worse
  • Eventually we reach an n-way cache, which is a
    fully associative cache
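A sketch of a 2-way set-associative lookup,
carrying over the earlier 24-bit example: the set
field is one bit narrower than the direct-mapped
line field (13 set bits, 9 tag bits), and both
lines in the selected set are checked. The
structure and names are illustrative only.

WORD_BITS, SET_BITS = 2, 13             # half as many sets as lines
TAG_BITS = 24 - SET_BITS - WORD_BITS    # 9

class TwoWaySetAssociativeCache:
    def __init__(self):
        # Each set holds up to 2 (tag, line_data) entries
        self.sets = [[] for _ in range(1 << SET_BITS)]

    def lookup(self, addr):
        word = addr & ((1 << WORD_BITS) - 1)
        set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
        tag = addr >> (WORD_BITS + SET_BITS)
        for stored_tag, line_data in self.sets[set_index]:
            if stored_tag == tag:        # compare the tags of both ways
                return "hit", line_data[word]
        return "miss", None              # neither way holds the line

cache = TwoWaySetAssociativeCache()
print(cache.lookup(0x00ABCD))   # miss on an empty cache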

13
(No Transcript)
14
Replacement And Write Strategies
  • When we need to bring in a new line from memory,
    we will have to throw out a line
  • Which one?
  • No choice in a direct-mapped cache
  • For associative and set-associative, we have
    choices
  • We rely on a replacement strategy to make the
    best choice
  • this should promote locality of reference
  • Three replacement strategies are:
  • Least recently used (hard to implement: how do
    we determine which line was least recently used?
    see the sketch below)
  • First-in first-out (easy to implement, but not
    very good results)
  • Random
  • If we are to write a datum to cache, what about
    writing it to memory?
  • Write-through: write to both cache and memory at
    the same time
  • if we write to several data in the same line,
    though, this becomes inefficient
  • Write-back: wait until the refill line is being
    discarded and write back any changed values to
    memory at that time
  • This causes stale or dirty values in memory
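As an illustration of one replacement strategy,
here is a small least-recently-used sketch built on
Python's OrderedDict; the capacity and the
load_line callback are stand-ins, and real hardware
only approximates LRU rather than tracking it
exactly.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # tag -> line data, oldest first

    def access(self, tag, load_line):
        if tag in self.lines:                 # hit: mark as most recently used
            self.lines.move_to_end(tag)
            return self.lines[tag]
        if len(self.lines) >= self.capacity:  # full: evict least recently used
            self.lines.popitem(last=False)
        self.lines[tag] = load_line(tag)      # miss: fetch from the next level
        return self.lines[tag]

cache = LRUCache(capacity=2)
fetch = lambda tag: f"line {tag} from memory"
cache.access(1, fetch); cache.access(2, fetch)
cache.access(1, fetch)        # line 1 becomes most recently used
cache.access(3, fetch)        # evicts line 2, the least recently used
print(list(cache.lines))      # [1, 3]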

15
Virtual Memory
  • Just as DRAM acts as a backup for cache, hard
    disk (known as the swap space) acts as a backup
    for DRAM
  • This is known as virtual memory
  • Virtual memory is necessary because most programs
    are too large to store entirely in memory
  • Also, there are parts of a program that are not
    used very often, so why waste the time loading
    those parts into memory if they wont be used?
  • Page: a fixed-size unit of memory; all programs
    and data are broken into pages
  • Paging: the process of bringing in a page when
    it is needed (this might require throwing a page
    out of memory, moving it back to the swap disk)
  • The operating system is in charge of Virtual
    Memory for us
  • it moves needed pages into memory from disk and
    keeps track of where a specific page is placed

16
The Paging Process
  • When the CPU generates a memory address, it is a
    logical (or virtual) address
  • The first address of a program is 0, so the
    logical address is merely an offset into the
    program or into the data segment
  • For instance, address 25 is located 25 locations
    from the beginning of the program
  • But 25 is not the physical address in memory, so
    the logical address must be translated (or
    mapped) into a physical address
  • Assume memory is broken into fixed size units
    known as frames (1 page fits into 1 frame)
  • We know the logical address by its page and the
    offset into the page
  • We have to translate the page into the frame
    (that is, where is that particular page currently
    stored in memory, or is it even in memory?)
  • Thus, the mapping process for paging means
    finding the frame and replacing the page number
    with the frame number (a short sketch follows
    below)
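A minimal Python sketch of the translation step:
the page number indexes the page table, gets
replaced by a frame number, and the offset is
carried through unchanged. The page size is a
made-up value, and the page-to-frame mapping
mirrors the example on the next slide.

PAGE_OFFSET_BITS = 10                   # hypothetical 1 KB pages
page_table = {0: 2, 3: 0, 4: 1, 7: 3}   # page -> frame, for pages in memory

def translate(logical_addr):
    page = logical_addr >> PAGE_OFFSET_BITS
    offset = logical_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    if page not in page_table:
        raise LookupError("page fault")   # the OS must bring the page in
    frame = page_table[page]              # replace page number with frame number
    return (frame << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x0C25)))   # page 3 maps to frame 0; offset is unchanged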

17
Example of Paging
Here, we have a process of 8 pages but only 4
physical frames in memory; therefore we must
place a page into one of the available frames in
memory whenever a page is needed. At this point
in time, pages 0, 3, 4 and 7 have been moved into
memory at frames 2, 0, 1 and 3 respectively. This
information (of which page is stored in which
frame) is kept in memory in a structure known as
the page table. The page table also records
whether the given page has been modified (much as
a cache line can be dirty) and whether it is
currently in memory (the valid bit)
18
A More Complete Example
Virtual address mapped to physical address via
the page table:
Address 1010 is page 101, item 0. Page 101 (5)
is located in frame 11 (3), so item 1010 is found
at physical address 110 (checked in the snippet
below)
Logical and physical memory for our program
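The slide's arithmetic, checked in a short snippet:
3 page-number bits and 1 offset bit, with page 101
(5) mapped to frame 11 (3).

page_table = {5: 3}         # page 101 (5) resides in frame 11 (3)

logical = 0b1010            # page 101, item 0
page, offset = logical >> 1, logical & 0b1
physical = (page_table[page] << 1) | offset
print(bin(physical))        # 0b110, as on the slide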
19
Page Faults
  • Just as cache is limited in size, so is main
    memory; a process is usually given a limited
    number of frames
  • What if a referenced page is not currently in
    memory?
  • The memory reference causes a page fault
  • The page fault requires that the OS handle the
    problem
  • The process status is saved and the CPU switches
    to the OS
  • The OS determines if there is an empty frame for
    the referenced page, if not, then the OS uses a
    replacement strategy to select a page to discard
  • if that page is dirty, then the page must be
    written to disk instead of discarded
  • The OS locates the requested page on disk and
    loads it into the appropriate frame in memory
  • The page table is modified to reflect the change
  • Page faults are time consuming because of the
    disk access; this causes our effective memory
    access time to deteriorate badly! (a sketch of
    the fault-handling steps follows below)
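A sketch of the fault-handling steps listed above;
the frame count, the victim-selection policy
(simple FIFO here), and the disk routines are
stand-ins rather than actual OS code.

FREE_FRAMES = [0, 1, 2, 3]   # frames not yet holding any page
fifo_order = []              # FIFO victim selection (stand-in policy)
dirty = set()                # pages modified since they were loaded

def handle_page_fault(page, page_table, read_from_disk, write_to_disk):
    if FREE_FRAMES:                      # use an empty frame if one exists
        frame = FREE_FRAMES.pop()
    else:                                # otherwise pick a victim page
        victim = fifo_order.pop(0)
        frame = page_table.pop(victim)
        if victim in dirty:              # dirty pages must be written back
            write_to_disk(victim)
            dirty.discard(victim)
    read_from_disk(page, frame)          # the slow disk access
    page_table[page] = frame             # update the page table
    fifo_order.append(page)
    return frame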

20
Another Paging Example
Here, we have 13 bits for our addresses even
though main memory is only 4K = 2^12
21
The Full Paging Process
We want to avoid memory accesses (we prefer cache
accesses), but if every memory access now
requires first accessing the page table, which is
in memory, it slows down our computer. So we move
the most used portion of the page table into a
special cache known as the Translation Lookaside
Buffer (also called the Table Lookaside Buffer),
abbreviated TLB; a small lookup sketch follows
below. The process is also shown in the next
slide as a flowchart
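A small Python sketch of the TLB-first lookup:
check the TLB, fall back to the in-memory page
table on a TLB miss, and remember the translation
for next time. The sizes and names are
illustrative.

PAGE_OFFSET_BITS = 10
tlb = {}                                # small cache of page -> frame entries
page_table = {0: 2, 3: 0, 4: 1, 7: 3}   # the full table lives in memory

def translate_with_tlb(logical_addr):
    page = logical_addr >> PAGE_OFFSET_BITS
    offset = logical_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    if page in tlb:                   # TLB hit: no extra memory access
        frame = tlb[page]
    else:                             # TLB miss: read the page table in memory
        frame = page_table[page]      # (a missing entry would be a page fault)
        tlb[page] = frame             # remember the translation
    return (frame << PAGE_OFFSET_BITS) | offset

translate_with_tlb(0x0C25)   # first access to page 3: TLB miss
translate_with_tlb(0x0C99)   # same page again: TLB hit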
22
(No Transcript)
23
A Variation Segmentation
  • One flaw of paging is that, because a page is
    fixed in size, a chunk of code might be divided
    into two or more pages
  • So page faults can occur any time
  • Consider, as an example, a loop which crosses 2
    pages
  • If the OS must remove one of the two pages to
    load the other, then the OS generates 2 page
    faults for each loop iteration!
  • A variation of paging is segmentation
  • instead of fixed-size blocks, programs are
    divided into procedural units that match their
    natural sizes
  • We subdivide programs into procedures
  • We subdivide data into structures (e.g., arrays,
    structs)
  • We still use the on-demand approach of virtual
    memory, but when a block of code is loaded into
    memory, the entire needed block is loaded in
  • Segmentation uses a segment table instead of a
    page table and works similarly although addresses
    are put together differently
  • But segmentation causes fragmentation: when a
    segment is discarded from memory to make room for
    a new segment, there may be a chunk of memory
    that goes unused
  • One solution to fragmentation is to use paging
    with segmentation

24
Effective Access With Paging
  • We modify our previous formula to include the
    impact of paging
  • effective access time = hit time0 + miss rate0 ×
    (hit time1 + miss rate1 × (hit time2 + miss
    rate2 × miss penalty2))
  • Level 0 is on-chip cache
  • Level 1 is off-chip cache
  • Level 2 is main memory
  • Level 3 is disk (miss penalty2 is disk access
    time, which is lengthy)
  • Example
  • On-chip cache hit rate is 90%, hit time is 5 ns;
    off-chip cache hit rate is 96%, hit time is 10
    ns; main memory hit rate is 99.8%, hit time is 60
    ns; memory miss penalty is 10,000 ns
  • memory miss penalty is the same as the disk hit
    time, or disk access time
  • Access time = 5 ns + .10 × (10 ns + .04 × (60 ns
    + .002 × 10,000 ns)) = 6.32 ns (verified in the
    snippet below)
  • So our memory hierarchy adds over 20% to our
    memory access time
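The slide's arithmetic can be checked directly by
plugging the stated numbers into the three-level
formula:

def effective_access_time(hit0, miss0, hit1, miss1, hit2, miss2, penalty2):
    return hit0 + miss0 * (hit1 + miss1 * (hit2 + miss2 * penalty2))

# 90% / 96% / 99.8% hit rates, 5 / 10 / 60 ns hit times, 10,000 ns disk penalty
access = effective_access_time(5, 0.10, 10, 0.04, 60, 0.002, 10_000)
print(access)                    # 6.32 ns
print((access - 5) / 5 * 100)    # about 26% slower than a pure cache hit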

25
Memory Organization
Here we see a typical memory layout: two on-chip
caches (one for data, one for instructions), with
part of each cache reserved for a TLB; one
off-chip cache to back up both on-chip caches;
and main memory, backed up by virtual memory