Virtual Memory - PowerPoint PPT Presentation
1
Virtual Memory
  • Pradondet Nilagupta
  • Spring 2005
  • (original notes from Prof. Michael Niemier,
  • Prof. Mike Schulte )

2
The Full Memory Hierarchy
(Upper levels are smaller and faster; lower levels are larger and slower. Staging/transfer units move data between levels.)

Level        Capacity    Access Time            Cost                      Managed by      Transfer Unit
Registers    100s bytes  <10s ns                --                        prog./compiler  1-8 bytes (instr. operands)
Cache        KBytes      10-100 ns              1-0.1 cents/bit           cache cntl      8-128 bytes (blocks)
Main Memory  MBytes      200-500 ns             10^-4 - 10^-5 cents/bit   OS              4K-16K bytes (pages)
Disk         GBytes      10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit   user/operator   MBytes (files)
Tape         infinite    sec-min                10^-8 cents/bit           --              --
3
Virtual Memory
  • Some facts of computer life
  • Computers run lots of processes simultaneously
  • There is not enough memory to give each process
    its own full address space
  • So smaller amounts of physical memory must be
    shared among many processes
  • Virtual memory is the answer!
  • It divides physical memory into blocks and
    assigns them to different processes

4
Virtual Memory
  • Virtual memory (VM) allows main memory (DRAM) to
    act like a cache for secondary storage (magnetic
    disk).
  • VM address translation provides a mapping from
    the virtual address of the processor to the
    physical address in main memory or on disk.

The compiler assigns data to a virtual address. The
VA is translated to a real/physical address
somewhere in memory (this allows any program to run
anywhere; where it runs is determined by the
particular machine and OS)
5
VM Benefit
  • VM provides the following benefits
  • Allows multiple programs to share the same
    physical memory
  • Allows programmers to write code as though they
    have a very large amount of main memory
  • Automatically handles bringing in data from disk

6
Virtual Memory Basics
  • Programs reference virtual addresses in a
    non-existent memory
  • These are then translated into real physical
    addresses
  • Virtual address space may be bigger than physical
    address space
  • Divide physical memory into blocks, called pages
  • Anywhere from 512 bytes to 16 MB (4 KB typical)
  • Virtual-to-physical translation by indexed table
    lookup
  • Add another cache for recent translations (the
    TLB)
  • Invisible to the programmer
  • Looks to your application like you have a lot of
    memory!
  • Anyone remember overlays?

7
VM Page Mapping
Process 1s Virtual Address Space
Page Frames
Process 2s Virtual Address Space
Disk
Physical Memory
8
VM Address Translation
Virtual address: virtual page number (20 bits) | page offset (12 bits = log2 of page size)
Per-process page table, indexed by the virtual page number from the page table base register
Each entry: valid bit | protection bits | dirty bit | reference bit | physical page number
Physical address: physical page number | page offset -> to physical memory
9
Example of virtual memory
  • Relieves the problem of making a program fit
    when it is too large for physical memory
  • Allows a program to run in any location in
    physical memory
  • (called relocation)
  • Really useful, as you might want to run the same
    program on lots of machines

The logical program is in contiguous VA space here
and consists of 4 pages: A, B, C, D. As for the
physical locations, 3 of the pages are in main
memory and 1 is located on the disk
10
Cache terms vs. VM terms
  • So, some definitions/analogies
  • A page or segment of memory is analogous to a
    block in a cache
  • A page fault or address fault is analogous to
    a cache miss

real/physical memory
so, if we go to main memory and our data isn't
there, we need to get it from disk
11
More definitions and cache comparisons
  • These are more definitions than analogies
  • With VM, CPU produces virtual addresses that
    are translated by a combination of HW/SW to
    physical addresses
  • The physical addresses access main memory
  • The process described above is called memory
    mapping or address translation

12
Cache VS. VM comparisons (1/2)
  • It's a lot like what happens in a cache
  • But everything (except miss rate) is a LOT worse

13
Cache VS. VM comparisons (2/2)
  • Replacement policy
  • Replacement on cache misses primarily controlled
    by hardware
  • Replacement with VM (i.e. which page do I
    replace?) usually controlled by OS
  • Because of bigger miss penalty, want to make the
    right choice
  • Sizes
  • Size of processor address determines size of VM
  • Cache size independent of processor address size

14
Virtual Memory
  • Timing is tough with virtual memory
  • AMAT = Tmem + (1 - h) × Tdisk
  •      = 100 ns + (1 - h) × 25,000,000 ns
  • h (the hit rate) has to be incredibly (almost
    unattainably) close to perfect for this to work
  • so VM is a cache, but an odd one.

15
Pages
16
Paging Hardware
How big is a page? How big is the page table?
17
Address Translation in a Paging System
18
How big is a page table?
  • Suppose
  • a 32-bit architecture
  • page size = 4 kilobytes = 2^12 bytes
  • Therefore

Offset = 12 bits (2^12 bytes per page)
Page number = 20 bits (2^20 page table entries)
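The arithmetic above can be checked with a short sketch (32-bit addresses and 4 KB pages, as the slide assumes):

```python
import math

ADDR_BITS = 32
PAGE_SIZE = 4 * 1024                     # 4 KB

offset_bits = int(math.log2(PAGE_SIZE))  # low-order bits select a byte within the page
vpn_bits = ADDR_BITS - offset_bits       # remaining bits select the virtual page
entries = 1 << vpn_bits                  # one page table entry per virtual page

print(offset_bits, vpn_bits, entries)    # 12 20 1048576
# With 4-byte entries, a flat page table occupies 4 MB per process.
```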
19
Test Yourself
  • A processor asks for the contents of virtual
    memory address 0x10020. The paging scheme in use
    breaks this into a VPN of 0x10 and an offset of
    0x020.
  • PTR (a CPU register that holds the address of the
    page table) has a value of 0x100, indicating that
    this process's page table starts at location
    0x100.
  • The machine uses word addressing and the page
    table entries are each one word long.

20
Test Yourself
  • ADDR CONTENTS
  • 0x00000 0x00000
  • 0x00100 0x00010
  • 0x00110 0x00022
  • 0x00120 0x00045
  • 0x00130 0x00078
  • 0x00145 0x00010
  • 0x10000 0x03333
  • 0x10020 0x04444
  • 0x22000 0x01111
  • 0x22020 0x02222
  • 0x45000 0x05555
  • 0x45020 0x06666
  • What is the physical address calculated?
  • 10020
  • 22020
  • 45000
  • 45020
  • none of the above

21
Test Yourself
  • ADDR CONTENTS
  • 0x00000 0x00000
  • 0x00100 0x00010
  • 0x00110 0x00022
  • 0x00120 0x00045
  • 0x00130 0x00078
  • 0x00145 0x00010
  • 0x10000 0x03333
  • 0x10020 0x04444
  • 0x22000 0x01111
  • 0x22020 0x02222
  • 0x45000 0x05555
  • 0x45020 0x06666
  • What is the physical address calculated?
  • What is the contents of this address returned to
    the processor?
  • How many memory accesses in total were required
    to obtain the contents of the desired address?
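One way to check your answers is to simulate the lookup directly. The sketch below uses the slide's memory contents, word addressing, and a 12-bit offset (inferred from the split of 0x10020 into VPN 0x10 and offset 0x020):

```python
# Memory contents from the slide (addr -> contents)
MEM = {
    0x00000: 0x00000, 0x00100: 0x00010, 0x00110: 0x00022,
    0x00120: 0x00045, 0x00130: 0x00078, 0x00145: 0x00010,
    0x10000: 0x03333, 0x10020: 0x04444, 0x22000: 0x01111,
    0x22020: 0x02222, 0x45000: 0x05555, 0x45020: 0x06666,
}
PTR = 0x100        # page table base register
OFFSET_BITS = 12   # VA 0x10020 -> VPN 0x10, offset 0x020

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    frame = MEM[PTR + vpn]                  # memory access 1: read the PTE
    paddr = (frame << OFFSET_BITS) | offset
    return paddr, MEM[paddr]                # memory access 2: read the data

paddr, data = translate(0x10020)
print(hex(paddr), hex(data))                # 0x22020 0x2222
```

So the physical address is 0x22020, the value returned to the processor is 0x02222, and two memory accesses were required in total.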

22
Another Example
Page size = 4
Logical memory:  addrs 0-3: a b c d | 4-7: e f g h | 8-11: i j k l | 12-15: m n o p
Physical memory: addrs 4-7: i j k l | 8-11: m n o p | 20-23: a b c d | 24-27: e f g h
Page Table (logical page -> physical frame):
0 -> 5, 1 -> 6, 2 -> 1, 3 -> 2
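The mapping on this slide (page size 4, page table 0→5, 1→6, 2→1, 3→2) can be replayed with a few lines:

```python
PAGE_SIZE = 4
PAGE_TABLE = [5, 6, 1, 2]        # logical page -> physical frame

def to_physical(laddr: int) -> int:
    page, offset = divmod(laddr, PAGE_SIZE)
    return PAGE_TABLE[page] * PAGE_SIZE + offset

# 'a' (logical addr 0) lands at physical 20; 'i' (logical 8) at physical 4
for laddr, ch in enumerate("abcdefghijklmnop"):
    print(ch, laddr, "->", to_physical(laddr))
```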
23
Replacement policies
24
Block replacement
  • Which block should be replaced on a virtual
    memory miss?
  • Again, we'll stick with the strategy that it's a
    good thing to eliminate page faults
  • Therefore, we want to replace the LRU block
  • Many machines use a use or reference bit
  • Periodically reset
  • Gives the OS an estimation of which pages are
    referenced

25
Writing a block
  • What happens on a write?
  • We don't even want to think about a write-through
    policy!
  • Time with accesses, VM, hard disk, etc. is so
    great that this is not practical
  • Instead, a write back policy is used with a dirty
    bit to tell if a block has been written

26
Mechanism vs. Policy
  • Mechanism
  • paging hardware
  • trap on page fault
  • Policy
  • fetch policy: when should we bring in the pages
    of a process?
  • 1. load all pages at the start of the process
  • 2. load only on demand: demand paging
  • replacement policy: which page should we evict
    given a shortage of frames?

27
Replacement Policy
  • Given a full physical memory, which page should
    we evict??
  • What policy?
  • Random
  • FIFO: first-in-first-out
  • LRU: least-recently-used
  • MRU: most-recently-used
  • OPT: will-not-be-used-farthest-in-future

28
Replacement Policy Simulation
  • example sequence of page numbers
  • 0 1 2 3 42 2 37 1 2 3
  • FIFO?
  • LRU?
  • OPT?
  • How do you keep track of LRU info? (another data
    structure question)
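One way to work through the exercise is to run the three policies in software. The sketch below uses the slide's reference string; the frame count is not given on the slide, so 4 frames is assumed here:

```python
def simulate(refs, nframes, victim):
    """Count page faults; `victim` chooses which resident page to evict."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            if victim is lru:                 # LRU must track recency on hits
                frames.remove(page)
                frames.append(page)
            continue
        faults += 1
        if len(frames) == nframes:
            frames.remove(victim(frames, refs, i))
        frames.append(page)
    return faults

def fifo(frames, refs, i):
    return frames[0]                          # list is kept in insertion order

def lru(frames, refs, i):
    return frames[0]                          # list is kept in recency order

def opt(frames, refs, i):
    future = refs[i + 1:]                     # evict the page used farthest ahead
    return max(frames,
               key=lambda p: future.index(p) if p in future else len(future) + 1)

refs = [0, 1, 2, 3, 42, 2, 37, 1, 2, 3]
for name, policy in (("FIFO", fifo), ("LRU", lru), ("OPT", opt)):
    print(name, simulate(refs, 4, policy), "faults")
```

With 4 frames, OPT faults least, as it must; LRU tracks it closely and FIFO does worst on this string.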

29
Page tables and lookups
  • 1. It's slow! We've turned every access to
    memory into two accesses to memory
  • solution: add a specialized cache called a
    translation lookaside buffer (TLB) inside the
    processor
  • punt this issue for a lecture (until Thursday)
  • 2. It's still huge!
  • even worse, we're ultimately going to have a page
    table for every process. Suppose 1024 processes;
    that's 4 GB of page tables!

30
Paging/VM (1/3)
[Figure: the CPU issues virtual address 42; the page table maps it to address 356 in physical memory; an invalid entry (i) means the page lives on disk, managed by the operating system]
31
Paging/VM (2/3)
[Figure: the same translation, with the page table itself stored in physical memory]
Place the page table in physical memory. However, this doubles the time per memory access!!
32
Paging/VM (3/3)
[Figure: the same translation, with a cache added in front of the page table]
Cache! A special-purpose cache for translations, historically called the TLB: Translation Lookaside Buffer
33
Translation Cache
Just like any other cache, the TLB can be
organized as fully associative, set associative,
or direct mapped TLBs are usually small,
typically not more than 128 - 256 entries even
on high end machines. This permits fully
associative lookup on these machines. Most
mid-range machines use small n-way set
associative organizations. Note 128-256 entries
times 4KB-16KB/entry is only 512KB-4MB the L2
cache is often bigger than the span of the TLB.
[Figure: Translation with a TLB - the CPU sends the VA to the TLB lookup; a hit yields the PA directly, while a miss goes through the full translation; the PA then accesses the cache, and a cache miss fetches the data from main memory]
34
Translation Cache
A way to speed up translation is to use a special
cache of recently used page table entries
-- this has many names, but the most
frequently used is Translation Lookaside Buffer
or TLB

TLB entry: Virtual Page (tag) | Physical Frame | Dirty | Ref | Valid | Access

Really just a cache (a special-purpose cache) on
the page table mappings. TLB access time is
comparable to cache access time (much less
than main memory access time)
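The behavior just described can be sketched as a tiny software model: a fully associative, LRU-replaced translation cache. The class name and capacity are illustrative, not from the slides:

```python
from collections import OrderedDict

class TLB:
    """Tiny fully associative, LRU-replaced translation cache (sketch)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, ordered oldest -> newest

    def lookup(self, vpn):
        if vpn in self.entries:        # TLB hit: refresh recency, return PPN
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        return None                    # TLB miss: caller walks the page table

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry

    def flush(self):                   # on a process switch (no ASIDs here)
        self.entries.clear()
```

A real TLB does the associative match in hardware in parallel; the point of the model is only the hit/miss/evict flow and the flush on process switch.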
35
An example of a TLB
[Figure: an example TLB lookup, 8 KB pages]
1. Virtual address = page frame addr. <30> + page offset <13>
2. Each TLB entry: V <1> | Tag <30> | R <2> | W <2> (read/write policies and permissions) | Phys. Addr. <21>
3. A 32:1 mux selects the matching entry
4. 34-bit physical address = high-order 21 bits (physical page number) + low-order 13 bits (the page offset, passed through)
36
The big picture and TLBs
  • Address translation is usually on the critical
    path
  • which determines the clock cycle time of the
    microprocessor
  • Even in the simplest cache, TLB values must be
    read and compared
  • The TLB is usually smaller and faster than the
    cache-address-tag memory
  • This way multiple TLB reads don't increase the
    cache hit time
  • TLB accesses are usually pipelined because
    they're so important!

37
Segmentation
Virtual Address = Segment Number + Offset
Segment Table Entry: P | M | other control bits | Length (for bounds checking) | Segment Base
38
Address Translation in a Segmentation System
The offset could use all 32 bits, so we can't just
concatenate base and offset; they must be added
39
Pages versus segments
40
The big picture and TLBs
Virtual address -> TLB access
  • TLB miss: try to read the PTE from the page
    table (TLB miss stall). On a page fault, replace
    the page from disk; otherwise set the entry in
    the TLB and retry.
  • TLB hit, read: try to read from the cache. A
    cache hit delivers the data to the CPU; a cache
    miss stalls while the block is fetched.
  • TLB hit, write: write to the cache/buffer
    memory.
41
Paged and Segmented VM(Figure 5.33, pg. 463)
  • Virtual memories can be categorized into two main
    classes
  • Paged memory: fixed-size blocks
  • Segmented memory: variable-size blocks

Figure 5.33
42
Paged vs. Segmented VM
  • Paged memory
  • Fixed-size blocks (4 KB to 64 KB)
  • One word per address (page number + page offset)
  • Easy to replace pages (all the same size)
  • Internal fragmentation (not all of a page is used)
  • Efficient disk traffic (optimized for the page size)
  • Segmented memory
  • Variable-sized blocks (up to 64 KB or 4 GB)
  • Two words per address (segment number + offset)
  • Difficult to replace segments (must find where a
    segment fits)
  • External fragmentation (unused portions of
    memory)
  • Inefficient disk traffic (may have small or large
    transfers)
  • Hybrid approaches
  • Paged segments: segments are a multiple of a page
    size
  • Multiple page sizes (e.g., 8 KB, 64 KB, 512 KB,
    4096 KB)

43
Pages are Cached in a Virtual Memory System
  • Can ask the same four questions we did about
    caches
  • Q1: Block Placement
  • choice: lower miss rates with complex placement,
    or vice versa
  • the miss penalty is huge
  • so choose a low miss rate => place a page
    anywhere in physical memory
  • similar to a fully associative cache model
  • Q2: Block Addressing - use an additional data
    structure
  • fixed-size pages - use a page table
  • virtual page number => physical page number, then
    concatenate the offset
  • a tag bit indicates presence in main memory

44
Normal Page Tables
  • Size is the number of virtual pages
  • Purpose is to hold the translation of VPN to PPN
  • Permits ease of page relocation
  • Make sure to keep tags to indicate the page is
    mapped
  • Potential problem
  • Consider a 32-bit virtual address and 4 KB pages
  • 4 GB / 4 KB = 1 M entries required just for the
    page table!
  • Might have to page in the page table
  • Consider how the problem gets worse on 64-bit
    machines with even larger virtual address spaces!
  • The Alpha has a 43-bit virtual address with 8 KB
    pages
  • Might have multi-level page tables

45
Inverted Page Tables
  • Similar to a set-associative mechanism
  • Make the page table reflect the number of
    physical pages (not virtual)
  • Use a hash mechanism
  • virtual page number => hashed page number, an
    index into the inverted page table
  • Compare the virtual page number with the tag to
    make sure it is the one you want
  • if yes
  • check that it is in memory - OK if yes; if not,
    page fault
  • If not - miss
  • go to the full page table on disk to get the new
    entry
  • implies 2 disk accesses in the worst case
  • trades an increased worst-case penalty for a
    decrease in capacity-induced miss rate, since
    there is now more room for real pages with the
    smaller page table
46
Inverted Page Table
[Figure: inverted page table lookup - the page number is hashed to index the table; each entry holds a valid bit, the page tag, and a frame number; on a tag match (OK), frame + offset forms the physical address]
  • Only store entries for pages in physical memory
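A minimal model of the lookup path described above. The class and its collision handling are simplifications for illustration; real designs chain or probe on hash collisions rather than simply overwriting:

```python
NFRAMES = 16

class InvertedPageTable:
    """Sketch: one entry per physical frame, indexed by hash of (pid, vpn)."""
    def __init__(self):
        # each slot holds a (pid, vpn) tag, or None if the frame is free
        self.tags = [None] * NFRAMES

    def slot(self, pid, vpn):
        return hash((pid, vpn)) % NFRAMES

    def translate(self, pid, vpn):
        i = self.slot(pid, vpn)
        if self.tags[i] == (pid, vpn):   # tag match: frame i holds this page
            return i
        return None                      # miss: fall back to the full table on disk

    def install(self, pid, vpn):
        i = self.slot(pid, vpn)
        self.tags[i] = (pid, vpn)        # real hardware would chain on collision
        return i
```

Note the table size tracks physical memory (NFRAMES), not the virtual address space, which is the whole point of the inverted organization.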
47
Address Translation Reality
  • The translation process using page tables takes
    too long!
  • Use a cache to hold recent translations
  • Translation Lookaside Buffer
  • Typically 8-1024 entries
  • Block size same as a page table entry (1 or 2
    words)
  • Only holds translations for pages in memory
  • 1 cycle hit time
  • Highly or fully associative
  • Miss rate < 1%
  • Miss goes to main memory (where the whole page
    table lives)
  • Must be purged on a process switch

48
Back to the 4 Questions
  • Q3 Block Replacement (pages in physical memory)
  • LRU is best
  • So use it to minimize the horrible miss penalty
  • However, real LRU is expensive
  • Page table contains a use tag
  • On access the use tag is set
  • OS checks them every so often, records what it
    sees, and resets them all
  • On a miss, the OS decides who has been used the
    least
  • Basic strategy Miss penalty is so huge, you can
    spend a few OS cycles to help reduce the miss rate
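The use-bit scheme above can be sketched as a second-chance (clock) scan, one common way an OS approximates LRU with the hardware's reference bits; the class name is illustrative:

```python
class ClockReplacer:
    """Second-chance sketch of the use-bit scheme the OS runs."""
    def __init__(self, frames):
        self.frames = frames                  # resident page numbers
        self.use = {p: False for p in frames} # hardware-set reference bits
        self.hand = 0

    def touch(self, page):
        self.use[page] = True                 # hardware sets the bit on access

    def victim(self):
        while True:
            p = self.frames[self.hand]
            self.hand = (self.hand + 1) % len(self.frames)
            if self.use[p]:                   # recently used: clear the bit,
                self.use[p] = False           # give the page a second chance
            else:
                return p                      # not used since last sweep: evict
```

This is cheap (a bit per frame plus a sweeping pointer) yet tends to evict pages that have not been referenced recently, which is the "spend a few OS cycles to reduce the miss rate" trade the slide describes.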

49
Last Question
  • Q4 Write Policy
  • Always write-back
  • Due to the access time of the disk
  • So, you need to keep tags to show when pages are
    dirty and need to be written back to disk when
    they're swapped out.
  • Anything else is pretty silly
  • Remember the disk is SLOW!

50
Page Sizes
  • An architectural choice
  • Large pages are good
  • reduces page table size
  • amortizes the long disk access
  • if spatial locality is good then hit rate will
    improve
  • Large pages are bad
  • more internal fragmentation
  • if everything is random, each structure's last
    page is only half full
  • Half of bigger is still bigger
  • if there are 3 structures per process text,
    heap, and control stack
  • then 1.5 pages are wasted for each process
  • process start up time takes longer
  • since at least 1 page of each type is required
    prior to start
  • transfer time penalty aspect is higher

51
More on TLBs
  • The TLB must be on chip
  • otherwise it is worthless
  • small TLBs are worthless anyway
  • large TLBs are expensive
  • high associativity is likely
  • => Price of CPUs is going up!
  • OK as long as performance goes up faster

52
Address Translation with Page Table (Figure 5.35,
pg. 464)
  • A page table translates a virtual page number
    into a physical page number
  • The page offset remains unchanged
  • Page tables are large
  • 32 bit virtual address
  • 4 KB page size
  • 2^20 × 4-byte table entries = 4 MB
  • Page tables are stored in main memory => slow
  • Cache table entries in a translation buffer

53
Fast Address Translation with Translation Buffer
(TB) (Figure 5.36, pg. 466)
  • Cache translated addresses in TB
  • Alpha 21064 data TB
  • 32 entries
  • fully associative
  • 30 bit tag
  • 21 bit physical address
  • Valid and read/write bits
  • Separate TB for instr.
  • Steps in translation
  • compare page no. to tags
  • check for memory access violation
  • send physical page no. of matching tag
  • combine physical page no. and page offset

Figure 5.36 Operation of the Alpha 21064 data TLB
during address translation
54
Selecting a Page Size
  • Reasons for larger page size
  • Page table size is inversely proportional to the
    page size therefore memory saved
  • Fast cache hit time easy when cache size lt page
    size (VA caches) bigger page makes this
    feasible as cache size grows
  • Transferring larger pages to or from secondary
    storage, possibly over a network, is more
    efficient
  • Number of TLB entries are restricted by clock
    cycle time, so a larger page size maps more
    memory, thereby reducing TLB misses
  • Reasons for a smaller page size
  • Want to avoid internal fragmentation: don't
    waste storage; data must be contiguous within a
    page
  • Quicker process start for small processes - don't
    need to bring in more memory than needed

55
Memory Protection
  • With multiprogramming, a computer is shared by
    several programs or processes running
    concurrently
  • Need to provide protection
  • Need to allow sharing
  • Mechanisms for providing protection
  • Provide base and bound registers: Base ≤ Address
    ≤ Bound
  • Provide both user and supervisor (operating
    system) modes
  • Provide CPU state that the user can read, but
    cannot write
  • Base and bound registers, user/supervisor bit,
    exception bits
  • Provide method to go from user to supervisor mode
    and vice versa
  • system call user to supervisor
  • system return supervisor to user
  • Provide permissions for each flag or segment in
    memory

56
Alpha VM Mapping(Figure 5.39, pg. 472)
  • 64-bit address divided into 3 segments
  • seg0 (bit 63 = 0): user code
  • seg1 (bit 63 = 1, bit 62 = 1): user stack
  • kseg (bit 63 = 1, bit 62 = 0): kernel segment for
    the OS
  • Three-level page table, each level one page
  • Reduces page table size
  • Increases translation time
  • PTE bits: valid, kernel/user read/write enable

Figure 5.39 The Mapping of an Alpha virtual
address
57
Alpha 21064 Memory Hierarchy
  • The Alpha 21064 memory hierarchy includes
  • A 32-entry, fully associative data TB
  • A 12-entry, fully associative instruction TB
  • An 8 KB direct-mapped, physically addressed data
    cache
  • An 8 KB direct-mapped, physically addressed
    instruction cache
  • A 4-entry by 64-bit instruction prefetch stream
    buffer
  • A 4-entry by 256-bit write buffer
  • A 2 MB direct-mapped second-level unified cache
  • The virtual memory
  • Maps a 43-bit virtual address to a 34-bit
    physical address
  • Has a page size of 8 KB

58
Alpha Memory Performance Miss Rates
[Figure: miss rates for the 8K instruction cache, 8K data cache, and 2M unified L2 cache]
59
Alpha CPI Components
  • Largest increase in CPI due to
  • I-stall: instruction stalls from branch
    mispredictions
  • Other: data hazards, structural hazards

60
Pitfall: Address space too small
  • One of the biggest mistakes that can be made when
    designing an architecture is to devote too few
    bits to the address
  • the address size limits the size of virtual
    memory
  • difficult to change since many components depend
    on it (e.g., PC, registers, effective-address
    calculations)
  • As program size increases, larger and larger
    address sizes are needed
  • 8 bit Intel 8080 (1975)
  • 16 bit Intel 8086 (1978)
  • 24 bit Intel 80286 (1982)
  • 32 bit Intel 80386 (1985)
  • 64 bit Intel Merced (1998)

61
Pitfall Predicting Cache Performance of one
Program from Another Program
  • 4 KB data cache miss rate: 8%, 12%, or 28%?
  • 1 KB instr cache miss rate: 0%, 3%, or 10%?
  • Alpha vs. MIPS for 8 KB data: 17% vs. 10%

62
Pitfall Simulating Too Small an Address Trace
63
Virtual Memory Summary
  • Virtual memory (VM) allows main memory (DRAM) to
    act like a cache for secondary storage (magnetic
    disk).
  • The large miss penalty of virtual memory leads to
    different strategies from caches
  • Fully associative, TLB + page table, LRU,
    write-back
  • Designed as
  • paged fixed size blocks
  • segmented variable size blocks
  • hybrid segmented paging or multiple page sizes
  • Avoid small address size

64
Summary 2 Typical Choices