1
Computer System Chapter 10. Virtual Memory
Lynn Choi, Korea University
2
Motivations for Virtual Memory
  • Simplify Memory Management
  • Provide each process with a uniform address space
  • Provide the illusion of an infinite amount of memory
  • Each process has its own address space
  • Use Physical DRAM as a Cache for the Disk
  • The address space of a process can exceed physical memory size
  • Only active code and data is actually in memory
  • Allocate more memory to a process as needed
  • The sum of the address spaces of multiple processes can exceed physical memory size
  • Multiple processes are partially resident in main memory
  • Provide Protection
  • One process can't interfere with another, because they operate in different address spaces
  • Different sections of address spaces have different permissions
  • A user process cannot access privileged information

3
Virtual Memory
  • Benefits
  • Easier programming
  • Software portability
  • Protection
  • Increased CPU utilization: more programs can run at the same time
  • Virtual address space
  • Programmer's view of infinite memory
  • Physical address space
  • Machine's physical memory
  • Requires the following functions
  • Memory allocation (Placement)
  • Fully associative
  • The tag size is small compared to a block (page) size
  • Memory deallocation (Replacement)
  • LRU replacement policy
  • Memory mapping (Translation)
  • Virtual address to physical address translation

4
Paging
  • Divide the address space into fixed-size page frames
  • A VA consists of (VPN, offset)
  • A PA consists of (PPN, offset)
  • Map a virtual page to a physical page at runtime
  • A page table entry (PTE) contains
  • The VPN-to-PPN mapping
  • Presence bit
  • Reference bit
  • Dirty bit
  • Access control: read/write/execute
  • Privilege level
  • Disk address
  • Demand paging: bring in a page on a page miss
  • Internal fragmentation

5
Motivation 1: DRAM as a Cache for Disk
  • The full address space is quite large
  • 32-bit addresses: 4,000,000,000 (4 billion) bytes
  • 64-bit addresses: 16,000,000,000,000,000,000 (16 quintillion) bytes
  • Disk storage is ~300X cheaper than DRAM storage
  • 80 GB of DRAM: ~$33,000
  • 80 GB of disk: ~$110
  • To access large amounts of data in a
    cost-effective manner, the bulk of the data must
    be stored on disk

6
Levels in Memory Hierarchy
[Figure: levels in the memory hierarchy; the cache backs the registers, memory backs the cache, and virtual memory backs main memory with the disk]

Level      Size         Speed   $/MByte     Line size
--------   ----------   -----   ---------   ---------
Register   32 B         1 ns    --          8 B
Cache      32 KB-4 MB   2 ns    $125/MB     32 B
Memory     1024 MB      30 ns   $0.20/MB    4 KB
Disk       100 GB       8 ms    $0.001/MB   --

Going down the hierarchy: larger, slower, cheaper.
7
DRAM vs. SRAM as a Cache
  • DRAM vs. disk is more extreme than SRAM vs. DRAM
  • Access latencies
  • DRAM: ~10X slower than SRAM
  • Disk: ~100,000X slower than DRAM
  • Importance of exploiting spatial locality
  • The first byte is ~100,000X slower than successive bytes on disk
  • vs. a ~4X improvement for page-mode vs. regular accesses to DRAM
  • Bottom line
  • Design decisions for DRAM caches are driven by the enormous cost of misses

8
Impact of Properties on Design
  • If DRAM were organized similarly to an SRAM cache, how would we set the following design parameters?
  • Line size?
  • Large, since disk is better at transferring large blocks
  • Associativity?
  • High, to minimize the miss rate
  • Write-through or write-back?
  • Write-back, since we can't afford to perform small writes to disk
  • What would the impact of these choices be on:
  • Miss rate
  • Extremely low: << 1%
  • Hit time
  • Must match cache/DRAM performance
  • Miss latency
  • Very high: ~20 ms
  • Tag storage overhead
  • Low, relative to the block size

9
Locating an Object in a Cache
  • SRAM Cache
  • The tag is stored with the cache line
  • Maps from cache blocks to memory blocks
  • From cached to uncached form
  • Saves a few bits by storing only the tag
  • No tag for a block not in the cache
  • Hardware retrieves the information
  • Can quickly match against multiple tags

10
Locating an Object in Cache (cont.)
  • DRAM Cache
  • Each allocated page of virtual memory has an entry in the page table
  • Mapping from virtual pages to physical pages
  • From uncached form to cached form
  • A page table entry exists even if the page is not in memory
  • It specifies the disk address
  • The only way to indicate where to find the page
  • The OS retrieves the information

11
A System with Physical Memory Only
  • Examples
  • most Cray machines, early PCs, nearly all
    embedded systems, etc.
  • Addresses generated by the CPU correspond
    directly to bytes in physical memory

12
A System with Virtual Memory
  • Examples
  • workstations, servers, modern PCs, etc.
  • Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table)

13
Page Faults (like Cache Misses)
  • What if an object is on disk rather than in memory?
  • The page table entry indicates that the virtual address is not in memory
  • The OS exception handler is invoked to move the data from disk into memory
  • The current process suspends; others can resume
  • The OS has full control over placement, etc.

14
Servicing a Page Fault
(1) Initiate block read
  • The processor signals the I/O controller: read a block of length P starting at disk address X and store it starting at memory address Y
(2) DMA transfer
  • The read occurs via Direct Memory Access (DMA), under control of the I/O controller
(3) Read done
  • The I/O controller signals completion by interrupting the processor
  • The OS resumes the suspended process
15
Motivation 2: Memory Management
  • Multiple processes can reside in physical memory.
  • How do we resolve address conflicts?
  • What if two processes access something at the
    same address?

[Figure: Linux/x86 process memory image: kernel virtual memory (invisible to user code) at the top; the user stack (%esp); the memory-mapped region for shared libraries; the runtime heap (via malloc, bounded by the brk ptr); uninitialized data (.bss); initialized data (.data); program text (.text); and a forbidden region at address 0]
16
Solution: Separate Virtual Address Spaces
  • Virtual and physical address spaces are divided into equal-sized blocks
  • Blocks are called pages (both virtual and physical)
  • Each process has its own virtual address space
  • The operating system controls how virtual pages are assigned to physical memory

[Figure: the virtual address spaces of process 1 (0..N-1) and process 2 (0..M-1) are mapped by address translation onto one physical address space (DRAM); e.g., process 1's VP 1 maps to PP 2, process 2's VP 1 maps to PP 10, and a read-only library code page is shared at PP 7]
17
Motivation 3: Protection
  • The page table entry contains access rights information
  • Hardware enforces this protection (traps into the OS if a violation occurs)

18
VM Address Translation
  • Virtual Address Space
  • V = {0, 1, ..., N-1}
  • Physical Address Space
  • P = {0, 1, ..., M-1}, with M < N
  • Address Translation
  • MAP: V -> P ∪ {∅}
  • For virtual address a:
  • MAP(a) = a' if the data at virtual address a is at physical address a' in P
  • MAP(a) = ∅ if the data at virtual address a is not in physical memory
  • Either invalid or stored on disk

19
VM Address Translation Hit
[Figure: on a hit, the processor issues virtual address a to the hardware address translation mechanism (part of the on-chip memory management unit, MMU), which produces physical address a' for main memory]
20
VM Address Translation Miss
[Figure: on a miss, the hardware address translation mechanism raises a page fault and the fault handler runs; the OS performs the transfer of the page from secondary memory to main memory (only on a miss), after which the access proceeds with physical address a']
21
VM Address Translation
  • Parameters
  • P = 2^p : page size (bytes)
  • N = 2^n : virtual address limit
  • M = 2^m : physical address limit

virtual address (bits n-1..0):  [ virtual page number (bits n-1..p) | page offset (bits p-1..0) ]
                                        | address translation
                                        v
physical address (bits m-1..0): [ physical page number (bits m-1..p) | page offset (bits p-1..0) ]
Page offset bits don't change as a result of translation (see the sketch below).
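To make the split concrete, here is a minimal C sketch (hypothetical macros, not from the slides), assuming 4 KB pages (p = 12):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                        /* p = 12, so P = 2^12 = 4 KB */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define VPN(va)    ((va) >> PAGE_SHIFT)      /* high n-p bits */
#define VPO(va)    ((va) & (PAGE_SIZE - 1))  /* low p bits; unchanged by translation */

int main(void) {
    uint32_t va = 0x12345678;
    printf("VPN = 0x%x, VPO = 0x%x\n", VPN(va), VPO(va));
    return 0;
}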
22
Page Tables
[Figure: a memory-resident page table, indexed by virtual page number, maps valid entries (valid = 1) to physical memory pages and invalid entries (valid = 0) to disk addresses in disk storage (a swap file or a regular file-system file)]
23
Address Translation via Page Table
24
Page Table Operation
  • Translation
  • A separate (set of) page table(s) per process
  • The VPN forms the index into the page table (it points to a page table entry)

25
Page Table Operation
  • Computing the Physical Address
  • The Page Table Entry (PTE) provides information about the page (see the sketch below)
  • if (valid bit == 1), the page is in memory
  • Use the physical page number (PPN) to construct the address
  • if (valid bit == 0), the page is on disk
  • Page fault

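A minimal C sketch of this operation for a single-level table (hypothetical types; the valid/PPN layout follows the description above, and a real MMU does this in hardware):

#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

typedef struct {
    uint32_t valid : 1;   /* 1 = page in memory, 0 = on disk */
    uint32_t ppn   : 20;  /* physical page number */
} pte_t;

extern pte_t page_table[];          /* one PTE per virtual page */
extern void  page_fault(uint32_t);  /* OS handler: bring the page in from disk */

uint32_t translate(uint32_t va) {
    pte_t pte = page_table[va >> PAGE_SHIFT];   /* VPN indexes the table */
    if (!pte.valid) {
        page_fault(va);                         /* page on disk: fault */
        pte = page_table[va >> PAGE_SHIFT];     /* retry: PTE now valid */
    }
    return ((uint32_t)pte.ppn << PAGE_SHIFT) | (va & PAGE_MASK);
}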
26
Page Table Operation
  • Checking Protection
  • The access rights field indicates the allowable accesses
  • e.g., read-only, read-write, execute-only
  • Typically multiple protection modes are supported (e.g., kernel vs. user)
  • A protection violation fault is raised if the user doesn't have the necessary permission

27
Integrating VM and Cache
  • Most Caches Are Physically Addressed
  • Accessed by physical addresses
  • Allows multiple processes to have blocks in the cache at the same time
  • Allows multiple processes to share pages
  • The cache doesn't need to be concerned with protection issues
  • Access rights are checked as part of address translation
  • Perform Address Translation Before Cache Lookup
  • But this could involve a memory access itself (to fetch the PTE)
  • Of course, page table entries can also become cached

28
Speeding up Translation with a TLB
  • Translation Lookaside Buffer (TLB)
  • Small hardware cache in MMU
  • Maps virtual page numbers to physical page
    numbers
  • Contains complete page table entries for small
    number of pages

29
TLB (Translation Lookaside Buffer)
  • Hardware memory management
  • A cache of page table entries (PTEs)
  • On a TLB hit, the virtual-to-physical translation is done without accessing the page table
  • On a TLB miss, the page table must be searched for the mapping, which is inserted into the TLB before processing continues (a lookup sketch follows)
  • TLB walker: hardware that performs the page table search
  • TLB configuration
  • ~100 entries, fully or set-associative cache
  • Sometimes multi-level TLBs; TLB shootdown issue
  • Usually separate I-TLB and D-TLB, accessed every cycle
  • Miss handling: sometimes by HW, sometimes by SW
  • By HW: a hardware page walker
  • Software (OS) managed TLBs: TLB insert/replace instructions
  • Flexible but slow: a TLB miss handler takes ~100 instructions

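A minimal C sketch of a set-associative TLB lookup (hypothetical structures and sizes; a real TLB compares all ways of a set in parallel in hardware):

#include <stdint.h>
#include <stdbool.h>

#define TLB_SETS 16
#define TLB_WAYS 4

typedef struct {
    bool     valid;
    uint32_t tag;   /* high bits of the VPN */
    uint32_t ppn;   /* cached translation */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Returns true on a TLB hit and fills *ppn; on a miss the
 * page-table walker (HW or SW) must be invoked instead. */
bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    uint32_t set = vpn % TLB_SETS;   /* TLB index bits */
    uint32_t tag = vpn / TLB_SETS;   /* remaining bits form the TLB tag */
    for (int way = 0; way < TLB_WAYS; way++) {
        if (tlb[set][way].valid && tlb[set][way].tag == tag) {
            *ppn = tlb[set][way].ppn;
            return true;             /* hit: no page-table access */
        }
    }
    return false;                    /* miss: walk the page table */
}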
30
Address Translation with a TLB
[Figure: address translation with a TLB. The virtual address splits into a virtual page number and page offset; the VPN splits into a TLB tag and TLB index. On a TLB hit (a valid entry with a matching tag), the entry supplies the physical page number, which is concatenated with the page offset to form the physical address. The physical address then splits into cache tag, index, and byte offset; on a cache hit (a valid line with a matching tag), the cache returns the data]
31
TLB and Cache Implementation of DECStation 3100
32
Address Translation Symbols
  • Virtual Address Components
  • VPO: virtual page offset
  • VPN: virtual page number
  • TLBI: TLB index
  • TLBT: TLB tag
  • Physical Address Components
  • PPO: physical page offset
  • PPN: physical page number
  • CO: byte offset within cache block
  • CI: cache index
  • CT: cache tag

33
Simple Memory System Example
  • Addressing
  • 14-bit virtual addresses
  • 12-bit physical addresses
  • Page size: 64 bytes
  • With 64-byte pages the page offset is 6 bits, so the virtual address is an 8-bit VPN (virtual page number) plus a 6-bit VPO (virtual page offset), and the physical address is a 6-bit PPN (physical page number) plus a 6-bit PPO (physical page offset)
34
Simple Memory System Page Table
  • Only the first 16 entries are shown

35
Simple Memory System TLB
  • TLB
  • 16 entries
  • 4-way associative

36
Simple Memory System Cache
  • Cache
  • 16 lines
  • 4-byte line size
  • Direct mapped

37
Address Translation Example 1
  • Virtual Address 0x03D4
  • VPN ___ TLBI ___ TLBT ____ TLB Hit? __ Page
    Fault? __ PPN ____
  • Physical Address
  • Offset ___ CI___ CT ____ Hit? __ Byte ____

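A hedged partial worked answer (the TLB hit, page fault, and PPN blanks depend on the TLB and page-table contents of slides 34-36, which are not reproduced in this transcript); the address split itself is pure arithmetic: 0x03D4 = 0b00 0011 1101 0100, so VPO = low 6 bits = 0x14 and VPN = high 8 bits = 0x0F; with a 16-entry, 4-way TLB (4 sets), TLBI = VPN mod 4 = 0x3 and TLBT = VPN >> 2 = 0x03.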
38
Address Translation Example 2
  • Virtual Address 0x0B8F
  • VPN ___ TLBI ___ TLBT ____ TLB Hit? __ Page
    Fault? __ PPN ____
  • Physical Address
  • Offset ___ CI___ CT ____ Hit? __ Byte ____

39
Address Translation Example 3
  • Virtual Address 0x0040
  • VPN ___ TLBI ___ TLBT ____ TLB Hit? __ Page
    Fault? __ PPN ____
  • Physical Address
  • Offset ___ CI___ CT ____ Hit? __ Byte ____

40
Multi-Level Page Tables
  • Given
  • 4 KB (2^12) page size
  • 32-bit address space
  • 4-byte PTEs
  • Problem
  • A single flat table would need 2^20 entries x 4 bytes = a 4 MB page table per process!
  • Common solution
  • Multi-level page tables
  • e.g., a 2-level table (P6), sketched below
  • Level 1 table: 1024 entries, each of which points to a Level 2 page table
  • This is called the page directory
  • Level 2 table: 1024 entries, each of which points to a page

[Figure: one Level 1 table pointing to many Level 2 tables]
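A minimal C sketch of the two-level walk (hypothetical structures; the real walk is done by the hardware page walker, and the entries carry the permission bits shown on later slides):

#include <stdint.h>

#define PTE_P 0x1u  /* present bit, as in the PDE/PTE formats below */

/* va = [ VPN1 (10 bits) | VPN2 (10 bits) | VPO (12 bits) ] */
uint32_t walk(const uint32_t *page_directory, uint32_t va) {
    uint32_t pde = page_directory[va >> 22];          /* VPN1 indexes the directory */
    if (!(pde & PTE_P)) return 0;                     /* page table not present: fault */
    const uint32_t *page_table = (const uint32_t *)(uintptr_t)(pde & ~0xFFFu);
    uint32_t pte = page_table[(va >> 12) & 0x3FFu];   /* VPN2 indexes the page table */
    if (!(pte & PTE_P)) return 0;                     /* page not present: fault */
    return (pte & ~0xFFFu) | (va & 0xFFFu);           /* PPN | page offset */
}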
41
Program Start Scenario
  • Before starting the process
  • Load the page directory into physical memory
  • Load the PDBR (page directory base register) with the base address of the page directory
  • Load the PC with the start address of the code
  • The first reference to code then triggers:
  • iTLB miss (translation failed for the instruction address)
  • The exception handler looks up PTE1
  • dTLB miss (translation failed for PTE1's address)
  • The exception handler looks up PTE2
  • Look up the page directory and find PTE2
  • Add PTE2 to the dTLB
  • dTLB hit, but page miss (PTE1 not in memory)
  • Load the page containing PTE1
  • Look up the page table and find PTE1
  • Add PTE1 to the iTLB
  • iTLB hit, but page miss (the code page is not present in memory)
  • Load the instruction page
  • Cache miss, but memory returns the instruction

42
P6 Memory System
  • 32 bit address space
  • 4 KB page size
  • L1, L2, and TLBs
  • 4-way set associative
  • inst TLB
  • 32 entries
  • 8 sets
  • data TLB
  • 64 entries
  • 16 sets
  • L1 i-cache and d-cache
  • 16 KB
  • 32 B line size
  • 128 sets
  • L2 cache
  • unified
  • 128 KB -- 2 MB

[Figure: the processor package contains the instruction fetch unit, the inst and data TLBs, and the L1 i-cache and d-cache, connected through a bus interface unit; a dedicated cache bus leads to the L2 cache, and an external system bus (e.g., PCI) leads to DRAM]
43
Overview of P6 Address Translation
[Figure: P6 address translation overview. The CPU issues a 32-bit virtual address (VA) split into VPN (20 bits) and VPO (12 bits). The VPN is split into TLBT (16 bits) and TLBI (4 bits) to access the TLB (16 sets, 4 entries/set); on a TLB miss it is instead split into VPN1 (10 bits) and VPN2 (10 bits) to walk the page tables (PDE, then PTE) starting at PDBR. The resulting physical address (PA) is PPN (20 bits) plus PPO (12 bits), split into CT (20 bits), CI (7 bits), and CO (5 bits) for the L1 cache (128 sets, 4 lines/set); L1 misses go to L2 and DRAM, and hits return the result to the CPU]
44
P6 2-level Page Table Structure
  • Page directory
  • Page directory
  • 1024 4-byte page directory entries (PDEs) that point to page tables
  • One page directory per process
  • The page directory must be in memory while its process is running
  • Always pointed to by PDBR
  • Page tables
  • 1024 4-byte page table entries (PTEs) that point to pages
  • Page tables can be paged in and out

[Figure: one page directory of 1024 PDEs points to up to 1024 page tables, each holding 1024 PTEs]
45
P6 Page Directory Entry (PDE)
PDE format when P = 1 (bit 31 down to bit 0):
  • Bits 31-12: page table physical base address, the 20 most significant bits of the physical page table address (forces page tables to be 4 KB aligned)
  • Bits 11-9: Avail, bits available for system programmers
  • Bit 8: G, global page (don't evict from the TLB on a task switch)
  • Bit 7: PS, page size: 4 KB (0) or 4 MB (1)
  • Bit 5: A, accessed (set by the MMU on reads and writes, cleared by software)
  • Bit 4: CD, cache disabled (1) or enabled (0)
  • Bit 3: WT, write-through or write-back cache policy for this page table
  • Bit 2: U/S, user or supervisor mode access
  • Bit 1: R/W, read-only or read-write access
  • Bit 0: P, page table is present in memory (1) or not (0)
PDE format when P = 0: bits 31-1 are available to the OS (e.g., the page table's location in secondary storage), and bit 0 holds P = 0.
46
P6 Page Table Entry (PTE)
PTE format when P = 1 (bit 31 down to bit 0):
  • Bits 31-12: page physical base address, the 20 most significant bits of the physical page address (forces pages to be 4 KB aligned)
  • Bits 11-9: Avail, bits available for system programmers
  • Bit 8: G, global page (don't evict from the TLB on a task switch)
  • Bit 7: 0 (reserved)
  • Bit 6: D, dirty (set by the MMU on writes)
  • Bit 5: A, accessed (set by the MMU on reads and writes)
  • Bit 4: CD, cache disabled or enabled
  • Bit 3: WT, write-through or write-back cache policy for this page
  • Bit 2: U/S, user/supervisor
  • Bit 1: R/W, read/write
  • Bit 0: P, page is present in physical memory (1) or not (0)
PTE format when P = 0: bits 31-1 are available to the OS (e.g., the page's location in secondary storage), and bit 0 holds P = 0.
47
How P6 Page Tables Map Virtual Addresses to Physical Ones
[Figure: the 32-bit virtual address splits into VPN1 (10 bits), VPN2 (10 bits), and VPO (12 bits); PDBR holds the physical address of the page directory; VPN1 is the word offset into the page directory, whose PDE yields the physical address of the page table base (if P = 1); VPN2 is the word offset into that page table, whose PTE yields the physical address of the page base (if P = 1); the physical address is then the 20-bit PPN from the PTE concatenated with the 12-bit PPO (= VPO, the word offset into the physical and virtual page)]
48
Representation of Virtual Address Space
  • Simplified Example
  • 16-page virtual address space
  • Flags
  • P: Is the entry in physical memory?
  • M: Has this part of the VA space been mapped?

49
P6 TLB Translation
[Figure: the same P6 address translation diagram as slide 43, here highlighting the TLB path: TLBT (16 bits) and TLBI (4 bits) index the 16-set, 4-way TLB; on a hit the PPN comes from the TLB entry, on a miss from the page-table walk starting at PDBR]
50
P6 TLB
  • TLB entry (not all fields are documented, so this is speculative)
  • V: indicates a valid (1) or invalid (0) TLB entry
  • PD: is this entry a PDE (1) or a PTE (0)?
  • tag: disambiguates entries cached in the same set
  • PDE/PTE: page directory or page table entry
  • Structure of the data TLB
  • 16 sets, 4 entries/set

51
Translating with the P6 Page Tables (case 1/1)
  • Case 1/1: page table and page both present
  • MMU action
  • The MMU builds the physical address and fetches the data word
  • OS action
  • None

[Figure: PDBR points to the page directory; the PDE (p = 1) points to the page table; the PTE (p = 1) points to the data page in memory, from which the data word is fetched]
52
Translating with the P6 Page Tables (case 1/0)
  • Case 1/0: page table present, but page missing
  • MMU action
  • Page fault exception
  • The handler receives the following arguments:
  • The VA that caused the fault
  • Whether the fault was caused by a non-present page or a page-level protection violation
  • Read/write
  • User/supervisor

[Figure: the PDE (p = 1) points to the page table, but the PTE (p = 0) refers to a data page that is on disk]
53
Translating with the P6 Page Tables (case 1/0)
  • OS action
  • Check for a legal virtual address
  • Read the PTE through the PDE
  • Find a free physical page (swapping out the current page if necessary)
  • Read the virtual page from disk into the physical page
  • Restart the faulting instruction by returning from the exception handler

[Figure: after handling, the PDE (p = 1) and PTE (p = 1) are both valid, and the data page is in memory]
54
Translating with the P6 Page Tables (case 0/1)
  • Case 0/1: page table missing, but page present
  • Introduces a consistency issue
  • Potentially every page-out requires an update of the on-disk page table
  • Linux disallows this
  • If a page table is swapped out, then its data pages are swapped out too

[Figure: the PDE (p = 0) refers to a page table on disk, while the data page itself is in memory; Linux never creates this situation]
55
Translating with the P6 Page Tables (case 0/0)
  • Case 0/0: page table and page both missing
  • MMU action
  • Page fault exception

[Figure: the PDE (p = 0) refers to a page table on disk, and the PTE there (p = 0) refers to a data page that is also on disk]
56
Translating with the P6 Page Tables (case 0/0)
  • OS action
  • Swap in the page table
  • Restart the faulting instruction by returning from the handler
  • It is now case 1/0, and handling continues as described there

[Figure: after the page table is swapped in, the PDE (p = 1) points to the page table and the PTE (p = 0) still refers to a data page on disk, i.e., case 1/0]
57
P6 L1 Cache Access
[Figure: the same P6 address translation diagram as slide 43, here highlighting the L1 cache path: the physical address splits into CT (20 bits), CI (7 bits), and CO (5 bits) to access the L1 cache (128 sets, 4 lines/set)]
58
Speeding Up L1 Access
[Figure: the cache index (CI, 7 bits) and byte offset (CO, 5 bits) fall entirely within the 12-bit page offset, which passes through address translation unchanged; only the cache tag (CT, 20 bits) comes from the translated PPN]
  • Observation
  • The bits that determine CI are identical in the virtual and physical address
  • So the cache can be indexed while address translation is taking place
  • Then the tag is checked using CT from the physical address
  • "Virtually indexed, physically tagged"
  • The cache is carefully sized to make this possible

59
Linux Organizes VM as Collection of Areas
  • Area
  • A contiguous chunk of (allocated) virtual memory whose pages are related
  • Examples: code segment, data segment, heap, shared library segment, etc.
  • Any existing virtual page is contained in some area.
  • Any virtual page that is not part of some area does not exist and cannot be referenced!
  • Thus, the virtual address space can have gaps.
  • The kernel does not keep track of virtual pages that do not exist.
  • task_struct
  • The kernel maintains a distinct task structure for each process
  • It contains all the information that the kernel needs to run the process
  • PID, pointer to the user stack, name of the executable object file, program counter, etc.
  • mm_struct
  • One of the entries in the task structure; it characterizes the current state of virtual memory
  • pgd: base of the page directory table
  • mmap: points to a list of vm_area_structs

60
Linux Organizes VM as Collection of Areas
[Figure: the task_struct's mm field points to an mm_struct holding pgd (the page directory base) and mmap, which points to a vm_next-linked list of vm_area_structs; each area records vm_start, vm_end, vm_prot, and vm_flags, here describing the shared-library region (around 0x40000000), the data region (ending at 0x0804a020), and the text region (starting at 0x08048000)]
  • vm_prot
  • Read/write permissions for this area
  • vm_flags
  • Shared with other processes, or private to this process
61
Linux Page Fault Handling
  • Is the VA legal?
  • i.e., is it in an area defined by a vm_area_struct?
  • If not, then signal a segmentation violation (e.g., case (1))
  • Is the operation legal?
  • i.e., can the process read/write this area?
  • If not, then signal a protection violation fault (e.g., case (2))
  • If OK, handle the page fault
  • e.g., case (3)

[Figure: three references into the process virtual memory: (1) a read outside any area (segmentation violation), (2) a write into the read-only text area (protection violation), and (3) a legal read in the data area (a normal page fault)]
62
Memory Mapping
  • Linux (also UNIX) initializes the contents of a virtual memory area by associating it with an object on disk
  • Creates a new vm_area_struct and page tables for the area
  • Areas can be mapped to one of two types of objects (i.e., get their initial values from)
  • A regular file on disk (e.g., an executable object file)
  • The file is divided into page-sized pieces; the initial contents of each virtual page comes from one piece.
  • If the area is larger than the file section, the area is padded with zeros.
  • An anonymous file (e.g., bss)
  • An area can be mapped to an anonymous file, created by the kernel.
  • The initial contents of these pages are initialized to zeros
  • Also called demand-zero pages
  • Key point: no virtual pages are copied into physical memory until they are referenced!
  • Known as demand paging
  • Crucial for time and space efficiency

63
User-Level Memory Mapping
  • void *mmap(void *start, int len, int prot, int flags, int fd, int offset)
  • Maps len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start (usually 0 for "don't care")
  • prot: PROT_EXEC, PROT_READ, PROT_WRITE
  • flags: MAP_PRIVATE, MAP_SHARED, MAP_ANON
  • MAP_PRIVATE indicates a private copy-on-write object
  • MAP_SHARED indicates a shared object
  • MAP_ANON with a NULL fd indicates an anonymous file (demand-zero pages)
  • Returns a pointer to the mapped area (a usage sketch follows)
  • int munmap(void *start, int len)
  • Deletes the area starting at virtual address start with length len

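A usage sketch of the standard POSIX interface (hedged: POSIX declares mmap with size_t/off_t arguments rather than the simplified int signature above, and the file name data.bin is hypothetical):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);          /* hypothetical file */
    if (fd < 0) { perror("open"); exit(1); }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }

    /* Map the whole file, private (copy-on-write) and read-only. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(1); }

    printf("first byte: 0x%02x\n", (unsigned char)p[0]);  /* demand-paged in here */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}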
64
Shared Objects
  • Why shared objects?
  • Many processes need to share identical read-only text areas. For example:
  • Each tcsh process has the same text area.
  • Standard library functions such as printf
  • It would be extremely wasteful for each process to keep duplicate copies in physical memory
  • An object can be mapped as either a shared object or a private object
  • Shared object
  • Any write to that area is visible to any other process that has also mapped the shared object.
  • The changes are also reflected in the original object on disk.
  • A virtual memory area into which a shared object is mapped is called a shared area.
  • Private object
  • Any write to that area is not visible to other processes.
  • The changes are not reflected back to the object on disk.
  • Private objects are mapped into virtual memory using copy-on-write.
  • Only one copy of the private object is stored in physical memory.
  • The page table entries for the private area are flagged as read-only
  • Any write to some page in the private area triggers a protection fault
  • The handler needs to create a new copy of the page in physical memory and then restore write permission to the page.
  • After the handler returns, the process proceeds normally

65
Shared Object
66
Private Object
67
Exec() Revisited
  • To run a new program p in the current process using exec():
  • Free the vm_area_structs and page tables for the old areas.
  • Create new vm_area_structs and page tables for the new areas.
  • Stack, bss, data, text, shared libs.
  • Text and data are backed by the ELF executable object file.
  • bss and stack are initialized to zero.
  • Set the PC to the entry point in .text
  • Linux will swap in code and data pages as needed.

[Figure: process VM image after exec(): kernel VM (kernel code/data/stack, same for each process) above 0xc0000000, plus process-specific data structures (page tables, task and mm structs) in physical memory; then, top down in the process VM, the demand-zero stack (%esp), the memory-mapped region for shared libraries (.data and .text of libc.so), the runtime heap (via malloc, bounded by brk), demand-zero uninitialized data (.bss), initialized data (.data) and program text (.text) backed by the executable p, and a forbidden region at address 0]
68
Fork() Revisited
  • To create a new process using fork():
  • Make copies of the old process's mm_struct, vm_area_structs, and page tables.
  • At this point the two processes share all of their pages.
  • How to get separate spaces without copying all the virtual pages from one space to another?
  • The copy-on-write technique.
  • Copy-on-write
  • Make the pages of writeable areas read-only
  • Flag the vm_area_structs for these areas as private copy-on-write.
  • Writes by either process to these pages will cause page faults.
  • The fault handler recognizes copy-on-write, makes a copy of the page, and restores write permissions.
  • Net result:
  • Copies are deferred until absolutely necessary (i.e., when one of the processes tries to modify a shared page).

69
Dynamic Memory Allocation
  • Heap
  • An area of demand-zero memory that begins immediately after the bss area.
  • Allocator
  • Maintains the heap as a collection of variously sized blocks.
  • Each block is a contiguous chunk of virtual memory that is either allocated or free.
  • An explicit allocator requires the application to both allocate and free space
  • E.g., malloc and free in C
  • An implicit allocator requires the application to allocate, but not to free, space
  • The allocator needs to detect when an allocated block is no longer being used
  • Implicit allocators are also known as garbage collectors.
  • The process of automatically freeing unused blocks is known as garbage collection.
  • E.g., garbage collection in Java, ML, or Lisp

70
Heap
[Figure: process memory image: kernel virtual memory (invisible to user code); the user stack (%esp); the memory-mapped region for shared libraries; the run-time heap (via malloc), whose top is marked by the brk pointer; uninitialized data (.bss); initialized data (.data); program text (.text); address 0 at the bottom]
71
Malloc Package
  • #include <stdlib.h>
  • void *malloc(size_t size)
  • If successful:
  • Returns a pointer to a memory block of at least size bytes
  • (Typically) aligned to an 8-byte boundary so that any kind of data object can be contained in the block
  • If size == 0, returns NULL
  • If unsuccessful (i.e., the request is larger than available virtual memory): returns NULL (0) and sets errno.
  • Two other variations: calloc (initializes the allocated memory to zero) and realloc
  • Implemented using the mmap and munmap functions, or the sbrk function
  • void *realloc(void *p, size_t size)
  • Changes the size of block p and returns a pointer to the new block.
  • The contents of the new block are unchanged up to the minimum of the old and new sizes.
  • void free(void *p)
  • Returns the block pointed at by p to the pool of available memory
  • p must come from a previous call to malloc or realloc.

72
Malloc Example
#include <stdio.h>
#include <stdlib.h>

void foo(int n, int m) {
    int i, *p;
    /* allocate a block of n ints */
    if ((p = (int *) malloc(n * sizeof(int))) == NULL) {
        perror("malloc"); exit(0);
    }
    for (i = 0; i < n; i++) p[i] = i;
    /* add space for m more ints at the end of the p block */
    if ((p = (int *) realloc(p, (n + m) * sizeof(int))) == NULL) {
        perror("realloc"); exit(0);
    }
    for (i = n; i < n + m; i++) p[i] = i;
    /* print new array */
    for (i = 0; i < n + m; i++) printf("%d\n", p[i]);
    free(p); /* return p to the available memory pool */
}
73
Allocation Examples
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
74
Requirements (Explicit Allocators)
  • Applications
  • Can issue an arbitrary sequence of allocation and free requests
  • Free requests must correspond to an allocated block
  • Allocators
  • Can't control the number or the size of allocated blocks
  • Must respond immediately to all allocation requests
  • i.e., can't reorder or buffer requests
  • Must allocate blocks from free memory
  • i.e., can only place allocated blocks in free memory
  • Must align blocks so they satisfy all alignment requirements
  • 8-byte alignment for GNU malloc (libc malloc) on Linux boxes
  • Can only manipulate and modify free memory
  • Can't move allocated blocks once they are allocated
  • i.e., compaction is not allowed

75
Goals of Allocators
  • Maximize throughput
  • Throughput: the number of completed requests per unit time
  • Example:
  • 5,000 malloc calls and 5,000 free calls in 10 seconds
  • Throughput is 1,000 operations/second
  • Maximize memory utilization
  • Need to minimize fragmentation.
  • Fragmentation (holes): unused area
  • There is a tradeoff between throughput and memory utilization
  • Need to balance these two goals
  • Good locality properties
  • Similar objects should be allocated close together in space

76
Internal Fragmentation
  • Poor memory utilization is caused by fragmentation.
  • It comes in two forms: internal and external fragmentation
  • Internal fragmentation
  • For some block, internal fragmentation is the difference between the block size and the payload size.
  • Caused by the overhead of maintaining heap data structures, e.g., padding for alignment purposes.
  • Any virtual memory allocation policy using fixed-size blocks, such as paging, can suffer from internal fragmentation

[Figure: a block whose payload is surrounded by internal fragmentation (overhead and padding on either side)]
77
External Fragmentation
Occurs when there is enough aggregate heap memory, but no single free block is large enough:
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(6)   <- oops!
External fragmentation depends on the pattern of future requests, and is thus difficult to measure.
78
Implementation Issues
  • Free block organization
  • How do we know the size of a free block?
  • How do we keep track of the free blocks?
  • Placement
  • How do we choose an appropriate free block in which to place a newly allocated block?
  • Splitting
  • What do we do with the extra space after the placement?
  • Coalescing
  • What do we do with small blocks that have just been freed?

p1 = malloc(1)
79
How do we know the size of a block?
  • Standard method
  • Keep the length of a block in the word preceding the block.
  • This word is often called the header field, or header
  • Requires an extra word for every allocated block
  • Format of a simple heap block (helper macros are sketched below):

[ block size | 0 0 a ]   header: a = 1 allocated, a = 0 free
[ payload ]              application data (allocated blocks only)
[ padding (optional) ]

The block size includes the header, payload, and any padding. malloc returns a pointer to the beginning of the payload.
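A hedged sketch of header manipulation in C (hypothetical macros in the style of the CS:APP allocator; since block sizes are multiples of 8, the low 3 bits of the size word are free to hold the allocated flag):

#include <stdint.h>

#define WSIZE 4                    /* header/footer word size in bytes */

/* Pack a block size and an allocated bit into one header word. */
#define PACK(size, alloc)  ((size) | (alloc))

/* Read and write a word at address p. */
#define GET(p)       (*(uint32_t *)(p))
#define PUT(p, val)  (*(uint32_t *)(p) = (val))

/* Extract the size and allocated fields from a header word. */
#define GET_SIZE(p)   (GET(p) & ~0x7u)   /* low 3 bits masked off */
#define GET_ALLOC(p)  (GET(p) & 0x1u)    /* the a bit */

/* The header sits one word before the payload pointer malloc returns. */
#define HDRP(bp)  ((char *)(bp) - WSIZE)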
80
Example
81
Keeping Track of Free Blocks
  • Method 1: implicit list, using lengths to link all blocks
  • Method 2: explicit list among the free blocks, using pointers within the free blocks
  • Method 3: segregated free list
  • Different free lists for different size classes

[Figure: an example heap with blocks of sizes 5, 4, 2, and 6, linked implicitly by their length fields or explicitly by pointers in the free blocks]
82
Placement Policy
  • First fit
  • Search the list from the beginning and choose the first free block that fits (sketched below)
  • Can take time linear in the total number of blocks (allocated and free)
  • (+) Tends to retain large free blocks at the end
  • (-) Leaves small free blocks at the beginning
  • Next fit
  • Like first fit, but start the search from the end of the previous search
  • (+) Runs faster than first fit
  • (-) Worse memory utilization than first fit
  • Best fit
  • Search the list and choose the free block with the closest size that fits
  • (+) Keeps fragments small: better memory utilization than the other two
  • (-) Typically runs slower: requires an exhaustive search of the heap

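A minimal first-fit sketch over an implicit list, reusing the header macros sketched above (hypothetical; assumes block sizes include the header and the heap bounds are known):

/* Reuses GET_SIZE, GET_ALLOC, HDRP from the earlier macro sketch. */
static void *find_fit(char *first_payload, char *heap_end, uint32_t asize) {
    char *bp = first_payload;
    while (bp < heap_end) {
        if (!GET_ALLOC(HDRP(bp)) && GET_SIZE(HDRP(bp)) >= asize)
            return bp;                    /* first free block that fits */
        bp += GET_SIZE(HDRP(bp));         /* step to the next block's payload */
    }
    return NULL;                          /* no fit: must extend the heap */
}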
83
Splitting
  • Allocating in a free block: splitting
  • Since the allocated space might be smaller than the free space, we might want to split the block

[Figure: addblock(p, 2) allocates within the free block of size 6, splitting it into an allocated part and a smaller remaining free block]
84
Coalescing
  • Coalescing
  • When the allocator frees a block, there might be other free blocks that are adjacent.
  • Such adjacent free blocks cause "false fragmentation": there is enough free space, but it is chopped up into small, unusable pieces.
  • Need to coalesce the next and/or previous block if they are free
  • Coalescing with the next block is straightforward
  • But how do we coalesce with the previous block?

[Figure: free(p) produces a free block adjacent to an already-free block; coalescing with the next block merges them into one free block of size 6]
85
Bidirectional Coalescing
  • Boundary tags [Knuth73]
  • Replicate the size/allocated word (called the footer) at the bottom of each block
  • Allows us to traverse the list backwards, but requires extra space
  • Important and general technique! Allows constant-time coalescing (see the sketch after the four cases below)

Format of allocated and free blocks (1-word header and footer):
[ size | a ]              header: a = 1 allocated block, a = 0 free block; size = total block size
[ payload and padding ]   application data (allocated blocks only)
[ size | a ]              boundary tag (footer)
86
Constant Time Coalescing
The block being freed falls into four cases, determined by whether its previous and next neighbors are allocated or free:
Case 1: previous allocated, next allocated
Case 2: previous allocated, next free
Case 3: previous free, next allocated
Case 4: previous free, next free
87
Constant Time Coalescing (Case 1)
[Figure: neighbors m1 and m2 stay allocated; the freed block's header and footer simply change from n/1 to n/0]
88
Constant Time Coalescing (Case 2)
[Figure: the freed block (n) merges with the free next block (m2); the merged header and footer read (n+m2)/0, while m1 stays allocated]
89
Constant Time Coalescing (Case 3)
[Figure: the freed block (n) merges with the free previous block (m1); the merged header and footer read (n+m1)/0, while m2 stays allocated]
90
Constant Time Coalescing (Case 4)
[Figure: the freed block (n) merges with both free neighbors; the merged header and footer read (n+m1+m2)/0]
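A hedged C sketch of all four cases, reusing WSIZE, PACK, GET, PUT, GET_SIZE, GET_ALLOC, and HDRP from the earlier macro sketch and adding boundary-tag helpers (hypothetical, in the style of the CS:APP allocator):

#define FTRP(bp)       ((char *)(bp) + GET_SIZE(HDRP(bp)) - 2 * WSIZE)
#define NEXT_BLKP(bp)  ((char *)(bp) + GET_SIZE(HDRP(bp)))
#define PREV_BLKP(bp)  ((char *)(bp) - GET_SIZE((char *)(bp) - 2 * WSIZE))

static void *coalesce(void *bp) {
    uint32_t prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp)));
    uint32_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(bp)));
    uint32_t size = GET_SIZE(HDRP(bp));

    if (prev_alloc && next_alloc) {          /* case 1: no free neighbor */
        return bp;
    } else if (prev_alloc && !next_alloc) {  /* case 2: merge with next */
        size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    } else if (!prev_alloc && next_alloc) {  /* case 3: merge with previous */
        size += GET_SIZE(HDRP(PREV_BLKP(bp)));
        bp = PREV_BLKP(bp);
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    } else {                                 /* case 4: merge with both */
        size += GET_SIZE(HDRP(PREV_BLKP(bp))) + GET_SIZE(HDRP(NEXT_BLKP(bp)));
        bp = PREV_BLKP(bp);
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    }
    return bp;
}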
91
Implicit Lists Summary
  • Implementation is very simple
  • Allocate takes linear time in the worst case
  • Free takes constant time in the worst case, even with coalescing
  • Memory usage depends on the placement policy
  • First fit, next fit, or best fit
  • Not used in practice for malloc/free because of the linear-time allocate
  • Used for special-purpose applications where the total number of blocks is known beforehand to be small
  • However, the concepts of splitting and boundary-tag coalescing are general to all allocators.

92
Keeping Track of Free Blocks
  • Method 1: implicit list, using lengths to link all blocks
  • Method 2: explicit list among the free blocks, using pointers within the free blocks
  • Method 3: segregated free lists
  • Different free lists for different size classes

93
Explicit Free Lists
  • Use the data space of free blocks for pointers
  • Typically doubly linked
  • Still need boundary tags for coalescing

[Figure: free blocks A, B, and C threaded into a doubly linked list by forward and back links stored in their payloads, independent of their address order in the heap]
94
Format of Doubly-Linked Heap Blocks
Allocated block:
[ block size | a/f ]            header
[ payload ]
[ padding (optional) ]
[ block size | a/f ]            footer

Free block:
[ block size | a/f ]            header
[ pred (predecessor pointer) ]
[ succ (successor pointer) ]
[ old payload ]
[ padding (optional) ]
[ block size | a/f ]            footer
95
Freeing With Explicit Free Lists
  • Insertion policy: where in the free list do you put a newly freed block?
  • LIFO (last-in-first-out) policy
  • Insert the freed block at the beginning of the free list (see the sketch below)
  • (+) Simple, and freeing a block can be performed in constant time
  • If boundary tags are used, coalescing can also be performed in constant time
  • Address-ordered policy
  • Insert freed blocks so that the free-list blocks are always in address order
  • i.e., addr(pred) < addr(curr) < addr(succ)
  • (-) Freeing a block requires a linear-time search
  • (+) Studies suggest that address-ordered first fit enjoys better memory utilization than LIFO-ordered first fit

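A minimal sketch of the LIFO policy (hypothetical node layout, storing pred and succ inside the free block's payload as in the format on the previous slide):

#include <stddef.h>

typedef struct free_block {
    struct free_block *pred, *succ;  /* stored inside the free payload */
} free_block_t;

static free_block_t *free_list_head = NULL;

/* LIFO policy: push the freed block onto the head of the list in O(1). */
static void insert_free_block(free_block_t *bp) {
    bp->pred = NULL;
    bp->succ = free_list_head;
    if (free_list_head)
        free_list_head->pred = bp;
    free_list_head = bp;
}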
96
Explicit List Summary
  • Comparison to implicit lists
  • Allocation time is linear in the number of free blocks instead of the total number of blocks
  • Much faster allocation when most of the memory is full
  • Slightly more complicated allocate and free, since blocks must be spliced in and out of the list
  • Extra space for the links (2 extra words needed per block)
  • This results in a larger minimum block size, and potentially increases the degree of internal fragmentation
  • The main use of linked lists is in conjunction with segregated free lists
  • Keep multiple linked lists of different size classes, or possibly for different types of objects

97
Keeping Track of Free Blocks
  • Method 1: implicit list, using lengths to link all blocks
  • Method 2: explicit list among the free blocks, using pointers within the free blocks
  • Method 3: segregated free list
  • Different free lists for different size classes
  • Can be used to reduce the allocation time compared to a single linked-list organization

98
Segregated Storage
  • Partition the set of all free blocks into equivalence classes called size classes
  • The allocator maintains an array of free lists, with one free list per size class, ordered by increasing size
  • Often there is a separate size class for every small size (2, 3, 4, ...)
  • Classes for larger sizes typically comprise a size class for each power of 2
  • Variations of segregated storage
  • They differ in how they define size classes, when they perform coalescing, when they request additional heap memory from the OS, whether they allow splitting, and so on.
  • Examples: simple segregated storage, segregated fits

99
Simple Segregated Storage
  • Separate heap and free list for each size class
  • The free list for each size class contains same-sized blocks of the largest element size
  • For example, the free list for size class 17-32 consists entirely of blocks of size 32
  • To allocate a block of size n:
  • If the free list for size class n is not empty, allocate the first block in its entirety
  • If the free list is empty, get a new page from the OS, create a new free list from all the blocks in the page, and then allocate the first block on the list
  • To free a block:
  • Simply insert the free block at the front of the appropriate free list
  • (+) Both allocating and freeing blocks are fast constant-time operations
  • (+) Little per-block memory overhead: no splitting and no coalescing
  • (-) Susceptible to internal and external fragmentation
  • Internal fragmentation, since free blocks are never split
  • External fragmentation, since free blocks are never coalesced

100
Segregated Fits
  • An array of free lists, one for each size class
  • The free list for each size class contains potentially different-sized blocks
  • To allocate a block of size n:
  • Do a first-fit search of the appropriate free list
  • If an appropriate block is found:
  • Split the block (optional) and place the fragment on the appropriate list
  • If no block is found, try the next larger size class, and repeat until a block is found
  • If none of the free lists yields a block that fits, request additional heap memory from the OS, allocate the block out of this new heap memory, and place the remainder in the largest size class
  • To free a block:
  • Coalesce and place on the appropriate list
  • (+) Fast
  • Searches are limited to part of the heap rather than the entire heap
  • However, coalescing can increase search times
  • (+) Good memory utilization
  • A simple first-fit search approximates a best-fit search of the entire heap
  • A popular choice for production-quality allocators such as GNU malloc

101
Garbage Collection
  • Garbage collector: a dynamic storage allocator that automatically frees allocated blocks that are no longer used
  • Implicit memory management: the application never has to free

void foo() {
    int *p = malloc(128);
    return; /* p block is now garbage */
}

  • Common in functional languages, scripting languages, and modern object-oriented languages
  • Lisp, ML, Java, Perl, Mathematica, ...
  • Variants (conservative garbage collectors) exist for C and C++
  • Cannot collect all garbage

102
Garbage Collection
  • How does the memory manager know when memory can be freed?
  • In general, we cannot know what is going to be used in the future, since that depends on conditionals
  • But we can tell that certain blocks cannot be used if there are no pointers to them
  • Need to make certain assumptions about pointers
  • The memory manager needs to distinguish pointers from non-pointers
  • Garbage Collection
  • Garbage collectors view memory as a reachability graph and periodically reclaim the unreachable nodes
  • Classical GC Algorithms
  • Mark-and-sweep collection (McCarthy, 1960)
  • Does not move blocks (unless you also compact)
  • Reference counting (Collins, 1960)
  • Does not move blocks (not discussed)
  • Copying collection (Minsky, 1963)
  • Moves blocks (not discussed)

103
Memory as a Graph
  • Reachability graph: we view memory as a directed graph
  • Each block is a node in the graph
  • Each pointer is an edge in the graph
  • Locations not in the heap that contain pointers into the heap are called root nodes
  • e.g., registers, locations on the stack, global variables

[Figure: root nodes pointing into heap nodes; nodes reachable from the roots vs. unreachable (garbage) nodes]

  • A node (block) is reachable if there is a path from some root to that node.
  • Unreachable nodes are garbage (they can never be needed by the application)

104
Mark and Sweep Garbage Collectors
  • A mark-and-sweep garbage collector consists of a mark phase followed by a sweep phase
  • It uses an extra mark bit in the header of each block
  • When out of space:
  • Mark: start at the roots and set the mark bit on all reachable memory blocks
  • Sweep: scan all blocks and free the blocks that are not marked
  • (a sketch of both phases follows the figure below)

[Figure: before mark; after mark (mark bits set on all blocks reachable from the root); after sweep (unmarked allocated blocks returned to free)]
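A hedged C-style sketch of the two phases (hypothetical helper functions; the mark phase uses the depth-first traversal described on the next slide):

typedef void *ptr;

extern ptr  ptr_to_block(ptr p);   /* enclosing heap block, or NULL if p is not a heap pointer */
extern int  block_marked(ptr b);
extern int  block_allocated(ptr b);
extern void set_mark_bit(ptr b);
extern void clear_mark_bit(ptr b);
extern int  block_length(ptr b);   /* number of words in the block */
extern ptr  next_block(ptr b);
extern void free_block(ptr b);     /* return the block to the free list */

/* Mark phase: depth-first traversal starting from each root. */
void mark(ptr p) {
    ptr b = ptr_to_block(p);
    if (b == NULL || block_marked(b))
        return;
    set_mark_bit(b);
    for (int i = 0; i < block_length(b); i++)
        mark(((ptr *)b)[i]);       /* treat each word as a candidate pointer */
}

/* Sweep phase: scan the whole heap and free unmarked allocated blocks. */
void sweep(ptr b, ptr end) {
    while ((char *)b < (char *)end) {
        if (block_marked(b))
            clear_mark_bit(b);
        else if (block_allocated(b))
            free_block(b);
        b = next_block(b);
    }
}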
105
Mark and Sweep (cont.)
Mark using a depth-first traversal of the memory graph.