Title: Computer System Chapter 10: Virtual Memory
1. Computer System Chapter 10: Virtual Memory
Lynn Choi, Korea University
2. Motivations for Virtual Memory
- Simplify memory management
  - Provide each process with a uniform address space
  - Provide the illusion of an infinite amount of memory
  - Each process has its own address space
- Use physical DRAM as a cache for the disk
  - The address space of a process can exceed the physical memory size
    - Only active code and data are actually in memory
    - Allocate more memory to a process as needed
  - The sum of the address spaces of multiple processes can exceed the physical memory size
    - Multiple processes are partially resident in main memory
- Provide protection
  - One process can't interfere with another, because they operate in different address spaces
  - Different sections of an address space have different permissions
  - A user process cannot access privileged information
3. Virtual Memory
- Benefits
  - Easier programming
  - Software portability
  - Protection
  - Increased CPU utilization: more programs can run at the same time
- Virtual address space
  - The programmer's view of (practically) infinite memory
- Physical address space
  - The machine's physical memory
- Requires the following functions
  - Memory allocation (placement)
    - Fully associative: a virtual page can be placed in any physical page
    - The tag size is small compared to a block (page) size
  - Memory deallocation (replacement)
    - e.g., LRU replacement policy
  - Memory mapping (translation)
    - Virtual-address-to-physical-address translation
4. Paging
- Divide the address space into fixed-size page frames
  - A VA consists of (VPN, offset)
  - A PA consists of (PPN, offset)
- Map a virtual page to a physical page at runtime
- A page table entry (PTE) contains
  - The VPN-to-PPN mapping
  - Presence bit
  - Reference bit
  - Dirty bit
  - Access control: read/write/execute
  - Privilege level
  - Disk address
- Demand paging: bring in a page on a page miss
- Internal fragmentation: the last page of a region may be only partially used
5. Motivation 1: DRAM as a Cache for Disk
- The full address space is quite large
  - 32-bit addresses: ~4,000,000,000 (4 billion) bytes
  - 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes
- Disk storage is ~300X cheaper than DRAM storage
  - 80 GB of DRAM: ~$33,000
  - 80 GB of disk: ~$110
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk
6. Levels in the Memory Hierarchy

           size        speed   $/Mbyte     line size
Register   32 B        1 ns                8 B
Cache      32 KB-4 MB  2 ns    $125/MB     32 B
Memory     1024 MB     30 ns   $0.20/MB    4 KB
Disk       100 GB      8 ms    $0.001/MB

- Larger, slower, cheaper as we move down the hierarchy
- [Figure: transfer units between levels - 8 B blocks between registers and cache, 32 B lines between cache and memory, 4 KB pages between memory (virtual memory) and disk]
7. DRAM vs. SRAM as a Cache
- The DRAM-vs.-disk gap is more extreme than the SRAM-vs.-DRAM gap
- Access latencies
  - DRAM is ~10X slower than SRAM
  - Disk is ~100,000X slower than DRAM
- Importance of exploiting spatial locality
  - The first byte is ~100,000X slower than successive bytes on disk
  - vs. a ~4X improvement for page-mode vs. regular accesses to DRAM
- Bottom line
  - Design decisions for DRAM caches are driven by the enormous cost of misses
- [Figure: SRAM caching DRAM, DRAM caching Disk]
8. Impact of Properties on Design
- If DRAM were organized like an SRAM cache, how would we set the following design parameters?
  - Line size?
    - Large, since disk is better at transferring large blocks
  - Associativity?
    - High, to minimize miss rates
  - Write-through or write-back?
    - Write-back, since we can't afford to perform small writes to disk
- What would the impact of these choices be on
  - Miss rate
    - Extremely low (<< 1%)
  - Hit time
    - Must match cache/DRAM performance
  - Miss latency
    - Very high (~20 ms)
  - Tag storage overhead
    - Low, relative to the block size
9. Locating an Object in a Cache
- SRAM cache
  - The tag is stored with the cache line
  - Maps from cache blocks to memory blocks
    - From cached form to uncached form
  - Saves a few bits by only storing the tag
    - No tag for a block not in the cache
  - Hardware retrieves the information
    - Can quickly match against multiple tags
10. Locating an Object in a Cache (cont.)
- DRAM cache
  - Each allocated page of virtual memory has an entry in the page table
  - Maps from virtual pages to physical pages
    - From uncached form to cached form
  - A page table entry exists even if the page is not in memory
    - It then specifies the disk address
    - The only way to indicate where to find the page
  - The OS retrieves the information
- [Figure: page table entries either point to a location in the cache (physical memory) or are marked "On Disk"]
11. A System with Physical Memory Only
- Examples
  - Most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory
12. A System with Virtual Memory
- Examples
  - Workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table)
- [Figure: virtual addresses pass through the page table to physical addresses 0..P-1 in memory, or to disk]
13. Page Faults (like Cache Misses)
- What if an object is on disk rather than in memory?
  - The page table entry indicates that the virtual address is not in memory
  - An OS exception handler is invoked to move the data from disk into memory
    - The current process suspends; others can resume
    - The OS has full control over placement, etc.
- [Figure: before the fault, the CPU's virtual address maps through the page table to disk; after the fault, the same virtual address maps to physical memory]
14. Servicing a Page Fault
- (1) Initiate block read
  - The processor signals the I/O controller
    - Read a block of length P starting at disk address X and store it starting at memory address Y
- (2) Read occurs
  - Direct Memory Access (DMA)
  - Under control of the I/O controller
- (3) I/O controller signals completion
  - Interrupts the processor
  - The OS resumes the suspended process
- [Figure: processor (registers, cache), memory-I/O bus, I/O controller, memory, and disk carrying out steps (1)-(3)]
15. Motivation 2: Memory Management
- Multiple processes can reside in physical memory
- How do we resolve address conflicts?
  - What if two processes access something at the same address?
- [Figure: Linux/x86 process memory image - kernel virtual memory (invisible to user code), stack (%esp), memory-mapped region for shared libraries, runtime heap (via malloc) bounded by the brk ptr, uninitialized data (.bss), initialized data (.data), program text (.text), and a forbidden region at address 0]
16. Solution: Separate Virtual Address Spaces
- Virtual and physical address spaces are divided into equal-sized blocks
  - The blocks are called pages (both virtual and physical)
- Each process has its own virtual address space
- The operating system controls how virtual pages are assigned to physical memory
- [Figure: address translation maps process 1's virtual pages (VP 1, VP 2, ..., N-1) and process 2's virtual pages (VP 1, VP 2, ..., M-1) to physical pages such as PP 2, PP 7, and PP 10; a physical page (e.g., read-only library code) can be shared by both processes]
17. Motivation 3: Protection
- The page table entry contains access-rights information
- Hardware enforces this protection (it traps into the OS if a violation occurs)
- [Figure: per-process page tables for processes i and j, each entry carrying permission bits, mapping into shared physical memory]
18. VM Address Translation
- Virtual address space
  - V = {0, 1, ..., N-1}
- Physical address space
  - P = {0, 1, ..., M-1}
  - M < N
- Address translation
  - MAP: V -> P U {0}
  - For virtual address a:
    - MAP(a) = a' if the data at virtual address a is at physical address a' in P
    - MAP(a) = 0 if the data at virtual address a is not in physical memory
      - Either invalid, or stored on disk
19. VM Address Translation: Hit
- [Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip memory management unit, MMU) produces physical address a', which accesses main memory]
20. VM Address Translation: Miss
- [Figure: on a page fault, translation of virtual address a fails and a fault handler is invoked; the OS transfers the page from secondary memory to main memory (only on a miss), after which translation to physical address a' succeeds]
21. VM Address Translation
- Parameters
  - P = 2^p : page size (bytes)
  - N = 2^n : virtual address limit
  - M = 2^m : physical address limit
- A virtual address splits into a virtual page number (bits n-1..p) and a page offset (bits p-1..0)
- Address translation replaces the virtual page number with a physical page number (bits m-1..p) to form the physical address
- The page offset bits don't change as a result of translation
22. Page Tables
- A memory-resident page table maps each virtual page number to a physical page or a disk address
- Each entry carries a valid bit
  - Valid (1) entries point into physical memory
  - Invalid (0) entries point to disk storage (a swap file or a regular file-system file)
- [Figure: a page table with valid bits 1 and 0; valid entries point to physical memory pages, invalid entries point to disk]
23. Address Translation via Page Table
24. Page Table Operation
- Translation
  - A separate (set of) page table(s) per process
  - The VPN forms an index into the page table (it points to a page table entry)
25. Page Table Operation
- Computing the physical address
  - The page table entry (PTE) provides information about the page
    - If (valid bit == 1), the page is in memory
      - Use the physical page number (PPN) to construct the address
    - If (valid bit == 0), the page is on disk
      - Page fault
26. Page Table Operation
- Checking protection
  - The access-rights field indicates the allowable accesses
    - e.g., read-only, read-write, execute-only
  - Typically supports multiple protection modes (e.g., kernel vs. user)
  - A protection-violation fault occurs if the user doesn't have the necessary permission
27. Integrating VM and Cache
- Most caches are physically addressed
  - Accessed by physical addresses
  - Allows multiple processes to have blocks in the cache at the same time
  - Allows multiple processes to share pages
  - The cache doesn't need to be concerned with protection issues
    - Access rights are checked as part of address translation
- Perform address translation before the cache lookup
  - But this could involve a memory access itself (of the PTE)
  - Of course, page table entries can also become cached
28. Speeding up Translation with a TLB
- Translation lookaside buffer (TLB)
  - A small hardware cache in the MMU
  - Maps virtual page numbers to physical page numbers
  - Contains complete page table entries for a small number of pages
29. TLB (Translation Lookaside Buffer)
- Hardware memory management
  - A cache of page table entries (PTEs)
  - On a TLB hit, the virtual-to-physical translation is done without accessing the page table
  - On a TLB miss, the page table must be searched for the mapping, which is inserted into the TLB before processing continues
    - TLB walker: hardware that performs the page table search
- TLB configuration
  - ~100 entries; fully associative or set-associative cache
  - Sometimes multi-level TLBs; TLB shootdown is an issue in multiprocessors
  - Usually separate I-TLB and D-TLB, accessed every cycle
- Miss handling: sometimes by HW, sometimes by SW
  - By HW: a hardware page walker
  - Software (OS)-managed TLBs: TLB insert/replace instructions
    - Flexible but slow: a TLB miss handler takes ~100 instructions
30Address Translation with a TLB
n1
0
p1
p
virtual address
virtual page number
page offset
valid
physical page number
tag
TLB
.
.
.
TLB hit
physical address
tag
byte offset
index
valid
tag
data
Cache
data
cache hit
31. TLB and Cache Implementation of the DECStation 3100
32. Address Translation Symbols
- Virtual address components
  - VPO: virtual page offset
  - VPN: virtual page number
  - TLBI: TLB index
  - TLBT: TLB tag
- Physical address components
  - PPO: physical page offset
  - PPN: physical page number
  - CO: byte offset within a cache block
  - CI: cache index
  - CT: cache tag
33. Simple Memory System Example
- Addressing
  - 14-bit virtual addresses (virtual page number | virtual page offset)
  - 12-bit physical addresses (physical page number | physical page offset)
  - Page size: 64 bytes, so the page offset is 6 bits
34. Simple Memory System: Page Table
- Only the first 16 entries are shown
35. Simple Memory System: TLB
- TLB
  - 16 entries
  - 4-way set associative
36. Simple Memory System: Cache
- Cache
  - 16 lines
  - 4-byte line size
  - Direct mapped
37. Address Translation Example 1
- Virtual address 0x03D4
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
38. Address Translation Example 2
- Virtual address 0x0B8F
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
39. Address Translation Example 3
- Virtual address 0x0040
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
40. Multi-Level Page Tables
- Given
  - 4 KB (2^12) page size
  - 32-bit address space
  - 4-byte PTE
- Problem
  - A single-level table would need 4 MB per process!
    - 2^20 entries x 4 bytes
- Common solution
  - Multi-level page tables
  - e.g., a 2-level table (P6)
    - Level 1 table: 1024 entries, each of which points to a Level 2 page table
      - This is called the page directory
    - Level 2 table: 1024 entries, each of which points to a page
- [Figure: a Level 1 table whose entries point to Level 2 tables]
41. Program Start Scenario
- Before starting the process
  - Load the page directory into physical memory
  - Load the PDBR (page directory base register) with the beginning of the page directory
  - Load the PC with the start address of the code
- When the first reference to code triggers:
  - iTLB miss (translation failed for the instruction address)
  - The exception handler looks up PTE1
    - dTLB miss (translation failed for PTE1)
    - The exception handler looks up PTE2
      - Look up the page directory and find PTE2
      - Add PTE2 to the dTLB
    - dTLB hit, but page miss (the page containing PTE1 is not in memory)
      - Load the page containing PTE1
    - Look up the page table and find PTE1
    - Add PTE1 to the iTLB
  - iTLB hit, but page miss (the code page is not present in memory)
    - Load the instruction page
  - Cache miss, but memory returns the instruction
42. P6 Memory System
- 32-bit address space
- 4 KB page size
- L1, L2, and TLBs
  - 4-way set associative
- Instruction TLB
  - 32 entries
  - 8 sets
- Data TLB
  - 64 entries
  - 16 sets
- L1 i-cache and d-cache
  - 16 KB
  - 32 B line size
  - 128 sets
- L2 cache
  - Unified
  - 128 KB -- 2 MB
- [Figure: the processor package contains the instruction fetch unit, inst and data TLBs, L1 i-cache and d-cache, and the bus interface unit; the L2 cache sits on a dedicated cache bus, with DRAM on the external system bus (e.g., PCI)]
43Overview of P6 Address Translation
CPU
32
L2 and DRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
44. P6 2-Level Page Table Structure
- Page directory
  - 1024 4-byte page directory entries (PDEs) that point to page tables
  - One page directory per process
  - The page directory must be in memory while its process is running
  - Always pointed to by the PDBR
- Page tables
  - 1024 4-byte page table entries (PTEs) that point to pages
  - Page tables can be paged in and out
- [Figure: one page directory of 1024 PDEs pointing to up to 1024 page tables of 1024 PTEs each]
45P6 Page Directory Entry (PDE)
31
12
11
9
8
7
6
5
4
3
2
1
0
Page table physical base addr
Avail
G
PS
A
CD
WT
U/S
R/W
P1
Page table physical base address 20 most
significant bits of physical page table address
(forces page tables to be 4KB aligned) Avail
These bits available for system programmers G
global page (dont evict from TLB on task
switch) PS page size 4K (0) or 4M (1) A
accessed (set by MMU on reads and writes, cleared
by software) CD cache disabled (1) or enabled
(0) WT write-through or write-back cache policy
for this page table U/S user or supervisor mode
access R/W read-only or read-write access P
page table is present in memory (1) or not (0)
31
0
1
Available for OS (page table location in
secondary storage)
P0
46P6 Page Table Entry (PTE)
31
12
11
9
8
7
6
5
4
3
2
1
0
Page physical base address
Avail
G
0
D
A
CD
WT
U/S
R/W
P1
Page base address 20 most significant bits of
physical page address (forces pages to be 4 KB
aligned) Avail available for system
programmers G global page (dont evict from TLB
on task switch) D dirty (set by MMU on
writes) A accessed (set by MMU on reads and
writes) CD cache disabled or enabled WT
write-through or write-back cache policy for this
page U/S user/supervisor R/W read/write P page
is present in physical memory (1) or not (0)
31
0
1
Available for OS (page location in secondary
storage)
P0
47. How P6 Page Tables Map Virtual Addresses to Physical Ones
- [Figure: the 32-bit virtual address splits into VPN1 (10 bits), VPN2 (10 bits), and VPO (12 bits); VPN1 is a word offset into the page directory (whose physical base address is in the PDBR), yielding a PDE that holds the physical address of the page table base (if P = 1); VPN2 is a word offset into that page table, yielding a PTE that holds the physical address of the page base (if P = 1); the 20-bit PPN from the PTE and the 12-bit PPO (equal to the VPO) form the physical address]
48. Representation of Virtual Address Space
- Simplified example
  - A 16-page virtual address space
- Flags
  - P: is the entry in physical memory?
  - M: has this part of the VA space been mapped?
49P6 TLB Translation
CPU
32
L2 andDRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
50. P6 TLB
- TLB entry (not all fields are documented, so this is speculative)
  - V: indicates a valid (1) or invalid (0) TLB entry
  - PD: is this entry a PDE (1) or a PTE (0)?
  - Tag: disambiguates entries cached in the same set
  - PDE/PTE: the cached page directory or page table entry
- Structure of the data TLB
  - 16 sets, 4 entries/set
51. Translating with the P6 Page Tables (case 1/1)
- Case 1/1: page table and page both present
- MMU action
  - The MMU builds the physical address and fetches the data word
- OS action
  - None
- [Figure: PDBR -> page directory (PDE, p=1) -> page table (PTE, p=1) -> data page in memory]
52. Translating with the P6 Page Tables (case 1/0)
- Case 1/0: page table present but page missing
- MMU action
  - Page fault exception
  - The handler receives the following args:
    - The VA that caused the fault
    - Whether the fault was caused by a non-present page or a page-level protection violation
    - Read/write
    - User/supervisor
- [Figure: PDBR -> page directory (PDE, p=1) -> page table (PTE, p=0); the data page is on disk]
53. Translating with the P6 Page Tables (case 1/0)
- OS action
  - Check for a legal virtual address
  - Read the PTE through the PDE
  - Find a free physical page (swapping out the current page if necessary)
  - Read the virtual page from disk and copy it to the physical page
  - Restart the faulting instruction by returning from the exception handler
- [Figure: after handling, PDBR -> PDE (p=1) -> PTE (p=1) -> data page now in memory]
54. Translating with the P6 Page Tables (case 0/1)
- Case 0/1: page table missing but page present
- Introduces a consistency issue
  - Potentially every page-out requires an update of the on-disk page table
- Linux disallows this
  - If a page table is swapped out, its data pages are swapped out too
- [Figure: PDBR -> page directory (PDE, p=0); the page table is on disk while the data page is in memory]
55. Translating with the P6 Page Tables (case 0/0)
- Case 0/0: page table and page both missing
- MMU action
  - Page fault exception
- [Figure: PDBR -> page directory (PDE, p=0); both the page table and the data page are on disk]
56. Translating with the P6 Page Tables (case 0/0)
- OS action
  - Swap in the page table
  - Restart the faulting instruction by returning from the handler
- It is like case 1/0 from here on
- [Figure: after swapping in the page table, PDBR -> PDE (p=1) -> PTE (p=0); the data page is still on disk]
57P6 L1 Cache Access
CPU
32
L2 andDRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
58. Speeding Up L1 Access
- Observation
  - The bits that determine the CI are identical in the virtual and physical address
  - We can index into the cache while address translation is taking place
  - Then check with the CT from the physical address
  - Virtually indexed, physically tagged
  - The cache is carefully sized to make this possible
- [Figure: the 12-bit VPO passes through translation unchanged as the PPO, so the 7-bit CI and 5-bit CO are available before translation produces the 20-bit CT]
59. Linux Organizes VM as a Collection of Areas
- Area
  - A contiguous chunk of (allocated) virtual memory whose pages are related
  - Examples: code segment, data segment, heap, shared library segment, etc.
  - Any existing virtual page is contained in some area
  - Any virtual page that is not part of some area does not exist and cannot be referenced!
  - Thus, the virtual address space can have gaps
  - The kernel does not keep track of virtual pages that do not exist
- task_struct
  - The kernel maintains a distinct task structure for each process
  - It contains all the information that the kernel needs to run the process
    - PID, pointer to the user stack, name of the executable object file, program counter, etc.
- mm_struct
  - One of the entries in the task structure; it characterizes the current state of virtual memory
  - pgd: base of the page directory table
  - mmap: points to a list of vm_area_structs
60. Linux Organizes VM as a Collection of Areas
- vm_prot
  - Read/write permissions for this area
- vm_flags
  - Shared with other processes, or private to this process
- [Figure: task_struct -> mm_struct (mm, pgd, mmap) -> a vm_next-linked list of vm_area_structs, each with vm_start, vm_end, vm_prot, and vm_flags, describing the shared-library area (at 0x40000000), the data area (at 0x0804a020), and the text area (at 0x08048000) of the process virtual memory]
61. Linux Page Fault Handling
- Is the VA legal?
  - i.e., is it in an area defined by a vm_area_struct?
  - If not, then signal a segmentation violation (e.g., (1))
- Is the operation legal?
  - i.e., can the process read/write this area?
  - If not, then signal a protection-violation fault (e.g., (2))
- If OK, handle the page fault
  - e.g., (3)
- [Figure: process virtual memory with its vm_area_structs; (1) a read outside any area, (2) a write to the read-only text area, (3) a legal read in the data area that triggers a normal page fault]
62. Memory Mapping
- Linux (and UNIX generally) initializes the contents of a virtual memory area by associating it with an object on disk
  - Create a new vm_area_struct and page tables for the area
- Areas can be mapped to one of two types of objects (i.e., get their initial values from):
  - A regular file on disk (e.g., an executable object file)
    - The file is divided into page-sized pieces
    - The initial contents of each virtual page come from the corresponding piece
    - If the area is larger than the file section, the area is padded with zeros
  - An anonymous file (e.g., .bss)
    - An area can be mapped to an anonymous file, created by the kernel
    - The initial contents of these pages are initialized to zeros
    - Also called demand-zero pages
- Key point: no virtual pages are copied into physical memory until they are referenced!
  - Known as demand paging
  - Crucial for time and space efficiency
63. User-Level Memory Mapping
- void *mmap(void *start, int len, int prot, int flags, int fd, int offset)
  - Maps len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start (usually NULL for "don't care")
    - prot: PROT_EXEC, PROT_READ, PROT_WRITE
    - flags: MAP_PRIVATE, MAP_SHARED, MAP_ANON
      - MAP_PRIVATE indicates a private copy-on-write object
      - MAP_SHARED indicates a shared object
      - MAP_ANON with a null fd indicates an anonymous file (demand-zero pages)
  - Returns a pointer to the mapped area
- int munmap(void *start, int len)
  - Deletes the area starting at virtual address start with length len
64. Shared Objects
- Why shared objects?
  - Many processes need to share identical read-only text areas. For example:
    - Each tcsh process has the same text area
    - Standard library functions such as printf
  - It would be extremely wasteful for each process to keep duplicate copies in physical memory
- An object can be mapped as either a shared object or a private object
- Shared object
  - Any write to the area is visible to any other processes that have also mapped the shared object
  - The changes are also reflected in the original object on disk
  - A virtual memory area into which a shared object is mapped is called a shared area
- Private object
  - Any write to the area is not visible to other processes
  - The changes are not reflected back to the object on disk
  - Private objects are mapped into virtual memory using copy-on-write
    - Only one copy of the private object is stored in physical memory
    - The page table entries for the private area are flagged as read-only
    - Any write to a page in the private area triggers a protection fault
    - The handler creates a new copy of the page in physical memory and then restores write permission to the page
    - After the handler returns, the process proceeds normally
65. Shared Object
66. Private Object
67. exec() Revisited
- To run a new program p in the current process using exec():
  - Free the vm_area_structs and page tables for the old areas
  - Create new vm_area_structs and page tables for the new areas
    - Stack, bss, data, text, shared libs
    - Text and data are backed by the ELF executable object file
    - bss and stack are initialized to zero
  - Set the PC to the entry point in .text
    - Linux will swap in code and data pages as needed
- [Figure: the process VM after exec - kernel code/data/stack (kernel VM, same for each process) above 0xc0000000; then the demand-zero stack (%esp), the memory-mapped region for shared libraries (libc.so .data and .text), the runtime heap (via malloc) bounded by brk, demand-zero uninitialized data (.bss), initialized data (.data), and program text (.text) from p; the region at address 0 is forbidden; all backed by process-specific data structures (page tables, task and mm structs) in physical memory]
68. fork() Revisited
- To create a new process using fork():
  - Make copies of the old process's mm_struct, vm_area_structs, and page tables
    - At this point the two processes share all of their pages
    - How do we get separate spaces without copying all the virtual pages from one space to another? The copy-on-write technique
  - Copy-on-write
    - Make the pages of writeable areas read-only
    - Flag the vm_area_structs for these areas as private copy-on-write
    - Writes by either process to these pages will cause page faults
      - The fault handler recognizes copy-on-write, makes a copy of the page, and restores write permissions
  - Net result
    - Copies are deferred until absolutely necessary (i.e., when one of the processes tries to modify a shared page)
69. Dynamic Memory Allocation
- Heap
  - An area of demand-zero memory that begins immediately after the bss area
- Allocator
  - Maintains the heap as a collection of variously sized blocks
    - Each block is a contiguous chunk of virtual memory that is either allocated or free
  - An explicit allocator requires the application to both allocate and free space
    - e.g., malloc and free in C
  - An implicit allocator requires the application to allocate, but not to free, space
    - The allocator needs to detect when an allocated block is no longer being used
    - Implicit allocators are also known as garbage collectors
    - The process of automatically freeing unused blocks is known as garbage collection
    - e.g., garbage collection in Java, ML, or Lisp
70Heap
memory invisible to user code
kernel virtual memory
stack
esp
Memory mapped region for shared libraries
the brk ptr points to the top of the heap
run-time heap (via malloc)
uninitialized data (.bss)
initialized data (.data)
program text (.text)
0
71. Malloc Package
- #include <stdlib.h>
- void *malloc(size_t size)
  - If successful:
    - Returns a pointer to a memory block of at least size bytes
    - (Typically) aligned to an 8-byte boundary so that any kind of data object can be contained in the block
    - If size == 0, returns NULL
  - If unsuccessful (i.e., the request is larger than available virtual memory): returns NULL (0) and sets errno
  - Two other variations: calloc (initializes the allocated memory to zero) and realloc
  - Internally uses the mmap and munmap functions, or the sbrk function
- void *realloc(void *p, size_t size)
  - Changes the size of block p and returns a pointer to the new block
  - The contents of the new block are unchanged up to the minimum of the old and new sizes
- void free(void *p)
  - Returns the block pointed at by p to the pool of available memory
  - p must come from a previous call to malloc or realloc
72. Malloc Example

void foo(int n, int m) {
    int i, *p;
    /* allocate a block of n ints */
    if ((p = (int *) malloc(n * sizeof(int))) == NULL) {
        perror("malloc");
        exit(0);
    }
    for (i = 0; i < n; i++)
        p[i] = i;
    /* add space for m more ints at the end of the p block */
    if ((p = (int *) realloc(p, (n + m) * sizeof(int))) == NULL) {
        perror("realloc");
        exit(0);
    }
    for (i = n; i < n + m; i++)
        p[i] = i;
    /* print the new array */
    for (i = 0; i < n + m; i++)
        printf("%d\n", p[i]);
    free(p);   /* return p to the available memory pool */
}
73. Allocation Examples
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
74. Requirements (Explicit Allocators)
- Applications
  - Can issue an arbitrary sequence of allocation and free requests
  - Free requests must correspond to an allocated block
- Allocators
  - Can't control the number or the size of allocated blocks
  - Must respond immediately to all allocation requests
    - i.e., can't reorder or buffer requests
  - Must allocate blocks from free memory
    - i.e., can only place allocated blocks in free memory
  - Must align blocks so they satisfy all alignment requirements
    - 8-byte alignment for GNU malloc (libc malloc) on Linux boxes
  - Can only manipulate and modify free memory
  - Can't move the allocated blocks once they are allocated
    - i.e., compaction is not allowed
75. Goals of Allocators
- Maximize throughput
  - Throughput: the number of completed requests per unit time
  - Example:
    - 5,000 malloc calls and 5,000 free calls in 10 seconds
    - Throughput is 1,000 operations/second
- Maximize memory utilization
  - Need to minimize fragmentation
    - Fragmentation (holes): unused area
  - There is a tradeoff between throughput and memory utilization
    - Need to balance these two goals
- Good locality properties
  - Similar objects should be allocated close together in space
76. Internal Fragmentation
- Poor memory utilization is caused by fragmentation
  - It comes in two forms: internal and external fragmentation
- Internal fragmentation
  - For a block, internal fragmentation is the difference between the block size and the payload size
  - Caused by the overhead of maintaining heap data structures (e.g., headers) and by padding for alignment purposes
  - Any virtual memory allocation policy using fixed-size blocks, such as paging, can suffer from internal fragmentation
- [Figure: a block consists of internal fragmentation (header), the payload, and internal fragmentation (padding)]
77External Fragmentation
Occurs when there is enough aggregate heap
memory, but no single free block is large enough
p1 malloc(4)
p2 malloc(5)
p3 malloc(6)
free(p2)
p4 malloc(6)
oops!
External fragmentation depends on the pattern of
future requests, and thus is difficult to
measure.
78. Implementation Issues
- Free block organization
  - How do we know the size of a free block?
  - How do we keep track of the free blocks?
- Placement
  - How do we choose an appropriate free block in which to place a newly allocated block?
- Splitting
  - What do we do with the extra space after the placement?
- Coalescing
  - What do we do with small blocks that have been freed?
p1 = malloc(1)
79. How Do We Know the Size of a Block?
- Standard method
  - Keep the length of the block in the word preceding the block
    - This word is often called the header field, or header
  - Requires an extra word for every allocated block
- Format of a simple heap block
  - A 32-bit header holds the block size in the upper bits, with the low bit a: a = 1 allocated, a = 0 free
  - The header is followed by the payload (allocated block only) and optional padding
  - The block size includes the header, payload, and any padding
  - malloc returns a pointer to the beginning of the payload
80. Example
81. Keeping Track of Free Blocks
- Method 1: Implicit list using lengths -- links all blocks
- Method 2: Explicit list among the free blocks, using pointers within the free blocks
- Method 3: Segregated free lists
  - Different free lists for different size classes
- [Figure: a heap with blocks of sizes 5, 4, 2, and 6, linked either implicitly by their lengths or explicitly by free-list pointers]
82. Placement Policy
- First fit
  - Search the list from the beginning; choose the first free block that fits
  - Can take linear time in the total number of blocks (allocated and free)
  - (+) Tends to retain large free blocks at the end
  - (-) Leaves small free blocks at the beginning
- Next fit
  - Like first fit, but search the list starting from the end of the previous search
  - (+) Runs faster than first fit
  - (-) Worse memory utilization than first fit
- Best fit
  - Search the list; choose the free block with the closest size that fits
  - (+) Keeps fragments small: better memory utilization than the other two
  - (-) Typically runs slower: requires an exhaustive search of the heap
83. Splitting
- Allocating in a free block: splitting
  - Since the allocated space might be smaller than the free space, we might want to split the block
- [Figure: addblock(p, 2) allocates part of a free block and leaves the remainder as a smaller free block]
84. Coalescing
- Coalescing
  - When the allocator frees a block, there might be other free blocks that are adjacent
  - Such adjacent free blocks cause false fragmentation: there is enough free space, but it is chopped up into small, unusable free blocks
  - Need to coalesce with the next and/or previous block if they are free
- Coalescing with the next block is easy: add the next block's size to the current block's header
- But how do we coalesce with the previous block?
- [Figure: free(p) on a block of size 4 coalesces it with the free block of size 2 that follows it, producing a free block of size 6]
85. Bidirectional Coalescing
- Boundary tags [Knuth73]
  - Replicate the size/allocated word (called the footer) at the bottom of a block
  - Allows us to traverse the list backwards, but requires extra space
  - An important and general technique! Allows constant-time coalescing
- Format of allocated and free blocks
  - Header (1 word): size (the total block size) and a (a = 1: allocated block, a = 0: free block)
  - Payload: application data (allocated blocks only), plus padding
  - Boundary tag (footer, 1 word): a copy of the header
- [Figure: a heap of blocks of sizes 4, 4, 6, and 4, each with matching header and footer tags]
86Constant Time Coalescing
- Case 1: previous block allocated, next block allocated
- Case 2: previous block allocated, next block free
- Case 3: previous block free, next block allocated
- Case 4: previous block free, next block free
[Figure: the block being freed shown between its previous and next neighbors, one column per case]
87Constant Time Coalescing (Case 1)
[Figure: Case 1 -- before: m1|1, n|1, m2|1; after: m1|1, n|0, m2|1. Only the freed block's header and footer change from allocated to free.]
88Constant Time Coalescing (Case 2)
[Figure: Case 2 -- before: m1|1, n|1, m2|0; after: m1|1, (n+m2)|0. The freed block absorbs the free next block: the freed block's header and the next block's footer are rewritten with size n+m2, marked free.]
89Constant Time Coalescing (Case 3)
[Figure: Case 3 -- before: m1|0, n|1, m2|1; after: (n+m1)|0, m2|1. The free previous block absorbs the freed block: the previous block's header and the freed block's footer are rewritten with size n+m1, marked free.]
90Constant Time Coalescing (Case 4)
[Figure: Case 4 -- before: m1|0, n|1, m2|0; after: (n+m1+m2)|0. All three blocks merge into a single free block of size n+m1+m2.]
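The four cases collapse into one short routine once footers exist. Here is a minimal sketch in a toy word-array model (illustrative assumptions: a block of `size` words has header heap[i] and footer heap[i+size-1], both encoding (size << 1) | allocated; `lo`/`hi` bound the heap). Reading the previous block's footer at heap[i-1] is exactly what makes Cases 3 and 4 constant time:

```c
#include <assert.h>
#include <stddef.h>

/* Free the block at heap[i] and merge it with any free neighbors in O(1). */
static void coalesce(unsigned heap[], size_t lo, size_t hi, size_t i) {
    unsigned size = heap[i] >> 1;
    size_t start = i, end = i + size;          /* block occupies [start, end) */

    /* Cases 2 and 4: next block free -> extend forward over it */
    if (end < hi && !(heap[end] & 1u))
        end += heap[end] >> 1;

    /* Cases 3 and 4: previous block free -> its footer sits at heap[start-1],
       so we can find its start without any list traversal */
    if (start > lo && !(heap[start - 1] & 1u))
        start -= heap[start - 1] >> 1;

    unsigned newsize = (unsigned)(end - start);
    heap[start]   = newsize << 1;              /* new header, marked free */
    heap[end - 1] = newsize << 1;              /* new footer, marked free */
}
```

Only the outermost header and footer of the merged region are rewritten; the stale tags in the middle become ordinary payload words.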
91Implicit Lists Summary
- Implementation is very simple
- Allocate takes linear time in the worst case
- Free takes constant time in the worst case -- even with coalescing
- Memory usage will depend on the placement policy
- First fit, next fit, or best fit
- Not used in practice for malloc/free because of the linear-time allocate
- Used for special-purpose applications where the total number of blocks is known beforehand to be small
- However, the concepts of splitting and boundary-tag coalescing are general to all allocators
92Keeping Track of Free Blocks
- Method 1 Implicit list using lengths -- links all blocks
- Method 2 Explicit list among the free blocks using pointers within the free blocks
- Method 3 Segregated free lists
- Different free lists for different size classes
93Explicit Free Lists
- Use data space for pointers
- Typically doubly linked
- Still need boundary tags for coalescing
[Figure: free blocks A, B, C threaded through the heap by forward and back links stored in their payloads]
94 Format of Doubly-Linked Heap Blocks
[Figure: Allocated block -- header (block size | a/f), payload, optional padding, footer (block size | a/f). Free block -- header (block size | a/f), pred (predecessor) pointer, succ (successor) pointer, rest of the old payload, optional padding, footer (block size | a/f).]
95Freeing With Explicit Free Lists
- Insertion policy: where in the free list do you put a newly freed block?
- LIFO (last-in-first-out) policy
- Insert the freed block at the beginning of the free list
- (+) Simple, and freeing a block can be performed in constant time
- If boundary tags are used, coalescing can also be performed in constant time
- Address-ordered policy
- Insert freed blocks so that free-list blocks are always in address order
- i.e. addr(pred) < addr(curr) < addr(succ)
- (-) Freeing a block requires a linear-time search
- (+) Studies suggest address-ordered first fit enjoys better memory utilization than LIFO-ordered first fit
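The LIFO policy amounts to pushing onto the head of a doubly linked list. A minimal sketch (the struct and names are illustrative; a real allocator stores pred/succ inside the free block's payload rather than in a separate struct):

```c
#include <assert.h>
#include <stddef.h>

/* Model of a free block on an explicit, doubly linked free list. */
struct free_block {
    size_t size;
    struct free_block *pred, *succ;
};

/* LIFO insertion: put the newly freed block at the beginning of the list.
   Constant time -- no search, regardless of the block's address. */
static void lifo_insert(struct free_block **head, struct free_block *b) {
    b->pred = NULL;
    b->succ = *head;
    if (*head)
        (*head)->pred = b;
    *head = b;
}
```

An address-ordered insert would instead walk the list to find the right position, which is where its linear-time free cost comes from.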
96Explicit List Summary
- Comparison to implicit list
- Allocation takes time linear in the number of free blocks instead of all blocks
- Much faster allocates when most of the memory is full
- Slightly more complicated allocate and free, since blocks need to be spliced in and out of the list
- Extra space for the links (2 extra words needed per free block)
- This results in a larger minimum block size and potentially increases the degree of internal fragmentation
- Main use of linked lists is in conjunction with segregated free lists
- Keep multiple linked lists of different size classes, or possibly for different types of objects
97Keeping Track of Free Blocks
- Method 1 Implicit list using lengths -- links all blocks
- Method 2 Explicit list among the free blocks using pointers within the free blocks
- Method 3 Segregated free lists
- Different free lists for different size classes
- Can be used to reduce the allocation time compared to a linked-list organization
98Segregated Storage
- Partition the set of all free blocks into equivalence classes called size classes
- The allocator maintains an array of free lists, with one free list per size class, ordered by increasing size
- Often have a separate size class for every small size (2, 3, 4, ...)
- Larger sizes typically have one size class for each power of 2
- Variations of segregated storage
- They differ in how they define size classes, when they perform coalescing, when they request additional heap memory from the OS, whether they allow splitting, and so on
- Examples: simple segregated storage, segregated fits
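One possible size-class map matching the scheme above (the cutoff of 8 and the function name are assumptions for illustration): exact classes for every small size, then one class per power-of-2 range.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_MAX 8   /* sizes 1..8 each get their own class (indices 0..7) */

/* Map a request size n to an index into the array of free lists. */
static size_t size_class(size_t n) {
    if (n <= SMALL_MAX)
        return n - 1;                 /* exact class for small sizes    */
    size_t idx = SMALL_MAX - 1;
    size_t bound = SMALL_MAX;
    while (bound < n) {               /* classes (8,16], (16,32], ...   */
        bound <<= 1;
        idx++;
    }
    return idx;
}
```

Because the map is a handful of shifts and compares, finding the right free list costs essentially nothing compared to searching it.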
99Simple Segregated Storage
- Separate heap and free list for each size class
- Free list for each size class contains same-sized blocks of the largest element size
- For example, the free list for size class 17-32 consists entirely of blocks of size 32
- To allocate a block of size n
- If the free list for size n is not empty, allocate the first block in its entirety
- If the free list is empty, get a new page from the OS, create a new free list from all the blocks in the page, and then allocate the first block on the list
- To free a block
- Simply insert the free block at the front of the appropriate free list
- (+) Both allocating and freeing blocks are fast constant-time operations
- (+) Little per-block memory overhead; no splitting and no coalescing
- (-) Susceptible to internal and external fragmentation
- Internal fragmentation, since free blocks are never split
- External fragmentation, since free blocks are never coalesced
100Segregated Fits
- Array of free lists, each one for some size class
- Free list for each size class contains potentially different-sized blocks
- To allocate a block of size n
- Do a first-fit search of the appropriate free list
- If an appropriate block is found
- Split (optionally) the block and place the fragment on the appropriate list
- If no block is found, try the next larger class and repeat until a block is found
- If none of the free lists yields a block that fits, request additional heap memory from the OS, allocate the block out of this new heap memory, and place the remainder in the largest size class
- To free a block
- Coalesce and place on the appropriate list
- (+) Fast
- Searches are limited to part of the heap rather than the entire heap area
- However, coalescing can increase search times
- (+) Good memory utilization
- A simple first-fit search approximates a best-fit search of the entire heap
- Popular choice for production-quality allocators such as GNU malloc
101Garbage Collection
- Garbage collector: a dynamic storage allocator that automatically frees allocated blocks that are no longer used
- Implicit memory management: an application never has to free

  void foo() {
      int *p = malloc(128);
      return; /* p block is now garbage */
  }

- Common in functional languages, scripting languages, and modern object-oriented languages
- Lisp, ML, Java, Perl, Mathematica, ...
- Variants (conservative garbage collectors) exist for C and C++
- Cannot collect all garbage
102Garbage Collection
- How does the memory manager know when memory can be freed?
- In general we cannot know what is going to be used in the future, since it depends on conditionals
- But we can tell that certain blocks cannot be used if there are no pointers to them
- Need to make certain assumptions about pointers
- The memory manager needs to distinguish pointers from non-pointers
- Garbage collection
- Garbage collectors view memory as a reachability graph and periodically reclaim the unreachable nodes
- Classical GC algorithms
- Mark-and-sweep collection (McCarthy, 1960)
- Does not move blocks (unless you also compact)
- Reference counting (Collins, 1960)
- Does not move blocks (not discussed)
- Copying collection (Minsky, 1963)
- Moves blocks (not discussed)
103Memory as a Graph
- Reachability graph: we view memory as a directed graph
- Each block is a node in the graph
- Each pointer is an edge in the graph
- Locations not in the heap that contain pointers into the heap are called root nodes
- e.g. registers, locations on the stack, global variables
[Figure: root nodes pointing into a set of heap nodes; some heap nodes are reachable from the roots, the rest are not reachable (garbage)]
- A node (block) is reachable if there is a path from some root to that node
- Non-reachable nodes are garbage (never needed by the application)
104Mark and Sweep Garbage Collectors
- A mark-and-sweep garbage collector consists of a mark phase followed by a sweep phase
- Uses an extra mark bit in the header of each block
- When out of space:
- Mark: start at the roots and set the mark bit on all reachable memory blocks
- Sweep: scan all blocks and free the blocks that are not marked
[Figure: heap before the mark phase, after the mark phase (mark bits set on all blocks reachable from the root), and after the sweep phase (unmarked blocks returned to the free list)]
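The two phases can be sketched over a toy node table (an illustrative model: each node has at most two outgoing pointers and an explicit mark field; a real collector discovers pointers inside block payloads and keeps the mark bit in the block header):

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *out[2]; int mark; };

/* Mark phase: depth-first traversal from a root, setting mark bits.
   The mark bit also serves as the "visited" flag that stops cycles. */
static void mark(struct node *n) {
    if (n == NULL || n->mark)
        return;                       /* no edge, or already visited */
    n->mark = 1;
    mark(n->out[0]);
    mark(n->out[1]);
}

/* Sweep phase: scan every block; unmarked blocks are garbage. Here we just
   count them and clear the marks for the next collection; a real sweep
   would return each unmarked block to the free list. */
static int sweep(struct node nodes[], int nnodes) {
    int freed = 0;
    for (int i = 0; i < nnodes; i++) {
        if (!nodes[i].mark)
            freed++;
        nodes[i].mark = 0;
    }
    return freed;
}
```

Mark touches only the reachable subgraph, while sweep touches every block, which is why sweep cost scales with heap size rather than live data.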
105Mark and Sweep (cont.)
Mark using depth-first traversal of the memory graph