Title: Computer System Chapter 10: Virtual Memory
1. Computer System Chapter 10: Virtual Memory
Lynn Choi, Korea University
2. Motivations for Virtual Memory
- Simplify memory management
  - Provide each process with a uniform address space
  - Provide the illusion of an infinite amount of memory
  - Each process has its own address space
- Use physical DRAM as a cache for the disk
  - The address space of a process can exceed the physical memory size
    - Only active code and data are actually in memory
    - Allocate more memory to a process as needed
  - The sum of the address spaces of multiple processes can exceed the physical memory size
    - Multiple processes are partially resident in main memory
- Provide protection
  - One process can't interfere with another, because they operate in different address spaces
  - Different sections of an address space have different permissions
  - A user process cannot access privileged information
3. Virtual Memory
- Benefits
  - Easier programming
  - Software portability
  - Protection
  - Increased CPU utilization: more programs can run at the same time
- Virtual address space
  - The programmer's view of (practically) infinite memory
- Physical address space
  - The machine's physical memory
- Requires the following functions
  - Memory allocation (placement)
    - Fully associative: a virtual page can be placed in any physical page
    - The tag size is small compared to a block (page) size
  - Memory deallocation (replacement)
    - e.g., LRU replacement policy
  - Memory mapping (translation)
    - Virtual-address-to-physical-address translation
4. Paging
- Divide the address space into fixed-size page frames
  - A VA consists of (VPN, offset)
  - A PA consists of (PPN, offset)
- Map a virtual page to a physical page at runtime
- A page table entry (PTE) contains
  - The VPN-to-PPN mapping
  - Presence bit
  - Reference bit
  - Dirty bit
  - Access control: read/write/execute
  - Privilege level
  - Disk address
- Demand paging: bring in a page on a page miss
- Internal fragmentation: the last page of a region may be only partially used
5. Motivation 1: DRAM as a Cache for Disk
- The full address space is quite large
  - 32-bit addresses: ~4,000,000,000 (4 billion) bytes
  - 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes
- Disk storage is ~300X cheaper than DRAM storage
  - 80 GB of DRAM: ~$33,000
  - 80 GB of disk: ~$110
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk
6. Levels in the Memory Hierarchy

           size        speed   $/Mbyte     line size
Register   32 B        1 ns                8 B
Cache      32 KB-4 MB  2 ns    $125/MB     32 B
Memory     1024 MB     30 ns   $0.20/MB    4 KB
Disk       100 GB      8 ms    $0.001/MB

- Larger, slower, cheaper as we move down the hierarchy
- [Figure: transfer units between levels - 8 B blocks between registers and cache, 32 B lines between cache and memory, 4 KB pages between memory (virtual memory) and disk]
7. DRAM vs. SRAM as a Cache
- The DRAM-vs.-disk gap is more extreme than the SRAM-vs.-DRAM gap
- Access latencies
  - DRAM is ~10X slower than SRAM
  - Disk is ~100,000X slower than DRAM
- Importance of exploiting spatial locality
  - The first byte is ~100,000X slower than successive bytes on disk
  - vs. a ~4X improvement for page-mode vs. regular accesses to DRAM
- Bottom line
  - Design decisions for DRAM caches are driven by the enormous cost of misses
- [Figure: SRAM caching DRAM, DRAM caching Disk]
8. Impact of Properties on Design
- If DRAM were organized like an SRAM cache, how would we set the following design parameters?
  - Line size?
    - Large, since disk is better at transferring large blocks
  - Associativity?
    - High, to minimize miss rates
  - Write-through or write-back?
    - Write-back, since we can't afford to perform small writes to disk
- What would the impact of these choices be on
  - Miss rate
    - Extremely low (<< 1%)
  - Hit time
    - Must match cache/DRAM performance
  - Miss latency
    - Very high (~20 ms)
  - Tag storage overhead
    - Low, relative to the block size
9. Locating an Object in a Cache
- SRAM cache
  - The tag is stored with the cache line
  - Maps from cache blocks to memory blocks
    - From cached form to uncached form
  - Saves a few bits by only storing the tag
    - No tag for a block not in the cache
  - Hardware retrieves the information
    - Can quickly match against multiple tags
10. Locating an Object in a Cache (cont.)
- DRAM cache
  - Each allocated page of virtual memory has an entry in the page table
  - Maps from virtual pages to physical pages
    - From uncached form to cached form
  - A page table entry exists even if the page is not in memory
    - It then specifies the disk address
    - The only way to indicate where to find the page
  - The OS retrieves the information
- [Figure: page table entries either point to a location in the cache (physical memory) or are marked "On Disk"]
11. A System with Physical Memory Only
- Examples
  - Most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory
12. A System with Virtual Memory
- Examples
  - Workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table)
- [Figure: virtual addresses pass through the page table to physical addresses 0..P-1 in memory, or to disk]
13. Page Faults (like Cache Misses)
- What if an object is on disk rather than in memory?
  - The page table entry indicates that the virtual address is not in memory
  - An OS exception handler is invoked to move the data from disk into memory
    - The current process suspends; others can resume
    - The OS has full control over placement, etc.
- [Figure: before the fault, the CPU's virtual address maps through the page table to disk; after the fault, the same virtual address maps to physical memory]
14. Servicing a Page Fault
- (1) Initiate block read
  - The processor signals the I/O controller
    - Read a block of length P starting at disk address X and store it starting at memory address Y
- (2) Read occurs
  - Direct Memory Access (DMA)
  - Under control of the I/O controller
- (3) I/O controller signals completion
  - Interrupts the processor
  - The OS resumes the suspended process
- [Figure: processor (registers, cache), memory-I/O bus, I/O controller, memory, and disk carrying out steps (1)-(3)]
15. Motivation 2: Memory Management
- Multiple processes can reside in physical memory
- How do we resolve address conflicts?
  - What if two processes access something at the same address?
- [Figure: Linux/x86 process memory image - kernel virtual memory (invisible to user code), stack (%esp), memory-mapped region for shared libraries, runtime heap (via malloc) bounded by the brk ptr, uninitialized data (.bss), initialized data (.data), program text (.text), and a forbidden region at address 0]
16. Solution: Separate Virtual Address Spaces
- Virtual and physical address spaces are divided into equal-sized blocks
  - The blocks are called pages (both virtual and physical)
- Each process has its own virtual address space
- The operating system controls how virtual pages are assigned to physical memory
- [Figure: address translation maps process 1's virtual pages (VP 1, VP 2, ..., N-1) and process 2's virtual pages (VP 1, VP 2, ..., M-1) to physical pages such as PP 2, PP 7, and PP 10; a physical page (e.g., read-only library code) can be shared by both processes]
17. Motivation 3: Protection
- The page table entry contains access-rights information
- Hardware enforces this protection (it traps into the OS if a violation occurs)
- [Figure: per-process page tables for processes i and j, each entry carrying permission bits, mapping into shared physical memory]
18. VM Address Translation
- Virtual address space
  - V = {0, 1, ..., N-1}
- Physical address space
  - P = {0, 1, ..., M-1}
  - M < N
- Address translation
  - MAP: V -> P U {0}
  - For virtual address a:
    - MAP(a) = a' if the data at virtual address a is at physical address a' in P
    - MAP(a) = 0 if the data at virtual address a is not in physical memory
      - Either invalid, or stored on disk
19. VM Address Translation: Hit
- [Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip memory management unit, MMU) produces physical address a', which accesses main memory]
20. VM Address Translation: Miss
- [Figure: on a page fault, translation of virtual address a fails and a fault handler is invoked; the OS transfers the page from secondary memory to main memory (only on a miss), after which translation to physical address a' succeeds]
21. VM Address Translation
- Parameters
  - P = 2^p : page size (bytes)
  - N = 2^n : virtual address limit
  - M = 2^m : physical address limit
- A virtual address splits into a virtual page number (bits n-1..p) and a page offset (bits p-1..0)
- Address translation replaces the virtual page number with a physical page number (bits m-1..p) to form the physical address
- The page offset bits don't change as a result of translation
22. Page Tables
- A memory-resident page table maps each virtual page number to a physical page or a disk address
- Each entry carries a valid bit
  - Valid (1) entries point into physical memory
  - Invalid (0) entries point to disk storage (a swap file or a regular file-system file)
- [Figure: a page table with valid bits 1 and 0; valid entries point to physical memory pages, invalid entries point to disk]
23. Address Translation via Page Table
24. Page Table Operation
- Translation
  - A separate (set of) page table(s) per process
  - The VPN forms an index into the page table (it points to a page table entry)
25. Page Table Operation
- Computing the physical address
  - The page table entry (PTE) provides information about the page
    - If (valid bit == 1), the page is in memory
      - Use the physical page number (PPN) to construct the address
    - If (valid bit == 0), the page is on disk
      - Page fault
26. Page Table Operation
- Checking protection
  - The access-rights field indicates the allowable accesses
    - e.g., read-only, read-write, execute-only
  - Typically supports multiple protection modes (e.g., kernel vs. user)
  - A protection-violation fault occurs if the user doesn't have the necessary permission
27. Integrating VM and Cache
- Most caches are physically addressed
  - Accessed by physical addresses
  - Allows multiple processes to have blocks in the cache at the same time
  - Allows multiple processes to share pages
  - The cache doesn't need to be concerned with protection issues
    - Access rights are checked as part of address translation
- Perform address translation before the cache lookup
  - But this could involve a memory access itself (of the PTE)
  - Of course, page table entries can also become cached
28. Speeding up Translation with a TLB
- Translation lookaside buffer (TLB)
  - A small hardware cache in the MMU
  - Maps virtual page numbers to physical page numbers
  - Contains complete page table entries for a small number of pages
29. TLB (Translation Lookaside Buffer)
- Hardware memory management
  - A cache of page table entries (PTEs)
  - On a TLB hit, the virtual-to-physical translation is done without accessing the page table
  - On a TLB miss, the page table must be searched for the mapping, which is inserted into the TLB before processing continues
    - TLB walker: hardware that performs the page table search
- TLB configuration
  - ~100 entries; fully associative or set-associative cache
  - Sometimes multi-level TLBs; TLB shootdown is an issue in multiprocessors
  - Usually separate I-TLB and D-TLB, accessed every cycle
- Miss handling: sometimes by HW, sometimes by SW
  - By HW: a hardware page walker
  - Software (OS)-managed TLBs: TLB insert/replace instructions
    - Flexible but slow: a TLB miss handler takes ~100 instructions
30Address Translation with a TLB
n1
0
p1
p
virtual address
virtual page number
page offset
valid
physical page number
tag
TLB
.
.
.
TLB hit
physical address
tag
byte offset
index
valid
tag
data
Cache
data
cache hit
31. TLB and Cache Implementation of the DECStation 3100
32. Address Translation Symbols
- Virtual address components
  - VPO: virtual page offset
  - VPN: virtual page number
  - TLBI: TLB index
  - TLBT: TLB tag
- Physical address components
  - PPO: physical page offset
  - PPN: physical page number
  - CO: byte offset within a cache block
  - CI: cache index
  - CT: cache tag
33. Simple Memory System Example
- Addressing
  - 14-bit virtual addresses (virtual page number | virtual page offset)
  - 12-bit physical addresses (physical page number | physical page offset)
  - Page size: 64 bytes, so the page offset is 6 bits
34. Simple Memory System: Page Table
- Only the first 16 entries are shown
35. Simple Memory System: TLB
- TLB
  - 16 entries
  - 4-way set associative
36. Simple Memory System: Cache
- Cache
  - 16 lines
  - 4-byte line size
  - Direct mapped
37. Address Translation Example 1
- Virtual address 0x03D4
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
38. Address Translation Example 2
- Virtual address 0x0B8F
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
39. Address Translation Example 3
- Virtual address 0x0040
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
40. Multi-Level Page Tables
- Given
  - 4 KB (2^12) page size
  - 32-bit address space
  - 4-byte PTE
- Problem
  - A single-level table would need 4 MB per process!
    - 2^20 entries x 4 bytes
- Common solution
  - Multi-level page tables
  - e.g., a 2-level table (P6)
    - Level 1 table: 1024 entries, each of which points to a Level 2 page table
      - This is called the page directory
    - Level 2 table: 1024 entries, each of which points to a page
- [Figure: a Level 1 table whose entries point to Level 2 tables]
41. Program Start Scenario
- Before starting the process
  - Load the page directory into physical memory
  - Load the PDBR (page directory base register) with the beginning of the page directory
  - Load the PC with the start address of the code
- When the first reference to code triggers:
  - iTLB miss (translation failed for the instruction address)
  - The exception handler looks up PTE1
    - dTLB miss (translation failed for PTE1)
    - The exception handler looks up PTE2
      - Look up the page directory and find PTE2
      - Add PTE2 to the dTLB
    - dTLB hit, but page miss (the page containing PTE1 is not in memory)
      - Load the page containing PTE1
    - Look up the page table and find PTE1
    - Add PTE1 to the iTLB
  - iTLB hit, but page miss (the code page is not present in memory)
    - Load the instruction page
  - Cache miss, but memory returns the instruction
42. P6 Memory System
- 32-bit address space
- 4 KB page size
- L1, L2, and TLBs
  - 4-way set associative
- Instruction TLB
  - 32 entries
  - 8 sets
- Data TLB
  - 64 entries
  - 16 sets
- L1 i-cache and d-cache
  - 16 KB
  - 32 B line size
  - 128 sets
- L2 cache
  - Unified
  - 128 KB -- 2 MB
- [Figure: the processor package contains the instruction fetch unit, inst and data TLBs, L1 i-cache and d-cache, and the bus interface unit; the L2 cache sits on a dedicated cache bus, with DRAM on the external system bus (e.g., PCI)]
43Overview of P6 Address Translation
CPU
32
L2 and DRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
44. P6 2-Level Page Table Structure
- Page directory
  - 1024 4-byte page directory entries (PDEs) that point to page tables
  - One page directory per process
  - The page directory must be in memory while its process is running
  - Always pointed to by the PDBR
- Page tables
  - 1024 4-byte page table entries (PTEs) that point to pages
  - Page tables can be paged in and out
- [Figure: one page directory of 1024 PDEs pointing to up to 1024 page tables of 1024 PTEs each]
45P6 Page Directory Entry (PDE)
31
12
11
9
8
7
6
5
4
3
2
1
0
Page table physical base addr
Avail
G
PS
A
CD
WT
U/S
R/W
P1
Page table physical base address 20 most
significant bits of physical page table address
(forces page tables to be 4KB aligned) Avail
These bits available for system programmers G
global page (dont evict from TLB on task
switch) PS page size 4K (0) or 4M (1) A
accessed (set by MMU on reads and writes, cleared
by software) CD cache disabled (1) or enabled
(0) WT write-through or write-back cache policy
for this page table U/S user or supervisor mode
access R/W read-only or read-write access P
page table is present in memory (1) or not (0)
31
0
1
Available for OS (page table location in
secondary storage)
P0
46P6 Page Table Entry (PTE)
31
12
11
9
8
7
6
5
4
3
2
1
0
Page physical base address
Avail
G
0
D
A
CD
WT
U/S
R/W
P1
Page base address 20 most significant bits of
physical page address (forces pages to be 4 KB
aligned) Avail available for system
programmers G global page (dont evict from TLB
on task switch) D dirty (set by MMU on
writes) A accessed (set by MMU on reads and
writes) CD cache disabled or enabled WT
write-through or write-back cache policy for this
page U/S user/supervisor R/W read/write P page
is present in physical memory (1) or not (0)
31
0
1
Available for OS (page location in secondary
storage)
P0
47. How P6 Page Tables Map Virtual Addresses to Physical Ones
- [Figure: the 32-bit virtual address splits into VPN1 (10 bits), VPN2 (10 bits), and VPO (12 bits); VPN1 is a word offset into the page directory (whose physical base address is in the PDBR), yielding a PDE that holds the physical address of the page table base (if P = 1); VPN2 is a word offset into that page table, yielding a PTE that holds the physical address of the page base (if P = 1); the 20-bit PPN from the PTE and the 12-bit PPO (equal to the VPO) form the physical address]
48. Representation of Virtual Address Space
- Simplified example
  - A 16-page virtual address space
- Flags
  - P: is the entry in physical memory?
  - M: has this part of the VA space been mapped?
49P6 TLB Translation
CPU
32
L2 andDRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
50. P6 TLB
- TLB entry (not all fields are documented, so this is speculative)
  - V: indicates a valid (1) or invalid (0) TLB entry
  - PD: is this entry a PDE (1) or a PTE (0)?
  - Tag: disambiguates entries cached in the same set
  - PDE/PTE: the cached page directory or page table entry
- Structure of the data TLB
  - 16 sets, 4 entries/set
51. Translating with the P6 Page Tables (case 1/1)
- Case 1/1: page table and page both present
- MMU action
  - The MMU builds the physical address and fetches the data word
- OS action
  - None
- [Figure: PDBR -> page directory (PDE, p=1) -> page table (PTE, p=1) -> data page in memory]
52. Translating with the P6 Page Tables (case 1/0)
- Case 1/0: page table present but page missing
- MMU action
  - Page fault exception
  - The handler receives the following args:
    - The VA that caused the fault
    - Whether the fault was caused by a non-present page or a page-level protection violation
    - Read/write
    - User/supervisor
- [Figure: PDBR -> page directory (PDE, p=1) -> page table (PTE, p=0); the data page is on disk]
53. Translating with the P6 Page Tables (case 1/0)
- OS action
  - Check for a legal virtual address
  - Read the PTE through the PDE
  - Find a free physical page (swapping out the current page if necessary)
  - Read the virtual page from disk and copy it to the physical page
  - Restart the faulting instruction by returning from the exception handler
- [Figure: after handling, PDBR -> PDE (p=1) -> PTE (p=1) -> data page now in memory]
54. Translating with the P6 Page Tables (case 0/1)
- Case 0/1: page table missing but page present
- Introduces a consistency issue
  - Potentially every page-out requires an update of the on-disk page table
- Linux disallows this
  - If a page table is swapped out, its data pages are swapped out too
- [Figure: PDBR -> page directory (PDE, p=0); the page table is on disk while the data page is in memory]
55. Translating with the P6 Page Tables (case 0/0)
- Case 0/0: page table and page both missing
- MMU action
  - Page fault exception
- [Figure: PDBR -> page directory (PDE, p=0); both the page table and the data page are on disk]
56. Translating with the P6 Page Tables (case 0/0)
- OS action
  - Swap in the page table
  - Restart the faulting instruction by returning from the handler
- It is like case 1/0 from here on
- [Figure: after swapping in the page table, PDBR -> PDE (p=1) -> PTE (p=0); the data page is still on disk]
57P6 L1 Cache Access
CPU
32
L2 andDRAM
result
20
12
virtual address (VA)
VPN
VPO
L1 miss
L1 hit
4
16
TLBT
TLBI
L1 (128 sets, 4 lines/set)
TLB hit
TLB miss
...
...
TLB (16 sets, 4 entries/set)
10
10
VPN1
VPN2
20
12
20
5
7
PPN
PPO
CT
CO
CI
physical address (PA)
PDE
PTE
Page tables
PDBR
58. Speeding Up L1 Access
- Observation
  - The bits that determine the CI are identical in the virtual and physical address
  - We can index into the cache while address translation is taking place
  - Then check with the CT from the physical address
  - Virtually indexed, physically tagged
  - The cache is carefully sized to make this possible
- [Figure: the 12-bit VPO passes through translation unchanged as the PPO, so the 7-bit CI and 5-bit CO are available before translation produces the 20-bit CT]
59. Linux Organizes VM as a Collection of Areas
- Area
  - A contiguous chunk of (allocated) virtual memory whose pages are related
  - Examples: code segment, data segment, heap, shared library segment, etc.
  - Any existing virtual page is contained in some area
  - Any virtual page that is not part of some area does not exist and cannot be referenced!
  - Thus, the virtual address space can have gaps
  - The kernel does not keep track of virtual pages that do not exist
- task_struct
  - The kernel maintains a distinct task structure for each process
  - It contains all the information that the kernel needs to run the process
    - PID, pointer to the user stack, name of the executable object file, program counter, etc.
- mm_struct
  - One of the entries in the task structure; it characterizes the current state of virtual memory
  - pgd: base of the page directory table
  - mmap: points to a list of vm_area_structs
60. Linux Organizes VM as a Collection of Areas
- vm_prot
  - Read/write permissions for this area
- vm_flags
  - Shared with other processes, or private to this process
- [Figure: task_struct -> mm_struct (mm, pgd, mmap) -> a vm_next-linked list of vm_area_structs, each with vm_start, vm_end, vm_prot, and vm_flags, describing the shared-library area (at 0x40000000), the data area (at 0x0804a020), and the text area (at 0x08048000) of the process virtual memory]
61. Linux Page Fault Handling
- Is the VA legal?
  - i.e., is it in an area defined by a vm_area_struct?
  - If not, then signal a segmentation violation (e.g., (1))
- Is the operation legal?
  - i.e., can the process read/write this area?
  - If not, then signal a protection-violation fault (e.g., (2))
- If OK, handle the page fault
  - e.g., (3)
- [Figure: process virtual memory with its vm_area_structs; (1) a read outside any area, (2) a write to the read-only text area, (3) a legal read in the data area that triggers a normal page fault]
62. Memory Mapping
- Linux (and UNIX generally) initializes the contents of a virtual memory area by associating it with an object on disk
  - Create a new vm_area_struct and page tables for the area
- Areas can be mapped to one of two types of objects (i.e., get their initial values from):
  - A regular file on disk (e.g., an executable object file)
    - The file is divided into page-sized pieces
    - The initial contents of each virtual page come from the corresponding piece
    - If the area is larger than the file section, the area is padded with zeros
  - An anonymous file (e.g., .bss)
    - An area can be mapped to an anonymous file, created by the kernel
    - The initial contents of these pages are initialized to zeros
    - Also called demand-zero pages
- Key point: no virtual pages are copied into physical memory until they are referenced!
  - Known as demand paging
  - Crucial for time and space efficiency
63. User-Level Memory Mapping
- void *mmap(void *start, int len, int prot, int flags, int fd, int offset)
  - Maps len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start (usually NULL for "don't care")
    - prot: PROT_EXEC, PROT_READ, PROT_WRITE
    - flags: MAP_PRIVATE, MAP_SHARED, MAP_ANON
      - MAP_PRIVATE indicates a private copy-on-write object
      - MAP_SHARED indicates a shared object
      - MAP_ANON with a null fd indicates an anonymous file (demand-zero pages)
  - Returns a pointer to the mapped area
- int munmap(void *start, int len)
  - Deletes the area starting at virtual address start with length len
64. Shared Objects
- Why shared objects?
  - Many processes need to share identical read-only text areas. For example:
    - Each tcsh process has the same text area
    - Standard library functions such as printf
  - It would be extremely wasteful for each process to keep duplicate copies in physical memory
- An object can be mapped as either a shared object or a private object
- Shared object
  - Any write to the area is visible to any other processes that have also mapped the shared object
  - The changes are also reflected in the original object on disk
  - A virtual memory area into which a shared object is mapped is called a shared area
- Private object
  - Any write to the area is not visible to other processes
  - The changes are not reflected back to the object on disk
  - Private objects are mapped into virtual memory using copy-on-write
    - Only one copy of the private object is stored in physical memory
    - The page table entries for the private area are flagged as read-only
    - Any write to a page in the private area triggers a protection fault
    - The handler creates a new copy of the page in physical memory and then restores write permission to the page
    - After the handler returns, the process proceeds normally
65. Shared Object
66. Private Object
67. exec() Revisited
- To run a new program p in the current process using exec():
  - Free the vm_area_structs and page tables for the old areas
  - Create new vm_area_structs and page tables for the new areas
    - Stack, bss, data, text, shared libs
    - Text and data are backed by the ELF executable object file
    - bss and stack are initialized to zero
  - Set the PC to the entry point in .text
    - Linux will swap in code and data pages as needed
- [Figure: the process VM after exec - kernel code/data/stack (kernel VM, same for each process) above 0xc0000000; then the demand-zero stack (%esp), the memory-mapped region for shared libraries (libc.so .data and .text), the runtime heap (via malloc) bounded by brk, demand-zero uninitialized data (.bss), initialized data (.data), and program text (.text) from p; the region at address 0 is forbidden; all backed by process-specific data structures (page tables, task and mm structs) in physical memory]
68. fork() Revisited
- To create a new process using fork():
  - Make copies of the old process's mm_struct, vm_area_structs, and page tables
    - At this point the two processes share all of their pages
    - How do we get separate spaces without copying all the virtual pages from one space to another? The copy-on-write technique
  - Copy-on-write
    - Make the pages of writeable areas read-only
    - Flag the vm_area_structs for these areas as private copy-on-write
    - Writes by either process to these pages will cause page faults
      - The fault handler recognizes copy-on-write, makes a copy of the page, and restores write permissions
  - Net result
    - Copies are deferred until absolutely necessary (i.e., when one of the processes tries to modify a shared page)
69. Dynamic Memory Allocation
- Heap
  - An area of demand-zero memory that begins immediately after the bss area
- Allocator
  - Maintains the heap as a collection of variously sized blocks
    - Each block is a contiguous chunk of virtual memory that is either allocated or free
  - An explicit allocator requires the application to both allocate and free space
    - e.g., malloc and free in C
  - An implicit allocator requires the application to allocate, but not to free, space
    - The allocator needs to detect when an allocated block is no longer being used
    - Implicit allocators are also known as garbage collectors
    - The process of automatically freeing unused blocks is known as garbage collection
    - e.g., garbage collection in Java, ML, or Lisp
70Heap
memory invisible to user code
kernel virtual memory
stack
esp
Memory mapped region for shared libraries
the brk ptr points to the top of the heap
run-time heap (via malloc)
uninitialized data (.bss)
initialized data (.data)
program text (.text)
0
71. Malloc Package
- #include <stdlib.h>
- void *malloc(size_t size)
  - If successful:
    - Returns a pointer to a memory block of at least size bytes
    - (Typically) aligned to an 8-byte boundary so that any kind of data object can be contained in the block
    - If size == 0, returns NULL
  - If unsuccessful (i.e., the request is larger than available virtual memory): returns NULL (0) and sets errno
  - Two other variations: calloc (initializes the allocated memory to zero) and realloc
  - Internally uses the mmap and munmap functions, or the sbrk function
- void *realloc(void *p, size_t size)
  - Changes the size of block p and returns a pointer to the new block
  - The contents of the new block are unchanged up to the minimum of the old and new sizes
- void free(void *p)
  - Returns the block pointed at by p to the pool of available memory
  - p must come from a previous call to malloc or realloc
72. Malloc Example

void foo(int n, int m) {
    int i, *p;
    /* allocate a block of n ints */
    if ((p = (int *) malloc(n * sizeof(int))) == NULL) {
        perror("malloc");
        exit(0);
    }
    for (i = 0; i < n; i++)
        p[i] = i;
    /* add space for m more ints at the end of the p block */
    if ((p = (int *) realloc(p, (n + m) * sizeof(int))) == NULL) {
        perror("realloc");
        exit(0);
    }
    for (i = n; i < n + m; i++)
        p[i] = i;
    /* print the new array */
    for (i = 0; i < n + m; i++)
        printf("%d\n", p[i]);
    free(p);   /* return p to the available memory pool */
}
73. Allocation Examples
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
74. Requirements (Explicit Allocators)
- Applications
  - Can issue an arbitrary sequence of allocation and free requests
  - Free requests must correspond to an allocated block
- Allocators
  - Can't control the number or the size of allocated blocks
  - Must respond immediately to all allocation requests
    - i.e., can't reorder or buffer requests
  - Must allocate blocks from free memory
    - i.e., can only place allocated blocks in free memory
  - Must align blocks so they satisfy all alignment requirements
    - 8-byte alignment for GNU malloc (libc malloc) on Linux boxes
  - Can only manipulate and modify free memory
  - Can't move the allocated blocks once they are allocated
    - i.e., compaction is not allowed
75. Goals of Allocators
- Maximize throughput
  - Throughput: the number of completed requests per unit time
  - Example:
    - 5,000 malloc calls and 5,000 free calls in 10 seconds
    - Throughput is 1,000 operations/second
- Maximize memory utilization
  - Need to minimize fragmentation
    - Fragmentation (holes): unused area
  - There is a tradeoff between throughput and memory utilization
    - Need to balance these two goals
- Good locality properties
  - Similar objects should be allocated close together in space
76. Internal Fragmentation
- Poor memory utilization is caused by fragmentation
  - It comes in two forms: internal and external fragmentation
- Internal fragmentation
  - For a block, internal fragmentation is the difference between the block size and the payload size
  - Caused by the overhead of maintaining heap data structures (e.g., headers) and by padding for alignment purposes
  - Any virtual memory allocation policy using fixed-size blocks, such as paging, can suffer from internal fragmentation
- [Figure: a block consists of internal fragmentation (header), the payload, and internal fragmentation (padding)]
77External Fragmentation
Occurs when there is enough aggregate heap
memory, but no single free block is large enough
p1 malloc(4)
p2 malloc(5)
p3 malloc(6)
free(p2)
p4 malloc(6)
oops!
External fragmentation depends on the pattern of
future requests, and thus is difficult to
measure.
78. Implementation Issues
- Free block organization
  - How do we know the size of a free block?
  - How do we keep track of the free blocks?
- Placement
  - How do we choose an appropriate free block in which to place a newly allocated block?
- Splitting
  - What do we do with the extra space after the placement?
- Coalescing
  - What do we do with small blocks that have been freed?
p1 = malloc(1)
79. How Do We Know the Size of a Block?
- Standard method
  - Keep the length of the block in the word preceding the block
    - This word is often called the header field, or header
  - Requires an extra word for every allocated block
- Format of a simple heap block
  - A 32-bit header holds the block size in the upper bits, with the low bit a: a = 1 allocated, a = 0 free
  - The header is followed by the payload (allocated block only) and optional padding
  - The block size includes the header, payload, and any padding
  - malloc returns a pointer to the beginning of the payload
80. Example
81. Keeping Track of Free Blocks
- Method 1: Implicit list using lengths -- links all blocks
- Method 2: Explicit list among the free blocks, using pointers within the free blocks
- Method 3: Segregated free lists
  - Different free lists for different size classes
- [Figure: a heap with blocks of sizes 5, 4, 2, and 6, linked either implicitly by their lengths or explicitly by free-list pointers]
82. Placement Policy
- First fit
  - Search the list from the beginning; choose the first free block that fits
  - Can take linear time in the total number of blocks (allocated and free)
  - (+) Tends to retain large free blocks at the end
  - (-) Leaves small free blocks at the beginning
- Next fit
  - Like first fit, but search the list starting from the end of the previous search
  - (+) Runs faster than first fit
  - (-) Worse memory utilization than first fit
- Best fit
  - Search the list; choose the free block with the closest size that fits
  - (+) Keeps fragments small: better memory utilization than the other two
  - (-) Typically runs slower: requires an exhaustive search of the heap
83. Splitting
- Allocating in a free block: splitting
  - Since the allocated space might be smaller than the free space, we might want to split the block
- [Figure: addblock(p, 2) allocates part of a free block and leaves the remainder as a smaller free block]
84. Coalescing
- Coalescing
  - When the allocator frees a block, there might be other free blocks that are adjacent
  - Such adjacent free blocks cause false fragmentation: there is enough free space, but it is chopped up into small, unusable free blocks
  - Need to coalesce with the next and/or previous block if they are free
- Coalescing with the next block is easy: add the next block's size to the current block's header
- But how do we coalesce with the previous block?
- [Figure: free(p) on a block of size 4 coalesces it with the free block of size 2 that follows it, producing a free block of size 6]
85. Bidirectional Coalescing
- Boundary tags [Knuth73]
  - Replicate the size/allocated word (called the footer) at the bottom of a block
  - Allows us to traverse the list backwards, but requires extra space
  - An important and general technique! Allows constant-time coalescing
- Format of allocated and free blocks
  - Header (1 word): size (the total block size) and a (a = 1: allocated block, a = 0: free block)
  - Payload: application data (allocated blocks only), plus padding
  - Boundary tag (footer, 1 word): a copy of the header
- [Figure: a heap of blocks of sizes 4, 4, 6, and 4, each with matching header and footer tags]
86Constant Time Coalescing
- Case 1: previous block allocated, next block allocated
- Case 2: previous block allocated, next block free
- Case 3: previous block free, next block allocated
- Case 4: previous block free, next block free
[Figure: the block being freed shown between its previous and next neighbors, one column per case]
87Constant Time Coalescing (Case 1)
[Figure: Case 1 -- before: m1|1, n|1, m2|1; after: m1|1, n|0, m2|1. Only the freed block's header and footer change from allocated to free.]
88Constant Time Coalescing (Case 2)
[Figure: Case 2 -- before: m1|1, n|1, m2|0; after: m1|1, (n+m2)|0. The freed block absorbs the free next block: the freed block's header and the next block's footer are rewritten with size n+m2, marked free.]
89Constant Time Coalescing (Case 3)
[Figure: Case 3 -- before: m1|0, n|1, m2|1; after: (n+m1)|0, m2|1. The free previous block absorbs the freed block: the previous block's header and the freed block's footer are rewritten with size n+m1, marked free.]
90Constant Time Coalescing (Case 4)
[Figure: Case 4 -- before: m1|0, n|1, m2|0; after: (n+m1+m2)|0. All three blocks merge into a single free block of size n+m1+m2.]
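The four cases collapse into one short routine once footers exist. Here is a minimal sketch in a toy word-array model (illustrative assumptions: a block of `size` words has header heap[i] and footer heap[i+size-1], both encoding (size << 1) | allocated; `lo`/`hi` bound the heap). Reading the previous block's footer at heap[i-1] is exactly what makes Cases 3 and 4 constant time:

```c
#include <assert.h>
#include <stddef.h>

/* Free the block at heap[i] and merge it with any free neighbors in O(1). */
static void coalesce(unsigned heap[], size_t lo, size_t hi, size_t i) {
    unsigned size = heap[i] >> 1;
    size_t start = i, end = i + size;          /* block occupies [start, end) */

    /* Cases 2 and 4: next block free -> extend forward over it */
    if (end < hi && !(heap[end] & 1u))
        end += heap[end] >> 1;

    /* Cases 3 and 4: previous block free -> its footer sits at heap[start-1],
       so we can find its start without any list traversal */
    if (start > lo && !(heap[start - 1] & 1u))
        start -= heap[start - 1] >> 1;

    unsigned newsize = (unsigned)(end - start);
    heap[start]   = newsize << 1;              /* new header, marked free */
    heap[end - 1] = newsize << 1;              /* new footer, marked free */
}
```

Only the outermost header and footer of the merged region are rewritten; the stale tags in the middle become ordinary payload words.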
91Implicit Lists Summary
- Implementation is very simple
- Allocate takes linear time in the worst case
- Free takes constant time in the worst case -- even with coalescing
- Memory usage will depend on the placement policy
- First fit, next fit, or best fit
- Not used in practice for malloc/free because of the linear-time allocate
- Used for special-purpose applications where the total number of blocks is known beforehand to be small
- However, the concepts of splitting and boundary-tag coalescing are general to all allocators
92Keeping Track of Free Blocks
- Method 1 Implicit list using lengths -- links all blocks
- Method 2 Explicit list among the free blocks using pointers within the free blocks
- Method 3 Segregated free lists
- Different free lists for different size classes
93Explicit Free Lists
- Use data space for pointers
- Typically doubly linked
- Still need boundary tags for coalescing
[Figure: free blocks A, B, C threaded through the heap by forward and back links stored in their payloads]
94 Format of Doubly-Linked Heap Blocks
[Figure: Allocated block -- header (block size | a/f), payload, optional padding, footer (block size | a/f). Free block -- header (block size | a/f), pred (predecessor) pointer, succ (successor) pointer, rest of the old payload, optional padding, footer (block size | a/f).]
95Freeing With Explicit Free Lists
- Insertion policy: where in the free list do you put a newly freed block?
- LIFO (last-in-first-out) policy
- Insert the freed block at the beginning of the free list
- (+) Simple, and freeing a block can be performed in constant time
- If boundary tags are used, coalescing can also be performed in constant time
- Address-ordered policy
- Insert freed blocks so that free-list blocks are always in address order
- i.e. addr(pred) < addr(curr) < addr(succ)
- (-) Freeing a block requires a linear-time search
- (+) Studies suggest address-ordered first fit enjoys better memory utilization than LIFO-ordered first fit
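The LIFO policy amounts to pushing onto the head of a doubly linked list. A minimal sketch (the struct and names are illustrative; a real allocator stores pred/succ inside the free block's payload rather than in a separate struct):

```c
#include <assert.h>
#include <stddef.h>

/* Model of a free block on an explicit, doubly linked free list. */
struct free_block {
    size_t size;
    struct free_block *pred, *succ;
};

/* LIFO insertion: put the newly freed block at the beginning of the list.
   Constant time -- no search, regardless of the block's address. */
static void lifo_insert(struct free_block **head, struct free_block *b) {
    b->pred = NULL;
    b->succ = *head;
    if (*head)
        (*head)->pred = b;
    *head = b;
}
```

An address-ordered insert would instead walk the list to find the right position, which is where its linear-time free cost comes from.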
96Explicit List Summary
- Comparison to implicit list
- Allocation takes time linear in the number of free blocks instead of all blocks
- Much faster allocates when most of the memory is full
- Slightly more complicated allocate and free, since blocks need to be spliced in and out of the list
- Extra space for the links (2 extra words needed per free block)
- This results in a larger minimum block size and potentially increases the degree of internal fragmentation
- Main use of linked lists is in conjunction with segregated free lists
- Keep multiple linked lists of different size classes, or possibly for different types of objects
97Keeping Track of Free Blocks
- Method 1 Implicit list using lengths -- links all blocks
- Method 2 Explicit list among the free blocks using pointers within the free blocks
- Method 3 Segregated free lists
- Different free lists for different size classes
- Can be used to reduce the allocation time compared to a linked-list organization
98Segregated Storage
- Partition the set of all free blocks into equivalence classes called size classes
- The allocator maintains an array of free lists, with one free list per size class, ordered by increasing size
- Often have a separate size class for every small size (2, 3, 4, ...)
- Larger sizes typically have one size class for each power of 2
- Variations of segregated storage
- They differ in how they define size classes, when they perform coalescing, when they request additional heap memory from the OS, whether they allow splitting, and so on
- Examples: simple segregated storage, segregated fits
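One possible size-class map matching the scheme above (the cutoff of 8 and the function name are assumptions for illustration): exact classes for every small size, then one class per power-of-2 range.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_MAX 8   /* sizes 1..8 each get their own class (indices 0..7) */

/* Map a request size n to an index into the array of free lists. */
static size_t size_class(size_t n) {
    if (n <= SMALL_MAX)
        return n - 1;                 /* exact class for small sizes    */
    size_t idx = SMALL_MAX - 1;
    size_t bound = SMALL_MAX;
    while (bound < n) {               /* classes (8,16], (16,32], ...   */
        bound <<= 1;
        idx++;
    }
    return idx;
}
```

Because the map is a handful of shifts and compares, finding the right free list costs essentially nothing compared to searching it.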
99Simple Segregated Storage
- Separate heap and free list for each size class
- Free list for each size class contains same-sized blocks of the largest element size
- For example, the free list for size class 17-32 consists entirely of blocks of size 32
- To allocate a block of size n
- If the free list for size n is not empty, allocate the first block in its entirety
- If the free list is empty, get a new page from the OS, create a new free list from all the blocks in the page, and then allocate the first block on the list
- To free a block
- Simply insert the free block at the front of the appropriate free list
- (+) Both allocating and freeing blocks are fast constant-time operations
- (+) Little per-block memory overhead; no splitting and no coalescing
- (-) Susceptible to internal and external fragmentation
- Internal fragmentation, since free blocks are never split
- External fragmentation, since free blocks are never coalesced
100Segregated Fits
- Array of free lists, each one for some size class
- Free list for each size class contains potentially different-sized blocks
- To allocate a block of size n
- Do a first-fit search of the appropriate free list
- If an appropriate block is found
- Split (optionally) the block and place the fragment on the appropriate list
- If no block is found, try the next larger class and repeat until a block is found
- If none of the free lists yields a block that fits, request additional heap memory from the OS, allocate the block out of this new heap memory, and place the remainder in the largest size class
- To free a block
- Coalesce and place on the appropriate list
- (+) Fast
- Searches are limited to part of the heap rather than the entire heap area
- However, coalescing can increase search times
- (+) Good memory utilization
- A simple first-fit search approximates a best-fit search of the entire heap
- Popular choice for production-quality allocators such as GNU malloc
101Garbage Collection
- Garbage collector: a dynamic storage allocator that automatically frees allocated blocks that are no longer used
- Implicit memory management: an application never has to free

  void foo() {
      int *p = malloc(128);
      return; /* p block is now garbage */
  }

- Common in functional languages, scripting languages, and modern object-oriented languages
- Lisp, ML, Java, Perl, Mathematica, ...
- Variants (conservative garbage collectors) exist for C and C++
- Cannot collect all garbage
102Garbage Collection
- How does the memory manager know when memory can be freed?
- In general we cannot know what is going to be used in the future, since it depends on conditionals
- But we can tell that certain blocks cannot be used if there are no pointers to them
- Need to make certain assumptions about pointers
- The memory manager needs to distinguish pointers from non-pointers
- Garbage collection
- Garbage collectors view memory as a reachability graph and periodically reclaim the unreachable nodes
- Classical GC algorithms
- Mark-and-sweep collection (McCarthy, 1960)
- Does not move blocks (unless you also compact)
- Reference counting (Collins, 1960)
- Does not move blocks (not discussed)
- Copying collection (Minsky, 1963)
- Moves blocks (not discussed)
103Memory as a Graph
- Reachability graph: we view memory as a directed graph
- Each block is a node in the graph
- Each pointer is an edge in the graph
- Locations not in the heap that contain pointers into the heap are called root nodes
- e.g. registers, locations on the stack, global variables
[Figure: root nodes pointing into a set of heap nodes; some heap nodes are reachable from the roots, the rest are not reachable (garbage)]
- A node (block) is reachable if there is a path from some root to that node
- Non-reachable nodes are garbage (never needed by the application)
104Mark and Sweep Garbage Collectors
- A mark-and-sweep garbage collector consists of a mark phase followed by a sweep phase
- Uses an extra mark bit in the header of each block
- When out of space:
- Mark: start at the roots and set the mark bit on all reachable memory blocks
- Sweep: scan all blocks and free the blocks that are not marked
[Figure: heap before the mark phase, after the mark phase (mark bits set on all blocks reachable from the root), and after the sweep phase (unmarked blocks returned to the free list)]
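The two phases can be sketched over a toy node table (an illustrative model: each node has at most two outgoing pointers and an explicit mark field; a real collector discovers pointers inside block payloads and keeps the mark bit in the block header):

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *out[2]; int mark; };

/* Mark phase: depth-first traversal from a root, setting mark bits.
   The mark bit also serves as the "visited" flag that stops cycles. */
static void mark(struct node *n) {
    if (n == NULL || n->mark)
        return;                       /* no edge, or already visited */
    n->mark = 1;
    mark(n->out[0]);
    mark(n->out[1]);
}

/* Sweep phase: scan every block; unmarked blocks are garbage. Here we just
   count them and clear the marks for the next collection; a real sweep
   would return each unmarked block to the free list. */
static int sweep(struct node nodes[], int nnodes) {
    int freed = 0;
    for (int i = 0; i < nnodes; i++) {
        if (!nodes[i].mark)
            freed++;
        nodes[i].mark = 0;
    }
    return freed;
}
```

Mark touches only the reachable subgraph, while sweep touches every block, which is why sweep cost scales with heap size rather than live data.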
105Mark and Sweep (cont.)
Mark using depth-first traversal of the memory graph