Title: Memory Management, Background and Hardware Support
1. Memory Management, Background and Hardware Support
Fred Kuhns (fredk_at_arl.wustl.edu, http://www.arl.wustl.edu/fredk)
Applied Research Laboratory, Department of Computer Science and Engineering, Washington University in St. Louis
2. Recall the Von Neumann Architecture
3. Primary Memory
- Primary memory design requirements
  - Minimize access time (a hardware and software requirement)
  - Maximize available memory using physical and virtual memory techniques
  - Cost-effective: limited to a small percentage of the total
- Memory manager functions
  - Allocate memory to processes
  - Map each process address space to its allocated memory
  - Minimize access times while limiting memory requirements
4. Process Address Space
- Compiler produces relocatable object modules
- Linker combines modules into an absolute (loadable) module
  - addresses are relative, typically starting at 0
- Loader loads the program into memory and adjusts addresses to produce an executable module
5. UNIX Process Address Space
[Figure: process address space from the low address (0x00000000) to the high address (0x7fffffff), with the stack growing dynamically]
6. Big Picture
[Figure: big-picture view of memory, including kernel memory]
7. Memory Management
- Central component of any operating system
- Memory partitioning schemes: fixed, dynamic, paging, segmentation, or a combination
- Relocation
- Hierarchical layering to optimize performance and cost
  - registers
  - cache
  - primary (main) memory
  - secondary memory (backing store, local disk)
  - file servers (networked storage)
- Policies target the expected memory requirements of processes
  - consider short-, medium-, and long-term resource requirements
  - long term: admission of new processes (overall system requirements)
  - medium term: memory allocation (per-process requirements)
  - short term: processor scheduling (immediate needs)
- Common goal: optimize the number of runnable processes resident in memory
8. Fixed Partitioning
- Partition memory into regions with fixed boundaries
- Equal-size partitions
  - program size < partition size: space is wasted
  - program size > partition size: must use overlays
  - use swapping when no partition is available
- Unequal-size partitions
- Main memory use is inefficient; it suffers from internal fragmentation
9. Placement Algorithm with Partitions
- Equal-size partitions
  - any partition may be used, since all are equal in size
  - balance partition size with expected allocation needs
- Unequal-size partitions
  - assign each process to the smallest partition within which it will fit; use per-partition process queues
  - processes are assigned so as to minimize wasted memory within a partition (internal fragmentation)
[Figure: new processes arriving at the operating system; with per-partition queues each process waits for its best-fit partition, while with a single queue each process selects the smallest available partition]
10. Variable Partitioning
- Partitions are of variable length and number
- A process is allocated exactly as much memory as it requires
- External fragmentation: small holes in memory between allocated partitions
- Must use compaction to shift processes so they are contiguous and all free memory is in one block
11. Example: Dynamic Partitioning
- Add processes: 1 (320K), 2 (224K), 3 (288K)
- Add process 4 (128K): swap out process 2 (224K) to make room for process 4
- Remove process 1 (320K): swap process 2 (224K) back in
- Relocate processes: move process 2 into the memory freed by process 1
12. Variable Partition Placement Algorithms
- Best-fit: generally the worst performer overall
  - place in the smallest unused block, to minimize unused fragment sizes
- Worst-fit
  - place in the largest unused block, to maximize unused fragment sizes
- First-fit: simple and fast
  - scan from the beginning and choose the first block that is large enough
  - may leave many processes loaded at the front end of memory that must be scanned on every allocation
- Next-fit: tends to perform worse than first-fit
  - scan memory from the location of the last allocation and select the next available block large enough to hold the process
  - tends to allocate at the end of memory, where the largest block is found
- Use compaction to combine unused blocks into larger contiguous blocks
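The first-fit scan above can be sketched in a few lines of C. This is a minimal illustration, not any particular allocator: the `block_t` layout and the use of an array instead of a linked free list are assumptions for clarity.

```c
#include <stddef.h>

/* Illustrative free-block descriptor (layout is an assumption). */
typedef struct { size_t start; size_t size; int free; } block_t;

/* First-fit: return the index of the first free block large enough
   to hold the request, or -1 if no block fits. */
int first_fit(block_t *blocks, int nblocks, size_t request) {
    for (int i = 0; i < nblocks; i++) {
        if (blocks[i].free && blocks[i].size >= request)
            return i;          /* first sufficiently large hole wins */
    }
    return -1;                 /* caller must compact or swap */
}
```

Best-fit and worst-fit differ only in that they scan the whole list and remember the smallest (or largest) block that fits; next-fit starts the scan at the index of the previous allocation instead of 0.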
13. Variable Partition Placement Algorithms (Example)
[Figure: allocating a 16K block, before and after. First fit scans from the start of memory (past 8K and 12K blocks) and takes the first large-enough free block (22K), leaving a 6K fragment. Best fit takes the closest-sized free block (18K), leaving a 2K fragment. Next fit scans from the last allocated block (14K) and takes a 36K block, leaving a 20K fragment.]
14. Addresses
- Logical address: a reference to a memory location, independent of the current assignment of data to memory
- Relative address (a type of logical address): an address expressed as a location relative to some known point
- Physical address: the absolute address, i.e. the actual location in main memory
15. Relocation
- Fixed partitions: absolute memory locations are assigned when the program is loaded
- A process may occupy different partitions over time
  - swapping and compaction cause a program to occupy different partitions, and hence different absolute memory locations
- Dynamic address relocation: relative addresses used with hardware support
  - special-purpose registers are set when the process is loaded; relocation happens at run time
  - Base register: starting address of the process
    - the relative address is added to the base register to produce an absolute address
  - Bounds register: ending location of the process
    - the absolute address is compared to the bounds register; if it is not within bounds, an interrupt is generated
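The base/bounds check above can be sketched as follows. The struct, function name, and the use of -1 to stand in for the hardware interrupt are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative relocation registers (names are an assumption). */
typedef struct { uint32_t base; uint32_t bounds; } reloc_regs_t;

/* Add the relative address to the base register; compare the result
   to the bounds register. Returns the absolute address, or -1 where
   real hardware would raise an interrupt to the operating system. */
int64_t relocate(reloc_regs_t r, uint32_t rel_addr) {
    uint32_t abs_addr = r.base + rel_addr;
    if (abs_addr > r.bounds)
        return -1;             /* out of bounds: fault */
    return abs_addr;
}
```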
16. Hardware Support for Relocation
[Figure: a relative address from the program is added (via an adder) to the base register, taken from the process control block, to form an absolute address; a comparator checks it against the bounds register and raises an interrupt to the operating system if it is out of range. The process image in main memory contains the program, data, and stack.]
17. Techniques
- Paging
  - partition memory into small, equal-size chunks called frames
  - divide each process into chunks of the same size, called pages
  - the operating system maintains a page table for each process
    - it contains the frame location for each page of the process
  - a memory address consists of a page number and an offset
- Segmentation
  - segments of different programs need not all be the same length
  - there is a maximum segment length
  - an address consists of two parts: a segment number and an offset
  - since segments are not equal in size, segmentation is similar to dynamic partitioning
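The page-number/offset split above is just a bit-field extraction when the page size is a power of two. A minimal sketch, assuming 4 KB pages (a 12-bit offset); the split works the same way for any power-of-two page size.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumed 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* High-order bits select the page, low-order bits the byte within it. */
uint32_t page_number(uint32_t vaddr) { return vaddr >> PAGE_SHIFT; }
uint32_t page_offset(uint32_t vaddr) { return vaddr & PAGE_MASK; }
```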
18. Memory Management Requirements
- Relocation
  - the program's memory location is determined at load time
  - the program may be moved to a different location at run time (relocated)
  - consequence: memory references must be translated to actual physical memory addresses
- Protection
  - protect against inter-process interference (transparent isolation)
  - consequence: addresses must be checked at run time when relocation is supported
- Sharing
  - controlled sharing between processes
  - access restrictions may depend on the type of access
  - permit sharing of read-only program text for efficiency reasons
  - require an explicit concurrency protocol for processes to share program data segments
19. Memory Hierarchy
- Executable memory
  - CPU registers: ~500 bytes, 1 clock cycle
  - cache memory: < 10 MB, 1-2 clock cycles
  - primary memory: < 1 GB, 1-4 clock cycles
- Secondary storage
  - rotating magnetic memory: < 100 GB per device, 5-50 usec
  - optical memory: < 15 GB per device, 25 usec - 1 sec
  - sequentially accessed memory (tape): < 5 GB per tape, seconds
20. Principle of Locality
- Programs tend to cluster memory references, for both data and instructions; further, this clustering changes slowly with time
- Hardware and software exploit the principle of locality
- Temporal locality: if a location is referenced once, it is likely to be referenced again in the near future
- Spatial locality: if a memory location is referenced, other nearby locations will likely be referenced
- Stride-k (data) reference patterns
  - visit every kth element of a contiguous vector
  - stride-1 reference patterns are very common, for example:

    for (i = 1, Array[0] = 0; i < N; i++)
        Array[i] = calc_element(Array[i-1]);
21. Caching: A Possible Scenario
[Figure: a client host requests page.html and image.jpg from a web server; copies flow (1-4) from the server's disk through the client's disk files, DRAM (primary memory), hardware cache, and CPU registers]
- A copy of the web page is moved to a file on the client (cached)
- Part of the file is copied into primary memory so a program can process the data (cached)
- A cache line is copied into the hardware cache for the program to use
- Individual words are copied into CPU registers as they are manipulated by the program
22. Hardware Requirements
- Protection: prevent a process from changing its own memory maps
- Residency: the CPU distinguishes between resident and non-resident pages
- Loading: load pages and restart interrupted program instructions
- Dirty: determine whether pages have been modified
23. Memory Management Unit
- Translates virtual addresses using
  - page tables
  - the Translation Lookaside Buffer (TLB)
- Page tables
  - one for kernel addresses
  - one or more for user-space processes
- Page Table Entry (PTE): one per virtual page
  - typically 32 bits: page frame number, protection, valid, modified, and referenced bits
24. Caching Terminology
- Cache hit: the requested data is found in the cache
- Cache miss: the data is not found in cache memory
  - cold miss: the cache is empty
  - conflict miss: the cache line is occupied by a different memory location
  - capacity miss: the working set is larger than the cache
- Placement policy: where a new block (i.e. cache line) is placed
- Replacement policy: controls which block is selected for eviction
- Direct mapped: one-to-one mapping between cache lines and memory locations
- Fully associative: any line in memory can be cached in any cache line
- N-way set associative: a line in memory can be stored in any of the N lines of the set it maps to
25. Cache/Primary Memory Structure
[Figure: a memory address (word length n, maximum address 2^n - 1) divides into t tag bits, s set-index bits, and b block-offset bits; the cache holds sets Set 0 through Set S-1, each with E lines. The s bits select the set number, while the t (tag) bits uniquely identify the memory location.]
- E = lines per set
- m = address bits (t + s + b); M = 2^m, the maximum memory address
- t = m - (s + b) tag bits per line
- S = 2^s, sets in the cache
- B = 2^b, data bytes per line
- V = valid bit, 1 per line (a dirty bit may also be required)
- C = cache size = B x E x S = 2^(s+b) x E
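The address split above can be computed directly from the (s, b, m) parameters. A sketch with the concrete geometry chosen only for illustration:

```c
#include <stdint.h>

/* Cache geometry: s set-index bits, b block-offset bits, m address
   bits (values below are assumptions, not from any real cache). */
typedef struct { unsigned s, b, m; } cache_geom_t;

/* The s bits above the block offset select the set. */
uint32_t set_index(cache_geom_t g, uint32_t addr) {
    return (addr >> g.b) & ((1u << g.s) - 1);
}

/* The remaining high-order t = m - (s + b) bits are the tag. */
uint32_t tag_bits(cache_geom_t g, uint32_t addr) {
    return addr >> (g.s + g.b);
}

/* Total size C = B * E * S = 2^(s+b) * E. */
uint32_t cache_size(cache_geom_t g, unsigned E) {
    return (1u << (g.s + g.b)) * E;
}
```

For example, with s = 7, b = 5 (128 sets of 32-byte lines) and E = 4, the cache holds 16 KB.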
26. Cache Design
- Write policy
  - on a hit: write-through versus write-back
  - on a miss: write-allocate versus no-write-allocate
- Replacement algorithm
  - determines which block to replace (e.g. LRU)
- Block size
  - the data unit exchanged between cache and main memory
27. Translation
- Virtual address = virtual page number + offset
- Find the PTE for the virtual page
- Extract the physical page and add the offset
- Failure (the MMU raises an exception: a page fault)
  - bounds error: outside the address range
  - validation error: non-resident page
  - protection error: access not permitted
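The steps above can be sketched as a one-level table walk. The PTE layout, macro names, and the use of -1 for all three fault cases are assumptions for illustration; real MMUs deliver distinct fault codes.

```c
#include <stdint.h>

#define PT_PAGE_SHIFT 12          /* assumed 4 KB pages */
#define PTE_VALID     0x1         /* assumed valid-bit position */

typedef struct { uint32_t frame; uint32_t flags; } pte_t;

/* Index the page table by virtual page number, check the valid bit,
   then combine the frame number with the page offset.
   Returns the physical address, or -1 on a page fault. */
int64_t pt_translate(const pte_t *page_table, uint32_t table_len,
                     uint32_t vaddr) {
    uint32_t vpn = vaddr >> PT_PAGE_SHIFT;
    if (vpn >= table_len)
        return -1;                              /* bounds error */
    if (!(page_table[vpn].flags & PTE_VALID))
        return -1;                              /* validation error */
    uint32_t offset = vaddr & ((1u << PT_PAGE_SHIFT) - 1);
    return ((int64_t)page_table[vpn].frame << PT_PAGE_SHIFT) | offset;
}
```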
28. Some Details
- Limiting page table size
  - segments
  - page the page table itself (multi-level page tables)
- The MMU has registers that point to the current page table(s)
  - the kernel and MMU can modify page tables and these registers
- Problem
  - page tables may require multiple memory accesses per instruction
- Solution
  - rely on hardware caching (virtual address cache)
  - cache the translations themselves: the TLB
29. Translation Lookaside Buffer
- An associative cache of address translations
  - entries may contain a tag identifying the process as well as the virtual address. Why is this important?
- The MMU typically manages the TLB
  - the kernel may need to invalidate entries. When would it need to?
- Contains the most recently used page table entries
- Functions the same way as a memory cache
  - given a virtual address, the processor examines the TLB
  - if present (a hit), the frame number is retrieved and the real address is formed
  - if not found (a miss), the page number is used to index the process page table
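The lookup above can be sketched as an associative search. The entry layout and -1 miss value are assumptions; real TLBs compare all entries in parallel in hardware, and the process tag is what lets entries of different processes coexist without flushing on every context switch.

```c
#include <stdint.h>

typedef struct {
    uint32_t vpn;      /* virtual page number (the tag) */
    uint32_t frame;    /* cached physical frame number */
    uint16_t pid;      /* process identifier tag */
    int      valid;
} tlb_entry_t;

/* Returns the frame number on a hit, or -1 on a miss, in which case
   the process page table must be walked (and the TLB refilled). */
int64_t tlb_lookup(const tlb_entry_t *tlb, int n,
                   uint16_t pid, uint32_t vpn) {
    for (int i = 0; i < n; i++)    /* hardware does this in parallel */
        if (tlb[i].valid && tlb[i].pid == pid && tlb[i].vpn == vpn)
            return tlb[i].frame;
    return -1;
}
```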
30. Address Translation: General
[Figure: the CPU issues a virtual address to the MMU, which produces a physical address used to access the cache and, on a miss, global memory; data is returned to the CPU]
31. Address Translation Overview
[Figure: the CPU sends a virtual address to the MMU, which consults the TLB and, on a TLB miss, the page tables; the resulting physical address is used to access the cache]
32. Page Table Entry
[Figure: a virtual address splits into an X-bit virtual page number and a Y-bit offset in the page; a Z-bit Page Table Entry (PTE) holds the frame number plus control bits, including M (modified) and R (referenced)]
- Resident bit: indicates whether the page is in memory
- Modify bit: indicates whether the page has been altered since it was loaded into main memory
- Other control bits
- Frame number: the physical frame address
33. Example: 1-Level Address Translation
[Figure: a virtual address splits into a 20-bit virtual page number and a 12-bit offset; the current page table register locates the (per-process) page table, the page number indexes it (by addition) to fetch a PTE containing the frame number plus M, R, and control bits, and frame number X plus the offset selects a location in DRAM frame X]
34. SuperSPARC Reference MMU
[Figure: the Context Table Pointer register and the 12-bit Context register select an entry in the 4096-entry context table; a three-level walk (PTDs at levels 1 and 2, a PTE at level 3) translates the virtual address, whose bits split into an 8-bit index 1, a 6-bit index 2, a 6-bit index 3 (together the 20-bit virtual page number), and a 12-bit offset; the 24-bit physical page number plus the 12-bit offset form the physical address]
- 12-bit index for 4096 context-table entries
- 8-bit index for 256 level-1 entries
- 6-bit index for 64 entries at each of levels 2 and 3
- The virtual page number has 20 bits, for 1M pages
- The physical frame number has 24 bits with a 12-bit offset, permitting 16M frames
35. Page Table Descriptor/Entry
[Figure: a Page Table Descriptor holds a page table pointer plus the type field in bits 1-0; a Page Table Entry holds the physical page number (bits 8 and up), C (bit 7), M (bit 6), R (bit 5), ACC (bits 4-2), and the type field (bits 1-0)]
- Type: PTD, PTE, or Invalid
- C: cacheable
- M: modified
- R: referenced
- ACC: access permissions
36. Page Size
- Smaller page sizes
  - reduce internal fragmentation
  - if a program uses relatively small segments of memory, small page sizes reflect this behavior
- Larger page sizes
  - secondary memory is designed to transfer large blocks of data efficiently
- Smaller page size => more pages required per process => larger page tables; larger page tables mean a large portion of the page tables lives in virtual memory, and the TLB covers a smaller footprint
- Multiple page sizes provide the flexibility needed to use a TLB effectively
  - large pages can be used for program instructions or, better, for kernel memory, thereby decreasing its footprint in the page tables
- Most operating systems support only one page size
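The page-table-size tradeoff above is simple arithmetic: halving the page size doubles the number of pages and hence the number of PTEs. A sketch, where the 32-bit address space and 4-byte PTE size are assumptions for illustration:

```c
#include <stdint.h>

/* Number of pages (= PTEs in a flat table) covering an address space. */
uint64_t num_ptes(uint64_t addr_space, uint64_t page_size) {
    return addr_space / page_size;
}

/* Flat page-table size in bytes for a given PTE size. */
uint64_t page_table_bytes(uint64_t addr_space, uint64_t page_size,
                          uint64_t pte_size) {
    return num_ptes(addr_space, page_size) * pte_size;
}
```

With a 4 GB address space, 4 KB pages, and 4-byte PTEs, a flat table needs 1M entries (4 MB); doubling the page size to 8 KB halves that.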
37. Segmentation
- Segments may be of unequal, dynamic size
- Simplifies handling of growing data structures
- Allows programs to be altered and recompiled independently
- Lends itself to sharing data among processes
- Lends itself to protection
- Segment tables
  - each entry contains the base address of the corresponding segment in main memory
  - each entry contains the length of the segment
  - a bit is needed to determine whether the segment is already in main memory
  - another bit is needed to determine whether the segment has been modified since it was loaded into main memory
38. Segment Table Entries
[Figure: a virtual address splits into a segment number and an offset; a segment table entry contains the segment length, control bits, and the segment base address]
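Translation with a segment table can be sketched as follows: the segment number selects an entry, the offset is checked against the segment length, and the base address is added. The struct layout and -1 fault value are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative segment table entry (control bits omitted). */
typedef struct { uint32_t base; uint32_t length; } seg_entry_t;

/* Returns the physical address, or -1 where hardware would fault. */
int64_t seg_translate(const seg_entry_t *table, uint32_t nsegs,
                      uint32_t seg, uint32_t offset) {
    if (seg >= nsegs)
        return -1;                     /* no such segment */
    if (offset >= table[seg].length)
        return -1;                     /* offset beyond segment length */
    return (int64_t)table[seg].base + offset;
}
```

Note the length check: unlike paging, where the offset can never exceed the (fixed) page size, segmentation must compare the offset against each segment's own length.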
39. Combined Paging and Segmentation
- Paging is transparent to the programmer
- Paging eliminates external fragmentation
- Segmentation is visible to the programmer
- Segmentation allows for growing data structures, modularity, and support for sharing and protection
- Each segment is broken into fixed-size pages