Title: Memory Management, Background and Hardware Support
1. Memory Management, Background and Hardware Support
Fred Kuhns (fredk_at_arl.wustl.edu, http://www.arl.wustl.edu/fredk)
Applied Research Laboratory, Department of Computer Science and Engineering, Washington University in St. Louis
2. Recall the Von Neumann Architecture
3. Primary Memory
- Primary memory design requirements
  - Minimize access time (a hardware and software requirement)
  - Maximize available memory using physical and virtual memory techniques
  - Cost-effective: limited to a small percentage of the total
- Memory manager functions
  - Allocate memory to processes
  - Map each process address space to its allocated memory
  - Minimize access times while limiting memory requirements
4. Process Address Space
- Compiler produces relocatable object modules
- Linker combines modules into an absolute (loadable) module
  - addresses are relative, typically starting at 0
- Loader loads the program into memory and adjusts addresses to produce an executable module
5. UNIX Process Address Space
[Figure: process address space from the low address (0x00000000) to the high address (0x7fffffff), with the stack growing dynamically]
6. Big Picture
[Figure: big-picture view of memory, including kernel memory]
7. Memory Management
- Central component of any operating system
- Memory partitioning schemes: fixed, dynamic, paging, segmentation, or a combination
- Relocation
- Hierarchical layering to optimize performance and cost
  - registers
  - cache
  - primary (main) memory
  - secondary memory (backing store, local disk)
  - file servers (networked storage)
- Policies target the expected memory requirements of processes
  - consider short-, medium-, and long-term resource requirements
  - long term: admission of new processes (overall system requirements)
  - medium term: memory allocation (per-process requirements)
  - short term: processor scheduling (immediate needs)
- Common goal: optimize the number of runnable processes resident in memory
8. Fixed Partitioning
- Partition memory into regions with fixed boundaries
- Equal-size partitions
  - program size < partition size: space is wasted
  - program size > partition size: must use overlays
  - use swapping when no partition is available
- Unequal-size partitions
- Main memory use is inefficient; it suffers from internal fragmentation
9. Placement Algorithm with Partitions
- Equal-size partitions
  - any partition may be used, since all are equal in size
  - balance partition size with expected allocation needs
- Unequal-size partitions
  - assign each process to the smallest partition within which it will fit; use per-partition process queues
  - processes are assigned so as to minimize wasted memory within a partition (internal fragmentation)
[Figure: new processes arriving at the operating system; with per-partition queues each process waits for its best-fit partition, while with a single queue each process selects the smallest available partition]
10. Variable Partitioning
- Partitions are of variable length and number
- A process is allocated exactly as much memory as it requires
- External fragmentation: small holes in memory between allocated partitions
- Must use compaction to shift processes so they are contiguous and all free memory is in one block
11. Example: Dynamic Partitioning
- Add processes: 1 (320K), 2 (224K), 3 (288K)
- Add process 4 (128K): swap out process 2 (224K) to make room for process 4
- Remove process 1 (320K): swap process 2 (224K) back in
- Relocate processes: move process 2 into the memory freed by process 1
12. Variable Partition Placement Algorithms
- Best-fit: generally the worst performer overall
  - place in the smallest unused block, to minimize unused fragment sizes
- Worst-fit
  - place in the largest unused block, to maximize unused fragment sizes
- First-fit: simple and fast
  - scan from the beginning and choose the first block that is large enough
  - may leave many processes loaded at the front end of memory that must be scanned on every allocation
- Next-fit: tends to perform worse than first-fit
  - scan memory from the location of the last allocation and select the next available block large enough to hold the process
  - tends to allocate at the end of memory, where the largest block is found
- Use compaction to combine unused blocks into larger contiguous blocks
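The first-fit scan above can be sketched in a few lines of C. This is a minimal illustration, not any particular allocator: the `block_t` layout and the use of an array instead of a linked free list are assumptions for clarity.

```c
#include <stddef.h>

/* Illustrative free-block descriptor (layout is an assumption). */
typedef struct { size_t start; size_t size; int free; } block_t;

/* First-fit: return the index of the first free block large enough
   to hold the request, or -1 if no block fits. */
int first_fit(block_t *blocks, int nblocks, size_t request) {
    for (int i = 0; i < nblocks; i++) {
        if (blocks[i].free && blocks[i].size >= request)
            return i;          /* first sufficiently large hole wins */
    }
    return -1;                 /* caller must compact or swap */
}
```

Best-fit and worst-fit differ only in that they scan the whole list and remember the smallest (or largest) block that fits; next-fit starts the scan at the index of the previous allocation instead of 0.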
13. Variable Partition Placement Algorithms (Example)
[Figure: allocating a 16K block, before and after. First fit scans from the start of memory (past 8K and 12K blocks) and takes the first large-enough free block (22K), leaving a 6K fragment. Best fit takes the closest-sized free block (18K), leaving a 2K fragment. Next fit scans from the last allocated block (14K) and takes a 36K block, leaving a 20K fragment.]
14. Addresses
- Logical address: a reference to a memory location, independent of the current assignment of data to memory
- Relative address (a type of logical address): an address expressed as a location relative to some known point
- Physical address: the absolute address, i.e. the actual location in main memory
15. Relocation
- Fixed partitions: absolute memory locations are assigned when the program is loaded
- A process may occupy different partitions over time
  - swapping and compaction cause a program to occupy different partitions, and hence different absolute memory locations
- Dynamic address relocation: relative addresses used with hardware support
  - special-purpose registers are set when the process is loaded; relocation happens at run time
  - Base register: starting address of the process
    - the relative address is added to the base register to produce an absolute address
  - Bounds register: ending location of the process
    - the absolute address is compared to the bounds register; if it is not within bounds, an interrupt is generated
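The base/bounds check above can be sketched as follows. The struct, function name, and the use of -1 to stand in for the hardware interrupt are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative relocation registers (names are an assumption). */
typedef struct { uint32_t base; uint32_t bounds; } reloc_regs_t;

/* Add the relative address to the base register; compare the result
   to the bounds register. Returns the absolute address, or -1 where
   real hardware would raise an interrupt to the operating system. */
int64_t relocate(reloc_regs_t r, uint32_t rel_addr) {
    uint32_t abs_addr = r.base + rel_addr;
    if (abs_addr > r.bounds)
        return -1;             /* out of bounds: fault */
    return abs_addr;
}
```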
16. Hardware Support for Relocation
[Figure: a relative address from the program is added (via an adder) to the base register, taken from the process control block, to form an absolute address; a comparator checks it against the bounds register and raises an interrupt to the operating system if it is out of range. The process image in main memory contains the program, data, and stack.]
17. Techniques
- Paging
  - partition memory into small, equal-size chunks called frames
  - divide each process into chunks of the same size, called pages
  - the operating system maintains a page table for each process
    - it contains the frame location for each page of the process
  - a memory address consists of a page number and an offset
- Segmentation
  - segments of different programs need not all be the same length
  - there is a maximum segment length
  - an address consists of two parts: a segment number and an offset
  - since segments are not equal in size, segmentation is similar to dynamic partitioning
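The page-number/offset split above is just a bit-field extraction when the page size is a power of two. A minimal sketch, assuming 4 KB pages (a 12-bit offset); the split works the same way for any power-of-two page size.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumed 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* High-order bits select the page, low-order bits the byte within it. */
uint32_t page_number(uint32_t vaddr) { return vaddr >> PAGE_SHIFT; }
uint32_t page_offset(uint32_t vaddr) { return vaddr & PAGE_MASK; }
```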
18. Memory Management Requirements
- Relocation
  - the program's memory location is determined at load time
  - the program may be moved to a different location at run time (relocated)
  - consequence: memory references must be translated to actual physical memory addresses
- Protection
  - protect against inter-process interference (transparent isolation)
  - consequence: addresses must be checked at run time when relocation is supported
- Sharing
  - controlled sharing between processes
  - access restrictions may depend on the type of access
  - permit sharing of read-only program text for efficiency reasons
  - require an explicit concurrency protocol for processes to share program data segments
19. Memory Hierarchy
- Executable memory
  - CPU registers: ~500 bytes, 1 clock cycle
  - cache memory: < 10 MB, 1-2 clock cycles
  - primary memory: < 1 GB, 1-4 clock cycles
- Secondary storage
  - rotating magnetic memory: < 100 GB per device, 5-50 usec
  - optical memory: < 15 GB per device, 25 usec - 1 sec
  - sequentially accessed memory (tape): < 5 GB per tape, seconds
20. Principle of Locality
- Programs tend to cluster memory references, for both data and instructions; further, this clustering changes slowly with time
- Hardware and software exploit the principle of locality
- Temporal locality: if a location is referenced once, it is likely to be referenced again in the near future
- Spatial locality: if a memory location is referenced, other nearby locations will likely be referenced
- Stride-k (data) reference patterns
  - visit every kth element of a contiguous vector
  - stride-1 reference patterns are very common, for example:

    for (i = 1, Array[0] = 0; i < N; i++)
        Array[i] = calc_element(Array[i-1]);
21. Caching: A Possible Scenario
[Figure: a client host requests page.html and image.jpg from a web server; copies flow (1-4) from the server's disk through the client's disk files, DRAM (primary memory), hardware cache, and CPU registers]
- A copy of the web page is moved to a file on the client (cached)
- Part of the file is copied into primary memory so a program can process the data (cached)
- A cache line is copied into the hardware cache for the program to use
- Individual words are copied into CPU registers as they are manipulated by the program
22. Hardware Requirements
- Protection: prevent a process from changing its own memory maps
- Residency: the CPU distinguishes between resident and non-resident pages
- Loading: load pages and restart interrupted program instructions
- Dirty: determine whether pages have been modified
23. Memory Management Unit
- Translates virtual addresses using
  - page tables
  - the Translation Lookaside Buffer (TLB)
- Page tables
  - one for kernel addresses
  - one or more for user-space processes
- Page Table Entry (PTE): one per virtual page
  - typically 32 bits: page frame number, protection, valid, modified, and referenced bits
24. Caching Terminology
- Cache hit: the requested data is found in the cache
- Cache miss: the data is not found in cache memory
  - cold miss: the cache is empty
  - conflict miss: the cache line is occupied by a different memory location
  - capacity miss: the working set is larger than the cache
- Placement policy: where a new block (i.e. cache line) is placed
- Replacement policy: controls which block is selected for eviction
- Direct mapped: one-to-one mapping between cache lines and memory locations
- Fully associative: any line in memory can be cached in any cache line
- N-way set associative: a line in memory can be stored in any of the N lines of the set it maps to
25. Cache/Primary Memory Structure
[Figure: a memory address (word length n, maximum address 2^n - 1) divides into t tag bits, s set-index bits, and b block-offset bits; the cache holds sets Set 0 through Set S-1, each with E lines. The s bits select the set number, while the t (tag) bits uniquely identify the memory location.]
- E = lines per set
- m = address bits (t + s + b); M = 2^m, the maximum memory address
- t = m - (s + b) tag bits per line
- S = 2^s, sets in the cache
- B = 2^b, data bytes per line
- V = valid bit, 1 per line (a dirty bit may also be required)
- C = cache size = B x E x S = 2^(s+b) x E
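The address split above can be computed directly from the (s, b, m) parameters. A sketch with the concrete geometry chosen only for illustration:

```c
#include <stdint.h>

/* Cache geometry: s set-index bits, b block-offset bits, m address
   bits (values below are assumptions, not from any real cache). */
typedef struct { unsigned s, b, m; } cache_geom_t;

/* The s bits above the block offset select the set. */
uint32_t set_index(cache_geom_t g, uint32_t addr) {
    return (addr >> g.b) & ((1u << g.s) - 1);
}

/* The remaining high-order t = m - (s + b) bits are the tag. */
uint32_t tag_bits(cache_geom_t g, uint32_t addr) {
    return addr >> (g.s + g.b);
}

/* Total size C = B * E * S = 2^(s+b) * E. */
uint32_t cache_size(cache_geom_t g, unsigned E) {
    return (1u << (g.s + g.b)) * E;
}
```

For example, with s = 7, b = 5 (128 sets of 32-byte lines) and E = 4, the cache holds 16 KB.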
26. Cache Design
- Write policy
  - on a hit: write-through versus write-back
  - on a miss: write-allocate versus no-write-allocate
- Replacement algorithm
  - determines which block to replace (e.g. LRU)
- Block size
  - the data unit exchanged between cache and main memory
27. Translation
- Virtual address = virtual page number + offset
- Find the PTE for the virtual page
- Extract the physical page and add the offset
- Failure (the MMU raises an exception: a page fault)
  - bounds error: outside the address range
  - validation error: non-resident page
  - protection error: access not permitted
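The steps above can be sketched as a one-level table walk. The PTE layout, macro names, and the use of -1 for all three fault cases are assumptions for illustration; real MMUs deliver distinct fault codes.

```c
#include <stdint.h>

#define PT_PAGE_SHIFT 12          /* assumed 4 KB pages */
#define PTE_VALID     0x1         /* assumed valid-bit position */

typedef struct { uint32_t frame; uint32_t flags; } pte_t;

/* Index the page table by virtual page number, check the valid bit,
   then combine the frame number with the page offset.
   Returns the physical address, or -1 on a page fault. */
int64_t pt_translate(const pte_t *page_table, uint32_t table_len,
                     uint32_t vaddr) {
    uint32_t vpn = vaddr >> PT_PAGE_SHIFT;
    if (vpn >= table_len)
        return -1;                              /* bounds error */
    if (!(page_table[vpn].flags & PTE_VALID))
        return -1;                              /* validation error */
    uint32_t offset = vaddr & ((1u << PT_PAGE_SHIFT) - 1);
    return ((int64_t)page_table[vpn].frame << PT_PAGE_SHIFT) | offset;
}
```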
28. Some Details
- Limiting page table size
  - segments
  - page the page table itself (multi-level page tables)
- The MMU has registers that point to the current page table(s)
  - the kernel and MMU can modify page tables and these registers
- Problem
  - page tables may require multiple memory accesses per instruction
- Solution
  - rely on hardware caching (virtual address cache)
  - cache the translations themselves: the TLB
29. Translation Lookaside Buffer
- An associative cache of address translations
  - entries may contain a tag identifying the process as well as the virtual address. Why is this important?
- The MMU typically manages the TLB
  - the kernel may need to invalidate entries. When would it need to?
- Contains the most recently used page table entries
- Functions the same way as a memory cache
  - given a virtual address, the processor examines the TLB
  - if present (a hit), the frame number is retrieved and the real address is formed
  - if not found (a miss), the page number is used to index the process page table
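The lookup above can be sketched as an associative search. The entry layout and -1 miss value are assumptions; real TLBs compare all entries in parallel in hardware, and the process tag is what lets entries of different processes coexist without flushing on every context switch.

```c
#include <stdint.h>

typedef struct {
    uint32_t vpn;      /* virtual page number (the tag) */
    uint32_t frame;    /* cached physical frame number */
    uint16_t pid;      /* process identifier tag */
    int      valid;
} tlb_entry_t;

/* Returns the frame number on a hit, or -1 on a miss, in which case
   the process page table must be walked (and the TLB refilled). */
int64_t tlb_lookup(const tlb_entry_t *tlb, int n,
                   uint16_t pid, uint32_t vpn) {
    for (int i = 0; i < n; i++)    /* hardware does this in parallel */
        if (tlb[i].valid && tlb[i].pid == pid && tlb[i].vpn == vpn)
            return tlb[i].frame;
    return -1;
}
```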
30. Address Translation: General
[Figure: the CPU issues a virtual address to the MMU, which produces a physical address used to access the cache and, on a miss, global memory; data is returned to the CPU]
31. Address Translation Overview
[Figure: the CPU sends a virtual address to the MMU, which consults the TLB and, on a TLB miss, the page tables; the resulting physical address is used to access the cache]
32. Page Table Entry
[Figure: a virtual address splits into an X-bit virtual page number and a Y-bit offset in the page; a Z-bit Page Table Entry (PTE) holds the frame number plus control bits, including M (modified) and R (referenced)]
- Resident bit: indicates whether the page is in memory
- Modify bit: indicates whether the page has been altered since it was loaded into main memory
- Other control bits
- Frame number: the physical frame address
33. Example: 1-Level Address Translation
[Figure: a virtual address splits into a 20-bit virtual page number and a 12-bit offset; the current page table register locates the (per-process) page table, the page number indexes it (by addition) to fetch a PTE containing the frame number plus M, R, and control bits, and frame number X plus the offset selects a location in DRAM frame X]
34. SuperSPARC Reference MMU
[Figure: the Context Table Pointer register and the 12-bit Context register select an entry in the 4096-entry context table; a three-level walk (PTDs at levels 1 and 2, a PTE at level 3) translates the virtual address, whose bits split into an 8-bit index 1, a 6-bit index 2, a 6-bit index 3 (together the 20-bit virtual page number), and a 12-bit offset; the 24-bit physical page number plus the 12-bit offset form the physical address]
- 12-bit index for 4096 context-table entries
- 8-bit index for 256 level-1 entries
- 6-bit index for 64 entries at each of levels 2 and 3
- The virtual page number has 20 bits, for 1M pages
- The physical frame number has 24 bits with a 12-bit offset, permitting 16M frames
35. Page Table Descriptor/Entry
[Figure: a Page Table Descriptor holds a page table pointer plus the type field in bits 1-0; a Page Table Entry holds the physical page number (bits 8 and up), C (bit 7), M (bit 6), R (bit 5), ACC (bits 4-2), and the type field (bits 1-0)]
- Type: PTD, PTE, or Invalid
- C: cacheable
- M: modified
- R: referenced
- ACC: access permissions
36. Page Size
- Smaller page sizes
  - reduce internal fragmentation
  - if a program uses relatively small segments of memory, small page sizes reflect this behavior
- Larger page sizes
  - secondary memory is designed to transfer large blocks of data efficiently
- Smaller page size => more pages required per process => larger page tables; larger page tables mean a large portion of the page tables lives in virtual memory, and the TLB covers a smaller footprint
- Multiple page sizes provide the flexibility needed to use a TLB effectively
  - large pages can be used for program instructions or, better, for kernel memory, thereby decreasing its footprint in the page tables
- Most operating systems support only one page size
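The page-table-size tradeoff above is simple arithmetic: halving the page size doubles the number of pages and hence the number of PTEs. A sketch, where the 32-bit address space and 4-byte PTE size are assumptions for illustration:

```c
#include <stdint.h>

/* Number of pages (= PTEs in a flat table) covering an address space. */
uint64_t num_ptes(uint64_t addr_space, uint64_t page_size) {
    return addr_space / page_size;
}

/* Flat page-table size in bytes for a given PTE size. */
uint64_t page_table_bytes(uint64_t addr_space, uint64_t page_size,
                          uint64_t pte_size) {
    return num_ptes(addr_space, page_size) * pte_size;
}
```

With a 4 GB address space, 4 KB pages, and 4-byte PTEs, a flat table needs 1M entries (4 MB); doubling the page size to 8 KB halves that.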
37. Segmentation
- Segments may be of unequal, dynamic size
- Simplifies handling of growing data structures
- Allows programs to be altered and recompiled independently
- Lends itself to sharing data among processes
- Lends itself to protection
- Segment tables
  - each entry contains the base address of the corresponding segment in main memory
  - each entry contains the length of the segment
  - a bit is needed to determine whether the segment is already in main memory
  - another bit is needed to determine whether the segment has been modified since it was loaded into main memory
38. Segment Table Entries
[Figure: a virtual address splits into a segment number and an offset; a segment table entry contains the segment length, control bits, and the segment base address]
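Translation with a segment table can be sketched as follows: the segment number selects an entry, the offset is checked against the segment length, and the base address is added. The struct layout and -1 fault value are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative segment table entry (control bits omitted). */
typedef struct { uint32_t base; uint32_t length; } seg_entry_t;

/* Returns the physical address, or -1 where hardware would fault. */
int64_t seg_translate(const seg_entry_t *table, uint32_t nsegs,
                      uint32_t seg, uint32_t offset) {
    if (seg >= nsegs)
        return -1;                     /* no such segment */
    if (offset >= table[seg].length)
        return -1;                     /* offset beyond segment length */
    return (int64_t)table[seg].base + offset;
}
```

Note the length check: unlike paging, where the offset can never exceed the (fixed) page size, segmentation must compare the offset against each segment's own length.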
39. Combined Paging and Segmentation
- Paging is transparent to the programmer
- Paging eliminates external fragmentation
- Segmentation is visible to the programmer
- Segmentation allows for growing data structures, modularity, and support for sharing and protection
- Each segment is broken into fixed-size pages