Title: Memory Management Motivation
1. Memory Management - Motivation
- n processes, each spending a fraction p of its time waiting for I/O, gives a probability of p^n that all processes are waiting for I/O simultaneously - CPU utilization is 1 - p^n
2. Utilizing Memory
- Assume each process takes 200K and so does the operating system
- Assume there is 1MB of memory available and that p = 0.8 - space for 4 processes - 60% CPU utilization (1 - 0.8^4 ≈ 0.59)
- Another 1MB enables 9 processes - 87% CPU utilization (1 - 0.8^9 ≈ 0.87)
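These utilization figures can be checked with a short sketch (the function name is mine; the formula 1 - p^n is the one from the slide):

```python
def cpu_utilization(p, n):
    """Probability that at least one of n processes is NOT waiting for I/O."""
    return 1 - p ** n

# p = 0.8: each process waits for I/O 80% of the time
print(round(cpu_utilization(0.8, 4), 2))   # 4 processes fit in 1MB -> 0.59
print(round(cpu_utilization(0.8, 9), 2))   # 9 processes fit in 2MB -> 0.87
```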
3. Issues - Relocation and Linking
- Compile time - create absolute code
- Load time - linker lists relocatable instructions and the loader changes instructions (at each reload...)
- Execution time - special hardware needed to support moving of processes during run time
- Dynamic Linking - used with system libraries; includes only a stub in each user routine, indicating how to locate the memory-resident library function (or how to load it, if needed)
4. Multiprogramming with fixed partitions
- How to organize the memory?
- How to assign jobs to partitions?
- Separate queues vs. a single queue
5. Allocating memory - growing segments
6. Memory allocation - keeping track (bitmaps, linked lists)
7. Strategies for Allocation
- First fit - don't search too much...
- Next fit - start each search from the last location
- Best fit - a drawback: generates small holes
- Worst fit - solves the above problem, badly
- Quick fit - several queues of different sizes
- An example of an elaborate scheme: the Buddy system (Knuth, 1973)
- Separate lists of free holes of sizes that are powers of two
- For any request, pick the 1st hole of the right size
- Not very good memory utilization
- Freed blocks can only be merged with buddies of their own size
- Main problem of memory allocation - Fragmentation
- Internal - wasted parts of allocated space
- External - wasted unallocated space
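A minimal sketch of first fit over a free-hole list (the representation and names are my own; the slide only names the strategy):

```python
def first_fit(holes, size):
    """Scan the free-hole list in address order; take the first hole big enough.

    holes: list of (start, length) tuples sorted by start address.
    Returns the allocated start address, shrinking or removing the hole.
    """
    for i, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                del holes[i]                               # exact fit - hole disappears
            else:
                holes[i] = (start + size, length - size)   # leftover becomes a smaller hole
            return start
    return None                                            # external fragmentation: no hole fits

holes = [(0, 100), (300, 50), (500, 400)]
print(first_fit(holes, 120))   # skips the first two holes -> 500
print(holes)                   # [(0, 100), (300, 50), (620, 280)]
```

Next fit would differ only in remembering the index where the last search stopped; best fit would scan the whole list for the smallest adequate hole.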
8. Memory Protection
- Hardware
- History: the IBM 360 had a 4-bit protection code in the PSW and memory in 2K partitions - the process code in the PSW must match the memory partition code
- Two registers - base & limit
- base is added by hardware without changing instructions - dynamic relocation
- every request is checked against limit - runtime bound checking
- Reminder: the IBM PC has segment registers (but no limit)
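The base & limit mechanism can be sketched in a few lines (a hedged illustration; in reality the addition and comparison happen in hardware on every reference):

```python
def translate(virtual_addr, base, limit):
    """Dynamic relocation with runtime bound checking via base & limit registers."""
    if virtual_addr >= limit:
        raise MemoryError("limit violation - trap to OS")   # bound check
    return base + virtual_addr                              # base added transparently

print(translate(100, base=40000, limit=5000))   # 40100
```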
9. Managing memory by Swapping
- Processes move from disk to memory and from memory to disk
- Whenever there are too many jobs to fit in memory
- To use memory more efficiently - variable partitions
- Allocating memory
- Freeing memory and holes
- Possible solution: memory compaction
- Some form of swapping is required with any multiprogramming
- Since swapping is performed on whole processes, it results in a noticeable response time
- Longer queues of blocked processes can lead to many swaps...
- Allocating swap space
- Processes are swapped in/out from the same location
- Allocate space for non-memory-resident processes only
10. Paging and Virtual Memory
- Divide memory into fixed-size blocks (page frames)
- Small enough blocks - many for one process
- Allocate non-contiguous memory chunks to processes - avoiding holes...
- 2^32 addresses for a 32-bit (address bus) machine - virtual addresses
- A memory management unit (MMU) does the mapping to physical addresses - pages --> page frames
- Machine instructions reference addresses - more than one address per instruction, plus fetching the instructions themselves - absolute code becomes meaningless...
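The mapping starts by splitting a virtual address into a page number and an offset; a small sketch (assuming 4K pages, i.e. a 12-bit offset - the page size is my assumption):

```python
PAGE_SIZE = 4096          # assumed 4K pages -> 12-bit offset
OFFSET_BITS = 12

def split(vaddr):
    """Split a 32-bit virtual address into (virtual page number, offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, offset = split(0x0040312A)
print(hex(vpn), hex(offset))   # 0x403 0x12a
```

The MMU looks up the page number in the page table and concatenates the resulting frame number with the unchanged offset.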
11. Memory Management Unit
12. Paging
13. MMU Operation - page fault if the accessed page is absent
14. Page table considerations
- Can be very large (1M pages for 32-bit addresses)
- Must be fast (every instruction needs it)
- One extreme has it all in hardware - fast registers that hold the page table and are loaded with each process; too expensive for the above size
- The other extreme has it all in memory (using a page table base register (PTBR) to point to it) - each memory reference during instruction translation is doubled...
- To avoid keeping complete page tables in memory - make them multilevel (and avoid the danger of accumulating memory references per instruction by caching)
- A fast cache (an additional 20 ns) and a 98% hit ratio, on a four-level page table, for a machine with 100-nanosecond memory access:
- effective access time = 0.98 x 120 + 0.02 x 520 = 128 nanosecs
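The 128 ns figure follows from the two cases (hit: cache lookup + one memory access; miss: cache lookup + four page-table levels + the data access). A small check:

```python
def effective_access(cache_ns, mem_ns, levels, hit_ratio):
    """Effective access time with a translation cache in front of a
    multilevel page table."""
    hit = cache_ns + mem_ns                    # 20 + 100 = 120 ns
    miss = cache_ns + (levels + 1) * mem_ns    # 20 + 5 * 100 = 520 ns
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(effective_access(20, 100, levels=4, hit_ratio=0.98))   # 128.0
```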
15. Page Tables - Handling the size problem
16. SPARC 3-level paging - Context table (in MMU hardware) - 1 entry per process
17. Associative Memory - content-addressable memory
- page insertion - complete entry from the page table
- page deletion - just the modified bit goes back to the page table
18. Associative Memory - comments
- With a large enough hit ratio the average added translation time is close to 0
- linked lists, for example, are bad...
- Only a complete virtual address (all levels) can be counted as a hit
- With multiprocessing, the associative memory can be cleared on context switch - wasteful...
- Add a field to the associative memory to hold the process ID, and a special register for the current PID
19. No page tables - MIPS R2000
- 64-entry associative memory for virtual pages
- if not found, TRAP to the operating system
- software uses some hardware registers to find the needed virtual page
- a second trap may happen on a page fault...
20. Inverted page tables
- For very large memories (page tables), one can have an inverted page table, indexed by (physical) page frames
- IBM RT, HP Spectrum (thinking of 64-bit memories)
- To avoid a linear search for every virtual address of a process, use a hash table (one or a few memory references)
- Only one page table - the physical one - for all processes currently in memory
- In addition to the hash table, associative memory registers are used to store recently used page table entries
- The only way to deal with 64-bit memories: with 4K pages, two-level page tables can result in 2^42 entries
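A hedged sketch of the hash lookup idea: one entry per resident page, keyed by (process ID, virtual page number). All names here are mine; a real implementation hashes into the frame table itself with chaining.

```python
frame_table = {}      # (pid, virtual page number) -> physical frame

def map_page(pid, vpn, frame):
    """Record that (pid, vpn) is resident in the given frame."""
    frame_table[(pid, vpn)] = frame

def lookup(pid, vpn):
    """One (or a few) memory references instead of a linear search."""
    frame = frame_table.get((pid, vpn))
    if frame is None:
        raise LookupError("page fault - page not resident")
    return frame

map_page(pid=7, vpn=0x1A2, frame=42)
print(lookup(7, 0x1A2))   # 42
```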
21. Inverted Page Table Architecture
22. Pages: the data; Page frames: the physical memory locations
- Page Table Entries (PTE) contain (per page):
- Page frame number (physical address)
- Present/absent bit (valid bit)
- Dirty (modified) bit
- Referenced (accessed) bit
- Protection
- Caching disable/enable
23. Page fault Handling
- 1. Trap to kernel, save PC on the stack and (sometimes) partial state in registers (and/or on the stack)
- 2. An assembly routine saves volatile information and calls the operating system
- 3. Find the requested virtual page
- 4. Check protection. If legal, find a free page frame (or invoke the page replacement algorithm)
- 5. If replacing, check if the victim is modified and start writing it to disk. Mark the frame busy. Call the scheduler to block the process until the write-to-disk has completed.
24. Page fault Handling (cont'd)
- 6. Transfer the requested page from disk (the scheduler runs alternative processes)
- 7. Upon transfer completion, update the page table: mark the new page as valid and update all other parameters
- 8. Back up the faulted instruction, which was in principle in mid-execution - now the PC can be set back to its initial value
- 9. Schedule the faulting process, return from the operating system
- 10. Restore state (i.e. all volatile information stored by the assembly routine) and restart execution of the faulted process
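The ten steps can be condensed into a minimal sketch, assuming FIFO replacement and stubbed-out disk I/O (all names here are mine, not from the slides):

```python
from collections import deque

FRAMES = 3
page_table = {}            # vpn -> {"frame": f, "dirty": bool}
fifo = deque()             # resident pages, oldest first
free_frames = list(range(FRAMES))

def write_to_disk(vpn): pass     # stub: step 5 blocks on real I/O
def read_from_disk(vpn): pass    # stub: step 6

def handle_page_fault(vpn):
    if free_frames:                          # step 4: find a free frame...
        frame = free_frames.pop()
    else:                                    # ...or run page replacement
        victim = fifo.popleft()
        entry = page_table.pop(victim)
        if entry["dirty"]:                   # step 5: write back if modified
            write_to_disk(victim)
        frame = entry["frame"]
    read_from_disk(vpn)                      # step 6: bring the page in
    page_table[vpn] = {"frame": frame, "dirty": False}   # step 7: mark valid
    fifo.append(vpn)
    return frame                             # steps 8-10: restart the instruction

for vpn in [10, 11, 12, 13]:
    handle_page_fault(vpn)
print(sorted(page_table))   # [11, 12, 13] - page 10 was evicted (FIFO)
```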
25. Architecture - Instruction backup
- Page-faulting instructions trap to the OS
- The OS must restart the instruction
- The page fault may originate at the op-code or at any of the operands - the PC value is useless
- the location of the instruction itself is lost
- worse still, undoing of autoincrement or autodecrement - was it already performed??
- Hardware solutions
- A register to store the PC value of the instruction, and a register to store changes to other registers (increment/decrement)
- Micro-code dumps all information on the stack
- Restart the complete instruction and redo increments etc.
- Do nothing - RISC...
26. Demand Paging
- Processes reside on disk and their swapping-in is performed only partially - only part of their pages is loaded
- During run time a process may encounter a missing page and demand it
- A missing page has its invalid bit on (which the page-fault routine will need to differentiate from an illegal address)
- Page missing?? Retrieve the page into an empty page frame
- No empty page frame?? Evict (replace) a page
- Many algorithms are possible for selecting a page for replacement
- Optimal page replacement
- Discard the page to be used the furthest in the future
- Not realizable...
- but can be used as a benchmark for real algorithms!!
27. Optimal page replacement
- Demand comes in for pages
- 7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7
- With 3 page frames, an optimal algorithm faults as follows ((new, evicted) marks a replacement, "-" a hit):
- 7 5 1 (0,1) - (4,5) - - (2,4) (1,2) - -
- altogether 7 page faults
- Take FIFO for example:
- 7 5 1 (0,7) - (4,5) (7,1) - (2,0) (1,4) (0,7) (7,2)
- 3 additional page faults
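Both traces can be reproduced by simulation; a sketch counting the faults (the eviction rules are standard FIFO and Belady's optimal choice, applied to the reference string above):

```python
def fifo_faults(refs, frames):
    mem, faults = [], 0
    for page in refs:
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                mem.pop(0)                  # evict the oldest resident page
            mem.append(page)
    return faults

def optimal_faults(refs, frames):
    mem, faults = [], 0
    for i, page in enumerate(refs):
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                # evict the page whose next use is furthest in the future
                future = refs[i + 1:]
                victim = max(mem, key=lambda p: future.index(p)
                             if p in future else len(future) + 1)
                mem.remove(victim)
            mem.append(page)
    return faults

refs = [7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7]
print(optimal_faults(refs, 3))   # 7
print(fifo_faults(refs, 3))      # 10
```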
28. Good old FIFO
- Implemented as a queue
- The usual drawback:
- the oldest page may be a referenced (needed) page
- Second chance FIFO
- if the reference bit is on - move the page to the end of the queue
- Better implemented as a circular queue (the "clock")
- saves the overhead of movements on the queue
29. Page replacement: NRU - Not Recently Used
- There are 4 classes of pages, according to the reference and modification bits
- Select a page at random from the least-needed class
- An easy scheme to implement
- Prefers evicting an old modified page over a frequently referenced (unmodified) page
- The class "not referenced, modified" is interesting - it can only arise when a clock tick clears the reference bit of a modified page
30. LRU - Least Recently Used
- Approximates the optimal algorithm -
- the most recently used pages are the most probable next references
- Replace the page used furthest in the past
- Not easy to implement - needs counting of references
- Use a large counter (number of operations) and save it in a field of the page table entry on each page reference
- Another option is to use a bit array of n x n bits
- In both cases the page entry with the smallest number attached to it is selected for replacement
31. LRU with bit tables
32. NFU - Not Frequently Used
- In order to record frequently used pages, add a counter to all page table entries
- At each clock tick, add the R bit to the counters
- Select the page with the lowest counter for replacement
- Problem: it remembers everything
- Remedy (an aging algorithm):
- shift the counter right before adding the reference bit
- add the reference bit at the left
- Fewer operations than LRU; depends on the intervals used for updating
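One tick of the aging remedy in code (an 8-bit counter is my assumption; any width works the same way):

```python
COUNTER_BITS = 8

def age(counter, r_bit):
    """One clock tick of aging: shift right, insert the R bit at the left."""
    return (counter >> 1) | (r_bit << (COUNTER_BITS - 1))

c = 0
for r in [1, 0, 1, 1]:         # reference bits sampled at four clock ticks
    c = age(c, r)
print(format(c, '08b'))        # 11010000 - the most recent tick is the leftmost bit
```

Recent references dominate the comparison, so old history fades away instead of being remembered forever.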
33. NFU - the aging simulation version
34. Modelling paging algorithms
- Belady's anomaly
- Example: FIFO with reference string 1 2 3 4 1 2 5 1 2 3 4 5
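The anomaly is easy to demonstrate on that reference string: adding a frame increases the fault count. A short sketch:

```python
def fifo_faults(refs, frames):
    mem, faults = [], 0
    for page in refs:
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                mem.pop(0)            # evict the oldest resident page
            mem.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults - more frames, more faults!
```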
35. Characterizing paging systems
- A reference string (of requested pages)
- the number of virtual pages n
- the number of physical page frames m
- a page replacement algorithm
- can be represented by an array M of n rows
36. Stack Algorithms
- Definition: the set of pages in physical memory with m page frames is a subset of the set of pages in physical memory with m+1 page frames (for every reference string)
- Stack algorithms have no anomaly
- Examples: LRU, optimal replacement
- FIFO is not a stack algorithm
- Useful definition:
- Distance string - the distance of each referenced page from the top of the stack
37. Predicting page fault numbers
- C_i is the number of times that distance i occurs in the distance string
- the number of page faults with m frames is
- F_m = C_{m+1} + C_{m+2} + ... + C_n + C_infinity
38. Page Frame Allocation
- For a page-fault rate p, memory access time of 100 nanosecs and page-fault service time of 25 millisecs, the effective access time is (1-p) x 100 + p x 25,000,000
- for p = 0.001 the effective access time is still larger than 100 nanosecs by a factor of about 250
- for a goal of only a 10% degradation in access time we need p < 0.0000004
- policies for page-frame allocation must allocate as much as possible to processes, to enhance performance - leave no page frame unassigned
- difficult to know how many frames to allocate - processes differ in size, structure, priority
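Plugging the numbers into the formula above confirms both claims:

```python
def effective_access_ns(p, mem_ns=100, fault_ns=25_000_000):
    """(1 - p) * memory access time + p * page-fault service time."""
    return (1 - p) * mem_ns + p * fault_ns

print(effective_access_ns(0.001))        # ~25100 ns - about 250x slower
print(effective_access_ns(0.0000004))    # ~110 ns - the 10% degradation goal
```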
39. Allocation to multiple processes
- Fair share is not the best policy (static!!)
- allocate according to process size
- there must be a minimum for running a process...
40. (Dynamic) Page Allocation Policies
- 1st option - fixed number of pages per process; 2nd option - proportional to process size
- Locality of reference - a valid statistical phenomenon
- Working set - the set of pages used by each process
- Working set model - dynamic number of pages per process, a necessary condition for running (can be used for prepaging - load the working set before running the process)
- Keep track by aging with a lookback parameter - WSClock
- Thrashing - very frequent page faults (more paging than computation)
- CPU utilization decreases -> increase the multiprogramming degree -> utilization decreases even more -> ...
- happens whenever the in-memory pages are not the working set
- what to do for processes being swapped?
41. Dynamic set - Page Allocation
- Reference string: 0 2 1 3 5 4 6 3 7 5 7 3 3 5 6 4
- with 5 page frames (LRU), p marks a fault:
- p p p p p p p - p - - - - - - -
- now with a working-set window of size 5 (and LRU):
- p p p p p p p - p - - (4) (3) - p -
- for a window of size 5 the allocated WS is decreasing after requests 12 and 14
- what is the maximum page allocation?
- an extra page fault occurs, because of the (smaller) size of the WS
- after the last request, page 4, the number of allocated page frames increases again (4)
42. Dynamic set - Clock Algorithm
- WSClock is a global clock algorithm - for pages held by all processes in memory
- Circling the clock, the algorithm uses the reference bit and adds to it a measure of window size τ
- Each time a reference bit is set, an additional data structure, ref(frame), is set to the current virtual time of the process
- WSClock: use an additional condition that measures elapsed (process) time and compares it to τ
- Replace a page when two conditions apply:
- the reference bit is unset
- Tp - ref(frame) > τ
43. Dynamic set - WSClock Example
- 3 processes: p0, p1 and p2
- current (virtual) times of the 3 processes are
- Tp0 = 50, Tp1 = 70, Tp2 = 90
- WSClock: replace when Tp - ref(frame) > τ
- the minimal distance (window size) is τ = 20
- The clock hand is currently pointing to page frame 4

  page frame:   0  1  2  3  4  5  6  7  8  9 10
  ref. bit:     0  0  1  1  1  0  1  0  0  1  0
  process ID:   0  1  0  1  2  1  0  0  1  2  2
  last_ref:    10 30 52 71 81 37 61 37 31 47 55
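Evaluating the WSClock condition for every frame in the table above (a sketch; the data is from the example, the code and names are mine):

```python
TAU = 20
T = {0: 50, 1: 70, 2: 90}       # current virtual time of each process

frames = [  # (reference bit, process ID, last_ref) for page frames 0..10
    (0, 0, 10), (0, 1, 30), (1, 0, 52), (1, 1, 71), (1, 2, 81), (0, 1, 37),
    (1, 0, 61), (0, 0, 37), (0, 1, 31), (1, 2, 47), (0, 2, 55),
]

# replaceable: reference bit unset AND Tp - ref(frame) > tau
replaceable = [i for i, (r, pid, last) in enumerate(frames)
               if r == 0 and T[pid] - last > TAU]
print(replaceable)   # [0, 1, 5, 8, 10] - frame 7 misses: 50 - 37 = 13 <= 20

# the hand is at frame 4, so circling forward the first candidate is frame 5
hand = 4
first = next(i % len(frames) for i in range(hand, hand + len(frames))
             if (i % len(frames)) in replaceable)
print(first)   # 5
```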
44. Page Daemons - Unix
- It is assumed useful to keep a number of free pages
- freeing of page frames can be done by a page daemon - a process that sleeps most of the time
- awakened periodically to inspect the state of memory - if there are too few free page frames, it frees page frames
- yet another type of (global) dynamic page replacement policy
- this strategy performs better than evicting pages only when needed (and writing the modified ones to disk in a hurry)
45. Comment - Page size analysis
- To minimize wasted memory, with
- process size s
- page size p
- page table entry size e
- Fragmentation overhead is p/2 (on average, half of the last page is wasted)
- Table space overhead is s*e/p
- Total overhead is s*e/p + p/2
- Minimizing the overhead gives p = sqrt(2*s*e)
- Example: s = 128K, e = 8 bytes
- optimal page size is about 1448 bytes... i.e. use 1K or 2K
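The overhead formula and its minimum can be checked directly (a sketch using the s and e values of the example):

```python
from math import sqrt

def overhead(s, p, e):
    """Total overhead: page-table space s*e/p plus internal fragmentation p/2."""
    return s * e / p + p / 2

s, e = 128 * 1024, 8                  # process size 128K, 8-byte entries
p_opt = sqrt(2 * s * e)               # minimum of the overhead function
print(round(p_opt))                   # 1448
print(overhead(s, 1024, e), overhead(s, 2048, e))   # 1536.0 1536.0
```

Notably, the two practical choices 1K and 2K sit symmetrically around the optimum and give exactly the same total overhead here.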
46. Additional issues - Locking and Sharing
- An I/O channel/processor (DMA) transfers data independently
- the page must not be replaced during the transfer
- the OS can use a lock variable per page
- Pages of an editor's code - shared among processes
- swapping out, or terminating, process A (and its pages) may cause many page faults for process B that shares them
- looking up evicted pages in all page tables is impossible
- solution: maintain special data structures for shared pages
47. Handling the backing store
- Need to store non-resident pages on disk
- the backing store (disk swap area) needs to be managed
- allocate swap area to (whole) processes and address pages by offset from the swap address
- processes grow during execution - assign separate swap areas to Text, Data and Stack
- or allocate disk blocks only when needed - requires keeping disk addresses in memory to track swapped pages
48. Segmentation
- Several logical address spaces per process
- A compiler needs segments for:
- source text
- symbol table
- constants segment
- stack
- parse tree
- compiler executable code
- Most of these segments grow during execution
49. Segmentation vs. Paging
50. Segmentation - segment table
51. Segmentation with Paging
- MULTICS combined segmentation and paging
- 2^18 segments of up to 64K words (36 bits each)
- addresses are 34 bits:
- 18-bit segment number
- 16-bit address within the segment: page number (6) + offset within page (10)
- Each process has a segment table (pointed to by the STBR)
- the segment table is itself a segment and is paged (8-bit page + 10-bit offset); the STBR is added to the 18-bit segment number
- Each segment is a separate virtual memory with a page table (6 bits)
- segment tables contain segment descriptors - 18-bit page table address + 9-bit segment length
52. MULTICS segment descriptors
53. Segmentation and paging - locating addresses
54. Segmentation - Memory reference procedure
- 1. Use the segment number to find the segment descriptor
- the segment table is itself paged because it is large, so in actuality the STBR is used to locate the page of the descriptor
- 2. Check if the page table is in memory
- if not, a segment fault occurs
- if there is a protection violation - TRAP (fault)
- 3. The page table is examined; a page fault may occur.
- if the page is in memory, the address of the start of the page is extracted from the page table
- 4. The offset is added to the page origin to construct the main memory address
- 5. Perform the read/store etc.
55. Paged segmentation on the INTEL 80386
- 16K segments, each up to 1G 32-bit words
- 2 types of segment descriptor tables:
- Local Descriptor Table (LDT), one per process
- Global Descriptor Table (GDT) - system etc.
- Access by loading a 16-bit selector into one of the 6 segment registers: CS, DS, SS, ... (holding the selector during run time; 0 means not-in-use)
- The selector points to a segment descriptor (8 bytes)
- Selector format: 13-bit index + 1 bit (0 = GDT / 1 = LDT) + 2-bit privilege level (0-3)
56. 80386 - segment descriptors
57. 80386 - Forming the linear address
- The segment descriptor is in an internal (microcode) register
- If the selector is zero (TRAP) or the segment is paged out (TRAP)
- The offset is checked against the limit field of the descriptor
- The base field of the descriptor is added to the offset (pages are 4K)
58. 80386 - paged segmentation (cont'd)
- Combine descriptor and offset into a linear address
- If paging is disabled - pure segmentation (286 compatibility); the linear address is the physical address
- Paging is 2-level:
- page directory (1K entries) + page tables (1K entries each)
- pages are 4K bytes each (12-bit offset)
- The page directory is pointed to by a special register
- PTEs have a 20-bit page frame number and 12 bits of modified, accessed, protection, etc.
- Small segments have just a few page tables
59. 80386 - 2-level paging
60. Intel 80386 address translation
61. The Buddy System