Title: Memory Management
1 Memory Management
- The part of the O.S. that manages the memory hierarchy (i.e., main memory and disk) is called the memory manager.
- The memory manager allocates memory to processes when they need it and deallocates it when they are done. It also manages swapping between main memory and disk when main memory is too small to hold all processes.
2 Memory Management Schemes
- There are two types of memory management schemes: those that move processes back and forth between main memory and disk (i.e., swapping and paging) and those that do not.
- An example of the second type is monoprogramming without swapping (rarely used today). In this approach only one program runs at a time, and memory is shared by that program and the O.S.
- Three ways of organizing memory in the monoprogramming approach are shown on the next slide.
3 (figure: three ways of organizing memory in monoprogramming)
4 Relocation and Protection
- When a program is linked (all parts of the program are combined into a single address space), the linker must know at what address the program will begin in memory.
- Because the program is loaded into a different partition each time, its code must be relocatable.
- One solution is relocation during loading, which means the linker marks the relocatable addresses so they can be modified as the program is loaded.
- The problem is that even this method cannot stop the program from reading or writing memory locations belonging to other users (i.e., the protection problem).
5 Relocation and Protection
- The solution to both of these problems is to use two special hardware registers, called the base and limit registers. When a process is scheduled, the base register is loaded with the start address of its partition and the limit register with the length of its partition.
- Example: if the program is loaded at location 10 (base register = 10), each address is relocated by adding the base:
  - logical address 10 -> physical address 20
  - logical address 13 -> physical address 23
  - logical address 40 -> physical address 50
6 Swapping
- In interactive systems there is sometimes not enough memory for all active processes, so excess processes must be kept on disk and brought in to run dynamically.
- Swapping is a memory management technique in which each process is brought into memory in its entirety, run for a while, and then put back on the disk.
- The partitions can be variable: the number, location, and size of the partitions vary dynamically to improve memory utilization.
7 (figure)
8 Swapping
- Swapping may create multiple holes in memory; in that case all processes can be moved downward to combine the holes into one big one (memory compaction).
- Swapping is slow. Also, if a process's data segment grows dynamically, a little extra memory should be allocated whenever the process is swapped in. If there is no room in memory for a growing process, it must be swapped out or killed (if there is no disk space either).
- Also, if both the stack segment (for return addresses and local variables) and the data segment (for the heap and dynamically allocated variables) of a process grow, a special memory configuration should be used (see next slide).
9 (figure: memory configurations for growing data and stack segments)
10 Issues with Variable Partitions
- How do we choose which hole to assign to a process? (allocation strategy)
- How do we keep track of holes? (see next slide)
- How do we avoid many small holes? One possible solution is compaction, but there are several variations of compaction; the question is which processes should be moved.
11 Memory Management with Bitmaps
- Each allocation unit in memory corresponds to one bit in the bitmap (e.g., 0 if the unit is free and 1 if it is occupied).
- The size of the allocation unit is important: the smaller the allocation unit, the larger the bitmap.
- If the allocation unit is chosen large, the bitmap is smaller, but if the process size is not an exact multiple of the allocation unit, memory may be wasted.
12 Memory Management with Linked Lists
- A linked list keeps track of allocated and free segments (i.e., processes and holes).
13 Memory Management with Linked Lists
- If the list is sorted by address, the possible neighbor combinations when process X terminates can be shown as follows; the list update is different for each case. Using a doubly linked list is more convenient.
14 Memory Management with Linked Lists
- For allocating memory to a new process, the allocation strategies are:
- First fit: takes the first available hole that is big enough. It is fast and tends to generate large leftover holes.
- Next fit: takes the next suitable hole; the search starts from the previous stopping place instead of the beginning of the list.
- Best fit: finds the hole with the size closest to the size needed. It is slower, and it results in useless tiny holes.
- Worst fit: takes the largest hole.
- Quick fit: maintains separate lists of common sizes, such as 4KB, 12KB, and 20KB holes. It is fast, but the disadvantage is that merging free spaces is expensive.
15 Virtual Memory
- Overlays are parts of a program created by the programmer. By keeping some overlays on disk it was possible to fit a program larger than memory. Swapping overlays in and out (they call each other) and splitting a program into overlays were tedious and time consuming.
- To solve this problem, virtual memory was devised in 1961: if the size of a program exceeds the size of memory, the O.S. keeps the parts of the program currently in use in main memory and the rest on disk. It can be used in both single-program and multiprogramming environments.
16 Paging
- Most virtual memory systems use a technique called paging, which is based on virtual addressing.
- When a program references an address, that address is a virtual address, and the set of such addresses forms the virtual address space. A virtual address does not go directly to the memory bus. Instead it goes to the Memory Management Unit (MMU), which maps virtual addresses onto physical memory addresses. (see next slide)
17 Virtual Address (figure)
18 Paging
- The virtual address space is divided into units called pages, corresponding to units of physical memory called page frames.
- The page table maps virtual pages onto page frames. (see next slide)
- If the corresponding page frame exists in memory, the physical address is found and the address in the instruction is transformed into that physical address.
- If the corresponding page frame does not exist in memory, a page fault occurs; that means the page exists only on disk. On a page fault the O.S. frees one of the used page frames (writing it back to disk), fetches the referenced page into that frame, and updates the page table.
19 (figure)
20 Page Table
- In this example, each page table entry consists of 3 bits giving the page frame number and one bit showing whether that page frame is present in memory.
- The incoming address consists of a virtual page number, which is translated into the 3-bit page frame number, and an offset, which is copied directly to the output address. (see next slide)
- Each process has its own page table. If the O.S. copies the process's page table from memory into an array of fast hardware registers, no extra memory references are needed, but since the page table is large, this is expensive. If the O.S. keeps the page table in memory, there must be one or two extra memory references (to read the page table) for each instruction. This is a disadvantage because it slows down execution. We will see different solutions later.
21 (figure)
22 Paging
- In general, knowing the logical address space and the page size, the page table can be designed.
- For example, with 16-bit addressing and a 4K page size, since 4K = 2^12 we need a 12-bit offset and 16 - 12 = 4 bits for the virtual page number of the logical address. The structure of the page table can be the same as on the previous slide.
23 Paging
- Example: for an 8-bit addressing system and a page size of 32 bytes, we need 5 bits for the offset and 3 bits for the higher bits (the virtual page number) of the logical address. Assuming 8-bit physical addressing, the page table can be as follows (virtual page -> page frame):
  - 000 -> 010
  - 001 -> 111
  - 010 -> 011
  - 011 -> 000
  - 100 -> 101
  - 101 -> 001
  - 110 -> 100
  - 111 -> 110
- This changes logical address 011 01101 to 000 01101 (physical), and logical address 110 00110 to 100 00110 (physical).
24 Issues with Page Tables
- With a 4KB page size and a 32-bit address space, 1 million pages can be addressed, so the page table must have 1 million entries. One way of dealing with such a large page table is a multilevel page table: with it we can keep in memory only the parts of the table that are needed (see next slide).
- Another issue is that the mapping must be fast. For example, if an instruction takes 4 nsec, the page table lookup must be done in under 1 nsec.
25 Two-Level Scheme
- The first 2 bits index table 1, which always remains in main memory.
- The next 2 bits index table 2, which contains entries for pages that may not exist in memory. (table 2 itself is subject to paging)
- For example, if entry 00 of table 1 points to a second-level table whose entry 10 holds page frame 1011, the virtual address 0010 1100 changes to 1011 1100.
- Only the tables that are currently in use are kept in memory.
26 Structure of a Page Table Entry
- The modified bit, or dirty bit, shows whether the page has been modified in memory. When a page frame is reclaimed, a page with the dirty bit set must be written back to disk; otherwise it can simply be abandoned.
27 TLBs: Translation Lookaside Buffers
- Using a fast hardware lookup cache (typically 8 to 2048 entries) is the standard solution to speed up paging (see next slide).
- In general, the logical address is first searched for in the TLB. If the address is there and the access does not violate the protection bits, the page frame is taken directly from the TLB. On a miss, the page table is consulted; the entry looked up there replaces one of the TLB entries for future use.
28 TLBs: Translation Lookaside Buffers (figure)
29 Inverted Page Tables
- There is only one entry per page frame of real memory.
- Advantage: less memory is required. For example, with 256 MB of RAM and a 4K page size, 2^16 page frames fit in memory, so we need 65,536 (= 2^16) entries instead of the 2^52 entries required for a traditional page table with 64-bit addressing.
- Disadvantage: the whole table must be searched to find the entry for a requested page, because the table is not indexed by virtual page number (see the next slide).
30 Inverted Page Tables (figure)
31 Page Replacement Algorithms
- When a page fault occurs, the operating system has to choose a page to evict from memory.
- The Optimal Page Replacement Algorithm is based on knowledge of the future usage of each page: it evicts the page whose next use lies farthest in the future. It cannot be implemented, except when the program runs a second time and the system has recorded the page usage from the first run.
- The only use of the optimal algorithm is to compare its performance with that of the realizable algorithms.
32 NRU (Not Recently Used)
- This algorithm uses the M (modified) and R (referenced) bits of the page table entry. On each page fault it removes a page at random from the lowest-numbered non-empty class:
- Class 0: not referenced, not modified
- Class 1: not referenced, modified
- Class 2: referenced, not modified
- Class 3: referenced, modified
33 The Second Chance Replacement Algorithm
- FIFO keeps a list of pages with a head and a tail. On each page fault the oldest page, which is at the head, is removed.
- The problem with FIFO is that it may throw out heavily used pages just because they came in early.
- The second chance algorithm is an enhancement of FIFO that solves this problem.
- On each page fault it checks the R bit of the oldest page: if it is 0, the page is replaced; if it is 1, the bit is cleared and the page is put at the end of the list. (see the next slide)
34 The Second Chance Replacement Algorithm (figure)
35 The Clock Page Replacement Algorithm
- The problem with the second chance algorithm is inefficiency, because it constantly moves pages around its list.
- The clock page replacement algorithm solves this problem by keeping the page frames in a circle. On each page fault the page pointed to is inspected: if its R bit is 0, the page is evicted and the pointer (hand) is advanced one position; if R is 1, it is cleared and the hand is advanced to the next page. (see next slide)
36 The Clock Page Replacement Algorithm (figure)
37 The Least Recently Used (LRU) Page Replacement Algorithm
- LRU throws out the page that has not been used for the longest time. Implementations:
- A linked list of used pages. Expensive, because the list requires an update on each memory reference.
- A 64-bit hardware counter that stores the time of the last reference for each page. On each page fault the page with the lowest counter is the LRU page. Each page table entry must be large enough to contain the counter.
- A hardware n x n matrix for n page frames: on each reference to page k, first all bits of row k are set to 1 and then all bits of column k are set to 0. The row with the lowest binary value is the LRU page.
38 LRU with the matrix, for pages 0,1,2,3,2,1,0,3,2,3 (figure)
39 Simulating LRU in Software
- The NFU (Not Frequently Used) algorithm is an approximation of LRU that can be done in software.
- There is a software counter for each page. On each clock interrupt the O.S. adds the value (0 or 1) of the R bit to the counter of each page.
- The problem with NFU is that it never forgets the counts: a page that received many references in the past is never evicted, even if it has not been used recently.
40 Simulating LRU in Software
- To solve this problem, a modified algorithm known as aging works as follows:
- An 8-bit reference byte is kept for each page. At each clock tick the byte is shifted right one bit, and the R bit is inserted as the leftmost bit. The page with the lowest value is the LRU page. (see next slide)
- The difference between aging and true LRU is that if two pages both contain the value 0, aging cannot tell which one was used less recently beyond the last 8 clock ticks.
41 Aging algorithm simulation of LRU (figure)
42 More on Page Replacement Algorithms
- High paging activity is called thrashing. A process is thrashing if it is spending more time on paging than on execution.
- Usually processes exhibit locality of reference, meaning they reference only a small fraction of their pages at any time.
- After a process has been running and bringing in pages for a while, it has most of the pages it needs. This strategy is called demand paging: pages are loaded on demand, not in advance.
- Strategies that try to load the pages before the process runs are called prepaging.
43 Working Set Model
- The set of pages that a process is currently using is called its working set.
- If we consider the k most recent memory references, there is a wide range of k over which the working set is unchanged (see next slide). This means prepaging is possible.
- Prepaging algorithms guess which pages will be needed when a process restarts. This prediction is based on the process's working set when the process was stopped.
- At each page fault, working set page replacement algorithms check whether the faulting page is part of the working set of the current process.
- The WSClock algorithm is a modified working set algorithm based on the clock algorithm.
44 Working Set Model (figure: working set size as a function of k)
45 Summary of Page Replacement Algorithms (figure)
46 Modeling Page Replacement Algorithms
- For some page replacement algorithms, increasing the number of frames does not necessarily reduce the number of page faults. This strange situation has become known as Belady's Anomaly.
- For example, FIFO causes more page faults with four page frames than with three for a program with five virtual pages and the following reference string (see next slide):
- 0 1 2 3 0 1 4 0 1 2 3 4
47 (figure: FIFO with three and four page frames)
48 Stack Algorithms
- In these modeling algorithms, paging can be characterized by three items:
- 1- The reference string of the executing process
- 2- The page replacement algorithm
- 3- The number of page frames available in memory
- For example, using LRU page replacement, a virtual address space of eight pages, and a physical memory of four pages, the next slide shows the state of memory for each element of the reference string:
- 0 2 1 3 5 4 6 3 7 4 7 3 3 5 5 3 1 1 1 7 2 3 4 1
49 The State of Memory with the LRU Stack-Based Algorithm (figure)
50 The Distance String
- The distance of each reference from the top of the stack is called the distance string.
- The distance string depends on both the reference string and the paging algorithm, and it shows the performance of the algorithm. For example, in (a) most of the distances are between 1 and k, which shows that with a memory of k page frames few page faults occur. In (b) the references are so spread out that many page faults are generated.
51 Segmentation
- Although paging provides a large linear address space without having to buy more physical memory, using one linear address space may cause problems.
- For example, suppose that within a virtual address space running from 0 to some maximum address we want to allocate space for the different tables of a program that are built up as compilation proceeds. Since we have a one-dimensional memory, all five tables produced by the compiler must be allocated as contiguous chunks. The problem is that as the tables grow, one table may bump into another. (see the next slide)
52 (figure: compiler tables growing in a one-dimensional address space)
53 Segmentation
- Segmentation is a general solution to this problem: it provides the machine with many completely independent address spaces, called segments.
- Segments have variable sizes, and each of them can grow or shrink independently without affecting the others.
- For example, the next slide shows a segmented memory for the compiler tables.
54 (figure: a segmented memory for the compiler tables)
55 Advantages of Segmentation
- Simplified handling of growing and shrinking data structures.
- Since each segment starts at address 0, if each procedure occupies a separate segment, linking up separately compiled procedures is simplified: a call to a procedure in segment n uses the two-part address (n, 0), so size changes in other procedures do not affect its starting address, which is simply n.
- Sharing procedures or data between several processes is easier with segments (e.g., a shared library).
- Different segments can have different kinds of protection (e.g., read/write but not executable for an array segment).
- The major problem with segmentation is external fragmentation, which can be dealt with by compaction. (see next slide)
56 (figure: external fragmentation and compaction)
57 Segmentation with Paging
- If the segments are large, paging them means that only the pages actually needed have to be in memory.
- MULTICS (the OS of the Honeywell 6000) uses segmentation with paging. Each 34-bit MULTICS address has a segment number and an address within that segment. The segment number is used to find the segment descriptor. The segment descriptor points to the page table via an 18-bit address and contains the segment information. (see next slide)
58 (figure: a MULTICS segment descriptor)
59 Segmentation with Paging in MULTICS
- The segment number is extracted from the virtual address; then, if there is no segment fault, the page frame number can be found. If the page is in memory, the memory address of the page is extracted and added to the offset. (see next slide)
- If the page frame is not in memory, a page fault occurs.
- The problem is that this algorithm is slow, so MULTICS uses a 16-word TLB to speed up the search.
60 (figure)