Title: Chapter 8: Memory Management
1Chapter 8 Memory Management
- A program may not be executed until it is
  - associated with a process
  - brought into memory
- To allow multiprogramming, the OS must be able to allocate memory to each process
  - Several processes reside in memory at once
- Requires a memory-management scheme and appropriate hardware support
  - Security?
- The memory-management scheme has a large impact upon how a program for a particular platform must be designed and compiled
  - How much memory is available?
  - How should we bind addresses?
2Address Binding
- Instruction and data addresses in program source code are symbolic
  - goto errjmp
  - X = A + B
- These symbolic addresses must be bound to addresses in physical memory before the code can be executed
- Address binding: a mapping from one address space to another
- The address binding can take place at compile time, load time, or execution time
- Compile-time binding: the compiler generates absolute code
  - memory location must be known a priori
  - must recompile to move code
  - example: MS-DOS .COM format programs
3Load-time Binding
- Most modern compilers generate relocatable object code
  - symbolic addresses are bound to relocatable addresses
  - e.g., 286 bytes from the beginning of the module doomC.o
- The linkage editor (linker) combines the multiple modules into a relocatable executable
- The load module (loader) places the program in memory
  - The loader performs the final binding of relocatable addresses to absolute addresses
- Load-time binding: bind relocatable code to addresses on load
  - Must generate relocatable code
  - Memory location need not be known at compile time
  - If the starting address must change, we must reload the code
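The loader's final binding step can be sketched as follows. This is a minimal illustration, not a real object-file format: `RelocatableModule`, its `relocations` list, and the base address 14000 are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RelocatableModule:
    """Hypothetical relocatable module: words plus a relocation list."""
    words: list[int]        # code/data words; addresses are relative to 0
    relocations: list[int]  # indices of the words that hold an address


def load(module: RelocatableModule, base: int) -> list[int]:
    """Return a memory image with every relocatable address bound to base."""
    image = list(module.words)  # copy, so the module can be reloaded elsewhere
    for index in module.relocations:
        image[index] += base    # relocatable address -> absolute address
    return image


# Words 1 and 3 hold addresses (e.g. "286 bytes from the module start").
module = RelocatableModule(words=[10, 286, 7, 4], relocations=[1, 3])
print(load(module, base=14000))  # [10, 14286, 7, 14004]
```

Changing the starting address means calling `load` again with a new base, which mirrors the "must reload" cost of load-time binding.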
4Execution-time Binding
- A logical (or virtual) address space may be bound to a separate physical address space
  - Provides an abstraction of physical memory
  - Logical (virtual) address: generated by the CPU
  - Physical address: address seen by the memory unit
- The user program deals with logical addresses; it never sees the real physical addresses
- Memory-Management Unit (MMU): hardware device that translates CPU-generated logical addresses into physical memory addresses
- Execution-time binding: binding delayed until run time
  - process can be moved during its execution from one memory segment to another
  - logical and physical addresses differ (requires mapping)
  - requires hardware and OS support for address mapping
5Memory-Management Unit (MMU)
- Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme
- The user program deals with logical addresses; it never sees the real physical addresses
- The MMU is the hardware device that maps virtual to physical addresses
- In the most basic MMU scheme, all logical addresses begin at 0, and the base register is replaced by a relocation register
  - The value in the relocation register is added to every logical address generated by a user process at the time it is sent to memory, producing the physical address
- To move the program, simply change the value in the relocation register
  - The limit register remains unchanged
- Thus, each logical address is bound to a physical address
  - Is security maintained?
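The relocation-register scheme can be sketched in a few lines. This is an illustrative model only: the function name `translate`, the `AddressError` exception, and the register values are assumptions chosen for the example.

```python
class AddressError(Exception):
    """Raised when a logical address falls outside the process's range."""


def translate(logical: int, relocation: int, limit: int) -> int:
    """Model of a basic MMU: check against limit, then add relocation."""
    if not 0 <= logical < limit:
        raise AddressError(f"logical address {logical} outside [0, {limit})")
    return logical + relocation


print(translate(346, relocation=14000, limit=1000))  # 14346
# Moving the process only changes the relocation register:
print(translate(346, relocation=20000, limit=1000))  # 20346
```

The limit check is what answers the security question: an address that passes it cannot land outside the process's own partition.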
6Can we reduce memory requirements?
- Loading: placing the program in memory
- Dynamic loading: a routine is not loaded until it is called
  - Program must check and load before calling
  - If a needed routine is not in memory, the relocatable linking loader loads the routine and updates the program's address tables
- Better memory-space utilization: an unused routine is never loaded
  - Size of executable is unchanged
  - Runtime footprint is smaller
- Useful when large amounts of code are needed to handle infrequently occurring cases
- No special support from the operating system is required
  - Implemented through program design
7Can we reduce executable size?
- Linking: combining object modules into an executable
- Older OSes typically require static linking
  - All library routines become part of the executable
- Modern OSes often allow dynamic linking
  - Linking postponed until execution time
- Instead of placing the code for each library routine in the executable, include only a stub (a small piece of code) which
  - locates the appropriate memory-resident library routine
  - replaces itself with the address of the routine, and executes the routine
- Executable footprint is reduced
  - program will not run without the libraries
  - New (minor) versions of the library do not require recompilation
- Some operating systems provide support for sharing the memory associated with library modules between processes (shared libraries)
  - Very efficient: no read() required, less overall memory usage
8What if there isn't enough memory?
- How can we execute an executable whose code footprint is larger than the memory available?
  - This was a major problem in the 60s and 70s for general-purpose computers and remains a major problem on small devices
  - Consider memory usage in an e-mail pager or ISDN box
- Solution: keep in memory only those instructions and data that are needed at any given time (overlay at run time)
  - Overwrite this memory with a new set of instructions and data when we get to a significantly different part of the code
  - Each set of instructions/data is an overlay
- Programming design of the overlay structure is non-trivial
- No special support needed from the operating system
  - Implemented by user design
- Modern general-purpose OSes use virtual memory to deal with this problem
9How does the OS allocate memory?
- Contiguous allocation scheme: all memory granted to a process must be contiguous
- Single-partition contiguous allocation
  - Only one partition exists in memory for user processes
  - Only one user process is granted memory at a time
- The resident operating system must also be held in memory
  - OS size changes as transient code is loaded
- Place the OS in low memory; use the relocation register to define the beginning of the user partition
  - Relocation register protects the OS code and data
  - Allows relocation of user code if OS requirements change
  - Relocation register contains the value of the smallest physical address; limit register contains the range of logical addresses; each logical address must be less than the limit register
- To change context, must swap out main memory to a backing store
10Swapping
- A process can be suspended and swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
- Backing store: usually a fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
  - swap may also be from memory (conventional) to memory (extended)
- Roll out, roll in: swapping variant used for priority-based scheduling algorithms (or round-robin with a huge quantum); a lower-priority process is swapped out so a higher-priority process can be loaded and executed
- Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
- Requires execution-time binding if the process can be restored to a different memory space than it occupied previously
- OS management of I/O buffers is required to swap a process awaiting I/O
- Modified versions of swapping are found on many systems, e.g., UNIX and Microsoft Windows
11Swapping in Single Partition Scheme
12Contiguous Allocation (Cont.)
- For multiprogramming systems it is far more efficient to allow several user processes to hold memory at once
  - The OS must keep track of the size and owner of each partition
  - The OS must determine how and where to allocate new requests
- Multiple-partition contiguous allocation
  - Fixed-partition: memory is pre-partitioned; the OS must assign each process to the best free partition
  - Hard limit on the number of processes in memory
  - Efficient?
13Contiguous Allocation (Cont.)
- Multiple-partition contiguous allocation
  - Dynamic allocation: memory is partitioned by the OS on the fly
- Operating system maintains information about (a) allocated partitions and (b) free partitions (holes)
- Hole: a block of available memory; holes of various sizes are scattered throughout memory
- When a process arrives, it is allocated memory from a hole large enough to accommodate it
14Dynamic Storage-Allocation Problem
- How do we satisfy a request of size n from a list of free holes? Optimization metrics include speed and storage utilization.
- First-fit: allocate the first hole that is big enough. Search begins at the top of the list. Fast search.
- Next-fit: allocate the first hole that is big enough. Search begins where the last search ended. Fast search.
- Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size. Produces the smallest leftover hole.
- Worst-fit: allocate the largest hole; must also search the entire list, unless it is ordered by size. Produces the largest leftover hole.
- Simulation shows that
  - First-fit is better (in terms of storage utilization) than worst-fit
  - First-fit is as good (in terms of storage utilization) as best-fit
  - First-fit is faster than best-fit
  - Next-fit is generally better than first-fit
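The four placement strategies can be compared on a small hole list. This is a sketch under simplifying assumptions: holes are modeled only by their sizes, each function returns the index of the chosen hole (or None), and the hole sizes and request size are made-up example values.

```python
def first_fit(holes, size):
    """Index of the first hole that is big enough, scanning from the top."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None


def next_fit(holes, size, start):
    """Like first-fit, but the scan begins at index start and wraps around."""
    for step in range(len(holes)):
        i = (start + step) % len(holes)
        if holes[i] >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is big enough."""
    fits = [i for i, hole in enumerate(holes) if hole >= size]
    return min(fits, key=lambda i: holes[i], default=None)


def worst_fit(holes, size):
    """Index of the largest hole, provided it is big enough."""
    fits = [i for i, hole in enumerate(holes) if hole >= size]
    return max(fits, key=lambda i: holes[i], default=None)


holes = [100, 500, 200, 300, 600]  # free hole sizes, in list order
print(first_fit(holes, 212))          # 1 (the 500 hole)
print(next_fit(holes, 212, start=2))  # 3 (the 300 hole)
print(best_fit(holes, 212))           # 3 (leftover 88)
print(worst_fit(holes, 212))          # 4 (leftover 388)
```

First-fit and next-fit can stop at the first match, while best-fit and worst-fit inspect every hole here because the list is not ordered by size.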
15Fragmentation
- How do we measure storage utilization?
  - How much space is wasted?
- Internal fragmentation: allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used
  - Problem in fixed-partition allocation
- External fragmentation: enough total memory space exists to satisfy a request, but it is not contiguous
  - Problem in dynamic allocation
  - 50-percent rule: simulations show that for every n allocated blocks, another n/2 blocks are lost to fragmentation, so about 1/3 of memory is unusable
- External fragmentation can be reduced by compaction
  - Shuffle memory contents to place all free memory together in one large block
  - Compaction is possible only if relocation is dynamic and done at execution time, and if the OS provides I/O buffers so that devices don't DMA into relocated memory
16Non-Contiguous Memory Allocation
- Goal: reduce memory loss to external fragmentation without incurring the overhead of compaction
- Solution: abandon the requirement that allocated memory be contiguous
- Non-contiguous memory allocation approaches include
  - Paging: allow the logical address space of a process to be noncontiguous in physical memory. This complicates the binding (MMU) but allows the process to be allocated physical memory wherever it is available.
  - Segmentation: allow the division of a process into many logically connected components. Each begins at its own (local) virtual address 0.
    - This allows many other useful features, including protection permissions on a per-segment basis
    - Example segmentation: Text, Data, Stack
  - Segmentation with paging: hybrid approach
17Paging
- Physical memory is broken up into fixed-size partitions called frames
- Logical memory is broken up into frame-size partitions called pages
- The OS keeps track of all free frames
- Frame size = page size (a power of 2, usually 512 bytes to 8 KB)
- To run a program of size n pages, need to find n free frames and load the program
- Internal fragmentation (average of 50% of one page per process)
- Logical addresses must be mapped to physical addresses
  - Set up a page table to note which frame holds each page
- Logical address generated by the CPU is divided into
  - Page number (p): used as an index into a page table, which contains the base address of each page in physical memory
  - Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit
18Paging Example
19Implementing Paging
- Paging is transparent to the process (memory is still viewed as contiguous)
- Divide an m-bit logical address for a system with pages of size $2^n$ into
  - n-bit page offset (d)
  - (m-n)-bit page number (p)
- The page number p is an index into the page table, which stores the location of the frame
- Frames and pages are the same size, thus the displacement within a page is also the displacement within the frame
- Mapping is
  - Physical address = page-table(p) + d
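The split into page number and offset can be sketched with shifts and masks. This is an illustration only: the page table here stores frame numbers (so the frame's base address is the frame number shifted left by n bits), and the tiny 4-byte page size and table contents are assumed example values.

```python
def split(logical: int, n: int) -> tuple[int, int]:
    """Split a logical address into (page number p, page offset d)."""
    return logical >> n, logical & ((1 << n) - 1)


def translate(logical: int, page_table: list[int], n: int) -> int:
    """Physical address = base address of the frame holding page p, plus d."""
    p, d = split(logical, n)
    frame = page_table[p]    # an IndexError here models an illegal page
    return (frame << n) | d  # frame base address combined with the offset


# 4-byte pages (n = 2); page i lives in frame page_table[i].
page_table = [5, 6, 1, 2]
print(translate(0, page_table, n=2))   # 20: page 0, offset 0 -> frame 5
print(translate(3, page_table, n=2))   # 23: page 0, offset 3 -> frame 5
print(translate(13, page_table, n=2))  # 9:  page 3, offset 1 -> frame 2
```

Because frames and pages are the same size, the offset passes through unchanged; only the page number is replaced.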
20Address Translation Architecture
21Page Size
- How large should a page be?
  - Smaller pages reduce internal fragmentation
  - Larger pages reduce the number of page-table entries
- If s is the average process size, p is the page size (in bytes), and e is the number of bytes per page-table entry, then the overhead per process is about $s e / p + p / 2$, which is smallest at $p = \sqrt{2 s e}$
- For current process sizes and available physical memory, optimal page sizes range between 512 bytes and 8 KB
- Page table must be kept in main memory
  - Why? If a page is 4 KB (12 offset bits) and the CPU uses 32-bit addresses, then there are $2^{20}$ possible pages per process
  - The number of bits per entry depends upon the size of physical memory
  - The memory consumed by this table is overhead/waste
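The trade-off between page-table size and internal fragmentation can be worked through with the symbols s, p, and e defined above. This assumes the usual textbook model, in which a process needs s/p page-table entries of e bytes each and wastes half a page on average; the values s = 1 MB and e = 8 bytes are assumptions chosen for the example.

```latex
\begin{align*}
\text{overhead}(p) &= \frac{s\,e}{p} + \frac{p}{2} \\
\frac{d}{dp}\,\text{overhead}(p) &= -\frac{s\,e}{p^{2}} + \frac{1}{2} = 0
  \quad\Longrightarrow\quad p = \sqrt{2\,s\,e} \\
s = 2^{20}\ \text{bytes},\ e = 8\ \text{bytes}: \quad
  p &= \sqrt{2 \cdot 2^{20} \cdot 2^{3}} = \sqrt{2^{24}} = 2^{12} = 4096\ \text{bytes}
\end{align*}
```

Under these assumed values the optimum lands at 4 KB, inside the 512-byte to 8 KB range.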
22Implementation of Page Table
- The page table must be kept in main memory
- Page-table base register (PTBR) points to the page table
  - add PTBR + page number (p) to get the lookup address
- Page-table length register (PTLR) indicates the size of the table
  - Only make the page table as large as necessary
  - Addresses in unallocated pages cause an exception
- For each CPU memory access there are two physical accesses
  - access the page table (in memory) to retrieve the frame
  - access the data/instruction
- The inefficiency of this two-memory-access solution can be reduced by the use of a special fast-lookup hardware cache for the page table
  - associative registers or translation look-aside buffers (TLBs)
  - Hit ratio: the percentage of lookups for which the necessary entry is present in the cache
  - otherwise, get the entry from the page table in main memory
23Effective Access Time
- Effective Access Time (EAT) is a weighted average
  - $t_{TLB}$ = time required for a TLB lookup
  - $t_{mem}$ = time required for an access to main memory
  - $\alpha$ = hit ratio
  - $EAT = \alpha (t_{TLB} + t_{mem}) + (1 - \alpha)(t_{TLB} + t_{mem} + t_{mem})$
- Even for fairly small TLBs, hit ratios of .98 - .99 are common
  - Most programs refer to memory very sequentially and locally
  - The 32-entry TLB in the 486 generally has a .98 hit ratio
- Thus, we can implement paging without suffering a significant latency cost
  - Try it with a TLB search of 20 ns, memory access of 100 ns, and hit ratios of .80 and .98
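The suggested exercise can be worked through with a small function. This is a sketch of the weighted average described above; the function name and the `levels` parameter (the number of page-table accesses on a TLB miss, 1 for a single-level table) are assumptions added for the example.

```python
def effective_access_time(hit_ratio, t_tlb, t_mem, levels=1):
    """Weighted average of the TLB-hit and TLB-miss access times."""
    hit = t_tlb + t_mem                    # TLB lookup + the access itself
    miss = t_tlb + levels * t_mem + t_mem  # plus one access per table level
    return hit_ratio * hit + (1 - hit_ratio) * miss


print(f"{effective_access_time(0.80, 20, 100):.1f} ns")  # 140.0 ns
print(f"{effective_access_time(0.98, 20, 100):.1f} ns")  # 122.0 ns
```

Raising the hit ratio from .80 to .98 cuts the slowdown over a bare 100 ns access from 40 percent to 22 percent.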
24Memory Protection
- Protection bits are included for each entry in the page table
  - Valid-invalid bit: indicates if the associated page is in the process's logical address space, and is thus a legal page
    - Machines which have a PTLR can avoid the wasted page-table entries otherwise needed to hold the invalid bit
  - RO/RW/X bits: indicate if the page should be considered read-only, read-write, and/or executable
- Protection exceptions are calculated in parallel with the physical address (after the page-table lookup)
- Page tables allow processes to share memory by having their page tables point to the same frame
  - Note: processes cannot reference physical memory that the OS does not allow them to via page-table setup
- The OS keeps a frame table (one entry per frame) which indicates whether each frame is full or empty, to which process the frame is allocated, when it was last referenced, etc.
- Memory protection is implemented by associating protection bits with each frame
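The per-entry protection check can be sketched as follows. The `PageTableEntry` layout, the `ProtectionFault` exception, and the one-letter access codes are hypothetical; real hardware encodes these bits in an architecture-specific format.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access violates the page's protection bits."""


@dataclass
class PageTableEntry:
    frame: int
    valid: bool = True        # valid-invalid bit
    writable: bool = False    # RO vs. RW
    executable: bool = False  # X


def check_access(entry: PageTableEntry, access: str) -> int:
    """Return the frame if access ('r', 'w', or 'x') is permitted."""
    if not entry.valid:
        raise ProtectionFault("page is not in the logical address space")
    if access == "w" and not entry.writable:
        raise ProtectionFault("write to a read-only page")
    if access == "x" and not entry.executable:
        raise ProtectionFault("execute from a non-executable page")
    return entry.frame


text_page = PageTableEntry(frame=7, executable=True)  # read-only code
print(check_access(text_page, "x"))  # 7
```

A write to `text_page` raises `ProtectionFault`, which corresponds to the exception the hardware would deliver to the OS.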
25Shared Pages
- Private code and data
  - Each process keeps a separate copy of the code and data
- Shared code
  - To be sharable, code must be reentrant (or pure)
  - All non-self-modifying code is pure: it never changes during execution (i.e., read-only code)
  - Each process has its own copy of registers and data storage to hold the data for its execution
  - One copy of reentrant code can be shared among processes (e.g., text editors, compilers, window systems)
- Problem: shared code must appear at the same location in the logical address space of each process
  - internal branch and memory addresses must be consistent
26Shared Pages Example
27Two-Level Paging
- Consider a page table for a 32-bit logical address space on a machine with a 32-bit physical address space and 4 KB pages
  - logical space / page size = $2^{32} / 2^{12} = 2^{20}$ entries
  - physical space / frame size = $2^{32} / 2^{12} = 2^{20}$ frames, so 20 bits/entry + 12 protection bits = 4 bytes/entry
  - Page table size = $2^{20}$ entries x 4 bytes/entry = 4 MB
- 4 MB >> 4 KB: the page table itself is larger than one page!
  - We can't allocate the page table in contiguous memory
- We must page the page table! The page number is divided into
  - How many 4-byte entries per 4 KB page? $2^{12} / 2^{2} = 2^{10}$
    - a 10-bit page offset
  - How many bits remain? 20 - 10 = 10
    - a 10-bit page number
- Thus, a logical address is divided into p1, an index into the outer page table; p2, the displacement within the page of the page table that the outer entry selects; and the 12-bit page offset d
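The 10/10/12 split can be sketched with shifts and masks. This is a minimal model: the outer and inner tables are Python dicts standing in for sparse in-memory tables, and the example address and frame number are assumed values.

```python
OFFSET_BITS = 12  # 4 KB pages
INDEX_BITS = 10   # 1024 four-byte entries fit in one 4 KB page


def split_two_level(logical: int) -> tuple[int, int, int]:
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = logical & ((1 << OFFSET_BITS) - 1)
    p2 = (logical >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    p1 = (logical >> (OFFSET_BITS + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return p1, p2, d


def translate(logical: int, outer_table: dict) -> int:
    """Walk the outer table, then the selected inner table, then add d."""
    p1, p2, d = split_two_level(logical)
    inner_table = outer_table[p1]  # first memory access
    frame = inner_table[p2]        # second memory access
    return (frame << OFFSET_BITS) | d


outer_table = {1: {3: 0x2A}}  # only the mapped pieces are present
print(split_two_level(0x00403004))              # (1, 3, 4)
print(hex(translate(0x00403004, outer_table)))  # 0x2a004
```

Only the inner tables that are actually in use need to exist, which is the memory saving that motivates paging the page table.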
28Two-Level Page-Table Scheme
29Multilevel Paging Performance
- The concept can be extended to any number of page-table levels
- Since each level is stored as a separate table in memory, converting a logical address to a physical one may take many memory accesses
- Even though the time needed for one memory access is increased, caching (via the TLB) permits performance to remain reasonable
- Example: in a system with a two-level paging scheme, a memory access time of 100 ns, and a 20 ns TLB with a hit rate of 98 percent
  - effective access time = $0.98 \times (20 + 100) + 0.02 \times (20 + 100 + 100 + 100) = 124$ nanoseconds
  - which is only a 24 percent slowdown in memory access time
30Inverted Page Table
- Problem: each process requires its own page table, which consists of many entries (possibly millions). How can we reduce this overhead?
- Solution: the number of frames is fixed (and shared between the processes). Store the process/page information by frame!
  - One entry for each real page (frame) of memory
  - Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
- Concern: decreases the memory needed to store page tables, but increases the time needed to search the table when a page reference occurs
  - Use a hash table to limit the search to one or at most a few page-table entries
  - the hash table requires another memory lookup (of course)
- Concern for later: the use of an inverted page table does not obviate the need for a normal page table in demand-paged systems (ch. 9)
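The frame-indexed table with a hash in front of it can be sketched as follows. The class and method names are hypothetical, and a Python dict stands in for the hardware or OS hash table.

```python
class InvertedPageTable:
    """One entry per physical frame, plus a hash index for fast lookup."""

    def __init__(self, num_frames: int):
        self.frames = [None] * num_frames  # frame -> (pid, page), or None
        self.index = {}                    # hash: (pid, page) -> frame

    def map(self, pid: int, page: int) -> int:
        """Place (pid, page) in the first free frame and return that frame."""
        frame = self.frames.index(None)    # raises ValueError if memory is full
        self.frames[frame] = (pid, page)
        self.index[(pid, page)] = frame
        return frame

    def lookup(self, pid: int, page: int):
        """Return the frame holding (pid, page), or None if it is not mapped."""
        return self.index.get((pid, page))


table = InvertedPageTable(num_frames=4)
table.map(pid=7, page=0)
table.map(pid=9, page=0)
print(table.lookup(7, 0))  # 0
print(table.lookup(9, 0))  # 1
print(table.lookup(9, 5))  # None
```

The table's size depends on the number of frames, not on how many processes exist or how large their address spaces are.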
31Inverted Page Table Architecture
32Segmentation
- Segmentation is a non-contiguous memory allocation scheme
  - simpler than paging, but not as efficient
  - supports the user view of memory
- Programmers tend not to consider memory as a linear array of bytes; they prefer to view memory as a collection of variable-sized segments
  - Never forget, however, that memory is a linear array of bytes
- A segment is a logical unit such as
  - main program, procedure, function, local variables, global variables, common block, stack, symbol table, arrays, etc.
- Segmentation is a memory-management scheme that supports this user view of memory
  - segments are numbered and referred to by that number
  - a logical address consists of a segment number and an offset
- A mapping between segments and physical addresses must be performed
33Logical View of Segmentation
[Figure: segments 1-4 in user space mapped into physical memory space]
34Segmentation Architecture
- Logical address consists of a two-tuple
  - <segment-number, offset>
- Segment table: maps two-dimensional logical addresses to one-dimensional physical addresses; each table entry has
  - base: contains the starting physical address where the segment resides in memory
  - limit: specifies the length of the segment
- Segment-table base register (STBR) points to the segment table's location in memory
- Segment-table length register (STLR) indicates the number of segments used by a program
  - segment number s is legal if s < STLR
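Segment-table translation, including both legality checks, can be sketched as follows. The table contents are assumed example values, the list length plays the role of the STLR, and `SegmentFault` is a hypothetical exception name.

```python
class SegmentFault(Exception):
    """Raised for an illegal segment number or an out-of-range offset."""


def translate(segment: int, offset: int, segment_table) -> int:
    """Map <segment-number, offset> to a physical address."""
    if not 0 <= segment < len(segment_table):  # s must be less than STLR
        raise SegmentFault(f"illegal segment {segment}")
    base, limit = segment_table[segment]
    if not 0 <= offset < limit:                # offset must be within limit
        raise SegmentFault(f"offset {offset} beyond limit {limit}")
    return base + offset


# (base, limit) for each segment number
segment_table = [(1400, 1000), (6300, 400), (4300, 400), (3200, 1100)]
print(translate(2, 53, segment_table))   # 4353
print(translate(3, 852, segment_table))  # 4052
```

A reference such as segment 0 with offset 1222 raises `SegmentFault`, because that segment is only 1000 bytes long.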
35Segmentation Architecture (Cont.)
- Relocation
  - dynamic (execution-time)
  - by segment table
- Sharing
  - similar to sharing in a paged system
  - shared segments
    - must have the same segment number in each program
  - protection/sharing bits in each segment-table entry
- Memory allocation
  - segments vary in length
  - dynamic-storage problem: first fit/best fit?
- External fragmentation
  - segmentation doesn't use frames, thus external fragmentation exists
  - periodic compaction may be necessary, and is possible because dynamic relocation is supported
36Sharing of segments
37Hybrid Segmentation with Paging
- Segmentation and paging each have advantages and disadvantages
  - segmentation suffers from dynamic allocation problems
    - lengthy search time for a memory hole
    - external fragmentation can waste significant resources
  - paging reduces dynamic allocation problems
    - quick search (just find enough empty frames if they exist)
    - eliminates external fragmentation
    - Note: it does introduce internal fragmentation
- Solution: page the segments!
  - First seen in MULTICS, dominates current allocation schemes
  - Solution differs from pure segmentation in that the segment-table entry contains not the base address of the segment, but rather the base address of a page table for the segment
38MULTICS Address Translation Scheme
39Generalized Summary
- Parkinson's Law: programs expand to fill available memory
- Mono-programmed systems
  - One user process in memory
  - OS and device drivers also present
  - Overlays used to increase program size
  - Relocatable at compile time only
  - Protection: base and limit register
- Multi-programmed systems/fixed number of tasks (OS/360 MFT)
  - Memory allocation on fixed-sized/numbered partitions
  - Queue for each partition size
  - Relocatable at load time
  - Protection: base and limit register, or protection code (pid) if multiple non-contiguous blocks are allowed
40Generalized Summary
- Multi-programmed and time-shared systems with variable partitions
  - Memory manager must keep track of partitions and holes
  - Dynamic allocation algorithm: first-fit, next-fit, best-fit, etc.
  - Compaction to reduce external fragmentation
  - Protection
    - relocation (base) register and limit register, or
    - virtual addresses: the OS produces the physical address; user programs cannot generate addresses which belong to other processes
  - Relocatable during execution (or no compaction possible)
    - Change relocation register value or page-to-frame mapping