Memory Management Motivation - PowerPoint PPT Presentation

Provided by: csBg
1
Memory Management - Motivation
  • n processes, each spending a fraction p of their
    time waiting for I/O, gives a probability of p^n
    of all processes waiting for I/O simultaneously
  • CPU utilization = 1 - p^n

2
Utilizing Memory
  • Assume each process takes 200k and so does the
    operating system
  • Assume there is 1 MB of memory available and that
    p = 0.8
  • space for 4 processes: about 60% CPU
    utilization
  • Another 1 MB enables 9 processes:
  • about 87% CPU utilization
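
The figures on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and 1 MB is assumed to mean 1024k.

```python
def cpu_utilization(n, p):
    """Probability that at least one of n processes is NOT waiting for I/O."""
    return 1 - p ** n


def processes_that_fit(memory_kb, process_kb=200, os_kb=200):
    """User processes that fit after the operating system takes its share."""
    return (memory_kb - os_kb) // process_kb


for memory_kb in (1024, 2048):
    n = processes_that_fit(memory_kb)
    print(f"{memory_kb}k: {n} processes, "
          f"utilization {cpu_utilization(n, 0.8):.2%}")
```

The model assumes the processes wait for I/O independently, which is an idealization.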

3
Issues - Relocation and Linking
  • Compile time - create absolute code
  • Load time - linker lists relocatable
    instructions and loader changes instructions (at
    each reload..)
  • Execution time - special hardware needed to
    support moving of processes during run time
  • Dynamic Linking - used with system libraries and
    includes only a stub in each user routine,
    indicating how to locate the memory-resident
    library function (or how to load it, if needed)

4
Multiprogramming with fixed partitions
  • How to organize the memory?
  • How to assign jobs to partitions?
  • Separate queues vs. single queue

5
Allocating memory - growing segments
6
Memory allocation - Keeping track (bitmaps
and linked lists)
7
Strategies for Allocation
  • First fit - do not search too much
  • Next fit - start search from last location
  • Best fit - a drawback: generates small holes
  • Worst fit - solves the above problem badly
  • Quick fit - several queues of different sizes
  • An example of an elaborate scheme: the Buddy system
    (Knuth 1973)
  • Separate lists of free holes of sizes of powers
    of two
  • For any request, pick the 1st hole of the right
    size
  • Not very good memory utilization
  • Freed blocks can only be merged with their own
    size
  • Main problem of memory allocation -
    Fragmentation
  • Internal: wasted parts of allocated space
  • External: wasted unallocated space
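
A minimal sketch of how first fit and best fit choose a hole, and of how the buddy system rounds a request up to a power of two. The hole representation (a list of (start, length) pairs) and all function names are assumptions made for this example.

```python
def first_fit(holes, size):
    """Index of the first hole that is large enough, or None."""
    for i, (_start, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is still large enough, or None."""
    candidates = [(length, i) for i, (_start, length) in enumerate(holes)
                  if length >= size]
    return min(candidates)[1] if candidates else None


def buddy_block_size(size):
    """Smallest power of two that can hold the request (buddy system)."""
    block = 1
    while block < size:
        block *= 2
    return block


holes = [(0, 50), (100, 20), (200, 30)]    # (start, length) pairs
print(first_fit(holes, 18))                # picks the 50-unit hole
print(best_fit(holes, 18))                 # picks the 20-unit hole, leaving a tiny remainder
print(buddy_block_size(70) - 70)           # internal fragmentation of a 70-unit request
```

Best fit leaves a 2-unit remainder here, which illustrates the "small holes" drawback; the buddy rounding shows where internal fragmentation comes from.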

8
Memory Protection
  • Hardware
  • History: the IBM 360 had a 4-bit protection code in
    the PSW and memory in 2k partitions - the process code in
    the PSW must match the memory partition code
  • Two registers - base and limit
  • base is added by hardware without changing
    instructions: dynamic relocation
  • every request is checked against limit:
    runtime bound checking
  • Reminder: in the IBM PC there are segment
    registers (but no limit)
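
The base and limit scheme can be sketched as follows. The names ProtectionFault and translate_base_limit are hypothetical; real hardware performs this check on every memory reference.

```python
class ProtectionFault(Exception):
    """Raised when an address falls outside the process's partition."""


def translate_base_limit(address, base, limit):
    # Runtime bound check: the program-relative address must be below limit.
    if not 0 <= address < limit:
        raise ProtectionFault(f"address {address} outside limit {limit}")
    # Dynamic relocation: base is added without changing the instruction.
    return base + address


print(translate_base_limit(100, base=4000, limit=2000))
```

Moving the process only requires changing the base register, which is why no instructions need to be rewritten.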

9
Managing memory by Swapping
  • Processes from disk to memory and from memory to
    disk
  • Whenever there are too many jobs to fit in memory
  • To use memory more efficiently - variable
    partitions
  • Allocating memory
  • Freeing memory and holes
  • possible solution: memory compaction
  • some form of swapping is required with any
    multiprogramming
  • since swapping is performed on whole processes
    it results in a noticeable response time
  • longer queues of blocked processes can lead to
    many swaps..
  • Allocating swap space
  • Processes are swapped in/out from the same
    location
  • Allocate space for non-memory resident processes
    only

10
Paging and Virtual Memory
  • Divide memory into fixed-size blocks
    (page-frames)
  • Small enough blocks - many for one process
  • Allocate to processes non-contiguous memory
    chunks - avoiding holes..
  • 2^32 addresses for a 32-bit (address bus) machine
    - virtual addresses
  • A memory management unit (MMU) does the mapping
    to physical addresses
  • pages -> page frames
  • Machine instructions reference addresses: more
    than one address per instruction, plus fetching
    instructions
  • absolute code becomes meaningless..
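
A sketch of the page -> page frame mapping that the MMU performs. The 4k page size, the dictionary used as a page table, and the names below are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size


class PageFault(Exception):
    """Raised when the referenced page has no page frame."""


def translate_paged(vaddr, page_table):
    # Split the virtual address into page number and offset within the page.
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)
    # The offset is unchanged; only the page number is replaced by the frame.
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}        # page number -> page frame number
print(translate_paged(4100, page_table))
```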

11
Memory Management Unit
12
Paging
13
MMU Operation - page fault if accessed page is
absent
14
Page table considerations
  • Can be very large (1M pages for 32-bit addresses)
  • Must be fast (every instruction needs it)
  • One extreme will have it all in hardware - fast
    registers that hold the page table and are loaded
    with each process; too expensive for the above
    size
  • The other extreme has it all in memory, using a
    page table base register (PTBR) to point to it -
    each memory reference during instruction
    translation is doubled...
  • To avoid keeping complete page tables in memory -
    make them multilevel (and avoid the danger of
    accumulating memory references per instruction by
    caching)
  • Example: a fast cache (additional 20 ns) and a 98%
    hit ratio, on a four-level page table, for a 100
    nanoseconds memory access machine
  • effective access time = 0.98 x 120 + 0.02 x 520
    = 128 nanosecs
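
The arithmetic behind the 128 ns figure, as a small sketch. The function name is hypothetical. A hit costs one cache lookup plus one memory access; a miss costs the cache lookup, one memory access per page-table level, and the final memory access.

```python
def effective_access_ns(hit_ratio, cache_ns, memory_ns, levels):
    hit_cost = cache_ns + memory_ns                        # 20 + 100
    miss_cost = cache_ns + levels * memory_ns + memory_ns  # 20 + 4 x 100 + 100
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


print(round(effective_access_ns(0.98, 20, 100, levels=4), 1))
```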

15
Page Tables - Handling the size problem
16
SPARC 3-level paging: Context table (MMU
hardware) - 1 entry per process
17
Associative Memory - content addressable
memory
  • page insertion - complete entry from page
    table
  • page deletion - just the modified bit to
    page table
18
Associative Memory - comments
  • With a large enough hit-ratio the average access
    time is close to 0
  • linked lists, for example, are bad..
  • Only a complete virtual address (all levels) can
    be counted as a hit
  • with multi-processing associative memory can be
    cleared on context switch - wasteful..
  • Add a field to the associative memory to hold
    process ID and a special register for PID

19
No page tables - MIPS R2000
  • 64 entry associative memory for virtual pages
  • if not found, TRAP to the operating system
  • software uses some hardware registers to find the
    virtual page needed
  • a second trap may happen by page fault...

20
Inverted page tables
  • for very large memories (page tables) one can
    have an inverted page table sorted by
    (physical) page frames
  • IBM RT, HP Spectrum (thinking of 64-bit
    memories)
  • to avoid a linear search for every virtual
    address of a process, use a hash table (one or a
    few memory references)
  • only one page table - the physical one - for all
    processes currently in memory
  • in addition to the hash table, associative
    memory registers are used to store recently used
    page table entries
  • the only way to deal with a 64-bit memory: with 4k
    size pages, two-level page tables can result in
    2^42 entries
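
A minimal sketch of an inverted page table with a hash index. The class and method names are invented for this example, and a Python dict stands in for the hardware-assisted hash table.

```python
class InvertedPageTable:
    def __init__(self, num_frames):
        # One entry per physical page frame: (process id, virtual page) or None.
        self.frames = [None] * num_frames
        # Hash index that avoids a linear search over all frames.
        self.index = {}

    def map(self, frame, pid, vpage):
        old = self.frames[frame]
        if old is not None:
            del self.index[old]          # the previous owner loses the frame
        self.frames[frame] = (pid, vpage)
        self.index[(pid, vpage)] = frame

    def lookup(self, pid, vpage):
        """Frame holding (pid, vpage), or None, which means a page fault."""
        return self.index.get((pid, vpage))


ipt = InvertedPageTable(num_frames=4)
ipt.map(2, pid=7, vpage=0x10)
print(ipt.lookup(7, 0x10), ipt.lookup(8, 0x10))
```

The table size depends on the number of physical frames, not on the size of the virtual address space.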

21
Inverted Page Table Architecture
22
Pages: the data. Page frames: the physical memory
locations
  • Page Table Entries (PTE) contain (per page):
  • Page frame number (physical address)
  • Present/absent bit (valid bit)
  • Dirty (modified) bit
  • Referenced (accessed) bit
  • Protection
  • Caching disable/enable

23
Page fault Handling
  • 1. trap to kernel, save PC on stack and
    (sometimes) partial state in registers (and/or
    stack)
  • 2. assembly routine saves volatile information
    and calls the operating system
  • 3. find requested virtual page
  • 4. check protection. If legal, find free page
    frame (or invoke page replacement algorithm)
  • 5. if replacing, check if modified and start
    write to disk. Mark frame busy. Call scheduler
    to block process until the write-to-disk process
    has completed.

24
Page fault Handling (contd.)
  • 6. transfer of requested page from disk
    (scheduler runs alternative processes)
  • 7. upon transfer completion, enter page table,
    mark new page as valid and update all other
    parameters
  • 8. back up the faulted instruction, which was in
    principle in mid-execution; now the PC can be
    set back to its initial value
  • 9. schedule faulting process, return from
    operating system
  • 10. restore state (i.e. all volatile information
    stored by the assembly routine) and restart
    execution of faulted process

25
Architecture - Instruction backup
  • page faulting instructions trap to OS
  • OS must restart instruction
  • The page fault may originate at the op-code or
    any of the operands - PC value useless
  • the location of the instruction itself is lost
  • worse still, undoing of autoincrement or
    autodecrement - was it already performed ??
  • Hardware solutions
  • Register to store PC value of instruction and
    register to store changes to other registers
    (increment/decrement)
  • Micro-code dumps all information on the stack
  • Restart complete instruction and redo increments
    etc.
  • Do nothing - RISC ......

26
Demand Paging
  • Processes reside on disk and their swapping-in
    is performed partially - only part of their pages
  • During run time a process may encounter a
    missing page and demand it
  • a missing page has its invalid bit on (which
    will need to be differentiated by the page-fault
    routine from illegal address)
  • Page missing ?? Retrieve page into empty page
    frame
  • No empty page frame ?? Evict (replace) a page
  • Many algorithms possible for selecting a page for
    replacement
  • Optimal page replacement
  • Discard page to be used the longest time ahead
  • Not realizable...
  • but can be used to compare to real algorithms !!

27
Optimal page replacement
  • Demand comes in for pages:
  • 7, 5, 1, 0, 5, 4, 7, 0, 2,
    1, 0, 7
  • an optimal algorithm (with 3 page frames) faults
    on, where (x,y) means x replaces y:
  • 7 5 1 (0,1) - (4,5) - - (2,4) (1,2)
    - -
  • altogether 7 page faults
  • take FIFO for example:
  • 7 5 1 (0,7) - (4,5) (7,1) - (2,0) (1,4)
    (0,7) (7,2)
  • 3 additional page faults
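
The two traces above can be reproduced with a short simulation. This is a sketch with hypothetical function names; it assumes 3 page frames, which is what the traces imply.

```python
from collections import deque


def fifo_faults(refs, frames):
    memory = deque()                     # oldest page at the left
    faults = 0
    for page in refs:
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            memory.popleft()             # evict the oldest page
        memory.append(page)
    return faults


def optimal_faults(refs, frames):
    memory = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            future = refs[i + 1:]
            # Evict the page whose next use lies furthest ahead (or never comes).
            victim = max(memory, key=lambda p: future.index(p)
                         if p in future else len(future))
            memory.remove(victim)
        memory.add(page)
    return faults


refs = [7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7]
print(optimal_faults(refs, 3))   # 7
print(fifo_faults(refs, 3))      # 10
```

The optimal algorithm needs the future reference string, so it can only be run offline as a yardstick.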

28
Good old FIFO
  • implemented as a queue
  • the usual drawback
  • oldest page may be a referenced (needed) page
  • second chance FIFO
  • if reference bit is on - move to end of queue
  • Better to implement as a circular queue
  • save overhead of movements on the queue

29
Page replacement NRU - Not Recently Used
  • There are 4 classes of pages, according to
    reference and modification bits
  • Select a page at random from the least-needed
    class
  • Easy scheme to implement
  • Prefers keeping a frequently referenced (not
    modified) page over an old modified page
  • Class b (not referenced, modified) is interesting:
    it can only happen when a clock tick erases the
    referenced bit..

30
LRU - Least Recently Used
  • Approximate the optimal algorithm -
  • most recently used page as most probable next
    reference
  • Replace page used furthest in the past
  • Not easy to implement - needs counting of
    references
  • Use a large counter (number of operations) and
    save in a field in the page table, for each page
    reference operation
  • Another option is to use a bit array of n x n bits
  • In both cases the page entry with the smallest
    number attached to it is selected for replacement

31
LRU with bit tables
32
NFU - Not Frequently Used
  • In order to record frequently used pages add a
    counter to all table entries
  • At each clock tick add the R bit to the counters
  • Select page with lowest counter for replacement
  • problem: remembers everything
  • remedy (an aging algorithm):
  • shift-right the counter before adding the
    reference bit
  • add the reference bit at the left
  • Fewer operations than LRU; depends on the
    intervals used for updating
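
One clock tick of the aging remedy, as a sketch. The 8-bit counter width and the function name are assumptions.

```python
def age_counters(counters, r_bits, width=8):
    """Shift each counter right, then add its R bit at the left (top) end."""
    return [(c >> 1) | (r << (width - 1)) for c, r in zip(counters, r_bits)]


counters = [0b10000000, 0b01000000]
counters = age_counters(counters, r_bits=[0, 1])
print([format(c, "08b") for c in counters])

# The page with the lowest counter is selected for replacement.
victim = counters.index(min(counters))
print(victim)
```

Because old references are shifted out, a page referenced long ago no longer outweighs a page referenced in the last tick.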

33
NFU - the aging simulation version
34
Modelling paging algorithms
  • Belady's anomaly
  • Example: FIFO with reference string
    1 2 3 4 1 2 5 1 2 3 4 5
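
The anomaly can be seen by counting FIFO page faults for this reference string with 3 and with 4 page frames. A sketch; the function name is hypothetical.

```python
from collections import deque


def fifo_page_faults(refs, frames):
    memory = deque()                 # oldest page at the left
    faults = 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()     # evict the oldest page
            memory.append(page)
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_page_faults(refs, 3))   # 9
print(fifo_page_faults(refs, 4))   # 10: more frames, more faults
```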

35
Characterizing paging systems
  • a Reference string (of requested pages)
  • number of virtual pages n
  • number of physical page frames m
  • a page replacement algorithm
  • can be represented by an array M of n rows

36
Stack Algorithms
  • Definition: the set of pages in physical memory
    with m page frames is a subset of the pages in
    physical memory with m+1 page frames (for every
    reference string)
  • Stack algorithms have no anomaly
  • Examples: LRU, optimal replacement
  • FIFO is not a stack algorithm
  • Useful definition:
  • Distance string: distance from top of stack

37
Predicting page fault number
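  • C_i is the number of times that i is in the
    distance string
  • the number of page faults with m frames is
  • F_m = C_(m+1) + C_(m+2) + ... + C_n + C_inf
    (n virtual pages; C_inf counts first references,
    whose distance is infinite)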
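
A sketch of the prediction for LRU: build the distance string once, then count the entries larger than m to get the number of page faults with m frames. The function names are made up for this example.

```python
import math


def lru_distance_string(refs):
    stack = []                       # most recently used page first
    distances = []
    for page in refs:
        if page in stack:
            distances.append(stack.index(page) + 1)   # 1 = top of stack
            stack.remove(page)
        else:
            distances.append(math.inf)                # first reference
        stack.insert(0, page)
    return distances


def predicted_faults(distances, m):
    """A reference faults with m frames when its distance exceeds m."""
    return sum(1 for d in distances if d > m)


d = lru_distance_string([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5])
for m in (3, 4, 5):
    print(m, predicted_faults(d, m))
```

One pass over the reference string therefore gives the fault count for every memory size at once.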

38
Page Frame Allocation
  • for a page-fault rate p, memory access time of
    100 nanosecs and page-fault service time of 25
    millisecs the effective access time is (1-p) x
    100 + p x 25,000,000
  • for p of 0.001 the effective access time is
    still larger than 100 nanosecs by a factor of 250
  • for a goal of only a 10% degradation in access
    time we need p < 0.0000004
  • policies for page-frame allocation must allocate
    as much as possible to processes, to enhance
    performance - leave no unassigned page-frame
  • difficult to know how many frames to allocate -
    processes differ in size, structure, priority
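
The two numbers quoted on this slide follow directly from the formula. A sketch with hypothetical names; times are in nanoseconds.

```python
MEMORY_NS = 100
FAULT_SERVICE_NS = 25_000_000      # 25 milliseconds


def effective_access_ns(p):
    return (1 - p) * MEMORY_NS + p * FAULT_SERVICE_NS


def max_fault_rate(degradation):
    # Solve (1 - p) * 100 + p * 25,000,000 <= 100 * (1 + degradation) for p.
    return degradation * MEMORY_NS / (FAULT_SERVICE_NS - MEMORY_NS)


print(effective_access_ns(0.001) / MEMORY_NS)   # slowdown factor, about 250
print(max_fault_rate(0.10))                     # about 4e-7
```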

39
Allocation to multiprocesses
  • Fair share is not the best policy (static !!)
  • allocate according to process size
  • must be a minimum for running a process...

40
(dynamic) Page Allocation Policies
  • 1st option - fixed number of pages per process;
    2nd option - proportional to process size
  • Locality of reference - a valid statistical
    phenomenon
  • Working set - sets of pages used by each process
  • Working set model - dynamic number of pages per
    process, a necessary condition for running (can
    be used for prepaging - load working set before
    running process)
  • Keep track by aging; by a lookback parameter;
    WSClock
  • Thrashing - very frequent page faults (more than
    computation)
  • CPU utilization decreases -> increase
    multiprogramming degree -> utilization
    decreases more -> ...
  • whenever the in-memory pages are not the working
    set
  • what to do for processes being swapped?

41
Dynamic set - Page Allocation
  • 0 2 1 3 5 4 6 3 7 5 7 3 3 5 6 4
  • with 5 page frames (LRU):
  • p p p p p p p - p - - - - - - -
    optimal
  • with a window of size 5 (and LRU):
  • p p p p p p p - p - - (4)(3) - p -
  • for a window of size 5 the allocated WS is
    decreasing after request 12 and 14
  • the maximum page allocation is ?
  • extra page fault, because of the size of the WS
  • after the last request, page 4, the number of
    allocated page frames increases again (4)

42
Dynamic set - Clock Algorithm
  • WSClock is a global clock algorithm - for pages
    held by all processes in memory
  • Circling the clock, the algorithm uses the
    reference bit and adds to it a measure of the
    window size
  • Each time a reference bit is set an additional
    data structure, ref(frame), is set to the current
    virtual time of the process
  • WSClock: use an additional condition that
    measures elapsed (process) time and compares it
    to the window size
  • replace page when two conditions apply:
  • reference bit is unset
  • Tp - ref(frame) > window size

43
Dynamic set - WSClock Example
  • 3 processes: p0, p1 and p2
  • current (virtual) times of the 3 processes are
  • Tp0 = 50, Tp1 = 70, Tp2 = 90
  • WSClock: replace when Tp - ref(frame) > window size
  • the minimal distance (window size) is 20
  • The clock hand is currently pointing to page
    frame 4
  • page-frames   0  1  2  3  4  5  6  7  8  9 10
  • ref. bit      0  0  1  1  1  0  1  0  0  1  0
  • process ID    0  1  0  1  2  1  0  0  1  2  2
  • last_ref     10 30 52 71 81 37 61 37 31 47 55
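
A simplified sketch of the WSClock scan on the data of this example. It clears reference bits as the hand passes and picks the first frame whose reference bit is unset and whose age exceeds the window size. Modified pages, write scheduling, and updating last_ref are left out, and all names are hypothetical.

```python
def wsclock_victim(frames, hand, virtual_time, window):
    n = len(frames)
    for step in range(2 * n):                 # at most two full turns
        idx = (hand + step) % n
        frame = frames[idx]
        if frame["ref"]:
            frame["ref"] = 0                  # recently used: give another chance
            continue
        age = virtual_time[frame["pid"]] - frame["last_ref"]
        if age > window:
            return idx                        # both conditions hold: replace
    return None                               # nothing is old enough


ref_bits = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
pids     = [0, 1, 0, 1, 2, 1, 0, 0, 1, 2, 2]
last_ref = [10, 30, 52, 71, 81, 37, 61, 37, 31, 47, 55]
frames = [{"ref": r, "pid": p, "last_ref": t}
          for r, p, t in zip(ref_bits, pids, last_ref)]

victim = wsclock_victim(frames, hand=4,
                        virtual_time={0: 50, 1: 70, 2: 90}, window=20)
print(victim)   # 5
```

Frame 4 has its reference bit set, so the hand clears it and moves on; frame 5 is unreferenced and 70 - 37 = 33 exceeds the window of 20.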

44
Page Daemons - Unix
  • It is assumed useful to keep a number of free
    pages
  • freeing of page frames can be done by a page
    daemon - a process that sleeps most of the time
  • awakened periodically to inspect the state of
    memory - if there are too few free page frames
    then it frees page frames
  • yet another type of (global) dynamic page
    replacement policy
  • this strategy performs better than evicting pages
    when needed (and writing the modified to disk in
    a hurry)

45
Comment - Page size analysis
  • To minimize wasted memory:
  • process size s
  • page size p
  • page table entry size e
  • Fragmentation overhead is p/2 (on average half
    of the last page)
  • Table space overhead is s x e / p
  • Total overhead is s x e / p + p/2
  • Minimize overhead: p = sqrt(2 x s x e)
  • Example: s = 128k, e = 8 bytes
  • optimal page size is 1448 bytes... i.e. use
    1k or 2k
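
The example can be checked numerically. This sketch assumes the usual overhead model (page table space s x e / p plus half a page of internal fragmentation); the function names are hypothetical.

```python
import math


def total_overhead(s, e, p):
    """Page table space plus average internal fragmentation, in bytes."""
    return s * e / p + p / 2


def optimal_page_size(s, e):
    # Setting the derivative -s*e/p**2 + 1/2 to zero gives p = sqrt(2*s*e).
    return math.sqrt(2 * s * e)


s, e = 128 * 1024, 8
print(round(optimal_page_size(s, e)))
print(total_overhead(s, e, 1024), total_overhead(s, e, 2048))
```

Page sizes of 1k and 2k give the same overhead here, which is why either is a reasonable practical choice.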

46
Additional issues - Locking and Sharing
  • i/o channel/processor (DMA) transfers data
    independently
  • page must not be replaced during transfer
  • OS can use a lock variable per page
  • Pages of an editor's code - shared among processes
  • swapping out, or terminating, process A (and its
    pages) may cause many page faults for process B
    that shares them
  • looking up evicted pages in all page tables
    is impossible
  • solution: maintain special data structures for
    shared pages

47
Handling the backing store
  • need to store non-resident pages on disk
  • the backing store (disk swap area) needs to be
    managed
  • allocate swap area to (whole) processes and
    address pages by offset from swap address
  • processes grow during execution - assign separate
    swap areas to Text, Data and Stack
  • allocate disk blocks when needed - needs disk
    addresses in memory to keep track of swapped pages

48
Segmentation
  • several logical address spaces per process
  • a compiler needs segments for
  • source text
  • symbol table
  • constants segment
  • stack
  • parse tree
  • compiler executable code
  • Most of these segments grow during execution

49
Segmentation vs. Paging
50
Segmentation - segment table
51
Segmentation with Paging
  • MULTICS combined segmentation and paging
  • 2^18 segments of up to 64k words (36 bits)
  • addresses are 34 bits -
  • 18-bit segment number
  • 16 bits - page number (6) + offset within page
    (10)
  • Each process has a segment table (STBR)
  • the segment table is a segment and is paged
    (8 bits page + 10 offset). STBR added to 18 bits
    seg-num
  • Each segment is a separate virtual memory with a
    page table (6 bits)
  • segment tables contain segment descriptors - 18
    bits page table address, 9 bits segment length

52
MULTICS segment descriptors
53
segmentation and paging - locating addresses
54
Segmentation - Memory reference procedure
  • 1. Use segment number to find segment descriptor
  • segment table is itself paged because it is
    large, so in actuality a STBR is used to locate
    page of descriptor
  • 2. Check if page table is in memory
  • if not a segment fault occurs
  • if there is a protection violation TRAP (fault)
  • 3. page table examined, a page fault may occur.
  • if page is in memory the address of start of page
    is extracted from page table
  • 4. offset is added to the page origin to
    construct main memory address
  • 5. perform read/store etc.

55
Paged segmentation on the INTEL 80386
  • 16k segments, each up to 1G (32-bit words)
  • 2 types of segment descriptors:
  • Local Descriptor Table (LDT), for each process
  • Global (GDT) - system etc.
  • access by loading a 16-bit selector to one of the
    6 segment registers CS, DS, SS, ... (holding the
    16-bit selector during run time, 0 means
    not-in-use)
  • Selector points to segment descriptor (8 bytes)

Selector layout (16 bits): Index (13 bits), 0 = GDT / 1 = LDT (1 bit), Privilege level 0-3 (2 bits)
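
A sketch of decoding a 16-bit selector into the three fields named on this slide: a 13-bit index, a 1-bit GDT/LDT flag, and a 2-bit privilege level. The function name and the example selector values are chosen only for illustration.

```python
def decode_selector(selector):
    privilege = selector & 0b11            # low 2 bits: privilege level 0-3
    table = (selector >> 2) & 0b1          # next bit: 0 = GDT, 1 = LDT
    index = selector >> 3                  # high 13 bits: descriptor index
    return index, "LDT" if table else "GDT", privilege


print(decode_selector(0x002B))
print(decode_selector(0x000F))
```

Since each descriptor is 8 bytes, the index multiplied by 8 is the byte offset of the descriptor within its table.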
56
80386 - segment descriptors
57
80386 - Forming the linear address
  • Segment descriptor is in internal (microcode)
    register
  • If the selector is zero (TRAP) or the segment is paged out (TRAP)
  • Segment size is checked against limit field of
    descriptor
  • Base field of descriptor is added to offset (4k
    page-size)

58
80386 - paged segmentation (contd.)
  • Combine descriptor and offset into linear address
  • If paging disabled, pure segmentation (286
    compatibility). Linear address is physical
    address
  • Paging is 2-level
  • page directory (1k entries), page table (1k entries)
  • pages are 4k bytes each (12-bit offset)
  • Page directory is pointed to by a special
    register
  • PTEs have 20 bits of page frame and 12 bits of
    modified, accessed, protection, etc.
  • Small segments have just a few page tables
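
A sketch of how a 32-bit linear address splits for 2-level paging: 10 bits select the page directory entry, 10 bits select the page table entry, and 12 bits are the offset within the 4k page. The function name is hypothetical.

```python
def split_linear_address(addr):
    directory = (addr >> 22) & 0x3FF   # top 10 bits: page directory index
    table = (addr >> 12) & 0x3FF       # middle 10 bits: page table index
    offset = addr & 0xFFF              # low 12 bits: offset within the page
    return directory, table, offset


print(split_linear_address(0x00403004))
```

Each page table covers 1k pages of 4k bytes, that is 4 MB of linear address space, which is why a small segment needs only a few page tables.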

59
80386 - 2-level paging
60
Intel 80386 address translation
61
The Buddy System