1
  • School of Computing Science
  • Simon Fraser University
  • CMPT 300 Operating Systems I
  • Ch 9: Virtual Memory
  • Dr. Mohamed Hefeeda

2
Objectives
  • Understand virtual memory system, its benefits,
    and mechanisms that make it feasible
  • Demand paging
  • Page-replacement algorithms
  • Frame allocation
  • Locality and working-set models
  • Understand how the kernel memory is allocated and
    used

3
Background
  • Virtual memory: separation of user logical
    memory from physical memory
  • Only part of the program needs to be in memory
    for execution
  • Logical address space can therefore be much
    larger than physical address space
  • Allows address spaces to be shared by several
    processes
  • Allows for more efficient process creation
  • Virtual memory can be implemented via
  • Demand paging
  • Demand segmentation

4
Virtual Memory That is Larger Than Physical Memory
5
Demand Paging
  • The core enabling idea of virtual memory systems
  • A page is brought into memory only when needed
  • Why?
  • Less I/O needed
  • Less memory needed
  • Faster response
  • More processes can be admitted to the system
  • How?
  • Process generates logical (virtual) addresses
    which are mapped to physical addresses using a
    page table
  • If the requested page is not in memory, kernel
    brings it from hard disk
  • How do we know whether a page is in memory?

6
Valid-Invalid Bit
  • Each page-table entry has a valid-invalid bit
  • v → in memory
  • i → not in memory
  • Initially, it is set to i for all entries
  • During address translation, if the valid-invalid bit
    is i, then it could be
  • Illegal reference (outside the process address
    space) → abort the process
  • Legal reference but not in memory → page fault
    (bring the page from disk); see the sketch below
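  • Illustration (not from the slides): a minimal C sketch of how the
    valid-invalid bit gates address translation; the names pte_t,
    page_table, and NUM_PAGES are hypothetical

    #include <stdbool.h>

    #define NUM_PAGES 1024

    typedef struct {
        unsigned frame;   /* physical frame number           */
        bool     valid;   /* true = in memory (v), false = i */
    } pte_t;

    pte_t page_table[NUM_PAGES];

    /* Returns the physical frame for a page, or -1 to signal a trap. */
    int translate(unsigned page)
    {
        if (page >= NUM_PAGES)         /* illegal reference -> abort process */
            return -1;
        if (!page_table[page].valid)   /* legal, but not in memory           */
            return -1;                 /* -> page fault: bring it from disk  */
        return (int)page_table[page].frame;
    }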

7
Handling Page Fault
  • OS looks at another table to decide
  • Invalid reference → abort
  • Just not in memory → bring it in
  • Get an empty frame
  • Swap the page into the frame (I/O operation)
  • Reset the tables
  • Set the valid bit to v
  • Restart the instruction that caused the page fault
    (these steps are sketched below)
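  • A hedged sketch of the steps above, continuing the page-table sketch
    from the previous slide; the helper routines (find_free_frame,
    select_victim, read_page_from_disk) are hypothetical placeholders
    for kernel code

    #include <stdbool.h>

    typedef struct { unsigned frame; bool valid; } pte_t;
    extern pte_t page_table[];                   /* as in the previous sketch */

    int  find_free_frame(void);                  /* returns -1 if none free   */
    int  select_victim(void);                    /* page-replacement policy   */
    void read_page_from_disk(unsigned page, int frame);

    void handle_page_fault(unsigned page)
    {
        int frame = find_free_frame();           /* 1. find a free frame         */
        if (frame < 0)
            frame = select_victim();             /*    or evict a victim (later) */
        read_page_from_disk(page, frame);        /* 2. swap the page in (I/O)    */
        page_table[page].frame = (unsigned)frame;/* 3. update the page table     */
        page_table[page].valid = true;           /* 4. set the valid bit to v    */
        /* 5. hardware then restarts the instruction that caused the fault */
    }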

8
Handling a Page Fault (cont'd)
  • Restarting an instruction, e.g., C ← A + B
  • assume a page fault occurs when accessing C
  • bring the page that has C into memory (I/O → process
    may be suspended)
  • fetch the ADD instruction (again)
  • fetch A, B (again)
  • do the addition (again)
  • then store the result in C
  • Restarting an instruction can be complicated,
    e.g.,
  • MVC (Move Character) instruction in IBM 360/370
    systems
  • Can move up to 256 bytes from one location to
    another, possibly overlapping
  • A page fault may occur in the middle of copying →
  • some data may be overwritten →
  • simply restarting the instruction is not enough (data
    has been modified)
  • Solution: hardware attempts to access both ends
    of both blocks; if any is not in memory, a page
    fault occurs before the instruction executes
  • Bottom line: demand paging may raise subtle
    problems, and they must be addressed

9
Performance of Demand Paging
  • Page fault rate p: 0 ≤ p ≤ 1.0
  • if p = 0, there are no page faults
  • if p = 1, every reference is a fault
  • Effective Access Time (EAT)
  • EAT = (1 - p) x memory access time
          + p x (page fault time)
  • Page fault time = service the page-fault interrupt
    (microseconds)
    + read in the requested page (milliseconds)
    + restart the process (microseconds)
  • Note: reading in the requested page may require
    writing another page to disk if there is no free
    frame

10
Demand Paging Example
  • Memory access time = 200 nanoseconds
  • Average page fault time = 8 milliseconds
  • (disk latency, seek, and transfer time)
  • EAT = (1 - p) x 200 + p x (8 milliseconds)
  •     = (1 - p) x 200 + p x 8,000,000
  •     = 200 + p x 7,999,800   (in nanoseconds)
  • If one access out of 1,000 causes a page fault,
    then
  • EAT = 8.2 microseconds
  • This is a slowdown by a factor of 40!
  • Bottom line: we should minimize the number of page
    faults; they are very costly (see the check below)
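  • A quick check of the arithmetic above (all values in nanoseconds,
    taken from this slide):

    #include <stdio.h>

    int main(void)
    {
        double mem_access = 200.0;        /* 200 ns memory access         */
        double fault_time = 8000000.0;    /* 8 ms page-fault service      */
        double p = 1.0 / 1000.0;          /* one fault per 1,000 accesses */

        double eat = (1 - p) * mem_access + p * fault_time;
        printf("EAT = %.1f ns (%.1f us), slowdown = %.0fx\n",
               eat, eat / 1000.0, eat / mem_access);
        /* prints: EAT = 8199.8 ns (8.2 us), slowdown = 41x
           (the slide rounds the slowdown to a factor of 40) */
        return 0;
    }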

11
Virtual Memory and Process Creation
  • VM allows faster/efficient process creation using
  • Copy-on-Write (COW) technique
  • COW allows both parent and child processes to
    initially share the same pages in memory (during
    fork())
  • If either process modifies a shared page, page is
    copied

(Figure: a copy of page C is created when one process writes to it)
12
Page Replacement
  • Page fault occurs → need to bring the requested page
    into memory
  • Find location of the requested page on disk
  • Find a free frame
  • If there is a free frame, use it
  • If there is no free frame, use a page
    replacement algorithm to select a victim frame
  • Bring requested page into the free frame
  • Update the page table and free frame list
  • Restart the process

13
Page Replacement (cont'd)
  • Note
  • we can save the swap-out overhead if the victim page was
    NOT modified
  • → significant savings (one less I/O operation)
  • We associate a dirty (modify) bit with each page
    to indicate whether the page has been modified

14
Page Replacement Algorithms
  • Objective: minimize the page-fault rate
  • Algorithm evaluation
  • Take a particular string of memory references,
    and
  • Compute the number of page faults on that string
  • The reference string looks like
  • 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • Notes
  • We use page numbers
  • The address sequence could have been 100, 250,
    270, 301, 490, ..., assuming a page size of 100
    bytes
  • References 250 and 270 are in the same page (2);
    only the first one may cause a page fault, which
    is why we mention 2 only once

15
Page Faults vs. Number of Frames
  • We expect the number of page faults to decrease as
    the number of physical frames allocated to a process
    increases

16
Page Replacement: First-In-First-Out (FIFO)
  • Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 3 frames (3 pages can be in memory at any time)
  • Let us work it out
  • On every page fault, we show the memory contents
  • Number of page faults = 9 (reproduced by the simulation below)
  • Pros
  • Easy to understand and implement
  • Cons
  • Performance may not always be good
  • It may replace a page that is used heavily (e.g.,
    one that has a variable which is accessed most of
    the time)
  • It suffers from Belady's anomaly
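  • Illustrative simulation (not from the slides) that reproduces the
    count of 9 faults for this reference string with 3 frames:

    #include <stdio.h>

    #define FRAMES 3

    int main(void)
    {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int n = sizeof refs / sizeof refs[0];
        int frame[FRAMES] = {-1, -1, -1};   /* -1 = empty frame           */
        int next = 0, faults = 0;           /* next = oldest (FIFO) slot  */

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int f = 0; f < FRAMES; f++)
                if (frame[f] == refs[i]) { hit = 1; break; }
            if (!hit) {                     /* page fault: replace oldest */
                frame[next] = refs[i];
                next = (next + 1) % FRAMES;
                faults++;
            }
        }
        printf("page faults = %d\n", faults);   /* prints 9 */
        return 0;
    }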

17
FIFO: Belady's Anomaly
  • Assume the reference string
  • 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • If we have 3 frames, how many page faults?
  • 9 page faults
  • If we have 4 frames, how many page faults?
  • 10 page faults
  • More frames are supposed to result in fewer page
    faults!
  • Belady's anomaly: more frames → more page faults

18
FIFO: Belady's Anomaly (cont'd)
19
Optimal Algorithm
  • Replace the page that will not be used for the
    longest period of time
  • 4-frame example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 6 page faults
  • How can we know the future? We cannot!
  • Used for comparing algorithms

20
Least Recently Used (LRU) Algorithm
  • Try to approximate the Optimal policy: look at the past
    to infer the future
  • LRU: replace the page that has not been used for the
    longest period
  • Rationale: this page may not be needed anymore
    (e.g., pages of an initialization module)
  • 4-frame example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 8 page faults (compare to Optimal: 6, FIFO: 10)
  • LRU and Optimal do not suffer from Belady's
    anomaly

21
LRU Implementation: Counters
  • Every page-table entry has a time-of-use
    (counter) field
  • When a page is referenced, copy the CPU logical clock
    into this field
  • The CPU clock is maintained in a register and
    incremented with every memory access
  • To replace a page, search for the page with the
    smallest (oldest) value
  • Cons
  • search time, updating the time-of-use fields
    (writing to memory!), clock overflow
  • Needs hardware support (increment the clock and update
    the time-of-use field); a sketch follows
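  • A rough sketch of the counter bookkeeping described above; in a real
    system the clock stamping is done by hardware, and the names here are
    illustrative (it also assumes all pages are resident)

    #define NUM_PAGES 1024

    static unsigned long cpu_clock = 0;           /* ticks once per memory access */
    static unsigned long time_of_use[NUM_PAGES];  /* time-of-use field per page   */

    void on_reference(unsigned page)              /* done by hardware in reality  */
    {
        time_of_use[page] = ++cpu_clock;          /* stamp the referenced page    */
    }

    unsigned pick_lru_victim(void)                /* the costly linear search     */
    {
        unsigned victim = 0;
        for (unsigned p = 1; p < NUM_PAGES; p++)
            if (time_of_use[p] < time_of_use[victim])
                victim = p;
        return victim;                            /* smallest (oldest) timestamp  */
    }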

22
LRU Implementation: Stack
  • Keep stack of page numbers in a doubly-linked
    list
  • If a page is referenced, move it to the top
  • The least recently used page sinks to the bottom
  • Cons
  • Each memory reference is a bit expensive
    (requires updating 6 pointers in worst case)
  • Pros
  • No search for replacement
  • Also needs hardware support to update the stack

23
LRU Implementation (cont'd)
  • Can we implement LRU without hardware support?
  • Say by using interrupts, i.e., when hardware
    needs to update the stack or the counters, it
    issues an interrupt and an ISR does the update?
  • NO. Too costly, it will slow every memory
    reference by a factor of at least 10
  • Even LRU (which approximates OPT) is not easy to
    implement without hardware support!

24
Second-chance (Clock) Replacement
  • An approximation of LRU, aka clock replacement
  • Each page has a reference bit (ref_bit),
    initially 0
  • When a page is referenced, ref_bit is set to 1 (by
    hardware)
  • Maintain a moving pointer to the next (candidate)
    victim
  • When choosing a page to replace, check the
    ref_bit of the candidate:
  • if ref_bit == 0, replace it
  • else set ref_bit to 0,
  • leave the page in memory (give it another chance),
  • move the pointer to the next page,
  • repeat till a victim is found (see the sketch below)
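  • A minimal sketch of the clock loop above; ref_bit[] would be set to 1
    by hardware on each reference, and the names are illustrative

    #define NFRAMES 64

    static int      ref_bit[NFRAMES];   /* set to 1 by hardware on reference  */
    static unsigned hand = 0;           /* moving pointer to candidate victim */

    unsigned clock_select_victim(void)
    {
        for (;;) {
            if (ref_bit[hand] == 0) {          /* no recent use: evict it    */
                unsigned victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            ref_bit[hand] = 0;                 /* give it a second chance    */
            hand = (hand + 1) % NFRAMES;       /* advance to the next frame  */
        }
    }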

25
Second-Chance (Clock) Replacement
26
Counting Replacement Algorithms
  • Keep a counter of the number of references that have
    been made to each page
  • LFU algorithm: replace the page with the smallest count
  • Argument: a page with a small count is not used
    often
  • Problem: some pages were heavily used at an earlier
    time but are no longer needed; they will stay in (and
    waste) memory
  • MFU algorithm: replace the page with the highest count
  • Argument: the page with the smallest count was
    probably just brought in and has yet to be used
  • Problem: consider code that uses a module or a
    subroutine heavily; MFU will consider it a good
    candidate for eviction!

27
Counting Replacement Algorithms (cont'd)
  • LFU vs. MFU
  • Consider the following example
  • A database code that reads many pages then
    processes them
  • Which policy (LFU or MFU) would perform better?
  • MFU: even though the read module accumulated a
    large count, we need to evict its pages
    during processing

28
Commercial Ad
  • CMPT 371 Computer Networks (Spring 2007)
  • Internet: real networks, real protocols
  • Lots of fun projects (ALL in Java)
  • Multi-threaded web server
  • Ping client
  • Reliable Data Transfer Protocol (part of TCP)
  • Routing Protocol (RIP, used by many routers)
  • Network measurements and analysis experiments
  • http://nsl.cs.surrey.sfu.ca/teaching/07/371/

29
Allocation of Frames
  • Each process needs a minimum number of pages
  • Defined by the computer architecture (hardware)
  • instruction width and number of address
    indirection levels
  • Consider an instruction that takes one operand
    and allows one level of indirection. What is the
    minimum number of frames needed to execute it?
  • load addr
  • Answer: 3 (the load instruction is in one page, addr is in
    another, and the word addr points to is in a third)
  • Note: the maximum number of frames allocated to a
    process is determined by the OS

30
Frame Allocation
  • Equal allocation: all processes get the same
    number of frames
  • m frames, n processes → each process gets m/n
    frames
  • Proportional allocation: allocate according to
    the size of the process (see the sketch below)
  • Priority: use proportional allocation based on
    priorities rather than size
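  • A minimal sketch of proportional allocation, assuming process i of
    size s_i out of total size S receives about (s_i / S) x m frames
    (names are illustrative):

    /* size[i] = size of process i in pages, alloc[i] = frames it gets,
       n = number of processes, m = total frames available. */
    void allocate_frames(const int size[], int alloc[], int n, int m)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += size[i];                            /* S = sum of s_i  */
        for (int i = 0; i < n; i++)
            alloc[i] = (int)((long)size[i] * m / total); /* (s_i / S) x m   */
    }

    /* e.g., m = 62 frames and sizes {10, 127} yield allocations {4, 57};
       a real allocator would also enforce the per-process minimum. */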

31
Global vs. Local Frame Replacement
  • If a page fault occurs and there is no free
    frame, we need to free one. Two ways:
  • Global replacement
  • Process selects a replacement frame from the set
    of all frames; one process can take a frame from
    another
  • Commonly used in operating systems
  • Pros
  • Better throughput (process can use any available
    frame)
  • Cons
  • A process cannot control its own page-fault rate

32
Global vs. Local Frame Replacement (contd)
  • Local replacement
  • Each process selects from only its own set of
    allocated frames
  • Pros
  • Each process has its own share of frames, not
    impacted by the paging behavior of others
  • Cons
  • A process may suffer from high page-fault rate
    even though there are lightly used frames
    allocated to other processes

33
Thrashing
  • What happens if a process does not have enough
    frames to maintain its active set of pages in
    memory?
  • Page-fault rate is very high. This leads to
  • low CPU utilization, which
  • makes the OS think that it needs to increase the
    degree of multiprogramming, thus
  • OS admits another process to the system (making
    it worse!)
  • Thrashing ≡ a process is busy swapping pages in
    and out more than executing

34
Thrashing (cont'd)
35
Thrashing (cont'd)
  • To prevent thrashing, we should provide each
    process with as many frames as it needs
  • How do we know how many frames a process actually
    needs?
  • A program is usually composed of several
    functions or modules
  • When executing a function, memory references are
    made to instructions and local variables of that
    function and some global variables
  • So, we may need to keep in memory only the pages
    needed to execute the function
  • After finishing a function, we execute another.
    Then, we bring in pages needed by the new
    function
  • This is called the Locality Model

36
Locality Model
  • The Locality Model states that
  • As a process executes, it moves from locality to
    locality, where a locality is a set of pages that
    are actively used together
  • Notes
  • locality is not restricted to functions/modules;
    it is more general. It could be a segment of code
    in a function, e.g., a loop touching
    data/instructions in several pages
  • Localities may overlap
  • Locality is a major reason behind the success of
    demand paging
  • How can we know the size of a locality?
  • Using the Working-Set model

37
Working-Set Model
  • Let Δ be a fixed number of page references
  • called the working-set window
  • The set of pages in the most recent Δ references
    is the working set
  • Example: Δ = 10
  • The size of the WS at t1 is 5 pages, and at t2 it is
    2 pages (see the sketch below)
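  • Illustrative sketch (not from the slides) of computing the working-set
    size over the most recent DELTA references (DELTA = 10 as in the
    example; names are hypothetical):

    #define DELTA 10                    /* working-set window */

    /* Working-set size at time t: the number of distinct pages among the
       most recent DELTA references refs[t-DELTA+1 .. t]. */
    int working_set_size(const int refs[], int n, int t)
    {
        int pages[DELTA], count = 0;
        int start = (t - DELTA + 1 > 0) ? t - DELTA + 1 : 0;
        for (int i = start; i <= t && i < n; i++) {
            int seen = 0;
            for (int j = 0; j < count; j++)
                if (pages[j] == refs[i]) { seen = 1; break; }
            if (!seen)
                pages[count++] = refs[i];   /* new distinct page in window */
        }
        return count;
    }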

38
Working-Set Model (cont'd)
  • Accuracy of the WS model depends on choosing Δ
  • if Δ is too small, it will not encompass an entire
    locality
  • if Δ is too large, it will encompass several
    localities
  • if Δ = ∞, it will encompass the entire program
  • Using the WS model
  • The OS monitors the WS of each process
  • It allocates a number of frames = WS size to that
    process
  • If more memory frames are available, another
    process can be started

39
Keeping Track of the Working Set
  • WS is a moving window
  • At each memory reference, a new reference is
    added at one end, and another is dropped off the
    other end
  • Maintaining the entire window is costly
  • Solution: approximate with an interval timer + a
    reference bit
  • Example: Δ = 10,000 references
  • Timer interrupts every 5,000 references
  • Keep 2 in-memory bits for each page
  • Whenever the timer interrupts, copy the reference bits
    into the in-memory bits and set all reference bits to 0
  • If one of the in-memory bits = 1 → the page is in the
    working set

40
Thrashing Control Using WS Model
  • WSS_i ≡ the working-set size of process P_i
  • total number of pages referenced in the most
    recent Δ
  • m ≡ memory size in frames
  • D = Σ WSS_i ≡ total demand for frames
  • if D > m → thrashing
  • Policy: if D > m, then suspend one of the
    processes (a sketch of this check follows)
  • But maintaining the WS is costly. Is there an easier
    way to control thrashing?
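  • A minimal sketch of the D vs. m check above (names are illustrative;
    the hard part in practice is maintaining the WSS values, not the check):

    /* wss[i] = working-set size of process i, nprocs = number of processes,
       m = total memory size in frames. Returns 1 if thrashing is indicated. */
    int thrashing_detected(const int wss[], int nprocs, int m)
    {
        int D = 0;
        for (int i = 0; i < nprocs; i++)
            D += wss[i];               /* D = total demand for frames */
        return D > m;                  /* D > m  ->  thrashing        */
    }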

41
Thrashing Control Using Page-Fault Rate
  • Monitor page-fault rate and increase/decrease
    allocated frames accordingly
  • Establish acceptable page-fault rate range
    (upper and lower bounds)
  • If actual rate too low, process loses frame
  • If actual rate too high, process gains frame

42
Allocating Kernel Memory
  • Treated differently from user memory, why?
  • Kernel requests memory for structures of varying
    sizes
  • Process descriptors (PCB), semaphores, file
    descriptors,
  • Some of them are less than a page
  • Some kernel memory needs to be contiguous
  • some hardware devices interact directly with
    physical memory without using virtual memory
  • Virtual memory may just be too expensive for the
    kernel (cannot afford a page fault)
  • Often, a free-memory pool is dedicated to kernel
    from which it allocates the needed memory using
  • Buddy system, or
  • Slab allocation

43
Buddy System
  • Allocates memory from a fixed-size segment
    consisting of physically contiguous pages
  • Memory is allocated using a power-of-2 allocator
  • Satisfies requests in units sized as powers of 2
  • A request is rounded up to the next highest power of 2
    (see the sketch below)
  • Fragmentation: a 17 KB request will be rounded to
    32 KB!
  • When a smaller allocation is needed than is available,
    the current chunk is split into two buddies of the
    next-lower power of 2
  • Continue until an appropriately sized chunk is available
  • Adjacent buddies are combined (coalesced) to form a
    larger segment
  • Used in older Unix/Linux systems
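  • A minimal sketch of the power-of-2 rounding step only; a real buddy
    allocator would also split chunks into buddies and coalesce them on free

    #include <stddef.h>

    size_t round_up_pow2(size_t request)
    {
        size_t size = 1;
        while (size < request)
            size <<= 1;                /* next higher power of two     */
        return size;                   /* e.g., 17 KB request -> 32 KB */
    }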

44
Buddy System Allocator
45
Slab Allocator
  • Slab allocator
  • Creates caches, each consisting of one or more
    slabs
  • A slab is one or more physically contiguous pages
  • There is a single cache for each unique kernel data
    structure
  • Each cache is filled with objects: instantiations
    of the data structure
  • Objects are initially marked as free
  • When structures are stored, objects are marked as used
  • Benefits
  • Fast memory allocation, no fragmentation
  • Used in Solaris, Linux

46
Slab Allocation
47
VM and Memory-Mapped Files
  • VM enables mapping a file to memory address space
    of a process
  • How?
  • A page-sized portion of the file is read from the
    file system into a physical frame
  • Subsequent reads/writes to/from file are treated
    as ordinary memory accesses
  • Example: mmap() on Unix systems (see the example below)
  • Why?
  • I/O operations (e.g., read(), write()) on files
    are treated as memory accesses → simplifies file
    handling (simpler code)
  • More efficient: memory accesses are less costly
    than I/O system calls
  • One way of implementing shared memory for
    inter-process communication
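  • A minimal POSIX example of mapping a file and reading it as ordinary
    memory (error handling trimmed; "data.bin" is a placeholder file name):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);    /* placeholder file */
        if (fd < 0)
            return 1;

        struct stat st;
        fstat(fd, &st);

        /* Map the whole file; pages are brought in on demand at first access. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        putchar(p[0]);              /* an ordinary memory access reads the file */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }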

48
Memory-Mapped Files and Shared Memory
  • Memory-mapped files allow several processes to
    map the same file →
  • allowing pages in memory to be shared
  • Win XP implements shared memory using this
    technique

49
VM Issues: Pre-paging
  • Page size selection impacts
  • fragmentation
  • page table size
  • I/O overhead
  • locality
  • Prepaging
  • Prepage all or some of the pages a process will
    need, before they are referenced
  • Tradeoff
  • Reduce number of page faults at process startup
  • But, may waste memory and I/O because some of the
    prepaged pages may not be used

50
VM Issues: Program Structure
  • Program structure
  • int data[128][128];
  • Each row is stored in one page; allocated frames
    < 128
  • How many page faults in each of the following
    programs?
  • Program 1
  • for (j = 0; j < 128; j++)
        for (i = 0; i < 128; i++)
            data[i][j] = 0;
  • page faults = 128 x 128 = 16,384
  • Program 2
  • for (i = 0; i < 128; i++)
        for (j = 0; j < 128; j++)
            data[i][j] = 0;
  • page faults = 128

51
VM Issues: I/O Interlock
  • Example Scenario
  • A process allocates a buffer for an I/O request
    (in its own address space)
  • The process issues an I/O request and waits
    (blocks) for it
  • Meanwhile, the CPU is given to another process, which
    causes a page fault
  • The (global) replacement algorithm chooses the
    page that contains the buffer as a victim!
  • Later, the I/O device sends an interrupt
    signaling the request is ready
  • BUT the frame that contains the buffer is now
    used by a different process!
  • Solutions
  • Lock the (buffer) page in memory (I/O interlock)
  • Do I/O in kernel memory (not in user memory):
    data is first transferred to kernel buffers, then
    copied to user space
  • Note: page locking can be used in other
    situations as well; e.g., kernel pages are locked
    in memory

52
OS Example: Windows XP
  • Uses demand paging with clustering
  • Clustering brings in pages surrounding the
    faulting page
  • Processes are assigned working set minimum and
    working set maximum
  • Working set minimum is the minimum number of
    pages the process is guaranteed to have in memory
  • A process may be assigned pages up to its
    working-set maximum
  • When the amount of free memory in the system
    falls below a threshold, automatic working set
    trimming is performed to restore the amount of
    free memory
  • Working set trimming removes pages from processes
    that have pages in excess of their working set
    minimum

53
Summary
  • Virtual memory: a technique to map a large
    logical address space onto a smaller physical
    address space
  • Uses demand paging: bring pages into memory when
    needed
  • Allows running large programs in a small physical
    memory, page sharing, and efficient process creation,
    and simplifies programming
  • A page fault occurs when a referenced page is not
    in memory
  • Page-replacement algorithms: FIFO, OPT, LRU,
    second-chance, ...
  • Frame allocation: proportional; global and local
    page replacement
  • Thrashing: a process does not have sufficient
    frames → too many page faults → poor CPU
    utilization
  • Locality and working-set models
  • Kernel memory: buddy system, slab allocator
  • Many issues and tradeoffs: page size, pre-paging,
    I/O interlock, ...