Title: MET TC670 B1 Computer Science Concepts in Telecommunication Systems
1. MET TC670 B1: Computer Science Concepts in
Telecommunication Systems
2. Lecture 4, September 30, 2003
- Memory management
- Programming Concepts and Project 1
3. Memory Management
- Goals of memory management
- convenient abstraction for programming
- isolation between processes
- allocate scarce memory resources between
competing processes, maximize performance
(minimize overhead)
- Mechanisms
- physical vs. virtual address spaces
- page table management, segmentation policies
- page replacement policies
4. Memory Management Topics
- Virtual memory techniques
- Paging system techniques
- Segmentation techniques
- Replacement algorithms
5. Earlier Technique: Virtual Memory
- The basic abstraction that the OS provides for
memory management is virtual memory (VM)
- VM enables programs to execute without requiring
their entire address space to be resident in
physical memory
- a program can also execute on machines with less
RAM than it needs
- many programs don't need all of their code or
data at once (or ever)
- e.g., branches they never take, or data they
never read/write
- no need to allocate memory for it; the OS should
adjust the amount allocated based on run-time
behavior
- virtual memory isolates processes from each other
- one process cannot name addresses visible to
others; each process has its own isolated address
space
6. In the beginning
- First, there was batch programming
- programs used physical addresses directly
- OS loads job, runs it, unloads it
- Then came multiprogramming
- need multiple processes in memory at once
- to overlap I/O and computation
- memory requirements
- protection: restrict which addresses processes
can use, so they can't stomp on each other
- fast translation: memory lookups must be fast, in
spite of the protection scheme
- fast context switching: when swapping between jobs,
updating the memory hardware (protection and
translation) must be quick
7. Virtual Addresses
- To make it easier to manage the memory of multiple
processes, make processes use virtual addresses
- virtual addresses are independent of the location
in physical memory (RAM) where the referenced data
lives
- the OS determines the location in physical memory
- instructions issued by the CPU reference virtual
addresses
- e.g., pointers, arguments to load/store
instructions, the PC, …
- virtual addresses are translated by hardware into
physical addresses (with some help from the OS)
- The set of virtual addresses a process can
reference is its address space
- many different possible mechanisms exist for
translating virtual addresses to physical
addresses
- we'll take a historical walk through them, ending
up with our current techniques
8. Old technique #1: Fixed Partitions
- Physical memory is broken up into fixed
partitions
- all partitions are equally sized; the partitioning
never changes
- hardware requirement: a base register
- physical address = virtual address + base
register
- the base register is loaded by the OS when it
switches to a process
- how can we ensure protection?
- Advantages
- simple, ultra-fast context switch
- Problems
- internal fragmentation: memory in a partition not
used by its owning process isn't available to
other processes
- partition size problem: no one size is
appropriate for all processes
- fragmentation vs. fitting large programs in a
partition
9. Fixed Partitions (K bytes)
[Figure: physical memory divided into six fixed K-byte partitions
(partition 0 at address 0, partition 1 at K, ..., partition 5 at 5K);
the base register holds the owning partition's start address (e.g.,
3K), and the offset in the virtual address is added to it to form
the physical address.]
10. Old technique #2: Variable Partitions
- Obvious next step: physical memory is broken up
into variable-sized partitions
- hardware requirements: base register, limit
register
- physical address = virtual address + base
register
- how do we provide protection? (see the sketch
below)
- if (physical address > base + limit) then … ?
- Advantages
- no internal fragmentation
- simply allocate the partition size to be just big
enough for the process
- (assuming we know what that is!)
- Problems
- external fragmentation
- as we load and unload jobs, holes are left
scattered throughout physical memory
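
A minimal C sketch of the base/limit check described above; the
struct and function names are my own, and real hardware performs
this comparison on every memory reference:

    #include <stdint.h>

    typedef struct { uint32_t base, limit; } partition_t;

    /* returns 0 and fills *pa on success; -1 means protection fault */
    int translate(partition_t p, uint32_t va, uint32_t *pa) {
        if (va >= p.limit)     /* address falls outside the partition */
            return -1;         /* raise protection fault */
        *pa = p.base + va;     /* physical = virtual + base register */
        return 0;
    }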
11. Variable Partitions
[Figure: physical memory holding variable-sized partitions 0-4; the
base register holds P3's base and the limit register holds P3's
size; the virtual address (an offset) is compared against the limit:
if smaller, it is added to the base to form the physical address,
otherwise a protection fault is raised.]
12. Modern technique: Paging
- Solve the external fragmentation problem by using
fixed-sized units in both physical and virtual
memory
[Figure: virtual memory pages 0 through X map to physical memory
frames 0 through Y.]
13. User's Perspective
- Processes view memory as a contiguous address
space from bytes 0 through N
- the virtual address space (VAS)
- In reality, virtual pages are scattered across
physical memory frames
- the virtual-to-physical mapping
- this mapping is invisible to the program
- Protection is provided because a program cannot
reference memory outside of its VAS
- the virtual address 0xDEADBEEF maps to different
physical addresses for different processes
14. Paging
- Translating virtual addresses
- a virtual address has two parts: virtual page
number and offset
- the virtual page number (VPN) is an index into a
page table
- the page table entry contains the page frame
number (PFN)
- the physical address is PFN::offset
- Page tables
- managed by the OS
- map virtual page numbers (VPN) to page frame
numbers (PFN)
- the VPN is simply an index into the page table
- one page table entry (PTE) per page in the virtual
address space
- i.e., one PTE per VPN
15. Paging
[Figure: the virtual page # in the virtual address indexes the page
table, yielding a page frame #; combined with the offset, this forms
the physical address into one of page frames 0 through Y in physical
memory.]
16. Paging example
- assume 32-bit addresses
- assume the page size is 4KB (4096 bytes, or 2^12
bytes)
- the VPN is 20 bits long (2^20 VPNs), the offset is
12 bits long
- let's translate virtual address 0x13325328
- the VPN is 0x13325, and the offset is 0x328
- assume page table entry 0x13325 contains the value
0x03004
- the page frame number is 0x03004
- VPN 0x13325 maps to PFN 0x03004
- physical address = PFN::offset = 0x03004328
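
The worked example above maps directly onto a few lines of C; this
toy sketch hard-codes the slide's page table entry instead of
walking a real table:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT  12       /* 4KB page: 2^12 bytes */
    #define OFFSET_MASK 0xFFFu   /* low 12 bits are the offset */

    int main(void) {
        uint32_t va  = 0x13325328u;
        uint32_t vpn = va >> PAGE_SHIFT;     /* 0x13325 */
        uint32_t off = va & OFFSET_MASK;     /* 0x328 */

        /* stand-in for the page table lookup on the slide:
           pretend PTE 0x13325 holds PFN 0x03004 */
        uint32_t pfn = 0x03004u;

        uint32_t pa = (pfn << PAGE_SHIFT) | off;   /* 0x03004328 */
        printf("VPN=0x%05x offset=0x%03x PA=0x%08x\n",
               (unsigned)vpn, (unsigned)off, (unsigned)pa);
        return 0;
    }

Running it prints VPN=0x13325 offset=0x328 PA=0x03004328, matching
the slide.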
17. Page Table Entries (PTEs)
[PTE layout, high bits to low: page frame number (20 bits) |
prot (2 bits) | M (1) | R (1) | V (1)]
- PTEs control the mapping
- the valid bit (V) says whether or not the PTE can
be used
- i.e., whether or not the virtual address is valid
- it is checked each time a virtual address is used
- the reference bit (R) says whether the page has
been accessed
- it is set when the page is read or written
- the modify bit (M) says whether or not the page is
dirty
- it is set when a write to the page occurs
- the protection bits control which operations are
allowed
- read, write, execute
- the page frame number determines the physical
page
- physical page start address = PFN × page size
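
One plausible way to encode this layout with C bit masks; the field
positions below are illustrative assumptions, since each real
architecture fixes its own PTE format:

    #include <stdint.h>

    #define PTE_V         (1u << 0)   /* valid bit */
    #define PTE_R         (1u << 1)   /* reference bit */
    #define PTE_M         (1u << 2)   /* modify (dirty) bit */
    #define PTE_PROT_MASK (3u << 3)   /* 2 protection bits */
    #define PTE_PFN_SHIFT 12          /* PFN in the high 20 bits */

    static inline int pte_valid(uint32_t pte) {
        return (pte & PTE_V) != 0;
    }
    static inline uint32_t pte_pfn(uint32_t pte) {
        return pte >> PTE_PFN_SHIFT;
    }
    static inline uint32_t mk_pte(uint32_t pfn, uint32_t flags) {
        return (pfn << PTE_PFN_SHIFT) | flags;
    }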
18. Paging Advantages
- Easy to allocate physical memory
- physical memory is allocated from a free list of
frames
- to allocate a frame, just remove it from the free
list
- external fragmentation is not a problem!
- complication: the kernel needs contiguous physical
memory allocations
- many lists, each keeping track of free regions of
a particular size
- region sizes are multiples of the page size
- buddy algorithm
- Easy to page out chunks of programs
- all chunks are the same size (the page size)
- use the valid bit to detect references to
paged-out pages
- also, page sizes are usually chosen to be
convenient multiples of disk block sizes
19. Paging Disadvantages
- Can still have internal fragmentation
- a process may not use memory in exact multiples of
pages
- Memory reference overhead
- 2 references per address lookup (page table, then
memory)
- solution: use a hardware cache to absorb page
table lookups
- the translation lookaside buffer (TLB): many
details in the textbook
- Memory required to hold page tables can be large
- need one PTE per page in the virtual address space
- 32-bit AS with 4KB pages: 2^20 PTEs = 1,048,576
PTEs
- 4 bytes/PTE = 4MB per page table
- OSes typically have separate page tables per
process
- 25 processes = 100MB of page tables!
- solution: page the page tables (!!!)
- (ow, my brain hurts… so complicated)
20. Two-level page tables
- With two-level PTs, virtual addresses have 3
parts:
- master page number, secondary page number, offset
- the master PT maps the master PN to a secondary PT
- the secondary PT maps the secondary PN to a page
frame number
- PFN::offset = physical address
- Example
- 4KB pages, 4 bytes/PTE
- how many bits in the offset? need 12 bits for 4KB
- want the master PT in one page: 4KB / 4 bytes =
1024 PTEs
- hence, 1024 secondary page tables
- so master page number = 10 bits, offset = 12
bits
- with a 32-bit address, that leaves 10 bits for the
secondary PN
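
The 10/10/12 split in this example can be sketched as a two-level
lookup in C; the table types and the name translate() are my own,
and the valid bit is taken to be bit 0 as on slide 17:

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT 12
    #define PT_ENTRIES 1024   /* 2^10 entries per table */

    typedef struct { uint32_t pte[PT_ENTRIES]; } pagetable_t;

    /* returns 0 on success, -1 if a level is unmapped */
    int translate(pagetable_t *master_dir[PT_ENTRIES],
                  uint32_t va, uint32_t *pa) {
        uint32_t master = va >> 22;             /* top 10 bits */
        uint32_t second = (va >> 12) & 0x3FFu;  /* middle 10 bits */
        uint32_t offset = va & 0xFFFu;          /* low 12 bits */

        pagetable_t *pt = master_dir[master];
        if (pt == NULL) return -1;              /* no secondary table */
        uint32_t pte = pt->pte[second];
        if (!(pte & 1u)) return -1;             /* valid bit clear */
        *pa = ((pte >> PAGE_SHIFT) << PAGE_SHIFT) | offset;
        return 0;
    }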
21. Two-level page tables
[Figure: the virtual address is split into master page #, secondary
page #, and offset; the master page table entry selects a secondary
page table, whose entry supplies the page frame number; the frame
number plus the offset forms the physical address into page frames 0
through Y.]
22. Addressing Page Tables
- Where are page tables stored?
- and in which address space?
- Possibility 1: physical memory
- easy to address, no translation required
- but, page tables consume memory for the lifetime
of the VAS
- Possibility 2: virtual memory (the OS's VAS)
- cold (unused) page table pages can be paged out
to disk
- but, addressing page tables then requires
translation
- how do we break the recursion?
- don't page the outer page table (called "wiring")
- So, now that we've paged the page tables, we might
as well page the entire OS address space!
- tricky: need to wire down some special code and
data (e.g., interrupt and exception handlers)
23. Making it all efficient
- The original page table scheme doubled the cost of
memory lookups
- one lookup into the page table, a second to fetch
the data
- Two-level page tables triple the cost!!
- two lookups into page tables, a third to fetch the
data
- How can we make this more efficient?
- goal: make fetching from a virtual address about
as efficient as fetching from a physical address
- solution: use a hardware cache inside the CPU
- cache the virtual-to-physical translations in
hardware
- called a translation lookaside buffer (TLB)
- the TLB is managed by the memory management unit
(MMU)
24. TLBs
- Translation lookaside buffers
- translate virtual page #s into PTEs (not
physical addresses)
- can be done in a single machine cycle
- The TLB is implemented in hardware
- it is a fully associative cache (all entries are
searched in parallel)
- cache tags are virtual page numbers
- cache values are PTEs
- with the PTE + offset, the MMU can directly
calculate the PA
- TLBs exploit locality
- processes only use a handful of pages at a time
- 16-48 entries in the TLB is typical (covering
64-192KB)
- can hold the "hot set" or "working set" of a
process
- hit rates in the TLB are therefore really
important
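
A sequential C analogue of that fully associative lookup (real
hardware checks all tags in parallel; this loop and the entry
layout are only illustrative):

    #include <stdint.h>

    #define TLB_ENTRIES 48

    struct tlb_entry { uint32_t vpn; uint32_t pte; int valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* returns the cached PTE, or 0 on a TLB miss */
    uint32_t tlb_lookup(uint32_t vpn) {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return tlb[i].pte;   /* hit: MMU forms PA directly */
        return 0;                    /* miss: walk the page tables */
    }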
25. Managing TLBs
- Address translations are mostly handled by the
TLB
- >99% of translations, but there are occasional
TLB misses
- on a miss, who places translations into
the TLB?
- Hardware (memory management unit, MMU)
- knows where page tables are in memory
- the OS maintains them, the HW accesses them
directly
- tables have to be in a HW-defined format
- this is how x86 works
- Software-loaded TLB (OS)
- a TLB miss faults to the OS, which finds the right
PTE and loads the TLB
- must be fast (but 20-200 cycles is typical)
- the CPU ISA has instructions for TLB manipulation
- the OS gets to pick the page table format
26. Managing TLBs (2)
- OS must ensure TLB and page tables are consistent
- when the OS changes the protection bits in a PTE,
it needs to invalidate the PTE if it is in the TLB
- What happens on a process context switch?
- remember, each process typically has its own page
tables
- need to invalidate all the entries in the TLB!
(flush the TLB)
- this is a big part of why process context
switches are costly
- can you think of a hardware fix for this?
- When the TLB misses and a new PTE is loaded, a
cached PTE must be evicted
- choosing a victim PTE is called the TLB
replacement policy
- implemented in hardware, usually simple (e.g., LRU)
27. More Techniques: Segmentation
- A similar technique to paging is segmentation
- segmentation partitions memory into logical units
- stack, code, heap, …
- on a segmented machine, a VA is <segment #,
offset>
- segments are units of memory, from the user's
perspective
- A natural extension of variable-sized partitions
- variable-sized partitions = 1 segment/process
- segmentation = many segments/process
- Hardware support
- multiple base/limit pairs, one per segment
- stored in a segment table
- segments are named by segment #, used as an index
into the table
28. Segment lookups
[Figure: the segment # in the virtual address indexes the segment
table; the offset is compared against the segment's limit: if
smaller, it is added to the segment's base to address one of
segments 0-4 in physical memory, otherwise a protection fault is
raised.]
29. Combining Segmentation and Paging
- Can combine these techniques
- the x86 architecture supports both segments and
paging
- Use segments to manage logically related units
- stack, file, module, heap, …
- segments vary in size, but are usually large
(multiple pages)
- Use pages to partition segments into fixed chunks
- makes segments easier to manage within physical
memory
- no external fragmentation
- segments are pageable: don't need the entire
segment in memory at the same time
- Linux
- 1 kernel code segment, 1 kernel data segment
- 1 user code segment, 1 user data segment
- N task state segments (store registers on
context switch)
- 1 local descriptor table segment (not really
used)
- all of these segments are paged
- three-level page tables
30. Cool Paging Tricks
- Exploit the level of indirection between VA and PA
- shared memory
- regions of two separate processes' address spaces
map to the same physical frames
- read/write access to shared data
- execute shared libraries!
- each process has its own PTEs, so different
processes can be given different access privileges
- must the shared region map to the same VA in each
process?
- copy-on-write (COW), e.g., on fork() (sketched
below)
- instead of copying all pages, create shared
mappings of the parent's pages in the child's
address space
- make the shared mappings read-only in the child's
space
- when the child does a write, a protection fault
occurs; the OS takes over, copies the page, and
resumes the child
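
A small POSIX program that shows COW semantics from user level; it
demonstrates the visible effect, not the page-level mechanism:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int x = 42;
        pid_t pid = fork();      /* child shares frames with parent */
        if (pid == 0) {
            x = 99;              /* write faults; OS copies the page */
            printf("child  x=%d\n", x);
            exit(0);
        }
        wait(NULL);
        printf("parent x=%d\n", x);  /* still 42 */
        return 0;
    }

The parent still sees 42 because the child's store landed in a
private copy of the page.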
31. Another great trick
- Memory-mapped files
- instead of using open, read, write, close
- map a file into a region of the virtual address
space
- e.g., into a region with base X
- accessing virtual address X+N refers to offset
N in the file
- initially, all pages in the mapped region are
marked invalid
- the OS reads a page in from the file whenever an
invalid page is accessed
- the OS writes a page back to the file when it is
evicted from physical memory
- only necessary if the page is dirty
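
A minimal POSIX mmap() sketch of the idea; the input file data.bin
is hypothetical and error handling is trimmed:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) return 1;
        struct stat st;
        fstat(fd, &st);
        /* map the whole file; base plays the role of X above */
        char *base = mmap(NULL, st.st_size, PROT_READ,
                          MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) return 1;
        size_t n = 100;              /* offset N in the file */
        if ((off_t)n < st.st_size)   /* reading base[n] may fault
                                        the page in from disk */
            printf("byte %zu = 0x%02x\n", n,
                   (unsigned)(unsigned char)base[n]);
        munmap(base, st.st_size);
        close(fd);
        return 0;
    }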
32. Demand Paging
- Pages can be moved between memory and disk
- this process is called demand paging
- it differs from swapping (where the entire process
is moved, not a page)
- The OS uses main memory as a (page) cache of all
of the data allocated by processes in the system
- initially, pages are allocated from physical
memory frames
- when physical memory fills up, allocating a page
requires some other page to be evicted from
its physical memory frame
- evicted pages go to disk (only need to write them
if they are dirty)
- to a swap file
- movement of pages between memory and disk is done
by the OS
- it is transparent to the application
- except for performance
33. Key Algorithms: Replacement
- What happens to a process that references a VA in
a page that has been evicted? (the fault path is
sketched below)
- when the page was evicted, the OS set the PTE to
invalid and stored (in the PTE) the location of
the page in the swap file
- when the process accesses the page, the invalid
PTE causes an exception (page fault) to be raised
- the OS runs the page fault handler in
response
- the handler uses the invalid PTE to locate the
page in the swap file
- the handler reads the page into a physical frame,
then updates the PTE to point to it and to be valid
- the handler restarts the faulted process
- But where does the page that's read in go?
- have to evict something else (the page replacement
algorithm)
- the OS typically tries to keep a pool of free
pages around so that allocations don't inevitably
cause evictions
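
A C-flavored sketch of that fault path; every helper here (pte_of,
find_or_evict_frame, swap_read) is hypothetical, standing in for OS
internals that vary widely:

    #include <stdint.h>

    typedef struct { uint32_t pfn_or_slot; int valid; } pte_t;

    extern pte_t   *pte_of(uint32_t vpn);       /* locate the PTE */
    extern uint32_t find_or_evict_frame(void);  /* may run replacement */
    extern void     swap_read(uint32_t slot, uint32_t pfn);

    void page_fault(uint32_t faulting_va) {
        uint32_t vpn  = faulting_va >> 12;
        pte_t   *pte  = pte_of(vpn);
        uint32_t slot = pte->pfn_or_slot;   /* invalid PTE holds the
                                               swap-file location */
        uint32_t pfn  = find_or_evict_frame();
        swap_read(slot, pfn);               /* read page into frame */
        pte->pfn_or_slot = pfn;             /* point PTE at the frame */
        pte->valid = 1;                     /* and mark it valid */
        /* on return, the faulted instruction is restarted */
    }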
34. Why does this work?
- Locality!
- temporal locality
- locations referenced recently tend to be
referenced again soon
- spatial locality
- locations near recently referenced locations are
likely to be referenced soon (think about why)
- Locality means paging can be infrequent
- once you've paged something in, it will be used
many times
- on average, you use the things that are paged in
- but this depends on many things
- degree of locality in the application
- page replacement policy and the application's
reference pattern
- amount of physical memory vs. the application's
footprint
35. Why is this demand paging?
- Think about when a process first starts up
- it has a brand new page table, with all PTE valid
bits false
- no pages are yet mapped to physical memory
- when the process starts executing
- instructions immediately fault on both code and
data pages
- faults stop when all necessary code/data pages
are in memory
- only the code/data that is needed (demanded!) by
the process needs to be loaded
- what is needed changes over time, of course
36. Evicting the best page
- The goal of the page replacement algorithm:
- reduce the fault rate by selecting the best victim
page to remove
- the best page to evict is one that will never be
touched again
- as the process will never again fault on it
- "never" is a long time
- Belady's proof: evicting the page that won't be
used for the longest period of time minimizes the
page fault rate
- Rest of this lecture:
- survey a bunch of replacement algorithms
37. #1: Belady's Algorithm
- Provably optimal: lowest fault rate (remember
SJF?)
- pick the page that won't be used for the longest
time in the future
- problem: impossible to predict the future
- Why is Belady's algorithm useful?
- as a yardstick to compare other algorithms against
the optimal
- if Belady's isn't much better than yours, yours
is pretty good
- Is there a lower bound?
- unfortunately, the lower bound depends on the
workload
- but random replacement is pretty bad
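
For concreteness, a C sketch of the victim choice under Belady's
algorithm, assuming the entire reference string is known in advance
(exactly what makes it unrealizable online):

    #define NFRAMES 3

    /* returns the index into frames[] of the page whose next use
       lies furthest in the future (or never) */
    int opt_victim(const int frames[NFRAMES], const int *refs,
                   int nrefs, int now) {
        int victim = 0, furthest = -1;
        for (int f = 0; f < NFRAMES; f++) {
            int next = nrefs;    /* assume: never used again */
            for (int i = now + 1; i < nrefs; i++)
                if (refs[i] == frames[f]) { next = i; break; }
            if (next > furthest) { furthest = next; victim = f; }
        }
        return victim;
    }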
38. #2: FIFO
- FIFO is obvious, and simple to implement
- when you page something in, put it on the tail of
the list
- on eviction, throw away the page at the head of
the list
- Why might this be good?
- maybe the page brought in longest ago is not being
used
- Why might this be bad?
- then again, maybe it is being used
- we have absolutely no information either way
- FIFO suffers from Belady's Anomaly
- the fault rate might increase when the algorithm
is given more physical memory
- a very bad property
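
A C sketch of FIFO over a fixed pool of frames; a circular index
stands in for the head/tail list on the slide:

    #include <stdint.h>

    #define NFRAMES 4

    static uint32_t frames[NFRAMES];  /* VPN resident in each frame */
    static int      nresident = 0;    /* frames filled so far */
    static int      head = 0;         /* oldest page = next victim */

    /* handle a fault on vpn; returns the frame it now occupies */
    int fifo_fault(uint32_t vpn) {
        int f;
        if (nresident < NFRAMES) {
            f = nresident++;          /* still have a free frame */
        } else {
            f = head;                 /* evict the page at the head */
            head = (head + 1) % NFRAMES;
        }
        frames[f] = vpn;              /* new page goes on the tail */
        return f;
    }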
39. #3: Least Recently Used (LRU)
- LRU uses reference information to make a more
informed replacement decision
- idea: past experience gives us a guess of future
behavior
- on replacement, evict the page that hasn't been
used for the longest amount of time
- LRU looks at the past; Belady's wants to look at
the future
- when does LRU do well?
- when does it suck?
- Implementation
- to be perfect, must grab a timestamp on every
memory reference and put it in the PTE (way too
expensive)
- so, we need an approximation
40. Approximating LRU
- Many approximations, all using the PTE reference
bit
- keep a counter for each page
- at some regular interval, for each page, do:
- if ref bit == 0, increment the counter (page
hasn't been used)
- if ref bit == 1, zero the counter (page has
been used)
- regardless, zero the ref bit
- the counter will contain the # of intervals since
the last reference to the page
- the page with the largest counter is the least
recently used
- Some architectures don't have PTE reference bits
- can simulate a reference bit using the valid bit
to induce faults
- hack, hack, hack
41. #4: LRU Clock
- AKA Not Recently Used (NRU) or Second Chance
- replace a page that is "old enough"
- arrange all physical page frames in a big circle
(a clock)
- just a circular linked list
- a clock hand is used to select a good LRU
candidate
- sweep through the pages in circular order, like a
clock
- if the ref bit is off, the page hasn't been used
recently, and we have a victim
- so, what is the minimum age if the ref bit is off?
- if the ref bit is on, turn it off and go to the
next page
- the arm moves quickly when pages are needed
- low overhead if there is plenty of memory
- if memory is large, the accuracy of the
information degrades
- add more hands to fix this
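
A one-handed clock sweep in C; the loop always terminates within
two passes because it clears ref bits as it goes:

    #define NFRAMES 1024

    static int refbit[NFRAMES];
    static int hand = 0;

    /* returns the frame index chosen as the victim */
    int clock_evict(void) {
        for (;;) {
            if (refbit[hand] == 0) {     /* not used recently */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            refbit[hand] = 0;            /* give a second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }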
42. Another Problem: allocation of frames
- In a multiprogramming system, we need a way to
allocate physical memory to competing processes
- what if a victim page belongs to another process?
- a family of replacement algorithms takes this
into account
- Fixed space algorithms
- each process is given a limit of pages it can use
- when it reaches its limit, it replaces from its
own pages
- local replacement: some processes may do well,
others may suffer
- Variable space algorithms
- a process's set of pages grows and shrinks
dynamically
- global replacement: one process can ruin it for
the rest
- Linux uses global replacement
43. Important concept: the working set model
- The working set of a process is used to model the
dynamic locality of its memory usage
- i.e., working set = set of pages the process
currently needs
- formally defined by Peter Denning in the 1960s
- Definition:
- WS(t,w) = {pages P such that P was referenced in
the time interval (t-w, t)}
- t = time, w = working set window (measured in
page refs)
- a page is in the working set (WS) only if it was
referenced in the last w references
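
The definition can be checked against a recorded reference string;
this C sketch counts the distinct pages among the last w references
ending at time t:

    #include <stdio.h>

    #define MAXPAGE 256

    int ws_size(const int *refs, int t, int w) {
        char seen[MAXPAGE] = {0};
        int size = 0;
        int start = (t - w + 1 > 0) ? t - w + 1 : 0;
        for (int i = start; i <= t; i++)
            if (!seen[refs[i]]) { seen[refs[i]] = 1; size++; }
        return size;
    }

    int main(void) {
        int refs[] = {1, 2, 1, 3, 2, 2, 1, 4};
        printf("WS size at t=7, w=4: %d\n", ws_size(refs, 7, 4));
        return 0;
    }

For this string, the last four references {2, 2, 1, 4} touch three
distinct pages, so it prints 3.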
44. #5: Working Set Size
- The working set size changes with program
locality
- during periods of poor locality, more pages are
referenced
- within that period of time, the working set size
is larger
- Intuitively, the working set must be in memory;
otherwise you'll experience heavy faulting
(thrashing)
- when people ask "How much memory does Netscape
need?", really they are asking "what is
Netscape's average (or worst case) working set
size?"
- Hypothetical algorithm
- associate a parameter w with each process
- only allow a process to start if its w, when
added to that of all other processes, still fits
in memory
- use a local replacement algorithm within each
process
45. #6: Page Fault Frequency (PFF)
- PFF is a variable-space algorithm that uses a
more ad hoc approach
- monitor the fault rate for each process
- if the fault rate is above a given threshold, give
the process more memory
- so that it faults less
- doesn't always work (cf. FIFO and Belady's anomaly)
- if the fault rate is below the threshold, take
away memory
- the process should then fault more
- again, not always
46. #7: LFU
- Evict the least frequently used page
- bookkeeping: count the number of references to
each page
- But…
- how long is the history?
- a page may have been popular in the past, but that
tells us little about the future
- the pollution problem: useless pages occupy
space forever
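
A C sketch of the LFU victim choice; because refcount[] never
decays, it exhibits exactly the pollution problem noted above:

    #define NPAGES 1024

    static unsigned refcount[NPAGES];  /* bumped on every reference */
    static int      resident[NPAGES];  /* 1 if the page is in memory */

    /* returns the resident page with the fewest references, or -1 */
    int lfu_victim(void) {
        int victim = -1;
        unsigned best = ~0u;
        for (int p = 0; p < NPAGES; p++)
            if (resident[p] && refcount[p] < best) {
                best = refcount[p];
                victim = p;
            }
        return victim;
    }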
47. Thrashing
- What the OS does if the page replacement
algorithms fail
- happens when most of the time is spent by the OS
paging data back and forth from disk
- no time is spent doing useful work
- the system is over-committed
- no idea which pages should be in memory to
reduce faults
- could be that there just isn't enough physical
memory for all the processes
- solutions?
- Yields some insight into systems researchers
- if a system has too much memory
- the page replacement algorithm doesn't matter
(over-provisioned)
- if a system has too little memory
- the page replacement algorithm doesn't matter
(over-committed)
- the problem is only interesting on the border
between over-provisioned and over-committed
- many research papers live here, but not many real
systems do
48. Just to mention: Internet caches
- Similar idea, different applications
- Web caches, to keep Web pages
- Which pages to keep in the cache?
- Policies: LRU, LFU, cost-aware
- New issues: different page sizes, different cost
(latency) to download a page
- etc.
49. Summary
- demand paging
- start with no physical pages mapped, load them in
on demand - page replacement algorithms
- #1 Belady's: optimal, but unrealizable
- #2 FIFO: replace the page loaded furthest in the
past
- #3 LRU: replace the page referenced furthest in
the past
- approximate using the PTE reference bit
- #4 LRU Clock: replace a page that is "old enough"
- #5 working set: keep in memory the set of pages
that induces the minimal fault rate
- #6 page fault frequency: grow/shrink a process's
page set as a function of its fault rate
- local vs. global replacement
- should processes be allowed to evict each other's
pages?
50. Lecture 4, September 30, 2003
- Memory management
- Programming Concepts and Project 1
51. Project Assignment
- Handout
- Assignments
- Code
- Programs will be available online
- Hints on how to compile/run programs online
52. Reading
- Chapter 4, sections 4.1, 4.3, and 4.4
53. Next Lecture
- Cover Input/Output (Reading: Chapter 5)
- Start File Systems (Reading: Chapter 6)
- Homework 2