Title: MET TC670 B1 Computer Science Concepts in Telecommunication Systems
1. MET TC670 B1: Computer Science Concepts in
Telecommunication Systems
2. Lecture 4, September 30, 2003
- Memory management
- Programming Concepts and Project 1
3. Memory Management
- Goals of memory management
- convenient abstraction for programming
- isolation between processes
- allocate scarce memory resources between
competing processes, maximize performance
(minimize overhead)
- Mechanisms
- physical vs. virtual address spaces
- page table management, segmentation policies
- page replacement policies
4. Memory Management Topics
- Virtual memory techniques
- Paging system techniques
- Segmentation techniques
- Replacement algorithms
5. Earlier Technique: Virtual Memory
- The basic abstraction that the OS provides for
memory management is virtual memory (VM)
- VM enables programs to execute without requiring
their entire address space to be resident in
physical memory
- a program can also execute on machines with less
RAM than it needs
- many programs don't need all of their code or
data at once (or ever)
- e.g., branches they never take, or data they
never read/write
- no need to allocate memory for it; the OS should
adjust the amount allocated based on run-time
behavior
- virtual memory isolates processes from each other
- one process cannot name addresses visible to
others; each process has its own isolated address
space
6. In the beginning
- First, there was batch programming
- programs used physical addresses directly
- OS loads job, runs it, unloads it
- Then came multiprogramming
- need multiple processes in memory at once
- to overlap I/O and computation
- memory requirements
- protection: restrict which addresses processes
can use, so they can't stomp on each other
- fast translation: memory lookups must be fast, in
spite of the protection scheme
- fast context switching: when swapping between jobs,
updating the memory hardware (protection and
translation) must be quick
7. Virtual Addresses
- To make it easier to manage the memory of multiple
processes, make processes use virtual addresses
- virtual addresses are independent of the location
in physical memory (RAM) where the referenced data
lives
- the OS determines the location in physical memory
- instructions issued by the CPU reference virtual
addresses
- e.g., pointers, arguments to load/store
instructions, the PC, …
- virtual addresses are translated by hardware into
physical addresses (with some help from the OS)
- The set of virtual addresses a process can
reference is its address space
- many different possible mechanisms exist for
translating virtual addresses to physical
addresses
- we'll take a historical walk through them, ending
up with our current techniques
8. Old technique #1: Fixed Partitions
- Physical memory is broken up into fixed
partitions
- all partitions are equally sized; the partitioning
never changes
- hardware requirement: a base register
- physical address = virtual address + base
register
- the base register is loaded by the OS when it
switches to a process
- how can we ensure protection?
- Advantages
- simple, ultra-fast context switch
- Problems
- internal fragmentation: memory in a partition not
used by its owning process isn't available to
other processes
- partition size problem: no one size is
appropriate for all processes
- fragmentation vs. fitting large programs in a
partition
9. Fixed Partitions (K bytes)
[Figure: physical memory divided into six fixed K-byte partitions
(partition 0 at address 0, partition 1 at K, ..., partition 5 at 5K);
the base register holds the owning partition's start address (e.g.,
3K), and the offset in the virtual address is added to it to form
the physical address.]
10. Old technique #2: Variable Partitions
- Obvious next step: physical memory is broken up
into variable-sized partitions
- hardware requirements: base register, limit
register
- physical address = virtual address + base
register
- how do we provide protection? (see the sketch
below)
- if (physical address > base + limit) then … ?
- Advantages
- no internal fragmentation
- simply allocate the partition size to be just big
enough for the process
- (assuming we know what that is!)
- Problems
- external fragmentation
- as we load and unload jobs, holes are left
scattered throughout physical memory
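
A minimal C sketch of the base/limit check described above; the
struct and function names are my own, and real hardware performs
this comparison on every memory reference:

    #include <stdint.h>

    typedef struct { uint32_t base, limit; } partition_t;

    /* returns 0 and fills *pa on success; -1 means protection fault */
    int translate(partition_t p, uint32_t va, uint32_t *pa) {
        if (va >= p.limit)     /* address falls outside the partition */
            return -1;         /* raise protection fault */
        *pa = p.base + va;     /* physical = virtual + base register */
        return 0;
    }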
11. Variable Partitions
[Figure: physical memory holding variable-sized partitions 0-4; the
base register holds P3's base and the limit register holds P3's
size; the virtual address (an offset) is compared against the limit:
if smaller, it is added to the base to form the physical address,
otherwise a protection fault is raised.]
12. Modern technique: Paging
- Solve the external fragmentation problem by using
fixed-sized units in both physical and virtual
memory
[Figure: virtual memory pages 0 through X map to physical memory
frames 0 through Y.]
13. User's Perspective
- Processes view memory as a contiguous address
space from bytes 0 through N
- the virtual address space (VAS)
- In reality, virtual pages are scattered across
physical memory frames
- the virtual-to-physical mapping
- this mapping is invisible to the program
- Protection is provided because a program cannot
reference memory outside of its VAS
- the virtual address 0xDEADBEEF maps to different
physical addresses for different processes
14. Paging
- Translating virtual addresses
- a virtual address has two parts: virtual page
number and offset
- the virtual page number (VPN) is an index into a
page table
- the page table entry contains the page frame
number (PFN)
- the physical address is PFN::offset
- Page tables
- managed by the OS
- map virtual page numbers (VPN) to page frame
numbers (PFN)
- the VPN is simply an index into the page table
- one page table entry (PTE) per page in the virtual
address space
- i.e., one PTE per VPN
15. Paging
[Figure: the virtual page # in the virtual address indexes the page
table, yielding a page frame #; combined with the offset, this forms
the physical address into one of page frames 0 through Y in physical
memory.]
16. Paging example
- assume 32-bit addresses
- assume the page size is 4KB (4096 bytes, or 2^12
bytes)
- the VPN is 20 bits long (2^20 VPNs), the offset is
12 bits long
- let's translate virtual address 0x13325328
- the VPN is 0x13325, and the offset is 0x328
- assume page table entry 0x13325 contains the value
0x03004
- the page frame number is 0x03004
- VPN 0x13325 maps to PFN 0x03004
- physical address = PFN::offset = 0x03004328
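
The worked example above maps directly onto a few lines of C; this
toy sketch hard-codes the slide's page table entry instead of
walking a real table:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT  12       /* 4KB page: 2^12 bytes */
    #define OFFSET_MASK 0xFFFu   /* low 12 bits are the offset */

    int main(void) {
        uint32_t va  = 0x13325328u;
        uint32_t vpn = va >> PAGE_SHIFT;     /* 0x13325 */
        uint32_t off = va & OFFSET_MASK;     /* 0x328 */

        /* stand-in for the page table lookup on the slide:
           pretend PTE 0x13325 holds PFN 0x03004 */
        uint32_t pfn = 0x03004u;

        uint32_t pa = (pfn << PAGE_SHIFT) | off;   /* 0x03004328 */
        printf("VPN=0x%05x offset=0x%03x PA=0x%08x\n",
               (unsigned)vpn, (unsigned)off, (unsigned)pa);
        return 0;
    }

Running it prints VPN=0x13325 offset=0x328 PA=0x03004328, matching
the slide.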
17. Page Table Entries (PTEs)
[PTE layout, high bits to low: page frame number (20 bits) |
prot (2 bits) | M (1) | R (1) | V (1)]
- PTEs control the mapping
- the valid bit (V) says whether or not the PTE can
be used
- i.e., whether or not the virtual address is valid
- it is checked each time a virtual address is used
- the reference bit (R) says whether the page has
been accessed
- it is set when the page is read or written
- the modify bit (M) says whether or not the page is
dirty
- it is set when a write to the page occurs
- the protection bits control which operations are
allowed
- read, write, execute
- the page frame number determines the physical
page
- physical page start address = PFN × page size
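
One plausible way to encode this layout with C bit masks; the field
positions below are illustrative assumptions, since each real
architecture fixes its own PTE format:

    #include <stdint.h>

    #define PTE_V         (1u << 0)   /* valid bit */
    #define PTE_R         (1u << 1)   /* reference bit */
    #define PTE_M         (1u << 2)   /* modify (dirty) bit */
    #define PTE_PROT_MASK (3u << 3)   /* 2 protection bits */
    #define PTE_PFN_SHIFT 12          /* PFN in the high 20 bits */

    static inline int pte_valid(uint32_t pte) {
        return (pte & PTE_V) != 0;
    }
    static inline uint32_t pte_pfn(uint32_t pte) {
        return pte >> PTE_PFN_SHIFT;
    }
    static inline uint32_t mk_pte(uint32_t pfn, uint32_t flags) {
        return (pfn << PTE_PFN_SHIFT) | flags;
    }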
18. Paging Advantages
- Easy to allocate physical memory
- physical memory is allocated from a free list of
frames
- to allocate a frame, just remove it from the free
list
- external fragmentation is not a problem!
- complication: the kernel needs contiguous physical
memory allocations
- many lists, each keeping track of free regions of
a particular size
- region sizes are multiples of the page size
- buddy algorithm
- Easy to page out chunks of programs
- all chunks are the same size (the page size)
- use the valid bit to detect references to
paged-out pages
- also, page sizes are usually chosen to be
convenient multiples of disk block sizes
19. Paging Disadvantages
- Can still have internal fragmentation
- a process may not use memory in exact multiples of
pages
- Memory reference overhead
- 2 references per address lookup (page table, then
memory)
- solution: use a hardware cache to absorb page
table lookups
- the translation lookaside buffer (TLB): many
details in the textbook
- Memory required to hold page tables can be large
- need one PTE per page in the virtual address space
- 32-bit AS with 4KB pages: 2^20 PTEs = 1,048,576
PTEs
- 4 bytes/PTE = 4MB per page table
- OSes typically have separate page tables per
process
- 25 processes = 100MB of page tables!
- solution: page the page tables (!!!)
- (ow, my brain hurts… so complicated)
20. Two-level page tables
- With two-level PTs, virtual addresses have 3
parts:
- master page number, secondary page number, offset
- the master PT maps the master PN to a secondary PT
- the secondary PT maps the secondary PN to a page
frame number
- PFN::offset = physical address
- Example
- 4KB pages, 4 bytes/PTE
- how many bits in the offset? need 12 bits for 4KB
- want the master PT in one page: 4KB / 4 bytes =
1024 PTEs
- hence, 1024 secondary page tables
- so master page number = 10 bits, offset = 12
bits
- with a 32-bit address, that leaves 10 bits for the
secondary PN
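
The 10/10/12 split in this example can be sketched as a two-level
lookup in C; the table types and the name translate() are my own,
and the valid bit is taken to be bit 0 as on slide 17:

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT 12
    #define PT_ENTRIES 1024   /* 2^10 entries per table */

    typedef struct { uint32_t pte[PT_ENTRIES]; } pagetable_t;

    /* returns 0 on success, -1 if a level is unmapped */
    int translate(pagetable_t *master_dir[PT_ENTRIES],
                  uint32_t va, uint32_t *pa) {
        uint32_t master = va >> 22;             /* top 10 bits */
        uint32_t second = (va >> 12) & 0x3FFu;  /* middle 10 bits */
        uint32_t offset = va & 0xFFFu;          /* low 12 bits */

        pagetable_t *pt = master_dir[master];
        if (pt == NULL) return -1;              /* no secondary table */
        uint32_t pte = pt->pte[second];
        if (!(pte & 1u)) return -1;             /* valid bit clear */
        *pa = ((pte >> PAGE_SHIFT) << PAGE_SHIFT) | offset;
        return 0;
    }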
21. Two-level page tables
[Figure: the virtual address is split into master page #, secondary
page #, and offset; the master page table entry selects a secondary
page table, whose entry supplies the page frame number; the frame
number plus the offset forms the physical address into page frames 0
through Y.]
22. Addressing Page Tables
- Where are page tables stored?
- and in which address space?
- Possibility 1: physical memory
- easy to address, no translation required
- but, page tables consume memory for the lifetime
of the VAS
- Possibility 2: virtual memory (the OS's VAS)
- cold (unused) page table pages can be paged out
to disk
- but, addressing page tables then requires
translation
- how do we break the recursion?
- don't page the outer page table (called "wiring")
- So, now that we've paged the page tables, we might
as well page the entire OS address space!
- tricky: need to wire down some special code and
data (e.g., interrupt and exception handlers)
23. Making it all efficient
- The original page table scheme doubled the cost of
memory lookups
- one lookup into the page table, a second to fetch
the data
- Two-level page tables triple the cost!!
- two lookups into page tables, a third to fetch the
data
- How can we make this more efficient?
- goal: make fetching from a virtual address about
as efficient as fetching from a physical address
- solution: use a hardware cache inside the CPU
- cache the virtual-to-physical translations in
hardware
- called a translation lookaside buffer (TLB)
- the TLB is managed by the memory management unit
(MMU)
24. TLBs
- Translation lookaside buffers
- translate virtual page #s into PTEs (not
physical addresses)
- can be done in a single machine cycle
- The TLB is implemented in hardware
- it is a fully associative cache (all entries are
searched in parallel)
- cache tags are virtual page numbers
- cache values are PTEs
- with the PTE + offset, the MMU can directly
calculate the PA
- TLBs exploit locality
- processes only use a handful of pages at a time
- 16-48 entries in the TLB is typical (covering
64-192KB)
- can hold the "hot set" or "working set" of a
process
- hit rates in the TLB are therefore really
important
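
A sequential C analogue of that fully associative lookup (real
hardware checks all tags in parallel; this loop and the entry
layout are only illustrative):

    #include <stdint.h>

    #define TLB_ENTRIES 48

    struct tlb_entry { uint32_t vpn; uint32_t pte; int valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* returns the cached PTE, or 0 on a TLB miss */
    uint32_t tlb_lookup(uint32_t vpn) {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return tlb[i].pte;   /* hit: MMU forms PA directly */
        return 0;                    /* miss: walk the page tables */
    }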
25. Managing TLBs
- Address translations are mostly handled by the
TLB
- >99% of translations, but there are occasional
TLB misses
- on a miss, who places translations into
the TLB?
- Hardware (memory management unit, MMU)
- knows where page tables are in memory
- the OS maintains them, the HW accesses them
directly
- tables have to be in a HW-defined format
- this is how x86 works
- Software-loaded TLB (OS)
- a TLB miss faults to the OS, which finds the right
PTE and loads the TLB
- must be fast (but 20-200 cycles is typical)
- the CPU ISA has instructions for TLB manipulation
- the OS gets to pick the page table format
26. Managing TLBs (2)
- OS must ensure TLB and page tables are consistent
- when the OS changes the protection bits in a PTE,
it needs to invalidate the PTE if it is in the TLB
- What happens on a process context switch?
- remember, each process typically has its own page
tables
- need to invalidate all the entries in the TLB!
(flush the TLB)
- this is a big part of why process context
switches are costly
- can you think of a hardware fix for this?
- When the TLB misses and a new PTE is loaded, a
cached PTE must be evicted
- choosing a victim PTE is called the TLB
replacement policy
- implemented in hardware, usually simple (e.g., LRU)
27. More Techniques: Segmentation
- A similar technique to paging is segmentation
- segmentation partitions memory into logical units
- stack, code, heap, …
- on a segmented machine, a VA is <segment #,
offset>
- segments are units of memory, from the user's
perspective
- A natural extension of variable-sized partitions
- variable-sized partitions = 1 segment/process
- segmentation = many segments/process
- Hardware support
- multiple base/limit pairs, one per segment
- stored in a segment table
- segments are named by segment #, used as an index
into the table
28. Segment lookups
[Figure: the segment # in the virtual address indexes the segment
table; the offset is compared against the segment's limit: if
smaller, it is added to the segment's base to address one of
segments 0-4 in physical memory, otherwise a protection fault is
raised.]
29. Combining Segmentation and Paging
- Can combine these techniques
- the x86 architecture supports both segments and
paging
- Use segments to manage logically related units
- stack, file, module, heap, …
- segments vary in size, but are usually large
(multiple pages)
- Use pages to partition segments into fixed chunks
- makes segments easier to manage within physical
memory
- no external fragmentation
- segments are pageable: don't need the entire
segment in memory at the same time
- Linux
- 1 kernel code segment, 1 kernel data segment
- 1 user code segment, 1 user data segment
- N task state segments (store registers on
context switch)
- 1 local descriptor table segment (not really
used)
- all of these segments are paged
- three-level page tables
30. Cool Paging Tricks
- Exploit the level of indirection between VA and PA
- shared memory
- regions of two separate processes' address spaces
map to the same physical frames
- read/write access to shared data
- execute shared libraries!
- each process has its own PTEs, so different
processes can be given different access privileges
- must the shared region map to the same VA in each
process?
- copy-on-write (COW), e.g., on fork() (sketched
below)
- instead of copying all pages, create shared
mappings of the parent's pages in the child's
address space
- make the shared mappings read-only in the child's
space
- when the child does a write, a protection fault
occurs; the OS takes over, copies the page, and
resumes the child
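
A small POSIX program that shows COW semantics from user level; it
demonstrates the visible effect, not the page-level mechanism:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int x = 42;
        pid_t pid = fork();      /* child shares frames with parent */
        if (pid == 0) {
            x = 99;              /* write faults; OS copies the page */
            printf("child  x=%d\n", x);
            exit(0);
        }
        wait(NULL);
        printf("parent x=%d\n", x);  /* still 42 */
        return 0;
    }

The parent still sees 42 because the child's store landed in a
private copy of the page.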
31. Another great trick
- Memory-mapped files
- instead of using open, read, write, close
- map a file into a region of the virtual address
space
- e.g., into a region with base X
- accessing virtual address X+N refers to offset
N in the file
- initially, all pages in the mapped region are
marked invalid
- the OS reads a page in from the file whenever an
invalid page is accessed
- the OS writes a page back to the file when it is
evicted from physical memory
- only necessary if the page is dirty
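
A minimal POSIX mmap() sketch of the idea; the input file data.bin
is hypothetical and error handling is trimmed:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) return 1;
        struct stat st;
        fstat(fd, &st);
        /* map the whole file; base plays the role of X above */
        char *base = mmap(NULL, st.st_size, PROT_READ,
                          MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) return 1;
        size_t n = 100;              /* offset N in the file */
        if ((off_t)n < st.st_size)   /* reading base[n] may fault
                                        the page in from disk */
            printf("byte %zu = 0x%02x\n", n,
                   (unsigned)(unsigned char)base[n]);
        munmap(base, st.st_size);
        close(fd);
        return 0;
    }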
32. Demand Paging
- Pages can be moved between memory and disk
- this process is called demand paging
- it differs from swapping (where the entire process
is moved, not a page)
- The OS uses main memory as a (page) cache of all
of the data allocated by processes in the system
- initially, pages are allocated from physical
memory frames
- when physical memory fills up, allocating a page
requires some other page to be evicted from
its physical memory frame
- evicted pages go to disk (only need to write them
if they are dirty)
- to a swap file
- movement of pages between memory and disk is done
by the OS
- it is transparent to the application
- except for performance
33. Key Algorithms: Replacement
- What happens to a process that references a VA in
a page that has been evicted? (the fault path is
sketched below)
- when the page was evicted, the OS set the PTE to
invalid and stored (in the PTE) the location of
the page in the swap file
- when the process accesses the page, the invalid
PTE causes an exception (page fault) to be raised
- the OS runs the page fault handler in
response
- the handler uses the invalid PTE to locate the
page in the swap file
- the handler reads the page into a physical frame,
then updates the PTE to point to it and to be valid
- the handler restarts the faulted process
- But where does the page that's read in go?
- have to evict something else (the page replacement
algorithm)
- the OS typically tries to keep a pool of free
pages around so that allocations don't inevitably
cause evictions
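
A C-flavored sketch of that fault path; every helper here (pte_of,
find_or_evict_frame, swap_read) is hypothetical, standing in for OS
internals that vary widely:

    #include <stdint.h>

    typedef struct { uint32_t pfn_or_slot; int valid; } pte_t;

    extern pte_t   *pte_of(uint32_t vpn);       /* locate the PTE */
    extern uint32_t find_or_evict_frame(void);  /* may run replacement */
    extern void     swap_read(uint32_t slot, uint32_t pfn);

    void page_fault(uint32_t faulting_va) {
        uint32_t vpn  = faulting_va >> 12;
        pte_t   *pte  = pte_of(vpn);
        uint32_t slot = pte->pfn_or_slot;   /* invalid PTE holds the
                                               swap-file location */
        uint32_t pfn  = find_or_evict_frame();
        swap_read(slot, pfn);               /* read page into frame */
        pte->pfn_or_slot = pfn;             /* point PTE at the frame */
        pte->valid = 1;                     /* and mark it valid */
        /* on return, the faulted instruction is restarted */
    }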
34. Why does this work?
- Locality!
- temporal locality
- locations referenced recently tend to be
referenced again soon
- spatial locality
- locations near recently referenced locations are
likely to be referenced soon (think about why)
- Locality means paging can be infrequent
- once you've paged something in, it will be used
many times
- on average, you use the things that are paged in
- but this depends on many things
- degree of locality in the application
- page replacement policy and the application's
reference pattern
- amount of physical memory vs. the application's
footprint
35. Why is this demand paging?
- Think about when a process first starts up
- it has a brand new page table, with all PTE valid
bits false
- no pages are yet mapped to physical memory
- when the process starts executing
- instructions immediately fault on both code and
data pages
- faults stop when all necessary code/data pages
are in memory
- only the code/data that is needed (demanded!) by
the process needs to be loaded
- what is needed changes over time, of course
36. Evicting the best page
- The goal of the page replacement algorithm:
- reduce the fault rate by selecting the best victim
page to remove
- the best page to evict is one that will never be
touched again
- as the process will never again fault on it
- "never" is a long time
- Belady's proof: evicting the page that won't be
used for the longest period of time minimizes the
page fault rate
- Rest of this lecture:
- survey a bunch of replacement algorithms
37. #1: Belady's Algorithm
- Provably optimal: lowest fault rate (remember
SJF?)
- pick the page that won't be used for the longest
time in the future
- problem: impossible to predict the future
- Why is Belady's algorithm useful?
- as a yardstick to compare other algorithms against
the optimal
- if Belady's isn't much better than yours, yours
is pretty good
- Is there a lower bound?
- unfortunately, the lower bound depends on the
workload
- but random replacement is pretty bad
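
For concreteness, a C sketch of the victim choice under Belady's
algorithm, assuming the entire reference string is known in advance
(exactly what makes it unrealizable online):

    #define NFRAMES 3

    /* returns the index into frames[] of the page whose next use
       lies furthest in the future (or never) */
    int opt_victim(const int frames[NFRAMES], const int *refs,
                   int nrefs, int now) {
        int victim = 0, furthest = -1;
        for (int f = 0; f < NFRAMES; f++) {
            int next = nrefs;    /* assume: never used again */
            for (int i = now + 1; i < nrefs; i++)
                if (refs[i] == frames[f]) { next = i; break; }
            if (next > furthest) { furthest = next; victim = f; }
        }
        return victim;
    }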
38. #2: FIFO
- FIFO is obvious, and simple to implement
- when you page something in, put it on the tail of
the list
- on eviction, throw away the page at the head of
the list
- Why might this be good?
- maybe the page brought in longest ago is not being
used
- Why might this be bad?
- then again, maybe it is being used
- we have absolutely no information either way
- FIFO suffers from Belady's Anomaly
- the fault rate might increase when the algorithm
is given more physical memory
- a very bad property
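
A C sketch of FIFO over a fixed pool of frames; a circular index
stands in for the head/tail list on the slide:

    #include <stdint.h>

    #define NFRAMES 4

    static uint32_t frames[NFRAMES];  /* VPN resident in each frame */
    static int      nresident = 0;    /* frames filled so far */
    static int      head = 0;         /* oldest page = next victim */

    /* handle a fault on vpn; returns the frame it now occupies */
    int fifo_fault(uint32_t vpn) {
        int f;
        if (nresident < NFRAMES) {
            f = nresident++;          /* still have a free frame */
        } else {
            f = head;                 /* evict the page at the head */
            head = (head + 1) % NFRAMES;
        }
        frames[f] = vpn;              /* new page goes on the tail */
        return f;
    }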
39. #3: Least Recently Used (LRU)
- LRU uses reference information to make a more
informed replacement decision
- idea: past experience gives us a guess of future
behavior
- on replacement, evict the page that hasn't been
used for the longest amount of time
- LRU looks at the past; Belady's wants to look at
the future
- when does LRU do well?
- when does it suck?
- Implementation
- to be perfect, must grab a timestamp on every
memory reference and put it in the PTE (way too
expensive)
- so, we need an approximation
40. Approximating LRU
- Many approximations, all using the PTE reference
bit
- keep a counter for each page
- at some regular interval, for each page, do:
- if ref bit == 0, increment the counter (page
hasn't been used)
- if ref bit == 1, zero the counter (page has
been used)
- regardless, zero the ref bit
- the counter will contain the # of intervals since
the last reference to the page
- the page with the largest counter is the least
recently used
- Some architectures don't have PTE reference bits
- can simulate a reference bit using the valid bit
to induce faults
- hack, hack, hack
41. #4: LRU Clock
- AKA Not Recently Used (NRU) or Second Chance
- replace a page that is "old enough"
- arrange all physical page frames in a big circle
(a clock)
- just a circular linked list
- a clock hand is used to select a good LRU
candidate
- sweep through the pages in circular order, like a
clock
- if the ref bit is off, the page hasn't been used
recently, and we have a victim
- so, what is the minimum age if the ref bit is off?
- if the ref bit is on, turn it off and go to the
next page
- the arm moves quickly when pages are needed
- low overhead if there is plenty of memory
- if memory is large, the accuracy of the
information degrades
- add more hands to fix this
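
A one-handed clock sweep in C; the loop always terminates within
two passes because it clears ref bits as it goes:

    #define NFRAMES 1024

    static int refbit[NFRAMES];
    static int hand = 0;

    /* returns the frame index chosen as the victim */
    int clock_evict(void) {
        for (;;) {
            if (refbit[hand] == 0) {     /* not used recently */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            refbit[hand] = 0;            /* give a second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }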
42. Another Problem: allocation of frames
- In a multiprogramming system, we need a way to
allocate physical memory to competing processes
- what if a victim page belongs to another process?
- a family of replacement algorithms takes this
into account
- Fixed space algorithms
- each process is given a limit of pages it can use
- when it reaches its limit, it replaces from its
own pages
- local replacement: some processes may do well,
others may suffer
- Variable space algorithms
- a process's set of pages grows and shrinks
dynamically
- global replacement: one process can ruin it for
the rest
- Linux uses global replacement
43. Important concept: the working set model
- The working set of a process is used to model the
dynamic locality of its memory usage
- i.e., working set = set of pages the process
currently needs
- formally defined by Peter Denning in the 1960s
- Definition:
- WS(t,w) = {pages P such that P was referenced in
the time interval (t-w, t)}
- t = time, w = working set window (measured in
page refs)
- a page is in the working set (WS) only if it was
referenced in the last w references
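
The definition can be checked against a recorded reference string;
this C sketch counts the distinct pages among the last w references
ending at time t:

    #include <stdio.h>

    #define MAXPAGE 256

    int ws_size(const int *refs, int t, int w) {
        char seen[MAXPAGE] = {0};
        int size = 0;
        int start = (t - w + 1 > 0) ? t - w + 1 : 0;
        for (int i = start; i <= t; i++)
            if (!seen[refs[i]]) { seen[refs[i]] = 1; size++; }
        return size;
    }

    int main(void) {
        int refs[] = {1, 2, 1, 3, 2, 2, 1, 4};
        printf("WS size at t=7, w=4: %d\n", ws_size(refs, 7, 4));
        return 0;
    }

For this string, the last four references {2, 2, 1, 4} touch three
distinct pages, so it prints 3.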
44. #5: Working Set Size
- The working set size changes with program
locality
- during periods of poor locality, more pages are
referenced
- within that period of time, the working set size
is larger
- Intuitively, the working set must be in memory;
otherwise you'll experience heavy faulting
(thrashing)
- when people ask "How much memory does Netscape
need?", really they are asking "what is
Netscape's average (or worst case) working set
size?"
- Hypothetical algorithm
- associate a parameter w with each process
- only allow a process to start if its w, when
added to that of all other processes, still fits
in memory
- use a local replacement algorithm within each
process
45. #6: Page Fault Frequency (PFF)
- PFF is a variable-space algorithm that uses a
more ad hoc approach
- monitor the fault rate for each process
- if the fault rate is above a given threshold, give
the process more memory
- so that it faults less
- doesn't always work (cf. FIFO and Belady's anomaly)
- if the fault rate is below the threshold, take
away memory
- the process should then fault more
- again, not always
46. #7: LFU
- Evict the least frequently used page
- bookkeeping: count the number of references to
each page
- But…
- how long is the history?
- a page may have been popular in the past, but that
tells us little about the future
- the pollution problem: useless pages occupy
space forever
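
A C sketch of the LFU victim choice; because refcount[] never
decays, it exhibits exactly the pollution problem noted above:

    #define NPAGES 1024

    static unsigned refcount[NPAGES];  /* bumped on every reference */
    static int      resident[NPAGES];  /* 1 if the page is in memory */

    /* returns the resident page with the fewest references, or -1 */
    int lfu_victim(void) {
        int victim = -1;
        unsigned best = ~0u;
        for (int p = 0; p < NPAGES; p++)
            if (resident[p] && refcount[p] < best) {
                best = refcount[p];
                victim = p;
            }
        return victim;
    }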
47. Thrashing
- What the OS does if the page replacement
algorithms fail
- happens when most of the time is spent by the OS
paging data back and forth from disk
- no time is spent doing useful work
- the system is over-committed
- no idea which pages should be in memory to
reduce faults
- could be that there just isn't enough physical
memory for all the processes
- solutions?
- Yields some insight into systems researchers
- if a system has too much memory
- the page replacement algorithm doesn't matter
(over-provisioned)
- if a system has too little memory
- the page replacement algorithm doesn't matter
(over-committed)
- the problem is only interesting on the border
between over-provisioned and over-committed
- many research papers live here, but not many real
systems do
48. Just to mention: Internet caches
- Similar idea, different applications
- Web caches, to keep Web pages
- Which pages to keep in the cache?
- Policies: LRU, LFU, cost-aware
- New issues: different page sizes, different cost
(latency) to download a page
- etc.
49. Summary
- demand paging
- start with no physical pages mapped, load them in
on demand - page replacement algorithms
- #1 Belady's: optimal, but unrealizable
- #2 FIFO: replace the page loaded furthest in the
past
- #3 LRU: replace the page referenced furthest in
the past
- approximate using the PTE reference bit
- #4 LRU Clock: replace a page that is "old enough"
- #5 working set: keep in memory the set of pages
that induces the minimal fault rate
- #6 page fault frequency: grow/shrink a process's
page set as a function of its fault rate
- local vs. global replacement
- should processes be allowed to evict each other's
pages?
50. Lecture 4, September 30, 2003
- Memory management
- Programming Concepts and Project 1
51. Project Assignment
- Handout
- Assignments
- Code
- Programs will be available online
- Hints on how to compile/run programs online
52. Reading
- Chapter 4, sections 4.1, 4.3, and 4.4
53. Next Lecture
- Cover Input/Output (Reading: Chapter 5)
- Start File Systems (Reading: Chapter 6)
- Homework 2