Title: Advanced Operating Systems
1Advanced Operating Systems
Lecture 3 OS design
- University of Tehran
- Dept. of EE and Computer Engineering
- By
- Dr. Nasser Yazdani
2How to design an OS
- Some general guides and experiences.
- References
- The Computer for the 21st Century, Mark Weiser
- Exokernel: An Operating System Architecture for
Application-Level Resource Management, Dawson R.
Engler, M. Frans Kaashoek, et al.
- On Micro-Kernel Construction
3Outline
- New applications/requirements
- Organizing operating systems
- Some microkernel examples
- Object-oriented organizations
- Spring
- Organization for multiprocessors
4New vision
- Two important problems: location and scale.
- Ubiquitous computing: tiny kernels of
functionality
- Virtual Reality
- Mobility
- Intelligent devices
- Distributed computing: make networks appear like
disks, memory, or other nonnetworked devices.
5Ubiquitous computing
- Transparent computing is the ultimate goal
- Computers should disappear into the background
- Computation becomes part of the environment
- Computing everywhere
- Desktop, Laptop, Palmtop
- Cars, Cell phones
- Shoes, Clothing, Walls (paper / paint)
- Connectivity everywhere
- Broadband
- Wireless
- Mobile everywhere
- Users move around
- Disposable devices
6Ubiquitous Computing
- Structure
- Resource and service discovery critical
- User location an issue
- Interface discovery
- Disconnected operation
- Ad-hoc organization
- Security
- Small devices with limited power
- Intermittent connectivity
- Agents
- Sensor Networks
7Grid Computing
- Federated system
- No single controlling authority
- Scheduling
- Processors, bandwidth and other resources
- Policy is an important issue
- Reliability, security, who can use resources, and what
one is willing to use.
- Systems
- Globus toolkit
- Condor
- Related but not grid: CORBA, DCOM, DCE
- Applications
- Distributed supercomputing
8Peer-to-Peer Computing
- Locating Cooperative elements
- Scalability
- OS support
- Security
- Policies
9P2P File Sharing Issues
- Naming
- Data discovery
- Availability
- Security
- Encryption
- Fault tolerance
- Conflict resolution
- Replication
10Other Peer to Peer Technologies
- Ad-hoc networking
- Untrusted nodes used to relay messages
- Multiple routes (distributed and replicated)
- Extends range, reduces power, increases aggregate
bandwidth.
- Increases latency, management more difficult.
- Sensor networks
- An application of ad-hoc networking
- Add processing/reduction in the network
11What is the big deal?
- Performance
- Border crossings are expensive
- Change in locality
- Copying between user and kernel buffers
- Application requirements differ in terms of
resource management
12Operating System Organization
- What is the best way to design an operating
system?
- Put another way, what are the important software
characteristics of an OS?
- What should be in the OS kernel and what in
applications (partitioning)?
- Is there a minimal set for the kernel?
13Important OS Software Characteristics
- Correctness and simplicity
- Power and completeness
- Performance
- Extensibility and portability
- Flexibility
- Scalability
- Suitability for distributed and parallel systems
- Compatibility with existing systems
- Security and fault tolerance
14Common OS Organizations
- Monolithic
- Virtual machine
- Layered designs
- Kernel designs
- Microkernels
- Object-Oriented
- Note that individual OS components can be
organized these ways
- Trade off between generality and specialization
15What are we shooting for?
- OS should be thin (like a microkernel), providing
only mechanisms, not embodying policies (i.e.
management)
- Fine grain access to system resources while
avoiding border crossings as much as possible
(like DOS)
- Allow flexible extensions for management of
resources (like a microkernel) without
sacrificing safety (like a monolithic kernel)
16Monolithic OS Design
- Build OS as single combined module
- Hopefully using data abstraction,
compartmentalized function, etc.
- OS lives in its own, single address space
- Examples
- DOS
- early Unix systems
- most VFS file systems
17Pros/Cons of Monolithic OS Organization
- Highly adaptable (at first . . .)
- Little planning required
- Potentially good performance
- Hard to extend and change
- Eventually becomes extremely complex
- Eventually performance becomes poor
- Highly prone to bugs
18Virtual Machine Organizations
- A base operating system provides services in a
very generic way
- One or more other operating systems live on top
of the base system
- Using the services it provides
- To offer different views of system to users
- Examples: IBM's VM/370, the Java interpreter
19Pros/Cons of Virtual Machine Organizations
- Allows multiple OS personalities on a single
machine
- Good OS development environment
- Can provide good portability of applications
- Significant performance problems
- Especially if more than 2 layers
- Lacking in flexibility
20Old idea
- VM 370
- Virtualization for binary support for legacy apps
- Why resurgence today?
- Companies want a share of everybody's pie
- IBM zSeries mainframes support virtualization
for server consolidation
- Enables billing and performance isolation while
hosting several customers
- Microsoft has announced virtualization plans to
allow easy upgrades and hosting Linux!
- You can see the dots connecting up
- From extensibility (a la SPIN) to virtualization
21Possible virtualization approaches
- Standard OS (such as Linux, Windows)
- Meta services (such as grid) for users to install
files and run processes
- Administration, accountability, and performance
isolation become hard
- Retrofit performance isolation into OSs
- Linux/RK, QLinux, SILK
- Accounting resource usage correctly can be an
issue unless done at the lowest level (e.g.
Exokernel)
- Xen approach
- Multiplex physical resource at OS granularity
22Full virtualization
- Virtual hardware identical to real one
- Relies on hosted OS trapping to the VMM for
privileged instructions
- Pros: run unmodified OS binary on top
- Cons
- Supervisor instructions can fail silently on some
hardware platforms (e.g. x86)
- Solution in VMware: dynamically rewrite portions
of the hosted OS to insert traps
- Need for hosted OS to see real resources: real
time, page coloring tricks for optimizing
performance, etc.
23Xen principles
- Support for unmodified application binaries
- Support for multi-application OS
- Complex server configuration within a single OS
instance
- Paravirtualization for strong resource isolation
on uncooperative hardware (x86)
- Paravirtualization to enable optimizing guest OS
performance and correctness
24Xen VM management
- What would make VM virtualization easy
- Software TLB
- Tagged TLB => no TLB flush on context switch
- x86 does not have either
- Xen approach
- Guest OS responsible for allocating and managing
hardware PT
- Xen lives in the top 64MB of every address space. Why?
25Layered OS Design
- Design tiny innermost layer of software
- Next layer out provides more functionality
- Using services provided by inner layer
- Continue adding layers until all functionality
required has been provided
- Examples
- Multics
- Fluke
- layered file systems and comm. protocols
26Pros/Cons of Layered Organization
- More structured and extensible
- Easy to model and develop
- Performance: layer crossings can be expensive
- In some cases, unnecessary layers and duplicated
functionality.
27Kernel OS Designs
- Similar to layers, but only two OS layers
- Kernel OS services
- Non-kernel OS services
- Move certain functionality outside kernel
- file systems, libraries
- Unlike virtual machines, kernel doesn't stand
alone
- Examples: most modern Unix systems
28Pros/Cons of Kernel OS Organization
- Many advantages of layering, without disadvantage
of too many layers
- Easier to demonstrate correctness
- Not as general as layering
- Offers no organizing principle for other parts of
OS, user services
- Kernels tend to grow to monoliths
29Object-Oriented OS Design
- Design internals of OS as set of privileged
objects, using OO methods
- Sometimes extended into application space
- Tends to lead to client/server style of computing
- Examples
- Mach (internally)
- Spring (totally)
30Object-Oriented Organizations
- Object-oriented organization is increasingly
popular
- Well suited to OS development, in some ways
- OSes manage important data structures
- OSes are modularizable
- Strong interfaces are good in OSes
31Object-Orientation and Extensibility
- One of the main advantages of object-oriented
programming is extensibility
- Operating systems increasingly need extensibility
- So, again, object-oriented techniques are a good
match for operating system design
32How object-oriented should an OS be?
- Many OSes have been built with object-oriented
techniques
- E.g., Mach and Windows NT
- But most of them leave object orientation at the
microkernel boundary
- No attempt to force object orientation on
out-of-kernel modules
33Pros/Cons of Object Oriented OS Organization
- Offers organizational model for entire system
- Easily divides system into pieces
- Good hooks for security
- Can be a limiting model
- Must watch for performance problems
- Not widely used yet
34Microkernel OS Design
- Like kernels, only with fewer abstractions
exported (threads, address space, communication
channel)
- Try to include only a small set of required
services in the microkernel
- Moves even more out of innermost OS part
- Like parts of VM, IPC, paging, etc.
- System services (e.g. VM manager) implemented as
servers on top
- High communication overhead between services
implemented at user level and the microkernel limits
extensibility in practice
- Examples: Mach, Amoeba, Plan 9, Windows NT,
Chorus, Spring, etc.
35Pros/Cons of Microkernel Organization
- Those of kernels, plus
- Minimizes code for most important OS services
- Offers model for entire system
- Microkernels tend to grow into kernels
- Requires very careful initial design choices
- Serious danger of bad performance
36Organizing the Total System
- In microkernel organizations, much of the OS is
outside the microkernel
- But that doesn't answer the question of how the
system as a whole gets organized
- How do you fit together the components to build
an integrated system while maintaining all the
advantages of the microkernel?
37Some Important Microkernel Designs
- Micro-ness is in the eye of the beholder
- Mach
- Spring
- Amoeba
- Plan 9
- Windows NT
38Mach
- Mach didn't start life as a microkernel
- Became one in Mach 3.0
- Object-oriented internally
- Doesn't force OO at higher levels
- Microkernel focus is on communications facilities
- Much concern with parallel/distributed systems
39Mach Model
User processes
User space
Software emulation layer
4.3BSD emul.
SysV emul.
HP/UX emul.
other emul.
Kernel space
Microkernel
40What's In the Mach Microkernel?
- Tasks and Threads
- Ports and Port Sets
- Messages
- Memory Objects
- Device Support
- Multiprocessor/Distributed Support
41Mach Tasks
- An execution environment providing basic unit of
resource allocation
- Contains
- Virtual address space
- Port set
- One or more threads
42Mach Task Model
Address space
Process
User space
Thread
Process port
Bootstrap port
Exception port
Registered ports
Kernel
43Mach Threads
- Basic unit of Mach execution
- Runs in context of one task
- All threads in one task share its resources
- Unix process similar to Mach task with single
thread
44Task and Thread Scheduling
- Very flexible
- Controllable by kernel or user-level programs
- Threads of single task can execute in parallel
- On single processor
- Multiple processors
- User-level scheduling can extend to
multiprocessor scheduling
45Mach Ports
- Basic Mach object reference mechanism
- Kernel-protected communication channel
- Tasks communicate by sending messages to ports
- Threads in receiving tasks pull messages off a
queue
- Ports are location independent
- Port queues are protected by the kernel and bounded
46Port Rights
- Mechanism by which tasks control who may talk to
their ports
- Kernel prevents messages being sent to a port
unless the sender has its port rights
- Port rights also control which single task
receives on a port
47Port Sets
- A group of ports sharing a common message queue
- A thread can receive messages from a port set
- Thus servicing multiple ports
- Messages are tagged with the actual port
- A port can be a member of at most one port set
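The port-set rules above can be illustrated with a toy model. This is a minimal sketch in Python, not Mach's actual interface: the class names `Port` and `PortSet` and the `receive` helper are hypothetical, and every port is assumed to have joined a set before it is used.

```python
from collections import deque

class PortSet:
    """Hypothetical port set: one message queue shared by all members."""
    def __init__(self):
        self.queue = deque()
        self.members = set()

class Port:
    """Hypothetical port that must join a port set before use."""
    def __init__(self, name):
        self.name = name
        self.port_set = None

    def join(self, port_set):
        # A port can be a member of at most one port set.
        if self.port_set is not None:
            raise ValueError("port already belongs to a port set")
        self.port_set = port_set
        port_set.members.add(self.name)

    def send(self, body):
        # Messages are tagged with the actual port they were sent to.
        self.port_set.queue.append((self.name, body))

def receive(port_set):
    """A single thread services every member port through one queue."""
    return port_set.queue.popleft()

ps = PortSet()
a, b = Port("a"), Port("b")
a.join(ps)
b.join(ps)
a.send("hello")
b.send("world")
first = receive(ps)  # the tag tells the receiver which port was used
```

The tag is what lets one receiving thread tell apart traffic for several ports even though they share a queue.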
48Mach Messages
- Typed collection of data objects
- Unlimited size
- Sent to particular port
- May contain actual data or pointer to data
- Port rights may be passed in a message
- Kernel inspects messages for particular data
types (like port rights)
49Mach Memory Objects
- A source of memory accessible by tasks
- May be managed by user-mode external memory
manager
- E.g. a file managed by a file server
- Accessed by messages through a port
- Kernel manages physical memory as cache of
contents of memory objects
50Mach Device Support
- Devices represented by ports
- Messages control the device and its data transfer
- Actual device driver outside the kernel in an
external object
51Mach Multiprocessor and DS Support
- Messages and ports can extend across
processor/machine boundaries
- Location transparent entities
- Kernel manages distributed hardware
- Per-processor data structures, but also
structures shared across the processors
- Intermachine messages handled by a server that
knows about network details
52Mach's NetMsgServer
- User-level capability-based networking daemon
- Handles naming and transport for messages
- Provides world-wide name service for ports
- Messages sent to off-node ports go through this
server
53NetMsgServer in Action
User space
User space
User process
User process
NetMsgServer
NetMsgServer
Kernel space
Kernel space
Receiver
Sender
54Mach and User Interfaces
- Mach was built for the UNIX community
- UNIX programs don't know about ports, messages,
threads, and tasks
- How do UNIX programs run under Mach?
- Mach typically runs a user-level server that
offers UNIX emulation
- Either provides UNIX system call semantics
internally or translates it to Mach primitives
55Windows NT
- More layered than some microkernel designs
- NT Microkernel provides base services
- Executive builds on base services via modules to
provide user-level services
- User-level services used by
- privileged subsystems (parts of OS)
- true user programs
56Windows NT Diagram
User Processes
Protected Subsystems
User Mode
Win32
POSIX
Kernel Mode
Executive
Microkernel
Hardware
57NT Microkernel
- Thread scheduling
- Process switching
- Exception and interrupt handling
- Multiprocessor synchronization
- Only NT part not preemptible or pageable
- All other NT components run in threads
58NT Executive
- Higher level services than microkernel
- Runs in kernel mode
- but separate from the microkernel itself
- ease of change and expansion
- Built of independent modules
- all preemptible and pageable
59NT Executive Modules
- Object manager
- Security reference monitor
- Process manager
- Local procedure call facility (a la RPC)
- Virtual memory manager
- I/O manager
60Typical Activity in NT
Win32 Protected Subsystem
Client Process
Executive
Kernel
Hardware
61Windows NT Threads
- Executable entity running in an address space
- Scheduled by kernel
- Handled by kernel's dispatcher
- Kernel works with stripped-down view of thread:
kernel thread object
- Multiple process threads can execute on distinct
processors, even Executive ones
62Microkernel Process Objects
- A microkernel proxy for the real process
- Microkernel's interface to the real process
- Contains pointers to the various resources owned
by the process
- e.g., threads and address spaces
- Alterable only by microkernel calls
63Microkernel Thread Objects
- As microkernel process objects are proxies for
the real object, microkernel thread objects are
proxies for the real thread
- One per thread
- Contains minimal information about thread
- Priorities, dispatching state
- Used by the microkernel for dispatching
64More On Microkernels
- Microkernels were the research architecture of
the 80s
- But few commercial systems of the 90s really use
microkernels
- To some extent, microkernel is now a dirty word
in OS design
- Why?
65Microkernel Construction
- Most microkernels do not perform well
- Is it inherent in the approach or
- Implementation?
- IPC, the microkernel bottleneck, can be implemented
an order of magnitude faster.
- Not supervise memory
- Minimal address space management: grant, map,
flush.
- Fast kernel-user switch: usually 20-30 us, but 3
in the L3 implementation
66Exokernel
- Traditional operating systems fix the interface
and implementation of OS abstractions.
- Abstractions must be overly general to work with
diverse application needs.
67Example
Traditional OS
68The Issues
- Performance
- Denies applications the advantages of
domain-specific optimizations
- Flexibility
- Restricts the flexibility of application builders
- Functionality
- Discourages changes to the implementations of
existing abstractions
69Performance
- Example: a DB can have predictable data access
patterns that don't fit the OS's LRU page
replacement, causing bad performance.
- Cao et al. found that application-controlled file
caching can reduce running time by as much as
45%.
- There is no single way to abstract physical
resources or to implement an abstraction that is
best for all applications.
- OS is forced to make trade-offs
- Performance improvements of application-specific
policies could be substantial
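To make the LRU mismatch concrete, here is a small simulation sketch in Python. It is an illustration only, not taken from the exokernel paper: the function `count_misses`, the policy labels, and the access pattern (a cyclic scan over 5 pages with room for 4) are all assumptions chosen for this example.

```python
from collections import OrderedDict

def count_misses(accesses, capacity, policy):
    """Count cache misses for a page reference string.

    policy is "lru" (evict least recently used) or "mru" (evict most
    recently used); both labels exist only for this sketch.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            # last=False pops the LRU entry, last=True pops the MRU entry
            cache.popitem(last=(policy == "mru"))
        cache[page] = True
    return misses

# A database-style cyclic scan: pages 0..4 read in order, three times,
# with room for only 4 pages in memory.
scan = list(range(5)) * 3
lru_misses = count_misses(scan, capacity=4, policy="lru")
mru_misses = count_misses(scan, capacity=4, policy="mru")
```

On this pattern LRU misses on every access, because it always evicts the page the scan needs next, while an application that knows its own pattern and evicts the most recently used page misses far less often.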
70Flexibility
- Fixed high-level abstractions hide information
from applications.
- Makes it difficult or impossible for applications
to implement their own resource management
abstractions.
71Functionality
- Only one available interface between applications
and hardware resources.
- Because all applications must share one set of
abstractions, changes to these abstractions occur
rarely, if ever
72The Solution
- Separate protection from management
- Allow user level to manage resources
- Application libraries implement OS abstractions
- Exokernel exports resources
- Low level interface
- Protects, does not manage
- Expose hardware
73Exokernel Philosophy
- Applications know better than operating systems
what the goal of their resource management
decisions should be
- Applications should be given as much control as
possible over those decisions
- Implementation view
Exokernel
Frame Buffer TLB Network Memory Disk
HW
74Example
Exokernel Application level resource management
Exokernel
Hardware
75Implementation Overview
- Library O.S., which uses the low-level exokernel
interface to implement higher-level abstractions.
Library O.S.
Exokernel
Frame Buffer TLB Network Memory Disk
HW
76Implementation Overview
- Applications link to library kernel, leveraging
their higher-level abstractions.
Library O.S.
Library O.S.
Application
Application
Exokernel
Frame Buffer TLB Network Memory Disk
HW
77End-to-End Argument
- If something has to be done by the user program
itself, it is wasteful to do it in a lower level
as well.
- Why should the OS do anything that the user
program can do itself?
- In other words, all an OS should do is securely
allocate resources.
78Exokernel design
79Exokernel tasks
- Track ownership
- Guard all resources through bind points
- Revoke access to resources
80Design principle
- Expose hardware (securely)
- Expose allocation
- Expose names
- Expose revocation
81Secure binding
- Decouples authorization from use
- Allows kernel to protect resources without
understanding their semantics
- Example: TLB entry
- Virtual to physical mapping performed in the
library (above exokernel)
- Binding loaded into the kernel, used multiple
times
- Example: packet filter
- Predicates loaded into the kernel
- Checked on each packet arrival
82Implementing secure bindings
- Hardware mechanisms
- Capability for physical pages of a file
- Frame buffer regions (SGI)
- Software caching
- Exokernel: large software TLB overlaying the
hardware TLB
- Downloading code into kernel
- Avoid expensive boundary crossings
- Similar to the SPIN idea
83Examples of secure binding
- Physical memory allocation (hardware supported
binding)
- Library allocates physical page
- Exokernel records the allocator and the
permissions and returns a capability (an
encrypted cypher)
- Every access to this page by the library requires
this capability
- Page fault
- Kernel fields it
- Kicks it up to the library
- Library allocates a page and gets an encrypted
capability
- Library calls the kernel to enter a particular
translation into the TLB by presenting the capability
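The allocate-then-present-capability sequence can be sketched as follows. This is a hedged illustration in Python, not Aegis code: `ToyExokernel`, `alloc_page`, and `install_mapping` are hypothetical names, and an HMAC over the owner and page number stands in for the "encrypted" capability mentioned on the slide.

```python
import hashlib
import hmac
import os

class ToyExokernel:
    """Hypothetical stand-in for the kernel side of a secure binding."""

    def __init__(self):
        self._secret = os.urandom(16)  # known only to the kernel
        self._free_pages = [0, 1, 2, 3]
        self.tlb = {}                  # (owner, virtual page) -> physical page

    def _capability(self, owner, page):
        # Assumption: an HMAC plays the role of the encrypted capability.
        msg = f"{owner}:{page}".encode()
        return hmac.new(self._secret, msg, hashlib.sha256).digest()

    def alloc_page(self, owner):
        """Bind time: pick a page and hand back (page, capability)."""
        page = self._free_pages.pop(0)
        return page, self._capability(owner, page)

    def install_mapping(self, owner, vpage, ppage, cap):
        """Access time: check the capability, not the page-table semantics."""
        if not hmac.compare_digest(cap, self._capability(owner, ppage)):
            raise PermissionError("capability does not match page")
        self.tlb[(owner, vpage)] = ppage

kernel = ToyExokernel()
page, cap = kernel.alloc_page("libos_a")
kernel.install_mapping("libos_a", vpage=7, ppage=page, cap=cap)  # accepted
```

The kernel never inspects how the library OS organizes its page table; it only verifies that the presenter holds a valid capability for the physical page.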
84- Download code into kernel to establish secure
binding
- Packet filter for demultiplexing network packets
- Very similar to SPIN
- How to ensure authenticity?
- Only trusted servers (library OS) can download
code into the kernel
- Other use of downloaded code
- Execute code on behalf of an app that is not
currently scheduled
- E.g. application handler for garbage collection
could be installed in the kernel
85Visible resource revocation
- Most resources are visibly revoked
- E.g. processor, physical page
- Library can then perform necessary action before
relinquishing the resource
- E.g. needed state saving for a processor
- E.g. update of page table
86Abort protocol
- Repossession exception passed to the library OS
- Repossession vector
- Gives info to the library OS as to what was
repossessed so that corrective action can be
taken
- Library OS can seed the vector to enable
exokernel to autosave (e.g. disk blocks to which
a physical page being repossessed should be
written)
87Aegis an exokernel
88Aegis processor time slice
- Linear vector of time slots
- Round robin
- An application can mark its position in the
vector for scheduling
- Timer interrupt
- Beginning and end of time slices
- Control transferred to library specified handler
for actual saving/restoring
- Time to save/restore is bounded
- Penalty? Loss of a time slice next time!
89Aegis processor environments
- Exception context
- Program generated
- Interrupt context
- External, e.g. timer
- Protected entry context
- Cross domain calls
- Addressing context
- Guaranteed mappings implemented by software TLB
mimicking the library OS page table
90Aegis performance
91Aegis - Address translation
- On TLB miss
- Kernel installs hardware TLB entry from software
TLB for guaranteed mappings
- Otherwise application handler called
- Application establishes mapping
- TLB entry with associated capability presented to
the kernel
- Kernel installs and resumes execution of the
application
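The TLB-miss path above can be sketched as a single function. This is a minimal sketch in Python under stated assumptions: the dictionaries standing in for the hardware and software TLBs, the `app_handler` upcall, and the `owns_page` capability check are all hypothetical, and caching the new mapping in the software TLB is an assumption of this sketch rather than something the slide states.

```python
def handle_tlb_miss(vpage, hw_tlb, software_tlb, app_handler, owns_page):
    """Resolve a miss on vpage and return the physical page now installed.

    hw_tlb and software_tlb are dicts (virtual page -> physical page);
    app_handler(vpage) returns (ppage, capability); owns_page(ppage, cap)
    is the kernel's capability check. All four are stand-ins.
    """
    if vpage in software_tlb:            # guaranteed or cached mapping
        hw_tlb[vpage] = software_tlb[vpage]
        return hw_tlb[vpage]
    ppage, cap = app_handler(vpage)      # upcall to the library OS
    if not owns_page(ppage, cap):
        raise PermissionError("library OS does not own this page")
    software_tlb[vpage] = ppage          # assumption: cache for later misses
    hw_tlb[vpage] = ppage
    return ppage

hw, stlb = {}, {0: 10}                   # virtual page 0 is a guaranteed mapping
caps = {11: "cap-11"}                    # kernel's record of issued capabilities
handler = lambda v: (11, "cap-11")       # library OS picks physical page 11
check = lambda p, c: caps.get(p) == c
first = handle_tlb_miss(0, hw, stlb, handler, check)
second = handle_tlb_miss(3, hw, stlb, handler, check)
```

The first miss is served from the software TLB without involving the application; the second goes through the application handler and the capability check before the entry is installed.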
92ExOS library OS
- IPC abstraction
- VM
- Remote communication using ASH (application
specific safe handlers)
- Takeaway
- Significant performance improvement possible
compared to a monolithic implementation
93The Exokernel
- A thin veneer that multiplexes and exports
physical resources securely.
- Simplicity allows efficiency
- The lower the level of a primitive, the more
efficiently it can be implemented, and the more
latitude it grants to implementers of higher
level abstractions.
94The Exokernel
- Resource management is restricted to
- allocation,
- revocation,
- sharing
- ownership tracking
95Library operating systems
- Use the low level exokernel interface
- Higher level abstractions
- Special purpose implementations
- An application can choose the library which best
suits its needs, or even build its own.
96Another Example
97Design Challenge
- How can an Exokernel allow libOSes to freely
manage physical resources while protecting them
from each other?
- Track ownership of resources
- Secure bindings: libOS can securely bind to
machine resources
- Guard all resource usage
- Revoke access to resources
98Secure Bindings
- Exokernel allows libOSes to bind resources using
secure bindings
- Multiplex resources securely
- Protection for mutually distrusted apps
- Efficient
99Secure Bindings
- Secure binding: a protection mechanism that
decouples authorization from actual use of a
resource
- Allows the kernel to protect resources without
having to understand them
100Guard all resource usage
- Invisible resource revocation
- Efficient: application layer not involved
- Traditional OS
- Visible resource revocation
- Allows libOS to guide deallocation and track
availability of resources.
- Exokernel
101Revoke access to resources
- Abort protocol: allows exokernel to break secure
bindings of an uncooperative libOS by force
102Conclusion
- An Exokernel securely multiplexes available
raw hardware among applications
- Application level library operating systems
implement higher-level traditional OS
abstractions
- LibOSes can specialize an implementation to suit
a particular application
103Conclusion
- The lower the level of a primitive
- the more efficiently it can be implemented
- the more latitude it gives to higher level
abstractions
- So, separate management from protection and
- implement protection at a low level (exokernel)
- implement management at a higher level (libOS)
104Some Features
- It is possible to have different libOSes, for
example, one could export a Unix API and another
a Windows API
105Exokernel vs. Microkernel
- A micro-kernel provides abstractions of the
hardware such as files, sockets, graphics etc.
- An exokernel provides almost raw access to the
hardware.
106Exokernel
- Implementation Overview
- Allows the extension, specialization, and even
replacement of abstractions.
- Example: page table implementations can vary from
libOS to libOS, and applications can choose
whichever is most suitable for their needs.
107Exokernel
- Implementation Principles
- Provide libOSes maximum freedom while protecting
them from each other. This is achieved through
separation of protection and resource management.
- Resources should only be managed to the extent
required for protection. LibOSes handle how best
to use resources, with the exokernel arbitrating
between competing libraries.
- LibOSes should be able to request specific
physical resources (like specific physical
pages).
- Resources should not be implicitly allocated; the
libOS should participate in every allocation.
108Exokernel Design
- Secure Bindings
- Downloading Code
- Visible Revocation
- Abort Protocol
109Exokernel
- Secure Bindings
- Protection mechanism that decouples authorization
(bind time) from actual use of the resource
(access time).
- Authorization performed at bind time.
- Expressed in simple operations that the exokernel
can implement quickly and efficiently.
- Can protect resources without understanding them.
- Example
- When a page fault occurs, virtual to physical
address mapping is performed, the page is loaded
by the exokernel (bind time), and then used
multiple times (access time).
110Exokernel
- Downloading Code
- Code can be downloaded into the exokernel, for
execution at defined events (like packet
arrival).
- Reduces kernel crossings.
- Can execute even when the application isn't
scheduled.
- Can initiate events (e.g. initiate a response
message to a packet)
- Example
- A packet filter is downloaded into the exokernel
(bind time), and then run on every incoming
packet to determine the intended target
application (access time), and can even initiate
a response.
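The packet-filter example can be sketched as a table of predicates. This is a hedged sketch in Python: `FilterTable`, `install`, and `demux` are hypothetical names, and plain Python lambdas stand in for downloaded filter code. A real exokernel would also have to verify that a downloaded filter is safe and does not claim another application's packets, which this sketch does not attempt.

```python
class FilterTable:
    """Hypothetical kernel-side table of downloaded packet filters."""

    def __init__(self):
        self._filters = []  # (predicate, owner) pairs in installation order

    def install(self, owner, predicate):
        """Bind time: the library OS downloads a predicate once."""
        self._filters.append((predicate, owner))

    def demux(self, packet):
        """Access time: run on every arriving packet; return owner or None."""
        for predicate, owner in self._filters:
            if predicate(packet):
                return owner
        return None

table = FilterTable()
table.install("web_server", lambda p: p["proto"] == "tcp" and p["dst_port"] == 80)
table.install("dns_server", lambda p: p["proto"] == "udp" and p["dst_port"] == 53)

owner = table.demux({"proto": "udp", "dst_port": 53})
```

Authorization happens once, when the predicate is installed; after that the kernel only evaluates predicates and does not need to understand any protocol itself.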
111Exokernel
- Visible Resource Revocation
- Traditionally, OSes revoke (deallocate) resources
invisibly, without application involvement (e.g.
physical memory).
- Advantage: lower latency
- Disadvantage: applications cannot guide
deallocation
- Exokernel uses visible revocation for most
resources. The library OS is notified of the
intention to deallocate, and has the capability
of guiding the process.
- Example: libOS is told that exokernel will
deallocate physical page 5; it can use this
information to update its page table, or even to
suggest a less important page for deallocation.
112Exokernel
- Abort Protocol
- Mechanism to take away resources when libOSes
fail to respond satisfactorily to visible
revocation requests.
- A Repossession Vector is used to keep track of
forcibly deallocated resources.
- Library OSes can pre-load the vector with
information that can be used to write state or
data about the resource when it is deallocated
(e.g. define disk blocks for memory paging).
- OSes normally require certain allocations to be
permanent, so exokernel can guarantee a small
number of resources that cannot be forcibly
deallocated.
- Example: page tables, exception areas
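Visible revocation followed by the abort protocol can be sketched as below. This is a minimal illustration in Python, not the Aegis implementation: `ToyLibOS`, `ToyKernel`, `on_revoke`, and the list used as a repossession vector are hypothetical, and the sketch ignores the guaranteed (non-revocable) resources mentioned above.

```python
class ToyLibOS:
    """Hypothetical library OS that may or may not cooperate."""

    def __init__(self, name, cooperative=True):
        self.name = name
        self.cooperative = cooperative
        self.page_table = {}   # virtual page -> physical page
        self.repossessed = []  # repossession vector, filled in by the kernel

    def on_revoke(self, ppage):
        """Visible revocation: return True if the page was given up."""
        if not self.cooperative:
            return False
        for vpage, mapped in list(self.page_table.items()):
            if mapped == ppage:
                del self.page_table[vpage]  # update own page table first
        return True

class ToyKernel:
    """Hypothetical kernel that tracks which libOS owns each page."""

    def __init__(self):
        self.owner = {}  # physical page -> libOS

    def revoke(self, ppage):
        libos = self.owner[ppage]
        if not libos.on_revoke(ppage):
            # Abort protocol: break the binding by force and record it.
            libos.repossessed.append(ppage)
        del self.owner[ppage]

kernel = ToyKernel()
good, bad = ToyLibOS("good"), ToyLibOS("bad", cooperative=False)
good.page_table[0] = 5
bad.page_table[0] = 6
kernel.owner[5] = good
kernel.owner[6] = bad
kernel.revoke(5)  # cooperative path: libOS cleans up its own page table
kernel.revoke(6)  # uncooperative path: kernel repossesses page 6
```

Either way the kernel ends up owning the page again; the difference is whether the library OS got the chance to guide the deallocation or only learns about it afterwards from the vector.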
113Exokernel
- Implementation
- Aegis Exokernel
- Exports processor, physical memory,
TLB, exceptions, interrupts, and network
interface.
- ExOS Library OS
- Implements processes, virtual memory, user-level
exceptions, interprocess abstractions, and
network protocols (ARP, IP, UDP, NFS)
- Compared to Ultrix
114Exokernel
- Aegis
- Processor Time Slices
- Time slices partitioned and allocated at the
clock granularity. Scheduled using round robin.
- Advanced scheduling can be implemented by libOS
through requesting specific positions in the time
slices.
- Long running apps can allocate contiguous time
slices, while interactive apps can allocate
several equidistant slices
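The slot-vector idea can be sketched in a few lines. This is an illustrative sketch in Python, not Aegis code: `build_slot_vector`, `run_order`, the vector length of 8, and the two application names are all assumptions made for this example.

```python
def build_slot_vector(num_slots, requests):
    """Assign requested slot positions to applications.

    requests maps an application name to the slot indices it asks for.
    Returns a list where entry i names the owner of slot i (or None).
    """
    slots = [None] * num_slots
    for app, positions in requests.items():
        for pos in positions:
            if slots[pos] is not None:
                raise ValueError(f"slot {pos} already taken")
            slots[pos] = app
    return slots

def run_order(slots, ticks):
    """Walk the vector round robin, skipping free slots."""
    order = []
    for tick in range(ticks):
        owner = slots[tick % len(slots)]
        if owner is not None:
            order.append(owner)
    return order

slots = build_slot_vector(8, {
    "batch": [0, 1, 2],   # contiguous slices for a long-running app
    "editor": [3, 7],     # evenly spaced slices for an interactive app
})
order = run_order(slots, 8)
```

The kernel only walks the vector; the scheduling policy lives in which positions each library OS chooses to request.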
115Exokernel
- Aegis
- Exceptions
- Interrupts
- Address Translations
- Guarantees address mappings for small number of
pages, to simplify bootstrapping.
- Protected Control Transfers
- For IPC abstractions
- Changes program counter to agreed location, sets
appropriate data for context for callee, and
donates current time slice.
- Dynamic Packet Filter
116Exokernel
- ExOS
- IPC Abstractions
- pipe: ExOS uses a shared memory buffer, an order of
magnitude faster than Ultrix, which uses standard
Unix pipes.
- Application Level Virtual Memory
- 150x150 integer matrix multiplication doesn't use
any special ExOS or Aegis abilities; it shows that
application level VM doesn't incur noticeable
overhead (0.1 second difference)
- All other tests perform comparably with Ultrix
(reading pages, flipping protection bits, etc.)
- Downloaded code for networking handler
- Round trip latency for RPC faster than FRPC
117Exokernel
- ExOS Extensibility
- Extensible Page-Table structures
- Implemented inverted page tables
- Extensible Schedulers
- Stride Scheduling (proportional share scheduling)
- The processes are successfully scheduled at a
ratio of 3:2:1
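Stride scheduling itself is easy to sketch. The following Python is a generic textbook-style version, not the ExOS scheduler: the function name `stride_schedule`, the constant `stride1=6000`, and the three clients with 3, 2, and 1 tickets are assumptions chosen so the strides divide evenly.

```python
import heapq

def stride_schedule(tickets, quanta, stride1=6000):
    """Return how many quanta each client receives under stride scheduling."""
    # Each heap entry is (pass value, name, stride); stride = stride1 / tickets.
    heap = [(stride1 // t, name, stride1 // t)
            for name, t in sorted(tickets.items())]
    heapq.heapify(heap)
    counts = {name: 0 for name in tickets}
    for _ in range(quanta):
        # The client with the smallest pass value runs next.
        pass_value, name, stride = heapq.heappop(heap)
        counts[name] += 1
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return counts

counts = stride_schedule({"A": 3, "B": 2, "C": 1}, quanta=60)
```

Clients with more tickets get a smaller stride, so their pass value grows more slowly and they are selected proportionally more often.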
118Exokernel
- Conclusion
- Experiments with Aegis and ExOS show
- Simple exokernel primitives can be implemented
efficiently
- Fast low-level hardware multiplexing can be
implemented efficiently
- Traditional OS abstractions can be implemented at
user level
- Applications can create special-purpose
implementations by modifying libraries
119Exokernel
- Other Exokernel Work
- Porting Multithreading Libraries to an Exokernel
System. Ernest Artiaga, Albert Serra, Marisa
Gil. Dept. of Computer Architecture, Universitat
Politecnica de Catalunya. ACM SIGOPS European
Workshop, ACM 2000, pp. 121-126
- Ported Cthreads to Exokernel
- Slightly faster execution than without threading
120Exokernel
- Other Exokernel Work
- Fast and Flexible Application-Level Networking on
Exokernel Systems. Gregory Ganger, Dawson Engler,
et al. CMU, Stanford, MIT and Vividon, Inc. ACM
Transactions on Computer Systems, vol. 20, no. 1,
pp. 49-83, 2002
- Implemented TCP, HTTP server, and web
benchmarking tool
- TCP: 50-300% higher throughput
- HTTP: 3-8 times higher throughput
- Benchmarking: can produce loads 2-8 times heavier
121Key points of the paper
- Microkernel should provide minimal abstractions
- Address space, threads, IPC
- Abstractions machine independent but
implementation hardware dependent for performance
- Myths about inefficiency of micro-kernel stem
from inefficient implementation and NOT from
microkernel approach
122What abstractions?
- Determining criterion
- Functionality not performance
- Hardware and microkernel should be trusted but
applications are not
- Hardware provides page-based virtual memory
- Kernel builds on this to provide protection for
services above and outside the microkernel
- Principles of independence and integrity
- Subsystems independent of one another
- Integrity of channels between subsystems
protected from other subsystems
123Microkernel Concepts
- Hardware provides address space
- mapping from virtual page to a physical page
- implemented by page tables and TLB
- Microkernel concept of address spaces
- Hides the hardware address spaces and provides an abstraction that supports
- Grant?
- Map?
- Flush?
- These primitives allow building a hierarchy of protected address spaces
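As a rough illustration of the three primitives: map shares a page with another address space, grant hands a page over (the granter loses it), and flush revokes a page from every address space that received it, directly or indirectly, from the flusher. The sketch below models this with dictionaries. The class `AddressSpace` and its fields are hypothetical names for the example, not the L3/L4 interface, and recipient consent and access rights are omitted.

```python
class AddressSpace:
    """Toy model of recursive address spaces (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}     # virtual page -> physical frame
        self.children = {}  # virtual page -> [(space, vpage)] mapped from it
        self.parent = {}    # virtual page -> (space, vpage) it came from

    def map(self, vpage, dest, dest_vpage):
        # The page becomes accessible in both address spaces.
        dest.pages[dest_vpage] = self.pages[vpage]
        dest.parent[dest_vpage] = (self, vpage)
        self.children.setdefault(vpage, []).append((dest, dest_vpage))

    def grant(self, vpage, dest, dest_vpage):
        # The page moves to dest and disappears from the granter.
        dest.pages[dest_vpage] = self.pages.pop(vpage)
        origin = self.parent.pop(vpage, None)
        if origin is not None:
            # Whoever mapped the page to the granter now tracks the grantee.
            src, src_vpage = origin
            kids = src.children[src_vpage]
            kids[kids.index((self, vpage))] = (dest, dest_vpage)
            dest.parent[dest_vpage] = origin
        kids = self.children.pop(vpage, [])
        for child, child_vpage in kids:
            child.parent[child_vpage] = (dest, dest_vpage)
        if kids:
            dest.children[dest_vpage] = kids

    def flush(self, vpage):
        # The page stays here but is removed from all derived mappings.
        for child, child_vpage in self.children.pop(vpage, []):
            child.flush(child_vpage)
            child.pages.pop(child_vpage, None)
            child.parent.pop(child_vpage, None)

a1, a2, a3 = AddressSpace("A1"), AddressSpace("A2"), AddressSpace("A3")
a1.pages["v1"] = "R"          # A1 owns frame R at virtual page v1
a1.map("v1", a2, "v2")        # A1 and A2 both see R
a2.grant("v2", a3, "v3")      # A2 loses R, A3 gains it
a1.flush("v1")                # A3 loses R, A1 keeps it
print(a1.pages, a2.pages, a3.pages)
```

Tracking where each page came from is what lets a flush by A1 reach A3 even though A3 received the page through a grant from A2.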
124Address spaces
[Figure: effect of the map, flush, and grant operations on the page mappings (V1, V2, V3 to R or NIL) of address spaces A1, A2, and A3]
125- Power and flexibility of address spaces
- Initial memory manager for address space A0 appears by magic (similar to SPIN core service BUT outside the kernel) and encompasses the physical memory
- Allow creation of stackable memory managers (all outside the kernel)
- Pagers can be part of a memory manager or outside the memory manager
- All address space changes (map, grant, flush) orchestrated via kernel for protection
- Device driver can be implemented as a special memory manager outside the kernel as well
126[Figure: memory managers M0, M1, and M2 (labeled A0-A2, P0-P2), each with a page table (PT), stacked via map/grant on top of the microkernel and the processor]
127Threads and IPC
- A thread executes in an address space
- Thread state: PC, SP, processor registers, and state info (such as address space)
- IPC is cross address space communication
- Supported by the microkernel
- Classic method is message passing between threads via the kernel
- Sender sends info; receiver decides if it wants to receive it, and if so where
- Address space operations such as map, grant, flush need IPC
- Higher level communication (e.g. RPC) built on top of basic IPC
128- Interrupts?
- Each hardware device is a thread from the kernel's perspective
- Interrupt is a null message from a hardware thread to the software thread
- Kernel transforms hardware interrupt into a message
- Does not know or care about the semantics of the interrupt
- Device specific interrupt handling outside the kernel
- Clearing hardware state (if privileged) then carried out by the kernel upon the driver thread's next IPC
- TLB handler?
- In theory software TLB handler can be outside the microkernel
- In practice first level TLB handler inside the microkernel or in hardware
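The interrupt-as-IPC idea can be sketched as below. This is a toy model under stated assumptions: `Microkernel`, `DriverThread`, the mailbox queue, and interrupt line 14 are all hypothetical names and values for the example, not a real kernel interface.

```python
from collections import deque

class DriverThread:
    """User-level driver thread with a mailbox for incoming IPC."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.handled = []

    def run_once(self):
        sender, payload = self.mailbox.popleft()
        # Device-specific handling lives here, outside the kernel.
        self.handled.append(sender)

class Microkernel:
    def __init__(self):
        self.irq_owner = {}  # interrupt line -> driver thread

    def attach(self, irq, driver):
        self.irq_owner[irq] = driver

    def hardware_interrupt(self, irq):
        # The kernel attaches no meaning to the interrupt: it delivers an
        # empty message whose only information is the sending hardware thread.
        self.irq_owner[irq].mailbox.append((f"hw-thread-{irq}", None))

kernel = Microkernel()
disk = DriverThread("disk-driver")
kernel.attach(14, disk)        # assumed interrupt line for the example
kernel.hardware_interrupt(14)  # hardware raises the interrupt
disk.run_once()                # driver receives the null message
print(disk.handled)
```

The kernel side stays a few lines long because all interpretation of the interrupt is left to the driver thread.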
129Unique IDs
- Kernel provides uid over space and time for
- Threads
- IPC channels
130Breaking some performance myths
- Kernel-user switches
- Address space switches
- Thread switches and IPC
- Memory effects
- Base system
- 486 (50 MHz): 20 ns cycle time
131Kernel-user switches
- Machine instruction for entering and exiting
- 107 cycles
- Mach measures 900 cycles for kernel-user switch
- Why?
- Empirical proof
- L3 kernel: 123 cycles (accounting for some TLB, cache misses)
- Where did the remaining 800 cycles go in Mach?
- Kernel overhead (due to the construction of the kernel, not inherent in the approach)
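To put the cycle counts in wall-clock terms, the short calculation below converts them using the 20 ns cycle time of the 50 MHz 486 base system. The helper name `cycles_to_ns` is an assumption for the example; the cycle counts are the ones quoted on the slide.

```python
CLOCK_HZ = 50_000_000                  # 486 at 50 MHz
CYCLE_NS = 1_000_000_000 // CLOCK_HZ   # 20 ns per cycle

def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds on the base system."""
    return cycles * CYCLE_NS

hardware_ns = cycles_to_ns(107)   # bare enter/exit instruction cost
l3_ns = cycles_to_ns(123)         # measured L3 kernel-user switch
mach_ns = cycles_to_ns(900)       # measured Mach kernel-user switch
overhead_cycles = 900 - 107       # cycles not explained by the hardware

print(hardware_ns, l3_ns, mach_ns, overhead_cycles)
```

The hardware cost is about 2.1 microseconds and Mach's switch about 18 microseconds, so roughly 800 cycles (793 by this subtraction) are attributable to the kernel rather than to the processor.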
132Address space switches
- Primer on TLBs
- AS tagged TLB (MIPS R4000) vs untagged TLB (486)
- Untagged TLB requires flush on AS switch
- Instruction and data caches
- Usually physically tagged in most modern processors, so a TLB flush has no effect on them
- Address space switch
- Complete reload of Pentium TLB: 864 cycles
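The difference between tagged and untagged TLBs can be illustrated with a small simulation. This is a simplified model that assumes an unbounded TLB and two hypothetical address spaces that each touch four pages; the class `TLB` and the page-table contents are made up for the example.

```python
class TLB:
    """Toy TLB that counts misses; capacity is unbounded for simplicity."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = {}
        self.current_as = None
        self.misses = 0

    def switch(self, asid):
        if not self.tagged:
            # Untagged entries would be wrong for the new address space.
            self.entries.clear()
        self.current_as = asid

    def translate(self, vpage, page_table):
        key = (self.current_as, vpage) if self.tagged else vpage
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = page_table[vpage]
        return self.entries[key]

def run(tagged, rounds=5):
    page_tables = {
        0: {p: 100 + p for p in range(4)},
        1: {p: 200 + p for p in range(4)},
    }
    tlb = TLB(tagged)
    for _ in range(rounds):
        for asid in (0, 1):
            tlb.switch(asid)
            for vpage in range(4):
                tlb.translate(vpage, page_tables[asid])
    return tlb.misses

print(run(tagged=True), run(tagged=False))
```

With tags, each address space pays its four misses once; without tags, every switch discards the entries and the misses are paid again after each of the ten switches.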
133- Do we need a TLB flush always?
- Implementation issue of protection domains
- SPIN implements protection domains as Modula-3 names within a single hardware address space
- Liedtke suggests a similar approach in the microkernel in an architecture-specific manner
- PowerPC: use segment registers => no flush
- Pentium or 486: share the linear hardware address space among several user address spaces => no flush
- There are some caveats in terms of size of user space and how many can be packed in a 2^32 global space
134- Upshot?
- Address space switching among medium or small protection domains can ALWAYS be made efficient by careful construction of the microkernel
- Large address space switches are going to be expensive ALWAYS due to cache effects and TLB effects, so switching cost is not the most critical issue
135Thread switches and IPC
136Segment switch (instead of AS switch) makes cross
domain calls cheap
137Memory Effects System
138Capacity induced MCPI
139Portability Vs. Performance
- A microkernel on top of abstract hardware, while portable:
- Cannot exploit hardware features
- Cannot take precautions to avoid performance problems specific to an architecture
- Incurs performance penalty due to the abstraction layer
140Examples of non-portability
- Same processor family
- Use address space switch implementation
- TLB flush method preferable for 486
- Segment register switch preferable for Pentium
- => 50% change of microkernel!
- IPC implementation
- Details of the cache layout (associativity) require different handling of IPC buffers in 486 and Pentium
- Incompatible processors
- Exokernel on R4000 (tagged TLB) vs. 486 (untagged TLB)
- => Microkernels are inherently non-portable
141Summary
- Minimal set of abstractions in microkernel
- Microkernels are processor specific (at least in implementation) and non-portable
- Right abstractions and processor-specific implementation lead to efficient processor-independent abstractions at higher layers
142Performance
143Key points
- Goal: extensibility, akin to SPIN and Exokernel goals
- Main difference: support running several commodity operating systems on the same hardware simultaneously without sacrificing performance or functionality
- Why?
- Application mobility
- Server consolidation
- Co-located hosting facilities
- Distributed web services
- ...
144Multiprocessor OS
- Synchronization
- Communication
- Scheduling
- We have seen these issues already in the other
readings in this section of the course
145Key Issues
- Modern parallel machines
- Large system sizes stressing bottlenecks in system software (e.g. global data structures)
- Higher memory latencies
- NUMA effects (i.e. the symmetric assumption does not hold)
- Cache hierarchy
- Write sharing expensive due to coherence traffic
- False sharing due to large cache lines
146Thesis of Tornado paper
- In designing multiprocessor OS
- Pay attention to locality
- Reduce shared system data structures
- Reduce distance between accessing processor and
target memory module
147Effect of global data structure: shared counter
148Tornado design approach
- Object-oriented design for scalability
- Clustered objects
- Protected procedure call with a view to preserving locality while ensuring concurrency
- Semi-automatic garbage collection for localizing locking
- OS objects have multiple implementations
- Low overhead version when scalability is not required
- Resort to scalable implementation when performance critical
- Optimize common case
- Object invocation should be fast; object creation/destruction can be slower
- Page fault handling should be fast; memory region creation/deletion can be slower
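The clustered-object idea, applied to the shared counter mentioned earlier, can be sketched as follows. This is only a conceptual model: the class `ClusteredCounter` and its methods are hypothetical names, not Tornado's interface, and a real implementation would place each per-processor representative in memory local to its processor and pad it to a cache line to avoid false sharing.

```python
class ClusteredCounter:
    """One logical counter with one representative per processor."""

    def __init__(self, num_cpus):
        # Each processor updates only its own slot (its representative).
        self.reps = [0] * num_cpus

    def increment(self, cpu):
        # Common case: purely local update, no shared write.
        self.reps[cpu] += 1

    def value(self):
        # Rare case: a global read visits every representative.
        return sum(self.reps)

counter = ClusteredCounter(num_cpus=4)
for cpu in range(4):
    for _ in range(1000):
        counter.increment(cpu)
print(counter.value())
```

The design trades a slower global read for fast local increments, which matches the "optimize the common case" guideline on the slide.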
149Next Lecture
- Process and Thread
- Cooperative Task Management Without Manual Stack Management, by Atul Adya, et al.
- Capriccio: Scalable Threads for Internet Services, by Rob von Behren, et al.
- The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors, Thomas E. Anderson, et al.