Title: Multiprocessors, Threads and Microkernels
2. Motivation for Multiprocessors
- Enhanced Performance
- Concurrent execution of tasks for increased throughput (between processes)
- Exploit concurrency in tasks (parallelism within a process)
- Fault Tolerance
- Graceful degradation in the face of failures
3. Basic MP Architectures
- Single Instruction Single Data (SISD)
- conventional uniprocessor designs.
- Single Instruction Multiple Data (SIMD)
- Vector and Array Processors
- Multiple Instruction Single Data (MISD)
- Not Implemented.
- Multiple Instruction Multiple Data (MIMD)
- conventional MP designs
4. MIMD Classifications
- Tightly Coupled System - all processors share the same global memory and have the same address space (typical SMP system).
- Main memory for IPC and synchronization.
- Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, clusters (multi-computer).
- Message passing for IPC and synchronization.
5. MP Block Diagram
6. Memory Access Schemes
- Uniform Memory Access (UMA)
- Centrally located
- All processors are equidistant (access times)
- Non-Uniform Memory Access (NUMA)
- physically partitioned but accessible by all
- processors have the same address space
- No Remote Memory Access (NORMA)
- physically partitioned, not accessible by all
- each processor has its own address space
7. Other Details of MP
- Interconnection technology
- Bus
- Cross-Bar switch
- Multistage Interconnect Network
- Caching - Cache Coherence Problem!
- Write-update
- Write-invalidate
- bus snooping
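The two coherence policies can be contrasted with a toy model. The sketch below assumes a write-through cache and a single shared bus that every cache snoops; the class and method names (`Cache`, `SnoopingBus`) are hypothetical and chosen only for illustration.

```python
class Cache:
    """One processor's private cache: address -> value, valid lines only."""

    def __init__(self):
        self.lines = {}


class SnoopingBus:
    """Toy shared bus: every cache observes (snoops) every write."""

    def __init__(self, n_caches, policy):
        if policy not in ("write-invalidate", "write-update"):
            raise ValueError(policy)
        self.policy = policy
        self.memory = {}
        self.caches = [Cache() for _ in range(n_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:  # miss: fetch from main memory
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu].lines[addr] = value
        self.memory[addr] = value  # write-through (assumption, keeps the sketch small)
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache.lines:
                continue
            if self.policy == "write-invalidate":
                del cache.lines[addr]      # stale copy dropped; next read misses
            else:
                cache.lines[addr] = value  # copy refreshed in place
```

With write-invalidate the other caches pay a miss on their next read; with write-update they keep a fresh copy at the cost of extra bus traffic on every write.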
8. MP OS Structure - 1
- Separate Supervisor
- each processor has its own copy of the kernel
- some data is shared for interaction
- dedicated I/O devices and file systems
- good fault tolerance but bad for concurrency
- Master/Slave Configuration
- master monitors status and assigns work
- slaves form a schedulable pool of resources
- master can be a bottleneck
- poor fault tolerance
9. MP OS Structure - 2
- Symmetric Configuration - most flexible.
- all processors are autonomous and treated as equals
- one copy of the kernel executed concurrently across all processors
- Synchronized access to shared data structures
- Lock entire OS - Floating Master
- Mitigated by dividing the OS into segments that normally have little interaction
- multithread the kernel and control access to resources (continuum)
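10. MP Overview
[Figure: multiprocessor taxonomy. Multiprocessors divide into SIMD and MIMD. MIMD divides into shared memory (tightly coupled), covering symmetric (SMP) and master/slave systems, and distributed memory (loosely coupled), covering clusters.]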
11. SMP OS Design Issues
- Threads - effectiveness of parallelism depends on the performance of the primitives used to express and control concurrency.
- Process Synchronization - disabling interrupts is not sufficient.
- Process Scheduling - efficient, policy-controlled task scheduling. Issues:
- Global versus local (per CPU)
- Task affinity for a particular CPU
- resource accounting
- inter-thread dependencies
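Disabling interrupts only stops the local CPU from being preempted. Other processors can still touch the same data, so shared structures need a lock that every processor honors. The sketch below uses Python's `threading.Lock` as a stand-in for a kernel lock; the names `add_many` and `run` are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:  # only one thread at a time runs the read-modify-write
            counter += 1


def run(n_threads=4, n_increments=10_000):
    """Start n_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n_increments,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every update happens under the lock, `run(4, 10_000)` always returns 40000 regardless of how the threads interleave.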
12. SMP OS Design Issues - cont.
- Memory Management - complicated by shared main memory.
- cache coherence
- memory access synchronization
- balancing overhead with increased concurrency
- Reliability and Fault Tolerance - degrade gracefully in the event of failures
13. Typical SMP System
[Figure: four CPUs (labeled 500MHz), each with its own cache and MMU, attached to a shared System/Memory Bus. A bridge connects the bus to the I/O subsystem (INT, ether, scsi, video) and to system functions (timer, BIOS, reset). The diagram also carries a 50ns label.]
- Issues
- Memory contention
- Limited bus bandwidth
- I/O contention
- Cache coherence
- Typical I/O Bus
- 33MHz/32bit (132MB/s)
- 66MHz/64bit (528MB/s)
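The I/O bus figures follow from clock rate times bus width. A minimal sketch, assuming one transfer of the full bus width per clock cycle (the function name is hypothetical):

```python
def peak_bandwidth_mb_per_s(clock_mhz, width_bits):
    """Peak rate in MB/s, assuming one full-width transfer per clock cycle."""
    return clock_mhz * width_bits / 8  # 8 bits per byte


print(peak_bandwidth_mb_per_s(33, 32))  # 132.0
print(peak_bandwidth_mb_per_s(66, 64))  # 528.0
```

These are peak numbers; arbitration and contention on a shared bus reduce what each device actually sees.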
14. Some Useful Definitions
- Parallelism - degree to which a multiprocessor application achieves parallel execution
- Concurrency - maximum parallelism an application can achieve with unlimited processors
- System Concurrency - kernel recognizes multiple threads of control in a program
- User Concurrency - user space threads (coroutines) provide a natural programming model for concurrent applications.
15. Introduction to Threads
[Figure: Single-Threaded Process Model versus Multithreaded Process Model. The single-threaded model shows one process control block, a user address space, a user stack and a kernel stack. The multithreaded model shows one process control block and user address space shared by several threads, each with its own thread control block, user stack and kernel stack.]
16. Process Concept Embodies
- Unit of Resource Ownership - process is allocated a virtual address space to hold the process image
- Unit of Dispatching - process is an execution path through one or more programs
- execution may be interleaved with other processes
- These two characteristics are treated independently by the operating system
17. Threads
- Effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism
- Separate the notion of execution from the process abstraction
- Useful for expressing the intrinsic concurrency of a program regardless of resulting performance
- We will discuss three examples of threading
- User threads,
- Kernel threads and
- Scheduler Activations
18. Threads cont.
- Thread - dynamic object representing an execution path and computational state.
- One or more threads per process, each having
- Execution state (running, ready, etc.)
- Saved thread context when not running
- Execution stack
- Per-thread static storage for local variables
- Shared access to process resources
- all threads of a process share a common address space.
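The split between shared process state and per-thread storage can be seen in a few lines. This is a sketch using Python's `threading` module rather than any system discussed here; `worker` and `run` are hypothetical names.

```python
import threading

per_thread = threading.local()  # each thread sees its own attributes on this object


def worker(name, shared_log):
    per_thread.name = name              # private to this thread
    shared_log.append(per_thread.name)  # the list is shared by all threads


def run(names):
    shared_log = []  # one object in the process address space
    threads = [threading.Thread(target=worker, args=(n, shared_log)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never set per_thread.name, so it has no such attribute.
    return sorted(shared_log), hasattr(per_thread, "name")
```

Every worker writes into the same list, while the value stored on `per_thread` is invisible outside the thread that set it.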
19. Thread States
- Primary states
- Running, Ready and Blocked.
- Operations to change state
- Spawn - new thread is provided with a register context and stack pointer.
- Block - wait for an event; save user registers, PC and stack pointer.
- Unblock - moved to ready state.
- Finish - deallocate register context and stacks.
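The states and operations above form a small state machine. The sketch below is one possible encoding; the `dispatch` and `preempt` operations and the `FINISHED` state are assumptions added so that a thread can move between Ready and Running, and `SimThread` is a hypothetical name.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    FINISHED = "finished"  # assumption: terminal state after Finish


# operation -> (state required before, state after)
TRANSITIONS = {
    "dispatch": (State.READY, State.RUNNING),   # assumption, not on the slide
    "preempt": (State.RUNNING, State.READY),    # assumption, not on the slide
    "block": (State.RUNNING, State.BLOCKED),
    "unblock": (State.BLOCKED, State.READY),
    "finish": (State.RUNNING, State.FINISHED),
}


class SimThread:
    def __init__(self, name):
        self.name = name
        self.state = State.READY  # spawn: a new thread starts out ready

    def apply(self, operation):
        """Perform one operation, rejecting it if the current state does not allow it."""
        required, result = TRANSITIONS[operation]
        if self.state is not required:
            raise ValueError(f"{operation} not allowed from {self.state.name}")
        self.state = result
        return self.state
```

For example, a freshly spawned thread accepts `dispatch`, then `block`, then `unblock`, but `finish` from the ready state raises `ValueError`.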
20. User Level Threads
- User level threads - supported by user level threads libraries
- Examples
- POSIX Pthreads, Mach C-threads, Solaris threads
- Benefits
- no modifications required to kernel
- flexible and low cost
- Drawbacks
- cannot block without blocking the entire process
- no parallelism (not recognized by kernel)
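A user-level thread library can be approximated with coroutines. The sketch below uses Python generators as the threads and a round-robin loop as the library's scheduler; it illustrates the idea only and is not how any of the libraries named above are implemented. `worker` and `run_round_robin` are hypothetical names.

```python
from collections import deque


def worker(name, steps, log):
    """A user-level 'thread': each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(threads):
    """Run generator-based threads inside one kernel thread until all finish."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the thread yields
            ready.append(current)  # still alive: back of the ready queue
        except StopIteration:
            pass                   # thread finished


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Switching costs only a function call, which is the low cost noted above. Both drawbacks also show up: the kernel sees a single thread, so nothing runs in parallel, and a worker that made a blocking call would stall every other worker.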
21. Kernel Level Threads
- Kernel level threads - directly supported by the kernel; the thread is the basic scheduling entity
- Examples
- Windows 95/98/NT/2000, Solaris, Tru64 UNIX, BeOS, Linux
- Benefits
- coordination between scheduling and synchronization
- less overhead than a process
- suitable for parallel applications
- Drawbacks
- more expensive than user-level threads
- generality leads to greater overhead
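Because the kernel schedules each thread itself, one thread can block without stopping the others. A minimal sketch, assuming CPython, where each `threading.Thread` is backed by a native OS thread; `run`, `blocker` and `worker` are hypothetical names.

```python
import threading


def run():
    release = threading.Event()
    log = []

    def blocker():
        release.wait()  # blocks only this thread
        log.append("blocker resumed")

    def worker():
        log.append("worker ran")  # makes progress while blocker is blocked
        release.set()             # wakes the blocked thread

    b = threading.Thread(target=blocker)
    w = threading.Thread(target=worker)
    b.start()
    w.start()
    b.join()
    w.join()
    return log
```

The worker always records its entry before it sets the event, so the log is `["worker ran", "blocker resumed"]` on every run.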
22. Scheduler Activations
- Attempt to combine benefits of both user and kernel threading support
- blocking system call should not block the whole process
- user space library should make scheduling decisions
- efficiency by avoiding unnecessary user/kernel mode switches.
- Kernel assigns a set of virtual processors to each process. User library then schedules threads on these virtual processors.
23. Scheduler Activations
- An activation
- execution context for running thread
- Kernel passes new activation to library when upcall is performed.
- Library schedules user threads on activations.
- space for kernel to save processor context of current user thread when stopped by kernel
- upcall performed when one of the following occurs
- user thread performs blocking system call
- a blocked thread belonging to the process unblocks; its library is then notified, allowing it to either schedule a new thread or resume the preempted thread.
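The interaction between kernel upcalls and the user library can be sketched as a toy model. Everything here is hypothetical: the class, the method names and the policy of resuming the preempted thread first are assumptions made for illustration, not the interface of any real implementation.

```python
from collections import deque


class ThreadLibrary:
    """User-level scheduler that reacts to kernel upcalls (toy model)."""

    def __init__(self, threads):
        self.ready = deque(threads)  # runnable user threads
        self.blocked = set()
        self.running = {}            # activation id -> user thread

    def _schedule(self, activation):
        """Library policy: run the next ready thread on this activation."""
        if self.ready:
            self.running[activation] = self.ready.popleft()
        else:
            self.running.pop(activation, None)  # nothing left to run on it

    def upcall_new_activation(self, activation):
        self._schedule(activation)

    def upcall_blocked(self, old_activation, new_activation):
        """The thread on old_activation blocked; the kernel supplies a fresh activation."""
        self.blocked.add(self.running.pop(old_activation))
        self._schedule(new_activation)

    def upcall_unblocked(self, thread, activation):
        """A blocked thread can continue; the library picks what runs next."""
        self.blocked.discard(thread)
        preempted = self.running.pop(activation, None)
        if preempted is not None:
            self.ready.appendleft(preempted)  # policy choice: resume it first
        self.ready.append(thread)
        self._schedule(activation)
```

The point of the design is visible in `upcall_blocked`: the process keeps its virtual processor when one thread blocks, and the scheduling decision stays in user space.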
24. Pthreads
- a POSIX standard (IEEE 1003.1c) API for thread creation and synchronization.
- API specifies behavior of the thread library; implementation is up to the developers of the library.
- Common in UNIX operating systems.
25. UNIX Support for Threading
- BSD
- process model only. 4.4 BSD enhancements.
- Solaris
- user threads, kernel threads, LWPs and in 2.6 Scheduler Activations
- Mach
- kernel threads and tasks. Thread libraries provide semantics of user threads, LWPs and kernel threads.
- Digital UNIX - extends Mach to provide usual UNIX semantics.
- Pthreads library.
26. Solaris Threads
- Supports
- user threads (uthreads) via libthread and libpthread
- LWPs, abstraction that acts as a virtual CPU for user threads.
- LWP is bound to a kthread.
- kernel threads (kthread); every LWP is associated with one kthread, however a kthread may not have an LWP
- interrupts as threads
27. Solaris kthreads
- Fundamental scheduling/dispatching object
- all kthreads share the same virtual address space (the kernel's) - cheap context switch
- System threads - examples: STREAMS, callout
- kthread_t, /usr/include/sys/thread.h
- scheduling info, pointers for scheduler or sleep queues, pointers to klwp_t and proc_t
28. Solaris LWP
- Kernel-provided mechanism to allow for both user and kernel thread implementation on one platform.
- Bound to a kthread
- LWP data (see /usr/include/sys/klwp.h)
- user-level registers, system call params, resource usage, pointers to kthread_t and proc_t
- All LWPs in a process share
- signal handlers
- Each may have its own
- signal mask
- alternate stack for signal handling
- No global name space for LWPs
29. Solaris User Threads
- Implemented in user libraries
- library provides synchronization and scheduling facilities
- threads may be bound to LWPs
- unbound threads compete for available LWPs
- Manage thread-specific info
- thread id, saved register state, user stack, signal mask, priority, thread local storage
- Solaris provides two libraries: libthread and libpthread.
- Try man thread or man pthreads
30. Solaris Thread Data Structures
[Figure: links between proc_t, kthread_t and klwp_t. Fields shown: p_tlist in proc_t; t_procp, t_lwp and t_forw in kthread_t; lwp_thread and lwp_procp in klwp_t.]
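31. Solaris Threading Model (Combined)
[Figure: combined threading model for Process 1 and Process 2, drawn across user, kernel and hardware layers, including an interrupt kthread (Int kthr).]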
32. Solaris User Level Threads
[Figure: state diagram for user-level threads. States: Runnable, Active, Sleeping, Stopped. Transition labels: Dispatch, Preempt, Sleep, Wakeup, Stop, Continue.]
33. Solaris Lightweight Processes
[Figure: state diagram for LWPs. States: Runnable, Running, Blocked, Stopped. Transition labels: Dispatch, Timeslice or Preempt, Blocking System Call, Wakeup, Stop, Continue.]
34. Solaris Interrupts
- One system-wide clock kthread
- pool of 9 partially initialized kthreads per CPU for interrupts
- interrupt thread can block
- interrupted thread is pinned to the CPU
35. Solaris Signals and Fork
- Divided into traps (synchronous) and interrupts (asynchronous)
- each thread has its own signal mask; global set of signal handlers
- Each LWP can specify an alternate stack
- fork replicates all LWPs
- fork1 replicates only the invoking LWP/thread
36. Mach
- Two abstractions
- Task - static object, address space and system resources called port rights.
- Thread - fundamental execution unit; runs in the context of a task.
- Zero or more threads per task,
- kernel schedulable
- kernel stack
- computational state
- Processor sets - available processors divided into non-intersecting sets.
- permits dedicating processor sets to tasks
37. Mach c-thread Implementations
- Coroutine-based - multiplexes multiple user threads onto a single-threaded task
- Thread-based - one-to-one mapping from c-threads to Mach threads. Default.
- Task-based - one Mach task per c-thread.
38. Digital UNIX
- Based on Mach 2.5 kernel
- Provides complete UNIX programmer's interface
- 4.3BSD code and ULTRIX code ported to Mach
- u-area replaced by utask and uthread
- proc structure retained
- Threads
- Signals divided into synchronous and asynchronous
- global signal mask
- each thread can define its own handlers for synchronous signals
- global handlers for asynchronous signals
39. Windows 2000 Threads
- Implements the one-to-one mapping.
- Each thread contains
- a thread id
- register set
- separate user and kernel stacks
- private data storage area
40. Linux Threads
- Linux refers to them as tasks rather than threads.
- Thread creation is done through the clone() system call.
- clone() allows a child task to share the address space of the parent task (process)
41. 4.4 BSD UNIX
- Initial support for threads implemented but not enabled in distribution
- Proc structure and u-area reorganized
- All threads have a unique ID
- How are the proc and u areas reorganized to support threads?
42. Microkernel
- Transition to Microkernel discussion
43. Microkernel
- Small operating system core
- Contains only essential operating system functions
- Many services traditionally included in the operating system are now external subsystems
- device drivers
- file systems
- virtual memory manager
- windowing system and security services
44. Microkernel Benefits
- Portability
- isolate port-specific code in the microkernel
- Reliability
- modular design, small microkernel, simpler validation
- Uniform interface
- all services are provided by means of message passing
- Extensibility
- allows the addition of new services
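The uniform, message-based interface can be sketched with queues. In this toy model the kernel only delivers messages to named ports, and a file service runs outside it as an ordinary thread; `Microkernel`, `file_server` and the port name `"fs"` are hypothetical.

```python
import queue
import threading


class Microkernel:
    """Toy kernel whose only job here is delivering messages to named ports."""

    def __init__(self):
        self.ports = {}

    def register(self, name):
        """Create a port and return its inbox for the service to read from."""
        self.ports[name] = queue.Queue()
        return self.ports[name]

    def send(self, port, message):
        """Send a request to a port and wait for the reply."""
        reply = queue.Queue(maxsize=1)
        self.ports[port].put((message, reply))
        return reply.get()


def file_server(inbox, files):
    """User-level service: answers read requests until told to stop."""
    while True:
        message, reply = inbox.get()
        if message == "stop":
            reply.put("stopped")
            return
        reply.put(files.get(message, "no such file"))


kernel = Microkernel()
inbox = kernel.register("fs")
server = threading.Thread(target=file_server, args=(inbox, {"/etc/motd": "hello"}))
server.start()
print(kernel.send("fs", "/etc/motd"))  # hello
kernel.send("fs", "stop")
server.join()
```

A new service is added by registering another port and starting another server, with no change to the kernel class, which is the extensibility benefit listed above.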
45. Microkernel Benefits
- Flexibility
- existing features can be subtracted
- Distributed system support
- messages are sent without knowing what the target machine is or where it is located
- Object-oriented operating system
- components are objects with clearly defined interfaces that can be interconnected to form software
46. Microkernel Design
- Primitive memory management
- mapping each virtual page to a physical page frame
- operations: grant, map and flush
- Inter-process communication
- I/O and interrupt management
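Grant, map and flush are the address-space operations used in L4-style microkernels. The sketch below is a toy model under that reading: map shares a page, grant moves it, and flush revokes it from every space that received it from the flusher, directly or indirectly. The class names and the `install` helper are hypothetical.

```python
class Mapping:
    """One appearance of a physical frame in one address space."""

    def __init__(self, space, vpage, frame):
        self.space, self.vpage, self.frame = space, vpage, frame
        self.children = []  # mappings derived from this one via map()


class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.table = {}  # virtual page -> Mapping

    def install(self, vpage, frame):
        """Hypothetical helper: give this space an initial page."""
        self.table[vpage] = Mapping(self, vpage, frame)

    def map(self, vpage, other, other_vpage):
        """Share a page: it stays here and also appears in `other`."""
        parent = self.table[vpage]
        child = Mapping(other, other_vpage, parent.frame)
        parent.children.append(child)
        other.table[other_vpage] = child

    def grant(self, vpage, other, other_vpage):
        """Give a page away: it moves to `other` and disappears from here."""
        node = self.table.pop(vpage)
        node.space, node.vpage = other, other_vpage  # same node, new holder
        other.table[other_vpage] = node

    def flush(self, vpage):
        """Revoke the page from all spaces that received it from here."""
        root = self.table[vpage]
        stack, root.children = root.children, []
        while stack:
            node = stack.pop()
            stack.extend(node.children)
            if node.space.table.get(node.vpage) is node:
                del node.space.table[node.vpage]
```

Because grant moves the mapping node rather than copying it, the original mapper can still revoke the page after the recipient has passed it on.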