Title: Multiprocessors and Multithreading classroom slides
1Multiprocessors and Multithreading classroom slides
2What is multithreading?
- technique allowing a program to do multiple tasks concurrently
- is it a new technique?
- has existed since the 70s (Concurrent Pascal, Ada tasks, etc.)
- why now?
- emergence of SMPs in particular
- time has come for this technology
3- What is an SMP?
- multiple CPUs in a single box sharing all the resources such as memory and I/O
- Is an SMP more cost effective than two uniprocessor boxes?
- yes (a dual-processor SMP costs roughly 20% more than a uni)
- a modest speedup for a program on a dual-processor SMP over a uni will make it worthwhile
4Example uses - 1
[Figure: (a) Sequential process, (b) Multithreaded process. Labels: compute thread, I/O thread, compute, I/O request, I/O, I/O complete, I/O result needed.]
5Example uses - 2
[Figure: Digitizer, Tracker, Alarm]
6Programming Support for Threads
- creation
- pthread_create(top-level procedure, args)
- termination
- return from top-level procedure
- explicit kill
- rendezvous
- creator can wait for children
- pthread_join(child_tid)
- synchronization
- mutex
- condition variables
7Sample program thread create/join
int foo(int n)
{
    .....
    return 0;
}

int main()
{
    int f;
    thread_type child_tid;

    .....

    child_tid = thread_create(foo, f);

    .....

    thread_join(child_tid);
}
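The same create/join pattern can be sketched with real POSIX calls. This is a minimal sketch, not the slide's own code: the name run_child and the squaring body of foo are assumptions made for illustration, and the integer is carried through the void * argument and return value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Top-level procedure of the child thread (hypothetical body):
   squares the integer carried in the void* argument. */
static void *foo(void *arg)
{
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);
}

/* Creates one child running foo(n), waits for it, and returns
   the value the child returned. */
long run_child(long n)
{
    pthread_t child_tid;
    void *result = NULL;

    int rc = pthread_create(&child_tid, NULL, foo, (void *)(intptr_t)n);
    assert(rc == 0);                       /* thread was created */

    /* ... the creator could do other work here ... */

    rc = pthread_join(child_tid, &result); /* rendezvous with the child */
    assert(rc == 0);
    return (long)(intptr_t)result;
}
```

Calling run_child(7) from main creates the child, blocks in pthread_join until foo returns, and hands back the child's result.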
8Programming with Threads
- synchronization
- for coordination of the threads
- communication
- for inter-thread sharing of data
- threads can be in different processors
- how to achieve sharing in SMP?
- software: accomplished by keeping all threads in the same address space by the OS
- hardware: accomplished by hardware shared memory and coherent caches
9Need for Synchronization
digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
    }
}
Problem?
10Unsynchronized access to bufavail
[Figure: digitizer executes bufavail = bufavail - 1 and tracker executes bufavail = bufavail + 1 on bufavail, a shared data structure, with no synchronization.]
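Why the unsynchronized updates are a problem can be shown by replaying one bad interleaving by hand. The sketch below assumes each update is executed as a separate load, modify, and store (typical, but an assumption here), and uses two local variables to stand in for the two processors' registers. It is single-threaded on purpose so the outcome is repeatable; the name replay_lost_update is hypothetical.

```c
/* Replays one bad interleaving of
       digitizer: bufavail = bufavail - 1
       tracker:   bufavail = bufavail + 1
   assuming each statement runs as load, modify, store. */
int replay_lost_update(int bufavail)
{
    int digitizer_reg, tracker_reg;

    digitizer_reg = bufavail;           /* digitizer loads bufavail        */
    tracker_reg   = bufavail;           /* tracker loads the same value    */
    digitizer_reg = digitizer_reg - 1;  /* digitizer decrements its copy   */
    tracker_reg   = tracker_reg + 1;    /* tracker increments its copy     */
    bufavail = digitizer_reg;           /* digitizer stores                */
    bufavail = tracker_reg;             /* tracker stores: decrement lost  */

    return bufavail;
}
```

Starting from 4, one decrement and one increment should leave 4; this interleaving leaves 5 because the digitizer's store is overwritten.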
11Synchronization Primitives
- lock and unlock
- mutual exclusion among threads
- busy-waiting vs. blocking
- pthread_mutex_trylock: no blocking
- pthread_mutex_lock: blocking
- pthread_mutex_unlock
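The difference between the blocking and non-blocking calls can be sketched as follows. The helper names try_once and trylock_result are assumptions for illustration; only the pthread_mutex_* calls come from the slide. A child thread calls pthread_mutex_trylock once, either while the parent holds the lock or while it is free.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;

/* Child: tries the lock once without blocking and records the result. */
static void *try_once(void *arg)
{
    int *result = arg;
    *result = pthread_mutex_trylock(&buflock);
    if (*result == 0)
        pthread_mutex_unlock(&buflock);   /* got it, so give it back */
    return NULL;
}

/* Returns what trylock reported in the child while the caller
   held the lock (hold != 0) or left it free (hold == 0). */
int trylock_result(int hold)
{
    pthread_t tid;
    int result = -1;

    if (hold)
        pthread_mutex_lock(&buflock);     /* blocks until acquired */

    if (pthread_create(&tid, NULL, try_once, &result) == 0)
        pthread_join(tid, NULL);

    if (hold)
        pthread_mutex_unlock(&buflock);
    return result;
}
```

With the lock held, the child's trylock returns EBUSY immediately instead of waiting; with the lock free it returns 0. A pthread_mutex_lock call in the child's place would simply block until the parent unlocked.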
12Fix number 1 with locks
digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
        thread_mutex_unlock(buflock);
    }
}
Problem?
13Fix number 2
digitizer()
{
    image_type dig_image;
    int tail = 0;

    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        while (bufavail == 0) do nothing;
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        while (bufavail == MAX) do nothing;
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}
Problem?
14- condition variables
- pthread_cond_wait: block for a signal
- pthread_cond_signal: signal one waiting thread
- pthread_cond_broadcast: signal all waiting threads
15Wait and signal with cond vars
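A minimal wait/signal handoff might look like the sketch below. The names ready, data, data_ready, producer, and wait_for_data are assumptions for illustration. The waiter holds the mutex while testing the predicate; pthread_cond_wait releases the mutex while the thread is blocked and reacquires it before returning.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
static int ready = 0;   /* predicate, protected by lock */
static int data = 0;    /* value handed over, protected by lock */

/* Signaller: publishes a value, then signals one waiting thread. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    data = 42;
    ready = 1;
    pthread_cond_signal(&data_ready);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Waiter: blocks until the producer has published, returns the value. */
int wait_for_data(void)
{
    pthread_t tid;
    int seen;

    pthread_mutex_lock(&lock);
    ready = 0;
    if (pthread_create(&tid, NULL, producer, NULL) != 0) {
        pthread_mutex_unlock(&lock);
        return -1;
    }
    while (!ready)                              /* test under the lock */
        pthread_cond_wait(&data_ready, &lock);  /* unlocks while blocked */
    seen = data;
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return seen;
}
```

Because the flag is tested under the same mutex the producer takes before setting it, the signal cannot slip in between the test and the wait.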
16Fix number 3 cond var
digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        if (bufavail == 0) thread_cond_wait(buf_not_full, buflock);
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_cond_signal(buf_not_empty);
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail == MAX) thread_cond_wait(buf_not_empty, buflock);
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_cond_signal(buf_not_full);
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}
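The digitizer/tracker pair can be sketched as a runnable pthreads program. Several things here are assumptions, not the slide's code: frames are plain ints, grab() is replaced by a counter, analyze() by a running sum, the loops are bounded by nframes, and the wait uses while rather than if so the predicate is retested after every wakeup. As in the slide, the buffer slot is touched outside the lock, which is only safe with exactly one digitizer and one tracker.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX 4                        /* frame buffer slots */

static int frame_buf[MAX];           /* ints stand in for image_type */
static int bufavail = MAX;           /* free slots, protected by buflock */
static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buf_not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t buf_not_empty = PTHREAD_COND_INITIALIZER;

static void *digitizer(void *arg)
{
    int nframes = *(int *)arg, tail = 0;

    for (int i = 1; i <= nframes; i++) {
        int dig_image = i;                        /* stands in for grab() */
        pthread_mutex_lock(&buflock);
        while (bufavail == 0)                     /* buffer full: wait */
            pthread_cond_wait(&buf_not_full, &buflock);
        pthread_mutex_unlock(&buflock);
        frame_buf[tail % MAX] = dig_image;
        tail = tail + 1;
        pthread_mutex_lock(&buflock);
        bufavail = bufavail - 1;
        pthread_cond_signal(&buf_not_empty);
        pthread_mutex_unlock(&buflock);
    }
    return NULL;
}

static long tracker(int nframes)
{
    int head = 0;
    long sum = 0;

    for (int i = 0; i < nframes; i++) {
        pthread_mutex_lock(&buflock);
        while (bufavail == MAX)                   /* buffer empty: wait */
            pthread_cond_wait(&buf_not_empty, &buflock);
        pthread_mutex_unlock(&buflock);
        int track_image = frame_buf[head % MAX];
        head = head + 1;
        pthread_mutex_lock(&buflock);
        bufavail = bufavail + 1;
        pthread_cond_signal(&buf_not_full);
        pthread_mutex_unlock(&buflock);
        sum += track_image;                       /* stands in for analyze() */
    }
    return sum;
}

/* Runs one digitizer thread against a tracker in the calling thread
   and returns the sum of all frames the tracker saw. */
long run_pipeline(int nframes)
{
    pthread_t tid;

    bufavail = MAX;
    if (pthread_create(&tid, NULL, digitizer, &nframes) != 0)
        return -1;
    long sum = tracker(nframes);
    pthread_join(tid, NULL);
    return sum;
}
```

If every frame 1..nframes is delivered exactly once, run_pipeline returns nframes * (nframes + 1) / 2, even though the buffer holds only MAX frames at a time.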
17Gotchas in programming with cond vars
acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                      /* T3 is here */
    if (res_state == BUSY)
        thread_cond_wait(res_not_busy, cs_mutex);     /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;
    thread_cond_signal(res_not_busy);                 /* T1 is here */
    thread_mutex_unlock(cs_mutex);
}
18State of waiting queues
19Defensive programming retest predicate
acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                      /* T3 is here */
    while (res_state == BUSY)
        thread_cond_wait(res_not_busy, cs_mutex);     /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;
    thread_cond_signal(res_not_busy);                 /* T1 is here */
    thread_mutex_unlock(cs_mutex);
}
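The retest-the-predicate version translates to pthreads roughly as below. The worker, run_workers, users_inside, violations, and uses names are assumptions added to exercise the two routines: three threads repeatedly acquire and release the resource, and a counter checks that no two of them ever hold it at once.

```c
#include <pthread.h>
#include <stddef.h>

enum { NOT_BUSY, BUSY };

static int res_state = NOT_BUSY;     /* protected by cs_mutex */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t res_not_busy = PTHREAD_COND_INITIALIZER;

/* Bookkeeping touched only while holding the resource. */
static int users_inside = 0;
static int violations = 0;
static long uses = 0;

static void acquire_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    while (res_state == BUSY)        /* retest after every wakeup */
        pthread_cond_wait(&res_not_busy, &cs_mutex);
    res_state = BUSY;
    pthread_mutex_unlock(&cs_mutex);
}

static void release_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    res_state = NOT_BUSY;
    pthread_cond_signal(&res_not_busy);
    pthread_mutex_unlock(&cs_mutex);
}

static void *worker(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 0; i < rounds; i++) {
        acquire_shared_resource();
        if (++users_inside != 1)     /* someone else is also inside */
            violations++;
        uses++;
        --users_inside;
        release_shared_resource();
    }
    return NULL;
}

/* Returns the total number of uses if exclusion held, -1 otherwise. */
long run_workers(int rounds)
{
    pthread_t tid[3];
    int n = 0;

    uses = 0;
    violations = 0;
    while (n < 3 && pthread_create(&tid[n], NULL, worker, &rounds) == 0)
        n++;
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return (n == 3 && violations == 0) ? uses : -1;
}
```

With if in place of while, a thread woken by the signal could find that a newcomer such as T3 had already marked the resource BUSY and would proceed anyway; the loop sends it back to wait instead.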
20Threads as software structuring abstraction
21Threads and OS
- Traditional OS
- DOS
- memory layout
- protection between user and kernel?
22- protection between user and kernel?
- PCB?
23- programs in these traditional OS are single threaded
- one PC per program (process), one stack, one set of CPU registers
- if a process blocks (say disk I/O, network communication, etc.) then no progress for the program as a whole
24MT Operating Systems
- How widespread is support for threads in OS?
- Digital Unix, Sun Solaris, Win95, Win NT, Win XP
- Process vs. Thread?
- in a single threaded program, the state of the executing program is contained in a process
- in a MT program, the state of the executing program is contained in several concurrent threads
25Process Vs. Thread
[Figure: user level shows processes P1 and P2 with their code, data, and threads (T1; T1, T2, T3); kernel level shows a PCB per process and kernel code and data.]
- computational state (PC, regs, ...) for each thread
- how different from process state?
27- threads
- share address space of process
- cooperate to get job done
- threads concurrent?
- they may be, if the box is a true multiprocessor
- share the same CPU on a uniprocessor
- threaded code different from non-threaded?
- protection for data shared among threads
- synchronization among threads
28- threads in a uniprocessor?
[Figure labels: process, active]
- allows concurrency between I/O and user processing even in a uniprocessor box
29Threads Implementation
- user level threads
- OS independent
- scheduler is part of the runtime system
- thread switch is cheap (save PC, SP, regs)
- scheduling customizable, i.e., more app control
- blocking call by thread blocks process
30[Figure: user-level threads. User level: processes P1, P2, P3 with threads T1, T2, T3; a threads library per process holding a thread ready_q plus mutex and cond_var. Kernel level: process ready_q holding P1, P2, P3.]
32- solution to blocking problem in user level threads
- non-blocking version of all system calls
- polling wrapper in scheduler for such calls
- switching among user level threads
- yield voluntarily
- how to make preemptive?
- timer interrupt from kernel to switch
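A non-blocking call with a polling wrapper can be sketched with standard POSIX file-descriptor flags. The function names make_nonblocking, try_read_byte, and demo_poll are assumptions for illustration, and a pipe stands in for whatever device a real threads library would poll. Where a real user-level scheduler would switch to another ready thread on a "would block" result, this sketch just returns 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Puts fd into non-blocking mode; returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Polling wrapper around read():
     1  one byte was read into *out
     0  the call would have blocked (run another thread, retry later)
    -1  error or end of file */
int try_read_byte(int fd, char *out)
{
    ssize_t n = read(fd, out, 1);
    if (n == 1)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}

/* Polls an empty pipe, then polls again after one byte is written.
   Returns 0 if the wrapper reported "would block" and then "data". */
int demo_poll(void)
{
    int fds[2];
    char c = 0;
    int ok = 0;

    if (pipe(fds) == -1)
        return -1;
    if (make_nonblocking(fds[0]) == 0) {
        int before = try_read_byte(fds[0], &c);   /* pipe is empty */
        int wrote = (write(fds[1], "x", 1) == 1);
        int after = try_read_byte(fds[0], &c);    /* byte is waiting */
        ok = (before == 0 && wrote && after == 1 && c == 'x');
    }
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

The key point is that the process never blocks inside the kernel on behalf of one thread, so the other user-level threads in the process can keep running.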
33- Kernel level
- expensive thread switch
- makes sense for blocking calls by threads
- kernel becomes complicated: process vs. threads scheduling
- thread packages become non-portable
- problems common to user and kernel level threads
- libraries
- solution is to have thread-safe wrappers to such library calls
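One simple way to build such a wrapper is to serialize every call to the unsafe routine behind a single mutex. In the sketch below, lib_next_id is a hypothetical library function that keeps hidden static state and does no locking; safe_next_id, caller, and run_callers are likewise names invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical library routine that is NOT thread-safe:
   it updates hidden static state without any locking. */
static long lib_next_id(void)
{
    static long last_id = 0;
    last_id = last_id + 1;
    return last_id;
}

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: only one thread at a time is inside the library. */
long safe_next_id(void)
{
    pthread_mutex_lock(&lib_lock);
    long id = lib_next_id();
    pthread_mutex_unlock(&lib_lock);
    return id;
}

static void *caller(void *arg)
{
    int calls = *(int *)arg;
    for (int i = 0; i < calls; i++)
        (void)safe_next_id();
    return NULL;
}

/* Two threads each draw `calls` ids through the wrapper;
   returns the next id handed out afterwards, or -1 on failure. */
long run_callers(int calls)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, caller, &calls) != 0)
        return -1;
    if (pthread_create(&b, NULL, caller, &calls) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return safe_next_id();
}
```

If no update is lost, the id returned after two threads make calls requests each is 2 * calls + 1. The wrapper only helps if every thread goes through it; a direct call to lib_next_id would bypass the lock.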
35Solaris Threads
- Three kinds
- user, lwp, kernel
- user: any number can be created and attached to lwps
- one to one mapping between lwp and kernel threads
- kernel threads known to the OS scheduler
- if a kernel thread blocks, the associated lwp and user level threads block as well
36[Figure: Solaris threads. User level: processes P1, P2, P3 with user threads (T1, T2, T3) attached to lwps. Kernel level below.]
37Multiprocessor First Principles
- processors, memories, interconnection network
- Classification: SISD, SIMD, MIMD, MISD
- message passing MPs, e.g. IBM SP2
- shared address space MPs
- cache coherent (CC)
- SMP: a bus-based CC MIMD machine
- several vendors: Sun, Compaq, Intel, ...
- CC-NUMA: SGI Origin 2000
- non-cache coherent (NCC)
- Cray T3D/T3E
38SMP
39SMP with per-processor caches
40Cache consistency problem
[Figure: shared memory and shared bus; processors P1, P2, P3 running threads T1, T2, T3; copies of X.]
41Two possible solutions