Multiprocessors and Multithreading classroom slides - PowerPoint PPT Presentation

Slides: 46
Provided by: umakishore

Transcript and Presenter's Notes

Title: Multiprocessors and Multithreading classroom slides


1
Multiprocessors and Multithreading classroom slides
2
Example use of threads - 1
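[Figure: (a) Sequential process: compute, issue an I/O request, wait for the I/O to complete, then compute again. (b) Multithreaded process: a compute thread keeps computing while an I/O thread performs the I/O; the compute thread waits only when the I/O result is needed.]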
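Panel (b) can be sketched with POSIX threads as follows. This is an illustration under stated assumptions, not the slides' code: the blocking I/O is simulated by a 50 ms `nanosleep` inside a hypothetical `io_thread_body`, the "result" is the made-up value 42, and the caller plays the compute thread.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a blocking I/O call: sleeps 50 ms, then delivers a result. */
static void *io_thread_body(void *arg)
{
    int *io_result = arg;
    struct timespec delay = { 0, 50L * 1000 * 1000 };   /* 50 ms */
    nanosleep(&delay, NULL);   /* the I/O thread blocks here... */
    *io_result = 42;           /* ...then delivers the (made-up) I/O result */
    return NULL;
}

/* The caller acts as the compute thread: it overlaps work with the pending I/O. */
long compute_while_io(int *io_result)
{
    pthread_t io_thread;
    if (pthread_create(&io_thread, NULL, io_thread_body, io_result) != 0)
        return -1;                        /* could not start the I/O thread */

    long sum = 0;
    for (long i = 1; i <= 1000000; i++)   /* useful computation meanwhile */
        sum += i % 7;

    pthread_join(io_thread, NULL);        /* "I/O result needed": wait here */
    return sum;
}
```

Reading `*io_result` is safe only after `pthread_join`, which is exactly the "I/O result needed" point in the figure.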
3
Example use of threads - 2
[Figure: three cooperating threads: Digitizer, Tracker, Alarm]
4
Programming Support for Threads
  • creation
      • pthread_create(top-level procedure, args)
  • termination
      • return from top-level procedure
      • explicit kill
  • rendezvous
      • creator can wait for children
      • pthread_join(child_tid)
  • synchronization
      • mutex
      • condition variables

5
Sample program thread create/join
int foo(int n)
{
  .....
  return 0;
}

int main()
{
  int f;
  thread_type child_tid;
  .....
  child_tid = thread_create(foo, f);
  .....
  thread_join(child_tid);
}
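The slide uses generic `thread_create`/`thread_join` names. With the real POSIX calls, `pthread_create` takes a pointer to the new thread's ID, an attributes argument, and a start routine of type `void *(*)(void *)`, so `int foo(int n)` has to be adapted. A sketch under those constraints; passing `n` through a pointer, having `foo` double its argument, and the wrapper name `create_and_join` are illustrative choices, not the slide's.

```c
#include <pthread.h>
#include <stdint.h>

/* Top-level procedure of the child thread. POSIX requires the
   void *(*)(void *) signature, so n arrives through a pointer. */
static void *foo(void *arg)
{
    int n = *(int *)arg;
    /* ..... work on n ..... */
    return (void *)(intptr_t)(n * 2);   /* termination: return from top-level procedure */
}

/* Creates one child, then waits for it (rendezvous); returns the child's result. */
int create_and_join(int f)
{
    pthread_t child_tid;
    void *ret;

    if (pthread_create(&child_tid, NULL, foo, &f) != 0)
        return -1;
    /* ..... the parent can do other work here ..... */
    pthread_join(child_tid, &ret);      /* blocks until foo returns */
    return (int)(intptr_t)ret;
}
```

Passing `&f` is safe here only because the parent joins before `f` goes out of scope.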

6
Programming with Threads
  • synchronization
      • for coordination of the threads
  • communication
      • for inter-thread sharing of data
      • threads can be on different processors
      • how to achieve sharing in an SMP?
          • software: accomplished by the OS keeping all
            threads in the same address space
          • hardware: accomplished by hardware shared memory
            and coherent caches

7
Need for Synchronization
digitizer()
{
  image_type dig_image;
  int tail = 0;
  loop {
    if (bufavail > 0) {
      grab(dig_image);
      frame_buf[tail mod MAX] = dig_image;
      tail = tail + 1;
      bufavail = bufavail - 1;
    }
  }
}

tracker()
{
  image_type track_image;
  int head = 0;
  loop {
    if (bufavail < MAX) {
      track_image = frame_buf[head mod MAX];
      head = head + 1;
      bufavail = bufavail + 1;
      analyze(track_image);
    }
  }
}
Problem?
8
[Figure: the digitizer executes bufavail = bufavail - 1 while the tracker executes bufavail = bufavail + 1 on the same shared data structure, bufavail]
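Each `bufavail = bufavail ± 1` is really a load, an arithmetic step, and a store, so the two threads' updates can interleave. The sketch below replays one bad interleaving step by step in a single thread (the hypothetical `reg_dig`/`reg_trk` variables stand for each thread's private register), which makes the lost update reproducible instead of timing-dependent.

```c
/* Replays one interleaving of digitizer (bufavail - 1) and tracker (bufavail + 1). */
int replay_lost_update(int bufavail)
{
    int reg_dig, reg_trk;          /* each thread's private register */

    reg_dig = bufavail;            /* digitizer: load */
    reg_trk = bufavail;            /* tracker:   load (same old value) */
    reg_dig = reg_dig - 1;         /* digitizer: decrement */
    reg_trk = reg_trk + 1;         /* tracker:   increment */
    bufavail = reg_dig;            /* digitizer: store */
    bufavail = reg_trk;            /* tracker:   store overwrites it */

    return bufavail;               /* the digitizer's update is lost */
}
```

Starting from 5, one decrement plus one increment should leave 5, but this interleaving returns 6.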
9
Synchronization Primitives
  • lock and unlock
      • mutual exclusion among threads
  • busy-waiting vs. blocking
      • pthread_mutex_trylock: no blocking
      • pthread_mutex_lock: blocking
      • pthread_mutex_unlock
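The difference between the two acquire calls can be seen in a short sketch: while the caller holds `buflock` via the blocking `pthread_mutex_lock`, a second thread's `pthread_mutex_trylock` returns immediately with `EBUSY` instead of waiting. The helper names `try_it` and `trylock_while_held` are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in a second thread while the caller holds buflock. */
static void *try_it(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_trylock(&buflock);   /* returns at once, never blocks */
    if (*rc == 0)
        pthread_mutex_unlock(&buflock);      /* only if we actually got it */
    return NULL;
}

/* Returns the trylock result another thread sees while the lock is held. */
int trylock_while_held(void)
{
    pthread_t t;
    int rc = -1;

    pthread_mutex_lock(&buflock);            /* blocking acquire */
    if (pthread_create(&t, NULL, try_it, &rc) == 0)
        pthread_join(t, NULL);               /* helper cannot block, so this completes */
    pthread_mutex_unlock(&buflock);
    return rc;                               /* EBUSY: the lock was taken */
}
```

Had the helper called `pthread_mutex_lock` instead, it would block until the caller unlocks, and the join would deadlock.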

10
Fix number 1 with locks
digitizer()
{
  image_type dig_image;
  int tail = 0;
  loop {
    thread_mutex_lock(buflock);
    if (bufavail > 0) {
      grab(dig_image);
      frame_buf[tail mod MAX] = dig_image;
      tail = tail + 1;
      bufavail = bufavail - 1;
    }
    thread_mutex_unlock(buflock);
  }
}

tracker()
{
  image_type track_image;
  int head = 0;
  loop {
    thread_mutex_lock(buflock);
    if (bufavail < MAX) {
      track_image = frame_buf[head mod MAX];
      head = head + 1;
      bufavail = bufavail + 1;
      analyze(track_image);
    }
    thread_mutex_unlock(buflock);
  }
}
Problem?
11
Fix number 2
digitizer()
{
  image_type dig_image;
  int tail = 0;
  loop {
    grab(dig_image);
    thread_mutex_lock(buflock);
    while (bufavail == 0) do nothing;
    thread_mutex_unlock(buflock);
    frame_buf[tail mod MAX] = dig_image;
    tail = tail + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail - 1;
    thread_mutex_unlock(buflock);
  }
}

tracker()
{
  image_type track_image;
  int head = 0;
  loop {
    thread_mutex_lock(buflock);
    while (bufavail == MAX) do nothing;
    thread_mutex_unlock(buflock);
    track_image = frame_buf[head mod MAX];
    head = head + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail + 1;
    thread_mutex_unlock(buflock);
    analyze(track_image);
  }
}
Problem?
12
Fix number 3
digitizer()
{
  image_type dig_image;
  int tail = 0;
  loop {
    grab(dig_image);
    while (bufavail == 0) do nothing;
    frame_buf[tail mod MAX] = dig_image;
    tail = tail + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail - 1;
    thread_mutex_unlock(buflock);
  }
}

tracker()
{
  image_type track_image;
  int head = 0;
  loop {
    while (bufavail == MAX) do nothing;
    track_image = frame_buf[head mod MAX];
    head = head + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail + 1;
    thread_mutex_unlock(buflock);
    analyze(track_image);
  }
}
Problem?
13
  • condition variables
      • pthread_cond_wait: block for a signal
      • pthread_cond_signal: signal one waiting thread
      • pthread_cond_broadcast: signal all waiting threads

14
Wait and signal with cond vars
15
Fix number 4 cond var
digitizer()
{
  image_type dig_image;
  int tail = 0;
  loop {
    grab(dig_image);
    thread_mutex_lock(buflock);
    if (bufavail == 0)
      thread_cond_wait(buf_not_full, buflock);
    thread_mutex_unlock(buflock);
    frame_buf[tail mod MAX] = dig_image;
    tail = tail + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail - 1;
    thread_cond_signal(buf_not_empty);
    thread_mutex_unlock(buflock);
  }
}

tracker()
{
  image_type track_image;
  int head = 0;
  loop {
    thread_mutex_lock(buflock);
    if (bufavail == MAX)
      thread_cond_wait(buf_not_empty, buflock);
    thread_mutex_unlock(buflock);
    track_image = frame_buf[head mod MAX];
    head = head + 1;
    thread_mutex_lock(buflock);
    bufavail = bufavail + 1;
    thread_cond_signal(buf_not_full);
    thread_mutex_unlock(buflock);
    analyze(track_image);
  }
}
This solution is correct so long as there is
exactly one producer and one consumer
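A runnable rendering of Fix number 4 with POSIX threads, as a sketch under these assumptions: frames are plain `int`s, `MAX` is 4, `grab` is replaced by producing the numbers 1 to `NFRAMES`, and `analyze` is replaced by summing them into a hypothetical `total`. One deliberate change from the slide: each wait sits in a `while` loop, because POSIX permits `pthread_cond_wait` to return spuriously, so retesting the condition is needed even with a single producer and a single consumer.

```c
#include <pthread.h>

#define MAX 4          /* buffer slots (assumed) */
#define NFRAMES 100    /* frames passed through the buffer (assumed) */

static int frame_buf[MAX];
static int bufavail = MAX;                      /* number of free slots */
static long total;                              /* stands in for analyze() */
static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buf_not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t buf_not_empty = PTHREAD_COND_INITIALIZER;

static void *digitizer(void *arg)
{
    (void)arg;
    int tail = 0;
    for (int i = 1; i <= NFRAMES; i++) {
        int dig_image = i;                      /* stands in for grab() */
        pthread_mutex_lock(&buflock);
        while (bufavail == 0)                   /* retest after every wakeup */
            pthread_cond_wait(&buf_not_full, &buflock);
        pthread_mutex_unlock(&buflock);

        frame_buf[tail % MAX] = dig_image;      /* this slot is ours alone */
        tail = tail + 1;

        pthread_mutex_lock(&buflock);
        bufavail = bufavail - 1;
        pthread_cond_signal(&buf_not_empty);
        pthread_mutex_unlock(&buflock);
    }
    return NULL;
}

static void *tracker(void *arg)
{
    (void)arg;
    int head = 0;
    for (int i = 1; i <= NFRAMES; i++) {
        pthread_mutex_lock(&buflock);
        while (bufavail == MAX)
            pthread_cond_wait(&buf_not_empty, &buflock);
        pthread_mutex_unlock(&buflock);

        int track_image = frame_buf[head % MAX];
        head = head + 1;

        pthread_mutex_lock(&buflock);
        bufavail = bufavail + 1;
        pthread_cond_signal(&buf_not_full);
        pthread_mutex_unlock(&buflock);

        total += track_image;                   /* stands in for analyze() */
    }
    return NULL;
}

/* Runs one digitizer and one tracker to completion; returns the tracker's sum. */
long run_pipeline(void)
{
    pthread_t d, t;
    total = 0;
    pthread_create(&d, NULL, digitizer, NULL);
    pthread_create(&t, NULL, tracker, NULL);
    pthread_join(d, NULL);
    pthread_join(t, NULL);
    return total;                               /* 1 + 2 + ... + NFRAMES */
}
```

As on the slide, the frame copy happens outside the lock: a slot is touched only by the thread that has just claimed it through `bufavail`.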
16
Gotchas in programming with cond vars
acquire_shared_resource()
{
  thread_mutex_lock(cs_mutex);             /* T3 is here */
  if (res_state == BUSY)
    thread_cond_wait(res_not_busy, cs_mutex);   /* T2 is here */
  res_state = BUSY;
  thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
  thread_mutex_lock(cs_mutex);
  res_state = NOT_BUSY;                    /* T1 is here */
  thread_cond_signal(res_not_busy);
  thread_mutex_unlock(cs_mutex);
}
17
State of waiting queues
(a) Waiting queues before T1 signals
    cs_mutex: T3
    res_not_busy: T2
(b) Waiting queues after T1 signals
    cs_mutex: T3, T2
    res_not_busy: (empty)
18
Defensive programming: retest the predicate
acquire_shared_resource()
{
  thread_mutex_lock(cs_mutex);             /* T3 is here */
  while (res_state == BUSY)
    thread_cond_wait(res_not_busy, cs_mutex);   /* T2 is here */
  res_state = BUSY;
  thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
  thread_mutex_lock(cs_mutex);
  res_state = NOT_BUSY;                    /* T1 is here */
  thread_cond_signal(res_not_busy);
  thread_mutex_unlock(cs_mutex);
}
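A runnable sketch of the defensive version with POSIX threads. Assumptions: `res_state` is an `int` with hypothetical constants `BUSY`/`NOT_BUSY`, three worker threads each acquire and release the resource 10000 times, and a hypothetical `in_use`/`max_in_use` pair (guarded by its own `stats_lock`) records how many threads ever held the resource at once.

```c
#include <pthread.h>

enum { NOT_BUSY, BUSY };

static int res_state = NOT_BUSY;
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t res_not_busy = PTHREAD_COND_INITIALIZER;

static int in_use, max_in_use;                  /* checks mutual exclusion */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

static void acquire_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    while (res_state == BUSY)                   /* retest after every wakeup */
        pthread_cond_wait(&res_not_busy, &cs_mutex);
    res_state = BUSY;
    pthread_mutex_unlock(&cs_mutex);
}

static void release_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    res_state = NOT_BUSY;
    pthread_cond_signal(&res_not_busy);
    pthread_mutex_unlock(&cs_mutex);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        acquire_shared_resource();
        pthread_mutex_lock(&stats_lock);
        if (++in_use > max_in_use)
            max_in_use = in_use;
        pthread_mutex_unlock(&stats_lock);

        /* ... use the resource ... */

        pthread_mutex_lock(&stats_lock);
        in_use--;
        pthread_mutex_unlock(&stats_lock);
        release_shared_resource();
    }
    return NULL;
}

/* Returns the largest number of simultaneous holders observed (1 if exclusion held). */
int run_workers(void)
{
    pthread_t t[3];
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return max_in_use;
}
```

With `while`, a thread like T2 that wakes up after T3 has already taken the resource simply goes back to waiting, so `max_in_use` never exceeds 1.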

19
Threads as software structuring abstraction
20
Threads and OS
  • Traditional OS
  • DOS
  • memory layout
  • protection between user and kernel?

21
  • Unix
  • memory layout
  • protection between user and kernel?
  • PCB?

22
  • programs in these traditional OS are single
    threaded
  • one PC per program (process), one stack, one set
    of CPU registers
  • if a process blocks (say disk I/O, network
    communication, etc.) then no progress for the
    program as a whole

23
MT Operating Systems
  • How widespread is support for threads in OSes?
      • Digital Unix, Sun Solaris, Win95, Win NT, Win XP
  • Process vs. thread?
      • in a single-threaded program, the state of the
        executing program is contained in a process
      • in an MT program, the state of the executing
        program is contained in several concurrent
        threads

24
Process Vs. Thread
[Figure: in user space, processes P1 and P2, each with its own code and data and one or more threads (T1 to T3); in the kernel, kernel code and data plus a PCB for each process]
  • computational state (PC, regs, etc.) for each thread
  • how is this different from process state?

25
(No Transcript)
26
  • threads
      • share the address space of the process
      • cooperate to get the job done
  • threads concurrent?
      • maybe, if the box is a true multiprocessor
      • share the same CPU on a uniprocessor
  • threaded code different from non-threaded?
      • protection for data shared among threads
      • synchronization among threads
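"Share the address space of the process" can be checked directly. In this sketch (the variable `shared_counter` and the helper names are illustrative), a thread's write to a global is visible to `main` after `pthread_join`, whereas a `fork`ed child process writes to its own copy of the same global and the parent never sees the change.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_counter = 0;

static void *thread_body(void *arg)
{
    (void)arg;
    shared_counter = 1;          /* same address space as the creator */
    return NULL;
}

/* Value the creator sees after a thread sets shared_counter = 1. */
int value_after_thread(void)
{
    pthread_t t;
    shared_counter = 0;
    if (pthread_create(&t, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return shared_counter;       /* 1: the thread's write is visible */
}

/* Value the parent sees after a child process sets shared_counter = 1. */
int value_after_fork(void)
{
    shared_counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: works on a private copy */
        shared_counter = 1;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_counter;       /* 0: the child's write stayed in its copy */
}
```

The same property is why the threaded version needs the synchronization listed above, while the separate processes would need explicit communication instead.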

27
Threads Implementation
  • user-level threads
      • OS independent
      • scheduler is part of the runtime system
      • thread switch is cheap (save PC, SP, regs)
      • scheduling customizable, i.e., more app control
      • a blocking call by one thread blocks the whole process
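To make "thread switch is cheap (save PC, SP, regs)" concrete, here is a toy user-level threads sketch built on `getcontext`, `makecontext`, and `swapcontext`, which save and restore exactly that state in user space. Assumptions: two hypothetical threads that yield cooperatively to a round-robin scheduler loop, each recording its letter in a trace; `ucontext` is obsolescent in POSIX but still provided by glibc on Linux, and real thread libraries typically use their own switch code.

```c
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                 /* scheduler's saved context */
static ucontext_t thr_ctx[NTHREADS];         /* each thread's PC, SP, regs */
static char stacks[NTHREADS][STACK_SIZE];    /* each thread's private stack */
static int done[NTHREADS];
static int current;                          /* index of the running thread */
static char trace[32];
static int tlen;

/* Voluntary yield: save this thread's registers, resume the scheduler. */
static void yield(void)
{
    swapcontext(&thr_ctx[current], &sched_ctx);
}

static void thread_body(void)
{
    for (int step = 0; step < 3; step++) {
        trace[tlen++] = (char)('A' + current);   /* record who ran */
        yield();
    }
    done[current] = 1;      /* returning resumes uc_link, i.e. the scheduler */
}

/* Round-robin scheduler; returns the interleaving of the two threads. */
const char *run_user_threads(void)
{
    for (int i = 0; i < NTHREADS; i++) {
        getcontext(&thr_ctx[i]);
        thr_ctx[i].uc_stack.ss_sp = stacks[i];
        thr_ctx[i].uc_stack.ss_size = STACK_SIZE;
        thr_ctx[i].uc_link = &sched_ctx;
        makecontext(&thr_ctx[i], thread_body, 0);
    }
    int remaining = NTHREADS;
    while (remaining > 0) {
        for (current = 0; current < NTHREADS; current++) {
            if (done[current])
                continue;
            swapcontext(&sched_ctx, &thr_ctx[current]);   /* switch to thread */
            if (done[current])
                remaining--;
        }
    }
    trace[tlen] = '\0';
    return trace;           /* "ABABAB": the threads alternate at each yield */
}
```

The kernel sees a single thread of control throughout; if either user-level thread made a blocking system call, the whole process, scheduler included, would stop.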

28
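[Figure: user-level threads. In user space, processes P1, P2, P3; each multithreaded process runs threads T1, T2, T3 on top of its own threads library, which keeps a thread ready_q and implements mutex and cond_var. The kernel sees only the processes, which it schedules from its process ready_q.]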
29
(No Transcript)
30
  • solution to the blocking problem in user-level
    threads
      • non-blocking versions of all system calls
      • polling wrapper in the scheduler for such calls
  • switching among user-level threads
      • yield voluntarily
      • how to make it preemptive?
          • timer interrupt from the kernel to switch
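The "non-blocking system call plus polling wrapper" idea can be sketched with a pipe whose read end is set to `O_NONBLOCK`. Instead of letting `read` block the whole process, a hypothetical wrapper `ult_read` retries and, on `EAGAIN`, calls a hypothetical scheduler hook `ult_yield`. In this single-threaded sketch, `ult_yield` stands in for running "another ready user-level thread" that happens to produce the data on its third turn.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int pipe_fds[2];
static int other_thread_steps;

/* Hypothetical scheduler hook: runs another ready user-level thread.
   Here that "thread" writes two bytes into the pipe on its third turn. */
static void ult_yield(void)
{
    if (++other_thread_steps == 3) {
        ssize_t w = write(pipe_fds[1], "hi", 2);
        (void)w;
    }
}

/* Polling wrapper around a non-blocking read(): never blocks the process. */
static ssize_t ult_read(int fd, void *buf, size_t n)
{
    for (;;) {
        ssize_t r = read(fd, buf, n);
        if (r >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return r;             /* data, end of file, or a real error */
        ult_yield();              /* nothing yet: let other threads run */
    }
}

/* Returns how many times the reader yielded before data arrived, or -1. */
int demo_polling_read(char *out)
{
    if (pipe(pipe_fds) != 0)
        return -1;
    fcntl(pipe_fds[0], F_SETFL, fcntl(pipe_fds[0], F_GETFL) | O_NONBLOCK);
    other_thread_steps = 0;
    ssize_t r = ult_read(pipe_fds[0], out, 2);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return r == 2 ? other_thread_steps : -1;
}
```

Here `demo_polling_read` returns 3: two empty polls, then the third yield produces the data and the next `read` succeeds.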

31
  • Kernel-level threads
      • expensive thread switch
      • makes sense for blocking calls by threads
      • kernel becomes complicated: process vs. thread
        scheduling
      • thread packages become non-portable
  • problems common to user- and kernel-level threads
      • libraries
      • solution is to have thread-safe wrappers to such
        library calls

32
(No Transcript)
33
Solaris Threads
  • Three kinds
      • user, lwp, kernel
  • user: any number can be created and attached to
    lwps
  • one-to-one mapping between lwps and kernel threads
  • kernel threads are known to the OS scheduler
  • if a kernel thread blocks, the associated lwp and
    its user-level threads block as well

34
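Solaris threads
[Figure: user-level threads (T1, T2, T3) of processes P1, P2, P3 are attached to lwps in user space; each lwp is backed by a kernel thread]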
35
Thread safe libraries
/* original version */
void *malloc(size_t size)
{
  ......
  ......
  return(memory_pointer);
}

/* thread-safe version */
mutex_lock_type cs_mutex;
void *malloc(size_t size)
{
  thread_mutex_lock(cs_mutex);
  ......
  ......
  thread_mutex_unlock(cs_mutex);
  return(memory_pointer);
}
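The same wrapper pattern can be run end to end on a library routine with hidden shared state. In this sketch the non-thread-safe routine is a hypothetical bump allocator, `arena_alloc`, whose `arena_next` offset would be corrupted by concurrent callers; `safe_arena_alloc` is the thread-safe wrapper, holding `cs_mutex` for the whole call as the slide's `malloc` does. Four threads each make 1000 allocations of 16 bytes.

```c
#include <pthread.h>
#include <stddef.h>

#define ARENA_SIZE (1 << 20)

static char arena[ARENA_SIZE];
static size_t arena_next;                   /* hidden shared state */

/* Original, non-thread-safe version: the update of arena_next can race. */
static void *arena_alloc(size_t size)
{
    if (size > ARENA_SIZE - arena_next)
        return NULL;
    void *memory_pointer = &arena[arena_next];
    arena_next += size;
    return memory_pointer;
}

/* Thread-safe wrapper: serialize every call with one mutex. */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;

void *safe_arena_alloc(size_t size)
{
    pthread_mutex_lock(&cs_mutex);
    void *memory_pointer = arena_alloc(size);
    pthread_mutex_unlock(&cs_mutex);
    return memory_pointer;
}

static void *alloc_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        safe_arena_alloc(16);
    return NULL;
}

/* 4 threads x 1000 allocations x 16 bytes: arena_next should end at 64000. */
size_t demo_safe_alloc(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, alloc_many, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return arena_next;
}
```

The wrapper is simple and portable, at the cost of serializing all callers through one lock.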

36
Synchronization support
  • Lock
  • Test and set instruction
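A lock built on test-and-set can be sketched with C11's `atomic_flag`: `atomic_flag_test_and_set_explicit` atomically sets the flag and returns its previous value, so a thread owns the lock when that previous value was clear. The names `tas_lock`, `tas_unlock`, and the counter test are our own; a production spinlock would usually add back-off or a yield inside the loop.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long counter;

static void tas_lock(void)
{
    /* Keep testing-and-setting until the old value was clear (we set it). */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;   /* busy-wait */
}

static void tas_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        tas_lock();
        counter++;                  /* critical section */
        tas_unlock();
    }
    return NULL;
}

/* Four threads increment a shared counter under the TAS lock. */
long demo_tas(void)
{
    pthread_t t[4];
    counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, adder, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return counter;                 /* 40000 if the lock provides exclusion */
}
```

This is the busy-waiting flavor of lock from the synchronization-primitives slide: waiters spin on the CPU rather than being blocked by the OS.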

37
SMP
38
SMP with per-processor caches
39
Cache consistency problem
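[Figure: processors P1, P2, P3, running threads T1 to T3, each hold a copy of X in their private caches; the caches connect to shared memory over a shared bus]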
40
Two possible solutions
41
Given the following details about an SMP (symmetric multiprocessor):
  • Cache coherence protocol: write-invalidate
  • Cache-to-memory policy: write-back
  • Initially: the caches are empty; memory location A contains 10 and B contains 5
Consider the following timeline of memory accesses from processors P1, P2, and P3. What are the contents of the caches and memory?
42
(No Transcript)
43
What is multithreading?
  • technique allowing a program to do multiple tasks
  • is it a new technique?
      • has existed since the 70s (Concurrent Pascal,
        Ada tasks, etc.)
  • why now?
      • emergence of SMPs in particular
      • time has come for this technology

44
  • threads in a uniprocessor?
      • allows concurrency between I/O and user
        processing even in a uniprocessor box

45
Multiprocessor First Principles
  • processors, memories, interconnection network
  • Classification: SISD, SIMD, MIMD, MISD
  • message-passing MPs, e.g., IBM SP2
  • shared address space MPs
      • cache coherent (CC)
          • SMP: a bus-based CC MIMD machine
              • several vendors: Sun, Compaq, Intel, ...
          • CC-NUMA: SGI Origin 2000
      • non-cache coherent (NCC)
          • Cray T3D/T3E

46
  • What is an SMP?
      • multiple CPUs in a single box sharing all the
        resources such as memory and I/O
  • Is an SMP more cost-effective than two
    uniprocessor boxes?
      • yes (roughly 20% more for a dual-processor SMP
        compared to a uni)
      • a modest speedup for a program on a dual-processor
        SMP over a uni will make it worthwhile
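As a worked version of the last bullet, assume a uniprocessor costs $C$ and the dual-processor SMP costs about $1.2C$ (the roughly 20% premium above), while two separate uniprocessor boxes cost $2C$. If a program runs $S$ times faster on the SMP than on the uni, the SMP delivers more performance per unit cost exactly when

```latex
\frac{S}{1.2\,C} > \frac{1}{C} \quad\Longleftrightarrow\quad S > 1.2
```

For example, a speedup of $S = 1.5$ gives $1.5 / 1.2 = 1.25$ times the uniprocessor's performance per unit cost.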