Title: Lecture 19: Threads
1. Lecture 19: Threads
- Prof. Kenneth M. Mackenzie
- Computer Systems and Networks
- CS2200, Spring 2003
Includes slides from Bill Leahy
2. Review: Mutex
- Mutex -- the mutual exclusion problem
- only one process/thread in a critical section at a time
- motivated by multiprocessors; applies to uniprocessors too
- Five low-level solutions
- enable/disable interrupts (works in-kernel only)
- lock-free data structures (restrictive!)
- software solutions
- e.g. Peterson's algorithm (O(n) time, space)
- special atomic operations in HW <--- std. solution
- test & set, swap (see the sketch after this list)
- speculate & retry! possible future soln?
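
To make the hardware option concrete, here is a minimal sketch (my own illustration, not from the slides) of a test-and-set style spinlock built on C11 atomics; the names spin_lock/spin_unlock are hypothetical:

  #include <stdatomic.h>

  static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear == unlocked */

  static void spin_lock(atomic_flag *l) {
      /* test-and-set: atomically set the flag and return its old value;
         keep retrying until the old value was clear (i.e. we got the lock) */
      while (atomic_flag_test_and_set(l))
          ;   /* spin */
  }

  static void spin_unlock(atomic_flag *l) {
      atomic_flag_clear(l);                     /* release the lock */
  }
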
3. Today: Threads
- 1. What are threads & why do you want 'em?
- 2. Styles of thread programming
- intro to the POSIX threads (pthreads) API
- 3. Synchronization (again)
- from the point of view of the programmer, not of the hardware
- primitives that block instead of spin-wait
- 4. Implementation
4. Terminology
- Process = full-blown virtual machine
- register set + stack
- protected area of memory
- Thread = multiplexed CPU only
- register set + stack
- A process may contain multiple threads. If so, all threads see the same address space.
5. Recall
- Process
- Program Counter
- Registers
- Stack
- Code (Text)
- Data
- Page Table
- etc.
- Processes must be protected from one another.
6. Recall
- Context Switching
- Requires considerable work
- What about a single user's application?
- Is there a way to make it more efficient?
- In effect, allow the user to have multiple processes executing in the same space?
- Yes, the solution: Threads, or Multithreading
7. What is Multithreading?
- Technique allowing a program to do multiple tasks
- Example: Java GUIs
- Is it a new technique?
- No -- it has existed since the '70s (Concurrent Pascal, Ada tasks, etc.)
8. SMP?
- What is an SMP?
- Multiple CPUs in a single box sharing all the resources such as memory and I/O
- Is a dual-processor SMP more cost effective than two uniprocessor boxes?
- Yes (roughly 20% more for a dual-processor SMP compared to a uniprocessor).
- A modest speedup for a program on a dual-processor SMP over a uniprocessor will make it worthwhile.
- Example: DELL WORKSTATION 650
- 2.4GHz Intel Xeon (Pentium 4)
- 1GB SDRAM memory, 80GB disk, 20/48X CD, 19" monitor, Quadro4 900XGL graphics card, RedHat Linux, 3 yrs service -- $2,584; for a 2nd processor, add $434
9. What is a Thread?
- Basic unit of CPU utilization
- Consists of
- Program Counter
- Register Set
- Stack Space
- Shares with peer threads
- Code
- Data
- OS Resources
- Open files
- Signals
10. Process vs. Thread
(Figure: two processes P1 and P2 in user space, each with its own PCB in the kernel; kernel code and data below.)
- Two single-threaded applications on one machine
11. Process vs. Thread
(Figure: P1 now contains several threads, P2 a single thread; each process still has its own PCB in the kernel, with kernel code and data below.)
- P1 is multithreaded; P2 is single-threaded
- Computational state (PC, regs, ...) for each thread
- How different from process state?
12. Threads
- Can be context switched more easily than processes
- just registers and PC
- not memory management
- Can run on different processors concurrently in an SMP
- Share the CPU in a uniprocessor
- May (will) require concurrency-control programming, e.g. mutex locks (a minimal sketch follows)
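
As a concrete illustration of that last point, here is a minimal sketch (names such as counter and count_lock are my own) of two threads updating shared data under a pthread mutex:

  #include <pthread.h>
  #include <stdio.h>

  static long counter = 0;                              /* shared data */
  static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg) {
      for (int i = 0; i < 1000000; i++) {
          pthread_mutex_lock(&count_lock);              /* enter critical section */
          counter++;                                    /* safe: one thread at a time */
          pthread_mutex_unlock(&count_lock);
      }
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld\n", counter);               /* 2000000 with the lock;
                                                           unpredictable without it */
      return 0;
  }
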
13. Why use threads?
- Multiprocessor
- convenient way to use the multiple processors
- all memory is shared
- Uniprocessor?
14. Threads in a Uniprocessor?
(Figure: a process's timeline alternating between active computation and waiting on I/O.)
- Allows concurrency between I/O and user processing even in a uniprocessor box
15. Example
- Omni 3000 Reading System
- scanner
- OCR software
- text-to-speech software
- on-screen viewer with bouncing-ball highlighting
- Reading process:
- 1. scan (about 20 seconds)
- 2. OCR (about 30 seconds)
- 3. speak and display simultaneously (10s to 100s of seconds)
16. Example
(Figure: four-thread pipeline -- scan, OCR, speech, GUI; while page N is being spoken/displayed, OCR runs on page N+1 and page N+2 is being scanned.)
17. Example
- Natural implementation with four threads
- Easy to write: each thread does one thing
- Alternatives? Could interleave the four operations
- except that's really hard to do right
- OCR was an opaque component
18. Thread Programming
- Three common models
- one per processor model
- workpile or pool of threads model
- pipeline model
19. One per Processor
(Figure: a main thread spawns worker threads, one per physical processor, and synchronizes with them.)
- Common strategy in multiprocessing
20. Workpile Model
- Central pile of work to do
- queue or other data structure
- N threads (> # of processors)
- read a unit of work from the pile
- do the work, possibly generating more work
- add new work to the pile
- (a minimal sketch of this model follows)
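
A minimal sketch of the workpile model (my own illustration, not from the slides): a mutex-protected pile that N worker threads pull from. Names such as work_t, add_work, and get_work are hypothetical, and the sketch leans on the mutex and condition-variable primitives covered later in this lecture.

  #include <pthread.h>
  #include <stdlib.h>

  typedef struct work { struct work *next; /* ... task data ... */ } work_t;

  static work_t *pile = NULL;                           /* the workpile */
  static pthread_mutex_t pile_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  pile_nonempty = PTHREAD_COND_INITIALIZER;

  void add_work(work_t *w) {
      pthread_mutex_lock(&pile_lock);
      w->next = pile;                                   /* push onto the pile */
      pile = w;
      pthread_mutex_unlock(&pile_lock);
      pthread_cond_signal(&pile_nonempty);              /* wake one waiting worker */
  }

  work_t *get_work(void) {
      pthread_mutex_lock(&pile_lock);
      while (pile == NULL)                              /* block until work arrives */
          pthread_cond_wait(&pile_nonempty, &pile_lock);
      work_t *w = pile;
      pile = w->next;
      pthread_mutex_unlock(&pile_lock);
      return w;
  }

  void *worker(void *arg) {
      for (;;) {
          work_t *w = get_work();
          /* do the work; may call add_work() to generate more work */
          free(w);
      }
      return NULL;
  }
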
21. Pipeline Model
- As in the reading-system application
- Good for tolerating I/O delays
- Also used in heterogeneous systems
- e.g. a system with specialized processors
22. Mailbox
(Figure: pipeline stages connected by mailboxes -- shared queues through which one thread passes work to the next.)
23. Programming Support for Threads
- Creation
- pthread_create(top-level procedure, args)
- Termination
- return to top-level procedure
- explicit kill
- Rendezvous
- creator can wait for children
- pthread_join(child_tid)
- Synchronization
- mutex
- condition variables
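
A minimal sketch of the creation/termination/rendezvous API above (names child and result are my own): a thread started on a top-level procedure, passed an argument, and joined by its creator.

  #include <pthread.h>
  #include <stdio.h>

  /* Top-level procedure: doubles its argument and returns the result. */
  static void *child(void *arg) {
      long n = (long)arg;
      return (void *)(n * 2);                        /* returning terminates the thread */
  }

  int main(void) {
      pthread_t tid;
      void *result;
      pthread_create(&tid, NULL, child, (void *)21); /* creation */
      pthread_join(tid, &result);                    /* rendezvous: wait for the child */
      printf("child returned %ld\n", (long)result);  /* prints 42 */
      return 0;
  }
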
24. Programming with Threads
- Synchronization
- for coordination of the threads
- Communication
- for inter-thread sharing of data
- threads can be on different processors
- How is sharing achieved in an SMP?
- Software: the OS keeps all threads of a process in the same address space
- Hardware: shared memory and coherent caches
25. Synchronization
26. Synchronization Primitives
- Mutual exclusion
- enforce locks among threads
- pthread_mutex_lock
- pthread_mutex_unlock
- Two issues
- 1. a mutex isn't very general
- 2. what do you do when you don't get the mutex?
- Plan
- 1. a more general construct: the semaphore, with examples
- 2. spinning vs. blocking
- 3. a very general construct: condition variables
27. Semaphore
- A semaphore is a non-negative integer
- sem_wait()
- wait if the integer is zero
- decrement the integer
- sem_signal()
- increment the integer
- (a minimal POSIX sketch follows)
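
In POSIX, the slide's sem_wait()/sem_signal() pair corresponds to sem_wait()/sem_post(). A minimal sketch of the unnamed-semaphore API (my own example, not from the slides):

  #include <semaphore.h>

  int main(void) {
      sem_t sem;
      sem_init(&sem, 0, 1);   /* initial count 1; the 0 means not shared across processes */

      sem_wait(&sem);         /* waits while the count is zero, then decrements it */
      /* ... use the guarded resource / critical section ... */
      sem_post(&sem);         /* increments the count, waking a waiter if any */

      sem_destroy(&sem);
      return 0;
  }

With the count initialized to 1, the semaphore behaves like a mutex (Case 2/3 below); larger initial counts give the counting behavior of Case 3/3.
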
28. Case 1/3
29. Case 2/3
- Mutex: a semaphore can work as a mutex
- a mutex is a degenerate (binary) semaphore
30. Case 3/3
- Counting, e.g. slots in a bounded buffer (sketch below)
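
A sketch of the counting case (names free_slots/used_slots and NSLOTS are my own): semaphores that count empty and filled slots in a bounded buffer, so producers block only when the buffer is full and consumers only when it is empty.

  #include <semaphore.h>

  #define NSLOTS 8

  sem_t free_slots;                     /* counts empty slots in the buffer */
  sem_t used_slots;                     /* counts filled slots */

  void buffer_init(void) {
      sem_init(&free_slots, 0, NSLOTS); /* all slots start empty */
      sem_init(&used_slots, 0, 0);      /* nothing to consume yet */
  }

  void produce_one(void) {
      sem_wait(&free_slots);            /* blocks when the buffer is full */
      /* ... add an item to the buffer (buffer itself still needs a mutex) ... */
      sem_post(&used_slots);            /* one more item available to consumers */
  }

  void consume_one(void) {
      sem_wait(&used_slots);            /* blocks when the buffer is empty */
      /* ... remove an item from the buffer (buffer itself still needs a mutex) ... */
      sem_post(&free_slots);            /* one more free slot for producers */
  }
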
31. How to wait?
- 1. spin!
- (+) easy to implement
- (-) locks out the thread you're waiting for?
32. How to wait?
- 1. spin!
- 2. switch-spin
- (+) easy to implement
- (-) doesn't scale
- (+) ideal for two threads, though
33. How to wait?
- 1. spin!
- 2. switch-spin
- 3. sleep-spin
- (+) easy to implement
- (-) wasteful
34. How to wait?
- 1. spin!
- 2. switch-spin
- 3. sleep-spin
- 4. block
- (-) hard to implement
- (-) expensive
- (+) absolutely the right thing if you're waiting a long time
35. How to wait?
- 1. spin!
- 2. switch-spin
- 3. sleep-spin
- 4. block
- 5. hybrid strategy
- e.g. spin for 10 us, then block
- competitive: no worse than 2x the cost of blocking
- (a simple pthreads approximation follows)
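
One simple way to approximate the hybrid strategy with pthreads (an illustration, not the slide's algorithm): poll with pthread_mutex_trylock() for a bounded number of iterations, then fall back to the blocking pthread_mutex_lock(). The constant SPIN_TRIES is a stand-in for "spin for about 10 us".

  #include <pthread.h>

  #define SPIN_TRIES 1000          /* roughly bounds the spin phase; tune empirically */

  /* Acquire the mutex, spinning briefly before blocking. */
  void hybrid_lock(pthread_mutex_t *m) {
      for (int i = 0; i < SPIN_TRIES; i++) {
          if (pthread_mutex_trylock(m) == 0)
              return;              /* got it while spinning */
      }
      pthread_mutex_lock(m);       /* give up spinning and block */
  }
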
36. Example
Initially the mutex is unlocked and resource_state is FREE.
- lock(mutex)
- while (resource_state == BUSY)
-     ; // spin
- resource_state = BUSY
- unlock(mutex)
- use resource
- lock(mutex)
- resource_state = FREE
- unlock(mutex)
37-39. Example (continued)
(Slides 37-39 repeat the same code, stepping Thread 1 and then Thread 2 through it.)
40. Example with cond-var
- lock(mutex)
- while (resource_state == BUSY)
-     wait(cond_var)  /* implicitly gives up the mutex while waiting,
                         implicitly re-acquires it before returning */
- resource_state = BUSY
- unlock(mutex)
- /* use resource */
- lock(mutex)
- resource_state = FREE
- unlock(mutex)
- signal(cond_var)
41-42. Example with cond-var (continued)
(Slides 41-42 repeat the same code, stepping threads T1 and T2 through it. A full pthreads version is sketched below.)
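
Putting the pseudocode above into real pthreads calls, a minimal sketch (resource_state is taken from the slides; acquire_resource/release_resource and the FREE/BUSY constants are my own names):

  #include <pthread.h>

  enum { FREE, BUSY };

  static int resource_state = FREE;
  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond_var = PTHREAD_COND_INITIALIZER;

  void acquire_resource(void) {
      pthread_mutex_lock(&mutex);
      while (resource_state == BUSY)               /* re-check after every wakeup */
          pthread_cond_wait(&cond_var, &mutex);    /* gives up the mutex while asleep,
                                                      re-acquires it before returning */
      resource_state = BUSY;
      pthread_mutex_unlock(&mutex);
  }

  void release_resource(void) {
      pthread_mutex_lock(&mutex);
      resource_state = FREE;
      pthread_mutex_unlock(&mutex);
      pthread_cond_signal(&cond_var);              /* wake one waiter, if any */
  }
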
43. pthreads
- Mutex
- must create mutex variables
- pthread_mutex_t padlock;
- must initialize mutex variables
- pthread_mutex_init(&padlock, NULL);
- Condition variable (used for signaling)
- must create condition variables
- pthread_cond_t non_full;
- must initialize condition variables
- pthread_cond_init(&non_full, NULL);
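
As an aside (not on the slide): statically allocated mutexes and condition variables can instead be initialized with the standard macros, which is often convenient:

  #include <pthread.h>

  pthread_mutex_t padlock  = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t  non_full = PTHREAD_COND_INITIALIZER;
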
44. Classic CS Problem: Producer/Consumer
- Producer
- if (!full)
- add item to buffer
- empty = FALSE
- if (buffer_is_full)
- full = TRUE
- Consumer
- if (!empty)
- remove item from buffer
- full = FALSE
- if (buffer_is_empty)
- empty = TRUE
(Figure: shared bounded buffer between the producer and the consumer.)
45. Example: Producer Thread
- while (forever)
-     // produce item
-     pthread_mutex_lock(&padlock);
-     while (full)
-         pthread_cond_wait(&non_full, &padlock);
-     // add item to buffer
-     if (buffercount == BUFFERSIZE)
-         full = TRUE;
-     empty = FALSE;
-     pthread_mutex_unlock(&padlock);
-     pthread_cond_signal(&non_empty);
46. Example: Consumer Thread
- while (forever)
-     pthread_mutex_lock(&padlock);
-     while (empty)
-         pthread_cond_wait(&non_empty, &padlock);
-     // remove item from buffer
-     full = FALSE;
-     if (buffercount == 0)
-         empty = TRUE;
-     pthread_mutex_unlock(&padlock);
-     pthread_cond_signal(&non_full);
-     // consume item
47. Producer and Consumer Side by Side
(Same producer and consumer code as the previous two slides, shown together; a complete runnable version is sketched below.)
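
For reference, a minimal complete program assembling the fragments above. The buffer array, the in/out indices, and the bounded loop counts are my own fill-ins; the slides do not show them (they use while(forever)).

  #include <pthread.h>
  #include <stdio.h>

  #define BUFFERSIZE 8
  #define TRUE  1
  #define FALSE 0

  static int buffer[BUFFERSIZE];
  static int buffercount = 0;                 /* number of items in the buffer */
  static int in = 0, out = 0;                 /* insert/remove positions */
  static int full = FALSE, empty = TRUE;

  static pthread_mutex_t padlock   = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  non_full  = PTHREAD_COND_INITIALIZER;
  static pthread_cond_t  non_empty = PTHREAD_COND_INITIALIZER;

  static void *producer(void *arg) {
      for (int item = 0; item < 100; item++) {        /* "produce item" */
          pthread_mutex_lock(&padlock);
          while (full)
              pthread_cond_wait(&non_full, &padlock);
          buffer[in] = item;                          /* add item to buffer */
          in = (in + 1) % BUFFERSIZE;
          buffercount++;
          if (buffercount == BUFFERSIZE) full = TRUE;
          empty = FALSE;
          pthread_mutex_unlock(&padlock);
          pthread_cond_signal(&non_empty);
      }
      return NULL;
  }

  static void *consumer(void *arg) {
      for (int n = 0; n < 100; n++) {
          pthread_mutex_lock(&padlock);
          while (empty)
              pthread_cond_wait(&non_empty, &padlock);
          int item = buffer[out];                     /* remove item from buffer */
          out = (out + 1) % BUFFERSIZE;
          buffercount--;
          full = FALSE;
          if (buffercount == 0) empty = TRUE;
          pthread_mutex_unlock(&padlock);
          pthread_cond_signal(&non_full);
          printf("consumed %d\n", item);              /* "consume item" */
      }
      return NULL;
  }

  int main(void) {
      pthread_t p, c;
      pthread_create(&p, NULL, producer, NULL);
      pthread_create(&c, NULL, consumer, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      return 0;
  }
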
48. Thread Implementation
49. Threads Implementation
- User-level threads
- OS independent
- scheduler is part of the runtime system
- thread switch is cheap (save PC, SP, regs)
- scheduling is customizable, i.e., more application control
- a blocking call by one thread blocks the whole process
50. User-Level Threads (continued)
- Solution to the blocking problem in user-level threads
- non-blocking versions of all system calls
- Switching among user-level threads
- yield voluntarily
- How to make it preemptive?
- timer interrupt from the kernel to trigger a switch
51. Kernel Level
- Expensive thread switch
- Makes sense for blocking calls by threads
- Kernel becomes complicated: process vs. thread scheduling
- Thread packages become non-portable
- Problems common to user- and kernel-level threads
- libraries
- solution: thread-safe wrappers for such library calls
52. Solaris Threads
- Three kinds
- user, lwp, kernel
- User: any number can be created and attached to lwps
- One-to-one mapping between lwps and kernel threads
- Kernel threads are known to the OS scheduler
- If a kernel thread blocks, the associated lwp and user-level threads block as well
53. Solaris Terminology
54. More Conventional Terminology
(Figure: processes P1, P2, P3 as seen at user level, each containing threads that map onto kernel threads inside the kernel.)
55. Kernel Threads vs. User Threads
- Advantages of kernel threads
- can be scheduled on multiple CPUs
- can be preempted
- kernel scheduler knows their relative priorities
- Advantages of user threads
- (unknown to the kernel)
- extremely lightweight: no system call needed to switch threads
56. Things to know?
- 1. The reason threads are around?
- 2. Benefits of increased concurrency?
- 3. Why do we need software-controlled "locks" (mutexes) on shared data?
- 4. How can we avoid potential deadlocks/race conditions?
- 5. What is meant by producer/consumer thread synchronization/communication using pthreads?
- 6. Why use a "while" loop around a pthread_cond_wait() call?
57. Things to know?
- 7. Why should we minimize lock scope (minimize the extent of code within a lock/unlock block)?
- 8. Do you have any control over thread scheduling?