Title: Pthreads: A shared memory programming model
1. Pthreads: A shared memory programming model
- POSIX standard shared memory multithreading interface.
- Not just for parallel programming, but for general multithreaded programming.
- Provides primitives for thread management and synchronization.
- Threads are commonly associated with shared memory architectures and operating systems.
- Necessary for unleashing the computing power of SMT and CMP processors.
- Making multithreaded programming easy and efficient is very important at this time.
2. Pthreads execution model
- A single process can have multiple, concurrent execution paths.
- a.out creates a number of threads that can be scheduled and run concurrently.
- Each thread has local data, but also shares the entire resources (global data) of a.out.
- Any thread can execute any subroutine at the same time as other threads.
- Threads communicate through global memory.
3. Fork-join model for executing threads in an application
[Figure: master thread, fork, parallel region, join]
4. What does the developer have to do?
- Decide how to decompose the computation into parallel parts.
- Create and destroy threads to support the decomposition.
- Add synchronization to make sure dependences are covered.
5. Creation
- Thread equivalent of fork()
    int pthread_create(
        pthread_t *thread,
        const pthread_attr_t *attr,
        void *(*start_routine)(void *),
        void *arg
    );
- Returns 0 if OK, and non-zero (> 0) if error.
- start_routine is the function the thread will execute; arg is passed to it as its only argument.
6. Termination
- Thread termination
  - Return from the initial function.
  - void pthread_exit(void *status)
- Process termination
  - exit() called by any thread
  - main() returns
7. Waiting for child thread
- int pthread_join(pthread_t tid, void **status)
- Equivalent of waitpid() for processes
8. Detaching a thread
- The detached thread can act as a daemon thread.
- The parent thread doesn't need to wait: the storage for tid is reclaimed when the thread is done.
- Mainly to save space.
- int pthread_detach(pthread_t tid)
- Detaching self
  - pthread_detach(pthread_self())
9. Example of thread creation
10. General pthread structure
- A thread is a concurrent execution of a function.
- The threaded version of the program must be restructured such that the parallel part forms a separate function.
- See example1.c, example2.c
11. Matrix Multiply
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
12. Parallel Matrix Multiply
- All i- or j-iterations can be run in parallel.
- If we have p processors, assign n/p rows to each processor.
- Corresponds to partitioning the i-loop.
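13. Matrix Multiply parallel part
    void *mmult(void *s)
    {
        int slice = *(int *) s;          /* this thread's ID */
        int from = (slice * n) / p;      /* first row of this slice */
        int to = ((slice + 1) * n) / p;  /* one past the last row */
        int i, j, k;
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++) {
                c[i][j] = 0;
                for (k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
            }
        return NULL;
    }
- In the parallel version, we will need to know:
  - Number of threads (p)
  - My ID: mmult has a parameter for myid.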
14. Matrix Multiply Main (See mm_pthread.c)
    int main()
    {
        pthread_t thrd[p];
        int para[p];
        int i;
        for (i = 0; i < p; i++) {
            para[i] = i;  /* why do we need this? */
            pthread_create(&thrd[i], NULL, mmult, (void *)&para[i]);
        }
        for (i = 0; i < p; i++)
            pthread_join(thrd[i], NULL);
        return 0;
    }
15. General Program Structure
- Encapsulate parallel parts in functions.
- Use function arguments to parametrize what a particular thread does.
- Call pthread_create() with the function and arguments; save the thread identifier returned.
- Call pthread_join() with that thread identifier.
16. Pthreads synchronization
- Create/exit/join
  - Provides coarse-grain synchronization
  - Requires thread creation/destruction
- Need for finer-grain synchronization
  - Mutex locks, condition variables, semaphores
17. Pthread synchronization
- Mutex
- Condition variables
18. Thread synchronization
- Most threaded programs have threads that interact with one another.
- Interaction takes the form of sharing access to variables.
  - Multiple concurrent reads (ok)
  - Multiple concurrent writes (not ok, outcome non-deterministic)
  - One write, multiple reads (not ok, outcome non-deterministic)
- We need to make sure that the outcome is deterministic.
- Synchronization allows concurrent accesses to variables while removing non-deterministic outcomes by enforcing the order of thread execution.
19. Thread synchronization
- Typical types of synchronization
  - Mutual exclusion (mutex in pthread)
Without mutual exclusion:
    Thread 1: insert A to tree
    Thread 2: insert B to tree
With mutual exclusion:
    Thread 1: lock(tree); insert A to tree; unlock(tree)
    Thread 2: lock(tree); insert B to tree; unlock(tree)
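20. Thread synchronization
- Signal (ordering the execution of threads, condition variable)
Without signaling, the three loops run independently:
    Thread 1: for (i = 0; i < 25; i++) a(i+1) = a(i) + 1
    Thread 2: for (i = 25; i < 50; i++) a(i+1) = a(i) + 1
    Thread 3: for (i = 50; i < 75; i++) a(i+1) = a(i) + 1
With signaling, each thread waits until its first input is ready:
    Thread 1: for (i = 0; i < 25; i++) a(i+1) = a(i) + 1
              signal a(25) ready
    Thread 2: wait for a(25) ready
              for (i = 25; i < 50; i++) a(i+1) = a(i) + 1
              signal a(50) ready
    Thread 3: wait for a(50) ready
              for (i = 50; i < 75; i++) a(i+1) = a(i) + 1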
21. Mutex lock for mutual exclusion
    int counter = 0;
    void *thread_func(void *arg)
    {
        int val;

        /* unprotected code: why? */
        val = counter;
        counter = val + 1;
        return NULL;
    }
22. Mutex locks: lock
- int pthread_mutex_lock(pthread_mutex_t *mutex)
- Tries to acquire the lock specified by mutex.
- If mutex is already locked, then the calling thread blocks until mutex is unlocked.
23. Mutex locks: unlock
- int pthread_mutex_unlock(pthread_mutex_t *mutex)
- If the calling thread has mutex currently locked, this will unlock the mutex.
- If other threads are blocked waiting on this mutex, one will unblock and acquire mutex.
- Which one is determined by the scheduler.
24. Mutex example (See example3.c, example4.c)
    int counter = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    void *thread_func(void *arg)
    {
        int val;

        /* protected by mutex */
        pthread_mutex_lock(&mutex);
        val = counter;
        counter = val + 1;
        pthread_mutex_unlock(&mutex);
        return NULL;
    }
25. Condition Variable for signaling
- Think of the producer-consumer problem.
- Producers and consumers run in separate threads.
- Producer produces data and consumer consumes data.
- Producer has to inform the consumer when data is available.
- Consumer has to inform the producer when buffer space is available.
26. Condition variables: wait
- int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
- Blocks the calling thread, waiting on cond.
- Unlocks the mutex (which the caller must hold) while the thread is blocked.
- Re-acquires the mutex when unblocked.
27. Condition variables: signal
- int pthread_cond_signal(pthread_cond_t *cond)
- Unblocks one thread waiting on cond.
- The scheduler determines which thread to unblock.
- If no thread is waiting, then the signal is a no-op.
28. Producer consumer program without condition variables
29. Globals and producer
    /* Globals */
    int data_avail = 0;
    pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

    void *producer(void *arg)
    {
        pthread_mutex_lock(&data_mutex);
        /* Produce data */
        /* Insert data into queue */
        data_avail = 1;
        pthread_mutex_unlock(&data_mutex);
        return NULL;
    }
30. Consumer
    void *consumer(void *arg)
    {
        while (!data_avail)
            ;  /* do nothing: keep looping!! (busy waiting) */

        pthread_mutex_lock(&data_mutex);
        /* Extract data from queue */
        if (queue is empty)   /* pseudocode */
            data_avail = 0;
        pthread_mutex_unlock(&data_mutex);
        consume_data();
        return NULL;
    }
31. Producer consumer with condition variables
32. Globals and producer
    int data_avail = 0;
    pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

    void *producer(void *arg)
    {
        pthread_mutex_lock(&data_mutex);
        /* Produce data */
        /* Insert data into queue */
        data_avail = 1;
        pthread_cond_signal(&data_cond);
        pthread_mutex_unlock(&data_mutex);
        return NULL;
    }
33. Consumer
    void *consumer(void *arg)
    {
        pthread_mutex_lock(&data_mutex);
        while (!data_avail) {
            /* sleep on condition variable */
            pthread_cond_wait(&data_cond, &data_mutex);
        }
        /* woken up */
        /* Extract data from queue */
        if (queue is empty)   /* pseudocode */
            data_avail = 0;
        pthread_mutex_unlock(&data_mutex);
        consume_data();
        return NULL;
    }
34. A note on condition variables
- A signal is forgotten if no thread is already waiting on the condition variable when the signal is sent.
- If you want the signal to be remembered, use semaphores.
35. Semaphores
- Counters for resources shared between threads.
- int sem_wait(sem_t *sem)
  - Blocks until the semaphore value is non-zero.
  - Decrements the semaphore value on return.
- int sem_post(sem_t *sem)
  - If threads are blocked on the semaphore, unblocks one waiting thread.
  - Increments the semaphore value otherwise.
36. Pipelined task parallelism with semaphore
    P1: for (i = 0; i < num_pics && read(in_pic); i++) {
            int_pic_1[i] = trans1(in_pic);
            sem_post(&event_1_2[i]);
        }

    P2: for (i = 0; i < num_pics; i++) {
            sem_wait(&event_1_2[i]);
            int_pic_2[i] = trans2(int_pic_1[i]);
            sem_post(&event_2_3[i]);
        }