Title: Shared Memory Parallel Programming
1 Shared Memory Parallel Programming
2 Parallel Programming Models
- Recall we distinguished several kinds of parallel programming models
- Low-level parallel programming models
- Pthreads for shared memory, MPI for distributed memory
- High-level parallel programming models
- OpenMP for shared memory, HPF for distributed memory
3 Parallel Programming Models
- High-level parallel programming
- programmer describes parallelism implicitly
- details of data and computation to be performed on each processor are determined by the compiler
- different approaches for shared memory and distributed memory models
4 Parallel Programming Models
- Low-level parallel programming
- programmer must describe parallelism explicitly
- data and computation to be performed on each processor are specified exactly by the coder
- different approaches for shared memory and distributed memory models
5 Parallel Programming Models
- Options for shared memory parallel programming
- OpenMP
- Vendor-specific directives
- Mostly superseded by OpenMP
- Unix utilities
- Fork, shmget
- Explicit threading
- E.g. Pthreads, Solaris UI threads
- Distributed memory API on shared memory
6 Reminder: Threads
- A process has its own address space
- A process may be executed by a team of threads
- A thread shares its address space with the other threads in the same team
- But each thread's stack provides space for data local (private) to that thread
- Threads are used for shared memory parallel programming and multitasking
7 Pthreads
- IEEE POSIX standard developed to replace different vendor-specific multithreading libraries
- Implementation called Pthreads or POSIX threads
- Available on almost all Unix systems
- Offers a low-level programming interface
- No specific compiler support
- Library is libpthread on most systems
8 User-Level Multithreading
- Explicit shared memory parallel programming
- User fully specifies the actions of each thread and synchronizes (coordinates) the threads
- The good part: this control can be used to improve performance, e.g. by explicitly considering data locality
- The bad part: it can be complex, and the user has to deal with that complexity
9 Applicability of Pthreads
- POSIX standard shared-memory multithreading interface
- For general multithreaded programming
- With C, not Fortran
- Provides primitives for thread management and synchronization
- But missing many conveniences
- Note: the standard makes assumptions about memory consistency
10 Rough Comparison with OpenMP
- Typically more work to create a program
- Not incremental
- Can dynamically spawn (create) threads
- Better support for exception and signal handling
- Supports unstructured control flow
- Missing many conveniences such as atomic, barrier, and flush primitives and reductions
- Suited to different kinds of applications
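To make the "missing reductions" point concrete, here is a minimal sketch of a sum reduction written by hand with Pthreads; in OpenMP the same effect is available through a reduction clause on a parallel loop. The names sum_task, sum_slice, parallel_sum, and NTHREADS are hypothetical and chosen only for this illustration. It relies on pthread_create and pthread_join, which are introduced on later slides.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4   /* thread count, assumed for this sketch */

/* Hypothetical per-thread work descriptor (not from the slides). */
struct sum_task {
    const double *data;   /* shared input array */
    int from, to;         /* half-open index range [from, to) */
    double partial;       /* partial sum written by the owning thread */
};

static void *sum_slice(void *arg)
{
    struct sum_task *t = arg;
    double s = 0.0;
    for (int i = t->from; i < t->to; i++)
        s += t->data[i];
    t->partial = s;       /* private slot, so no lock is needed */
    return NULL;
}

/* Sums n values using NTHREADS threads and combines the partial sums. */
double parallel_sum(const double *data, int n)
{
    pthread_t thr[NTHREADS];
    struct sum_task task[NTHREADS];
    int started[NTHREADS];
    double total = 0.0;

    assert(n >= 0);
    for (int s = 0; s < NTHREADS; s++) {
        task[s].data = data;
        task[s].from = (s * n) / NTHREADS;
        task[s].to = ((s + 1) * n) / NTHREADS;
        task[s].partial = 0.0;
        started[s] = pthread_create(&thr[s], NULL, sum_slice, &task[s]) == 0;
        if (!started[s])
            sum_slice(&task[s]);   /* fall back to the calling thread */
    }
    for (int s = 0; s < NTHREADS; s++) {
        if (started[s])
            pthread_join(thr[s], NULL);
        total += task[s].partial;  /* the reduction step, done serially */
    }
    return total;
}
```

Each thread writes only its own partial field, so the final combination after the joins needs no locking.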
11 Uses of Pthreads
- Client-server applications
- GUI handlers
- Signal handlers
- Pipelined applications
- Can be used wherever OpenMP can be used, but possibly with much more effort
- (Much) higher maintenance costs
- Many more changes to code
12 Pthreads
- We only look briefly at a subset of Pthreads
- For complete information, many good references exist
- Multithreaded Programming with Pthreads, Bill Lewis, Daniel J. Berg
- Pthreads Programming, Bradford Nichols, Dick Buttlar, Jacqueline Proulx Farrell
- www.mit.edu/people/proven/pthreads.html
13 What does the user have to do?
- Decide how to decompose the computation into parallel parts
- Create (and destroy) processes to implement that decomposition
- Add synchronization to make sure dependences are respected
14 General Thread Structure
- Typically, a thread is a concurrent execution of a function or a procedure
- A program needs to be restructured so that parallel parts form separate procedures or functions
15 Thread Creation
    int pthread_create(pthread_t *new_id,
                       const pthread_attr_t *attr,
                       void *(*func)(void *),
                       void *arg);
- new_id: the thread's unique identifier
- attr: thread attributes (ignore for now)
- func: function to be run in parallel
- arg: argument passed to func
16 Example of Thread Creation
    void *func(void *arg)
    {
        int *I = arg;
        ...
    }

    int main()
    {
        int X;
        pthread_t id;
        ...
        pthread_create(&id, NULL, func, &X);
        ...
    }
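The argument-passing pattern above can be sketched end to end as follows. This is a minimal illustration, not the slides' own code: the struct work record and the functions square and square_in_thread are hypothetical names. It uses pthread_join, covered on a later slide, to wait for the thread before reading the result.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical argument record; the slide passes a plain int pointer. */
struct work {
    int input;    /* read by the thread */
    int output;   /* written by the thread */
};

static void *square(void *arg)
{
    struct work *w = arg;          /* recover the typed pointer */
    assert(w != NULL);
    w->output = w->input * w->input;
    return NULL;
}

/* Runs square() in a new thread and returns its result, or -1 on failure. */
int square_in_thread(int x)
{
    struct work w = { x, 0 };
    pthread_t id;

    if (pthread_create(&id, NULL, square, &w) != 0)
        return -1;
    /* w lives on this stack frame, so wait before it goes out of scope. */
    if (pthread_join(id, NULL) != 0)
        return -1;
    return w.output;
}
```

Bundling inputs and outputs in one record is a common way around the single void pointer argument; the record must stay alive for as long as the thread may touch it.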
17 Example of Thread Creation (contd.)
[Diagram: main() calls pthread_create(func); func() then runs as a separate thread]
18 Example: Hello World
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 4

    void *PrintHello(void *threadid)
    {
        printf("\n%ld: Hello World!\n", (long) threadid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;   /* long, so the value survives the cast to and from void * */
        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *) t);
            if (rc) {
                printf("ERROR: return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);   /* lets the other threads finish before the process ends */
    }
19 OpenMP Hello World Example
    #include <omp.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        #pragma omp parallel
        printf("%d: Hello, world!\n", omp_get_thread_num());
        return 0;
    }
20 Compiling Pthreads Applications
- gcc hello.c -lpthread
- icc hello.c -lpthread
- g++ hello.c -lpthread
21 Pthread Termination
    void pthread_exit(void *status);
- Terminates the currently running thread
- Implicitly called when the function invoked in pthread_create returns
22 Cancelling a thread
- A thread can terminate another thread during execution
    int pthread_cancel(pthread_t thread);
    int pthread_setcancelstate(int state, int *oldstate);
    int pthread_setcanceltype(int type, int *oldtype);
23 Example: Cancelling a thread
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    void *Thread(void *string);
    /* e_str and f_str are message strings defined elsewhere */

    int main()
    {
        pthread_t e_th;
        pthread_t f_th;
        int rc;
        /* creates both threads */
        rc = pthread_create(&e_th, NULL, Thread, (void *) e_str);
        if (rc)
            return -1;
        rc = pthread_create(&f_th, NULL, Thread, (void *) f_str);
        if (rc)
            return -1;
        /* sleeps a while */
        sleep(10);
        /* requests cancellation */
        pthread_cancel(e_th);
        pthread_cancel(f_th);
        /* sleeps a bit more */
        sleep(10);
        pthread_exit(NULL);
    }

    void *Thread(void *string)
    {
        int i;
        int o_state;
        /* disables cancelability */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &o_state);
        /* writes five messages */
        for (i = 0; i < 5; i++)
            printf("%s\n", (char *) string);
        /* restores cancelability */
        pthread_setcancelstate(o_state, &o_state);
        /* writes further */
        while (1)
            printf("%s\n", (char *) string);
        pthread_exit(NULL);
    }
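A canceller often wants to confirm that the target thread really ended because of the request. The following is a minimal sketch, assuming the default deferred cancellation type; spin and cancel_and_join are hypothetical names. It relies on pthread_join (introduced on the following slides), which reports the special value PTHREAD_CANCELED for a cancelled thread.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Loops forever. sleep() is a cancellation point, so a pending cancel
 * request takes effect there under deferred cancellation (the default). */
static void *spin(void *arg)
{
    (void) arg;
    for (;;)
        sleep(1);
    return NULL;   /* not reached */
}

/* Returns 1 if the thread ended by cancellation, 0 if not, -1 on error. */
int cancel_and_join(void)
{
    pthread_t th;
    void *status = NULL;

    if (pthread_create(&th, NULL, spin, NULL) != 0)
        return -1;
    if (pthread_cancel(th) != 0)
        return -1;
    if (pthread_join(th, &status) != 0)
        return -1;
    return status == PTHREAD_CANCELED;
}
```

Joining after the cancel request replaces the fixed sleep used in the example above: the caller waits exactly as long as the target needs to reach a cancellation point.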
24 Thread Joining
    int pthread_join(pthread_t new_id,
                     void **status);
- Waits for the thread with identifier new_id to terminate, either by returning or by calling pthread_exit()
- status receives the return value or the value given as argument to pthread_exit()
25 Thread Joining Example
    void *func(void *arg) { ... }
    pthread_t id;
    int X;
    pthread_create(&id, NULL, func, &X);
    ...
    pthread_join(id, NULL);
    ...
26 Example of Thread Creation (contd.)
[Diagram: main() calls pthread_create(func) and later pthread_join(id); func() runs as a separate thread and ends with pthread_exit()]
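The status argument of pthread_join can carry a result back to the joining thread. Below is a minimal sketch of that use; compute and increment_in_thread are hypothetical names, and returning a heap-allocated value is one possible convention, not something the slides prescribe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Returns a heap-allocated result. A pointer to a local variable would
 * dangle once the thread's stack is gone. */
static void *compute(void *arg)
{
    int *in = arg;
    int *out = malloc(sizeof *out);
    if (out != NULL)
        *out = *in + 1;
    return out;              /* same effect as pthread_exit(out) */
}

/* Hypothetical helper: runs compute() in a thread and returns x + 1,
 * or -1 if thread creation, joining, or allocation fails. */
int increment_in_thread(int x)
{
    pthread_t id;
    void *status = NULL;
    int result = -1;

    if (pthread_create(&id, NULL, compute, &x) != 0)
        return -1;
    if (pthread_join(id, &status) != 0)
        return -1;
    if (status != NULL) {
        result = *(int *) status;
        free(status);        /* the joiner owns the returned memory */
    }
    return result;
}
```

Because -1 doubles as the error value, this sketch is only unambiguous for inputs other than -2; a real interface would report errors separately.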
27 Attributes
- pthread_attr_init(pthread_attr_t *attr)
- pthread_attr_destroy(pthread_attr_t *attr)
- pthread_attr_setXXX()
- pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate)
- pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize)
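The attribute calls listed above are typically used in an init, set, create, destroy sequence. The sketch below illustrates that sequence under the assumption that the requested stack size is acceptable to the system; noop and run_with_stack are hypothetical names, and pthread_attr_getstacksize is used only to read the setting back.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *noop(void *arg) { return arg; }

/* Creates a joinable thread with the requested stack size.
 * Returns the stack size recorded in the attribute object, or 0 on error. */
size_t run_with_stack(size_t stacksize)
{
    pthread_attr_t attr;
    pthread_t id;
    size_t actual = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) != 0 ||
        pthread_attr_setstacksize(&attr, stacksize) != 0 ||
        pthread_attr_getstacksize(&attr, &actual) != 0 ||
        pthread_create(&id, &attr, noop, NULL) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    pthread_attr_destroy(&attr);   /* not needed once the thread exists */
    pthread_join(id, NULL);
    return actual;
}
```

A call such as run_with_stack(1 << 20) requests a 1 MiB stack; sizes below the system minimum are rejected by pthread_attr_setstacksize.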
28 Matrix Multiply
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
29 Parallel Matrix Multiply
- All i- or j-iterations can be run in parallel
- If we have p processors (or cores), assign n/p rows to each processor
- Corresponds to partitioning the i-loop
- Create a thread for each part
- Threads all execute the same function, but on different data sets
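The "n/p rows per processor" rule needs care when p does not divide n evenly. One way to express it with integer arithmetic is sketched below; slice_bounds is a hypothetical helper name, and the half-open range convention is an assumption of this sketch.

```c
#include <assert.h>

/* Computes the half-open row range [*from, *to) owned by thread `slice`
 * when n rows are divided among p threads. */
void slice_bounds(int slice, int n, int p, int *from, int *to)
{
    assert(p > 0 && slice >= 0 && slice < p);
    *from = (slice * n) / p;
    *to = ((slice + 1) * n) / p;
}
```

For n = 10 and p = 4 this yields the ranges [0,2), [2,5), [5,7), and [7,10): every row is assigned exactly once and slice sizes differ by at most one.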
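30 Matrix Multiply: Parallel Part
    void *mmult(void *s)
    {
        int slice = (int) (long) s;   /* slice number passed by value */
        int from = (slice * n) / p;
        int to = ((slice + 1) * n) / p;
        int i, j, k;                  /* local, so each thread has its own */
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++) {
                c[i][j] = 0.0;
                for (k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
            }
        return NULL;
    }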
31 Matrix Multiply: Main
    int main()
    {
        pthread_t thrd[p];
        int i;
        for (i = 0; i < p; i++)
            pthread_create(&thrd[i], NULL, mmult, (void *) (long) i);
        for (i = 0; i < p; i++)
            pthread_join(thrd[i], NULL);
    }
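Putting the parallel part and the main routine together, a complete version might look like the following sketch. The fixed sizes N and P, the test data, and the function name run_mmult are assumptions made for illustration; the slides leave n, p, and the matrix contents unspecified.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N 8   /* matrix dimension, assumed for this sketch */
#define P 4   /* number of threads, assumed for this sketch */

double a[N][N], b[N][N], c[N][N];

static void *mmult(void *s)
{
    int slice = (int) (intptr_t) s;   /* slice number was passed by value */
    int from = (slice * N) / P;
    int to = ((slice + 1) * N) / P;

    /* Loop indices are local, so each thread has its own copies. */
    for (int i = from; i < to; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    return NULL;
}

/* Fills a with a[i][j] = i + j and b with the identity, then computes
 * c = a * b. Returns 0 on success, -1 if a thread could not be created. */
int run_mmult(void)
{
    pthread_t thrd[P];
    int created = 0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = (i == j) ? 1.0 : 0.0;
        }
    while (created < P &&
           pthread_create(&thrd[created], NULL, mmult,
                          (void *) (intptr_t) created) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(thrd[i], NULL);
    return created == P ? 0 : -1;
}
```

Threads write disjoint rows of c and only read a and b, so no synchronization is needed beyond the final joins.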
32 Sequential Jacobi Code
    for (some number of timesteps/iterations) {
        for (i = 0; i < n; i++)
            for (j = 1; j < n; j++)
                temp[i][j] = 0.25 *
                    (grid[i-1][j] + grid[i+1][j] +
                     grid[i][j-1] + grid[i][j+1]);
        for (i = 0; i < n; i++)
            for (j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
    }
33 Recall: Parallel Jacobi
- First (i,j) loop nest can be parallelized
- Second (i,j) loop nest can be parallelized
- Barrier after each loop nest
- Give n/p rows to each processor
- Again, each thread executes the same function on its own data (data parallel)
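34 Pthreads Jacobi: Parallel parts (1)
    void *jac_1(void *s)
    {
        int slice = (int) (long) s;
        int from = (slice * n) / p;
        int to = ((slice + 1) * n) / p;
        int i, j;   /* local, so each thread has its own */
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        return NULL;
    }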
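35 Pthreads Jacobi: Parallel parts (2)
    void *jac_2(void *s)
    {
        int slice = (int) (long) s;
        int from = (slice * n) / p;
        int to = ((slice + 1) * n) / p;
        int i, j;   /* local, so each thread has its own */
        for (i = from; i < to; i++)
            for (j = 0; j < n; j++)
                grid[i][j] = temp[i][j];
        return NULL;
    }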
36 Pthreads Jacobi: main
    for (some number of timesteps) {
        for (i = 0; i < p; i++)
            pthread_create(&thrd[i], NULL, jac_1, (void *) (long) i);
        for (i = 0; i < p; i++)
            pthread_join(thrd[i], NULL);
        for (i = 0; i < p; i++)
            pthread_create(&thrd[i], NULL, jac_2, (void *) (long) i);
        for (i = 0; i < p; i++)
            pthread_join(thrd[i], NULL);
    }
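The create/join pattern above, where joining all threads plays the role of the barrier after each loop nest, can be assembled into a complete sketch as follows. Several details are assumptions for illustration only: the sizes N and P, the extra boundary layer around the grid (so that neighbour accesses stay in bounds), and the helper names run_phase and jacobi.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N 8   /* interior points per dimension (assumed) */
#define P 4   /* number of threads (assumed) */

/* One extra layer on each side holds fixed boundary values, so
 * grid[i-1][j] etc. are valid for all interior points 1..N. */
double grid[N + 2][N + 2], temp[N + 2][N + 2];

static void *jac_1(void *s)
{
    int slice = (int) (intptr_t) s;
    int from = 1 + (slice * N) / P, to = 1 + ((slice + 1) * N) / P;
    for (int i = from; i < to; i++)
        for (int j = 1; j <= N; j++)
            temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                 grid[i][j-1] + grid[i][j+1]);
    return NULL;
}

static void *jac_2(void *s)
{
    int slice = (int) (intptr_t) s;
    int from = 1 + (slice * N) / P, to = 1 + ((slice + 1) * N) / P;
    for (int i = from; i < to; i++)
        for (int j = 1; j <= N; j++)
            grid[i][j] = temp[i][j];
    return NULL;
}

/* Runs one phase on P threads; joining all of them acts as the barrier. */
static int run_phase(void *(*phase)(void *))
{
    pthread_t thrd[P];
    int created = 0;
    while (created < P &&
           pthread_create(&thrd[created], NULL, phase,
                          (void *) (intptr_t) created) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(thrd[i], NULL);
    return created == P ? 0 : -1;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int jacobi(int timesteps)
{
    for (int t = 0; t < timesteps; t++)
        if (run_phase(jac_1) != 0 || run_phase(jac_2) != 0)
            return -1;
    return 0;
}
```

Creating and joining 2p threads per timestep is simple but costly; it is shown here because it mirrors the structure on the slide.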
37 Synchronizations
- mutexes: mutual exclusion locks. Block access to variables by other threads. This enforces exclusive access by a thread to a variable or set of variables.
- joins: make a thread wait until others are complete (terminated).
- condition variables: data type pthread_cond_t
38 Mutex Example
    /* Note: scope of variable and mutex are the same */
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
    int counter = 0;
    /* Function C */
    void functionC()
    {
        pthread_mutex_lock(&mutex1);
        counter++;
        pthread_mutex_unlock(&mutex1);
    }
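To see the mutex doing its job, several threads can increment the shared counter concurrently. The sketch below is an illustration under assumed values: NTHREADS, NINCR, worker, and count_with_mutex are hypothetical names not taken from the slides.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4        /* thread count, assumed for this sketch */
#define NINCR 100000      /* increments per thread, assumed */

static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < NINCR; i++) {
        pthread_mutex_lock(&mutex1);
        counter++;                    /* critical section */
        pthread_mutex_unlock(&mutex1);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long count_with_mutex(void)
{
    pthread_t thr[NTHREADS];
    int created = 0;

    counter = 0;
    while (created < NTHREADS &&
           pthread_create(&thr[created], NULL, worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(thr[i], NULL);
    return created == NTHREADS ? counter : -1;
}
```

With the lock in place the result is always NTHREADS * NINCR; without it, the unsynchronized counter++ is a data race and updates can be lost.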
39 Condition Variables
- A condition variable is a variable of type pthread_cond_t and is used with the appropriate functions for waiting and, later, for continuing execution.
- The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true.
40 Condition Variable Functions
- Creating/Destroying
- pthread_cond_init
- pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
- pthread_cond_destroy
- Waiting on condition
- pthread_cond_wait
- pthread_cond_timedwait: places a limit on how long it will block.
- Waking thread based on condition
- pthread_cond_signal
- pthread_cond_broadcast: wakes up all threads blocked by the specified condition variable.
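The wait and signal functions listed above are always used together with a mutex and a shared condition that the waiter re-checks. A minimal sketch of this pattern follows; the variables ready and value and the functions producer and wait_for_value are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;     /* the condition (predicate), protected by lock */
static int value = 0;     /* data published by the producer */

static void *producer(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    value = 42;
    ready = 1;
    pthread_cond_signal(&cond);    /* wake one waiter, if any */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts a producer, waits until it has published, and returns the value
 * (or -1 if the producer thread could not be created). */
int wait_for_value(void)
{
    pthread_t th;
    int result;

    ready = 0;                     /* no other thread exists yet */
    if (pthread_create(&th, NULL, producer, NULL) != 0)
        return -1;
    pthread_mutex_lock(&lock);
    while (!ready)                         /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock);   /* unlocks, sleeps, relocks */
    result = value;
    pthread_mutex_unlock(&lock);
    pthread_join(th, NULL);
    return result;
}
```

The while loop matters in two ways: if the producer signals before the waiter arrives, ready is already set and the wait is skipped; and if the waiter wakes without a signal, it simply goes back to sleep.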
41 Thread Pitfalls
- Race conditions
- Thread-safe code
- Deadlocks