Title: Shared Memory Programming
1. Shared Memory Programming
- Threads
  - basics, why, POSIX API (Pthreads)
- Thread synchronization primitives
  - mutexes, condition variables, read-write locks, barriers
- OpenMP
  - the OpenMP programming model
  - basic directives
  - data handling in OpenMP
2. Threads
- What is a thread?
  - a single stream of control in the flow of a program
- Logical memory model of a thread
  - shared variables in the global address space
  - local variables on the thread's own stack
- Why threads?
  - software portability
  - latency hiding
  - scheduling and load balancing
  - ease of programming, widespread use
3. Thread synchronization primitives I
Race conditions can easily occur when using threads.
Example:
    // each thread tries to update variable best_cost as follows
    if (my_cost < best_cost)
        best_cost = my_cost;
Assume that at the beginning best_cost = 100, thread 1 has my_cost = 75 and thread 2 has my_cost = 50. If both threads execute the comparison simultaneously, the outcome depends on which thread finishes first. It is possible that best_cost ends up as 75, i.e. the program is incorrect.
4. Thread synchronization primitives II
- What is needed: atomic operations
- Basic terms
  - mutual exclusion
  - critical section
- Mutual exclusion lock (mutex)
  - acquire the lock when entering the critical section
    - pthread_mutex_lock()
  - release the lock when leaving the critical section
    - pthread_mutex_unlock()
  - additional mutex functions
    - pthread_mutex_init()
    - pthread_mutex_trylock()
5. Mutex example
Example: find and print the first k matching database records. Each thread executes findEntries() with a different starting pointer, so that each handles about n/p records.
    void *findEntries(void *start_ptr)
    {
        db_rec_t *next_rec;
        int count;
        curr_ptr = start_ptr;
        do {
            next_rec = nextEntry(curr_ptr);
            count = outputRecord(next_rec);   // number of records output so far
        } while (count < k);
        return NULL;
    }

    int outputRecord(db_rec_t *rec_ptr)
    {
        int count;
        pthread_mutex_lock(&out_cnt_lock);    // protect the shared counter
        out_cnt++;
        count = out_cnt;
        pthread_mutex_unlock(&out_cnt_lock);
        if (count <= k)                       // print only the first k records
            printRecord(rec_ptr);
        return count;
    }
6. pthread_mutex_trylock() example
    int outputRecord(db_rec_t *rec_ptr)
    {
        int count;
        int lock_status;

        lock_status = pthread_mutex_trylock(&out_cnt_lock);
        if (lock_status == EBUSY) {
            // lock is held by another thread: keep the record locally, do not block
            insert_into_local_list(rec_ptr);
            return 0;
        } else {
            count = out_cnt;
            out_cnt += number_on_local_list + 1;
            pthread_mutex_unlock(&out_cnt_lock);
            // print this record together with the locally buffered ones
            printRecords(rec_ptr, local_list, k - count);
            return count + number_on_local_list + 1;
        }
    }
- pthread_mutex_trylock() is much faster than pthread_mutex_lock(), since it never blocks
- the number of locking operations is reduced
- the number of records searched increases a bit
7. Thread synchronization primitives III
- Condition variables
  - always associated with a mutex
  - a thread waits until the specified condition is satisfied and the mutex is granted (or the wait is interrupted by an OS signal)
  - allow efficient, non-polling synchronization
- Read-write locks
  - advantageous when there are many reads and few writes
  - several threads can read simultaneously
  - only one can write (and no readers can be present)
- Barrier
  - the usual meaning: no thread proceeds until all threads have reached the barrier
8. OpenMP
- The OpenMP programming model
  - high-level directives translated by the compiler into low-level, Pthreads-style calls
- Basic directives
  - parallelisation: thread creation and joining
    - parallel for, sections
  - synchronization directives: barrier, single, master, critical, atomic
- Data handling in OpenMP
  - different data classes: private, shared, firstprivate, lastprivate
- see www.openmp.org
9. OpenMP Examples I
OpenMP program:
    int a, b;
    int main()
    {
        // serial segment
        #pragma omp parallel num_threads(8) private(a) shared(b)
        {
            // parallel segment
        }
        // rest of serial segment
    }
Corresponding Pthreads code:
    int a, b;
    int main()
    {
        // serial segment
        for (i = 0; i < 8; i++)
            pthread_create(..., thread_fn_name, ...);
        for (i = 0; i < 8; i++)
            pthread_join(...);
    }

    void *thread_fn_name(void *arguments)
    {
        // parallel segment
    }
10. OpenMP Examples II
Simple OpenMP program for calculating π
    #pragma omp parallel default(private) shared(npoints) \
                         reduction(+: sum) num_threads(8)
    {
        num_threads = omp_get_num_threads();
        sample_points_pt = npoints / num_threads;   // sample points per thread
        sum = 0;
        for (i = 0; i < sample_points_pt; i++) {
            // random() is assumed to return a uniform value in [0, 1)
            rx = random() - 0.5;   // random number in the [-0.5, 0.5] range
            ry = random() - 0.5;
            if ((rx * rx) + (ry * ry) < 0.25)   // point lies inside the circle
                sum++;
        }
    }
11. OpenMP Examples III
Calculating π a different way
    #pragma omp parallel default(private) shared(npoints) \
                         reduction(+: sum) num_threads(8)
    {
        sum = 0;
        #pragma omp for
        for (i = 0; i < npoints; i++) {   // iterations are divided among the threads
            // random() is assumed to return a uniform value in [0, 1)
            rx = random() - 0.5;   // random number in the [-0.5, 0.5] range
            ry = random() - 0.5;
            if ((rx * rx) + (ry * ry) < 0.25)
                sum++;
        }
    }
12. Scheduling parallel for loops
- #pragma omp for schedule(scheduling class[, parameter])
- The parameter is typically the chunk size
- Scheduling classes
  - static
  - dynamic
  - guided: progressively reducing chunk size
  - runtime: chosen at runtime depending on an environment variable (OMP_SCHEDULE)
13. OpenMP Examples IV
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // producer thread
            task = produce_task();
            #pragma omp critical (task_queue)
            {
                insert_into_task_queue(task);
            }
        }
        #pragma omp section
        {
            // consumer thread
            #pragma omp critical (task_queue)
            {
                task = extract_from_queue();
            }
            consume_task(task);
        }
    }
14. Data handling in OpenMP
- Data classes
  - specified in the directives
  - private: each thread has its own private copy
  - firstprivate: private copy, initialized from the original variable
  - lastprivate: original variable is updated by the last section/iteration of the loop
  - shared: shared among all threads, use cautiously
  - threadprivate: persists in the thread across for loops and sections
- Good practice
  - use private as much as possible
  - divide and distribute data and combine them later using e.g. the reduction clause