Shared Memory Programming

Provided by: Pao3
1
Shared Memory Programming
  • Threads
      • basics, why, POSIX API (Pthreads)
  • Thread synchronization primitives
      • mutexes, condition variables, read-write locks,
        barriers
  • OpenMP
      • the OpenMP programming model
      • basic directives
      • data handling in OpenMP

2
Threads
  • What is a thread?
      • a single stream of control in the flow of a
        program
  • Logical memory model of a thread
      • shared variables in the global address space
      • local variables on the thread's own stack
  • Why threads?
      • software portability
      • latency hiding
      • scheduling and load balancing
      • ease of programming, widespread use

3
Thread synchronization primitives I
A race condition can easily occur when using threads.
Example:
    // each thread tries to update variable best_cost as follows
    if (my_cost < best_cost)
        best_cost = my_cost;
Assume that at the beginning best_cost = 100,
thread 1 has my_cost = 75, and thread 2 has my_cost = 50.
If both threads execute the comparison simultaneously,
the outcome depends on which thread finishes
first. It is possible that best_cost ends up as 75, i.e.
the program is incorrect.
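
The bad interleaving described above can be replayed step by step. The following is a minimal sketch that uses no real threads, so the outcome is deterministic; the function names are made up for illustration and are not part of the slides.

```c
#include <assert.h>
#include <stdbool.h>

/* Replays the interleaving in which both threads evaluate the
 * comparison before either of them writes best_cost. */
int replay_lost_update(int best_cost, int cost_t1, int cost_t2)
{
    /* Both threads compare against the old value of best_cost. */
    bool t1_passes = cost_t1 < best_cost;
    bool t2_passes = cost_t2 < best_cost;

    /* Thread 2 finishes first, then thread 1 overwrites its result. */
    if (t2_passes)
        best_cost = cost_t2;
    if (t1_passes)
        best_cost = cost_t1;

    return best_cost;
}

/* With the slide's values the better cost (50) is lost. */
void check_slide_values(void)
{
    assert(replay_lost_update(100, 75, 50) == 75);
}
```

The comparison and the assignment are two separate steps, so a second thread can slip in between them; that gap is what the synchronization primitives on the following slides close.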
4
Thread synchronization primitives II
  • What is needed: atomic operations
  • Basic terms
      • mutual exclusion
      • critical section
  • Mutual exclusion lock (mutex)
      • acquire the lock when entering the critical
        section: pthread_mutex_lock()
      • release the lock when leaving the critical
        section: pthread_mutex_unlock()
      • additional mutex functions:
        pthread_mutex_init(), pthread_mutex_trylock()
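
The lock/unlock pairing listed above could be applied to the best_cost update roughly as follows. This is a sketch under assumptions: NUM_THREADS, update_best() and find_best_cost() are hypothetical names, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4   /* hypothetical upper bound on the team size */

static int best_cost;
static pthread_mutex_t best_cost_lock;

/* Thread function: compare and update inside one critical section. */
static void *update_best(void *arg)
{
    int my_cost = *(int *)arg;

    pthread_mutex_lock(&best_cost_lock);     /* enter critical section */
    if (my_cost < best_cost)
        best_cost = my_cost;
    pthread_mutex_unlock(&best_cost_lock);   /* leave critical section */
    return NULL;
}

/* Starts one thread per entry of costs[] and returns the minimum found. */
int find_best_cost(int initial, const int *costs, int n)
{
    pthread_t tid[NUM_THREADS];
    int args[NUM_THREADS];
    int started = 0;

    assert(n >= 0 && n <= NUM_THREADS);
    best_cost = initial;
    pthread_mutex_init(&best_cost_lock, NULL);

    for (int i = 0; i < n; i++) {
        args[i] = costs[i];
        if (pthread_create(&tid[i], NULL, update_best, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&best_cost_lock);
    return best_cost;
}
```

Because the test and the assignment happen while the lock is held, calling find_best_cost(100, costs, 2) with costs {75, 50} returns 50 regardless of which thread runs first.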

5
Mutex example
Example: find and print the first k matching
database records. Each thread executes
findEntries() with a different starting pointer,
so that each handles about n/p records.

    void *findEntries(void *start_ptr) {
        db_rec_t *next_rec;
        int count;
        curr_ptr = start_ptr;
        do {
            next_rec = nextEntry(curr_ptr);
            count = outputRecord(next_rec);
        } while (count < k);
        return NULL;
    }

    int outputRecord(db_rec_t *rec_ptr) {
        int count;

        // the shared counter is only touched while holding the lock
        pthread_mutex_lock(&out_cnt_lock);
        out_cnt++;
        count = out_cnt;
        pthread_mutex_unlock(&out_cnt_lock);
        if (count <= k)
            printRecord(rec_ptr);
        return count;
    }

6
pthread_mutex_trylock() example
    int outputRecord(db_rec_t *rec_ptr) {
        int count;
        int lock_status;

        lock_status = pthread_mutex_trylock(&out_cnt_lock);
        if (lock_status == EBUSY) {
            // lock is taken: remember the record locally and keep searching
            insert_into_local_list(rec_ptr);
            return 0;
        } else {
            count = out_cnt;
            out_cnt += number_on_local_list + 1;
            pthread_mutex_unlock(&out_cnt_lock);
            printRecords(rec_ptr, local_list, k - count);
            return count + number_on_local_list + 1;
        }
    }
  • pthread_mutex_trylock() is typically much faster than
    pthread_mutex_lock(), since it never blocks waiting for the lock
  • the number of locking operations is reduced
  • the number of records searched increases a bit
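
The batching idea behind the trylock version can be tried in isolation. The sketch below is an assumption-laden stand-in, not the slide's database code: every item is treated as a match, and NUM_WORKERS, ITEMS_PER_WORKER, worker() and count_with_trylock() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NUM_WORKERS 4            /* hypothetical */
#define ITEMS_PER_WORKER 10000   /* hypothetical */

static long shared_count;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long pending = 0;   /* matches found but not yet published */
    (void)arg;

    for (int i = 0; i < ITEMS_PER_WORKER; i++) {
        pending++;      /* pretend every item matches */

        int status = pthread_mutex_trylock(&count_lock);
        if (status == EBUSY)
            continue;   /* lock is taken: keep the batch local, move on */
        if (status == 0) {
            shared_count += pending;   /* publish the whole batch at once */
            pending = 0;
            pthread_mutex_unlock(&count_lock);
        }
    }

    /* Final flush: block here so that no batch is lost. */
    pthread_mutex_lock(&count_lock);
    shared_count += pending;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Runs the workers and returns the total number of published matches. */
long count_with_trylock(void)
{
    pthread_t tid[NUM_WORKERS];
    int started = 0;

    shared_count = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    assert(shared_count == (long)started * ITEMS_PER_WORKER);
    return shared_count;
}
```

A thread that finds the lock busy does useful work instead of waiting, which is why fewer lock operations are needed; the price is that results become visible a little later, matching the slide's remark that slightly more records are searched.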

7
Thread synchronization primitives III
  • Condition variables
      • always associated with a mutex
      • a thread waits until the specified condition is
        satisfied and the mutex is granted (or until it is
        interrupted by an OS signal)
      • allow efficient, non-polling synchronization
  • Read-write locks
      • advantageous when there are many reads and few
        writes
      • several threads can read simultaneously
      • only one can write (and no readers can be
        present)
  • Barrier
      • the usual meaning: no thread proceeds until all
        threads have reached the barrier
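
Of the three primitives above, the condition variable is the least obvious to use, so here is a minimal sketch of the wait/signal pattern. Names such as state_lock, ready_cond, producer() and wait_for_value() are assumptions made for this example; read-write locks and barriers are not covered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond = PTHREAD_COND_INITIALIZER;
static bool ready;     /* the condition, protected by state_lock */
static int  payload;   /* data handed from producer to consumer */

static void *producer(void *arg)
{
    int value = *(int *)arg;

    pthread_mutex_lock(&state_lock);
    payload = value;
    ready = true;
    pthread_cond_signal(&ready_cond);   /* wake one waiting thread */
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Blocks without polling until the producer has published a value.
 * Returns -1 (a hypothetical error value) if no thread could be started. */
int wait_for_value(int value_to_send)
{
    pthread_t tid;
    int received;

    ready = false;   /* no other thread is running yet */
    if (pthread_create(&tid, NULL, producer, &value_to_send) != 0)
        return -1;

    pthread_mutex_lock(&state_lock);
    while (!ready)                                    /* re-check after every wakeup */
        pthread_cond_wait(&ready_cond, &state_lock);  /* releases the mutex while waiting */
    assert(ready);                                    /* mutex is held again here */
    received = payload;
    pthread_mutex_unlock(&state_lock);

    pthread_join(tid, NULL);
    return received;
}
```

The while loop around pthread_cond_wait() matters: the call may return even though the condition is not yet true, so the predicate is tested again each time with the mutex held.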

8
OpenMP
  • The OpenMP programming model
      • high-level directives translated by the
        compiler into low-level thread calls (e.g. Pthreads)
  • Basic directives
      • parallelisation: thread creation and joining
      • parallel, for, sections
      • synchronization directives: barrier, single,
        master, critical, atomic
  • Data handling in OpenMP
      • different data classes: private, shared,
        firstprivate, lastprivate, ...
      • see www.openmp.org

9
OpenMP Examples I
OpenMP version:

    int a, b;
    int main(void) {
        // serial segment
        #pragma omp parallel num_threads(8) private(a) shared(b)
        {
            // parallel segment
        }
        // rest of serial segment
    }

Corresponding Pthreads version (call arguments are elided on the slide):

    int a, b;
    int main(void) {
        int i;
        // serial segment
        for (i = 0; i < 8; i++)
            pthread_create(..., thread_fn_name, ...);
        for (i = 0; i < 8; i++)
            pthread_join(...);
    }
    void *thread_fn_name(void *arguments) {
        // parallel segment
    }
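
A compilable variant of the OpenMP half might look like the sketch below. It assumes an OpenMP-capable compiler (for example a -fopenmp style flag); the #ifdef guards are there so that the code also builds and runs serially when OpenMP is disabled. The function name count_team_members() is made up for illustration.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns how many threads executed the parallel segment. */
int count_team_members(void)
{
    int a = -1;   /* listed as private: each thread works on its own copy */
    int b = 0;    /* listed as shared: one copy, updated by every thread */

    #pragma omp parallel num_threads(8) private(a) shared(b)
    {
#ifdef _OPENMP
        a = omp_get_thread_num();   /* private copies start uninitialized */
#else
        a = 0;                      /* serial fallback: a single "thread" */
#endif
        assert(a >= 0);

        #pragma omp atomic
        b += 1;                     /* shared data needs synchronization */
    }
    return b;
}
```

The runtime may grant fewer than the eight requested threads, so the return value is at most 8 and exactly 1 in a serial build.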
10
OpenMP Examples II
Simple OpenMP program for calculating π

    #pragma omp parallel default(private) shared(npoints) \
                         reduction(+: sum) num_threads(8)
    {
        num_threads = omp_get_num_threads();
        sample_points_pt = npoints / num_threads;
        sum = 0;
        for (i = 0; i < sample_points_pt; i++) {
            // random() stands for a uniform random number in [0, 1),
            // so rx and ry fall in the -0.5 .. 0.5 range
            rx = random() - 0.5;
            ry = random() - 0.5;
            if ((rx * rx) + (ry * ry) < 0.25)
                sum++;
        }
    }


11
OpenMP Examples III
Calculating π a different way

    #pragma omp parallel default(private) shared(npoints) \
                         reduction(+: sum) num_threads(8)
    {
        sum = 0;
        #pragma omp for
        for (i = 0; i < npoints; i++) {
            // random() stands for a uniform random number in [0, 1),
            // so rx and ry fall in the -0.5 .. 0.5 range
            rx = random() - 0.5;
            ry = random() - 0.5;
            if ((rx * rx) + (ry * ry) < 0.25)
                sum++;
        }
    }
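
For readers who want to run the π estimate, here is a self-contained sketch of the omp for variant. Several things are assumptions rather than the slides' code: next_uniform() is a hypothetical helper (a small linear congruential generator used only to stay portable and per-thread), the seeds are arbitrary, and variables are declared inside the region instead of relying on default(private). It is not a statistically rigorous generator.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: linear congruential generator, value in [0, 1). */
static double next_uniform(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (double)(*state >> 8) / 16777216.0;   /* top 24 bits */
}

/* Fraction of points inside the circle of radius 0.5 approximates pi/4. */
double estimate_pi(long npoints)
{
    long sum = 0;

    assert(npoints > 0);
    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        uint32_t state = 12345u;   /* arbitrary seed */
#ifdef _OPENMP
        /* give each thread a different starting state */
        state += 7919u * (uint32_t)omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < npoints; i++) {
            double rx = next_uniform(&state) - 0.5;
            double ry = next_uniform(&state) - 0.5;
            if (rx * rx + ry * ry < 0.25)
                sum++;   /* each thread adds to its own copy of sum */
        }
    }   /* the per-thread sums are combined here */

    return 4.0 * (double)sum / (double)npoints;
}
```

With a million points the result should land close to 3.14; the exact digits depend on the number of threads, because each thread draws from its own stream.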

12
Scheduling parallel for loops
  • #pragma omp parallel for schedule(scheduling_class[, parameter])
  • The parameter is typically the chunk size
  • Scheduling classes
      • static
      • dynamic
      • guided: progressively reducing chunk size
      • runtime: chosen at run time via the OMP_SCHEDULE
        environment variable
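
A small sketch of how the schedule clause is written in practice. The workload work(), the chunk size of 16 and the function names are assumptions chosen so that later iterations cost more than earlier ones, which is the situation where dynamic scheduling tends to help.

```c
#include <assert.h>

/* Hypothetical workload: iteration i costs time proportional to i. */
static long work(long i)
{
    long acc = 0;
    for (long k = 0; k <= i; k++)
        acc += k % 7;
    return acc;
}

/* Chunks of 16 iterations are handed to whichever thread is free. */
long sum_dynamic(long n)
{
    long total = 0;

    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 16) reduction(+: total)
    for (long i = 0; i < n; i++)
        total += work(i);
    return total;
}

/* Iterations are split into equal contiguous blocks up front. */
long sum_static(long n)
{
    long total = 0;

    assert(n >= 0);
    #pragma omp parallel for schedule(static) reduction(+: total)
    for (long i = 0; i < n; i++)
        total += work(i);
    return total;
}
```

Both functions return the same sum; only the assignment of iterations to threads differs, so any difference shows up in run time rather than in the result.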

13
OpenMP Examples IV
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // producer thread
            task = produce_task();
            #pragma omp critical (task_queue)
            {
                insert_into_task_queue(task);
            }
        }
        #pragma omp section
        {
            // consumer thread
            #pragma omp critical (task_queue)
            {
                task = extract_from_queue();
            }
            consume_task(task);
        }
    }

14
Data handling in OpenMP
  • Data classes
      • specified in the directives
      • private: each thread has its own private copy
      • firstprivate: private copy, initialized with the
        value of the original variable
      • lastprivate: the original variable is updated by the
        last section/iteration of the loop
      • shared: shared by all threads, use cautiously
      • threadprivate: persists in the thread across
        for loops, sections
  • Good practice
      • use private as much as possible
      • divide and distribute data and combine them
        later using e.g. the reduction clause
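
The data classes above can be seen together in one loop. This is a minimal sketch with a made-up function, fill_and_sum(); it requires n > 0 because lastprivate only gives a defined value when a last iteration exists.

```c
#include <assert.h>
#include <stddef.h>

/* Writes out[i] = base + i, stores the value from the last iteration in
 * *last_value, and returns the sum of all values written. */
long fill_and_sum(int *out, int n, int base, int *last_value)
{
    long sum = 0;   /* combined across threads by the reduction clause */
    int last = 0;   /* receives the value from iteration n - 1 */

    assert(out != NULL && last_value != NULL && n > 0);

    /* base: every thread gets a private copy initialized to the caller's value.
     * last: private in the loop, copied back from the final iteration.
     * out:  shared, but each iteration writes a different element. */
    #pragma omp parallel for firstprivate(base) lastprivate(last) reduction(+: sum)
    for (int i = 0; i < n; i++) {
        last = base + i;
        out[i] = last;
        sum += last;
    }

    *last_value = last;
    return sum;
}
```

Each thread accumulates into its own copy of sum and the copies are added when the loop ends, which follows the "divide, then combine" practice without any explicit locking.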