Shared Memory Programming via Posix threads

Slides: 20
Provided by: laxmika
Learn more at: http://charm.cs.uiuc.edu
Transcript and Presenter's Notes

1
Shared Memory Programming via Posix threads
  • Laxmikant Kale
  • CS433

2
Shared Address Space Model
  • All memory is accessible to all processes
  • Processes are mapped to processors, typically by
    a symmetric OS
  • Coordination among processes: by sharing variables
  • Avoid stepping on toes: using locks and barriers

3
Running Example: computing pi
  • Area of a circle: πr²
  • Ratio of the area of a circle to that of the
    enclosing square: π/4
  • Method: compute a set of random number pairs (in
    the range 0-1) and count the number of pairs that
    fall inside the circle
  • The ratio of that count to the total number of
    pairs gives us an estimate for π/4
  • In parallel: let each processor compute a
    different set of random number pairs (in the
    range 0-1) and count the number of pairs that
    fall inside the circle
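
Before parallelizing, the method can be sketched sequentially. This is a minimal sketch, not code from the slides: the function name estimate_pi and its parameters are assumptions made for illustration, and the standard rand() stands in for whatever random number generator the course used.

```c
#include <stdlib.h>

/* Estimate pi by sampling n_samples points in the unit square and
 * counting how many land inside the quarter circle of radius 1.
 * estimate_pi, n_samples and seed are illustrative names. */
double estimate_pi(long n_samples, unsigned int seed)
{
    long inside = 0;

    srand(seed);
    for (long i = 0; i < n_samples; i++) {
        /* scale rand() to a value in [0, 1) */
        double x = rand() / ((double)RAND_MAX + 1.0);
        double y = rand() / ((double)RAND_MAX + 1.0);
        if (x * x + y * y < 1.0)
            inside++;
    }
    /* inside / n_samples approximates pi/4 */
    return 4.0 * (double)inside / (double)n_samples;
}
```

Calling estimate_pi(200000, 1) should give a value close to 3.14; the exact digits depend on the platform's rand().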

4
Pi on shared memory
int count;
Lock countLock;

piFunction(int myProcessor)
{
  seed s = makeSeed(myProcessor);
  for (I = 0; I < 100000/P; I++) {
    x = random(s);
    y = random(s);
    if (x*x + y*y < 1.0) {
      lock(countLock); count++; unlock(countLock);
    }
  }
  barrier();
  if (myProcessor == 0)
    printf("pi = %f\n", 4.0*count/100000);
}
5
main()
{
  countLock = createLock();
  parallel(piFunction);
}
The system needs to provide the functions for
locks, barriers, and thread (or process) creation.
6
How fast will this run?
  • Assume perfect shared memory machine
  • (I.e. no problem scaling up because of limited
    bandwidth to memory)
  • But locks are a sequential bottleneck
  • If you have lots of processors, you will find
    most of them in the queue waiting for the lock,
    at any given time
  • But we are doing really little work inside the
    locked critical section
  • But obtaining a lock is expensive (we will revisit
    why later).
  • Can we analyze the performance more precisely?
  • Let Tw be the time for computing outside the lock
  • Let Tc be the time in getting the lock, doing the
    critical section work, and unlocking
  • Let P be the number of processors

7
Analysis: How fast will this run?
[Figure legend: Tw = work, Tc = critical section]
8
Analysis: How fast will this run?
  • The other case is when the work section is larger
    than P·Tc
  • Write expressions for completion time in both
    cases

[Figure legend: Tw = work, Tc = critical section]
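
One way to write the two expressions, as a rough model rather than the instructor's own answer. Assumptions introduced here: n is the total number of loop iterations, split evenly over the P processors; every iteration does Tw of work followed by one critical section of Tc; startup and barrier costs are ignored. In the pi program only the points that fall inside the circle take the lock, so this slightly overestimates the lock traffic.

```latex
T \;\approx\;
\begin{cases}
  n\,T_c, & T_w < (P-1)\,T_c \quad \text{(the lock is always busy)} \\[4pt]
  \dfrac{n}{P}\,\bigl(T_w + T_c\bigr), & T_w \ge (P-1)\,T_c \quad \text{(no queueing at the lock)}
\end{cases}
\qquad\text{i.e.}\qquad
T \;\approx\; \max\!\Bigl(n\,T_c,\; \tfrac{n}{P}\,(T_w + T_c)\Bigr)
```

The crossover comes from comparing the two terms: n·Tc exceeds (n/P)(Tw + Tc) exactly when Tw < (P−1)·Tc, which matches the slide's threshold of P·Tc up to one Tc. In the first case adding processors does not help at all, since the n critical sections run one after another.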
9
Pi on shared memory: efficient version
int count;
Lock countLock;

piFunction(int myProcessor)
{
  int c = 0;
  seed s = makeSeed(myProcessor);
  for (I = 0; I < 100000/P; I++) {
    x = random(s);
    y = random(s);
    if (x*x + y*y < 1.0) c++;
  }
  lock(countLock); count += c; unlock(countLock);
  barrier();
  if (myProcessor == 0)
    printf("pi = %f\n", 4.0*count/100000);
}
10
Real SAS systems
  • Posix threads (Pthreads) is a standard for
    threads-based shared memory programming
  • Shared memory calls: just a few, normally
    standard calls
  • In addition, lower level calls: fetch-and-inc,
    fetch-and-add
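
As an illustration of the lower level calls, here is a minimal sketch of fetch-and-add and fetch-and-inc using C11's <stdatomic.h>. This is an assumption about the toolchain: C11 atomics are not part of Pthreads and are not mentioned in the slides, and the names hits, add_hits, inc_hits and read_hits are made up for the example.

```c
#include <stdatomic.h>

static atomic_int hits;   /* shared counter, zero-initialized (illustrative name) */

/* fetch-and-add: atomically add `amount` and return the value
 * the counter held just before the addition. */
int add_hits(int amount)
{
    return atomic_fetch_add(&hits, amount);
}

/* fetch-and-inc is the special case amount == 1. */
int inc_hits(void)
{
    return atomic_fetch_add(&hits, 1);
}

/* Read the current value of the counter. */
int read_hits(void)
{
    return atomic_load(&hits);
}
```

With such a call the counter update in the pi example needs no explicit lock: each thread could call add_hits(c) once with its private count.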

11
Posix Threads on Origin 2000
  • Shared memory programming on Origin 2000:
    important calls
  • Thread creation and joining:
  • pthread_create(pthread_t *threadID, attr,
    functionName, (void *) arg)
  • pthread_join(pthread_t threadID, void **result)
  • Locks:
  • pthread_mutex_t lock
  • pthread_mutex_lock(&lock)
  • pthread_mutex_unlock(&lock)
  • Condition variables:
  • pthread_cond_t cv
  • pthread_cond_init(&cv, (pthread_condattr_t *) 0)
  • pthread_cond_wait(&cv, &cv_mutex)
  • pthread_cond_broadcast(&cv)
  • Semaphores, and other calls

Follow the web link on the class web page for
detailed documentation
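
The condition variable calls listed above are the only ones the later pi example does not exercise, so here is a minimal sketch of how they fit together. The names worker, run_and_wait, ready and result are hypothetical, and static initializers are used in place of pthread_cond_init for brevity.

```c
#include <pthread.h>

static pthread_mutex_t cv_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv       = PTHREAD_COND_INITIALIZER;
static int ready  = 0;   /* predicate, protected by cv_mutex */
static int result = 0;   /* value published by the worker */

static void *worker(void *arg)
{
    int value = *(int *)arg;

    pthread_mutex_lock(&cv_mutex);
    result = value * 2;            /* stand-in for real work */
    ready  = 1;
    pthread_cond_broadcast(&cv);   /* wake every waiting thread */
    pthread_mutex_unlock(&cv_mutex);
    return NULL;
}

/* Start one worker, block until it signals, and return what it published.
 * Returns -1 if the thread could not be created. */
int run_and_wait(int input)
{
    pthread_t tid;
    int out;

    ready = 0;                     /* safe: no other thread exists yet */
    if (pthread_create(&tid, NULL, worker, &input) != 0)
        return -1;

    pthread_mutex_lock(&cv_mutex);
    while (!ready)                         /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cv, &cv_mutex); /* releases cv_mutex while blocked */
    out = result;
    pthread_mutex_unlock(&cv_mutex);

    pthread_join(tid, NULL);
    return out;
}
```

The wait sits in a while loop on the predicate rather than a single if, because pthread_cond_wait is allowed to return without a matching signal.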
12
Computing pi (Pthreads): Declarations
/* pgm.c */
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#define nThreads 4
#define nSamples 1000000

typedef struct _shared_value {
  pthread_mutex_t lock;
  int value;
} shared_value;

shared_value sval;
13
Function in each thread
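void *doWork(void *id)
{
  size_t tid = (size_t) id;
  int nsucc, ntrials, i;

  ntrials = nSamples/nThreads;
  nsucc = 0;
  /* note: srand48()/drand48() share one global state across threads;
     erand48() with per-thread state avoids that */
  srand48((long) tid);
  for (i = 0; i < ntrials; i++) {
    double x = drand48();
    double y = drand48();
    if ((x*x + y*y) < 1.0)
      nsucc++;
  }
  pthread_mutex_lock(&(sval.lock));
  sval.value += nsucc;
  pthread_mutex_unlock(&(sval.lock));
  return 0;
}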
14
Main function
int main(int argc, char **argv)
{
  pthread_t tids[nThreads];
  size_t i;
  double est;

  /* Init lock/s */
  pthread_mutex_init(&(sval.lock), NULL);
  sval.value = 0;

  /* Create threads */
  printf("Creating Threads\n");
  for (i = 0; i < nThreads; i++)
    pthread_create(&tids[i], NULL, doWork, (void *) i);

  /* Wait for threads to complete */
  printf("Created Threads... waiting for them to complete\n");
  for (i = 0; i < nThreads; i++)
    pthread_join(tids[i], NULL);
  printf("Threads Completed...\n");

  est = 4.0 * ((double) sval.value / (double) nSamples);
  printf("Estimated Value of PI = %lf\n", est);
  exit(0);
}
15
Compiling: Makefile
# Makefile
# for solaris
FLAGS = -mt
# for Origin2000
# FLAGS =

pgm: pgm.c
	cc -o pgm $(FLAGS) pgm.c -lpthread

clean:
	rm -f pgm *.o
16
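So, do we understand the programming model?
  • Consider the following code

Processor A                      Processor B
a = 1;                           b = 1;
if (b == 0)                      if (a == 0)
  if (z == 0) {                    if (z == 0) {
    z = 1; t = 1;                    z = 2; t = 2;
  }                                }
a = 0;                           b = 0;

Three orderings of the memory operations (each number is the
position of that operation in the global order; a "-" marks an
operation that is skipped or not shown):

A's operations:  store a  load b  load z  store z  store t  store a
B's operations:  store b  load a  load z  store z  store t  store b

Ordering 1   A:     1        3       -        -        -        -
             B:     2        4       -        -        -            final (z,t) = (0,0)
Ordering 2   A:     4        1       5        7       10       11
             B:     2        3       6        8        9       12   final (z,t) = (2,1), not allowed
Ordering 3   A:     1        2       5        6        7        8
             B:     3        4       -        -        -            final (z,t) = (1,1)

Expectation (if z, t began as (0,0)): they can
be (0,0), (1,1) or (2,2), but not (1,2) or (2,1).
If each processor allows its instructions to be
out of order (as long as its own results are
consistent), the result can be wrong.
For example, the store from processor A may get
delayed.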
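
The expectation can be checked mechanically. The sketch below is an illustration, not code from the slides: it enumerates every interleaving of the two processors' memory operations, with each processor's own operations kept in program order, and records which final (z, t) pairs occur. Setting the hypothetical flag delay_store_a lets processor A's first load complete before its first store, which models the delayed store mentioned above. All names (State, step, explore, outcome_possible) are assumptions, and the nesting of the two if statements follows the stated expectation.

```c
#include <stdbool.h>
#include <string.h>

enum { A = 0, B = 1, DONE = 6 };

typedef struct {
    int a, b, z, t;     /* shared variables, all initially 0 */
    int pc[2];          /* next step of processor A and of processor B */
    int flag_seen[2];   /* value each processor loaded from the other's flag */
    int z_seen[2];      /* value each processor loaded from z */
} State;

/* Execute the next step of processor p. With delay_store_a set,
 * processor A performs its load of b before its store a = 1. */
static void step(State *s, int p, bool delay_store_a)
{
    int *mine  = (p == A) ? &s->a : &s->b;
    int *other = (p == A) ? &s->b : &s->a;
    int op = s->pc[p];

    if (p == A && delay_store_a && op < 2)
        op = 1 - op;                              /* swap steps 0 and 1 */

    switch (op) {
    case 0: *mine = 1;                s->pc[p]++;      break; /* store own flag   */
    case 1: s->flag_seen[p] = *other; s->pc[p]++;      break; /* load other flag  */
    case 2: if (s->flag_seen[p] != 0) { s->pc[p] = 5; }       /* outer if fails   */
            else { s->z_seen[p] = s->z; s->pc[p] = 3; }       /* load z           */
            break;
    case 3: if (s->z_seen[p] != 0) { s->pc[p] = 5; }          /* inner if fails   */
            else { s->z = p + 1; s->pc[p] = 4; }              /* store z          */
            break;
    case 4: s->t = p + 1;             s->pc[p] = 5;    break; /* store t          */
    case 5: *mine = 0;                s->pc[p] = DONE; break; /* clear own flag   */
    }
}

/* Depth-first search over all interleavings of the remaining steps. */
static void explore(State s, bool delay_store_a, bool seen[3][3])
{
    if (s.pc[A] == DONE && s.pc[B] == DONE) {
        seen[s.z][s.t] = true;
        return;
    }
    for (int p = A; p <= B; p++) {
        if (s.pc[p] == DONE)
            continue;
        State next = s;
        step(&next, p, delay_store_a);
        explore(next, delay_store_a, seen);
    }
}

/* True if some interleaving ends with the given final values of z and t. */
bool outcome_possible(int z, int t, bool delay_store_a)
{
    bool seen[3][3];
    State init = {0};

    memset(seen, 0, sizeof seen);
    explore(init, delay_store_a, seen);
    return seen[z][t];
}
```

With delay_store_a false, only (0,0), (1,1) and (2,2) are reachable. With it true, (2,1) becomes reachable as well, for instance through the second ordering in the table.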
17
Sequential consistency
  • So, we want to state that the implementation
    should disallow such reordering (of one
    processor's instructions) as seen by other
    processors
  • I.e. it is not enough for processor A to issue
    its operations in order; they must be seen as
    completed by others in the same order
  • But we don't want to restrict the freedom of the
    processor any more than really necessary, or
    speed will suffer
  • Sequential consistency: a parallel program
    should behave as if there is one processor and
    one memory (and no cache)
  • I.e. the results should be as if the instructions
    were interleaved in some order, with each
    processor's instructions kept in program order

18
Sequential consistency
  • More precisely (operational semantics): behave as
    if there is a single FIFO queue of memory
    operations coming from all processors (and
    there is no cache)
  • Now, the architect must keep this contract in
    mind while building a machine, but the programmer
    has a concrete understanding of what to expect
    from their programs
  • and it agrees with their intuitions (for most
    people)
  • The architect is NOT required to build such a
    FIFO queue: just make sure the system behaves as
    if there is one

19
Another example
Proc 1                 Proc 2
a = 1;                 while (b == 0) ;  // wait
b = 1;                 print a;
We should not see a 0 printed, right?
But a and b may be in different memory modules
(or caches), and the change in b may become
visible to the second process before the change
in a.
Sequential consistency forces the machine
(designer) to make a visible before b is visible.
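
In portable C, one way to get this guarantee is to declare a and b as C11 atomics, whose plain atomic_store and atomic_load are sequentially consistent by default. The following is a minimal sketch under that assumption (C11 atomics are not covered in the slides); proc1, proc2 and observe_a are illustrative names.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int a;   /* the data */
static atomic_int b;   /* the flag */

static void *proc1(void *unused)
{
    (void)unused;
    atomic_store(&a, 1);   /* sequentially consistent by default */
    atomic_store(&b, 1);
    return NULL;
}

static void *proc2(void *out)
{
    while (atomic_load(&b) == 0)
        ;                              /* wait */
    *(int *)out = atomic_load(&a);     /* stands in for "print a" */
    return NULL;
}

/* Run both procedures once and return the value of a that proc2 saw,
 * or -1 if a thread could not be created. */
int observe_a(void)
{
    pthread_t t1, t2;
    int seen = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);
    if (pthread_create(&t2, NULL, proc2, &seen) != 0)
        return -1;
    if (pthread_create(&t1, NULL, proc1, NULL) != 0) {
        atomic_store(&b, 1);           /* release the waiter before giving up */
        pthread_join(t2, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return seen;
}
```

With plain int variables instead of atomic_int, the program would contain a data race, and neither the compiler nor the hardware would be obliged to make the store to a visible before the store to b.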