Title: Multiprocessors and Multi-computers
1. Multiprocessors and Multi-computers
- Multi-computers
  - Distributed address space accessible by local processors
  - Requires message passing
  - Programming tends to be more difficult
- Multiprocessors
  - Single address space accessible by all processors
  - Simultaneous access to shared variables can produce inconsistent results
  - Programming is generally more convenient
  - Doesn't scale to more than about sixteen processors
2. Shared Memory Hardware
[Figures: bus configuration; crossbar switch configuration (figure label: Processes)]
3. Cache Coherence
Significantly impacts performance
- Cache Coherence Protocol
  - Write-Update: All caches are immediately updated with altered data
  - Write-Invalidate: Altered data is invalidated in all caches. Updates take place only if the data is subsequently referenced
- False Sharing: Cache updates take place because multiple processes access the same cache block but not the same locations
[Figure: false sharing. Labels: Memory, Cache Blocks, Processor 1, Processor 2, variables x and y]
Note: Significant because each processor has a local cache
4. Shared Memory Access
- Critical Section
  - A section of code that needs to be protected from simultaneous access
- Mutual Exclusion
  - The mechanism used to enforce a critical section
    - Locks
    - Semaphores
    - Monitors
    - Condition Variables
[Figure: Process 1 and Process 2 both accessing the shared variable x]
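The following is a minimal Java sketch of a critical section protecting a shared variable x that two threads update. The class and method names (SharedCounterDemo, increment, run) are illustrative assumptions, not part of the slides. Without the synchronized keyword the two threads could interleave their read-modify-write steps and lose updates.

```java
// Hypothetical demo class; names are illustrative only.
class SharedCounterDemo {
    private int x = 0; // the shared variable

    // Critical section: only one thread at a time may run this method.
    synchronized void increment() {
        x++;
    }

    synchronized int get() {
        return x;
    }

    // Two threads each add 1 to x 'perThread' times; returns the final x.
    static int run(int perThread) {
        SharedCounterDemo counter = new SharedCounterDemo();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread p1 = new Thread(work);
        Thread p2 = new Thread(work);
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));
    }
}
```

Because every update goes through the synchronized method, the final value equals twice perThread on every run.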
5. Sequential Consistency
- Formally defined by Lamport (1979)
- A multiprocessor result is sequentially consistent if
  - The operations of each individual processor occur in the sequence specified by its program.
  - The overall output matches some sequential order of operations by all the processors
- Summary: The result is the same as if the instructions of all processors were interleaved in some single sequential order, with each processor's instructions kept in program order.
6. Deadlock
Processes permanently blocked waiting for needed resources
- Necessary Conditions
  - Circular Wait
  - Limited Resource
  - Non-preemptive
  - Hold and Wait
[Figure: Deadly Embrace. Processes P1, P2, ..., Pn-1, Pn and resources R1, R2, ..., Rn-1, Rn]
[Figure: Two Process Deadlock. Processes P1, P2 and resources R1, R2]
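One common way to rule out the circular wait condition is to make every process acquire resources in the same global order. The sketch below assumes two resources modelled as plain Java lock objects (r1, r2) and a hypothetical class name LockOrderingDemo. If one thread instead took r2 before r1, the two-process deadlock described above could occur.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class LockOrderingDemo {
    // Both processes need both resources; returns how many times
    // the pair of resources was successfully held.
    static int run(int rounds) {
        final Object r1 = new Object();
        final Object r2 = new Object();
        final AtomicInteger completed = new AtomicInteger();

        // Every thread locks r1 first and r2 second, so no cycle can form.
        Runnable process = () -> {
            for (int i = 0; i < rounds; i++) {
                synchronized (r1) {
                    synchronized (r2) {
                        completed.incrementAndGet();
                    }
                }
            }
        };

        Thread p1 = new Thread(process);
        Thread p2 = new Thread(process);
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000));
    }
}
```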
7. Locks
Locks are the simplest mutual exclusion mechanism. Normally, these are provided by operating system calls.
- Single-bit variable: 1 = locked, 0 = unlocked
- Enter the door and lock the door at entry
- Spin locks (busy-wait locks)
    while (lock == 1) spin();   // Normally involves hardware support
    lock = 1;
    // Critical section
    lock = 0;
- Advantages: Simple and easy to understand
- Disadvantages
  - Poor use of the CPU if the process does not block while waiting
  - It's easy to skip the lock = 0 statement
- Examples: Pthreads and OpenMP provide OS abstractions
Note: The while test and the lock setting must be atomic
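A spin lock can be sketched in Java with an AtomicBoolean, whose compareAndSet call performs the test and the set as one atomic step, which is the requirement in the note above. The class SpinLockDemo and its methods are illustrative assumptions. The try/finally block is one way to avoid forgetting the unlock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; names are illustrative only.
class SpinLockDemo {
    private final AtomicBoolean locked = new AtomicBoolean(false); // false = unlocked
    private int shared = 0;

    void lock() {
        // Atomically: if unlocked, set to locked; otherwise keep spinning.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield(); // busy wait, giving other threads a chance to run
        }
    }

    void unlock() {
        locked.set(false);
    }

    void addOne() {
        lock();
        try {
            shared++; // critical section
        } finally {
            unlock(); // always released, even if the critical section throws
        }
    }

    // Two threads each call addOne() 'perThread' times; returns the final total.
    static int run(int perThread) {
        SpinLockDemo demo = new SpinLockDemo();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                demo.addOne();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.shared;
    }

    public static void main(String[] args) {
        System.out.println(run(50_000));
    }
}
```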
8. Semaphores
- Limits concurrent access
- An integer variable, s, controls the mechanism
- Operations
  - P operation ("passeren", Dutch for "to pass")
      s--;
      while (s < 0) wait();
      // Critical section code
  - V operation ("vrijgeven", Dutch for "to release")
      s++;
      if (s <= 0) unblock a waiting process
  - p(s); /* Critical section */ v(s);
- Notes
  - Set s = 1 initially for s to be a binary semaphore, which acts like a lock.
  - Set s = k > 1 initially if k simultaneous entries are possible
  - Set s = k <= 0 for consumer processes waiting to consume data produced
- Disadvantage: It's easy to skip the v operation
- Example: UNIX OS
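As a rough illustration of a counting semaphore, the sketch below uses java.util.concurrent.Semaphore, with acquireUninterruptibly() standing in for P and release() for V. The class SemaphoreDemo and the method maxInside are assumptions made for this example. It records the largest number of threads ever inside the critical section at the same time, which should never exceed the initial value k.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class SemaphoreDemo {
    // Returns the maximum number of threads observed inside the
    // critical section at once, given a semaphore initialised to k.
    static int maxInside(int k, int threads) {
        Semaphore s = new Semaphore(k);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();

        Runnable work = () -> {
            for (int i = 0; i < 100; i++) {
                s.acquireUninterruptibly();          // P(s)
                try {
                    int now = inside.incrementAndGet();
                    max.accumulateAndGet(now, Math::max);
                    inside.decrementAndGet();
                } finally {
                    s.release();                     // V(s)
                }
            }
        };

        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(work);
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxInside(2, 8));
    }
}
```

With k = 1 the semaphore behaves like a lock, so the method returns exactly 1.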
9. Monitors
- A class mechanism that limits access to a shared resource
    public class doIt {
        public doIt() { /* Constructor logic */ }
        public synchronized void critMethod() throws InterruptedException {
            wait();     // Wait till another thread signals
            notify();
        }
    }
- Advantage: Most natural mutual exclusion mechanism
- Disadvantage: Requires a language that supports the construct
- Examples: Java, Ada, Modula-2
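A fuller monitor can be sketched as a bounded buffer whose synchronized methods use wait() and notifyAll(). This is an illustrative assumption rather than the slide author's example; the class BoundedBufferMonitor and the helper transfer are hypothetical names. Each wait() sits inside a while loop so the condition is rechecked after every wake-up.

```java
// Hypothetical monitor class; names are illustrative only.
class BoundedBufferMonitor {
    private final int[] items;
    private int head = 0, tail = 0, count = 0;

    BoundedBufferMonitor(int capacity) {
        items = new int[capacity];
    }

    synchronized void put(int v) throws InterruptedException {
        while (count == items.length) {
            wait();                 // buffer full: wait for a consumer
        }
        items[tail] = v;
        tail = (tail + 1) % items.length;
        count++;
        notifyAll();                // wake threads waiting in take()
    }

    synchronized int take() throws InterruptedException {
        while (count == 0) {
            wait();                 // buffer empty: wait for a producer
        }
        int v = items[head];
        head = (head + 1) % items.length;
        count--;
        notifyAll();                // wake threads waiting in put()
        return v;
    }

    // A producer sends 1..n through the buffer; returns the consumer's sum.
    static long transfer(int n) {
        BoundedBufferMonitor buf = new BoundedBufferMonitor(4);
        long[] sum = new long[1];
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buf.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += buf.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(transfer(100));
    }
}
```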
10. Condition Variables
Mechanism to guarantee a global condition before critical section entry
- Advantages
  - Reduces the overhead of checking whether a global variable reaches some value
  - Avoids having to frequently poll the global variable
- Disadvantage: It's easy to skip the unlock operations
- Example: Pthreads
- Notes
  - wait() automatically unlocks the mutex while waiting and relocks it before returning
  - Threads must already be waiting for a signal when it is thrown
- Example
  - Thread 1
      lock(mutex);
      while (c != VALUE)
          wait(condVar, mutex);
      // Critical section
      unlock(mutex);
  - Thread 2
      if (c == VALUE) signal(condVar);
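The two-thread pattern above can be approximated in Java with ReentrantLock and Condition from java.util.concurrent.locks. The class ConditionDemo and its method names are assumptions for illustration. In this sketch the signalling thread also holds the mutex while it updates c, so a signal sent before the waiter arrives is not a problem: the waiter rechecks c under the mutex and skips the wait.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; names are illustrative only.
class ConditionDemo {
    static final int VALUE = 5;
    private final ReentrantLock mutex = new ReentrantLock();
    private final Condition condVar = mutex.newCondition();
    private int c = 0;

    // Thread 1: block until c == VALUE, then run the critical section.
    int awaitValue() throws InterruptedException {
        mutex.lock();
        try {
            while (c != VALUE) {
                condVar.await();   // releases mutex while waiting, reacquires on return
            }
            return c;              // critical section
        } finally {
            mutex.unlock();
        }
    }

    // Thread 2: update c and signal when the condition becomes true.
    void increment() {
        mutex.lock();
        try {
            c++;
            if (c == VALUE) {
                condVar.signal();
            }
        } finally {
            mutex.unlock();
        }
    }

    // Returns the value of c seen by the waiting thread.
    static int run() {
        ConditionDemo d = new ConditionDemo();
        int[] seen = new int[1];
        Thread waiter = new Thread(() -> {
            try {
                seen[0] = d.awaitValue();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        for (int i = 0; i < VALUE; i++) {
            d.increment();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```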
11. Shared Memory Programming Alternatives
- Heavyweight processes
- Modified syntax of an existing language (HP Fortran)
- Programming language designed for parallel processing (Ada)
- Compiler extensions to specify parallel execution (OpenMP)
- Thread programming standards: Java Threads and Pthreads
12. Threads
Definition: Path of execution through a process
- Heavyweight processes (UNIX fork, wait, waitpid, shmat, shmdt)
  - Disadvantage: time and memory expensive
  - Advantage: A blocked process doesn't block the other processes
- Lightweight threads (pthreads library)
  - Each thread needs only its own stack space and instruction counter; the rest of the process is shared
  - "Thread Safe" programming required to guarantee consistent results
- Pthreads
  - Threads can be spawned and started by other threads
  - They can run independently (detached from their parent thread) or require joins for termination
  - Formation of thread pools is possible
  - Threads communicate through signals
  - Processing order is indeterminate
13. Forks and Joins
General thread flow of control:
    pid = fork();
    if (pid == 0) { /* Do spawned thread code */ }
    else { /* Do spawning thread code */ }
    if (pid == 0) exit(0);
    else wait(0);
Note: Detached processes run independently from their parent without joins
14. Processes and Threads
- Notes
  - Threads can be three orders of magnitude faster than processes
  - Thread-safe library routines can be used by multiple concurrent threads
  - Synchronization uses shared variables
15. Example Program (summing numbers)
- Heavyweight UNIX processes (Section 8.7.1)
- Pseudo code
  - Create semaphores
  - Allocate shared memory and attach shared memory
  - Load array with numbers
  - Fork child processes
  - IF Parent THEN sum parent section
  - ELSE sum child section
  - P(semaphore); Add to global sum; V(semaphore)
  - IF (child) terminate ELSE join
  - Print results
  - Release semaphores, detach and release shared memory
Note: The Java and pthread versions require about half the code
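A thread-based version of the pseudo code might look like the sketch below. It is an assumption about how such a program could be written in Java, not the version the slides refer to; the class ParallelSumDemo and the method sum are hypothetical names. Each thread sums its own section of the array, then adds its partial result to the global sum between P and V on a binary semaphore.

```java
import java.util.concurrent.Semaphore;

// Hypothetical demo class; names are illustrative only.
class ParallelSumDemo {
    static long sum(int[] data, int nThreads) {
        final long[] global = new long[1];            // global sum
        final Semaphore semaphore = new Semaphore(1); // binary semaphore
        final int chunk = (data.length + nThreads - 1) / nThreads;

        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int lo = Math.min(data.length, t * chunk);
            final int hi = Math.min(data.length, lo + chunk);
            workers[t] = new Thread(() -> {
                long partial = 0;                     // sum this thread's section
                for (int i = lo; i < hi; i++) {
                    partial += data[i];
                }
                semaphore.acquireUninterruptibly();   // P(semaphore)
                try {
                    global[0] += partial;             // add to global sum
                } finally {
                    semaphore.release();              // V(semaphore)
                }
            });
            workers[t].start();                       // "fork"
        }
        for (Thread w : workers) {                    // "join"
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return global[0];
    }

    public static void main(String[] args) {
        int[] numbers = new int[1000];
        for (int i = 0; i < numbers.length; i++) numbers[i] = i + 1;
        System.out.println(sum(numbers, 4));
    }
}
```

No shared memory segment has to be allocated or released here because threads share the heap of their process.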
16. Modify Existing Language Syntax
Example Constructs
- Declaration of a shared memory variable
    shared int x;
- Specify statements to execute concurrently
    par { s1(); s2(); s3(); ... sn(); }
- Iterations assigned to different processors
    forall (i = 0; i < n; i++) { /* code */ }
- Examples: High Performance Fortran and C
17. Compiler Optimizations
- The following works because the statements are independent
    forall (i = 0; i < P; i++) a[i] = 0;
- Bernstein's conditions
  - Outputs from one processor cannot be inputs to another
  - Outputs from the processors cannot overlap
- Example: a = x + y; b = x + z; are okay to execute simultaneously
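Bernstein's conditions can be checked mechanically when the input and output variable sets of two statements are known. The sketch below is an illustration under that assumption; the class BernsteinDemo and the helpers independent and vars are hypothetical names. Two statements are treated as safe to run simultaneously when neither statement's outputs appear in the other's inputs and their outputs do not overlap.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names are illustrative only.
class BernsteinDemo {
    // in1/out1 and in2/out2 are the variables read and written
    // by statement 1 and statement 2 respectively.
    static boolean independent(Set<String> in1, Set<String> out1,
                               Set<String> in2, Set<String> out2) {
        return Collections.disjoint(in1, out2)    // outputs of 2 are not inputs of 1
            && Collections.disjoint(in2, out1)    // outputs of 1 are not inputs of 2
            && Collections.disjoint(out1, out2);  // outputs do not overlap
    }

    // Convenience helper to build a set of variable names.
    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // a computed from x and y; b computed from x and z
        System.out.println(independent(vars("x", "y"), vars("a"),
                                       vars("x", "z"), vars("b")));
        // a computed from x and y; b computed from a and z
        System.out.println(independent(vars("x", "y"), vars("a"),
                                       vars("a", "z"), vars("b")));
    }
}
```

The first pair shares only the input x, which is allowed; the second pair fails because the output a of the first statement is an input of the second.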
18. Java Threads
- Instantiate and run a thread
    ThreadClass t = new ThreadClass();
    t.start();
- Thread class
    class ThreadClass extends Thread {
        public ThreadClass() { /* Constructor */ }
        public void run() {
            while (true) {
                // yield or sleep periodically.
                // thread code executed here.
            }
        }
    }
19. Pthreads
IEEE POSIX 1003.1c (1995): UNIX-based C standardized API
- Advantages
  - Industry-standardized interface which replaces vendor proprietary APIs
  - In user-level implementations, thread creation, synchronization, and context switching are carried out in user space without kernel intervention, which is inherently more efficient than kernel-based thread operations
  - User-level implementation provides the flexibility to choose a scheduler that best suits the application, independent of the kernel scheduler.
- Drawbacks
  - Poor locality limits performance when accessing shared data across processors
  - The Pthreads scheduler hasn't proven suited to managing large numbers of threads
  - Shared memory multithreaded programs typically follow the SPMD model
  - Most parallel programs are still coarse-grain in design
20. Performance Comparisons
Pthreads versus Kernel Threads
- Real: wall clock time (actual elapsed time)
- User: time spent in user mode
- Sys: time spent in the kernel within the process
21. Compiler Extensions (OpenMP)
- Extensions for C/C++, Fortran, and Java (JOMP)
- Consists of: compiler directives, library routines and environment variables
- Recognized industry standard developed in the late 1990s
- Designed for shared memory programming
- Uses fork-join model, but uses threads
- Parallel sections of code are executed by teams of threads
- General Syntax
  - C: #pragma omp <directive>
  - JOMP: //omp <directive>