Transcript and Presenter's Notes

Title: Multiprocessors and Multi-computers


1
Multiprocessors and Multi-computers
  • Multi-computers
    • Distributed address space; each portion is accessible
      only by its local processor
    • Requires message passing
    • Programming tends to be more difficult
  • Multiprocessors
    • Single address space accessible by all processors
    • Simultaneous access to shared variables can produce
      inconsistent results
    • Programming is generally more convenient
    • Doesn't scale to more than about sixteen processors

2
Shared Memory Hardware
[Diagram: processes on multiple processors reaching shared
memory through a bus configuration and through a crossbar
switch configuration]
3
Cache Coherence
Significantly impacts performance
  • Cache Coherence Protocols
  • Write-Update: all caches are immediately updated with
    the altered data
  • Write-Invalidate: the altered data is invalidated in
    all other caches; updates take place only if the data
    is subsequently referenced
  • False Sharing: cache updates take place because
    multiple processes access the same cache block,
    but not the same locations within it

[Diagram: variables x and y occupy the same cache block,
with copies in both Processor 1's and Processor 2's caches
as well as in memory]
Note: Cache coherence is significant because each processor
has its own local cache
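
A minimal sketch of how false sharing can arise; the names,
loop counts, and the 64-byte cache-line size are illustrative
assumptions, not part of the original slides:

    #include <pthread.h>

    /* The two counters are logically independent but will likely
       land in the same cache block, so each thread's writes keep
       invalidating the other's cached copy. */
    struct { long a; long b; } counters;

    /* Padding each counter to its own cache block (assumed 64
       bytes) avoids the false sharing. */
    struct { long a; char pad[64 - sizeof(long)]; long b; } padded;

    void *bump_a(void *arg) {
        for (int i = 0; i < 1000000; i++) counters.a++;
        return NULL;
    }
    void *bump_b(void *arg) {
        for (int i = 0; i < 1000000; i++) counters.b++;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }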
4
Shared Memory Access
  • Critical Section
  • A section of code that needs to be protected from
    simultaneous access
  • Mutual Exclusion
  • The mechanism used to enforce a critical section
  • Locks
  • Semaphores
  • Monitors
  • Condition Variables

[Diagram: Process 1 and Process 2 simultaneously accessing
shared variable x]
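
A minimal Pthreads sketch of mutual exclusion around a critical
section; the variable and function names are illustrative:

    #include <pthread.h>

    int x = 0;                                     /* shared variable */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg) {
        pthread_mutex_lock(&m);    /* enter critical section */
        x = x + 1;                 /* protected access to x  */
        pthread_mutex_unlock(&m);  /* leave critical section */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;                  /* x is reliably 2 here */
    }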
5
Sequential Consistency
  • Formally defined by Lamport (1979)
  • A multiprocessor result is sequentially
    consistent if
  • The operations of each individual processors
    occur in proper sequence specified by its
    program.
  • The overall output matches some sequential order
    of operations by all the processors
  • Summary Arbitrary interleaving of instructions
    does not affect the output generated.

6
Deadlock
Processes permanently blocked, each waiting for resources
held by other waiting processes
[Diagram: circular wait among processes P1, P2, ..., Pn-1, Pn
and resources R1, R2, ..., Rn-1, Rn]
  • Necessary Conditions
  • Circular Wait
  • Limited Resource
  • Non-preemptive
  • Hold and Wait

[Diagram: deadly embrace, a two-process deadlock in which P1
holds R1 while requesting R2, and P2 holds R2 while
requesting R1]
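
A sketch of the deadly embrace in Pthreads terms; the names
are assumptions, and running it really can deadlock:

    #include <pthread.h>

    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;  /* resource R1 */
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;  /* resource R2 */

    void *p1(void *arg) {            /* P1 holds R1, waits for R2 */
        pthread_mutex_lock(&r1);
        pthread_mutex_lock(&r2);
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }
    void *p2(void *arg) {            /* P2 holds R2, waits for R1 */
        pthread_mutex_lock(&r2);
        pthread_mutex_lock(&r1);
        pthread_mutex_unlock(&r1);
        pthread_mutex_unlock(&r2);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);  /* may never return */
        pthread_join(t2, NULL);
        return 0;
    }

Acquiring the locks in the same order in both threads breaks
the circular-wait condition and prevents the deadlock.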
7
Locks
Locks are the simplest mutual exclusion mechanism.
Normally, these are provided by operating system calls.
  • Single-bit variable: 1 = locked, 0 = unlocked
  • Enter the door and lock the door at entry
  • Spin locks (busy-wait locks)
        while (lock == 1) ;  // spin; normally involves hardware support
        lock = 1;
        // Critical section
        lock = 0;
  • Advantages: simple and easy to understand
  • Disadvantages
  • Poor use of the CPU if the process does not block
    while waiting
  • It's easy to skip the lock = 0 statement
  • Examples: Pthreads and OpenMP provide abstractions over
    the OS primitives

Note: the while test and the setting of the lock must
execute atomically (see the sketch below)
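
One hedged sketch of an atomic spin lock, using the GCC-specific
builtin __sync_lock_test_and_set; shown only to illustrate the
hardware support the note refers to, not the slides' own code:

    volatile int lock = 0;             /* 1 = locked, 0 = unlocked */

    void acquire(void) {
        /* Atomically set lock to 1 and return its previous value;
           keep spinning while another thread already holds it. */
        while (__sync_lock_test_and_set(&lock, 1) == 1)
            ;  /* spin */
    }

    void release(void) {
        __sync_lock_release(&lock);    /* sets lock back to 0 */
    }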
8
Semaphores
  • Limits concurrent access
  • An integer variable, s, controls the mechanism
  • Operations
  • P operation (passeren, Dutch for "to pass")
        s--; while (s < 0) wait();
        // Critical section code
  • V operation (vrijgeven, Dutch for "to release")
        s++; if (s <= 0) unblock a waiting process;
  • Usage: P(s); /* Critical section */ V(s);
  • Notes
  • Set s = 1 initially for s to be a binary semaphore,
    which acts like a lock
  • Set s = k > 1 initially if k simultaneous entries are
    possible
  • Set s = k <= 0 initially for consumer processes waiting
    to consume data produced
  • Disadvantage: it's easy to skip the V operation
  • Example: the UNIX OS
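
A minimal sketch with POSIX semaphores, where sem_wait
corresponds to P and sem_post to V; the worker function is an
illustrative assumption:

    #include <semaphore.h>
    #include <pthread.h>

    sem_t s;

    void *worker(void *arg) {
        sem_wait(&s);     /* P(s): blocks if s would drop below 0 */
        /* Critical section */
        sem_post(&s);     /* V(s): release, possibly waking a waiter */
        return NULL;
    }

    int main(void) {
        sem_init(&s, 0, 1);  /* s = 1: binary semaphore, acts as a lock */
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        sem_destroy(&s);
        return 0;
    }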

9
Monitors
  • A class mechanism that limits access to a shared resource
        public class DoIt {
            public DoIt() { /* Constructor logic */ }
            public synchronized void critMethod()
                    throws InterruptedException {
                wait();    // Wait until another thread signals
                notify();
            }
        }
  • Advantage: the most natural mutual exclusion mechanism
  • Disadvantage: requires a language that supports the
    construct
  • Examples: Java, Ada, Modula-2

10
Condition Variables
Mechanism for waiting until a global condition holds before
critical section entry
  • Advantages
  • Reduces the overhead of repeatedly checking whether a
    global variable has reached some value
  • Avoids having to frequently poll the global variable
  • Disadvantage: it's easy to skip the unlock operation
  • Example: Pthreads
  • Notes
  • wait() unlocks the mutex while waiting and relocks it
    automatically on return
  • A thread must already be waiting when the signal is
    raised, or the signal is lost
  • Example (see the Pthreads sketch below)
  • Thread 1
        lock(mutex);
        while (c != VALUE)
            wait(cVar, mutex);
        // Critical section
        unlock(mutex);
  • Thread 2
        if (c == VALUE) signal(cVar);
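
A sketch of the same pattern with the actual Pthreads calls;
the names c, cVar, and VALUE follow the slide, and here the
second thread sets the condition before signaling:

    #include <pthread.h>

    int c = 0;                                   /* shared condition data */
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cVar  = PTHREAD_COND_INITIALIZER;

    #define VALUE 10

    void *thread1(void *arg) {
        pthread_mutex_lock(&mutex);
        while (c != VALUE)                     /* re-check after wakeups  */
            pthread_cond_wait(&cVar, &mutex);  /* unlocks mutex, waits,
                                                  relocks before return   */
        /* Critical section */
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

    void *thread2(void *arg) {
        pthread_mutex_lock(&mutex);
        c = VALUE;
        pthread_cond_signal(&cVar);            /* wake one waiting thread */
        pthread_mutex_unlock(&mutex);
        return NULL;
    }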

11
Shared Memory Programming Alternatives
  • Heavyweight processes
  • Modified syntax of an existing language (High
    Performance Fortran)
  • A programming language designed for parallel processing
    (Ada)
  • Compiler extensions to specify parallel execution
    (OpenMP)
  • Thread programming standards (Java threads and Pthreads)

12
Threads
Definition: a path of execution through a process
  • Heavyweight processes (UNIX fork, wait, waitpid,
    shmat, shmdt)
  • Disadvantage: expensive in time and memory
  • Advantage: a blocked process doesn't block the other
    processes
  • Lightweight threads (Pthreads library)
  • Each thread needs only its own stack space and
    instruction counter; everything else is shared
  • "Thread-safe" programming is required to guarantee
    consistent results
  • Pthreads
  • Threads can be spawned and started by other threads
  • They can run independently (detached from their parent
    thread) or require joins for termination
  • Formation of thread pools is possible
  • Threads communicate through signals
  • Processing order is indeterminate

13
Forks and Joins
General flow of control:
    pid = fork();
    if (pid == 0) { /* Do spawned (child) code */ }
    else          { /* Do spawning (parent) code */ }
    if (pid == 0) exit(0);
    else wait(0);
Note: detached processes run independently of their parents,
without joins
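
A compilable version of the pattern above; the work done in
each branch is an illustrative assumption:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        pid_t pid = fork();
        if (pid == 0)
            printf("child: spawned code\n");    /* spawned process  */
        else
            printf("parent: spawning code\n");  /* spawning process */

        if (pid == 0) exit(0);   /* child terminates         */
        else wait(NULL);         /* parent joins with child  */
        return 0;
    }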
14
Processes and Threads
  • Notes
  • Threads can be three orders of magnitude faster to
    create than processes
  • Thread-safe library routines can be used by multiple
    concurrent threads
  • Synchronization uses shared variables

15
Example Program (summing numbers)
  • Heavyweight UNIX processes (Section 8.7.1)
  • Pseudo code
  • Create semaphores
  • Allocate and attach shared memory
  • Load the array with numbers
  • Fork child processes
  • IF parent THEN sum the parent's section
  • ELSE sum the child's section
  • P(semaphore); add to global sum; V(semaphore)
  • IF child THEN terminate ELSE join
  • Print results
  • Release semaphores, detach and release shared memory

Note: the Java and Pthreads versions require about half
the code
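
For comparison, a sketch of the same program with Pthreads;
the array size, thread count, and names are assumptions, and a
mutex stands in for the semaphore around the global sum:

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000
    #define NTHREADS 4

    int  array[N];
    long sum = 0;
    pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

    void *partial_sum(void *arg) {
        long id = (long)arg, local = 0;
        for (int i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
            local += array[i];
        pthread_mutex_lock(&sum_lock);    /* P(semaphore) equivalent */
        sum += local;                     /* add to global sum       */
        pthread_mutex_unlock(&sum_lock);  /* V(semaphore) equivalent */
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < N; i++) array[i] = i;      /* load array */
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, partial_sum, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);                  /* join       */
        printf("sum = %ld\n", sum);                    /* print result */
        return 0;
    }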
16
Modify Existing Language Syntax
Example Constructs
  • Declaration of a shared memory variable
        shared int x;
  • Specify statements to execute concurrently
        par { s1(); s2(); s3(); ... sn(); }
  • Iterations assigned to different processors
        forall (i = 0; i < n; i++) { /* code */ }
  • Examples: High Performance Fortran and C

17
Compiler Optimizations
  • The following works because the iterations are
    independent:
        forall (i = 0; i < P; i++)
            a[i] = 0;
  • Bernstein's conditions
  • Outputs from one processor cannot be inputs to another
  • Outputs from the processors cannot overlap
  • Example: a = x + y; and b = x + z; are okay to execute
    simultaneously (see the sketch below)
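
A small illustration of the conditions; the functions and the
second, violating pair of statements are hypothetical additions:

    void safe(int x, int y, int z, int *a, int *b) {
        /* Bernstein's conditions hold: the outputs *a and *b are
           disjoint, and neither is an input of the other statement,
           so the two assignments may execute simultaneously. */
        *a = x + y;
        *b = x + z;
    }

    void unsafe(int x, int y, int z, int *a, int *b) {
        /* Violates the conditions: *a is an output of the first
           statement and an input of the second, forcing sequential
           execution. */
        *a = x + y;
        *b = *a + z;
    }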

18
Java Threads
  • Instantiate and run a thread
        ThreadClass t = new ThreadClass();
        t.start();
  • Thread class
        class ThreadClass extends Thread {
            public ThreadClass() { /* Constructor */ }
            public void run() {
                while (true) {
                    // yield or sleep periodically.
                    // thread code executed here.
                }
            }
        }

19
Pthreads
IEEE POSIX 1003.1c (1995): a standardized, UNIX-based C API
  • Advantages
  • An industry-standardized interface that replaces
    vendor-proprietary APIs
  • Thread creation, synchronization, and context switching
    are implemented in user space without kernel
    intervention, which is inherently more efficient than
    kernel-based thread operations
  • A user-level implementation provides the flexibility to
    choose a scheduler that best suits the application,
    independent of the kernel scheduler
  • Drawbacks
  • Poor locality limits performance when accessing shared
    data across processors
  • The Pthreads scheduler has not proven well suited to
    managing large numbers of threads
  • Shared memory multithreaded programs typically follow
    the SPMD model
  • Most parallel programs are still coarse-grained in
    design
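
A minimal sketch of the Pthreads API the slide describes,
showing creation, a join, and a detached thread; the work
function is an illustrative assumption:

    #include <pthread.h>
    #include <stdio.h>

    void *work(void *arg) {
        printf("thread %ld running\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t joined, detached;

        pthread_create(&joined, NULL, work, (void *)1L);
        pthread_join(joined, NULL);   /* wait for termination */

        pthread_create(&detached, NULL, work, (void *)2L);
        pthread_detach(detached);     /* runs independently; no join */

        pthread_exit(NULL);  /* exit main without killing the
                                detached thread */
    }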

20
Performance Comparisons
Pthreads versus kernel threads
[Table of timing results not recoverable from the transcript]
  • Real: wall clock time (actual elapsed time)
  • User: time spent in user mode
  • Sys: time spent in the kernel on behalf of the process
21
Compiler Extensions (OpenMP)
  • Extensions for C/C++, Fortran, and Java (JOMP)
  • Consists of compiler directives, library routines, and
    environment variables
  • A recognized industry standard, developed in the late
    1990s
  • Designed for shared memory programming
  • Uses the fork-join model, but with threads
  • Parallel sections of code are executed by teams of
    threads
  • General Syntax
  • C: #pragma omp <directive>
  • JOMP: //omp <directive>
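
A minimal C sketch of the fork-join model with an OpenMP
directive; compile with -fopenmp, and the summed range is an
illustrative assumption:

    #include <stdio.h>

    int main(void) {
        long sum = 0;

        /* The directive forks a team of threads; iterations are
           divided among them, and the reduction combines the
           private partial sums at the implicit join. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++)
            sum += i;

        printf("sum = %ld\n", sum);    /* 499500 */
        return 0;
    }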