Implementation and Performance of Munin (Distributed Shared Memory System)

1
ECE 1147, Parallel Computation Oct. 30, 2006
Implementation and Performance of Munin
(Distributed Shared Memory System)
Dongying Li
(Original Authors J. B. Carter, et al.)
Department of Electrical and Computer Engineering,
University of Toronto
2
Distributed Shared Memory
  • Shared address space spanning the processors of a
    distributed memory multiprocessor

[Figure: proc1, proc2 and proc3 sharing the variable X0]
3
Distributed Shared Memory
[Figure: processors proc0 ... procN, each with a local memory mem0 ... memN, connected by a network and presenting one shared memory]
4
Distributed Shared Memory
  • Challenges
      • Performance comparable to shared memory
        programs
      • No significant deviation from the shared memory
        coding model
      • Low communication and message passing overheads

5
Munin System
  • Characteristic features
      • Software release consistency
      • Multiple consistency protocols
  • Deviations from the shared memory model
      • Shared variables annotated with their access pattern
      • All synchronization visible to the system

6
Contents
  • Basic concepts
      • Shared object
      • Software release consistency
      • Multiple consistency protocols
  • Software implementation
      • Prototype overview
      • Execution process
      • Advanced programming features
      • Data object directory and delayed update queue
      • Synchronization
  • Performance
  • Overview of other DSM systems
  • Conclusion

7
Basic Concepts

8
Shared Object
[Figure: 8-kilo pages holding the shared objects x and y]
9
Software Release Consistency
  • Sequential consistency
      • All processors observe the same order of operations
      • That order must correspond to some serial order
      • The only ordering constraint is that the reads/writes
        of each processor (say P1) appear in that processor's
        program order; there is no restriction on the
        relative ordering between processors
      • Synchronous read/write
      • Writes must be propagated before moving on to the
        next operation

10
Software consistency
  • Problems
      • Message passing overhead
      • False sharing

[Figure: timeline of repeated writes w(x) on one processor and reads r(x), r(y) on others]
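
The false-sharing problem listed above can be made concrete with a small sketch. It assumes that the "8-kilo" page of the shared-object slide means 8192 bytes; the addresses and the helper names `page_of` and `falsely_shared` are made up for illustration and are not part of Munin.

```python
PAGE_SIZE = 8 * 1024  # assumption: an "8-kilo" page is 8192 bytes


def page_of(addr):
    """Return the index of the page that contains byte address addr."""
    return addr // PAGE_SIZE


def falsely_shared(addr_a, addr_b):
    """Two distinct variables are falsely shared when they sit on the same page."""
    return addr_a != addr_b and page_of(addr_a) == page_of(addr_b)


# Hypothetical addresses: x and y share page 0, z starts page 1.
x_addr, y_addr, z_addr = 0x0010, 0x1F00, 0x2000
print(falsely_shared(x_addr, y_addr))  # True
print(falsely_shared(x_addr, z_addr))  # False
```

With page-granularity consistency, a write to x would disturb every reader of y even though the two variables are logically unrelated.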
11
Weak Consistency
  • Data modifications are propagated only at
    synchronization points.
  • Works correctly if the program is properly
    synchronized through system primitives.

[Figure: timeline of writes w(x) and reads r(x), r(y), with updates exchanged at the synch point]
12
Weak Consistency
[Figure: timeline of writes w(x) and reads r(y), r(x) around a synch point]
13
Software Release Consistency
  • A special form of weak consistency
  • Reduces message passing overhead
  • Two categories of shared variable operations
      • Ordinary access
          • Read
          • Write
      • Synchronization access (lock, semaphore, barrier)
          • Acquire
          • Release

14
Software Release Consistency
  • Before an ordinary access (read, write) is allowed,
    all previous acquires must have been performed
  • Before a release is allowed, all previous ordinary
    accesses must have been performed
  • Before an acquire is allowed, all previous releases
    must have been performed
  • Before a release is allowed, all previous acquires
    must have been performed
  • In short, the results of writes made before a release
    are propagated before the next processor acquires
    the released lock

15
Eager Release Consistency
  • Writes are propagated at release

16
Lazy Release Consistency
  • Writes are propagated at acquire
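
The difference between the eager and lazy variants is only the moment at which buffered writes move. The following toy model is a sketch under strong simplifications (a single lock, whole values instead of page diffs, no timestamps); `Node`, `Lock`, `release` and `acquire` are illustrative names, not Munin routines.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}   # local copy of the shared variables
        self.pending = {}  # writes made since the last release

    def write(self, var, value):
        self.memory[var] = value
        self.pending[var] = value


class Lock:
    def __init__(self):
        self.updates = {}  # lazy mode: writes parked until an acquire


def release(node, lock, others, eager):
    """Eager: push pending writes to every other node right now.
    Lazy: park them on the lock until some node acquires it."""
    if eager:
        for other in others:
            other.memory.update(node.pending)
    else:
        lock.updates.update(node.pending)
    node.pending = {}


def acquire(node, lock):
    """Lazy mode pulls the parked writes here; in eager mode nothing is parked."""
    node.memory.update(lock.updates)


p1, p2, p3 = Node("p1"), Node("p2"), Node("p3")
m = Lock()
p1.write("x", 1)
release(p1, m, [p2, p3], eager=False)
print(p2.memory.get("x"), p3.memory.get("x"))  # None None
acquire(p2, m)
print(p2.memory.get("x"), p3.memory.get("x"))  # 1 None
```

In the lazy run only the node that actually acquires the lock receives the write, whereas calling `release(..., eager=True)` would update p2 and p3 immediately.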

17
Multiple Consistency Protocols
  • No single consistency protocol suits every
    parallel program
  • Shared variables are accessed in different ways
    within a single program
  • Variable access patterns change during execution
  • Multiple protocols allow access-pattern-oriented
    tuning for different shared variables

18
Multiple Consistency Protocols
  • High-level sharing pattern annotations
      • Specified in the shared variable declaration
      • Combinations of low-level protocol parameters
  • Low-level protocol parameters
      • Specified in the shared variable directory
      • Each controls a specific aspect of the protocol

19
Protocol Parameters
  • I: Invalidate (rather than update)?
  • R: Replicas allowed?
  • D: Delayed operations allowed?
  • FO: Fixed owner?
  • M: Multiple writers allowed?
  • S: Stable access pattern?
  • FL: Flush changes to owner?
  • W: Writable? (or write protected?)

20
Sharing annotations
  • Read-only
      • Simplest pattern: once initialized, no further
        writes
      • Suitable for constants etc.
  • Migratory
      • Only one thread accesses the variable at any one
        time
      • Suitable for variables accessed only inside a
        critical section
  • Write-shared
      • Can be written concurrently by multiple threads
      • Different threads update different words of the
        variable
  • Producer-consumer
      • Written by only one thread and read by others
      • Replicate and update the object rather than
        invalidate it

21
Sharing annotations
  • Example: producer-consumer
      /* for some number of timesteps/iterations */
      for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
          temp[i][j] = 0.25 *
            (grid[i-1][j] + grid[i+1][j] +
             grid[i][j-1] + grid[i][j+1]);
      for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
          grid[i][j] = temp[i][j];

22
Sharing annotations
  • Reduction
      • Accessed through fetch-and-operate operations
        (read, write, then release)
      • Example: min(), a
  • Result
      • Phase 1: multiple writers allowed
      • Phase 2: a single thread (reading the result)
        accesses it exclusively
  • Conventional
      • Conventional invalidate-on-write protocol for
        shared variables
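
The fetch-and-operate access used by the reduction annotation above can be sketched with ordinary threads. This is only an analogy on one machine: a local `threading.Lock` stands in for the distributed lock, and `ReductionVar`, `fetch_and_op` and the sample values are hypothetical names and data.

```python
import threading


class ReductionVar:
    """A shared value that is only updated through an atomic fetch-and-op."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def fetch_and_op(self, op, operand):
        """Atomically replace the value with op(value, operand); return the old value."""
        with self._lock:                    # acquire
            old = self._value               # read
            self._value = op(old, operand)  # write
            return old                      # the lock is released on exit

    def read(self):
        with self._lock:
            return self._value


global_min = ReductionVar(float("inf"))
local_minima = [7, 3, 9, 5]  # hypothetical per-thread results

threads = [threading.Thread(target=global_min.fetch_and_op, args=(min, value))
           for value in local_minima]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(global_min.read())  # 3
```

Because each update is a single read-modify-write under the lock, the final minimum does not depend on the order in which the threads run.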

23
Sharing annotations
Sharing annotation    Protocol parameters
                      I    R    D    FO   M    S    FL   W
Read-only             N    Y    -    -    -    -    -    N
Migratory             Y    N    -    N    N    -    N    Y
Write-shared          N    Y    Y    N    Y    N    N    Y
Producer-consumer     N    Y    Y    N    Y    Y    N    Y
Reduction             N    Y    N    Y    N    -    N    Y
Result                N    Y    Y    Y    Y    -    Y    Y
Conventional          Y    Y    N    N    N    -    N    Y
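
One way to read the table above is as a lookup from annotation name to parameter settings. The sketch below copies the rows verbatim; the dictionary layout and the helper `protocol_for` are assumptions made for illustration, not the format Munin stores internally.

```python
PARAMS = ("I", "R", "D", "FO", "M", "S", "FL", "W")

# Rows copied from the table; "-" means the parameter does not apply.
ANNOTATIONS = {
    "read-only":         "N Y - - - - - N",
    "migratory":         "Y N - N N - N Y",
    "write-shared":      "N Y Y N Y N N Y",
    "producer-consumer": "N Y Y N Y Y N Y",
    "reduction":         "N Y N Y N - N Y",
    "result":            "N Y Y Y Y - Y Y",
    "conventional":      "Y Y N N N - N Y",
}

_DECODE = {"Y": True, "N": False, "-": None}


def protocol_for(annotation):
    """Expand an annotation name into a dict of parameter -> True/False/None."""
    flags = ANNOTATIONS[annotation].split()
    return dict(zip(PARAMS, (_DECODE[flag] for flag in flags)))


print(protocol_for("migratory")["I"])     # True  (invalidate)
print(protocol_for("write-shared")["M"])  # True  (multiple writers)
print(protocol_for("read-only")["W"])     # False (not writable)
```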
24
Software Implementation

25
Prototype Overview
  • A simple preprocessor that converts annotations to a
    suitable format
  • A linker that creates the shared memory segment
  • Library routines linked into the program
  • Operating system support for fault handling and
    page table manipulation

26
Execution Process
  • Compiling

[Figure: the Munin preprocessor turns the sharing annotations into an auxiliary file; the linker uses it to build the shared data description table and the shared data segment]
27
Execution Process
  • Initialization

[Figure: on P1 the Munin root thread starts the user root thread, which runs User_init(); P2 ... Pn each receive a copy of the code and the data segment and run a Munin worker thread]
28
Execution Process
  • Synchronization

[Figure: a synchronization operation issued by a user thread is handled by the Munin root thread on P1 and the Munin worker threads on P2 ... Pn]
29
Advanced Programming Features
  • Associate data and synch

[Figure: timelines of w(x), rel(m), the lock message (msg), acq(m) and r(x), with and without data associated with the lock m]
30
Advanced Programming Features
  • PhaseChange()
      • Changes the producer-consumer relationship
      • Example: adaptive mesh SOR
  • ChangeAnnotation()
      • Changes the access pattern during execution
  • Invalidate()
  • Flush()
  • SingleObject()
  • PreAcquire()

31
Data Object Directory
  • Start Address and Size
  • Protocol parameters
  • Object state (valid, writable, invalid)
  • Copyset (which remote processors hold copies)
  • Synchq (corresponding synchronization object)
  • Probable owner
  • Home node
  • Access control semaphore
  • Links
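
The directory fields listed above map naturally onto a record type. The following is a minimal sketch, assuming simple Python types for each field; the access control semaphore and links are left out, and `DirectoryEntry`, `contains` and `lookup` are hypothetical names rather than Munin's own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    start_address: int
    size: int
    protocol: dict                              # protocol parameter settings
    state: str = "invalid"                      # "valid", "writable" or "invalid"
    copyset: set = field(default_factory=set)   # remote processors holding a copy
    synchq: Optional[str] = None                # associated synchronization object
    probable_owner: Optional[int] = None
    home_node: Optional[int] = None

    def contains(self, addr):
        """True if addr falls inside this object's address range."""
        return self.start_address <= addr < self.start_address + self.size


def lookup(directory, addr):
    """Return the entry covering addr, or None (a plain linear scan for clarity)."""
    for entry in directory:
        if entry.contains(addr):
            return entry
    return None


directory = [
    DirectoryEntry(0x0000, 8192, {"I": False, "R": True}, home_node=0),
    DirectoryEntry(0x2000, 8192, {"I": True, "R": False}, home_node=1),
]
entry = lookup(directory, 0x2010)
print(entry.home_node, entry.state)  # 1 invalid
```

A fault handler would start from such a lookup: the faulting address identifies the object, and the entry's protocol parameters and state decide what to do next.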

32
Delayed Update Queue
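[Figure: the writes w(x) and w(y) made between acq(m) and rel(m) are recorded in the delayed update queue (x, y) and sent out at rel(m)]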
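
The idea of a delayed update queue can be sketched as a buffer that coalesces writes and is emptied at release. This is a simplified illustration that tracks whole values per variable; the class and method names, and the `send` callback, are assumptions made for the example.

```python
class DelayedUpdateQueue:
    def __init__(self):
        self._pending = {}  # variable -> latest value written

    def record_write(self, var, value):
        """Buffer a write instead of sending it right away.
        A later write to the same variable overwrites the earlier one."""
        self._pending[var] = value

    def flush(self, send):
        """Called at release: send one update per modified variable, then empty the queue."""
        sent = 0
        for var, value in self._pending.items():
            send(var, value)
            sent += 1
        self._pending.clear()
        return sent


messages = []
duq = DelayedUpdateQueue()
duq.record_write("x", 1)   # w(x)
duq.record_write("x", 2)   # w(x) again, coalesced with the first
duq.record_write("y", 7)   # w(y)
count = duq.flush(lambda var, value: messages.append((var, value)))  # rel(m)
print(count, messages)  # 2 [('x', 2), ('y', 7)]
```

Three writes produce two update messages, which is where the saving in message traffic comes from.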
33
Multiple Writer Handling
34
Multiple Writer Handling
35
Synchronization
  • Queue-based synchronization
  • Request, reply and lock-forwarding mechanism
  • AcquireLock(), Unlock(), WaitAtBarrier()
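
Queue-based locking with forwarding can be illustrated by a small state machine: waiters line up in a FIFO queue, and a release hands the lock straight to the first waiter. The sketch ignores the network entirely; `QueueLock` and its methods are hypothetical and only loosely mirror the AcquireLock()/Unlock() routines named above.

```python
from collections import deque


class QueueLock:
    """Lock state as kept by whichever node currently owns the lock."""

    def __init__(self):
        self.owner = None
        self.held = False
        self.waiters = deque()  # FIFO queue of nodes waiting for the lock

    def acquire(self, node):
        """Return True if node gets the lock now, False if it was queued."""
        if not self.held:
            self.owner, self.held = node, True
            return True
        self.waiters.append(node)
        return False

    def unlock(self, node):
        """Release the lock and forward it to the first waiter, if any.
        Returns the new holder, or None when nobody is waiting."""
        if not self.held or node != self.owner:
            raise RuntimeError("unlock by a node that does not hold the lock")
        if self.waiters:
            self.owner = self.waiters.popleft()  # forward ownership directly
            return self.owner
        self.owner, self.held = None, False
        return None


lock = QueueLock()
print(lock.acquire("p1"))  # True
print(lock.acquire("p2"))  # False
print(lock.acquire("p3"))  # False
print(lock.unlock("p1"))   # p2
print(lock.unlock("p2"))   # p3
print(lock.unlock("p3"))   # None
```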

36
Performance

37
Matrix Multiply
38
Matrix Multiply Optimized
39
SOR
40
Effect of Multiple Protocols
Protocol        Matrix Multiply   SOR
Multiple        72.41             27.64
Write-shared    75.59             64.48
Conventional    75.85             67.64
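
The size of the effect is easier to see as ratios. The sketch below assumes the figures in the table are execution times, so lower is better; the slide does not state the unit, and the helper `ratio` is introduced only for this calculation.

```python
times = {  # values copied from the table above
    "Multiple":     {"Matrix Multiply": 72.41, "SOR": 27.64},
    "Write-shared": {"Matrix Multiply": 75.59, "SOR": 64.48},
    "Conventional": {"Matrix Multiply": 75.85, "SOR": 67.64},
}


def ratio(program, baseline):
    """How many times larger the baseline figure is than the multiple-protocol figure."""
    return times[baseline][program] / times["Multiple"][program]


for program in ("Matrix Multiply", "SOR"):
    for baseline in ("Write-shared", "Conventional"):
        print(f"{program:16s} {baseline:13s} {ratio(program, baseline):.2f}x")
```

Under that assumption, a single protocol costs roughly 1.04x to 1.05x for matrix multiply but about 2.33x to 2.45x for SOR, so the benefit of mixing protocols depends strongly on the program's sharing pattern.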
41
Overview of Other DSM Systems

42
Overview of Other DSM Systems
  • Clouds: per-segment (object) based consistency
    protocol
  • Mirage: per-page based
  • Orca: reliable ordered broadcast protocol
  • Amber: user responsible for the data distribution
    among processors
  • Linda: shared variables in tuple space; atomic
    operations: insertion, removal, reading
  • Midway: uses entry consistency (a weaker
    consistency than release consistency)
  • DASH: hardware DSM

43
Conclusion
  • Objective: an efficient DSM system with a programming
    model similar to shared memory programming and small
    message passing overhead
  • Special features: multiple protocols, software
    release consistency
  • Implementation: synchronization realized by the Munin
    root thread and Munin worker threads

44
Thank you