Implementation and Performance of Munin (Distributed Shared Memory System)

1
ECE 1147, Parallel Computation, Oct. 30, 2006
Implementation and Performance of Munin
(Distributed Shared Memory System)
Dongying Li
(Original authors: J. B. Carter et al.)
Department of Electrical and Computer Engineering,
University of Toronto
2
Distributed Shared Memory
  • Shared address space spanning the processors of a
    distributed memory multiprocessor

[Figure: processors proc1, proc2, and proc3 all accessing the same shared variable X0]
3
Distributed Shared Memory
[Figure: processors proc0 ... procN with local memories mem0 ... memN, connected by a network and presented to programs as one shared memory]
4
Distributed Shared Memory
  • Design objectives
  • Performance comparable to that of shared memory
    programs
  • No significant deviation from the shared memory
    programming model
  • Low communication and message-passing overhead

5
Munin System
  • Distinguishing features
  • Software release consistency
  • Multiple consistency protocols
  • Same interface as the shared memory programming
    model
  • Threads, synchronization, data sharing, etc.
  • Deviations from the shared memory model
  • All shared variables annotated with their access
    pattern
  • Synchronization operations explicitly visible to
    the runtime system (essential for release
    consistency)

6
Contents
  • Basic concepts
  • Shared object
  • Software release consistency
  • Multiple consistency protocols
  • Software implementation
  • Prototype overview
  • Execution process
  • Advanced programming features
  • Data object directory and delayed update queue
  • Synchronization
  • Performance
  • Overview of other DSM systems
  • Conclusion

7
Basic Concepts

8
Shared Object
[Figure: shared objects x and y mapped onto 8 KB pages]
9
Software Release Consistency
  • Sequential consistency
  • All processors observe the same order of
    operations
  • That order must correspond to some serial
    interleaving
  • The only ordering constraint is that each
    processor's reads/writes (e.g., P1's) appear in
    program order; there is no restriction on how
    operations from different processors interleave
  • Synchronous reads/writes
  • Each write must be propagated before the
    processor moves on to its next operation
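To make the sequential consistency constraint concrete, the sketch below enumerates every interleaving of two short programs that keeps each processor's own program order and runs it against one shared memory. The two programs form the classic store-buffering litmus test and are illustrative, not taken from Munin; the point is that no serial order yields r1 = r2 = 0.

```python
from itertools import combinations

# Illustrative programs: each operation is ("w", var, value) or ("r", var, register).
P1 = [("w", "x", 1), ("r", "y", "r1")]
P2 = [("w", "y", 1), ("r", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        taken = set(slots)
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if k in taken else next(it_b) for k in range(total)]


def run(seq):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in seq:
        if op == "w":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]


sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # prints [(0, 1), (1, 0), (1, 1)]
```

Weaker models such as release consistency may allow (0, 0) for ordinary accesses, which is exactly the freedom that lets writes be buffered.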

10
Software Release Consistency
  • A weak consistency model
  • Reduces message-passing overhead
  • Two categories of shared variable operations
  • Ordinary accesses
  • Read
  • Write
  • Synchronization accesses (locks, semaphores,
    barriers)
  • Acquire
  • Release

11
Software Release Consistency
  • Before an ordinary access (read or write) is
    allowed, all previous acquires must have been
    performed
  • Before a release is allowed, all previous ordinary
    accesses must have been performed
  • Before an acquire is allowed, all previous
    releases must have been performed
  • Before a release is allowed, all previous acquires
    must have been performed
  • In short, the results of writes made before a
    release must be propagated before the next
    processor acquires the released lock

12
Release Consistency
  • Writes are propagated at release time
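A minimal sketch of write propagation at release, assuming an update-based protocol and leaving out the lock itself: ordinary writes touch only the local replica, and all buffered changes go out in one batch per replica when the writer releases. The `Node` class and its methods are hypothetical, not Munin's interface. Under sequential consistency each of the three writes below would have to be propagated before the next operation; here one message per peer carries them all.

```python
class Node:
    """A node holding a replica of shared data (hypothetical, simplified model)."""

    def __init__(self, name):
        self.name = name
        self.copy = {}      # local replica of the shared variables
        self.pending = {}   # writes made since the last release
        self.peers = []     # other nodes that hold replicas

    def write(self, var, value):
        # Ordinary write: update the local replica only; nothing is sent yet.
        self.copy[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.copy.get(var, 0)

    def release(self):
        """Propagate all buffered writes, one batched message per peer; return messages sent."""
        for peer in self.peers:
            peer.copy.update(self.pending)
        self.pending.clear()
        return len(self.peers)


a, b = Node("A"), Node("B")
a.peers, b.peers = [b], [a]
a.write("x", 1)
a.write("x", 2)
a.write("y", 3)
before = b.read("x")                # 0: B has not seen any of A's writes yet
messages = a.release()              # 1 message carries all buffered writes
after = (b.read("x"), b.read("y"))  # (2, 3)
```

This pushes updates eagerly at the release; a program still has to acquire the lock before reading to be guaranteed, under release consistency, that it sees them.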

13
Multiple Consistency Protocols
  • No single consistency protocol suits all parallel
    programs
  • Shared variables are accessed in different ways
    within a single program
  • A variable's access pattern may change during
    execution
  • Multiple protocols allow each shared variable's
    protocol to be tuned to its access pattern

14
Multiple Consistency Protocols
  • High-level sharing pattern annotations
  • Specified in the shared variable declaration
  • Each maps to a combination of low-level protocol
    parameters
  • Low-level protocol parameters
  • Recorded in the data object directory
  • Each controls one specific aspect of the protocol

15
Protocol Parameters
  • I: Invalidate (rather than update) other copies
    after a modification?
  • R: Replicas allowed on other nodes?
  • D: Delayed operations (updates, invalidations)
    allowed?
  • FO: Fixed owner (no writes at other nodes)?
  • M: Multiple writers allowed?
  • S: Stable sharing pattern (accessed by a fixed set
    of threads)?
  • FL: Flush changes to the owner and invalidate the
    local copy?
  • W: Writable?

16
Sharing annotations
  • Read-only
  • Simplest pattern: once initialized, no further
    writes
  • Suitable for constants, etc.
  • Migratory
  • Accessed by only one thread at a time
  • Suitable for variables accessed only inside
    critical sections
  • Write-shared
  • Can be written concurrently by multiple threads
  • Different threads update different words of the
    object
  • Producer-consumer
  • Written by one thread and read by others
  • Replicate the object and update the copies rather
    than invalidating them

17
Sharing annotations
  • Example: producer-consumer
      for some number of timesteps/iterations:
          for (i = 1; i < n - 1; i++)
              for (j = 1; j < n - 1; j++)
                  temp[i][j] = 0.25 *
                      (grid[i-1][j] + grid[i+1][j] +
                       grid[i][j-1] + grid[i][j+1]);
          for (i = 1; i < n - 1; i++)
              for (j = 1; j < n - 1; j++)
                  grid[i][j] = temp[i][j];
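The loop above can be run as written in a short Python sketch (interior points only, so no index falls off the grid). Assuming the rows are split into contiguous bands, one per worker, a partitioning chosen here for illustration, the helper `foreign_rows_read` shows why `grid` is producer-consumer data: each worker writes only its own rows, but its neighbours read the rows at the edges of its band on every iteration. Both function names are hypothetical.

```python
def sor_step(grid):
    """One sweep of the slide's update: each interior point becomes the mean of its 4 neighbours."""
    n = len(grid)
    temp = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1])
    return temp


def foreign_rows_read(n, workers):
    """For a block-row partition, map each worker to the rows it reads but does not own."""
    band = n // workers  # assumes workers divides n evenly
    result = {}
    for w in range(workers):
        owned = set(range(w * band, (w + 1) * band))
        updated = [i for i in owned if 1 <= i <= n - 2]
        read = {k for i in updated for k in (i - 1, i, i + 1)}
        result[w] = read - owned
    return result


print(sor_step([[1, 1, 1], [1, 0, 1], [1, 1, 1]])[1][1])  # 1.0
print(foreign_rows_read(8, 2))  # {0: {4}, 1: {3}}
```

Each boundary row has exactly one producer and one consumer in every iteration, which is the stable pattern the producer-consumer annotation is meant to exploit by updating, rather than invalidating, the consumer's copy.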

18
Sharing annotations
  • Reduction
  • Accessed via fetch-and-operate sequences (acquire,
    read, write, then release)
  • Example: a global min()
  • Result
  • Phase 1: multiple threads may write concurrently
  • Phase 2: a single thread accesses the result
    exclusively
  • Conventional
  • Conventional write-invalidate protocol for shared
    variables
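A minimal sketch of a reduction object, using a local thread lock in place of Munin's distributed lock: each worker computes a local minimum and folds it into the shared value with one fetch-and-min (acquire, read, write, release). `ReductionMin` and `fetch_and_min` are hypothetical names.

```python
import threading


class ReductionMin:
    """A shared minimum updated by lock-protected fetch-and-min (illustrative)."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def fetch_and_min(self, candidate):
        # acquire, read, write, release: the whole read-modify-write is atomic
        with self._lock:
            old = self._value
            if candidate < old:
                self._value = candidate
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def worker(shared, items):
    shared.fetch_and_min(min(items))


shared = ReductionMin(float("inf"))
chunks = [[7, 3, 9], [5, 8], [4, 2, 6]]
threads = [threading.Thread(target=worker, args=(shared, c)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.value)  # 2
```

Each worker touches the shared object once, so the cost is one lock round trip per worker rather than one per element.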

19
Sharing annotations
[Figure: per-thread timelines of w(x) and r(x) accesses illustrating the sharing patterns]
20
Sharing annotations
                     Protocol parameters
Sharing annotation   I  R  D  FO  M  S  FL  W
Read-only            N  Y  -  -   -  -  -   N
Migratory            Y  N  -  N   N  -  N   Y
Write-shared         N  Y  Y  N   Y  N  N   Y
Producer-Consumer    N  Y  Y  N   Y  Y  N   Y
Reduction            N  Y  N  Y   N  -  N   Y
Result               N  Y  Y  Y   Y  -  Y   Y
Conventional         Y  Y  N  N   N  -  N   Y
(Y = yes, N = no, - = not applicable)
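The annotation table can be encoded directly as data, which makes it easy to ask which annotations share a property. This is only a lookup sketch of the table above; the dictionary layout and the `with_param` helper are hypothetical, and `None` stands for "-".

```python
PARAMS = ("I", "R", "D", "FO", "M", "S", "FL", "W")

# Rows copied from the table; "-" becomes None.
TABLE = {
    "read-only":         "N Y - - - - - N",
    "migratory":         "Y N - N N - N Y",
    "write-shared":      "N Y Y N Y N N Y",
    "producer-consumer": "N Y Y N Y Y N Y",
    "reduction":         "N Y N Y N - N Y",
    "result":            "N Y Y Y Y - Y Y",
    "conventional":      "Y Y N N N - N Y",
}

FLAG = {"Y": True, "N": False, "-": None}

PROTOCOLS = {
    name: dict(zip(PARAMS, (FLAG[c] for c in row.split())))
    for name, row in TABLE.items()
}


def with_param(param, value=True):
    """Sorted list of annotations whose protocol sets `param` to `value`."""
    return sorted(n for n, p in PROTOCOLS.items() if p[param] is value)


print(with_param("M"))  # ['producer-consumer', 'result', 'write-shared']
print(with_param("I"))  # ['conventional', 'migratory']
```

For example, the query on M confirms that only write-shared, producer-consumer, and result objects admit multiple concurrent writers, while I singles out the two invalidation-based protocols.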
21
Software Implementation

22
Prototype Overview
  • A simple preprocessor that converts the sharing
    annotations into a suitable format
  • A linker that creates the shared memory segment
  • Library routines linked into each program
  • Operating system support for page fault handling
    and page table manipulation

23
Execution Process
  • Compiling

[Figure: the Munin preprocessor turns the sharing annotations into auxiliary files; the linker uses them to build the shared data description table and the shared data segment]
24
Execution Process
  • Initialization

[Figure: P1 runs the Munin root thread and the user root thread, which calls User_init(); P2 ... Pn each receive a code copy and data segment and run a Munin worker thread]
25
Execution Process
  • Synchronization

[Figure: synchronization operations issued by user threads on P1 ... Pn are handled by the Munin root thread and Munin worker threads]
26
Advanced Programming Features
  • Associating shared data with a synchronization
    object

[Figure: timelines of acq(m), r(x)/w(x), and rel(m), with the lock-transfer messages (msg) between threads]
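When a shared object is associated with a lock, its current value can be sent along with the lock instead of being fetched after a later access miss. A back-of-the-envelope sketch of the saving, using an assumed cost model rather than measured Munin behavior: a remote lock costs one request and one grant, and each uncached object costs one fault request and one data reply. The function name is hypothetical.

```python
def messages_to_acquire_and_read(n_objects, associated):
    """Messages for a thread to acquire remote lock m, then read n_objects it protects.

    Hypothetical cost model: the lock costs a request plus a grant, and each
    uncached object costs a fault request plus a data reply, unless the object
    is associated with m, in which case it rides along on the grant.
    """
    lock_msgs = 2
    if associated:
        return lock_msgs
    return lock_msgs + 2 * n_objects


for k in (1, 4):
    print(k, messages_to_acquire_and_read(k, False), messages_to_acquire_and_read(k, True))
# prints "1 4 2" then "4 10 2"
```

The saving grows with the number of objects the lock protects, and it also removes the access-miss stall after the acquire.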
27
Advanced Programming Features
  • PhaseChange()
  • Changes the producer-consumer relationship
  • Example: adaptive mesh SOR
  • ChangeAnnotation()
  • Changes an object's sharing annotation (access
    pattern) during execution
  • Invalidate()
  • Flush()
  • SingleObject()
  • PreAcquire()

28
Data Object Directory
  • Start address and size
  • Protocol parameters
  • Object state (valid, writable, invalid)
  • Copyset (which remote nodes hold copies)
  • Synchq (corresponding synchronization object)
  • Probable owner
  • Home node
  • Access control semaphore
  • Links
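A sketch of one directory entry as a Python dataclass, plus an address lookup of the kind a page-fault handler needs to find the object that contains a faulting address. The field names follow the slide; the types, the `Directory` class, and the sorted-list lookup are assumptions for illustration, and the Links field is omitted because the Python list already links the entries.

```python
import threading
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """One data object directory entry (field names follow the slide; types are assumptions)."""
    start: int                      # start address
    size: int                       # size in bytes
    protocol: str                   # sharing annotation standing in for the protocol parameters
    state: str = "invalid"          # "valid", "writable" or "invalid"
    copyset: set = field(default_factory=set)  # remote nodes holding copies
    synchq: object = None           # associated synchronization object
    probable_owner: int = 0
    home_node: int = 0
    access_sem: threading.Semaphore = field(default_factory=threading.Semaphore)


class Directory:
    """Hypothetical directory kept sorted by start address."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e.start)
        self.starts = [e.start for e in self.entries]

    def lookup(self, addr):
        """Return the entry whose [start, start + size) range contains addr, or None."""
        i = bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.entries[i].start + self.entries[i].size:
            return self.entries[i]
        return None


d = Directory([
    DirectoryEntry(0x1000, 0x2000, "write-shared"),
    DirectoryEntry(0x4000, 0x100, "migratory"),
])
entry = d.lookup(0x1800)
entry.copyset.add(2)  # record that node 2 now holds a copy
```

Binary search keeps the lookup logarithmic in the number of objects; a real runtime might instead index by page number.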

29
Delayed Update Queue
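[Figure: writes w(x) and w(y) made between acq(m) and rel(m) are entered in the delayed update queue (x, y) and propagated at rel(m)]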
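A minimal sketch of a delayed update queue, assuming one entry per modified object: repeated writes to the same object between an acquire and a release overwrite the queued entry instead of adding a new one, and the whole queue is handed to a send callback at the release. `DelayedUpdateQueue`, `record_write`, and `flush_on_release` are hypothetical names, and real entries would carry the changes to an object rather than a single value.

```python
from collections import OrderedDict


class DelayedUpdateQueue:
    """Records which shared objects were modified since the last release (illustrative)."""

    def __init__(self):
        self._queue = OrderedDict()  # object name -> latest value, in first-write order

    def record_write(self, obj, value):
        # A repeated write to the same object overwrites its queued value
        # instead of adding another entry.
        self._queue[obj] = value

    def flush_on_release(self, send):
        """At rel(m), pass every queued object to send(obj, value) and empty the queue."""
        for obj, value in self._queue.items():
            send(obj, value)
        sent = len(self._queue)
        self._queue.clear()
        return sent


duq = DelayedUpdateQueue()
for obj, val in [("x", 1), ("x", 2), ("y", 5)]:
    duq.record_write(obj, val)
outbox = []
n = duq.flush_on_release(lambda o, v: outbox.append((o, v)))
print(n, outbox)  # 2 [('x', 2), ('y', 5)]
```

Three writes collapse into two propagated updates, one for x and one for y.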
30
Multiple Writer Handling
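Munin is commonly described as supporting multiple writers of write-shared objects by twinning and diffing: before the first local write, the page is copied (the twin), and at release the modified page is compared with its twin so that only the changed words are sent and merged into other copies. A minimal sketch under that description, with hypothetical helper names and a list of words standing in for a page:

```python
def make_twin(page):
    """On the first write to a write-shared page, keep a pristine copy (the twin)."""
    return list(page)


def diff(twin, page):
    """Word-level differences between a modified page and its twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, d):
    """Merge one writer's changed words into another copy of the page."""
    for i, value in d.items():
        page[i] = value


master = [0, 0, 0, 0]

# Two writers start from their own copies and modify different words.
copy_a, copy_b = list(master), list(master)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[0] = 11
copy_b[3] = 44

# At release each writer sends only its diff; merging both keeps every update.
for d in (diff(twin_a, copy_a), diff(twin_b, copy_b)):
    apply_diff(master, d)
print(master)  # [11, 0, 0, 44]
```

Because the writers touch different words, as the write-shared annotation assumes, the diffs never conflict and the merge order does not matter.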
31
Synchronization
  • Queue-based synchronization
  • Lock acquisition via request, reply, and forward
    messages
  • CreateLock(), AcquireLock(), ReleaseLock(),
    CreateBarrier(), WaitAtBarrier()
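A single-process simulation of one way a queue-based lock with request forwarding can work, assuming each node keeps a probable-owner pointer for the lock: a request is forwarded along probable-owner pointers until it reaches the current holder; a free lock is granted with one reply, while a held lock queues the requester and is handed to it on release. `QueueLock` and its message counting are hypothetical; this is not Munin's implementation of CreateLock(), AcquireLock(), or ReleaseLock().

```python
from collections import deque


class QueueLock:
    """Simulated distributed lock with probable-owner forwarding (hypothetical model)."""

    def __init__(self, n_nodes, initial_owner=0):
        self.probable_owner = [initial_owner] * n_nodes  # each node's guess at the owner
        self.owner = initial_owner  # node that currently owns the lock
        self.held = False           # whether the owner is inside the critical section
        self.waiters = deque()      # FIFO queue of nodes waiting for the lock
        self.messages = 0

    def _grant(self, new_owner):
        # The reply hands ownership to new_owner; both ends update their pointers.
        self.messages += 1
        self.probable_owner[self.owner] = new_owner
        self.probable_owner[new_owner] = new_owner
        self.owner = new_owner
        self.held = True

    def acquire(self, node):
        """Return True if node now holds the lock, False if it was queued."""
        if node == self.owner:
            if self.held:
                raise RuntimeError("lock is already held by this node")
            self.held = True  # re-acquire by the current owner: no messages
            return True
        hop = node
        while hop != self.owner:  # each hop is one request or forward message
            hop = self.probable_owner[hop]
            self.messages += 1
        if self.held:
            self.waiters.append(node)
            return False
        self._grant(node)
        return True

    def release(self, node):
        """Release the lock; return the node it was handed to, or None."""
        if node != self.owner or not self.held:
            raise RuntimeError("only the holder can release the lock")
        self.held = False
        if self.waiters:
            nxt = self.waiters.popleft()
            self._grant(nxt)
            return nxt
        return None


lk = QueueLock(3)
lk.acquire(0)
lk.release(0)
lk.acquire(1)           # request 1 -> 0, grant 0 -> 1: 2 messages
lk.acquire(2)           # request 2 -> 0, forwarded 0 -> 1: queued behind node 1
print(lk.release(1))    # 2: the lock goes straight to the queued node
```

Handing the lock directly to the next queued node on release avoids a fresh round of requests, which is the point of queueing waiters at the holder.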

32
Performance

33
Matrix Multiply
34
Matrix Multiply Optimized
35
SOR
36
Effect of Multiple Protocols
Protocol       Matrix Multiply   SOR
Multiple       72.41             27.64
Write-shared   75.59             64.48
Conventional   75.85             67.64
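Normalizing each column to the multiple-protocol result makes the comparison explicit. Assuming the figures are execution times (lower is better), forcing SOR onto a single protocol makes it about 2.3 to 2.5 times slower, while matrix multiply changes by less than 5%. The arithmetic:

```python
# Values copied from the table above.
results = {
    "Matrix Multiply": {"Multiple": 72.41, "Write-shared": 75.59, "Conventional": 75.85},
    "SOR": {"Multiple": 27.64, "Write-shared": 64.48, "Conventional": 67.64},
}

# Ratio of each protocol's figure to the multiple-protocol figure.
relative = {
    program: {proto: t / times["Multiple"] for proto, t in times.items()}
    for program, times in results.items()
}

for program, ratios in relative.items():
    for proto, r in ratios.items():
        print(f"{program:16s} {proto:13s} {r:.2f}x")
```

The printed ratios are 1.00, 1.04, and 1.05 for matrix multiply and 1.00, 2.33, and 2.45 for SOR.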
37
Performance Problem with Munin
  • Munin performs poorly for task-queue programs
    (e.g., TSP-Q, quicksort)
  • E.g., TSP speedup on 16 processors:
                code I   code II
        MPI     8.9      13.4
        Munin   6.0      8.9
  • The major overhead is the time threads spend
    waiting at the lock that protects the work queue,
    caused by transferring the whole work queue
    between threads

38
Overview of Other DSM Systems

39
Overview of Other DSM Systems
  • Clouds: per-segment (object-based) consistency
    protocol
  • Mirage: page-based
  • Orca: reliable ordered broadcast protocol
  • Amber: the user is responsible for distributing
    data among processors
  • Linda: shared variables live in a tuple space,
    accessed by atomic insertion, removal, and reading
  • Midway: entry consistency (weaker than release
    consistency)
  • DASH: hardware DSM
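Linda's model can be sketched in a few lines: out inserts a tuple, in removes a matching tuple, and rd reads one without removing it, each atomically and blocking until a match exists. This toy version runs threads in one process rather than on a distributed machine, and uses None as a wildcard in templates instead of Linda's typed formal parameters; `TupleSpace` and `in_` (named to avoid Python's `in` keyword) are hypothetical.

```python
import threading


class TupleSpace:
    """Toy Linda-style tuple space; None in a template acts as a wildcard (an assumption)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        # Insert a tuple and wake any blocked readers.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template):
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                t is None or t == v for t, v in zip(template, tup)
            ):
                return tup
        return None

    def rd(self, *template):
        # Return a matching tuple without removing it; block until one exists.
        with self._cond:
            return self._cond.wait_for(lambda: self._match(template))

    def in_(self, *template):
        # Atomically remove and return a matching tuple; block until one exists.
        with self._cond:
            found = self._cond.wait_for(lambda: self._match(template))
            self._tuples.remove(found)
            return found


ts = TupleSpace()
ts.out("count", 1)
print(ts.rd("count", None))   # ('count', 1), still in the space
print(ts.in_("count", None))  # ('count', 1), now removed

result = []
t = threading.Thread(target=lambda: result.append(ts.in_("job", None)))
t.start()
ts.out("job", 42)  # wakes the blocked in_()
t.join()
```

Because every operation holds the condition's lock while it matches and modifies the list, two concurrent in_() calls can never remove the same tuple.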

40
Conclusion
  • Objective: an efficient DSM system with a
    programming model similar to shared memory and
    small message-passing overhead
  • Special features: multiple consistency protocols,
    software release consistency
  • Implementation: synchronization realized by the
    Munin root thread and Munin worker threads

41
Thank you