1
TECHNIQUES FOR REDUCING CONSISTENCY-RELATED
COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS

J. B. Carter, University of Utah
J. K. Bennett and W. Zwaenepoel, Rice University

2
INTRODUCTION

Distributed shared memory is a software
abstraction allowing a set of workstations
connected by a LAN to share a single paged
virtual address space
Key issue in building a software DSM is
minimizing the amount of data communication among
the workstation memories

3
Why bother with DSM?

Key idea is to build fast parallel computers that
are cheaper than conventional architectures and
are convenient to use
Conventional parallel computer architecture was
the shared memory multiprocessor

4
Conventional parallel architecture
[Figure: four CPUs and one shared memory]

5
Today's architecture

Clusters of workstations are much more cost
effective
No need to develop complex bus and cache
structures
Can use off-the-shelf networking hardware
Gigabit Ethernet
Myrinet (1.5 Gb/s)
Can quickly integrate newest microprocessors

6
Limitations of cluster approach

Communication within a cluster of workstations is
through message passing
Much harder to program than concurrent access to
a shared memory
Many big programs were written for shared memory
architectures
Converting them to a message passing architecture
is a nightmare

7
Distributed shared memory
[Figure: the main memories of the workstations; DSM is one shared global address space]

8
Distributed shared memory

DSM makes a cluster of workstations look like a
shared memory parallel computer
Easier to write new programs
Easier to port existing programs
Key problem is that DSM only provides the
illusion of having a shared memory architecture
Data must still move back and forth among the
workstations

9
Characterizing a DSM (I)

Four important issues
1. Size of transfer units (level of granularity)
Big units are more efficient
Virtual memory pages
Can have false sharing whenever a page contains
different variables that are accessed at the same
time by different processors

10
False Sharing
[Figure: one processor accesses x while another accesses y; x and y are on the same page]
The page containing x and y will move back and
forth between the main memories of the workstations

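
A minimal sketch of the ping-pong effect, assuming a toy model in which any access by a node requires the whole page to be at that node. The function `count_page_transfers` and the node names are hypothetical and are not part of any DSM discussed here.

```python
def count_page_transfers(accesses, page_of):
    """Count page moves in a toy model.

    accesses: list of (node, variable) pairs in program order.
    page_of:  maps each variable to the page that holds it.
    """
    owner = {}       # page id -> node currently holding the page
    transfers = 0
    for node, var in accesses:
        page = page_of[var]
        if page in owner and owner[page] != node:
            transfers += 1      # the page must move to the accessing node
        owner[page] = node
    return transfers

# Node A only touches x and node B only touches y, in alternation.
accesses = [("A", "x"), ("B", "y")] * 4

same_page = count_page_transfers(accesses, {"x": 0, "y": 0})
separate_pages = count_page_transfers(accesses, {"x": 0, "y": 1})
print(same_page, separate_pages)
```

In this model the two nodes never touch the same variable, yet placing x and y on one page forces a transfer on almost every access, while placing them on separate pages forces none.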
11
Characterizing a DSM (II)

2. Consistency model
Strict consistency is not possible
Various authors have proposed weak consistency
models
Cheaper to implement
Harder to use in a correct fashion

12
Characterizing a DSM (III)

3. Portability of programs
Some DSMs allow programs written for a
multiprocessor architecture to run on a cluster
of workstations without any modifications (dusty
decks)
More efficient DSMs require more changes
4. Portability of DSM
Some DSMs require specific OS features

13
MUNIN

Developed at Rice University
Based on software objects (variables)
Uses the processor virtual memory to detect
access to the shared objects
Includes several techniques for reducing
consistency-related communication
Only runs on top of the V kernel

14
Key features

Software release consistency only requires the
memory to be consistent at specific
synchronization points,
Multiple consistency protocols allow the user to
select the best consistency protocol for each
data item,
Write-shared protocols reduce false sharing,
An update-with-timeout mechanism

15
SW RELEASE CONSISTENCY (I)

Well-written parallel programs use locks to
achieve mutual exclusion when they access shared
variables
P(mutex) and V(mutex)
lock(csect) and unlock(csect)
request( ) and release( )
Unprotected accesses can produce unpredictable
results

16
SW RELEASE CONSISTENCY (II)

SW release consistency will only guarantee
correctness of operations within a
request/release pair
No need to propagate new values of shared
variables until the release
Must guarantee that workstation has received the
most recent values of all shared variables when
it completes a request

17
SW RELEASE CONSISTENCY (III)

shared int x
request( )
x = 1
release( )
// propagate x = 1

shared int x
request( ) // wait for new value of x
x++
release( )
// propagate x = 2
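
The buffering behaviour behind this example can be sketched as follows. This is a toy model under stated assumptions: the classes `Node` and `ToyDSM` are hypothetical, locking itself is left out, and every node is assumed to hold a replica of every shared variable.

```python
class Node:
    """Hypothetical workstation with a local replica of the shared variables."""
    def __init__(self):
        self.memory = {}    # local copy of shared variables
        self.pending = {}   # writes buffered since the last release


class ToyDSM:
    def __init__(self, names):
        self.nodes = {name: Node() for name in names}
        self.messages = 0   # update messages sent so far

    def write(self, name, var, value):
        node = self.nodes[name]
        node.memory[var] = value
        node.pending[var] = value       # buffered, nothing is sent yet

    def release(self, name):
        node = self.nodes[name]
        if not node.pending:
            return
        for other_name, other in self.nodes.items():
            if other_name != name:
                other.memory.update(node.pending)
                self.messages += 1      # one message per remote replica
        node.pending = {}


dsm = ToyDSM(["A", "B"])
dsm.write("A", "x", 1)
dsm.write("A", "x", 2)
seen_before_release = dsm.nodes["B"].memory.get("x")
dsm.release("A")
seen_after_release = dsm.nodes["B"].memory.get("x")
print(seen_before_release, seen_after_release, dsm.messages)
```

Two writes to x inside the same request/release pair cost a single update message in this model, because only the final value is propagated at the release.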

18
SW RELEASE CONSISTENCY (IV)

Munin uses eager release: new values of shared
variables are propagated at release time
Lazy release delays propagation until a request
is issued (TreadMarks)
A workstation issuing a request gets the current
values of all shared variables
Shared variables are not associated with a
particular critical section (as they are in Midway)
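
A rough way to compare the two policies is to count update messages only. The sketch below is an assumption-laden model, not a description of Munin or of any lazy system: it ignores lock traffic, assumes every release has something to propagate, and uses hypothetical function names.

```python
def eager_release_messages(lock_holders, num_replicas):
    """Eager: each release pushes an update to every other replica."""
    return len(lock_holders) * (num_replicas - 1)


def lazy_release_messages(lock_holders):
    """Lazy: each request pulls updates from the previous holder only."""
    messages = 0
    previous = None
    for holder in lock_holders:
        if previous is not None and holder != previous:
            messages += 1
        previous = holder
    return messages


# The lock alternates between A and B; four workstations hold replicas.
holders = ["A", "B", "A", "B"]
eager = eager_release_messages(holders, num_replicas=4)
lazy = lazy_release_messages(holders)
print(eager, lazy)
```

In this model the eager policy also updates the two workstations that never take the lock, which is where the extra messages come from.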

19
Munin Implementation (I)

Three kinds of variables
Ordinary variables: can only be accessed by the
process that created them
Shared data variables: should always be
accessed from within critical regions
Synchronization variables:
locks, barriers or condition variables
must be accessed through special library
procedures

20
Munin Implementation (II)

When a processor modifies shared data inside a
critical region, all update messages are buffered
and delayed until the processor leaves the
critical region
Processes accessing shared data variables outside
critical regions do so at their own risk
Same as with the shared memory model
but the risk is higher

21
FOUR CONSISTENCY PROTOCOLS

1. Conventional shared variables
Replicated on demand
Single writer/multiple readers policy uses an
invalidation-based protocol
2. Read-only variables
Replicated on demand
Any attempt to modify them will result in a
runtime error

22
FOUR CONSISTENCY PROTOCOLS

3. Migratory variables
Migrated among the processes accessing them
Every process accessing them will always get full
read and write access
4. Write-shared variables
Can be updated concurrently because different
portions of the page are accessed
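
The first two protocols can be sketched as follows. This is a minimal illustration, assuming a hypothetical `SharedVariable` class and protocol names of my own choosing; it does not reproduce Munin's actual annotation syntax.

```python
class SharedVariable:
    """Toy shared variable supporting two of the four protocols."""

    def __init__(self, value, protocol="conventional"):
        self.value = value
        self.protocol = protocol    # "conventional" or "read_only"
        self.replicas = set()       # nodes holding a valid copy

    def read(self, node):
        self.replicas.add(node)     # replicated on demand
        return self.value

    def write(self, node, value):
        if self.protocol == "read_only":
            raise RuntimeError("attempt to modify a read-only variable")
        # Single writer: invalidate every copy except the writer's.
        self.replicas = {node}
        self.value = value


counter = SharedVariable(0)
counter.read("A")
counter.read("B")
counter.write("A", 1)
print(counter.replicas)             # only the writer keeps a valid copy

table = SharedVariable((1, 2, 3), protocol="read_only")
table.read("A")
table.read("B")
print(table.replicas)
```

Reads add replicas under both protocols; a write under the conventional protocol invalidates the other copies, and a write under the read-only protocol fails at run time.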

23
Implementation

Programmer uses annotations to specify any of the
last three consistency protocols
Read-only variables
Migratory variables
Write-shared variables
Incorrect annotations may result in poor
performance or in runtime errors but not in
incorrect results

24
WRITE-SHARED PROTOCOL (I)

Designed to fight false sharing
Uses a copy-on-write mechanism
Whenever a process is granted access to
write-shared data, the page containing these data
is marked copy-on-write
First attempt to modify the contents of the page
will result in the creation of a copy of the
original, unmodified page (the twin).

25
Example
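Before the first write access, the page holds x = 1 and y = 2
The first write access creates the twin, which also holds x = 1 and y = 2
After the write, the page holds x = 3 and y = 2
Comparing the page with its twin shows that the new value of x is 3
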
26
WRITE-SHARED PROTOCOL (II)

At release time, the DSM performs a word-by-word
comparison of the page and its twin, stores
the diff in the space used by the twin page, and
notifies all processors having a copy of the shared
data of the update
A runtime switch can be set to check for
conflicting updates to write-shared data.
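
The twin-and-diff steps can be sketched as follows, modelling a page as a list of words. The helper names and the second writer's value (y = 5) are assumptions made for illustration; the real mechanism works on virtual memory pages.

```python
def make_twin(page):
    """Copy the page before its first modification."""
    return list(page)


def compute_diff(page, twin):
    """Word-by-word comparison; returns {word index: new value}."""
    return {i: new for i, (new, old) in enumerate(zip(page, twin)) if new != old}


def apply_diff(page, diff):
    """Merge a diff received from another processor into a local page."""
    for index, value in diff.items():
        page[index] = value


# Word 0 holds x and word 1 holds y; both processors start with x = 1, y = 2.
page_a = [1, 2]
page_b = [1, 2]

twin_a = make_twin(page_a)
page_a[0] = 3                   # processor A writes x

twin_b = make_twin(page_b)
page_b[1] = 5                   # processor B writes y concurrently

diff_a = compute_diff(page_a, twin_a)
diff_b = compute_diff(page_b, twin_b)

conflicts = set(diff_a) & set(diff_b)   # words modified by both writers

apply_diff(page_b, diff_a)
apply_diff(page_a, diff_b)
print(page_a, page_b, conflicts)
```

Because each diff names only the words that changed, the two concurrent writers end up with identical pages without the page ever moving between them. An overlap between the two diffs is what a conflict check would look for.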

27
UPDATE TIME-OUT MECHANISM

Munin avoids sending updates indefinitely to
processors whose replicas are no longer in use
(stale replicas)
Anytime a processor receives an update for a page
for which it does not have a twin, the page is
marked supervisor-only and the time of receipt of
the update is recorded.
First local access to the page will cause a trap
that will remove the restriction

28
UPDATE TIME-OUT MECHANISM

When a process receives an update for a page that
is still marked supervisor-only, it checks the
timestamp of the last update
If more than 50 ms have elapsed, process notifies
the originator of the update not to send more
updates and invalidates the page.
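
The time-out rule can be sketched as follows. The `Replica` class is hypothetical, time is passed in explicitly in milliseconds, and the page is assumed to have no twin (it is not being modified locally).

```python
TIMEOUT_MS = 50     # threshold quoted above


class Replica:
    """Toy model of one processor's copy of a page."""

    def __init__(self):
        self.valid = True
        self.supervisor_only = False
        self.last_update_ms = None

    def local_access(self):
        # The first local access traps and removes the restriction.
        self.supervisor_only = False

    def receive_update(self, now_ms):
        """Return True if the sender should keep sending updates."""
        if not self.valid:
            return False
        if self.supervisor_only and now_ms - self.last_update_ms > TIMEOUT_MS:
            self.valid = False      # invalidate the unused copy
            return False            # tell the originator to stop
        self.supervisor_only = True
        self.last_update_ms = now_ms
        return True


idle = Replica()
idle_results = [idle.receive_update(t) for t in (0, 30, 100)]

active = Replica()
active.receive_update(0)
active.local_access()               # the page is used between updates
active_result = active.receive_update(100)
print(idle_results, idle.valid, active_result)
```

The idle replica is dropped once an update arrives more than 50 ms after the previous one with no local access in between, while the replica that was accessed keeps receiving updates.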

29
CONCLUSIONS (I)

The strongest point of Munin is its excellent
performance
typically within 5 to 33 percent of the performance of
hand-coded message passing versions of the same
programs
Its major limitation is its dependence on some
features of the V kernel

30
CONCLUSIONS (II)

Munin requires programs to access shared data
from within critical regions or after barriers
Appears to be a reasonable requirement
Munin allows users to tune the performance of
their programs by selecting the best consistency
protocol for each shared variable
Can quickly become a tedious process

31
FURTHER DEVELOPMENTS