Release Consistency - PowerPoint PPT Presentation

1
Release Consistency
  • Slides by Konstantin Shagin, 2002

2
The need for Relaxed Consistency Schemes
  • Any implementation of Sequential Consistency needs
    some global control mechanism.
  • Either writes or reads require memory
    synchronization operations.
  • In most implementations, writes require some kind
    of memory synchronization.

[Figure: timelines of processes A and B with the writes w(x), w(y), w(x).]
3
The Idea of Relaxed Consistency Schemes
  • Relaxed consistency schemes are designed to
    require fewer memory synchronization operations.
  • Writes can be delayed, aggregated, or eliminated.
  • This results in less communication and therefore
    higher performance.

[Figure: timelines of processes A and B with the writes w(x), w(y), w(x).]
4
Software Distributed Shared Memory
  • Page based
  • Permissions
  • Single system image
  • Shared virtual address space
5
False Sharing
  • False sharing is a situation in which two or
    more processes access different variables within
    a page and at least one of the accesses is a
    write.
  • If only one process is allowed to write to a page
    at a time, false sharing leads to unnecessary
    communication, called the ping-pong effect.
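
The ping-pong effect can be made concrete with a small counting model. This is only a sketch under simplifying assumptions: the function name `coherence_events`, the trace format, and the rule "a writer needs the only valid copy, a reader needs any valid copy" are illustrative choices, not part of the slides.

```python
def coherence_events(accesses, page_of):
    """Count misses and invalidations under a toy single-writer protocol.

    accesses: list of (process, op, variable) with op in {"r", "w"}.
    page_of:  maps each variable to the page that contains it.
    """
    holders = {}  # page -> set of processes holding a valid copy
    events = 0
    for proc, op, var in accesses:
        page = page_of[var]
        valid = holders.setdefault(page, set())
        if op == "w":
            # A writer needs the only valid copy of the page.
            if valid != {proc}:
                events += 1
                holders[page] = {proc}
        elif proc not in valid:
            # A reader without a valid copy must fetch the page.
            events += 1
            valid.add(proc)
    return events


# A writes x three times while B reads y three times, interleaved.
trace = [("A", "w", "x"), ("B", "r", "y")] * 3
same_page = coherence_events(trace, {"x": "p", "y": "p"})
separate_pages = coherence_events(trace, {"x": "p1", "y": "p2"})
```

With x and y on one page every access costs a coherence event, while placing them on separate pages leaves only the two initial misses.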

6
Understanding False Sharing
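[Figure, top: x and y lie on the same page p. A issues w(x) three times and B issues r(y) three times; page p is exchanged between A and B.]
[Figure, bottom: x lies on page p1 and y on page p2. A issues w(x) three times and B issues r(y) three times.]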
7
False Sharing in Relaxed Consistency Schemes
  • False sharing has much smaller overhead in
    relaxed consistency models.
  • The overhead induced by false sharing can be
    further reduced by using
    multiple-writer protocols.
  • Multiple-writer protocols allow multiple
    processes to simultaneously modify their local
    copy of a shared page.
  • The modifications are merged at certain points of
    execution.

8
Release Consistency [Gharachorloo et al. 1990, DASH]
  • Introduces a special type of variables, called
    synchronization variables or locks.
  • Locks cannot be read or written. They can only be
    acquired and released. For a lock L, these
    operations are denoted by acquire(L) and
    release(L) respectively.
  • We will say that a process that acquired a lock L
    but has not released it, holds the lock L.
  • No more than one process can hold a lock L. One
    process holds the lock while others wait.

9
Using Release and Acquire to define
execution-flow synchronization primitives
  • Let a set of processes release tokens by reaching
    the release operation in their program order.
  • Let another set (possibly overlapping) acquire
    those tokens by performing the acquire operation,
    where an acquire can proceed only when the tokens
    have arrived from all releasing
    processes.
  • 2-way synchronization: lock-unlock (1 release, 1
    acquire)
  • n-way synchronization: barrier (n releases, n
    acquires)
  • PARCs synch: k-way synchronization

10
Model of Atomicity
  • A read by Pi is considered performed with respect
    to process Pk at a point in time when the issuing
    of a write to the same address by Pk cannot
    affect the value returned by the read.
  • A write by Pi is considered performed with
    respect to process Pk at a point in time when an
    issued read to the same address by Pk returns the
    value defined by this write (or a later value).
  • An access is performed when it is performed with
    respect to all processes.
  • An acquire(L) by Pi is performed when Pi receives
    exclusive ownership of L (before any other
    requester).
  • A release(L) by Pi is performed when Pi gives away
    its exclusive ownership of L.

11
Formal Definition of Release Consistency
  • Conditions for Release Consistency:
  • (A) Before a read or write access is allowed to
    perform with respect to any other process, all
    previous acquire accesses must be performed, and
  • (B) Before a release access is allowed to perform
    with respect to any other process, all previous
    read or write accesses must be performed, and
  • (C) acquire and release accesses are sequentially
    consistent.

12
Understanding RC
[Figure: example execution on a shared variable X. The annotations read:]
  • From this point all processes must see the value
    1 in X.
  • It is undefined what value is read here. It can
    be any value written by some process. Here it can
    be 0 or 1.
  • 1 must be read according to rule (B), but the
    programmer cannot be sure of it.
  • The programmer is sure that this will return 1,
    according to rules (C) and (A).
13
Acquire and Release
  • release serves as a memory-synch operation, or a
    flush of the local modifications to the attention
    of all other processes.
  • According to the definition, the acquire and
    release operations are not only used for
    synchronization of execution, but also for
    synchronization of memory, i.e. for propagation
    of writes from/to other processes.
  • This makes it possible to overlap the two expensive kinds of
    synchronization.
  • It also turns out to be simpler for the programmer
    from a semantic point of view.

14
Acquire and Release (cont.)
  • A release followed by an acquire of the same lock
    guarantees to the programmer that all writes
    previous to the release will be seen by all reads
    following the acquire.
  • The idea is to let the programmer decide which
    blocks of operations need to be synchronized, and
    put them between a matching pair of acquire-release
    operations.
  • In the absence of release/acquire pairs, there is
    no assurance that modifications will ever
    propagate between processes.
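
The guarantee above can be illustrated with a toy model. This is a sketch under strong assumptions: the class `Process`, the shared dictionary `released`, and the choice to propagate writes only at release and acquire are hypothetical, and mutual exclusion on the lock itself is left out.

```python
class Process:
    """Toy process whose writes become visible elsewhere only via release/acquire."""

    def __init__(self, released):
        self.released = released  # dictionary shared by all processes (assumption)
        self.local = {}           # values this process can currently read
        self.pending = {}         # writes not yet released

    def write(self, var, value):
        self.local[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.local.get(var, 0)

    def release(self):
        # Make all earlier writes available to later acquirers.
        self.released.update(self.pending)
        self.pending.clear()

    def acquire(self):
        # Pull in released writes, then keep our own unreleased writes on top.
        self.local.update(self.released)
        self.local.update(self.pending)


released = {}
a, b = Process(released), Process(released)
a.write("x", 1)
before_release = b.read("x")   # B has no reason to see the write yet
a.release()
b.acquire()
after_acquire = b.read("x")    # release followed by acquire: the write is seen
```

Without the release/acquire pair, nothing in this model ever moves the write from A to B, which mirrors the last bullet.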

15
Consistency of synchronization operations
  • Note that the ordering of the release/acquire
    operations among themselves also defines an
    independent memory consistency scheme.
  • Rule (C) defines it to be Sequential
    Consistency.
  • There are other flavors of RC in which the
    consistency of synchronization operations is defined
    to be some consistency x (e.g., Coherence). Such
    a memory model is denoted by RCx.
  • RCx is weaker than RCy if x is weaker than y.
  • For simplicity, we deal only with RCsc.

16
Happened-Before relation induced by
acquire/release
  • Redefine the happened-before relation using
    acquire and release instead of receive and send
    respectively.
  • We say that event e happened before event e' (and
    denote it by $e \rightarrow e'$ or $e < e'$) if one of the
    following properties holds:

    Processor Order: e precedes e' in the same process.
    Release-Acquire: e is a release and e' is the following acquire of the same lock.
    Transitivity: there exists e'' such that $e < e''$ and $e'' < e'$.
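
The three rules can be turned into a small closure computation. The sketch below assumes events are plain strings and that the release-acquire pairs are given explicitly; the function name `happened_before` and the event names are made up for illustration.

```python
from itertools import product


def happened_before(program_order, release_acquire):
    """Return the set of ordered pairs (e1, e2) such that e1 happened before e2.

    program_order:   {process: [events in program order]}
    release_acquire: iterable of (release_event, following_acquire_event)
    """
    edges = set()
    # Processor order: earlier events of a process precede its later ones.
    for events in program_order.values():
        for i in range(len(events)):
            for j in range(i + 1, len(events)):
                edges.add((events[i], events[j]))
    # Release-acquire: a release precedes the following acquire of the same lock.
    edges |= set(release_acquire)
    # Transitivity: keep adding implied pairs until nothing changes.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(edges), repeat=2):
            if b == c and (a, d) not in edges:
                edges.add((a, d))
                changed = True
    return edges


hb = happened_before(
    {"A": ["A.w(x)", "A.rel(L)"], "B": ["B.acq(L)", "B.r(x)"]},
    [("A.rel(L)", "B.acq(L)")],
)
```

Here the write by A is ordered before the read by B only through the release-acquire pair on L.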
17
Happened-Before relation induced by
acquire/release (cont.)
[Figure: timelines of processes A, B and C with the operations acq(L1), rel(L2) and acq(L2).]
18
Competing Accesses
  • Two memory accesses are not synchronized if they
    are independent events according to the
    previously defined happened-before relationship.
  • Two memory accesses are conflicting if they are
    accesses to the same memory location, and at
    least one of them is a write.
  • Conflicting accesses are said to be competing if
    there exists an execution in which they are not
    synchronized.
  • Competing accesses form a race condition as they
    may be executed concurrently.
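
For a single recorded execution, these definitions can be checked mechanically. The sketch assumes each access is a dictionary with hypothetical keys `id`, `addr` and `op`, and that `hb` is a set of ordered pairs of access ids. It examines one execution only, whereas "competing" quantifies over all executions.

```python
def conflicting(a, b):
    """Same memory location and at least one of the two accesses is a write."""
    return a["addr"] == b["addr"] and "w" in (a["op"], b["op"])


def synchronized(a, b, hb):
    """The accesses are ordered, in either direction, by happened-before."""
    return (a["id"], b["id"]) in hb or (b["id"], a["id"]) in hb


def races(accesses, hb):
    """Return pairs of conflicting accesses that hb leaves unordered."""
    found = []
    for i, a in enumerate(accesses):
        for b in accesses[i + 1:]:
            if conflicting(a, b) and not synchronized(a, b, hb):
                found.append((a["id"], b["id"]))
    return found


accesses = [
    {"id": "A.w(x)", "addr": "x", "op": "w"},
    {"id": "B.r(x)", "addr": "x", "op": "r"},
    {"id": "B.r(y)", "addr": "y", "op": "r"},
]
unordered = races(accesses, set())
ordered = races(accesses, {("A.w(x)", "B.r(x)")})
```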

19
Data Races in RC
  • Release Consistency does not guarantee anything
    about propagation of updates without
    synchronization. Example:

Initially: grades = oldDatabase; updated = false

Thread T.A.:
    grades = newDatabase; updated = true
Thread Lecturer:
    while (updated == false);
    X = grades.gradeOf(lecturersSon)
  • If the modification of variable updated is passed
    to Lecturer, while the modification of grades is
    not, then Lecturer looks at the old database!
  • This is possible in Release Consistency, but not
    in Sequential Consistency.

20
Expressiveness of Release Consistency [Gharachorloo et al. 1990]
  • Let a properly-labeled (PL) program be one that
    has no competing accesses.
  • Theorem: RCsc = SC for PL programs.
  • The programmer should therefore make sure there are no data races.

21
Implementing RC
  • The first implementation was proposed by the
    inventors of RC and is called DASH.
  • DASH combats memory latency by pipelining writes
    to shared memory.
  • The processor is stalled only when executing a
    release, at which time it must wait for all its
    previous writes to perform.

22
Implementing RC (cont.)
  • It is important to reduce the number of messages
    exchanged, because every message has a
    fixed overhead, independent of its size.
  • Another implementation of RC, called Munin,
    reduces the number of messages by buffering
    writes until a release.

23
Eager Release Consistency [Carter et al. 1991, Munin]
  • Implementation of Release Consistency (not a new
    memory model).
  • Postpone sending modifications to the next
    release.
  • Upon a release send all accumulated modifications
    to all caching processes.
  • No memory-synchronization operations on an
    acquire.
  • Upon a miss (no local copy of the variable),
    get the latest modification from the latest modifier
    (this requires some extra bookkeeping to record its
    identity).

24
Understanding ERC
[Figure: timelines of processes A, B and C showing acq(L1), w(x)1, w(y)1, rel(L1), w(z)1, acq(L2), rel(L2), r(y)1 and r(z)1. At each release the changes (to x,y or to z) are sent to the other processes, which apply them.]
  • Release operation does not complete (is not
    performed) until the acknowledgements from all
    the processes are received.

25
Supporting Multiple Writers in ERC
  • Modifications are detected by twinning.
  • When writing to an unmodified page, its twin is
    created.
  • When releasing, the final copy of a page is
    compared to its twin.
  • The resulting difference is called a diff.
  • Twinning and diffing not only allow multiple
    writers, but also reduce communication.
  • Sending a diff is cheaper than sending an entire
    page.
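
Twinning and diffing can be sketched in a few lines. The sketch assumes a page is a `bytearray` and a diff is a dictionary from byte offset to new value; real systems typically encode diffs more compactly, and the function names are illustrative.

```python
def make_twin(page):
    """Copy the page before its first modification."""
    return bytearray(page)


def compute_diff(twin, page):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    return {i: page[i] for i in range(len(page)) if page[i] != twin[i]}


def apply_diff(page, diff):
    """Apply a diff in place to a copy of the page."""
    for offset, value in diff.items():
        page[offset] = value


# Two writers modify different bytes of their own copies of the same page.
original = bytearray(8)
copy_a, copy_b = bytearray(original), bytearray(original)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[0] = 1
copy_b[5] = 2
diff_a = compute_diff(twin_a, copy_a)
diff_b = compute_diff(twin_b, copy_b)

# Merging both diffs into a third copy keeps both modifications.
merged = bytearray(original)
apply_diff(merged, diff_a)
apply_diff(merged, diff_b)
```

Each diff carries a single byte here, compared with eight bytes for the whole page, which is the communication saving mentioned above.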

26
Twinning and Diffing
27
Update-based vs. Invalidate-based
  • In update-based protocols the modifications are
    sent whereas in invalidate-based protocol only
    notifications of modifications are sent.

[Figure: at rel(L), P1 sends P2 the new values (x=1, y=2) in the update-based protocol, and only the notification "I changed x and y" in the invalidate-based protocol.]
28
Update-Based vs. Invalidate-Based (cont.)
  • Invalidations are smaller than updates.
  • The bigger the coherency unit, the bigger the
    difference.
  • In invalidation-based schemes there can be
    significant overhead due to access misses.

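[Figure: invalidate-based example. At rel(L), P1 sends inv(x) and inv(y) to P2. After acq(L), the reads r(x) and r(y) at P2 miss and trigger get(x) and get(y), which return x=1 and y=2.]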
29
Reducing the Number of Messages
  • In DASH and Munin systems all processes (or all
    processes that cache the page) see the updates of
    a process.
  • Consider the following example of an execution in
    Munin:

[Figure: P1 executes w(x), rel(L); P2 executes acq(L), w(x), rel(L); P3 executes acq(L), w(x), rel(L); P4 executes acq(L), r(x).]
  • There are many unneeded messages, and even more
    in DASH.
  • This problem exists in invalidation-based schemes
    as well.

30
Reducing the Number of Messages (cont.)
  • Logically, however, it suffices to update each
    processor's copy only when it acquires L.

[Figure: P1 executes w(x), rel(L); P2 executes acq(L), w(x), rel(L); P3 executes acq(L), w(x), rel(L); P4 executes acq(L), r(x).]
  • Therefore, a new algorithm, called Lazy Release
    Consistency (LRC) for implementing RC was
    proposed.
  • LRC is aimed at reducing both the number of
    messages and the amount of data exchanged.

31
Lazy Release Consistency [Keleher et al., TreadMarks 1992]
  • The idea is to postpone sending of modifications
    until a remote processor actually needs them.
  • Invalidate-based protocol
  • The big advantage: there is no need to get modifications
    that are irrelevant because they are already
    masked by newer ones.
  • Note: LRC implements a slightly more relaxed memory
    model than RC!

32
Formal Definition of Lazy Release Consistency
  • Conditions for Lazy Release Consistency
  • Before a read or write access is allowed to
    perform with respect to any other process, all
    previous acquire accesses must be performed with
    respect to that other process, and
  • Before a release access is allowed to perform
    with respect to any other process, all previous
    read or write accesses must be performed with
    respect to that other process, and
  • acquire and release accesses are sequentially
    consistent.

33
Understanding the LRC Memory Model
[Figure: timelines of processes A, B and C.]
  • It is guaranteed that the acquirer of the same
    lock sees the modifications that precede the
    release in program order.

34
Understanding the LRC Memory Model Transitivity
  • The process C sees the modification of x by A.

35
Implementation of LRC
  • Satisfying the happened-before relationship
    between all operations is enough to satisfy LRC.
  • Maintenance and usage of such a detailed ordering
    would be expensive.
  • Instead, the ordering is applied to process
    intervals.
  • Intervals are segments of time in the execution
    of a single process.
  • A new interval begins each time a process executes
    a synchronization operation.

36
Intervals
[Figure: intervals of processes P1, P2 and P3.]
37
Happened-before of Intervals
  • A happened-before partial order is defined
    between intervals.
  • An interval i1 precedes an interval i2 according
    to happened-before of intervals, if all accesses
    in i1 precede accesses in i2 according to the
    happened-before of accesses.

38
Vector Timestamps
  • An interval is said to be performed at a process
    if all of the interval's accesses have been performed at
    that process.
  • Each process p has vector timestamp Vp that
    tracks which intervals have been performed at
    that process.
  • A vector timestamp consists of a set of interval
    indices, one per process in the system.

39
Management of Vector Timestamps
  • Vector timestamps are managed like vector clocks.
  • send and receive events are replaced by release
    and acquire (of the same lock) respectively.
  • A lock grant message (sent from the releaser
    to the acquirer to give the acquirer exclusive
    ownership) contains the current timestamp of the
    releaser.
  1. Just before executing a release or acquire in p:
    $V_p[p] \leftarrow V_p[p] + 1$
  2. A lock grant message m is time-stamped with
    $t(m) = V_p$.
  3. Upon acquire, for every q:
    $V_p[q] \leftarrow \max(V_p[q],\, t(m)[q])$
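
The three rules map directly onto a small class. This is a minimal sketch, assuming processes are numbered 0..n-1 and that the lock grant message simply carries a copy of the releaser's vector; the class and method names are hypothetical.

```python
class VectorTimestamp:
    """Vector timestamp of one process, managed like a vector clock."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def new_interval(self):
        # Rule 1: just before a release or acquire, advance the own entry.
        self.v[self.pid] += 1

    def grant_stamp(self):
        # Rule 2: the lock grant message carries the releaser's timestamp.
        return list(self.v)

    def on_acquire(self, stamp):
        # Rule 3: take the entry-wise maximum with the received timestamp.
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, stamp)]


releaser = VectorTimestamp(pid=0, n=2)
acquirer = VectorTimestamp(pid=1, n=2)

releaser.new_interval()            # releaser executes rel(L)
stamp = releaser.grant_stamp()     # timestamp sent with the lock grant

acquirer.new_interval()            # acquirer executes acq(L)
acquirer.on_acquire(stamp)
```

After the acquire, the acquirer's vector records that the releaser's first interval has been performed at the acquirer.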

40
Vector Timestamps (cont.)
  • A process updates its vector timestamp at the end
    of an interval. Therefore during an interval the
    process timestamp does not change.
  • We denote the vector timestamp of process p at
    interval i by $V_p^i$.
  • The entry for process $q \neq p$ is denoted by $V_p^i[q]$.
  • It specifies the most recent interval of process
    q that has been performed at process p.
  • Entry $V_p^i[p]$ is always equal to i.
  • An interval x of process q is said to be covered
    by $V_p^i$ if $V_p^i[q] \geq x$.

41
Write Notices
  • A write notice is an indication that a given page
    has been modified.
  • Each process keeps a table of intervals covered
    by it.
  • An entry in this table represents an interval. It
    contains a write notice for every page that was
    modified during the segment of time corresponding
    to the interval.
  • Write notices are sent in the lock grant message
    along with the vector timestamp of the releaser.

42
Write Notices (cont.)
  • It is not necessary to send to the acquirer the write
    notices belonging to intervals covered by its
    vector timestamp.
  • To let the releaser know which intervals are
    covered by the acquirer, the acquirer sends the
    releaser its timestamp inside the lock request
    message.
  • When the releaser sends a lock grant message to
    the acquirer, it sends only the write notices
    belonging to intervals covered by itself but not
    covered by the acquirer.
  • When the acquirer receives the lock grant
    message, it invalidates all the pages for which a
    write notice is included in the message.
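
The selection of write notices and the resulting invalidations can be sketched as follows. The table layout (a dictionary keyed by `(owner, interval)` holding the set of modified pages) and all function names are assumptions made for this example.

```python
def covered(vector, owner, interval):
    """True if interval number `interval` of process `owner` is covered by `vector`."""
    return vector[owner] >= interval


def notices_for_grant(releaser_table, releaser_vt, acquirer_vt):
    """Write notices for intervals the releaser covers but the acquirer does not."""
    return [
        (owner, interval, page)
        for (owner, interval), pages in sorted(releaser_table.items())
        if covered(releaser_vt, owner, interval)
        and not covered(acquirer_vt, owner, interval)
        for page in sorted(pages)
    ]


def invalidate(valid_pages, notices):
    """Pages that remain valid at the acquirer after applying the write notices."""
    return valid_pages - {page for _, _, page in notices}


# Releaser is process 0; the acquirer (process 2) already covers interval 1 of process 0.
table = {(0, 1): {"p1"}, (0, 2): {"p2"}, (1, 1): {"p3"}}
notices = notices_for_grant(table, releaser_vt=[2, 1, 0], acquirer_vt=[1, 0, 3])
still_valid = invalidate({"p1", "p2", "p3", "p4"}, notices)
```

The notice for p1 is not sent because the acquirer's timestamp, received in the lock request, already covers that interval.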

43
Write Notices (cont.)
[Figure: A executes acq(L), w(x), w(y), rel(L) and generates write notices. B sends a lock request for acq(L); A replies with the write notices for intervals not covered by VC_B, and B invalidates x,y according to the write notices. On r(y), B requests the diffs and receives them.]
44
Access Misses
  • When accessing an invalidated page, all the
    modifications made to it in the intervals that
    happened before the current interval must be
    obtained.
  • Note that this is true even if the access is a
    write.
  • A process can identify those intervals and the
    processes that performed the modification by the
    write notices it has for the page.
  • A write notice is saved along with the id of the
    process from which it was received and its vector
    timestamp.
  • How do we merge modifications performed by
    concurrent writers to a page?

45
Tracking Modifications with Multiple Writers
  • It is possible that several processes make
    modifications to different variables on the same
    page.
  • If the intervals in which the modifications are
    performed are independent (according to
    happened-before), we cannot just bring a page
    from one of the processes.
  • What should we do? Employ the twinning and
    diffing technique again!

46
Twinning and Diffing (reminder)
47
Tracking Modifications with Multiple Writers
(cont.)
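[Figure: page P holds the variables x and y. Timelines of processes P1, P2 and P3 with the operations acq(L1), r(x) and rel(L2) and the invalidations inv(P).]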
  • Note that twinning and diffing not only allows
    multiple independent writers but also
    significantly reduces the amount of data sent.

48
Access Misses (cont.)
  • Consider the following scenario, in which P3 has
    a miss on a page containing variables x, y and z

[Figure: P1 executes w(x), rel. P2 executes acq (receiving inv(x)), w(y), rel. P3 executes acq (receiving inv(x,y)) and then r(z), which obtains mod(x,y).]
  • When accessing z, P3 sees that according to the
    locally stored write notices there have been two
    previous modifications.
  • They are ordered by the happened-before relationship,
    therefore P3 can request both modifications from
    P2.

49
Access Misses (cont.)
  • More generally, if processor q modified page P in
    its interval x, then q is guaranteed to have all
    diffs of P created in intervals that
    happened before the interval x.
  • Therefore, even if diffs from multiple writers
    need to be retrieved, it is usually only
    necessary to communicate with very few
    processors.
  • How long should a process keep the diffs?
  • How long should a process keep the write notices?
  • Clearly, not forever! Garbage collection needs
    to be done.

50
Garbage Collection
  • A diff needs to be retained until it is clear it
    will never be requested.
  • This happens when a diff has already been sent to
    every processor.
  • When a process sees it is running out of memory
    it initiates garbage collection, which is invoked
    at the next barrier.
  • Garbage collection piggybacks on the barrier to
    stop the world. Each process receives all write
    notices in the system and uses them to validate
    all of its cached pages. As a result, all write
    notices and diffs are discarded.

51
Lazy Diffing
  • Don't diff on every release: do it only if
    there's communication with another node.
  • Delay generation of a diff until somebody asks
    for the lock.
  • When passing a lock token, send write notices for
    modified pages, but leave the pages write-enabled.
  • If somebody asks for a diff, create the diff and mark the page clean.
  • The diff may include updates from later intervals
    (e.g., under the scope of other locks).
  • A diff must also be generated if a write notice
    arrives.
  • The page must then be invalidated, but the modifications kept.
52
LRC with Lazy Diffing
[Figure: P1 performs w(x) (make twin); P2 performs acq(L) and r(x); a diff is sent from P1 to P2.]
53
Benefits of Lazy Diffing
  • The gain is considerable.
  • The eventual diff may include modifications that
    would have been split over several diffs.
  • Lock acquisitions are faster: no need to wait
    for diffs.
  • Reducing the number of diffs reduces overall
    amount of transmitted data.

54
Drawbacks of Traditional LRC
  • On an access miss a node may have to obtain
    diffs from several nodes.
  • This happens when there is substantial
    write-write false sharing.
  • The same diff may be applied many times:
    once at each node that fetches the diff.
  • The need to save all diffs seen by a node
    significantly increases memory consumption.
  • A node that creates a diff needs to store it
    locally.
  • A node also stores the diffs it fetched from other nodes.
  • Garbage collection is an expensive global
    operation.

55
Home-based Lazy Release Consistency
  • HLRC is a simple home-based multiple-writer
    protocol that implements LRC
  • Each page has a designated node, called the home
    node of the page, which contains its master copy.
  • Diffs are computed at lock transfer time, sent to
    the home nodes of the corresponding pages, and
    then discarded.
  • On access miss (read or write) an entire page is
    fetched from home.
  • HLRC solves the mentioned drawbacks of LRC.

56
Understanding HLRC
  • Assume x is a variable on a page p whose home is
    P3

[Figure: P1 performs w(x) (make twin) and sends a diff to the home P3, which holds p; P2 performs acq(L) and r(x).]
  • What happens if P2 tries to fetch p before the
    diff arrives at P3?

57
Guaranteeing Update Completion Before a Fetch
  • There are several techniques to ensure that the
    home's copy of a page contains the required
    updates:
  • Write flushing
  • Page versions (scalar timestamps)
  • Vector timestamps
  • All the techniques require that the network
    deliver messages in the order in which they are
    sent.

58
Write Flushing
  • The simplest approach.
  • Delay the completion of a release until
    all the updates have been propagated to the
    corresponding homes.
  • Completion is ensured by having the home
    acknowledge the receipt of diffs.

59
Write Flushing (cont.)
[Figure: P1 performs w(x) (make twin) and sends a diff to the home P3, which applies it to p; P2 performs acq(L) and r(x).]
  • There are two drawbacks:
  • Latency increases due to the need to wait for the
    completion of the update operation.
  • Page prefetching: a page fetched from home may
    contain more recent updates than the ones
    required by LRC.

60
Page prefetching
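[Figure: page prefetching example with processes P1 to P5. Diffs for x and y are sent to the home P3, which applies them to p. The operations shown are w(y), rel(L2), acq(L1), r(x), r(y) and acq(L2), together with several inv(p) notices. Annotation: no need to bring p again.]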
61
Page Versions
  • A version number is attached to each page.
  • Page version number is incremented at home
    whenever the home receives the update performed
    by a non-home writer within an interval.
  • The home sends the page version either in reply
    to a diff message, or in reply to a fetch request
    along with the page itself.
  • The page version numbers are included in the
    write notices.
  • A local page is not invalidated if the local page
    version is greater than or equal to the required
    page version included in the write notice.
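
The version check can be sketched with two small classes. This is an illustration only: the names `Home` and `Node`, the dictionary-based diff format, and the starting version 0 are assumptions rather than details taken from the slides.

```python
class Home:
    """Home node holding the master copy of one page and its version number."""

    def __init__(self, size):
        self.page = bytearray(size)
        self.version = 0

    def receive_diff(self, diff):
        # Apply the update of one remote interval and bump the version.
        for offset, value in diff.items():
            self.page[offset] = value
        self.version += 1
        return self.version          # sent back in reply to the diff message

    def fetch(self):
        return bytearray(self.page), self.version


class Node:
    """Non-home node caching the page."""

    def __init__(self):
        self.page, self.version, self.valid = None, -1, False

    def fetch_from(self, home):
        self.page, self.version = home.fetch()
        self.valid = True

    def on_write_notice(self, required_version):
        # Invalidate only if the local copy is older than the required version.
        if self.version < required_version:
            self.valid = False


home = Home(4)
v1 = home.receive_diff({0: 1})
v2 = home.receive_diff({1: 2})
node = Node()
node.fetch_from(home)        # the fetched copy already contains both updates
node.on_write_notice(v1)     # notice for version 1
node.on_write_notice(v2)     # notice for version 2
kept_valid = node.valid
```

Because the fetched copy is already at version 2, neither write notice forces a second fetch.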

62
Page Versions (cont.)
[Figure: the prefetching example with page versions, processes P1 to P5 and home P3. Diffs for x and y are applied at P3, the page is fetched as (p,2), and the write notices carry versions: inv(p,1) and inv(p,2). Annotation: p is not invalidated, because the version of the local copy is 2.]
  • Using page versions avoids unnecessary
    invalidations, but still requires waiting for
    updates to complete at home.

63
Vector Timestamps
  • Vector timestamps represent page versions, but
    avoid the need to wait for completion of updates.
  • The lock can be transferred immediately, because
    the vector timestamp representing the new page
    version can be calculated without cooperation of
    home.
  • Prefetching is detected in the same way it is
    detected with scalar page versions.

64
Vector Timestamps (cont.)
  • A vector timestamp is attached to a valid page
    and indicates the current version of the page.
  • Such a timestamp is called a flush timestamp.
  • At home, flush timestamps are updated each time
    updates corresponding to a remote interval are
    performed.
  • At a non-home node, flush timestamps are updated
    either
  • at the end of intervals during which the page was
    written, or
  • when the page is fetched from home.

65
Vector Timestamps (cont.)
  • A vector timestamp is attached to an invalid page
    and indicates the page version that the node has
    to fetch from home.
  • Such a timestamp is called a lock timestamp.
  • The lock timestamp is updated at acquire time as a
    result of applying the write notices received
    from the last releaser.
  • The lock timestamp is presented to the home as a
    part of a fetch request.
  • The home delays answering a fetch request if the
    required version is not available (because the
    corresponding updates are not completed).
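
The delayed reply at the home can be sketched as a comparison between the page's flush timestamp and the requester's lock timestamp. The class `HomePage`, its queue of waiting requests, and the use of lists as vectors are assumptions made for this example.

```python
def dominates(a, b):
    """True if vector a is entry-wise greater than or equal to vector b."""
    return all(x >= y for x, y in zip(a, b))


class HomePage:
    """Home-side state of one page: flush timestamp plus delayed fetch requests."""

    def __init__(self, n):
        self.flush_ts = [0] * n
        self.waiting = []   # (requester, lock timestamp) not yet answerable
        self.served = []    # requesters that have received the page

    def fetch(self, requester, lock_ts):
        if dominates(self.flush_ts, lock_ts):
            self.served.append(requester)
        else:
            # The required version is not available yet: delay the answer.
            self.waiting.append((requester, lock_ts))

    def apply_diff(self, writer, interval):
        # Updates of a remote interval were performed: advance the flush timestamp.
        self.flush_ts[writer] = max(self.flush_ts[writer], interval)
        still_waiting = []
        for requester, lock_ts in self.waiting:
            if dominates(self.flush_ts, lock_ts):
                self.served.append(requester)
            else:
                still_waiting.append((requester, lock_ts))
        self.waiting = still_waiting


home = HomePage(n=2)
home.fetch("P4", lock_ts=[1, 0])      # diff of process 0, interval 1 not applied yet
served_before = list(home.served)
home.apply_diff(writer=0, interval=1)
served_after = list(home.served)
```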

66
Vector Timestamps (cont.)
[Figure: the prefetching example with vector timestamps attached to the write notices inv(p, ...) and to the fetched page (p, ...), processes P1 to P5 and home P3. Annotations: p is not invalidated; the read is delayed until the home node P3 applies the diff from P1.]
67
Invalidation of a modified page
  • There are situations that require a modified
    page to be invalidated. Example:

A and B write to two different locations of the
page P (no data race)
[Figure: A performs acq(L), w(P), rel(L); B performs w(P) and then acq(L), at which point it receives inv(P).]
  • After invalidating P at acquire, B cannot discard
    the local copy of P, because it contains B's
    recent modifications.

68
Invalidation of a modified page (2)
  • What happens when B fetches P from its home? (Let
    C be the home of P.)

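How do we merge B's local copy with the fetched
one?
[Figure: A performs acq(L), w(P), rel(L) and sends diff(P) to the home C; B performs w(P), then acq(L) (receiving inv(P)), then r(P), which fetches P from C.]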
  • B's local copy is combined with the new one by
    two-way diffing.

69
Two-way diffing
  • Let $P_{old}$ be the modified invalidated copy and let
    $T_{old}$ be its twin.
  • Let $P_{fetched}$ be the fetched copy.
  • The new local copy $P_{new}$ is calculated by applying
    the modifications in $P_{old}$ to $P_{fetched}$:
  • $P_{new} = P_{fetched} + (P_{old} - T_{old})$
  • In addition, the twin of P is replaced by
    $P_{fetched}$:
  • $T_{new} = P_{fetched}$
  • Therefore, the next time the diff of P is
    calculated, both old and new modifications are
    detected.
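
Two-way diffing can be sketched on byte arrays. The sketch assumes pages are `bytearray` objects of equal length and that a modification is any byte differing from the twin; the function name `two_way_diff` is hypothetical.

```python
def two_way_diff(p_old, t_old, p_fetched):
    """Merge local modifications into a freshly fetched page.

    p_old:     the modified, invalidated local copy
    t_old:     its twin (the page before the local modifications)
    p_fetched: the copy fetched from the home
    Returns (p_new, t_new).
    """
    p_new = bytearray(p_fetched)
    for i in range(len(p_old)):
        if p_old[i] != t_old[i]:      # byte modified locally since the twin was made
            p_new[i] = p_old[i]
    t_new = bytearray(p_fetched)      # the fetched copy becomes the new twin
    return p_new, t_new


t_old = bytearray([0, 0, 0, 0])
p_old = bytearray([0, 7, 0, 0])       # B's local write at offset 1
p_fetched = bytearray([9, 0, 0, 0])   # home copy containing A's write at offset 0
p_new, t_new = two_way_diff(p_old, t_old, p_fetched)

# The next diff of the page is computed against the new twin.
next_diff = {i: p_new[i] for i in range(len(p_new)) if p_new[i] != t_new[i]}
```

The next diff still contains B's earlier write, because the new twin is the fetched page rather than the merged one.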