Title: Consistency and Replication
1. Consistency and Replication
2. Update Propagation
- Three possibilities
- Propagate only a notification of an update
- Transfer data from one copy to another
- Propagate the update operation to other copies
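The three options can be contrasted in a small sketch. It assumes a hypothetical `Replica` class holding a key-value dictionary: `notify` only marks the copy stale (an invalidation), `data` ships the new value, and `operation` ships the update operation so every replica re-executes it. Operation transfer assumes deterministic operations and replicas that start from the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical replica holding a small key-value store."""
    data: dict = field(default_factory=dict)
    valid: dict = field(default_factory=dict)  # key -> is the local copy current?

    def invalidate(self, key):
        # Notification only: the copy is marked stale and must be fetched later.
        self.valid[key] = False

    def install(self, key, value):
        # Data transfer: the new value itself is shipped and stored.
        self.data[key] = value
        self.valid[key] = True

    def apply(self, key, op):
        # Operation transfer: the update operation is re-executed locally.
        self.data[key] = op(self.data.get(key))
        self.valid[key] = True


def propagate(mode, replicas, key, op, origin_value):
    """Perform op at the origin, then propagate it using the chosen mode."""
    new_value = op(origin_value)
    for r in replicas:
        if mode == "notify":
            r.invalidate(key)
        elif mode == "data":
            r.install(key, new_value)
        elif mode == "operation":
            r.apply(key, op)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return new_value
```

Notification sends the least data but makes the next read pay for a fetch; operation transfer sends little data but makes every replica redo the work.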
3. Pull versus Push Protocols
- Push-based approach
- Also called a server-based protocol
- Server initiates the transfer without the client asking for it
- Pull-based approach
- Transfer happens when the client asks for it
- Advantages depend on the type of workload
- Amount of data, frequency of updates, frequency of read-only operations
4. Pull versus Push Protocols

| Issue | Push-based | Pull-based |
|---|---|---|
| State of server | List of client replicas and caches | None |
| Messages sent | Update (and possibly fetch update later) | Poll and update |
| Response time at client | Immediate (or fetch-update time) | Fetch-update time |

- A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.
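The first two rows of the table can be made concrete with a toy model. This is a sketch under simplifying assumptions: caches are plain dictionaries, there is a single data item, and one method call counts as one message. The push server must keep a list of client caches; the pull server keeps nothing and pays two messages (poll and reply) per read.

```python
class PushServer:
    """Push-based: the server remembers every client cache it must update."""

    def __init__(self):
        self.value = None
        self.clients = []   # server state: list of client replicas/caches
        self.messages = 0

    def register(self, cache):
        self.clients.append(cache)

    def write(self, value):
        self.value = value
        for cache in self.clients:   # server initiates the transfer
            cache["value"] = value
            self.messages += 1       # one update message per client


class PullServer:
    """Pull-based: the server keeps no per-client state."""

    def __init__(self):
        self.value = None
        self.messages = 0

    def write(self, value):
        self.value = value           # no client is told about the write

    def poll(self, cache):
        self.messages += 2           # one poll request plus one reply
        cache["value"] = self.value
```

With push, clients see the new value immediately; with pull, a cache stays stale until its next poll, which is the fetch-update time in the last row.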
5. Lease Protocols
- Have the copy expire after a period of time
- A client can ask to renew a lease
- A short lease can be given to a client that only uses the item infrequently, so the server doesn't have to maintain state for as long
- Have to worry about different clocks!
- Fast client, slow server
- Fast server, slow client
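A minimal lease sketch, assuming hypothetical `LeaseServer` and `LeaseClient` classes and injectable clocks so that drift can be simulated. The server hands out a duration rather than an absolute time; the client starts its own countdown when the reply arrives and subtracts a safety margin (`skew_margin`, an assumed parameter), so its view of the lease stays conservative even if the two clocks disagree.

```python
import time


class LeaseServer:
    """Hands out time-limited leases on a data item (hypothetical API)."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases = {}  # client id -> expiry time on the server's clock

    def grant(self, client):
        self.leases[client] = self.clock() + self.lease_seconds
        return self.lease_seconds   # send a duration, not an absolute time

    renew = grant  # renewing simply issues a fresh lease

    def holders(self):
        """Clients the server must still push updates to; expired state is dropped."""
        now = self.clock()
        self.leases = {c: t for c, t in self.leases.items() if t > now}
        return set(self.leases)


class LeaseClient:
    def __init__(self, skew_margin, clock=time.monotonic):
        self.skew_margin = skew_margin  # safety margin for delay and clock drift
        self.clock = clock
        self.expiry = 0.0

    def accept(self, duration):
        # The client starts counting later than the server did, and its clock
        # may run faster or slower; shaving off a margin keeps it on the safe side.
        self.expiry = self.clock() + duration - self.skew_margin

    def cache_valid(self):
        return self.clock() < self.expiry
```

Once `holders()` no longer lists a client, the server can forget it, which is the state saving that short leases buy.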
6. Lease Protocols
- Three kinds of leases
- Age-based: leases are given out on data items depending on the last time the item was modified; for long-lasting data, this reduces the number of update messages
- Renewal-frequency based: a client that often asks for its cached copy to be refreshed receives a longer lease
- State-space overhead: the server lowers the lease time as it becomes overloaded, thus reducing the amount of state information it has to maintain
- In all of these cases, updates are pushed by the server as long as the lease has not expired.
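The three criteria can be folded into one lease-length function. The weights, the one-hour scale, the `[0, 1]` load measure, and the cap are all invented for illustration; only the direction of each effect comes from the slide.

```python
def lease_duration(base, seconds_since_modified, refresh_requests, server_load,
                   max_lease=3600.0):
    """Pick a lease length from the three criteria (hypothetical weights)."""
    # Age-based: data unmodified for a long time is likely to stay that way.
    age_factor = 1.0 + seconds_since_modified / 3600.0
    # Renewal-frequency based: clients that refresh often get longer leases.
    freq_factor = 1.0 + refresh_requests / 10.0
    # State-space overhead: shrink leases as the server fills up (load in [0, 1]).
    load_factor = max(0.0, 1.0 - server_load)
    return min(max_lease, base * age_factor * freq_factor * load_factor)
```

A fully loaded server hands out zero-length leases, which in effect turns the item back into pull-based access.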
7. Epidemic protocols
- Goal is to propagate updates to all replicas in as few messages as possible, based on the model of infectious diseases!
- A server is infective if it holds an update that it is willing to spread to other servers
- A server that has not been updated yet is susceptible
- An updated server that is not willing or able to spread its update is said to be removed
8. Epidemic protocols
- A popular propagation model is that of anti-entropy
- A server P picks another server Q at random and exchanges updates. Choices include:
- P only pushes its own updates to Q
- Not as rapid for spreading updates
- P only pulls in new updates from Q
- Works best when many servers are infective
- P and Q send updates to each other
- An initial push to several servers helps spread the update
- A variant is gossiping: if a server tries to spread a rumor to a server that already knows it, it becomes removed with probability 1/k
- See the papers on Spinglass for more information!
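Both models can be simulated in a few lines. This is a toy simulation under stated assumptions: server 0 starts as the only infective server, every server initiates one exchange per anti-entropy round, and peers are chosen uniformly at random with a seeded generator. The gossip function returns the fraction of servers that never hear the rumor.

```python
import random


def anti_entropy_rounds(n, mode, seed=0):
    """Rounds until all n servers hold the update (server 0 starts infective)."""
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    has_update = [False] * n
    has_update[0] = True
    rounds = 0
    while not all(has_update):
        rounds += 1
        snapshot = has_update[:]          # exchanges in a round use the old state
        for p in range(n):
            q = rng.choice([s for s in range(n) if s != p])
            if mode in ("push", "push-pull") and snapshot[p]:
                has_update[q] = True      # P pushes to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                has_update[p] = True      # P pulls from Q
    return rounds


def gossip(n, k, seed=0):
    """Rumor spreading: a server loses interest with probability 1/k whenever
    it contacts a server that already knows the rumor."""
    rng = random.Random(seed)
    knows = [False] * n
    knows[0] = True
    infective = {0}
    while infective:
        p = rng.choice(sorted(infective))
        q = rng.choice([s for s in range(n) if s != p])
        if knows[q]:
            if rng.random() < 1.0 / k:
                infective.discard(p)      # p becomes removed
        else:
            knows[q] = True
            infective.add(q)
    return knows.count(False) / n
```

Gossiping can stop before everyone is reached, which is why it is commonly backed up by an occasional anti-entropy pass.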
9. Remote-Write Protocols (1)
- Primary-based remote-write protocol with a fixed
server to which all read and write operations are
forwarded.
10. Remote-Write Protocols (2)
- The principle of a primary-backup protocol.
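A minimal in-process sketch of a primary-backup write, assuming hypothetical `Primary` and `Backup` classes where method calls stand in for network messages. The write is blocking: the primary updates its copy, tells every backup to update, waits for all acknowledgments, and only then reports completion to the client. Reads can be served by any backup.

```python
class Backup:
    def __init__(self):
        self.store = {}

    def update(self, key, value):
        self.store[key] = value
        return "ack"

    def read(self, key):
        return self.store[key]        # reads can be served locally


class Primary:
    """Fixed primary for a data item; all writes are forwarded here."""

    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value                              # update the primary copy
        acks = [b.update(key, value) for b in self.backups]  # tell backups to update
        if not all(a == "ack" for a in acks):                # wait for every ack
            raise RuntimeError("a backup did not acknowledge")
        return "write done"                                  # only now answer the client

    def read(self, key):
        return self.store[key]
```

Because the client waits for every backup, a write is slow, but any backup can answer a later read with the new value.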
11. Local-Write Protocols (1)
- Primary-based local-write protocol in which a
single copy is migrated between processes.
12. Local-Write Protocols (2)
- Primary-backup protocol in which the primary
migrates to the process wanting to perform an
update.
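The migrating-primary variant can be sketched the same way. The `Process` and `MigratingPrimary` names are hypothetical, a single item is tracked, and the first process is assumed to start as primary. Primary status moves only when a different process writes, so a run of writes by the same process stays local.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.store = {}


class MigratingPrimary:
    """Local-write primary-backup for one item: the primary moves to the writer."""

    def __init__(self, processes, key, value):
        self.processes = processes
        self.key = key
        self.primary = processes[0]   # assumption: first process starts as primary
        self.migrations = 0
        for p in processes:
            p.store[key] = value

    def write(self, writer, value):
        if writer is not self.primary:
            self.primary = writer          # primary status migrates to the writer
            self.migrations += 1
        writer.store[self.key] = value     # the update itself is performed locally
        for p in self.processes:           # afterwards, propagate to the backups
            if p is not writer:
                p.store[self.key] = value
```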
13. Quorum-Based Protocols
- Three examples of the voting algorithm
- A correct choice of read and write sets
- A choice that may lead to write-write conflicts
- A correct choice, known as ROWA (read one, write
all)
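Whether a choice is correct can be checked with the two voting constraints for N replicas, a read quorum of size N_R and a write quorum of size N_W: N_R + N_W > N prevents read-write conflicts, and N_W > N/2 prevents write-write conflicts. The sample values below are chosen for illustration.

```python
def classify_quorum(n, n_read, n_write):
    """Check the voting constraints for n replicas."""
    if n_read + n_write <= n:
        return "read-write conflict"   # a read quorum can miss the latest write
    if 2 * n_write <= n:
        return "write-write conflict"  # two write quorums can be disjoint
    return "correct"


# Illustrative values with 12 replicas:
print(classify_quorum(12, 3, 10))   # correct
print(classify_quorum(12, 7, 6))    # write-write conflict
print(classify_quorum(12, 1, 12))   # correct: ROWA
```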
14. TreadMarks
- TreadMarks: Shared Memory Computing on Networks of Workstations, by Cristiana Amza, Alan L. Cox, Sandhya Dwarkadas, Pete Keleher, Honghui Lu, Ramakrishnan Rajamony, Weimin Yu, Willy Zwaenepoel, IEEE Computer, 29(2), 1996.
- Rice University, early 1990s
- Network of workstations
- Goal: distributed shared memory
15. TreadMarks
- Runs at the user level in Unix
- Uses many techniques to reduce the communication overhead
- Lazy release consistency
- A multiple-writer protocol
- An API allows programs to create shared variables and to call synchronization primitives
16. Synchronization Primitives
- Two kinds
- Simple barrier: Tmk_barrier()
- Mutex lock:
- Tmk_lock_acquire()
- Tmk_lock_release()
17. Example program: Jacobi decomposition
- An application to solve partial differential equations
- y1 -x1 - x2 .... x3 ...
- y2 x1 x2 .... x3 .....
- Use a grid (matrix) to approximate the derivatives
- The initial configuration is Generation 1. You move to the next generation by modifying the first. Each position is modified according to its neighbors' values.
- Look at the guy above, below, to the right, and to the left, and calculate their numeric average
- When new results have been computed, put them in the temporary scratch grid, then swap grids and keep going until you're 'close enough'
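A sequential sketch of the generation-by-generation update, assuming fixed boundary values, a four-neighbor average for interior cells, and "close enough" meaning the largest change in one generation drops below a tolerance `tol` (an assumed parameter). Rebinding `grid` to the scratch result plays the role of swapping grids.

```python
def jacobi_step(grid):
    """One generation: each interior cell becomes the average of its 4 neighbors.
    Boundary cells are held fixed. Results go into a scratch grid."""
    rows, cols = len(grid), len(grid[0])
    scratch = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            scratch[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                             grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return scratch


def jacobi(grid, tol=1e-6, max_iters=10_000):
    """Iterate until the largest per-cell change is below tol."""
    for it in range(1, max_iters + 1):
        new = jacobi_step(grid)
        change = max(abs(new[i][j] - grid[i][j])
                     for i in range(len(grid)) for j in range(len(grid[0])))
        grid = new                       # "swap" the grids
        if change < tol:                 # close enough
            return grid, it
    return grid, max_iters
```

The update reads only the old grid, which is what makes each generation easy to split across processes.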
18. Example program: Jacobi decomposition
- If you have a really large grid, split the grid in half, assign p0 to one half and p1 to the other. Each computes its results, then they stop and exchange rows. Then go again.
- Each process does:
- do
  - compute my portion
  - Tmk_barrier()
  - get new grid
  - Tmk_barrier() // make sure everyone has fully copied to their new grid
- until done // until you're satisfied with your results
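The loop above can be imitated with threads on one machine, with Python's `threading.Barrier` standing in for `Tmk_barrier()`. This is an analogy, not TreadMarks code: the two grids are ordinary shared lists, the rows are split evenly among workers, and a fixed iteration count replaces the "until done" test to keep the sketch short. The first barrier ensures nobody copies while others still read the old grid; the second ensures nobody starts the next generation before every copy is finished.

```python
import threading


def parallel_jacobi(grid, iterations, nworkers=2):
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    scratch = [row[:] for row in grid]
    barrier = threading.Barrier(nworkers)      # stands in for Tmk_barrier()
    interior = list(range(1, rows - 1))
    chunk = (len(interior) + nworkers - 1) // nworkers

    def worker(w):
        mine = interior[w * chunk:(w + 1) * chunk]
        for _ in range(iterations):
            for i in mine:                     # compute my portion
                for j in range(1, cols - 1):
                    scratch[i][j] = (current[i - 1][j] + current[i + 1][j] +
                                     current[i][j - 1] + current[i][j + 1]) / 4.0
            barrier.wait()                     # everyone finished computing
            for i in mine:                     # get new grid
                current[i][:] = scratch[i]
            barrier.wait()                     # everyone finished copying

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current
```

Because every worker does exactly the same arithmetic on the same inputs, the result with two workers matches the single-worker result exactly.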
19. TreadMarks view of memory
- At each processor:
- Some portion of physical memory that is mapped to global shared memory
- Local memory (including cache)
- Kernel (OS memory)
20. TreadMarks compared to IVY
- IVY is a DSM system that uses sequential consistency and virtual memory on each workstation
- Memory is stored in pages
- Invalidations are sent out before writes to shared memory
- The next time an invalidated data item is accessed in IVY, a page fault is issued
21. TreadMarks compared to IVY
- For example, say processor 1 gets a page fault when it tries to access a page in its global virtual memory (the page is not there)
- This causes an interrupt to the OS
- A network message is sent to processor 2's global shared memory to get the page
- The mechanism then copies the page to some cache location in processor 1's local physical memory
22. Other problems with IVY
- False sharing
- Since memory is shared in units of a page, more than one process may write to the same page (but not to the same location), so the page bounces between them even though no data is actually shared
- There is a lot of overhead and communication for each DSM access
- Context switch to OS kernel
- Network messages
- Interrupt processing when a new page arrives
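The IVY behavior from the last three slides (invalidate before a write, fetch on a fault, whole-page granularity) fits in a toy model. This sketch assumes a hypothetical `IvyDSM` class with a tiny 4-word page, a single-writer owner per page, and a counter in place of network messages; it ignores how IVY actually locates a page's owner. The false-sharing case has two processes writing different words of the same page, and still every write after the first drags the whole page across.

```python
PAGE_SIZE = 4  # words per page; tiny on purpose


class IvyDSM:
    """Toy single-writer, write-invalidate page DSM in the spirit of IVY."""

    def __init__(self, nprocs, npages):
        # Process 0 initially owns every page and holds the only copy.
        self.local = [dict() for _ in range(nprocs)]   # proc -> {page: words}
        for pg in range(npages):
            self.local[0][pg] = [0] * PAGE_SIZE
        self.owner = [0] * npages
        self.copyset = [{0} for _ in range(npages)]
        self.page_transfers = 0
        self.invalidations = 0

    def _fault_in(self, proc, page):
        # Page fault: fetch a copy of the whole page from its owner.
        self.local[proc][page] = self.local[self.owner[page]][page][:]
        self.copyset[page].add(proc)
        self.page_transfers += 1

    def read(self, proc, addr):
        page, off = divmod(addr, PAGE_SIZE)
        if page not in self.local[proc]:
            self._fault_in(proc, page)
        return self.local[proc][page][off]

    def write(self, proc, addr, value):
        page, off = divmod(addr, PAGE_SIZE)
        if page not in self.local[proc]:
            self._fault_in(proc, page)
        for other in self.copyset[page] - {proc}:   # invalidate before writing
            del self.local[other][page]
            self.invalidations += 1
        self.copyset[page] = {proc}
        self.owner[page] = proc
        self.local[proc][page][off] = value


# False sharing: words 0 and 1 are distinct, but they live on the same page.
dsm = IvyDSM(2, 1)
for i in range(10):
    dsm.write(0, 0, i)
    dsm.write(1, 1, i)
print(dsm.page_transfers)   # 19: every write after the first moves the page
```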
23. TreadMarks
- To reduce communication and overhead:
- Use lazy release consistency
- Only communicate the data when it is requested
- Operate in user space
- Avoid overhead for context switches
- Adds responsibility to the programmer, since the programmer must be aware of the use of shared memory
- All synchronization must be done with the TreadMarks primitives
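The "only communicate when requested" idea can be illustrated with a much simplified model of lazy release consistency. Assumptions: a hypothetical `LazyReleaseDSM` class, one shared log per lock standing in for the write notices a releaser would keep, and none of the vector timestamps or intervals that real TreadMarks uses. Release sends nothing; the modifications reach another process only when it acquires the same lock.

```python
class LazyReleaseDSM:
    """Toy lazy release consistency: modifications travel with the lock."""

    def __init__(self, nprocs):
        self.mem = [dict() for _ in range(nprocs)]      # each process's local view
        self.pending = [dict() for _ in range(nprocs)]  # writes since last release
        self.lock_log = {}   # lock -> updates recorded by releasers (not yet sent)
        self.messages = 0

    def acquire(self, proc, lock):
        updates = self.lock_log.get(lock, {})
        if updates:
            self.messages += 1               # updates fetched only now, on demand
            self.mem[proc].update(updates)

    def write(self, proc, key, value):
        self.mem[proc][key] = value
        self.pending[proc][key] = value

    def read(self, proc, key):
        return self.mem[proc].get(key)

    def release(self, proc, lock):
        # Nothing is sent here: the release only records what changed.
        self.lock_log.setdefault(lock, {}).update(self.pending[proc])
        self.pending[proc].clear()
```

An eager scheme would instead broadcast at every release, paying for messages to processes that may never touch the data.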
24. TreadMarks
- To reduce the cost of false sharing, use a multiple-writer protocol
- Most systems use a single-writer protocol
- In this protocol, the writer owns the page and no one else can write to the page
- Other writers block waiting for access (causing delays in the application that may not be necessary)
- With a multiple-writer protocol, you wait to communicate updates until synchronization occurs (consistency traffic is deferred)
- Lowers the communication costs
25. Multiple Writer Protocol
- Idea
- When you read, you acquire a copy
- On the first write, you make a twin (another copy of the page) in system space, then you write to the page; the twin preserves the unmodified contents
26. Multiple Writer Protocol
- Suppose another process then requests the page:
- Compare the twin with the original page and make a diff (a record of exactly the words that changed)
- The diffs are then sent
- At the same time, discard the twin (since you have a record of the changes in the diff)
- Since a diff is usually much smaller than the whole page, the amount of communication is smaller
- The caveat is that you have to use the appropriate TreadMarks synchronization tools to ensure program correctness
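The twin-and-diff idea can be sketched with pages as Python lists. In this sketch writes go to the working page and the twin keeps the pre-write contents; the diff comes out the same whichever copy is treated as the twin. Diffs here are plain (index, new value) pairs rather than the encoding TreadMarks uses, and the helper names are hypothetical. Two processes write disjoint words of the same page concurrently, and applying both diffs merges their work without any ownership transfer.

```python
def make_twin(page):
    """Copy the page on the first write so its original contents are kept."""
    return page[:]


def make_diff(page, twin):
    """(index, new value) for every word that differs from the twin."""
    return [(i, w) for i, (w, t) in enumerate(zip(page, twin)) if w != t]


def apply_diff(page, diff):
    for i, w in diff:
        page[i] = w


original = [0] * 8

a = original[:]                 # process A's copy
twin_a = make_twin(a)
a[0], a[1] = 5, 6               # A writes words 0 and 1

b = original[:]                 # process B's copy of the same page
twin_b = make_twin(b)
b[6] = 9                        # B writes word 6

merged = original[:]
apply_diff(merged, make_diff(a, twin_a))
apply_diff(merged, make_diff(b, twin_b))
print(merged)                   # [5, 6, 0, 0, 0, 0, 9, 0]
```

The merge is only safe because the two writers touched different words; if they had written the same word, the program would need TreadMarks synchronization to order them, which is the caveat on the slide.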