Replication and Fault Tolerant - PowerPoint PPT Presentation

Provided by: SteveA195
1
Replication and Fault Tolerant
2
Introduction
  • Reasons for replication
  • Reliability
  • Maintain multiple copies: if one replica crashes,
    continue with another
  • Performance
  • Divide the work among multiple servers
  • Place data close to the process that uses it
  • Data access time is reduced
  • Any drawbacks? Cost?
  • Inconsistency?
  • Bank account
  • Accessing Web pages
  • Pages can be cached
  • The cache must be kept up to date at all times

3
Object Replication (1)
  • Organization of a distributed remote object
    shared by two different clients.
  • If remote objects are replicated, operations
    must be performed in the correct (same) order
    on all replicas
  • First, concurrent invocations on each replica
    must be handled correctly

4
How do you prevent concurrent access to
distributed objects?
  • Two choices
  • Let the object itself handle it: Java allows
    methods to be synchronized; in C, use pthreads
    mutexes
  • Let the middleware handle it
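
The first choice, an object that protects itself, might look roughly like the following Python sketch. The class name Counter and the helper run_clients are hypothetical names chosen for illustration; a threading.Lock plays the role that synchronized methods play in Java or a pthreads mutex plays in C.

```python
import threading


class Counter:
    """A shared object that serializes concurrent invocations itself."""

    def __init__(self):
        self._lock = threading.Lock()  # guards _value
        self._value = 0

    def increment(self):
        # Only one thread at a time may execute this critical section.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_clients(counter, clients=4, calls=1000):
    """Invoke increment() concurrently from several client threads."""

    def client():
        for _ in range(calls):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

Because every invocation takes the lock, no increment is lost regardless of how the client threads interleave. In the second choice the same locking would live in the middleware (for example in an object adapter) instead of in the object.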

5
Object Replication (2)
  1. A remote object capable of handling concurrent
    invocations on its own.
  2. A remote object for which an object adapter is
    required to handle concurrent invocations

6
Object Replication (3)
  1. A distributed system for replication-aware
    distributed objects.
  2. A distributed system responsible for replica
    management

7
Data-Centric Consistency Models
Contract between processes and the data store
(file system, shared memory, shared database):
if the processes obey certain rules, the data
store promises to work correctly. E.g., a process
that reads a data item expects the up-to-date
value stored by the last write operation.
  • The general organization of a logical data store,
    physically distributed and replicated across
    multiple processes.

8
Strict Consistency
Any read on a data item x returns the value of
the most recent write on x.
  • Behavior of two processes, operating on the same
    data item.
  • A strictly consistent store.
  • A store that is not strictly consistent.
  • Observation: it doesn't make sense to talk about
    "the most recent" in a distributed environment.
  • Assume all data items have been initialized to
    NIL
  • W(x)a: a value a is written to x
  • R(x)a: reading x returns the value a

9
Sequential Consistency (1)
Sequential consistency: the result of any
execution is the same as if the operations of
all processes were executed in some sequential
order. The operations of each single process must
appear in the order specified by its program.
Any valid interleaving of read and write
operations is acceptable, but all processes must
see the same interleaving of operations.
  1. A sequentially consistent data store.
  2. A data store that is not sequentially consistent.
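
The definition can be turned into a brute-force check: search for one interleaving that respects every process's program order and in which each read returns the latest written value. The sketch below is only an illustration; the function name and the two example histories (HISTORY_A, HISTORY_B) are assumptions, not taken from the slides. Operations are written as ("W", item, value) or ("R", item, value), and an unwritten item reads as None (NIL).

```python
def sequentially_consistent(histories):
    """Return True if some single interleaving explains every read.

    histories maps a process name to its operations in program order.
    The search is exponential, so this is only suitable for tiny examples.
    """
    procs = list(histories)

    def search(pos, store):
        if all(pos[p] == len(histories[p]) for p in procs):
            return True  # every operation has been placed
        for p in procs:
            i = pos[p]
            if i == len(histories[p]):
                continue
            kind, item, value = histories[p][i]
            if kind == "R" and store.get(item) != value:
                continue  # this read cannot be scheduled next
            new_store = dict(store)
            if kind == "W":
                new_store[item] = value
            if search({**pos, p: i + 1}, new_store):
                return True
        return False

    return search({p: 0 for p in procs}, {})


# Illustrative histories (assumed): P3 and P4 agree on the write order.
HISTORY_A = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "b"), ("R", "x", "a")],
}
# Here P3 and P4 see the two writes in opposite orders.
HISTORY_B = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "a"), ("R", "x", "b")],
}
```

For HISTORY_A the interleaving W(x)b, R(x)b, R(x)b, W(x)a, R(x)a, R(x)a satisfies everyone, so the check succeeds. For HISTORY_B no single order of the two writes can satisfy both readers, so it fails.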

10
Causal Consistency (1)
  • Necessary condition: writes that are potentially
    causally related must be seen by all processes in
    the same order.
  • Concurrent writes may be seen in a different
    order on different machines.
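
The slides do not say how "potentially causally related" is decided. One common technique, swapped in here purely as an illustration, is to stamp each write with a vector clock and compare the stamps. The function names below are hypothetical, and clocks are assumed to be tuples of equal length with one entry per process.

```python
def happened_before(a, b):
    """True if the write stamped a causally precedes the write stamped b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither write causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Under this assumption, two writes for which concurrent() holds may be applied in different orders on different machines, while writes ordered by happened_before() must be applied in that order everywhere.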

11
Causal Consistency (2)
  • This sequence is allowed with a
    causally-consistent store, but not with a
    sequentially or strictly consistent store.
  • A data store that is not sequentially consistent.

12
Causal Consistency (3)
Concurrent write
  • A violation of a causally-consistent store.
  • A correct sequence of events in a
    causally-consistent store.

13
FIFO Consistency (1)
  • Necessary condition: writes done by a single
    process are seen by all other processes in the
    order in which they were issued, but writes from
    different processes may be seen in a different
    order by different processes.
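
One way a replica could enforce this condition is to tag each write with its sender and a per-sender sequence number, and to hold back any write that arrives early. The sketch below assumes exactly that; the class name FifoReplica and the numbering from 0 are illustrative choices, not something the slides prescribe.

```python
class FifoReplica:
    """Applies each sender's writes in the order they were issued."""

    def __init__(self):
        self.next_seq = {}  # sender -> next sequence number expected
        self.pending = {}   # sender -> {seq: (item, value)} held back
        self.store = {}     # the replica's copy of the data
        self.applied = []   # log of applied writes, for inspection

    def receive(self, sender, seq, item, value):
        """Accept a write; apply it only when all earlier ones are applied."""
        held = self.pending.setdefault(sender, {})
        held[seq] = (item, value)
        expected = self.next_seq.get(sender, 0)
        # Drain every write from this sender that is now in order.
        while expected in held:
            w_item, w_value = held.pop(expected)
            self.store[w_item] = w_value
            self.applied.append((sender, w_item, w_value))
            expected += 1
        self.next_seq[sender] = expected
```

Nothing here orders writes from different senders relative to each other, which is exactly the freedom FIFO consistency leaves open.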

14
FIFO Consistency (2)
  • A valid sequence of events of FIFO consistency

15
FIFO
Process P1        Process P2        Process P3
x = 1;            y = 1;            z = 1;
print(y, z);      print(x, z);      print(x, y);
  • Three concurrently executing processes.

16
FIFO Consistency (3)
x = 1;            x = 1;            y = 1;
print(y, z);      y = 1;            print(x, z);
y = 1;            print(x, z);      z = 1;
print(x, z);      print(y, z);      print(x, y);
z = 1;            z = 1;            x = 1;
print(x, y);      print(x, y);      print(y, z);

Prints: 00        Prints: 10        Prints: 01
(a)               (b)               (c)
  • Statement execution as seen by the three
    processes from the previous slide. The
    statements in bold are the ones that generate the
    output shown.
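
To see why the three views (a), (b) and (c) are only possible under FIFO consistency, one can enumerate every interleaving that a sequentially consistent store would allow and collect the outputs. The sketch below does this for the three-process program; the function names and the idea of concatenating the outputs of P1, P2 and P3 into one six-digit signature are conventions assumed for this illustration.

```python
PROGRAMS = {
    "P1": [("set", "x"), ("print", "y", "z")],
    "P2": [("set", "y"), ("print", "x", "z")],
    "P3": [("set", "z"), ("print", "x", "y")],
}


def interleavings(programs):
    """Yield every schedule that keeps each process's program order."""

    def merge(pos):
        if all(pos[p] == len(programs[p]) for p in programs):
            yield []
            return
        for p in programs:
            if pos[p] < len(programs[p]):
                step = (p, programs[p][pos[p]])
                for rest in merge({**pos, p: pos[p] + 1}):
                    yield [step] + rest

    yield from merge({p: 0 for p in programs})


def signature(schedule):
    """Run one schedule on a single shared store; return P1+P2+P3 output."""
    store = {"x": 0, "y": 0, "z": 0}
    out = {p: "" for p in PROGRAMS}
    for proc, (op, *args) in schedule:
        if op == "set":
            store[args[0]] = 1
        else:
            out[proc] += "".join(str(store[v]) for v in args)
    return out["P1"] + out["P2"] + out["P3"]


def possible_signatures():
    return {signature(s) for s in interleavings(PROGRAMS)}
```

The views above give P1 the output 00, P2 the output 10 and P3 the output 01, that is, the signature 001001. That signature is not in possible_signatures(): it would require z = 1 to come before P3's print, P3's print before x = 1, x = 1 before P1's print, and P1's print before z = 1, which is a cycle. Under FIFO consistency each process may see its own interleaving, so the combination is allowed.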

17
FIFO Consistency (4)
Process P1                Process P2
x = 1;                    y = 1;
if (y == 0) kill(P2);     if (x == 0) kill(P1);
Two concurrent processes. Both processes can be
killed: P1 may read y == 0 before it sees P2's
write y = 1, and P2 may likewise read x == 0
before it sees P1's write x = 1.

18
Summary of Consistency Models
Consistency      Description
Strict           Absolute time ordering of all shared accesses matters.
Linearizability  All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential       All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal           All processes see causally-related shared accesses in the same order.
FIFO             All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.
  1. Consistency models not using synchronization
    operations.

19
Distribution Protocols
  • Replica Placement
  • Update Propagation
  • Epidemic Protocols

20
Replica Placement
  • The logical organization of different kinds of
    copies of a data store into three concentric
    rings.

21
Replica Placement
  • Permanent replicas
  • Processes/machines that always hold the initial
    set of replicas
  • Web site (file) mirroring (all the content),
    distributed databases

22
Server-Initiated Replicas
  • A process that can dynamically host a replica at
    the request of another server in the data store
  • Push caches
  • Create a replica when there is a burst of
    requests from a certain location
  • The algorithm
  • Replication takes place to reduce the load on a
    server
  • A specified file on a server can be migrated
    close to the clients that request it

23
Server-Initiated Replicas
  • Counting access requests from different
    clients.
  • E.g., a Web hosting service
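
A minimal sketch of the counting idea follows. It assumes that requests are grouped by the region (or nearest server) they come from and that a file is replicated to a region once its request count reaches a threshold. The name AccessLog, the grouping by region and the value of REPLICATION_THRESHOLD are all hypothetical; the slides only state that access requests are counted.

```python
from collections import Counter

REPLICATION_THRESHOLD = 3  # hypothetical value, chosen for the example


class AccessLog:
    """Counts requests per (file, region) on one server."""

    def __init__(self):
        self.counts = Counter()  # (file, region) -> number of requests

    def record(self, file, region):
        self.counts[(file, region)] += 1

    def regions_to_replicate(self, file):
        """Regions whose request count justifies a replica of file."""
        return sorted(
            region
            for (f, region), n in self.counts.items()
            if f == file and n >= REPLICATION_THRESHOLD
        )
```

A hosting service could run such a check periodically and create a replica on a server near each returned region.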

24
Client-Initiated Replicas
  • Client caches
  • Local storage capacity
  • Used temporarily to store a copy of the data just
    requested
  • Managing the cache is left to the client
  • Access time improves when a cache hit occurs
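
As a rough illustration of a client-managed cache, the sketch below keeps a local copy of each item it has fetched and counts hits and misses. The class name ClientCache and the fetch callable (standing in for a request to the server) are assumptions made for the example.

```python
class ClientCache:
    """Client-side cache: the client itself decides what to keep."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that asks the server for a value
        self._copies = {}    # locally stored copies
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._copies:
            self.hits += 1       # cache hit: no server round trip
            return self._copies[key]
        self.misses += 1         # cache miss: fetch and keep a copy
        value = self._fetch(key)
        self._copies[key] = value
        return value

    def invalidate(self, key):
        """Drop a copy, e.g. when the client learns it is stale."""
        self._copies.pop(key, None)
```

Only the first read of a key contacts the server; later reads are served locally until the copy is invalidated, which is where the consistency question from the earlier slides comes back in.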

25
Update propagation
  • Updates are initiated at a client
  • They are forwarded to one of the copies and
    propagated to the other copies
  • Some design issues to consider in propagating
    updates
  • State versus operations
  • Pull versus push protocols

26
Push- and Pull-Based Approaches
  • Push-based approach
  • Also referred to as a server-based protocol
  • Updates are propagated directly to the replicas
    without being requested
  • Pull-based approach
  • Also referred to as a client-based protocol
  • A client asks a server to send any updates it has
    at the moment

27
Push versus Pull Protocols
Issue                    Push-based                                Pull-based
State of server          List of client replicas and caches        None
Messages sent            Update (and possibly fetch update later)  Poll and update
Response time at client  Immediate (or fetch-update time)          Fetch-update time
  • A comparison between push-based and pull-based
    protocols in the case of multiple-client,
    single-server systems.
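
The difference in server state and message flow can be sketched in a few lines. The classes Server and Client and their methods are hypothetical; the version counter is an assumption used so that a polling client can tell whether it is behind.

```python
class Server:
    def __init__(self):
        self.version = 0
        self.value = None
        # Push only: the server must remember its client replicas and caches.
        self.subscribers = []

    def write(self, value):
        self.version += 1
        self.value = value
        for client in self.subscribers:
            # Push: the update is sent without being requested.
            client.apply(self.version, self.value)

    def poll(self, known_version):
        """Pull: return (version, value) if the caller is behind, else None."""
        if self.version > known_version:
            return self.version, self.value
        return None


class Client:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        self.version, self.value = version, value

    def pull(self, server):
        update = server.poll(self.version)
        if update is not None:
            self.apply(*update)
```

A subscribed client sees a write immediately, at the cost of the server tracking it. A client that is not subscribed stays stale until it polls, which matches the response-time row of the table.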

28
Quorum-Based Protocols
  • Three examples of the voting algorithm
  • A correct choice of read and write set
  • A choice that may lead to write-write conflicts
  • A correct choice, known as ROWA (read one, write
    all)
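
The three cases can be told apart with the two constraints of the usual voting scheme, which the slide does not spell out: with N replicas, a read quorum N_R and a write quorum N_W must satisfy N_R + N_W > N (to prevent read-write conflicts) and N_W > N/2 (to prevent write-write conflicts). The function name and the example values with N = 12 are assumptions chosen for illustration.

```python
def classify_quorum(n, n_r, n_w):
    """Classify a choice of read quorum n_r and write quorum n_w."""
    if not (1 <= n_r <= n and 1 <= n_w <= n):
        raise ValueError("quorum sizes must lie between 1 and n")
    if 2 * n_w <= n:
        # Two disjoint write sets fit into n replicas.
        return "write-write conflicts possible"
    if n_r + n_w <= n:
        # A read set may miss the most recent write set entirely.
        return "read-write conflicts possible"
    if n_r == 1 and n_w == n:
        return "correct (ROWA)"
    return "correct"
```

With N = 12, the choice N_R = 3, N_W = 10 is correct, N_R = 7, N_W = 6 allows write-write conflicts because two groups of six replicas can each accept a different write, and N_R = 1, N_W = 12 is the read-one, write-all case.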

29
Fault Tolerance
  • Basic Concepts
  • Failure Models

30
Introduction
  • A partial failure in a distributed system may
    happen when one component fails
  • It may affect the operation of some components
  • While leaving other components totally unaffected
  • The design goal in a distributed system is to
  • Build a system that automatically recovers from
    partial failures
  • Without seriously affecting overall performance

31
Basic Concepts
  • Dependability includes
  • Availability
  • Reliability
  • Safety
  • Maintainability

32
Availability
  • The system is ready to be used immediately
  • In general, the system is operating correctly at
    any given moment and is available to perform its
    functions.
  • Percentage of availability = (total elapsed time
    - sum of downtime) / total elapsed time
  • E.g., 99.9%
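
As a worked example of the availability formula, the sketch below computes the percentage from a total elapsed time and a list of outage durations. The function name and the sample figures (a 720-hour month with two short outages) are made up for illustration.

```python
def availability_percent(total_elapsed, downtimes):
    """(total elapsed time - sum of downtime) / total elapsed time, in %."""
    if total_elapsed <= 0:
        raise ValueError("total elapsed time must be positive")
    return 100.0 * (total_elapsed - sum(downtimes)) / total_elapsed


# Assumed figures: 720 hours observed, outages of 0.5 h and 0.22 h.
month = availability_percent(720, [0.5, 0.22])
```

Here the total downtime is 0.72 hours out of 720, so the availability comes out at about 99.9 percent.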

33
Reliability
  • The system can run continuously without failure.
  • A highly reliable system is one that will most
    likely continue to work without interruption
    during a relatively long period of time
  • One measure used to define a component's or
    system's reliability is mean time between
    failures (MTBF)
  • MTBF = (total elapsed time - sum of
    downtime) / number of failures
  • A related measurement is mean time to repair
    (MTTR). MTTR is the average time interval
    (usually expressed in hours) that it takes to
    repair a failed component.
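
The two measures can be computed from the same outage log. The sketch below assumes that each entry in the list is the downtime caused by one failure and that the downtime equals the repair time; both are simplifying assumptions, and the function names and sample figures are illustrative.

```python
def mtbf(total_elapsed, downtimes):
    """(total elapsed time - sum of downtime) / number of failures."""
    if not downtimes:
        raise ValueError("no failures observed")
    return (total_elapsed - sum(downtimes)) / len(downtimes)


def mttr(downtimes):
    """Average repair time, treating each downtime as one repair."""
    if not downtimes:
        raise ValueError("no failures observed")
    return sum(downtimes) / len(downtimes)
```

For example, three failures with downtimes of 2, 3 and 5 hours during 1000 hours of operation give an MTBF of (1000 - 10) / 3 = 330 hours and an MTTR of 10 / 3 hours, roughly 3.3 hours.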

34
Safety
  • Nothing catastrophic will happen if a system
    temporarily fails to operate correctly.

35
Maintainability
  • Refers to how easily a failed system can be
    repaired

36
Terminology
  • Failure: when a component is not living up to its
    specifications, a failure occurs
  • Error: that part of a component's state that can
    lead to a failure
  • Fault: the cause of an error
  • Fault prevention: prevent the occurrence of a
    fault
  • Fault tolerance: build a component in such a way
    that it can meet its specifications in the
    presence of faults

37
Failure Models
Type of failure             Description
Crash failure               A server halts, but is working correctly until it halts
Omission failure            A server fails to respond to incoming requests
  Receive omission          A server fails to receive incoming messages
  Send omission             A server fails to send messages
Timing failure              A server's response lies outside the specified time interval
Response failure            The server's response is incorrect
  Value failure             The value of the response is wrong
  State transition failure  The server deviates from the correct flow of control
Arbitrary failure           A server may produce arbitrary responses at arbitrary times
  • Different types of failures.

38
Failure Models(cont)
  • Timing failures: the output of a component is
    correct, but lies outside a specified real-time
    interval (performance failures: too slow)
  • Response failures: the output of a component is
    incorrect
  • Value failure: the wrong value is produced
  • State transition failure: execution of the
    component's service brings it into a wrong state

39
Failure Models(cont)
  • Crash failures: a component halts, but behaves
    correctly before halting
  • Omission failures: a component fails to respond
  • Receive omission: a server fails to receive
    incoming messages
  • Send omission: a server fails to send messages
  • Arbitrary failures: a component may produce
    arbitrary output and be subject to arbitrary
    timing failures