Title: CS556: Distributed Systems
CS-556 Distributed Systems
Fault Tolerance (II)
- Manolis Marazakis
- maraz_at_csd.uoc.gr
Dependability: Basic Concepts
- Availability
- Reliability
- Safety
- Maintainability
Fault → Error → Failure
- Faults
- Transient
- Intermittent
- Permanent
A 2-node cluster
Shared-disks vs Shared-nothing
- Shared-disks
- Dual hosting for the storage devices
- SCSI, NAS, SAN
- Access is arbitrated by external software that
runs on both servers
- Shared-nothing
- Replication schemes
- Requires more effort to recover a server
- More suitable for WAN
- Requires a functional network and a functional
host on the other side to ensure that the writes
actually succeed
- Danger of inconsistency after a failover
Failover Management Software
- Key components of the system must be monitored
- H/W is generally the easiest part to monitor
- Relatively easy tests
- Relatively few different varieties of H/W
components
- How to monitor the health of an application?
- Examine the system's process table
- No guarantee that the app. is running properly!
- Query the application itself
- checking for accurate, timely responses
- For some apps, the query is easy (e.g. DBMS)
- Make sure the check is end-to-end
- E.g. DBMS s/w → disk → network
- For others, this is hard!
- Web server → web page access
- File server → file access
- Custom s/w → ??
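As a sketch of the distinction above: an end-to-end check exercises the whole service path and demands an accurate, timely answer, while a process-table check only proves the process exists. The probe callable and timeout value below are illustrative, not from the slides.

```python
import time

def end_to_end_check(probe, timeout=2.0):
    """Application-level health check: run a probe that exercises the
    full path (e.g. a trivial DBMS query touching s/w, disk, network)
    and require an accurate AND timely answer."""
    start = time.monotonic()
    try:
        result = probe()          # e.g. lambda: db.execute("SELECT 1")
    except Exception:
        return False              # probe failed outright
    elapsed = time.monotonic() - start
    return bool(result) and elapsed <= timeout

def process_table_check(pid_exists):
    """Weaker check: the process appears in the process table.
    No guarantee the application is actually serving requests."""
    return pid_exists
```

Note that `process_table_check` can report healthy while the application hangs; only the end-to-end probe catches that case.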
Active-Passive Configuration (I)
- Both servers are connected to a set of
dual-hosted disks.
- These disks are divided between 2 separate
controllers / disk arrays
- The data is mirrored from one controller to the
other.
- A particular disk or filesystem can only be
accessed by one server at a time.
- Ownership conflicts are arbitrated by the
clustering software.
- Both servers are connected to the same public
network, and share a single IP address
- which is migrated by the FMS from one server to
the other as part of the failover.
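A minimal sketch of the arbitration logic described above, with hypothetical node names; a real FMS would also fence the disks and reconfigure the service IP at the OS level.

```python
class FailoverManager:
    """Active-passive sketch: exactly one node owns the floating service
    IP at any time; the FMS migrates ownership to the standby when the
    active node's health check fails."""

    def __init__(self, active, passive, service_ip):
        self.owner, self.standby = active, passive
        self.service_ip = service_ip   # always served by self.owner

    def on_health_check(self, healthy):
        if not healthy:
            # Failover: standby takes over the shared IP (and, in a real
            # cluster, ownership of the dual-hosted disks).
            self.owner, self.standby = self.standby, self.owner
        return self.owner
```

Usage: clients keep addressing `service_ip`; only the node answering for it changes.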
Active-Passive Configuration (II)
Active-Passive Configuration (III)
- Cost
- 2 hosts are reserved to perform the work of one.
- One host sits largely idle most of the time,
consuming electricity, administrative effort,
data center space, cooling, and other limited and
expensive resources.
- However, active-passive configurations are going
to be the most highly available ones over time.
- Since there are no unnecessary processes running
on the second host, there are fewer opportunities
for an error to cause the system to fail.
Active-Active Configuration (I)
- Each host acts as the standby for its partner in
the cluster, while still delivering its own
critical services.
- When one server fails, its partner takes over for
it and begins to deliver both sets of critical
services - until the failed server can be repaired
and returned to service.
- The servers must be truly independent of each
other
Active-Active Configuration (II)
Service Group Failover (I)
Capability for multiple service groups that ran
together on one server to fail over to separate
machines when that first server fails
Service Group Failover (II)
- Service Group: a set containing one or more IP
addresses, one or more disks or volumes, and one
or more critical processes
- A service group is the unit that fails over from
one server to another within a cluster.
- For service groups to maintain their relevance /
value, they must be totally independent of each
other.
- If, because of external requirements, two service
groups must fail over together, then they are, in
reality, a single group.
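The definition above can be sketched as a data structure plus a failover routine; since independent groups are the unit of failover, groups from one failed server may land on different survivors. Names, fields, and the round-robin placement are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ServiceGroup:
    """One or more IP addresses, disks/volumes, and critical processes.
    The whole group moves together; it is the unit of failover."""
    name: str
    ips: list
    disks: list
    processes: list

def failover(assignments, failed_server, survivors):
    """Move every service group of the failed server to surviving nodes;
    independent groups may land on different machines."""
    moved = {}
    for i, group in enumerate(assignments.pop(failed_server, [])):
        target = survivors[i % len(survivors)]   # naive round-robin placement
        assignments.setdefault(target, []).append(group)
        moved[group.name] = target
    return moved
```

If two groups had to move together, they would have to be modelled as one `ServiceGroup`, matching the slide's point.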
N-to-1 clusters (I)
A single standby node for the whole cluster
- This node can see all disks.
After recovery of a failed node, we must fail its
services back to it, freeing up the one node to
take over for another set of service groups.
4-to-1 SCSI cluster
N-to-1 clusters (II)
The hosts are all identically attached to the
storage.
SAN-based 6-to-1 cluster
N-plus-1 clusters
1 dedicated stand-by node
After recovery, no failover is needed from
standby to recovered node
- Over time, the layout of hosts and services will
not match the original layout within the
cluster.
- As long as all of the cluster members
have similar performance capabilities, and they
can see all of the required disks, it does not
matter which host actually runs the
service.
As clusters begin to grow, it's possible that a
single standby node will not be adequate
SAN-based 6-to-1 cluster
Failure Models
Failure detectors
- Not necessarily reliable!
- Each process P sends a "P is here" message every
T sec, assuming a max. message transmission delay D
- Categorization of processes (hints)
- suspected vs unsuspected
- A process may be functioning correctly on the
other side of a partitioned network
- or it could be slow to respond to probes
- Reliable detection
- unsuspected vs failed (crashed)
- Feasible only in synchronous systems
- It is possible to give different responses to
different processes
- different comm. conditions
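The heartbeat scheme above can be sketched as follows; a process is merely *suspected* (a hint, not reliable detection) once no heartbeat has arrived within T + D. Class and method names are illustrative.

```python
class HeartbeatDetector:
    """Unreliable failure detector: P sends a 'P is here' message every
    T seconds; with max. transmission delay D, silence longer than
    T + D makes P 'suspected'."""

    def __init__(self, T, D):
        self.deadline = T + D
        self.last_seen = {}

    def heartbeat(self, pid, now):
        self.last_seen[pid] = now        # record arrival of 'P is here'

    def status(self, pid, now):
        last = self.last_seen.get(pid)
        if last is None or now - last > self.deadline:
            # Only a hint: the process may be slow, or on the far side
            # of a network partition, rather than crashed.
            return "suspected"
        return "unsuspected"
```

In a synchronous system the T + D bound is trustworthy, so "suspected" can be upgraded to "failed"; asynchronously it cannot.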
Failure Masking by Redundancy (I)
- Hide the occurrence of failures from other
processes, by redundancy
- Information
- Extra bits to allow recovery
- Time
- Transactions to allow abort/redo
- Particularly suited for transient or intermittent
faults
- Physical
- Extra equipment to tolerate loss/malfunction of
some components
- or redundant s/w processes
- Voter circuitry
- Voters are components too: they may themselves
fail!
Failure Masking by Redundancy (II)
- Triple modular redundancy (TMR)
Flat vs Hierarchical Groups (I)
Process resilience by replicating processes into
groups
Group membership protocols
Flat vs Hierarchical Groups (II)
- Flat groups
- Symmetrical (no special roles)
- No single point of failure
- Complex operation protocols (e.g. voting)
- Hierarchical groups
- Coordinator is a single point of failure
- Group membership
- group server
- distributed management
- E.g. reliable multicast
- Detection of failed processes?
- Join/leave must be synchronous
- with data messages!
- How to rebuild a group after a major
failure?
Failure Masking &amp; Replication
- Having a group of identical processes allows us
to mask one or more faulty processes
- Primary-backup protocols
- Hierarchical organization
- Election among backups to select a new primary
- Replicated-write protocols
- Flat process groups
- Active replication
- Quorum protocols
- K-fault tolerant system
- Fail-silent processes → group size (k + 1)
- Byzantine failures → group size ≥ (2k + 1)
- Assuming that processes do not team up!!
- (independent failures)
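The two sizing rules above reduce to a one-line calculation (assuming independent failures, as the slide stresses):

```python
def group_size(k, byzantine=False):
    """Minimum group size to tolerate k faulty members.
    Fail-silent: one surviving replica suffices -> k + 1.
    Byzantine: correct replicas must outvote the faulty ones -> 2k + 1."""
    return 2 * k + 1 if byzantine else k + 1
```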
Coordination/Agreement
- A set of processes must collaborate
- or agree with one or more processes
- without fixed master/slave relationships
- failure assumptions &amp; failure detectors
- Problems
- mutual exclusion
- election
- multicast
- reliability &amp; ordering semantics
- consensus
- Byzantine agreement
Problems of Agreement
- A set of processes need to agree on a value
(decision), after one or more processes have
proposed what that value (decision) should be
- Examples
- mutual exclusion, election, transactions
- Processes may be correct, crashed, or they may
exhibit arbitrary (Byzantine) failures
- Messages are exchanged on a one-to-one basis,
and they are not signed
Two Agreement Problems
- Consensus problem: every process i proposes a
value vi, while in the undecided state. Process i
exchanges messages until it makes decision di and
moves to the decided state.
- Termination: all correct processes must make a
decision
- Agreement: same decision for all correct
processes
- Integrity: if all correct processes proposed the
same value, any correct process decides that value
- Byzantine generals problem: a commander
process i orders value v.
- The lieutenant processes must agree on what the
commander ordered.
- Processes may be faulty
- provide wrong or contradictory messages
- Integrity requirement
- A distinguished process decides a value for
others to agree upon
- A solution only exists if N > 3f, where f is the
number of faulty processes
Consensus for 3 processes
The Two-Army Problem
- How can two perfect processes reach agreement
about 1 bit of information?
- over an unreliable comm. channel
- Red army: 5000 troops
- Blue armies 1, 2: 3000 troops each
- How can the blue armies reach agreement on when
to attack?
- Their only means of communication is by sending
messengers
- that may be captured by the enemy!
- No solution!
- Proof by contradiction: assume there is a
solution with a minimum number of messages
Consensus: No Failures Case
majority(v1, ..., vN) returns the most frequently
occurring value - returns ⊥ if no majority
exists
Consensus via reliable multicast
For ordered values, min/max could be used instead
of majority
In general, if failures can occur it is not 100%
certain that consensus can be reached in finite
time!
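The majority-based scheme can be sketched as follows; `None` stands in for the ⊥ ("no majority") result, and the function names are illustrative.

```python
from collections import Counter

def majority(values):
    """Return the most frequently occurring value, or None (standing in
    for ⊥) when no value occurs more often than every other."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None                    # tie: no majority exists
    return counts[0][0]

def consensus_no_failures(proposals):
    # With reliable multicast and no failures, every process collects
    # the same multiset of proposals and applies the same deterministic
    # function, so all processes decide identically.
    return majority(proposals)
```

For ordered values, replacing `majority` by `min` or `max` works equally well, as the slide notes.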
Terminating Reliable Multicast (TRB): a single
process multicasts a msg, and all
correct processes must agree on that msg
- Even if the sender crashes, all correct processes
must deliver a special msg (Server-Fault)
Relation among problems
A problem B reduces to a problem A if there is an
algorithm which transforms any algorithm for A
into an algorithm for B.
Synchronous systems: TRB is equivalent to
Consensus
Asynchronous systems: Consensus reduces to
TRB, but not vice versa!
Asynchronous systems with crash failures:
Atomic Multicast is equivalent to Consensus
Consensus in synchronous systems
Duration of a round: max. delay of B-multicast
Up to f faulty processes
Dolev &amp; Strong, 1983: Any algorithm to reach
consensus despite up to f failures requires (f +
1) rounds.
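A failure-free simulation of this round structure: in each of the f + 1 rounds every process B-multicasts the values it knows, and afterwards all correct processes hold the same set and decide, e.g., its minimum. Crash behaviour mid-round is not modelled in this sketch.

```python
def synchronous_consensus(proposals, f):
    """Sketch of the (f+1)-round crash-tolerant algorithm.
    proposals: {process_id: proposed_value}."""
    known = {p: {v} for p, v in proposals.items()}
    for _round in range(f + 1):
        # B-multicast: every (correct) process announces its known values.
        announced = set().union(*known.values())
        for p in known:
            known[p] |= announced
    # Deterministic decision rule applied to identical sets -> agreement.
    return {p: min(vals) for p, vals in known.items()}
```

The f + 1 rounds matter only when crashes can cut announcements short; with no failures one round already suffices.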
Byzantine agreement (synchronous)
Faulty process
Nothing can be done to improve a correct
process's knowledge beyond the first stage
- It cannot tell which process is faulty.
Lamport et al., 1982: No solution for N = 3, f = 1
Pease et al., 1982: No solution for N ≤ 3f
(assuming private comm. channels)
Agreement in Faulty Systems (I)
- The Byzantine generals problem for 3 loyal
generals and 1 traitor
- The generals announce their troop strengths
- The vectors that each general assembles based on
(a)
- The vectors that each general receives in step 3.
Consensus by generals 1, 2, 4 → (1, 2, UNKNOWN, 4)
Agreement in Faulty Systems (II)
No majority!
- The same as in the previous slide, except now with
2 loyal generals and one traitor.
Byzantine agreement for N > 3f
Example with N = 4, f = 1
- 1st round: Commander sends a value to each
lieutenant
- 2nd round: Each of the lieutenants sends the value
it has received to each of its peers.
- A lieutenant receives a total of (N - 2) + 1
values, of which (N - 2) are correct.
- By majority(), the correct lieutenants compute
the same value.
In general, O(N^(f+1)) msgs
O(N^2) for signed msgs
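The N = 4, f = 1 example can be simulated as below. The commander is assumed loyal here, the traitorous lieutenant relays an arbitrary value ("X"), and the lieutenant names are illustrative.

```python
from collections import Counter

def byzantine_round(commander_value, traitor):
    """N = 4, f = 1 sketch: commander sends a value to lieutenants
    L1..L3; each lieutenant relays what it received to its peers;
    correct lieutenants decide by majority over the 3 values held."""
    lieutenants = ["L1", "L2", "L3"]
    # Round 1: loyal commander sends the same value to everyone.
    received = {l: commander_value for l in lieutenants}
    decisions = {}
    for l in lieutenants:
        if l == traitor:
            continue                     # traitor's decision is irrelevant
        values = [received[l]]           # heard directly from the commander
        for peer in lieutenants:
            if peer == l:
                continue
            # Round 2: peers relay; the traitor relays an arbitrary value.
            values.append("X" if peer == traitor else received[peer])
        decisions[l] = Counter(values).most_common(1)[0][0]
    return decisions
```

Each correct lieutenant holds (N - 2) + 1 = 3 values, of which N - 2 = 2 are correct, so majority() masks the traitor.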
Impossibility of (deterministic) consensus in
asynchronous systems
M.J. Fischer, N. Lynch, and M. Paterson,
"Impossibility of distributed consensus with one
faulty process", J. ACM, 32(2), pp. 374-382,
1985.
A crashed process cannot be distinguished from a
slow one.
- Not even with a 100% reliable comm. network!
There is always a chance that some continuation
of the processes' execution avoids consensus being
reached.
No guarantee of consensus, but Prob(consensus)
> 0
Solutions based on randomization, or on
(unreliable) failure detectors, or on fault
masking