1
Computing in the Reliable Array of Independent Nodes
Vasken Bohossian, Charles Fan, Paul LeMahieu, Marc Riedel, Lihao Xu, Jehoshua Bruck
Presented by Marc Riedel
California Institute of Technology
IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems
May 5, 2000
2
RAIN Project
Collaboration
  • Caltech's Parallel and Distributed Computing Group, www.paradise.caltech.edu
  • JPL's Center for Integrated Space Microsystems, www.csmt.jpl.nasa.gov

3
RAIN Platform
Heterogeneous network of nodes and switches.
4
RAIN Testbed
www.paradise.caltech.edu
5
Proof-of-Concept Video Server
Video client/server on every node.
6
Limited Storage
Insufficient storage to replicate all the data on each node.
7
k-of-n Code
Erasure-correcting code: the original data can be reconstructed from any k of the n columns.
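The k-of-n idea can be illustrated with the simplest possible erasure-correcting code: a single XOR parity block (k data blocks, n = k + 1), which tolerates the loss of any one block. This is an illustrative sketch only, not the code used by RAIN:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda x, y: bytes(p ^ q for p, q in zip(x, y)), blocks)

def encode(data_blocks):
    """Append one parity block: any k of the n = k + 1 blocks recover the data."""
    return data_blocks + [xor_blocks(data_blocks)]

def decode(blocks):
    """blocks: n entries in order, at most one of which is None (erased)."""
    if None in blocks:
        missing = blocks.index(None)
        # XOR of the k surviving blocks reconstructs the missing one.
        blocks[missing] = xor_blocks([b for b in blocks if b is not None])
    return blocks[:-1]  # drop the parity, return the k data blocks

stored = encode([b"vide", b"o se", b"gmnt"])   # k = 3, n = 4
stored[1] = None                               # one node fails
assert decode(stored) == [b"vide", b"o se", b"gmnt"]
```

A general k-of-n code (such as the 2-of-4 array code on the later slides) tolerates more than one erasure, but the recovery principle is the same.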
8
Encoding
Encode video using 2-of-4 code.
9
Decoding
Retrieve data and decode.
10–12
Node Failure
Dynamically switch to another node.
13–15
Link Failure
Dynamically switch to another network path.
16–18
Switch Failure
Dynamically switch to another network path.
19–20
Node Recovery
Continuous reconfiguration (e.g., load-balancing).
21
Features
High availability
  • tolerates multiple node/link/switch failures
  • no single point of failure

Efficient use of resources
  • multiple data paths
  • redundant storage
  • graceful degradation

Dynamic scalability/reconfigurability
22
RAIN Project Goals
Efficient, reliable distributed computing and storage systems.
Key building blocks (layered): Applications, Storage, Communication, Networks.
23
Topics
Today's Talk
  • Fault-Tolerant Interconnect Topologies
  • Connectivity
  • Group Membership
  • Distributed Storage
24
Interconnect Topologies
Goal: lose at most a constant number of nodes for a given network loss.
25–26
Resistance to Partitions
Large partitions are problematic for distributed services/computation.
27
Related Work
Embedding hypercubes, rings, meshes, and trees in fault-tolerant networks
  • Hayes et al., Bruck et al., Boesch et al.

Bus-based networks resistant to partitioning
  • Ku and Hayes, 1997. Connective Fault-Tolerance in Multiple-Bus Systems

28–30
A Ring of Switches
A naïve solution: degree-2 compute nodes, degree-4 switches. Easily partitioned.
31–34
Resistance to Partitioning
Degree-2 compute nodes, degree-4 switches, with nodes placed on the diagonals.
  • tolerates any 3 switch failures (optimal)
  • generalizes to arbitrary node/switch degrees
Details: paper at IPPS '98, www.paradise.caltech.edu
35
Point-to-Point Connectivity
Is the path from A to B up or down?
36
Connectivity
Bi-directional communication: the link is seen as up or down by each node.
Each node sends out pings; a node may time out, deciding the link is down.
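The ping/time-out mechanism described here can be sketched as follows. The class name and the time-out value are hypothetical, and a real monitor would be driven by network traffic rather than run in-process:

```python
import time

PING_TIMEOUT = 0.5  # seconds (hypothetical value, not from the talk)

class LinkMonitor:
    """One node's view of one link, driven by ping replies (illustrative sketch)."""
    def __init__(self):
        self.last_reply = time.monotonic()
        self.status = "U"                 # link currently seen as up

    def on_ping_reply(self):
        """Called whenever a ping reply arrives from the peer."""
        self.last_reply = time.monotonic()
        self.status = "U"

    def poll(self):
        """No reply within the timeout: this side decides the link is down."""
        if time.monotonic() - self.last_reply > PING_TIMEOUT:
            self.status = "D"
        return self.status

mon = LinkMonitor()
assert mon.poll() == "U"                  # a reply was just recorded: link is up
mon.last_reply -= 1.0                     # simulate a peer silent past the timeout
assert mon.poll() == "D"                  # this node decides the link is down
```

Because each side times out independently, the two nodes' views can disagree; the consistent-history mechanism on the next slides addresses exactly that.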
37
Consistent History
38
The Slack
(node-state diagram: the sequences of U/D observations recorded at A and B)
39
Consistent History
Consistency in error reporting: if A sees a channel error, B sees a channel error.
Birman et al.: Reliability Through Consistency.
Details: paper at IPPS '99, www.paradise.caltech.edu
40
Group Membership
Consistent global view given local, point-to-point connectivity information
  • link/node failures
  • dynamic reconfiguration
41
Related Work
  • Chandra et al., Impossibility of Group Membership in an Asynchronous Environment. IEEE ...
42–48
Group Membership
Token-Ring based Group Membership Protocol
Token carries
  • group membership list
  • sequence number
(Animation: the token circulates A → B → C → D, incrementing its sequence number at each hop; e.g. "4 ABCD" = sequence number 4, membership list ABCD.)
49–56
Group Membership
Node or link fails. If a node is inaccessible, it is excluded and bypassed.
(Animation: node B becomes unreachable; the token, now "6 ACD", skips B and continues around the ring.)
57–64
Group Membership
Node with token fails. If the token is lost, it is regenerated; the highest sequence number prevails.
(Animation: two nodes regenerate the token as "5 ACD" and "6 AD"; "6 AD" prevails.)
65–70
Group Membership
Node recovers. Recovering nodes are added.
(Animation: node C rejoins; the token's membership list grows back to "ADC".)
71
Group Membership
Features
  • Unicast messages
  • Dynamic reconfiguration
  • Mean time-to-failure > convergence time
Details: publication forthcoming.
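The token mechanics shown on slides 42–70 can be sketched as below. The data structure and function names are hypothetical; this is a minimal illustration of the three rules (the token carries a membership list and sequence number; inaccessible nodes are excluded and bypassed; a lost token is regenerated, and the highest sequence number prevails), not the RAIN implementation:

```python
from dataclasses import dataclass

@dataclass
class Token:
    seq: int            # sequence number, incremented at each hop
    members: list       # group membership list carried by the token

def pass_token(token, ring, holder, reachable):
    """Hand the token from `holder` to the next reachable node on the ring.
    Inaccessible nodes are excluded from the membership list and bypassed."""
    i = ring.index(holder)
    for step in range(1, len(ring)):
        nxt = ring[(i + step) % len(ring)]
        if nxt in reachable:
            token.seq += 1
            token.members = [n for n in token.members if n in reachable]
            return nxt
    return holder       # no other node reachable; the holder keeps the token

def resolve(tokens):
    """If the token is lost, several nodes may regenerate it;
    the copy with the highest sequence number prevails."""
    return max(tokens, key=lambda t: t.seq)

ring = ["A", "B", "C", "D"]
tok = Token(seq=5, members=list(ring))
holder = pass_token(tok, ring, "A", reachable={"A", "C", "D"})  # B has failed
assert holder == "C" and tok.members == ["A", "C", "D"]
winner = resolve([Token(5, ["A", "C", "D"]), Token(6, ["A", "D"])])
assert winner.seq == 6                                          # "6 AD" prevails
```

Recovery is the mirror image: a rejoining node is simply merged back into the token's membership list as it passes.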
72
Distributed Storage
73
Distributed Storage
Focus: reliability and performance.
74–76
Array Codes
Ideally suited for distributed storage: low encoding/decoding complexity.

    data:        a    b    c    d
    redundancy:  dc   da   ab   bc

The data can be reconstructed from any k of the n columns.

B-Code and X-Code
  • optimally redundant
  • optimal encoding/decoding complexity

Details: IEEE Trans. Info Theory, www.paradise.caltech.edu
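Assuming the redundancy symbols in the array above ("dc", "da", "ab", "bc") denote XORs of the named data symbols, the 2-of-4 code can be decoded with a simple iterative "peeling" procedure. This is an illustrative reconstruction under that assumption, not the published B-code algorithm:

```python
# Column layout from the slide: column i holds one data symbol and one parity
# symbol; "dc" etc. are assumed here to mean XORs of the named data symbols.
DATA_NAME = ["a", "b", "c", "d"]
PARITY_PAIRS = [("d", "c"), ("d", "a"), ("a", "b"), ("b", "c")]

def encode_2of4(sym):
    """sym: dict of data symbols a, b, c, d (ints). Returns 4 (data, parity) columns."""
    return [(sym[DATA_NAME[i]], sym[x] ^ sym[y])
            for i, (x, y) in enumerate(PARITY_PAIRS)]

def decode_2of4(columns):
    """columns: 4 entries, erased ones set to None. Any 2 surviving columns
    recover all four data symbols by peeling the XOR parities."""
    known = {DATA_NAME[i]: col[0] for i, col in enumerate(columns) if col}
    while len(known) < 4:
        progress = False
        for i, col in enumerate(columns):
            if col is None:
                continue
            x, y = PARITY_PAIRS[i]
            if x in known and y not in known:
                known[y] = col[1] ^ known[x]   # parity = x ^ y, so y = parity ^ x
                progress = True
            elif y in known and x not in known:
                known[x] = col[1] ^ known[y]
                progress = True
        if not progress:
            raise ValueError("too many erasures")
    return known

data = {"a": 10, "b": 20, "c": 30, "d": 40}
cols = encode_2of4(data)
cols[0] = cols[3] = None                      # lose two of the four columns
assert decode_2of4(cols) == data
```

One can check by hand that every pair of surviving columns lets the peeling complete, which is what makes this a 2-of-4 (optimally redundant) code.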
77
Summary
78
Proof-of-Concept Applications
79
Rainfinity
Start-up based on RAIN technology
www.rainfinity.com
  • availability
  • scalability
  • performance

80
Rainfinity
Start-up based on RAIN technology
Company
  • Founded Sept. 1998
  • Released first product April 1999
  • Received $15 million in funding in Dec. 1999
  • Now over 50 employees

www.rainfinity.com
81
Future Research
  • Development of APIs
  • Fault-Tolerant Distributed Filesystem
  • Fault-Tolerant MPI/PVM implementation

82
End of Talk
Material that was cut...
83–86
Erasure-Correcting Codes
Strategy: encode the data with an erasure-correcting code. Example: Reed-Solomon code.
(diagram: k data bits encoded into an n-bit codeword; up to m coordinates may be lost and the data still recovered)
87
RAIN Distributed Store
  • Encode data with (n, k) array code
  • Store one symbol per node

88
RAIN Distributed Retrieve
  • Retrieve encoded data from any k nodes
  • Reconstruct data

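The store/retrieve scheme on these slides (encode with an (n, k) code, store one symbol per node, retrieve from any k while balancing load) might be simulated as below. A single-parity (n = 4, k = 3) code stands in for the array code, and all names are hypothetical:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda x, y: bytes(p ^ q for p, q in zip(x, y)), blocks)

class RainStore:
    """Sketch: one encoded symbol per node; retrieval reads any k live nodes."""
    def __init__(self, node_names):
        self.nodes = {name: None for name in node_names}
        self.reads = {name: 0 for name in node_names}   # read counts, for balancing

    def store(self, data_blocks):
        """Encode with a single parity symbol and store one symbol per node."""
        symbols = data_blocks + [xor_blocks(data_blocks)]
        for name, symbol in zip(self.nodes, symbols):
            self.nodes[name] = symbol

    def retrieve(self, live):
        """Read from the k least-loaded live nodes; rebuild a missing data
        symbol, if any, from the XOR of the k symbols that were read."""
        names = list(self.nodes)
        k = len(names) - 1
        chosen = sorted(live, key=lambda n: self.reads[n])[:k]
        for n in chosen:
            self.reads[n] += 1
        got = {n: self.nodes[n] for n in chosen}
        data = [got.get(n) for n in names[:k]]
        if None in data:
            data[data.index(None)] = xor_blocks(list(got.values()))
        return data

store = RainStore(["A", "B", "C", "D"])
store.store([b"blk1", b"blk2", b"blk3"])
assert store.retrieve(live={"A", "B", "C", "D"}) == [b"blk1", b"blk2", b"blk3"]
assert store.retrieve(live={"A", "B", "D"}) == [b"blk1", b"blk2", b"blk3"]   # C failed
```

Reading from the least-loaded k nodes spreads requests across replicas of the encoded data, which is the load-balancing benefit the slides point to.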
89–93
RAIN Distributed Retrieve
  • Reliability (similar to RAID systems)
  • Performance: load-balancing
(Animation: on a disk failure, retrieval shifts to other disks; reads are spread across the surviving disks.)