Title: Decentralizing Grids
1. Decentralizing Grids
- Jon Weissman
- University of Minnesota
- E-Science Institute
- Nov. 8 2007
2. Roadmap
- Background
- The problem space
- Some early solutions
- Research frontier/opportunities
- Wrap-up
3. Background
- Grids are distributed but also centralized
- Condor, Globus, BOINC, Grid Services, VOs
- Why? client-server based
- Centralization pros
- Security, policy, global resource management
- Decentralization pros
- Reliability, dynamic, flexible, scalable
- Fertile CS research frontier
4. Challenges
- May have to live within the Grid ecosystem
- Condor, Globus, Grid services, VOs, etc.
- First-principles approaches are risky (e.g., Legion)
- 50,000-foot view
- How to decentralize Grids yet retain their existing features?
- High performance, workflows, performance prediction, etc.
5. Decentralized Grid platform
- Minimal assumptions about each node
- Nodes have associated assets (A)
- basic: CPU, memory, disk, etc.
- complex: application services
- exposed interface to assets: OS, Condor, BOINC, Web service
- Nodes may be up or down
- Node trust is not a given (asked to do X, does Y instead)
- Nodes may connect to other nodes or not
- Nodes may be aggregates
- Grid may be large (> 100K nodes); scalability is key (a minimal node model is sketched below)
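To make the node model concrete, here is a minimal Python sketch of a node that exposes assets and an exact-match interface; the names (Asset, Node, matches) are illustrative assumptions for this talk, not part of any Grid middleware.

```python
# Minimal sketch of the node/asset model described above (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class Asset:
    kind: str          # e.g. "cpu", "memory", "disk", or an application service
    properties: dict   # capacity, version, interface type (OS, Condor, BOINC, Web service)

@dataclass
class Node:
    node_id: str
    assets: list = field(default_factory=list)     # assets this node exposes
    neighbors: list = field(default_factory=list)  # overlay links (may be empty)
    trusted: bool = False                          # trust is not a given
    alive: bool = True                             # nodes may be up or down

    def matches(self, query: dict) -> bool:
        """Exact-match check: does some asset satisfy every property in the query?"""
        return any(a.kind == query.get("kind") and
                   all(a.properties.get(k) == v
                       for k, v in query.get("properties", {}).items())
                   for a in self.assets)
```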
6. Grid Overlay
[Diagram: overlay spanning a Condor network, a Grid service, raw OS services, and a BOINC network]
7. Grid Overlay - Join
[Diagram: a node joins the overlay (same networks as above)]
8. Grid Overlay - Departure
[Diagram: a node departs the overlay (same networks as above)]
9. Routing Discovery
[Diagram: a 'discover A' query is routed through the overlay]
- Query contains sufficient information to locate a node: RSL, ClassAd, etc.
- Exact match or semantic match
10. Routing Discovery
[Diagram: bingo! - a node offering A is found]
11. Routing Discovery
- Discovered node returns a handle sufficient for the client to interact with it: perform service invocation, job/data transmission, etc. (a discovery sketch follows)
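As a rough illustration of this step, the sketch below floods a query with a TTL over an unstructured overlay (reusing the hypothetical Node sketch from slide 5) and returns a handle; the Handle type and the grid:// endpoint format are invented for the example, and a real system might use structured routing instead.

```python
# Sketch of decentralized discovery: the query is forwarded with a TTL;
# a matching, live node returns a handle the client can use to invoke it
# or ship jobs/data. Names (Handle, discover) are illustrative.
from collections import namedtuple

Handle = namedtuple("Handle", ["node_id", "endpoint"])

def discover(start_node, query, ttl=6, seen=None):
    """Return a Handle for a node whose assets satisfy `query`, or None."""
    seen = seen if seen is not None else set()
    if ttl < 0 or start_node.node_id in seen:
        return None
    seen.add(start_node.node_id)
    if start_node.alive and start_node.matches(query):
        return Handle(start_node.node_id, f"grid://{start_node.node_id}")
    for nbr in start_node.neighbors:          # forward to overlay neighbors
        found = discover(nbr, query, ttl - 1, seen)
        if found:
            return found
    return None
```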
12. Routing Discovery
- Three parties
- initiator of discovery events for A
- client: invocation, health of A
- node offering A
- Often the initiator and the client will be the same
- Other times the client will be determined dynamically
- if W is a web service and results are returned to a calling client CW, we want to locate CW near W => discover W, then CW!
13. Routing Discovery
[Diagram: the 'discover A' query routed through the overlay; one node marked X]
14. Routing Discovery
[Diagram]
15. Routing Discovery
[Diagram: bingo! - a matching node is found]
16. Routing Discovery
[Diagram]
17. Routing Discovery
[Diagram: outside client]
18. Routing Discovery
[Diagram: 'discover As' query]
19. Routing Discovery
[Diagram]
20. Grid Overlay
- This generalizes
- Resource query (query contains job requirements)
- Looks like decentralized matchmaking
- These are the easy cases
- independent simple queries
- find a CPU with characteristics x, y, z
- find 100 CPUs each with x, y, z
- suppose queries are complex or related?
- find N CPUs with aggregate power of G Gflops
- locate an asset near a previously discovered asset (a matchmaking sketch follows)
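The gap between the easy and hard cases can be shown with a toy sketch: a per-node predicate for the simple query versus a greedy accumulation for the aggregate-Gflops query. The dictionaries and the greedy strategy are assumptions of this sketch, not the talk's protocol.

```python
# Sketch of the two query classes above: a simple per-node predicate vs. an
# aggregate query that must be satisfied collectively across responses.

def simple_match(node, min_mem_gb=4, min_gflops=2.0):
    """Find a CPU with characteristics x, y, z: judged one node at a time."""
    return node["mem_gb"] >= min_mem_gb and node["gflops"] >= min_gflops

def aggregate_match(candidates, target_gflops=100.0):
    """Find N CPUs whose aggregate power reaches G Gflops: responses must be
    accumulated (greedily here) rather than judged independently."""
    chosen, total = [], 0.0
    for node in sorted(candidates, key=lambda n: n["gflops"], reverse=True):
        chosen.append(node)
        total += node["gflops"]
        if total >= target_gflops:
            return chosen
    return None  # the overlay could not satisfy the collective query
```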
21. Grid Scenarios
- Grid applications are more challenging
- Application has a more complex structure: multi-task, parallel/distributed, control/data dependencies
- individual job/task needs a resource near a data source
- workflow
- queries are not independent
- Metrics are collective
- not simply raw throughput
- makespan
- response
- QoS
22. Related Work
- Maryland/Purdue
- matchmaking
- Oregon CCOF
- time-zone CAN
23. Related Work (cont'd)
- None of these approaches address the Grid scenarios (in a decentralized manner)
- Complex multi-task data/control dependencies
- Collective metrics
24. 50,000-Ft Research Issues
- Overlay Architecture
- structured, unstructured, hybrid
- what is the right architecture?
- Decentralized control/data dependencies
- how to do it?
- Reliability
- how to achieve it?
- Collective metrics
- how to achieve them?
25. Context: Application Model
[Diagram: application components drawing on a data source and returning an answer]
26. Context: Application Models
Reliability · Collective metrics · Data dependence · Control dependence
27. Context: Environment
- RIDGE project - ridge.cs.umn.edu
- reliable infrastructure for donation grid environments
- Live deployment on PlanetLab (planet-lab.org)
- 700 nodes spanning 335 sites and 35 countries
- emulators and simulators
- Applications
- BLAST
- Traffic planning
- Image comparison
28. Application Models
Reliability · Collective metrics · Data dependence · Control dependence
29. Reliability Example
[Diagram: components B, C, D, E, G in the overlay]
30. Reliability Example
[Diagram: a guard component CG placed for G]
- CG is responsible for G's health (a heartbeat-style sketch follows)
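One plausible reading of "CG is responsible for G's health" is a heartbeat-style guard loop like the sketch below; ping(), rediscover(), and the timeout values are hypothetical, not the talk's mechanism.

```python
# Sketch of the guard idea: CG periodically probes G and triggers recovery
# (e.g. re-discovery of a replacement node) after several missed heartbeats.
import time

def guard(g_handle, ping, rediscover, interval=5.0, max_misses=3):
    misses = 0
    while True:
        if ping(g_handle):              # G answered the health probe
            misses = 0
        else:
            misses += 1
            if misses >= max_misses:
                g_handle = rediscover() # locate a replacement for G
                misses = 0
        time.sleep(interval)
```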
31. Reliability Example
[Diagram: the reply carries 'G, loc(CG)']
32. Reliability Example
[Diagram: G and CG located in the overlay]
- could also discover G first, then CG
33. Reliability Example
[Diagram: a failure (X); CG remains]
34. Reliability Example
[Diagram]
35. Reliability Example
[Diagram]
36. Client Replication
[Diagram]
37. Client Replication
[Diagram: guard replicas CG1 and CG2]
- loc(G), loc(CG1), loc(CG2) propagated
38. Client Replication
[Diagram: one guard replica fails (X)]
- client hand-off depends on the nature of G and the interaction
39. Component Replication
[Diagram]
40. Component Replication
[Diagram: component replicas G1 and G2 with guard CG]
41. Replication Research
- Nodes are unreliable: crash, hacked, churn, malicious, slow, etc.
- How many replicas?
- too many: waste of resources
- too few: application suffers
42. System Model
- Reputation rating r_i: degree of node reliability
- Dynamically size the redundancy based on r_i
- Nodes are not connected and check in to a central server
- Note the variable-sized groups
[Diagram: nodes with ratings between 0.3 and 0.9 arranged into variable-sized replica groups]
43. Reputation-based Scheduling
- Reputation rating
- Techniques for estimating reliability based on past interactions
- Reputation-based scheduling algorithms
- Using reliabilities for allocating work
- Relies on a success-threshold parameter
44. Algorithm Space
- How many replicas?
- first-fit, best-fit, random, fixed, ...
- algorithms compute how many replicas are needed to meet a success threshold (sketched below)
- How to reach consensus?
- M-first (better for timeliness)
- Majority (better for Byzantine threats)
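A minimal sketch of replica-group sizing under a success threshold, assuming independent node failures and an M-first criterion with M = 1 (one correct result suffices); the greedy ordering by reputation is one possible "fit" strategy for illustration, not necessarily the RIDGE algorithm.

```python
# Sketch: keep adding workers until the estimated probability that at least one
# returns a correct result meets the success threshold (M-first, M = 1).
# Independence of node failures is an assumption of this sketch.

def size_group_greedy(workers, success_threshold=0.95):
    """workers: list of (node_id, reputation r_i); returns the chosen replica group."""
    group, p_all_fail = [], 1.0
    for node_id, r in sorted(workers, key=lambda w: w[1], reverse=True):
        group.append(node_id)
        p_all_fail *= (1.0 - r)                 # probability every replica misbehaves
        if 1.0 - p_all_fail >= success_threshold:
            return group
    return group                                # best effort if the threshold is unreachable
```

First-fit and best-fit variants would differ only in the order in which workers are considered; a majority-voting criterion would instead require enough replicas that a majority of correct results is likely.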
45. Experimental Results: correctness
- This was a simulation based on Byzantine behavior, with majority voting
46. Experimental Results: timeliness
- M-first (M1), best BOINC (BOINC), conservative (BOINC-) vs. RIDGE
47. Next Steps
- Nodes are decentralized, but trust management is not!
- Need a peer-based trust exchange framework
- Stanford EigenTrust project: local exchange until the network converges to a global state
48. Application Models
Reliability · Collective metrics · Data dependence · Control dependence
49. Collective Metrics
- Throughput is not always the best metric
- Response, completion time, application-centric metrics
- makespan
- response
50. Communication Makespan
- Nodes download data from replicated data nodes
- Nodes choose data servers independently (decentralized)
- Minimize the maximum download time over all worker nodes (communication makespan)
- data download dominates
51. Data Node Selection
- Several possible factors
- Proximity (RTT)
- Network bandwidth
- Server capacity
[Plots: Download Time vs. RTT - linear; Download Time vs. Bandwidth - exponential]
52. Heuristic Ranking Function
- Query to get candidates; RTT/bandwidth probes
- Node i, data server node j
- Cost function: cost(i,j) = RTT(i,j) × exp(k_j / BW(i,j)), where k_j reflects load/capacity
- Least-cost data node selected independently (a selection sketch follows)
- Three server-selection heuristics that use k_j:
- BW-ONLY: k_j = 1
- BW-LOAD: k_j = n-minute average load (past)
- BW-CAND: k_j = number of candidate responses in the last m seconds (~ future load)
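A sketch of the ranking function in code; only the cost formula and the three k_j variants come from the slide, while the rtt/bw/load dictionaries and function names are illustrative inputs for the example.

```python
# Sketch of least-cost data-server selection: worker i ranks candidate servers j
# by cost(i, j) = rtt[i][j] * exp(k_j / bw[i][j]) and picks the minimum
# independently. k_j follows the BW-ONLY / BW-LOAD / BW-CAND heuristics.
import math

def k_value(server, heuristic, load=None, recent_candidacies=None):
    if heuristic == "BW-ONLY":
        return 1.0
    if heuristic == "BW-LOAD":
        return load[server]                 # n-minute average load (past)
    if heuristic == "BW-CAND":
        return recent_candidacies[server]   # responses in the last m seconds (~future load)
    raise ValueError(heuristic)

def select_server(worker, candidates, rtt, bw, heuristic="BW-ONLY", **kw):
    def cost(server):
        return rtt[worker][server] * math.exp(k_value(server, heuristic, **kw) / bw[worker][server])
    return min(candidates, key=cost)
```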
53. Performance Comparison
54. Computational Makespan
55. Computational Makespan
[Graph: variable-sized vs. equal-sized groups]
56. Next Steps
- Other makespan scenarios
- Eliminate probes for bandwidth and RTT -> estimation
- Richer collective metrics
- deadlines, user-in-the-loop
57. Application Models
Reliability · Collective metrics · Data dependence · Control dependence
58. Application Models
Reliability · Collective metrics · Data dependence · Control dependence
59. Data Dependence
- A data-dependent component needs access to one or more data sources; data may be large
[Diagram: 'discover A' query]
60. Data Dependence (cont'd)
[Diagram: 'discover A' query]
- Where to run it?
61. The Problem
- Where to run a data-dependent component?
- determine the candidate set
- select a candidate
- Unlikely that a candidate knows its downstream bandwidth from particular data nodes
- Idea: infer bandwidth from neighbor observations with respect to the data nodes!
62. Estimation Technique
- Candidate C1 may have had little past interaction with the data source
- but its neighbors may have
- For each neighbor, generate a download estimate
- DT: the neighbor's prior download time from the data source
- RTT: from the candidate and the neighbor to the data source, respectively
- DP: average weighted measure of prior download times for any node to any data source
63. Estimation Technique (cont'd)
- Download Power (DP) characterizes the download capability of a node
- DP: average of prior download times normalized by RTT to the data source
- DT alone is not enough (far-away vs. nearby data source)
- Estimation associated with each neighbor n_i:
- ElapsedEst(n_i) = α × β × DT
- α = my_RTT / neighbor_RTT (to the data source)
- β = neighbor_DP / my_DP
- no active probes: historical data, RTT inference
- Combining neighbor estimates
- mean, median, min, ...
- median worked the best
- Take the min over all candidate estimates (sketched below)
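A sketch of the neighbor-based estimate following the formulas above (the α and β ratios, median combination, min over candidates); the dictionary-based DP and RTT bookkeeping is an assumption about how these averages would be maintained, not the talk's implementation.

```python
# Sketch: for candidate c and data source s, each neighbor n contributes
#   alpha * beta * DT(n, s),  with  alpha = RTT(c, s) / RTT(n, s)
#                             and   beta  = DP(n) / DP(c).
# Per-neighbor estimates are combined with the median; the candidate with the
# smallest estimate is selected.
from statistics import median

def estimate_download(candidate, source, neighbors, rtt, prior_dt, dp):
    ests = []
    for n in neighbors[candidate]:
        if (n, source) not in prior_dt:
            continue                                   # neighbor never fetched from this source
        alpha = rtt[(candidate, source)] / rtt[(n, source)]
        beta = dp[n] / dp[candidate]
        ests.append(alpha * beta * prior_dt[(n, source)])
    return median(ests) if ests else float("inf")

def pick_candidate(candidates, source, neighbors, rtt, prior_dt, dp):
    return min(candidates,
               key=lambda c: estimate_download(c, source, neighbors, rtt, prior_dt, dp))
```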
64. Comparison of Candidate Selection Heuristics
- SELF uses direct observations
65. Take Away
- Next steps
- routing to the best candidates
- Locality between a data source and component
- scalable, no probing needed
- many uses
66. Application Models
Reliability · Collective metrics · Data dependence · Control dependence
67. The Problem
- How to enable decentralized control?
- propagate downstream graph stages
- perform distributed synchronization
- Idea (see the sketch below):
- distributed dataflow token matching
- graph forwarding, futures (Mentat project)
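A minimal sketch of dataflow-style token matching: a matcher node buffers tokens destined for a downstream stage and fires the stage once every required input has arrived. The class and method names are invented for illustration; for the example graph that follows, B's token could be delivered as matcher.deliver('E', 'BCD', 'B', loc_SB).

```python
# Sketch of token matching for decentralized control: tokens are keyed by their
# destination stage; the stage fires when all required inputs are present.
from collections import defaultdict

class TokenMatcher:
    def __init__(self, fire):
        self.pending = defaultdict(dict)   # destination stage -> {input name: payload}
        self.fire = fire                   # callback that launches the stage

    def deliver(self, dest, required_inputs, input_name, payload):
        """A token like 'E, BCD' says: stage E needs inputs B, C, D."""
        bucket = self.pending[dest]
        bucket[input_name] = payload
        if all(name in bucket for name in required_inputs):
            self.fire(dest, {k: bucket[k] for k in required_inputs})
            del self.pending[dest]
```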
68. Control Example
[Diagram: components B, C, D, E; a control node performs token matching]
69. Simple Example
[Diagram: components B, C, D, E, G]
70. Control Example
[Diagram: tokens 'E, BCD', 'C, G', 'D, G' attached to components B, C, D, E, G]
71. Control Example
[Diagram: tokens 'E, BCD' forwarded from B, C, D]
72. Control Example
[Diagram: tokens 'E, BCD, loc(SB)', 'E, BCD, loc(SC)', 'E, BCD, loc(SD)']
- output stored at loc(): where the component is run, at the client, or at a storage node
73. Control Example
[Diagram]
74. Control Example
[Diagram]
75. Control Example
[Diagram]
76. Control Example
[Diagram]
77. Control Example
[Diagram]
78. Control Example
[Diagram]
- How to color and route tokens so that they arrive at the same control node?
79. Open Problems
- Support for global operations
- troubleshooting: what happened?
- monitoring: application progress?
- cleanup: application died, clean up state
- Load balance across different applications
- routing to guarantee dispersion
80. Summary
- Decentralizing Grids is a challenging problem
- Re-think systems, algorithms, protocols, and middleware => fertile research
- Keep our eye on the ball
- reliability, scalability, and maintaining performance
- Some preliminary progress on point solutions
81. My Visit
- Looking to apply some of these ideas to existing UK projects via collaboration
- Current and potential projects
- Decentralized dataflow (Adam Barker)
- Decentralized applications: Haplotype analysis (Andrea Christoforou, Mike Baker)
- Decentralized control: OpenKnowledge (Dave Robertson)
- Goal: improve reliability and scalability of applications and/or infrastructures
82. (No transcript)
83. (No transcript)
84. (No transcript)
85. Non-stationarity
- Nodes may suddenly shift gears
- deliberately malicious, virus, detach/rejoin
- underlying reliability distribution changes
- Solution (a windowed-rating sketch follows)
- window-based rating
- adapt/learn target
- Experiment: blackout at round 300 (30% affected)
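A sketch of window-based rating: only the last W interactions contribute to r_i, so a blackout or sudden behavior shift is reflected quickly. The window size and the prior value are illustrative choices, not parameters from the talk.

```python
# Sketch of a windowed reputation rating for non-stationary node behavior.
from collections import deque

class WindowedRating:
    def __init__(self, window=50, prior=0.5):
        self.history = deque(maxlen=window)   # 1 = correct/timely result, 0 = failure
        self.prior = prior

    def record(self, success: bool):
        self.history.append(1 if success else 0)

    def rating(self) -> float:
        if not self.history:
            return self.prior                 # no observations yet
        return sum(self.history) / len(self.history)
```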
86. Adapting
87. Adaptive Algorithm
[Graphs: success rate, throughput]
88. [Graphs: success rate, throughput]
89. Scheduling Algorithms
90. Estimation Accuracy
- Objects: 27 (0.5 MB - 2 MB)
- Nodes: 130 on PlanetLab
- Download 15,000 times from a randomly chosen node
- Download Elapsed Time Ratio (x-axis) is the ratio of estimated to real measured time
- 1 means perfect estimation
- Accept if the estimate is within a range of the measured time (measured ± error)
- Accept with error = 0.33: 67% of the total are accepted
- Accept with error = 0.50: 83% of the total are accepted
91. Impact of Churn
[Graph: Random mean vs. Global(Prox) mean]
92. Estimating RTT
- We use distance = √(RTT + 1)
- Simple RTT inference technique based on the triangle inequality
- Triangle inequality: Latency(a,c) <= Latency(a,b) + Latency(b,c)
- |Latency(a,b) - Latency(b,c)| <= Latency(a,c) <= Latency(a,b) + Latency(b,c)
- Pick the intersected area as the range, and take the mean
[Diagram: per-neighbor lower/upper bounds via Neighbors A, B, C; the final inference is taken from the intersected range]
93. RTT Inference Result
- More neighbors, greater accuracy
- With 5 neighbors, 85% of the total have < 16% error
94. Other Constraints
[Diagram: tokens 'E, BCD', 'C, A, dep-CD', 'D, A, dep-CD']
- C and D interact and should be co-allocated, nearby
- Tokens in bold should route to the same control point so that a collective query for C and D can be issued
95. Support for Global Operations
- Troubleshooting: what happened?
- Monitoring: application progress?
- Cleanup: application died, clean up state
- Solution mechanism: propagate control-node IPs back to the origin (=> origin IP piggybacked)
- Control nodes and matcher nodes report progress (or lack thereof, via timeouts) to the origin
- Load balance across different applications
96. Other Constraints
[Diagram: tokens 'E, BCD', 'C, A', 'D, A']
- C and D interact and should be co-allocated, nearby
97. Combining Neighbors' Estimation
- MEDIAN shows the best results: using 3 neighbors, 88% of the time the error is within 50% (variation in download times is a factor of 10-20)
- 3 neighbors gives the greatest bang
98. Effect of Candidate Size
99. Performance Comparison
- Parameters
- Data size: 2 MB
- Replication: 10
- Candidates: 5
100. Computation Makespan (cont'd)
- Now bring in reliability: makespan improvement scales well
[Graph: components]
101. Token Loss
- Between B and the matcher; between the matcher and the next stage
- the matcher must notify CB when the token arrives (pass loc(CB) with B's token)
- the destination (E) must notify CB when the token arrives (pass loc(CB) with B's token)
102. RTT Inference
- > 90-95% of Internet paths obey the triangle inequality
- RTT(a,c) < RTT(a,b) + RTT(b,c)
- RTT(server,c) < RTT(server,n_i) + RTT(n_i,c): upper bound
- lower bound: RTT(server,n_i) - RTT(n_i,c)
- iterate over all neighbors to get max L, min U
- return the mid-point (sketched below)
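A sketch of the inference above: each neighbor yields a [lower, upper] interval for RTT(server, client); the intervals are intersected (max of lowers, min of uppers) and the mid-point is returned. The dictionary-based RTT table is an assumption of the sketch.

```python
# Sketch of triangle-inequality RTT inference:
#   lower_i = |RTT(server, n_i) - RTT(n_i, c)|,  upper_i = RTT(server, n_i) + RTT(n_i, c)

def infer_rtt(server, client, neighbors, rtt):
    lowers, uppers = [], []
    for n in neighbors:
        a, b = rtt[(server, n)], rtt[(n, client)]
        lowers.append(abs(a - b))
        uppers.append(a + b)
    lo, hi = max(lowers), min(uppers)
    return (lo + hi) / 2.0        # mid-point of the intersected range
```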