Decentralizing Grids - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Decentralizing Grids
  • Jon Weissman
  • University of Minnesota
  • E-Science Institute
  • Nov. 8 2007

2
Roadmap
  • Background
  • The problem space
  • Some early solutions
  • Research frontier/opportunities
  • Wrapup

3
Background
  • Grids are distributed but also centralized
  • Condor, Globus, BOINC, Grid Services, VOs
  • Why? They are client-server based
  • Centralization pros
  • Security, policy, global resource management
  • Decentralization pros
  • Reliability, dynamic, flexible, scalable
  • Fertile CS research frontier

4
Challenges
  • May have to live within the Grid ecosystem
  • Condor, Globus, Grid services, VOs, etc.
  • First-principles approaches are risky (Legion)
  • The 50K-foot view:
  • How to decentralize Grids yet retain their
    existing features?
  • High performance, workflows, performance
    prediction, etc.

5
Decentralized Grid platform
  • Minimal assumptions about each node
  • Nodes have associated assets (A)
  • basic: CPU, memory, disk, etc.
  • complex: application services
  • exposed interface to assets: OS, Condor, BOINC,
    Web service (a minimal node descriptor is sketched below)
  • Nodes may go up or down
  • Node trust is not a given (do X, does Y instead)
  • Nodes may connect to other nodes or not
  • Nodes may be aggregates
  • Grid may be large (> 100K nodes), so scalability is
    key
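
To make the node model concrete, here is a minimal sketch (in Python) of the per-node record such an overlay might keep; the field names and values are illustrative assumptions, not part of the slides.

from dataclasses import dataclass, field

@dataclass
class NodeDescription:
    """Minimal per-node record for a decentralized Grid overlay (illustrative)."""
    node_id: str
    assets: dict = field(default_factory=dict)    # e.g. {"cpu_ghz": 2.4, "mem_gb": 4, "disk_gb": 80}
    services: list = field(default_factory=list)  # complex application services offered
    interface: str = "raw-os"                     # "raw-os", "condor", "boinc", "web-service", ...
    is_aggregate: bool = False                    # a node may front a whole pool or cluster
    trusted: bool = False                         # trust is not a given

# Example: a donated desktop exposed through BOINC
desktop = NodeDescription(
    node_id="node-42",
    assets={"cpu_ghz": 2.4, "mem_gb": 4, "disk_gb": 80},
    interface="boinc",
)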

6
Grid Overlay
(Diagram: a Grid overlay spanning a Condor network, a Grid service, raw OS services, and a BOINC network)
7
Grid Overlay - Join
(Diagram: a node joining the Grid overlay, which spans a Condor network, a Grid service, raw OS services, and a BOINC network)
8
Grid Overlay - Departure
(Diagram: a node departing the Grid overlay, which spans a Condor network, a Grid service, raw OS services, and a BOINC network)
9
Routing = Discovery
discover A
The query contains sufficient information to locate a
node: RSL, ClassAd, etc. Exact match or semantic
match.
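
As a small illustration of the matching step, the Python sketch below checks a discovery query against a node's advertised assets, treating numeric attributes as minimum requirements and everything else as an exact match; the attribute names and the matching rule are assumptions for the sketch, not RSL or ClassAd semantics.

def matches(query: dict, assets: dict) -> bool:
    """Exact match for non-numeric attributes, '>= requested' for numeric ones (illustrative)."""
    for attr, wanted in query.items():
        have = assets.get(attr)
        if have is None:
            return False
        if isinstance(wanted, (int, float)):
            if have < wanted:            # numeric attributes are minimum requirements
                return False
        elif have != wanted:             # everything else must match exactly
            return False
    return True

# "discover A": does this node satisfy the query?
query = {"os": "linux", "cpu_ghz": 2.0, "mem_gb": 2}
print(matches(query, {"os": "linux", "cpu_ghz": 2.4, "mem_gb": 4, "disk_gb": 80}))  # True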
10
Routing = Discovery
bingo!
11
Routing = Discovery
The discovered node returns a handle sufficient for
the client to interact with it: perform
service invocation, job/data transmission,
etc.
12
Routing = Discovery
  • Three parties
  • initiator of discovery events for A
  • client: invocation, health of A
  • node offering A
  • Often the initiator and client will be the same
  • Other times the client will be determined dynamically
  • if W is a web service and results are returned to
    a calling client, we want to locate C_W near W =>
  • discover W, then C_W!

13
Routing = Discovery
X
discover A
14
Routing = Discovery
15
Routing = Discovery
bingo!
16
Routing = Discovery
17
Routing = Discovery
outside client
18
Routing = Discovery
discover As
19
Routing = Discovery
20
Grid Overlay
  • This generalizes
  • Resource query (query contains job requirements)
  • Looks like decentralized matchmaking
  • These are the easy cases
  • independent, simple queries
  • find a CPU with characteristics x, y, z
  • find 100 CPUs, each with x, y, z
  • suppose queries are complex or related?
  • find N CPUs with an aggregate power of G Gflops (see
    the greedy sketch below)
  • locate an asset near a previously discovered asset
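
To see why the collective case is harder, the sketch below answers the aggregate-Gflops query in a naive, centralized way: a greedy pick over a set of already-discovered candidates. The function and its parameters are illustrative; doing the equivalent while the query is routed hop-by-hop through the overlay is the open problem.

def pick_aggregate(candidates: dict, target_gflops: float, max_nodes: int):
    """Greedy, centralized sketch: choose up to max_nodes whose summed Gflops >= target.
    candidates maps node_id -> Gflops; returns the chosen node_ids, or None if unsatisfiable."""
    chosen, total = [], 0.0
    for node, gflops in sorted(candidates.items(), key=lambda kv: -kv[1]):
        chosen.append(node)
        total += gflops
        if total >= target_gflops:
            return chosen
        if len(chosen) == max_nodes:
            break
    return None

print(pick_aggregate({"a": 1.2, "b": 0.8, "c": 2.5, "d": 0.5}, target_gflops=4.0, max_nodes=3))
# -> ['c', 'a', 'b']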

21
Grid Scenarios
  • Grid applications are more challenging
  • The application has a more complex structure:
    multi-task, parallel/distributed, control/data
    dependencies
  • an individual job/task needs a resource near a data
    source
  • workflow
  • queries are not independent
  • Metrics are collective
  • not simply raw throughput
  • makespan
  • response
  • QoS

22
Related Work
  • Maryland/Purdue
  • matchmaking
  • Oregon-CCOF
  • time-zone

CAN
23
Related Work (contd)
  • None of these approaches address the Grid
    scenarios (in a decentralized manner)
  • Complex multi-task data/control dependencies
  • Collective metrics

24
50K Ft Research Issues
  • Overlay Architecture
  • structured, unstructured, hybrid
  • what is the right architecture?
  • Decentralized control/data dependencies
  • how to do it?
  • Reliability
  • how to achieve it?
  • Collective metrics
  • how to achieve them?

25
Context: Application Model
(Diagram labels: answer, data source)
26
Context: Application Models
Reliability | Collective metrics | Data dependence | Control dependence
27
Context: Environment
  • RIDGE project - ridge.cs.umn.edu
  • reliable infrastructure for donation grid environments
  • Live deployment on PlanetLab (planet-lab.org)
  • 700 nodes spanning 335 sites and 35 countries
  • emulators and simulators
  • Applications
  • BLAST
  • Traffic planning
  • Image comparison

28
Application Models
Reliability | Collective metrics | Data dependence | Control dependence
29
Reliability Example

(Diagram: nodes B, C, D, E, G)
30
Reliability Example

(Diagram: nodes B, C, D, E, G; client node C_G)
C_G is responsible for G's health
31
Reliability Example

(Diagram: nodes B, C, D, E; reply "G, loc(C_G)"; client node C_G)
32
Reliability Example

(Diagram: nodes B, C, D, E, G; client node C_G)
could also discover G first, then C_G
33
Reliability Example

(Diagram: nodes B, C, D, E; G has failed (marked X); client node C_G)
34
Reliability Example

(Diagram: nodes C, D, E, G; client node C_G)
35
Reliability Example

(Diagram: nodes C, D, E, G; client node C_G)
36
Client Replication

(Diagram: nodes B, C, D, E, G)
37
Client Replication

(Diagram: nodes B, C, D, E, G; replicated client nodes C_G1, C_G2)
loc(G), loc(C_G1), loc(C_G2) propagated
38
Client Replication

(Diagram: nodes B, C, D, E, G; client replicas C_G1, C_G2, one of which has failed (X))
client hand-off depends on the nature of G and the
interaction
39
Component Replication

(Diagram: nodes B, C, D, E, G)
40
Component Replication

(Diagram: nodes C, D, E; replicated components G1, G2; client node C_G)
41
Replication Research
  • Nodes are unreliable: crash, hacked, churn,
    malicious, slow, etc.
  • How many replicas?
  • too many: waste of resources
  • too few: application suffers

42
System Model
  • Reputation rating r_i: degree of node reliability
  • Dynamically size the redundancy based on r_i
  • Nodes are not connected and check in to a central
    server
  • Note: variable-sized groups

(Diagram: variable-sized groups of nodes with reputation
ratings 0.9, 0.8, 0.8, 0.7, 0.7, 0.4, 0.3, 0.4, 0.8, 0.8)
43
Reputation-based Scheduling
  • Reputation rating
  • techniques for estimating reliability based on
    past interactions (a simple estimator is sketched below)
  • Reputation-based scheduling algorithms
  • using the reliabilities for allocating work
  • relies on a success threshold parameter
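
The slides do not spell out the estimator, so the sketch below uses a simple exponentially weighted success rate as one plausible way to turn past (verified) interactions into a rating r_i; the initial value and weight are assumptions.

class Reputation:
    """Per-node reliability rating r_i in [0, 1], updated from verified results (illustrative)."""
    def __init__(self, initial: float = 0.5, alpha: float = 0.1):
        self.r = initial      # current rating r_i
        self.alpha = alpha    # weight given to the newest interaction

    def update(self, result_correct: bool) -> float:
        outcome = 1.0 if result_correct else 0.0
        self.r = (1 - self.alpha) * self.r + self.alpha * outcome
        return self.r

rep = Reputation()
for ok in [True, True, False, True]:
    rep.update(ok)
print(round(rep.r, 3))   # the rating drifts toward the node's observed success rate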

44
Algorithm Space
  • How many replicas?
  • first-fit, best-fit, random, fixed, ...
  • algorithms compute how many replicas are needed to
    meet a success threshold (see the sketch below)
  • How to reach consensus?
  • M-first (better for timeliness)
  • Majority (better for Byzantine threats)
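
A sketch of the replica-sizing idea under majority voting, assuming independent failures and known per-node reliabilities r_i (the actual RIDGE algorithms differ in their details): keep adding the most reliable candidates until the probability that a majority return the correct result meets the success threshold.

def prob_majority_correct(rels):
    """P(more than half of the replicas return the correct result), independent nodes."""
    dp = [1.0]                      # dp[k] = P(exactly k of the nodes seen so far are correct)
    for r in rels:
        dp = [(dp[k] * (1 - r) if k < len(dp) else 0.0) +
              (dp[k - 1] * r if k > 0 else 0.0)
              for k in range(len(dp) + 1)]
    need = len(rels) // 2 + 1
    return sum(dp[need:])

def size_group(candidate_rels, threshold, max_replicas=7):
    """Greedily add the most reliable candidates until the success threshold is met."""
    group = []
    for r in sorted(candidate_rels, reverse=True):
        group.append(r)
        if len(group) % 2 == 1 and prob_majority_correct(group) >= threshold:
            return group
        if len(group) == max_replicas:
            break
    return group     # best effort if the threshold is unreachable

print(size_group([0.9, 0.8, 0.8, 0.7, 0.4], threshold=0.92))   # -> [0.9, 0.8, 0.8]

An M-first variant would instead stop collecting as soon as M matching results arrive, trading Byzantine resilience for timeliness.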

45
Experimental Results: correctness
This was a simulation based on Byzantine behavior with
majority voting
46
Experimental Results: timeliness
M-first (M=1), best BOINC (BOINC), conservative
(BOINC-) vs. RIDGE
47
Next steps
  • Nodes are decentralized, but trust management
    is not!
  • Need a peer-based trust exchange framework
  • Stanford EigenTrust project: local exchange
    until the network converges to a global state

48
Application Models
Reliability | Collective metrics | Data dependence | Control dependence
49
Collective Metrics
  • Throughput is not always the best metric
  • Response, completion time, application-centric metrics
  • makespan, response

50
Communication Makespan
  • Nodes download data from replicated data nodes
  • Nodes choose data servers independently
    (decentralized)
  • Minimize the maximum download time for all worker
    nodes (communication makespan)

data download dominates
51
Data node selection
  • Several possible factors
  • Proximity (RTT)
  • Network bandwidth
  • Server capacity

(Plots: Download Time vs. RTT - linear; Download Time vs. Bandwidth - exponential)
52
Heuristic Ranking Function
  • Query to get candidates, plus RTT/bandwidth probes
  • Node i, data server node j
  • Cost function: rtt_i,j x exp(k_j / bw_i,j), where
    k_j = load/capacity (transcribed in code below)
  • The least-cost data node is selected independently
  • Three server selection heuristics that use k_j
  • BW-ONLY: k_j = 1
  • BW-LOAD: k_j = n-minute average load (past)
  • BW-CAND: k_j = number of candidate responses in the
    last m seconds (approximates future load)
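
The ranking function transcribes directly into code; the sample RTT, bandwidth, and load numbers below are placeholders.

import math

def download_cost(rtt_s: float, bw_mbps: float, k: float) -> float:
    """Cost of fetching from a data server: rtt * exp(k / bw); lower is better."""
    return rtt_s * math.exp(k / bw_mbps)

def choose_server(candidates, heuristic="BW-ONLY", loads=None, recent_responses=None):
    """candidates: {server_id: (rtt_s, bw_mbps)}. Each worker runs this independently."""
    def k_for(server):
        if heuristic == "BW-ONLY":
            return 1.0
        if heuristic == "BW-LOAD":            # k_j = n-minute average load (past)
            return loads[server]
        if heuristic == "BW-CAND":            # k_j = candidate responses in the last m seconds
            return recent_responses[server]
        raise ValueError(heuristic)
    return min(candidates, key=lambda s: download_cost(*candidates[s], k_for(s)))

servers = {"s1": (0.04, 10.0), "s2": (0.12, 50.0)}
print(choose_server(servers, heuristic="BW-ONLY"))   # -> 's1'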

53
Performance Comparison
54
Computational Makespan
55
Computational Makespan

(Diagram labels: variable-sized, equal-sized)
56
Next Steps
  • Other makespan scenarios
  • Eliminate probes for bandwidth and RTT -> estimation
  • Richer collective metrics
  • deadlines, user-in-the-loop

57
Application Models
Reliability | Collective metrics | Data dependence | Control dependence
58
Application Models
Reliability | Collective metrics | Data dependence | Control dependence
59
Data Dependence
  • A data-dependent component needs access to one or
    more data sources; the data may be large

discover A
60
Data Dependence (contd)
discover A
Where to run it?
61
The Problem
  • Where to run a data-dependent component?
  • determine the candidate set
  • select a candidate
  • A candidate is unlikely to know its downstream
    bandwidth from particular data nodes
  • Idea: infer bandwidth from neighbor observations
    with respect to the data nodes!

62
Estimation Technique
  • C1 may have had little past interaction with the
    data source
  • but its neighbors may have
  • For each neighbor, generate a download estimate
  • DT: the neighbor's prior download time from the
    data source
  • RTT: from the candidate and from the neighbor to the
    data source, respectively
  • DP: average weighted measure of prior download
    times from any node to any data source

63
Estimation Technique (contd)
  • Download Power (DP) characterizes the download
    capability of a node
  • DP = average(RTT / DT) over prior downloads (higher
    DP means better download capability)
  • DT alone is not enough (far-away vs. nearby data source)
  • Estimation associated with each neighbor n_i
  • ElapsedEst_ni = α x β x DT (sketched below)
  • α = my_RTT / neighbor_RTT (to the data source)
  • β = neighbor_DP / my_DP
  • no active probes: historical data, RTT inference
  • Combining neighbor estimates
  • mean, median, min, ...
  • median worked the best
  • Take the min over all candidate estimates
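
A sketch of the estimator. DP is computed here as the average of RTT/DT over a node's prior downloads, a normalization chosen to be consistent with the α and β terms above (the slide's exact definition of DP is garbled, so treat this as an assumption); per-neighbor estimates are combined with the median, which worked best.

from statistics import median

def download_power(history):
    """history: list of (rtt_s, download_time_s) pairs from a node's prior downloads.
    Higher DP = better download capability per unit of network distance (assumed definition)."""
    return sum(rtt / dt for rtt, dt in history) / len(history)

def elapsed_estimate(my_rtt, my_dp, nbr_rtt, nbr_dp, nbr_dt):
    """ElapsedEst_ni = alpha * beta * DT, alpha = my_RTT/neighbor_RTT, beta = neighbor_DP/my_DP."""
    return (my_rtt / nbr_rtt) * (nbr_dp / my_dp) * nbr_dt

def candidate_estimate(my_rtt, my_dp, neighbor_observations):
    """Combine the per-neighbor estimates with the median."""
    return median(elapsed_estimate(my_rtt, my_dp, n_rtt, n_dp, n_dt)
                  for n_rtt, n_dp, n_dt in neighbor_observations)

# candidate with an 80 ms RTT to the data source; three neighbors' (rtt, dp, dt) observations
print(candidate_estimate(0.08, 1.5, [(0.04, 2.0, 3.0), (0.10, 1.0, 8.0), (0.06, 1.2, 5.0)]))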

64
Comparison of Candidate Selection Heuristics
SELF uses direct observations
65
Take Away
  • Next steps
  • routing to the best candidates
  • Locality between a data source and component
  • scalable, no probing needed
  • many uses

66
Application Models
Reliability | Collective metrics | Data dependence | Control dependence
67
The Problem
  • How to enable decentralized control?
  • propagate downstream graph stages
  • perform distributed synchronization
  • Idea:
  • distributed dataflow token matching (sketched below)
  • graph forwarding, futures (Mentat project)
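
A minimal sketch of dataflow token matching at a control node (the structure and names are illustrative, not the actual RIDGE or Mentat mechanism): tokens are keyed by application instance and destination component, and the destination fires once all of its declared producers have delivered.

from collections import defaultdict

class TokenMatcher:
    """Control node: collects tokens until a component's input set is complete (illustrative)."""
    def __init__(self, required_inputs):
        # required_inputs: {destination: {producer components}}, e.g. {"E": {"B", "C", "D"}}
        self.required = required_inputs
        self.arrived = defaultdict(dict)    # (app_id, destination) -> {producer: output location}

    def deliver(self, app_id, dest, producer, output_loc):
        """Called when a token arrives; returns the input locations if dest can now fire."""
        slot = self.arrived[(app_id, dest)]
        slot[producer] = output_loc
        if set(slot) >= self.required[dest]:
            return self.arrived.pop((app_id, dest))    # fire dest with its inputs' locations
        return None

matcher = TokenMatcher({"E": {"B", "C", "D"}})
matcher.deliver("app1", "E", "B", "loc(S_B)")
matcher.deliver("app1", "E", "C", "loc(S_C)")
print(matcher.deliver("app1", "E", "D", "loc(S_D)"))   # all inputs present -> E can be launched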

68
Control Example

(Diagram: nodes B, C, D, E; a control node performs token matching)
69
Simple Example

(Diagram: workflow graph over components B, C, D, E, G)
70
Control Example
(Diagram: workflow over B, C, D, E, G with tokens labeled "E, BCD", "C, G", and "D, G")
71
Control Example

(Diagram: nodes B, C, D, E with three tokens labeled "E, BCD")
72
Control Example

(Diagram: nodes B, C, D, E with tokens labeled "E, BCD, loc(S_B)",
"E, BCD, loc(S_C)", and "E, BCD, loc(S_D)")
The output is stored at loc(): where the component is run,
at the client, or at a storage node
73
Control Example

(Diagram: nodes B, C, D, E; tokens B, C, D in flight)
74
Control Example

(Diagram: nodes B, C, D, E; tokens B, C, D in flight)
75
Control Example

(Diagram: nodes B, C, D, E; tokens B, C, D in flight)
76
Control Example

(Diagram: nodes B, C, D, E; tokens B, C, D, E in flight)
77
Control Example

(Diagram: nodes B, C, D, E; tokens B, C, D, E in flight)
78
Control Example

How do we color and route tokens so that they arrive
at the same control node?
(Diagram: nodes B, C, D, E; tokens B, C, D, E)
79
Open Problems
  • Support for Global Operations
  • troubleshooting: what happened?
  • monitoring: application progress?
  • cleanup: application died, clean up state
  • Load balance across different applications
  • routing to guarantee dispersion

80
Summary
  • Decentralizing Grids is a challenging problem
  • Re-think systems, algorithms, protocols, and
    middleware => fertile research
  • Keep our eye on the ball
  • reliability, scalability, and maintaining
    performance
  • Some preliminary progress on point solutions

81
My visit
  • Looking to apply some of these ideas to existing
    UK projects via collaboration
  • Current and potential projects
  • Decentralized dataflow (Adam Barker)
  • Decentralized applications: haplotype analysis
    (Andrea Christoforou, Mike Baker)
  • Decentralized control: openKnowledge (Dave
    Robertson)
  • Goal: improve the reliability and scalability of
    applications and/or infrastructures

82
  • Questions

83
(No Transcript)
84
  • EXTRAS

85
Non-stationarity
  • Nodes may suddenly shift gears
  • deliberately malicious, virus, detach/rejoin
  • the underlying reliability distribution changes
  • Solution
  • window-based rating (sketched below)
  • adapt/learn the target threshold
  • Experiment: blackout at
    round 300 (30% affected)
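
A sketch of a window-based rating (the window length and the neutral prior are assumptions): only the last W verified interactions count, so a node that suddenly shifts gears sees its rating track the change instead of being anchored by old history.

from collections import deque

class WindowedReputation:
    """Reliability rating computed over the last `window` verified interactions (illustrative)."""
    def __init__(self, window: int = 50):
        self.recent = deque(maxlen=window)

    def update(self, result_correct: bool) -> float:
        self.recent.append(1.0 if result_correct else 0.0)
        return self.rating()

    def rating(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.5   # neutral prior

rep = WindowedReputation(window=10)
for ok in [True] * 20:       # a long good history...
    rep.update(ok)
for ok in [False] * 5:       # ...then the node "shifts gears"
    rep.update(ok)
print(rep.rating())          # 0.5: the rating reflects only the recent window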

86
Adapting
87
Adaptive Algorithm
(Plots: success rate, throughput)
88
(Plots: success rate, throughput)
89
Scheduling Algorithms
90
Estimation Accuracy
  • Objects: 27 (0.5 MB - 2 MB)
  • Nodes: 130 on PlanetLab
  • Downloads: 15,000 from a randomly chosen node
  • Download Elapsed Time Ratio (x-axis) is the ratio
    of the estimate to the real measured time
  • 1 means perfect estimation
  • Accept if the estimate is within a given error
    range of the measured time
  • Accept with error = 0.33: 67% of the total are
    accepted
  • Accept with error = 0.50: 83% of the total are
    accepted

91
Impact of Churn
(Plot legend: Random mean, Global(Prox) mean)
  • Jinoh: mean over what?

92
Estimating RTT
  • We use distance = sqrt(RTT + 1)
  • Simple RTT inference technique based on the
    triangle inequality
  • Triangle inequality: Latency(a,c) <= Latency(a,b) +
    Latency(b,c)
  • |Latency(a,b) - Latency(b,c)| <= Latency(a,c) <=
    Latency(a,b) + Latency(b,c)
  • Pick the intersected area as the range, and take
    the mean

(Diagram: per-neighbor RTT ranges [lower bound, upper bound]
via Neighbor A, Neighbor B, Neighbor C; the final inference is
taken from the intersected range)
93
RTT Inference Result
  • More neighbors, greater accuracy
  • With 5 neighbors, 85% of the total have < 16% error

94
Other Constraints
(Diagram: workflow over A, B, C, D, E with tokens labeled
"E, BCD", "C, A, dep-CD", and "D, A, dep-CD")
C and D interact and should be co-allocated
nearby. Tokens in bold should route to the same
control point so that a collective query for C and D
can be issued.
95
Support for Global Operations
  • Troubleshooting: what happened?
  • Monitoring: application progress?
  • Cleanup: application died, clean up state
  • Solution mechanism: propagate control node IPs
    back to the origin (=> origin IP piggybacked)
  • Control nodes and matcher nodes report progress
    (or lack thereof, via timeouts) to the origin
  • Load balance across different applications

96
Other Constraints
(Diagram: workflow over A, B, C, D, E with tokens labeled
"E, BCD", "C, A", and "D, A")
C and D interact and should be co-allocated
nearby
97
Combining Neighbors Estimation
  • MEDIAN shows the best results: using 3 neighbors,
    88% of the time the error is within 50% (the variation
    in download times is a factor of 10-20)
  • 3 neighbors gives the biggest bang for the buck

98
Effect of Candidate Size
99
Performance Comparison
  • Parameters
  • Data size: 2 MB
  • Replication: 10
  • Candidates: 5

100
Computation Makespan (contd)
  • Now bring in reliability: the makespan improvement
    scales well

components
101
Token loss
  • Between B and the matcher; between the matcher and
    the next stage
  • the matcher must notify C_B when the token arrives
    (pass loc(C_B) with B's token)
  • the destination (E) must notify C_B when the token
    arrives (pass loc(C_B) with B's token)

102
RTT Inference
  • > 90-95% of Internet paths obey the triangle
    inequality
  • RTT(a, c) < RTT(a, b) + RTT(b, c)
  • RTT(server, c) < RTT(server, n_i) + RTT(n_i, c)
  • upper bound
  • lower bound: RTT(server, n_i) - RTT(n_i, c)
  • iterate over all neighbors to get the max L and min U
  • return the mid-point (see the sketch below)
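
The per-neighbor bounds translate directly into a short routine; it assumes each neighbor n_i can report both RTT(server, n_i) and RTT(n_i, c).

def infer_rtt(neighbor_rtts):
    """neighbor_rtts: list of (rtt_server_ni, rtt_ni_client) pairs, in seconds.
    Intersect the per-neighbor triangle-inequality ranges and return the mid-point."""
    lowers, uppers = [], []
    for rtt_s_n, rtt_n_c in neighbor_rtts:
        lowers.append(abs(rtt_s_n - rtt_n_c))    # lower bound via this neighbor
        uppers.append(rtt_s_n + rtt_n_c)         # upper bound via this neighbor
    lo, hi = max(lowers), min(uppers)            # intersected range: max L, min U
    return (lo + hi) / 2.0

# three neighbors' measured RTTs to the server and to the client
print(infer_rtt([(0.060, 0.030), (0.045, 0.050), (0.080, 0.020)]))   # ~0.075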