1
Cluster Computing on the Fly
  • Peer-to-Peer Scheduling
  • of Idle Cycles in the Internet
  • Virginia Lo, Daniel Zappala, Dayi Zhou, Shanyu
    Zhao, and Yuhong Liu
  • Network Research Group
  • University of Oregon

2
CCOF Motivation
  • A variety of users and their applications need
    additional computational resources
  • Many machines throughout the Internet lie idle
    for long periods of time
  • Many users are willing to donate cycles
  • How can we provide cycles to the widest range of
    users, beyond institutional barriers?

3
CCOF Scenario 1
  • A chess hobbyist wants to test her chess program
  • She only has a PC at home
  • She joins a chess interest group's cycle-sharing
    community and discovers hosts that will run her
    chess state-space search algorithm for a few
    weeks

4
CCOF Scenario 2
  • Experiments with a network game are due in a
    week to meet a conference deadline
  • Planet Lab overloaded
  • Network Research Group machines overloaded
  • Requests for hosts go out to machines in the
    department, campus, colleagues at other
    universities, personal friends, and general
    donors

5
CCOF Goals and Assumptions
  • Cycle sharing in an open peer-to-peer environment
  • Application-specific scheduling
  • Long term fairness
  • Hosts retain local control, sandbox

6
Cycle Sharing Applications
  • Four classes of applications that can benefit
    from harvesting idle cycles
  • Infinite workpile
  • Workpile with deadlines
  • Tree-based search
  • Point-of-Presence (PoP)

7
Infinite workpile
  • Consume huge amounts of compute time
  • Master-slave model
  • Embarrassingly parallel: no communication among
    hosts
  • e.g. SETI@home, Stanford Folding, etc.

8
Workpile with deadlines
  • Similar to infinite workpile but more moderate
  • Must be completed by a deadline (days or weeks)
  • Some capable of increasingly refined results
    given extra time
  • e.g. simulations with a large parameter space,
    ray tracing, genetic algorithms

9
Tree-based Search
  • Tree of slave processes rooted at a single
    master node
  • Dynamic growth as the search space is expanded
  • Dynamic pruning as costly solutions are abandoned
  • Small amount of communication among slave
    processes to share lower bounds
  • e.g. distributed branch-and-bound, alpha-beta
    search, recursive backtracking
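The pattern above can be sketched in a few lines. Below is a minimal, single-machine sketch of branch-and-bound (a 0/1 knapsack is used as an illustrative problem; it is not from the slides). In CCOF, subtrees would be farmed out to slave hosts that periodically exchange the best bound found so far.

```python
# Illustrative sketch: branch-and-bound on a 0/1 knapsack.
# Maximize total value of items (weight, value) under a weight cap.

def branch_and_bound(items, capacity):
    # Sort by value density so the fractional bound below is admissible.
    items = sorted(items, key=lambda it: it[1] / it[0], reverse=True)
    best = 0

    def bound(i, weight, value):
        # Optimistic bound: fill remaining capacity fractionally.
        for w, v in items[i:]:
            if weight + w <= capacity:
                weight += w
                value += v
            else:
                return value + v * (capacity - weight) / w
        return value

    def search(i, weight, value):
        nonlocal best
        best = max(best, value)                   # record partial solution
        if i == len(items) or bound(i, weight, value) <= best:
            return                                # dynamic pruning
        if weight + items[i][0] <= capacity:      # branch: take item i
            search(i + 1, weight + items[i][0], value + items[i][1])
        search(i + 1, weight, value)              # branch: skip item i

    search(0, 0, 0)
    return best
```

In the distributed version, `best` becomes the shared lower bound that slave processes exchange, and the two recursive calls become subtrees dispatched to hosts.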

10
Point-of-presence
  • Minimal consumption of CPU cycles
  • Require placement of application code dispersed
    throughout the Internet to meet specific
    location, topological distribution, or resource
    requirements
  • Ex security monitoring systems, traffic
    analysis systems, protocol testing, distributed
    games

11
CCOF Architecture
12
CCOF Architecture
  • Cycle sharing communities based on factors such
    as interest, geography, performance, trust, or
    generic willingness to share.
  • Span institutional boundaries without
    institutional negotiation
  • A host can belong to more than one community
  • May want to control community membership

13
CCOF Architecture
  • Application schedulers to discover hosts,
    negotiate access, export code, and collect and
    verify results.
  • Application-specific (tailored to needs of
    application)
  • Resource discovery
  • Monitors jobs for progress; checks jobs for
    correctness
  • Kills or migrates jobs as needed

14
CCOF Architecture (cont.)
  • Local schedulers enforce local policy
  • Run in background mode vs. preempt when the
    user returns
  • QoS through admission control and reservation
    policies
  • Local machine protected through sandbox
  • Tight control over communication

15
CCOF Architecture (cont.)
  • Coordinated scheduling
  • Across local schedulers, across application
    schedulers
  • Enforce long-term fairness
  • Enhance resource discovery through information
    exchange

16
CCOF Preliminary Work
  • Wave Scheduler
  • Resource discovery experiments
  • Quizzes for Correctness
  • Point-of-Presence Scheduler

17
Wave Scheduler
  • Well-suited for workpile with deadlines
  • Provides ongoing access to dedicated cycles by
    following night timezones around the globe
  • Uses a CAN-based overlay to organize hosts by
    timezone
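The timezone idea can be sketched minimally, assuming a fixed 22:00-06:00 local night window (an illustrative assumption; the actual CAN-based overlay and migration policy are not shown, and all names are hypothetical):

```python
# Illustrative sketch of Wave Scheduler host selection by night timezone.

def is_night(utc_hour, tz_offset):
    """True if the host's local clock is in the 22:00-06:00 night window."""
    local = (utc_hour + tz_offset) % 24
    return local >= 22 or local < 6

def night_hosts(hosts, utc_hour):
    """hosts: list of (host_id, utc_offset). Return ids currently in night."""
    return [h for h, off in hosts if is_night(utc_hour, off)]

def next_night_zone(utc_hour):
    """UTC offset where night has just begun: where a migrating job goes next."""
    return (22 - utc_hour) % 24

hosts = [("eu-1", 1), ("us-east-1", -5), ("asia-1", 8)]
# At 21:00 UTC it is 22:00 in UTC+1 and 05:00 in UTC+8: both still night.
```

As morning reaches a host's zone, the scheduler would migrate the job westward to the zone returned by `next_night_zone`, riding the night wave around the globe.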

18
Wave Scheduler
19
Resource Discovery (Zhou and Lo, to appear in
WGP2P '04 at CCGrid '04)
  • Highly dynamic environment (hosts come and go)
  • Hosts maintain profiles of blocks of idle time
  • Four basic search methods
  • Rendezvous points
  • Host advertisements
  • Client expanding ring search
  • Client random walk search
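Of the four methods, client expanding ring search is easy to sketch: the client floods its request with TTL 1, then 2, and so on, until enough idle hosts respond. A minimal sketch over an adjacency-dict overlay, with host idle-time profiles reduced to a boolean predicate (all names are illustrative):

```python
# Illustrative sketch of client expanding ring search on an overlay graph.
from collections import deque

def ring_search(overlay, client, idle, wanted, max_ttl=4):
    found = []
    for ttl in range(1, max_ttl + 1):            # expand the ring
        found, seen = [], {client}
        frontier = deque([(client, 0)])
        while frontier:                          # BFS out to `ttl` hops
            node, dist = frontier.popleft()
            if dist == ttl:
                continue
            for nbr in overlay[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    if idle(nbr):
                        found.append(nbr)
                    frontier.append((nbr, dist + 1))
        if len(found) >= wanted:                 # enough idle hosts: stop early
            return found
    return found
```

Small rings keep message overhead low when idle hosts are nearby; each widening of the ring re-floods, which is the method's main cost.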

20
Resource Discovery
  • Rendezvous points perform best
  • high job completion rate and low message
    overhead, but favors large jobs under heavy
    workloads
  • → coordinated scheduling needed for long-term
    fairness

21
CCOF Verification
  • Goal: verify correctness of returned results
    for workpile and workpile with deadlines
  • Quizzes: easily verifiable computations that are
    indistinguishable from the actual work
  • Standalone quizzes vs. embedded quizzes
  • Quiz performance stored in a reputation system
  • Quizzes vs. replication
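The quiz mechanism can be sketched as follows: quiz tasks with precomputed answers are shuffled in among the real work units, and a host that fails any quiz has its batch rejected. The reputation update rule and all names here are illustrative assumptions, not the slides' design.

```python
# Illustrative sketch of quiz-based result verification.
import random

def make_batch(work_units, quizzes, seed=0):
    """Shuffle quizzes (tasks with precomputed answers) into the batch."""
    rng = random.Random(seed)
    batch = [("work", w, None) for w in work_units] + \
            [("quiz", q, ans) for q, ans in quizzes]
    rng.shuffle(batch)   # quizzes must be indistinguishable by position
    return batch

def verify(batch, results, reputation, host):
    """Accept the batch only if every quiz came back with its known answer."""
    for (kind, task, expected), got in zip(batch, results):
        if kind == "quiz" and got != expected:
            reputation[host] = reputation.get(host, 1.0) * 0.5  # penalize
            return False
    reputation[host] = reputation.get(host, 1.0) + 0.1  # reward honest work
    return True
```

This is the standalone-quiz variant; embedded quizzes would instead hide verifiable sub-computations inside the work units themselves.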

22
Point-of-Presence Scheduler
  • Scalable protocols for identifying selected
    hosts in the community overlay network such that
    each ordinary node is within k hops of C of the
    selected hosts
  • The (C,k)-dominating set problem
  • Useful for leader election, rendezvous point
    placement, monitor location, etc.

23
CCOF Dom(C,k) Protocol
  • Round 1: Each node says HI to its k-hop
    neighbors
  • <Each node knows the size of its own k-hop
    neighborhood>
  • Round 2: Each node sends the size of its k-hop
    neighborhood to all its neighbors
  • <Each node knows the size of all neighbors'
    k-hop neighborhoods>
  • Round 3: If a node is maximal within its
    neighborhood, it declares itself a dominator and
    notifies all neighbors
  • <Some nodes hear from some dominators, some
    don't>
  • For nodes not yet covered by C dominators, repeat
    Rounds 1-3 excluding current dominators, until
    all nodes are covered
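The rounds above can be simulated directly. A minimal sketch on an undirected graph stored as an adjacency dict, with ties broken by node id so the run is deterministic; the message passing is collapsed into shared data structures (real hosts would exchange HI and neighborhood-size messages), and termination assumes C-coverage is achievable.

```python
# Illustrative simulation of the Dom(C,k) election rounds.
from collections import deque

def k_hop(graph, node, k):
    """All nodes within k hops of `node` (BFS, including the node itself)."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        n, d = frontier.popleft()
        if d == k:
            continue
        for nbr in graph[n]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

def dom_ck(graph, C, k):
    """Elect dominators until every node is covered by C of them."""
    dominators, cover = set(), {n: 0 for n in graph}
    while any(c < C for c in cover.values()):
        # Rounds 1-3 run on the graph with current dominators removed.
        active = {n: [m for m in graph[n] if m not in dominators]
                  for n in graph if n not in dominators}
        size = {n: len(k_hop(active, n, k)) for n in active}   # Rounds 1-2
        new = {n for n in active                               # Round 3
               if all((size[n], n) >= (size[m], m) for m in active[n])}
        dominators |= new
        for d in new:                    # update coverage counts
            for covered in k_hop(graph, d, k):
                cover[covered] += 1
    return dominators
```

On the 5-node path a-b-c-d-e with C = 1 and k = 1, the first pass elects d; a second pass over the remaining subgraph elects b and e, covering all nodes.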

24
CCOF Research Issues
  • Incentives and fairness
  • What incentives are needed to encourage hosts to
    donate cycles?
  • How to keep track of resources consumed vs.
    resources donated?
  • How to prevent resource hogs from taking an
    unfair share?
  • Resource discovery
  • How to discover hosts in a highly dynamic
    environment (hosts come and go, withdraw cycles,
    fail)?
  • How to discover hosts that can be trusted, that
    will provide the needed resources?

25
CCOF Research Issues
  • Verification, trust, and reputation
  • How to check returned results?
  • How to catch malicious or misbehaving hosts that
    change results with low frequency?
  • Which reputation system?
  • Application-based scheduling
  • How do trust and reputation influence
    scheduling?
  • How should a host decide from whom to accept work?

26
CCOF Research Issues
  • Quality of service and performance monitoring
  • How to provide local admission control?
  • How to evaluate and provide QoS - guaranteed
    versus predictive service?
  • Security
  • How to prevent attacks launched from guest code
    running on the host?
  • How to prevent denial-of-service attacks in
    which useless code occupies many hosts?

27
Related Work
  • Systems most closely resembling CCOF
  • SHARP (Fu, Chase, Chun, Schwab, Vahdat,
    2003)
  • Partage, Self-organizing Flock of Condors
    (Hu, Butt, Zhang, 2003)
  • BOINC (Anderson, 2003), limited to donation
    of cycles to workpile applications
  • Resource discovery
  • (Iamnitchi and Foster, 2002); Condor
    matchmaking
  • Load sharing within and across institutions
  • Condor, Condor Flocks, Grid computing
  • Incentives and Fairness
  • See Berkeley Workshop on Economics of P2P
    Systems
  • OurGrid (Andrade, Cirne, Brasileiro,
    Roisenberg, 2003)
  • Trust and Reputation
  • EigenRep (Kamvar, Schlosser, Garcia-Molina,
    2003); TrustMe (Singh and Liu, 2003)