1
Cluster Computing on the Fly
  • Peer-to-Peer Scheduling
  • of Idle Cycles in the Internet
  • Virginia Lo, Daniel Zappala, Dayi Zhou, Shanyu
    Zhao, and Yuhong Liu
  • Network Research Group
  • University of Oregon

2
CCOF Motivation
  • A variety of users and their applications need
    additional computational resources
  • Many machines throughout the Internet lie idle
    for long periods of time
  • Many users are willing to donate cycles
  • How can we provide cycles to the widest range of
    users, beyond institutional barriers?

3
CCOF Scenario 1
  • A chess hobbyist wants to test her chess program
  • She only has a PC at home
  • She joins a chess interest group's cycle-sharing
    community and discovers hosts that will run her
    chess state-space search algorithm for a few
    weeks

4
CCOF Scenario 2
  • Experiments with a network game are due in a
    week to meet a conference deadline
  • Planet Lab overloaded
  • Network Research Group machines overloaded
  • Requests for hosts go out to machines in the
    department, campus, colleagues at other
    universities, personal friends, and general
    donors

5
CCOF Goals and Assumptions
  • Cycle sharing in an open peer-to-peer environment
  • Application-specific scheduling
  • Long term fairness
  • Hosts retain local control, sandbox

6
Cycle Sharing Applications
  • Four classes of applications that can benefit
    from harvesting idle cycles
  • Infinite workpile
  • Workpile with deadlines
  • Tree-based search
  • Point-of-Presence (PoP)

7
Infinite workpile
  • Consume huge amounts of compute time
  • Master-slave model
  • Embarrassingly parallel: no communication among
    hosts
  • e.g. SETI@home, Stanford Folding, etc.

8
Workpile with deadlines
  • Similar to infinite workpile but more moderate
  • Must be completed by a deadline (days or weeks)
  • Some capable of increasingly refined results
    given extra time
  • e.g. simulations with a large parameter space,
    ray tracing, genetic algorithms

9
Tree-based Search
  • Tree of slave processes rooted at a single
    master node
  • Dynamic growth as the search space is expanded
  • Dynamic pruning as costly solutions are abandoned
  • Small amount of communication among slave
    processes to share lower bounds
  • e.g. distributed branch-and-bound, alpha-beta
    search, recursive backtracking
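The pattern above can be sketched in a few lines. Below is a minimal, single-machine sketch of branch-and-bound (a 0/1 knapsack is used as an illustrative problem; it is not from the slides). In CCOF, subtrees would be farmed out to slave hosts that periodically exchange the best bound found so far.

```python
# Illustrative sketch: branch-and-bound on a 0/1 knapsack.
# Maximize total value of items (weight, value) under a weight cap.

def branch_and_bound(items, capacity):
    # Sort by value density so the fractional bound below is admissible.
    items = sorted(items, key=lambda it: it[1] / it[0], reverse=True)
    best = 0

    def bound(i, weight, value):
        # Optimistic bound: fill remaining capacity fractionally.
        for w, v in items[i:]:
            if weight + w <= capacity:
                weight += w
                value += v
            else:
                return value + v * (capacity - weight) / w
        return value

    def search(i, weight, value):
        nonlocal best
        best = max(best, value)                   # record partial solution
        if i == len(items) or bound(i, weight, value) <= best:
            return                                # dynamic pruning
        if weight + items[i][0] <= capacity:      # branch: take item i
            search(i + 1, weight + items[i][0], value + items[i][1])
        search(i + 1, weight, value)              # branch: skip item i

    search(0, 0, 0)
    return best
```

In the distributed version, `best` becomes the shared lower bound that slave processes exchange, and the two recursive calls become subtrees dispatched to hosts.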

10
Point-of-presence
  • Minimal consumption of CPU cycles
  • Require placement of application code dispersed
    throughout the Internet to meet specific
    location, topological distribution, or resource
    requirements
  • Ex security monitoring systems, traffic
    analysis systems, protocol testing, distributed
    games

11
CCOF Architecture
12
CCOF Architecture
  • Cycle sharing communities based on factors such
    as interest, geography, performance, trust, or
    generic willingness to share.
  • Span institutional boundaries without
    institutional negotiation
  • A host can belong to more than one community
  • May want to control community membership

13
CCOF Architecture
  • Application schedulers to discover hosts,
    negotiate access, export code, and collect and
    verify results.
  • Application-specific (tailored to needs of
    application)
  • Resource discovery
  • Monitors jobs for progress; checks jobs for
    correctness
  • Kills or migrates jobs as needed

14
CCOF Architecture (cont.)
  • Local schedulers enforce local policy
  • Run in background mode vs. preempt when the
    user returns
  • QoS through admission control and reservation
    policies
  • Local machine protected through sandbox
  • Tight control over communication

15
CCOF Architecture (cont.)
  • Coordinated scheduling
  • Across local schedulers, across application
    schedulers
  • Enforce long-term fairness
  • Enhance resource discovery through information
    exchange

16
CCOF Preliminary Work
  • Wave Scheduler
  • Resource discovery experiments
  • Quizzes for Correctness
  • Point-of-Presence Scheduler

17
Wave Scheduler
  • Well-suited for workpile with deadlines
  • Provides ongoing access to dedicated cycles by
    following night timezones around the globe
  • Uses a CAN-based overlay to organize hosts by
    timezone
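The timezone idea can be sketched minimally, assuming a fixed 22:00-06:00 local night window (an illustrative assumption; the actual CAN-based overlay and migration policy are not shown, and all names are hypothetical):

```python
# Illustrative sketch of Wave Scheduler host selection by night timezone.

def is_night(utc_hour, tz_offset):
    """True if the host's local clock is in the 22:00-06:00 night window."""
    local = (utc_hour + tz_offset) % 24
    return local >= 22 or local < 6

def night_hosts(hosts, utc_hour):
    """hosts: list of (host_id, utc_offset). Return ids currently in night."""
    return [h for h, off in hosts if is_night(utc_hour, off)]

def next_night_zone(utc_hour):
    """UTC offset where night has just begun: where a migrating job goes next."""
    return (22 - utc_hour) % 24

hosts = [("eu-1", 1), ("us-east-1", -5), ("asia-1", 8)]
# At 21:00 UTC it is 22:00 in UTC+1 and 05:00 in UTC+8: both still night.
```

As morning reaches a host's zone, the scheduler would migrate the job westward to the zone returned by `next_night_zone`, riding the night wave around the globe.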

18
Wave Scheduler
19
Resource Discovery (Zhou and Lo, to appear in
WGP2P '04 at CCGrid '04)
  • Highly dynamic environment (hosts come and go)
  • Hosts maintain profiles of blocks of idle time
  • Four basic search methods
  • Rendezvous points
  • Host advertisements
  • Client expanding ring search
  • Client random walk search
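Of the four methods, client expanding ring search is easy to sketch: the client floods its request with TTL 1, then 2, and so on, until enough idle hosts respond. A minimal sketch over an adjacency-dict overlay, with host idle-time profiles reduced to a boolean predicate (all names are illustrative):

```python
# Illustrative sketch of client expanding ring search on an overlay graph.
from collections import deque

def ring_search(overlay, client, idle, wanted, max_ttl=4):
    found = []
    for ttl in range(1, max_ttl + 1):            # expand the ring
        found, seen = [], {client}
        frontier = deque([(client, 0)])
        while frontier:                          # BFS out to `ttl` hops
            node, dist = frontier.popleft()
            if dist == ttl:
                continue
            for nbr in overlay[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    if idle(nbr):
                        found.append(nbr)
                    frontier.append((nbr, dist + 1))
        if len(found) >= wanted:                 # enough idle hosts: stop early
            return found
    return found
```

Small rings keep message overhead low when idle hosts are nearby; each widening of the ring re-floods, which is the method's main cost.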

20
Resource Discovery
  • Rendezvous points perform best
  • high job completion rate and low message
    overhead, but favors large jobs under heavy
    workloads
  • → coordinated scheduling needed for long-term
    fairness

21
CCOF Verification
  • Goal: verify correctness of returned results
    for workpile and workpile with deadlines
  • Quizzes: easily verifiable computations that are
    indistinguishable from the actual work
  • Standalone quizzes vs. embedded quizzes
  • Quiz performance stored in a reputation system
  • Quizzes vs. replication
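The quiz mechanism can be sketched as follows: quiz tasks with precomputed answers are shuffled in among the real work units, and a host that fails any quiz has its batch rejected. The reputation update rule and all names here are illustrative assumptions, not the slides' design.

```python
# Illustrative sketch of quiz-based result verification.
import random

def make_batch(work_units, quizzes, seed=0):
    """Shuffle quizzes (tasks with precomputed answers) into the batch."""
    rng = random.Random(seed)
    batch = [("work", w, None) for w in work_units] + \
            [("quiz", q, ans) for q, ans in quizzes]
    rng.shuffle(batch)   # quizzes must be indistinguishable by position
    return batch

def verify(batch, results, reputation, host):
    """Accept the batch only if every quiz came back with its known answer."""
    for (kind, task, expected), got in zip(batch, results):
        if kind == "quiz" and got != expected:
            reputation[host] = reputation.get(host, 1.0) * 0.5  # penalize
            return False
    reputation[host] = reputation.get(host, 1.0) + 0.1  # reward honest work
    return True
```

This is the standalone-quiz variant; embedded quizzes would instead hide verifiable sub-computations inside the work units themselves.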

22
Point-of-Presence Scheduler
  • Scalable protocols for identifying selected
    hosts in the community overlay network such that
    each ordinary node is within k hops of C of the
    selected hosts
  • The (C,k)-dominating set problem
  • Useful for leader election, rendezvous point
    placement, monitor location, etc.

23
CCOF Dom(C,k) Protocol
  • Round 1: Each node says HI to its k-hop
    neighbors
  • <Each node knows the size of its own k-hop
    neighborhood>
  • Round 2: Each node sends the size of its k-hop
    neighborhood to all its neighbors
  • <Each node knows the size of all neighbors'
    k-hop neighborhoods>
  • Round 3: If a node is maximal within its
    neighborhood, it declares itself a dominator and
    notifies all neighbors
  • <Some nodes hear from some dominators, some
    don't>
  • For nodes not yet covered by C dominators, repeat
    Rounds 1-3 excluding current dominators, until
    all nodes are covered
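The rounds above can be simulated directly. A minimal sketch on an undirected graph stored as an adjacency dict, with ties broken by node id so the run is deterministic; the message passing is collapsed into shared data structures (real hosts would exchange HI and neighborhood-size messages), and termination assumes C-coverage is achievable.

```python
# Illustrative simulation of the Dom(C,k) election rounds.
from collections import deque

def k_hop(graph, node, k):
    """All nodes within k hops of `node` (BFS, including the node itself)."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        n, d = frontier.popleft()
        if d == k:
            continue
        for nbr in graph[n]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

def dom_ck(graph, C, k):
    """Elect dominators until every node is covered by C of them."""
    dominators, cover = set(), {n: 0 for n in graph}
    while any(c < C for c in cover.values()):
        # Rounds 1-3 run on the graph with current dominators removed.
        active = {n: [m for m in graph[n] if m not in dominators]
                  for n in graph if n not in dominators}
        size = {n: len(k_hop(active, n, k)) for n in active}   # Rounds 1-2
        new = {n for n in active                               # Round 3
               if all((size[n], n) >= (size[m], m) for m in active[n])}
        dominators |= new
        for d in new:                    # update coverage counts
            for covered in k_hop(graph, d, k):
                cover[covered] += 1
    return dominators
```

On the 5-node path a-b-c-d-e with C = 1 and k = 1, the first pass elects d; a second pass over the remaining subgraph elects b and e, covering all nodes.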

24
CCOF Research Issues
  • Incentives and fairness
  • What incentives are needed to encourage hosts to
    donate cycles?
  • How to keep track of resources consumed vs.
    resources donated?
  • How to prevent resource hogs from taking an
    unfair share?
  • Resource discovery
  • How to discover hosts in a highly dynamic
    environment (hosts come and go, withdraw cycles,
    fail)?
  • How to discover hosts that can be trusted, that
    will provide the needed resources?

25
CCOF Research Issues
  • Verification, trust, and reputation
  • How to check returned results?
  • How to catch malicious or misbehaving hosts that
    change results with low frequency?
  • Which reputation system?
  • Application-based scheduling
  • How do trust and reputation influence
    scheduling?
  • How should a host decide from whom to accept work?

26
CCOF Research Issues
  • Quality of service and performance monitoring
  • How to provide local admission control?
  • How to evaluate and provide QoS - guaranteed
    versus predictive service?
  • Security
  • How to prevent attacks launched from guest code
    running on the host?
  • How to prevent denial-of-service attacks in
    which useless code occupies many hosts?

27
Related Work
  • Systems most closely resembling CCOF
  • SHARP (Fu, Chase, Chun, Schwab, Vahdat,
    2003)
  • Partage, Self-organizing Flock of Condors
    (Hu, Butt, Zhang, 2003)
  • BOINC (Anderson, 2003), limited to donation
    of cycles to workpile applications
  • Resource discovery
  • (Iamnitchi and Foster, 2002); Condor
    matchmaking
  • Load sharing within and across institutions
  • Condor, Condor Flocks, Grid computing
  • Incentives and Fairness
  • See Berkeley Workshop on Economics of P2P
    Systems
  • OurGrid (Andrade, Cirne, Brasileiro,
    Roisenberg, 2003)
  • Trust and Reputation
  • EigenRep (Kamvar, Schlosser, Garcia-Molina,
    2003); TrustMe (Singh and Liu, 2003)