Exa-Scale Volunteer Computing

1
Exa-Scale Volunteer Computing
  • David P. Anderson
  • Space Sciences Laboratory
  • U.C. Berkeley

2
Outline
  • Volunteer computing
  • BOINC
  • Applications
  • Research directions

3
[Figure: options when a program runs too slowly on a PC, arranged by number of processors (1, 100, 1000, 10K-1M). Single job, high-performance computing: cluster (MPI), supercomputer. Multiple jobs, high-throughput computing: cluster (batch), Grid, commercial cloud, volunteer computing.]
4
Volunteer computing
  • Early projects
    • 1997: GIMPS, distributed.net
    • 1999: SETI@home, Folding@home
  • Today
    • 50 projects
    • 500K volunteers
    • 900K computers
    • 10 PetaFLOPS

5
The PetaFLOPS barrier
  • September 2007: Folding@home
  • January 2008: BOINC
  • June 2008: IBM Roadrunner

6
ExaFLOPS
  • Current PetaFLOPS breakdown
  • Potential: ExaFLOPS by 2010
    • 4M GPUs × 1 TFLOPS × 0.25 availability
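
The estimate on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the three inputs come from the slide, while the function name `aggregate_flops` is only illustrative.

```python
# Back-of-the-envelope check of the slide's estimate.
NUM_GPUS = 4_000_000       # 4M GPUs (from the slide)
FLOPS_PER_GPU = 1e12       # 1 TFLOPS per GPU (from the slide)
AVAILABILITY = 0.25        # fraction of time a GPU is usable (from the slide)


def aggregate_flops(num_devices, flops_per_device, availability):
    """Sustained throughput of a pool of devices that are only partly available."""
    return num_devices * flops_per_device * availability


total = aggregate_flops(NUM_GPUS, FLOPS_PER_GPU, AVAILABILITY)
print(f"{total / 1e18:.2f} ExaFLOPS")
```

With these inputs the product is 10^18 FLOPS, i.e. one ExaFLOPS.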

7
BOINC
  • Middleware for volunteer computing
    • client, server, web
  • Based at UC Berkeley Space Sciences Lab
  • Open source (LGPL)
  • NSF-funded since 2002
  • http://boinc.berkeley.edu

8
BOINC volunteers and projects
[Figure: volunteers are linked by attachments to projects such as LHC@home, CPDN, and WCG.]
9
The BOINC computing ecosystem
[Figure: the world's computing power, scientific research, and the public.]
  • Goals
    • Better research gets more computing power
    • The public decides what's better

10
BOINC software overview
[Figure: the project server (MySQL, daemons, scheduler, data server) communicates over HTTP with the volunteer host (client, GUI, screensaver, apps).]
11
Scheduler RPC
  • Request
    • hardware, software description
    • work requests (CPU, GPU)
    • completed jobs
  • Reply
    • application descriptions
    • job descriptions
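
The request and reply contents listed above can be pictured as two small records. The sketch below is an illustration only: the class names, field names, and the toy `handle_request` function are hypothetical and do not reproduce BOINC's actual message format or scheduler logic.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerRequest:
    host: dict                 # hardware and software description
    work_request_secs: dict    # seconds of work wanted per resource (CPU, GPU)
    completed_jobs: list = field(default_factory=list)


@dataclass
class SchedulerReply:
    app_versions: list = field(default_factory=list)   # application descriptions
    jobs: list = field(default_factory=list)            # job descriptions


def handle_request(request, available_jobs):
    """Toy scheduler: hand out jobs until each resource request is covered."""
    reply = SchedulerReply()
    remaining = dict(request.work_request_secs)
    for job in available_jobs:
        resource = job["resource"]
        if remaining.get(resource, 0) > 0:
            reply.jobs.append(job)
            if job["app_version"] not in reply.app_versions:
                reply.app_versions.append(job["app_version"])
            remaining[resource] -= job["est_secs"]
    return reply


request = SchedulerRequest(
    host={"os": "Windows", "ncpus": 4},
    work_request_secs={"cpu": 1000, "gpu": 0},
)
queue = [
    {"name": "j1", "resource": "cpu", "app_version": "cpu_v1", "est_secs": 600},
    {"name": "j2", "resource": "gpu", "app_version": "gpu_v1", "est_secs": 300},
    {"name": "j3", "resource": "cpu", "app_version": "cpu_v1", "est_secs": 600},
    {"name": "j4", "resource": "cpu", "app_version": "cpu_v1", "est_secs": 600},
]
reply = handle_request(request, queue)
print([job["name"] for job in reply.jobs], reply.app_versions)
```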

12
Client job scheduling
  • Queue lots of jobs
    • to avoid starvation
    • for variety
  • Job scheduling
    • Round-robin time-slicing
    • Earliest deadline first
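
One way to combine the two policies named above is to run jobs that look likely to miss their deadline in earliest-deadline-first order and fill any remaining CPUs round-robin. The sketch below assumes that combination; the `Job` fields, the at-risk test, and `choose_jobs` are hypothetical simplifications, not the client's actual code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    project: str
    remaining: float   # estimated seconds of computation left
    deadline: float    # absolute time (seconds) at which the job is due


def choose_jobs(jobs, now, n_cpus, last_run):
    """Pick the jobs to run in the next time slice.

    Assumption: a job is "at risk" if it would miss its deadline when
    sharing the CPUs equally with every other queued job.
    """
    slowdown = max(1.0, len(jobs) / n_cpus)
    at_risk = [j for j in jobs if now + j.remaining * slowdown > j.deadline]
    at_risk.sort(key=lambda j: j.deadline)           # earliest deadline first
    chosen = at_risk[:n_cpus]
    # Round-robin for the rest: least recently run jobs go first.
    rest = [j for j in jobs if j not in chosen]
    rest.sort(key=lambda j: last_run.get(j.name, float("-inf")))
    chosen += rest[: n_cpus - len(chosen)]
    return chosen


jobs = [
    Job("a1", "A", remaining=3600, deadline=100_000),
    Job("b1", "B", remaining=3600, deadline=5_000),
    Job("a2", "A", remaining=1800, deadline=100_000),
]
last_run = {"a1": -10, "a2": -20, "b1": -5}
print([j.name for j in choose_jobs(jobs, now=0, n_cpus=2, last_run=last_run)])
```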

13
Client work fetch policy
  • When? From which project? How much?
  • Goals
  • maintain enough work
  • minimize scheduler requests
  • honor resource shares
  • per-project debt

[Figure: work buffered on CPU 0 through CPU 3, shown against min and max levels.]
14
Work fetch for GPUs: goals
  • Queue work separately for different resource
    types
  • Resource shares apply to aggregate
  • Example: projects A and B have the same resource share
  • A has CPU and GPU jobs, B has only GPU jobs

[Figure: GPU time split between B and A; CPU time given to A.]
15
Work fetch for GPUs
  • For each resource type
    • per-project backoff
    • per-project debt
      • accumulate only while not backed off
  • A project's overall debt is a weighted average of
    its resource debts
  • Get work from the project with the highest overall debt
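
The debt bookkeeping described above can be sketched as follows. The accumulation rule (entitled share minus actual usage, times elapsed time) and the resource weights are assumptions made for illustration; the slide does not give the exact formulas.

```python
def accumulate_debt(debt, share_fraction, usage_fraction, backed_off, dt):
    """Update one project's debt for one resource type over dt seconds.

    Assumed rule: debt grows when the project used less of the resource
    than its share entitles it to, and only while it is not backed off.
    """
    if backed_off:
        return debt
    return debt + (share_fraction - usage_fraction) * dt


def overall_debt(resource_debts, weights):
    """Weighted average of a project's per-resource debts."""
    total_weight = sum(weights[r] for r in resource_debts)
    return sum(resource_debts[r] * weights[r] for r in resource_debts) / total_weight


def pick_project(debts, weights):
    """Fetch work from the project with the highest overall debt."""
    return max(debts, key=lambda project: overall_debt(debts[project], weights))


# Hypothetical state: weights could reflect each resource's peak speed.
weights = {"cpu": 1.0, "gpu": 4.0}
debts = {
    "A": {"cpu": 100.0, "gpu": -50.0},
    "B": {"cpu": 0.0, "gpu": 40.0},
}
print(pick_project(debts, weights))
```

Here A's overall debt is (100 × 1 − 50 × 4) / 5 = −20 and B's is (0 × 1 + 40 × 4) / 5 = 32, so work is fetched from B.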

16
Scheduling server
  • Possible outcomes of a job
    • success
    • runs but returns wrong answer
    • doesn't run, returns wrong answer (hacker)
    • crashes, client reports it
    • never hear from client again
  • Job delay bounds
  • Replicated computing
    • homogeneous replication
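
Replicated computing guards against wrong answers by running several instances of a job and accepting a result only when enough of them agree. The sketch below assumes results can be compared for exact equality (which is what sending replicas to equivalent hosts is meant to allow); the function name and the quorum parameter are illustrative.

```python
from collections import Counter


def canonical_result(results, quorum):
    """Return the result reported by at least `quorum` instances, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Three instances of one job; one host returned a wrong answer.
print(canonical_result(["0x3fa2", "0x3fa2", "0x0000"], quorum=2))
# Only two instances so far and they disagree: keep waiting or send another.
print(canonical_result(["0x3fa2", "0x0000"], quorum=2))
```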

17
Server abstractions
[Figure: an application has app versions (Win32, Win32 NVIDIA, Win64, Win32 N-core, Mac OS X) and jobs; each job has instances.]
18
Scheduler overview
[Figure: the feeder loads jobs from MySQL into a shared-memory job cache; the schedulers serve client requests from that cache.]
19
How scheduler chooses app versions
  • Each app version has a project-supplied planning
    function
  • Inputs
    • host description
  • Outputs
    • Whether host can run app version
    • Resource usage (CPUs, GPUs)
    • expected FLOPS

20
App version selection
  • Call the planning function for the platform's app
    versions
  • Skip versions that use resources for which no
    work is being requested
  • Use the version with the highest expected FLOPS
  • Repeat this when a resource request is satisfied
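
The selection steps above can be sketched as a loop over app versions. All names here (`Usage`, `AppVersion`, `select_version`, the host dictionary keys, and the two planning functions) are hypothetical; real planning functions are supplied by each project.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Usage:
    cpus: float
    gpus: float
    flops: float   # expected FLOPS on this host


@dataclass
class AppVersion:
    name: str
    plan: Callable[[dict], Optional[Usage]]   # returns None if the host can't run it


def select_version(versions, host, want_cpu, want_gpu):
    """Return (version, usage) with the highest expected FLOPS, or None."""
    best = None
    for version in versions:
        usage = version.plan(host)
        if usage is None:
            continue                       # host cannot run this version
        if usage.gpus > 0 and not want_gpu:
            continue                       # no GPU work is being requested
        if usage.gpus == 0 and not want_cpu:
            continue                       # no CPU work is being requested
        if best is None or usage.flops > best[1].flops:
            best = (version, usage)
    return best


def plan_cpu(host):
    return Usage(cpus=1, gpus=0, flops=host["cpu_flops"])


def plan_nvidia(host):
    if host.get("gpu") != "NVIDIA":
        return None
    return Usage(cpus=0.1, gpus=1, flops=host["gpu_flops"])


versions = [AppVersion("Win32", plan_cpu), AppVersion("Win32 NVIDIA", plan_nvidia)]
host = {"gpu": "NVIDIA", "cpu_flops": 2e9, "gpu_flops": 1e11}
print(select_version(versions, host, want_cpu=True, want_gpu=True)[0].name)
```

Once the GPU request is satisfied, calling `select_version` again with `want_gpu=False` falls back to the CPU version, which corresponds to the "repeat" step.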

21
Anonymous platform mechanism
  • The idea: the volunteer supplies app versions. Why?
    • security
    • optimization
    • unsupported platforms

22
Science areas using BOINC
  • Biology
    • protein study, genetic analysis
  • Medicine
    • drug discovery, epidemiology
  • Physics
    • LHC, nanotechnology, quantum computing
  • Astronomy
    • data analysis, cosmology, galactic modeling
  • Environment
    • climate modeling, ecosystem simulation
  • Math
  • Graphics rendering

23
Application types
  • Computing-intensive analysis of large data
  • Physical simulations
  • Genetic algorithms
    • GA-inspired optimization
  • Non-CPU-intensive
    • Internet study
    • distributed sensor network

24
Malariacontrol.net
  • Simulation models of the transmission dynamics
    and health effects of malaria are an important
    tool for malaria control. They can be used to
    determine optimal strategies for delivering
    mosquito nets, chemotherapy, or new vaccines
    which are currently under development and testing.

25
Climateprediction.net
26
Einstein@home
  • Gravitational waves; gravitational pulsars

27
SETI@home
28
Milkyway@home
29
GPUGRID.net
30
AQUA@home
  • D-Wave Systems
  • Simulation of adiabatic quantum algorithms for
    binary quadratic optimization

31
Collatz Conjecture
  • even N → N/2
  • odd N → 3N + 1
  • always goes to 1?
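
The iteration on this slide is short enough to write out directly. The sketch below counts how many steps a starting value takes to reach 1; the `max_steps` cutoff is an added safeguard, since the conjecture that every N reaches 1 is unproven.

```python
def collatz_steps(n, max_steps=10_000):
    """Count steps until n reaches 1, or return None if max_steps is exceeded."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None
        n = n // 2 if n % 2 == 0 else 3 * n + 1   # even: N/2, odd: 3N + 1
        steps += 1
    return steps


# 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
print(collatz_steps(6))
```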

32
Quake Catcher Network
33
Organizational models
  • Umbrella projects
  • Institutional
    • Lattice, VTU@home
  • Corporate
    • IBM World Community Grid
  • Community
    • AlmereGrid
  • Research community
    • MindModeling.org

[Figure: project functions: publicity, web development, sysadmin.]
34
Volunteer computing research
  • Goals (mutually incompatible)
    • maximize throughput
    • minimize makespan of job batches
    • minimize average time until credit
    • minimize network traffic
    • minimize server disk usage

35
Characterizing hosts
[Figure: host states: powered on, available, connected.]
  • What are good models? What are correlations with
    other characteristics? How to model churn?
  • The BOINC client is instrumented to log all of this;
    we have data from 200K hosts over 1 year
  • Mining for Statistical Models of Availability in
    Large-Scale Distributed Systems: An Empirical
    Study of SETI@home. Bahman Javadi, Derrick Kondo,
    Jean-Marc Vincent, David P. Anderson. 17th Annual
    Meeting of the IEEE/ACM International Symposium
    on Modelling, Analysis and Simulation of Computer
    and Telecommunication Systems, Sept 21-23 2009,
    London.
  • On Correlated Availability in Internet-Distributed
    Systems. Derrick Kondo, Artur Andrzejak, and
    David P. Anderson. 9th IEEE/ACM International
    Conference on Grid Computing (Grid 2008),
    Tsukuba, Japan, Sept 29 - Oct 1 2008.

36
Studying server scheduling policies
[Figure: a simulator of a large, dynamic set of volunteer hosts connected to the scheduler, with MySQL, the feeder, and the shared-memory job cache.]
  • EmBOINC: BOINC project emulator
  • Performance Prediction and Analysis of BOINC
    Projects: An Empirical Study with EmBOINC. Trilce
    Estrada, Michela Taufer, David Anderson. To
    appear, Journal of Grid Computing.
  • EmBOINC: An Emulator for Performance Analysis of
    BOINC Projects. Trilce Estrada, Michela Taufer,
    Kevin Reed, David Anderson. 3rd Workshop on
    Desktop Grids and Volunteer Computing Systems
    (PCGrid 2009), May 29, 2009, Rome.

37
Studying client scheduling policies
  • BOINC client simulator
    • simulates a client connected to several projects
    • based on actual client code
  • Performance Evaluation of Scheduling Policies for
    Volunteer Computing. Derrick Kondo, David P.
    Anderson and John McLeod VII. 3rd IEEE
    International Conference on e-Science and Grid
    Computing. Bangalore, India, December 10-13
    2007.
  • Local Scheduling for Volunteer Computing. David
    P. Anderson and John McLeod VII. Workshop on
    Large-Scale, Volatile Desktop Grids (PCGrid 2007)
    held in conjunction with the IEEE International
    Parallel & Distributed Processing Symposium
    (IPDPS), March 30, 2007, Long Beach.

38
Supporting distributed applications
  • Volpex: Linda-like dataspace system
  • MPI layer
  • centralized implementation
  • fault tolerance, performance issues
  • A Communication Framework for Fault-tolerant
    Parallel Execution. Nagarajan Kanna, Jaspal
    Subhlok, Edgar Gabriel, Eshwar Rohit and David
    Anderson. The 22nd International Workshop on
    Languages and Compilers for Parallel Computing,
    Newark, Delaware, Oct 8-10 2009.

39
Using virtual machines
[Figure: the BOINC client runs a VM wrapper, which uses a hypervisor (VirtualBox, kQEMU, etc.) to run the VM.]
  • App version is VM wrapper + virtual machine image
  • VM image may contain the client of a non-BOINC
    distributed batch system

40
Data-intensive computing
  • Maintain large data set on clients
    • 10 years of radio telescope data
    • gene/protein data
  • Compute against data set
    • MapReduce, other models

41
Volunteer motivation study
  • Online survey correlated with participation data
  • Survey is currently being designed
  • Preliminary findings
    • Talk is cheap: claimed motivations are not supported
      by data
    • Team members contribute more
    • Contribution decreases over time (especially for
      non-team members)

42
Conclusion
  • Volunteer computing: Exa-scale potential
  • GPUs are crucial
  • BOINC: enabling technology
  • Bottlenecks
    • organizational models
    • public awareness
  • Lots of research opportunities