Title: David P. Anderson
1. Volunteer Computing
- David P. Anderson
- Space Sciences Laboratory
- University of California Berkeley
- davea@ssl.berkeley.edu
2. Outline
- Volunteer computing
- BOINC: an OS for volunteer computing
- Applications
- Challenges and research directions
3. Where's the power?
[Figure: computing resources by owner: home PCs, your computers, academic, business]
- 2010: 1 billion Internet-connected PCs, 55% privately owned
- If 100M people participate:
- 100 PetaFLOPs, 1 Exabyte (10^18 bytes) of storage
- Consumer products drive technology
- GPUs (NVIDIA, Sony Cell)
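The 100M-participant projection is a simple multiplication. The sketch below reproduces the 100 PetaFLOPs and 1 Exabyte figures. It assumes that each participating PC contributes about 1 GFLOPS of sustained compute and 10 GB of disk. Both per-host values are illustrative assumptions and do not appear on the slide.

```python
# Per-host averages below are assumptions chosen for illustration, not slide data.
PARTICIPANTS = 100_000_000          # 100M volunteers (from the slide)
FLOPS_PER_HOST = 1_000_000_000      # assumed: 1 GFLOPS sustained per PC
DISK_PER_HOST = 10_000_000_000      # assumed: 10 GB of donated disk per PC

total_flops = PARTICIPANTS * FLOPS_PER_HOST   # 10**17 FLOPS = 100 PetaFLOPs
total_bytes = PARTICIPANTS * DISK_PER_HOST    # 10**18 bytes = 1 Exabyte

print(f"{total_flops / 1e15:.0f} PetaFLOPs, {total_bytes / 1e18:.0f} Exabyte")
```

If you assume different per-host averages, the totals scale linearly.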
4. Volunteer computing history
[Figure: timeline, 1995 to 2005]
- GIMPS, distributed.net
- SETI@home, Folding@home
- fight@home
- climateprediction.net
- BOINC
- Einstein@home
- Names used along the way: volunteer computing, public resource computing, Internet computing, screensaver computing, global computing, @home computing, peer-to-peer computing, Grid computing
5. Volunteer/Grid differences
6. Save money!
Suppose processing 1 GB of data takes X computer-days
- Cluster/Grid: $1 per CPU-day; network is free; cost per GB = $X
- Volunteer computing: CPU is free; network costs $1 per 20 GB; cost per GB = $1/20
So volunteer computing is cheaper if X > 1/20 (SETI@home: X ≈ 1,000)
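The break-even rule for this slide can be written as a tiny cost model. The default prices are the slide's illustrative figures: $1 per CPU-day for a cluster and $1 per 20 GB of network transfer for volunteers. The function names are hypothetical.

```python
def cluster_cost_per_gb(x_cpu_days: float, dollars_per_cpu_day: float = 1.0) -> float:
    """Cluster/Grid: CPU time is paid for, network transfer is free."""
    return x_cpu_days * dollars_per_cpu_day


def volunteer_cost_per_gb(dollars_per_gb_sent: float = 1.0 / 20) -> float:
    """Volunteer computing: CPU time is free, shipping each GB of input costs money."""
    return dollars_per_gb_sent


def volunteer_is_cheaper(x_cpu_days: float) -> bool:
    """True when X (CPU-days per GB of input) exceeds the 1/20 break-even point."""
    return volunteer_cost_per_gb() < cluster_cost_per_gb(x_cpu_days)


for x in (0.01, 0.05, 1.0, 1000.0):
    print(x, volunteer_is_cheaper(x))
```

The comparison depends only on the ratio of computation to data. Compute-heavy projects therefore gain the most.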
7. Educational discount
[Figure: UCB, UCLA, UIUC, and partner institutions connected by Internet2 (free, underutilized) and the commodity Internet ($)]
Underutilized flat-rate ISP connections
... so bandwidth may be effectively free as well
8. Infrastructure software
- Roll your own
- XtremWeb, cosm
- not complete/robust
- United Devices, Entropia
- not free
- Grid (Globus/Condor), jxta
- solve a different problem
- BOINC (Berkeley Open Infrastructure for Network Computing): http://boinc.berkeley.edu
9. Projects and participants
[Figure: projects (Climate, SETI, physics, biomedical) and participants (Joe, Jens, Alice); labels: diversity, autonomy; allocation, trust; heterogeneity]
10. Encourage participation in > 1 project
[Figure: a project's computing needs over time, alternating between "work" and "think" phases]
- Better long-term resource utilization
- project A works while project B thinks
- Better short-term resource utilization
- communicate/compute in parallel
- match applications to resources
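The "project A works while project B thinks" argument can be checked with a toy simulation. The sketch below makes several assumptions, all of them hypothetical:
- Each project's demand is a list of booleans, one per time step (True means a work phase).
- The host splits its CPU among the projects that have work, in proportion to per-project shares.
- The host sits idle only when every attached project is thinking.

```python
from typing import Dict, List, Tuple


def simulate(schedules: Dict[str, List[bool]],
             shares: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (fraction of steps the host was busy, CPU steps given to each project).

    schedules[p][t] is True when project p is in a 'work' phase at step t.
    """
    steps = len(next(iter(schedules.values())))
    cpu = {p: 0.0 for p in schedules}
    busy = 0
    for t in range(steps):
        active = [p for p in schedules if schedules[p][t]]
        if not active:
            continue  # every attached project is thinking: the host idles
        busy += 1
        total = sum(shares[p] for p in active)
        for p in active:
            cpu[p] += shares[p] / total  # split this step by relative share
    return busy / steps, cpu


A = [True, False] * 4   # project A: work, think, work, think, ...
B = [False, True] * 4   # project B works while A thinks
print(simulate({"A": A}, {"A": 1.0}))
print(simulate({"A": A, "B": B}, {"A": 1.0, "B": 1.0}))
```

A host attached only to A is busy half the time. Attaching it to both projects fills the think phases and raises utilization to 100%.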
11. Creating a BOINC project
- Install BOINC server software on Unix box
- Adapt or develop application
- compile for various platforms
- Write scripts/programs to
- generate tasks
- validate results
- handle results
- Develop web site
- Get media coverage
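The "generate tasks" step is project-specific. The sketch below assumes the input can be cut into independent fixed-size chunks, as with a recorded data stream. `Task` and `generate_tasks` are hypothetical names and are not part of BOINC. The code only builds the task list; registering tasks with the BOINC server is done with BOINC's own tools, which are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str     # unique task name
    offset: int   # byte offset of this chunk within the input
    length: int   # chunk size in bytes


def generate_tasks(input_size: int, chunk_size: int, prefix: str = "wu") -> List[Task]:
    """Split an input of input_size bytes into fixed-size chunks, one task per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tasks = []
    for i, offset in enumerate(range(0, input_size, chunk_size)):
        length = min(chunk_size, input_size - offset)  # last chunk may be shorter
        tasks.append(Task(f"{prefix}_{i:06d}", offset, length))
    return tasks


for t in generate_tasks(1000, 300):
    print(t)
```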
12. Structure of a BOINC project
- Ongoing tasks
- monitor server correctness
- monitor server performance
- develop and maintain applications
13. Redundant computing
- Addresses hardware errors, hackers
- Issue 2 or more copies of each task
- don't send to same host or user
- timed retry up to a limit
- Result comparison approaches
- Application-specific fuzzy comparison
- Homogeneous redundancy
- send copies only to numerically equivalent hosts
- Develop platform-independent app
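Application-specific fuzzy comparison can be sketched as a quorum check over the redundant copies of one task. This sketch has three assumptions. Each result is a list of floats. Two results "agree" when every value matches within a relative tolerance. A result becomes canonical once at least `quorum` results agree with it. `fuzzy_equal`, `find_canonical`, and the tolerances are hypothetical choices. With homogeneous redundancy, the comparison could instead be bit-for-bit.

```python
import math
from typing import List, Optional, Sequence


def fuzzy_equal(a: Sequence[float], b: Sequence[float], rel_tol: float = 1e-5) -> bool:
    """Results agree if they have the same length and every value is close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-9) for x, y in zip(a, b)
    )


def find_canonical(results: List[Sequence[float]], quorum: int = 2) -> Optional[int]:
    """Index of the first result that at least `quorum` results (itself included)
    agree with, or None if no quorum exists yet (issue another copy)."""
    for i, candidate in enumerate(results):
        agreeing = sum(1 for r in results if fuzzy_equal(candidate, r))
        if agreeing >= quorum:
            return i
    return None


# Two hosts differ only in floating-point noise; a third is wrong.
print(find_canonical([[1.0, 2.0], [1.0000001, 2.0], [5.0, 2.0]]))
```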
14. What do participants want?
- Incentives
- contribute to science
- get acknowledgement
- community
- screensaver graphics
- Invisibility, control of resource usage
- Involvement
- translation, porting etc.
15. Credit accounting
- Credit is granted for
- computation (CPU time x benchmark)
- storage
- network communication
- Cheat-resistance
- Accounting
- user, host, team
- Credit DB export for 3rd-party web sites
- cross-project identification
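Computation credit (CPU time × benchmark) combines naturally with redundant computing to resist cheating. Each copy of a task claims credit, and the server grants a value derived from all the claims. The granting policy below is one plausible choice and is assumed here:
- With two claims, grant the lower one.
- With more, drop the highest and lowest claims and average the rest.

The `scale` normalization factor is hypothetical.

```python
from typing import List


def claimed_credit(cpu_seconds: float, benchmark_gflops: float, scale: float = 1.0) -> float:
    """Credit a host claims for one result: CPU time x benchmark score."""
    return cpu_seconds * benchmark_gflops * scale


def granted_credit(claims: List[float]) -> float:
    """Credit granted to every correct copy of a task, robust to one inflated claim."""
    if not claims:
        raise ValueError("no claims")
    if len(claims) <= 2:
        return min(claims)              # an inflated claim cannot raise the grant
    trimmed = sorted(claims)[1:-1]      # discard the extremes
    return sum(trimmed) / len(trimmed)


print(claimed_credit(3600, 2.0))
print(granted_credit([10.0, 12.0, 1000.0]))
```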
16. Participating
- Select project(s)
- Create account(s)
- Download/install BOINC client software
- Interact via web
- preferences
- leaderboards
- profile
- teams
- message boards, dynamic FAQ
18. Anonymous platform mechanism
- Participant compiles software from source
- Scheduler RPC: platform is anonymous
- Purposes
- support obscure platforms
- security-conscious participants
- performance tuning of applications
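On the server side, the anonymous platform changes how the scheduler picks the executable a host runs. The sketch below illustrates the idea only and does not use BOINC's actual request format. The request dictionary, the `client_app_versions` key, the platform names, and `SERVER_APP_VERSIONS` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical table of app versions the project itself has built, by platform name.
SERVER_APP_VERSIONS: Dict[str, List[str]] = {
    "linux_x86": ["example_app 1.02 (project build)"],
    "windows_x86": ["example_app 1.02 (project build)"],
}


def choose_app_version(request: Dict) -> Optional[str]:
    """Pick the executable a host should run for its next task.

    A normal request names its platform and receives a project-built binary.
    An 'anonymous' request means the participant built the app from source,
    so the scheduler relies on the versions the client reports having.
    """
    platform = request.get("platform")
    if platform == "anonymous":
        versions = request.get("client_app_versions", [])
    else:
        versions = SERVER_APP_VERSIONS.get(platform, [])
    return versions[0] if versions else None  # None: no work for this host


print(choose_app_version({"platform": "anonymous",
                          "client_app_versions": ["example_app 1.02 (self-built)"]}))
```

This one mechanism covers all three purposes on the slide: obscure platforms, participants who only run code they compiled, and locally tuned builds.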
19. Client structure
[Figure: BOINC client structure; labels: servers, BOINC Manager]
20. Applications
- Computation model
- Workunits, results
- Deadlines, resource estimates
- Data model
- files, file references
- Mostly existing apps (FORTRAN, C)
- Categories
- Physical simulation
- Data processing
- Distribution for its own sake
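In BOINC's computation model, a workunit describes a task: its input files and resource estimates. A result is one instance of a workunit sent to a particular host, and redundant computing issues several results per workunit. The data classes below sketch that relationship. All field names and the `issue` method are hypothetical rather than BOINC's database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileRef:
    name: str   # logical file name the application opens
    url: str    # where the client downloads the file from (hypothetical)


@dataclass
class Result:
    host_id: int
    deadline: float                     # report by this time or the task is reissued
    output_files: List[FileRef] = field(default_factory=list)


@dataclass
class Workunit:
    name: str
    input_files: List[FileRef]
    est_flops: float                    # resource estimate: computation needed
    est_disk_bytes: int                 # resource estimate: disk space needed
    delay_bound: float                  # seconds a host gets to return one result
    results: List[Result] = field(default_factory=list)

    def issue(self, host_id: int, now: float) -> Result:
        """Create one more result (instance) of this workunit for a host."""
        r = Result(host_id=host_id, deadline=now + self.delay_bound)
        self.results.append(r)
        return r


wu = Workunit("wu_1", [FileRef("in.dat", "http://example.com/in.dat")],
              est_flops=1e13, est_disk_bytes=10**6, delay_bound=86400.0)
print(wu.issue(host_id=7, now=1000.0))
```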
21. SETI@home
- Analysis of radio telescope data from Arecibo
- SETI search for narrowband signals
- Astropulse search for short broadband signals
- Per workunit: 0.3 MB in, 4 CPU hours, 10 KB out
- Enhancements under BOINC
- data archival on clients
- direct data distribution from observatory
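These per-workunit figures can be plugged into the cost model of the "Save money!" slide to estimate SETI@home's X, the CPU-days needed per GB of input. Using decimal units, the result is about 560 CPU-days per GB. That is the same order of magnitude as the rough 1,000 quoted there and far above the 1/20 break-even point.

```python
# Figures from the slide: 0.3 MB of input and 4 CPU hours per workunit.
MB_IN_PER_WU = 0.3
CPU_HOURS_PER_WU = 4

wus_per_gb = 1000 / MB_IN_PER_WU                     # decimal units: 1 GB = 1000 MB
cpu_days_per_gb = wus_per_gb * CPU_HOURS_PER_WU / 24  # this is X

print(f"X = {cpu_days_per_gb:.0f} CPU-days per GB")
```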
23. Climateprediction.net
- Climate change study (Oxford University)
- Met Office model (FORTRAN, 1M lines)
- Input: 10 MB executable, 1 MB data
- Output per workunit:
- 10 MB summary (always uploaded)
- 1 GB detail file (archived on client, may be uploaded)
- CPU time: 2-3 months (can't migrate)
- trickle messages
- preemptive scheduling
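A task that runs for months cannot wait until completion to report back. Trickle messages are small messages the client sends to the project while the task is still running. The loop below sketches the idea. The step counts, message text, and `send_trickle` callback are hypothetical, and the actual model computation is omitted.

```python
from typing import Callable


def run_long_task(total_steps: int, trickle_every: int,
                  send_trickle: Callable[[str], None]) -> None:
    """Run a long simulation, reporting progress periodically instead of only at the end."""
    if trickle_every <= 0:
        raise ValueError("trickle_every must be positive")
    for step in range(1, total_steps + 1):
        # one model timestep would run here (omitted)
        if step % trickle_every == 0:
            send_trickle(f"step {step}/{total_steps}")


sent = []
run_long_task(total_steps=10, trickle_every=3, send_trickle=sent.append)
print(sent)
```

Because progress arrives in small pieces, the server can follow a run for its whole lifetime rather than hearing nothing for months.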
25. Biology projects
- Protein folding
- Predictor@home (Scripps Institute)
- Folding@home (Stanford)
- Virtual drug discovery
- fightAIDS@home
- Gene sequence analysis
- NTT projects
- Lattice (U. Maryland)
26. Einstein@home
- Gravitational wave detection (LIGO)
- UW Milwaukee/Caltech/Max Planck Inst.
- 30,000 data sets of 40 MB each
- Each data set is analyzed with 40,000 different parameter sets; each analysis takes 6 CPU hours
- Locality scheduling
- minimize data transfer, client disk usage
- minimize credit-granting delay
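The Einstein@home workload is large: 30,000 × 40,000 × 6 CPU hours = 7.2 × 10^9 CPU hours. Because every task reads a 40 MB data set, the scheduler should reuse files a host already holds. The sketch below shows the core idea of locality scheduling. It first prefers a pending task whose data set is already on the host. Otherwise it sends the data set with the most remaining work, so one download pays for many tasks. The task representation and `pick_task` are hypothetical simplifications, not BOINC's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

Task = Tuple[str, int]  # (data set name, parameter set index)


def pick_task(host_files: Set[str], pending: Dict[str, List[int]]) -> Optional[Task]:
    """Choose the next task for a host, minimizing data transfer."""
    # 1. Prefer a data set the host already has: no download needed.
    for ds in sorted(host_files):
        if pending.get(ds):
            return ds, pending[ds].pop()
    # 2. Otherwise pick the data set with the most remaining parameter sets.
    candidates = [ds for ds, params in pending.items() if params]
    if not candidates:
        return None
    ds = max(candidates, key=lambda d: len(pending[d]))
    host_files.add(ds)  # the host will download (and keep) this data set
    return ds, pending[ds].pop()


pending = {"d1": [0, 1, 2], "d2": [0, 1]}
files = {"d2"}
print([pick_task(files, pending) for _ in range(6)])
```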
28. CERN projects
- LHC@home
- accelerator simulation (SixTrack)
- HEP@home
- collision data analysis
29. Others
- UCB Internet measurement
- Map/measure the Internet and home PCs
- BURP (big ugly rendering project)
- ray-tracing
- PlanetQuest
- image analysis for planetary transit detection
30. Challenges and questions
- Get 100 million participants
- simplified account management
- Get more projects
- Distributed file system support
- Use peer-to-peer communication
- BitTorrent integration
- Use GPUs and other resources
- Integrate with Grid (Lattice, CERN)
31. Volunteer computing
- A new high-performance computing paradigm
- Benefits to projects
- enables otherwise infeasible computational research
- economic advantage even for small projects
- Benefits to participants
- increase public scientific knowledge/interest
- catalyze virtual communities
- democratize resource allocation