David P. Anderson

1
Volunteer Computing
  • David P. Anderson
  • Space Sciences Laboratory
  • University of California Berkeley
  • davea@ssl.berkeley.edu

2
Outline
  • Volunteer computing
  • BOINC: an OS for volunteer computing
  • Applications
  • Challenges and research directions

3
Where's the power?
home PCs
your computers
academic
business
  • 2010: 1 billion Internet-connected PCs, 55%
    privately owned
  • If 100M people participate
      • 100 PetaFLOPs, 1 Exabyte (10^18 bytes) storage
  • Consumer products drive technology
      • GPUs (NVIDIA, Sony Cell)
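
The aggregate figures on this slide follow from simple multiplication. A minimal sketch, working backwards from the slide's totals: the per-PC values (1 GFLOPS of sustained compute and 10 GB of donated disk) are assumptions obtained by dividing the totals by 100 million, not numbers stated on the slide.

```python
# Back-of-the-envelope capacity estimate for volunteer computing.
# The per-PC figures are assumptions chosen to reproduce the slide's totals.
PARTICIPANTS = 100_000_000      # 100M volunteers (from the slide)
FLOPS_PER_PC = 1e9              # assumed: 1 GFLOPS sustained per PC
DISK_BYTES_PER_PC = 10e9        # assumed: 10 GB donated per PC


def aggregate(participants, per_pc):
    """Total resource contributed if every participant donates per_pc."""
    return participants * per_pc


total_flops = aggregate(PARTICIPANTS, FLOPS_PER_PC)
total_bytes = aggregate(PARTICIPANTS, DISK_BYTES_PER_PC)

print(f"{total_flops / 1e15:.0f} PetaFLOPS")
print(f"{total_bytes / 1e18:.0f} Exabyte")
```

Changing either per-PC assumption scales the total linearly, which is why the participant count dominates the estimate.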

4
Volunteer computing history
[Timeline, 1995-2005: GIMPS, distributed.net; SETI@home, Folding@home; fight@home; climateprediction.net; BOINC; Einstein@home]
[Names used for the paradigm: volunteer computing, public resource computing, Internet computing, screensaver computing, global computing, @home computing, peer-to-peer computing, Grid computing]
5
Volunteer/Grid differences
6
Save money!
Suppose processing 1 GB of data takes X computer days
                 cluster/Grid        volunteer computing
CPU              $1 per CPU/day      free
network          free                $1 per 20 GB
cost per GB      $X                  $1/20
So volunteer computing is cheaper if X > 1/20 (SETI@home: X = 1,000)
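
The comparison on this slide can be sketched as a small calculation. This is a minimal illustration, assuming the slide's prices are in dollars ($1 per CPU/day on a cluster, $1 per 20 GB of network transfer for volunteer computing); the function names are hypothetical.

```python
# Cost to process 1 GB of data under the slide's two models.
# Prices are the slide's illustrative figures, not measured costs.
CLUSTER_COST_PER_CPU_DAY = 1.0     # assumed $1 per CPU/day; network is free
VOLUNTEER_COST_PER_GB = 1.0 / 20   # CPU is free; assumed $1 per 20 GB transferred


def cluster_cost_per_gb(x_cpu_days):
    """Cluster/Grid: pay for CPU time only."""
    return x_cpu_days * CLUSTER_COST_PER_CPU_DAY


def volunteer_cost_per_gb(x_cpu_days):
    """Volunteer computing: pay for bandwidth only, independent of X."""
    return VOLUNTEER_COST_PER_GB


def volunteer_is_cheaper(x_cpu_days):
    return volunteer_cost_per_gb(x_cpu_days) < cluster_cost_per_gb(x_cpu_days)


for x in (0.01, 1, 1000):   # 1000 is the slide's SETI@home value
    print(x, cluster_cost_per_gb(x), volunteer_cost_per_gb(x), volunteer_is_cheaper(x))
```

The break-even point is X = 1/20: only jobs that need very little computation per gigabyte of data favor a cluster under these assumptions.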
7
Educational discount
Internet2 (free, underutilized)
UCLA
partner institutions
UCB
UIUC
Underutilized flat-rate ISP connections
commodity Internet ()
... so bandwidth may be effectively free also
8
Infrastructure software
  • Roll your own
  • XtremWeb, cosm
      • not complete/robust
  • United Devices, Entropia
      • not free
  • Grid (Globus/Condor), jxta
      • solve a different problem
  • BOINC (Berkeley Open Infrastructure for Network
    Computing)
      • http://boinc.berkeley.edu

9
Projects and participants
diversity, autonomy
Climate
SETI
physics
biomedical
allocation, trust
heterogeneity
Joe
Jens
Alice
10
Encourage participation in >1 project
[Figure: project computing needs over time, alternating "work" and "think" phases]
  • Better long-term resource utilization
      • project A works while project B thinks
  • Better short-term resource utilization
      • communicate/compute in parallel
      • match applications to resources

11
Creating a BOINC project
  • Install BOINC server software on Unix box
  • Adapt or develop application
      • compile for various platforms
  • Write scripts/programs to
      • generate tasks
      • validate results
      • handle results
  • Develop web site
  • Get media coverage

12
Structure of a BOINC project
Ongoing tasks
  • monitor server correctness
  • monitor server performance
  • develop and maintain applications
13
Redundant computing
  • Addresses hardware errors, hackers
  • Issue 2 or more copies of each task
      • don't send to same host or user
      • timed retry up to a limit
  • Result comparison approaches
      • Application-specific fuzzy comparison
      • Homogeneous redundancy
          • send copies only to numerically equivalent hosts
      • Develop platform-independent app
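
The replication and comparison steps on this slide can be sketched roughly as follows. This is not BOINC's validator code: the function names, the quorum of 2, and the relative-tolerance comparison are illustrative assumptions standing in for an application-specific fuzzy comparison.

```python
import math


def fuzzy_equal(a, b, rel_tol=1e-5):
    """Hypothetical app-specific comparison: element-wise relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def find_canonical(results, quorum=2):
    """Return an output that at least `quorum` replicas agree on, or None.

    `results` is a list of (host_id, output) pairs, one per returned replica.
    """
    for _, candidate in results:
        agreeing = [h for h, out in results if fuzzy_equal(candidate, out)]
        if len(agreeing) >= quorum:
            return candidate
    return None  # no consensus yet: the server would issue another copy


def pick_host(hosts, already_sent_users):
    """Choose a host whose user has not yet received a copy of this task.

    `hosts` is a list of (host_id, user_id) pairs.
    """
    for host_id, user_id in hosts:
        if user_id not in already_sent_users:
            return host_id
    return None


results = [("h1", [1.0, 2.0]), ("h2", [1.0000001, 2.0]), ("h3", [9.9, 2.0])]
print(find_canonical(results))
```

In the example, hosts h1 and h2 differ only by floating-point noise and form a quorum, while h3's outlier is ignored. Homogeneous redundancy avoids the tolerance question altogether by sending copies only to hosts expected to produce bit-identical output.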

14
What do participants want?
  • Incentives
      • contribute to science
      • get acknowledgement
      • community
      • screensaver graphics
  • Invisibility, control of resource usage
  • Involvement
      • translation, porting etc.

15
Credit accounting
  • Credit is granted for
      • computation (CPU time x benchmark)
      • storage
      • network communication
  • Cheat-resistance
  • Accounting
      • user, host, team
  • Credit DB export for 3rd-party web sites
      • cross-project identification
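
A minimal sketch of the computation part of credit accounting, assuming credit is simply CPU time multiplied by a benchmark score as the slide states. The scale factor, the record fields, and the function names are hypothetical and do not reflect BOINC's actual credit formula.

```python
from collections import defaultdict


def claimed_credit(cpu_seconds, benchmark_flops, scale=1e-12):
    """Credit for computation: CPU time x benchmark, times an arbitrary scale."""
    return cpu_seconds * benchmark_flops * scale


def tally(records):
    """Sum credit per user, host, and team.

    `records` is a list of dicts with keys user, host, team,
    cpu_seconds, and benchmark_flops.
    """
    totals = {
        "user": defaultdict(float),
        "host": defaultdict(float),
        "team": defaultdict(float),
    }
    for r in records:
        credit = claimed_credit(r["cpu_seconds"], r["benchmark_flops"])
        for level in totals:
            totals[level][r[level]] += credit
    return totals


records = [
    {"user": "alice", "host": "a1", "team": "T", "cpu_seconds": 3600, "benchmark_flops": 1e9},
    {"user": "joe", "host": "j1", "team": "T", "cpu_seconds": 3600, "benchmark_flops": 1e9},
]
totals = tally(records)
print(dict(totals["user"]), dict(totals["team"]))
```

Because both CPU time and the benchmark are reported by the client, a figure computed this way is only a claim; that is why the slide lists cheat-resistance as a separate concern.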

16
Participating
  • Select project(s)
  • Create account(s)
  • Download/install BOINC client software
  • Interact via web
      • preferences
      • leaderboards
      • profile
      • teams
      • message boards, dynamic FAQ

17
(No Transcript)
18
Anonymous platform mechanism
  • Participant compiles software from source
  • Scheduler RPC: platform is "anonymous"
  • Purposes
      • support obscure platforms
      • security-conscious participants
      • performance tuning of applications

19
Client structure
servers
BOINC Manager
20
Applications
  • Computation model
      • Workunits, results
      • Deadlines, resource estimates
  • Data model
      • files, file references
  • Mostly existing apps (FORTRAN, C)
  • Categories
      • Physical simulation
      • Data processing
      • Distribution for its own sake
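
The computation and data models on this slide might be represented roughly as follows. This is an illustrative sketch only: the class and field names are hypothetical and are not taken from BOINC's database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileRef:
    """A reference from a workunit or result to a named file."""
    file_name: str
    is_input: bool


@dataclass
class Workunit:
    """A unit of computation: input files plus resource estimates."""
    name: str
    inputs: List[FileRef]
    est_flops: float            # estimated floating-point operations
    est_disk_bytes: int         # estimated disk space needed on the client
    delay_bound: timedelta      # time allowed between dispatch and report


@dataclass
class Result:
    """One instance of a workunit sent to a particular host."""
    workunit: Workunit
    host_id: str
    sent_at: datetime
    outputs: List[FileRef] = field(default_factory=list)

    @property
    def deadline(self) -> datetime:
        return self.sent_at + self.workunit.delay_bound

    def is_overdue(self, now: datetime) -> bool:
        """True if nothing has been returned and the deadline has passed."""
        return not self.outputs and now > self.deadline


wu = Workunit("wu_0001", [FileRef("input.dat", True)], 1e13, 10**7, timedelta(days=7))
res = Result(wu, "host_42", datetime(2005, 1, 1))
print(res.deadline, res.is_overdue(datetime(2005, 1, 9)))
```

Keeping the resource estimates on the workunit and the deadline on the result reflects the slide's split: one workunit can give rise to several results, each with its own dispatch time.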

21
SETI@home
  • Analysis of radio telescope data from Arecibo
      • SETI: search for narrowband signals
      • Astropulse: search for short broadband signals
  • 0.3 MB in, 4 CPU hours, 10 KB out
  • Enhancements under BOINC
      • data archival on clients
      • direct data distribution from observatory

22
(No Transcript)
23
Climateprediction.net
  • Climate change study (Oxford University)
  • Met Office model (FORTRAN, 1M lines)
  • Input: 10 MB executable, 1 MB data
  • Output per workunit
      • 10 MB summary (always upload)
      • 1 GB detail file (archive on client, may upload)
  • CPU time: 2-3 months (can't migrate)
      • trickle messages
      • preemptive scheduling

24
(No Transcript)
25
Biology projects
  • Protein folding
  • Predictor@home (Scripps Institute)
  • Folding@home (Stanford)
  • Virtual drug discovery
  • fightAIDS@home
  • Gene sequence analysis
  • NTT projects
  • Lattice (U. Maryland)

26
Einstein@home
  • Gravitational wave detection: LIGO
  • UW Milwaukee/CalTech/Max Planck Inst.
  • 30,000 data sets of 40 MB each
  • Each data set is analyzed w/ 40,000 different
    parameter sets; each takes 6 hrs CPU
  • Locality scheduling
      • minimize data transfer, client disk usage
      • minimize credit-granting delay
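
Locality scheduling can be illustrated with a small sketch. It assumes a simple policy that is not taken from the BOINC scheduler: prefer a data file the host already holds, and otherwise pick the file with the least remaining work so that nearly finished data sets complete sooner. All names are hypothetical.

```python
def pick_task(pending, host_files):
    """Choose the next task for a host, preferring data it already holds.

    `pending` maps data-file name -> list of parameter-set ids still to run.
    `host_files` is the set of data files already on the host's disk.
    Returns (file_name, param_id, needs_download), or None if no work is left.
    """
    # First choice: a file the host already has, so nothing is transferred.
    for name in sorted(host_files):
        if pending.get(name):
            return name, pending[name].pop(0), False
    # Assumed fallback: the file with the least remaining work, so that its
    # results (and the credit that depends on them) are completed sooner.
    candidates = [n for n, params in pending.items() if params]
    if not candidates:
        return None
    name = min(candidates, key=lambda n: (len(pending[n]), n))
    return name, pending[name].pop(0), True


pending = {"data_A": [0, 1, 2], "data_B": [0]}
print(pick_task(pending, {"data_A"}))   # host already holds data_A
print(pick_task(pending, set()))        # new host: must download a file
```

With 40 MB data sets that are each analyzed many thousands of times, reusing a file already on the client avoids repeating the download for every parameter set.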

27
(No Transcript)
28
CERN projects
  • LHC@home
      • accelerator simulation (Sixtrack)
  • HEP@home
      • collision data analysis

29
Others
  • UCB Internet measurement
      • Map/measure the Internet and home PCs
  • BURP (big ugly rendering project)
      • ray-tracing
  • PlanetQuest
      • image analysis for planetary transit detection

30
Challenges and questions
  • Get 100 million participants
      • simplified account management
  • Get more projects
  • Distributed file system support
  • Use peer-to-peer communication
      • BitTorrent integration
  • Use GPUs and other resources
  • Integrate with Grid (Lattice, CERN)

31
Volunteer computing
  • A new high-performance computing paradigm
  • Benefits to projects
      • enables otherwise infeasible computational
        research
      • economic advantage even for small projects
  • Benefits to participants
      • increase public scientific knowledge/interest
      • catalyze virtual communities
      • democratize resource allocation