Title: Rensselaer Polytechnic Institute
1 An Asynchronous Hybrid Genetic-Simplex Search
for Modeling the Milky Way Galaxy Using
Volunteer Computing
Travis Desell, Boleslaw Szymanski, Carlos Varela
- Rensselaer Polytechnic Institute
- Department of Computer Science
- GECCO 2008
- Tuesday, July 15, 2008
2 Overview
- Introduction
- Motivation
- Driving Scientific Application
- Large Scale Computing
- Research Questions and Challenges
- Asynchronous Genetic Search
- Methodology
- Recombination
- Generic Optimization Framework
- Approach
- Vision
- Architecture
- Results
- Computing Environments
- Convergence Rates
- Recombination Analysis
- Conclusions and Future Work
- Questions?
3 Motivation
- Scientists need easily accessible distributed optimization tools
- Distribution is essential for scientific computing
- Scientific models are becoming increasingly complex
- Rates of data acquisition are far exceeding increases in computing power
- Traditional optimization strategies not well suited to large scale computing
- Lack scalability and fault tolerance
4 Large Scale Computing
- Supercomputing
- 10k homogeneous processors
- Very fast communication
- Highly reliable
- Internet Computing
- 10k to 100k heterogeneous processors
- Heterogeneous (up to global-scale) communication
- Highly volatile
- Computational Grids
- 100s of processors per homogeneous cluster
- Fast homogeneous communication within clusters
- Heterogeneous (up to global-scale) latency between clusters
- Moderately reliable
5 Astro-Informatics
What is the structure and origin of the Milky Way
galaxy?
- Being inside the Milky Way provides 3D data
- The Sloan Digital Sky Survey has collected over 10 TB of data
- Can determine its structure, which is not possible for other galaxies
- Very challenging: evaluating a single model of the Milky Way with a single set of parameters can take hours or days on a typical high-end computer
- Models determine where different star streams are in the Milky Way, which helps us better understand its structure and how it was formed
6 Asynchronous Genetic Search
7 Issues With Traditional Genetic Search
- Traditional genetic search is dependent and iterative
- Current generation is used to generate the next generation
- Dependencies and iterations limit scalability and impact performance
- With volatile hosts, what if an individual in the next generation is lost?
- Redundancy is expensive
- Scalability limited by population size
8 Asynchronous Search Strategy
- Use an asynchronous search methodology
- No explicit dependencies
- No iterations
- Continuously updated population
- N individuals are generated randomly for the initial population
- Fulfil work requests by applying recombination operators to the population
- Update population with reported results
9 Asynchronous Search Strategy (2)
[Diagram: Workers request work from the Work Queue and are sent work; they report results, which update the Population. The Population holds Parameter Sets (1) to (n) with Fitness (1) to (n). The Work Queue holds Unevaluated Parameter Sets (1) to (m). The queue requests work when it is low, and new members are generated from the population.]
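The strategy on slides 8 and 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MilkyWay@Home implementation: the class name `AsyncSearch`, its method names, the use of plain averaging as the recombination operator, and the convention that higher fitness is better are all hypothetical choices made for the example.

```python
import random


class AsyncSearch:
    """Hypothetical sketch: a continuously updated population, no generations."""

    def __init__(self, bounds, population_size, rng=None):
        # bounds: list of (low, high) per parameter; population_size must be >= 2.
        self.bounds = bounds
        self.population_size = population_size
        self.rng = rng or random.Random()
        self.population = []  # (fitness, parameters), best first

    def request_work(self):
        """Return an unevaluated parameter set for a worker."""
        if len(self.population) < self.population_size:
            # Initial phase: random individuals until the population is full.
            return [self.rng.uniform(lo, hi) for lo, hi in self.bounds]
        # Afterwards: recombine two randomly chosen members (average operator).
        (_, a), (_, b) = self.rng.sample(self.population, 2)
        return [(x + y) / 2.0 for x, y in zip(a, b)]

    def report_result(self, parameters, fitness):
        """Insert a reported result; results may arrive late or not at all."""
        self.population.append((fitness, list(parameters)))
        # Assumption: higher fitness is better; keep only the best members.
        self.population.sort(key=lambda member: member[0], reverse=True)
        del self.population[self.population_size:]


def toy_fitness(params):
    # Stand-in for an expensive model evaluation (maximum of 0 at the origin).
    return -sum(x * x for x in params)


search = AsyncSearch([(-5.0, 5.0)] * 3, population_size=20, rng=random.Random(1))
pending = []
for _ in range(500):
    pending.append(search.request_work())
    if len(pending) >= 10:
        # Report a random outstanding item to mimic out-of-order results.
        work = pending.pop(search.rng.randrange(len(pending)))
        search.report_result(work, toy_fitness(work))
```

Because nothing in `request_work` waits on outstanding results, work that is never reported simply never enters the population, which is the fault-tolerance property the slides describe.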
10 Asynchronous Genetic Search Operators (1)
- Average
- Traditional operator for continuous problems
- Generated parameters are the average of two randomly selected parents
- Mutation
- Takes a parent and generates a mutation by randomly selecting a parameter and mutating it
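A minimal sketch of the two operators on slide 10. The slide does not say how the selected parameter is mutated, so redrawing it uniformly within its bounds is an assumption made here; the function names and the `bounds` argument are likewise hypothetical.

```python
import random


def average(parent_a, parent_b):
    """Child is the per-parameter average of two parents."""
    return [(a + b) / 2.0 for a, b in zip(parent_a, parent_b)]


def mutate(parent, bounds, rng=random):
    """Copy the parent and change one randomly selected parameter.

    Assumption: the new value is drawn uniformly from that parameter's
    (low, high) bounds; the slides do not specify the distribution.
    """
    child = list(parent)
    index = rng.randrange(len(child))
    low, high = bounds[index]
    child[index] = rng.uniform(low, high)
    return child


child = average([0.0, 2.0], [2.0, 4.0])
mutant = mutate(child, [(0.0, 10.0), (0.0, 10.0)], random.Random(7))
```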
11 Asynchronous Genetic Search Operators (2)
- Double Shot - two parents generate three children
- The average of the parents
- A point outside the less fit parent, the same distance from that parent as the average
- A point outside the more fit parent, the same distance from that parent as the average
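One way to read the double shot description geometrically: with average m of the two parents, the point outside a parent p at the same distance as the average is p + (p - m) = 2p - m. The sketch below follows that reading; the function name and argument order are hypothetical.

```python
def double_shot(more_fit, less_fit):
    """Return the three double shot children of two parents.

    Each outside point mirrors the average through its parent, so it lies
    on the line through both parents, as far from that parent as the average.
    """
    avg = [(a + b) / 2.0 for a, b in zip(more_fit, less_fit)]
    outside_less_fit = [2.0 * p - m for p, m in zip(less_fit, avg)]
    outside_more_fit = [2.0 * p - m for p, m in zip(more_fit, avg)]
    return avg, outside_less_fit, outside_more_fit


children = double_shot([0.0, 0.0], [2.0, 4.0])
```

For the parents (0, 0) and (2, 4), the average is (1, 2), so the outside points fall at (3, 6) and (-1, -2).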
12 Asynchronous Genetic Search Operators (3)
- Probabilistic Simplex
- N parents generate one or more children
- Children are points chosen randomly along the line through the worst parent and the centroid (average) of the remaining parents
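The probabilistic simplex operator can be sketched by parameterising the line as centroid + t * (centroid - worst), so that t = -1 is the worst parent, t = 0 the centroid and t = 1 the reflection. The sampling range `t_min` to `t_max`, the uniform draw, and the convention that lower fitness is worse are assumptions for illustration; the slides do not give the limits used.

```python
import random


def probabilistic_simplex(parents, fitnesses, t_min=-1.0, t_max=1.5, rng=random):
    """Generate one child on the line through the worst parent and the centroid.

    Assumptions: higher fitness is better, and t is drawn uniformly from
    [t_min, t_max]; both are illustrative choices.
    """
    worst_index = min(range(len(parents)), key=lambda i: fitnesses[i])
    worst = parents[worst_index]
    others = [p for i, p in enumerate(parents) if i != worst_index]
    # Centroid (average) of all parents except the worst.
    centroid = [sum(column) / len(others) for column in zip(*others)]
    t = rng.uniform(t_min, t_max)
    return [c + t * (c - w) for c, w in zip(centroid, worst)]


parents = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
fitnesses = [1.0, 3.0, 2.0]
child = probabilistic_simplex(parents, fitnesses, rng=random.Random(5))
```

Fixing `t_min` and `t_max` to the same value recovers the deterministic points of a traditional simplex step, for example t = 1 for the reflection.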
13 Generic Optimization Framework
14 Approach
- Separation of Concerns
- Distributed Computing
- Optimization
- Scientific Modeling
- Plug-and-Play
- Simple generic interfaces
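To make the separation of concerns concrete, here is one possible shape for such interfaces. It is a hypothetical sketch, not the framework's actual API: the type aliases `FitnessFunction` and `Evaluator` and the functions `serial_evaluator` and `best_of` are invented for this example.

```python
from typing import Callable, List, Sequence, Tuple

Parameters = Sequence[float]
# Scientific modeling concern: maps a parameter set to a fitness value.
FitnessFunction = Callable[[Parameters], float]
# Distributed computing concern: evaluates a batch of parameter sets somehow.
Evaluator = Callable[[FitnessFunction, List[Parameters]], List[float]]


def serial_evaluator(fitness: FitnessFunction, batch: List[Parameters]) -> List[float]:
    """Simplest possible evaluator; a grid or BOINC backend could replace it."""
    return [fitness(p) for p in batch]


def best_of(fitness: FitnessFunction, evaluator: Evaluator,
            candidates: List[Parameters]) -> Tuple[Parameters, float]:
    """Optimization concern: knows nothing about the model or where it runs."""
    scores = evaluator(fitness, candidates)
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


winner, score = best_of(lambda p: -(p[0] - 1.0) ** 2, serial_evaluator,
                        [[0.0], [1.0], [3.0]])
```

Swapping `serial_evaluator` for another callable with the same signature changes where the work runs without touching the search routine or the model.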
15 Vision
16 Architecture
- Asynchronous evaluations
- Faults can be ignored
- No processor dependencies
- Results may not be reported or reported late
- Grids and Internet
- Single parallel evaluation
- Uses most evolved population
- Can use traditional methods
- Faults require recalculation
- Grids require load balancing
- Supercomputers and Grids
17 Synchronous Architecture
[Diagram: Scientific Models (Data Initialisation, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition) plug into Search Routines (Gradient Descent, Genetic Search, Simplex), which take Initial Parameters and return Optimised Parameters. The search routines send Evaluation Requests to, and receive Results from, the Distributed Evaluation Framework, which handles Evaluator Creation, distributes parameters to the Evaluators and combines their results. Implementations: SALSA/Java (RPI Grid) and MPI/C (BlueGene).]
18 Asynchronous Architecture
[Diagram: Scientific Models (Data Initialisation, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition) plug into Search Routines (Evolutionary Methods, Genetic Search, Particle Swarm Optimisation), which take Initial Parameters and return Optimised Parameters. Evaluators (1) to (N) each send Work Requests and Results and receive Work. The Distributed Evaluation Framework handles Evaluator Creation. Implementations: BOINC (Internet) and SALSA/Java (RPI Grid).]
20 Computing Environments - BlueGene
- RPI's CCNI BlueGene
- BlueGene used as a single worker
- Very fast communication topology enables parallel function evaluation with the synchronous architecture
- Used a 512 node partition of 1024 processors
- One individual generated, evaluated and inserted at a time
- Mimics steady state genetic search
- Most evolved population always used
21 Computing Environments - BOINC
- MilkyWay@Home: http://milkyway.cs.rpi.edu/
- Multiple Asynchronous Workers
- Approximately 2,000 to 3,000 volunteered computers used
- Asynchronous architecture used
- Asynchronous Evaluation
- Each computer could request up to 20 pending individuals at any time
- Work queue filled with individuals generated from current population
- Population updated when results reported
- Individuals may not be reported
22 Asynchronous vs Iterative Genetic Search
23 Asynchronous GS-Simplex on BlueGene
24 Asynchronous GS-Simplex on BOINC
25 Simplex Operator Analysis
- Even with a long time to report, results would still improve the population
- Generation near the reflection has the highest insert rate
- Generation near the centroid provides the most population improvement for fast report times
- Generation near the reflection provides the most population improvement for long report times
27 Conclusions
- Asynchronous search is effective on large scale computing environments
- Fault tolerant without expensive redundancy
- Asynchronous evaluation on a heterogeneous environment increases diversity
- BOINC converges almost as fast as the BlueGene, while offering more availability and computational power
- Even computers with slow result report rates are useful
- Simplex-Genetic Hybrid provides significant improvement in convergence
28 Future Work
- Optimization
- Use report times to determine how to generate individuals
- More search methods (PSO, DE)
- Simulate asynchrony for benchmarks
- Distributed Computing
- Parallel asynchronous workers
- Handle Malicious Volunteers
- Collaboration
http://www.nasa.gov
30 Thanks!
- http://milkyway.cs.rpi.edu
- http://wcl.cs.rpi.edu
32 Simplex Operator Improvement (2)
33 Simplex Operator Improvement (3)
34 GMLE Architecture (Parallel-Asynchronous)
[Diagram: Search Routines sit on a Communication Layer (BOINC over HTTP, Grid over TCP/IP, Supercomputer over MPI) and exchange Work Request, Work and Results messages with Workers (1) to (Z). Each worker distributes parameters over MPI to its own Evaluators, (1) to (N) or (1) to (M), and combines their results.]
35 Operator Examination (1) - BlueGene
36 Operator Examination (2) - BOINC
37 Operator Examination (3) - BOINC
38 Operator Examination (4) - BOINC
39 Operator Examination (5) - BOINC
40 Operator Examination (6) - BOINC
41 Operator Examination (7) - BOINC