Title: Millions of Jobs or a few good solutions
1 Millions of Jobs or a few good solutions
- David Abramson
- Monash University
- MeSsAGE Lab
2 No shortage of applications
How many jobs do they want/need?
- Physics
- Chemistry
- Environmental Science
- Biological Systems
- Engineering
- Astronomy
3 The Nimrod Tool Family
- Nimrod workflows for robust design and search
- Vary parameters
- Execute programs
- Copy data in and out
- Sequential and parallel dependencies
- Computational economy drives scheduling
- Computation scheduled near data when appropriate
- Use distributed high performance platforms
- Upper middleware broker for resource discovery
- Wide community adoption
4 It's pretty easy to specify lots of jobs!
Plan File
    parameter pressure float range from 5000 to 6000 points 4
    parameter concent float range from 0.002 to 0.005 points 2
    parameter material text select anyof Fe Al
    task main
        copy compModel node:compModel
        copy inputFile.skel node:inputFile.skel
        node:substitute inputFile.skel inputFile
        node:execute ./compModel < inputFile > results
        copy node:results results.$jobname
    endtask
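To make the size of this experiment concrete, the following Python sketch expands the three parameters of the plan file above into individual jobs. It is an illustration only, not Nimrod's implementation: the names `linspace` and `expand_plan` are hypothetical, and the sketch assumes that `range ... points N` means N evenly spaced values including both endpoints.

```python
from itertools import product

def linspace(lo, hi, points):
    """Return `points` evenly spaced values from lo to hi inclusive (assumed semantics)."""
    if points == 1:
        return [lo]
    step = (hi - lo) / (points - 1)
    return [lo + i * step for i in range(points)]

def expand_plan():
    """Enumerate every parameter combination described by the plan file."""
    pressure = linspace(5000.0, 6000.0, 4)
    concent = linspace(0.002, 0.005, 2)
    material = ["Fe", "Al"]
    return [
        {"pressure": p, "concent": c, "material": m}
        for p, c, m in product(pressure, concent, material)
    ]

jobs = expand_plan()
print(len(jobs))  # 4 * 2 * 2 = 16
```

Each dictionary corresponds to one execution of the task block. Adding a single further parameter with ten values would already turn these 16 jobs into 160.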
5 Nimrod Development Cycle
- Prepare jobs using portal
- Jobs scheduled and executed dynamically
- Sent to available machines
- Results displayed and interpreted
6 Parameter Sweeps and Searches
- A full parameter sweep is the cross product of all the parameters
  - Too easy to generate millions!
- An optimization run minimizes some output metric and returns the parameter combinations that do this
  - Limited concurrency (except GAs)
- Design of Experiments limits the number of combinations further
  - An old idea
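The gap between a full sweep and a designed experiment is easy to quantify. The sketch below counts the jobs in a full cross product and contrasts that with a Latin hypercube sample, used here only as one simple stand-in for a design-of-experiments method; the slides do not say which designs Nimrod supports. The names `full_sweep_size` and `latin_hypercube` are hypothetical.

```python
import math
import random

def full_sweep_size(levels):
    """Number of jobs in a full cross product, given the level count per parameter."""
    return math.prod(levels)

def latin_hypercube(n_samples, n_params, seed=0):
    """Return n_samples points in [0, 1)^n_params, one per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_params):
        # One value from each of n_samples equal-width strata, in shuffled order.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

print(full_sweep_size([10] * 6))  # 10 levels on each of 6 parameters: 1000000 jobs
design = latin_hypercube(50, 6)
print(len(design))                # 50 jobs spread over the same 6-dimensional space
```

The unit-interval coordinates would still need to be scaled to each parameter's real range before being handed to a model.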
7 Issues for millions of jobs
- Generation issues
- Don't necessarily need 1,000,000 jobs!
- Smarter ways of specifying problems
- Don't want to see 1,000,000 jobs!
- Don't necessarily generate all at once
- Performance issues
- Nimrod/G: server load
- Hierarchical resource management
- Nimrod/K: handling token load in the matching store
- Need k-bounded loops (ideas from the 1980s)
- Fault tolerance
- Engaging the user
- Don't want to see 1,000,000 jobs
- Distributed experiment management (p2p)?
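Two of the points above, not generating all jobs at once and bounding the work in flight, can be combined in a few lines. The sketch below is only loosely analogous to k-bounded loops in a dataflow engine: it draws jobs lazily from a generator and keeps at most `k` of them submitted at any moment. `lazy_jobs`, `run_bounded`, and the toy `evaluate` function are hypothetical names and are not part of Nimrod.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from itertools import islice, product

def lazy_jobs(levels_per_param):
    """Yield parameter combinations one at a time instead of materialising them."""
    return product(*levels_per_param)

def run_bounded(jobs, fn, k):
    """Apply fn to every job, keeping at most k jobs submitted at any moment."""
    results = []
    jobs = iter(jobs)
    with ThreadPoolExecutor(max_workers=k) as pool:
        # Prime the window with the first k jobs.
        pending = {pool.submit(fn, job) for job in islice(jobs, k)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
                # Refill the window: one new job per completed job.
                for job in islice(jobs, 1):
                    pending.add(pool.submit(fn, job))
    return results

def evaluate(job):
    """Toy stand-in for running a model: sum the parameter values."""
    return sum(job)

# 100**3 = 1,000,000 combinations exist, but only the first 1000 are ever generated.
sweep = lazy_jobs([range(100)] * 3)
results = run_bounded(islice(sweep, 1000), evaluate, k=8)
print(len(results))  # 1000
```

Results arrive in completion order, not submission order, so a real experiment manager would also need to record which job produced which result.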
8 Issues for millions of jobs
- Analysis issues
- Need smarter ways of interacting with results
- Scientific visualisation, data mining, mega-pixel displays
- Commercial realities
- License management
- Need parametric licenses like parallel ones.
- Appropriate infrastructure
- TeraGrid class of machine is not the most appropriate
- Parametric Clouds