Title: Providing Scientific Software as a Service in Consideration of Service Level Agreements
1 Providing Scientific Software as a Service in Consideration of Service Level Agreements
- Oliver Niehörster (1), André Brinkmann (1), Georg Birkenheuer (1), Sonja Herres-Pawlis (2), Julia Niehörster (3), Jens Krüger (2), Brigitta Elsässer (2), Lars Packschies (4)
- (1) Paderborn Center for Parallel Computing, Universität Paderborn, Germany
- (2) Department Chemie, Universität Paderborn, Germany
- (3) Department Agricultural Sciences, Universität Hohenheim, Germany
- (4) Universität zu Köln, Germany
2 Scientific SaaS
Cloud
- Provider Advantages
  - Horizontal and vertical scaling of virtual machines
  - Better resource utilization
  - Snapshots and live migration
  - High availability, possible error recovery
3 Service Stack
- Today: Concentration on business applications
  - Online word processors, content management, customer relationship management, human resource management
- Our contribution: Support for scientific applications
  - Gaussian, Gromacs, MoE, NWChem, ...
4 Agenda
Motivation
Architecture Scientific SaaS
Survey
Results
5 Architecture Stack
Is this approach applicable to scientific applications?
How do scientific applications behave?
6 Aim of the survey
- Can classical SLAs be used to provide information about scientific applications?
- Can we estimate the temporal behaviour of the applications?
- Can we support disaster recovery?
- Can we use virtualisation capabilities to increase scheduling efficiency?
  - Resizing of virtual machines?
  - Change the number of nodes?
7 Questionnaire
8 Result
Kind/Method Live progress status Benchmark Estimation Function Unknown
Durable Service
Batch Job Plabsoft, R, Gromacs, NWChem Scientific Gromacs Gaussian
Experiment Gromacs Gromacs Gaussian SAS, ASReml
9 Application Parallelism
- How does the application scale (max. CPUs, etc.)?
  - Parallelism is a user decision, up to 1024 CPUs
- How does parallelism behave over time?
  - No change in parallelism
- Is it possible to add nodes online?
  - No!
- Is the resource demand time dependent?
  - Application dependent: constant or increasing over time
- High internode or I/O communication?
  - All computing intensive, most I/O intensive
10 Application Input
- Are interactive user inputs possible during calculation?
  - No
- Is all input available initially, or are there workflow dependencies?
  - Initially available
- How much data does the application use?
  - Several MB to several GB
11 Application Progress Indication
- Does the application provide progress information?
  - Progress often not available
- If available
  - Gromacs -> reliable after 1000 iterations
  - R -> not linear
  - PlabSoft, SAS, ASReml -> unreliable
- Guess end time from history
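
The two estimation routes named on this slide (progress output once it becomes reliable, otherwise run history) could be combined roughly as follows. This is only a minimal sketch under stated assumptions: the function names, the linear extrapolation, and the use of the median of past runtimes are illustrative choices, not something the survey prescribes. The 1000-iteration warm-up is taken from the Gromacs remark above.

```python
from statistics import median
from typing import Optional, Sequence

# Assumption: progress output is trusted only after this many iterations
# (the slide reports this threshold for Gromacs).
WARMUP_ITERATIONS = 1000


def remaining_from_progress(elapsed_s: float, done: int, total: int,
                            warmup: int = WARMUP_ITERATIONS) -> Optional[float]:
    """Linearly extrapolate remaining seconds; None while progress is unreliable."""
    if done <= 0 or done < warmup:
        return None
    return elapsed_s * max(total - done, 0) / done


def remaining_from_history(elapsed_s: float,
                           past_runtimes_s: Sequence[float]) -> Optional[float]:
    """Guess remaining seconds from the median runtime of earlier, similar jobs."""
    if not past_runtimes_s:
        return None
    return max(median(past_runtimes_s) - elapsed_s, 0.0)


def estimate_remaining(elapsed_s: float, done: int, total: int,
                       past_runtimes_s: Sequence[float]) -> Optional[float]:
    """Prefer the progress-based estimate, fall back to history, else None."""
    estimate = remaining_from_progress(elapsed_s, done, total)
    if estimate is not None:
        return estimate
    return remaining_from_history(elapsed_s, past_runtimes_s)
```

A linear extrapolation would not suit applications whose progress is reported as non-linear (R on this slide); for those, the history-based fallback is the only route this sketch offers.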
12 Checkpoint and Restart
- Is application-specific checkpointing possible?
  - Unknown for some applications
    - PlabSoft, SAS, ASReml
  - Manually started
    - Gromacs
  - Automatically started
    - R, Gaussian, NWChem
- Can a checkpoint continue computation with more nodes?
  - If available, then possible
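
To illustrate what application-level checkpoint and restart means in general, here is a minimal sketch of an iterative job that periodically saves its state and resumes from the last saved state after an interruption. It does not reflect how Gromacs, R, Gaussian, or NWChem implement checkpointing; the JSON state file, the function names, and the `stop_after` parameter (which only simulates an interruption) are all hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh initial state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0.0}
    with open(path) as f:
        return json.load(f)


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Toy computation (sum of step indices) with periodic checkpoints."""
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulated interruption; work since the last checkpoint is lost
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

Calling `run` again with the same path after an interruption continues from the last checkpoint rather than from step 0, so at most `checkpoint_every` steps are recomputed.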
13 Summary and Conclusion
- Summary
  - Scientific applications are batch jobs
  - Halting problem: determining when an application finishes is not possible in every case
  - No online progress indication
- Virtualisation
  - Online horizontal scaling of virtual machines is not supported
  - Vertical scaling helps
- Conclusion
  - Providing scientific SaaS is challenging
  - Limited support for SLAs
14 Georg Birkenheuer, birke_at_uni-paderborn.de