Title: Computer Infrastructures for Reliable, Large-Scale Simulations
1 Computer Infrastructures for Reliable, Large-Scale Simulations
Global Computing Lab, University of Texas at El Paso
2 What Infrastructures and Simulations?
- Computer infrastructures
  - Dedicated high-performance systems: clusters and SMP machines on your campus
  - Grid computing: the NSF initiative TeraGrid
  - Desktop grid computing and volunteer computing: BOINC projects
- Computer simulations
  - Different methods, e.g., Monte Carlo, Molecular Dynamics
  - Heterogeneous workflow, i.e., different phases (codes) to accomplish a complete simulation
Given a computer simulation, each infrastructure has its strengths and weaknesses. Choosing the proper infrastructure is vital for fast, reliable simulation results.
3 Outline
- TeraGrid initiative
- Volunteer computing and BOINC
- Two applications on volunteer computing
  - Protein structure prediction
  - Protein-ligand docking
- Research challenges
- Research opportunities
4 Grid Computing and TeraGrid
- Short overview of the TeraGrid project
5 Volunteer Computing and BOINC Projects
- Computing resources (e.g., desktops, notebooks) owned by volunteers and connected through the Internet
- Normally used to address fundamental problems in science
- BOINC (Berkeley Open Infrastructure for Network Computing) is a well-known representative of volunteer computing (VC)
- The computing power of BOINC is currently about 420 TeraFLOPS (based on credit granted across all projects)
- The total free disk space on computers running SETI@home is 12 Petabytes
[Diagram: BOINC project. Labels: Developers and Administrators, Scientists, Master, Internet, Workers (home PCs), Volunteers, Results]
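The master/worker arrangement in the diagram can be illustrated with a minimal sketch. This is not BOINC's API: the `Master` class, `run_worker`, and `simulate` are hypothetical names, and the whole exchange runs in one process instead of over the Internet. It only shows the pull model, in which volunteer machines ask for work and report results back.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master that hands out work units and stores results."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def request_work(self):
        # A worker pulls the next unit; None means nothing is left.
        return self.pending.popleft() if self.pending else None

    def report_result(self, unit_id, value):
        self.results[unit_id] = value


def simulate(unit_id):
    # Stand-in for the scientific code a volunteer machine would run.
    return unit_id * unit_id


def run_worker(master, max_units):
    """A volunteer PC: pull work while available, up to max_units units."""
    done = 0
    while done < max_units:
        unit = master.request_work()
        if unit is None:
            break
        master.report_result(unit, simulate(unit))
        done += 1
    return done


master = Master(pending=deque(range(6)))
# Two volunteers, each willing to process at most four units.
completed = [run_worker(master, 4), run_worker(master, 4)]
print(completed, master.results)
```

Because workers pull work rather than having it pushed to them, a machine that disappears simply stops asking; a real deployment would also need timeouts and result validation, which this sketch leaves out.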
6 Predictor@home Project Goals
- CASP: Critical Assessment of Techniques for Protein Structure Prediction
  - Biennial competition which aims to advance the research in structure prediction methods
  - Between June 2004 and August 2004, 64 targets (amino acid sequences whose protein structure was unknown) were ultimately solved experimentally for comparison with the participant predictions
- Our previous experiences in CASP4 / CASP5
  - Focus on development of algorithms for structure prediction
- Our objective in CASP6
  - Improve predictions upon previous methods by augmenting conformational sampling by orders of magnitude
- Our approach
  - Deploy a structure prediction supercomputer based on the volunteer computing paradigm: Predictor@home
- Our final goal
  - Test the hypothesis that the significantly increased sampling affordable with Predictor@home indeed improves the quality of structure prediction
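The intuition behind the sampling hypothesis can be shown with a toy experiment. This is only a sketch under strong assumptions: `toy_energy` is a made-up one-dimensional landscape, not a protein force field, and uniform random sampling stands in for whatever conformational search the project actually used. With a fixed seed, a larger run extends the smaller one, so the best score found can only stay the same or improve.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged 1-D "energy landscape"; not a real force field.
    return 0.1 * x * x + math.sin(5.0 * x)


def best_energy(n_samples, seed=0):
    """Lowest energy found among n_samples uniform random conformations."""
    rng = random.Random(seed)
    return min(toy_energy(rng.uniform(-10.0, 10.0)) for _ in range(n_samples))


# Each increase in sample count mimics adding more volunteer compute time.
for n in (10, 1_000, 100_000):
    print(n, round(best_energy(n), 4))
```

The toy only shows that more samples cannot hurt the best score found; whether the extra sampling improves real predictions is exactly what the project set out to test.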
7 Predictor@home Heterogeneous Workflow
8 Predictor@home Significant Results
- Over three months, from June 2004 to August 2004
  - 6786 users joined the project, providing a total compute time of about 12 billion seconds (380 years)
- P@H has identified four types of targets
  - Easy targets based on good templates do not benefit from extensive sampling: P@H and a dedicated cluster provide similar results over the same interval of time
  - Medium-difficulty targets based on loose templates of unrelated proteins benefit from high P@H sampling: P@H provides better results than a dedicated cluster over the same interval of time
  - Hard targets without a template and with lengths up to 300 amino acids still benefit from high P@H sampling, as for medium-difficulty targets
  - Very hard targets without a template, longer than 300 amino acids, and with multiple domains show a severe limitation of the model in capturing the multiple domains: P@H and a dedicated cluster both provide poor results over the same interval of time
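The compute-time figure can be checked with a few lines of arithmetic. The sketch below assumes a Julian year of 365.25 days; the per-user value is a plain average derived here for scale, not a number reported on the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumed convention

total_seconds = 12e9   # "about 12 billion seconds" from the slide
users = 6786           # users who joined during the three months

cpu_years = total_seconds / SECONDS_PER_YEAR
days_per_user = total_seconds / users / 86400  # average only

print(f"{cpu_years:.0f} CPU-years, about {days_per_user:.1f} days per user")
```

The result agrees with the 380 years quoted above and works out to roughly three weeks of compute time per user on average.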
9 Predictor@home Prediction Samples
[Figure: experimental structures shown alongside P@H predictions]
- Comparative Modeling (easy): target t0277, 119 residues, GDT 80.34, RMSD 1.88
- Fold Recognition (medium): target t0274, 159 residues, GDT 71.63, RMSD 3.40
- New Fold (hard): target t0201, 94 residues, GDT 43.88, RMSD 5.80
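For readers unfamiliar with the two quality measures, here is a simplified sketch of how they are commonly defined. It assumes the slide's "GDT" refers to a GDT_TS-style score with cutoffs of 1, 2, 4, and 8 Å, and it assumes the model is already superimposed on the experimental structure. Real evaluations search for the best superposition, so this is an illustration, not the scoring procedure used in CASP. The coordinates are made up.

```python
import math


def distances(model, reference):
    """Per-residue distances between matched 3-D coordinates."""
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation over all matched residues."""
    d = distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within each distance cutoff."""
    d = distances(model, reference)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up coordinates for four residues, already superimposed.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(round(rmsd(model, reference), 2), gdt_ts(model, reference))
```

A higher GDT and a lower RMSD both indicate a closer match, which is the trend across the easy, medium, and hard samples listed above.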
10 Docking@home Objectives and Research Fields
- Objectives
  - to explore the multi-scale nature of algorithmic adaptations in protein-ligand docking
    - protein-ligand representation: spanning scales from rigid to flexible representation of protein-ligand interactions
    - solvent representation: spanning scales from less accurate to more accurate modeling of water treatment
    - sampling strategy: spanning scales from fixed to adaptive sampling of the protein-ligand docking space
  - to develop cyberinfrastructures based on volunteer computing that efficiently accommodate these adaptations
- Interdisciplinary research fields
  - docking methods (Drs. Charles L. Brooks III at TSRI and Michela Taufer at UTEP)
  - decision theory (Dr. Martine Ceberio at UTEP)
  - modeling for dynamic adaptation (Drs. Patricia J. Teller and Michela Taufer at UTEP)
  - volunteer computing (Drs. David P. Anderson at UC Berkeley and Michela Taufer at UTEP)
11 Docking@home Portal
http://docking.utep.edu