Automatic Structure Determination --- given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and systematically search algorithm/program and parameter space

1
Automatic Structure Determination--- given a
data set, solve the structure quickly and better,
by using a parallel workflow engine to
automatically and systematically search
algorithm/program and parameter space
  • Zheng-Qing (Albert) Fu
  • SER-CAT, APS, Argonne National Laboratory
  • Biochem. Mol. Biology, Univ. of Georgia,
    Athens, Georgia

2007 ACA Summer School
2
What we learnt from Structural Genomics
Stages shown in the chart: All Targets, Cloned, Crystals, Structures
Cloned (7), Crystals (33), Structures
Overall Success Rate (from Clone to Structure): 2.45
3
From gene to final structure, crystallographic
analysis of protein structures is a complicated
multi-step, multi-discipline, costly, and
systematic engineering project.
Steps in the diagram, from gene to structure:
Gene, Protein Prep, Crystallization, Data Collection,
Data Processing, Phasing, Map Tracing, Refinement, Structure
Diagram annotations: Bottleneck; Key to Success;
Tedious, Time Consuming; Data Collection, Data Processing
and Structure-Solving Process (Intensive Computing)
Fu (2002) Diffraction Methods in Structural
Biology, Gordon Research Conferences. New London,
CT, USA.
4
Why Automation?
Reason 1: Automation may optimize the steps of
the whole process, and thus improve the success
rate and accuracy of the final structure.
5
Why Automation?
Reason 2: Structural biology in the
post-genomics era challenges X-ray
crystallography to provide better hardware,
better software and better full services.
<<< A Decade Ago >>> Every structural biologist
was also an excellent crystallographer.
<<< Nowadays >>> Most of the new-generation
structural biologists know only some basic
concepts of crystallography, if any at all. They
depend on other people's recipes, and at most
learn how to run a bunch of computer programs.
Do they want to, or have the ability to, solve new
problems related to crystallography?
6
Why Automation?
Reason 3: Even experienced crystallographers may
make careless mistakes.
Blood Coagulation Inhibitor: a small protein
containing 12 Cys. Source: venom of habu
(rattlesnake). A good target for S
phasing. Native data were collected at both
the home source and the SER-CAT synchrotron beam line.
Synchrotron Source (1.74 Å)
Home Cr Source (2.29 Å)
Automation may help avoid such unrecoverable
mistakes, which may happen at any step of the
complicated process.
7
Automation of Part of the Whole Process from Data
Collection to Structure-Solving: Feasibility,
Current Implementation
Data Acquisition and Processing
Structure-Solving Process
8
Data Acquisition and Processing
9
1). How to detect and avoid these problems before
it is too late?
During data collection, any problem with the
diffraction system can ruin the data quality,
leading to failure of the whole process.
Possible sources of trouble include:
  • X-ray source
  • Shutter
  • Goniometer
  • Stage
  • Detector
  • Crystal mounting
  • Other mechanical, optical, or electronic defects, etc.
10
In addition to the unexpected problems, there are
many other issues during data collection:
2). Is the diffraction quality acceptable?
3). Is the data quality still improving?
4). Is the data collected enough to solve the structure?
5). Should we continue collecting more frames, or is it
better to mount a fresh crystal?
All these questions can be answered if and only
if we know how to monitor the Signal/Noise ratio
during data collection.
11
A New Statistical Index, Ras, to More Objectively
and Accurately Evaluate Signal/Noise Ratio
Signal/Noise ratio 1):
$R_{as} = D_a / D_c$
$D_a = \langle \Delta I / \sigma_I \rangle_a$
Here Da is the ratio of the Bijvoet difference to
the standard error in intensity, calculated using
acentric reflections.
$D_c = \langle \Delta I / \sigma_I \rangle_c$
Dc is evaluated statistically in the same way as Da,
but using centric reflections. Theoretically, it should be
zero. Dc is the counterpart of Da, and thus can
serve as the indicator of noise level.
Ras, thus defined, can serve as a signal/noise
ratio in terms of anomalous scattering. The
higher the better. Tests show that it is more
objective and reliable than other indices
currently used for measuring anomalous signal.
1). Fu et al. (2004). Acta Cryst. D60, 499-506.
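The definition of Ras can be illustrated with a minimal Python sketch. It assumes each reflection is given as a (ΔI, σI) pair, that the averages are plain arithmetic means, and that the absolute value of ΔI is used so that the centric average measures noise rather than cancelling out. None of these details is stated on the slide, and the function names are hypothetical.

```python
from statistics import mean


def mean_ratio(reflections):
    """Average |delta_I| / sigma_I over (delta_I, sigma_I) pairs.

    Assumption: absolute Bijvoet differences and a plain arithmetic mean.
    Reflections with non-positive sigma are skipped.
    """
    ratios = [abs(delta_i) / sigma_i
              for delta_i, sigma_i in reflections if sigma_i > 0]
    if not ratios:
        raise ValueError("no reflections with positive sigma")
    return mean(ratios)


def ras(acentric, centric):
    """Ras = Da / Dc: anomalous signal (acentric) over noise level (centric)."""
    d_a = mean_ratio(acentric)
    d_c = mean_ratio(centric)
    if d_c == 0:
        raise ZeroDivisionError("Dc is zero, so Ras is undefined")
    return d_a / d_c


# Toy numbers, not real data.
acentric = [(4.0, 2.0), (-6.0, 2.0)]   # ratios 2.0 and 3.0, so Da = 2.5
centric = [(1.0, 2.0), (-3.0, 2.0)]    # ratios 0.5 and 1.5, so Dc = 1.0
print(ras(acentric, centric))
```

With these toy numbers Da is 2.5 and Dc is 1.0, so the anomalous differences stand 2.5 times above the noise level estimated from the centric reflections.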
12
Signal-based Data Collection: with Ras as a
reliable indicator, diffraction data can be
acquired more appropriately for a given crystal,
by monitoring the Signal/Noise ratio throughout the
data collection.
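One way to picture signal-based data collection is a simple stopping rule on the running Ras values. The sketch below is only an illustration: the function name, the window size and the minimum-gain threshold are all hypothetical and are not taken from the presentation.

```python
def should_continue(ras_history, min_gain=0.02, window=3):
    """Return True while Ras is still improving.

    ras_history: Ras values recomputed after each batch of frames.
    min_gain and window are hypothetical tuning values: collection
    continues only if Ras rose by more than min_gain over the last
    `window` updates.
    """
    if len(ras_history) <= window:
        return True  # too little history to judge a trend
    gain = ras_history[-1] - ras_history[-1 - window]
    return gain > min_gain


# Toy histories, not real data.
print(should_continue([1.0, 1.2, 1.4, 1.6, 1.8]))       # still rising
print(should_continue([1.0, 1.5, 1.6, 1.6, 1.6, 1.6]))  # plateau reached
```

A rule of this kind answers the "is the data quality still improving?" question; deciding whether to mount a fresh crystal would need additional criteria that the slides do not specify.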
13
Structure-Solving Process
14
After the data are processed, we have to face a set of
different issues in the structure-solving process:
1). There are numerous programs (or algorithms)
to choose from. A program may outperform others
in some cases and vice versa. Which programs to use?
2). Each program has multiple parameters. Which
parameters to adjust? What combination of the
parameters can give the best result?
3). If phasing produced a traceable map, is it the
best map to work on for fitting and refining to
complete the structure?
15
For a given data set, combinations of different
programs or parameter settings can produce
totally different results. Some may succeed in
giving a solution, but many others will fail 1).
Test result on solving the structure of a
hydrolase protein (864 AAs, 30 Se). The 2.8 Å data
were provided by Dr. Turner. Green dots are the
percentages of residues automatically traced from
maps generated by phasing with different programs
(SHELXD, ISAS, SOLVE, RESOLVE) and parameter
settings. Pink represents the resolution cutoff for
heavy-atom site searching: solid squares
indicate SHELXD, open ones SOLVE.
Blue represents the resolution cutoff for phasing
and density modification: solid diamonds
indicate SOLVE/RESOLVE, open ones ISAS.
The current common trial-and-error practice in
solving a structure is time-consuming and
tedious. It may not give the best solution, and
may even fail to find any solution at all for
data of marginal quality.
1). Fu, Rose, Wang (2005). Acta Cryst. D61, 951-959.
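The size of the search space described on this slide can be illustrated by enumerating every combination of program and resolution cutoff. The program names come from the slide; the cutoff values, the dictionary keys and the function name are hypothetical placeholders, not the settings used in the test.

```python
from itertools import product

# Program names as listed on the slide.
SITE_SEARCH_PROGRAMS = ["SHELXD", "SOLVE"]
PHASING_PROGRAMS = ["SOLVE/RESOLVE", "ISAS"]

# Hypothetical resolution cutoffs in angstroms, for illustration only.
SITE_SEARCH_CUTOFFS = [3.0, 3.5, 4.0]
PHASING_CUTOFFS = [2.8, 3.0]


def enumerate_trials():
    """List every program/parameter combination as one trial description."""
    return [
        {
            "site_program": site_program,
            "site_cutoff": site_cutoff,
            "phasing_program": phasing_program,
            "phasing_cutoff": phasing_cutoff,
        }
        for site_program, site_cutoff, phasing_program, phasing_cutoff
        in product(SITE_SEARCH_PROGRAMS, SITE_SEARCH_CUTOFFS,
                   PHASING_PROGRAMS, PHASING_CUTOFFS)
    ]


trials = enumerate_trials()
print(len(trials))  # 2 * 3 * 2 * 2 = 24 combinations
```

Even this small grid gives 24 trials, which is why running them by hand one at a time is tedious and why a systematic search is attractive.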
16
Parallel Workflow Engine to systematically
search program and parameter spaces to find the
best solution for given data.
Figure 1. The dark blocks represent parallel
tasks dynamically generated from various
crystallographic computing programs with
different parameter settings. The tasks are
distributed by the workflow engine to the computing
facility and run in parallel. Upon completion, the
workflow engine harvests and analyzes the
results, then dynamically creates and starts another
group of tasks for the next step, and so on
until the whole process finishes.
Fu (2003). Proceeding of the 5th Int. Conference on Mol.
Struct. Biology. Vienna, Austria, Sept. 3-7.
Fu et al. (2005). Acta Cryst. D61, 951-959.
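The generate, distribute, harvest and regenerate loop described in Figure 1 can be sketched in Python with a thread pool. This is a toy illustration under stated assumptions, not the engine from the presentation: the tasks are stand-in functions with made-up scores instead of real crystallographic program runs, only the single best result is carried to the next step, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(tasks, max_workers=4):
    """Run a group of task callables in parallel; results keep task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def run_workflow(stage_builders, seed):
    """For each step: build tasks from the best result so far, run them
    in parallel, harvest the results, and keep the highest-scoring one."""
    best = seed
    for build_tasks in stage_builders:
        results = run_stage(build_tasks(best))
        best = max(results, key=lambda result: result["score"])
    return best


def make_stage(name, settings):
    """Toy stage: each setting has a made-up score (stand-in for a program run)."""
    def build_tasks(previous):
        def make_task(setting, score):
            def task():
                return {"steps": previous["steps"] + [(name, setting)],
                        "score": previous["score"] + score}
            return task
        return [make_task(s, sc) for s, sc in settings.items()]
    return build_tasks


stages = [
    make_stage("site_search", {"cutoff_3.0": 0.2, "cutoff_3.5": 0.5}),
    make_stage("phasing", {"cutoff_2.8": 0.4, "cutoff_3.0": 0.1}),
]
best = run_workflow(stages, {"steps": [], "score": 0.0})
print(best["steps"])
```

Keeping only the top result per step is a greedy simplification chosen for brevity; an engine could just as well carry several candidates forward, since a slightly worse set of sites may still lead to a better final map.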
17
Algorithm and Design
18
Where are we?
19
Acknowledgment
George Wu and many Ph.D. students including Dongsheng Che,
Jizhen Zhao, Feng Sun, Haijin Yan, Dept. of Computer Sciences, UGA
B.C. Wang, John Rose, SER-CAT, SECSG, UGA
John Chrzas, Zhongmin Jin, Jim Fait, SER-CAT, APS
Andy Howard, Illinois Institute of Technology
Robert Sparks, Bruker (formerly Siemens) AXS Inc.
Xuong Nguyen-Huu, UC San Diego
George Sheldrick, University of Göttingen, Germany
Randy Read, Cambridge University, England
Tom Terwilliger, Los Alamos National Lab
Peter Briggs (CCP4, England) and authors of all the programs
plugged into SGXPro.
Work is supported in part with funds from the National
Institutes of Health (GM62407) and SER-CAT, APS.