Title: High-Throughput Crystallography at Monash
1. High-Throughput Crystallography at Monash
- Noel Faux
- Dept of Biochemistry and Molecular Biology
- Monash University
2. Structural Biology Pipeline
Cloning → Expression → Purification → Crystallisation → X-ray diffraction → Structure determination
- Australian Synchrotron online in 2007
- Data processing and structure determination: the major bottleneck
- High-throughput robots and technologies: Tecan Freedom Evolution, ÄKTAxpress; trialling crystal storage and imaging facilities
- Target tracking / LIMS
- Data management
- Phasing (CCP4/CNS, GRID computing)
3. The Problems
- Target tracking / data management
  - The process of protein structure determination creates a large volume of data.
  - Storage, security, traceability, management and backup of files are ad hoc.
  - Remote access to the files is limited and requires different media formats.
- Structure determination
  - CPU intensive
4.
- Part of a national project for the development of eResearch platforms for the management and analysis of data for research groups in Australia.
- Aim: establish common, standardised software / middleware applications that are adaptable to many research capabilities.
5. Solution
- Central repository of files
- Attach metadata to the files
- Worldwide secure access to the files
- Automated collection and annotation of the files from in-house and synchrotron detectors
6. The Infrastructure
[Diagram: X-ray images from the collection PC, streaming video of the mounted crystal, lab streaming video and still pictures from the lab PC, instrument reports, and lab/crystal temperature sensor data are gathered by a Kepler workflow and deposited in the Storage Resource Broker, hosted on Monash University IT's Sun Grid: 54 dual 2.3 GHz CPUs, 208.7 GB memory (3.8 GB per node), >10 TB storage capacity, running GridSphere.]
7. Central Web Portal
8. Central Web Portal (continued)
9. Automated X-ray Data Reduction
- Automated processing of the diffraction data
- Investigating the incorporation of xia2 (automated data reduction): a new automated data reduction system designed to work from raw diffraction data and a little metadata, and to produce usefully reduced data in a form suitable for immediately starting phasing and structure determination (CCP4) [1]; an example invocation follows the reference below.
[1] Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D50, 760-763. (xia2 is developed by Graeme Winter.)
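As a concrete illustration, xia2 of that era could be pointed at a directory of diffraction images plus minimal metadata. The flag spelling varies between xia2 versions, so treat the options below as an assumed, representative invocation rather than the pipeline's actual command line:

  # Hypothetical run: integrate and scale a selenium dataset from raw images
  xia2 -3d -atom se /data/crystal1/images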
10. Divide and Conquer
- A large number of CPUs is available across different computer clusters at different locations:
  - Monash IT's Sun Grid
  - VPAC (Brecca: 97 dual Xeon 2.8 GHz CPUs, 160 GB total memory, 2 GB per node; Edda: 185 POWER5 CPUs, 552 GB total memory, 8-16 GB per node)
  - APAC: 1680 processors, 3.56 TB of memory, 100 TB of disk
  - Personal computers
11. DART and CCP4
- Aim: use the CCP4 interface locally but run the jobs remotely across a distributed system
- Nimrod to distribute the CCP4 jobs across the different Grid systems (a sketch of a plan file follows this list)
- Investigating the possibility of incorporating the CCP4 interface into the DART web portal
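To make Nimrod's role concrete, the sketch below shows a parameter-sweep plan in the spirit of Nimrod/G plan files. The keywords, the ${model} values, and the phaser command line are all assumptions for illustration, not the project's actual plan:

  parameter model text select anyof "d1abca_" "d2xyzb_"
  task main
      copy ${model}.pdb node:.
      copy phaser.inp node:.
      node:execute phaser < phaser.inp > ${model}.log
      copy node:${model}.log output/${model}.log
  endtask

Each value of the model parameter becomes an independent job that Nimrod schedules onto whichever Grid resource is free.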
12. Exhaustive Molecular Replacement
- No phasing data
- No sequence identity (<20%)
- No search model
- Is there a possible fold homologue?
- Exhaustive Phaser scan of the PDB [2]
- Exhaustive searches with different parameters and search models
[2] McCoy, A. J., Grosse-Kunstleve, R. W., Storoni, L. C. & Read, R. J. (2005). Likelihood-enhanced fast translation functions. Acta Crystallogr. D61, 458-464.
13. Exhaustive Molecular Replacement
- Proteins' building blocks are domains
- Use a subset of SCOP as search models in a Phaser calculation
- The use of Grid computing will make this possible: ~1000 CPU-days for a typical run
- Search at the family level:
  - Take the highest-resolution structure
  - Mutate it to poly-alanine, and delete loops and turns (a sketch of the truncation follows this list)
- Phaser
- Families with a Z-score ≥ 6: search with each of their domain members
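As an illustration of the truncation step, this minimal Perl sketch converts a PDB search model to poly-alanine by keeping only backbone (N, CA, C, O) and CB atoms and renaming every residue to ALA; loop/turn deletion is not shown, and the column offsets follow the PDB fixed-column format:

  #!/usr/bin/perl
  # Hypothetical sketch: poly-alanine truncation of a PDB search model.
  use strict;
  use warnings;

  while (my $line = <>) {
      if ($line =~ /^ATOM/) {
          my $atom = substr $line, 12, 4;           # atom name, columns 13-16
          next unless $atom =~ /^\s*(?:N|CA|C|O|CB)\s*$/;
          substr($line, 17, 3) = 'ALA';             # residue name, columns 18-20
      }
      print $line;
  }

Run it as, e.g., perl polyala.pl model.pdb > model_ala.pdb (the script name is hypothetical).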
14. Exhaustive Molecular Replacement
- Each node runs a Perl script that:
  - requests a job
  - launches Phaser
  - returns the results
  - repeats until the list is exhausted
- Database containing:
  - to-do list
  - parameters
  - results
(A sketch of the worker loop follows this list.)
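The per-node worker might look roughly like the sketch below. For self-containment it uses a shared flat file with advisory locking in place of the real job database, and the phaser command line and file naming are assumptions:

  #!/usr/bin/perl
  # Hypothetical sketch of the per-node worker: claim a job, run Phaser,
  # record the outcome, repeat until the to-do list is exhausted.
  use strict;
  use warnings;
  use Fcntl qw(:flock);

  my $todo    = 'todo.list';       # one search-model id per line (assumed format)
  my $results = 'results.list';

  while (1) {
      # Atomically claim the next job: lock the list and pop its first line.
      open my $fh, '+<', $todo or die "cannot open $todo: $!";
      flock $fh, LOCK_EX;
      my @jobs = <$fh>;
      if (!@jobs) { close $fh; last; }   # list exhausted: worker exits
      my $job = shift @jobs;
      seek $fh, 0, 0;
      truncate $fh, 0;
      print $fh @jobs;
      close $fh;                         # releases the lock

      chomp $job;
      # Launch Phaser on the claimed model (input/log naming is illustrative).
      my $status = system("phaser < $job.inp > $job.log");

      # Return the outcome to the shared results store.
      open my $out, '>>', $results or die "cannot open $results: $!";
      flock $out, LOCK_EX;
      print $out "$job\t", ($status == 0 ? 'ok' : 'failed'), "\n";
      close $out;
  }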
Monash IT's Sun Grid: 56 dual AMD Opteron CPUs, 208.7 GB memory (3.8 GB per node), >10 TB storage capacity.
Will be extended to use Nimrod to gain access to APAC and the Pacific Rim Grid (PRAGMA).
15. Final Pipeline
Cloning → Expression → Purification → Crystallisation → X-ray diffraction → Structure determination
- High-throughput robotics and technologies
- Data collection, management, storage, and remote access: DART
- Automated data reduction: xia2
- Data processing, and exhaustive experimental (e.g., SAD, SIRAS, MIRAS) and MR phasing for final refinement: Grid computing, Nimrod, Phaser, autoSHARP, CCP4, DART
16. Acknowledgments
- Monash University
  - Anthony Beitz
  - Nicholas McPhee
  - James Whisstock
  - Ashley Buckle
- James Cook University
  - Frank Eilert
  - Tristan King
- DART Team