Title: Evaluating the Solution from MrBUMP and Balbes
1Evaluating the Solution from MrBUMP and Balbes
- Ronan Keegan, Fei Long, Martyn Winn, Garib
Murshudov
- STFC Daresbury Laboratory
- York Structural Biology Laboratory, University of
York
2PDB Depositions (1999-2009)
[Figure: PDB depositions, 1999 to 2009]
3What are MrBUMP and Balbes?
- Automated molecular replacement (MR) and
beyond....
- MrBUMP
- Brute force approach: try a variety of search
models prepared in several different ways in MR
- Balbes
- Identifies the best possible search model to be
used in MR by using a specially crafted version
of the PDB optimised for use in MR
4MrBUMP
- An automation framework for Molecular
Replacement.
- Particular emphasis on generating a variety of
search models.
- Wraps Phaser and/or Molrep.
- Uses a variety of helper applications (e.g.
Chainsaw) and bioinformatics tools (e.g. Fasta,
Mafft) to generate search models
- Makes use of up-to-date on-line databases (e.g.
PDB, Scop)
- In favourable cases, gives one-button solution
- In complicated cases, will suggest likely search
models for manual investigation (lead generation)
5The Pipeline
- Target details: target MTZ and sequence
- Template search, giving N templates
- Model preparation, giving N x M models
- Molecular replacement
- Refinement
- Phase improvement
- Check scores and exit, or select the next model
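The loop over the N x M search models can be sketched as below. This is only an illustration of the control flow on the slide, not MrBUMP's code: `run_pipeline`, `evaluate`, `is_good_enough`, the template names and the Rfree values are all hypothetical, and `evaluate` stands in for the whole MR, refinement and phase improvement chain.

```python
from itertools import product


def run_pipeline(templates, preparations, evaluate, is_good_enough):
    """Try each template/preparation pair until one is good enough."""
    results = []
    for template, prep in product(templates, preparations):
        model = f"{template}_{prep}"   # one of the N x M search models
        score = evaluate(model)        # stands in for MR + refinement
        results.append((model, score))
        if is_good_enough(score):      # check scores and exit ...
            break                      # ... or select the next model
    return results


# Made-up final Rfree values, for illustration only.
fake_rfree = {
    "tmplA_MOLREP": 0.55, "tmplA_CHNSAW": 0.33,
    "tmplB_MOLREP": 0.50, "tmplB_CHNSAW": 0.52,
}
tried = run_pipeline(["tmplA", "tmplB"], ["MOLREP", "CHNSAW"],
                     fake_rfree.get, lambda rfree: rfree < 0.35)
```

Here the loop stops at the second model because its made-up Rfree passes the check; with a stricter check all four models would be tried.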
6Search for model templates
- FASTA search of PDB
- Sequence-based search using sequence of target
structure
- All of the resulting PDB id codes are added to a
list
- These structures are called model templates
- Other templates from:
- SSM search using top hit from the FASTA search
- Can add additional PDB id codes to the list
- Can add local PDB files
7Multiple Alignment step
[Figure: multiple alignment of the target with the
model templates, shown in Jalview 2.08.1 (Barton
group, Dundee); the pairwise alignment is used in
Chainsaw]
- Currently supports ClustalW, MAFFT, ProbCons or
T-Coffee for multiple alignment
- Model template scoring: score = sequence
identity x alignment quality
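The template scoring rule can be sketched as follows. This is a minimal illustration, assuming both quantities are expressed as fractions between 0 and 1; the names `TemplateHit`, `template_score` and `rank_templates`, and the example values, are hypothetical and not taken from MrBUMP.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    """One candidate model template (fields are illustrative assumptions)."""
    pdb_id: str
    seq_identity: float       # fraction of identical residues, 0..1
    alignment_quality: float  # alignment quality measure, assumed 0..1


def template_score(hit: TemplateHit) -> float:
    # score = sequence identity x alignment quality
    return hit.seq_identity * hit.alignment_quality


def rank_templates(hits):
    # Highest score first, so the most promising template is tried first.
    return sorted(hits, key=template_score, reverse=True)


hits = [
    TemplateHit("aaaa", 0.50, 0.50),
    TemplateHit("bbbb", 0.25, 0.50),
    TemplateHit("cccc", 0.50, 0.75),
]
ranked = [h.pdb_id for h in rank_templates(hits)]
```

With these made-up values, a well-aligned template outranks one with the same identity but a poorer alignment.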
8Domain 1
[Figure: template split into Domain 1 and Domain 2;
template chains superposed]
- Domains (SCOP): e.g. if relative domain motion
- Multimers (PQS/PISA): better signal-to-noise
ratio than monomer, if assembly is correct for the
target.
- Ensembles: create ensemble of top search models,
for use in additional run of Phaser. Need to be
similar in MW and rmsd
9Search Model Preparation
- Search models prepared in four ways
- PDBclip
- original PDB with waters removed, most probable
conformations selected and format tidied (e.g.
chain ID added)
- Molrep
- Molrep model preparation function which aligns
the template sequence with the target sequence
and prunes the non-conserved side chains
accordingly.
- Chainsaw
- Can be given any alignment between the target and
template sequences. Non-conserved residues are
pruned back to the gamma atom.
- Polyalanine
- Created by excluding all of the side chain atoms
beyond the CB atom using the Pdbset program
[Figure labels: more side chain truncation; deal
with deletions]
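The polyalanine preparation can be illustrated with a few lines of Python. This is a stand-in sketch, not the Pdbset program: `to_polyalanine` is a hypothetical helper that assumes standard fixed-column PDB ATOM records (atom name in columns 13 to 16) and leaves residue names untouched.

```python
KEEP_ATOMS = {"N", "CA", "C", "O", "CB"}  # backbone plus C-beta


def to_polyalanine(pdb_lines):
    """Drop side chain atoms beyond CB from ATOM records.

    Other record types are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()  # PDB columns 13-16
            if atom_name not in KEEP_ATOMS:
                continue
        kept.append(line)
    return kept


# A single serine residue with made-up coordinates.
serine = [
    "ATOM      1  N   SER A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  SER A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  C   SER A   1      13.159   6.017  -5.173  1.00  0.00           C",
    "ATOM      4  O   SER A   1      13.757   5.533  -6.132  1.00  0.00           O",
    "ATOM      5  CB  SER A   1      11.120   4.854  -4.378  1.00  0.00           C",
    "ATOM      6  OG  SER A   1      11.583   3.653  -4.972  1.00  0.00           O",
    "END",
]
trimmed = to_polyalanine(serine)
```

In this example only the OG atom is removed; the backbone, the CB atom and the END record are kept.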
10Molecular replacement step
- Running MR
- For each search model, MR done with Molrep or
Phaser or both.
- MR programs run mostly with defaults
- MrBUMP provides LABIN columns, MW of target,
sequence identity of search model, number of
copies to search for, number of clashes tolerated
11Molecular replacement step
- MR output
- MR scores and un-refined models available for
later inspection
- Use these to assess quality of solution, extent
of model bias
- MrBUMP doesn't use MR scores, but checks for
output file with positioned model, and passes to
refinement step
12Testing enantiomorphic spacegroups
- 11 pairs of enantiomorphic spacegroups containing
screw axes of opposite handedness (e.g. P41 and
P43)
- usually both need to be tested in MR
- correct spacegroup indicated by TF and packing
- MrBUMP can test both in Molrep and/or Phaser.
- For each search model, best MR results used to
fix spacegroup for subsequent steps.
- Discrimination good for good search model
[Figure: correct MR solution]
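As a small illustration of the bookkeeping involved, the 11 enantiomorphic pairs can be held in a lookup table so that both hands are queued for MR. The pair list is standard crystallography; the helper name `spacegroups_to_test` and the name normalisation are assumptions made for this sketch, not MrBUMP's own interface.

```python
ENANTIOMORPHIC_PAIRS = [
    ("P41", "P43"), ("P4122", "P4322"), ("P41212", "P43212"),
    ("P31", "P32"), ("P3112", "P3212"), ("P3121", "P3221"),
    ("P61", "P65"), ("P62", "P64"), ("P6122", "P6522"),
    ("P6222", "P6422"), ("P4132", "P4332"),
]

# Map each spacegroup to its partner of opposite handedness.
_PARTNER = {}
for first, second in ENANTIOMORPHIC_PAIRS:
    _PARTNER[first] = second
    _PARTNER[second] = first


def spacegroups_to_test(spacegroup):
    """Return the spacegroup plus its enantiomorph, if it has one."""
    name = spacegroup.replace(" ", "").upper()  # "P 41 21 2" -> "P41212"
    partner = _PARTNER.get(name)
    return [name] if partner is None else [name, partner]
```

For example, `spacegroups_to_test("P 43 21 2")` gives both P43212 and P41212, while a spacegroup such as P212121 is returned on its own.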
13Inclusion of fixed models
- MrBUMP will now accept one or more positioned
models.
- These are included as fixed models in all MR
jobs.
- Thus, solve complexes through consecutive runs
of MrBUMP.
- Automation of this in progress ....
14Restrained refinement step
- The resulting models from molecular replacement
are passed to Refmac for restrained refinement.
- The change in the Rfree value during refinement
is used as rough estimate of how good the
resulting model is.
- success: final Rfree < 0.35, or final Rfree < 0.5
and dropped by 20%
- marginal: final Rfree < 0.48, or final Rfree <
0.52 and dropped by 5%
- poor: otherwise
- These criteria are conservative .....
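The three-way classification can be written as a short function. This is a sketch of the thresholds as stated on the slide, with one assumption labelled in the code: the "dropped by" figures are read as a relative drop (a fraction of the initial Rfree) rather than an absolute difference. The function name `classify_refinement` is hypothetical.

```python
def classify_refinement(initial_rfree, final_rfree):
    """Label a refined MR solution as success, marginal or poor."""
    # Assumption: "dropped by 20%" means 20% of the initial Rfree,
    # not a drop of 0.20 in absolute terms.
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and drop >= 0.05):
        return "marginal"
    return "poor"
```

For instance, a model refining from Rfree 0.55 to 0.42 counts as a success under this reading, while one moving from 0.55 to 0.54 is classed as poor.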
15Refinement results
16Summarised results..
Best search model so far and file location for
this model
List of sorted results so far
17Example (thanks to Elien Vandermarliere)
Target is an arabinofuranosidase
Data to 1.55Å in P212121
Small C domain (144 res) solved with 34% seq
ident model (1w9t_B_MOLREP best out of 4
solutions)
With C domain solution fixed, large N domain (345
res) solved with 28% seq ident model
(1gyh_C_CHNSAW best out of 7 solutions)
Acorn CC increases from 0.04 to 0.18. This step
is part of MrBUMP
ARP/wARP then builds 457/493 residues to R/Rfree
0.185/0.225
18MrBUMP output
- Log file gives summary of models tried and
results of MR
- May get several putative solutions
- Ease of subsequent model re-building, model
completion may depend on choice of solution
- Worth checking poor solutions
- Top solution available from ccp4i
19Using MrBUMP
- Part of CCP4 suite
- CCP4i GUI
20(Balbes)
Fei Long, Alexei Vagin, Paul Young, Garib
Murshudov
YSBL, York University
http://www.ysbl.york.ac.uk/fei/balbes/index.html
21Balbes
- Balbes uses a reorganised version of the PDB
database customised for use in MR.
- Has its own classification of domains most
suitable for use in molecular replacement
- Database also includes classification of possible
oligomers based on the basic structures.
- Built upon Molrep for molecular replacement and
Refmac for doing the refinement
22Balbes Database
- Cut down version of the PDB keeping only single
copies of unique folds.
- Where 2 entries have sequence ID > 80% and rmsd
< 1 Å, the entry with the highest resolution is
retained
- More than 20000 entries
- More than 23000 domains classified
- For each sequence additional information is
stored including secondary structure information,
number of domains and potential to form multimers
- Flexible loops are cut from the stored structures
- Multimers are classified with information from
the EBI PISA service
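The redundancy rule (sequence ID > 80% and rmsd < 1 Å, keep the highest resolution entry) can be sketched as a greedy filter. The slide only states the pairwise rule, so the greedy ordering is an assumption, and `filter_redundant`, the two comparison callbacks and the example values are all hypothetical rather than taken from Balbes.

```python
def filter_redundant(entries, seq_identity, rmsd,
                     id_cutoff=80.0, rmsd_cutoff=1.0):
    """Keep one representative of each group of near-identical entries.

    entries: list of (name, resolution in angstrom).
    seq_identity, rmsd: callables taking two entry names.
    """
    kept = []
    # A smaller resolution value means higher resolution, so the best
    # entries are considered (and retained) first.
    for name, resolution in sorted(entries, key=lambda e: e[1]):
        redundant = any(
            seq_identity(name, other) > id_cutoff
            and rmsd(name, other) < rmsd_cutoff
            for other, _ in kept
        )
        if not redundant:
            kept.append((name, resolution))
    return kept


# Made-up pairwise values for three entries A, B and C.
identity = {frozenset("AB"): 95.0, frozenset("AC"): 40.0, frozenset("BC"): 85.0}
distance = {frozenset("AB"): 0.6, frozenset("AC"): 3.0, frozenset("BC"): 1.7}
unique = filter_redundant(
    [("A", 2.5), ("B", 1.8), ("C", 2.0)],
    lambda x, y: identity[frozenset((x, y))],
    lambda x, y: distance[frozenset((x, y))],
)
```

In this example A is dropped in favour of the higher resolution B, while C is kept because its rmsd to B is above the cutoff despite the high sequence identity.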
23Balbes Workflow
24Assemblies
- Fully automatic support for handling assemblies
- Simply provide additional sequences in input
sequence file for each additional component in
the target data
- Balbes will first look for assemblies in its
database containing all of the sequences, but if
none is found it will look for subsets of those
sequences
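The fallback from the full assembly to subsets of the sequences can be sketched as below. This is an illustration only: the order in which Balbes tries subsets is not stated on the slide, so trying the largest subsets first is an assumption, and `find_assembly` and the `database_has_assembly` callback are hypothetical names.

```python
from itertools import combinations


def find_assembly(target_sequences, database_has_assembly):
    """Try the full set of sequences first, then smaller subsets."""
    seqs = list(target_sequences)
    for size in range(len(seqs), 0, -1):
        for subset in combinations(seqs, size):
            if database_has_assembly(subset):
                return subset
    return None  # nothing in the database matches any subset


# Toy database that only knows an assembly of seqA with seqB.
known = {frozenset({"seqA", "seqB"})}
match = find_assembly(["seqA", "seqB", "seqC"],
                      lambda subset: frozenset(subset) in known)
```

Here no assembly contains all three sequences, so the search falls back to the pair that the toy database does contain.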
25Using Balbes
- Through CCP4i
- included in Linux and Mac OSX releases
- Simple CCP4i interface: provide sequence and
structure factor file (MTZ)
- Enter multiple sequences in input file for
complexes
26Using Balbes
- Through the YSBL Web Server
- Balbes is one of several programs available to
use via the web at the York University YSBL Web
Server
- Create an account for access
http://www.ysbl.york.ac.uk/YSBLPrograms/index.jsp
27Balbes Web Server
- Enter MTZ and sequence information as a file or
paste sequence
- Option to check all related space groups
- Option to submit resulting map to ARP/wARP server
for final model building
28Web Server Results
- Summary of processing for each spacegroup
- Final best result highlighted
- For each spacegroup, log file and all output
files are made available for download (5 days)
- If user opted to use ARP/wARP server, a link to
the ARP/wARP results is provided
29Balbes Output
- Spacegroup specific output
- Download files
- Main summary file showing results of MR and
refinement for each template model that was used.
- Q value scoring
31Balbes the statistics
- In 2006, 67% of structures deposited in the PDB
were solved by MR
- Balbes improved this to 80% using
- A better organised database (for MR)
- A better choice of protocols for tackling the MR
problem
- Improved algorithms in MR and refinement programs
32Balbes Usage
- 90 structures in PDB referencing use of Balbes as
part of structure solution
- 35 citations
- More than 5000 users of web server version of
program
- 150 users per month
33Summary
- MrBUMP
- Brute force approach, try everything
- May give lead in borderline cases as well as
automating straight-forward cases
- All results are easily accessible and summarised
- Balbes
- Efficient and quick at solving structures that
have reasonable homologues
- Deals with assemblies/complexes fully
automatically
- Combined with ARP/wARP it can take structure
nearly all the way to completion
- In conclusion
- Try Both!! Compute cycles are very cheap.
34Acknowledgements
- Fei Long, Alexei Vagin, Paul Young, Andrey
Lebedev, Garib Murshudov: Balbes and Refmac
- Martyn Winn: MrBUMP
- Airlie McCoy, Randy Read: Phaser
- Alexei Vagin: Molrep
- Norman Stein: Chainsaw and Ctruncate
- CCP4 Core team: Support and Testing
35Downloading Balbes Database
- Balbes database available as a separate download
(1.6GB)
36PDB Statistics
- The number of structures in the PDB is increasing
rapidly year on year
- By the end of 2009 there were > 62000 structures
deposited
- 7449 structures were deposited in 2009 (12% of
the total)
- MrBUMP and Balbes seek to exploit this wealth of
data to improve the success rate of molecular
replacement