Title: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps
1 A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps
- Frank DiMaio, Jude Shavlik
- Computer Sciences Department
- George Phillips
- Biochemistry Department
- University of Wisconsin-Madison
- USA
Presented at the Fourteenth Conference on
Intelligent Systems for Molecular Biology (ISMB
2006), Fortaleza, Brazil, August 7, 2006
2 X-ray Crystallography
[Figure labels: X-ray beam, protein crystal, collection plate, FFT, electron density map (3D picture)]
3 Given: Sequence and Density Map
[Figure labels: sequence, electron density map]
4 Find: Each Atom's Coordinates
5 Our Subtask: Backbone Trace
[Figure: backbone trace through successive Cα atoms]
6 The Unit Cell
- 3D density function ρ(x,y,z) provided over unit cell
- Unit cell may contain multiple copies of the protein
8 Density Map Resolution
[Figure: example density at 2Å, 3Å, and 4Å resolution, labeled with ARP/wARP (Perrakis et al. 1997), TEXTAL (Ioerger et al. 1999), Resolve (Terwilliger 2002), and "Our focus"]
9 Overview of ACMI (our method)
- Local Match
  - Algorithm searches for sequence-specific 5-mers centered at each amino acid
  - Many false positives
- Global Consistency
  - Use probabilistic model to filter false positives
  - Find most probable backbone trace
10 5-mer Lookup and Cluster
[Figure: 5-mers looked up in the PDB and grouped into Cluster 1 and Cluster 2, with weights 0.67 and 0.33]
NOTE: can be done in a precompute step
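A minimal sketch of how looked-up 5-mers might be clustered and weighted. The clustering rule (greedy leader clustering), the `cutoff` parameter, the function name, and the assumption that fragments are already superimposed are all illustrative choices, not details taken from the talk. Each cluster weight is simply the fraction of fragments it contains.

```python
import numpy as np

def cluster_fragments(fragments, cutoff):
    """Greedy leader clustering of 5-mer fragments.

    fragments: array (n, 5, 3) of C-alpha coordinates, assumed already
               superimposed on a common frame (an assumption of this sketch).
    cutoff:    RMSD threshold in angstroms (hypothetical parameter).
    Returns (representative indices, cluster weights summing to 1).
    """
    fragments = np.asarray(fragments, dtype=float)
    leaders, counts = [], []
    for idx, frag in enumerate(fragments):
        for k, lead in enumerate(leaders):
            diff = frag - fragments[lead]
            rmsd = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
            if rmsd <= cutoff:
                counts[k] += 1      # join the first cluster that is close enough
                break
        else:
            leaders.append(idx)     # no cluster matched: start a new one
            counts.append(1)
    weights = np.array(counts, dtype=float) / len(fragments)
    return leaders, weights

# Toy data: two near-identical fragments and one distant fragment.
base = np.arange(15, dtype=float).reshape(5, 3)
toy = np.stack([base, base + 0.1, base + 10.0])
representatives, weights = cluster_fragments(toy, cutoff=1.0)
print(representatives, np.round(weights, 2))
```

On this toy input the first two fragments share a cluster and the third stands alone, so the weights come out as two thirds and one third.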
11 5-mer Search
- 6D search (rotation + translation) for representative structures in density map
- Compute similarity
  - Computed by Fourier convolution (Cowtan 2001)
- Use tuneset to convert similarity score to probability
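A sketch of the translational half of such a search, assuming a plain cross-correlation as the similarity score (the score actually used in the talk is not recoverable from the slides). One FFT-based pass scores every translation at once; the rotational half would repeat this for each rotated copy of the template. The function and variable names are illustrative.

```python
import numpy as np

def translation_scores(density, template):
    """Cross-correlation of template with density at every circular shift.

    scores[s] = sum over x of density[x + s] * template[x], with indices
    wrapping around the grid as they do in a periodic unit cell.
    """
    f_density = np.fft.fftn(density)
    f_template = np.fft.fftn(template)
    return np.real(np.fft.ifftn(f_density * np.conj(f_template)))

# Toy example: hide a small template in an empty 8x8x8 map at a known offset.
rng = np.random.default_rng(0)
template = np.zeros((8, 8, 8))
template[:2, :2, :2] = rng.random((2, 2, 2)) + 1.0
true_shift = (2, 3, 1)
density = np.roll(template, true_shift, axis=(0, 1, 2))

scores = translation_scores(density, template)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)
print(best_shift)
```

The highest score lands on the offset where the template was placed, and the whole score grid costs a few FFTs rather than one full sum per translation.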
12 Convert Scores to Probabilities
[Figure label: 5-mer representative]
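One simple way a tuning set could be used to turn scores into probabilities is to bin the tuning scores and record the fraction of true matches in each bin. The sketch below assumes that approach; the binning scheme, the smoothing, and all names are hypothetical and may differ from what ACMI does.

```python
import numpy as np

def fit_score_calibration(tune_scores, tune_is_match, n_bins=5):
    """Estimate P(match | score bin) from a labeled tuning set."""
    edges = np.linspace(tune_scores.min(), tune_scores.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(tune_scores, edges) - 1, 0, n_bins - 1)
    probs = np.zeros(n_bins)
    for b in range(n_bins):
        in_bin = bin_idx == b
        # Add-one smoothing keeps empty bins at 0.5 and avoids division by zero.
        probs[b] = (tune_is_match[in_bin].sum() + 1.0) / (in_bin.sum() + 2.0)
    return edges, probs

def score_to_probability(scores, edges, probs):
    """Look up the calibrated probability for each new score."""
    bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, len(probs) - 1)
    return probs[bin_idx]

# Toy tuning set: low scores are non-matches, high scores are matches.
tune_scores = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
tune_is_match = np.array([0, 0, 0, 1, 1, 1])
edges, probs = fit_score_calibration(tune_scores, tune_is_match, n_bins=2)
new_probs = score_to_probability(np.array([0.05, 0.95]), edges, probs)
print(new_probs)
```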
13 In This Talk
- Where we are now
  - For each amino acid in the protein, we have a probability distribution over the unit cell
- Where we are headed
  - Find the backbone layout that maximizes the probability of the layout given the density map
14 Pairwise Markov Field Models
- A type of undirected graphical model
- Represent joint probabilities as product of vertex and edge potentials
- Similar to (but more general than) Bayesian networks
[Figure: graph with label vertices u1, u2, u3 and evidence y]
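To make "product of vertex and edge potentials" concrete, here is a brute-force sketch for a tiny pairwise Markov field with discrete labels. The function name, the potentials, and the three-vertex chain are made up for illustration. Enumerating all labelings like this is only feasible for toy models.

```python
import itertools
import numpy as np

def joint_distribution(vertex_pots, edge_pots):
    """Normalized joint of a pairwise Markov field by full enumeration.

    vertex_pots: list of 1D arrays, one potential per vertex.
    edge_pots:   dict mapping (i, j) to a 2D array indexed [label_i, label_j].
    """
    sizes = [len(p) for p in vertex_pots]
    joint = np.zeros(sizes)
    for labels in itertools.product(*(range(s) for s in sizes)):
        value = 1.0
        for i, pot in enumerate(vertex_pots):
            value *= pot[labels[i]]              # vertex potentials
        for (i, j), pot in edge_pots.items():
            value *= pot[labels[i], labels[j]]   # edge potentials
        joint[labels] = value
    return joint / joint.sum()

# Three binary vertices in a chain; edges favor equal neighboring labels.
agree = np.array([[2.0, 1.0], [1.0, 2.0]])
vertex_pots = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.5, 0.5])]
joint = joint_distribution(vertex_pots, {(0, 1): agree, (1, 2): agree})
print(joint[0, 0, 0], joint[0, 1, 0])
```

The all-equal labeling (0, 0, 0) gets four times the probability of (0, 1, 0), because both edge potentials reward agreement.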
15 Protein Backbone Model
[Figure: chain of vertices ALA, GLY, LYS, LEU]
- Each vertex is an amino acid
- Each label is location + orientation
- Evidence y is the electron density map
- Each vertex (or observational) potential comes from the 5-mer matching
17 Protein Backbone Model
[Figure: chain of vertices ALA, GLY, LYS, LEU]
- Two types of structural (edge) potentials
  - Adjacency constraints ensure adjacent amino acids are 3.8Å apart and in the proper orientation
  - Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space
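The two kinds of structural potential can be sketched as simple functions of two Cα positions. Only the 3.8Å spacing comes from the slide. The Gaussian form, the `sigma` and `min_dist` values, and the function names are assumptions for illustration, and orientation is ignored here.

```python
import numpy as np

CA_CA_DISTANCE = 3.8  # angstroms between adjacent C-alphas (from the slide)

def adjacency_potential(pos_i, pos_j, sigma=0.5):
    """High when two sequence-adjacent residues are about 3.8 angstroms apart.

    sigma is a hypothetical tolerance; the orientation term is omitted.
    """
    d = np.linalg.norm(np.asarray(pos_i, dtype=float) - np.asarray(pos_j, dtype=float))
    return float(np.exp(-0.5 * ((d - CA_CA_DISTANCE) / sigma) ** 2))

def occupancy_potential(pos_i, pos_j, min_dist=3.0):
    """Zero when two nonadjacent residues overlap in space, one otherwise.

    min_dist is a hypothetical exclusion radius.
    """
    d = np.linalg.norm(np.asarray(pos_i, dtype=float) - np.asarray(pos_j, dtype=float))
    return 0.0 if d < min_dist else 1.0

print(adjacency_potential((0, 0, 0), (3.8, 0, 0)))
print(occupancy_potential((0, 0, 0), (1.0, 0, 0)))
```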
18 Backbone Model Potential
Constraints between adjacent amino acids
19 Backbone Model Potential
Constraints between nonadjacent amino acids
20 Backbone Model Potential
Observational (amino-acid-finder) probabilities
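The equation on these three slides did not survive extraction. Based on the three terms named above, the full potential plausibly has the following shape. The notation is assumed here rather than recovered: U = (u_1, ..., u_N) is the labeling of all N amino acids, y is the density map, φ_i is the observational potential, and ψ_adj and ψ_occ are the adjacency and occupancy potentials.

```latex
P(U \mid y) \;\propto\;
\prod_{i=1}^{N-1} \psi_{\mathrm{adj}}(u_i, u_{i+1})
\;\times \prod_{\substack{i < j \\ j - i > 1}} \psi_{\mathrm{occ}}(u_i, u_j)
\;\times \prod_{i=1}^{N} \phi_i(u_i \mid y)
```

The first product corresponds to the constraints between adjacent amino acids, the second to the constraints between nonadjacent amino acids, and the third to the amino-acid-finder probabilities.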
21 Probabilistic Inference
- Want to find the backbone layout that maximizes its probability given the density map
- Exact methods are intractable
- Use belief propagation (BP) to approximate marginal distributions
22 Belief Propagation (BP)
- Iterative, message-passing method (Pearl 1988)
- A message from amino acid i to amino acid j indicates where i expects to find j
- An approximation to the marginal (or belief) is given as the product of incoming messages
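A minimal sum-product sketch on a chain with discrete labels, to show what the messages and beliefs look like. It assumes a single shared edge potential and adjacency edges only, in which case BP is exact. In the usual formulation shown here, a belief is the vertex potential times the incoming messages. The full backbone model also has occupancy edges, which make the graph loopy and BP approximate. All names are illustrative.

```python
import numpy as np

def chain_bp(vertex_pots, edge_pot):
    """Sum-product belief propagation on a chain.

    vertex_pots: list of 1D arrays, one per vertex.
    edge_pot:    2D array indexed [label of vertex k, label of vertex k + 1].
    Returns a list of normalized beliefs (marginals), one per vertex.
    """
    n = len(vertex_pots)
    k = len(vertex_pots[0])
    fwd = [np.ones(k) for _ in range(n)]  # fwd[i]: message from i - 1 into i
    bwd = [np.ones(k) for _ in range(n)]  # bwd[i]: message from i + 1 into i
    for i in range(1, n):
        msg = edge_pot.T @ (vertex_pots[i - 1] * fwd[i - 1])
        fwd[i] = msg / msg.sum()          # normalize for numerical stability
    for i in range(n - 2, -1, -1):
        msg = edge_pot @ (vertex_pots[i + 1] * bwd[i + 1])
        bwd[i] = msg / msg.sum()
    beliefs = [vertex_pots[i] * fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]

# Toy chain: three vertices with four labels each and random potentials.
rng = np.random.default_rng(1)
phis = [rng.random(4) + 0.1 for _ in range(3)]
psi = rng.random((4, 4)) + 0.1
beliefs = chain_bp(phis, psi)
print(beliefs[1])
```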
23 Belief Propagation Example
[Figure: belief propagation between ALA and GLY]
24 Technical Challenges
- Representation of potentials
  - Store Fourier coefficients in Cartesian space
  - At each location x, store a single orientation r
- Speeding up O(N²X²) naïve implementation
  - X = the unit cell size (number of Fourier coefficients)
  - N = the number of residues in the protein
25 Speeding Up O(N²X²) Implementation
- O(X²) computation for each occupancy message
  - Each message must integrate over the unit cell
  - O(X log X) as multiplication in Fourier space
- O(N²) messages computed and stored
  - Approximate N-3 occupancy messages with a single message
  - O(N) messages using a message product accumulator
- Improved implementation: O(NX log X)
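One common way to realize a message product accumulator is to keep a running product of all messages and divide a single message back out when needed, so that "everything except j" costs O(NX) in total rather than O(N²X). The sketch below does this in log space. It is an assumption about the general idea, not a reconstruction of ACMI's code, and the names are hypothetical.

```python
import numpy as np

def products_excluding_each(messages, eps=1e-300):
    """For each row j, the elementwise product of all rows except row j.

    messages: array (N, X) of positive message values over X grid points.
    Uses one accumulated log-sum, so the cost is O(N X) rather than O(N^2 X).
    """
    log_msgs = np.log(np.maximum(messages, eps))  # guard against log(0)
    accumulator = log_msgs.sum(axis=0)            # log of the product of all rows
    return np.exp(accumulator - log_msgs)         # divide each row back out

# Toy check against the naive O(N^2 X) computation.
rng = np.random.default_rng(2)
messages = rng.random((5, 4)) + 0.5
fast = products_excluding_each(messages)
naive = np.array([np.prod(np.delete(messages, j, axis=0), axis=0) for j in range(5)])
print(np.allclose(fast, naive))
```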
26 1XMT at 3Å Resolution
[Figure: structure colored by prob(AA at location); color scale from HIGH (0.82) to LOW (0.17)]
1.12Å RMSd, 100% coverage
27 1VMO at 4Å Resolution
[Figure: structure colored by prob(AA at location); color scale from HIGH (0.25) to LOW (0.02)]
3.63Å RMSd, 72% coverage
28 1YDH at 3.5Å Resolution
[Figure: structure colored by prob(AA at location); color scale from HIGH (0.27) to LOW (0.02)]
1.47Å RMSd, 90% coverage
29 Experiments
- Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve
- Used ten model-phased maps
- Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, 4.0 Å resolution maps
30 RMS Deviation
[Figure: Cα RMS deviation vs. density map resolution for ACMI, TEXTAL, and Resolve]
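For reference, a sketch of how a Cα RMS deviation and a coverage figure could be computed from a predicted trace and a reference structure. The exact definitions used in the talk are not given on the slides, so the ones below (RMSd over residues present in both, coverage as the fraction of reference residues that were traced) are assumptions, as are the names.

```python
import numpy as np

def ca_rmsd_and_coverage(predicted, reference):
    """C-alpha RMS deviation and coverage of a predicted trace.

    predicted: dict residue index -> (x, y, z) for traced residues.
    reference: dict residue index -> (x, y, z) for all residues.
    Both are assumed to be in the same coordinate frame.
    """
    common = sorted(set(predicted) & set(reference))
    if not common:
        return float("nan"), 0.0
    pred = np.array([predicted[r] for r in common], dtype=float)
    ref = np.array([reference[r] for r in common], dtype=float)
    rmsd = np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1)))
    coverage = len(common) / len(reference)
    return float(rmsd), coverage

# Toy example: three of four residues traced, each 1 angstrom off.
reference = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
predicted = {1: (0.0, 0.0, 1.0), 2: (3.8, 1.0, 0.0), 3: (7.6, 0.0, -1.0)}
rmsd, coverage = ca_rmsd_and_coverage(predicted, reference)
print(rmsd, coverage)
```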
31 Model Completeness
[Figure: chain traced and residues identified vs. density map resolution for ACMI, TEXTAL, and Resolve]
32 Per-protein RMS Deviation
[Figure: per-protein comparison of ACMI RMS error against TEXTAL RMS error and Resolve RMS error]
33 Conclusions
- ACMI effectively combines weakly-matching templates to construct a full model
- Produces an accurate trace even with poor-quality density map data
- Reduces computational complexity from O(N²X²) to O(NX log X)
- Inference possible for even large unit cells
34 Future Work
- Improve amino-acid-finding algorithm
- Incorporate sidechain placement / refinement
- Manage missing data
  - Disordered regions
  - Only exterior visible (e.g., in CryoEM)
35 Acknowledgements
- Ameet Soni
- Craig Bingman
- NLM grants 1R01 LM008796 and 1T15 LM007359