A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps

1
A Probabilistic Approach to Protein Backbone
Tracing in Electron Density Maps
  • Frank DiMaio, Jude Shavlik
  • Computer Sciences Department
  • George Phillips
  • Biochemistry Department
  • University of Wisconsin-Madison
  • USA

Presented at the Fourteenth Conference on
Intelligent Systems for Molecular Biology (ISMB
2006), Fortaleza, Brazil, August 7, 2006
2
X-ray Crystallography
[Figure: X-ray beam → protein crystal → collection plate → FFT → electron density map (3D picture)]
3
Given: Sequence + Density Map
[Figure: the protein sequence and the electron density map]
4
Find: Each Atom's Coordinates
5
Our Subtask: Backbone Trace
[Figure: backbone trace through consecutive Cα positions]
6
The Unit Cell
  • 3D density function ρ(x,y,z) provided over unit
    cell
  • Unit cell may contain multiple copies of the
    protein

8
Density Map Resolution
[Figure: resolution ranges handled by existing methods, ARP/wARP (Perrakis et al. 1997), TEXTAL (Ioerger et al. 1999), and Resolve (Terwilliger 2002), with one range marked "Our focus"]
9
Overview of ACMI (our method)
  • Local Match
  • Algorithm searches for sequence-specific 5-mers
    centered at each amino acid
  • Many false positives
  • Global Consistency
  • Use probabilistic model to filter false positives
  • Find most probable backbone trace

10
5-mer Lookup and Cluster
  • VKH V LVSPEKIEELIKGY

[Figure: matching 5-mers looked up in the PDB and grouped into Cluster 1 and Cluster 2, with weights wt = 0.67 and wt = 0.33]
NOTE: can be done in a precompute step
11
5-mer Search
  • 6D search (rotation + translation)
    for representative structures in density map
  • Compute similarity
  • Computed by Fourier convolution (Cowtan 2001)
  • Use a tuning set to convert similarity score to
    probability
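
The translational part of such a search can be illustrated in one dimension. What follows is a minimal sketch, assuming a periodic 1D "density" and a short template. The function names and toy data are illustrative and are not taken from ACMI or from Cowtan (2001); the real search runs over 3D maps and over rotations as well.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def cross_correlate(density, template):
    """Circular cross-correlation score for every shift, in O(X log X)."""
    n = len(density)
    padded = list(template) + [0.0] * (n - len(template))
    f_density = fft(density)
    f_template = fft(padded)
    # Correlation in real space is a pointwise product in Fourier space.
    spectrum = [d * t.conjugate() for d, t in zip(f_density, f_template)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [(v / n).real for v in fft(spectrum, inverse=True)]

def cross_correlate_naive(density, template):
    """Same scores computed directly, in O(X * len(template))."""
    n = len(density)
    return [sum(t * density[(s + j) % n] for j, t in enumerate(template))
            for s in range(n)]

# Toy data: the template shape sits at offset 2 in the density.
density = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
template = [1.0, 3.0, 1.0]
scores = cross_correlate(density, template)
best_shift = max(range(len(scores)), key=scores.__getitem__)
```

Here `best_shift` is the translation at which the template overlaps the density best. The same pointwise-product idea extends to 3D grids, where the savings over a direct sum are much larger.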

12
Convert Scores to Probabilities
5-mer representative
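
The transcript does not preserve how the tuning set is used. One common approach, shown here purely as an assumption, is to bin the similarity scores of labeled tuning examples and take the fraction of true matches in each bin as P(match | score). All names and numbers below are hypothetical.

```python
def fit_score_to_probability(tune_scores, tune_is_match, n_bins=5):
    """Estimate P(match | score) per score bin from a labeled tuning set."""
    lo, hi = min(tune_scores), max(tune_scores)
    width = (hi - lo) / n_bins or 1.0
    hits = [0] * n_bins
    totals = [0] * n_bins

    def bin_of(score):
        # Clamp so out-of-range scores fall into the first or last bin.
        return min(n_bins - 1, max(0, int((score - lo) / width)))

    for score, is_match in zip(tune_scores, tune_is_match):
        b = bin_of(score)
        totals[b] += 1
        hits[b] += int(is_match)

    # Laplace smoothing keeps empty or pure bins away from exactly 0 or 1.
    table = [(h + 1) / (t + 2) for h, t in zip(hits, totals)]
    return lambda score: table[bin_of(score)]

# Hypothetical tuning data: higher scores are more often true matches.
tune_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
tune_is_match = [False, False, False, True, False, True, True, True]
to_probability = fit_score_to_probability(tune_scores, tune_is_match, n_bins=2)
p_low = to_probability(0.15)
p_high = to_probability(0.85)
```

With this toy data a score in the upper bin maps to a higher probability than a score in the lower bin, which is the behavior the conversion step needs.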
13
In This Talk
  • Where we are now
  • For each amino acid in the protein, we have a
    probability distribution over the unit cell
  • Where we are headed
  • Find the most probable backbone layout

14
Pairwise Markov Field Models
  • A type of undirected graphical model
  • Represent joint probabilities as product of
    vertex and edge potentials
  • Similar to (but more general than) Bayesian
    networks
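
Written out, the factorization in the second bullet takes the following general form. The symbols are assumed notation rather than the slides' own: u_i is the label of vertex i, y is the evidence, φ_i is a vertex potential, ψ_ij is an edge potential, E is the edge set, and Z is a normalizing constant.

```latex
P(u_1, \dots, u_N \mid y) \;=\; \frac{1}{Z} \prod_{i=1}^{N} \phi_i(u_i, y) \prod_{(i,j) \in E} \psi_{ij}(u_i, u_j)
```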

[Figure: graph with label vertices u1, u2, u3 and evidence y]
15
Protein Backbone Model
[Figure: graph with one vertex per amino acid: ALA, GLY, LYS, LEU]
  • Each vertex is an amino acid
  • Each label is location +
    orientation
  • Evidence y is the electron density map
  • Each vertex (or observational) potential
    comes from the 5-mer matching

17
Protein Backbone Model
[Figure: graph with one vertex per amino acid: ALA, GLY, LYS, LEU]
  • Two types of structural (edge) potentials
  • Adjacency constraints ensure adjacent amino acids
    are 3.8Å apart and in the proper orientation
  • Occupancy constraints ensure nonadjacent amino
    acids do not occupy same 3D space
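
The two kinds of edge potential can be sketched as simple functions of two residue positions. This is an illustration under stated assumptions, not ACMI's actual potentials: the Gaussian form, `SIGMA`, and `MIN_SEPARATION` are hypothetical, and the orientation term is omitted. Only the 3.8 Å spacing comes from the slide.

```python
import math

CA_CA_DISTANCE = 3.8   # angstroms, from the slide
SIGMA = 0.3            # assumed tolerance in angstroms (hypothetical)
MIN_SEPARATION = 3.0   # assumed clash threshold in angstroms (hypothetical)

def adjacency_potential(pos_i, pos_j):
    """High when sequence-adjacent residues are about 3.8 A apart."""
    d = math.dist(pos_i, pos_j)
    return math.exp(-((d - CA_CA_DISTANCE) ** 2) / (2 * SIGMA ** 2))

def occupancy_potential(pos_i, pos_j):
    """Zero when two nonadjacent residues would occupy the same space."""
    return 0.0 if math.dist(pos_i, pos_j) < MIN_SEPARATION else 1.0

def edge_potential(i, j, pos_i, pos_j):
    """Pick the potential type from the sequence separation of i and j."""
    if abs(i - j) == 1:
        return adjacency_potential(pos_i, pos_j)
    return occupancy_potential(pos_i, pos_j)

origin = (0.0, 0.0, 0.0)
good_neighbor = edge_potential(4, 5, origin, (3.8, 0.0, 0.0))
bad_neighbor = edge_potential(4, 5, origin, (6.0, 0.0, 0.0))
clash = edge_potential(4, 9, origin, (1.0, 0.0, 0.0))
no_clash = edge_potential(4, 9, origin, (10.0, 0.0, 0.0))
```

Every pair of residues gets exactly one of the two potentials, which is why the graph is fully connected rather than a simple chain.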

18
Backbone Model Potential
Constraints between adjacent amino acids
19
Backbone Model Potential
Constraints between nonadjacent amino acids
20
Backbone Model Potential
Observational (amino-acid-finder) probabilities
21
Probabilistic Inference
  • Want to find backbone layout that maximizes the joint probability (the product of the vertex and edge potentials)
  • Exact methods are intractable
  • Use belief propagation (BP) to approximate
    marginal distributions

22
Belief Propagation (BP)
  • Iterative, message-passing method (Pearl 1988)
  • A message from amino acid i to amino
    acid j indicates where i expects to find j
  • An approximation to the marginal (or belief)
    is given as the product of incoming messages
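
The message-passing scheme can be illustrated on a toy problem. The sketch below is a simplification, not ACMI's implementation: it uses a three-residue chain with four discrete positions and only adjacency-style edges, so BP is exact and can be checked against brute-force marginals. In the full model the occupancy edges create loops, and BP gives only an approximation. All names and numbers are hypothetical.

```python
from itertools import product

def belief_propagation(vertex_pot, edge_pot, n_iters=10):
    """Sum-product BP on a chain; returns one normalized belief per vertex."""
    n, k = len(vertex_pot), len(vertex_pot[0])
    neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    # msg[(i, j)][b]: how strongly vertex i expects to find vertex j at label b.
    msg = {(i, j): [1.0] * k for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msg = {}
        for (i, j) in msg:
            out = []
            for b in range(k):
                total = 0.0
                for a in range(k):
                    incoming = 1.0
                    for h in neighbors[i]:
                        if h != j:  # exclude the message that came from j
                            incoming *= msg[(h, i)][a]
                    total += vertex_pot[i][a] * edge_pot(a, b) * incoming
                out.append(total)
            z = sum(out)
            new_msg[(i, j)] = [v / z for v in out]
        msg = new_msg
    beliefs = []
    for i in range(n):
        belief = list(vertex_pot[i])
        for h in neighbors[i]:
            belief = [x * m for x, m in zip(belief, msg[(h, i)])]
        z = sum(belief)
        beliefs.append([x / z for x in belief])
    return beliefs

def brute_force_marginals(vertex_pot, edge_pot):
    """Exact marginals by enumerating every joint labeling (tiny problems only)."""
    n, k = len(vertex_pot), len(vertex_pot[0])
    marg = [[0.0] * k for _ in range(n)]
    for labels in product(range(k), repeat=n):
        p = 1.0
        for i, a in enumerate(labels):
            p *= vertex_pot[i][a]
        for i in range(n - 1):
            p *= edge_pot(labels[i], labels[i + 1])
        for i, a in enumerate(labels):
            marg[i][a] += p
    return [[v / sum(row) for v in row] for row in marg]

# Labels are positions 0..3 on a line; neighbors prefer to be one step apart.
def toy_edge(a, b):
    return 1.0 if abs(a - b) == 1 else 0.05

toy_vertex = [[0.7, 0.1, 0.1, 0.1], [0.25] * 4, [0.1, 0.1, 0.7, 0.1]]
beliefs = belief_propagation(toy_vertex, toy_edge)
exact = brute_force_marginals(toy_vertex, toy_edge)
```

The middle residue has no evidence of its own, yet its belief peaks at position 1 because both neighbors "expect" it there.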

23
Belief Propagation Example
[Figure: belief propagation example on the ALA and GLY vertices]
24
Technical Challenges
  • Representation of potentials
  • Store Fourier coefficients in Cartesian space
  • At each location x, store a single orientation r
  • Speeding up O(N²X²) naïve implementation
  • X = the unit cell size (number of Fourier coefficients)
  • N = the number of residues in the protein

25
Speeding Up O(N²X²) Implementation
  • O(X²) computation for each occupancy message
  • Each message must integrate over the unit cell
  • O(X log X) as multiplication in Fourier space
  • O(N²) messages computed and stored
  • Approximate the N-3 occupancy messages with a single
    message
  • O(N) messages using a message product accumulator
  • Improved implementation: O(NX log X)
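
One plausible reading of the accumulator bullet, offered as an assumption rather than as ACMI's code: if each residue sends a single shared occupancy message, the product of all of them can be accumulated once, and each residue's incoming product is recovered by dividing out the messages from itself and its two sequence neighbors. The sketch works in log space, and all names are hypothetical.

```python
import math

def incoming_occupancy_products(out_msgs):
    """For each residue j, multiply the messages of all residues except j-1, j, j+1.

    out_msgs[i][x] is the single shared occupancy message sent by residue i,
    evaluated at grid cell x. Values must be strictly positive.
    """
    n, x_cells = len(out_msgs), len(out_msgs[0])
    logs = [[math.log(v) for v in m] for m in out_msgs]
    # Accumulator: sum of all log-messages, built once in O(N X).
    acc = [sum(logs[i][x] for i in range(n)) for x in range(x_cells)]
    result = []
    for j in range(n):
        excluded = [i for i in (j - 1, j, j + 1) if 0 <= i < n]
        result.append([
            math.exp(acc[x] - sum(logs[i][x] for i in excluded))
            for x in range(x_cells)
        ])
    return result

def incoming_occupancy_products_naive(out_msgs):
    """Same products computed pair by pair, in O(N^2 X)."""
    n, x_cells = len(out_msgs), len(out_msgs[0])
    result = []
    for j in range(n):
        prod = [1.0] * x_cells
        for i in range(n):
            if abs(i - j) > 1:
                prod = [p * v for p, v in zip(prod, out_msgs[i])]
        result.append(prod)
    return result

# Five residues, three grid cells (toy values).
toy_msgs = [
    [0.9, 0.5, 0.2],
    [0.3, 0.8, 0.6],
    [0.7, 0.4, 0.9],
    [0.5, 0.5, 0.5],
    [0.2, 0.9, 0.7],
]
fast = incoming_occupancy_products(toy_msgs)
slow = incoming_occupancy_products_naive(toy_msgs)
```

Both functions return the same products; the accumulator version touches each message a constant number of times per residue instead of once per residue pair.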

26
1XMT at 3Å Resolution
[Figure: predicted trace colored by prob(AA at location), scale from HIGH (0.82) to LOW (0.17)]
1.12Å RMSd, 100% coverage
27
1VMO at 4Å Resolution
[Figure: predicted trace colored by prob(AA at location), scale from HIGH (0.25) to LOW (0.02)]
3.63Å RMSd, 72% coverage
28
1YDH at 3.5Å Resolution
[Figure: predicted trace colored by prob(AA at location), scale from HIGH (0.27) to LOW (0.02)]
1.47Å RMSd, 90% coverage
29
Experiments
  • Tested ACMI against other map interpretation
    algorithms TEXTAL and Resolve
  • Used ten model-phased maps
  • Smoothly diminished reflection intensities, yielding
    2.5, 3.0, 3.5, 4.0 Å resolution maps

30
RMS Deviation
[Plot: Cα RMS deviation vs. density map resolution for ACMI, TEXTAL, and Resolve]
31
Model Completeness
[Plot: % chain traced and % residues identified vs. density map resolution for ACMI, TEXTAL, and Resolve]
32
Per-protein RMS Deviation
[Plot: per-protein ACMI RMS error compared with TEXTAL RMS error and Resolve RMS error]
33
Conclusions
  • ACMI effectively combines weakly-matching
    templates to construct a full model
  • Produces an accurate trace even with
    poor-quality density map data
  • Reduces computational complexity from O(N²X²)
    to O(NX log X)
  • Inference possible for even large unit cells

34
Future Work
  • Improve amino-acid-finding algorithm
  • Incorporate sidechain placement / refinement
  • Manage missing data
  • Disordered regions
  • Only exterior visible (e.g., in CryoEM)

35
Acknowledgements
  • Ameet Soni
  • Craig Bingman
  • NLM grants 1R01 LM008796 and 1T15 LM007359