NMR Peak Assignment : Better Algorithm - PowerPoint PPT Presentation

Title: NMR Peak Assignment : Better Algorithm

1
NMR Peak Assignment Better Algorithm

Frederick Vizeacoumar
Tommy Chu

2
Agenda

Introduction
Problem Definition
Previous Works
Input Parameters
Design
Results
HSQC and Survey on Dipolar Coupling
Conclusion

4
Introduction

Nuclear Magnetic Resonance (NMR)

5
Introduction

Nuclear Magnetic Resonance (NMR)
A strong magnetic field is used to align the spins of nuclei (isotopes).
Applied radiation of the right frequency flips these spins.
When this spin transition occurs, the nuclei are said to be in resonance with the applied radiation.

6
NMR measurement

Chemical Shift
Measured in parts per million (ppm).
Electrons in the molecule have small magnetic fields.
When a strong magnetic field is applied, the electrons tend to oppose the applied field.
NMR Spectrum
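
As a worked illustration of the ppm scale, the sketch below assumes the standard definition of chemical shift: the offset of a resonance from a reference frequency, divided by the spectrometer frequency. The function name and the numbers are illustrative and do not come from the slides.

```python
def chemical_shift_ppm(sample_hz, reference_hz, spectrometer_mhz):
    """Return the chemical shift in parts per million (ppm).

    Dividing a frequency offset in Hz by the spectrometer frequency
    in MHz gives parts per million directly.
    """
    return (sample_hz - reference_hz) / spectrometer_mhz


# Hypothetical example: a peak 1000 Hz from the reference on a 500 MHz instrument.
shift = chemical_shift_ppm(1000.0, 0.0, 500.0)
```

Because the offset is normalized by the spectrometer frequency, the same nucleus reports the same ppm value on instruments of different field strength.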

7
NMR spectroscopy

Used to study the physical, chemical, and biological properties of molecules.
Problem
The sequence has been identified.
The complete structure is unknown.
The basic structure is known.
The structure corresponding to each amino acid (AA) is unknown.

8
Procedure to determine protein structure using NMR

Three steps
Data generation
Involves relating resonance peaks to AAs and forming spin systems.
Data interpretation
Involves matching spin systems to amino acids, providing inter- and intra-AA distances and angles.
NMR structure calculation
Involves structure determination using Molecular Dynamics (MD) and Energy Minimization (EM).

10
Project -- Data Interpretation

Primary Goal
(Better) Automated Peak Assignment.
Steps in data interpretation
Map resonance peaks from different NMR spectra to the same residue.
Identify adjacency relationship.
Assign the segments to the protein sequence.
Problem with existing Algorithms
Low accuracy.
High time complexity.

11
Peak Assignment

The assignment procedure needs to use two crucial pieces of information
Different AA types have different distributions of spin systems.
The adjacency information (scalar coupling) between spin systems is obtained by identifying their common resonance frequencies.
Enhancement techniques
HSQC
Dipolar coupling

13
Previous Works

[1] formulated the NMR assignment problem as a Constrained Bipartite Matching (CBM) problem.
[1] proposed a naïve (two-layer) algorithm.
[1, 6] proved that D-string CBM is NP-hard.
[6] proposed two approximation algorithms.
[5] applied branch-and-bound techniques.
[9] attempted to solve CBM using extensive search techniques from artificial intelligence.

14
Bipartite Matching

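$G = (U \cup V, E)$
$U$ is the sequence of AAs
$V$ is the set of spin systems
$U \cap V = \emptyset$
A matching is a subgraph $(U \cup V, M) \subseteq G$ in which no two edges $m \in M$ share a vertex.

[Figure: bipartite graph with protein sequence nodes A, G, C, T and spin system nodes 3, 2, 1, 4]
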
16
Bipartite Matching

Perfect Matching
Every node must be covered by an edge of the matching.
Weighted Bipartite Matching
Each edge has an associated weight.
Maximum (Perfect) Weighted Bipartite Matching
Maximize the total weight over all edges $m \in M$.

[Figure: bipartite graph with protein sequence nodes A, G, C, T and spin system nodes 3, 2, 1, 4]
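
To make the maximum weighted perfect matching concrete, here is a minimal brute-force sketch that tries every one-to-one pairing of sequence positions and spin systems. The weight matrix is a hypothetical example, and exhaustive enumeration is only practical for tiny inputs; it is not the method used in the cited work.

```python
from itertools import permutations


def max_weight_perfect_matching(weights):
    """Brute-force maximum weighted perfect matching.

    weights[i][j] is the weight of the edge between sequence position i
    and spin system j; the matrix is assumed to be square.
    """
    n = len(weights)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the spin system matched to sequence position i.
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, list(best_perm)


# Hypothetical 4 x 4 weight matrix (rows: sequence positions, columns: spin systems).
weights = [
    [1, 5, 2, 4],
    [2, 1, 4, 0],
    [1, 2, 3, 4],
    [2, 4, 1, 3],
]
score, matching = max_weight_perfect_matching(weights)
```
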
17
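Constrained Bipartite Matching

For every segment in $V$:
if $(u_i, v_j) \in M$, with $u_i \in U$ and $v_j \in V$,
then $(u_{i+1}, v_{j+1}) \in M$, where $v_{j+1}$ follows $v_j$ in the same segment.
D-string CBM
The version of the problem in which the maximum segment size is $D$.

[Figure: bipartite graph with protein sequence nodes A, G, C, T and spin system nodes 3, 2, 1, 4]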
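
The segment constraint can be checked mechanically. The sketch below is one possible reading of it: a segment is either left unassigned or mapped, in order, to consecutive sequence positions, and no position is used twice. The data layout (a dict from spin system id to sequence position) is an assumption made for illustration.

```python
def is_feasible(matching, segments):
    """Check a matching against the segment (adjacency) constraint.

    matching: dict mapping spin system id -> sequence position.
    segments: list of lists of spin system ids, each list in segment order.
    """
    positions = list(matching.values())
    if len(positions) != len(set(positions)):
        return False  # two spin systems share a sequence position
    for segment in segments:
        matched = [s for s in segment if s in matching]
        if not matched:
            continue  # an unassigned segment violates nothing
        if len(matched) != len(segment):
            return False  # a segment must be assigned as a whole
        start = matching[segment[0]]
        if any(matching[s] != start + k for k, s in enumerate(segment)):
            return False  # members must land on consecutive positions
    return True


# Hypothetical example: spin systems 1 and 2 form a segment, 3 stands alone.
segments = [[1, 2], [3]]
```
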
18
Two Layered Algorithm [1]

First layer
Filter out unlikely assignments for long strings.
Second layer
Try all possible combinations of assignments for long strings and return the one with the maximum score.

19
Approximation Algorithms 2

2-Approximation Algorithm
Let M be an optimal matching on G.
Every edge corresponding to a vertex conflicts with at most 2 edges in M.
3 log D Approximation Algorithm
If the longest string in V is at most four times as long as the shortest string,
then the greedy algorithm finds a solution whose weight is at least 1/6 of the optimal.

20
Branch-and-bound algorithm

A systematic method for solving optimization problems.
Construct a search tree and apply a carefully selected criterion to determine which node to expand next.
Exponential time in the worst case.
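
A minimal sketch of the branch-and-bound idea, applied to plain weighted matching rather than to the full constrained problem: each level of the search tree fixes one sequence position, and a branch is pruned when an optimistic bound cannot beat the best score found so far. The bound (sum of the remaining row maxima) and the weight matrix are assumptions chosen for illustration, not the criterion used in [5].

```python
def branch_and_bound(weights):
    """Maximum-weight one-to-one assignment by depth-first branch and bound."""
    n = len(weights)
    # suffix[i] is an optimistic bound on what rows i..n-1 can still add.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(weights[i])
    best = {"score": float("-inf"), "assignment": None}

    def expand(row, used, score, assignment):
        if row == n:
            if score > best["score"]:
                best["score"], best["assignment"] = score, list(assignment)
            return
        if score + suffix[row] <= best["score"]:
            return  # prune: this subtree cannot improve on the incumbent
        # Try the heaviest edges first so good incumbents are found early.
        for col in sorted(range(n), key=lambda c: weights[row][c], reverse=True):
            if col not in used:
                used.add(col)
                assignment.append(col)
                expand(row + 1, used, score + weights[row][col], assignment)
                assignment.pop()
                used.remove(col)

    expand(0, set(), 0, [])
    return best["score"], best["assignment"]


# Hypothetical weights (rows: sequence positions, columns: spin systems).
weights = [
    [1, 5, 2, 4],
    [2, 1, 4, 0],
    [1, 2, 3, 4],
    [2, 4, 1, 3],
]
score, assignment = branch_and_bound(weights)
```

Pruning reduces the work on typical inputs, but the worst case still visits an exponential number of nodes.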

21
AUTOASSIGN/AUTOPEAK

5 Major Searches
Make strongest matches
Allow degenerate shifts
Extend assigned segments
Match weaker spin systems
Finish Assignments
Claims 98% accuracy.
Tested only on single-stranded RNA sequences.
Infeasible to compute on protein sequences.

23
Input Parameters

Protein Sequence
A sequence of (position, AA) pairs
Spin Systems
Segments of chemical shifts, separated by commas
Score Scheme
A table storing the score for each combination of AA and chemical shift range

24
Example input for protein sequence

1 GLY
2 SER
3 VAL
4 GLU
5 GLN
6 ILE
7 SER
8 GLY

26
Example input for spin system (three segments)

12.5
13.5 ,
5.35
6.4
7.21 ,
16.1
17.2
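
A minimal sketch of how this input could be read, assuming that whitespace separates the chemical shifts inside a segment and that a comma closes a segment. The function name is hypothetical.

```python
def parse_spin_systems(text):
    """Split the spin system input into segments of chemical shifts."""
    segments = []
    for chunk in text.split(","):
        shifts = [float(token) for token in chunk.split()]
        if shifts:  # ignore empty chunks, e.g. after a trailing comma
            segments.append(shifts)
    return segments


example = """12.5
13.5 ,
5.35
6.4
7.21 ,
16.1
17.2"""
segments = parse_spin_systems(example)
```
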

28
Example input for score scheme

5
GLY
SER
VAL
GLU
GLN
4
0 5
5 10
10 15
15 20

1 5 2 4
2 1 4 0
1 2 3 4
2 4 1 3
2 3 3 6
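
The layout above can be read as: the number of AAs, their names, the number of chemical shift ranges, the ranges themselves, and then one row of scores per AA. The sketch below parses it under that reading. It also assumes half-open ranges (lower bound included, upper bound excluded) and a score of 0 for shifts outside every range; neither detail is stated on the slide.

```python
def parse_score_scheme(text):
    """Parse the score scheme into (ranges, table)."""
    tokens = text.split()
    pos = 0
    n_aa = int(tokens[pos]); pos += 1
    names = tokens[pos:pos + n_aa]; pos += n_aa
    n_ranges = int(tokens[pos]); pos += 1
    ranges = []
    for _ in range(n_ranges):
        ranges.append((float(tokens[pos]), float(tokens[pos + 1])))
        pos += 2
    table = {}
    for name in names:  # one row of scores per AA, in the order listed
        table[name] = [float(t) for t in tokens[pos:pos + n_ranges]]
        pos += n_ranges
    return ranges, table


def score(ranges, table, aa, shift):
    """Look up the score of assigning a chemical shift to an AA type."""
    for k, (low, high) in enumerate(ranges):
        if low <= shift < high:  # assumed half-open interval
            return table[aa][k]
    return 0.0  # assumed default outside all ranges


example = """5
GLY SER VAL GLU GLN
4
0 5
5 10
10 15
15 20
1 5 2 4
2 1 4 0
1 2 3 4
2 4 1 3
2 3 3 6"""
ranges, table = parse_score_scheme(example)
```

With this reading, a shift of 12.5 falls in the third range (10 to 15), so `score(ranges, table, "GLY", 12.5)` returns the third entry of the GLY row.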

30
Implemented Design

CBM two-layer approximation algorithm
As mentioned earlier, we cast the peak assignment problem as a bipartite graph matching problem with some constraints.
The key idea is that multi-dimensional NMR spectra contain inter-residue peaks that convey the connectivity information between residues.
We match the inter- and intra-residue peaks by their chemical shifts, which makes the connectivity information straightforward to read off.

31
CBM Two layer Approximation

layerOne(sequence, spin, score, threshold)
    for every segment Ui in spin do
        for every position Vj in the sequence do
            if score(Ui, Vj) > threshold then
                mark Vj as a possible assignment position for Ui
layerTwo(sequence, spin, score)
    smax = $-\infty$
    for every legal combination of possible assignments from layerOne do
        calculate its score and call it si
        if si > smax then
            smax = si and store the current positions as the final assignment
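
The two layers can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a segment placed at a start position is scored as the sum of per-residue scores, a combination is "legal" when no two segments overlap on the sequence, and the toy score table and threshold are hypothetical.

```python
from itertools import product


def segment_score(sequence, segment, start, score):
    """Score of placing a segment so that its first shift lands on `start`."""
    return sum(score(sequence[start + k], shift) for k, shift in enumerate(segment))


def layer_one(sequence, segments, score, threshold):
    """Keep, for each segment, the start positions scoring above the threshold."""
    candidates = []
    for segment in segments:
        starts = range(len(sequence) - len(segment) + 1)
        candidates.append([p for p in starts
                           if segment_score(sequence, segment, p, score) > threshold])
    return candidates


def is_legal(segments, starts):
    """A combination is legal when no sequence position is used twice."""
    used = set()
    for segment, start in zip(segments, starts):
        span = set(range(start, start + len(segment)))
        if used & span:
            return False
        used |= span
    return True


def layer_two(sequence, segments, score, candidates):
    """Try every legal combination of candidates and keep the best one."""
    s_max, best = float("-inf"), None
    for starts in product(*candidates):
        if not is_legal(segments, starts):
            continue
        s_i = sum(segment_score(sequence, seg, p, score)
                  for seg, p in zip(segments, starts))
        if s_i > s_max:
            s_max, best = s_i, starts
    return s_max, best


# Hypothetical toy data: shift ranges of width 5, one table row per AA.
TABLE = {"GLY": [1, 5, 2, 4], "SER": [2, 1, 4, 0], "VAL": [1, 2, 3, 4],
         "GLU": [2, 4, 1, 3], "GLN": [2, 3, 3, 6]}
toy_score = lambda aa, shift: TABLE[aa][int(shift // 5)]
sequence = ["GLY", "SER", "VAL", "GLU", "GLN"]
segments = [[12.5, 13.5], [5.35, 6.4, 7.21]]

candidates = layer_one(sequence, segments, toy_score, threshold=5)
result = layer_two(sequence, segments, toy_score, candidates)
```

The second layer is exhaustive over the surviving candidates, so its cost depends directly on how many positions the threshold lets through.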

32
2-Approximation Algorithm

The key idea is to form a weighted bipartite graph and use it to find the matching.
We use some restrictions and constraints to identify the leading innermost edge.
We find an innermost edge, collect it together with all its conflicting edges, take the minimum weight in this set, subtract it from the edges in the set, and remove the edges whose weight becomes 0.
The same process is then applied recursively to the resulting graph to obtain a feasible matching.

33
3 Log D Approximation

The idea is that if there are m different segments in the spin system, we group these m segments into overlapping groups according to a formula on their lengths, find the score for each group, and take the assignment with the maximum final score as the feasible solution.
This also uses a weighted bipartite graph.
As explained before, this problem includes further constraints to ensure that there is no overlap between segments.

34
3 Log D Approximation

3 log D Approximation(Score U, Spin V)
    $r = l_{\max} / l_{\min}$, where $l_{\max}$ and $l_{\min}$ are the maximum and minimum lengths of the strings in V
    group V into $g = \max(2, \log_4 r)$ subsets $V_i$ such that a string $s$ belongs to $V_i$ if $4^{i-1} \le |s| / l_{\min} \le 4^i$
    for every $i \in \{1, 2, \ldots, g\}$
        compute the set $E_i$ of edges of G incident to strings in $V_i$
        initialize $M_i = \emptyset$
        while ($E_i \ne \emptyset$)
            find an edge $e \in E_i$ of maximum weight
            add e to $M_i$ and delete e and all edges conflicting with e from $E_i$
        greedily extend $M_i$ to a maximal feasible matching of G
    output the heaviest one among the $M_i$
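
The procedure can be sketched in Python as below. The representation is an assumption made for illustration: an edge is a tuple (segment index, start position, weight), two edges conflict when they belong to the same segment or overlap on the sequence, and a string whose length ratio sits exactly on a boundary $4^i$ is placed in group $i$ only.

```python
def group_by_length(lengths):
    """Group segment indices so that lengths in a group differ by at most 4x."""
    l_min = min(lengths)
    groups = {}
    for seg, length in enumerate(lengths):
        i = 1
        while length > (4 ** i) * l_min:  # smallest i with |s| / l_min <= 4^i
            i += 1
        groups.setdefault(i, []).append(seg)
    return groups


def conflicts(e, f, lengths):
    """Edges conflict if they use the same segment or overlapping positions."""
    (seg_e, start_e, _), (seg_f, start_f, _) = e, f
    if seg_e == seg_f:
        return True
    return start_e < start_f + lengths[seg_f] and start_f < start_e + lengths[seg_e]


def greedy(edges, lengths, chosen=()):
    """Add edges in decreasing weight order, skipping conflicting ones."""
    matching = list(chosen)
    for e in sorted(edges, key=lambda edge: edge[2], reverse=True):
        if not any(conflicts(e, c, lengths) for c in matching):
            matching.append(e)
    return matching


def three_log_d(edges, lengths):
    """Run the greedy step per length group and return the heaviest matching."""
    best = []
    for members in group_by_length(lengths).values():
        m_i = greedy([e for e in edges if e[0] in members], lengths)
        m_i = greedy(edges, lengths, m_i)  # extend to a maximal matching of G
        if sum(e[2] for e in m_i) > sum(e[2] for e in best):
            best = m_i
    return best


# Hypothetical instance: three segments of lengths 1, 2 and 5.
lengths = [1, 2, 5]
edges = [(0, 0, 3), (0, 7, 2), (1, 0, 4), (1, 5, 5), (2, 0, 9), (2, 3, 6)]
best = three_log_d(edges, lengths)
```

Sorting once by weight and skipping conflicting edges is equivalent to repeatedly picking the heaviest remaining edge and deleting everything that conflicts with it.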

35
Improved Two layer Algorithm

The threshold in the CBM algorithm is good at eliminating unlikely assignments, but its value has to be determined by trial and error.
[6] considers the 3 log D approximation better than the 2-approximation.
In the 3 log D approximation, the segments in the spin system are partitioned into subsets using the formula $4^{i-1} \le |s| / l_{\min} \le 4^i$. Does this always give an improvement in score?

36
Improved Two Layer Algo.

layerOne(sequence U, spin V, score)
    for every subset $V_i$ of the spin system set V do
        $E_i$ = all edges incident from $V_i$ to U
        $M_i = \emptyset$
        while ($E_i \ne \emptyset$)
            find an edge $e \in E_i$ of maximum weight
            add e to $M_i$ and delete e and all edges conflicting with e from $E_i$
        mark the positions in the sequence matched by $M_i$ as possible assignments

37
Improved Two Layer Algo.

layerTwo(sequence, spin, score, possible assignment)
    smax = $-\infty$
    for every possible assignment position in U do
        calculate the score and call it si
        if si > smax then
            smax = si and store the current position as the final assignment
If there are m groups of spin segments, the total number of searches is $2^m - 1$.
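
The count of searches can be read as the number of non-empty subsets of the m groups, which is $2^m - 1$. A small sketch of that enumeration, with hypothetical group labels:

```python
from itertools import combinations


def nonempty_subsets(groups):
    """Yield every non-empty subset of the given groups, smallest first."""
    for size in range(1, len(groups) + 1):
        yield from combinations(groups, size)


# Hypothetical example with m = 3 groups of spin segments.
subsets = list(nonempty_subsets(["V1", "V2", "V3"]))
```

The count doubles with each additional group, so this exhaustive first layer is only practical while m stays small.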

39
Current Results
40
Expected Results
42
Interface with HSQC

The main work in Hetero-nuclear Single Quantum Correlation (HSQC) is identifying the NH amide side chain.
It is a biological experiment that yields data on the NHx group, in which the nitrogen is directly attached to a proton.
Every AA except proline produces one signal for its N-H amide group; depending on the pH value and the chemical shift exhibited, the NH side chain is visible.

43
HSQC cont

Folded proteins or protein domains display a broad distribution of NMR frequencies, resulting in a good spread of peaks.
Unfolded proteins do not exhibit this spread, resulting in overlapping frequencies.
The HSQC technique adds ligand-induced signal shifts, which change the overlap.
This process also involves a few calculations that result in better spin system values.

44
Survey on Dipolar Coupling

Unlike HSQC, which identifies the NH amide side chain, dipolar coupling identifies the Cα-Cβ side chain.
It provides long-range information that is lacking in other NMR experiments.
It is also a biological experimental process for improving the spin system, which is the main part of identifying the protein structure.
The results of this experiment enable us to determine the side-chain rotamer states using a rotamer prediction algorithm.

45
Dipolar Coupling cont

NMR solution structures are determined primarily using restraints derived from nuclear Overhauser effects (NOEs). These restraints cover only proton-proton distances of less than 5 Å. Hence NMR is not efficient for elongated molecules.
Elongated molecules are present in the helical structure of the protein sequence.
The local error on elongated molecules tends to accumulate over their length, resulting in poor protein structure determination.

46
Dipolar Coupling Cont

The size of the dipolar coupling observed between two nuclei P and Q is given by
$D^{PQ}(\theta, \phi) = D_a^{PQ} \left[ (3\cos^2\theta - 1) + \tfrac{3}{2} R \sin^2\theta \cos 2\phi \right]$
where R is the rhombicity (shape of molecule).
The value obtained here helps to observe elongated molecules as well.
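
The coupling formula on this slide, read as D = Da [(3 cos²θ − 1) + 1.5 R sin²θ cos 2φ], can be evaluated directly. The sketch below assumes angles in radians; the function name and the numeric values are illustrative only.

```python
import math


def dipolar_coupling(d_a, rhombicity, theta, phi):
    """Dipolar coupling between two nuclei for polar angles theta and phi.

    d_a is the axial component for the nucleus pair and rhombicity is R.
    Both angles are given in radians.
    """
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return d_a * (axial + rhombic)


# Hypothetical values: with theta = 0 the rhombic term vanishes and the
# coupling equals 2 * d_a.
along_axis = dipolar_coupling(10.0, 0.3, 0.0, 0.0)
```
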

48
Conclusion

A major part of the NMR technique is the peak assignment process.
Our main goal of finding a better algorithm led us to think about improving the matching and the score scheme rather than the computational process of the algorithms.
In the existing algorithm, we found that if there are m segments, only a reduced number of subsets is taken before a matching is chosen.
We thought it would be worthwhile to take all possible subsets and check whether this gives a better match score.

49
Future Directions

As in the CBM two-layer algorithm, if the NMR experiment could somehow help identify the threshold value, the number of checks in our improved version could be reduced using this threshold.
By extracting the results of the HSQC experiment and developing algorithms for this data set, we can get a better spin system for the NH amide side chain and improve our assignment process.
Using data from dipolar coupling to identify the Cα-Cβ side chain would give us even better spin segments. This might lead to better protein structure determination.

50
Acknowledgements

Our Sincere thanks to
Dr. Guohui Lin, Professor, U of A.
Mr. Xiang Wan, Ph.D. Student, U of A.
Mr. Jon McCall, Spectrum Research LLC.
Dr. Gaetano T. Montelione, Rutgers Univ.

51
References

[1] Y. Xu, D. Xu, D. Kim, V. Olman, J. Razumovskaya, and T. Jiang. "Automated assignment of backbone NMR peaks using constrained bipartite matching", IEEE Computing in Science & Engineering, 4:50-62, 2002.
[2] C. Bartels, T. Xia, M. Billeter, P. Güntert, and K. Wüthrich. "The program XEASY for computer-supported NMR spectral analysis of biological macromolecules", J. Biomol. NMR 6, 1-10, 1995.
[3] K. P. Neidig, M. Geyer, A. Görler, C. Antz, R. Saffrich, W. Beneicke, and H. R. Kalbitzer. "AURELIA, a program for computer-aided analysis of multidimensional NMR spectra", J. Biomol. NMR 6, 255-270, 1995.
[4] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. "CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations", J. Comp. Chem. 4, 187-217, 1983.
[5] G. Lin, D. Xu, Z-Z. Chen, T. Jiang, J. Wen, and Y. Xu. "An Efficient Branch-and-Bound Algorithm for the Assignment of Protein Backbone NMR Peaks", in Proceedings of the IEEE Computer Society Bioinformatics Conference 2002 (CSB 2002), pp. 165-174.
[6] Z-Z. Chen, T. Jiang, G. Lin, J. Wen, D. Xu, and Y. Xu. "Better Approximation Algorithms for NMR Spectral Peak Assignment", The Second Workshop on Algorithms in Bioinformatics (WABI), LNCS 2454, pp. 82-96, 2002.
[7] F. Tian, H. Valafar, and J. H. Prestegard. "A dipolar coupling based strategy for simultaneous resonance assignment and structure determination of protein backbones", Journal of the American Chemical Society, 123:11791-11796, 2001.
[8] R. Bar-Yehuda and S. Even. "A local-ratio theorem for approximating the weighted vertex cover problem", Annals of Discrete Mathematics, 25:27-46, 1985.
[9] D. E. Zimmerman, C. A. Kulikowski, Y. Huang, W. Feng, M. Tashiro, S. Shimotakahara, C-Y. Chien, R. Powers, and G. T. Montelione. "Automated Analysis of Protein NMR Assignments Using Methods from Artificial Intelligence", J. Mol. Biol. 269, 592-610, 1997.
[10] J. J. Warren and P. B. Moore. "Application of dipolar coupling data to refinement of the solution structure of the Sarcin-Ricin loop RNA", Journal of Biomolecular NMR, 20:311-323, 2001.
[11] Michael Andrec, Yuichi Harano, Mathew P. Jacobson, Richard A. Friesner, and Ronald M. Levy. "Complete Protein Structure Determination Using Backbone Residual Dipolar Couplings and Side chain Rotamer Prediction".