NMR Peak Assignment : Better Algorithm - PowerPoint PPT Presentation

Slides: 52
Provided by: SVK

1
NMR Peak Assignment: Better Algorithm
  • Frederick Vizeacoumar
  • Tommy Chu

2
Agenda
  • Introduction
  • Problem Definition
  • Previous Works
  • Input Parameters
  • Design
  • Results
  • HSQC and Survey on Dipolar Coupling
  • Conclusion

5
Introduction
  • Nuclear Magnetic Resonance (NMR)
  • A strong magnetic field is used to align the spins
    of nuclei (isotopes).
  • Applied radiation of the right frequency flips the
    aligned spins. When this spin transition occurs,
    the nuclei are said to be in resonance with the
    applied radiation.

6
NMR measurement
  • Chemical Shift
  • ppm
  • Electrons in the molecule have small magnetic
    fields.
  • When a strong magnetic field is applied, the
    electrons tend to oppose the applied field.
  • NMR Spectrum

7
NMR spectroscopy
  • Used to study the physical, chemical, and
    biological properties of molecules.
  • Problem
  • The sequence has been identified.
  • The complete structure is unknown.
  • The basic structure is known.
  • The structure corresponding to each AA is unknown.

8
Procedure to determine protein structure using NMR
  • Three steps
  • Data generation
  • Involves relating resonance peaks to AAs and
    forming spin systems.
  • Data interpretation
  • Involves matching spin systems to amino acids,
    providing inter- and intra-AA distances and angles.
  • NMR structure calculation
  • Involves structure determination using Molecular
    Dynamics (MD) and Energy Minimization (EM).

10
Project -- Data Interpretation
  • Primary Goal
  • (Better) Automated Peak Assignment.
  • Steps for doing this data interpretation
  • Map resonance peaks from different NMR spectra to
    same residue.
  • Identify adjacency relationship.
  • Assign the segments to the protein sequence.
  • Problem with existing Algorithms
  • Low accuracy.
  • High time complexity.

11
Peak Assignment
  • The assignment procedure needs to address two
    crucial pieces of information
  • Different AA types have different distributions of
    spin systems.
  • The adjacency information (scalar coupling) between
    spin systems is obtained by identifying their
    common resonance frequencies.
  • Enhancement techniques
  • HSQC
  • Dipolar coupling

13
Previous Works
  • [1] formulated the NMR assignment problem as a
    Constrained Bipartite Matching (CBM) problem.
  • [1] proposed a naïve (two-layer) algorithm.
  • [1,6] proved that D-string CBM is NP-hard.
  • [6] proposed two approximation algorithms.
  • [5] applied branch-and-bound techniques.
  • [9] attempted to solve CBM using extensive search
    techniques from artificial intelligence.

14
Bipartite Matching
  • G = (U ∪ V, E)
  • U is the sequence of AAs
  • V is the set of spin systems
  • U ∩ V = ∅
  • G′ = (U ∪ V, M) ⊆ G, i.e. no two edges m ∈ M share
    a vertex.

[Figure: bipartite graph linking protein sequence nodes (A, G, C, T) to spin systems (1, 2, 3, 4)]

16
Bipartite Matching
  • Perfect Matching
  • Every node must be covered by the matching.
  • Weighted Bipartite Matching
  • Each edge is associated with a weight.
  • Maximum (Perfect) Weighted Bipartite Matching
  • Maximize the total weight over all edges m ∈ M.

[Figure: bipartite graph linking protein sequence nodes (A, G, C, T) to spin systems (1, 2, 3, 4)]

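
As a minimal sketch of maximum weighted perfect matching, the brute-force search below tries every one-to-one pairing of sequence positions and spin systems and keeps the heaviest one. The 4 x 4 weight table is hypothetical, and exhaustive enumeration is only an illustration for tiny inputs, not the method used in the cited work.

```python
from itertools import permutations

def max_weight_perfect_matching(weights):
    """Return (total, matching) where matching[i] is the spin system
    paired with sequence position i. Tries all n! pairings."""
    n = len(weights)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)

# Hypothetical weights: rows are sequence positions, columns are spin systems.
WEIGHTS = [
    [1, 5, 2, 4],
    [2, 1, 4, 0],
    [1, 2, 3, 4],
    [2, 4, 1, 3],
]

total, matching = max_weight_perfect_matching(WEIGHTS)
print(total, matching)
```

Every row and column is used exactly once, which is the "no two edges share a vertex" condition from the matching definition.
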
17
Constrained Bipartite Matching
  • For every segment of adjacent spin systems in V
  • If (u_i, v_j) ∈ M, i.e. u_i ∈ U ∧ v_j ∈ V
  • Then (u_(i+1), v_(j+1)) ∈ M
  • D-string CBM
  • The problem restricted to segments of size at most
    D.

[Figure: bipartite graph linking protein sequence nodes (A, G, C, T) to spin systems (1, 2, 3, 4)]

18
Two Layered Algorithm [1]
  • First layer
  • Filter out unlikely assignments for long strings.
  • Second layer
  • Try all possible combinations of assignments for
    long strings to find the maximum one as the
    result.

19
Approximation Algorithms 2
  • 2D Approximation Algorithm
  • Let M be an optimal matching on G.
  • Each edge corresponding to a vertex conflicts with
    at most 2 edges in M.
  • 3 log D Approximation Algorithm
  • If the length of the longest string in V is at
    most four times the length of the shortest string,
  • then the greedy algorithm finds a solution whose
    weight is at least 1/6 of the optimal.

20
Branch-and-Bound Algorithm
  • A systematic method for solving optimization
    problems.
  • Construct a search tree and apply a carefully
    selected criterion to determine which node to
    expand next.
  • Exponential time in the worst case.

21
AUTOASSIGN/AUTOPEAK
  • 5 Major Searches
  • Make strongest matches
  • Allow degenerate shifts
  • Extend assigned segments
  • Match weaker spin systems
  • Finish Assignments
  • Claims 98% accuracy.
  • Tested only on single-stranded RNA sequences.
  • Infeasible to compute on protein sequences.

23
Input Parameters
  • Protein Sequence
  • A sequence of positions and AAs
  • Spin Systems
  • Segments of chemical shifts, separated by commas
  • Score Scheme
  • A table storing the score between an AA and a
    range of chemical shifts

24
Example input for protein sequence
  • 1 GLY
  • 2 SER
  • 3 VAL
  • 4 GLU
  • 5 GLN
  • 6 ILE
  • 7 SER
  • 8 GLY

26
Example input for spin system (three segments)
  • 12.5
  • 13.5 ,
  • 5.35
  • 6.4
  • 7.21 ,
  • 16.1
  • 17.2
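
The listing above can be read into segments with a few lines of Python. This is a sketch that assumes the input is whitespace-separated chemical shifts with a comma closing each segment; the function name `parse_spin_systems` is hypothetical.

```python
def parse_spin_systems(text):
    """Split a spin-system listing into segments of chemical shifts (ppm).
    Assumes a comma marks the end of each segment."""
    segments = []
    for chunk in text.split(","):
        shifts = [float(token) for token in chunk.split()]
        if shifts:  # skip an empty chunk left by a trailing comma
            segments.append(shifts)
    return segments

EXAMPLE = """12.5
13.5 ,
5.35
6.4
7.21 ,
16.1
17.2"""

segments = parse_spin_systems(EXAMPLE)
print(segments)  # three segments, as in the slide title
```
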

28
Example input for score scheme
  • 5
  • GLY
  • SER
  • VAL
  • GLU
  • GLN
  • 4
  • 0 5
  • 5 10
  • 10 15
  • 15 20
  • 1 5 2 4
  • 2 1 4 0
  • 1 2 3 4
  • 2 4 1 3
  • 2 3 3 6
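
One plausible reading of this input, sketched in Python: the first number is the count of AA types, followed by their names, then the count of chemical-shift ranges, the ranges themselves, and one row of scores per AA. The layout interpretation, the half-open ranges, the zero score for unknown AAs, and the function names are all assumptions made for illustration.

```python
def parse_score_scheme(text):
    """Parse 'n names... k ranges... n*k scores' into (ranges, table)."""
    tokens = text.split()
    pos = 0
    n_aa = int(tokens[pos]); pos += 1
    names = tokens[pos:pos + n_aa]; pos += n_aa
    n_ranges = int(tokens[pos]); pos += 1
    ranges = []
    for _ in range(n_ranges):
        ranges.append((float(tokens[pos]), float(tokens[pos + 1])))
        pos += 2
    table = {}
    for name in names:
        table[name] = [float(t) for t in tokens[pos:pos + n_ranges]]
        pos += n_ranges
    return ranges, table

def score(ranges, table, aa, shift):
    """Score of assigning a chemical shift to an AA type."""
    row = table.get(aa)
    if row is None:
        return 0.0  # assumption: AA types missing from the table score 0
    for k, (lo, hi) in enumerate(ranges):
        if lo <= shift < hi:  # assumption: ranges are half-open [lo, hi)
            return row[k]
    return 0.0  # assumption: shifts outside every range score 0

SCHEME = """5 GLY SER VAL GLU GLN
4 0 5 5 10 10 15 15 20
1 5 2 4
2 1 4 0
1 2 3 4
2 4 1 3
2 3 3 6"""

RANGES, TABLE = parse_score_scheme(SCHEME)
print(score(RANGES, TABLE, "GLY", 12.5))
print(score(RANGES, TABLE, "GLN", 16.1))
```
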

30
Implemented Design
  • CBM two-layer approximation algorithm
  • As mentioned earlier, we treat the peak assignment
    problem as a bipartite graph matching problem with
    some constraints.
  • The key idea is that multi-dimensional NMR spectra
    contain inter-residue peaks that convey the
    connectivity information between residues.
  • We match the inter- and intra-residue peaks using
    their chemical shifts, which makes the
    connectivity information straightforward to obtain.

31
CBM Two layer Approximation
  • layerOne(sequence, spin, score, threshold)
  • for every segment Vi in spin do
  • for every position Uj in the sequence do
  • if score(Uj, Vi) ≥ threshold then
  • mark Uj as a possible assignment
    position
  • layerTwo(sequence, spin, score)
  • smax ← −∞
  • for every possible legal combination set from
    layerOne do
  • calculate the score and call it si
  • if si > smax then
  • smax ← si and store the current positions
    as the final assignment
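
The two layers can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a segment is scored by summing per-residue scores over the consecutive positions it covers, a combination is "legal" when no two segments overlap, and the score table, ranges, and threshold value are taken from or modeled on the example inputs.

```python
from itertools import product

# Assumed scoring data, copied from the example score scheme.
RANGES = [(0, 5), (5, 10), (10, 15), (15, 20)]
TABLE = {"GLY": [1, 5, 2, 4], "SER": [2, 1, 4, 0], "VAL": [1, 2, 3, 4],
         "GLU": [2, 4, 1, 3], "GLN": [2, 3, 3, 6]}

def aa_score(aa, shift):
    row = TABLE.get(aa)
    if row is None:
        return 0  # assumption: AA types without a table row score 0
    for k, (lo, hi) in enumerate(RANGES):
        if lo <= shift < hi:
            return row[k]
    return 0

def segment_score(sequence, segment, start):
    """Score of placing a segment on consecutive positions from start."""
    return sum(aa_score(sequence[start + k], s) for k, s in enumerate(segment))

def layer_one(sequence, segments, threshold):
    """For each segment, keep the start positions scoring >= threshold."""
    return [[p for p in range(len(sequence) - len(seg) + 1)
             if segment_score(sequence, seg, p) >= threshold]
            for seg in segments]

def layer_two(sequence, segments, candidates):
    """Try every combination of candidate positions; keep the best legal one.
    Returns (-inf, None) if some segment has no candidate at all."""
    best_score, best = float("-inf"), None
    for combo in product(*candidates):
        used, legal = set(), True
        for seg, start in zip(segments, combo):
            span = set(range(start, start + len(seg)))
            if used & span:  # two segments would share a residue
                legal = False
                break
            used |= span
        if not legal:
            continue
        total = sum(segment_score(sequence, seg, p)
                    for seg, p in zip(segments, combo))
        if total > best_score:
            best_score, best = total, combo
    return best_score, best

SEQUENCE = ["GLY", "SER", "VAL", "GLU", "GLN", "ILE", "SER", "GLY"]
SEGMENTS = [[12.5, 13.5], [5.35, 6.4, 7.21], [16.1, 17.2]]

CANDIDATES = layer_one(SEQUENCE, SEGMENTS, threshold=6)
print(CANDIDATES)
print(layer_two(SEQUENCE, SEGMENTS, CANDIDATES))
```

The second layer is exponential in the number of segments, which is why the first layer's filtering matters and why the choice of threshold is sensitive.
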

32
2-Approximation Algorithm
  • The key idea is to form a weighted bipartite graph
    and use it to find the matching.
  • We use some restrictions and constraints to
    identify the leading innermost edge, as
    explained.
  • We find an innermost edge, collect it together
    with all its conflicting edges, take the minimum
    weight in this set, subtract it from the edges in
    the set, and remove the edges whose weight
    becomes 0.
  • The above process is applied recursively to the
    resulting graph to obtain a feasible matching.

33
3 Log D Approximation
  • The idea is that if there are m different segments
    in the spin system, we group these m segments into
    overlapping groups based on a formula.
  • We find the score for each group, maximize the
    final score, and extend the assignment to a
    feasible solution.
  • This also uses a weighted bipartite graph.
  • As explained before, this problem includes
    additional constraints to ensure that there is no
    overlap between segments.

34
3 Log D Approximation
  • 3 log D Approximation(Score U, Spin V)
  • r ← lmax / lmin, where lmax and lmin are the max
    and min lengths of the strings in V
  • group V into g = max(2, log₄ r) subsets Vi such
    that a string s is in Vi when
    4^(i−1) ≤ |s| / lmin ≤ 4^i
  • for every i ∈ {1, 2, …, g}
  • compute the set Ei of edges of G incident to
    strings in Vi
  • initialize Mi ← Ø
  • while (Ei ≠ Ø)
  • find an edge e ∈ Ei of maximum
    weight
  • add e to Mi and delete e and
    all edges conflicting with e from Ei
  • greedily extend Mi to a maximal feasible
    matching of G
  • output the heaviest one among the Mi
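
A hedged Python sketch of these steps follows. It assumes each edge is a tuple (string id, start position, string length, weight), that two edges conflict when they place the same string or cover overlapping positions, and that a string whose length ratio falls on a group boundary goes to the lower group. The edge data are hypothetical.

```python
def conflicts(e, f):
    """Edges conflict if they place the same string or overlap in the sequence."""
    s1, p1, l1, _ = e
    s2, p2, l2, _ = f
    return s1 == s2 or (p1 < p2 + l2 and p2 < p1 + l1)

def greedy(edges, matching=()):
    """Repeatedly take the heaviest remaining edge and drop its conflicts.
    An optional starting matching is extended rather than rebuilt."""
    chosen = list(matching)
    remaining = [e for e in edges
                 if not any(conflicts(e, m) for m in chosen)]
    while remaining:
        best = max(remaining, key=lambda e: e[3])
        chosen.append(best)
        remaining = [e for e in remaining if not conflicts(e, best)]
    return chosen

def group_index(length, lmin):
    """Smallest i >= 1 with length / lmin <= 4**i."""
    i = 1
    while length > lmin * 4 ** i:
        i += 1
    return i

def three_log_d(edges):
    lmin = min(e[2] for e in edges)
    groups = {}
    for e in edges:
        groups.setdefault(group_index(e[2], lmin), []).append(e)
    best_weight, best_matching = float("-inf"), None
    for i in sorted(groups):
        m_i = greedy(groups[i])   # greedy matching inside group Vi
        m_i = greedy(edges, m_i)  # extend to a maximal matching of G
        weight = sum(e[3] for e in m_i)
        if weight > best_weight:
            best_weight, best_matching = weight, m_i
    return best_weight, best_matching

# Hypothetical edges: (string id, start position, string length, weight).
EDGES = [(0, 0, 1, 3), (0, 4, 1, 5), (1, 0, 2, 4),
         (1, 6, 2, 6), (2, 0, 6, 10), (2, 2, 6, 9)]

print(three_log_d(EDGES))
```

Here strings 0 and 1 fall in the first length group and string 2 in the second, so the greedy pass runs twice and the heavier of the two extended matchings is returned.
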

35
Improved Two layer Algorithm
  • The threshold value in the CBM algorithm is good
    at eliminating unlikely assignments, but
    determining this threshold value is a matter of
    trial and error.
  • [6] considers the 3 log D approximation better
    than the 2-approximation.
  • In the 3 log D approximation, the partitioning of
    the segments in the spin system into subsets is
    based on the formula 4^(i−1) ≤ |s| / lmin ≤ 4^i.
    Does this always give a better score?

36
Improved Two Layer Algo.
  • layerOne(sequence U, spin V, score)
  • for every subset Vi of the spin system set V
    do
  • Ei ← all edges incident from Vi to U
  • Mi ← Ø
  • while (Ei ≠ Ø)
  • find an edge e ∈ Ei of max weight
  • add e to Mi and delete e and all
    edges conflicting with e from Ei
  • mark the positions in the sequence set for the
    corresponding Mi set as possible assignments

37
Improved Two Layer Algo.
  • layerTwo(sequence, spin, score, possible
    assignment)
  • smax ← −∞
  • for every possible assignment position in U do
  • calculate the score and call it si
  • if si > smax then
  • smax ← si and store the current position
    as the final assignment
  • If there are m groups of spin segments, then the
    total number of searches would be 2^m − 1.
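
The sketch below illustrates the improved scheme under the same modeling assumptions as before: edges are tuples (string id, start position, string length, weight), and two edges conflict when they place the same string or overlap. Layer one runs the greedy selection once per non-empty subset of strings, and layer two keeps the highest-scoring result. The edge data and function names are hypothetical.

```python
from itertools import combinations

def conflicts(e, f):
    """Edges conflict if they place the same string or overlap in the sequence."""
    s1, p1, l1, _ = e
    s2, p2, l2, _ = f
    return s1 == s2 or (p1 < p2 + l2 and p2 < p1 + l1)

def greedy(edges):
    """Take the heaviest remaining edge, drop its conflicts, repeat."""
    chosen, remaining = [], list(edges)
    while remaining:
        best = max(remaining, key=lambda e: e[3])
        chosen.append(best)
        remaining = [e for e in remaining if not conflicts(e, best)]
    return chosen

def layer_one(edges):
    """One greedy matching Mi per non-empty subset Vi of strings."""
    strings = sorted({e[0] for e in edges})
    possible = []
    for k in range(1, len(strings) + 1):
        for subset in combinations(strings, k):
            e_i = [e for e in edges if e[0] in subset]
            possible.append(greedy(e_i))
    return possible

def layer_two(possible):
    """Keep the candidate assignment with the highest total score."""
    s_max, final = float("-inf"), None
    for m_i in possible:
        s_i = sum(e[3] for e in m_i)
        if s_i > s_max:
            s_max, final = s_i, m_i
    return s_max, final

# Hypothetical edges: (string id, start position, string length, weight).
EDGES = [(0, 0, 1, 3), (0, 4, 1, 5), (1, 0, 2, 4),
         (1, 6, 2, 6), (2, 0, 6, 10), (2, 2, 6, 9)]

POSSIBLE = layer_one(EDGES)
print(len(POSSIBLE))        # number of subsets searched
print(layer_two(POSSIBLE))
```

With m = 3 strings the first layer performs 2^3 − 1 = 7 greedy searches, which shows how quickly the cost grows with m.
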

39
Current Results
40
Expected Results
42
Interface with HSQC
  • The main work in Heteronuclear Single Quantum
    Correlation (HSQC) is identifying the NH amide
    side chain.
  • Basically, it is a biological experiment that
    yields data related to NHx groups, where the
    nitrogen is directly attached to a proton.
  • Every AA except proline produces one signal for
    its N-H amide group. Depending on the pH value and
    the chemical shift exhibited, the NH side chain is
    visible.

43
HSQC cont
  • Folded proteins or protein domains display a
    broad distribution of NMR frequencies, resulting
    in a good spread of peaks.
  • Unfolded proteins do not exhibit this spread,
    resulting in overlapping frequencies.
  • The HSQC technique adds ligands that shift
    signals, which changes the overlap.
  • This process also involves a few calculations that
    result in better spin system values.

44
Survey on Dipolar Coupling
  • Unlike HSQC, which identifies the NH amide side
    chain, dipolar coupling identifies the Cα-Cβ
    side chain.
  • Provides long-range information, which is lacking
    in NMR experiments.
  • This is also a biological experimental process
    to improve the spin system, which is the main part
    of identifying the protein structure.
  • The results of this experiment enable us to
    determine the side-chain rotamer states using a
    rotamer prediction algorithm.

45
Dipolar Coupling cont
  • NMR solution structures are determined primarily
    using restraints derived from nuclear Overhauser
    effects (NOEs). These restraints only cover
    proton-proton distances of less than 5 Å. Hence,
    for elongated molecules, NMR is not efficient.
  • Elongated molecules are present in the helical
    structure of the protein sequence.
  • This local error on elongated molecules tends to
    accumulate over the length, resulting in poor
    protein structure determination.

46
Dipolar Coupling Cont
  • The size of the dipolar coupling observed between
    two nuclei P and Q is given by
  • $D^{PQ}(\theta,\phi) = D_a^{PQ}\left[(3\cos^2\theta - 1) + 1.5\,R\,\sin^2\theta\,\cos 2\phi\right]$
  • where R is the rhombicity (shape of the molecule)
  • The value obtained here helps to observe
    elongated molecules as well
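
The formula can be evaluated directly. In this small sketch the angles are in radians, and the function name, argument names, and numeric values are illustrative assumptions rather than data from the presentation.

```python
import math

def dipolar_coupling(theta, phi, d_a, rhombicity):
    """D(theta, phi) = D_a * [(3 cos^2(theta) - 1)
                              + 1.5 * R * sin^2(theta) * cos(2 phi)]."""
    axial = 3 * math.cos(theta) ** 2 - 1
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2 * phi)
    return d_a * (axial + rhombic)

# Hypothetical values: D_a = 10 Hz, R = 0.3.
print(dipolar_coupling(0.0, 0.0, 10.0, 0.3))          # vector along the main axis
print(dipolar_coupling(math.pi / 2, 0.0, 10.0, 0.3))  # vector perpendicular to it
```

At θ = 0 the rhombic term vanishes and the coupling is 2·D_a, so the value depends strongly on orientation, which is what makes it a source of long-range information.
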

48
Conclusion
  • A major part of the NMR technique is the peak
    assignment process.
  • Our main goal of finding a better algorithm led
    us to think about improving the matching and
    score scheme rather than the computational
    process of the algorithms.
  • In the existing algorithm, we found that if there
    are m segments, only a reduced number of subsets
    is taken and a better matching is made from them.
    We thought it would be worthwhile to take all the
    possible subsets and observe whether a better
    match score results.

49
Future Directions
  • As in the CBM two-layer algorithm, if the NMR
    experiment could somehow help in identifying the
    threshold value, then the number of checks in our
    improved version could be reduced using this
    threshold value.
  • By extracting the results from the HSQC experiment
    and developing algorithms for this data set, we
    can get a better spin system for the NH amide
    side chain and improve our assignment process.
  • Using the data from dipolar coupling to identify
    the Cα-Cβ side chain would give us even better
    spin segments. This might lead to better protein
    structure determination.

50
Acknowledgements
  • Our Sincere thanks to
  • Dr. Guohui Lin, Professor, U of A.
  • Mr. Xiang Wan, Ph.D. Student, U of A.
  • Mr. Jon McCall, Spectrum Research LLC.
  • Dr. Gaetano T. Montelione, Rutgers Univ.

51
References
  • [1] Y. Xu, D. Xu, D. Kim, V. Olman, J.
    Razumovskaya, and T. Jiang. "Automated assignment
    of backbone NMR peaks using constrained bipartite
    matching", IEEE Computing in Science &
    Engineering, 4:50-62, 2002.
  • [2] C. Bartels, T. Xia, M. Billeter, P. Güntert,
    and K. Wüthrich. "The program XEASY for
    computer-supported NMR spectral analysis of
    biological macromolecules", J. Biomol. NMR 6,
    1-10, 1995.
  • [3] K. P. Neidig, M. Geyer, A. Görler, C. Antz, R.
    Saffrich, W. Beneicke, and H. R. Kalbitzer.
    "AURELIA, a program for computer-aided analysis
    of multidimensional NMR spectra", J. Biomol. NMR
    6, 255-270, 1995.
  • [4] B. R. Brooks, R. E. Bruccoleri, B. D.
    Olafson, D. J. States, S. Swaminathan, and M.
    Karplus. "CHARMM: A Program for Macromolecular
    Energy, Minimization, and Dynamics Calculations",
    J. Comp. Chem. 4, 187-217, 1983.
  • [5] G. Lin, D. Xu, Z-Z. Chen, T. Jiang, J. Wen,
    and Y. Xu. "An Efficient Branch-and-Bound
    Algorithm for the Assignment of Protein Backbone
    NMR Peaks", in Proceedings of the IEEE Computer
    Society Bioinformatics Conference 2002 (CSB
    2002), pp. 165-174.
  • [6] Z-Z. Chen, T. Jiang, G. Lin, J. Wen, D. Xu,
    and Y. Xu. "Better Approximation Algorithms for
    NMR Spectral Peak Assignment", The Second Workshop
    on Algorithms in Bioinformatics (WABI), LNCS 2454,
    pp. 82-96, 2002.
  • [7] F. Tian, H. Valafar, and J. H. Prestegard. "A
    dipolar coupling based strategy for simultaneous
    resonance assignment and structure determination
    of protein backbones", Journal of the American
    Chemical Society, 123:11791-11796, 2001.
  • [8] R. Bar-Yehuda and S. Even. "A local-ratio
    theorem for approximating the weighted vertex
    cover problem", Annals of Discrete Mathematics,
    25:27-46, 1985.
  • [9] D. E. Zimmerman, C. A. Kulikowski, Y. Huang,
    W. Feng, M. Tashiro, S. Shimotakahara, C-Y. Chien,
    R. Powers, and G. T. Montelione. "Automated
    Analysis of Protein NMR Assignments Using Methods
    from Artificial Intelligence", J. Mol. Biol. 269,
    592-610, 1997.
  • [10] J. J. Warren and P. B. Moore. "Application of
    dipolar coupling data to refinement of the
    solution structure of the Sarcin-Ricin loop
    RNA", Journal of Biomolecular NMR, 20:311-323,
    2001.
  • [11] M. Andrec, Y. Harano, M. P. Jacobson, R. A.
    Friesner, and R. M. Levy. "Complete Protein
    Structure Determination Using Backbone Residual
    Dipolar Couplings and Side chain Rotamer
    Prediction".