Structural Bioinformatics

Lecture 11, CS566

Provided by: deendayald
Learn more at: https://sse.umkc.edu

1
Structural Bioinformatics: Structure Comparison
  • Motivation
  • Concepts
  • Structure Comparison

2
Motivation
  • Given structures A and B, are they similar?
  • Implication: A and B might share the same set of
    functions
  • Given structure A, is a similar structure already
    known?
  • Implication: Each new experimentally solved
    structure can be placed in the context of the
    existing body of structural knowledge

3
Concepts
  • For n sequences and s corresponding structural
    equivalence classes, n >> s. Possible reasons:
  • Structural divergence is slower than sequence
    divergence in evolution (à la RNA sequence
    alignment)
  • Convergent evolution: Some structures are
    preferred for a functional reason
  • Coincidence: Only so many structures are
    possible, for a given threshold of similarity
  • Terms used to describe structure: Architecture,
    Class, Fold, Superfamily, Family

4
Superposition versus Alignment
  • Structural superposition versus structural
    alignment
  • Superposition:
  • Residue correspondence is already known, based on
    a statistically significant sequence alignment
  • The problem is to find the optimal fit (rotation
    and translation) between two sets of points,
    given a subset of equivalent points between the
    two sets
  • Optimality is measured by the lowest Root Mean
    Square Deviation (RMSD)
  • Alignment:
  • Residue correspondence is unknown
  • Needs a structure-based scoring function,
    evaluated over all possible superpositions of
    the structures
  • The optimal solution is frequently impractical
    because of high complexity (NP-hard; why?)
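
As a minimal sketch of the RMSD measure used for superposition, the Python below computes the RMSD between two coordinate sets whose residue correspondence is already known. The function names and the toy coordinates are illustrative assumptions, not taken from the lecture. Only the translational part of the fit is shown (both centroids are moved to the origin); the optimal rotation that a full superposition also requires is deliberately left out.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def center(points):
    """Translate points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]

def rmsd(a, b):
    """RMSD over paired points a[k] <-> b[k]; no fitting is done here."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long point lists")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))

# Toy data (assumed): b is a copy of a shifted by (10, 5, 0).
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(10.0, 5.0, 0.0), (11.5, 5.0, 0.0), (13.0, 5.0, 0.0)]

raw = rmsd(a, b)                      # dominated by the shift between the sets
fitted = rmsd(center(a), center(b))   # shift removed by centering both sets
```

Here the raw value only reflects the offset between the two coordinate frames, while the centered value drops to zero because the two toy structures are identical up to translation.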

5
Heuristic Structure Alignment
  • General strategy:
  • Summarized/reduced representation of each
    structure
  • Consider only subsets of atoms (just Cα or Cβ)
  • Use summarized vectors to represent organized
    sub-structures
  • Approaches:
  • Dynamic programming with empirical scoring
    functions
  • Distance-matrix correspondence in internal
    coordinate space

6
Heuristic Structure Alignment
  • Vector-based strategies:
  • VAST (Vector Alignment Search Tool): Compare
    summarized vector representations
  • SSAP (Sequential Structure Alignment Program):
    Compare nearest-neighbor vectors by double
    dynamic programming
  • Distance matrix comparisons (à la dot matrix):
  • DALI (Distance-matrix ALIgnment): Subset of
    internal coordinate space

7
VAST alignment (Fig. 10.13)
  • Use only subset of atom coordinates
  • Replace atom coordinates with vector coordinates
    corresponding to secondary structural elements
    (structural words)
  • Compare sets of vectors to assess similarity
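
The idea of replacing atom coordinates with one vector per secondary structural element can be sketched as follows. This is a simplified illustration, not VAST itself: it assumes each element is given as a list of Cα coordinates, approximates the element by the vector from its first to its last Cα, and compares two elements by the angle between their vectors. The helper names and toy coordinates are hypothetical.

```python
import math

def sse_vector(ca_coords):
    """Approximate an element by the vector from its first to its last C-alpha."""
    first, last = ca_coords[0], ca_coords[-1]
    return tuple(last[i] - first[i] for i in range(3))

def angle_deg(u, v):
    """Angle between two 3-D vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

# Toy elements (assumed): one runs along x, the other along y.
element_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
element_2 = [(0.0, 0.0, 0.0), (0.0, 3.3, 0.0), (0.0, 6.6, 0.0)]

angle = angle_deg(sse_vector(element_1), sse_vector(element_2))
```

A real comparison would also use the element type, length, and relative position, and would score sets of such vectors rather than a single pair.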

8
Double dynamic programming (SSAP/CATH Fig. 10.14)
  • First level:
  • Represent each residue by a neighborhood vector
    for Cβ
  • Compare n versus m neighborhood vectors
  • Generate an optimal alignment based on vector
    differences and dynamic programming
  • Second level:
  • Add the scores along each first-level path to a
    cumulative matrix, reinforcing cells where paths
    cross
  • Generate the optimal alignment based on the
    cumulative matrix
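
The two levels can be sketched in a few lines of Python. This is a heavily simplified illustration under stated assumptions, not the SSAP implementation: vectors are taken in the raw coordinate frame rather than a local residue frame (so the two structures are assumed to share an orientation), there are no gap penalties, a first-level path is kept only if it happens to pass through the residue pair being tested, and the score constants are arbitrary. All function names are hypothetical.

```python
import math

def sub(p, q):
    return tuple(p[i] - q[i] for i in range(3))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def dp_path(score):
    """Best monotone path through a score matrix (no gap penalty)."""
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],
                             best[i - 1][j], best[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # trace back, preferring a match on ties
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path

def view_scores(a, b, i, j, scale=10.0, offset=1.0):
    """First level: compare the view from residue i of A with that from j of B."""
    return [[scale / (norm(sub(sub(ak, a[i]), sub(bl, b[j]))) + offset)
             for bl in b] for ak in a]

def double_dp(a, b):
    """Second level: accumulate first-level paths, then align on the sum."""
    cumulative = [[0.0] * len(b) for _ in a]
    for i in range(len(a)):
        for j in range(len(b)):
            low = view_scores(a, b, i, j)
            path = dp_path(low)
            if (i, j) in path:  # keep paths consistent with pairing i and j
                for k, l in path:
                    cumulative[k][l] += low[k][l]
    return dp_path(cumulative)

# Toy data (assumed): b is a copy of a shifted by (5, -2, 3).
a = [(0.0, 0.0, 0.0), (3.0, 1.0, 0.0), (5.0, 4.0, 1.0)]
b = [(5.0, -2.0, 3.0), (8.0, -1.0, 3.0), (10.0, 2.0, 4.0)]
alignment = double_dp(a, b)  # list of (index in a, index in b) pairs
```

For the toy input the result pairs each residue with its shifted copy. Because one dynamic-programming pass is run per residue pair, the cost grows roughly with the square of the product of the two lengths, which is one reason reduced representations matter.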

9
Distance matrix based alignment (DALI, Fig. 10.15)
  • Generate a dot matrix of inter-Cα distances, using
    a distance threshold
  • Pick out secondary structure elements based on
    matrix patterns
  • Compare two matrices to generate structural
    alignment
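
The first step can be sketched as below: build the matrix of pairwise Cα distances and threshold it into a dot matrix. The 8.0 Å cutoff, the function names, and the toy coordinates are assumptions for illustration, and the matrix comparison that DALI performs afterwards is not shown. The sketch also checks why this counts as an internal coordinate representation: rotating and translating the structure leaves the distance matrix unchanged.

```python
import math

def distance(p, q):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))

def distance_matrix(ca):
    """All pairwise distances between C-alpha positions."""
    return [[distance(p, q) for q in ca] for p in ca]

def dot_matrix(dm, cutoff=8.0):
    """True where two residues lie within the cutoff (assumed, in angstroms)."""
    return [[d <= cutoff for d in row] for row in dm]

# Toy chain (assumed): four C-alpha atoms spaced 3.8 angstroms apart on a line.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dm = distance_matrix(ca)
dots = dot_matrix(dm)

# Same chain rotated 90 degrees about z and shifted by (1, -2, 3).
moved = [(-y + 1.0, x - 2.0, z + 3.0) for x, y, z in ca]
dm_moved = distance_matrix(moved)
invariant = all(abs(d1 - d2) < 1e-9
                for r1, r2 in zip(dm, dm_moved)
                for d1, d2 in zip(r1, r2))
```

Because the matrix does not depend on how each structure is placed in space, two such matrices can be compared directly without first superimposing the structures.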

10
Structure Comparison Databases
  • Several databases (CATH, MMDB, FSSP) maintain a
    hierarchical classification of known structures,
    based on pair-wise structural alignment scores
  • High complexity of the algorithms requires
    incremental additions
  • Actual classification is algorithm-dependent,
    with some consensus, but significant differences
    exist

11
Summary
  • Sequence similarity (> 50% identity) implies
    structural similarity. The converse is not
    necessarily true (evolutionary
    convergence/information convergence)
  • Structural similarity algorithms are heuristic
    ways to assess structural similarity
    independent of sequence similarity
  • Structural variation is smaller than that
    suggested by the number of possible sequences