Title: Databases: Navigating the MSD
1 Databases: Navigating the MSD
2 MSDlite
- A simple form-based query system for searching the MSD databases
- Allows multiple search fields to be combined
- Relatively fast, despite performing complex SQL queries
3 MSDlite
- Simple, easy-to-use form
- Allows multiple search fields to be combined
- Relatively fast, despite performing quite complex SQL queries
- Does not expose the power of a relational database
  - the user can't specify the relationship between search fields:
    - "name" AND "title" AND "keyword"
    - "name" OR "title" OR "keyword"
    - ( "name" OR "title" ) AND NOT "keyword"
  - the search form is defined by the authors of the search system, not by the author of a query
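What a user-defined relationship between search fields involves can be sketched in Python: a small expression tree of AND, OR and NOT nodes is translated into a parameterised SQL WHERE clause. The table `entries`, its columns and the substring matching with LIKE are hypothetical stand-ins, not the MSD schema.

```python
import sqlite3

# Hypothetical expression tree: ("field", column, text), ("and", a, b),
# ("or", a, b) or ("not", a). Column names are illustrative, not the MSD schema.
ALLOWED_COLUMNS = {"name", "title", "keyword"}


def to_sql(node, params):
    """Translate an expression tree into a WHERE fragment, collecting bind parameters."""
    kind = node[0]
    if kind == "field":
        _, column, text = node
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown search field: {column}")
        params.append(f"%{text}%")
        return f"{column} LIKE ?"
    if kind in ("and", "or"):
        left = to_sql(node[1], params)   # left operand's parameters come first
        right = to_sql(node[2], params)
        return f"({left} {kind.upper()} {right})"
    if kind == "not":
        return f"(NOT {to_sql(node[1], params)})"
    raise ValueError(f"unknown node type: {kind}")


def search(conn, tree):
    params = []
    where = to_sql(tree, params)
    rows = conn.execute(f"SELECT id FROM entries WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]


# Toy in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT, name TEXT, title TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("e1", "lysozyme", "hen egg-white lysozyme", "hydrolase"),
        ("e2", "myoglobin", "sperm whale myoglobin", "oxygen storage"),
        ("e3", "lysozyme", "T4 lysozyme mutant", "hydrolase"),
    ],
)

# ( "name" OR "title" ) AND NOT "keyword"
tree = ("and",
        ("or", ("field", "name", "lysozyme"), ("field", "title", "whale")),
        ("not", ("field", "keyword", "hydrolase")))
print(search(conn, tree))  # prints ['e2']
```

Whitelisting column names while binding the search text as parameters keeps user-built queries from injecting SQL.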
4 Complex Searches (Advanced Users)
- We wanted to let users control their queries entirely, so we developed MSDpro
- MSDpro uses an applet to provide a dynamic "form" that lets the user:
  - choose the fields to be searched
  - specify relationships between search fields
  - choose result fields and how they are presented
  - perform "complex" sub-queries, e.g. SSM, FASTA
- MSDpro uses an applet for constructing queries and a server to execute them
- The user describes the query entirely graphically, including logical operations such as AND, OR and NOT
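The split between a client that constructs a query and a server that executes it can be sketched as a serialised query description. In this hypothetical Python sketch the client produces JSON listing the conditions, how they are combined, the result fields and the sort field; the server checks every name against a whitelist before building SQL. The JSON layout, the `structures` table and its columns are assumptions for illustration, not the MSDpro protocol.

```python
import json
import sqlite3

# Hypothetical schema and whitelist; the real MSD tables differ.
COLUMNS = {"entry_id", "resolution", "method", "organism"}
OPERATORS = {"=", "<", ">"}


def build_sql(description):
    """Server side: turn a JSON query description into a parameterised SELECT."""
    query = json.loads(description)
    show = query["show"]
    if not show or not set(show) <= COLUMNS:
        raise ValueError("invalid result fields")
    match = query.get("match", "all")
    if match not in ("all", "any"):
        raise ValueError("match must be 'all' or 'any'")
    clauses, params = [], []
    for cond in query["where"]:
        if cond["field"] not in COLUMNS or cond["op"] not in OPERATORS:
            raise ValueError(f"invalid condition: {cond}")
        clauses.append(f"{cond['field']} {cond['op']} ?")
        params.append(cond["value"])
    sql = f"SELECT {', '.join(show)} FROM structures"
    if clauses:
        # "all" joins the conditions with AND, "any" with OR.
        sql += " WHERE " + (" AND " if match == "all" else " OR ").join(clauses)
    order = query.get("order_by")
    if order is not None:
        if order not in COLUMNS:
            raise ValueError("invalid sort field")
        sql += f" ORDER BY {order}"
    return sql, params


# Client side: the graphically built query reduced to a JSON message.
description = json.dumps({
    "match": "all",
    "where": [
        {"field": "method", "op": "=", "value": "X-RAY DIFFRACTION"},
        {"field": "resolution", "op": "<", "value": 2.0},
    ],
    "show": ["entry_id", "resolution"],
    "order_by": "resolution",
})

# Toy table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structures (entry_id TEXT, resolution REAL, method TEXT, organism TEXT)")
conn.executemany("INSERT INTO structures VALUES (?, ?, ?, ?)", [
    ("E1", 1.8, "X-RAY DIFFRACTION", "Homo sapiens"),
    ("E2", 2.5, "X-RAY DIFFRACTION", "Mus musculus"),
    ("E3", None, "SOLUTION NMR", "Homo sapiens"),
    ("E4", 1.2, "X-RAY DIFFRACTION", "Escherichia coli"),
])
sql, params = build_sql(description)
print(conn.execute(sql, params).fetchall())
```

Keeping the client's message declarative means the server alone decides which SQL is run.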
5 MSD Atlas Pages
You can download coordinates and structure factors (if available).
6 Structural Similarity: MSDfold
7 If you have to ask...
- Are there any structures in the PDB that are similar to mine?
- What SCOP and/or CATH family could my structure belong to?
- Can I get some idea of the possible function of my protein based on its structural similarity to others?
- How do I get a multiple alignment of many of my structures?
8 Structure Alignment
Structure alignment may be defined as the identification of residues occupying equivalent geometrical positions.
- Unlike in sequence alignment, residue type is neglected
- Used for:
  - measuring structural similarity
  - protein classification and functional analysis
  - database searches
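Deciding which residues occupy equivalent positions presupposes a best superposition of the two structures. For a given list of residue pairs, the least-squares r.m.s.d. after optimal rotation and translation can be computed with Horn's quaternion method. This pure-Python sketch uses that standard technique for illustration only (the slides do not say how MSDfold superposes structures) and finds the largest eigenvalue of Horn's 4×4 key matrix by shifted power iteration. Only coordinates enter the calculation; residue types play no role.

```python
import math


def centred(coords):
    """Subtract the centroid from every point."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]


def superposition_rmsd(moving, target, iterations=2000):
    """Least-squares r.m.s.d. of two paired coordinate lists after optimal
    rotation and translation (Horn's quaternion method)."""
    if len(moving) != len(target) or not moving:
        raise ValueError("need two non-empty lists of paired coordinates")
    a, b = centred(moving), centred(target)
    # Cross-covariance S[i][j] = sum over pairs of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in a + b for v in p)  # squared norms of both centred sets
    # Shifting by the Frobenius norm makes every eigenvalue non-negative, so power
    # iteration converges to the eigenvector of the largest eigenvalue of key.
    shift = math.sqrt(sum(v * v for row in key for v in row))
    if shift == 0.0:
        return math.sqrt(g / len(a))
    vec = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        nxt = [sum(key[i][j] * vec[j] for j in range(4)) + shift * vec[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector: the largest eigenvalue of key.
    lam = sum(vec[i] * key[i][j] * vec[j] for i in range(4) for j in range(4))
    # Minimum sum of squared deviations = g - 2 * lambda_max.
    return math.sqrt(max(0.0, g - 2.0 * lam) / len(a))


moving = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0), (0.7, 0.3, 2.0)]
# The same points rotated 90 degrees about z and translated: (x, y, z) -> (-y + 10, x - 5, z + 3).
target = [(-y + 10.0, x - 5.0, z + 3.0) for x, y, z in moving]
# Move one target point by 0.5 along z so that no rigid fit is exact.
perturbed = target[:4] + [(target[4][0], target[4][1], target[4][2] + 0.5)]
print(superposition_rmsd(moving, target))     # close to 0
print(superposition_rmsd(moving, perturbed))  # small but non-zero
```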
9 Methods
- Distance matrix alignment (DALI; Holm & Sander, EBI)
- Vector alignment (VAST; Bryant et al., NCBI)
- Depth-first recursive search on SSEs (DEJAVU; Madsen & Kleywegt, Uppsala)
- Combinatorial extension (CE; Shindyalov & Bourne, SDSC)
- Dynamic programming on Cα atoms (Gerstein & Levitt)
- Dynamic programming on SSEs (SSA; Singh & Brutlag, Stanford University)
- Many others
- MSDfold (SSM) employs a two-step procedure:
  - initial structure alignment and superposition using SSE graph matching
  - Cα-alignment
10 Graph representation of SSEs
E. M. Mitchell et al. (1990) J. Mol. Biol. 212:151
SSE graphs differ from conventional chemical
graphs only in that they are labelled by vectors
of properties. In graph matching, the labels are
compared with tolerances chosen empirically.
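An SSE graph of this kind can be sketched as follows. Each vertex is an SSE labelled with its type and length, and each edge is labelled with the distance between the midpoints of two SSEs and the angle between their axes. Representing an SSE by the coordinates of its first and last residue, this particular choice of labels and the tolerance values are illustrative assumptions, not the values used by MSDfold, which the text says are chosen empirically.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    kind: str        # "H" for helix, "E" for strand
    start: tuple     # (x, y, z) of the first residue of the element
    end: tuple       # (x, y, z) of the last residue of the element
    n_residues: int


def midpoint(s):
    return tuple((p + q) / 2 for p, q in zip(s.start, s.end))


def axis(s):
    return tuple(q - p for p, q in zip(s.start, s.end))


def angle_deg(u, v):
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def build_graph(sses):
    """Vertex labels: (type, length); edge labels: (midpoint distance, axis angle)."""
    vertices = [(s.kind, s.n_residues) for s in sses]
    edges = {}
    for i in range(len(sses)):
        for j in range(i + 1, len(sses)):
            d = math.dist(midpoint(sses[i]), midpoint(sses[j]))
            edges[(i, j)] = (d, angle_deg(axis(sses[i]), axis(sses[j])))
    return vertices, edges


# Illustrative tolerances (assumed, not MSDfold's values).
LENGTH_TOL = 3      # residues
DIST_TOL = 3.0      # Å
ANGLE_TOL = 30.0    # degrees


def vertices_match(v, w):
    return v[0] == w[0] and abs(v[1] - w[1]) <= LENGTH_TOL


def edges_match(e, f):
    return abs(e[0] - f[0]) <= DIST_TOL and abs(e[1] - f[1]) <= ANGLE_TOL


helix = SSE("H", (0, 0, 0), (0, 0, 10), 7)
strand = SSE("E", (6, 8, 0), (6, 8, 10), 4)
anti = SSE("E", (6, 8, 10), (6, 8, 0), 4)   # same position as strand, opposite direction
vertices, edges = build_graph([helix, strand, anti])
print(vertices, edges)
```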
11 SSE graph matching
Matching the SSE graphs yields a correspondence between secondary structure elements, that is, between groups of residues. The correspondence may be used as an initial guess for structure superposition and for the alignment of individual residues.
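Matching two SSE graphs amounts to finding the largest correspondence between their vertices in which matched vertices have compatible labels and every pair of matched vertices is joined by edges with compatible labels, i.e. a maximum common subgraph. A brute-force backtracking search, practical only for a handful of SSEs, can be sketched as below; it illustrates the idea and is not the algorithm used in SSM. The label layout and tolerances are assumptions.

```python
# Vertex labels: (type, length). Edge labels keyed by (i, j) with i < j:
# (midpoint distance in Å, inter-axis angle in degrees). Tolerances are assumed.
LENGTH_TOL, DIST_TOL, ANGLE_TOL = 3, 3.0, 30.0


def edge(edges, i, j):
    return edges[(i, j)] if i < j else edges[(j, i)]


def compatible(v1, e1, v2, e2, mapping, i, j):
    """Can vertex i of graph 1 be matched to vertex j of graph 2, given the mapping so far?"""
    (t1, n1), (t2, n2) = v1[i], v2[j]
    if t1 != t2 or abs(n1 - n2) > LENGTH_TOL:
        return False
    for k, l in mapping.items():
        d1, a1 = edge(e1, i, k)
        d2, a2 = edge(e2, j, l)
        if abs(d1 - d2) > DIST_TOL or abs(a1 - a2) > ANGLE_TOL:
            return False
    return True


def match_graphs(v1, e1, v2, e2):
    """Return the largest label-consistent mapping {vertex in graph 1: vertex in graph 2}."""
    best = {}

    def extend(i, mapping, used):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        # Stop when no vertices are left or the best mapping can no longer be beaten.
        if i == len(v1) or len(mapping) + (len(v1) - i) <= len(best):
            return
        for j in range(len(v2)):
            if j not in used and compatible(v1, e1, v2, e2, mapping, i, j):
                mapping[i] = j
                used.add(j)
                extend(i + 1, mapping, used)
                del mapping[i]
                used.discard(j)
        extend(i + 1, mapping, used)  # also try leaving vertex i unmatched

    extend(0, {}, set())
    return best


# Made-up example graphs.
v1 = [("H", 10), ("E", 5), ("E", 6)]
e1 = {(0, 1): (10.0, 20.0), (0, 2): (12.0, 160.0), (1, 2): (4.5, 170.0)}
v2 = [("E", 5), ("H", 11), ("E", 7), ("H", 20)]
e2 = {(0, 1): (11.0, 25.0), (0, 2): (5.0, 175.0), (0, 3): (30.0, 90.0),
      (1, 2): (13.0, 150.0), (1, 3): (25.0, 60.0), (2, 3): (28.0, 80.0)}
print(match_graphs(v1, e1, v2, e2))
```

Exhaustive search grows exponentially with the number of SSEs, which is why practical tools rely on more elaborate matching algorithms.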
12 Cα-alignment
- The SSE alignment is used as an initial guess for the Cα-alignment
- The Cα-alignment is an iterative procedure based on the expansion of shortest contacts at the best superposition of the structures
- The Cα-alignment is a compromise between the alignment length Nalign and the r.m.s.d.: the longest contacts are unmapped in order to maximise the Q-score
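The trade-off can be made concrete with the Q-score as published for SSM by Krissinel and Henrick (2004): Q = Nalign² / ((1 + (r.m.s.d./R0)²) · N1 · N2), where N1 and N2 are the numbers of residues in the two structures and R0 = 3 Å. The sketch below keeps the k shortest contacts and chooses the k that gives the highest Q. As a simplifying assumption it recomputes the r.m.s.d. from the remaining contact distances at a fixed superposition, whereas the actual procedure re-superposes the structures as the alignment changes.

```python
import math

R0 = 3.0  # Å, the constant in the SSM Q-score


def q_score(n_align, rmsd, n1, n2):
    """Q = Nalign^2 / ((1 + (rmsd / R0)^2) * N1 * N2)."""
    return n_align ** 2 / ((1.0 + (rmsd / R0) ** 2) * n1 * n2)


def trim_longest_contacts(distances, n1, n2):
    """Keep the k shortest contacts, choosing k to maximise Q.

    distances: Cα-Cα distances (Å) of the aligned pairs at a fixed superposition
    (a simplification; no re-superposition is done here).
    Returns (best_q, kept_distances).
    """
    kept = sorted(distances)
    best_q, best_n = 0.0, 0
    total = 0.0  # running sum of squared distances of the k shortest contacts
    for k, d in enumerate(kept, start=1):
        total += d * d
        q = q_score(k, math.sqrt(total / k), n1, n2)
        if q > best_q:
            best_q, best_n = q, k
    return best_q, kept[:best_n]


# Eight close contacts and two long ones between two 10-residue structures.
q, kept = trim_longest_contacts([1.0] * 8 + [9.0, 10.0], 10, 10)
print(q, len(kept))
```

Here unmapping the two long contacts shortens Nalign from 10 to 8 but lowers the r.m.s.d. enough that Q rises.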
13 Using MSDfold
Discover hitherto unknown relationships
88% structural identity
11% sequence identity
14 MSDfold Search Interface
15 MSDfold Output
- Table of matched secondary structure elements
- Table of matched backbone Cα atoms, with the distances between them at the best structure superposition
- Rotation-translation matrix of the best structure superposition
- Visualisation in Jmol and RasMol
- r.m.s.d. of the Cα-alignment
- Length of the Cα-alignment, Nalign
- Number of gaps in the Cα-alignment
- Quality score Q
- Statistical significance scores P(S), Z
- Sequence identity
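Several of these quantities follow directly from the residue correspondence and the superposition. The sketch below applies a given rotation-translation (x' = Rx + t) to the Cα atoms of the first structure and reports per-pair distances, r.m.s.d., Nalign, number of gaps and sequence identity. The input layout is hypothetical, and counting a gap wherever either residue index jumps by more than one, and taking sequence identity over Nalign, are our assumptions; MSDfold's exact definitions may differ.

```python
import math


def apply(rot, shift, p):
    """Rotate point p with the 3x3 matrix rot, then translate by shift."""
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))


def summarise(pairs, coords1, coords2, seq1, seq2, rot, shift):
    """pairs: aligned residue indices (i, j), in increasing order in both structures."""
    dists = [math.dist(apply(rot, shift, coords1[i]), coords2[j]) for i, j in pairs]
    n_align = len(pairs)
    rmsd = math.sqrt(sum(d * d for d in dists) / n_align)
    # Assumed gap definition: a break in consecutive numbering in either structure.
    gaps = sum(1 for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]) if i1 - i0 > 1 or j1 - j0 > 1)
    # Assumed definition: identical residue types as a fraction of aligned pairs.
    identity = sum(seq1[i] == seq2[j] for i, j in pairs) / n_align
    return {"distances": dists, "rmsd": rmsd, "n_align": n_align,
            "gaps": gaps, "seq_identity": identity}


# Made-up Cα traces; structure 2 has one inserted residue.
coords1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
coords2 = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (50.0, 50.0, 50.0), (7.6, 0.0, 1.0), (11.4, 0.0, 1.0)]
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
pairs = [(0, 0), (1, 1), (2, 3), (3, 4)]
print(summarise(pairs, coords1, coords2, "ACDE", "AGXDE", IDENTITY, (0.0, 0.0, 1.0)))
```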
16 Results Page for Pairwise Alignment
17 Specific Pairwise Results
18 Structural Alignment
Residue-by-residue structural alignment result
19 Multiple 3D Alignment
- More than two structures are aligned simultaneously
- A multiple alignment is not equal to the set of all-to-all pairwise alignments
- Helps to identify common structural motifs for a whole family of structures
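One reason a multiple alignment differs from a collection of pairwise alignments is that independently computed pairwise alignments need not be mutually consistent: residue a of structure A may be aligned to b in B, and b to c in C, while the direct A-C alignment pairs a with a different residue. The hypothetical check below, with alignments given as residue-index dictionaries, lists such transitivity violations and the columns on which all three alignments agree; a simultaneous multiple alignment has to reconcile the former into one set of columns.

```python
def inconsistent_triples(ab, bc, ac):
    """ab, bc, ac: pairwise alignments as {residue in first: residue in second}.

    Returns (a, b, c_via_b, c_direct) for every residue a whose A->B->C path
    disagrees with the direct A->C alignment.
    """
    problems = []
    for a, b in sorted(ab.items()):
        if b in bc and a in ac and bc[b] != ac[a]:
            problems.append((a, b, bc[b], ac[a]))
    return problems


def consistent_columns(ab, bc, ac):
    """Columns (a, b, c) on which all three pairwise alignments agree."""
    return [(a, b, bc[b]) for a, b in sorted(ab.items()) if b in bc and ac.get(a) == bc[b]]


# Made-up pairwise alignments of three structures A, B and C.
ab = {1: 1, 2: 2, 3: 4}
bc = {1: 2, 2: 3, 4: 5}
ac = {1: 2, 2: 3, 3: 6}
print(inconsistent_triples(ab, bc, ac))  # prints [(3, 4, 5, 6)]
print(consistent_columns(ab, bc, ac))    # prints [(1, 1, 2), (2, 2, 3)]
```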
20 Multiple 3D Alignment Interface
21 Results from Multiple 3D Alignment
22 Conclusions from use of MSDfold
- Residue identity may play a much less significant role in protein structure than is often believed
- As a consequence, the role of residue identity in protein function may often be overestimated
- Using sequence identity to assess structural or functional features may give more false negatives than expected
- Physicochemical properties of residues should be given preference over residue identity in structure and function analysis
- Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies