Structural Characterization of Proteins Using Residue Environments

1
Structural Characterization of Proteins Using
Residue Environments
  • Sean D. Mooney, Mike Hsin-Ping Liang, Rob
    DeConde, and Russ B. Altman
  • Stanford University, Indiana University
  • PROTEINS: Structure, Function, and Bioinformatics
  • 2005

2
Outline
  • Introduction
  • Materials and Methods
  • Results
  • Discussions
  • Conclusions

3
Introduction
  • A primary challenge for structural genomics is
    the automated functional characterization of
    protein structure.
  • Current structural methods for identifying
    function rely on one of the following:
      • Phylogenetic trees derived from sequence
        similarity (evolutionary trace)
      • Hand-curated molecular fingerprints (template
        based)
      • Fold recognition and alignment methods (structure
        comparison)
  • Sequence-based methods for functional
    characterization rely on identifying conserved
    residues within protein structures

4
Introduction (cont)
  • It is important to develop sequence-independent
    methods for identifying function to complement
    sequence-based methods when they are limited by:
      • Lack of sequence similarity
      • Small datasets
  • Methods for identifying key functional residues,
    or molecular fingerprints, can classify function

5
Introduction (cont)
  • Sequence-independent structure-based methods for
    function assignment are challenging for several
    reasons:
      • Aligning local structure is a difficult
        computational task
      • Estimating the statistical significance of the
        results is challenging
      • Scanning through the entire Protein Data Bank
        (PDB) can be computationally demanding
      • Structural similarity and functional similarity
        are not always well correlated

6
Goal
  • Develop a method for unsupervised mining of
    structural datasets and automatically identifying
    local regions within protein structures that are
    statistically associated with a given annotation
  • Define the most structurally significant residue
    environments for a given classification, based on
    the structural environments represented in that
    database

7
Methods
  • S-BLEST (Structure-Based Local Environment Search
    Tool)
  • Based on the FEATURE representation of a local
    environment
  • Rapidly search databases of vectors of local
    structure properties
  • S-BLEST method relies on k nearest neighbor (KNN)
    searches using a Manhattan distance metric
  • Significance score (z-score)
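
The search step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_manhattan`, the toy vectors, and in particular the z-score definition (each distance standardized against the mean and standard deviation of all query-to-database distances, so that closer neighbors receive more negative scores) are assumptions made for the example.

```python
import numpy as np

def knn_manhattan(query, database, k=3):
    """Return (indices, distances, z_scores) of the k nearest vectors.

    Assumption: the z-score standardizes each distance against the
    distribution of all query-to-database distances.
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    # Manhattan (L1) distance from the query to every database vector.
    distances = np.abs(database - query).sum(axis=1)
    std = distances.std()
    if std == 0:
        # All distances identical: no vector stands out.
        z_scores = np.zeros_like(distances)
    else:
        z_scores = (distances - distances.mean()) / std
    # Indices of the k smallest distances, nearest first.
    nearest = np.argsort(distances, kind="stable")[:k]
    return nearest, distances[nearest], z_scores[nearest]

# Toy database of four 3-dimensional "environment" vectors (made up).
db = [[0, 0, 0], [1, 1, 1], [5, 5, 5], [9, 9, 9]]
idx, dist, z = knn_manhattan([1, 1, 2], db, k=2)
```

A brute-force scan like this is linear in the database size; a real tool that needs to search rapidly would presumably use an index structure, which the slide does not describe.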

8
Local Environment
  • Local structure environment
      • A set of concentric shells extending outward from
        the position of a residue's Cβ atom
  • Glycine residues
      • Cβ atom positions were estimated as the
        average position of a Cβ in serine
        protease 1DSU
      • Glycine is the simplest of the 20 standard amino
        acids; its side chain is a single hydrogen atom
  • Each shell contains 66 properties, such as
      • Number of atoms associated with a given residue type
      • Number of positively and negatively charged ions
      • The van der Waals volume of the shell
      • The solvent accessibility
  • Four radial boundaries
      • 1.875, 3.75, 5.625, 7.5 Å
  • Vector dimension: 4 shells × 66 properties = 264
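
As a rough illustration of the shell layout, the sketch below assigns atoms to the four shells by their distance from a center point and checks the vector dimension. Only a per-shell atom count is computed, standing in for the 66 properties. The helper names `shell_index` and `count_atoms_per_shell`, the inclusive outer boundary, and the coordinates are assumptions for the example.

```python
import math

# Outer radii (in Å) of the four concentric shells listed on the slide.
SHELL_BOUNDARIES = (1.875, 3.75, 5.625, 7.5)
PROPERTIES_PER_SHELL = 66

def shell_index(distance):
    """Return the 0-based shell containing `distance`, or None beyond 7.5 Å.

    Assumption: each shell includes its outer boundary.
    """
    for i, outer in enumerate(SHELL_BOUNDARIES):
        if distance <= outer:
            return i
    return None

def count_atoms_per_shell(center, atom_coords):
    """Count atoms in each shell around `center` (e.g. a Cβ position)."""
    counts = [0] * len(SHELL_BOUNDARIES)
    for coord in atom_coords:
        i = shell_index(math.dist(center, coord))
        if i is not None:
            counts[i] += 1
    return counts

# 4 shells x 66 properties per shell gives the 264-dimensional vector.
vector_dimension = len(SHELL_BOUNDARIES) * PROPERTIES_PER_SHELL

# Made-up atoms at distances 1, 3, 5, 7 and 8 Å from the origin.
atoms = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 5.0),
         (7.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
counts = count_atoms_per_shell((0.0, 0.0, 0.0), atoms)
```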

Figures: glycine (1); serine protease 1DSU
(1) http://en.wikipedia.org/wiki/Glycine
9
Materials
  • Datasets
      • ASTRAL 40 non-redundant structure database
        (ASTRAL 1.65)
      • X-ray crystal structures only
  • Steps for data cleaning
      • Remove all hetero-atoms (PDB HETATM)
      • Normalization
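
The hetero-atom removal step might look like the sketch below, which relies only on the fact that PDB-format records begin with their record name (`HETATM` for hetero-atoms). The function name `strip_hetatm` and the two coordinate lines are made up for illustration; the normalization step is not shown because the slide does not specify it.

```python
def strip_hetatm(pdb_lines):
    """Drop HETATM records; keep every other line unchanged."""
    return [line for line in pdb_lines if not line.startswith("HETATM")]

# Illustrative records only, not taken from a real PDB entry.
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM  900  O   HOH A 201      10.000  10.000  10.000  1.00 20.00           O",
    "END",
]
cleaned = strip_hetatm(example)
```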

10
Identification of Residue Environments
Associated With a Structural or Functional
Annotation
  • The performance of each residue can be
    determined by creating a receiver operating
    characteristic (ROC) plot of the ranking
      • TP: a protein structure that belongs to the same
        SCOP family as the query protein, with a z-score
        of greater magnitude than the threshold
      • FP: a protein structure that does not belong to
        the same SCOP family but has a z-score of greater
        magnitude than the threshold
  • The area under the ROC curve (AUC) of a residue in a
    query structure of known function indicates how well
    the residue environment classifies the SCOP family of
    the structure and can range from 0.0 to 1.0.
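
One way to compute such an AUC is the pairwise (Mann-Whitney) form sketched below: the fraction of same-family/different-family pairs in which the same-family structure has the more negative z-score. The function name `roc_auc`, the convention that more negative means more similar, and the toy scores are assumptions for the example.

```python
def roc_auc(z_scores, same_family):
    """AUC as the probability that a same-family structure has a more
    negative z-score than a different-family one. Ties count as half."""
    pos = [z for z, y in zip(z_scores, same_family) if y]
    neg = [z for z, y in zip(z_scores, same_family) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up z-scores for five database structures and their family labels.
z = [-7.2, -6.1, -5.8, -3.0, -1.5]
labels = [True, True, False, True, False]
auc = roc_auc(z, labels)
```

An AUC of 1.0 means every same-family structure outranks every other structure, and 0.5 corresponds to a random ranking.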

11
Congruence Approach for Combining S-BLEST Searches
  • Congruence approaches (Shotgun) are a useful way
    to combine several searches to increase
    statistical significance.
  • If the input is a query with multiple residues (a
    query chain):
      • Each residue in the query chain → the most similar
        residue in each dataset chain (subject to a z-score
        threshold)
      • If there are n residues in the query chain, there
        are n residues (possibly redundant) in each
        dataset chain identified as most similar
        to each of the n residues in the query, each with
        a z-score.
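
The matching step described above can be sketched as follows, assuming a precomputed matrix of z-scores between every query residue (rows) and every residue of a dataset chain (columns). How the per-residue matches are combined into a chain-level ranking is not stated on the slide; ranking by the number of passing matches is purely an assumption here, as are the names `best_matches` and `rank_chains` and the toy values. The default threshold of -5.5 is the value quoted later in the deck.

```python
def best_matches(z_matrix, threshold):
    """For each query residue (row), pick the dataset residue (column)
    with the most negative z-score; keep it if it passes the threshold."""
    matches = []
    for q_idx, row in enumerate(z_matrix):
        d_idx = min(range(len(row)), key=row.__getitem__)
        if row[d_idx] <= threshold:
            matches.append((q_idx, d_idx, row[d_idx]))
    return matches

def rank_chains(z_by_chain, threshold=-5.5):
    """Rank dataset chains by how many query residues found a passing match.

    Assumption: the match count serves as the combined score.
    """
    results = {name: best_matches(m, threshold) for name, m in z_by_chain.items()}
    return sorted(results.items(), key=lambda item: len(item[1]), reverse=True)

# Toy z-scores: 3 query residues against two 2-residue dataset chains.
z_by_chain = {
    "chainA": [[-6.0, -1.0], [-2.0, -7.5], [-0.5, -5.9]],
    "chainB": [[-1.0, -2.0], [-6.2, -0.3], [-3.0, -1.0]],
}
ranking = rank_chains(z_by_chain)
```

In the toy data, two query residues both map to the second residue of `chainA`, which illustrates the "possibly redundant" matches mentioned on the slide.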

12
Identification of Structurally Similar Residue
Environments in ASTRAL 40 v1.65
  • ASTRAL 40 v1.65 encoded 4129 crystallographically
    determined structures

Figure: distance histogram distribution (2TRXA)
13
Identification of the Residue Environments
Associated With a Structural Class
Figure: 1DI9A (AUC)
Yellow: ATP binding site; gray: peptide binding
channel; red: phosphorylated(???) residue
14
ROC of the Ranked Chains Output by the
Congruence Approach
Of the 27 members in our dataset, the first 25
chains ranked were true positives, whereas the
method failed to recognize 1KOA and 1FMK as
structurally similar (AUC is 0.935).
15
Congruence Approach to Characterize Protein
Structures
  • Goal - to show that S-BLEST finds structurally
    similar environments with potential implications
    for fold, family, and function.
  • Select 100 random SCOP families in ASTRAL 40
  • X-ray crystallographic structures only
  • Z-score threshold for each protein is -5.5

16
ASTRAL 40 v1.65 (100 random members)
Figure axes: PPV (positive predictive value) and z-score
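
For reference, positive predictive value is TP / (TP + FP). The sketch below evaluates it at a given z-score cutoff, treating every structure with a z-score at or below the cutoff as a predicted hit. The function name `ppv_at_threshold` and the toy scores are assumptions for illustration and are not data from the slide.

```python
def ppv_at_threshold(z_scores, same_family, threshold):
    """PPV = TP / (TP + FP) among hits with z-score <= threshold."""
    hits = [y for z, y in zip(z_scores, same_family) if z <= threshold]
    if not hits:
        return None  # no predictions at this threshold
    # True counts as 1, so the sum is the number of true positives.
    return sum(hits) / len(hits)

# Made-up z-scores and same-family labels.
z = [-8.0, -6.5, -6.0, -5.2, -4.0]
labels = [True, True, False, True, False]
ppv_strict = ppv_at_threshold(z, labels, -6.2)
ppv_loose = ppv_at_threshold(z, labels, -5.0)
```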
17
Analysis of Uncharacterized PDB Structures
  • 86 structures
  • 86 of these structures had no significant hits
    when searched against the PDB using BLAST with
    an e-value cutoff of 1e-4
  • How these 86 structures were obtained: search the
    PDB for the phrase "unknown function"

18
Hit Results from the 86 Structures with Unknown
Function (1/5)
True positive: 1VGYA, 1LFWA (SCOP c.56.5.4)
Matched residues: ARG97/ARG115, HIS68/HIS87, ASP70/ASP89,
GLY98/GLY112, GLU136/GLU154
z-score -6.36, e-value 3e-4
19
Hit Results from the 86 Structures with Unknown
Function (2/5)
Figure: 1VGYA (AUC)
20
Hit Results from the 86 Structures with Unknown
Function (3/5)
Of the five true positives in our dataset, three
were the top hits, the fourth was in position
five, and the fifth was ranked 65th overall (AUC
is 0.995)
21
Hit Results from the 86 Structures with Unknown
Function (4/5)
Questionable significance: 1B3UA, 1OYZA
Clearly structurally related; the best
residue matches occur between secondary structure
elements (SSEs) and are often observed bridging them.
z-score -5.21
22
Hit Results from the 86 Structures with Unknown
Function (5/5)
Possible unknown hit: 1LJOA, 1B34A
Proteins share the same fold, but their
functional relationship is not known.
z-score -5.64, e-value 1e-7
23
Discussions
  • This method is intended to identify statistically
    significant environments in protein structures
    and will be complementary to both sequence-based
    methods such as BLAST or HMMs and fold
    recognition methods
  • Analysis of a random member from each of 100
    random families (SCOP)
      • S-BLEST (z-score threshold of -5.1) finds 28
        SCOP family members that BLAST (E-value
        threshold of 1e-5) does not find
      • BLAST finds 89 family members that S-BLEST does
        not find
      • Local structural variability between the proteins
      • However, of the 66 family-level false positives,
        all but 13 share the superfamily of the query
      • For each BLAST hit, the degree of structural
        conservation of each residue environment can be
        easily determined using S-BLEST.
  • Enzymes
      • Many residues that were annotated as being
        important for enzyme chemistry are not the ones
        that are most useful for recognizing structural
        similarities.
      • The method sometimes does not select the critical
        residues (such as the catalytic triad), likely
        because the environments around those residues
        are structurally variable between members.

24
Conclusions
  • We developed S-BLEST to meet a need for rapidly
    identifying similar structures to a query protein
    using local structural environment.
  • S-BLEST identifies constellations of structurally
    similar residues between the query protein and
    the full database of known protein structures.
  • We found that many of the structural environments
    in SCOP have statistically significant local
    environment neighbors.
  • S-BLEST was able to associate 20 proteins with
    at least one local structure neighbor and
    identify the amino acid environments that are most
    similar between those neighbors.