Author: Jason Weston et al.

Slides: 26
Provided by: Xin110
Transcript and Presenter's Notes

1
Protein Ranking: From Local to Global Structure
in the Protein Similarity Network
  • Authors: Jason Weston et al.
  • PNAS
  • Presented by Tie Wang

2
Outline
  • Introduction
  • Background
  • Method
  • Experiment
  • Analysis

3
Introduction
  • Subtle pairwise sequence similarities imply
    structural, functional, and evolutionary relations
    among DNA and protein sequences
  • Searching for biosequences in an online database is
    analogous to searching the WWW (a search engine
    searches the database for the query and returns a
    ranked list)
  • A protein ranking algorithm is presented for
    biosequence queries

4
Background
  • Early algorithms focus only on pairwise sequence
    similarity (Smith-Waterman local alignment search)
  • Statistical models use multiple alignments for
    similarity search (profile-based methods, PSI-BLAST)
  • Global similarity search can be mapped onto a
    protein similarity network.

5
How to perform protein ranking?
  • Underlying idea: Google ranking
  • Key feature: exploiting global structure by
    inferring it from local hyperlink structure.
  • Construct a protein similarity network
  • Add query sequence
  • Weight diffusion
  • Rank proteins upon convergence
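
The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the update rule (each protein keeps its direct similarity to the query and adds a damped sum of its neighbours' activations), the damping factor alpha, the fixed iteration count, and the toy weights are all hypothetical choices made for illustration.

```python
def rank_by_diffusion(weights, query_weights, alpha=0.5, iterations=50):
    """Diffuse activation from a query through a weighted similarity network.

    weights[j][i]    -- hypothetical edge weight from protein j to protein i
    query_weights[i] -- hypothetical edge weight from the query to protein i
    alpha            -- assumed damping factor in (0, 1)
    """
    n = len(query_weights)
    scores = list(query_weights)  # start from the direct (local) similarities
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Activation arriving at protein i from its neighbours.
            incoming = sum(weights[j][i] * scores[j] for j in range(n) if j != i)
            updated.append(query_weights[i] + alpha * incoming)
        scores = updated
    # Rank proteins by final activation, highest first.
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy network: the query resembles protein 0 only; protein 0 resembles
# protein 1; protein 2 is unrelated to everything.
toy_weights = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
toy_query = [0.9, 0.0, 0.0]
ranking, scores = rank_by_diffusion(toy_weights, toy_query)
print(ranking, scores)
```

In the toy network, protein 1 has no direct edge from the query but still receives activation through protein 0, which is the local-to-global effect the slide describes. The iteration only settles when alpha times the edge weights is small enough, so real weights would need to be scaled accordingly.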

6
Algorithm
7
Experiment
  • Use the protein 3-D structure database SCOP as the
    gold standard.
  • No two sequences have more than 95% similarity.
  • 7329 proteins are split into 379 superfamilies
    for training and 332 for testing
  • 3 networks are generated using BLAST and
    PSI-BLAST.

8
Experiment
  • Value
  • Compare with two other experiments:
  • 1. only local structure is considered
  • 2. non-local edges are used, but without weak edges
  • The results show that the second is only
    slightly worse than our algorithm


where Sj(i) is the E-value assigned to protein i
given query j.
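
The formula that this definition belongs to is not preserved in the transcript. As a minimal sketch, assuming the edge weight is a decreasing exponential of the E-value, exp(-Sj(i) / sigma), the conversion could look like this. The functional form, the name evalue_to_weight, and the default sigma are assumptions for illustration only.

```python
import math


def evalue_to_weight(e_value, sigma=100.0):
    """Map an E-value to an edge weight in (0, 1].

    Assumed form: exp(-E / sigma). Small E-values (strong hits) give
    weights near 1; large E-values (weak hits) give weights near 0.
    sigma is a hypothetical width parameter.
    """
    return math.exp(-e_value / sigma)


# A strong hit, a borderline hit, and a weak hit.
weights = [evalue_to_weight(e) for e in (1e-30, 1.0, 1000.0)]
print(weights)
```

A larger sigma keeps more weak edges alive in the network, while a smaller sigma makes the network closer to one built from strong hits only.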
9
Analysis
Bower et al, Science vol 306, 2004
Cluster structure
10
Motif-based protein ranking by network propagation
  • Authors: Kuang Rui et al.
  • Bioinformatics
  • Presented by Tie Wang

11
Outline
  • Introduction
  • Background
  • Method
  • Experiment
  • Analysis

12
Background
  • Direct measures of pairwise sequence similarity have
    proved effective for classification.
  • Performance drops when detecting subtle, remotely
    homologous sequences.
  • Such sequences share a conserved structure, at
    least in some components.
  • The problem is formulated on this observation.

13
Protein motif bipartite network
  • Each protein contains a set of motifs.
  • Each motif belongs to a set of proteins.
  • Their relationships are mapped to a bipartite
    graph, as shown on the left.
  • The edge weight indicates the probability
    that motif x is in protein y.

14
MotifProp Algorithm
  • Set P represents protein sequences and set F
    represents motifs. H is the connectivity matrix.

The update rule uses the row-normalized version of H,
a vector of initial values for F, and
a vector of initial values for P.
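
The update equations for this slide are not preserved in the transcript. The following is a minimal sketch of propagation on a protein-motif bipartite graph, assuming a damped alternating update in which motif nodes collect activation from proteins and protein nodes collect activation from motifs. The update form, the damping factor alpha, the function names, and the toy matrix are assumptions for illustration, not the published rule.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def bipartite_propagation(h, p0, f0, alpha=0.5, iterations=50):
    """Alternate activation between protein nodes P and motif nodes F.

    h[i][k] -- connectivity between protein i and motif k
    p0, f0  -- initial values for the protein and motif nodes
    alpha   -- assumed damping factor in (0, 1)
    """
    h_pf = row_normalize(h)             # weights used when updating proteins
    h_fp = row_normalize(transpose(h))  # weights used when updating motifs
    p, f = list(p0), list(f0)
    for _ in range(iterations):
        # Motifs collect activation from the proteins that contain them.
        f = [(1 - alpha) * f0[k]
             + alpha * sum(w * p[i] for i, w in enumerate(h_fp[k]))
             for k in range(len(f0))]
        # Proteins collect activation from the motifs they contain.
        p = [(1 - alpha) * p0[i]
             + alpha * sum(w * f[k] for k, w in enumerate(h_pf[i]))
             for i in range(len(p0))]
    return p, f


# Toy graph: protein 0 (the query) has motif 0; protein 1 has both
# motifs; protein 2 has motif 1 only.
toy_h = [
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
]
p, f = bipartite_propagation(toy_h, p0=[1.0, 0.0, 0.0], f0=[0.0, 0.0])
print(p, f)
```

In the toy graph, protein 2 shares no motif with the query, yet it receives a small activation through protein 1, which shares a motif with each of them.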
15
MotifProp Algorithm
  • The convergence of MotifProp is guaranteed.
  • The problem is reformulated based on the
    following rule,

16
Edge weighting scheme
  • A PSI-BLAST E-value is assigned to each pair of
    protein nodes.
  • Gaussian edge weights are calculated from the E-values.
  • The Gaussian weights from the query to each protein
    are assigned as initial values.

17
Value estimation
  • Sq(i) is the E-value between protein i and query q.
  • Eq(j) is the E-value of the jth motif and ith
    protein.

18
Estimation on substitution score
  • The substitution score between a k-mer f and a
    sequence x can be estimated as,
  • where
  • and
  • sl is a log value, which implies that an S score
    below the threshold can be a motif hit against
    sequence x.

19
Sequential MotifProp
  • Empirical experiments suggest that using a
    weighted linear combination of multiple motifs
    does not improve the results.
  • Instead, apply a simple scheme with multiple motif sets.
  • Motif nodes F can be partitioned into n sets,
    in which F(i) is the set of motifs from the
    ith motif set.
  • The sets, rather than individual motifs,
    represent the motifs.

20
Motif-rich regions
21
Experiments
  • 7329 protein domains with known 3D structure from
    SCOP.
  • They are divided into training (4246) and testing
    (3083) sets.
  • An additional 10602 sequences from the Swiss-Prot
    database are added.
  • Evaluation is based on ROC curves.
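
For a single query, ROC-based evaluation of a ranked list can be sketched as below. This is a minimal sketch, assuming a ranking with no tied scores and a binary notion of relevance (for example, same superfamily as the query); the function name roc_area and the example lists are hypothetical, and the slides do not state which ROC variant was used.

```python
def roc_area(labels_in_rank_order):
    """Area under the ROC curve for one ranked list.

    labels_in_rank_order -- True for a relevant hit, False otherwise,
                            listed from best-ranked to worst-ranked.
    """
    positives = sum(1 for label in labels_in_rank_order if label)
    negatives = len(labels_in_rank_order) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")
    false_positives_so_far = 0
    correctly_ordered_pairs = 0
    for label in labels_in_rank_order:
        if label:
            # This positive outranks every negative not yet seen.
            correctly_ordered_pairs += negatives - false_positives_so_far
        else:
            false_positives_so_far += 1
    return correctly_ordered_pairs / (positives * negatives)


perfect = roc_area([True, True, False, False])
mixed = roc_area([True, False, True, False])
print(perfect, mixed)
```

A score of 1.0 means every relevant protein is ranked above every irrelevant one; a score near 0.5 is what a random ordering would give on average.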

22
Results of classification
23
Results of classification (cont)
24
Results on Motif rich region
25
Conclusion
  • Two methods for protein classification using
    protein ranking are presented.
  • A similarity matrix and a protein/motif propagation
    network are the base structures.
  • The methods are simple, but the formulation is innovative.
  • Better results compared with current approaches.
  • Analysis of the results plays an important role.