Title: Author: Jason Weston et al.
1Protein Ranking: From Local to Global Structure
in the Protein Similarity Network
- Authors: Jason Weston et al.
- PNAS
- Presented by Tie Wang
2Outline
- Introduction
- Background
- Method
- Experiment
- Analysis
3Introduction
- Subtle pairwise sequence similarities imply structural, functional, and
evolutionary relations among DNA and protein sequences.
- Searching biosequences in an online database is analogous to searching the
WWW: a search engine searches the database for the query and returns a ranked
list.
- A protein ranking algorithm is presented for biosequence queries.
4Background
- Early algorithms focus only on pairwise sequence similarity
(Smith-Waterman local alignment search).
- Statistical models use multiple alignments for similarity search
(profile-based methods, PSI-BLAST).
- Global similarity search can be mapped onto a protein similarity network.
5How to perform protein ranking?
- Underlying idea: Google ranking
- Key feature: exploiting global structure by inferring it from local
hyperlink structure.
- Construct a protein similarity network
- Add the query sequence
- Diffuse weight through the network
- Rank proteins upon convergence
6Algorithm
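The steps listed on the previous slide (build the network, add the query, diffuse weight, rank on convergence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `diffuse_scores`, the damping parameter `alpha`, the fixed iteration count, and the normalization of outgoing edge weights are all hypothetical choices made so that the toy iteration stays stable.

```python
def diffuse_scores(weights, query_weights, alpha=0.5, n_iter=100):
    """Toy weight diffusion over a protein similarity network.

    weights[j][i]    -- similarity weight of the edge from protein j to protein i
    query_weights[i] -- similarity weight of the edge from the query to protein i
    alpha            -- assumed damping factor in (0, 1)
    """
    n = len(query_weights)
    # Assumption: scale each protein's outgoing weights to sum to 1,
    # which keeps the iteration below from diverging.
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total > 0 else 0.0 for w in row])

    scores = [0.0] * n
    for _ in range(n_iter):
        # Each protein receives weight directly from the query plus a
        # damped share of the current scores of its neighbours.
        scores = [
            query_weights[i]
            + alpha * sum(normalized[j][i] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Hypothetical network: protein 0 is similar to the query, protein 1 is
# similar only to protein 0, and protein 2 is similar to nothing.
weights = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
query_weights = [1.0, 0.0, 0.0]
scores = diffuse_scores(weights, query_weights)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy network, protein 1 has no direct edge to the query but still ranks above protein 2, because it receives weight through protein 0. That is the sense in which global structure is inferred from local edges.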
7Experiment
- Use the protein 3-D structure database SCOP as the gold standard.
- Sequences have no more than 95% similarity.
- 7329 proteins are split into 379 superfamilies for training and 332 for
testing.
- 3 networks are generated using BLAST and PSI-BLAST.
8Experiment
- Value
- Compare with two other experiments:
- 1. only local structure is considered
- 2. non-local edges without weak edges
- The results show that the second one is only slightly worse than our
algorithm.
Here Sj(i) is the E-value assigned to protein i given query j.
9Analysis
Bower et al., Science, vol. 306, 2004
Cluster structure
10Motif-based protein ranking by network propagation
- Authors: Kuang Rui et al.
- Bioinformatics
- Presented by Tie Wang
11Outline
- Introduction
- Background
- Method
- Experiment
- Analysis
12Background
- Direct measures of pairwise sequence similarity have proved effective for
classification.
- Performance drops when detecting subtle remote homologs.
- Those sequences share a conserved structure, at least in some components.
- The problem is formulated based on this observation.
13Protein motif bipartite network
- Each protein contains a set of motifs.
- Each motif belongs to a set of proteins.
- Their relationships are mapped to a bipartite graph, as shown on the left.
- The edge weight indicates the probability that motif x is in protein y.
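The bipartite structure described above can be held in a simple connectivity matrix. The sketch below is illustrative only: the function name `build_connectivity`, the dictionary input format, and the protein and motif names are hypothetical, and the weights are made-up probabilities.

```python
def build_connectivity(protein_motifs):
    """Return (proteins, motifs, H), where H[p][f] is the edge weight
    between protein p and motif f (0.0 when there is no edge)."""
    proteins = sorted(protein_motifs)
    motifs = sorted({m for hits in protein_motifs.values() for m in hits})
    column = {m: k for k, m in enumerate(motifs)}
    H = [[0.0] * len(motifs) for _ in proteins]
    for row, protein in enumerate(proteins):
        for motif, weight in protein_motifs[protein].items():
            H[row][column[motif]] = weight
    return proteins, motifs, H


# Hypothetical input: each protein maps to the motifs it contains,
# with an assumed probability that the motif occurs in that protein.
protein_motifs = {
    "protA": {"motif1": 0.9, "motif2": 0.4},
    "protB": {"motif2": 0.7},
}
proteins, motifs, H = build_connectivity(protein_motifs)
```

Rows of `H` correspond to proteins and columns to motifs, so reading a row gives the motifs of one protein and reading a column gives the proteins that share one motif.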
14MotifProp Algorithm
- Set P represents protein sequences and set F represents motifs.
- H is the connectivity matrix.
- A row-normalized version of H is also defined.
- One vector holds the initial values for F.
- One vector holds the initial values for P.
15MotifProp Algorithm
- The convergence of MotifProp is guaranteed.
- The problem is reformulated based on the following rule,
- A row-normalized version of H is also defined.
- One vector holds the initial values for F.
- One vector holds the initial values for P.
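Since the update rule itself did not survive on these slides, the following is only a sketch of one plausible propagation scheme on a protein-motif bipartite graph, not the rule from the paper. It assumes that each node mixes its initial value with the normalized activation of its neighbours on the other side; the names `propagate`, `alpha`, `p_init`, and `f_init` are hypothetical.

```python
def row_normalize(matrix):
    """Scale every row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def propagate(H, p_init, f_init, alpha=0.5, n_iter=100):
    """Assumed update: new value = alpha * initial value
    + (1 - alpha) * normalized activation from the other node set."""
    H_pf = row_normalize(H)             # rows: proteins, columns: motifs
    H_fp = row_normalize(transpose(H))  # rows: motifs, columns: proteins
    p, f = list(p_init), list(f_init)
    for _ in range(n_iter):
        f_new = [alpha * f0 + (1 - alpha) * v
                 for f0, v in zip(f_init, matvec(H_fp, p))]
        p_new = [alpha * p0 + (1 - alpha) * v
                 for p0, v in zip(p_init, matvec(H_pf, f))]
        p, f = p_new, f_new
    return p, f


# Hypothetical example: protein 0 is the query and shares motif 0 with
# protein 1; protein 2 contains only motif 1.
H = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
p, f = propagate(H, p_init=[1.0, 0.0, 0.0], f_init=[0.0, 0.0])
```

Because both normalized matrices have rows summing to at most 1 and `alpha` is strictly between 0 and 1, each pass shrinks the difference between successive iterates, so this particular sketch converges. Protein 1 ends with a positive activation through the shared motif, while protein 2 stays at zero.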
16Edge weighting scheme
- PSI-BLAST E-values are assigned between pairs of protein nodes.
- Gaussian edge weights are calculated.
- The Gaussian weights from the query to each protein are assigned as initial
values.
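The slides do not give the weight formula, so the sketch below assumes an exponential kernel of the form exp(-E / sigma), which maps small E-values (strong similarity) to weights near 1. A kernel with a squared exponent would be the other natural reading of "Gaussian". The function name `edge_weight`, the value of `sigma`, and the E-values are hypothetical.

```python
import math


def edge_weight(e_value, sigma=100.0):
    """Map an E-value to a weight in (0, 1]; lower E-values give
    higher weights. Both the kernel form and sigma are assumptions."""
    return math.exp(-e_value / sigma)


# Hypothetical E-values between the query and three proteins.
query_e_values = [0.001, 5.0, 250.0]

# Initial values for the protein nodes: the query-to-protein weights.
initial_values = [edge_weight(e) for e in query_e_values]
```

The width `sigma` controls how quickly weak hits are discounted: a smaller value pushes the weights of high E-values toward zero faster.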
17Value estimation
- Sq(i) is the E-value between protein i and query q.
- Eq(j) is the E-value between the jth motif and the ith protein.
18Estimation of substitution score
- The substitution score between a k-mer f and a sequence x can be estimated as,
- where
- and
- sl is a log value, which implies that an S score below the threshold can
count as a motif hit against sequence x.
19Sequential MotifProp
- Empirical experiments suggest that using a weighted linear combination of
multiple motifs does not improve the results.
- Apply a simple scheme with multiple motif sets.
- Motif nodes F can be divided into a partition of n sets, in which F(i) is
the set of motifs from the ith motif set.
- Each F set represents a group of motifs rather than individual ones.
20Motif-rich regions
21Experiments
- 7329 protein domains with known 3D structure from SCOP.
- They are divided into training (4246) and testing (3083) sets.
- An additional 10602 sequences from the Swiss-Prot database are used.
- Evaluation is based on ROC curves.
22Results of classification
23Results of classification (cont)
24Results on Motif rich region
25Conclusion
- Two methods for protein classification based on protein ranking are
presented.
- A similarity matrix and a protein/motif propagation network are the base
structures.
- Simple methods but innovative formulations.
- Better results compared with current approaches.
- Analysis of the results plays an important role.