DDPIn Distance and Density Based Protein Indexing

Transcript and Presenter's Notes

1
DDPIn: Distance and Density Based Protein Indexing

David Hoksza
Charles University in Prague, Department of Software Engineering, Czech Republic

2
Presentation Outline

Biological background
Similarity search in protein structure databases
DDPIn
feature vector extraction
metrics
querying
one-step approach
multi-step approach
Experimental results
Conclusion

3
Biological Background

Proteins
molecules
translated from mRNA in ribosomes
DNA → RNA → protein
sequence of amino acids (20 AAs)
each AA coded by a codon (triplet of nucleotides)
Function of a protein is derived from its three-dimensional structure
→ similar proteins have similar functions
similar proteins have a common ancestor
Identifying protein structure → finding similar proteins → getting a clue to the function

4
Similarity Search in Protein Databases

Similarity between a pair of proteins
alignment similarity score
RMSD, TM-score,
visual inspection
DALI, CE, SAP, VAST
Classification
SCOP (Structural Classification of Proteins)
no need for an alignment
indexing various features
PSI, PSIST, ProGreSS, CTSS, DDPIn

5
DDPIn - Overview

Distance and Density based Protein Indexing
Classification method
Indexing of protein features
distances among Cα atoms used
each AA represents a feature → protein p consists of |p| features
various semantics used
based on clustering Cα atoms into rings
metric indexing employed (M-tree)
kNN querying
outcomes of several searches are merged to obtain final results

6
DDPIn - Feature Extraction

Features
n-dimensional vectors of real numbers
AA viewpoint → VPT (viewpoint tag)
sDens
density of AAs in rings with a predefined width
sDensSSE
enhanced with SSE information
sRad
widths of rings containing a predefined percentage of AAs
sRadSSE
enhanced with SSE information
sDir
number of AAs in a ring pointing from the viewpoint
sDens enhanced with direction information
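
The sDens and sRad semantics above can be sketched roughly as follows. This is a minimal illustration, not the presented implementation: it assumes that a "ring" is a spherical shell around the viewpoint's Cα atom, that Cα coordinates are given as 3D tuples, and that sRad takes cumulative fractions of the remaining atoms. The function names `ring_density`, `ring_widths` and `extract_vpts` are hypothetical.

```python
import math

def ring_density(coords, viewpoint_idx, ring_width, n_rings):
    """sDens-like tag: number of C-alpha atoms in each concentric ring
    of fixed width around the viewpoint atom (assumed semantics)."""
    vp = coords[viewpoint_idx]
    counts = [0] * n_rings
    for i, atom in enumerate(coords):
        if i == viewpoint_idx:
            continue  # the viewpoint itself is not counted
        ring = int(math.dist(vp, atom) // ring_width)
        if ring < n_rings:
            counts[ring] += 1
    return counts

def ring_widths(coords, viewpoint_idx, fractions):
    """sRad-like tag: widths of consecutive rings such that the ring
    boundaries enclose the given cumulative fractions of the other atoms."""
    vp = coords[viewpoint_idx]
    dists = sorted(math.dist(vp, a)
                   for i, a in enumerate(coords) if i != viewpoint_idx)
    widths, previous_radius = [], 0.0
    for f in fractions:
        k = max(1, math.ceil(f * len(dists)))  # atoms enclosed so far
        radius = dists[k - 1]
        widths.append(radius - previous_radius)
        previous_radius = radius
    return widths

def extract_vpts(coords, ring_width, n_rings):
    """One viewpoint tag per amino acid: a protein with |p| residues
    yields |p| feature vectors."""
    return [ring_density(coords, i, ring_width, n_rings)
            for i in range(len(coords))]

# Toy "protein" with five C-alpha atoms (made-up coordinates).
toy = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 4, 0)]
print(ring_density(toy, 0, 2.0, 2))
print(ring_widths(toy, 0, [0.5, 1.0]))
```

The SSE and direction variants would append further components to each tag; they are left out here because the slides do not specify their encoding.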

7
DDPIn - Similarity of VPTs

Metrics
L2
weighted L2
close neighborhood of VPs is more important
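
A small sketch of the two metrics, assuming that VPT components are ordered from the innermost ring outwards. The geometric weighting in `decaying_weights` is only one hypothetical way to make the close neighborhood count more; the slides do not state the actual weights. Weights must stay strictly positive for the weighted distance to remain a metric, which M-tree indexing requires.

```python
import math

def l2(u, v):
    """Plain Euclidean (L2) distance between two viewpoint tags."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def weighted_l2(u, v, weights):
    """Weighted L2: each squared difference is scaled by a positive weight."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(u, v, weights)))

def decaying_weights(n, decay=0.8):
    """Hypothetical weighting: inner rings (low indices) weigh more."""
    return [decay ** i for i in range(n)]

u, v = [3, 5, 9], [4, 5, 6]
print(l2(u, v))
print(weighted_l2(u, v, decaying_weights(3)))
```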

8
DDPIn - Indexing Structure

M-tree (Metric tree)
Dynamic, hierarchical indexing structure
Data space divided into ball-shaped data regions (hyper-spheres)
root node represents a data region covering all data
child nodes represent regions covering parts of the space
data regions form a balanced hierarchical structure
inner nodes → routing entries
leaf nodes → ground entries
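
To illustrate why ball-shaped regions help, the sketch below shows how a search can skip a whole subtree using the triangle inequality. It is a simplified stand-in, not a full M-tree: there is no insertion, node splitting or balancing, and it answers a range query rather than the kNN query used in DDPIn. The `Node` class and `range_query` function are hypothetical names.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    center: tuple          # routing object of the region
    radius: float          # covering radius of the ball-shaped region
    children: list = field(default_factory=list)  # inner node: sub-regions
    objects: list = field(default_factory=list)   # leaf node: ground entries

def range_query(node, q, rq, dist=math.dist):
    """Return all indexed objects within distance rq of q."""
    # Triangle inequality: if the query ball and the region ball do not
    # intersect, nothing inside this subtree can qualify.
    if dist(q, node.center) > node.radius + rq:
        return []
    hits = [o for o in node.objects if dist(q, o) <= rq]
    for child in node.children:
        hits.extend(range_query(child, q, rq, dist))
    return hits

leaf_a = Node(center=(0, 0), radius=1.0, objects=[(0, 0), (1, 0)])
leaf_b = Node(center=(10, 0), radius=1.0, objects=[(10, 0), (9, 0)])
root = Node(center=(5, 0), radius=6.0, children=[leaf_a, leaf_b])

# leaf_b is pruned without computing distances to its objects.
print(range_query(root, (0.5, 0), 0.6))
```

The same pruning test works for any metric passed as `dist`, which is why the distance between VPTs has to satisfy the metric axioms.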

9
Querying / Classification

One-step
extracting VPTs from query → n queries
ranking scheme

Two-step
healing
reclassification with the Smith-Waterman algorithm on sequences
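
The one-step approach above can be sketched as one kNN query per query VPT, followed by a merge of the answers. The slides do not define the ranking scheme, so simple vote counting is assumed here, and a brute-force scan stands in for the M-tree. The names `knn` and `classify` are hypothetical, as is the `(vector, label)` layout of the database.

```python
import math
from collections import Counter

def knn(query_vec, database, k):
    """Brute-force k nearest neighbors; database is a list of
    (vpt_vector, class_label) pairs."""
    return sorted(database, key=lambda e: math.dist(query_vec, e[0]))[:k]

def classify(query_vpts, database, k):
    """Issue one kNN query per query VPT and rank the class labels by the
    number of neighbors that voted for them (assumed ranking scheme)."""
    votes = Counter()
    for vpt in query_vpts:
        for _, label in knn(vpt, database, k):
            votes[label] += 1
    return [label for label, _ in votes.most_common()]

# Made-up two-dimensional tags labeled with two superfamilies.
database = [((0, 0), "A"), ((0, 1), "A"),
            ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
query_vpts = [(0, 0.5), (0.2, 0.2), (5, 5.5)]
print(classify(query_vpts, database, k=2))
```

A second step would then re-examine the top-ranked candidates, for example by sequence alignment, which is outside this sketch.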

10
Experimental Results

SCOP 1.65 dataset
class → fold → superfamily → family
1810 proteins
181 superfamilies
at least 10 proteins each
all-α, all-β, α+β and α/β classes
query set
reduced - 181 queries
full
used also by PSI, ProGreSS, PSIST methods
Testing of
superfamily classification accuracy
fold classification accuracy

11
11
Finding Optimal k for kNN Queries

12
Accuracy of VPT Semantics

13
Accuracy for Increasing Dimension

14
Accuracy of Various Metrics

15
Suitability of Pairs of VPT Semantics for Healing

identical correct classification
identical wrong classification

16
Comparison of Classification Methods

17
Conclusion