Bioinformatics Research Overview - PowerPoint PPT Presentation

About This Presentation
Slides: 17
Provided by: lil3
Transcript and Presenter's Notes

1
Bioinformatics Research Overview
Li Liao
Develop new algorithms and (statistical) learning methods
> Capable of incorporating domain knowledge
> Effective, Expressive, Interpretable
2
Motivations
  • Understanding correlations between genotype and
    phenotype
  • Predicting genotype <-> phenotype
  • Phenotypes
  • Protein function
  • Drug/therapy response
  • Drug-drug interactions for expression
  • Drug mechanism
  • Interacting pathways of metabolism

3
Projects
  • Homology detection, protein family classification
  • (funded by a DuPont SE award)
  • Support Vector Machines
  • Hidden Markov models
  • Graph theoretic methods
  • Probabilistic modeling for BioSequence (funded by
    NIH)
  • HMMs, and beyond
  • Motif finding
  • Secondary structure
  • Comparative Genomics
  • Identify genome features for diagnostic and
    therapeutic purposes
  • (funded by an Army grant)
  • Evolution of metabolic pathways
  • Tree and graph comparisons

4
Detect remote homologues
  • Attributes to be looked at
  • Sequence similarity, aggregate statistics (e.g.,
    protein families), patterns/motifs, and more
    attributes (presence across the phylogenetic tree).
  • How to incorporate domain specific knowledge into
    the model so a classifier can be more accurate?
  • Results
  • Quasi-consensus based comparison of profile HMM
    for protein sequences (submitted to
    Bioinformatics)
  • Using extended phylogenetic profiles and support
    vector machines for protein family classification
    (SNPD 04)
  • Combining Pairwise Sequence Similarity and
    Support Vector Machines for Detecting Remote
    Protein Evolutionary and Structural Relationships
    (JCB 2003)

5
Support Vector Machines
6
  • Data: phylogenetic profiles
  • How to account for correlations among
    profile components?
  • Profile extension (Narra & Liao, SNPD 04)

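[Figure: example phylogenetic profiles x = 0 1 1 1 1, y = 1 1 1 1 1, z = 1 1 1 1 0, compared by tree-based distance and Hamming distance. Values shown between x and y: 3, 0.1; between y and z: 3, 0.5.]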
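As a small illustration of the Hamming distance named on this slide, the sketch below counts the positions at which two binary phylogenetic profiles differ. It uses only the five components of x, y, and z as they appear in the transcript and does not attempt to reproduce the numbers in the original figure. The function name `hamming_distance` is chosen here for illustration. The tree-based distance is not defined in the transcript, so it is not sketched.

```python
from itertools import combinations


def hamming_distance(p, q):
    """Count the positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


# Five-component profiles as they appear in the transcribed slide.
profiles = {
    "x": (0, 1, 1, 1, 1),
    "y": (1, 1, 1, 1, 1),
    "z": (1, 1, 1, 1, 0),
}

# Distance for every unordered pair of profiles.
pairwise = {
    (a, b): hamming_distance(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}

for pair, dist in pairwise.items():
    print(pair, dist)
```

Hamming distance treats every profile component independently, so it cannot tell whether two differing components belong to closely related or distant species. That is the limitation behind the slide's question about correlations among profile components.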
7
Quasi consensus based comparison of HMMs
  • From MSA to profile HMMs using existing packages
    (SAM-T99 or HMMER)
  • Generation of quasi consensus sequence from the
    model
  • Alignment of consensus sequence of a model with
    the other model
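The slides do not say how the quasi consensus sequence is generated, so the following is only a minimal sketch under a simplifying assumption: the consensus takes the most probable residue at each match state and ignores insert states, delete states, and transition probabilities. The input format (one dictionary of emission probabilities per match state) and the name `quasi_consensus` are hypothetical and do not correspond to the SAM-T99 or HMMER file formats or APIs.

```python
def quasi_consensus(match_emissions):
    """Return the most probable residue at each match state.

    match_emissions: list of dicts mapping residue -> emission
    probability, one dict per match state (hypothetical format).
    """
    return "".join(max(e, key=e.get) for e in match_emissions)


# Toy three-state model with made-up emission probabilities.
toy_model = [
    {"A": 0.7, "G": 0.2, "L": 0.1},
    {"K": 0.5, "R": 0.4, "A": 0.1},
    {"L": 0.6, "I": 0.3, "V": 0.1},
]

consensus = quasi_consensus(toy_model)
print(consensus)
```

In the workflow listed above, a sequence like this would then be aligned against the other model. That alignment step is not shown here.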


9
Sequence Models (HMMs and beyond)
  • Motivations: What is responsible for the
    function?
  • Patterns/motifs
  • Secondary structure
  • To capture long-range correlations in
    biosequences
  • Transporter proteins
  • RNA secondary structure
  • Methods: generative versus discriminative
  • Linear dependent processes
  • Stochastic grammars
  • Model equivalence

10
TMMOD: An improved hidden Markov model for
predicting transmembrane topology (to appear in
IEEE ICTAI 04)
11
Mod.     Reg.  Data set  Correct topology  Correct location  Sensitivity  Specificity
TMMOD 1  (a)   S-83      65 (78.3)         67 (80.7)         97.4         97.4
TMMOD 1  (b)   S-83      51 (61.4)         52 (62.7)         71.3         71.3
TMMOD 1  (c)   S-83      64 (77.1)         65 (78.3)         97.1         97.1
TMMOD 2  (a)   S-83      61 (73.5)         65 (78.3)         99.4         97.4
TMMOD 2  (b)   S-83      54 (65.1)         61 (73.5)         93.8         71.3
TMMOD 2  (c)   S-83      54 (65.1)         66 (79.5)         99.7         97.1
TMMOD 3  (a)   S-83      70 (84.3)         71 (85.5)         98.2         97.4
TMMOD 3  (b)   S-83      64 (77.1)         65 (78.3)         95.3         71.3
TMMOD 3  (c)   S-83      74 (89.2)         74 (89.2)         99.1         97.1
TMHMM    -     S-83      64 (77.1)         69 (83.1)         96.2         96.2
PHDtm    -     S-83      (85.5)            (88.0)            98.8         95.2
TMMOD 1  (a)   S-160     117 (73.1)        128 (80.0)        97.4         97.0
TMMOD 1  (b)   S-160     92 (57.5)         103 (64.4)        77.4         80.8
TMMOD 1  (c)   S-160     117 (73.1)        126 (78.8)        96.1         96.7
TMMOD 2  (a)   S-160     120 (75.0)        132 (82.5)        98.4         97.2
TMMOD 2  (b)   S-160     97 (60.6)         121 (75.6)        97.7         95.6
TMMOD 2  (c)   S-160     118 (73.8)        135 (84.4)        98.4         97.2
TMMOD 3  (a)   S-160     120 (75.0)        133 (83.1)        97.8         97.6
TMMOD 3  (b)   S-160     110 (68.8)        124 (77.5)        94.5         98.1
TMMOD 3  (c)   S-160     135 (84.4)        143 (89.4)        98.3         98.1
TMHMM    -     S-160     123 (76.9)        134 (83.8)        97.1         97.7
12
Genomics study of enterobacterial BT agents
(funded by the US Army via Center for Biological
Defense, USF)
  • Goals
  • Identification of genes and sequence tags as
    targets for novel diagnosis and therapy
  • BT agents (Yersinia pestis, Salmonella,
    Escherichia coli O157:H7)
  • Methods
  • Various bioinformatics tools and databases

13
Comparative Genomics
  • Motivation
  • Evolution of metabolic pathways
  • Gene functions
  • De novo (alternative pathways)
  • Genetic engineering
  • Drug discovery
  • Methods
  • Put data into a context: knowledge/data
    representation
  • Trees, graphs, etc.
  • Learning models/methods

14
Profiling pairs of attribute-value
15
  • What we found
  • Informative way to compare genomes
  • The majority of pathways (or rather their enzyme
    components) evolve in congruence with species

16
What we do next
  • Database and search engine
  • Off-line self-consistent iteration
  • Pathways in a network
  • Graph comparisons
  • Identify key components of networks
  • Small world topology
  • Cross-level interactions with regulatory networks