Slides: 60
Provided by: off661
Title: CS/CBB 545 - Data Mining Function Prediction


1
CS/CBB 545 - Data Mining Function Prediction
Networks as an application of Mining
  • Mark Gerstein, Yale University
  • gersteinlab.org/courses/545
  • (class 2007.02.20, 14:30-15:45)

2
Specific Applications: Function Prediction
3
The problem: Grappling with Function on a Genome
Scale?
  • 250 of 530 originally characterized on chr. 22
    (Dunham et al.)
  • >25K Proteins in Entire Human Genome
  • (with alt. splicing)

4
Traditional single-molecule way to integrate
evidence & describe function
EF2_YEAST
Descriptive Name: Elongation Factor 2
Lots of references to papers
Summary sentence describing function: This
protein promotes the GTP-dependent translocation
of the nascent protein chain from the A-site to
the P-site of the ribosome.
5
Functional Classification
ENZYME (SwissProt, Bairoch/Apweiler; just
enzymes, cross-org.)
COGs (cross-org., just conserved; NCBI,
Koonin/Lipman)
GenProtEC (E. coli, Riley)
Also: other SwissProt annotation; WIT, KEGG (just
pathways); TIGR EGAD (human ESTs); SGD (yeast)
Fly (fly, Ashburner), now extended to GO
(cross-org.)
MIPS/PEDANT (yeast, Mewes)
6
Hierarchy of Protein Functions
7
Some obvious issues in scaling single-molecule
definition to a genomic scale
  • Fundamental complexities
  • Often >2 proteins/function
  • Multi-functionality: 2 functions/protein
  • Role conflation: molecular, cellular, phenotypic

8
Some obvious issues in scaling single-molecule
definition to a genomic scale
  • Fundamental complexities
  • Often >2 proteins/function
  • Multi-functionality: 2 functions/protein
  • Role conflation: molecular, cellular, phenotypic
  • Fun terms, but do they scale?
  • Starry night (P Adler, 94)
  • Lush and cheapdate (former wants alcohol, latter
    makes susceptible)
  • Vulcan Klingon
  • Sonic Kryptonite

9
Toward Systematic Ontologies for Function, using
Networks
Hierarchies & DAGs: Enzyme (Bairoch); GO
(Ashburner); MIPS (Mewes, Frishman)
General Networks: Eisenberg et al.
Interaction Vectors: Lan et al., IEEE 90:1848
10
Gene Expression Information and Protein Features

11
Typical Predictors and Response for Yeast
12
Prediction of Function on a Genomic Scale from
Array Data & Sequence Features
Core
Different aspects of function: molecular action,
cellular role, phenotypic manifestation. Also
localization, interactions, complexes
13
Specific Applications: Networks -- What are the
types of Biological Networks?
14
  • Graph: a pair of sets G = (P, E), where P is a set of
    nodes and E is a set of edges, each connecting 2
    elements of P.
  • Directed, undirected graphs
  • Large, complex networks are ubiquitous in the
    world
  • Genetic networks
  • Nervous system
  • Social interactions
  • World Wide Web
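
As a minimal sketch of the definition G = (P, E), a graph can be stored as a node set plus an edge list and turned into an adjacency map. The node names and the helper `adjacency` are hypothetical, chosen only for illustration.

```python
def adjacency(nodes, edges, directed=False):
    """Build {node: set of neighbours} from a node set P and an edge list E."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        if not directed:
            # An undirected edge is stored in both directions.
            adj[v].add(u)
    return adj


# Hypothetical toy graph, not data from the lecture.
P = {"A", "B", "C", "D"}
E = [("A", "B"), ("B", "C"), ("C", "D")]

undirected = adjacency(P, E)
directed = adjacency(P, E, directed=True)
```

In the undirected version B has neighbours A and C; in the directed version B only points to C.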

15
Biological Networks
  1. Protein-protein interaction networks
  2. Regulatory networks
  3. Expression networks
  4. Metabolic networks
  5. more biological networks
  6. Other types of networks

16
Expression networks
Qian et al., J. Mol. Biol. 314:1053-1066
17
Format of Gene Expression Data
MCM3 MCM6 CDC47 MCM2 CDC46 CDC54
DPB3 CDC45 DPB2 CDC2 CDC7 POL2 HYS2 POL32 DBF4
ORC2 ORC6 ORC5 ORC4 ORC3 ORC1
18
Clustering the yeast cell cycle to uncover
interacting proteins
Brown, Davis
Extra
Microarray timecourse of 1 ribosomal protein
19
Clustering the yeast cell cycle to uncover
interacting proteins
Extra
Random relationship from 18M
20
Clustering the yeast cell cycle to uncover
interacting proteins
Botstein Church, Vidal
Extra
Close relationship from 18M (2 Interacting
Ribosomal Proteins)
21
Clustering the yeast cell cycle to uncover
interacting proteins
Extra
Predict Functional Interaction of Unknown Member
of Cluster
22
Global Network of Relationships
Core
470K significant relationships from 18M
possible
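
The idea of keeping only the significant pairwise relationships can be sketched as below. Everything specific here is an assumption: the profiles are made-up numbers, Pearson correlation with a 0.9 cutoff stands in for whatever significance measure the lecture used, and the figure of about 6,000 yeast genes is assumed in order to reproduce the order of magnitude of "18M possible" pairs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expression_network(profiles, threshold=0.9):
    """Link every gene pair whose profiles correlate at or above threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]


# Made-up time courses for three hypothetical genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 2.0, 1.0],
    "geneB": [1.1, 2.1, 2.9, 2.2, 0.9],
    "geneC": [3.0, 1.0, 2.0, 3.0, 1.0],
}
edges = expression_network(profiles)  # [("geneA", "geneB")]

# Number of unordered gene pairs, assuming about 6,000 genes.
n_genes = 6000
n_pairs = n_genes * (n_genes - 1) // 2  # 17,997,000, i.e. roughly 18M
```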
23
  • Regulatory networks

Horak et al., Genes & Development 16:3017-3033
24
Protein Interaction Network
Jeong et al.
25
Yeast two-hybrid
26
Affinity Purification and Mass Spec.
From ocw.mit.edu//791_ak_lecture7.pdf
27
  • Metabolic networks

DeRisi, Iyer, and Brown, Science 278:680-686
28
Interaction networks
Metabolic networks
29
... more biological networks
30
Networks as a universal language
Internet (Burch & Cheswick)
Electronic Circuit
Food Web
Disease Spread (Krebs)
Neural Network (Cajal)
Protein Interactions (Barabasi)
Social Network
31
Networks occupy a midway point in terms of level
of understanding
1D: Complete Genetic Partslist
2D: Bio-molecular Network Wiring Diagram
3D: Detailed structural understanding of
cellular machinery
Jeong et al.
32
Richness of the Visual Representation of Networks
  • Some structure (connectivity) but some
    flexibility (e.g. edge colors, node positions and
    shapes) that can be used to encode additional
    information

33
VisualComplexity.com
34
Networks: What are the Main Quantities that Can
be Calculated from Network Topology?
35
  • Degree of a node: the number of edges incident on
    the node

Degree of node i = 5
36
Network parameters
Connectivity
  • Number of incoming and outgoing connections
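
A small sketch of how degree and connectivity can be counted from an edge list. The edges are hypothetical; node "i" is given five edges only to mirror the slide's example of a degree-5 node, and the function names are illustrative.

```python
from collections import Counter


def degrees(edges):
    """Undirected degree: number of edges incident on each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(edges):
    """Directed case: count incoming and outgoing connections separately."""
    incoming, outgoing = Counter(), Counter()
    for source, target in edges:
        outgoing[source] += 1
        incoming[target] += 1
    return incoming, outgoing


# Hypothetical edges: node "i" touches five edges.
edges = [("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"), ("i", "e"), ("a", "b")]
deg = degrees(edges)                     # deg["i"] == 5
incoming, outgoing = in_out_degrees(edges)  # reading each pair as source -> target
```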
37
Network parameters
Clustering coefficient
  • Ratio of existing links to maximum number of
    links for neighbouring nodes
  • Measure of inter-connectedness of the network
  • Example: 1/6 = 0.17
  • Average coefficient: 0.04
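
The ratio can be computed directly from an adjacency map. This is a sketch on a hypothetical toy graph built so that node "x" has four neighbours (six possible links among them) of which one exists, matching the 1/6 example; it is not the network from the slide.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Existing links among a node's neighbours / maximum possible links."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no possible links
    existing = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return existing / (k * (k - 1) / 2)


# Hypothetical undirected graph as {node: set of neighbours}.
adj = {
    "x": {"a", "b", "c", "d"},
    "a": {"x", "b"},
    "b": {"x", "a"},
    "c": {"x"},
    "d": {"x"},
}
c_x = clustering_coefficient(adj, "x")  # 1 existing link out of 6 possible
average = sum(clustering_coefficient(adj, n) for n in adj) / len(adj)
```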
38
Path length
  • Number of intermediate TFs to reach final
    target
  • Indication of how immediate a response is
  • Average path length 4

39
Characteristic path length L: a GLOBAL property
  • L(i,j) is the number of edges in the shortest
    path between vertices i and j
  • L is the average of L(i,j) over all pairs of vertices

Networks with small values of L are said to have
the small world property
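
A minimal sketch of the calculation, assuming an unweighted, undirected graph and taking L as the mean shortest-path length over all reachable pairs. Shortest paths come from breadth-first search; the four-node chain is a hypothetical example.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: edge counts of shortest paths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical chain a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
L = characteristic_path_length(chain)  # (1+2+3+1+2+1) / 6 = 10/6
```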
40
Network motifs
  • Regulatory modules within the network

SIM (single-input motif)
MIM (multiple-input motif)
FFL (feed-forward loop)
FBL (feedback loop)
Alon
41
FFL: Feed-forward loops
SBF
Yox1
Pog1
Tos8
Plm2
Alon
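
A feed-forward loop is a triple in which X regulates Y, Y regulates Z, and X also regulates Z directly. The sketch below enumerates such triples in a directed edge list. The edges use placeholder names X, Y, Z, W and do not describe the regulatory links among the genes listed on the slide.

```python
def feed_forward_loops(edges):
    """Return all (X, Y, Z) with edges X->Y, Y->Z and X->Z."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # Z must also be a direct target of X, and all three distinct.
                if z in x_targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical regulatory edges (regulator, target).
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
loops = feed_forward_loops(edges)  # [("X", "Y", "Z")]
```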
42
Cliques
  • Fully connected sub-components
  • Related measures: k-cores (Hogue)

43
Predicting protein interactions by completing
defective cliques
  • High-throughput experiments are prone to missing
    interactions

  • If proteins P and Q interact with a clique K of
    proteins which all interact with each other, then
    P and Q are more likely to interact with each
    other
  • P, Q, and K form a defective clique

Yu et al. Bioinformatics (2006)
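
The rule above can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm of Yu et al.: it proposes a P-Q interaction only when all shared partners of P and Q form a clique. The network and the parameter `min_clique_size` are hypothetical.

```python
from itertools import combinations


def is_clique(adj, members):
    """True if every pair of members interacts."""
    return all(v in adj[u] for u, v in combinations(members, 2))


def predict_by_defective_cliques(adj, min_clique_size=2):
    """Propose missing P-Q edges whose shared partners K form a clique."""
    predictions = []
    for p, q in combinations(sorted(adj), 2):
        if q in adj[p]:
            continue  # already observed to interact
        shared = adj[p] & adj[q]
        if len(shared) >= min_clique_size and is_clique(adj, shared):
            predictions.append((p, q))
    return predictions


# Hypothetical network: k1, k2, k3 form a clique K; P and Q each bind all of K.
adj = {
    "P": {"k1", "k2", "k3"},
    "Q": {"k1", "k2", "k3"},
    "k1": {"k2", "k3", "P", "Q", "R"},
    "k2": {"k1", "k3", "P", "Q"},
    "k3": {"k1", "k2", "P", "Q"},
    "R": {"k1"},
}
predicted = predict_by_defective_cliques(adj)  # [("P", "Q")]
```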
44
Networks: Simple Mathematical Models for
Interpreting Complex Topology
45
Models for networks of complex topology
  • Erdos-Renyi (1960)
  • Watts-Strogatz (1998)
  • Barabasi-Albert (1999)

A.-L. Barabási & R. Albert, "Emergence of scaling in
random networks," Science 286, 509-512 (1999).
46
The Erdős-Rényi (ER) model (1960)
  • Start with N vertices and no edges
  • Connect each pair of vertices with probability
    p_ER
  • Important result: many properties in these graphs
    appear quite suddenly, at a threshold value of
    p_ER(N)
  • If p_ER = c/N with c < 1, then almost all vertices
    belong to isolated trees
  • Cycles of all orders appear at p_ER ~ 1/N
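
The threshold behaviour can be observed numerically. The sketch below generates ER graphs with p_ER = c/N and reports the size of the largest connected component for c below, at, and above 1. The graph size, the seed, and the use of largest-component size as the observed property are illustrative choices.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Start with n vertices and no edges; link each pair with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def largest_component_size(adj):
    """Size of the largest connected component (depth-first traversal)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 600
sizes = {c: largest_component_size(erdos_renyi(n, c / n, rng))
         for c in (0.5, 1.0, 2.0)}
```

For c < 1 the largest component should stay small relative to N, while for c > 1 a single component spanning a large fraction of the vertices is expected.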

47
The Watts-Strogatz (WS) model (1998)
  • Start with a regular network with N vertices
  • Rewire each edge with probability p
  • For p = 0 (Regular Networks)
    • high clustering coefficient
    • high characteristic path length
  • For p = 1 (Random Networks)
    • low clustering coefficient
    • low characteristic path length

QUESTION: What happens for intermediate values of
p?
48
1) There is a broad interval of p for which L is
small but C remains large
2) Small world networks are common
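
A rough way to explore intermediate p is to rewire a ring lattice and measure C and L. This is a simplified sketch of WS-style rewiring rather than the exact published procedure; the graph size (200 vertices, 4 neighbours each), the seed, and the function names are assumptions.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Regular ring: each vertex linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, p, rng):
    """With probability p, move one end of each original edge to a random vertex."""
    n = len(adj)
    for u, v in [(u, v) for u in adj for v in adj[u] if u < v]:
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


def average_clustering(adj):
    """Mean over vertices of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        nb, k = list(nbrs), len(nbrs)
        if k >= 2:
            links = sum(1 for i in range(k) for j in range(i + 1, k)
                        if nb[j] in adj[nb[i]])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over reachable ordered pairs."""
    total = pairs = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)
results = {}
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 4), p, rng)
    results[p] = (average_clustering(g), average_path_length(g))  # (C, L)
```

Printing `results` for each p gives a quick view of how C and L change between the regular (p = 0) and random (p = 1) extremes.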
49
The Barabási-Albert (BA) model (1999)
Look at the distribution of degrees
[Degree-distribution plots: ER model, WS model, WWW, actors, power grid]
The probability of finding a highly connected
node decreases exponentially with k
50
  • Two problems with the previous models:
    1. N does not vary
    2. the probability that two vertices are
       connected is uniform
  • GROWTH: starting with a small number of vertices
    m0, at every timestep add a new vertex with m ≤ m0
    edges
  • PREFERENTIAL ATTACHMENT: the probability Π that
    a new vertex will be connected to vertex i
    depends on the connectivity of that vertex
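
Growth and preferential attachment can be sketched as below, using the standard BA choice that the attachment probability is proportional to degree, Π(k_i) = k_i / Σ_j k_j. Two details are assumptions made for this sketch: the seed graph is a complete graph on m0 vertices, and the m targets of each new vertex are forced to be distinct, which deviates slightly from exact proportionality.

```python
import random


def barabasi_albert(n, m, m0, rng):
    """Grow a graph to n vertices; requires 1 <= m <= m0 and m0 >= 2."""
    adj = {v: set() for v in range(m0)}
    stubs = []  # each vertex appears once per incident edge
    # Assumed seed: complete graph on the first m0 vertices.
    for u in range(m0):
        for v in range(u + 1, m0):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m0, n):
        # Drawing from stubs picks vertex i with probability k_i / sum_j k_j.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs += [new, t]
    return adj


graph = barabasi_albert(n=300, m=2, m0=3, rng=random.Random(0))
degree = {v: len(nbrs) for v, nbrs in graph.items()}
n_edges = sum(degree.values()) // 2  # 3 seed edges + 297 * 2 added edges
```

Sorting `degree` by value shows a few early vertices with many links and many late vertices with exactly m links.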

51
Birth of Scale-Free Network
From Barabasi & Bonabeau, Sci. Am., May '03
52
SCALE FREENESS GENERALLY EVOLVES THROUGH
PREFERENTIAL ATTACHMENT (THE RICH GET RICHER)
ILLUSTRATIVE
The Duplication Mutation Model
Description
  • Theoretical work shows that a mechanism of
    preferential attachment leads to a scale-free
    topology
  • (The rich get richer)
  • In an interaction network, gene duplication followed
    by mutation of the duplicated gene is generally
    thought to lead to preferential attachment

The interaction partners of A are more likely to
be duplicated
Gene duplication
  • Simple reasoning: the partners of a hub are more
    likely to be duplicated than the partners of a
    non-hub

Source: Albert et al., Rev. Mod. Phys. (2002)
and Middendorf et al., PNAS (2005)
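
The simple reasoning can be made concrete with a toy simulation. A node with k partners gains a link whenever one of those partners is duplicated and the copied link survives, so its chance of gaining a link grows with k. This is an illustrative sketch, not the specific models of the cited papers; the starting network, `keep_prob`, and the number of steps are assumptions.

```python
import random


def duplicate_and_mutate(adj, keep_prob, rng):
    """Duplicate a random node; the copy keeps each parental link with keep_prob."""
    parent = rng.choice(sorted(adj))
    copy = max(adj) + 1  # nodes are integers, so this is a fresh label
    adj[copy] = set()
    for partner in sorted(adj[parent]):
        # "Mutation": each inherited interaction is lost with 1 - keep_prob.
        if rng.random() < keep_prob:
            adj[copy].add(partner)
            adj[partner].add(copy)
    return parent, copy


rng = random.Random(0)
network = {0: {1}, 1: {0}}  # hypothetical starting pair of interacting genes
for _ in range(200):
    duplicate_and_mutate(network, keep_prob=0.5, rng=rng)

degree = {v: len(nbrs) for v, nbrs in network.items()}
```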
53
Random vs. Scale-free Networks
From Barabasi & Bonabeau, Sci. Am., May '03
54
Scale-free networks in Biology
Power-law distribution
[Plot: log(Frequency) vs. log(Degree)]
Hubs dictate the structure of the network
Barabasi
55
Power-law distribution of connectivities
  • Many TFs have few target genes
  • Few TFs have many target genes

56
Knocking Out Nodes in Scale-free and Random
Networks
From Barabasi & Bonabeau, Sci. Am., May '03
57
Hubs tend to be Essential
Integrate gene essentiality data with protein
interaction network. Perhaps hubs represent
vulnerable points? Lauffenburger, Barabasi
[Plot: "hubbiness" of essential vs. non-essential genes]
Yu et al., 2003, TIG
58
Relationship extends to "Marginal Essentiality"
  • Marginal essentiality measures relative
    importance of each gene (e.g. in growth-rate and
    condition-specific essentiality experiments) and
    scales continuously with "hubbiness"

[Plot: "hubbiness" for genes grouped as essential, not important, important, very important]
59
Bottlenecks & Hubs
Yu et al., PLOS CB (2007)
60
Bottleneck bridging between processes