Part 1: Biological Networks - PowerPoint PPT Presentation

Slides: 49
Provided by: alp96
Transcript and Presenter's Notes

Title: Part 1: Biological Networks


1
Part 1: Biological Networks
  • Protein-protein interaction networks
  • Regulatory networks
  • Expression networks
  • Metabolic networks
  • more biological networks
  • Other types of networks

2
Expression networks
Qian et al., J. Mol. Biol., 314:1053-1066
3
  • Regulatory networks

Horak et al., Genes & Development, 16:3017-3033
4
Expression networks
Regulatory networks
5
Interaction networks
Expression networks
Regulatory networks
6
  • Metabolic networks

DeRisi, Iyer, and Brown, Science, 278:680-686
7
Interaction networks
Metabolic networks
8
... more biological networks
Hierarchies and DAGs: Enzyme (Bairoch), GO (Ashburner),
MIPS (Mewes, Frishman)
9
... more biological networks
10
Other types of networks
Disease Spread (Krebs)
Electronic Circuit
Food Web
Internet (Burch, Cheswick)
Social Network
11
Part 2: Graphs, Networks
  • Graph definition
  • Topological properties of graphs
  • Degree of a node
  • Clustering coefficient
  • Characteristic path length
  • Random networks
  • Small World networks
  • Scale Free networks

12
  • Graph: a pair of sets G = {P, E}, where P is a set of
    nodes and E is a set of edges, each connecting 2
    elements of P.
  • Directed, undirected graphs
  • Large, complex networks are ubiquitous in the
    world
  • Genetic networks
  • Nervous system
  • Social interactions
  • World Wide Web

13
  • Degree of a node: the number of edges incident on
    the node

Example (node i in the figure): degree of node i = 5
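
The degree definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name degrees and the example edge list are hypothetical, and the graph is assumed to be undirected.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: degree} for an undirected graph given as (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        # Each edge is incident on both of its endpoints.
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical example: node "i" touches five edges, as on the slide.
edges = [("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"), ("i", "e"), ("a", "b")]
print(degrees(edges)["i"])  # 5
```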
14
  • Clustering coefficient → LOCAL property
  • The clustering coefficient of node i is the ratio
    of the number of edges that exist among
    its neighbours, over the number of edges that
    could exist

Clustering coefficient of node i = 1/6
  • The clustering coefficient for the entire network
    C is the average of the clustering coefficients of
    all its nodes
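
As a rough sketch of the two definitions above, the following Python computes the clustering coefficient of one node and the network average. The adjacency dictionary, the function names, and the convention that a node with fewer than two neighbours scores 0 are assumptions made for illustration.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Edges present among the neighbours of i, over edges that could exist."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined case treated as 0
    possible = k * (k - 1) // 2
    existing = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return existing / possible

def average_clustering(adj):
    """Network clustering coefficient C: mean of the per-node values."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical graph: node "i" has 4 neighbours, so 6 edges could exist
# among them, and only the edge a-b is present.
adj = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a"},
    "c": {"i"},
    "d": {"i"},
}
print(local_clustering(adj, "i"))  # 1/6, about 0.1667
print(average_clustering(adj))
```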

15
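Characteristic path length → GLOBAL property
  • The distance between vertices i and j is the number
    of edges in the shortest path between them
  • The characteristic path length L is the average of
    this distance over all pairs of vertices

Networks with small values of L are said to have
the small world property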
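
One way to compute the characteristic path length is a breadth-first search from every vertex, averaging the resulting shortest-path lengths. The sketch below assumes an unweighted, undirected, connected graph stored as an adjacency dictionary. The function names are illustrative, and unreachable pairs are simply skipped.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Average shortest-path length over all reachable ordered pairs i != j."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(characteristic_path_length(path))  # 10/6, about 1.667
```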
16
Models for networks of complex topology
  • Erdős-Rényi (1960)
  • Watts-Strogatz (1998)
  • Barabási-Albert (1999)

17
The Erdős-Rényi (ER) model (1960)
  • Start with N vertices and no edges
  • Connect each pair of vertices with probability
    p_ER
  • Important result: many properties in these graphs
    appear quite suddenly, at a threshold value of
    p_ER(N)
  • If p_ER = c/N with c < 1, then almost all vertices
    belong to isolated trees
  • Cycles of all orders appear at p_ER ~ 1/N
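
The construction above is easy to simulate. The following is a minimal sketch, assuming an undirected graph stored as an adjacency dictionary. The function names, the seed argument, and the choice of N = 500 with c = 0.5 and c = 3 are illustrative and not taken from the slides.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Start with n vertices and no edges; link each pair with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component_size(adj):
    """Number of vertices in the largest connected component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Compare p = c/N below and above the threshold c = 1.
n = 500
for c in (0.5, 3.0):
    g = erdos_renyi(n, c / n, seed=0)
    print(c, largest_component_size(g))
```

Below the threshold the largest component should stay small relative to N, while above it a single large component typically emerges. The exact sizes depend on the random seed.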

18
The Watts-Strogatz (WS) model (1998)
  • Start with a regular network with N vertices
  • Rewire each edge with probability p
  • For p = 0 (Regular Networks)
    • high clustering coefficient
    • high characteristic path length
  • For p = 1 (Random Networks)
    • low clustering coefficient
    • low characteristic path length

QUESTION: What happens for intermediate values of
p?
19
1) There is a broad interval of p for which L is
small but C remains large
2) Small world networks are common
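
The rewiring procedure of the WS model can be sketched as follows. This is an illustrative implementation under stated assumptions: the regular network is a ring in which each vertex links to its k nearest neighbours (k even, N > k), and a rewired edge keeps one endpoint and moves the other to a uniformly chosen vertex, avoiding self-loops and duplicate edges. The function names are hypothetical.

```python
import random

def ring_lattice(n, k):
    """Regular ring: each vertex is linked to its k nearest neighbours."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def watts_strogatz(n, k, p, seed=None):
    """Rewire each lattice edge (v, v + offset) with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for offset in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + offset) % n
            if w in adj[v] and rng.random() < p:
                # New endpoint: any vertex that is not v or already a neighbour.
                candidates = [x for x in range(n) if x != v and x not in adj[v]]
                if not candidates:
                    continue
                new = rng.choice(candidates)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj

regular = watts_strogatz(20, 4, 0.0, seed=0)      # p = 0: unchanged ring
small_world = watts_strogatz(20, 4, 0.1, seed=0)  # a few shortcut edges
```

Measuring C and L on graphs generated for a range of p values is the experiment behind result 1) above.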
20
The Barabási-Albert (BA) model (1999)
Look at the distribution of degrees
[Figure: degree distributions for the ER model, the WS model, the WWW, actors, and the power grid]
In the ER and WS models, the probability of finding a highly connected
node decreases exponentially with its degree k
21
  • → Two problems with the previous models:
  • 1. N does not vary
  • 2. the probability that two vertices are
    connected is uniform
  • GROWTH: starting with a small number of vertices
    m0, at every timestep add a new vertex with
    m (≤ m0) edges
  • PREFERENTIAL ATTACHMENT: the probability Π that
    a new vertex will be connected to vertex i
    depends on the connectivity of that vertex
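
Growth and preferential attachment can be combined in a short simulation. The sketch below makes several assumptions that the slide does not spell out: the attachment probability is taken to be linear in the degree (proportional to k_i), the seed graph is m0 = m isolated vertices to which the first new vertex connects, and each new vertex picks m distinct targets by repeated sampling. The function name is hypothetical.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n vertices; each new vertex brings m edges."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(m)}  # seed graph: m0 = m isolated vertices
    targets = list(range(m))            # the first new vertex links to all seeds
    endpoints = []                      # vertex v appears here degree(v) times
    for new in range(m, n):
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
        # Drawing uniformly from the endpoint list selects a vertex with
        # probability proportional to its current degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return adj

g = barabasi_albert(1000, 3, seed=0)
print(max(len(nb) for nb in g.values()))  # a few hubs gain very high degree
```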

22
a) Connectivity distribution with N = m0 + t = 300,000
and m0 = m = 1 (circles), m0 = m = 3 (squares), m0 = m = 5
(diamonds), and m0 = m = 7 (triangles)
b) P(k) for m0 = m = 5 and system size N = 100,000
(circles), N = 150,000 (squares), and N = 200,000
(diamonds)
→ Scale Free Networks
23
Part 3: Machine Learning
  • Artificial Intelligence/Machine Learning
  • Definition of Learning
  • 3 types of learning
    • Supervised learning
    • Unsupervised learning
    • Reinforcement learning
  • Classification problems, regression problems
  • Occam's razor
  • Estimating generalization
  • Some important topics
    • Naïve Bayes
    • Probability density estimation
    • Linear discriminants
    • Non-linear discriminants (Decision Trees, Support
      Vector Machines)

27
Classification Problems
Bayes' Rule: minimum classification error is
achieved by selecting the class with the largest
posterior probability
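
The decision rule above can be illustrated with a small sketch. The priors and likelihoods below are made-up numbers for a hypothetical two-class problem, and the function names are illustrative.

```python
def posteriors(priors, likelihoods):
    """P(class | x) from priors P(class) and likelihoods P(x | class)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(x), the normalising constant
    return {c: j / evidence for c, j in joint.items()}

def bayes_decision(priors, likelihoods):
    """Select the class with the largest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get)

# Hypothetical numbers: a rare class whose members usually show feature x.
priors = {"essential": 0.2, "non-essential": 0.8}
likelihoods = {"essential": 0.9, "non-essential": 0.1}
print(posteriors(priors, likelihoods))
print(bayes_decision(priors, likelihoods))  # essential
```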
28
Regression Problems
PROBLEM: we are only given the red points, and we
would like to approximate the blue curve (e.g. with
polynomial functions)
QUESTION: which solution should I pick? And why?
32
Naïve Bayes
Example: given a set of features for each gene,
predict whether it is essential
34
Naïve Bayes approximation
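
The formula on this slide did not survive in the transcript. For reference, the standard form of the naïve Bayes approximation is given below. The symbols C (class, e.g. essential or not) and f_1, ..., f_n (features of a gene) are notation chosen here, not taken from the slide. The approximation treats the features as conditionally independent given the class:

```latex
P(C \mid f_1, \dots, f_n) \;\propto\; P(C) \prod_{i=1}^{n} P(f_i \mid C)
```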
35
Probability density estimation
  • Assume a certain probabilistic model for each
    class
  • Learn the parameters for each model (EM
    algorithm)

36
Linear discriminants
  • assume a specific functional form for the
    discriminant function
  • learn its parameters

37
Decision Trees (C4.5, CART)
  • ISSUES
  • how to choose the best attribute
  • how to prune the tree

Trees can be converted into rules!
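
For the first issue, choosing the best attribute, one common criterion is information gain: the reduction in label entropy obtained by splitting on an attribute. The sketch below shows plain information gain only. C4.5 normalises it into a gain ratio and CART typically uses the Gini index, so this is an illustration of the idea rather than either algorithm. The data and names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: attribute "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = [1, 1, 0, 0]
best = max(["a", "b"], key=lambda attr: information_gain(rows, labels, attr))
print(best)  # a
```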
38
Part 4: Network Predictions
  • Naïve Bayes for inferring Protein-Protein
    Interactions

39
The data
Gold-Standards
Network
Jansen, Yu, et al., Science; Yu, et al., Genome
Res.
40
Gold-Standards
Network
41
Gold-Standards
Network
L1 = (4/4)/(3/6) = 2
46
Gold-Standards
Network
L1 = (4/4)/(3/6) = 2; L2 = (3/4)/(3/6) = 1.5
For each protein pair: LR = L1 × L2, so
log(LR) = log(L1) + log(L2)
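
The arithmetic on this slide can be reproduced directly. The sketch below assumes that each likelihood ratio is the fraction of gold-standard positive pairs showing a feature divided by the fraction of gold-standard negative pairs showing it. That reading of the counts 4/4, 3/4, and 3/6 is an assumption, as is the function name.

```python
from math import isclose, log

def likelihood_ratio(pos_with, pos_total, neg_with, neg_total):
    """(fraction of positives with the feature) / (fraction of negatives with it)."""
    return (pos_with / pos_total) / (neg_with / neg_total)

L1 = likelihood_ratio(4, 4, 3, 6)  # (4/4)/(3/6) = 2.0
L2 = likelihood_ratio(3, 4, 3, 6)  # (3/4)/(3/6) = 1.5

# Naive Bayes integration: features are treated as independent, so the
# ratios multiply and their logarithms add.
LR = L1 * L2
print(L1, L2, LR)  # 2.0 1.5 3.0
print(isclose(log(LR), log(L1) + log(L2)))  # True
```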
48
  • Individual features are weak predictors,
  • LR 10
  • Bayesian integration is much more powerful,
  • LR cutoff = 600: 9000 interactions