Title: Part 1: Biological Networks
1. Part 1: Biological Networks
- Protein-protein interaction networks
- Regulatory networks
- Expression networks
- Metabolic networks
- more biological networks
- Other types of networks
2. Expression networks
Qian et al., J. Mol. Biol., 314:1053-1066
3. Horak et al., Genes & Development, 16:3017-3033
4. Expression networks
Regulatory networks
5. Interaction networks
Expression networks
Regulatory networks
6. DeRisi, Iyer, and Brown, Science, 278:680-686
7. Interaction networks
Metabolic networks
8. ... more biological networks
Hierarchies / DAGs:
- Enzyme (Bairoch)
- GO (Ashburner)
- MIPS (Mewes, Frishman)
9. ... more biological networks
10. Other types of networks
- Disease spread (Krebs)
- Electronic circuit
- Food web
- Internet (Burch, Cheswick)
- Social network
11. Part 2: Graphs, Networks
- Graph definition
- Topological properties of graphs
- Degree of a node
- Clustering coefficient
- Characteristic path length
- Random networks
- Small World networks
- Scale Free networks
12. - Graph: a pair of sets G = (P, E), where P is a set of nodes and E is a set of edges that connect two elements of P.
- Directed, undirected graphs
- Large, complex networks are ubiquitous in the world:
- Genetic networks
- Nervous system
- Social interactions
- World Wide Web
13. - Degree of a node: the number of edges incident on the node
Degree of node i = 5 (example shown in the slide's figure)
14. - Clustering coefficient: a LOCAL property
- The clustering coefficient of node i is the ratio of the number of edges that exist among its neighbours over the number of edges that could exist
Clustering coefficient of node i = 1/6 (example in the figure)
- The clustering coefficient for the entire network, C, is the average of the clustering coefficients of all nodes
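As a worked form of this definition (a sketch in standard notation; the symbols E_i and k_i are not named on the slide): if node i has k_i neighbours and E_i edges exist among them, then

```latex
C_i = \frac{2\,E_i}{k_i\,(k_i - 1)}, \qquad
C = \frac{1}{N}\sum_{i=1}^{N} C_i
```

For instance, one existing edge among four neighbours (6 possible edges) gives C_i = 1/6, consistent with the slide's example.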
15. Characteristic path length: a GLOBAL property
- The distance d(i, j) is the number of edges in the shortest path between vertices i and j; the characteristic path length L is the average of d(i, j) over all pairs of vertices
Networks with small values of L are said to have the small-world property
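In the same notation (a sketch assuming an undirected, connected graph):

```latex
L = \frac{2}{N(N-1)} \sum_{i < j} d(i, j)
```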
16. Models for networks of complex topology
- Erdős-Rényi (1960)
- Watts-Strogatz (1998)
- Barabási-Albert (1999)
17. The Erdős-Rényi (ER) model (1960)
- Start with N vertices and no edges
- Connect each pair of vertices with probability p_ER
- Important result: many properties in these graphs appear quite suddenly, at a threshold value of p_ER(N)
- If p_ER = c/N with c < 1, then almost all vertices belong to isolated trees
- Cycles of all orders appear at p_ER = 1/N
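A minimal sketch of the generative step (plain Python; the function name and parameters are illustrative, not taken from the slides):

```python
import random

def erdos_renyi(n, p, seed=None):
    """Edge list of a G(n, p) random graph: each of the n*(n-1)/2
    possible undirected edges is included independently with probability p."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.append((i, j))
    return edges

# Example: 1000 vertices with p = 2/N, i.e. above the 1/N threshold.
g = erdos_renyi(1000, 2 / 1000, seed=0)
print(len(g), "edges")
```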
18. The Watts-Strogatz (WS) model (1998)
- Start with a regular network with N vertices
- Rewire each edge with probability p
- For p = 0 (regular networks):
- high clustering coefficient
- high characteristic path length
- For p = 1 (random networks):
- low clustering coefficient
- low characteristic path length
QUESTION: What happens for intermediate values of p?
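A rough sketch of the construction (a simplified variant in which only one endpoint of each lattice edge is rewired; names and defaults are illustrative assumptions):

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n vertices, each linked to its k nearest
    neighbours (k even); each edge is rewired with probability p."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            edges.add((i, (i + offset) % n))      # regular lattice
    rewired = set()
    for (i, j) in edges:
        if rng.random() < p:
            # Move the far end to a new vertex, avoiding self-loops
            # and duplicates among the edges already placed.
            candidates = [v for v in range(n)
                          if v != i and (i, v) not in rewired and (v, i) not in rewired]
            j = rng.choice(candidates)
        rewired.add((i, j))
    return rewired

# p = 0 keeps the regular lattice; p = 1 gives an essentially random graph.
print(len(watts_strogatz(20, 4, 0.1, seed=0)), "edges")
```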
19. 1) There is a broad interval of p for which L is
small but C remains large
2) Small world networks are common
20. The Barabási-Albert (BA) model (1999)
Look at the distribution of degrees
(Figure: degree distributions P(k) for the ER model, the WS model, and real networks: the WWW, actors, and the power grid)
The probability of finding a highly connected node decreases exponentially with k
21. - There are two problems with the previous models:
- 1. N does not vary
- 2. the probability that two vertices are connected is uniform
- GROWTH: starting with a small number of vertices m0, at every timestep add a new vertex with m ≤ m0 edges
- PREFERENTIAL ATTACHMENT: the probability Π that a new vertex will be connected to vertex i depends on the connectivity of that vertex
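In the BA paper this probability is proportional to the degree, Π(k_i) = k_i / Σ_j k_j. A compact sketch of the growth process (illustrative names; the repeated-targets list is one common way to realise preferential attachment):

```python
import random

def barabasi_albert(m0, m, t, seed=None):
    """Grow a BA network: start from m0 vertices, then for t timesteps
    add one vertex with m <= m0 edges, each attached to an existing
    vertex with probability proportional to its degree."""
    rng = random.Random(seed)
    targets = list(range(m0))          # seed vertices
    edges = []
    for new in range(m0, m0 + t):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))   # degree-weighted choice
        for i in chosen:
            edges.append((new, i))
            targets.extend([new, i])          # update degree weights
    return edges

# N = m0 + t vertices; the degree distribution approaches a power law.
print(len(barabasi_albert(5, 5, 1000, seed=0)), "edges")
```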
22. a) Connectivity distribution with N = m0 + t = 300,000 and m0 = m = 1 (circles), m0 = m = 3 (squares), m0 = m = 5 (diamonds), and m0 = m = 7 (triangles)
b) P(k) for m0 = m = 5 and system sizes N = 100,000 (circles), N = 150,000 (squares), and N = 200,000 (diamonds)
→ Scale-Free Networks
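The distinguishing feature is a power-law tail in the degree distribution, in contrast with the exponential decay seen in the ER and WS models; for the BA model the reported exponent is close to 3:

```latex
P(k) \sim k^{-\gamma}, \qquad \gamma_{\mathrm{BA}} \approx 3
```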
23. Part 3: Machine Learning
- Artificial Intelligence/Machine Learning
- Definition of Learning
- 3 types of learning
- Supervised learning
- Unsupervised learning
- Reinforcement Learning
- Classification problems, regression problems
- Occam's razor
- Estimating generalization
- Some important topics
- Naïve Bayes
- Probability density estimation
- Linear discriminants
- Non-linear discriminants (Decision Trees, Support
Vector Machines)
27. Classification Problems
Bayes rule: minimum classification error is achieved by selecting the class with the largest posterior probability
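Written out (standard notation; the slide itself shows only the statement above):

```latex
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}, \qquad
\hat{c}(x) = \arg\max_{c} P(c \mid x) = \arg\max_{c} P(x \mid c)\,P(c)
```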
28. Regression Problems
PROBLEM: we are only given the red points, and we would like to approximate the blue curve (e.g. with polynomial functions)
QUESTION: which solution should I pick? And why?
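A small illustration of the dilemma (hypothetical data standing in for the red points; numpy's polyfit is used only as a convenient fitting routine): training error keeps shrinking as the polynomial degree grows, yet the highest-degree fit is usually the worst answer to the QUESTION above, which is where Occam's razor comes in.

```python
import numpy as np

# Hypothetical "red points": noisy samples of an unknown smooth curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.15, size=x.size)

# Fit polynomials of increasing degree and report the training error.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(float(mse), 4))
```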
32. Naïve Bayes
Example: given a set of features for each gene, predict whether it is essential
34. Naïve Bayes approximation
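The approximation in question is presumably the usual conditional-independence factorisation (standard form; the slide content itself is an image):

```latex
P(f_1, \dots, f_n \mid c) \;\approx\; \prod_{i=1}^{n} P(f_i \mid c)
\quad\Rightarrow\quad
\hat{c} = \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(f_i \mid c)
```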
35. Probability density estimation
- Assume a certain probabilistic model for each class
- Learn the parameters of each model (EM algorithm)
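As a sketch of the second bullet, a minimal EM loop for a two-component 1-D Gaussian mixture (everything here, including the initialisation, is an illustrative assumption rather than the slide's own recipe):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture (sketch)."""
    w = np.array([0.5, 0.5])                        # mixing weights
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initial means
    var = np.array([x.var(), x.var()])              # initial variances
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 100)])
print(em_gmm_1d(x))
```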
36. Linear discriminants
- Assume a specific functional form for the discriminant function
- Learn its parameters
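For example, with a linear form g(x) = w·x + b, the parameters can be learned with the classic perceptron rule (the data and settings below are toy assumptions, not from the slides):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Learn w, b of a linear discriminant g(x) = w.x + b; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> update
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (assumed, for illustration only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # should reproduce the labels in y
```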
37. Decision Trees (C4.5, CART)
- ISSUES:
- how to choose the best attribute
- how to prune the tree
Trees can be converted into rules!
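One standard answer to the first issue is to pick the attribute with the highest information gain, as in C4.5-style splitting (the toy genes-and-essentiality data below is invented purely for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute."""
    remainder = 0.0
    for v in set(r[attribute] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Invented toy data: choose the attribute with the highest gain as the split.
rows = [{"localization": "nucleus", "expressed": "yes"},
        {"localization": "nucleus", "expressed": "no"},
        {"localization": "membrane", "expressed": "yes"},
        {"localization": "membrane", "expressed": "no"}]
labels = ["essential", "essential", "non-essential", "non-essential"]
best = max(["localization", "expressed"], key=lambda a: information_gain(rows, labels, a))
print(best)   # 'localization' separates the toy labels perfectly
```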
38. Part 4: Network Predictions
- Naïve Bayes for inferring Protein-Protein
Interactions
39. The data
Gold-Standards
Network
Jansen, Yu, et al., Science; Yu, et al., Genome Res.
40. Gold-Standards
Network
41. Gold-Standards
Network
L1 = (4/4) / (3/6) = 2
46. Gold-Standards
Network
L1 = (4/4) / (3/6) = 2
L2 = (3/4) / (3/6) = 1.5
For each protein pair: LR = L1 × L2, and log(LR) = log(L1) + log(L2)
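Plugging in the numbers from the slide, the two likelihood ratios combine multiplicatively, or equivalently additively in log space:

```latex
LR = L_1 \times L_2 = 2 \times 1.5 = 3, \qquad
\log_{10} LR = \log_{10} 2 + \log_{10} 1.5 \approx 0.301 + 0.176 = 0.477
```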
48. - Individual features are weak predictors:
- LR ~ 10
- Bayesian integration is much more powerful:
- LR cutoff of 600 yields ~9000 interactions