COMP201 Java Programming - PowerPoint PPT Presentation

About This Presentation

Title:

COMP201 Java Programming

Description:

Hill climbing algorithm. Phylogeny / Slide 3. Nevin L. Zhang, HKUST. Today ... When restricted to one particular site, a phylogenetic tree is an HLC model where ...

Provided by: CSD149

Transcript and Presenter's Notes

1
COMP 538: Introduction to Bayesian Networks
Lecture 16: Wrap-Up

2
Recap

Latent class models
Clustering
Clustering criterion: conditional independence
Drawback: assumption too strong
Hierarchical latent class (HLC) models
Identifiability issues: regularity, equivalence
Hill climbing algorithm

3
Today

Phylogenetic (evolution) trees
Closely related to HLC models
An example of viewing existing models in the framework of BN
Another example: HMM
Interesting because:
Eases understanding
Techniques in one field applied to another
Structural EM for phylogenetic trees
Dynamic BNs for speech understanding
Development of general purpose algorithms
Bayesian networks for classification
Hand waving only

4
Phylogenetic Trees: Outline

Introduction to phylogenetic trees
Probabilistic models of evolution
Tree reconstruction

5
Phylogenetic Trees

Assumption:
All organisms on Earth have a common ancestor.
This implies that any set of species is related.
Phylogeny:
The relationship between any set of species.
Phylogenetic tree:
Usually, the relationship can be represented by a tree, which is called a phylogenetic (evolution) tree.
This is not always true.

6
Phylogenetic Trees

[Figure: a phylogenetic tree, with current-day species at the bottom]

7
Phylogenetic Trees

Taxa (sequences) identify species
Edge lengths represent evolution time
Assumption: bifurcating tree topology

8
Probabilistic Models of Evolution

Characterize the relationship between taxa using substitution probabilities
$P(x \mid y, t)$: probability that ancestral sequence $y$ evolves into sequence $x$ along an edge of length $t$
$P(X_7)$, $P(X_5 \mid X_7, t_5)$, $P(X_6 \mid X_7, t_6)$, $P(S_1 \mid X_5, t_1)$, $P(S_2 \mid X_5, t_2)$, ...

9
Probabilistic Models of Evolution

What should $P(x \mid y, t)$ be?
Two assumptions of commonly used models:
There are only substitutions, no insertions/deletions (aligned)
One-to-one correspondence between sites in different sequences
Each site evolves independently and identically
$P(x \mid y, t) = \prod_{i=1}^{m} P(x(i) \mid y(i), t)$
$m$ is the sequence length

AAGGCAT
10
Probabilistic Models of Evolution

What should $P(x(i) \mid y(i), t)$ be?
Jukes-Cantor (character evolution) model, 1969
Rate of substitution: a (constant or parameter?)
Multiplicativity (lack of memory)
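
To make the two properties above concrete, here is a minimal Python sketch of the Jukes-Cantor substitution probability for a single site. It assumes that the rate a is the rate of change to each of the three other nucleotides (other texts scale the rate differently), and the function names `jc_prob`, `seq_prob`, and `two_step_prob` are illustrative, not from the slides. Multiplicativity then says that evolving for time s and then for time t gives the same probabilities as evolving for time t + s.

```python
import math

NUCLEOTIDES = "ACGT"

def jc_prob(x, y, t, a=1.0):
    """P(x | y, t) for one site under Jukes-Cantor.

    Assumption: a is the substitution rate towards each other nucleotide.
    """
    decay = math.exp(-4.0 * a * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def seq_prob(x_seq, y_seq, t, a=1.0):
    """P(x | y, t) for aligned sequences: product over independent sites."""
    if len(x_seq) != len(y_seq):
        raise ValueError("sequences must be aligned (equal length)")
    return math.prod(jc_prob(x, y, t, a) for x, y in zip(x_seq, y_seq))

def two_step_prob(x, y, t, s, a=1.0):
    """Evolve y -> z over time s, then z -> x over time t, summing over z."""
    return sum(jc_prob(x, z, t, a) * jc_prob(z, y, s, a) for z in NUCLEOTIDES)

# Multiplicativity: the two-step probability matches one step of length t + s.
gap = abs(two_step_prob("A", "G", 0.2, 0.5) - jc_prob("A", "G", 0.7))
print("multiplicativity gap:", gap)
```

Because the model forgets how it reached the intermediate state, only the total edge length matters, which is why a single parameter per edge is enough.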

11
Tree Reconstruction

Given: collection of current-day taxa
Find: tree
Tree topology T
Edge lengths t
Maximum likelihood:
Find the tree that maximizes $P(\text{data} \mid \text{tree})$

AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
12
Tree Reconstruction

When restricted to one particular site, a phylogenetic tree is an HLC model where:
The structure is a binary tree and all variables share the same state space.
The conditional probabilities come from the character evolution model, parameterized by edge lengths instead of the usual parameterization.
The model is the same for different sites.
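
As a rough illustration of this view, the sketch below computes the likelihood of one site by summing out the latent internal nodes of a binary tree. It uses Felsenstein's pruning recursion, which the slides do not name, together with a Jukes-Cantor substitution probability. The nested-tuple tree encoding, the uniform root prior, the rate convention, and all function names are assumptions made for this example.

```python
import math

NUCLEOTIDES = "ACGT"

def jc_prob(x, y, t, a=1.0):
    """Jukes-Cantor P(x | y, t); a is the rate towards each other nucleotide (assumption)."""
    decay = math.exp(-4.0 * a * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def partial(node, site):
    """Return {state: P(observed leaves below node | node = state)}.

    A leaf is an int indexing into `site`; an internal node is
    ((left_child, t_left), (right_child, t_right)).
    """
    if isinstance(node, int):
        return {s: 1.0 if s == site[node] else 0.0 for s in NUCLEOTIDES}
    (left, t_left), (right, t_right) = node
    below_left, below_right = partial(left, site), partial(right, site)
    result = {}
    for s in NUCLEOTIDES:
        p_left = sum(jc_prob(c, s, t_left) * below_left[c] for c in NUCLEOTIDES)
        p_right = sum(jc_prob(c, s, t_right) * below_right[c] for c in NUCLEOTIDES)
        result[s] = p_left * p_right
    return result

def site_likelihood(tree, site):
    """P(site | tree), assuming a uniform prior over the root state."""
    root = partial(tree, site)
    return sum(0.25 * root[s] for s in NUCLEOTIDES)

def log_likelihood(tree, sites):
    """Sites are i.i.d., so their log-likelihoods add up."""
    return sum(math.log(site_likelihood(tree, site)) for site in sites)

# Hypothetical four-taxon tree: the root has children x5 and x6, as in the slides' figure.
x5 = ((0, 0.1), (1, 0.2))
x6 = ((2, 0.1), (3, 0.2))
tree = ((x5, 0.3), (x6, 0.3))
sites = list(zip("AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA"))
print(log_likelihood(tree, sites))
```

The same function is reused for every site, which reflects the last point above: only the observed characters change from site to site, not the model.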

13
Tree Reconstruction

Current-day taxa: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
Samples for the HLC model: one sample per site. The samples are i.i.d.
1st site: (A, T, T, A, A),
2nd site: (G, A, A, G, G),
3rd site: (G, G, G, C, C), ...
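
The per-site samples are simply the columns of the aligned sequences. A small sketch in Python, where `site_samples` is an illustrative helper name rather than anything from the slides:

```python
taxa = ["AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA", "AGCGCTT"]

def site_samples(sequences):
    """Turn aligned sequences (one per taxon) into one sample per site."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    # zip(*sequences) walks the alignment column by column.
    return [tuple(column) for column in zip(*sequences)]

samples = site_samples(taxa)
for index, sample in enumerate(samples[:3], start=1):
    print(index, sample)
```

Each tuple holds one observed character per taxon, so a seven-character alignment yields seven samples for the HLC model.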

14
Tree Reconstruction

Finding the ML phylogenetic tree = finding the ML HLC model
Model space:
Model structures: binary trees where all variables share the same state space, which is known.
Parameterization: one parameter for each edge.
(In general, $P(x \mid y)$ has $|x||y| - 1$ parameters.)

15
Bayesian Networks for Classification

The problem
Given: data
Find: mapping
$(A_1, A_2, \ldots, A_n) \to C$
Possible solutions
ANN
Decision tree (Quinlan)

16
Bayesian Networks for Classification

Naïve Bayes model
From data, learn
$P(C)$, $P(A_i \mid C)$
Classification:
$\arg\max_c P(C = c \mid A_1 = a_1, \ldots, A_n = a_n)$
Very good in practice
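
A minimal sketch of the naive Bayes classifier described above, for discrete attributes. The add-one smoothing, the function names, and the toy data are assumptions added for illustration; the slides only state which probabilities are learned and how the class is chosen.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect the counts needed to estimate P(C) and P(A_i | C)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def classify(model, row):
    """Return arg max_c P(C = c) * prod_i P(A_i = a_i | C = c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())

    def score(c):
        p = class_counts[c] / total
        for i, v in enumerate(row):
            # Add-one smoothing (an assumption) avoids zero probabilities.
            p *= (value_counts[(i, c)][v] + 1) / (class_counts[c] + len(domains[i]))
        return p

    return max(class_counts, key=score)

# Hypothetical toy data: two attributes, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(classify(model, ("sunny", "hot")))
```

The score is proportional to the posterior, so the normalizing constant can be skipped when only the arg max is needed.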

17
Bayesian Networks for Classification

Drawback of NB
Attributes are assumed mutually independent given the class variable
Often violated, leading to double counting.
Fixes
General BN classifiers
Tree augmented Naïve Bayes (TAN) models
Hierarchical NB

18
Bayesian Networks for Classification

General BN classifier
Treat the class variable as just another variable
Learn a BN.
Classify the next instance based on the values of the variables in the Markov blanket of the class variable.
Pretty bad because it does not utilize all available information
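
For reference, the Markov blanket of a node in a BN consists of its parents, its children, and the other parents of its children. The sketch below extracts it from a DAG given as a parent map; the function name and the small example network are hypothetical.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG.

    `parents` maps each variable to the set of its parents.
    """
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    return (set(parents.get(node, ())) | children | co_parents) - {node}

# Hypothetical learned network over class C and attributes A1..A4.
parents = {
    "A1": set(),
    "A3": set(),
    "C": {"A1"},
    "A2": {"C", "A3"},
    "A4": {"A3"},
}
print(sorted(markov_blanket(parents, "C")))
```

In this example A4 lies outside the blanket of C, so its observed value would play no role in classifying an instance, which illustrates the drawback noted above.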

19
Bayesian Networks for Classification

TAN model
Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29:131-163.
Capture dependence among attributes using a tree structure.
During learning:
First learn a tree among the attributes using the Chow-Liu algorithm
Add the class variable and estimate parameters
Classification:
$\arg\max_c P(C = c \mid A_1 = a_1, \ldots, A_n = a_n)$
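
The tree-learning step can be sketched as follows: estimate the mutual information between every pair of attributes and keep a maximum-weight spanning tree. This is the plain Chow-Liu procedure with empirical counts and Kruskal's algorithm; the TAN construction in the cited paper weights edges by conditional mutual information given the class instead. The function names and toy data are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(rows, i, j):
    """Empirical mutual information between attributes i and j (in nats)."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    return sum((nij / n) * math.log(nij * n / (count_i[a] * count_j[b]))
               for (a, b), nij in joint.items())

def chow_liu_edges(rows):
    """Edges of a maximum-weight spanning tree over the attributes."""
    n_attrs = len(rows[0])
    weighted = sorted(((mutual_information(rows, i, j), i, j)
                       for i, j in combinations(range(n_attrs), 2)), reverse=True)
    root = list(range(n_attrs))  # union-find forest for cycle detection

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # joining two components cannot create a cycle
            root[ri] = rj
            edges.append((i, j))
    return edges

# Toy data: attributes 0 and 1 always agree; attribute 2 is unrelated to both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_edges(rows))
```

The strongly dependent pair is linked first, and the remaining attribute is attached by a zero-weight edge only to keep the structure connected.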

20
Bayesian Networks for Classification

Hierarchical Naïve Bayes models
N. L. Zhang, T. D. Nielsen, and F. V. Jensen (2002). Latent variable discovery in classification models. Artificial Intelligence in Medicine, to appear.
Capture dependence among attributes using latent variables
Detect interesting latent structures in addition to classification
Currently, slow