COMP201 Java Programming - PowerPoint PPT Presentation

About This Presentation
Title:

COMP201 Java Programming

Description:

Hill climbing algorithm. Phylogeny / Slide 3. Nevin L. Zhang, HKUST. Today ... When restricted to one particular site, a phylogenetic tree is an HLC model where ... – PowerPoint PPT presentation

Slides: 21
Provided by: CSD149
Category:
Tags: comp201 | hill | java | one | programming | tree

Transcript and Presenter's Notes


1
COMP 538 Introduction of Bayesian networks
Lecture 16 Wrap-Up

2
Recap
  • Latent class models
  • Clustering
  • Clustering criterion: conditional independence
  • Drawback: assumption too strong
  • Hierarchical latent class (HLC) models
  • Identifiability issues: regularity, equivalence
  • Hill climbing algorithm

3
Today
  • Phylogenetic (evolution) trees
  • Closely related to HLC models
  • An example of viewing existing models in the
    framework of BN
  • Another example: HMM
  • Interesting because
  • Eases understanding
  • Techniques in one field applied to another
  • Structural EM for phylogenetic trees
  • Dynamic BNs for speech understanding
  • Development of general purpose algorithms
  • Bayesian networks for classification
  • Hand waving only

4
Phylogenetic Tree Outline
  • Introduction to phylogenetic trees
  • Probabilistic models of evolution
  • Tree reconstruction

5
Phylogenetic Trees
  • Assumption
  • All organisms on Earth have a common ancestor
  • This implies that any set of species is related.
  • Phylogeny
  • The relationship between any set of species.
  • Phylogenetic tree
  • Usually, the relationship can be represented by a
    tree which is called a phylogenetic (evolution)
    tree
  • this is not always true

6
Phylogenetic Trees
  • Phylogenetic trees

Current-day species at bottom
7
Phylogenetic Trees
  • TAXA (sequences) identify species
  • Edge lengths represent evolution time
  • Assumption: bifurcating tree topology

8
Probabilistic Models of Evolution
  • Characterize relationship between taxa using
    substitution probability
  • P(x | y, t): probability that ancestral sequence
    y evolves into sequence x along an edge of length
    t
  • P(X7), P(X5 | X7, t5), P(X6 | X7, t6), P(S1 | X5, t1),
    P(S2 | X5, t2), ...

9
Probabilistic Models of Evolution
  • What should P(x | y, t) be?
  • Two assumptions of commonly used models
  • There are only substitutions, no
    insertions/deletions (aligned)
  • One-to-one correspondence between sites in
    different sequences
  • Each site evolves independently and identically
  • P(x | y, t) = ∏_{i=1}^{m} P(x(i) | y(i), t)
  • m is sequence length
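
The factorization over sites can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names are hypothetical, and `toy_site_prob` is a placeholder per-site model chosen only so the example runs. Working in log space avoids underflow when m is large.

```python
import math

def sequence_log_prob(x, y, t, site_prob):
    """log P(x | y, t), assuming sites evolve independently and identically."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    # The product over sites becomes a sum of per-site log probabilities.
    return sum(math.log(site_prob(xi, yi, t)) for xi, yi in zip(x, y))

def toy_site_prob(xi, yi, t):
    # Placeholder model, NOT from the slides: a site stays unchanged with
    # probability exp(-t); otherwise it becomes one of the 3 other bases.
    stay = math.exp(-t)
    return stay if xi == yi else (1.0 - stay) / 3.0

print(sequence_log_prob("AAGGCAT", "AAGGCTT", 0.5, toy_site_prob))
```

Any per-site model with the signature `site_prob(xi, yi, t)` can be passed in place of the placeholder.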

AAGGCAT
10
Probabilistic Models of Evolution
  • What should P(x(i) | y(i), t) be?
  • Jukes-Cantor (character evolution) model (1969)
  • Rate of substitution: a (constant or parameter?)
  • Multiplicativity (lack of memory)
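
A small sketch of the Jukes-Cantor transition probability and a numerical check of multiplicativity. One assumption is made about the parameterization, since the slide does not spell it out: `alpha` is taken to be the rate at which a base changes into each specific other base. Under that convention a site keeps its base with probability 1/4 + 3/4 exp(-4 alpha t) and changes to a given other base with probability 1/4 - 1/4 exp(-4 alpha t). The function names are illustrative.

```python
import math

BASES = "ACGT"

def jc_prob(x, y, t, alpha=1.0):
    """P(x | y, t) for one site under Jukes-Cantor.

    Assumption: alpha is the rate of change into each specific other base,
    so the total rate of leaving a base is 3 * alpha.
    """
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def two_step_prob(x, y, t1, t2, alpha=1.0):
    """Evolve y for time t1 into an intermediate base z, then z for t2 into x."""
    return sum(jc_prob(x, z, t2, alpha) * jc_prob(z, y, t1, alpha) for z in BASES)

# Multiplicativity: summing over the intermediate base gives the same
# probability as evolving directly for time t1 + t2.
for x in BASES:
    for y in BASES:
        assert abs(two_step_prob(x, y, 0.3, 0.5) - jc_prob(x, y, 0.8)) < 1e-12
```

Because of this property, inserting or removing a node in the middle of an edge does not change the distribution over the endpoints, which is why only edge lengths matter.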

11
Tree Reconstruction
  • Given: collection of current-day taxa
  • Find: tree
  • Tree topology T
  • Edge lengths t
  • Maximum likelihood
  • Find tree to maximize P(data | tree)

AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
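
To make P(data | tree) concrete, here is a hedged sketch that evaluates the likelihood of a fixed tree by summing out the ancestral bases from the leaves upward (the standard pruning recursion). Several things are assumptions for illustration only: the Jukes-Cantor per-site model with rate 1, a uniform distribution at the root, the nested-list tree encoding, the taxon labels S1 to S3, and the topology and edge lengths in the example.

```python
import math

BASES = "ACGT"

def jc_prob(x, y, t, alpha=1.0):
    # Jukes-Cantor P(x | y, t); alpha is the rate into each specific other base.
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def partial_likelihood(node, site):
    """Map each base s to P(observed leaves below node | node has base s)."""
    if isinstance(node, str):  # leaf: a taxon name
        return {s: 1.0 if s == site[node] else 0.0 for s in BASES}
    result = {s: 1.0 for s in BASES}
    for child, t in node:      # internal node: list of (child, edge_length)
        below = partial_likelihood(child, site)
        for s in BASES:
            result[s] *= sum(jc_prob(c, s, t) * below[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """log P(data | tree), treating sites as i.i.d. samples."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        root = partial_likelihood(tree, site)
        total += math.log(sum(0.25 * root[s] for s in BASES))
    return total

# Hypothetical topology and edge lengths for three of the taxa above.
alignment = {"S1": "AGGGCAT", "S2": "TAGCCCA", "S3": "TAGACTT"}
tree = [([("S1", 0.3), ("S2", 0.2)], 0.1), ("S3", 0.4)]
print(log_likelihood(tree, alignment))
```

Maximum likelihood reconstruction would search over topologies and edge lengths for the tree that maximizes this quantity; the sketch only scores one candidate.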
12
Tree Reconstruction
  • When restricted to one particular site, a
    phylogenetic tree is an HLC model where
  • The structure is a binary tree and variables
    share the same state space.
  • The conditional probabilities are from the
    character evolution model, parameterized by edge
    lengths instead of usual parameterization.
  • The model is the same for different sites

13
Tree Reconstruction
  • Current-day taxa: AGGGCAT, TAGCCCA, TAGACTT,
    AGCACAA, AGCGCTT
  • Samples for HLC model: one sample per site. The
    samples are i.i.d.
  • 1st site: (A, T, T, A, A)
  • 2nd site: (G, A, A, G, G)
  • 3rd site: (G, G, G, C, C), ...
14
Tree Reconstruction
  • Finding the ML phylogenetic tree amounts to finding
    the ML HLC model
  • Model space
  • Model structures: binary trees where all variables
    share the same state space, which is known.
  • Parameterization: one parameter for each edge.
    (In general, P(x | y) has (|x| - 1)|y| free parameters.)
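
As a small illustration of what "one parameter per edge" means for estimation, consider the simplest case: two taxa joined by a single path of total length t. Assuming the Jukes-Cantor model with `alpha` as the rate into each specific other base (an assumption, not stated on the slide), the ML estimate of t depends only on the fraction p of differing sites and has the closed form t = -ln(1 - 4p/3) / (4 alpha). The function names below are hypothetical.

```python
import math

def pair_log_likelihood(k, m, t, alpha=1.0):
    """log-likelihood of path length t (t > 0), up to an additive constant,
    given k differing sites out of m."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return (m - k) * math.log(same) + k * math.log(diff)

def ml_distance(x, y, alpha=1.0):
    """Closed-form ML path length between two aligned sequences."""
    m = len(x)
    k = sum(a != b for a, b in zip(x, y))
    p = k / m
    if p >= 0.75:
        return math.inf  # saturated: the likelihood has no finite maximizer
    return -math.log(1.0 - 4.0 * p / 3.0) / (4.0 * alpha)

print(ml_distance("AGGGCAT", "AGCGCTT"))
```

For larger trees there is no closed form, and edge lengths are fit numerically for each candidate structure.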

15
Bayesian Networks for Classification
  • The problem
  • Given data
  • Find mapping
  • (A1, A2, ..., An) -> C
  • Possible solutions
  • ANN
  • Decision tree (Quinlan)

16
Bayesian Networks for Classification
  • Naïve Bayes model
  • From data, learn
  • P(C), P(Ai | C)
  • Classification
  • arg max_c P(C=c | A1=a1, ..., An=an)
  • Very good in practice
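
A minimal sketch of the Naïve Bayes recipe above for discrete attributes: estimate P(C) and P(Ai | C) from counts, then pick the class with the largest posterior score. The add-one (Laplace) smoothing and the function names are assumptions added for illustration; the slides do not specify how the probabilities are estimated.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect the counts needed for P(C) and P(Ai | C)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> values seen in training
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            value_counts[(i, c)][a] += 1
            values[i].add(a)
    return class_counts, value_counts, values

def classify(model, row, smoothing=1.0):
    """Return arg max_c of log P(C=c) + sum_i log P(Ai=ai | C=c)."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, count in class_counts.items():
        score = math.log(count / n)
        for i, a in enumerate(row):
            # Smoothed estimate of P(Ai = a | C = c); smoothing is an assumption.
            num = value_counts[(i, c)][a] + smoothing
            den = count + smoothing * len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(classify(model, ("sunny", "hot")))
```

The normalizing constant P(A1=a1, ..., An=an) is the same for every class, so it can be dropped when taking the arg max.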

17
Bayesian Networks for Classification
  • Drawback of NB
  • Attributes mutually independent given class
    variable
  • Often violated, leading to double counting.
  • Fixes
  • General BN classifiers
  • Tree augmented Naïve Bayes (TAN) models
  • Hierarchical NB

18
Bayesian Networks for Classification
  • General BN classifier
  • Treat class variable just as another variable
  • Learn a BN.
  • Classify the next instance based on values of
    variables in the Markov blanket of the class
    variable.
  • Pretty bad because it does not utilize all
    available information

19
Bayesian Networks for Classification
  • TAN model
  • Friedman, N., Geiger, D., and Goldszmidt, M.
    (1997). Bayesian network classifiers. Machine
    Learning, 29:131-163.
  • Capture dependence among attributes using a tree
    structure.
  • During learning,
  • First learn a tree among attributes: use the
    Chow-Liu algorithm
  • Add class variable and estimate parameters
  • Classification
  • arg max_c P(C=c | A1=a1, ..., An=an)
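
The tree-learning step can be sketched as a plain Chow-Liu procedure: weight every pair of attributes by its empirical mutual information and keep a maximum-weight spanning tree. This is an illustrative sketch with hypothetical function names, not the authors' implementation. Note that in Friedman et al. the edge weights are conditional mutual information given the class variable; the version below uses unconditional mutual information to keep the example short.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(rows, i, j):
    """Empirical mutual information between attributes i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    left = Counter(r[i] for r in rows)
    right = Counter(r[j] for r in rows)
    return sum((c / n) * math.log(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())

def chow_liu_edges(rows):
    """Edges of a maximum-weight spanning tree over the attributes."""
    k = len(rows[0])
    weighted = sorted(((mutual_information(rows, i, j), i, j)
                       for i, j in combinations(range(k), 2)), reverse=True)
    component = list(range(k))  # component[v] = label of the subtree containing v
    edges = []
    for _, i, j in weighted:    # Kruskal: take the heaviest edge that joins two subtrees
        if component[i] != component[j]:
            edges.append((i, j))
            old, new = component[j], component[i]
            component = [new if c == old else c for c in component]
    return edges

# Attributes 0 and 1 always agree; attribute 2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_edges(rows))
```

After the tree is fixed, the class variable is added as a parent of every attribute and the conditional probabilities are estimated from counts, as in Naïve Bayes.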

20
Bayesian Networks for Classification
  • Hierarchical Naïve Bayes models
  • N. L. Zhang, T. D. Nielsen, and F. V. Jensen
    (2002). Latent variable discovery in 
    classification models. Artificial Intelligence in
    Medicine, to appear.
  • Capture dependence among attributes using latent
    variables
  • Detect interesting latent structures besides
    classification
  • Currently, slow