Maximum Likelihood - PowerPoint PPT Presentation

About This Presentation

Title:

Maximum Likelihood

Description:

Given p of success in single trial, what is probability of h success over n trials? ... CPU intensive and may take a long time to complete an evaluation ... – PowerPoint PPT presentation

Slides: 20

Provided by: peopleB

Learn more at: https://people.brandeis.edu

Transcript and Presenter's Notes

1
Maximum Likelihood

Flips the usage of the probability function
A typical calculation
$P(h \mid n, p) = \binom{n}{h} p^h (1-p)^{n-h}$
The implied question
Given probability p of success in a single trial, what is
the probability of h successes over n trials?

2
The ML question

The ML calculation
$L(p \mid n, h) = \binom{n}{h} p^h (1-p)^{n-h}$
Given h successes observed over n trials, how
likely is a particular value of the parameter p?
Experiment with test values of p and choose the
one that results in the highest likelihood
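
The trial-and-error search over p described above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names, the grid of 101 test values, and the data (7 successes in 10 trials) are all assumptions made for the example.

```python
from math import comb


def binomial_likelihood(p, n, h):
    """Binomial likelihood of p given h successes in n trials."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)


def best_p(n, h, steps=100):
    """Try p = 0, 1/steps, ..., 1 and keep the value with the highest likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda p: binomial_likelihood(p, n, h))


# Hypothetical data: 7 successes observed in 10 trials.
print(best_p(10, 7))
```

On this grid the best test value coincides with the observed proportion h/n, which is the known maximum likelihood estimate for a binomial parameter.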

3
Consider a small alignment

Let the number of sequences s = 3
Each position is a data point
For each position, $4^s$ possible values, e.g.
{A,A,A}, {A,A,T}
In this example, $4^3 = 64$ possible values at each position.
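
As a quick check of the count above, the possible column patterns can be enumerated directly. This is a small sketch; the function name `site_patterns` is hypothetical and not from the slides.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def site_patterns(s):
    """List every possible alignment column for s sequences (4**s patterns)."""
    return ["".join(column) for column in product(NUCLEOTIDES, repeat=s)]


patterns = site_patterns(3)
print(len(patterns))  # 64
print(patterns[:3])   # ['AAA', 'AAC', 'AAG']
```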

4
Probability/Likelihood function
The simplest model uses an arbitrary p for each
of the 64 possible values, based on its observed
frequency. In the example, 2 patterns have p = 0.5 and all
others have p = 0. The result works but is not
biologically interesting.
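
A minimal sketch of this frequency-based model follows. The four-column alignment is invented for illustration, chosen so that two patterns each occur half the time; the helper names are hypothetical as well.

```python
from collections import Counter
from math import log


def pattern_frequencies(columns):
    """Assign each observed column pattern a probability equal to its observed frequency."""
    counts = Counter(columns)
    total = len(columns)
    return {pattern: count / total for pattern, count in counts.items()}


def log_likelihood(columns, probs):
    """Sum the log probabilities of the observed columns under the given model."""
    return sum(log(probs[column]) for column in columns)


# Hypothetical alignment of 3 sequences, written column by column.
columns = ["AAA", "AAT", "AAA", "AAT"]
probs = pattern_frequencies(columns)
print(probs)  # {'AAA': 0.5, 'AAT': 0.5}; the other 62 patterns implicitly get p = 0
print(log_likelihood(columns, probs))
```

The model fits the observed data as well as any model can, but it has one free parameter per pattern and says nothing about how the sequences evolved.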
5
Maximum likelihood testing model
6
Definition

Method for the inference of phylogeny
Method that searches for the tree with the
highest probability or likelihood.

7
Example going through the Maximum likelihood model

Assume that we have the aligned nucleotide
sequences for four taxa
(1) A G G C U C C A A .... A
(2) A G G U U C G A A .... A
(3) A G C C C A G A A .... A
(4) A U U U C G G A A .... C
Evaluate the likelihood of the unrooted tree
represented by the nucleotides of site j in the
sequence

http://www.icp.ucl.ac.be/opperd/private/max_likeli.html
8

Since the likelihood of the tree is independent
of the position of the root, we can display the
figure as shown in Figure B.
Assume that the nucleotides evolve independently
(the Markovian model of evolution)
Calculate the likelihood for each site separately
and combine the likelihood into a total value
towards the end.
To calculate the likelihood for site j, we have
to consider all the possible scenarios by which
the nucleotides present at the tips of the tree
could have evolved.
Therefore the likelihood for a particular site is
the summation of the probabilities of every
possible reconstruction of ancestral states,
given some model of base substitution.

9

So in this specific case we must consider all possible
nucleotides (A, G, C, and T) occupying nodes (5) and (6),
or 4 x 4 = 16 possibilities.
For protein sequences, each site may occupy 20 states
(those of the 20 amino acids), so
20 x 20 = 400 possibilities have to be
considered.
Since any one of these scenarios could have led
to the nucleotide configuration at the tip of the
tree, we must calculate the probability of each
and sum them to obtain the total probability for
each site j.
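
The sum over the 16 ancestral reconstructions can be sketched as follows. Several things here are assumptions that the slides do not specify: the Jukes-Cantor substitution model, equal base frequencies of 0.25, the same branch length v on all five branches, and a topology in which taxa 1 and 2 attach to node (5) and taxa 3 and 4 attach to node (6). All function names are hypothetical.

```python
from itertools import product
from math import exp

STATES = "ACGT"


def jc69(i, j, v):
    """Jukes-Cantor probability that state i becomes state j over branch length v."""
    decay = exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def site_likelihood(tips, v=0.1):
    """Sum the probability of every assignment (x, y) to internal nodes 5 and 6."""
    t1, t2, t3, t4 = tips
    total = 0.0
    for x, y in product(STATES, repeat=2):  # 4 x 4 = 16 reconstructions
        total += (
            0.25                 # assumed frequency of state x at node 5
            * jc69(x, t1, v)     # node 5 -> taxon 1
            * jc69(x, t2, v)     # node 5 -> taxon 2
            * jc69(x, y, v)      # node 5 -> node 6
            * jc69(y, t3, v)     # node 6 -> taxon 3
            * jc69(y, t4, v)     # node 6 -> taxon 4
        )
    return total


# Hypothetical site where taxa 1 and 2 share A and taxa 3 and 4 share G.
print(site_likelihood("AAGG"))
```

A useful sanity check on such a function is that the likelihoods of all 256 possible tip patterns add up to 1.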

10

The likelihood for the full tree is then the product
of the likelihoods at each site.

Since the individual likelihoods are extremely
small numbers it is convenient to sum the log
likelihoods at each site and report the
likelihood of the entire tree as the log
likelihood.
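
The reason for working with logs can be seen in a short sketch. The per-site likelihood values below are invented for illustration, and the function name is hypothetical.

```python
from math import log


def tree_log_likelihood(site_likelihoods):
    """Sum the log likelihoods of the individual sites."""
    return sum(log(value) for value in site_likelihoods)


# Hypothetical alignment: 100 sites, each with likelihood 1e-5.
site_likelihoods = [1e-5] * 100

naive_product = 1.0
for value in site_likelihoods:
    naive_product *= value

print(naive_product)                          # underflows to 0.0 in double precision
print(tree_log_likelihood(site_likelihoods))  # finite, and still comparable between trees
```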

The above procedure is then repeated for all
possible topologies (for all possible trees).
The tree with the highest likelihood is the
maximum likelihood tree.

12
Huelsenbeck, J., Crandall, K. Annu. Rev. Ecol.
Syst., 1997, 28:437-66.
13
DNA Substitution Models
14
General DNA Substitution Model

Likelihood L is the probability of observing data
D given hypothesis H
$L = \Pr(D \mid H)$
The use of maximum likelihood (ML) algorithms in
developing phylogenetic hypotheses requires a
model of evolution.

15
The rate matrix for a general model of DNA
substitution is given by

The rows and columns are ordered A, C, G and T.
The matrix gives the rate of change from
nucleotide i (arranged along the rows) to
nucleotide j (along the columns). For example,
$r_2 \pi_C$ gives the rate of change from A to C.

16
Let P(v, s) be the transition probability matrix,
where $p_{i,j}(v, s)$ is the probability that
nucleotide i changes into j over branch length v.
The vector s contains the parameters of the
substitution model (e.g. $\pi_A, \pi_C, \pi_G, \pi_T,
r_1, r_2$). For the two-state case, to calculate the
probability of observing a change over a branch
of length v, the following matrix calculation is
performed: $P(v, s) = e^{Qv}$
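
For the two-state case the matrix exponential can be sketched with a truncated Taylor series and compared against the known closed form. The rates a and b, the branch length, and the helper names are all assumptions for illustration; a numerical library routine would normally be used instead of a hand-written series.

```python
from math import exp


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(q, v, terms=30):
    """Approximate e^(Qv) by the truncated series sum over k of (Qv)^k / k!."""
    n = len(q)
    qv = [[q[i][j] * v for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_(k-1) * (Qv) / k
        term = [[x / k for x in row] for row in mat_mul(term, qv)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, b = 1.0, 2.0          # hypothetical rates for state 0 -> 1 and state 1 -> 0
Q = [[-a, a], [b, -b]]   # each row of a rate matrix sums to zero
v = 0.5                  # hypothetical branch length
P = mat_exp(Q, v)

# Closed form for the two-state chain: probability of ending in state 1 from state 0.
closed_form = a / (a + b) * (1 - exp(-(a + b) * v))
print(P[0][1], closed_form)
```

Each row of P sums to 1, since every starting state must end in one of the two states.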
17
DNA substitution Models
18
Advantages of Maximum likelihood

Lower variance than other methods
Least affected by sampling error
Robust to many violations of the assumptions of
the evolutionary model (even with very short
sequences, they outperform other methods).
Are less error prone.
Statistically well founded.
Evaluate different tree topologies.

19
Disadvantages of Maximum likelihood