1
Probabilistic Model of Sequences
  • Ata Kaban
  • The University of Birmingham

2
Sequence
  • Example 1: a b a c a b a b a c
  • Example 2: 1 0 0 1 1 0 1 0 0 1
  • Example 3: 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3
  • Roll a six-sided die N times. You get a sequence.
  • Roll it again: you get another sequence.
  • Here is a sequence of characters, can you see it?
  • What is a sequence?
  • Alphabet 1 = {a, b, c}, Alphabet 2 = {0, 1},
    Alphabet 3 = {1, 2, 3, 4, 5, 6}

3
Probabilistic Model
  • Model = a system that simulates the sequence
    under consideration
  • Probabilistic model = a model that produces
    different outcomes with different probabilities
  • It includes uncertainty
  • It can therefore simulate a whole class of
    sequences and assigns a probability to each
    individual sequence
  • Could you simulate any of the sequences on the
    previous slide?

4
Random sequence model
  • Back to the die example (the die can possibly be loaded)
  • A model of one roll has 6 parameters:
    p1, p2, p3, p4, p5, p6
  • Here, p_i is the probability of throwing i
  • To be probabilities, these must be non-negative
    and must sum to one.
  • What is the probability of the sequence 1, 6, 3?
  • p1 × p6 × p3 (see the short sketch below)
  • NOTE: in the random sequence model, the
    individual symbols in a sequence do not depend on
    each other. This is the simplest sequence model.
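A minimal Python sketch of this model: since the symbols are independent, the probability of a sequence is just the product of the per-symbol probabilities. The fair-die parameter values are illustrative, not from the slides.

# Random sequence model: one probability parameter per symbol.
# A fair die is assumed here purely for illustration.
p = {i: 1/6 for i in range(1, 7)}

def sequence_probability(sequence, p):
    prob = 1.0
    for symbol in sequence:          # symbols are independent
        prob *= p[symbol]
    return prob

print(sequence_probability([1, 6, 3], p))   # = p[1] * p[6] * p[3]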

5
Maximum Likelihood parameter estimation
  • The parameters of a probabilistic model are
    typically estimated from a large set of trusted
    examples, called the training set.
  • Example (t = tail, h = head): t t t h t h h t
  • Count up the frequencies: t → 5, h → 3
  • Compute probabilities
  • p(t) = 5/(5+3), p(h) = 3/(5+3)
    (see the counting sketch below)
  • These are the Maximum Likelihood (ML) estimates
    of the parameters of the coin.
  • Does it make sense?
  • What if you know the coin is fair?
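A small Python sketch of this frequency-count (Maximum Likelihood) estimate, using the coin sequence from the slide:

from collections import Counter

# Training sequence from the slide: 5 tails, 3 heads.
observations = ["t", "t", "t", "h", "t", "h", "h", "t"]

counts = Counter(observations)                 # {'t': 5, 'h': 3}
total = sum(counts.values())                   # 8
ml_estimate = {s: n / total for s, n in counts.items()}

print(ml_estimate)   # {'t': 0.625, 'h': 0.375}, i.e. 5/8 and 3/8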

6
Overfitting
  • A fair coin has probabilities p(t) = 0.5, p(h) = 0.5
  • If you throw it 3 times and get t, t, t, then
    the ML estimates for this sequence are p(t) = 1,
    p(h) = 0.
  • Consequently, from these estimates, the
    probability of e.g. the sequence h, t, h, t
    is zero.
  • This is an example of what is called overfitting.
    Overfitting is the greatest enemy of Machine
    Learning!
  • Solution 1: get more data
  • Solution 2: build what you already know into
    the model. (We will return to this during the module.)

7
Why is it called Maximum Likelihood?
  • It can be shown that using the frequencies to
    compute probabilities maximises the total
    probability of all the sequences given the model
    (the likelihood), P(Data | parameters).

8
Probabilities
  • Suppose we have two dice, D1 and D2
  • The probability of rolling i given die D1 is
    written P(i | D1). This is a conditional probability.
  • Pick a die at random with probability P(Dj), j = 1
    or 2. The probability of picking die Dj and
    rolling i is called the joint probability and is
    P(i, Dj) = P(Dj) P(i | Dj).
  • For any events X and Y, P(X, Y) = P(X | Y) P(Y)
  • If we know P(X, Y), then the so-called marginal
    probability P(X) can be computed by summing over
    all values of Y: P(X) = Σ_Y P(X, Y)
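A tiny numerical sketch of these three rules; the die parameters below are invented for illustration, not taken from the slides.

# Two dice, each picked with probability 0.5; D1 fair, D2 loaded (made-up numbers).
P_die = {"D1": 0.5, "D2": 0.5}
P_roll = {
    "D1": {i: 1/6 for i in range(1, 7)},
    "D2": {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1},
}

def joint(i, die):                 # P(i, Dj) = P(Dj) * P(i | Dj)
    return P_die[die] * P_roll[die][i]

def marginal(i):                   # P(i) = sum over dice of P(i, Dj)
    return sum(joint(i, die) for die in P_die)

print(joint(1, "D2"))    # 0.5 * 0.5 = 0.25
print(marginal(1))       # 0.5 * 1/6 + 0.5 * 0.5 ≈ 0.333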

9
  • Now, we show that maximising P(Data | parameters)
    for the random sequence model leads to the
    frequency-based computation that we did
    intuitively.
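The derivation itself is not reproduced in this transcript; the standard argument (maximise the log-likelihood under the constraint that the probabilities sum to one) runs roughly as follows, written here in LaTeX:

% Let n_i be the count of symbol i in the data and N = \sum_i n_i.
% The likelihood of the data under the random sequence model is
%   P(\mathrm{Data} \mid p) = \prod_i p_i^{n_i}.
% Maximise \log P subject to \sum_i p_i = 1 with a Lagrange multiplier \lambda:
\frac{\partial}{\partial p_i}\Big( \sum_j n_j \log p_j
      + \lambda \big(1 - \sum_j p_j\big) \Big)
  = \frac{n_i}{p_i} - \lambda = 0
  \;\Rightarrow\; p_i = \frac{n_i}{\lambda}
  \;\Rightarrow\; \lambda = N
  \;\Rightarrow\; \hat{p}_i = \frac{n_i}{N}.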

10
Why did we bother? Because in more complicated
models we cannot guess the result.
11
Markov Chains
  • Further examples of sequences
  • Bio-sequences
  • Web page request sequences while browsing
  • These are no longer purely random sequences; they
    have a time structure.
  • How many parameters would such a model have?
  • We need to make simplifying assumptions to end up
    with a reasonable number of parameters
  • The first-order Markov assumption: each
    observation depends only on the immediately
    previous one, not on any longer history
  • Markov Chain = a sequence model which makes the
    Markov assumption

12
Markov Chains
  • The probability of a Markov sequence is
    P(x1, x2, ..., xN) = P(x1) P(x2 | x1) P(x3 | x2) ... P(xN | xN-1)
    (a small sketch of this computation follows below)
  • The alphabet's symbols are also called states
  • Once the parameters are estimated from training
    data, the Markov chain can be used for prediction
  • Among other applications, Markov Chains are
    successful at web browsing behavior prediction
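A minimal Python sketch of this chain rule; the two-state transition matrix and initial distribution below are invented for illustration.

# Markov chain over states {"a", "b"}; all numbers are made up for illustration.
initial = {"a": 0.6, "b": 0.4}
transition = {
    "a": {"a": 0.7, "b": 0.3},   # each row sums to one
    "b": {"a": 0.2, "b": 0.8},
}

def markov_probability(sequence, initial, transition):
    prob = initial[sequence[0]]
    for prev, curr in zip(sequence, sequence[1:]):
        prob *= transition[prev][curr]   # P(x_t | x_{t-1})
    return prob

print(markov_probability(["a", "b", "b", "a"], initial, transition))
# = 0.6 * 0.3 * 0.8 * 0.2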

13
Markov Chains
  • A Markov Chain is stationary if at any time, it
    has the same transition probabilities.
  • We assume stationary models here.
  • Then the parameters of the model consist of the
    transition probability matrix plus the initial
    state probabilities.

14
ML parameter estimation
  • We can derive how to compute the parameters of a
    Markov Chain from data, using Maximum Likelihood,
    as we did for random sequences.
  • The ML estimate of the transition matrix will
    again be very intuitive: count the observed
    transitions and normalise each row (see the
    sketch below)
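A small Python sketch of this counting estimate; the example sequence is arbitrary, not from the slides.

from collections import defaultdict

# ML estimate of a Markov chain's transition matrix:
# count observed transitions, then normalise each row.
sequence = list("abbacabcbba")   # arbitrary example data

counts = defaultdict(lambda: defaultdict(int))
for prev, curr in zip(sequence, sequence[1:]):
    counts[prev][curr] += 1

transition = {
    prev: {curr: n / sum(row.values()) for curr, n in row.items()}
    for prev, row in counts.items()
}
print(transition)   # each row of the estimated matrix sums to one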

Remember that
15
Simple example
  • If it is raining today, it will rain tomorrow
    with probability 0.8, which implies the contrary
    has probability 0.2
  • If it is not raining today, it will rain tomorrow
    with probability 0.6, which implies the contrary
    has probability 0.4
  • Build the transition matrix
  • Be careful which numbers need to sum to one and
    which don't. Such a matrix is called a stochastic
    matrix.
  • Q: It rained all week, including today. What does
    this model predict for tomorrow? Why? What does
    it predict for the day after tomorrow? (Homework;
    the sketch below lets you check your answer)
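A short Python sketch of this weather chain, using exactly the numbers from the slide; multi-step predictions come from powers of the stochastic matrix.

import numpy as np

# Rows = today's state, columns = tomorrow's state; each row sums to one.
states = ["rain", "no rain"]
P = np.array([[0.8, 0.2],    # raining today
              [0.6, 0.4]])   # not raining today

today = np.array([1.0, 0.0])                        # it rained today
tomorrow = today @ P                                # one-step prediction
day_after = today @ np.linalg.matrix_power(P, 2)    # two-step prediction

print(dict(zip(states, tomorrow)))
print(dict(zip(states, day_after)))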

16
Examples of Web Applications
  • HTTP request prediction
  • To predict the probabilities of the next requests
    from the same user based on the history of
    requests from that client.
  • Adaptive Web navigation
  • To build a navigation agent which suggests which
    other links would be of interest to the user
    based on the statistics of previous visits.
  • The predicted link does not strictly have to be a
    link present in the Web page currently being
    viewed.
  • Tour generation
  • Given a starting URL as input, it generates
    a sequence of states (or URLs) using the Markov
    chain process.

17
Building Markov Models from Web Log Files
  • A Web log file is a collection of records of user
    requests for documents on a Web site; an example
    is shown below
  • The transition matrix can be seen as a graph
  • Link pair: (r - referrer, u - requested page, w
    - hyperlink weight)
  • The link graph is called the state diagram of the
    Markov Chain
  • a directed weighted graph
  • a hierarchy from the homepage down to multiple
    levels

177.21.3.4 - - [04/Apr/1999:00:01:11 +0100] "GET
/studaffairs/ccampus.html HTTP/1.1" 200 5327
"http://www.ulst.ac.uk/studaffairs/accomm.html"
"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
18
Link Graph: an example (University of Ulster site),
Zhu et al. 2002
State diagram: nodes = states, weighted arrows =
number of transitions
19
Experimental Results (Sarukkai, 2000)
  • Simulations
  • The "correct link" refers to the actual link chosen
    at the next step.
  • The "depth" of the correct link is measured by
    counting the number of links which have a
    probability greater than or equal to that of the
    correct link.
  • Over 70% of correct links are in the top 20
    scoring states.
  • Difficulties: very large state space

20
Simple exercise
  • Build the Markov transition matrix of the
    following sequence:
  • a b b a c a b c b b d e e d e d e d
  • What is the state space?

21
Further topics
  • Hidden Markov Model
  • Does not make the Markov assumption on the
    observed sequence
  • Instead, it assumes that the observed sequence
    was generated by another sequence which is
    unobservable (hidden), and this other sequence is
    assumed to be Markovian
  • More powerful
  • Estimation is more complicated
  • Aggregate Markov model
  • Useful for clustering sub-graphs of a transition
    graph

22
HMM at an intuitive level
  • Suppose that we know all the parameters of the
    following HMM, as shown on the state-diagram
    below. What is the probability of observing the
    sequence A,B if the initial state is S1? The
    same question if the initial state is chosen
    randomly with equal probabilities.

ANSWER: If the initial state is S1:
0.2 × (0.4 × 0.8 + 0.6 × 0.7) = 0.148. In the second
case: 0.5 × 0.148 + 0.5 × 0.3 × (0.3 × 0.7 + 0.7 × 0.8)
= 0.1895.
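The state diagram itself is not reproduced in this transcript; the Python sketch below assumes the parameters implied by the worked answer (emission of A: 0.2 from S1, 0.3 from S2; emission of B: 0.8 from S1, 0.7 from S2; transitions S1→S1 = 0.4, S1→S2 = 0.6, S2→S1 = 0.7, S2→S2 = 0.3) and reproduces both numbers with the forward algorithm.

# HMM parameters reconstructed from the worked answer above (an assumption,
# since the state diagram is not included in this transcript).
states = ["S1", "S2"]
emission = {"S1": {"A": 0.2, "B": 0.8}, "S2": {"A": 0.3, "B": 0.7}}
transition = {"S1": {"S1": 0.4, "S2": 0.6}, "S2": {"S1": 0.7, "S2": 0.3}}

def forward(observations, initial):
    # Forward algorithm: total probability of the observation sequence.
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[r] * transition[r][s] for r in states)
                    * emission[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["A", "B"], {"S1": 1.0, "S2": 0.0}))   # ≈ 0.148
print(forward(["A", "B"], {"S1": 0.5, "S2": 0.5}))   # ≈ 0.1895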
23
Conclusions
  • Probabilistic Model
  • Maximum Likelihood parameter estimation
  • Random sequence model
  • Markov chain model
  • ---------------------------------
  • Hidden Markov Model
  • Aggregate Markov Model

24
Any questions?