1
Probabilistic Model of Sequences
  • Ata Kaban
  • The University of Birmingham

2
Sequence
  • Example 1: a b a c a b a b a c
  • Example 2: 1 0 0 1 1 0 1 0 0 1
  • Example 3: 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3
  • Roll a six-sided die N times. You get a sequence.
  • Roll it again: you get another sequence.
  • Here is a sequence of characters, can you see it?
  • What is a sequence?
  • Alphabet 1 = {a, b, c}, Alphabet 2 = {0, 1},
    Alphabet 3 = {1, 2, 3, 4, 5, 6}

3
Probabilistic Model
  • Model = a system that simulates the sequence
    under consideration
  • Probabilistic model = a model that produces
    different outcomes with different probabilities
  • It includes uncertainty
  • It can therefore simulate a whole class of
    sequences and assigns a probability to each
    individual sequence
  • Could you simulate any of the sequences on the
    previous slide?

4
Random sequence model
  • Back to the die example (the die can possibly be loaded)
  • A model of one roll has 6 parameters:
    p1, p2, p3, p4, p5, p6
  • Here, p_i is the probability of throwing i
  • To be probabilities, these must be non-negative
    and must sum to one.
  • What is the probability of the sequence 1, 6, 3?
  • p1 × p6 × p3 (see the short sketch below)
  • NOTE: in the random sequence model, the
    individual symbols in a sequence do not depend on
    each other. This is the simplest sequence model.
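A minimal Python sketch of this model: since the symbols are independent, the probability of a sequence is just the product of the per-symbol probabilities. The fair-die parameter values are illustrative, not from the slides.

# Random sequence model: one probability parameter per symbol.
# A fair die is assumed here purely for illustration.
p = {i: 1/6 for i in range(1, 7)}

def sequence_probability(sequence, p):
    prob = 1.0
    for symbol in sequence:          # symbols are independent
        prob *= p[symbol]
    return prob

print(sequence_probability([1, 6, 3], p))   # = p[1] * p[6] * p[3]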

5
Maximum Likelihood parameter estimation
  • The parameters of a probabilistic model are
    typically estimated from a large set of trusted
    examples, called the training set.
  • Example (t = tail, h = head): t t t h t h h t
  • Count up the frequencies: t → 5, h → 3
  • Compute probabilities
  • p(t) = 5/(5+3), p(h) = 3/(5+3)
    (see the counting sketch below)
  • These are the Maximum Likelihood (ML) estimates
    of the parameters of the coin.
  • Does it make sense?
  • What if you know the coin is fair?
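A small Python sketch of this frequency-count (Maximum Likelihood) estimate, using the coin sequence from the slide:

from collections import Counter

# Training sequence from the slide: 5 tails, 3 heads.
observations = ["t", "t", "t", "h", "t", "h", "h", "t"]

counts = Counter(observations)                 # {'t': 5, 'h': 3}
total = sum(counts.values())                   # 8
ml_estimate = {s: n / total for s, n in counts.items()}

print(ml_estimate)   # {'t': 0.625, 'h': 0.375}, i.e. 5/8 and 3/8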

6
Overfitting
  • A fair coin has probabilities p(t) = 0.5, p(h) = 0.5
  • If you throw it 3 times and get t, t, t, then
    the ML estimates for this sequence are p(t) = 1,
    p(h) = 0.
  • Consequently, from these estimates, the
    probability of e.g. the sequence h, t, h, t
    is zero.
  • This is an example of what is called overfitting.
    Overfitting is the greatest enemy of Machine
    Learning!
  • Solution 1: get more data
  • Solution 2: build what you already know into
    the model. (We will return to this during the module.)

7
Why is it called Maximum Likelihood?
  • It can be shown that using the frequencies to
    compute probabilities maximises the total
    probability of all the sequences given the model
    (the likelihood), P(Data | parameters).

8
Probabilities
  • Suppose we have two dice, D1 and D2
  • The probability of rolling i given die D1 is
    written P(i | D1). This is a conditional probability.
  • Pick a die at random with probability P(Dj), j = 1
    or 2. The probability of picking die Dj and
    rolling i is called the joint probability and is
    P(i, Dj) = P(Dj) P(i | Dj).
  • For any events X and Y, P(X, Y) = P(X | Y) P(Y)
  • If we know P(X, Y), then the so-called marginal
    probability P(X) can be computed by summing over
    all values of Y: P(X) = Σ_Y P(X, Y)
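A tiny numerical sketch of these three rules; the die parameters below are invented for illustration, not taken from the slides.

# Two dice, each picked with probability 0.5; D1 fair, D2 loaded (made-up numbers).
P_die = {"D1": 0.5, "D2": 0.5}
P_roll = {
    "D1": {i: 1/6 for i in range(1, 7)},
    "D2": {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1},
}

def joint(i, die):                 # P(i, Dj) = P(Dj) * P(i | Dj)
    return P_die[die] * P_roll[die][i]

def marginal(i):                   # P(i) = sum over dice of P(i, Dj)
    return sum(joint(i, die) for die in P_die)

print(joint(1, "D2"))    # 0.5 * 0.5 = 0.25
print(marginal(1))       # 0.5 * 1/6 + 0.5 * 0.5 ≈ 0.333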

9
  • Now, we show that maximising P(Data | parameters)
    for the random sequence model leads to the
    frequency-based computation that we did
    intuitively.
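The derivation itself is not reproduced in this transcript; the standard argument (maximise the log-likelihood under the constraint that the probabilities sum to one) runs roughly as follows, written here in LaTeX:

% Let n_i be the count of symbol i in the data and N = \sum_i n_i.
% The likelihood of the data under the random sequence model is
%   P(\mathrm{Data} \mid p) = \prod_i p_i^{n_i}.
% Maximise \log P subject to \sum_i p_i = 1 with a Lagrange multiplier \lambda:
\frac{\partial}{\partial p_i}\Big( \sum_j n_j \log p_j
      + \lambda \big(1 - \sum_j p_j\big) \Big)
  = \frac{n_i}{p_i} - \lambda = 0
  \;\Rightarrow\; p_i = \frac{n_i}{\lambda}
  \;\Rightarrow\; \lambda = N
  \;\Rightarrow\; \hat{p}_i = \frac{n_i}{N}.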

10
Why did we bother? Because in more complicated
models we cannot guess the result.
11
Markov Chains
  • Further examples of sequences
  • Bio-sequences
  • Web page request sequences while browsing
  • These are no longer purely random sequences; they
    have a time structure.
  • How many parameters would such a model have?
  • We need to make simplifying assumptions to end up
    with a reasonable number of parameters
  • The first-order Markov assumption: each
    observation depends only on the immediately
    previous one, not on any longer history
  • Markov Chain = a sequence model which makes the
    Markov assumption

12
Markov Chains
  • The probability of a Markov sequence is
    P(x1, x2, ..., xN) = P(x1) P(x2 | x1) P(x3 | x2) ... P(xN | xN-1)
    (a small sketch of this computation follows below)
  • The alphabet's symbols are also called states
  • Once the parameters are estimated from training
    data, the Markov chain can be used for prediction
  • Among other applications, Markov Chains are
    successful at web browsing behavior prediction
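A minimal Python sketch of this chain rule; the two-state transition matrix and initial distribution below are invented for illustration.

# Markov chain over states {"a", "b"}; all numbers are made up for illustration.
initial = {"a": 0.6, "b": 0.4}
transition = {
    "a": {"a": 0.7, "b": 0.3},   # each row sums to one
    "b": {"a": 0.2, "b": 0.8},
}

def markov_probability(sequence, initial, transition):
    prob = initial[sequence[0]]
    for prev, curr in zip(sequence, sequence[1:]):
        prob *= transition[prev][curr]   # P(x_t | x_{t-1})
    return prob

print(markov_probability(["a", "b", "b", "a"], initial, transition))
# = 0.6 * 0.3 * 0.8 * 0.2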

13
Markov Chains
  • A Markov Chain is stationary if at any time, it
    has the same transition probabilities.
  • We assume stationary models here.
  • Then the parameters of the model consist of the
    transition probability matrix plus the initial
    state probabilities.

14
ML parameter estimation
  • We can derive how to compute the parameters of a
    Markov Chain from data, using Maximum Likelihood,
    as we did for random sequences.
  • The ML estimate of the transition matrix will
    again be very intuitive: count the observed
    transitions and normalise each row (see the
    sketch below)
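A small Python sketch of this counting estimate; the example sequence is arbitrary, not from the slides.

from collections import defaultdict

# ML estimate of a Markov chain's transition matrix:
# count observed transitions, then normalise each row.
sequence = list("abbacabcbba")   # arbitrary example data

counts = defaultdict(lambda: defaultdict(int))
for prev, curr in zip(sequence, sequence[1:]):
    counts[prev][curr] += 1

transition = {
    prev: {curr: n / sum(row.values()) for curr, n in row.items()}
    for prev, row in counts.items()
}
print(transition)   # each row of the estimated matrix sums to one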

Remember that
15
Simple example
  • If it is raining today, it will rain tomorrow
    with probability 0.8, which implies the contrary
    has probability 0.2
  • If it is not raining today, it will rain tomorrow
    with probability 0.6, which implies the contrary
    has probability 0.4
  • Build the transition matrix
  • Be careful which numbers need to sum to one and
    which don't. Such a matrix is called a stochastic
    matrix.
  • Q: It rained all week, including today. What does
    this model predict for tomorrow? Why? What does
    it predict for the day after tomorrow? (Homework;
    the sketch below lets you check your answer)
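A short Python sketch of this weather chain, using exactly the numbers from the slide; multi-step predictions come from powers of the stochastic matrix.

import numpy as np

# Rows = today's state, columns = tomorrow's state; each row sums to one.
states = ["rain", "no rain"]
P = np.array([[0.8, 0.2],    # raining today
              [0.6, 0.4]])   # not raining today

today = np.array([1.0, 0.0])                        # it rained today
tomorrow = today @ P                                # one-step prediction
day_after = today @ np.linalg.matrix_power(P, 2)    # two-step prediction

print(dict(zip(states, tomorrow)))
print(dict(zip(states, day_after)))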

16
Examples of Web Applications
  • HTTP request prediction
  • To predict the probabilities of the next requests
    from the same user based on the history of
    requests from that client.
  • Adaptive Web navigation
  • To build a navigation agent which suggests which
    other links would be of interest to the user
    based on the statistics of previous visits.
  • The predicted link does not strictly have to be a
    link present in the Web page currently being
    viewed.
  • Tour generation
  • Given a starting URL as input, it generates
    a sequence of states (or URLs) using the Markov
    chain process.

17
Building Markov Models from Web Log Files
  • A Web log file is a collection of records of user
    requests for documents on a Web site; an example
    is shown below
  • The transition matrix can be seen as a graph
  • Link pair: (r - referrer, u - requested page, w
    - hyperlink weight)
  • The link graph is called the state diagram of the
    Markov Chain
  • a directed weighted graph
  • a hierarchy from the homepage down to multiple
    levels

177.21.3.4 - - [04/Apr/1999:00:01:11 +0100] "GET
/studaffairs/ccampus.html HTTP/1.1" 200 5327
"http://www.ulst.ac.uk/studaffairs/accomm.html"
"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
18
Link Graph: an example (University of Ulster site),
Zhu et al. 2002
State diagram: nodes = states, weighted arrows =
number of transitions
19
Experimental Results (Sarukkai, 2000)
  • Simulations
  • The "correct link" refers to the actual link chosen
    at the next step.
  • The "depth" of the correct link is measured by
    counting the number of links which have a
    probability greater than or equal to that of the
    correct link.
  • Over 70% of correct links are in the top 20
    scoring states.
  • Difficulties: very large state space

20
Simple exercise
  • Build the Markov transition matrix of the
    following sequence:
  • a b b a c a b c b b d e e d e d e d
  • What is the state space?

21
Further topics
  • Hidden Markov Model
  • Does not make the Markov assumption on the
    observed sequence
  • Instead, it assumes that the observed sequence
    was generated by another sequence which is
    unobservable (hidden), and this other sequence is
    assumed to be Markovian
  • More powerful
  • Estimation is more complicated
  • Aggregate Markov model
  • Useful for clustering sub-graphs of a transition
    graph

22
HMM at an intuitive level
  • Suppose that we know all the parameters of the
    following HMM, as shown on the state-diagram
    below. What is the probability of observing the
    sequence A,B if the initial state is S1? The
    same question if the initial state is chosen
    randomly with equal probabilities.

ANSWER: If the initial state is S1:
0.2 × (0.4 × 0.8 + 0.6 × 0.7) = 0.148. In the second
case: 0.5 × 0.148 + 0.5 × 0.3 × (0.3 × 0.7 + 0.7 × 0.8)
= 0.1895.
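The state diagram itself is not reproduced in this transcript; the Python sketch below assumes the parameters implied by the worked answer (emission of A: 0.2 from S1, 0.3 from S2; emission of B: 0.8 from S1, 0.7 from S2; transitions S1→S1 = 0.4, S1→S2 = 0.6, S2→S1 = 0.7, S2→S2 = 0.3) and reproduces both numbers with the forward algorithm.

# HMM parameters reconstructed from the worked answer above (an assumption,
# since the state diagram is not included in this transcript).
states = ["S1", "S2"]
emission = {"S1": {"A": 0.2, "B": 0.8}, "S2": {"A": 0.3, "B": 0.7}}
transition = {"S1": {"S1": 0.4, "S2": 0.6}, "S2": {"S1": 0.7, "S2": 0.3}}

def forward(observations, initial):
    # Forward algorithm: total probability of the observation sequence.
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[r] * transition[r][s] for r in states)
                    * emission[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["A", "B"], {"S1": 1.0, "S2": 0.0}))   # ≈ 0.148
print(forward(["A", "B"], {"S1": 0.5, "S2": 0.5}))   # ≈ 0.1895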
23
Conclusions
  • Probabilistic Model
  • Maximum Likelihood parameter estimation
  • Random sequence model
  • Markov chain model
  • ---------------------------------
  • Hidden Markov Model
  • Aggregate Markov Model

24
Any questions?