Gibbs Sampling and Hidden Markov Models in the Event Detection Problem
1
Gibbs Sampling and Hidden Markov Models in the
Event Detection Problem
  • By Marc Sobel

2
Event Detection Problems
  • A process such as traffic flow, crowd
    formation, or financial electronic transactions
    unfolds in time. We can monitor and observe
    the flow frequencies at many fixed time points.
    Typically, many causes influence
    changes in these frequencies.

3
Causes for Change
  • Possible causes for change include
  • a) changes due to noise, i.e., those
    best modeled by, e.g., a Gaussian error
    distribution;
  • b) periodic changes, i.e., those
    expected to happen over periodic intervals;
  • c) changes not due to either of the
    above; these are usually the changes we would
    like to detect.

4
Examples
  • Examples include
  • 1) detecting events, not pre-planned,
    that involve large numbers of people at
    a particular location;
  • 2) detecting fraudulent transactions.
  • We observe a variety of electronic
    transactions at many time intervals. We would
    like to detect when the number of transactions is
    significantly different from what is expected.

5
Model for Changes Due to Noise, Periodic
Changes, or Other Causes
  • We model changes in flow frequency due to
    all possible known causes. This is done using
    latent Poisson processes.

  • The frequency count N(t) at time t is observed.
    N0(t) and NE(t) are independent latent Poisson
    processes.
  • N0(t) denotes the frequency due to periodic and
    noise changes at time t. We write λ0(t) for
    the average rate of such changes, so that
    N0(t) ~ Poisson(λ0(t)).
  • NE(t) denotes the frequency due to causes
    other than periodic and noise changes. It
    has a rate function λE(t), so that
    NE(t) ~ Poisson(λE(t)).
  • The rate function λ0(t) is regressed on a
    parametric function of the periodic effects,
    as described on the following slides.
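As a sketch (not from the slides) of how the two latent Poisson processes combine into the observed count, here is a simulation with hypothetical rates lam0 and lamE and a hand-picked indicator path z marking event periods:

```python
import math
import random

def poisson(lam: float) -> int:
    """Draw a Poisson variate via Knuth's method; fine for small rates."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(0)
lam0, lamE = 5.0, 8.0              # hypothetical non-event and event rates
T = 20
z = [0] * 10 + [1] * 5 + [0] * 5   # hand-picked latent event indicators

N = []
for t in range(T):
    n0 = poisson(lam0)                  # frequency from periodic/noise causes
    ne = poisson(lamE) if z[t] else 0   # extra frequency only during events
    N.append(n0 + ne)                   # only the sum N(t) is observed
```

In the model only N(t) is seen; recovering z(t), N0(t), and NE(t) from it is exactly the inference problem the later Gibbs sampling slides address.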

6
The Process N0(t)
  • We focus on the first example given above and
    consider the problem of modeling the frequencies
    of people entering a building, with the eventual
    purpose of modeling special events connected
    with these frequencies.
  • We let
  • a) d stand for day,
  • b) hh for half-hour time interval,
  • c) b for base.

7
Rate Function Due to Periodic and Noise Changes
  • The rate function due to periodic and noise
    changes factors over the base, day, and
    half-hour effects:
  • λ0(t) = λb × (ρd / 7) × (ρhh / D),
  • where d and hh are the day and half-hour period
    containing t, and D is the number of half-hour
    intervals in a day.

8
Rate Function Explained
  • This makes sense because, for a time t in day d
    and half-hour period hh, we have (by Bayes rule)
    a decomposition of the rate into a day factor
    and a within-day factor.
  • In the sequel, we assume time t has been broken
    up into half-hour periods without re-indexing.

9
Example: Work Week
  • Say you worked 21 hours on average per week,
    all of it on the weekend. So your base work
    rate (per week) is λb = 21.
  • Therefore your daily base work rate is 21/7 = 3.
  • Your average work rate for Sunday relative to
    this base is ρSunday = (total Sunday rate)/3.
  • The relative work rates for Sunday and Saturday
    then satisfy ρSunday + ρSaturday = 7.

10
Modeling Occasional Changes in Event Flow and
Noise
  • Where does the noise come in? How do we model
    occasional changes in the periodic rate
    parameters? The missing piece is (dramatic
    pause) priors!

11
Priors Come to the Rescue
  • Priors serve the purpose of modeling noise
  • and occasional changes in
  • the values of parameters.
  • Thus spake the prior
  • The base parameter is given a gamma prior:
  • λb ~ Gamma(a, d), i.e.,
    p(λ) = λ^(a-1) d^a exp(-dλ) / Γ(a).
  • By flexibly assigning values to the
    hyperparameters a and d, we can build a
    distribution which properly characterizes
    the base rate.
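A quick sanity check of this prior, with hypothetical hyperparameter values a and d. Note that `random.gammavariate` takes shape and scale, so the rate d enters as scale 1/d:

```python
import random

random.seed(0)
a, d = 4.0, 2.0   # hypothetical shape and rate hyperparameters

# Gamma(shape a, rate d) has mean a/d; gammavariate wants (shape, scale)
draws = [random.gammavariate(a, 1.0 / d) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should be near a/d = 2.0
```

Shifting a and d moves the prior's center (a/d) and spread (a/d²), which is the "flexible assignment" the slide refers to.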

12
Interpretation
  • The ρday's, being conditional rates, satisfy
  • ρday i = (average day i total)/λbase × 7,
    so Σ ρday = 7.
  • Similarly, summing over periods,
  • ρjth period during day i
  • = (average jth period frequency in day
    i)/λday i × D,
  • where D stands for the number of half-hour
    intervals in a day, so the within-day ρ's
    sum to D.

13
A Simple Example Illustrating the Issue
  • What do these rates mean? Assume, for purposes of
    simplification, that there are a total of 2 days
    in a week: Sunday and Monday. Daily rates are
    measured in events per day.
  • The base rate is the average rate for
    Sundays and Mondays combined.
  • A) The Sunday and Monday relative rates add up to
    2.
  • B) Suppose we observe 10 people (total) on Sundays
    and 30 on Mondays, over a total of 10 weeks.

14
(continued)
  • C) Maximum likelihood dictates estimating the
    base rate (per week) as 40/10 = 4 people per week
    (or 2 people per day); Sunday's relative rate is
  • 10 (people) / (2 × 10 (weeks)) = 0.5, and Monday's
    relative rate is 1.5.
  • D) But this is incorrect because
  • (i) it turns out that one week out of
    10, the conditional Monday rate shoots up to
    1.90, while the Sunday rate decreases to 0.10;
  • (ii) it turns out that usually, the
    conditional Sunday rate is 1 rather than 0.5.

15
The Bayesian Formulation Wins Out
  • We can build a model with this new information by
    assuming a beta prior for half the Monday and
    Sunday relative rates:
  • (0.5)ρSunday ~ Beta(0.66, 0.66), i.e.,
    p(θ) = θ^(0.66-1)(1-θ)^(0.66-1) / B(0.66, 0.66).
  • This prior has the required properties that
    the Sunday rate dips down to 0.10 about 10 percent
    of the time, but averages 1 over the entire
    interval.
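A Monte Carlo sketch of this prior's two claimed properties (mean 1, and roughly 10% of mass below 0.10 for the Sunday rate ρSunday = 2θ), using the Beta(0.66, 0.66) values from the slide:

```python
import random

random.seed(0)
a = b = 0.66   # the slide's beta hyperparameters

# rho_Sunday = 2 * theta, with theta ~ Beta(0.66, 0.66)
samples = [2.0 * random.betavariate(a, b) for _ in range(200_000)]

mean = sum(samples) / len(samples)                     # near 1
dip = sum(s < 0.10 for s in samples) / len(samples)    # near 0.10
```

The U-shape of Beta(0.66, 0.66) is what puts appreciable mass near 0 (the occasional "dead Sunday") while keeping the average rate at 1.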

16
The Failure of classical theory
  • The MLE of ρSunday is
  • The Bayes estimator of ρSunday is
  • But even more importantly, the posterior
    distribution of the parameter provides
    information useful to all other inference and
    prediction in the problem.
  • Medicare for classical statistics?

17
Illustration
  • Posterior distribution for twice the Sunday
    frequency rate.

18
Actual Priors Used
  • For our example, we have seven rather than two days
    in a week. We use scaled Dirichlet priors
    (extensions of beta priors) for this.
  • Smaller alphas indicate smaller a priori relative
    frequencies. A smaller sum of alphas indicates
    greater relative-frequency variance for the p's.
  • This provides a flexible way to model the daily
    rates.
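A minimal sketch of drawing the seven scaled daily rates from such a prior. The alpha values here are hypothetical (chosen to suggest quieter weekends); the Dirichlet draw is built from gamma variates in the standard way:

```python
import random

random.seed(0)
# hypothetical alphas: smaller values on Sun/Sat mean smaller a priori rates
alphas = [3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 3.0]

# Dirichlet(alphas) via normalized Gamma(alpha_i, 1) draws
g = [random.gammavariate(a, 1.0) for a in alphas]
p = [x / sum(g) for x in g]          # relative daily frequencies, sum to 1

rho_day = [7.0 * pi for pi in p]     # scaled so the daily rates sum to 7
```

Shrinking all the alphas proportionally leaves the prior means alone but widens the spread, matching the slide's remark about the sum of alphas.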

19
Events: The Process NE
  • Events signify times during which there are
    higher frequencies which are not due to periodic
    or noise causes. We can model this by assuming
    z(t) = 1 for such events and z(t) = 0 if not.
  • P(z(t) = 1 | z(t-1) = 0) = 1 - z00
  • P(z(t) = 0 | z(t-1) = 0) = z00
  • P(z(t) = 1 | z(t-1) = 1) = z11
  • P(z(t) = 0 | z(t-1) = 1) = 1 - z11
  • i.e., if there is no event at time t-1, the
    chance of an event at time t is 1 - z00.

20
The Need for a Bayesian Treatment for Events
  • This gives a latent geometric distribution.
    Assume z00 = 0.8 and z11 = 0.1. Then non-events
    tend to last an average of 1/0.2 = 5 half-hours,
    while events tend to last an average of
    1/0.9 ≈ 1.11 half-hours. Classical statistics
    would dictate direct estimation of the z's, but
    note that this says nothing about the tendencies
    of events to exhibit non-average behavior. It
    doesn't provide information about prediction
    and estimation.
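The geometric-duration claim can be checked by simulating the two-state chain with the slide's values z00 = 0.8 and z11 = 0.1 and measuring the average run lengths:

```python
import random

random.seed(0)
z00, z11 = 0.8, 0.1   # stay-probabilities from the slide
T = 200_000

# simulate the latent indicator chain z(t)
z, path = 0, []
for _ in range(T):
    path.append(z)
    u = random.random()
    if z == 0:
        z = 1 if u > z00 else 0   # leave the non-event state w.p. 1 - z00
    else:
        z = 0 if u > z11 else 1   # leave the event state w.p. 1 - z11

# collect run lengths (durations) for each state
runs = {0: [], 1: []}
cur, length = path[0], 1
for s in path[1:]:
    if s == cur:
        length += 1
    else:
        runs[cur].append(length)
        cur, length = s, 1

mean0 = sum(runs[0]) / len(runs[0])   # about 1/(1 - z00) = 5 half-hours
mean1 = sum(runs[1]) / len(runs[1])   # about 1/(1 - z11) ≈ 1.11 half-hours
```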

21
Priors for event probabilities
  • Beta distribution priors for the z's:
  • p(z00) ∝ z00^(a0-1) (1-z00)^(b0-1), and
    z11 analogously.
  • This characterizes the behavior of the
    underlying latent process. The hyperparameters
    a, b are designed to model that behavior.
  • Recall that N0(t) (the non-event process)
    characterizes periodic and noise changes. The
    event process NE(t) characterizes other changes.
  • NE(t) is 0 if z(t) = 0 and Poisson with rate
    λE(t) if z(t) = 1.
  • So, if there is no event, N(t) = N0(t). If there
    is an event, the total frequency is
    N(t) = N0(t) + NE(t).
  • The rate λE(t) is itself gamma with parameters
    aE and bE. Hence it is marginally negative
    binomial with p = bE/(1 + bE) and n = N.
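A Monte Carlo sketch of the gamma-Poisson mixture behind the negative-binomial marginal, with hypothetical hyperparameters aE and bE; whatever the parameterization of the resulting negative binomial, the mixture's mean must match the gamma mean aE/bE:

```python
import math
import random

def poisson(lam: float) -> int:
    """Knuth's method; adequate for the moderate rates used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(0)
aE, bE = 3.0, 1.0   # hypothetical shape aE and rate bE

draws = []
for _ in range(100_000):
    lam = random.gammavariate(aE, 1.0 / bE)   # lambda_E ~ Gamma(aE, bE)
    draws.append(poisson(lam))                # N_E | lambda_E ~ Poisson

mean = sum(draws) / len(draws)   # should be near aE/bE = 3
```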

22
Gibbs Sampling
  • Gibbs sampling works by simulating each
    parameter/latent variable conditional on all the
    rest. The λ's and ρ's are parameters, and the
    z's and N's are the latent variables. The
    resulting simulated values have an empirical
    distribution similar to the true posterior
    distribution. It works because the joint
    distribution of the parameters is determined by
    the set of all such conditional distributions.
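As a generic illustration of the mechanism (a standard textbook example, not the presentation's model): Gibbs sampling for a standard bivariate normal with correlation rho, where each full conditional is itself normal, so alternating conditional draws recovers the joint distribution:

```python
import math
import random

random.seed(0)
rho = 0.8                  # target correlation (illustrative choice)
n, burn = 50_000, 1_000
sd = math.sqrt(1.0 - rho * rho)   # conditional std dev of each coordinate

x = y = 0.0
xs, ys = [], []
for i in range(n + burn):
    x = random.gauss(rho * y, sd)   # x | y ~ N(rho*y, 1 - rho^2)
    y = random.gauss(rho * x, sd)   # y | x ~ N(rho*x, 1 - rho^2)
    if i >= burn:
        xs.append(x)
        ys.append(y)

# the empirical correlation of the draws should approach rho
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
corr = cov / math.sqrt(vx * vy)
```

The presentation's sampler has the same shape: sweep through the λ's, ρ's, z's, and N's, drawing each from its conditional given the rest.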

23
Gibbs Sampling
  • Given z(t) = 0 and the remaining parameters, put
    N0(t) = N(t) and NE(t) = 0.
  • If z(t) = 1, simulate NE(t) as negative binomial
    with parameters N(t) and bE/(1 + bE). Put
    N0(t) = N(t) - NE(t).
  • To simulate z(t), define

24
More of Gibbs Sampling
  • Then, if the previous state was 0, we get

25
Gibbs Sampling (Continued)
  • Having simulated z(t), we can simulate the
    parameters as follows
  • where Nday denotes the number of day units in
    the data and Nhh denotes the number of hh
    periods in the data.

26
Gibbs Sampling (conclusion)
  • We can simulate from the remaining conditional
    posterior distributions using standard MCMC
    techniques.

27
  • END
  • Thank You

28
Polya Tree Priors
  • A more general methodology for introducing
    multiple prior levels is through Polya tree
    priors (see Michael Lavine). For these priors,
    we divide the time interval (e.g., a week) into
    parts with relative frequencies p1, …, pk;
    p = (p1, …, pk) has a Dirichlet distribution.
    Given p, we further divide each of the time
    interval parts into subparts with corresponding
    conditional Dirichlet distributions. We can
    continue to subdivide until it is no longer
    useful.
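A two-level sketch of this subdivision scheme; the split sizes (7 days, 4 parts per day) and alpha values are hypothetical. Each leaf probability is the product of its conditional Dirichlet weights down the tree, so the leaves still sum to 1:

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Dirichlet draw via normalized Gamma(alpha, 1) variates."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# level 1: split the week into 7 days
p_day = dirichlet([5.0] * 7)        # hypothetical symmetric prior

# level 2: split each day into 4 parts, conditionally Dirichlet
leaves = []
for pd in p_day:
    p_sub = dirichlet([2.0] * 4)
    leaves.extend(pd * ps for ps in p_sub)

total = sum(leaves)   # leaf probabilities still sum to 1
```

Continuing to a third level (e.g., half-hours within each part) works the same way: multiply each leaf by a fresh conditional Dirichlet draw.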