Title: Gibbs Sampling and Hidden Markov Models in the Event Detection Problem
1. Gibbs Sampling and Hidden Markov Models in the Event Detection Problem
2. Event Detection Problems
- A process such as traffic flow, crowd formation, or financial electronic transactions unfolds in time. We can monitor and observe the flow frequencies at many fixed time points. Typically, many causes influence changes in these frequencies.
3. Causes for Change
- Possible causes for change include:
- a) changes due to noise, i.e., those best modeled by, e.g., a Gaussian error distribution;
- b) periodic changes, i.e., those expected to occur over periodic intervals;
- c) changes due to neither of the above; these are usually the changes we would like to detect.
4. Examples
- Examples include:
- 1) detecting events that are not pre-planned and involve large numbers of people at a particular location;
- 2) detecting fraudulent transactions.
- We observe a variety of electronic transactions at many time intervals, and we would like to detect when the number of transactions is significantly different from what is expected.
5. Model for Changes Due to Noise, Periodic Changes, or Other Causes
- We model changes in flow frequency due to all known causes using latent Poisson processes.
- The frequency count N(t) at time t is observed. N0(t) and NE(t) are independent latent Poisson processes.
- N0(t) denotes the frequency due to periodic and noise changes at time t. We use the terminology λ0(t) for the average rate of such changes, and write N0(t) ~ Poisson(λ0(t)).
- NE(t) denotes the frequency due to causes other than periodic and noise changes. It has rate function λE(t), and we write NE(t) ~ Poisson(λE(t)).
- The rate function λ0(t) is regressed on a parametric function of the periodic effects, as follows.
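A minimal simulation of this two-process model, with assumed rates and a hypothetical event window (all numeric values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 48                            # half-hour periods in one day (illustrative)
lam0 = np.full(T, 5.0)            # assumed rate of the periodic/noise process N0(t)
lamE = 20.0                       # assumed event rate for NE(t), constant here
z = np.zeros(T, dtype=int)
z[20:24] = 1                      # a hypothetical two-hour event

N0 = rng.poisson(lam0)                          # periodic/noise counts
NE = np.where(z == 1, rng.poisson(lamE, T), 0)  # event counts only while z(t) = 1
N = N0 + NE                                     # the observed frequency count N(t)
```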
6. The Process N0(t)
- We focus on the first example given above and consider the problem of modeling the frequencies of people entering a building, with the eventual purpose of modeling special events connected with these frequencies.
- We let:
- a) d stand for day,
- b) hh for half-hour time interval,
- c) b for base.
7. Rate Function Due to Periodic and Noise Changes
- The rate function due to periodic and noise changes is the product of the base rate and the relative day and half-hour effects:
- λ0(t) = (λ_b / 7) · θ_d(t) · θ_hh(t) / D, where D is the number of half-hour intervals in a day.
8. Rate Function Explained
- This makes sense because, for a time t in day d and half-hour period hh, we have (by Bayes' rule) that the probability an arrival falls in period hh of day d factors as P(day d) · P(period hh | day d).
- In the sequel, we assume time t has been broken up into half-hour periods without re-indexing.
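Assuming the multiplicative decomposition of base, day, and half-hour effects described above, the rate function can be sketched as follows (all numeric values are made up):

```python
import numpy as np

D = 48                                   # half-hour intervals per day
lam_b = 7 * D * 5.0                      # assumed base weekly rate (counts per week)
theta_day = np.array([0.5, 1.2, 1.1, 1.0, 1.0, 1.1, 1.1])  # day effects, sum to 7
theta_hh = np.ones(D)                    # half-hour effects within a day, sum to D

def lam0(d, hh):
    """Periodic/noise rate for half-hour period hh of day d."""
    return (lam_b / 7.0) * theta_day[d] * theta_hh[hh] / D
```

Summing lam0 over all days and periods recovers the base weekly rate, which is the sense in which the day and half-hour effects are relative.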
9. Example: Work Week
- Say you worked 21 hours on average per week. Then your base work rate (per week) is λ_b = 21.
- Therefore your daily base work rate is 21/7 = 3.
- Your average work rate for Sunday relative to this base is θ_Sunday = (total Sunday rate)/3.
- The relative work rates for all seven days sum to 7: θ_Sunday + ... + θ_Saturday = 7.
10. Modeling Occasional Changes in Event Flow and Noise
- Where does the noise come in? How do we model occasional changes in the periodic rate parameters? The missing piece is (dramatic pause) priors!
11. Priors Come to the Rescue
- Priors serve the purpose of modeling noise and occasional changes in the values of parameters. Thus spake the prior.
- The base parameter is given a gamma prior:
- λ_b ~ p(λ) = λ^(a-1) d^a exp(-λd) / Γ(a).
- By flexibly assigning values to the hyperparameters a, d we can build a distribution which properly characterizes the base rate.
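A sketch of drawing from this gamma prior, with hyperparameters assumed here so that the prior mean a/d equals 21 (note that NumPy parameterizes the gamma by shape and scale = 1/d):

```python
import numpy as np

rng = np.random.default_rng(1)

a, d = 4.0, 4.0 / 21.0           # assumed hyperparameters: prior mean a/d = 21
# NumPy's gamma takes shape and *scale* = 1/rate
lam_b = rng.gamma(shape=a, scale=1.0 / d, size=100_000)

print(lam_b.mean())              # close to the prior mean a/d = 21
```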
12. Interpretation
- The θ_day's, being conditional rates, satisfy
- θ_day i = (average day i total) / (λ_b / 7).
- Similarly, summing over periods,
- θ_jth period during day i = (average jth-period frequency in day i) / (λ_day i / D),
- where D stands for the number of half-hour intervals in a day.
13. A Simple Example Illustrating the Issue
- What do these mean? Assume, for purposes of simplification, that there are a total of 2 days in a week: Sunday and Monday. Daily rates are measured in events per day.
- The base rate is the average rate for Sundays and Mondays combined.
- A) The Sunday and Monday relative rates add up to 2.
- B) Suppose we observe 10 people (total) on Sundays and 30 on Mondays, over a total of 10 weeks.
14. (Continued)
- C) Maximum likelihood dictates estimating the base rate (per week) as 40/10 = 4 people per week (or 2 people per day); Sunday's relative rate is (10 people)/(2 × 10 weeks) = 0.5 and Monday's relative rate is 1.5.
- D) But this is incorrect because:
- (i) it turns out that one week out of 10, the conditional Monday rate shoots up to 1.90, while the Sunday rate decreases to 0.1;
- (ii) it turns out that usually, the conditional Sunday rate is 1 rather than 0.5.
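The maximum-likelihood arithmetic from C) can be checked directly:

```python
sunday_total, monday_total, weeks = 10, 30, 10   # the counts from the example

base_per_week = (sunday_total + monday_total) / weeks   # 4.0 people per week
base_per_day = base_per_week / 2                        # 2.0 people per day
theta_sunday = (sunday_total / weeks) / base_per_day    # 0.5
theta_monday = (monday_total / weeks) / base_per_day    # 1.5
```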
15. The Bayesian Formulation Wins Out
- We can build a model with this new information by assuming a beta prior for half the Sunday relative rate (half the Monday rate is then its complement):
- (0.5)·θ_Sunday ~ θ^(0.66-1) (1-θ)^(0.66-1) / B(0.66, 0.66).
- This prior has the required properties that the Sunday rate dips down to 0.10 about 10 percent of the time, but averages 1 over the entire interval.
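A quick simulation confirming the stated properties of this prior, namely that θ_Sunday averages 1 and falls at or below 0.10 roughly 10 percent of the time:

```python
import numpy as np

rng = np.random.default_rng(2)

# theta_Sunday / 2 ~ Beta(0.66, 0.66), so theta_Sunday lives on (0, 2)
theta_sunday = 2 * rng.beta(0.66, 0.66, size=200_000)

print(theta_sunday.mean())              # averages 1 over the interval
print((theta_sunday <= 0.10).mean())    # dips to 0.10 or below about 10% of the time
```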
16. The Failure of Classical Theory
- The MLE of θ_Sunday and the Bayes estimator of θ_Sunday differ.
- But even more importantly, the posterior distribution of the parameter provides information useful to all other inference and prediction in the problem.
- Medicare for classical statistics?
17. Illustration
- [Figure: posterior distribution of twice the Sunday frequency rate, on the interval from 0 to 1.]
18. Actual Priors Used
- For our example, we have seven rather than 2 days in a week. We use scaled Dirichlet priors (extensions of beta priors) for this.
- Smaller alphas indicate smaller a priori relative frequencies. A smaller sum of alphas indicates greater relative-frequency variance for the p's.
- This provides a flexible way to model the daily rates.
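A sketch of the scaled Dirichlet prior, with assumed alphas; it also illustrates the variance claim by comparing two scalings of the same alphas:

```python
import numpy as np

rng = np.random.default_rng(3)

alphas = np.array([0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5])  # assumed: Sunday gets a small alpha
p = rng.dirichlet(alphas, size=100_000)       # daily proportions; rows sum to 1
theta = 7 * p                                 # scaled to relative day rates summing to 7

# A smaller sum of alphas means more variance in the p's
p_tight = rng.dirichlet(10 * alphas, size=100_000)
print(p[:, 0].var(), p_tight[:, 0].var())     # the second is much smaller
```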
19. Events: The Process NE
- Events signify times during which there are higher frequencies not due to periodic or noise causes. We model this by setting z(t) = 1 during such events and z(t) = 0 otherwise.
- P(z(t)=1 | z(t-1)=0) = 1 - z00
- P(z(t)=0 | z(t-1)=0) = z00
- P(z(t)=1 | z(t-1)=1) = z11
- P(z(t)=0 | z(t-1)=1) = 1 - z11
- I.e., if there is no event at time t-1, the chance of an event at time t is 1 - z00.
20. The Need for a Bayesian Treatment for Events
- This gives a latent geometric distribution. Assume z00 = 0.8 and z11 = 0.1. Then non-events tend to last an average of 1/0.2 = 5 half-hours, while events tend to last an average of 1/0.9 ≈ 1.11 half-hours. Classical statistics would dictate direct estimation of the z's, but note that this says nothing about the tendencies of events to exhibit non-average behavior. It doesn't provide information about prediction and estimation.
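The sojourn-time claim can be checked by simulating the two-state chain with z00 = 0.8 and z11 = 0.1 and measuring the run lengths:

```python
import numpy as np

rng = np.random.default_rng(4)

z00, z11 = 0.8, 0.1      # stay-probabilities from the slide
T = 200_000

z = np.zeros(T, dtype=int)
for t in range(1, T):
    stay = z00 if z[t - 1] == 0 else z11
    z[t] = z[t - 1] if rng.random() < stay else 1 - z[t - 1]

# Run lengths are geometric, with means 1/(1-z00) = 5 and 1/(1-z11) ≈ 1.11
changes = np.flatnonzero(np.diff(z)) + 1          # indices where the state flips
runs = np.diff(np.concatenate(([0], changes, [T])))
states = z[np.concatenate(([0], changes))]
print(runs[states == 0].mean())   # ≈ 5 half-hours between events
print(runs[states == 1].mean())   # ≈ 1.11 half-hours per event
```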
21. Priors for Event Probabilities
- We use beta distribution priors for the z's:
- p(z00) ∝ z00^(a0-1) (1 - z00)^(b0-1), and z11 analogously.
- This characterizes the behavior of the underlying latent process. The hyperparameters a, b are designed to model that behavior.
- Recall that N0(t) (the non-event process) characterizes periodic and noise changes. The event process NE(t) characterizes other changes.
- NE(t) is 0 if z(t) = 0 and Poisson with rate λE(t) if z(t) = 1.
- So, if there is no event, N(t) = N0(t). If there is an event, N(t) = N0(t) + NE(t), where N0(t) is the frequency due to periodic or noise changes.
- The rate λE(t) is itself gamma with parameters aE and bE. Hence NE(t) is marginally negative binomial with p = bE/(1 + bE).
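A simulation check of this gamma-Poisson / negative-binomial equivalence, with assumed hyperparameters aE and bE:

```python
import numpy as np

rng = np.random.default_rng(5)

aE, bE = 3.0, 0.5            # assumed gamma hyperparameters (shape aE, rate bE)
n = 200_000

# Gamma-Poisson mixture: lamE ~ Gamma(aE, rate bE), then NE ~ Poisson(lamE)
lamE = rng.gamma(shape=aE, scale=1.0 / bE, size=n)
NE_mix = rng.poisson(lamE)

# Direct negative binomial draw with p = bE / (1 + bE)
NE_nb = rng.negative_binomial(aE, bE / (1.0 + bE), size=n)

print(NE_mix.mean(), NE_nb.mean())   # both close to aE / bE = 6
```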
22. Gibbs Sampling
- Gibbs sampling works by simulating each parameter/latent variable conditional on all the rest. The λ's and θ's are the parameters, and the z's and N's are the latent variables. The resulting simulated values have an empirical distribution similar to the true posterior distribution. It works because the joint distribution of the parameters is determined by the set of all such conditional distributions.
23. Gibbs Sampling
- Given z(t) = 0 and the remaining parameters, put N0(t) = N(t) and NE(t) = 0.
- If z(t) = 1, simulate NE(t) as negative binomial with parameters N(t) and bE/(1 + bE), and put N0(t) = N(t) - NE(t).
- To simulate z(t), define the conditional odds of z(t) = 1 versus z(t) = 0 given z(t-1), z(t+1), and the observed count N(t).
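As an alternative way to sample the split of N(t) into N0(t) and NE(t) when z(t) = 1, one can normalize the unnormalized conditional over its finite support 0, ..., N(t). A sketch, assuming λE(t) is integrated out against its Gamma(aE, rate bE) prior (all inputs are illustrative):

```python
import numpy as np
from scipy.stats import nbinom, poisson

rng = np.random.default_rng(6)

def split_count(N, lam0, aE, bE):
    """Sample NE given the observed count N and z = 1, with lamE integrated
    out against its Gamma(aE, rate bE) prior (assumed parameterization)."""
    k = np.arange(N + 1)                       # all possible values of NE
    p = bE / (1.0 + bE)
    logw = nbinom.logpmf(k, aE, p) + poisson.logpmf(N - k, lam0)
    w = np.exp(logw - logw.max())              # stabilize before normalizing
    NE = rng.choice(k, p=w / w.sum())
    return N - NE, NE                          # (N0, NE)

N0, NE = split_count(40, lam0=5.0, aE=3.0, bE=0.1)
```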
24. More of Gibbs Sampling
- Then, if the previous state was 0, the conditional probability that z(t) = 1 is proportional to (1 - z00) · P(z(t+1) | z(t) = 1) times the likelihood of N(t) under an event, and similarly for z(t) = 0 with z00 in place of 1 - z00.
25. Gibbs Sampling (Continued)
- Having simulated z(t), we can simulate the parameters from their conditional posteriors,
- where Nday denotes the number of day units in the data and Nhh denotes the number of hh periods in the data.
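The slide's update formulas are not reproduced here, so the following is only a hypothetical illustration of what such a step could look like for the base rate, assuming a conjugate Poisson-gamma update with the number of observed weeks as the exposure (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(7)

a, d = 4.0, 0.2          # assumed gamma prior hyperparameters for lam_b
N0_total = 450           # total periodic/noise counts assigned by the z-step (made up)
weeks = 20               # exposure: number of observed weeks (made up)

# Hypothetical conjugate Poisson-gamma update: Gamma(a + counts, d + exposure)
lam_b_draw = rng.gamma(shape=a + N0_total, scale=1.0 / (d + weeks))
```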
26. Gibbs Sampling (Conclusion)
- We can simulate from the remaining conditional
posterior distributions using standard MCMC
techniques.
28. Polya Tree Priors
- A more general methodology for introducing multiple prior levels is through Polya tree priors (see Michael Lavine). For these priors, we divide the time interval (e.g., a week) into parts with relative frequencies p1, ..., pk; p has a Dirichlet distribution. Given p, we further divide up each of the time-interval parts into sub-parts with corresponding conditional Dirichlet distributions. We can continue to subdivide until it is no longer useful.
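A two-level sketch of this subdivision scheme, with assumed Dirichlet parameters at each level:

```python
import numpy as np

rng = np.random.default_rng(8)

# Level 1: split the week into 7 days (assumed alphas)
p_day = rng.dirichlet(np.full(7, 1.0))

# Level 2: given each day, split it into 48 half-hour periods (assumed alphas)
p_hh_given_day = rng.dirichlet(np.full(48, 0.5), size=7)

# The probability of (day d, half-hour hh) multiplies down the tree
p_joint = p_day[:, None] * p_hh_given_day
```

Deeper levels would repeat the same conditional-Dirichlet construction within each half-hour cell.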