Title: Gibbs Sampling and Hidden Markov Models in the Event Detection Problem
1. Gibbs Sampling and Hidden Markov Models in the Event Detection Problem
2. Event Detection Problems
- A process such as traffic flow, crowd formation, or financial electronic transactions unfolds in time. We can monitor and observe the flow frequencies at many fixed time points. Typically, many causes influence changes in these frequencies.
3. Causes for Change
- Possible causes for change include:
- a) changes due to noise, i.e., those best modeled by, e.g., a Gaussian error distribution;
- b) periodic changes, i.e., those expected to occur over periodic intervals;
- c) changes due to neither of the above; these are usually the changes we would like to detect.
4. Examples
- Examples include:
- 1) detecting events that are not pre-planned and involve large numbers of people at a particular location;
- 2) detecting fraudulent transactions.
- We observe a variety of electronic transactions at many time intervals, and we would like to detect when the number of transactions is significantly different from what is expected.
5. Model for Changes Due to Noise, Periodic Changes, or Other Causes
- We model changes in flow frequency due to all known causes using latent Poisson processes.
- The frequency count N(t) at time t is observed. N0(t) and NE(t) are independent latent Poisson processes.
- N0(t) denotes the frequency due to periodic and noise changes at time t. We use the terminology λ0(t) for the average rate of such changes, and write N0(t) ~ Poisson(λ0(t)).
- NE(t) denotes the frequency due to causes other than periodic and noise changes. It has rate function λE(t), and we write NE(t) ~ Poisson(λE(t)).
- The rate function λ0(t) is regressed on a parametric function of the periodic effects, as follows.
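A minimal simulation of this two-process model, with assumed rates and a hypothetical event window (all numeric values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 48                            # half-hour periods in one day (illustrative)
lam0 = np.full(T, 5.0)            # assumed rate of the periodic/noise process N0(t)
lamE = 20.0                       # assumed event rate for NE(t), constant here
z = np.zeros(T, dtype=int)
z[20:24] = 1                      # a hypothetical two-hour event

N0 = rng.poisson(lam0)                          # periodic/noise counts
NE = np.where(z == 1, rng.poisson(lamE, T), 0)  # event counts only while z(t) = 1
N = N0 + NE                                     # the observed frequency count N(t)
```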
6. The Process N0(t)
- We focus on the first example given above and consider the problem of modeling the frequencies of people entering a building, with the eventual purpose of modeling special events connected with these frequencies.
- We let:
- a) d stand for day,
- b) hh for half-hour time interval,
- c) b for base.
7. Rate Function Due to Periodic and Noise Changes
- The rate function due to periodic and noise changes is the product of the base rate and the relative day and half-hour effects:
- λ0(t) = (λ_b / 7) · θ_d(t) · θ_hh(t) / D, where D is the number of half-hour intervals in a day.
8. Rate Function Explained
- This makes sense because, for a time t in day d and half-hour period hh, we have (by Bayes' rule) that the probability an arrival falls in period hh of day d factors as P(day d) · P(period hh | day d).
- In the sequel, we assume time t has been broken up into half-hour periods without re-indexing.
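Assuming the multiplicative decomposition of base, day, and half-hour effects described above, the rate function can be sketched as follows (all numeric values are made up):

```python
import numpy as np

D = 48                                   # half-hour intervals per day
lam_b = 7 * D * 5.0                      # assumed base weekly rate (counts per week)
theta_day = np.array([0.5, 1.2, 1.1, 1.0, 1.0, 1.1, 1.1])  # day effects, sum to 7
theta_hh = np.ones(D)                    # half-hour effects within a day, sum to D

def lam0(d, hh):
    """Periodic/noise rate for half-hour period hh of day d."""
    return (lam_b / 7.0) * theta_day[d] * theta_hh[hh] / D
```

Summing lam0 over all days and periods recovers the base weekly rate, which is the sense in which the day and half-hour effects are relative.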
9. Example: Work Week
- Say you worked 21 hours on average per week. Then your base work rate (per week) is λ_b = 21.
- Therefore your daily base work rate is 21/7 = 3.
- Your average work rate for Sunday relative to this base is θ_Sunday = (total Sunday rate)/3.
- The relative work rates for all seven days sum to 7: θ_Sunday + ... + θ_Saturday = 7.
10. Modeling Occasional Changes in Event Flow and Noise
- Where does the noise come in? How do we model occasional changes in the periodic rate parameters? The missing piece is (dramatic pause) priors!
11. Priors Come to the Rescue
- Priors serve the purpose of modeling noise and occasional changes in the values of parameters. Thus spake the prior.
- The base parameter is given a gamma prior:
- λ_b ~ p(λ) = λ^(a-1) d^a exp(-λd) / Γ(a).
- By flexibly assigning values to the hyperparameters a, d we can build a distribution which properly characterizes the base rate.
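A sketch of drawing from this gamma prior, with hyperparameters assumed here so that the prior mean a/d equals 21 (note that NumPy parameterizes the gamma by shape and scale = 1/d):

```python
import numpy as np

rng = np.random.default_rng(1)

a, d = 4.0, 4.0 / 21.0           # assumed hyperparameters: prior mean a/d = 21
# NumPy's gamma takes shape and *scale* = 1/rate
lam_b = rng.gamma(shape=a, scale=1.0 / d, size=100_000)

print(lam_b.mean())              # close to the prior mean a/d = 21
```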
12. Interpretation
- The θ_day's, being conditional rates, satisfy
- θ_day i = (average day i total) / (λ_b / 7).
- Similarly, summing over periods,
- θ_jth period during day i = (average jth-period frequency in day i) / (λ_day i / D),
- where D stands for the number of half-hour intervals in a day.
13. A Simple Example Illustrating the Issue
- What do these mean? Assume, for purposes of simplification, that there are a total of 2 days in a week: Sunday and Monday. Daily rates are measured in events per day.
- The base rate is the average rate for Sundays and Mondays combined.
- A) The Sunday and Monday relative rates add up to 2.
- B) Suppose we observe 10 people (total) on Sundays and 30 on Mondays, over a total of 10 weeks.
14. (Continued)
- C) Maximum likelihood dictates estimating the base rate (per week) as 40/10 = 4 people per week (or 2 people per day); Sunday's relative rate is (10 people)/(2 × 10 weeks) = 0.5 and Monday's relative rate is 1.5.
- D) But this is incorrect because:
- (i) it turns out that one week out of 10, the conditional Monday rate shoots up to 1.90, while the Sunday rate decreases to 0.1;
- (ii) it turns out that usually, the conditional Sunday rate is 1 rather than 0.5.
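The maximum-likelihood arithmetic from C) can be checked directly:

```python
sunday_total, monday_total, weeks = 10, 30, 10   # the counts from the example

base_per_week = (sunday_total + monday_total) / weeks   # 4.0 people per week
base_per_day = base_per_week / 2                        # 2.0 people per day
theta_sunday = (sunday_total / weeks) / base_per_day    # 0.5
theta_monday = (monday_total / weeks) / base_per_day    # 1.5
```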
15. The Bayesian Formulation Wins Out
- We can build a model with this new information by assuming a beta prior for half the Sunday relative rate (half the Monday rate is then its complement):
- (0.5)·θ_Sunday ~ θ^(0.66-1) (1-θ)^(0.66-1) / B(0.66, 0.66).
- This prior has the required properties that the Sunday rate dips down to 0.10 about 10 percent of the time, but averages 1 over the entire interval.
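A quick simulation confirming the stated properties of this prior, namely that θ_Sunday averages 1 and falls at or below 0.10 roughly 10 percent of the time:

```python
import numpy as np

rng = np.random.default_rng(2)

# theta_Sunday / 2 ~ Beta(0.66, 0.66), so theta_Sunday lives on (0, 2)
theta_sunday = 2 * rng.beta(0.66, 0.66, size=200_000)

print(theta_sunday.mean())              # averages 1 over the interval
print((theta_sunday <= 0.10).mean())    # dips to 0.10 or below about 10% of the time
```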
16. The Failure of Classical Theory
- The MLE of θ_Sunday and the Bayes estimator of θ_Sunday differ.
- But even more importantly, the posterior distribution of the parameter provides information useful to all other inference and prediction in the problem.
- Medicare for classical statistics?
17. Illustration
- [Figure: posterior distribution of twice the Sunday frequency rate, on the interval from 0 to 1.]
18. Actual Priors Used
- For our example, we have seven rather than 2 days in a week. We use scaled Dirichlet priors (extensions of beta priors) for this.
- Smaller alphas indicate smaller a priori relative frequencies. A smaller sum of alphas indicates greater relative-frequency variance for the p's.
- This provides a flexible way to model the daily rates.
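A sketch of the scaled Dirichlet prior, with assumed alphas; it also illustrates the variance claim by comparing two scalings of the same alphas:

```python
import numpy as np

rng = np.random.default_rng(3)

alphas = np.array([0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5])  # assumed: Sunday gets a small alpha
p = rng.dirichlet(alphas, size=100_000)       # daily proportions; rows sum to 1
theta = 7 * p                                 # scaled to relative day rates summing to 7

# A smaller sum of alphas means more variance in the p's
p_tight = rng.dirichlet(10 * alphas, size=100_000)
print(p[:, 0].var(), p_tight[:, 0].var())     # the second is much smaller
```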
19. Events: The Process NE
- Events signify times during which there are higher frequencies not due to periodic or noise causes. We model this by setting z(t) = 1 during such events and z(t) = 0 otherwise.
- P(z(t)=1 | z(t-1)=0) = 1 - z00
- P(z(t)=0 | z(t-1)=0) = z00
- P(z(t)=1 | z(t-1)=1) = z11
- P(z(t)=0 | z(t-1)=1) = 1 - z11
- I.e., if there is no event at time t-1, the chance of an event at time t is 1 - z00.
20. The Need for a Bayesian Treatment for Events
- This gives a latent geometric distribution. Assume z00 = 0.8 and z11 = 0.1. Then non-events tend to last an average of 1/0.2 = 5 half-hours, while events tend to last an average of 1/0.9 ≈ 1.11 half-hours. Classical statistics would dictate direct estimation of the z's, but note that this says nothing about the tendencies of events to exhibit non-average behavior. It doesn't provide information about prediction and estimation.
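The sojourn-time claim can be checked by simulating the two-state chain with z00 = 0.8 and z11 = 0.1 and measuring the run lengths:

```python
import numpy as np

rng = np.random.default_rng(4)

z00, z11 = 0.8, 0.1      # stay-probabilities from the slide
T = 200_000

z = np.zeros(T, dtype=int)
for t in range(1, T):
    stay = z00 if z[t - 1] == 0 else z11
    z[t] = z[t - 1] if rng.random() < stay else 1 - z[t - 1]

# Run lengths are geometric, with means 1/(1-z00) = 5 and 1/(1-z11) ≈ 1.11
changes = np.flatnonzero(np.diff(z)) + 1          # indices where the state flips
runs = np.diff(np.concatenate(([0], changes, [T])))
states = z[np.concatenate(([0], changes))]
print(runs[states == 0].mean())   # ≈ 5 half-hours between events
print(runs[states == 1].mean())   # ≈ 1.11 half-hours per event
```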
21. Priors for Event Probabilities
- We use beta distribution priors for the z's:
- p(z00) ∝ z00^(a0-1) (1 - z00)^(b0-1), and z11 analogously.
- This characterizes the behavior of the underlying latent process. The hyperparameters a, b are designed to model that behavior.
- Recall that N0(t) (the non-event process) characterizes periodic and noise changes. The event process NE(t) characterizes other changes.
- NE(t) is 0 if z(t) = 0 and Poisson with rate λE(t) if z(t) = 1.
- So, if there is no event, N(t) = N0(t). If there is an event, N(t) = N0(t) + NE(t), where N0(t) is the frequency due to periodic or noise changes.
- The rate λE(t) is itself gamma with parameters aE and bE. Hence NE(t) is marginally negative binomial with p = bE/(1 + bE).
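A simulation check of this gamma-Poisson / negative-binomial equivalence, with assumed hyperparameters aE and bE:

```python
import numpy as np

rng = np.random.default_rng(5)

aE, bE = 3.0, 0.5            # assumed gamma hyperparameters (shape aE, rate bE)
n = 200_000

# Gamma-Poisson mixture: lamE ~ Gamma(aE, rate bE), then NE ~ Poisson(lamE)
lamE = rng.gamma(shape=aE, scale=1.0 / bE, size=n)
NE_mix = rng.poisson(lamE)

# Direct negative binomial draw with p = bE / (1 + bE)
NE_nb = rng.negative_binomial(aE, bE / (1.0 + bE), size=n)

print(NE_mix.mean(), NE_nb.mean())   # both close to aE / bE = 6
```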
22. Gibbs Sampling
- Gibbs sampling works by simulating each parameter/latent variable conditional on all the rest. The λ's and θ's are the parameters, and the z's and N's are the latent variables. The resulting simulated values have an empirical distribution similar to the true posterior distribution. It works because the joint distribution of the parameters is determined by the set of all such conditional distributions.
23. Gibbs Sampling
- Given z(t) = 0 and the remaining parameters, put N0(t) = N(t) and NE(t) = 0.
- If z(t) = 1, simulate NE(t) as negative binomial with parameters N(t) and bE/(1 + bE), and put N0(t) = N(t) - NE(t).
- To simulate z(t), define the conditional odds of z(t) = 1 versus z(t) = 0 given z(t-1), z(t+1), and the observed count N(t).
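As an alternative way to sample the split of N(t) into N0(t) and NE(t) when z(t) = 1, one can normalize the unnormalized conditional over its finite support 0, ..., N(t). A sketch, assuming λE(t) is integrated out against its Gamma(aE, rate bE) prior (all inputs are illustrative):

```python
import numpy as np
from scipy.stats import nbinom, poisson

rng = np.random.default_rng(6)

def split_count(N, lam0, aE, bE):
    """Sample NE given the observed count N and z = 1, with lamE integrated
    out against its Gamma(aE, rate bE) prior (assumed parameterization)."""
    k = np.arange(N + 1)                       # all possible values of NE
    p = bE / (1.0 + bE)
    logw = nbinom.logpmf(k, aE, p) + poisson.logpmf(N - k, lam0)
    w = np.exp(logw - logw.max())              # stabilize before normalizing
    NE = rng.choice(k, p=w / w.sum())
    return N - NE, NE                          # (N0, NE)

N0, NE = split_count(40, lam0=5.0, aE=3.0, bE=0.1)
```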
24. More of Gibbs Sampling
- Then, if the previous state was 0, the conditional probability that z(t) = 1 is proportional to (1 - z00) · P(z(t+1) | z(t) = 1) times the likelihood of N(t) under an event, and similarly for z(t) = 0 with z00 in place of 1 - z00.
25. Gibbs Sampling (Continued)
- Having simulated z(t), we can simulate the parameters from their conditional posteriors,
- where Nday denotes the number of day units in the data and Nhh denotes the number of hh periods in the data.
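The slide's update formulas are not reproduced here, so the following is only a hypothetical illustration of what such a step could look like for the base rate, assuming a conjugate Poisson-gamma update with the number of observed weeks as the exposure (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(7)

a, d = 4.0, 0.2          # assumed gamma prior hyperparameters for lam_b
N0_total = 450           # total periodic/noise counts assigned by the z-step (made up)
weeks = 20               # exposure: number of observed weeks (made up)

# Hypothetical conjugate Poisson-gamma update: Gamma(a + counts, d + exposure)
lam_b_draw = rng.gamma(shape=a + N0_total, scale=1.0 / (d + weeks))
```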
26. Gibbs Sampling (Conclusion)
- We can simulate from the remaining conditional
posterior distributions using standard MCMC
techniques.
28. Polya Tree Priors
- A more general methodology for introducing multiple prior levels is through Polya tree priors (see Michael Lavine). For these priors, we divide the time interval (e.g., a week) into parts with relative frequencies p1, ..., pk; p has a Dirichlet distribution. Given p, we further divide up each of the time-interval parts into sub-parts with corresponding conditional Dirichlet distributions. We can continue to subdivide until it is no longer useful.
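A two-level sketch of this subdivision scheme, with assumed Dirichlet parameters at each level:

```python
import numpy as np

rng = np.random.default_rng(8)

# Level 1: split the week into 7 days (assumed alphas)
p_day = rng.dirichlet(np.full(7, 1.0))

# Level 2: given each day, split it into 48 half-hour periods (assumed alphas)
p_hh_given_day = rng.dirichlet(np.full(48, 0.5), size=7)

# The probability of (day d, half-hour hh) multiplies down the tree
p_joint = p_day[:, None] * p_hh_given_day
```

Deeper levels would repeat the same conditional-Dirichlet construction within each half-hour cell.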