Title: Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
1. Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
- Nicholas Bartlett, David Pfau, Frank Wood
- Presented by Yingjian Wang
- Nov. 17, 2010
2. Outline
- Background
- The Sequence Memoizer
- Forgetting
- The dependent HPY
- Experimental results
3. Background
- 2006, Teh, "A hierarchical Bayesian language model based on Pitman-Yor processes"
  - N-gram Markov chain language model with a hierarchical Pitman-Yor (HPY) prior.
- 2009, Wood, "A Stochastic Memoizer for Sequence Data"
  - The Sequence Memoizer (SM), with a linear space/time inference scheme (lossless).
- 2010, Gasthaus, "Lossless compression based on the Sequence Memoizer"
  - Combines the SM with an arithmetic coder to build a compressor (PLUMP/dePLUMP); see www.deplump.com.
- 2010, Bartlett, "Forgetting Counts: Constant Memory Inference for a Dependent HPY"
  - Develops constant memory/space inference for the SM by using a dependent HPY (lossy).
4. SM: Two concepts
- Memoizer (Donald Michie, 1968): a device that returns a previously computed result when it sees the same input again, instead of recalculating it, in order to save time.
- Stochastic Memoizer (Wood, 2009): the returned results can change, since the prediction probability is based on a stochastic process.
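To make the contrast concrete, here is a minimal Python sketch under our own assumptions: `functools.lru_cache` plays the classic deterministic memoizer, while the hypothetical `StochasticMemoizer` class either reuses an earlier result or draws a fresh one, using Pitman-Yor (Chinese restaurant) seating. The class name, the `base_sampler` argument, and the parameter values are illustrative only, not taken from the paper.

```python
import functools
import random


@functools.lru_cache(maxsize=None)
def square(n: int) -> int:
    """Deterministic memoization: a repeated input returns the cached result."""
    return n * n


class StochasticMemoizer:
    """Hypothetical sketch of a stochastic memoizer with Pitman-Yor seating.

    Each call either reuses a value returned earlier (joins an existing table)
    or draws a fresh value from the base sampler (opens a new table).
    """

    def __init__(self, base_sampler, discount: float, concentration: float, rng: random.Random):
        self.base_sampler = base_sampler
        self.d = discount        # assumed 0 <= d < 1
        self.c = concentration   # assumed c > -d
        self.rng = rng
        self.tables = []         # value served at each table
        self.sizes = []          # customers seated at each table

    def _new_table(self):
        value = self.base_sampler(self.rng)
        self.tables.append(value)
        self.sizes.append(1)
        return value

    def __call__(self):
        k = len(self.tables)
        if k == 0:
            return self._new_table()
        # PY seating: existing table j has weight (n_j - d); a new table has weight (c + d * k).
        weights = [size - self.d for size in self.sizes] + [self.c + self.d * k]
        choice = self.rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            return self._new_table()
        self.sizes[choice] += 1
        return self.tables[choice]


# Usage: the base sampler draws uniform floats, so reused values show up as exact repeats.
rng = random.Random(0)
memo = StochasticMemoizer(lambda r: r.random(), discount=0.5, concentration=1.0, rng=rng)
draws = [memo() for _ in range(20)]
```

Unlike `square`, repeated calls to `memo()` may return an old value or a new one; how often values are reused is controlled by the discount and concentration.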
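5. SM: Model and trie
- Model: a hierarchy of Pitman-Yor processes, one per context, each centred on the distribution of the next-shorter context.
- The prefix trie: each node (context) is a restaurant.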
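One way to write the model, following the standard presentation of the SM by Wood et al. (2009) as we understand it. Here $\sigma(\mathbf{u})$ denotes the context $\mathbf{u}$ with its earliest symbol removed, $H$ is a base distribution over the alphabet (typically uniform), the concentration is fixed at 0, and each $G_{\mathbf{u}}$ is one restaurant in the prefix trie. The symbol names are our assumption.

```latex
\begin{aligned}
G_{[\,]} \mid d_0, H &\sim \mathrm{PY}(d_0, 0, H) \\
G_{\mathbf{u}} \mid d_{|\mathbf{u}|}, G_{\sigma(\mathbf{u})} &\sim \mathrm{PY}\big(d_{|\mathbf{u}|}, 0, G_{\sigma(\mathbf{u})}\big) \quad \text{for every context } \mathbf{u} \\
x_i \mid x_{1:i-1} = \mathbf{u} &\sim G_{\mathbf{u}}, \qquad i = 1, \dots, n
\end{aligned}
```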
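6. SM: The NSP (1)
- The Normalized Stable Process (Perman, 1990)
- Pitman-Yor process $\mathrm{PY}(d_0, c_0, H)$ with discount parameter $d_0$, concentration parameter $c_0$ and base distribution $H$:
  - $d_0 = 0$ gives the Dirichlet process;
  - $c_0 = 0$ gives a normalized stable process.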
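The two special cases can be seen in the predictive rule of a single Pitman-Yor restaurant. The sketch below is our own illustration (the function name `py_predictive` and the toy counts are hypothetical): it computes the next-symbol distribution from customer and table counts, first with $d_0 = 0$ (Dirichlet process) and then with $c_0 = 0$ (normalized stable process).

```python
def py_predictive(customers, tables, d, c, base):
    """Predictive distribution of the next draw from a PY(d, c, H) restaurant.

    customers[s], tables[s]: customers and tables currently serving symbol s.
    base: dict mapping each symbol s to its base probability H(s).
    """
    n = sum(customers.values())
    if n == 0:
        return dict(base)  # empty restaurant: defer to the base distribution
    t = sum(tables.values())
    probs = {}
    for s, h in base.items():
        joined = customers.get(s, 0) - d * tables.get(s, 0)  # mass from existing tables serving s
        new_table = (c + d * t) * h                          # mass from opening a new table
        probs[s] = (joined + new_table) / (c + n)
    return probs


base = {"a": 0.5, "b": 0.5}
customers = {"a": 3, "b": 1}
tables = {"a": 2, "b": 1}

dp = py_predictive(customers, tables, d=0.0, c=1.0, base=base)   # d0 = 0: Dirichlet process
nsp = py_predictive(customers, tables, d=0.5, c=0.0, base=base)  # c0 = 0: normalized stable process
print(dp)   # {'a': 0.7, 'b': 0.3}
print(nsp)  # {'a': 0.6875, 'b': 0.3125}
```

With $d_0 = 0$ the table counts drop out entirely; with $c_0 = 0$ all new-table mass comes from the discount, which is the setting the SM uses.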
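7. SM: The NSP (2)
- Collapse the middle restaurants (non-branching restaurants with no observations of their own).
- Theorem: if $G_1 \mid G_0 \sim \mathrm{PY}(d_1, 0, G_0)$ and $G_2 \mid G_1 \sim \mathrm{PY}(d_2, 0, G_1)$, then, marginalizing out $G_1$, $G_2 \mid G_0 \sim \mathrm{PY}(d_1 d_2, 0, G_0)$.
- Prefix tree: after collapsing, the remaining restaurants form a prefix tree, which can be built in linear time (Weiner, 1973; Ukkonen, 1995).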
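The saving from collapsing can be checked by counting nodes. The sketch below is ours, not the paper's construction: it builds the full context trie by brute force (every suffix of every context), then keeps only the root, the observed contexts, and the branching nodes. It is quadratic-time on purpose for clarity; a real implementation would use a linear-time suffix-tree algorithm such as Ukkonen's. The function names are hypothetical.

```python
import random


def context_trie(x: str) -> set:
    """Nodes of the prefix (context) trie: every suffix of every context x[:i]."""
    nodes = set()
    for i in range(len(x)):            # x[:i] is the context used to predict x[i]
        ctx = x[:i]
        for j in range(len(ctx) + 1):  # suffixes ctx[j:], down to "" (the root)
            nodes.add(ctx[j:])
    return nodes


def compressed_tree(x: str) -> set:
    """Keep the root, every observed context, and every branching node; collapse the rest."""
    nodes = context_trie(x)
    observed = {x[:i] for i in range(len(x))}
    alphabet = set(x)

    def n_children(u: str) -> int:
        # A child of context u extends it one symbol further into the past.
        return sum(1 for a in alphabet if a + u in nodes)

    return {u for u in nodes if u == "" or u in observed or n_children(u) >= 2}


print(len(context_trie("oacac")), len(compressed_tree("oacac")))  # 10 6

rng = random.Random(1)
s = "".join(rng.choice("acgt") for _ in range(200))
print(len(context_trie(s)), len(compressed_tree(s)))  # trie grows roughly quadratically; tree stays below 2n
```

The compressed tree stays below $2n$ nodes because its leaves are observed contexts (at most $n$) and a tree has fewer branching nodes than leaves; this is the source of the linear-space claim.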
8. SM: Linear space inference
9. Forgetting
- Motivation: achieve constant memory inference on top of the SM. How?
- Method: forgetting, i.e. deleting restaurants.
  - Restaurants are the basic memory units in the context tree.
- How to delete? Two deletion schemes: random deletion and greedy deletion.
10. Deletion schemes
- Random deletion: delete one leaf restaurant chosen uniformly at random.
- Greedy deletion: delete the leaf restaurant whose removal least negatively impacts the estimated likelihood of the observed sequence.
(Figure: leaf restaurants)
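A minimal sketch of the two schemes inside a constant-memory loop, under our own assumptions: the context tree is a hypothetical `parent` dictionary, and the greedy criterion is a caller-supplied `loss` function. In the toy usage the loss is simply the restaurant's customer count, a crude stand-in we chose for illustration; it is not the likelihood-based criterion from the paper, and seating updates in the parent are not modelled.

```python
import random


def leaf_restaurants(parent: dict) -> list:
    """Leaves of the context tree; `parent` maps each restaurant to its parent (root -> None)."""
    has_child = set(parent.values())
    return [node for node in parent if node not in has_child]


def random_deletion(parent: dict, rng: random.Random):
    """Delete one leaf restaurant chosen uniformly at random."""
    victim = rng.choice(leaf_restaurants(parent))
    del parent[victim]
    return victim


def greedy_deletion(parent: dict, loss):
    """Delete the leaf whose removal has the smallest `loss` (a stand-in for the likelihood impact)."""
    victim = min(leaf_restaurants(parent), key=loss)
    del parent[victim]
    return victim


def enforce_budget(parent: dict, max_restaurants: int, delete_one) -> list:
    """Forget restaurants until at most `max_restaurants` remain (assumes max_restaurants >= 1)."""
    deleted = []
    while len(parent) > max_restaurants:
        deleted.append(delete_one(parent))
    return deleted


# Toy context tree (contexts of "oacac" after collapsing) and hypothetical customer counts.
tree = {"": None, "o": "", "a": "", "oa": "a", "oac": "", "oaca": "a"}
customers = {"": 9, "o": 3, "a": 5, "oa": 1, "oac": 2, "oaca": 4}

greedy_tree = dict(tree)
greedy_deleted = enforce_budget(greedy_tree, 4, lambda p: greedy_deletion(p, customers.__getitem__))
print(greedy_deleted)  # ['oa', 'oac']

rng = random.Random(0)
random_tree = dict(tree)
random_deleted = enforce_budget(random_tree, 4, lambda p: random_deletion(p, rng))
```

Because only leaves are removed, deleting a restaurant can turn its parent into a new leaf, which then becomes a candidate in a later round.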
11. The SMC algorithm
12. The dependent HPY
- But wait: what do we get after deleting and adding restaurants? Are the processes still independent? No, because the seating arrangement in the parent restaurant has been changed.
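A toy calculation of why independence fails, under simplifying assumptions of ours: concentration 0, a two-symbol uniform base, and a deleted child that had sent exactly one customer for "a" up to the parent. A newly added child restaurant is empty, so its predictive equals its parent's, and the parent still carries the deleted child's customer. The names `restaurant_predictive` and `uniform` are hypothetical.

```python
def restaurant_predictive(symbol, customers: dict, tables: dict, d: float, base) -> float:
    """P(next draw = symbol) for a PY(d, 0, base) restaurant, i.e. concentration 0 as in the SM."""
    n = sum(customers.values())
    if n == 0:
        return base(symbol)  # an empty restaurant defers entirely to its parent
    t = sum(tables.values())
    return (customers.get(symbol, 0) - d * tables.get(symbol, 0) + d * t * base(symbol)) / n


d = 0.5


def uniform(symbol) -> float:
    return 0.5  # hypothetical base distribution: uniform over {"a", "b"}


# Parent seating after a child restaurant (since deleted) sent one "a" customer up to it,
# versus a parent that never saw that child.
parent_touched = ({"a": 1}, {"a": 1})
parent_untouched = ({}, {})

# A newly added child restaurant is empty, so its predictive is exactly its parent's.
p_dependent = restaurant_predictive("a", *parent_touched, d, uniform)      # (1 - 0.5 + 0.25) / 1 = 0.75
p_independent = restaurant_predictive("a", *parent_untouched, d, uniform)  # falls back to the base: 0.5
```

The new restaurant's predictions depend on data seen only by the deleted one, so after deletion and addition the restaurants are no longer independent draws given the parent.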
13. Experimental results