Title: The Relative Entropy Rate of Two Hidden Markov Processes
1. The Relative Entropy Rate of Two Hidden Markov Processes
Or Zuk, Dept. of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
2. Overview
- Introduction
- Distance Measures and Relative Entropy Rate
- Results: Generalization from the Entropy Rate
- Future Directions
3. Introduction
- Hidden Markov Processes are relevant:
- Error correction (Markovian source + noise)
- Signal processing, speech recognition
- Experimental physics: telegraph noise, TLS noise, quantum jumps
- Bioinformatics: biological sequences, gene expression
[Figure: a Markov chain source passes through a noisy transmission channel, producing an HMP]
4. HMP - Definitions
Models are denoted by λ and µ.
- Markov Process:
  - X: a Markov process
  - M_λ: transition matrix
  - m_λ(i,j) = Pr(X_{n+1} = j | X_n = i)
- Hidden Markov Process:
  - Y: noisy observation of X
  - R_λ: noise/emission matrix
  - r_λ(i,j) = Pr(Y_n = j | X_n = i)
5. Example: Binary HMP
Transition
Emission
6. Example: Binary HMP (Cont.)
- A simple, symmetric binary HMP:
- M = [[1−p, p], [p, 1−p]],  R = [[1−ε, ε], [ε, 1−ε]]
- All properties of the process depend on two parameters, p and ε. Assume w.l.o.g. p, ε < ½ (a construction and sampling sketch follows below).
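As a concrete illustration (our addition, not from the original slides): a minimal Python sketch that builds the symmetric binary HMP above and samples an observation sequence from it. The helper names (symmetric_binary_hmp, sample_hmp) are our own.

```python
import numpy as np

def symmetric_binary_hmp(p, eps):
    """Transition and emission matrices of the symmetric binary HMP."""
    M = np.array([[1 - p, p], [p, 1 - p]])          # hidden state flips w.p. p
    R = np.array([[1 - eps, eps], [eps, 1 - eps]])  # observed symbol flips w.p. eps
    return M, R

def sample_hmp(N, pi, M, R, rng):
    """Draw a length-N observation sequence Y from the HMP (pi = initial law of X)."""
    x = rng.choice(len(pi), p=pi)
    y = np.empty(N, dtype=int)
    for n in range(N):
        y[n] = rng.choice(R.shape[1], p=R[x])  # emit Y_n ~ r(x, .)
        x = rng.choice(M.shape[1], p=M[x])     # advance X_{n+1} ~ m(x, .)
    return y

M, R = symmetric_binary_hmp(p=0.2, eps=0.1)
# [0.5, 0.5] is the stationary distribution of the symmetric chain
y = sample_hmp(1000, np.array([0.5, 0.5]), M, R, np.random.default_rng(0))
```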
7. Overview
- Introduction
- Distance Measures and Relative Entropy Rate
- Results: Generalization from the Entropy Rate
- Future Directions
8. Distance Measures for Two HMPs
- Why is this important?
- Often one learns an HMP from data. It is important to know how different the learned model is from the true model.
- Sometimes many HMPs represent different sources (e.g. different authors, different protein families), and we wish to know which sources are similar.
- What distance measure should we use?
- Look at the joint distributions of N consecutive Y symbols, P_λ^(N) and P_µ^(N).
9. Relative Entropy (RE) Rate
- Notation: models λ and µ induce distributions P_λ^(N), P_µ^(N) on N-symbol sequences.
- Relative entropy for finite (N-symbol) distributions:
  D(P_λ^(N) || P_µ^(N)) = Σ_y P_λ^(N)(y) log [ P_λ^(N)(y) / P_µ^(N)(y) ]
- Take the limit to get the RE-rate (a brute-force computation sketch follows below):
  D(λ || µ) = lim_{N→∞} (1/N) D(P_λ^(N) || P_µ^(N))
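For small N, the finite-N relative entropy can be computed exactly: the forward recursion gives each sequence probability, and one sums over all 2^N binary sequences. A minimal sketch (our addition; lam and mu are (M, R) pairs such as those built in the earlier sketch, and all probabilities are assumed strictly positive):

```python
import itertools
import numpy as np

def seq_prob(y, pi, M, R):
    """Exact P(y) under an HMP, via the (unscaled) forward recursion."""
    alpha = pi * R[:, y[0]]
    for s in y[1:]:
        alpha = (alpha @ M) * R[:, s]
    return alpha.sum()

def finite_re_rate(N, pi, lam, mu):
    """(1/N) * D(P_lam^(N) || P_mu^(N)); brute force, feasible only for small N."""
    d = 0.0
    for y in itertools.product((0, 1), repeat=N):
        p = seq_prob(y, pi, *lam)   # strictly positive by assumption
        q = seq_prob(y, pi, *mu)
        d += p * np.log(p / q)
    return d / N
```

As N grows, finite_re_rate approaches the RE-rate of the definition; the next slides give the limit a Lyapunov-exponent form.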
10. Relative Entropy (RE) Rate
- First proposed for HMPs by Juang & Rabiner '85.
- Not a norm (not symmetric, no triangle inequality).
- Still, it has several natural interpretations:
  - If one generates data from λ and gives a likelihood score to µ, then D(λ || µ) is the average likelihood loss per symbol (compared to the optimal model λ).
  - If one compresses data generated by λ, assuming erroneously it was generated by µ, then one loses on average D(λ || µ) per symbol.
11. Relative Entropy (RE) Rate
- For HMPs, D(λ || µ) is difficult to compute. So far, only bounds [Silva & Narayanan] or approximation algorithms [Li et al. '05, Do '03, Mohammad & Tranter '05] are known.
- D(λ || µ) generalizes the Shannon entropy rate, via H(λ) = log s − D(λ || u), where u is the uniform model and s is the alphabet size of Y (see the short derivation below).
- The entropy rate H of an HMP is a top Lyapunov exponent, which is hard to compute in general [Jacquet et al. '04].
- What is known (for H)? Lyapunov exponent representation, analyticity, asymptotic expansions in different regimes.
- Goal: generalize these results and techniques to the RE-rate.
12. Why is calculating D(λ || µ) difficult?
- P_λ^(N)(y) is a sum over exponentially many hidden state sequences, and D(P_λ^(N) || P_µ^(N)) sums over exponentially many observation sequences; no closed-form expression for the limit is known.
13. Overview
- Introduction
- Distance Measures and Relative Entropy Rate
- Results: Generalization from the Entropy Rate
- Future Directions
14. RE-Rate and Lyapunov Exponents
- What is a Lyapunov exponent?
- It arises in dynamical systems, control theory, statistical physics, etc., and measures the stability of the system.
- Take two (square) matrices A, B. At each step choose one at random: A (with prob. p) or B (w.p. 1−p). Look at the norm
  (1/N) log ||ABBBAABABBA⋯||
- The limit as N → ∞:
  - Exists a.s. [Furstenberg & Kesten '60].
  - Is called the top Lyapunov exponent.
  - Is independent of the matrix norm chosen.
- The HMP entropy rate is given as a Lyapunov exponent [Jacquet et al. '04] (a Monte Carlo sketch follows below).
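A minimal Monte Carlo sketch of this definition (our addition): multiply randomly chosen factors into a generic vector, renormalize at each step, and average the log growth; by Furstenberg-Kesten the estimate converges a.s. to the top exponent.

```python
import numpy as np

def top_lyapunov(A, B, p, n_steps=200_000, seed=0):
    """Estimate the top Lyapunov exponent of random products where each
    factor is A with probability p and B with probability 1 - p."""
    rng = np.random.default_rng(seed)
    v = np.ones(A.shape[0])          # generic start vector
    log_growth = 0.0
    for _ in range(n_steps):
        v = (A if rng.random() < p else B) @ v
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)   # accumulate log of this step's growth factor
        v /= norm                    # renormalize to avoid overflow/underflow
    return log_growth / n_steps
```

Tracking the growth of a single generic vector suffices: for almost every start vector, its log-growth rate equals the top exponent.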
15. RE-Rate and Lyapunov Exponents
- What about the RE-rate?
- It is given as the difference of two top Lyapunov exponents.
- The G's are random matrices, obtained simply from M and R using the forward equations.
- Different matrices appear in the two Lyapunov exponents, but the probabilities selecting the matrices are the same (see the estimator sketch below).
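Concretely, the forward recursion expresses log P_λ(y) as the log-norm of a product of those G matrices, so the RE-rate can be estimated on a single long sample path as the per-symbol log-likelihood gap between the two models. A minimal sketch (our addition; it reuses sample_hmp from the first sketch and assumes ergodic models with strictly positive probabilities):

```python
import numpy as np

def loglik(y, pi, M, R):
    """log P(y) via the scaled forward recursion (numerically stable)."""
    alpha = pi * R[:, y[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for s in y[1:]:
        alpha = (alpha @ M) * R[:, s]
        ll += np.log(alpha.sum())   # sum of log scale factors = log P(y)
        alpha /= alpha.sum()
    return ll

def re_rate_mc(N, pi, lam, mu, rng):
    """Sample-path estimate: (1/N)[log P_lam(y) - log P_mu(y)], y ~ lam."""
    y = sample_hmp(N, pi, *lam, rng)   # sampler from the first sketch
    return (loglik(y, pi, *lam) - loglik(y, pi, *mu)) / N
```

For one long sequence this per-symbol gap converges (a.s., under the ergodicity assumption) to D(λ || µ), mirroring the difference-of-Lyapunov-exponents representation.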
16. Analyticity of the RE-Rate
- Is the RE-rate continuous, smooth, or even analytic in the parameters governing the HMPs?
- For Lyapunov exponents: analyticity is known in the matrix entries [Ruelle '79] and in their probabilities [Peres '90, '91], separately.
- For the HMP entropy rate, analyticity was recently shown by Han & Marcus '05.
17. Analyticity of the RE-Rate
- Using both results, we are able to show:
- Thm: The RE-rate is analytic in the HMPs' parameters.
- Analyticity is shown only in the interior of the parameter domain (i.e. strictly positive probabilities).
- Behavior on the boundaries is more complicated. Sometimes analyticity persists on the boundaries (and beyond); sometimes we encounter singularities. A full characterization is still lacking [Marcus & Han '05].
18. RE-Rate Taylor Series Expansion
- While in general the RE-rate is not known in closed form, there are specific parameter values for which it is easily given in closed form (e.g. for Markov chains). Perhaps we can expand around these values and obtain asymptotic results near them.
- A similar approach was used for Lyapunov exponents [Derrida], and for the HMP entropy rate [Jacquet et al. '04, Weissman & Ordentlich '04, Zuk et al. '05], giving first-order asymptotics in various regimes.
19. Different Regimes: Binary Case
- p → 0, p → ½ (ε fixed)
- ε → 0, ε → ½ (p fixed)
- We concentrate on the high-SNR regime ε → 0, and the almost-memoryless regime p → ½.
20. RE-Rate Taylor Series Expansion
- In [Zuk, Domany, Kanter & Aizenman '06] we give a procedure for calculating the full Taylor series expansion of the HMP entropy rate in the high-SNR and almost-memoryless regimes.
- Main observation: finite systems give the correct RE-rate up to a given order.
- This was discovered using computer experiments (symbolic computation in Maple).
- A stronger result holds for the entropy rate (orders settle for N ≥ (k+3)/2).
- This does not hold in every regime. In some regimes (e.g. p → 0), even the first order never settles.
21. Proof Outline (with M. Aizenman)
[Diagram: proof outline relating the finite-system entropy H(p,ε) up to O(ε^k), the entropy rate H(λ), and the RE-rate D(λ || µ)]
22. Overview
- Introduction
- Distance Measures and Relative Entropy Rate
- Results: Generalization from the Entropy Rate
- Future Directions
23. RE-Rate Taylor Series Expansion
- First order
- Higher orders were computed for the binary symmetric case.
- Similar results hold for the almost-memoryless regime.
- The radius of convergence seems larger for the latter expansion, although no rigorous results are known.
24. Future Directions
- Study other regimes (e.g. two close models).
- Behavior of the EM algorithm.
- Generalizations (e.g. different alphabet sizes, the continuous case).
- Physical realization of HMPs (mesoscopic systems, quantum jumps).
- Domain of analyticity; radius of convergence.
25. Thanks
- Eytan Domany (Weizmann Inst.)
- Ido Kanter (Bar-Ilan Univ.)
- Michael Aizenman (Princeton Univ.)
- Libi Hertzberg (Weizmann Inst.)