Title: Hidden Process Models
1. Hidden Process Models
- Rebecca Hutchinson
- May 26, 2006
- Thesis Proposal
- Carnegie Mellon University
- Computer Science Department
2. Talk Outline
- Motivation: fMRI (functional Magnetic Resonance Imaging) data.
- Problem: a new kind of probabilistic time series modeling.
- Solution: Hidden Process Models (HPMs).
- Results: preliminary experiments with HPMs.
- Extensions: proposed improvements to HPMs.
3. fMRI Data: High-Dimensional and Sparse
- Imaged once per second for 15-20 minutes
- Only a few dozen trials (i.e. training examples)
- 10,000-15,000 voxels per image
4. The Hemodynamic Response
- fMRI measures an indirect, temporally blurred correlate of neural activity.
- Also called the BOLD (Blood Oxygen Level Dependent) response.
[Figure: a hemodynamic response curve, signal amplitude vs. time (seconds). The subject reads a word and indicates whether it is a noun or verb in less than a second.]
5. Study: Pictures and Sentences
[Figure: trial timeline from t0. Either View Picture then Read Sentence, or Read Sentence then View Picture (intervals of 4 sec. and 8 sec. marked), followed by Press Button, Fixation, and Rest.]
- Task: decide whether the sentence describes the picture correctly; indicate with a button press.
- 13 normal subjects, 40 trials per subject.
- Sentences and pictures describe 3 symbols using the relations above, below, not above, and not below.
- Images are acquired every 0.5 seconds.
6. Motivation
- To track cognitive processes over time.
  - Estimate process hemodynamic responses.
  - Estimate process timings.
  - Allowing processes that do not directly correspond to the stimulus timing is a key contribution of HPMs!
- To compare hypotheses of cognitive behavior.
7. The Thesis
- It is possible to simultaneously estimate the parameters and timing of temporally and spatially overlapped, partially observed processes (using many features and a small number of noisy training examples).
- We are developing a class of probabilistic models called Hidden Process Models (HPMs) for this task.
8. Related Work in fMRI
- General Linear Model (GLM)
  - Must assume the timing of process onset to estimate the hemodynamic response (Dale99).
- 4-CAPS and ACT-R
  - Predict fMRI data rather than learning the parameters of processes from the data (Anderson04, Just99).
9. Related Work in Machine Learning
- Classification of windows of fMRI data
  - Does not typically estimate the hemodynamic response (Cox03, Haxby01, Mitchell04).
- Dynamic Bayes Networks
  - HPM assumptions/constraints are difficult to encode in DBNs (Murphy02, Ghahramani97).
10. HPM Modeling Assumptions
- Model the latent time series at the process level.
- Process instances share parameters based on their process types.
- Use prior knowledge from the experiment design.
- Sum process responses linearly.
11. HPM Formalism (Hutchinson06)
- HPM = ⟨H, C, Φ, Σ⟩
  - H = ⟨h1, ..., hH⟩, a set of processes
    - h = ⟨W, d, Ω, Θ⟩, a process
      - W: response signature
      - d: process duration
      - Ω: allowable offsets
      - Θ: multinomial parameters over values in Ω
  - C = ⟨c1, ..., cC⟩, a set of configurations
    - c = ⟨π1, ..., πL⟩, a set of process instances
      - π = ⟨h, λ, O⟩, a process instance
        - h: process ID
        - λ: associated stimulus landmark
        - O: offset (takes values in Ω(h))
  - Φ = ⟨φ1, ..., φC⟩, priors over C
  - Σ = ⟨σ1, ..., σV⟩, standard deviation for each voxel
- Notation: parameter(entity), e.g., W(h) is the response signature of process h, and O(π) is the offset of process instance π. (A data-structure sketch follows below.)
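To make the formalism concrete, here is a minimal Python sketch of these tuples as data structures. The class and field names are my own illustration, not the proposal's implementation:

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Process:                  # h = <W, d, Omega, Theta>
    W: np.ndarray               # response signature, shape (d, V)
    d: int                      # process duration in time points
    offsets: List[int]          # Omega, the allowable offsets
    theta: np.ndarray           # multinomial parameters over the offsets

@dataclass
class ProcessInstance:          # pi = <h, lambda, O>
    h: int                      # process ID
    landmark: float             # associated stimulus landmark (lambda)
    offset: int                 # O, a value in Omega(h)

@dataclass
class HPM:                      # HPM = <H, C, Phi, Sigma>
    processes: List[Process]                     # H
    configurations: List[List[ProcessInstance]]  # C
    phi: np.ndarray                              # priors over C
    sigma: np.ndarray                            # per-voxel std devs, length V
```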
12. Processes of the HPM
[Figure: an example HPM with two processes, shown for voxels v1 and v2.
- Process 1, ReadSentence: response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = ⟨θ0, θ1⟩.
- Process 2, ViewPicture: response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = ⟨θ0, θ1⟩.
The input stimulus Δ yields timing landmarks λ1 (sentence) and λ2 (picture). A process instance such as π2 has process h = 2, timing landmark λ2, and offset time O = 1 sec. (start time = λ2 + O). One configuration c of process instances π1, π2, ..., πk (with prior φc) determines the predicted mean signal in each voxel; the observed data add Gaussian noise N(0, σ1) to v1 and N(0, σ2) to v2.]
13. HPMs: The Graphical Model
[Figure: the graphical model. A node for the Configuration c and, for each process instance π1, ..., πk, nodes for the Timing Landmark λ (observed), the Process Type h, the Offset o, and the Start Time s (unobserved). Together with Σ, these generate the observed data Y(t,v) for t = 1..T, v = 1..V.]
- The set C of configurations constrains the joint distribution on h(k), o(k) for all k.
14. Encoding Experiment Design
- Processes: ReadSentence = 1, ViewPicture = 2, Decide = 3.
- The input stimulus Δ yields timing landmarks λ1 and λ2.
- Constraints encoded:
  - h(π1) ∈ {1, 2}, h(π2) ∈ {1, 2}, h(π1) ≠ h(π2)
  - o(π1) = 0, o(π2) = 0
  - h(π3) = 3, o(π3) ∈ {1, 2}
- These constraints admit exactly four configurations (Configurations 1-4); see the enumeration sketch below.
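A minimal sketch of how these constraints generate the four configurations. The landmark times and the choice to anchor the Decide instance at λ2 are my assumptions for illustration:

```python
from itertools import product

landmarks = {1: 0.0, 2: 8.0}   # illustrative lambda_1, lambda_2 times (sec.)

configurations = []
# pi1 and pi2 take processes {1,2} with h(pi1) != h(pi2) and offset 0;
# pi3 is always Decide (3) with an offset in {1,2}.
for (h1, h2), o3 in product([(1, 2), (2, 1)], [1, 2]):
    configurations.append([
        (h1, landmarks[1], 0),   # pi1 = (process, landmark time, offset)
        (h2, landmarks[2], 0),   # pi2
        (3,  landmarks[2], o3),  # pi3, anchored at lambda_2 (an assumption)
    ])

assert len(configurations) == 4  # Configurations 1-4 on the slide
```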
15. Inference
- Inference is over configurations.
- Choose the most likely configuration:
  ĉ = argmax_c P(C = c | Y, Δ, HPM)
  where C = configuration, Y = observed data, Δ = input stimuli, HPM = model. (A scoring sketch follows below.)
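A sketch of this inference under the model's Gaussian noise assumption, reusing the hypothetical structures from the earlier sketch. Here predict_mean, which sums the offset-shifted response signatures of a configuration into a T×V mean, is assumed given:

```python
import numpy as np

def log_likelihood(Y, mean, sigma):
    """Gaussian log P(Y | mean, sigma), summed over time points and voxels."""
    resid = Y - mean  # both shaped (T, V); sigma has length V
    return -0.5 * np.sum((resid / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

def most_likely_configuration(Y, hpm, predict_mean):
    """argmax_c P(c | Y, Delta, HPM), scored as log P(Y | c) + log phi_c."""
    scores = [
        log_likelihood(Y, predict_mean(hpm, c), hpm.sigma) + np.log(phi_c)
        for c, phi_c in zip(hpm.configurations, hpm.phi)
    ]
    return int(np.argmax(scores))
```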
16. Learning
- Parameters to learn:
  - Response signature W for each process
  - Timing distribution Θ for each process
  - Standard deviation σ for each voxel
- Case 1: Known Configuration.
  - Least-squares problem to estimate W.
  - Standard MLEs for Θ and σ.
- Case 2: Unknown Configuration.
  - Expectation-Maximization (EM) algorithm to estimate W and Θ.
    - E step: estimate a probability distribution over configurations.
    - M step: update the estimates of W (using reweighted least squares) and Θ (using standard MLEs) based on the E step.
  - Standard MLEs for σ.
17. Case 1: Known Configuration
- Following Dale99, use the GLM.
- The (known) configuration generates a T×D convolution matrix X, where D = Σ_h d(h).
[Figure: an example convolution matrix for the configuration π1 = (h=1, start=1), π2 = (h=2, start=2), π3 = (h=3, start=2), with d(1) = d(2) = d(3) = 3. Rows correspond to time points t1, ..., tT; the columns are grouped into blocks of widths d(1), d(2), d(3), and each row has a 1 in the column of each process response value active at that time point.]
18. Case 1: Known Configuration
[Figure: the resulting least-squares problem Y = XW. Y is the T×V data matrix, X is the T×D convolution matrix, and W is the D×V matrix formed by stacking the response signatures W(1), W(2), W(3), which occupy d(1), d(2), and d(3) rows respectively. A code sketch follows below.]
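A compact sketch of Case 1 under the linear-summation assumption: build the convolution matrix from the known process instances, then solve the least-squares problem with numpy. The names and toy dimensions are illustrative (in practice T is much larger than D):

```python
import numpy as np

def convolution_matrix(T, durations, instances):
    """instances: list of (process_index, start_time) with known timing."""
    col_start = np.cumsum([0] + durations[:-1])   # first column of each block
    X = np.zeros((T, sum(durations)))
    for h, start in instances:
        for i in range(durations[h]):             # one 1 per active time point
            t = start + i
            if t < T:
                X[t, col_start[h] + i] = 1.0
    return X

# Slide 17's example: three processes of duration 3, starting at t = 0, 1, 1
# (zero-indexed).
X = convolution_matrix(T=6, durations=[3, 3, 3],
                       instances=[(0, 0), (1, 1), (2, 1)])

# Least-squares estimate of the stacked signatures W (D x V), given data Y.
Y = np.random.randn(6, 2)                         # placeholder data, 2 voxels
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
```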
19. Case 2: Unknown Configuration
- E step: use the inference equation to estimate a probability distribution over the set of configurations.
- M step: use the probabilities computed in the E step to form weights for the least-squares procedure for estimating W.
20. Case 2: Unknown Configuration
- The convolution matrix models several choices for each time point.
[Figure: the extended convolution matrix, with T' > T rows. Each time point contributes one row per set of configurations consistent with it; e.g., t1 has one row for configurations {3,4} and one for {1,2}, while t18 has separate rows for configurations 3, 4, 1, and 2. The columns remain grouped into blocks of widths d(1), d(2), d(3).]
21. Case 2: Unknown Configuration
- Weight each row with the probabilities from the E step. (A sketch follows below.)
[Figure: the weighted system Y = XW. Each row of the extended convolution matrix carries a weight e1, e2, e3, e4, ... equal to the total posterior probability of the configurations it represents, e.g. e1 = P(C=3 | Y, W_old, Θ_old, σ_old) + P(C=4 | Y, W_old, Θ_old, σ_old).]
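The reweighted least squares reduces to scaling each row of the extended system by the square root of its E-step weight and then solving ordinary least squares. A minimal sketch, with the weights assumed already computed:

```python
import numpy as np

def weighted_lstsq(X_ext, Y_ext, weights):
    """Solve min_W sum_i e_i * ||Y_i - X_i W||^2 by scaling rows by sqrt(e_i).

    X_ext:   (T', D) extended convolution matrix (one row per time point per
             consistent configuration set)
    Y_ext:   (T', V) correspondingly replicated data
    weights: (T',) posterior-probability row weights from the E step
    """
    r = np.sqrt(weights)[:, None]
    W, *_ = np.linalg.lstsq(r * X_ext, r * Y_ext, rcond=None)
    return W
```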
22. Learned HPM with 3 Processes (S, P, D), d = 13 sec.
[Figure: learned response signatures for the S, P, and hypothesized D? processes, plotted against the observed data.]
23. Results: Model Selection
- Use cross-validation to choose a model.
  - GNB: Gaussian Naïve Bayes
  - HPM-2: HPM with ViewPicture, ReadSentence
  - HPM-3: HPM-2 + Decide

Accuracy predicting picture vs. sentence (random = 0.5):

  Subject   A      B      C
  GNB       0.725  0.750  0.750
  HPM-2     0.750  0.875  0.787
  HPM-3     0.775  0.875  0.812

Data log-likelihood:

  Subject   A      B      C
  GNB       -896   -786   -476
  HPM-2     -876   -751   -466
  HPM-3     -864   -713   -447
24. Synthetic Data Results
- The timing of the synthetic data mimics the real data, but we have ground truth.
- Can be used to investigate the effects of
  - signal-to-noise ratio
  - number of voxels
  - number of training examples
- on
  - training time
  - cross-validated classification accuracy
  - cross-validated data log-likelihood
26. Recall: Motivation
- To track cognitive processes over time.
  - Estimate process hemodynamic responses.
  - Estimate process timings.
  - Allowing processes that do not directly correspond to the stimulus timing is a key contribution of HPMs!
- To compare hypotheses of cognitive behavior.
27. Proposed Work
- Goals
  - Increase efficiency.
    - fewer parameters
    - better accuracy from fewer examples
    - faster inference and learning
  - Handle larger, more complex problems.
    - more voxels
    - more processes
    - fewer assumptions
- Research areas
  - Model Parameterization
  - Timing Constraints
  - Learning Under Uncertainty
28. Model Parameterization
- Goals
  - Improve the biological plausibility of learned responses.
  - Decrease the number of parameters to be estimated (improving sample complexity).
- Tasks
  - Parameter sharing across voxels
  - Parametric form for response signatures
  - Temporally smoothed response signatures
29. Timing Constraints
- Goals
  - Specify experiment-design domain knowledge more efficiently.
  - Improve the computational and sample complexities of the HPM algorithms.
- Tasks
  - Formalize limitations in terms of fMRI experiment design.
  - Improve the specification of timing constraints.
  - Develop more efficient exact and/or approximate algorithms.
30. Learning Under Uncertainty
- Goals
  - Relax the current modeling assumptions.
  - Allow more types of uncertainty about the data.
- Tasks
  - Learn process durations.
  - Learn the number of processes in the model.
31. HPM Parameter Sharing (Niculescu05)
- Special case: HPMs with known configuration.
- Parameter reduction from Σ_h d(h)·V to Σ_h d(h) + H·V.
- One scaling parameter per voxel per process.
- New mean for voxel v at time t: μ(t,v) = Σ_π s(h(π),v) · W(h(π))(t − start(π)).
- No more voxel index on the weights.
32. Extension to Unknown Timing
- Simplifying assumptions
  - No clustering: all voxels share a response.
  - Voxels that share a response for one process share a response for all processes.
- Algorithm notes
  - The residual is linear in the shared response parameters and in the scaling parameters, so minimize iteratively (see the sketch below).
  - Empirically, convergence occurs within 3-5 iterations.
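A naive dense sketch of this alternating minimization, combining Steps 1 and 2 from the next two slides: fix the scalings S and solve for the shared responses W by least squares, then fix W and solve for S. All names are illustrative:

```python
import numpy as np

def alternating_fit(Y, X, durations, n_iter=5):
    """Fit shared responses w (D,) and scalings s (H, V) so Y ~ mean(w, s)."""
    T, V = Y.shape
    H, D = len(durations), sum(durations)
    blocks = np.repeat(np.arange(H), durations)   # process index per column
    w = np.zeros(D)
    s = np.ones((H, V))
    for _ in range(n_iter):                       # 3-5 iterations empirically
        # Step 1: fix s, re-estimate w. Stack one copy of X per voxel, with
        # the ones replaced by that voxel's scaling parameters.
        Xs = np.vstack([X * s[blocks, v] for v in range(V)])       # (T*V, D)
        w, *_ = np.linalg.lstsq(Xs, Y.T.reshape(-1), rcond=None)
        # Step 2: fix w, re-estimate s. Column h of the design is X's block-h
        # part times the block-h slice of w.
        A = np.stack([X[:, blocks == h] @ w[blocks == h] for h in range(H)],
                     axis=1)                                       # (T, H)
        s, *_ = np.linalg.lstsq(A, Y, rcond=None)  # all voxels at once
    return w, s
```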
33. Iterative M-step, Step 1
- Using the current estimates of S, re-estimate W.
[Figure: the stacked least-squares system. The data Y is reshaped into a TV × 1 column by stacking the voxels. For each voxel v there is a T×D copy of the convolution matrix whose ones are replaced by the scaling parameters s(h,v); these copies are stacked vertically. The unknowns W(1), W(2), W(3) form a single column of parameters describing the shared responses, with no voxel index.]
34. Iterative M-step, Step 2
- Using the current estimates of W, re-estimate S.
[Figure: the companion system. Y is the original-size T×V data matrix, and the convolution matrix is its original size with its ones replaced by the current W estimates. Each column of the unknown matrix holds the scaling parameters for one voxel, with the parameter for each process replicated over its duration; these replicated parameter sets must be constrained to be equal.]
35. Next Step?
- Implement this approach.
  - Anticipated memory issue: replicating the convolution matrix for each voxel in Step 1.
  - Working on exploiting the sparsity/structure of these matrices.
- Add clustering back in.
- Adapt for other parameterizations of response signatures.
36. Response Signature Parameters
- Temporal smoothing
- Gamma functions
- Hemodynamic basis functions
37. Temporally Smooth Responses
- Idea: add a regularizer to the loss function to penalize large jumps between time points.
  - e.g., minimize ||Y − XW||² + λ Σ_t (W_t − W_{t−1})²
  - choose λ by cross-validation
  - should be a straightforward extension to the optimization code (see the sketch below)
- Concern: this adds λ as a new parameter instead of reducing the number of parameters!
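One standard way to implement this (my sketch, not necessarily the proposal's optimization code): rewrite the penalized objective as an augmented least-squares problem by stacking a scaled first-difference operator under X, taking differences only within each process's signature block:

```python
import numpy as np

def smooth_lstsq(X, Y, durations, lam):
    """min_W ||Y - X W||^2 + lam * sum_t (W_t - W_{t-1})^2, per process."""
    rows, offset = [], 0
    for d in durations:                 # first differences within each block
        for i in range(1, d):
            r = np.zeros(X.shape[1])
            r[offset + i - 1], r[offset + i] = -1.0, 1.0
            rows.append(r)
        offset += d
    D = np.array(rows)
    # Solving [X; sqrt(lam)*D] W = [Y; 0] in the least-squares sense is
    # exactly the penalized objective above.
    X_aug = np.vstack([X, np.sqrt(lam) * D])
    Y_aug = np.vstack([Y, np.zeros((D.shape[0], Y.shape[1]))])
    W, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
    return W
```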
38. Gamma-shaped Responses
- Idea: use a gamma function with 3 parameters for each process response signature (Boynton96).
  - a controls the amplitude
  - τ controls the width of the peak
  - n controls the delay of the peak
- Questions
  - Are gamma functions a reasonable modeling assumption?
  - Details of how to fit the parameters in the M-step? (A sketch of the function follows below.)
[Figure: a gamma-shaped response curve, signal amplitude vs. seconds, annotated with a (amplitude), τ (width), and n (delay).]
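For concreteness, a sketch of one common gamma parameterization following Boynton96; the default parameter values and the pure onset delay are illustrative assumptions:

```python
import numpy as np
from math import factorial

def gamma_response(t, a=1.0, tau=1.2, n=3, delay=2.0):
    """Gamma-shaped hemodynamic response (one common form after Boynton96).

    a     -- amplitude scaling
    tau   -- time constant controlling the width of the peak (seconds)
    n     -- integer phase parameter controlling the delay of the peak
    delay -- pure onset delay in seconds (illustrative default values)
    """
    t = np.asarray(t, dtype=float) - delay
    h = np.where(
        t > 0,
        (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * factorial(n - 1)),
        0.0,
    )
    return a * h

# Sample at the study's 0.5 sec. acquisition interval over an 11 sec. duration.
times = np.arange(0.0, 11.0, 0.5)
signature = gamma_response(times)
```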
39. Hemodynamic Basis Functions
- Idea: process response signatures are weighted sums of basis functions.
  - parameters are weights on n basis functions
  - e.g., gammas with different sets of parameters
  - learn process durations for free with variable-length basis functions
  - share basis functions across voxels and processes
- Question: how to choose/learn the basis? (Dale99) (A sketch follows below.)
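A sketch of the basis-function idea, using a few gamma functions (as in the previous sketch) as a hypothetical basis; the weights shown are illustrative, not fitted values:

```python
import numpy as np
from math import factorial

def gamma_basis(t, tau, n):
    """Unit-amplitude gamma function (as in the previous sketch)."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 0,
                    (t / tau) ** (n - 1) * np.exp(-t / tau)
                    / (tau * factorial(n - 1)),
                    0.0)

# Hypothetical basis: three gammas with different widths and delays.
times = np.arange(0.0, 11.0, 0.5)
B = np.stack([gamma_basis(times, tau, n)
              for tau, n in [(1.0, 2), (1.2, 3), (1.5, 4)]], axis=1)

# A process response signature is B @ w; only the weights w are learned,
# so each signature costs 3 parameters instead of one per time point.
w = np.array([0.5, 1.0, -0.2])   # illustrative weights, not fitted values
signature = B @ w
```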
40. Schedule
- August 2006
  - Parameter sharing.
  - Progress on model parameterization.
- December 2006
  - Improved expression of timing constraints.
  - Corresponding updates to HPM algorithms.
- June 2007
  - Application of HPMs to an open cognitive science problem.
- December 2007
  - Projected completion.
41. References
- John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036-1060, 2004. http://act-r.psy.cmu.edu/about/.
- Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207-4221, 1996.
- David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261-270, 2003.
- Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109-114, 1999.
- Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245-275, 1997.
- James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425-2430, September 2001.
- Rebecca A. Hutchinson, Tom M. Mitchell, and Indrayana Rustandi. Hidden Process Models. To appear at the International Conference on Machine Learning, 2006.
- Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128-136, 1999. http://www.ccbi.cmu.edu/project10modeling4CAPS.htm.
- Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145-175, 2004.
- Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.
- Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS-05-147.