Title: Nonnegative least squares for imaging data
Slide 1: Nonnegative least squares for imaging data
- Keith Worsley, McGill
- Jonathan Taylor, Stanford and Université de Montréal
- John Aston, Academia Sinica, Taipei
Slide 3: Nature (2005)
Slide 4: Subject is shown one of 40 faces chosen at random
- Expressions: Happy, Sad, Fearful, Neutral
Slide 5: ... but the face is only revealed through random bubbles
- First trial: Sad expression
- Subject is asked the expression; response: Neutral (incorrect)
- 75 random bubble centres, smoothed by a Gaussian bubble
- [Figure: what the subject sees; true expression Sad]
Slides 6-14: Your turn
- Slide 6: subject response Fearful, correct
- Slide 7: subject response Happy, incorrect (Fearful)
- Slide 8: subject response Happy, correct
- Slide 9: subject response Fearful, correct
- Slide 10: subject response Sad, correct
- Slide 11: subject response Happy, correct
- Slide 12: subject response Neutral, correct
- Slide 13: subject response Happy, correct
- Slide 14: subject response Happy, incorrect (Fearful)
Slide 15: Bubbles analysis
- E.g. Fearful (3000/4750 trials)
- [Figure: bubble images for trials 1, 2, 3, 4, 5, 6, 7, ..., 750, their sum, and the sum over correct trials]
- Threshold at the proportion of correct trials (0.68), scale to [0, 1], and use this as a bubble mask
- Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles)
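The bubble-mask recipe above can be sketched in a few lines; the arrays `bubbles` and `correct` below are random stand-ins for the real trial data, and the threshold `p0` plays the role of the slide's 0.68:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: per-trial smoothed bubble images (trials x pixels)
# and a 0/1 correctness indicator per trial.
n_trials, n_pixels = 750, 64
bubbles = rng.random((n_trials, n_pixels))   # smoothed bubble masks
correct = rng.integers(0, 2, n_trials)       # 1 = correct response

# Proportion of correct bubbles at each pixel:
# (sum of bubbles over correct trials) / (sum over all trials)
prop_correct = bubbles[correct == 1].sum(axis=0) / bubbles.sum(axis=0)

# Threshold at the overall proportion of correct trials and rescale to [0, 1]
# to obtain the bubble mask described on the slide.
p0 = correct.mean()
mask = np.clip((prop_correct - p0) / (1 - p0), 0, 1)
```
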
Slide 16: Results
- Mask x average face
- But are these features real or just noise? Need statistics.
- [Figure: masked average faces for Happy, Sad, Fearful, Neutral]
Slide 17: Statistical analysis
- Correlate bubbles with the response (correct = 1, incorrect = 0), separately for each expression
- Equivalent to a 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. Fearful
- Very similar to the proportion of correct bubbles
- Z ~ N(0,1) statistic
- [Figure: trials 1, 2, 3, 4, 5, 6, 7, ..., 750 with responses 0, 1, 1, 0, 1, 1, 1, ..., 1]
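The 2-sample Z-statistic at each pixel can be sketched as follows (random stand-in data; the pooled-variance form is one standard choice, not necessarily the exact variant used on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_pixels = 750, 64
bubbles = rng.random((n_trials, n_pixels))   # stand-in bubble images
response = rng.integers(0, 2, n_trials)      # correct = 1, incorrect = 0

g1, g0 = bubbles[response == 1], bubbles[response == 0]
n1, n0 = len(g1), len(g0)

# Pooled-variance two-sample Z statistic, one value per pixel
sp2 = ((n1 - 1) * g1.var(0, ddof=1) + (n0 - 1) * g0.var(0, ddof=1)) / (n1 + n0 - 2)
Z = (g1.mean(0) - g0.mean(0)) / np.sqrt(sp2 * (1 / n1 + 1 / n0))
```

With null (unrelated) data, these Z values behave approximately as N(0,1), which is what makes the thresholding on the next slide meaningful.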
Slide 18: Results
- Thresholded at Z = 1.64 (P = 0.05, uncorrected)
- Multiple comparisons correction for 91,200 pixels? Need random field theory.
- [Figure: Z ~ N(0,1) statistic images over the average face for Happy, Sad, Fearful, Neutral]
Slide 19: Euler characteristic (EC) = blobs - holes
- Excursion set: Z > threshold, for the neutral face
- EC at increasing thresholds: 0, 0, -7, -11, 13, 14, 9, 1, 0
- Heuristic: at high thresholds t the holes disappear, EC = 1 or 0, so E(EC) ~ P(max Z > t)
- There is an exact expression for E(EC) at all thresholds, and E(EC) ~ P(max Z > t) is extremely accurate
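The EC = blobs - holes count can be computed directly from a binary excursion set. A minimal sketch (my own illustrative implementation via vertex/edge/face counting on the pixel complex, not the authors' code):

```python
import numpy as np

def euler_characteristic(mask):
    """EC = #blobs - #holes of a 2D binary excursion set, computed as
    V - E + F for the cubical complex built from the 'on' pixels."""
    m = np.pad(np.asarray(mask, bool), 1)
    F = m.sum()                                    # faces: one per on-pixel
    # A vertex is present if any of the up-to-4 pixels touching it is on.
    V = (m[:-1, :-1] | m[:-1, 1:] | m[1:, :-1] | m[1:, 1:]).sum()
    # An edge is present if either pixel sharing it is on.
    E = ((m[:-1, 1:-1] | m[1:, 1:-1]).sum()        # horizontal edges
         + (m[1:-1, :-1] | m[1:-1, 1:]).sum())     # vertical edges
    return int(V) - int(E) + int(F)
```

For example, a single pixel gives EC = 1 (one blob), while a 3x3 ring of pixels with a hollow centre gives EC = 0 (one blob minus one hole).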
Slide 20: Random field theory
- E(EC) is a sum over dimensions d of (Lipschitz-Killing curvature L_d of S) x (EC density rho_d of Z above t), where L_d(S) = Resels_d(S) x c
- [Figure: Z(s) generated by convolving white noise with a filter of given FWHM]
Slide 21: Results, corrected for search
- Random field theory threshold: Z = 3.92 (P = 0.05)
- Thresholds: 3.82, 3.80, 3.81, 3.80
- Saddle-point approximation (Rabinowitz, 1997; Chamandy, 2007)?
- Bonferroni: Z = 4.87 (P = 0.05): nothing survives
- [Figure: Z ~ N(0,1) statistic over the average face for Happy, Sad, Fearful, Neutral]
Slide 22: fMRI data
- 120 scans; 3 scans each of hot, rest, warm, rest, ...
- T = (hot - warm effect) / S.d.; T ~ t with 110 df if no effect
Slide 23: Linear model regressors
Slide 24: Linear model for fMRI time series with AR(p) errors
- Y(t) = x(t)'beta + epsilon(t), where epsilon(t) follows an AR(p) process with coefficients rho_1, ..., rho_p
- Unknown parameters: beta, rho_1, ..., rho_p, sigma
Slide 25: Unknown latency d of the HRF
- Unknown parameters: the magnitude beta of the response and the latency shift d of the HRF
Slide 26: Example
- [Figure: the stimulus and the stimulus convolved with the HRF, plotted against time t = 0-70 seconds]
- Two interesting problems:
  - Estimate the latency shift d and its standard error
  - Test for the magnitude beta of the stimulus, allowing for unknown latency
Slide 27: Test for the magnitude beta of the stimulus allowing for unknown latency
- We could do either:
  - T-test on beta1 > 0, allowing for beta2: loses sensitivity if d is far from 0
  - F-test on (beta1, beta2) != 0: wastes sensitivity on unrealistic HRFs
Slide 28: Cone alternative
- We know that the magnitude of the response is positive, and that the latency shift must lie in some restricted interval, say [-2, 2] seconds
- This implies beta >= 0 and -delta <= d <= delta, where delta = 2 seconds
- This specifies a cone alternative for (beta1, beta2) = (beta, d beta) (Friman et al., 2003)
- [Figure: cone in the (beta1, beta2) plane swept out from d = -2 through d = 0 to d = 2, with the null at 0; cone angle theta = 2 atan(delta ||x2|| / ||x1||)]
Slide 29: Non-negative least squares
- [Figure: the stimulus and the regressors x1(t) and x2(t), plotted against time t = 0-70 seconds]
Slide 30: Non-negative least squares
- [Figure: cone in the (beta1, beta2) plane spanned by x1(t) and x2(t), with the null at 0; cone angle theta = angle between x1 and x2; one edge has beta1 > 0, beta2 = 0, the interior has beta1 > 0, beta2 > 0, and the other edge has beta1 = 0, beta2 > 0]
Slide 31: Example of three extremes
- x1(t): standard; x2(t): delayed 4 seconds; x3(t): spread 4 seconds
- [Figure: 3D cone in (beta1, beta2, beta3)]
Slide 32: General non-negative least squares problem
- Minimise ||Y - sum_j x_j beta_j||^2 subject to beta_j >= 0 for all j
- Footnote: Woolrich et al. (2004) replace the hard constraints by soft constraints through a prior distribution on beta, taking a Bayesian approach.
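A sketch of this problem using SciPy's implementation of the Lawson-Hanson active-set solver (the design matrix and coefficients below are random stand-ins, not the PET or fMRI data):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 5))              # stand-in design: n = 20, p = 5
beta_true = np.array([1.0, 0.0, 2.0, 0.0, 0.0])
y = X @ beta_true + 0.01 * rng.standard_normal(20)

# Solve  min ||y - X beta||^2  subject to  beta >= 0
beta_hat, resid_norm = nnls(X, y)
```

Note that the active-set solution returns exact zeros for the inactive coefficients, which is the sparsity effect discussed on the following slides.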
Slide 35: Pick a range of, say, p = 150 plausible values of the non-linear parameter theta: theta_1, ..., theta_p
Slide 36: Fitting the NNLS model
- Simple approach: do all-subsets regression
- Throw out any model that does not satisfy the non-negativity constraints
- Among those left, pick the model with the smallest error sum of squares
- For larger models there are more efficient methods, e.g. Lawson and Hanson (1974)
- The non-negativity constraints tend to enforce sparsity, even if the regressors are highly correlated (e.g. PET)
- Why? Highly correlated regressors have huge positive and negative unconstrained coefficients; non-negativity suppresses the negative ones
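The simple all-subsets procedure described above can be written down directly. This is an illustrative sketch, only viable for small p; for larger problems the Lawson-Hanson solver gives the same optimum far more efficiently:

```python
from itertools import combinations

import numpy as np

def nnls_all_subsets(X, y):
    """All-subsets regression: discard any subset whose least-squares fit
    violates the non-negativity constraints, then return the feasible model
    with the smallest error sum of squares."""
    n, p = X.shape
    best_beta, best_rss = np.zeros(p), float(y @ y)   # start from the empty model
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            if np.all(b >= 0):                        # keep only feasible fits
                r = y - X[:, cols] @ b
                if r @ r < best_rss:
                    best_rss = float(r @ r)
                    best_beta = np.zeros(p)
                    best_beta[list(cols)] = b
    return best_beta, best_rss
```

Because the NNLS objective is convex, the best feasible subset fit coincides with the global constrained optimum, so this brute-force version agrees with Lawson-Hanson on small test problems.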
Slide 37: Example: n = 20, p = 150, but surprisingly it does not overfit
- [Figure: tracer concentration against time (0-6000): data Y, fit Yhat, and Yhat components 1 and 2]
- We tend to get sparse pairs of adjacent regressors, suggesting the best regressor is somewhere in between.
Slide 38: P-values?
- [Definition of the Beta-bar statistic; equation lost in transcription]
Slide 39: P-values for PET data at a single voxel
- p = 2; cone weights w1 = 1/2, w2 = theta / 2 pi
Slide 40: The Beta-bar random field
- From well-known EC densities of the F field
- From simulations at a single voxel
- Same linear combination!
Slide 41: Proof
- [Figure: Z1 ~ N(0,1) against Z2 ~ N(0,1), in coordinates s1, s2: the rejection region for the cone alternative above threshold t, and the corresponding excursion sets over the search region S, with the null at 0]
Slide 42: Euler characteristic heuristic again
- Excursion sets X_t over the search region S
- EC = blobs - holes: 1, 7, 6, 5, 2, 1, 1, 0
- [Figure: observed vs. expected Euler characteristic EC as a function of the threshold t]
- The expected EC is EXACT!
Slide 43: Proof
Slide 44: Steiner-Weyl Tube Formula (1930); Morse Theory Approach (1995)
- Put a tube of radius r about the search region dS
- Find its volume, expand as a power series in r, and pull off the coefficients
- This works for a Gaussian random field; for a chi-bar random field???
Slide 45: Steiner-Weyl Volume of Tubes Formula (1930)
- [Figure: Tube(dS, r) of radius r around dS]
- Lipschitz-Killing curvatures are just intrinsic volumes, or Minkowski functionals, in the (Riemannian) metric of the variance of the derivative of the process
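For reference, the classical form of the Steiner-Weyl formula for a convex region S in R^D (stated here from the standard literature, with omega_d the volume of the unit d-ball and L_d the Lipschitz-Killing curvatures; the slide's version about the boundary dS differs only in bookkeeping):

```latex
\mathrm{vol}\bigl(\mathrm{Tube}(S, r)\bigr)
  \;=\; \sum_{d=0}^{D} \omega_d \, r^{d} \, \mathcal{L}_{D-d}(S),
\qquad
\omega_d = \frac{\pi^{d/2}}{\Gamma\!\left(\tfrac{d}{2} + 1\right)} .
```

Expanding the tube volume in powers of r and reading off the coefficients is exactly the "pull off coefficients" step on the previous slide.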
Slide 46: Lipschitz-Killing curvatures of triangles
- [Figure: triangulated regions S of varying edge length; Lipschitz-Killing curvature of triangles, and of a union of triangles]
Slide 47: Non-isotropic data? Use the Riemannian metric of Var(grad Z)
- [Figure: Z ~ N(0,1) fields in coordinates s1, s2, with varying edge length; Lipschitz-Killing curvature of triangles, and of a union of triangles]
Slide 48: We need independent, identically distributed random fields, e.g. residuals from a linear model
- Lipschitz-Killing curvature of triangles, and of a union of triangles
- Taylor and Worsley, JASA (2007)
Slide 49: Beautiful symmetry
- Steiner-Weyl Tube Formula (1930); Taylor's Gaussian Tube Formula (2003)
- Put a tube of radius r about the search region dS and about the rejection region R_t
- [Figure: Tube(dS, r) around dS, and Tube(R_t, r) around the rejection region R_t in the (Z1, Z2) plane with Z1, Z2 ~ N(0,1), between thresholds t - r and t]
- Find the volume or probability, expand as a power series in r, and pull off the coefficients
Slide 50: Taylor's Gaussian Tube Formula (2003)
- [Figure: rejection region R_t and Tube(R_t, r) in the (Z1, Z2) plane, Z1, Z2 ~ N(0,1), between thresholds t - r and t]
Slide 51:
- From well-known EC densities of the F field
- From simulations at a single voxel
- Same linear combination!
Slide 52: Proof, n = 3
Slide 53: Power? S = 1000 cc brain, FWHM = 10 mm, P = 0.05
- Tests compared: T-test on beta1, Beta-bar test, F-test on (beta1, beta2); cone weights w1 = 1/2, w2 = theta / 2 pi
- Event design: cone angle theta = 78.4 degrees; block design (20 seconds): cone angle theta = 38.1 degrees
- [Figure: power of each test against the shift d of the HRF (0-3 seconds), for the event and block designs; insets show the responses x1(t) and x2(t) over time t = 0-40 seconds]
Slide 54: Bubbles task in the fMRI scanner
- Correlate bubbles with BOLD at every voxel
- Calculate Z for each pair (bubble pixel, fMRI voxel): a 5D image of Z statistics
- [Figure: bubble images for trials 1, 2, 3, 4, 5, 6, 7, ..., 3000, and the fMRI time series]
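Computing every (bubble pixel, voxel) correlation at once reduces to a single matrix product on standardized data. A sketch with small random stand-ins (the real data are 91,200 face pixels by many voxels over 3,000 trials, and the slide's Z statistic may use a different normalization than the Fisher transform shown here):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3000                                  # trials
bub = rng.random((n, 50))                 # stand-in: bubble value per (trial, face pixel)
bold = rng.standard_normal((n, 40))       # stand-in: BOLD response per (trial, voxel)

def standardize(A):
    """Zero-mean, unit-sample-variance columns."""
    return (A - A.mean(0)) / A.std(0, ddof=1)

# One matrix product gives every (face pixel, voxel) correlation at once.
R = standardize(bub).T @ standardize(bold) / (n - 1)   # shape (50, 40)

# Fisher-transform normal approximation: under the null, Z ~ N(0,1).
Z = np.sqrt(n - 3) * np.arctanh(R)
```
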
Slide 55: Thresholding? The cross-correlation random field
- Correlation between two fields at two different locations, searched over all pairs of locations, one in S, one in T
- Bubbles data: P = 0.05, n = 3000, c = 0.113, T = 6.22
- Cao and Worsley, Annals of Applied Probability (1999)
Slide 56: NNLS for bubbles?
- At the moment, we are correlating Y(t) (fMRI data at each voxel) with each of the 240 x 380 = 91,200 face pixels as regressors x1(t), ..., x91200(t) separately: Y(t) = xj(t) betaj + z(t) gamma + e(t)
- We should be doing this simultaneously: Y(t) = x1(t) beta1 + ... + x91200(t) beta91200 + z(t) gamma + e(t)
- Obviously impossible: observations (3000) << regressors (91,200)
- Maybe we can use NNLS: beta1 >= 0, ..., beta91200 >= 0
- It should enforce sparsity over beta (activation at face pixels), provided observations (3000) >> dimensions of the cone = resels of the face (146.2)
- We can threshold Beta-bar over brain voxels to P < 0.05 using the above
- The result will be a face image of isolated local maxima for each voxel
- It will tell you which brain voxels are activated, but not which face pixels
- Might be a huge computational task!
- Interactions? Y = x1 + ... + x91200 + x1 x2 + ... + z gamma
Slide 57: MS lesions and cortical thickness
- Idea: MS lesions interrupt neuronal signals, causing thinning in downstream cortex
- Data: n = 425 mild MS patients
- [Figure: scatter of average cortical thickness (mm, 1.5-5.5) against total lesion volume (cc, 0-80); correlation = -0.568, T = -14.20 (423 df)]
- Charil et al., NeuroImage (2007)
Slide 58: MS lesions and cortical thickness at all pairs of points
- Dominated by total lesions and average cortical thickness, so remove these effects as follows:
  - CT = cortical thickness, smoothed 20 mm
  - ACT = average cortical thickness
  - LD = lesion density, smoothed 10 mm
  - TLV = total lesion volume
- Find the partial correlation(LD, CT - ACT), removing TLV, via the linear model: CT - ACT = 1 + TLV + LD; test for LD
- Repeat for all voxels in 3D, nodes in 2D: 1 billion correlations, so thresholding is essential!
- Look for high negative correlations
- Threshold: P = 0.05, c = 0.300, T = 6.48
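The partial correlation removing a covariate, as in the linear-model formulation above, can be sketched generically (x, y, z here are placeholders for LD, CT - ACT, and TLV at one pair of points; random stand-in data):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z:
    regress each on [1, z] and correlate the residuals."""
    Zd = np.column_stack([np.ones_like(z), z])
    rx = x - Zd @ np.linalg.lstsq(Zd, x, rcond=None)[0]
    ry = y - Zd @ np.linalg.lstsq(Zd, y, rcond=None)[0]
    return rx @ ry / np.sqrt((rx @ rx) * (ry @ ry))
```

This residual-based computation agrees exactly with the textbook formula (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).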
Slide 59: Cluster extent rather than peak height (Friston, 1994)
- Choose a lower threshold, e.g. t = 3.11 (P = 0.001)
- Find clusters, i.e. connected components of the excursion set
- Measure cluster extent by resels
- Distribution: fit a quadratic to the peak
- Distribution of maximum cluster extent: Bonferroni on the number of clusters N ~ E(EC)
- [Figure: field Z against s, showing one cluster above threshold t, its extent, and its peak height]
- Cao and Worsley, Advances in Applied Probability (1999)
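The cluster-finding step above can be sketched with SciPy's connected-component labelling. Extents here are in pixels; converting to resels, as the slide suggests, would additionally require the FWHM. The field below is random stand-in data with one planted cluster:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(6)
Z = rng.standard_normal((64, 64))
Z[10:14, 10:18] += 10.0                   # plant one strong 4 x 8 cluster

t = 3.11                                  # lower threshold (P = 0.001)
excursion = Z > t
labels, n_clusters = ndimage.label(excursion)     # connected components
extents = ndimage.sum(excursion, labels, index=np.arange(1, n_clusters + 1))
max_extent = extents.max()                # maximum cluster extent, in pixels
```
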