Title: The Subjective Experience of Feelings Past
1 Thresholding and multiple comparisons
2 Information for making inferences on activation
- Signal location
  - Local maximum: no inference in SPM
  - Could extract peak coordinates and test
  - (e.g., Woods lab, Ploghaus, 1999)
  - Center-of-mass: no inference
  - Sensitive to blob-defining threshold
- Signal magnitude
  - Local contrast intensity: P-values (& CIs)
- Spatial extent
  - Cluster volume: P-value, no CIs
  - Sensitive to blob-defining threshold
- Signal timing
  - No inference in SPM, but see Aguirre 1998; Bellgowan 2003; Miezin et al. 2000; Lindquist & Wager, 2007
3 Three levels of inference
- Voxel-level
  - Most spatially specific, least sensitive
- Cluster-level
  - Less spatially specific, more sensitive
- Set-level
  - No spatial specificity (no localization)
  - Can be most sensitive
4 Voxel-level Inference
- Retain voxels above $\alpha$-level threshold $u_\alpha$
- Gives best spatial specificity
- The null hyp. at a single voxel can be rejected
[Figure: statistic plotted over space with threshold $u_\alpha$; panels "Significant Voxels" and "No significant Voxels"]
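5 Cluster-level Inference
- Two-step process
  - Define clusters by arbitrary threshold $u_{clus}$
  - Retain clusters larger than $\alpha$-level threshold $k_\alpha$
[Figure: statistic plotted over space with threshold $u_{clus}$; cluster extents compared with $k_\alpha$, labeled "Cluster not significant" and "Cluster significant"]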
6 Cluster-level Inference
- Typically better sensitivity
- Worse spatial specificity
  - The null hyp. of the entire cluster is rejected
  - Only means that one or more voxels in the cluster are active
[Figure: same cluster-extent illustration as slide 5]
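7 Set-level Inference
- Count number of blobs $c$
  - $c$ depends on the blob-defining threshold $u_{clus}$ and the blob size threshold $k$
- Worst spatial specificity
  - Can only reject the global null hypothesis
[Figure: statistic plotted over space with threshold $u_{clus}$ and size threshold $k$. Here $c = 1$: only 1 cluster is larger than $k$]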
8 Multiple comparisons
9 Hypothesis Testing
- Null Hypothesis $H_0$
- Test statistic $T$
  - $t$: observed realization of $T$
- $\alpha$ level
  - Acceptable false positive rate
  - Level $\alpha = P(T > u_\alpha \mid H_0)$
  - Threshold $u_\alpha$ controls false positive rate at level $\alpha$
- P-value
  - Assessment of $t$ assuming $H_0$
  - $P(T > t \mid H_0)$
  - Prob. of obtaining a stat. as large or larger in a new experiment
  - $P(\text{Data} \mid \text{Null})$, not $P(\text{Null} \mid \text{Data})$
10 Multiple Comparisons Problem
- Which of 100,000 voxels are sig.?
  - $\alpha = 0.05 \Rightarrow$ 5,000 false positive voxels
- Which of (random number, say) 100 clusters are significant?
  - $\alpha = 0.05 \Rightarrow$ 5 false positive clusters
- Expected false positives for independent tests
  - $\alpha \times N_{tests}$
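The arithmetic on this slide can be checked with a short sketch. It assumes every null hypothesis is true and the tests are independent, so each uncorrected p-value is uniform on [0, 1]; the function names are illustrative and not from the slides.

```python
import random


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when every null is true."""
    return alpha * n_tests


def simulate_false_positives(alpha, n_tests, seed=0):
    """Count uncorrected false positives among independent null tests.

    Under the null a p-value is uniform on [0, 1], so each test is a
    false positive with probability alpha.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


# The slide's two cases: 100,000 voxels and 100 clusters at alpha = 0.05.
print(expected_false_positives(0.05, 100_000))
print(expected_false_positives(0.05, 100))
# One simulated null "image"; the count varies around the expectation.
print(simulate_false_positives(0.05, 100_000))
```

The simulated count differs from run to run (it is binomial), which is why the slide speaks of the expected number.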
11 MCP Solutions: Measuring False Positives
- Familywise Error Rate (FWER)
  - Familywise Error
    - Existence of one or more false positives
  - FWER is the probability of a familywise error
- False Discovery Rate (FDR)
  - FDR $= E(V/R)$
  - $R$ voxels declared active, $V$ falsely so
    - Realized false discovery rate: $V/R$
    - (but we don't know what it is with real data!)
13 FWE Multiple comparisons terminology
- Family of (null) hypotheses
  - $H_k$, $k \in \Omega = \{1, \ldots, K\}$
- Familywise Type I error
  - Weak control: omnibus test
    - $\Pr(\text{reject } H_\Omega \mid H_\Omega \text{ true}) \le \alpha$
    - Pr. of falsely rejecting the global null
    - If reject, can only say 1 or more $H_k$ is false: "anything, anywhere?"
  - Strong control: localising test
    - $\Pr(\text{reject } H_W \mid H_W) \le \alpha$
    - Pr. of rejecting any true null $H$ is less than alpha
    - If reject, know "anything, & where?"
- Adjusted p-values
  - Test level at which reject $H_k$
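14 Voxel-level test
- Threshold $u_\alpha$ (critical t-value)
  - $t_k > u_\alpha \Rightarrow$ reject $H_k$
- Weak control
  - Reject any $H_k \Rightarrow$ reject $H_\Omega$
  - Reject $H_\Omega$ if $t^{\max}_\Omega > u_\alpha$
- Adjusted p-values
  - $\Pr(T^{\max}_\Omega > t_k \mid H_\Omega)$
[Figure: null distribution with threshold $u_\alpha$; labels p = 0.05, p = 0.0001, p = 0.0000001]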
15 Controlling FWE with Bonferroni
16 FWE MCP Solutions: Bonferroni
- For a statistic image $T$...
  - $T_i$: $i$th voxel of statistic image $T$, where $i = 1, \ldots, V$
- ...use $\alpha = \alpha_0 / V$
  - $\alpha_0$: desired FWER level (e.g. 0.05)
  - $\alpha$: new alpha level to achieve $\alpha_0$ corrected
  - $V$: number of voxels
  - $u_\alpha$: $\alpha$-level statistic threshold, $P(T_i \ge u_\alpha) = \alpha$
- By Bonferroni inequality...
  - FWER $= P(\text{FWE}) = P\left(\bigcup_i \{T_i \ge u_\alpha\} \mid H_0\right) \le \sum_i P(T_i \ge u_\alpha \mid H_0) = \sum_i \alpha = \sum_i (\alpha_0 / V) = \alpha_0$ (corrected)
  - Left-hand side: P(any $T_i$ exceeds $u_\alpha$ under the null)
  - Upper bound: sum of P-values
- Conservative under correlation
  - Independent: $V$ tests
  - Some dep.: ? tests
  - Total dep.: 1 test
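A minimal sketch of the Bonferroni calculation follows. It assumes a Gaussian (z) statistic image so that the threshold can be computed with the standard library; the slides' real-data example uses a t image, which would need a t quantile instead. Function names are illustrative.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha0, n_voxels):
    """Per-voxel level alpha = alpha0 / V."""
    return alpha0 / n_voxels


def bonferroni_z_threshold(alpha0, n_voxels):
    """One-sided z threshold u with P(Z >= u) = alpha0 / V (Gaussian assumption)."""
    return NormalDist().inv_cdf(1.0 - bonferroni_alpha(alpha0, n_voxels))


def fwer_independent(alpha, n_voxels):
    """Exact FWER for V independent tests, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_voxels


alpha = bonferroni_alpha(0.05, 100_000)
print(alpha)                                   # per-voxel level
print(bonferroni_z_threshold(0.05, 100_000))   # z threshold for 100,000 voxels
print(fwer_independent(alpha, 100_000))        # stays just under 0.05
```

For independent tests the achieved FWER sits only slightly below the nominal level; the bound becomes loose as the tests become dependent, which is the slide's point about conservativeness under correlation.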
17 Controlling FWE with Random field theory
18 SPM approach: Random fields
- Consider statistic image as lattice representation of a continuous random field
- Use results from continuous random field theory
[Figure: continuous random field and its lattice representation]
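19 FWER MCP Solutions: Controlling FWER w/ Max
- FWER & distribution of maximum
  - FWER $= P(\text{FWE}) = P\left(\bigcup_i \{T_i \ge u\} \mid H_0\right)$: P(any t value exceeds $u$ under the null)
  - $= P(\max_i T_i \ge u \mid H_0)$: P(max t value exceeds $u$ under the null)
- 100(1-$\alpha$)%ile of max dist'n controls FWER
  - FWER $= P(\max_i T_i \ge u_\alpha \mid H_0) = \alpha$
  - where $u_\alpha = F^{-1}_{\max}(1 - \alpha)$
  - $u_\alpha$ is such that the max only exceeds $u_\alpha$ a fraction $\alpha$ of the time
[Figure: distribution of maxima with threshold $u_\alpha$]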
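The maximum-statistic idea can be illustrated by simulation. The sketch below assumes independent standard-normal voxels under the null (no spatial smoothness), which is an assumption made for simplicity and not something the slides specify; in that setting the simulated threshold should land close to the Bonferroni threshold.

```python
import math
import random
from statistics import NormalDist


def max_null_threshold(n_voxels, alpha=0.05, n_sim=2000, seed=0):
    """Estimate u_alpha as the 100(1 - alpha) percentile of the null maximum.

    Each simulated "image" is n_voxels independent N(0, 1) values
    (an illustrative assumption); only its maximum is kept.
    """
    rng = random.Random(seed)
    maxima = sorted(
        max(rng.gauss(0.0, 1.0) for _ in range(n_voxels))
        for _ in range(n_sim)
    )
    # Smallest simulated maximum with at least (1 - alpha) of maxima at or below it.
    index = math.ceil((1.0 - alpha) * n_sim) - 1
    return maxima[index]


u_max = max_null_threshold(n_voxels=200)
u_bonf = NormalDist().inv_cdf(1.0 - 0.05 / 200)
print(u_max, u_bonf)
```

With smooth (correlated) images the maximum distribution shifts downward and the two thresholds separate, which is what random field theory and permutation methods exploit.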
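20 FWER MCP Solutions: Random Field Theory
- Euler Characteristic $\chi_u$
  - Topological measure
    - #blobs - #holes
  - At high thresholds, just counts blobs
- FWER $= P(\text{max voxel} \ge u \mid H_0) = P(\text{one or more blobs} \mid H_0) \approx P(\chi_u \ge 1 \mid H_0) \approx E(\chi_u \mid H_0)$
  - First approximation holds if there are no holes
  - Second approximation holds if there is never more than 1 blob
[Figure: random field, threshold, and suprathreshold sets]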
21 Random Field Intuition
- Corrected P-value for voxel value $t$
  - $P^c = P(\max T > t) \approx E(\chi_t) \approx \lambda(\Omega)\, |\Lambda|^{1/2}\, t^2 \exp(-t^2/2)$
  - where $\lambda(\Omega)$ is the volume, $|\Lambda|^{1/2}$ the roughness, and $t$ the t-value
- Statistic value $t$ increases
  - $P^c$ decreases (but only for large $t$)
- Search volume increases
  - $P^c$ increases (more severe MCP)
- Roughness increases (smoothness decreases)
  - $P^c$ increases (more severe MCP)
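The three trends above can be checked numerically. The sketch below uses the 3D Gaussian-field expression given later in the deck (the "Expected Euler Characteristic" slide), rewritten in resel units under the assumption that roughness equals $(4 \ln 2)^{3/2}$ divided by the product of the three FWHMs. The function name and the example resel counts are illustrative.

```python
import math


def expected_ec_3d(resels, u):
    """Leading (3D) term of the expected Euler characteristic, Gaussian field.

    resels is the search volume in resel units, so that
    volume * roughness = resels * (4 ln 2) ** 1.5.
    """
    return (
        resels
        * (4.0 * math.log(2.0)) ** 1.5
        * (u * u - 1.0)
        * math.exp(-u * u / 2.0)
        / (2.0 * math.pi) ** 2
    )


for u in (3.0, 4.0, 5.0):
    # Larger threshold: smaller corrected P (in this high-threshold range).
    print(u, expected_ec_3d(1000, u))

# Larger search volume or rougher images mean more resels: larger corrected P.
print(expected_ec_3d(2000, 5.0) > expected_ec_3d(1000, 5.0))
```

The approximation is only meaningful as a probability at high thresholds; at low thresholds the expression can exceed 1 and is no longer monotone in the threshold, matching the "only for large t" caveat.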
22 Random Field Theory: Smoothness Estimation
- Smoothness est'd from standardized residuals
  - Var. of gradients yields resels per voxel (RPV)
- RPV image
  - Local roughness
  - Can transform into local smoothness est.
    - FWHM Img $=$ (RPV Img)$^{-1/D}$
    - Dimension $D$, e.g. $D = 2$ or 3
23 Random Field Theory: Smoothness Parameterization
- RESELS: Resolution Elements
  - 1 RESEL $= \mathrm{FWHM}_x \times \mathrm{FWHM}_y \times \mathrm{FWHM}_z$
  - RESEL count $R$
    - $R = \lambda(\Omega)\, |\Lambda|^{1/2} / (4 \log 2)^{3/2} = \lambda(\Omega) / (\mathrm{FWHM}_x \times \mathrm{FWHM}_y \times \mathrm{FWHM}_z)$
    - Volume of search region in units of smoothness
    - E.g., 10 voxels at 2.5-voxel FWHM = 4 RESELs
- Beware RESEL misinterpretation
  - RESELs are not the "number of independent things" in the image
    - See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
  - Do not Bonferroni correct based on resel count.
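A small sketch of the resel arithmetic, together with the RPV-to-FWHM conversion from the preceding slide. It assumes volume and FWHM are expressed in the same units (voxels here); the function names are illustrative.

```python
import math


def resel_count(n_voxels, fwhm_voxels):
    """Search volume divided by the product of the per-axis FWHMs (in voxels)."""
    return n_voxels / math.prod(fwhm_voxels)


def fwhm_from_rpv(rpv, dim):
    """Local smoothness from resels-per-voxel: FWHM = RPV ** (-1 / D)."""
    return rpv ** (-1.0 / dim)


# The slide's example: 10 voxels with a 2.5-voxel FWHM (one dimension).
print(resel_count(10, (2.5,)))  # 4.0

# 0.4 resels per voxel in 1D corresponds to a 2.5-voxel FWHM.
print(fwhm_from_rpv(0.4, 1))
```

As the slide warns, the resel count is a unit of volume for the random field formulas, not a count of independent tests to plug into a Bonferroni correction.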
24 Random Field Theory: Cluster Size Tests
- Expected cluster size
  - $E(S) = E(N) / E(L)$
  - $S$: cluster size
  - $N$: suprathreshold volume, $\lambda(\{T > u_{clus}\})$
  - $L$: number of clusters
- $E(N) = \lambda(\Omega)\, P(T > u_{clus})$
- $E(L) \approx E(\chi_u)$
  - Assuming no holes
25 Random Field Theory: Cluster Size Corrected P-Values
- Previous results give uncorrected P-value
- Corrected P-value
  - Bonferroni
    - Correct for expected number of clusters
    - Corrected $P^c = E(L)\, P_{uncorr}$
  - Poisson Clumping Heuristic (Adler, 1980)
    - Corrected $P^c = 1 - \exp(-E(L)\, P_{uncorr})$
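The two corrections can be compared directly. This is a sketch of the two formulas as written on the slide; capping the Bonferroni value at 1 is an added convenience, and the input values in the example are made up for illustration.

```python
import math


def corrected_p_bonferroni(expected_clusters, p_uncorr):
    """P^c = E(L) * P_uncorr, capped at 1 (the cap is an added convenience)."""
    return min(1.0, expected_clusters * p_uncorr)


def corrected_p_poisson(expected_clusters, p_uncorr):
    """Poisson clumping heuristic: P^c = 1 - exp(-E(L) * P_uncorr)."""
    return 1.0 - math.exp(-expected_clusters * p_uncorr)


# Hypothetical inputs: 10 expected clusters, uncorrected cluster P of 0.001.
print(corrected_p_bonferroni(10, 0.001))
print(corrected_p_poisson(10, 0.001))
```

Because $1 - e^{-x} \le x$, the Poisson value never exceeds the Bonferroni value, and the two nearly coincide when $E(L)\,P_{uncorr}$ is small.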
26 Random Field Theory: Limitations
- Sufficient smoothness
  - FWHM smoothness 3-4 × voxel size (Z)
  - More like ~10 × for low-df T images
- Smoothness estimation
  - Estimate is biased when images not sufficiently smooth
- Multivariate normality
  - Virtually impossible to check
- Several layers of approximations
- Stationarity required for cluster size results
- Cluster size results tend to be too lenient in practice!
27 Real Data
- fMRI Study of Working Memory
  - 12 subjects, block design (Marshuetz et al., 2000)
  - Item Recognition
    - Active: view five letters, 2s pause, view probe letter, respond
    - Baseline: view XXXXX, 2s pause, view Y or N, respond
- Second Level RFX
  - Difference image, A-B, constructed for each subject
  - One-sample t test
28 Real Data: RFT Result
- Threshold
  - S = 110,776
  - 2 × 2 × 2 voxels, 5.1 × 5.8 × 6.9 mm FWHM smoothness
  - u = 9.870 (crit. T-value)
- Result
  - 5 voxels above the threshold
  - 0.0063 minimum FWE-corrected p-value
[Figure: map of -log10 p-value]
29 Controlling FWER with permutation tests (e.g., SnPM)
30 Permutation Test: Toy Example
- Data from V1 voxel in visual stim. experiment
  - A: Active, flashing checkerboard; B: Baseline, fixation
  - 6 blocks, ABABAB. Just consider block averages...
- Null hypothesis $H_0$
  - No experimental effect, A & B labels arbitrary
- Statistic
  - Mean difference
31 Permutation Test: Toy Example
- Under $H_0$
  - Consider all equivalent relabelings
32 Permutation Test: Toy Example
- Under $H_0$
  - Consider all equivalent relabelings
  - Compute all possible statistic values
33 Permutation Test: Toy Example
- Under $H_0$
  - Consider all equivalent relabelings
  - Compute all possible statistic values
  - Find 95%ile of permutation distribution
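The toy example can be worked through in a few lines. The block averages below are hypothetical values chosen for illustration (the slides' actual data are not reproduced in this text); the design follows the slide: 6 blocks labeled ABABAB, with the mean difference A minus B as the statistic.

```python
from itertools import combinations
from statistics import mean


def permutation_test(values, active_idx):
    """Exact permutation test for the mean difference (active minus baseline).

    Every way of choosing which blocks are labeled "active" is treated as
    equally likely under the null hypothesis.
    """
    n = len(values)

    def mean_diff(idx):
        active = [values[i] for i in idx]
        baseline = [values[i] for i in range(n) if i not in idx]
        return mean(active) - mean(baseline)

    observed = mean_diff(tuple(active_idx))
    perm_dist = sorted(
        mean_diff(idx) for idx in combinations(range(n), len(active_idx))
    )
    # One-sided p-value: fraction of relabelings at least as extreme as observed.
    p_value = sum(d >= observed for d in perm_dist) / len(perm_dist)
    return observed, perm_dist, p_value


# Hypothetical block averages in ABABAB order (A at positions 0, 2, 4).
block_means = [103.0, 90.5, 99.9, 87.8, 99.8, 96.1]
observed, perm_dist, p_value = permutation_test(block_means, (0, 2, 4))
print(len(perm_dist), p_value)  # 20 0.05
```

With three A and three B blocks there are only 20 relabelings, so 0.05 is the smallest p-value this design can produce; here the observed labeling gives the largest mean difference of all 20.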
36 Permutation Test: Strengths
- Requires only assumption of exchangeability
  - Under $H_0$, distribution unperturbed by permutation
  - Allows us to build permutation distribution
- Subjects are exchangeable
  - Under $H_0$, each subject's A/B labels can be flipped
  - Permutation tests thus useful for 2nd level analysis!
- fMRI scans not exchangeable under $H_0$
  - Due to temporal autocorrelation
  - Don't try permutation on individual subjects (but see, e.g., Bullmore for work on wavelet-based permutation)
  - BUT if experimental conditions are randomized, experimental labels are exchangeable within a subject
37 One-sample t-test with SnPM
- Under $H_0$, activation is symmetrically distributed around zero.
- Permute signs on individual subjects' contrast values to generate the $H_0$ t-distribution
- Apply the sign for each subject to whole maps to preserve spatial correlation patterns
- For each permutation, compute pseudo-t at each voxel, and save the max.
- The 95th percentile of the max pseudo-t distribution is the FWER-corrected threshold for p < .05 (one-tailed).
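The steps above can be sketched on a tiny synthetic data set. This is an illustration under stated assumptions, not SnPM's implementation: it uses the ordinary one-sample t statistic instead of SnPM's pseudo-t (which smooths the variance), enumerates all sign flips exhaustively, and runs on made-up data (8 subjects, 5 voxels, an effect at voxel 0).

```python
import math
import random
from itertools import product
from statistics import mean, stdev


def t_map(contrasts):
    """One-sample t statistic per voxel; contrasts[s][v] is subject s, voxel v."""
    n = len(contrasts)
    return [
        mean(voxel) / (stdev(voxel) / math.sqrt(n))
        for voxel in zip(*contrasts)
    ]


def sign_flip_max_t(contrasts, alpha=0.05):
    """Max-statistic sign-flipping test with FWER-corrected p-values."""
    observed = t_map(contrasts)
    max_dist = []
    for signs in product((1, -1), repeat=len(contrasts)):
        # One sign per subject, applied to that subject's whole map.
        flipped = [[s * x for x in row] for s, row in zip(signs, contrasts)]
        max_dist.append(max(t_map(flipped)))
    max_dist.sort()
    threshold = max_dist[math.ceil((1.0 - alpha) * len(max_dist)) - 1]
    p_corr = [sum(m >= t for m in max_dist) / len(max_dist) for t in observed]
    return observed, threshold, p_corr


# Hypothetical data: 8 subjects, 5 voxels, true effect only at voxel 0.
rng = random.Random(1)
contrasts = [
    [rng.gauss(3.0 if v == 0 else 0.0, 1.0) for v in range(5)]
    for _ in range(8)
]
observed, threshold, p_corr = sign_flip_max_t(contrasts)
print(threshold)
print(p_corr)
```

Flipping a whole subject map at once keeps each subject's spatial correlation intact, so the distribution of the maximum reflects the smoothness of the data without it being estimated. For 12 subjects the same loop would visit $2^{12}$ = 4,096 sign patterns.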
38 Permutation Test: Example
- Permute!
  - $2^{12}$ = 4,096 ways to flip 12 A/B labels
  - For each, note maximum of t image
39 $u_{RF} = 9.87$, $u_{Bonf} = 9.80$: 5 sig. vox.; $u_{Perm} = 7.67$: 58 sig. vox.
[Figure panels: "$t_{11}$ Statistic, RF & Bonf. Threshold" and "$t_{11}$ Statistic, Nonparametric Threshold"]
40 Does this Generalize? RFT vs Bonf. vs Perm.
41 RFT vs Bonf. vs Perm.
42 Familywise Error Thresholds
- RF & Perm adapt to smoothness
- Perm & Truth close
- Bonferroni close to truth for low smoothness
[Figure panels: 19 df and 9 df]
43 Performance Summary
- Bonferroni
  - Not adaptive to smoothness
  - Not so conservative for low smoothness
- Random Field
  - Adaptive
  - Conservative for low smoothness & df
- Permutation
  - Adaptive (Exact)
44 Understanding Performance Differences
- RFT Troubles
  - Multivariate Normality assumption
    - True by simulation
  - Smoothness estimation
    - Not much impact
  - Smoothness
    - You need lots, more at low df
  - High threshold assumption
    - Doesn't improve for $\alpha_0$ less than 0.05 (not shown)
45 Controlling the False Discovery Rate
47 False Discovery Rate
- For any threshold, all voxels can be cross-classified
- False Discovery Proportion
  - FDP $= V_{0R} / (V_{1R} + V_{0R}) = V_{0R} / N_R$
  - If $N_R = 0$, FDP $= 0$
- But only can observe $N_R$, don't know $V_{1R}$ & $V_{0R}$
  - We control the expected FDP
  - FDR $= E(\text{FDP})$
48 False Discovery Rate: Illustration
[Figure panels: Noise, Signal, Signal + Noise]
49 Control of Per Comparison Rate at 10%
- Percentage of null pixels that are false positives
Control of Familywise Error Rate at 10%
- Occurrence of familywise error (FWE)
Control of False Discovery Rate at 10%
- Percentage of activated pixels that are false positives
50 Benjamini & Hochberg Procedure
JRSS-B (1995) 57:289-300
- Select desired limit $q$ on FDR (e.g., 0.05)
- Order p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(V)}$
- Let $r$ be the largest $i$ such that $p_{(i)} \le (i/V) \times q / c(V)$
- Reject all hypotheses corresponding to $p_{(1)}, \ldots, p_{(r)}$.
- Ignore $c(V)$ for a moment: if $c(V) = 1$ and there is no signal, only 5% of the time do we expect the lowest p-value to be lower than the boundary
[Figure: ordered p-values $p_{(i)}$ plotted against $i/V$, both axes from 0 to 1, with the boundary line $(i/V) \times q / c(V)$]
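The procedure can be sketched as a short function. The p-values in the example are made up for illustration, and the parameter name `c_v` (standing for $c(V)$, defaulting to 1) is a naming choice for this sketch.

```python
def benjamini_hochberg(p_values, q=0.05, c_v=1.0):
    """Benjamini-Hochberg step-up procedure.

    Returns (threshold, rejected): the largest rejected p-value (None if
    nothing is rejected) and the sorted indices of rejected hypotheses.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    r = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value is under the boundary.
        if p_values[idx] <= (rank / n) * q / c_v:
            r = rank
    rejected = sorted(order[:r])
    threshold = p_values[order[r - 1]] if r > 0 else None
    return threshold, rejected


# Hypothetical p-values for ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))  # (0.008, [0, 1])
```

Note the step-up behavior: the loop keeps the largest qualifying rank, so a p-value above its own boundary is still rejected if a larger p-value later in the ordering falls below the boundary.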
51 Adaptiveness of Benjamini & Hochberg FDR
- Consider an increasing number of very significant ($p \approx 0$) voxels...
- Threshold varies between Bonferroni & uncorrected!
  - P-value threshold when no signal: $p_{(1)} \le (1/V)\,\alpha$
  - P-value threshold when all signal: $p_{(V)} \le (V/V)\,\alpha$
[Figure: ordered p-values $p_{(i)}$ against fractional index $i/V$; P-values when no signal (unif. dist.)]
52 Benjamini & Hochberg: Procedure Details
- $c(V) = 1$
  - Positive Regression Dependency on Subsets
    - $P(X_1 \ge c_1, X_2 \ge c_2, \ldots, X_k \ge c_k \mid X_i = x_i)$ is non-decreasing in $x_i$
    - Only required of test statistics for which the null is true
  - Special cases include
    - Independence
    - Multivariate Normal with all positive correlations
    - Same, but studentized with common std. err.
- $c(V) = \sum_{i=1,\ldots,V} 1/i \approx \log(V) + 0.5772$
  - Arbitrary covariance structure
Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
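A quick numerical check of the constant for arbitrary covariance, comparing the exact harmonic sum with the logarithmic approximation on the slide. Function names are illustrative.

```python
import math


def c_v_arbitrary(n_tests):
    """c(V) = sum of 1/i for i = 1..V (arbitrary covariance structure)."""
    return sum(1.0 / i for i in range(1, n_tests + 1))


def c_v_approx(n_tests):
    """Approximation from the slide: log(V) + 0.5772."""
    return math.log(n_tests) + 0.5772


for v in (10, 1000, 100_000):
    print(v, c_v_arbitrary(v), c_v_approx(v))
```

Since the boundary is divided by $c(V)$, the arbitrary-covariance version is stricter by a factor that grows like $\log V$; for 100,000 voxels that factor is about 12.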
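53 [Figure: ordered p-values $p_{(i)}$ plotted against $i/V$, both axes from 0 to 1, with the boundary line $(i/V) \times q / c(V)$]
- If threshold at lowest p-value: one voxel declared sig. ($m_0 = 1$)
  - $P_{crit} = q \times 1/V$, e.g. 0.05/V: Bonferroni threshold
  - E(false pos) = .05 × 1 = 5% of activated vox (1)
- If threshold at largest p-value: V voxels declared sig. ($m_0 = V$)
  - $P_{crit} = q \times V/V$, e.g. 0.05: uncorrected threshold
  - E(false pos) = .05 × V = 5% of activated vox (V)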
54 Benjamini & Hochberg: Key Properties
- FDR is controlled: $E(\text{FDP}) \le q\, m_0 / V$
  - Conservative, if large fraction of nulls false
- Adaptive
  - Threshold depends on amount of signal
  - More signal, more small p-values, more $p_{(i)}$ less than $(i/V) \times q / c(V)$
55-61 Controlling FDR: Varying Signal Extent
- Red: truly activated area
- Take-home: hard to find small areas with FDR, even if they have small p-values.
[Figure series, panels 1-7, with the threshold reported for each panel]
- Panels 1-3: p and z values not given
- Panel 4: p = 0.000252, z = 3.48
- Panel 5: p = 0.001628, z = 2.94
- Panel 6: p = 0.007157, z = 2.45
- Panel 7: p = 0.019274, z = 2.07
62 Real Data: FDR Example
- Threshold
  - Indep/PosDep: u = 3.83 (T-threshold)
  - Arb Cov: u = 13.15 (T-threshold)
- Result
  - 3,073 voxels above Indep/PosDep u
  - < 0.0001 minimum FDR-corrected p-value
[Figure: FDR threshold = 3.83, 3,073 voxels]
63 Conclusions
- Must account for multiplicity
  - Otherwise have a fishing expedition
- FWER
  - Very specific, not very sensitive
- FDR
  - Less specific, more sensitive
  - Adaptive
    - Good: power increases w/ amount of signal
    - Bad: number of false positives increases too
64 References
- Most of this talk is covered in these papers:
- TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.
- TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
- CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.
65 End of this section.
66 RFT Details: Unified Formula
- General form for expected Euler characteristic
  - $\chi^2$, $F$, & $t$ fields; restricted search regions; $D$ dimensions
  - $E[\chi_u(\Omega)] = \sum_d R_d(\Omega)\, \rho_d(u)$
- $R_d(\Omega)$: $d$-dimensional Minkowski functional of $\Omega$; function of dimension, space $\Omega$ and smoothness
  - $R_0(\Omega) = \chi(\Omega)$, Euler characteristic of $\Omega$
  - $R_1(\Omega)$ = resel diameter
  - $R_2(\Omega)$ = resel surface area
  - $R_3(\Omega)$ = resel volume
- $\rho_d(u)$: $d$-dimensional EC density of $Z(x)$; function of dimension and threshold, specific for RF type
  - E.g. Gaussian RF:
  - $\rho_0(u) = 1 - \Phi(u)$
  - $\rho_1(u) = (4 \ln 2)^{1/2} \exp(-u^2/2) / (2\pi)$
  - $\rho_2(u) = (4 \ln 2)\, u \exp(-u^2/2) / (2\pi)^{3/2}$
  - $\rho_3(u) = (4 \ln 2)^{3/2} (u^2 - 1) \exp(-u^2/2) / (2\pi)^2$
  - $\rho_4(u) = (4 \ln 2)^2 (u^3 - 3u) \exp(-u^2/2) / (2\pi)^{5/2}$
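The unified formula can be evaluated directly for a Gaussian field. The sketch below covers dimensions 0 to 3; the resel counts in the example are made up, and $\rho_2$ is coded with a factor $u$, following the standard Gaussian-field result. Function names are illustrative.

```python
import math
from statistics import NormalDist

FOUR_LN2 = 4.0 * math.log(2.0)


def ec_density_gaussian(d, u):
    """d-dimensional EC density of a Gaussian random field at threshold u."""
    gauss = math.exp(-u * u / 2.0)
    if d == 0:
        return 1.0 - NormalDist().cdf(u)
    if d == 1:
        return FOUR_LN2 ** 0.5 * gauss / (2.0 * math.pi)
    if d == 2:
        return FOUR_LN2 * u * gauss / (2.0 * math.pi) ** 1.5
    if d == 3:
        return FOUR_LN2 ** 1.5 * (u * u - 1.0) * gauss / (2.0 * math.pi) ** 2
    raise ValueError("only dimensions 0-3 are implemented in this sketch")


def expected_ec(resel_counts, u):
    """Sum over d of R_d * rho_d(u); resel_counts = (R_0, R_1, R_2, R_3)."""
    return sum(r * ec_density_gaussian(d, u) for d, r in enumerate(resel_counts))


# Hypothetical resel counts (R_0, R_1, R_2, R_3) for a compact search region.
print(expected_ec((1.0, 30.0, 300.0, 1000.0), 4.5))
```

With only $R_0 = 1$ the formula reduces to the uncorrected one-sided P-value, and with only $R_3$ it reduces to the 3D volume term; the lower-dimensional terms matter most for small or irregular search regions.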
67 Random Field Theory: Cluster Size Distribution
- Gaussian Random Fields (Nosko, 1969)
  - $D$: dimension of RF
- t Random Fields (Cao, 1999)
  - $B$: Beta dist'n
  - $U$'s: $\chi^2$'s
  - $c$ chosen s.t. $E(S) = E(N) / E(L)$
68 RFT Details: Expected Euler Characteristic
- $E(\chi_u) \approx \lambda(\Omega)\, |\Lambda|^{1/2} (u^2 - 1) \exp(-u^2/2) / (2\pi)^2$
  - $\Omega$: search region, $\Omega \subset \mathbb{R}^3$
  - $\lambda(\Omega)$: volume
  - $|\Lambda|^{1/2}$: roughness
- Diff. expressions for N, t, chi-square, F fields
- Assumptions
  - Multivariate Normal
  - Stationary*
  - Sample approximates continuous RF
- *Stationary
  - Results valid w/out stationarity
  - More accurate when stat. holds
- Note: $u$ is the threshold exceeded by individual values at rate alpha
69 Random Field Theory: Smoothness Parameterization
- $E(\chi_u)$ depends on $|\Lambda|^{1/2}$
  - $\Lambda$: roughness matrix
- Smoothness parameterized as Full Width at Half Maximum
  - FWHM of Gaussian kernel needed to smooth a white noise random field to roughness $\Lambda$
70 FWE Multiple comparisons terminology
- Family of hypotheses
  - $H_k$, $k \in \Omega = \{1, \ldots, K\}$
  - $H_\Omega = \bigcap_k H_k$, i.e., all $H_k$ true
  - $H_\Omega$ true: no $H_k$ is false
- Familywise Type I error
  - Weak control: omnibus test
    - $\Pr(\text{reject } H_\Omega \mid H_\Omega \text{ true}) \le \alpha$
    - If reject, can only say 1 or more $H_k$ is false: "anything, anywhere?"
  - Strong control: localising test
    - $\Pr(\text{reject } H_W \mid H_W) \le \alpha$
    - $\forall\, W: W \subseteq \Omega$ & $H_W$
    - Pr. of rejecting any true null $H$ is less than alpha
    - If reject, know "anything, & where?"
- Adjusted p-values
  - Test level at which reject $H_k$
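71 Voxel-level test
- Threshold $u_\alpha$ (critical t-value)
  - $t_k > u_\alpha \Rightarrow$ reject $H_k$
  - Reject any $H_k \Rightarrow$ reject $H_\Omega$
  - Reject $H_\Omega$ if $t^{\max}_\Omega > u_\alpha$
- Valid test
  - Weak control
    - $\Pr(T^{\max}_\Omega > u_\alpha \mid H_\Omega) \le \alpha$
  - Strong control
    - Since $W \subseteq \Omega$,
    - $\Pr(T^{\max}_W > u_\alpha \mid H_W) \le \alpha$
- Adjusted p-values
  - $\Pr(T^{\max}_\Omega > t_k \mid H_\Omega)$
[Figure: null distribution with threshold $u_\alpha$; labels p = 0.05, p = 0.0001, p = 0.0000001]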