Title: Searching for structure in random field data
1. Searching for structure in random field data
- Keith J. Worsley (1,2), Thomas W. Yee (3), Russell B. Millar (3)
- (1) Department of Mathematics and Statistics, McGill University; (2) McConnell Brain Imaging Centre, Montreal Neurological Institute, Montreal, Canada; (3) Department of Statistics, University of Auckland, New Zealand
- www.math.mcgill.ca/keith
2. What is Data Mining?
- The June 26, 2000, issue of TIME predicted that one of the 10 hottest jobs of the 21st century will be Data Mining: "research gurus will be on hand to extract useful tidbits from mountains of data, pinpointing behaviour patterns for marketers and epidemiologists alike."
3. Some definitions
- Data mining is the process of selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns for business advantage (SAS 1998 Annual Report, p. 51).
- Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad).
- Data mining is the process of discovering advantageous patterns in data (John).
- Data mining is the computer-automated exploratory data analysis of (usually) large complex data sets (Friedman, 1998).
- Data mining is the search for valuable information in large volumes of data (Weiss and Indurkhya, 1998).
- In contrast, Statistics is the science of collecting, organizing and presenting data.
4. Why is it called Data Mining?
- Plentiful data can be mined for nuggets of gold (i.e. truth/insight/knowledge) by sifting through vast amounts of raw data.
- Some statisticians have criticized it as "data dredging" or a "fishing expedition" in search of publishable P-values, or "torturing the data until it confesses".
- Many DM methods are heuristic, complex and computer-intensive, so their statistical properties are usually not tractable.
- The focus of DM is often prediction rather than statistical inference.
- "I understand mining to be a very carefully planned search for valuables hidden out of sight, not a haphazard ramble. Mining is thus rewarding, but, of course, a dangerous activity." (D.R. Cox, in the discussion of Chatfield, 1995)
5. Striking fool's gold
- The Bible Code, a best-selling book by Michael Drosnin, claims to find hidden messages in the Bible about dinosaurs, Bill Clinton, the Rabin assassination, etc., from searches of arrays of letters.
- In 1992, ProCyte Corp. was dismayed when a newly developed drug, Iamin, failed to promote general healing of diabetic ulcer wounds. So the company searched through subsets of the data and found that Iamin appeared to work on certain foot wounds. But that was a statistical fluke, as it turned out after an expensive clinical trial. Denied drug status, Iamin is now sold as a wound dressing.
6. Confirming vs. Discovering
- There are two types of DM:
  - Hypothesis testing (aka the top-down approach)
  - Knowledge Discovery in Databases (KDD) (aka the bottom-up approach)
- Directed KDD seeks to explain the value of some particular variable in terms of other variables.
- Undirected KDD identifies patterns in the data.
- Undirected KDD recognizes relationships in data; directed KDD explains those relationships once they have been found.
7. Mining the miners
- DM so far has been largely a commercial enterprise. As in most gold rushes of the past, the goal is to mine the miners: the largest profits are made by selling tools to the miners, rather than by doing the actual mining.
- Hardware manufacturers emphasize the high computational requirements of DM.
- Software developers emphasize competitive edge: "Your competitor is doing it, so you had better keep up."
8. Some commercial software
- SAS Enterprise Miner
- SPSS Clementine, Neural Connection and AnswerTree
- IBM Intelligent Miner
- SGI MineSet
- NeoVista Software ASIC
- Mathsoft S-PLUS (for small data sets)
9. Some methods
- Hypothesis testing: regression, analysis of variance, time series analysis.
- Directed KDD: classification, discrimination, structural equation modeling, supervised neural networks.
- Undirected KDD: cluster analysis, tree methods (AID, CHAID, CART), principal components analysis (PCA), independent components analysis (ICA), unsupervised neural networks.
10. Allied fields
- Exploratory Data Analysis (EDA): Tukey defined statistics in terms of problems rather than tools.
- Informatics is research on, development of, and use of technological, sociological, and organizational tools and applications for the dynamic acquisition, indexing, dissemination, storage, querying, retrieval, visualization, integration, analysis, synthesis, sharing (which includes electronic means of collaboration), and publication of data such that economic and other benefits may be derived from the information by users of all sections of society.
- Pattern recognition: given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples, e.g. identify plants or tumors, or decide to buy or sell stocks.
- Machine learning is the study of computer algorithms that improve automatically through experience. Applications range from data mining programs that discover rules in large data sets, to information filtering systems that automatically learn users' interests (Mitchell, 1997).
- Meta-analysis is the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.
11. Brain mapping data
- We have huge databases of brain images (MRI, fMRI, PET, EEG, MEG, ...) together with patient information (age, sex, psychological tests, disease, genotype, ...).
- The novelty is that the image variables are 3D images rather than single numbers (such as blood pressure, cholesterol level, ...).
- These images can themselves be mined for interesting information, e.g. peaks or clusters of activated regions.
12. Some data mining tools already used in brain mapping
- Regression, analysis of variance, time series
- Cluster analysis (e.g. clustering of fMRI time courses)
- PCA and ICA of the voxels x scans matrix
- Structural equation modeling to analyze connectivity
- Pattern recognition to segment gray/white/CSF
- Meta-analysis to combine locations of activation from different studies
13. Tree methods: Automatic Interaction Detection (AID)
- Morgan, J.N. and Sonquist, J.A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58, 415-434.
- Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119-127.
- Worsley, K.J. (1978). Significance testing in Automatic Interaction Detection (AID). PhD thesis, University of Auckland.
14. How AID works
- Split the observations into two groups according to the values of a predictor.
- Two types of predictors:
  - Monotonic: split by thresholding (predictor <= x vs. predictor > x)
  - Free: split into any two subsets, e.g. if the predictor takes values x1, ..., x7: {x1, x5, x6} vs. {x2, x3, x4, x7}
- Choose the split that maximizes a test statistic for the difference in the dependent or target variable.
- Repeat on the two subgroups until some stopping criterion is reached (the split is not significant, or the subgroup size is too small); see the sketch below.
15. SPSS example: credit risk data
[Screenshot of the credit-risk data table: a dependent (target) variable plus predictors, each marked M or F.]
- M = monotonic (split by thresholding), F = free (split into any two subsets)
16. (No transcript)
17. (No transcript)
18. (No transcript)
19. Brain mapping example: cortical thickness
Dependent or target: Sex. Predictors (all monotonic, M): Node1, ..., Node40962.

Subject  Node1  Node2  Node3  Node4  ...  Node40962  Sex
1        3.73   3.05   3.93   2.30   ...  1.59       m
2        2.95   1.17   3.33   2.75   ...  1.03       f
3        2.30   1.23   2.56   1.20   ...  1.46       f
4        2.64   2.19   2.57   2.25   ...  1.29       m
5        2.39   2.76   2.51   2.82   ...  1.02       f
6        3.26   1.85   3.31   1.70   ...  1.65       f
7        2.68   2.52   3.23   2.30   ...  1.47       m
8        3.60   3.66   2.90   2.25   ...  1.79       m
9        3.27   1.43   2.88   1.81   ...  2.14       f
...
321      4.10   2.67   2.83   1.78   ...  1.70       f
20. Misclassification matrix: cortical thickness

                      Actual category
                      Male    Female
Predicted   Male      145     18
category    Female    18      140
21. (No transcript)
22. fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, ...
T = (hot - warm effect) / s.d., distributed as t with 110 df if there is no effect.
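A rough sketch of where this statistic comes from (our simplified design with only hot/warm indicator regressors; the real analysis also removes drift and other nuisance terms, which is how it arrives at 110 df):

```python
# Per-voxel t statistic for the (hot - warm) effect: regress each voxel's
# time course on hot/warm indicators.  With only these three columns the
# df is scans - 3; the slide's 110 df reflects extra nuisance terms.
import numpy as np

def hot_warm_t(Y, stim):
    """Y: scans x voxels array; stim: sequence of 'hot'/'warm'/'rest' labels."""
    stim = np.asarray(stim)
    X = np.column_stack([np.ones(stim.size),          # intercept
                         stim == "hot",
                         stim == "warm"]).astype(float)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]       # least-squares fit
    df = X.shape[0] - X.shape[1]
    sigma2 = ((Y - X @ beta) ** 2).sum(axis=0) / df   # residual variance
    c = np.array([0.0, 1.0, -1.0])                    # hot - warm contrast
    var_c = c @ np.linalg.inv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(sigma2 * var_c)       # ~ t_df if no effect
```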
23. Brain mapping example: fMRI
Dependent or target: Stimulus. Predictors (all monotonic, M): Voxel1, ..., Voxel30786.

Frame  Voxel1  Voxel2  Voxel3  Voxel4  ...  Voxel30786  Stimulus
1       1.10    1.66    1.53    0.77   ...  -0.12       hot
2      -0.59    0.23    0.38   -0.43   ...  -1.73       hot
3       1.06    1.57    1.56    1.14   ...   0.64       hot
4       1.63    1.79    0.88   -0.22   ...  -0.07       hot
5       2.30    1.96    1.41    1.33   ...   1.76       hot
6       1.27    1.36    0.73    0.24   ...   1.22       warm
7       1.18    1.33    1.35    1.30   ...   0.88       warm
8       0.98    0.90    0.47    0.18   ...   0.60       warm
9       1.46    1.25    0.77    0.73   ...   1.30       warm
10      0.07    0.70    1.29    1.96   ...   2.04       warm
11      0.39    0.68    1.13    1.81   ...   1.80       warm
12      0.04   -0.04   -0.18    0.37   ...   1.63       hot
13     -0.06    0.20    0.29    0.49   ...   0.70       hot
14     -0.48   -0.26   -0.19   -0.16   ...  -0.42       hot
15     -0.09   -0.39   -0.84   -0.94   ...  -0.68       hot
16     -0.24    0.02    0.51    1.20   ...   1.38       hot
17     -1.52   -1.11   -1.44   -1.88   ...  -1.11       hot
18     -0.07    0.10   -0.07   -0.24   ...   0.17       warm
19     -1.40   -0.57    0.01    0.30   ...   0.41       warm
...
117    -0.01    0.50    0.74    0.83   ...   0.99       warm
24. Misclassification matrix: fMRI

                      Actual category
                      Hot     Warm
Predicted   Hot       51      1
category    Warm      7       58
25. Splitting the SPM itself
Dependent or target: T statistic. Predictors: x, y, z (type ?).

Voxel  x       y         z        T statistic
1      1.1719  -10.5469  7.2921   5.4852
2      3.5156  -10.5469  7.2921   5.9170
3      5.8594  -10.5469  7.2921   5.0115
4      1.1719  -8.2031   7.2921   6.1082
5      3.5156  -8.2031   7.2921   6.4825
6      5.8594  -8.2031   7.2921   5.7299
7      1.1719  -5.8594   7.2921   6.7113
8      3.5156  -5.8594   7.2921   7.3540
9      5.8594  -5.8594   7.2921   6.5934
10     1.1719  -10.5469  14.2921  5.4519
11     3.5156  -10.5469  14.2921  6.3674
12     5.8594  -10.5469  14.2921  6.3184
13     1.1719  -8.2031   14.2921  6.2774
14     3.5156  -8.2031   14.2921  6.5888
15     5.8594  -8.2031   14.2921  6.2456
16     1.1719  -5.8594   14.2921  6.3583
17     3.5156  -5.8594   14.2921  6.4093
18     5.8594  -5.8594   14.2921  5.8665
26. How do we split on a spatial predictor?
Splits can be regarded as models with different means for the two groups.
[Figure: the SPM and the fitted model for a monotonic predictor and for a free predictor; for a spatial predictor, the free split shown on the unsmoothed and the smoothed SPM. Smooth the SPM with a filter that matches the model.]
27. So...
- Treating spatial location as a free predictor (for the smoothed SPM) is equivalent to simply thresholding the smoothed SPM.
- We can choose the threshold to control the false splitting rate to P < 0.05 using Bonferroni corrections or random field theory; see the sketch below.
- If the model width is unknown, we can make the filter width another parameter of the model, which leads to scale space.
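A sketch of the two corrected thresholds, assuming a Gaussian SPM (hedging: the random field part below keeps only the 3-D term of the expected Euler characteristic of a smooth Gaussian field; a full analysis adds the lower-dimensional boundary terms):

```python
# Corrected thresholds for a Gaussian SPM: Bonferroni over the voxels, and
# the random-field-theory threshold solving E[EC] = alpha using only the
# 3-D EC density (4 ln 2)^(3/2) / (2 pi)^2 * (t^2 - 1) * exp(-t^2 / 2).
import numpy as np
from scipy import stats, optimize

def bonferroni_threshold(n_voxels, alpha=0.05):
    return stats.norm.isf(alpha / n_voxels)

def rft_threshold(volume, fwhm, alpha=0.05):
    resels = volume / fwhm ** 3                  # resels = volume / FWHM^3
    def expected_ec(t):
        return (resels * (4 * np.log(2)) ** 1.5 / (2 * np.pi) ** 2
                * (t ** 2 - 1) * np.exp(-t ** 2 / 2))
    return optimize.brentq(lambda t: expected_ec(t) - alpha, 2.0, 10.0)

print(bonferroni_threshold(30786))               # about 4.7 for ~31,000 voxels
print(rft_threshold(volume=1.0e6, fwhm=20.0))    # about 4.1 for 125 resels
```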
28. (No transcript)
29. Scale space: smooth X(t) with a range of filter widths s; this continuous wavelet transform adds an extra dimension to the random field: X(t, s).
[Figure: scale-space images, S = FWHM (mm, on a log scale, 6.8-34 mm) vs. t (mm, -60 to 60). Top panel: scale space, no signal. Bottom panel: one 15 mm signal.]
A 15 mm signal is best detected with a 15 mm smoothing filter.
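A small toy version of this construction in 1-D (our numbers; filter widths are quoted as FWHM, with FWHM = sigma * sqrt(8 ln 2), and the rescaling keeps the smoothed noise at unit variance):

```python
# Build a toy scale-space field X(t, s): one 15 mm FWHM Gaussian signal in
# unit white noise, smoothed with a log-spaced range of Gaussian filters.
import numpy as np
from scipy.ndimage import gaussian_filter1d

c = np.sqrt(8 * np.log(2))                       # FWHM = c * sigma
t = np.arange(-60.0, 61.0)                       # mm, 1 mm grid
rng = np.random.default_rng(0)
signal = 4 * np.exp(-(t / (15 / c)) ** 2 / 2)    # 15 mm FWHM bump
x = signal + rng.standard_normal(t.size)         # signal + white noise

fwhms = np.geomspace(6.8, 34, 30)                # filter widths, log scale
field = np.empty((fwhms.size, t.size))
for i, w in enumerate(fwhms):
    sigma = w / c
    smooth = gaussian_filter1d(x, sigma, mode="nearest")
    # rescale so smoothed unit white noise keeps unit variance
    field[i] = smooth * np.sqrt(2 * sigma * np.sqrt(np.pi))

i, j = np.unravel_index(field.argmax(), field.shape)
print(f"peak at t = {t[j]:.0f} mm with FWHM = {fwhms[i]:.1f} mm")  # near 15 mm
```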
30. Matched Filter Theorem (= Gauss-Markov Theorem): to best detect a signal + white noise, the filter should match the signal.
[Figure: scale-space images, S = FWHM (mm, on a log scale, 6.8-34 mm) vs. t (mm, -60 to 60). Top panel: 10 mm and 23 mm signals. Bottom panel: two 10 mm signals 20 mm apart.]
But if the signals are too close together, they are detected as a single signal half-way between them.
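A quick numerical check of the matched-filter claim in the continuous white-noise case (the closed forms below for the smoothed peak height and smoothed noise s.d. are standard Gaussian-convolution identities):

```python
# Peak SNR of a unit Gaussian signal (width ss) in unit white noise after
# Gaussian smoothing (width sf): peak height ss / sqrt(ss^2 + sf^2), noise
# s.d. 1 / sqrt(2 * sf * sqrt(pi)).  The ratio is maximized at sf = ss.
import numpy as np

def peak_snr(signal_fwhm, filter_fwhm):
    c = np.sqrt(8 * np.log(2))                    # FWHM = c * sigma
    ss, sf = signal_fwhm / c, filter_fwhm / c
    return (ss / np.sqrt(ss**2 + sf**2)) * np.sqrt(2 * sf * np.sqrt(np.pi))

widths = np.geomspace(5, 60, 500)
best = widths[np.argmax(peak_snr(15.0, widths))]
print(f"best filter for a 15 mm signal: {best:.1f} mm FWHM")  # ~15 mm
```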
31. Scale space can even separate two signals at the same location!
[Figure: scale-space image of 8 mm and 150 mm signals at the same location; S = FWHM (mm, on a log scale, 6.8-170 mm) vs. t (mm, -60 to 60).]
32. FWHM = 6.8 mm
33. FWHM = 9 mm
34. FWHM = 11 mm
35. FWHM = 15 mm
36. FWHM = 20 mm
37. FWHM = 26 mm
38. FWHM = 34 mm
39. FWHM
40. FWHM
41. FWHM
42. FWHM
43. FWHM
44. FWHM
45. (No transcript)
46. FWHM
47. FWHM
48. Functional connectivity
- Measured by the correlation between the residuals at every pair of voxels (6D data!).
- Local maxima are larger than all 12 neighbours.
- The P-value can be calculated using random field theory.
- Good at detecting focal connectivity, but...
- PCA of the scans x voxels residuals is better at detecting large regions of co-correlated voxels; see the sketch below.
[Figure: scatterplots of the residuals at voxel 1 vs. voxel 2, under activation only and under correlation only.]
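A toy sketch of both measures on a hypothetical scans x voxels residual matrix (the dimensions and variable names are ours):

```python
# Two connectivity measures: the all-pairs correlation matrix of the
# residuals (a 2-D slice of the 6-D correlation field), and the first
# principal component of the residuals.
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((100, 500))           # residuals: scans x voxels

Rz = (R - R.mean(axis=0)) / R.std(axis=0)     # standardize each voxel
corr = Rz.T @ Rz / R.shape[0]                 # voxel-by-voxel correlations
# focal connectivity: local maxima of `corr` above an RFT threshold

U, S, Vt = np.linalg.svd(Rz, full_matrices=False)
first_pc = Vt[0]                              # voxel loadings on PC 1
# extended connectivity: large regions of co-correlated voxels load
# together on the first principal component
```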
49. Correlations > 0.7, P < 10^-10 (corrected)
First principal component > threshold
50. False Discovery Rate (FDR)
- Benjamini and Hochberg (1995), Journal of the Royal Statistical Society
- Benjamini and Yekutieli (2001), Annals of Statistics
- Genovese et al. (2001), NeuroImage
- FDR controls the expected proportion of false positives amongst the discoveries, whereas
- Bonferroni / random field theory controls the probability of any false positives, and
- no correction controls the proportion of false positives in the volume.
51. Signal + Gaussian white noise, thresholded three ways
[Figure: the true discoveries (signal) and false discoveries (noise) at each threshold.]
- P < 0.05 (uncorrected), T > 1.64: 5% of the volume is false positive
- FDR < 0.05, T > 2.82: 5% of the discoveries are false positives
- P < 0.05 (corrected), T > 4.22: 5% probability of any false positives
52. Comparison of thresholds
- FDR depends on the ordered P-values P(1) < P(2) < ... < P(n). To control the FDR at α = 0.05, find K = max{ i : P(i) < (i/n) α } and threshold the P-values at P(K); see the sketch below.

  Proportion of true:  1     0.1   0.01  0.001  0.0001
  Threshold T:         1.64  2.56  3.28  3.88   4.41

- Bonferroni thresholds the P-values at α/n.

  Number of voxels:    1     10    100   1000   10000
  Threshold T:         1.64  2.58  3.29  3.89   4.42

- Random field theory: resels = volume / FWHM^3.

  Number of resels:    0     1     10    100    1000
  Threshold T:         1.64  2.82  3.46  4.09   4.65
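A minimal sketch of the step-up rule next to the Bonferroni cut, on simulated one-sided P-values (the toy mixture of null and activated voxels is ours, not the data behind the table above):

```python
# Benjamini-Hochberg: sort the P-values and find the largest P(i) with
# P(i) <= (i/n) * alpha; reject everything below that cut-off.
import numpy as np
from scipy import stats

def bh_threshold(p, alpha=0.05):
    ps = np.sort(p)
    ok = ps <= np.arange(1, ps.size + 1) / ps.size * alpha
    return ps[ok].max() if ok.any() else 0.0

rng = np.random.default_rng(0)
t = np.concatenate([rng.standard_normal(9000),        # null voxels
                    rng.standard_normal(1000) + 4])   # 10% activated voxels
p = stats.norm.sf(t)                                   # one-sided P-values

for name, cut in [("FDR < 0.05", bh_threshold(p)),
                  ("Bonferroni", 0.05 / p.size)]:
    print(f"{name}: reject P < {cut:.2g}  (T > {stats.norm.isf(cut):.2f})")
```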
53. P < 0.05 (uncorrected), T > 1.64: 5% of the volume is false positive
54. FDR < 0.05, T > 2.66: 5% of the discoveries are false positives
55. P < 0.05 (corrected), T > 4.90: 5% probability of any false positives