Title: BEYOND BLOBS 1: R and Independent Component Analysis
1 BEYOND BLOBS 1: R and Independent Component Analysis
2 The Tyranny of Blobs
- Most fMRI analysis (a figure of 95% has been quoted) is so-called mass-univariate, pipeline analysis.
- This involves first attempting to remove motion artifact and then analyzing data one voxel at a time to produce a blob map: a thresholded map of significant brain responses to a particular task or contrast.
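As a rough illustration of "one voxel at a time", the sketch below runs the same test independently on each of three made-up voxel time series and thresholds the result. It is written in Python rather than R purely to keep it self-contained. The voxel names, the block timing, the pooled two-sample t statistic and the threshold of 3.0 are all assumptions for illustration; real packages fit a general linear model and correct for multiple comparisons.

```python
import math

n_scans = 40
# Hypothetical block design: 5 scans on, 5 scans off.
design = [1.0 if (t % 10) < 5 else 0.0 for t in range(n_scans)]

# Three made-up voxel time series (illustrative only).
voxels = {
    "active": [100.0 + 2.0 * d + 0.5 * math.sin(1.3 * t) for t, d in enumerate(design)],
    "inactive": [100.0 + 0.5 * math.cos(2.0 * t) for t in range(n_scans)],
    "drifting": [100.0 + 0.05 * t for t in range(n_scans)],
}

def two_sample_t(series, design):
    """Pooled-variance t statistic comparing 'on' scans with 'off' scans."""
    on = [y for y, d in zip(series, design) if d == 1.0]
    off = [y for y, d in zip(series, design) if d == 0.0]
    mean_on, mean_off = sum(on) / len(on), sum(off) / len(off)
    ss = sum((y - mean_on) ** 2 for y in on) + sum((y - mean_off) ** 2 for y in off)
    pooled_var = ss / (len(on) + len(off) - 2)
    return (mean_on - mean_off) / math.sqrt(pooled_var * (1 / len(on) + 1 / len(off)))

threshold = 3.0  # arbitrary illustrative cut-off, not a corrected p-value

# Mass-univariate step: the same test is applied to every voxel separately.
t_map = {name: two_sample_t(ts, design) for name, ts in voxels.items()}
# The "blob map" is simply the thresholded statistic map.
blob_map = {name: abs(t) > threshold for name, t in t_map.items()}

for name in voxels:
    print(name, round(t_map[name], 2), blob_map[name])
```

Note that each voxel is tested in isolation: nothing in the loop uses the fact that neighbouring voxels or distant network nodes may behave alike.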
3 Traditional fMRI analysis (more than 95% of all papers)
From a lecture by Jean-Baptiste Poline delivered in 2004.
4 Data transformations - SPM
[Figure: flow diagram of the SPM data transformations. Labelled elements: image time-series, realignment, normalisation (template), smoothing (kernel), general linear model (design matrix), parameter estimates, statistical parametric map (SPM), Gaussian field theory, statistical inference (p < 0.05).]
5 Data transformations - XBAM
[Figure: flow diagram of the XBAM data transformations. Labelled elements: image time-series, realignment, smoothing (kernel), general linear model (design matrix), parameter estimates, normalisation (template), permutation testing, statistical inference, GBAM, IBAM.]
6 Pros and cons of the mass univariate approach
- Pros
  - These methods have been around for some years, and the packages using them can often be highly automated, leading to a "black box" batch processing pipeline requiring little operator intervention.
  - They are also well suited to whole-brain analysis approaches where there is no absolutely clear idea of exactly where activations may occur.
7
- Cons
  - Do we want to encourage a "black box" mentality? Should not users of fMRI software understand what they are doing, at least to some level? How many users are able to choose from the options in SPM or any other widely used package in an informed way?
  - Also, treating each voxel as separate does not make use of the fact that we know that most brain functions involve networks of areas and that contiguous voxels show similarities in function.
8 Paradigm-dependence
- Another feature of the traditional analysis approaches is the presence of contrast-based GLM analysis. This requires a design matrix (contrast matrix) specifying the details of the experimental contrasts that we want to examine.
- There is now an increasing interest in examining the properties of BOLD signals in the absence of imposed paradigms, and the relationship between such responses in different brain regions (resting state connectivity).
9 Other approaches
- Despite the overwhelming market share of the mass univariate approach in packages such as SPM, there have always been other methods of analysis, often very sophisticated but also often difficult to understand for non-mathematicians.
- These are slowly attracting more attention, especially from people who want to know what lies behind the blobs.
10 The Beyond Blobs philosophy
- My aim in these talks is not to replace mass-univariate analysis but to encourage researchers in fMRI to take the initiative to learn more about what lies behind their blob maps, perhaps to make better use of their data.
- There are always time constraints and data analysis is already demanding, so this will not be for everyone, but if you can spare the time it may be worth it!!
11 Model-free analysis
- Over the last decade, a very large number of analysis methods have appeared that seek to analyse BOLD signals not by fitting parametric models but by searching for interesting features of the time series. These are often multivariate in nature (usually meaning that they consider data over many voxels).
- One such technique is Independent Component Analysis (ICA).
12 A brief and fairly non-technical description of ICA
13 First consider two separate time series
14 Then imagine mixing them together in different proportions to give two new time series
15
- This is a common situation that we might encounter in fMRI.
- Imagine that we have a real response to an experimental task, say a block design, but that we also have some motion artefact.
- In a particular region the time series that we extract might be a mixture of response plus motion.
- ICA tries to unmix linear additive mixtures of signals.
16 So, in our example, this is what we might expect to recover from the mixtures after applying ICA to the two mixed time series.
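The mixing and unmixing idea can be sketched in a few lines. The example below is in Python rather than R, purely to stay self-contained, and everything in it is an assumption made for illustration: the block design, the sawtooth standing in for a motion artefact, the mixing proportions, and the brute-force search over rotation angles that maximises non-gaussianity (absolute excess kurtosis). It is a toy version of the maximum non-gaussianity idea, not the algorithm used by any particular ICA package.

```python
import math

def mean(v):
    return sum(v) / len(v)

def center(v):
    m = mean(v)
    return [a - m for a in v]

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    u, v = center(u), center(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def excess_kurtosis(v):
    v = center(v)
    m2 = mean([a ** 2 for a in v])
    m4 = mean([a ** 4 for a in v])
    return m4 / m2 ** 2 - 3.0

# Two hypothetical source signals.
n = 260
task = [1.0 if (t % 20) < 10 else 0.0 for t in range(n)]  # block-design response
artefact = [(t % 13) / 12.0 for t in range(n)]            # sawtooth "motion" stand-in

# Mix them in different proportions to give two observed time series.
mix1 = [0.6 * a + 0.4 * b for a, b in zip(task, artefact)]
mix2 = [0.3 * a + 0.7 * b for a, b in zip(task, artefact)]

# Step 1: whiten the mixtures (zero mean, unit variance, uncorrelated).
x1, x2 = center(mix1), center(mix2)
var1 = mean([a * a for a in x1])
cov12 = mean([a * b for a, b in zip(x1, x2)])
z1 = [a / math.sqrt(var1) for a in x1]
resid = [b - (cov12 / var1) * a for a, b in zip(x1, x2)]
sd_resid = math.sqrt(mean([r * r for r in resid]))
z2 = [r / sd_resid for r in resid]

# Step 2: find the rotation whose outputs are as non-gaussian as possible.
def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(z1, z2)]
    y2 = [-s * a + c * b for a, b in zip(z1, z2)]
    return y1, y2

def non_gaussianity(theta):
    y1, y2 = rotate(theta)
    return abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))

angles = [math.radians(0.1 * k) for k in range(900)]  # 0 to 89.9 degrees
best = max(angles, key=non_gaussianity)
ic1, ic2 = rotate(best)

# Each source should match one recovered component, up to sign and scale.
match_task = max(abs(corr(task, ic1)), abs(corr(task, ic2)))
match_artefact = max(abs(corr(artefact, ic1)), abs(corr(artefact, ic2)))
print(round(match_task, 3), round(match_artefact, 3))
```

The recovered components come back in arbitrary order and with arbitrary sign and scale, which is why the comparison uses absolute correlations.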
17 Basic things to remember about ICA
- In its most common form as used in fMRI, it demixes spatially independent, linearly additive components.
- You need at least as many mixed time series as the number of components you wish to extract.
- There are different demixing algorithms (maximum non-gaussianity, minimum mutual information). An informed choice of which of these to use will require a more detailed understanding of how the programs work.
- The examples I have used come from "ICA for dummies": http://www.sccn.ucsd.edu/arno/indexica.html
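For readers who prefer symbols, the "linearly additive" assumption is usually written as the generic ICA model below. The notation is an assumption introduced here, not taken from the slides: x(t) holds the m observed mixtures, s(t) the n underlying sources, A is the unknown m-by-n mixing matrix, and W is the demixing matrix that the algorithm estimates.

```latex
\[
  \mathbf{x}(t) = A\,\mathbf{s}(t), \qquad A \in \mathbb{R}^{m \times n}, \quad m \ge n,
\]
\[
  \hat{\mathbf{s}}(t) = W\,\mathbf{x}(t).
\]
```

The condition m >= n is the requirement of at least as many mixtures as components. The sources can only be recovered up to their order, sign and scale.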
18 How to use ICA
- There are a number of different ICA packages available for use on fMRI data.
- FSL, the FMRIB (Oxford) analysis package, has ICA capabilities.
- The GIFT package written by Vince Calhoun and his colleagues is quite widely used for fMRI ICA.
- The AnalyzeFMRI package written for the R statistical programming language by J Marchini (Oxford) is good and will be employed for the rest of my talk.
19 R - statistical programming and fMRI analysis
- R is a free statistical programming environment and language written under the GNU project. (GNU stands for "GNU's Not Unix" and is a project, started in 1984, to develop a free UNIX-style operating system.)
- R is similar to the commercial S programming language/environment (now Splus) developed at Bell Laboratories by John Chambers and his colleagues.
- Many but not all S commands/programs will run under R.
20 Why R?
- R is a major part of my Beyond Blobs philosophy.
- The idea is to have a set of tools for sophisticated analysis of fMRI data to get behind the simple activation maps that underlie much published work in fMRI.
- R benefits from having a very large number of packages written and contributed by expert statisticians/image analysts. These are available on the CRAN (Comprehensive R Archive Network) website.
- Some of the packages available at the CRAN site permit genetic algorithm analysis, neural network analysis, wavelet analysis, complex econometric time series analysis, multivariate autoregression (connectivity), ICA, model-based clustering, etc.
21 Downloading R
- http://www.stats.bris.ac.uk/R/
- The package is available for
  - Linux
  - Macintosh systems (8.6-9.1, up to 10.1, 10.2 and above)
  - Windows (95 and later)
22 R interface - Macintosh version
23 Using R
- R is a command-driven interface, so it requires a little work to start achieving useful results.
- It can be used in interactive mode (where you type in each command and then get the results), or you can write scripts (or use scripts written by other people) to read data and perform a series of manipulations.
- There is a very useful short introduction to R on the CRAN website which goes through the basics to get you up and running. There are also a considerable number of books and internet guides to various aspects of R.
- Finally, I will be doing an R for fMRI course next year for the Stats department.
24 ICA analysis of fMRI using R. How to get started.
- Download R from the CRAN website for your machine. It is very easy to install.
- Then, get R running and go to the Packages (Windows) or Packages and Data (Mac) section of the top toolbar.
- Select Package Installer (Mac) or Install Packages (Windows).
- If you are connected to the web, this will take you direct to the CRAN packages website where you can view a list of packages.
- Select the mirror website (Windows); there are two UK mirrors and either will do. Then select AnalyzeFMRI to download and install it.
- Once installed, use the Package Manager (Mac) or Load Packages option (Windows) to get it loaded into R and active for use.
25 ICA analysis of fMRI
- Now we are ready to look at some data.
- AnalyzeFMRI uses 4D Analyze files, the standard format used by SPM etc.
- I have written a small XBAM script that will automatically generate the required file if necessary.
26 Getting the R ICA program running
- Once you have the 4D Analyze file (and its header) you are ready to go.
- In the directory where you plan to do the analysis you should have two files, one called xxxx.img and the other called xxxx.hdr (xxxx is the name of the file and should be the same in both cases).
27 Next steps
- Then get R running in Windows, Linux or MacOS and, in the command window, type: f.ica.fmri.gui()
- You will then see a pop-up window saying "Spatial ICA for fMRI data sets".
29
- Use the "Select File" box to go to the directory where your data are located, then click on the .img file.
- Tick the "variance normalise" and "create mask" boxes.
- Choose a name for saving some output images and put it in the "save to jpeg files" box, then hit the save button on that box.
- Click "start" to get the program running. The start box should go white and stay white until the job has run. You will get a lot of output in the command window, finishing with the word "done".
32
- You then have the option to save some images in jpg format using the "save to jpeg files" option. If you put a filename in this box (e.g. mick), the images will be saved as mick.comp.1.jpeg, mick.comp.2.jpeg, etc.
33 Output from the ICA analysis
- The ICA program creates an R object called tmp.ica.obj.
- This is a complex collection of different sorts of data that will now be discussed.
- The two most useful parts of tmp.ica.obj are the images of the independent components and the corresponding time series.
- The way to understand this output is to remember that ICA, the way that it is used here, finds the spatially independent components (time series) that underlie the fMRI data. The images that are produced show where in the brain these spatially separate components are located. The time series are the components.
34 Application to some data
- One of our old workhorse data sets is a series of auditory-visual co-stimulation experiments done in 1996. This is quite a good illustration of how ICA works.
- We analyzed this experiment using the procedure described above, extracting 30 independent components (there are around 25,000 time series in this data set).
- It is quite interesting to look at some of the independent components.
- We can pick out those components associated with the stimuli by correlating each of the IC time series with the auditory and visual components of the experimental design, then looking at the components with the strongest correlations. These correlation calculations can be carried out rapidly in R.
- This illustrates one of the good points about doing the ICA in R: we can use the whole statistical power of R to look at the output from the ICA. We don't need to switch to another package.
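The correlation step amounts to very little code. The sketch below is in Python rather than R, only to keep it self-contained, and uses made-up data throughout: the block timings of the visual and auditory regressors, the three toy IC time series and their names are all assumptions, not the 1996 data set. Absolute correlations are used for ranking because the sign of an ICA component is arbitrary.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    return num / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))

n_scans = 120
# Hypothetical design regressors with different block lengths.
visual = [1.0 if (t % 20) < 10 else 0.0 for t in range(n_scans)]
auditory = [1.0 if (t % 30) < 15 else 0.0 for t in range(n_scans)]

# Toy IC time series (illustrative only).
ic_series = {
    "IC1": [t / n_scans for t in range(n_scans)],                            # slow drift
    "IC2": [-a + 0.3 * math.sin(1.7 * t) for t, a in enumerate(auditory)],   # sign-flipped auditory
    "IC3": [v + 0.3 * math.cos(2.3 * t) for t, v in enumerate(visual)],      # visual
}

def rank_by_abs_corr(regressor):
    """Return (IC name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(ts, regressor) for name, ts in ic_series.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

visual_ranking = rank_by_abs_corr(visual)
auditory_ranking = rank_by_abs_corr(auditory)

print("visual:", [(name, round(r, 2)) for name, r in visual_ranking])
print("auditory:", [(name, round(r, 2)) for name, r in auditory_ranking])
```

With 30 real components the same ranking would simply be run over a longer dictionary (or, in R, over the columns of a matrix).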
35 IC representing visual response
[Figure panels: spatial locations of components, time series, power spectrum.]
36 IC representing auditory response
37
- The two previous ICs are obviously related to the visual and auditory stimuli.
- But we have extracted 30 ICs.
- What are the others?
38 Subject motion
- The ICA we carried out was on the raw data, with no motion correction.
- These data have also been analysed using XBAM, and part of that procedure is motion correction. One of the outputs from the motion correction is called reg.dat, which is a data file of the translations along the x, y and z axes and the rotations around those axes computed during realignment of the data.
- If we take the 6 columns of reg.dat and correlate each column with the time series of the ICs, we can get some idea of which ones might be motion-related.
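A minimal sketch of this screening step follows, in Python rather than R to keep it self-contained. The six motion columns are synthetic stand-ins held in memory (the real reg.dat is not read here), the two IC time series and their names are invented, and the 0.6 cut-off is simply an illustrative choice of what counts as a strong correlation.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    return num / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))

n_scans = 100
two_pi = 2.0 * math.pi

# Synthetic stand-ins for six realignment parameters (3 translations, 3 rotations).
motion = {
    "x_trans": [0.01 * t for t in range(n_scans)],
    "y_trans": [0.2 * math.sin(two_pi * t / 50) for t in range(n_scans)],
    "z_trans": [0.1 * math.cos(two_pi * t / 25) for t in range(n_scans)],
    "x_rot": [0.0001 * t * t for t in range(n_scans)],
    "y_rot": [0.01 * math.sin(0.9 * t) for t in range(n_scans)],
    "z_rot": [0.01 * math.cos(1.3 * t) for t in range(n_scans)],
}

# Two invented IC time series: one drifting, one following a block design.
ics = {
    "IC_A": [2.0 - 0.03 * t + 0.2 * math.sin(1.1 * t) for t in range(n_scans)],
    "IC_B": [1.0 if (t % 20) < 10 else 0.0 for t in range(n_scans)],
}

threshold = 0.6  # illustrative cut-off for a "strong" correlation

# Correlation table: one row per IC, one entry per motion parameter.
table = {ic: {p: pearson(ts, col) for p, col in motion.items()} for ic, ts in ics.items()}

# Flag an IC as possibly motion-related if any column exceeds the threshold.
flagged = [ic for ic, row in table.items() if max(abs(r) for r in row.values()) > threshold]

for ic, row in table.items():
    print(ic, {p: round(r, 2) for p, r in row.items()})
print("possibly motion-related:", flagged)
```

A strong correlation here only suggests a motion origin; the spatial map and power spectrum of the component are still worth inspecting.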
39 Here is some output from this correlation. We can see that IC 1 shows strong correlations (> 0.6) with some of the columns of reg.dat but IC 2 does not.
40 IC 1 - motion related.
We can see motion artefacts around the front edge of the brain.
Power spectrum shows a lot of low-frequency power, often seen with motion-related artefacts.
41 Most strongly motion-related IC - IC 30
Again, note power spectrum with power at very low frequencies.
Time course shows steady signal drift.
42 IC 2, not correlated with response or motion
Chequerboard artefacts, often more typical of machine-related artefacts.
Much more high-frequency power in spectrum.
43 IC 27 - another non-motion, non-response-related IC
Note areas in brain-stem, ventricles, intracerebral sulcus. This is probably ventricular/CSF pulsation.
Note strong high-frequency peak of power.
44 More challenging data
- The next illustration is from one of the more interesting experiments recently carried out at the IOP.
- This involved a two-condition block design in which one condition was swallowing a spoonful of ice-cream in the scanner and the other condition was swallowing water.
- We expected to have some problems with motion artefact here!
- The same procedure was adopted as that described above, i.e. extract ICs, correlate with experimental design, correlate with estimated motion.
45 IC with strongest correlation with experimental design
Cerebellar/sensory motor activations due to swallowing.
Two peaks of power, one at the A/B alternation frequency and one at half that, i.e. one due to swallowing in general and one due to ice-cream/water differences.
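Why two peaks should appear can be seen with a toy signal. The sketch below is in Python rather than R, only to keep it self-contained, and every number in it is an assumption: 128 scans, blocks of 8 scans, a response in the first half of every block (swallowing in general), and a larger response in A blocks than in B blocks (the ice-cream/water difference). The power spectrum is computed with a plain discrete Fourier transform.

```python
import cmath

n = 128          # number of scans (hypothetical)
block_len = 8    # scans per block (hypothetical)

# A response occurs in the first half of every block; A blocks (even-numbered)
# are given a larger response than B blocks (odd-numbered).
signal = []
for t in range(n):
    block = t // block_len
    amplitude = 3.0 if block % 2 == 0 else 1.0
    responding = (t % block_len) < block_len // 2
    signal.append(amplitude if responding else 0.0)

mean = sum(signal) / n
centred = [s - mean for s in signal]

def power_at(k):
    """Power at frequency bin k (k cycles per n scans), via a direct DFT."""
    coef = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centred))
    return abs(coef) ** 2 / n

power = {k: power_at(k) for k in range(1, n // 2 + 1)}

# The two strongest bins, reported in ascending order of frequency.
top_two = sorted(sorted(power, key=power.get, reverse=True)[:2])
block_bin = n // block_len          # one response per block
cycle_bin = n // (2 * block_len)    # one A/B cycle per two blocks

print("strongest bins:", top_two)
print("block-rate bin:", block_bin, "A/B-cycle bin:", cycle_bin)
```

The component common to every block shows up at the block rate, while the A-versus-B amplitude difference repeats once every two blocks and so appears at half that frequency.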
46 IC with 2nd highest correlation with experimental design
Power spectrum shows two peaks, one at the experimental frequency and one at twice that frequency. Time series shows spikes at start and end of each block.
47 One of (many) motion-related ICs
Spectrum shows a very low frequency component due to slow drift, plus some power at the experimental frequency and twice that. Nice combination of stimulus-correlated and non-stimulus-correlated motion.
48 One for a physicist!!
Equally spaced peaks in the power spectrum!! Peak power at a frequency much higher than that of the experimental design.
49 Some observations on ICA
- ICA is often regarded as a purely data-driven analysis method and is very useful in that respect.
- However, if there is an experimental design that we can describe mathematically, we can automatically extract which ICs are likely to be paradigm-related.
- We can do similar things with motion-correction data to examine the location and nature of motion artefacts.
50 IC time-series as covariates
- One interesting possibility, described by the FMRIB (Oxford) group and others, is the inclusion of IC time series in the GLM (general linear model) contrast/design matrix as a nuisance covariate to allow us to clean up time series analysis.
- For example, if we were interested in hippocampal function, we might identify all ICs contributing to time series in that region and include the non-paradigm-related ones as nuisance covariates.
- This allows us to use ICA information within the well-developed modelling and inferential structure of voxel-wise (mass univariate) analysis.
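The effect of adding an IC time series as a nuisance covariate can be sketched with a single simulated voxel. The example is in Python rather than R to keep it self-contained, and all of it is assumed for illustration: the block design, a slowly drifting "IC" time series, the true task effect of 2.0, and a small least-squares helper written out by hand instead of a real GLM package.

```python
import math

def ols(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    aug = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

n_scans = 120
intercept = [1.0] * n_scans
task = [1.0 if (t % 20) < 10 else 0.0 for t in range(n_scans)]   # hypothetical design
drift_ic = [t / n_scans for t in range(n_scans)]                 # stand-in nuisance IC

# Simulated voxel: true task effect 2.0, contaminated by the drifting component.
voxel = [10.0 + 2.0 * task[t] + 3.0 * drift_ic[t] + 0.1 * math.sin(2.9 * t)
         for t in range(n_scans)]

# Task estimate without and with the IC time series in the design matrix.
beta_task_only = ols([intercept, task], voxel)[1]
beta_with_ic = ols([intercept, task, drift_ic], voxel)[1]

print("task effect, design only:", round(beta_task_only, 3))
print("task effect, with nuisance IC:", round(beta_with_ic, 3))
```

Because the drift is weakly correlated with the block design, leaving it out biases the task estimate, and including it as a covariate brings the estimate back close to the simulated value.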