Title: Using Evoked Magnetoencephalographic Responses for the Cognitive Neuroscience of Language
- Alec Marantz
- MIT
- KIT/MIT MEG Joint Research Lab
- Department of Linguistics and Philosophy
2. From Cog Sci to Cog Neurosci
- Cognitive Science, including Linguistics, has used behavioral data to develop computational theories of language representation and use
- These theories play out along the dimensions of time (sequential processing stages), space (separation of processing functions), and complexity (difficulty of processing)
3. Cognitive Neuroscience of Language
- Cognitive Science moves to Cognitive Neuroscience when the temporal, spatial, and complexity dimensions of cognitive theories are mapped onto the time course, localization, and intensity of brain activity
- However, because fMRI and PET provide little temporal information, Neurolinguistics developed with these techniques has tended to flatten theories of the Cognitive Neuroscience of Language
4. Cognitive Science: Taft & Forster (1977)
(traditional articulated Cog Sci)
Affix stripping, followed by recombination of stem and affix
5. Sample prediction from the model
- -semble is a stem, since assemble, resemble, and dissemble are words
- -sassin (assassin) is not a stem, since only assassin is a word
- It should take longer to reject semble as a non-word than sassin, since semble is a lexical item (in the model, semble must loop from box 4 through box 5 before reaching box 7, the "No" response, while sassin passes directly from box 4 to box 7)
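The prediction can be made concrete with a toy affix-stripping search. This is a sketch for illustration, not Taft & Forster's flow chart: the prefix list, the miniature stem lexicon, and the use of a step count as a stand-in for decision time are all hypothetical assumptions. Bound stems such as semble get their own lexical entries, listing the prefixes they combine with.

```python
# Toy affix-stripping lexical decision (hypothetical illustration, not the
# original Taft & Forster flow chart). Step counts stand in for decision time.

# Hypothetical prefix inventory, checked in this order.
PREFIXES = ("as", "re", "dis")

# Hypothetical stem lexicon: each stem maps to the prefixes it combines with;
# "" means the stem can stand alone as a free word.
STEM_LEXICON = {
    "semble": {"as", "re", "dis"},  # bound stem: assemble, resemble, dissemble
    "assassin": {""},               # free word; "sassin" has no entry
}


def lexical_decision(letter_string: str) -> tuple[bool, int]:
    """Return (is_word, number_of_processing_steps) for a letter string."""
    steps = 1  # step: look for a strippable prefix
    prefix = next((p for p in PREFIXES if letter_string.startswith(p)), "")
    stem = letter_string[len(prefix):]

    steps += 1  # step: search the lexicon for the (possibly stripped) stem
    if stem not in STEM_LEXICON:
        if not prefix:
            return False, steps  # no entry at all: reject right away
        # Stripping produced a pseudo-stem, so retry with the whole string.
        prefix, stem = "", letter_string
        steps += 1
        if stem not in STEM_LEXICON:
            return False, steps

    steps += 1  # step: check that this stem + prefix combination exists
    return prefix in STEM_LEXICON[stem], steps


for item in ("semble", "sassin", "resemble", "assassin"):
    print(item, lexical_decision(item))
```

Under these assumptions, rejecting semble takes one more step than rejecting sassin, because semble is found as a stem and must then fail the recombination check, which is the direction of the RT prediction.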
6. Taft (2004): further behavioral support for an articulated model of processing stages
A more contemporary instantiation of the model, which makes predictions about RTs based, e.g., on a theory of the experimental task
7. Flattened computational model: Gonnerman & Plaut (2000)
8.
- A masked priming experiment compares responses to:
- Semantic: sofa-COUCH
- Morphological: hunter-HUNT
- Orthographic: passive-PASS
- Unrelated: award-MUNCH
- Claim: the failure to find a special location for the morphological condition (using fMRI) supports a flat model in which morphology is an emergent property of semantic and phonological/orthographic relatedness
9. The fMRI experiment is consistent with the flattened computational model; temporal/sequential processing is not at issue.
But the masked priming design is confounded with respect to the predictions of a Taft-style affix-stripping model, since the orthographic items consist of possible stems plus strippable affixes (e.g., tenable/ten, passive/pass)
10. Articulated vs. Flattened Model
- Taft's articulated affix-stripping model predicts that tenable and bendable should be processed in the same places (in the model/brain) and in the same temporal sequence (affix stripping, then stem activation, then recombination), with differences in complexity (measured, e.g., by the level of brain activity or the latency of brain events)
- Thus the cognitive science model predicts the fMRI results and makes further predictions, testable with techniques that allow exploration of the latency of brain responses
11. MEG allows cognitive neuroscience to fully embrace cognitive science
- MEG records the magnetic fields generated by electrical activity in the brain, millisecond by millisecond
- MEG has the spatial resolution, the temporal resolution, and the sensitivity necessary to test predictions from cognitive science along the space, time, and complexity dimensions
12. Plot
- Examples of MEG experiments exploiting the temporal, spatial, and intensity resolution of the technique
- A return to Taft's stages
- The future: even closer ties between experimental designs in cognitive science and cognitive neuroscience
13. KIT/MIT MEG Lab
14. Magnetoencephalography (MEG): study of the brain's magnetic fields
http://www.ctf.com/Pages/page33.html
15. Magnetoencephalography (MEG)
Liina Pylkkänen, Aug 03, Tateshina
Distribution of the magnetic field at 93 ms (auditory M100)
Averaged epoch of activity in all sensors: overlapping waveforms, one line per sensor
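The averaged waveform in a display like this is the mean over repeated trials, time point by time point; a response such as the M100 can then be characterized by the latency and amplitude of its peak within a search window. The sketch below assumes a single channel (or source) sampled at 1000 Hz, a hypothetical 50-150 ms search window, and simulated data; real pipelines also baseline-correct and filter, which is omitted here.

```python
import numpy as np


def evoked_peak(trials, times_ms, window=(50.0, 150.0)):
    """Average trials and return (peak_latency_ms, peak_amplitude).

    trials   : array of shape (n_trials, n_times), one channel or source
    times_ms : array of shape (n_times,), sample times relative to onset
    window   : search window in ms (hypothetical default for an M100-like peak)
    """
    evoked = trials.mean(axis=0)  # the averaged epoch
    in_window = (times_ms >= window[0]) & (times_ms <= window[1])
    idx = np.flatnonzero(in_window)
    # Use the absolute value so that peaks of either polarity are found.
    best = idx[np.argmax(np.abs(evoked[idx]))]
    return times_ms[best], evoked[best]


# Simulated example: 100 noisy trials containing a response peaking at 93 ms.
rng = np.random.default_rng(0)
times = np.arange(-100.0, 400.0)  # 1 ms steps, i.e. 1000 Hz sampling
response = np.exp(-((times - 93.0) ** 2) / (2 * 15.0 ** 2))
trials = response + rng.normal(scale=0.5, size=(100, times.size))
latency, amplitude = evoked_peak(trials, times)
print(latency, amplitude)
```

Averaging shrinks trial-by-trial noise (here by a factor of about 10 for 100 trials), which is why the peak is easy to find in the average but harder to find in a single trial.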
16. MEG exemplified
17. Parametric variation in letter-string length and in added visual noise
Categorical symbol vs. letter manipulation
18. The M100 response varies in intensity with visual noise; the M170 response varies in intensity with string length
Note the separation in space and temporal sequence (M100 vs. M170), consistent with a sequential processing model
(Figure panels: M100 response; M170 response)
19. The intensity of the M170 response to letters, compared with symbols, confirms the processing function at the M170 time and location (visual word form or letter-string area)
Reaction time to read words is predicted by a combination of M170 amplitude and latency
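Predicting reading RT from M170 amplitude and latency together amounts to a multiple regression. The sketch below fits an ordinary least-squares model with NumPy; the function name and the simulated per-item values (including the coefficients used to generate them) are hypothetical, and the original analysis may have used a different model.

```python
import numpy as np


def fit_rt_model(amplitude, latency, rt):
    """Least-squares fit of rt ~ b0 + b1 * amplitude + b2 * latency.

    Returns the coefficient vector (b0, b1, b2) and the R^2 of the fit.
    """
    X = np.column_stack([np.ones_like(amplitude), amplitude, latency])
    coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
    predicted = X @ coef
    ss_res = np.sum((rt - predicted) ** 2)
    ss_tot = np.sum((rt - rt.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Simulated items generated with made-up coefficients (not real data).
rng = np.random.default_rng(1)
amp = rng.uniform(20.0, 60.0, size=40)    # arbitrary amplitude units
lat = rng.uniform(150.0, 200.0, size=40)  # ms
rt = 250.0 + 1.5 * amp + 2.0 * lat + rng.normal(scale=10.0, size=40)
coef, r2 = fit_rt_model(amp, lat, rt)
print(coef, r2)
```

Fitting both predictors jointly, rather than correlating each with RT separately, shows how much each measure contributes once the other is taken into account.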
20. Latency coding? Response latency correlates with stimulus properties.
21. Auditory M100 (from auditory cortex)
22. Frequency of tone predicts latency of the M100 peak
23. Temporal coding? The shape of the response over time at the M100 latency and source location correlates with the phonetic category of the stimulus
24. Voiced (b, d) vs. voiceless (p, t) consonants: auditory evoked response
25.
- Different ways of measuring the shape of the M100 response to voiced vs. voiceless consonants yield good computational "experts" (classifiers) that can classify the data from a single response as either pa/ta or ba/da with significantly greater than chance accuracy
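One simple way to ask whether single responses carry category information is to train a classifier, test it on held-out trials, and compare its accuracy with chance using an exact binomial test. The sketch below uses a nearest-centroid classifier with leave-one-out cross-validation on feature vectors (e.g., samples of each M100 waveform); the classifier choice, function names, and simulated data are assumptions, not the method behind the slide.

```python
import math

import numpy as np


def loo_nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    features : array (n_trials, n_features), e.g. samples of each M100 response
    labels   : array (n_trials,) of category codes, e.g. 0 = voiced, 1 = voiceless
    Returns (accuracy, number_correct).
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out trial i
        classes = np.unique(labels[train])
        centroids = np.array(
            [features[train & (labels == c)].mean(axis=0) for c in classes]
        )
        distances = np.linalg.norm(centroids - features[i], axis=1)
        correct += int(classes[np.argmin(distances)] == labels[i])
    return correct / n, correct


def binomial_p_above_chance(k, n, chance=0.5):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, chance)."""
    return sum(
        math.comb(n, j) * chance**j * (1 - chance) ** (n - j)
        for j in range(k, n + 1)
    )


# Simulated single trials: two categories whose mean waveforms differ slightly.
rng = np.random.default_rng(2)
n_per_class, n_samples = 40, 30
labels = np.repeat([0, 1], n_per_class)
shift = np.where(labels[:, None] == 1, 0.5, 0.0)
features = rng.normal(size=(2 * n_per_class, n_samples)) + shift
accuracy, n_correct = loo_nearest_centroid_accuracy(features, labels)
print(accuracy, binomial_p_above_chance(n_correct, len(labels)))
```

Treating the leave-one-out predictions as independent Bernoulli trials is itself a simplification; a permutation test on the labels is a more conservative alternative.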
26. Sequential processing of words
27. What happens in the brain when we read words?
Letter string processing (Tarkiainen et al. 1999)
Lexical activation (Pylkkänen et al. 2002)
28. Note the left lateralization of responses in standard perisylvian language areas
29. M350
The latency of the M350 is sensitive to lexical factors such as lexical frequency and repetition
30. The M350 is (in time and place) the locus of lexical activation; lexical decision, modulated by competition among activated items, occurs later and elsewhere
31. Vitevitch and Luce (1998): stages of word processing
- Phonotactic probability (the sublexical frequency of bits of words) affects lexical activation, with high probability being facilitatory
- Phonological neighborhood density affects lexical decision (after activation), with high density being inhibitory
- Phonotactic probability and neighborhood density are usually highly correlated, so the same items that facilitate activation inhibit decision
- So, words with high phonotactic probabilities from dense neighborhoods should show quicker M350 latencies but slower RTs in lexical decision
32. Words and non-words with high-probability sound sequences, from dense neighborhoods, show quicker M350s and slower RTs
33. Pylkkänen et al. (2002)
The M350 is not sensitive to competition from phonological neighbors; RT is
(Figure panels: neighborhood competition effect; sublexical phonological frequency effect)
34. Irregular Past Tense Priming: Stockall & Marantz (to appear in Mental Lexicon)
- In cross-modal priming (hear one word, then make a lexical decision on a letter string presented immediately after), irregulars don't generally prime their stems behaviorally (e.g., gave-GIVE, taught-TEACH)
- Allen & Badecker show that orthographic overlap in this experimental design leads to RT inhibition, and that past-tense/stem pairs with higher orthographic overlap yield less priming than those with less overlap
35. Prediction of linguistic theories (e.g., Distributed Morphology)
- Irregular past tense/stem priming paradigms (gave/give, taught/teach) should yield identity priming at the stage of root/stem activation (the M350), followed by form competition among allomorphs, which slows reaction time relative to pure stem/stem identity priming.
36. MEG irregular past-tense priming experiment
- Design: visual-visual immediate priming, lexical decision on the target (see Pastizzo & Feldman 2002)
(Figure: trial timeline with prime and target; duration of trial in ms)
37. MEG Results: M350 priming for past tense/stem equivalent to identity priming
- Significant priming for Identity condition (p = 0.01)
- TAUGHT-TEACH vs. SMACK-TEACH (p = 0.04)
- GAVE-GIVE vs. PLUM-GIVE (p = 0.05)
- No reliable effect for STIFF-STAFF vs. GRAB-STAFF (p = 0.13)
(Figure: amount of priming by condition, n = 8)
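The amount of priming is the difference between a measure in the unrelated and the related condition (here M350 latency, e.g., SMACK-TEACH minus TAUGHT-TEACH), computed per subject. With n = 8 subjects, an exact sign-flip permutation test over all 2^8 = 256 sign patterns is cheap to run. The sketch below shows one way to test such differences; the test choice, function names, and example latencies are hypothetical and not necessarily the analysis that produced the reported p-values.

```python
import itertools
import statistics


def priming_amounts(unrelated, related):
    """Per-subject priming: unrelated minus related (e.g. M350 latency in ms)."""
    return [u - r for u, r in zip(unrelated, related)]


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip permutation test of mean difference = 0."""
    observed = abs(statistics.fmean(differences))
    n_extreme = 0
    patterns = list(itertools.product((1, -1), repeat=len(differences)))
    for signs in patterns:
        flipped = statistics.fmean(s * d for s, d in zip(signs, differences))
        if abs(flipped) >= observed - 1e-12:  # tolerance for float round-off
            n_extreme += 1
    return n_extreme / len(patterns)


# Hypothetical M350 latencies (ms) for 8 subjects.
unrelated = [352, 361, 340, 372, 355, 348, 366, 359]
related = [340, 349, 332, 357, 346, 337, 352, 350]
diffs = priming_amounts(unrelated, related)
print(statistics.fmean(diffs), sign_flip_p_value(diffs))
```

With 8 subjects the smallest attainable two-sided p-value of this test is 2/256, about 0.008.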
38. RT Results: competition effects, no significant priming for TAUGHT-TEACH
- Significant priming for Identity condition (p = 0.0009)
- Significant priming for GAVE-GIVE (p = 0.03)
- Significant inhibition for STIFF-STAFF (p = 0.01)
- No reliable effect for TAUGHT-TEACH (p = 0.21), but a trend towards inhibition
39. MEG and RT Results: MEG taps stem activation; RT reflects decision in the face of competition
40. Follow-up: add regulars and a ritzy/glitzy condition
- Regulars: walk-walked
- Orthographic and semantic overlap: boil-broil
- Reverse order: stem before past tense
41. Ritzy-glitzy items
- drop-drip, clash-clang
- flip-flop, blossom-bloom
- pet-pat, ghost-ghoul
- gloom-glum, shrivel-shrink
- squish-squash, crumple-rumple
- boil-broil, screech-scream
- strain-sprain, converge-merge
- mangle-tangle, scald-scorch
- slim-trim, crinkle-wrinkle
- bump-lump, attain-gain
- burst-bust, scrape-scratch
42. Order effect on RT, i.e., on form competition
43. Linguistic and Computational Models of Morphology: fully supported
- The relation between an irregular past tense form and its stem is like that between a regular past tense form and its stem (or between identical stems), not like that between words that are phonologically/orthographically and semantically related (boil-broil)
- Root priming separates from form competition (between allomorphs of a stem) in the time course of lexical access
44. Taft (2004), "Morphological Decomposition and the Reverse Base Frequency Effect"
- Claim: base frequency effects (RT to a complex word correlates with the frequency of its stem) reflect access to the stem of morphologically complex forms, whereas surface frequency effects (RT to a complex word correlates with the frequency of the complex word itself) reflect a stage of checking the recombination of stem and affix for existence and/or well-formedness
- "The suggestion being made, then, is that the advantage at the early stages of processing of having a relatively high base frequency could be potentially obscured by counterbalancing factors happening at later stages of processing." (pp. 750-751)
45. Lexical Decision Task
- Non-word foils consisting of existing words with ungrammatical affixes (mirths, kettled, joying, redly, iratest), just like the Devlin orthographic cases
- Three classes of words:
- mending class: low surface frequency, low base frequency
- seeming class: low surface frequency, high base frequency
- growing class: mid surface frequency, high base frequency
46.
- Claim: the advantage of high base frequency for seem at the stem-access stage (indexed by the M350) is offset in RT by a disadvantage from the low frequency with which -ing is used with the seem stem, i.e., at the post-affix recombination stage, indexed by RT
- (For Taft, manipulating the foils in lexical decision attenuated the surface frequency effect, arguing for two stages of processing in the indirect fashion typical of good cognitive science.)
47. Reilly and Holt 2004, with the KIT/MIT MEG Team
- Replicate Taft's experiment in the MEG Lab
- Predict:
- Base frequency affects root access and thus M350 latency
- Surface frequency affects the post-M350 recombination stage and thus RT
48. Results: M350 latency tracks base frequency, RT tracks surface frequency
Surface frequency effect on RT (significant at the .05 level): mending and seeming slower than growing
Base frequency effect on M350 latency (significant at the .05 level): mending slower than seeming and growing
49. Conclusion
- MEG serves as a tool to upgrade cognitive science (including linguistics) to cognitive neuroscience without losing the empirically motivated richness of cognitive computational theories
- Cog Sci notions of space, time, and complexity map onto brain space and onto the latency and magnitude of neural activity
50. What's the next step?
- Traditional approaches to MEG analysis involve averaging together many responses (repeated trials from an experimental bin) before computing differences in responses by condition within each subject
- This contrasts with standard cognitive science practice (e.g., with RT) of including a dependent measure from each trial in the ANOVA
- To fully incorporate cognitive theories into cognitive neuroscience, including the correlation of continuous variables with continuous response measures and the use of item analyses in complex designs, we need to include single-trial MEG data in our analyses
51. Why not single-trial MEG?
- For the type of experiment discussed in this talk, we would need to extract response amplitude and latency information from each trial, given a response defined in terms of source localization
- So, we would examine each single response for dipole source activation (latency of peak response, amplitude of response) for a source identified from the grand-averaged data for a subject
52. M100 Latency, Single Trials (Marantz, in preparation)
- Left hemisphere M100 source computed via a single-dipole model from the grand-averaged response to 60 tones, 30 at 200 Hz and 30 at 1 kHz
- Weight matrix from the dipole source used as a spatial filter over the raw data to derive dipole activation latency for each tone individually
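A spatial filter of this kind reduces each raw trial (sensors by time) to a single source time course by taking a weighted sum over sensors; the peak of that time course within a search window gives a single-trial latency. In the sketch below the dipole's field pattern is assumed to be known, and the weights are an assumed least-squares choice, w = l / (l^T l) for a field pattern l; the window, function names, and simulated data are hypothetical.

```python
import numpy as np


def least_squares_weights(field_pattern):
    """Spatial filter weights that recover the amplitude of a known field pattern.

    For data = amplitude * field_pattern, weights @ data == amplitude.
    (Assumed choice of filter; other weightings are possible.)
    """
    field_pattern = np.asarray(field_pattern, dtype=float)
    return field_pattern / (field_pattern @ field_pattern)


def single_trial_latencies(trials, weights, times_ms, window=(70.0, 150.0)):
    """Peak latency (ms) of the spatially filtered source waveform, per trial.

    trials  : array (n_trials, n_sensors, n_times) of raw, unaveraged data
    weights : array (n_sensors,) spatial filter
    """
    source = np.tensordot(trials, weights, axes=([1], [0]))  # (n_trials, n_times)
    idx = np.flatnonzero((times_ms >= window[0]) & (times_ms <= window[1]))
    peaks = idx[np.argmax(np.abs(source[:, idx]), axis=1)]
    return times_ms[peaks]


# Simulated data: one fixed field pattern over 5 sensors, latency varying by trial.
rng = np.random.default_rng(3)
times = np.arange(0.0, 300.0)  # 1 ms steps
pattern = np.array([1.0, 2.0, -1.0, 0.5, -2.0])
true_latencies = rng.uniform(90.0, 130.0, size=30).round()
waves = np.exp(-((times[None, :] - true_latencies[:, None]) ** 2) / (2 * 8.0 ** 2))
trials = waves[:, None, :] * pattern[None, :, None]
trials += rng.normal(scale=0.05, size=trials.shape)
estimated = single_trial_latencies(trials, least_squares_weights(pattern), times)
print(np.abs(estimated - true_latencies).max())
```

The filter is computed once (from the grand average) and then applied unchanged to every raw trial, so no special pre-processing of the single trials is required.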
53. Single-trial M100 latencies
(Figure panels: 200 Hz; 1 kHz)
54. Single-trial analysis, as in behavioral studies, is possible using only normal MEG techniques and tools
- No fancy pre-processing
- No fancy localization or statistical tools
- For responses less automatic than the M100, expect greater overlap in the scatter plots (approaching that for RTs in, e.g., lexical decision experiments)
55. Taft & Forster revisited
- Is the RT slow-down for -semble (bound stem) relative to -sassin (pseudo-stem) attributable to lexical access for semble but not for sassin, as Taft claims, or to response competition from words (resemble, dissemble, assemble vs. assassin)?
- Prediction: a slow-down at lexical access should show up at the M350, while a slow-down from response competition should occur later (as shown by the neighborhood density and past-tense studies)
56. Brown & Marantz (in preparation)
- 3 subjects
- 20 real stems, 20 pseudo-stems (matched by Taft & Forster along various dimensions) per condition
- Single-trial analysis of the MEG data: M350 dipole activation peak analysis, with the M350 dipole fitted over left-hemisphere sensors on the grand average to all stimuli in the experiment
57. A slow-down is observed at the M350: for 3 subjects and 108 observations, the difference is significant over the single-trial MEG data but not yet for RT
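Treating each single-trial latency as an observation allows a direct comparison between conditions, much as individual RTs are compared. The sketch below uses a two-sample permutation test on the difference in mean M350 peak latency between real-stem and pseudo-stem trials; the test choice, number of permutations, the even 54/54 split of trials, and the simulated latencies are all assumptions, and pooling trials across subjects this way ignores subject-level structure.

```python
import numpy as np


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means between a and b.

    Returns (observed_difference, p_value), with difference = mean(a) - mean(b).
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size :].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)


# Hypothetical single-trial M350 peak latencies (ms), 54 trials per condition.
rng = np.random.default_rng(4)
real_stem = rng.normal(370.0, 40.0, size=54)    # e.g. semble-type items
pseudo_stem = rng.normal(350.0, 40.0, size=54)  # e.g. sassin-type items
print(permutation_test(real_stem, pseudo_stem))
```

A mixed-effects model with subject and item terms would be the natural next step once more subjects are available.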
58.
- Taft's theory of decomposition, in which bound stems have lexical entries, is fully supported by the MEG data
- Single-trial MEG data are at least as consistent as reaction time data
- MEG can be used on a par with RT to add dependent variables to experiments testing computational theories within cognitive neuroscience
59. Thank you.