Title: Robert F. Port
Slide 1: Presentation given at the University of Illinois and
the Beckman Institute, March 31, 2005.
Modified slightly in the light of comments made
there. Thanks for helpful comments from Jennifer
Cole, Jont Allen, Hans-Heinrich Hock, Gary Dell,
Richard Sproat, and many others.
Linguistics and rich memory: What could phonology be if memory for words is episodic?
- Robert F. Port
- Linguistics and Cognitive Science
- Indiana University, March 31, 2005
- Thirty years of memory research demonstrate that people remember words as detailed episodes. They do not depend on an abstract, speaker-independent, rate-independent code like phonology or a phonetic transcription.
- This violates all the predictions of linguistic analysis.
- Eventual conclusion: Linguistic structures (from phonemes to sentences) are social constructs, and play a negligible role in real-time processing of language.
Slide 2: Outline
- Part 1: Memory. It is much richer than we thought. Episodic (or exemplar) models should be embraced, not avoided as implausible.
- Part 2: Phonemes and the alphabet. Western academics find a symbolic model of mind compelling, due partly to our profound dependence on graphic representation of language.
- Part 3: What is a language? It is a social institution, a structure maintained at the level of the community. The knowledge of language in the individual speaker is completely idiosyncratic at a detailed level, and thus less interesting.
- What is linguistics? The study of distributions of utterances.
Slide 3: Part 1: Human memory is rich and detailed. Everything that can be noticed is remembered.
- When recognizing visual images, we tend to retain something about the rich details.
- Photo Recognition Demo
- Slide show of car portraits, about 25 slides
- 1 sec per slide
- 5 slides repeated after 4-22 intervening slides.
- Ought to be difficult since the photos are very similar, but ...
- Observations
- Not very hard to do, despite high similarity between slides.
- We easily remember rich sensory details, at least in vision.
- Other data show that when reading, we store the font of words and their location on the page.
- But rich memory is denied for speech perception by the symbolic theory of language.
Slide 4: Standard View of Language (e.g., Peirce, Jakobson, Chomsky, most (all?) modern linguists)
- 1. Language is a cognitive symbol system.
- Symbols: discrete, static, serially ordered tokens with associated meanings.
- Perfectly recognized and produced.
- The basic unit is the phonological segment (or its distinctive features).
- 2. It is used for real-time processing of language.
- Speech production: encoding, composing utterances
- internal symbols → external
- Speech perception: decoding, recovering abstract symbols from speech
- external symbols → internal
- Linguistically irrelevant detail is stripped away.
- Linguistic symbols are abstract patterns with no speaker or rate variation.
Slide 5
- Thus most linguists and phoneticians would agree with Morris Halle:
- "It is unlikely that the information about the phonic shape of words is stored in the memory of speakers in acoustic form."
- "Properties directly linked to the unique circumstances surrounding every utterance are discarded in the course of learning a new word."
- 1985, in Fromkin, ed.
- But this is not true. Lots of detail is stored.
- Here are some data.
Slide 6: Recognition memory for words - 1
- Goldinger (1992)
- Ss listened to a first list of words spoken by 2, 6, or 10 voices.
- A second list was read 5 min, 1 day or 1 week later. Ss were asked to mark each word as a repetition or as new.
- Items repeated in the same voice were recognized more accurately; items in a different voice, less accurately.
- The representation in memory must include speaker details.
- Or could they be remembering the words abstractly and just associating the voices?
Slide 7: Recognition memory for words - 2
- Palmeri, Goldinger, Pisoni (1993)
- Ss heard a continuous list of words spoken by 2, 6, 12 or 20 voices.
- Asked to recognize repetitions after a lag of 2, 4, 8, 16, 32 or 64 words.
- No effect of number of talkers on recognition.
- Performance declined with increasing lag.
- Voice must be automatically encoded with the word.
- Both nonlinguistic and linguistic features help recognition.
- Words must be stored episodically, just like visual images.
Slide 8: Human memory is rich and detailed. Little is discarded.
- Psychologists studying human memory use word lists and arbitrary categories (Hintzman, 1986; Nosofsky, 1986; Kruschke, 1992; Shiffrin and Steyvers, 1997).
- Mathematical models predict performance on many tasks, including
- recalling lists (in serial order or unordered)
- recognizing previously presented items
- reaction time for both
- The best-fitting models assume that Ss retain maximum detail about the presentation.
- They do not store just a pared-down, minimal representation of schematic properties (Pisoni, 1997).
- Memory for words is not different from visual memory.
Slide 9: If memory for speech is episodic, what are linguistic symbols?
- Reply: Maybe linguistic symbols (words, phonemes, etc.) are like prototypes.
- Many categories have a prototype, an ideal mean or centroid token that best represents the category (Rosch, 1978). Prototype members of a category come to mind faster, are recognized more quickly, etc.
- Categories that are more abstract have fewer features than concrete ones.
- Granny Smith apple > apple > fruit
- Fluffy > tabby cat > housecat > cat > pet
- Bob saying "tomato" > English word "tomato"
- HOWEVER, mathematical models of memory exhibit the behaviors that support prototypes and abstractions, but they do it by storing rich detail and computing abstractions and prototypes whenever needed.
- Halle was right: when we think about a word, we don't think of a particular voice or intonation. We think of it as an abstract token. Exemplar memory models claim we create the abstract, reduced-feature form of words only when we need one.
Slide 10: Minerva 2: Storing Episodes
- Let's look closer at a specific model.
- Minerva 2 model (Doug Hintzman, 1986): Every episode or exemplar is stored as a trace, a long vector of features, added to memory. For words, the features represent many kinds of information. The features can only be 1, -1 or 0 in Minerva 2.
- Exemplar memory: a matrix of feature vectors, one for each exemplar in the experiment.
(Figure: an example trace vector with entries 1, 0 and -1, divided into pronunciation features, orthographic features, semantic features and contextual features.)
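The storage step described on this slide can be sketched in a few lines of Python. This is only an illustration: the four field names come from the slide, but the field sizes, the random feature values, and the helper names `field_slice` and `random_episode` are assumptions made for the example.

```python
import random

# Hypothetical field layout: the slide names four kinds of features,
# but these field sizes are chosen only for illustration.
FIELDS = {"pronunciation": 6, "orthographic": 6, "semantic": 6, "contextual": 5}
TRACE_LENGTH = sum(FIELDS.values())

def field_slice(name):
    """Return the slice of a trace that holds the named field."""
    start = 0
    for field, size in FIELDS.items():
        if field == name:
            return slice(start, start + size)
        start += size
    raise KeyError(name)

def random_episode(rng):
    """Make one illustrative episode: every feature is 1, -1 or 0."""
    return [rng.choice((1, -1, 0)) for _ in range(TRACE_LENGTH)]

rng = random.Random(0)   # fixed seed so the run is repeatable
memory = []              # exemplar memory: one row (trace) per episode
for _ in range(4):
    memory.append(random_episode(rng))   # storing an episode = adding a row

# Show two of the four fields of each stored trace.
for trace in memory:
    print(trace[field_slice("pronunciation")], trace[field_slice("contextual")])
```

Nothing is averaged or pruned at storage time in this sketch; every episode simply becomes another row of the matrix.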
Slide 11: Minerva 2: Probing Exemplar Memory
- Probing memory. Each new episode is a probe into the memory matrix.
- The similarity of the probe to every trace is computed.
- Traces of the most similar episodes become highly active.
- The memory response (or echo) can show greater or lesser activity overall (intensity) and a certain prominent pattern of activity (content).
- Echo intensity. Stimulate memory with a probe. The more activation across features and traces, the greater the intensity of the response. So the more similar copies there are in memory, the higher the familiarity of the probe.
- Recognition task: For a new/old recognition task, you set a threshold. If total intensity is above threshold, say "old"; if below, say "new".
- Prototypes: If the probe is an abstract category (e.g., fruit), the most intense traces are its prototypes.
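A minimal sketch of probing and old/new recognition follows. The slides do not state the formulas, so the similarity rule (dot product divided by the number of features that are nonzero in the probe or the trace) and the cubed activation are taken from the usual description of Hintzman's (1986) model and should be read as assumptions here. The toy traces and the threshold of 0.5 are invented for the example.

```python
def similarity(probe, trace):
    """Dot product, normalized by the features that are nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def echo_intensity(probe, memory):
    """Sum of activations; cubing keeps the sign and favors close matches."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)

def recognize(probe, memory, threshold=0.5):
    """Old/new decision; the threshold value is an arbitrary assumption."""
    return "old" if echo_intensity(probe, memory) > threshold else "new"

# Three hypothetical stored traces.
memory = [
    [1, -1, 1, 1, -1, 0],
    [1, -1, 1, -1, -1, 1],
    [-1, 1, -1, 1, 1, -1],
]
studied = [1, -1, 1, 1, -1, 0]   # identical to the first stored trace
novel = [0, 1, 1, -1, 1, 0]      # resembles none of the stored traces

print(recognize(studied, memory), recognize(novel, memory))
```

Because the activation is an odd power, dissimilar traces contribute little or even negatively, so a probe is judged familiar only when some traces match it closely.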
Slide 12: Minerva 2: Computing Abstractions
- Echo content. The probe activates a subset of traces. The common features across this set are computed. These features specify an abstract pattern similar to the probe but generic: a kind of abstract category for the probe.
- The features not shared cancel out, leaving an abstract vector with fewer features: a prototype or schema or abstract object.
- Thus hearing the word "tomato" activates the prototype pronunciation and the abstracted meaning of "tomato".
- Our intuitions about abstract symbols (words, phonemes, etc.) may reflect integration of content across traces.
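The way an abstraction can fall out of stored exemplars can be shown with a small sketch. It assumes the echo content is the activation-weighted sum of the traces, feature by feature, with the same similarity and cubed-activation rules usually given for Hintzman's (1986) model; the slides do not spell these out. The prototype vector and the one-feature distortions are invented for illustration, and the prototype itself is never stored.

```python
def similarity(probe, trace):
    """Dot product, normalized by the features that are nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def echo_content(probe, memory):
    """Activation-weighted sum of all traces, computed feature by feature."""
    content = [0.0] * len(probe)
    for trace in memory:
        activation = similarity(probe, trace) ** 3
        for j, value in enumerate(trace):
            content[j] += activation * value
    return content

def sign(x):
    return (x > 0) - (x < 0)

def distort(vector, index):
    """Copy a vector and flip one feature, giving an idiosyncratic exemplar."""
    copy = list(vector)
    copy[index] = -copy[index]
    return copy

prototype = [1, 1, -1, -1, 1, -1, 1, 1]                # hypothetical category center
memory = [distort(prototype, k) for k in range(5)]      # five distorted exemplars
probe = distort(prototype, 7)                           # a new, unstored distortion

echo = echo_content(probe, memory)
recovered = [sign(c) for c in echo]
print(recovered == prototype)
```

In this toy case the signs of the echo match the prototype even though only distorted exemplars were stored, and the features that varied across exemplars come back weaker than the features all exemplars share.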
Slide 13: Uses of Echo Content
- Probe with pronunciation: retrieve meaning from many examples.
- Probe with semantics: retrieve pronunciation and spelling.
- Recognition memory (Palmeri et al.): Ss did better with the same voice because recognition was supported by the additional voice features.
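Both uses can be illustrated with one toy memory. The sketch below assumes the similarity, cubed-activation and echo rules usually given for Hintzman's (1986) model, which the slides do not state. The feature codes for the two words and the two voices are invented, and each trace is laid out as pronunciation, then voice, then meaning.

```python
def similarity(probe, trace):
    """Dot product, normalized by the features that are nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def activations(probe, memory):
    return [similarity(probe, trace) ** 3 for trace in memory]

def echo_intensity(probe, memory):
    return sum(activations(probe, memory))

def echo_content(probe, memory):
    acts = activations(probe, memory)
    return [sum(a * trace[j] for a, trace in zip(acts, memory))
            for j in range(len(probe))]

def sign(x):
    return (x > 0) - (x < 0)

# Hypothetical codes: 4 pronunciation, 2 voice and 4 meaning features.
tomato_pron, tomato_meaning = [1, 1, -1, 1], [1, -1, -1, 1]
potato_pron, potato_meaning = [-1, 1, 1, -1], [-1, -1, 1, 1]
voice_a, voice_b = [1, -1], [-1, 1]

memory = [
    tomato_pron + voice_a + tomato_meaning,   # "tomato" heard in voice A
    potato_pron + voice_b + potato_meaning,   # "potato" heard in voice B
]

# Probe with pronunciation only (other fields zero) and read the meaning field.
cue = tomato_pron + [0, 0] + [0, 0, 0, 0]
retrieved_meaning = [sign(c) for c in echo_content(cue, memory)[6:]]

# Compare familiarity of "tomato" in the studied voice and in a different voice.
same_voice = echo_intensity(tomato_pron + voice_a + tomato_meaning, memory)
other_voice = echo_intensity(tomato_pron + voice_b + tomato_meaning, memory)

print(retrieved_meaning == tomato_meaning, same_voice > other_voice)
```

The same-voice probe matches its stored trace on more features, so its echo is more intense, which is the kind of advantage the recognition results describe.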
Slide 14: Linguistic categories in episodic memory
- The formal modelers (e.g., Nosofsky, Hintzman) use random bitstrings for their feature vectors. They only model qualitative behavior. (E.g., if the task is changed by X, does the model correctly predict the change in performance?)
- Linguistics needs more refinement and specific feature content.
- Linguistic categories (like Voiced, High-front Vowel, Sibilant, LH tone, etc.) can be modeled simply as features in the traces of episodes.
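As a rough illustration of that last point, named categories can be mapped onto positions in a trace. The feature inventory below just reuses the four category names from the slide, and the `encode` helper and its 1 / -1 / 0 coding for present, absent and unspecified are assumptions made for the example.

```python
# Hypothetical feature inventory taken from the category names on the slide.
FEATURES = ("voiced", "high_front_vowel", "sibilant", "lh_tone")

def encode(present=(), absent=()):
    """Code named categories as 1 (present), -1 (absent) or 0 (unspecified)."""
    unknown = (set(present) | set(absent)) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return [1 if name in present else -1 if name in absent else 0
            for name in FEATURES]

# One hypothetical episode: a voiced sibilant, not a high front vowel,
# with tone left unspecified.
trace_fragment = encode(present={"voiced", "sibilant"},
                        absent={"high_front_vowel"})
print(dict(zip(FEATURES, trace_fragment)))
```

A fragment like this would sit alongside the voice, context and meaning features of the same episode rather than replace them.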
Slide 15: Conclusions: Memory
- 1. A rich memory for episodes has experimental support, even for speech. Maybe abstract objects (including words and phonemes) do not need to be remembered, but are simply computed on the fly when useful.
- Much of what we need abstract symbols for (such as to specify motor patterns or recognize a word) can be done directly from a database of concrete episodes in a very high-dimensional space.
- But can an exemplar system that re-analyzes its memories support construction of creative or novel patterns? Time will tell. But it seems likely.
- The episodic view of memory entails that each speaker codes linguistic skills idiosyncratically. The linguistic knowledge of the individual brain seems less interesting.
- Next topic
- Now, why are these ideas so difficult to see? Most cognitive scientists resist a rich exemplar-like memory. Linguists too.
- Here is one likely reason.
Slide 16: Part 2: Alphabets and Phonemes
- Why are symbolic prototypes like phonemes so compelling to us? No one seems to have considered any other possibility for the analysis of language.
- Much of the power of the phoneme may result from our cognitive dependence on alphabetic writing.
- Linguists and phoneticians, like most other cognitive scientists, are committed to discrete segments.
- For example: the International Phonetic Alphabet.
Slide 17: From the Handbook of the International Phonetic Association (1999 edition)
- Theoretical assumptions of the IPA alphabet include (pp. 3-4):
- Some aspects of speech are linguistically relevant and some are not (e.g., personal voice quality, speaking rate).
- Speech can be represented partly as a sequence of discrete sounds or segments.
- Segments can usefully be divided into two major categories, consonants and vowels.
- The phonetic description of consonants and vowels can be made with reference to how they are produced and to their auditory characteristics.
- All texts in linguistics and phonetics since about 1900 assume that segmentally defined phonemes are the basic units of all languages.
- (The main exception is J. R. Firth (1948), who decried the emphasis on phonemics.)
Slide 18: Engineering the Alphabet
- Western writing techniques began about 8000 BCE in Mesopotamia (Sumer) and spread to Egypt.
- About 1700 BCE, a Proto-Canaanite alphabet of 17 consonant symbols began to spread in the Middle East.
- Greeks borrowed this and added vowels.
- Phoenician alphabet
- Early Greek (1200 BCE)
- Early Roman
- Letters permit a reader to reconstruct (approximately) the sounds of the spoken language. Word boundaries only gradually became marked, but were common by Roman times.
- Texts were primarily read aloud for 2 millennia after the development of alphabetic writing. This suggests words were still defined by sound, not by their spelling.
Slide 19: Alphabet as a Graphic Pattern
- Letters were an engineering solution to representing words with enough phonetic accuracy but few enough symbols.
- Many of the properties of letters stem from their graphic nature as scratched or drawn by a human hand:
- Context independent: mutually isolated in space, nonoverlapping
- Serially ordered: in rows
- Static: remain on the page indefinitely
- Visually distinct: can be reliably drawn and differentiated
- These 4 properties are essentially graphic properties. The degree to which they are also properties of human speech remains unclear because of our perceptual bias.
Slide 20
- Part of the reason an alphabetic form for language is so intuitive is that we have used alphabets since early childhood, especially for thinking about language.
- We have internalized written language. We use a letter-based cognitive representation for thinking about and remembering speech sounds as well as describing them to others.
- It is the most useful and perspicuous representation available.
Slide 21: This is a case of cognitive scaffolding: the use of external objects to facilitate cognition
- (Recommended: Walter Ong, Orality and Literacy: The Technologizing of the Word, 1988)
- Some external things we internalize cognitively. That is, we know some external things well enough that we use our understanding to reason about very different things (e.g., Lakoff and Núñez, 2000). Thus,
- Target domain and source domain:
- Target: reasoning about counting sheep. Source: reasoning about physical tokens.
- Target: reasoning about national borders. Source: reasoning about containers.
- Target: thinking about violent weather. Source: thinking about powerful people.
- Target: reasoning about time. Source: reasoning about space.
- Target: thinking about speech sounds. Source: reasoning about letters.
- Target: thinking about language. Source: thinking about writing.
Slide 22: Why think about speech as something else? Why not judge speech as sound or as gesture?
It was nearly impossible because
- Speech is articulated very quickly.
- 10-20 segments/sec.
- The moving body parts are largely invisible: tongue, glottis, velum.
- And speech gestures overlap in time, e.g., nasality, V-harmony, Cs and Vs.
- Speech depends on subtle temporal and spectral patterns that are difficult to perceive as time or spectrum.
- E.g., voice-onset time, stop place-of-articulation, mora timing (in Japanese)
- Speech sounds show huge variability due to dialect, personal style, speaking rate, etc. Most repetitions are slightly different.
- E.g., does "p'lice" have 2 Vs or 1?
- So it is difficult to decide what any word really is. But the orthography simply chooses a spelling and defends it staunchly.
- Linguists need a conceptual tool to help attend to speech and record it.
- There could be no science of linguistics (or phonetics) until some consistent IPA-like alphabet was developed.
Slide 23: Conclusions: Alphabets and Phonemes
- The alphabet was an engineering achievement of enormous significance.
- Made possible philosophy and science
- Improved exploitation of trans-generational knowledge and technologies
- The science of language has depended on various alphabets (e.g., phonological and phonetic) to make sense of language. It was the only tool available.
- But learning the alphabet changes our intuitions about language. Phonemic awareness biases us to hear all speech as only Cs and Vs.
- My claim is not that "the alphabet explains the phoneme" but that much of the intuitive power of segmental description of language results from our lifelong alphabet training.
Slide 24: Part 3: What is the phonology of a language? What should linguistics study?
- Linguistics should be interested in the patterns of speech of some community of speakers. We can only study distributions of linguistic events.
- A language is the possession of a social group. It is not a structure within the brain of any single individual.
- Linguistics is inherently indeterminate at the level of a single speaker/hearer. I propose there is an uncertainty principle here:
- The detailed representation of linguistic skill in an individual will not reveal the global properties of the social institution.
- And conversely,
- The properties of language as a social institution will not be found in the form of linguistic skill within a speaker/hearer.
Slide 25: Phonology as a social institution
- Phonological facts are true of many people. They are patterns or categories in behavior. There is a tendency toward discreteness and combination of features.
- This structure is an emergent property. Not all causal factors are understood yet.
- Phonological categories are similar to everyday categories (like chair, house, dog, etc.). In learning about the world, we acquire chunks (Grossberg & Myers, 2001) for recognizing speech. We learn them from imitating each other.
- Often there are fairly discrete types.
- An approximate list of vowels: i, e, a, o
- An approximate list of onsets: b-, br-, bl-, pl-, ...
- Consonant types
- Syllable inventories
- Rhythmic patterns and styles
- These are culture specific, have vague membership, depend on a huge set of features, etc.
- Empirical questions that cannot be answered yet:
- What is an appropriate descriptive vocabulary for the speech of various communities, and what statistical description?
- Why do phonological categories evolve in the direction of an ideal symbol system? (E.g., why do alphabets often work so well?)
Slide 26: Phonology as Social Institution -- not as Speakers' Knowledge
- Phonology should be about the phonological social institution.
- Variation is inherent, so the formal tools must be statistics for describing distributions.
- Linguistic methods of the future will resemble those of sociolinguistics (e.g., Labov) and speech technology (e.g., Jurafsky).
- The individual's brain is shaped by personal history. It seems less interesting.
- We thought linguistics could study both domains at once, personal and social.
- The explanation for community regularity was in individuals' grammars.
- This proposal for episodic memory for language demands further evidence.
- There are many kinds of evidence in its support.
Slide 27: Converging evidence for rich linguistic memory
- Phonetics Research
- 1. No limit to the phonetic parameters that speakers can manipulate (contra Chomsky-Halle). No evidence of a fixed, a priori phonetic space (Port and Leary, 2005).
- 2. Temporal detail is controlled by speakers and used by hearers (Klatt, 1976; Lehiste, 1976).
- 3. Incomplete neutralization, as in "budding" vs. "butting" (Port). Pierrehumbert (2002) has proposed a distribution-based model for such semi-contrasts.
- Sound Variation and Change
- Most sounds change continuously in minute stages (Labov, Bybee, Phillips).
- No evidence of discreteness.
- Speakers choose from a huge range of potential pronunciations (Hualde).
- These choices demand detailed representations.
- Frequency of Occurrence Effects
- Frequent vocabulary is the locus of many sound changes.
- Frequency influences pronunciation details (e.g., Bybee).
- Frequency influences speed of perception, recognition, etc.
Slide 28: Overall Conclusions
- Human memory is very rich. Memory for words is far richer than we thought. Minimal, efficient coding using only distinctive features has marginal effects.
- With rich memory, abstractions are computed as patterns of activity.
- With rich memory, traditional linguistic categories seem to be socially defined features for describing and comparing utterances. They are real, but not central to linguistic cognition.
- Linguistics is (or should be) concerned with regularities across some group of speaker/hearers, not with rules and abstract symbols.