Title: Statistical Frequency in Word Segmentation
1Statistical Frequency in Word Segmentation
2Words don't come with nice clean boundaries
between them
- Where are the word boundaries?
3Question: How do children work out where the word
boundaries are?
There are several potential clues:
- Pauses (although this is dubious)
- Intonation (this too is dubious)
4Statistical Regularities
- Words very rarely begin with dw,
- Words never begin with bn,
- Words never begin with lb,
- Etc.
- So if the child hears these sequences, the child
hypothesizes the sequence occurred in the middle
or at the end of the word.
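The inference above can be sketched in Python. This is only an illustration, not a model of what children actually do: spelling stands in for sound, `ILLEGAL_ONSETS` holds just the two "never" clusters from the slide, and the function name and the example string `clubnight` are made up for the sketch.

```python
# Clusters the slide says never begin a word (letters stand in for sounds).
ILLEGAL_ONSETS = {"bn", "lb"}


def boundary_allowed_before(stream, i):
    """Return False if a word starting at position i would begin with an illegal cluster."""
    return stream[i:i + 2] not in ILLEGAL_ONSETS


# Hypothetical unsegmented input: "club night" with the boundary removed.
stream = "clubnight"

# Positions where a word boundary is still possible under this one cue.
candidates = [i for i in range(1, len(stream)) if boundary_allowed_before(stream, i)]
print(candidates)
```

The cue only rules positions out: no word can start at "bn", so that sequence must sit in the middle of a word or span a boundary (as in club|night). It does not by itself pick the correct boundary.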
5Statistical Regularities
- Voiceless stops that begin words are almost
always aspirated,
- Voiced segments that end words are often
de-voiced,
- Various other phonological processes may occur,
e.g., word-final frication, etc.
- So these are phonological clues that may help
segment the speech stream.
6Problem
- In order for children to make use of these cues,
they must be able to track the frequency of such
items in speech; otherwise the cues are useless.
- So if the child cannot track the frequency of bn
at the beginning of words, what use is this
strategy?
7Statistical Tracking
- Very recent work suggests that children do in
fact have the capacity to track statistical
frequencies of certain elements in their
environment.
- Major researchers: Jenny Saffran (Wisconsin),
Rebecca Gomez (Arizona), Elisa Newport
(Rochester), Richard Aslin (Rochester), LouAnn
Gerken (Arizona), Gary Marcus (NYU), etc.
8The Experiment - Overview
- Create a synthesized string of syllables that
occur with a particular frequency (can't use
English).
- Expose the children to this string of syllables
for 20 minutes.
- Test children to see if they have a preference
for the highly frequent syllable sets or the rare
syllable sets.
- If children show a preference (no matter what
direction that preference is in), then children
are sensitive to frequencies of syllables in the
input.
9Sample Stimulus
- Their language consisted of:
- Four consonants (p, t, b, d) and three vowels (a, i, u)
- Which when combined created 12 syllables (pa, ti,
bu, da, etc.).
- These then created six words:
- babupu, bupada, dutaba, patubi, pidabu, and
tutibu
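As a quick check on the syllable count, the inventory can be enumerated in Python. This is a sketch under one assumption: the three vowels (a, i, u) are read off the syllables and words listed on the slide. The helper name `syllabify` is made up for the illustration.

```python
from itertools import product

CONSONANTS = ["p", "t", "b", "d"]  # from the slide
VOWELS = ["a", "i", "u"]           # assumption: inferred from the listed syllables and words

# Every consonant-vowel combination is a possible syllable.
SYLLABLES = {c + v for c, v in product(CONSONANTS, VOWELS)}

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def syllabify(word):
    """Split a word into two-letter consonant-vowel syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


used = {s for w in WORDS for s in syllabify(w)}
print(len(SYLLABLES))     # 12
print(used <= SYLLABLES)  # True: every word is built from the inventory
```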
11Transitional Probabilities
- The chances of a word containing bu are much
greater than the chances of a word containing di.
- Transitional probabilities quantify this.
- The transitional probability of xy is
- frequency of xy / frequency of x
12Transitional Probabilities
- So for the word babupu, the transitional
probabilities are calculated as follows:
- Frequency of babu / Frequency of ba = 1/2 = 0.5
- Frequency of bupu / Frequency of bu = 1/4 = 0.25
- Overall transitional probability of the word
babupu = (0.5 + 0.25) / 2 = 0.375
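The calculation for babupu can be reproduced with a short Python sketch. It assumes, as the slide's numbers do, that frequencies are counted over the six words with each word counted once; the function names `tp` and `word_tp` are made up for the illustration.

```python
from collections import Counter

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def syllabify(word):
    """Split a word into two-letter consonant-vowel syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


syllable_counts = Counter()
pair_counts = Counter()
for word in WORDS:
    sylls = syllabify(word)
    syllable_counts.update(sylls)
    pair_counts.update(zip(sylls, sylls[1:]))  # adjacent pairs inside the word


def tp(x, y):
    """Transitional probability of xy: frequency of xy / frequency of x."""
    return pair_counts[(x, y)] / syllable_counts[x]


def word_tp(word):
    """Average of the transitional probabilities inside one word."""
    sylls = syllabify(word)
    values = [tp(x, y) for x, y in zip(sylls, sylls[1:])]
    return sum(values) / len(values)


print(tp("ba", "bu"))     # 0.5
print(tp("bu", "pu"))     # 0.25
print(word_tp("babupu"))  # 0.375
```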
13What's the point?
- Transitional probability was manipulated so that
- The transitional probability was high within a
word, but low across a word boundary. This is
what a word IS in real life.
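The contrast between within-word and across-boundary transitions can be illustrated with a small simulation. This is a sketch, not the original stimulus script: it shuffles equal numbers of the six words into one stream (the seed and the helper names are arbitrary choices for the illustration) and averages the transitional probabilities at each kind of position.

```python
import random
from collections import Counter

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def syllabify(word):
    """Split a word into two-letter consonant-vowel syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def build_stream(tokens_per_word, seed=0):
    """Randomly concatenate the words into one boundary-free syllable stream."""
    rng = random.Random(seed)
    tokens = WORDS * tokens_per_word
    rng.shuffle(tokens)
    return [s for word in tokens for s in syllabify(word)]


def mean_tps(stream):
    """Return (mean within-word TP, mean across-boundary TP) for the stream."""
    firsts = Counter(stream[:-1])            # syllables that have a successor
    pairs = Counter(zip(stream, stream[1:]))
    within, across = [], []
    for i, (x, y) in enumerate(zip(stream, stream[1:])):
        value = pairs[(x, y)] / firsts[x]
        # Every word has three syllables, so a boundary falls after every third one.
        (across if (i + 1) % 3 == 0 else within).append(value)
    return sum(within) / len(within), sum(across) / len(across)


within, across = mean_tps(build_stream(tokens_per_word=300))
print(within > across)  # True: transitions inside words are more predictable
```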
14- ba bu pu bu pa da du ta ba
15- 300 tokens of each of the six words were randomly
concatenated.
- All word boundaries were removed
- This left 4536 continuous syllables, which were
read by a speech synthesizer.
- Synthesizer produced a monotone of syllables at a
rate of 216 syllables per minute.
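The exposure time follows directly from the two figures on the slide. A small worked calculation in Python (the variable names are made up for the illustration):

```python
SYLLABLES_IN_STREAM = 4536   # from the slide
SYLLABLES_PER_MINUTE = 216   # synthesizer rate, from the slide

minutes = SYLLABLES_IN_STREAM / SYLLABLES_PER_MINUTE
seconds_per_syllable = 60 / SYLLABLES_PER_MINUTE

print(minutes)  # 21.0
print(round(seconds_per_syllable, 3))
```

The result matches the 21 minutes of exposure (3 blocks of 7 minutes) given for the adult subjects.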
16Procedures
- Subjects consisted of 24 undergraduate students.
- Subjects were told to listen to a nonsense
language.
- Their task was to figure out where words begin and end.
- After 3 blocks of 7 minutes of exposure to the
language, subjects were tested.
17Test Procedure
- Subjects heard two tri-syllabic strings: one a
real word, the other not a real word.
- Question: Which sounds more like a word from this
nonsense language?
- 36 trials in the test.
18Results
- Mean score correct for all subjects was 27.2 out
of 36, where chance is 18. A t-test shows this to
be significantly different from chance.
- Conclusion: adults are able to recognize what is
a word and what is not a word based purely on
statistical frequency.
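The chance level and the comparison against it can be sketched as follows. The trial count and the mean are from the slides; the individual scores in `hypothetical_scores` are invented placeholders (the slides report no per-subject data), so the resulting t value is purely illustrative.

```python
from math import sqrt
from statistics import mean, stdev

TRIALS = 36            # from the slide
CHANCE = TRIALS / 2    # two-alternative choice, so guessing gets half right
MEAN_CORRECT = 27.2    # from the slide

proportion_correct = MEAN_CORRECT / TRIALS
print(CHANCE)                        # 18.0
print(round(proportion_correct, 3))  # 0.756


def one_sample_t(scores, mu):
    """t statistic for testing whether the mean of scores differs from mu."""
    n = len(scores)
    return (mean(scores) - mu) / (stdev(scores) / sqrt(n))


# HYPOTHETICAL scores, invented only to show the call; not data from the study.
hypothetical_scores = [20, 25, 30, 27, 33, 28]
t = one_sample_t(hypothetical_scores, CHANCE)
print(t > 0)  # True: these made-up scores sit above chance
```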
19- The three words with the most common syllables in
them were easiest to recognize.
- The three words with the least common syllables
in them were hardest to recognize.
20But can kids do this too?
- The answer appears to be yes.
- Saffran et al. (1996) used essentially the same
stimuli on 8-month-old children.
- Used four words instead of six.
- Children were exposed for only 2 minutes (not 21
minutes).
21Methodology
[Figure: test setup with the child, a light, and speakers]
22Results
- Children looked significantly longer at the
speaker from which novel words were being
produced.
- Why is this? Why wouldn't they look longer at
the speaker from which familiar words are being
produced?
23Bottom Line
- Children have the ability to track transitional
probabilities of sounds on the basis of very
little exposure.
- This is therefore how words are parsed.
24Tool against Nativism?
- This has recently been the most prolific weapon
against the idea that children use innate
knowledge to acquire language.
- If children are using such sophisticated skills
to segment words, why can't they use similar
(non-linguistic) skills to learn syntax?
25But it isn't so simple
- Marcus et al. (1999) trained children on
sentences of the following sort:
- la ta la
- ga na ga
- da ba da
26And tested them on:
- wo fe wo
- gi tu gi
- po zi po
- Namely, words with
- new syllables, but
- the same structure (x-y-x)
And
- wo fe fe
- gi tu tu
- po zi zi
- Namely, words with
- new syllables, and
- new structure (x-y-y)
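The design can be made concrete with a short Python sketch using the example sentences from the slides. The function names are made up for the illustration. It shows two things: the test items share no syllables with the training items, so syllable-level statistics from training cannot tell the two test sets apart, while a check on repetition structure can.

```python
TRAINING = ["la ta la", "ga na ga", "da ba da"]
TEST_SAME_STRUCTURE = ["wo fe wo", "gi tu gi", "po zi po"]
TEST_NEW_STRUCTURE = ["wo fe fe", "gi tu tu", "po zi zi"]


def pattern(sentence):
    """Classify a three-syllable sentence by its repetition structure."""
    first, second, third = sentence.split()
    if first == third and first != second:
        return "x-y-x"
    if second == third and first != second:
        return "x-y-y"
    return "other"


def syllable_set(sentences):
    """Collect every distinct syllable used in a list of sentences."""
    return {s for sentence in sentences for s in sentence.split()}


# No syllable heard in training reappears at test.
overlap = syllable_set(TRAINING) & syllable_set(TEST_SAME_STRUCTURE + TEST_NEW_STRUCTURE)
print(overlap)  # set()

print([pattern(s) for s in TEST_SAME_STRUCTURE])  # ['x-y-x', 'x-y-x', 'x-y-x']
print([pattern(s) for s in TEST_NEW_STRUCTURE])   # ['x-y-y', 'x-y-y', 'x-y-y']
```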
27Results
- Children appear to recognize the difference
between these sets of stimuli.
- Children are therefore tracking structure and
not just simple statistics.
28Questions to ask yourself
- Why would statistical tracking be useful to
linguists?
- As a tool to explain language acquisition.
- Does statistical tracking explain how children
acquire language?
- No, only certain aspects of it.
- What aspects of language can we track?
- So far, it appears only phonologically related
things can be tracked like this (not
meaning-related things).
29Most Important Questions
- Is this useful for ALL languages on Earth?
- It appears that statistical tracking is only
useful for auditory stimuli, not visual ones (ASL?).
- Are humans the only creatures that can do this?
(I hope so, otherwise other animals should have
language too)
- No. Vervet and tamarin monkeys have been shown
to have essentially the same abilities that
humans do.
30So what do we really know?
- Kids have spectacular abilities to track
statistics.
- But so do adults (so why can't adults learn
languages as well as kids?)
- But so do monkeys (so why can't monkeys learn
language as well as humans?)
- This ability appears to be limited to statistics
in auditory perception.