Statistical Frequency in Word Segmentation - PowerPoint PPT Presentation

Slides: 31
Provided by: Kam82
Learn more at: http://www2.hawaii.edu
Transcript and Presenter's Notes

Title: Statistical Frequency in Word Segmentation


1
Statistical Frequency in Word Segmentation
2
Words don't come with nice, clean boundaries
between them
  • Where are the word boundaries?

3
Question: How do children work out where the word
boundaries are?
There are several potential clues:
  • Pauses (although this is dubious)
  • Intonation (this too is dubious)
  • Statistical regularities

4
Statistical Regularities
  • Words very rarely begin with dw,
  • Words never begin with bn,
  • Words never begin with lb,
  • Etc.
  • So if the child hears these sequences, the child
    hypothesizes the sequence occurred in the middle
    or at the end of the word.

5
Statistical Regularities
  • Voiceless stops that begin words are almost
    always aspirated,
  • Voiced segments that end words are often
    de-voiced,
  • Various other phonological processes may occur,
    e.g., word-final frication, etc.
  • So these are phonological clues that may help
    segment the speech stream.

6
Problem
  • In order for children to be able to make use of
    these cues, they must be able to track the
    frequency of such items in the speech, otherwise
    it is a useless cue.
  • So if the child is not able to track the
    frequency of bn at the beginning of words, what
    use is using this strategy?

7
Statistical Tracking
  • Very recent work suggests that children do in
    fact have the capacity to track statistical
    frequencies of certain elements in their
    environment.
  • Major researchers: Jenny Saffran (Wisconsin),
    Rebecca Gomez (Arizona), Elissa Newport
    (Rochester), Richard Aslin (Rochester), LouAnn
    Gerken (Arizona), Gary Marcus (NYU), etc.

8
The Experiment - Overview
  • Create a synthesized string of syllables that
    occur with a particular frequency (can't use
    English).
  • Expose the children to this string of syllables
    for 20 minutes.
  • Test children to see if they have a preference
    for the highly frequent syllable sets or the rare
    syllable sets.
  • If children show a preference (no matter what
    direction that preference is in), then children
    are sensitive to frequencies of syllables in the
    input.

9
Sample Stimulus
  • Their language consisted of:
  • Four consonants (p, t, b, d)
  • Three vowels (a, i, u)
  • Which, when combined, created 12 syllables (pa, ti,
    bu, da, etc.).
  • These then created six words:
  • babupu, bupada, dutaba, patubi, pidabu, and
    tutibu
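
The inventory above can be sketched in a few lines of Python. This is only an illustration: the names syllable_inventory and split_syllables are hypothetical helpers, and the assumption that every syllable is exactly one consonant plus one vowel comes from the slide's examples.

```python
from itertools import product

# Sound inventory and word list as given on the slide.
CONSONANTS = ["p", "t", "b", "d"]
VOWELS = ["a", "i", "u"]
WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def syllable_inventory(consonants, vowels):
    """Every consonant-vowel syllable the inventory allows (hypothetical helper)."""
    return [c + v for c, v in product(consonants, vowels)]


def split_syllables(word):
    """Split a word into two-letter CV syllables (assumes strict CV structure)."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


SYLLABLES = syllable_inventory(CONSONANTS, VOWELS)
print(len(SYLLABLES))                         # 4 consonants x 3 vowels
print([split_syllables(w) for w in WORDS])
```

Each of the six words splits into three syllables drawn from this inventory, although not every possible syllable is used in a word.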

10
(No Transcript)
11
Transitional Probabilities
  • The chances of a word containing bu are much
    greater than the chances of a word containing di.
  • Transitional probabilities quantify this.
  • The transitional probability of the sequence xy is
    (frequency of xy) / (frequency of x)

12
Transitional Probabilities
  • So for the word babupu, the transitional
    probability of babu is calculated as follows:

Frequency of babu / Frequency of ba = 1/2 = 0.5
Frequency of bupu / Frequency of bu = 1/4 = 0.25
Overall transitional probability of the word
babupu = (0.5 + 0.25) / 2 = 0.375
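
The arithmetic for babupu can be checked with a minimal sketch. It assumes, as the slide's figures imply, that frequencies are counted over the six-word list with each word counted once, and that a syllable counts wherever it appears in a word. The function names are illustrative, not taken from the original study.

```python
from collections import Counter

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def split_syllables(word):
    """Split a word into two-letter CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def lexicon_counts(words):
    """Count syllables and within-word syllable pairs across the word list."""
    syllables, pairs = Counter(), Counter()
    for word in words:
        parts = split_syllables(word)
        syllables.update(parts)
        pairs.update(zip(parts, parts[1:]))
    return syllables, pairs


def transitional_probability(x, y, syllables, pairs):
    """Frequency of xy divided by frequency of x."""
    return pairs[(x, y)] / syllables[x]


def word_score(word, syllables, pairs):
    """Mean of the transitional probabilities inside one word."""
    parts = split_syllables(word)
    tps = [transitional_probability(x, y, syllables, pairs)
           for x, y in zip(parts, parts[1:])]
    return sum(tps) / len(tps)


syllables, pairs = lexicon_counts(WORDS)
print(transitional_probability("ba", "bu", syllables, pairs))  # 1/2
print(transitional_probability("bu", "pu", syllables, pairs))  # 1/4
print(word_score("babupu", syllables, pairs))                  # (0.5 + 0.25) / 2
```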
13
What's the point?
  • Transitional probability was manipulated so that
    it was high within a word but low across a word
    boundary. This is what a word IS in real life.

14
  • ba bu pu bu pa da du ta ba

15
  • 300 tokens of each of the six words were randomly
    concatenated.
  • All word boundaries were removed
  • This left 4536 continuous syllables, which were
    read by a speech synthesizer.
  • Synthesizer produced a monotone of syllables at a
    rate of 216 syllables per minute.
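
A rough simulation of this kind of stream shows why transitional probability marks word boundaries. It is a sketch under stated assumptions: the tokens are simply shuffled, the seed is arbitrary, and build_stream and mean_tp_by_position are hypothetical names. Note that this naive construction yields 5400 syllables for 300 tokens per word, not the 4536 quoted above, so the original procedure presumably differed in some detail not given here.

```python
import random
from collections import Counter

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def split_syllables(word):
    """Split a word into two-letter CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def build_stream(words, tokens_per_word=300, seed=0):
    """Shuffle the word tokens and flatten them into one boundary-free syllable list."""
    tokens = [w for w in words for _ in range(tokens_per_word)]
    random.Random(seed).shuffle(tokens)
    return [s for w in tokens for s in split_syllables(w)]


def mean_tp_by_position(stream, word_len=3):
    """Average transitional probability within words and across word boundaries."""
    first = Counter(stream[:-1])                 # how often each syllable starts a pair
    pairs = Counter(zip(stream, stream[1:]))     # how often each adjacent pair occurs
    within, across = [], []
    for i in range(len(stream) - 1):
        tp = pairs[(stream[i], stream[i + 1])] / first[stream[i]]
        # Every third transition crosses a word boundary, since all words have 3 syllables.
        (across if (i + 1) % word_len == 0 else within).append(tp)
    return sum(within) / len(within), sum(across) / len(across)


stream = build_stream(WORDS)
within, across = mean_tp_by_position(stream)
print(len(stream), round(within, 2), round(across, 2))
```

In this simulation the average within-word transitional probability comes out well above the average across boundaries, which is the contrast the listener is assumed to pick up on.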

16
Procedures
  • Subjects consisted of 24 undergraduate students.
  • Subjects were told to listen to a nonsense
    language.
  • Their task was to figure out where words begin
    and end.
  • After 3 blocks of 7 minutes of exposure to the
    language, subjects were tested.

17
Test Procedure
  • Subjects heard two tri-syllabic strings, e.g.,
    bu-pa-da and pi-da-bu
  • One string was a real word; the other was not a
    real word
  • Question: Which sounds more like a word from this
    nonsense language?
  • 36 trials in the test.
18
Results
  • Mean score for all subjects was 27.2 correct out
    of 36, where chance is 18. A t-test shows this to
    be significantly different from chance.
  • Conclusion: adults are able to recognize what is
    a word and what is not a word based purely on
    statistical frequency.

19
  • Additional finding:
  • The three words with the most common syllables in
    them were easiest to recognize.
  • The three words with the least common syllables
    in them were hardest to recognize.

20
But can kids do this too?
  • The answer appears to be yes.
  • Saffran et al. (1996) used essentially the same
    stimuli on 8-month-old children
  • Used four words instead of six.
  • Children were exposed for only 2 minutes (not 21
    minutes)

21
Methodology
  • Head-turning procedure
  • [Diagram: child, light, speakers]
22
Results
  • Children looked significantly longer at the
    speaker from which novel words were being
    produced.
  • Why is this? Why wouldn't they look longer at
    the speaker from which familiar words are being
    produced?

23
Bottom Line
  • Children have the ability to track transitional
    probabilities of sounds on the basis of very
    little exposure.
  • This is therefore how words are parsed.

24
Tool against Nativism?
  • This has recently been the most prolific weapon
    against the idea that children use innate
    knowledge to acquire language.
  • If children are using such sophisticated skills
    to segment words, why can't they use similar
    (non-linguistic) skills to learn syntax?

25
But it isn't so simple
  • Marcus et al. (1999) trained children on
    sentences of the following sort:
  • la ta la
  • ga na ga
  • da ba da
  • x y x

26
And tested them on
  • wo fe wo
  • gi tu gi
  • po zi po
  • Namely, words with new syllables, but the same
    structure (x-y-x)

And
  • wo fe fe
  • gi tu tu
  • po zi zi
  • Namely, words with new syllables and a new
    structure (x-y-y)
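
The contrast between the two test sets can be made concrete with a small sketch. The function abstract_pattern is a hypothetical helper, not part of the original study; it relabels syllables by order of first appearance so that only the structure of the string remains.

```python
TRAINING = ["la ta la", "ga na ga", "da ba da"]
TEST_SAME = ["wo fe wo", "gi tu gi", "po zi po"]
TEST_NEW = ["wo fe fe", "gi tu tu", "po zi zi"]


def abstract_pattern(sentence, names="xyz"):
    """Replace each distinct syllable with a variable name, in order of first appearance."""
    labels = {}
    pattern = []
    for syllable in sentence.split():
        if syllable not in labels:
            if len(labels) >= len(names):
                raise ValueError("more distinct syllables than variable names")
            labels[syllable] = names[len(labels)]
        pattern.append(labels[syllable])
    return "-".join(pattern)


def syllable_set(sentences):
    """All distinct syllables used in a list of sentences."""
    return {s for sentence in sentences for s in sentence.split()}


print({abstract_pattern(s) for s in TRAINING})    # structure of the training items
print({abstract_pattern(s) for s in TEST_SAME})   # same structure, new syllables
print({abstract_pattern(s) for s in TEST_NEW})    # new structure, new syllables
print(syllable_set(TRAINING) & syllable_set(TEST_SAME + TEST_NEW))  # shared syllables
```

Because the test items share no syllables with the training items, counts of which syllable followed which during training say nothing about either test set. Only the abstract pattern separates them.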

27
Results
  • Children appear to recognize the difference
    between these sets of stimuli
  • → Children are therefore tracking structure and
    not just simple statistics.

28
Questions to ask yourself
  • Why would statistical tracking be useful to
    linguists?
  • As a tool to explain language acquisition.
  • Does statistical tracking explain how children
    acquire language?
  • No, only certain aspects of it.
  • What aspects of language can we track?

→ So far, it appears only phonologically related
things can be tracked like this (not
meaning-related things).
29
Most Important Questions
  • Is this useful for ALL languages on Earth?
  • → It appears that statistical tracking is only
    useful for auditory stimuli, not visual. ASL?
  • Are humans the only creatures that can do this?
    (I hope so, otherwise other animals should have
    language too)

→ No. Vervet and tamarin monkeys have been shown
to have essentially the same abilities that
humans do.
30
So what do we really know?
  • Kids have spectacular abilities to track
    statistics.
  • But so do adults (so why can't adults learn
    languages as well as kids?)
  • But so do monkeys (so why can't monkeys learn
    language as well as humans?)
  • This ability appears to be limited to statistics
    in auditory perception.