Words in puddles of sound

1
Words in puddles of sound
Padraic Monaghan, University of York
Morten Christiansen, Cornell University
2
  • Words in a sea of sound (Saffran, 2001)
  • Discovering words
  • from continuous speech
  • with no reliable cues to word
    boundaries (Jones, 1918; Liberman
    et al., 1967)
  • where words are realised variably
    (Pollack & Pickett, 1964)

3
  • Segmentation and sublexical cues
  • Final syllables of words are longer (Klatt,
    1975)
  • hamster v. ham (Saffran, Newport, & Aslin, 1996;
    Salverda & McQueen, 2004)
  • First syllables of words are stressed
  • 60% of the time in English (Crystal & House,
    1990; Pierrehumbert, 1981)
  • Johnson & Jusczyk (2001); Thiessen & Saffran
    (2003)
  • Certain diphones are more likely to occur across
    words than within words (Mattys et al., 2005)
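The diphone cue above is a distributional statistic. As a rough illustration only, the Python sketch below tallies, for each diphone in a word-segmented corpus, the proportion of its occurrences that straddle a word boundary. Treating letters as phonemes, and the function name diphone_boundary_rates, are assumptions of this sketch; it is not the measure used by Mattys et al. (2005).

```python
from collections import Counter


def diphone_boundary_rates(utterances):
    """Proportion of each diphone's occurrences that span a word boundary.

    `utterances` is a list of utterances, each a list of (non-empty) words.
    Letters stand in for phonemes: an assumption of this sketch only.
    """
    within, across = Counter(), Counter()
    for words in utterances:
        # Diphones that lie entirely inside a word.
        for w in words:
            for i in range(len(w) - 1):
                within[w[i:i + 2]] += 1
        # Diphones formed by the last segment of one word and the first of the next.
        for left, right in zip(words, words[1:]):
            across[left[-1] + right[0]] += 1
    return {
        d: across[d] / (within[d] + across[d])
        for d in within.keys() | across.keys()
    }


rates = diphone_boundary_rates([["the", "dog"], ["a", "dog"]])
print(rates["ed"], rates["do"])  # 1.0 0.0
```

Diphones with rates near 1.0 are good candidates for boundary positions; those near 0.0 behave as word-internal.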

4
  • Multiple cues in speech segmentation
  • Hierarchical model (Mattys, White, & Melhorn,
    2005)

5
  • Puddles
  • whosalovelybabyyesyouareyourealovelybabyarentyouyesyouare
  • In 5.5M words of child-directed speech

6
  • Lexical approach to segmentation
  • Once you've got the words, segmentation is easy
    (Norris, 1994, 2007)
  • Assume each utterance is a word
  • until you know differently
  • if it's repeated, you keep it
  • if it doesn't occur again, you lose it

7
  • Aims of Modelling
  • Utterances can't be used directly, as it is not known
    when an utterance is a single word and when it is
    multiple words (Brent & Cartwright, 1996)
  • utterance boundaries are sufficient to get
    started
  • single-word utterances are useful anchors for
    segmentation
  • It is possible to distinguish (most) single-word
    from (most) multiple word utterances
  • Proper nouns have a special role
  • Frequent multiple-word sequences will be
    lexicalised (Tomasello, 2001)

8
  • Lexical approach to segmentation
  • Familiar words are used for segmentation, e.g. Maggie
    (Bortfeld et al., 2005)
  • Maggie's bike had big, black wheels
  • Hannah's cup was bright and shiny
  • infants familiarised to bike more quickly than
    cup
  • Proper nouns often occur as single utterances
  • 3.3% of utterances in the Naomi corpus in CHILDES
  • Very high-frequency words are useful for
    categorising content words (Monaghan, Chater, &
    Christiansen, 2005; Redington, Chater, & Finch,
    1998)

9
  • Corpora
  • 6 corpora from CHILDES
  • child-directed speech to children aged < 26
  • Orthographic transcriptions run through the Festival
    speech synthesiser (Black et al., 1990)

10
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: (empty)
11
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: (empty)
12
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 1.0
13
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 0.99
14
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 0.99
15
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 1.99, thatsright 1.00
16
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 1.98, thatsright 0.99
17
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 2.98, thatsright 0.99
18
The model
Utterances: kitty | thatsrightkitty | kitty | sayitagain | lookkitty
LEXICON: kitty 3.96, thatsright 0.97, sayitagain 0.99, look 1.00
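The lexicon values on slides 12 to 18 are consistent with a simple update rule: at each new utterance every entry first loses 0.01, then known words found in the utterance gain 1.0, and any leftover stretch enters the lexicon at 1.0. The Python sketch below is a minimal reconstruction under that reading and reproduces the slide-18 lexicon. Several details are assumptions, not the authors' stated method: the recursive longest-first search (one reading of "ordering of lexicon: by length" on slide 25), deleting entries whose strength reaches zero, and matching on orthographic strings. The names run_model and segment_utterance are hypothetical, and the phonological constraints introduced on slide 19 are not included.

```python
def segment_utterance(utterance, lexicon, by_length, found, new):
    """Recursively split `utterance` around words already in the lexicon.

    Known words that occur are appended to `found`; stretches containing
    no known word are appended to `new` as candidate words.
    """
    if not utterance:
        return
    if by_length:
        order = sorted(lexicon, key=len, reverse=True)  # longest first
    else:
        order = sorted(lexicon, key=lexicon.get, reverse=True)  # strongest first
    for word in order:
        i = utterance.find(word)
        if i != -1:
            segment_utterance(utterance[:i], lexicon, by_length, found, new)
            found.append(word)
            segment_utterance(utterance[i + len(word):], lexicon, by_length, found, new)
            return
    new.append(utterance)


def run_model(utterances, decay=0.01, by_length=True):
    lexicon = {}
    for utt in utterances:
        # 1. Every entry decays; entries that reach zero are forgotten (assumed).
        for w in list(lexicon):
            lexicon[w] -= decay
            if lexicon[w] <= 0:
                del lexicon[w]
        # 2. Split the utterance around known words.
        found, new = [], []
        segment_utterance(utt, lexicon, by_length, found, new)
        # 3. Recognised words are strengthened; leftovers enter at 1.0.
        for w in found:
            lexicon[w] += 1.0
        for w in new:
            lexicon[w] = lexicon.get(w, 0.0) + 1.0
    return lexicon


lex = run_model(["kitty", "thatsrightkitty", "kitty", "sayitagain", "lookkitty"])
print({w: round(s, 2) for w, s in lex.items()})
# {'kitty': 3.96, 'thatsright': 0.97, 'sayitagain': 0.99, 'look': 1.0}
```

Because leftovers are stored whole, frequent multi-word sequences (such as "thatsright" here) can be lexicalised, in line with the aims on slide 7.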
19
More constraints in the model: phonological glue
Utterances: oh | okay | noway | nevertheless
LEXICON: oh, kay, n, way, evertheless
Candidate words with recognised beginnings and endings are admitted.
Candidate words that would divide a recognised word-internal diphone are rejected.
20
More constraints in the model: phonological glue
Utterances: oh | okay | no | nevertheless
GLUE: Beg: oh; End: oh; Glue: oh
LEXICON: oh
21
More constraints in the model: phonological glue
Utterances: oh | okay | no | nevertheless
GLUE: Beg: oh; End: oh; Glue: oh
LEXICON: oh
22
More constraints in the model: phonological glue
Utterances: oh | okay | no | nevertheless
GLUE: Beg: oh; End: oh; Glue: oh
LEXICON: oh
x
ka? oh? ok?
23
More constraints in the model: phonological glue
Utterances: oh | okay | no | nevertheless
GLUE: Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay
LEXICON: oh, okay
24
More constraints in the model: phonological glue
Utterances: oh | okay | no | nevertheless
GLUE: Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay
LEXICON: oh, okay, no, nevertheless
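The two constraints on slide 19 can be sketched as set lookups. In the Python sketch below, a proposed boundary is rejected if it would split a diphone already seen inside a known word, and a candidate word is admitted only if its beginning and ending have been seen before. The slides do not say exactly how beginnings and endings are defined, so taking the first and last diphone of each known word is an assumption, as is using letters in place of phonemes; all function names are hypothetical.

```python
def diphones(word):
    """All adjacent two-segment sequences in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def learn_constraints(lexicon):
    """Legal beginnings, legal endings and word-internal 'glue' diphones.

    Assumption: a beginning/ending is the first/last diphone of a known word.
    """
    beg, end, glue = set(), set(), set()
    for w in lexicon:
        if len(w) >= 2:
            beg.add(w[:2])
            end.add(w[-2:])
            glue |= diphones(w)
    return beg, end, glue


def boundary_allowed(utterance, i, glue):
    """A boundary before position i must not split a recognised internal diphone."""
    if not 0 < i < len(utterance):
        return True  # utterance edges are always boundaries
    return utterance[i - 1:i + 1] not in glue


def candidate_admitted(candidate, beg, end):
    """A candidate word needs a recognised beginning and a recognised ending."""
    return candidate[:2] in beg and candidate[-2:] in end


beg, end, glue = learn_constraints(["kitty", "look"])
print(boundary_allowed("lookkitty", 4, glue))  # True: "kk" is not glue
print(boundary_allowed("lookkitty", 2, glue))  # False: would split "oo"
print(candidate_admitted("look", beg, end))    # True
```

In a full model these checks would filter the splits proposed by the lexical search, which is how fragments such as "n" and "evertheless" on slide 19 would be blocked.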
25
  • Testing the model
  • Decisions
  • Internal diphone glue constraint
  • Legal beginnings/endings constraint
  • Decay-rate
  • Ordering of lexicon
  • Accuracy: proportion of segmented words that are
    true words
  • Completeness: proportion of true words that are
    segmented
  • Baseline segmentation: correct number of words
    in each utterance, randomly positioned boundaries
    (Brent & Cartwright, 1996)

Settings: glue constraint included; beginnings/endings constraint included; decay-rate 0; lexicon ordered by length
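The accuracy and completeness measures and the random baseline on slide 25 can be made concrete as follows. This Python sketch assumes word tokens are scored by exact span match (a segmented word counts as correct only if both its edges coincide with a true word's edges), which is a common convention but is not spelled out on the slide; the function names are hypothetical.

```python
import random


def spans(words):
    """Convert a list of words into (start, end) character spans."""
    out, pos = [], 0
    for w in words:
        out.append((pos, pos + len(w)))
        pos += len(w)
    return out


def accuracy_completeness(predicted, gold):
    """Accuracy = correct / segmented words; completeness = correct / true words."""
    correct = n_pred = n_gold = 0
    for p, g in zip(predicted, gold):
        ps, gs = set(spans(p)), set(spans(g))
        correct += len(ps & gs)
        n_pred += len(ps)
        n_gold += len(gs)
    return correct / n_pred, correct / n_gold


def random_baseline(gold, rng):
    """Correct number of words per utterance, boundaries placed at random."""
    out = []
    for words in gold:
        utt = "".join(words)
        cuts = sorted(rng.sample(range(1, len(utt)), len(words) - 1))
        bounds = [0] + cuts + [len(utt)]
        out.append([utt[a:b] for a, b in zip(bounds, bounds[1:])])
    return out


gold = [["thats", "right", "kitty"]]
print(accuracy_completeness([["thatsright", "kitty"]], gold))  # (0.5, 0.3333333333333333)
print(random_baseline(gold, random.Random(1)))
```

Giving the baseline the correct number of words per utterance makes it a demanding comparison: it knows something the model has to discover.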
26
Results: accuracy, t(5) = 19.637, p < .0001
27
Results: completeness, t(5) = 28.969, p < .0001
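Five degrees of freedom match six corpora, which is consistent with a paired comparison of model and baseline scores per corpus, although the slides do not name the test. Assuming that design, the statistic would be computed as in this Python sketch (paired_t is a hypothetical name; no per-corpus scores are given in the slides, so none are shown here).

```python
import math
import statistics


def paired_t(model_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for per-corpus score pairs."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

With six corpora, n - 1 = 5, matching the reported t(5).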
28
  • Results: Naomi's lexicon
  • Top 10 after 1K utterances
  • Nomi
  • Say
  • No
  • Yes
  • The
  • Okay
  • Whatsthis
  • Blanket
  • Is
  • What

29
  • Results: Naomi's lexicon
  • Top 10 after 8K utterances
  • You
  • Nomi
  • The
  • It
  • To
  • What
  • I
  • Thats
  • No
  • Your

30
Results: Naomi's lexicon, 0.05 decay
31
Results: Naomi's lexicon, 0.01 decay
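The two decay rates can be compared with a little arithmetic. If decay is subtractive and applied once per utterance, as the worked example on slides 12 to 18 suggests (kitty drops from 1.0 to 0.99, and from 1.99 to 1.98), a word heard once enters at strength 1 and, if it never recurs, has strength s_n after n further utterances. Assuming entries are dropped when their strength reaches zero (the slides do not state the threshold):

```latex
\begin{aligned}
s_n &= 1 - n\,d, \qquad s_n \le 0 \iff n \ge \frac{1}{d} \\
d = 0.05 &\Rightarrow n = 20, \qquad d = 0.01 \Rightarrow n = 100
\end{aligned}
```

So at 0.05 decay a word must recur within about 20 utterances to stay in the lexicon, against about 100 utterances at 0.01.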
32
  • Summary
  • Model based on puddles of sound
  • accurate, complete
  • reliance on proper nouns
  • frequent words pop out
  • the same words are useful for grammatical categorisation
  • No mechanism for alternative, competing parses
    of speech
  • A first, cognitively plausible step towards explaining
    how the lexicon may be generated
  • Relative role of phonological glue, legal
    boundaries, and sorting by length/frequency