
Title: Words in puddles of sound

1
Words in puddles of sound
Padraic Monaghan, University of York
Morten Christiansen, Cornell University
2

Words in a sea of sound (Saffran, 2001)
Discovering words
from continuous speech
with no reliable cues to word boundaries (Jones, 1918; Liberman et al., 1967)
where words are realised variably (Pollack & Pickett, 1964)

Segmentation and sublexical cues
Final syllables of words are longer (Klatt, 1975)
hamster v. ham (Saffran, Newport, & Aslin, 1996; Salverda & McQueen, 2004)
First syllables of words are stressed 60% of the time in English (Crystal & House, 1990; Pierrehumbert, 1981)
Johnson & Jusczyk (2001); Thiessen & Saffran (2003)
Certain diphones are more likely to occur across words than within words (Mattys et al., 2005)

Multiple cues in speech segmentation
Hierarchical model (Mattys, White, & Melhorn, 2005)

Puddles
whosalovelybabyyesyouareyourealovelybabyarentyouyesyouare
In 5.5M words of child-directed speech

Lexical approach to segmentation
Once you've got the words, segmentation is easy (Norris, 1994, 2007)
Assume each utterance is a word until you know differently
if it's repeated, you keep it
if it doesn't occur again, you lose it

Aims of Modelling
Utterances can't be used, as we don't know when one is a single word and when it is multiple words (Brent & Cartwright, 1996)
utterance boundaries are sufficient to get started
single-word utterances are useful anchors for segmentation
It is possible to distinguish (most) single-word from (most) multiple-word utterances
Proper nouns have a special role
Frequent multiple-word sequences will be
lexicalised (Tomasello, 2001)

Lexical approach to segmentation
Familiar words used for segmentation by Maggie (Bortfeld et al., 2005)
Maggie's bike had big, black wheels
Hannah's cup was bright and shiny
infants familiarised to bike more quickly than cup
Proper nouns often occur as single utterances
3.3% of utterances in the Naomi corpus in CHILDES
Very high-frequency words are useful for categorising content words (Monaghan, Chater, & Christiansen, 2005; Redington, Chater, & Finch, 1998)

Corpora
6 corpora from CHILDES
child-directed speech to children aged < 2;6
Orthographic transcription run through the Festival speech synthesiser (Black et al., 1990)

10
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: (empty)
11
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: (empty)
12
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 1.0
13
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 0.99
14
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 0.99
15
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 1.99, thatsright 1.00
16
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 1.98, thatsright 0.99
17
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 2.98, thatsright 0.99
18
The model
kitty thatsrightkitty kitty sayitagain lookkitty
LEXICON: kitty 3.96, thatsright 0.97, sayitagain 0.99, look 1.00
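
The lexicon values on the model slides are consistent with a simple update rule: every stored word loses 0.01 before each new utterance, and every word found in an utterance (or left over once known words are cut out) gains 1. The Python sketch below is one reading of that rule, not the authors' implementation. The names `segment` and `learn`, the longest-first matching, and dropping a word once its activation reaches zero are assumptions made for illustration.

```python
def segment(utterance, lexicon):
    """Split an utterance around known words; unknown stretches stay whole."""
    if not utterance:
        return []
    # Assumption: try known words longest-first and cut at the first match.
    for word in sorted(lexicon, key=len, reverse=True):
        idx = utterance.find(word)
        if idx != -1:
            left = utterance[:idx]
            right = utterance[idx + len(word):]
            return segment(left, lexicon) + [word] + segment(right, lexicon)
    # No known word inside: treat the whole stretch as one candidate word.
    return [utterance]


def learn(utterances, decay=0.01):
    """Build a lexicon of word -> activation from unsegmented utterances."""
    lexicon = {}
    for utterance in utterances:
        # Every stored word decays a little before the next utterance.
        for word in list(lexicon):
            lexicon[word] -= decay
            if lexicon[word] <= 0:  # assumption: forgotten at zero activation
                del lexicon[word]
        # Known words are reinforced; leftover stretches enter at 1.0.
        for chunk in segment(utterance, lexicon):
            lexicon[chunk] = lexicon.get(chunk, 0.0) + 1.0
    return lexicon


utterances = ["kitty", "thatsrightkitty", "kitty", "sayitagain", "lookkitty"]
lexicon = learn(utterances)
print({word: round(act, 2) for word, act in lexicon.items()})
```

Run on the five example utterances, this gives kitty 3.96, thatsright 0.97, sayitagain 0.99 and look 1.0, matching the last lexicon shown on the slides.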
19
More constraints in the model: Phonological glue
oh okay noway nevertheless
LEXICON: oh, kay, n, way, evertheless
Candidate words with recognised beginnings and endings are admitted
Candidate words which divide a recognised word-internal diphone are rejected
20
More constraints in the model: Phonological glue
oh okay no nevertheless
GLUE Beg: oh; End: oh; Glue: oh
LEXICON: oh
21
More constraints in the model: Phonological glue
oh okay no nevertheless
GLUE Beg: oh; End: oh; Glue: oh
LEXICON: oh
22
More constraints in the model: Phonological glue
oh okay no nevertheless
GLUE Beg: oh; End: oh; Glue: oh
LEXICON: oh
x
ka? oh? ok?
23
More constraints in the model: Phonological glue
oh okay no nevertheless
GLUE Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay
LEXICON: oh, okay
24
More constraints in the model: Phonological glue
oh okay no nevertheless
GLUE Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay
LEXICON: oh, okay, no, nevertheless
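
The two glue constraints can be sketched as a check run before a known word is cut out of an utterance. The slides do not say exactly how beginnings, endings and internal diphones are represented, so everything here is an assumption: the class name `Glue`, the use of single symbols for word edges, adjacent symbol pairs standing in for diphones, and the toy one-character-per-phoneme transcription ("O" for "oh", "OkA" for "okay", "nO" for "no").

```python
class Glue:
    """Sublexical knowledge collected from words already in the lexicon."""

    def __init__(self):
        self.beginnings = set()  # word-initial symbols seen so far
        self.endings = set()     # word-final symbols seen so far
        self.internal = set()    # adjacent symbol pairs seen inside words

    def add_word(self, word):
        self.beginnings.add(word[0])
        self.endings.add(word[-1])
        self.internal.update(word[i:i + 2] for i in range(len(word) - 1))

    def split_allowed(self, utterance, start, end):
        """May utterance[start:end] be cut out as a known word?"""
        left, right = utterance[:start], utterance[end:]
        # Leftover candidates need a recognised edge where the cut is made.
        if left and left[-1] not in self.endings:
            return False
        if right and right[0] not in self.beginnings:
            return False
        # A cut must not divide a pair known to occur inside a word.
        for cut in (start, end):
            if 0 < cut < len(utterance):
                if utterance[cut - 1:cut + 1] in self.internal:
                    return False
        return True


glue = Glue()
glue.add_word("O")                        # "oh" is learned first
print(glue.split_allowed("OkA", 0, 1))    # cutting "oh" out of "okay"
glue.add_word("OkA")                      # so "okay" is stored whole
print(glue.split_allowed("nO", 1, 2))     # cutting "oh" out of "no"
```

Under these assumptions both cuts are refused, so "okay" and "no" enter the lexicon whole instead of leaving fragments such as "kay" and "n".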
25

Testing the model
Decisions
Internal diphone glue constraint: Included
Legal beginnings/endings constraint: Included
Decay-rate: 0
Ordering of lexicon: By length
Accuracy: proportion of words segmented that are words
Completeness: proportion of words that are segmented
Baseline segmentation: correct number of words in utterance, randomly positioned boundaries (Brent & Cartwright, 1996)
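
Accuracy and completeness as defined on this slide can be computed by comparing the words the model proposes with the true words of each utterance. The sketch below is illustrative only. It assumes a proposed word counts as correct when its start and end positions match a true word exactly, and the function names and the two example utterances are hypothetical.

```python
import random


def spans(words):
    """Turn a list of words into a set of (start, end) character spans."""
    result, pos = set(), 0
    for word in words:
        result.add((pos, pos + len(word)))
        pos += len(word)
    return result


def accuracy_completeness(predicted, gold):
    """Accuracy: correct / proposed words. Completeness: correct / true words."""
    hits = proposed = actual = 0
    for pred_words, gold_words in zip(predicted, gold):
        pred_spans, gold_spans = spans(pred_words), spans(gold_words)
        hits += len(pred_spans & gold_spans)
        proposed += len(pred_spans)
        actual += len(gold_spans)
    return hits / proposed, hits / actual


def random_baseline(gold_words, rng):
    """Same number of words as the true parse, boundaries placed at random."""
    text = "".join(gold_words)
    cuts = sorted(rng.sample(range(1, len(text)), len(gold_words) - 1))
    edges = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(edges, edges[1:])]


gold = [["look", "kitty"], ["thats", "right", "kitty"]]
predicted = [["look", "kitty"], ["thatsright", "kitty"]]
print(accuracy_completeness(predicted, gold))

rng = random.Random(0)
baseline = [random_baseline(words, rng) for words in gold]
print(accuracy_completeness(baseline, gold))
```

In the hand-made example, three of the four proposed words are correct (accuracy 0.75) and three of the five true words are found (completeness 0.6).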
26
Results: Accuracy t(5) = 19.637, p < .0001
27
Results: Completeness t(5) = 28.969, p < .0001
28

Results: Naomi's Lexicon
Top 10 after 1K utterances
Nomi
Say
No
Yes
The
Okay
Whatsthis
Blanket
Is
What

Results: Naomi's Lexicon
Top 10 after 8K utterances
You
Nomi
The
It
To
What
I
Thats
No
Your

30
Results: Naomi's Lexicon, 0.05 decay
31
Results: Naomi's Lexicon, 0.01 decay
32

Summary
Model based on puddles of sound
accurate, complete
reliance on proper nouns
frequent words pop out
same words useful for grammatical categorisation
No mechanism for alternative, competing parses of speech
first, cognitively plausible step for how lexicon may be generated
Relative role of phonological glue, legal boundaries, sorting by length/frequency