Title: Phonology from a computational point of view
1. Phonology from a computational point of view
- Phonemes, dialects, letter-to-sound conversion
- March 2001
2. Phonology
- The study of the sound patterns of languages.
- We will extend this to include the letter
patterns of languages.
3. Levels of representation (diagram)
- Syntax
- Information Retrieval
- Morphology: catch + PAST
- Spelling: caught
- Phonemic representation: K AO1 T
- Sound
4. Why study phonology in this course?
- Text to speech (TTS) applications include a component which converts spelled words to sequences of phonemes (sound representations).
- E.g., sight → S AY1 T
- John → J AA1 N
5. Keep separate
- Spelling (orthography)
- Detailed description of pronunciation
- Abstract description of pronunciation called
phonemic representation
6. Agenda
- Phonology: the set of phonemes and their realizations as phones.
- The phonemes are reasonably constant across a language.
- The phones vary a lot within a speaker and across speakers.
- Some of that variation is extremely rule-governed and must be understood; for example, the English flap (in butter).
7. In addition to the phonemes: syllable structure, and
- Prosody. Today: stress levels 0, 1, 2
- The text's discussion of spelling errors, as a lead-in to Viterbi-ing the Minimum Edit Distance
- Letter to sound (LTS)
8. All speakers have a set of several dozen basic pronunciation units (phonemes) to which they do not add (or from which they delete) during their adult lifetimes. There are 39 phonemes in American English.
- This phonemic inventory is not completely fixed and stable across the United States, but it is much more fixed and stable than is the pronunciation of these phonemes.
9. How is that possible?
- I'm from New York: the vowel that I have in cat is very different from the vowel in a south Chicago native's cat, but the phonemes are the same; they correspond across thousands of words.
10. Phonemic inventory
- In computational circles, the phonemic inventory is described in the DARPAbet.
- Some words from the CMU dictionary (a small loading sketch follows the entries below):
- THE DH AH0
- THE(2) DH AH1
- THE(3) DH IY0
- THEA TH IY1 AH0
- THEALL TH IY1 L
- THEANO TH IY1 N OW0
- THEATER TH IY1 AH0 T ER0
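Below is a minimal sketch (not from the lecture) of how a pronunciation lexicon in this CMU format can be loaded and queried in Python; the file name is hypothetical, and variant entries such as THE(2) are folded under the base word.

```python
import re
from collections import defaultdict

def load_cmudict(path):
    """Read a CMU-style dictionary: WORD  PH0 PH1 ... (variants listed as WORD(2), WORD(3))."""
    lexicon = defaultdict(list)
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip() or line.startswith(";;;"):   # skip blank and comment lines
                continue
            head, *phones = line.split()
            word = re.sub(r"\(\d+\)$", "", head)              # THE(2) -> THE
            lexicon[word].append(phones)
    return lexicon

# lex = load_cmudict("cmudict.txt")   # hypothetical file name
# lex["THE"] -> [['DH', 'AH0'], ['DH', 'AH1'], ['DH', 'IY0']]
```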
11. DARPAbet
- AA odd AA D
- AE at AE T
- AH hut HH AH T
- AO ought AO T
- AW cow K AW
- AY hide HH AY D
12. 15 Vowels
- AA odd AA D
- AE at AE T
- AH hut HH AH T
- AO ought AO T
- AW cow K AW
- AY hide HH AY D
- EH Ed EH D
- ER hurt HH ER T
- EY ate EY T
- IH it IH T
- IY eat IY T
- OW oat OW T
- OY toy T OY
- UH hood HH UH D
- UW two T UW
13. 24 Consonants
- Z zee Z IY
- ZH seizure S IY ZH ER
- HH he HH IY
- CH cheese CH IY Z
- JH gee JH IY
- L lee L IY
- M me M IY
- N knee N IY
- NG ping P IY NG
- R read R IY D
- W we W IY
- Y yield Y IY L D
- B be B IY
- D dee D IY
- G green G R IY N
- P pee P IY
- T tea T IY
- K key K IY
- S sea S IY
- SH she SH IY
- F fee F IY
- V vee V IY
- DH thee DH IY
- TH theta TH EY T AH
14. Moby system: http://www.dcs.shef.ac.uk/research/ilash/Moby/
- // sounds like the "a" in "dab"
- /(@)/ sounds like the "a" in "air"
- /A/ sounds like the "a" in "far"
- /eI/ sounds like the "a" in "day"
- /@/ sounds like the "a" in "ado"
- or the glide "e" in "system" (diphthong schwa)
- /-/ sounds like the "ir" glide in "tire"
- or the "dl" glide in "handle"
- or the "den" glide in "sodden" (dipthong little
schwa) - /Oi/ sounds like the "oi" in "oil"
- /A/ sounds like the "o" in "bob"
- /AU/ sounds like the "ow" in "how"
- /O/ sounds like the "o" in "dog"
15. Some sources of dictionaries, including CMU's
- ftp://svr-ftp.eng.cam.ac.uk/pub/pub/pub/comp.speech/dictionaries
16. The tremendous variety of actual pronunciations that native speakers can blissfully ignore is staggering
- But speech recognition systems need to be trained
on this, just as people are in their youth.
17. Varieties of sounds in everyone's speech
- Most phonemes have several different pronunciations (called their allophones), determined by nearby sounds, most usually by the following sound.
- The most striking instance of such variation is in the realization of the phoneme /T/ in American English.
18. We'll return to the flap after the syllable.
19. The syllable
- A syllable (S) divides into an onset and a rhyme; the rhyme divides into a nucleus and a coda.
- Example: help = onset /h/ + nucleus /e/ + coda /lp/.
20. Flap (D) in American English
- We find the flap of water (waDer) under the following conditions, strictly inside a word.
21. But across words
- Word-initial t never flaps, regardless of stresses before or after: eat my tomato, see Topeka...
- Word-final t followed by a vowel-initial word normally does flap, regardless of stresses before or after: at all, sit on it...
- But in the words to, tonight, today, tomorrow, the to- acts as if it were linked to the preceding word: go Do bed.
22. Generalization
- English permits phonemes to belong simultaneously to two syllables (to be ambisyllabic) under certain conditions.
- Ambisyllabic t's convert to flaps.
- Generally speaking:
23. Two syllables, each with an onset and a rhyme: B UH1 T ER (butter). The T is ambisyllabic, shared between the first syllable's rhyme and the second syllable's onset. This is where we get a flap in American English.
24. Within a word
- A C becomes part of the following syllable as its onset ("maximize syllable onset").
25. ...within a word
- A C before a V attaches to that V's syllable as its onset.
26. This also applies across words, in English and in many languages, but not (e.g.) in German: a word-final C can syllabify with a following word-initial V.
27. Within a word, ambisyllabification before an unstressed vowel
- E.g., atom: the C between a stressed V and a following unstressed V belongs to both syllables (it is ambisyllabic).
28. But not across word boundaries
- We don't say my tomato as my Domato.
29. /T/ as flap inside words

                       following stressed           following unstressed
preceding stressed     no flap: Beethoven, attar    flap: matter, cattle
preceding unstressed   no flap: return, Mattel      optional: sanity
30. /T/ as flap at word-edge
- If a word ends in a /t/ and the next word starts with a vowel, a flap is normal: at D all, What D is your name?, etc.
- If a word ends in a vowel and the next word starts with a /t/, there is never a flap, unless the second word starts with the prefix to-!
- the t in tomato, the t in the topology of... remain [t]; but
- go D to the moon, go D tomorrow (these conditions are summarized in the sketch below)
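The flapping conditions of slides 20-30 can be collected into a rough rule sketch. This is an illustration only, not the lecture's own formulation: phonemes are assumed to be in DARPAbet with stress digits, word boundaries are assumed known, and the to- exception is handled with a small hard-coded word list.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

TO_WORDS = {"to", "tonight", "today", "tomorrow"}   # to- acts as if linked to the preceding word

def is_vowel(phone):
    return phone is not None and phone.rstrip("012") in VOWELS

def stressed(phone):
    return phone is not None and phone.endswith(("1", "2"))

def t_flaps(prev_phone, next_phone, word_initial, word_final,
            word=None, prev_word_ends_in_vowel=False):
    """Very rough sketch of the flapping conditions described on slides 20-30."""
    if word_initial:
        # Word-initial t never flaps -- except the to- words after a vowel-final word.
        return word in TO_WORDS and prev_word_ends_in_vowel
    if word_final:
        # Word-final t followed by a vowel-initial word normally flaps.
        return is_vowel(next_phone)
    # Word-internal t: flap between a vowel and a following unstressed vowel.
    return is_vowel(prev_phone) and is_vowel(next_phone) and not stressed(next_phone)

# water: W AO1 [T] ER0 -> flap
print(t_flaps("AO1", "ER0", word_initial=False, word_final=False))   # True
# return: R IH0 [T] ER1 N -> no flap (following vowel is stressed)
print(t_flaps("IH0", "ER1", word_initial=False, word_final=False))   # False
```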
31. Most computational devices avoid worrying about these issues
- by (always) treating phonemes in the context of their left- and right-hand neighbors.
- Need to produce an AE? Find out what neighbors it needs to be produced next to. HH AE T? Find an AE that was produced after an HH and before a T (see the sketch below).
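A tiny sketch of that "phoneme-in-context" idea: store units keyed by (left neighbor, phoneme, right neighbor) and look them up by triphone. The unit names and table contents here are invented for illustration; a real synthesizer would index recorded speech.

```python
# Hypothetical store of recorded units, keyed by triphone context.
units = {
    ("HH", "AE", "T"): "AE_unit_017",    # an AE recorded after HH and before T
    ("K",  "AE", "T"): "AE_unit_042",
}

def pick_unit(left, phone, right):
    """Prefer a unit recorded in exactly this context; otherwise fall back to a default."""
    return units.get((left, phone, right)) or f"default_{phone}"

print(pick_unit("HH", "AE", "T"))   # AE_unit_017
print(pick_unit("B",  "AE", "T"))   # default_AE (no matching context stored)
```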
32. Variation in pronunciation is largely geographical, but it is also related to class, race, and gender
- William Labov is the master analyst of this material, and many papers are available at his web site:
- http://www.ling.upenn.edu/labov/home.html
- See especially his Dialect Diversity in North America:
- http://www.ling.upenn.edu/phono_atlas/ICSLP4.html
33. Ongoing changes in American English pronunciation
- 1. Loss of the difference between AA (cot) and AO (caught). See also hot dog (h AA t d AO g).
- Some speakers produce these vowels differently (I do). Others do not.
- Labov's group has produced the following map.
34. AA / AO distinction/collapse (map)
35. Distinction between the vowels IH and EH before n
- ink-pen versus baby-pin
- The distinction is lost in the South.
36. in/en distinction (pin/pen): map
37. Variation in the AE phoneme (hat)
- A very wide range of American speakers do NOT have the same vowels in sand and sang.
- The vowels in cat and sang are the same, but in sand the vowel is much higher.
- However, in the Northern Cities shift, all AE is pronounced like the last two syllables of idea; this is prevalent right here in the south Chicago area.
39. Sound-letter relationships
- LTS: letter to sound, or
- phoneme-grapheme relationships.
- In most languages, this is simple.
- But in English and in French, it's very messy.
- Why? Because the spelling system in both is based
on how the language used to be pronounced, and
the pronunciation has since changed.
40Other languages
- In most other languages, spelling reflects
current pronunciation much more accurately.
- Stress: most languages don't mark which syllable is stressed. In some languages, there are simple principles that tell us which syllable is stressed, but when there are no such principles (e.g. English, Russian), then you need to build word lists with the stress indicated.
41. Letter to sound for English
- Letter >> phoneme for speech synthesis
- Phoneme >> letter for speech recognition
42. Challenges to Letter-to-Sound
- There are always new words being found, and most
of them are new proper names (people, places,
products, companies, etc.)
43. Damper, Marchand, Adamson and Gustafson 1998: Testing Letter to Sound
- Third ESCA/COCOSDA Workshop on Speech Synthesis, November 1998.
- They contest Liberman and Church's statement in 1991:
- "We will describe algorithms for pronunciation of English words that reduce the error rate to only a few tenths of a percent for ordinary text, about two orders of magnitude better than the word error rates of 15% or so that were common a decade ago."
- They write:
- "In this paper, we have shown that automatic pronunciation of novel words is not a solved problem in TTS synthesis. The best that can be done is about 70% of words correct using PbA (Pronunciation by Analogy); traditional rules perform very badly, much worse than pronunciation by analogy and other data-driven approaches."
44. Damper et al.
- Compare 4 approaches:
- Hand-written phonological rules
- Pronunciation by analogy (based on Dedina and Nusbaum 1991)
- Neural networks (based on Sejnowski and Rosenberg's NETtalk)
- Information theory-based approach (nearest neighbor)
45. How to evaluate LTS?
- Systems typically use:
- a large dictionary
- a set of exceptional words
- a backoff strategy for words that slip through the first 2 steps.
- Is it fair to test the backoff strategy on words in the first two sets, then?
46. Damper et al. propose:
- Test on a single, entire, large dictionary.
- Strict scoring, not frequency-weighted, giving credit only for full-word correctness (a scoring sketch follows below).
- A standardized phoneme output set should be employed.
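A minimal sketch of that strict, unweighted scoring: a word earns credit only if its whole predicted phoneme string matches the reference. The function name and the toy data are illustrative, not from Damper et al.

```python
def word_accuracy(predictions, reference):
    """Strict full-word scoring: every phoneme must match; no partial credit,
    no frequency weighting -- each dictionary word counts once."""
    correct = sum(1 for word, phones in reference.items()
                  if predictions.get(word) == phones)
    return correct / len(reference)

reference   = {"CAT": ["K", "AE1", "T"], "ATOM": ["AE1", "T", "AH0", "M"]}
predictions = {"CAT": ["K", "AE1", "T"], "ATOM": ["AE1", "D", "AH0", "M"]}
print(word_accuracy(predictions, reference))   # 0.5 -- ATOM is wrong, so it gets no credit
```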
47. Evaluation
- In reality, different descriptions of English use different sets of phonemes (e.g., is stress marked on the vowels? British versus American).
- Issues in testing data-driven methods, because
the performance of a data-driven method is
tightly linked to the data it was trained on.
48. Data-driven method
- Data → Learning method → Letter-to-sound conversion system
49. In theory, you should never test a data-driven method on data that it was trained on.
- In theory, if you want to test the performance of the method on the whole dictionary, you can train the system on the whole dictionary less one word, then test it on that word, and do all of that each time for each word.
- But that takes too long! And we're also interested in the relationship between training corpus size and total performance.
50. Damper et al.'s work-around
- For various values of N (up to half the size of the dictionary):
- Take two random samples of the dictionary, each of size N. Train on one set, test on the other (a sketch of this protocol follows below).
- N = 100, 500, 1000, 2000, 5000 and 8,140.
- The dictionary is of size 16,280.
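A minimal sketch of that train/test sampling protocol, where train_and_score is a hypothetical stand-in for whichever LTS learner is being evaluated:

```python
import random

def evaluate_at_size(dictionary, n, train_and_score, seed=0):
    """Draw two disjoint random samples of size n; train on one, test on the other."""
    rng = random.Random(seed)
    words = list(dictionary)
    rng.shuffle(words)
    train = {w: dictionary[w] for w in words[:n]}
    test  = {w: dictionary[w] for w in words[n:2 * n]}   # needs 2n <= dictionary size
    return train_and_score(train, test)                  # e.g. strict full-word accuracy

# for n in (100, 500, 1000, 2000, 5000, 8140):
#     print(n, evaluate_at_size(lts_dictionary, n, train_and_score))   # hypothetical names
```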
51. Results: hand-written rules
- Elovitz et al.'s hand-written rules for this purpose: 25.7% of words were entirely correct. Length errors (especially due to geminate consonants), /g/-/j/ confusions and vowel substitutions abound. Extensive efforts were made to make sure that this low figure was not an error!
52. Pronunciation by analogy
- Begin with a (hand-made) alignment of letters to sounds. For every observed string of letters, gather the set of phoneme strings that it can be associated with, and store them in a data structure along with their frequency.
- For the test word, find all ways of dividing the word up into pieces that are present in the data structure. Weight the resulting analyses by (1) how many subpieces are involved, and (2) the frequencies of the subpieces, and choose the best (see the sketch below).
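A much-simplified sketch of that idea, assuming the letter-chunk-to-phoneme table has already been built from aligned training data. The tiny chunks table here is invented for illustration; Dedina and Nusbaum's actual PbA algorithm builds a lattice over matching substrings of lexicon words.

```python
from functools import lru_cache

# (letter chunk) -> list of (phoneme string, frequency); toy data for illustration
chunks = {
    "t": [("T", 90), ("", 5)],
    "th": [("TH", 40), ("DH", 60)],
    "ea": [("IY", 50), ("EH", 20)],
    "e": [("EH", 30), ("IY", 10)],
    "a": [("AE", 40)],
    "ter": [("T ER", 25)],
    "r": [("R", 80)],
}

def best_pronunciation(word):
    """Segment the word into known chunks; prefer fewer chunks, then higher total frequency."""
    @lru_cache(maxsize=None)
    def best(rest):
        if not rest:
            return (0, 0, [])                        # (num chunks, -total freq, phones)
        candidates = []
        for i in range(1, len(rest) + 1):
            piece = rest[:i]
            if piece not in chunks:
                continue
            tail = best(rest[i:])
            if tail is None:
                continue
            for phones, freq in chunks[piece]:
                n, negfreq, seq = tail
                candidates.append((n + 1, negfreq - freq,
                                   ([phones] if phones else []) + seq))
        return min(candidates) if candidates else None
    result = best(word)
    return " ".join(result[2]) if result else None

print(best_pronunciation("theater"))   # 'DH IY T ER' with this toy table
```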
53. Results: PbA, neural net
- PbA: 71.8% correct.
- Neural net: 54.4%, when trained on the whole dictionary.
54. Information-Gain trees
- IB1-IG: 57.4% correct.
- This approach is a variant on decision-tree
learning (an important paradigm in machine
learning).
55. In simplest terms, a decision-tree approach studies a problem like "What phoneme realizes this letter in this context?" by looking at all relevant examples in the data, considering all context data (what precedes, what follows, etc.), and deciding, first, which factor gives the most information.
- Measure the uncertainty first: the uncertainty of how this t should be pronounced.
- Measure the uncertainty if you know what the following letter is.
- Measuring uncertainty:
56. Entropy as a measure of uncertainty
- Set of possibilities for realizing t:
- T: 64%
- TH: 36%
- Calculate 0.64 log(0.64) + 0.36 log(0.36) and multiply by -1:
- H = -(0.64 log2 0.64 + 0.36 log2 0.36) = 0.94268
57. Realization of t
- If the following letter is h (36% of cases):
- T: .02
- TH: .98
- Entropy = -(.02 log(.02) + .98 log(.98)) = .14144 (base 2 logs!)
- If the following letter is anything else (64% of cases):
- T: 1.00
- TH: .00
- Entropy = -(1 log 1 + 0 log 0) = 0
- Total entropy now = 0.36 × .14144 + 0.64 × 0 = .05092, a huge decrease from 0.94268! (A small computational check follows below.)
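A short sketch reproducing the two calculations above and the implied information gain; the 64/36 and 2/98 percentages are the slide's illustrative figures, not real corpus counts.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; 0 log 0 is taken to be 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_before     = entropy([0.64, 0.36])             # uncertainty about t vs. th: 0.94268...
h_given_h    = entropy([0.02, 0.98])             # following letter is 'h' (36% of cases)
h_given_rest = entropy([1.00, 0.00])             # any other following letter (64% of cases)
h_after      = 0.36 * h_given_h + 0.64 * h_given_rest

print(round(h_before, 5), round(h_after, 5))     # 0.94268 0.05092
print("information gain:", round(h_before - h_after, 5))
```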
58. Information gain and LTS
- The idea is to use this method of testing to automatically determine which aspects of a letter's neighborhood are most revealing in determining how that letter should be realized in that word.
- But only 57.4% fully correct results in this experiment.
59. Bottom line
- Still a lot of work to be done, both in getting results and in testing how well various methods work.
60. Minimum Edit Distance
- A first look at Viterbi in action
61. What's the best way to line up two different strings? To answer that question, we have to make some specifications.
- One (p. 53ff in the textbook, Section 5.6) could be that perfect alignments are free, while a deletion (non-alignment) costs 1 and a substitution costs 2. A sketch of the resulting dynamic program follows below.
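Here is a minimal dynamic-programming sketch under exactly those costs (match free, insertion/deletion 1, substitution 2). The memo-pad of backpointers discussed a few slides below is kept alongside the distance chart so one best path can be read back out.

```python
def min_edit_distance(source, target, sub_cost=2, indel_cost=1):
    """Fill the (len(source)+1) x (len(target)+1) chart bottom-up and keep backpointers."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0], back[i][0] = i * indel_cost, "del"
    for j in range(1, m + 1):
        dist[0][j], back[0][j] = j * indel_cost, "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            options = [(dist[i - 1][j - 1] + sub, "sub/match"),
                       (dist[i - 1][j] + indel_cost, "del"),
                       (dist[i][j - 1] + indel_cost, "ins")]
            dist[i][j], back[i][j] = min(options)
    # Trace the memo of backpointers from the final cell to recover one best path.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        op = back[i][j]
        path.append(op)
        if op == "sub/match":
            i, j = i - 1, j - 1
        elif op == "del":
            i -= 1
        else:
            j -= 1
    return dist[n][m], list(reversed(path))

cost, path = min_edit_distance("intention", "execution")
print(cost)    # 8, as in the chart on the following slides
print(path)
```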
62. E X E C U T I O N
I N T E N T I O N
The exact matches are free, and there are no reduced fares for any kind of partial match for the others.
63. Cost = 3 substitutions + 2 hangings = 3×2 + 2×1 = 8
E X E C U T I O N
I N T E N T I O N
64. Cost = 1 substitution + 6 hangings = 1×2 + 6×1 = 8
Same cost; that's how we've set up the problem.
E X E C U T I O N
I N T E N T I O N
65. The chart (intention along the left, bottom to top; execution along the bottom):
N 9 10 11 10 11 12 11 10 9 8
O 8 9 10 9 10 11 10 9 8 9
I 7 8 9 8 9 10 9 8 9 10
T 6 7 8 7 8 9 8 9 10 11
N 5 6 7 6 7 8 9 10 11 12
E 4 5 6 5 6 7 8 9 10 11
T 3 4 5 6 7 8 9 10 11 12
N 2 3 4 5 6 7 8 8 10 11
I 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
E X E C U T I O N
66. The chart tells us something about how we walk through it, but (the book is not clear on this) we also have to keep track, on a memo-pad, of what the best path was that got us to that box.
- We need to find a path that only goes Right, Up, or Both (Up and Right) and leads us to the best final box.
67. We can arbitrarily choose one of the best ways to get to a box in this case, because the problem at hand doesn't set different costs depending on the row-transitions. But very frequently such costs must be borne in mind.
68. The same chart again, for tracing the best path:
N 9 10 11 10 11 12 11 10 9 8
O 8 9 10 9 10 11 10 9 8 9
I 7 8 9 8 9 10 9 8 9 10
T 6 7 8 7 8 9 8 9 10 11
N 5 6 7 6 7 8 9 10 11 12
E 4 5 6 5 6 7 8 9 10 11
T 3 4 5 6 7 8 9 10 11 12
N 2 3 4 5 6 7 8 8 10 11
I 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
E X E C U T I O N