Title: Acquiring and implementing phonetic knowledge
1. Acquiring and implementing phonetic knowledge
- Louis C.W. Pols
- Institute of Phonetic Sciences (IFA), http://www.fon.hum.uva.nl/
- Amsterdam Center for Language and Communication (ACLC) / LOT
- Faculty of Humanities, University of Amsterdam
- Herengracht 338, Amsterdam, The Netherlands
Eurospeech 2001 - Scandinavia, Aalborg, Sept. 3, 2001, Keynote
2. why so excited?
- speech and speech research are beautiful
- doing, supervising, talking, publishing is fun
- speech community is wonderful
- ISCA, former ESCA, is the best
- Paul Dalsgaard and colleagues, Aalborg, Denmark, and Eurospeech 2001 - Scandinavia: unique
so...
what better could happen to me than receiving this ISCA medal, here and now, in the year that I turned 60 and the chair of Phonetics in Amsterdam turned 75!
3. outline
- phonetic knowledge
- acquiring and implementing that knowledge
- 30 years ago, 7th ICA in Budapest, Sept. 1971
- nothing compared to G. Fant and K. Stevens (ESCA medallists), who can easily talk about half a century of experience in speech research!
- speech production and speech perception
- supervising some 25 Ph.D. projects
- speech acquisition (L1 and L2)
- speech technology
- speech databases
- what might the future bring us?
4. acquiring and implementing phonetic knowledge
- from speech production and speech perception
- via speech analysis
- via experimental procedures
- via data mining in speech databases
- via literature
- formalizing and generalizing knowledge
- applying knowledge via rules, statistical
procedures, proper selections, etc.
5. phonetic knowledge is indispensable for
- language acquisition (both L1 and L2)
- education and training
- aids for the handicapped
- speech technology (analysis, coding, synthesis,
recognition, dialogs, translation, spotting)
but see Eurospeech Special Event 7, Friday, 9:00-12:30: Integration of Phonetic Knowledge in Speech Technology
a) Experiments and Experiences, Presentations
b) Is Phonetic Knowledge any Use? Panel Discussion
6. 7th ICA, Budapest
- 17th ICA now in Rome, Italy (Sept. 2-7)
- every 3 years; first one in 1953 in Delft, Netherlands
- 7th ICA in Budapest, Hungary (Sept. 1971), plus subsequent Speech Symposium in Szeged
- my first active participation in a major (speech) conference
- substantial international participation on speech
- proper view of state-of-the-art 30 years ago
7. state-of-the-art 30 years ago (1)
- speech perception
- Kasuya: effect of context on vowel perception
- Rao: plosive-vowel interaction
- Kozhevnikov: perception of AM vowel-like stimuli
- Chistovich: vowel discrimination, plus keynote on importance of psycho-acoustics for speech perception
- followed by Symposium on Auditory Analysis and Perception of Speech, Leningrad, Aug. 1973
- speech production
- Fujimura: dynamic palatography, electromyography, and Tokyo x-ray microbeam system
8. state-of-the-art 30 years ago (2)
- speech processing
- Velichko: dynamic programming
- Atal: initial ideas about predictive coding
- speech synthesis (no rule synthesis, no diphones)
- Liljencrants and Fant: OVE III formant synthesizer
- Coker: articulatory synthesis
- Mermelstein and Atal: vocal tract transfer functions
- Rabiner: digital formant synthesizer (sentences: "we were away a year ago", "may we all learn a yellow lion roar")
- Denes: word concatenation
- Itakura: digital filters of ladder form for synthesis
9. state-of-the-art 30 years ago (3)
- speech recognition (only template matching, simple time normalization, no probabilistic approach)
- isolated word recognition (some 50 words)
- Erman: over telephone
- Neely: in noise
- (words carefully spoken by one male speaker, Ken Stevens)
- Pols: dimensional representation of band-filter (BF) spectra
- Rao: diad matching
- Bonner: DAWID-II system
- Sakoe: dynamic processing for time normalization
- Dreyfus-Graf: artificial language to simplify recognition
- Flanagan: keynote on focal points in speech communication research
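The template matching with time normalization listed on this slide can be illustrated with a minimal sketch of dynamic time warping (DTW) in Python. It does not reconstruct any of the 1971 systems: the function names `dtw_distance` and `recognize`, the Euclidean frame cost, the unconstrained step pattern, and the toy one-dimensional feature vectors are all assumptions made for this example.

```python
def dtw_distance(template, utterance):
    """Accumulated DTW cost between two sequences of feature vectors."""
    def frame_cost(u, v):
        # Euclidean distance between two feature vectors (an assumed choice).
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning the first i template frames
    # with the first j utterance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(template[i - 1], utterance[j - 1])
            # Allowed steps: stay on a template frame, stay on an
            # utterance frame, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    """Return the word whose stored template has the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))


# Hypothetical one-dimensional "spectral" templates for a two-word vocabulary.
templates = {
    "yes": [[1.0], [2.0], [3.0]],
    "no": [[3.0], [2.0], [1.0]],
}
# A slower rendition of "yes": the same frames, some of them repeated.
utterance = [[1.0], [1.0], [2.0], [3.0], [3.0]]
best = recognize(templates, utterance)
print(best)
```

The warping absorbs differences in speaking rate, which is why the slower utterance still matches its template; nothing in this sketch is probabilistic, in line with the slide's remark about the approaches of the time.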
10. state-of-the-art 30 years ago (4)
- musical acoustics
- Sundberg: real time pitch extraction in folk music
- Mathews: music synthesis
- psycho-acoustics
- Houtgast: psychophysical evidence for lateral inhibition
- Evans and Wilson: neurophysiological evidence
- Julesz: critical bands in vision and audition
- de Boer: reverse-correlation method
11. speech production and perception
- three representative events
- Speech Recognition As pAttern Classification (SPRAAC), MPI-workshop, July 11-13, 2001
- van Son and Pols: Phoneme recognition as a function of task and context
- Moore and Cutler: Constraints on theories of human vs. machine recognition of speech
- MIT Symposium on Invariance and variability of speech processes, Cambridge, Oct. 1983
- Symposium on Auditory analysis and perception of speech, Leningrad, Aug. 1973
12. supervising some 25 Ph.D. projects
- ideas and productivity via these students
- Dutch habit: good-looking booklet of each thesis
- plus reports at conferences, workshops, and in the open literature
- in 3 main fields of research
- early speech acquisition (normal/pathological)
- speech production and perception (normal/pathological)
- speech technology
- joint responsibility for several projects
- daily supervision by Florien Koopmans-van Beinum
- with colleague promotores
14. Univ. of Amsterdam, Sept. 26, 2001
15. coarticulatory effects on the schwa, D. van Bergem (1995)
- stylized F2-tracks with second order polynomials
- F2-track of the schwa via model prediction
[figure: F2-track panels labeled t-n and w-l]
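The stylization of F2 tracks with second-order polynomials can be sketched as a least-squares quadratic fit. This is a sketch under stated assumptions, not van Bergem's actual procedure: the function name `fit_quadratic`, the time axis in milliseconds, and the F2 values are hypothetical and chosen only for illustration.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a*t**2 + b*t + c; returns (a, b, c)."""
    # Sums of powers of t, and of y times powers of t, for the normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented 3x3 system in the unknowns (c, b, a).
    m = [
        [s[0], s[1], s[2], r[0]],
        [s[1], s[2], s[3], r[1]],
        [s[2], s[3], s[4], r[2]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = (m[i][3] - tail) / m[i][i]
    c, b, a = x
    return a, b, c


# Hypothetical F2 measurements (Hz) at 10 ms steps through a schwa.
times_ms = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
f2_hz = [1700.0, 1580.0, 1510.0, 1490.0, 1520.0, 1600.0]

a, b, c = fit_quadratic(times_ms, f2_hz)
# The stylized track is the fitted parabola sampled at the measurement times.
stylized = [a * t * t + b * t + c for t in times_ms]
```

Three coefficients then summarize the whole track, which makes it easy to compare schwa realizations across consonant contexts.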
16. Gradual Learning Algorithm (GLA), P. Boersma (1998)
Boersma and Hayes, Linguistic Inquiry 32(1), 2001, 45-86
18. speech signal processing package praat
- mainly developed and maintained by P. Boersma
- by now >4000 registered users in 85 countries
- freely available upon request (http://www.fon.hum.uva.nl/praat/)
- for all common platforms: Macintosh, Windows, Linux, SGI, Solaris, HP-UX
- user friendly, excellent graphical output, scriptable
- see demo at Educational Arena (Thu. afternoon)
- praat: doing phonetics by computer
- among others, used for transcriptions in Spoken Dutch Corpus
19. phonetic knowledge and early speech acquisition
- source filter description system (FvB-JvdSt)
- early indicators for dyslexia (C. Schwippert)
- early hearing screening with babies
- but, early detection requires early intervention
- optimizing digital hearing aids
- objective adaptation of hearing aids for babies
- cochlear implants, also for young babies
20. early speech development
vB, Cl, vdD, Developmental Science 4(1), 2001, 61-70
see poster, sess. C26
21. phonetic knowledge and speech technology (1)
- speech technology barely existed 30 years ago
- ideal test bed for all acquired speech knowledge
- speech synthesis
- fully natural synthetic speech (including multilingual and in various speaking styles) => text interpretation and speech generation problem solved
- even better if optimized for noisy and reverberant conditions and for non-natives and elderly people
- speech understanding
- full performance => speaker adaptation, robust word recognition, and speech understanding problem solved
22. predicting prominence
- Ph.D. project of Barbertje Streefkerk (oral, sess. B32)
- acoustical and/or textual features to predict prominence (for ASR and rule synthesis purposes, respectively)
- prominence judgment by listeners at word level
- textual features: POS (11 categ.), syll., word pos., co-occ.
- rule set to predict prominence (level 0-4); for results see paper
- acoustical features (7) plus additional (5)
- F0: median range, syll. word median sent.
- duration: vowel, syllable Vnorm. sent. rate
- intensity: vowel Vnorm. sentence
- neural net predictor: 82% best score (prom. 0 / 1)
23. phonetic knowledge and speech technology (2)
- speech technological needs for handicapped
- artificial voice for laryngectomized speakers
- better digital hearing aid for hearing impaired
- better cochlear implant for deaf
- natural speech output for visually impaired
- training aids for speech and language impaired
- speech technology in education and training
24. phonetic knowledge in speech databases
- speech databases potentially are a wealth of phonetic knowledge
- requires annotation (manual or automatic) at various levels (from segmental to prosodic and linguistic)
- requires SQL-type access and intelligent data mining
- new ways of defining knowledge, e.g.
- duration modeling
- pronunciation variants
- concatenative synthesis (best match)
25. 2 examples
- Spoken Dutch Corpus
- Dutch-Flemish project, start June 1998, 5 years
- 10M words, 1000 hrs of speech, many styles/speakers
- for all 10M: orthography, lemmas, POS
- for 1M: phonetic and syntactic annotation
- for 250k: prosodic annotation
- IFA corpus (Dutch), R. van Son (poster, sess. D36)
- few speakers (4 M and 4 F), but >30 min./speaker
- various speaking styles per speaker, and
- all material phonemically segmented and labeled
- free access via SQL query language
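To make the idea of SQL access to a segmented corpus concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `segments`, its columns, and all rows are hypothetical and invented for this example; they do not reflect the actual schema or contents of the IFA corpus.

```python
import sqlite3

# In-memory database standing in for a phonemically segmented corpus.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE segments (
        speaker     TEXT,     -- hypothetical speaker code
        style       TEXT,     -- 'read' or 'spontaneous'
        stressed    INTEGER,  -- 1 = stressed syllable, 0 = unstressed
        phoneme     TEXT,
        duration_ms REAL
    )
    """
)
# Invented rows, for illustration only.
rows = [
    ("F1", "read", 1, "n", 70.0),
    ("M1", "read", 1, "s", 90.0),
    ("F1", "read", 0, "n", 50.0),
    ("M1", "read", 0, "s", 60.0),
    ("F1", "spontaneous", 1, "n", 60.0),
    ("F1", "spontaneous", 0, "n", 40.0),
    ("M1", "spontaneous", 0, "s", 44.0),
]
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)


def mean_duration_by_condition(connection):
    """Token count and mean duration per (style, stress) condition."""
    query = """
        SELECT style, stressed, COUNT(*), AVG(duration_ms)
        FROM segments
        GROUP BY style, stressed
        ORDER BY style, stressed
    """
    return connection.execute(query).fetchall()


results = mean_duration_by_condition(conn)
for style, stressed, n_tokens, mean_ms in results:
    print(style, stressed, n_tokens, mean_ms)
```

A query of this shape is one way to extract duration statistics by speaking style and stress directly from the annotation, without first exporting the data.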
26. Spoken Dutch Corpus
- W. Levelt (chairman Board), J.P. Martens (overall coordinator), Nijmegen Univ. (Dutch coordination)
- so far, mainly project-internal results, e.g.
- optimizing transcription protocols, e.g.
- orthographic (using praat)
- phonetic: "doe ik" as du-w-Ik, "is zes" as Is_sEs
- determining consistency and efficiency (costs)
- optimizing automatic procedures for
- POS-tagging and lemmatization
- syntactic annotation (semi-automatic)
- grapheme-to-phoneme conversion
- word alignment
27. IFA corpus: consonant duration
Intervocalic nasals, fricatives, stops, and glides in spontaneous and read connected speech (words of 2 or more syllables), accounting for the effects of speaker (8), style, and phoneme identity; word freq. < 1/4000 (CELEX); words not at sentence boundary
values per condition (I, M, F):
spont. str. (202, 295, 20)
spont. unstr. (96, 810, 94)
read str. (715, 837, 75)
read unstr. (285, 2586, 317)
28. some conclusions (1)
- let speech speak for itself (speech databases)
- 25 Ph.D. students can do much more than one (administratively overloaded) senior
- despite skepticism, much progress in last 30 yrs
- over 10,000 active in spoken language community
- about 700 papers at E01 > all speech papers in 1971
- JASA: speech 2nd (14.4) in 1999 (N<700); 6th in 1970 (5.1)
- joint phonetic knowledge is insufficient to solve today's communicative demands
?
29. some conclusions (2)
- speech is most natural form of communication; however, natural human-computer (HC) dialog is far away
- synthetic speech is intelligible, but no proper control over naturalness and speaker/style characteristics
- ASR requires greater robustness and quicker adaptation
- speech and language technology could be used more in education, language training and aids for the handicapped
- much basic knowledge about speech perception still missing
30. some intriguing questions
- how do listeners normalize over speakers?
- how do listeners handle speech variation?
- is there always a cause for any variation?
- what is a realistic and efficient front-end?
- also for noisy speech and high-pitched voices
- how do we acquire our mother tongue and a foreign language?
- what are the implications of a speaking/hearing defect?
- plus hearing aid and cochlear implant
31. epilogue
- privileged to have been part of this lively speech community for over 30 years
- high expectations of progress to come
- phonetic knowledge nowadays more accessible
- and easier to implement in
- descriptive models (computational phonetics), and
- technological systems
- thank you all for your kind attention!