Acquiring and implementing phonetic knowledge

Louis C.W. Pols, Institute of Phonetic Sciences (IFA), http://www.fon.hum.uva.nl/, Amsterdam Center for Language and Communication (ACLC) / LOT, Faculty of Humanities ...



1
Acquiring and implementing phonetic knowledge
  • Louis C.W. Pols
  • Institute of Phonetic Sciences (IFA)
    http://www.fon.hum.uva.nl/
  • Amsterdam Center for Language and Communication (ACLC) / LOT
  • Faculty of Humanities, University of Amsterdam
  • Herengracht 338, Amsterdam, The Netherlands

Eurospeech 2001 - Scandinavia, Aalborg, Sept. 3,
2001, Keynote
2
why so excited?
  • speech and speech research are beautiful
  • doing, supervising, talking, publishing is fun
  • speech community is wonderful
  • ISCA, former ESCA, is the best
  • Paul Dalsgaard c.s., Aalborg, Denmark, and
    Eurospeech 2001 - Scandinavia are unique

so...
what better could happen to me than getting this
ISCA medal, here and now, in the year that I
turned 60 and the chair of Phonetics in Amsterdam
turned 75!
3
outline
  • phonetic knowledge
  • acquiring and implementing that knowledge
  • 30 years ago, 7th ICA in Budapest, Sept. 1971
  • nothing compared to G. Fant and K. Stevens (ESCA
    medallists) who can easily talk about half a
    century of experience in speech research!
  • speech production and speech perception
  • supervising some 25 Ph.D. projects
  • speech acquisition (L1 and L2)
  • speech technology
  • speech databases
  • what might the future bring us?

4
acquiring and implementing phonetic knowledge
  • from speech production and speech perception
  • via speech analysis
  • via experimental procedures
  • via data mining in speech databases
  • via literature
  • formalizing and generalizing knowledge
  • applying knowledge via rules, statistical
    procedures, proper selections, etc.

5
phonetic knowledge is indispensable for
  • language acquisition (both L1 and L2)
  • education and training
  • aids for the handicapped
  • speech technology (analysis, coding, synthesis,
    recognition, dialogs, translation, spotting)

but, see Eurospeech Special Event 7, Friday,
9:00-12:30: Integration of Phonetic Knowledge in
Speech Technology; a) Experiments and
Experiences (presentations), b) Is Phonetic
Knowledge any Use? (panel discussion)
6
7th ICA, Budapest
  • 17th ICA now in Rome, Italy (Sept. 2-7)
  • every 3 years; first one in 1953 in Delft, Neth.
  • 7th ICA in Budapest, Hungary (Sept. 1971), plus
    subsequent Speech Symposium in Szeged
  • my first active participation in a major
    (speech) conference
  • substantial international participation on speech
  • proper view of state-of-the-art 30 years ago

7
state-of-the-art 30 years ago (1)
  • speech perception
  • Kasuya: effect of context on vowel perception
  • Rao: plosive-vowel interaction
  • Kozhevnikov: perception of AM vowel-like stimuli
  • Chistovich: vowel discrimination, plus keynote on
    importance of psycho-acoustics for speech
    perception
  • followed by Symposium on Auditory Analysis and
    Perception of Speech, Leningrad, Aug. 1973
  • speech production
  • Fujimura: dynamic palatography, electromyography,
    and Tokyo x-ray microbeam system

8
state-of-the-art 30 years ago (2)
  • speech processing
  • Velichko: dynamic programming
  • Atal: initial ideas about predictive coding
  • speech synthesis (no rule synthesis, no diphones)
  • Liljencrants and Fant: OVE III formant synthesizer
  • Coker: articulatory synthesis
  • Mermelstein and Atal: vocal tract transfer
    functions
  • Rabiner: digital formant synthesizer ("We
    were away a year ago", "May we all learn a yellow
    lion roar")
  • Denes: word concatenation
  • Itakura: digital filters of ladder form for
    synthesis

9
state-of-the-art 30 years ago (3)
  • speech recognition (only template matching,
    simple time normalization, no probabilistic
    approach)
  • isolated word recognition (some 50 words)
  • Erman: over telephone
  • Neely: in noise
    (carefully spoken by one male speaker, Ken Stevens)
  • Pols: dimensional representation of BF spectra
  • Rao: diad matching
  • Bonner: DAWID-II system
  • Sakoe: dynamic processing for time normalization
  • Dreyfus-Graf: artificial language to simplify
    recognition
  • Flanagan: keynote on focal points in speech
    communication research
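
The template matching with time normalization mentioned on this slide can be illustrated with a small dynamic-programming alignment. This is only a sketch of plain dynamic time warping, not a reconstruction of any of the 1971 systems listed above; the function names, the scalar "feature tracks", and the two-word vocabulary are all made up for illustration.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # toy frame distance on scalars
            # extend the cheapest of the three allowed predecessor cells
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(templates, utterance):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))


# Made-up one-dimensional "feature tracks", purely for illustration.
templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 4, 2, 1, 1]}
print(recognize(templates, [1, 1, 3, 5, 5, 3, 1]))
```

A real recognizer would compare spectral frames (vectors) rather than single numbers, and would usually constrain the warping path; both are left out here to keep the sketch short.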

10
state-of-the-art 30 years ago (4)
  • musical acoustics
  • Sundberg: real time pitch extraction in folk
    music
  • Mathews: music synthesis
  • psycho-acoustics
  • Houtgast: psychophysical evidence for lateral
    inhibition
  • Evans and Wilson: neurophysiological evidence
  • Julesz: critical bands in vision and audition
  • de Boer: reverse-correlation method

11
speech production and perception
  • three representative events
  • Speech Recognition As pAttern Classification
    (SPRAAC), MPI-workshop July 11-13, 2001
  • van Son and Pols: Phoneme recognition as a
    function of task and context
  • Moore and Cutler: Constraints on theories of human
    vs. machine recognition of speech
  • MIT Symposium on Invariance and variability of
    speech processes, Cambridge, Oct. 1983
  • Symposium on Auditory analysis and perception of
    speech, Leningrad, Aug. 1973

12
supervising some 25 Ph.D. projects
  • ideas and productivity via these students
  • Dutch habit: good-looking booklet of each thesis
  • plus reports at conferences, workshops, and in
    the open literature
  • in 3 main fields of research
  • early speech acquisition (normal/pathological)
  • speech production and perception
    (normal/pathological)
  • speech technology
  • joint responsibility for several projects
  • daily supervision by Florien Koopmans-van Beinum
  • with colleague promotores

13
(No Transcript)
14
Univ. of Amsterdam Sept. 26, 2001
15
coarticulatory effects on the schwa
D. van Bergem (1995)
  • stylized F2-tracks with second order polynomials
  • F2-track of the schwa via model prediction

(figure panels: t-n, w-l)
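
Stylizing a formant track with a second-order polynomial amounts to a least-squares parabola fit. The sketch below shows one way to do that with the standard library only. It is an illustration under assumptions, not van Bergem's actual procedure: the function name `fit_quadratic`, the normalized time axis, and the synthetic F2 values are all hypothetical.

```python
def fit_quadratic(t, f):
    """Least-squares fit of f ~ a*t**2 + b*t + c; returns (a, b, c)."""
    s = [sum(x ** k for x in t) for k in range(5)]                # sums of t^k
    r = [sum(y * x ** k for x, y in zip(t, f)) for k in range(3)]  # sums of f*t^k
    # Augmented normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], r[2]],
         [s[3], s[2], s[1], r[1]],
         [s[2], s[1], s[0], r[0]]]
    # Gaussian elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(m[k][i]))
        m[i], m[p] = m[p], m[i]
        for k in range(i + 1, 3):
            factor = m[k][i] / m[i][i]
            for j in range(i, 4):
                m[k][j] -= factor * m[i][j]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coef)


# Synthetic F2 track in Hz on a normalized time axis (0 = onset, 1 = offset).
# The values come from a known parabola, so the fit should recover it.
times = [i / 10 for i in range(11)]
f2_track = [400.0 * x * x - 400.0 * x + 1500.0 for x in times]
a, b, c = fit_quadratic(times, f2_track)
print(round(a, 3), round(b, 3), round(c, 3))
```

With NumPy available, `numpy.polyfit(times, f2_track, 2)` performs the same kind of fit in one call.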
16
Gradual Learning Algorithm (GLA)
P. Boersma (1998)
Boersma and Hayes, Linguistic Inquiry 32(1),
2001, 45-86
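
The slide only names the algorithm, so the following is a minimal sketch of a GLA-style learner as it is commonly described for stochastic Optimality Theory: constraints carry numerical ranking values, evaluation adds noise, and each mismatch between the learner's output and the observed form nudges the ranking values by a small plasticity step. The constraint names, candidates, and parameter values here are invented for illustration and are not taken from the talk.

```python
import random


def noisy_winner(ranking, candidates, noise_sd, rng):
    """Pick the optimal candidate under a noise-perturbed constraint ranking."""
    noisy = {c: value + rng.gauss(0.0, noise_sd) for c, value in ranking.items()}
    order = sorted(noisy, key=noisy.get, reverse=True)  # highest-ranked first
    # OT evaluation: compare violation profiles lexicographically in ranking order.
    return min(candidates, key=lambda cand: [candidates[cand][c] for c in order])


def gla_update(ranking, candidates, observed, plasticity=0.1, noise_sd=2.0, rng=random):
    """One learning step; returns True if the ranking values were changed."""
    guess = noisy_winner(ranking, candidates, noise_sd, rng)
    if guess == observed:
        return False
    for c in ranking:
        diff = candidates[observed][c] - candidates[guess][c]
        if diff > 0:    # constraint penalizes the observed form more: demote
            ranking[c] -= plasticity
        elif diff < 0:  # constraint penalizes the learner's wrong guess more: promote
            ranking[c] += plasticity
    return True


def train(n_steps=2000, seed=1):
    """Toy run: hypothetical constraints A and B, learner always hears 'y'."""
    rng = random.Random(seed)
    candidates = {"x": {"A": 1, "B": 0}, "y": {"A": 0, "B": 1}}
    ranking = {"A": 100.0, "B": 100.0}
    for _ in range(n_steps):
        gla_update(ranking, candidates, "y", rng=rng)
    return ranking


print(train())
```

Because "y" can only win when A outranks B, repeated exposure to "y" gradually pushes A above B; the evaluation noise is what makes the learning gradual rather than all-or-nothing.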
17
(No Transcript)
18
speech signal processing package praat
  • mainly developed and maintained by P. Boersma
  • meanwhile >4000 registered users in 85 countries
  • freely available upon request
    (http://www.fon.hum.uva.nl/praat/)
  • for all common platforms: Macintosh, Windows,
    Linux, SGI, Solaris, HP-UX
  • user friendly, excellent graphical output,
    scriptable
  • see demo at Educational Arena (Thu. afternoon)
  • praat: doing phonetics by computer
  • used, among other things, for transcriptions in
    the Spoken Dutch Corpus

19
phonetic knowledge and early speech acquisition
  • source filter description system (FvB-JvdSt)
  • early indicators for dyslexia (C. Schwippert)
  • early hearing screening with babies
  • but, early detection requires early intervention
  • optimizing digital hearing aids
  • objective adaptation of hearing aids for babies
  • cochlear implants, also for young babies

20
early speech development
vB, Cl, vdD, Developmental Science 4(1), 2001, 61-70
see poster, sess. C26
21
phonetic knowledge and speech technology (1)
  • speech technology barely existed 30 years ago
  • ideal test bed for all acquired speech knowledge
  • speech synthesis
  • fully natural synthetic speech (including
    multilingual and in various speaking styles) =>
    text interpretation and speech generation problem
    solved
  • even better if optimized for noisy and
    reverberant conditions and for non-natives and
    elderly people
  • speech understanding
  • full performance => speaker adaptation, robust
    word recognition, and speech understanding
    problem solved

22
predicting prominence
  • Ph.D. project Barbertje Streefkerk (oral, sess.
    B32)
  • acoustical and/or textual features to predict
    prominence
  • (for ASR and rule synthesis purposes,
    respectively)
  • prominence judgment by listeners at word level
  • textual feat.: POS (11 categ.), syll, word pos.,
    co-occ.
  • rule set to predict prom. (level 0-4); for
    results see paper
  • acoustical features (7), plus additional (5):
  • F0: median range, syll. word; additional: median sent.
  • duration: vowel, syllable; additional: Vnorm., sent. rate
  • intensity: vowel; additional: Vnorm., sentence
  • neural net predictor: 82% best score (prom. 0 /
    1)

23
phonetic knowledge and speech technology (2)
  • speech technological needs for handicapped
  • artificial voice for laryngectomized speakers
  • better digital hearing aid for hearing impaired
  • better cochlear implant for deaf
  • natural speech output for visually impaired
  • training aids for speech and language impaired
  • speech technology in education and training

24
phonetic knowledge in speech databases
  • speech databases potentially are a wealth of
    phonetic knowledge
  • requires annotation (manual or automatic) at
    various levels (from segmental to prosodic and
    linguistic)
  • requires SQL-type access and intelligent data
    mining
  • new ways of defining knowledge, e.g.
  • duration modeling
  • pronunciation variants
  • concatenative synthesis (best match)

25
2 examples
  • Spoken Dutch Corpus
  • Dutch-Flemish project, start June 1998, 5 years
  • 10M words, about 1000 hrs of speech, many
    styles/speakers
  • for all 10M: orthography, lemmas, POS
  • for 1M: phonetic and syntactic annotation
  • for 250k: prosodic annotation
  • IFA corpus (Dutch), R. van Son (poster, sess.
    D36)
  • few speakers (4 M and 4 F), but >30 min./speaker
  • various speaking styles per speaker, and
  • all material phonemically segmented and labeled
  • free access via SQL query language
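
To give an idea of what SQL-type access to a segmented and labeled corpus can look like, here is a small self-contained sketch using an in-memory SQLite database. The table name, column names, and all rows are hypothetical toy data; the real IFA corpus schema and its contents are not given on the slide and may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    "speaker TEXT, style TEXT, phoneme TEXT, stressed INTEGER, duration_ms REAL)"
)

# Invented rows, only to make the query below runnable.
rows = [
    ("F1", "read", "n", 1, 62.0),
    ("F2", "read", "n", 1, 58.0),
    ("F1", "read", "n", 0, 48.0),
    ("M1", "spont", "n", 1, 55.0),
    ("M1", "spont", "n", 0, 41.0),
    ("M1", "spont", "s", 0, 70.0),
]
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)

# Mean duration and token count of /n/ per speaking style and stress condition.
query = """
    SELECT style, stressed, AVG(duration_ms), COUNT(*)
    FROM segments
    WHERE phoneme = 'n'
    GROUP BY style, stressed
    ORDER BY style, stressed
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The point of such access is that a question like "how long are nasals in read versus spontaneous speech, split by stress?" becomes a single query instead of a custom script per question.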

26
Spoken Dutch Corpus
  • W. Levelt (chairman Board), J.P. Martens (overall
    coordinator), Nijmegen Univ. (Dutch coordination)
  • so far, mainly project-internal results, e.g.
  • optimizing transcription protocols, e.g.
  • orthographic (using praat)
  • phonetic: "doe ik" -> du-w-Ik, "is zes" -> Is_sEs
  • determining consistency and efficiency (costs)
  • optimizing automatic procedures for
  • POS-tagging and lemmatization
  • syntactic annotation (semi-automatic)
  • grapheme-to-phoneme conversion
  • word alignment

27
IFA corpus: consonant duration
Intervocalic nasals, fricatives, stops, and
glides in spontaneous and read connected speech
(words of 2 or more syllables), accounting for the
effects of speaker (8), style, and phoneme
identity; word freq. < 1/4000 (CELEX); words not
at sentence boundary
                    I      M      F
spont. str.        202    295     20
spont. unstr.       96    810     94
read str.          715    837     75
read unstr.        285   2586    317
28
some conclusions (1)
  • let speech speak for itself (speech databases)
  • 25 Ph.D. students can do much more than one
    (administratively overloaded) senior
  • despite skepticism, much progress in last 30 yrs
  • over 10,000 active in spoken language community
  • about 700 papers at Eurospeech 2001 > all speech
    papers in 1971
  • JASA: speech 2nd (14.4%) in 1999 (N<700); 6th in
    1970 (5.1%)
  • joint phonetic knowledge is insufficient to solve
    today's communicative demands

?
29
some conclusions (2)
  • speech is the most natural form of communication;
    however, natural human-computer dialog is far away
  • synthetic speech is intelligible, but no proper
    control over naturalness and speaker/style
    characteristics
  • ASR requires greater robustness and quicker
    adaptation
  • speech and language technology could be used more
    in education, language training and aids for the
    handicapped
  • much basic knowledge about speech perception is
    still missing

30
some intriguing questions
  • how do listeners normalize over speakers?
  • how do listeners handle speech variation?
  • is there always a cause for any variation?
  • what is a realistic and efficient front-end?
  • also for noisy speech and high-pitched voices
  • how do we acquire our mother tongue and a foreign
    language?
  • what are the implications of a speaking/hearing
    defect?
  • plus hearing aid and cochlear implant

31
epilogue
  • privileged to have been part of this lively
    speech community for over 30 years
  • high expectations of progress to come
  • phonetic knowledge nowadays more accessible
  • and easier to implement in
  • descriptive models (computational phonetics), and
  • technological systems
  • thank you all for your kind attention!