1
Annotation of speech from the phonetics/phonology perspective
Bettina Braun, Jürgen Trouvain
Fachrichtung 4.7, Institut für Phonetik
15.02.2002
2
Manipulating text vs. speech (1)
  • text file manipulation: "vowels-only" version
  • remove all consonant letters, replacing each with
    a space, so that only the vowels are left
  • e ea e o e a o o o o a e ou y i
    e o i i a e u y e i e a e
    oo .

3
Manipulating text vs. speech (2)
  • text file manipulation: "consonants-only" version
  • remove all vowel letters, replacing each with a
    space, so that only the consonants are left
  • Th w th r f r c st f r t m rr w r th r cl d
    n th m rn ng w th f w s nn sp lls n th ft
    rn n.
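The two text manipulations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name keep_only is hypothetical, and "y" is treated as a vowel letter because the vowels-only output on the slide keeps it.

```python
# Assumption: "y" counts as a vowel letter, as in the slide's vowels-only output.
VOWELS = set("aeiouyAEIOUY")

def keep_only(text, keep_vowels):
    """Replace every letter of the unwanted class with a space.

    Non-letters (spaces, punctuation) are copied unchanged, so the
    output has the same length as the input.
    """
    out = []
    for ch in text:
        if not ch.isalpha():
            out.append(ch)
        elif (ch in VOWELS) == keep_vowels:
            out.append(ch)
        else:
            out.append(" ")
    return "".join(out)

sentence = ("The weather forecast for tomorrow rather cloudy in the "
            "morning with a few sunny spells in the afternoon.")
vowels_only = keep_only(sentence, keep_vowels=True)
consonants_only = keep_only(sentence, keep_vowels=False)
print(vowels_only)
print(consonants_only)
```

Because each removed letter becomes exactly one space, the word lengths remain visible in the output, which is the textual counterpart of keeping the silences in the speech versions.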

4
Manipulating text vs. speech (3)
  • The weather forecast for tomorrow rather cloudy
    in the morning with a few sunny spells in the
    afternoon.
  • speech file manipulation:
  • original recording, not manipulated
  • "consonants-only" version: vowel segments
    replaced with silence
  • "vowels-only" version: consonant segments
    replaced with silence
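The speech file manipulation can be sketched in the same way, assuming the recording is available as a list of samples together with labelled segment boundaries in seconds. The function name, the toy signal and the segment labels below are all hypothetical; the slides do not say which tool was used.

```python
def silence_segments(samples, segments, labels_to_silence, sample_rate):
    """Return a copy of samples with the chosen labelled segments set to 0.

    segments is a list of (start_seconds, end_seconds, label) tuples.
    """
    out = list(samples)
    for start_s, end_s, label in segments:
        if label in labels_to_silence:
            first = int(round(start_s * sample_rate))
            last = min(int(round(end_s * sample_rate)), len(out))
            for i in range(first, last):
                out[i] = 0
    return out

# Toy data (hypothetical): 10 samples at 10 Hz, three labelled segments.
signal = [1] * 10
segments = [(0.0, 0.3, "m"), (0.3, 0.7, "a"), (0.7, 1.0, "n")]

consonants_only = silence_segments(signal, segments, {"a"}, 10)
vowels_only = silence_segments(signal, segments, {"m", "n"}, 10)
print(consonants_only)
print(vowels_only)
```

Zeroing the samples keeps the overall duration of the recording intact, so the timing of the remaining segments is preserved.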

5
Coarticulation
  • articulating means:
  • articulators in motion, not in fixed positions
  • articulators move continuously, not discretely
  • articulatory movements temporally overlap

6
original, vowels only, vowels only without silences
7
Timing
  • information about consonant durations: silence is
    more than nothing

8
Speech melody
  • information about fundamental frequency (F0) in
    the voiced vowel segments
  • with F0 variation
  • without any F0 variation (monotonous)

9
Annotation of sound segments: discreteness in
mind, in physics
  • "Es ist 8 Uhr morgens."

graphemes: m o r g e n s
phonemes:  m O r g @ n s
phones:    m O6 N s
10
Annotation of sound segments: discrete units?
  • "Die Nacht haben Maiers gut geschlafen."
  • " haben Maier ."
  • phonemic: h a b @ n m aI @ r s
  • acoustic-phonetic: h a b m aI 6 s
  • articulatory-phonetic: h a b n m aI 6 s
    (possibly)
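The difference between the phonemic and the acoustic-phonetic transcription above can be listed automatically. The following is a minimal sketch using Python's standard difflib module; the function name compare_forms is hypothetical, "@" is the SAMPA schwa symbol, and a generic sequence alignment is only an approximation of what a phonetician would consider the corresponding segments.

```python
from difflib import SequenceMatcher

def compare_forms(canonical, realised):
    """List the spans where the realised form departs from the canonical one."""
    matcher = SequenceMatcher(None, canonical, realised, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, canonical[i1:i2], realised[j1:j2]))
    return changes

# Symbol sequences taken from the slide, split on spaces.
canonical = "h a b @ n m aI @ r s".split()
realised = "h a b m aI 6 s".split()

changes = compare_forms(canonical, realised)
for tag, canon_part, real_part in changes:
    print(tag, canon_part, "->", real_part)
```

For this example the alignment reports that "@ n" is deleted and "@ r" is replaced by "6".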

11
Segmentation of sound segments: degree of
discreteness
  • "Wer möchte noch Milch?"
  • clear segmentation:
  • closure and closure release of t in "möch t e"
  • unclear segmentation:
  • I l in "M il ch"

12
Kiel Corpus: read & spontaneous speech
  • orthography
  • phonemic (canonical) form
  • realised form
  • word & sentence boundary
  • manually labelled

13
From sounds to syllables: how many syllables?
  • semi-vowels: syllabic or not?
  • Studie: Stu - di - e vs. Stu - die
  • Piano: Pi - a - no vs. Pia - no
  • size of auditory window
  • " mit mir diese Dienstreise zu unternehmen, "
  • rei - se - zu - un - ter
  • zu - un - ter
  • zu - un

14
From sounds to syllables: where is the syllable
boundary?
  • ambisyllabic consonants, onset principles
  • Mitte: /m I - t @/ vs. /m I _t @/
  • Adler: /a t - l @ r/ vs. /a - d l @ r/
  • Fenster: /f E n s - t E r/ vs. /f E n - s t
    E r/
  • resyllabification
  • "Wenn es Ihnen da 5 Tage lang irgendwo passen
    würde."
  • /v E n - E s/ vs. /v E _ n E s/

15
Controlled elicitation of spontaneous speech
  • Monologues:
  • Narrative
  • Picture description
  • Dialogues: task-oriented data collection
  • Map Task
  • Appointment-making
  • Degree of naturalness?
  • Controlled elicitation

16
Controlled elicitation of spontaneous speech
17
Problems for annotation: non-speech in speech
  • Many non-linguistic signal portions
  • swallowing
  • lip-smacking
  • breathing
  • unfilled, filled pauses
  • laughter
  • hesitational lengthening

Partly overlapping with speech
18
Functions of prosody
  • Generally: features above the segmental level,
    i.e. suprasegmental

19
Phonetic encoding of prosody
  • perceived pitch over time
  • duration
  • intensity
  • spectral quality

20
Prosodic annotation: signal-oriented
  • Tilt model (Taylor 2000)
  • intonational events
  • continuous parameters (tilt parameters)
  • amplitude: sum of the magnitudes of rise and fall
  • duration: sum of rise and fall durations
  • tilt: shape of the event
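The three parameters listed above can be computed from the rise and fall of a single intonational event. The sketch below follows the definitions on the slide for amplitude and duration, and takes the tilt value as the mean of the amplitude tilt and the duration tilt, as in Taylor (2000). The function and argument names are hypothetical, and the example values are invented for illustration.

```python
def tilt_parameters(rise_amp, fall_amp, rise_dur, fall_dur):
    """Return (amplitude, duration, tilt) for one intonational event.

    rise_amp / fall_amp: F0 excursions of the rise and fall (e.g. in Hz)
    rise_dur / fall_dur: durations of the rise and fall (in seconds)
    """
    amplitude = abs(rise_amp) + abs(fall_amp)
    duration = rise_dur + fall_dur
    if amplitude == 0 or duration == 0:
        raise ValueError("event has no excursion or no duration")
    tilt_amp = (abs(rise_amp) - abs(fall_amp)) / amplitude
    tilt_dur = (rise_dur - fall_dur) / duration
    tilt = 0.5 * (tilt_amp + tilt_dur)
    return amplitude, duration, tilt

pure_rise = tilt_parameters(20.0, 0.0, 0.1, 0.0)
rise_fall = tilt_parameters(20.0, -20.0, 0.1, 0.1)
pure_fall = tilt_parameters(0.0, -20.0, 0.0, 0.1)
print(pure_rise, rise_fall, pure_fall)
```

With these definitions a pure rise has tilt +1, a pure fall has tilt -1, and a symmetric rise-fall has tilt 0.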

Figure labels: 1.0, 0.5, 0
21
Prosodic annotation: autosegmental, phonological
  • GToBI (Grice et al.)
  • Tonal tier, break tier
  • Two levels of pitch-heights (L, H)
  • Simple and complex pitch accents
  • Association to word stress marked by *
  • Exact temporal alignment
  • Boundary tones marked by - and %
  • Strength of prosodic breaks (3, 4)

22
Prosodic annotation: example
Tiers: tonal, orth., break, misc
23
GToBI label files

orthographic
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine

tones
47.469173 115 LH
47.555339 115 H-
47.768061 115 H
47.851534 115 <
48.320061 115 !H
48.812822 115 !H
49.240958 115 L-

breaks
47.555339 123 3
49.249036 123 4
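Label files of this kind are easy to read with a short script. The sketch below parses "time code label" lines and looks up the word in which a tone label falls. Two things are assumptions rather than facts from the slides: that the time of an orthographic label marks the end of the word, and that the middle field is just a numeric code that can be carried along unchanged. The function names are hypothetical, and only a few lines from the slide are used.

```python
from bisect import bisect_left

def parse_label_lines(lines):
    """Parse 'time code label' lines into (time, code, label) tuples."""
    entries = []
    for line in lines:
        time_s, code, label = line.split(None, 2)
        entries.append((float(time_s), int(code), label.strip()))
    return entries

word_lines = [
    "46.836392 113 also",
    "46.958899 113 ich",
    "47.171623 113 bin",
    "47.555335 113 genau",
    "48.180049 113 waagerecht",
]
tone_lines = [
    "47.469173 115 LH",
    "47.768061 115 H",
]

words = parse_label_lines(word_lines)
tones = parse_label_lines(tone_lines)
word_ends = [time_s for time_s, _, _ in words]  # assumed to be word end times

def word_for(time_s):
    """Return the first word whose end time is not before time_s."""
    i = bisect_left(word_ends, time_s)
    return words[i][2] if i < len(words) else None

aligned = [(label, word_for(time_s)) for time_s, _, label in tones]
print(aligned)
```

Under these assumptions the first tone label falls within "genau" and the second within "waagerecht".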
24
Prosodic annotation: phonological, single-layer
  • KIM (Kohler 1995)
  • no suprasegmental tiers, allowing efficient
    analysis of segment-prosody interaction
  • differentiated from segmental labels by special
    diacritics
  • time marks for prosodic events anchored to word
    boundaries
  • Example:

25
13 c 0.0007500
13 2 0.0007500
13 v 0.0007500
13 Q- 0.0007500
13 E 0.0007500
2147 m 0.1341250
4787 PGn 0.2991250
4787 2( 0.2991250
4787 d 0.2991250
6243 -h 0.3901250
6619 'i 0.4136250
7569 n 0.4730000
8265 s 0.5165000
9202 t 0.5750625
9527 -h 0.5953750
9995 a 0.6246250
10648 k-x 0.6654375
11405 0 0.7127500
11405 v 0.7127500
14721 0 0.9200000
14721 m 0.9200000
16051 i6 1.0031250
16935 0 1.0583750
16935 g 1.0583750
18093 -h 1.1307500
18564 'u 1.1601875
19314 t 1.2070625
19981 -h 1.2487500
20336 0. 1.2709375
20336 2) 1.2709375
20336 p 1.2709375
21501 -h 1.3437500
22440 'a 1.4024375
23700 s 1.4811875
25408 @- 1.5879375
25408 n 1.5879375
28935 , 1.8083750
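Each entry in this listing is a triple of sample number, label and time in seconds. The values are consistent with time = (sample - 1) / 16000, so the sketch below assumes a 16 kHz sampling rate; that rate is inferred from the numbers and not stated on the slides. The parser reads whitespace-separated tokens three at a time, so it does not depend on how the triples are broken across lines. The function name is hypothetical, and treating a label's time as the start of its segment is likewise an assumption.

```python
SAMPLE_RATE = 16000  # assumption inferred from the listed values

def parse_triples(text):
    """Read (sample, label, time) triples from whitespace-separated text."""
    tokens = text.split()
    return [
        (int(tokens[i]), tokens[i + 1], float(tokens[i + 2]))
        for i in range(0, len(tokens) - 2, 3)
    ]

# A few entries copied from the listing.
listing = """
2147 m 0.1341250
4787 PGn 0.2991250
9202 t 0.5750625
28935 , 1.8083750
"""
entries = parse_triples(listing)

# Check the inferred relation between sample number and time.
consistent = all(
    abs((sample - 1) / SAMPLE_RATE - time_s) < 1e-9
    for sample, _, time_s in entries
)

# Duration of "m", assuming it lasts until the next time mark (sample 4787).
m_duration = (entries[1][0] - entries[0][0]) / SAMPLE_RATE
print(consistent, m_duration)
```

Retrieval scripts for segment durations or for the position of prosodic labels can be built on the same parsed triples.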
30
Data structures and retrieval
  • Mostly plain text files, aligned to the signal
  • Retrieval using scripting languages
  • (GToBI in EMU format)
  • XML formats

31
What for?
  • Basic research
  • Rhythmic patterns
  • Speech rate measurements (units, domains)
  • Temporal alignment & scaling of pitch accents
  • Differentiated analysis of pitch range
  • Speech technology
  • Modelling accentuation in ASR
  • Speech rate in ASR
  • Intonation and timing for synthesis

32
Bibliography
  • Alwan, A., H. Bourlard and S. Furui (eds). 2001.
    Speech Communication 33. Special Issue on Speech
    Annotation and Corpus Tools.
  • Grice, M., S. Baumann and R. Benzmüller (to
    appear). German ToBI. In S. Jun (ed). Prosodic
    Typology.
  • Grice, M. et al. 2000. Representation and
    annotation of dialogue. In Handbook of
    Multimodal and Spoken Dialogue Systems.
    Resources, Terminology and Product Evaluation.
    Kluwer, pp. 1-101.
  • Kohler, K.J. (ed). 1995. Kieler Arbeitsberichte
    29.
  • Taylor, P. 2000. Analysis and Synthesis of
    Intonation Using the Tilt Model. JASA 107(3),
    pp. 1697-1714.