Title: Annotation of speech from the phonetics/phonology perspective
1Annotation of speech from the phonetics/phonology perspective
Bettina Braun Jürgen Trouvain
Fachrichtung 4.7, Institut für Phonetik
15.02.2002
2Manipulating text vs. speech 1
- text file manipulation: "vowels-only" version
- remove all consonant letters and replace each with a space, so that only the vowels are left
- e ea e o e a o o o o a e ou y i e o i i a e u y e i e a e oo .
3Manipulating text vs. speech 2
- text file manipulation: "consonants-only" version
- remove all vowel letters and replace each with a space, so that only the consonants are left
- Th w th r f r c st f r t m rr w r th r cl d n th m n ng w th f w s nn sp lls n th ft n n.
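The two text manipulations can be sketched in a few lines of Python. This is a minimal sketch, not the script used for the slides: the function name `keep_letters` is hypothetical, and counting "y" as a vowel is an assumption read off the slide output ("cloudy" appears as "ou y" and "cl d").

```python
VOWELS = set("aeiouyAEIOUY")  # assumption: "y" counts as a vowel, as in the slide output

def keep_letters(text, keep_vowels):
    """Replace every letter of the unwanted class with a space."""
    out = []
    for ch in text:
        if not ch.isalpha():
            out.append(ch)      # spaces and punctuation stay as they are
        elif (ch in VOWELS) == keep_vowels:
            out.append(ch)      # letter belongs to the class we keep
        else:
            out.append(" ")     # letter of the other class becomes a space
    return "".join(out)

sentence = "The weather forecast for tomorrow"
print(keep_letters(sentence, keep_vowels=True))   # vowels-only version
print(keep_letters(sentence, keep_vowels=False))  # consonants-only version
```

Replacing letters with spaces rather than deleting them keeps the length of the line unchanged, which parallels replacing speech segments with silence.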
4Manipulating text vs. speech 3
- The weather forecast for tomorrow rather cloudy in the morning with a few sunny spells in the afternoon.
- speech file manipulation
- original recording, not manipulated
- "consonants-only" version: vowel segments replaced with silence
- "vowels-only" version: consonant segments replaced with silence
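The speech file manipulation can be sketched in the same spirit. The sketch assumes that the recording is available as a list of samples and that a segmentation into labelled intervals already exists; the function name `mute_segments`, the labels "C" and "V", and the toy data are hypothetical.

```python
def mute_segments(samples, rate, segments, mute_labels):
    """Return a copy of `samples` with the chosen segment classes set to zero."""
    out = list(samples)
    for start, end, label in segments:
        if label in mute_labels:
            first = int(round(start * rate))
            last = min(int(round(end * rate)), len(out))
            for i in range(first, last):
                out[i] = 0  # silence: the samples stay in place, only their value changes
    return out

# Toy data: 10 samples at 10 Hz, labelled C (consonant) or V (vowel).
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
segments = [(0.0, 0.3, "C"), (0.3, 0.7, "V"), (0.7, 1.0, "C")]

vowels_only = mute_segments(samples, 10, segments, {"C"})
consonants_only = mute_segments(samples, 10, segments, {"V"})
```

Both manipulated versions have the same length as the original, so the timing of the remaining segments is preserved.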
5Coarticulation
- articulating means
- articulator in motion, not in fixed position
- articulators move continuously, not discretely
- articulatory movements temporally overlap
6[Audio examples: original; vowels only; vowels only without silences]
7Timing
- information about consonant durations: silence is more than nothing
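The difference between silencing segments and cutting them out can be made concrete with a small sketch. It assumes a labelled segmentation that covers the whole signal; the function name `cut_segments` and the toy data are hypothetical.

```python
def cut_segments(samples, rate, segments, cut_labels):
    """Concatenate only the segments whose label is not in `cut_labels`."""
    out = []
    for start, end, label in segments:
        if label not in cut_labels:
            out.extend(samples[int(round(start * rate)):int(round(end * rate))])
    return out

# Toy data: 10 samples at 10 Hz, labelled C (consonant) or V (vowel).
rate = 10
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
segments = [(0.0, 0.3, "C"), (0.3, 0.7, "V"), (0.7, 1.0, "C")]

without_silences = cut_segments(samples, rate, segments, {"C"})
original_duration = len(samples) / rate          # duration in seconds
spliced_duration = len(without_silences) / rate  # shorter: consonant durations are lost
```

In the spliced version the information about consonant durations is gone, whereas a version with silences still carries it.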
8Speech melody
- information about fundamental frequency (F0) in the voiced vowel segments
- with F0 variation
- without any F0 variation (monotonous)
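A monotonous version can be thought of as an F0 track in which every voiced frame carries the same value. The sketch below only manipulates a list of F0 values; it assumes that unvoiced frames are marked with `None` and that the constant is the mean F0, neither of which is stated on the slide. Resynthesising audio from the flattened track is not shown.

```python
def flatten_f0(f0_track):
    """Replace every voiced F0 value with the mean of all voiced values."""
    voiced = [f for f in f0_track if f is not None]
    if not voiced:
        return list(f0_track)  # nothing voiced, nothing to flatten
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f is not None else None for f in f0_track]

# Toy F0 track in Hz; None marks unvoiced frames.
track = [None, 100.0, 120.0, None, 140.0]
monotonous = flatten_f0(track)
```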
9Annotation of sound segments: discreteness in mind, in physics
- graphemes: m o r g e n s
- phonemes: m O r g @ n s
- phones: m O6 N s
10Annotation of sound segments: discrete units?
- "Die Nacht haben Maiers gut geschlafen." ("The Maiers slept well that night.")
- excerpt: "... haben Maiers ..."
- phonemic: h a b @ n m aI @ r s
- acoustic-phonetic: h a b m aI 6 s
- articulatory-phonetic: h a b n m aI 6 s (possibly)
11Segmentation of sound segments: degree of discreteness
- "Wer möchte noch Milch?" ("Who would like some more milk?")
- clear segmentation
- closure and closure release of t in "möchte"
- unclear segmentation
- I and l in "Milch"
12Kiel Corpus: read and spontaneous speech
- orthography
- phonemic (canonical) form
- realised form
- word and sentence boundaries
- manually labelled
13From sounds to syllables: how many syllables?
- semi-vowels: syllabic or not?
- Studie: Stu - di - e vs. Stu - die
- Piano: Pi - a - no vs. Pia - no
- size of auditory window
- "... mit mir diese Dienstreise zu unternehmen, ..." ("... to go on this business trip with me, ...")
- rei - se - zu - un - ter
- zu - un - ter
- zu - un
14From sounds to syllables: where is the syllable boundary?
- ambisyllabic consonants, onset principles
- Mitte: /m I - t @/ vs. /m I _t @/
- Adler: /a t - l @ r/ vs. /a - d l @ r/
- Fenster: /f E n s - t E r/ vs. /f E n - s t E r/
- resyllabification
- "Wenn es Ihnen da 5 Tage lang irgendwo passen würde." ("If it would suit you somewhere there for 5 days.")
- /v E n - E s/ vs. /v E _ n E s/
15Controlled elicitation of spontaneous speech
- Monologues
- Narrative (Erzählung)
- Picture description (Bildbeschreibung)
- Dialogues: task-oriented data collection
- Map Task
- Appointment-making
- Degree of naturalness?
- Controlled elicitation
16Controlled elicitation of spontaneous speech
17Problems for annotation: non-speech in speech
- Many non-linguistic signal portions
- swallowing
- lip-smacking
- breathing
- unfilled, filled pauses
- laughter
- hesitational lengthening
- Partly overlapping with speech
18Functions of prosody
- Generally: features above the segmental level (suprasegmental)
19Phonetic encoding of prosody
- perceived pitch over time
- duration
- intensity
- spectral quality
20Prosodic annotation: signal-oriented
- Tilt model (Taylor 2000)
- intonational events
- continuous parameters (tilt parameters)
- amplitude: sum of the magnitudes of rise and fall
- duration: sum of rise and fall durations
- tilt: shape of the event
[Figure: tilt values 1.0, 0.5, 0]
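The three Tilt parameters can be computed from the rise and fall of an event as follows. This is a sketch of the definitions as given in Taylor (2000), as far as they are recalled here: amplitude and duration are the sums listed above, and tilt is the average of an amplitude-based and a duration-based ratio, ranging from -1 (pure fall) to +1 (pure rise). The function name and argument names are hypothetical.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) for one intonational event.

    a_rise, a_fall: F0 excursions of rise and fall (e.g. in Hz)
    d_rise, d_fall: durations of rise and fall (in seconds)
    """
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    # Guard against empty events to avoid division by zero.
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    tilt = 0.5 * (tilt_amp + tilt_dur)
    return amplitude, duration, tilt

pure_rise = tilt_parameters(20, 0, 0.2, 0.0)
pure_fall = tilt_parameters(0, -20, 0.0, 0.2)
rise_fall = tilt_parameters(10, -10, 0.1, 0.1)
```

A symmetric rise-fall gets a tilt of 0, so a single number summarises the shape of the event.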
21Prosodic annotation: autosegmental, phonological
- GToBI (Grice et al.)
- Tonal tier, break tier
- Two levels of pitch height (L, H)
- Simple and complex pitch accents
- Association to word stress marked by *
- Exact temporal alignment
- Boundary tones marked by - and %
- Strength of prosodic breaks (3, 4)
22Prosodic annotation: example
[Figure: annotated example with the tiers tonal, orth., break, misc]
23GToBI label files
orthographic
- 46.836392 113 also
- 46.958899 113 ich
- 47.171623 113 bin
- 47.555335 113 genau
- 48.180049 113 waagerecht
- 48.468170 113 rechts
- 48.613576 113 von
- 48.726670 113 der
- 49.246344 113 Goldmine
tones
- 47.469173 115 LH
- 47.555339 115 H-
- 47.768061 115 H
- 47.851534 115 <
- 48.320061 115 !H
- 48.812822 115 !H
- 49.240958 115 L-
breaks
- 47.555339 123 3
- 49.249036 123 4
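Label files of this kind are easy to read with a short script. The sketch below assumes one entry per line in the order time, numeric code, label, and it uses the numeric code (113, 115, 123) to tell the tiers apart. That mapping is read off this one example and is an assumption, as are the function and variable names.

```python
from collections import defaultdict

# Assumption: codes as they appear in the example above.
TIER_NAMES = {113: "orthographic", 115: "tones", 123: "breaks"}

def parse_labels(text):
    """Group (time, label) pairs by tier."""
    tiers = defaultdict(list)
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip empty or malformed lines
        time, code, label = float(parts[0]), int(parts[1]), parts[2].strip()
        tiers[TIER_NAMES.get(code, str(code))].append((time, label))
    return dict(tiers)

example = """\
47.555335 113 genau
47.469173 115 LH
47.555339 115 H-
47.555339 123 3
49.249036 123 4
"""
tiers = parse_labels(example)
```

Keeping times as floats makes it straightforward to compare events across tiers, for instance a boundary tone and the break index at the same time point.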
24Prosodic annotation: phonological, single-layer
- KIM (Kohler 1995)
- no suprasegmental tiers, hence efficient analysis of segment-prosody interaction
- differentiated from segmental labels by special diacritics
- time marks for prosodic events anchored to word boundaries
- Example
25
- 13 c 0.0007500
- 13 2 0.0007500
- 13 v 0.0007500
- 13 Q- 0.0007500
- 13 E 0.0007500
- 2147 m 0.1341250
- 4787 PGn 0.2991250
- 4787 2( 0.2991250
- 4787 d 0.2991250
- 6243 -h 0.3901250
- 6619 'i 0.4136250
- 7569 n 0.4730000
- 8265 s 0.5165000
- 9202 t 0.5750625
- 9527 -h 0.5953750
- 9995 a 0.6246250
- 10648 k-x 0.6654375
- 11405 0 0.7127500
- 11405 v 0.7127500
- 14721 0 0.9200000
- 14721 m 0.9200000
- 16051 i6 1.0031250
- 16935 0 1.0583750
- 16935 g 1.0583750
- 18093 -h 1.1307500
- 18564 'u 1.1601875
- 19314 t 1.2070625
- 19981 -h 1.2487500
- 20336 0. 1.2709375
- 20336 2) 1.2709375
- 20336 p 1.2709375
- 21501 -h 1.3437500
- 22440 'a 1.4024375
- 23700 s 1.4811875
- 25408 @- 1.5879375
- 25408 n 1.5879375
- 28935 , 1.8083750
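Each entry in this listing consists of a sample index, a label and a time in seconds. The values shown are consistent with time = (sample - 1) / 16000, which suggests a sampling rate of 16 kHz; that rate is inferred from the numbers and not stated on the slide. A minimal sketch for reading such entries and checking the relation (function names are hypothetical):

```python
RATE = 16000  # assumption: inferred from the listed values, not stated on the slide

def parse_entry(line):
    """Split one 'sample label seconds' entry into typed fields."""
    sample, label, seconds = line.split()
    return int(sample), label, float(seconds)

def sample_to_seconds(sample, rate=RATE):
    """Convert a 1-based sample index to a time in seconds."""
    return (sample - 1) / rate

entries = ["13 c 0.0007500", "2147 m 0.1341250", "6619 'i 0.4136250", "28935 , 1.8083750"]
parsed = [parse_entry(e) for e in entries]
consistent = all(abs(sample_to_seconds(s) - t) < 1e-9 for s, _, t in parsed)
```

Because segmental and prosodic labels share one time line, the same parser serves both; only the label inventory differs.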
30Data structures and retrieval
- Mostly plain text files, aligned to the signal
- Retrieval using scripting languages
- (GToBI in EMU format)
- XML formats
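As an illustration of the step from plain text labels to an XML representation, the sketch below serialises time-aligned labels with Python's standard `xml.etree.ElementTree` module and then retrieves one tier again. The element and attribute names (`annotation`, `tier`, `event`, `time`, `label`) are hypothetical and do not follow any particular annotation standard or the EMU format.

```python
import xml.etree.ElementTree as ET

def tiers_to_xml(tiers):
    """Serialise {tier name: [(time, label), ...]} to an XML string."""
    root = ET.Element("annotation")
    for name, events in tiers.items():
        tier = ET.SubElement(root, "tier", name=name)
        for time, label in events:
            ET.SubElement(tier, "event", time=f"{time:.6f}", label=label)
    return ET.tostring(root, encoding="unicode")

tiers = {
    "tones": [(47.469173, "LH"), (47.555339, "H-")],
    "breaks": [(47.555339, "3")],
}
xml_text = tiers_to_xml(tiers)

# Retrieval: all labels on the "tones" tier.
root = ET.fromstring(xml_text)
tone_labels = [e.get("label") for e in root.findall("./tier[@name='tones']/event")]
```

With a structured format, retrieval becomes a path query instead of ad hoc string processing on text files.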
31What for?
- Basic research
- Rhythmic patterns
- Speech rate measurements (units, domains)
- Temporal alignment and scaling of pitch accents
- Differentiated analysis of pitch range
- Speech technology
- Modelling accentuation in ASR
- Speech rate in ASR
- Intonation and timing for synthesis
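As one example of such a use, speech rate can be derived directly from time-aligned labels. The sketch below distinguishes a rate computed over the whole stretch, pauses included, from a rate computed over spoken material only. The choice of unit (here syllables), the pause label `<p>` and the toy data are assumptions for illustration.

```python
def speaking_rate(units, pause_label="<p>"):
    """Units per second over the whole stretch, pauses included."""
    count = sum(1 for _, _, label in units if label != pause_label)
    total = units[-1][1] - units[0][0]
    return count / total

def articulation_rate(units, pause_label="<p>"):
    """Units per second over spoken material only, pauses excluded."""
    spoken = [(start, end) for start, end, label in units if label != pause_label]
    total = sum(end - start for start, end in spoken)
    return len(spoken) / total

# Toy data: (start, end, label) in seconds; four syllables and one pause.
units = [
    (0.0, 0.25, "syl"), (0.25, 0.5, "syl"),
    (0.5, 1.0, "<p>"),
    (1.0, 1.25, "syl"), (1.25, 1.5, "syl"),
]
```

Which unit and which domain are chosen changes the result considerably, which is why the annotation has to make both explicit.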
32Bibliography
- Alwan, A., H. Bourlard and S. Furui (eds). 2001. Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.
- Grice, M., S. Baumann and R. Benzmüller (to appear). German ToBI. In S. Jun (ed). Prosodic Typology.
- Grice, M. et al. 2000. Representation and annotation of dialogue. In Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.
- Kohler, K.J. (ed). 1995. Kieler Arbeitsberichte 29.
- Taylor, P. 2000. Analysis and Synthesis of Intonation Using the Tilt Model. In JASA 107(3), pp. 1697-1714.