Title: Children's Oral Reading Corpus (CHOREC): Description and Assessment of Annotator Agreement
L. Cleuren, J. Duchateau, P. Ghesquière, H. Van hamme
The SPACE project
Overview of the presentation
- The SPACE project
- Development of a reading tutor
- Development of CHOREC
- Annotation procedure
- Annotation agreement
- Conclusions
1. The SPACE project
- SPACE = SPeech Algorithms for Clinical and Educational applications
- http://www.esat.kuleuven.be/psi/spraak/projects/SPACE
- Main goals
  - Demonstrate the benefits of speech-technology-based tools for
    - an automated reading tutor
    - a pathological speech recognizer (e.g. for dysarthria)
  - Improve automatic speech recognition and speech synthesis for use in these tools
2. Development of a reading tutor
- Main goals
  - Computerized assessment of word decoding skills
  - Computerized training for slow and/or inaccurate readers
- Accurate speech recognition is needed to accurately detect reading errors
3. Development of CHOREC
- To improve the recognizer's reading error detection abilities, CHOREC is being developed
  - Children's Oral Reading Corpus
  - Dutch database of recorded, transcribed, and annotated children's oral readings
- Participants
  - 400 Dutch-speaking children
  - 6-12 years old
  - without (n = 274, regular schools) or with (n = 126, special schools) reading difficulties
3. Development of CHOREC (b)
- Reading material
  - existing REAL WORDS
  - non-existing but well-pronounceable words (i.e. PSEUDOWORDS)
  - STORIES
- Recordings
  - 22050 Hz, 2 microphones
  - 42 GB or 130 hours of speech
4. Annotation procedure
- Segmentations, transcriptions, and annotations made with PRAAT (http://www.praat.org)
4. Annotation procedure (b)
- Pass 1 → p-files
  - Orthographic transcriptions
  - Broad-phonetic transcriptions
  - Utterances made by the examiner
  - Background noise
- Pass 2 → f-files (only for those words that contain reading errors or hesitations)
  - Reading strategy labelling
  - Reading error labelling
4. Annotation procedure (c)
- Expected: Els zoekt haar schoen onder het bed.
  (Els looks for her shoe under the bed.)
- Observed: Als (says something) zoekt haar sch-schoen onder bed.
  (Als (says something) looks for her sh-shoe under bed.)
- Resulting annotation tiers (a data-structure sketch follows this slide):
  Target:      Els zoekt haar schoen onder het bed.
  Orthography: (als) s zoekt haar schoen onder bed
  Phonetics:   Als s zukt har sx sxun Ond@r bEt
  Strategy:    f sg g agg g O g
  Error:       e/4
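For concreteness, here is a minimal sketch of how one such annotated sentence could be held in memory. The tier contents are copied from the example above; the class and field names are illustrative assumptions, not the actual CHOREC or Praat TextGrid format, and the label semantics (e.g. "agg", "e/4") are defined by the CHOREC annotation protocol.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedChunk:
    """One annotated sentence with pass-1 (p-file) and pass-2 (f-file) tiers.

    Illustrative structure only; the corpus itself stores these as Praat tiers.
    """
    target: str                     # prompted text
    orthography: str                # orthographic transcription of what was read
    phonetics: str                  # broad-phonetic transcription
    strategy: Optional[str] = None  # reading strategy labels (pass 2)
    error: Optional[str] = None     # reading error labels (pass 2)

# The slide's example, copied verbatim.
example = AnnotatedChunk(
    target="Els zoekt haar schoen onder het bed.",
    orthography="(als) s zoekt haar schoen onder bed",
    phonetics="Als s zukt har sx sxun Ond@r bEt",
    strategy="f sg g agg g O g",
    error="e/4",
)

# Pass-2 tiers are only filled in when a word contains an error or hesitation:
needs_pass2 = example.strategy is not None or example.error is not None
```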
5. Annotation agreement
- The quality of the annotations relies heavily on various annotator characteristics (e.g. motivation) and external influences (e.g. time pressure).
- Inter- and intra-annotator agreement were analysed to measure the quality of the annotations
  - INTER: triple p-annotations by 3 different annotators for 30% of the corpus (p01, p02, p03)
  - INTRA: double f-annotations by the same annotator for 10% of the corpus (f01, f01b, f02)
5. Annotation agreement (b)
- Remark about the double f-annotations
  - f01 = p01 + reading strategy & reading error tiers
  - f01b = f01 with the reading strategy & reading error tiers annotated a second time
  - f02 = p02 + reading strategy & reading error tiers
- Agreement metrics (a computation sketch follows this slide)
  - Percentage agreement + 95% CI
  - Kappa statistic + 95% CI
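As a rough illustration of these two metrics, the sketch below computes raw percentage agreement and Cohen's kappa for two annotators' label sequences. The data are made up, and the 95% confidence intervals mentioned above are not computed here.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items (in %) on which two annotators assign the same label."""
    assert len(a) == len(b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n)            # chance agreement
              for l in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)

# Made-up per-word "reading error detected?" decisions by two annotators.
ann1 = ["ok", "ok", "err", "ok", "ok", "err", "ok", "ok", "ok", "ok"]
ann2 = ["ok", "ok", "err", "ok", "err", "err", "ok", "ok", "ok", "ok"]
print(percent_agreement(ann1, ann2))  # 90.0
print(cohens_kappa(ann1, ann2))       # ~0.74
```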
5. Annotation agreement (c)
- All data (% agreement; κ)
  - Reading error detection (RED): 95.96; κ = 0.796
  - Orthographic transcriptions (OT): 90.79
  - Phonetic transcriptions (PT): 86.37; κ = 0.930
  - Reading strategy labelling (RSL), f01-f01b (1): 98.64; κ = 0.966
  - Reading strategy labelling (RSL), f01-f02 (2): 91.50; κ = 0.779
  - Reading error labelling (REL), f01-f01b (1): 97.77; κ = 0.911
  - Reading error labelling (REL), f01-f02 (2): 94.14; κ = 0.717
- Overall high agreement!
  - κ: 0.717 - 0.966
  - %: 86.37 - 98.64
- INTER
  - κ: PT > RED
  - %: RED > OT > PT
- INTRA
  - κ: RSL > REL; (1) > (2)
  - %: RSL > REL for (1), RSL < REL for (2); (1) > (2)
(p < .05)
5. Annotation agreement (d)
- By school type (regular / special): % agreement; κ
  - Reading error detection (RED): 96.32 / 95.21; κ = 0.779 / 0.816
  - Orthographic transcriptions (OT): 92.13 / 87.93
  - Phonetic transcriptions (PT): 88.51 / 82.18; κ = 0.937 / 0.917
  - Reading strategy labelling (RSL), f01-f01b (1): 98.72 / 98.45; κ = 0.961 / 0.971
  - Reading strategy labelling (RSL), f01-f02 (2): 93.09 / 88.38; κ = 0.802 / 0.744
  - Reading error labelling (REL), f01-f01b (1): 98.01 / 97.22; κ = 0.899 / 0.921
  - Reading error labelling (REL), f01-f02 (2): 95.39 / 91.71; κ = 0.722 / 0.706
- Overall high agreement!
  - κ: 0.706 - 0.971
  - %: 82.18 - 98.72
- When looking at agreement scores: regular > special (except for the f01-f01b comparisons)
- However, when looking at kappa values: no systematic or significant differences
  - RED: regular < special
  - PT: regular > special
  - RSL (2): regular > special
(p < .05)
5. Annotation agreement (e)
- By task type (real words (RW) / pseudowords (PW) / stories (S)): % agreement; κ
  - Reading error detection (RED): 95.20 / 90.59 / 98.37; κ = 0.735 / 0.776 / 0.794
  - Orthographic transcriptions (OT): 88.92 / 80.50 / 95.56
  - Phonetic transcriptions (PT): 78.87 / 68.45 / 94.34; κ = 0.907 / 0.888 / 0.964
  - Reading strategy labelling (RSL), f01-f01b (1): 98.35 / 96.75 / 99.26; κ = 0.960 / 0.966 / 0.956
  - Reading strategy labelling (RSL), f01-f02 (2): 91.25 / 77.79 / 95.96; κ = 0.774 / 0.733 / 0.711
  - Reading error labelling (REL), f01-f01b (1): 97.55 / 92.55 / 99.32; κ = 0.896 / 0.575 / 0.933
  - Reading error labelling (REL), f01-f02 (2): 94.57 / 80.88 / 98.24; κ = 0.709 / 0.660 / 0.848
- Overall substantial agreement!
  - κ: 0.575 - 0.966
  - %: 68.45 - 99.32
- When looking at agreement scores: S > RW > PW (p < .05)
- However, when looking at kappa values
  - Always best agreement for S (except for RSL: no significant difference, or RW > S in case of (2))
  - No systematic or significant differences between RW and PW
    - RED: RW < PW
    - PT: RW > PW
    - RSL: RW ≈ PW
    - REL: RW > PW (1) or RW ≈ PW (2)
5. Annotation agreement (f)
- Remarkable finding
  - The systematic differences in agreement scores disappear when looking at kappa values!
- Explanation
  - The differences go hand in hand with differences in the number of reading errors made:
    - children from special schools make more errors than children from regular schools
    - pseudowords are harder to read than real words, which in turn are harder to read than words embedded in a text
- → Kappa is better suited to assess annotation quality (see the sketch below)
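The sketch below illustrates this point with made-up counts (not CHOREC figures): when errors are rare, both annotators trivially agree on most words, so percentage agreement is inflated, whereas kappa discounts that chance agreement.

```python
def agreement_stats(both_ok, both_err, only_a_err, only_b_err):
    """Percentage agreement and Cohen's kappa from 2x2 error/no-error counts."""
    n = both_ok + both_err + only_a_err + only_b_err
    p_o = (both_ok + both_err) / n                    # observed agreement
    a_err = (both_err + only_a_err) / n               # annotator A's error rate
    b_err = (both_err + only_b_err) / n               # annotator B's error rate
    p_e = a_err * b_err + (1 - a_err) * (1 - b_err)   # chance agreement
    return 100 * p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical low-error condition (e.g. stories): high % agreement, modest kappa.
print(agreement_stats(both_ok=94, both_err=3, only_a_err=2, only_b_err=1))   # ~(97.0, 0.65)
# Hypothetical high-error condition (e.g. pseudowords): lower % agreement, higher kappa.
print(agreement_stats(both_ok=60, both_err=30, only_a_err=5, only_b_err=5))  # ~(90.0, 0.78)
```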
6. Conclusions
- The SPACE project
  - SPeech Algorithms for Clinical and Educational applications
  - http://www.esat.kuleuven.be/psi/spraak/projects/SPACE
- CHOREC
  - Dutch database of recorded, transcribed, and annotated children's oral readings
- Assessment of annotator agreement
  - High overall agreement → reliable annotations
  - Kappa is better suited to assess annotation quality