Title: The Sounds of Silence
1 The Sounds of Silence
- Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens
- Jack Mostow and Gregory Aist
- Project LISTEN, Carnegie Mellon University
- http://www.cs.cmu.edu/listen
2 Pilot study in urban elementary school
- Goals
- Analyze extended use of Reading Tutor
- Identify opportunities for improvement
- Protocol
- Principal chose 8 lowest third-grade readers
- Aide took each kid daily to use Reading Tutor in small room
- Kid chose text to read (Weekly Reader, poems, ...)
- Milestones
- Oct. 96: deployed Pentium, trained users, refined design
- Nov. 96: school pre-tested individually
- June 97: school post-tested individually
3 User-Tutor interaction (11/7/96 version used in pilot study)
- User may
- click Back
- click Help
- click Go
- click word
- read
- Tutor may
- go on
- read word
- recue word
- read phrase
4 Data recorded by Reading Tutor
- Sessions from Nov. 96 to May 97 (excluding outliers)
- 29 to 57 sessions per kid, averaging 14 minutes
- Not used during vacations, downtime, absences
- 6 gigabytes of data
- .WAV files of kids' spoken utterances
- .SEG files of time-aligned speech recognizer output
- .LOG files of Reading Tutor events
5 What to evaluate?
- Usability (can kids use it?)
- 1993 Wizard of Oz experiments
- Lab and in-school user tests of successive versions
- Assistiveness (do kids perform better with than without?)
- 1994 Reading Coach boosted comprehension by 20%
- But evaluation obtrusive, costly, sparse, subjective, noisy
- Learning (do kids improve over time?)
- Within tutor: this talk
- On unassisted reading: pre-/post-test by school
- More than with alternatives: future studies
6 How should the Reading Tutor evaluate learning?
- Evaluation should be
- Ecologically valid -- based on normal system use
- Authentic -- student chooses material
- Unobtrusive -- invisible to student
- Automatic -- objective, cheap
- Fast -- computable in real-time on PC
- Robust -- to student, recognizer, and tutor behavior
- Data-rich -- based on many observations
- Sensitive -- detect subtle effects
- So estimate improvement in assisted performance
7 How to estimate performance?
- Accuracy: percentage of text words matched by recognizer output
- Coarse-grained
- Sensitive to missed words
- Doesn't penalize requests for help
- Inter-word latency: time interval between aligned text words
- Finer-grained
- Sensitive to hesitations, insertions
- Robust to many speech recognizer errors
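As a rough illustration of the two measures above, the sketch below computes accuracy and inter-word latencies from an utterance that has already been aligned against the text. The `AlignedWord` record, its field names, and the rule that a latency is the gap between the end of one matched word and the start of the next are assumptions made for illustration; the slides do not specify the Reading Tutor's actual data structures or timing rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedWord:
    """One text word and, if the recognizer matched it, its timing (hypothetical layout)."""
    text: str
    start_cs: Optional[int] = None  # start time in centiseconds; None if not matched
    end_cs: Optional[int] = None    # end time in centiseconds; None if not matched

    @property
    def matched(self) -> bool:
        return self.start_cs is not None and self.end_cs is not None


def accuracy(words: List[AlignedWord]) -> float:
    """Fraction of text words matched by the recognizer output."""
    if not words:
        return 0.0
    return sum(w.matched for w in words) / len(words)


def inter_word_latencies(words: List[AlignedWord]) -> List[Optional[int]]:
    """Gap before each word, measured from the end of the previous matched word.

    Returns None where no latency can be measured: for unmatched words and
    for the first matched word.
    """
    latencies: List[Optional[int]] = []
    prev_end: Optional[int] = None
    for w in words:
        if w.matched and prev_end is not None:
            latencies.append(w.start_cs - prev_end)
        else:
            latencies.append(None)
        if w.matched:
            prev_end = w.end_cs
    return latencies
```

Under these assumptions a missed word lowers accuracy, while a hesitation shows up as a large latency on the next matched word.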
8 Estimation of accuracy and latency (Nov. 96 example from video)
- Text
- If the computer thinks you need help, it talks to you.
- Student said
- if the computer...takes your name...help it...take...s to you
- Recognizer heard
- IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU
- Tutor estimated 81% accuracy; inter-word latencies
- If the computer thinks you need help, it talks...to you.
- ? 43 39 1 60 41 226 7 1 242 1 cs
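The accuracy estimate on this slide can be approximated by aligning the text against the recognizer output and counting matched text words. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in aligner; that choice, and the helper names `words` and `matched_word_count`, are assumptions for illustration, not the Reading Tutor's actual alignment method.

```python
from difflib import SequenceMatcher

text = "If the computer thinks you need help, it talks to you."
heard = "IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU"


def words(s: str) -> list:
    """Uppercase and strip punctuation so text and recognizer words compare equal."""
    return [w.strip(".,!?").upper() for w in s.split()]


def matched_word_count(text_words: list, heard_words: list) -> int:
    """Number of text words aligned, in order, to recognizer words (difflib stand-in)."""
    sm = SequenceMatcher(None, text_words, heard_words, autojunk=False)
    return sum(block.size for block in sm.get_matching_blocks())


text_words = words(text)
n_matched = matched_word_count(text_words, words(heard))
accuracy_pct = 100 * n_matched / len(text_words)
print(n_matched, len(text_words), int(accuracy_pct))  # 9 11 81
```

With this alignment, "need" and "talks" are the two unmatched text words, giving 9 of 11. Truncating the percentage yields the slide's figure; whether the Tutor rounded or truncated is not stated.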
9 Improvement in accuracy and latency (same kid reads "help" in May 97)
- Text
- When some kids jump rope, they help other people too.
- Student said
- when some kids jump rope they help other people too
- Recognizer heard
- WHEN SOME KIDS JUMP ROPE THEY HELP OTHER PEOPLE TOO
- Tutor estimated 100% accuracy; inter-word latencies
- When some kids jump rope, they help other people too.
- ? 1 10 34 19 77 9 1 34 1 cs
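To compare the same word across the two examples, the sketch below pairs each text word with the latency listed at the same position and looks up "help" in both. It assumes that the i-th number on each slide belongs to the i-th text word, with `?` standing for the unmeasured first word and `cs` meaning centiseconds; this pairing is inferred from the counts matching, and the helper name `latency_before` is hypothetical.

```python
nov_text = "If the computer thinks you need help it talks to you"
nov_cs = [None, 43, 39, 1, 60, 41, 226, 7, 1, 242, 1]

may_text = "When some kids jump rope they help other people too"
may_cs = [None, 1, 10, 34, 19, 77, 9, 1, 34, 1]


def latency_before(word, text, latencies_cs):
    """Latency (cs) listed for the first occurrence of `word`, assuming one value per word."""
    tokens = text.lower().split()
    assert len(tokens) == len(latencies_cs), "expected one latency per text word"
    return latencies_cs[tokens.index(word)]


nov_help = latency_before("help", nov_text, nov_cs)
may_help = latency_before("help", may_text, may_cs)
print(nov_help, may_help)  # 226 9
```

Under that pairing, the latency listed for "help" is 226 cs in the November example and 9 cs in the May example.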
10 Which performance improvements count?
- Echoing the sentence doesn't count.
- So look only at the first try.
- Picking stories with easier words doesn't count.
- So look at changes on the same word.
- Memorizing the story doesn't count.
- So look only at encounters of words in new contexts.
- Remembering recent words doesn't count.
- So look only at the first time a word is seen that day.
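The four rules above can be sketched as a single pass over one student's word encounters in chronological order. The `Encounter` record and its fields are hypothetical, and "new context" is interpreted here as "this word has not been encountered in this sentence before"; the slides do not give the exact definitions used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Encounter:
    """One student's reading of one text word (hypothetical record layout)."""
    word: str
    day: str         # e.g. "1996-11-07"
    sentence: str    # the sentence the word appeared in
    attempt: int     # 1 = first try at this sentence, 2+ = retries
    latency_cs: int  # inter-word latency in centiseconds


def countable(encounters: Iterable[Encounter]) -> List[Encounter]:
    """Keep encounters that pass the first-try, new-context, and first-that-day rules."""
    seen_contexts: Set[Tuple[str, str]] = set()   # (word, sentence) pairs already read
    seen_word_days: Set[Tuple[str, str]] = set()  # (word, day) pairs already read
    kept: List[Encounter] = []
    for e in encounters:
        context = (e.word, e.sentence)
        word_day = (e.word, e.day)
        keep = (
            e.attempt == 1                       # echoing the sentence doesn't count
            and context not in seen_contexts     # memorizing the story doesn't count
            and word_day not in seen_word_days   # remembering recent words doesn't count
        )
        # Record every encounter, kept or not, so later ones are judged correctly.
        seen_contexts.add(context)
        seen_word_days.add(word_day)
        if keep:
            kept.append(e)
    return kept


def by_word(encounters: Iterable[Encounter]) -> Dict[str, List[Encounter]]:
    """Group encounters by word so changes are measured on the same word."""
    groups: Dict[str, List[Encounter]] = defaultdict(list)
    for e in encounters:
        groups[e.word].append(e)
    return dict(groups)
```

Filtered-out encounters are still recorded as seen, so a retry or a repeated story cannot later be mistaken for a fresh encounter.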
11 Accuracy increased 16% on same word from first to last day seen in new context
12 Latency decreased 35% on same word from first to last day read in new context
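A first-to-last-day comparison like the ones in the two slide titles above could be computed along the following lines. This is a minimal sketch: the function name, the input layout, and the choice to compare the mean first-day value with the mean last-day value are assumptions, and the numbers in the example are made up for illustration rather than taken from the study.

```python
from statistics import mean
from typing import Dict, List, Tuple


def first_last_change(per_word: Dict[str, List[Tuple[str, float]]]) -> float:
    """Relative change from the mean first-day value to the mean last-day value.

    `per_word` maps each word to chronological (day, value) pairs, one per day.
    Words observed on only one day are skipped, since they have no first/last pair.
    """
    firsts: List[float] = []
    lasts: List[float] = []
    for observations in per_word.values():
        if len(observations) < 2:
            continue
        firsts.append(observations[0][1])
        lasts.append(observations[-1][1])
    if not firsts:
        raise ValueError("no word was observed on at least two days")
    return (mean(lasts) - mean(firsts)) / mean(firsts)


# Invented latencies in centiseconds, for illustration only.
toy_latencies = {
    "help": [("day 1", 100), ("day 2", 60)],
    "rope": [("day 1", 50), ("day 3", 30)],
    "jump": [("day 1", 40)],  # seen on one day only, so skipped
}
change = first_last_change(toy_latencies)
print(f"{change:+.0%}")  # -40%
```

The same function applies to accuracy if the values are per-word accuracy estimates instead of latencies.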
13 Is accuracy and latency estimation...
- Ecologically valid? Reading Tutor used in school
- Authentic? kids choose stories
- Unobtrusive? evaluate assisted reading invisibly
- Automatic? align recognizer output against text
- Fast? real-time on Pentium
- Robust? to much student, recognizer, and tutor behavior
- Data-rich? 10,498 utterances, 139,133 aligned words
- Sensitive? detects significant but subtle effects (< 0.1 sec)
14 Conclusion
- Does the Reading Tutor help?
- Yes, with assisted reading
- Transfers to unassisted reading!
- Research questions
- Who benefits how much, when, and why?
- How should we improve the Tutor?
- For more information
- http://www.cs.cmu.edu/listen