Title: Spoken Dialogue for Intelligent Tutoring Systems: Opportunities and Challenges
1Spoken Dialogue for Intelligent Tutoring Systems: Opportunities and Challenges
- Diane Litman
- Computer Science Department
- Learning Research and Development Center
- University of Pittsburgh
- HLT-NAACL 2006
2Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
3What is Tutoring?
- "A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something." (Evens and Michael 2006)
- Human Tutoring Excerpt
- Thanks to Natalie Person and Lindsay
Sears, Rhodes College
4Intelligent Tutoring Systems
- Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction (Bloom 1984)
- Unfortunately, providing every student with a personal human tutor is infeasible
- Develop computer tutors instead
5Tutorial Dialogue Systems
- Why is one-on-one tutoring so effective?
- "...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human tutors." (Graesser, Person et al. 2001)
- Working hypothesis regarding learning gains
- Human Dialogue > Computer Dialogue > Text
6Spoken Tutorial Dialogue Systems
- Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based
- Can the effectiveness of dialogue tutorial systems be further increased by using spoken interactions?
7A Brief History
- 1970 - Mid 1980s
- SCHOLAR (Carbonell)
- WHY (Stevens and Collins)
- SOPHIE (Burton and Brown)
- Meno-Tutor (Woolf and McDonald)
- Late 1980s - 1990s
- CIRCSIM-Tutor (Evens, Michael and Rovick)
- SHERLOCK II (Lesgold)
- Unix Consultant (Wilensky et al.)
- EDGE (Cawsey)
- Currently
- Why2-AutoTutor (Graesser et al.) (speech synthesis)
- Why2-Atlas (VanLehn et al.)
- CyclePad (Rosé et al.)
- Beetle (Moore et al.)
- DIAG-NLG (Di Eugenio)
- SCoT (Peters et al.) (spoken dialogue)
- ITSPOKE (Litman et al.) (spoken dialogue)
8Potential Benefits of Speech I
- Self-explanation correlates with learning (Chi et al. 1994) and occurs more in speech (Hausmann and Chi 2002)
- Tutor: The right side pumps blood to the lungs, and the left side pumps blood to the other parts of the body. Could you explain how that works?
- Student 1 (self-explains): So the septum is a divider so that the blood doesn't get mixed up. So the right side is to the lungs, and the left side is to the body. So the septum is like a wall that divides the heart into two parts...it kind of like separates it so that the blood doesn't get mixed up...
- Student 2 (doesn't self-explain): right side pumps blood to lungs
11Potential Benefits of Speech II
- Speech contains prosodic information, providing new sources of information about the student for dialogue adaptation (Fox 1993; Litman and Forbes-Riley 2003; Pon-Barry et al. 2005)
- A correct but uncertain student turn
- ITSPOKE: How does his velocity compare to that of his keys?
- STUDENT: his velocity is constant
12Potential Benefits of Speech III
- Spoken computational environments may foster social relationships that may enhance learning
- AutoTutor (Graesser et al. 2003)
13Potential Benefits of Speech IV
- Some applications inherently involve spoken language
- Spoken Conversational Interface for Language Learning
- Thanks to Stephanie Seneff, MIT and Cambridge
- Reading Tutors (Mostow, Cole)
- Others require hands-free interaction
- Circuit Fix-It Shop (Smith 1992)
14Why Should NLP Researchers Care?
- Many reasons why tutoring researchers are interested in spoken dialogue
- Why should spoken dialogue researchers become interested in tutoring?
- Tutoring applications differ in many ways from typical spoken dialogue applications
- Opportunities and Challenges!
15Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
16
- Back-end is Why2-Atlas system (VanLehn et al. 2002)
- Sphinx2 speech recognition and Cepstral text-to-speech
19Two Types of Tutoring Corpora
- Human Tutoring
- 14 students / 128 dialogues (physics problems)
- 5948 student turns, 5505 tutor turns
- Computer Tutoring
- ITSPOKE v1
- 20 students / 100 dialogues
- 2445 student turns, 2967 tutor turns
- ITSPOKE v2
- 57 students / 285 dialogues
- both synthesized and pre-recorded tutor voices
20ITSPOKE Experimental Procedure
- College students without physics
- Read a small background document
- Took a multiple-choice Pretest
- Worked 5 problems (dialogues) with ITSPOKE
- Took an isomorphic Posttest
- Goal was to optimize Learning Gain
- e.g., Posttest - Pretest
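
As a rough illustration of the learning-gain measure named above, the sketch below computes the raw gain (posttest minus pretest). The normalized variant is an assumption added for comparison, a commonly used alternative that the slides do not mention, and the scores are made-up proportions rather than ITSPOKE data.

```python
def raw_gain(pretest: float, posttest: float) -> float:
    """Raw learning gain: posttest score minus pretest score."""
    return posttest - pretest


def normalized_gain(pretest: float, posttest: float) -> float:
    """Gain as a fraction of the room for improvement (scores in [0, 1]).

    Assumption: this variant is not taken from the slides.
    """
    if pretest >= 1.0:
        return 0.0  # no room to improve; a convention chosen for this sketch
    return (posttest - pretest) / (1.0 - pretest)


# Hypothetical (pretest, posttest) proportions for three students.
scores = [(0.40, 0.70), (0.55, 0.60), (0.80, 0.90)]
for pre, post in scores:
    print(f"pre={pre:.2f} post={post:.2f} "
          f"raw={raw_gain(pre, post):.2f} norm={normalized_gain(pre, post):.2f}")
```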
21Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
22Predictive Performance Modeling
- Opportunity
- Spoken dialogue system evaluation methodologies can improve our understanding of how dialogue facilitates student learning (Forbes-Riley and Litman 2006)
- Challenges
- How to measure system performance?
- What are predictive interaction parameters?
23Predictive Performance Modeling
- Understand why a spoken dialogue system fails or succeeds
- PARADISE (Walker et al. 1997)
- Measure parameters (interaction costs and benefits) and performance in a system corpus
- Train model via multiple linear regression over parameters, predicting performance
- System Performance = $\sum_{i=1}^{n} w_i \, p_i$
- Test model on new corpus
- Predict performance during future system design
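
The regression step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the PARADISE implementation: the function names (`zscores`, `solve`, `fit_weights`) and the per-student numbers are hypothetical, and parameters and performance are z-scored first so that no intercept is needed and the weights are comparable across parameters.

```python
from statistics import mean, pstdev


def zscores(values):
    """Normalize a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def fit_weights(params, performance):
    """Least-squares weights w_i for z-scored parameters p_i (no intercept)."""
    cols = [zscores(col) for col in params]
    y = zscores(performance)
    k = len(cols)
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical per-student measurements (not from the ITSPOKE corpora).
time_on_task = [10.0, 14.0, 9.0, 20.0, 16.0, 12.0]
pretest = [0.3, 0.5, 0.4, 0.6, 0.2, 0.5]
posttest = [0.5, 0.7, 0.5, 0.9, 0.6, 0.7]

weights = fit_weights([time_on_task, pretest], posttest)
print(dict(zip(["time", "pretest"], weights)))
```

Testing the model on a new corpus then amounts to applying the fitted weights to that corpus's z-scored parameters and comparing the weighted sum with the observed performance.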
24Challenges
- System Performance
- Prior evaluations used User Satisfaction
- Is Student Learning more relevant for the tutoring domain?
- Interaction Parameters
- Prior applications used Generic parameters
- Are Task-Specific and Affective parameters also useful?
25Findings
- Using PARADISE to predict Learning
- Posttest = .86 * Time + .65 * Pretest - .54 * Neutrals
- Useful Predictors
- Traditional parameters
- e.g., Elapsed Time, Dialogue and Turn Length
- New parameters
- e.g., Affect, Correctness
26Contrasts with Non-Tutorial Dialogue
- User Satisfaction models are less useful
- Tutoring systems are not designed to maximize User Satisfaction
- Interaction parameters for learning
- Posttest = .86 * Time + .65 * Pretest - .54 * Neutrals
- longer dialogues are better
- speech recognition problems don't seem to matter
- lack of some types of affect is bad
29Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
30Detecting and Responding to Student Affective States
- Opportunity
- Affective spoken dialogue system technology can improve student learning and other measures of performance (Aist et al. 2002; Pon-Barry et al. 2006)
- Challenges
- What to detect?
- How to respond?
- Pedagogical versus spoken dialogue performance?
31Monitoring Student State (motivation)
- Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
- Student27: dammit (ASR: it is)
- Tutor28: Could you please repeat that?
- Student29: same (ASR: i same)
- Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
- Student31: zero (ASR: the zero)
- Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario
- Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
- Tutor34: Fine. Are there any other forces acting on the apple as it falls?
- Student35: no why are you doing this again (ASR: no y and to it yes)
- Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
- Student37: downward you computer (ASR: downward you computer)
32Affective Spoken Dialogue Systems: Standard Methodology
- Manual Annotation of Affect and Attitudes
- Naturally-occurring spoken dialogue data (Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005)
- Prediction via Machine Learning
- Automatically extract features from user turns
- Use different feature sets (e.g. prosodic, lexical) to predict affect
- Significant reduction of baseline error
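
To make "reduction of baseline error" concrete, the sketch below compares a classifier's error rate with that of a majority-class baseline and reports the relative error reduction. It is an illustration only: the labels and predictions are invented, the function names are hypothetical, and a real evaluation would typically measure both error rates with cross-validation rather than on a single list of turns.

```python
from collections import Counter


def error_rate(predicted, gold):
    """Fraction of turns whose predicted label differs from the gold label."""
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return wrong / len(gold)


def majority_baseline(gold):
    """Predict the most frequent gold label for every turn."""
    label, _ = Counter(gold).most_common(1)[0]
    return [label] * len(gold)


def relative_error_reduction(baseline_error, model_error):
    """Share of the baseline's errors that the model removes."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical annotated turns and classifier output.
gold = ["neutral", "neutral", "uncertain", "neutral", "frustrated", "neutral"]
model = ["neutral", "neutral", "uncertain", "neutral", "neutral", "neutral"]

base_err = error_rate(majority_baseline(gold), gold)
model_err = error_rate(model, gold)
print(base_err, model_err, relative_error_reduction(base_err, model_err))
```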
34Challenge 1: What emotions to detect?
- Communicator and Customer Care Systems
- Negative: Angry, Annoyed, Frustrated, Tired
- Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious
- (Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005)
- Tutorial Dialogue Systems
- Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad
- Positive/Neutral: Certain, Curious, Enthusiastic, Eureka
- (Litman and Forbes-Riley 2006; D'Mello et al. 2006)
35Example Student States in ITSPOKE
- ITSPOKE: What else do you need to know to find the box's acceleration?
- Student: the direction [UNCERTAIN]
- ITSPOKE: If you see a body accelerate, what caused that acceleration?
- Student: force [CERTAIN]
- ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
- Student: velocity [UNCERTAIN]
- ITSPOKE: Could you please repeat that?
- Student: velocity [ANNOYED]
36Challenge 2: How to respond?
- In tutoring, not all negatively-valenced emotions are bad!
- While frustration/anger/annoyance is often frustrating
- Frustration can also be an opportunity to learn
- Example from AutoTutor
- neutral → flow → confusion → frustration → neutral
- Thanks to Sidney D'Mello and Arthur Graesser, University of Memphis
37Example from ITSPOKE
- ITSPOKE: What is the net force acting on the truck equal to?
- Student: I don't know [Uncertain, Frustrated]
- ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
- Student: The impact force, the normal force, and gravity [Certain, Neutral]
- ITSPOKE: Yep.
38Uncertainty is also a Learning Opportunity
- Uncertainty represents one type of learning impasse (VanLehn et al. 2003)
- An impasse motivates a student to take an active role in constructing a better understanding of the principle.
- Uncertainty is also associated with cognitive disequilibrium (Craig et al. 2004)
- A state of failed expectations causing deliberation aimed at restoring equilibrium
- Uncertainty positively correlates with learning
39Do Human Tutors Respond to Student Uncertainty?
- A data-driven method for designing dialogue systems adaptive to student state (Forbes-Riley and Litman 2005)
- extraction of dialogue bigrams from annotated human tutoring corpora
- χ² analysis to identify dependent bigrams
- generalizable to any domain with corpora labeled for user state and system response
40Example Human Tutoring Excerpt
- S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
- T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
- S: mm-k. [Neutral]
- T: Acceleration is it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
- S: It's- the direction- it's downward. [Certain]
- T: Yes, it's vertically down. [Positive Feedback, Restatement]
41Bigram Dependency Analysis
- Student Certainness - Tutor Positive Feedback Bigrams
- χ² = 225.92 (critical χ² value at p = .001 is 16.27)
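
The dependency test behind this statistic can be sketched as a Pearson χ² computation over a contingency table of bigram counts. The counts below are hypothetical, not the corpus figures, and the table layout is an assumption: the slide does not show it, although the quoted critical value of 16.27 at p = .001 corresponds to 3 degrees of freedom, which would be consistent with four student states crossed with the presence or absence of one tutor act.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if student state and tutor act were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are student states, columns are tutor turns
# with / without Positive Feedback.
counts = [
    [60, 40],    # certain
    [45, 55],    # uncertain
    [20, 30],    # mixed
    [80, 320],   # neutral
]
stat, dof = chi_square(counts)
CRITICAL_P001_DF3 = 16.27  # critical value quoted on the slide
print(stat, dof, stat > CRITICAL_P001_DF3)
```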
43Bigram Dependency Analysis (cont.)
- Less Tutor Positive Feedback after Student
Neutral turns
- More Tutor Positive Feedback after Emotional
turns
44Findings
- Statistically significant dependencies exist between students' state of certainty and the responses of an expert human tutor
- After uncertain, tutor Bottoms Out and avoids expansions
- After certain, tutor Restates
- After mixed, tutor Hints
- After any emotion, tutor increases Feedback
- Dependencies suggest adaptive strategies for implementation in computer tutoring systems
45Challenge 3: Pedagogical versus spoken dialogue performance?
- Negative user emotions (e.g. frustration) are often associated with speech recognition problems (Boozer et al. 2003; Goldberg et al. 2003)
- Is this also true in tutoring?
- Speech recognition problems negatively correlate with user satisfaction (Walker et al. 2002; Pon-Barry et al. 2006)
- Is this also true for learning?
46Findings
- Statistically significant dependencies exist between student state and speech recognition problems (Rotaru and Litman 2006)
- Frustrated/Angry turns are rejected more than expected
- Uncertain turns have more problems than expected (certain turns have fewer)
- Incorrect turns have more problems than expected (correct turns have fewer)
- Learning opportunities (e.g. uncertain and incorrect student states) have more speech recognition problems
- However, speech recognition problems have not negatively correlated with learning (Litman and Forbes-Riley 2005; Pon-Barry et al. 2005)
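
One simple way to check a claim of this kind is a per-student correlation between a count of speech recognition problems and a learning measure. The sketch below uses a plain Pearson correlation on invented numbers; it is an assumption for illustration, and the cited studies may have used a different procedure (for example, controlling for pretest score).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical per-student data: rejected turns and learning gain.
rejections = [3, 7, 2, 9, 5, 4]
learning_gain = [0.20, 0.25, 0.10, 0.15, 0.30, 0.20]
print(round(pearson(rejections, learning_gain), 3))
```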
47Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
48Discourse Structure
- Opportunity
- Dialogues with tutoring systems have more complex hierarchical discourse structures compared to many other types of dialogues
- Challenges
- How can discourse structure be exploited in the context of spoken dialogue systems?
49Exploiting Discourse Structure (Motivation)
- Average ITSPOKE dialogue is 20 minutes
- Student turns are hierarchically structured
- Level 1: 1350 (57.3%)
- Level 2: 643 (27.3%)
- Level 3: 248 (10.5%)
- Levels 4-6: 113 (4.8%)
50Discourse structure: Annotation and Transitions
- Based on the Grosz & Sidner theory of discourse structure
- Discourse segment → Discourse segment purpose
- Hierarchy of discourse segments
- Tutoring information encoded in a hierarchical structure
- Human tutor manually authored dialogue paths for ITSPOKE
- Automatic traversal of logs places utterances into the structure
51ITSPOKE behavior and discourse structure annotation
- Figure: tutor questions Q1, Q2 and Q3, with sub-questions Q2.1 and Q2.2
52Discourse structure transitions
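
A simplified sketch of how transitions might be read off a path through the question hierarchy follows. It rests on several assumptions: depth is inferred from the dotted question IDs used in the preceding slide (Q2.1 is one level below Q2), and only three coarse labels are produced, whereas the talk distinguishes finer-grained transitions such as PopUp and PopUpAdvance. The function names are hypothetical.

```python
def depth(question_id: str) -> int:
    """Depth of a question in the hierarchy: 'Q2' -> 1, 'Q2.1' -> 2."""
    return question_id.count(".") + 1


def label_transitions(question_ids):
    """Label each step between consecutive questions by its change in depth."""
    labels = []
    for prev, cur in zip(question_ids, question_ids[1:]):
        change = depth(cur) - depth(prev)
        if change > 0:
            labels.append("Push")      # entered a sub-dialogue
        elif change < 0:
            labels.append("Pop")       # returned to an enclosing segment
        else:
            labels.append("Advance")   # moved on at the same level
    return labels


# One hypothetical path through the dialogue.
path = ["Q1", "Q2", "Q2.1", "Q2.2", "Q3"]
print(list(zip(path[1:], label_transitions(path))))
```

Counting consecutive label pairs (for example Advance-Advance or Push-Push) over such a sequence would give the kind of transition patterns that can then be correlated with learning.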
53Findings
- Student correctness is predictive of student learning, but only after particular discourse transitions (Rotaru and Litman 2006)
- e.g., After Pops (PopUp, PopUpAdvance)
- incorrect turns negatively predict learning
- correct turns positively predict learning
- Student certainness is more predictive only after particular transitions
54Findings (cont.)
- While single discourse transitions are not predictive of learning, patterns in the discourse structure are
- e.g., Advance-Advance and Push-Push both positively correlate with learning
- Statistically significant dependencies exist between discourse transitions and speech recognition
- e.g., after both Pushes and Pops, more misrecognitions
55Outline
- Motivation and History
- The ITSPOKE System and Corpora
- Opportunities and Challenges
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- Summing Up
56Summing Up I
- Spoken Dialogue Systems are of great interest to researchers in Intelligent Tutoring
- One-on-one tutoring is a powerful technique for helping students learn
- Natural language dialogue contributes in a powerful way to the efficacy of one-on-one tutoring
- Using presently available NLP technology, computer tutors can be built and can serve as a valuable aid to student learning
58Summing Up II
- Intelligent Tutoring in turn provides many opportunities and challenges for researchers in Spoken Dialogue Systems
- Performance Evaluation
- Affective Reasoning
- Discourse Analysis
- and many more!
- Initiative, Cohesion/Coherence, Dialogue Acts, Turn-Taking, Reinforcement Learning, User Simulation, Question-Answering
59Acknowledgements
- ITSPOKE group
- Hua Ai, Kate Forbes-Riley, Alison Huettner, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward
- Columbia Collaborators: Julia Hirschberg, Jackson Liscombe, Jennifer Venditti
- NLP@Pitt
- Jan Wiebe, Rebecca Hwa, Wendy Chapman, Paul Hoffmann, Behrang Mohit, Carol Nichols, Swapna Somasundaran, Theresa Wilson, Chenhai Xi
- Why2-Atlas and Human Tutoring groups
- Kurt VanLehn, Pam Jordan, Uma Pappuswamy, Carolyn Rosé
- Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy
- Art Graesser, Natalie Person, Sidney D'Mello, Lindsay Sears
- Stephanie Seneff
- Martha Evens
60Thank You!
- Questions?
- Further Information
- http://www.cs.pitt.edu/litman/itspoke.html
- And in September, come to Pittsburgh for
Interspeech 2006!