Title: The Why2 Project
1. The Why2 Project
- Kurt VanLehn
- the Natural Language Tutoring Group at the University of Pittsburgh
- Art Graesser
- the Tutoring Research Group at the University of Memphis
2. Broad Outline
- Intro and evaluations (45 min): Art and Kurt
- Why2/Atlas (60 min): Pam
- Coffee break (15 min)
- Why2/AutoTutor (60 min): Art, Tanner Jackson and Andrew Olney
- ProPl (30 min): H. Chad Lane (shared NL technology, but not the Why2 task domain)
3. Our research topic
- The effects of natural language interaction on learning during tutoring
- Where tutoring is:
- In natural language (written or spoken)
- More conceptual than procedural
- Consistent with the 5-step frame (Graesser, Person & Magliano, 1995)
4. Where the 5-step frame is
- Tutor asks a long-answer question
- Student gives initial answer
- Tutor gives brief evaluative feedback
- Tutor and student collaboratively improve the answer via a multi-turn dialogue
- Tutor ends the discussion (often by asking students if they understand, and almost always getting a positive response).
5. Example of the 5-step frame
- Tutor: What does a T-test tell you?
- Student: Whether a mean is significant.
- Tutor: Sorta.
- Tutor: Can it be applied to an experiment with just 1 condition, or do you need 2 or more?
- Student: More than one.
- <etc>
- Tutor: So do you understand the T-test?
- Student: Yes.
- Tutor: Good. Let's go on.
6. 5-step frame tutoring is more effective than reading a textbook
- AutoTutor (next slide) implements the 5-step frame
- AutoTutor > textbook = nothing
- Computer literacy (Graesser et al. 2001)
- Qualitative physics (Graesser et al. 2003)
- (= experiment 2 in the Why2 series)
- Participants had taken or were taking the relevant course
7. (No transcript)
8. Why is such tutoring effective?
- The Interaction hypothesis
- The more interaction, the more learning
- Inspired by
- Slamecka & Graf's generation effect
- Chi's self-generation effect
9. Testing the interaction hypothesis on the 5-step frame
- Tutoring
- Leave steps 1 through 5 as interaction
- Text only
- Replace steps 1 through 5 with text
- Canned text remediation
- Leave steps 1 and 2 as interaction, but replace steps 3, 4 and 5 with text
Already found: tutoring > text
Will this be in between?
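The three conditions on this slide differ only in which steps of the frame remain interactive. A minimal sketch of that design follows, assuming a simple set-based encoding; the names `INTERACTIVE_STEPS`, `text_steps` and `rank_by_interactivity` are illustrative and do not come from the Why2 project.

```python
# Hypothetical encoding of the slide's three conditions: the steps of the
# 5-step frame that remain interactive in each one.
FIVE_STEPS = (1, 2, 3, 4, 5)

INTERACTIVE_STEPS = {
    "tutoring": {1, 2, 3, 4, 5},
    "canned_text_remediation": {1, 2},
    "text_only": set(),
}


def text_steps(condition):
    """Return the steps that are replaced with text in the given condition."""
    return [s for s in FIVE_STEPS if s not in INTERACTIVE_STEPS[condition]]


def rank_by_interactivity():
    """Return the conditions ordered from most to fewest interactive steps."""
    return sorted(
        INTERACTIVE_STEPS,
        key=lambda name: len(INTERACTIVE_STEPS[name]),
        reverse=True,
    )
```

Under the interaction hypothesis, learning would be expected to follow the ordering returned by `rank_by_interactivity()`, which is why canned text remediation is asked about as the in-between case.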
10. Example, continued
- Canned text remediation
- Tutor: What does a T-test tell you?
- Student: Whether a mean is significant.
- Tutor: The T-test is used to compare the means of 2 or more populations. It determines the chance that they ...
- Text only
- Tutor: The T-test is used to compare the means of 2 or more populations. It determines the chance that they ...
11. We study both human tutors and computer tutors
- NL computer tutors can control the content of tutoring better than human tutors
- The Interaction hypothesis could be true for human tutors but false for NL computer tutors
- Or vice-versa?
12. Include both deep linguistic and shallow linguistic computer tutors
13. Our research design
- Compare
- Human tutors
- Computer tutors
- Why2/AutoTutor
- Why2/Atlas
- Canned Text Remediation
- Control content, so only interactivity varies
- Prediction: Human tutors > Computer tutors > Canned text remediation
- No prediction: Why2/AutoTutor vs. Why2/Atlas
14. Outline of the evaluation section
- Introduction (done)
- Experiments
- Related work and interpretation
15. Task domain: Qualitative physics essay questions
- Tutor: Suppose a man is running in a straight line at constant speed. He throws a pumpkin straight up. Where will it land? Explain.
- Student: The pumpkin will land right back into his hands. There is only one force acting on the pumpkin once it leaves his hands and that is gravity which is only pulling it straight down. It does not have a horizontal component, only a horizontal component straight down where the man threw it.
16. Student's screen for human tutoring and Why2/Atlas
Problem
Dialogue history
Student's essay
Student's turn in the dialogue
17. For each problem, all tutors implement the 5-step frame
- Show problem (question)
- Collect the student's initial essay (answer)
- Analyze the essay for missing correct points and for errors (misconceptions).
- Select a missing point or error, and discuss it with the student. Repeat until satisfied. (This is a simplification.)
- Wrap up, including presenting the ideal essay
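The loop described on this slide can be sketched as follows. This is a toy illustration under stated assumptions, not the Why2 tutors' implementation: essay analysis is reduced to case-insensitive substring matching, and `analyze_essay`, `tutor_problem` and the `discuss` callback are hypothetical names introduced here.

```python
def analyze_essay(essay, correct_points, misconceptions):
    """Build the agenda: correct points the essay lacks, then errors it contains.

    Assumption: plain substring matching stands in for real language analysis.
    """
    text = essay.lower()
    missing = [p for p in correct_points if p.lower() not in text]
    errors = [m for m in misconceptions if m.lower() in text]
    return missing + errors


def tutor_problem(essay, correct_points, misconceptions, discuss, ideal_essay):
    """Run the analyze / discuss / wrap-up part of the frame for one problem.

    discuss(topic) is a hypothetical callback that holds the multi-turn
    dialogue about one missing point or error.
    """
    agenda = analyze_essay(essay, correct_points, misconceptions)
    discussed = []
    while agenda:  # "repeat until satisfied", simplified to an empty agenda
        topic = agenda.pop(0)
        discuss(topic)
        discussed.append(topic)
    return discussed, ideal_essay  # wrap up by presenting the ideal essay
```

For example, calling `tutor_problem` with an essay that mentions gravity but not horizontal velocity would put the missing point on the agenda and pass it to `discuss` before the wrap-up.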
18. For each problem, the canned text remediation
- Presents the problem
- Collects the student's initial essay
- Presents several text minilessons
- Collects the student's revised essay
- Presents the ideal essay
19. Experiment 1 procedure
- Survey on participant information
- Pretest (form A or B)
- Multiple choice test
- 4 essay questions
- Training on 10 problems
- Posttest (form B or A)
- Multiple choice test
- 4 essay questions
20. Experiment 1 participants
- Students
- Are taking or had taken college physics
- Had not taken advanced physics
- Tutors
- 4 physics instructors
- 1 was a full-time tutor for the project
21. Experiment 1 conditions
- Human tutors (N = 18)
- Why2/Atlas (N = 22)
- Why2/AutoTutor (N = 24)
- Canned Text Remediation (N = 22)
22. Experiment 1 adjusted post-test scores
23. Experiment 1 results: All 4 groups learned the same amount
- Measures
- Multiple choice test: raw, normalized, malleable
- Essay test: lenient, stringent, holistic
- Combined measure of principle/misconceptions
- Statistical power was sufficient
- Subpopulations
- Test items that are similar to training vs. not
- Subjects with low vs. high pretests
24. Experiment 1 Discussions
- Tests may not detect deep learning
- Add retention and far transfer tests
- Multi-session experiment may be exhausting
- Reduce to two sessions
25. Experiment 3 procedure
- Session 1
- Pretest (3 essays)
- Training (5 problems)
- Posttest (3 essays, 26 multiple choice)
- Session 2 (1 week later)
- Retention Test (3 essays, 26 multiple choice)
- Far transfer test (7 essays)
26. Experiment 3 conditions and participants
- Conditions
- Why2/AutoTutor (N = 28)
- Canned Text Remediation (N = 28)
- Subjects had taken or were taking college physics
27. Experiment 3 Adjusted post-test scores
28. Experiment 3 discussion
- No significant differences between training conditions.
- Marginal advantage for AutoTutor on multiple choice at posttest (p = .08) and retention (p = .12).
- Intermediates may have been refreshing their memory equally well in all conditions.
- Use novice participants
29. Experiment 4 participants and conditions
- Participants
- No college physics courses
- High school physics okay
- Materials not rewritten
- College physics students are still the target audience
- Materials may be over the new participants' heads
- Conditions
- Human tutoring, typed (N = 20)
- Human tutoring, spoken (N = 14)
- Canned text remediation (N = 20)
30. Experiment 4 procedure
- Pre-test: Same as experiment 1
- Pre-training: short physics textbook
- Self-paced
- 32 minutes, average
- Training: Same as experiment 1
- Post-test: Same as experiment 1
31. Experiment 4 Adjusted post-test scores
32. Experiment 4 Discussion
- Findings
- Tutoring > Canned Text Remediation
- Spoken human tutoring > typed human tutoring
- Possible interpretations
- With these novice students, real learning occurred, not just reminding, and tutors were more effective
- For these below-target students, canned text remediation was over their heads, so tutoring was necessary for learning
33. Experiment 5: Experiment 1 done right
- Conditions
- Human tutors, spoken (N = 21)
- Why2/Atlas (N = 23)
- Why2/AutoTutor (N = 21)
- Canned Text Remediation (N = 19)
- Participants
- No college physics
- High school physics okay
34. Experiment 5 Procedure
- Pre-training: Study short textbook
- Pre-test
- 26 multiple choice, 3 essays
- Training
- 5 problems
- Post-test
- 26 multiple choice, 3 essays
35. Experiment 5 Adjusted post-test scores
36. Experiment 5, low-pretest: Adjusted post-test scores
37. Summary of experiments
Target audience
- All 3 tutors = Canned Text Rem.
- AutoTutor > text = nothing
- AutoTutor = Canned Text Rem.
Below target audience
- Human tutors > Canned Text Rem.
- All 3 tutors = Canned Text Rem., but for low-pretest students, Human tutors > Canned Text Rem.
38. Related work: Essay questions
- Katz, Connelly & Allbritton (2003)
- Essay questions after solving physics problem
- Students currently taking college physics
- Human remediation = Canned text remediation
- Like Why2 experiments 1, 3 and 5
- Essay questions > nothing
- Like Why2 experiment 2
- Like computer literacy AutoTutor experiments
39. Related work: Multi-step tasks
- Some > no interaction between steps
- Wood, Wood & Middleton (1978)
- Swanson (1992)
- Merrill, Reiser, Merrill & Landes (1995)
- Lane & VanLehn (in press)
- Dialogue = simpler interaction between steps
- Reif & Scott (1999)
- Chi et al. (2001)
- Rosé, Moore, VanLehn & Allbritton (2001)
- Aleven, Koedinger & Popescu (2002)
- Rosé, Bhembe, Siler & Srivastav (2003)
40. General pattern (almost)
- Some interaction > no interaction
- Some essay questions > none
- Some interaction between steps > none
- Dialogue interaction = Simpler interaction
- Tutorial remediation of essays = canned text remediation
- Dialogue between steps = canned text, hints, menus, etc. between steps
- But what about Why2 experiments 4 and 5N?
41. Perhaps there is no generation effect
- Generation effect
- Student generated remediation > tutor generated
- If an experiment is designed to test the generation effect, then both the dialogue interaction and the simple interaction are designed to be in the student's zone of proximal development
- So if there is no generation effect, then dialogue interaction = simple interaction
- But when simple interaction is over the student's head, then the dialogue interaction scaffolds learning
- So dialogue interaction > simple interaction
- As in Why2 experiments 4 and 5N
42. Is the interaction hypothesis true?
- Original interaction hypothesis
- The more interaction, the more learning
- Revised interaction hypothesis
- Some interaction > no interaction
- Dialogue interaction > simple interaction only when the simple interaction is incomprehensible to the student
- Dialogue interaction = simple interaction otherwise
There is no generation effect
43. Future work: Experiment 6
- Hypothesis
- Tutoring > canned text remediation if essay questions are very difficult relative to students' prior knowledge
- Use same training problems
- Carefully control prior knowledge to keep it low
- Use novices
- Use long pretraining
44. Experiment 6 conditions
- Why2/AutoTutor
- Why2/Atlas
- Canned Text Remediation
- Text only (new)
- Students read question, then Canned Text Remediation
- Students do not write essays
- Omit human tutoring condition
45. Why2/Atlas dialogues made more adaptive
- If the essay is almost perfect, then just fix the flaws
- Otherwise, tutor asks 4 medium-sized questions
- For each one, if it is answered almost perfectly, go on
- Otherwise, break down into even smaller questions
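The adaptive policy on this slide amounts to a top-down descent through a question hierarchy, going deeper only where the student's answer falls short. A minimal sketch follows, assuming a hypothetical `Question` tree and an `is_almost_perfect` judge supplied by the caller; neither name comes from Why2/Atlas, and the slide's count of 4 medium-sized questions is not enforced here.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A question plus the smaller questions it can be broken down into."""
    text: str
    sub_questions: list = field(default_factory=list)


def ask_adaptively(question, is_almost_perfect, asked):
    """Ask one question; descend into smaller ones only if the answer is weak."""
    asked.append(question.text)
    if not is_almost_perfect(question.text):
        for sub in question.sub_questions:
            ask_adaptively(sub, is_almost_perfect, asked)


def run_dialogue(essay_almost_perfect, medium_questions, is_almost_perfect):
    """Return the questions asked, in order, for one essay."""
    if essay_almost_perfect:
        return []  # just fix the flaws; no question sequence is needed
    asked = []
    for question in medium_questions:
        ask_adaptively(question, is_almost_perfect, asked)
    return asked
```

A student who answers every medium-sized question almost perfectly sees only those questions, while a weak answer expands into its smaller questions before the dialogue moves on.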
46. Broad Outline
- Intro and evaluations (45 min): Art and Kurt
- Why2/Atlas (60 min): Pam
- Coffee break (15 min)
- Why2/AutoTutor (60 min): Art, Tanner Jackson and Andrew Olney
- ProPl (30 min): H. Chad Lane (shared NL technology, but not the Why2 task domain)
Next
47. END
48. (No transcript)
49. Prior work: Correlational studies with human tutors
- For multiple measures, interaction correlates with gains
- Contingency (Wood & Middleton, 1975)
- Scaffolding (Chi et al., 2001)
- Student/Tutor word ratio (Core et al., 2003)
- Student turn length intercept (Litman et al., 2004)
50. Correlation ≠ causation
- Students who are more (obedient, interested in topic, awake) may both learn more and be more interactive during tutoring
51. At first, experimental evidence seems mixed
- Hypothesis: More interaction causes more learning
- 10 experiments support
- Tutoring > textbook (Graesser; Lane; Why2 expt. 2)
- Tutoring > nothing (Graesser; Merrill)
- Contingent tutoring > lecturing (Wood; Swanson)
- Canned text remediation > nothing (Katz)
- Tutoring > Canned text remediation (Why2 experiments 4 & 5N)
- 9 experiments do not support
- Socratic tutoring = didactic tutoring (Rosé 2001)
- Scaffolding = lecturing or text (Chi; Rosé 2003)
- Tutoring = multiple choice feedback (Aleven; Reif)
- Tutoring = Canned Text Remediation (Katz; Why2 experiments 1, 3 & 5)