Title: Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models
1Using Mixed-Effects Modeling to Compare Different
Grain-Sized Skill Models
- Mingyu Feng, Worcester Polytechnic Institute
- Neil T. Heffernan, Worcester Polytechnic Institute
- Murali Mani, Worcester Polytechnic Institute
- Cristina Heffernan, Worcester Public Schools
2The ASSISTment System
- An e-assessment and e-learning system that does both ASSISTing of students and assessMENT (movie)
- Massachusetts Comprehensive Assessment System (MCAS)
- Web-based system built on the Common Tutoring Object Platform (CTOP) [1]
We are giving away accounts!
[1] Nuzzo-Jones, G., Macasek, M.A., Walonoski, J., Rasmussen, K.P., Heffernan, N.T. Common Tutor Object Platform, an e-Learning Software Development Strategy. WPI technical report WPI-CS-TR-06-08.
3ASSISTment
Geometry
- We break multi-step problems into scaffolding questions
- Hint messages: given on demand, they give hints about what step to do next
- Buggy message: a context-sensitive feedback message
- Skills
- The state reports to teachers on 5 areas
- We seek to report on more and finer grain-sized skills
- Demo (two triangles problem)
(Demo/movie)
The original question
a. Congruence
b. Perimeter
c. Equation-Solving
The 1st scaffolding question
Congruence
The 2nd scaffolding question
Perimeter
A buggy message
A hint message
4How Were the Skill Models Created?
5How Were the Skill Models Created?
Multi-mapped model (WPI-5) vs. single-mapped model (MCAS-5)?
6Previous Work on Skill Models
- Fine-grained skill models in reporting
- Teachers get reports that they think are credible and useful. [3]
[3] Feng, M., Heffernan, N.T. (in press). Informing Teachers Live about Student Learning: Reporting in the Assistment System. To be published in Technology, Instruction, Cognition, and Learning Journal, Vol. 3. Old City Publishing, Philadelphia, PA. 2006.
9Previous Work on Skill Models
- Tracking skill performance over time [4][5]
Number Sense
[4] Feng, M., Heffernan, N.T., Koedinger, K.R. (2006). Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses. Proceedings of the Fifteenth International World Wide Web Conference, pp. 307-316. ACM Press, New York, NY.
[5] Feng, M., Heffernan, N.T., Koedinger, K.R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems, pp. 31-40. Springer-Verlag, Berlin.
10
- In this work, we compare different grain-sized skill models
- By comparing the accuracy of their predictions of state test scores
11Research Questions
- RQ1: Would adding response data from scaffolding questions help us do a better job of tracking students' knowledge?
- RQ2: How does the finer-grained skill model (WPI-78) do on estimating external test scores compared to the skill model with only 5 categories (WPI-5) and the one with only one category (WPI-1)?
- RQ3: Does introducing item difficulty information help to build a better predictive model?
12Data Source
- 497 students from two middle schools
- Students used the ASSISTment system every other week from Sep. 2004 to May 2005
- Real state test scores in May 2005
- Item-level online data
- Students' binary responses (1/0) to items that are tagged in different skill models
- Some statistics
- Average usage: 7.3 days; minimum usage: 6 days
- 138,000 data points (43,000 from original questions)
- Average number of questions answered
- Original: 87; Scaffolding: 189
Online data of 700 8th grade students available
for researchers! If you want access, talk to Neil
Heffernan and Kenneth Koedinger.
13How is the Data Organized?
14Approach
- Fit a mixed-effects logistic regression model on the longitudinal online data
- using skills as a factor
- predicting prob(response = 1) on an item tagged with a certain skill at a certain time
- The fitted model gives learning parameters (initial knowledge and learning rate) for each skill of each individual student
- Compare skill models by Mean Absolute Difference (MAD) and Err (= MAD / full score)
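To make the prediction step concrete, here is a minimal sketch of how a fitted logistic growth model could turn learning parameters into prob(response = 1). The slides do not give the exact model specification, so the additive structure, the function name predict_correct, the "init"/"rate" keys, and all numeric values below are assumptions chosen for illustration only.

```python
import math


def predict_correct(intercept, slope, student, skill, months):
    """Probability of a correct response under a logistic growth model.

    intercept, slope: population-level (fixed) effects.
    student, skill:   dicts with hypothetical keys "init" and "rate" that hold
                      deviations from the population values.
    months:           time elapsed since the student started using the system.
    """
    initial_knowledge = intercept + student["init"] + skill["init"]
    learning_rate = slope + student["rate"] + skill["rate"]
    logit = initial_knowledge + learning_rate * months
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up parameter values, not estimates from the study.
student = {"init": 0.30, "rate": 0.02}
skill = {"init": -0.50, "rate": 0.05}

p_start = predict_correct(-0.20, 0.10, student, skill, months=0)
p_later = predict_correct(-0.20, 0.10, student, skill, months=8)
```

With a positive combined learning rate, p_later is larger than p_start, which is the sense in which the fitted parameters describe both where a student starts on a skill and how fast the student improves.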
15Data Preprocessing Strategies
- Scaffolding Credit
- Scaffolding only shows in case of a wrong answer to the original question
- We assume correct responses to all scaffolding questions if a student correctly answered the original one
- Partial Blame
- When an item tagged with multiple skills is answered incorrectly, only blame the skill with the worst overall performance
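The two preprocessing rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, "worst performance overall" is taken to mean the lowest overall fraction of correct responses on a skill, and the accuracy values are made up.

```python
def scaffolding_credit(original_correct, scaffold_responses, n_scaffolds):
    """Return one response (1/0) per scaffolding question.

    A student who answers the original question correctly never sees the
    scaffolding questions, so all of them are imputed as correct.
    """
    if original_correct:
        return [1] * n_scaffolds
    return list(scaffold_responses)


def partial_blame(item_skills, skill_accuracy):
    """Pick the single skill to blame for a wrong answer on a multi-skill item.

    skill_accuracy maps skill name -> overall fraction correct (an assumed
    definition of "overall performance"). The weakest skill is blamed.
    """
    return min(item_skills, key=lambda s: skill_accuracy[s])


# Made-up accuracies for the three skills of the two-triangles example.
accuracy = {"Congruence": 0.62, "Perimeter": 0.48, "Equation-Solving": 0.71}

credited = scaffolding_credit(True, [], n_scaffolds=3)
observed = scaffolding_credit(False, [1, 0, 1], n_scaffolds=3)
blamed = partial_blame(["Congruence", "Perimeter", "Equation-Solving"], accuracy)
```

Here credited holds three imputed correct responses, observed keeps the recorded responses unchanged, and blamed is the skill with the lowest accuracy among those tagged on the item.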
16RQ1: Will Scaffolding Responses Help?
Real MCAS score vs. Assistment predicted score (WPI-78)
Student   Real MCAS score   Orig.    Orig. + Scaffolds
Mary      29                22.93    27.05
Tom       28                19.38    25.35
Sue       25                18.58    24.10
Dick      22                16.57    21.31
Harry     33                18.66    28.12
Absolute difference between real score and Assistment predicted score
Student     Orig.    Orig. + Scaffolds
Mary        6.06     1.95
Tom         8.62     2.65
Sue         6.42     0.90
Dick        5.43     0.69
Harry       14.34    4.88
MAD         6.03     4.121
Error (%)   17.75    12.12
- Why?
- Using more training data
- Dealing with the credit-blame issue better
- More identifiability per skill
- Scaffolding questions provide valuable information [4][5][6][7]
Answer: Yes!
[6] Walonoski, J., Heffernan, N.T. (2006). Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems, pp. 382-391. Springer-Verlag, Berlin.
[7] Walonoski, J., Heffernan, N.T. (2006). Prevention of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems, pp. 722-724. Springer-Verlag, Berlin.
17RQ2: Does a Finer-Grained Model Predict Better?
Real MCAS score vs. Assistment predicted score by skill model (scaffolding responses used)
Student   Real MCAS score   WPI-1    WPI-5    WPI-78
Mary      29                28.59    27.65    27.05
Tom       28                27.58    26.43    25.35
Sue       25                26.56    24.94    24.10
Dick      22                23.70    22.78    21.31
Harry     33                27.54    26.37    28.12
Absolute difference between real score and Assistment predicted score
Student     WPI-1    WPI-5    WPI-78
Mary        0.41     1.35     1.95
Tom         0.42     1.57     2.65
Sue         1.56     0.06     0.90
Dick        1.70     0.78     0.69
Harry       5.46     6.63     4.88
MAD         4.552    4.343    4.121
Error (%)   13.39    12.77    12.12
Is 12.12% any good for assessment purposes? MCAS-simulation result: 11.12%
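The comparison metric can be reproduced with a short sketch. The helper names mad and err_percent are hypothetical, and the full score of 34 is an assumption inferred from the ratio of the reported Error to the reported MAD; it is not stated on the slides.

```python
FULL_SCORE = 34  # assumption: inferred from reported Error / MAD, not given in the slides


def mad(real, predicted):
    """Mean absolute difference between real and predicted scores."""
    return sum(abs(r - p) for r, p in zip(real, predicted)) / len(real)


def err_percent(mad_value, full_score=FULL_SCORE):
    """Error as a percentage: MAD divided by the full score."""
    return 100.0 * mad_value / full_score


# The five example students shown on the slide.
real = [29, 28, 25, 22, 33]
predicted = {
    "WPI-1": [28.59, 27.58, 26.56, 23.70, 27.54],
    "WPI-5": [27.65, 26.43, 24.94, 22.78, 26.37],
    "WPI-78": [27.05, 25.35, 24.10, 21.31, 28.12],
}
sample_mad = {model: mad(real, scores) for model, scores in predicted.items()}

# MAD values reported on the slide, converted to Error under the assumed full score.
reported_mad = {"WPI-1": 4.552, "WPI-5": 4.343, "WPI-78": 4.121}
reported_err = {m: round(err_percent(v), 2) for m, v in reported_mad.items()}
```

Under the assumed full score, reported_err matches the Error row (13.39, 12.77, 12.12). The values in sample_mad cover only the five example students, so they are not expected to match the MAD row, which presumably summarizes the whole data set; on these five rows alone the ordering of the models even differs from the overall result.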
18Conclusion
- Recall RQ1 and RQ2.
- Positive answers to both RQ1 and RQ2.
- RQ3: Item difficulty was introduced as a factor to improve the predictive models. We ended up with better internally fitted models but, surprisingly, no significant improvement in the prediction of state test scores.
19 Some of the ASSISTMENT TEAM (2004-2005)
This research was made possible by the US Dept
of Education, Institute of Education Science,
"Effective Mathematics Education Research"
program grant R305K03140, the Office of Naval
Research grant N00014-03-1-0221, NSF CAREER
award to Neil Heffernan, and the Spencer
Foundation. Authors Razzaq and Mercado were
funded by the National Science Foundation under
Grant No. 0231773. All the opinions in this
article are those of the authors, and not those
of any of the funders.
Leena RAZZAQ, Mingyu FENG, Goss NUZZO-JONES,
Neil T. HEFFERNAN, Kenneth KOEDINGER,
Brian JUNKER, Steven RITTER, Andrea KNIGHT,
Edwin MERCADO, Terrence E. TURNER,
Ruta UPALEKAR, Jason A. WALONOSKI,
Michael A. MACASEK, Christopher ANISZCZYK,
Sanket CHOKSEY, Tom LIVAK, Kai RASMUSSEN
Carnegie Learning