1. Inferring Conceptual Knowledge from Unstructured Student Writing
Workshop on Personalizing Education with Machine Learning, Neural Information Processing Systems (NIPS) Conference, Lake Tahoe, CA, 8 December 2012
Vivienne L. Ming
2. The role of assessment in instruction
- Reveals what students already know and what they need to learn
- Provides feedback to students and teachers on the success of learning and instruction
- Timely and specific feedback can guide continued instruction (formative assessment)
Graphic from http://www.cmu.edu/teaching/assessment/basics/alignment.html
3. Challenges with assessment
- Large-scale assessment
  - Heavy on summative assessment
  - Standardized tests, academic analytics systems
  - Emphasizes performance, not conceptual understanding
  - Delayed, coarse-grained feedback
- Intrusive
  - Interrupts class to administer tests
  - Modifies instruction to adopt others' materials
- Alternatives
  - Teachers may lack training in designing and interpreting other kinds of assessment
  - Difficult to aggregate and calibrate
Printable sign available at http://www.pickens.k12.ga.us/assessment.html
4. Our goals
- Use continuous, passive assessment to elucidate conceptual knowledge
  - Wealth of unstructured data
  - Informal
  - Builds on teachers' existing instruction
- Align with formal assessment, e.g.:
  - Course grades
  - Standardized tests
  - Instructor qualitative assessment
5. Research questions
- Can topic models of unstructured student writing predict course outcomes?
- How does the accuracy of these predictions change over time as more student work is analyzed?
- What does learning the topic hierarchy add beyond conventional topic modeling in improving these predictions?
6. Dataset and Methods
- Online discussion forums
- 5- or 6-week courses
- 2 mandatory discussion questions per week
- Introductory courses at a large, for-profit university

                                        Biology (undergraduate)   Economics (MBA)
Course length (weeks)                   5                          6
Discussion-question threads per class   10                         12
Classes                                 17                         45
Students (after filtering)              230                        970
Posts by students                       9,118                      44,345
7. Analytical approach
- Outcome of interest: student conceptual understanding
- Proxy outcome: student course grade
- Compare possible data features:
  - Baseline
    - Mean course grade
  - Individual student posting characteristics
    - Word count
  - Conventional semantic modeling
    - Probabilistic Latent Semantic Analysis (pLSA; a minimal sketch follows this slide)
  - Feature of interest
    - Hierarchical Latent Dirichlet Allocation (hLDA)
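For concreteness, a minimal EM sketch of pLSA on a document-term count matrix is given below. This is an illustration under our own assumptions, not the implementation used in the study; the function name, inputs, and parameter choices are all hypothetical.

```python
# Minimal pLSA via EM (illustrative sketch, not the study's implementation).
# counts: (n_docs, n_words) term-frequency matrix built from student posts.
import numpy as np

def plsa(counts, n_topics=100, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # P(z|d): per-document topic mixture; P(w|z): per-topic word distribution
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        denom = p_z_d @ p_w_z + 1e-12          # P(w|d) under the current model
        nz_d = np.zeros_like(p_z_d)
        nw_z = np.zeros_like(p_w_z)
        for z in range(n_topics):              # loop avoids a docs*words*topics tensor
            # E-step: responsibility of topic z for each (d, w), weighted by counts
            resp = counts * (np.outer(p_z_d[:, z], p_w_z[z]) / denom)
            nz_d[:, z] = resp.sum(axis=1)
            nw_z[z] = resp.sum(axis=0)
        # M-step: renormalize the expected counts
        p_z_d = nz_d / (nz_d.sum(1, keepdims=True) + 1e-12)
        p_w_z = nw_z / (nw_z.sum(1, keepdims=True) + 1e-12)
    return p_z_d, p_w_z    # rows of P(z|d) serve as per-document topic features
```

The rows of P(z|d) play the role of the topic coefficients referred to on the next slide.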
8. Algorithms
- Proof of concept
  - Logistic regression on the accumulated topic coefficients from each week (see the sketch below)
  - Other supervised algorithms (e.g., SVMs) would surely perform better
  - Logistic regression chosen to isolate the contribution of hLDA
- Current work uses
  - Hidden Conditional Random Fields (HCRFs)
  - Improved weekly predictions
  - Allows forward prediction in course time
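The proof-of-concept classifier can be sketched with scikit-learn as below. `weekly_topics` (one topic-coefficient matrix per week) and `grades` are hypothetical inputs; the slides do not specify the evaluation protocol, so the 5-fold cross-validation here is our choice.

```python
# Sketch: logistic regression on topic coefficients accumulated week by week.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def weekly_accuracy(weekly_topics, grades):
    """weekly_topics: list of (n_students, n_topics) arrays, one per week.
    grades: (n_students,) array of letter-grade labels."""
    scores = []
    for week in range(len(weekly_topics)):
        # concatenate all topic coefficients observed up to and including this week
        X = np.hstack(weekly_topics[:week + 1])
        clf = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(clf, X, grades, cv=5).mean())
    return scores  # one accuracy per week; expected to rise as data accumulates
```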
9. Results: Biology course
- Prediction accuracy
  - Word count > mean (for 3 weeks)
  - pLSA >> word count
  - hLDA > pLSA
- With more data collected over time, all predictions improve.
10. Results: Economics course
- Prediction accuracy
  - Word count > mean (for 2 weeks)
  - pLSA > word count
  - hLDA >> pLSA
- With more data collected over time, all predictions improve.
11. Topic modeling can distinguish the topics discussed by students with different final grades
- Each point represents the posts by one student
- Posts projected into a 100-D pLSA concept space
- Local linear embedding (LLE) used to reduce to 2-D (sketched below)
[Figure: 2-D LLE projection of student posts; annotations read "Cs and Ds neglect these topics" and "Increasing final grades"]
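The projection behind these figures can be sketched with scikit-learn's LocallyLinearEmbedding. The plotting choices (colormap, marker size) below are ours, and `grades` is assumed to be numeric (e.g., A=4 ... F=0); the slides do not give these details.

```python
# Sketch: reduce 100-D pLSA topic mixtures to 2-D with LLE, colored by grade.
import matplotlib.pyplot as plt
from sklearn.manifold import LocallyLinearEmbedding

def plot_topic_space(p_z_d, grades, n_neighbors=10):
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2)
    xy = lle.fit_transform(p_z_d)                    # (n_points, 2) embedding
    sc = plt.scatter(xy[:, 0], xy[:, 1], c=grades, cmap="viridis", s=12)
    plt.colorbar(sc, label="final grade")
    plt.title("Student posts in pLSA concept space (LLE, 2-D)")
    plt.show()
```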
12. Comments by higher grade-earners reveal more structure
- Each point represents one post, color-coded by grade
- Ds and below cluster in the center
- Higher grades move in specific directions toward the periphery
- Directions may correspond to course structure or the instructor's guidance
- Not just depth or specificity, but particular concepts
13. Structure corresponds to course topics
- Same points, color-coded by week
- Different weeks lie on different branches
- Low grades stay in the center even when discussion topics invite more specific comments
14. What does hierarchical modeling add?
- Not all language is equal.
- Conventional topic modeling treats all topics as equal (and independent).
- Hierarchy implies a ranking:
  - Shallower: more frequent, generic language
  - Deeper: more infrequent, technical language
15. Examining hLDA results (Economics)
- Posts from students earning higher grades correlated with a higher mean depth in the hLDA tree (this statistic is sketched below)
  - C grades: most language at the shallowest level
  - A and B grades: more language at deeper levels
- More technically proficient language use
  - General language: more anecdotal comments
  - Specific language: greater conceptual depth
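The depth statistic can be sketched as follows, assuming per-word hLDA node assignments (and hence tree depths) are available; hLDA itself is typically fit with external tools, and `word_depths_by_student` is a hypothetical structure, not the study's data format.

```python
# Sketch: correlate each student's mean hLDA topic depth with final grade.
import numpy as np
from scipy.stats import spearmanr

def depth_vs_grade(word_depths_by_student, grades):
    """word_depths_by_student: {student_id: [tree depth of each word's topic node]}
    grades: {student_id: numeric final grade}"""
    ids = sorted(word_depths_by_student)
    mean_depth = np.array([np.mean(word_depths_by_student[s]) for s in ids])
    grade = np.array([grades[s] for s in ids])
    rho, p = spearmanr(mean_depth, grade)
    return rho, p   # positive rho would indicate deeper language with higher grades
```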
16. Summary of results
- Can topic models of unstructured student writing predict course outcomes?
  - YES: pLSA and hLDA both predict better than chance (and better than post length).
- How does the accuracy of these predictions change over time as more student work is analyzed?
  - Extra weeks of data improve predictions.
  - By the end of the course, pLSA predictions are within one letter grade.
- What does learning the topic hierarchy add beyond conventional topic modeling in improving these predictions?
  - hLDA > pLSA
  - Higher grades are associated with discussion of deeper topics in hLDA.
17. Conclusions and future work
- There is some collection of topics associated with higher grades (and some other collection of topics associated with lower grades).
- The deeper topics associated with low versus high grades could potentially differ; that analysis is yet to be done.
  - e.g., deep misconceptions such as inheriting acquired traits (Lamarckian evolution)
- Next steps
  - Create a topic map
    - Hierarchical relationships
    - Normative sources (e.g., textbook, exemplary student work)
    - Labeled, non-normative sources (common misconceptions)
18. Implications
- Extensions to other text data
  - Essays, short-answer test questions
  - Online tutoring
  - Informal learning environments (e.g., Quora, Evernote)
  - Annotations on e-texts
  - Wiki contributions
- Language mediates learning, and text is everywhere. Learn from it; improve it.