Title: Causal Data Mining
1Causal Data Mining
Richard Scheines Dept. of Philosophy, Machine
Learning, Human-Computer Interaction
Carnegie Mellon
21. Predictive Data Mining
- Finding predictive relationships in data
- What feature of student behavior predicts learning
- Who will default on credit cards
- Who will get an A in your course
- Which HS students will do well at CMU
- Do students cluster by learning style
3Causal Data Mining
- Finding causal relationships in data
- What feature of student behavior causes learning
- What will happen when we make everyone take a reading quiz before each class
- What will happen when we program our tutor to intervene to give hints after an error
4Predictive Data Mining
Data Mining Search
Predictive Model: Y = f(X1, X2, ..., Xk)
5Predictive Data Mining
- Model Classes
- Simple Regression
- Locally Weighted Regression
- Logistic Regression
- Neural Nets
- Support Vector Machines
- Decision Trees
- Bayes Net
- Naïve Bayes Classifier
- Independent Components
- Clustering
- Etc.
Data Mining Search
Predictive Model: Y = f(X1, X2, ..., Xk)
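As an illustration of the simplest model class listed above, the following is a minimal sketch of simple regression, where the "search" reduces to a closed-form least-squares fit of Y = f(X) with f restricted to straight lines. The function name `fit_simple_regression` and the data (hours studied, quiz score) are hypothetical and chosen only for this example.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (X) and quiz score (Y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_simple_regression(hours, scores)


def predict(x):
    """Predicted score for a student who studied x hours."""
    return intercept + slope * x
```

The fitted function predicts Y from X. By itself it says nothing about what would happen to Y if X were changed by intervention.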
6Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, ..., Xk), e.g., f ∈ Additive functions
7Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, ..., Xk)
Or Probability Model under Constraints: P(Y | X1, X2, ..., Xk), where P ∈ Gaussian, with mean 0
8Predictive Data Mining
Decision Tree Search
9Predictive Data Mining ≠ Causal Data Mining
Conditioning is not the same as intervening
- P(Y | X1, X2, ..., Xk) ≠ P(Y | X1 set, X2, ..., Xk)
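The difference between conditioning and intervening can be sketched with a small simulation. The causal structure here is an assumption made only for illustration: a hidden common cause Z influences both X and Y, and X has no effect on Y. The probabilities (0.9, 0.1, 0.8, 0.2) and the names `simulate` and `prob_y` are likewise hypothetical.

```python
import random


def simulate(n, set_x=None, seed=0):
    """Draw n (x, y) pairs. If set_x is given, X is set by intervention."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5  # hidden common cause
        if set_x is None:
            # Observational regime: X depends on Z.
            x = rng.random() < (0.9 if z else 0.1)
        else:
            # Interventional regime: X is set from outside, ignoring Z.
            x = set_x
        # Y depends only on Z, never on X.
        y = rng.random() < (0.8 if z else 0.2)
        rows.append((x, y))
    return rows


def prob_y(rows, x_value=None):
    """Relative frequency of Y = 1, optionally among rows with X == x_value."""
    selected = [y for x, y in rows if x_value is None or x == x_value]
    return sum(selected) / len(selected)


observed = simulate(100_000)
p_cond = prob_y(observed, x_value=True)  # estimates P(Y=1 | X=1)

intervened = simulate(100_000, set_x=True)
p_set = prob_y(intervened)  # estimates P(Y=1 | X set to 1)
```

Under these assumptions, conditioning on X = 1 gives a probability near 0.74, because observing X = 1 is evidence that Z = 1. Setting X = 1 gives a probability near 0.5, because the intervention leaves Z and Y untouched.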
10Causal Discovery: Statistical Data → Causal Structure
11Causal Discovery Software TETRAD IV
www.phil.cmu.edu/projects/tetrad
12Full Semester Online Course in Causal
Statistical Reasoning
13Full Semester Online Course in Causal
Statistical Reasoning
- Course is tooled to record certain events
- Logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc.
- Each event was associated with attributes
- Time
- student-id
- Session-id
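One way such an event log might be represented is sketched below. The record type `CourseEvent`, its field names, the event-kind strings, and the sample rows are all hypothetical. The slides name only the event types and the three attributes, not a storage format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CourseEvent:
    kind: str         # e.g. "login", "page_request", "print_request"
    time: float       # seconds since some reference point (assumed unit)
    student_id: str
    session_id: str


def count_by_student(events, kind):
    """Count events of one kind per student, e.g. print requests."""
    return Counter(e.student_id for e in events if e.kind == kind)


# Made-up log entries for illustration only.
log = [
    CourseEvent("login", 0.0, "s1", "a"),
    CourseEvent("print_request", 12.5, "s1", "a"),
    CourseEvent("print_request", 40.0, "s1", "a"),
    CourseEvent("login", 3.0, "s2", "b"),
    CourseEvent("quiz_attempt", 90.0, "s2", "b"),
]

prints = count_by_student(log, "print_request")
```

Per-student counts like these are the kind of derived variable that can then be fed into a predictive or causal search.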
14Printing and Voluntary Comprehension Checks: 2002 --> 2003
15References
- Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines (MIT Press)
- Causality: Models, Reasoning, and Inference, (2000), Judea Pearl, Cambridge Univ. Press
- Shih, B., Koedinger, K., and Scheines, R. (2008). "A Response Time Model for Bottom-Out Hints as Worked Examples." Proceedings of the First Educational Data Mining Conference.
- Shih, B., Koedinger, K., and Scheines, R. (2007). "Optimizing Student Models for Causality." Proceedings of the 13th International Conference on Artificial Intelligence in Education.
- Arnold, A., Beck, J., and Scheines, R. (2006). "Feature Discovery in the Context of Educational Data Mining: An Inductive Approach." Proceedings of the AAAI 2006 Workshop on Educational Data Mining, Boston, MA.
- Scheines, R., Leinhardt, G., Smith, J., and Cho, K. (2005). "Replacing Lecture with Web-Based Course Materials." Journal of Educational Computing Research, 32, 1, 1-26.