Machine Learning Risk Adjustment of the C-Section Rate Impact by Provider
Cynthia J. Sims, MD, Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Hospital, Pittsburgh, PA 15213
Rich Caruana, Peng Jia, Radu Stefan Niculescu, Matt Troup, Carnegie Mellon University, Pittsburgh, PA 15213
R. Bharat Rao, Data Mining Group, Siemens Corporate Research, Inc., Princeton, NJ 08540
Objective
We observed significant variation in C-section rates, from 13% to 23%, across 17 physician groups. The objective of this study was to determine how much of the observed variation was due to differences in the patient sub-populations and how much was due to differences inherent to the group practices.
Method
We studied a population of 22,176 patients (1995-1997), stratified by provider group. We trained a machine-learning decision-tree model on all 22,176 patients. The model had an accuracy of 90% and an ROC area of 0.92. Care was taken to prevent over-fitting. The decision-tree model was then applied to the patients in each group to determine the aggregate C-section risk of that group's sub-population, as predicted by average physician practice across the 17 physician groups.
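The group-level step of the method can be sketched as follows, assuming the trained tree outputs a per-patient probability of C-section (stored under a hypothetical `predicted_risk` key). A group's predicted rate is then the mean of its patients' predicted probabilities, i.e. the rate expected if that group's patients were managed according to average practice across all groups. The records are made-up toy values, not study data.

```python
from collections import defaultdict


def group_rates(patients):
    """Return {group: (observed_rate, predicted_rate, n)}.

    Each patient is a dict with hypothetical keys:
      'group'          -- provider group identifier
      'csection'       -- 1 if delivered by C-section, else 0
      'predicted_risk' -- model probability of C-section under average practice
    """
    n = defaultdict(int)
    observed = defaultdict(float)
    expected = defaultdict(float)
    for p in patients:
        g = p["group"]
        n[g] += 1
        observed[g] += p["csection"]
        expected[g] += p["predicted_risk"]
    # Observed rate = C-sections / patients; predicted rate = mean predicted risk.
    return {g: (observed[g] / n[g], expected[g] / n[g], n[g]) for g in n}


# Toy records (illustrative only).
patients = [
    {"group": "A", "csection": 1, "predicted_risk": 0.40},
    {"group": "A", "csection": 0, "predicted_risk": 0.10},
    {"group": "A", "csection": 1, "predicted_risk": 0.30},
    {"group": "B", "csection": 0, "predicted_risk": 0.20},
    {"group": "B", "csection": 0, "predicted_risk": 0.10},
    {"group": "B", "csection": 1, "predicted_risk": 0.60},
]

for g, (obs, pred, size) in sorted(group_rates(patients).items()):
    print(f"group {g}: n={size} observed={obs:.3f} "
          f"predicted={pred:.3f} difference={obs - pred:+.3f}")
```

A large positive difference flags a group whose rate exceeds what its own patient mix would predict.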
Results
1. Little of the observed variation in C-section rate was attributable to variation in the patient sub-populations: the correlation between the observed C-section rates and the rates predicted by the machine learning model was only 0.21.
2. After adjusting for patient sub-population risk, we found that several groups had differences between actual and predicted rates that were highly significant.
3. Raw C-section rates are misleading. Some groups with a high rate had a high-risk patient population that justified the high rate. Other groups with a high rate did not have high-risk patient populations.
Conclusions
There was significant variation in the C-section rate of the different sub-populations. (See table to right.) Only a fraction of the observed variation was explained by differences in the populations' predicted risk for C-section. When determining which groups have high C-section rates, it is important to adjust for the relative risk of the different sub-populations, because the raw, unadjusted cesarean section rates of different sub-populations can be misleading. We conclude that the substantial differences among the groups were not predicted by patient risk.
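Results 1 and 2 can be illustrated with two small computations. The Pearson correlation between groups' observed and predicted rates is the statistic quoted in Result 1. The poster does not say which significance test was used for Result 2; the z-score below is one common choice and is an assumption here: if a group's patients were treated according to average practice, its C-section count would be a sum of independent Bernoulli trials with the model's probabilities p_i, giving expected count sum(p_i) and variance sum(p_i * (1 - p_i)). All numbers are made-up toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def observed_vs_expected_z(outcomes, risks):
    """z-score of a group's observed C-section count against the model's expectation.

    Assumes (for illustration) independent patients whose outcome under average
    practice is Bernoulli(p_i), with p_i the model's predicted risk.
    """
    observed = sum(outcomes)
    expected = sum(risks)
    variance = sum(p * (1 - p) for p in risks)
    return (observed - expected) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy group-level rates (illustrative only, not the study's values).
observed_rates = [0.13, 0.16, 0.18, 0.20, 0.23]
predicted_rates = [0.17, 0.15, 0.19, 0.16, 0.18]
print(f"r = {pearson(observed_rates, predicted_rates):.2f}")

# Toy group: 100 patients, each with predicted risk 0.15, 27 C-sections observed.
z = observed_vs_expected_z([1] * 27 + [0] * 73, [0.15] * 100)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.4f}")
```

With 17 groups tested at once, a multiple-comparison correction would usually be applied before calling individual differences significant.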
[Figure: Machine learning decision tree model trained on 22,176 cases]
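The poster reports an accuracy of 90% and an ROC area of 0.92 for the tree but does not describe how they were computed (for example, on which held-out data). As a sketch under that caveat: accuracy is the fraction of patients whose thresholded prediction matches the actual delivery mode (a 0.5 threshold is assumed here), and the ROC area equals the probability that a randomly chosen C-section patient receives a higher predicted risk than a randomly chosen non-C-section patient, with ties counted as one half. The labels and scores are toy values.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where (score >= threshold) matches the 0/1 label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC area via pairwise comparison (Mann-Whitney form); ties count 0.5.

    O(n_pos * n_neg): fine for a sketch, slow for tens of thousands of cases.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = C-section, 0 = no C-section; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"ROC area = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas the ROC area summarizes ranking quality over all thresholds, which is why both are worth reporting.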
[Table: Observed and predicted C-section rates for 17 physician groups, sorted by observed C-section rate. Physician groups 7, 8, and 10 are particularly interesting. The last column is the estimated C-section rate that would result if the physician group treated all 22,176 patients.]
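The table's last column, the rate a group would have if it treated all 22,176 patients, is in the spirit of direct standardization. The poster does not say how it was computed. One simple sketch, assuming patients are first assigned to risk strata (the `stratum` key is hypothetical; it could be, for instance, the decision-tree leaf a patient falls into), applies the group's own stratum-specific C-section rates to the stratum mix of the whole population. The records are toy values.

```python
from collections import Counter, defaultdict


def rate_if_group_treated_all(patients, group):
    """Directly standardized C-section rate for one group (illustrative sketch).

    Hypothetical keys per patient: 'group', 'stratum', and 'csection' (0/1).
    The group's stratum-specific rates are weighted by each stratum's share of
    the whole population. Strata the group never saw are skipped and the
    remaining weights renormalized, which is a simplifying assumption.
    """
    population = Counter(p["stratum"] for p in patients)
    n_group = defaultdict(int)
    cs_group = defaultdict(int)
    for p in patients:
        if p["group"] == group:
            n_group[p["stratum"]] += 1
            cs_group[p["stratum"]] += p["csection"]
    covered = {s: w for s, w in population.items() if n_group[s] > 0}
    total = sum(covered.values())
    if total == 0:
        raise ValueError(f"no patients found for group {group!r}")
    return sum(w * cs_group[s] / n_group[s] for s, w in covered.items()) / total


# Toy records (illustrative only).
patients = [
    {"group": "A", "stratum": "low", "csection": 0},
    {"group": "A", "stratum": "low", "csection": 1},
    {"group": "A", "stratum": "high", "csection": 1},
    {"group": "B", "stratum": "low", "csection": 0},
    {"group": "B", "stratum": "low", "csection": 0},
    {"group": "B", "stratum": "low", "csection": 0},
    {"group": "B", "stratum": "high", "csection": 1},
    {"group": "B", "stratum": "high", "csection": 0},
]

for g in ("A", "B"):
    print(f"group {g}: standardized rate = {rate_if_group_treated_all(patients, g):.4f}")
```

Because every group is scored against the same patient mix, the standardized rates can be compared directly, unlike raw rates.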
[Figure: Scatter plot comparing the observed C-section rate in the 17 physician groups with the C-section rates predicted for those groups by the decision tree; points are labeled by physician group.]
- Hypothesis
  - The observed variation in C-section rates for physician groups is inherent to the group practice and not due to differences in the patient sub-population.
- The Population
  - 22,176 patients (1995-1997).
  - Stratified by provider group.
  - 17 provider groups.
- Conclusions
  - The substantial differences among groups were not predicted by patient risk.
  - Significant variation in the C-section rate of the different provider group sub-populations.
- Future Work
  - Evaluate machine learning methods for comparing groups.
  - Compare the decision tree model with a neural network model.
  - The best evidence that the C-section rate can be lowered without adversely affecting outcomes comes from countries with lower C-section rates but comparable outcomes. We intend to apply the same techniques to a medical database from one of these countries.