Title: Patient LOS prediction: Evaluating the impact of different CS on the prediction accuracy of the C4.5 algorithm
1Patient LOS prediction: Evaluating the impact of different CS on the prediction accuracy of the C4.5 algorithm
Revlin Abbi, Elia El-Darzi, and Christos Vasilakis
University of Westminster, Harrow School of
Computer Science
2Presentation overview
- Main Focus
- Patient spell classification methodology
- Impact of LOS classification scheme on prediction accuracy
- Content of the presentation
- The use of patient length of stay for decision making
- Aim, objectives, and methods
- Results and conclusion
3Introduction
- Health care issues
- Health care systems are complex
- Ageing of population in developed world
- Increased demand and escalating costs
- Patient length of stay (LOS) and decision making
- Duration of time a patient spends in hospital
- Readily available and easy to calculate
- Proxy measure for resource consumption
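To illustrate why LOS is easy to calculate, here is a minimal sketch that derives it from the admission and discharge dates. It assumes LOS is counted in whole days as discharge date minus admission date, so a same-day discharge gives 0; the slides do not state the counting convention. The function name and the dates are hypothetical.

```python
from datetime import date

def length_of_stay(admitted, discharged):
    """Return LOS in whole days (assumed convention: discharge minus admission)."""
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return (discharged - admitted).days

# Hypothetical dates, for illustration only.
print(length_of_stay(date(2004, 3, 1), date(2004, 3, 4)))  # 3
print(length_of_stay(date(2004, 3, 1), date(2004, 3, 1)))  # 0
```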
4Grouping and predicting LOS
- Why automate the grouping according to LOS?
- Provides simplified representation of population
- Clinical judgement and visual inspection are subjective
- Why predict patient LOS?
- Improved discharge planning
- Better allocation and scheduling of resources
5Classification algorithms
- Unsupervised algorithms
- Partitions records into groups
- Used when groups are unknown
- E.g. K-means, Gaussian mixture modelling (GMM)
- Supervised algorithms
- Prediction of patient LOS
- Maps patient characteristics to LOS classes
- Previously combined with clinical judgement
- E.g. C4.5, neural networks
6Aim and objectives
- Aim
- Investigate the impact of different classification schemes on prediction accuracy
- Identify if
- Prediction accuracy of a decision tree is affected by the choice of LOS classification scheme
- Tree structure is affected by the LOS classification scheme
- Number of patients within a class affects class accuracy
7Dataset
- Admissions over a 16-month period
- Consists of 7723 records of patients undergoing surgery
- Variables
- Gender
- Age
- Date of admission and discharge
- Public or private patient
- Case type (emergency/planned)
- Major diagnostic category (MDC)
8Variability of patient LOS
LOS percentiles (days)
Group               | 25th | 50th | 75th | 90th | 95th | 99th | 100th
All patient records |    1 |    3 |    7 |   13 |   20 |   45 |   228
Gender
Women               |    1 |    3 |    8 |   14 |   21 |   47 |   228
Men                 |    1 |    3 |    6 |   12 |   19 |   43 |   106
Case type
Emergency           |    2 |    3 |    8 |   15 |   23 |   48 |   228
Non-emergency       |    1 |    2 |    5 |   11 |   16 |   30 |    98
Age
0-19                |    1 |    2 |    4 |    8 |   13 |   32 |    49
20-39               |    1 |    2 |    4 |    8 |   12 |   38 |   228
40-59               |    1 |    3 |    6 |   12 |   18 |   43 |   174
60-79               |    2 |    4 |    9 |   16 |   24 |   49 |   179
80-100              |    2 |    6 |   11 |   19 |   25 |   41 |    98
Patient type
Public              |    1 |    3 |    7 |   13 |   20 |   45 |   228
Private             |    2 |    4 |    8 |   13 |   21 |   41 |   106
9Proposed methodology
- Input data: split into training data and testing data
- Training data: fit GMMs to LOS using EM, giving a set of GMMs
- Select a single GMM from the set using MDL
- Derive the LOS classification scheme and merge class boundaries, giving 127 LOS classification schemes
- Build the decision tree (C4.5)
- Test the decision tree on the testing data, giving a confusion matrix
- Calculate performance measures from the confusion matrix
10Gaussian mixture model (GMM)
- Fitted to the raw LOS data
- Mixture of Gaussian functions
- Expectation Maximisation (EM) algorithm
- Iterative optimisation algorithm, fast and efficient
- Prior, posterior and unconditional probability
- Criterion for selecting the appropriate model
- Need to decide on the number of Gaussian components
- Minimum description length (MDL)
- Quantifies each GMM (one to six components)
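The fit-then-select step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the quantile-based initialisation, the variance floor, and the penalty form used for MDL (negative log-likelihood plus half the number of free parameters times log n) are assumptions, since the slides do not give these details. The data are synthetic and stand in for LOS values; all function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Unconditional density of x under the mixture."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

def fit_gmm(data, k, n_iter=100):
    """Fit a k-component 1-D GMM by EM; return (weights, means, variances, loglik)."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    # Assumed initialisation: spread the means over evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: re-estimate priors (weights), means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)  # floor stops a component collapsing
    loglik = sum(math.log(max(mixture_pdf(x, weights, means, variances), 1e-300))
                 for x in data)
    return weights, means, variances, loglik

def mdl_score(loglik, k, n):
    """Assumed MDL form: -log L + (free parameters / 2) * log n. Lower is better."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -loglik + 0.5 * n_params * math.log(n)

# Synthetic stand-in for LOS data (NOT the study data): short and long stays.
rng = random.Random(42)
los = [abs(rng.gauss(3, 1)) for _ in range(200)] + [abs(rng.gauss(20, 4)) for _ in range(200)]

scores = {}
for k in range(1, 7):  # one to six components, as on the slide
    *_, loglik = fit_gmm(los, k)
    scores[k] = mdl_score(loglik, k, len(los))
best_k = min(scores, key=scores.get)
print("selected number of components:", best_k)
```

The model with the lowest score is kept as the single GMM. How class boundaries are then read off the selected mixture is not specified on the slide, so it is not sketched here.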
11Building the decision tree
- C4.5 - divide and conquer algorithm
- Creates the decision tree from the training data
- Example segment of a decision tree
Age < 61
    MDC = 19: 0-2 (6.0/2.0)
    MDC = 0
        Pub_Priv = 2: 6-13 (2.0/1.0)
        Pub_Priv = 1
            Age < 49
                Adm_Cat = 1: 14-36 (2.0/1.0)
                Adm_Cat = 2
                    Age < 32: 14-36 (4.0/2.0)
                    Age > 32: 37-228 (9.0/1.0)
            Age > 49
If Age < 61 and MDC is 19 then LOS is between 0 and 2 days
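As background on how a divide and conquer tree builder picks each split: C4.5 ranks candidate attributes by gain ratio, that is, information gain divided by split information. The sketch below shows that calculation for categorical attributes only; thresholds on numeric attributes such as Age, and pruning, are left out. The records and field names are hypothetical and are not taken from the study dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, divided by the split information."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records: case type separates the LOS classes, gender does not.
rows = [
    {"case_type": "emergency", "gender": "F", "los_class": "6-228"},
    {"case_type": "emergency", "gender": "F", "los_class": "6-228"},
    {"case_type": "emergency", "gender": "M", "los_class": "6-228"},
    {"case_type": "emergency", "gender": "M", "los_class": "6-228"},
    {"case_type": "planned", "gender": "F", "los_class": "0-5"},
    {"case_type": "planned", "gender": "F", "los_class": "0-5"},
    {"case_type": "planned", "gender": "M", "los_class": "0-5"},
    {"case_type": "planned", "gender": "M", "los_class": "0-5"},
]

print(gain_ratio(rows, "case_type", "los_class"))  # 1.0
print(gain_ratio(rows, "gender", "los_class"))     # 0.0
```

The attribute with the highest gain ratio becomes the test at the current node, and the procedure recurses on each branch. Because the class labels depend on the LOS classification scheme, changing the scheme can change which attribute wins.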
12Performance measures
Overall accuracy
Class accuracy = (No. of correct predictions in class) / (Total no. belonging to class)
Prediction profit
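A minimal sketch of the first two measures, computed from a confusion matrix whose rows are actual classes and whose columns are predicted classes. Overall accuracy is taken here in its usual sense (all correct predictions divided by all records). Prediction profit is left out because the slide does not give its formula. The matrix values are hypothetical.

```python
def overall_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def class_accuracy(confusion, k):
    """Correct predictions in class k divided by the records belonging to class k."""
    row_total = sum(confusion[k])
    return confusion[k][k] / row_total if row_total else float("nan")

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
confusion = [
    [50, 5, 5],
    [10, 20, 10],
    [4, 6, 10],
]

print(round(overall_accuracy(confusion), 3))                         # 0.667
print([round(class_accuracy(confusion, k), 3) for k in range(3)])    # [0.833, 0.5, 0.5]
```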
13Computational experiment
- Enumeration: merging class boundaries
- 8 classes
- 127 classification schemes
- Merging of 8 classes (2-7 classes)
- 1) 0-1, 2, 3-5, 6-9, 10-11, 12-17, 18-33, and 34-228 (8 classes)
- 2) 0-1, 2, 3-5, 6-9, 10-11, 12-17, and 18-228 (7 classes)
- 3) 0-1, 2, 3-5, 6-9, 10-11, 12-33, and 34-228 (7 classes)
- ...
- 127) 0-33, 34-228 (2 classes)
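The count of 127 follows from the 7 internal boundaries between the 8 base classes: each boundary is either kept or merged away, which gives 2^7 = 128 combinations, and dropping the single-class case leaves 127. The sketch below enumerates them, assuming only adjacent classes are merged. The ordering it produces is an artefact of `itertools.combinations` and is not claimed to be the authors' numbering.

```python
from itertools import combinations

# The eight base LOS classes (days) listed on the slide.
BASE_CLASSES = [(0, 1), (2, 2), (3, 5), (6, 9), (10, 11), (12, 17), (18, 33), (34, 228)]

def enumerate_schemes(classes):
    """Yield every scheme that keeps a non-empty subset of the internal boundaries."""
    n_boundaries = len(classes) - 1  # boundary b lies between classes[b] and classes[b + 1]
    for n_kept in range(n_boundaries, 0, -1):  # keep 7, 6, ..., 1 boundaries
        for kept in combinations(range(n_boundaries), n_kept):
            scheme, start = [], classes[0][0]
            for b in kept:
                scheme.append((start, classes[b][1]))  # close the current merged class
                start = classes[b + 1][0]              # open the next one
            scheme.append((start, classes[-1][1]))
            yield scheme

schemes = list(enumerate_schemes(BASE_CLASSES))
print(len(schemes))   # 127
print(schemes[0])     # the original 8-class scheme
print(schemes[-1])    # [(0, 33), (34, 228)]
```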
14Results
- Overall accuracy is affected by the CS
- Overall accuracy ranges from 30.6% to 98.3%
- 47 above 50%
15Results
- Prediction profit (PP) is affected by the selection of CS
- 127 different CS resulted in a range from -7 to 25.3
- 49 zero or below
- 46 above one
16Results
- The structure of the decision tree changes
- The size of the tree is affected (number of nodes)
- Importance of variables changes
- Gender selected before age and admission category
- Age selected before admission category and gender
- Wide class intervals
- Tree predicts the same classification for all patients
- 25 of CS resulted in the tree predicting the same class
- e.g. 0-33 days consists of 98.3% of patient records
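The effect of a wide class interval on overall accuracy can be seen with a constant predictor: a tree that assigns every patient to the largest class scores an overall accuracy equal to that class's share of the records. The sketch below uses hypothetical counts chosen to mirror the 98.3% share quoted for the 0-33 day class; they are not the study's actual record counts.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    majority_class, count = Counter(labels).most_common(1)[0]
    return majority_class, count / len(labels)

# Hypothetical labels: 983 of 1000 records fall in the 0-33 day class.
labels = ["0-33"] * 983 + ["34-228"] * 17

cls, acc = majority_baseline(labels)
print(cls, acc)  # 0-33 0.983
```

Under such a scheme the class accuracy for 34-228 would be zero even though overall accuracy is high, which is why overall accuracy alone can mislead.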
17Conclusion
- Selection of LOS classification scheme has an impact on prediction accuracy of decision tree
- Structure of the decision tree is also affected
- Classes with large proportions of records cause the tree to predict the same LOS class
- Standard performance measures are not adequate
- A new measure for accuracy robustness is needed