Title: K. P. Unnikrishnan
1 Data Mining Methods for Electronic Medical Records
Collaborators
- Indian Institute of Science: P. S. Sastry, (Srivatsan Laxman)
- Univ. Michigan: Vijay Nair, Casey Diekman, Kohinoor Dasgupta
- Virginia Tech: Naren Ramakrishnan, Debprakash Patnaik
- UC Davis: Anne Smith
- UCSF: Loren Frank
- RIKEN: Kazuo Okanoya
- Wayne State: Sorin Draghici
2 Applications
3 Discovering episodes with temporal constraints
- In neuroscience, one observes delays: synaptic and
 axonal delays.
- Automatic discovery of inter-event intervals and 
 episodes
- Inter-event times of event occurrences have 
 valuable information
- Goal: unearthing network connectivity patterns
4 Graph Edges Patterns in data
[Figure: graph over nodes U, X, Y, Z, G, M with edge delays tUX, tXY, tYZ, tGM, drawn against a time axis t]
A. Efficient level-wise mining: counting all episodes whose frequency is above a threshold
B. Discovering inter-event intervals: discovering the best-fit interval from a supplied set
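Item B can be illustrated with a minimal sketch. It assumes the supplied set is a list of (low, high) delay windows and scores each window by the number of A events that are followed by at least one B event strictly inside it. The function name, the scoring rule and the sample timestamps are illustrative assumptions, not the method used in the talk.

```python
from bisect import bisect_left, bisect_right

def best_fit_interval(times_a, times_b, candidate_intervals):
    """Pick the candidate (low, high) window that explains the most A -> B delays.

    Hypothetical helper: an A event at time t is "explained" by a window if
    some B event occurs at a time tb with low < tb - t < high.
    """
    times_b = sorted(times_b)
    counts = {}
    for low, high in candidate_intervals:
        explained = 0
        for t in times_a:
            first = bisect_right(times_b, t + low)   # first B with tb > t + low
            last = bisect_left(times_b, t + high)    # first B with tb >= t + high
            if last > first:                         # at least one B in the window
                explained += 1
        counts[(low, high)] = explained
    best = max(counts, key=counts.get)
    return best, counts

# Made-up timestamps: every B follows an A by exactly 5 time units.
best, counts = best_fit_interval(
    times_a=[0, 10, 20, 30],
    times_b=[5, 15, 25, 35],
    candidate_intervals=[(0, 2), (4, 6), (8, 10)],
)
print(best, counts)
```

Candidate windows of equal width keep the comparison fair, since a wider window can only explain more events under this scoring rule.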
5 Tracking an Evolving Network
6 Discovering Rare Patterns
- 10,000 spikes from 26 neurons 
- 11 spikes (about 0.1% of the total) are in a pattern 
 G-M-B-E-A-T-S-F-O-R-D
- A single occurrence is statistically significant
7 Mining EMR using GMiner
- Example EMR
 Patient ID_0
 Recorded medical event "DIAG_1" on Day 0
 Recorded medical event "DIAG_3" on Day 1
 Recorded medical event "PRES_1" on Day 5
 Recorded medical event "PRES_3" on Day 6
 Recorded medical event "EVT_L" on Day 7
 Recorded medical event "TEST_4" on Day 7
 ...
- Embedded patterns 
- TEST_1 -> TEST_2 -> DIAG_1 -> PRES_1 
- TEST_3 -> TEST_4 -> DIAG_2 -> PRES_2 
- TEST_5 -> DIAG_3 -> PRES_3 
- GMiner Results 
- No. of 3-node frequent episodes: 5 
- TEST_5 -[4-6]- DIAG_3 -[4-6]- PRES_3 (0.78141): 242 
- No. of 4-node frequent episodes: 2 
- TEST_1 -[4-6]- TEST_2 -[4-6]- DIAG_1 -[4-6]- PRES_1 (0.81822): 187
- TEST_3 -[4-6]- TEST_4 -[4-6]- DIAG_2 -[4-6]- PRES_2 (0.80452): 175
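Before mining, records like the example EMR have to be turned into a time-ordered event stream. The following is a minimal sketch that assumes each record follows the exact wording shown in the example; the regular expression and the function name `parse_emr` are illustrative and say nothing about how GMiner itself reads its input.

```python
import re

# Assumed record wording, taken from the example EMR on this slide.
RECORD = re.compile(r'Recorded medical event "([^"]+)" on Day (\d+)')

def parse_emr(text):
    """Return (day, event) pairs in day order, keeping record order within a day."""
    events = [(int(day), name) for name, day in RECORD.findall(text)]
    return sorted(events, key=lambda pair: pair[0])  # stable sort on the day only

sample = '''Patient ID_0
Recorded medical event "DIAG_1" on Day 0
Recorded medical event "DIAG_3" on Day 1
Recorded medical event "PRES_1" on Day 5
Recorded medical event "PRES_3" on Day 6
Recorded medical event "EVT_L" on Day 7
Recorded medical event "TEST_4" on Day 7
'''

for day, event in parse_emr(sample):
    print(day, event)
```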
8 Imaginary Situation 1
- Patients arriving in Emergency Department (ED) 
- Events: diagnostic tests and EMR (historical data), 
 represented here as letters of the alphabet
- Event patterns can be discovered 
- Patients can be flagged as high-priority (based 
 on partial patterns)
A -[15-30 min]- B -[15-30 min]- C -[15-30 hours]- Y
[Figure: historical data shown as event sequences for Patient 1, Patient 2 and Patient 3 along a time axis, with a flag raised at the current time. Sequences as extracted:]
MAZXYCQBGMQPTARYCDJBSPASWCJDGMDYZXHGDH
ZXHADHOTCBFAKVPCLVIRXY
SARYCDJBSPASWCJDGMDYKVPQLVIRX
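Flagging on a partial pattern can be sketched as follows. The sketch assumes events arrive as (time in minutes, symbol) pairs and that the interval bounds are inclusive; the function names and the sample events are made up for illustration. It checks whether the prefix A, B, C of the pattern has been observed with the stated delays, which is the point at which a patient could be flagged before outcome Y.

```python
def prefix_end_times(events, prefix, intervals):
    """Times at which `prefix` can end while respecting the delay windows.

    events    : list of (time, symbol) pairs
    prefix    : symbols of the partial pattern, e.g. ["A", "B", "C"]
    intervals : one (low, high) window per consecutive pair, bounds inclusive
    """
    ends = [t for t, symbol in events if symbol == prefix[0]]
    for wanted, (low, high) in zip(prefix[1:], intervals):
        # Keep every occurrence of `wanted` reachable from some earlier end time.
        ends = [t for t, symbol in events
                if symbol == wanted and any(low <= t - prev <= high for prev in ends)]
    return ends

def should_flag(events, prefix, intervals):
    """True if the partial pattern has been seen at least once."""
    return bool(prefix_end_times(events, prefix, intervals))

# A, then B 20 minutes later, then C 25 minutes after B.
patient = [(0, "A"), (3, "Q"), (20, "B"), (45, "C")]
print(should_flag(patient, ["A", "B", "C"], [(15, 30), (15, 30)]))
```

Tracking all feasible end times, rather than only the first A, avoids missing a match when an early A is followed by a B that arrives too late for it but in time for a later A.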
9 GMiner Graph Visualization
10 Imaginary Situation 2
- 1,000 patients come through the hospital 
- Most of these events occur independently of one 
 another or with weak dependence
-  2 of these patients have the same condition and 
 show the sequence of events we looked at before
 
- A -[4 to 6 hours]- B -[1 to 3 hours]- C -[5 to 7 hours]- Y
- However, another 2 of the patients also have the 
 same condition but a different pattern of events
 occurs with different time delays
- A -[9 to 11 hours]- B -[3 to 5 hours]- C -[11 to 13 hours]- N
- Imagine that event Y represents a positive 
 outcome, while event N represents a negative
 outcome.
11 GMiner Results (Simulation Example 2)
12 Backup
13 Complex dynamical system
- What is the problem we are trying to solve?
- Large graph with many nodes and edges 
- Activity from many of the nodes is available 
- How do we recover the graph (strength, direction, 
 delay) from that activity?
14 With inter-event time constraints
- Inter-event times in serial episodes 
- Inter-event expiry constraint: 0 < Δt_i < T_X 
- Inter-event interval constraint: T_low < Δt_i < T_high
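The two constraint types can be written as simple checks on the gaps between consecutive events of a candidate occurrence. This is a minimal sketch: the function names are hypothetical, and the strict inequalities follow the slide as transcribed.

```python
def inter_event_times(times):
    """Gaps between consecutive event times of one candidate occurrence."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

def satisfies_expiry(times, t_x):
    """Expiry constraint: every gap dt satisfies 0 < dt < t_x."""
    return all(0 < dt < t_x for dt in inter_event_times(times))

def satisfies_intervals(times, intervals):
    """Interval constraint: the i-th gap dt satisfies low_i < dt < high_i."""
    gaps = inter_event_times(times)
    if len(gaps) != len(intervals):
        raise ValueError("need exactly one (low, high) pair per gap")
    return all(low < dt < high for dt, (low, high) in zip(gaps, intervals))

# Made-up occurrence times for a three-event serial episode.
print(satisfies_expiry([1, 4, 9], t_x=6))
print(satisfies_intervals([5, 12, 19], [(5, 10), (5, 10)]))
```

The expiry constraint is the special case of the interval constraint in which every lower bound is 0 and every upper bound is the same T_X.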
15 Counting Episodes with inter-event constraints
- Complex state-transitions required for counting 
 with inter-event constraints
- Space complexity: O(mnC); time complexity: O(mnC)
[Figure: automaton with states Accept_A(), Accept_B(), Accept_C(), Accept_D() and a data read head over the event sequence A1 B4 A5 C10 B12 C13 D17 A2; the labels 5 and 10 also appear in the figure]
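The need for extra state when counting under inter-event constraints can be seen in a small sketch. It counts non-overlapped occurrences of a serial episode by keeping, for each episode node, the times at which a partial occurrence can end. This is a simplified illustration written for this transcript, not the algorithm from the talk: the function name, the strict interval bounds and the event data are assumptions, and the data are made up rather than taken from the figure.

```python
def count_non_overlapped(events, episode, intervals):
    """Count non-overlapped occurrences of a serial episode with gap constraints.

    events    : list of (symbol, time) pairs sorted by time
    episode   : list of symbols, e.g. ["A", "B", "C"]
    intervals : one (low, high) pair per consecutive episode pair; a gap dt
                is accepted when low < dt < high
    """
    n = len(episode)
    partial = [[] for _ in range(n)]  # partial[i]: end times of episode[:i + 1]
    count = 0
    for symbol, t in events:
        # Go from the last node to the first so that one event cannot be used
        # for two nodes of the same occurrence.
        for i in range(n - 1, -1, -1):
            if episode[i] != symbol:
                continue
            if i == 0:
                partial[0].append(t)
                continue
            low, high = intervals[i - 1]
            if any(low < t - prev < high for prev in partial[i - 1]):
                if i == n - 1:
                    count += 1
                    partial = [[] for _ in range(n)]  # start afresh: non-overlapped
                    break
                partial[i].append(t)
    return count

events = [("A", 1), ("B", 4), ("A", 5), ("B", 8), ("C", 10),
          ("B", 12), ("C", 19), ("A", 21), ("B", 28), ("C", 35)]
print(count_non_overlapped(events, ["A", "B", "C"], [(5, 10), (5, 10)]))
```

Keeping every feasible end time is what makes the state larger than in unconstrained counting: the first A or B seen is not necessarily the one that leads to a valid occurrence. A fuller implementation would also discard end times that are too old to satisfy any upper bound.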
16 Parallelizing counting
- Run several parallel automata at different 
 start states for the same episode (Map step)
- Merge count and state info from each automaton 
 (Concatenate step)
- Implemented on Nvidia GTX280 GPU 
 (1.3 GHz clock, 1 GB device memory)
- 200X speedup w.r.t. CPU
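The Map and Concatenate idea can be sketched on a deliberately simplified case: a serial episode with no inter-event constraints, where the automaton state is just the number of episode nodes matched so far. Each segment of the stream is processed once per possible start state (these runs are independent, so they could execute in parallel), and the results are then chained using the end state of the previous segment. This is plain sequential Python with hypothetical function names, not the GPU implementation described on the slide.

```python
def run_segment(segment, episode, start_state):
    """Run the counting automaton over one segment from a given start state.

    Returns (occurrences completed inside the segment, state at its end).
    """
    state, count = start_state, 0
    for symbol in segment:
        if symbol == episode[state]:
            state += 1
            if state == len(episode):  # full occurrence seen
                count += 1
                state = 0
    return count, state

def map_step(segments, episode):
    """For every segment, tabulate the result for every possible start state."""
    return [{s: run_segment(seg, episode, s) for s in range(len(episode))}
            for seg in segments]

def concatenate_step(tables):
    """Chain the per-segment tables, starting from state 0."""
    total, state = 0, 0
    for table in tables:
        count, state = table[state]  # look up the run that began in `state`
        total += count
    return total

stream = list("ABCABXCABC")
segments = [stream[i:i + 4] for i in range(0, len(stream), 4)]
print(concatenate_step(map_step(segments, list("ABC"))))
```

The chained result matches a single sequential pass because each segment's table already contains the run for whichever state the previous segment ends in. With inter-event constraints the state also carries timestamps, so the merge needs more care than this sketch shows.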
17 Cortical cultures on micro-electrode arrays
18 Relative spike counts
[Figure: relative spike counts plotted against days]
20 Data mining Bayesian GLM
21 TDMiner Finds Fault Correlations
22 Finding Relevant Fault Correlations
[Figure: timeline marking statistically significant correlations, the point where the problem begins and the point where the problem is fixed]
By using TDMiner, the root cause could have been identified 2.5 weeks earlier.
23 LOMA Robot Problems