Trend Analysis and Risk Identification

1
Trend Analysis and Risk Identification
Lenka Nováková (1), Jiří Kléma (1), Michal Jakob (1),
Simon Rawles (2), Olga Štěpánková (1)
(1) The Gerstner Laboratory for Intelligent Decision Making and Control,
Czech Technical University, Prague
(2) Department of Computer Science, University of Bristol, Bristol, UK
PKDD 2003, Discovery Challenge
2
Outline

STULONG data, orientation towards CVD
Used tools: SumatraTT, Statistica, Weka
Used techniques: mainly statistical tests (ANOVA, chi-square, etc.)
Exploratory analysis and subgroup discovery (Entry table)
Trend analysis (Entry and Control tables)
three principal ways of preprocessing
derived aggregated attributes
univariate and multivariate analysis

3
STULONG Data

Four tables: Entry, Control, Letter, Death
Dependent variable: CVD (CardioVascular Disease)
boolean attribute derived from the A2 questionnaire (Control table)

CVD = false: the patient has no coronary disease.
CVD = true: the patient has at least one of these attributes true
(Hodn1, Hodn2, Hodn3, Hodn11, Hodn13, Hodn14):
positive angina pectoris
(silent) myocardial infarction
ischaemic heart disease
cerebrovascular accident
We remove patients who have diabetes (Hodn4) or
cancer (Hodn15) only.
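
The derivation of the CVD attribute can be sketched as follows. This is a minimal illustration, not the authors' SumatraTT script: the Hodn* names come from the slide, while the one-dict-per-patient layout and the reading of "diabetes or cancer only" as "one of those flags set and no CVD flag set" are assumptions.

```python
# Flag names are taken from the slide; the record layout is hypothetical.
CVD_FLAGS = ("Hodn1", "Hodn2", "Hodn3", "Hodn11", "Hodn13", "Hodn14")
EXCLUDE_FLAGS = ("Hodn4", "Hodn15")  # diabetes, cancer


def derive_cvd(record):
    """Return True/False for CVD, or None when the patient is removed."""
    if any(record.get(flag, False) for flag in CVD_FLAGS):
        return True
    # Assumed reading of "only": diabetes or cancer without any CVD flag.
    if any(record.get(flag, False) for flag in EXCLUDE_FLAGS):
        return None
    return False


patients = [
    {"ICO": 1, "Hodn2": True},
    {"ICO": 2},
    {"ICO": 3, "Hodn4": True},
    {"ICO": 4, "Hodn1": True, "Hodn15": True},
]
labels = {p["ICO"]: derive_cvd(p) for p in patients}
kept = {ico: cvd for ico, cvd in labels.items() if cvd is not None}
print(kept)
```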
4
ENTRY - subgroup discovery

AQ no. 6: Are there any differences in the ENTRY
examination for different CVD groups?
Statistica 6.0
module for interactive decision tree induction
two-tailed t-test or chi-square test to assess
the significance of subgroups
Dependencies are relatively weak
Interesting dependencies found:
social characteristics, derived attribute
AGE_of_ENTRY
alcohol: positive effect of beer, no effect of
wine
sugar consumption increases CVD risk
well-known dependencies are not mentioned
(smoking, BMI, cholesterol)

5
ENTRY - general model

General CVD model (in WEKA)
feature selection and modeling (e.g., decision
trees)
tends to generate trivial models (always
predicting false)
an asymmetric error-cost matrix does not help

Predict CVD risk
Identify principal variables (chi-squared test)
Naïve Bayes, ROC evaluation
three independent variables:
discretized AGE_of_ENTRY
discretized BMI
Cholrisk (derived from CHLST)
AUC = 0.66
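
A risk model of this kind can be sketched in a few lines. The code below is a plain-Python illustration of Naïve Bayes over three categorical variables with an AUC computed by pairwise comparison; it is not the WEKA setup used in the talk. The category labels, the Laplace smoothing parameter `alpha` and the toy data are assumptions, and scoring the training rows is done only to keep the example short.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """rows: tuples of categorical values; labels: booleans (CVD yes/no)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values, alpha


def predict_proba(model, row):
    """P(CVD = True | row) under the naive independence assumption."""
    class_counts, value_counts, values, alpha = model
    n = sum(class_counts.values())
    score = {}
    for y in (False, True):
        p = class_counts[y] / n
        for i, v in enumerate(row):
            p *= (value_counts[(i, y)][v] + alpha) / (
                class_counts[y] + alpha * len(values[i]))
        score[y] = p
    return score[True] / (score[True] + score[False])


def auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: (age band, BMI band, Cholrisk); all labels are invented.
rows = [("old", "high", "risk"), ("old", "high", "ok"),
        ("young", "normal", "ok"), ("young", "normal", "ok"),
        ("old", "normal", "risk"), ("young", "high", "ok")]
cvd = [True, True, False, False, True, False]
model = train_naive_bayes(rows, cvd)
scores = [predict_proba(model, r) for r in rows]
print(auc(scores, cvd))
```

An AUC of 0.5 corresponds to random ranking, so the reported 0.66 indicates a weak but real signal.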

6
CONTROL - trend analysis

AQ no. 7: Are there any differences in the development
of risk factors for different CVD groups?

ENTRY table: ICO (primary key), Year of birth, Year of
entry, Smoking, Alcohol, Cholesterol, Body Mass
Index, Blood pressure
CONTR table: ICO, risk factors followed during 20 years
7
Global Approach

Risk factors to be observed are selected:
SYST, DIAST, TRIGL, BMI, CHLSTMG
Selected control examinations are transformed
(pivoting)
Patients with no control entries are removed
(about 60 patients)
Trend aggregates are calculated

ICO_1
ICO_2
8
Derived trend attributes
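
The slide content did not survive in the transcript. As a rough sketch of what pivoting followed by trend aggregation could look like, the code below groups control rows per patient and fits a least-squares line to one risk factor. The gradient and intercept are mentioned later in the talk; the `year` field name, the use of years since the first examination as the time axis, and the extra `mean` aggregate are assumptions.

```python
from statistics import mean


def linear_trend(times, values):
    """Least-squares gradient and intercept of values against times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    gradient = sxy / sxx
    return gradient, v_bar - gradient * t_bar


def pivot(control_rows, factor):
    """Turn long-format rows into {ICO: [(year, value), ...]} sorted by year."""
    series = {}
    for row in control_rows:
        if row.get(factor) is not None:
            series.setdefault(row["ICO"], []).append((row["year"], row[factor]))
    return {ico: sorted(obs) for ico, obs in series.items()}


def trend_aggregates(control_rows, factor):
    result = {}
    for ico, obs in pivot(control_rows, factor).items():
        years = [y for y, _ in obs]
        values = [v for _, v in obs]
        if len(set(years)) < 2:
            continue  # a single examination gives no trend
        times = [y - years[0] for y in years]  # years since first examination
        gradient, intercept = linear_trend(times, values)
        result[ico] = {"gradient": gradient, "intercept": intercept,
                       "mean": mean(values)}
    return result


control_rows = [
    {"ICO": 1, "year": 1977, "SYST": 130},
    {"ICO": 1, "year": 1979, "SYST": 134},
    {"ICO": 1, "year": 1981, "SYST": 138},
    {"ICO": 2, "year": 1978, "SYST": 120},
]
aggregates = trend_aggregates(control_rows, "SYST")
print(aggregates)
```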
9
Global Approach - results

The derived aggregates were discretized
e.g., the gradient can be strongly decreasing,
decreasing, constant, increasing, or strongly
increasing
Chi-square test for independence w.r.t. CVD
A large number of aggregates proved to be
significant, including gradients (chi-square test,
p < 0.05)
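
The discretization and the independence test can be illustrated as below. The five category names come from the slide; the cut-off values `weak` and `strong`, the toy data and the helper names are assumptions. The sketch computes only the Pearson statistic and compares it with the tabulated 5% critical value for 4 degrees of freedom (9.488) instead of computing a p-value.

```python
CATEGORIES = ["strongly decreasing", "decreasing", "constant",
              "increasing", "strongly increasing"]
CRITICAL_5PCT_DF4 = 9.488  # chi-square critical value, alpha = 0.05, df = 4


def discretize_gradient(g, weak=0.5, strong=2.0):
    """Map a numeric gradient to one of five bands (cut-offs are hypothetical)."""
    if g <= -strong:
        return "strongly decreasing"
    if g <= -weak:
        return "decreasing"
    if g < weak:
        return "constant"
    if g < strong:
        return "increasing"
    return "strongly increasing"


def contingency(gradients, cvd):
    """Rows: gradient band; columns: [CVD false, CVD true]. Empty rows dropped."""
    counts = {c: [0, 0] for c in CATEGORIES}
    for g, y in zip(gradients, cvd):
        counts[discretize_gradient(g)][int(y)] += 1
    return [counts[c] for c in CATEGORIES if sum(counts[c]) > 0]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.
    Assumes every row and column total is non-zero."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_tot) - 1)


gradients = [-3.1, -0.8, 0.1, 0.9, 2.5, 2.2, -0.2, 1.4]
cvd = [False, False, False, True, True, True, False, False]
stat, df = chi_square(contingency(gradients, cvd))
print(stat, df, stat > CRITICAL_5PCT_DF4)
```

With only eight toy observations the expected counts are far too small for the test to be trustworthy; the data are there only to make the sketch executable.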

10
12
11
12
12
ControlCount vs. CVD

ControlCount
number of examinations
strong relation with CVD
AUC = 0.35
ControlCount ↑, CVD risk ↓ (an AUC below 0.5 indicates an inverse relation)
anachronistic attribute
introduced by the design of the study

ControlCount also influences the trend
aggregates (e.g., how steep the gradients tend to
be)
Conclusion: the global approach cannot be applied (at
least with these aggregates)

13
Windowing Approach I.

The same risk factors, the same pivoting
transformation and similar trend aggregates
BUT with a constant number of examinations
Issues
window
time period vs. number of examinations
5 examinations are enough to express a trend
patients records (1 ControlCount 3)
entry is used as the first examination
records are dependent
CVD classification
time from the last examination to CVD
yes/no (yes = CVD in the next year, or yes = CVD at
any time in the future)
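
The windowing idea can be sketched as a sliding window over each patient's examinations. The window size of 5 and the use of the entry examination as the first element follow the slide; the function name, the argument layout and the choice to keep only windows that end before CVD appears are assumptions.

```python
WINDOW = 5  # the slide takes 5 examinations as enough to express a trend


def make_windows(years, values, cvd_year=None, size=WINDOW):
    """Return (window_values, years_to_cvd) pairs for every run of `size`
    consecutive examinations. `years` and `values` start with the entry
    examination. years_to_cvd is None when CVD never appears."""
    result = []
    for start in range(len(values) - size + 1):
        last_year = years[start + size - 1]
        if cvd_year is not None and last_year >= cvd_year:
            break  # assumption: keep only windows that end before CVD
        gap = None if cvd_year is None else cvd_year - last_year
        result.append((values[start:start + size], gap))
    return result


years = [1976, 1977, 1978, 1979, 1980, 1981, 1982]
syst = [130, 132, 131, 135, 138, 140, 145]
windows = make_windows(years, syst, cvd_year=1983)
for window, gap in windows:
    print(window, gap)
```

Consecutive windows of one patient share four of their five examinations, which is why the slide warns that the resulting records are dependent.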

14
Windowing Approach I.
[Figure: data, first vector, new vector]
15
Aggregate tests
T-tests; Grouping: Time_round (Trend_all_nahrady
in Trend_analysis.stw); Group 1: 1000; Group 2: 1

Trend aggregates approach the normal distribution
in both of the specified CVD groups
Two groups were selected: CVD never appears in
the future (1000) vs. CVD appears at the next
exam (1)
T-test for comparison of the group means can be
applied (p < 0.05)
Do the means of the calculated aggregates differ
in the different CVD groups?
Just a few of them
only two variables (gradients!) are clearly
significant:
SYST and DIAST
two significant intercepts:
TRIGL and CHLST
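
For reference, the comparison of group means can be written out by hand. The sketch below computes the classical pooled-variance two-sample t statistic and its degrees of freedom; whether the Statistica run used the pooled or the separate-variance variant is not stated in the slides, and the two toy samples are invented.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical SYST gradients: "CVD never" group vs. "CVD at next exam" group.
never = [0.4, 0.9, 0.1, 0.6, 0.5]
next_exam = [1.8, 2.4, 1.1, 2.0, 1.7]
t, df = pooled_t(never, next_exam)
print(t, df)
```

The statistic is then compared with the t distribution for `df` degrees of freedom to obtain the p-value.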

16
Further tests of SYST, DIAST
T-tests; Grouping: Time_round (Trend_all_nahrady
in Trend_analysis.stw); Group 1: 1000; Group 2: 1

Try to test the gradients for all the CVD groups,
not only the two extreme groups
Repeated-measures ANOVA can be applied: development of
the SYST/DIAST trend for different CVD groups

20
Windowing Approach II.

There are missing values of risk factors
Windowing I.
skips missing values
different numbers of rows are generated for
different factors
Windowing II.
replaces the missing values
the same numbers of rows are generated for
different factors
enables multivariate analysis
combination of different aggregates and their
relation with CVD
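
The slides do not say how Windowing II replaces missing values. Purely as an illustration, the sketch below carries the last observed value forward (and back-fills a leading gap with the first observed value), which is one common choice and an assumption here. It shows why replacement yields aligned rows across risk factors.

```python
def fill_missing(values):
    """Replace None by the previous observed value; a leading gap takes the
    first observed value. This replacement rule is an assumption."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to fill from
    filled, last = [], observed[0]
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


syst = [None, 120, None, 130, None]
diast = [80, None, 85, None, 90]
rows = list(zip(fill_missing(syst), fill_missing(diast)))
print(rows)
```

After filling, every examination contributes one row per patient for all factors, so aggregates of different factors can be combined in one multivariate record.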

21
Windowing II.
[Figure: data, first vector, new vector]
22
27 patients only!
23
Conclusions

The main scope
AQ no. 7: Are there any differences in the development
of risk factors for different CVD groups?
Contributions
Pitfalls of the global approach revealed
Using windowing, differences were proved for SYST and
DIAST blood pressure
Other assumptions and ideas
interesting course of development of risk factors
(DIAST first decreases, then increases, and CVD
appears)
other trends may have an influence under specific
conditions (BMITrend and overweight, etc.)
