Trend Analysis and Risk Identification - PowerPoint PPT Presentation

1
Trend Analysis and Risk Identification
Lenka Nováková¹, Jiří Kléma¹, Michal Jakob¹,
Simon Rawles², Olga Štěpánková¹
¹ The Gerstner Laboratory for Intelligent Decision Making and Control,
Czech Technical University, Prague
² Department of Computer Science, University of Bristol, Bristol, UK
PKDD 2003, Discovery Challenge
2
Outline
  • STULONG data, orientation towards CVD
  • Tools used
      • SumatraTT, Statistica, Weka
  • Techniques used
      • mainly statistical tests: ANOVA, chi-square, etc.
  • Exploratory analysis and subgroup discovery
      • Entry table
  • Trend analysis
      • Entry and Control tables
      • three principal ways of preprocessing
      • derived aggregated attributes
      • univariate and multivariate analysis

3
STULONG Data
  • Four tables: Entry, Control, Letter, Death
  • Dependent variable: CVD
      • CardioVascular Disease
      • Boolean attribute derived from the A2 questionnaire
        (Control table)

CVD false: the patient has no coronary disease.
CVD true: the patient has at least one of these attributes true
(Hodn1, Hodn2, Hodn3, Hodn11, Hodn13, Hodn14):
  • positive angina pectoris
  • (silent) myocardial infarction
  • ischaemic heart disease
  • cerebrovascular accident
We remove patients who have only diabetes (Hodn4) or cancer (Hodn15).
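A minimal Python sketch of the labelling rule above. It assumes each patient's A2 answers have already been collapsed into one boolean per attribute (for example, true if the attribute was ever reported at a control examination). The flag names come from the slide; the dictionary layout and the name `cvd_label` are hypothetical. "Only" is read here as: the patient's positive answers are limited to Hodn4 and/or Hodn15.

```python
# Questionnaire flags that define CVD = true (from the slide).
CVD_FLAGS = {"Hodn1", "Hodn2", "Hodn3", "Hodn11", "Hodn13", "Hodn14"}
# Diabetes (Hodn4) and cancer (Hodn15): patients with only these are removed.
EXCLUDE_IF_ONLY = {"Hodn4", "Hodn15"}


def cvd_label(flags):
    """Return True/False for CVD, or None if the patient should be removed.

    flags: dict mapping attribute names such as "Hodn4" to booleans
    (a hypothetical per-patient layout, assumed for illustration).
    """
    positive = {name for name, value in flags.items() if value}
    if positive & CVD_FLAGS:
        return True
    if positive and positive <= EXCLUDE_IF_ONLY:
        return None  # diabetes and/or cancer only: drop from the analysis
    return False
```

For example, `cvd_label({"Hodn4": True, "Hodn13": True})` is `True`, because a CVD flag takes precedence over the exclusion rule.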
4
ENTRY - subgroup discovery
  • AQ no. 6: Are there any differences in the ENTRY
    examination for different CVD groups?
  • Statistica 6.0
      • module for interactive decision tree induction
      • two-tailed t-test or chi-square test to assess the
        significance of subgroups
  • Dependencies are relatively weak
  • Interesting dependencies found:
      • social characteristics; derived attribute
        AGE_of_ENTRY
      • alcohol: positive effect of beer, no effect of
        wine
      • sugar consumption increases CVD risk
      • well-known dependencies are not mentioned
        (smoking, BMI, cholesterol)
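The significance check for a single subgroup can be sketched as a 2×2 chi-square test (subgroup membership × CVD). This is a minimal standard-library sketch, not Statistica's implementation; the name `chi2_2x2` is hypothetical. With one degree of freedom the chi-square upper-tail probability equals `erfc(sqrt(x / 2))`, so no statistics package is needed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    a, b: CVD true / CVD false inside the subgroup
    c, d: CVD true / CVD false outside the subgroup
    Returns (statistic, p_value).
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

With invented counts (20, 10, 10, 20) the statistic is 20/3 and p is just under 0.01; equal counts in all four cells give a statistic of 0 and p = 1.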

5
ENTRY - general model
  • General CVD model (in Weka)
      • feature selection and modeling (e.g., decision
        trees)
      • tends to generate trivial models (always
        predicting false)
      • an asymmetric error-cost matrix does not help
  • Predict CVD risk
      • identify principal variables (chi-square test)
      • naïve Bayes, ROC evaluation
      • three independent variables:
          • discretized AGE_of_ENTRY
          • discretized BMI
          • Cholrisk, derived from CHLST
      • AUC = 0.66
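The AUC values quoted in these slides are areas under the ROC curve. A minimal sketch of how such a value can be computed from any risk score, using the pair-counting (Mann-Whitney) formulation; this is not Weka's code, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases.

    scores: predicted CVD risk per patient (e.g. a naive Bayes posterior).
    labels: True for CVD, False otherwise.
    Each (CVD, non-CVD) pair counts 1 if the CVD case scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one CVD and one non-CVD case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A score that is identical for every patient, which is what a trivial always-false model amounts to, gives exactly 0.5. Values below 0.5 mean that higher scores go with fewer CVD cases.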

6
CONTROL - trend analysis
  • AQ no. 7: Are there any differences in the development
    of risk factors for different CVD groups?

(Figure: ENTRY table, primary key ICO: year of birth, year of entry,
smoking, alcohol, cholesterol, body mass index, blood pressure;
CONTR table, key ICO: risk factors followed during 20 years)
7
Global Approach
  • Risk factors to be observed are selected
      • SYST, DIAST, TRIGL, BMI, CHLSTMG
  • Selected control examinations are transformed
      • pivoting
  • Patients with no control entries are removed
      • about 60 patients
  • Trend aggregates are calculated

(Figure: pivoted records for patients ICO_1, ICO_2)
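The pivoting step can be sketched as turning the long CONTR table (one row per control examination) into one chronological series per patient. A minimal Python sketch, assuming rows of the form (ICO, examination date, value) for a single risk factor; the tuple layout and the name `pivot_controls` are hypothetical. Patients without control rows never appear in the result, which matches the removal step above.

```python
from collections import defaultdict
from datetime import date


def pivot_controls(rows):
    """Group long-format control rows into one chronological series per patient.

    rows: iterable of (ico, exam_date, value) tuples for one risk factor.
    Returns {ico: [value at 1st control, value at 2nd control, ...]}.
    Missing values (None) stay in place so that later steps can decide
    whether to skip or replace them.
    """
    per_patient = defaultdict(list)
    for ico, exam_date, value in rows:
        per_patient[ico].append((exam_date, value))
    return {
        ico: [value for _, value in sorted(exams, key=lambda e: e[0])]
        for ico, exams in per_patient.items()
    }


if __name__ == "__main__":
    # Invented example rows for two patients (SYST values).
    rows = [
        ("ICO_1", date(1978, 5, 2), 130),
        ("ICO_1", date(1976, 4, 1), 120),
        ("ICO_2", date(1977, 3, 9), 145),
    ]
    print(pivot_controls(rows))  # {'ICO_1': [120, 130], 'ICO_2': [145]}
```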
8
Derived trend attributes
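The slides do not list every derived trend attribute; the ones named elsewhere in this deck are ControlCount, gradients, and intercepts. A minimal sketch of computing those three for one patient's series with an ordinary least-squares line, assuming time is measured in years since the first examination. The least-squares fit and the name `trend_aggregates` are assumptions, not necessarily the authors' definitions.

```python
def trend_aggregates(times, values):
    """ControlCount plus least-squares gradient and intercept for one series.

    times: examination times (e.g. years since the first examination).
    values: risk-factor values; None marks a missing measurement.
    ControlCount counts all examinations in the series; the line is fitted
    only to examinations with a value. Returns None if no line can be fitted.
    """
    points = [(t, v) for t, v in zip(times, values) if v is not None]
    n = len(points)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        return None  # all examinations at the same time point
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    gradient = sxy / sxx
    intercept = mean_v - gradient * mean_t
    return {"ControlCount": len(values), "gradient": gradient, "intercept": intercept}
```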
9
Global Approach - results
  • The derived aggregates were discretized
      • e.g., the gradient can be strongly decreasing,
        decreasing, constant, increasing, or strongly
        increasing
  • Chi-square test of independence with respect to CVD
  • A large number of aggregates, including gradients,
    proved to be significant (chi-square test, p < 0.05)
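Discretizing a gradient into the five levels named above requires cut points, which the slides do not give. A minimal sketch with hypothetical thresholds `weak` and `strong` (in units of the risk factor per year); the resulting labels can then be cross-tabulated against CVD for the chi-square test of independence.

```python
from collections import Counter

LEVELS = ("strongly decreasing", "decreasing", "constant",
          "increasing", "strongly increasing")


def discretize_gradient(gradient, weak, strong):
    """Map a gradient to one of five ordered levels.

    |gradient| <= weak counts as constant, weak < |gradient| <= strong as
    decreasing/increasing, and |gradient| > strong as strongly so.
    The thresholds are hypothetical parameters.
    """
    if not 0 <= weak < strong:
        raise ValueError("need 0 <= weak < strong")
    if gradient < -strong:
        return LEVELS[0]
    if gradient < -weak:
        return LEVELS[1]
    if gradient <= weak:
        return LEVELS[2]
    if gradient <= strong:
        return LEVELS[3]
    return LEVELS[4]


def contingency(levels, cvd_flags):
    """Count (level, CVD) pairs: the table a chi-square independence test uses."""
    return Counter(zip(levels, cvd_flags))
```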

10
11
12
ControlCount vs. CVD
  • ControlCount
      • number of examinations
      • strong relation with CVD
      • AUC = 0.35
      • ControlCount ↑ → CVD risk ↓
  • anachronistic attribute
      • introduced by the design of the study
  • ControlCount influences the trend aggregates: depending
    on ControlCount, gradients tend to be steeper, etc.
  • Conclusion: the global approach cannot be applied (at
    least with these aggregates)

13
Windowing Approach I.
  • The same risk factors, the same pivoting
    transformation and similar trend aggregates
  • BUT a constant number of examinations
  • Issues
      • window: time period vs. number of examinations
          • 5 examinations are enough to express a trend
          • patient records (1 … ControlCount − 3)
          • the entry examination is used as the first examination
          • records are dependent
      • CVD classification
          • time from the last examination to CVD
          • yes/no (yes = CVD in the next year, or CVD at any
            time in the future)
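The windowing transformation can be sketched as a sliding window of five consecutive examinations (entry first), each window labelled with the rounded time from its last examination to the CVD diagnosis. A minimal Python sketch; the name `make_windows`, the input layout, and the use of 1000 as the "no CVD" code (borrowed from the Time_round grouping in the t-tests) are assumptions. A patient with ControlCount controls has ControlCount + 1 examinations and so yields ControlCount − 3 windows, which overlap; this is why records of one patient are dependent.

```python
NO_CVD = 1000  # assumed code for "CVD never appears in the future"


def make_windows(exam_years, values, cvd_year=None, size=5):
    """Cut one patient's examination series into overlapping windows.

    exam_years: examination times in years, entry examination first.
    values: the risk-factor value at each examination (same length).
    cvd_year: time of the CVD diagnosis, or None if CVD never appears.
    Returns a list of (window_values, time_to_cvd) pairs, where
    time_to_cvd is the rounded time from the window's last examination
    to CVD. Windows ending after the diagnosis are not produced.
    """
    if len(exam_years) != len(values):
        raise ValueError("exam_years and values must have the same length")
    windows = []
    for start in range(len(values) - size + 1):
        last = start + size - 1
        if cvd_year is None:
            label = NO_CVD
        elif exam_years[last] > cvd_year:
            break  # later windows would end after the diagnosis
        else:
            label = round(cvd_year - exam_years[last])
        windows.append((values[start:start + size], label))
    return windows
```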

14
Windowing Approach I.
(Figure: sliding window over the data; first vector, new vector)
15
Aggregate tests
(Table: T-tests; grouping: Time_round (Trend_all_nahrady in
Trend_analysis.stw); Group 1: 1000, Group 2: 1)
  • Trend aggregates approximately follow the normal
    distribution in both selected CVD groups
  • Two groups were selected: CVD never appears in
    the future (Time_round = 1000) vs. CVD appears at the
    next examination (Time_round = 1)
  • A t-test for comparison of the group means can be
    applied (p < 0.05)
  • Do the means of the calculated aggregates differ
    between the CVD groups?
      • Only a few of them do
      • only two variables, both gradients (!), are clearly
        significant: SYST and DIAST
      • two intercepts are significant: TRIGL and CHLST
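The group comparison can be sketched as a pooled-variance (Student) two-sample t statistic, for example on SYST gradients of windows with Time_round = 1000 versus Time_round = 1. A minimal standard-library sketch; `student_t` is a hypothetical name. Converting the statistic into a p-value needs the t distribution; `scipy.stats.ttest_ind`, which assumes equal variances by default, is one existing option.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    a, b: values of one aggregate (e.g. SYST gradient) in the two groups.
    Assumes roughly normal data with equal variances in both groups.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df
```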


16
Further tests of SYST, DIAST
(Table: T-tests; grouping: Time_round (Trend_all_nahrady in
Trend_analysis.stw); Group 1: 1000, Group 2: 1)
  • Test the gradients for all CVD groups, not only the
    two extreme groups
  • Repeated-measures ANOVA can be applied: development of
    the SYST/DIAST trend for different CVD groups
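The repeated ANOVA can be sketched as a split-plot (mixed) design: the CVD group is a between-subjects factor and the examination order within a window is a within-subjects factor. The time × group interaction is the term that asks whether SYST/DIAST develops differently across CVD groups. A minimal standard-library sketch returning sums of squares, degrees of freedom and F ratios. It assumes every record has the same number of examinations (true for fixed-size windows), ignores the dependence between overlapping windows of one patient, and applies no sphericity correction. `mixed_anova` is a hypothetical name; p-values need an F distribution, e.g. `scipy.stats.f.sf`.

```python
def mixed_anova(groups):
    """Split-plot ANOVA: group (between records) x time (within records).

    groups: {group_label: [record, ...]}, each record a list of k values
    measured at k successive examinations (e.g. SYST in one window).
    Returns {effect: (SS, df, F)} for "group", "time" and "time x group".
    Raises ZeroDivisionError if an error term has zero variance.
    """
    if len(groups) < 2 or any(not recs for recs in groups.values()):
        raise ValueError("need at least two non-empty groups")
    records = [r for recs in groups.values() for r in recs]
    k = len(records[0])
    if k < 2 or any(len(r) != k for r in records):
        raise ValueError("all records need the same number (>= 2) of examinations")
    n, g = len(records), len(groups)
    if n <= g:
        raise ValueError("need more records than groups")
    grand = sum(map(sum, records)) / (n * k)

    ss_total = sum((v - grand) ** 2 for r in records for v in r)
    ss_subjects = k * sum((sum(r) / k - grand) ** 2 for r in records)
    ss_group = k * sum(
        len(recs) * (sum(map(sum, recs)) / (len(recs) * k) - grand) ** 2
        for recs in groups.values()
    )
    time_means = [sum(r[t] for r in records) / n for t in range(k)]
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    # Weighted squared deviations of the group-by-time cell means.
    ss_cells = sum(
        len(recs) * (sum(r[t] for r in recs) / len(recs) - grand) ** 2
        for recs in groups.values()
        for t in range(k)
    )
    ss_inter = ss_cells - ss_group - ss_time
    ss_subj_err = ss_subjects - ss_group  # records within groups
    ss_within_err = ss_total - ss_subjects - ss_time - ss_inter

    df_group, df_subj = g - 1, n - g
    df_time = k - 1
    df_inter, df_within = df_time * df_group, df_time * df_subj

    def f_ratio(ss, df, ss_err, df_err):
        return (ss / df) / (ss_err / df_err)

    return {
        "group": (ss_group, df_group,
                  f_ratio(ss_group, df_group, ss_subj_err, df_subj)),
        "time": (ss_time, df_time,
                 f_ratio(ss_time, df_time, ss_within_err, df_within)),
        "time x group": (ss_inter, df_inter,
                         f_ratio(ss_inter, df_inter, ss_within_err, df_within)),
    }
```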


17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Windowing Approach II.
  • Risk factors have missing values
  • Windowing I.
      • skips missing values
      • different numbers of rows are generated for
        different factors
  • Windowing II.
      • replaces the missing values
      • the same number of rows is generated for
        all factors
      • enables multivariate analysis
          • combination of different aggregates and their
            relation with CVD
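The slides do not say how Windowing II replaces missing values. One plausible choice, shown here purely as an assumption, is linear interpolation over the examination index, with the nearest observed value copied into gaps at either end of the series. Every risk factor then has a value at every examination, so all factors produce the same windows. `fill_missing` is a hypothetical name.

```python
def fill_missing(values):
    """Replace None entries by linear interpolation over examination index.

    Leading and trailing gaps take the nearest observed value.
    A series with no observed value at all is returned unchanged.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = list(values)
    for i in range(len(filled)):
        if filled[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]  # leading gap
        elif right is None:
            filled[i] = values[left]  # trailing gap
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled
```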

21
Windowing II.
(Figure: Windowing II over the data; first vector, new vector)
22
27 patients only!
23
Conclusions
  • The main scope
      • AQ no. 7: Are there any differences in the development
        of risk factors for different CVD groups?
  • Contributions
      • pitfalls of the global approach revealed
      • using windowing, differences proved for SYST and
        DIAST blood pressure
  • Other assumptions and ideas
      • interesting course of development of risk factors
        (DIAST first decreases, then increases, and then CVD
        appears)
      • other trends may have influence under specific
        conditions (BMITrend and overweight, etc.)