Title: The Application of Data Mining
1 The Application of Data Mining in Health
Research
Li Xiaosong, M.D., M.P.H., Ph.D.
Professor of Biostatistics, School of Public Health,
Sichuan University
2 Knowledge discovery in databases (KDD)
- With the rapid development of the information
industry, the capacity to produce and collect data
has advanced greatly. However, the gap between rich
data and poor knowledge is becoming increasingly evident.
- Knowledge discovery in databases answers this demand!
3 Data mining
[Figure labels: Data tomb? / Knowledge]
4 Data Mining (DM)
- Faced with vast databases, how to discover the
hidden but useful knowledge in the data, knowledge
that can support decision-making in government and
enterprises and so bring greater benefit, has
become an important problem to solve
[Figure: Data -> DM -> Knowledge]
- Data Mining is the procedure of distilling
unknown but potentially valuable information and
knowledge from plentiful data that are incomplete,
fuzzy and random
5 Data Mining: A KDD Process
- Data mining is the core of the knowledge discovery
process
[Figure: KDD process flow] Databases -> Data Cleaning
and Data Integration -> Data Warehouse -> Selection
-> Task-relevant Data -> Data Mining -> Pattern
Evaluation -> Knowledge
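The stages named on this slide can be sketched as a chain of small functions. This is only an illustrative sketch in Python: the two source tables, the field names, and the 0.5 threshold are hypothetical, and the "mining" step is a deliberately simple rate calculation rather than any particular algorithm.

```python
# Hypothetical source tables keyed by patient id (illustration only).
clinic = [
    {"id": 1, "age": 72, "smoker": "yes"},
    {"id": 2, "age": None, "smoker": "no"},   # record with a missing value
    {"id": 3, "age": 55, "smoker": "yes"},
    {"id": 4, "age": 80, "smoker": "yes"},
]
lab = {1: {"disease": True}, 3: {"disease": False}, 4: {"disease": True}}

def clean(rows):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(rows, extra):
    """Data integration: join the second source on the shared id."""
    return [{**r, **extra[r["id"]]} for r in rows if r["id"] in extra]

def select(rows, fields):
    """Selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in rows]

def mine(rows):
    """Data mining: disease rate among smokers (a very simple pattern)."""
    smokers = [r for r in rows if r["smoker"] == "yes"]
    return sum(r["disease"] for r in smokers) / len(smokers)

def evaluate(rate, min_rate=0.5):
    """Pattern evaluation: keep the pattern only if it passes a threshold."""
    return rate >= min_rate

warehouse = integrate(clean(clinic), lab)
relevant = select(warehouse, ["smoker", "disease"])
rate = mine(relevant)
print(rate, evaluate(rate))
```

Each function corresponds to one box of the process, so a step can be replaced (for example, a real mining algorithm in place of `mine`) without touching the others.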
6 Classification of Data Mining technology
- Association Rule Mining
- Classification and Prediction
- Clustering Analysis
- Trend Analysis
- Pattern Analysis
7 The Application of Data Mining
- Banking
- Telecom
- Economy
- Meteorology
- Agriculture
- Health care
- Military
8 Data Mining applied in health care
- The application of data mining in medical and
health research has proved effective and shows
great potential for development
- Data mining has now become a key method for
obtaining information in clinical medicine,
biomedicine, pharmacy and public health
9 Data Mining applied in Clinical Research
- Finding the relationships among diseases
- Searching for the rules of disease development
and prevalence
- Disease diagnosis and treatment
- Summarizing therapeutic effects
- e.g. using Bayes classification and decision tree
classification in disease diagnosis
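As an illustration of Bayes classification in diagnosis, the following is a minimal naive Bayes sketch in Python with add-one (Laplace) smoothing. The symptoms, diagnoses, and training cases are hypothetical and far too few for real use; the function names are chosen here for illustration only.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the label with the highest log-posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            count = value_counts[(i, label)][value]
            score += math.log((count + 1) / (n + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training cases: (fever, cough) -> diagnosis.
rows = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
        ("no", "yes"), ("no", "no"), ("no", "no")]
labels = ["flu", "flu", "flu", "cold", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
prediction = predict(model, ("yes", "yes"))
print(prediction)
```

The "naive" part is the assumption that symptoms are independent given the diagnosis, which is why the per-feature log probabilities are simply added.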
10 Data Mining applied in Biomedicine
- A powerful tool in DNA analysis!
- Using Sequence Model Analysis and Similarity
Retrieval to find the gene sequence model for
certain kinds of diseases
- Data Cleaning and Data Integration are valuable
in the integration of gene data and the building
of gene databases
- Association Rules Analysis can help to discover
gene crossover and correlation in a genome
11 Data Mining applied in Public Health
- Spatio-temporal Data Mining is used in infectious
disease monitoring to search for the epidemic rules
and distribution characteristics of diseases
- Time Series Analysis and Neural Networks are used
to predict the incidence and infant mortality rate
of infectious diseases
- Association Rules are used to explore the factors
influencing diseases and health-seeking behavior
- Clustering and classification are now widely
used in decision support systems for health insurance
12 e.g. Application of Association Rules in Medical
Data Analysis
- Association Rules aim to find the relationships
among variables in a database, resulting in
different types of rules
Table 1. Original data from a research study on
heart disease
LAD: the percentage of heart disease caused by the
left anterior descending coronary artery
RCA: the percentage of heart disease caused by the
right coronary artery
13 Results
Table 2. Medical Association Rules
- Rule 1 indicates that 40% of the cases are male,
over 70 years old and smokers; for these cases the
probability of RCA50 is 100%
- Rule 2 indicates that 20% of the cases are female,
under 70 years old and smokers; for these cases the
probability of LAD70 is 100%
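Rules of this kind are usually described by two measures: support (the share of all cases that match the whole rule) and confidence (the share of cases matching the "if" part that also match the "then" part). A minimal sketch in Python follows. The records and item names such as `RCA_high` are hypothetical and are not the data of Table 1.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= r for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """Among records matching the antecedent, the fraction that also match the consequent."""
    both = support(records, set(antecedent) | set(consequent))
    ante = support(records, antecedent)
    return both / ante if ante else 0.0

# Hypothetical records for illustration only; not data from the heart-disease study.
records = [
    {"male", "age>=70", "smoker", "RCA_high"},
    {"male", "age>=70", "smoker", "RCA_high"},
    {"female", "age<70", "smoker", "LAD_high"},
    {"male", "age<70", "nonsmoker"},
    {"female", "age>=70", "nonsmoker", "RCA_high"},
]

antecedent = {"male", "age>=70", "smoker"}
consequent = {"RCA_high"}
rule_support = support(records, antecedent | consequent)
rule_confidence = confidence(records, antecedent, consequent)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, two of the five records match the whole rule and every record matching the antecedent also matches the consequent. An association rule miner would search over many candidate rules and keep those above chosen support and confidence thresholds.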
14 The future application of Data Mining in medical
research
- Data Mining is based on a series of new data
processing technologies
- Wavelet Analysis
- Neural Networks
- Genetic Algorithms
- Fuzzy Logic Reasoning
15 Challenges facing Data Mining
- Medical research data are always complicated and
unique in type and structure
- The specialized knowledge of both medical and
data-processing staff must be integrated
- Plenty of data and repeated practice are needed
Targets
- To form a truly useful data mining system for
health research
16 Backpropagation Neural Networks (BPNNs)
- A type of artificial neural network
- A way to model highly complex, nonlinear
solutions to classification problems
- Useful in classifying health-related phenomena
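To make the idea concrete, here is a minimal sketch of a one-hidden-layer network trained by online backpropagation, written in plain Python. The network size, learning rate, number of epochs, and the six training cases are all assumptions chosen for illustration. The toy data are deliberately simple so that the effect of training is easy to see, and they are not taken from any study.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Return (hidden activations, output) for one input vector."""
    xb = list(x) + [1.0]  # constant 1.0 acts as the bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, out

def mse(data, w_hidden, w_out):
    """Mean squared error of the network over the data set."""
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Online backpropagation for a one-hidden-layer sigmoid network."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output unit (squared error through the sigmoid).
            delta_out = (out - y) * out * (1.0 - out)
            # Error signals propagated back to the hidden units.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j, d in enumerate(delta_hidden):
                for i, v in enumerate(list(x) + [1.0]):
                    w_hidden[j][i] -= lr * d * v
    return w_hidden, w_out

# Hypothetical, already-scaled predictors with a binary outcome.
data = [([0.1, 0.9], 0), ([0.2, 0.8], 0), ([0.3, 0.7], 0),
        ([0.7, 0.2], 1), ([0.8, 0.3], 1), ([0.9, 0.1], 1)]

initial_loss = mse(data, *train(data, epochs=0))    # untrained weights, same seed
final_loss = mse(data, *train(data, epochs=2000))
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The backward pass is the defining step: the output error is multiplied by the outgoing weight and the local sigmoid slope to obtain each hidden unit's share of the error before any weight is changed.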
17 e.g. Classification of smoking cessation status
with BPNN
- Classifier performance estimates
- The confidence intervals of Az for both the BPNN
and logistic classifiers are narrow
- Both exceed random chance (Az = 0.5) by at least
25 points
- The finding indicates that the performance of both
classifiers exceeds that of random chance
- Az = area under the receiver operating
characteristic curve
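The area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which is why 0.5 corresponds to random chance. The sketch below computes the empirical (rank-based) area in Python. Note that this is a swapped-in, simpler estimate, not the binormal Az estimate reported on the slides, and the scores and labels are hypothetical.

```python
def empirical_auc(scores, labels):
    """Share of positive/negative pairs in which the positive case scores higher (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = empirical_auc(scores, labels)
print(f"AUC = {auc:.4f}")
```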
18 e.g. Classification of smoking cessation status
with BPNN
Binormal conventional ROC curves for BPNN and
logistic regression classifiers
- The graph illustrates the estimated true
positive fraction (TPF) at multiple values of the
false positive fraction (FPF)
- The areas under the ROC curves differ at α = 0.05
ROC = receiver operating characteristic
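The points of an ROC curve come from sweeping a decision threshold over the classifier's scores and recording the FPF and TPF at each threshold. The sketch below produces the empirical points in Python, assuming hypothetical scores and labels. It does not fit the smooth binormal curve shown on the slide.

```python
def roc_points(scores, labels):
    """Return (FPF, TPF) pairs, one per distinct score threshold, starting at (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A case is called positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

points = roc_points(scores, labels)
for fpf, tpf in points:
    print(f"FPF={fpf:.2f}  TPF={tpf:.2f}")
```

Lowering the threshold can only add cases called positive, so both fractions rise together from (0, 0) to (1, 1).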