What is data mining?
1
What is data mining?
  • Wlodzislaw Duch
  • Dept. of Informatics, Nicholas Copernicus
    University, Torun, Poland
  • http://www.phys.uni.torun.pl/~duch

ISEP Porto, 8-12 July 2002
2
What is it about?
  • Data used to be precious! Now it is overwhelming...
  • In many areas of science, business and commerce people are drowning in data.
  • Ex.: astronomy, a super-telescope built by data mining in existing databases.
  • Database technology allows storing and retrieving large amounts of data of any kind.
  • There is knowledge hidden in data.
  • Data analysis requires intelligence.

3
Ancient history
  • 1960s: first databases, collections of data.
  • 1970s: RDBMS; the relational data model, most popular today; large centralized systems.
  • 1980s: application-oriented data models, specialized for scientific, geographic and engineering data, time series, text; object-oriented models; distributed databases.
  • 1990s: multimedia and Web databases, data warehousing (subject-oriented DB for decision support), and on-line analytical processing (OLAP); deduction and verification of hypothetical patterns.
  • Data mining: first conference in 1989, book in 1996; discover something useful!

4
Data Mining History
  • 1989 IJCAI Workshop on Knowledge Discovery in
    Databases (Piatetsky-Shapiro and W. Frawley 1991)
  • 1991-1994 Workshops on KDD
  • 1996 Advances in Knowledge Discovery and Data
    Mining (Fayyad et al.)
  • 1995-1998 International Conferences on Knowledge
    Discovery in Databases and Data Mining
    (KDD95-98)
  • 1997 Journal of Data Mining and Knowledge
    Discovery
  • 1998 ACM SIGKDD, SIGKDD1999-2001 conferences,
    and SIGKDD Explorations
  • Many conferences on data mining PAKDD, PKDD,
    SIAM-Data Mining, (IEEE) ICDM, etc.

5
References, papers
  • KDD WWW Resources
  • http://www.kdd.org
  • http://www.kdnuggets.com
  • http://www.the-data-mine.com
  • http://www.acm.org/sigkdd/

ResearchIndex: http://citeseer.nj.nec.com/cs
AI/ML aspects: http://www.phys.uni.torun.pl/kmk
NN/Statistics: http://www.phys.uni.torun.pl/kmk
Comparison of results on many datasets: http://www.phys.uni.torun.pl/kmk
6
Data Mining and statistics
  • Statisticians deal with data: what's new in DM?
  • Many DM methods have roots in statistics.
  • Statistics used to deal with small, controlled
    experiments, while DM deals with large, messy
    collections of data.
  • Statistics is based on analytical probabilistic
    models, DM is based on algorithms that find
    patterns in data.
  • Many DM algorithms came from other sources and
    slowly get some statistical justification.
  • Key factor for DM is the computer
    cost/performance.
  • Sometimes DM is more art than science.

7
Types of Data
  • Statistical data: clean, numerical, controlled experiments, vector space model.
  • Relational data: marketing, finances.
  • Textual data: Web, NLP, search.
  • Complex structures: chemistry, economics.
  • Sequence data: bioinformatics.
  • Multimedia data: images, video.
  • Signals: dynamic data, biosignals.
  • AI data: logical problems, games, behavior.

8
What is DM?
  • Discovering interesting patterns, finding useful summaries of large databases.
  • DM is more than database technology and On-Line Analytic Processing (OLAP) tools.
  • DM is more than statistical analysis, although it includes classification, association, clustering, outlier and trend analysis, decision rules, prototype cases, multidimensional visualization, etc. Understanding of data has not been an explicit goal of statistics, which focuses on predictive data models.

9
DM applications
  • Many applications, but spectacular new knowledge is rarely discovered. Some examples:
  • Diapers and beer correlation: place them close together and put potato chips in between.
  • Mining astronomical catalogs (Skycat, Sloan Sky Survey): a new subtype of stars has been discovered!
  • Bioinformatics: more precise characterization of some diseases; many discoveries to be made?
  • Credit card fraud detection (HNC company).
  • Discounts on air/hotel for frequent travelers.

10
Important issues in data mining.
  • Use of statistical and CI methods for KDD.
  • What makes an interesting pattern?
  • Handling uncertainty in the data.
  • Handling noise, outliers and missing or unknown
    data.
  • Finding linguistic variables, discretization of
    continuous data, presentation and evaluation of
    knowledge.
  • Knowledge representation for structural data, heterogeneous information, textual databases: NLP.
  • Performance, scalability, distributed data,
    incremental or on-line processing.
  • Best form of explanation depends on the
    application.

11
DM dangers
  • If there are too many conclusions to draw, some inferences will be true by chance due to too-small data samples (Bonferroni's theorem).
  • Example 1: David Rhine (Duke Univ.) ESP tests. 1 person in 1000 guessed correctly the color (red or black) of 10 cards. Is this evidence for ESP? Retesting of these people gave average results. Rhine's conclusion: telling people that they have ESP interferes with their ability!
  • Example 2: using m letters to form a random sequence of length N, all possible subsequences of length log_m N are found => Bible code!
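
The Rhine example can be checked with a few lines of arithmetic; a minimal sketch (the card and subject counts are taken from the slide, everything else is plain probability):

```python
# One subject guessing the color (red/black) of 10 cards purely by chance:
p_single = 0.5 ** 10                         # = 1/1024

# Among 1000 subjects, roughly one perfect guesser is expected by luck alone.
expected = 1000 * p_single                   # about 0.98 "psychics" per 1000
p_at_least_one = 1 - (1 - p_single) ** 1000  # about 0.62

print(f"expected perfect guessers: {expected:.2f}")
print(f"P(at least one)          : {p_at_least_one:.2f}")
```

So finding one "ESP subject" in 1000 is exactly what chance predicts, which is the point of the Bonferroni warning.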

12
Data Mining process
  • Knowledge discovery in databases (KDD):
  • a search process for understandable and useful patterns in data.

[Diagram: the KDD process; the data-mining stage takes most of the effort.]
13
Stages of DM process
  • Data gathering: data warehousing, Web crawling.
  • Preparation of the data: cleaning, removing outliers and impossible values, removing wrong records, finding missing data.
  • Exploratory data analysis: visualization of different aspects of data.
  • Finding relevant features for the questions that are asked, preparing data structures for predictive methods, converting symbolic values to numerical representation.
  • Pattern extraction: discovery, rules, prototypes.
  • Evaluation of the knowledge gained: finding useful patterns, consultation with experts.

14
Multidimensional Data Cuboids
  • Data warehouses use a multidimensional data model.
  • Projections (views) of data on different dimensions (attributes) form data cuboids.
  • In DB warehousing literature: base cuboid = the original N-dim. data; apex cuboid = the 0-D cuboid, the highest-level summary; data cube = the lattice of cuboids.
  • Ex.: a sales data cube, viewed in multiple dimensions.
  • Dimension tables, ex. item(item_name, brand, type) or time(day, week, month, quarter, year).
  • Fact tables: measures (such as cost) and keys to each of the related dimension tables.

15
Data Cube A Lattice of Cuboids
[Lattice diagram: example cuboids such as (time, item) and (time, item, location).]
16
Forms of useful knowledge
AI/Machine Learning camp: "Neural nets are black boxes. Unacceptable! Symbolic rules forever."
  • But ... knowledge accessible to humans is in:
  • symbols,
  • similarity to prototypes,
  • images, visual representations.
  • What type of explanation is satisfactory?
  • An interesting question for cognitive scientists.
  • Different answers in different fields.

17
Forms of knowledge
  • Humans remember examples of each category and refer to such examples, as similarity-based or nearest-neighbor methods do.
  • Humans create prototypes out of many examples, as Gaussian classifiers, RBF networks and neurofuzzy systems do.
  • Logical rules are the highest form of summarization of knowledge.
  • Types of explanation:
  • exemplar-based: prototypes and similarity;
  • logic-based: symbols and rules;
  • visualization-based: exploratory data analysis, maps, diagrams, relations ...

18
Computational Intelligence
Related fields: soft computing, Computational Intelligence (Data => Knowledge), Artificial Intelligence.
19
CI methods for data mining
  • Provide non-parametric (universal), predictive models of data.
  • Classify new data into pre-defined categories, supporting diagnosis and prognosis.
  • Discover new categories, clusters, patterns.
  • Discover interesting associations, correlations.
  • Allow understanding of the data by creating fuzzy or crisp logical rules, or prototypes.
  • Help to visualize multi-dimensional relationships among data samples.

20
Association rules
  • Classification rules: X => C(X).
  • Association rules: looking for correlations between components of X, i.e. the probability p(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n).
  • Market basket problem: many items selected from an available pool into a basket; what are the correlations?
  • Only frequent items are interesting: itemsets with high support, i.e. appearing together in many baskets. Search for rules above a support threshold.
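
Support counting over baskets can be sketched in a few lines; the baskets and item names below are invented for illustration, and only itemsets of size 1 and 2 are counted:

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (sets of items bought together).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]
min_support = 0.5  # itemset must appear in at least half of the baskets

counts = Counter()
for basket in baskets:
    for size in (1, 2):  # singletons and pairs only, for brevity
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1

n = len(baskets)
frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
print(frequent)  # ('beer',), ('diapers',) and ('beer','diapers') have support 0.75
```

Real systems (Apriori and its successors) prune the search instead of enumerating all itemsets, but the support definition is the same.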

21
Association rules - related
  • Problems related to market basket: correlation between documents, high for plagiarism; phrases shared by documents, high for semantically related documents.
  • Causal relations matter, although they may be difficult to determine: lower the price of diapers and keep the beer price high, or try the reverse; what will happen?
  • More general approaches: Bayesian belief networks, causal networks, graphical models.

22
Clustering
  • Given points in a multidimensional space, divide them into groups that are similar.
  • Ex.: if an epidemic breaks out, look for the locations of cases on the map (cholera in London). Documents in the space of words cluster according to their topics.
  • How to measure similarity?
  • Hierarchical approaches start from single cases and join them, forming clusters; ex.: dendrograms. Centroid approaches assume a few centers and adapt their positions; ex.: k-means, LVQ, SOM.
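
The centroid idea can be sketched as a minimal 1-D k-means (illustrative only, not the LVQ/SOM variants): assign each point to its nearest center, then move each center to the mean of its cluster.

```python
def kmeans_1d(points, centers, iters=10):
    """Plain 1-D k-means; centers is the list of initial center positions."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], centers=[0.0, 5.0])
print(centers)  # converges to one center at 1.0 and one at 9.5
```

The same assign/update loop works in any dimension once `abs(p - c)` is replaced by a vector distance.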

23
Neural networks
  • Inspired by neurobiology: simple elements cooperate, changing internal parameters.
  • A large field: dozens of different models, over 500 papers on NN in medicine each year.
  • Supervised networks: heteroassociative mapping X => Y, symptoms => diseases; universal approximators.
  • Unsupervised networks: clusterization, competitive learning, autoassociation.
  • Reinforcement learning: modeling behavior, playing games, sequential data.

24
Supervised learning
  • Compare the desired with the achieved outputs: you can't always get what you want.
  • Examples: MLP/RBF NN, kNN, SVM, LDA, DT.
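
One of the listed methods, kNN, fits in a few lines; a toy sketch with invented data points:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# hypothetical 2-D training set with two classes
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.1), "B"), ((0.9, 1.0), "B")]
print(knn_predict(train, (0.0, 0.1)))  # "A": all three nearest neighbors are A
```

Supervised learning here means the training pairs already carry the desired output (the label), and prediction compares a new input against them.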

25
Unsupervised learning
  • Find interesting structures in data.
  • SOM, many variants.

26
Reinforcement learning
  • Reward comes after the sequence of actions.
  • Games, survival behavior, planning sequences of
    actions.

27
Unsupervised NN example
Clustering and visualization of the quality of
life index (UN data) by SOM map.
Poor classification, inaccurate visualization.
28
Real and artificial neurons
[Figure: a real neuron (dendrites, signals, synapses, axon) beside an artificial network; nodes = artificial neurons, synapses = weights.]
29
Neural network for MI diagnosis
Myocardial Infarction
[Network diagram: inputs (sex, age, smoking, elevation, pain, ECG ST, duration) pass through input and output weights to produce p(MI|X) = 0.7.]
30
MI network function
  • Training: setting the values of weights and thresholds; efficient algorithms exist.

Effect: a non-linear regression function.
Such networks are universal approximators: they may learn any mapping X => Y.
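
The basic computation of one such unit can be sketched as a logistic neuron; the weights, bias and inputs below are invented, only the form of the computation matches the network above:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs squashed to (0, 1); read the output as a probability."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-activation))

# hypothetical patient encoding and weights
p = neuron(inputs=[1.0, 0.6, 1.0], weights=[0.8, 1.1, -0.4], bias=-0.2)
print(round(p, 2))  # about 0.70, like the p(MI|X) shown on the slide
```

Training adjusts `weights` and `bias` to reduce the gap between such outputs and the desired ones; stacking layers of these units gives the universal approximation property.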
31
Knowledge from networks
  • Simplify networks: force most weights to 0, quantize remaining parameters, be constructive!
  • Regularization: a mathematical technique improving the predictive abilities of the network.
  • Result: MLP2LN neural networks that are equivalent to logical rules.

32
MLP2LN
  • Converts MLP neural networks into a network performing logical operations (LN).

Input layer.
Output: one node per class.
Aggregation: better features.
Rule units: threshold logic.
Linguistic units: windows, filters.
33
Learning dynamics
Decision regions shown every 200 training epochs in x3, x4 coordinates; borders are optimally placed, with wide margins.
34
Neurofuzzy systems
Fuzzy logic: m(x) ∈ {0,1} (no/yes) is replaced by a degree m(x) ∈ [0,1]. Triangular, trapezoidal, Gaussian ... membership functions (MF).
Membership functions in many dimensions:
  • Feature Space Mapping (FSM), a neurofuzzy system.
  • Neural adaptation, estimation of the probability density function (PDF) using a single-hidden-layer network (RBF-like) with nodes realizing separable functions.

35
GhostMiner Philosophy
  • GhostMiner, data mining tools from our lab: http://www.fqspl.com.pl/ghostminer/
  • Separate the process of model building and knowledge discovery from model use => GhostMiner Developer + GhostMiner Analyzer.
  • There is no free lunch: provide different types of tools for knowledge discovery. Decision tree, neural, neurofuzzy, similarity-based, committees.
  • Provide tools for visualization of data.
  • Support the process of knowledge discovery / model building and evaluation, organizing it into projects.

36
Heterogeneous systems
Homogeneous systems: one type of building block, same type of decision borders. Ex.: neural networks, SVMs, decision trees, kNNs. Committees combine many models together, but lead to complex models that are difficult to understand.
  • Discovering the simplest class structures, with the right inductive bias, requires heterogeneous adaptive systems (HAS).
  • Ockham's razor: simpler systems are better.
  • HAS examples:
  • NN with many types of neuron transfer functions.
  • k-NN with different distance functions.
  • DT with different types of test criteria.

37
Wine data example
Chemical analysis of wine from grapes grown in the same region in Italy but derived from three different cultivars. Task: recognize the source of a wine sample. 13 quantities measured, continuous features:
  • alcohol content
  • ash content
  • magnesium content
  • flavanoids content
  • proanthocyanins phenols content
  • OD280/D315 of diluted wines
  • malic acid content
  • alkalinity of ash
  • total phenols content
  • nonanthocyanins phenols content
  • color intensity
  • hue
  • proline.

38
Exploration and visualization
  • General info about the data

39
Exploration data
  • Inspect the data

40
Exploration data statistics
  • Distribution of feature values

Proline has very large values, the data should be
standardized before further processing.
41
Exploration data standardized
  • Standardized data: unit standard deviation; about 2/3 of all data should fall within [mean-std, mean+std].

Other options: normalize to fit in [-1,1], or normalize rejecting some extreme values.
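
Z-score standardization as described above is a one-liner per value; the sample numbers are invented proline-like magnitudes:

```python
import statistics

def standardize(values):
    """Subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; stdev() is another common choice
    return [(v - mean) / std for v in values]

z = standardize([1065.0, 1050.0, 735.0, 520.0])
# the result has mean 0 and unit standard deviation, so features with very
# different raw scales (proline vs. hue) become comparable
```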
42
Exploration 1D histograms
  • Distribution of feature values in classes

Some features are more useful than others.
43
Exploration 1D/3D histograms
  • Distribution of feature values in classes, 3D

44
Exploration 2D projections
  • Projections (cuboids) onto selected 2D feature pairs.
45
Visualize data
Relations in more than 3D are hard to imagine. SOM mappings are popular for visualization, but rather inaccurate, with no measure of distortions.
Measure of topographical distortions: map all X_i points from R^n to x_i points in R^m, m < n, and ask: how well are the distances R_ij = D(X_i, X_j) reproduced by the distances r_ij = d(x_i, x_j)?
Use m = 2 for visualization, higher m for dimensionality reduction.
46
Visualize data MDS
Multidimensional scaling: invented in psychometry by Torgerson (1952), re-invented by Sammon (1969) and myself (1994). Minimize the measure of topographical distortions by moving the x coordinates.
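
The distortion measure being minimized can be sketched as Sammon's stress (standard formulation; the pairwise distances are assumed flattened into two parallel lists):

```python
def sammon_stress(D, d):
    """D: original pairwise distances D_ij, d: mapped distances d_ij,
    paired in the same order. Zero means a perfect reproduction."""
    scale = sum(D)
    return sum((Dij - dij) ** 2 / Dij for Dij, dij in zip(D, d)) / scale

print(sammon_stress([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0: distances preserved
print(sammon_stress([1.0, 2.0], [1.5, 2.0]))            # > 0: one distance distorted
```

MDS then moves the low-dimensional coordinates x_i (by gradient or other optimization) to drive this stress down.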
47
Visualize data Wine
3 clusters are clearly distinguished, 2D is fine.
The green outlier can be identified easily.
48
Decision trees
Simplest things first: use a decision tree to find logical rules.
Test a single attribute, find a good point to split the data, separating vectors from different classes. DT advantages: fast, simple, easy to understand, easy to program; many good algorithms.
4 attributes used; 10 errors, 168 correct, 94.4% correct.
49
Decision borders
Univariate trees: test the value of a single attribute, x < a.
Multivariate trees: test combinations of attributes (hyperplanes).
Result: the feature space is divided into cuboids.
Wine data: univariate decision tree borders for proline and flavanoids.
50
Logical rules
Crisp logic rules: for continuous x, use linguistic variables (predicate functions).
s_k(x) = True iff x ∈ [X_k, X'_k], for example:
small(x) = True iff x < 1
medium(x) = True iff x ∈ [1,2]
large(x) = True iff x > 2
Linguistic variables are used in crisp (propositional, Boolean) logic rules:
IF small-height(X) AND has-hat(X) AND has-beard(X) THEN (X is a Brownie) ELSE IF ... ELSE ...
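
The linguistic variables above translate directly into predicates, and the Brownie rule into a Boolean expression (the Brownie attributes are, of course, illustrative):

```python
# Crisp linguistic variables: each returns only True or False.
def small(x):  return x < 1
def medium(x): return 1 <= x <= 2
def large(x):  return x > 2

# A crisp rule built from such predicates.
def is_brownie(height, has_hat, has_beard):
    return small(height) and has_hat and has_beard

print(is_brownie(0.5, True, True))   # True
print(is_brownie(2.5, True, True))   # False: not small
```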
51
Crisp logic decisions
Crisp logic is based on rectangular membership functions:
True/False values jump from 0 to 1. Step functions are used for partitioning the feature space.
Very simple hyper-rectangular decision borders.
A severe limitation on the expressive power of crisp logical rules!
52
Logical rules - advantages
Logical rules, if simple enough, are preferable.
  • Rules may expose limitations of black-box solutions.
  • Only relevant features are used in rules.
  • Rules may sometimes be more accurate than NN and other CI methods.
  • Overfitting is easy to control; rules usually have a small number of parameters.
  • Rules forever!? A logical rule about logical rules is:

53
Logical rules - limitations
  • Logical rules are preferred, but ...
  • Only one class is predicted: p(Ci|X,M) = 0 or 1; a black-and-white picture may be inappropriate in many applications.
  • A discontinuous cost function allows only non-gradient optimization.
  • Sets of rules are unstable: a small change in the dataset leads to a large change in the structure of complex sets of rules.
  • Reliable crisp rules may reject some cases as unclassified.
  • Interpretation of crisp rules may be misleading.
  • Fuzzy rules are not so comprehensible.

54
Rules - choices
Simplicity vs. accuracy. Confidence vs. rejection rate.
p++ is a hit; p+- a false alarm; p-+ a miss.
Accuracy (overall): A(M) = p++ + p--
Error rate: L(M) = p+- + p-+
Rejection rate: R(M) = p+r + p-r = 1 - L(M) - A(M)
Sensitivity: S+(M) = p++ / p+
Specificity: S-(M) = p-- / p-
55
Rules error functions
  • The overall accuracy is equal to a combination of sensitivity and specificity, weighted by the a priori probabilities:

A(M) = p+ S+(M) + p- S-(M)
Optimization of rules for the C+ class: a large γ means no errors but a high rejection rate.
E(M;γ) = γ L(M) - A(M) = γ (p+- + p-+) - (p++ + p--)
min_M E(M;γ) <=> min_M [(1+γ) L(M) + R(M)]
Optimization with different costs of errors:
min_M E(M;α) = min_M [p+- + α p-+]
             = min_M [p+ (1 - S+(M)) - p+r(M) + α (p- (1 - S-(M)) - p-r(M))]
ROC (Receiver Operating Characteristic) curves: p++ (p+-), i.e. hit rate vs. false-alarm rate.
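
These quantities are easy to compute from raw confusion counts; a sketch that ignores rejected cases for simplicity (the counts are invented):

```python
def rates(tp, fn, fp, tn):
    """tp: hits, fn: misses, fp: false alarms, tn: correct rejections of C+."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total        # A(M)
    error = (fn + fp) / total           # L(M)
    sensitivity = tp / (tp + fn)        # S+(M)
    specificity = tn / (tn + fp)        # S-(M)
    return accuracy, error, sensitivity, specificity

a, err, s_plus, s_minus = rates(tp=40, fn=10, fp=5, tn=45)
print(a, err, s_plus, s_minus)  # 0.85 0.15 0.8 0.9
```

Note that the weighted identity above holds here: with p+ = p- = 0.5, A(M) = 0.5 * 0.8 + 0.5 * 0.9 = 0.85.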
56
Wine example SSV rules
  • Decision trees provide rules of different complexity.

Simplest tree: 5 nodes, corresponding to 3 rules; 25 errors, mostly Class 2/3 wines mixed.
57
Wine SSV 5 rules
  • Lower pruning leads to a more complex tree.

7 nodes, corresponding to 5 rules; 10 errors, mostly Class 2/3 wines mixed.
58
Wine SSV optimal rules
What is the optimal complexity of rules? Use crossvalidation to estimate generalization.
Various solutions may be found, depending on the search: 5 rules with 12 premises, making 6 errors; 6 rules with 16 premises and 3 errors; 8 rules, 25 premises, and 1 error.
if OD280/D315 > 2.505 ∧ proline > 726.5 ∧ color > 3.435 then class 1
if OD280/D315 > 2.505 ∧ proline > 726.5 ∧ color < 3.435 then class 2
if OD280/D315 < 2.505 ∧ hue > 0.875 ∧ malic-acid < 2.82 then class 2
if OD280/D315 > 2.505 ∧ proline < 726.5 then class 2
if OD280/D315 < 2.505 ∧ hue < 0.875 then class 3
if OD280/D315 < 2.505 ∧ hue > 0.875 ∧ malic-acid > 2.82 then class 3
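
The six rules above transcribe directly into a function; a sketch that keeps the slide's strict inequalities, so a value landing exactly on a threshold is treated as rejected/unclassified (an assumption, the slide does not say):

```python
def wine_class(od, proline, color, hue, malic_acid):
    """Apply the six SSV rules; od stands for the OD280/D315 feature."""
    if od > 2.505 and proline > 726.5 and color > 3.435: return 1
    if od > 2.505 and proline > 726.5 and color < 3.435: return 2
    if od < 2.505 and hue > 0.875 and malic_acid < 2.82: return 2
    if od > 2.505 and proline < 726.5:                   return 2
    if od < 2.505 and hue < 0.875:                       return 3
    if od < 2.505 and hue > 0.875 and malic_acid > 2.82: return 3
    return None  # exactly on a threshold: unclassified

print(wine_class(od=3.0, proline=800, color=4.0, hue=1.0, malic_acid=2.0))  # 1
```

Writing the rules this way also makes the reject option of crisp rule sets (slide 53) concrete: some inputs simply match no rule.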
59
Wine FSM rules
SSV: hierarchical rules. FSM: density estimation with feature selection.
Complexity of rules depends on the desired accuracy. Use rectangular functions for crisp rules. Optimal accuracy may be evaluated using crossvalidation.
FSM discovers simpler rules, for example:
if proline > 929.5 then class 1 (48 cases, 45 correct, 2 recovered by other rules)
if color < 3.79285 then class 2 (63 cases, 60 correct)
60
Examples of interesting knowledge discovered!
  • The most famous example of knowledge discovered by data mining:
  • the correlation between beer, milk and diapers.

Other examples: 2 subtypes of galactic spectra forced astrophysicists to reconsider stellar evolutionary processes. Several examples of knowledge found by us in medical and other datasets follow.
61
Mushrooms
  • The Mushroom Guide: no simple rule for mushrooms; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy.

8124 cases, 51.8% edible, the rest non-edible. 22 symbolic attributes, up to 12 values each, equivalent to 118 logical features, or 2^118 ≈ 3·10^35 possible input vectors.
Odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy.
Spore print color: black, brown, buff, chocolate, green, orange, purple, white, yellow.
Safe rule for edible mushrooms:
odor = (almond ∨ anise ∨ none) ∧ spore-print-color ≠ green
48 errors, 99.41% correct.
This is why animals have such a good sense of smell! What does it tell us about odor receptors?
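
The safe-edibility rule is a two-condition test; sketched in code with attribute values as plain strings (the encoding is assumed, not the actual UCI file format):

```python
def edible_safe(odor, spore_print_color):
    """The slide's safe rule: edible iff odor is mild AND spore print is not green."""
    return odor in {"almond", "anise", "none"} and spore_print_color != "green"

print(edible_safe("anise", "brown"))  # True
print(edible_safe("foul", "brown"))   # False
print(edible_safe("none", "green"))   # False
```

Two attributes out of 22 already give 99.41% accuracy, which is the point about odor dominating the problem.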
62
Mushrooms rules
  • To eat or not to eat, that is the question! Not any more ...

A mushroom is poisonous if:
R1) odor ≠ (almond ∨ anise ∨ none); 120 errors, 98.52%
R2) spore-print-color = green; 48 errors, 99.41%
R3) odor = none ∧ stalk-surface-below-ring = scaly ∧ stalk-color-above-ring ≠ brown; 8 errors, 99.90%
R4) habitat = leaves ∧ cap-color = white; no errors!
R1 + R2 are quite stable, found even with 10% of the data; R3 and R4 may be replaced by other rules, ex.:
R'3) gill-size = narrow ∧ stalk-surface-above-ring = (silky ∨ scaly)
R'4) gill-size = narrow ∧ population = clustered
Only 5 of 22 attributes used! Simplest possible rules? 100% in CV tests: the structure of this data is completely clear.
63
Recurrence of breast cancer
  • Data from Institute of Oncology, University
    Medical Center, Ljubljana, Yugoslavia.

286 cases: 201 no-recurrence (70.3%), 85 recurrence (29.7%).
Example record: no-recurrence-events, 40-49, premeno, 25-29, 0-2, ?, 2, left, right_low, yes.
9 nominal features: age (9 bins), menopause, tumor-size (12 bins), nodes involved (13 bins), node-caps, degree-malignant (1,2,3), breast, breast quadrant, radiation.
64
Rules for breast cancer
  • Data from Institute of Oncology, University
    Medical Center, Ljubljana, Yugoslavia.

Many systems used; 65-78% accuracy reported.
Single rule:
IF nodes-involved ∉ [0,2] ∧ degree-malignant = 3 THEN recurrence, ELSE no-recurrence
76.2% accuracy; only trivial knowledge in the data: highly malignant breast cancer involving many nodes is likely to strike back.
65
Recurrence - comparison.
Method                    10xCV accuracy
MLP2LN, 1 rule            76.2
SSV DT, stable rules      75.7 ± 1.0
k-NN, k=10, Canberra      74.1 ± 1.2
MLP+backprop              73.5 ± 9.4 (Zarndt)
CART DT                   71.4 ± 5.0 (Zarndt)
FSM, Gaussian nodes       71.7 ± 6.8
Naive Bayes               69.3 ± 10.0 (Zarndt)
Other decision trees      < 70.0
66
Breast cancer diagnosis.
  • Data from University of Wisconsin Hospital,
    Madison, collected by dr. W.H. Wolberg.

699 cases, 9 features quantized from 1 to 10: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses.
Task: distinguish benign from malignant cases.
67
Breast cancer rules.
  • Data from University of Wisconsin Hospital,
    Madison, collected by dr. W.H. Wolberg.

Simplest rule from MLP2LN, large regularization:
IF uniformity of cell size < 3 THEN benign, ELSE malignant
Sensitivity = 0.97, Specificity = 0.85.
More complex NN solutions, from a 10CV estimate: Sensitivity = 0.98, Specificity = 0.94.
68
Breast cancer comparison.
Method                       10xCV accuracy
k-NN, k=3, Manhattan         97.0 ± 2.1 (GM)
FSM, neurofuzzy              96.9 ± 1.4 (GM)
Fisher LDA                   96.8
MLP+backprop                 96.7 (Ster, Dobnikar)
LVQ                          96.6 (Ster, Dobnikar)
IncNet (neural)              96.4 ± 2.1 (GM)
Naive Bayes                  96.4
SSV DT, 3 crisp rules        96.0 ± 2.9 (GM)
LDA (linear discriminant)    96.0
Various decision trees       93.5-95.6
69
Melanoma skin cancer
  • Collected in the Outpatient Center of Dermatology in Rzeszów, Poland.
  • Four types of melanoma: benign, blue, suspicious, or malignant.
  • 250 cases, with almost equal class distribution.
  • Each record in the database has 13 attributes: asymmetry, border, color (6), diversity (5).
  • TDS (Total Dermatoscopy Score): a single aggregated index.
  • Goal: a hardware scanner for preliminary diagnosis.

70
Melanoma rules
R1: IF TDS < 4.85 AND C-BLUE IS absent THEN MELANOMA IS Benign-nevus
R2: IF TDS < 4.85 AND C-BLUE IS present THEN MELANOMA IS Blue-nevus
R3: IF TDS > 5.45 THEN MELANOMA IS Malignant
R4: IF TDS > 4.85 AND TDS < 5.45 THEN MELANOMA IS Suspicious
5 errors (98.0%) on the training set; 0 errors (100%) on the test set.
Feature aggregation is important! Without TDS, 15 rules are needed.
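
The four rules above collapse into one small function; a sketch in which the handling of values exactly on the 4.85 and 5.45 boundaries is an assumption, since the slide leaves it ambiguous:

```python
def melanoma_type(tds, c_blue_present):
    """Classify a lesion from TDS and the C-BLUE attribute per rules R1-R4."""
    if tds < 4.85:                                     # R1, R2
        return "Blue-nevus" if c_blue_present else "Benign-nevus"
    if tds > 5.45:                                     # R3
        return "Malignant"
    return "Suspicious"                                # R4: TDS between 4.85 and 5.45

print(melanoma_type(4.0, False))  # Benign-nevus
print(melanoma_type(4.0, True))   # Blue-nevus
print(melanoma_type(6.0, False))  # Malignant
print(melanoma_type(5.0, False))  # Suspicious
```

The single aggregated TDS feature does almost all the work here, which is the slide's point about feature aggregation.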
71
Melanoma results
Method                        Rules  Training     Test
MLP2LN, crisp rules           4      98.0 (all)   100
SSV Tree, crisp rules         4      97.5 ± 0.3   100
FSM, rectangular f.           7      95.5 ± 1.0   100
kNN + prototype selection     13     97.5 ± 0.0   100
FSM, Gaussian f.              15     93.7 ± 1.0   95 ± 3.6
kNN, k=1, Manh, 2 features    --     97.4 ± 0.3   100
LERS, rough rules             21     --           96.2
72
Summary
  • Data mining is a large field; only a few issues have been mentioned here.
  • DM involves many steps; here only those related to pattern recognition were stressed, but in practice scalability and efficiency issues may be most important.

Neural networks are still used mostly for building predictive data models, but they may also provide simplified descriptions in the form of rules. Rules are not the only form of data understanding. Rules may be a beginning for a practical application. Some interesting knowledge has been discovered.
73
Challenges
  • Fully automatic, universal data analysis systems: press the button and wait for the truth!
  • Discovery of theories rather than data models.
  • Integration with image/signal analysis.
  • Integration with reasoning in complex domains.
  • Combining expert systems with neural networks.

We are slowly getting there. More and more computational intelligence tools (including our own) are available.
74
Disclaimer
  • A few slides/figures were taken from various presentations found on the Internet; unfortunately I cannot identify the original authors at the moment, since these slides went through different iterations.
  • I have to apologize for that.