GhostMiner Wine example - presentation transcript
1
GhostMiner Wine example
  • Wlodzislaw Duch
  • Dept. of Informatics, Nicholas Copernicus
    University, Torun, Poland
  • http://www.phys.uni.torun.pl/duch

ISEP Porto, 8-12 July 2002
2
GhostMiner Philosophy
  • GhostMiner, data mining tools from our lab:
    http://www.fqspl.com.pl/ghostminer/
  • Separate the process of model building and
    knowledge discovery from model use =>
    GhostMiner Developer and GhostMiner Analyzer.
  • There is no free lunch: provide different types
    of tools for knowledge discovery: decision trees,
    neural, neurofuzzy, similarity-based, committees.
  • Provide tools for visualization of data.
  • Support the process of knowledge discovery/model
    building and evaluation, organizing it into
    projects.

3
GM summary
  • GhostMiner combines 4 basic tools for
    predictive data mining and understanding of data,
    avoiding too many choices of parameters (like
    network structure specs):
  • IncNet: ontogenic neural network using Kalman
    filter learning, separating each class from all
    other classes.
  • Feature Space Mapping: neurofuzzy system producing
    logical rules of crisp and fuzzy types.
  • Separability Split Value (SSV) decision tree.
  • Weighted nearest-neighbor method.
  • K-classifiers and committees of models.
  • MDS visualization.

4
Wine data example
Chemical analysis of wine from grapes grown in
the same region in Italy, but derived from three
different cultivars. Task: recognize the source
of a wine sample. 13 quantities measured, all
continuous features:
  • alcohol content
  • ash content
  • magnesium content
  • flavanoids content
  • proanthocyanins phenols content
  • OD280/D315 of diluted wines
  • malic acid content
  • alkalinity of ash
  • total phenols content
  • nonanthocyanins phenols content
  • color intensity
  • hue
  • proline.

5
Exploration and visualization
  • Load data (using load icon) and look at general
    info about the data.

6
Exploration: data
  • Inspect the data itself in the raw form.

7
Exploration: data statistics
  • Look at distribution of feature values

Note that Proline has very large values,
therefore the data should be standardized before
further processing.
8
Exploration: data standardized
  • Standardized data: unit standard deviation; about
    2/3 of all data should fall within
    [mean - std, mean + std].

Other options: normalize to fit in [-1, 1], or
normalize rejecting some extreme values.
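For readers who want to reproduce this step outside the GUI, a minimal sketch in Python/numpy (assuming scikit-learn is available to load the same UCI Wine data; this is not GhostMiner code):

```python
import numpy as np
from sklearn.datasets import load_wine  # same UCI Wine data as in the slides

X = load_wine().data  # 178 samples x 13 continuous features

# Standardize: zero mean and unit standard deviation per feature,
# so large-valued features such as proline no longer dominate.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Alternative option from the slide: rescale each feature to [-1, 1].
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_m11 = 2.0 * (X - X_min) / (X_max - X_min) - 1.0
```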
9
Exploration: 1D histograms
  • Distribution of feature values in classes

Some features are more useful than others.
10
Exploration: 1D/3D histograms
  • Distribution of feature values in classes, 3D

11
Exploration: 2D projections
  • Projections (cuboids) on selected 2D

12
Visualize data
Relations in more than 3 dimensions are hard to
imagine. SOM mappings are popular for
visualization, but rather inaccurate, with no
measure of distortions. Measure of topographical
distortions: map all Xi points from Rn to xi
points in Rm, m < n, and ask: how well are the
Rij = D(Xi, Xj) distances reproduced by the
distances rij = d(xi, xj)? Use m = 2 for
visualization, higher m for dimensionality reduction.
13
Visualize data: MDS
Multidimensional scaling: invented in psychometry
by Torgerson (1952), re-invented by Sammon (1969)
and myself (1994). Minimize a measure of
topographical distortions by moving the x
coordinates.
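A rough sketch of this distortion-minimization idea in Python/numpy (a simple squared-stress variant optimized by gradient descent; function names and step sizes are illustrative, not GhostMiner's actual MDS routine):

```python
import numpy as np

def stress(X, x):
    """Sum over pairs of (Rij - rij)^2: how badly the mapped distances
    rij = d(xi, xj) reproduce the original distances Rij = D(Xi, Xj)."""
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.sum((R[iu] - r[iu]) ** 2)

def mds(X, m=2, steps=2000, lr=1e-3, seed=0):
    """Move the m-dimensional x coordinates to reduce the distortion measure."""
    rng = np.random.default_rng(seed)
    n = len(X)
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    x = rng.normal(size=(n, m))
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]
        r = np.linalg.norm(diff, axis=-1) + np.eye(n)  # avoid division by zero on the diagonal
        grad = 2.0 * np.sum(((r - R) / r)[:, :, None] * diff, axis=1)
        x -= lr * grad
    return x
```

With m = 2 the mapped points can be scatter-plotted to inspect the cluster structure.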
14
Visualize data: Wine
3 clusters are clearly distinguished, 2D is fine.
The green outlier can be identified easily.
15
Decision trees
Simplest things first: use a decision tree to find
logical rules.
Test a single attribute, find a good point to split
the data, separating vectors from different
classes. DT advantages: fast, simple, easy to
understand, easy to program, many good
algorithms.
4 attributes used, 10 errors, 168 correct,
94.4% correct.
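For comparison, a similar univariate tree can be grown outside GhostMiner with scikit-learn (a stand-in for the SSV tree of the following slides, so attribute choices and error counts may differ):

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_wine()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# Each node tests a single attribute against a threshold (x < a),
# which is exactly the kind of univariate test discussed here.
print(export_text(tree, feature_names=list(data.feature_names)))
print("training accuracy:", tree.score(data.data, data.target))
```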
16
Decision borders
Univariate trees: test the value of a single
attribute, x < a.
Multivariate trees: test on combinations of
attributes, hyperplanes.
Result: the feature space is divided into cuboids.
Wine data: univariate decision tree borders for
proline and flavanoids.
17
Separability Split Value (SSV)
  • SSV criterion:
  • select the attribute and split value that maximize
    the number of correctly separated pairs from
    different classes (see the sketch after this list);
  • if several equivalent split values exist, select
    one that minimizes the number of pairs split from
    the same class.
  • Works on raw data, including symbolic values.
  • Search for splits using best-first or beam-search
    methods.
  • Tests are A(x) < T or x ∈ si.
  • Create a tree that classifies all data correctly.
  • Use crossvalidation to determine how many nodes to
    prune or what the pruning level should be.
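A rough Python sketch of the SSV split criterion for one continuous attribute, following the two bullet points above (my own reading of the criterion, not code from GhostMiner):

```python
import numpy as np
from collections import Counter

def ssv_counts(values, labels, threshold):
    """For the test value < threshold, count pairs from different classes that
    end up on different sides (to maximize) and pairs from the same class that
    are split apart (used only to break ties, to minimize)."""
    left = Counter(l for v, l in zip(values, labels) if v < threshold)
    right = Counter(l for v, l in zip(values, labels) if v >= threshold)
    classes = set(labels)
    separated = sum(left[a] * right[b] for a in classes for b in classes if a != b)
    same_split = sum(left[c] * right[c] for c in classes)
    return separated, same_split

def best_split(values, labels):
    """Scan candidate thresholds halfway between consecutive distinct values."""
    vs = np.unique(values)
    candidates = (vs[:-1] + vs[1:]) / 2.0
    return max(candidates,
               key=lambda t: (ssv_counts(values, labels, t)[0],
                              -ssv_counts(values, labels, t)[1]))
```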

18
Wine: SSV, 5 rules
  • Lower pruning leads to a more complex tree.

7 nodes, corresponding to 5 rules; 10 errors,
mostly Class 2/3 wines mixed; check the confusion
matrix in the results.
19
Wine: SSV optimal rules
What is the optimal complexity of rules? Use
crossvalidation to estimate generalization.
Various solutions may be found, depending on the
search: 5 rules with 12 premises, making 6
errors; 6 rules with 16 premises and 3 errors;
8 rules, 25 premises, and 1 error.
if OD280/D315 > 2.505 ∧ proline > 726.5 ∧ color > 3.435 then class 1
if OD280/D315 > 2.505 ∧ proline > 726.5 ∧ color < 3.435 then class 2
if OD280/D315 < 2.505 ∧ hue > 0.875 ∧ malic-acid < 2.82 then class 2
if OD280/D315 > 2.505 ∧ proline < 726.5 then class 2
if OD280/D315 < 2.505 ∧ hue < 0.875 then class 3
if OD280/D315 < 2.505 ∧ hue > 0.875 ∧ malic-acid > 2.82 then class 3
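The 6-rule solution above, written out as a plain Python function to make the rule structure explicit (thresholds copied verbatim from the slide; samples not covered by any rule return None):

```python
def classify_wine(od280_d315, proline, color, hue, malic_acid):
    """Apply the 6 SSV rules from the slide; return the predicted class or None."""
    if od280_d315 > 2.505 and proline > 726.5 and color > 3.435:
        return 1
    if od280_d315 > 2.505 and proline > 726.5 and color < 3.435:
        return 2
    if od280_d315 < 2.505 and hue > 0.875 and malic_acid < 2.82:
        return 2
    if od280_d315 > 2.505 and proline < 726.5:
        return 2
    if od280_d315 < 2.505 and hue < 0.875:
        return 3
    if od280_d315 < 2.505 and hue > 0.875 and malic_acid > 2.82:
        return 3
    return None  # boundary cases not covered by the rules
```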
20
Neurofuzzy systems
MLP: discrimination, finds separating surfaces as
combinations of sigmoidal functions. Fuzzy
approach: define membership functions (MF),
replacing m(x) ∈ {0,1} (no/yes) by a degree
m(x) ∈ [0,1]. Typically triangular, trapezoidal,
Gaussian ... MFs are used.
MFs in many dimensions are constructed using
products; thresholding m(X) = const ∈ [0,1]
determines the region covered.
Advantage: easy to add a priori knowledge (proper
bias); may work well for very small datasets!
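A small illustration of building a multidimensional membership function as a product of 1D Gaussian MFs (parameter values here are arbitrary, purely for illustration):

```python
import numpy as np

def gaussian_mf(x, center, width):
    """1D Gaussian membership function: degree of membership in (0, 1]."""
    return np.exp(-((x - center) / width) ** 2)

def joint_membership(x, centers, widths):
    """Separable multidimensional MF as a product of 1D memberships;
    thresholding mu(X) >= const selects the fuzzy region covered by a rule."""
    return np.prod([gaussian_mf(xi, c, w) for xi, c, w in zip(x, centers, widths)])

# Example: membership degree of a 2D sample in one fuzzy region.
mu = joint_membership([13.0, 2.5], centers=[13.5, 2.0], widths=[1.0, 0.8])
print(mu)
```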
21
Feature Space Mapping
Feature Space Mapping (FSM): a neurofuzzy
system. Find the best network architecture (number of
nodes and feature selection) using an ontogenic
network (growing and shrinking) with one hidden
layer. Use separable rectangular, triangular, or
Gaussian MFs.
Initialize using clusterization techniques.
Allow for rotation of Gaussian functions.
  • Describe the joint prob. density p(X,C).
  • Neural adaptation using RBF-like algorithms.
  • Good for logical rules and NN predictive models.

22
Wine: FSM rules
SSV: hierarchical rules; FSM: density estimation
with feature selection.
Complexity of rules depends on the desired
accuracy. Use rectangular functions for crisp
rules. Optimal accuracy may be evaluated using
crossvalidation.
FSM discovers simpler rules, for example:
if proline > 929.5 then class 1 (48 cases, 45 correct, 2 recovered by other rules)
if color < 3.79285 then class 2 (63 cases, 60 correct)
23
IncNet
Incremental Neural Network (IncNet): ontogenic
NN with a single hidden layer, adding, removing and
merging neurons.
Transfer functions: Gaussians or combinations of
sigmoids (bi-central functions). Training: use a
Kalman filter approach to estimate network
parameters.
Fast Kalman filter training is usually
sufficient. Always creates one network per
class, separating it from other samples. Creates
predictive models equivalent to fuzzy rules.
24
k-nearest neighbors
Use various similarity functions to evaluate how
similar a new case is to all reference (training)
cases; use p(Ci|X) = k(Ci)/k.
Similarity functions include Minkowski and
similar functions. Optimize k, the number of
neighbors included. Optimize the scaling factors
of features, Wi|Xi - Yi|; this goes beyond feature
selection. Use search-based techniques to find
good scaling parameters for features. Notice
that for k = 1 always 100% on the training set is
obtained! To evaluate accuracy on the training set
use the leave-one-out procedure.
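A minimal sketch of the weighted nearest-neighbor idea in Python (Minkowski distance with per-feature scaling factors Wi, plus the leave-one-out evaluation mentioned above; all names are illustrative):

```python
import numpy as np
from collections import Counter

def weighted_minkowski(x, y, w, p=2):
    """Minkowski distance with per-feature scaling factors: sum of (Wi |Xi - Yi|)^p."""
    return np.sum((w * np.abs(x - y)) ** p) ** (1.0 / p)

def knn_predict(x, X_train, y_train, w, k=3):
    """Majority class among the k most similar reference cases."""
    d = np.array([weighted_minkowski(x, xt, w) for xt in X_train])
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

def loo_accuracy(X, y, w, k=3):
    """Leave-one-out: predict each training case from all the others."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += knn_predict(X[i], X[mask], y[mask], w, k) == y[i]
    return hits / len(X)
```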
25
Committees and K-classifiers
K-classifiers: in K-class problems create K
classifiers, one for each class.
Committees combine results from different
classification models: create different models
using the same method (for example a decision tree)
on different data samples (bootstrapping); combine
several different models, including other
committees, into one model; use majority voting
to decide on the predicted class. No rules, but
stable and accurate classification models.
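A generic sketch of the bootstrap-and-vote committee described above, with scikit-learn trees standing in for GhostMiner's base models:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

# Create several models of the same type on different bootstrap samples.
members = []
for _ in range(11):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    members.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

# Majority voting over committee members decides the predicted class.
votes = np.stack([m.predict(X) for m in members])      # (n_members, n_samples)
pred = np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])
print("committee training accuracy:", np.mean(pred == y))
```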
26
Summary
  • Please get your copy from
  • http://www.fqspl.com.pl/ghostminer/
  • GhostMiner combines 4 basic tools for predictive
    data mining and understanding of data.
  • GM includes K-classifiers and committees of
    models.
  • GM includes MDS visualization/dimensionality
    reduction.
  • Model building is separated from model use.
  • GM provides tools for easy testing of statistical
    accuracy.
  • Many new classification models are coming.