Visual Data Mining: Concepts, Frameworks and Algorithm Development PowerPoint PPT Presentation

presentation player overlay
1 / 30
About This Presentation
Transcript and Presenter's Notes

Title: Visual Data Mining: Concepts, Frameworks and Algorithm Development


1
Visual Data Mining Concepts, Frameworks and
Algorithm Development
  • Student Fasheng Qiu
  • Instructor Dr. Yingshu Li

2
Outline
  • Introduction to VDM
  • VDM frameworks
  • Case study of a specific framework VQLBCI
  • Conclusions
  • References

3
Introduction to VDM
  • The use of visualization techniques to allow data
    miners to evaluate, monitor, and guide the
    inputs, products and process of data mining.
  • It can help introduce user insights, preferences,
    and biases in the earlier stages of the data
    mining life-cycle to reduce its overall
    computation complexity and reduce the set of
    uninteresting patterns in the product.
  • The new insights may facilitate the development
    of better algorithms and processes for data
    mining.

4
Why Visual Data Mining
  • VDM takes advantage of both,
  • The power of automatic calculations, and
  • The capabilities of human processing.
  • Human perception is involved

5
Why Visual Data Mining
  • It is useful because
  • Early user feedback about level of interest can
    filter out uninteresting patterns early in the
    process.
  • User feedback can improve the selection of
    appropriate learning algorithms based on the
    application domain.
  • Visual inspection of datasets can provide direct
    clues towards interesting patterns in the data.
  • Visualization of the data mining process can
    provide new insights and may result in the design
    of better algorithms for data mining.

6
The current research in VDM
  • Mainly focus on the framework of integrating the
    information visualization paradigms and data
    mining techniques.
  • For example,
  • A Flexible Approach for Visual Data Mining.
  • A framework for visual data mining of structures.
  • A Visual Data Mining Framework for Convenient
    Identification of Useful knowledge.
  • A Visual Data Mining Framework by loose-coupling
    of databases and visualization systems.

7
A VDM framework-VQLBCI
  • The following slides present a VDM framework and
    the application of the VDM framework towards
    designing new algorithms that can learn decision
    trees by manually refining some of the decisions
    made by well known algorithms such as C4.5.

8
Components of VQLBCI
  • The three major components of VQLBCI are Visual
    Representations, Computations and Events.

9
Visual Development of Algorithms
  • Most interesting use of visual data mining is the
    development of new insights and algorithms.
  • The figure below shows the ER diagram for
    learning classification decision trees.
  • This model allows the user to monitor the quality
    and impact of decisions made by the learning
    procedure.
  • Learning procedure can be refined interactively
    via a visual interface.

10
ER diagram for the search space of decision tree
learning algorithm
11
General Process
  • Learning a classification decision tree from a
    training data set can be regarded as a process of
    searching for the best decision tree that meets
    user-provided goal constraints.
  • The problem space of this search process consists
    of Model Candidates, Model Candidate Generator
    and Model Constraints.
  • Many existing classification-learning algorithms
    like C4.5 and CDP fit nicely within this search
    framework. New learning algorithms that fit
    users requirements can be developed by defining
    the components of the problem space.

12
General Process
  • Model Candidate corresponds to the partial
    classification decision tree. Each node of the
    decision tree is a Model Atom.
  • Search process is the process of finding a final
    model candidate such that it meets user goal
    specifications.
  • Model Candidate Generator transforms the current
    model candidate into a new model candidate by
    selecting one model atom to expand from the
    expandable leaf model atoms.
  • Model Constraints (used by Model Candidate
    Generator) provide controls and boundaries to the
    search space.

13
Example of Search Process
14
Example of Search Process
15
Acceptability Constraint
  • Model Constraints consist of Acceptability
    constraints, Expandability constraints and a
    Data-Entropy calculation function.
  • Acceptability constraint predicate specifies when
    a model candidate is acceptable and thus allows
    search process to stop. EX
  • A1) Total of expandable leaf model atoms 0.
    (Default)
  • A2) Overall error rate of the model candidate lt
    acceptable error rate.
  • A3) Total number of model atoms in the model
    candidategt maximal allowable tree size.
  • A1 is used in C4.5 and CDP

16
Expandability Constraint
  • An Expandability constraint predicate specifies
    whether a leaf model atom is expandable or not.
    EX
  • C4.5 uses E1 and E2
  • CDP uses E2 and E3

17
Traversal Strategy
  • Traversal strategy ranks expandable leaf model
    atoms based on the model atom attributes. EX
  • Breadth-first
  • Depth-first
  • Orders based on other model atom attributes.
    (Best-first)

18
Steps in Visual Algorithm Development
  • No single algorithm is the best all the time,
    performance is highly data dependent.
  • By changing different predicates of model
    constraints, users can construct new
    classification-learning algorithm.
  • This enables users to find an algorithm that
    works the best on a given data set.
  • Two algorithms are developed BF based on Best
    First search idea and CDP which is a
    modification of CDP

19
BF
  • This algorithm is based on the Best-First search
    idea.
  • For Acceptability criteria, it includes A1 and A2
    with a user specified acceptable error rate.
  • The Traversal strategy chosen is T3
  • In Best-First, expandable leaf model atoms are
    ranked according to the decreasing order of the
    number of misclassified training cases. (local
    error rate size of subset training data set)
  • The traversal strategy will expand a model atom
    that has the most misclassified training cases,
    thus reducing the overall error rate the most.

20
CDP
  • CDP is a modification of CDP
  • CDP has dynamic pruning using expandability
    constraint E3.
  • Here, the depth is modified according to the size
    of the training data set of the model atom.
  • We set
  • b is the branching factor of the decision tree,
    t is the size of training data set belonging to
    model atom, T is the whole training data set.

21
Comparison of different classification learning
algorithms
22
Experiment
  • The new BF and CDP algorithms are compared with
    the C4.5 and CDP algorithms.
  • Various metrics are selected to compare the
    efficiency, accuracy and size of final decision
    trees of the classification algorithm.
  • The generation efficiency of the nodes is
    measured in terms of the total number of nodes
    generated.
  • To compare accuracy of the various algorithms,
    the mean classification error on the test data
    sets have been computed.

23
Classification error for 10 data sets
24
Nodes generated for 10 data sets
25
Final decision tree size
26
Results
  • CDP has accuracy comparable to C4.5 while
    generating considerably fewer nodes.
  • CDP has accuracy comparable to C4.5 while
    generating considerably fewer nodes.
  • CDP outperformed CDP in error rate and number of
    nodes generated.
  • Considering all performance metrics together,
    CDP is the best overall algorithm.
  • Considering classification accuracy alone, C4.5P
    is the winner.

27
Results
  • Different datasets require different algorithms
    for best results.
  • Diverse user requirements put different
    constraints on the final decision tree.
  • The experiment shows that Visual Data Mining
    Framework can help find the most suitable
    algorithm for a given data set and group of user
    requirements.

28
Conclusions
  • VDM is useful in enhancing the data mining
    process.
  • Different VDM frameworks are provided.
  • VDM can help the design of better algorithms.

29
References
  • Matthias Kreuseler and Heidrun Schumann. A
    Flexible Approach for Visual Data Mining. IEEE
    TRANSACTIONS ON VISUALIZATION AND COMPUTER
    GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002
  • M. Kreuseler, T.Nocke, etc. A History Mechanism
    for Visual Data Mining. IEEE Symposium on
    Information Visualization 2004 October 10-12,
    Austin, Texas, USA
  • Kaidi Zhao, Bing Liu, etc. A Visual Data Mining
    Framework for Convenient Identification of Useful
    Knowledge. Proceedings of the Fifth IEEE
    International Conference on Data Mining (ICDM05)
  • Hans-Jorg Schulz, Thomas Nocke, etc. A Framework
    for Visual Data Mining of Structures.
    Twenty-Ninth Australasian Computer Science
    Conference (ACSC2006), Hobart, Tasmania,
    Australia, January 2006.
  • Alexander Kort. Visual Data Mining and Zoomable
    Interfaces. IUI04, January 1316, 2004, Madeira,
    Funchal, Portugal. ACM 1-58113-815-6/04/0001
  • Koji Kato, Tomoyuki Shibata. Visual Data Mining
    using Omni-directional Sensor. IEEE Conference on
    Multisensor Fusion and Integration for
    Intelligent Systems 2003
  • Daniel A. Keim. Information Visualization and
    Visual Data Mining. IEEE TRANSACTIONS ON
    VISUALIZATION AND COMPUTER GRAPHICS, VOL. 7, NO.
    1, JANUARY-MARCH 2002
  • M.Ganesh, Eui-Hong Han, etc. Visual Data Mining
    Framework and Algorithm Development. Technical
    Report, March 12, 1996

30
Thats it
  • Questions? Comments?
  • Thank you!
Write a Comment
User Comments (0)
About PowerShow.com