The Data Mining Visual Environment - PowerPoint PPT Presentation

About This Presentation
Title:

The Data Mining Visual Environment

Description:

The Data Mining Visual Environment Motivation Major problems with existing DM systems They are based on non-extensible frameworks. They provide a non-uniform mining ... – PowerPoint PPT presentation

Number of Views:110
Avg rating:3.0/5.0
Slides: 13
Provided by: StephenK164
Category:

less

Transcript and Presenter's Notes

Title: The Data Mining Visual Environment


1
The Data Mining Visual Environment
Motivation   Major problems with existing DM
systems They are based on non-extensible
frameworks. They provide a non-uniform mining
environment - the user is presented with totally
different interface(s) across implementations of
different DM techniques. Major needs An overall
framework that can support the entire Knowledge
Discovery (KD) process (accommodate and integrate
all KD phases seamlessly).   Placing the user at
the center of the entire KD process/in the
framework. In fact the corresponding system
should provide a consistent, uniform and flexible
visual interaction environment that supports the
user throughout the entire discovery process.
2
The Data Mining Visual Environment
  • Primary layers
  • User layer
  • Engine layer
  • Data layer
  • Main features
  • Open
  • Modular with
  • well defined
  • modification/extension points
  • Possible integration of
  • different tasks
  • eg output reuse by another task
  • User flexibility and enablement to
  • process data and knowledge,
  • drive and guide the entire KD
  • process

System Architecture
At present, there is a partial prototype,
complete implementation is underway.
3
The Data Mining Visual Environment
  • Primary layers
  • User Interface/GUI Container
  • (interacts with specific DM
  • visual environments)
  • Developed using Java
  • Abstract DM Engine/Wrapper of
  • DM algorithms (interacts with
  • specific DM algorithms)
  • Developed using Java
  • Note
  • The DM algorithms may be
  • implemented by third parties
  • in possibly any language.
  • DM methods (but not limited to)
  • MQs, ARs, and clustering

System Architecture The Prototype
Stephen
Stefano
4
The Data Mining Visual Environment
Visual Environment A consistent, uniform,
flexible and intuitive GUI, with support
throughout the whole DM process. The principal
focus is to support the user in Visual
construction of the task relevant dataset The
user directly interacts with data. For this task,
there are two intuitive interaction
spaces. Visual construction of the mining query
The user directly interacts with data and other
parameters (e.g. threshold values) in making
queries e.g., in the Metaquery Environment, the
user can suggest patterns by linking attributes,
while the Association Rule Environment offers
visual baskets. Visual output presentation and
interaction Exploiting relevant effective
visualizations and where necessary, we have
designed novel visualizations. Planning E.g.,
advertising relevant prior knowledge.   Handling
the non-static nature of users quest E.g.,
enabling user to adjust.
5
The Data Mining Visual Environment
Visual Environment Overall
6
The Data Mining Visual Environment
Visual Environment Tree View (Progress
Companion)
Before user settings
After user settings
After DM results
7
The Data Mining Visual Environment
Visual Environment Clustering - Input
8
The Data Mining Visual Environment
Visual Environment Clustering - Output
... for more on the prototype, demo
9
The Data Mining Visual Environment
Usability Usability heuristics Done, but
regular reference to the same will go
on.   Mock-up tests Done with DM experts. The
experts gave an encouraging feedback and even
suggestions on how to improve the
interface. (These tests were done at the end of
2001.) Questionnaire experiments The
experiments involved the application simulation,
a case study, data schema and user tasks
corresponding to the case study, and a
questionnaire. Positive interface features
consistency, layout/organization, visual
exploration. Negative interface features size of
some visual elements small/big (These tests were
done in July 2002.) Formal usability tests In
the pipeline.
10
The Data Mining Visual Environment
  • The Clustering Engine
  • Clustering method Generalizations of three
    techniques homogeneity, separation, density.
  • Clustering based on homogeneity/separation
    Homogeneity (separation) is a global measure of
    the similarity between points belonging to the
    same cluster (to different clusters)
  • Clustering based on density Clusters are regions
    of the object space where objects are located
    most frequently
  • Clustering based The system selects the best
    clustering according to a cost function
  • For homogeneity/separation-based clustering the
    cost function is computed by evaluating
    pointwise, clusterwise, and partitionwise
    similarity/dissimilarity
  • For density-based clustering, the cost function
    is derived from an estimated density function

11
The Data Mining Visual Environment
  • Formal Semantics of the Input Environment
  • Visual language abstract syntax semantics
  • Abstract syntax defined in terms of multi-graphs
  • Visual components are vertices of the multi-graph
  • Spatial relations between visual components are
    edges of the multi-graph
  • Semantics
  • Clustering defined by a mapping between
    multi-graphs and cost functions and predicates
    expressing optimality
  • Metaqueries/association rules defined by a
    mapping between multi-graphs and rules
  •  

12
The Data Mining Visual Environment
  • Operational Specification
  • Concrete, high-level syntax of the tasks proposed
    in the usability tests
  • Describes legal click-streams allowed to occur
    during operation
  • Standard grammar notation
  • Communication protocol between the abstract
    clustering engine and the data mining engines
  • XML DTDs based on PMML 2.0
  • Extension of PMML 2.0 to
  • Specification of input
  • Broader spectrum of clustering methods
  • Concrete semantics of the clustering task by
    mapping on symbols in the tasks grammar to
    structures of the communication protocol
  • Interpretation function recursively defined on
    the grammar rules of the high-level syntax of
    tasks
  • The interpretation of a legal click-stream is an
    XML document satisfying the DTD of the input
    specification
Write a Comment
User Comments (0)
About PowerShow.com