Visualization for Classification and Clustering Techniques

1
Visualization for Classification and Clustering
Techniques

Marc René
CSE 8331
Data Mining - Project 1

2
Overview

Importance of Data Visualization in the KDD
Process
Understanding and Trust
Visualization techniques
Classification
Clustering
Future Directions

3
KDD Process

Selection
Obtain data from all of the sources
Preprocessing
After selecting the data, clean it to make sure
it is consistent
Transformation
After preprocessing the data, analyze the
format/amount of data
Data Mining
Once the data is in a usable format/content,
apply algorithms chosen according to the results
to be achieved
Interpretation/Evaluation
Finally, present the results of the data mining
step to the user, so that the results can be used
to solve the business need at hand

4
Importance of Data Visualization

The final step in the KDD process
Highly dependent on the Data Visualization
technique
Bad/inappropriate technique may result in
misunderstanding
Misunderstanding may cause an incorrect (or no)
decision

It is important to consider that the KDD process
is useless if the results are not understandable
5
Current Issues w/Data Visualization

The literature suggests a significant reliance on
expert users
General lack of data visualization support in
many data mining tools [Goebel99]
These are significant problems if KDD/DM/data
visualization is to expand at the rates suggested
Data visualization tool market: $2.2 billion by
2007 [Nuttall03]

6
Suggested Direction

Need to determine techniques that balance
simplicity with completeness
If this can be done for non-expert users
Simplicity + Completeness → Understanding
Understanding → Trust
Trust → more use of KDD/DM
Result will be
Better business value
Higher ROI

7
Common Visualization Techniques

Visualization techniques dependent upon
The type of data mining technique chosen
The underlying structure and attributes of the
data
Classification                  Clustering
Decision Trees                  Scatter Plots
Scatter Plots                   Dendrograms
Axis-Parallel Decision Trees    Smoothed Data Histograms
Circle Segments                 Self-Organizing Maps
Decision Tables                 Proximity Matrices

8
Classification
9
Decision Tree

Information limited to
Attributes
Splitting values
Terminal node class assignments

10
Decision Tree with Histograms

Data mining algorithms rarely classify 100% of
the data correctly
Include the success of properly classifying the
data - histogram added for each terminal node
Percentage of data that was classified
correctly/incorrectly
Assists users in determining if the
classification is good enough
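
The per-leaf histogram can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: the record layout, the leaf names, and the helper names leaf_accuracy and text_histogram are all hypothetical.

```python
from collections import defaultdict


def leaf_accuracy(records):
    """Return {leaf_id: percent classified correctly}.

    records: iterable of (leaf_id, actual_class, predicted_class) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # leaf -> [correct, incorrect]
    for leaf, actual, predicted in records:
        counts[leaf][0 if actual == predicted else 1] += 1
    return {leaf: 100.0 * ok / (ok + bad) for leaf, (ok, bad) in counts.items()}


def text_histogram(percent_correct, width=20):
    """Render one bar: '#' for the correct share, '.' for the incorrect share."""
    filled = round(width * percent_correct / 100)
    return "#" * filled + "." * (width - filled)


# Hypothetical data: which terminal node each record reached, and the outcome.
records = [
    ("leaf_A", "yes", "yes"), ("leaf_A", "yes", "yes"),
    ("leaf_A", "no", "yes"), ("leaf_A", "yes", "yes"),
    ("leaf_B", "no", "no"), ("leaf_B", "yes", "no"),
]
accuracy = leaf_accuracy(records)
for leaf in sorted(accuracy):
    print(f"{leaf} {text_histogram(accuracy[leaf])} {accuracy[leaf]:.0f}% correct")
```

A user can then judge each terminal node separately: a leaf whose bar is mostly '.' is a candidate for further splitting or for distrust.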

11
Decision Tree - Different Format

Vertical representation - allows for easy user
interaction
Combines the split points and classification
accuracy - compactly
Key difference - colors are matched with a
specific classification

12
Scatter Plot with Regression Line

Excellent way to view 2-dimensional data
Familiar to anyone who has taken high-school
algebra
Regression lines provide descriptive techniques
for classification
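
The line drawn over such a plot is commonly an ordinary least-squares fit. The sketch below assumes that choice (the slides do not say which fitting method is used) and uses made-up points; fit_line is a hypothetical helper name.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical 2-dimensional data lying exactly on y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
slope, intercept = fit_line(points)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```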

13
Axis-Parallel Decision Tree

Combination Scatter Plot and Decision Tree
Area divided into regions by lines parallel to
the axes
Well suited for classification problems with two
attributes
High visibility into the impact of outliers
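
A small sketch of why the regions are axis-parallel: each internal node tests a single attribute against a threshold, so every split is a horizontal or vertical line on the scatter plot. The tree, thresholds, and class names below are hypothetical.

```python
# A node is either a class label (str) or a tuple:
#   (attribute, threshold, subtree_if_value_<=_threshold, subtree_otherwise)
tree = (
    "x", 5.0,
    ("y", 3.0, "class A", "class B"),  # left half-plane, split again on y
    "class C",                         # right half-plane is a single region
)


def classify(node, point):
    """Walk the tree until a leaf (class label) is reached."""
    while not isinstance(node, str):
        attribute, threshold, low, high = node
        node = low if point[attribute] <= threshold else high
    return node


print(classify(tree, {"x": 2.0, "y": 1.0}))
print(classify(tree, {"x": 2.0, "y": 4.0}))
print(classify(tree, {"x": 7.0, "y": 0.0}))
```

An outlier shows up on the plot as a point whose class color differs from the rectangle it falls in.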

14
Circle Segments

Multi-dimension data
Maps dataset with n dimensions onto a circle
divided by n segments
Each segment is a different attribute
Each pixel inside a segment is a single value of
the attribute
Values of each attribute are then sorted
(independently) and each is assigned a color
based upon its class
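
The layout logic described above can be sketched without any drawing code. This is an illustrative simplification: the function name circle_segments, the dictionary keys, and the sample rows are hypothetical, and placing the pixels inside each segment is left out.

```python
def circle_segments(rows, classes):
    """Describe one segment per attribute.

    rows: list of equal-length tuples of attribute values.
    classes: class label (used as the pixel color) for each row.
    """
    n_attributes = len(rows[0])
    span = 360.0 / n_attributes  # the circle is divided into n equal segments
    segments = []
    for a in range(n_attributes):
        # Sort the records by this attribute only, independently of the others.
        order = sorted(range(len(rows)), key=lambda i: rows[i][a])
        segments.append({
            "start_angle": a * span,
            "end_angle": (a + 1) * span,
            "pixel_classes": [classes[i] for i in order],
        })
    return segments


# Hypothetical dataset: three records, two attributes, two classes.
rows = [(5.1, 1.8), (6.3, 0.2), (4.9, 1.5)]
classes = ["red", "blue", "red"]
for segment in circle_segments(rows, classes):
    print(segment)
```

An attribute that separates the classes well produces long runs of a single color in its segment; a poor attribute produces a speckled segment.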

15
Decision Table

Interactive technique
Maps attribute data to a 2D hierarchical matrix
Levels can be drilled down - another set of
attributes
Height of a cell conveys the number of data
entities
Cells color coded
Neutral color → no data at that intersection
point
Color coded by class (percentage)
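
The numbers behind one level of such a table can be sketched as a cross-tabulation of two attributes. The function name decision_table, the attribute names, and the records are hypothetical; drilling down would repeat the same computation on the records of one cell using another pair of attributes.

```python
from collections import Counter, defaultdict


def decision_table(records, row_attr, col_attr, class_attr):
    """Return {(row_value, col_value): {"count": n, "class_share": {...}}}.

    "count" would drive the cell height and "class_share" (percentages)
    the color coding. A missing key means no data at that intersection.
    """
    cells = defaultdict(Counter)
    for record in records:
        cells[(record[row_attr], record[col_attr])][record[class_attr]] += 1
    table = {}
    for key, counts in cells.items():
        total = sum(counts.values())
        table[key] = {
            "count": total,
            "class_share": {c: 100.0 * n / total for c, n in counts.items()},
        }
    return table


# Hypothetical records.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
table = decision_table(records, "outlook", "windy", "play")
for cell in sorted(table):
    print(cell, table[cell])
```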

16
Decision Table
17
Clustering
18
Scatter Plot

Extensions include displaying points in
Various sizes and colors to indicate additional
attributes
Shading of points to introduce a third dimension
Using different brightness levels of the same
color to represent continuous values for the same
attribute
Using various point or classification
identifiers (e.g., numbers, symbols)
Using various glyphs to display additional
attributes

19
Scatter Plot

Map decision trees on top of scatter plots to
describe clusters

20
Scatter Plot with Regression Lines
21
Scatter Plot w/Min Spanning Tree
22
Dendrogram

Intuitive representation - hierarchical
decomposition of data into sets of nested
clusters.
From an agglomerative perspective
Each leaf - a single data entity
Each internal node - the union of all data
entities in its sub-tree
The root - the entire dataset
The height of any internal node - the similarity
between its children.
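
The agglomerative view can be sketched as a list of merges, each of which becomes one internal node of the dendrogram. The sketch assumes single linkage on one-dimensional points and records the merge height as a distance, so a lower node means more similar children; the slides do not specify a linkage method, and single_linkage is a hypothetical name.

```python
def single_linkage(points):
    """Return the merges as (left_cluster, right_cluster, height) tuples."""
    clusters = [(p,) for p in points]  # every leaf is a single data entity
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]  # union of both sub-trees
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical data.
merges = single_linkage([1.0, 2.0, 6.0, 9.0])
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

The last merge is the root and contains the entire dataset.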

23
Dendrogram with Exemplars

The most typical member of each cluster
[Wishart99]
Underlined labels of the leaves
Done in combination with shading to identify the
clustering level
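
One way to pick a "most typical member" is the medoid: the member with the smallest total distance to the other members of its cluster. That interpretation is an assumption made for this sketch and may differ from the definition used in [Wishart99]; the function name exemplar is hypothetical.

```python
def exemplar(cluster, distance):
    """Return the member with the smallest summed distance to all members."""
    return min(cluster, key=lambda m: sum(distance(m, other) for other in cluster))


# Hypothetical one-dimensional cluster with absolute difference as the distance.
cluster = [1.0, 2.0, 6.0]
typical = exemplar(cluster, lambda a, b: abs(a - b))
print(typical)
```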

24
Smoothed Data Histogram

Represents data on a display map
Similar data items are located close to each
other
The more defined the cluster, the lighter the color
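
A rough sketch of how the histogram values could be computed, assuming a voting scheme in which each data item awards s points to its closest map unit, s - 1 to the next closest, and so on. The scheme, the one-dimensional map units, and all names here are assumptions for illustration; the cited Pampalk et al. paper should be consulted for the actual definition.

```python
def smoothed_data_histogram(data, units, s=3):
    """Return one vote total per map unit.

    data: data items (floats here for simplicity).
    units: model vector of each map unit (floats here for simplicity).
    s: smoothing parameter, the number of units each item votes for.
    """
    votes = [0] * len(units)
    for item in data:
        # Rank the map units from closest to farthest for this item.
        ranked = sorted(range(len(units)), key=lambda u: abs(item - units[u]))
        for rank, unit in enumerate(ranked[:s]):
            votes[unit] += s - rank
    return votes


# Hypothetical map with four units and three data items.
votes = smoothed_data_histogram([0.1, 0.2, 2.9], [0.0, 1.0, 2.0, 3.0], s=2)
print(votes)
```

Units with high totals would be drawn in the lighter colors, so dense regions of the map stand out as clusters.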

25
Smoothed Data Histogram - Detail
26
Self-Organizing Map Grid

Source of Smoothed Data Histogram
Numbers indicate most common cluster

27
Proximity Matrix

Graphically display the relationship between data
elements
Usually symmetric, but can be sorted by the
strength of relationships
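
A minimal sketch of the matrix itself, assuming Euclidean distance as the proximity measure. The reordering step, which sorts elements by their total distance to all others, is only one hypothetical way to read "sorted by the strength of relationships".

```python
import math


def proximity_matrix(points):
    """Pairwise Euclidean distances; entry [i][j] relates element i to j."""
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical two-dimensional data elements.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = proximity_matrix(points)

# Assumed ordering: elements with the strongest overall relationships
# (smallest total distance) come first.
order = sorted(range(len(points)), key=lambda i: sum(matrix[i]))

for row in matrix:
    print(["%.1f" % value for value in row])
print("display order:", order)
```

Because the distance from i to j equals the distance from j to i, the matrix is symmetric and only one triangle needs to be drawn.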

28
Proximity Matrix and Dendrogram
29
Summary

Data visualization techniques are extremely
important for understanding the KDD process
A balance of simplicity and completeness is
important
The techniques discussed allow average users to
understand the results of the KDD process
Understanding → KDD results can be
interpreted/trusted by non-expert users →
extended business value
If data visualization techniques do not establish
a high level of trust in the KDD process, the
process will fail

30
Future Direction

Significant effort will be spent on improving
data visualization techniques in the next few
years
KDD process and data mining are becoming more
widespread
Businesses will expect tools to become more
user-friendly and to support varied skill
levels
Trends are moving to a more interactive mode
Static reporting techniques (e.g., standard
decision trees, standard circle segments) are
being replaced by interactive techniques (e.g.,
smoothed data histograms, interactive circle
segments and decision tables)
Highly interactive data models (e.g., virtual
reality) are also being considered/proposed

31
References: Part 1

Ahlberg, C., Spotfire: An Information
Exploration Environment, ACM SIGMOD Record,
Volume 25, Number 4, December 1996
Ankerst, M., et al., Visual Classification: An
Interactive Approach to Decision Tree
Construction, KDD-99, San Diego, CA
Ankerst, M., et al., Towards an Effective
Cooperation of the User and the Computer for
Classification, KDD00, Boston, MA, USA
Apte, C. and Weiss, S. M., Data Mining with
Decision Trees and Decision Rules, Future
Generation Computer Systems, November 1997
Arkin, E., et al., Decision Trees for
Geometric Models, ACM, 9th Annual Computational
Geometry, 5/93/CA, USA
de Hann, G., et al., Towards Intuitive
Exploration Tools for Data Visualization in VR,
VRST02, November 11-13, 2003, Hong Kong
Dunham, M. H., Data Mining: Introductory and
Advanced Topics, Prentice Hall, 2003
Fekete, J. and Plaisant, C., Excentric Labeling:
Dynamic Neighborhood Labeling for Data
Visualization, Proceedings of the Conference on
Human Factors in Computer Systems (CHI'99), ACM,
New York
Fredrikson, A., et al., Temporal, Geographical
and Categorical Aggregations Viewed through
Coordinated Displays: A Case Study with Highway
Incident Data, NPIVM99, Kansas City, MO, 1999
Goebel, M. and Gruenwald, L., A Survey of Data
Mining and Knowledge Discovery Software Tools,
SIGKDD Explorations, June 1999
Han, J. and Cersone, N., RuleViz: A Model for
Visualizing Knowledge Discovery Process, Sixth
International Conference on Knowledge Discovery
and Data Mining, 2000
Ho, T., et al., Visualization Support for a
User-Centered KDD Process, SIGKDD02, 2002

32
References: Part 2

Hsieh, H. and Shipman, F. M. III, VITE: A Visual
Interface Supporting the Direct Manipulation of
Structured Data Using Two-Way Mappings, IUI
2000, New Orleans, LA
Solving Business Problems with IBM DB2
Intelligent Miner, Presented by DB2 Developer
Domain, http://www7b.software.ibm.com/dmdd
Jain, A. K., et al., Data Clustering: A
Review, ACM Computing Surveys, Volume 3, Number
3, September 1999
Keim, D. A., Visual Techniques for Exploring
Databases, KDD97, Newport Beach, CA, 1997
Kohavi, R. and Sommerfield, D., Targeting
Business Users with Decision Table Classifiers,
KDD99, New York City, 1998
Kohavi, R., et al., Emerging Trends in
Business Analytics, Communications of the ACM,
Volume 45, Number 8, August 2002
Liu, B., et al., Clustering Through Decision
Tree Construction, CIKM 2000, ACM, McLean, VA,
2000
Louie, J. Q. and Kraay, T., Origami: A New
Visualization Tool, KDD-99, San Diego, CA
Moret, B. M. E., Decision Trees and Diagrams,
Computing Surveys, Volume 14, Number 4, December
1982
Nuttall, C., "It's a Vision Thing", Financial
Times-IT Review, November 12, 2003
Pampalk, E., et al., Using Smoothed Data
Histograms for Cluster Visualization in
Self-Organizing Maps, Proceedings of the
International Conference on Artificial Neural
Networks (ICANN02), Springer Lecture Notes in
Computer Science, Madrid, Spain, 2002
Pampalk, E., et al., Content-based Organization
and Visualization of Music Archives, Proceedings
of the 10th ACM International Conference on
Multimedia (MM02), Juan-les-Pins, France, 2002
Pampalk, E., et al., A New Approach to
Hierarchical Clustering and Structuring of Data
with Self-Organizing Maps, Intelligent Data
Analysis Journal (IDA), Volume 8, Number 2, 2003

33
References: Part 3

Rauber, A., et al., Empirical Evaluation of
Clustering Algorithms, Journal of Information
and Organizational Sciences (JIOS), Volume 24,
Number 2, 2000
Finding the Solution to Data Mining: Exploring
the Features and Components of Enterprise Miner,
Release 4.1 from SAS, SAS White Paper, 2001
See5 - Data Mining Tools, Release 1.9, Rulequest
Research, 1997-2003
Simoff, S. J., VDM@ECML/PKDD2001: The
International Workshop on Visual Data Mining at
ECML/PKDD 2001, SIGKDD Explorations, Volume 3,
Issue 2, 2001
Thearling, K., Understanding Data Mining: It's
All in the Interaction, DS Star: The On-Line
Executive Journal for Data-Intensive Decision
Support, Volume 1, Number 10, December 9, 1997
Thearling, K., et al., Visualizing Data Mining
Models, as published in Information
Visualization in Data Mining and Knowledge
Discovery, edited by Fayyad, Usama, et al.,
Morgan Kaufman, 2001
Ward, M. O., XmdvTool: Integrating Multiple
Methods for Visualizing Multivariate Data,
Proceedings of IEEE Visualization '94
(Washington, DC, 1994)
Wishart, D., Efficient Hierarchical Cluster
Analysis for Data Mining and Knowledge
Discovery, Computing Science and Statistics,
Volume 30, 1998
Wishart, D., ClustanGraphics3: Interactive
Graphics for Cluster Analysis, published in
Classification in the Information Age, Gaul, W.
and Locarrek-Junge, H. (Eds.), Springer, 1999
XmdvTool Home Page
(http://davis.wpi.edu/xmdv/visualizations.html)
