Data Exploration with DAVIS
Oct. 28, 2005


1
Data Exploration with DAVIS
  • Moon HUH¹, KwangRyeol SONG², YoungSuk PARK¹, KyungWook Shim
  • ¹ Sungkyunkwan University, Seoul, Korea
  • ² Kwansei Research Institute, Seoul, Korea

2
Purpose of DAVIS
  • To visually explore the structure and patterns of
    the data

3
Components of DAVIS
  • Data Manipulation
  • Statistical Tools
  • Plots
  • Graphic Controllers

4
Data Manipulation
  • Observation/variable selection
  • Focusing on/deleting a subset of the data set
  • Missing-value processing
  • Discretization
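Discretization can be illustrated with equal-width binning. This is a minimal sketch under assumptions: the helper `equal_width_bins` is hypothetical, DAVIS may use another scheme (for example equal-frequency bins), and missing values are simply passed through as `None` so they can be dropped or imputed separately.

```python
import math


def equal_width_bins(values, k):
    """Map each value to a bin index 0..k-1 using k equal-width intervals.

    Missing values (None or NaN) are passed through as None so they can be
    handled separately, e.g. dropped or imputed.
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    lo, hi = min(present), max(present)
    width = (hi - lo) / k if hi > lo else 1.0
    bins = []
    for v in values:
        if v is None or math.isnan(v):
            bins.append(None)
        else:
            # Clamp so the maximum value lands in the last bin rather than bin k.
            bins.append(min(int((v - lo) / width), k - 1))
    return bins


print(equal_width_bins([2.0, 22.0, None, 35.0, 58.0, 80.0], 4))  # [0, 1, None, 1, 2, 3]
```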

5
Plots - Univariate
  • Bar Charts
  • Histogram
  • QQ Plot
  • FEDF
  • BoxPlot
  • Parallel Coordinates
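For a QQ plot, each ordered observation is paired with a theoretical quantile, and the empirical distribution function steps up by 1/n at each observation. The sketch below uses only the Python standard library; the function names and the (i - 0.5)/n plotting positions are assumptions, and the FEDF variant listed on the slide is not reproduced because its definition is not given.

```python
from statistics import NormalDist


def ecdf_steps(data):
    """Points (x_(i), i/n) at which the empirical distribution function steps."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def normal_qq_points(data):
    """(theoretical, sample) pairs for a normal QQ plot.

    The i-th smallest observation (1-based) is paired with the standard
    normal quantile at (i - 0.5) / n, one common plotting-position rule.
    """
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()  # standard normal, mu=0, sigma=1
    return [(z.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]
```

Plotting the pairs from `normal_qq_points` gives a roughly straight line when the data are close to normal.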

6
BoxPlot Features
  • Standardization
  • Identification
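A minimal sketch of what the two BoxPlot features might compute, assuming "standardization" means z-scores (so variables on different scales can share one axis) and "identification" means flagging points beyond Tukey's 1.5 * IQR whiskers. Both interpretations and the helper names are assumptions, not documented DAVIS behavior.

```python
from statistics import mean, quantiles, stdev


def standardize(xs):
    """z-scores: subtract the mean and divide by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def boxplot_outliers(xs):
    """Indices of points beyond Tukey's whiskers, Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(xs, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, x in enumerate(xs) if x < lo or x > hi]


print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 100]))  # [7]
```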

7
Parallel Coordinates Features
  • Direction of plotting: horizontal / vertical
  • Ordering of the variables: component /
    permutation
  • Jittering
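The geometry behind these features can be sketched as follows: each variable is rescaled to [0, 1] on its own axis, the axes are visited in a chosen permutation, and optional uniform jitter spreads overplotted lines, which matters most for discrete variables. The function and its parameters are hypothetical; switching between horizontal and vertical plotting amounts to swapping the two coordinates of each point.

```python
import random


def parallel_coordinates(rows, order, jitter=0.0, seed=0):
    """One polyline per row as (axis_position, height) points.

    `order` is a permutation of column indices giving the axis order;
    heights are rescaled to [0, 1] per variable, plus optional uniform jitter.
    """
    rng = random.Random(seed)
    p = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(p)]
    span = [(max(r[j] for r in rows) - lo[j]) or 1.0 for j in range(p)]
    lines = []
    for r in rows:
        line = []
        for axis, j in enumerate(order):
            y = (r[j] - lo[j]) / span[j]
            if jitter:
                y += rng.uniform(-jitter, jitter)  # separates tied values
            line.append((axis, y))
        lines.append(line)
    return lines
```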

8
Parallel Coordinates - Options
9
Plots - Multivariate
  • Scatterplot
  • Loess curve fitting
  • Touring
  • Dendrogram
  • Line Mosaic Plot
  • PCA plot
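Loess curve fitting can be illustrated with a degree-1 local fit: for a target x0, the nearest fraction of points is weighted with the tricube kernel and a weighted least-squares line is evaluated at x0. This is a simplified sketch (no robustness iterations); the span parameter `frac` and the function name are assumptions, not DAVIS settings.

```python
def loess_at(x0, xs, ys, frac=0.5):
    """Degree-1 loess estimate at x0 (no robustness iterations).

    The nearest frac*n points get tricube weights; a weighted least-squares
    line through them is evaluated at x0. Assumes the nearest points are not
    all exactly at the bandwidth distance (they would all get weight 0).
    """
    n = len(xs)
    k = max(2, int(frac * n))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0  # distance to k-th nearest
    # Tricube weights; points at distance >= h get weight 0, as in standard loess.
    w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)
```

Evaluating `loess_at` over a grid of x0 values traces the smooth curve drawn over the scatterplot.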

10
Scatterplot - Options
11
Touring - Grand Tour / Tracking
12
Dendrogram - Agglomeration / Distance Options
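A dendrogram draws the merge history of agglomerative clustering. The naive sketch below (cubic time, fine for small examples) uses Euclidean distance and offers single or complete linkage as the agglomeration option; which linkages and distances DAVIS actually provides is not stated on the slide, so these choices are assumptions.

```python
import math


def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering with Euclidean distance.

    Returns the merge history [(cluster_a, cluster_b, height), ...] that a
    dendrogram draws. New clusters get ids n, n+1, ...; linkage is
    "single" (minimum pairwise distance) or "complete" (maximum).
    """
    agg = min if linkage == "single" else max
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = agg(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```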
13
Line Mosaic Plot for discrete data
14
PCA plot
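A PCA plot shows the data projected onto the leading principal components. As a dependency-free sketch, the components can be found by power iteration on the sample covariance matrix, deflating after each component; a production implementation would use an eigenvalue or singular-value decomposition instead. The function name and iteration count are assumptions.

```python
import math


def pca(rows, n_components=2, iters=500):
    """Principal-component scores via power iteration with deflation.

    Returns (scores, components); each component is a unit vector whose
    sign is arbitrary, as in any PCA.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centered data.
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(n_components):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-symmetric start
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        # Deflate: remove the found direction so the next pass finds the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        comps.append(v)
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in comps] for x in X]
    return scores, comps
```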
15
Real-time grouping with DAVIS - highlighting
  • Manually group the data set into 2 subsets by
    brushing a subset of the data with the mouse
  • The original data set can always be restored
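Brushing can be modelled as selecting the points inside a rectangle, which splits the data into two index subsets while leaving the original data untouched, so returning to the full data set is trivial. The function below is a hypothetical sketch, not DAVIS's API.

```python
def brush(points, x_range, y_range):
    """Split 2-D points into (selected, rest) index lists with a rectangular
    brush. Only indices are returned; `points` itself is never modified."""
    x0, x1 = sorted(x_range)
    y0, y1 = sorted(y_range)
    selected = [i for i, (x, y) in enumerate(points)
                if x0 <= x <= x1 and y0 <= y <= y1]
    chosen = set(selected)
    rest = [i for i in range(len(points)) if i not in chosen]
    return selected, rest


print(brush([(0, 0), (1, 1), (2, 2)], (0.5, 2.5), (0.5, 2.5)))  # ([1, 2], [0])
```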

16
Real-time grouping with DAVIS - deleting/focusing
17
Interactive clustering with DAVIS - linking
18
Clustering with DAVIS - EM with 3 groups
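EM clustering with 3 groups can be sketched for a single continuous variable as a Gaussian mixture fit: the E-step computes each point's responsibility under each component and the M-step re-estimates weights, means, and standard deviations. The one-dimensional setting, the initialization, and the variance floor are assumptions made to keep the sketch short; the slide does not say how DAVIS initializes EM.

```python
import math
from statistics import NormalDist


def em_gmm_1d(xs, k=3, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds, labels), where labels assign each point
    to the component with the largest responsibility in the last E-step.
    """
    n = len(xs)
    s = sorted(xs)
    means = [s[int((i + 0.5) * n / k)] for i in range(k)]  # spread over the data
    sds = [(s[-1] - s[0]) / (2 * k) or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * NormalDist(m, sd).pdf(x) for w, m, sd in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # floor avoids collapse onto one point
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, sds, labels
```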
19
Coloring a subset - outlier detection
20
Touring with DAVIS - Tracking
  • Can be used to investigate the multidimensional
    structure of the data
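A grand tour shows a sequence of 2-D projections of p-dimensional data. One ingredient, drawing a random orthonormal projection frame and projecting the data onto it, is sketched below. A real tour also interpolates smoothly between successive frames, which is omitted here, and the slide's tracking mode is not modelled. Function names are hypothetical.

```python
import math
import random


def random_frame(p, rng):
    """Random orthonormal pair (a, b) in R^p: Gram-Schmidt on two Gaussian
    vectors, which gives a uniformly oriented 2-D projection plane."""
    a = [rng.gauss(0.0, 1.0) for _ in range(p)]
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    na = math.sqrt(sum(t * t for t in a))
    a = [t / na for t in a]
    dot = sum(s * t for s, t in zip(a, b))
    b = [t - dot * s for s, t in zip(a, b)]  # remove the component along a
    nb = math.sqrt(sum(t * t for t in b))
    b = [t / nb for t in b]
    return a, b


def project(rows, frame):
    """2-D coordinates of each p-dimensional row in the plane spanned by frame."""
    a, b = frame
    return [(sum(x * s for x, s in zip(r, a)), sum(x * t for x, t in zip(r, b)))
            for r in rows]
```

Drawing a new frame and re-projecting gives the next view of the tour.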

21
Data exploration with decision trees - Titanic data
22
Decision Trees-2
23
Variable selection with DAVIS
  • Target (class) variable: discrete (nominal) type
  • Candidate variables: nominal, numerical, and
    complex types

24
Variable subset selection methods
  • MDI (Lee and Huh, 2003): uses the p-values of the
    test statistics between the two variables;
    log(p-value) is suggested as the score
  • ReliefF (extending Relief of Kira and Rendell, 1992)
    $\mathrm{Relief}(X) = P(\text{different value of } X \mid \text{different class}) - P(\text{different value of } X \mid \text{same class})$
  • Mutual Information (originated by Shannon, 1948,
    and used as a measure of dependence by Perez,
    1957, in Russian)
  • Darbellay (1999, CSDA) gives a good survey of
    measuring statistical dependence with MI
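Basic Relief estimates the two probabilities in the Relief formula on this slide by looking, for each instance, at its nearest neighbor of the same class (nearest hit) and of a different class (nearest miss). The sketch uses every instance once, Manhattan distance on features rescaled to [0, 1], and a single nearest hit and miss; full ReliefF additionally averages over k neighbors and weights misses by class priors, which is omitted. Names are hypothetical, and each class needs at least two instances.

```python
def relief(X, y):
    """Basic Relief weights for numeric features.

    For each instance, the weight of feature j grows by its (scaled) difference
    to the nearest miss and shrinks by its difference to the nearest hit.
    """
    n, p = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(p)]
    span = [(max(r[j] for r in X) - lo[j]) or 1.0 for j in range(p)]
    Z = [[(r[j] - lo[j]) / span[j] for j in range(p)] for r in X]

    def dist(a, b):
        return sum(abs(s - t) for s, t in zip(a, b))  # Manhattan distance

    w = [0.0] * p
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(Z[i], Z[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(Z[i], Z[k]))  # nearest miss
        for j in range(p):
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w
```

A large positive weight means the feature tends to differ across classes but agree within a class.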

25
Subset selection with DAVIS - ranking variables
  • MDI (measure of departure from independence)
  • ReliefF
  • MI (mutual information)
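Ranking by mutual information can be sketched with the plug-in estimate from a contingency table, I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))], computed here in nats for discrete (or discretized) variables. The helper names and the toy columns in the example are hypothetical; MDI or ReliefF scores could be ranked the same way.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log(p(x,y) / (p(x) p(y))) with all probabilities as counts / n.
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def rank_by_mi(columns, target):
    """Variable names sorted by decreasing MI with the target."""
    scores = {name: mutual_information(col, target) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_by_mi({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}, [0, 0, 1, 1]))  # ['a', 'b']
```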

26
Subset selection with DAVIS - decision trees
  • Discretization required
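A continuous variable enters a tree split only through a cut point. One common approach, choosing the binary threshold that maximizes information gain (the reduction in class entropy), is sketched below; whether DAVIS discretizes this way is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, labels):
    """(cut, gain) for the binary split of xs that maximizes information gain.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(xs, labels), key=lambda t: t[0])
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (cut, gain)
    return best
```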

27
Subset selection with DAVIS - stepwise discriminant analysis
  • Continuous variables only
  • Good under normality
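The first step of forward stepwise discriminant analysis can be sketched as picking the continuous variable with the largest one-way ANOVA F statistic across classes. Later steps, which use partial F (or Wilks' lambda) given the variables already entered, and the removal steps are omitted; the names are hypothetical, and the F statistic is undefined when every class has zero within-class variation.

```python
def anova_f(values, groups):
    """One-way ANOVA F statistic of a continuous variable across classes."""
    classes = sorted(set(groups))
    n, k = len(values), len(classes)
    grand = sum(values) / n
    ssb = ssw = 0.0
    for c in classes:
        vs = [v for v, g in zip(values, groups) if g == c]
        m = sum(vs) / len(vs)
        ssb += len(vs) * (m - grand) ** 2          # between-class sum of squares
        ssw += sum((v - m) ** 2 for v in vs)       # within-class sum of squares
    return (ssb / (k - 1)) / (ssw / (n - k))


def first_forward_step(columns, groups):
    """Name of the variable entered first, plus all F scores."""
    scores = {name: anova_f(col, groups) for name, col in columns.items()}
    return max(scores, key=scores.get), scores
```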

28
Subset selection with DAVIS - Mutual Information
  • Conventional approach: discretization required
  • Normal mixture approach: good for continuous
    variables
  • Incremental algorithm: good for complex data
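A single bivariate normal is the one-component special case of a normal mixture, and its mutual information has the closed form I = -(1/2) ln(1 - rho^2). The sketch below plugs in the sample correlation; it illustrates that special case only, not DAVIS's mixture or incremental algorithm, and it fails when |r| = 1 (the MI is then infinite). Function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(xs, ys):
    """MI in nats under a bivariate normal assumption: -1/2 ln(1 - r^2)."""
    r = pearson_r(xs, ys)
    return -0.5 * math.log(1 - r * r)
```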

29
Variable selection with DAVIS - design layout
30
Variable selection - Titanic data
  • Variable ranking: sex, class, age
  • Subset selection: age, class

31
Concluding remarks
  • DAVIS is a Java-based system.
  • Any statistical model can be added to the system
    as a visual component if it follows certain
    rules.
  • A more efficient design layout is needed for the
    various variable-selection strategies.
  • Easier-to-understand terminology is needed for
    the various elements of the component.