FODAVA-Lead: Dimension Reduction and Data Reduction: Foundations for Visualization


Learn more at: https://fodava.gatech.edu

Transcript and Presenter's Notes

1
FODAVA-Lead: Dimension Reduction and Data Reduction: Foundations for Visualization

Haesun Park
Division of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
FODAVA Kick-off Meeting, Sep. 2008

2
FODAVA-Lead: Proposed Research

Fundamental challenges: two important constraints on Data and Visual Analytics systems
Speed: necessary for real-time, interactive use. Even back-end data analysis and transformation operations must appear essentially instantaneous to users; massive data size poses challenges.
Screen space: the number of available pixels is a fundamentally limiting constraint.
Effective representation and efficient transformation of large data sets by data reduction and dimension reduction

3
FODAVA-Lead: Research Goals

Development of Fundamental Theory and Algorithms
in Data Representations and Transformations to
enable Visual Understanding
Dimension Reduction
Feature selection by sparse recovery
Manifold learning
Dimension reduction with prior info/interpretability constraints
Data Reduction
Multi-resolution data approximation
Anomaly cleaning and detection
Data Fusion
Fast Algorithms
Large-scale optimization problems/matrix
decompositions
Dynamic and time-varying data
Integration with DAVA systems (e.g., Text Analysis and Jigsaw)

4
Research Interests (H. Park)

Efficient and Effective Numerical Algorithm
Development and Analysis
Algorithms for Massive Data Analysis
Dimension Reduction
Clustering and Classification
Adaptive Methods
Applications
Microarray analysis: gene selection, missing value estimation
Protein structure prediction
Biometric Recognition
Text Analysis

Effective Dimension Reduction with Prior Knowledge
Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD), Orthogonal Centroid Method (OCM), fast adaptive algorithms
Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF)
Applications: Text Classification, Face Recognition, Fingerprint Classification, Gene Clustering in Microarray Analysis

5
2D Representation: Utilize Cluster Structure if Known
[Figure: 2D representation of 700x1000 data with 7 clusters, LDA vs. SVD vs. PCA; panels: LDA+PCA(2), SVD(2), PCA(2)]
6
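Dimension Reduction for Clustered Data (LDA/GSVD)
(Howland, Jeon, Park, SIMAX 03; Howland and Park, TPAMI 04)
Measure for Cluster Quality

$A = [a_1 \ \dots \ a_n] \in \mathbb{R}^{m \times n}$: clustered data
$N_i$: items in class $i$, $|N_i| = n_i$, $r$ classes in total
$c_i$: centroid of class $i$; $c$: global centroid

\[ S_w = \sum_{i=1}^{r} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T \]
\[ S_b = \sum_{i=1}^{r} \sum_{j \in N_i} (c_i - c)(c_i - c)^T \]
\[ S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T \]

High-quality clusters have small $\operatorname{trace}(S_w)$ and large $\operatorname{trace}(S_b)$.
Want $G \in \mathbb{R}^{m \times q}$ such that $\operatorname{trace}(G^T S_w G)$ is minimized and $\operatorname{trace}(G^T S_b G)$ is maximized.

\[ S_w^{-1} S_b x = \lambda x \;\Rightarrow\; S_b x = \lambda S_w x \;\Rightarrow\; \alpha^2 H_b H_b^T x = \beta^2 H_w H_w^T x \]

GSVD: $U^T H_b^T X = D_1$, $V^T H_w^T X = D_2$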
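
The scatter matrices defined on this slide can be computed directly. The following is a minimal NumPy sketch, not the authors' code: the function name `scatter_matrices`, the synthetic data, and the label array are illustrative assumptions. It also checks the standard identity $S_t = S_w + S_b$, which follows from the definitions above.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Return (Sw, Sb, St) for data A (m x n, one item per column).

    Hypothetical helper for illustration; labels[j] is the class of column j.
    """
    labels = np.asarray(labels)
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)            # global centroid c
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]                   # items in class k
        ck = Ak.mean(axis=1, keepdims=True)      # class centroid c_i
        D = Ak - ck
        Sw += D @ D.T                            # sum of (a_j - c_i)(a_j - c_i)^T
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T  # n_i copies of (c_i - c)(c_i - c)^T
    T = A - c
    St = T @ T.T                                 # sum of (a_i - c)(a_i - c)^T
    return Sw, Sb, St

# Synthetic clustered data (assumed for the example): 3 classes, 20 items each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
centers = rng.normal(scale=5.0, size=(8, 3))
A = centers[:, labels] + rng.normal(size=(8, 60))

Sw, Sb, St = scatter_matrices(A, labels)
print("trace(Sw) =", np.trace(Sw), " trace(Sb) =", np.trace(Sb))
```

With well-separated centers, as in this synthetic example, trace(Sb) dominates trace(Sw), which is the "high quality clusters" situation the slide describes.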
7
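QRD Preprocessing in Dim. Reduction (Distance Preserving Dim. Reduction)
For under-sampled data $A \in \mathbb{R}^{m \times n}$, $m \gg n$:

\[ A = Q_1 R = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} \]

$Q_1$: orthonormal basis for range($A$) when rank($A$) $= n$
Dimension reduction of $A$ by $Q_1^T$: $Q_1^T A = R \in \mathbb{R}^{n \times n}$
$Q_1^T$ preserves distance in the $L_2$ norm:
\[ \|a_i\|_2 = \|Q_1^T a_i\|_2, \qquad \|a_i - a_j\|_2 = \|Q_1^T (a_i - a_j)\|_2 \]
and in cosine distance:
\[ \cos(a_i, a_j) = \cos(Q_1^T a_i, Q_1^T a_j) \]

Applicable to PCA, LDA, LDA/GSVD, regLDA, Isomap, LLE, ...
Updating and downdating can be done fast, which is important for iterative visualization.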
8
Speed Up with QRD Preprocessing (computation time)

Data      Dim.          r    LDA/GSVD   regLDA (LDA)   QRLDA/GSVD   QRLDA/regGSVD
Text      5896 x 210    7    48.8       42.2           0.14         0.03
Yale      77760 x 165   15   --         --             0.96         0.22
ATT       10304 x 400   40   --         --             0.07         0.02
Feret     3000 x 130    10   10.9       9.3            0.03         0.01
OptDigit  64 x 5610     10   8.97       9.60           0.02
Isolet    617 x 7797    26   98.1       99.33          6.70

9
LDA for Data with Sub-clusters: Facial Recognition, Cross-Language Processing
[Figure: clusters with sub-clusters; labels: Sports, Technology, English, Korean, Person 1, Person 2, Person 3, Sentiment 1, Sentiment 2]

The unimodal Gaussian assumption for each cluster in LDA may not hold when sub-cluster structure exists.

Sentiment Recognition   PCA     LDA     tensorFaces   Regularized h-LDA
Accuracy (%)            63.53   75.83   69.61         81.95
10
Dimension Reduction for Visualization of Clustered Data
$\max \operatorname{trace}\big((G^T S_w G)^{-1} (G^T S_b G)\big)$ $\rightarrow$ LDA (Fisher 36, Rao 48)
$\max \operatorname{trace}(G^T S_b G)$ $\rightarrow$ Orthogonal Centroid (Park et al. 03)
IN-SPIRE: OC with rank($G$) $= 2$; can be updated easily and nonlinearized
$\max \operatorname{trace}\big(G^T (S_w + S_b) G\big)$ $\rightarrow$ PCA (Hotelling 33)
$\max \operatorname{trace}\big(G^T (A A^T) G\big)$ $\rightarrow$ LSI (Deerwester et al. 90)
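
Two of these trace criteria can be compared with a short sketch. It assumes the constraint $G^T G = I$, which the transcript does not state; under that assumption, maximizing $\operatorname{trace}(G^T M G)$ for a symmetric matrix $M$ is solved by the leading eigenvectors of $M$. The helper names `class_scatter` and `top_eigvecs` and the synthetic data are hypothetical, and this is an illustration of the criteria rather than the IN-SPIRE or Orthogonal Centroid implementation.

```python
import numpy as np

def class_scatter(A, labels):
    """Within- and between-cluster scatter for A (m x n, items as columns)."""
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((m, m)), np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)
        Sw += (Ak - ck) @ (Ak - ck).T
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T
    return Sw, Sb

def top_eigvecs(M, q):
    """Orthonormal G (m x q) maximizing trace(G^T M G) for symmetric M."""
    w, V = np.linalg.eigh(M)                  # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:q]           # indices of the q largest
    return V[:, order]

# Synthetic data (assumed): 3 clusters of 25 items in 10 dimensions.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 25)
A = 4.0 * rng.normal(size=(10, 3))[:, labels] + rng.normal(size=(10, 75))

Sw, Sb = class_scatter(A, labels)
G_oc = top_eigvecs(Sb, 2)                     # max trace(G^T Sb G), rank(G) = 2
G_pca = top_eigvecs(Sw + Sb, 2)               # max trace(G^T (Sw + Sb) G)

Y_oc = G_oc.T @ A                             # 2 x n coordinates for plotting
Y_pca = G_pca.T @ A
print(np.trace(G_oc.T @ Sb @ G_oc), np.trace(G_pca.T @ Sb @ G_pca))
```

With three clusters, $S_b$ has rank at most two, so the two-dimensional projection built from $S_b$ retains all of the between-cluster scatter, whereas the PCA projection generally does not.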
11
Nonlinear Discriminant Analysis by Kernel Functions
[Figure: kernel mapping $\Phi$ and the resulting 2D representation; fingerprint classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch]
Construction of Directional Images by DFT
1. Compute directionality in local neighborhood by FFT
2. Compute the dominant direction
3. Find core point for unified centering of fingerprints within the same class
12
Fingerprint Classification Results on NIST Fingerprint Database 4
(C. Park and H. Park, Pattern Recognition, 06)
KDA/GSVD: Nonlinear Extension of LDA/GSVD based on Kernel Functions

Rejection rate (%)           0      1.8    8.5
KDA/GSVD                     90.7   91.3   92.8
kNN + NN (Jain et al., 99)   -      90.0   91.2
SVM (Yao et al., 03)         -      90.0   92.2

4000 fingerprint images of size 512x512
By KDA/GSVD, dimension reduced from 105x105 to 4
13
Nonnegativity Preserving Dim. Reduction: Nonnegative Matrix Factorization
(Paatero & Tapper 94, Lee & Seung NATURE 99, Pauca et al. SIAM DM 04, Hoyer 04, Lin 05, Berry 06, Kim and Park 06 Bioinformatics, Kim and Park 08 SIAM Journal on Matrix Analysis and Applications, ...)

\[ A \approx W H \]
\[ \min_{W \ge 0,\ H \ge 0} \| A - W H \|_F \]

Why Nonnegativity Constraints?

Better Approx. vs. Better Representation/Interpretation
Nonnegative Constraints often physically
meaningful
Interpretation of analysis results possible
Fastest Algorithm for NMF, with theoretical
convergence
Can be used as a clustering algorithm
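
To make the NMF objective concrete, here is a minimal sketch that uses the multiplicative update rules of Lee and Seung. This is a swapped-in, simple technique chosen for brevity; it is not the fast algorithm the slide refers to. The function name `nmf_multiplicative`, the rank `k`, the iteration count, and the synthetic data are assumptions for the example.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=300, eps=1e-9, seed=0):
    """Approximate nonnegative A (m x n) as W @ H with W, H >= 0.

    Lee-Seung multiplicative updates for min ||A - WH||_F (illustrative only).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

# Synthetic nonnegative data of rank 3 (assumed for the example).
rng = np.random.default_rng(0)
A = rng.random((20, 3)) @ rng.random((3, 30))

W1, H1 = nmf_multiplicative(A, k=3, iters=1)
W, H = nmf_multiplicative(A, k=3, iters=300)
err_start = np.linalg.norm(A - W1 @ H1)
err_end = np.linalg.norm(A - W @ H)

# Clustering use: assign each column of A to its largest coefficient in H.
cluster = H.argmax(axis=0)
print(err_start, err_end, cluster[:10])
```

Because both factors stay nonnegative, each column of A is expressed as an additive combination of the columns of W, which is what makes the factors interpretable and lets the rows of H act as soft cluster memberships.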

14
How this research will influence FODAVA

Better Representation and Transformation of Data: improved theory and methods that more accurately incorporate prior knowledge
Capacity to Process More Data Faster: fast and scalable algorithms that can represent and transform larger data sets in shorter time
Improved Visual Interaction Capability: fast algorithms for efficient handling of dynamic and transient data
Information Synthesis: visual representation of information of different types on one map

15
Developing New Understanding

Dimension reduction in DAVA requires new modeling, optimization criteria, and algorithms.
Design efficient and effective algorithms for data representation and transformation.
Balance between speed and accuracy.
Will say more about community building plans tomorrow. Thank you!
