FODAVA-Lead: Dimension Reduction and Data Reduction: Foundations for Visualization

1
FODAVA-Lead: Dimension Reduction and Data
Reduction: Foundations for Visualization
  • Haesun Park
  • Division of Computational Science and Engineering
  • College of Computing
  • Georgia Institute of Technology
  • FODAVA Kick-off Meeting, Sep. 2008

2
FODAVA-Lead: Proposed Research
  • Fundamental challenges: two important constraints
    on Data and Visual Analytics systems
  • Speed: necessary for real-time, interactive use
  • Even back-end data analysis and transformation
    operations must appear essentially instantaneous
    to users; massive data size makes this challenging
  • Screen space: the number of available pixels is a
    fundamentally limiting constraint
  • Effective representation and efficient
    transformation of large data sets by data
    reduction and dimension reduction

3
FODAVA-Lead: Research Goals
  • Development of Fundamental Theory and Algorithms
    in Data Representations and Transformations to
    enable Visual Understanding
  • Dimension Reduction
  • Feature selection by sparse recovery
  • Manifold learning
  • Dimension reduction with prior info/interpretability
    constraints
  • Data Reduction
  • Multi-resolution data approximation
  • Anomaly cleaning and detection
  • Data Fusion
  • Fast Algorithms
  • Large-scale optimization problems/matrix
    decompositions
  • Dynamic and time-varying data
  • Integration with DAVA systems (e.g., Text Analysis
    and Jigsaw)

4
Research Interests (H. Park)
  • Efficient and Effective Numerical Algorithm
    Development and Analysis
  • Algorithms for Massive Data Analysis
  • Dimension Reduction
  • Clustering and Classification
  • Adaptive Methods
  • Applications
  • Microarray analysis: gene selection, missing
    value estimation
  • Protein structure prediction
  • Biometric Recognition
  • Text Analysis
  • Effective Dimension Reduction with Prior
    Knowledge
  • Dimension Reduction for Clustered Data:
    Linear Discriminant Analysis (LDA), Generalized
    LDA (LDA/GSVD), Orthogonal
    Centroid Method (OCM), fast adaptive algorithms
  • Dimension Reduction for Nonnegative Data:
    Nonnegative Matrix Factorization (NMF)
  • Applications: Text Classification, Face
    Recognition, Fingerprint Classification, Gene
    Clustering in Microarray Analysis

5
2D Representation: Utilize Cluster Structure if
Known
[Figure: three panels labeled LDA+PCA(2), SVD(2), PCA(2)]
2D representation of 700x1000 data with 7
clusters: LDA vs. SVD vs. PCA
6
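Dimension Reduction for Clustered Data (LDA/GSVD)
(Howland, Jeon, Park, SIMAX 03; Howland & Park,
TPAMI 04)
Measure for Cluster Quality
  • $A = [a_1 \ \dots \ a_n] \in \mathbb{R}^{m \times n}$: clustered data
  • $N_i$: items in class $i$, $|N_i| = n_i$, total $r$ classes
  • $c_i$: centroid of class $i$; $c$: global centroid

$$S_w = \sum_{i=1}^{r} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T$$
$$S_b = \sum_{i=1}^{r} \sum_{j \in N_i} (c_i - c)(c_i - c)^T$$
$$S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T$$
High quality clusters have small $\operatorname{trace}(S_w)$ and large $\operatorname{trace}(S_b)$.
Want $G \in \mathbb{R}^{m \times q}$ s.t. $\min \operatorname{trace}(G^T S_w G)$ and $\max \operatorname{trace}(G^T S_b G)$
$$S_w^{-1} S_b x = \lambda x \;\Rightarrow\; S_b x = \lambda S_w x \;\Rightarrow\; \alpha^2 H_b H_b^T x = \beta^2 H_w H_w^T x$$
GSVD: $U^T H_b^T X = D_1$, $V^T H_w^T X = D_2$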
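The scatter matrices on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `scatter_matrices`, the synthetic data, and the label vector are hypothetical and do not come from the presentation. Items are stored as columns of A, as on the slide.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Return (Sw, Sb, St) for data A (m x n), one item per column."""
    labels = np.asarray(labels)
    m, _ = A.shape
    c = A.mean(axis=1, keepdims=True)            # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]                   # items in class k
        ck = Ak.mean(axis=1, keepdims=True)      # class centroid
        Sw += (Ak - ck) @ (Ak - ck).T            # within-cluster scatter
        # The inner sum over j in N_k repeats the same term n_k times.
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T
    St = (A - c) @ (A - c).T                     # total scatter
    return Sw, Sb, St

# Hypothetical synthetic data: 3 clusters of 20 items in 4 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
centers = rng.normal(scale=5.0, size=(4, 3))
A = centers[:, labels] + rng.normal(size=(4, 60))

Sw, Sb, St = scatter_matrices(A, labels)
identity_holds = np.allclose(St, Sw + Sb)        # St = Sw + Sb
print(np.trace(Sw), np.trace(Sb), identity_holds)
```

The check St = Sw + Sb is a standard identity for these definitions; it is why the PCA criterion can be written with Sw + Sb in place of St.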
7
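QRD Preprocessing in Dim. Reduction (Distance
Preserving Dim. Reduction)
For under-sampled data $A \in \mathbb{R}^{m \times n}$, $m \gg n$:
$$A = Q_1 R = [\,Q_1 \ \ Q_2\,] \begin{bmatrix} R \\ 0 \end{bmatrix}$$
$Q_1$: orthonormal basis for range($A$) when rank($A$) $= n$
Dimension reduction of $A$ by $Q_1^T$: $Q_1^T A = R \in \mathbb{R}^{n \times n}$
$Q_1^T$ preserves distance
in $L_2$ norm: $\|a_i\|_2 = \|Q_1^T a_i\|_2$, $\|a_i - a_j\|_2 = \|Q_1^T (a_i - a_j)\|_2$
in cos distance: $\cos(a_i, a_j) = \cos(Q_1^T a_i, Q_1^T a_j)$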
  • Applicable to PCA, LDA, LDA/GSVD, regLDA,
    Isomap, LLE,
  • Updating and Downdating can be done fast,
    important for iterative vis.
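The distance-preservation claim is easy to check numerically. The sketch below assumes a random full-rank matrix as stand-in data (the sizes and the column indices i, j are arbitrary choices, not from the slides) and uses NumPy's reduced QR factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 30                               # under-sampled case: m >> n
A = rng.normal(size=(m, n))                   # hypothetical data, full column rank

Q1, R = np.linalg.qr(A, mode="reduced")       # Q1 is m x n, R is n x n
Z = Q1.T @ A                                  # reduced data, equal to R

def cosine(x, y):
    """Cosine of the angle between two vectors."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

i, j = 3, 17
norm_ok = np.isclose(np.linalg.norm(A[:, i]), np.linalg.norm(Z[:, i]))
dist_ok = np.isclose(np.linalg.norm(A[:, i] - A[:, j]),
                     np.linalg.norm(Z[:, i] - Z[:, j]))
cos_ok = np.isclose(cosine(A[:, i], A[:, j]), cosine(Z[:, i], Z[:, j]))
print(Z.shape, norm_ok, dist_ok, cos_ok)
```

Any method that depends only on norms, pairwise distances, or angles between the data items can then run on the n x n matrix R instead of the m x n matrix A.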

8
Speed Up with QRD Preprocessing (computation time)

Data      Dim.         r   LDA/GSVD  regLDA (LDA)  QR+LDA/GSVD  QR+LDA/regGSVD
Text      5896 x 210    7  48.8      42.2          0.14         0.03
Yale      77760 x 165  15  --        --            0.96         0.22
ATT       10304 x 400  40  --        --            0.07         0.02
Feret     3000 x 130   10  10.9      9.3           0.03         0.01
OptDigit  64 x 5610    10  8.97      9.60          0.02
Isolet    617 x 7797   26  98.1      99.33         6.70

9
LDA for Data with Sub-clusters: Facial
Recognition, Cross-Language Processing
[Figure labels: Sports, Technology, English, Korean; Person 1, Person 2, Person 3, Sentiment 1, Sentiment 2]
  • Unimodal Gaussian assumption for each cluster
    in LDA may not hold when sub-cluster structure
    exists.

Sentiment Recognition  PCA    LDA    tensorFaces  Regularized h-LDA
Accuracy (%)           63.53  75.83  69.61        81.95
10
Dimension Reduction for Visualization of
Clustered Data
$\max \operatorname{trace}((G^T S_w G)^{-1} (G^T S_b G))$ → LDA (Fisher 36, Rao 48)
$\max \operatorname{trace}(G^T S_b G)$ → Orthogonal Centroid (Park et al. 03)
    IN-SPIRE: OC with rank($G$) = 2; can be updated easily and nonlinearized
$\max \operatorname{trace}(G^T (S_w + S_b) G)$ → PCA (Hotelling 33)
$\max \operatorname{trace}(G^T (A A^T) G)$ → LSI (Deerwester et al. 90)
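A small sketch can make the relationship between these criteria concrete. It assumes the usual orthonormality constraint G^T G = I for the Orthogonal Centroid and PCA criteria (the transcript does not show a constraint, so this is an assumption), in which case the maximizer is given by leading eigenvectors. The helper names and the synthetic data are hypothetical, and the Orthogonal Centroid projection is computed here by an eigendecomposition of S_b rather than by the authors' algorithm.

```python
import numpy as np

def top_eigvecs(S, q):
    """Orthonormal G (m x q) maximizing trace(G^T S G) for symmetric S."""
    _, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
    return vecs[:, ::-1][:, :q]                 # keep the q leading eigenvectors

# Hypothetical data: 3 clusters of 30 items in 10 dimensions (items are columns).
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)
centers = rng.normal(scale=4.0, size=(10, 3))
A = centers[:, labels] + rng.normal(size=(10, 90))

c = A.mean(axis=1, keepdims=True)                                  # global centroid
C = np.stack([A[:, labels == k].mean(axis=1) for k in range(3)], axis=1)
Hb = (C - c) * np.sqrt(np.bincount(labels))                        # Sb = Hb Hb^T
Sb = Hb @ Hb.T
St = (A - c) @ (A - c).T
Sw = St - Sb                                                       # since St = Sw + Sb

G_oc = top_eigvecs(Sb, 2)                       # max trace(G^T Sb G)
G_pca = top_eigvecs(St, 2)                      # max trace(G^T (Sw + Sb) G)

# LDA: leading eigenvectors of Sw^{-1} Sb (Sw is nonsingular here because n > m).
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(-vals.real)
G_lda = vecs[:, order[:2]].real

def lda_criterion(G):
    """trace((G^T Sw G)^{-1} (G^T Sb G))"""
    return np.trace(np.linalg.solve(G.T @ Sw @ G, G.T @ Sb @ G))

Y_oc, Y_pca, Y_lda = G_oc.T @ A, G_pca.T @ A, G_lda.T @ A          # 2 x 90 each
print(lda_criterion(G_lda), lda_criterion(G_oc), lda_criterion(G_pca))
```

Each projection is optimal for its own criterion, so comparing the three printed values shows how much cluster separation, as measured by the LDA criterion, the other two projections give up.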
11
Nonlinear Discriminant Analysis by Kernel
Functions
[Figure: feature map Φ to 2D; fingerprint classes Left Loop, Right Loop, Whorl, Arch, Tented Arch]
Construction of Directional Images by DFT
1. Compute directionality in local neighborhood
   by FFT
2. Compute the dominant direction
3. Find core point for unified centering of
   fingerprints within the same class
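Steps 1 and 2 above can be illustrated on a single image block. The slides give no implementation details, so the following is only one plausible sketch, not the authors' method: it takes the 2D FFT of a block, picks the strongest non-DC frequency, and reports the ridge direction as perpendicular to it. The function name `dominant_direction`, the block size, and the synthetic grating are assumptions. Core point detection (step 3) is not covered.

```python
import numpy as np

def dominant_direction(block):
    """Return (freq_angle, ridge_angle) in radians, both in [0, pi)."""
    block = block - block.mean()                  # remove the DC component
    spectrum = np.abs(np.fft.fft2(block))
    ky, kx = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    fy = np.fft.fftfreq(block.shape[0])[ky]       # signed frequency along rows (y)
    fx = np.fft.fftfreq(block.shape[1])[kx]       # signed frequency along columns (x)
    # The spectrum of a real image is symmetric, so angles are taken modulo pi.
    freq_angle = np.mod(np.arctan2(fy, fx), np.pi)
    ridge_angle = np.mod(freq_angle + np.pi / 2, np.pi)   # ridges run across the wave vector
    return freq_angle, ridge_angle

# Synthetic 32 x 32 "ridge" pattern with wave vector (fx, fy) = (3, 4) cycles per block.
N = 32
y, x = np.mgrid[0:N, 0:N]
block = np.cos(2 * np.pi * (3 * x + 4 * y) / N)

freq_angle, ridge_angle = dominant_direction(block)
print(freq_angle, ridge_angle)
```

Applying such a function to each local neighborhood of a fingerprint image would give one direction per block, which is the kind of directional image the slide refers to.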
12
Fingerprint Classification Results on NIST
Fingerprint Database 4
(C. Park and H. Park, Pattern Recognition, 06)
KDA/GSVD: Nonlinear Extension of LDA/GSVD
based on Kernel Functions

Rejection rate (%)        0     1.8   8.5
KDA/GSVD                  90.7  91.3  92.8
kNN+NN (Jain et al., 99)  -     90.0  91.2
SVM (Yao et al., 03)      -     90.0  92.2

4000 fingerprint images of size 512x512. By
KDA/GSVD, dimension reduced from 105x105 to 4.
13
Nonnegativity Preserving Dim. Reduction:
Nonnegative Matrix Factorization
(Paatero & Tapper 94, Lee & Seung, Nature 99, Pauca et
al., SIAM DM 04, Hoyer 04, Lin 05, Berry 06, Kim
and Park 06, Bioinformatics, Kim and Park 08, SIAM
Journal on Matrix Analysis and Applications)
$A \approx W H$
  • $\min \|A - WH\|_F$
  • $W \ge 0$, $H \ge 0$


Why Nonnegativity Constraints?
  • Better Approx. vs. Better Representation/Interpretation
  • Nonnegative Constraints often physically
    meaningful
  • Interpretation of analysis results possible
  • Fastest Algorithm for NMF, with theoretical
    convergence
  • Can be used as a clustering algorithm

14
How this research will influence FODAVA
  • Better Representation and Transformation of
    Data: improved theory and methods that more
    accurately incorporate prior knowledge
  • Capacity to Process More Data Faster: fast and
    scalable algorithms that can represent and
    transform larger data sets in shorter time
  • Improved Visual Interaction Capability: fast
    algorithms for efficient handling of dynamic and
    transient data
  • Information Synthesis: visual representation of
    information of different types on one map

15
Developing New Understanding
  • Dimension reduction in DAVA requires
    new modeling, optimization criteria, algorithms
  • Design efficient and effective algorithms for
    data representation and transformation. Balance
    between speed and accuracy
  • Will address more on community building plans
    tomorrow. Thank you!