Explaining High-Dimensional Data - PowerPoint PPT Presentation

Provided by: wor118
Transcript and Presenter's Notes

Title: Explaining High-Dimensional Data


1
Explaining High-Dimensional Data
  • Hoa Nguyen, Rutgers University
  • Mentors: Ofer Melnik, Kobbi Nissim

2
High-Dimensional Data
  • A great deal of data from different domains
    (medicine, finance, science) is high-dimensional.
  • High-dimensional data is hard to visualize and
    understand.

3
Example
  • Given a set of images (represented by 8x8 pixel
    matrices), we could consider each image as a
    point in 64-dimensional space.
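
A minimal sketch of this representation, assuming NumPy; the pixel values and the names `image`, `point`, and `data` are made up for illustration and are not from the talk:

```python
import numpy as np

# Hypothetical 8x8 grayscale image; the pixel values are placeholders.
image = np.arange(64, dtype=float).reshape(8, 8)

# Reading the pixels row by row turns the image into one point in 64-D space.
point = image.reshape(-1)

# A set of n images becomes an (n, 64) data matrix, one row per image.
images = np.stack([image, image.T])
data = images.reshape(len(images), -1)

print(point.shape, data.shape)
```

Pixel (row r, column c) ends up at coordinate 8*r + c of the point, so no information is lost by flattening.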

4
Analyzing Data
  • Instead of visualizing, we find properties of
    data to describe it.
  • Statistics, Data Mining, Machine Learning
  • Finding properties of data directly
  • Finding models to capture data

5
Classifier
  • Given a set of data points with assigned labels.
  • Build a model for the data.
  • Use this model to label new unclassified data
    points.

6
The Geometrical View of Classifiers
  • Example
  • Given a set of data points in two-dimensional space, with a + or -
    label for each point (x_i, y_i)
  • [Figure: points in the plane marked + and -]
  • We are interested in classifiers that take all the points of a
    class and enclose them in a convex shape.

7
Goal
  • To understand the properties of the data
    enclosed by a convex shape.

8
Convex Shapes
  • The convex hull
  • The convex hull C of a set of points is the smallest convex set
    that includes all the points.
  • Problem
  • It is difficult to study the convex hull directly.
  • Our solution
  • Instead of looking at the convex hull, we use a simpler convex
    shape to approximate the hull: the ellipsoid.

9
MVE (Minimum Volume Ellipsoid)
  • Example
  • Advantage
  • Use an ellipsoid to approximate the convex region, or to bound the
    geometry of the convex hull.

10
Using MVE to approximate the convex hull
  • John (1948) showed that if we shrink the minimum volume outer
    ellipsoid of a convex set C by a factor k about its center, we
    obtain an ellipsoid contained in C.
  • (k is the dimension of the space)
  • Example
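
One instance that can be checked numerically, offered as an illustration rather than as the example from the talk: take C to be the square [-1, 1]^2, whose minimum volume outer ellipsoid is the circumscribed circle of radius sqrt(2). The sketch below assumes NumPy; all variable names are hypothetical.

```python
import numpy as np

k = 2  # dimension of the space

# C is the square [-1, 1]^2. Its minimum volume outer ellipsoid is the
# circumscribed circle of radius sqrt(2), centered at the origin.
outer_radius = np.sqrt(2.0)
inner_radius = outer_radius / k  # shrink by the factor k about the center

# Sample the boundary of the shrunk ellipsoid (here a circle).
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
shrunk_boundary = inner_radius * np.column_stack([np.cos(angles), np.sin(angles)])

# A point lies in the square exactly when both coordinates are in [-1, 1].
inside_square = np.all(np.abs(shrunk_boundary) <= 1.0, axis=1)
print(inside_square.all())
```

The shrunk circle has radius sqrt(2)/2, about 0.707, so every sampled boundary point passes the test. For this particular square the inscribed circle of radius 1 would also fit, which shows that the factor k is a guarantee and not always tight.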











11
Calculate the MVE
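  • The MVE is described by the equation
  • $v^T \Sigma^{-1} v = k$
  • $V = \{v_1, v_2, \ldots, v_h\} \subset \mathbb{R}^k$
  • v is an exterior point (a point on the surface of the ellipsoid)
  • $\Sigma$: the scatter matrix
  • $\Sigma = \sum_{i=1}^{h} w_i v_i v_i^T$
  • the eigenvectors of $\Sigma$ correspond to the directions of the
    ellipsoid axes.
  • the eigenvalues of $\Sigma$ determine the half-lengths (radii) of
    the axes: an eigenvalue $\lambda$ gives a half-length of
    $\sqrt{k \lambda}$.
  • k: a constant equal to the dimension of the space
  • $w_i$: the weight of a point $v_i$.
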
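
A small numerical check of the ellipsoid equation, assuming NumPy and a hand-picked set of four points with equal weights (the points, weights, and variable names are illustrative and not taken from the talk). It builds the scatter matrix as the weighted sum of outer products, evaluates v^T Sigma^{-1} v for each point, and reads the axes off the eigen-decomposition.

```python
import numpy as np

# Hypothetical inputs: four points in the plane (k = 2) with equal weights.
points = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
weights = np.full(len(points), 0.25)
k = points.shape[1]

# Scatter matrix: Sigma = sum_i w_i v_i v_i^T
scatter = (points * weights[:, None]).T @ points

# Left-hand side of the ellipsoid equation, v^T Sigma^{-1} v, for every point.
quadratic_form = np.einsum("ij,jk,ik->i", points, np.linalg.inv(scatter), points)

# Eigenvectors give the axis directions; sqrt(k * eigenvalue) gives the
# half-length along each axis.
eigenvalues, eigenvectors = np.linalg.eigh(scatter)
half_lengths = np.sqrt(k * eigenvalues)

print(quadratic_form)  # each entry equals k for a point on the surface
print(half_lengths)
```

Here the scatter matrix is diag(2, 0.5), every point satisfies the equation with value k = 2, and the half-lengths come out as 1 and 2, matching the ellipse x^2/4 + y^2 = 1 through the four points.
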
12
Calculate the MVE (cont.)
  • Titterington (1978)
  • An algorithm to calculate the weights of the MVE
  • A point has a positive weight only if it lies on the surface of
    the ellipsoid.
  • At least k + 1 points have non-zero weights.
  • At most k(k + 3)/2 + 1 points have non-zero weights.
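
A minimal sketch of one way to compute such weights, assuming NumPy. It uses a multiplicative update of the kind found in the optimal-design literature associated with Titterington; whether this matches the exact algorithm used in the talk is an assumption. The sketch also assumes the ellipsoid is centered at the origin and that the points span the whole space. The function name `mve_weights` and the sample points are hypothetical.

```python
import numpy as np

def mve_weights(points, iterations=500):
    """Multiplicative weight update for an origin-centered enclosing ellipsoid.

    Assumes the points span R^k so that the scatter matrix is invertible.
    """
    v = np.asarray(points, dtype=float)
    h, k = v.shape
    w = np.full(h, 1.0 / h)  # start from uniform weights
    for _ in range(iterations):
        scatter = (v * w[:, None]).T @ v  # sum_i w_i v_i v_i^T
        # d_i = v_i^T scatter^{-1} v_i for every point
        d = np.einsum("ij,ij->i", v @ np.linalg.inv(scatter), v)
        # Points with d_i > k gain weight, points with d_i < k lose weight.
        # The weights keep summing to 1 because sum_i w_i d_i = k.
        w = w * d / k
    return w

# Four extreme points plus two interior points (made-up data).
points = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0],
                   [0.5, 0.2], [-0.3, -0.4]])
w = mve_weights(points)
support = np.flatnonzero(w > 1e-8)  # indices of points with non-zero weight
print(support, np.round(w, 4))
```

On this data the weights of the two interior points shrink toward zero and the four extreme points share the weight equally, so only points on the ellipsoid surface keep a non-zero weight. This update is simple but can converge slowly in high dimensions, so the iteration count here is a placeholder rather than a recommendation.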

13
Use MVE for data analysis
  1. Finding extreme points by looking at points on
    the ellipsoid surface.
  2. Finding the subspace of data by looking at the
    directions where the ellipsoid is thin.

14
Points on the surface of the ellipsoid
  • i.e., points with non-zero weights.
  • Example: in our handwritten digit file, there are 376 points that
    belong to class 0. Using the MVE, we find that about 178 points
    have non-zero weights, i.e., these points lie on the surface of
    the MVE.
  • [Figures: the mean zero; some 0 points on the surface of the MVE]

15
Directions where the ellipsoid is thin
  • The directions and sizes of an ellipsoid's axes correspond to the
    eigenvectors and eigenvalues of its scatter matrix.
  • Direction of thinness: a short axis defines a direction in which
    the data does not extend.
  • If V is an eigenvector with eigenvalue zero, then it defines a
    constraint that every data point x satisfies: $V \cdot x = 0$
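
A minimal sketch of reading such constraints off a scatter matrix, assuming NumPy and synthetic data. Uniform weights stand in for the MVE weights here: for any strictly positive weights the null space of the weighted scatter matrix is the set of directions orthogonal to every data point, so the constraints found are the same. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 points in 3-D that all satisfy x0 + x1 - 2*x2 = 0.
free = rng.normal(size=(50, 2))
data = np.column_stack([free[:, 0], free[:, 1], (free[:, 0] + free[:, 1]) / 2.0])

# Weighted scatter matrix with uniform weights (a stand-in for MVE weights).
weights = np.full(len(data), 1.0 / len(data))
scatter = (data * weights[:, None]).T @ data

# eigh returns eigenvalues in ascending order, eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(scatter)
tol = 1e-10 * eigenvalues.max()
null_vectors = eigenvectors[:, eigenvalues < tol]

# Every data point x should satisfy V . x = 0 for each null vector V.
residuals = data @ null_vectors
print(null_vectors.shape, np.abs(residuals).max())
```

The single null vector recovered here is proportional to (1, 1, -2), which is the linear constraint built into the data. The relative tolerance `tol` is an arbitrary choice for this sketch; real data would need a threshold suited to its noise level.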

16
A simple Null Space
  • Any basis for the Null Space of the scatter
    matrix is an equivalent set of constraints.
  • In order to understand the data, we would like to
    find constraints that are easy to interpret.
  • Goal: simplify the null space basis, e.g., find a basis with many
    zeros.

17
The Null Space Problem (NSP)
  • The Null Space Problem is defined as finding a null space basis
    with the maximal number of zeros.
  • It is an NP-hard problem (Pothen, Coleman 1986).
  • An approach
  • Find a heuristic algorithm to simplify an existing basis of the
    null space of class 0, e.g., use Gaussian elimination to get a
    null space basis with more zero components.
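
A minimal sketch of the Gaussian elimination heuristic, assuming NumPy. It row-reduces a basis whose vectors are stored as rows; the reduced rows span the same space, so they express the same constraints. The function name `reduced_row_echelon` and the sample basis are hypothetical, and the result is not guaranteed to have the maximal number of zeros.

```python
import numpy as np

def reduced_row_echelon(basis, tol=1e-10):
    """Row-reduce a basis (one vector per row) with partial pivoting."""
    m = np.array(basis, dtype=float)
    rows, cols = m.shape
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Choose the largest entry in this column as the pivot.
        best = pivot_row + int(np.argmax(np.abs(m[pivot_row:, col])))
        if abs(m[best, col]) < tol:
            continue  # no usable pivot in this column
        m[[pivot_row, best]] = m[[best, pivot_row]]
        m[pivot_row] /= m[pivot_row, col]
        # Clear the pivot column in every other row.
        for r in range(rows):
            if r != pivot_row:
                m[r] -= m[r, col] * m[pivot_row]
        pivot_row += 1
    m[np.abs(m) < tol] = 0.0
    return m

# A dense orthonormal basis for the constraints x0 = x1 and x2 = x3.
dense_basis = np.array([[0.5, -0.5, 0.5, -0.5],
                        [0.5, -0.5, -0.5, 0.5]])
simple_basis = reduced_row_echelon(dense_basis)
print(simple_basis)
```

The dense basis has no zero entries, while the reduced basis has rows (1, -1, 0, 0) and (0, 0, 1, -1), which can be read directly as x0 = x1 and x2 = x3.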

18
  • The null space basis: the set of eigenvectors with zero
    eigenvalues
  • The null space basis after using Gaussian elimination

19
Summary
  • Data analysis
  • Classifier with convex shape
  • MVE
  • Points on the surface of the MVE
  • Simple basis for the null space