Functional Nonparametric Unsupervised Classification

Slides: 18
Provided by: anttiso


1
Functional Nonparametric Unsupervised
Classification
  • T-61.6030 Functional Data Analysis
  • Antti Sorjamaa
  • Time Series Prediction and Chemoinformatics Group
  • Laboratory of Computer and Information Science
  • Helsinki University of Technology

2
Outline
  • Mean, Median and Mode
  • Measuring Heterogeneity
  • Building a Partition
  • Algorithm
  • Example
  • Comments and Conclusions

3
Mean, Median and Mode
  • Well-known statistics for describing the
    data
  • The mean is not robust; the median and mode
    are less affected
  • Using these three statistics requires defining
    a metric (or semi-metric)
  • The mode is taken as a local maximum rather than
    the global one
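
A minimal sketch of how the three statistics could be computed for a sample of discretised curves once a semi-metric is fixed. Everything here is an assumption made for illustration: the function names, the finite-difference semi-metric, the median taken as the sample curve with the smallest total distance to the others, and the mode taken as the sample curve with the most neighbours within a bandwidth `h`.

```python
import numpy as np

def deriv_semimetric(x, y, order=2):
    """L2 norm of the difference between the order-th finite differences.
    It is only a semi-metric: two curves that differ by a straight line
    are at distance zero when order=2."""
    return float(np.sqrt(np.sum((np.diff(x, n=order) - np.diff(y, n=order)) ** 2)))

def pairwise(curves, dist):
    """Symmetric matrix of distances between all sample curves."""
    n = len(curves)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(curves[i], curves[j])
    return D

def functional_mean(curves):
    """Pointwise average of the curves."""
    return np.mean(curves, axis=0)

def functional_median(curves, D):
    """Sample curve with the smallest sum of distances to the others."""
    return curves[np.argmin(D.sum(axis=1))]

def functional_mode(curves, D, h):
    """Sample curve with the largest share of neighbours within distance h."""
    return curves[np.argmax((D <= h).mean(axis=1))]
```

With one extreme curve in the sample, the mean is pulled towards it, while the median and mode defined this way remain ordinary sample curves.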

4
Mean Example
Speech data. The solid line is the mean and the
other lines are the means of the subgroups. The global
mean does not represent much of anything.
5
Mean Example (2)
Spectrometric data. The solid line is the mean
and the other lines are the means of the
subgroups. The global mean is close to the submeans, apart from a
vertical shift.
6
Measuring Heterogeneity
  • Heterogeneity Index
  • Subsampled Heterogeneity Index (close to
    bootstrapping)
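
The slide names the two indices without their definitions. A minimal sketch, assuming the heterogeneity index compares the mean and the mode of a sample as d(mean, mode) / (d(mean, 0) + d(mode, 0)), and that the subsampled version averages this index over random subsamples drawn without replacement. The normalisation, the subsample fraction and all function names are assumptions, not taken from the slides.

```python
import numpy as np

def l2(x, y):
    """Plain L2 distance, used here as a stand-in for any (semi-)metric."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def sample_mode(curves, dist, h):
    """Sample curve with the largest share of neighbours within distance h."""
    D = np.array([[dist(a, b) for b in curves] for a in curves])
    return curves[np.argmax((D <= h).mean(axis=1))]

def heterogeneity_index(curves, dist, h):
    """Assumed form: d(mean, mode) / (d(mean, 0) + d(mode, 0))."""
    mean = curves.mean(axis=0)
    mode = sample_mode(curves, dist, h)
    zero = np.zeros_like(mean)
    denom = dist(mean, zero) + dist(mode, zero)
    return dist(mean, mode) / denom if denom > 0 else 0.0

def subsampled_hi(curves, dist, h, n_sub=20, frac=0.8, seed=0):
    """Average the index over n_sub random subsamples (no replacement)."""
    rng = np.random.default_rng(seed)
    n = len(curves)
    m = max(2, int(frac * n))
    values = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=m, replace=False)
        values.append(heterogeneity_index(curves[idx], dist, h))
    return float(np.mean(values))
```

A sample of identical curves gives an index of zero, and a sample made of two unequal clusters gives a positive value, because the mean sits between the clusters while the mode stays in the larger one.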

7
Splitting Score
  • Partitioning Heterogeneity Index
  • Splitting Score
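
A minimal sketch of how the two quantities on this slide could fit together, assuming the partitioning index is the size-weighted average of the subsampled indices of the subgroups, and the splitting score is the relative drop in heterogeneity from the parent sample to the partition. Both formulas and both function names are assumptions made for illustration.

```python
def partitioning_hi(sizes, shi_values):
    """Size-weighted average of the subgroup heterogeneity indices."""
    total = sum(sizes)
    return sum(n * v for n, v in zip(sizes, shi_values)) / total

def splitting_score(shi_parent, sizes, shi_children):
    """Relative reduction in heterogeneity obtained by the split."""
    if shi_parent == 0:
        return 0.0  # nothing to gain from splitting a homogeneous sample
    return (shi_parent - partitioning_hi(sizes, shi_children)) / shi_parent
```

Under these assumptions a score near 1 means the subgroups are far more homogeneous than the parent, and a score near 0 means the split gained nothing.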

8
Building a Partition
  • Small Ball Probability and Probability Curves
  • The sample is partitioned simply by assigning each
    probability curve to its closest mode
  • The optimal bandwidth is selected by minimizing the
    entropy, which maximizes the heterogeneity
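
A rough sketch of the two ingredients named above, under loud assumptions: the small ball probability of a curve is estimated as the share of sample curves within distance `h` of it, and the entropy is the Shannon entropy of a histogram of those estimated probabilities. The histogram estimator, the bin count and the function names are illustrative choices, not the method from the slides.

```python
import numpy as np

def small_ball_probs(D, h):
    """Share of sample curves within distance h of each curve (self included).
    D is the matrix of pairwise distances under the chosen semi-metric."""
    return (D <= h).mean(axis=1)

def histogram_entropy(p, bins=10):
    """Shannon entropy of the histogram of the estimated probabilities."""
    counts, _ = np.histogram(p, bins=bins, range=(0.0, 1.0))
    q = counts[counts > 0] / counts.sum()
    return float(-np.sum(q * np.log(q)))
```

A bandwidth search would evaluate `histogram_entropy(small_ball_probs(D, h))` over a grid of `h` values. With this particular estimator, a very large `h` makes every probability equal to 1 and the entropy trivially zero, so the grid would have to exclude such degenerate values.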

9
Building a Partition (2)
10
Algorithm
  • Step 1
  • If there is only 1 mode → stop
  • Step 2
  • Split the sample and compute SC
  • If SC > t, go to Step 1 for each subclass
  • Step 3: Stop
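
The steps above can be sketched as a short recursion. The mode counting, the split and the score are passed in as functions because the slides do not spell them out. The `toy_*` helpers below are hypothetical stand-ins that work on plain numbers rather than curves, so that the control flow can be run.

```python
def classify(sample, count_modes, split, score, threshold):
    """Recursively split `sample`; returns the list of final groups."""
    # Step 1: a unimodal sample is not split at all.
    if count_modes(sample) <= 1:
        return [sample]
    # Step 2: split the sample and compute the splitting score.
    parts = split(sample)
    if len(parts) < 2 or score(sample, parts) <= threshold:
        return [sample]  # Step 3: the split is not worthwhile, stop.
    groups = []
    for part in parts:  # back to Step 1 for each subclass
        groups.extend(classify(part, count_modes, split, score, threshold))
    return groups

# Hypothetical stand-ins on scalars, for illustration only.
def toy_count_modes(xs, gap=1.0):
    xs = sorted(xs)
    return 1 + sum(b - a > gap for a, b in zip(xs, xs[1:]))

def toy_split(xs):
    xs = sorted(xs)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    k = gaps.index(max(gaps)) + 1  # cut at the widest gap
    return [xs[:k], xs[k:]]

def toy_score(xs, parts):
    def spread(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v)
    total = spread(xs)
    return (total - sum(spread(p) for p in parts)) / total if total > 0 else 0.0

sample = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0, 10.2]
groups = classify(sample, toy_count_modes, toy_split, toy_score, threshold=0.5)
```

Raising the threshold makes the procedure more conservative: a split is kept only when it reduces the score's notion of heterogeneity by more than the threshold.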

11
Example
12
Example (2)
  • Semi-metric based on the second derivative
  • Mean and Mode as the measurements in the
    Heterogeneity Index
  • Then the algorithm is used and the data split
    accordingly

13
Example (3)
The resulting division has 3 groups. Note that
no attempt is made to divide Group 22, because it
is found to be unimodal.
14
Example (4)
15
Conclusions
  • Good results with the spectrometric dataset
  • A better selection of semi-metrics could improve
    the results
  • Metric selection is a problem for all methods
  • The field has only just opened and needs more work
  • Many open questions remain

16
Open Questions
  • Is it possible to get rates of convergence?
  • Convergence of the estimator? Is it strictly
    sample dependent?
  • Fully k-means procedure?

17
  • Questions?
  • Antti.Sorjamaa_at_hut.fi