1
Classification of Affective States - GP
Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project

Hyungil Ahn (hiahn_at_media.mit.edu)
2
Objective & Dataset
  • Recognize the affective states of a child solving a puzzle
  • Affective Dataset
  • - 1024 features from Face, Posture, and Game
  • - 3 affective states with labels annotated by teachers:
    High interest (61), Low interest (59), Refreshing (16)

3
Task & Approaches
  • Binary classification
  • - High interest (61 samples) vs. Low interest or Refreshing (75 samples)
  • Approaches
  • - Semi-supervised learning with a Gaussian Process (GP)
  • - Support Vector Machine (SVM)
  • - k-Nearest Neighbor (k = 1)
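A minimal sketch of how the three annotated states collapse into the binary task. The label strings and the ±1 coding are hypothetical choices for illustration; only the class counts come from the slides.

```python
from collections import Counter

# Hypothetical label strings; only the class counts come from the slides.
labels = ["high"] * 61 + ["low"] * 59 + ["refreshing"] * 16


def to_binary(state):
    """High interest -> +1; Low interest and Refreshing are merged into -1."""
    return 1 if state == "high" else -1


y = [to_binary(s) for s in labels]
counts = Counter(y)
print(counts[1], counts[-1])  # 61 75
```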

4
GP Semi-Supervised Learning
  • Given the labeled data, predict the labels of the unlabeled points
  • Assume the data and the data generation process:
  • - $X$: inputs, $y$: vector of labels
  • - $t$: vector of hidden soft labels
  • - Each label $y_i \in \{-1, +1\}$ (binary classification)
  • - Final classifier: $y = \mathrm{sign}(t)$
  • Define a similarity function between data points
→ Infer $t$ given $X$ and the observed labels
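The slide does not show the similarity function. The sketch below assumes a Gaussian (RBF) similarity with kernel width σ, the hyperparameter tuned later via the evidence, and turns soft labels into hard labels with the sign rule. The function names and the 2σ² scaling are assumptions, not the author's code.

```python
import math


def rbf_similarity(xi, xj, sigma):
    """Assumed similarity: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def similarity_matrix(X, sigma):
    """Pairwise similarities over all points, labeled and unlabeled alike."""
    return [[rbf_similarity(xi, xj, sigma) for xj in X] for xi in X]


def final_classifier(t):
    """Hard labels y_i = sign(t_i); t_i == 0 is sent to +1 (an arbitrary tie-break)."""
    return [1 if ti >= 0 else -1 for ti in t]


X = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]  # two nearby points and one far away
W = similarity_matrix(X, sigma=1.0)
print(final_classifier([0.7, -0.2, 0.0]))  # [1, -1, 1]
```

Nearby points get similarities close to 1 and distant points close to 0, which is what lets soft labels propagate from labeled to unlabeled neighbors.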
5
GP Semi-Supervised Learning

→ Infer $t$ given $X$ and the labeled data
→ Bayesian model:
  • Prior of the classifier
  • Likelihood of the classifier given the labeled data
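Written out with Bayes' rule, the model can be sketched as below. The subscript $L$ (the labeled points) is notation introduced here, and factorizing the likelihood over labeled points assumes the labels are conditionally independent given $t$:

```latex
p(\mathbf{t} \mid X, \mathbf{y}_L) \;\propto\;
\underbrace{p(\mathbf{t} \mid X)}_{\text{prior}}\,
\underbrace{\prod_{i \in L} p(y_i \mid t_i)}_{\text{likelihood}}
```

The normalizing constant $p(\mathbf{y}_L \mid X) = \int p(\mathbf{y}_L \mid \mathbf{t})\, p(\mathbf{t} \mid X)\, d\mathbf{t}$ is the evidence that is maximized when selecting hyperparameters.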
6
GP Semi-Supervised Learning

→ How to model the prior & the likelihood?
  • The prior: using a GP (soft labels vary smoothly across the data manifold!)
  • The likelihood: the probability of the observed labels given the soft labels $t$
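The slide does not spell out the likelihood. One common choice for GP classification with a labeling error rate ε (a hyperparameter named on the EP slide) is a flip-noise model: the observed label agrees with sign(t_i) with probability 1 − ε. A minimal sketch under that assumption; the function names are hypothetical:

```python
import math


def flip_noise_likelihood(y_i, t_i, eps):
    """Assumed p(y_i | t_i): y_i matches sign(t_i) w.p. 1 - eps, else it was flipped."""
    agrees = (y_i > 0) == (t_i >= 0)
    return 1.0 - eps if agrees else eps


def log_likelihood(y_labeled, t_labeled, eps):
    """Sum of log p(y_i | t_i) over the labeled points (independent given t)."""
    return sum(math.log(flip_noise_likelihood(y, t, eps))
               for y, t in zip(y_labeled, t_labeled))


# Two labels agree with the sign of their soft label; the third disagrees.
ll = log_likelihood([1, -1, 1], [0.8, -0.3, -0.5], eps=0.1)
print(ll)
```

With ε > 0, a single mislabeled annotation lowers the likelihood but cannot drive it to zero, which makes the model tolerant of label noise.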
7
GP Semi-Supervised Learning
  • EP (Expectation Propagation) → approximates the posterior as a Gaussian
  • Select the hyperparameters (kernel width σ, labeling error rate ε) that maximize the evidence!
  • Advantage of using EP → we get the evidence as a by-product
  • EP estimates the leave-one-out predictive performance without performing any expensive cross-validation.
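Implementing EP is beyond a short example, so the sketch below swaps in a simpler model to show only the selection step: it treats the ±1 labels as GP regression targets with Gaussian noise, whose evidence (log marginal likelihood) has a closed form, and grid-searches σ and the noise variance for the highest evidence. The noise variance stands in for the labeling error rate ε; this is not the EP-based method on the slide, and all names are hypothetical.

```python
import math


def rbf_kernel(X, sigma):
    """Assumed RBF covariance: k(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))
             for xj in X] for xi in X]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_evidence(X, y, sigma, noise):
    """Closed-form log p(y | X, sigma, noise) for GP regression (a stand-in for EP)."""
    n = len(y)
    K = rbf_kernel(X, sigma)
    A = [[K[i][j] + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(A)
    # Forward substitution solves L z = y, so that y^T A^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # equals -0.5 * log|A|
    return data_fit + complexity - 0.5 * n * math.log(2 * math.pi)


def select_by_evidence(X, y, sigmas, noises):
    """Grid search: the (sigma, noise) pair with the highest log evidence."""
    return max(((s, e) for s in sigmas for e in noises),
               key=lambda pair: log_evidence(X, y, *pair))


X = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
y = [1, 1, 1, -1, -1, -1]
best = select_by_evidence(X, y, sigmas=[0.1, 0.5, 1.0, 2.0], noises=[0.01, 0.1])
print(best)
```

Replacing `log_evidence` with an EP approximation would leave `select_by_evidence` unchanged: the selection rule only needs some evidence score per hyperparameter setting.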

8
Support Vector Machine
  • OSU SVM toolbox
  • RBF kernel
  • Hyperparameter (C, σ) selection → use leave-one-out validation!
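The Python sketch below is not the OSU SVM toolbox's API. To stay self-contained, it trains a simplified RBF-kernel SVM without a bias term by dual coordinate ascent (standard SVM solvers also fit a bias), then picks (C, σ) by leave-one-out accuracy on the training data. Function names, the epoch count, and the 2σ² kernel scaling are assumptions.

```python
import math


def rbf(xi, xj, sigma):
    """Assumed RBF kernel: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))


def train_svm(X, y, C, sigma, epochs=100):
    """Bias-free kernel SVM fitted by dual coordinate ascent.

    Maximizes sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K_ij subject to 0 <= a_i <= C.
    """
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    a = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(a[j] * y[j] * K[i][j] for j in range(n))
            grad = y[i] * f_i - 1.0  # gradient of the negated dual w.r.t. a_i
            a[i] = min(max(a[i] - grad / K[i][i], 0.0), C)  # exact 1-D step, clipped
    return a


def predict(X, y, a, sigma, x):
    """Sign of sum_j a_j y_j k(x_j, x); only support vectors (a_j > 0) contribute."""
    s = sum(a_j * y_j * rbf(x_j, x, sigma) for a_j, y_j, x_j in zip(a, y, X))
    return 1 if s >= 0 else -1


def loo_accuracy(X, y, C, sigma):
    """Leave-one-out: train on n - 1 points, test on the held-out one, repeat n times."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        a = train_svm(X_tr, y_tr, C, sigma)
        hits += predict(X_tr, y_tr, a, sigma, X[i]) == y[i]
    return hits / len(X)


def select_C_sigma(X, y, Cs, sigmas):
    """Grid search for the (C, sigma) pair with the highest leave-one-out accuracy."""
    return max(((C, s) for C in Cs for s in sigmas),
               key=lambda pair: loo_accuracy(X, y, *pair))


X = [[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [2.0, 2.0], [2.2, 1.8], [1.9, 2.3]]
y = [1, 1, 1, -1, -1, -1]
best = select_C_sigma(X, y, Cs=[0.1, 1.0, 10.0], sigmas=[0.5, 1.0])
print(best, loo_accuracy(X, y, *best))
```

Leave-one-out costs one training run per training point and grid cell, which is affordable here because the training sets are small.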

9
kNN (k = 1)
  • A test point takes the label of its nearest training point
  • This algorithm is simple to implement, and its accuracy can serve as a baseline.
  • Even so, it sometimes gives good results!
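A minimal 1-NN sketch, assuming Euclidean distance (the slide does not name the metric); ties go to the earliest training point. The function name is hypothetical.

```python
import math


def nearest_neighbor_predict(X_train, y_train, x):
    """1-NN: return the label of the closest training point (Euclidean distance)."""
    nearest = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return y_train[nearest]


X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
y_train = [1, 1, -1]
print(nearest_neighbor_predict(X_train, y_train, [4.0, 4.5]))  # -1
```

With 1024 features drawn from different modalities, Euclidean distances are dominated by large-scale features unless the features are standardized first.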

10
Split of the Dataset & Experiment
  • GP semi-supervised learning
  • - Randomly select labeled data (p% of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data = test data)
  • - 50 tries for each p (p = 10, 20, 30, 40, 50)
  • - Each time, select the hyperparameters that maximize the evidence from EP
  • SVM and kNN
  • - Randomly select training data (p% of the overall data), use the remaining data as test data, and predict the labels of the test data
  • - 50 tries for each p (p = 10, 20, 30, 40, 50)
  • - For the SVM, leave-one-out validation for hyperparameter selection was performed on the training data only
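The split protocol can be sketched as follows. The classifier is passed in as a function; the majority-class baseline in the usage lines is a hypothetical placeholder, not one of the project's methods. The split draws p% of all points without stratifying by class, which is an assumption (the GP results slide mentions percentages per class).

```python
import random


def random_split(n, p, rng):
    """Pick p% of the n indices as training data; the rest become test data."""
    n_train = max(1, round(n * p / 100))
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def run_protocol(X, y, classify, percents=(10, 20, 30, 40, 50), tries=50, seed=0):
    """Mean test accuracy over `tries` random splits for each training percentage p."""
    rng = random.Random(seed)
    results = {}
    for p in percents:
        accs = []
        for _ in range(tries):
            train, test = random_split(len(X), p, rng)
            X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
            hits = sum(classify(X_tr, y_tr, X[i]) == y[i] for i in test)
            accs.append(hits / len(test))
        results[p] = sum(accs) / len(accs)
    return results


def majority_baseline(X_tr, y_tr, x):
    """Hypothetical placeholder classifier: predict the more frequent training label."""
    return 1 if sum(y_tr) >= 0 else -1


X = [[float(i)] for i in range(136)]  # dummy 1-D features, one per sample
y = [1] * 61 + [-1] * 75              # class sizes from the binary task
print(run_protocol(X, y, majority_baseline))
```

In the GP setting, the same split applies, but the test points are also handed to the model as unlabeled data.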

11
GP Evidence & Accuracy
[Figure: recognition accuracy and log evidence curves, percentage of train points per class = 50% (average over 10 tries). Note: an offset was added to the log evidence to plot all curves in the same figure.]
  • Max of recognition accuracy ≈ max of log evidence → find the optimal hyperparameter by using the evidence from EP
12
SVM hyperparameter selection
[Figure: evidence from leave-one-out validation vs. log(C)]
Select the hyperparameters (C, σ) that maximize the evidence from leave-one-out validation!
13
Classification Accuracy
As expected, kNN does poorly with a small number of train points and better with a large number.
SVM has good accuracy even when the number of train points is small. Why?
GP has poor accuracy when the number of train points is small. Why?
14
Analysis-SVM
Why does SVM give good test accuracy even when the number of train points is small?
  • The best explanations I can offer:
  • 1. The ratio #Support Vectors / #Train Points is high in this task, in particular when the percentage of train points is low.
  • - The support vectors determine the decision boundary, but a high SV ratio is not guaranteed to be closely related to the test accuracy.
  • - In fact, it is known that the leave-one-out CV error is at most #Support Vectors / #Train Points.
  • 2. The CV accuracy rate is high even when the number of train points is small, and the CV accuracy rate is closely related to the test accuracy rate.
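The leave-one-out bound holds because removing a training point that is not a support vector leaves the SVM solution unchanged, so only support vectors can be misclassified when held out. With hypothetical counts (made up for illustration, not taken from the experiments), it reads:

```latex
\hat{E}_{\mathrm{LOO}} \;\le\; \frac{N_{\mathrm{SV}}}{N_{\mathrm{train}}},
\qquad \text{e.g. } N_{\mathrm{train}} = 14,\ N_{\mathrm{SV}} = 12
\;\Rightarrow\; \hat{E}_{\mathrm{LOO}} \le \tfrac{12}{14} \approx 0.86
```

When most training points are support vectors, the bound is close to 1 and therefore weak.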

15
Analysis-GP
Why does GP give poor test accuracy when the number of train points is small?
[Figure: percentage of train points per class = 50%]
  • Max of recognition accuracy ≈ max of log evidence
[Figure: percentage of train points per class = 10%]
  • The log evidence curve is flat → fails to find the optimal σ!
16
Conclusion
  • GP
  • - Small number of train points → bad accuracy
  • - Large number of train points → good accuracy
  • SVM
  • - Good accuracy regardless of the number of train points
  • kNN (k = 1)
  • - Small number of train points → bad accuracy
  • - Large number of train points → good accuracy