Title: Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN

1
Classification of Affective States - GP
Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project

Hyungil Ahn (hiahn_at_media.mit.edu)
2
Objective & Dataset

Recognize the affective states of a child solving a puzzle
Affective Dataset
- 1024 features from Face, Posture, Game
- 3 affective states, labels annotated by teachers
High interest (61), Low interest (59), Refreshing (16)

3
Task & Approaches

Binary Classification
High interest (61 samples) vs.
Low interest or Refreshing (75 samples)
Approaches
- Gaussian Process (GP) Semi-Supervised Learning
- Support Vector Machine (SVM)
- k-Nearest Neighbor (k = 1)

4
GP Semi-Supervised Learning

Given the labeled points and their labels, predict the labels of the unlabeled points
Assume the data and the data-generation process:
X: inputs, y: vector of labels,
t: vector of hidden soft labels
Each label y_i ∈ {-1, +1} (binary classification)
Final classifier: y = sign(t)
Define a similarity function between data points
→ Infer t given the inputs X and the labeled data
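The similarity function's formula did not survive in this transcript. As a minimal sketch, assuming an RBF similarity with kernel width σ (a common choice in graph-based semi-supervised learning, not confirmed by the slides), the similarity graph over labeled and unlabeled points, its Laplacian, and the final sign rule could look like this; all function names are hypothetical.

```python
import math

def rbf_similarity(xi, xj, sigma):
    """Assumed RBF similarity: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def similarity_matrix(X, sigma):
    """n x n matrix W over all points (labeled and unlabeled); no self-loops."""
    n = len(X)
    return [[0.0 if i == j else rbf_similarity(X[i], X[j], sigma)
             for j in range(n)] for i in range(n)]

def graph_laplacian(W):
    """Combinatorial graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def classify(t):
    """Final classifier y = sign(t); t_i = 0 is mapped to +1 by convention here."""
    return [1 if ti >= 0 else -1 for ti in t]

# Toy usage: two nearby points and one far-away point.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
W = similarity_matrix(X, sigma=1.0)
L = graph_laplacian(W)
```

The Laplacian is what usually turns "soft labels vary smoothly over the data" into a prior: it penalizes differences in t between strongly similar points.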
5
GP Semi-Supervised Learning

Infer t given the inputs X and the labeled data
→ Bayesian model: posterior ∝ prior × likelihood
- Prior of the classifier
- Likelihood of the classifier given the labeled data
6
GP Semi-Supervised Learning

How to model the prior and the likelihood?
- The prior: using a GP (soft labels vary smoothly across the data manifold!)
- The likelihood
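The prior and likelihood formulas are also missing from the transcript. A minimal sketch, assuming (i) a graph-Laplacian prior, log p(t) = -½ tᵀLt + const, which rewards soft labels that vary smoothly across the similarity graph, and (ii) a label-flipping likelihood in which each observed label disagrees with sign(t_i) with probability ε. Both are assumptions chosen to match the slide's wording and the "labeling error rate" hyperparameter, not the author's confirmed model; the function names are hypothetical.

```python
import math

def smoothness(t, W):
    """t^T L t for L = D - W, computed as 0.5 * sum_ij W_ij (t_i - t_j)^2 (W symmetric)."""
    n = len(t)
    return 0.5 * sum(W[i][j] * (t[i] - t[j]) ** 2 for i in range(n) for j in range(n))

def log_prior(t, W):
    """Unnormalized log prior -0.5 * t^T L t: smoother soft labels score higher.

    With a pure Laplacian this prior is improper (constant shifts of t cost nothing);
    adding a small ridge, L + delta * I, is a common fix not included here.
    """
    return -0.5 * smoothness(t, W)

def log_likelihood(y_labeled, t_labeled, eps):
    """Label-flipping likelihood: label y_i disagrees with sign(t_i) with probability eps.

    t_i = 0 is counted as a disagreement here.
    """
    total = 0.0
    for yi, ti in zip(y_labeled, t_labeled):
        total += math.log(1.0 - eps if yi * ti > 0 else eps)
    return total

def log_posterior_unnormalized(t, W, labeled_idx, y_labeled, eps):
    """log prior + log likelihood (Bayes' rule up to an additive constant)."""
    t_labeled = [t[i] for i in labeled_idx]
    return log_prior(t, W) + log_likelihood(y_labeled, t_labeled, eps)

# Toy usage: a 3-node chain graph; a constant t is maximally smooth.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smooth_score = log_prior([1, 1, 1], W)
rough_score = log_prior([1, -1, 1], W)
```

Only the labeled points enter the likelihood; the unlabeled points influence t purely through the prior, which is what makes the method semi-supervised.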
7
GP Semi-Supervised Learning

EP (Expectation Propagation) → approximates the posterior as a Gaussian
Select the hyperparameters (kernel width σ, labeling error rate ε) that maximize the evidence!
Advantage of using EP → the evidence comes as a by-product
EP also estimates the leave-one-out predictive performance without performing any expensive cross-validation.
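EP for a label-flipping likelihood is too long to sketch here. To illustrate only the evidence-maximization step, the sketch below swaps in a Gaussian-noise likelihood, for which the evidence of the observed labels has a closed form, and assumes a prior covariance K = (L + δI)⁻¹ built from an RBF graph Laplacian L. The values of δ and the noise level, the σ grid, and all function names are hypothetical, not the project's settings. It returns the kernel width σ with the highest log evidence; a labeling-error-style parameter could be added to the same grid search.

```python
import math

def rbf_laplacian(X, sigma):
    """Graph Laplacian D - W of an assumed RBF similarity graph (no self-loops)."""
    n = len(X)
    W = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small dense matrices)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def cholesky(A):
    """Lower-triangular Lc with A = Lc Lc^T, for symmetric positive-definite A."""
    n = len(A)
    Lc = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(Lc[i][k] * Lc[j][k] for k in range(j))
            Lc[i][j] = math.sqrt(s) if i == j else s / Lc[j][j]
    return Lc

def log_evidence(X, labeled_idx, y_labeled, sigma, noise=0.1, delta=1e-2):
    """log N(y_L | 0, K_LL + noise^2 I), with prior covariance K = (Laplacian + delta I)^-1.

    Gaussian label noise is a stand-in for the label-flipping likelihood + EP.
    """
    n = len(X)
    lap = rbf_laplacian(X, sigma)
    K = invert([[lap[i][j] + (delta if i == j else 0.0) for j in range(n)] for i in range(n)])
    C = [[K[a][b] + (noise ** 2 if ia == ib else 0.0) for ib, b in enumerate(labeled_idx)]
         for ia, a in enumerate(labeled_idx)]
    m = len(labeled_idx)
    Lc = cholesky(C)
    # Forward substitution Lc z = y, so that y^T C^-1 y = z^T z.
    z = []
    for i in range(m):
        z.append((y_labeled[i] - sum(Lc[i][k] * z[k] for k in range(i))) / Lc[i][i])
    log_det = 2.0 * sum(math.log(Lc[i][i]) for i in range(m))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * m * math.log(2.0 * math.pi)

def select_sigma(X, labeled_idx, y_labeled, sigma_grid):
    """Return the kernel width in sigma_grid with the highest log evidence."""
    return max(sigma_grid, key=lambda s: log_evidence(X, labeled_idx, y_labeled, s))

# Toy usage: two 1-D clusters, one labeled point in each.
X = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
best_sigma = select_sigma(X, [0, 3], [1.0, -1.0], [0.1, 0.5, 1.0, 2.0])
```

Note that the unlabeled points still shape the evidence through the Laplacian, so the choice of σ reflects the structure of all the data, not only the labeled subset.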

8
Support Vector Machine

OSU SVM toolbox
RBF kernel
Hyperparameter (C, σ) selection → use leave-one-out validation!
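The slides use the MATLAB OSU SVM toolbox. As a library-free illustration only, the sketch below trains a soft-margin RBF SVM without a bias term by dual coordinate ascent (a simplification swapped in for the toolbox's solver) and picks (C, σ) by leave-one-out accuracy over a small grid. The kernel is assumed to be exp(-||x - x'||² / (2σ²)); the grids, epoch count, and function names are hypothetical. In practice a standard solver such as LIBSVM or scikit-learn's SVC would be used; SVC parameterizes the same RBF kernel by gamma = 1/(2σ²).

```python
import math

def rbf(xi, xj, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))

def train_svm(X, y, C, sigma, epochs=200):
    """Bias-free soft-margin kernel SVM via dual coordinate ascent:
    maximize sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K_ij subject to 0 <= a_i <= C."""
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            g = y[i] * f_i - 1.0  # gradient of the negated dual w.r.t. alpha_i
            # Exact coordinate step, then clip to the box [0, C].
            alpha[i] = min(max(alpha[i] - g / K[i][i], 0.0), C)
    return alpha

def predict_svm(X_train, y_train, alpha, sigma, x):
    """Sign of f(x) = sum_i alpha_i y_i K(x_i, x); only support vectors contribute."""
    f = sum(a * yi * rbf(xi, x, sigma)
            for a, yi, xi in zip(alpha, y_train, X_train) if a > 0)
    return 1 if f >= 0 else -1

def loo_accuracy(X, y, C, sigma):
    """Leave-one-out: retrain without point k, then test on point k."""
    correct = 0
    for k in range(len(X)):
        X_tr, y_tr = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        alpha = train_svm(X_tr, y_tr, C, sigma)
        correct += predict_svm(X_tr, y_tr, alpha, sigma, X[k]) == y[k]
    return correct / len(X)

def select_hyperparameters(X, y, C_grid, sigma_grid):
    """Grid search for the (C, sigma) pair with the highest leave-one-out accuracy."""
    pairs = [(C, s) for C in C_grid for s in sigma_grid]
    return max(pairs, key=lambda cs: loo_accuracy(X, y, cs[0], cs[1]))

# Toy usage: two well-separated square clusters.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
C_best, sigma_best = select_hyperparameters(X, y, [0.1, 1.0, 10.0], [0.5, 1.0, 2.0])
```

Leave-one-out retrains the model once per training point for every grid cell, which is affordable for a dataset of this size (136 samples) but grows quickly with larger data.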

9
kNN (k = 1)

A test point takes the label of its nearest training point.
This algorithm is simple to implement, and its accuracy can serve as a baseline.
However, it sometimes gives a good result!
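The 1-NN rule fits in a few lines. A minimal sketch, assuming Euclidean distance (the slides do not name the metric) and plain Python lists as feature vectors; the function names are hypothetical.

```python
def nearest_neighbor_predict(X_train, y_train, x):
    """1-NN: return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]

def accuracy(X_train, y_train, X_test, y_test):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(nearest_neighbor_predict(X_train, y_train, x) == yt
               for x, yt in zip(X_test, y_test))
    return hits / len(X_test)

# Toy usage: one training point per class.
label = nearest_neighbor_predict([[0.0, 0.0], [5.0, 5.0]], [-1, 1], [1.0, 1.0])
```

Because Euclidean distance combines all 1024 features directly, features on much larger scales would dominate the distance; standardizing the features first is a common, optional step.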

10
Split of the Dataset & Experiment

GP Semi-Supervised Learning
- Randomly select labeled data (p% of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data = test data)
- 50 tries for each p (p = 10, 20, 30, 40, 50)
- Each time, select the hyperparameters that maximize the evidence from EP
SVM and kNN
- Randomly select training data (p% of the overall data), use the remaining data as test data, and predict the labels of the test data
- 50 tries for each p (p = 10, 20, 30, 40, 50)
- For the SVM, leave-one-out validation for hyperparameter selection was performed on the training data
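The repeated random-split protocol can be sketched as follows. This is an illustration under assumptions: 1-NN stands in for the classifier (a GP or SVM would plug in at the same point), the seed and function names are hypothetical, and by default p% of each class is sampled, following the "percentage of train points per class" wording of the later result slides; stratified=False gives the plain p%-of-all-data split described on this slide.

```python
import random
from statistics import mean

def split(y, p, rng, stratified=True):
    """Return (train_idx, test_idx) with p percent of the points in training."""
    if stratified:
        train = []
        for label in sorted(set(y)):
            idx = [i for i, yi in enumerate(y) if yi == label]
            train += rng.sample(idx, max(1, round(len(idx) * p / 100)))
    else:
        train = rng.sample(range(len(y)), max(1, round(len(y) * p / 100)))
    train_set = set(train)
    test = [i for i in range(len(y)) if i not in train_set]
    return sorted(train), test

def one_nn(X_tr, y_tr, x):
    """Stand-in classifier: label of the nearest training point."""
    best = min(range(len(X_tr)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X_tr[i], x)))
    return y_tr[best]

def run_experiment(X, y, percents=(10, 20, 30, 40, 50), tries=50, seed=0):
    """Mean test accuracy over `tries` random splits for each training percentage p."""
    rng = random.Random(seed)
    results = {}
    for p in percents:
        accs = []
        for _ in range(tries):
            tr, te = split(y, p, rng)
            X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
            hits = sum(one_nn(X_tr, y_tr, X[i]) == y[i] for i in te)
            accs.append(hits / len(te))
        results[p] = mean(accs)
    return results

# Toy usage: two well-separated 1-D clusters of 10 points each.
X = [[i * 0.1] for i in range(10)] + [[10 + i * 0.1] for i in range(10)]
y = [-1] * 10 + [1] * 10
results = run_experiment(X, y, tries=5)
```

Averaging over many random splits per p smooths out the large split-to-split variance that small training sets produce.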

11
GP evidence & accuracy
[Figure: GP recognition accuracy and log evidence; percentage of train points per class = 50% (average over 10 tries)]
(Note) An offset was added to the log evidence so that all curves could be plotted in the same figure.
Max of recognition accuracy ≈ max of log evidence → find the optimal hyperparameter by using the evidence from EP
12
SVM hyperparameter selection
[Figure: evidence from leave-one-out validation; axis: log(C)]
Select the hyperparameters (C, σ) that maximize the evidence from leave-one-out validation!
13
Classification Accuracy
- As expected, kNN is bad with a small # of train points and better with a large # of train points.
- SVM has good accuracy even when the # of train points is small. Why?
- GP has bad accuracy when the # of train points is small. Why?
14
Analysis-SVM
Why does SVM give a good test accuracy even when the number of train points is small?

The best explanations I can offer:
1. #Support Vectors / # of Train Points is high in this task, in particular when the percentage of train points is low.
The support vectors decide the decision boundary, but it is not guaranteed that the SV ratio is closely related to the test accuracy.
Actually, it is known that the leave-one-out CV error is at most #Support Vectors / # of Train Points.
2. The CV accuracy rate is high even when the # of train points is small, and the CV accuracy rate is closely related to the test accuracy rate.
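Point 1 relies on the standard leave-one-out bound for SVMs. A short derivation, using the usual soft-margin SVM optimality (KKT) conditions, with n training points, n_SV support vectors, dual coefficients α_k, and decision function f (this notation is ours, not the slides'):

```latex
\begin{aligned}
&\alpha_k = 0 \;(\text{$x_k$ is not a support vector})
  \;\Rightarrow\; y_k f(x_k) \ge 1
  \;\text{and the solution stays optimal without } x_k \\
&\;\Rightarrow\; x_k \text{ is classified correctly when it is left out} \\
&\;\Rightarrow\; \#\{\text{LOO errors}\} \le n_{\mathrm{SV}}
  \quad\Longrightarrow\quad
  \hat{E}_{\mathrm{LOO}} = \frac{\#\{\text{LOO errors}\}}{n} \le \frac{n_{\mathrm{SV}}}{n}
\end{aligned}
```

When n_SV/n is high, as reported for this task, the bound is loose, so it says little about the actual test accuracy.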

15
Analysis-GP
Why does GP give a bad test accuracy when the number of train points is small?
- Percentage of train points per class = 50%: max of recognition accuracy ≈ max of log evidence
- Percentage of train points per class = 10%: the log-evidence curve is flat → fails to find the optimal σ!
16
Conclusion

GP
Small number of train points → bad accuracy
Large number of train points → good accuracy
SVM
Regardless of the number of train points → good accuracy
kNN (k = 1)
Small number of train points → bad accuracy
Large number of train points → good accuracy
