Title: Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN
1 Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project
Hyungil Ahn (hiahn_at_media.mit.edu)
2 Objective and Dataset
- Recognize the affective states of a child solving a puzzle
- Affective dataset
  - 1024 features from face, posture, and game
  - 3 affective states, with labels annotated by teachers: High interest (61), Low interest (59), Refreshing (16)
3 Task and Approaches
- Binary classification
  - High interest (61 samples) vs. Low interest or Refreshing (75 samples)
- Approaches
  - Semi-supervised learning with a Gaussian Process (GP)
  - Support Vector Machine (SVM)
  - k-Nearest Neighbor (k = 1)
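The binary task merges two of the three annotated states into one class. A minimal Python sketch of that mapping; the state strings `"high"`, `"low"`, `"refreshing"` and the function name `to_binary` are hypothetical placeholders, not taken from the project.

```python
import numpy as np

def to_binary(states):
    """Map the three annotated affective states to binary labels.

    High interest -> +1; Low interest or Refreshing -> -1.
    The state strings are hypothetical placeholders.
    """
    states = np.asarray(states)
    return np.where(states == "high", 1, -1)

# Class sizes from the dataset description: 61 / 59 / 16.
states = ["high"] * 61 + ["low"] * 59 + ["refreshing"] * 16
y = to_binary(states)
print((y == 1).sum(), (y == -1).sum())  # 61 75
```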
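4 GP Semi-Supervised Learning
- Given a dataset in which only some points are labeled, predict the labels of the unlabeled points
- Assume the following data-generation process:
  - $X$: inputs, $\mathbf{y}$: vector of labels, $\mathbf{t}$: vector of hidden soft labels
  - Each label $y_i \in \{-1, +1\}$ (binary classification)
  - Final classifier: $y_i = \mathrm{sign}(t_i)$
- Define a similarity function between inputs
→ Infer $p(\mathbf{t} \mid X, \mathbf{y}_L)$, the soft labels given the inputs and the labels $\mathbf{y}_L$ of the labeled points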
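A minimal sketch of the two ingredients defined above. The slide does not give the similarity function, so this assumes an RBF form $W_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / \sigma^2)$ with kernel width $\sigma$ (the hyperparameter later tuned by evidence maximization); the function names are hypothetical.

```python
import numpy as np

def rbf_similarity(X, sigma):
    """Pairwise similarity W_ij = exp(-||x_i - x_j||^2 / sigma^2).

    Assumed form; the slide only says 'similarity function'.
    """
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / sigma ** 2)

def final_classifier(t):
    """y_i = sign(t_i); ties at t_i = 0 go to +1 (an arbitrary choice)."""
    return np.where(np.asarray(t) >= 0, 1, -1)
```

The similarity matrix is what the GP prior on the soft labels is built from; `final_classifier` only thresholds the inferred soft labels.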
5 GP Semi-Supervised Learning
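Infer $p(\mathbf{t} \mid X, \mathbf{y}_L)$, the soft labels given the inputs and the observed labels
→ Bayesian model: $p(\mathbf{t} \mid X, \mathbf{y}_L) \propto p(\mathbf{t} \mid X)\, p(\mathbf{y}_L \mid \mathbf{t})$
- $p(\mathbf{t} \mid X)$: prior of the classifier
- $p(\mathbf{y}_L \mid \mathbf{t})$: likelihood of the classifier given the labeled data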
6 GP Semi-Supervised Learning
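→ How do we model the prior and the likelihood?
- The prior: use a GP whose covariance is built from the similarity function, so that the soft labels vary smoothly across the data manifold!
- The likelihood: each observed label $y_i$ agrees with $\mathrm{sign}(t_i)$, up to a labeling error rate $\epsilon$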
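One way to realize this prior and likelihood can be sketched as follows. Both forms are assumptions, not stated on the slide: a graph-Laplacian GP prior as used in graph-based semi-supervised learning, $\mathbf{t} \sim \mathcal{N}(0, (\Delta + \delta I)^{-1})$ with $\Delta = D - W$, and a flip-noise likelihood $p(y_i \mid t_i) = \epsilon + (1 - 2\epsilon)\,\Theta(y_i t_i)$, where $\Theta$ is the step function. The ridge `delta` and all function names are hypothetical.

```python
import numpy as np

def laplacian_prior_cov(W, delta=1e-2):
    """Covariance (Delta + delta*I)^-1 of an assumed graph-Laplacian GP prior.

    Delta = D - W is the graph Laplacian of the similarity matrix W; the small
    ridge delta makes it invertible (Delta alone is singular).
    """
    D = np.diag(W.sum(axis=1))
    laplacian = D - W
    return np.linalg.inv(laplacian + delta * np.eye(W.shape[0]))

def flip_noise_likelihood(y, t, eps):
    """p(y_i | t_i): 1 - eps if y_i agrees with sign(t_i), else eps."""
    agree = (np.asarray(y) * np.asarray(t)) > 0
    return np.where(agree, 1.0 - eps, eps)

def log_likelihood(y_L, t_L, eps):
    """log p(y_L | t), summed over the labeled points only."""
    return float(np.log(flip_noise_likelihood(y_L, t_L, eps)).sum())
```

Under these assumptions, $\epsilon$ is a hyperparameter alongside the kernel width $\sigma$ hidden inside `W`.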
7 GP Semi-Supervised Learning
- EP (Expectation Propagation) → approximates the posterior as a Gaussian
- Select the hyperparameters (kernel width $\sigma$, labeling error rate $\epsilon$) that maximize the evidence!
- Advantage of using EP → we get the evidence as a by-product
- EP also estimates the leave-one-out predictive performance without any expensive cross-validation
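The selection step can be sketched as a grid search over $(\sigma, \epsilon)$. The EP approximation itself is not shown: `log_evidence` is a hypothetical callable that, in this setting, would return the log evidence EP computes for a given pair, and `toy_log_evidence` is only a stand-in so the sketch runs.

```python
import itertools
import math

def select_by_evidence(log_evidence, sigmas, epsilons):
    """Return (best_sigma, best_eps, best_log_evidence) over a grid.

    log_evidence(sigma, eps) is assumed to return the log evidence from EP.
    """
    best = None
    for sigma, eps in itertools.product(sigmas, epsilons):
        value = log_evidence(sigma, eps)
        if best is None or value > best[2]:
            best = (sigma, eps, value)
    return best

def toy_log_evidence(sigma, eps):
    """Stand-in for the EP evidence, peaked at sigma = 2, eps = 0.05."""
    return -(math.log(sigma) - math.log(2.0)) ** 2 - 100.0 * (eps - 0.05) ** 2

print(select_by_evidence(toy_log_evidence, [0.5, 1.0, 2.0, 4.0], [0.01, 0.05, 0.1]))
```

If the evidence surface is nearly flat, the argmax returned here carries little information about the best hyperparameters.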
8 Support Vector Machine
- OSU SVM toolbox
- RBF kernel
- Selection of the hyperparameters $C$ and $\sigma$ → use leave-one-out validation!
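The OSU SVM toolbox is a MATLAB package; purely as an illustration, the same leave-one-out selection can be sketched in Python with scikit-learn, swapped in here and not what the project used. scikit-learn's RBF kernel is $\exp(-\gamma \lVert x - x' \rVert^2)$, so this assumes the width convention $\gamma = 1/(2\sigma^2)$; the toolbox's own convention may differ. The data are synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

def select_svm_loo(X, y, Cs, sigmas):
    """Pick (C, sigma) for an RBF SVM by leave-one-out accuracy on (X, y)."""
    gammas = [1.0 / (2.0 * s ** 2) for s in sigmas]  # assumed width convention
    search = GridSearchCV(
        SVC(kernel="rbf"),
        {"C": list(Cs), "gamma": gammas},
        cv=LeaveOneOut(),  # one held-out training point per fold
        scoring="accuracy",
    )
    search.fit(X, y)  # also refits the best model on all of (X, y)
    best_sigma = sigmas[gammas.index(search.best_params_["gamma"])]
    return search.best_params_["C"], best_sigma, search.best_score_, search.best_estimator_

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (10, 5)), rng.normal(1.0, 1.0, (10, 5))])
y = np.array([-1] * 10 + [1] * 10)
C, sigma, loo_acc, model = select_svm_loo(X, y, Cs=[0.1, 1.0, 10.0], sigmas=[0.5, 1.0, 2.0])
```

Only the training split should be passed in, so the test points never influence the choice of $C$ and $\sigma$.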
9 kNN (k = 1)
- A test point takes the label of its nearest training point
- This algorithm is simple to implement, and its accuracy can serve as a baseline
- However, this algorithm sometimes gives a good result!
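A minimal 1-NN sketch of the rule above, assuming Euclidean distance (the slide does not name the metric); `nn1_predict` is a hypothetical name. Ties go to the first training point because `np.argmin` returns the first minimum.

```python
import numpy as np

def nn1_predict(X_train, y_train, X_test):
    """1-NN: each test point takes the label of its nearest training point."""
    # Squared Euclidean distance between every test point and every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
y_train = np.array([-1, 1])
print(nn1_predict(X_train, y_train, np.array([[1.0, 1.0], [9.0, 8.0]])))  # [-1  1]
```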
10 Split of the Dataset and Experiment
- GP semi-supervised learning
  - Randomly select labeled data (p% of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data = test data)
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - Each time, select the hyperparameters that maximize the evidence from EP
- SVM and kNN
  - Randomly select training data (p% of the overall data), use the remaining data as test data, and predict the labels of the test data
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - For the SVM, leave-one-out validation for hyperparameter selection was performed on the training data only
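The protocol above can be sketched as a loop over p and repeated random splits. `fit_predict` is a hypothetical callable standing in for any of the three methods (for the transductive GP it also receives the unlabeled points, which here are exactly the test points); the split is over the overall data, as described, not stratified per class.

```python
import numpy as np

def random_split(n, p, rng):
    """Randomly pick p% of n indices as training/labeled data; the rest is test."""
    n_train = int(round(n * p / 100))
    perm = rng.permutation(n)
    return perm[:n_train], perm[n_train:]

def run_protocol(X, y, fit_predict, percentages=(10, 20, 30, 40, 50), n_tries=50, seed=0):
    """Mean test accuracy over n_tries random splits for each percentage p.

    fit_predict(X_train, y_train, X_test) -> predicted labels for X_test.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for p in percentages:
        accs = []
        for _ in range(n_tries):
            train_idx, test_idx = random_split(len(y), p, rng)
            y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
            accs.append(np.mean(y_hat == y[test_idx]))
        results[p] = float(np.mean(accs))
    return results
```

With the 136 samples of this dataset, p = 10 gives 14 labeled points and 122 test points.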
11 GP Evidence and Accuracy
[Figure: recognition accuracy and log evidence vs. hyperparameter, for percentage of train points per class = 50% (average over 10 tries), with the maximum of recognition accuracy and the maximum of log evidence marked]
(Note) An offset was added to the log evidence so that all curves could be plotted in the same figure.
→ Find the optimal hyperparameter by using the evidence from EP
12 SVM Hyperparameter Selection
[Figure: evidence from leave-one-out validation vs. log(C)]
Select the hyperparameters $C$ and $\sigma$ that maximize the evidence from leave-one-out validation!
13 Classification Accuracy
As expected, kNN is bad with a small number of training points and better with a large number of training points.
SVM has good accuracy even when the number of training points is small. Why?
GP has bad accuracy when the number of training points is small. Why?
14 Analysis - SVM
Why does SVM give good test accuracy even when the number of training points is small?
The best explanations I can give:
1. The ratio #support vectors / #training points is high in this task, in particular when the percentage of training points is low.
   - The support vectors determine the decision boundary, but a high SV ratio is not guaranteed to go with high test accuracy.
   - In fact, the leave-one-out CV error is known to be bounded above by #support vectors / #training points.
2. The CV accuracy rate is high even when the number of training points is small, and the CV accuracy rate is closely related to the test accuracy rate.
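The bound mentioned above holds because removing a non-support vector leaves the SVM solution unchanged, and that point lies on the correct side of the margin, so only support vectors can become leave-one-out errors. A sketch that checks this numerically on synthetic data, with scikit-learn swapped in for the OSU toolbox; the fixed `gamma = 0.1` and `C = 1.0` are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def sv_ratio_and_loo_error(X, y, C=1.0, gamma=0.1):
    """Return (#SV / #train, leave-one-out error) for an RBF SVM.

    gamma is a fixed number (not 'scale') so every LOO fold uses the same
    kernel, which the #SV bound assumes.
    """
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    sv_ratio = clf.n_support_.sum() / len(y)
    loo_acc = cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=LeaveOneOut()).mean()
    return float(sv_ratio), float(1.0 - loo_acc)

# Synthetic, overlapping two-class data, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.5, 1.0, (15, 10)), rng.normal(0.5, 1.0, (15, 10))])
y = np.array([-1] * 15 + [1] * 15)
ratio, loo_err = sv_ratio_and_loo_error(X, y)
print(f"SV ratio = {ratio:.2f}, LOO error = {loo_err:.2f}")
```

A high SV ratio makes this bound loose, which is consistent with the caveat that the ratio alone does not guarantee good test accuracy.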
15 Analysis - GP
Why does GP give bad test accuracy when the number of training points is small?
[Figure: percentage of train points per class = 50%, with the maximum of recognition accuracy and the maximum of log evidence marked]
[Figure: percentage of train points per class = 10%; the log evidence curve is flat → fails to find the optimal $\sigma$!]
16Conclusion
- GP
  - Small number of training points → bad accuracy
  - Large number of training points → good accuracy
- SVM
  - Regardless of the number of training points → good accuracy
- kNN (k = 1)
  - Small number of training points → bad accuracy
  - Large number of training points → good accuracy