Transcript and Presenter's Notes

Title: Machine Learning Methods for Human-Computer Interaction


1
Machine Learning Methods for Human-Computer
Interaction
  • Kerem Altun
  • Postdoctoral Fellow
  • Department of Computer Science
  • University of British Columbia

IEEE Haptics Symposium March 4, 2012 Vancouver,
B.C., Canada
2
Machine learning
Machine learning
Pattern recognition
Regression
Template matching
Statistical pattern recognition
Structural pattern recognition
Neural networks
Supervised methods
Unsupervised methods
3
What is pattern recognition?
  • the question even appears as a title in the International
    Association for Pattern Recognition (IAPR)
    newsletter
  • many definitions exist
  • simply put, it is the process of labeling observations (x)
    with predefined categories (ω)

4
Various applications of PR
Jain et al., 2000
5
Supervised learning
Can you identify other tufas here?
lifted from lecture notes by Josh Tenenbaum
6
Unsupervised learning
How many categories are there? Which image
belongs to which category?
lifted from lecture notes by Josh Tenenbaum
7
Pattern recognition in haptics/HCI
  • Altun et al., 2010a
  • human activity recognition
  • body-worn inertial sensors
  • accelerometers and gyroscopes
  • daily activities
  • sitting, standing, walking, stairs, etc.
  • sports activities
  • walking/running, cycling, rowing, basketball, etc.

8
Pattern recognition in haptics/HCI
Altun et al., 2010a
walking
basketball
right arm acc
left arm acc
9
Pattern recognition in haptics/HCI
  • Flagg et al., 2012
  • touch gesture recognition on a conductive fur
    patch

10
Pattern recognition in haptics/HCI
Flagg et al., 2012
light touch
stroke
scratch
11
Other haptics/HCI applications?
12
Pattern recognition example
Duda et al., 2000
  • excellent example by Duda et al.
  • classifying incoming fish on a conveyor belt
    using a camera image
  • sea bass
  • salmon

13
Pattern recognition example
  • how to classify? what kind of information can
    distinguish these two species?
  • length, width, weight, etc.
  • suppose a fisherman tells us that salmon are
    usually shorter
  • so, let's use length as a feature
  • what to do to classify?
  • capture image → find fish in the image → measure
    length → make decision
  • how to make the decision?
  • how to find the threshold?

14
Pattern recognition example
Duda et al., 2000
15
Pattern recognition example
  • on the average, salmon are usually shorter, but
    is this a good feature?
  • let's try classifying according to lightness of
    the fish scales

16
Pattern recognition example
Duda et al., 2000
17
Pattern recognition example
  • how to choose the threshold?

18
Pattern recognition example
  • how to choose the threshold?
  • minimize the probability of error
  • sometimes we should consider costs of different
    errors
  • salmon is more expensive
  • customers who order salmon but get sea bass
    instead will be angry
  • customers who order sea bass but occasionally get
    salmon instead will not be unhappy

19
Pattern recognition example
  • we don't have to use just one feature
  • let's use lightness and width

each point is a feature vector
2-D plane is the feature space
Duda et al., 2000
20
Pattern recognition example
  • we don't have to use just one feature
  • let's use lightness and width

each point is a feature vector
2-D plane is the feature space
decision boundary
Duda et al., 2000
21
Pattern recognition example
  • should we add as many features as we can?
  • do not use redundant features

22
Pattern recognition example
  • should we add as many features as we can?
  • do not use redundant features
  • consider noise in the measurements

23
Pattern recognition example
  • should we add as many features as we can?
  • do not use redundant features
  • consider noise in the measurements
  • moreover,
  • avoid adding too many features
  • more features means higher dimensional feature
    vectors
  • difficult to work in high dimensional spaces
  • this is called the curse of dimensionality
  • more on this later

24
Pattern recognition example
  • how to choose the decision boundary?

is this one better?
Duda et al., 2000
25
Pattern recognition example
  • how to choose the decision boundary?

is this one better?
Duda et al., 2000
26
Probability theory review
  • a chance experiment, e.g., tossing a 6-sided die
  • 1, 2, 3, 4, 5, 6 are possible outcomes
  • the set of all outcomes Ω = {1, 2, 3, 4, 5, 6} is the
    sample space
  • any subset of the sample space is an event
  • the event that the outcome is odd: A = {1, 3, 5}
  • each event is assigned a number called the
    probability of the event P(A)
  • the assigned probabilities can be selected
    freely, as long as Kolmogorov axioms are not
    violated

27
Probability axioms
  • for any event A, P(A) ≥ 0
  • for the sample space, P(Ω) = 1
  • for disjoint events A and B, P(A ∪ B) = P(A) + P(B)
  • the third axiom also includes the case of countably many
    disjoint events
  • die tossing: if all outcomes are equally likely,
    then for all i = 1, ..., 6 the probability of getting outcome i
    is 1/6

28
Conditional probability
  • sometimes events occur and change the
    probabilities of other events
  • example: ten coins in a bag
  • nine of them are fair coins: heads (H) and tails
    (T)
  • one of them is fake: both sides are heads (H)
  • I randomly draw one coin from the bag, but I
    don't show it to you
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • which of these events would you bet on?

29
Conditional probability
  • suppose I flip the coin five times, obtaining the
    outcome HHHHH (five heads in a row)
  • call this event F
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • which of these events would you bet on now?

30
Conditional probability
  • definition: the conditional probability of event
    A given that event B has occurred is
    P(A | B) = P(A ∩ B) / P(B)
  • P(A ∩ B) is the probability of events A and B
    occurring together
  • Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)

read as "probability of A given B"
31
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)
  • we know that F occurred
  • we want to find P(H0 | F) and P(H1 | F)
  • difficult to compute directly; use Bayes' theorem

32
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

33
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(H0 | F) = P(F | H0) P(H0) / P(F)
posterior probability: P(H0 | F)
probability of observing F if H0 was true: P(F | H0)
prior probability (before the observation F): P(H0)
total probability of observing F: P(F)
34
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

total probability of observing F: P(F) = P(F | H0) P(H0) + P(F | H1) P(H1)
35
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(F | H0) = 1 (the fake coin always shows heads)
36
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(F | H0) = 1, P(H0) = 1/10
37
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(F | H0) = 1, P(H0) = 1/10, P(F | H1) = (1/2)^5 = 1/32
38
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(F | H0) = 1, P(H0) = 1/10, P(F | H1) = 1/32, P(H1) = 9/10
39
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)

P(H0 | F) = P(F | H0) P(H0) / (P(F | H0) P(H0) + P(F | H1) P(H1))
         = (1 × 1/10) / (1 × 1/10 + 1/32 × 9/10) = 32/41 ≈ 0.78
which event would you bet on?
40
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)
  • this is very similar to a pattern recognition
    problem!

P(H0 | F) = 32/41, P(H1 | F) = 9/41
41
Conditional probability
  • H0: the coin is fake, both sides H
  • H1: the coin is fair, one side H, the other side T
  • F: obtaining five heads in a row (HHHHH)
  • we can put a label on the coin as "fake" based on
    our observations!

P(H0 | F) = 32/41, P(H1 | F) = 9/41
42
Bayesian inference
  • ω0: the coin belongs to the "fake" class
  • ω1: the coin belongs to the "fair" class
  • x: observation
  • decide ωi if its posterior probability P(ωi | x)
    is higher than the others
  • this is called the MAP (maximum a posteriori)
    decision rule; a minimal sketch follows below
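
To make the rule concrete, here is a minimal Python sketch (not part of the original slides) that applies the MAP rule to the fake/fair coin example; the priors and likelihoods are the values worked out on the previous slides.

```python
# MAP decision for the fake/fair coin example (values from the slides above):
# P(fake) = 1/10, P(fair) = 9/10, P(F | fake) = 1, P(F | fair) = (1/2)**5
priors = {"fake": 1 / 10, "fair": 9 / 10}
likelihoods = {"fake": 1.0, "fair": (1 / 2) ** 5}   # P(F | class)

joint = {c: likelihoods[c] * priors[c] for c in priors}   # P(F | c) P(c)
evidence = sum(joint.values())                            # total probability P(F)
posteriors = {c: joint[c] / evidence for c in priors}     # P(c | F)

label = max(posteriors, key=posteriors.get)               # MAP decision
print(posteriors)   # {'fake': 0.780..., 'fair': 0.219...}  i.e. 32/41 and 9/41
print(label)        # 'fake'
```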

43
Random variables
  • we model the observations with random variables
  • a random variable is a real number whose value
    depends on a chance experiment
  • discrete random variable
  • the possible values form a discrete set
  • continuous random variable
  • the possible values form a continuous set

44
Random variables
  • a discrete random variable X is characterized by
    a probability mass function (pmf) p(xi) = P(X = xi)
  • a pmf has two properties: p(xi) ≥ 0 for all i, and Σi p(xi) = 1

45
Random variables
  • a continuous random variable X is characterized
    by a probability density function (pdf) denoted
    by f(x)
  • f(x) ≥ 0 for all possible values
  • probabilities are calculated for intervals:
    P(a ≤ X ≤ b) = ∫_a^b f(x) dx

46
Random variables
  • a pdf also has two properties: f(x) ≥ 0 for all x, and the
    integral of f(x) over the whole real line equals 1

47
Expectation
  • definition: E[X] = Σi xi p(xi) for a discrete X,
    E[X] = ∫ x f(x) dx for a continuous X
  • average of possible values of X, weighted by
    probabilities
  • also called expected value, mean

48
Variance and standard deviation
  • variance is the expected value of the squared deviation
    from the mean: Var(X) = E[(X - E[X])^2]
  • variance is always nonnegative
  • it is zero only if X is not random
  • standard deviation is the square root of the
    variance

49
Gaussian (normal) distribution
  • possibly the most "natural" distribution
  • encountered frequently in nature
  • central limit theorem: the sum of many i.i.d. random
    variables is approximately Gaussian
  • definition: the random variable with pdf
    f(x) = (1 / (σ √(2π))) exp(-(x - μ)^2 / (2σ^2))
  • two parameters: the mean μ and the variance σ^2

50
Gaussian distribution
it can be proved that about 68% of the probability mass lies within
one standard deviation of the mean, about 95% within two, and about
99.7% within three
figure lifted from http://assets.allbusiness.com
51
Random vectors
  • extension of the scalar case
  • pdf: f(x), defined over d-dimensional vectors x
  • mean: μ = E[x]
  • covariance matrix: Σ = E[(x - μ)(x - μ)^T]
  • covariance matrix is always symmetric and
    positive semidefinite

52
Multivariate Gaussian distribution
  • probability density function:
    f(x) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2)(x - μ)^T Σ^(-1) (x - μ))
  • two parameters: the mean vector μ and the covariance matrix Σ
  • compare with the univariate case

53
Bivariate Gaussian exercise
The scatter plots show 100 independent samples
drawn from zero-mean Gaussian distributions, with
different covariance matrices. Match the
covariance matrices with the scatter plots, by
inspection only.
[scatter plots (a), (b), (c) and the candidate covariance matrices are shown on the slide]
54
Bivariate Gaussian exercise
The scatter plots show 100 independent samples
drawn from zero-mean Gaussian distributions, with
different covariance matrices. Match the
covariance matrices with the scatter plots, by
inspection only.
[answer slide: the matched scatter plots (a), (b), (c) and covariance matrices are shown on the slide]
55
Bayesian decision theory
  • Bayesian decision theory falls into the
    subjective interpretation of probability
  • in the pattern recognition context, some prior
    belief about the class (category) of an
    observation is updated using the Bayes rule

56
Bayesian decision theory
  • back to the fish example
  • say we have two classes (states of nature), ω1 and ω2
  • let P(ω1) be the prior probability that the
    fish is a sea bass
  • P(ω2) is the prior probability that the fish
    is a salmon

57
Bayesian decision theory
  • prior probabilities reflect our belief about
    which kind of fish to expect, before we observe
    it
  • we can choose according to the fishing location,
    time of year, etc.
  • if we don't have any prior knowledge, we can
    choose equal priors (or uniform priors)

58
Bayesian decision theory
  • let x be the feature vector
    obtained from our observations
  • it can include features like lightness, weight,
    length, etc.
  • calculate the posterior probabilities P(ω1 | x) and P(ω2 | x)
  • how to calculate them? use Bayes' theorem:
    P(ωi | x) = p(x | ωi) P(ωi) / p(x)

59
Bayesian decision theory
  • p(x | ωi) is called the class-conditional
    probability density function (CCPDF)
  • the pdf of observation x if the true class was ωi
  • the CCPDF is usually not known
  • e.g., impossible to know the pdf of the length of
    all sea bass in the world
  • but it can be estimated, more on this later
  • for now, assume that the CCPDF is known
  • just substitute the observation x in p(x | ωi)

60
Bayesian decision theory
  • MAP rule (also called the minimum-error rule)
  • decide ω1 if P(ω1 | x) > P(ω2 | x)
  • decide ω2 otherwise
  • do we really have to calculate p(x)?

61
Bayesian decision theory
  • multiclass problems: decide ωi if P(ωi | x) ≥ P(ωj | x) for all j;
    this is the maximum a posteriori (MAP) decision rule
  • the MAP rule minimizes the error probability, and
    gives the best performance that can be achieved (of
    course, if the CCPDFs are known)
  • if prior probabilities are equal, the MAP rule reduces to
    maximizing p(x | ωi); this is the maximum likelihood (ML)
    decision rule (see the sketch below)
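
As an illustration (not from the slides), here is a small Python sketch of the MAP and ML rules for a single feature, assuming two hypothetical classes with known Gaussian CCPDFs and made-up priors; the class names echo the fish example but all numbers are invented.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# hypothetical class-conditional densities p(x | w_i) and priors P(w_i)
classes = {
    "sea bass": {"mu": 4.0, "sigma": 1.0, "prior": 0.7},
    "salmon":   {"mu": 6.0, "sigma": 1.0, "prior": 0.3},
}

def classify(x, use_priors=True):
    # MAP: maximize p(x | w_i) P(w_i); ML: maximize p(x | w_i) only
    scores = {
        name: gaussian_pdf(x, c["mu"], c["sigma"]) * (c["prior"] if use_priors else 1.0)
        for name, c in classes.items()
    }
    return max(scores, key=scores.get)

x = 5.2
print(classify(x, use_priors=True))    # MAP decision: 'sea bass'
print(classify(x, use_priors=False))   # ML decision:  'salmon'
```

With these made-up numbers the two rules disagree at x = 5.2, which shows how unequal priors can shift the decision.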
62
Exercise (single feature)
  • find the maximum likelihood decision rule for the
    class-conditional densities shown

Duda et al., 2000
63
Exercise (single feature)
  • find the maximum likelihood decision rule for the
    class-conditional densities shown

Duda et al., 2000
64
Exercise (single feature)
  • find the MAP decision rule
  • decide ω1 if ...
  • decide ω2 if ...

Duda et al., 2000
65
Exercise (single feature)
  • find the MAP decision rule
  • decide ω1 if ...
  • decide ω2 if ...

Duda et al., 2000
66
Discriminant functions
  • we can generalize this
  • let gi(x) be the discriminant function for the
    ith class
  • decision rule: assign x to class i if gi(x) > gj(x) for all j ≠ i
  • for the MAP rule, gi(x) = P(ωi | x)

67
Discriminant functions
  • the discriminant functions divide the feature
    space into decision regions that are separated by
    decision boundaries

68
Discriminant functions for Gaussian densities
  • consider a multiclass problem (c classes)
  • discriminant functions (up to an additive constant):
    gi(x) = -(1/2)(x - μi)^T Σi^(-1) (x - μi) - (1/2) ln|Σi| + ln P(ωi)
  • easy to show analytically that the decision
    boundaries are hyperquadrics
  • if the feature space is 2-D, conic sections
  • hyperplanes (or lines for 2-D) if covariance
    matrices are the same for all classes (degenerate
    case)

69
Examples
[figures: 2-D and 3-D cases with equal and spherical covariance
matrices, and with equal covariance matrices]
Duda et al., 2000
70
Examples
Duda et al., 2000
71
Examples
Duda et al., 2000
72
2-D example
  • artificial data

Jain et al., 2000
73
Density estimation
  • but, CCPDFs are usually unknown
  • that's why we need training data

density estimation
  parametric: assume a class of densities (e.g., Gaussian), find
    the parameters
  non-parametric: estimate the pdf directly (and numerically) from
    the training data
74
Density estimation
  • assume we have n samples of training vectors for
    a class
  • we assume that these samples are independent and
    drawn from a certain probability distribution
  • this is called the generative approach

75
Parametric methods
  • we will consider only the Gaussian case
  • underlying assumption: samples are actually
    noise-corrupted versions of a single feature
    vector
  • why Gaussian? three important properties
  • completely specified by the mean and variance
  • linear transformations of a Gaussian remain Gaussian
  • central limit theorem: many phenomena encountered
    in reality are asymptotically Gaussian

76
Gaussian case
  • assume x1, ..., xn are drawn from
    a Gaussian distribution
  • how to find the pdf?

77
Gaussian case
  • assume x1, ..., xn are drawn from
    a Gaussian distribution
  • how to find the pdf?
  • finding the mean and covariance is sufficient

sample mean: m = (1/n) Σi xi
sample covariance: S = (1/(n-1)) Σi (xi - m)(xi - m)^T
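
A minimal numpy sketch (with made-up data, not from the slides) of estimating the sample mean and covariance for one class and evaluating the fitted Gaussian CCPDF at a test point:

```python
import numpy as np

# made-up training feature vectors for one class (n samples, d = 2 features)
X = np.array([[1.0, 2.1], [1.3, 1.8], [0.8, 2.4], [1.1, 2.0], [0.9, 1.9]])

m = X.mean(axis=0)              # sample mean
S = np.cov(X, rowvar=False)     # sample covariance (rows are samples)

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at x."""
    d = len(mean)
    diff = x - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

x_test = np.array([1.0, 2.0])
print(m, S)
print(gaussian_pdf(x_test, m, S))   # estimated CCPDF value p(x | class)
```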
78
2-D example
  • back to the 2-D example

calculate the estimated class-conditional densities
apply the MAP rule
79
2-D example
  • back to the 2-D example

80
2-D example
decision boundary with true pdf
decision boundary with estimated pdf
81
Haptics example
Flagg et al., 2012
light touch
stroke
scratch
which feature to use for discrimination?
82
Haptics example
  • Flagg et al., 2012
  • 7 participants performed each gesture 10 times
  • 210 samples in total
  • we should find distinguishing features
  • let's use one feature at a time
  • we assume each feature value is normally
    distributed, and find its mean and variance

83
Haptics example
assume equal priors, apply the ML rule
84
Haptics example
apply the ML rule; what are the decision boundaries? (decision
thresholds for 1-D)
85
Haptics example
  • let's plot the 2-D distribution
  • clearly this isn't a "good" classifier for this
    problem
  • the Gaussian assumption is not valid

86
Activity recognition example
  • Altun et al., 2010a
  • 4 participants (2 male, 2 female)
  • activities: standing, ascending stairs, walking
  • 720 samples in total
  • sensor: accelerometer on the right leg
  • let's use the same features
  • minimum and maximum values

87
Activity recognition example
feature 1
feature 2
88
Activity recognition example
  • the Gaussian assumption looks valid
  • this is a "good" classifier for this problem

89
Activity recognition example
  • decision boundaries

90
Haptics example
  • how to solve the problem?

91
Haptics example
  • how to solve the problem?
  • either change the classifier, or change the
    features

92
Non-parametric methods
  • let's estimate the CCPDF directly from samples
  • simplest method to use is the histogram
  • partition the feature space into (equally-sized)
    bins
  • count the number of samples in each bin

p(x) ≈ k / (nV), where k is the number of samples in the bin that
includes x, n is the total number of samples, and V is the volume of the bin
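
A small Python sketch (toy data, illustrative only) of the histogram estimate p(x) ≈ k/(nV) in one dimension, with a hypothetical bin width:

```python
import numpy as np

samples = np.array([1.2, 1.9, 2.1, 2.3, 2.8, 3.1, 3.3, 4.0, 4.4, 5.2])  # toy data
bin_width = 1.0                        # V, the "volume" of a bin in 1-D
edges = np.arange(0.0, 7.0, bin_width)

def histogram_density(x, samples, edges):
    """p(x) ~= k / (n * V): fraction of samples in x's bin, divided by bin volume."""
    n = len(samples)
    V = edges[1] - edges[0]
    idx = np.searchsorted(edges, x, side="right") - 1   # bin that contains x
    lo, hi = edges[idx], edges[idx + 1]
    k = np.sum((samples >= lo) & (samples < hi))
    return k / (n * V)

print(histogram_density(2.5, samples, edges))   # 0.3: three samples fall in [2, 3)
```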
93
Non-parametric methods
  • how to choose the bin size?
  • the number of bins increases exponentially with the
    dimension of the feature space
  • we can do better than that!

94
Non-parametric methods
  • compare the following density estimates
  • pdf estimates with six samples

image from http://en.wikipedia.org/wiki/Parzen_Windows
95
Kernel density estimation
  • a density estimate can be obtained as a sum of kernel functions
  • where the kernel functions are Gaussians centered
    at the training samples xi. More precisely,
    p(x) ≈ (1/n) Σi (1/Vn) K((x - xi) / hn), with Vn = hn^d

K: Gaussian kernel, hn: width of the Gaussian
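
A minimal Python/numpy sketch of a 1-D kernel density estimate with a Gaussian kernel; the data and bandwidth values are arbitrary, for illustration only.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, samples, h):
    """p_hat(x) = (1/n) * sum_i (1/h) * K((x - x_i) / h), 1-D Gaussian kernel."""
    samples = np.asarray(samples)
    return np.mean(gaussian_kernel((x - samples) / h) / h)

samples = [1.2, 1.9, 2.1, 2.3, 2.8, 3.1, 3.3, 4.0, 4.4, 5.2]   # toy data
for h in (0.1, 0.5, 2.0):      # small h: spiky estimate, large h: oversmoothed
    print(h, kde(2.5, samples, h))
```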
96
Kernel density estimation
  • three different density estimates with different
    widths
  • if the width is large, the pdf will be too smooth
  • if the width is small, the pdf will be too spiked
  • as the width approaches zero, the pdf converges
    to a sum of Dirac delta functions

Duda et al., 2000
97
KDE for activity recognition data
98
KDE for activity recognition data
99
KDE for gesture recognition data
100
Other density estimation methods
  • Gaussian mixture models
  • parametric
  • model the distribution as a sum of M Gaussians
  • optimization algorithm: expectation-maximization (EM)
  • k-nearest neighbor estimation
  • non-parametric
  • variable bin width, fixed k

101
Another example
Aksoy, 2011
102
Measuring classifier performance
  • how do we know our classifiers will work?
  • how do we measure the performance, i.e., decide
    one classifier is better than the other?
  • correct recognition rate
  • confusion matrix
  • ideally, we should have more data independent
    from the training set and test the classifiers

103
Confusion matrix
confusion matrix for an 8-class problem Tunçel
et al., 2009
104
Measuring classifier performance
  • use the training samples to test the classifiers
  • this is possible, but not good practice

100% correct classification rate for this
example! because the classifier "memorized" the
training samples instead of "learning" them
Duda et al., 2000
105
Cross validation
  • having a separate test data set might not be
    possible for some cases
  • we can use cross validation
  • use some of the data for training, and the
    remaining for testing
  • how to divide the data?

106
Cross validation methods
  • repeated random sub-sampling
  • divide the data into two groups randomly (usually
    the size of the training set is larger)
  • train and test, record the correct classification
    rate
  • do this repeatedly, take the average

107
Cross validation methods
  • K-fold cross validation
  • randomly divide the data into K sets
  • use K-1 sets for training, 1 set for testing
  • repeat K times, at each fold use a different set
    for testing
  • leave-one-out cross validation
  • use one sample for testing, and all the remaining
    for training
  • same as K-fold cross validation, with K being
    equal to the total number of samples
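
Here is a minimal Python sketch of how K-fold cross-validation indices can be generated; leave-one-out is the special case K = n. The classifier call is a placeholder, and the demonstration labels are made up.

```python
import numpy as np

def k_fold_cross_validation(n, K, train_and_test, seed=0):
    """Split sample indices 0..n-1 into K random folds; train on K-1 folds,
    test on the held-out fold, and average the per-fold correct rates.
    train_and_test(train_idx, test_idx) -> correct classification rate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    rates = []
    for i in range(K):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])
        rates.append(train_and_test(train_idx, test_idx))
    return np.mean(rates)

# toy demonstration: a trivial "majority class" rule stands in for the classifier
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 1])

def majority_rule(train_idx, test_idx):
    majority = np.bincount(y[train_idx]).argmax()
    return np.mean(y[test_idx] == majority)

print(k_fold_cross_validation(len(y), K=5, train_and_test=majority_rule))
# leave-one-out cross validation is the special case K = len(y)
```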

108
Haptics example
assume equal priors, apply the ML rule
correct classification rate: 60.0%
the decision region for light touch is too small!
109
Haptics example
apply the ML rule
correct classification rate: 58.5%
110
Haptics example
correct classification rates: 58.8% and 62.4%
111
Activity recognition example
correct classification rates: 75.8% (minimum value feature) and 71.9% (maximum value feature)
112
Activity recognition example
correct classification rate: 87.8%
113
Another cross-validation method
  • used in HCI studies with multiple human subjects
  • subject-based leave-one-out cross validation
  • number of subjects: S
  • leave one subject's data out, train with the
    remaining data
  • repeat S times, each time testing with a
    different subject, then average
  • gives an estimate for the expected correct
    recognition rate when a new user is encountered
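
A sketch (hypothetical subject IDs, placeholder classifier) of subject-based leave-one-out cross-validation: all samples of one subject form the test set in each round.

```python
import numpy as np

def subject_loo(subject_ids, train_and_test):
    """Leave one subject's data out at a time; average the per-subject rates."""
    subject_ids = np.asarray(subject_ids)
    rates = []
    for s in np.unique(subject_ids):
        test_idx = np.where(subject_ids == s)[0]
        train_idx = np.where(subject_ids != s)[0]
        rates.append(train_and_test(train_idx, test_idx))
    return np.mean(rates)

# hypothetical: 4 subjects, 3 samples each; the classifier call is a dummy here
subjects = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(subject_loo(subjects, lambda tr, te: 0.8))   # dummy rate, for illustration
```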

114
Activity recognition example
minimum value feature: K-fold 75.8%, subject-based leave-one-out 60.8%
maximum value feature: K-fold 71.9%, subject-based leave-one-out 61.6%
115
Activity recognition example
K-fold: 87.8%
subject-based leave-one-out: 81.8%
116
Dimensionality reduction
Duda et al., 2000
  • for most problems a few features are not enough
  • adding features sometimes helps

117
Dimensionality reduction
Jain et al., 2000
  • should we add as many features as we can?
  • what does this figure say?

118
Dimensionality reduction
  • we should add features up to a certain point
  • the more the training samples, the farther away
    this point is
  • more features → higher dimensional feature spaces
  • in higher dimensions, we need more samples to
    estimate the parameters and the densities
    accurately
  • number of necessary training samples grows
    exponentially with the dimension of the feature
    space
  • this is called the curse of dimensionality

119
Dimensionality reduction
  • how many features to use?
  • rule of thumb: use at least ten times as many
    training samples as the number of features
  • which features to use?
  • difficult to know beforehand
  • one approach: consider many features and select
    among them

120
Pen input recognition
Willems, 2010
121
Touch gesture recognition
Flagg et al., 2012
122
Feature reduction and selection
  • form a set of many features
  • some of them might be redundant
  • feature reduction (sometimes called feature
    extraction)
  • form linear or nonlinear combinations of features
  • features in the reduced set usually don't have
    physical meaning
  • feature selection
  • select most discriminative features from the set

123
Feature reduction
  • we will only consider Principal Component
    Analysis (PCA)
  • unsupervised method
  • we don't care about the class labels
  • consider the distribution of all the feature
    vectors in the d-dimensional feature space
  • PCA is the projection to a lower dimensional
    space that best represents the data
  • get rid of unnecessary dimensions

124
Principal component analysis
  • how to best represent the data?

125
Principal component analysis
  • how to best represent the data?

find the direction(s) in which the variance of
the data is the largest
126
Principal component analysis
  • find the covariance matrix Σ
  • spectral decomposition: Σ = V Λ V^T
  • eigenvalues: on the diagonal of Λ
  • eigenvectors: columns of V
  • the covariance matrix is symmetric and positive
    semidefinite, so the eigenvalues are nonnegative and the
    eigenvectors are orthogonal

127
Principal component analysis
  • put the eigenvalues in decreasing order
  • corresponding eigenvectors show the principal
    directions in which the variance of the data is
    largest
  • say we want to have m features only
  • project to the space spanned by the first m
    eigenvectors
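
A minimal numpy sketch of PCA as described on the last two slides: eigendecomposition of the sample covariance, eigenvalues sorted in decreasing order, projection onto the first m eigenvectors. The data here are random, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 feature vectors, d = 5 (toy data)

mean = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)          # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)  # symmetric PSD -> real, orthogonal
order = np.argsort(eigvals)[::-1]         # decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2                                     # keep the first m principal directions
W = eigvecs[:, :m]                        # d x m projection matrix
Z = (X - mean) @ W                        # reduced feature vectors (100 x m)

print(eigvals)       # variances along the principal directions
print(Z.shape)       # (100, 2)
```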

128
Activity recognition example
Altun et al., 2010a
  • five sensor units (wrists, legs, chest)
  • each unit has three accelerometers, three
    gyroscopes, three magnetometers
  • 45 sensors in total
  • computed 26 features from sensor signals
  • mean, variance, min, max, Fourier transform etc.
  • 45 × 26 = 1170 features

129
Activity recognition example
  • compute covariance matrix
  • find eigenvalues and eigenvectors
  • plot first 100 eigenvalues
  • reduced the number of features to 30

130
Activity recognition example
131
Activity recognition example
what does the Bayesian decision making (BDM)
result suggest?
132
Feature reduction
  • ideally, this should be done for the training set
    only
  • estimate the covariance matrix from the training set, find its
    eigenvalues and eigenvectors, and form the projection
  • apply the projection to the test vector
  • for example for K-fold cross validation, this
    should be done K times
  • computationally expensive

133
Feature selection
  • alternatively, we can select from our large
    feature set
  • say we have d features and want to reduce them to m
  • optimal way: evaluate all possible m-element subsets
    and choose the best one
  • not feasible except for small values of m and d
  • suboptimal methods: greedy search

134
Feature selection
  • best individual features
  • evaluate all the d features individually, select
    the best m features

135
Feature selection
  • sequential forward selection
  • start with the empty set
  • evaluate all features one by one, select the best
    one, add to the set
  • form pairs of features with this one and one of
    the remaining features, add the best one to the
    set
  • form triplets of features with these two and one
    of the remaining features, add the best one to
    the set
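
A sketch of the sequential forward selection loop in Python; evaluate(feature_subset) is a placeholder for whatever criterion is used (e.g., a cross-validated correct classification rate), and the toy criterion below is invented for illustration.

```python
def sequential_forward_selection(d, m, evaluate):
    """Greedily grow a feature set of size m out of d candidate features.
    evaluate(subset) should return a score (higher is better), e.g. a CV rate."""
    selected = []
    remaining = list(range(d))
    while len(selected) < m:
        best_feature = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected

# toy criterion: pretend even-indexed features are more useful (illustration only)
toy_score = lambda subset: sum(1.0 if f % 2 == 0 else 0.1 for f in subset)
print(sequential_forward_selection(d=6, m=3, evaluate=toy_score))   # [0, 2, 4]
```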

136
Feature selection
  • sequential backward selection
  • start with the full feature set
  • evaluate by removing one feature at a time from
    the set, then remove the worst feature
  • continue step 2 with the current feature set

137
Feature selection
  • plus p take away r selection
  • first enlarge the feature set by adding p
    features using sequential forward selection
  • then remove r features using sequential backward
    selection

138
Activity recognition example
first 5 features selected by sequential forward
selection
first 5 features selected by PCA
SFS performs better than PCA for a few features.
If 10-15 features are used, their performances
become closer. Time domain features and leg
features are more discriminative
Altun et al., 2010b
139
Activity recognition example
Altun et al., 2010b
140
Discriminative methods
  • we talked about discriminant functions
  • for the MAP rule we used gi(x) = P(ωi | x)
  • discriminative methods try to find gi(x)
    directly from the data

141
Linear discriminant functions
  • consider a discriminant function that is a
    linear combination of the components of x:
    g(x) = w^T x + w0
  • for the two-class case, there is a single
    decision boundary, g(x) = 0

142
Linear discriminant functions
  • for the multiclass case, there are options
  • c two-class problems: separate each class from all the others
  • consider classes pairwise: c(c - 1)/2 two-class problems

143
Linear discriminant functions
distinguish one class from others
consider classes pairwise
Duda et al., 2000
144
Linear discriminant functions
  • or, use the original definition
  • assign x to class i if gi(x) > gj(x) for all j ≠ i

Duda et al., 2000
145
Nearest mean classifier
  • find the means of training vectors
  • assign the class of the nearest mean for a test
    vector y
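
A minimal numpy sketch of the nearest mean classifier, using small made-up training sets for two classes:

```python
import numpy as np

# made-up training vectors for two classes
train = {
    0: np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1]]),
    1: np.array([[3.0, 3.2], [2.8, 3.1], [3.1, 2.9]]),
}
means = {c: X.mean(axis=0) for c, X in train.items()}   # class means

def nearest_mean(y):
    """Assign the class whose mean is closest (Euclidean distance) to test vector y."""
    return min(means, key=lambda c: np.linalg.norm(y - means[c]))

print(nearest_mean(np.array([1.1, 1.3])))   # 0
print(nearest_mean(np.array([2.5, 2.9])))   # 1
```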

146
2-D example
  • artificial data

147
2-D example
  • estimated parameters

decision boundary with true pdf
decision boundary with nearest mean classifier
148
Activity recognition example
149
k-nearest neighbor method
  • for a test vector y
  • find the k closest training vectors
  • let ki be the number of training vectors
    belonging to class i among these k vectors
  • assign y to the class with the largest ki (majority vote)
  • simplest case: k = 1
  • just find the closest training vector and assign its
    class
  • decision boundaries (for k = 1):
    a Voronoi tessellation of the space (see the sketch below)
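
A minimal numpy sketch of the k-nearest neighbor rule with Euclidean distance and majority voting; the training data are made up for illustration.

```python
import numpy as np
from collections import Counter

# made-up training data: feature vectors and their class labels
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.2], [2.8, 3.1], [3.1, 2.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_classify(y, k=3):
    """Find the k closest training vectors and vote on the class label."""
    distances = np.linalg.norm(X_train - y, axis=1)   # Euclidean (L2) distances
    nearest = np.argsort(distances)[:k]               # indices of the k nearest neighbors
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

print(knn_classify(np.array([1.1, 1.2]), k=1))   # 0 (nearest neighbor rule)
print(knn_classify(np.array([2.4, 2.5]), k=3))   # 1
```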

150
1-nearest neighbor
  • decision regions

this is called a Voronoi tessellation
Duda et al., 2000
151
k-nearest neighbor
  • test sample: circle
  • one class: squares
  • the other class: triangles
  • note how the decision is different for k = 3 and
    k = 5

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
152
k-nearest neighbor
  • no training is needed
  • computation time for testing is high
  • many techniques to reduce the computational load
    exist
  • other alternatives exist for computing the
    distance
  • Manhattan distance (L1 norm)
  • chessboard distance (L∞ norm)

153
Haptics example
K-fold: 63.3%
subject-based leave-one-out: 59.0%
154
Activity recognition example
K-fold: 90.0%
subject-based leave-one-out: 89.2%
155
Activity recognition example
decision boundaries for k = 3
156
Feature normalization
  • especially when computing distances, the scales
    of the feature axes are important
  • features with large ranges may be weighted more
  • feature normalization can be applied so that the
    ranges are similar

157
Feature normalization
  • linear scaling: x' = (x - l) / (u - l)
  • normalization to zero mean, unit variance: x' = (x - m) / s
  • other methods exist

where l is the lowest value and u is the largest
value of the feature x, m is the mean value and s is the
standard deviation of the feature x; a sketch follows below
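
A numpy sketch of the two normalization schemes above, with the parameters estimated from a hypothetical training set only and then applied to a test vector, as the next slide recommends:

```python
import numpy as np

X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])  # toy data
x_test = np.array([2.5, 700.0])

# linear scaling to [0, 1]: x' = (x - l) / (u - l), with l, u from the training set
l, u = X_train.min(axis=0), X_train.max(axis=0)
scaled_train = (X_train - l) / (u - l)
scaled_test = (x_test - l) / (u - l)

# zero mean, unit variance: x' = (x - m) / s, with m, s from the training set
m, s = X_train.mean(axis=0), X_train.std(axis=0)
z_train = (X_train - m) / s
z_test = (x_test - m) / s

print(scaled_test)   # [0.5, 0.833...]
print(z_test)        # [0.0, 0.894...]
```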
158
Feature normalization
  • ideally, the parameters l, u, m, and s should be
    estimated from the training set only, and then
    used on the test vectors
  • for example for K-fold cross validation, this
    should be done K times

159
Discriminative methods
  • another popular method is the binary decision
    tree
  • start from the root node
  • proceed in the tree by setting thresholds on the
    feature values
  • proceed with sequentially answering questions
    like
  • "is feature j less than threshold value Tk?"

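A tiny Python sketch (hypothetical feature indices and thresholds) of the kind of threshold questions a binary decision tree asks; a real tree would learn the features and thresholds from training data.

```python
def classify(x):
    """Hand-built two-level decision tree over a feature vector x (illustrative only).
    Each node asks: is feature j less than threshold T?"""
    if x[0] < 2.0:                # root node: threshold on feature 0
        return "standing"
    elif x[1] < 5.0:              # internal node: threshold on feature 1
        return "walking"
    else:
        return "ascending stairs"

print(classify([1.5, 9.0]))   # 'standing'
print(classify([3.0, 4.0]))   # 'walking'
print(classify([3.0, 7.0]))   # 'ascending stairs'
```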
160
Activity recognition example
161
Discriminative methods
Aksoy, 2011
  • one very popular method is the support vector
    machine classifier
  • linear classifier applicable to linearly
    separable data
  • if the data is not linearly separable, maps to a
    higher dimensional space
  • usually a Hilbert space

162
Comparison for activity recognition
  • 1170 features reduced to 30 by PCA
  • 19 activities
  • 8 participants

163
References
  • S. Aksoy, Pattern Recognition lecture notes,
    Bilkent University, Ankara, Turkey, 2011.
  • A. Moore, Statistical Data Mining tutorials
    (http://www.autonlab.org/tutorials).
  • J. Tenenbaum, The Cognitive Science of Intuitive
    Theories lecture notes, Massachusetts Institute
    of Technology, MA, USA, 2006. (accessed online:
    http://www.mit.edu/jbt/9.iap/9.94.Tenenbaum.ppt)
  • R. O. Duda, P. E. Hart, D. G. Stork, Pattern
    Classification, 2nd ed., Wiley-Interscience,
    2000.
  • A. K. Jain, R. P. W. Duin, J. Mao, "Statistical
    pattern recognition: a review," IEEE Transactions
    on Pattern Analysis and Machine Intelligence,
    22(1):4-37, January 2000.
  • A. R. Webb, Statistical Pattern Recognition, 2nd
    ed., John Wiley & Sons, West Sussex, England,
    2002.
  • V. N. Vapnik, The Nature of Statistical Learning
    Theory, 2nd ed., Springer-Verlag New York, Inc.,
    2000.
  • K. Altun, B. Barshan, O. Tunçel, (2010a)
    "Comparative study on classifying human
    activities with miniature inertial/magnetic
    sensors," Pattern Recognition, 43(10):3605-3620,
    October 2010.
  • K. Altun, B. Barshan, (2010b) "Human activity
    recognition using inertial/magnetic sensor
    units," in Human Behavior Understanding, Lecture
    Notes in Computer Science, A. A. Salah et al.
    (eds.), vol. 6219, pp. 38-51, Springer, Berlin,
    Heidelberg, August 2010.
  • A. Flagg, D. Tam, K. MacLean, R. Flagg,
    "Conductive fur sensing for a gesture-aware furry
    robot," Proceedings of the IEEE 2012 Haptics
    Symposium, March 4-7, 2012, Vancouver, B.C.,
    Canada.
  • O. Tunçel, K. Altun, B. Barshan, "Classifying
    human leg motions with uniaxial piezoelectric
    gyroscopes," Sensors, 9(11):8508-8546, November
    2009.
  • D. Willems, Interactive Maps: using the pen in
    human-computer interaction, PhD Thesis, Radboud
    University Nijmegen, Netherlands, 2010. (accessed
    online: http://www.donwillems.net/waaaa/InteractiveMaps_PhDThesis_DWillems.pdf)