Title: Lecture 4: Homework Discussion, and more on Classification
1. Lecture 4: Homework Discussion, and more on Classification
- CS 175, Fall 2007
- Padhraic Smyth
- Department of Computer Science
- University of California, Irvine
2. Outline
- Discussion of Assignment 1
- Classification revisited
- Discussion of Assignment 2
- Due Wednesday (tomorrow) at noon
3. Grading of Assignment 1
- 40 points total
- Each MATLAB function: 10 points
  - euclidean.m, nearest_neighbor.m, maxvalue.m
  - functioning correctly on the test cases: 6 points
  - comments: 2 points
  - error-checking: 2 points
- Test case example
  - x: random vector of length 100
  - A: random matrix with 100 rows and 100 columns
4. Comments on Grading
- Common mistakes
  - incorrect definition of Euclidean distance: dE(x, y) = sqrt( sum_i (xi - yi)^2 )
  - no error-checking
    - nearest_neighbor(x, A) => check that cols(A) == cols(x) and rows(x) == 1
  - no comments
    - no comments in header
    - no comments in body of nearest_neighbor.m
- If you find any errors in the grading of your assignment, please see Nathan during lab hours (or email him to make an appointment) - no grade negotiating!
5. Suggestions
- Improve performance by vectorization
  - can speed things up significantly
  - e.g., calculate the vector distance in euclidean.m as a vector operation
- Do not print input / intermediate / output variables to the screen
  - can increase your run-time significantly
  - use a semicolon at the end of each line
- Helpful commands (see the sketch below)
  - help: to learn more about a function
  - lookfor: to find a built-in function
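As a quick illustration of these two commands (the queries shown are only examples):

help min            % display the documentation for the built-in min function
lookfor maximum     % search the help text of all functions for the keyword "maximum"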
6. Suggestions (2)
- Test your code (!)
  - Some .m functions that were submitted did not run
- 2 types of errors
  - Simple syntax errors (understandable)
  - Systematic errors
    - Incorrect calculations (e.g., for euclidean.m)
    - Incorrect logic in finding the minimum vector
    - Sloppy assignment of variables to values
- How to address this (see the sketch below)
  - Define a set of simple test cases
  - Run your code and compare with a manual calculation
  - Check that the results make intuitive sense
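For example, a minimal test case of this kind for euclidean.m (the expected answer 5 comes from the 3-4-5 right triangle):

% simple test case: the distance between (0,0) and (3,4) should be 5
x = [0 0];
y = [3 4];
d = euclidean(x, y);
fprintf('computed distance = %g (expected 5)\n', d);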
7. Example of a euclidean.m function

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y
8. Example of a euclidean.m function (with error checking)

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end
if (xr ~= 1 || yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end
9. Example of a euclidean.m function (complete)

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end
if (xr ~= 1 || yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end

% calculate a vector of component-by-component differences
delta = x - y;

% now calculate the Euclidean distance
dist = sqrt(delta*delta');

Note the use of vectorization.
10. The min function in MATLAB

>> help min

 MIN    Smallest component.
    For vectors, MIN(X) is the smallest element in X. For matrices,
    MIN(X) is a row vector containing the minimum element from each
    column. For N-D arrays, MIN(X) operates along the first
    non-singleton dimension.

    [Y,I] = MIN(X) returns the indices of the minimum values in vector I.
    If the values along the first non-singleton dimension contain more
    than one minimal element, the index of the first one is returned.

    MIN(X,Y) returns an array the same size as X and Y with the
    smallest elements taken from X or Y. Either one can be a scalar.

    [Y,I] = MIN(X,[],DIM) operates along the dimension DIM.

    When complex, the magnitude MIN(ABS(X)) is used. NaN's are ignored
    when computing the minimum.

    Example: If X = [2 8 4; 7 3 9]
    then min(X,[],1) is [2 3 4], min(X,[],2) is [2; 3],
    and min(X,5) is [2 5 4; 5 3 5].

    See also MAX, MEDIAN, MEAN, SORT.
11. Example of a maxvalue.m function

function [maxvalue, rmax, cmax] = maxvalue(A)
% function [maxvalue, rmax, cmax] = maxvalue(A)
%
% (brief description of function here)
%
% Your Name, CS 175
%
% Inputs:
%    A: a matrix of size r x c, with r rows and c columns
% Outputs:
%    maxvalue: largest entry in A
%    rmax, cmax: integers specifying the (row, column) location of the max value

% Get a row vector (mx_row) containing the maximum value within each column;
% idx_row is a vector containing the location of the maximum within each column
[mx_row, idx_row] = max(A);

% find the maximum within this vector
[maxvalue, cmax] = max(mx_row);

% use idx_row to find the row location of the max
rmax = idx_row(cmax);
12. Example of a nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A that is closest
% in Euclidean distance to row vector x.
%
% A. Student, CS 175
%
% Inputs:
%    x: a vector of numbers of size 1 x n
%    A: k vectors of size 1 x n, "stacked" in a k x n matrix
% Outputs:
%    y: the closest vector in A to x (of size 1 x n)
%    i: the integer (row) index of y in A
%    d: the Euclidean distance between x and y
13. Example of a nearest_neighbor.m function (with error checking)

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A that is closest
% in Euclidean distance to row vector x.
%
% A. Student, CS 175
%
% Inputs:
%    x: a vector of numbers of size 1 x n
%    A: k vectors of size 1 x n, "stacked" in a k x n matrix
% Outputs:
%    y: the closest vector in A to x (of size 1 x n)
%    i: the integer (row) index of y in A
%    d: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[Ar, Ac] = size(A);
if (xc ~= Ac)
    error('input vector x and matrix A must have the same number of columns');
end
if (xr ~= 1)
    error('input vector x must be a row vector');
end
14. For-loop version of the nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
% ..
% ..

[xr, xc] = size(x);
[Ar, Ac] = size(A);
% ..

% "for loop" version of the code
distances = zeros(Ar,1);     % preallocate storage for distances
for j = 1:Ar                 % loop over rows in A
    y = A(j,:);
    distances(j) = euclidean(x,y);
end

% find the minimum distance and its location
[d, i] = min(distances);

% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
16. The repmat function in MATLAB

>> help repmat

 REPMAT Replicate and tile an array.
    B = REPMAT(A,M,N) replicates and tiles the matrix A to produce the
    M-by-N block matrix B.

    B = REPMAT(A,[M N]) produces the same thing.

    B = REPMAT(A,[M N P ...]) tiles the array A to produce a
    M-by-N-by-P-by-... block array. A can be N-D.

    REPMAT(A,M,N) when A is a scalar is commonly used to produce an
    M-by-N matrix filled with A's value. This can be much faster than
    A*ONES(M,N) when M and/or N are large.

    Example:
        repmat(magic(2),2,3)
        repmat(NaN,2,3)

    See also MESHGRID.

>> repmat([1 2], 3, 1)

ans =
     1     2
     1     2
     1     2
17. Vectorized version of the nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
% ..
% ..

% VECTORIZED VERSION OF THE CODE

% create a matrix of size Ar x xc, where each row consists of x
xmatrix = repmat(x,Ar,1);

% subtract the components of xmatrix and A, by matrix subtraction
delta = xmatrix - A;

% now square the differences, by component multiplication
squaredelta = delta.*delta;

% sum up the squared differences, row by row (note the use of transpose ')
distances = sqrt(sum(squaredelta')');

% find the minimum distance and its location (as before)
[d, i] = min(distances);

% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
18. Nearest-Neighbor Classification (revisited)
19. Example of Data from 2 Classes
20. Classifiers and Decision Boundaries
- What is a Classifier?
  - A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}
  - Thus, a classifier partitions the feature space into m decision regions
  - The line or surface separating any 2 classes is the decision boundary
21. 2-Class Data with a Linear Decision Boundary
22. Classification Problem with Overlap
24. Classifiers: functions or mappings
[Diagram: feature values a, b, c, d, ..., z (which are known, measured) feed into the classifier, which outputs a predicted class value (the true class is unknown to the classifier).]
We want a mapping or function which takes any combination of feature values x = (a, b, d, ..., z) and produces a prediction c, i.e., a function c = f(a, b, d, ..., z) which produces a value c = c1, c2, ..., or cm.
The problem is that we don't know this mapping; we have to learn it from data!
25. Classification Accuracy
- Say we have N feature vectors
- Say we know the true class label for each feature vector
- We can measure how accurate a classifier is by how many feature vectors it classifies correctly
- Accuracy = percentage of feature vectors correctly classified
  - training accuracy = accuracy on the training data
  - test accuracy = accuracy on new data not used in training
26. Some Notation
- Training Data
  - Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }
  - N pairs of feature vectors and class labels
- Feature Vectors and Class Labels
  - x(i) is the ith training data feature vector
    - in MATLAB this could be the ith row of an N x d matrix (see the sketch below)
  - c(i) is the class label of the ith feature vector
    - in general, c(i) can take m different class values, e.g., c = 1, c = 2, ...
- Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.
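A small sketch of this notation in MATLAB (the variable names and sizes are illustrative only):

N = 200; d = 2;
Xtrain = rand(N, d);         % N x d matrix: row i is the feature vector x(i)
ctrain = ones(N, 1);         % N x 1 vector: ctrain(i) is the class label c(i)
ctrain(101:200) = 2;         % e.g., the first 100 vectors are class 1, the rest class 2
x5 = Xtrain(5, :);           % the 5th training feature vector
c5 = ctrain(5);              % and its class label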
27. Example
[Scatterplot: training data from two classes (points labeled 1 and 2) plotted against Feature 1 and Feature 2.]
28. kNN Decision Boundary (k=1)
[Scatterplot of the same two-class data against Feature 1 and Feature 2, with the k=1 nearest-neighbor decision boundary overlaid.]
In general: the nearest-neighbor classifier produces piecewise linear decision boundaries.
29. K-Nearest Neighbor (kNN) Classifier
- Find the k nearest neighbors to y in Dtrain
  - i.e., rank the feature vectors according to Euclidean distance
  - select the k vectors which have the smallest distance to y
- Classification (see the sketch below)
  - the ranking yields k feature vectors and a set of k class labels
  - pick the class label which is most common in this set (vote)
  - classify y as belonging to this class
30. K-Nearest Neighbor (kNN) Classifier
- Notes
  - In effect, the classifier uses the nearest k feature vectors from Dtrain to vote on the class label for y
  - the single-nearest-neighbor classifier is the special case of k = 1
  - for two-class problems, if we choose k to be odd (i.e., k = 1, 3, 5, ...) then there will never be any ties
  - training is trivial for the kNN classifier, i.e., we just use Dtrain as a lookup table when we want to classify a new feature vector
- Extensions of the nearest-neighbor classifier
  - weighted distances
    - e.g., if some of the features are more important
    - e.g., if some features are irrelevant
  - fast search techniques (indexing) to find the k nearest neighbors in d-space
31. Assignment 2
- Due Wednesday
- 4 parts
  - Plot classification data in two dimensions
  - Implement a nearest-neighbor classifier
  - Plot the errors of a k-nearest-neighbor classifier
  - Test the effect of the value of k on the accuracy of the classifier
32. Data Structure
- simdata1 =
  - shortname: 'Simulated Data 1'
  - numfeatures: 2
  - classnames: [2x6 char]
  - numclasses: 2
  - description: [1x66 char]
  - features: [200x2 double]
  - classlabels: [200x1 double]
- (see the field-access sketch below)
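A short illustration of loading and accessing fields of this structure (the file name simdata1.mat is an assumption; adjust it to wherever the assignment data is stored):

load simdata1                 % assumed to create the structure simdata1 in the workspace
X = simdata1.features;        % 200 x 2 matrix of feature vectors
c = simdata1.classlabels;     % 200 x 1 vector of class labels
disp(simdata1.shortname);     % prints 'Simulated Data 1'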
33. Plotting Function

function classplot(data, x, y)
% function classplot(data, x, y)
%
% brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    data: a structure with the same fields as described above
%          (your comment header should describe the structure explicitly)
%
% Note that if you are only using certain fields of the structure in the
% function below, you need only define those fields in the input comments.

% --------  Your code goes here  -------
34. First simulated data set, simdata1
35. Second simulated data set, simdata2
36. Nearest Neighbor Classifier

function class_predictions = knn(traindata, trainlabels, k, testdata)
% function class_predictions = knn(traindata, trainlabels, k, testdata)
%
% a brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    k: an odd positive integer indicating the number of neighbors to use
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
% Outputs:
%    class_predictions: N2 x 1 vector of predicted class values

% --------  Your code goes here  -------
37. Plotting k-NN Errors

function knn_plot(traindata, trainlabels, k, testdata, testlabels)
% function knn_plot(traindata, trainlabels, k, testdata, testlabels)
%
% Predicts class labels for the data in testdata using the k nearest
% neighbors in traindata, and then plots the data (using the first 2
% dimensions or first 2 features), displaying the data from each class
% in different colors, and overlaying circles on the points that were
% incorrectly classified.
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    k: an odd positive integer indicating the number of neighbors to use
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
%    testlabels: a N2 x 1 vector of class labels for testdata
38. Accuracy of the kNN Classifier as k is varied

function errors = knn_error_rates(traindata, trainlabels, testdata, testlabels, kmax, plotflag)
% function errors = knn_error_rates(traindata, trainlabels, testdata, testlabels, kmax, plotflag)
%
% a brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
%    testlabels: a N2 x 1 vector of class labels for testdata
%    kmax: an odd positive integer indicating the maximum number of neighbors
%    plotflag: (optional argument) if 1, the error-rates versus k are plotted,
%              otherwise no plot.
% Outputs:
%    errors: r x 1 vector of error-rates on testdata, where r is the
%            number of values of k that are tested.

% --------  Your code goes here  -------
39. Training Data and Test Data
- Training data
  - labeled data used to build a classifier
- Test data
  - new data, not used in the training process, used to evaluate how well a classifier does on new data
- Memorization versus Generalization
  - better training accuracy = memorizing the training data
  - better test accuracy = generalizing to new data
  - in general, we would like our classifier to perform well on new test data, not just on the training data, i.e., we would like it to generalize well to new data
- Test accuracy is more important than training accuracy
40. Test Accuracy and Generalization
- The accuracy of our classifier on new, unseen data is a fair/honest assessment of its performance
- Why is training accuracy not good enough?
  - Training accuracy is optimistic
    - a classifier like nearest-neighbor can construct boundaries which always separate all training data points, but which do not separate new points
    - e.g., what is the training accuracy of kNN with k = 1?
  - A flexible classifier can overfit the training data
    - in effect it just memorizes the training data, but does not learn the general relationship between x and C
- Generalization
  - We are really interested in how our classifier generalizes to new data
  - test data accuracy is a good estimate of generalization performance
41. Another Example
42. A More Complex Decision Boundary
[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary between them.]
43. Example: The Overfitting Phenomenon
[Scatterplot of Y versus X.]
44. A Complex Model
Y = high-order polynomial in X
[Plot of the high-order polynomial fit to the Y-versus-X data.]
45. The True (simpler) Model
Y = aX + b + noise
[Plot of the straight-line fit to the Y-versus-X data.]
46. How Overfitting affects Prediction
[Plot: Predictive Error (y-axis) versus Model Complexity (x-axis), showing the error on the training data.]
47. How Overfitting affects Prediction
[Same plot, now showing both the error on the training data and the error on the test data.]
48. How Overfitting affects Prediction
[Same plot, annotated with the underfitting region, the overfitting region, and the ideal range for model complexity in between.]
49. Linear Classifiers
50. Decision Boundaries
- What is a Classifier?
  - A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}
  - Thus, a classifier partitions the feature space into m decision regions
  - A line or curve separating the classes is a decision boundary
    - in more than 2 dimensions this is a surface (e.g., a hyperplane)
- Linear Classifiers
  - a linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane)
  - it is one of the simplest classifiers we can imagine
  - it separates the two classes using a straight line in feature space
  - in 2 dimensions the decision boundary is a straight line
51. 2-Class Data with a Linear Decision Boundary
52. Non-Linearly Separable Data, with Decision Boundary
53. Convex Hull of a Set of Points
- Convex hull of a set of points Q
  - Intuitively:
    - think of each point in Q as a nail sticking out from a 2d board
    - the convex hull is the shape formed by a tight rubber band that surrounds all the nails
  - Formally: the convex hull is the smallest convex polygon P for which each point in Q is either on the boundary of P or in its interior
    - (p. 898, Cormen, Leiserson, and Rivest, Introduction to Algorithms)
  - can be found (for n points) in time O(n log n) (see the sketch below)
- Relation to class overlap
  - define the convex hulls of data points D1 and D2 as P1 and P2
  - If P1 and P2 do not intersect => D1 and D2 are linearly separable
  - if P1 and P2 intersect, then we have overlap
    - If P1 and P2 intersect, then D1 and D2 are not linearly separable
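A small sketch of computing and plotting the convex hull of a set of 2d points, using MATLAB's built-in convhull function (the random data is just a placeholder):

X = rand(20, 2);                  % 20 random points in the plane (Feature 1, Feature 2)
k = convhull(X(:,1), X(:,2));     % indices of the points that form the convex hull
plot(X(:,1), X(:,2), 'x');        % plot all the points
hold on
plot(X(k,1), X(k,2), '-');        % draw the hull boundary through the hull points
hold off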
54. Convex Hull Example
[Scatterplot of a set of points against Feature 1 and Feature 2.]
55. Convex Hull Example
[Same scatterplot, with the convex hull P1 of the points drawn.]
56. Data from 2 Classes: Linearly Separable?
[Scatterplot of data from two classes against Feature 1 and Feature 2.]
57. Data from 2 Classes: Linearly Separable?
[Same scatterplot, with the convex hull P1 of one class drawn.]
58. Data from 2 Classes: Linearly Separable?
[Same scatterplot, with the convex hulls P1 and P2 of the two classes drawn.]
The 2 hulls intersect => the data from the two classes are not linearly separable.
59. Different data that is linearly separable
[Scatterplot of a different two-class data set against Feature 1 and Feature 2, with the convex hulls P1 and P2 drawn; the hulls do not intersect.]
60. Some Theory
Let N be the number of data points and let d be the dimension of the data points. Consider N points in general position, and assume each point is labeled as belonging to class 1 or class 2. There are 2^N possible labelings.
Let F(N, d) = the fraction of labelings of N points in d dimensions that are linearly separable. It can be shown that (see the sketch below):

    F(N, d) = 1,                                                     if d > N - 2
    F(N, d) = (1 / 2^(N-1)) * sum_{i=0}^{d} (N-1)! / ((N-1-i)! i!),  if N > d
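A small MATLAB sketch that evaluates this formula (nchoosek computes the binomial coefficient (N-1)! / ((N-1-i)! i!)):

function F = frac_separable(N, d)
% frac_separable: fraction of labelings of N points in general position
% in d dimensions that are linearly separable (formula from the slide above)
if d > N - 2
    F = 1;
else
    total = 0;
    for i = 0:d
        total = total + nchoosek(N-1, i);
    end
    F = total / 2^(N-1);
end

For example, frac_separable(3, 2) returns 1, while frac_separable(4, 2) returns 0.875.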
61. Fraction of Labelings in d-space that are Linearly Separable
[Plot: F(N, d), the fraction of labelings that are linearly separable (y-axis, from 0 to 1), versus N/(d+1) (x-axis, from 1 to 3), with curves for d = 1, d = 10, and d = infinity.]
62. Fraction of Labelings in d-space that are Linearly Separable
[Same plot as the previous slide.]
Note that for N ≤ d + 1, any labeling of N points in d dimensions is linearly separable (e.g., N = 3, d = 2, or N = 50, d = 100).
63. A Linear Classifier in 2 Dimensions
Let Feature 1 be called X and let Feature 2 be called Y.
A linear classifier is a linear function of X and Y, i.e., it computes f(X,Y) = aX + bY + c.
Here a, b, and c are the weights of the classifier.
Define the output of the linear classifier to be

    T(f) = -1, if f < 0
    T(f) = +1, if f > 0

i.e., if f(X,Y) < 0, the classifier produces a -1 (Decision Region 1); if f(X,Y) > 0, the classifier produces a +1 (Decision Region 2).
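A minimal sketch of this classifier in MATLAB (the weight values and the test point are arbitrary examples):

a = 1; b = -1; c = 0;                 % example weights
f = a*2.0 + b*0.5 + c;                % evaluate f(X,Y) at the point (X,Y) = (2.0, 0.5)
if f > 0
    T = 1;                            % Decision Region 2
else
    T = -1;                           % Decision Region 1 (ties at f = 0 are broken arbitrarily here)
end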
64. Decision Boundaries for a 2d Linear Classifier
Depending on whether f(X,Y) is > 0 or < 0, the features (X,Y) get classified into class 1 or class 2.
Thus, f(X,Y) = 0 must define the decision boundary between classes 1 and 2.
65. Decision Boundaries for a 2d Linear Classifier
Depending on whether f(X,Y) is > 0 or < 0, the features (X,Y) get classified into class 1 or class 2.
Thus, f(X,Y) = 0 must define the decision boundary between classes 1 and 2.
What is the equation for this decision boundary?

    f(X,Y) = aX + bY + c = 0, or equivalently Y = -(c + aX)/b

Thus, defining a, b, and c automatically locates the decision boundary in (X,Y) space.
In summary:
- a classifier defines decision boundaries between classes
- for a linear classifier, this boundary is a line or a plane
- the equation of the plane is defined by the parameters of the classifier
66. An Example of a Linear Decision Boundary
67. A Better Linear Decision Boundary
68. The Perceptron Classifier (for 2 features)
[Diagram: the inputs X, Y, and the constant 1 are multiplied by the weights w1, w2, w3 and summed to give the weighted sum of the inputs, f = w1*X + w2*Y + w3; f is then passed through the threshold function T(f), whose output (-1 or +1) is the class decision.]
69. The Perceptron Classifier (for 2 features)
[Same diagram as the previous slide.]
Note: the weights w1, w2, w3 are the same as a, b, c in the previous slides, i.e., f = aX + bY + c.
70. Perceptrons
- Perceptron = a linear classifier
  - The w's are the weights (denoted as a, b, c earlier)
    - real-valued constants (can be positive or negative)
  - Define an additional constant input, 1 (this allows an intercept in the decision boundary)
- A perceptron calculates 2 quantities:
  - 1. A weighted sum of the input features
  - 2. This sum is then thresholded by the T function
- A perceptron is a simple artificial model of human neurons
  - weights = synapses
  - threshold = neuron firing
71. Notation
- Inputs
  - x1, x2, ..., xd, xd+1
    - x1, x2, ..., xd-1, xd are the values of the d features
    - xd+1 = 1 (a constant input)
  - x = (x1, x2, ..., xd, xd+1)
- Weights
  - w1, w2, ..., wd, wd+1
    - we have d+1 weights: one for each feature, plus one for the constant input
  - w = (w1, w2, ..., wd, wd+1)
72. Perceptron Operation
The perceptron output is

    o(x1, x2, ..., xd, xd+1) = +1  if w1*x1 + ... + wd+1*xd+1 > 0
                             = -1  otherwise

Note that w = (w1, ..., wd+1) is the weight vector (a row vector, 1 x (d+1)) and x = (x1, ..., xd+1) is the feature vector (a row vector, 1 x (d+1)).
Thus w1*x1 + w2*x2 + ... + wd+1*xd+1 = w . x, where w . x is the vector inner product (w*x' in MATLAB, since both are row vectors).
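A minimal sketch of this operation in MATLAB (the weight and feature values are arbitrary examples; note the constant input 1 appended to the feature vector):

w = [1 -1 0];              % weight vector (w1, w2, w3)
x = [2.0 0.5 1];           % feature vector (x1, x2) with the constant input x3 = 1 appended
f = w * x';                % the inner product w . x
if f > 0
    o = 1;                 % output class +1
else
    o = -1;                % output class -1
end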
73. Vector Inner Product
Note that w . x = (w1, ..., wd+1) * (x1, x2, ..., xd, xd+1)' = w1*x1 + w2*x2 + ... + wd+1*xd+1,
where (x1, x2, ..., xd, xd+1)' is the transpose of the row vector x (it becomes a column vector).
74. Perceptron Decision Boundary
- Equations of operation (in vector form):

    o(x1, x2, ..., xd, xd+1) = +1  if w . x > 0
                             = -1  otherwise

- The perceptron represents a hyperplane decision surface in d-dimensional space, e.g., a line in 2d, a plane in 3d, etc.
- The equation of the hyperplane is w . x = 0
  - this is the equation for the points in x-space that are on the boundary
75. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
[Plot in the (x1, x2) plane.]
76. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x = 0  =>  1*x1 - 1*x2 + 0*1 = 0  =>  x1 - x2 = 0  =>  x1 = x2
[Plot in the (x1, x2) plane.]
77. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x = 0  =>  1*x1 - 1*x2 + 0*1 = 0  =>  x1 - x2 = 0  =>  x1 = x2
This is the equation for the decision boundary.
[Plot in the (x1, x2) plane, showing the line x1 = x2.]
78. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x < 0  =>  x1 - x2 < 0  =>  x1 < x2   (this is the equation for decision region -1)
[Plot in the (x1, x2) plane, showing the boundary w . x = 0 and the region where w . x < 0.]
79. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x > 0  =>  x1 - x2 > 0  =>  x1 > x2   (this is the equation for decision region +1)
[Plot in the (x1, x2) plane, showing the boundary w . x = 0 with the region w . x < 0 on one side and w . x > 0 on the other.]
80. Representational Power of Perceptrons
- What mappings can a perceptron represent perfectly?
- A perceptron is a linear classifier
  - thus it can represent any mapping that is linearly separable
  - some Boolean functions, like AND (on the left), are linearly separable
  - but other Boolean functions, like XOR (on the right), are not
[Figure: the four Boolean input points plotted for AND (left) and for XOR (right).]
81. Summary
- Review of Assignment 1
- K-nearest-neighbor classifiers
- Basic concepts
- Assignment 2
- Training and test accuracy
- Linear classifiers
- Perceptron classifier
- Next lecture
- How we can learn the weights of a perceptron