Title: Lecture 4: Homework Discussion, and more on Classification
1. Lecture 4: Homework Discussion, and more on Classification
- CS 175, Fall 2007
- Padhraic Smyth
- Department of Computer Science
- University of California, Irvine
2. Outline
- Discussion of Assignment 1
- Classification revisited
- Discussion of Assignment 2
- Due Wednesday (tomorrow) at noon
3. Grading of Assignment 1
- 40 points total
- Each MATLAB function: 10 points
  - euclidean.m, nearest_neighbor.m, maxvalue.m
  - functioning correctly on the test cases: 6 points
  - comments: 2 points
  - error-checking: 2 points
- Test case example
  - x: random vector of length 100
  - A: random matrix with 100 rows and 100 columns
4. Comments on Grading
- Common mistakes
  - incorrect definition of Euclidean distance: dE(x, y) = sqrt( sum_i (xi - yi)^2 )
  - no error-checking
    - nearest_neighbor(x, A) => check that cols(A) == cols(x) and rows(x) == 1
  - no comments
    - no comments in header
    - no comments in body of nearest_neighbor.m
- If you find any errors in the grading of your assignment, please see Nathan during lab hours (or email him to make an appointment) - no grade negotiating!
5. Suggestions
- Improve performance by vectorization
  - can speed things up significantly
  - e.g., calculate the vector distance in euclidean.m as a vector operation
- Do not print input / intermediate / output variables to the screen
  - can increase your run-time significantly
  - use a semicolon at the end of each line
- Helpful commands (see the sketch below)
  - help: to learn more about a function
  - lookfor: to find a built-in function
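As a quick illustration of these two commands (the queries shown are only examples):

help min            % display the documentation for the built-in min function
lookfor maximum     % search the help text of all functions for the keyword "maximum"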
6. Suggestions (2)
- Test your code (!)
  - Some .m functions that were submitted did not run
- 2 types of errors
  - Simple syntax errors (understandable)
  - Systematic errors
    - Incorrect calculations (e.g., for euclidean.m)
    - Incorrect logic in finding the minimum vector
    - Sloppy assignment of variables to values
- How to address this (see the sketch below)
  - Define a set of simple test cases
  - Run your code and compare with a manual calculation
  - Check that the results make intuitive sense
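For example, a minimal test case of this kind for euclidean.m (the expected answer 5 comes from the 3-4-5 right triangle):

% simple test case: the distance between (0,0) and (3,4) should be 5
x = [0 0];
y = [3 4];
d = euclidean(x, y);
fprintf('computed distance = %g (expected 5)\n', d);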
7. Example of a euclidean.m function

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y
8. Example of a euclidean.m function (with error checking)

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end
if (xr ~= 1 || yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end
9. Example of a euclidean.m function (complete)

function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%
% A. Student, CS 175
%
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end
if (xr ~= 1 || yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end

% calculate a vector of component-by-component differences
delta = x - y;

% now calculate the Euclidean distance
dist = sqrt(delta*delta');

Note the use of vectorization.
10. The min function in MATLAB

>> help min

 MIN    Smallest component.
    For vectors, MIN(X) is the smallest element in X. For matrices,
    MIN(X) is a row vector containing the minimum element from each
    column. For N-D arrays, MIN(X) operates along the first
    non-singleton dimension.

    [Y,I] = MIN(X) returns the indices of the minimum values in vector I.
    If the values along the first non-singleton dimension contain more
    than one minimal element, the index of the first one is returned.

    MIN(X,Y) returns an array the same size as X and Y with the
    smallest elements taken from X or Y. Either one can be a scalar.

    [Y,I] = MIN(X,[],DIM) operates along the dimension DIM.

    When complex, the magnitude MIN(ABS(X)) is used. NaN's are ignored
    when computing the minimum.

    Example: If X = [2 8 4; 7 3 9]
    then min(X,[],1) is [2 3 4], min(X,[],2) is [2; 3],
    and min(X,5) is [2 5 4; 5 3 5].

    See also MAX, MEDIAN, MEAN, SORT.
11. Example of a maxvalue.m function

function [maxvalue, rmax, cmax] = maxvalue(A)
% function [maxvalue, rmax, cmax] = maxvalue(A)
%
% (brief description of function here)
%
% Your Name, CS 175
%
% Inputs:
%    A: a matrix of size r x c, with r rows and c columns
% Outputs:
%    maxvalue: largest entry in A
%    rmax, cmax: integers specifying the (row, column) location of the max value

% Get a row vector (mx_row) containing the maximum value within each column;
% idx_row is a vector containing the location of the maximum within each column
[mx_row, idx_row] = max(A);

% find the maximum within this vector
[maxvalue, cmax] = max(mx_row);

% use idx_row to find the row location of the max
rmax = idx_row(cmax);
12. Example of a nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A that is closest
% in Euclidean distance to row vector x.
%
% A. Student, CS 175
%
% Inputs:
%    x: a vector of numbers of size 1 x n
%    A: k vectors of size 1 x n, "stacked" in a k x n matrix
% Outputs:
%    y: the closest vector in A to x (of size 1 x n)
%    i: the integer (row) index of y in A
%    d: the Euclidean distance between x and y
13. Example of a nearest_neighbor.m function (with error checking)

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A that is closest
% in Euclidean distance to row vector x.
%
% A. Student, CS 175
%
% Inputs:
%    x: a vector of numbers of size 1 x n
%    A: k vectors of size 1 x n, "stacked" in a k x n matrix
% Outputs:
%    y: the closest vector in A to x (of size 1 x n)
%    i: the integer (row) index of y in A
%    d: the Euclidean distance between x and y

% Error checking
[xr, xc] = size(x);
[Ar, Ac] = size(A);
if (xc ~= Ac)
    error('input vector x and matrix A must have the same number of columns');
end
if (xr ~= 1)
    error('input vector x must be a row vector');
end
14. For-loop version of the nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
% ..
% ..

[xr, xc] = size(x);
[Ar, Ac] = size(A);
% ..

% "for loop" version of the code
distances = zeros(Ar,1);     % preallocate storage for distances
for j = 1:Ar                 % loop over rows in A
    y = A(j,:);
    distances(j) = euclidean(x,y);
end

% find the minimum distance and its location
[d, i] = min(distances);

% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
16. The repmat function in MATLAB

>> help repmat

 REPMAT Replicate and tile an array.
    B = REPMAT(A,M,N) replicates and tiles the matrix A to produce the
    M-by-N block matrix B.

    B = REPMAT(A,[M N]) produces the same thing.

    B = REPMAT(A,[M N P ...]) tiles the array A to produce a
    M-by-N-by-P-by-... block array. A can be N-D.

    REPMAT(A,M,N) when A is a scalar is commonly used to produce an
    M-by-N matrix filled with A's value. This can be much faster than
    A*ONES(M,N) when M and/or N are large.

    Example:
        repmat(magic(2),2,3)
        repmat(NaN,2,3)

    See also MESHGRID.

>> repmat([1 2], 3, 1)

ans =
     1     2
     1     2
     1     2
17. Vectorized version of the nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
% ..
% ..

% VECTORIZED VERSION OF THE CODE

% create a matrix of size Ar x xc, where each row consists of x
xmatrix = repmat(x,Ar,1);

% subtract the components of xmatrix and A, by matrix subtraction
delta = xmatrix - A;

% now square the differences, by component multiplication
squaredelta = delta.*delta;

% sum up the squared differences, row by row (note the use of transpose ')
distances = sqrt(sum(squaredelta')');

% find the minimum distance and its location (as before)
[d, i] = min(distances);

% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
18. Nearest-Neighbor Classification (revisited)
19. Example of Data from 2 Classes
20. Classifiers and Decision Boundaries
- What is a Classifier?
  - A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}
  - Thus, a classifier partitions the feature space into m decision regions
  - The line or surface separating any 2 classes is the decision boundary
21. 2-Class Data with a Linear Decision Boundary
22. Classification Problem with Overlap
24. Classifiers: functions or mappings
[Diagram: feature values a, b, c, d, ..., z (which are known, measured) feed into the classifier, which outputs a predicted class value (the true class is unknown to the classifier).]
We want a mapping or function which takes any combination of feature values x = (a, b, d, ..., z) and produces a prediction c, i.e., a function c = f(a, b, d, ..., z) which produces a value c = c1, c2, ..., or cm.
The problem is that we don't know this mapping; we have to learn it from data!
25. Classification Accuracy
- Say we have N feature vectors
- Say we know the true class label for each feature vector
- We can measure how accurate a classifier is by how many feature vectors it classifies correctly
- Accuracy = percentage of feature vectors correctly classified
  - training accuracy = accuracy on the training data
  - test accuracy = accuracy on new data not used in training
26. Some Notation
- Training Data
  - Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }
  - N pairs of feature vectors and class labels
- Feature Vectors and Class Labels
  - x(i) is the ith training data feature vector
    - in MATLAB this could be the ith row of an N x d matrix (see the sketch below)
  - c(i) is the class label of the ith feature vector
    - in general, c(i) can take m different class values, e.g., c = 1, c = 2, ...
- Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.
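A small sketch of this notation in MATLAB (the variable names and sizes are illustrative only):

N = 200; d = 2;
Xtrain = rand(N, d);         % N x d matrix: row i is the feature vector x(i)
ctrain = ones(N, 1);         % N x 1 vector: ctrain(i) is the class label c(i)
ctrain(101:200) = 2;         % e.g., the first 100 vectors are class 1, the rest class 2
x5 = Xtrain(5, :);           % the 5th training feature vector
c5 = ctrain(5);              % and its class label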
27. Example
[Scatterplot: training data from two classes (points labeled 1 and 2) plotted against Feature 1 and Feature 2.]
28. kNN Decision Boundary (k=1)
[Scatterplot of the same two-class data against Feature 1 and Feature 2, with the k=1 nearest-neighbor decision boundary overlaid.]
In general: the nearest-neighbor classifier produces piecewise linear decision boundaries.
29. K-Nearest Neighbor (kNN) Classifier
- Find the k nearest neighbors to y in Dtrain
  - i.e., rank the feature vectors according to Euclidean distance
  - select the k vectors which have the smallest distance to y
- Classification (see the sketch below)
  - the ranking yields k feature vectors and a set of k class labels
  - pick the class label which is most common in this set (vote)
  - classify y as belonging to this class
30. K-Nearest Neighbor (kNN) Classifier
- Notes
  - In effect, the classifier uses the nearest k feature vectors from Dtrain to vote on the class label for y
  - the single-nearest-neighbor classifier is the special case of k = 1
  - for two-class problems, if we choose k to be odd (i.e., k = 1, 3, 5, ...) then there will never be any ties
  - training is trivial for the kNN classifier, i.e., we just use Dtrain as a lookup table when we want to classify a new feature vector
- Extensions of the nearest-neighbor classifier
  - weighted distances
    - e.g., if some of the features are more important
    - e.g., if some features are irrelevant
  - fast search techniques (indexing) to find the k nearest neighbors in d-space
31. Assignment 2
- Due Wednesday
- 4 parts
  - Plot classification data in two dimensions
  - Implement a nearest-neighbor classifier
  - Plot the errors of a k-nearest-neighbor classifier
  - Test the effect of the value of k on the accuracy of the classifier
32. Data Structure
- simdata1 =
  - shortname: 'Simulated Data 1'
  - numfeatures: 2
  - classnames: [2x6 char]
  - numclasses: 2
  - description: [1x66 char]
  - features: [200x2 double]
  - classlabels: [200x1 double]
- (see the field-access sketch below)
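A short illustration of loading and accessing fields of this structure (the file name simdata1.mat is an assumption; adjust it to wherever the assignment data is stored):

load simdata1                 % assumed to create the structure simdata1 in the workspace
X = simdata1.features;        % 200 x 2 matrix of feature vectors
c = simdata1.classlabels;     % 200 x 1 vector of class labels
disp(simdata1.shortname);     % prints 'Simulated Data 1'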
33. Plotting Function

function classplot(data, x, y)
% function classplot(data, x, y)
%
% brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    data: a structure with the same fields as described above
%          (your comment header should describe the structure explicitly)
%
% Note that if you are only using certain fields of the structure in the
% function below, you need only define those fields in the input comments.

% --------  Your code goes here  -------
34. First simulated data set, simdata1
35. Second simulated data set, simdata2
36. Nearest Neighbor Classifier

function class_predictions = knn(traindata, trainlabels, k, testdata)
% function class_predictions = knn(traindata, trainlabels, k, testdata)
%
% a brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    k: an odd positive integer indicating the number of neighbors to use
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
% Outputs:
%    class_predictions: N2 x 1 vector of predicted class values

% --------  Your code goes here  -------
37. Plotting k-NN Errors

function knn_plot(traindata, trainlabels, k, testdata, testlabels)
% function knn_plot(traindata, trainlabels, k, testdata, testlabels)
%
% Predicts class labels for the data in testdata using the k nearest
% neighbors in traindata, and then plots the data (using the first 2
% dimensions or first 2 features), displaying the data from each class
% in different colors, and overlaying circles on the points that were
% incorrectly classified.
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    k: an odd positive integer indicating the number of neighbors to use
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
%    testlabels: a N2 x 1 vector of class labels for testdata
38. Accuracy of the kNN Classifier as k is varied

function errors = knn_error_rates(traindata, trainlabels, testdata, testlabels, kmax, plotflag)
% function errors = knn_error_rates(traindata, trainlabels, testdata, testlabels, kmax, plotflag)
%
% a brief description of what the function does
% ......
%
% Your Name, CS 175, date
%
% Inputs:
%    traindata: a N1 x d matrix of feature data (the "memory" for kNN)
%    trainlabels: a N1 x 1 vector of class labels for traindata
%    testdata: a N2 x d matrix of feature data for testing the knn classifier
%    testlabels: a N2 x 1 vector of class labels for testdata
%    kmax: an odd positive integer indicating the maximum number of neighbors
%    plotflag: (optional argument) if 1, the error-rates versus k are plotted,
%              otherwise no plot.
% Outputs:
%    errors: r x 1 vector of error-rates on testdata, where r is the
%            number of values of k that are tested.

% --------  Your code goes here  -------
39. Training Data and Test Data
- Training data
  - labeled data used to build a classifier
- Test data
  - new data, not used in the training process, used to evaluate how well a classifier does on new data
- Memorization versus Generalization
  - better training accuracy = memorizing the training data
  - better test accuracy = generalizing to new data
  - in general, we would like our classifier to perform well on new test data, not just on the training data, i.e., we would like it to generalize well to new data
- Test accuracy is more important than training accuracy
40. Test Accuracy and Generalization
- The accuracy of our classifier on new, unseen data is a fair/honest assessment of its performance
- Why is training accuracy not good enough?
  - Training accuracy is optimistic
    - a classifier like nearest-neighbor can construct boundaries which always separate all training data points, but which do not separate new points
    - e.g., what is the training accuracy of kNN with k = 1?
  - A flexible classifier can overfit the training data
    - in effect it just memorizes the training data, but does not learn the general relationship between x and C
- Generalization
  - We are really interested in how our classifier generalizes to new data
  - test data accuracy is a good estimate of generalization performance
41. Another Example
42. A More Complex Decision Boundary
[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary between them.]
43. Example: The Overfitting Phenomenon
[Scatterplot of Y versus X.]
44. A Complex Model
Y = high-order polynomial in X
[Plot of the high-order polynomial fit to the Y-versus-X data.]
45. The True (simpler) Model
Y = aX + b + noise
[Plot of the straight-line fit to the Y-versus-X data.]
46. How Overfitting affects Prediction
[Plot: Predictive Error (y-axis) versus Model Complexity (x-axis), showing the error on the training data.]
47. How Overfitting affects Prediction
[Same plot, now showing both the error on the training data and the error on the test data.]
48. How Overfitting affects Prediction
[Same plot, annotated with the underfitting region, the overfitting region, and the ideal range for model complexity in between.]
49. Linear Classifiers
50. Decision Boundaries
- What is a Classifier?
  - A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}
  - Thus, a classifier partitions the feature space into m decision regions
  - A line or curve separating the classes is a decision boundary
    - in more than 2 dimensions this is a surface (e.g., a hyperplane)
- Linear Classifiers
  - a linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane)
  - it is one of the simplest classifiers we can imagine
  - it separates the two classes using a straight line in feature space
  - in 2 dimensions the decision boundary is a straight line
51. 2-Class Data with a Linear Decision Boundary
52. Non-Linearly Separable Data, with Decision Boundary
53. Convex Hull of a Set of Points
- Convex hull of a set of points Q
  - Intuitively:
    - think of each point in Q as a nail sticking out from a 2d board
    - the convex hull is the shape formed by a tight rubber band that surrounds all the nails
  - Formally: the convex hull is the smallest convex polygon P for which each point in Q is either on the boundary of P or in its interior
    - (p. 898, Cormen, Leiserson, and Rivest, Introduction to Algorithms)
  - can be found (for n points) in time O(n log n) (see the sketch below)
- Relation to class overlap
  - define the convex hulls of data points D1 and D2 as P1 and P2
  - If P1 and P2 do not intersect => D1 and D2 are linearly separable
  - if P1 and P2 intersect, then we have overlap
    - If P1 and P2 intersect, then D1 and D2 are not linearly separable
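A small sketch of computing and plotting the convex hull of a set of 2d points, using MATLAB's built-in convhull function (the random data is just a placeholder):

X = rand(20, 2);                  % 20 random points in the plane (Feature 1, Feature 2)
k = convhull(X(:,1), X(:,2));     % indices of the points that form the convex hull
plot(X(:,1), X(:,2), 'x');        % plot all the points
hold on
plot(X(k,1), X(k,2), '-');        % draw the hull boundary through the hull points
hold off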
54. Convex Hull Example
[Scatterplot of a set of points against Feature 1 and Feature 2.]
55. Convex Hull Example
[Same scatterplot, with the convex hull P1 of the points drawn.]
56. Data from 2 Classes: Linearly Separable?
[Scatterplot of data from two classes against Feature 1 and Feature 2.]
57. Data from 2 Classes: Linearly Separable?
[Same scatterplot, with the convex hull P1 of one class drawn.]
58. Data from 2 Classes: Linearly Separable?
[Same scatterplot, with the convex hulls P1 and P2 of the two classes drawn.]
The 2 hulls intersect => the data from the two classes are not linearly separable.
59. Different data that is linearly separable
[Scatterplot of a different two-class data set against Feature 1 and Feature 2, with the convex hulls P1 and P2 drawn; the hulls do not intersect.]
60. Some Theory
Let N be the number of data points and let d be the dimension of the data points. Consider N points in general position, and assume each point is labeled as belonging to class 1 or class 2. There are 2^N possible labelings.
Let F(N, d) = the fraction of labelings of N points in d dimensions that are linearly separable. It can be shown that (see the sketch below):

    F(N, d) = 1,                                                     if d > N - 2
    F(N, d) = (1 / 2^(N-1)) * sum_{i=0}^{d} (N-1)! / ((N-1-i)! i!),  if N > d
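A small MATLAB sketch that evaluates this formula (nchoosek computes the binomial coefficient (N-1)! / ((N-1-i)! i!)):

function F = frac_separable(N, d)
% frac_separable: fraction of labelings of N points in general position
% in d dimensions that are linearly separable (formula from the slide above)
if d > N - 2
    F = 1;
else
    total = 0;
    for i = 0:d
        total = total + nchoosek(N-1, i);
    end
    F = total / 2^(N-1);
end

For example, frac_separable(3, 2) returns 1, while frac_separable(4, 2) returns 0.875.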
61. Fraction of Labelings in d-space that are Linearly Separable
[Plot: F(N, d), the fraction of labelings that are linearly separable (y-axis, from 0 to 1), versus N/(d+1) (x-axis, from 1 to 3), with curves for d = 1, d = 10, and d = infinity.]
62. Fraction of Labelings in d-space that are Linearly Separable
[Same plot as the previous slide.]
Note that for N ≤ d + 1, any labeling of N points in d dimensions is linearly separable (e.g., N = 3, d = 2, or N = 50, d = 100).
63. A Linear Classifier in 2 Dimensions
Let Feature 1 be called X and let Feature 2 be called Y.
A linear classifier is a linear function of X and Y, i.e., it computes f(X,Y) = aX + bY + c.
Here a, b, and c are the weights of the classifier.
Define the output of the linear classifier to be

    T(f) = -1, if f < 0
    T(f) = +1, if f > 0

i.e., if f(X,Y) < 0, the classifier produces a -1 (Decision Region 1); if f(X,Y) > 0, the classifier produces a +1 (Decision Region 2).
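A minimal sketch of this classifier in MATLAB (the weight values and the test point are arbitrary examples):

a = 1; b = -1; c = 0;                 % example weights
f = a*2.0 + b*0.5 + c;                % evaluate f(X,Y) at the point (X,Y) = (2.0, 0.5)
if f > 0
    T = 1;                            % Decision Region 2
else
    T = -1;                           % Decision Region 1 (ties at f = 0 are broken arbitrarily here)
end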
64. Decision Boundaries for a 2d Linear Classifier
Depending on whether f(X,Y) is > 0 or < 0, the features (X,Y) get classified into class 1 or class 2.
Thus, f(X,Y) = 0 must define the decision boundary between classes 1 and 2.
65. Decision Boundaries for a 2d Linear Classifier
Depending on whether f(X,Y) is > 0 or < 0, the features (X,Y) get classified into class 1 or class 2.
Thus, f(X,Y) = 0 must define the decision boundary between classes 1 and 2.
What is the equation for this decision boundary?

    f(X,Y) = aX + bY + c = 0, or equivalently Y = -(c + aX)/b

Thus, defining a, b, and c automatically locates the decision boundary in (X,Y) space.
In summary:
- a classifier defines decision boundaries between classes
- for a linear classifier, this boundary is a line or a plane
- the equation of the plane is defined by the parameters of the classifier
66. An Example of a Linear Decision Boundary
67. A Better Linear Decision Boundary
68. The Perceptron Classifier (for 2 features)
[Diagram: the inputs X, Y, and the constant 1 are multiplied by the weights w1, w2, w3 and summed to give the weighted sum of the inputs, f = w1*X + w2*Y + w3; f is then passed through the threshold function T(f), whose output (-1 or +1) is the class decision.]
69. The Perceptron Classifier (for 2 features)
[Same diagram as the previous slide.]
Note: the weights w1, w2, w3 are the same as a, b, c in the previous slides, i.e., f = aX + bY + c.
70. Perceptrons
- Perceptron = a linear classifier
  - The w's are the weights (denoted as a, b, c earlier)
    - real-valued constants (can be positive or negative)
  - Define an additional constant input, 1 (this allows an intercept in the decision boundary)
- A perceptron calculates 2 quantities:
  - 1. A weighted sum of the input features
  - 2. This sum is then thresholded by the T function
- A perceptron is a simple artificial model of human neurons
  - weights = synapses
  - threshold = neuron firing
71. Notation
- Inputs
  - x1, x2, ..., xd, xd+1
    - x1, x2, ..., xd-1, xd are the values of the d features
    - xd+1 = 1 (a constant input)
  - x = (x1, x2, ..., xd, xd+1)
- Weights
  - w1, w2, ..., wd, wd+1
    - we have d+1 weights: one for each feature, plus one for the constant input
  - w = (w1, w2, ..., wd, wd+1)
72. Perceptron Operation
The perceptron output is

    o(x1, x2, ..., xd, xd+1) = +1  if w1*x1 + ... + wd+1*xd+1 > 0
                             = -1  otherwise

Note that w = (w1, ..., wd+1) is the weight vector (a row vector, 1 x (d+1)) and x = (x1, ..., xd+1) is the feature vector (a row vector, 1 x (d+1)).
Thus w1*x1 + w2*x2 + ... + wd+1*xd+1 = w . x, where w . x is the vector inner product (w*x' in MATLAB, since both are row vectors).
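A minimal sketch of this operation in MATLAB (the weight and feature values are arbitrary examples; note the constant input 1 appended to the feature vector):

w = [1 -1 0];              % weight vector (w1, w2, w3)
x = [2.0 0.5 1];           % feature vector (x1, x2) with the constant input x3 = 1 appended
f = w * x';                % the inner product w . x
if f > 0
    o = 1;                 % output class +1
else
    o = -1;                % output class -1
end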
73. Vector Inner Product
Note that w . x = (w1, ..., wd+1) * (x1, x2, ..., xd, xd+1)' = w1*x1 + w2*x2 + ... + wd+1*xd+1,
where (x1, x2, ..., xd, xd+1)' is the transpose of the row vector x (it becomes a column vector).
74. Perceptron Decision Boundary
- Equations of operation (in vector form):

    o(x1, x2, ..., xd, xd+1) = +1  if w . x > 0
                             = -1  otherwise

- The perceptron represents a hyperplane decision surface in d-dimensional space, e.g., a line in 2d, a plane in 3d, etc.
- The equation of the hyperplane is w . x = 0
  - this is the equation for the points in x-space that are on the boundary
75. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
[Plot in the (x1, x2) plane.]
76. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x = 0  =>  1*x1 - 1*x2 + 0*1 = 0  =>  x1 - x2 = 0  =>  x1 = x2
[Plot in the (x1, x2) plane.]
77. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x = 0  =>  1*x1 - 1*x2 + 0*1 = 0  =>  x1 - x2 = 0  =>  x1 = x2
This is the equation for the decision boundary.
[Plot in the (x1, x2) plane, showing the line x1 = x2.]
78. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x < 0  =>  x1 - x2 < 0  =>  x1 < x2   (this is the equation for decision region -1)
[Plot in the (x1, x2) plane, showing the boundary w . x = 0 and the region where w . x < 0.]
79. Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)
w . x > 0  =>  x1 - x2 > 0  =>  x1 > x2   (this is the equation for decision region +1)
[Plot in the (x1, x2) plane, showing the boundary w . x = 0 with the region w . x < 0 on one side and w . x > 0 on the other.]
80. Representational Power of Perceptrons
- What mappings can a perceptron represent perfectly?
- A perceptron is a linear classifier
  - thus it can represent any mapping that is linearly separable
  - some Boolean functions, like AND (on the left), are linearly separable
  - but other Boolean functions, like XOR (on the right), are not
[Figure: the four Boolean input points plotted for AND (left) and for XOR (right).]
81. Summary
- Review of Assignment 1
- K-nearest-neighbor classifiers
- Basic concepts
- Assignment 2
- Training and test accuracy
- Linear classifiers
- Perceptron classifier
- Next lecture
- How we can learn the weights of a perceptron