Title: Neural Networks and Pattern Recognition
1 Unit 2
Neural Networks and Pattern Recognition
Giansalvo EXIN Cirrincione
2 STATISTICAL PATTERN RECOGNITION
An example: character recognition
Problem: distinguish handwritten versions of the characters 'a' and 'b', captured by a camera.
Goal: develop an algorithm which will assign any image, represented by a vector $\mathbf{x}$, to one of two classes, which we shall denote by $C_k$ where $k = 1, 2$, so that class $C_1$ corresponds to 'a' and class $C_2$ corresponds to 'b'.
3 Data set (sample)
feature selection/extraction
4 Approximation of the class-conditional pdfs
classifier: feature threshold
8 Classification outcome $\mathbf{y}$
Mapping $\mathbf{x} \in \mathbb{R}^n \mapsto \mathbf{y} \in \mathbb{R}^c$ (c classes)
Model: $y_k = y_k(\mathbf{x}; \mathbf{w})$
9 - regression problems: continuous outputs
- classification problems: discrete outputs
10 Prior knowledge
11 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
SIMPLE SOLUTION: discretize the input variables into bins. This divides the whole input space into cells. Each training example corresponds to a point in one of the cells and carries an associated value of the output y.
12 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
Given a new point in input space, find which cell the point falls in and return the average value of y for all training points in that cell. By increasing the number of divisions M along each axis we could increase the precision with which the input is specified.
13 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
If each input variable is divided into M divisions, then the total number of cells is $M^d$, which grows exponentially with the dimensionality d of the input space. Since each cell must contain at least one data point, the quantity of training data needed to specify the mapping also grows exponentially. For a limited quantity of data, increasing d therefore leads to a very poor representation of the mapping.
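A minimal sketch of this cell-averaging scheme (assuming, for illustration, inputs uniform on the unit cube and M equal divisions per axis); the printout makes the exponential growth of the $M^d$ cell count concrete:

```python
import numpy as np

def bin_average_fit(X, y, M, low=0.0, high=1.0):
    """Average the targets y over the cells of an M^d grid on [low, high)^d."""
    # Map each point to a tuple of per-axis bin indices.
    idx = np.clip(((X - low) / (high - low) * M).astype(int), 0, M - 1)
    sums, counts = {}, {}
    for cell, target in zip(map(tuple, idx), y):
        sums[cell] = sums.get(cell, 0.0) + target
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# The grid has M**d cells, so the data needed to populate it grows
# exponentially with d: with M = 10, d = 10 already means 10**10 cells.
rng = np.random.default_rng(0)
for d in (1, 2, 3):
    X = rng.uniform(size=(1000, d))
    y = X.sum(axis=1)                     # toy target
    cells = bin_average_fit(X, y, M=10)
    print(f"d={d}: {len(cells)} of {10 ** d} cells occupied by 1000 points")
```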
14 Homework
19 Another example: polynomial curve fitting
Problem: fit a polynomial to a set of N data points by minimizing an error function (supervised learning).
The model $y(x; \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$ is linear in $\mathbf{w}$.
The sum-of-squares error $E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n; \mathbf{w}) - t_n \right)^2$ is quadratic in $\mathbf{w}$, so its minimum $\mathbf{w}^*$ is found by solving a set of linear equations.
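A minimal sketch of the fit (the noisy-sinusoid data and the particular degrees are illustrative assumptions): since $E(\mathbf{w})$ is quadratic, the minimum solves a linear system, computed here with a least-squares solver.

```python
import numpy as np

def polyfit_sos(x, t, M):
    """Minimize the sum-of-squares error for a degree-M polynomial."""
    A = np.vander(x, M + 1, increasing=True)    # design matrix A[n, j] = x_n**j
    w, *_ = np.linalg.lstsq(A, t, rcond=None)   # solves the linear normal equations
    return w

def sos_error(x, t, w):
    A = np.vander(x, len(w), increasing=True)
    return 0.5 * np.sum((A @ w - t) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy targets
for M in (1, 3, 9):
    w = polyfit_sos(x, t, M)
    # Training error falls as M grows; at M = 9 the curve threads every
    # point exactly -- the overfitting discussed on the following slides.
    print(f"M={M}: E(w*) = {sos_error(x, t, w):.5f}")
```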
22 Overfitting in classification: an over-complex decision boundary separates every training point exactly, allowing no overlap between the classes.
23 Model complexity
Occam's razor
complexity control
24 Goal: classify a new character so as to minimize the probability of misclassification.
$P(C_k)$: prior probability (given the training set, the fraction of characters labelled k, in the limit of an infinite number of observations).
With no further information, assign the character to the class having the higher prior probability.
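A one-line worked case (the numbers echo the priors used later on slide 33): with no observation, the best rule always picks the more probable class, so the misclassification probability equals the smaller prior.

```latex
P(C_1) = 0.6,\quad P(C_2) = 0.4
\;\Longrightarrow\; \text{always assign } C_1,
\qquad P(\text{error}) = P(C_2) = 0.4 .
```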
25 Problem: seek a formalism which allows new information about a character (a measured feature value) to be combined with the prior probabilities we already possess.
26 Prior probability $P(C_k)$, in the limit of an infinite number of images (figure: $P(C_1)$).
27 Joint probability $P(C_k, X = l)$, in the limit of an infinite number of images (figure: $P(C_1, X = 5)$).
28 Class-conditional probability $P(X = l \mid C_k)$, in the limit of an infinite number of images (figure: $P(X = 5 \mid C_1)$).
29 (figure: $P(X = 5 \mid C_1)$)
30 Unconditional probability $P(X = l)$, in the limit of an infinite number of images (figure: $P(X = 5)$).
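For reference, these three quantities are tied together by the product and sum rules of probability (a standard identity, stated here because the next slides rely on it):

```latex
P(C_k, X = l) = P(X = l \mid C_k)\, P(C_k),
\qquad
P(X = l) = \sum_{k=1}^{c} P(X = l \mid C_k)\, P(C_k).
```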
32 Bayes' theorem holds also when probabilities represent degrees of belief.
Posterior: $P(C_k \mid X = l) = \dfrac{P(X = l \mid C_k)\, P(C_k)}{P(X = l)}$
33 Different prior probabilities (e.g. classifying normal tissue vs. tumour in medical X-ray images)
$P(C_1) = 0.6$
34 The classification process = inference (estimating posterior probabilities) + decision making (assigning a class).
35 Bayes' theorem (continuous variables):
$P(C_k \mid x) = \dfrac{p(x \mid C_k)\, P(C_k)}{p(x)}$
where x is the observation, $p(x \mid C_k)$ the class-conditional density, $P(C_k)$ the prior, and $P(C_k \mid x)$ the posterior.
36 Bayes' theorem (continuous variables), for c classes and a feature vector $\mathbf{x}$:
$P(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, P(C_k)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{k=1}^{c} p(\mathbf{x} \mid C_k)\, P(C_k)$
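A minimal numeric sketch of this formula; the 1-D Gaussian class-conditional densities and their parameters are illustrative assumptions, not part of the slides:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density, standing in for the class-conditional p(x|Ck)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def posteriors(x, priors, mus, sigmas):
    """Bayes' theorem: P(Ck|x) = p(x|Ck) P(Ck) / sum_j p(x|Cj) P(Cj)."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    return joint / joint.sum()            # the denominator is p(x)

priors = np.array([0.6, 0.4])             # e.g. the priors of slide 33
post = posteriors(1.2, priors, mus=[0.0, 2.0], sigmas=[1.0, 1.0])
print(post, post.sum())                   # the posteriors sum to 1
```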
37 Decision making
Minimum misclassification rule: assign feature vector $\mathbf{x}$ to class $C_k$ if $P(C_k \mid \mathbf{x}) > P(C_j \mid \mathbf{x})$ for all $j \neq k$.
This defines decision regions $R_1, \ldots, R_c$ such that a point falling in $R_k$ is assigned to $C_k$.
38 Decision making
40 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
41 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
decision boundaries
42 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
other discriminant functions: any monotonic function of the posteriors yields the same decision boundaries
43 Two-class decision problems: use the single discriminant $y(\mathbf{x}) = y_1(\mathbf{x}) - y_2(\mathbf{x})$
- assign $\mathbf{x}$ to class $C_1$ if $y(\mathbf{x}) > 0$
- assign $\mathbf{x}$ to class $C_2$ if $y(\mathbf{x}) < 0$
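A minimal sketch of a two-class discriminant built from the unnormalized posteriors $y_k(\mathbf{x}) = p(\mathbf{x} \mid C_k)\, P(C_k)$; the Gaussian densities and their parameters are illustrative assumptions:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative two-class setup (same assumed densities as above).
priors = [0.6, 0.4]
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def y(x):
    """Two-class discriminant y(x) = y1(x) - y2(x), with yk(x) = p(x|Ck) P(Ck)."""
    y1 = gauss_pdf(x, mus[0], sigmas[0]) * priors[0]
    y2 = gauss_pdf(x, mus[1], sigmas[1]) * priors[1]
    return y1 - y2

for x in (-1.0, 1.0, 3.0):
    # The decision boundary is the set of points where y(x) = 0.
    print(f"x={x}: assign", "C1" if y(x) > 0 else "C2")
```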
44 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$; loss matrix $L = (L_{kj})$.
Expected loss for patterns in $C_k$:
$R_k = \sum_{j=1}^{c} L_{kj} \int_{R_j} p(\mathbf{x} \mid C_k)\, d\mathbf{x}$
(minimizing risk)
45 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$; loss matrix $L = (L_{kj})$.
Risk (overall expected loss):
$R = \sum_{k=1}^{c} P(C_k)\, R_k = \sum_{k=1}^{c} \sum_{j=1}^{c} L_{kj} \int_{R_j} p(\mathbf{x} \mid C_k)\, P(C_k)\, d\mathbf{x}$
The decision regions are chosen to minimize R.
46 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$.
Minimizing the risk: choose the regions $R_j$ such that $\mathbf{x} \in R_j$ when
$\sum_{k=1}^{c} L_{kj}\, p(\mathbf{x} \mid C_k)\, P(C_k) < \sum_{k=1}^{c} L_{ki}\, p(\mathbf{x} \mid C_k)\, P(C_k) \quad \text{for all } i \neq j$
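A minimal sketch of the minimum-risk rule; the loss values (a heavy penalty for missing class $C_2$, in the spirit of the tumour example) and the Gaussian class-conditionals are illustrative assumptions:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# L[k, j] is the penalty for deciding Cj when the truth is Ck.
L = np.array([[0.0, 1.0],      # true C1: cheap to confuse with C2
              [10.0, 0.0]])    # true C2 (e.g. tumour): costly to miss
priors = np.array([0.6, 0.4])
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def min_risk_class(x):
    """Assign x to the class Cj minimizing sum_k L[k, j] p(x|Ck) P(Ck)."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    cond_risk = L.T @ joint        # entry j = sum_k L[k, j] p(x|Ck) P(Ck)
    return int(np.argmin(cond_risk))

for x in (-1.0, 0.5, 3.0):
    print(f"x={x}: assign C{min_risk_class(x) + 1}")
```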
47 Homework: minimizing risk
48 The reject option
One way in which the reject option can be used is
to design a relatively simple but fast classifier
system to cover the bulk of the feature space,
while leaving the remaining regions to a more
sophisticated system which might be relatively
slow.
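A minimal sketch of the reject option: classify only when the largest posterior clears a threshold; the threshold value and the Gaussian class-conditionals are illustrative assumptions.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

priors = np.array([0.6, 0.4])
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def classify_or_reject(x, theta=0.9):
    """Classify if max_k P(Ck|x) >= theta; otherwise defer to a slower system."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    post = joint / joint.sum()                 # posteriors P(Ck|x)
    k = int(np.argmax(post))
    return f"C{k + 1}" if post[k] >= theta else "reject"

for x in (-1.0, 1.0, 3.0):                     # x = 1.0 lies near the boundary
    print(f"x={x}: {classify_or_reject(x)}")
```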
50 THE END