Title: Neural Networks and Pattern Recognition
1 Unit 2
Neural Networks and Pattern Recognition
Giansalvo EXIN Cirrincione
2 STATISTICAL PATTERN RECOGNITION
An example: character recognition
Problem: distinguish handwritten versions of the characters 'a' and 'b', captured by a camera.
Goal: develop an algorithm which will assign any image, represented by a vector $\mathbf{x}$, to one of two classes, which we shall denote by $C_k$ where $k = 1, 2$, so that class $C_1$ corresponds to 'a' and class $C_2$ corresponds to 'b'.
3 Data set (sample)
feature selection/extraction
4 Approximation of the class-conditional pdfs
classifier: feature threshold
8 Classification outcome $\mathbf{y}$
Mapping $\mathbf{x} \in \mathbb{R}^n \mapsto \mathbf{y} \in \mathbb{R}^c$ (c classes)
Model: $y_k = y_k(\mathbf{x}; \mathbf{w})$
9 - regression problems: continuous outputs
- classification problems: discrete outputs
10 Prior knowledge
11 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
SIMPLE SOLUTION: discretize the input variables into bins. This divides the whole input space into cells. Each training example corresponds to a point in one of the cells and carries an associated value of the output y.
12 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
Given a new point in input space, find which cell the point falls in and return the average value of y for all training points in that cell. By increasing the number of divisions M along each axis we could increase the precision with which the input is specified.
13 The curse of dimensionality
PROBLEM: model a mapping $\mathbf{x} \in \mathbb{R}^d \mapsto y \in \mathbb{R}$ on the basis of a set of training data.
If each input variable is divided into M divisions, then the total number of cells is $M^d$, which grows exponentially with the dimensionality d of the input space. Since each cell must contain at least one data point, the quantity of training data needed to specify the mapping also grows exponentially. For a limited quantity of data, increasing d therefore leads to a very poor representation of the mapping.
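A minimal sketch of this cell-averaging scheme (assuming, for illustration, inputs uniform on the unit cube and M equal divisions per axis); the printout makes the exponential growth of the $M^d$ cell count concrete:

```python
import numpy as np

def bin_average_fit(X, y, M, low=0.0, high=1.0):
    """Average the targets y over the cells of an M^d grid on [low, high)^d."""
    # Map each point to a tuple of per-axis bin indices.
    idx = np.clip(((X - low) / (high - low) * M).astype(int), 0, M - 1)
    sums, counts = {}, {}
    for cell, target in zip(map(tuple, idx), y):
        sums[cell] = sums.get(cell, 0.0) + target
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# The grid has M**d cells, so the data needed to populate it grows
# exponentially with d: with M = 10, d = 10 already means 10**10 cells.
rng = np.random.default_rng(0)
for d in (1, 2, 3):
    X = rng.uniform(size=(1000, d))
    y = X.sum(axis=1)                     # toy target
    cells = bin_average_fit(X, y, M=10)
    print(f"d={d}: {len(cells)} of {10 ** d} cells occupied by 1000 points")
```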
14 Homework
19 Another example: polynomial curve fitting
Problem: fit a polynomial to a set of N data points by minimizing an error function (supervised learning).
The model $y(x; \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$ is linear in $\mathbf{w}$.
The sum-of-squares error $E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n; \mathbf{w}) - t_n \right)^2$ is quadratic in $\mathbf{w}$, so its minimum $\mathbf{w}^*$ is found by solving a set of linear equations.
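A minimal sketch of the fit (the noisy-sinusoid data and the particular degrees are illustrative assumptions): since $E(\mathbf{w})$ is quadratic, the minimum solves a linear system, computed here with a least-squares solver.

```python
import numpy as np

def polyfit_sos(x, t, M):
    """Minimize the sum-of-squares error for a degree-M polynomial."""
    A = np.vander(x, M + 1, increasing=True)    # design matrix A[n, j] = x_n**j
    w, *_ = np.linalg.lstsq(A, t, rcond=None)   # solves the linear normal equations
    return w

def sos_error(x, t, w):
    A = np.vander(x, len(w), increasing=True)
    return 0.5 * np.sum((A @ w - t) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy targets
for M in (1, 3, 9):
    w = polyfit_sos(x, t, M)
    # Training error falls as M grows; at M = 9 the curve threads every
    # point exactly -- the overfitting discussed on the following slides.
    print(f"M={M}: E(w*) = {sos_error(x, t, w):.5f}")
```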
22 Overfitting in classification: an over-complex decision boundary separates every training point exactly, allowing no overlap between the classes.
23 Model complexity
Occam's razor
complexity control
24 Goal: classify a new character so as to minimize the probability of misclassification.
$P(C_k)$: prior probability (given the training set, the fraction of characters labelled k, in the limit of an infinite number of observations).
With no further information, assign the character to the class having the higher prior probability.
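A one-line worked case (the numbers echo the priors used later on slide 33): with no observation, the best rule always picks the more probable class, so the misclassification probability equals the smaller prior.

```latex
P(C_1) = 0.6,\quad P(C_2) = 0.4
\;\Longrightarrow\; \text{always assign } C_1,
\qquad P(\text{error}) = P(C_2) = 0.4 .
```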
25 Problem: seek a formalism which allows new information about a character (a measured feature value) to be combined with the prior probabilities we already possess.
26 Prior probability $P(C_k)$, in the limit of an infinite number of images (figure: $P(C_1)$).
27 Joint probability $P(C_k, X = l)$, in the limit of an infinite number of images (figure: $P(C_1, X = 5)$).
28 Class-conditional probability $P(X = l \mid C_k)$, in the limit of an infinite number of images (figure: $P(X = 5 \mid C_1)$).
29 (figure: $P(X = 5 \mid C_1)$)
30 Unconditional probability $P(X = l)$, in the limit of an infinite number of images (figure: $P(X = 5)$).
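For reference, these three quantities are tied together by the product and sum rules of probability (a standard identity, stated here because the next slides rely on it):

```latex
P(C_k, X = l) = P(X = l \mid C_k)\, P(C_k),
\qquad
P(X = l) = \sum_{k=1}^{c} P(X = l \mid C_k)\, P(C_k).
```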
32 Bayes' theorem holds also when probabilities represent degrees of belief.
Posterior: $P(C_k \mid X = l) = \dfrac{P(X = l \mid C_k)\, P(C_k)}{P(X = l)}$
33 Different prior probabilities (e.g. classifying normal tissue vs. tumour in medical X-ray images)
$P(C_1) = 0.6$
34 The classification process = inference (estimating posterior probabilities) + decision making (assigning a class).
35 Bayes' theorem (continuous variables):
$P(C_k \mid x) = \dfrac{p(x \mid C_k)\, P(C_k)}{p(x)}$
where x is the observation, $p(x \mid C_k)$ the class-conditional density, $P(C_k)$ the prior, and $P(C_k \mid x)$ the posterior.
36 Bayes' theorem (continuous variables), for c classes and a feature vector $\mathbf{x}$:
$P(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, P(C_k)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{k=1}^{c} p(\mathbf{x} \mid C_k)\, P(C_k)$
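A minimal numeric sketch of this formula; the 1-D Gaussian class-conditional densities and their parameters are illustrative assumptions, not part of the slides:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density, standing in for the class-conditional p(x|Ck)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def posteriors(x, priors, mus, sigmas):
    """Bayes' theorem: P(Ck|x) = p(x|Ck) P(Ck) / sum_j p(x|Cj) P(Cj)."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    return joint / joint.sum()            # the denominator is p(x)

priors = np.array([0.6, 0.4])             # e.g. the priors of slide 33
post = posteriors(1.2, priors, mus=[0.0, 2.0], sigmas=[1.0, 1.0])
print(post, post.sum())                   # the posteriors sum to 1
```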
37 Decision making
Minimum misclassification rule: assign feature vector $\mathbf{x}$ to class $C_k$ if $P(C_k \mid \mathbf{x}) > P(C_j \mid \mathbf{x})$ for all $j \neq k$.
This defines decision regions $R_1, \ldots, R_c$ such that a point falling in $R_k$ is assigned to $C_k$.
38 Decision making
40 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
41 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
decision boundaries
42 Discriminant (decision) functions $y_1(\mathbf{x}), \ldots, y_c(\mathbf{x})$
other discriminant functions: any monotonic function of the posteriors yields the same decision boundaries
43 Two-class decision problems: use the single discriminant $y(\mathbf{x}) = y_1(\mathbf{x}) - y_2(\mathbf{x})$
- assign $\mathbf{x}$ to class $C_1$ if $y(\mathbf{x}) > 0$
- assign $\mathbf{x}$ to class $C_2$ if $y(\mathbf{x}) < 0$
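A minimal sketch of a two-class discriminant built from the unnormalized posteriors $y_k(\mathbf{x}) = p(\mathbf{x} \mid C_k)\, P(C_k)$; the Gaussian densities and their parameters are illustrative assumptions:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative two-class setup (same assumed densities as above).
priors = [0.6, 0.4]
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def y(x):
    """Two-class discriminant y(x) = y1(x) - y2(x), with yk(x) = p(x|Ck) P(Ck)."""
    y1 = gauss_pdf(x, mus[0], sigmas[0]) * priors[0]
    y2 = gauss_pdf(x, mus[1], sigmas[1]) * priors[1]
    return y1 - y2

for x in (-1.0, 1.0, 3.0):
    # The decision boundary is the set of points where y(x) = 0.
    print(f"x={x}: assign", "C1" if y(x) > 0 else "C2")
```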
44 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$; loss matrix $L = (L_{kj})$.
Expected loss for patterns in $C_k$:
$R_k = \sum_{j=1}^{c} L_{kj} \int_{R_j} p(\mathbf{x} \mid C_k)\, d\mathbf{x}$
(minimizing risk)
45 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$; loss matrix $L = (L_{kj})$.
Risk (overall expected loss):
$R = \sum_{k=1}^{c} P(C_k)\, R_k = \sum_{k=1}^{c} \sum_{j=1}^{c} L_{kj} \int_{R_j} p(\mathbf{x} \mid C_k)\, P(C_k)\, d\mathbf{x}$
The decision regions are chosen to minimize R.
46 $L_{kj}$: penalty associated with assigning a pattern to $C_j$ when in fact it belongs to $C_k$.
Minimizing the risk: choose the regions $R_j$ such that $\mathbf{x} \in R_j$ when
$\sum_{k=1}^{c} L_{kj}\, p(\mathbf{x} \mid C_k)\, P(C_k) < \sum_{k=1}^{c} L_{ki}\, p(\mathbf{x} \mid C_k)\, P(C_k) \quad \text{for all } i \neq j$
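A minimal sketch of the minimum-risk rule; the loss values (a heavy penalty for missing class $C_2$, in the spirit of the tumour example) and the Gaussian class-conditionals are illustrative assumptions:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# L[k, j] is the penalty for deciding Cj when the truth is Ck.
L = np.array([[0.0, 1.0],      # true C1: cheap to confuse with C2
              [10.0, 0.0]])    # true C2 (e.g. tumour): costly to miss
priors = np.array([0.6, 0.4])
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def min_risk_class(x):
    """Assign x to the class Cj minimizing sum_k L[k, j] p(x|Ck) P(Ck)."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    cond_risk = L.T @ joint        # entry j = sum_k L[k, j] p(x|Ck) P(Ck)
    return int(np.argmin(cond_risk))

for x in (-1.0, 0.5, 3.0):
    print(f"x={x}: assign C{min_risk_class(x) + 1}")
```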
47 Homework: minimizing risk
48 The reject option
One way in which the reject option can be used is
to design a relatively simple but fast classifier
system to cover the bulk of the feature space,
while leaving the remaining regions to a more
sophisticated system which might be relatively
slow.
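A minimal sketch of the reject option: classify only when the largest posterior clears a threshold; the threshold value and the Gaussian class-conditionals are illustrative assumptions.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

priors = np.array([0.6, 0.4])
mus, sigmas = [0.0, 2.0], [1.0, 1.0]

def classify_or_reject(x, theta=0.9):
    """Classify if max_k P(Ck|x) >= theta; otherwise defer to a slower system."""
    joint = np.array([gauss_pdf(x, m, s) * P
                      for P, m, s in zip(priors, mus, sigmas)])
    post = joint / joint.sum()                 # posteriors P(Ck|x)
    k = int(np.argmax(post))
    return f"C{k + 1}" if post[k] >= theta else "reject"

for x in (-1.0, 1.0, 3.0):                     # x = 1.0 lies near the boundary
    print(f"x={x}: {classify_or_reject(x)}")
```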
50 THE END