Title: Pattern Recognition
1 Pattern Recognition
- Lecture 2
- By Rob Buxton (some ideas inspired by L. Noriega)
2 History
- Mathematicians knew that organised data could be classified in terms of its inherent patterns long before the first computers were invented
- One of the first attempts to apply statistical pattern recognition and classification to a problem was by Fisher, who classified prehistoric skulls according to a complex set of measurements
3 Vectors
- Statistical methods represent objects by using vectors, where the components are measurements or features associated with that object
- These can be discrete (number of legs) or continuous (weight)
- It is usual to have several components to a vector, e.g. (0.7, 0.9, 12, 5.07)
4 Vectors
- In order to be compared directly, vectors need to be of the same dimensionality, e.g.
(0.4, 0.5, 0.9, 0.6, 0.6)
(0.2, 0.7, 0.9, 0.5, 0.7)
5 Measurements
- If we look at vectors in practical terms, often the components are measurements of some sort
- Two cubes can be compared in terms of a 3-dimensional feature vector (x, y, z)
[Figure: two cubes, each labelled with its (x, y, z) feature vector]
6 Pattern Spaces
[Table: columns of feature measurements x1, x2, x3; one row per object]
Each row of a table like this is a vector describing a point in pattern space
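As an illustration, a minimal sketch in Python (assuming NumPy; the feature values are invented):

```python
import numpy as np

# Each row is one object's feature vector (x1, x2, x3);
# the values here are invented for illustration.
patterns = np.array([
    [0.4, 0.5, 0.9],
    [0.2, 0.7, 0.9],
    [0.7, 0.9, 0.1],
])

print(patterns[0])        # first point in 3-d pattern space: [0.4 0.5 0.9]
print(patterns.shape[1])  # dimensionality of the pattern space: 3
```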
7 Euclidean Distance
- In Lecture 1 we discussed the concept of distance as a similarity measure
- But how does this work?

D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

where D(x, y) is the difference between x and y, (x_i - y_i) is the difference between the ith component of x and the ith component of y, n is the number of components, and i is the component number
8 Euclidean Distance
- Take two 2-d vectors
- x = (0.5, 0.7)
- y = (0.1, 0.9)

D(x, y) = \sqrt{(0.5 - 0.1)^2 + (0.7 - 0.9)^2} = \sqrt{0.2} \approx 0.447
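A minimal sketch of this calculation in Python (assuming NumPy is available; the function name is illustrative):

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two vectors of equal dimensionality."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean_distance([0.5, 0.7], [0.1, 0.9]))  # ~0.447, as above
```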
9 Possible Problems
- What happens if we have a lot of variance in one feature compared to another?
[Figure: two elongated clusters plotted against feature x1 (large variance) and feature x2 (small variance); their overlap is an area of indecision]
- A distance measure that attempts to create a more symmetrical cluster may help
10 Modified Euclidean Distance Metrics
- These can be a lot more complex, but can help to overcome the sort of problems shown in the last example
- A relatively simple way of rescaling is based upon the variance in the measured features, as sketched below
- What effect do you think this might have?
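One common rescaling of this kind is the standardized Euclidean distance, which divides each squared component difference by that feature's variance; a minimal sketch in Python (the sample data are invented):

```python
import numpy as np

def standardized_euclidean(x, y, variances):
    """Euclidean distance with each feature rescaled by its variance."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sqrt(np.sum((x - y) ** 2 / variances))

# Per-feature variances estimated from some training data (invented here);
# the second feature has much larger variance than the first.
data = np.array([[1.0, 100.0],
                 [1.2, 180.0],
                 [0.8, 140.0]])
variances = data.var(axis=0)

# The high-variance feature now contributes proportionally less,
# tending to make the clusters more symmetrical.
print(standardized_euclidean([1.0, 100.0], [1.2, 180.0], variances))
```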
11 Features and Their Selection
- So far we've looked at 3-d objects, and it is fairly simple to see that our vectors correspond to (width, length, height)
- In some PR problems, extracting the most appropriate feature information from unwanted clutter is a major task
12 Features Example
[Figure: example data containing three distinctive feature points]
13 Features
- There are three distinctive points contained in the information on the previous slide; these shall be our features. Can you pick them?
- That should have been quite easy, but how do we get a computer system to pick the features on a reliable basis?
- This can be very tricky!
14 Graphing
- Our system needs to pick the distinctive values in the same way we do
- It can sometimes be easier to visualise the process if it is graphed
15 Problem Dependent
- The answer is problem dependent
- If we know that in every case we are going to be searching for 3 points to equal our features, then it is quite simple
- We simply list the 3 highest points (a sketch follows below)
- What could we do if we don't always know the number of features?
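A minimal sketch of the fixed-count case in Python (assuming the data are a simple list of values and the features are the 3 highest points; the names are illustrative):

```python
def top_k_features(values, k=3):
    """Return the (index, height) pairs of the k largest values."""
    indexed = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    return indexed[:k]

signal = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3, 0.7, 0.1]  # invented example data
print(top_k_features(signal))  # [(1, 0.9), (4, 0.8), (6, 0.7)]
```

If the number of features is not known in advance, one option (an assumption, not from the slides) is to keep every value above a threshold rather than a fixed count.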
16 Clusters
- Many methods rely on grouping similar items together (in some imaginary space)
- Often this process is called clustering
- Clusters that are close to each other are similar; clusters that are far apart are dissimilar
- Often clusters are associated with unsupervised PR methods
17 K-Nearest Neighbour Revisited
- This is a simple method that we have already discussed very briefly, but it serves a purpose because it allows us to discuss in more detail the issues that affect these types of methods in real life
18 K-NN Explained
[Figure: Cluster 1 and Cluster 2, with a novel vector between them]
K-NN allocates a novel vector to an existing cluster based upon its Euclidean distance from the k (here 7) nearest allocated vectors
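A minimal sketch of this allocation rule in Python (the training vectors, labels, and k = 3 are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(novel, vectors, labels, k=3):
    """Assign the novel vector the majority label of its k nearest vectors."""
    distances = np.sqrt(((vectors - novel) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Invented 2-d training data: two predefined clusters.
vectors = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
                    [0.8, 0.9], [0.9, 0.8], [0.85, 0.85]])
labels = ["cluster 1"] * 3 + ["cluster 2"] * 3

print(knn_classify(np.array([0.7, 0.8]), vectors, labels))  # cluster 2
```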
19 K-NN: Advantages and Disadvantages
- Advantages
- Quick and simple to use
- Disadvantages
- The clusters have to be predefined
- Very sensitive to outliers
20 K-Means
- A better algorithm to use in practice!
- How does it work?
- This is an unsupervised method where the clusters are created
- Prototypes within the cluster structure are used to determine allocation
21 K-Means Explained
[Figure: Cluster 1 with its prototype; the novel vector's distance is measured to the prototype]
22 K-Means Update
- If a novel vector is allocated to a cluster, then the prototype is updated so as to include the characteristics of the additional vector
- How do you think this may be done?
23 Average of the Vectors
- One simple way would be to average the vector components:

(i_1, i_2, \dots, i_n) = \left( \frac{j_1 + k_1}{2}, \frac{j_2 + k_2}{2}, \dots \right)

where j is the old prototype, k is the newly allocated vector, and i is the updated prototype
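A minimal sketch of this update in Python (assuming, as in the formula above, that the prototype is simply re-averaged with the newly allocated vector):

```python
import numpy as np

def update_prototype(prototype, novel):
    """Componentwise average of the old prototype and the novel vector."""
    return (np.asarray(prototype) + np.asarray(novel)) / 2

print(update_prototype([0.2, 0.4], [0.6, 0.8]))  # [0.4 0.6]
```

Note that a plain pairwise average weights the newest vector as heavily as all previous members combined; a running mean over the cluster's member count would weight every member equally.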
24 Advantages and Disadvantages
- Advantages
- Simple
- Efficient
- Disadvantages
- K must be provided
- Depends on Linear Separability
25 Next Lecture
- Syntactical Pattern Recognition