Title: Pattern Recognition
1Pattern Recognition and Machine Learning
Debrup Chakraborty debrup_at_delta.cs.cinvestav.mx
2The Initials
Time: Wednesday 16 hrs to 18 hrs, Friday 16 hrs to 18 hrs
Course website: http://delta.cs.cinvestav.mx/debrup/Machine_Learning.html
Books:
- Pattern Classification: Duda, Hart and Stork
- Machine Learning: Mitchell
- Neural Networks: Haykin
3The Initials (contd.)
Grading policy: 4 homeworks (20%), 2 exams (30%), 1 project/term paper (50%)
The lectures will be in English; I am really sorry about that.
4We are good at recognizing patterns
- We can recognize faces with ease
- We can understand spoken words
- We can read handwriting
- And many more
It would be nice to make machines perform these
tasks.
Informally, pattern recognition deals with techniques and methods that make machines recognize patterns.
5Satellite image of Kolkata, in 4 channels: Red, Green, Blue and Infrared
Which parts are land and which are water?
Where is the airport?
6Protein Fold Prediction
- Proteins are sequences of amino acid molecules. There are 20 distinct types of amino acids, and their sequences can form a very large number of protein molecules.
- An amino acid chain must fold in a certain manner, which helps it in its specified activity.
- The activity and function of a protein depend on the way it folds. Thus the fold information is necessary for determining the function of a protein.
- Problem: Given an amino acid sequence, predict the fold it will undergo.
7Text Categorization
Given a large corpus of text documents (say, news items), tell me the categories of the documents.
Similar problems:
- Classify music in a music corpus
- Classify images in an image corpus
- Retrieve documents/music/images from a database which are similar to x.
8Other Problems
- Predict whether a patient hospitalized due to a heart attack will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements of the patient.
- Predict the price of a stock 6 months from now on the basis of company performance measures and economic data.
- Identify the numbers in a handwritten zip code (CP) from digitized images.
- Estimate the amount of glucose in the blood of a diabetic patient from the infrared absorption spectrum of the blood.
- Estimate the risk factors for prostate cancer based on clinical and demographic variables.
9Pattern Recognition (What the experts say?)
Duda and Hart, 1973: "A field concerned with machine recognition of meaningful regularities."
Bezdek, 1981: "Pattern recognition is the search for structure in data."
10Types of Data
Object Data
Relational Data
11Object Data Numeric Features
The characteristics of an object are encoded in a vector called the feature vector.
Each component of the vector represents some attribute of the object. These components are called features.
Example: Iris data. A data set in R^4 representing iris flowers. The features are the sepal length, sepal width, petal length and petal width of 150 iris flowers of 3 different types.
Example: Multichannel satellite image. Images captured by different sensors, which capture the frequency information of the electromagnetic radiation from the earth's surface.
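As a minimal sketch of what object data looks like in practice, the snippet below stores three iris flowers as feature vectors in R^4. The measurements and the type names are illustrative assumptions (quoted from memory of the classic data set, not taken from the slides); only the feature order follows the text above.

```python
# Feature order as listed on the slide (all measurements in cm).
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Three illustrative flowers, one per type. The values and type names
# are assumptions for this sketch, not data given in the lecture.
X = [
    (5.1, 3.5, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.3, 3.3, 6.0, 2.5),
]
labels = ["setosa", "versicolor", "virginica"]

# Each row is one object; each column is one feature.
n_objects = len(X)
n_features = len(X[0])

# Pair feature names with the components of the first feature vector.
first_flower = dict(zip(FEATURES, X[0]))
print(n_objects, n_features, first_flower)
```

The full data set would simply be 150 such rows, so the whole collection is a 150 x 4 array of numbers plus one type label per row.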
12Relational Data
A relational matrix whose values are assigned by
humans or computed from features
            Russian  German  Chinese  Japanese
Russian     1.0      0.6     0.2      0.25
German      0.6      1.0     0.1      0.2
Chinese     0.2      0.1     1.0      0.8
Japanese    0.25     0.2     0.8      1.0
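The slide says a relational matrix may also be computed from features. One possible way to do that, sketched below, is to turn pairwise Euclidean distances into similarities with 1 / (1 + d). Both the similarity formula and the sample points are assumptions made for illustration; the lecture does not prescribe a particular measure.

```python
import math


def similarity_matrix(vectors):
    """Build an n x n relational matrix from n feature vectors.

    Entry (i, j) is 1 / (1 + Euclidean distance between vectors i and j),
    so identical objects get 1.0 and distant objects approach 0.
    This choice of similarity is an assumption of the sketch.
    """
    n = len(vectors)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(vectors[i], vectors[j])
            R[i][j] = 1.0 / (1.0 + d)
    return R


# Hypothetical 2-D feature vectors: two close pairs, far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
R = similarity_matrix(points)
for row in R:
    print(["%.2f" % value for value in row])
```

Like the language table above, the result is symmetric with ones on the diagonal, and objects that are close in feature space receive larger entries.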
13Pattern Recognition Systems
Preprocessing
Feature extraction
Feature analysis
The main recognition task
14Recognition Involves Learning
Learning in this context means:
- Extracting knowledge from past experience
- Representing the knowledge efficiently
- Using the knowledge for future predictions/recognitions
15Types of Learning
- Supervised learning: learning with a teacher
- Unsupervised learning: learning without a teacher
- Reinforcement learning: learning with a critic
16Supervised Learning
[Diagram: Training set -> Learning algorithm -> h; New example -> h -> Prediction]
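The flow above (a training set goes into a learning algorithm, which produces h, which then maps a new example to a prediction) can be sketched in a few lines. The learning algorithm used here is 1-nearest-neighbour, chosen only because it is short; the slides do not name an algorithm, and the feature values and the "water"/"land" labels are hypothetical.

```python
import math


def learn_1nn(training_set):
    """Learning algorithm: takes (x, y) pairs and returns a hypothesis h.

    The returned h predicts the label of the closest training point.
    1-nearest-neighbour is an assumed stand-in for "a learning algorithm".
    """
    examples = list(training_set)

    def h(x):
        _, nearest_y = min(examples, key=lambda ex: math.dist(ex[0], x))
        return nearest_y

    return h


# Hypothetical training set: 2-D feature vectors with class labels.
training_set = [
    ((1.0, 1.0), "water"),
    ((1.2, 0.8), "water"),
    ((4.0, 4.2), "land"),
    ((3.8, 4.5), "land"),
]

h = learn_1nn(training_set)       # training set -> learning algorithm -> h
new_example = (3.5, 4.0)
prediction = h(new_example)       # new example -> h -> prediction
print(prediction)
```

Returning h as a function keeps the two phases separate: learning happens once, and prediction can then be applied to any number of new examples.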
17Supervised Learning (Contd.)
Supervised learning systems vary according to the type of function learned.
Basic types:
- Function approximation systems (numeric outputs)
- Classifier systems (outputs are classes)
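To make the two basic types concrete, here is a small sketch with one learner of each kind on one-dimensional inputs. A least-squares line stands in for a function approximation system and a nearest-class-mean rule stands in for a classifier; both choices, and all data values, are assumptions made for illustration.

```python
def fit_line(points):
    """Function approximation: least-squares line through (x, y) pairs.

    Returns h with a numeric output.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_mean(points):
    """Classifier: assign x to the class whose mean input is closest.

    Returns h whose output is a class label.
    """
    sums, counts = {}, {}
    for x, label in points:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


# Hypothetical data: numeric targets for regression, labels for classification.
regressor = fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
classifier = fit_nearest_mean([(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")])

print(regressor(3.0))    # a number
print(classifier(3.0))   # a class label
```

The only structural difference is the output space: the first h returns a real number, the second returns one of a finite set of classes.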
18Supervised Learning (Contd.)
It is assumed that x and y bear an (unknown) functional relationship, say $y = S(x)$.
It is assumed that each $(x_i, y_i)$ is generated from a fixed (but unknown) time-invariant probability distribution.
The training set: $L = \{(x_i, y_i) : i = 1, \dots, N\}$
Goal: to find h which closely resembles S, given L.
19Supervised Learning (Contd.)
How do we measure whether h resembles S?
- Training error: the error on the training data points
- Test error (generalization error): the error on points not in the training set (difficult to measure)
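One common way to write these two errors down is given below. It assumes a loss function $\ell$, a training set of $N$ pairs $(x_i, y_i)$, and a symbol $P$ for the fixed but unknown distribution that generates the data; none of these symbols appear on the slide.

```latex
E_{\mathrm{train}}(h) = \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(h(x_i),\, y_i\bigr),
\qquad
E_{\mathrm{gen}}(h) = \mathbb{E}_{(x,y) \sim P}\bigl[\ell\bigl(h(x),\, y\bigr)\bigr]
```

The first quantity can be computed directly from the training set. The second involves an expectation over the unknown $P$, which is why it is difficult to measure; in practice it is usually estimated by averaging the loss over held-out points that were not used for training.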
20Bad Generalization An example
[Figure: plots of y against x illustrating bad generalization]
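A small numerical sketch of bad generalization, under assumed data: the underlying function is taken to be S(x) = x, the six training targets are perturbed by alternating offsets of +0.5 and -0.5, and two hypotheses are compared. The data, the choice of a least-squares line versus an interpolating polynomial, and the mean squared error measure are all assumptions of this example rather than the content of the original figure.

```python
def fit_line(points):
    """Least-squares straight line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(points):
    """Lagrange polynomial passing exactly through every training point."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h


def mean_squared_error(h, points):
    return sum((h(x) - y) ** 2 for x, y in points) / len(points)


# Assumed training data: S(x) = x plus alternating offsets of +/- 0.5.
train = [(float(x), x + (0.5 if x % 2 == 0 else -0.5)) for x in range(6)]
# Assumed test data: noise-free values of S at points between the training inputs.
test = [(x + 0.5, x + 0.5) for x in range(5)]

line = fit_line(train)
interpolant = fit_interpolant(train)

for name, h in (("line", line), ("interpolant", interpolant)):
    print(name,
          "train error: %.4f" % mean_squared_error(h, train),
          "test error: %.4f" % mean_squared_error(h, test))
```

The interpolating polynomial reproduces every training point and so has zero training error, yet it oscillates between the training inputs and does worse than the straight line on the test points. A low training error alone therefore says little about how well h resembles S.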
21Thank You