Spoken Dialog Systems and Voice XML: Intro to Pattern Recognition

1
Spoken Dialog Systems and Voice XML Intro to
Pattern Recognition

Esther Levin
Dept of Computer Science
CCNY

Some materials used in this course were taken
from the textbook Pattern Classification by
Duda et al., John Wiley & Sons, 2001, with the
permission of the authors and the publisher
2
Credits and Acknowledgments

Materials used in this course were taken from the
textbook Pattern Classification by Duda et al.,
John Wiley & Sons, 2001, with the permission of
the authors and the publisher, and also from
other material on the web:
Dr. A. Aydin Atalan, Middle East Technical
University, Turkey
Dr. Djamel Bouchaffra, Oakland University
Dr. Adam Krzyzak, Concordia University
Dr. Joseph Picone, Mississippi State University
Dr. Robi Polikar, Rowan University
Dr. Stefan A. Robila, University of New Orleans
Dr. Sargur N. Srihari, State University of New
York at Buffalo
David G. Stork, Stanford University
Dr. Godfried Toussaint, McGill University
Dr. Chris Wyatt, Virginia Tech
Dr. Alan L. Yuille, University of California, Los
Angeles
Dr. Song-Chun Zhu, University of California, Los
Angeles

3
Outline

Introduction
What is pattern recognition?
Background Material
Probability theory

5
PATTERN RECOGNITION AREAS

Optical Character Recognition (OCR)
Sorting letters by postal code.
Reconstructing text from printed materials (such
as reading machines for blind people).
Analysis and identification of human patterns
Speech and voice recognition.
Fingerprints and DNA mapping.
Banking and insurance applications
Credit card applicants classified by income,
credit worthiness, mortgage amount, number of
dependents, etc.
Car insurance (pattern including make of car, number of
accidents, age, sex, driving habits, location,
etc).
Diagnosis systems
Medical diagnosis (disease vs. symptoms
classification, X-Ray, EKG and tests analysis,
etc).
Diagnosis of automotive malfunctions
Prediction systems
Weather forecasting (based on satellite data).
Analysis of seismic patterns
Dating services (where pattern includes age, sex,
race, hobbies, income, etc).

6
More Pattern Recognition Applications

SENSORY
Vision
Face/Handwriting/Hand
Speech
Speaker/Speech
Olfaction
Apple Ripe?

DATA
Text Categorization
Information Retrieval
Data Mining
Genome Sequence Matching

7
What is a pattern?

A pattern is the opposite of a chaos; it is an
entity, vaguely defined, that could be given a
name.

8
PR Definitions

Theory, Algorithms, Systems to Put Patterns into
Categories
Classification of Noisy or Complex Data
Relate Perceived Pattern to Previously Perceived
Patterns

9
Characters
A v t u I h D U w K
Ç s g I ü Ü Ö G
10
Handwriting
12
Terminology

Features, feature vector
Decision boundary
Error
Cost of error
Generalization

13
A Fishy Example I

Sorting incoming Fish on a conveyor according
to species using optical sensing
Salmon or Sea Bass?

Problem Analysis
Set up a camera and take some sample images to
extract features
Length
Lightness
Width
Number and shape of fins
Position of the mouth, etc
This is the set of all suggested features to
explore for use in our classifier!

15
Solution by Stages

Preprocess raw data from camera
Segment isolated fish
Extract features from each fish (length, width,
brightness, etc.)
Classify each fish

Preprocessing
Use a segmentation operation to isolate fishes
from one another and from the background
Information from a single fish is sent to a
feature extractor whose purpose is to reduce the
data by measuring certain features
The features are passed to a classifier
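
The stages above can be sketched as a small pipeline. This is only an illustrative sketch: the `Fish` record, the function names, and the threshold value 5.0 are assumptions that do not appear in the slides, and segmentation is taken as already done.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fish:
    """Hypothetical measurements for one already-segmented fish."""
    length: float
    width: float
    lightness: float


def extract_features(fish: Fish) -> Tuple[float, float]:
    """Reduce the raw measurements to a feature vector (lightness, width)."""
    return (fish.lightness, fish.width)


def classify(features: Tuple[float, float], threshold: float = 5.0) -> str:
    """Toy rule: darker fish are labelled salmon (the threshold is made up)."""
    lightness, _width = features
    return "salmon" if lightness < threshold else "sea bass"


def run_pipeline(segmented: List[Fish]) -> List[str]:
    """Feature extraction followed by classification, one fish at a time."""
    return [classify(extract_features(f)) for f in segmented]


labels = run_pipeline([Fish(30.0, 8.0, 3.2), Fish(45.0, 12.0, 7.1)])
print(labels)
```

Keeping feature extraction separate from classification mirrors the slide: either stage can be replaced without touching the other.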

18

Classification
Select the length of the fish as a possible
feature for discrimination

20

Length alone is a poor feature!
Select the lightness as a possible feature.

22
Customers do not want sea bass in their cans of
salmon

Threshold decision boundary and cost relationship
Move our decision boundary toward smaller values
of lightness in order to minimize the cost
(reduce the number of sea bass that are
classified as salmon!)
Task of decision theory
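
How unequal costs move the threshold can be illustrated with a brute-force search. The lightness samples and the two cost values below are invented for illustration. The rule "lightness below the threshold means salmon" is an assumption consistent with the slide's statement that moving the boundary toward smaller lightness reduces the number of sea bass classified as salmon.

```python
from typing import List, Tuple

# Hypothetical (lightness, true label) training samples.
samples: List[Tuple[float, str]] = [
    (2.0, "salmon"), (3.0, "salmon"), (4.0, "salmon"),
    (5.0, "salmon"), (5.5, "salmon"),
    (4.5, "sea bass"), (6.0, "sea bass"), (7.0, "sea bass"), (8.0, "sea bass"),
]


def total_cost(threshold: float, data, cost_bass_as_salmon: float,
               cost_salmon_as_bass: float) -> float:
    """Cost of the rule 'lightness < threshold -> salmon' on the data."""
    cost = 0.0
    for lightness, label in data:
        predicted = "salmon" if lightness < threshold else "sea bass"
        if predicted == "salmon" and label == "sea bass":
            cost += cost_bass_as_salmon
        elif predicted == "sea bass" and label == "salmon":
            cost += cost_salmon_as_bass
    return cost


def best_threshold(data, cost_bass_as_salmon: float,
                   cost_salmon_as_bass: float) -> float:
    """Try every sample value as a threshold; keep the cheapest (lowest on ties)."""
    candidates = sorted(x for x, _ in data)
    return min(candidates,
               key=lambda t: total_cost(t, data, cost_bass_as_salmon,
                                        cost_salmon_as_bass))


equal = best_threshold(samples, 1.0, 1.0)    # both errors cost the same
skewed = best_threshold(samples, 10.0, 1.0)  # sea bass in a salmon can is costly
print(equal, skewed)
```

With these made-up numbers the boundary moves from 6.0 down to 4.5 once a sea bass labelled as salmon costs ten times more than the opposite error.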

23

Adopt the lightness and add the width of the fish
Fish: $x^T = [x_1, x_2]$, where $x_1$ is the lightness and $x_2$ is the width
25

We might add other features that are not
correlated with the ones we already have. Care
should be taken that adding such features, which
may be noisy, does not reduce performance
Ideally, the best decision boundary should be the
one which provides an optimal performance such as
in the following figure

27

However, our satisfaction is premature because
the central aim of designing a classifier is to
correctly classify novel input
Issue of generalization!

29
Decision Boundaries
Observe: we can do much better with two features
Caveat: overfitting!
30
Occam's Razor
"Entities are not to be multiplied without
necessity." William of Occam
(1284-1347)
31
A Complete PR System
32
Problem Formulation
Input object -> Measurements and preprocessing -> Features -> Classification -> Class label

Basic ingredients
Measurement space (e.g., image intensity,
pressure)
Features (e.g., corners, spectral energy)
Classifier - soft and hard
Decision boundary
Training sample
Probability of error

33
Pattern Recognition Systems

Sensing
Use of a transducer (camera or microphone)
The PR system depends on the bandwidth,
resolution, sensitivity, and distortion of the
transducer
Segmentation and grouping
Patterns should be well separated and should not
overlap

35

Feature extraction
Discriminative features
Invariant features with respect to translation,
rotation and scale.
Classification
Use a feature vector provided by a feature
extractor to assign the object to a category
Post Processing
Exploit context dependent information other than
from the target pattern itself to improve
performance

36
The Design Cycle

Data collection
Feature Choice
Model Choice
Training
Evaluation
Computational Complexity

38

Data Collection
How do we know when we have collected an
adequately large and representative set of
examples for training and testing the system?

39

Feature Choice
Depends on the characteristics of the problem
domain. Features should be simple to extract, invariant to
irrelevant transformations, and insensitive to noise.

40

Model Choice
We may be unsatisfied with the performance of our linear
fish classifier and want to jump to another class
of model

41

Training
Use data to determine the classifier. Many
different procedures for training classifiers and
choosing models

42

Evaluation
Measure the error rate (or performance) and, if needed,
switch from one set of features and models to
another one.

43

Computational Complexity
What is the trade-off between computational ease
and performance?
(How does an algorithm scale as a function of the
number of features, number of training examples,
and number of patterns or categories?)

44
Learning and Adaptation

Learning: any method that combines empirical
information from the environment with prior
knowledge into the design of a classifier,
attempting to improve performance with time.
Empirical information: usually in the form of
training examples.
Prior knowledge: invariances, correlations
Supervised learning
A teacher provides a category label or cost for
each pattern in the training set
Unsupervised learning
The system forms clusters or natural groupings
of the input patterns

45
Syntactic Versus Statistical PR

Basic assumption: there is an underlying
regularity behind the observed phenomena.
Question: based on noisy observations, what is
the underlying regularity?
Syntactic: structure through a common generative
mechanism. For example, all different
manifestations of English share a common
underlying set of grammatical rules.
Statistical: objects characterized through
statistical similarity. For example, all possible
digits '2' share some common underlying
statistical relationship.

46
Difficulties

Segmentation
Context
Temporal structure
Missing features
Aberrant data
Noise

Do all these images represent an 'A'?
47
Design Cycle
How do we know what features to select, and how
do we select them?
What type of classifier shall we use? Is there a
best classifier? How do we train? How do we
combine prior knowledge with empirical data?
How do we evaluate our performance? How do we validate the
results? How confident are we in a decision?
48
Conclusion

I expect you are overwhelmed by the number,
complexity and magnitude of the sub-problems of
Pattern Recognition
Many of these sub-problems can indeed be solved
Many fascinating unsolved problems still remain

49
Toolkit for PR

Statistics
Decision Theory
Optimization
Signal Processing
Neural Networks
Fuzzy Logic
Decision Trees
Clustering
Genetic Algorithms
AI Search
Formal Grammars
.

50
Linear algebra

Matrix A
Matrix Transpose
Vector a

51
Matrix and vector multiplication

Matrix multiplication
Outer vector product
Vector-matrix product
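
The formulas for this slide did not survive in the transcript. As a rough sketch of the operations it names, the standard definitions can be written in plain Python; the helper names and the sample matrices are chosen here for illustration only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product: (n x m) times (m x p) gives (n x p)."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def outer(a: Vector, b: Vector) -> Matrix:
    """Outer product a b^T: an (n x m) matrix."""
    return [[x * y for y in b] for x in a]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product A v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
print(matmul(A, B), transpose(A), outer([1.0, 2.0], [3.0, 4.0]), matvec(A, [1.0, 1.0]))
```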

52
Inner Product

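Inner (dot) product: $a^T b = \sum_{i=1}^{n} a_i b_i$
Length (Euclidean norm) of a vector: $\|a\| = \sqrt{a^T a}$
a is normalized iff $\|a\| = 1$
The angle between two n-dimensional vectors: $\cos\theta = \frac{a^T b}{\|a\| \, \|b\|}$
An inner product is a measure of collinearity
a and b are orthogonal iff $a^T b = 0$
a and b are collinear iff $|a^T b| = \|a\| \, \|b\|$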
A set of vectors is linearly independent if no
vector is a linear combination of other vectors.
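
A small numeric check of these notions, using the standard definitions of dot product, Euclidean norm, and angle. The example vectors are arbitrary choices made here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle in radians between two non-zero vectors."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


a = [3.0, 4.0]
b = [-4.0, 3.0]   # orthogonal to a: the inner product is zero
c = [6.0, 8.0]    # collinear with a: the angle is zero
print(norm(a), dot(a, b), angle(a, b), angle(a, c))
```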

53
Determinant and Trace

Determinant
$\det(AB) = \det(A)\det(B)$
Trace

54
Matrix Inversion

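A (n x n) is nonsingular if there exists B such that $AB = BA = I_n$; then $B = A^{-1}$
Example: $A = \begin{bmatrix} 2 & 3 \\ 2 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 3/2 \\ 1 & -1 \end{bmatrix}$
A is nonsingular iff $\det(A) \neq 0$
Pseudo-inverse for a non-square matrix: $A^{\#} = (A^T A)^{-1} A^T$, provided
$A^T A$ is not singular; then $A^{\#} A = I$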
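
The slide's 2 x 2 example can be checked numerically. The sketch below uses exact fractions and assumes the usual left pseudo-inverse $(M^T M)^{-1} M^T$ for a matrix with more rows than columns. The 3 x 2 matrix `M` and the helper names are made up for illustration.

```python
from fractions import Fraction as F


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def pseudo_inverse_tall(m):
    """(M^T M)^{-1} M^T for an (n x 2) matrix whose columns are independent."""
    mt = transpose(m)
    return matmul(inverse_2x2(matmul(mt, m)), mt)


A = [[F(2), F(3)], [F(2), F(2)]]          # matrix from the slide
A_inv = inverse_2x2(A)
M = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]   # hypothetical 3 x 2 matrix
M_pinv = pseudo_inverse_tall(M)
print(A_inv, matmul(M_pinv, M))
```

Multiplying `A` by `A_inv`, or `M_pinv` by `M`, gives the identity, which is the defining property in each case.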

55
Eigenvectors and Eigenvalues
Eigenvector x and eigenvalue $\lambda$ of A: $A x = \lambda x$
Characteristic equation: $\det(A - \lambda I) = 0$, an n-th order polynomial
with n roots.
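
For a 2 x 2 matrix the characteristic polynomial is a quadratic, so its roots can be written down directly. This is a minimal sketch restricted to real eigenvalues; the example matrix is an arbitrary symmetric one chosen here.

```python
import math


def eigenvalues_2x2(m):
    """Real roots of det(M - lambda*I) = lambda^2 - trace*lambda + det = 0."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues; this sketch handles real roots only")
    root = math.sqrt(disc)
    return ((trace + root) / 2.0, (trace - root) / 2.0)


M = [[2.0, 1.0], [1.0, 2.0]]
lam1, lam2 = eigenvalues_2x2(M)
print(lam1, lam2)
```

The two roots sum to the trace and multiply to the determinant, which gives a quick sanity check.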
56
Probability Theory

Primary references
Any probability and statistics textbook
(e.g., Papoulis)
Appendix A.4 in Pattern Classification by Duda
et al
The principles of probability theory,
describing the behavior of systems with random
characteristics, are of fundamental importance to
pattern recognition.

74
Example 1 (Wikipedia)

There are two bowls full of cookies.
Bowl 1 has 10 chocolate chip cookies and 30
plain cookies;
bowl 2 has 20 of each.
Fred picks a bowl at random, and then picks a
cookie at random.
The cookie turns out to be a plain one.
How probable is it that Fred picked it out of
bowl 1?
That is, what is the probability that Fred picked bowl 1,
given that he has a plain cookie?
Event A is that Fred picked bowl 1;
event B is that Fred picked a plain cookie.
Pr(A|B) = ?

75
Example 1 - continued
Tables of occurrences and relative frequencies It
is often helpful when calculating conditional
probabilities to create a simple table containing
the number of occurrences of each outcome, or the
relative frequencies of each outcome, for each of
the independent variables. The tables below
illustrate the use of this method for the cookies.
The table on the right is derived from the table
on the left by dividing each entry by the total
number of cookies under consideration, or 80
cookies.
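
The tables themselves did not survive in this transcript. The sketch below rebuilds them from the counts given in Example 1 and reads the answer off the relative frequencies. Treating relative frequencies as joint probabilities works here only because both bowls hold 40 cookies and are equally likely to be picked. The variable names are chosen for illustration.

```python
from fractions import Fraction as F

# Table of occurrences, from the problem statement.
counts = {
    "bowl 1": {"chocolate chip": 10, "plain": 30},
    "bowl 2": {"chocolate chip": 20, "plain": 20},
}

total = sum(sum(row.values()) for row in counts.values())  # 80 cookies

# Table of relative frequencies: each entry divided by the total.
rel_freq = {
    bowl: {kind: F(n, total) for kind, n in row.items()}
    for bowl, row in counts.items()
}

# P(plain) is the column sum; P(bowl 1 | plain) is the entry over that sum.
p_plain = sum(row["plain"] for row in rel_freq.values())
p_bowl1_given_plain = rel_freq["bowl 1"]["plain"] / p_plain
print(p_plain, p_bowl1_given_plain)
```

Under these assumptions, Pr(A|B) comes out as 3/5.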
76
Example 2

1. Power Plant Operation.
The variables X, Y, Z describe the state of 3
power plants (X = 0 means plant X is idle).
Denote by A the event that plant X is idle, and
by B the event that at least 2 out of the three plants are
working.
What are P(A) and P(A|B), the probability that X is
idle given that at least 2 out of three are
working?

P(A) = P(0,0,0) + P(0,0,1) + P(0,1,0) + P(0,1,1)
     = 0.07 + 0.04 + 0.03 + 0.18 = 0.32
P(B) = P(0,1,1) + P(1,0,1) + P(1,1,0) + P(1,1,1)
     = 0.18 + 0.18 + 0.21 + 0.13 = 0.70
P(A and B) = P(0,1,1) = 0.18
P(A|B) = P(A and B)/P(B) = 0.18/0.70 = 0.257
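
The same calculation can be sketched as sums over a joint probability table. Seven of the eight entries are the values quoted in the solution. The entry for state (1, 0, 0) is not given in the transcript and is an assumption, inferred here so that the table sums to 1.

```python
# Joint distribution P(X, Y, Z); 0 = idle, 1 = working.
joint = {
    (0, 0, 0): 0.07, (0, 0, 1): 0.04, (0, 1, 0): 0.03, (0, 1, 1): 0.18,
    (1, 0, 0): 0.16,  # assumption: inferred from normalization, not stated
    (1, 0, 1): 0.18, (1, 1, 0): 0.21, (1, 1, 1): 0.13,
}


def prob(event) -> float:
    """Probability of an event given as a predicate on (x, y, z)."""
    return sum(p for state, p in joint.items() if event(state))


def is_a(state) -> bool:
    """Event A: plant X is idle."""
    return state[0] == 0


def is_b(state) -> bool:
    """Event B: at least two of the three plants are working."""
    return sum(state) >= 2


p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda s: is_a(s) and is_b(s))
p_a_given_b = p_a_and_b / p_b
print(round(p_a, 2), round(p_b, 2), round(p_a_given_b, 3))
```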

2. Cars are assembled in four possible locations.
Plant I supplies 20% of the cars; plant II, 24%;
plant III, 25%; and plant IV, 31%. There is a 1-year
warranty on every car.
The company collected data that shows
P(claim | plant I) = 0.05, P(claim | plant II) = 0.11,
P(claim | plant III) = 0.03, P(claim | plant IV) = 0.08
Cars are sold at random.
An owner just submitted a claim for her car. What
are the posterior probabilities that this car was
made in plant I, II, III and IV?

P(claim) = P(claim | plant I) P(plant I)
         + P(claim | plant II) P(plant II)
         + P(claim | plant III) P(plant III)
         + P(claim | plant IV) P(plant IV) = 0.0687
P(plant I | claim)
  = P(claim | plant I) P(plant I)/P(claim) = 0.146
P(plant II | claim)
  = P(claim | plant II) P(plant II)/P(claim) = 0.384
P(plant III | claim)
  = P(claim | plant III) P(plant III)/P(claim) = 0.109
P(plant IV | claim)
  = P(claim | plant IV) P(plant IV)/P(claim) = 0.361
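
The posterior calculation can be sketched as a direct application of the total probability rule followed by Bayes' rule. The priors are read as percentages (20%, 24%, 25%, 31%). The claim rate for plant IV is taken as 0.08, an assumption based on that being the value consistent with the stated total P(claim) = 0.0687 and with the four posteriors.

```python
# P(plant): share of cars supplied by each plant.
priors = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}

# P(claim | plant); 0.08 for plant IV is the value consistent with 0.0687.
likelihoods = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}

# Total probability of a claim.
p_claim = sum(likelihoods[k] * priors[k] for k in priors)

# Bayes' rule: P(plant | claim) = P(claim | plant) P(plant) / P(claim).
posteriors = {k: likelihoods[k] * priors[k] / p_claim for k in priors}

print(round(p_claim, 4))
for plant, p in posteriors.items():
    print(plant, round(p, 3))
```

The posteriors sum to 1, which is a useful check on the arithmetic.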

80
Example 3

3. It is known that 1% of the population suffers from
a particular disease. A blood test has a 97%
chance of identifying the disease in a diseased
individual, but also has a 6% chance of falsely
indicating that a healthy person has the disease.
a. What is the probability that a random person
has a positive blood test?
b. If a blood test is positive, what is the
probability that the person has the disease?
c. If a blood test is negative, what is the
probability that the person does not have the
disease?

A is the event that a person has the disease. P(A)
= 0.01, P(¬A) = 0.99.
B is the event that the test result is positive.
P(B|A) = 0.97, P(¬B|A) = 0.03
P(B|¬A) = 0.06, P(¬B|¬A) = 0.94
(a) P(B) = P(A) P(B|A) + P(¬A) P(B|¬A) = 0.01 × 0.97
+ 0.99 × 0.06 = 0.0691
(b) P(A|B) = P(B|A) P(A)/P(B) = 0.97 × 0.01/0.0691
= 0.1404
(c) P(¬A|¬B) = P(¬B|¬A) P(¬A)/P(¬B)
= P(¬B|¬A) P(¬A)/(1 - P(B)) = 0.94 × 0.99/(1 - 0.0691) = 0.9997
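
The three answers can be reproduced in a few lines. The variable names are chosen here for readability; the three input numbers come from the problem statement (1%, 97%, 6%).

```python
p_disease = 0.01            # P(A)
p_pos_given_disease = 0.97  # P(B | A)
p_pos_given_healthy = 0.06  # P(B | not A)

p_healthy = 1.0 - p_disease

# (a) Total probability of a positive test.
p_pos = p_disease * p_pos_given_disease + p_healthy * p_pos_given_healthy

# (b) Bayes' rule: probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

# (c) Probability of being healthy given a negative test.
p_neg_given_healthy = 1.0 - p_pos_given_healthy
p_healthy_given_neg = p_neg_given_healthy * p_healthy / (1.0 - p_pos)

print(round(p_pos, 4), round(p_disease_given_pos, 4), round(p_healthy_given_neg, 4))
```

Even with a fairly accurate test, the low prior of 1% keeps the posterior in (b) at roughly 14%.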

82
Sums of Random Variables

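$z = x + y$
$\mathrm{Var}(z) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y)$
If x, y are independent: $\mathrm{Var}(z) = \mathrm{Var}(x) + \mathrm{Var}(y)$
Distribution of z (for independent x, y): $p_z(z) = \int p_x(u)\, p_y(z - u)\, du$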

83
Examples

x and y are uniform on [0, 1].
Find p(z), E(z), Var(z) for z = x + y.
x is uniform on [-1, 1], and P(y) = 0.5 for y = 0 and
y = 10, and 0 elsewhere.
Find p(z), E(z), Var(z) for z = x + y.
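
A sketch of the moments for both exercises, assuming x and y are independent (the slide does not say so explicitly). It uses the standard mean and variance of a uniform density and exact fractions. The function names are illustrative.

```python
from fractions import Fraction as F


def uniform_moments(a, b):
    """Mean and variance of a uniform density on [a, b]."""
    return F(a + b, 2), F((b - a) ** 2, 12)


def discrete_moments(pmf):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    var = sum((v - mean) ** 2 * p for v, p in pmf.items())
    return mean, var


def sum_moments(m1, m2):
    """Moments of x + y for independent x and y: means add, variances add."""
    return m1[0] + m2[0], m1[1] + m2[1]


def triangular_pdf(z):
    """Density of x + y for independent x, y uniform on [0, 1]."""
    if 0 <= z <= 1:
        return z
    if 1 < z <= 2:
        return 2 - z
    return 0


case1 = sum_moments(uniform_moments(0, 1), uniform_moments(0, 1))
case2 = sum_moments(uniform_moments(-1, 1),
                    discrete_moments({0: F(1, 2), 10: F(1, 2)}))
print(case1, case2)
```

In the second exercise, under the same independence assumption, the density of z is a mixture of two uniform pieces of weight 0.5 each, one on [-1, 1] and one on [9, 11].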

88
Normal Distributions

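Gaussian distribution: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Mean: $\mu = E[x]$
Variance: $\sigma^2 = E[(x - \mu)^2]$
The Central Limit Theorem says that sums of many independent random
variables (with finite variance) tend toward a Normal distribution.
Mahalanobis distance: $r = \frac{|x - \mu|}{\sigma}$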

90
Multivariate Normal Density

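x is a vector of d Gaussian variables: $p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
Mahalanobis distance: $r^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$
All conditionals and marginals are also Gaussian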
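
For d = 2 the density and the squared Mahalanobis distance can be evaluated without a linear algebra library. This is a minimal sketch using the standard formulas; the mean, covariance, and test point are made-up values.

```python
import math


def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) for d = 2."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # cov^{-1} = (1 / det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def normal_pdf_2d(x, mu, cov):
    """Bivariate normal density evaluated at x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r2 = mahalanobis_sq_2d(x, mu, cov)
    return math.exp(-0.5 * r2) / (2.0 * math.pi * math.sqrt(det))


mu = (0.0, 0.0)
cov = [[4.0, 0.0], [0.0, 1.0]]   # hypothetical covariance matrix
r2 = mahalanobis_sq_2d((2.0, 1.0), mu, cov)
peak = normal_pdf_2d(mu, mu, cov)
print(r2, peak)
```

The point (2, 1) lies one standard deviation away along each axis, so its squared Mahalanobis distance is 1 + 1 = 2.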

93
Bivariate Normal Densities

Level curves are ellipses.
The x and y widths are determined by the variances,
and the eccentricity by the correlation coefficient.
Principal axes are the eigenvectors of the covariance matrix, and the
width in each of these directions is the square root of the
corresponding eigenvalue.
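
The relation between variances, correlation, and the principal axes can be sketched for the 2 x 2 case with the closed-form eigenvalues of a symmetric matrix. The function name and the example values (unit variances, correlation 0.5) are assumptions made for illustration.

```python
import math


def principal_axes(var_x: float, var_y: float, rho: float):
    """Axis widths (square roots of eigenvalues) and major-axis angle."""
    cov_xy = rho * math.sqrt(var_x * var_y)
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    # Discriminant of the characteristic polynomial; never negative in theory.
    root = math.sqrt(max(0.0, trace * trace - 4.0 * det))
    lam_major = (trace + root) / 2.0
    lam_minor = max(0.0, (trace - root) / 2.0)
    # Orientation of the major axis, measured from the x axis.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (math.sqrt(lam_major), math.sqrt(lam_minor)), theta


widths, theta = principal_axes(1.0, 1.0, 0.5)
print(widths, theta)
```

With equal variances and positive correlation the major axis lies along the 45 degree diagonal, and the ellipse becomes more elongated as the correlation grows.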

94
Extra Credit Assignment