Spoken Dialog Systems and Voice XML: Intro to Pattern Recognition (Presentation Transcript)

1
Spoken Dialog Systems and Voice XML: Intro to Pattern Recognition
  • Esther Levin
  • Dept of Computer Science
  • CCNY

Some materials used in this course were taken from the textbook "Pattern Classification" by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher.
2
Credits and Acknowledgments
  • Materials used in this course were taken from the textbook "Pattern Classification" by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher, and also from
  • Other material on the web
  • Dr. A. Aydin Atalan, Middle East Technical
    University, Turkey
  • Dr. Djamel Bouchaffra, Oakland University
  • Dr. Adam Krzyzak, Concordia University
  • Dr. Joseph Picone, Mississippi State University
  • Dr. Robi Polikar, Rowan University
  • Dr. Stefan A. Robila, University of New Orleans
  • Dr. Sargur N. Srihari, State University of New
    York at Buffalo
  • David G. Stork, Stanford University
  • Dr. Godfried Toussaint, McGill University
  • Dr. Chris Wyatt, Virginia Tech
  • Dr. Alan L. Yuille, University of California, Los
    Angeles
  • Dr. Song-Chun Zhu, University of California, Los
    Angeles

3
Outline
  • Introduction
  • What is pattern recognition?
  • Background Material
  • Probability theory

4
(No Transcript)
5
PATTERN RECOGNITION AREAS
  • Optical Character Recognition (OCR)
  • Sorting letters by postal code.
  • Reconstructing text from printed materials (such
    as reading machines for blind people).
  • Analysis and identification of human patterns
  • Speech and voice recognition.
  • Finger prints and DNA mapping.
  • Banking and insurance applications
  • Credit card applicants classified by income, credit worthiness, mortgage amount, number of dependents, etc.
  • Car insurance (pattern including make of car, number of accidents, age, sex, driving habits, location, etc.).
  • Diagnosis systems
  • Medical diagnosis (disease vs. symptoms
    classification, X-Ray, EKG and tests analysis,
    etc).
  • Diagnosis of automotive malfunctioning
  • Prediction systems
  • Weather forecasting (based on satellite data).
  • Analysis of seismic patterns
  • Dating services (where pattern includes age, sex,
    race, hobbies, income, etc).

6
More Pattern Recognition Applications
  • SENSORY
  • Vision
  • Face/Handwriting/Hand
  • Speech
  • Speaker/Speech
  • Olfaction
  • Apple Ripe?
  • DATA
  • Text Categorization
  • Information Retrieval
  • Data Mining
  • Genome Sequence Matching

7
What is a pattern?
  • A pattern is the opposite of chaos; it is an entity, vaguely defined, that could be given a name.

8
PR Definitions
  • Theory, Algorithms, Systems to Put Patterns into
    Categories
  • Classification of Noisy or Complex Data
  • Relate Perceived Pattern to Previously Perceived
    Patterns

9
Characters
A v t u I h D U w K
Ç s g I ü Ü Ö G
10
Handwriting
11
(No Transcript)
12
Terminology
  • Features, feature vector
  • Decision boundary
  • Error
  • Cost of error
  • Generalization

13
A Fishy Example I
  • Sorting incoming Fish on a conveyor according
    to species using optical sensing
  • Salmon or Sea Bass?

14
  • Problem Analysis
  • Set up a camera and take some sample images to
    extract features
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc
  • This is the set of all suggested features to
    explore for use in our classifier!

15
Solution by Stages
  • Preprocess raw data from camera
  • Segment isolated fish
  • Extract features from each fish (length, width, brightness, etc.)
  • Classify each fish

16
  • Preprocessing
  • Use a segmentation operation to isolate the fish from one another and from the background
  • Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features
  • The features are passed to a classifier (a minimal sketch of this pipeline follows below)
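A minimal sketch of the staged pipeline in Python. It assumes segmentation has already produced a per-fish image; the toy 2x2 "image", the mean-intensity feature, and the 6.0 threshold are made-up illustrations, not values from the slides:

```python
# Hypothetical staged pipeline: segment -> extract features -> classify.
def extract_features(fish_image):
    # Assume fish_image is a 2D list of pixel intensities for one segmented fish;
    # reduce it to a single "lightness" feature (the mean intensity).
    total = sum(sum(row) for row in fish_image)
    count = len(fish_image) * len(fish_image[0])
    return {"lightness": total / count}

def classify(features, threshold=6.0):
    # Toy decision rule: brighter fish are called "sea bass", darker ones "salmon".
    return "sea bass" if features["lightness"] > threshold else "salmon"

sample_image = [[5.0, 7.0], [6.5, 8.0]]            # stand-in for one segmented fish
print(classify(extract_features(sample_image)))    # -> "sea bass"
```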

17
18
  • Classification
  • Select the length of the fish as a possible
    feature for discrimination

19
20
  • The length is a poor feature alone!
  • Select the lightness as a possible feature.

21
22
Customers do not want sea bass in their cans of salmon.
  • Threshold decision boundary and cost relationship
  • Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
  • This is a task of decision theory

23
  • Adopt the lightness and add the width of the fish
  • Fish x = (x1, x2), where x1 = lightness and x2 = width
24
25
  • We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such noisy features.
  • Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure.

26
27
  • However, our satisfaction is premature because
    the central aim of designing a classifier is to
    correctly classify novel input
  • Issue of generalization!

28
29
Decision Boundaries
Observe: we can do much better with two features.
Caveat: overfitting! (A sketch of a two-feature linear classifier follows below.)
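A minimal sketch of a two-feature linear classifier. The synthetic (lightness, width) values, the least-squares fit, and the train/held-out split are assumptions made for illustration, not the method prescribed in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (lightness, width) measurements for the two species.
salmon = rng.normal([3.0, 6.0], 0.8, size=(50, 2))
sea_bass = rng.normal([6.0, 3.5], 0.8, size=(50, 2))
X = np.vstack([salmon, sea_bass])
y = np.array([+1] * 50 + [-1] * 50)          # +1 = salmon, -1 = sea bass

# Split into training and held-out sets to check generalization.
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Fit a linear decision boundary w.x + b = 0 by least squares on the labels.
A = np.hstack([X[train], np.ones((len(train), 1))])
w, *_ = np.linalg.lstsq(A, y[train], rcond=None)

def accuracy(rows):
    pred = np.sign(np.hstack([X[rows], np.ones((len(rows), 1))]) @ w)
    return np.mean(pred == y[rows])

print("train accuracy:", accuracy(train))
print("held-out accuracy:", accuracy(test))
```

A large gap between training and held-out accuracy is the practical symptom of overfitting.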
30
Occam's Razor
"Entities are not to be multiplied without necessity."
William of Occam (1284-1347)
31
A Complete PR System
32
Problem Formulation
Input object -> Measurements -> Preprocessing -> Features -> Classification -> Class label
  • Basic ingredients
  • Measurement space (e.g., image intensity,
    pressure)
  • Features (e.g., corners, spectral energy)
  • Classifier - soft and hard
  • Decision boundary
  • Training sample
  • Probability of error

33
Pattern Recognition Systems
  • Sensing
  • Use of a transducer (camera or microphone)
  • The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer
  • Segmentation and grouping
  • Patterns should be well separated and should not
    overlap

34
35
  • Feature extraction
  • Discriminative features
  • Invariant features with respect to translation,
    rotation and scale.
  • Classification
  • Use a feature vector provided by a feature
    extractor to assign the object to a category
  • Post Processing
  • Exploit context-dependent information, other than from the target pattern itself, to improve performance

36
The Design Cycle
  • Data collection
  • Feature Choice
  • Model Choice
  • Training
  • Evaluation
  • Computational Complexity

37
38
  • Data Collection
  • How do we know when we have collected an
    adequately large and representative set of
    examples for training and testing the system?

39
  • Feature Choice
  • Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.

40
  • Model Choice
  • We may be unsatisfied with the performance of our linear fish classifier and want to jump to another class of model

41
  • Training
  • Use data to determine the classifier. Many
    different procedures for training classifiers and
    choosing models

42
  • Evaluation
  • Measure the error rate (or performance) and switch from one set of features and models to another.

43
  • Computational Complexity
  • What is the trade-off between computational ease and performance?
  • (How does an algorithm scale as a function of the number of features, the number of training examples, and the number of patterns or categories?)

44
Learning and Adaptation
  • Learning: any method that combines empirical information from the environment with prior knowledge into the design of a classifier, attempting to improve performance with time.
  • Empirical information: usually in the form of training examples.
  • Prior knowledge: invariances, correlations
  • Supervised learning
  • A teacher provides a category label or cost for each pattern in the training set
  • Unsupervised learning
  • The system forms clusters or "natural groupings" of the input patterns

45
Syntactic Versus Statistical PR
  • Basic assumption: there is an underlying regularity behind the observed phenomena.
  • Question: based on noisy observations, what is the underlying regularity?
  • Syntactic: structure through a common generative mechanism. For example, all different manifestations of English share a common underlying set of grammatical rules.
  • Statistical: objects characterized through statistical similarity. For example, all possible digits '2' share some common underlying statistical relationship.

46
Difficulties
  • Segmentation
  • Context
  • Temporal structure
  • Missing features
  • Aberrant data
  • Noise

Do all these images represent an 'A'?
47
Design Cycle
How do we know what features to select, and how do we select them?
What type of classifier shall we use? Is there a best classifier? How do we train?
How do we combine prior knowledge with empirical data?
How do we evaluate our performance? How do we validate the results? How much confidence do we have in the decision?
48
Conclusion
  • I expect you are overwhelmed by the number,
    complexity and magnitude of the sub-problems of
    Pattern Recognition
  • Many of these sub-problems can indeed be solved
  • Many fascinating unsolved problems still remain

49
Toolkit for PR
  • Statistics
  • Decision Theory
  • Optimization
  • Signal Processing
  • Neural Networks
  • Fuzzy Logic
  • Decision Trees
  • Clustering
  • Genetic Algorithms
  • AI Search
  • Formal Grammars
  • ...

50
Linear algebra
  • Matrix A
  • Matrix Transpose
  • Vector a

51
Matrix and vector multiplication
  • Matrix multiplication
  • Outer vector product
  • Vector-matrix product

52
Inner Product
  • Inner (dot) product: a·b
  • Length (Euclidean norm) of a vector: ||a|| = sqrt(a·a)
  • a is normalized iff ||a|| = 1
  • The angle between two n-dimensional vectors: cos(θ) = a·b / (||a|| ||b||)
  • An inner product is a measure of collinearity
  • a and b are orthogonal iff a·b = 0
  • a and b are collinear iff |a·b| = ||a|| ||b||
  • A set of vectors is linearly independent if no
    vector is a linear combination of other vectors.
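A quick NumPy check of these definitions (the specific vectors are arbitrary examples):

```python
import numpy as np

a = np.array([3.0, 4.0, 0.0])
b = np.array([-4.0, 3.0, 0.0])

dot = a @ b                                   # inner (dot) product a·b
norm_a = np.linalg.norm(a)                    # Euclidean length, 5.0 here
cos_theta = dot / (norm_a * np.linalg.norm(b))
angle_deg = np.degrees(np.arccos(cos_theta))  # angle between a and b

print(dot, norm_a, angle_deg)                 # 0.0 5.0 90.0 -> a and b are orthogonal
```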

53
Determinant and Trace
  • Determinant: det(A)
  • det(AB) = det(A) det(B)
  • Trace: tr(A) = sum of the diagonal elements of A

54
Matrix Inversion
  • A (n x n) is nonsingular if there exists a matrix B such that AB = BA = I; then B = A^(-1)
  • Example: A = [2 3; 2 2], A^(-1) = [-1 3/2; 1 -1]
  • A is nonsingular iff det(A) ≠ 0
  • Pseudo-inverse of a non-square matrix A: A+ = (A^T A)^(-1) A^T, provided A^T A is not singular (then A+ A = I)
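A NumPy check of the example inverse and of the pseudo-inverse formula; the tall matrix C is an arbitrary full-column-rank example chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 2.0]])
A_inv = np.linalg.inv(A)                    # [[-1.0, 1.5], [1.0, -1.0]]
print(np.allclose(A @ A_inv, np.eye(2)))    # True

# Pseudo-inverse of a non-square (tall) matrix with full column rank.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
C_pinv = np.linalg.pinv(C)
print(np.allclose(C_pinv, np.linalg.inv(C.T @ C) @ C.T))   # True: matches the formula
print(np.allclose(C_pinv @ C, np.eye(2)))                  # True: left inverse
```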

55
Eigenvectors and Eigenvalues
Av = λv for a nonzero vector v. The characteristic equation det(A - λI) = 0 is an n-th order polynomial with n roots, the eigenvalues.
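A small NumPy example; the matrix is an arbitrary symmetric 2x2 chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)    # eigenvalues 3 and 1 (order may vary)

# Verify A v = lambda v for the first eigenpair.
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))   # True
print(eigenvalues)
```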
56
Probability Theory
  • Primary references
  • Any probability and statistics textbook (Papoulis)
  • Appendix A.4 in Pattern Classification by Duda
    et al
  • The principles of probability theory,
    describing the behavior of systems with random
    characteristics, are of fundamental importance to
    pattern recognition.

57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
(No Transcript)
69
(No Transcript)
70
(No Transcript)
71
(No Transcript)
72
(No Transcript)
73
(No Transcript)
74
Example 1 (Wikipedia)
  • There are two bowls full of cookies.
  • Bowl 1 has 10 chocolate chip cookies and 30 plain cookies;
  • bowl 2 has 20 of each.
  • Fred picks a bowl at random, and then picks a cookie at random.
  • The cookie turns out to be a plain one.
  • How probable is it that Fred picked it out of bowl 1?
  • That is, what's the probability that Fred picked bowl 1, given that he has a plain cookie?
  • Event A is that Fred picked bowl 1;
  • event B is that Fred picked a plain cookie.
  • Pr(A|B) = ? (see the short computation below)
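A minimal sketch of this computation with Bayes' rule (the numbers come from the problem statement):

```python
# P(bowl 1 | plain cookie) by Bayes' rule.
p_bowl1, p_bowl2 = 0.5, 0.5             # Fred picks a bowl at random
p_plain_given_bowl1 = 30 / 40           # bowl 1: 10 chocolate chip + 30 plain
p_plain_given_bowl2 = 20 / 40           # bowl 2: 20 of each

p_plain = p_plain_given_bowl1 * p_bowl1 + p_plain_given_bowl2 * p_bowl2
print(p_plain_given_bowl1 * p_bowl1 / p_plain)    # 0.6
```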

75
Example 1 - continued
Tables of occurrences and relative frequencies It
is often helpful when calculating conditional
probabilities to create a simple table containing
the number of occurrences of each outcome, or the
relative frequencies of each outcome, for each of
the independent variables. The tables below
illustrate the use of this method for the cookies.
The table on the right is derived from the table
on the left by dividing each entry by the total
number of cookies under consideration, or 80
cookies.
76
Example 2
  • 1. Power Plant Operation.
  • The variables X, Y, Z describe the state of 3 power plants (X = 0 means plant X is idle).
  • Denote by A the event that plant X is idle, and by B the event that at least 2 out of the three plants are working.
  • What's P(A) and P(A|B), the probability that X is idle given that at least 2 out of the three are working?

77
  • P(A) = P(0,0,0) + P(0,0,1) + P(0,1,0) + P(0,1,1) = 0.07 + 0.04 + 0.03 + 0.18 = 0.32
  • P(B) = P(0,1,1) + P(1,0,1) + P(1,1,0) + P(1,1,1) = 0.18 + 0.18 + 0.21 + 0.13 = 0.70
  • P(A and B) = P(0,1,1) = 0.18
  • P(A|B) = P(A and B)/P(B) = 0.18/0.7 = 0.257
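The same computation directly from the joint probabilities listed on the slide (outcomes not listed carry the remaining probability mass and are not needed here):

```python
# Joint probabilities P(X, Y, Z) for the three plants.
joint = {
    (0, 0, 0): 0.07, (0, 0, 1): 0.04, (0, 1, 0): 0.03, (0, 1, 1): 0.18,
    (1, 0, 1): 0.18, (1, 1, 0): 0.21, (1, 1, 1): 0.13,
}

p_A = sum(p for (x, y, z), p in joint.items() if x == 0)          # plant X idle
p_B = sum(p for (x, y, z), p in joint.items() if x + y + z >= 2)  # at least 2 working
p_A_and_B = sum(p for (x, y, z), p in joint.items() if x == 0 and y + z >= 2)

print(p_A, p_B, p_A_and_B / p_B)   # 0.32 0.70 ~0.257
```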

78
  • 2. Cars are assembled in four possible locations. Plant I supplies 20% of the cars; plant II, 24%; plant III, 25%; and plant IV, 31%. There is a 1-year warranty on every car.
  • The company collected data that shows
  • P(claim | plant I) = 0.05; P(claim | plant II) = 0.11
  • P(claim | plant III) = 0.03; P(claim | plant IV) = 0.08
  • Cars are sold at random.
  • An owner just submitted a claim for her car. What are the posterior probabilities that this car was made in plant I, II, III, and IV?

79
  • P(claim) = P(claim|plant I)P(plant I) + P(claim|plant II)P(plant II) + P(claim|plant III)P(plant III) + P(claim|plant IV)P(plant IV) = 0.0687
  • P(plant I | claim) = P(claim|plant I)P(plant I)/P(claim) = 0.146
  • P(plant II | claim) = P(claim|plant II)P(plant II)/P(claim) = 0.384
  • P(plant III | claim) = P(claim|plant III)P(plant III)/P(claim) = 0.109
  • P(plant IV | claim) = P(claim|plant IV)P(plant IV)/P(claim) = 0.361
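A minimal sketch of the total-probability and Bayes'-rule computation (priors and claim rates from the problem statement):

```python
# Posterior P(plant | claim) via total probability and Bayes' rule.
priors = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}
claim_rates = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}

p_claim = sum(claim_rates[k] * priors[k] for k in priors)
posteriors = {k: claim_rates[k] * priors[k] / p_claim for k in priors}

print(p_claim)      # ~0.0687
print(posteriors)   # I: ~0.146, II: ~0.384, III: ~0.109, IV: ~0.361
```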

80
Example 3
  • 3. It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease for a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.
  • a. What is the probability that a random person has a positive blood test?
  • b. If a blood test is positive, what's the probability that the person has the disease?
  • c. If a blood test is negative, what's the probability that the person does not have the disease?

81
  • A is the event that a person has the disease: P(A) = 0.01, P(A') = 0.99.
  • B is the event that the test result is positive.
  • P(B|A) = 0.97, P(B'|A) = 0.03
  • P(B|A') = 0.06, P(B'|A') = 0.94
  • (a) P(B) = P(A)P(B|A) + P(A')P(B|A') = 0.01 × 0.97 + 0.99 × 0.06 = 0.0691
  • (b) P(A|B) = P(B|A)P(A)/P(B) = 0.97 × 0.01/0.0691 = 0.1403
  • (c) P(A'|B') = P(B'|A')P(A')/P(B') = P(B'|A')P(A')/(1 - P(B)) = 0.94 × 0.99/(1 - 0.0691) = 0.9997
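The same computation as a short sketch:

```python
# Disease-test example: total probability and Bayes' rule.
p_disease = 0.01
p_pos_given_disease = 0.97     # test sensitivity
p_pos_given_healthy = 0.06     # false-positive rate

p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
p_healthy_given_neg = (1 - p_pos_given_healthy) * (1 - p_disease) / (1 - p_pos)

print(p_pos, p_disease_given_pos, p_healthy_given_neg)   # ~0.0691, ~0.140, ~0.9997
```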

82
Sums of Random Variables
  • z = x + y
  • Var(z) = Var(x) + Var(y) + 2 Cov(x, y)
  • If x, y are independent: Var(z) = Var(x) + Var(y)
  • Distribution of z: p(z) is the convolution of p(x) and p(y)

83
Examples
  • x and y are uniform on [0, 1].
  • Find p(z = x + y), E(z), Var(z). (A Monte Carlo sketch for this case follows below.)
  • x is uniform on [-1, 1], and P(y) = 0.5 for y = 0, y = 10, and 0 elsewhere.
  • Find p(z = x + y), E(z), Var(z).
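A Monte Carlo sketch for the first example (z = x + y with x, y independent and uniform on [0, 1]); the sample size is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.uniform(0.0, 1.0, size=n)
z = x + y

# Theory: E(z) = 1, Var(z) = Var(x) + Var(y) = 1/12 + 1/12 = 1/6 ~ 0.1667,
# and p(z) is the triangular density on [0, 2].
print(z.mean(), z.var())    # ~1.0, ~0.1667
```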

84
(No Transcript)
85
(No Transcript)
86
(No Transcript)
87
(No Transcript)
88
Normal Distributions
  • Gaussian distribution
  • Mean
  • Variance
  • Central Limit Theorem: sums of (independent) random variables tend toward a normal distribution (see the sketch below).
  • Mahalanobis Distance
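A quick illustration of the Central Limit Theorem bullet; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
# Means of 50 independent uniform draws are approximately Gaussian.
sample_means = rng.uniform(0.0, 1.0, size=(100_000, 50)).mean(axis=1)

# Theory: mean 0.5 and standard deviation sqrt((1/12) / 50) ~ 0.0408.
print(sample_means.mean(), sample_means.std())
```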

89
(No Transcript)
90
Multivariate Normal Density
  • x is a vector of d Gaussian variables
  • Mahalanobis distance: r^2 = (x - µ)^T Σ^(-1) (x - µ)
  • All conditionals and marginals are also Gaussian
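A minimal NumPy sketch of the Mahalanobis distance; the mean, covariance matrix, and query point are made-up examples:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 1.0])

d = x - mu
r_squared = d @ np.linalg.inv(Sigma) @ d    # (x - mu)^T Sigma^-1 (x - mu)
print(np.sqrt(r_squared))                   # ~1.07
```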

91
(No Transcript)
92
(No Transcript)
93
Bivariate Normal Densities
  • Level curves are ellipses.
  • The x and y widths are determined by the variances, and the eccentricity by the correlation coefficient.
  • The principal axes are the eigenvectors of the covariance matrix, and the width in each of these directions is the square root of the corresponding eigenvalue.
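A short sketch recovering the ellipse axes from an example covariance matrix (the matrix itself is an arbitrary illustration):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: for symmetric matrices

print(eigvecs)            # columns: directions of the principal axes
print(np.sqrt(eigvals))   # widths of the level-curve ellipse along those axes
```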

94
Extra Credit Assignment
  • 2% of the final grade, due before the next lecture

95
Information theory
  • Key principles
  • What is the information contained in a random event?
  • A less probable event contains more information: I(x) = -log p(x)
  • For two independent events, the information adds.
  • What is the average information, or entropy, of a distribution? H = -Σ p(x) log p(x)

96
  • Examples: the uniform distribution (maximum entropy) and the Dirac (point-mass) distribution (zero entropy).
  • Mutual information: the reduction in uncertainty about one variable due to knowledge of the other variable.
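A minimal sketch of entropy and mutual information for discrete distributions; the example distributions and the small joint table are made up for illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 = 0 by convention
    return -np.sum(p * np.log2(p))

print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 outcomes -> 2 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))       # deterministic outcome -> 0 bits

# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) for a small joint table.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
px, py = joint.sum(axis=1), joint.sum(axis=0)
mi = entropy(px) + entropy(py) - entropy(joint.ravel())
print(mi)                                  # > 0: X and Y share information
```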