Transcript and Presenter's Notes

Title: Perception


1
Perception
  • Vision, Sections 24.1 - 24.3
  • Speech, Section 24.7

2
Computer Vision
  • "the process by which descriptions of physical
    scenes are inferred from images of them" -- S.
    Zucker
  • produces from images of the external 3D world a
    description that is useful to the viewer and not
    cluttered by irrelevant information

3
Typical Applications
  • Medical Image Analysis
  • Aerial Photo Interpretation
  • Material Handling
  • Inspection
  • Navigation

4
Multimedia Applications
  • Image compression
  • Video teleconferencing
  • Virtual classrooms

5
Image pixelation
6
Pixel values
7
How to recognize faces?
8
Problem Background
  • M training images
  • Each image is N x N pixels
  • Each image is
  • normalized for face position, orientation, scale,
    and brightness
  • There are several pictures of each face
  • different moods

9
Your Task
  • Determine if the test image contains a face
  • If it contains a face, is it a face of a person
    in our database?
  • If it is a person in our database, which one?
  • Also, what is the probability that it is Jim?

10
Image Space
  • An N x N image can be thought of as a point in an
    N²-dimensional image space
  • Each pixel is a feature with a gray scale value.
  • Example
  • 512 x 512 image
  • each pixel can be 0 (black) to 255 (white)
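
As a concrete sketch of the example above (the image here is random stand-in
data), a 512 x 512 grayscale image flattens to a single point with 262,144
coordinates:

```python
import numpy as np

# A stand-in 512 x 512 grayscale image with pixel values 0 (black) to 255 (white)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

# Treating the image as one point in N^2-dimensional image space:
# each of the 512 * 512 = 262144 pixels is one coordinate (feature)
point = image.astype(np.float64).ravel()
print(point.shape)        # (262144,)
```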

11
Nearest Neighbor
  • The most likely match is the nearest neighbor
  • But that would take too much processing
  • Since all images are faces, they will have very
    high similarity

12
Face Space
  • Lower dimensionality to both simplify the storage
    and generalize the answer
  • Use eigenvectors to distill the 20 most
    distinctive metrics
  • Make a 20-item array for each face that contains
    the values of 20 features that most distinguish
    faces.
  • Now each face can be stored in 20 words

13
The average face
  • Training images are I1, I2, . . ., IM
  • Average image is A = (1/M)(I1 + I2 + . . . + IM)

14
Weight of an image in each feature
  • For k = 1, . . ., 20 features, compute the
    similarity between the input image, I, and the
    kth eigenvector, Ek
  • wk = Ek · (I − A), where A is the average face
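
A minimal sketch of this weight computation; the eigenfaces E, average face A,
and input image I below are random stand-ins, and the projection follows the
standard eigenfaces formula wk = Ek · (I − A):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4096                                     # flattened 64 x 64 images for the demo

# 20 orthonormal stand-in "eigenfaces", a stand-in average face, and an input
E = np.linalg.qr(rng.random((D, 20)))[0].T   # shape (20, D)
A = rng.random(D)
I = rng.random(D)

# Similarity of the normalized image to each eigenface: w_k = E_k . (I - A)
W = E @ (I - A)
print(W.shape)   # (20,)
```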

15
Image in Face Space
  • Only 20 dimensional space
  • W = (w1, w2, . . ., w20), a column vector of
    weights that indicate the contribution of each of
    the 20 eigenfaces in I
  • Each image is projected from a point in high
    dimensional space into face space
  • 20 features × 32 bits = 640 bits per image

16
Reconstructing image I
  • If M′ < M, we can only approximate I
  • Good enough for recognizing faces

17
Picking the 20 Eigenfaces
  • Principal Component Analysis
  • (also called the Karhunen-Loève transform)
  • Create 20 images that maximize the information
    content in eigenspace
  • Normalize by subtracting the average face
  • Compute the covariance matrix, C
  • Find the eigenvectors of C that have the 20
    largest eigenvalues
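
The steps above can be sketched as follows; the training faces are random
stand-ins, and SVD of the centered data is used in place of forming the
covariance matrix C explicitly (it yields the same eigenvectors):

```python
import numpy as np

# Stand-in training set: M flattened faces (tiny 64 x 64 images for the demo)
M, D = 50, 64 * 64
rng = np.random.default_rng(1)
faces = rng.random((M, D))

A = faces.mean(axis=0)          # the average face
X = faces - A                   # normalize by subtracting the average

# Right singular vectors of X = eigenvectors of the covariance matrix C,
# ordered by decreasing eigenvalue
_, _, Vt = np.linalg.svd(X, full_matrices=False)
eigenfaces = Vt[:20]            # the 20 directions of largest variance
print(eigenfaces.shape)         # (20, 4096)
```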

18
Build a database of faces
  • Given a training set of face images, compute the
    20 largest eigenvectors, E1, E2, . . ., E20
  • Offline, because it is slow
  • For each face in the training set, compute the
    point in eigenspace, W = (w1, w2, . . ., w20)
  • Offline, because it is big

19
Categorizing a test face
  • Given a test image, Itest, project it into the
    20-space by computing Wtest
  • Find the closest face in the database to the test
    face: minimize ||Wtest − Wk||
  • where Wk is the point in facespace associated
    with the kth person
  • ||·|| denotes the Euclidean distance in
    facespace
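
A minimal sketch of this nearest-neighbor lookup, with a made-up database of
20-dimensional weight vectors (one per known person):

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.random((3000, 20))      # stand-in weight vector W_k for each person k
W_test = rng.random(20)          # stand-in weights of the test image

# Euclidean distance ||W_test - W_k|| to every known face
dists = np.linalg.norm(db - W_test, axis=1)
k_best = int(np.argmin(dists))   # index of the closest face in the database
print(k_best, dists[k_best])
```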

20
Distance from facespace
  • Find the distance of the test image from
    eigenspace

21
Is this a face?
  • If dffs < threshold1
  • then if d < threshold2
  • the test image is a face that is very close to
    the nearest neighbor, so classify it as that person
  • else
  • the image is a face, but not one we recognize
  • else
  • the image probably does not contain a face
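
The decision rule above as a small sketch; the threshold values and names here
are illustrative, not from the slides:

```python
def classify(dffs, d_nearest, person, threshold1=10.0, threshold2=5.0):
    """Two-threshold decision: is it a face, and if so, whose?"""
    if dffs < threshold1:            # close enough to face space: it is a face
        if d_nearest < threshold2:   # close to a known face: recognize it
            return f"face of {person}"
        return "unknown face"
    return "probably not a face"

print(classify(3.0, 2.0, "Jim"))     # face of Jim
print(classify(3.0, 9.0, "Jim"))     # unknown face
print(classify(20.0, 2.0, "Jim"))    # probably not a face
```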

22
Face Recognition Accuracy
  • Using a 20-dimensional facespace resulted in about
    95% correct classification on a database of 7500
    images of 3000 people
  • If there are several images per person, the
    average W for that person helps improve accuracy

23
Edge Detection
  • Finding simple descriptions of objects in complex
    images
  • find edges
  • interrelate edges

24
Causes of edges
  • Depth discontinuity
  • One surface occludes another
  • Surface orientation discontinuity
  • the edge of a block
  • reflectance discontinuity
  • texture or color changes
  • illumination discontinuity
  • shadows

25
Examples of edges
26
Finding Edges
Image Intensity along a line
First derivative of intensity
Smoothed by convolving with a Gaussian
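
A 1-D sketch of this pipeline on a synthetic step edge (the profile and kernel
width are illustrative): smooth the intensity with a Gaussian, then look for a
peak in the first derivative.

```python
import numpy as np

x = np.arange(100, dtype=float)
intensity = np.where(x < 50, 10.0, 200.0)    # intensity along a line: a step edge

# Gaussian smoothing kernel (sigma = 2, truncated at +/- 5 samples)
g = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
g /= g.sum()
smoothed = np.convolve(intensity, g, mode="same")

deriv = np.gradient(smoothed)                # first derivative of intensity
edge = int(np.argmax(np.abs(deriv)))         # position of strongest change
print(edge)
```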
27
Pixels on edges
28
Edges found
29
Human-Computer Interfaces
  • Handwriting recognition
  • Optical Character Recognition
  • Gesture recognition
  • Gaze tracking
  • Face recognition

30
Vision Conclusion
  • Machine Vision is so much fun, we have a full
    semester course in it
  • Current research in vision modeling is very
    active
  • More breakthroughs are needed

31
Speech Recognition
  • Section 24.7

32
Speech recognition goal
  • Find a sequence of words that maximizes
    P(words | signal)

33
Signal Processing
  • "Toll quality" was the Bell Labs definition of
    digitized speech good enough for long-distance
    calls (toll calls)
  • Sampling rate: 8000 samples per second
  • Quantization factor: 8 bits per sample
  • Too much data to analyze to find utterances
    directly
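
The implied data rate is easy to check: 8000 samples per second at 8 bits each
comes to about 480 KB per minute, roughly the 500 KB/minute figure cited on the
vector quantization slide.

```python
# Data rate implied by the toll-quality parameters above
samples_per_second = 8000
bits_per_sample = 8

bytes_per_minute = samples_per_second * bits_per_sample // 8 * 60
print(bytes_per_minute)   # 480000 bytes, i.e. ~480 KB per minute
```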

34
Computational Linguistics
  • Human speech is limited to a repertoire of about
    40 to 50 sounds, called phones
  • Our problem
  • What speech sounds did the speaker utter?
  • What words did the speaker intend?
  • What meaning did the speaker intend?

35
Finding features
36
Vector Quantization
  • The 255 most common clusters of feature values
    are labeled C1, . . ., C255
  • Send only the 8 bit label
  • One byte per frame (a 100-fold improvement over
    the 500 KB/minute)
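
A sketch of the quantization step, with a made-up 255-entry codebook of
12-dimensional feature vectors (the dimensions and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
codebook = rng.random((255, 12))      # stand-in cluster centres C1..C255
frames = rng.random((100, 12))        # stand-in feature frames of speech

# For each frame, the index of the nearest cluster is its one-byte label
dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
labels = dists.argmin(axis=1).astype(np.uint8)
print(labels.shape, labels.dtype)     # (100,) uint8 -- one byte per frame
```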

37
How to Wreck a Nice Beach
  • By Bayes' rule, P(words | signal) =
    P(signal | words) P(words) / P(signal)
  • where P(signal) is a constant (it is the signal
    we received)
  • So we want the words that maximize
    P(signal | words) P(words)

38
Unigram Frequency
  • Word frequency
  • Even though his handwriting was sloppy, Woody
    Allen's bank hold-up note probably should not
    have been interpreted as "I have a gub"
  • The word "gun" is common
  • The word "gub" is unlikely

39
Language model
  • Use the language model to compare
  • P(wreck a nice beach)
  • P(recognize speech)
  • Use naïve Bayes to assess the likelihood for each
    word that it will appear in this context

40
Bigram model
  • want P(wi | w1, w2, . . ., wi−1)
  • approximate it by P(wi | wi−1)
  • Easy to train
  • Simply count the number of times each word pair
    occurs
  • "I has" is unlikely, "I have" is likely
  • "an gun" is unlikely, "a gun" is likely
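
Training such a bigram model really is just counting, as this tiny sketch with
a made-up corpus shows:

```python
from collections import Counter

# Made-up training corpus; real models are trained on far more text
corpus = "i have a gun i have a plan i have a gun".split()

pairs = Counter(zip(corpus, corpus[1:]))   # count each adjacent word pair
unigrams = Counter(corpus)                 # count each word

def p_bigram(w, prev):
    """P(w | prev) estimated from the pair and word counts."""
    return pairs[(prev, w)] / unigrams[prev]

print(p_bigram("a", "have"))   # 1.0: "have" is always followed by "a"
print(p_bigram("gun", "a"))    # 2/3
```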

41
Trigram
  • Some trigrams are very common
  • only track the most common trigrams
  • Use a weighted sum of
  • unigram
  • bigram
  • trigram
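
The weighted sum can be sketched as simple linear interpolation; the lambda
weights here are illustrative and would normally be tuned on held-out data:

```python
# Interpolate unigram, bigram, and trigram probability estimates for a word
def interpolated(p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    l1, l2, l3 = lambdas                  # weights summing to 1
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

print(interpolated(0.01, 0.2, 0.5))       # ~0.361
```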

42
Near the end of the semester
  • Time flies like an arrow
  • Fruit flies like a banana
  • It is currently hard to incorporate parts of
    speech and sentence grammar into the probability
    calculation
  • lots of ambiguity
  • but humans seem to do it

43
Conclusion
  • Speech recognition technology is changing very
    quickly
  • Highly parallel
  • Amenable to hardware implementations