Title: Outline
1. Outline
- Syllabus
- Introduction
- Computational Paradigms for Vision
- Appearance-based computer vision
- Physics-based computer vision
2. Class Materials
- In this class, most of the time we will discuss papers from the literature
- At the beginning I will give a general introduction based on chapters from different books
- There is no required textbook for this class
3. Vision
- Vision: the process of acquiring knowledge about environmental objects and events by extracting information from the light they emit or reflect
- Vision is a very complicated process, involving different processes such as memory
- Vision is the most useful source of information: about 50% of the human brain is devoted to visual processing
4. Vision cont.
- Vision has been studied from many different perspectives
- Computational vision: emphasis on approaches that are biologically plausible
- Computer vision: emphasis on algorithms to solve particular problems
- Statistical vision: emphasis on developing and analyzing mathematical and statistical models
5. Darwin X
Source: New Scientist
6. Computer Vision
- Computer vision tries to automate the vision process by building devices that simulate the human vision process
- Note that devices that solve only part of the problem can be very useful
7. Motivation Examples
- Computer vision techniques can provide novel opportunities and improve the performance of existing systems (sometimes significantly)
- Hopefully the following examples will convince you
8. Human-Computer Interfaces
- Mouse gestures
- Allow one to control programs more easily by drawing commands with the mouse
- Some of the 80 gestures recognized by StrokeIt (http://www.tcbmi.com/strokeit/)
9. Mouse Gestures
- In Photoshop, for example, you can
- In a web browser, you can
10. Human-Computer Interactions
11. 3D Hand Mouse
12. HandiEye
13. Sign Language Recognition
14. ALVINN
15. (No Transcript)
16. RALPH
17. Applications continued
18. DARPA Grand Challenge
- http://www.darpa.mil/grandchallenge/gcorg/index.html
19. DARPA Grand Challenge
20. Introduction cont.
- Honda ASIMO
- http://world.honda.com/ASIMO/
21. Automated Map Updating
22. Automated Map Updating
23. 3D Urban Models
24. Image-Guided Neurosurgery
25. Intracardiac Surgical Planning
26. Medical Image Analysis
27. Detection and Recognition
28. Detection and Recognition of Text in Natural Scenes
29. Detection and Recognition of Text in Natural Scenes
30. Text Detection and Recognition in Images and Videos
31. Driver Monitoring System
32. Face Recognition
http://www.a4vision.com
33. Intelligent Transportation Systems
http://dfwtraffic.dot.state.tx.us/dal-cam-nf.asp
34. Handwritten Address Interpretation System
- HWAI - http://www.cedar.buffalo.edu/HWAI/
- The HWAI (Handwritten Address Interpretation) System was developed at the Center of Excellence for Document Analysis and Recognition (CEDAR) at the University at Buffalo, The State University of New York. It resulted from many years of research at CEDAR on the problems of Address Block Location, Handwritten Digit/Character/Word Recognition, Database Compression, Information Retrieval, Real-Time Image Processing, and Loosely-Coupled Multiprocessing.
- The following presentation is based on the demonstration pages at HWAI
35. Handwritten Address Interpretation System cont.
36. Handwritten Address Interpretation System cont.
- Step 2: Address Block Location
37. Handwritten Address Interpretation System cont.
- Step 3: Address Extraction
38. Handwritten Address Interpretation System cont.
39. Handwritten Address Interpretation System cont.
40. Handwritten Address Interpretation System cont.
41. Handwritten Address Interpretation System cont.
- Step 7: Recognition
- (a) State Abbreviation Recognition
42. Handwritten Address Interpretation System cont.
- Step 7: Recognition
- (b) ZIP Code Recognition
43. Handwritten Address Interpretation System cont.
- Step 7: Recognition
- (c) Street Number Recognition
44. Handwritten Address Interpretation System cont.
- Step 8: Street Name Recognition
45. Handwritten Address Interpretation System cont.
- Step 9: Delivery Point Codes
46. Handwritten Address Interpretation System cont.
47. Military Applications
48. Automated Global Monitoring
49. Approaches to Computer Vision
- Vision is a complicated computational process
- Try to simulate the human vision system
- Try to build mathematical formulations of the environment (to be perceived) and then perform inference
- Try to invent approximate but efficient shortcuts to the general vision problem
50. Neuroanatomy of the Brain
51. Visual Pathway
52. Visual Pathway Diagram
53. Eye-Camera Analogy
- The eye is much like a camera
- Both form an upside-down image by admitting light through a variable-sized opening and focusing it on a two-dimensional surface using a transparent lens (see the projection sketch below)
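To make the analogy concrete, here is a minimal pinhole-projection sketch (a deliberate simplification; the eye uses a lens rather than a pinhole, but the geometry of the inverted image is the same):

```latex
% A scene point (X, Y, Z) projects onto an image plane at distance f behind the aperture:
\[ x = -f\,\frac{X}{Z}, \qquad y = -f\,\frac{Y}{Z} \]
% The minus signs are the inversion: the image formed on the retina (or sensor) is upside down.
```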
54. Functions of Different Cells
55. Nobel Prize Winning Experiments
56. Nobel Prize Winning Experiments
57. Nobel Prize Winning Experiments cont.
58. Nobel Prize Winning Experiments cont.
59. Simple Cells in the Visual Cortex
60. Simple Cells
- rectangular shaped receptive fields
- segregated ON and OFF zones
- respond to a bright or dark bar
- represent a restricted region in the visual field
- respond best to a specific orientation
- non-optimally oriented stimuli will be ineffective in stimulating the neuron (a standard mathematical idealization of such a receptive field is sketched below)
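A common computational idealization of a simple-cell receptive field (an illustration added here, not stated on the slide) is an oriented Gabor function: a sinusoid at the preferred orientation windowed by a Gaussian, which yields elongated, segregated ON and OFF zones:

```latex
% Gabor receptive field with preferred orientation \theta, wavelength \lambda, size \sigma,
% aspect ratio \gamma (elongation), and phase \psi (placement of ON vs. OFF zones):
\[ g(x, y) = \exp\!\left(-\frac{x'^{2} + \gamma^{2} y'^{2}}{2\sigma^{2}}\right)
             \cos\!\left(\frac{2\pi x'}{\lambda} + \psi\right),
   \qquad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta \]
```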
61. Complex Cells
- larger receptive field than simple cells
- orientation tuned
- ON and OFF zones are mixed in the receptive field
- respond well to a moving bar
- direction selective
62. Hyper-complex Cells
- receptive field is selective for the length of the stimulus
- similar to complex cell receptive fields (orientation and direction selective)
- selective for features of shape such as the length and width of the bar of light
63. Brain Imaging
64. Psychophysical Studies
- Determination of the relationship between the magnitude of a sensation and the magnitude of the stimulus that gave rise to that perceptual sensation
- By studying the perception of different stimuli, one can infer what happens in the visual system (classic examples of such relationships are sketched below)
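Two classic forms such stimulus-sensation relationships take (standard psychophysics, included purely as an illustration):

```latex
% Weber's law: the just-noticeable change in a stimulus is proportional to its magnitude
\[ \frac{\Delta I}{I} = \text{const.} \]
% Stevens' power law: perceived magnitude S grows as a power of the stimulus intensity I
\[ S = k\, I^{\,n} \]
```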
65. Contrast Sensitivity Function
66. Single Channel or Multiple Channels
67. Neural Spatial Frequency Channels
- Neural receptive fields are tuned to the spatial frequency of the stimulus
- There seems to be a range of neural spatial frequency channels, each tuned to a different spatial frequency
- A spatial frequency channel can be adapted
68. Vision as an Inverse Problem
- 2-D images are generated by projecting the 3-D world onto an image plane under certain lighting conditions and viewing angles
- The images are a function of the 3-D object surfaces and their surface properties
- Vision essentially needs to solve an inverse problem
- Roughly the inverse of computer graphics (a toy forward model is sketched below)
69. An Example
70. Physics-based Computer Vision
- This naturally leads to physics-based computer vision
- One needs to build computational models of the image formation process (computer graphics)
- One needs to build representations of objects
- Which include surface geometry and texture (color map)
- Vision is essentially an algorithm to recover the underlying three-dimensional model of a given image
- A widely accepted framework is Bayesian inference (sketched below)
71. Face Recognition based on a 3D Model
72. Face Recognition based on a 3D Model cont.
73. Face Recognition based on a 3D Model cont.
74. Face Recognition based on a 3D Model cont.
75. Appearance-based Computer Vision
- A different approach is to try to utilize the resulting 2-D images directly
- The images are treated as matrices
- One tries to make decisions based on the images without building explicit 3-D models
- Note that here computer vision is an application of pattern recognition algorithms (a minimal example follows)
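As a minimal illustration of this view (a toy example, not one of the systems discussed below): flatten each image into a vector of pixel values and classify a new image by its nearest labeled neighbor.

```python
import numpy as np

def nearest_neighbor_label(train_images, train_labels, query_image):
    """Toy appearance-based recognizer: 1-nearest-neighbor on raw pixel vectors.

    train_images: (N, H, W) array of labeled example images
    train_labels: length-N sequence of class labels
    query_image:  (H, W) image to classify
    """
    X = train_images.reshape(len(train_images), -1).astype(float)  # each image becomes a row vector
    q = query_image.reshape(-1).astype(float)
    distances = np.linalg.norm(X - q, axis=1)                      # Euclidean distance in pixel space
    return train_labels[int(np.argmin(distances))]
```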
76. Face detection using spectral histograms
- The problem is to detect faces in images
77. Face detection using spectral histograms cont.
Preprocessing
78. Face detection using spectral histograms cont.
79. Face detection using spectral histograms cont.
80. Face detection using spectral histograms cont.
81. Face detection using spectral histograms cont.
82. Rotation-invariant face detection
83. Face detection using spectral histograms cont.
84. Object Detection and Recognition
- Object detection and recognition problem
- Given a set of images, find regions in these images which contain instances of relevant objects
- Here the number of relevant objects is assumed to be large
- For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human capacity for basic-level visual categorization
- Goal
- Develop a system that achieves real-time detection and recognition for images of size 720 x 480
- At a frame rate of 30 frames per second (the NTSC standard video stream), which leaves a very tight per-frame budget (see the arithmetic below)
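A quick back-of-the-envelope calculation (my own arithmetic, just to quantify the constraint):

```python
width, height, fps = 720, 480, 30
pixels_per_second = width * height * fps   # 10,368,000 pixels per second
frame_budget_ms = 1000.0 / fps             # about 33.3 ms available per frame
ns_per_pixel = 1e9 / pixels_per_second     # roughly 96 ns per pixel before any per-window work
print(frame_budget_ms, ns_per_pixel)
```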
85. A Framework
86. Requirements
- To achieve real-time detection and recognition, we need two critical components
- A classifier that can reduce the average classification time effectively
- Features that can discriminate a large number of objects and can be computed using a few instructions
87. Lookup Table Decision Trees
- We use local spectral histogram features that are computed using histogram integral images (the integral-image trick is sketched below)
- We build a decision tree by clustering
- At each node, we reduce the dimension to a small number, i.e., no more than 5 for detection and recognition applications
- We can approximate the decision from any of the classifiers using a lookup table
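A minimal sketch of the integral-image idea applied to histograms (my own illustrative code; the actual feature definitions used in this work may differ): precompute one cumulative-sum image per histogram bin, so that the histogram over any rectangle costs only four lookups per bin.

```python
import numpy as np

def build_integral_histograms(label_image, num_bins):
    """Per-bin integral images: ih[b, y, x] = number of pixels with bin b in label_image[:y, :x]."""
    h, w = label_image.shape
    ih = np.zeros((num_bins, h + 1, w + 1), dtype=np.int64)
    for b in range(num_bins):
        ih[b, 1:, 1:] = np.cumsum(np.cumsum(label_image == b, axis=0), axis=1)
    return ih

def rect_histogram(ih, top, left, bottom, right):
    """Histogram of the window [top:bottom, left:right) using four lookups per bin."""
    return (ih[:, bottom, right] - ih[:, top, right]
            - ih[:, bottom, left] + ih[:, top, left])

# Usage sketch: label_image could hold quantized filter responses, one bin index per pixel.
labels = np.random.randint(0, 8, size=(480, 720))
ih = build_integral_histograms(labels, num_bins=8)
window_hist = rect_histogram(ih, 100, 200, 150, 260)   # histogram over a 50 x 60 window
```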
88. Local spectral histogram features
89. Comparison of LSH and Haar features
90. Lookup Table Decision Trees
- This requires clustering, and we just use some standard methods
91. An example path of a decision tree
92. Real-time detection and recognition cont.
93. Optimal Component Analysis
- Linear representations are widely used in appearance-based object recognition applications
- Simple to implement and analyze
- Efficient to compute
- Effective for many applications
94. Standard linear representations
- Principal Component Analysis
- Designed to minimize the reconstruction error on the training set (a PCA sketch is given below)
- Fisher Discriminant Analysis
- Designed to maximize the separation between the class means
- Independent Component Analysis
- Designed to maximize the statistical independence among coefficients along different directions
- A toy example
- Standard representations give the worst recognition performance
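A minimal PCA sketch in the appearance-based setting (illustrative code; here X holds the flattened training images as rows):

```python
import numpy as np

def pca_basis(X, d):
    """Return the mean and a d-dimensional PCA basis (columns) for data X of shape (N, D)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                     # center the training data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[:d].T                                      # top-d principal directions, shape (D, d)
    return mean, U

def project(images, mean, U):
    """Low-dimensional coefficients: project flattened images onto the PCA subspace."""
    return (images.reshape(len(images), -1) - mean) @ U
```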
95. Optimal Component Analysis
- Derive a performance function that is related to the recognition performance
- Formulate the problem of finding optimal representations as an optimization problem on the Grassmann manifold (the formulation is sketched below)
- Use an MCMC stochastic gradient algorithm for optimization
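Schematically (my notation, not necessarily that of the original papers): with F(U) measuring the recognition performance obtained with an orthonormal basis U of a d-dimensional subspace of R^n, the problem is

```latex
% Optimal linear representation: maximize recognition performance over d-dimensional subspaces
\[ \hat{U} = \arg\max_{U \in \mathbb{R}^{n \times d},\; U^{\top}U = I_d} F(U) \]
% Only the span of U matters, so the effective search space is the Grassmann manifold
\[ \mathcal{G}_{n,d} = \{\, d\text{-dimensional linear subspaces of } \mathbb{R}^{n} \,\} \]
```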
96. Performance Measure - continued
- Suppose there are C classes to be recognized
- Each class has k_train training images
- Each class has k_cross cross-validation images (one common way to build F(U) from these is sketched below)
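One way to make F(U) concrete (a hedged sketch of the general idea; the exact definition used in the OCA papers may differ in its details): project all images onto U, classify each cross-validation image by its nearest training neighbor, and let F(U) be a smoothed version of the resulting recognition rate.

```latex
% Distance between images x and y in the subspace spanned by the orthonormal basis U
\[ d_U(x, y) = \lVert U^{\top}x - U^{\top}y \rVert \]
% For a cross-validation image x_{c,i} of class c, compare its nearest other-class and
% nearest same-class training images:
\[ \rho(c, i; U) =
   \frac{\min_{c' \neq c,\, j}\, d_U\big(x_{c,i},\, x^{\mathrm{train}}_{c',j}\big)}
        {\min_{j}\, d_U\big(x_{c,i},\, x^{\mathrm{train}}_{c,j}\big)} \]
% F(U): smoothed fraction of the C k_cross cross-validation images with \rho > 1,
% where h is a monotone, sigmoid-like function that keeps F differentiable
\[ F(U) = \frac{1}{C\, k_{\mathrm{cross}}} \sum_{c=1}^{C} \sum_{i=1}^{k_{\mathrm{cross}}}
          h\big(\rho(c, i; U) - 1\big) \]
```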
97. Performance Measure - continued
- F(U) depends on the span of U but is invariant to a change of basis
- In other words, F(U) = F(UO) for any orthonormal matrix O
- The search space of F(U) is therefore the set of all d-dimensional subspaces, which is known as the Grassmann manifold
- It is not a flat vector space, and the gradient flow must take the underlying geometry of the manifold into account
98. Kernel optimal component analysis
99. Kernel optimal component analysis
100. Kernel optimal component analysis
101. Kernel function and kernel parameter learning
102. Subset of a face dataset for visualization
103. Evolution of OCA learning
104. Performance comparison
105. Performance comparison on a full face dataset
106. Summary
- Computer vision as an information-processing process is very complex
- A fundamental approach to vision is analysis by synthesis
- Which involves building 3D models
- A popular shortcut is appearance-based computer vision
- Where an object is approximated by its views under different conditions
- We will start with the appearance-based approach