Intelligent Vision Processor - PowerPoint PPT Presentation


About This Presentation

Title:

Intelligent Vision Processor

Description:

Intelligent Vision Processor. John Morris, Computer Science/Electrical & Computer Engineering, The University of Auckland. Iolanthe II rounds Channel Island.


Slides: 59

Provided by: CSStu6


Transcript and Presenter's Notes

Title: Intelligent Vision Processor

1
Intelligent Vision Processor

John Morris
Computer Science/Electrical & Computer
Engineering, The University of Auckland

Iolanthe II rounds Channel Island,
Auckland-Tauranga Race, 2007
2
Intelligent Vision Processor

Applications
Robot Navigation
Collision avoidance for autonomous vehicles
Manoeuvring in dynamic environments
Biometrics
Face recognition
Tracking individuals
Films
Markerless motion tracking
Security
Intelligent threat detection
Civil Engineering
Materials Science
Archaeology

3
Background
4
Intelligent Vision

Our vision system is extraordinary
Capabilities currently exceed those of any single
processor
Our brains
Operate on a very slow clock
(kHz region)
Massively parallel
> 10^10 neurons can compute in parallel
Vision system (eyes) can exploit this parallelism
3 × 10^6 sensor elements (rods and cones) in the
human retina

5
Intelligent Vision

Matching and recognition
Artificial intelligence systems are currently not
in the race!
For example
Face recognition
We can recognize faces
From varying angles
Under extreme lighting conditions
With or without glasses, beards, bandages,
makeup, etc
With skin tone changes, e.g. sunburn
Games
We can strike balls travelling at > 100 km/h
and
Direct that ball with high precision

6
Human vision

Uses a relatively slow, but massively parallel
processor (our brains)
Able to perform tasks
At speeds
and
With accuracy
beyond capabilities of state-of-the-art
artificial systems

7
Intelligent Artificial Vision

High performance processor
Too slow for high resolution (Mpixel) images in
real time (30 frames per second)
Useful vision systems
Must be able to
Produce 3D scene models
Update scene models quickly
Immediate goal: 20-30 Hz to mimic human
capabilities
Long term goal: > 30 Hz to provide enhanced
capabilities
Produce accurate scene models

8
Intelligent Artificial Vision

Use human brain as the fundamental model
We know it works better than a conventional
processor!
We need

Artificial system | Brain
Large numbers of (small) processing elements | Neurons
Many parallel connections | Nerves
9
Human Vision Systems

Higher order animals all use binocular vision
systems
Permits estimation of distance to an object
Vital for many survival tasks
Hunting
Avoiding danger
Fighting predators
Distance (or depth) computed by triangulation

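(Figure: a scene point P and its images in the two eyes)
The difference between the positions of the two images of P is the disparity; it increases as P comes
closer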
11
Artificial Vision

Evolution took millions of years to optimize
vision
Don't ignore those lessons!
Binocular vision works
Verging optics
Human eyes are known to swivel to fixate on an
object of interest

12
Real vs Ideal Systems

Real lenses distort images
Distortion must be removed for high precision
work!
Easy
but
Conventional technique uses iterative solution
Slow!
Faster approach needed for real time work

Image of a rectangular grid with a real lens
13
Why Stereo?

Range finders give depth information directly
SONAR
Simple
Not very accurate (long wavelength, λ)
Beam spread → low spatial resolution
Lasers
Precise
Low divergence → high spatial resolution
Requires fairly sophisticated electronics
Nothing too challenging in 2008
Why use an indirect measurement when direct ones
are available?

14
Why Stereo?

Passive
Suitable for dense environments
Sensors do not interfere with each other
Wide area coverage
Multiple overlapping views obtainable without
interference
Wide area 3D data can be acquired at high rates
3D data aids unambiguous recognition
3rd dimension provides additional discrimination
Textureless regions cause problems
but
Active illumination can resolve these
Active patterns can use IR (invisible, eye-safe)
light

15
Artificial Vision Challenges
16
Artificial Vision - Challenges

High processor power
Match parallel capabilities of human brain
Distortion removal
Real lenses always show some distortion
Depth accuracy
Evolution learnt about verging optics millions of
years ago!
Efficient matching
Good correspondence algorithms

17
Artificial Vision

Simple stereo systems are being produced
Point Grey, etc
All use canonical configuration
Parallel axes, coplanar image planes
Computationally simpler
High performance processor doesn't have time to
deal with the extra computational complexity of
verging optics

Point Grey Research Trinocular vision system
18
Artificial System Requirements

Highly Parallel Computation
Calculations are not complex
but
There are a lot of them in megapixel (> 10^6 pixels)
images!
High Resolution Images
Depth is calculated from the disparity
If it's only a few pixels, then depth accuracy is
low
Basic equation (canonical configuration only!):
z = \frac{b f}{d p}
where z is the depth, b the baseline, f the focal length, d the disparity (in pixels) and p the pixel size
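To make the equation concrete, the Python sketch below evaluates z = bf/(dp) and the change in depth produced by a one-pixel change in disparity. The function names and the camera values (b = 0.1 m, f = 8 mm, p = 10 µm) are hypothetical, chosen only to keep the arithmetic simple.

```python
def depth_from_disparity(b, f, d, p):
    """Depth z = b*f / (d*p), canonical (parallel-axis) configuration only.

    b: baseline (m), f: focal length (m), d: disparity (pixels), p: pixel size (m).
    """
    if d <= 0:
        raise ValueError("disparity must be a positive number of pixels")
    return b * f / (d * p)


def depth_step(b, f, d, p):
    """Depth change when the disparity drops by one pixel, from d to d - 1."""
    return depth_from_disparity(b, f, d - 1, p) - depth_from_disparity(b, f, d, p)


# Hypothetical camera: 0.1 m baseline, 8 mm lens, 10 micrometre pixels.
for d in (80, 8):
    z = depth_from_disparity(0.1, 0.008, d, 1e-5)
    step = depth_step(0.1, 0.008, d, 1e-5)
    print(f"d = {d:2d} px -> z = {z:.2f} m, one-pixel depth step = {step:.3f} m")
```

With these values a disparity of 80 pixels gives z ≈ 1 m, where one pixel of disparity is worth about 13 mm of depth; a disparity of 8 pixels gives z ≈ 10 m, where one pixel is worth about 1.4 m. This is why a disparity of only a few pixels gives low depth accuracy.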
19
Artificial System Requirements

Depth resolution is critical!
A cricket player can catch a 100 mm ball
travelling at 100 km/h
High Resolution Images Needed
Disparities are large numbers of pixels
Small depth variations can be measured
but
High resolution images increase the demand for
processing power!

Strange game played in former British
colonies in which a batsman defends 3 small
sticks in the centre of a large field against a
bowler who tries to knock them down!
20
Artificial System Requirements

Conventional processors do not have sufficient
processing power
but Moore's Law says
Wait 18 months and the power will have doubled
but
The changes that give you twice the power also
give you twice as many pixels in a row and four
times as many in an image!

Specialized highly parallel hardware is the only
solution!
21
Processing Power Solution
22
FPGA Hardware

FPGA Field Programmable Gate Array
Soft hardware
Connections and logic functions are programmed
in much the same way as a conventional von Neumann
processor
Creating a new circuit is about as difficult as
writing a programme!
High order parallelism is easy
Replicate the circuit n times
As easy as writing a for loop!

23
FPGA Hardware

FPGA Field Programmable Gate Array
Circuit is stored in static RAM cells
Changed as easily as reloading a new program

24
FPGA Hardware

Why is programmability important?
or
Why not design a custom ASIC?
Optical systems don't have the flexibility of a
human eye
Lenses fabricated from rigid materials
Not possible to make a "one size fits all"
system
Optical configurations must be designed for each
application
Field of view
Resolution required
Physical constraints
Processing hardware has to be adapted to the
optical configuration
If we design an ASIC, it will only work for one
application!!

25
Correspondence or Matching
26
Stereo Correspondence
Can you find all the matching points in these two
images?
Of course! It's easy!
The best computer matching algorithms get 5% or
more of the points completely wrong!
... and take a long time to do it! They're not
candidates for real time systems!!
27
Stereo Correspondence

High performance matching algorithms are global
in nature
Optimize over large image regions using energy
minimization schemes
Global algorithms are inherently slow
Iterate many times over small regions to find
optimal solutions

28
Correspondence Algorithms

Good matching performance, global, low speed
Graph-cut, belief propagation, etc.
High speed, simple, local, high parallelism,
lowest performance
Correlation
High speed, moderate complexity, parallel, medium
performance
Dynamic programming algorithms
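The correlation class is the easiest to sketch. Below is a minimal winner-take-all matcher using the sum of absolute differences (SAD) over a small window on one rectified scan line, assuming a point at column x in the left image appears at column x - d in the right image. The function name, window size and test data are illustrative assumptions; each candidate disparity is scored independently, which is what makes this class so easy to parallelize.

```python
def sad_disparities(left, right, max_disp, half_window=1):
    """Winner-take-all SAD matching along one rectified scan line.

    left, right: lists of intensities from the same scan line.
    Returns one disparity per left pixel (0 where no window fits).
    """
    n = len(left)
    disparities = [0] * n
    for x in range(half_window, n - half_window):
        best_cost, best_d = None, 0
        for d in range(max_disp + 1):
            if x - d - half_window < 0:
                break  # window would run off the left edge of the right image
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half_window, half_window + 1))
            if best_cost is None or cost < best_cost:  # ties keep the smaller d
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


left = [0, 0, 0, 10, 50, 90, 20, 0, 0, 0, 0, 0]
right = left[2:] + [0, 0]  # the same scan line shifted by 2 pixels
print(sad_disparities(left, right, max_disp=3))
```

The textured pixels (columns 3 to 6) are matched at disparity 2. In the flat, zero-valued region several disparities score zero and the matcher simply returns the first, a small instance of the textureless-region problem.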
29
Depth Accuracy
30
Stereo Configuration
Points along these lines have the same disparity

Canonical configuration Two cameras with
parallel optical axes
Rays are drawn through each pixel in the image
Ray intersections represent points imaged onto
the centre of each pixel

Depth resolution

but
To obtain depth information, a point must be seen
by both cameras, i.e. it must be in the Common
Field of View

31
Stereo Camera Configuration

Now, consider an object of extent a
To be completely measured, it must lie in the
Common Field of View
but
place it as close to the cameras as you can, so
that you obtain the best accuracy, say at D
Now increase b to increase the accuracy at D
But you must increase D so that the object stays
within the CFoV!
Detailed analysis leads to an optimum value of b ≈ a

(Figure: object of extent a at distance D from a camera pair with baseline b)
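A simplified version of this analysis can be written down directly, assuming parallel axes, each camera having horizontal half-angle θ, and the object centred between the cameras (the detailed analysis on the slide may differ). With the cameras at x = ±b/2, the common field of view at depth z spans 2z tanθ - b, so the object fits when D ≥ (a + b)/(2 tanθ). Differentiating z = bf/(dp) gives |dz/dd| = z²p/(bf), so the depth change for one pixel of disparity at D is proportional to (a + b)²/b. Its derivative with respect to b is proportional to (a + b)(b - a)/b², which is zero at b = a. The Python sketch below checks this numerically; the function names and camera values are hypothetical.

```python
import math


def min_distance(a, b, half_fov):
    """Closest distance at which an object of extent a, centred between two
    parallel cameras with baseline b, lies wholly in the common field of view.
    half_fov is each camera's horizontal half-angle in radians."""
    return (a + b) / (2 * math.tan(half_fov))


def depth_step_at(z, b, f, p):
    """Approximate depth change per pixel of disparity: dz ~ z^2 * p / (b * f)."""
    return z * z * p / (b * f)


def depth_step_for_baseline(a, b, half_fov, f, p):
    """Depth step at the closest usable distance for baseline b."""
    return depth_step_at(min_distance(a, b, half_fov), b, f, p)


# Hypothetical set-up: 0.5 m object, 0.4 rad half-angle, 8 mm lens, 10 um pixels.
a = 0.5
baselines = [0.05 * i for i in range(1, 41)]  # 0.05 m ... 2.0 m
best = min(baselines, key=lambda b: depth_step_for_baseline(a, b, 0.4, 0.008, 1e-5))
print(f"best baseline on this grid: {best:.2f} m (object extent a = {a} m)")
```

The lens, pixel size and field of view only scale the curve; they do not move its minimum away from b = a.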
32
Increasing the baseline
Increasing the baseline decreases performance!!
(Plot: good matches vs. baseline, b; images: corridor set (ray-traced); matching algorithms: P2P, SAD)
33
Increasing the baseline
Examine the distribution of errors
Increasing the baseline decreases performance!!
(Plot: standard deviation of errors vs. baseline, b; images: corridor set (ray-traced); matching algorithms: P2P, SAD)
34
Increased Baseline → Decreased Performance

Statistical
Higher disparity range
increased probability of matching incorrectly:
you've simply got more choices!
Perspective
Scene objects are not fronto-planar
Angled to camera axes
subtend different numbers of pixels in L and R
images
Scattering
Perfect scattering (Lambertian) surface
assumption
OK at small angular differences
increasing failure at higher angles
Occlusions
Number of hidden regions increases as angular
difference increases
increasing number of monocular points for
which there is no 3D information!

35
Evolution

Human eyes verge on an object to estimate its
distance, i.e. the eyes fix on the object in the
field of view

Configuration commonly used in stereo systems
Configuration discovered by evolution millions of
years ago
Note immediately that the CFoV is much larger!
36
Look at the optical configuration!

If we increase f, then D_min returns to the
critical value!

Original f
Increase f
37
Depth Accuracy - Verging axes, increased f
Now the depth accuracy has increased dramatically!
Note that at large f, the CFoV does not
extend very far!
38
Summary
39
Summary: Real time stereo

General data acquisition is
Non-contact
Adaptable to many environments
Passive
Not susceptible to interference from other
sensors
Rapid
Acquires complete scenes in each shot
Imaging technology is well established
Cost effective, robust, reliable
3D data enhances recognition
Full capabilities of 2D imaging system
Depth data
With hardware acceleration
3D scene views available for
Control, monitoring
in real time
Rapid response, rapid throughput

Host computer is free to process complex control
algorithms → Intelligent Vision
Processing Systems which can mimic human vision
system capabilities!
40
Our Solution
41
System Architecture
FPGA
L camera, R camera → serial interface (Firewire/GigE/CameraLink)
Line buffers: distortion removal, image alignment
Stereo matching
Disparity → depth
PC host: higher order interpretation
Signals between FPGA and host: control signals, corrected images, depth map
42
Distortion removal

Image of a rectangular grid from camera with
simple zoom lens
Lines should be straight!
Store displacements of actual image from ideal
points in LUT
Removal algorithm
For each ideal pixel position
Get displacement to real image
Calculate intensity of ideal pixel (bilinear
interpolation)
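A minimal pure-Python sketch of the removal loop above. Images are lists of rows, `displacement` is a hypothetical callable standing in for the LUT (it returns the offset (dx, dy) from an ideal pixel to its position in the real image), and pixels that fall outside the source image are read as 0.

```python
import math


def bilinear(img, x, y):
    """Intensity of img (a list of rows) at the real-valued position (x, y).
    Pixels outside the image are treated as 0."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        if 0 <= yy < len(img) and 0 <= xx < len(img[0]):
            return img[yy][xx]
        return 0.0

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def undistort(img, displacement):
    """For each ideal pixel (x, y), look up the displacement (dx, dy) to the
    real (distorted) image and sample it there by bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = displacement(x, y)
            out[y][x] = bilinear(img, x + dx, y + dy)
    return out
```

With a constant displacement of (0.5, 0.5), each output pixel becomes the mean of a 2 × 2 neighbourhood; for the image [[0, 10, 20], [30, 40, 50], [60, 70, 80]] the top-left output is 20.0.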

43
Distortion Removal

Fundamental Idea
Calculation of undistorted pixel position
Simple but slow
Not suitable for real time
but
Its the same for every image!
So, calculate once!
Create a look up table containing ideal → actual
displacements for each pixel

u_d = u_{ud}(1 + k_2 r^2 + k_4 r^4 + \dots), where r^2 = u_{ud}^2 + v_{ud}^2
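Because the correction depends only on pixel position, the model can be evaluated once to fill the table. The sketch below assumes the radial form written above (applied to both u and v), a distortion centre at the image centre, and coordinates in pixels, so real k_2 and k_4 values would be very small; all names are hypothetical.

```python
def radial_displacement(u, v, k2, k4):
    """Offset from an ideal point (u, v) to its distorted position under the
    radial model u_d = u(1 + k2 r^2 + k4 r^4), v_d = v(1 + k2 r^2 + k4 r^4),
    with r^2 = u^2 + v^2 and (u, v) measured from the distortion centre."""
    r2 = u * u + v * v
    scale = k2 * r2 + k4 * r2 * r2
    return u * scale, v * scale


def build_displacement_lut(width, height, k2, k4):
    """Evaluate the model once per pixel, assuming the distortion centre is
    the image centre. lut[y][x] holds (dx, dy) from ideal to actual position."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[radial_displacement(x - cx, y - cy, k2, k4) for x in range(width)]
            for y in range(height)]
```

The per-image cost is then only the table lookup and interpolation; the polynomial is never evaluated while images are streaming.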
44
Distortion Removal

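Creating the LUT
One entry (dx, dy) per pixel
Each entry is a pair of floats (dx, dy), requiring 8 bytes
For a 1 Mpixel image, the LUT needs 8 Mbytes!
However, distortion is a smooth curve
Store one entry per n pixels
Trials show that n = 64 is OK for a severely
distorted image
A LUT row contains 2^10 / 2^6 = 2^4 = 16 entries
Total LUT is 256 entries
Displacement for pixel j,k (interpolated along the scan line):
du_{j,k} = du_{\lfloor j/64 \rfloor,\lfloor k/64 \rfloor} + (j \bmod 64)\,\frac{du_{\lfloor j/64 \rfloor + 1,\lfloor k/64 \rfloor} - du_{\lfloor j/64 \rfloor,\lfloor k/64 \rfloor}}{64}
du_{\lfloor j/64 \rfloor,\lfloor k/64 \rfloor} is stored in the LUT
Simple, fast circuit

Since the algorithm runs along scan lines, this
multiplication is done by repeated addition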
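The sketch below illustrates the sparse table and the scan-line trick for one displacement component (dy is handled identically). It assumes linear interpolation between neighbouring LUT samples along a row, nearest-sample selection between rows, an image width that is a multiple of n, and one extra sample per row so the last segment has an end point (17 rather than 16 entries for a 1024-pixel row). These choices and all names are illustrative, not taken from the actual circuit.

```python
def make_sparse_lut(displacement, width, height, n):
    """Sample a displacement component du(x, y) once every n pixels.
    Assumes width is a multiple of n; the extra column gives the last
    segment of each row an end point."""
    return [[displacement(j * n, k * n) for j in range(width // n + 1)]
            for k in range((height + n - 1) // n)]


def row_displacements(lut, k, width, n):
    """du for every pixel on scan line k: linear interpolation between
    neighbouring LUT samples, computed by repeated addition along the row."""
    samples = lut[k // n]
    out = []
    for seg in range(width // n):
        du = samples[seg]
        step = (samples[seg + 1] - samples[seg]) / n  # one division per segment
        for _ in range(n):
            out.append(du)
            du += step  # replaces the multiplication (j mod n) * step
    return out
```

Inside each 64-pixel segment the only per-pixel operation is an addition, which maps naturally onto a small accumulator in hardware.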
45
Alignment correction

In general, cameras will not be perfectly aligned
in canonical configuration
Also, we may be using verging axes to improve depth
resolution
Calculate locations of epipolar lines once!
Add displacements to LUT for distortion!
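Since misalignment correction, like distortion removal, depends only on pixel position, both can be folded into one displacement table. As a hypothetical example, the sketch below models the misalignment as a small rotation about the image centre plus a vertical shift (a real system would use a full rectifying transform obtained from calibration), followed by the k_2 radial term; every name and parameter here is an assumption.

```python
import math


def combined_displacement(x, y, cx, cy, angle, dy_offset, k2):
    """Offset (dx, dy) from ideal, rectified pixel (x, y) to its position in
    the raw camera image. Hypothetical model: misalignment = rotation by
    `angle` (radians) about (cx, cy) plus a vertical shift `dy_offset`,
    followed by radial distortion with a single k2 term."""
    u, v = x - cx, y - cy
    # Misalignment: move the ideal point into the raw camera's frame.
    ur = u * math.cos(angle) - v * math.sin(angle)
    vr = u * math.sin(angle) + v * math.cos(angle) + dy_offset
    # Lens distortion applied to the misaligned point.
    s = 1 + k2 * (ur * ur + vr * vr)
    return ur * s - u, vr * s - v


def build_combined_lut(width, height, angle, dy_offset, k2):
    """One (dx, dy) entry per pixel covering both corrections, computed once."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[combined_displacement(x, y, cx, cy, angle, dy_offset, k2)
             for x in range(width)]
            for y in range(height)]
```

Only the contents of the table change; the per-pixel removal step is the same as for distortion alone.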

46
Real time 3D data acquisition

Real time stereo vision
Implemented Gimelfarb's Symmetric Dynamic
Programming Stereo in FPGA hardware
Real time precise stereo vision
Faster, smaller hardware circuit
Real time 3D maps
1% depth accuracy with 2 scan line latency at 25
frames/s

System block diagram: lens distortion
removal, misalignment correction and depth
calculator
Output is a stream of depth values: a 3D movie!
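Gimelfarb's Symmetric Dynamic Programming Stereo is not reproduced here. As a simpler illustration of the same family, the sketch below is a generic scan-line dynamic programming matcher: it minimizes the sum of absolute intensity differences plus a penalty proportional to each change in disparity between neighbouring pixels, keeps a predecessor matrix, and backtracks from the cheapest final state. The cost function, penalty, names and test data are assumptions for illustration only.

```python
def dp_scanline(left, right, max_disp, penalty):
    """Disparity for each pixel of one rectified scan line.

    Minimises  sum_x |left[x] - right[x - d(x)]| + penalty * |d(x) - d(x - 1)|
    over all disparity profiles d(x) in 0..max_disp (Viterbi-style DP).
    """
    n, disps = len(left), range(max_disp + 1)
    inf = float("inf")

    def match(x, d):
        # Matching cost; infinite if x - d falls outside the right image.
        return abs(left[x] - right[x - d]) if x - d >= 0 else inf

    cost = [match(0, d) for d in disps]              # best path cost ending at (0, d)
    pred = [[0] * (max_disp + 1) for _ in range(n)]  # predecessor matrix
    for x in range(1, n):
        new_cost = []
        for d in disps:
            prev = min(disps, key=lambda q: cost[q] + penalty * abs(d - q))
            pred[x][d] = prev
            new_cost.append(match(x, d) + cost[prev] + penalty * abs(d - prev))
        cost = new_cost
    # Backtrack from the cheapest final state through the predecessor matrix.
    d = min(disps, key=lambda q: cost[q])
    result = [0] * n
    for x in range(n - 1, -1, -1):
        result[x] = d
        d = pred[x][d]
    return result


left = [0, 0, 0, 10, 50, 90, 20, 0, 0, 0, 0, 0]
right = left[2:] + [0, 0]  # the same scan line shifted by 2 pixels
print(dp_scanline(left, right, max_disp=3, penalty=5))
```

Here the result is [0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]: the smoothness penalty carries the disparity found in the textured pixels across the flat, zero-valued pixels, where many disparities match equally well, while at the left edge only disparities up to x are possible.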
47
Real time 3D data acquisition

Possible Applications
Collision avoidance for robots
Recognition via 3D models
Fast model acquisition
Imaging technology not scanning!
Recognition of humans without markers
Tracking objects
Recognizing orientation, alignment
Process monitoring
e.g. resin flow in flexible (bag) moulds
Motion capture and robot training

48
FPGA Stereo System
Parallel host interface
Firewire cables
FPGA: Altera Stratix
Firewire physical layer ASIC
Firewire link layer ASIC
FPGA programming cable
49
Summary
50
Summary

Challenges of Artificial Vision Systems
Real-time Image processing requires compute
power!
Correspondence (Matching)
Depth accuracy
Evolution Lessons
Emulate parallel processing capability of
human brain
Use verging optics

51
Summary

Our system
FPGA front end processor
Remove distortion
Correct camera misalignment
Stereo matching
Using dynamic programming
Latency
Several scan lines (~1 millisecond)
Depends on lens distortion and camera alignment
Host does not have to wait for a whole image!
Depth (distance) maps in real-time
3D vision!
Frees host processor for image interpretation
Use both technologies (FPGA, conventional CPU)
where they perform best!

52
Ongoing Photogrammetry Projects
53
Ongoing Projects

Face Recognition
Development of Face Models
Animation
Automated Driving
With Daimler-Benz
Stereo Algorithms
Improved correspondence algorithms
High Quality Rendering
Movie special effects, e.g. The Lord of the
Rings
Using reconfigurable hardware (FPGA)

54
Spare slides
55
Stereo matching

Automated stereo systems find matching regions in
the two images
The separation of the matching regions is the
disparity from which depth is calculated
Matching algorithms generally search over a range
of possible disparities
Looking for the best match in the two images

Stereo correspondence is a classical challenge
for AI systems. Our brains match regions in images
without effort, but computers struggle to match
as well!
56
Stereo Photogrammetry
Pairs of images giving different views of the
scene
can be used to compute a depth (disparity) map
57
Detail: System Architecture
Pixel buffers
Pixel address generator: removes distortion and
misalignment
Predecessor matrix (dynamic programming)
n disparity calculators: one for each
possible disparity value
Stream of disparity values