Intelligent Vision Processor - PowerPoint PPT Presentation

Intelligent Vision Processor. John Morris, Computer Science / Electrical & Computer Engineering, The University of Auckland. Iolanthe II rounds Channel Island.

1
Intelligent Vision Processor
  • John Morris
  • Computer Science / Electrical & Computer
    Engineering, The University of Auckland

Iolanthe II rounds Channel Island, Auckland-Tauranga Race, 2007
2
Intelligent Vision Processor
  • Applications
  • Robot Navigation
  • Collision avoidance autonomous vehicles
  • Manoeuvring in dynamic environments
  • Biometrics
  • Face recognition
  • Tracking individuals
  • Films
  • Markerless motion tracking
  • Security
  • Intelligent threat detection
  • Civil Engineering
  • Materials Science
  • Archaeology

3
Background
4
Intelligent Vision
  • Our vision system is extraordinary
  • Capabilities currently exceed those of any single
    processor
  • Our brains
  • Operates on a very slow clock
  • kHz region
  • Massively parallel
  • >10^10 neurons can compute in parallel
  • Vision system (eyes) can exploit this parallelism
  • 3 x 10^6 sensor elements (rods and cones) in
    human retina

5
Intelligent Vision
  • Matching and recognition
  • Artificial intelligence systems are currently not
    in the race!
  • For example
  • Face recognition
  • We can recognize faces
  • From varying angles
  • Under extreme lighting conditions
  • With or without glasses, beards, bandages,
    makeup, etc
  • With skin tone changes, eg sunburn
  • Games
  • We can strike balls travelling at > 100 km/h
  • and
  • Direct that ball with high precision

6
Human vision
  • Uses a relatively slow, but massively parallel
    processor (our brains)
  • Able to perform tasks
  • At speeds
  • and
  • With accuracy
  • beyond capabilities of state-of-the-art
    artificial systems

7
Intelligent Artificial Vision
  • High performance processor
  • Too slow for high resolution (Mpixel) images in
    real time (30 frames per second)
  • Useful vision systems
  • Must be able to
  • Produce 3D scene models
  • Update scene models quickly
  • Immediate goal: 20-30 Hz to mimic human
    capabilities
  • Long term goal: >30 Hz to provide enhanced
    capabilities
  • Produce accurate scene models

8
Intelligent Artificial Vision
  • Use human brain as the fundamental model
  • We know it works better than a conventional
    processor!
  • We need

Artificial system                              Brain
Large numbers of (small) processing elements   Neurons
Many parallel connections                      Nerves
9
Human Vision Systems
  • Higher order animals all use binocular vision
    systems
  • Permits estimation of distance to an object
  • Vital for many survival tasks
  • Hunting
  • Avoiding danger
  • Fighting predators
  • Distance (or depth) computed by triangulation

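[Figure: triangulation of a scene point P from its two image points]
The separation of the two image points of P is the disparity. It increases as P comes
closer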
11
Artificial Vision
  • Evolution took millions of years to optimize
    vision
  • Don't ignore those lessons!
  • Binocular vision works
  • Verging optics
  • Human eyes are known to swivel to fixate on an
    object of interest

12
Real vs Ideal Systems
  • Real lenses distort images
  • Distortion must be removed for high precision
    work!
  • Easy
  • but
  • Conventional technique uses iterative solution
  • Slow!
  • Faster approach needed for real time work

Image of a rectangular grid with a real lens
13
Why Stereo?
  • Range finders give depth information directly
  • SONAR
  • Simple
  • Not very accurate (long wavelength, λ)
  • Beam spread → Low spatial resolution
  • Lasers
  • Precise
  • Low divergence → High spatial resolution
  • Requires fairly sophisticated electronics
  • Nothing too challenging in 2008
  • Why use an indirect measurement when direct ones
    are available?

14
Why Stereo?
  • Passive
  • Suitable for dense environments
  • Sensors do not interfere with each other
  • Wide area coverage
  • Multiple overlapping views obtainable without
    interference
  • Wide area 3D data can be acquired at high rates
  • 3D data aids unambiguous recognition
  • 3rd dimension provides additional discrimination
  • Textureless regions cause problems
  • but
  • Active illumination can resolve these
  • Active patterns can use IR (invisible, eye-safe)
    light

15
Artificial Vision Challenges
16
Artificial Vision - Challenges
  • High processor power
  • Match parallel capabilities of human brain
  • Distortion removal
  • Real lenses always show some distortion
  • Depth accuracy
  • Evolution learnt about verging optics millions of
    years ago!
  • Efficient matching
  • Good correspondence algorithms

17
Artificial Vision
  • Simple stereo systems are being produced
  • Point Grey, etc
  • All use canonical configuration
  • Parallel axes, coplanar image planes
  • Computationally simpler
  • High performance processor doesn't have time to
    deal with the extra computational complexity of
    verging optics

Point Grey Research Trinocular vision system
18
Artificial System Requirements
  • Highly Parallel Computation
  • Calculations are not complex
  • but
  • There are a lot of them in megapixel (>10^6 pixels)
    images!
  • High Resolution Images
  • Depth is calculated from the disparity
  • If it's only a few pixels, then depth accuracy is
    low
  • Basic equation (canonical configuration only!)

Depth, $z = \dfrac{b\,f}{d\,p}$
where b = baseline, f = focal length, d = disparity (in pixels), p = pixel size
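A minimal sketch of the canonical-configuration depth equation, z = b f / (d p), showing why a disparity of only a few pixels gives poor depth accuracy. The baseline, focal length and pixel size below are illustrative assumptions, not values from the slides.

```python
def depth_from_disparity(d_pixels, baseline_m, focal_m, pixel_m):
    """Depth z = b*f / (d*p) for parallel-axis (canonical) stereo."""
    if d_pixels <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_m / (d_pixels * pixel_m)


# Assumed example optics: 100 mm baseline, 8 mm lens, 5 micron pixels.
b, f, p = 0.10, 0.008, 5e-6

# Depth change caused by a one-pixel change in disparity,
# at a small disparity (4 -> 5) and at a large one (100 -> 101).
step_small_d = depth_from_disparity(4, b, f, p) - depth_from_disparity(5, b, f, p)
step_large_d = depth_from_disparity(100, b, f, p) - depth_from_disparity(101, b, f, p)

print(f"one-pixel depth step at d=4:   {step_small_d:.3f} m")
print(f"one-pixel depth step at d=100: {step_large_d:.4f} m")
```

With these assumed values, a one-pixel error matters far more when the disparity is small, which is the motivation for high resolution images.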
19
Artificial System Requirements
  • Depth resolution is critical!
  • A cricket player can catch a 100mm ball
    travelling at 100km/h
  • High Resolution Images Needed
  • Disparities are large numbers of pixels
  • Small depth variations can be measured
  • but
  • High resolution images increase the demand for
    processing power!

Strange game played in former British
colonies in which a batsman defends 3 small
sticks in the centre of a large field against a
bowler who tries to knock them down!
20
Artificial System Requirements
  • Conventional processors do not have sufficient
    processing power
  • but Moore's Law says
  • Wait 18 months and the power will have doubled
  • but
  • The changes that give you twice the power also
    give you twice as many pixels in a row and four
    times as many in an image!

Specialized highly parallel hardware is the only
solution!
21
Processing Power Solution
22
FPGA Hardware
  • FPGA: Field Programmable Gate Array
  • Soft hardware
  • Connections and logic functions are programmed
    in much the same way as a conventional von Neumann
    processor
  • Creating a new circuit is about as difficult as
    writing a programme!
  • High order parallelism is easy
  • Replicate the circuit n times
  • As easy as writing a for loop!

23
FPGA Hardware
  • FPGA: Field Programmable Gate Array
  • Circuit is stored in static RAM cells
  • Changed as easily as reloading a new program

24
FPGA Hardware
  • Why is programmability important?
  • or
  • Why not design a custom ASIC?
  • Optical systems don't have the flexibility of a
    human eye
  • Lenses fabricated from rigid materials
  • Not possible to make a "one system fits all"
    system
  • Optical configurations must be designed for each
    application
  • Field of view
  • Resolution required
  • Physical constraints
  • Processing hardware has to be adapted to the
    optical configuration
  • If we design an ASIC, it will only work for one
    application!!

25
Correspondence or Matching
26
Stereo Correspondence
Can you find all the matching points in these two
images?
Of course! It's easy!
The best computer matching algorithms get 5% or
more of the points completely wrong!
and take a long time to do it! They're not
candidates for real time systems!!
27
Stereo Correspondence
  • High performance matching algorithms are global
    in nature
  • Optimize over large image regions using energy
    minimization schemes
  • Global algorithms are inherently slow
  • Iterate many times over small regions to find
    optimal solutions

28
Correspondence Algorithms
  • Good matching performance, global, low speed
  • Graph-cut, belief-propagation,
  • High speed, simple, local, high parallelism,
    lowest performance
  • Correlation
  • High speed, moderate complexity, parallel, medium
    performance
  • Dynamic programming algorithms
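To make the "simple, local, high parallelism" class concrete, here is a minimal sketch of correlation-style matching using the sum of absolute differences (SAD) over a small window. The function names, window size and synthetic test image are assumptions for illustration; this is not the presenter's implementation.

```python
import numpy as np


def box_sum(img, half_win):
    """Sum each pixel's (2*half_win+1)^2 neighbourhood (edges replicated)."""
    k = 2 * half_win + 1
    padded = np.pad(img, half_win, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def sad_disparity(left, right, max_disp, half_win=2):
    """For every left pixel, pick the disparity with the lowest windowed SAD."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    best_disp = np.zeros((h, w), dtype=np.int64)
    for d in range(max_disp + 1):
        # Left pixel x is compared with right pixel x - d.
        shifted = right.copy()
        if d > 0:
            shifted[:, d:] = right[:, :w - d]
            shifted[:, :d] = right[:, :1]  # no data here: replicate border column
        cost = box_sum(np.abs(left - shifted), half_win)
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_disp[better] = d
    return best_disp


# Synthetic check: a random texture shifted by a known disparity of 3 pixels.
rng = np.random.default_rng(0)
left_img = rng.random((20, 40))
right_img = np.roll(left_img, -3, axis=1)
disp = sad_disparity(left_img, right_img, max_disp=8)
print(disp[10, 10:20])
```

Each disparity candidate is evaluated independently of the others, which is why this class of algorithm maps easily onto replicated hardware.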
29
Depth Accuracy
30
Stereo Configuration
Points along these lines have the same disparity
  • Canonical configuration: Two cameras with
    parallel optical axes
  • Rays are drawn through each pixel in the image
  • Ray intersections represent points imaged onto
    the centre of each pixel

Depth resolution
  • but
  • To obtain depth information, a point must be seen
    by both cameras, ie it must be in the Common
    Field of View

31
Stereo Camera Configuration
  • Now, consider an object of extent, a
  • To be completely measured, it must lie in the
    Common Field of View
  • but
  • place it as close to the camera as you can so
    that you can obtain the best accuracy, say at D
  • Now increase b to increase the accuracy at D
  • But you must increase D so that the object stays
    within the CFoV!
  • Detailed analysis leads to an optimum value of b
    ≈ a

[Figure: object of extent a at distance D from two cameras with baseline b]
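The trade-off above can be checked numerically. The sketch below assumes a simple pinhole model for the canonical configuration: with half field-of-view angle θ, the common field of view at distance D is 2 D tan θ − b wide, so an object of extent a fits only when D ≥ (a + b) / (2 tan θ), and a one-pixel disparity step at depth z corresponds to a depth step of z² p / (b f). These formulas and all numerical values are assumptions of this sketch, not the detailed analysis the slide refers to.

```python
import math


def min_distance(a, b, half_fov_rad):
    """Closest distance at which an object of extent a fits in the common FoV."""
    return (a + b) / (2.0 * math.tan(half_fov_rad))


def depth_step(z, b, f, p):
    """Depth change per one-pixel disparity change, from z = b*f/(d*p)."""
    return z * z * p / (b * f)


# Assumed example: 0.5 m object, 8 mm lens, 5 micron pixels, 20 degree half FoV.
a, f, p = 0.5, 0.008, 5e-6
half_fov = math.radians(20.0)

# Try baselines from 0.1a to 4a, placing the object as close as the CFoV allows.
candidates = [a * k / 100 for k in range(10, 401)]
errors = [depth_step(min_distance(a, b, half_fov), b, f, p) for b in candidates]
best_b = candidates[errors.index(min(errors))]

print(f"best baseline / object extent = {best_b / a:.2f}")
```

Under these assumptions the depth step is proportional to (a + b)² / b, which is smallest when the baseline equals the object extent.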
32
Increasing the baseline
Increasing the baseline decreases performance!!
[Plot: good matches vs. baseline, b. Images: corridor set (ray-traced). Matching algorithms: P2P, SAD]
33
Increasing the baseline
Examine the distribution of errors
Increasing the baseline decreases performance!!
[Plot: standard deviation vs. baseline, b. Images: corridor set (ray-traced). Matching algorithms: P2P, SAD]
34
Increased Baseline → Decreased Performance
  • Statistical
  • Higher disparity range
  • increased probability of matching incorrectly -
    you've simply got more choices!
  • Perspective
  • Scene objects are not fronto-planar
  • Angled to camera axes
  • subtend different numbers of pixels in L and R
    images
  • Scattering
  • Perfect scattering (Lambertian) surface
    assumption
  • OK at small angular differences
  • increasing failure at higher angles
  • Occlusions
  • Number of hidden regions increases as angular
    difference increases
  • increasing number of monocular points for
    which there is no 3D information!

35
Evolution
  • Human eyes verge on an object to estimate its
    distance, ie the eyes fix on the object in the
    field of view

Configuration commonly used in stereo systems
Configuration discovered by evolution millions of
years ago
Note immediately that the CFoV is much larger!
36
Look at the optical configuration!
  • If we increase f, then Dmin returns to the
    critical value!

Original f
Increase f
37
Depth Accuracy - Verging axes, increased f
Now the depth accuracy has increased dramatically!
Note that at large f, the CFoV does not
extend very far!
38
Summary
39
Summary Real time stereo
  • General data acquisition is
  • Non contact
  • Adaptable to many environments
  • Passive
  • Not susceptible to interference from other
    sensors
  • Rapid
  • Acquires complete scenes in each shot
  • Imaging technology is well established
  • Cost effective, robust, reliable
  • 3D data enhances recognition
  • Full capabilities of 2D imaging system
  • Depth data
  • With hardware acceleration
  • 3D scene views available for
  • Control
  • Monitoring
  • in real time
  • Rapid response → rapid throughput

Host computer is free to process complex control
algorithms → Intelligent Vision
Processing Systems which can mimic human vision
system capabilities!
40
Our Solution
41
System Architecture
[Block diagram labels]
L Camera
R Camera
Serial Interface: Firewire / GigE / CameraLink
FPGA
Line Buffers, Distortion Removal, Image Alignment
Corrected Images
Stereo Matching
Disparity → Depth
Depth Map
PC Host: Higher order Interpretation
Control Signals
42
Distortion removal
  • Image of a rectangular grid from camera with
    simple zoom lens
  • Lines should be straight!
  • Store displacements of actual image from ideal
    points in LUT
  • Removal algorithm
  • For each ideal pixel position
  • Get displacement to real image
  • Calculate intensity of ideal pixel (bilinear
    interpolation)
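The removal algorithm above can be sketched as follows, assuming the LUT is held as two per-pixel arrays `dx` and `dy` (hypothetical names) that give the displacement from each ideal pixel to its position in the real image. This is a software illustration of the three steps, not the FPGA circuit.

```python
import numpy as np


def undistort(image, dx, dy):
    """Build the ideal image by sampling the real image at displaced positions."""
    h, w = image.shape
    img = image.astype(np.float64)
    ys, xs = np.mgrid[0:h, 0:w]

    # Step 1 and 2: for each ideal pixel, get its position in the real image.
    x = np.clip(xs + dx, 0, w - 1)
    y = np.clip(ys + dy, 0, h - 1)

    # Step 3: bilinear interpolation between the four surrounding real pixels.
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    fx = x - x0
    fy = y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom


# Check on a linear ramp, which bilinear interpolation reproduces exactly.
rows, cols = np.mgrid[0:6, 0:8]
ramp = cols + 10.0 * rows
shift_x = np.full(ramp.shape, 0.5)
shift_y = np.full(ramp.shape, 0.25)
corrected = undistort(ramp, shift_x, shift_y)
print(corrected[0, :3])
```

Positions that fall outside the real image are clamped to the border here; a real system would more likely flag them as invalid.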

43
Distortion Removal
  • Fundamental Idea
  • Calculation of undistorted pixel position
  • Simple but slow
  • Not suitable for real time
  • but
  • Its the same for every image!
  • So, calculate once!
  • Create a look up table containing ideal → actual
    displacements for each pixel

$u_d = u_{ud}\,(1 + k_2 r^2 + k_4 r^4 + \dots)$, where $r^2 = u_{ud}^2 + v_{ud}^2$
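A minimal sketch of the "calculate once" idea, assuming the usual even-order radial model u_d = u_ud (1 + k2 r² + k4 r⁴), with r² = u_ud² + v_ud², coordinates measured in pixels from the image centre, and made-up coefficients. It fills a full-resolution table of ideal-to-actual displacements that can then be reused for every frame.

```python
import numpy as np


def displacement_lut(width, height, k2, k4=0.0):
    """Per-pixel (dx, dy) from each ideal position to its distorted position."""
    # Assumption: the distortion centre coincides with the image centre.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    v, u = np.mgrid[0:height, 0:width].astype(np.float64)
    u -= cx
    v -= cy
    r2 = u * u + v * v
    scale = 1.0 + k2 * r2 + k4 * r2 * r2
    # Displacement = distorted position minus ideal position.
    return u * scale - u, v * scale - v


# Tiny example with a hypothetical coefficient.
dx, dy = displacement_lut(5, 5, k2=0.01)
print(dx[2])
```

The expensive part (evaluating the polynomial) happens once at calibration time; per frame, only table look-ups remain.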
44
Distortion Removal
  • Creating the LUT
  • One entry (dx,dy) per pixel
  • For a 1 Mpixel image needs 8 Mbytes!
  • Each entry is a pair of floats (dx,dy), requiring 8 bytes
  • However, distortion is a smooth curve
  • Store one entry per n pixels
  • Trials show that n = 64 is OK for severely
    distorted image
  • LUT row contains 2^10 / 2^6 = 2^4 = 16 entries
  • Total LUT is 256 entries
  • Displacement for pixel j,k
  • $du_{j,k} = (j \bmod 64) \cdot du_{j/64,\,k/64}$
  • $du_{j/64,\,k/64}$ is stored in LUT
  • Simple, fast circuit

Since the algorithm runs along scan lines, this
multiplication is done by repeated addition
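The repeated-addition trick can be illustrated in a few lines. The sketch assumes the slide's expression reads du(j,k) = (j mod 64) × du(j/64, k/64), with integer division selecting the stored entry, and handles a single scan line; the stored values are made up.

```python
BLOCK = 64  # one LUT entry per 64 pixels, as on the slide


def scanline_displacements(increments, width):
    """Return du for each pixel j on one scan line using additions only."""
    out = []
    for j in range(width):
        if j % BLOCK == 0:
            acc = 0.0                       # (j mod 64) == 0 at a block start
            step = increments[j // BLOCK]   # stored LUT entry for this block
        else:
            acc += step                     # one addition per pixel, no multiply
        out.append(acc)
    return out


width = 1024
entries_per_row = width // BLOCK            # 2^10 / 2^6 = 16
lut_row = [0.125 * (i + 1) for i in range(entries_per_row)]  # hypothetical values
du = scanline_displacements(lut_row, width)

print(entries_per_row, entries_per_row * entries_per_row)
print(du[65], (65 % BLOCK) * lut_row[65 // BLOCK])
```

Because pixels arrive in scan-line order, the accumulator replaces a multiplier, which keeps the circuit small.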
45
Alignment correction
  • In general, cameras will not be perfectly aligned
    in canonical configuration
  • Also, may be using verging axes to improve depth
    resolution
  • Calculate locations of epipolar lines once!
  • Add displacements to LUT for distortion!

46
Real time 3D data acquisition
  • Real time stereo vision
  • Implemented Gimel'farb's Symmetric Dynamic
    Programming Stereo in FPGA hardware
  • Real time precise stereo vision
  • Faster, smaller hardware circuit
  • Real time 3D maps
  • 1% depth accuracy with 2 scan line latency at 25
    frames/sec

System block diagram: lens distortion
removal, misalignment correction and depth
calculator
Output is stream of depth values: a 3D movie!
47
Real time 3D data acquisition
  • Possible Applications
  • Collision avoidance for robots
  • Recognition via 3D models
  • Fast model acquisition
  • Imaging technology not scanning!
  • Recognition of humans without markers
  • Tracking objects
  • Recognizing orientation, alignment
  • Process monitoring
  • eg Resin flow in flexible (bag) moulds
  • Motion capture robot training

48
FPGA Stereo System
Parallel Host Interface
Firewire Cables
FPGA: Altera Stratix
Firewire Physical Layer ASIC
Firewire Link Layer ASIC
FPGA Prog Cable
49
Summary
50
Summary
  • Challenges of Artificial Vision Systems
  • Real-time Image processing requires compute
    power!
  • Correspondence (Matching)
  • Depth accuracy
  • Evolution Lessons
  • Emulate parallel processing capability of
    human brain
  • Use verging optics

51
Summary
  • Our system
  • FPGA front end processor
  • Remove distortion
  • Correct camera misalignment
  • Stereo matching
  • Using dynamic programming
  • Latency
  • Several scan lines (?1 millisecond)
  • Depends on lens distortion and camera alignment
  • Host does not have to wait for a whole image!
  • Depth (distance) maps in real-time
  • 3D vision!
  • Frees host processor for image interpretation
  • Use both technologies (FPGA, conventional CPU)
    where they perform best!

52
Ongoing Photogrammetry Projects
53
Ongoing Projects
  • Face Recognition
  • Development of Face Models
  • Animation
  • Automated Driving
  • With Daimler-Benz
  • Stereo Algorithms
  • Improved correspondence algorithms
  • High Quality Rendering
  • Movie special effects eg The Lord of the
    Rings
  • Using reconfigurable hardware (FPGA)

54
Spare slides
55
Stereo matching
  • Automated stereo systems find matching regions in
    the two images
  • The separation of the matching regions is the
    disparity from which depth is calculated
  • Matching algorithms generally search over a range
    of possible disparities
  • Looking for the best match in the two images

Stereo Correspondence is a classical challenge
for AI systems Our brains match regions in images
without effort .. but computers struggle to match
as well!
56
Stereo Photogrammetry
Pairs of images giving different views of the
scene
can be used to compute a depth (disparity) map
57
Detail: System Architecture
Pixel Buffers
Pixel Address Generator: removes distortion and
misalignment
Predecessor matrix (dynamic programming)
n Disparity Calculators: one for each
possible disparity value
Stream of disparity values
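To show what a predecessor matrix does in scan-line dynamic programming, here is a generic textbook-style sketch: it aligns one left row with one right row, charging the intensity difference for a match and a fixed penalty for an occluded pixel, then backtracks through the stored predecessors. It is not Gimel'farb's Symmetric Dynamic Programming Stereo, and the cost function and `occlusion_cost` value are assumptions.

```python
import numpy as np

MATCH, SKIP_LEFT, SKIP_RIGHT = 0, 1, 2


def dp_scanline(left_row, right_row, occlusion_cost=10.0):
    """Return a disparity per left pixel (-1 where the pixel is occluded)."""
    n, m = len(left_row), len(right_row)
    cost = np.zeros((n + 1, m + 1))
    pred = np.zeros((n + 1, m + 1), dtype=np.int8)   # predecessor matrix
    cost[1:, 0] = occlusion_cost * np.arange(1, n + 1)
    cost[0, 1:] = occlusion_cost * np.arange(1, m + 1)
    pred[1:, 0] = SKIP_LEFT
    pred[0, 1:] = SKIP_RIGHT

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = abs(float(left_row[i - 1]) - float(right_row[j - 1]))
            options = (
                cost[i - 1, j - 1] + diff,            # match the two pixels
                cost[i - 1, j] + occlusion_cost,      # left pixel unmatched
                cost[i, j - 1] + occlusion_cost,      # right pixel unmatched
            )
            best = int(np.argmin(options))
            pred[i, j] = best
            cost[i, j] = options[best]

    # Backtrack from the end of both rows using the predecessor matrix.
    disparity = np.full(n, -1, dtype=np.int64)
    i, j = n, m
    while i > 0 or j > 0:
        move = pred[i, j]
        if move == MATCH:
            disparity[i - 1] = (i - 1) - (j - 1)
            i -= 1
            j -= 1
        elif move == SKIP_LEFT:
            i -= 1
        else:
            j -= 1
    return disparity


left = [10, 20, 30, 40, 50, 60, 70, 80]
right = [30, 40, 50, 60, 70, 80, 90, 100]   # same scene shifted by 2 pixels
result = dp_scanline(left, right)
print(result)
```

In this software version the inner loop is sequential; the slide's architecture instead provides one disparity calculator per possible disparity value so that those candidates are evaluated in parallel.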
58
(No Transcript)