Title: Vision Strategies and Tools You Can Use
- RSS II Lecture 5
- September 16, 2005
- Prof. Teller
Today
- Strategies
- What do we want from a vision system?
- Tools
- Pinhole camera model
- Vision algorithms
- Development
- Carmen Module APIs, brainstorming
Vision System Capabilities
- Material identification
- Are there bricks in vicinity? If so, where?
- Motion freedom
- What local motion freedom does robot have?
- Manipulation support
- Is brick pose amenable to grasping/placement?
- Is robot pose correct for grasping/placement?
- Localization
- Where is robot, with respect to provided map?
- Which way to home base? To a new region?
- Has robot been in this region before?
Material Identification
- Detecting bricks when they're present
- How?
- Locate bricks
- In which coordinate system?
- Estimate range and bearing how?
Gauge distance from apparent size?
- Yes, but under what assumption?
Pinhole Camera Model (physical)

[Diagram: pinhole at the world origin O, with the image plane at z = -1 inside the camera enclosure. A world point P = (0, y, z) projects through the pinhole to the inverted image point p = (0, -y/z) on the image plane (u, v). World coordinates are (x, y, z).]

Notes: the diagram is drawn in the plane x = 0; the image-space u-axis and the world-space x-axis both point out of the diagram; both coordinate systems are left-handed.
Pinhole Camera Model (virtual)
- Virtual image plane placed 1 unit in front of the pinhole (at z = 1), so there is no inversion.

[Diagram: world points P1 = (0, y1, z1) and P2 = (0, y2, z2) lie on a single ray through the origin O; both map to the image point p = (0, y/z) on the plane z = 1. All points along the ray Op project to image point p!]
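A minimal numpy sketch of the virtual pinhole model above; the function name and sample points are illustrative, not from the lecture:

```python
import numpy as np

def project_pinhole(P):
    """Project a camera-frame point onto the virtual image plane
    z = 1 of an ideal pinhole at the origin (no inversion)."""
    x, y, z = P
    if z <= 0:
        raise ValueError("point is behind the pinhole")
    return np.array([x / z, y / z])

# All points along the ray Op project to the same image point p:
P1 = np.array([0.0, 1.0, 2.0])
P2 = np.array([0.0, 2.0, 4.0])   # same ray through O, twice as far
print(project_pinhole(P1), project_pinhole(P2))  # both [0.  0.5]
```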
Perspective: Apparent Size
- Apparent object size decreases with depth (perpendicular distance from the camera image plane).
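A quick consequence of the model above: a frontal object of height $h$ at depth $z$ spans image heights from $y/z$ to $(y+h)/z$, i.e. apparent height $h/z$, so doubling the depth exactly halves the apparent size.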
Perspective: Apparent Size

[Image-only slide.]
What assumptions yield depth?
Ground plane assumption
- Requires additional metric information
- Think of this as a constraint on camera and world structure
- Plane in scene, with two independent marked lengths
- Can measure distance to, or size of, objects on the plane
- ...but where do the marked lengths come from?

[Diagram: camera viewing a ground plane marked off at 1 m, 2 m, 3 m, and 4 m depths, plus a 1 m length laid across the plane.]
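Anticipating the metric ground-plane example later in the lecture: for a camera at height $h$ above the plane, a ground point at depth ${}^{C}z$ images at row $v = v_0 - bh/{}^{C}z$, so once the marked lengths fix $h$, $b$, and $v_0$, depth can be read back off the pixel row as ${}^{C}z = bh/(v_0 - v)$.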
Camera Calibration
- Maps 3D world points WP to 2D image-plane points Ip
- The map can be factored into two operations:
- Extrinsic (rigid-body) calibration A (situates camera in world)
- Intrinsic calibration K (warps rays through optics onto image)

[Diagram: world frame (wO; wX, wY, wZ), camera frame (cO; cX, cY, cZ), and an image plane of width x height pixels with origin IO, axes (u, v), and principal point (u0, v0). A carries WP into the camera frame as CP; K carries CP onto the image plane as Ip.]

World coordinates (arbitrary choice); camera coordinates (e.g., cm); image coordinates (pixels).

$A_{3\times4} = (R_{3\times3} \mid t_{3\times1})$

${}^{I}p = K \,\tfrac{1}{{}^{C}z}\, A \,{}^{W}P = K \,\tfrac{1}{{}^{C}z}\, (R \mid t)\, {}^{W}P$, i.e. ${}^{I}p_{3\times1} = K_{3\times3} \,\tfrac{1}{{}^{C}z}\, A_{3\times4}\, {}^{W}P_{4\times1}$
World-to-Camera Transform
- Relabels world-space points w.r.t. the camera body
- Extrinsic (rigid-body) calibration (situates camera in world)

[Diagram: world frame (wO; wX, wY, wZ) and camera frame (cO; cX, cY, cZ) with the plane cZ = 1; A carries WP to CP.]

World coordinates (arbitrary choice); camera coordinates (e.g., cm).

$A_{3\times4} = (R_{3\times3} \mid t_{3\times1})$

${}^{C}P = \tfrac{1}{{}^{C}z}\,(R \mid t)\,{}^{W}P$, i.e. ${}^{C}P_{3\times1} = \tfrac{1}{{}^{C}z}\, A_{3\times4}\, {}^{W}P_{4\times1}$

Note the effect of the division by Cz: no scaling necessary!
Camera-to-Image Transform
- Maps 2D camera points to the 2D image plane
- Models the ray path through camera optics and body to the CCD

[Diagram: camera frame (cO; cX, cY, cZ) with the plane cZ = 1, and an image plane of width x height pixels with origin IO, axes (u, v), and principal point (u0, v0); K carries CP to Ip.]

Camera coordinates (e.g., cm); image coordinates (pixels).

${}^{I}p = K\, {}^{C}P$, i.e. ${}^{I}p_{3\times1} = K_{3\times3}\, {}^{C}P_{3\times1}$

Matrix K captures the camera's intrinsic parameters (applied in the sketch below):

$K = \begin{pmatrix} a & c & u_0 \\ 0 & b & v_0 \\ 0 & 0 & 1 \end{pmatrix}$

- a, b: horizontal and vertical scale factors (equal iff pixel elements are square)
- u0, v0: principal point, i.e., the point at which the optical axis pierces the image plane
- c: image-element (CCD) skew, usually 0
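A short numpy sketch of the intrinsic map; the numeric values are made-up placeholders, not a real calibration:

```python
import numpy as np

# Hypothetical intrinsics (values invented for illustration):
a, b = 600.0, 600.0      # horizontal/vertical scale factors (pixels)
u0, v0 = 320.0, 240.0    # principal point (pixels)
c = 0.0                  # CCD skew, usually 0
K = np.array([[a,   c,   u0],
              [0.0, b,   v0],
              [0.0, 0.0, 1.0]])

# A normalized camera point CP = (x/z, y/z, 1) maps to pixels as K @ CP:
CP = np.array([0.1, -0.05, 1.0])
u, v, _ = K @ CP
print(u, v)              # pixel coordinates
```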
End-to-End Transformation

[Diagram: the two transforms composed: A takes WP (world frame) to CP (camera frame); K takes CP to the image point Ip near the principal point (u0, v0).]

World coordinates (arbitrary choice); camera coordinates (e.g., cm); image coordinates (pixels).

$A_{3\times4} = (R_{3\times3} \mid t_{3\times1})$

${}^{I}p = K \,\tfrac{1}{{}^{C}z}\, A \,{}^{W}P = K \,\tfrac{1}{{}^{C}z}\, (R \mid t)\, {}^{W}P$, i.e. ${}^{I}p_{3\times1} = K_{3\times3} \,\tfrac{1}{{}^{C}z}\, A_{3\times4}\, {}^{W}P_{4\times1}$
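Putting both factors together, a sketch of the end-to-end map (the frame choice and numbers are illustrative assumptions):

```python
import numpy as np

def project(K, R, t, WP):
    """End-to-end pinhole projection of a world point to pixels:
    Ip = K (1/Cz) (R | t) WP, with WP given as a 3-vector."""
    CP = R @ WP + t          # extrinsic: world frame -> camera frame
    CP = CP / CP[2]          # perspective division by Cz
    return (K @ CP)[:2]      # intrinsic: normalized point -> (u, v)

# Example with camera frame coincident with world frame (R = I, t = 0):
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
print(project(K, R, t, np.array([0.0, -0.5, 2.0])))  # -> [320.  90.]
```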
Example: Metric Ground Plane
- Make the camera frame and world frame coincident; thus R = I (3x3), t = 0 (3x1), and A (3x4) = (R | t) as before
- Lay out a tape measure on the line x = 0, y = -h
- Mark off points at (e.g.) 50-cm intervals
- What is the functional form of the map u = f(wx, wy, wz)?

[Diagram: camera at the origin, image plane z = 1, ground line y = -h with marked points at z1 = 2, z2 = 3, z3 = 4; each marked point images at Ip = (0, v, 1).]

${}^{I}p = K \,\tfrac{1}{{}^{C}z}\, A \,{}^{W}P = K \,\tfrac{1}{{}^{C}z}\, I \,{}^{W}P = K \,\tfrac{1}{{}^{C}z}\, (0,\,-h,\,{}^{C}z)^{T} = K \,(0,\,-h/{}^{C}z,\,1)^{T} = (u_0,\; -bh/{}^{C}z + v_0,\; 1)^{T}$

Measure h; observe (CZi, vi) repeatedly; solve for u0, b, v0.
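A sketch of the "measure h; observe (CZi, vi); solve" step as a linear least-squares fit; the measurements below are invented but mutually consistent:

```python
import numpy as np

h = 0.30                               # measured camera height (m)
z = np.array([2.0, 3.0, 4.0])          # marked depths on the tape (m)
v = np.array([150.0, 180.0, 195.0])    # observed image rows (pixels)

# Each ground point obeys v_i = v0 - b*h/z_i, which is linear in (b, v0):
M = np.column_stack([-h / z, np.ones_like(z)])
(b, v0), *_ = np.linalg.lstsq(M, v, rcond=None)
print(b, v0)                           # -> 600.0, 240.0 for this data

# Once calibrated, invert to read depth off a pixel row:
# Cz = b*h / (v0 - v)
```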
Vision System Capabilities
- Material identification
- Are there bricks in vicinity? If so, where?
- Motion freedom
- What local motion freedom does robot have?
- Manipulation support
- Is brick pose amenable to grasping/placement?
- Is robot pose correct for grasping/placement?
- Localization
- Where is robot, with respect to provided map?
- Which way to home base? To a new region?
- Has the robot been in this region before?
Motion Freedom
- What can be inferred from image?
Freespace Map
- Discretize bearing; classify surface type
Freespace Map Ideas
- Use a simple color classifier (a sketch follows this list)
- Train on road, sidewalk, grass, leaves, etc.
- Training could be done offline, or in a start-of-mission calibration phase adapted from RSS II Lab 2
- For each wedge of the disk, could report distance to the nearest obstruction
- Careful: how will your code deal with varying lighting conditions?
- Finally, can fuse (or confirm) with laser data
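A minimal sketch of the color-classifier idea; the class names, hue ranges, and bin count are assumptions, and the synthetic training samples stand in for labeled pixels:

```python
import numpy as np

N_BINS = 12   # coarse hue histogram; run classify() once per wedge

def hue_hist(hues):
    """Coarse, normalized hue histogram (hue scaled to [0, 1))."""
    h, _ = np.histogram(hues, bins=N_BINS, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

# One histogram per surface type, trained from labeled pixel samples
# (offline, or in a start-of-mission calibration phase). Working in
# hue rather than raw RGB buys some tolerance to lighting changes.
training = {
    "grass":    hue_hist(np.random.uniform(0.25, 0.40, 500)),
    "sidewalk": hue_hist(np.random.uniform(0.05, 0.15, 500)),
}

def classify(wedge_hues):
    """Label one wedge by the nearest training histogram (L1 distance)."""
    h = hue_hist(wedge_hues)
    return min(training, key=lambda k: np.abs(training[k] - h).sum())
```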
Vision System Capabilities
- Material identification
- Are there bricks in vicinity? If so, where?
- Motion freedom
- What local motion freedom does robot have?
- Manipulation support
- Is brick pose amenable to grasping/placement?
- Is robot pose correct for grasping/placement?
- Localization
- Where is robot, with respect to provided map?
- Which way to home base? To a new region?
- Has the robot been in this region before?
Manipulation Support
- Two options:
- Manipulate brick into an appropriate grasp pose
- Plan motion to approach the (fixed-pose) brick

[Diagram: a manipulation and/or motion plan taking the brick from its initial pose to the desired pose.]

How? Hint: compute image moments (sketched below).
How to disambiguate edge-on from end-on?
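A sketch of the moments hint, using standard binary-image moments (the function name is illustrative; it assumes a non-empty mask):

```python
import numpy as np

def blob_pose(mask):
    """Centroid and principal-axis angle of a binary brick mask."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()                    # first moments
    mu20 = ((xs - cx) ** 2).mean()                   # central second moments
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # principal axis
    return (cx, cy), theta
```

The eigenvalue ratio of the second-moment matrix is one way to separate edge-on (elongated) from end-on (compact) brick views.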
Vision System Capabilities
- Material identification
- Are there bricks in vicinity? If so, where?
- Motion freedom
- What local motion freedom does robot have?
- Manipulation support
- Is brick pose amenable to grasping/placement?
- Is robot pose correct for grasping/placement?
- Localization
- Where is robot, with respect to provided map?
- Which way to home base? To a new region?
- Has the robot been in this region before?
Localization support
- Localization w.r.t. a known map
- See the localization lecture from RSS I
- Features: curb cuts, vertical building edges
- Map format not yet defined; one of your tasks

[Diagram: landmarks L1, L2, L3 observed from pose (P, q), constraining a locus of likely poses.]
Localization support
- Weaker localization model (a data-structure sketch follows this list)
- Create (virtual) landmarks at intervals
- Chain each landmark to predecessor
- Recognize when landmark is revisited
- Record direction from landmark to its neighbors
- Is this map topological or metrical?
- Does it support homing? Exploration?
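One possible data structure for the chained-landmark map above, written as a sketch (all names are assumptions); note that it is topological in its connectivity, with just enough metric annotation (bearings) to support homing:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Landmark:
    signature: tuple                               # visual descriptor
    neighbors: dict = field(default_factory=dict)  # id -> bearing (rad)

class LandmarkChain:
    def __init__(self):
        self.landmarks = {}
        self.last_id = None

    def add(self, lid, signature, bearing_to_prev=None):
        """Create a landmark and chain it to its predecessor."""
        self.landmarks[lid] = Landmark(signature)
        if self.last_id is not None and bearing_to_prev is not None:
            # record the link in both directions to support homing
            self.landmarks[lid].neighbors[self.last_id] = bearing_to_prev
            self.landmarks[self.last_id].neighbors[lid] = bearing_to_prev + math.pi
        self.last_id = lid
```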
Visual basis for landmarks
- Desire a visual property that:
- Is nearly invariant to large robot rotations
- Is nearly invariant to small robot translations
- Has a definable scalar distance d(b, c); why?
- Possible approaches (one is sketched below):
- Hue histograms (coarsely discretized)
- Ordered hue lists (e.g., of vertical strips)
- Skylines (must segment ground from sky)
- Some hybrid of vision and laser data
- Careful: think about ambiguity in the scene

[Diagram: views a, b, and c with question marks: does c match a, or b?]
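A sketch of the "ordered hue list" option: the mean hue of each vertical strip of a panoramic view. Rotating in place circularly shifts the list, so minimizing over shifts makes d(b, c) nearly rotation-invariant, and the scalar distance lets you threshold "revisited" against "new place"; the strip count is an assumption:

```python
import numpy as np

def signature(panorama_hues, strips=36):
    """Mean hue per vertical strip; columns of the array span bearing."""
    cols = np.array_split(panorama_hues, strips, axis=1)
    return np.array([c.mean() for c in cols])

def d(b, c):
    """Scalar distance between signatures, minimized over rotations."""
    return min(float(np.abs(np.roll(b, k) - c).mean())
               for k in range(len(b)))
```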
Today
- Strategies
- What do we want from a vision system?
- Tools
- Pinhole camera model
- Vision algorithms
- Development
- Carmen Module APIs, brainstorming
Carmen Module APIs
- Vision module handles (processes) the image stream
- Must export a more compact representation than images
- What representation(s) should the module export?
- Features? Distances? Landmarks? Directions? Maps?
- What questions should the module answer? (one possible shape is sketched after this list)
- Are there collectable blocks nearby? Where?
- Has the robot been here before? With what confidence?
- Which direction(s) will get me closer to home?
- Which direction(s) will explore new regions?
- Which directions are physically possible for the robot?
- What non-vision data should the module accept?
- Commands to establish a new visual landmark?
- Notification of rotation in place? Translation?
- Spiral development
- Put simple APIs in place, even if performance is stubbed
- Get someone else to exercise them; revise appropriately
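One possible shape for the module's interface, written as a plain Python sketch for brainstorming; none of these names come from Carmen itself, and every method can start life stubbed, per the spiral-development advice:

```python
class VisionModule:
    """Consumes the image stream; exports compact answers, not images."""

    def handle_image(self, image):
        """Process one frame from the camera."""
        raise NotImplementedError            # stub until vision works

    def nearby_blocks(self):
        """-> list of (range_m, bearing_rad) to collectable blocks."""
        return []                            # stubbed

    def been_here_before(self):
        """-> (bool, confidence in [0, 1]) for landmark revisits."""
        return (False, 0.0)                  # stubbed

    def homeward_directions(self):
        """-> candidate bearings (rad) toward home base."""
        return []                            # stubbed

    # Non-vision inputs the module accepts:
    def establish_landmark(self):
        """Command: record a new visual landmark here."""

    def notify_motion(self, d_theta, d_xy):
        """Odometry hint: rotation in place and/or translation."""
```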
Conclusion
- One plausible task decomposition
- Bottom-up operating scenario
- Related existing and new vision tools
- Functional view: what are the APIs?
- Exhortation to spiral development
- vision lecture plan (seth)
- review pinhole camera model
- show that, since depth is unknown, the scale of a scene object is unknown
- (show picture of Rob and me?)
- but under certain assumptions, we can measure an object in the scene
- the simplest such assumption is the "ground plane" assumption
- show how the extent of the object can be "read off" from the image
- show intuitive trig, then show it in terms of the calibration matrix K
- how do we know the object isn't just a "flash in the pan"?
- i.e., it could be a transient object: a person, an animal, something blowing in the wind
- answer: look for confirmatory evidence over multiple image frames!
- and from multiple vantage points.