Active Perception - PowerPoint PPT Presentation

Provided by: Ruze2

Learn more at: http://people.eecs.berkeley.edu

1
Active Perception

We not only see but we look; we not only touch but we feel.
J.J. Gibson

2
Active Perception vs. Active Sensing

WHAT IS ACTIVE SENSING?
In the robotics and computer vision literature, the term active sensor generally refers to a sensor that transmits energy (electromagnetic or acoustic, e.g., radar, sonar, ultrasound, microwaves and collimated light) into the environment and receives and measures the reflected signals.
We believe that the use of active sensors is not a necessary condition for active sensing, and that sensing can be performed with passive sensors (which only receive, and do not emit, information), employed actively.

3
Active Sensing

Hence the problem of Active Sensing can be stated as a problem of control strategies applied to the data acquisition process, which depend on the current state of the data interpretation and the goal or task of the process.
The question may be asked: is Active Sensing only an application of Control Theory? Our answer is no, at least not in its simple version. Here is why:

4
Active Perception

1) The feedback is performed not only on sensory
data
but on complex processed sensory data, i.e.,
various
extracted features, including relational
features.
2) The feedback is dependent on a priori
knowledge and models
that are a mixture of numeric/parametric and
symbolic information.
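
As a minimal sketch of this idea (not taken from the presentation), the Python example below uses a passive sensor actively: each new measurement location is chosen from the current state of the interpretation, the feedback acts on a processed feature rather than on raw data, and the loop stops when the goal is met. The step-edge scene, the function names, and the prior that the dark side lies on the left are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    estimate: float      # current best estimate of the edge position
    uncertainty: float   # half-width of the interval that still contains the edge


def acquire(scene, position):
    """Passive sensing: read the scene intensity at one position, emit nothing."""
    return scene(position)


def localize_edge(scene, lo, hi, tolerance, dark_on_left=True):
    """Actively localize a step edge assumed (a priori) to lie in [lo, hi]."""
    measurements = 0
    while (hi - lo) / 2 > tolerance:              # goal: uncertainty <= tolerance
        probe = (lo + hi) / 2                     # next measurement chosen from current state
        is_bright = acquire(scene, probe) > 0.5   # processed feature, not raw data
        measurements += 1
        if is_bright == dark_on_left:
            hi = probe    # the probe is already past the edge
        else:
            lo = probe    # the edge lies beyond the probe
    return Interpretation((lo + hi) / 2, (hi - lo) / 2), measurements


def step_scene(x):
    """Hypothetical scene: dark below 0.3, bright from 0.3 upward."""
    return 1.0 if x >= 0.3 else 0.0


result, used = localize_edge(step_scene, lo=0.0, hi=1.0, tolerance=0.01)
print(result, used)
```

The a priori model (which side is dark) and the task (the tolerance) both enter the feedback, which is what separates this from a plain control loop on raw sensor values.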

5
Active Perception turned into an engineering agenda

The implications of the active sensing/perception approach are the following:
1) The necessity of models of sensors. This is to say, first, the model of the physics of sensors as well as the noise of the sensors. Second, the model of the signal processing and data reduction mechanisms that are applied to the measured data. These processes produce parameters with a definite range of expected values plus some measure of uncertainties. These models shall be called Local Models.

6
Engineering agenda, cont.

2) The system (which mirrors the theory) is modular, as dictated by good computer science practices, and interactive, that is, it acquires data as needed. In order to be able to make predictions on the whole outcome, we need, in addition to models of each module (as described in 1) above), models for the whole process, including feedback. We shall refer to these as Global Models.
3) Explicit specification of the initial and final state/goal.
If the Active Vision theory is a theory, what is its predictive power? There are two components to our theory, each with certain predictions:
7
Active Vision theory

1) Local models. At each processing level, local models are characterized by certain internal parameters. One example of a local model is a region growing algorithm whose internal parameters are the local similarity and the size of the local neighborhood. Another example is an edge detection algorithm whose parameter is the width of the band-pass filter in which one is detecting the edge effect. These parameters predict a) the definite range of plausible values, and b) the noise and uncertainty which will determine the expected resolution, sensitivity, and robustness of the output results from each module.
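
To make the edge-detection example concrete, here is a minimal 1-D sketch in Python. It assumes a derivative-of-Gaussian filter as the band-pass filter, which is a common choice but not one the presentation names; `sigma` plays the role of the filter-width parameter, and all function names are hypothetical.

```python
import math


def derivative_of_gaussian(sigma):
    """Sampled first derivative of a Gaussian; sigma sets the filter width."""
    radius = max(1, math.ceil(3 * sigma))
    return [-x / sigma**2 * math.exp(-x * x / (2 * sigma**2))
            for x in range(-radius, radius + 1)]


def edge_response(signal, sigma):
    """Convolve a 1-D signal with the filter, clamping indices at the borders."""
    kernel = derivative_of_gaussian(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    response = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + radius - k, 0), last)
            acc += weight * signal[j]
        response.append(acc)
    return response


def strongest_edge(signal, sigma):
    """Return (index, strength) of the largest absolute filter response."""
    response = edge_response(signal, sigma)
    index = max(range(len(response)), key=lambda i: abs(response[i]))
    return index, abs(response[index])


def spike_to_step_ratio(sigma, length=80):
    """Response to a one-sample spike (noise-like) relative to a true step edge."""
    step = [0.0] * (length // 2) + [1.0] * (length // 2)
    spike = [0.0] * length
    spike[length // 2] = 1.0
    return strongest_edge(spike, sigma)[1] / strongest_edge(step, sigma)[1]


step_signal = [0.0] * 40 + [1.0] * 40
print(strongest_edge(step_signal, sigma=1.0))
print(spike_to_step_ratio(1.0), spike_to_step_ratio(3.0))
```

In this sketch a wider filter responds less to an isolated spike relative to a genuine step, at the cost of spreading the edge response over more samples. That is the kind of resolution versus robustness prediction a local model is meant to carry.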

8
Active Vision, cont.

2) Global models characterize the overall performance and make predictions on how the individual modules will interact, which in turn will determine how intermediate results are combined. The global models also embody the global external parameters and the initial and final global state of the system. The basic assumption of the Active Vision approach is the inclusion of feedback into the system and gathering data as needed. The global model represents all the explicit feedback connections, parameters, and the optimization criteria which guide the process.

9
Control Strategies

There are three distinct control stages, proceeding in sequence:
initialization,
processing in midterm,
completion of the task.
Strategies are divided with respect to the tradeoff between how much data measurement the system acquires (data driven, bottom-up) and how much a priori or acquired knowledge the system uses at a given stage (knowledge driven, top-down). Of course, there is also the strategy that combines the two.

10
Bottom-up and Top-down process

To eliminate possible ambiguities with the terms bottom-up and top-down, we define them here. Bottom-up (data driven), in this discussion, is defined as a control strategy where no concrete semantic, context-dependent model is available, as opposed to the top-down strategy, where such knowledge is available.

11
GOALS/TASKS

Different tasks will determine the design of the system, i.e. the architecture.
Consider the following tasks:
Manipulation
Mobility
Communication and interaction of machine to machine, people to people via digital media, or people to machine.

12
Goal/Task

Geographically distributed communication and interaction using multimedia (vision primarily) over the Internet.
We are concerned primarily with unspoken communication: gestures and body motion.
Examples are coordinated movement such as dance, physical exercises, training of manual skills, and remote guidance of physical activities.

13
Note

Recognition and learning will play a role in all the tasks.

14
Environments/context

The environment serves as a constraint in the design.
We shall consider only the constraints relevant to the visual task that serves to accomplish the physical activity.
For example, in the manipulation task, the size of the object will determine not only the data acquisition strategy but also the design of the vision system (choice of field of view, focal length, illumination, and spatial resolution). Think of moving furniture vs. picking up a coin.
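
The furniture-versus-coin contrast can be worked through with the standard pinhole camera relation (image size = focal length x object size / distance). The sketch below is an illustration under assumed numbers: the 6.4 mm sensor width, 640-pixel resolution, 1 m working distance, and object sizes are hypothetical and do not come from the presentation.

```python
def focal_length_mm(object_size_mm, distance_mm, sensor_width_mm, fill_fraction=0.8):
    """Focal length that makes the object span fill_fraction of the sensor width."""
    return fill_fraction * sensor_width_mm * distance_mm / object_size_mm


def resolution_mm_per_pixel(focal_mm, distance_mm, sensor_width_mm, pixels_across):
    """Width of the scene patch covered by one pixel at the given distance."""
    field_width_mm = sensor_width_mm * distance_mm / focal_mm
    return field_width_mm / pixels_across


SENSOR_WIDTH_MM = 6.4   # assumed sensor width
PIXELS_ACROSS = 640     # assumed horizontal resolution
DISTANCE_MM = 1000.0    # assumed working distance

for name, size_mm in (("furniture", 2000.0), ("coin", 20.0)):
    f = focal_length_mm(size_mm, DISTANCE_MM, SENSOR_WIDTH_MM)
    res = resolution_mm_per_pixel(f, DISTANCE_MM, SENSOR_WIDTH_MM, PIXELS_ACROSS)
    print(f"{name}: focal length {f:.2f} mm, {res:.4f} mm per pixel")
```

Under these assumptions the two tasks call for focal lengths two orders of magnitude apart, which is why the task and object size constrain the optics as well as the acquisition strategy.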

15
Environment/context

Another example: Mobility.
It makes a difference whether the mobility is on the ground, or in the air looking down or up.
The position and orientation of the observer will determine the interpretation of the signal.
Furthermore, there is a difference between outdoor and indoor environments.
Varied visibility conditions will influence the design and the architecture.

16
Environment/context

For distributed communication and interaction, the environment will depend on the application. It could be a digitized environment of the place where the participants are, or it could be a virtual environment; for example, one can put people into a historical environment (Rome, Pompeii, etc.).

17
Active Vision System for 3D object recognition

Table 1 below outlines the multilayered system of an Active Vision system, with the final goal of 3-D object/shape recognition. The layers are enumerated 0, 1, 2, ... with respect to the goal (intermediate results) and feedback parameters. Note that the first three levels correspond to monocular processing only. Naturally the menu of extracted features from monocular images is far from exhaustive. Levels 3-5 are based on binocular images. It is only the last level that is concerned with semantic interpretation.

18
Table

Level  Control of                         Feedback parameters                           Goal / stopping conditions
-----  ---------------------------------  --------------------------------------------  -------------------------------
0      the physical device                directly measured: current lighting system,   grossly focused scene,
                                          open/close aperture                           camera adjusted aperture
1      the physical device                directly measured: focus, zoom;               focused on one object
                                          computed: contrast, distance from focus
2      low-level vision modules           computed only: threshold of the width         2D segmentation,
                                          of filters                                    max. number of edges/regions
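
Level 1 of the table (focus controlled by a computed contrast measure until one object is in focus) can be sketched as a small hill-climbing loop. The code below is an illustration only: the simulated camera, the box-blur defocus model, and the sum-of-squared-differences contrast measure are assumptions, not the system described in the presentation.

```python
def contrast(image):
    """Computed feedback parameter: sum of squared neighbor differences."""
    return sum((b - a) ** 2 for a, b in zip(image, image[1:]))


def simulated_camera(focus, best_focus=7):
    """Hypothetical camera: a 1-D bar pattern blurred more the further
    the focus setting is from best_focus (box blur, borders clamped)."""
    pattern = [0.0] * 8 + [1.0] * 8 + [0.0] * 8 + [1.0] * 8
    radius = abs(focus - best_focus)
    last = len(pattern) - 1
    return [
        sum(pattern[min(max(j, 0), last)] for j in range(i - radius, i + radius + 1))
        / (2 * radius + 1)
        for i in range(len(pattern))
    ]


def autofocus(camera, start, lo, hi):
    """Step the focus setting uphill in contrast; stop at a local maximum."""
    focus = start
    best = contrast(camera(focus))
    while True:
        neighbors = [f for f in (focus - 1, focus + 1) if lo <= f <= hi]
        top_contrast, top_focus = max((contrast(camera(f)), f) for f in neighbors)
        if top_contrast <= best:      # stopping condition: no neighbor is sharper
            return focus, best
        focus, best = top_focus, top_contrast


print(autofocus(simulated_camera, start=0, lo=0, hi=14))
```

The feedback parameter is computed from the image rather than read from the device, and the stopping condition is stated in terms of the goal (a focused object) rather than a fixed number of steps.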

19
Table cont.

Level  Control of                         Feedback parameters                           Goal / stopping conditions
-----  ---------------------------------  --------------------------------------------  -------------------------------
3      binocular system                   directly measured: vergence angle;            depth map
       (hardware and software)            computed: range of admissible depth values
4      intermediate geometric vision      computed only: threshold of similarity        segmentation
       module                             between surfaces
5      several views integration process  computed: the position, rotation of           3D object description
                                          different views
6      semantic interpretation                                                          recognition of 3D objects/scene
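
For level 3, the link between the measured vergence angle and the range of admissible depth values can be illustrated with simple triangulation. The sketch assumes symmetric fixation (both cameras rotated inward by the same amount toward a point on the perpendicular bisector of the baseline); the function names and the 0.2 m baseline are hypothetical.

```python
import math


def fixation_distance(baseline_m, vergence_rad):
    """Distance from the baseline midpoint to a symmetrically fixated point."""
    return (baseline_m / 2) / math.tan(vergence_rad / 2)


def vergence_for_distance(baseline_m, distance_m):
    """Vergence angle (radians) needed to fixate a point at distance_m."""
    return 2 * math.atan((baseline_m / 2) / distance_m)


def admissible_vergence_range(baseline_m, near_m, far_m):
    """(min, max) vergence angles that cover the depth range [near_m, far_m]."""
    return (vergence_for_distance(baseline_m, far_m),
            vergence_for_distance(baseline_m, near_m))


BASELINE_M = 0.2  # assumed camera separation
lo_angle, hi_angle = admissible_vergence_range(BASELINE_M, near_m=0.5, far_m=3.0)
print(math.degrees(lo_angle), math.degrees(hi_angle))
print(fixation_distance(BASELINE_M, vergence_for_distance(BASELINE_M, 1.0)))
```

A range of admissible depths translates directly into a range of vergence angles, so the computed parameter can bound the directly measured one.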

20
Comments

Several comments are in order:
1) Although we have presented the levels in a sequential order, we do not believe that this is the only way information can flow through the system. The only significance in the order of levels is that the lower levels are somewhat more basic and necessary for the higher levels to function.
2) In fact, the choice of the level at which one accesses the system very much depends on the given task and/or the goal.
21
Active Visual Observer

Several groups around the world have built binocular active vision systems that can attend to and fixate a moving target.
We will review two such systems: one built at the UPENN GRASP laboratory and the other at KTH (Royal Institute of Technology) in Stockholm, Sweden.

22
The UPENN System
23
PennEyes: A Binocular Active Vision System
24
PennEyes

PennEyes is a head-in-hand system with a binocular camera platform mounted on a 6 DOF robotic arm. Although physically limited to the reach of the arm, the functionality of the head is extended through the use of motorized optics (10x zoom). The architecture is configured to rely minimally on external systems and .

25
Design considerations

Mechanical: The precision positioning was afforded by the PUMA arm. However, the binocular camera platform needed to weigh in the range of 2.5 kg.
Optics: The use of motorized lenses (zoom, focus and aperture) offered increased functionality.
Electronics: This was the most critical element in the design. A MIMD DSP organization was decided on as the best tradeoff between performance, extensibility and ease of integration.

26
Puma Polka
27
Tracking Performance

The two robots afforded objective measures of tracking performance with a precision target.
A three-dimensional path with known precision can be repeatedly generated, allowing the comparison of different visual servoing algorithms.
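
One simple way to turn a repeatable target path into an objective score is the root-mean-square distance between the known path and the tracked path. The sketch below assumes that metric and uses made-up sample paths; the presentation does not say which measure was actually used.

```python
import math


def rms_tracking_error(target_path, tracked_path):
    """RMS Euclidean distance between corresponding 3-D points of two paths."""
    if not target_path or len(target_path) != len(tracked_path):
        raise ValueError("paths must be non-empty and of equal length")
    squared = [math.dist(t, p) ** 2 for t, p in zip(target_path, tracked_path)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: the same known path tracked by two servoing algorithms.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
algorithm_a = [(x, y, z + 0.1) for x, y, z in target]
algorithm_b = [(x, y, z + 0.3) for x, y, z in target]

print(rms_tracking_error(target, algorithm_a))
print(rms_tracking_error(target, algorithm_b))
```

Because the target path is identical across runs, differences in this score can be attributed to the servoing algorithm rather than to the stimulus.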

28
BiSight Head
29
BiSight head

The head has independent pan axes with a peak tracking performance of 1000 deg/s and 12,000 deg/s². The concern here is how well the calibration can be maintained after repeated exposure to acceleration and vibration.
Another problem occurred with zoom adjustment: the focal length also changed.
The binocular camera platform has 4 optical (zoom and focus) and 2 mechanical (pan) degrees of freedom.

30
C40 Architecture

Beyond the basic computing power of the individual C40s, the performance of the network is enhanced by the ability to interconnect the modules with a fair degree of flexibility as well as the ability to store an appreciable amount of information. The former is made possible by up to six comports on each module and the latter by several Mbytes of local storage.

31
C40 Architecture
32
Critical Issues

The performance of any modularly structured
active vision system depends critically on a few
recurring issues. They involve the coordination
of processes running on different subsystems, the
management of large data streams, processing and
transmission delays and the control of systems
operating at different rates.

33
Synchronization

The three major components of this modular active vision system are independent entities that work at their own pace. The lack of a common time base makes synchronizing the components a difficult task.
In some cases, an external signal can be used to synchronize independent hardware components. In this system, the C40 network, the digitizers and the graphics module are slaved to the vertical sync of the genlocked cameras.

34
Other considerations

Bandwidth: large data streams.
System Integration. If data throughput becomes the bottleneck, then some new data compression algorithms must be invoked.
Latency. Delays between the acquisition of a frame and the motor response to it are an inevitable problem of active vision systems. Delays make the control more difficult because they can cause instabilities.
Multi-rate control. Active vision systems suggest by their very nature a hierarchical approach to control.
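
The effect of latency on stability can be seen in a toy discrete-time model. The sketch assumes a proportional tracker whose correction is based on an error measured some frames earlier, e[k+1] = e[k] - gain * e[k - delay]; the model and the numbers are illustrative assumptions, not measurements from the system in the slides.

```python
def simulate_tracking_error(gain, delay_frames, steps=200, initial_error=1.0):
    """Iterate e[k+1] = e[k] - gain * e[k - delay_frames] and return the history."""
    errors = [initial_error] * (delay_frames + 1)   # error before control starts
    for _ in range(steps):
        measured = errors[-1 - delay_frames]        # stale measurement
        errors.append(errors[-1] - gain * measured)
    return errors


stable = simulate_tracking_error(gain=0.8, delay_frames=0)
delayed = simulate_tracking_error(gain=0.8, delay_frames=3)

print("no delay, final |error|:", abs(stable[-1]))
print("3-frame delay, recent peak |error|:", max(abs(e) for e in delayed[-10:]))
```

With the same gain, the undelayed loop converges while the delayed loop oscillates with growing amplitude, so in this model a delayed system has to lower its gain (or predict ahead) to stay stable.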

35
Control

If the visual and mechanical control rates are
one or more orders of magnitude apart, the
mechanical control loops are essentially
independent of the visual control loop.
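
A minimal simulation of this rate separation, assuming a first-order mechanical loop that runs many ticks per visual frame (the gain, tick counts, and target values are hypothetical):

```python
def run_multirate(targets, inner_ticks_per_frame=30, inner_gain=0.3):
    """Outer visual loop sets a setpoint once per frame; the inner mechanical
    loop runs inner_ticks_per_frame times per frame to track that setpoint."""
    joint = 0.0
    positions_at_frame_end = []
    for target in targets:                  # slow loop: one iteration per visual frame
        setpoint = target
        for _ in range(inner_ticks_per_frame):       # fast loop: mechanical servo ticks
            joint += inner_gain * (setpoint - joint)
        positions_at_frame_end.append(joint)
    return positions_at_frame_end


targets = [1.0, 2.0, 1.5]
fast_inner = run_multirate(targets, inner_ticks_per_frame=30)
slow_inner = run_multirate(targets, inner_ticks_per_frame=2)
print(fast_inner)
print(slow_inner)
```

When the inner loop runs many ticks per frame it has settled on each setpoint before the next visual update arrives, so the visual loop can treat the mechanics as an ideal positioner. With only a couple of ticks per frame the two loops interact and must be designed together.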
