Active Perception (presentation transcript)

1
Active Perception
  • We not only see, but we look; we not only touch, but we feel.
  • J.J. Gibson

2
Active Perception vs. Active Sensing
  • WHAT IS ACTIVE SENSING?
  • In the robotics and computer vision literature, the term active
    sensor generally refers to a sensor that transmits energy
    (generally electromagnetic radiation, e.g., radar, sonar,
    ultrasound, microwaves, and collimated light) into the
    environment and receives and measures the reflected signals.
  • We believe that the use of active sensors is not a necessary
    condition for active sensing, and that sensing can be performed
    with passive sensors (which only receive, and do not emit,
    information), employed actively.

3
Active Sensing
  • Hence the problem of Active Sensing can be stated as a problem
    of controlling the strategies applied to the data acquisition
    process, strategies which depend on the current state of the
    data interpretation and on the goal or task of the process.
  • The question may be asked: is Active Sensing only an application
    of Control Theory? Our answer is no, at least not in its simple
    version. Here is why:
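
As a minimal sketch, the formulation above can be written as a
feedback loop in which the next acquisition step is chosen from the
current interpretation state and the task. The sensor, interpreter,
and task objects below are hypothetical placeholders, not part of
the original system.

def active_sensing(sensor, interpreter, task, max_steps=100):
    # Control strategy applied to the data acquisition process: what
    # is sensed next depends on the current state of the data
    # interpretation and on the goal/task.
    state = interpreter.initial_state(task)
    for _ in range(max_steps):
        if task.satisfied(state):                # goal / stopping condition
            break
        action = task.next_acquisition(state)    # choose what to sense next
        data = sensor.acquire(action)            # data acquisition step
        state = interpreter.update(state, data)  # feed back into interpretation
    return state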

4
Active Perception
  • 1) The feedback is performed not only on raw sensory data but on
    complex, processed sensory data, i.e., various extracted
    features, including relational features.
  • 2) The feedback depends on a priori knowledge and models that
    are a mixture of numeric/parametric and symbolic information.

5
Active Perception turned into an engineering
agenda
  • The implications of the active sensing/perception approach are
    the following:
  • 1) The necessity of models of sensors. That is to say, first, a
    model of the physics of the sensors as well as of their noise;
    second, a model of the signal processing and data reduction
    mechanisms that are applied to the measured data. These
    processes produce parameters with a definite range of expected
    values, plus some measure of uncertainty.
  • These models shall be called Local Models.

6
Engineering agenda, cont.
  • 2) The system (which mirrors the theory) is modular, as dictated
    by good computer science practice, and interactive, that is, it
    acquires data as needed. In order to be able to make predictions
    about the whole outcome, we need, in addition to models of each
    module (as described in 1) above), models for the whole process,
    including feedback. We shall refer to these as Global Models.
  • 3) Explicit specification of the initial and final state/goal.
  • If the Active Vision theory is a theory, what is its predictive
    power? There are two components to our theory, each with certain
    predictions:

7
Active Vision theory
  • 1) Local models. At each processing level, local models are
    characterized by certain internal parameters. One example of a
    local model is a region-growing algorithm whose internal
    parameters are the local similarity measure and the size of the
    local neighborhood. Another example is an edge detection
    algorithm whose parameter is the width of the band-pass filter
    within which the edge effect is detected. These parameters
    predict a) the definite range of plausible values, and b) the
    noise and uncertainty, which will determine the expected
    resolution, sensitivity, and robustness of the output of each
    module.
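
As an illustration, here is a minimal Python sketch of the first
kind of local model named above: a region-growing module whose
internal parameters (similarity threshold and neighborhood radius)
are explicit, with plausible ranges stated. The parameter values and
ranges are illustrative assumptions, not taken from the original.

import numpy as np

def grow_region(image, seed, similarity=10.0, neighborhood=1):
    """Grow a region of pixels similar to the seed pixel.

    similarity: gray-level tolerance (plausible range roughly 1-50
    for 8-bit images); neighborhood: radius in pixels of the local
    neighborhood examined around each accepted pixel.
    """
    h, w = image.shape
    seed_value = float(image[seed])
    region = np.zeros((h, w), dtype=bool)
    stack = [seed]
    while stack:
        y, x = stack.pop()
        if region[y, x]:
            continue
        region[y, x] = True
        for dy in range(-neighborhood, neighborhood + 1):
            for dx in range(-neighborhood, neighborhood + 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not region[ny, nx]
                        and abs(float(image[ny, nx]) - seed_value) <= similarity):
                    stack.append((ny, nx))
    return region

# Usage (illustrative): mask = grow_region(img, seed=(40, 60), similarity=12)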

8
Active Vision, cont.
  • 2) Global models characterize the overall performance and make
    predictions about how the individual modules will interact,
    which in turn determines how intermediate results are combined.
    The global models also embody the global external parameters:
    the initial and final global state of the system. The basic
    assumption of the Active Vision approach is the inclusion of
    feedback in the system and the gathering of data as needed. The
    global model represents all the explicit feedback connections,
    parameters, and the optimization criteria that guide the
    process.

9
Control Strategies
  • Three distinct control stages proceed in sequence:
  • initialization,
  • processing in midterm,
  • completion of the task.
  • Strategies are divided with respect to the tradeoff between how
    much data measurement the system acquires (data driven,
    bottom-up) and how much a priori or acquired knowledge the
    system uses at a given stage (knowledge driven, top-down). Of
    course, there is also the strategy that combines the two.

10
Bottom up and Top down process
  • To eliminate possible ambiguities with the terms bottom-up and
    top-down, we define them here. Bottom-up (data driven), in this
    discussion, is defined as a control strategy where no concrete
    semantic, context-dependent model is available, as opposed to
    the top-down strategy, where such knowledge is available; see
    the sketch below.
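
A minimal sketch of this dispatch, with hypothetical stub functions
standing in for the actual vision modules:

def process(scene, model=None):
    if model is not None:
        # Top-down (knowledge driven): a concrete semantic,
        # context-dependent model is available; predict and verify.
        return predict_and_verify(model, scene)
    # Bottom-up (data driven): no such model; derive structure
    # directly from the measurements.
    return segment_from_measurements(scene)

def segment_from_measurements(scene):
    return {"source": "bottom-up", "regions": []}   # stub low-level module

def predict_and_verify(model, scene):
    return {"source": "top-down", "matches": []}    # stub model matcher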

11
GOALS/TASKS
  • Different tasks will determine the design of the system, i.e.,
    its architecture.
  • Consider the following tasks:
  • Manipulation
  • Mobility
  • Communication and interaction: machine to machine, people to
    people via digital media, or people to machine.

12
Goal/Task
  • Geographically distributed communication and interaction using
    multimedia (primarily vision) over the Internet.
  • We are concerned primarily with unspoken communication: gestures
    and body motion.
  • Examples are coordinated movement such as dance, physical
    exercises, training of manual skills, and remote guidance of
    physical activities.

13
Note
  • Recognition and learning will play a role in all of these tasks.

14
Environments/context
  • The environment serves as a constraint in the design.
  • We shall consider only the constraints relevant to the visual
    task that serves to accomplish the physical activity.
  • For example, in the manipulation task the size of the object
    will determine not only the data acquisition strategy but also
    the design of the vision system (choice of field of view, focal
    length, illumination, and spatial resolution). Think of moving
    furniture vs. picking up a coin; a worked example follows.
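
As a worked example, under a simple pinhole-camera model the object
size and working distance already dictate the focal length and field
of view. The 4.8 mm sensor height and the distances are illustrative
assumptions.

import math

SENSOR_HEIGHT_MM = 4.8   # assumed small-format sensor, vertical dimension

def focal_length_mm(object_height_m, distance_m):
    # Focal length at which the object fills the image vertically:
    # f = sensor_height * distance / object_height (pinhole model).
    return SENSOR_HEIGHT_MM * distance_m / object_height_m

def field_of_view_deg(focal_mm):
    # Vertical field of view for a given focal length.
    return math.degrees(2 * math.atan(SENSOR_HEIGHT_MM / (2 * focal_mm)))

# Moving furniture: ~2 m object seen from ~3 m -> short, wide lens.
print(focal_length_mm(2.0, 3.0))     # ~7.2 mm
print(field_of_view_deg(7.2))        # ~37 degrees
# Picking up a coin: ~2 cm object seen from ~0.3 m -> long, narrow lens.
print(focal_length_mm(0.02, 0.3))    # ~72 mm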

15
Environment/context
  • Another example: mobility.
  • There is a difference if the mobility is on the ground, or in
    the air looking down or up.
  • The position and orientation of the observer will determine the
    interpretation of the signal.
  • Furthermore, there is a difference between outdoor and indoor
    environments.
  • Varied visibility conditions will influence the design and the
    architecture.

16
Environment/context
  • For distributed communication and interaction, the environment
    will depend on the application: it could be a digitized version
    of the place where the participants are, or it could be a
    virtual environment; for example, one can put people into a
    historical environment (Rome, Pompeii, etc.).

17
Active Vision System for 3D object recognition
  • Table 1 below outlines the multilayered structure of an active
    vision system whose final goal is 3-D object/shape recognition.
    The layers are enumerated 0, 1, 2, ... with respect to the goal
    (intermediate results) and feedback parameters. Note that the
    first three levels correspond to monocular processing only;
    naturally, the menu of features extracted from monocular images
    is far from exhaustive. Levels 3-5 are based on binocular
    images. Only the last level is concerned with semantic
    interpretation.

18
Table
  Level                       Feedback parameters               Goal / stopping condition
  ---------------------------------------------------------------------------------------
  0. Control of the           directly measured:                grossly focused scene,
     physical device          current lighting system,          camera aperture adjusted
                              open/close aperture
  ---------------------------------------------------------------------------------------
  1. Control of the           directly measured:                focused on one object
     physical device          focus, zoom
                              computed:
                              contrast, distance from focus
  ---------------------------------------------------------------------------------------
  2. Control of low-level     computed only:                    2D segmentation
     vision modules           threshold of the width            (max. of edges/regions)
                              of filters

19
Table cont.
  Level                       Feedback parameters               Goal / stopping condition
  ---------------------------------------------------------------------------------------
  3. Control of binocular     directly measured:                depth map
     system (hardware,        vergence angle
     software)                computed:
                              range of admissible depth values
  ---------------------------------------------------------------------------------------
  4. Control of intermediate  computed only:                    segmentation
     geometric vision         threshold of similarity
     module                   between surfaces
  ---------------------------------------------------------------------------------------
  5. Control of the           computed:                         3D object description
     several-views            position and rotation of
     integration process      the different views
  ---------------------------------------------------------------------------------------
  6. Control of semantic                                        recognition of 3D
     interpretation                                             objects/scene




20
Comments
  • Several comments are in order:
  • 1) Although we have presented the levels in a sequential order,
    we do not believe that this is the only way information can flow
    through the system. The only significance of the order of the
    levels is that the lower levels are somewhat more basic and
    necessary for the higher levels to function.
  • 2) In fact, the choice of the level at which one accesses the
    system very much depends on the given task and/or the goal, as
    the sketch below illustrates.
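
A sketch of the control structure implied by Table 1 and comment 2):
each level adjusts its feedback parameters until its stopping
condition holds, and the entry level depends on the task. The
`system` object and the condensed level summaries below are
hypothetical, not part of the original.

# Condensed (feedback parameters, goal) pairs for levels 0-6 of Table 1.
LEVELS = [
    ("lighting, aperture",            "grossly focused scene"),
    ("focus, zoom",                   "focused on one object"),
    ("filter widths",                 "2D segmentation"),
    ("vergence angle, depth range",   "depth map"),
    ("surface similarity threshold",  "segmentation"),
    ("view positions and rotations",  "3D object description"),
    ("semantic matching",             "recognized 3D objects/scene"),
]

def run_levels(system, entry_level=0):
    # The entry level depends on the given task and/or goal (comment 2).
    result = None
    for params, goal in LEVELS[entry_level:]:
        # Feedback: adjust this level's parameters until its goal /
        # stopping condition is met, then pass the result upward.
        while not system.stopping_condition(goal, result):
            system.adjust(params)
            result = system.process(goal)
    return result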

21
Active Visual Observer
  • Several groups around the world have built binocular active
    vision systems that can attend to and fixate a moving target.
  • We will review two such systems: one built at the University of
    Pennsylvania's GRASP Laboratory and the other at KTH (the Royal
    Institute of Technology) in Stockholm, Sweden.

22
The UPENN System
23
PennEyes: A Binocular Active Vision System
24
PennEyes
  • PennEyes is a head-in-hand system with a binocular camera
    platform mounted on a 6-DOF robotic arm. Although physically
    limited to the reach of the arm, the functionality of the head
    is extended through the use of motorized optics (10x zoom). The
    architecture is configured to rely minimally on external
    systems.

25
Design considerations
  • Mechanical: precision positioning was afforded by the PUMA arm.
    However, the binocular camera platform needed to weigh in the
    range of 2.5 kg.
  • Optics: the use of motorized lenses (zoom, focus, and aperture)
    offered increased functionality.
  • Electronics: this was the most critical element in the design. A
    MIMD DSP organization was decided on as the best tradeoff
    between performance, extensibility, and ease of integration.

26
Puma Polka
27
Tracking Performance
  • The two robots afforded objective measures of tracking
    performance with a precision target.
  • A three-dimensional path with known precision can be repeatedly
    generated, allowing the comparison of different visual servoing
    algorithms.

28
BiSight Head
29
BiSight head
  • The head has independent pan axes with peak tracking performance
    of 1000 deg/s and 12,000 deg/s². The concern here is how well
    the calibration can be maintained after repeated exposure to
    acceleration and vibration.
  • Another problem: when the zoom was adjusted, the focal length
    also changed.
  • The binocular camera platform has 4 optical (zoom and focus) and
    2 mechanical (pan) degrees of freedom.

30
C40 Architecture
  • Beyond the basic computing power of the individual C40s, the
    performance of the network is enhanced by the ability to
    interconnect the modules with a fair degree of flexibility, as
    well as the ability to store an appreciable amount of
    information. The former is made possible by up to six comports
    on each module, and the latter by several Mbytes of local
    storage.

31
C40 Architecture
32
Critical Issues
  • The performance of any modularly structured
    active vision system depends critically on a few
    recurring issues. They involve the coordination
    of processes running on different subsystems, the
    management of large data streams, processing and
    transmission delays and the control of systems
    operating at different rates.

33
Synchronization
  • The three major components of this modular active vision system
    are independent entities that work at their own pace. The lack
    of a common time base makes synchronizing the components a
    difficult task.
  • In some cases, an external signal can be used to synchronize
    independent hardware components. In this system, the C40
    network, the digitizers, and the graphics module are slaved to
    the vertical sync of the genlocked cameras, as sketched below.
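
In the same spirit, here is a minimal Python sketch of slaving
independent components to a shared periodic sync signal. The
component names and the ~25 Hz rate are illustrative, and a
threading.Barrier stands in for the hardware vertical sync.

import threading
import time

FRAMES = 5
COMPONENTS = ("digitizer", "C40 network", "graphics module")
# All components plus the sync generator meet at the barrier once per frame.
frame_sync = threading.Barrier(len(COMPONENTS) + 1)

def component(name):
    for frame in range(FRAMES):
        frame_sync.wait()    # block until the shared "vsync" pulse
        print(f"{name}: frame {frame}")

def sync_generator(period_s=0.04):   # ~25 Hz stands in for camera vsync
    for _ in range(FRAMES):
        time.sleep(period_s)
        frame_sync.wait()    # release all slaved components together

threads = [threading.Thread(target=component, args=(n,)) for n in COMPONENTS]
for t in threads:
    t.start()
sync_generator()
for t in threads:
    t.join()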

34
Other considerations
  • Bandwidth: large data streams.
  • System integration: if data throughput becomes the bottleneck,
    then new data compression algorithms must be invoked.
  • Latency: delays between the acquisition of a frame and the motor
    response to it are an inevitable problem of active vision
    systems. Delays make control more difficult because they can
    cause instabilities.
  • Multi-rate control: active vision systems suggest by their very
    nature a hierarchical approach to control.

35
Control
  • If the visual and mechanical control rates are one or more
    orders of magnitude apart, the mechanical control loops are
    essentially independent of the visual control loop; the sketch
    below illustrates this separation.
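
A toy numerical sketch of that separation, with illustrative rates
and gains (not values from the systems above): the slow visual loop
only moves the setpoint, while the much faster mechanical loop
tracks it almost independently.

def simulate(duration_s=1.0, visual_hz=25, motor_hz=2500):
    dt = 1.0 / motor_hz
    ratio = motor_hz // visual_hz      # motor ticks per visual tick
    target, setpoint, position = 1.0, 0.0, 0.0
    for step in range(int(duration_s * motor_hz)):
        if step % ratio == 0:
            # Visual loop (25 Hz): nudge the setpoint toward the
            # visually measured target.
            setpoint += 0.2 * (target - setpoint)
        # Mechanical loop (2.5 kHz): proportional tracking of the
        # setpoint, effectively independent of the visual loop.
        position += 50.0 * (setpoint - position) * dt
    return position

print(simulate())   # position closely tracks the visually derived setpoint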