Title: Cambridge 2005
When moving makes it easier
Edgar J. Bermudez Contreras
ejb21_at_sussex.ac.uk
A robotic approach to active vision.
Informatics Department University of Sussex
Introduction.
Methods and Results.
Conclusions
Some biologically inspired models of visual
systems try to represent the hierarchical
structure of the ventral visual pathway in the
visual cortex. One such model, implemented for
this work, is the HMAX model [10]. It has been
reported to show a degree of scale and
translation invariance similar to experimental
data recorded from monkeys [9]. In my research,
this model has been implemented, tested and
compared against a second model of object
recognition (RBFM) based only on the output of
low-pass filters (Difference of Gaussians and
Gabor filters), whose responses resemble those
of receptive fields in the retina [5,8].
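
As a rough illustration of the filter stage mentioned above, the following minimal sketch builds a Difference of Gaussians kernel and applies it to one image patch. The function names, the kernel size and the two sigma values are assumptions chosen for the example; the poster does not state the parameters actually used.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    half = size // 2
    rows = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]
    total = sum(sum(row) for row in rows)
    return [[v / total for v in row] for row in rows]

def dog_kernel(size, sigma_center, sigma_surround):
    """Difference of Gaussians: narrow centre minus wide surround.

    The sigma values are hypothetical; they are not taken from the poster.
    """
    center = gaussian_kernel(size, sigma_center)
    surround = gaussian_kernel(size, sigma_surround)
    return [[center[y][x] - surround[y][x] for x in range(size)]
            for y in range(size)]

def filter_response(patch, kernel):
    """Response of one filter to one patch (element-wise product, summed)."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))
```

Because both Gaussians are normalised, the kernel sums to approximately zero, so a uniformly lit patch gives almost no response while a small bright spot at the centre gives a positive one.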
We are still far from a complete understanding
of the human brain and the functional structure
of the visual cortex. However, it is important
not to isolate the matching tasks when
studying visual systems in animals. Concepts like
embodiment, situatedness and purposive vision can
help to clarify the behavioural role of
some parallel neural tasks (like foveation,
attention, etc.) intrinsic to the visual system
itself. So far, this work gives an example of
how a simple parallel mechanism (like foveation)
in the visual cortex can reduce the complexity of
a biologically inspired model of a visual system
while keeping robustness and good performance.
Knowing how robust the model is to changes in
scale and translation allows a better
coupling with the dynamics imposed by the body of
the agent and its interactions with its
environment.
Object recognition is very important for animals
and crucial for higher primates. It represents
one of the most advanced tasks of the visual
cortex. Object recognition has received much
attention within the field of computer vision.
However, it is a very hard task to solve from a
computational point of view, and the
computational processes underlying it in the
visual cortex are still poorly understood [9,10].
Biologically inspired models of object
recognition and categorisation are a very useful
tool for understanding the processes in
the brain (apart from the fact that such models
are very robust and already tested in nature), in
contrast with more engineering-oriented solutions
where humans impose restrictions on the design of
such visual systems.
Both models use a feature space (view-based)
approach to build tuned units that carry out
the matching process [5,8]. The response of
the visual system in the cortex is very fast. For
models dealing with expensive computation and
large amounts of information in the analysed
images (visual field), this response time is very
hard to emulate. Inherent parallel mechanisms in
the visual cortex, such as foveation and
attention, can be useful to reduce the amount of
data being processed [7,8].
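
To make the idea of view-tuned units concrete, here is a minimal sketch in which each unit stores one feature vector (a "view") and responds with a Gaussian of the distance between the input features and that stored view. The names `tuned_unit_response` and `recognise`, the `sigma` parameter and the example labels are hypothetical; the tuning function used in the actual models is not given on the poster.

```python
import math

def tuned_unit_response(features, stored_view, sigma=1.0):
    """Gaussian tuning: 1.0 for a perfect match, falling towards 0 with distance.

    sigma (tuning width) is an assumed parameter, not a value from the poster.
    """
    squared_distance = sum((f - s) ** 2 for f, s in zip(features, stored_view))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def recognise(features, stored_views, sigma=1.0):
    """Return (label, response) of the tuned unit that responds most strongly."""
    best_label, best_response = None, -1.0
    for label, view in stored_views.items():
        response = tuned_unit_response(features, view, sigma)
        if response > best_response:
            best_label, best_response = label, response
    return best_label, best_response
```

For example, with the stored views `{"cup": [1.0, 0.0], "face": [0.0, 1.0]}`, the input `[0.9, 0.1]` is matched to the "cup" unit because it lies closer to that view in feature space.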
Future work in my research will explore the
advantages of using such models of
visual systems in agents that carry out
visually guided tasks. A better understanding
of visual systems and visually guided tasks in
agents (artificial or animal) can give us
important insight into what is going on in our
brain and help us to develop a broad range of
applications, from security systems in banks or
airports to space missions to other planets.
Although there are many open questions about the
neural processes underlying object recognition
and categorisation in the visual cortex, some
basic facts are commonly accepted.
Very briefly, object recognition is thought to be
mediated by the ventral visual pathway, which
runs from primary visual cortex (V1), through
extrastriate visual areas V2 and V4, to
inferotemporal cortex (IT) and on to prefrontal
cortex (PFC), which plays an important role in
linking perception to memory. This ventral visual
pathway has a hierarchical architecture: simple
cells in V1 have small receptive fields, and
receptive fields increase in size and complexity
through V2 and V4 to IT, where cells respond to
more complex stimuli like faces [9,10].
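
One way to picture how such a hierarchy can produce translation invariance is the alternation of template matching and MAX pooling used in HMAX-style models [10]. The sketch below is a heavily simplified, single-stage version under stated assumptions: one hand-written template, Gaussian tuning with an assumed `sigma`, and pooling over the whole image rather than over local neighbourhoods and scales. It is not the implementation used in this work.

```python
import math

def simple_layer(image, template, sigma=1.0):
    """Template-matching stage: one Gaussian-tuned response per image position."""
    t_height, t_width = len(template), len(template[0])
    height, width = len(image), len(image[0])
    responses = []
    for top in range(height - t_height + 1):
        row = []
        for left in range(width - t_width + 1):
            squared_distance = sum(
                (image[top + i][left + j] - template[i][j]) ** 2
                for i in range(t_height) for j in range(t_width))
            row.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
        responses.append(row)
    return responses

def complex_layer(responses):
    """Pooling stage: MAX over all positions, which discards where the match was."""
    return max(max(row) for row in responses)
```

Because the pooling stage keeps only the strongest response, moving the matching pattern to another position inside the image leaves the output unchanged, at the cost of evaluating the template at every position.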
The results so far show how the performance of
visual systems changes under systematic
perturbations when such mechanisms are used. It
is clear that adding active vision
techniques (keeping in mind the intrinsic
mechanisms of visual systems in animals) can
increase performance and reduce the response
time at the same time (by roughly a factor
of 10 in this case).
When studying models of a system in agents, it is
important not to forget concepts like embodiment
and situatedness. That is, the restrictions
imposed by the environment or by the body of the
agent itself can often reduce the complexity of
the model in question or increase the performance
of the system in general [3,11].
Although the HMAX model in general performs
better in scale and translation invariance tests,
when a cheap foveation mechanism is added to the
model (following biologically inspired ideas),
the RBFM model performs better than the HMAX
model, with some degree of translation
invariance.
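
The poster does not describe how foveation was implemented, so the following is only a sketch of one cheap possibility: fixate on the intensity-weighted centre of the image and crop a small window around it. The function names and the window size are assumptions made for the example.

```python
def centre_of_mass(image):
    """Return (cx, cy), the intensity-weighted centre, rounded to whole pixels."""
    total = sum(sum(row) for row in image)
    if total == 0:
        # Nothing to fixate on: fall back to the middle of the image.
        return len(image[0]) // 2, len(image) // 2
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    return int(round(cx)), int(round(cy))

def foveate(image, cx, cy, half):
    """Crop a (2*half+1)-pixel square around (cx, cy), zero-padding off-image."""
    height, width = len(image), len(image[0])
    return [[image[y][x] if 0 <= y < height and 0 <= x < width else 0
             for x in range(cx - half, cx + half + 1)]
            for y in range(cy - half, cy + half + 1)]
```

Cropping around the fixation point re-centres the object before matching, so a simple model sees nearly the same input wherever the object appears, and it only has to process the window instead of the whole visual field.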
Sometimes systems that couple environment and
agent with the model under study are too hard
for humans to design by hand. Evolutionary
Robotics is an area where the restrictions
imposed by humans on the design of solutions are
not present, and the solutions provided by this
methodology can be important case studies
because of this degree of freedom [3].

References
[1] Aloimonos, Y. Active Perception. Erlbaum, Hillsdale, NJ, 1993.
[2] Ballard, Dana. Animate Vision. Artificial Intelligence, 1991.
[3] Beer, R.D. Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4(3), 91-99, 2000.
[4] Buhrmann, T. and Searles, B. InQubator. www.inqubator.de
[5] Edelman, S. Representation and Recognition in Vision. The MIT Press, 1999.
[6] Hole, G., George, P., Eaves, K. and Rasek, A. Effects of geometric distortions on face-recognition performance. Perception, 31(10), 1221-1240, 2002.
[7] Itti, L. and Koch, C. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3), 194-203, 2001.
[8] Poggio, T. and Edelman, S. A network that learns to recognize 3-D objects. Nature, 343, 263-266, 1990.
[9] Riesenhuber, M. and Poggio, T. How Visual Cortex Recognizes Objects: The Tale of the Standard Model. In The Visual Neurosciences (Eds. L.M. Chalupa and J.S. Werner), MIT Press, Cambridge, MA, Vol. 2, 1640-1653, 2003.
[10] Riesenhuber, M. and Poggio, T. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019-1025, 1999.
[11] Spier, E. Behavioural categorisation: Behaviour makes up for bad vision. 9th Artificial Life Conference, 2004.
[12] Stamp Dawkins, M. and Woodington, A. Pattern recognition and active vision in chickens. Nature, 2000.
The results for scale invariance are expected
to be consistent with the results presented for
translation invariance. The corresponding
experiments are ongoing work. This work can help
to explain face recognition performance in
humans, for example [6]. The design of a robot
controller that allows the visual system to
affect the motor system of an agent is ongoing
work as well. The idea is to explore the
interaction between the control system and the
motors and sensors.
In this work, the study of models of visual
systems goes together with an interest in
exploring the advantages and restrictions imposed
by the body of an agent interacting with its
environment to achieve some task [1,2,12].
The general goal of this research is to
study models of visual systems in the terms
described so far: biologically inspired models of
visual systems in agents capable of interacting
with their environment to achieve visually guided
tasks.
The next step is to test these visual systems in
an agent interacting with its environment. A
simulator based on the InQubator project [4] has
been implemented to explore the advantages
and restrictions of embodied and situated agents
that use the implemented models of visual systems
to perform specific tasks [12].