Title: Simon J. Davies
1. Visual perception: object recognition
daviess_at_hope.ac.uk
2. What is vision for?
- It allows us to know the qualities of distal (distant) objects.
- It helps us act on the world, by navigating around and interacting with objects.
- This lecture will look at our perception of the world, and how our brain represents objects.
3. Vision is hard
- The two-dimensional retinal representation needs to be turned into a 3D representation (the inverse projection problem).
- There are a number of indeterminacies in this retinal image:
  - Size
  - Shape
  - Luminance, light source, reflectance and shadow
  - Distance
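A small worked example may make the size and distance indeterminacy concrete. The sketch below uses the standard visual-angle formula, 2 * atan(size / (2 * distance)). The function name and the two example objects (a coin and a tower) are illustrative assumptions, not taken from the lecture.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle (in degrees) subtended by an object of the given
    size, viewed face-on at the given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


# Two very different objects (hypothetical values) that subtend the
# same angle, so the retinal image alone cannot tell them apart.
coin = visual_angle_deg(0.02, 0.5)      # 2 cm coin at 50 cm
tower = visual_angle_deg(40.0, 1000.0)  # 40 m tower at 1 km

print(f"coin:  {coin:.2f} deg")
print(f"tower: {tower:.2f} deg")
```

Both objects have the same size-to-distance ratio, so they subtend the same visual angle (a little over 2 degrees). Size can only be recovered if distance is known, and vice versa.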
6. Indeterminate shape
- Perception assumes that retinal stimulation is not the product of an accident of viewpoint.
  - Helmholtz's likelihood principle
- Frames of reference: objects are seen with reference to other things (e.g. gravity).
- Palmer et al. showed that gravity-based perception can be overridden by a visually present frame of reference.
11. Indeterminate size and distance 1
- Cues in the visual system:
  - Accommodation: the lens in our eye changes shape to bring an object into focus.
  - Convergence: the eyes converge as objects get closer.
  - Stereopsis (retinal disparity): the images on the two retinas differ, and the fused image this produces depends on distance.
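The geometry behind convergence and retinal disparity can be sketched in a few lines. This is a simplified model that assumes the object lies straight ahead on the midline; the eye separation of 6.3 cm is an assumed typical adult value, and the function names are hypothetical.

```python
import math

INTEROCULAR_M = 0.063  # assumed typical adult eye separation


def vergence_angle_deg(distance_m: float,
                       eye_sep_m: float = INTEROCULAR_M) -> float:
    """Angle between the two lines of sight when both eyes fixate a
    point straight ahead at the given distance."""
    return math.degrees(2 * math.atan(eye_sep_m / (2 * distance_m)))


def relative_disparity_deg(fixated_m: float, other_m: float) -> float:
    """Difference in vergence angle between a second point and the
    fixated point. Positive values mean the second point is nearer."""
    return vergence_angle_deg(other_m) - vergence_angle_deg(fixated_m)


for d in (0.25, 1.0, 10.0):
    print(f"{d:>5} m: vergence {vergence_angle_deg(d):.2f} deg")
```

The vergence angle is large for near objects and shrinks rapidly with distance, which is one reason convergence and disparity are most informative at close range.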
14. Indeterminate size and distance 2
- Cues in the environment:
  - Familiar size: this is developed from experience.
  - Interposition
  - Linear perspective
  - Relative height
  - Texture gradients
  - Atmospheric perspective
  - Motion parallax
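Of these cues, motion parallax has a particularly simple geometric form. As a rough sketch, assume an observer moving in a straight line and a stationary object directly to one side: the object then sweeps across the visual field at an angular speed of (observer speed / distance) radians per second. The function name and example values are hypothetical.

```python
import math


def parallax_deg_per_s(observer_speed_m_s: float, distance_m: float) -> float:
    """Angular speed (degrees per second) of a stationary object that
    is directly to the side of an observer moving in a straight line."""
    return math.degrees(observer_speed_m_s / distance_m)


walking_speed = 1.5  # m/s, assumed
fence = parallax_deg_per_s(walking_speed, 2.0)    # fence 2 m away
hill = parallax_deg_per_s(walking_speed, 500.0)   # hill 500 m away

print(f"fence: {fence:.1f} deg/s, hill: {hill:.2f} deg/s")
```

Near objects stream past quickly while distant ones barely move, so relative image speed carries information about relative distance.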
21. Directions of processing
- Perception is interesting in that it connects lower- and higher-level cognitive processes.
- Perception can be affected by top-down (conceptual) processes as well as bottom-up processes.
- The main theories of perception are polarised around these two views.
23. Gibson's direct theory of perception
- Developed from his view that the optic array contains rich information.
- Argues that bottom-up information is sufficient: there is no need for top-down processing.
- This information includes optic flow, texture gradients, and invariants (focus, size constancy).
- Objects are also seen as giving off affordances, and thus as not requiring thought or learning.
26. Perceptual constancy
- We experience some aspects of the environment as being invariant or constant, despite differences in size or colour.
- Colour constancy: the perception that a colour is homogeneous when illumination and reflectance vary.
- Size constancy: the perception of stable object size despite the fact that the size of the object on the retina varies.
- Shape constancy: objects appear to be the same shape despite being viewed from different angles.
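One common way to formalise size constancy is size-distance scaling: the retinal angle is combined with an estimate of distance to recover a stable object size. This is an assumption brought in for illustration (Gibson's account would instead appeal to invariants in the optic array), and the function names are hypothetical.

```python
import math


def retinal_angle_rad(size_m: float, distance_m: float) -> float:
    """Visual angle (radians) subtended by an object viewed face-on."""
    return 2 * math.atan(size_m / (2 * distance_m))


def scaled_size_m(angle_rad: float, distance_m: float) -> float:
    """Object size recovered from a visual angle and a distance estimate."""
    return 2 * distance_m * math.tan(angle_rad / 2)


person_height = 1.8  # metres, assumed
for d in (2.0, 4.0, 8.0):
    angle = retinal_angle_rad(person_height, d)
    size = scaled_size_m(angle, d)
    print(f"{d} m: angle {math.degrees(angle):.1f} deg, scaled size {size:.2f} m")
```

The retinal angle shrinks as the person walks away, but the scaled size stays at 1.8 m as long as the distance estimate is accurate.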
27. Size constancy
Shape constancy
28. Evaluation of Gibson
- Plus
  - Understanding the animal-environment association.
  - Most perception is dynamic, and in such conditions there is more information than previously thought.
- Minus
  - Perception of invariants is not simple, but involves top-down processing.
  - Memory and representation are important in what and how we perceive.
29. Constructivist theories (indirect perception)
- Helmholtz (1821-1894): visual input is insufficient to explain the complexity of perception.
- Perception is active, and is the end result of unconscious inferential processing (empiricist).
- The percept can thus be influenced by a number of factors: motivation, emotion, expectations.
- Evidence for a constructive account of perception comes from illusions, visual completion, ambiguous objects, etc.
33. The Müller-Lyer and Ponzo illusions
34. Synthesizing theories of perception
- Both indirect and direct theories are driven by specific research questions, leading to different theoretical emphases.
- Goodale and Humphrey (1998) provide physiological evidence that distinguishes different streams of visual processing.
- One of these (dorsal) is primed for action; the other stream (ventral) is primed for recognition and reconstruction.
35. Perceptual organisation
- What we perceive is organised along a number of perceptual principles.
- These permit us to perceive and represent a world that is coherent.
- The experience error is a philosophical fault with the direct theory of perception.
- What we experience is organised; it does NOT reflect the array of light exciting our retina.
36. Gestaltism
- The Gestalt psychologists developed the axiom "the whole is greater than the sum of the parts", based on the experience error.
- They argue that our visual world is organised along a set of simple principles (e.g. figure-ground segregation).
- The most common of these is grouping. This occurs in a number of ways: by similarity, good continuation, common fate, proximity, colour, size, closure, symmetry, synchrony, connectedness, parallelism, etc.
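Grouping by proximity is easy to illustrate with a toy example. The sketch below is only an illustration of the principle, not a model proposed by the Gestalt psychologists: dots on a line are placed in the same group whenever the gap to the previous dot is at most `max_gap`, a hypothetical threshold.

```python
from typing import List


def group_by_proximity(positions: List[float], max_gap: float) -> List[List[float]]:
    """Split positions on a line into groups, starting a new group
    whenever the gap to the previous position exceeds max_gap."""
    groups: List[List[float]] = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)  # close enough: same group
        else:
            groups.append([x])    # large gap: start a new group
    return groups


dots = [0, 1, 2, 6, 7, 8, 12, 13]
print(group_by_proximity(dots, max_gap=1.5))
# [[0, 1, 2], [6, 7, 8], [12, 13]]
```

Eight individual dots are described as three clusters, which is closer to what we report seeing than a list of eight separate positions.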
41. Evaluation of Gestaltism
- Plus
  - Perceptual experience provides robust support.
  - The perceptual system in general adheres to their law of Prägnanz.
- Minus
  - Perceptual experiments are not generalisable.
  - Gestalt assumptions did not generate many explanations of cause.
  - The stage at which grouping occurs seems vulnerable to top-down interference.
42. What is it?
It's a Greeble, stupid!
Thanks to Michael Tarr's lab for this image
45. Object recognition
- The world we inhabit contains millions of objects, which can be perceived in an infinite number of ways.
- How do we instantly recognize them, given certain problems?
  - Problems of new perspective
  - Problems of incomplete information (partial occlusion)
  - Problems of new instances of old object categories
46. Object representations
- Allow us to:
  - Identify an object as belonging to a class of objects (e.g. seeing a knife as different from a fork).
  - Discriminate between members of the same class (e.g. the bread knife vs. the carving knife).
  - Interact with the object: representations are used to help us act on the world.
- Object representations can be viewpoint dependent or viewpoint independent.
48. Theories of object recognition
- Feature theories
  - Propose that objects are recognized by their basic features.
  - Features can be combined in any number of ways, so some information also needs to take account of feature relationships.
- Template theories
  - Propose that we store each instance of an object.
  - However, human recognition is good even when an object is a new instance or is viewed in a new orientation, which stored templates struggle to explain.
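The difficulty that a new orientation poses for a template account can be shown with a toy example. This is a minimal sketch, assuming a template is a small grid of filled (`X`) and empty (`.`) cells and that matching means counting agreeing cells; all names are hypothetical.

```python
from typing import List


def rotate_clockwise(grid: List[str]) -> List[str]:
    """Rotate a grid of characters by 90 degrees clockwise."""
    return ["".join(col) for col in zip(*grid[::-1])]


def match_score(image: List[str], template: List[str]) -> float:
    """Fraction of cells on which the image and the template agree."""
    cells = [(a, b)
             for img_row, tpl_row in zip(image, template)
             for a, b in zip(img_row, tpl_row)]
    return sum(a == b for a, b in cells) / len(cells)


TEMPLATE_L = ["X..",
              "X..",
              "XXX"]

upright = match_score(TEMPLATE_L, TEMPLATE_L)
rotated = match_score(rotate_clockwise(TEMPLATE_L), TEMPLATE_L)
print(f"upright: {upright:.2f}, rotated: {rotated:.2f}")
```

The upright letter matches its template perfectly, but the same letter rotated by 90 degrees agrees on only 5 of 9 cells. A pure template store would need a separate template for every orientation and every new instance.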
49. Marr's computational model
- Marr (1982) attempted to model vision in terms of computer programs.
- There are commonly thought to be four stages in developing an image: an image-based stage, a surface-based stage, an object-based stage, and a category-based stage.
- Marr's theory also uses stages, the accumulation of which results in a full representation.
51. Marr's primal sketches (image-based)
- Raw primal sketch: this 2D stage extracts information about contrast (edges, bars, blobs, line terminations) in the image. The result is a set of defining primitive features.
- These are then organised in the full primal sketch. Perceptual organisation is based on Gestalt principles. Two rules are applied: the principle of explicit naming, and the principle of least commitment.
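The idea of extracting contrast primitives can be sketched on a single row of luminance values. This is a deliberate simplification and not Marr's actual operator: it marks an edge wherever neighbouring values differ by at least a hypothetical `threshold`.

```python
from typing import List, Tuple


def find_edges(luminance: List[int], threshold: int) -> List[Tuple[int, str]]:
    """Return (index, polarity) for each pair of neighbouring samples
    whose luminance difference is at least the threshold."""
    edges = []
    for i in range(len(luminance) - 1):
        step = luminance[i + 1] - luminance[i]
        if abs(step) >= threshold:
            polarity = "dark-to-light" if step > 0 else "light-to-dark"
            edges.append((i, polarity))
    return edges


# One image row crossing a bright bar on a dark background (made-up values).
row = [10, 10, 11, 80, 82, 81, 12, 10]
print(find_edges(row, threshold=30))
# [(2, 'dark-to-light'), (5, 'light-to-dark')]
```

The two edges of opposite polarity found close together are the kind of primitives that could then be described as a single "bar" when the full primal sketch is organised.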
52. The 2.5D sketch
- The 2.5D sketch is a surface-based representation. It deals with identifying which parts of a scene belong together.
- This process of identifying surfaces leads to the perception of depth in the image (i.e. 3D).
- Information about depth can be inferred from surface properties such as texture, binocular disparity, shading, motion, etc.
53. The 3D sketch
- The 3D sketch is an object-based representation.
- This stage is viewpoint-invariant rather than viewpoint-centred. This means that information in the representation is sufficient to recognize the object from any viewpoint.
- There are two approaches to an object-based representation: the boundary approach and the volumetric approach.
54. Object recognition
- Marr develops a possible series of stages that lead to a 3D representation. This is not sufficient to recognize the object, however. To do this, the current representation has to be matched with one in memory.
- There are various ways this might be achieved, and Marr considered some, but Biederman (1990) extended and developed these into a theory of his own.
55. Marr and Nishihara (1978)
- Proposed:
  - The basic descriptor for all object parts is a generalised cone.
  - Cones come in a variety of shapes.
  - Hierarchical object representations.
  - Structural descriptions are matched against a store of descriptions for different objects.
- However, the approach is restricted to symmetrical objects and parts.
- Marr died before developing this final stage in object recognition.
58. Biederman's (1987) Recognition-by-components
- RBC, or geon theory, argues that objects can be specified by the spatial arrangement of volumetric primitives (i.e. geons).
- Thus, both stored and current representations are basically sets of structural descriptions.
- Geons form an object alphabet. The 36 geons can be combined to form a complete object.
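To make "sets of structural descriptions" concrete, here is a minimal sketch in which a description is a set of (geon, relation, geon) triples and recognition picks the stored description with the greatest overlap. The mug and bucket entries, the relation labels, and the overlap measure are all illustrative assumptions rather than part of RBC as published.

```python
from typing import Dict, FrozenSet, Tuple

Relation = Tuple[str, str, str]      # (geon, spatial relation, geon)
Description = FrozenSet[Relation]

# Hypothetical memory store: same two geons, different arrangement.
MEMORY: Dict[str, Description] = {
    "mug": frozenset({("curved handle", "attached to side of", "cylinder")}),
    "bucket": frozenset({("curved handle", "attached to top of", "cylinder")}),
}


def overlap(a: Description, b: Description) -> float:
    """Shared triples divided by all triples in either description."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recognise(current: Description,
              memory: Dict[str, Description] = MEMORY) -> str:
    """Name of the stored description that best matches the current one."""
    return max(memory, key=lambda name: overlap(current, memory[name]))


seen = frozenset({("curved handle", "attached to top of", "cylinder")})
label = recognise(seen)
print(label)  # bucket
```

The two stored objects use the same geons and differ only in the spatial relation, which is why the arrangement, and not just the list of parts, has to be part of the description.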
61. How geons are extracted
- Geons are extracted along a number of distinct dimensions:
  - Cross-sectional curvature
  - Symmetry
  - Axis curvature
  - Size variation
- Recognition of these is achieved through non-accidental properties: the assumption that regularities in the visual image reflect properties of the real world rather than accidents of viewpoint.
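The four dimensions above can be combined to show where a set of 36 geons could come from. The attribute values below are an assumption, following the way Biederman's (1987) scheme is usually summarised (two edge types, three symmetry types, three size-variation types, two axis types); the dictionary keys are hypothetical names.

```python
from itertools import product

# Assumed attribute values for each dimension (see the hedge above).
GEON_ATTRIBUTES = {
    "cross_section_edge": ("straight", "curved"),
    "cross_section_symmetry": ("rotational and reflective", "reflective", "asymmetric"),
    "cross_section_size": ("constant", "expanding", "expanding then contracting"),
    "axis": ("straight", "curved"),
}

# Every combination of one value per dimension defines one geon.
geons = [dict(zip(GEON_ATTRIBUTES, values))
         for values in product(*GEON_ATTRIBUTES.values())]

print(len(geons))  # 2 * 3 * 3 * 2 = 36

# Example: a cylinder-like geon under these assumptions.
cylinder = {
    "cross_section_edge": "curved",
    "cross_section_symmetry": "rotational and reflective",
    "cross_section_size": "constant",
    "axis": "straight",
}
print(cylinder in geons)  # True
```

A small number of qualitative contrasts on each dimension is enough to generate the whole alphabet, and each contrast is one that can be read from non-accidental properties of the image.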
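64. Object identification (RBC)
[Flow diagram of processing stages, listed from the top of the diagram down; the arrows are labelled "Top-down" and "Bottom-up".]
- Activation of object modules
- Activation of geons and relations
- Parsing at regions of concavity
- Detection of non-accidental properties
- Edge extraction
- Light transduction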
65. Evaluation of RBC
- How are objects within the same class distinguished if they possess the same geons?
- Most objects can be decomposed in a number of ways.
- Components can be difficult to identify in many natural images, so geons may not be the best approach to segmentation (i.e. not one the visual system uses).