Title: Recognition
1Recognition
- Chapter 5
- Identifying and Classifying
- patterns, objects, faces
2Ch 5 Identifying and Classifying Concepts
COGNITION AND CONSCIOUSNESS: Subliminal Visual Priming
COGNITION AND INDIVIDUAL DIFFERENCES: Right Hemisphere and Self-Recognition
COGNITION AND NEUROSCIENCE: Causal Similarity, Perceptual Similarity, Children's Categorization
- I. Bottom-up Views of Pattern Recognition
- A.Structural-Description-Based (SDB) Approaches
- Pandemonium Model, Marr, RBC
- B.View-Based (VB) Approaches
- Template Matching and Object Recognition
- II. Top-down Processing in Pattern Recognition
- A.The Word-Superiority Effect
- B.A Scene-Superiority Effect
- C.Two-System Views of Object Recognition
- Structural Description and Episodic Systems
- III. Recognizing Faces
- A. The Face-Inversion Effect
- B. Wholistic Processing of Faces
- C. Dissociations/Associations in Face Recognition
- IV. Concepts and Categories
- Similarity, Prototypes, Exemplars
- Explanation-based view on Concept Representation
3Identifying and Classifying Objects
- Pattern, object, and face recognition: how does this remarkable feat happen?
- processes whereby we match a stimulus with stored representations for the purpose of identification
- I. Bottom-up Views of Pattern Recognition
- views of pattern recognition differ primarily in their characterization of bottom-up processes
- basic question: when engaged in object recognition, what are we matching the information to in memory?
- A. Structural-Description-Based (SDB) Approaches
- compare patterns to a structural description in memory, which includes a list of the visual features as well as the relationships between these features (often called feature analysis)
- the representation in memory is not visually or spatially analogous to the pattern being recognized
- compare features of the incoming stimulus to an abstract description of those features in memory
- advantage: the particular orientation or view of the pattern is not important; no matter what the perspective, patterns are broken down into component parts and compared to a structural description in memory (also called viewpoint-independent)
4Pandemonium Model
- feature analysis is carried out by hierarchically organized demons
  - entities that each carry out a different job, culminating in pattern recognition
  - each set of demons passes its information on to the next set for further analysis
- image demons: responsible for the initial encoding of the pattern
- feature demons: analyze the now-encoded stimulus in terms of its component elements (e.g., /, -, and \); a different feature demon looks for each of these elements
- cognitive demons: monitor the work of the feature demons, looking for evidence that supports a specific pattern
  - each cognitive demon represents a different pattern (e.g., A, V, W, B)
  - cognitive demons shout with each piece of evidence (i.e., information from feature demons such as /, -, or \) that supports their designated pattern
  - multiple cognitive demons will be shouting if the to-be-identified pattern has features that support several patterns
  - e.g., the A cognitive demon and the V cognitive demon will both be shouting
- decision demons: assess the shouting of the cognitive demons
- evaluate which one (e.g., A or V) is shouting the loudest (i.e., has detected the most evidence) and use that information to decide on the pattern
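The demon hierarchy above can be sketched in a few lines of code. This is only an illustrative toy, not Selfridge's actual model: the stroke inventory for each letter (`LETTER_FEATURES`) and the function names are assumptions made for the example, and shouting is modeled simply as a count of supporting features.

```python
# Hypothetical stroke inventory for a few letters (an assumption for illustration).
LETTER_FEATURES = {
    "A": {"/", "\\", "-"},
    "V": {"/", "\\"},
    "T": {"|", "-"},
    "L": {"|", "_"},
}

def feature_demons(stimulus_strokes):
    """Each feature demon reports whether its stroke is present in the encoded image."""
    all_features = set().union(*LETTER_FEATURES.values())
    return {f for f in all_features if f in stimulus_strokes}

def cognitive_demons(detected):
    """Each cognitive demon shouts once per detected feature that supports its letter."""
    return {letter: len(feats & detected) for letter, feats in LETTER_FEATURES.items()}

def decision_demon(shouts):
    """The decision demon picks whichever cognitive demon is shouting the loudest."""
    return max(shouts, key=shouts.get)

def recognize(stimulus_strokes):
    detected = feature_demons(set(stimulus_strokes))  # image demon output assumed given
    shouts = cognitive_demons(detected)
    return decision_demon(shouts), shouts

letter, shouts = recognize(["/", "-", "\\"])
print(letter, shouts)
```

With the strokes of an A as input, both the A and V demons shout, but the A demon has more evidence. A pure count like this cannot separate patterns whose features are a subset of another pattern's, which is one reason fuller versions of the model weight the evidence.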
5Pandemonium model
- model fits well with what we know about lower-level perceptual processes
- there are cells in the visual cortex that respond to lines of certain lengths, orientations, etc.
- so it makes sense to characterize pattern recognition as a gradual process of evidence accumulation based on a feature-by-feature analysis of incoming information
- problems
  - limited to explanations of how we recognize simple patterns like letters
  - representational economy: it is unlikely that we have a near-infinite number of feature analyzers to analyze a near-infinite number of features
6Visual Pathways
7Extracting Sensory Messages: Visual Processing in the Brain (pp. 130-131)
- Neural messages travel to the brain via the optic nerve
- The nerve splits at the optic chiasm so that information from the left visual field goes to the right hemisphere and vice versa
- Within each hemisphere, information goes to the lateral geniculate nucleus (90%, in the thalamus) and the superior colliculus (10%, in the midbrain)
- The former deals with colour, texture, and depth
- The latter deals with movement
- Demonstrates parallel processing
- From the lateral geniculate nucleus, information goes to the primary visual cortex (in the occipital lobe)
- Feature detectors: cells in the cortex that react to simple visual stimuli such as edges, lines, and angles (may be orientation specific)
8Feature Detectors in Visual Cortex
9Marr's View
- goal of the visual system is to transform a 2-dimensional retinal image into a 3-dimensional percept that is quickly and easily identified
- processing stages
- first stage: register the information contained in the retinal image
- (note: the specifics of the retinal image depend on the particular view of the object)
- information about light intensity, boundaries, discontinuities, edges, and groupings
- used to form a primal sketch: a rough rendition of the most primitive elements of the object
- information from the primal sketch is used to construct a 2½-D sketch
- includes information about the orientation and relative depth of the visible surfaces, and about discontinuities in depth and orientation
10Marr's View
- information derived from the primal sketch and 2½-D sketch forms the basis for the construction of a 3-D model of the object
- the parts of the 3-D model are termed volumetric primitives, but are basically generalized 3-D cylinders (a pipe-cleaner version of the object we're looking at); i.e., there is less detail
- unlike the primal sketch and the 2½-D sketch (which differ depending on the perspective of the observer), the 3-D model is not view-specific (viewpoint-independent)
11Recognition-by-Components (RBC) Theory
- object recognition is a matter of parsing objects into features
- analyzed features are not simple line segments, angles, and curves; the features are basic 3-D shapes termed geons
- a total of 36 geons serve as visual primitives: simple shapes that can combine to form most other, more complex shapes
- similar to Marr's theory in that it proposes a series of hierarchically arranged stages whereby information about component features is used to identify the object
- edge extraction process: looks for differences in features like texture, luminance, and color, and results in a simple line drawing of the object
  - next, two processes occur simultaneously
  - the detection of nonaccidental features: actual features of the stimulus, rather than some accident of a peculiar observer perspective
  - parsing the object at areas where there appear to be boundaries between the parts of the object
12Recognition-by-Components (RBC) Theory
- next, with the information gained to this point, the components of the figure (the geons) are determined
- this set of components is matched with object representations in memory
- when a match is found, the object is identified
- critical predictions
- objects should become less identifiable the harder it gets to recover their components
13Biederman and Cooper (1991)
- participants were presented with sketches of objects in which 50% of the contours had been deleted
- for some figures, the contour deletion disrupted the figure at the points of segmentation that would be used to carve the object into its component geons
- prediction: object recognition should suffer
- for other figures, the contour deletion did not prevent recovery of the geons (although the same amount of contour was deleted)
- prediction: object recognition should not be affected
- findings were consistent with the RBC theory
- a further prediction: rotation of objects should not hinder recognition
- changes in orientation do not influence the basic components of the object or their relationships to one another
14View-Based (VB) Approaches
- objects are recognized wholistically through a process of comparison with a stored analog
- when a match is found, the pattern is recognized
- termed viewpoint-dependent because identification of an object depends critically on the particular perspective the viewer has
- to identify the object, an image matching this particular view must be found, or the incoming stimulus image must be manipulated in some way (e.g., rotated) until a match is found with images represented in memory
- An Early Attempt: Template Matching
- our store of general knowledge includes a set of templates (copies) of every pattern that we might encounter
- when we encounter a pattern that needs to be identified, the mind quickly rifles through its set of templates; when a match is found, the pattern is given the label stored with the template (i.e., the pattern is recognized)
- even the slightest change in the pattern will lead to a recognition failure
- problems
- too rigid
- lack of economy
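A minimal sketch of strict template matching shows why the approach is "too rigid". The 3x3 character grids and the names `TEMPLATES` and `match_template` are hypothetical, chosen only for illustration; the point is that recognition here requires an exact copy in memory.

```python
# Hypothetical stored templates: each pattern is a tiny grid of filled (#) and empty (.) cells.
TEMPLATES = {
    "T": ("###",
          ".#.",
          ".#."),
    "L": ("#..",
          "#..",
          "###"),
}

def match_template(pattern):
    """Rifle through the stored templates; return the label of an exact match, else None."""
    for label, template in TEMPLATES.items():
        if tuple(pattern) == template:
            return label
    return None

# An exact copy of the stored T is recognized.
print(match_template(("###", ".#.", ".#.")))
# The same T with its stem shifted one cell to the left matches nothing.
print(match_template(("###", "#..", "#..")))
```

Covering every size, position, and orientation of every pattern would require a separate stored template for each, which is the lack-of-economy problem.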
15Modern Versions of the View-Based Approach: Tarr and Pinker (1989)
- taught participants names for shapes; shapes were always presented in the same orientation during training
- during a test phase, these shapes were presented for recognition
- participants responded quickly if the shapes were presented at the same orientation as in training, but were progressively slower as the degree of rotation from that original position increased, indicating that perception is viewpoint-dependent
- after training on the new orientations, participants eventually became equally fast no matter what the orientation
- results parallel the everyday recognition of objects
- we start with one representation of an object (termed the canonical representation), and through experience with objects in many different orientations and viewed from many different perspectives, we develop multiple representations, or views, of the objects
- these multiple views serve as the templates for later recognition
- orientation tends not to affect visual recognition under most circumstances, since everything we must recognize has received extensive exposure from different views
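The multiple-views idea can be illustrated with a toy timing model: response time grows with the angular distance between the test orientation and the nearest stored view. The linear form and the parameter values (`base_ms`, `ms_per_degree`) are assumptions for illustration only, not estimates from Tarr and Pinker's data.

```python
def angular_distance(a, b):
    """Smallest rotation (in degrees) needed to bring orientation a onto orientation b."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def recognition_time(test_angle, stored_views, base_ms=500.0, ms_per_degree=2.0):
    """Hypothetical response time: a base cost plus a cost per degree of
    rotation away from the nearest view already stored in memory."""
    nearest = min(angular_distance(test_angle, v) for v in stored_views)
    return base_ms + ms_per_degree * nearest

# Early in learning: only the trained (canonical) view at 0 degrees is stored.
print(recognition_time(90, stored_views=[0]))
# After experience with several orientations, a stored view is always nearby.
print(recognition_time(90, stored_views=[0, 90, 180, 270]))
```

In this sketch, adding stored views flattens the rotation cost, which mirrors the finding that practiced orientations are eventually recognized equally fast.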
16Logothetis, Pauls, and Poggio (1995)
- taught monkeys to recognize novel 3-D objects from a variety of different perspectives; like humans, they eventually became equally proficient at recognizing these objects at any of the rotations
- neural activity associated with recognition: different sets of cells responded most strongly to certain objects, indicating that certain networks were devoted to certain objects
- a given set of cells responded most strongly when that object appeared in the same orientation as it had during training; the responses decreased systematically with increases in rotation from that perspective
- monkeys seemed to have what might be termed physiological templates that were devoted to recognizing a specific object in a specific orientation
17Object Recognition: Views or Structural Descriptions?
- basic question: is object recognition truly independent of the particular view? There is support for both positions
- Tarr and Bülthoff (1995) suggest that object recognition should be conceived of as a continuum
- at one end are heavily viewpoint-dependent mechanisms that are used for making subtle discriminations among similar exemplars (distinguishing a finch from a sparrow)
- explains Tarr and colleagues' results: the stimuli used required fairly subtle discriminations
- object recognition appears to be viewpoint-dependent
- at the other end are heavily viewpoint-independent mechanisms that are used when gross categorical judgments are required (distinguishing a hammer from a sparrow)
- explains Biederman and colleagues' results: the stimuli used required fairly gross discriminations
- object recognition appears to be viewpoint-independent
18II. Top-down Processing in Pattern Recognition
- A. The Word-Superiority Effect
- Reicher (1969)
- participants were briefly presented with letter strings that either did or did not form a word (e.g., WORK or OWRK, respectively)
- following a rapid display of such a letter string, participants were queried about the component letters
- two alternatives were presented, and participants were to pick the one they had just seen
- identification accuracy was higher if the letter had been presented in the context of a word than if it had been presented in the context of a non-word
- if letter identification had been based solely on bottom-up processing, then letter identification accuracy should have been equivalent in the two conditions
- the data that make up the letter D do not vary depending on the context in which D appears; a D is a D
19An Interactive Approach to Word Recognition
- the word-superiority effect is the result of an interplay between bottom-up processes and top-down processes
- McClelland and Rumelhart's (1981) model assumes that words are represented in our mental dictionaries at three different levels: features, letters, and whole words
- each type of information about a word is analyzed simultaneously, and information about the word's identity accumulates
- example: the letter A
- activation of the A detector will excite nodes representing words that have an A, but inhibit nodes representing words that don't have an A (bottom-up)
- activation of the A detector will excite nodes representing features that are part of the letter A (e.g., slanted lines), but inhibit nodes representing features that are not part of the letter A (e.g., a curved line) (top-down)
20An Interactive Approach to Word Recognition
- when we read a word (e.g., work), information about the component letters leads to the activation of representations at the word level that include these letters
- this heightened activation at the word level then feeds back to the letter level, enhancing activation of the component letters of the activated words
- but only the letters actually in the word (e.g., work) are also receiving bottom-up (feature-based) activation; therefore, evidence for these particular letters will accumulate the fastest, facilitating their identification
- nonwords don't have representations at the word level, so the letters in a nonword (e.g., owrk) don't receive this additional top-down activation; therefore, identifying a letter in a nonword will be slower
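The feedback loop can be sketched as a single bottom-up/top-down cycle. This is a heavily simplified illustration, not McClelland and Rumelhart's model: the four-word `LEXICON`, the `feedback` weight, and the use of position-by-position letter overlap as word activation are all assumptions, and inhibition is left out entirely.

```python
# Hypothetical mini-lexicon standing in for the mental dictionary.
LEXICON = ["WORK", "WORD", "FORK", "WEAK"]

def letter_activation(string, position, bottom_up=1.0, feedback=0.5):
    """Activation of the letter at `position` after one bottom-up/top-down cycle."""
    # Bottom-up: each word node is excited in proportion to the letters
    # of the input that match it position by position.
    word_activation = {
        w: sum(1 for a, b in zip(w, string) if a == b) / len(w)
        for w in LEXICON
        if len(w) == len(string)
    }
    # Top-down: active word nodes feed activation back to the letter they
    # contain at this position.
    target = string[position]
    top_down = sum(act for w, act in word_activation.items() if w[position] == target)
    return bottom_up + feedback * top_down

# The final K receives identical feature-level input in both strings,
# but more word-level feedback when it appears in a real word.
print(letter_activation("WORK", 3), letter_activation("OWRK", 3))
```

Unlike the all-or-none description above, this sketch lets a nonword partially activate similar words, so its letters still get a little feedback; the word simply gets more.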
21A Scene-Superiority Effect
- identifying an object within a scene is facilitated when the object is consistent with the scene (e.g., a refrigerator in a kitchen), relative to when it is inconsistent (e.g., a refrigerator on a farm)
- Biederman, Mezzanotte, and Rabinowitz (1982)
- participants saw the name of a common object, followed by a real-world scene, and then a mask that included a location cue
- participants were to determine whether the named object had appeared at that location in the scene
- detection performance was better when the object was consistent with the scene than when it was inconsistent
22Hollingworth and Henderson (1998)
- a subtle response bias may have been at play in the Biederman procedure
- participants were told what to look for in the scene prior to the scene's presentation; this expectation may have influenced the participants' responses
- seeing "sheep" might lead to an expectation of a farm scene
- if that scene is indeed presented, then participants will have a bias to respond yes
- when an inconsistent scene is presented (e.g., an office), the participants will demand a lot of visual evidence (i.e., bottom-up information) before they'll acknowledge that a sheep was in the office; in other words, they will have a bias to respond no
- employed a procedure much like the one used by Reicher (1969) to investigate word superiority
- after presentation of a scene (e.g., a farm), participants were given a forced-choice recognition test in which two alternatives were presented
- either both scene-consistent (sheep, pig) or both scene-inconsistent (coffee maker, mixer)
- participants had to pick the one that was in the scene
23Hollingworth and Henderson (1998)
- predictions
- if scene context facilitates recognition, then in the scene-consistent trials, participants should successfully choose the correct option a majority of the time
- if the scene-superiority effect had been due to a guessing bias, then this procedure should eliminate the effect
- when faced with two alternatives that both fit the context, neither item will have an advantage; they're both likely choices
- no scene-superiority effect was found
- participants were just as good at choosing which of two scene-inconsistent objects had occurred as they were at choosing which of two scene-consistent objects had occurred
- the authors conclude that scene context does not facilitate the recognition of scene components
- the processes underlying object perception seem to be isolated from information about objects that might occur in any given scene
- the lack of an effect of contextual knowledge on object recognition is probably a good thing
- if identification of objects involved consulting general knowledge about objects both relevant and irrelevant to the scene, it would get bogged down; top-down processing would be an unnecessary drag on the system
24Hollingworth and Henderson (1998), cont.
- so why does top-down information facilitate the recognition of letters?
- our experience with any given natural scene is not as extensive as our experience with a given word
- the letters that can appear in a word are much more constrained by context (i.e., context provides more information about the component letters of a word) than are the objects that can appear in a scene
- Two-System Views of Object Recognition
- objects are represented in two separate systems: a structural description system and an episodic system
- the structural description system includes information about the global shape of an object, as well as the relationships among the object's parts
- the episodic system encodes semantic (i.e., meaning) and visual information about objects, such as their identity, their function, and specifics of their visual presentation
25Cooper and Shepard (1992)
- encoding phase: participants are asked to make a judgment about possible and impossible objects
- possible objects are ones whose surfaces and edges are configured such that they could exist in a 3-D world
- impossible objects are those that could not exist in three dimensions
- two encoding tasks
- global structure task: decide whether the object faced right or left
- induces participants to encode the global properties of the objects and the interrelationships between the parts
- meaningful properties task: name something that each object resembled
- requires a meaningful elaboration of the object
- later phase: participants are given two different object recognition tests
- in one test, they are asked whether they saw each figure earlier
- in the other test, they are asked to rapidly classify a series of objects as possible or impossible
- some of these objects had been presented earlier
26
- dependent variable is whether participants have an easier time making the possible-impossible judgment for figures that they had seen before
- the benefit from having seen an object before is called priming
- results
- if participants had to recognize the object as one they had seen before, the most effective encoding was to think about it in meaningful terms (i.e., the meaningful properties task)
- foreshadows the levels-of-processing effect: we remember material better when it's processed in terms of its meaning rather than its physical structure
- if participants had to rapidly classify objects as possible or impossible, the most effective encoding was to notice the global structure of the object (i.e., the global structure task)
27Structural Description System
- includes a stored representation of the overall structure of objects and is used as the basis for their rapid recognition
- subserved performance on the speeded possible-impossible classification test
- the only encoding method that primed the representation in this system was the global structure task (which induced participants to notice the overall structure of objects)
- However, rapid recognition of impossible objects was not aided by the global structure encoding task
- Suggestion: the structural description system can't recognize impossible objects?
- based on bottom-up processing; this system contains the actual data that we analyze
28Episodic System
- includes a stored representation of the identity of the object and its distinctive physical characteristics; this system is the basis for our memory of and knowledge about objects
- subserved performance on the recognition test; because this system stores information about what objects are, recognition was most affected by the encoding task that required participants to say what each object resembled
- based on top-down processing; this system includes our conceptual knowledge about the objects that we need to identify
- global judgments primed representations in the structural description system, but only for possible objects; the rapid recognition of impossible objects showed no benefit from the global structure judgment
- the structural description system is incapable of representing objects that could not exist in three dimensions
- indicates the importance of top-down processing in object recognition: even the structural description system, which provides the data for bottom-up processing, is influenced by our previous experience
29III. Recognizing Faces
- The Face-Inversion Effect
- the deleterious effect of inversion is disproportionately great for faces compared to other objects
- to recognize objects, we need first-order relational information: information about the parts of an object and how those parts relate to one another
- first-order relational information is not enough to recognize faces; noticing that two eyes are above the nose, which is above the mouth, may be enough to recognize that something is a face, but doesn't allow for recognition of whose face it is
- to recognize faces, we need second-order relational information
- second-order relational information involves comparing the first-order analysis to the facial features of a typical or average face
- this typical face is built up through experience and serves as an implicit standard against which we compare faces that we see
- inverting a face disrupts the encoding of second-order relational information; therefore, inversion disproportionately harms face recognition
30Diamond and Carey (1986)
- in addition to replicating the basic inversion effect with human faces, they also investigated recognition of dog faces
- compared dog experts with dog non-experts
- dog experts are so experienced with dogs that they encode dog faces in terms of second-order relational properties
- inversion should have adverse effects on the dog non-experts' recognition of human faces, but not dog faces
- inversion should have adverse effects on dog experts' recognition of both human faces and dog faces
- predicted results were obtained
- Wholistic Processing of Faces
- faces are encoded, stored, and retrieved from memory as whole configurations, rather than as a set of features or parts
31Tanaka and Farah (1993)
- to the degree that a given object is stored as a set of features, those features ought to be useful cues in retrieving the remaining information about the object
- if an object is stored as a whole configuration, then presenting part of that whole will not be particularly helpful in recognition
- presented participants with sketches of faces and sketches of houses, both decomposable in terms of distinct features; each face and house was given a label, such as "Larry's house" or "Larry's face"
- on a later recognition test, participants were asked about the faces and houses
- isolated-part condition: given a choice of two object parts, participants had to pick which one had been part of an earlier-presented object (e.g., "Which of these is Larry's nose?" or "Which of these is Larry's door?")
- whole-object condition: given a choice of two whole objects, participants had to pick out the one they had seen earlier (e.g., "Which of these is Larry's face?" or "Which of these is Larry's house?")
32Tanaka and Farah (1993)
- results
- the type of question asked didn't matter for recognition of houses; participants were just as good at recognizing parts of houses as they were at recognizing whole houses
- for faces, the type of question did matter; participants were not as good at recognizing face parts as they were at recognizing whole faces
- face recognition is similar to view-based approaches to pattern/object recognition
33Dissociations and Associations in Face Recognition
- some suggest at least two subsystems for recognizing letters, objects, and faces, rather than a special mechanism for face recognition
- alexia: deficits in the ability to recognize printed words
- object agnosia: deficits in the ability to recognize everyday objects
- prosopagnosia: an inability to recognize familiar faces
- There are associations and dissociations among these three disorders
- deficits in recognizing faces are often found in association with at least some deficits in recognizing objects
- deficits in recognizing letters/words are often found in association with at least some deficits in recognizing objects
- these two findings indicate that object recognition shares some mechanisms with both letter/word and face recognition
34Dissociations and Associations in Face Recognition
- deficits in object recognition with spared face and letter/word processing, or the opposite pattern (deficits in face and letter/word processing with spared object recognition), are rarely found
- the fact that face recognition and letter/word recognition are almost never spared together or impaired together indicates that they rely on quite distinct mechanisms
- the fact that problems in object recognition quite often line up with problems in either face or letter recognition indicates that object recognition relies on some of the mechanisms required for each
- visual recognition involves two primary mechanisms
- one mechanism is used for the representation/combination of parts
- important for letter/word recognition; not important for face recognition
- a second mechanism is used for the representation/combination of complex wholes
- important for face recognition; not important for letter/word recognition
- a combination of both is important for object recognition