A neglected problem in the computational theory of mind: Object Tracking and the Mind-World gap


Description:

Title: Lecture 1. Connecting vision and the world: An empirical introduction. Author: Zenon Pylyshyn. Last modified by: Zenon Pylyshyn.


Learn more at: https://ruccs.rutgers.edu


Transcript and Presenter's Notes


1
A neglected problem in the computational theory
of mind: Object Tracking and the Mind-World gap

Zenon Pylyshyn
Rutgers Center for Cognitive Science

2
Before I begin I would like you to see a video
game that will figure in the last part of my talk

The demonstration shows a task called Multiple
Object Tracking
Track the initially-distinct (flashing) items
through the trial (here 10 secs) and indicate at
the end which items are the targets
After each example I'd like you to ask yourself,
"How do I do it?"
If you are like most of our subjects you will
have no idea, or a false idea

3
Keep track of the objects that flash
4
How do we do it? What properties of individual
objects do we use?
5
Going behind occluding surfaces does not disrupt
tracking
Scholl, B. J., & Pylyshyn, Z. W. (1999). Tracking
multiple items through occlusion: Clues to visual
objecthood. Cognitive Psychology, 38(2), 259-290.
6
Not all well-defined features can be tracked
Track endpoints of these lines
Endpoints move exactly as the squares did!
8
The basic problem of cognitive science

What determines our behavior is not how the world
is, but how we represent it as being
As Chomsky pointed out in his review of Skinner,
if we describe behavior in relation to the
objective properties of the world, we would have
to conclude that behavior is essentially
stimulus-independent
Every naturally-occurring behavioral regularity
is cognitively penetrable
Any information that changes beliefs can
systematically and rationally change behavior

9
Representation and Mind: Why representations are
essential

Do representations only come into play in higher
level mental activities, such as reasoning?
Even at early stages of perception many of the
states that must be postulated are
representations (i.e. what they are about plays a
role in explanations).

10
Examples from vision (1): Intrapercept
constraints
Epstein, W. (1982). Percept-percept couplings.
Perception, 11, 75-83.
11
Examples from vision (2): The Poggendorff illusion
depends on perceived contours; they need not be
physical edges
12
The rules of color mixing apply to perceived color

Red light and yellow light mix to produce
orange light
This law holds regardless of how the red light
and yellow light are produced
The yellow may be light of 580 nanometer
wavelength, or it may be a mixture of light of
530 nm and 650 nm wavelengths.
So long as one light looks yellow and the other
looks red the law will hold: the mixture will
look orange.

13
Another example of a classical representation
14
Other forms of representation.

Lines FG, BC are parallel and equal.
Lines EH, AD are parallel and equal.
Lines FB, GC are parallel and equal.
Lines EA, HD are parallel and equal.
Vertices EF, HG, DC and AB are joined....
Part-Of(Cube, Top-Face(EFGH), Bottom-Face(ABCD),
Front-Face(FGCB), Back-Face(EHDA))
Part-Of(Top-Face, Front-Edge(FG), Back-Edge(EH),
Left-Edge(EF), Right-Edge(HG)), ...
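
A minimal sketch of the kind of symbolic description listed above, assuming a plain Python dictionary as the encoding (the dictionary layout and the helper names `edges` and `shared_edge` are illustrative assumptions, not part of the original slides). It shows that such a description supports inference, for example recovering which edge two faces share, while the labels themselves remain the theorist's.

```python
# Hypothetical encoding of the Part-Of description: each face lists its
# vertices in order around the face, using the labels from the slide.
cube = {
    "Top-Face": ("E", "F", "G", "H"),
    "Bottom-Face": ("A", "B", "C", "D"),
    "Front-Face": ("F", "G", "C", "B"),
    "Back-Face": ("E", "H", "D", "A"),
}


def edges(face):
    """Return the edges of a face as unordered pairs of consecutive vertices."""
    n = len(face)
    return {frozenset((face[i], face[(i + 1) % n])) for i in range(n)}


def shared_edge(face_a, face_b):
    """Return the set of edges that two named faces have in common."""
    return edges(cube[face_a]) & edges(cube[face_b])


# The top and front faces share edge FG, matching Front-Edge(FG) above.
top_front = shared_edge("Top-Face", "Front-Face")
```

Nothing in this structure says what "E" or "Top-Face" refers to in the world; the inference runs entirely over the symbols.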

15
What's wrong with this picture?

What's wrong is that the Computational Theory of
Mind (CTM) is incomplete: it does not address a
number of fundamental questions
It fails to specify how representations connect
with what they represent. It's not enough to use
English words in the representation (that's been
a common confusion in AI) or to draw pictures (a
common confusion in theories of mental imagery)
English labels and pictures may help the theorist
recall which objects are being referred to
But what makes it the case that a particular
mental symbol refers to one thing rather than
another?
How are concepts grounded? (Symbol Grounding
Problem)

16
Another way to look at what the Computational
Theory of Mind lacks

The missing function in the CTM is a mechanism
that allows perception to refer to individual
things in the visual field directly and
nonconceptually
Not as "whatever has properties P1, P2, P3, ...",
but as a singular term that refers directly to an
individual and does not appeal to a
representation of the individual's properties.
Such a reference is like a proper name or a
pointer in a computer data structure, or like a
demonstrative term (like "this" or "that") in
natural language.
Note that in a computer a pointer does not refer
via a location, despite what the term
pointer suggests
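
The contrast between descriptive and direct reference can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the `Item` class, the two-item scene and the helper `pick_by_description` are hypothetical, and a Python object reference stands in for the pointer mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    color: str
    shape: str


def pick_by_description(items, color, shape):
    """Descriptive reference: return every item that satisfies the description."""
    return [it for it in items if it.color == color and it.shape == shape]


# Two items that share all encoded properties.
scene = [Item("red", "square"), Item("red", "square")]

# The description is ambiguous: both items fit it.
before = pick_by_description(scene, "red", "square")

# Direct reference: a name bound to one individual, with no properties consulted.
this = scene[1]

# The individual changes its appearance.
this.color = "green"

# The old description no longer picks it out, but the direct reference still does.
after = pick_by_description(scene, "red", "square")
still_same_individual = this is scene[1]
```

The reference `this` neither encodes the item's properties nor depends on them staying fixed, which is the feature the analogy is meant to bring out.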

17
An example from personal history: Why we need to
pick out individual things without referring to
their properties

We wanted to develop a computer system that would
reason about geometry by actually drawing a
diagram and noticing adventitious properties of
the diagram from which it would conjecture lemmas
to prove
We wanted the system to be as psychologically
realistic as possible so we assumed that it had a
narrow field of view and noticed only limited,
spatially-restricted information as it examined
the drawing
This immediately raised the problem of
coordinating noticings and led us to the idea of
visual indexes to keep track of previously
encoded parts of the diagram.

18
Begin by drawing a line.
L1
19
Now draw a second line.
L2
20
And draw a third line.
L3
21
Notice what you have so far (noticings are local:
you encode what you attend to)
L1
V6
L2
There is an intersection of two lines. But which
of the two lines you drew are they? There is no
way to indicate which individual things are seen
again without a way to refer to individual
(token) things
22
Look around some more to see what is there.
L5
L2
V12
Here is another intersection of two lines. Is it
the same intersection as the one seen
earlier? Without a special way to keep track of
individuals the only way to tell would be to
encode unique properties of each of the lines.
Which properties should you encode?
23
In examining a geometrical figure one only gets
to see a sequence of local glimpses
24
The incremental construction of visual
representations requires solving a correspondence
problem over time

We have to determine whether a particular
individual element seen at time t is identical to
another individual element seen at some previous
time. This is one manifestation of the
correspondence problem.
Solving the correspondence problem is equivalent
to picking out and tracking the identity of token
individuals as they change their appearance,
their location or the way they are encoded or
conceptualized
To do that we need the capacity to refer to token
individuals (I will call them objects) without
doing so by appealing to their properties. This
requires a special form of demonstrative
reference I call a Visual Index.
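
The role of an index in the correspondence problem can be illustrated with a toy sequence of glimpses. This is a sketch under assumptions: the tuple format, the integer index tokens (borrowed from the V6 and V12 labels in the drawing example) and the function `revisits` are hypothetical.

```python
# Each glimpse records what kind of thing was noticed and, if indexes are
# available, a token standing for the individual that was noticed.
glimpses = [
    ("intersection", 6),
    ("intersection", 12),
    ("intersection", 6),
]


def kinds_seen(glimpses):
    """What a purely descriptive record retains: the types that were noticed."""
    return {kind for kind, _index in glimpses}


def revisits(glimpses):
    """For each glimpse, report whether its individual was seen before."""
    seen = set()
    result = []
    for _kind, index in glimpses:
        result.append(index in seen)
        seen.add(index)
    return result


# The descriptions alone cannot say whether one, two or three intersections
# were seen; the index tokens settle it.
same_again = revisits(glimpses)
```

With only the type descriptions, every glimpse looks alike; with the index tokens, the third glimpse is recognized as a return to the first individual.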

25
A note about the use of labels in this example

There are two purposes for figure labels. One is
to specify what type of individual it is (line,
vertex,..). The other is to specify which
individual it is so it is individuated and thus
can be selected or bound to the argument of a
predicate.
The second of these is what I am concerned with
because indicating which individual it is is
essential in vision.
Many people (e.g., Marr, Yantis) have suggested
that individuals may be marked by tags, but that
won't do, since one cannot literally place a tag
on an object, and even if we could it would not
obviate the need to individuate and index, just
as labels don't help.
Labeling things in the world is not enough
because to refer to the line labeled L1 you would
have to be able to think "this is line L1", and
you could not think that unless you had a way of
first picking out the referent of "this".

26
The difference between a direct (demonstrative)
and a descriptive way of picking something out
has produced many "You are here" cartoons.
It is also illustrated in this recent New Yorker
cartoon

27
The difference between descriptive and
demonstrative ways of picking something out
(illustrated in this New Yorker cartoon by
Sipress)
28
Picking out

Picking out entails individuating, in the sense
of separating something from a background (what
Gestalt psychologists called a figure-ground
distinction)
This sort of picking out has been studied in
psychology under the heading of focal or
selective attention.
Focal attention appears to pick out and adhere to
objects rather than places
In addition to a unitary focal attention there is
also evidence for a mechanism of multiple
references (about 4 or 5), that I have called a
visual index or a FINST
Indexes are different from focal attention in
many ways that we have studied in our laboratory
(I will mention a few later)
A visual index is like a pointer in a computer
data structure: it allows access but does not
itself tell you anything about what is being
pointed to

29
The requirements for picking out and keeping
track of several individual things reminded me of
an early comic book character called Plastic Man
30
Imagine being able to place several of your
fingers on things in the world without
recognizing their properties while doing so. You
could then refer to those things (e.g. what
finger 2 is touching) and could move your
attention to them. You would then be said to
possess FINgers of INSTantiation (FINSTs)
31
FINST Theory postulates a limited number of
pointers in early vision that are elicited by
certain events in the visual field and that
enable vision to refer to those things without
doing so under a concept or a description
32
FINSTs and Object Files form the link between the
world and its conceptualization
The only nonconceptual contents in this picture
are FINST indexes!
Object File contents are conceptual!
33
Summarizing FINSTs

A FINST is a primitive reference mechanism that
normally references individual visible objects in
the world. Only a small number of FINSTs (4-5)
are available at any one time.
Objects are picked out and referred to without
using any encoding of their properties, including
their location.
Picking out objects is prior to encoding any
properties!
Indexing is nonconceptual because it does not
represent an individual as a member of some
conceptual category.
An important function of FINST indexes is to bind
arguments of visual predicates to things in the
world to which they refer. Only predicates with
bound arguments can be evaluated. Since
predicates are quintessential concepts, an index
serves as a bridge from nonconceptual to
conceptual representations.
Similarly, they can bind arguments of motor
commands, including the command to move focal
attention or gaze to the indexed object, e.g.,
MoveGaze(x)
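
The summary above can be made concrete with a toy model. The sketch below assumes a fixed pool of four indexes and a hypothetical `IndexPool` class with `grab`, `release` and `evaluate` methods; none of these names come from the original slides, and the model says nothing about how indexes are actually implemented in vision.

```python
class IndexPool:
    """A toy pool of at most `capacity` indexes (assumed here to be 4)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        # Maps an index number to the object itself, not to a description of it.
        self.bound = {}

    def grab(self, obj):
        """Bind a free index to obj; return its number, or None if none is free."""
        for i in range(1, self.capacity + 1):
            if i not in self.bound:
                self.bound[i] = obj
                return i
        return None

    def release(self, i):
        """Free index i so that it can be bound to another object."""
        self.bound.pop(i, None)

    def evaluate(self, predicate, *indexes):
        """Evaluate a predicate only if every argument is bound to an object."""
        if any(i not in self.bound for i in indexes):
            raise LookupError("predicate has an unbound argument")
        return predicate(*(self.bound[i] for i in indexes))


def above(a, b):
    """Illustrative two-place predicate over objects with a height 'y'."""
    return a["y"] > b["y"]
```

A usage example: after `i = pool.grab(obj_a)` and `j = pool.grab(obj_b)`, the call `pool.evaluate(above, i, j)` evaluates Above over the two indexed objects. A fifth `grab` on a full pool returns `None`, reflecting the limited number of indexes.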

34
A note on terminology

A FINST provides a reference to an individual
visible thing
I sometimes call this referent a FING by analogy
with FINST, and sometimes an object to conform
with usage in psychology, but FINGs are
nonconceptual so they do not pick out something
as an object, because OBJECT is a concept. Maybe
proto-object?
I have also called it a pointer, but that
erroneously suggests that it points to the
location of an object, as opposed to the object
itself. In a computer, a pointer is the name of
a stored datum.
I have said that a FINST is a visual
demonstrative like "this" or "that", but that too
is misleading because the reference of a
demonstrative depends on the intentions of the
speaker
I have also noted that a FINST is like a proper
name, but that won't do since a name can pick out
something not in sensory contact whereas a FINST
can only refer to a visible item (or one that is
briefly out of sight).

35
A quick tour of some evidence for FINSTs

The correspondence problem
The binding problem
Evaluating multi-place visual predicates
(recognizing multi-element patterns)
Operating over several visual elements at once
without having to search for them first
Subitizing
Subset search
Multiple-Object Tracking
Cognizing space without requiring a spatial
display in the head

37
Individual objects and the binding problem

We can distinguish scenes that differ by
conjunctions of properties, so early vision must
somehow keep track of how properties co-occur:
conjunction must not be obscured. This is
called the binding problem
The most common proposal is that vision keeps
track of properties according to their location
and binds together co-located properties.

38
The proposal of binding conjunctions by the
location of conjuncts does not work when feature
location is not punctate and becomes even more
problematic if they are co-located, e.g., if
their relation is "inside"
39
Pandemonium: An early architecture, proposed
by Oliver Selfridge in 1959. This idea
continues to be at the heart of many
psychological models, including ones implemented
in contemporary connectionist or neural net
models.
40
Binding as object-based

The proposal that properties are conjoined by
virtue of their common location has many problems
In order to assign a location to a property you
need to know its boundaries, which requires
distinguishing the object that has those
properties from its background (figure-ground
individuation)
Properties are properties of objects, not of
locations, which is why properties move when
objects move. Empty locations have no causal
properties.
The alternative to conjoining-by-location is
conjoining by object. According to this view,
solving the binding problem requires first
selecting individual objects and then keeping
track of each object's properties (in its object
file)
If only properties of selected objects are
encoded and if those properties are recorded in
object files specific to each object, then all
conjoined properties will be recorded in the same
object file, thus solving the binding problem
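
A minimal sketch of why conjoining by object solves the binding problem, assuming scenes are given as lists of per-object property sets (the scene format and the function names `feature_bag` and `object_files` are illustrative assumptions).

```python
def feature_bag(scene):
    """Pool all features, discarding which object each one belongs to."""
    return sorted(feature for obj in scene for feature in obj)


def object_files(scene):
    """One file per selected object, so conjoined properties stay together."""
    return {index: frozenset(obj) for index, obj in enumerate(scene, start=1)}


# Two scenes that contain the same features in different conjunctions.
scene_a = [{"red", "square"}, {"green", "circle"}]
scene_b = [{"red", "circle"}, {"green", "square"}]

# Without binding, the two scenes are indistinguishable.
same_features = feature_bag(scene_a) == feature_bag(scene_b)

# With one file per object, the difference in conjunctions is preserved.
different_files = (
    set(object_files(scene_a).values()) != set(object_files(scene_b).values())
)
```

Note that no locations appear anywhere in this sketch: the properties are grouped by the object they were recorded from, which is all that the conjunction judgment requires.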

41
Attention spreads over perceived objects
(Figure labels, one pair per panel: Spreads to B
and not C; Spreads to C and not B)
Using a priming method, Egly, Driver, & Rafal
(1994) showed that the effect of a prime spreads
to other parts of the same visual object more
than to equally distant parts of different
objects.
43
Being able to pick out and refer to individual
distal elements is essential for encoding patterns

Encoding relational predicates, e.g.,
Collinear(x,y,z,..), Inside(x, C), Above(x,y),
Square(w,x,y,z), requires simultaneously binding
the arguments of n-place predicates to n elements
in the visual scene
Evaluating such visual predicates requires
individuating and referring to the objects over
which the predicate is evaluated, i.e., the
arguments in the predicate must be bound to
individual elements in the scene.

44
Several objects must be picked out at once in
making relational judgments
When we judge that certain objects are
collinear, we must first pick out the relevant
objects while ignoring their properties
45
Several objects must be picked out at once in
making relational judgments

The same is true for other relational judgments
like "inside" or "on-the-same-contour", etc. We
must pick out the relevant individual objects
first.
Are the dots inside the same contour? On the same
contour?

47
More functions of FINSTs: Further experimental
explorations using different paradigms

Recognizing the cardinality of small sets of
things: subitizing vs counting (Trick, 1994)
Searching through subsets: selecting items to
search through (Burkell, 1997)
Selecting subsets and maintaining the selection
during a saccade (Currie, 2002)
Application of FINST index theory to infant
cardinality studies (Carey, Spelke, Leslie,
Uller, etc.)
Indexes explain how children are able to acquire
words for objects by ostension without suffering
Quine's Gavagai problem.

48
Signature subitizing phenomena only appear when
objects are automatically individuated and indexed
(Figure labels: counting slope, subitizing slope)
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are
small and large numbers enumerated differently? A
limited capacity preattentive stage in vision.
Psychological Review, 101(1), 80-102.
49
Subitizing results

There is evidence that a different mechanism is
involved in enumerating small (n < 4) and large
(n > 4) numbers of items (even different brain
mechanisms: Dehaene & Cohen, 1994)
Rapid small-number enumeration (subitizing) only
occurs when items are first (automatically)
individuated
Subitizing is not affected by precuing location
while counting is
Subitizing is insensitive to distance among
items
Our explanation for what is special about
subitizing is that once FINST indexes are
assigned to n < 4 individual objects, the objects
can be enumerated without first searching for
them. In fact they might be enumerated simply by
counting active indexes, which is fast and
accurate because it does not require visual
scanning
Trick, L. M., & Pylyshyn, Z. W. (1994).
Why are small and large numbers enumerated
differently? A limited capacity preattentive
stage in vision. Psychological Review, 101(1),
80-102.
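
The two-slope pattern described above can be written as a toy piecewise-linear model of enumeration time. All numbers here (the base time, the two slopes, and a capacity of 4 treated as the last subitized value) are placeholder assumptions chosen for illustration, not values taken from Trick & Pylyshyn (1994).

```python
def enumeration_rt(n, capacity=4, base=400.0, subitize_slope=50.0, count_slope=300.0):
    """Toy reaction time in ms for enumerating n items.

    Up to `capacity` items are assumed to be indexed at once and enumerated
    by counting active indexes (shallow slope). Beyond that, each extra item
    is assumed to need a serial visit (steep slope).
    """
    if n <= capacity:
        return base + subitize_slope * n
    return base + subitize_slope * capacity + count_slope * (n - capacity)


# Cost of one more item inside and outside the assumed subitizing range.
cost_inside = enumeration_rt(4) - enumeration_rt(3)
cost_outside = enumeration_rt(6) - enumeration_rt(5)
```

The only point of the sketch is the shape of the function: a shallow segment while indexes suffice, followed by a steeper segment once items must be found by scanning.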

50
Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching
through subsets: A test of the visual indexing
hypothesis. Spatial Vision, 11(2), 225-258.
51
Subset search results

Only properties of the subset matter but note
that properties of the entire subset are taken
into account simultaneously (since that is what
distinguishes a feature search from a conjunction
search)
If the subset is a single-feature search it is
fast and the slope (RT vs number of items) is
shallow
If the subset is a conjunction search set, it
takes longer and is more sensitive to the set
size
As with subitizing, the distance between targets
does not matter, so observers don't seem to be
scanning the display looking for the target

52
The stability of the visual world entails the
capacity to reidentify individuals after a saccade

There is no problem about how tactile selection
can provide a stable world when you move around
while keeping your fingers on the same objects
because in that case retaining individual
identity is automatic
But with FINSTs the same can be true with vision
for a small number of visual objects
This is compatible with the fact that it appears
one retains the relative location of only about 4
elements during saccadic eye movements (Irwin,
1996)
Irwin, D. E. (1996). Integrating
information across saccadic eye movements.
Current Directions in Psychological Science,
5(3), 94-100.

53
The selective search experiment with a saccade
induced between the late onset cues and start of
search
Even with a saccade between selection and access,
items can be accessed efficiently
55
Demonstrating the function of FINSTs
with Multiple Object Tracking (MOT)

In a typical experiment, 8 simple identical
objects are presented on a screen and 4 of them
are briefly distinguished in some visual manner,
usually by flashing them on and off.
After these 4 targets are briefly identified, all
objects resume their identical appearance and
move randomly. The observer's task is to keep
track of the ones that had been designated as
targets at the start
After a period of 5-10 seconds the motion stops
and observers must indicate, using a mouse, which
objects are the targets
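
The structure of an MOT trial can be sketched as a small simulation. This is an idealized illustration under assumptions: the `Dot` class, the random-walk motion, the step count and the scoring function are all hypothetical, and the "tracker" here simply holds references to the target objects, so it never loses one.

```python
import random


class Dot:
    """One of the identical moving objects, with a position in the unit square."""

    def __init__(self, rng):
        self.x = rng.random()
        self.y = rng.random()

    def step(self, rng, jitter=0.05):
        """Move by a small random amount, staying inside the unit square."""
        self.x = min(1.0, max(0.0, self.x + rng.uniform(-jitter, jitter)))
        self.y = min(1.0, max(0.0, self.y + rng.uniform(-jitter, jitter)))


def run_trial(n_objects=8, n_targets=4, n_steps=300, seed=0):
    """Run one trial and return the dots, the targets, the indexes and start positions."""
    rng = random.Random(seed)
    dots = [Dot(rng) for _ in range(n_objects)]
    targets = rng.sample(dots, n_targets)       # the items that "flash"
    indexes = list(targets)                     # indexes bind to the objects themselves
    start = [(d.x, d.y) for d in targets]
    for _ in range(n_steps):
        for d in dots:
            d.step(rng)
    return dots, targets, indexes, start


def score(response, targets):
    """Count how many selected objects really are targets."""
    return sum(1 for r in response if any(r is t for t in targets))
```

Because the indexes refer to the objects rather than to their starting positions or appearance, the selection at the end of the trial is correct even though every object has moved and all objects look alike.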

56
Another example of MOT: With self occlusion
57
Self occlusion does not seriously impair tracking
58
Some findings of Multiple Object Tracking

Basic finding: Most people can track at least 4
targets that move randomly among identical
non-target objects (even 5-year-old children can
track 3 objects)
Object properties do not appear to be recorded
during tracking, and tracking is not improved if
all objects are visually distinct (no two objects
have the same color, shape or size)
How is it done?
We showed that it is unlikely that the tracking
is done by keeping a record of the targets'
locations and updating them by serially visiting
the objects (Pylyshyn & Storm, 1988)
Other strategies may be employed (e.g., tracking
a single deforming pattern), but they do not
explain tracking
Hypothesis: FINST indexes get assigned to
targets. At the end of the trial these pointers
can be used to move attention to the targets and
hence to select them

59
What role do visual properties play in MOT?

Certain properties may have to be present in
order for an object to be indexed, and certain
properties (probably different properties) may be
required in order for the index to keep track of
the object, but this does not mean that such
properties are encoded, stored, or used in
tracking.
Compare this with Kripke's distinction between
properties that fix the referent of a proper name
and the property that the name refers to. The
former only plays a role at the name's initial
baptism.
Is there something special about location? Do we
record and track properties-at-locations?
Location in time and space may be essential for
individuating objects, but locations need not be
encoded or made cognitively available
The fact that an object is actually at some
location or other does not mean that it is
represented as such. Representing property P
(where P happens to be at location L) ≠
Representing property P-is-at-L.

60
A way of viewing what goes on in MOT

According to Kahneman & Treisman's Object File
theory, the appearance of a new visual object
causes a new Object File to be created. Each
object file is associated with its respective
object, presumably through a FINST Index.
The object file may contain information about the
object to which it is attached. But according to
FINST Theory, keeping track of the object's
identity does not require the use of this
information. The evidence suggests that in MOT,
little or nothing is stored in the object file
except maybe in special cases (e.g., when the
object suddenly changes or disappears).
What makes something the same object over time is
that it remains connected to the same object-file
(by the same FINST). Thus, for vision to treat
something as the same enduring individual does
not require appeal to properties or concepts.

61
Why is this relevant to foundational questions in
the philosophy of mind?

According to Quine, Strawson, and most
philosophers, you cannot pick out or track
individuals without concepts (sortals)
But you also cannot pick out individuals with
only concepts
Sooner or later you have to pick out individuals
using non-conceptual causal connections between
thoughts and things
The present proposal is that FINSTs provide the
needed non-conceptual mechanism for individuating
objects and for tracking their identity, which
works most of the time in our kind of world. It
relies on a natural constraint (Marr)
FINST indexes provide the right sort of
connection for predicating properties of the
world by allowing the arguments of predicates to
be bound to objects prior to the predicates being
evaluated. They may thus be the basis for early
vocabulary learning.

62
But there must be some properties that cause
indexes to be grabbed!

Of course there are properties that are causally
responsible for indexes being grabbed, and also
properties (probably different ones) that make it
possible for objects to be tracked
But these properties need not be represented
(encoded) and used in tracking
The distinction between object properties that
cause indexes to be assigned and those that are
represented (in Object Files) is similar to
Kripke's distinction between properties that are
needed to pick out (name) an object and those
that constitute its meaning

63
Effect of target properties on MOT

Changes of target properties are not reported nor
even noticed during MOT
Keeping all targets at different color, size, or
shape does not improve tracking
Observers do not use target speed or direction in
tracking (e.g., by anticipating where the targets
will be when they reappear after occlusion)

64
Some open questions

We have arrived at the view that only properties
of selected (indexed) objects enter into
subsequent conceptualization and perception-based
thought (i.e., only information in object files
is made available to cognition)
So what happens to the rest of the visual
information?
Visual information seems rich and fine-grained
while this theory only allows for the properties
of 4 or 5 objects to be encoded!
The present view leaves no room for nonconceptual
representations whose content corresponds to the
content of conscious experience
According to the present view, the only content
that nonconceptual representations have is the
demonstrative content of indexes that refer to
perceptual objects
Question: Why do we need any more than that?

65
An intriguing possibility.

Maybe the theoretically relevant information we
take in is less than (or at least different from)
what we experience
This possibility has received attention recently
with the discovery of various "blindnesses"
(e.g., change blindness, inattentional blindness,
blindsight) as well as the discovery of
independent vision systems (e.g., recognition and
motor control)
The qualitative content of conscious experience
may not play a role in explanations of cognitive
processes
Even if unconceptualized information enters into
causal processes (e.g., motor control) it may not
be represented or made available to the cognitive
mind, not even as a nonconceptual
representation
For something to be a representation its content
must figure in explanations: it must capture
generalizations. It must have truth conditions
and therefore allow for misrepresentation. It is
an empirical question whether current proposals
do (e.g., primal sketch, scenarios). cf. Devitt:
Pylyshyn's Razor

66
Vision science has always been deeply ambivalent
about the role of conscious experience

Isn't how things appear one of the things that
our theories must explain? Answer: There is no a
priori "must explain"!
The content of subjective experience is a major
type of evidence. But it may turn out not to be
the most reliable source for inferring the
relevant functional states. It competes with
other types of evidence.
How things appear cannot be taken at face value:
it carries substantive theoretical assumptions.
It also draws on many levels of processing.
It was a serious obstacle to early theories of
vision (Kepler)
It has been a poor guide in the case of theories
of mental imagery (e.g., color mixing, image
size, image distances). Reading X off an image
is an illusion.
It seems likely that vision science will use
evidence of conscious experience the way
linguistics uses evidence of grammatical
intuitions: only as it is filtered through
developing theories.
The questions a science is expected to answer
cannot be set in advance: they change as the
science develops.

67
What next?

This picture leaves many unanswered questions,
but it does provide a mechanism for solving the
binding problem and also explaining how mental
representations could have a nonconceptual
connection with objects in the world (something
required if mental representations are to connect
with actions)

68
Schema for how FINSTs function in hockey
69

For a copy of these slides see
http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt
Or MIT Press (paperback)

70
Index capacity and training

Daphne Bavelier's lab (Rochester) has shown that
videogame players can track a larger number of
objects in MOT
Jose Rivest (York) has shown that some athletes
can track more targets than non-athletes
Within individuals the main determinant of the
number of targets that can be tracked is the
spacing between them

71
You are now here
X
But you are also here
73
Additional examples of MOT

MOT with occlusion
MOT with virtual occluders
MOT with matched nonoccluding disappearance
Track endpoints of lines
Track rubber-band linked boxes
Track and remember ID by location
Track and remember ID by name (number)
Track while everything briefly disappears (½ sec)
and goes on moving while invisible
Track while everything briefly disappears and
reappears where it was when it disappeared