Title: A neglected problem in the computational theory of mind: Object Tracking and the Mind-World Gap
1 A neglected problem in the computational theory of mind: Object Tracking and the Mind-World Gap
- In the course of these lectures I will try to show how several interconnected concepts are essential to understanding mind. They are:
  - Picking out, individuating, and nonconceptual selection
  - The type-token distinction: everyone is familiar with these terms but often fails to see their importance and relevance
    - This distinction crosses the proximal-distal distinction
  - The need for tagging or marking individuals to keep them distinct (but where does the tag reside?)
  - The correspondence problem: when do two proximal tokens correspond to the same individual (the same distal token)?
  - The binding problem: how does the visual system indicate that several properties are conjoined, i.e., that they are properties of the same individual?
2 Before I begin, I would like you to see a video game that will figure in the last part of my talk
- The demonstration shows a task called Multiple Object Tracking
- Track the initially distinct (flashing) items through the trial (here 10 secs) and indicate at the end which items are the targets
- After each example I'd like you to ask yourself, "How do I do it?"
- If you are like most of our subjects, you will have no idea, or a false idea
3 Keep track of the objects that flash
4 How do we do it? What properties of individual objects do we use?
5 Going behind occluding surfaces does not disrupt tracking
Scholl, B. J., & Pylyshyn, Z. W. (1999). Tracking multiple items through occlusion: Clues to visual objecthood. Cognitive Psychology, 38(2), 259-290.
6 Not all well-defined features can be tracked
Track the endpoints of these lines. The endpoints move exactly as the squares did!
8 The basic problem of cognitive science
- What determines our behavior is not how the world is, but how we represent it as being
- As Chomsky pointed out in his review of Skinner, if we describe behavior in relation to the objective properties of the world, we would have to conclude that behavior is essentially stimulus-independent
- Nearly every naturally occurring person-level action or behavioral regularity is cognitively penetrable
- Any information that changes beliefs can systematically and rationally change behavior
9 Representation and Mind: Why representations are essential
- Do representations only come into play in higher-level mental activities, such as reasoning?
- Even at early stages of perception, many of the states that must be postulated are representations (i.e., what they are about plays a role in explanations).
10 Examples from vision (1): Intrapercept constraints
Epstein, W. (1982). Percept-percept couplings. Perception, 11, 75-83.
[Figure labels: Far Top / Far High; Front Bottom / Back Bottom]
11 Another example of a classical representation
12 Other forms of representation. Note the essential role played by the letter labels
- Lines FG, BC are parallel and equal.
- Lines EH, AD are parallel and equal.
- Lines FB, GC are parallel and equal.
- Lines EA, HD are parallel and equal.
- Vertices EF, HG, DC and AB are joined....
- Other predicate-argument representations:
  - Part-Of(Cube): Top-Face(EFGH), Bottom-Face(ABCD), Front-Face(FGCB), Back-Face(EHDA)
  - Part-Of(Top-Face): Front-Edge(FG), Back-Edge(EH), Left-Edge(EF), Right-Edge(HG)
13 What's wrong with this picture?
- What's wrong is that the CTM is incomplete: it does not address a number of fundamental questions
- It fails to specify how representations connect with what they represent. It's not enough to use English words in the representation (that's been a common confusion in AI) or to draw pictures (a common confusion in theories of reasoning with mental images)
  - English labels and pictures may help the theorist recall which objects are being referred to
  - But what makes it the case that a particular mental symbol refers to one thing rather than another?
- How are concepts grounded? (the Symbol Grounding Problem)
14 Another way to look at what the Computational Theory of Mind lacks
- The missing function in the CTM is a mechanism that allows perception to refer to individual tokens in the visual field directly and nonconceptually
- Not as "whatever has properties P1, P2, P3, ...", but as a singular term that refers directly to an individual and does not appeal to a prior representation of the individual's properties.
- Such a reference is like a proper name, a pointer in a computer data structure, or a demonstrative term (like "this" or "that") in natural language. But it is different from all of these. E.g.,
  - Unlike a demonstrative or deictic term, its reference is not determined by discourse context
  - Unlike a proper name, it only refers to objects currently in view
  - Unlike the usual sort of pointer, it does not refer by addressing a location; rather, it is like a pointer in a computer that serves as a variable and does not refer via a location, despite what the term "pointer" might imply.
15 An example from personal history: Why we need to pick out individual things without referring to their properties
- We wanted to develop a computer system that would reason about geometry by actually drawing a diagram and noticing adventitious properties of the diagram, from which it would conjecture lemmas to prove
- We wanted the system to be as psychologically realistic as possible, so we assumed that it had a narrow field of view and noticed only limited, spatially restricted information as it examined the drawing
- This immediately raised the problem of coordinating noticings and led us to the idea of visual indexes to keep track of previously encoded parts of the diagram.
16 Begin by drawing a line.
[Figure label: L1]
17 Now draw a second line.
[Figure label: L2]
18 And draw a third line.
[Figure label: L3]
19 Notice what you have so far (noticings are local: you encode what you attend to)
[Figure labels: L1, V6, L2]
There is an intersection of two lines, but which of the two lines you drew are they? There is no way to indicate which individual things are seen again without a way to refer to individual (token) things.
20 Look around some more to see what is there.
[Figure labels: L5, L2, V12]
Here is another intersection of two lines. Is it the same intersection as the one seen earlier? Without a special way to keep track of individuals, the only way to tell would be to encode unique properties of each of the lines. Which properties should you encode?
21 In examining a geometrical figure, one only gets to see a sequence of local glimpses
22 The incremental construction of visual representations requires solving a correspondence problem over time
- We have to determine whether a particular individual element seen at time t is identical to another individual element seen at an earlier time t − Δt. This is one manifestation of the correspondence problem.
- Solving the correspondence problem is equivalent to picking out and tracking the identity of token individuals as they change their appearance, their location, or the way they are encoded or conceptualized
- To do that we need the capacity to refer to token individuals (I will call them objects) without doing so by appealing to their properties. This requires a special form of demonstrative reference that I call a Visual Index.
23 A note about the use of labels in this example
- There are two purposes for figure labels. One is to specify what type of individual it is (line, vertex, ...). The other is to specify which individual it is, so that it can be bound to the argument of a predicate, which can then be evaluated.
- The second of these is what I am concerned with, because it is essential that we be able to indicate which individual a predicate applies to.
- Many people (e.g., Marr, Yantis) have suggested that individuals may be marked by tags. But that won't do, since one cannot literally place a tag on an object. Even if we could, it would not obviate the need to refer directly to individuals, for the same reason that labels didn't help in the geometry examples discussed earlier.
- Labeling things in the world is not enough, because to refer to the line labeled L1 you would have to be able to think "this is line L1", and you could not think that unless you had a way to first pick out the referent of "this".
24
- The difference between a direct (demonstrative) way and a descriptive (attributive) way of picking something out has produced many "You are here" cartoons.
- It is also illustrated in this recent New Yorker cartoon
25 The difference between descriptive and demonstrative ways of picking something out (illustrated in this New Yorker cartoon by Sipress)
26 Referring and picking out
- Picking out entails individuating, in the sense of separating an individual from a background (what Gestalt psychologists called a figure-ground distinction) and from all other possible things
- This sort of picking out has been studied in psychology under the heading of focal or selective attention.
  - Focal attention can be understood as an instance of demonstrative reference!
  - Focal attention appears to pick out and adhere to objects rather than places
- In addition to the usual unitary attention, there is also evidence for a mechanism of multiple direct references (about 4 or 5), which I have called a visual index or a FINST
  - Indexes differ from split focal attention in many ways that we have studied in our laboratory (I will mention a few later)
  - A visual index is like a pointer in a computer data structure: it allows access but does not itself reveal anything about what is being pointed to
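The pointer analogy can be made concrete with a minimal Python sketch. It assumes a fixed pool of four indexes (the slide says about 4 or 5), and every name in it (`VisualObject`, `FinstPool`, `grab`, `access`) is hypothetical. What it illustrates is the point on the slide: an index holds only a reference, so following it gives access to whatever the object is like now, while the index itself encodes none of the object's properties, not even its location.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity, not appearance, decides sameness
class VisualObject:
    """Hypothetical stand-in for a distal object in the visual field."""
    color: str
    x: float
    y: float


class FinstPool:
    """Hypothetical pool of a few indexes; each slot stores a bare reference."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._slots: list[VisualObject] = []

    def grab(self, obj: VisualObject) -> Optional[int]:
        """Assign the next free index to obj, or return None if none are left."""
        if len(self._slots) >= self.capacity:
            return None
        self._slots.append(obj)  # no color or position is copied into the slot
        return len(self._slots) - 1

    def access(self, index: int) -> VisualObject:
        """Follow an index to its object; properties are read only at this point."""
        return self._slots[index]


pool = FinstPool()
red_disk = VisualObject("red", 0.0, 0.0)
i = pool.grab(red_disk)
red_disk.x = 5.0             # the object moves after being indexed
print(pool.access(i).x)      # 5.0: the index still reaches the same object
```

Declaring the class with `eq=False` makes two identical-looking objects compare as distinct individuals, which is the type-token distinction in miniature.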
27 The requirements for picking out and keeping track of several individual things reminded me of an early comic book character called Plastic Man
28 Imagine being able to place several of your fingers on things in the world without recognizing their properties while doing so. You could then refer to those things (e.g., "what finger 2 is touching") and could move your attention to them. You would then be said to possess FINgers of INSTantiation (FINSTs)
29 Some questions raised by this view of indexing as primitive reference
- Is there a limit on the number of such indexes? If so:
  - Is it a fixed structural (architectural) property?
  - Can it be altered by different tasks, experience, etc.?
- How is it different from focal attention?
- What determines whether something is attended?
- What object properties allow objects to be tracked?
- How can an object be selected without being selected as the object with property P (e.g., the object at location <x,y>)? "Selection" is a misleading term.
- Without some unique property, how do you know which object you have selected? (Is this a misleading way to put it?)
30 FINST Theory postulates a limited number of pointers in early vision that are elicited by certain things in the visual field and that enable vision to refer to those things without doing so under a concept or a description
31 FINSTs and Object Files are the basic mechanisms that link the world and its conceptualization
The only thing in this picture that is conceptual is what's in the Object Files (unless you count a reference as conceptual).
Object File contents are conceptual!
32 Summarizing FINST Theory
- A FINST is a primitive reference mechanism that normally references individual visible objects in the world. There is a small number (4-5) of FINSTs available at any one time.
- Objects are picked out and referred to without using any encoding of their properties, including their location.
  - Referring to objects (or, more accurately, being grabbed by objects) is prior to encoding any of their properties!
- Indexing is nonconceptual because it does not represent an individual as a member of some conceptual category.
- An important function of FINST indexes is to bind arguments of visual predicates to things in the world. Only predicates with bound arguments can be evaluated. Since predicates are quintessential concepts, an index serves as a bridge from nonconceptual to conceptual representations.
- Similarly, FINSTs can bind arguments of motor commands, including the command to move focal attention or gaze to the indexed object
  - e.g., MoveGaze(x) might be a primitive perceptual-motor operation, NOT MoveGaze(x, y, z), which gives the spatial coordinates of the gaze target
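A toy contrast between the two forms of motor command on the slide. In this sketch (all names hypothetical), `move_to_index` plays the role of MoveGaze(x): its argument is bound to the object itself, so the object's current position is read only when the command executes. `move_to_coordinates` plays the role of a command that carries spatial coordinates fixed in advance (reduced to two dimensions here). How a real oculomotor system reaches an indexed object is left open; the sketch simply reads the object's position.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class VisualObject:
    """Hypothetical indexed object; only its current position matters here."""
    x: float
    y: float


class Gaze:
    """Hypothetical gaze controller with two kinds of command."""

    def __init__(self) -> None:
        self.x, self.y = 0.0, 0.0

    def move_to_coordinates(self, x: float, y: float) -> None:
        # Coordinate form (like MoveGaze(x, y, z)): the target is fixed when the command is issued.
        self.x, self.y = x, y

    def move_to_index(self, indexed: VisualObject) -> None:
        # Index form (like MoveGaze(x)): the argument is the object; position is read at execution.
        self.x, self.y = indexed.x, indexed.y


target = VisualObject(1.0, 1.0)
planned = (target.x, target.y)   # coordinates encoded when the command was planned
target.x, target.y = 4.0, 2.0    # the object moves before the command executes

by_coords, by_index = Gaze(), Gaze()
by_coords.move_to_coordinates(*planned)
by_index.move_to_index(target)
print((by_coords.x, by_coords.y), (by_index.x, by_index.y))  # (1.0, 1.0) (4.0, 2.0)
```

The coordinate-based command lands where the object used to be; the index-based one lands on the object.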
33 A note on terminology
- A FINST provides a reference to an individual visible thing
  - I sometimes call this referent a FING, by analogy with FINST, and sometimes an object, to conform with usage in psychology
  - A FINST does not pick out or refer to something as an object, because OBJECT is a concept. So FINGs are nonconceptual. Maybe "proto-object"?
- I have also called it a pointer, but that erroneously suggests that it points to the location of an object, as opposed to the object itself. In a computer, a pointer is the name of a stored datum.
- I have said that a FINST is a visual demonstrative, like "this" or "that", but this too is misleading, because the reference of a demonstrative depends on the context and intentions of the speaker
- I have also noted that a FINST is like a proper name, but that won't do either, since a name can pick out something not in sensory contact, whereas a FINST can only refer to a visible item (or one that has been only briefly out of sight).
34 A quick tour of some evidence for FINSTs
- The correspondence problem
- The binding problem
- Evaluating multi-place visual predicates (recognizing multi-element patterns)
- Operating over several visual elements at once without having to search for them first
- Subitizing
- Subset selection
- Multiple-Object Tracking
- Cognizing space without requiring a spatial display in the head
35 Dawson Configuration (Dawson & Pylyshyn, 1988)
36 Apparent motion solves a correspondence problem: Dawson Configuration (Dawson & Pylyshyn, 1988)
37 Apparent motion solves a correspondence problem: Dawson Configuration (Dawson & Pylyshyn, 1988)
Linear trajectory?
Curved trajectory?
Which criterion does the visual module prefer?
38 Apparent motion solves a correspondence problem: Dawson Configuration (Dawson & Pylyshyn, 1988)
Nearest vector distance?
Nearest mean distance?
Nearest configural distance?
Which criterion does the visual module prefer?
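One way to make the question "which criterion does the visual module prefer?" concrete is to treat correspondence as choosing a one-to-one pairing between the elements of two frames that minimizes some cost. The sketch below uses a single illustrative cost, total straight-line displacement, and brute-forces every pairing, which is fine for the handful of elements in a Dawson display. The function names are hypothetical, and this cost is not claimed to match any of the three criteria named on the slide.

```python
import itertools
import math

Point = tuple[float, float]


def total_displacement(before: list[Point], after: list[Point], pairing: tuple[int, ...]) -> float:
    """Hypothetical cost: summed distance when before[i] is matched to after[pairing[i]]."""
    return sum(math.dist(before[i], after[j]) for i, j in enumerate(pairing))


def best_correspondence(before: list[Point], after: list[Point]) -> tuple[int, ...]:
    """Try every one-to-one pairing and keep the one with the least total displacement."""
    if len(before) != len(after):
        raise ValueError("this sketch assumes the same number of elements in both frames")
    return min(
        itertools.permutations(range(len(after))),
        key=lambda pairing: total_displacement(before, after, pairing),
    )


frame1 = [(0.0, 0.0), (1.0, 0.0)]
frame2 = [(1.0, 0.5), (0.0, 1.0)]
print(best_correspondence(frame1, frame2))  # (1, 0): element 0 -> (0, 1), element 1 -> (1, 0.5)
```

Brute force over all n! pairings only works for small displays; for larger ones an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` finds a minimum-cost pairing efficiently.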
39 Dawson Configuration (animated)
40 Dawson Configuration (animated)
41 Dawson Configuration: Different shapes ignored
42 Yantis's use of the Ternus Configuration to demonstrate the early visual effect of objecthood
Short time delays result in element motion (the middle object persists as the same object, so it does not appear to move)
43 Long time delays result in group motion, because the middle object does not persist but is perceived as a new object each time it reappears
44 But long delays, when the disappearance appears to be due to occlusion by an opaque surface, maintain objecthood and therefore behave like short delays
46 A quick tour of some evidence for FINSTs
- The correspondence problem
- The binding problem
- Evaluating multi-place visual predicates (recognizing multi-element patterns)
- Operating over several visual elements at once without having to search for them first
- Subitizing
- Subset search
- Multiple-Object Tracking
- Cognizing space without requiring a spatial display in the head
47 Encoding conjunctions of properties and solving the Binding Problem
- Experiments have shown that detecting conjunctions of several properties involves attending to the bearers of the properties. These studies have provided a basis for understanding an important problem in visual analysis: the Binding Problem
- The following aside illustrates some aspects of the problem of encoding conjunctions.
48 How are conjunctions of features detected?
Read the vertical line of digits in the following display.
Under these conditions, conjunction errors are very frequent.
49 Rapid visual search (Treisman)
Find the following simple figure in the next slide
50 This case is easy, and the time is independent of how many nontargets there are, because there is only one red item. This is called a "popout" search.
51 This case is also easy, and the time is independent of how many nontargets there are, because there is only one right-leaning item. This is also a popout search.
52 Rapid visual search (conjunction)
Find the following simple figure in the next slide
54 Find the unique item in this slide
55 Example of features and feature conjunctions (Count Pink vs. Count Online)
56 Serial vs. parallel search?
- Finding an element that differs from all others in a scene by a single feature (which is called a feature search) is fast, error-free, and almost independent of how many nontargets there are, but
- Finding a target that differs from some objects by one or more of its features while it differs from the other objects by another of its features is usually slow and error-prone, and is worse when there are more objects. (This is called a conjunction search, because several properties of a target are needed to distinguish it from the nontarget objects.)
- These results suggest that in order to find a conjunction, attention has to be scanned serially over all objects.
As with most empirical generalizations, this one fails under certain conditions, such as when one of the properties is motion or 3D depth.
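The serial-scan account has a simple arithmetic consequence, sketched below under idealized assumptions: items are inspected one at a time in random order, with no errors and equal time per item. If the scan stops at the target, target-present trials inspect (n + 1)/2 items on average, while target-absent trials must inspect all n, so the predicted slope of the absent function is twice that of the present function. `expected_inspections` is a hypothetical helper, not a fitted model of the data on these slides.

```python
from fractions import Fraction


def expected_inspections(set_size: int, target_present: bool) -> Fraction:
    """Hypothetical helper: mean items inspected by an idealized self-terminating serial scan."""
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    if not target_present:
        return Fraction(set_size)  # every item must be checked before responding "absent"
    # The target is equally likely to be in scan position 1, 2, ..., n: mean (n + 1) / 2.
    return Fraction(sum(range(1, set_size + 1)), set_size)


for n in (4, 8, 16):
    print(n, expected_inspections(n, True), expected_inspections(n, False))
# 4 5/2 4
# 8 9/2 8
# 16 17/2 16
```

Each extra item adds half an inspection on present trials and a full one on absent trials, which is where the 2:1 slope ratio comes from in this idealization.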
57 Single-feature vs. conjunction-feature search
58 The idea that attention moves without eye movements has been known for 30 years
- Because a sequence of eye movements takes several hundred milliseconds, the serial search view relies on the assumption that attention can move like a beam of light, rapidly and without eye movements
- Posner's experiments showed that this can occur under exogenous (data-driven or bottom-up) control or endogenous (voluntary or top-down) control
59 Covert movements of attention
Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements, and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
60 Recall Posner's demonstration of exogenous attention switching
Does the improved detection at intermediate locations entail that the spotlight of attention moves continuously through empty space?
61 Sperling & Weichselgartner (1995): Episodic or Quantal Theory of attention switching
Assumes a quantal shift in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location 2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
62 So if there is a visual attention beam, it must be scanned rapidly and must fall on conjunctions of features for the features to be encoded as conjoined
- Now one can appreciate how something like attention is needed to solve the binding problem of representing features as being conjoined. But there is also reason to believe that one needs not one, but several such conjunction detectors in order to recognize patterns.
63 So that brings us to a substantial constraint on the mechanisms of early vision: they must keep track of which properties are conjoined.
- It is not enough to detect which properties are present in a scene. The earliest stages in the vision system must group these properties (or features) in the right way to preserve the information that some properties go together. If this information is lost, the scene cannot be correctly perceived.
64 Pandemonium, an architecture for vision, was proposed by Oliver Selfridge in 1959. This idea continues to be at the heart of many psychological models of vision, especially connectionist and neural net models. It is also a progenitor of blackboard architectures for computational systems for vision and speech perception (CMU's Hearsay). This architecture fails to keep track of which features are conjoined (which property goes with which). It also does not provide a way to represent tokens of the same type.
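To see why an architecture that only reports which features are present cannot solve the binding problem, here is a deliberately crude Python sketch. It assumes each feature demon is simply on or off (real Pandemonium demons "shout" with graded strength, which this ignores); the scene encoding and the `active_demons` function are hypothetical.

```python
def active_demons(scene: list[dict[str, str]]) -> set[str]:
    """Hypothetical demon layer: one demon fires per feature value present anywhere."""
    # Only *which* features occur is passed on; which object carried them is discarded.
    return {value for obj in scene for value in obj.values()}


scene_a = [{"color": "red", "shape": "square"}, {"color": "green", "shape": "circle"}]
scene_b = [{"color": "green", "shape": "square"}, {"color": "red", "shape": "circle"}]
one_square = [{"color": "red", "shape": "square"}]
two_squares = one_square * 2

print(active_demons(scene_a) == active_demons(scene_b))        # True: conjunctions confused
print(active_demons(one_square) == active_demons(two_squares))  # True: tokens of one type merged
```

Both failures named on the slide show up: swapping which object carries which color leaves the demon output unchanged, and so does adding a second token of the same type.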
65 A popular proposal for solving the Binding Problem: Encode features-at-locations
- The near-universal view as to how the binding problem is solved is that conjunctions are computed as co-located features: the visual system encodes features-at-locations.
- Austen Clark (in A Theory of Sentience), following the tradition of Quine and Strawson, also assumes that location is primary and that, in our most primitive nonconceptual sensory contact with the world, which he calls the level of sentience, the only resources available are those of what Strawson called a feature-placing language. Our sensory system can only encode "Feature F at location L".
- This is also the standard view in psychology (as embodied in Treisman's Feature Integration Theory)
- But this can't be right, for some simple reasons.
66 Treisman's "Attention as Glue" Hypothesis
- The purpose of visual attention is to bind properties together in order to recognize objects
- Thus the purpose of attention is to solve the binding problem
- We can recognize not only the presence of squareness and redness in our field of view, but we can also distinguish between different ways they may be conjoined
67 The role of attention to location in Treisman's Feature Integration Theory
The map for feature F shows where Fs are located. To determine that F1 and F2 are co-located, the attention beam checks the locations on each feature map against its location in the Master Map.
68 There is another way to read the attention-as-glue hypothesis: In computing conjunctions of properties, attention must be directed primarily at objects, since it is objects that have the conjoined properties
- Instead of being like a spotlight beam that can be scanned around a scene and zoomed to cover a larger or smaller area, maybe attention can only be directed towards occupied places, i.e., to visual objects. There is now considerable evidence for this claim, some of which was reviewed earlier (e.g., the single-object advantage, and the finding that both detection sensitivity and inhibition of return travel with the relevant object as it moves).
69 Individual objects and the binding problem
- We can distinguish scenes that differ by conjunctions of properties, so early vision must somehow keep track of how properties co-occur: conjunction must not be obscured. This is called the binding problem
- The most common proposal is that vision keeps track of properties according to their location and binds together co-located properties.
70 The proposal of binding conjunctions by the location of conjuncts does not work when feature location is not punctate, and it becomes even more problematic if the features are co-located, e.g., if their relation is "inside"
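A minimal sketch of the features-at-locations proposal, assuming each feature is assigned a single location (the center of whatever bears it), which is exactly the punctate-location assumption the slide questions. The function name and scene encoding are hypothetical. When a small square sits centered inside a large circle, both figures register at the same place, so "red square in green circle" and "green square in red circle" produce identical feature-location lists and the binding is lost.

```python
from collections import defaultdict

Location = tuple[int, int]


def bind_by_location(features: list[tuple[str, Location]]) -> dict[Location, set[str]]:
    """Hypothetical binder: features registered at the same location count as one thing."""
    bound: dict[Location, set[str]] = defaultdict(set)
    for feature, location in features:
        bound[location].add(feature)
    return dict(bound)


# Separated figures: location-based binding works.
apart = [("red", (1, 1)), ("square", (1, 1)), ("green", (8, 8)), ("circle", (8, 8))]
print(bind_by_location(apart))

# A small square centered inside a large circle: both figures register at (5, 5).
red_in_green = [("red", (5, 5)), ("square", (5, 5)), ("green", (5, 5)), ("circle", (5, 5))]
green_in_red = [("green", (5, 5)), ("square", (5, 5)), ("red", (5, 5)), ("circle", (5, 5))]
print(bind_by_location(red_in_green) == bind_by_location(green_in_red))  # True: binding lost
```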
71 Binding must be object-based
- The proposal that properties are conjoined by virtue of their common location has many problems
  - In order to assign a location to a property you need to know its boundaries, which requires distinguishing the object that has those properties from its background (figure-ground individuation)
  - Properties are properties of objects, not of locations, which is why properties move when objects move. Moreover, empty locations have no causal properties.
- The alternative to conjoining-by-location is conjoining-by-object. According to this view, solving the binding problem requires first distinguishing individual objects and then keeping track of each object's properties (in its object file)
- If only properties of selected objects are encoded, and if those properties are recorded in object files specific to each object, then all conjoined properties will be recorded in the same object file, thus solving the binding problem
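If properties are recorded per object rather than per place, scenes that differ only in how features are conjoined come apart. The sketch assumes the hard part, individuating the objects, has already been done and shows only the bookkeeping; `ObjectFile`, `open_object_files` and `conjunctions` are hypothetical names, not an established API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical record opened for one individuated object."""
    properties: dict[str, str] = field(default_factory=dict)


def open_object_files(objects: list[dict[str, str]]) -> list[ObjectFile]:
    """One file per (already individuated) object; its properties go into its own file."""
    return [ObjectFile(dict(obj)) for obj in objects]


def conjunctions(files: list[ObjectFile]) -> Counter:
    """Read back which properties co-occur, with one entry per token."""
    return Counter(frozenset(f.properties.items()) for f in files)


# A small square inside a large circle, with the colors of the two figures swapped.
red_in_green = [{"color": "red", "shape": "square"}, {"color": "green", "shape": "circle"}]
green_in_red = [{"color": "green", "shape": "square"}, {"color": "red", "shape": "circle"}]

print(conjunctions(open_object_files(red_in_green)) ==
      conjunctions(open_object_files(green_in_red)))  # False: the scenes are distinguished
```

Because each file is a separate record, two red squares also yield two entries, so tokens of the same type stay distinct.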
72 Binding must be object-based
- Another reason why conjunction-by-location fails is that conjunction must be computed for
  - Token moving objects
  - Token objects whose representation is built up incrementally over time
  - Token objects that appear on the two retinae
- In all cases what must be computed is the distinction between "there it is again" and "here is another one": the correspondence between proximal patterns when they derive from the same distal object vs. when they derive from different objects.
Credit: Austen Clark
73 Attention is object-based
- There is a great deal of evidence that attention favors objects and adheres to those objects as they move.
  - The assumption that attention is like a spotlight and moves through space has been challenged by Weichselgartner and Sperling (1987), who provided an alternative explanation for the Posner evidence
  - Single-object advantage
  - Evidence that attention moves with object motion
  - Object File experiments, described later, that show an Object-Specific Priming Benefit (OSPB)
  - Even abstract objects can be tracked through feature space
  - Attention is assigned to (spreads to) entire objects when some part of the object is attended (Egly, Driver & Rafal, 1994).
74 Attention spreads over perceived objects
[Figure panels: Spreads to B and not C; Spreads to C and not B; Spreads to B and not C; Spreads to C and not B]
Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object, compared to equally distant parts of different objects.
75 A quick tour of some evidence for FINSTs
- The correspondence problem (mentioned earlier)
- The binding problem
- Evaluating multi-place visual predicates (or recognizing multi-element patterns)
- Operating over several visual elements at once without having to search for them first
- Subitizing
- Subset selection
- Multiple-Object Tracking
- Cognizing space without requiring a spatial display in the head
76 Being able to pick out and refer to individual distal elements is essential for encoding patterns
- Encoding relational predicates, e.g., Collinear(x, y, z, ...), Inside(x, C), Above(x, y), Square(w, x, y, z), requires simultaneously binding the arguments of n-place predicates to n elements in the visual scene
- Evaluating such visual predicates requires individuating and referring to the objects over which the predicate is evaluated, i.e., the arguments in the predicate must be bound to individual elements in the scene.
- In detecting patterns, the properties of individual objects must be ignored, and the evidence from MOT suggests that these properties are indeed not encoded.
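A minimal sketch of an n-place visual predicate whose arguments must be bound before it can be evaluated. The `VisualObject` class and the `collinear` function are hypothetical; the cross-product test is just one convenient way to compute collinearity. The object's color is carried along but never consulted, in keeping with the point that the individual properties of the arguments play no role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class VisualObject:
    """Hypothetical indexed object. Its color is carried along but never consulted."""
    x: float
    y: float
    color: str = "unspecified"


def collinear(*args: Optional[VisualObject], tol: float = 1e-9) -> bool:
    """Collinear(x, y, z, ...): can only be evaluated once every argument is bound."""
    if any(a is None for a in args):
        raise ValueError("unbound argument: each argument must be bound to an indexed object")
    if len(args) < 3:
        return True  # two points always lie on a line
    p0, p1 = args[0], args[1]
    # p lies on the line through p0 and p1 when the 2-D cross product (p1 - p0) x (p - p0) is zero.
    # (Degenerate case not handled: if p0 and p1 coincide, every point passes.)
    return all(
        abs((p1.x - p0.x) * (p.y - p0.y) - (p1.y - p0.y) * (p.x - p0.x)) <= tol
        for p in args[2:]
    )


a, b, c = VisualObject(0, 0, "red"), VisualObject(1, 1, "green"), VisualObject(2, 2, "blue")
print(collinear(a, b, c))                   # True
print(collinear(a, b, VisualObject(2, 3)))  # False
```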
77 Several objects must be picked out at once in making relational judgments
When we judge that certain objects are collinear, we must first pick out the relevant objects while ignoring their properties
78 Several objects must be picked out at once in making relational judgments
- The same is true for other relational judgments, like "inside" or "on the same contour", etc. We must pick out the relevant individual objects first.
Are the dots inside the same contour? On the same contour?
79 A quick tour of some evidence for FINSTs
- The correspondence problem
- The binding problem
- Evaluating multi-place visual predicates (recognizing multi-element patterns)
- There is evidence that we can operate over several visual elements at once without first having to search for them
- Subitizing
- Subset selection
- Multiple-Object Tracking
- Cognizing space without requiring a spatial display in the head
80 More functions of FINSTs: Further experimental explorations using different paradigms
- Recognizing the cardinality of small sets of things: subitizing vs. counting (Trick, 1994)
- Searching through subsets: selecting items to search through (Burkell, 1997)
- Selecting subsets and maintaining the selection during a saccade (Currie, 2002)
- Application of FINST index theory to infant cardinality studies (Carey, Spelke, Leslie, Uller, etc.)
- Indexes explain how children are able to acquire words for objects by ostension without suffering from Quine's "Gavagai" problem.
81 Signature subitizing phenomena only appear when objects are automatically individuated and indexed
[Figure: plot showing the counting slope and the subitizing slope]
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
82 Subitizing results
- There is evidence that a different mechanism is involved in enumerating small (n < 4) and large (n > 4) numbers of items (even different brain mechanisms: Dehaene & Cohen, 1994)
- Rapid small-number enumeration (subitizing) only occurs when items are first (automatically) individuated
- Subitizing is not affected by precuing location, while counting is
- Subitizing is insensitive to distance among items
- Our explanation for what is special about subitizing is that once FINST indexes are assigned to n < 4 individual objects, the objects can be enumerated without first searching for them. In fact, they might be enumerated simply by counting active indexes, which is fast and accurate because it does not require visual scanning.
- New data on subitizing: limits on enumeration may be related to recalling which items have already been counted (Haladjian, 2009).
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
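The index-counting explanation can be sketched as a toy two-route model. It assumes a capacity of four indexes and puts the subitizing boundary at that capacity; the names are hypothetical, and the sketch says nothing about timing. Within capacity, the count is read off the number of active indexes with no scanning; beyond it, items are visited one at a time and a record of which have already been counted has to be kept, in line with the suggestion that enumeration limits may be related to recalling which items have been counted.

```python
CAPACITY = 4  # hypothetical: assumed number of indexes (the slides say about 4 or 5)


def enumerate_items(items: list[object], capacity: int = CAPACITY) -> tuple[int, str]:
    """Hypothetical toy two-route enumeration: returns (count, route used)."""
    if len(items) <= capacity:
        # Subitizing route: each item has grabbed an index, so count the active indexes.
        active_indexes = list(range(len(items)))
        return len(active_indexes), "subitize"
    # Counting route: visit items one at a time, remembering which have been counted.
    already_counted: set[int] = set()
    for position in range(len(items)):
        already_counted.add(position)
    return len(already_counted), "count"


print(enumerate_items(["dot"] * 3))  # (3, 'subitize')
print(enumerate_items(["dot"] * 9))  # (9, 'count')
```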
83 Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.
84 Subset search results
- The finding is that only properties of the subset matter
  - Note that properties of the entire subset are taken into account simultaneously (since that is what distinguishes a feature search from a conjunction search)
  - If the subset is a single-feature search set, search is fast and the slope (RT vs. number of items) is shallow
  - If the subset is a conjunction search set, search takes longer and is more sensitive to the set size
- As with subitizing, the distance between targets does not matter, so observers don't seem to be scanning the display looking for the target
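The claim that only properties of the selected subset matter can be put as a small classification rule. In this sketch (hypothetical names and feature encoding), `search_type` looks only at the items in the indexed subset and calls the search a feature search if some single dimension separates the target from every distractor in that subset, and a conjunction search otherwise. The rest of the display never enters the decision.

```python
Item = dict[str, str]


def search_type(target: Item, subset: list[Item]) -> str:
    """Hypothetical rule: classify search over the indexed subset only."""
    distractors = [item for item in subset if item is not target]
    for dimension, value in target.items():
        # One dimension on which the target differs from every distractor => feature search.
        if all(d.get(dimension) != value for d in distractors):
            return "feature"
    return "conjunction"


target = {"color": "red", "tilt": "vertical"}
green_vertical = {"color": "green", "tilt": "vertical"}
red_horizontal = {"color": "red", "tilt": "horizontal"}
display = [target, green_vertical, red_horizontal, green_vertical, red_horizontal]

print(search_type(target, display))                                   # conjunction
print(search_type(target, [target, green_vertical, green_vertical]))  # feature
```

The same target in the same display yields a feature search or a conjunction search depending only on which items are in the selected subset.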
85 The stability of the visual world entails the capacity to re-identify individuals after a saccade
- There is no problem about how tactile selection can provide a stable world when you move around while keeping your fingers on the same objects, because in that case retaining individual identity is automatic
- But with FINSTs the same can be true with vision, for a small number of visual objects
- This is compatible with the finding that one appears to retain the relative locations of only about 4 elements across saccadic eye movements (Irwin, 1996)
Irwin, D. E. (1996). Integrating information across saccadic eye movements. Current Directions in Psychological Science, 5(3), 94-100.
86 The selective search experiment with a saccade induced between the late-onset cues and the start of search
Even with a saccade between selection and access, items can be accessed efficiently
87 A quick tour of some evidence for FINSTs
- The correspondence problem (mentioned earlier)
- The binding problem
- Evaluating multi-place visual predicates (recognizing multi-element patterns)
- Operating over several visual elements at once without having to search for them first
- Subitizing
- Subset selection (with saccades)
- What happens when objects move?
- The ultimate test: Multiple Object Tracking
88 Detecting objects means not only solving the binding problem over objects, but also detecting and keeping track of properties as objects move.
- The correspondence problem (mentioned earlier)
- The binding problem
- Evaluating multi-place visual predicates (recognizing multi-element patterns)
- Operating over several visual elements at once without having to search for them first
- Subitizing
- Subset selection
- What happens when objects move?
- The ultimate test: Multiple Object Tracking
89 Object File Experiments (Kahneman, Treisman & Gibbs, 1992): Priming object information sticks to moving objects
[Figure letters: F, D, F]
90 Inhibition of return appears to be object-based (as well as, to some extent, location-based)
- Inhibition of return is thought to help in visual search, since it prevents previously visited objects from being revisited
- The original study used static objects. Then Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object.
91 The time course of attention: Inhibition of return
- If we vary the time between the cue and the target
in a modified Posner paradigm, we find that when the
cue-target onset asynchrony (CTOA) reaches roughly
300-900 ms, reaction time to the target
begins to increase. This is called inhibition of
return (Klein, 2000).
- To get this effect we actually have to attract
attention to the target location and then attract
it back to its starting location. IOR is one of many
examples of an inhibition effect being produced
by attention.
92 IOR appears to be object-based (it travels with
the object that was attended)
93 Tracking objects not defined by distinct spatial
locations and spatial trajectories
Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O.
(2000). Tracking an object through feature-space.
Nature, 408(Nov 9), 196-199.
94 Demonstrating the function of FINSTs with
Multiple Object Tracking (MOT)
- In a typical experiment, 8 simple identical
objects are presented on a screen and 4 of them
are briefly distinguished in some visual manner,
usually by flashing them on and off
- After these 4 targets are briefly identified, all
objects resume their identical appearance and
move randomly. The observer's task is to keep
track of the ones that had been designated as
targets at the start
- After a period of 5-10 seconds the motion stops
and observers must indicate, using a mouse, which
objects are the targets
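The trial procedure above can be sketched as a toy generator. This is a minimal sketch under assumed parameters, not the lab's display software: the arena size, frame rate, speed, and heading jitter are hypothetical, the flashing cue phase is reduced to a set of target indices, and the observer's mouse clicks are supplied from outside rather than modelled.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Item:
    """One of the identical objects on the screen."""
    x: float
    y: float
    heading: float  # direction of motion, in radians


def step(items, rng, size, speed, jitter):
    """Advance every item by one frame: perturb its heading, move it, bounce off walls."""
    for it in items:
        it.heading += rng.uniform(-jitter, jitter)  # random change of direction
        it.x += speed * math.cos(it.heading)
        it.y += speed * math.sin(it.heading)
        if not 0.0 <= it.x <= size:
            it.x = min(max(it.x, 0.0), size)
            it.heading = math.pi - it.heading  # mirror off a left/right wall
        if not 0.0 <= it.y <= size:
            it.y = min(max(it.y, 0.0), size)
            it.heading = -it.heading  # mirror off a top/bottom wall


def run_trial(n_objects=8, n_targets=4, seconds=10.0, fps=60,
              size=100.0, speed=1.0, jitter=0.3, seed=None):
    """Place items, choose which ones would flash as targets, then let everything move."""
    rng = random.Random(seed)
    items = [Item(rng.uniform(0, size), rng.uniform(0, size),
                  rng.uniform(0, 2 * math.pi))
             for _ in range(n_objects)]
    targets = set(rng.sample(range(n_objects), n_targets))
    for _ in range(int(seconds * fps)):
        step(items, rng, size, speed, jitter)
    return items, targets


def score(targets, clicked):
    """Proportion of the targets that the observer clicked on."""
    return len(targets & set(clicked)) / len(targets)
```

After `run_trial(seed=1)`, an observer who clicks exactly the target indices scores 1.0, and one who clicks two targets and two distractors scores 0.5.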
95 Another example of MOT: with self-occlusion
96 Self-occlusion does not seriously impair tracking
97 Some Multiple Object Tracking findings
- Basic finding: most people can track at least 4
targets that move randomly among identical
non-target objects (even 5-year-old children can
track 3 objects)
- We have now accumulated dozens of results, which I
will list later, since they have implications for
FINST theory
- How is it done? A first pass:
- We showed that it is unlikely that tracking
is done by keeping a record of the targets'
locations and updating them by serially visiting
the objects (Pylyshyn & Storm, 1998)
- Other strategies may be employed (e.g., tracking
a single deforming pattern), but they do not
explain tracking
- Hypothesis: FINST indexes get assigned to
targets. At the end of the trial these pointers
can be used to move attention to the targets and
hence to select them
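One way to make the FINST hypothesis concrete is to treat each index as a bare reference to an object. This is a hypothetical sketch: the class names, the capacity of 4, and the use of Python object references to stand in for whatever causal link the visual system uses are all assumptions. The point it illustrates is that the pool stores no color or location, yet can still deliver the targets at the end of a trial.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: equality and hashing fall back to object identity
class VisualObject:
    color: str
    x: float
    y: float


class FinstPool:
    """Hypothetical pool holding a small, fixed number of indexes."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._bound = []  # direct references to objects; no colors or locations copied

    def grab(self, obj):
        """Bind a free index to obj; return False when all indexes are in use."""
        if len(self._bound) >= self.capacity:
            return False
        self._bound.append(obj)
        return True

    def select(self):
        """End of trial: follow the pointers to move attention to the targets."""
        return list(self._bound)
```

Because the pool holds references rather than descriptions, changing every object's color and position mid-trial still leaves `select()` returning the same four objects, and nothing in the pool had to be updated.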
98 What role do visual properties play in MOT?
- Certain properties may have to be present in
order for an object to be indexed, and certain
properties (probably different properties) may be
required in order for the index to keep track of
the object, but this does not mean that such
properties are encoded, stored, or used in
tracking.
- Compare this with Kripke's distinction between
properties that fix the referent of a proper name
and the individual that the name refers to. The
former play a role only at the name's initial
baptism.
- Is there something special about location? Do we
record and track properties-at-locations?
- Location in space and time may be essential for
individuating objects, but locations need not be
encoded or made cognitively available
- The fact that an object is actually at some
location or other does not mean that it is
represented as such. Representing property P
(where P happens to be at location L) ≠
representing property P-is-at-L.
99 A way of viewing what goes on in MOT
- According to Kahneman and Treisman's Object File
theory, the appearance of a new visual object
causes a new Object File to be created. Each
object file is associated with its respective
object, presumably through a FINST index.
- The object file may contain information about the
object to which it is attached. But according to
FINST theory, keeping track of the object's
identity does not require the use of this
information. The evidence suggests that in MOT,
little or nothing may be stored in the object
file except in some special cases (e.g., when the
object suddenly changes or disappears).
- What makes something the same object over time is
that it remains connected to the same object file
(by the same FINST). Thus, for vision to treat
something as the same enduring individual does
not require appeal to properties or concepts.
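The Object File view just described can be sketched as a small store keyed by FINST index. This is a hypothetical illustration: the class and function names are invented here, and integers stand in for indexes. It shows the two claims in the slide: a new object opens a new, possibly empty file, and "same individual" is decided by whether it is the same file, not by comparing what the file contains.

```python
import itertools
from dataclasses import dataclass, field


@dataclass(eq=False)  # files are compared by identity, never by their contents
class ObjectFile:
    file_id: int
    contents: dict = field(default_factory=dict)  # may stay empty during MOT


class ObjectFileStore:
    """Hypothetical store linking FINST indexes to object files."""

    def __init__(self):
        self._next_id = itertools.count()
        self._by_index = {}  # FINST index -> ObjectFile

    def on_new_object(self, finst_index):
        """A new visual object appears: open a fresh file reached through its index."""
        new_file = ObjectFile(next(self._next_id))
        self._by_index[finst_index] = new_file
        return new_file

    def file_for(self, finst_index):
        """Follow an index to the file it is connected to."""
        return self._by_index[finst_index]


def same_individual(file_then, file_now):
    """Same enduring object means same file, whatever the file does or does not contain."""
    return file_then is file_now
```

Writing something into a file (say, a note that the object suddenly changed) does not alter the identity test, which is the sense in which tracking need not consult stored properties.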
100 Why is this relevant to foundational questions in
the philosophy of mind?
- According to Quine, Strawson, and most
philosophers, you cannot pick out or track
individuals without concepts (sortals)
- But you also cannot pick out individuals with
only concepts
- Sooner or later you have to pick out individuals
using non-conceptual causal connections between
thoughts and things
- The present proposal is that FINSTs provide the
needed non-conceptual mechanism for individuating
objects and for tracking their identity, which
works most of the time in our kind of world. It
relies on natural constraints (Marr)
- FINST indexes provide the right sort of
connection for predicating properties of the
world by allowing the arguments of predicates to
be bound to objects prior to the predicates being
evaluated. They may thus be the basis for early
vocabulary learning.
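The idea of binding predicate arguments to objects before the predicate is evaluated (and the item "evaluating multi-place visual predicates" in the earlier list of evidence) can be sketched with a three-place predicate such as collinearity. This is a hypothetical illustration: `collinear`, `evaluate`, and the index-to-object dictionary are invented for the example, and a dictionary lookup stands in for following a FINST pointer.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class VisualObject:
    x: float
    y: float


def collinear(a, b, c, tol=1e-9):
    """Three-place predicate: true when a, b, c lie on one straight line."""
    # The 2-D cross product of (b - a) and (c - a) is zero for collinear points.
    cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return abs(cross) <= tol


def evaluate(predicate, arg_indexes, bindings):
    """Bind each argument slot to the object its index points at, then evaluate.

    The arguments are picked out first, by index; no search or description
    is needed to find them before the predicate runs.
    """
    args = [bindings[i] for i in arg_indexes]
    return predicate(*args)
```

With objects at (0, 0), (1, 1), (2, 2), and (0, 1) bound to indexes 0-3, evaluating `collinear` over indexes 0, 1, 2 is true and over 0, 1, 3 is false; the predicate never has to locate its arguments itself.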
101 But there must be some properties that cause
indexes to be grabbed!
- Of course there are properties that are causally
responsible for indexes being grabbed, and also
properties (probably different ones) that make it
possible for objects to be tracked
- But these properties need not be represented
(encoded) or used in tracking
- The distinction between object properties that
cause indexes to be assigned and those that are
represented (in Object Files) is similar to
Kripke's distinction between properties that are
needed to pick out and name an object and those
that constitute its meaning
102 Role of target properties in MOT: evidence that
they play little or no part in tracking
- Changes of target properties are neither reported
nor even noticed during MOT
- Giving all targets a different color, size, or
shape does not improve tracking
- Observers do not use target speed or direction in
tracking (e.g., by anticipating where the targets
will reappear after occlusion). But they do
appear to retain the objects' locations at the
time they disappeared, since if the objects reappear
at the location where they disappeared, tracking is
not impaired.
103 Some open questions
- We have arrived at the view that only properties
of selected (indexed) objects enter into
subsequent conceptualization and perception-based
thought (i.e., only information in object files
is made available to cognition)
- So what happens to the rest of the visual
information?
- Visual information seems rich and fine-grained,
while this theory only allows for the properties
of 4 or 5 objects to be encoded!
- The present view leaves no room for nonconceptual
representations whose content corresponds to the
content of conscious experience
- According to the present view, the only content
that nonconceptual representations have is the
demonstrative content of indexes that refer to
perceptual objects
- Question: why do we need any more than that?
104 An intriguing possibility
- Maybe the theoretically relevant information we
take in is less than (or at least different from)
what we experience
- This possibility has received attention recently
with the discovery of various "blindnesses"
(e.g., change blindness, inattentional blindness,
blindsight) as well as the discovery of
independent vision systems (e.g., for recognition
and for motor control)
- The qualitative content of conscious experience
may not play a role in explanations of cognitive
processes
- Even if unconceptualized information enters into
causal processes (e.g., motor control), it may not
be represented or made available to the cognitive
mind, not even as a nonconceptual
representation
- For something to be a representation, its content
must figure in explanations: it must capture
generalizations. It must have truth conditions
and therefore allow for misrepresentation. It is
an empirical question whether current proposals
(e.g., the primal sketch, scenarios) do so (cf.
Devitt on Pylyshyn's Razor)
105 Vision science has always been deeply ambivalent
about the role of conscious experience
- Isn't how things appear one of the things that
our theories must explain? Answer: there is no a
priori "must explain"!
- The content of subjective experience is a major
source of evidence. But it may turn out not to
be the most reliable source for inferring the
relevant functional states. It competes with
other types of evidence.
- How things appear cannot be taken at face value:
it carries substantive theoretical assumptions.
It also draws on many levels of processing.
- It was a serious obstacle to early theories of
vision (Kepler and the inverted image)
- It has been a poor guide in the case of theories
of mental imagery (e.g., color mixing, image
size, image distances). Reading X off an image
is an illusion.
- It seems likely that vision science will use
evidence of conscious experience the way
linguistics uses evidence of grammatical
intuitions: only as it is filtered through
developing theories.
- The questions a science is expected to answer
cannot be set in advance: they change as the
science develops. If they change too much we may
give up our current theories.
106 Index capacity and learning
- Daphne Bavelier's lab (Rochester) has shown that
videogame players (VGPs) can track a larger
number of objects in MOT (about 2 more targets)
- Non-VGPs can also increase the number tracked
after only 9 hours of practice on certain kinds of
(mostly violent) video games
- José Rivest (York U) has shown that some athletes
can track more targets than non-athletes
- Within individuals, the main determinant of the
number of targets that can be tracked is the
spacing between them (crowding)
- A widely cited result alleged to show that the
limit is not architectural is the effect of speed
on tracking
- We have shown that this is because increasing
speed increases crowding averaged over time.
When crowding is constant, speed is not a factor.
107 What next?
- This picture leaves many unanswered questions,
but it does provide a mechanism for solving the
binding problem and for explaining how mental
representations could have a nonconceptual
connection with objects in the world (something
required if mental representations are to connect
with actions)
108 Schema for how FINSTs function in hockey
109 For a copy of these slides, see
http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt
110 You are now here
X
But you are also here
112 Additional examples of MOT
- MOT with occlusion
- MOT with virtual occluders
- MOT with matched nonoccluding disappearance
- Track endpoints of lines
- Track rubber-band linked boxes
- Track and remember ID by location
- Track and remember ID by name (number)
- Track while everything briefly disappears (½ sec)
and goes on moving while invisible
- Track while everything briefly disappears and then
reappears where the objects were when they disappeared