A neglected problem in the computational theory of mind Object Tracking and the Mind-World gap - PowerPoint PPT Presentation

Slides: 104
Provided by: Zeno6

Transcript and Presenter's Notes

1
A neglected problem in the computational theory
of mind: Object Tracking and the Mind-World gap
  • In the course of these lectures I will try to
    show how several interconnected concepts are
    essential to understanding mind. They are:
  • Picking out, individuating, and nonconceptual
    selection
  • The type-token distinction: everyone is familiar
    with these terms, but people often fail to see
    their importance and relevance
  • This distinction crosses the proximal-distal
    distinction
  • The need for tagging or marking individuals
    to keep them distinct (but where does the tag
    reside?)
  • The correspondence problem: when do two proximal
    tokens correspond to the same individual (same
    distal token)?
  • The binding problem: how does the visual system
    indicate that several properties are conjoined,
    i.e., are properties of the same individual?

2
Before I begin I would like you to see a video
game that will figure in the last part of my talk
  • The demonstration shows a task called Multiple
    Object Tracking
  • Track the initially distinct (flashing) items
    through the trial (here 10 seconds) and indicate
    at the end which items are the targets
  • After each example I'd like you to ask yourself,
    "How do I do it?"
  • If you are like most of our subjects, you will
    have no idea, or a false idea

3
Keep track of the objects that flash
4
How do we do it? What properties of individual
objects do we use?
5
Going behind occluding surfaces does not disrupt
tracking
Scholl, B. J., & Pylyshyn, Z. W. (1999). Tracking
multiple items through occlusion: Clues to visual
objecthood. Cognitive Psychology, 38(2), 259-290.
6
Not all well-defined features can be tracked
Track the endpoints of these lines. (The endpoints
move exactly as the squares did!)
7
(No Transcript)
8
The basic problem of cognitive science
  • What determines our behavior is not how the world
    is, but how we represent it as being
  • As Chomsky pointed out in his review of Skinner,
    if we describe behavior in relation to the
    objective properties of the world, we would have
    to conclude that behavior is essentially
    stimulus-independent
  • Nearly every naturally-occurring person-level
    action or behavioral regularity is cognitively
    penetrable
  • Any information that changes beliefs can
    systematically and rationally change behavior

9
Representation and Mind: Why representations are
essential
  • Do representations come into play only in
    higher-level mental activities, such as
    reasoning?
  • Even at early stages of perception, many of the
    states that must be postulated are
    representations (i.e., what they are about plays
    a role in explanations).

10
Examples from vision (1): Intrapercept
constraints. Epstein, W. (1982). Percept-percept
couplings. Perception, 11, 75-83.
Far Top/ Far High
Front Bottom/ Back bottom
11
Another example of a classical representation
12
Other forms of representation.
Note the essential role played by the
letter-labels
  • Lines FG, BC are parallel and equal.
  • Lines EH, AD are parallel and equal.
  • Lines FB, GC are parallel and equal.
  • Lines EA, HD are parallel and equal.
  • Vertex pairs EF, HG, DC and AB are joined ...
  • Other predicate-argument representations:
  • Part-Of Cube: Top-Face(EFGH), Bottom-Face(ABCD),
    Front-Face(FGCB), Back-Face(EHDA)
  • Part-Of Top-Face: Front-Edge(FG), Back-Edge(EH),
    Left-Edge(EF), Right-Edge(HG), ...

13
What's wrong with this picture?
  • What's wrong is that the CTM (computational
    theory of mind) is incomplete: it does not
    address a number of fundamental questions
  • It fails to specify how representations connect
    with what they represent. It's not enough to use
    English words in the representation (that's been
    a common confusion in AI) or to draw pictures (a
    common confusion in theories of reasoning with
    mental images)
  • English labels and pictures may help the theorist
    recall which objects are being referred to
  • But what makes it the case that a particular
    mental symbol refers to one thing rather than
    another?
  • How are concepts grounded? (Symbol Grounding
    Problem)

14
Another way to look at what the Computational
Theory of Mind lacks
  • The missing function in the CTM is a mechanism
    that allows perception to refer to individual
    tokens in the visual field directly and
    nonconceptually
  • Not as "whatever has properties P1, P2, P3, ...",
    but as a singular term that refers directly to an
    individual and does not appeal to a prior
    representation of the individual's properties.
  • Such a reference is like a proper name, or a
    pointer in a computer data structure, or a
    demonstrative term (like "this" or "that") in
    natural language. But it is different from all of
    these, e.g.:
  • Unlike a demonstrative or a deictic term, its
    reference is not determined by discourse
    context
  • Unlike a proper name, it only refers to objects
    currently in view
  • Unlike the usual sort of pointer, it does not
    refer by addressing a location; rather, it is
    like a pointer in a computer that serves as a
    variable and does not refer via a location,
    despite what the term "pointer" might imply.
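The contrast between picking out an individual by description and referring to it directly can be sketched in a few lines of Python. This is a toy illustration, not a model from the lecture. The `Thing` class and the `describe` helper are hypothetical, and a plain Python object reference stands in for an index.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    """A hypothetical visible item with a few encodable properties."""
    color: str
    x: float
    y: float


def describe(scene, **props):
    """Descriptive reference: 'whatever has properties P1, P2, ...'.
    Returns every item that satisfies the description (possibly none)."""
    return [t for t in scene if all(getattr(t, k) == v for k, v in props.items())]


scene = [Thing("red", 0, 0), Thing("red", 5, 5), Thing("green", 9, 1)]

# A description may fail to pick out a unique individual.
reds = describe(scene, color="red")
print(len(reds))  # 2

# Stand-in for an index: a bare Python reference to one item. Holding it
# does not require encoding any of the item's properties. (How an index gets
# assigned is outside this toy; here we simply grab the third item.)
index = scene[2]

index.color = "blue"      # the item changes its appearance...
index.x, index.y = 3, 7   # ...and its location

print(describe(scene, color="green"))  # []
print(index is scene[2])               # True
```

The descriptive route either over-selects (two red items) or loses the item once its properties change. The reference keeps picking out the same individual because it never depended on the item's color or position.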

15
An example from personal history: Why we need to
pick out individual things without referring to
their properties
  • We wanted to develop a computer system that would
    reason about geometry by actually drawing a
    diagram and noticing adventitious properties of
    the diagram from which it would conjecture lemmas
    to prove
  • We wanted the system to be as psychologically
    realistic as possible so we assumed that it had a
    narrow field of view and noticed only limited,
    spatially-restricted information as it examined
    the drawing
  • This immediately raised the problem of
    coordinating noticings and led us to the idea of
    visual indexes to keep track of previously
    encoded parts of the diagram.

16
Begin by drawing a line.
L1
17
Now draw a second line.
L2
18
And draw a third line.
L3
19
Notice what you have so far (noticings are local:
you encode what you attend to).
L1
V6
L2
There is an intersection of two lines. But which
of the two lines you drew are they? There is no
way to indicate which individual things are seen
again without a way to refer to individual
(token) things.
20
Look around some more to see what is there.
L5
L2
V12
Here is another intersection of two lines. Is it
the same intersection as the one seen
earlier? Without a special way to keep track of
individuals, the only way to tell would be to
encode unique properties of each of the lines.
Which properties should you encode?
21
In examining a geometrical figure one only gets
to see a sequence of local glimpses
22
The incremental construction of visual
representations requires solving a correspondence
problem over time
  • We have to determine whether a particular
    individual element seen at time t is identical to
    another individual element seen at some earlier
    time. This is one manifestation of the
    correspondence problem.
  • Solving the correspondence problem is equivalent
    to picking out and tracking the identity of token
    individuals as they change their appearance,
    their location or the way they are encoded or
    conceptualized
  • To do that we need the capacity to refer to token
    individuals (I will call them objects) without
    doing so by appealing to their properties. This
    requires a special form of demonstrative
    reference I call a Visual Index.

23
A note about the use of labels in this example
  • There are two purposes for figure labels. One is
    to specify what type of individual it is (line,
    vertex, ...). The other is to specify which
    individual it is, so it can be bound to the
    argument of a predicate, which can then be
    evaluated.
  • The second of these is what I am concerned with,
    because it is essential that we be able to
    indicate which individual a predicate applies to.
  • Many people (e.g., Marr, Yantis) have suggested
    that individuals may be marked by tags. But that
    won't do, since one cannot literally place a tag
    on an object. Even if we could, it would not
    obviate the need to refer directly to individuals,
    for the same reason that labels didn't help in
    the geometry examples discussed earlier.
  • Labeling things in the world is not enough,
    because to refer to the line labeled L1 you would
    have to be able to think "this is line L1", and
    you could not think that unless you already had a
    way to pick out the referent of "this".

24
  • The difference between a direct (demonstrative)
    way and a descriptive (attributive) way of
    picking something out has produced many "You are
    here" cartoons.
  • It is also illustrated in this recent New Yorker
    cartoon

25
The difference between descriptive and
demonstrative ways of picking something out
(illustrated in this New Yorker cartoon by
Sipress)
26
Referring and Picking out
  • Picking out entails individuating, in the sense
    of separating an individual from a background
    (what Gestalt psychologists called a
    figure-ground distinction) and from all other
    possible things
  • This sort of picking out has been studied in
    psychology under the heading of focal or
    selective attention.
  • Focal attention can be understood as an instance
    of demonstrative reference!
  • Focal attention appears to pick out and adhere to
    objects rather than places
  • In addition to the usual unitary attention, there
    is also evidence for a mechanism of multiple
    direct references (about 4 or 5) that I have
    called a visual index or a FINST
  • Indexes are different from split focal attention
    in many ways that we have studied in our
    laboratory (I will mention a few later)
  • A visual index is like a pointer in a computer
    data structure: it allows access but does not
    itself reveal anything about what is being
    pointed to

27
The requirements for picking out and keeping
track of several individual things reminded me of
an early comic book character called Plastic Man
28
Imagine being able to place several of your
fingers on things in the world without
recognizing their properties while doing so. You
could then refer to those things (e.g., "what
finger 2 is touching") and could move your
attention to them. You would then be said to
possess FINgers of INSTantiation (FINSTs)
29
Some questions raised by this view of indexing as
primitive reference
  • Is there a limit on the number of such indexes?
    If so:
  • Is it a fixed structural (architectural) property?
  • Can it be altered by different tasks, experience,
    etc.?
  • How is it different from focal attention?
  • What determines whether something is attended?
  • What object properties allow objects to be
    tracked?
  • How can an object be selected without being
    selected as the object with property P (e.g.,
    the object at location <x,y>)? "Selection" is a
    misleading term.
  • Without some unique property, how do you know
    which object you have selected? This is a
    misleading way to put it.

30
FINST Theory postulates a limited number of
pointers in early vision that are elicited by
certain things in the visual field and that
enable vision to refer to those things without
doing so under a concept or a description
31
FINSTs and Object Files are the basic mechanisms
that link the world and its conceptualization
The only thing in this picture that is conceptual
is what's in the Object Files (unless you count a
reference as conceptual)
Object File contents are conceptual!
32
Summarizing FINST Theory
  • A FINST is a primitive reference mechanism that
    normally references individual visible objects in
    the world. There is a small number (4-5) of FINSTs
    available at any one time.
  • Objects are picked out and referred to without
    using any encoding of their properties, including
    their location.
  • Referring to objects (or more accurately, being
    grabbed by objects) is prior to encoding any of
    their properties!
  • Indexing is nonconceptual because it does not
    represent an individual as a member of some
    conceptual category.
  • An important function of FINST indexes is to bind
    arguments of visual predicates to things in the
    world. Only predicates with bound arguments can
    be evaluated. Since predicates are
    quintessential concepts, an index serves as a
    bridge from nonconceptual to conceptual
    representations.
  • Similarly, FINSTs can bind arguments of motor
    commands, including the command to move focal
    attention or gaze to the indexed object
  • e.g., MoveGaze(x) might be a primitive
    perceptual-motor operation

NOT MoveGaze(x, y, z), which gives the spatial
coordinates of the gaze target
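The difference between MoveGaze(x) and MoveGaze(x, y, z), and the requirement that a predicate's arguments be bound before it is evaluated, can be made concrete with a small Python sketch. All names here (`Item`, `Finst`, `move_gaze`, `move_gaze_xyz`, `evaluate`) are hypothetical illustrations, not an implementation from the lecture. The sketch assumes only that a command whose argument is an index reaches the item wherever it now is, while a command that carries coordinates can go stale.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical scene item; its position belongs to the item itself."""
    name: str
    pos: tuple


class Finst:
    """Hypothetical index: gives access to an item but keeps no copy of its
    properties (position, colour, shape, ...)."""

    def __init__(self, item):
        self._item = item

    def deref(self):
        return self._item


def move_gaze(index):
    """MoveGaze(x): the target position is read off the indexed item at the
    moment the command runs."""
    return index.deref().pos


def move_gaze_xyz(x, y, z):
    """MoveGaze(x, y, z): the command carries previously encoded coordinates."""
    return (x, y, z)


def evaluate(predicate, *indexes):
    """A visual predicate can be evaluated only when every argument slot is
    bound to an indexed item."""
    arity = len(inspect.signature(predicate).parameters)
    if len(indexes) != arity:
        raise ValueError(f"predicate needs {arity} bound arguments, got {len(indexes)}")
    return predicate(*(i.deref() for i in indexes))


target = Item("target", (1.0, 2.0, 0.0))
x = Finst(target)
stored = target.pos              # coordinates encoded at an earlier moment
target.pos = (4.0, 2.0, 0.0)     # the item moves

print(move_gaze(x))              # (4.0, 2.0, 0.0): follows the item
print(move_gaze_xyz(*stored))    # (1.0, 2.0, 0.0): stale coordinates
print(evaluate(lambda a: a.pos[0] > 3, x))  # True
```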
33
A note on terminology
  • A FINST provides a reference to an individual
    visible thing
  • I sometimes call this referent a FING (by analogy
    with FINST) and sometimes an "object", to conform
    with usage in psychology
  • A FINST does not pick out or refer to something
    as an object, because OBJECT is a concept. So
    FINGs are nonconceptual. Maybe "proto-object"?
  • I have also called it a pointer, but that
    erroneously suggests that it points to the
    location of an object, as opposed to the object
    itself. In a computer, a pointer is the name of
    a stored datum.
  • I have said that a FINST is a visual
    demonstrative like "this" or "that", but this too
    is misleading, because the reference of a
    demonstrative depends on the context and
    intentions of the speaker
  • I have also noted that a FINST is like a proper
    name, but that won't do either, since a name can
    pick out something not in sensory contact, whereas
    a FINST can only refer to a visible item (or one
    that has been only briefly out of sight).

34
A quick tour of some evidence for FINSTs
  • The correspondence problem
  • The binding problem
  • Evaluating multi-place visual predicates
    (recognizing multi-element patterns)
  • Operating over several visual elements at once
    without having to search for them first
  • Subitizing
  • Subset selection
  • Multiple-Object Tracking
  • Cognizing space without requiring a spatial
    display in the head

35
Dawson Configuration (Dawson & Pylyshyn, 1988)
36
Apparent motion solves a correspondence
problem: Dawson Configuration (Dawson & Pylyshyn,
1988)
37
Apparent motion solves a correspondence
problem: Dawson Configuration (Dawson & Pylyshyn,
1988)
Linear trajectory?
Curved trajectory?
Which criterion does the visual module prefer?
38
Apparent motion solves a correspondence
problem: Dawson Configuration (Dawson & Pylyshyn,
1988)
Nearest vector distance?
Nearest mean distance?
Nearest configural distance?
Which criterion does the visual module prefer?
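The slides ask which criterion the visual system prefers, but they do not define the candidates here. As one illustration only, the Python sketch below solves the correspondence problem with a minimum-total-displacement rule. Each element in frame 1 is paired with an element in frame 2 so that the summed distance moved is as small as possible. This is an assumed criterion chosen for simplicity, not Dawson and Pylyshyn's model, and `match_min_total_distance` is a hypothetical helper.

```python
from itertools import permutations
from math import dist


def match_min_total_distance(frame1, frame2):
    """Solve a correspondence problem by brute force.

    Returns a tuple m such that frame1[i] is matched to frame2[m[i]],
    choosing the one-to-one matching with the smallest summed displacement.
    Only practical for a handful of elements (n! candidate matchings)."""
    if len(frame1) != len(frame2):
        raise ValueError("both frames must contain the same number of elements")

    def total_displacement(m):
        return sum(dist(frame1[i], frame2[j]) for i, j in enumerate(m))

    return min(permutations(range(len(frame2))), key=total_displacement)


# Three dots each shift one unit to the right between frames; frame2 is
# listed in a scrambled order, so list position carries no identity.
frame1 = [(0, 0), (2, 0), (4, 0)]
frame2 = [(5, 0), (1, 0), (3, 0)]
print(match_min_total_distance(frame1, frame2))  # (1, 2, 0)
```

For larger displays the same assignment problem can be solved without enumerating permutations, for example with `scipy.optimize.linear_sum_assignment` on a distance matrix. Displays like the Dawson configuration are built to pit rules of this kind against alternatives such as configural distance.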
39
Dawson Configuration (animated)
40
Dawson Configuration (animated)
41
Dawson Configuration: Different Shapes Ignored
42
Yantis's use of the Ternus configuration to
demonstrate the early visual effect of objecthood
Short time delays result in element motion (the
middle object persists as the same object, so it
does not appear to move)
43
Long time delays result in group motion because
the middle object does not persist but is
perceived as a new object each time it reappears
44
But long delays, when the disappearance appears
to be due to occlusion by an opaque surface,
maintain objecthood, and therefore behave like
short delays
45
But long delays, when the disappearance appears
to be due to occlusion by an opaque surface,
maintain objecthood, and therefore behave like
short delays
46
A quick tour of some evidence for FINSTs
  • The correspondence problem
  • The binding problem
  • Evaluating multi-place visual predicates
    (recognizing multi-element patterns)
  • Operating over several visual elements at once
    without having to search for them first
  • Subitizing
  • Subset search
  • Multiple-Object Tracking
  • Cognizing space without requiring a spatial
    display in the head

47
Encoding conjunctions of properties and solving
the Binding Problem
  • Experiments have shown that detecting
    conjunctions of several properties involves
    attending to the bearers of the properties.
    These studies have provided a basis for
    understanding an important problem in visual
    analysis: the Binding Problem
  • The following aside is to illustrate some aspects
    of the problem of encoding conjunctions.

48
How are conjunctions of features detected?
Read the vertical line of digits in the following
display
Under these conditions Conjunction Errors are
very frequent
49
Rapid visual search (Treisman)
Find the following simple figure in the next
slide
50
This case is easy and the time is independent
of how many nontargets there are because there
is only one red item. This is called a popout
search.
51
This case is also easy and the time is
independent of how many nontargets there are
because there is only one right-leaning item.
This is also a popout search.
52
Rapid visual search (conjunction)
Find the following simple figure in the next
slide
53
(No Transcript)
54
Find the unique item in this slide
55
Example of features and feature
conjunctions (Count Pink vs. Count Online)
56
Serial vs parallel search?
  • Finding an element that differs from all others
    in a scene by a single feature (which is called
    a feature search) is fast, error-free and almost
    independent of how many nontargets there are,
    but
  • Finding a target that differs from some objects
    by one or more of its features while it differs
    from the other objects by another of its features
    is usually slow, error-prone, and worse when
    there are more objects. (This is called a
    conjunction search because several properties of
    a target are needed to distinguish it from the
    nontarget objects.)
  • These results suggest that in order to find a
    conjunction, attention has to be scanned serially
    over the objects.

As with most empirical generalizations, this
one fails under certain conditions, such as
when one of the properties is motion or 3D depth.
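The logic of the feature/conjunction contrast can be sketched in Python. The sketch assumes that each feature has its own map. Under that assumption, pop-out is possible exactly when some feature value of the target is unique in the display. In a conjunction display, every target value is shared with some distractor, so no single map isolates the target and the sketch falls back on a serial scan. The helper names are hypothetical. The number of items inspected is a step count in this toy model, not a reaction-time prediction.

```python
def popout_feature(items, target):
    """Return a feature of the target whose value no other item shares.
    If there is one, a single feature map isolates the target (pop-out)."""
    for feature, value in target.items():
        if sum(it[feature] == value for it in items) == 1:
            return feature
    return None


def serial_scan(items, target):
    """Attend to items one at a time, checking the whole conjunction on each.
    Returns (position of target, number of items inspected)."""
    for n, it in enumerate(items, start=1):
        if all(it[k] == v for k, v in target.items()):
            return n - 1, n
    return None, len(items)


target = {"color": "red", "tilt": "right"}

# Feature display: the target is the only red item.
feature_display = [{"color": "green", "tilt": "right"}] * 7 + [target]

# Conjunction display: red and right-leaning each also occur on distractors.
conjunction_display = (
    [{"color": "red", "tilt": "left"}] * 4
    + [{"color": "green", "tilt": "right"}] * 3
    + [target]
)

print(popout_feature(feature_display, target))      # color
print(popout_feature(conjunction_display, target))  # None
print(serial_scan(conjunction_display, target))     # (7, 8)
```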
57
Single-Feature vs Conjunction-feature search
58
The idea that attention moves without eye
movements has been known for 30 years
  • Because a sequence of eye movements takes several
    hundred milliseconds, the serial search view
    relies on the assumption that attention can move
    like a beam of light, rapidly and without eye
    movements
  • Posner's experiments showed that this can occur
    under exogenous (data-driven or bottom-up)
    control or endogenous (voluntary or top-down)
    control

59
Covert movements of attention
Example of an experiment using a cue-validity
paradigm for showing that the locus of attention
moves without eye movements and for estimating
its speed. Posner, M. I. (1980). Orienting of
Attention. Quarterly Journal of Experimental
Psychology, 32, 3-25.
60
Recall Posner's demonstration of exogenous
attention switching
Does the improved detection in intermediate
locations entail that the spotlight of
attention moves continuously through empty space?
61
Sperling & Weichselgartner (1995): Episodic or
Quantal Theory of attention switching
Assumes a quantal shift in attention in which
the spotlight pointed at location -2 is
extinguished and, simultaneously, the spotlight
at location 2 is turned on. Because extinction
and onset take a measurable amount of time, there
is a brief period when the spotlights partially
illuminate both locations simultaneously.
62
So if there is a visual attention beam it must
be scanned rapidly and must fall on conjunctions
of features for the features to be encoded as
conjoined
  • Now one can appreciate how something like
    attention is needed to solve the binding problem
    of representing features as being conjoined. But
    there is also reason to believe that one needs
    not one, but several such conjunction-detectors
    in order to recognize patterns.

63
So that brings us to a substantial constraint
on the mechanisms of early vision: they must
keep track of which properties are conjoined.
  • It is not enough to detect which properties are
    present in a scene. The earliest stages in the
    vision system must group these properties (or
    features) in the right way to preserve the
    information that some properties go together. If
    this information is lost, the scene cannot be
    correctly perceived.

64
Pandemonium: An architecture for vision
proposed by Oliver Selfridge in 1959. This
idea continues to be at the heart of many
psychological models of vision, especially
connectionist and neural net models. It is also
a progenitor of blackboard architectures for
computational systems for vision and speech
perception (CMU's Hearsay). This architecture
fails to keep track of which features are
conjoined (which property goes with which). It
also does not provide a way to represent tokens
of the same type.
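A drastically simplified Python sketch shows both shortcomings. It assumes that each feature detector reports only whether its feature is present somewhere in the scene. That is a caricature of Selfridge's design, which has several layers of demons, and `pandemonium_report` is a hypothetical name.

```python
def pandemonium_report(scene):
    """Each feature 'demon' shouts if its feature is present anywhere in the
    scene; the decision level only hears which demons are shouting."""
    return {feature for obj in scene for feature in obj}


# Two scenes that differ only in how colour and shape are conjoined.
scene_a = [("red", "square"), ("green", "circle")]
scene_b = [("red", "circle"), ("green", "square")]
print(pandemonium_report(scene_a) == pandemonium_report(scene_b))  # True

# Two scenes that differ only in how many tokens of one type are present.
scene_c = [("red", "square"), ("red", "square")]
scene_d = [("red", "square")]
print(pandemonium_report(scene_c) == pandemonium_report(scene_d))  # True
```

In both pairs the scenes differ, yet the reports are identical. The first pair shows the lost binding information, and the second shows the lost token information.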
65
A popular proposal for solving the Binding
Problem: Encode Features-at-Locations
  • The near-universal view as to how the binding
    problem is solved is that conjunctions are
    computed as co-located features: the visual
    system encodes features-at-locations.
  • Austen Clark (in A Theory of Sentience),
    following the tradition of Quine and Strawson,
    also assumes that location is primary and that in
    our most primitive nonconceptual sensory contact
    with the world, which he calls the level of
    sentience, the only resources available are
    those of what Strawson called a feature-placing
    language. Our sensory system can only encode
    "Feature F at location L".
  • This is also the standard view in psychology (as
    embodied in Treisman's Feature Integration
    Theory)
  • But this can't be right, for some simple reasons.

66
Treisman's Attention as Glue Hypothesis
  • The purpose of visual attention is to bind
    properties together in order to recognize objects
  • Thus the purpose of attention is to solve the
    binding problem
  • We can recognize not only the presence of
    squareness and redness in our field of view,
    but we can also distinguish between different
    ways they may be conjoined

67
The role of attention to location in Treisman's
Feature Integration Theory
The map for feature F shows where Fs are
located. To determine that F1 and F2 are
colocated, the attention beam checks the
locations on each feature map against its
location in the Master Map.
68
There is another way to read the
attention-as-glue hypothesis: in computing
conjunctions of properties, attention must be
directed primarily at objects, since it is objects
that have the conjoined properties
  • Instead of being like a spotlight beam that can
    be scanned around a scene, and can be zoomed to
    cover a larger or smaller area, maybe attention
    can only be directed towards occupied places,
    i.e., to visual objects. There is now
    considerable evidence for this claim, some of
    which was reviewed earlier (e.g., the
    single-object advantage; both detection
    sensitivity and inhibition of return travel with
    the relevant object as it moves).

69
Individual objects and the binding problem
  • We can distinguish scenes that differ by
    conjunctions of properties, so early vision must
    somehow keep track of how properties co-occur:
    conjunction must not be obscured. This is
    called the binding problem
  • The most common proposal is that vision keeps
    track of properties according to their location
    and binds together co-located properties.

70
The proposal of binding conjunctions by the
location of conjuncts does not work when feature
location is not punctate, and it becomes even more
problematic if the features are co-located, e.g.,
if their relation is "inside"
71
Binding must be object-based
  • The proposal that properties are conjoined by
    virtue of their common location has many problems
  • In order to assign a location to a property you
    need to know its boundaries, which requires
    distinguishing the object that has those
    properties from its background (figure-ground
    individuation)
  • Properties are properties of objects, not of
    locations, which is why properties move when
    objects move. Moreover, empty locations have no
    causal properties.
  • The alternative to conjoining-by-location is
    conjoining by object. According to this view,
    solving the binding problem requires first
    distinguishing individual objects and then
    keeping track of each object's properties (in its
    object file)
  • If only properties of selected objects are
    encoded and if those properties are recorded in
    object files specific to each object, then all
    conjoined properties will be recorded in the same
    object file, thus solving the binding problem
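The contrast between binding by location and binding by object can be sketched in Python. The sketch uses the "inside" case from the earlier slide: a square centred inside a circle, where both figures share a location. For simplicity, each object's location is reduced to its centre point. That is an assumption, since real feature locations are not punctate, as noted above. The function names are hypothetical.

```python
from collections import defaultdict

# Two scenes that differ only in how colour and shape are conjoined:
# a square centred inside a circle, both centred at (0, 0).
scene_a = [
    {"shape": "square", "color": "red", "center": (0, 0)},
    {"shape": "circle", "color": "green", "center": (0, 0)},
]
scene_b = [
    {"shape": "square", "color": "green", "center": (0, 0)},
    {"shape": "circle", "color": "red", "center": (0, 0)},
]


def features_at_locations(scene):
    """Feature-placing: record 'feature F at location L', keyed by location."""
    fmap = defaultdict(set)
    for obj in scene:
        fmap[obj["center"]] |= {obj["shape"], obj["color"]}
    return dict(fmap)


def object_files(scene):
    """Object-based binding: one file per individuated object; every property
    encoded for that object is stored in that object's own file."""
    return [frozenset({obj["shape"], obj["color"]}) for obj in scene]


print(features_at_locations(scene_a) == features_at_locations(scene_b))  # True
print(set(object_files(scene_a)) == set(object_files(scene_b)))          # False
```

The location-keyed map merges all four features at (0, 0) and cannot tell the scenes apart. The object files keep red-with-square and green-with-circle separate. Note that `object_files` presupposes that the objects have already been individuated, which is exactly the step the slide says must come first.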

72
Binding must be object-based
  • Another reason why conjunction-by-location fails
    is that conjunction must be computed for:
  • Token moving objects
  • Token objects whose representation is built up
    incrementally over time
  • Token objects that appear on the two retinae
  • In all cases what must be computed is the
    distinction between "there it is again" and "here
    is another one": the correspondence between
    proximal patterns when they derive from the same
    distal object vs. when they derive from different
    objects.

Credit: Austen Clark
73
Attention is object-based
  • There is a great deal of evidence that
    attention favors objects and adheres to those
    objects as they move.
  • The assumption that attention is like a spotlight
    and moves through space has been challenged by
    Weichselgartner and Sperling (1987) who provided
    an alternative explanation for the Posner
    evidence
  • Single object advantage
  • Evidence that attention moves with an object's
    motion
  • Object File experiments described later that show
    an Object-Specific Priming Benefit (OSPB)
  • Even abstract objects can be tracked through
    feature space
  • Attention is assigned to (spreads to) entire
    objects when some part of the object is attended
    (Egly, Driver & Rafal, 1994).

74
Attention spreads over perceived objects
(Figure: Spreads to B and not C / Spreads to C and not B)
Using a priming method, Egly, Driver & Rafal
(1994) showed that the effect of a prime spreads
to other parts of the same visual object compared
to equally distant parts of different objects.
75
A quick tour of some evidence for FINSTs
  • The correspondence problem (mentioned earlier)
  • The binding problem
  • Evaluating multi-place visual predicates (or
    recognizing multi-element patterns)
  • Operating over several visual elements at once
    without having to search for them first
  • Subitizing
  • Subset selection
  • Multiple-Object Tracking
  • Cognizing space without requiring a spatial
    display in the head

76
Being able to pick out and refer to individual
distal elements is essential for encoding patterns
  • Encoding relational predicates, e.g.,
    Collinear(x, y, z, ...), Inside(x, C),
    Above(x, y), Square(w, x, y, z), requires
    simultaneously binding the arguments of n-place
    predicates to n elements in the visual scene
  • Evaluating such visual predicates requires
    individuating and referring to the objects over
    which the predicate is evaluated, i.e., the
    arguments in the predicate must be bound to
    individual elements in the scene.
  • In detecting patterns, the properties of
    individual objects must be ignored, and the
    evidence from MOT suggests that these properties
    are indeed not encoded.
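To make the binding requirement concrete, here is a Python sketch of a three-place predicate, Collinear(x, y, z). The predicate cannot be computed until each argument slot is bound to a particular element. Once that is done, only the geometric relation among the bound elements is computed, and their other properties (here, colour) play no role. The `Dot` class and `collinear` function are hypothetical, and the cross-product test is just one standard way to check collinearity.

```python
from dataclasses import dataclass


@dataclass
class Dot:
    """A hypothetical scene element; colour is present but never consulted."""
    x: float
    y: float
    color: str


def collinear(a, b, c, tol=1e-9):
    """Collinear(x, y, z): true if the three bound elements lie on one line.
    Uses the 2-D cross product of (b - a) and (c - a)."""
    cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return abs(cross) <= tol


scene = [Dot(0, 0, "red"), Dot(1, 1, "green"), Dot(5, 0, "red"), Dot(2, 2, "blue")]

# Hypothetical indexes: plain references to three elements, bound to the
# predicate's three argument slots before it is evaluated.
x, y, z = scene[0], scene[1], scene[3]
print(collinear(x, y, z))         # True
print(collinear(x, y, scene[2]))  # False
```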

77
Several objects must be picked out at once in
making relational judgments
When we judge that certain objects are
collinear, we must first pick out the relevant
objects while ignoring their properties
78
Several objects must be picked out at once in
making relational judgments
  • The same is true for other relational judgments,
    like "inside" or "on the same contour". We must
    pick out the relevant individual objects first.
    Are the dots inside the same contour? On the
    same contour?

79
A quick tour of some evidence for FINSTs
  • The correspondence problem
  • The binding problem
  • Evaluating multi-place visual predicates
    (recognizing multi-element patterns)
  • There is evidence that we can operate over
    several visual elements at once without first
    having to search for them
  • Subitizing
  • Subset selection
  • Multiple-Object Tracking
  • Cognizing space without requiring a spatial
    display in the head

80
More functions of FINSTs: Further experimental
explorations using different paradigms
  • Recognizing the cardinality of small sets of
    things: subitizing vs. counting (Trick, 1994)
  • Searching through subsets: selecting items to
    search through (Burkell, 1997)
  • Selecting subsets and maintaining the selection
    during a saccade (Currie, 2002)
  • Application of FINST index theory to infant
    cardinality studies (Carey, Spelke, Leslie,
    Uller, etc.)
  • Indexes explain how children are able to acquire
    words for objects by ostension without suffering
    from Quine's "Gavagai" problem.

81
Signature subitizing phenomena only appear when
objects are automatically individuated and indexed
(Figure: counting slope vs. subitizing slope)
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are
small and large numbers enumerated differently? A
limited capacity preattentive stage in vision.
Psychological Review, 101(1), 80-102.
82
Subitizing results
  • There is evidence that a different mechanism is
    involved in enumerating small (n < 4) and large
    (n > 4) numbers of items (even different brain
    mechanisms; Dehaene & Cohen, 1994)
  • Rapid small-number enumeration (subitizing) only
    occurs when items are first (automatically)
    individuated
  • Subitizing is not affected by precuing location,
    while counting is
  • Subitizing is insensitive to distance among
    items
  • Our explanation for what is special about
    subitizing is that once FINST indexes are
    assigned to n < 4 individual objects, the objects
    can be enumerated without first searching for
    them. In fact, they might be enumerated simply by
    counting active indexes, which is fast and
    accurate because it does not require visual
    scanning.
  • New data on subitizing: limits on enumeration
    may be related to recalling which items have
    already been counted (Haladjian, 2009).
  • Trick, L. M., & Pylyshyn, Z. W. (1994).
    Why are small and large numbers enumerated
    differently? A limited capacity preattentive
    stage in vision. Psychological Review, 101(1),
    80-102.
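The idea of enumerating by counting active indexes can be sketched in Python. The capacity of 4 follows the "4-5" estimate in these slides and is an assumption of the sketch, as is the treatment of the boundary case (the sketch subitizes whenever every item can get an index). The function name is hypothetical. The returned "items visited" figure is a step count in the toy model, not a reaction-time prediction.

```python
def enumerate_items(items, capacity=4):
    """Toy model of small- vs. large-number enumeration.

    If every item can grab an index, the answer is read off the number of
    active indexes, with no visual scanning. Otherwise the items are visited
    one by one, marking each as counted so it is not counted twice.
    Returns (count, method, items_visited)."""
    indexed = items[:capacity]            # each item grabs a free index
    if len(indexed) == len(items):
        return len(indexed), "subitize", 0
    counted, visited = set(), 0
    for position, _ in enumerate(items):  # serial scan over the display
        visited += 1
        counted.add(position)             # remember what has been counted
    return len(counted), "count", visited


print(enumerate_items(["a", "b", "c"]))  # (3, 'subitize', 0)
print(enumerate_items(list("abcdefg")))  # (7, 'count', 7)
```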

83
Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching
through subsets: A test of the visual indexing
hypothesis. Spatial Vision, 11(2), 225-258.
84
Subset search results
  • The finding is that only properties of the
    subset matter
  • Note that properties of the entire subset are
    taken into account simultaneously (since that is
    what distinguishes a feature search from a
    conjunction search)
  • If the subset calls for a single-feature search,
    search is fast and the slope (RT vs. number of
    items) is shallow
  • If the subset calls for a conjunction search,
    search takes longer and is more sensitive to set
    size
  • As with subitizing, the distance between targets
    does not matter, so observers don't seem to be
    scanning the display looking for the target
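The claim that only the cued subset matters can be made concrete with a small classifier, assuming the standard definition: a search is a feature search if one dimension alone separates the target from every distractor, and a conjunction search if the target shares each of its features with some distractor. The `Item` type, the two dimensions (color, shape), and the example display are hypothetical; the sketch shows how restricting search to an indexed subset can change the search type, not how Burkell & Pylyshyn (1997) constructed their stimuli.

```python
from typing import NamedTuple


class Item(NamedTuple):
    color: str
    shape: str


def search_type(target, distractors):
    """'feature' if one dimension alone separates the target from every
    distractor; otherwise 'conjunction' (the target shares each of its
    features with some distractor)."""
    for dim in Item._fields:
        if all(getattr(d, dim) != getattr(target, dim) for d in distractors):
            return "feature"
    return "conjunction"


target = Item("red", "X")
# Whole display: red Os and green Xs, so a red X is a conjunction target.
all_distractors = [Item("red", "O"), Item("green", "X"),
                   Item("green", "X"), Item("red", "O")]
# Hypothetical cued subset: only the green Xs (plus the target) are indexed.
cued_distractors = [d for d in all_distractors if d.color == "green"]

print(search_type(target, all_distractors))   # conjunction
print(search_type(target, cued_distractors))  # feature
```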

85
The stability of the visual world entails the
capacity to re-identify individuals after a
saccade
  • There is no problem about how tactile selection
    can provide a stable world when you move around
    while keeping your fingers on the same objects,
    because in that case retaining individual
    identity is automatic
  • But with FINSTs the same can be true with vision
    for a small number of visual objects
  • This is compatible with the finding that people
    appear to retain the relative locations of only
    about 4 elements across saccadic eye movements
    (Irwin, 1996).
    Irwin, D. E. (1996). Integrating information
    across saccadic eye movements. Current Directions
    in Psychological Science, 5(3), 94-100.
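Why an index, unlike a stored retinal location, would survive a saccade can be sketched with a toy scene. Objects have world positions, retinal position is world position minus the fixation point, and four objects (a hypothetical choice echoing the capacity of about 4) are indexed before the eye moves. After the saccade the index references still pick out the same objects, while a purely retinal record now points at empty locations or at the wrong object. The scene, the helper names `retinal` and `object_at`, and the absence of any remapping step are assumptions of the sketch; it is not a model of Irwin's (1996) data.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: int  # position in the world (arbitrary units)
    y: int


def retinal(obj, gaze):
    """Retinal position: world position relative to the fixation point."""
    return (obj.x - gaze[0], obj.y - gaze[1])


def object_at(scene, loc, gaze):
    """Whatever object currently falls at a given retinal location, if any."""
    for obj in scene:
        if retinal(obj, gaze) == loc:
            return obj
    return None


scene = [Obj("a", 0, 0), Obj("b", 3, 1), Obj("c", -2, 4),
         Obj("d", 5, -3), Obj("e", 2, 1)]
gaze = (0, 0)

# Hypothetical: four objects are indexed before the eye moves.
indexes = scene[:4]                           # references to the objects
stored = [retinal(o, gaze) for o in indexes]  # retinal record, for contrast

gaze = (2, 1)  # saccade: every retinal position shifts by the same amount

by_index = [o.name for o in indexes]
by_location = [getattr(object_at(scene, loc, gaze), "name", None)
               for loc in stored]
print(by_index)     # ['a', 'b', 'c', 'd']
print(by_location)  # ['e', None, None, None]
```

A location-based scheme would need some extra remapping step to recover; the toy simply omits one to make the contrast visible.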

86
The selective search experiment with a saccade
induced between the late-onset cues and the start
of search
Even with a saccade between selection and access,
items can be accessed efficiently
87
A quick tour of some evidence for FINSTs
  • The correspondence problem (mentioned earlier)
  • The binding problem
  • Evaluating multi-place visual predicates
    (recognizing multi-element patterns)
  • Operating over several visual elements at once
    without having to search for them first
  • Subitizing
  • Subset selection (with saccades)
  • What happens when objects move?
  • The ultimate test: Multiple Object Tracking

88
Detecting objects means not only solving the
binding problem over objects, but also detecting
and keeping track of properties as objects move.
  • The correspondence problem (mentioned earlier)
  • The binding problem
  • Evaluating multi-place visual predicates
    (recognizing multi-element patterns)
  • Operating over several visual elements at once
    without having to search for them first
  • Subitizing
  • Subset selection
  • What happens when objects move?
  • The ultimate test: Multiple Object Tracking

89
Object File Experiments (Kahneman, Treisman, &
Gibbs, 1992). Priming: object information sticks
to moving objects
[Figure: object-file priming display with letters F, D, F]
90
Inhibition of return appears to be object-based
(as well as, to some extent, location-based)
  • Inhibition-of-return is thought to help in visual
    search since it prevents previously visited
    objects from being revisited
  • The original study used static objects. Tipper,
    Driver, and Weaver (1991) then showed that IOR
    moves with the inhibited object.

91
The time course of attention: Inhibition of return
  • If we vary the time between the cue and target in
    a modified Posner paradigm, we find that when the
    Cue-Target-Onset-Asynchrony (CTOA) gets to
    around 300-900 ms, reaction time to the target
    begins to increase. This is called
    Inhibition-of-return (Klein, 2000).
  • To get this effect, we actually have to attract
    attention to the target location and then attract
    it back to the origin. IOR is one of many
    examples of an inhibition effect being produced
    by attention.

92
IOR appears to be object-based (it travels with
the object that was attended)
93
Tracking objects not defined by distinct spatial
locations and spatial trajectories
Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O.
(2000). Tracking an object through feature-space.
Nature, 408(Nov 9), 196-199.
94
Demonstrating the function of FINSTs with
Multiple Object Tracking (MOT)
  • In a typical experiment, 8 simple identical
    objects are presented on a screen and 4 of them
    are briefly distinguished in some visual manner,
    usually by flashing them on and off.
  • After these 4 targets are briefly identified, all
    objects resume their identical appearance and
    move randomly. The observer's task is to keep
    track of the ones that had been designated as
    targets at the start.
  • After a period of 5-10 seconds, the motion stops
    and observers must indicate, using a mouse, which
    objects are the targets
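The MOT procedure just described can be sketched as a simulated trial: 8 discs, 4 of them flagged as flashing, all flags cleared, random motion, and a final report. Following the FINST hypothesis, an index is modeled here, as an assumption, as a bare reference to a disc, so nothing about the disc's location or appearance is copied. Because the reference simply follows the object, the sketch assumes rather than explains that indexes stay attached; it does not capture crowding, speed effects, or tracking errors. The parameters `steps`, `size`, and the motion noise are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(eq=False)  # discs compare by identity, never by their properties
class Disc:
    x: float
    y: float
    flashing: bool = False


def run_mot_trial(n_objects=8, n_targets=4, steps=100, size=10.0, seed=0):
    rng = random.Random(seed)
    discs = [Disc(rng.uniform(0, size), rng.uniform(0, size))
             for _ in range(n_objects)]
    targets = rng.sample(discs, n_targets)

    # 1. Targets are briefly distinguished by flashing.
    for d in targets:
        d.flashing = True
    # Hypothetical FINST step: each flashing disc grabs an index, modeled
    # as a bare reference; no location or property is copied.
    indexes = [d for d in discs if d.flashing]

    # 2. All discs resume an identical appearance.
    for d in discs:
        d.flashing = False

    # 3. Random motion, clipped to the display.
    for _ in range(steps):
        for d in discs:
            d.x = min(size, max(0.0, d.x + rng.gauss(0, 0.3)))
            d.y = min(size, max(0.0, d.y + rng.gauss(0, 0.3)))

    # 4. Motion stops; the indexes are followed to select the targets.
    #    With eq=False, `in` tests object identity.
    reported = [i for i, d in enumerate(discs) if d in indexes]
    actual = [i for i, d in enumerate(discs) if d in targets]
    return reported, actual


reported, actual = run_mot_trial()
print(reported == actual)  # True
```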

95
Another example of MOT: with self-occlusion
[Figure; labels: 5 x 5, 1.75 x 1.75]
96
Self-occlusion does not seriously impair tracking
97
Some Multiple Object Tracking Findings
  • Basic finding: Most people can track at least 4
    targets that move randomly among identical
    non-target objects (even 5-year-old children can
    track 3 objects)
  • We have now accumulated dozens of results that I
    will list later as they have implications for
    FINST theory.
  • How is it done? A first pass
  • We showed that it is unlikely that the tracking
    is done by keeping a record of the targets'
    locations and updating them by serially visiting
    the objects (Pylyshyn & Storm, 1998)
  • Other strategies may be employed (e.g., tracking
    a single deforming pattern), but they do not
    explain tracking
  • Hypothesis: FINST indexes get assigned to
    targets. At the end of the trial, these pointers
    can be used to move attention to the targets and
    hence to select them

98
What role do visual properties play in MOT?
  • Certain properties may have to be present in
    order for an object to be indexed, and certain
    properties (probably different properties) may be
    required in order for the index to keep track of
    the object, but this does not mean that such
    properties are encoded, stored, or used in
    tracking.
  • Compare this with Kripke's distinction between
    properties that fix the referent of a proper name
    and the property that the name refers to. The
    former only plays a role at the name's initial
    baptism.
  • Is there something special about location? Do we
    record and track properties-at-locations?
  • Location in time and space may be essential for
    individuating objects, but locations need not be
    encoded or made cognitively available
  • The fact that an object is actually at some
    location or other does not mean that it is
    represented as such. Representing property P
    (where P happens to be at location L) ≠
    representing property P-is-at-L.

99
A way of viewing what goes on in MOT
  • According to Kahneman & Treisman's Object File
    theory, the appearance of a new visual object
    causes a new Object File to be created. Each
    object file is associated with its respective
    object presumably through a FINST Index.
  • The object file may contain information about the
    object to which it is attached. But according to
    FINST Theory, keeping track of the object's
    identity does not require the use of this
    information. The evidence suggests that in MOT,
    little or nothing may be stored in the object
    file except in some special cases (e.g., when the
    object suddenly changes or disappears).
  • What makes something the same object over time is
    that it remains connected to the same object-file
    (by the same FINST). Thus, for vision to treat
    something as the same enduring individual does
    not require appeal to properties or concepts.
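The idea that sameness is carried by the index rather than by stored properties can be sketched as a small data structure. The `ObjectFile` and `ObjectFiles` names are hypothetical, and Python object identity stands in for the non-conceptual causal link between an index and its object. In the example, an object's color changes without any change to its file, and two objects that start out with identical properties remain distinct individuals.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ObjectFile:
    finst: int                                    # index tying file to object
    contents: dict = field(default_factory=dict)  # may stay empty during MOT


class ObjectFiles:
    """Toy object-file store; id(object) stands in for the causal link
    between an index and the object it is attached to."""

    def __init__(self):
        self._next = count()
        self._file_for = {}  # id(object) -> ObjectFile

    def onset(self, obj):
        """A new visual object appears: open a file via a fresh index."""
        f = ObjectFile(next(self._next))
        self._file_for[id(obj)] = f
        return f

    def file_of(self, obj):
        return self._file_for[id(obj)]

    def same_individual(self, a, b):
        # Sameness is read off the index; properties are never consulted.
        return self.file_of(a).finst == self.file_of(b).finst


files = ObjectFiles()
disc = {"color": "red", "shape": "circle"}
twin = {"color": "red", "shape": "circle"}  # identical properties
f = files.onset(disc)
files.onset(twin)

disc["color"] = "green"                   # a property changes mid-trial
print(files.file_of(disc) is f)           # True: still the same object file
print(files.same_individual(disc, twin))  # False, despite identical start
```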

100
Why is this relevant to foundational questions in
the philosophy of mind?
  • According to Quine, Strawson, and most
    philosophers, you cannot pick out or track
    individuals without concepts (sortals)
  • But you also cannot pick out individuals with
    only concepts
  • Sooner or later you have to pick out individuals
    using non-conceptual causal connections between
    thoughts and things
  • The present proposal is that FINSTs provide the
    needed non-conceptual mechanism for individuating
    objects and for tracking their identity, which
    works most of the time in our kind of world. It
    relies on natural constraints (Marr)
  • FINST indexes provide the right sort of
    connection for predicating properties of the
    world by allowing the arguments of predicates to
    be bound to objects prior to the predicates being
    evaluated. They may thus be the basis for early
    vocabulary learning.
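Binding the arguments of a predicate to indexed objects before the predicate is evaluated can be illustrated with a three-place predicate. Collinearity is used here only as an example of a multi-place visual predicate, and `Blob`, `collinear`, and the scene are hypothetical. The point of the sketch is the order of operations: the arguments are fixed by the indexes first, and the predicate is then evaluated on those objects without searching the scene for them.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    x: float
    y: float


def collinear(a, b, c, tol=1e-9):
    """Three-place predicate over already-bound arguments: do a, b, c lie
    on one line? Uses the 2-D cross product of (b - a) and (c - a)."""
    cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return abs(cross) <= tol


scene = [Blob(0, 0), Blob(5, 7), Blob(1, 1), Blob(2, 2)]

# Hypothetical: three indexes have already been grabbed by scene[0], [2], [3].
# Binding comes first; the predicate never searches the scene for arguments.
x1, x2, x3 = scene[0], scene[2], scene[3]
print(collinear(x1, x2, x3))                    # True
print(collinear(scene[0], scene[1], scene[2]))  # False
```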

101
But there must be some properties that cause
indexes to be grabbed!
  • Of course there are properties that are causally
    responsible for indexes being grabbed, and also
    properties (probably different ones) that make it
    possible for objects to be tracked
  • But these properties need not be represented
    (encoded) and used in tracking
  • The distinction between object properties that
    cause indexes to be assigned and those that are
    represented (in Object Files) is similar to
    Kripke's distinction between properties that are
    needed to pick out and name an object and those
    that constitute its meaning

102
Role of target properties in MOT: Evidence that
they play little or no part in tracking
  • Changes of target properties are neither reported
    nor even noticed during MOT
  • Giving all targets different colors, sizes, or
    shapes does not improve tracking
  • Observers do not use target speed or direction in
    tracking (e.g., by anticipating where the targets
    will reappear after occlusion). But they do
    appear to retain the objects' locations at the
    time they disappeared, since if they reappear at
    the location where they disappeared, tracking is
    not impaired.

103
Some open questions
  • We have arrived at the view that only properties
    of selected (indexed) objects enter into
    subsequent conceptualization and perception-based
    thought (i.e., only information in object files
    is made available to cognition)
  • So what happens to the rest of the visual
    information?
  • Visual information seems rich and fine-grained
    while this theory only allows for the properties
    of 4 or 5 objects to be encoded!
  • The present view leaves no room for nonconceptual
    representations whose content corresponds to the
    content of conscious experience
  • According to the present view, the only content
    that nonconceptual representations have is the
    demonstrative content of indexes that refer to
    perceptual objects
  • Question: Why do we need any more than that?

104
An intriguing possibility.
  • Maybe the theoretically relevant information we
    take in is less than (or at least different from)
    what we experience
  • This possibility has received attention recently
    with the discovery of various blindnesses
    (e.g., change blindness, inattentional blindness,
    blindsight) as well as the discovery of
    independent vision systems (e.g., recognition and
    motor control)
  • The qualitative content of conscious experience
    may not play a role in explanations of cognitive
    processes
  • Even if unconceptualized information enters into
    causal processes (e.g., motor control), it may not
    be represented or made available to the cognitive
    mind, not even as a nonconceptual
    representation
  • For something to be a representation, its content
    must figure in explanations: it must capture
    generalizations. It must have truth conditions
    and therefore allow for misrepresentation. It is
    an empirical question whether current proposals
    (e.g., the primal sketch, scenarios) meet these
    conditions. Cf. Devitt; Pylyshyn's Razor

105
Vision science has always been deeply ambivalent
about the role of conscious experience
  • Isn't how things appear one of the things that
    our theories must explain? Answer: There is no a
    priori "must explain"!
  • The content of subjective experience is a major
    source of evidence. But it may turn out not to
    be the most reliable source for inferring the
    relevant functional states. It competes with
    other types of evidence.
  • How things appear cannot be taken at face value:
    it carries substantive theoretical assumptions.
    It also draws on many levels of processing.
  • It was a serious obstacle to early theories of
    vision (Kepler and the inverted image)
  • It has been a poor guide in the case of theories
    of mental imagery (e.g., color mixing, image
    size, image distances). Reading X off an image
    is an illusion.
  • It seems likely that vision science will use
    evidence of conscious experience the way
    linguistics uses evidence of grammatical
    intuitions: only as it is filtered through
    developing theories.
  • The questions a science is expected to answer
    cannot be set in advance: they change as the
    science develops. If they change too much, we may
    give up our current theories.

106
Index capacity and learning
  • Daphne Bavelier's lab (Rochester) has shown that
    videogame players (VGPs) can track a larger
    number of objects in MOT (about 2 more targets).
  • Non-VGPs can also increase the number tracked
    after only 9 hrs of practice on certain kinds of
    (mostly violent) video games
  • José Rivest (York U) has shown that some athletes
    can track more targets than non-athletes
  • Within individuals the main determiner of number
    of targets that can be tracked is the spacing
    between them (crowding).
  • A widely cited result alleged to show that the
    limit is not architectural is the effect of speed
    on tracking
  • We have shown that this is because increasing
    speed increases crowding averaged over time.
    When crowding is constant, speed is not a factor.

107
What next?
  • This picture leaves many unanswered questions,
    but it does provide a mechanism for solving the
    binding problem and also explaining how mental
    representations could have a nonconceptual
    connection with objects in the world (something
    required if mental representations are to connect
    with actions)

108
Schema for how FINSTs function in hockey
109
  • For a copy of these slides, see
    http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt

110
You are now here
X
But you are also here
111
(No Transcript)
112
Additional examples of MOT
  • MOT with occlusion
  • MOT with virtual occluders
  • MOT with matched nonoccluding disappearance
  • Track endpoints of lines
  • Track rubber-band linked boxes
  • Track and remember ID by location
  • Track and remember ID by name (number)
  • Track while everything briefly disappears (½ sec)
    and goes on moving while invisible
  • Track while everything briefly disappears and
    reappears where they were when they disappeared