Title: Visual Attention: Selective tuning and saliency computation using game theory
1 Visual Attention: Selective tuning and saliency computation using game theory
- Presentation prepared by Alexandre Bernardino, VisLab-ISR-IST.
- Based on the papers:
  - Modeling visual attention via selective tuning. J. Tsotsos, S. Culhane, W. Wai, Y. Lai, N. Davies, F. Nuflo. Artificial Intelligence 78 (1995) 507-546.
  - Visual attention using game theory. O. Ramström, H. Christensen. BMCV 2002, 462-471.
2 The General Vision Problem (oversimplified)
- Diagram (top to bottom): Context, Recognition/Inference, Attention/Selection, Features/Saliency, Images.
- Can all vision problems be described by this diagram?
3 Visual Attention Components (from Tsotsos et al.)
- Selection of a region of interest in the visual field
- Selection of feature dimensions and values of interest
- Control of information flow through the network of neurons that constitute the visual system
- The shifting from one selected region to the next in time
- Transformation of task information into attentional instructions
- Integration of successive attentional fixations
- Interactions with memory
- Indexing into model bases
4 The Selective Tuning Model
- Localizes interesting regions in the visual field.
- Assumes interestingness values can be easily computed for each item, depending on the task definition.
- Reduces computation by utilizing a visual pyramid.
- Addresses some problems with pyramid representations.
5 Visual Pyramids
- Small receptive fields at the bottom and large receptive fields at the top may overlap.
- At each site and scale, the information is interpreted by interpretive units of different types.
- Each interpretive unit may receive feedback, feedforward, and lateral interactions (etc.) from other units.
- Solve part of the complexity problem but introduce others.
6 Benefits of Visual Pyramids
- Multi-scale analysis and data reduction.
- Each unit computes a weighted average of its lower-level units (see the sketch below).
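
A minimal sketch (not from the papers) of such an averaging pyramid, assuming uniform 2x2 block averaging at each level; the function name build_pyramid and the level count are hypothetical:

    import numpy as np

    def build_pyramid(image, levels=4):
        """Average pyramid: each unit is the mean of a 2x2 block of units
        in the level below (uniform weights -- an assumption of this sketch)."""
        pyramid = [np.asarray(image, dtype=float)]
        for _ in range(levels - 1):
            prev = pyramid[-1]
            h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
            # Average non-overlapping 2x2 blocks -> half the resolution.
            coarse = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
            pyramid.append(coarse)
        return pyramid

    # Example: a 16x16 "brightness" image reduced to 4 levels.
    print([lvl.shape for lvl in build_pyramid(np.random.rand(16, 16))])
    # [(16, 16), (8, 8), (4, 4), (2, 2)]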
7 Problems with information flow due to pyramidal processing (1)
- The Context Effect
- Units at the top of the pyramid receive input from a very large sub-pyramid and are confounded by the surroundings of the attended object.
8 Problems with information flow due to pyramidal processing (2)
- Blurring
- A single event at the input affects an inverted sub-pyramid of units and gets blurred as it flows upwards, so that a large portion of the output represents part of it.
9 Problems with information flow due to pyramidal processing (3)
- Cross-talk
- Two separate visual events activate two inverted sub-pyramids that may overlap, so one event interferes with the interpretation of the other.
10 Problems with information flow due to pyramidal processing (4)
- Boundary effect
- Central items appear stronger than peripheral items, since the number of upward connections is larger for central items than for items near the image boundary.
11 Tsotsos et al. Selective Tuning Architecture
12 WTA Units
- I_{l,k}: the interpretive unit in assembly k in layer l (see the illustration after this list)
- G_{l,k,j}: the j-th WTA gating unit in assembly k in layer l, linking I_{l,k} with I_{l-1,j}
- g_{l,k}: the gating control unit for the WTA over the inputs to I_{l,k}
- b_{l,k}: the bias unit for I_{l,k}
- q_{l,j,i}: weight applied to I_{l-1,i} in the computation of I_{l,j}
- n_{l,x}: scale normalization factor
- M_{l,k}: the set of gating units for I_{l,k}
- U_{l+1,k}: the set of gating units in layer l+1 making feedback connections to g_{l,k}
- B_{l+1,k}: the set of bias units in layer l+1 making feedback connections to b_{l,k}
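
The exact update rules are in the paper; purely as a rough illustration of the notation, a single interpretive unit can be read as a weighted sum of its inputs, each passed through its WTA gating unit. The binary 0/1 gate values and the function below are assumptions of this sketch, not the paper's equations:

    import numpy as np

    def interpretive_unit(inputs_below, weights, gates):
        """I_{l,k} read as a weighted sum of the units I_{l-1,j} in its
        receptive field, with each connection gated by G_{l,k,j}
        (1 = open, 0 = pruned by the WTA process)."""
        inputs_below = np.asarray(inputs_below, dtype=float)   # I_{l-1,j}
        weights = np.asarray(weights, dtype=float)             # q_{l,k,j}
        gates = np.asarray(gates, dtype=float)                 # G_{l,k,j}
        return float(np.sum(weights * gates * inputs_below))

    # Example: three inputs, the middle connection pruned.
    print(interpretive_unit([0.2, 0.9, 0.4], [1/3, 1/3, 1/3], [1, 0, 1]))  # ~0.2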
13 Selective Tuning Overview
- Build the pyramid.
- Compute a MAX (Winner-Take-All) at the top level to determine the globally most salient items. Top-down bias can be externally introduced.
- Inhibit units not in the winner's receptive field.
- The process continues to the bottom of the pyramid.
- As the pruning of connections proceeds downwards, interpretive units are recomputed and propagated upwards.
- BENEFITS: WTAs are computed on small regions.
- RESULT: selects (segments) a region that fulfils the saliency definition at all scales (see the sketch below).
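
A minimal coarse-to-fine sketch of this selection on a simple averaging pyramid (ignoring gating units, bias units and recomputation): take the WTA winner at the top level, then at each level below restrict the WTA to the winner's child block. The 2x2 child structure and all names are assumptions of this sketch:

    import numpy as np

    def selective_tuning_trace(pyramid):
        """Pick the WTA winner at the top level, then repeatedly restrict
        the WTA to the winner's 2x2 child block (its receptive field) in
        the level below. Returns the winning (row, col) per level."""
        r, c = np.unravel_index(np.argmax(pyramid[-1]), pyramid[-1].shape)
        path = [(int(r), int(c))]
        for level in reversed(pyramid[:-1]):
            block = level[2 * r:2 * r + 2, 2 * c:2 * c + 2]
            dr, dc = np.unravel_index(np.argmax(block), block.shape)
            r, c = 2 * r + int(dr), 2 * c + int(dc)
            path.append((r, c))
        return path

    # Example: one bright item in a 16x16 image, 4-level average pyramid.
    image = np.zeros((16, 16)); image[5, 11] = 1.0
    pyr = [image]
    for _ in range(3):
        h, w = pyr[-1].shape
        pyr.append(pyr[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    print(selective_tuning_trace(pyr))   # [(0, 1), (1, 2), (2, 5), (5, 11)]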
14 Information Routing
15 Results: Brightness and Orientation
- Brightness
  - Saliency: the largest and brightest item
  - Features: average gray level on rectangles 6,50x6,50
  - Pyramid: local average of the previous level
- Orientation
  - Saliency: the longest and highest-contrast straight line
  - Features: edges with orientations 0, 45, 90, 135 and sizes 3,35x3,35
  - Pyramid: 128, 108, 80, 48, 28
16 Results: Motion
- Simulated optic flow
- Matching (correlation) against 16 templates of motion patterns
- Pyramid: computes the local average; 4 levels
17 What is missing?
- Salient features are predefined and very simple, e.g.:
  - The brightest and largest item.
  - The largest and highest-contrast line.
  - The best-matching item against a database.
- Conjunction of features is ad hoc (see the sketch below):
  - WTA within each feature dimension.
  - WTA across the winners of the previous step.
  - The overall winner selects the attended region.
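
A minimal sketch of this two-stage, ad-hoc conjunction, assuming each feature dimension is simply a saliency map of the same size (names hypothetical):

    import numpy as np

    def two_stage_wta(feature_maps):
        """Stage 1: WTA within each feature dimension (best location per map).
        Stage 2: WTA across those winners. Returns the overall winning
        location and the feature dimension it came from."""
        winners = []
        for name, fmap in feature_maps.items():
            loc = np.unravel_index(np.argmax(fmap), fmap.shape)
            winners.append((float(fmap[loc]), loc, name))
        _, loc, name = max(winners)      # WTA across the per-feature winners
        return loc, name

    maps = {"brightness": np.random.rand(8, 8), "orientation": np.random.rand(8, 8)}
    print(two_stage_wta(maps))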
18 Visual Attention Using Game Theory
- Computes salient locations in multi-feature spaces.
- Each point (x,y) is associated with a unit vector of multiple features, e.g. color and brightness: n_{x,y} = (R_{x,y}, G_{x,y}, B_{x,y}, I_{x,y})^T / N
- Incorporates task knowledge (top-down bias) as a desired feature unit vector: w = (R_d, G_d, B_d, I_d)^T / M
- Salient regions in an image are defined as being similar to the desired feature vector and distinct from their neighbors: w^T n_A > w^T n_{A'}, with A ⊂ A' and n_A = Σ_{(x,y) ∈ A} n_{x,y}
- The subregion matches the wanted feature better than its surrounding (see the sketch below).
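
A minimal sketch of this criterion, assuming (R, G, B, I) features with I taken as the channel mean, and with the summed region vectors renormalized before comparison so that regions of different sizes are comparable; these normalization choices, and all names, are assumptions of this sketch rather than the paper's exact definitions:

    import numpy as np

    def unit_feature_vectors(rgb):
        """Per-pixel feature vector (R, G, B, I), normalized to unit length."""
        feats = np.concatenate([rgb, rgb.mean(axis=2, keepdims=True)], axis=2)
        return feats / (np.linalg.norm(feats, axis=2, keepdims=True) + 1e-12)

    def region_match(feats, w, rows, cols):
        """w^T n_A for subregion A: sum the unit feature vectors over A,
        renormalize, and project onto the desired direction w."""
        n_A = feats[rows, cols].reshape(-1, feats.shape[2]).sum(axis=0)
        return float(w @ (n_A / (np.linalg.norm(n_A) + 1e-12)))

    rgb = np.random.rand(32, 32, 3)
    feats = unit_feature_vectors(rgb)
    w = np.array([1.0, 0.0, 0.0, 0.5]); w /= np.linalg.norm(w)        # "reddish" bias
    inner = region_match(feats, w, slice(12, 20), slice(12, 20))      # region A
    surround = region_match(feats, w, slice(8, 24), slice(8, 24))     # region A' ⊃ A
    print(inner, surround)   # A is salient when it matches w better than A'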
19 The Market
- N actors (points).
- K types of available goods.
- Actor i has an allocation of goods n_i ∈ R^K.
- Let f(n_i) be the utility of a certain allocation of goods.
- Each agent will trade to get the z_i that solves max_{z_i} ( f(z_i) - p^T (z_i - n_i) ), where p is the price vector.
20 The Feature Market
- If f(n_i) is a concave function, then the market reaches a competitive equilibrium with n_i = n_av, the average allocation in a neighborhood.
- f(n_i) = w^T n_i is a concave function.
- A fair price is defined as: p = ∇f(n_av) = w - n_av n_av^T w = (I - n_av n_av^T) w = A w
- A is the projection onto the orthogonal complement of n_av (see the derivation sketch below).
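
One way to see why the price takes this form (a sketch; the assumption that the utility acts on the normalized allocation, f(n) = w^T n / ||n||, is mine, though it is consistent with the unit feature vectors of the previous slides):

    % Assumed utility: projection of the normalized allocation onto w.
    \begin{align*}
      f(n) &= \frac{w^{T} n}{\lVert n \rVert} \\
      \nabla f(n) &= \frac{1}{\lVert n \rVert}\Bigl(I - \frac{n\,n^{T}}{\lVert n \rVert^{2}}\Bigr) w \\
      p = \nabla f(n_{av}) &= (I - n_{av}\,n_{av}^{T})\, w = A\, w
      \qquad (\text{taking } \lVert n_{av} \rVert = 1)
    \end{align*}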
21 Saliency: Wealth
- Capital of actor i: C_i = p^T (n_i - n_av) = w^T A (n_i - n_av) = w^T A n_i
- (Figure: geometric relation between the vectors n_i, w, A n_i, n_av, and the projection n_av n_av^T n_i.)
22 Interesting Things
- The normalization matrix A enhances directions with fewer items.
- Saliency can be split into two terms (see the sketch below):
  - Intrinsic saliency, independent of the task: S_i = A n_i
  - Extrinsic saliency, dependent on the top-down bias: S_e = w^T S_i
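
A minimal sketch putting the last three slides together: for each pixel, compute the local average n_av over a box neighborhood, the projection A = I - n_av n_av^T, and the intrinsic and extrinsic saliency. The neighborhood size, the renormalization of n_av, and all names are assumptions of this sketch:

    import numpy as np

    def saliency_maps(feats, w, radius=4):
        """Per pixel: n_av = locally averaged feature vector (renormalized),
        A = I - n_av n_av^T, S_i = A n_i (intrinsic), S_e = w^T S_i (extrinsic)."""
        H, W, K = feats.shape
        S_i = np.zeros((H, W, K))
        S_e = np.zeros((H, W))
        for y in range(H):
            for x in range(W):
                patch = feats[max(0, y - radius):y + radius + 1,
                              max(0, x - radius):x + radius + 1]
                n_av = patch.reshape(-1, K).mean(axis=0)
                n_av /= np.linalg.norm(n_av) + 1e-12
                A = np.eye(K) - np.outer(n_av, n_av)
                S_i[y, x] = A @ feats[y, x]        # intrinsic saliency
                S_e[y, x] = w @ S_i[y, x]          # extrinsic saliency
        return S_i, S_e

    # Example: unit (R, G, B, I) features for a random image, "reddish" bias w.
    rgb = np.random.rand(32, 32, 3)
    feats = np.concatenate([rgb, rgb.mean(axis=2, keepdims=True)], axis=2)
    feats /= np.linalg.norm(feats, axis=2, keepdims=True) + 1e-12
    w = np.array([1.0, 0.0, 0.0, 0.5]); w /= np.linalg.norm(w)
    S_i, S_e = saliency_maps(feats, w)
    print(S_e.shape, float(S_e.max()))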