Title: Grammar of Image
1Grammar of Image
2Problems
- Enormous amount of vision knowledge
- Computational complexity
- Semantic gap
Classification, Recognition
3Task of image parsing
4Objectives in this paper
- Framework for vision
- And-Or Graph
- Algorithm for this framework
- Top-down/bottom-up computation
- Generalization from small samples
- Use Monte Carlo simulation to synthesize more configurations
- Fill the semantic gap
5Grammar
- Language: co-occurrence is more than chance (e.g. the letters in CONSTANTINOPLE)
- Image: parallel lines, T-junctions
6Formulation of grammar
- Start symbol S
- Non-terminal nodes VN
- Production rules R
- Terminal nodes VT
8Formulation of grammar
- Start symbol S
- Non-terminal nodes VN
- Production rules R
- Terminal nodes VT
S -> NP VP
VP -> VP PP
VP -> V NP
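The three rules above already allow ambiguity, because a PP can attach to the verb phrase. A minimal sketch that counts the parse trees of a sentence with a CYK-style chart follows. Only the first three rules come from the slide; the remaining rules, the lexicon, and the example sentence are assumptions added for illustration.

```python
from collections import defaultdict

# Binary production rules as (left-hand side, (right1, right2)).
# The first three are from the slide; the last three are assumed.
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("VP", "PP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("PP", ("P", "NP")),
]

# Assumed lexicon mapping each word (terminal) to its possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}

def count_parses(words, start="S"):
    """Count the parse trees rooted at `start` that cover all of `words`."""
    n = len(words)
    # chart[(i, j)][A] = number of trees rooted at A covering words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[(i, i + 1)][symbol] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for lhs, (b, c) in RULES:
                    left = chart[(i, k)].get(b, 0)
                    right = chart[(k, j)].get(c, 0)
                    if left and right:
                        chart[(i, j)][lhs] += left * right
    return chart[(0, n)].get(start, 0)
```

For "I saw the man with the telescope" the count is 2: the PP attaches either to the verb phrase (VP -> VP PP) or to the noun phrase (the assumed NP -> NP PP).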
11Image grammar
- Start symbol S
- Production rules
- Non-terminal nodes VN
- Terminal nodes VT
13Overlapping parts/Ambiguity
- Similar color, occlusion, etc.
14Stochastic Context Free Grammar
- For each non-terminal node in VN, there is a set of production rules, with a probability associated with each one
- Probability of a parse tree
- Probability of a sentence
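In the standard SCFG formulation, the probability of a parse tree is the product of the probabilities of the rules it uses, and the probability of a sentence is the sum over all of its parse trees. The sketch below illustrates this with assumed toy probabilities and two hand-written parse trees of one ambiguous sentence; none of the numbers come from the slides, and word-level probabilities are left out (leaves contribute a factor of 1).

```python
import math

# Assumed toy rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("VP", "PP")): 0.3,
    ("VP", ("V", "NP")): 0.7,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Pron",)): 0.2,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """Product of the probabilities of all rules used in `tree`.

    A tree is a tuple (label, child, child, ...); a leaf is a bare string.
    """
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, child_labels)]
    for child in children:
        p *= tree_prob(child)
    return p

# Two parse trees of the same sentence (PP attached to the verb or the noun).
OBJ = ("NP", "Det", "N")
PP = ("PP", "P", ("NP", "Det", "N"))
SUBJ = ("NP", "Pron")
PT_VERB_ATTACH = ("S", SUBJ, ("VP", ("VP", "V", OBJ), PP))
PT_NOUN_ATTACH = ("S", SUBJ, ("VP", "V", ("NP", OBJ, PP)))

def sentence_prob(parses):
    """Probability of a sentence: sum over all of its parse trees."""
    return sum(tree_prob(pt) for pt in parses)
```

With these numbers the verb-attachment tree has probability 0.01512, the noun-attachment tree 0.01008, and the sentence 0.0252.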
15Stochastic Grammar with Context
- Left-to-right order: bi-gram model (Markov chain)
- a sentence with n words
- Non-local relations: tree model
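A bi-gram model treats a sentence of n words as a Markov chain, so its probability factors into transitions between adjacent words. The sketch below estimates those transitions by counting; the two-sentence corpus and the `<s>` start token are assumptions for illustration only.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and adjacent word pairs, with '<s>' as a start token."""
    context, pairs = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        context.update(tokens[:-1])            # every token that has a successor
        pairs.update(zip(tokens, tokens[1:]))  # adjacent (previous, current) pairs
    return context, pairs

def sentence_prob(sentence, context, pairs):
    """p(w_1..w_n) = prod_i p(w_i | w_{i-1}), with w_0 = '<s>'."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context[prev] == 0:
            return 0.0  # context never seen in training
        p *= pairs[(prev, cur)] / context[prev]
    return p

CORPUS = ["the man saw the dog", "the dog saw the man"]
CONTEXT, PAIRS = train_bigram(CORPUS)
```

Under this toy corpus, `sentence_prob("the man saw the man", CONTEXT, PAIRS)` is 0.25, because "the" is followed by "man" in half of its occurrences and every other transition is certain. The model only sees adjacent words, which is why non-local relations need a different structure.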
16New issues in Image Grammar
- Loss of left-to-right order: image elements form a region adjacency graph
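A region adjacency graph replaces the linear word order of language: each region is a node, and two regions are linked when they share a boundary. A minimal sketch, assuming the segmentation is given as a 2D grid of region ids and using 4-connectivity (both are illustrative choices, not taken from the slides):

```python
def region_adjacency_graph(labels):
    """Build {region id: set of neighboring region ids} from a 2D label grid."""
    graph = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            graph.setdefault(a, set())
            # Checking the right and lower neighbors covers every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph.setdefault(b, set()).add(a)
    return graph

# Three vertical stripes: region 1 touches 2, and 2 touches 3.
LABELS = [
    [1, 1, 2, 3],
    [1, 1, 2, 3],
]
```

Here `region_adjacency_graph(LABELS)` gives `{1: {2}, 2: {1, 3}, 3: {2}}`; regions 1 and 3 are not neighbors, and no node has a natural "first" or "next" position.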
17New issues in Image Grammar
- Scaling changes which nodes are terminals in the parse tree
18New issues in Image Grammar
- Switch between texture and structure
19Building the image grammar
- Visual Vocabulary
- primitives, sketch graph, textons
- Relations and configurations
- co-occurrence, attached, hinged, supported, occluded
- And-Or Graph representation
- embedding image grammar
- Learning/testing the parse graph
- find the possible inference
20Database
- Lotus Hill Institute Dataset
- 636,748 images, 3,927,130 Physical Objects
- A few hundred are free
Benjamin Yao, Xiong Yang, and Song-Chun Zhu,
Introduction to a large scale general purpose
ground truth dataset: methodology, annotation
tool, and benchmarks. EMMCVPR, 2007
http://www.imageparsing.com/
21Free Data
http://yoshi.cs.ucla.edu/yao/data/
- 6 categories, 145 subsets
- Manmade Object: 75
- Nature Object: 40
- Objects in Scene: 6
- Transportation: 9
- UCLA Aerial Image: 5
- UIUC Sport Activity: 10
- Outline segmentation of the object
22Free Data
- Segmentation of a scene (street)
23Free Data
- Physical parts of the object
24 Visual Vocabulary
25 Visual Vocabulary
- function of image primitives
- a) geometry transformation
- b) appearance
- bonds between primitives
26 Visual Vocabulary
S. C. Zhu, Y. N. Wu, and D. B. Mumford, Minimax
entropy principle and its applications to texture
modeling, Neural Computation, vol. 9, no. 8, pp.
1627-1660, November 1997
27Primal sketch model
Sketch graph
Input image
Texture pixels
C. E. Guo, S. C. Zhu, and Y. N. Wu, Primal
sketch: Integrating texture and structure, in
Proceedings of International Conference on
Computer Vision, 2003.
28Primal sketch model
29High level visual vocabulary
- Cloth: collar, left/right sleeves, hands
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu,
Composite templates for cloth modeling and
sketching, in Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, New
York, June 2006
30Relations and configurations
- Definition of relation
- bonds
- relations: structure, compatibility
- Three types of relations
- Bonds and connections
- Joints and junctions
- Object interactions/semantics
- Definition of configurations
31Relations
- Bonds and connections
- connects primitives into bigger graphs
- intensity/color compatibility
32Relations
33Relations
34Configuration
- Spatial layout of entities at a certain level
- Primal sketch -> parts -> object -> scene
35Reconfigurable graphs
- Treat bonds as random variables (address nodes)
36Inference of the configuration
- Have the primal sketch of the image
- Detect the T-junction
- Simulated annealing to infer the Gestalt Law
Red dot: connected region; black line: known
edge; green line: inferred connection
R. X. Gao and S. C. Zhu, From primal sketch to
2.1D sketch, Technical Report, Lotus Hill
Institute, 2006
37Reconfigurable graphs
Layer extraction
Inferred connection
Source image
T-junction
Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong
Sang, Bayesian Inference for Layer
Representation with Mixed Markov Random Field
38Reconfigurable graphs
R. X. Gao and S. C. Zhu, From primal sketch to
2.1D sketch, Technical Report, Lotus Hill
Institute, 2006
39And-Or Graph
- Parse graph of the image
- pt: parse tree of the vocabulary; E: relations
- Infer the parse graph
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu,
Recursive top-down/bottom-up algorithm for
object recognition, Technical Report, Lotus Hill
Research Institute, 2007.
40And-Or Graph
- Contain all the valid parse graphs
- And node, Or node, leaf-node
- Relation between children of And node
- Parse tree: assigning a label at each Or-node
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu,
Recursive top-down/bottom-up algorithm for
object recognition, Technical Report, Lotus Hill
Research Institute, 2007.
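To make "contains all the valid parse graphs" concrete, here is a minimal sketch that enumerates every configuration an And-Or graph can generate. The tiny clock graph and all node names are hypothetical, and relations between the children of And-nodes are left out.

```python
from itertools import product

# Hypothetical toy And-Or graph. An And-node lists parts that must all be
# present; an Or-node lists alternatives, of which exactly one is selected;
# any symbol that appears in neither table is a leaf (terminal) node.
AND_NODES = {"clock": ["frame", "hands"]}
OR_NODES = {
    "frame": ["round", "square"],
    "hands": ["two_hands", "three_hands"],
}

def configurations(node):
    """Return every valid configuration (tuple of leaf nodes) under `node`."""
    if node in AND_NODES:
        # An And-node combines one configuration from each of its children.
        parts = [configurations(child) for child in AND_NODES[node]]
        return [sum(combo, ()) for combo in product(*parts)]
    if node in OR_NODES:
        # An Or-node contributes the configurations of any single child.
        return [cfg for child in OR_NODES[node] for cfg in configurations(child)]
    return [(node,)]
```

`configurations("clock")` returns four configurations, one per combination of choices at the two Or-nodes. Each parse tree corresponds to fixing one choice at every Or-node that is reached.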
41And-Or Graph
- Definition
- image primitives
- relations at all levels
- probability model defined on the And-Or graph
- valid configurations of terminal nodes
42Stochastic Model on And-Or graph
- Terminal (leaf) node
- And-Or node
- Set of links
- Switch variable at Or-node
- Attributes of primitives
43Stochastic Model on And-Or graph
SCFG: weighs the frequency of the children at
Or-nodes
44Stochastic Model on And-Or graph
Weighs the local compatibility of primitives
(geometric and appearance)
45Stochastic Model on And-Or graph
Spatial and appearance relations between primitives
(parts or objects)
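The three annotations on these slides can be read as the three terms of a Gibbs energy on a parse graph. The form below is a sketch under assumed notation: the equations did not survive in the slides, and the symbols $\omega$, $\alpha$, $\lambda$, and $Z$ are introduced here for illustration.

```latex
p(pg) = \frac{1}{Z} \exp\{-\mathcal{E}(pg)\}, \qquad
\mathcal{E}(pg) =
  \underbrace{\sum_{v \in V^{or}} \lambda_v\big(\omega(v)\big)}_{\text{SCFG: switch variable at Or-nodes}}
+ \underbrace{\sum_{t \in T(pg)} \lambda_t\big(\alpha(t)\big)}_{\text{attributes of primitives}}
+ \underbrace{\sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j)}_{\text{relations on links}}
```

Here $\omega(v)$ is the switch variable selecting a child of Or-node $v$, $\alpha(t)$ the geometric and appearance attributes of terminal node $t$, $E(pg)$ the set of links, and $Z$ the normalizing constant.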
46Learning And-Or Graph
- Learning the vocabulary
- Learning the relation set R, given the vocabulary
- Learning the parameters, given R and the vocabulary
47Learning And-Or Graph
- Learning the vocabulary and the hierarchic And-Or Graph
- Learning the relation set R, given the vocabulary
- Learning the parameters, given R and the vocabulary
Discussed in the paper
48Learning And-Or Graph
Observation
Learning model
- Learning and Pursuing Relation Set R
- Start from a Stochastic Context Free Grammar (a)
- Learn the relations that maximally reduce the KL
divergence to the observation (b-e)
J. Porway, Z. Y. Yao, and S. C. Zhu, Learning an
And-Or graph for modeling and recognizing object
categories, Technical Report, Department of
Statistics, 2007
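One way to picture a single pursuit step: compare, for each candidate relation, the histogram observed in the data with the histogram produced by the current model, and add the relation where the two differ most. The sketch below uses the KL divergence between histograms as a stand-in for the gain; the relation names and numbers are hypothetical, and this is not claimed to be the exact criterion of the cited report.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete histograms over the same bins."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def pick_relation(observed, synthesized):
    """Return the candidate relation with the largest observed-vs-model
    divergence, together with that divergence."""
    gains = {
        name: kl_divergence(observed[name], synthesized[name])
        for name in observed
    }
    best = max(gains, key=gains.get)
    return best, gains[best]

# Hypothetical histograms of two candidate relations.
OBSERVED = {
    "relative_position": [0.7, 0.2, 0.1],
    "relative_scale": [0.4, 0.3, 0.3],
}
# The current model (e.g. a plain SCFG) has no preference, so it is uniform.
SYNTHESIZED = {
    "relative_position": [1 / 3, 1 / 3, 1 / 3],
    "relative_scale": [1 / 3, 1 / 3, 1 / 3],
}
```

With these numbers `pick_relation(OBSERVED, SYNTHESIZED)` selects `relative_position`, since the model's uniform histogram is furthest from what was observed for that relation. Repeating the step after refitting the model would add relations one at a time.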
49Learning And-Or Graph
- Learning graph parameter
- Approximating to
- Similar to texture synthesis
S. C. Zhu, Y. N. Wu, and D. B. Mumford, Minimax
entropy principle and its applications to texture
modeling, Neural Computation, vol. 9, no. 8, pp.
1627-1660, November 1997
50Case I Rectangle
- Nodes: rectangles
- Two vanishing points, four edge directions
- Rules
F. Han and S. C. Zhu, Bottom-up/top-down image
parsing by attribute graph grammar. Proceedings
of International Conference on Computer Vision,
Beijing,China, 2005.
51Case I Rectangle
- Get the primal sketch of the scene
- Find the strong rectangle candidates (bottom-up, red)
- Weigh (score) the different hypotheses (top-down, blue)
- The weight is the compatibility of the image (primal sketch) with the proposed rectangle
- Accept the best one
- Repeat the previous three steps until all the weights are small (negative)
F. Han and S. C. Zhu, Bottom-up/top-down image
parsing by attribute graph grammar. Proceedings
of International Conference on Computer Vision,
Beijing,China, 2005.
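The propose, weigh, accept loop on this slide can be sketched as a greedy procedure. In the sketch below the hypotheses are 1D intervals standing in for rectangles, and the weight function (bottom-up evidence minus an overlap penalty) is a hypothetical placeholder, not the compatibility measure of the cited paper.

```python
def overlap(a, b):
    """Length of the overlap between two 1D intervals (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def weight(hyp, accepted, penalty=2.0):
    """Hypothetical score: evidence minus a penalty for overlapping
    hypotheses that were already accepted."""
    _name, span, evidence = hyp
    return evidence - penalty * sum(overlap(span, other[1]) for other in accepted)

def greedy_parse(candidates):
    """Repeatedly re-weigh the remaining hypotheses and accept the best one,
    stopping when no remaining hypothesis has a positive weight."""
    accepted, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda h: weight(h, accepted))
        if weight(best, accepted) <= 0:
            break
        accepted.append(best)
        remaining.remove(best)
    return accepted

# Hypothetical candidates: (name, interval, bottom-up evidence).
CANDIDATES = [
    ("A", (0, 4), 3.0),
    ("B", (3, 6), 2.5),
    ("C", (5, 9), 2.0),
    ("D", (0, 9), 1.0),
]
```

On these candidates the loop accepts A, then C, and stops: B starts out stronger than C, but once A is accepted its weight drops, and after C is accepted it becomes negative. Re-weighing after every acceptance is what makes the accepted hypotheses influence the remaining ones.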
52Case I Rectangle
53Case I Rectangle
54Case II Human Cloth
- Use And-Or graph to generate a matching model
- Vocabulary (training dataset)
Matching using the And-or Graph
55Case II Human Cloth
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu,
Composite templates for cloth modeling and
sketching, in Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, New
York, June 2006.
56Case II Human Cloth
Top-down: refine the matching using the relations
Localize face, then estimate the parts of the body
Bottom-up: a coarse matching of the parts
57Case II Human Cloth
58Case II Human Cloth
Hands are not exactly the same: find the best
match in the dataset
59Case III Recognition
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu,
Recursive top-down/bottom-up algorithm for object
recognition, Technical Report, Lotus Hill
Research Institute, 2007.
60Conclusion
- Enormous amount of vision knowledge (And-Or
graph)
61Conclusion
- Computational complexity
- Scheduling the bottom-up/top-down procedures remains open
- Semantic Gap
- Learning the And-Or Graph
- Learning the vocabulary and its attributes
- After all, we are not supposed to define so many things
- ideal vision words
- what we have now
62Thank you