Title: High Resolution Aerial Image Understanding
2. Overview
To classify elements of the scene, we combine
bottom-up object detection with top-down object
verification.
Scene → bottom-up proposals → accepted proposals
A number of features and classifiers are used to
create bottom-up proposals. A hierarchical model
for the scene prior then removes false positives
that do not match scene constraints.
Rejected proposals: buildings, cars, roads
3. A hierarchical model
4. Bottom-up Methods
Bottom-up objectives for one aerial image:
1. Grass candidates
2. Tree candidates
3. Roof candidates
4. Road candidates
5. Input image
Use manual segments (road, grass, shadow, and
tree) over hundreds of satellite images as
training data to learn the color histogram of
each class.
Color Saliency Maps: road, grass, tree, shadow
- We use a 30-bin histogram
- Bins 0-24 encode a 5x5 2D histogram of Hue and Saturation
- Bins 25-29 encode the Value (lightness) channel
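As a sketch, the 30-bin color histogram can be computed as follows; the code assumes H, S, and V are already normalized to [0, 1), and the exact bin layout beyond the slide's description is an assumption:

```python
import numpy as np

def color_histogram_30(hsv_pixels):
    """30-bin color histogram: bins 0-24 form a 5x5 joint Hue/Saturation
    histogram, bins 25-29 quantize the Value channel.
    `hsv_pixels` is an (N, 3) float array with H, S, V in [0, 1)."""
    h = np.minimum((hsv_pixels[:, 0] * 5).astype(int), 4)
    s = np.minimum((hsv_pixels[:, 1] * 5).astype(int), 4)
    v = np.minimum((hsv_pixels[:, 2] * 5).astype(int), 4)
    hist = np.zeros(30)
    np.add.at(hist, h * 5 + s, 1.0)   # joint H-S bins 0..24
    np.add.at(hist, 25 + v, 1.0)      # V bins 25..29
    return hist / hist.sum()          # normalize to an empirical distribution
```

`np.add.at` is used so that repeated bin indices accumulate correctly.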
6. Input image
For a given image, we compute the probability of each pixel by looking at a fixed-size window centered on the pixel. Assuming pixel-wise independence, the window probability is the product of the empirical probabilities of all the pixels in the window. The probability of each pixel location is computed across all categories and normalized to get a saliency map.
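The window-product rule above can be sketched as follows; the window size and the log-space box-sum implementation are assumptions:

```python
import numpy as np

def saliency_maps(pixel_probs, win=7):
    """pixel_probs: dict mapping class name -> (H, W) array of per-pixel
    empirical probabilities. Under the pixel-independence assumption the
    window probability is a product over the window, i.e. a sum in log
    space. Returns per-class maps normalized across classes per pixel."""
    r = win // 2
    logs = {}
    for c, p in pixel_probs.items():
        lp = np.log(p + 1e-12)
        padded = np.pad(lp, r, mode='edge')
        # box filter = sum of log-probs over the win x win window
        acc = np.zeros_like(lp)
        for dy in range(win):
            for dx in range(win):
                acc += padded[dy:dy + lp.shape[0], dx:dx + lp.shape[1]]
        logs[c] = acc
    stack = np.stack(list(logs.values()))
    stack -= stack.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(stack)
    probs /= probs.sum(axis=0, keepdims=True)   # normalize across categories
    return dict(zip(logs.keys(), probs))
```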
Color Saliency Maps: tree, shadow, grass, road
By thresholding the saliency maps at several scale factors and computing connected components (CCPs), we get discrete bottom-up particles.
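A minimal sketch of this thresholding and connected-component step; the threshold values are assumptions, and a library routine such as scipy.ndimage.label could replace the hand-rolled labeling:

```python
import numpy as np
from collections import deque

def connected_components(mask):
    """4-connected component labeling of a boolean mask via BFS."""
    labels = np.zeros(mask.shape, dtype=int)
    cur = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        cur += 1
        q = deque([(y, x)])
        labels[y, x] = cur
        while q:
            cy, cx = q.popleft()
            for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = cur
                    q.append((ny, nx))
    return labels, cur

def bottom_up_particles(saliency, thresholds=(0.5, 0.7, 0.9)):
    """Threshold a saliency map at several levels and collect the
    connected components as discrete bottom-up particles."""
    particles = []
    for t in thresholds:
        labels, n = connected_components(saliency >= t)
        particles += [np.argwhere(labels == i) for i in range(1, n + 1)]
    return particles
```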
7. Bottom-up candidates
Input image
Color Saliency Map
Extract SIFT features from all manually labeled segments (parking lot, residential area, and building), and perform K-means clustering among the features within each category to get 10 cluster centers as the codebook for that category. According to the codebook, create a bag-of-words histogram model for each category.
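The codebook construction can be sketched with a minimal K-means; the initialization scheme and iteration count are assumptions, and real SIFT descriptors would be 128-dimensional:

```python
import numpy as np

def kmeans(features, k=10, iters=20, seed=0):
    """Minimal K-means returning k cluster centers (the codebook).
    `features` is an (N, D) array of descriptors for one category."""
    features = np.asarray(features, float)
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):       # guard against empty clusters
                centers[j] = features[assign == j].mean(0)
    return centers

def bow_histogram(features, codebook):
    """Bag-of-words histogram: each descriptor votes for its nearest
    codeword; normalized counts model the category."""
    features = np.asarray(features, float)
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```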
Bag-of-Words Texton Saliency Maps: building, parking lot, residential area
8. Bottom-up candidates
Similar to the color saliency, the bag-of-words saliency map is created by computing the probability of each pixel location: it likewise considers the nearby window and assumes pixel-wise independence. According to the empirical probabilities in the histograms, the probability at each location is computed and normalized for each category.
By thresholding and computing connected components (CCPs), we get discrete bottom-up particles. The method is further augmented by combining the color and bag-of-words histograms using a mixture model.
9. Bottom-up candidates
We consider generic rectilinear roof structures, as shown below left, in which the dashed-line annotation illustrates an abstraction of shared geometric structures (or shared components) among different kinds of irregular roofs. Those roofs can therefore be composed from such components under certain relation rules, as shown below right.
Compositional Boosting Algorithm: Roof Detection
10. Bottom-up candidates
Compositional Boosting Algorithm: Roof Detection
The shared components that compose into different kinds of rooftops can in turn be decomposed hierarchically into shared larger parts, smaller parts, and segments, as shown below. The set of parts is organized into a graphlet dictionary of increasing complexity.
11. Bottom-up candidates
Compositional Boosting Algorithm: Roof Detection
We build an And/Or graph model that combines shared subparts, parts, and components through different composition rules into different roof structures, allowing multiple ways of composition to account for multiple roof configurations.
12. Bottom-up candidates
We design a number of features on the different nodes of the And/Or graph model, as shown below, including geometric features, such as length (L1, L2) and angle (T1, T2), and appearance features, such as the color-histogram covariance of Area1 and Area2 and the color variance of Area4. In detection, the candidates at those nodes are weighted by the log-posterior probability ratio.
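A sketch of the log-posterior probability ratio weighting, assuming the per-feature models are stored as empirical histograms (the histogram form and the smoothing constant are assumptions):

```python
import numpy as np

def log_posterior_ratio(feature_vals, pos_hists, neg_hists, bin_edges):
    """Weight a candidate node by the log ratio of its features'
    probabilities under the positive (roof part) and negative
    hypotheses, each modeled by an empirical histogram."""
    score = 0.0
    for v, hp, hn, e in zip(feature_vals, pos_hists, neg_hists, bin_edges):
        b = min(np.searchsorted(e, v, side='right') - 1, len(hp) - 1)
        b = max(b, 0)
        # small epsilon avoids log(0) on empty bins
        score += np.log((hp[b] + 1e-6) / (hn[b] + 1e-6))
    return score
```

A positive score means the candidate's features look more like a roof part than background.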
13. Compositional Boosting Algorithm: Roof Detection
Inference: we apply an extended Swendsen-Wang cut to perform roof inference by sampling sub-trees of the And/Or graph. (a) is the source image, (b) the Canny edges of the source image, (c) the straight line segments detected from the Canny edges, (d)-(g) an inference process from line segments to rectangles, and (h) one sub-parse-tree with ON nodes alongside alternative sub-trees with OFF nodes; these can swap with each other.
14-23. Compositional Boosting Algorithm: Roof Detection (running example)
The running example proceeds step by step: input image; edgelink of Canny edges; straight line segments; L-junctions; parallel pairs; U-junctions; two parallels; opposite junctions; proposed polygons; and the final roof detection result.
24-30. Compositional Boosting Algorithm: Roof Detection (further examples and more results)
31. Bottom-up candidates
First, a standard AdaBoost algorithm (with Haar color features) is used to detect road segments. The segments are then refined with the saliency maps generated in the previous steps. Finally, using a Gestalt-type prior, road segments are connected into networks, similar to the texton process (Guo et al., "Modeling Visual Patterns by Integrating Descriptive and Generative Methods," IJCV 2003).
Road Detection: refined road segments → connected segments (texton process) → resulting road mask
32. Bottom-up candidates
A standard AdaBoost algorithm based on Haar features is exploited for car detection. Unlike common single-orientation tasks such as face detection, car detection must be performed at multiple orientations; the intermediate results are therefore refined by merging overlapping detection boxes.
AdaBoost Car Detection
Training samples: 10,320 positive samples, plus negatives; Haar features used.
Detection: 1. detect at multiple orientations; 2. merge overlapping detection boxes.
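The merging step can be sketched as greedy non-maximum suppression over the multi-orientation detections; the IoU threshold and the axis-aligned box format are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2]-a[0]) * (a[3]-a[1]) + (b[2]-b[0]) * (b[3]-b[1]) - inter)
    return inter / union if union else 0.0

def merge_detections(boxes, scores, thresh=0.5):
    """Merge overlapping detection boxes collected over multiple
    orientations: greedily keep the highest-scoring box in each
    overlap group (non-maximum suppression)."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```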
33. Some experimental results of car detection (boosting detection results)
34. Detection ROC Curve
35. Top-Down Modeling
In addition to the bottom-up information shown,
we incorporate a top-down model to refine our
segmentation of the scene.
The scene is represented as a hierarchy of parts
constrained by relationships on their appearances.
Nodes are divided into singular nodes,
representing single objects that are related to
other singular nodes of the same type, and
plural nodes, which are collections of singular
nodes and interact with other plural nodes.
In our current implementation, we model four categories of objects: cars, roads, buildings, and trees.
Hierarchy: Scene → Buildings, Roads, Cars, Trees → individual Building, Road, Car, and Tree nodes
36. Terminals
All nodes, singular or plural, are defined by a few basic features:
- (x, y): center of mass
- o: orientation, determined by the mode of the line orientations
- (sx, sy): scale, determined by the bounding box with orientation o
37. Roads: A Special Case
Roads are treated as special terminals: they are often so long that using their bounding boxes for position and scale won't yield very informative results. We thus split the roads into smaller rectangular regions and use these segments as our terminals. In our implementation these have a 2:1 length:width ratio.
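A sketch of the road-splitting step, assuming a straight centerline; real roads would be polylines handled piecewise:

```python
import numpy as np

def split_road(start, end, width):
    """Split a straight road (given by centerline endpoints and width)
    into rectangular terminal segments with a 2:1 length:width ratio,
    as described above. Returns the segment center points."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    length = np.linalg.norm(end - start)
    seg_len = 2.0 * width                      # 2:1 length:width
    n = max(1, int(round(length / seg_len)))   # number of segments
    ts = (np.arange(n) + 0.5) / n              # segment midpoints along road
    return [tuple(start + t * (end - start)) for t in ts]
```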
38. Singular and Plural Nodes
Plural nodes are really just sets of singular
nodes of the same type. By using this hierarchy,
we can ensure that singular nodes depend only on
each other and their enclosing set node. This
reduces the number of relationships we need to
model and gives us finer control over the
constraints of the scene.
These Car nodes constrain each other's appearances, and are in turn constrained by their enclosing Cars set.
39. Top-Down Modeling
Our model can be represented as a non-recursive grammar in which the root Scene node is decomposed into plural nodes, each of which is decomposed into a set of k singular nodes. p(ki) is learned from the data and determines how many singular nodes belong to each plural node.

S → Roads, Cars, Buildings, Trees
Roads → Road1, Road2, …, Roadk1, with k1 ~ p(k1)
Cars → Car1, Car2, …, Cark2, with k2 ~ p(k2)
Buildings → Building1, Building2, …, Buildingk3, with k3 ~ p(k3)
Trees → Tree1, Tree2, …, Treek4, with k4 ~ p(k4)
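Sampling the grammar's structure can be sketched as follows; the interface of the learned p(k) tables is an assumption, and the counts in any real use would come from training data:

```python
import random

def sample_scene(count_dists, seed=None):
    """Sample the scene's structure from the grammar: for each plural
    node, draw k from the learned distribution p(k) and expand into k
    singular nodes. `count_dists` maps a plural category name to a
    dict {k: p(k)} estimated from labeled scenes."""
    rng = random.Random(seed)
    scene = {}
    for cat, pk in count_dists.items():
        ks, ps = zip(*sorted(pk.items()))
        k = rng.choices(ks, weights=ps)[0]
        # singular name = plural name minus trailing 's' (e.g. Cars -> Car)
        scene[cat] = [f"{cat[:-1]}{i + 1}" for i in range(k)]
    return scene
```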
At each production, each set of nodes is subject
to certain constraints. Thus the singular nodes
all constrain one another, as do the plural nodes.
40. Relationships
Relationships are functions on the properties of each object. Our model constraints are formed from histograms of the results of each of these functions over our training data. We require inferred or sampled data to match these histograms, thus driving our results toward the observed data's distribution.
Relative Scale
Relative Position
Relative Orientation
Containment
Alignment
Aspect Ratio
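Learning one relationship constraint as a histogram, and scoring a sampled value against it, can be sketched as follows (the bin count, value range, and smoothing constant are assumptions):

```python
import numpy as np

def learn_relationship_hist(values, bins=10, value_range=(0.0, 1.0)):
    """Histogram of a relationship function (e.g. relative orientation)
    over training data; this empirical histogram is the constraint H_r."""
    h, edges = np.histogram(values, bins=bins, range=value_range)
    h = h.astype(float) + 1e-6            # smooth empty bins
    return h / h.sum(), edges

def relationship_logprob(value, hist, edges):
    """Log-probability of an inferred or sampled relationship value
    under the learned histogram: high when the value matches the
    observed data's distribution."""
    b = int(np.clip(np.searchsorted(edges, value, side='right') - 1,
                    0, len(hist) - 1))
    return float(np.log(hist[b]))
```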
41. Associations
To reduce complexity further, we introduce a notion of "association" or "proximity." For certain relationships, some objects depend only on those associated with them in some way. In most cases, this is expressed merely as a function of position. For example, tree sets are constrained only by the other parts they are near: a tree's energy is determined relative only to the objects within a certain radius of itself.
42. Formulation
For ease of computation, we model the scene sequentially, where each object depends only on a subset of the other objects:

p(Scene) = p(Roads) p(Cars | Roads) p(Buildings | Roads, Cars) p(Trees | Roads, Cars, Buildings) × p(Car | Cars) p(Building | Buildings) p(Tree | Trees)
Each conditional probability is defined as a Gibbs distribution over its dependencies. For example, p(Cars | Roads) is defined by an energy function over any relationship that exists between Cars and Roads. As mentioned, these constraints are histograms of the relationships observed over many training images:

E(A, B) = Σ_{r ∈ R(A,B)} Σ_{a ∈ A, b ∈ B} I(a, b) λr(r(a, b))

where R(A,B) is the set of relationships between A and B, Hr is the histogram for relationship r, and λr is a weighting factor. For our implementation, we set λr = −log(Hr), thus creating an energy term that exactly matches our observed statistics. I(A,B) is an indicator function returning 1 if A and B are associated and 0 otherwise. With this prior, we can sample novel scenes.
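A sketch of the pairwise energy with λr = −log(Hr) as described; the histogram representation and the interface of the relationship functions are assumptions:

```python
import numpy as np

def pair_energy(A, B, relationships, hists, associated):
    """Gibbs energy between node sets A and B: for each relationship r
    in R(A, B) and each associated pair, look up the histogram H_r at
    the relationship's value and add lambda_r = -log H_r, so the energy
    exactly matches the observed statistics. `associated(a, b)` is the
    indicator I; each entry of `hists` is a (probabilities, bin_edges)
    pair aligned with `relationships`."""
    E = 0.0
    for a in A:
        for b in B:
            if not associated(a, b):
                continue
            for r, (h, edges) in zip(relationships, hists):
                v = r(a, b)
                bi = int(np.clip(np.searchsorted(edges, v, side='right') - 1,
                                 0, len(h) - 1))
                E += -np.log(h[bi] + 1e-12)   # lambda_r = -log H_r
    return E
```

Low energy (high probability) is reached exactly when the relationship values fall in well-populated histogram bins.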
43. Training Data
Google satellite images labeled according to our scene hierarchy.
44. Scene Sampling Example
45. Hierarchy for Scene Example (associations shown by connecting lines)
46. Scene Sampling Results
47. Scene Sampling Results
48. Inference
Given our prior p(S) and bottom-up likelihoods for the various parts P, p(P | S), we can use the standard Bayesian setup to derive a posterior probability for the scene: p(S | P) ∝ p(P | S) p(S).
Given a set of bottom-up proposals, we'd like to select the subset of proposals that maximizes p(S | P), thus creating the most likely parse. To do so, we begin with a set of candidate bottom-up proposals P, many of which are likely false positives. We then iteratively add to our set of accepted proposals the part Pi that maximizes p(S | P) at each step. This process continues until no proposals remain in P or until the probability increase between iterations i and i+1 is less than some value ε.
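The greedy maximization can be sketched as follows, with score(accepted) standing in for p(S | P) (an assumption about the interface):

```python
def greedy_parse(proposals, score, eps=1e-3):
    """Greedy MAP inference: start from the empty accepted set and
    repeatedly add the proposal that most increases the posterior
    score, stopping when no proposals remain or the improvement
    falls below eps."""
    accepted, remaining = [], list(proposals)
    cur = score(accepted)
    while remaining:
        best = max(remaining, key=lambda p: score(accepted + [p]))
        new = score(accepted + [best])
        if new - cur < eps:        # improvement too small: stop
            break
        accepted.append(best)
        remaining.remove(best)
        cur = new
    return accepted
```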
49. Inference Example
Below is an example set of part proposals, all of which are correct except for the false positive tree. Each object shows the final probability score it would produce for p(S | P) if it were selected; the scores are re-ranked as each new part is selected.

Scores: Car 0.2, Car 0.7, Car; Road 0.6, Road, Road; Building 0.55, Building 0.6, Building 0.8; Tree 0.5, Tree 0.65, Tree 0.7.

The road is accepted first as the highest proposal, which readjusts its neighbors' scores. A car is accepted next, boosting a building's score at the following step. Selecting the tree at a later stage would produce probability 0, since it yields an impossible parse; the algorithm therefore ends there.
50. Inference as non-maximal suppression
Bottom-up proposals: 8 proposals (2 true positives, 6 false positives).
Final particles pruned by top-down: 2 true positives, 1 false positive, an 86% reduction in false positives.
51. Inference as non-maximal suppression
Bottom-up proposals: 6 proposals (1 true positive, 5 false positives).
Final particles pruned by top-down: 1 true positive, 2 false positives.
52. Roads: A Special Case
Roads are treated as special terminals: they are often so long that using their bounding boxes for position and scale won't yield very informative results. We thus split the roads into smaller rectangular regions and use these segments as our terminals. In our implementation these have a 2:1 length:width ratio.
53. Mixed class pruning
Candidate proposals → accepted proposals.
Before: 30 true positives, 9 false positives. After: 28 true positives, 0 false positives.
54. Inference Proposal Order
55. Performance
Because attempting to count negatives as regions where no positives exist can be misleading, we use a Recall vs. 1-Precision curve to show model performance. (The ideal shape for these curves is a vertical line at 0.)

Recall = True positives / Total positives
1-Precision = False positives / (False positives + True positives)

True positive: >80% of the detected bounding region overlaps a true bounding region (counted once per true positive per image).

Results are pooled over 12 images for the building class.
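Both axes follow directly from pooled counts; as an illustration, the example below reuses the mixed-class pruning counts reported on slide 53:

```python
def recall_one_minus_precision(tp, fp, total_pos):
    """One point on the Recall vs. 1-Precision curve, from counts
    pooled over images: recall = TP / total positives,
    1-precision = FP / (FP + TP)."""
    recall = tp / total_pos
    one_minus_precision = fp / (fp + tp)
    return recall, one_minus_precision
```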
Curves shown: bottom-up only vs. bottom-up + top-down.
56. Car Detection
Bottom-up: ROC curve with AdaBoost. Top-down: improved ROC curve obtained by masking out roof regions.