Title: Search Techniques for Multimedia Databases
1. Search Techniques for Multimedia Databases
- Similarity-Based Queries
- Similarity Computation
- Indexing Techniques
- Data Clustering
- Search Algorithms
2. Characteristics of Multimedia Queries
- We normally retrieve a few records from a traditional DBMS by specifying exact queries based on the notion of equality.
- The types of queries expected in an image/video DBMS are relatively vague or fuzzy, and are based on the notion of similarity.
3. Content-Based Retrieval
- It is necessary to extract the features which are characteristic of the image and index the image on these features.
  - Examples: shape descriptions, texture properties.
- Typically, there are a few different quantitative measures which describe the various aspects of each feature.
  - Example: the texture attribute of an image can be modeled as a 3-dimensional vector with measures of directionality, contrast, and coarseness.
4. Measure of Similarity
- A suitable measure of similarity between an image feature vector F and a query vector Q is the weighted metric D,
  D^2 = W1(F1 - Q1)^2 + W2(F2 - Q2)^2 + ... + Wn(Fn - Qn)^2   (the square of the weighted Euclidean distance),
- where W is an n x n matrix which can be used to specify suitable weighting measures (W1, ..., Wn are its diagonal entries).
6. Similarity Based on Euclidean Distance
- Example with the weight matrix A set to the identity matrix:
  F1 = (3, 4, 6), F2 = (2, 4, 7), F3 = (3, 4, 7), Q = (2, 4, 6)
- With equal weights, D(F1, Q) = D(F2, Q) = 1, while D(F3, Q) = sqrt(2), so F1 and F2 are equally similar to Q.
7. Similarity Based on Euclidean Distance (cont.)
- Features 1 and 2 are treated equally (Figure: the points plotted with Feature 1 and Feature 2 as the axes).
- Points which lie at the same distance from the query point are considered equally similar, e.g., F1 and F2.
8. Similarity Based on Weighted Euclidean Distance
- Example: F1 = (4, 5, 7), F2 = (3, 5, 8), Q = (3, 5, 7), and W = diag(1, 1, 2), so dissimilarity in the 3rd dimension is emphasized.
- D(F1, Q)^2 = 1(4-3)^2 + 1(5-5)^2 + 2(7-7)^2 = 1
- D(F2, Q)^2 = 1(3-3)^2 + 1(5-5)^2 + 2(8-7)^2 = 2
- D(F1, Q) < D(F2, Q) ⇒ F1 is more similar to Q.
9. How to Determine the Weights?
- The variance of the individual feature measures can be used to determine their weights:
  A = diag(1/s1^2, 1/s2^2, 1/s3^2), where si^2 is the statistical variance of the i-th feature measure.
- Rationale:
  - Variance characterizes the dispersion among the measures.
  - Use a larger weight for a feature with smaller variance.
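The weighting scheme above is straightforward to compute. Below is a minimal sketch (not from the slides; the function names are illustrative) that derives diagonal weights as 1/variance from a sample of feature vectors and evaluates the weighted Euclidean distance:

```python
import numpy as np

def variance_weights(feature_vectors):
    """Diagonal weights W_i = 1 / s_i^2, where s_i^2 is the variance of feature i.
       Assumes every feature has nonzero variance."""
    variances = np.var(np.asarray(feature_vectors, dtype=float), axis=0)
    return 1.0 / variances          # features with smaller variance get larger weights

def weighted_distance(f, q, w):
    """Weighted Euclidean distance D with diagonal weight vector w."""
    d = np.asarray(f, dtype=float) - np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum(w * d * d)))

# Example from the slides: W = diag(1, 1, 2) emphasizes the 3rd dimension.
w = np.array([1.0, 1.0, 2.0])
print(weighted_distance([4, 5, 7], [3, 5, 7], w))   # 1.0      -> F1 is closer to Q
print(weighted_distance([3, 5, 8], [3, 5, 7], w))   # 1.414...
```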
10. Effect of Weights
- Figure: a cluster of similar objects (e.g., cars) plotted against shape and color axes; the color feature has larger variance, so it is given a smaller weight.
11. Distance in Euclidean Space
- 1-norm distance (Manhattan distance)
- 2-norm distance (Euclidean distance)
- p-norm distance (Minkowski distance)
- Infinity-norm distance (Chebyshev distance): the maximum absolute difference among the components of the two vectors
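All four of these distances are special cases of the Minkowski p-norm distance. A small illustrative sketch (not part of the original material):

```python
import numpy as np

def minkowski(x, y, p):
    """p-norm distance between vectors x and y; p = 1, 2, any p >= 1, or float('inf')."""
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(d.max())                 # Chebyshev: largest component difference
    return float(np.sum(d ** p) ** (1.0 / p))

x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))             # 7.0    Manhattan
print(minkowski(x, y, 2))             # 5.0    Euclidean
print(minkowski(x, y, 3))             # ~4.498 general Minkowski
print(minkowski(x, y, float("inf")))  # 4.0    Chebyshev
```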
12. Common Properties of a Distance
- Distances, such as the Euclidean distance, have some well-known properties:
  - Positive definiteness: d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q.
  - Symmetry: d(p, q) = d(q, p) for all p and q.
  - Triangle inequality: d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r.
  - Here d(p, q) is the distance (dissimilarity) between points (data objects) p and q.
- A distance that satisfies these properties is a metric.
13. Query Types
- k-Nearest-Neighbor (k-NN) queries: the user specifies the number of close matches to the given query point.
  - Example: retrieve the 10 images most similar to this sample.
- Range queries: an interval is given for each dimension of the feature space, and all the images which fall inside this hypercube are retrieved.
- Figure: a 4-nearest-neighbor query around Q, and range queries around Q with a small radius r and a large radius r (the larger r is, the vaguer the query).
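Both query types can be answered with a plain linear scan over the feature vectors; an index structure only accelerates this. A minimal sketch assuming Euclidean distance (the function names are illustrative):

```python
import numpy as np

def knn_query(data, q, k):
    """Return the indices of the k feature vectors closest to query point q."""
    dists = np.linalg.norm(data - q, axis=1)
    return np.argsort(dists)[:k]

def range_query(data, low, high):
    """Return indices of vectors inside the hypercube [low, high] per dimension."""
    inside = np.all((data >= low) & (data <= high), axis=1)
    return np.nonzero(inside)[0]

data = np.array([[0.1, 0.9], [0.7, 0.7], [0.3, 0.4], [0.8, 0.2]])
print(knn_query(data, np.array([0.6, 0.6]), k=2))                      # [1 2]
print(range_query(data, np.array([0.0, 0.0]), np.array([0.5, 0.5])))   # [2]
```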
14. Multiattribute and Spatial Indexing
- Spatial databases: queries involve regions that are represented as multidimensional objects.
  - Example: a rectangle in a 2-dimensional space, with corners (X1, Y1) and (X2, Y2), involves four values: two points with two values for each point (a 4-D vector).
- Access methods that index on multidimensional keys yield better performance for spatial queries.
15. Multiattribute and Spatial Indexing of Multimedia Objects
- Multimedia databases: multimedia objects typically have several attributes that characterize them.
  - Example: attributes of an image include coarseness, shape, average color, etc.
- Multimedia databases are therefore also good candidates for multikey search structures.
16. Indexing Multimedia Objects
- Can't we index multiple features using a B-tree?
  - A B-tree defines a linear order (e.g., according to X).
  - Similar objects (e.g., O1 and O2) can be far apart in the indexing order.
- Why multidimensional indexing?
  - A multidimensional index defines a spatial order.
  - Conceptually similar objects are spatially near each other in the indexing order (e.g., O1 and O2).
17. Some Multidimensional Search Structures
- k-d Tree
- Multidimensional Trie
- Grid File
- R Tree
- Point-Quad Tree
- D-Tree
18. k-d Tree
- Each node consists of a record and two pointers. The pointers are either null or point to another node.
- Nodes have levels, and each level of the tree discriminates for one attribute.
- The partitioning of the space alternates among the attributes of the n-dimensional search space.
- Example: a 2-D tree with discriminators X, Y, X, Y, ... by level.
  - Input sequence: A(65, 50), B(60, 70), C(70, 60), D(75, 25), E(50, 90), F(90, 65), G(10, 30), H(80, 85), I(95, 75).
  - Resulting tree: A is the root (X discriminator); B and C are A's left and right children (Y discriminator); B's children are G and E, and C's children are D and F (X discriminator); F's children are H and I.
  - Insertion order can affect performance.
19. k-d Tree Search Algorithm
- Notation:
  - Disc(L): the discriminator at L's level
  - Ki(L): the i-th attribute value of node L
  - Low(L): the left child of L
  - High(L): the right child of L
- Algorithm: search for P = (K1, ..., Kn)
  - Q = Root   /* Q will be used to navigate the tree */
  - While NOT DONE, do the following:
    - if Ki(P) = Ki(Q) for i = 1, ..., n   /* agree in each dimension */
      then we have located the node and we are DONE
    - otherwise, let A = Disc(Q); if KA(P) < KA(Q) then Q = Low(Q), else Q = High(Q)
- Performance: O(log N), where N is the number of records.
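A minimal sketch of the insertion and search loops described above (illustrative; the node layout and names are assumptions rather than the slides' exact notation). The discriminator cycles with the level, and the search descends to the low (left) child when the query key is smaller than the node's key:

```python
class KdNode:
    def __init__(self, point):
        self.point = point          # the record's key, e.g. (65, 50)
        self.low = None             # left child  (smaller discriminator value)
        self.high = None            # right child (greater or equal value)

def kd_insert(root, point, depth=0, dims=2):
    if root is None:
        return KdNode(point)
    a = depth % dims                                   # discriminator for this level
    if point[a] < root.point[a]:
        root.low = kd_insert(root.low, point, depth + 1, dims)
    else:
        root.high = kd_insert(root.high, point, depth + 1, dims)
    return root

def kd_search(root, point, dims=2):
    """Return the node holding `point`, or None if it is not in the tree."""
    q, depth = root, 0
    while q is not None:
        if q.point == point:                           # agree in every dimension
            return q
        a = depth % dims
        q = q.low if point[a] < q.point[a] else q.high
        depth += 1
    return None

# Example with the input sequence from the slide.
root = None
for p in [(65, 50), (60, 70), (70, 60), (75, 25), (50, 90),
          (90, 65), (10, 30), (80, 85), (95, 75)]:
    root = kd_insert(root, p)
print(kd_search(root, (80, 85)) is not None)   # True
print(kd_search(root, (55, 55)))               # None
```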
20. Multidimensional Trie
- Multidimensional tries, or k-d tries, are similar to the k-d tree except that they divide the embedding space.
  - Each split evenly divides a region.
- Example: construction of a 2-D trie by repeatedly halving the space containing the points A(65, 50), B(60, 70), C(70, 60), and D(75, 25).
21. Disadvantages of the k-d Trie
- The maximum level of decomposition depends on the minimum separation between two points.
  - A solution: split a region only if it contains more than p points (i.e., use buckets).
- Not a balanced tree ⇒ unpredictable performance.
22. Grid File
- Split strategy: the partitioning is done with only one hyperplane, but the split extends to all regions in the splitting direction.
23. Grid File - Potential Issues
- Figure: a grid directory whose cells point to data buckets A, B, C, D, E, F, H, I, J, L, M; many adjacent directory cells all point to the same data bucket K.
- The directory can be quite sparse.
- Many adjacent directory entries may point to the same data block.
- For partial-match and range queries, many directory entries, but only a few data blocks, may have to be scanned (i.e., sequential search might be faster).
24. Point-Quad Tree
- Each node of a k-dimensional point-quad tree partitions the object space into 2^k quadrants.
- The partitioning is performed along all search dimensions and is data dependent, as in the k-d tree.
- Example: the points A(50, 50), B(75, 80), D(35, 85), and E(25, 25) partition the space around each inserted point.
- To search for P(55, 75):
  - Since XA < XP and YA < YP, go to the NE quadrant of A (i.e., B).
  - Since XB > XP and YB > YP, go to the SW quadrant of B, which in this case is null.
25. R-tree (Region Tree)
- Root and intermediate nodes correspond to the smallest rectangle that encloses their child nodes, i.e., they contain (r, <page pointer>) pairs.
- Leaf nodes contain pointers to the actual objects, i.e., they contain (r, <RID>) pairs.
- A rectangle may be spatially contained in several nodes (e.g., J), yet it can be associated with only one node ⇒ this may incur redundant search.
- The R-tree is a generalization of the B-tree to higher dimensions:
  - The nodes correspond to disk pages.
  - All leaf nodes appear at the same level.
- Figure: covering rectangles A, B, and C at the root, with their child rectangles at levels 2 and 3.
26. R-tree Insertion
- A new object is added to the appropriate leaf node.
- If insertion causes the leaf node to overflow, the node must be split and the records distributed between the two leaf nodes. The split is chosen by:
  - minimizing the total area of the covering rectangles (compact clusters), and
  - minimizing the area common to the covering rectangles (to minimize redundant search).
- Splits are propagated up the tree (similar to the B-tree).
27. R-tree Delete
- If a deletion causes a node to underflow, its entries are reinserted (instead of being merged with adjacent nodes as in a B-tree).
- Reason: there is no concept of adjacency in an R-tree.
28. D-tree: Domain Decomposition
- If the number of objects inside a domain exceeds a certain threshold, the domain is split into two subdomains.
29. D-tree Split Example
- Figure: a D-tree and its embedding space after an initial split; internal nodes hold subdomain entries, and external-node entries (e.g., D22.P) point to data pages.
30. D-tree Split Examples
- Figure: the original domain D (a domain node) is split into subdomains D1 and D2, and D1 is further split into D11 and D12; the D-tree contains a domain node for D and data nodes for the subdomains.
31. D-tree Split Example (continued)
- Figure: after the 3rd split the subdomains are D11, D2, D121, and D122; after the 4th split they are D11, D21, D22, D121, and D122. Internal nodes of the D-tree hold the subdomain entries, and external-node entries such as D22.P point to data pages.
32. D-tree Search Algorithm
- Search(D_tree_root, search_object)
  - Current_node = D_tree_root
  - For each entry in Current_node, say (D, P), do the following:
    - if D contains search_object:
      - if Current_node is an external node, retrieve the objects through D.P and compare them
      - if Current_node is an internal node, call Search(D.P, search_object)   /* recursive call */
33. D-tree Range Query
- A range query can be represented as a hypercube embedded in the search space.
- Search strategy:
  - Use the D-tree to retrieve all subdomains which overlap with the query cube.
  - For each such subdomain which is not fully contained in the query cube, discard the objects falling outside the query cube.
34. D-tree Range Query (Algorithm)
- Search(D_tree_root, search_cube)
  - Current_node = D_tree_root
  - For each entry in Current_node, say (D, P), if D overlaps with search_cube, do the following:
    - If Current_node is an external node, retrieve the objects in D.P which fall within the overlap region.
    - If Current_node is an internal node, call Search(D.P, search_cube).
35. D-tree: Desirable Properties
- The D-tree is a balanced tree.
- The search path for an object is unique ⇒ no redundant search.
- More splits occur in denser regions of the search space ⇒ no unnecessary splits.
- Objects are evenly distributed among data nodes.
- Similar objects are physically clustered in the same, or neighboring, data nodes.
- Good performance is ensured regardless of the insertion order of the data.
36. Curse of Dimensionality
- As the number of dimensions D increases, the probability of finding data in the inscribed sphere (Vol-S/Vol-C) decreases exponentially, so most data lie in the corners of the cube.
- The more dimensions we have, the more similar things appear (i.e., data points tend to be equidistant).

  D    Vol-S/Vol-C
  1    100%
  2    78%
  3    52%
  4    31%
  5    17%
  6    9%

- Corners are very dense in high-dimensional spaces.
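The Vol-S/Vol-C column can be reproduced from the volume of a D-dimensional ball of radius 1 inscribed in a cube of side 2. A small illustrative sketch (the printed values differ from the table only by rounding):

```python
import math

def sphere_to_cube_ratio(d):
    """Volume of the inscribed D-ball (radius 1) divided by the cube (side 2)."""
    sphere = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # volume of the unit D-ball
    cube = 2.0 ** d
    return sphere / cube

for d in range(1, 7):
    print(d, round(100 * sphere_to_cube_ratio(d)))   # 100, 79, 52, 31, 16, 8 (percent)
```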
37. Effect of High Dimensionality
- Figure (a): as dimensionality increases, a substantially greater search radius is required to retrieve the same percentage of data. (Note: 0.02% selectivity means retrieving 3 out of 16,000 images.)
- Figure (b): in a high-dimensional space, the approximating hypercube query returns substantially more candidate data items as the search radius increases, so a larger approximate query is needed and it contains mostly irrelevant data from the corners.
- Figure (c): for a given selectivity, the retrieval time increases exponentially with the number of dimensions beyond a certain dimensionality.
38. Effect on k-NN Queries
- Processing a k-NN query:
  - Use an approximating hypercube query to find kc candidate neighbors.
  - Examine the candidates to determine the k nearest neighbors.
- Effect of high dimensionality:
  - In a high-dimensional space, kc >> k.
  - When kc is a very large percentage of the database, no index structure is helpful.
39. Sequential Scan is Better
- In a high-dimensional space, tree-based indexing structures examine a large fraction of the leaf nodes.
- Instead of visiting so many tree nodes, it is better to:
  - scan the whole data set, and
  - avoid performing seeks altogether.
40. Vector Approximation (VA) File
- How to speed up the linear scan? A solution: use approximation.
  - Divide the data space into cells and allocate a bit string to each cell.
  - Vectors inside a cell are approximated by the cell.
  - The VA-file is an array of these geometric approximations.
- For search:
  - The VA-file is scanned to select candidate vectors (i.e., relevant cells).
  - The candidates are then verified by visiting the vector file (i.e., the original vectors).
41. VA-File Example

  Vector file              VA-file
  O1  (0.1, 0.9)           O1  00 11
  O2  (0.7, 0.7)           O2  10 10
  O3  (0.3, 0.4)           O3  01 01
  O4  (0.8, 0.2)           O4  11 00
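A minimal sketch of how such cell approximations might be generated (illustrative; it assumes coordinates in [0, 1) and 2 bits per dimension, as in the example above):

```python
def va_approximation(vector, bits_per_dim=2):
    """Quantize each coordinate in [0, 1) into 2^bits cells and emit the bit string."""
    cells = 1 << bits_per_dim
    codes = []
    for x in vector:
        cell = min(int(x * cells), cells - 1)            # cell index 0 .. cells-1
        codes.append(format(cell, "0{}b".format(bits_per_dim)))
    return " ".join(codes)

vectors = {"O1": (0.1, 0.9), "O2": (0.7, 0.7), "O3": (0.3, 0.4), "O4": (0.8, 0.2)}
for name, v in vectors.items():
    print(name, va_approximation(v))   # O1 00 11, O2 10 10, O3 01 01, O4 11 00
```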
42. Principal Component Analysis
- Principal Component Analysis (PCA):
  - The goal is to find a projection that captures the largest amount of variation in the data.
  - It transforms a number of possibly correlated variables (X1 and X2) into a smaller number of uncorrelated variables called principal components.
- This technique can be used to reduce the number of dimensions in content-based image retrieval (CBIR).
- Figure: data points in the (X1, X2) plane with the principal direction e along the axis of greatest variation.
43. Review: Variance
- Variance measures the spread of a single variable: var(X) = sum_i (Xi - mean(X))^2 / (n - 1).
44. Covariance
- Covariance measures how two variables vary together: cov(X, Y) = sum_i (Xi - mean(X))(Yi - mean(Y)) / (n - 1).
45. Covariance Matrix
- The covariance matrix for variables X1, ..., X8 is the 8 x 8 matrix whose (i, j) entry is Cov(Xi, Xj); for example, the (3, 4) entry is Cov(X3, X4), and a diagonal entry such as Cov(X7, X7) is simply the variance s7^2.
46. Review: Transformation Matrix
47. Review: Eigenvector (2)
- A matrix acts on a vector by changing both its magnitude and its direction.
- A matrix may act on certain vectors by changing only their magnitude, leaving their direction unchanged (or possibly reversing it).
- These vectors are the eigenvectors of the matrix. (Figure: an eigenvector and a vector that is not an eigenvector.)
48. A Transformation Example
- This transformation matrix does not change the direction or magnitude of the vectors along the central vertical axis (e.g., the red vector).
- All the pixels along the central vertical axis are eigenvectors of this transformation matrix.
- Figure: an image before and after applying the transformation.
49. Eigenvector Properties
- Eigenvectors can only be found for square matrices.
- Not every square matrix has eigenvectors.
- If an n x n matrix does have eigenvectors, there are n of them.
- If we scale a vector by some amount before the multiplication, we still get the same multiple of it as a result.
- All the unit eigenvectors (i.e., of length 1) are perpendicular.
50. Eigenvalue
- The amount by which the original vector is scaled after multiplication by the square matrix is always the same. (Figure: the eigenvector, a scaled copy of the eigenvector, and the resulting scaled vectors.)
- No matter what multiple of the eigenvector we take before the multiplication, we always get 4 times that vector as the result.
- 4 is the eigenvalue associated with this eigenvector.
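This scaling behavior is easy to verify numerically. A small illustrative sketch (the matrix below is an example with eigenvalue 4, not the one pictured on the slide):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [3.0, 1.0]])
v = np.array([1.0, 1.0])          # an eigenvector of A

print(A @ v)                      # [4. 4.]   -> A merely scales v by 4
print(A @ (3 * v))                # [12. 12.] -> scaling v first still yields 4 * (3v)

# numpy recovers the same eigenvalues (4 and -1; order may vary):
values, vectors = np.linalg.eig(A)
print(values)
```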
51. Principal Components Analysis (1)
- Prepare the matrix Data to hold the original data
set, with a data item in each column and each row
holding a separate dimension.
52. Principal Components Analysis (2)
- Compute the mean for each dimension of the data set (i.e., average each row in Data).
53. Principal Components Analysis (3)
- Compute the matrix DataAdjust by subtracting from each entry in Data the mean of the corresponding dimension.
- Note: this produces a data set whose mean in each dimension is zero.
54. Principal Components Analysis (4)
- Compute the covariance matrix for the dimensions of DataAdjust.
55. Principal Components Analysis (5)
- Calculate the unit eigenvectors and the eigenvalues of the covariance matrix.
- Note: most math packages give unit eigenvectors.
56. Principal Components Analysis (6)
- Sort the eigenvectors in descending order according to their eigenvalues.
- The eigenvector with the largest eigenvalue is the principal component.
- The sorting order gives the components in order of significance (the first is the most significant).
57. PCA Dimension Reduction
- The eigenvectors corresponding to the principal components are orthogonal.
- We can map data from the original space into the new space defined by the orthogonal vectors.
- We can reduce the number of dimensions by dropping some of the less important components.
58. PCA: Deriving the New Data Set
- Choose the components (eigenvectors) we want to keep and form a transformation matrix F, with an eigenvector in each row.
- Compute the new data set by applying the transformation F to the adjusted data matrix (a coordinate transformation):
  FinalData = F x DataAdjust^T
- We have projected the data from the original coordinate system to a lower-dimensional space defined by the chosen eigenvectors.
- The relative spatial distances among the original data items are mostly preserved in the lower-dimensional space.
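The pipeline of slides 51-58 can be sketched compactly with numpy (illustrative; it follows the same conventions: one dimension per row, one data item per column, unit eigenvectors sorted by eigenvalue):

```python
import numpy as np

def pca_reduce(data, k):
    """data: one dimension per row, one data item per column. Keep k components."""
    means = data.mean(axis=1, keepdims=True)        # step 2: mean per dimension
    data_adjust = data - means                      # step 3: zero-mean data
    cov = np.cov(data_adjust)                       # step 4: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # step 5: unit eigenvectors
    order = np.argsort(eigvals)[::-1]               # step 6: sort by eigenvalue, descending
    f = eigvecs[:, order[:k]].T                     # transformation matrix, one eigenvector per row
    final_data = f @ data_adjust                    # project onto the k chosen components
    return final_data, f, means

# Tiny example: 3-dimensional data items reduced to 2 dimensions.
data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
                 [2.4, 0.7, 2.9, 2.2, 3.0],
                 [1.0, 1.1, 0.9, 1.0, 1.2]])
reduced, f, means = pca_reduce(data, k=2)
print(reduced.shape)   # (2, 5)
```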
59. PCA - Summary
- Determine the eigenvectors of the covariance matrix.
- These eigenvectors define the new space with lower dimensionality.
60. Data Clustering
- Supervised Classification
- Semi-supervised Classification
- Unsupervised kMeans Clustering
- Semi-Supervised kMeans Clustering
- Constrained kMeans Clustering
61. Supervised Classification
Figure: training data (labeled data).
62. Supervised Classification
Figure: the result of supervised learning (a dividing line between the classes).
63. Supervised Classification
Figure: an item to be classified.
64. Supervised Classification
Figure: the item is classified based on the dividing line.
65. Supervised Classification
- Support Vector Machine (SVM)
- Artificial Neural Networks (ANN)
66. Support Vector Machine (1)
- The dashed lines mark the distance between the dividing line and the closest vectors (points) to the line.
- The vectors that constrain the width of the margin are the support vectors.
- SVM analysis finds the dividing line that maximizes the margin.
- Figure: dividing lines with a large margin and with a small margin, and the support vectors.
67. Support Vector Machine (2)
- Points on a 2-dimensional plane can be separated by a 1-dimensional hyperplane (i.e., a line).
- Points in a d-dimensional space can be separated by a (d-1)-dimensional hyperplane.
68. Support Vector Machine (3)
- Problem: what if the points are separated by a nonlinear region?
69. Semi-supervised Learning
- Labeling a lot of data can be expensive.
- Solution: semi-supervised learning.
  - Make use of unlabeled data in conjunction with a small amount of labeled data.
- Examples:
  - Semi-supervised EM [Ghahramani, NIPS'94; Nigam, ML'00]
  - Transductive SVM [Joachims, ICML'99]
  - Co-training [Blum, COLT'98]
70. Co-training (1)
- Many problems have two complementary views that can be used to label data.
- Example: faculty home pages
  - "my advisor" pointing to a page is a good indicator that it is a faculty home page.
  - "I am teaching" appearing in a page is a good indicator that it is a faculty home page.
71. Co-training (2)
- Features can be split into two sets, i.e., x = (x1, x2).
- L: the set of labeled examples; U: the set of unlabeled examples.
- Repeat k times:
  - Use L to train a classifier c1 that considers only x1.
  - Use L to train a classifier c2 that considers only x2.
  - Apply c1 to label p positive and n negative examples from U.
  - Apply c2 to label p positive and n negative examples from U.
  - Move these self-labeled examples from U to L.
  - /* c1 adds labeled examples to L that c2 will be able to use for learning, and vice versa */
- Assumption: the class of each instance can be accurately predicted from each of the two feature subsets alone.
- A minimal sketch of this loop appears below.
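A compact sketch of this loop (illustrative; the train and predict callables stand in for any two classifiers and are assumptions, not part of the original description):

```python
def co_train(labeled, unlabeled, train, predict, p, n, rounds):
    """labeled:   list of ((x1, x2), y) pairs
       unlabeled: list of (x1, x2) pairs
       train(view_examples) -> classifier;  predict(clf, x_view) -> (label, confidence)"""
    L, U = list(labeled), list(unlabeled)
    for _ in range(rounds):
        c1 = train([(x1, y) for (x1, _x2), y in L])        # view-1 classifier
        c2 = train([(x2, y) for (_x1, x2), y in L])        # view-2 classifier
        for clf, view in ((c1, 0), (c2, 1)):
            # pick the p most confident positives and n most confident negatives from U
            scored = [(predict(clf, x[view]), x) for x in U]
            pos = sorted((s for s in scored if s[0][0] == 1), key=lambda s: -s[0][1])[:p]
            neg = sorted((s for s in scored if s[0][0] == 0), key=lambda s: -s[0][1])[:n]
            for (label, _conf), x in pos + neg:
                U.remove(x)
                L.append((x, label))                       # move self-labeled example to L
    return L
```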
72. Unsupervised Clustering: kMeans
Randomly initialize k means.
73. Unsupervised Clustering: kMeans
Assign points to clusters.
74. Unsupervised Clustering: kMeans
Re-estimate the means.
75. Unsupervised Clustering: kMeans
Re-assign points to clusters.
76. Unsupervised Clustering: kMeans
Re-estimate the means.
77. Unsupervised Clustering: kMeans
Assign points to clusters: no changes ⇒ convergence. These are the final clusters.
78. Unsupervised Clustering: kMeans
- Initialize k cluster centers (choose well-separated centers).
- Repeat until convergence:
  - Assign points to the cluster with the closest center.
  - For each cluster, re-estimate the center as the mean of the points in that cluster.
- Property: the algorithm locally minimizes the sum of distances between the data points and their corresponding cluster centers ⇒ more compact clusters.
- A minimal sketch of this loop appears below.
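A minimal sketch of the kMeans loop (illustrative), using numpy:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]   # initialize k centers
    for _ in range(iters):
        # assign each point to the cluster with the closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # re-estimate each center as the mean of the points assigned to it
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):          # no change -> converged
            break
        centers = new_centers
    return centers, labels

pts = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
centers, labels = kmeans(pts, k=2)
print(labels)        # e.g. [0 0 1 1]
```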
79. Semi-supervised Clustering
- Idea: use a small amount of labeled data to guide (bias) the clustering of unlabeled data.
- Example: use the labeled data to initialize the clusters in the kMeans algorithm.
- Figure: a few labeled data points among the unlabeled points.
80. Semi-supervised Clustering
- Figure: the resulting three clusters.
81. Constrained kMeans Clustering
- Must-link constraints specify that two points have to be in the same cluster.
- Cannot-link constraints specify that two points must not be placed in the same cluster.
- Notation: D is the data set, Con= is the set of must-link constraints, and Con≠ is the set of cannot-link constraints.
82. Constrained kMeans Clustering [Wagstaff, ICML'01]
1. Let C1, ..., Ck be the initial cluster centers.
2. For each point di in D, assign it to the closest cluster Cj such that VIOLATION(di, Cj, Con=, Con≠) is false. If no such cluster exists, fail (return {}).
3. For each cluster Ci, update its center by averaging all of the points di that have been assigned to it.
4. Repeat (2) and (3) until convergence.
5. Return C1, ..., Ck.

VIOLATION(data point d, cluster C, must-link constraints Con=, cannot-link constraints Con≠)   /* can we assign d to C without violating any constraint? */
- For each (d, d=) in Con=: if d= has been assigned to some cluster other than C, return true.
- For each (d, d≠) in Con≠: if d≠ is in C, return true.
- Otherwise, return false.
A minimal sketch of the VIOLATION test appears below.
83. Content-Based Image Indexing
- Keyword approach
  - Problem: there is no commonly agreed-upon vocabulary for describing image properties.
- Computer vision techniques
  - Problem: general image understanding and object recognition are beyond the capability of current computer vision technology.
- Image analysis techniques
  - It is relatively easy to capture primitive image properties such as
    - prominent regions,
    - their colors and shapes,
    - and related layout and location information within images.
  - These features can be used to index image data.
  - Challenge: the semantic gap!
84. Feature Acquisition: Image Segmentation
- Group adjacent pixels with similar color properties into one region, and
- segment pixels with distinct color properties into different regions.
- Figure: original image, segmented image, and extracted contour.
85. Image Indexing by Contents
- By applying image segmentation techniques, a set of regions is detected along with their locations, sizes, colors, and shapes.
- These features can be used to index image data.
86. Color
- We can divide the color space into a small number of zones, each of which is clearly distinct from the others to human eyes.
- Each of the zones is assigned a sequence number beginning from zero.
- Note: human eyes are not very sensitive to colors; in fact, users only have a vague idea about the colors they want to specify.
87. Shape
- The shape feature can be measured by two properties: circularity and major-axis orientation.
- Circularity: the more circular the shape, the closer its circularity is to one.
- Major-axis orientation.
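The slide does not show the circularity formula; a commonly used definition with the stated property (values approach one for a circle) is 4*pi*A / P^2, where A is the region's area and P its perimeter. A small sketch under that assumption:

```python
import math

def circularity(area, perimeter):
    """Common circularity measure 4*pi*A / P^2 (an assumption here; the slide's exact
       formula is not shown). Equals 1 for a perfect circle, smaller otherwise."""
    return 4.0 * math.pi * area / (perimeter ** 2)

r = 3.0
print(circularity(math.pi * r * r, 2 * math.pi * r))   # 1.0 for a circle
print(circularity(4.0, 8.0))                           # ~0.785 for a square of side 2
```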
88. Location
- The image is divided into sub-areas, and each sub-area is labeled with a number (0 through 8 in a 3 x 3 layout).
- A region's location is represented by the ID of the sub-area that contains the gravity center of the region.
  - Example: the location of region A is 4; the location of region B is 1.
- Note: when a user queries the database by visual contents, approximate feature values are used; it is meaningless to use absolute feature values as indices.
89. Size
- The size range is divided into groups, and a region's size is represented by the corresponding group number.
- Example (the cell area is Asub):

  ASSIGNED SIZE    ACTUAL SIZE S
  1                1/4 Asub ≤ S < 1/2 Asub
  2                1/2 Asub ≤ S < Asub
  3                Asub ≤ S < 2 Asub
  4                2 Asub ≤ S < 3 Asub
  5                3 Asub ≤ S < 4 Asub
  6                4 Asub ≤ S < 5 Asub
  7                5 Asub ≤ S < 6 Asub
  8                6 Asub ≤ S < 7 Asub
  9                7 Asub ≤ S < 8 Asub
  10               8 Asub ≤ S < 9 Asub

- Note: only regions larger than one-fourth of the sub-area are registered.
90. Texture Areas
- Texture areas and images with dominant high-frequency components are beyond the capacity of image segmentation techniques.
- Matching on the distribution of colors (i.e., color histograms) is a simple yet effective means for these areas.
- Strategy: divide an image into sub-areas and create a histogram for each of the sub-areas.
  - The purpose of partitioning the image is to capture locality information: we don't want to match an image with a red balloon on top with an image with a red car at the bottom.
91. Histograms
- Gray-level histogram: a plot of the number of pixels that assume each discrete value that the quantized image intensity can take.
- Color histogram: holds information on the color distribution. It is a plot of the statistics of the R, G, B components in the 3-D color space.
92. Histograms (cont.)
- Most histogram bins are sparsely populated, with only a small number of bins capturing the majority of the pixel counts.
- We can use the largest, say 20, bins as the representative bins of the histogram.
  - These 20 bins form a chain in the 3-D color space.
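A minimal sketch of selecting the representative bins of a color histogram (illustrative; it assumes 4 bins per R, G, B channel, i.e. 64 bins total, and keeps the top 20):

```python
import numpy as np

def representative_bins(pixels, bins_per_channel=4, keep=20):
    """pixels: (N, 3) array of R, G, B values in [0, 256). Returns the `keep` most
       populated bins as (r_bin, g_bin, b_bin, count) tuples, largest first."""
    quantized = (pixels // (256 // bins_per_channel)).astype(int)   # bin index per channel
    flat = (quantized[:, 0] * bins_per_channel + quantized[:, 1]) * bins_per_channel + quantized[:, 2]
    counts = np.bincount(flat, minlength=bins_per_channel ** 3)
    top = np.argsort(counts)[::-1][:keep]
    return [(i // (bins_per_channel ** 2),
             (i // bins_per_channel) % bins_per_channel,
             i % bins_per_channel,
             int(counts[i])) for i in top if counts[i] > 0]

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(10000, 3))
print(representative_bins(pixels)[:3])     # the three most populated (R, G, B) bins
```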
93. Histograms (cont.)
- If we can represent such a chain by a single numerical value, then we can index the color images using a B-tree.
- Connecting order: the representative bins are sorted in ascending order by their distance from the origin of the color space.
- Weighted perimeter
- Weighted angle
- Format of the index key
94. Video Content Extraction
- Other forms of information extraction can be employed:
  - Closed-captioned text
  - Speech recognition
  - Descriptive information from the screenplay
  - Key frames that characterize a shot
- This content information can be associated with the video story units.
95. Story Units
- Shot: frames recorded in one camera operation form a shot.
- Scene: one or several related shots are combined into a scene.
- Sequence: a series of related scenes forms a sequence.
- Video: a video is composed of different story units such as shots, scenes, and sequences, arranged according to some logical structure (defined by the screenplay).
- These concepts can be used to organize video data.
96. Video Modeling Approaches
- Physical feature-based modeling
- Semantic content-based modeling
97. Physical Feature-Based Modeling
- A video is represented and indexed based on audio-visual features.
- Features can be extracted automatically.
- Queries are formulated in terms of color, texture, audio, or motion information.
- Very limited in expressing queries close to human thinking.
  - It would be difficult to ask for a video clip showing the sinking of the ship in the movie Titanic using only a color description.
98. Semantic Content-Based Modeling
- Video semantics are captured and organized to support video retrieval.
- Difficult to automate; partially relies on manual annotation.
- Capable of supporting natural-language-like queries.
99. Semantic-Level Models
- Segmentation-based Models
- Stratification-based Models
- Temporal Coherent Model
100. Segmentation-Based Modeling
- A video stream is segmented into temporally continuous segments.
- Each segment is associated with a description, which could be natural text, keywords, or other kinds of annotation.
- Disadvantages:
  - Lack of flexibility
  - Limited in representing semantics
101. Stratification-Based Modeling
- We partition the contextual information into single events.
- Each event is associated with a video segment called a stratum.
- Strata can overlap or encompass each other.
102. Temporal Coherent Model
- Each event is associated with the set of video segments where it happens.
- More flexible in structuring video semantics.
103. Stratum
- The concept of stratification can be used to assign descriptions to video footage.
- Each stratum refers to a sequence of video frames.
- The strata may overlap or totally encompass each other.
- Advantage: allows easy retrieval by keyword, e.g., using an inverted index.
104. Inverted Index

  Term         Documents
  ANSI         D1, D2
  C            D3, D4, D5
  DB2          D1, D6, D21
  GUI          D2, D8, D11
  SYBASE       D2, D6, D17
  MULTIMEDIA   D5, D15
  ORACLE       D3, D11, D19
  RELATIONAL   D2, D3, D11, D19
  SQL          D2, D11, D20
  JAVA         D3, D20

- Each document entry contains:
  - the frequency of the term in the document,
  - the locations of the term in the document,
  - other information pertaining to the relationship between the term and the document.
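A minimal sketch of building and querying such an inverted index, keeping term positions per document (illustrative):

```python
from collections import defaultdict

def build_inverted_index(documents):
    """documents: {doc_id: text}. Returns term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for pos, term in enumerate(text.upper().split()):
            index[term].setdefault(doc_id, []).append(pos)   # keep term positions per document
    return index

docs = {"D1": "ANSI DB2", "D2": "ANSI GUI SYBASE RELATIONAL SQL", "D3": "ORACLE RELATIONAL JAVA"}
index = build_inverted_index(docs)
print(sorted(index["RELATIONAL"]))        # ['D2', 'D3']
print(len(index["ANSI"]["D1"]))           # frequency of ANSI in D1 -> 1
```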
105. Video Algebra
- Goal: to provide a high-level abstraction that
  - models the complex information associated with digital video data, and
  - supports content-based access.
- Strategy:
  - The algebraic video data model consists of hierarchical compositions of video expressions with high-level semantic descriptions.
  - The video expressions are constructed using video algebra operations.
106. Presentation
- The fundamental entity is a presentation.
- A presentation is described by a video expression.
- A video expression describes a multi-window, spatial-temporal, and content combination of video segments.
- Figure: a presentation.
107. Presentation
- The fundamental entity is a presentation.
- A presentation is described by a video expression.
- A video expression describes a multi-window, spatial-temporal, and content combination of video segments.
- An algebraic video node provides a means of abstraction by which video expressions can be named, stored, and manipulated as units.
  - A primitive video expression creates a single-window presentation from a raw video segment.
  - A compound video expression is constructed from simpler expressions using video algebra operations.
- Figure: an algebraic video node composed of video expressions defined over raw video.
108. Video Algebra Operations
- The video algebra operations fall into four categories:
  - Creation: defines the construction of video expressions from raw video.
  - Description: associates content attributes with a video expression.
  - Composition: defines temporal relationships between component video expressions.
  - Output: defines the spatial layout and audio output for component video expressions.
109. Descriptions
- description E1 content: specifies that E1 is described by content.
  - A content is a Boolean combination of attributes, each consisting of a field name and a value.
  - Some field names have predefined semantics (e.g., title), while other fields are user-definable.
  - Values can assume a variety of types, including strings and video node names.
  - Field names or values do not have to be unique within a description.
  - Example: title = "CNN Headline News"
- hide-content E1: defines a presentation that hides the content of E1 (i.e., E1 does not contain any description).
  - This operation provides a method for creating abstraction barriers for content-based access.
110. Composition
- The composition operations can be combined to produce complex scheduling definitions and constraints.
- Example (creating a video presentation from a raw video segment):
  C1 = create Cnn.HeadlineNews.rv 10 30
  C2 = create Cnn.HeadlineNews.rv 20 40
  C3 = create Cnn.HeadlineNews.rv 32 65
  D1 = (description C1 "Anchor speaking")
  D2 = (description C2 "Professor Smith")
  D3 = (description C3 "Economic reform")
- D3 follows D2, which follows D1, and common footage is not repeated: this creates a non-redundant video stream from the three overlapping segments C1, C2, and C3.
111. Composition Operators (1)
- Concatenation of E1 and E2: defines the presentation where E2 follows E1.
- E1 ∪ E2 (union): defines the presentation where E2 follows E1 and common footage is not repeated.
- E1 ∩ E2 (intersection): defines the presentation where only the common footage of E1 and E2 is played.
- E1 - E2 (difference): defines the presentation where only the footage of E1 that is not in E2 is played.
- E1 E2 (parallel composition): E1 and E2 are played concurrently and terminate simultaneously.
- (test) ? E1 E2 ... En (conditional): Ei is played if test evaluates to i.
- loop E1 time: defines a repetition of E1 for a duration of time.
112. Composition Operators (2)
- stretch E1 factor: sets the duration of the presentation equal to factor times the duration of E1 by changing the playback speed of the video segment.
- limit E1 time: sets the duration of the presentation equal to the minimum of time and the duration of E1; the playback speed is not changed.
113. Composition Operators (3)
- transition E1 E2 type time: defines a transition effect of the given type between E1 and E2; time defines the duration of the transition effect.
  - The transition type is one of a set of transition effects, such as dissolve, fade, and wipe.
- contains E1 query: defines the presentation that contains the component expressions of E1 that match query (similar to the FROM clause in SQL).
  - A query is a Boolean combination of attributes.
  - Example: text = "smith" and text = "question"
114. Output Characteristics
- Video expressions include output characteristics that specify the screen layout and audio output for playing back children streams.
- Figure: a presentation.
115. Output Characteristics
- Video expressions include output characteristics that specify the screen layout and audio output for playing back children streams.
- window E1 (X1, Y1) - (X2, Y2) priority
  - Specifies that E1 will be displayed with the given priority in the window defined by the top-left corner (X1, Y1) and the bottom-right corner (X2, Y2), with Xi in [0, 1] and Yi in [0, 1].
  - Window priorities are used to resolve overlap conflicts of the screen display.
- audio E1 channel force priority
  - Specifies that the audio of E1 will be output to channel with the given priority; if force is true, the audio operation overrides any channel specifications of the component video expressions.
116. Output Characteristics: Example
  C1 = create MagicvsBulls.rv 300 500
  P1 = window C1 (0, 0) - (0.5, 0.5) 10
  P2 = window C1 (0, 0.5) - (0.5, 1) 20
  P3 = window C1 (0.5, 0.5) - (1, 1) 30
  P4 = window C1 (0.5, 0) - (1, 0.5) 40
  P5 = (P1 P2 P4)
  P6 = (P1 P2 P3 P4)
  (P5 (window (P5 (window P6 (0.5, 0.5) - (1, 1) 60)) (0.5, 0.5) - (1, 1) 50))
- In window E1 (X1, Y1) - (X2, Y2) priority, (X1, Y1) is the top-left corner, (X2, Y2) is the bottom-right corner, and a larger value means higher priority.
- Since expressions can be nested, the spatial layout of any particular video expression is defined relative to the parent rectangle.
- Figure: the resulting presentation.
117. Scope of a Video Node Description
- The scope of a given algebraic video node description is the subgraph that originates from the node.
- The components of a video expression inherit descriptions by context,
  - i.e., content attributes associated with a parent video node are also associated with all of its descendant nodes.
118. Content-Based Access
- Search query: searches a collection of video nodes for video expressions that match query.
  - Example: search text = "smith" AND text = "question"
- Strategy: matching a query to the attributes of an expression must take into account all of the attributes of that expression, including the attributes of its encompassing expressions.
- Figure: the node "Smith on economic reform" is returned as the result of the query; a descendant node ("Question from audience") also satisfies the query but is not returned because it is a descendant of a node already in the result set.
119. Browsing and Navigation
- Playback video-expression
  - Plays the video expression. It enables the user to view the presentation defined by the expression.
- Display video-expression
  - Displays the video expression. It allows the user to inspect the video expression.
- Get-parent video-expression
  - Returns the set of nodes that directly point to video-expression.
- Get-children video-expression
  - Returns the set of nodes that video-expression directly points at.
120. Algebraic Video System Prototype
- The implementation is built on top of three existing subsystems:
  - The VuSystem is used for managing raw video data and for its support of Tcl (Tool Command Language) programming. It provides an environment for recording, processing, and playing video.
  - The Semantic File System is used as a storage subsystem with content-based access to data, for indexing and retrieving files that represent algebraic video nodes.
  - The Web server provides a graphical interface to the system that includes facilities for querying, navigating, video editing and composing, and invoking the video player.