Search Techniques for Multimedia Databases

Description:

Title: Media and Data Stream Author: Morris Rich Last modified by: Kien Created Date: 6/17/1995 11:31:02 PM Document presentation format: Letter Paper (8.5x11 in) – PowerPoint PPT presentation

Slides: 121
Provided by: Morri93
Learn more at: http://www.cs.ucf.edu
Transcript and Presenter's Notes

1
Search Techniques for Multimedia Databases
  • Similarity-Based Queries
  • Similarity Computation
  • Indexing Techniques
  • Data Clustering
  • Search Algorithms

2
Characteristics of Multimedia Queries
  • We normally retrieve a few records from a
    traditional DBMS through the specification of
    exact queries based on the notions of equality.
  • The types of queries expected in an image/video
    DBMS are relatively vague or fuzzy, and are based
    on the notion of similarity.

3
Content-Based Retrieval
  • It is necessary to extract the features which are
    characteristic of the image and index the image
    on these features.
  • Examples: shape descriptions, texture
    properties.
  • Typically, a few different quantitative
    measures describe the various aspects of
    each feature.
  • Example: The texture attribute of an image
    can be modeled as a 3-dimensional vector with
    measures of directionality, contrast, and
    coarseness.

4
Measure of Similarity
  • A suitable measure of similarity between an image
    feature vector F and query vector Q is the
    weighted metric $D = (F - Q)^T W (F - Q)$,
  • where W is an n×n matrix which can be used to
    specify suitable weighting measures.

For a diagonal W with entries $W_1, \dots, W_n$, D is the square of the weighted Euclidean distance:
$D = W_1 (F_1 - Q_1)^2 + W_2 (F_2 - Q_2)^2 + \dots + W_n (F_n - Q_n)^2$
6
Similarity Based on Euclidean Distance
A = identity matrix
$F_1 = (3, 4, 6)^T,\quad F_2 = (2, 4, 7)^T,\quad F_3 = (3, 4, 7)^T,\quad Q = (2, 4, 6)^T$
7
Similarity Based on Euclidean Distance (cont.)
Features 1 and 2 are treated equally.
[Figure: feature vectors around the query point in the Feature 1 / Feature 2 plane]
Points which lie at the same distance from the
query point are considered equally similar, e.g.,
F1 and F2.
8
Similarity Based on Weighted Euclidean Distance
  • W is a diagonal matrix.

Example:
$F_1 = (4, 5, 7)^T,\quad F_2 = (3, 5, 8)^T,\quad Q = (3, 5, 7)^T,\quad W = \mathrm{diag}(1, 1, 2)$
Dissimilarity in the 3rd dimension is emphasized.
$D(F_1, Q) = (1, 0, 0)\, W\, (1, 0, 0)^T = 1$
$D(F_2, Q) = (0, 0, 1)\, W\, (0, 0, 1)^T = 2$
$D(F_1, Q) < D(F_2, Q)$ ⇒ F1 is more similar to Q
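The arithmetic of this example can be checked with a short sketch. It assumes a diagonal weight matrix stored as a plain sequence of weights; the function name `weighted_sq_distance` is made up for illustration and is not from the slides.

```python
def weighted_sq_distance(f, q, weights):
    """Square of the weighted Euclidean distance for a diagonal weight matrix."""
    if not (len(f) == len(q) == len(weights)):
        raise ValueError("f, q and weights must have the same length")
    return sum(w * (fi - qi) ** 2 for w, fi, qi in zip(weights, f, q))


# Values read from the slide's example; W = diag(1, 1, 2).
Q = (3, 5, 7)
F1 = (4, 5, 7)
F2 = (3, 5, 8)
W = (1, 1, 2)

d1 = weighted_sq_distance(F1, Q, W)  # differs from Q only in dimension 1
d2 = weighted_sq_distance(F2, Q, W)  # differs from Q only in dimension 3
```

With the identity weights `(1, 1, 1)` both vectors would be equally distant from Q; the weight 2 on the third dimension is what separates them.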
9
How to determine the weights ?
The variance of the individual feature measures
can be used to determine their weights.
$A = \mathrm{diag}(1/\sigma_1^2,\ 1/\sigma_2^2,\ 1/\sigma_3^2)$
$\sigma_i^2$: statistical variance of the i-th feature
measure.
  • Rationale
  • Variance characterizes the dispersion among the
    measures
  • Use a larger weight for a feature with smaller
    variance.
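A minimal sketch of inverse-variance weighting follows. It assumes the population variance (the slides do not say whether the population or sample variance is meant), and the function name and sample data are hypothetical.

```python
from statistics import pvariance


def inverse_variance_weights(feature_vectors):
    """Return one weight per feature: 1 / variance of that feature."""
    weights = []
    for column in zip(*feature_vectors):  # one column per feature
        var = pvariance(column)           # assumption: population variance
        if var == 0:
            raise ValueError("a constant feature has no finite weight")
        weights.append(1.0 / var)
    return weights


# Hypothetical data: feature 2 is far more dispersed than feature 1.
vectors = [(1.0, 10.0), (2.0, 30.0), (3.0, 50.0)]
weights = inverse_variance_weights(vectors)
```

The more dispersed second feature receives the smaller weight, which matches the rationale on the slide.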

10
Effect of Weights
Shape
A cluster of similar objects (e.g., cars)
Color
This feature has larger variance, use smaller
weight
11
Distance in Euclidean Space
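For two vectors x and y with n components:
1-norm distance (Manhattan distance): $d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
2-norm distance (Euclidean distance): $d_2(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}$
p-norm distance (Minkowski distance): $d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
Infinity norm distance (Chebyshev distance): $d_\infty(x, y) = \max_i |x_i - y_i|$,
the maximum difference between corresponding components of the two
vectors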
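These norms can be compared side by side with a small sketch; the function names are illustrative only.

```python
def minkowski(x, y, p):
    """p-norm (Minkowski) distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Infinity-norm distance: the largest per-component difference."""
    return max(abs(a - b) for a, b in zip(x, y))


manhattan = minkowski((0, 0), (3, 4), 1)
euclidean = minkowski((0, 0), (3, 4), 2)
infinity = chebyshev((0, 0), (3, 4))
```

As p grows, the Minkowski distance approaches the Chebyshev distance, so the three values above are non-increasing.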
12
Common Properties of a Distance
  • Distances, such as the Euclidean distance, have
    some well known properties.
  • Positive Definiteness: d(p, q) ≥ 0 for all p
    and q, and d(p, q) = 0 only if p = q.
  • Symmetry: d(p, q) = d(q, p) for all p and q.
  • Triangle Inequality: d(p, r) ≤ d(p, q) + d(q, r)
    for all points p, q, and r.
  • where d(p, q) is the distance (dissimilarity)
    between points (data objects), p and q.
  • A distance that satisfies these properties is a
    metric

13
Query Types
  • k-Nearest-Neighbor (k-NN) Queries: The user
    specifies the number of close matches to the
    given query point.
  • Example: Retrieve the 10 images most similar to this sample
  • Range queries: An interval is given for each
    dimension of the feature space and all the images
    which fall inside this hypercube are retrieved.
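Both query types can be sketched as a brute-force scan over a list of feature vectors. This is a baseline without any index, and the function names are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(points, q, k):
    """Return the k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: euclidean(p, q))


def range_query(points, low, high):
    """Return the points inside the hypercube [low_i, high_i] in every dimension."""
    return [p for p in points
            if all(l <= x <= h for x, l, h in zip(p, low, high))]


points = [(1, 1), (2, 2), (5, 5), (9, 9)]
nearest_two = knn_query(points, (0, 0), 2)
in_box = range_query(points, (1.5, 1.5), (6, 6))
```

Every point is examined once per query, which is the cost that index structures try to avoid.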

[Figure: data points around a query point Q with search radius r, illustrating a range query (r is small), a vague query (r is large), and a 4-nearest-neighbor query]
14
Multiattribute and Spatial Indexing
  • Spatial Databases: Queries involve regions that
    are represented as multidimensional objects.
  • Example: A rectangle in a 2-dimensional space
    involves four values, i.e., two points with two values
    for each point (4D vector).
  • Access methods that index on multidimensional
    keys yield better performance for spatial
    queries.

[Figure: rectangle defined by the corner points (X1, Y1) and (X2, Y2)]
15
Multiattribute and Spatial Indexing of Multimedia
Objects
  • Multimedia Databases: Multimedia objects
    typically have several attributes that
    characterize them.
  • Example: Attributes of an image include
    coarseness, shape, color, etc.
  • Multimedia databases are also good candidates for
    multikey search structures.

[Figure: feature space with axes shape, coarseness, and average color]
16
Indexing Multimedia Objects
  • Can't we index multiple features using a B-tree?
  • A B-tree defines a linear order (e.g., according
    to X)
  • Similar objects (e.g., O1 and O2) can be far
    apart in the indexing order
  • Why multidimensional indexing?
  • A multidimensional index defines a spatial
    order
  • Conceptually similar objects are spatially near
    each other in the indexing order (e.g., O1 and
    O2)

17
Some Multidimensional Search Structures
  • k-d Tree
  • Multidimensional Trie
  • Grid File
  • R Tree
  • Point-Quad Tree
  • D-Tree

18
k-d Tree
  • Each node consists of a record and two
    pointers. The pointers are either null or point
    to another node.
  • Nodes have levels and each level of the tree
    discriminates for one attribute.
  • The partitioning of the space with respect to
    various attributes alternates between the various
    attributes of the n-dimensional search space.

Example: 2-D tree
Input sequence: A (65, 50), B (60, 70), C (70, 60), D (75, 25), E (50, 90), F (90, 65), G (10, 30), H (80, 85), I (95, 75)

Level  Discriminator  Nodes
0      X              A(65, 50)
1      Y              B(60, 70), C(70, 60)
2      X              G(10, 30), E(50, 90), D(75, 25), F(90, 65)
3      Y              H(80, 85), I(95, 75)

Children (low, high): A → (B, C); B → (G, E); C → (D, F); F → (H, I)
Insertion order can affect performance
19
k-d Tree Search Algorithm
  • Notations
  • Disc(L): the discriminator at L's level
  • Ki(L): the i-th attribute value of node L
  • Low(L): the left child of L
  • High(L): the right child of L
  • Algorithm: Search for P(K1, ..., Kn)
  • Q ← Root /* Q will be used to navigate
    the tree */
  • While NOT DONE, do the following:
  • if Q is null, then P is not in the tree and we
    are DONE
  • if Ki(P) = Ki(Q) for i = 1, ..., n, then
    we have located the node and we are
    DONE /* agree in each dimension */
  • otherwise, let A = Disc(Q); if KA(P) <
    KA(Q)
  • then Q ← Low(Q)
  • else Q ← High(Q)
  • Performance: O(log N) on average for a reasonably
    balanced tree, where N is the number of
    records

[Figure: node L = (..., Ki(L), ...) with left child M = Low(L) and right child N = High(L)]
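The insertion and search procedures can be sketched as follows, using the input sequence from the 2-D tree example. The sketch assumes that keys smaller than the node's key in the discriminating attribute go to the low (left) child and all others to the high (right) child; the class and function names are illustrative.

```python
class Node:
    def __init__(self, name, key):
        self.name = name
        self.key = key      # tuple of attribute values
        self.low = None     # left child
        self.high = None    # right child


def insert(root, name, key, level=0):
    """Insert a record; the discriminator alternates with the level."""
    if root is None:
        return Node(name, key)
    axis = level % len(key)
    if key[axis] < root.key[axis]:
        root.low = insert(root.low, name, key, level + 1)
    else:
        root.high = insert(root.high, name, key, level + 1)
    return root


def search(root, key):
    """Exact-match search; returns the node or None if the key is absent."""
    node, level = root, 0
    while node is not None:
        if node.key == key:          # agree in each dimension
            return node
        axis = level % len(key)
        node = node.low if key[axis] < node.key[axis] else node.high
        level += 1
    return None


records = [("A", (65, 50)), ("B", (60, 70)), ("C", (70, 60)),
           ("D", (75, 25)), ("E", (50, 90)), ("F", (90, 65)),
           ("G", (10, 30)), ("H", (80, 85)), ("I", (95, 75))]
root = None
for name, key in records:
    root = insert(root, name, key)
```

Inserting the same records in sorted order of X would produce a much deeper tree, which is why insertion order affects performance.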
20
Multidimensional Trie
  • Multidimensional tries, or k-d tries, are similar
    to k-d trees except that they divide the embedding
    space.
  • ⇒ Each split evenly divides a region

Example: Construction of a 2D trie
[Figure: partitioning of the X-Y space for the points A(65, 50), B(60, 70), C(70, 60), and D(75, 25)]
21
Disadvantages of k-d Trie
  • The maximum level of decomposition depends on the
    minimum separation between two points
  • A solution: Split a region only if it contains
    more than p points (i.e., using buckets)
  • Not a balanced tree
  • ⇒ unpredictable performance

22
Grid File
Split Strategy: The partitioning is done with
only one hyperplane, but the split extends to all
regions in the splitting direction
23
Grid File - Potential Issues
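[Figure: grid directory over a 2-D space, with scale values 0, 25, 50, 75, 100 on one axis and 1, 2, 3, 4 on the other; directory entries point to data buckets such as H and K]
Grid directory (each entry names a data bucket):
A B C K
D E F K
H I J K
L L M K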
  • The directory can be quite sparse.
  • Many adjacent directory entries may point to the
    same data block.
  • For partial-match and range queries, many
    directory entries, but only few data blocks, may
    have to be scanned (i.e., sequential search might
    be faster)

24
Point-Quad Tree
  • Each node of a k-dimensional quad tree partitions
    the object space into 2^k quadrants (four in two
    dimensions).
  • The partitioning is performed along all search
    dimensions and is data dependent, like the k-d tree.

[Figure: partitioning of the space by the points A(50, 50), B(75, 80), D(35, 85), and E(25, 25)]
  • To search for P(55, 75):
  • Since XA < XP and YA < YP ⇒ go to NE (i.e., B).
  • Since XB > XP and YB > YP ⇒ go to SW, which in
    this case is null.
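The search for P(55, 75) can be traced with a small 2-D point-quad tree sketch. It assumes that A is inserted first (so it becomes the root) and that ties on a coordinate go to the north or east side; neither detail is stated on the slide, and the names are illustrative.

```python
class QuadNode:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.children = {"NE": None, "NW": None, "SW": None, "SE": None}


def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the node's point (ties go north/east)."""
    if x >= node.x:
        return "NE" if y >= node.y else "SE"
    return "NW" if y >= node.y else "SW"


def insert(root, name, x, y):
    if root is None:
        return QuadNode(name, x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if node.children[q] is None:
            node.children[q] = QuadNode(name, x, y)
            return root
        node = node.children[q]


def search(root, x, y):
    """Return the node storing (x, y), or None when a null branch is reached."""
    node = root
    while node is not None and (node.x, node.y) != (x, y):
        node = node.children[quadrant(node, x, y)]
    return node


root = None
for name, x, y in [("A", 50, 50), ("B", 75, 80), ("D", 35, 85), ("E", 25, 25)]:
    root = insert(root, name, x, y)
```

Searching for (55, 75) goes from A to its NE child B and then to B's SW child, which is empty, so the search reports that the point is not stored.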

25
R-tree (Region Tree)
  • Root and intermediate nodes correspond to the
    smallest rectangle that encloses their child nodes,
    i.e., they contain (r, <page pointer>) pairs.
  • Leaf nodes contain pointers to the actual
    objects, i.e., they contain (r, <RID>) pairs.
  • A rectangle may be spatially contained in several
    nodes (e.g., J), yet it can be associated with
    only one node.

May incur redundant search
[Figure: R-tree with root rectangles A, B, and C and the levels Root, Level 2, and Level 3]
  • The R-tree is a higher-dimensional generalization of the B-tree
  • The nodes correspond to disk pages
  • All leaf nodes appear at the same level
26
R-tree Insertion
  • A new object is added to the appropriate leaf
    node.
  • If insertion causes the leaf node to overflow,
    the node must be split, and the records
    distributed between the two leaf nodes. The split
    aims at:
  • Minimizing the total area of the covering
    rectangles (compact clusters)
  • Minimizing the area common to the covering
    rectangles (to minimize redundant search)
  • Splits are propagated up the tree (similar to
    B-tree).

27
R-tree Delete
  • If a deletion causes a node to underflow, its
    entries are reinserted (instead of being merged
    with adjacent nodes as in a B-tree).
  • Reason: There is no concept of adjacency in an
    R-tree.

28
D-tree Domain Decomposition
  • If the number of objects inside a domain exceeds
    a certain threshold, the domain is split into two
    subdomains.

29
D-tree Split Example
[Figure: a D-tree and its embedding space; internal nodes hold domains (e.g., D) and external nodes hold pointers to data nodes (e.g., D22.P)]
30
D-tree Split Examples
[Figure: D-tree and embedding space; the original domain D (a domain node) is split into subdomains D1 and D2, and D1 is split further into D11 and D12; each subdomain refers to a data node]
31
D-tree Split Example (continued)
[Figure: D-tree and embedding space. After the 3rd split the subdomains are D11, D121, D122, and D2. After the 4th split the internal nodes D1 and D2 cover the external nodes D11, D121, D122, D21, and D22, each with a data pointer such as D22.P]
32
D-tree Search Algorithm
  • Search(D_tree_root, search_object)
  • Current_node ← D_tree_root
  • For each entry in Current_node, say (D, P), do
  • if D contains search_object, we do the
    following:
  • if Current_node is an external node,
    retrieve the objects through D.P and
    compare them with search_object
  • if Current_node is an internal node,
    call Search(D.P, search_object) /*
    recursive call */
33
D-tree Range Query
  • A range query can be represented as a hypercube
    embedded in the search space
  • Search Strategy:
  • Use the D-tree to retrieve all subdomains which
    overlap with the query cube.
  • For each such subdomain which is not fully
    contained in the query cube, discard the objects
    falling outside the query cube.

[Figure: a range query cube overlapping several subdomains; an object outside the cube is labeled "Discard this object"]
34
D-tree Range Query
  • Search(D_tree_root, search_cube)
  • Current_node ← D_tree_root
  • For each entry in Current_node, say (D, P), if D
    overlaps with search_cube, do
  • If Current_node is an external node, retrieve the
    objects in D.P which fall within the overlap
    region.
  • If Current_node is an internal node,
    call Search(D.P, search_cube).

35
D-tree Desirable Properties
  • D-tree is a balanced tree
  • Search path for an object is unique
  • No redundant search
  • More splits occur in denser regions of the search
    space.
  • No unnecessary splits
  • Objects evenly distributed among data nodes
  • Similar objects are physically clustered in the
    same, or neighboring data nodes.
  • Good performance is ensured regardless of the
    insertion order of the data.

36
Curse of Dimensionality
  • As the number of dimensions (D) increases, the
    probability of finding data in the sphere
    (the ratio of sphere volume to cube volume,
    Vol-S/Vol-C) decreases exponentially
  • Most data are in the corners of the cube
  • The more dimensions we have, the more similar things
    appear (i.e., data points become nearly equidistant)

D    Vol-S/Vol-C (%)
1    100
2    78
3    52
4    31
5    17
6    9
Corners are very dense in high-dimensional spaces
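The trend in the table can be reproduced with the closed-form volume of a d-dimensional ball. The sketch assumes the sphere is inscribed in a unit cube (radius 1/2), which the slide does not state explicitly; under that assumption the ratios come out at roughly 100, 78.5, 52.4, 30.8, 16.4, and 8.1 percent for D = 1 to 6, close to the tabulated values.

```python
import math


def sphere_to_cube_ratio(d):
    """Volume of the ball inscribed in a unit d-cube, divided by the cube volume (1)."""
    radius = 0.5
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_ball * radius ** d


ratios = [sphere_to_cube_ratio(d) for d in range(1, 7)]
for d, ratio in enumerate(ratios, start=1):
    print(d, round(100 * ratio, 1))
```

The gamma function in the denominator grows faster than the numerator, so the ratio keeps shrinking as d increases.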
37
Effect on High Dimensionality
  • Figure (a): As dimensionality increases, it
    requires a substantially greater search radius to
    retrieve the same percentage of data (Note: 0.02%
    means retrieving about 3 of 16,000 images.)
  • Figure (b): In a high-dimensional space, the
    approximating hypercube query returns
    substantially more candidate data items as the
    search radius increases
  • Figure (c): The retrieval time increases
    exponentially with the number of
    dimensions (for a given selectivity) beyond a
    certain dimensionality

Long retrieval time
Need a larger approximating query
Mostly irrelevant data in corners
38
Effect on k-NN Queries
  • Process k-NN query
  • Use an approximating hypercube query to find kc
    candidate neighbors
  • Examine the candidates to determine the k nearest
    neighbors
  • Effect of high dimensionality
  • In a high-dimensional space, kc >> k
  • When kc is a very large percentage of the
    database, no index structure is helpful

39
Sequential Scan is Better
  • In a high-dimensional space, tree-based indexing
    structures examine a large fraction of the leaf nodes
  • Instead of visiting so many tree nodes, it is
    better
  • to scan the whole data set, and
  • avoid performing seeks altogether

40
Vector Approximation (VA) File
  • How can we speed up a linear scan?
  • A solution: use approximation
  • Divide the data space into cells and allocate a
    bit-string to each cell
  • Vectors inside a cell are approximated by the
    cell
  • VA-file is an array of these geometric
    approximations
  • For search,
  • the VA-file is scanned to select candidate
    vectors (i.e., relevant cells).
  • Candidates are then verified by visiting the
    vector files (i.e., the original vectors)

41
VA-File Example
Vector file
Object  Vector data
O1      (0.1, 0.9)
O2      (0.7, 0.7)
O3      (0.3, 0.4)
O4      (0.8, 0.2)

VA-file
Object  Approximation
O1      00 11
O2      10 10
O3      01 01
O4      11 00
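The approximations in this example are consistent with a uniform grid that spends 2 bits per dimension on coordinates in [0, 1]. The sketch below assumes exactly that (equal-width cells, one bit-string per dimension); the function name is hypothetical.

```python
def approximate(vector, bits=2):
    """Bit-string approximation of a vector with coordinates in [0, 1]."""
    cells = 1 << bits                       # number of cells per dimension
    codes = []
    for x in vector:
        index = min(int(x * cells), cells - 1)   # x = 1.0 falls in the last cell
        codes.append(format(index, "0{}b".format(bits)))
    return " ".join(codes)


vector_file = {"O1": (0.1, 0.9), "O2": (0.7, 0.7),
               "O3": (0.3, 0.4), "O4": (0.8, 0.2)}
va_file = {name: approximate(v) for name, v in vector_file.items()}
```

Each approximation takes 4 bits here instead of two floating-point numbers, which is why scanning the VA-file is cheaper than scanning the vector file.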
42
Principal Component Analysis
  • Principal Component Analysis (PCA)
  • Goal is to find a projection that captures the
    largest amount of variation in data
  • Transforms a number of possibly correlated
    variables (X1 and X2) into a smaller number of
    uncorrelated variables called principal
    components
  • This technique can be used to reduce the number
    of dimensions in content-based image retrieval
    (CBIR)

[Figure: data points in the X1-X2 plane with the principal direction e]
43
Review Variance
44
Covariance
45
Covariance Matrix
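[Figure: 8×8 covariance matrix with rows and columns X1 … X8; the entry in row X3, column X4 is Cov(X3, X4), and each diagonal entry is a variance, e.g., Cov(X7, X7) = σ7²]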
46
Review Transformation Matrix
47
Review Eigenvector (2)
Not an eigenvector
  • A matrix acts on a vector by changing both its
    magnitude and its direction
  • This matrix may act on certain vectors by
    changing only their magnitude, and leaving their
    direction unchanged (or possibly reversing it)
  • These vectors are the eigenvectors of the matrix

Eigenvector
48
A Transformation Example
  • This transformation matrix does not change the
    direction or magnitude of the vectors along the
    central vertical axis (e.g., the red vector)
  • All the pixels along the central vertical axis
    are the eigenvectors of this transformation matrix

Apply transformation
49
Eigenvector Properties
  • Eigenvectors can only be found for square
    matrices
  • Not every square matrix has real eigenvectors
  • A symmetric n×n matrix, such as a covariance
    matrix, has n linearly independent eigenvectors
  • If we scale an eigenvector by some amount before the
    multiplication, we still get the same multiple of
    it as a result
  • For a symmetric matrix, these eigenvectors can be
    chosen as unit vectors (i.e., length is 1) that are
    mutually perpendicular

50
Eigenvalue
The amount by which an eigenvector is scaled by
multiplication with the square matrix is the same
for every multiple of that eigenvector; this amount is the eigenvalue.
[Figure: an eigenvector and a scaled copy of it, each multiplied by the matrix]
  • No matter what multiple of the eigenvector we
    take before the multiplication, we always get 4
    times the scaled vector as the result
  • 4 is the eigenvalue associated with this
    eigenvector

51
Principal Components Analysis (1)
  1. Prepare the matrix Data to hold the original data
    set, with a data item in each column and each row
    holding a separate dimension.

52
Principal Components Analysis (2)
  1. Compute the mean for each dimension of the data
    set (i.e., averaging each row in Data)

Compute means
53
Principal Components Analysis (3)
  • Compute the matrix DataAdjust by subtracting
    from each entry in Data the mean of the
    corresponding dimension
  • Note: This produces a data set whose mean in each
    dimension is zero

Adjusted entry
54
Principal Components Analysis (4)
  1. Compute the covariance matrix for the dimensions

55
Principal Components Analysis (5)
  • Calculate the unit eigenvectors and eigenvalues
    of the covariance matrix.
  • Note Most math packages give unit eigenvectors

56
Principal Components Analysis (6)
  • Sort the eigenvectors in descending order
    according to their eigenvalue.
  • The eigenvector with the largest eigenvalue is
    the principal component
  • The sorting order gives the components in order
    of significance

More significant
57
PCA Dimension Reduction
  • The eigenvectors corresponding to the principal
    components are orthogonal
  • We can map data from the original space into the
    new space defined by the orthogonal vectors
  • We can reduce the number of dimensions by
    dropping some of the less important components

58
PCA Deriving the New Data Set
  • Choose the components (eigenvectors) we want to
    keep and form a transformation matrix F, with an
    eigenvector in each row
  • Compute the new data set by applying the
    transformation F to the adjusted data matrix
    (coordinate transformation)
  • FinalData = F × DataAdjust (each column of
    DataAdjust is one mean-adjusted data item)
  • We have projected the data from the original
    coordinate system to a lower dimensional space
    defined by the chosen eigenvectors
  • The relative spatial distances among the original
    data items are mostly preserved in the lower
    dimensional space
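The steps above can be sketched in pure Python for the two-dimensional case, where the eigenvectors of the 2×2 covariance matrix have a closed form. Two assumptions are made that the slides do not fix: the covariance divides by n - 1, and only the first principal component is kept. The function name and the sample data are made up for illustration.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    adjusted = [(x - mean_x, y - mean_y) for x, y in points]   # DataAdjust

    # Covariance matrix [[sxx, sxy], [sxy, syy]] (assumption: divide by n - 1).
    sxx = sum(x * x for x, _ in adjusted) / (n - 1)
    syy = sum(y * y for _, y in adjusted) / (n - 1)
    sxy = sum(x * y for x, y in adjusted) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)

    # Unit eigenvector belonging to the largest eigenvalue.
    if sxy != 0:
        v = (eigenvalues[0] - syy, sxy)
    elif sxx >= syy:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    pc1 = (v[0] / norm, v[1] / norm)

    # One coordinate per data item in the reduced 1-D space.
    final_data = [x * pc1[0] + y * pc1[1] for x, y in adjusted]
    return eigenvalues, pc1, final_data


eigenvalues, pc1, final_data = pca_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
```

The sample points lie exactly on the line y = 2x, so the second eigenvalue is zero and dropping that component loses no information.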

59
PCA - Summary
  • Determine the eigenvectors of the covariance
    matrix
  • These eigenvectors define the new space with
    lower dimensionality

60
Data Clustering
  • Supervised Classification
  • Semi-supervised Classification
  • Unsupervised kMeans Clustering
  • Semi-Supervised kMeans Clustering
  • Constrained kMeans Clustering

61
Supervised Classification
Training data (labeled data)
62
Supervised Classification
Result of supervised learning
63
Supervised Classification
Item to be classified
64
Supervised Classification
Classified based on the dividing line
65
Supervised Classification
  • Support Vector Machine (SVM)
  • Artificial Neural Networks (ANN)

66
Support Vector Machine (1)
  • The dashed lines mark the distance between the
    dividing line and the closest vectors (points) to
    the line
  • The vectors that constrain the width of the
    margin are the support vectors
  • SVM analysis finds the dividing line that
    maximizes the margin

Large margin
Small margin
Support vectors
67
Support Vector Machine (2)
  • Points on a 2-dimensional plane can be separated
    by a 1-dimensional plane (i.e., a line)
  • Points in an d-dimensional plane can be separated
    by a (d-1)-dimensional hyperplane

68
Support Vector Machine (3)
  • Problem What if the points are separated by a
    nonlinear region

69
Semi-supervised Learning
  • Labeling a lot of data can be expensive
  • Solution Semi-supervised learning
  • Make use of unlabeled data in conjunction with a
    small amount of labeled data
  • Examples
  • Semi-supervised EM Ghahramani,NIPS94,
    Nigam,ML00
  • Transductive SVM Joachims,ICML99
  • Co-training Blum,COLT98

70
Co-training (1)
  • Many problems have two complementary views that
    can be used to label data
  • Example: Faculty home pages
  • "my advisor" pointing to a page is a good
    indicator that it is a faculty home page
  • "I am teaching" in a page is a good indicator
    that it is a faculty home page

71
Co-training (2)
  • Features can be split into two sets, i.e.,
    x = (x1, x2).
  • L: set of labeled examples; U: set of
    unlabeled examples
  • Repeat k times
  • Use L to train a classifier c1 that considers
    only x1
  • Use L to train a classifier c2 that
    considers only x2
  • Apply c1 to label p positive and n negative
    examples from U
  • Apply c2 to label p positive and n negative
    examples from U
  • Move these self-labeled examples from U to L
  • /* c1 adds labeled examples to L that c2
    will be able to use for learning, and vice versa */
  • Assumption: The class of each instance can be
    accurately predicted from each of the two feature
    subsets alone

72
Unsupervised Clustering kMeans
Randomly initialize k means
73
Unsupervised Clustering kMeans
Assign points to clusters
74
Unsupervised Clustering kMeans
Re-estimate means
75
Unsupervised Clustering kMeans
Re-assign points to clusters
76
Unsupervised Clustering kMeans
Re-estimate means
77
Unsupervised Clustering kMeans
Assign points to clusters
These are the clusters
No changes ⇒ converged
78
Unsupervised Clustering kMeans
  • Initialize k cluster centers
  • Repeat until convergence
  • Assign points to the cluster with the closest
    center
  • For each cluster, re-estimate the center as the
    mean of the points in that cluster

Initialization: choose well-separated centers
Property: Locally minimizes the sum of squared distances
between the data points and their corresponding
cluster centers ⇒ more compact clusters
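The loop above can be sketched as follows. The initial centers are passed in by the caller instead of being chosen at random, so the run is reproducible; that choice, the function name, and the sample points are assumptions for illustration.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain k-means: alternate assignment and re-estimation until stable."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:     # no changes: converged
            break
        assignment = new_assignment
        # Re-estimate each center as the mean of its points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                      # keep the old center if empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assignment = kmeans(points, [(0, 0), (10, 10)])
```

Because the result is only a local optimum, a poor choice of initial centers can give a different and less compact clustering.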
79
Semi-supervised Clustering
Idea: Use a small amount of labeled data to guide
(bias) the clustering of unlabeled data.
Example: Use the labeled data to initialize clusters in
the k-Means algorithm
Labeled data
80
Semi-supervised Clustering
There are three clusters
81
Constrained kMeans Clustering
  • Must-link constraints specify that the two points
    have to be in the same cluster
  • Cannot-link constraints specify that the two
    points must not be placed in the same cluster

Con=: Set of must-link constraints
D: Data set
Con≠: Set of cannot-link constraints
82
Constrained kMeans Clustering [Wagstaff, ICML01]
  1. Let C1 … Ck be the initial cluster centers
  2. For each point di in D, assign it to the closest
    cluster Cj such that VIOLATION(di, Cj, Con=,
    Con≠) is false. If no such cluster exists, fail
    (return {})
  3. For each cluster Ci, update its center by
    averaging all of the points di that have been
    assigned to it
  4. Repeat (2) and (3) until convergence
  5. Return C1 … Ck

Can we assign d to C without violating any
constraint?
  • VIOLATION(data point d, cluster C, must-link
    constraints Con=, cannot-link constraints Con≠)
  • For each (d, d=) ∈ Con=: if d= is in some other
    cluster C′ ≠ C, return true
  • For each (d, d≠) ∈ Con≠: if d≠ is in C, return
    true
  • Otherwise, return false
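A compact sketch of the constrained assignment step and the surrounding loop follows. Points are referred to by their index, constraints are pairs of indices, and a must-link partner counts as a violation only once it has been assigned to a different cluster; these representation choices and all names are assumptions, not taken from the slides.

```python
import math


def partner(pair, i):
    """Return the other point of a constraint pair, or None if i is not in it."""
    a, b = pair
    return b if a == i else a if b == i else None


def violation(i, cluster, assignment, must_link, cannot_link):
    for pair in must_link:
        other = partner(pair, i)
        if other is not None and assignment.get(other, cluster) != cluster:
            return True              # must-link partner sits in another cluster
    for pair in cannot_link:
        other = partner(pair, i)
        if other is not None and assignment.get(other) == cluster:
            return True              # cannot-link partner is already in this cluster
    return False


def cop_kmeans(points, centers, must_link, cannot_link, max_iter=100):
    centers = [tuple(c) for c in centers]
    previous = None
    for _ in range(max_iter):
        assignment = {}
        for i, p in enumerate(points):
            order = sorted(range(len(centers)),
                           key=lambda j: math.dist(p, centers[j]))
            legal = [j for j in order
                     if not violation(i, j, assignment, must_link, cannot_link)]
            if not legal:
                return None          # fail: no cluster satisfies the constraints
            assignment[i] = legal[0]
        if assignment == previous:
            break
        previous = assignment
        for j in range(len(centers)):
            members = [points[i] for i, a in assignment.items() if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


result = cop_kmeans([(0, 0), (1, 0), (10, 0), (11, 0)],
                    [(0, 0), (10, 0)], must_link=[], cannot_link=[(0, 1)])
infeasible = cop_kmeans([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)],
                        must_link=[], cannot_link=[(0, 1), (0, 2), (1, 2)])
```

In the first call the cannot-link pair forces point 1 away from its nearest center; in the second call three mutually exclusive points cannot fit into two clusters, so the procedure fails.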

83
Content-Based Image Indexing
  • Keyword Approach
  • Problem: There is no commonly agreed-upon
    vocabulary for describing image properties.
  • Computer Vision Techniques
  • Problem: General image understanding and object
    recognition are beyond the capability of current
    computer vision technology.
  • Image Analysis Techniques
  • It is relatively easy to capture the primitive
    image properties such as
  • prominent regions,
  • their colors and shapes,
  • and related layout and location information
    within images.
  • These features can be used to index image data.
  • Challenge: Semantic gap!

84
Feature Acquisition: Image Segmentation
  • Group adjacent pixels with similar color
    properties into one region, and
  • segment the pixels with distinct color properties
    into different regions.

[Figure: original image, segmented image, and contour image]

85
Image Indexing by Contents
  • By applying image segmentation techniques, a set
    of regions is detected along with their
    locations, sizes, colors, and shapes.
  • These features can be used to index image data.

86
Color
  • We can divide the color space into a small number
    of zones, each of which is clearly distinct from
    the others to the human eye.
  • Each of the zones is assigned a sequence number
    beginning from zero.

Note: Human eyes are not very sensitive to
colors. In fact, users only have a vague idea
about the colors they want to specify.
87
Shape
  • The shape feature can be measured by two properties:
    circularity and major axis orientation.
  • Circularity
  • Major Axis Orientation


The more circular the shape, the closer its
circularity is to one
88
Location
  • Image is divided into sub-areas.
  • Each sub-area is labeled with a number.
  • Region location is represented by ID of the
    sub-area in which the gravity center of the
    region is contained.
  • Note: When a user queries the database by visual
    contents, approximate feature values are used.
  • It is meaningless to use absolute feature values
    as indices.

[Figure: image divided into nine sub-areas numbered 0 to 8, containing regions A and B]
  • Location of A is 4
  • Location of B is 1
89
Size
  • The size range is divided into groups.
  • A region's size is represented by the
    corresponding group number.
  • Example (cell area is Asub):

ASSIGNED SIZE   ACTUAL SIZE S
1               ¼ Asub ≤ S < ½ Asub
2               ½ Asub ≤ S < Asub
3               Asub ≤ S < 2 Asub
4               2 Asub ≤ S < 3 Asub
5               3 Asub ≤ S < 4 Asub
6               4 Asub ≤ S < 5 Asub
7               5 Asub ≤ S < 6 Asub
8               6 Asub ≤ S < 7 Asub
9               7 Asub ≤ S < 8 Asub
10              8 Asub ≤ S < 9 Asub
Note: Only regions larger than one-fourth of the
sub-area are registered.
90
Texture Areas
  • Texture areas and images with dominant high
    frequency components are beyond the capacity of
    image segmentation techniques.
  • Matching on the distribution of colors (i.e.,
    color histograms) is a simple yet effective means
    for these areas.
  • Strategy: Divide an image into sub-areas and
    create a histogram for each of the sub-areas.
  • Potential issue addressed: The partitioning of the
    image captures locality information. We
    don't want to match an image with a red balloon
    at the top with an image with a red car at the bottom.

91
Histograms
  • Gray-Level Histogram: It is a plot of the number
    of pixels that assume each discrete value that
    the quantized image intensity can take.
  • Color Histogram: It holds information on color
    distribution. It is a plot of the statistics of
    the R, G, B components in the 3-D color space.

92
Histograms (cont.)
Most histogram bins are sparsely populated, with
only a small number of bins capturing the
majority of pixel counts.
  • We can use the largest, say 20, bins as the
    representative bins of the histogram.
  • these 20 bins form a chain in the 3-D color
    space.

93
Histograms (cont.)
  • If we can represent such chains using a single
    number, then we can index the color images using
    a B-tree.
  • Connecting order: Representative bins are sorted
    in ascending order by their distance from the
    origin of the color space.
  • Weighted Perimeter
  • Weighted Angle
  • Format of index key
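Selecting the representative bins and putting them into connecting order can be sketched as below. The quantization into equal-width levels per RGB channel, the function name, and the sample pixels are assumptions; the weighted perimeter, weighted angle, and index key format are not covered because their definitions are not given in the text.

```python
import math
from collections import Counter


def representative_bins(pixels, levels=4, top=20):
    """Largest `top` color bins, sorted by distance from the color-space origin."""
    step = 256 / levels

    def bin_of(value):
        return min(int(value / step), levels - 1)

    counts = Counter((bin_of(r), bin_of(g), bin_of(b)) for r, g, b in pixels)
    largest = [bin_ for bin_, _ in counts.most_common(top)]
    # Connecting order: ascending distance from the origin (0, 0, 0).
    return sorted(largest, key=lambda c: math.sqrt(c[0] ** 2 + c[1] ** 2 + c[2] ** 2))


pixels = [(255, 0, 0)] * 5 + [(0, 0, 0)] * 3 + [(250, 250, 250)]
chain = representative_bins(pixels, levels=4, top=2)
```

Only the two most populated bins survive in this call, and the black bin comes first because it lies at the origin.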

94
Video Content Extraction
  • Other forms of information extraction can be
    employed
  • Closed-captioned text
  • Speech recognition
  • Descriptive information from the screenplay
  • Key frames that characterize a shot
  • This content information can be associated with
    the video story units.

95
Story Units
  • Shot: Frames recorded in one camera operation
    form a shot.
  • Scene: One or several related shots are combined
    in a scene.
  • Sequence: A series of related scenes forms a
    sequence.
  • Video: A video is composed of different story
    units such as shots, scenes, and sequences
    arranged according to some logical structure
    (defined by the screenplay).
  • These concepts can be used to organize video data

96
Video Modeling Approaches
  • Physical Feature based Modeling
  • Semantic Content based Modeling

97
Physical Feature based Modeling
  • A Video is represented and indexed based on
    audio-visual features
  • Features can be extracted automatically
  • Queries are formulated in terms of color,
    texture, audio, or motion information
  • Very limited in expressing queries close to human
    thinking
  • It would be difficult to ask for a video clip
    showing the sinking of the ship in the movie
    Titanic using only color description

98
Semantic Content based Modeling
  • Video semantics are captured and organized to
    support video retrieval
  • Difficult to automate
  • Partially rely on manual annotation
  • Capable of supporting natural language like
    queries.

99
Semantic-Level Models
  • Segmentation-based Models
  • Stratification-based Models
  • Temporal Coherent Model

100
Segmentation-based Modeling
  • A video stream is segmented into temporally
    continuous segments
  • Each segment is associated with a description
    which could be natural text, keywords, or other
    kinds of annotation.
  • Disadvantages
  • Lack of flexibility
  • Limited in representing semantics

101
Stratification-based
  • We partition the contextual information into
    single events.
  • Each event is associated with a video segment
    called a stratum.
  • Strata can overlap or encompass each other.

102
Temporal Coherent
  • Each event is associated with a set of video
    segments where it happens.
  • More flexible in structuring video semantics.

103
Stratum
The concept of stratification can be used to
assign descriptions to video footage.
  • Each stratum refers to a sequence of video frames.
  • The strata may overlap or totally encompass each
    other.
Advantage: Allows easy retrieval by keyword,
e.g., using an inverted index
104
Inverted Index
Term         Documents
ANSI         D1, D2
C            D3, D4, D5
DB2          D1, D6, D21
GUI          D2, D8, D11
SYBASE       D2, D6, D17
MULTIMEDIA   D5, D15
ORACLE       D3, D11, D19
RELATIONAL   D2, D3, D11, D19
SQL          D2, D11, D20
JAVA         D3, D20
Each document entry contains
  • frequency of the term in the document,
  • locations of the term in the document,
  • other information pertaining to the relationship
    of the term and the document.
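A minimal sketch of building and querying such an index is shown below. It stores only document identifiers per term, not frequencies or locations, and the four documents are a subset chosen to be consistent with the table; the function names are illustrative.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term.upper()].add(doc_id)
    return index


def search_all(index, *terms):
    """Documents that contain every one of the given terms."""
    postings = [index.get(term.upper(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


documents = {
    "D2": ["ANSI", "GUI", "SYBASE", "RELATIONAL", "SQL"],
    "D3": ["C", "ORACLE", "RELATIONAL", "JAVA"],
    "D11": ["GUI", "ORACLE", "RELATIONAL", "SQL"],
    "D20": ["SQL", "JAVA"],
}
index = build_inverted_index(documents)
hits = search_all(index, "SQL", "RELATIONAL")
```

For video retrieval the same structure applies with strata in place of documents, so a keyword lookup returns the strata annotated with that keyword.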

105
Video Algebra
  • Goal: To provide a high-level abstraction that
  • models complex information associated with
    digital video data, and
  • supports content-based access
  • Strategy
  • The algebraic video data model consists of
    hierarchical compositions of video expressions
    with high-level semantic descriptions
  • The video expressions are constructed using video
    algebra operations

106
Presentation
  • The fundamental entity is a presentation
  • A presentation is described by a video
    expression.
  • A video expression describes a multi-window,
    spatial, temporal, and content combination of
    video segments.

A presentation
107
Presentation
  • An algebraic video node provides a means of
    abstraction by which video expressions can be
    named, stored, and manipulated as units

Primitive video expression: creates a
single-window presentation from a raw video
segment
Compound video expression: constructed from
simpler expressions using video algebra operations
[Figure: an algebraic video node containing a hierarchy of video expressions built on top of raw video]
108
Video Algebra Operations
  • The video algebra operations fall into four
    categories:
  • Creation: defines the construction of video
    expressions from raw video.
  • Description: associates content attributes with a
    video expression.
  • Composition: defines temporal relationships
    between component video expressions.
  • Output: defines spatial layout and audio output
    for component video expressions.

109
Descriptions
  • description E1 content: specifies that E1 is
    described by content.
  • A content is a Boolean combination of attributes,
    each of which consists of a field name and a value.
  • Some field names have predefined semantics (e.g.,
    title), while other fields are user-definable.
  • Values can assume a variety of types, including
    strings and video node names.
  • Field names and values do not have to be unique
    within a description.
  • hide-content E1: defines a presentation that
    hides the content of E1 (i.e., E1 does not
    contain any description).
  • This operation provides a method for creating
    abstraction barriers for content-based access.

Example: title CNN Headline News
110
Composition
The composition operations can be combined to
produce complex scheduling definitions and
constraints.
create builds a video presentation from a raw video
segment:
C1 = create Cnn.HeadlineNews.rv 10 30
C2 = create Cnn.HeadlineNews.rv 20 40
C3 = create Cnn.HeadlineNews.rv 32 65
D1 = (description C1 Anchor speaking)
D2 = (description C2 Professor Smith)
D3 = (description C3 Economic reform)
D3 follows D2, which follows D1, and common
footage is not repeated. (It creates a
non-redundant video stream from three overlapping
segments.)
[Figure: timeline of the overlapping segments C1 (Anchor speaking), C2 (Professor Smith), and C3 (Economic reform)]
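
The non-redundant stream described above can be sketched as follows. This is a simplified illustration, assuming all segments come from the same raw video and are listed in increasing start order; the function name union_play_list is hypothetical and is not part of the video algebra.

```python
def union_play_list(segments):
    """Play segments in order, skipping footage that was already played.

    Each segment is a (start, end) pair on the same raw video. Assumes the
    segments are given in increasing start order, as in the slide's example.
    """
    played_until = None
    play_list = []
    for start, end in segments:
        if played_until is not None and start < played_until:
            start = played_until  # trim the part shared with earlier footage
        if start < end:
            play_list.append((start, end))
            played_until = end
    return play_list


# C1, C2, C3 from the slide: 10-30, 20-40, 32-65 of Cnn.HeadlineNews.rv
parts = union_play_list([(10, 30), (20, 40), (32, 65)])
total = sum(end - start for start, end in parts)
print(parts, total)
```

With these three segments the play list covers the raw video from 10 to 65 exactly once, so the total is shorter than the sum of the three segment lengths.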
111
Composition Operators (1)
Operator                  Description
E1 ∘ E2                   defines the presentation where E2 follows E1
E1 ∪ E2                   defines the presentation where E2 follows E1 and common footage is not repeated
E1 ∩ E2                   defines the presentation where only common footage of E1 and E2 is played
E1 - E2                   defines the presentation where only footage of E1 that is not in E2 is played
E1 ‖ E2                   E1 and E2 are played concurrently and terminate simultaneously
(test) ⇒ E1 E2 ... En     Ei is played if test evaluates to i
loop E1 time              defines a repetition of E1 for a duration of time
112
Composition Operators (2)
Operator                  Description
stretch E1 factor         sets the duration of the presentation equal to factor times the duration of E1 by changing the playback speed of the video segment
limit E1 time             sets the duration of the presentation equal to the minimum of time and the duration of E1, but the playback speed is not changed
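
The effect of these two operators on duration can be sketched as below. This is only an illustration of the arithmetic in the table; the function names and the (duration, speed) return convention are assumptions, and the playback speed for stretch is derived from the statement that the duration changes by changing the speed.

```python
def stretch(duration, factor):
    """Return (new_duration, playback_speed) for 'stretch E1 factor'.

    The duration becomes factor times the original, which implies the
    segment is played at 1/factor of its normal speed.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return duration * factor, 1.0 / factor


def limit(duration, time):
    """Return (new_duration, playback_speed) for 'limit E1 time'.

    The duration is cut to at most 'time'; the speed stays unchanged.
    """
    return min(duration, time), 1.0


print(stretch(20, 2))   # twice as long, half speed
print(limit(20, 15))    # truncated to 15, normal speed
print(limit(20, 30))    # already shorter than the limit
```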
113
Composition Operators (3)
  • transition E1 E2 type time: defines a type
    transition effect between E1 and E2; time defines
    the duration of the transition effect.
  • The transition type is one of a set of
    transition effects, such as dissolve, fade, and
    wipe.
  • contains E1 query: defines the presentation that
    contains the component expressions of E1 that match
    query (similar to the FROM clause in SQL).
  • A query is a Boolean combination of attributes.
  • Example: text smith AND text question

114
Output Characteristics
  • Video expressions include output characteristics
    that specify the screen layout and audio output
    for playing back children streams.

A presentation
115
Output Characteristics
  • Video expressions include output characteristics
    that specify the screen layout and audio output
    for playing back children streams.
  • window E1 (X1, Y1) - (X2, Y2) priority
  • specifies that E1 will be displayed with
    priority in the window defined by the top-left
    corner (X1, Y1) and the bottom-right corner (X2,
    Y2), such that Xi ∈ [0, 1] and Yi ∈ [0, 1].
  • Window priorities are used to resolve overlap
    conflicts of screen display.
  • audio E1 channel force priority
  • specifies that the audio of E1 will be output to
    channel with priority; if force is true, then the
    audio operation overrides any channel
    specifications of the component video expressions.

116
Output Characteristics example
  • C1 = create MagicvsBulls.rv 300 500
  • P1 = window C1 (0, 0) - (0.5, 0.5) 10
  • P2 = window C1 (0, 0.5) - (0.5, 1) 20
  • P3 = window C1 (0.5, 0.5) - (1, 1) 30
  • P4 = window C1 (0.5, 0) - (1, 0.5) 40
  • P5 = (P1 ‖ P2 ‖ P4)
  • P6 = (P1 ‖ P2 ‖ P3 ‖ P4)
  • (P5 ‖ (window (P5 ‖ (window P6 (0.5, 0.5) - (1, 1) 60))
    (0.5, 0.5) - (1, 1) 50))

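Since expressions can be nested, the spatial
layout of any particular video expression is
defined relative to the parent rectangle.
In each window expression, the first coordinate
pair is the top-left corner, the second is the
bottom-right corner, and the final number is the
priority; a larger number means higher priority.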
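
A minimal sketch of how nested windows could be resolved, assuming rectangles are ((x1, y1), (x2, y2)) pairs in unit coordinates of their parent. The helper names to_absolute and visible_window are hypothetical, and the overlapping windows A and B are invented for illustration.

```python
def to_absolute(parent, child):
    """Map a child rectangle, given relative to parent, to screen coordinates."""
    (px1, py1), (px2, py2) = parent
    (cx1, cy1), (cx2, cy2) = child
    width, height = px2 - px1, py2 - py1
    return ((px1 + cx1 * width, py1 + cy1 * height),
            (px1 + cx2 * width, py1 + cy2 * height))


def visible_window(windows, x, y):
    """Among windows covering (x, y), return the name with the largest priority."""
    covering = [(priority, name)
                for name, ((x1, y1), (x2, y2)), priority in windows
                if x1 <= x <= x2 and y1 <= y <= y2]
    return max(covering)[1] if covering else None


screen = ((0, 0), (1, 1))
quadrant = ((0.5, 0.5), (1, 1))

# A window placed in the bottom-right quadrant of its parent, twice nested.
outer = to_absolute(screen, quadrant)
inner = to_absolute(outer, quadrant)
print(outer, inner)

# Two hypothetical overlapping windows: the larger priority wins.
windows = [("A", ((0, 0), (0.6, 0.6)), 10), ("B", ((0.4, 0.4), (1, 1)), 20)]
print(visible_window(windows, 0.5, 0.5))
```

Each level of nesting shrinks the rectangle relative to its parent, which is why the same coordinates (0.5, 0.5) - (1, 1) yield a smaller area at each depth.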
117
Scope of a video node description
  • The scope of a given algebraic video node
    description is the subgraph that originates from
    the node.
  • The components of a video expression inherit
    descriptions by context.
  • i.e., content attributes associated with a
    parent video node are also associated with all
    of its descendant nodes.

118
Content Based Access
  • search query: searches a collection of video
    nodes for video expressions that match query.
  • Example: search text smith AND text
    question
  • Strategy: matching a query to the attributes of
    an expression must take into account all of the
    attributes of that expression, including the
    attributes of its encompassing expressions.

[Figure: node graph over raw video with nodes labeled Smith on economic reform, Smith, Anchor, Question from audience, and Question. One node is marked as the result of the query; another node also satisfies the query but is not returned because it is a descendant of a node already in the result set.]
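
The matching strategy above can be sketched as a top-down search in which each node inherits the attributes of its ancestors and the search stops descending once a node matches. This sketch assumes a tree (one parent per node) and a query that is a plain conjunction of terms; the VideoNode class, the search function, and the node contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VideoNode:
    name: str
    attributes: set
    children: list = field(default_factory=list)


def search(node, terms, inherited=frozenset()):
    """Return names of the topmost nodes whose own plus inherited attributes
    contain every query term."""
    effective = inherited | node.attributes
    if terms <= effective:
        # Descendants also match by inheritance, so they are not returned.
        return [node.name]
    results = []
    for child in node.children:
        results.extend(search(child, terms, effective))
    return results


# Hypothetical node tree loosely based on the labels in the slide's figure.
follow_up = VideoNode("Follow-up", {"followup"})
question = VideoNode("Question from audience", {"question"}, [follow_up])
speaking = VideoNode("Smith speaking", {"reform"})
smith = VideoNode("Smith on economic reform", {"smith"}, [speaking, question])
anchor = VideoNode("Anchor", {"anchor"})
root = VideoNode("News", {"cnn"}, [anchor, smith])

matches = search(root, {"smith", "question"})
print(matches)
```

In this toy tree the node "Question from audience" matches only because it inherits smith from its parent, and its child "Follow-up" is left out even though it inherits both terms.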
119
Browsing and Navigation
  • Playback video-expression
  • Plays the video expression. It enables the user
    to view the presentation defined by the
    expression.
  • Display video-expression
  • Displays the video expression. It allows the
    user to inspect the video expression.
  • Get-parent video-expression
  • Returns the set of nodes that directly point to
    video-expression.
  • Get-children video-expression
  • Returns the set of nodes video-expression
    directly points at.

120
Algebraic Video System Prototype
  • The implementation is built on top of three
    existing subsystems:
  • The VuSystem is used for managing raw video data
    and for its support of Tcl (Tool command
    language) programming. It provides an environment
    for recording, processing, and playing video.
  • The Semantic File System is used as a storage
    subsystem with content-based access to data for
    indexing and retrieving files that represent
    algebraic video nodes.
  • The Web server provides a graphical interface to
    the system that includes facilities for querying,
    navigating, video editing and composing, and
    invoking the video player.