Multimedia DBs - PowerPoint PPT Presentation

About This Presentation
Title:

Multimedia DBs

Description:

Agrawal, Faloutsos, Swami 1993. Chan & Fu 1999. eigenwave 0. eigenwave 1. eigenwave 2 ... what is an image? A: 2-d array. Images - color. Color histograms, and ... – PowerPoint PPT presentation

Provided by: gkol
Learn more at: https://www.cs.bu.edu

Transcript and Presenter's Notes



1
Multimedia DBs
2
Time Series Data
A time series is a collection of observations
made sequentially in time.
25.1750 25.1750 25.2250 25.2500
25.2500 25.2750 25.3250 25.3500
25.3500 25.4000 25.4000 25.3250
25.2250 25.2000 25.1750 .. ..
24.6250 24.6750 24.6750 24.6250
24.6250 24.6250 24.6750 24.7500
[Figure: the series plotted on a value axis against a time axis]
3
PAA and APCA
  • Feature extraction for GEMINI:
  • Fourier
  • Wavelets
  • Another approach: segment the time series into
    equal parts and store the average value of each
    part.
  • Use an index to store the averages and the
    segment end points.
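A minimal Python sketch of the equal-segment averaging described above (this is PAA). It assumes, for simplicity, that the series length is a multiple of the number of segments; the function name `paa` is hypothetical.

```python
def paa(series, num_segments):
    """Split `series` into `num_segments` equal parts and return each part's mean.

    Simplifying assumption: len(series) is a multiple of num_segments.
    """
    n = len(series)
    if num_segments <= 0 or n % num_segments != 0:
        raise ValueError("len(series) must be a positive multiple of num_segments")
    width = n // num_segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(num_segments)]


# First eight observations of the sample series, reduced to four averages.
sample = [25.1750, 25.1750, 25.2250, 25.2500, 25.2500, 25.2750, 25.3250, 25.3500]
print(paa(sample, 4))
```

With equal parts the segment end points are implicit, so only the averages would need to go into the index.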

4
Feature Spaces
Korn, Jagadish, Faloutsos 1997
Chan & Fu 1999
Agrawal, Faloutsos, Swami 1993
5
Piecewise Aggregate Approximation (PAA)
Original time series (n-dimensional vector):
$S = \{s_1, s_2, \ldots, s_n\}$
N-segment PAA representation (N-dimensional vector):
$\bar{S} = \{sv_1, sv_2, \ldots, sv_N\}$
The PAA representation satisfies the lower-bounding
lemma (Keogh, Chakrabarti, Mehrotra and Pazzani,
2000; Yi and Faloutsos, 2000)
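To illustrate the lower-bounding lemma, the sketch below uses the usual PAA distance $\sqrt{n/N}\,\sqrt{\sum_i (qv_i - sv_i)^2}$, where $qv_i$ and $sv_i$ are the segment means of Q and S. That formula is not written on the slide; it is an assumption taken from the standard PAA definition, and the helper names are hypothetical. The random check demonstrates the inequality but does not prove it.

```python
import math
import random


def paa(series, num_segments):
    """Segment means; assumes len(series) is a multiple of num_segments."""
    width = len(series) // num_segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(num_segments)]


def euclidean(q, s):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def paa_lower_bound(q, s, num_segments):
    """sqrt(n/N) times the Euclidean distance between the two PAA vectors."""
    return math.sqrt(len(q) / num_segments) * euclidean(paa(q, num_segments),
                                                        paa(s, num_segments))


# Empirical check on random series of length n = 64 with N = 8 segments.
rng = random.Random(0)
for _ in range(200):
    q = [rng.gauss(0, 1) for _ in range(64)]
    s = [rng.gauss(0, 1) for _ in range(64)]
    assert paa_lower_bound(q, s, 8) <= euclidean(q, s) + 1e-9
print("lower bound held on all random pairs")
```

The factor $\sqrt{n/N}$ appears because each mean stands in for n/N points: by the Cauchy-Schwarz inequality, n/N times the squared difference of two segment means never exceeds the sum of the squared point differences in that segment.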
6
Can we improve upon PAA?
N-segment PAA representation (N-dimensional vector):
$\bar{S} = \{sv_1, sv_2, \ldots, sv_N\}$
7
APCA approximates original signal better than PAA
Improvement factors: 3.77, 1.69, 1.21, 1.03, 3.02, 1.75
8
APCA Representation can be computed efficiently
  • A near-optimal representation can be computed in
    $O(n \log n)$ time
  • The optimal representation can be computed in $O(n^2 M)$
    time (Koudas et al.)
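The slide cites an $O(n \log n)$ near-optimal method and an $O(n^2 M)$ optimal one without giving either. As a stand-in, here is a simple greedy bottom-up merge. It is a swapped-in technique, not either of the cited algorithms, and it is quadratic as written: start with one segment per point and repeatedly merge the adjacent pair whose merge adds the least squared error, until M segments remain. The output uses the APCA form $(sv_i, sr_i)$, segment mean and right endpoint, with $sr_0 = 0$ implied; the function name is hypothetical.

```python
def apca_bottom_up(series, m):
    """Greedy bottom-up APCA sketch: returns [(sv_1, sr_1), ..., (sv_M, sr_M)].

    sv_i is the segment mean and sr_i the (1-based, inclusive) right endpoint,
    so segment i covers series[sr_{i-1}:sr_i] with sr_0 = 0.
    """
    if not 1 <= m <= len(series):
        raise ValueError("need 1 <= m <= len(series)")
    # Each segment: [start, end) plus its sum and sum of squares.
    segs = [[i, i + 1, x, x * x] for i, x in enumerate(series)]

    def sse(total, total_sq, length):
        # Sum of squared deviations from the segment mean.
        return total_sq - total * total / length

    def merge_cost(a, b):
        # Extra squared error caused by replacing a and b with one segment.
        merged = sse(a[2] + b[2], a[3] + b[3], b[1] - a[0])
        return merged - sse(a[2], a[3], a[1] - a[0]) - sse(b[2], b[3], b[1] - b[0])

    while len(segs) > m:
        k = min(range(len(segs) - 1), key=lambda i: merge_cost(segs[i], segs[i + 1]))
        a, b = segs[k], segs[k + 1]
        segs[k:k + 2] = [[a[0], b[1], a[2] + b[2], a[3] + b[3]]]
    return [(total / (end - start), end) for start, end, total, _ in segs]


print(apca_bottom_up([1, 1, 1, 5, 5, 5, 5, 2], 3))  # [(1.0, 3), (5.0, 7), (2.0, 8)]
```

Unlike PAA, the segments end up with different lengths: long flat stretches get one segment each, and the budget of M segments goes where the signal changes.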

9
Distance Measure
Lower-bounding distance $D_{LB}(Q,S)$
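The slide names $D_{LB}(Q,S)$ without giving its formula. The sketch below assumes the definition used in the APCA work of Keogh et al.: project Q onto S's segments by taking the mean $qv_i$ of Q over each segment, then $D_{LB}(Q,S) = \sqrt{\sum_{i=1}^{M} (sr_i - sr_{i-1})(qv_i - sv_i)^2}$ with $sr_0 = 0$. The function name is hypothetical.

```python
import math


def apca_lower_bound(query, apca):
    """Sketch of D_LB(Q, S).

    query: the raw query series Q (length n).
    apca:  S as [(sv_1, sr_1), ..., (sv_M, sr_M)], where segment i covers
           positions sr_{i-1}+1 .. sr_i (1-based), sr_0 = 0 and sr_M = n.
    """
    total, prev = 0.0, 0
    for sv, sr in apca:
        width = sr - prev
        qv = sum(query[prev:sr]) / width  # mean of Q over S's ith segment
        total += width * (qv - sv) ** 2
        prev = sr
    return math.sqrt(total)


# S = [1, 1, 1, 5, 5, 5, 5, 2] is exactly piecewise constant, so the bound is tight here.
s_apca = [(1.0, 3), (5.0, 7), (2.0, 8)]
print(apca_lower_bound([0] * 8, s_apca), math.sqrt(107))
```

When S is not piecewise constant, the same Cauchy-Schwarz argument as for PAA keeps $D_{LB}$ at or below the true Euclidean distance.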
10
Index on 2M-dimensional APCA space
Any feature-based index structure can be used (e.g.,
R-tree, X-tree, Hybrid Tree)
11
k-Nearest Neighbor Algorithm
For any node U of the index structure with MBR R,
$\mathrm{MINDIST}(Q,R) \le D(Q,S)$ for any data item S under U
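The MINDIST property is what lets a best-first k-NN search stop early. Nodes and items share one priority queue, keyed by MINDIST for nodes and by true distance for items, so an item popped from the queue cannot be beaten by anything still queued. The sketch below is the generic best-first search over a hypothetical in-memory tree of axis-aligned boxes in ordinary Euclidean space. It uses plain box MINDIST, not the APCA-specific MINDIST of the following slides; `Node`, `mindist` and `knn` are illustrative names.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node: MBR [low, high] plus child nodes and/or data points."""
    low: tuple
    high: tuple
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)


def mindist(q, low, high):
    """Euclidean distance from q to the closest point of the box [low, high]."""
    total = 0.0
    for x, lo, hi in zip(q, low, high):
        gap = lo - x if x < lo else x - hi if x > hi else 0.0
        total += gap * gap
    return math.sqrt(total)


def knn(root, q, k):
    """Best-first k-NN: returns up to k (point, distance) pairs, nearest first."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes or points
    heap = [(mindist(q, root.low, root.high), next(tie), root)]
    result = []
    while heap and len(result) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.low, child.high), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            # Every queued key is >= dist, and MINDIST never exceeds the distance
            # of anything inside a box, so no unseen point can be closer.
            result.append((item, dist))
    return result


leaf_a = Node((0, 0), (2, 2), points=[(0, 0), (1, 2), (2, 1)])
leaf_b = Node((5, 5), (7, 7), points=[(5, 5), (7, 6)])
root = Node((0, 0), (7, 7), children=[leaf_a, leaf_b])
print(knn(root, (4, 3), 2))
```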
12
Index Modification for MINDIST Computation
APCA point $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$
[Figure: index MBRs R1 to R4 enclosing APCA points S1 to S9; labels sv1 to sv4 and sr1 to sr4]
APCA rectangle $S = (L, H)$, where $L = \{smin_1, sr_1, smin_2, sr_2, \ldots, smin_M, sr_M\}$
and $H = \{smax_1, sr_1, smax_2, sr_2, \ldots, smax_M, sr_M\}$
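A sketch of how the APCA rectangle of a single series could be built, assuming $smin_i$ and $smax_i$ are the minimum and maximum raw values of S inside its ith segment (the slide uses these names without defining them). The APCA point always lies inside this rectangle, since a segment mean lies between the segment's minimum and maximum. The function name is hypothetical.

```python
def apca_rectangle(series, endpoints):
    """Return (L, H) as flat lists.

    L = [smin_1, sr_1, ..., smin_M, sr_M] and H = [smax_1, sr_1, ..., smax_M, sr_M].
    endpoints: APCA right endpoints sr_1 < ... < sr_M = len(series); sr_0 = 0 implied.
    """
    low, high, prev = [], [], 0
    for sr in endpoints:
        segment = series[prev:sr]
        low += [min(segment), sr]
        high += [max(segment), sr]
        prev = sr
    return low, high


print(apca_rectangle([1, 3, 2, 5, 4], [2, 5]))  # ([1, 2, 2, 5], [3, 2, 5, 5])
```

An index node's MBR would then be the element-wise minimum of the L vectors and the element-wise maximum of the H vectors of the items beneath it.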
13
MBR Representation in time-value space
We can view the MBR R = (L, H) of any node U as two
APCA representations, $L = \{l_1, l_2, \ldots, l_{N-1}, l_N\}$
and $H = \{h_1, h_2, \ldots, h_{N-1}, h_N\}$, where N = 2M
14
Regions
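M regions are associated with each MBR; the ith region spans
values from $l_{2i-1}$ to $h_{2i-1}$ and time instants from $l_{2i-2}+1$ to $h_{2i}$ (taking $l_0 = 0$)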
15
Regions
  • The ith region is active at time instant t if it
    spans across t
  • The value $s_t$ of any time series S under node U at
    time instant t must lie in one of the regions
    active at t (Lemma 2)

[Figure: Regions 1, 2 and 3 in time-value space (value axis vs. time axis), bounded by l1 to l6 and h1 to h6]
16
MINDIST Computation
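For time instant t, $\mathrm{MINDIST}(Q, R, t) = \min_{\text{region } G \text{ active at } t} \mathrm{MINDIST}(Q, G, t)$
$\mathrm{MINDIST}(Q, R, t_1) = \min(\mathrm{MINDIST}(Q, \text{Region 1}, t_1), \mathrm{MINDIST}(Q, \text{Region 2}, t_1)) = \min((q_{t_1} - h_1)^2, (q_{t_1} - h_3)^2) = (q_{t_1} - h_1)^2$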
[Figure: the same Regions 1, 2 and 3, bounded by l1 to l6 and h1 to h6]
Lemma 3: $\mathrm{MINDIST}(Q,R) \le D(Q,C)$ for any time series
C under node U
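A sketch of the region-based MINDIST. It assumes (a) region i spans values $[l_{2i-1}, h_{2i-1}]$ and time instants $l_{2i-2}+1$ to $h_{2i}$ with $l_0 = 0$, which follows from the layout of L and H on the earlier slides; (b) MINDIST(Q, G, t) is the squared gap between $q_t$ and the region's value range, and 0 inside it; and (c) $\mathrm{MINDIST}(Q,R) = \sqrt{\sum_t \mathrm{MINDIST}(Q,R,t)}$. Time instants are 1-based, and the query is assumed to have the same length n as the indexed series. The function names and the example node are hypothetical.

```python
import math


def regions(low, high):
    """Regions of MBR R = (L, H); L, H are flat lists of length 2M (l_j = low[j-1]).

    Region i: values in [l_{2i-1}, h_{2i-1}], times in [l_{2i-2} + 1, h_{2i}], l_0 = 0.
    Returned as (vmin, t_start, vmax, t_end) tuples.
    """
    out = []
    for i in range(1, len(low) // 2 + 1):
        prev_end = low[2 * i - 3] if i > 1 else 0  # l_{2i-2}
        out.append((low[2 * i - 2], prev_end + 1, high[2 * i - 2], high[2 * i - 1]))
    return out


def apca_mindist(query, low, high):
    """sqrt of the sum over t of the min, over regions active at t, of the squared gap."""
    regs = regions(low, high)
    total = 0.0
    for t, q in enumerate(query, start=1):
        best = math.inf
        for vmin, t_start, vmax, t_end in regs:
            if t_start <= t <= t_end:  # region active at t
                gap = vmin - q if q < vmin else q - vmax if q > vmax else 0.0
                best = min(best, gap * gap)
        total += best
    return math.sqrt(total)


# Hypothetical node holding [1, 3, 2, 5, 4] (APCA endpoints 2, 5) and
# [0, 0, 6, 6, 6] (endpoints 1, 5); L and H combine their APCA rectangles.
L, H = [0, 1, 0, 5], [3, 2, 6, 5]
print(regions(L, H))  # [(0, 1, 3, 2), (0, 2, 6, 5)]
print(apca_mindist([10] * 5, L, H), "<=", math.dist([10] * 5, [0, 0, 6, 6, 6]))
```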
17
Approximate Search
  • A simpler definition of the distance in the
    feature space is the following:
  • But there is one problem: what?

$D_{LB}(Q,S)$
18
Multimedia DBs
  • A multimedia database also stores images
  • Again, similarity queries (content-based
    retrieval)
  • Extract features, index them in feature space, and answer
    similarity queries using GEMINI
  • Again, average values help!
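A minimal sketch of a GEMINI-style range query: filter in feature space, then refine with the true distance. The feature used here, $\sqrt{n}$ times the mean (a one-segment PAA, i.e. an average value), is a hypothetical choice that lower-bounds the Euclidean distance, so the filter causes no false dismissals. The filter step is a linear scan where a real system would query a spatial index; all names are illustrative.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_feature(s):
    # Hypothetical 1-d feature: sqrt(n) * mean. By Cauchy-Schwarz,
    # |f(q) - f(s)| <= euclidean(q, s) for equal-length q and s.
    return [math.sqrt(len(s)) * sum(s) / len(s)]


def gemini_range_query(database, query, eps, feature=mean_feature):
    """Return all objects within eps of query (Euclidean), filtering on features first."""
    fq = feature(query)
    # Filter: feature distance <= true distance, so no qualifying object is dropped.
    candidates = [obj for obj in database if euclidean(fq, feature(obj)) <= eps]
    # Refine: remove false alarms using the true distance.
    return [obj for obj in candidates if euclidean(query, obj) <= eps]


db = [[1, 1, 1, 1], [2, 2, 2, 2], [1, 3, 1, 3], [5, 5, 5, 5]]
print(gemini_range_query(db, [1, 1, 1, 1], 2.5))  # [[1, 1, 1, 1], [2, 2, 2, 2]]
```

Here [1, 3, 1, 3] has the same average as [2, 2, 2, 2], so it passes the filter, but the refine step discards it because its true distance is $\sqrt{8} > 2.5$.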

19
Images - color
What is an image? A: a 2-d array
20
Images - color
Color histograms and the distance function
21
Images - color
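Mathematically, the distance function is
$d_{hist}^2(x, y) = (x - y)^T A (x - y) = \sum_i \sum_j a_{ij} (x_i - y_i)(x_j - y_j)$, where $a_{ij}$ is the similarity of colors i and j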
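A direct evaluation of the quadratic-form histogram distance. The 3-bin palette and the similarity matrix A are hypothetical (red and orange are made 0.8-similar); the nonzero off-diagonal entry is exactly the cross-talk discussed on the next slide.

```python
def hist_distance_sq(x, y, a):
    """d_hist^2(x, y) = (x - y)^T A (x - y) = sum_i sum_j a_ij z_i z_j, with z = x - y."""
    z = [xi - yi for xi, yi in zip(x, y)]
    n = len(z)
    return sum(a[i][j] * z[i] * z[j] for i in range(n) for j in range(n))


# Hypothetical bins: 0 = red, 1 = orange, 2 = blue; a_ij = similarity of colors i and j.
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
red, orange, blue = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(hist_distance_sq(red, orange, A))  # about 0.4: similar colors are close
print(hist_distance_sq(red, blue, A))    # 2.0: dissimilar colors are far
```

With A equal to the identity, both pairs would be at squared distance 2.0; the off-diagonal terms are what make all-red and all-orange images close.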
22
Images - color
  • Problem: cross-talk
  • Features are not orthogonal, so
  • SAMs will not work properly
  • Q: what to do?
  • A: it is a feature-extraction question

23
Images - color
  • Possible answers:
  • avg red, avg green, avg blue
  • It turns out that this lower-bounds the histogram
    distance, so
  • no cross-talk
  • SAMs are applicable
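A sketch of the avg-RGB feature computed from a color histogram: each bin contributes its RGB value, weighted by its fraction of pixels. This is the $x_{avg} = V^T x$ computation that the QBIC filtering slides write in matrix form. The result is a 3-d point that an ordinary spatial access method can index. The bin colors and function names are hypothetical.

```python
def average_color(hist, bin_colors):
    """Mean (R, G, B) of an image, given its normalized histogram and each bin's RGB value."""
    return [sum(h * c[k] for h, c in zip(hist, bin_colors)) for k in range(3)]


def avg_distance_sq(x, y, bin_colors):
    """d_avg^2: squared Euclidean distance between the two average colors."""
    ax, ay = average_color(x, bin_colors), average_color(y, bin_colors)
    return sum((p - q) ** 2 for p, q in zip(ax, ay))


# Hypothetical bins: red, orange, blue (RGB components in [0, 1]).
colors = [(1.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 0.0, 1.0)]
print(average_color([0.5, 0.5, 0.0], colors))  # [1.0, 0.25, 0.0]
```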

24
Images - color
[Figure: performance, time vs. selectivity, for seq scan and w/ avg RGB]
26
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • (Q: how to normalize them?
  • A: divide by the standard deviation)
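A sketch of the normalization answer: divide each feature (area, perimeter, each moment) by its standard deviation over the collection before taking Euclidean distances, so that no feature dominates merely because of its units. The two-feature data set and the function name are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Divide every feature (column) by its population standard deviation."""
    sds = [statistics.pstdev(col) or 1.0 for col in zip(*rows)]  # 1.0 for constant features
    return [[v / sd for v, sd in zip(row, sds)] for row in rows]


# Hypothetical shapes described by (area, perimeter).
shapes = [[100, 10], [300, 14]]
print(math.dist(*shapes))               # area dominates the raw distance
print(math.dist(*standardize(shapes)))  # both features now contribute
```

Centering each feature as well would not change any pairwise distance, which is why dividing by the standard deviation alone is enough here.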

28
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • (Q: other features / distance functions?
  • A1: turning angle
  • A2: dilations/erosions
  • A3: ... )

30
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to do dimensionality reduction?
  • A: Karhunen-Loeve (= centered PCA/SVD)
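A NumPy sketch of Karhunen-Loeve reduction as centered PCA via the SVD: subtract the mean feature vector, keep the top k right singular vectors, and project. Because this is an orthogonal projection, distances in the reduced space never exceed the original ones, which keeps a GEMINI filter free of false dismissals. The data are random placeholders with 22 features (area, perimeter and 20 moments), and `kl_reduce` is a hypothetical name.

```python
import numpy as np


def kl_reduce(features, k):
    """Project rows onto the top-k principal axes; returns (reduced, mean, basis)."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are orthonormal principal directions, by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    return centered @ basis.T, mean, basis


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 22))  # placeholder: 50 shapes x 22 features
reduced, mean, basis = kl_reduce(data, 3)
# A new query would be reduced with the same mean and basis: (q - mean) @ basis.T
print(reduced.shape)  # (50, 3)
```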

31
Images - shapes
  • Performance: 10x faster

[Figure: log(# of I/Os) vs. # of features kept; reference point: all kept]
32
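Is $d(u,v) = \sqrt{(u-v)^T A (u-v)}$ a metric?
  • $x^T A x = \sum_i \sum_j x_i x_j A_{ij} = \sum_i \lambda_i \hat{x}_i^2$ (A symmetric)
  • $\lambda_i$ is the ith eigenvalue of A
  • $\hat{x}_i$ is the projection of x along the ith
    eigenvector
  • $d(u,v) = \sqrt{(u-v)^T A (u-v)} = \sqrt{\sum_i \lambda_i (\hat{u}_i - \hat{v}_i)^2}$
  • $d(u,v) \ge 0$, $d(u,u) = 0$, $d(u,v) = d(v,u)$
  • $d(u,w) \le d(u,v) + d(v,w)$, provided
  • $\sqrt{\sum_i \lambda_i (\hat{u}_i - \hat{w}_i)^2} \le \sqrt{\sum_i \lambda_i (\hat{u}_i - \hat{v}_i)^2} + \sqrt{\sum_i \lambda_i (\hat{v}_i - \hat{w}_i)^2}$
  • that is, $\sqrt{\sum_i (\sqrt{\lambda_i}\,\hat{u}_i - \sqrt{\lambda_i}\,\hat{w}_i)^2} \le \sqrt{\sum_i (\sqrt{\lambda_i}\,\hat{u}_i - \sqrt{\lambda_i}\,\hat{v}_i)^2} + \sqrt{\sum_i (\sqrt{\lambda_i}\,\hat{v}_i - \sqrt{\lambda_i}\,\hat{w}_i)^2}$
  • This is the metric (triangle) condition for the $L_p$ norm with p = 2, applied to the
    points with coordinates $\sqrt{\lambda_i}\,\hat{x}_i$; it needs every $\lambda_i \ge 0$, i.e. A positive semi-definite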
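A numerical illustration, not a proof, of the argument above, assuming A is symmetric positive semi-definite: compute d(u, v) directly and through the eigenbasis, and check the triangle inequality on random points. The matrix is a random placeholder built as $B B^T$, and the function names are hypothetical.

```python
import numpy as np


def quad_distance(u, v, a):
    """d(u, v) = sqrt((u - v)^T A (u - v)), for symmetric positive semi-definite A."""
    z = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sqrt(max(z @ a @ z, 0.0)))  # clip tiny negative round-off


def eigen_distance(u, v, a):
    """Same distance in the eigenbasis: sqrt(sum_i lambda_i (u_i - v_i)^2) on projections."""
    lam, vecs = np.linalg.eigh(a)  # columns of vecs are orthonormal eigenvectors
    z = vecs.T @ (np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float(np.sqrt(np.sum(np.clip(lam, 0.0, None) * z ** 2)))


rng = np.random.default_rng(1)
b = rng.normal(size=(5, 5))
a = b @ b.T  # placeholder PSD matrix
u, v, w = rng.normal(size=(3, 5))
print(quad_distance(u, v, a), eigen_distance(u, v, a))
print(quad_distance(u, w, a) <= quad_distance(u, v, a) + quad_distance(v, w, a))
```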

33
Filtering in QBIC
  • Histogram column vectors x, y of length n
  • $\sum_i x_i = 1$, $\sum_i y_i = 1$
  • Difference $z = x - y$
  • $\sum_i z_i = 0$
  • Contribution of each color bin to a smaller set
    of colors
  • $V^T = (c_1, c_2, \ldots, c_n)$, each $c_i$ is a column vector
    of length 3
  • $x_{avg} = V^T x$, $y_{avg} = V^T y$, column vectors of length
    3

34
Filtering in QBIC
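  • Distances
  • $d_{avg}^2 = (x_{avg} - y_{avg})^T (x_{avg} - y_{avg}) = (V^T z)^T (V^T z) = z^T V V^T z = z^T W z$, with $W = V V^T$
  • $d_{hist}^2 = z^T A z$
  • $d_{hist}^2 \ge \lambda_1 d_{avg}^2$, where $\lambda_1$ is the smallest eigenvalue of
    $A z = \lambda W z$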

35
Filtering in QBIC
  • Rewrite z to remove the extra condition that $\sum_i z_i = 0$
  • z becomes an (n-1)-dimensional column vector z'
  • $z^T A z = z'^T A' z'$ and $z^T W z = z'^T W' z'$
  • A' and W' are (n-1)x(n-1) matrices
  • Show that $z'^T A' z' \ge \lambda_1 z'^T W' z'$

36
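Proof of $z'^T A' z' \ge \lambda_1 z'^T W' z'$
  • Minimize $z'^T A' z'$ with respect to z', subject to the
    constraint $z'^T W' z' = C$
  • Same as minimizing, with respect to z',
  • $z'^T A' z' - \lambda (z'^T W' z' - C)$
  • Differentiate with respect to z' and set to 0:
  • $A' z' = \lambda W' z'$
  • $\lambda$ and z' must be an eigenvalue and an eigenvector,
    respectively, of
  • $A' z' = \lambda W' z'$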

37
Proof of $z'^T A' z' \ge \lambda_1 z'^T W' z'$
  • $z'^T A' z' = \lambda z'^T W' z' = \lambda C$
  • To minimize $z'^T A' z'$, we must choose the
    smallest eigenvalue $\lambda_1$
  • The minimum of $z'^T A' z'$ over z', subject
    to the constraint $z'^T W' z' = C$, equals $\lambda_1 C$
  • If $z'^T W' z' = C > 0$, then
  • $z'^T A' z' \ge \lambda_1 C$
  • If $z'^T W' z' = 0$, then
  • $z'^T A' z' \ge 0$, since A' is positive semi-definite
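A NumPy sketch that computes $\lambda_1$ and checks $d_{hist}^2 \ge \lambda_1 d_{avg}^2$ on random histograms. It assumes A is symmetric positive definite; the Gaussian similarity between bin colors used below is a hypothetical choice, not QBIC's own matrix. Because $W = V V^T$ has rank at most 3, the sketch obtains $\lambda_1$ as $1/\lambda_{max}(V'^T A'^{-1} V')$, which equals the smallest finite eigenvalue of $A' z' = \lambda W' z'$. This computational route is an assumption of the sketch, not necessarily how QBIC computed it. The reduction to n-1 coordinates uses $z_n = -(z_1 + \cdots + z_{n-1})$, i.e. $z = P z'$.

```python
import numpy as np


def qbic_lambda1(a, v):
    """lambda_1 for the bound d_hist^2 >= lambda_1 * d_avg^2 on sum-zero differences.

    a: n x n color-similarity matrix (assumed symmetric positive definite).
    v: n x 3 matrix; row i is c_i^T, the RGB value of bin i, so W = V V^T.
    """
    n = a.shape[0]
    # z = P z' with z_n = -(z_1 + ... + z_{n-1}) encodes the condition sum(z) = 0.
    p = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
    a_red = p.T @ a @ p  # A'
    v_red = p.T @ v      # W' = V' V'^T with V' = P^T V
    # Smallest finite eigenvalue of A' z' = lambda W' z' is 1 / lambda_max(V'^T A'^-1 V').
    m = v_red.T @ np.linalg.solve(a_red, v_red)
    return 1.0 / np.linalg.eigvalsh(m).max()


rng = np.random.default_rng(0)
colors = rng.random((8, 3))  # hypothetical RGB value of each of 8 bins
diff = colors[:, None, :] - colors[None, :, :]
A = np.exp(-np.sum(diff ** 2, axis=-1) / 0.5)  # hypothetical Gaussian similarity
lam1 = qbic_lambda1(A, colors)

x, y = rng.dirichlet(np.ones(8), size=2)  # two random normalized histograms
z = x - y
d_hist2 = z @ A @ z
d_avg2 = np.sum((colors.T @ z) ** 2)  # |x_avg - y_avg|^2 = z^T W z
print(lam1, d_hist2 >= lam1 * d_avg2)
```

Filtering on the 3-d average colors with threshold $\epsilon^2 / \lambda_1$ therefore cannot dismiss any image whose true histogram distance is within $\epsilon$.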