Title: Multimedia DBs
1Multimedia DBs
2Time Series Data
A time series is a collection of observations
made sequentially in time.
25.1750 25.1750 25.2250 25.2500
25.2500 25.2750 25.3250 25.3500
25.3500 25.4000 25.4000 25.3250
25.2250 25.2000 25.1750 .. ..
24.6250 24.6750 24.6750 24.6250
24.6250 24.6250 24.6750 24.7500
[Figure: the series above plotted against a value axis and a time axis]
3PAA and APCA
- Feature extraction for GEMINI
- Fourier
- Wavelets
- Another approach: segment the time series into equal parts and store the average value of each part.
- Use an index to store the averages and the segment end points
4Feature Spaces
Korn, Jagadish, Faloutsos 1997
Chan and Fu 1999
Agrawal, Faloutsos, Swami 1993
5Piecewise Aggregate Approximation (PAA)
Original time series (n-dimensional vector): $S = \{s_1, s_2, \ldots, s_n\}$
N-segment PAA representation (N-d vector): $\bar{S} = \{sv_1, sv_2, \ldots, sv_N\}$
PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos, 2000)
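A minimal sketch of PAA and its lower bound, assuming n is a multiple of N and the usual scaling $\sqrt{n/N}$ on the Euclidean distance between PAA vectors. The function names and the sample series are illustrative, not from the slides.

```python
import math

def paa(series, n_segments):
    """Mean of each of n_segments equal-length parts of the series."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes n is a multiple of N")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa_lower_bound(paa_q, paa_s, n):
    """sqrt(n/N) times the Euclidean distance between two PAA vectors."""
    return math.sqrt(n / len(paa_q)) * euclidean(paa_q, paa_s)

# Illustrative data: two series of length n = 8, reduced to N = 2 averages.
q = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 6.0, 6.0]
s = [2.0, 2.0, 4.0, 0.0, 6.0, 6.0, 5.0, 9.0]
lb = paa_lower_bound(paa(q, 2), paa(s, 2), len(q))
true_distance = euclidean(q, s)
```

Here `lb` never exceeds `true_distance`, which is what lets an index discard candidates without false dismissals.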
6Can we improve upon PAA?
N-segment PAA representation (N-d vector): $\bar{S} = \{sv_1, sv_2, \ldots, sv_N\}$
7APCA approximates original signal better than PAA
[Figure: improvement factor of APCA over PAA; values shown: 3.77, 1.69, 1.21, 1.03, 3.02, 1.75]
8APCA Representation can be computed efficiently
- Near-optimal representation can be computed in $O(n \log n)$ time
- Optimal representation can be computed in $O(n^2 M)$ (Koudas et al.)
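To make the $O(n^2 M)$ bound concrete, here is a sketch of a plain dynamic program that finds the M-segment piecewise-constant approximation with the smallest squared error. It is an assumption that this matches the cited algorithm in spirit; the names are illustrative.

```python
def optimal_apca(series, m):
    """Return [(segment mean, 1-based right end point), ...] for m segments."""
    n = len(series)
    # Prefix sums of values and squares give any segment's error in O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(series):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i, j):
        """Squared error of approximating series[i:j] by its mean."""
        total = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best error for the first j points using exactly k segments.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cut = [[0] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i
    # Walk the stored cut points backwards to recover the segments.
    segments = []
    j = n
    for k in range(m, 0, -1):
        i = cut[k][j]
        segments.append(((ps[j] - ps[i]) / (j - i), j))
        j = i
    segments.reverse()
    return segments

example = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 5.0, 5.0, 5.0]
apca = optimal_apca(example, 3)
```

The three nested loops over k, j, and i are where the $n^2 M$ factor comes from; segment lengths adapt to the data instead of being fixed as in PAA.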
9Distance Measure
Lower bounding distance DLB(Q,S)
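The formula did not survive on this slide. The sketch below assumes the definition from the APCA paper: project Q onto the segments of S (qv_i is the mean of Q over segment i) and compute $D_{LB}(Q,S) = \sqrt{\sum_{i=1}^{M} (sr_i - sr_{i-1})(qv_i - sv_i)^2}$ with $sr_0 = 0$. Function names and data are illustrative.

```python
import math

def project_query(query, right_ends):
    """Mean of the query over each segment of S (right ends are 1-based)."""
    means, start = [], 0
    for end in right_ends:
        means.append(sum(query[start:end]) / (end - start))
        start = end
    return means

def d_lb(query, apca_s):
    """apca_s is [(sv_i, sr_i), ...]; returns the assumed lower bound."""
    qv = project_query(query, [sr for _, sr in apca_s])
    total, prev = 0.0, 0
    for (sv, sr), q_mean in zip(apca_s, qv):
        total += (sr - prev) * (q_mean - sv) ** 2  # segment length times gap^2
        prev = sr
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative data: s is piecewise constant, so its APCA is exact.
s = [1.0, 1.0, 1.0, 5.0, 5.0, 3.0, 3.0, 3.0]
apca_s = [(1.0, 3), (5.0, 5), (3.0, 8)]
q = [2.0, 0.0, 1.0, 4.0, 8.0, 3.0, 2.0, 1.0]
bound = d_lb(q, apca_s)
exact = euclidean(q, s)
```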
10Index on 2M-dimensional APCA space
Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree)
11k-nearest neighbor Algorithm
For any node U of the index structure with MBR R, $\mathrm{MINDIST}(Q,R) \le D(Q,S)$ for any data item S under U
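The pruning idea can be sketched without a tree: visit candidates in order of a lower bound and stop once the bound exceeds the k-th best exact distance. This is a flat filter-and-refine scan, not the index traversal itself, and the lower bound used (a one-segment average) is a stand-in chosen for brevity.

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_lower_bound(a, b):
    """sqrt(n) * |mean(a) - mean(b)|, which never exceeds the Euclidean distance."""
    n = len(a)
    return math.sqrt(n) * abs(sum(a) / n - sum(b) / n)

def knn(query, items, k, lower_bound=mean_lower_bound, distance=euclidean):
    """Return (neighbors, refined): neighbors is a sorted list of (distance, index)."""
    # Visit candidates in increasing order of their lower bound.
    order = sorted((lower_bound(query, s), i) for i, s in enumerate(items))
    best = []      # max-heap via negated distances, holds at most k entries
    refined = 0    # number of exact distance computations
    for lb, i in order:
        if len(best) == k and lb > -best[0][0]:
            break  # all remaining candidates have an even larger lower bound
        d = distance(query, items[i])
        refined += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), refined

items = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [10.0, 10.0, 10.0, 10.0],
    [0.0, 2.0, 0.0, 2.0],
    [9.0, 11.0, 9.0, 11.0],
]
query = [1.0, 1.0, 1.0, 2.0]
neighbors, refined = knn(query, items, k=2)
```

Because the bound never overestimates, stopping early cannot drop a true neighbor; it only saves exact distance computations.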
12Index Modification for MINDIST Computation
APCA point: $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$
[Figure: index illustration with labels R1 to R4, S1 to S9, sv1 to sv4, and sr1 to sr4]
APCA rectangle: $S = (L, H)$, where $L = \{smin_1, sr_1, smin_2, sr_2, \ldots, smin_M, sr_M\}$ and $H = \{smax_1, sr_1, smax_2, sr_2, \ldots, smax_M, sr_M\}$
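As a small illustration of the APCA rectangle, the sketch below builds L and H from a series and its segment end points, taking smin_i and smax_i to be the minimum and maximum of the series within segment i. The function name and data are illustrative.

```python
def apca_rectangle(series, right_ends):
    """Return (L, H) as flat lists [smin_1, sr_1, ...] and [smax_1, sr_1, ...]."""
    low, high, start = [], [], 0
    for end in right_ends:          # right ends are 1-based and increasing
        segment = series[start:end]
        low += [min(segment), end]
        high += [max(segment), end]
        start = end
    return low, high

series = [1.0, 2.0, 3.0, 7.0, 8.0, 4.0, 4.0, 5.0]
low, high = apca_rectangle(series, [3, 5, 8])
```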
13MBR Representation in time-value space
We can view the MBR $R = (L, H)$ of any node U as two APCA representations $L = \{l_1, l_2, \ldots, l_{N-1}, l_N\}$ and $H = \{h_1, h_2, \ldots, h_{N-1}, h_N\}$
14Regions
M regions are associated with each MBR; boundaries of the ith region:
15Regions
- The ith region is active at time instant t if it spans across t
- The value $s_t$ of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2)
[Figure: MBR in time-value space showing REGION 1, REGION 2, and REGION 3, bounded by l1 to l6 and h1 to h6; value axis vs. time axis]
16MINDIST Computation
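For time instant t, $\mathrm{MINDIST}(Q, R, t) = \min_{\text{region } G \text{ active at } t} \mathrm{MINDIST}(Q, G, t)$
$\mathrm{MINDIST}(Q, R, t_1) = \min(\mathrm{MINDIST}(Q, \mathrm{Region}_1, t_1), \mathrm{MINDIST}(Q, \mathrm{Region}_2, t_1)) = \min((q_{t_1} - h_1)^2, (q_{t_1} - h_3)^2) = (q_{t_1} - h_1)^2$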
[Figure: REGION 1, REGION 2, and REGION 3 with boundaries l1 to l6 and h1 to h6]
Lemma 3: $\mathrm{MINDIST}(Q,R) \le D(Q,C)$ for any time series C under node U
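A sketch of the per-instant computation follows. It assumes each region is given as (low value, high value, first time instant, last time instant), that at least one region is active at every instant, and that the per-instant terms are combined as $\mathrm{MINDIST}(Q,R) = \sqrt{\sum_t \mathrm{MINDIST}(Q,R,t)}$; the combination step and the sample regions are assumptions, not taken from the slide.

```python
import math

def mindist_region(q_t, low, high):
    """Squared distance from q_t to the value interval [low, high]."""
    if q_t < low:
        return (low - q_t) ** 2
    if q_t > high:
        return (q_t - high) ** 2
    return 0.0

def mindist(query, regions):
    """regions: list of (low, high, t_start, t_end), times 1-based inclusive."""
    total = 0.0
    for t, q_t in enumerate(query, start=1):
        # Only regions spanning t are active; take the closest one.
        active = [mindist_region(q_t, lo, hi)
                  for lo, hi, t0, t1 in regions if t0 <= t <= t1]
        total += min(active)
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical regions that overlap at t = 3.
regions = [(1.0, 2.0, 1, 3), (4.0, 6.0, 3, 5)]
query = [3.0, 1.5, 3.0, 7.0, 5.0]
inside = [1.0, 2.0, 4.0, 6.0, 5.0]   # lies in an active region at every t
bound = mindist(query, regions)
```

Any series that stays inside the active regions, such as `inside`, is at least `bound` away from the query.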
17Approximate Search
- A simpler definition of the distance in the feature space is the following
- But there is one problem: what?
DLB(Q,S)
18Multimedia dbs
- A multimedia database also stores images
- Again, similarity queries (content-based retrieval)
- Extract features, index in feature space, answer similarity queries using GEMINI
- Again, average values help!
19Images - color
What is an image? A: a 2-d array
20Images - color
Color histograms, and distance function
21Images - color
Mathematically, the distance function is $d_{hist}^2(x, y) = (x - y)^T A (x - y)$ for color histograms x and y and a matrix A
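A small sketch of a quadratic-form histogram distance, assuming the form $d_{hist}^2 = z^T A z$ with $z = x - y$ used on the later QBIC slides. The three-bin palette and the entries of A are hypothetical; the off-diagonal 0.5 encodes that the first two bins are perceptually similar.

```python
def quadratic_form_distance_sq(x, y, a):
    """Squared distance z^T A z with z = x - y."""
    z = [xi - yi for xi, yi in zip(x, y)]
    n = len(z)
    return sum(z[i] * a[i][j] * z[j] for i in range(n) for j in range(n))

# Hypothetical bins: red, orange, blue; red and orange are similar.
A = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
all_red = [1.0, 0.0, 0.0]
all_orange = [0.0, 1.0, 0.0]
all_blue = [0.0, 0.0, 1.0]

red_vs_orange = quadratic_form_distance_sq(all_red, all_orange, A)
red_vs_blue = quadratic_form_distance_sq(all_red, all_blue, A)
```

The off-diagonal terms make an all-red image closer to an all-orange one than to an all-blue one, which a plain Euclidean distance on the bins would not do.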
22Images - color
- Problem: cross-talk
- Features are not orthogonal ->
- SAMs will not work properly
- Q: what to do?
- A: feature-extraction question
23Images - color
- possible answers:
- avg red, avg green, avg blue
- it turns out that this lower-bounds the histogram distance ->
- no cross-talk
- SAMs are applicable
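The feature extraction itself is simple. A minimal sketch, assuming an image is given as a flat list of (r, g, b) tuples; the names and pixel data are illustrative.

```python
import math

def average_rgb(pixels):
    """Return the 3-d feature vector [avg red, avg green, avg blue]."""
    sums = [0.0, 0.0, 0.0]
    count = 0
    for r, g, b in pixels:
        sums[0] += r
        sums[1] += g
        sums[2] += b
        count += 1
    return [s / count for s in sums]

def d_avg(image_a, image_b):
    """Euclidean distance between the two average-color vectors."""
    fa, fb = average_rgb(image_a), average_rgb(image_b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)))

half_red_half_blue = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
black = [(0, 0, 0)] * 4
feature = average_rgb(half_red_half_blue)
distance = d_avg(half_red_half_blue, black)
```

The three averages are independent coordinates, so the resulting points can be stored directly in a spatial access method.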
24Images - color
[Figure: performance plot of time vs. selectivity, comparing seq scan with w/ avg RGB]
26Images - shapes
- distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: how to normalize them?
- A: divide by standard deviation)
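A sketch of the normalization step, assuming the standard deviation is taken per feature over the whole collection (population standard deviation). The two-feature rows stand in for (area, perimeter) and are made up for illustration.

```python
import math
import statistics

def normalize_columns(rows):
    """Divide every feature (column) by its standard deviation over all rows."""
    columns = list(zip(*rows))
    stds = [statistics.pstdev(col) for col in columns]
    # A constant feature has std 0 and carries no information; map it to 0.
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, stds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

shapes = [[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]]   # hypothetical (area, perimeter)
normalized = normalize_columns(shapes)
before = euclidean(shapes[0], shapes[1])
after = euclidean(normalized[0], normalized[1])
```

Before normalization the area dominates the distance; afterwards both features contribute equally.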
28Images - shapes
- distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: other features / distance functions?
- A1: turning angle
- A2: dilations/erosions
- A3: ... )
30Images - shapes
- distance function: Euclidean, on the area, perimeter, and 20 moments
- Q: how to do dim. reduction?
- A: Karhunen-Loeve (= centered PCA/SVD)
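A sketch of the idea for a single direction: center the data, form the covariance matrix, and find its dominant eigenvector. Power iteration is used here only to keep the example dependency-free; it is a stand-in for a full SVD and assumes the start vector is not orthogonal to the dominant eigenvector.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector along the direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # d x d covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break   # zero variance: keep the current direction
        v = [x / norm for x in w]
    return means, v

def project(row, means, direction):
    """Coordinate of a centered row along the given direction."""
    return sum((x - m) * u for x, m, u in zip(row, means, direction))

points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # all on y = 2x
means, direction = first_principal_component(points)
reduced = [project(p, means, direction) for p in points]
```

For these collinear points one coordinate per point is enough to reconstruct them exactly, which is the dimensionality reduction in its simplest form.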
31Images - shapes
[Figure: log(# of I/Os) vs. # of features kept, with the "all kept" point marked]
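32Is $d(u,v) = \sqrt{(u-v)^T A (u-v)}$ a metric?
- $x^T A x = \sum_{i,j} x_i x_j A_{ij} = \sum_i \lambda_i x_i^2$
- $\lambda_i$ is the ith eigenvalue
- $x_i$ (in the last sum) is the projection of x along the ith eigenvector
- $d(u,v) = \sqrt{(u-v)^T A (u-v)} = \sqrt{\sum_i \lambda_i (u_i - v_i)^2}$
- $d(u,v) \ge 0$, $d(u,u) = 0$, $d(u,v) = d(v,u)$
- $d(u,w) \le d(u,v) + d(v,w)$, provided
- $\sqrt{\sum_i \lambda_i (u_i - w_i)^2} \le \sqrt{\sum_i \lambda_i (u_i - v_i)^2} + \sqrt{\sum_i \lambda_i (v_i - w_i)^2}$
- $\sqrt{\sum_i (\sqrt{\lambda_i} u_i - \sqrt{\lambda_i} w_i)^2} \le \sqrt{\sum_i (\sqrt{\lambda_i} u_i - \sqrt{\lambda_i} v_i)^2} + \sqrt{\sum_i (\sqrt{\lambda_i} v_i - \sqrt{\lambda_i} w_i)^2}$
- Metric condition for $L_p$ norm (here p = 2)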
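The argument needs every $\lambda_i \ge 0$ so that $\sqrt{\lambda_i}$ is real. A numerical spot check, assuming A is built as $B^T B$ (which is always positive semi-definite); the matrices and the random sampling are illustrative, and a passing check is evidence, not a proof.

```python
import math
import random

def quad_distance(u, v, a):
    """sqrt((u-v)^T A (u-v)), clamping tiny negative round-off to zero."""
    z = [ui - vi for ui, vi in zip(u, v)]
    n = len(z)
    q = sum(z[i] * a[i][j] * z[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))

def gram(b):
    """A = B^T B, symmetric positive semi-definite for any real B."""
    n = len(b[0])
    return [[sum(row[i] * row[j] for row in b) for j in range(n)]
            for i in range(n)]

def triangle_holds(a, trials=1000, seed=0, tol=1e-9):
    """Check d(u,w) <= d(u,v) + d(v,w) on random triples."""
    rng = random.Random(seed)
    n = len(a)
    for _ in range(trials):
        u, v, w = ([rng.uniform(-1, 1) for _ in range(n)] for _ in range(3))
        if quad_distance(u, w, a) > quad_distance(u, v, a) + quad_distance(v, w, a) + tol:
            return False
    return True

psd = gram([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [0.5, 0.0, 1.0]])
indefinite = [[1.0, 0.0], [0.0, -1.0]]   # has a negative eigenvalue
```

With the indefinite matrix, u = (0,0), v = (1,1), w = (2,0) gives d(u,w) = 2 but d(u,v) = d(v,w) = 0, so the triangle inequality fails.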
33Filtering in QBIC
- Histogram column vectors x, y of length n
- $\sum_i x_i = 1$, $\sum_i y_i = 1$
- Difference $z = (x - y)$
- $\sum_i z_i = 0$
- Contribution of each color bin to a smaller set of colors
- $V^T = (c_1, c_2, \ldots, c_n)$, each $c_i$ is a column vector of length 3
- $x_{avg} = V^T x$, $y_{avg} = V^T y$, column vectors of length 3
34Filtering in QBIC
- Distances
- $d_{avg}^2 = (x_{avg} - y_{avg})^T (x_{avg} - y_{avg}) = (V^T z)^T (V^T z) = z^T V V^T z = z^T W z$
- $d_{hist}^2 = z^T A z$
- $d_{hist}^2 \ge \lambda_1 d_{avg}^2$, where $\lambda_1$ is the smallest eigenvalue of
- $A z = \lambda W z$
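The identity $d_{avg}^2 = z^T W z$ with $W = V V^T$ can be checked numerically. In this sketch V is stored as n rows of length 3 (row i is $c_i$), and the 4-bin palette and histograms are hypothetical.

```python
def average_color(v, x):
    """V^T x: the histogram-weighted sum of the bin colors c_i."""
    return [sum(v[i][k] * x[i] for i in range(len(x))) for k in range(3)]

def w_matrix(v):
    """W = V V^T, an n x n matrix with W[i][j] = c_i . c_j."""
    n = len(v)
    return [[sum(v[i][k] * v[j][k] for k in range(3)) for j in range(n)]
            for i in range(n)]

def quad(z, m):
    """z^T M z."""
    n = len(z)
    return sum(z[i] * m[i][j] * z[j] for i in range(n) for j in range(n))

# Hypothetical bin colors (rows of V) and two histograms that each sum to 1.
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
x = [0.5, 0.25, 0.25, 0.0]
y = [0.0, 0.25, 0.25, 0.5]

x_avg, y_avg = average_color(V, x), average_color(V, y)
d_avg_sq_direct = sum((p - q) ** 2 for p, q in zip(x_avg, y_avg))
z = [xi - yi for xi, yi in zip(x, y)]
d_avg_sq_via_w = quad(z, w_matrix(V))
```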
35Filtering in QBIC
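- Rewrite z to remove the extra condition that $\sum_i z_i = 0$.
- z becomes an (n-1)-dimensional column vector $\hat{z}$
- $z^T A z = \hat{z}^T \hat{A} \hat{z}$ and $z^T W z = \hat{z}^T \hat{W} \hat{z}$
- $\hat{A}$ and $\hat{W}$ are (n-1)x(n-1) matrices
- Show that $\hat{z}^T \hat{A} \hat{z} \ge \lambda_1 \hat{z}^T \hat{W} \hat{z}$ (written without hats in the proof)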
36Proof of $z^T A z \ge \lambda_1 z^T W z$
- Minimize w.r.t. z, $z^T A z$, subject to the constraint $z^T W z = C$.
- Same as minimizing w.r.t. z,
- $z^T A z - \lambda (z^T W z - C)$
- Differentiate w.r.t. z and set to 0
- $A z = \lambda W z$
- $\lambda$ and z must be eigenvalues and eigenvectors, resp., of
- $A z = \lambda W z$
37Proof of $z^T A z \ge \lambda_1 z^T W z$
- $z^T A z = \lambda z^T W z = \lambda C$
- To minimize $z^T A z$, we must choose the smallest eigenvalue $\lambda_1$.
- The minimum of $z^T A z$ over z, subject to the constraint $z^T W z = C$, equals $\lambda_1 C$
- If $z^T W z = C > 0$ then
- $z^T A z \ge \lambda_1 C = \lambda_1 z^T W z$
- If $z^T W z = 0$ then
- $z^T A z \ge 0$, since A is positive semi-definite
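A sanity check of the bound in the simplest case, assuming A and W are both diagonal with positive entries. Then $A z = \lambda W z$ has eigenvectors along the coordinate axes and eigenvalues $a_i / w_i$, so $\lambda_1 = \min_i a_i / w_i$. The diagonal values are hypothetical.

```python
import random

def quad_diag(z, diag):
    """z^T D z for a diagonal matrix D given by its diagonal."""
    return sum(d * zi * zi for d, zi in zip(diag, z))

def smallest_generalized_eigenvalue(a_diag, w_diag):
    """For diagonal A and W the generalized eigenvalues are a_i / w_i."""
    return min(ai / wi for ai, wi in zip(a_diag, w_diag))

def bound_holds(a_diag, w_diag, trials=1000, seed=0, tol=1e-9):
    """Check z^T A z >= lambda_1 * z^T W z on random vectors z."""
    lam1 = smallest_generalized_eigenvalue(a_diag, w_diag)
    rng = random.Random(seed)
    for _ in range(trials):
        z = [rng.uniform(-1, 1) for _ in a_diag]
        if quad_diag(z, a_diag) < lam1 * quad_diag(z, w_diag) - tol:
            return False
    return True

a_diag = [4.0, 3.0, 10.0]   # hypothetical diagonal of A
w_diag = [2.0, 1.0, 4.0]    # hypothetical diagonal of W
lambda_1 = smallest_generalized_eigenvalue(a_diag, w_diag)
```

The bound is tight along the eigenvector for $\lambda_1$: with z = (1, 0, 0), both sides equal 4.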