Time Series Indexing II - PowerPoint PPT Presentation

Provided by: ValuedSony2
Learn more at: https://www.cs.bu.edu
Transcript and Presenter's Notes

1
Time Series Indexing II
2
Outline
  • Spatial Databases
  • Temporal Databases
  • Spatio-temporal Databases
  • Multimedia Databases
  • Text databases
  • Time Series databases
  • Image and video databases
  • Data Mining

3
Time Series Data
A time series is a collection of observations
made sequentially in time.
25.1750 25.1750 25.2250 25.2500
25.2500 25.2750 25.3250 25.3500
25.3500 25.4000 25.4000 25.3250
25.2250 25.2000 25.1750 .. ..
24.6250 24.6750 24.6750 24.6250
24.6250 24.6250 24.6750 24.7500
[Figure: the series plotted with a value axis and a time axis]
4
TS Databases
  • A Time Series Database stores a large number of
    time series
  • Similarity queries
  • Exact match or sub-sequence match
  • Range or nearest neighbor
  • But first we should define the similarity model
  • E.g. D(X, Y) for X = x1, x2, ..., xn and Y = y1,
    y2, ..., yn

5
Similarity Models
  • Euclidean and Lp based
  • Time Warping, Edit Distance and LCS based
  • Probabilistic (using Markov Models)
  • Landmarks
  • The appropriate similarity model depends on the
    application

6
Euclidean model
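The formula on this slide did not survive in the transcript. A minimal sketch, assuming the standard Euclidean distance between two equal-length series, D(X, Y) = sqrt(sum_i (x_i - y_i)^2). The function name `euclidean` is illustrative, not from the slides.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two short illustrative series (values chosen for this example only).
d = euclidean([25.175, 25.225, 25.250], [25.175, 25.200, 25.275])
```

Other Lp distances follow by replacing the square and the square root with the p-th power and the p-th root.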
7
Similarity Retrieval
  • Range Query
  • Find all time series S where D(Q, S) <= e, for a given distance threshold e
  • Nearest Neighbor query
  • Find all the k most similar time series to Q
  • A method to answer the above queries: linear scan
    (very slow)
  • A better approach: GEMINI

8
GEMINI
  • Solution: 'quick-and-dirty' filter
  • extract m features (numbers, e.g., average, etc.)
  • map into a point in m-dimensional feature space
  • organize points with off-the-shelf spatial access
    method (SAM)
  • retrieve the answer using the SAM
  • discard false alarms

9
GEMINI Range Queries
  • Build an index for the database in a feature
    space using an R-tree
  • Algorithm: RangeQuery(Q, e)
  • Project the query Q into a point in the feature
    space
  • Find all candidate objects in the index within e
  • Retrieve from disk the actual sequences
  • Compute the actual distances and discard false
    alarms
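The steps of RangeQuery can be sketched as follows. This is an illustration under stated assumptions, not the slides' implementation: the feature map `features` (scaled segment averages) is one possible lower-bounding choice, and a plain list scanned linearly stands in for the R-tree. All function names are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def features(x, m=2):
    """Map a series to m scaled segment averages (assumes len(x) % m == 0).
    Scaling by sqrt(segment length) makes the feature distance a lower
    bound of the true Euclidean distance."""
    w = len(x) // m
    return [math.sqrt(w) * sum(x[i * w:(i + 1) * w]) / w for i in range(m)]

def range_query(database, index, q, e):
    fq = features(q)                              # project the query
    candidates = [i for i, f in enumerate(index)  # filter in feature space
                  if euclidean(fq, f) <= e]
    return [i for i in candidates                 # fetch the sequences, discard false alarms
            if euclidean(q, database[i]) <= e]

database = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [9, 9, 9, 9]]
index = [features(s) for s in database]  # stand-in for the R-tree
hits = range_query(database, index, [1, 2, 3, 4], 1.5)
```

Because the feature distance never exceeds the true distance, the filter step can admit false alarms but cannot drop a true answer.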

10
GEMINI NN Query
  • Algorithm: K_NNQuery(Q, K)
  • Project the query Q in the same feature space
  • Find the candidate K nearest neighbors in the
    index
  • Retrieve from disk the actual sequences pointed
    to by the candidates
  • Compute the actual distances and record the
    maximum
  • Issue a RangeQuery(Q, emax), where emax is the maximum distance recorded above
  • Compute the actual distances, keep K
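A minimal sketch of K_NNQuery under the same kind of assumptions: `features` is an illustrative lower-bounding feature map (scaled segment averages), and sorting a list stands in for the nearest-neighbor search in the index. None of these names come from the slides.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def features(x, m=2):
    """Scaled segment averages; assumes len(x) % m == 0."""
    w = len(x) // m
    return [math.sqrt(w) * sum(x[i * w:(i + 1) * w]) / w for i in range(m)]

def knn_query(database, index, q, k):
    fq = features(q)
    # Candidate k nearest neighbors in feature space.
    by_feature = sorted(range(len(index)), key=lambda i: euclidean(fq, index[i]))
    candidates = by_feature[:k]
    # Actual distances of the candidates; record the maximum.
    e_max = max(euclidean(q, database[i]) for i in candidates)
    # Range query in feature space with radius e_max.
    in_range = [i for i in range(len(index)) if euclidean(fq, index[i]) <= e_max]
    # Actual distances of everything in range; keep the k best.
    return sorted(in_range, key=lambda i: euclidean(q, database[i]))[:k]

database = [[2, 1, 4, 3], [1, 2, 3, 4.5], [1, 2, 3, 5], [9, 9, 9, 9]]
index = [features(s) for s in database]
nearest = knn_query(database, index, [1, 2, 3, 4], 2)
```

In this example the first series has the same features as the query but is not its true nearest neighbor, which is why the second pass with radius emax is needed.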

11
GEMINI
  • GEMINI works when
  • Dfeature(F(x), F(y)) <= D(x, y) (the feature distance lower-bounds the actual distance)
  • Proof. (see book)
  • Note that, the closer the feature distance to the
    actual one, the better.

12
Problem
  • How to extract the features? How to define the
    feature space?
  • Fourier transform
  • Wavelet transform
  • Averages of segments (Histograms or APCA)

13
Fourier transform
  • DFT (Discrete Fourier Transform)
  • Transform the data from the time domain to the
    frequency domain
  • highlights the periodicities
  • SO?

14
DFT
  • A: several real sequences are periodic
  • Q: Such as?
  • A:
  • sales patterns follow seasons
  • economy follows 50-year cycle (or 10?)
  • temperature follows daily and yearly cycles
  • Many real signals follow (multiple) cycles

15
How does it work?
  • Decomposes signal to a sum of sine (and cosine)
    waves.
  • Q: How to assess similarity of x with a wave?

[Figure: the sequence x = x0, x1, ..., xn-1 plotted as value against time, with the time axis running from 0 to n-1]
16
How does it work?
  • A: consider the waves with frequency 0, 1, ...;
    use the inner product (cosine similarity)

18
How does it work?
  • basis functions

[Figure: basis functions plotted over time 0 to n-1: cosine and sine at frequency 1, cosine and sine at frequency 2]
19
How does it work?
  • Basis functions are actually n-dim vectors,
    orthogonal to each other
  • similarity of x with each of them: inner
    product
  • DFT: all the similarities of x with the basis
    functions

20
How does it work?
  • Since e^(jφ) = cos(φ) + j sin(φ), where j = sqrt(-1),
  • we finally have

21
DFT definition
  • Discrete Fourier Transform (n-point)

inverse DFT
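The formulas on this slide did not survive in the transcript. A minimal sketch, assuming the common n-point definition X_f = (1/sqrt(n)) * sum_{t=0}^{n-1} x_t * exp(-j 2 pi f t / n) for f = 0, ..., n-1, with the inverse using the opposite sign in the exponent. The 1/sqrt(n) normalization is an assumption chosen to agree with the energy-preservation statement later in the deck; other conventions place the factor differently.

```python
import cmath
import math

def dft(x):
    """n-point DFT with 1/sqrt(n) normalization (assumed convention)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            / math.sqrt(n) for f in range(n)]

def inverse_dft(X):
    """Inverse of dft(): same sum with the opposite sign in the exponent."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n))
            / math.sqrt(n) for t in range(n)]

x = [1, 2, 1, 2]
X = dft(x)               # each X[f] is the inner product of x with one basis wave
x_back = inverse_dft(X)  # recovers x up to rounding error
```

This direct evaluation costs O(n^2); it is meant to mirror the definition, not to replace an FFT routine.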
22
DFT definition
  • Good news: available in all symbolic math
    packages, e.g., in Mathematica
  • x = {1, 2, 1, 2};
  • X = Fourier[x];
  • ListPlot[Abs[X]];

23
DFT properties
  • Observation - SYMMETRY property
  • X_f = (X_(n-f))* for a real-valued sequence
  • ( *: complex conjugate, (a + b j)* = a - b j )
  • Thus we use only the first half of the numbers
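The symmetry property can be checked numerically. A small sketch, assuming a real-valued input and a DFT with 1/sqrt(n) normalization (the normalization does not affect the symmetry); the sample values are arbitrary.

```python
import cmath
import math

def dft(x):
    """n-point DFT with 1/sqrt(n) normalization (assumed convention)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            / math.sqrt(n) for f in range(n)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary real-valued series
X = dft(x)
n = len(x)
# X[f] should equal the complex conjugate of X[n - f] for f = 1, ..., n-1.
symmetric = all(abs(X[f] - X[n - f].conjugate()) < 1e-9 for f in range(1, n))
```

The second half of the coefficients therefore carries no extra information for real-valued input.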

24
DFT Amplitude spectrum
  • Amplitude: Af = |Xf| = sqrt(Re(Xf)^2 + Im(Xf)^2)
  • Intuition: strength of frequency f

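[Figure: a series of counts plotted against time, and its amplitude spectrum Af plotted against frequency f, with frequency 12 marked]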
26
DFT Amplitude spectrum
  • excellent approximation, with only 2 frequencies!
  • so what?
  • A1: compression
  • A2: pattern discovery
  • A3: forecasting

27
DFT Parseval's theorem
  • sum_t( x_t^2 ) = sum_f( |X_f|^2 )
  • I.e., DFT preserves the energy
  • or, alternatively: it does an axis rotation

[Figure: the point x = (x0, x1) plotted on axes x0 and x1, illustrating the axis rotation]
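Energy preservation can be checked numerically. A small sketch, assuming the DFT carries the 1/sqrt(n) normalization; with a different normalization the two sums differ by a constant factor. The sample values are arbitrary.

```python
import cmath
import math

def dft(x):
    """n-point DFT with 1/sqrt(n) normalization (assumed convention)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            / math.sqrt(n) for f in range(n)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary series
X = dft(x)
energy_time = sum(v * v for v in x)        # sum of x_t^2
energy_freq = sum(abs(c) ** 2 for c in X)  # sum of |X_f|^2
```

The two energies agree up to rounding error, which is the sense in which the transform behaves like a rotation of the axes.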
28
Lower Bounding lemma
  • Using Parseval's theorem we can prove the lower
    bounding property!
  • So, apply DFT to each time series, keep the first
    3-10 coefficients as a vector, and use an R-tree
    to index the vectors
  • The R-tree works with Euclidean distance, so this is OK.
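A sketch of the lower-bounding property, assuming the 1/sqrt(n)-normalized DFT. The two series are the first eight and last eight values shown on the "Time Series Data" slide, reused here purely for illustration; `dft_prefix` and `dist` are hypothetical helper names.

```python
import cmath
import math

def dft_prefix(x, k):
    """First k coefficients of the n-point DFT with 1/sqrt(n) normalization."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            / math.sqrt(n) for f in range(k)]

def dist(a, b):
    """Euclidean distance; works for real or complex entries."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

x = [25.175, 25.175, 25.225, 25.250, 25.250, 25.275, 25.325, 25.350]
y = [24.625, 24.675, 24.675, 24.625, 24.625, 24.625, 24.675, 24.750]
true_distance = dist(x, y)
# Distance using only the first 3 coefficients: never larger than the true one.
feature_distance = dist(dft_prefix(x, 3), dft_prefix(y, 3))
```

Since the DFT is linear and preserves energy, the true distance equals the distance over all n coefficients, so dropping coefficients can only shrink it. For an R-tree, the k complex coefficients would be stored as 2k real numbers (real and imaginary parts), which leaves the distance unchanged.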

29
Wavelets - DWT
  • DFT is great - but, how about compressing opera?
    (baritone, silence, soprano?)

[Figure: a signal plotted as value against time]
30
Wavelets - DWT
  • Solution 1: short-window Fourier transform
  • But how short should the window be?

31
Wavelets - DWT
  • Answer: multiple window sizes! -> DWT

32
Haar Wavelets
  • subtract sum of left half from right half
  • repeat recursively for quarters, eighths, ...
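A minimal sketch of a Haar transform built level by level from pairwise sums and differences. The 1/sqrt(2) scaling (which makes the transform energy-preserving) and the sign convention (first value minus second) are assumptions, not taken from the slides; the function name `haar` is illustrative.

```python
import math

def haar(x):
    """Orthonormal Haar transform of a sequence whose length is a power of 2.
    Returns [overall smooth, coarsest detail, ..., finest details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    s = list(x)
    details = []
    while len(s) > 1:
        pairs = list(zip(s[0::2], s[1::2]))
        d = [(a - b) / math.sqrt(2) for a, b in pairs]  # differences at this level
        s = [(a + b) / math.sqrt(2) for a, b in pairs]  # smooth values at this level
        details = d + details                           # coarser details go in front
    return s + details

# A step signal: only the overall smooth and the coarsest detail are non-zero.
coeffs = haar([1, 1, 1, 1, 5, 5, 5, 5])
```

Each level halves the number of smooth values, so the total work is proportional to n.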

33
Wavelets - construction
  • x0 x1 x2 x3 x4 x5 x6 x7

34
Wavelets - construction
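[Figure: level 1 of the construction over x0 ... x7: adjacent pairs of values yield smooth coefficients s1,0, s1,1, ... and difference coefficients d1,0, d1,1, ...]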

35
Wavelets - construction
[Figure: level 2 of the construction over x0 ... x7: s2,0 and d2,0 are built on top of the level-1 coefficients s1,0, s1,1, ... and d1,0, d1,1, ...]

36
Wavelets - construction
[Figure: the construction over x0 ... x7 continues in the same way above level 2 (s2,0, d2,0), etc.]

37
Wavelets - construction
  • Q: map each coefficient on the time-frequency plane

[Figure: the coefficient tree over x0 ... x7 (s1,0, d1,0, s1,1, d1,1, ..., s2,0, d2,0) next to a plane with axes t (time) and f (frequency)]

39
Wavelets - Drill
  • Q: baritone/silence/soprano - DWT?

[Figure: time-frequency plane with axes t and f]
40
Wavelets - Drill
  • Q: baritone/soprano - DWT?

[Figure: time-frequency plane with axes t and f]
41
Wavelets - construction
  • Observation 1:
  • '+' can be some weighted addition
  • '-' is the corresponding weighted difference
    (Quadrature mirror filters)
  • Observation 2: unlike DFT/DCT,
  • there are many wavelet bases: Haar,
    Daubechies-4, Daubechies-6, ...

42
Advantages of Wavelets
  • Better compression (better RMSE with same number
    of coefficients)
  • closely related to the processing of the
    mammalian eye and ear
  • Good for progressive transmission
  • handle spikes well
  • usually, fast to compute (O(n)!)

43
Feature space
  • Keep the d most important wavelet coefficients
  • Normalize and keep the largest
  • Lower bounding lemma: the same as for DFT

44
PAA and APCA
  • Another approach: segment the time series into
    equal parts, store the average value for each
    part.
  • Use an index to store the averages and the
    segment end points
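A minimal sketch of the equal-length case (PAA), assuming the series length is divisible by the number of segments. The function names are illustrative, and the lower-bounding distance shown is the usual PAA bound rather than something stated on the slide; APCA, with variable-length segments and stored end points, is not covered here.

```python
import math

def paa(x, m):
    """Average of each of m equal-length segments (assumes len(x) % m == 0)."""
    w = len(x) // m
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(m)]

def paa_distance(a, b, n):
    """Lower bound on the Euclidean distance of two length-n series,
    computed from their PAA representations a and b."""
    w = n // len(a)
    return math.sqrt(w * sum((p - q) ** 2 for p, q in zip(a, b)))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 2, 2, 2, 6, 6, 9, 9]
px, py = paa(x, 4), paa(y, 4)
lower_bound = paa_distance(px, py, len(x))
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

The averages can be indexed like any other feature vector, since the bound guarantees that filtering on them produces no false dismissals.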