Title: Spatial and Temporal Databases
1Spatial and Temporal Databases
Efficient Time Series Matching by Wavelets
(ICDE 99) Kin-pong Chan and Ada Wai-chee Fu
2Table of Contents
- Introduction
- Related Works
- The Proposed Approach
- Overall Strategy
- Performance Evaluation
- Conclusion
3Introduction
- Time series: a sequence of real numbers, each
number representing a value at a time point
(e.g., financial data, scientific observation data)
- Time-series databases that support fast retrieval
of data and similarity queries are desired
4Introduction (cont.)
- Similarity Search
- Finds data sequences that differ only slightly
from the given query sequence
- Example: one may want to find all companies whose
stock price fluctuations behaved similarly to IBM's
during a given year
- Similarity matching process
- Given
- compute
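The matching process can be sketched as a brute-force range query. This is a minimal sketch, assuming Euclidean distance (the measure named later in the deck) and a hypothetical tolerance `epsilon`; the function names and the sample price series are made up for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def range_query(database, query, epsilon):
    """Return names of sequences within epsilon of the query (sequential scan)."""
    return [name for name, seq in database.items()
            if euclidean(seq, query) <= epsilon]


# Hypothetical data: three short price series and one query sequence.
database = {
    "A": [10.0, 11.0, 12.0, 11.0],
    "B": [10.5, 11.5, 12.5, 11.5],
    "C": [20.0, 18.0, 15.0, 19.0],
}
query = [10.0, 11.0, 12.0, 11.5]
matches = range_query(database, query, epsilon=1.5)
print(matches)
```

A sequential scan like this touches every sequence, which is what the indexing approach on the following slides is meant to avoid.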
5Introduction (cont.)
- Indexing
- Dimensionality reduction
- Transformation is applied to reduce dimension
- Completeness
- Nature of data
- How well a particular transformation concentrates
power (energy) in a few coefficients depends on
the nature of the time series
6Related Works
- Discrete Fourier Transform (Agrawal et al.)
- Parseval's theorem: energy, and hence Euclidean
distance, is the same in the time and frequency domains
- F-index may raise false alarms, but guarantees no
false dismissals
- Disadvantage: misses the important feature of
time localization
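Why an index on a few DFT coefficients cannot cause false dismissals can be checked numerically. The sketch below is illustrative only: it uses a naive unitary DFT written with the standard library, made-up sequences, and an assumed cutoff `k`; it is not the F-index implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT with 1/sqrt(n) scaling, so the transform is unitary."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            / math.sqrt(n)
            for f in range(n)]


def dist(a, b):
    """Euclidean distance; works for real or complex entries."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


x = [9.0, 7.0, 3.0, 5.0, 6.0, 2.0, 8.0, 4.0]
y = [8.0, 6.0, 4.0, 5.0, 7.0, 1.0, 9.0, 3.0]
X, Y = dft(x), dft(y)

k = 2  # number of leading coefficients kept in the index (assumed)
time_dist = dist(x, y)
full_freq_dist = dist(X, Y)      # equals time_dist by Parseval's theorem
index_dist = dist(X[:k], Y[:k])  # a partial sum, so it never exceeds time_dist
```

Because `index_dist` is at most `time_dist`, any sequence within the tolerance in the time domain is also within it in the index; the converse does not hold, which is where false alarms come from.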
7Related Works (cont.)
- Singular Value Decomposition: decomposes a matrix
X of size N × M into X = U Σ V^T
- Restriction
- Assumes X is not updated
- In practice X may be updated daily or monthly. In that
case, the SVD has to be recomputed over the whole
matrix to reflect the update
8The Proposed Approach Similarity Model
- Defines a new similarity model (v-shift similarity)
used in sequence matching
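The definition itself is not reproduced on this slide. As a hedged illustration, the sketch below assumes that v-shift similarity means comparing two sequences by Euclidean distance after subtracting each sequence's own mean, so that a constant vertical offset is ignored. The function names and `epsilon` are hypothetical.

```python
import math


def v_shift_distance(x, y):
    """Euclidean distance after subtracting each sequence's own mean.

    Assumed reading of "v-shift": vertical offsets do not count.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return math.sqrt(sum(((a - mean_x) - (b - mean_y)) ** 2
                         for a, b in zip(x, y)))


def v_shift_similar(x, y, epsilon):
    """True if the mean-adjusted distance is within the tolerance."""
    return v_shift_distance(x, y) <= epsilon


x = [9.0, 7.0, 3.0, 5.0]
y = [109.0, 107.0, 103.0, 105.0]  # same shape as x, shifted up by 100
z = [5.0, 3.0, 7.0, 9.0]          # same values as x, different shape

print(v_shift_similar(x, y, 0.1))
print(v_shift_similar(x, z, 0.1))
```

Under this assumption, two stocks trading at very different price levels can still match if their fluctuations have the same shape.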
9Proposed Approach Haar Wavelet
- Haar wavelet
- Allows a good approximation with a subset of
coefficients
- Fast to compute and requires little storage
- It preserves Euclidean distance
10Proposed Approach Haar Wavelet (cont.)
- Example of Wavelet Computation
Assume the original time sequence is f(x) = (9 7 3 5)
Resolution   Averages    Coefficients
4            (9 7 3 5)
2            (8 4)       (1 -1)
1            (6)         (2)
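The averaging and differencing in this example can be sketched in a few lines. This is a minimal illustration of the unnormalized Haar transform, assuming a power-of-two input length; the function name is made up.

```python
def haar_decompose(seq):
    """Unnormalized Haar transform by pairwise averaging and differencing.

    Returns [overall average, coarsest detail, ..., finest details].
    The length of seq must be a power of two.
    """
    n = len(seq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(seq)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Prepend so that coarser-level coefficients end up first.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


coeffs = haar_decompose([9, 7, 3, 5])
print(coeffs)  # [6.0, 2.0, 1.0, -1.0]
```

Each pass halves the number of averages, so the total work is proportional to the sequence length.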
11Proposed Approach Haar Wavelet (cont.)
- Instead of storing 6, 2, 1 and -1, assume we store only
the first two coefficients, 6 and 2
- Reconstruction Process
Resolution   Averages    Coefficients
4            (8 8 4 4)
2            (8 4)       (0 0)
1            (6)         (2)
Original (9 7 3 5), Reconstructed (8 8 4 4)
We can reduce the dimension of the data
at the cost of some accuracy
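The reconstruction step can be sketched as the inverse of pairwise averaging and differencing, with dropped coefficients treated as zero. This is an illustrative sketch with made-up function names, assuming the coefficient order (average, coarsest detail, ..., finest details).

```python
def haar_reconstruct(coeffs):
    """Invert the unnormalized Haar transform (average first, then details)."""
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    values = [coeffs[0]]
    pos = 1
    while len(values) < n:
        # Each level uses as many detail coefficients as current averages.
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for avg, d in zip(values, details):
            expanded.extend([avg + d, avg - d])
        values = expanded
    return values


def truncate(coeffs, k):
    """Keep the first k coefficients and replace the rest with zeros."""
    return list(coeffs[:k]) + [0] * (len(coeffs) - k)


full = [6, 2, 1, -1]
approx = haar_reconstruct(truncate(full, 2))
print(approx)                  # [8, 8, 4, 4]
print(haar_reconstruct(full))  # [9, 7, 3, 5]
```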
12Proposed Approach DFT versus Haar (cont.)
- Motivation for replacing DFT with DWT
- Pruning power: fewer false alarms occur with DWT
than with DFT
- Complexity consideration
- The Haar transform is O(n), while the Fast Fourier
Transform is O(n log n)
- Note: DWT does not require massive index
reorganization on update, which is a
major drawback of SVD
13Proposed Approach Guarantee of No False Dismissal
- No qualified time sequence will be rejected, thus
no false dismissal
- The authors show that this property holds for the Haar
wavelet
- where
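The property can be checked numerically. The sketch below assumes the standard orthonormal Haar scaling (dividing sums and differences by the square root of 2), which may differ from the normalization used in the paper; the sequences and function names are made up.

```python
import math


def haar_orthonormal(seq):
    """Haar transform with 1/sqrt(2) scaling at each level (orthonormal)."""
    n = len(seq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2)
    averages = [float(v) for v in seq]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / s for a, b in pairs] + details
        averages = [(a + b) / s for a, b in pairs]
    return averages + details


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


x = [9, 7, 3, 5, 6, 2, 8, 4]
y = [8, 6, 4, 5, 7, 1, 9, 3]
hx, hy = haar_orthonormal(x), haar_orthonormal(y)

time_dist = dist(x, y)
# Distance computed on only the first k coefficients, for every k.
prefix_dists = [dist(hx[:k], hy[:k]) for k in range(1, len(x) + 1)]
```

With all coefficients the distance equals the time-domain distance, and every shorter prefix gives a smaller or equal value, so filtering on a prefix with the same tolerance cannot reject a qualifying sequence.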
14The Overall Strategy
- Pre-processing
- Similarity Model Selection
- User can select Euclidean distance or v-shift
similarity
- Haar wavelet transform is applied to the time series
- Index Construction
- An index structure such as an R-tree is built using
the first few coefficients
- Range Query
- Nearest Neighbor Query
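The range-query part of this strategy can be sketched as a filter-and-refine loop. This is a simplified illustration: a linear scan over the truncated coefficients stands in for the R-tree, the orthonormal Haar scaling is an assumption, and the data, `k`, and `epsilon` are made up.

```python
import math


def haar_orthonormal(seq):
    """Haar transform with 1/sqrt(2) scaling at each level (orthonormal)."""
    n = len(seq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2)
    averages = [float(v) for v in seq]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / s for a, b in pairs] + details
        averages = [(a + b) / s for a, b in pairs]
    return averages + details


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_index(database, k):
    """Pre-processing: keep the first k Haar coefficients of each sequence."""
    return {name: haar_orthonormal(seq)[:k] for name, seq in database.items()}


def range_query(database, index, query, epsilon, k):
    """Filter on the reduced features, then refine with the true distance."""
    q_feat = haar_orthonormal(query)[:k]
    candidates = [name for name, feat in index.items()
                  if dist(feat, q_feat) <= epsilon]
    answers = [name for name in candidates
               if dist(database[name], query) <= epsilon]
    return candidates, answers


database = {
    "s1": [9, 7, 3, 5],
    "s2": [7, 9, 5, 3],  # same pairwise averages as s1, different details
    "s3": [1, 2, 3, 4],
}
k = 2
index = build_index(database, k)
candidates, answers = range_query(database, index, [9, 7, 3, 5], 1.0, k)
print(candidates)  # s2 passes the filter as a false alarm
print(answers)     # only s1 survives the refinement step
```

The refinement step is what removes false alarms; the filter step only needs to guarantee that no true answer is dropped.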
15Experimental Results
16Experimental Results (cont.)
17Conclusion
- Efficient time series matching through dimension
reduction by Haar wavelet transform
- Outperforms DFT in terms of pruning power,
scalability and complexity