1
Efficient Elastic Burst Detection in Data Streams
  • Yunyue Zhu and Dennis Shasha
  • Department of Computer Science
  • Courant Institute of Mathematical Sciences
  • New York University

SIGKDD 2003
2
Abstract
  • Burst detection
  • Find abnormal aggregates in data streams
  • Sliding window
  • In some applications, we want to monitor many
    sliding window sizes simultaneously.
  • Brute force: O(n^2)
  • Shifted Wavelet Tree: near-linear time

3
Problem Statement
  • For a time series x_1, x_2, ..., x_n, given a set of
    window sizes w_1, w_2, ..., w_m, an aggregate function
    F, and a threshold associated with each window size,
    f(w_j), j = 1, 2, ..., m:
  • Monitoring elastic window aggregates of the time
    series means finding all the subsequences of all the
    window sizes such that the aggregate applied to
    the subsequence crosses the threshold for its window
    size, i.e. F(x_i, ..., x_(i+w_j-1)) ≥ f(w_j).
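The problem statement can be made concrete with a brute-force baseline. This is a minimal Python sketch, not code from the presentation: the function name `brute_force_bursts`, the 0-based start indices, and the sample data are all assumptions for illustration, and the alarm condition is taken to be aggregate ≥ threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def brute_force_bursts(
    x: Sequence[float],
    thresholds: Dict[int, float],
    aggregate: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[int, int]]:
    """Return (start, w) for every window x[start:start+w] with F >= f(w).

    `thresholds` maps each window size w to its threshold f(w).
    """
    alarms = []
    for w, f_w in sorted(thresholds.items()):
        # Slide a window of size w over every possible start position.
        for start in range(len(x) - w + 1):
            if aggregate(x[start:start + w]) >= f_w:
                alarms.append((start, w))
    return alarms


series = [1, 0, 2, 9, 8, 1, 0, 1]          # made-up sample data
alarms = brute_force_bursts(series, {2: 15, 3: 18})
print(alarms)  # [(3, 2), (2, 3), (3, 3)]
```

Every window size is scanned independently here, which is the cost the Shifted Wavelet Tree is meant to avoid.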

4
Wavelet Tree
  • Haar Wavelet Tree
  • Level 0: original time series
  • Level 1: pairwise averages and differences of
    the adjacent data items at level 0
  • Level i: pairwise averages and differences of the
    averages at level i-1
  • The wavelet coefficients can represent the trend
    of the time series.
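The level-by-level construction can be sketched in a few lines of Python. The sketch assumes the series length is a power of two and uses the unnormalized convention average = (a + b) / 2, difference = (a - b) / 2; the slides do not state which normalization they use, and `haar_tree` is an illustrative name.

```python
def haar_tree(x):
    """Return [(averages, differences), ...] for levels 1, 2, ...

    Assumes len(x) is a power of two. Each level is computed from the
    averages of the level below, starting from the raw series (level 0).
    """
    levels = []
    averages = list(x)
    while len(averages) > 1:
        # Pair up adjacent items: (a0, a1), (a2, a3), ...
        pairs = list(zip(averages[0::2], averages[1::2]))
        differences = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        levels.append((averages, differences))
    return levels


print(haar_tree([2, 4, 6, 8]))
# [([3.0, 7.0], [-1.0, -1.0]), ([5.0], [-2.0])]
```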

5
Wavelet Tree (cont.)
  • Wavelet coefficient → Aggregate
  • Average and difference → Sum
  • Problem: the windows at the same level are
    non-overlapping

6
Shifted Wavelet Tree
  • Add an additional line of shifted windows
  • They can be maintained explicitly or implicitly.

7
Shifted Wavelet Tree (cont.)
  • Any subsequence of length w, w ≤ 2^i, is included
    in one of the windows at level i+1 of the SWT.
  • We say that windows with size w, 2^(i-1) < w ≤ 2^i,
    are monitored by level i+1 of the SWT.

[Figure: SWT windows at Level 3 and Level 4]
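The containment property on this slide can be checked with a small sketch. It assumes that level-k windows have size 2^k and start at multiples of 2^(k-1), which is how the half-shifted windows are read here, and it ignores truncation at the end of the series. `covering_window` is a hypothetical helper, not something named in the slides.

```python
def covering_window(start, w):
    """Return (level, win_start, win_end) of an SWT window containing
    the subsequence [start, start + w).

    Assumes level-k windows have size 2**k and begin at multiples of
    2**(k-1). The window is the half-open interval [win_start, win_end).
    """
    i = (w - 1).bit_length()      # smallest i with w <= 2**i
    level = i + 1                 # monitoring level
    shift = 2 ** i                # spacing between window starts
    win_start = (start // shift) * shift
    return level, win_start, win_start + 2 * shift


print(covering_window(5, 3))  # (3, 4, 12)
```

Because the subsequence is no longer than the shift, it can never straddle more than one window boundary, so one of the two overlapping windows always contains it.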
8
SWT Construction
  • For each level i (i ≥ 1):
  • Compute the pairwise aggregate (sum) for each
    two consecutive data items at level i-1
  • Downsampling:
  • sample every second item in the series of
    aggregates → the input for the next higher level in
    the SWT
  • O(n), where n is the time series length
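The construction steps above can be sketched as follows. This is an illustrative Python version under the assumption that the aggregate is a sum and that level 0 is the raw series; `build_swt` and the (start, sum) output layout are choices made for the example.

```python
def build_swt(x, num_levels):
    """Return levels[i] = list of (start, window_sum) for windows of
    size 2**i; levels[0] is the raw series.

    At level i >= 1 consecutive windows start 2**(i-1) apart, so each
    window overlaps its neighbour by half.
    """
    levels = [[(t, v) for t, v in enumerate(x)]]
    base = list(x)  # non-overlapping sums of size 2**(i-1)
    for i in range(1, num_levels + 1):
        half = 2 ** (i - 1)
        # Pairwise aggregate of every two consecutive items.
        pairwise = [base[j] + base[j + 1] for j in range(len(base) - 1)]
        levels.append([(j * half, s) for j, s in enumerate(pairwise)])
        # Downsample: every second aggregate feeds the next level.
        base = pairwise[0::2]
    return levels


levels = build_swt([1, 0, 2, 9, 8, 1, 0, 1], num_levels=2)
print(levels[2])  # [(0, 12), (2, 20), (4, 10)]
```

The work roughly halves at each level, so the total is proportional to the series length, matching the O(n) bound on the slide.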

9
Search for a Burst
  • Given window size w ≤ 2^i and threshold f(w)
  • Search in two stages:
  • A potential burst is detected at level i+1
    of the SWT
  • Detailed search in those subsequences of size 2^i
    with sum ≥ f(w)
  • O(k), where k is the number of alarms (output size)
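A minimal sketch of the two-stage search, assuming non-negative data and sum as the aggregate. For brevity it derives window sums from prefix sums instead of a stored SWT, and it takes the filtering windows to have size 2^(i+1), starting at multiples of 2^i. `search_bursts` and its parameters are illustrative names.

```python
from itertools import accumulate


def search_bursts(x, w, f_w):
    """Return sorted starts s with sum(x[s:s+w]) >= f_w.

    Assumes x is non-negative, so a covering window's sum is an upper
    bound on the sum of any subsequence inside it.
    """
    n = len(x)
    prefix = [0] + list(accumulate(x))

    def window_sum(a, b):
        return prefix[min(b, n)] - prefix[a]

    i = (w - 1).bit_length()   # smallest i with w <= 2**i
    shift = 2 ** i             # covering windows start every 2**i items
    found = set()
    for a in range(0, n, shift):
        b = a + 2 * shift
        # Stage 1: skip covering windows that cannot contain a burst.
        if window_sum(a, b) < f_w:
            continue
        # Stage 2: detailed search inside the flagged window.
        for s in range(a, min(b, n) - w + 1):
            if window_sum(s, s + w) >= f_w:
                found.add(s)
    return sorted(found)


print(search_bursts([1, 0, 2, 9, 8, 1, 0, 1], w=3, f_w=18))  # [2, 3]
```

When few windows pass stage 1, most of the series is never examined at fine granularity, which is why the running time depends on the number of alarms.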

10
Streaming Algorithm
  • Assume that new data becomes available at every
    time unit.
  • The set of window sizes is 2^L < w_1 < w_2 < ... < w_m
    < 2^U.
  • Maintain levels L+2 to U+1 of the SWT
    that monitor those windows.
  • Two methods
  • Online algorithm
  • Batch algorithm

11
Streaming Algorithm: Online Algorithm
  • Whenever a new data item becomes available:
  • Update the 2(U-L) aggregates of the windows in
    the SWT.
  • If the aggregate at level i exceeds d_i, perform
    a detailed search on the windows monitored by
    level i.
  • For level i, threshold d_i = min f(w_j), 2^(i-2) < w_j
    ≤ 2^(i-1)
  • Response time: one time unit
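The per-level threshold can be computed directly from the window thresholds. A small sketch, assuming the thresholds are given as a mapping from window size to f(w); `level_thresholds` and the sample values are made up for illustration.

```python
def level_thresholds(thresholds):
    """Map each SWT level i to the minimum f(w_j) over the window
    sizes it monitors, i.e. those with 2**(i-2) < w_j <= 2**(i-1)."""
    d = {}
    for w, f_w in thresholds.items():
        level = (w - 1).bit_length() + 1   # level that monitors size w
        d[level] = min(f_w, d.get(level, float("inf")))
    return d


print(level_thresholds({5: 30, 7: 40, 10: 55, 16: 80}))
# {4: 30, 5: 55}
```

Using the minimum makes the level-wide check conservative: it can raise false candidates that the detailed search then rejects, but it cannot miss a burst for any monitored window size.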

12
Streaming Algorithm: Batch Algorithm
  • Maintain the aggregates at level L+1
  • The aggregate in the most recently completed
    window of level L+1 is updated every time unit.
  • An aggregate of a window at the upper levels will
    not be computed until all the data in that window
    are available.
  • Once an aggregate at a certain upper level is
    updated, we also check alarms for time intervals
    monitored by that level.
  • Higher throughput, longer response time.

13
Other Aggregates
  • The monitoring of many other aggregates based on
    elastic windows could benefit from our data
    structure, as long as the following conditions
    hold.
  • 1. The aggregate F is monotonically increasing or
    decreasing with respect to the window, e.g.
  • Max, Count → monotonically increasing
  • Min → monotonically decreasing
  • 2. The alarm domain is one-sided, that is,
  • monotonically increasing → [threshold, ∞)
  • monotonically decreasing → (-∞, threshold]
14
Extension to Two Dimensions
  • The problem is to report the positions of spatial
    sliding windows (rectangle regions) having
    different sizes, within which the density exceeds
    some predefined threshold.
  • Uses the same techniques as the 1D SWT.

[Figure panels: Wavelet Tree 2D; Shifted Wavelet Tree 2D]
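For reference, the two-dimensional problem itself can be stated as a brute-force baseline. The sketch below is not the 2D SWT: it swaps in a summed-area table to get each square window's total, and it assumes square k x k windows with the alarm condition total ≥ threshold. The name `dense_windows` and the sample grid are made up.

```python
def dense_windows(grid, k, threshold):
    """Return (row, col) of the top-left corner of every k x k window
    whose total is >= threshold."""
    rows, cols = len(grid), len(grid[0])
    # sat[r][c] holds the sum of all cells above and to the left of (r, c).
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    hits = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            total = (
                sat[r + k][c + k] - sat[r][c + k] - sat[r + k][c] + sat[r][c]
            )
            if total >= threshold:
                hits.append((r, c))
    return hits


grid = [[0, 1, 0],
        [2, 5, 1],
        [0, 3, 4]]
print(dense_windows(grid, k=2, threshold=10))  # [(1, 0), (1, 1)]
```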
15
Effectiveness Study
  • Bursts in the number of times that countries were
    mentioned in the presidential State of the Union
    addresses.

16
Effectiveness Study (cont.)
  • A predefined sliding window size is insufficient.
  • Bursts at large time scales are not necessarily
    reflected at smaller time scales.
  • A burst may be composed of many consecutive bumps.

17
Effectiveness Study (cont.)
  • Bursts in population distribution data (1990)
  • Window sizes: 1x1, 2x2 and 5x5 in
    latitude/longitude

18
Performance Study
  • Experiments on a 1.5GHz Pentium 4 PC with 512 MB
    of main memory running Windows 2000.
  • Datasets
  • The Gamma Ray data set
  • 12 hours of data from a small region of the sky,
    where Gamma Ray bursts were actually reported
  • The data are time series of the number of photons
    observed (events) every 0.1 second.
  • 19,015 events in total in this time series
  • The NYSE TAQ Stock data set
  • Tick-by-tick trading activities of the IBM stock
    between July 1st, 1998 and July 1st, 2002.
  • 5,331,145 trading records (ticks)
  • Each record contains trading time, trading price
    and trading volume.

19
Performance Study (cont.)
  • Training threshold
  • Use the first few hours of Gamma Ray data and the
    first year of Stock data as training data.
  • For a window of size w, we compute the aggregates
    on the training data with a sliding window of size
    w, giving a series y
  • f(w) = avg(y) + λ·std(y), for a threshold factor λ
  • Window sizes: 5, 10, ..., 5·N_w time units
  • N_w, the number of windows, varies from 5 to 50
  • Time units: 0.1 sec for the Gamma Ray data, and 1
    min for the stock data.
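The threshold training can be sketched as below. The sketch reads the formula on this slide as the mean of the sliding-window sums plus a multiple of their standard deviation; the multiplier `factor`, the use of the population standard deviation, and the names `train_threshold` and `window_sizes` are all assumptions.

```python
from statistics import mean, pstdev


def train_threshold(training, w, factor):
    """Return avg(y) + factor * std(y), where y is the series of sums
    over every sliding window of size w in the training data."""
    y = [sum(training[s:s + w]) for s in range(len(training) - w + 1)]
    return mean(y) + factor * pstdev(y)


def window_sizes(n_w):
    """Window sizes 5, 10, ..., 5 * n_w (in time units)."""
    return [5 * k for k in range(1, n_w + 1)]


training = [2, 0, 1, 3, 0, 2, 1, 1, 4, 0, 2, 1]   # made-up counts
thresholds = {w: train_threshold(training, w, factor=2.0)
              for w in window_sizes(2)}
print(sorted(thresholds))  # [5, 10]
```

A larger factor raises every threshold and so reduces the number of alarms, which in turn affects the output-dependent running time.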

20
Performance Study (cont.)
  • The processing time of our algorithm is
    output-dependent.

21
Performance Study (cont.)
  • Experiments on stock data

22
Performance Study (cont.)
  • Use spread as aggregate function

23
Conclusion and Future Work
  • This paper introduces the elastic window model and
    demonstrates the desirability of the new model.
  • It presents a novel data structure for efficient
    detection of elastic bursts and other aggregates.
  • Experiments show that our algorithm is faster
    than a brute force algorithm by several orders of
    magnitude.
  • Future work
  • A robust way of setting the thresholds
  • Non-monotonic aggregates