Online Pattern Discovery Applications in Data Streams



1
Online Pattern Discovery Applications in Data
Streams
  • Sensor-less: Pairs-trading in stock trading (find
    highly correlated pairs in n log n time)
  • Sensor-full: Gamma ray detection in astrophysics
    (burst detection over a large number of window
    sizes in almost linear time)
  • Dennis Shasha (joint work with Yunyue Zhu)
  • {yunyue,shasha}@cs.nyu.edu

2
Application 1: Pairs Trading
  • Stock price streams
  • The New York Stock Exchange (NYSE)
  • 50,000 securities (streams); 100,000 ticks (trades
    and quotes)
  • Pairs Trading, a.k.a. Correlation Trading
  • Query: which pairs of stocks were correlated with
    a value of over 0.9 for the last three hours?

XYZ and ABC have been correlated with a
correlation of 0.95 for the last three hours. Now
XYZ and ABC become less correlated as XYZ goes up
and ABC goes down. They should converge back
later. I will sell XYZ and buy ABC.
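
The query on this slide can be illustrated with a minimal brute-force sketch. It assumes each stream is held as a plain list of prices and that the "last three hours" correspond to the last `window` points; the names `pearson` and `correlated_pairs` and the sample prices are hypothetical, not taken from the presentation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a flat series as uncorrelated
    return cov / sqrt(vx * vy)


def correlated_pairs(streams, window, threshold=0.9):
    """Return (name_a, name_b, corr) for pairs correlated above threshold
    over the last `window` points of each stream."""
    recent = {name: values[-window:] for name, values in streams.items()}
    result = []
    for a, b in combinations(sorted(recent), 2):
        c = pearson(recent[a], recent[b])
        if c > threshold:
            result.append((a, b, c))
    return result


# Made-up prices, for illustration only.
streams = {
    "XYZ": [10.0, 10.5, 11.0, 11.4, 12.0, 12.6],
    "ABC": [20.0, 21.1, 22.0, 22.7, 24.1, 25.0],
    "QRS": [5.0, 4.0, 5.5, 4.2, 5.1, 4.4],
}
pairs = correlated_pairs(streams, window=6)
```

This checks every pair directly, so its cost grows quadratically with the number of streams; it only shows what the query asks for, not how to answer it quickly.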
3
Online Detection of High Correlation
  • Given tens of thousands of high-speed time series
    data streams, detect high-value correlations,
    both synchronized and time-lagged, over
    sliding windows in real time.
  • Real time:
  • high update frequency of the data stream
  • fixed response time, online
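
As an illustration of the difference between synchronized and time-lagged correlation, the sketch below tries every lag up to a limit and keeps the best one. It is an assumption-laden example: `best_lag` is a hypothetical helper, a lag means that y trails x by that many time points, and the series are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a flat series as uncorrelated
    return cov / sqrt(vx * vy)


def best_lag(x, y, max_lag):
    """Return (lag, corr) maximizing the correlation of x[t] with y[t + lag]
    for 0 <= lag <= max_lag. Lag 0 is the synchronized case."""
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        c = pearson(x[:-lag], y[lag:])
        if c > best[1]:
            best = (lag, c)
    return best


x = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
y = [7, 7, 0, 1, 0, 2, 0, 3, 0, 1]  # x delayed by two time points
lag, corr = best_lag(x, y, max_lag=3)
```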

4
StatStream Algorithm
  • Naive algorithm
  • N: number of streams
  • w: size of sliding window
  • space O(N) and time O(N^2 w) vs. space O(N^2) and
    time O(N^2).
  • Suppose that the streams are updated every
    second.
  • With a Pentium 4 PC, the exact method can
    monitor only 700 streams with a delay of 2
    minutes.
  • Our Approach
  • Discrete Fourier Transform to approximate
    correlation
  • grid structure to filter out unlikely pairs
  • Our approach can monitor 10,000 streams with a
    delay of 2 minutes.
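
One way to see why a few DFT coefficients can approximate correlation is sketched below. It is not the StatStream implementation: it assumes each window is normalized to zero mean and unit length, so that correlation equals 1 - d^2/2 where d is the Euclidean distance, and it uses the fact that a unitary DFT preserves that distance. Keeping only the first k coefficients underestimates d, which gives an upper bound on the correlation. The grid structure is omitted, and all function names are hypothetical.

```python
import cmath
from math import sqrt


def normalize(x):
    """Shift to zero mean and scale to unit Euclidean length.
    Assumes the series is not constant."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    norm = sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def dft_coeffs(x, k):
    """Coefficients 1..k of the unitary DFT (coefficient 0 is the mean)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / sqrt(n)
        for f in range(1, k + 1)
    ]


def exact_corr(x, y):
    """Correlation as the dot product of the normalized series."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def approx_corr(x, y, k):
    """Upper bound on the correlation from the first k DFT coefficients."""
    n = len(x)
    assert 2 * k < n, "k must stay below n/2 for the symmetry argument"
    cx = dft_coeffs(normalize(x), k)
    cy = dft_coeffs(normalize(y), k)
    # Real input: coefficients f and n-f contribute equally, hence the factor 2.
    d2 = 2 * sum(abs(a - b) ** 2 for a, b in zip(cx, cy))
    return 1 - d2 / 2


x = [float(t % 7) + 0.1 * t for t in range(32)]
y = [float((t + 1) % 7) + 0.1 * t for t in range(32)]
exact = exact_corr(x, y)
bounds = [approx_corr(x, y, k) for k in (1, 4, 8, 15)]
```

Because the bound never falls below the true correlation, discarding pairs whose bound is under the threshold cannot discard a truly correlated pair; the surviving pairs would still need an exact check.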

5
StatStream: Stream synoptic data structure
  • Three level time interval hierarchy
  • Time point, Basic window, Sliding window
  • Basic window (the key to our technique)
  • The computation for basic window i must finish by
    the end of the basic window i+1
  • The basic window time is the system response
    time.
  • Digests

Basic window digests: sum, DFT coefs
Sliding window digests: sum, DFT coefs
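
The time interval hierarchy can be sketched for the simplest digest, the sum. The class below is a hypothetical illustration, not the authors' code: it assumes a sliding window made of a fixed number of equal-sized basic windows and refreshes the sliding-window digest only when a basic window completes. The DFT coefficient digests are left out.

```python
from collections import deque


class SlidingSum:
    """Sliding-window sum maintained from per-basic-window digests."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size  # time points per basic window
        self.num_basic = num_basic    # basic windows per sliding window
        self.digests = deque()        # one sum per completed basic window
        self.current = []             # time points of the open basic window
        self.window_sum = 0.0         # digest of the sliding window

    def add(self, value):
        """Add one time point. Returns the sliding-window sum when a basic
        window completes, otherwise None."""
        self.current.append(value)
        if len(self.current) < self.basic_size:
            return None
        digest = sum(self.current)
        self.current = []
        self.digests.append(digest)
        self.window_sum += digest
        if len(self.digests) > self.num_basic:
            # The oldest basic window leaves the sliding window.
            self.window_sum -= self.digests.popleft()
        return self.window_sum


tracker = SlidingSum(basic_size=2, num_basic=3)
outputs = [s for s in (tracker.add(v) for v in range(1, 13)) if s is not None]
```

Results appear once per basic window, which matches the statement above that the basic window time is the system response time.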
6
Application 2: Elastic burst detection
  • Discover time intervals with an unusually large
    number of events.
  • In astrophysics, the sky is constantly observed
    for high-energy particles. When a particular
    astrophysical event happens, a shower of
    high-energy particles arrives in addition to the
    background noise.
  • In finance, stocks with unusually high trading
    volumes should attract the notice of traders (or
    perhaps regulators).
  • Challenge: to discover the time and duration of a
    burst, both of which may vary
  • In astrophysics, a burst of high-energy particles
    associated with a special event might last for a
    few milliseconds or a few hours or even a few
    days
  • NB: A similar idea may apply to spatial burst
    detection.

7
Application 2: Burst detection
  • example

8
Burst Detection: Problem Statement
  • Problem: Given a time series of positive numbers
    x_1, x_2, ..., x_n, and a threshold function f(w),
    w = 1, 2, ..., n, find the subsequences of any size
    such that their sums are above the thresholds:
  • all 0 < w < n, 0 < m < n-w, such that
    x_m + x_{m+1} + ... + x_{m+w-1} > f(w)
  • Brute force search: O(n^2) time
  • Our shift wavelet tree (SWT): O(n+k) time.
  • k is the size of the output, i.e. the number of
    windows with bursts
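
The brute-force baseline can be sketched with prefix sums, so that each of the roughly n^2/2 window sums costs constant time. This is an illustration under assumptions: positions are 0-based, every window size from 1 to n is checked, and the threshold function and data are made up.

```python
def brute_force_bursts(x, f):
    """Return every (start, size) whose window sum exceeds f(size).

    Quadratic in len(x): each (start, size) pair is checked once.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix[i] = x[0] + ... + x[i-1]
    bursts = []
    for w in range(1, n + 1):
        threshold = f(w)
        for m in range(n - w + 1):
            if prefix[m + w] - prefix[m] > threshold:
                bursts.append((m, w))
    return bursts


x = [1, 1, 5, 6, 1, 1]
bursts = brute_force_bursts(x, lambda w: 4 * w + 2)
```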

9
Burst Detection Data Structure and Algorithm
  • Lemma 1: any subsequence s is included in one
    window w in the SWT.
  • Lemma 2: if Sum(s) > threshold, then
    Sum(w) > threshold (no false negatives).
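
The two lemmas suggest a filter-then-verify search, sketched below under stated assumptions rather than as the authors' data structure. The layout assumed here: level i holds windows of size 2^i that start every 2^(i-1) positions, so each level covers every subsequence of size up to 2^(i-1) + 1. A window raises an alarm when its sum exceeds the smallest threshold among the sizes it covers, and only alarmed windows are searched in detail. Values must be non-negative for the containment argument to hold. The name `swt_bursts` is hypothetical.

```python
def swt_bursts(x, f):
    """Return sorted (start, size) pairs whose window sum exceeds f(size)."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)

    # Size 1 is checked directly against the raw values.
    bursts = {(m, 1) for m in range(n) if x[m] > f(1)}

    shift = 1  # start spacing at this level; windows span 2 * shift points
    lo = 2     # smallest subsequence size not yet covered
    while lo <= n:
        hi = min(shift + 1, n)               # largest size this level covers
        sizes = range(lo, hi + 1)
        alarm = min(f(w) for w in sizes)     # weakest threshold among them
        for start in range(0, n, shift):
            end = min(start + 2 * shift, n)  # clip the window at the series end
            if prefix[end] - prefix[start] <= alarm:
                continue                     # no burst can hide in this window
            # Alarm raised: detailed search inside this window only.
            for w in sizes:
                for m in range(start, end - w + 1):
                    if prefix[m + w] - prefix[m] > f(w):
                        bursts.add((m, w))
        lo = hi + 1
        shift *= 2
    return sorted(bursts)


x = [1, 1, 5, 6, 1, 1, 1, 9, 9, 9, 1, 1]
f = lambda w: 4 * w + 2
found = swt_bursts(x, f)
```

An alarm does not guarantee a burst, which is why the detailed search still compares each candidate against its own threshold; the filter only guarantees that windows without an alarm can be skipped.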