Mining Multiple Data Streams

Slides: 16
Provided by: bul7

Transcript and Presenter's Notes

1
Mining Multiple Data Streams
  • Ahmet Bulut
  • Department of Computer Science
  • University of California, Santa Barbara

2
Stock Market

3
Trend-Related Analysis: You would like to...
  • identify the stocks that move in a similar trend
  • find the stocks that move similarly to your stock
  • know aggregates related to your stock, for example:
  • AVG closing price in time interval (t1,t1?)
  • NUM days where daily_volume > threshold

4
Single Stream
  • Find a method to keep statistics of a stream
    that is:
  • space efficient: storing the whole stream in
    memory is a no-no
  • run-time efficient: incremental, with low update
    cost
  • adaptive: dynamically adapts to changing
    conditions

5
Summarization using Wavelet-Based Approximation Tree

6
Is it the method we are looking for?
  • space efficient: for a stream of N values, it
    keeps O(log N) summaries (approximations)
  • run-time efficient: for each incoming value, the
    amortized update cost is O(1)
  • adaptive: we can change the number of summaries
    kept at each level to increase the resolution
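
A minimal sketch of how O(log N) summaries and O(1) amortized updates can go together. This is not the approximation tree from the slides: it assumes Haar-style block averages that merge upward like a binary counter, and the names DyadicSummary, update, and summaries are hypothetical.

```python
class DyadicSummary:
    """Keeps at most one pending block average per level (assumed scheme)."""

    def __init__(self):
        # pending[j] is the average of a complete block of 2**j values
        # that is still waiting for its sibling block, or None.
        self.pending = []

    def update(self, x):
        """Insert one value; merges carry upward like a binary counter."""
        carry = float(x)
        level = 0
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                self.pending[level] = carry
                return
            # Two sibling blocks are complete: average them and move up.
            carry = (self.pending[level] + carry) / 2.0
            self.pending[level] = None
            level += 1

    def summaries(self):
        """Return (level, average) pairs currently held."""
        return [(j, v) for j, v in enumerate(self.pending) if v is not None]


s = DyadicSummary()
for value in range(1, 8):      # values 1..7
    s.update(value)
print(s.summaries())
```

Because merges behave like carries in a binary counter, a single update may touch several levels, but the total work over N updates stays proportional to N, and the number of levels never exceeds floor(log2 N) + 1.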

7
Multiple Streams
  • Keep a summary structure for each of the streams
  • You can answer trend-related questions by
    examining these structures
  • However, a way to query the summaries
    irrespective of the individual streams is useful

8
Stardust (R a DUet with STreams)
[Figure: streams S1, S2, S3; Levels 0 to 3; R-tree 0, R-tree 1]

9
Is this a method?
  • space efficient: the underlying structure is
    space efficient
  • run-time efficient: for M streams, O(M) R-tree
    insertions
  • adaptive: we can change the lowest level at which
    we start keeping summaries
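
A rough sketch of the idea of one shared index per level across all streams. Everything here is an assumption made for illustration: the block-average summaries, the class and method names, and the plain Python list that stands in for each R-tree. It is not the Stardust implementation.

```python
from collections import defaultdict


class MultiStreamSummaries:
    """Per-level shared indexes over block averages of many streams."""

    def __init__(self, lowest_level=0):
        # Levels below lowest_level are computed but never indexed.
        self.lowest_level = lowest_level
        # stream id -> per-level pending block averages
        self.pending = defaultdict(list)
        # level -> list of (stream_id, end_time, average);
        # a plain list stands in for an R-tree here.
        self.index = defaultdict(list)

    def update(self, stream_id, t, x):
        """Insert value x of one stream at time t."""
        pend = self.pending[stream_id]
        carry, level = float(x), 0
        while True:
            if level >= self.lowest_level:
                self.index[level].append((stream_id, t, carry))
            if level == len(pend):
                pend.append(None)
            if pend[level] is None:
                pend[level] = carry
                return
            carry = (pend[level] + carry) / 2.0
            pend[level] = None
            level += 1

    def query(self, level, lo, hi):
        """All (stream_id, end_time) whose level summary lies in [lo, hi]."""
        return [(s, t) for s, t, v in self.index[level] if lo <= v <= hi]


m = MultiStreamSummaries(lowest_level=1)
for t in range(4):
    m.update("S1", t, t + 1)   # S1 sees 1, 2, 3, 4
    m.update("S2", t, 10)      # S2 is constant
print(m.query(1, 1, 4))
```

Each time step performs one update per stream, so M streams cost O(M) amortized index insertions per step, and raising lowest_level trades resolution for fewer indexed summaries.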

10
Competing Technique
  • Similarity Search in Sequence Databases (Agrawal
    et al., 1993)
  • Fast Subsequence Matching (Faloutsos et al., 1994)
  • We keep a single index structure for all the
    streams
  • When a new value arrives:
  • Apply the DFT to the window covering the new value
  • Keep only a few of the coefficients
  • This traces a trail in the reduced-dimensional space
  • I-fixed: put a fixed number of trail points in
    each box
  • I-adaptive: use the marginal cost to decide on
    inclusion in a box
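
The steps above can be sketched roughly as follows. The function names are hypothetical, the DFT is computed naively with the standard library, and only the fixed-size grouping (I-fixed) is shown; the marginal-cost rule of I-adaptive is left out.

```python
import cmath
import math


def dft_features(window, ncoef):
    """First ncoef DFT coefficients (orthonormal scaling) as 2*ncoef reals."""
    n = len(window)
    feats = []
    for f in range(ncoef):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                for t, x in enumerate(window)) / math.sqrt(n)
        feats.extend((c.real, c.imag))
    return feats


def trail(stream, w, ncoef):
    """One feature point per sliding window of length w."""
    return [dft_features(stream[i:i + w], ncoef)
            for i in range(len(stream) - w + 1)]


def fixed_boxes(points, per_box):
    """Group consecutive trail points into bounding boxes of fixed count.

    Each box is returned as a (low_corner, high_corner) pair.
    """
    boxes = []
    for i in range(0, len(points), per_box):
        chunk = points[i:i + per_box]
        low = [min(col) for col in zip(*chunk)]
        high = [max(col) for col in zip(*chunk)]
        boxes.append((low, high))
    return boxes


points = trail([1, 2, 3, 4, 5, 6], w=4, ncoef=2)
print(fixed_boxes(points, per_box=2))
```

In a real index the boxes would be inserted into a spatial structure such as an R-tree rather than kept in a list.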

11
Range Queries (radius ε)
  • Search(ε):
  • Apply DFT, keep a few of the coefficients
  • Do a range query with radius ε
  • Post-process the candidates
  • Prefix Search:
  • For queries longer than the window
  • Extract the prefix of window size
  • Search(ε)
  • Multi-piece Search:
  • For queries whose length is a multiple of the
    window, say k windows
  • Extract k window-size portions
  • Search(ε/√k) for each portion
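
A hedged sketch of the three query types. A linear scan stands in for the index range query, the function names and the ncoef parameter are assumptions, and the multi-piece radius is the query radius divided by the square root of the number of pieces, as in Faloutsos et al.

```python
import cmath
import math


def features(window, ncoef):
    """First ncoef orthonormal DFT coefficients as real numbers."""
    n = len(window)
    out = []
    for f in range(ncoef):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                for t, x in enumerate(window)) / math.sqrt(n)
        out.extend((c.real, c.imag))
    return out


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(stream, query, eps, ncoef=2):
    """Range query for a query exactly one window long."""
    w = len(query)
    qf = features(query, ncoef)
    hits = []
    for i in range(len(stream) - w + 1):
        win = stream[i:i + w]
        # Filter in feature space (the scan stands in for the index).
        if dist(features(win, ncoef), qf) <= eps:
            # Post-process: check the raw values to drop false alarms.
            if dist(win, query) <= eps:
                hits.append(i)
    return hits


def prefix_search(stream, query, eps, w, ncoef=2):
    """Query longer than the window: search its prefix, then verify."""
    n = len(query)
    return [i for i in search(stream, query[:w], eps, ncoef)
            if i + n <= len(stream) and dist(stream[i:i + n], query) <= eps]


def multipiece_search(stream, query, eps, w, ncoef=2):
    """Query of k windows: search each piece with radius eps / sqrt(k)."""
    k = len(query) // w
    n = len(query)
    candidates = set()
    for j in range(k):
        piece = query[j * w:(j + 1) * w]
        for i in search(stream, piece, eps / math.sqrt(k), ncoef):
            start = i - j * w
            if 0 <= start <= len(stream) - n:
                candidates.add(start)
    return sorted(s for s in candidates
                  if dist(stream[s:s + n], query) <= eps)


stream = list(range(10))
print(search(stream, stream[2:6], 0.1))
print(multipiece_search(stream, stream[3:7], 0.1, w=2))
```

The orthonormal scaling matters: by Parseval's theorem the distance over a subset of coefficients never exceeds the true distance, so the filter step cannot discard a real match.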

12
Search in Stardust
  • Prefix Search has the same flavor
  • Multi-piece Search

[Figure: search radii ε/√2, ε/2, ε/(2√2)]

13
Experimental Results

14
Experimental Results (contd.)

15
Future Work
  • Find correlations between streams
  • Thank the audience!
  • Take a vacation...