On-the-Fly Sharing for Streamed Aggregation

Provided by: josh86
Learn more at: http://web.cs.wpi.edu


1
On-the-Fly Sharing for Streamed Aggregation
  • Sailesh Krishnamurthy, Chung Wu, and Michael J. Franklin
  • Presented by Joshua Lee and Mingrui Wei

Material is partially drawn from [1]
2
Agenda
  • Motivation
  • Contributions
  • Shared Time Slice (STS)
  • Shared Data Fragments (SDF)
  • Shared Data Shards (SDS)
  • Experimental Evaluation
  • Conclusion

3
Motivation
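  • The naive approach, which processes each aggregate query separately, leads to scalability and performance problems
  • Static multi-query optimization costs too much and does not fit dynamic environments, where queries are added and removed at run time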

4
Contributions
  • Shared Time Slices
  • Shared Data Fragments
  • Shared Data Shards
  • On-the-fly multi-query optimization (MQO)
  • Performance study

5
Shared Time Slice (STS)
  • There are two approaches to sharing windows:
  • Paned windows
  • Paired windows
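As a rough illustration, the sketch below computes the slice lengths within one slide period under each approach. It assumes integer time units and windows that end at multiples of SLIDE. Panes all have size gcd(RANGE, SLIDE), while paired windows cut each period into at most two slices of lengths SLIDE - (RANGE mod SLIDE) and RANGE mod SLIDE. The function names are hypothetical.

```python
from math import gcd


def paned_slices(range_len, slide):
    """Slice lengths within one slide period when panes are used.

    Every pane has length gcd(RANGE, SLIDE), so each window is a whole
    number of panes and each slide advances by a whole number of panes.
    """
    pane = gcd(range_len, slide)
    return [pane] * (slide // pane)


def paired_slices(range_len, slide):
    """Slice lengths within one slide period when paired windows are used.

    Assumes windows end at multiples of SLIDE. Window starts then fall at
    offset SLIDE - (RANGE mod SLIDE) inside each period, so at most two
    slices per period are needed.
    """
    s2 = range_len % slide
    if s2 == 0:
        return [slide]  # window boundaries coincide with slide boundaries
    return [slide - s2, s2]


for r, s in [(5, 3), (7, 5), (6, 3)]:
    print(f"RANGE={r} SLIDE={s}: paned={paned_slices(r, s)} paired={paired_slices(r, s)}")
```

Paired windows never need more slices per period than panes: when RANGE mod SLIDE is nonzero, gcd(RANGE, SLIDE) is a proper divisor of SLIDE, so panes need at least two slices per period, while paired windows need exactly two.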

6
STS, continued
  • Sharing sliced windows
  • Combining multiple sliced windows
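One way to combine several sliced windows is sketched below, under the assumption that the shared schedule repeats every LCM of the queries' slides and that its slice edges are the union of every query's edges over that period. Each query's edges come from paired windows (windows assumed to end at multiples of SLIDE); `combine` and `paired_edges` are hypothetical names.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def paired_edges(range_len, slide):
    """Slice-edge offsets within one slide period (windows assumed to end at multiples of SLIDE)."""
    s2 = range_len % slide
    return {0} if s2 == 0 else {0, slide - s2}


def combine(queries):
    """Compose the sliced windows of several (RANGE, SLIDE) queries.

    Assumption: the shared schedule repeats every LCM of the slides, and its
    slice edges are the union of every query's edges over that period.
    """
    period = reduce(lcm, (slide for _, slide in queries))
    edges = set()
    for range_len, slide in queries:
        for offset in paired_edges(range_len, slide):
            edges.update(range(offset, period, slide))
    ordered = sorted(edges)
    # Each slice runs from one edge to the next; the last one ends at the period boundary.
    lengths = [b - a for a, b in zip(ordered, ordered[1:] + [period])]
    return period, lengths


print(combine([(5, 3), (7, 5)]))
```

With these two queries the shared schedule has 12 slices per 15 time units, while the first query alone needs only 10 (two per slide of 3). Extra slices mean more partial aggregates to combine, which is one reason sharing is not automatically a win.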

7
STS, continued
  • To share, or not to share
  • On-the-fly sliced window composition

8
Shared Data Fragments (SDF)
  • Shared processing for aggregate queries with the same window but different predicates
  • For instance, a set of queries $Q$, all like Query 3, where each $Q_i$ has a window with RANGE 5 min and SLIDE 5 min but a different WHERE clause

9
SDF Motivation
  • $T$ is the set of input tuples in one window
  • $p_i(T)$ is the set of tuples in $T$ that satisfy predicate $p_i$
  • $A_i(T)$ is the aggregate result over $p_i(T)$
  • The unshared approach evaluates each $p_i(T)$ separately and then computes each $A_i(T)$ as the answer to $Q_i$
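A minimal sketch of the unshared approach, using hypothetical trade tuples, hypothetical predicates, and SUM(volume) as the aggregate A:

```python
# Hypothetical window T of trade tuples: (symbol, price, volume).
T = [("IBM", 101.0, 300), ("MSFT", 27.5, 1000), ("IBM", 99.5, 50), ("ORCL", 12.0, 700)]

# Hypothetical predicates p_i, one per query Q_i.
predicates = {
    "Q1": lambda t: t[1] > 20.0,    # p1: price above 20
    "Q2": lambda t: t[0] == "IBM",  # p2: symbol is IBM
    "Q3": lambda t: t[2] >= 300,    # p3: volume of at least 300
}


def unshared(T, predicates):
    """Answer each Q_i independently: select p_i(T), then aggregate it into A_i(T)."""
    answers = {}
    for query, p in predicates.items():
        selected = [t for t in T if p(t)]             # p_i(T)
        answers[query] = sum(t[2] for t in selected)  # A_i(T) = SUM(volume)
    return answers


print(unshared(T, predicates))
```

Each tuple is aggregated once per query it satisfies (the first IBM tuple three times), which is the redundancy that shared processing targets.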

10
SDF Fragments Defined
  • The SDF approach partitions $T$ into disjoint fragments $F_i$, each holding the tuples that satisfy exactly the same subset of predicates
  • The example at left shows that $F_5$ consists of the tuples that satisfy $p_1$ and $p_3$ (but not $p_2$)
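The fragment numbering can be made concrete with a bit-vector signature in which bit i-1 of the fragment index k is set when a tuple satisfies p_i. This encoding is an assumption, but it is consistent with the slides: F_5 (binary 101) holds the tuples satisfying p_1 and p_3, and the next slide builds A_1 from F_1, F_3, F_5, and F_7. The data, predicates, and function names below are hypothetical.

```python
from collections import defaultdict


def signature(t, preds):
    """Bit i-1 is set iff tuple t satisfies p_i, so fragment F_k holds tuples with signature k."""
    sig = 0
    for i, p in enumerate(preds):
        if p(t):
            sig |= 1 << i
    return sig


def fragments(T, preds):
    """Partition T into disjoint fragments keyed by signature (F_0 = tuples matching no predicate)."""
    F = defaultdict(list)
    for t in T:
        F[signature(t, preds)].append(t)
    return dict(F)


def fragments_of_query(i, n):
    """Signatures of the fragments whose tuples satisfy p_i, out of n predicates."""
    return [k for k in range(1, 2 ** n) if k & (1 << (i - 1))]


# Hypothetical tuples (symbol, price, volume) and predicates p1, p2, p3.
T = [("IBM", 101.0, 300), ("MSFT", 27.5, 1000), ("IBM", 99.5, 50), ("ORCL", 12.0, 700)]
preds = [lambda t: t[1] > 20.0, lambda t: t[0] == "IBM", lambda t: t[2] >= 300]

print(fragments(T, preds))
print(fragments_of_query(1, 3))
```

Here the MSFT tuple lands in F_5: it satisfies p_1 and p_3 but not p_2.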

11
SDF Conceptual View
  • A partial aggregation, $G$, is applied to each $F_i$
  • Each $A_i(T)$ is formed by a final aggregation, $H$, over the partial aggregates $G(F_j)$ of the fragments whose tuples satisfy $p_i$
  • For instance, in the previous example, $A_1(T) = A(p_1(T)) = H(G(F_1), G(F_3), G(F_5), G(F_7))$
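To see why G and H may differ, take AVG as a hypothetical aggregate: G must keep a (sum, count) pair per fragment, and H merges those pairs before dividing. The sketch below checks, on made-up fragments, that H(G(F_1), G(F_3), G(F_5), G(F_7)) equals AVG computed directly over p_1(T).

```python
# Hypothetical fragments F_k, keyed by signature k, holding (symbol, price) tuples.
F = {
    1: [("A", 10.0)],
    2: [("E", 1000.0)],            # satisfies only p2, so it must not reach A_1
    3: [("B", 20.0)],
    5: [("C", 30.0), ("D", 40.0)],
    7: [],
}


def G(fragment):
    """Partial aggregate for AVG(price): a (sum, count) pair over one fragment."""
    return (sum(price for _, price in fragment), len(fragment))


def H(partials):
    """Final aggregate: merge the (sum, count) pairs, then divide once."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Direct evaluation of A_1(T) = AVG over p_1(T): every tuple whose signature has bit 0 set.
p1_T = [t for k, frag in F.items() if k & 1 for t in frag]
direct = sum(price for _, price in p1_T) / len(p1_T)

# Shared evaluation: A_1(T) = H(G(F_1), G(F_3), G(F_5), G(F_7)).
shared = H([G(F[k]) for k in (1, 3, 5, 7)])
print(direct, shared)
```

Averaging the per-fragment averages of the three non-empty fragments (10, 20, 35) would give about 21.67 instead of 25, which is why AVG needs a partial aggregate that differs from its final one.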

12
SDF Implementation
  • Each tuple is augmented with a signature indicating which predicates it satisfies; the signature identifies the fragment to which the tuple belongs (and therefore the queries to which that fragment contributes)
  • The Fragment Manager dynamically aggregates all tuples with identical signatures into a fragment aggregate
  • A final aggregation is then performed on the fragment aggregates, using their signatures to route each one to the queries it belongs to
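A minimal, hypothetical Fragment Manager sketch (not the authors' implementation): it keeps one SUM/COUNT partial aggregate per signature, creating it the first time that signature appears, and when the window closes it routes each fragment aggregate to every query whose bit is set in the signature. Tuples are assumed to be (symbol, price, volume), and the aggregate is over volume.

```python
from collections import defaultdict


class FragmentManager:
    """Hypothetical sketch of a fragment manager computing SUM and COUNT of volume."""

    def __init__(self, predicates):
        self.predicates = predicates                 # p_1 .. p_n, one per query
        self.partials = defaultdict(lambda: [0, 0])  # signature -> [sum, count]

    def signature(self, t):
        return sum(1 << i for i, p in enumerate(self.predicates) if p(t))

    def add(self, t):
        sig = self.signature(t)
        if sig == 0:
            return                                   # no query wants this tuple
        part = self.partials[sig]                    # partial aggregation G, created lazily
        part[0] += t[2]
        part[1] += 1

    def close_window(self):
        """Final aggregation H: route each fragment aggregate to every query whose bit is set."""
        answers = {}
        for i in range(len(self.predicates)):
            total = count = 0
            for sig, (s, c) in self.partials.items():
                if sig & (1 << i):
                    total += s
                    count += c
            answers[f"Q{i + 1}"] = (total, count)
        self.partials.clear()                        # start the next window empty
        return answers


# Hypothetical predicates p1..p3 and (symbol, price, volume) tuples for one window.
fm = FragmentManager([lambda t: t[1] > 20.0, lambda t: t[0] == "IBM", lambda t: t[2] >= 300])
for t in [("IBM", 101.0, 300), ("MSFT", 27.5, 1000), ("IBM", 99.5, 50), ("ORCL", 12.0, 700)]:
    fm.add(t)
answers = fm.close_window()
print(answers)
```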

13
SDF Cost Comparison
  • The unshared approach has no partial aggregation step, and each tuple is subjected to as many aggregations as the number of queries it satisfies
  • The SDF approach requires a partial aggregation
    operation for each tuple, as well as a final
    aggregation per fragment for each query of which
    the fragment is a part
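The two bullet points can be turned into a simple operation count, assuming every aggregation step costs the same and ignoring predicate evaluation (which both approaches perform). Each workload below is a hypothetical list of tuple signatures over three queries, and the function names are illustrative.

```python
def popcount(x):
    return bin(x).count("1")


def unshared_cost(signatures):
    """No partial step: one aggregation per (tuple, satisfied query) pair."""
    return sum(popcount(s) for s in signatures)


def sdf_cost(signatures):
    """One partial aggregation per tuple, plus one final aggregation per (fragment, query) pair."""
    fragments = {s for s in signatures if s}
    return len(signatures) + sum(popcount(s) for s in fragments)


# Hypothetical workloads over 3 queries; each entry is one tuple's signature.
high_overlap = [0b111] * 1000              # every tuple satisfies all three queries
no_overlap = [0b001, 0b010, 0b100] * 333   # every tuple satisfies exactly one query

for name, sigs in [("high overlap", high_overlap), ("no overlap", no_overlap)]:
    print(name, unshared_cost(sigs), sdf_cost(sigs))
```

Under these assumptions SDF wins clearly when tuples satisfy many queries at once, and loses slightly when the queries do not overlap, because every tuple still pays for a partial aggregation.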

14
Shared Data Shards (SDS)
  • Both STS and SDF partition the input, compute partial aggregates over the partitions, and then compute final aggregates from those partials
  • SDS first slices the input, then fragments the slices
  • Partial aggregates are calculated on the fragmented slices (the shards)
  • Final aggregates are calculated from sets of these partial aggregates
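A sketch of SDS under simplifying assumptions: a shared slicing schedule is already known (the hypothetical slice start times in `EDGES`), each tuple's signature is computed as in SDF, and partial SUMs are kept per (slice, signature) shard. Each query then sums the shards that fall inside its window and carry its predicate bit, so queries with different windows and different predicates reuse the same partials. All names and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical slice start times from a shared slicing schedule (slices may be unequal).
EDGES = [0, 2, 3, 5]


def slice_of(ts):
    """Index of the slice containing timestamp ts (assumes ts >= EDGES[0])."""
    return bisect_right(EDGES, ts) - 1


def shard_partials(stream, preds):
    """Partial SUM per shard, i.e. per (slice, fragment signature) pair."""
    partials = defaultdict(int)
    for ts, value in stream:
        sig = sum(1 << i for i, p in enumerate(preds) if p(value))
        if sig:
            partials[(slice_of(ts), sig)] += value
    return partials


def final_sum(partials, query_index, first_slice, last_slice):
    """Final SUM for the query using preds[query_index] over slices first..last (inclusive)."""
    return sum(v for (sl, sig), v in partials.items()
               if first_slice <= sl <= last_slice and sig & (1 << query_index))


# Hypothetical (timestamp, value) stream and two queries with different predicates and windows.
stream = [(0, 5), (1, 50), (2, 7), (3, 70), (4, 9)]
preds = [lambda v: v >= 10, lambda v: v < 60]
P = shard_partials(stream, preds)
q1 = final_sum(P, 0, 0, 2)  # p1 over slices 0..2, i.e. timestamps [0, 5)
q2 = final_sum(P, 1, 1, 2)  # p2 over slices 1..2, i.e. timestamps [2, 5)
print(q1, q2)
```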

15
Experimental Evaluation
  • The authors performed a performance study using trading data from the NYSE and NASDAQ
  • They compared STS against unshared processing for queries with the same predicates but different windows
  • They compared SDF against unshared processing for queries with different predicates but the same window
  • They compared SDS against unshared processing for queries with different predicates and different windows

16
Conclusion
  • Paired beats Paned
  • Shared Data Fragments beats Unshared
  • Shared Data Shards beats Unshared Slice

17
References
  1. Krishnamurthy, S., Wu, C., and Franklin, M. 2006.
    On-the-fly sharing for streamed aggregation. In
    Proceedings of the 2006 ACM SIGMOD international
    Conference on Management of Data (Chicago, IL,
    USA, June 27 - 29, 2006). SIGMOD '06. ACM Press,
    New York, NY, 623-634.