TelegraphCQ: Continuous Dataflow Processing for an Uncertain World - PowerPoint PPT Presentation

Slides: 12
Provided by: defau635
Learn more at: http://web.cs.wpi.edu


1
TelegraphCQ: Continuous Dataflow Processing for
an Uncertain World

  • Sirish Chandrasekaran, Owen Cooper, Amol
    Deshpande, Michael J. Franklin, Joseph M.
    Hellerstein, Wei Hong, Sailesh Krishnamurthy, Sam
    Madden, Vijayshankar Raman, Fred Reiss, and
    Mehul Shah
  • University of California, Berkeley
  • Intel Berkeley Laboratory
  • IBM Almaden Research Center
  • http://telegraph.cs.berkeley.edu/

2
Contents
  • Background and Motivation
  • Telegraph Architecture
  • Window Semantics in TelegraphCQ
  • TelegraphCQ Design Overview
  • TelegraphCQ Architecture
  • Conclusion
  • All diagrams and contents are directly
    adapted/taken from the paper itself!

3
TelegraphCQ: Background and Motivation
  • Adaptive dataflow architecture: systems that
    can adjust their processing on the fly in
    response to
      • Changes in user needs [HACO99]
      • Intermittent delays in accessing data across
        WANs [UFA98]
  • Shared processing
      • CACQ [MSHR02]
      • PSoup [CF02]
  • Limitations of these earlier systems
      • Processing restricted to in-memory data
      • No scheduling or resource management for
        queries with little or no overlap
      • No Quality of Service (QoS) for adapting to
        resource limitations
      • No tradeoff between flexibility and overhead

4
Telegraph - Architecture
  • Extensible set of composable dataflow
    modules/operators
  • Producer-Consumer design with Fjords API
  • Push as well as Pull queues
  • Ingress and Caching
  • Query Processing
  • Adaptive Routing

5
Adaptive Processing: Eddies and SteMs
  • Eddy
      • Continuously routes tuples among operators
        according to a routing policy
      • Routes on a per-tuple basis, which requires
        state associated with each tuple
  • SteM (State Module)
      • Temporary repository of tuples
      • Stores homogeneous tuples
      • Supports build (insert), probe (search) and
        eviction (deletion) operations
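
The build, probe and evict operations above can be illustrated with a small sketch. This is not TelegraphCQ code: the SteM class, its dictionary index and the symmetric_join helper are assumptions made for illustration, and the fixed "build into your own SteM, then probe the other" order stands in for the routing policy a real Eddy would apply per tuple.

```python
from collections import defaultdict


class SteM:
    """Toy state module: stores tuples from one source, indexed on one key."""

    def __init__(self, key):
        self.key = key                   # attribute used for build and probe
        self.index = defaultdict(list)   # key value -> stored tuples

    def build(self, tup):
        """Insert a tuple into the repository."""
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        """Return the stored tuples whose key equals value."""
        return list(self.index.get(value, []))

    def evict(self, predicate):
        """Delete stored tuples for which predicate is true; return the count."""
        removed = 0
        for k in list(self.index):
            kept = [t for t in self.index[k] if not predicate(t)]
            removed += len(self.index[k]) - len(kept)
            if kept:
                self.index[k] = kept
            else:
                del self.index[k]
        return removed


def symmetric_join(stream, key):
    """Hypothetical fixed routing: build into own SteM, then probe the other."""
    stems = {"R": SteM(key), "S": SteM(key)}
    out = []
    for source, tup in stream:
        other = "S" if source == "R" else "R"
        stems[source].build(tup)
        for match in stems[other].probe(tup[key]):
            # Always emit pairs as (R tuple, S tuple).
            out.append((tup, match) if source == "R" else (match, tup))
    return out
```

Because each arriving tuple is stored before it probes the opposite SteM, every matching pair is produced exactly once regardless of arrival order, which is what makes a pair of SteMs usable as a join over streams.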

6
Fjords: Inter-Module Communication
  • Allow a mixture of push and pull connections
    between modules
  • A pull-queue is implemented using a blocking
    dequeue on the consumer side and a blocking
    enqueue on the producer side
  • A push-queue is implemented using non-blocking
    enqueue and dequeue; control is returned to the
    consumer when the queue is empty
  • Execute queries over any combination of streaming
    and static data sources
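
The difference between the two queue types can be sketched with Python's standard queue module. The class names and the choice to signal a full or empty push-queue with False and None are assumptions for illustration, not the Fjords API.

```python
import queue


class PullQueue:
    """Blocking on both sides: consumer waits for data, producer for space."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        self._q.put(item)        # blocks while the queue is full

    def dequeue(self):
        return self._q.get()     # blocks while the queue is empty


class PushQueue:
    """Non-blocking on both sides: callers always get control back."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False         # assumption: the caller decides what to do

    def dequeue(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None          # empty queue: control returns to the consumer
```

With a push-queue, a consumer that receives None can go on to service another input instead of stalling on a slow stream. Using None as the "empty" signal means None cannot be a payload in this sketch.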

Flux: Scaling Up Dataflow Processing
  • Flux: Fault-tolerant, Load-balancing eXchange
  • Interposed between a producer-consumer operator
    pair in a pipelined, partitioned dataflow
  • Load balancing via online repartitioning of the
    input stream and the corresponding operator state
  • Fault tolerance by leveraging the same state
    movement mechanisms to replicate an operator's
    internal state and in-flight data
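
Online repartitioning can be pictured as moving one partition, together with its accumulated state, from a busy consumer to an idle one. The sketch below is a deliberately naive illustration under assumed names (Exchange, route, rebalance); it does not show replication for fault tolerance, and its "move the largest partition" rule is an assumption, not Flux's actual policy.

```python
class Exchange:
    """Toy partitioned exchange that can move a partition between consumers."""

    def __init__(self, num_partitions, consumers):
        self.num_partitions = num_partitions
        # Initial round-robin assignment of partitions to consumers.
        self.owner = {p: consumers[p % len(consumers)]
                      for p in range(num_partitions)}
        # consumer -> {partition: [tuples]}, standing in for operator state
        self.state = {c: {} for c in consumers}

    def route(self, key, tup):
        """Send a tuple to the consumer that currently owns its partition."""
        p = hash(key) % self.num_partitions
        c = self.owner[p]
        self.state[c].setdefault(p, []).append(tup)
        return c

    def load(self, consumer):
        """Number of tuples held in a consumer's state."""
        return sum(len(ts) for ts in self.state[consumer].values())

    def rebalance(self):
        """Move the largest partition from the busiest to the idlest consumer."""
        busiest = max(self.state, key=self.load)
        idlest = min(self.state, key=self.load)
        if busiest == idlest or not self.state[busiest]:
            return None
        p = max(self.state[busiest],
                key=lambda q: len(self.state[busiest][q]))
        # State moves first, then routing is switched to the new owner.
        self.state[idlest][p] = self.state[busiest].pop(p)
        self.owner[p] = idlest
        return p
```

After rebalance returns a partition number, later tuples for that partition are routed to its new owner, where the state that was moved is already in place.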

7
Initial CQ Approaches
  • CACQ
      • First CQ engine to exploit the adaptive query
        processing framework
      • Modification of Eddies: executes multiple
        queries as a single super-query, the
        disjunction of all the queries
      • Tuple lineage: per-tuple state used to
        determine which clients a tuple belongs to
      • Grouped filters: an index for single-variable
        Boolean factors over the same attribute, used
        to optimize selections in the shared execution
  • PSoup
      • Extends CACQ
      • Allows queries to access historical data;
        treats data and queries symmetrically
      • Adds support for disconnected operation: users
        can register queries
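
A grouped filter can be sketched as a sorted index over the constants of many similar predicates, so that one lookup answers all of them at once. The example below handles only predicates of the form "attribute > constant"; the GroupedFilter class and its method names are assumptions for illustration, and the returned set of query ids plays the role of tuple lineage.

```python
import bisect


class GroupedFilter:
    """Index over predicates 'attr > constant', one predicate per query."""

    def __init__(self):
        self._consts = []   # constants kept in sorted order
        self._qids = []     # query ids aligned with _consts

    def add(self, query_id, constant):
        """Register a query whose predicate is attr > constant."""
        i = bisect.bisect_left(self._consts, constant)
        self._consts.insert(i, constant)
        self._qids.insert(i, query_id)

    def matching(self, value):
        """Return the ids of all queries satisfied by this attribute value."""
        # Entries before index i have constants strictly below value.
        i = bisect.bisect_left(self._consts, value)
        return set(self._qids[:i])
```

For example, with queries registered for price > 20, price > 50 and price > 100, a tuple with price 60 is matched against all three predicates with a single binary search, and the resulting set says which queries should receive it.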

8
Window Semantics in TelegraphCQ
  • Rich windowing schemes over both already-arrived
    and incoming data
  • The window semantics supported are:
  • Snapshot query: executes exactly once over one
    window
      • e.g. Select the closing prices for MSFT on the
        first five days of trading
  • Landmark query: fixed beginning point and a
    forward-moving endpoint
      • e.g. Select all the days after the hundredth
        trading day on which the closing price of MSFT
        has been greater than 50. Keep this query
        standing in the system for a thousand trading
        days
  • Sliding query: forward-moving beginning and end
    points
      • e.g. On every fifth trading day starting today,
        calculate the average closing price of MSFT for
        the five most recent trading days. Keep the
        query standing for fifty trading days
  • Temporal band-join: joins tuples in one stream
    with those in another based on timestamp
      • e.g. For the five most recent trading days
        starting today, select all stocks that closed
        higher than MSFT on a given day. Keep the query
        standing for twenty trading days
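
The snapshot, landmark and sliding examples differ only in how the two window endpoints move as time advances. The sketch below makes that explicit with a small generator; the windows function, its parameters and the use of integer trading-day numbers are assumptions for illustration, not TelegraphCQ's query syntax.

```python
def windows(start, stop, step, left, right):
    """Yield (lo, hi) window bounds for t = start, start + step, ... < stop.

    left and right map the current time t to the window endpoints.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    t = start
    while t < stop:
        yield (left(t), right(t))
        t += step


# Snapshot: a single window over trading days 1..5, evaluated once.
snapshot = list(windows(0, 1, 1, lambda t: 1, lambda t: 5))

# Landmark: beginning fixed at day 101, end moves forward for 1000 days.
landmark = list(windows(101, 1101, 1, lambda t: 101, lambda t: t))

# Sliding: every fifth day for 50 days, over the five most recent days.
today = 10  # hypothetical current trading day
sliding = list(windows(today, today + 50, 5, lambda t: t - 4, lambda t: t))
```

In the sliding case both endpoints advance by five days per step, so consecutive windows do not overlap; a smaller step with the same endpoint functions would produce overlapping windows.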

9
TelegraphCQ Design Overview
  • Adapted the architecture of PostgreSQL
  • Implemented the new system in C/C++ to leverage
    the open source PostgreSQL code base
  • Reused components with different levels of changes

10
TelegraphCQ Architecture
  • Three processes comprise the TelegraphCQ
    server
  • FrontEnd
  • Wrapper
      • Provides an abstraction of external sources
      • Separate process (non-blocking)
  • Executor
      • Execution Object
          • Provides the execution context for multiple
            queries
      • Dispatch Unit
          • Performs the actual work

11
Conclusion
  • TelegraphCQ provides an adaptive dataflow and
    shared processing architecture
  • Eddies and SteMs form the building blocks for
    adaptive processing
  • Supporting mechanisms include Fjords (inter-module
    communication with push and pull connections) and
    Flux (fault-tolerant, load-balancing exchange)
  • Builds on CACQ (tuple lineage and grouped filters)
    and PSoup (symmetric treatment of data and
    queries)
  • Built over the PostgreSQL framework

Thank you!