Telegraph: Continuously Adaptive Dataflow

Slides: 22
Provided by: joseph375
Learn more at: http://www.fdis.org


1
Telegraph: Continuously Adaptive Dataflow
  • Joe Hellerstein

2
Scenarios
  • Ubiquitous computing: more than clients
  • sensors and their data feeds are key
  • smart dust, biomedical (MEMS sensors)
  • each consumer good records (mis)use
  • disposable computing
  • video from surveillance cameras, broadcasts, etc.
  • Global Data Federation
  • all the data is online: what are we waiting for?
  • The plumbing is coming
  • XML/HTTP, etc. give lowest-common-denominator (LCD) communication
  • but how do you flow, summarize, query and analyze
    data robustly over many sources in the wide area?

3
Dataflow in Volatile Environments
  • Federated query processors: a reality
  • Cohera, IBM DataJoiner
  • No control over stats, performance,
    administration
  • Large Cluster Systems: Scaling Out
  • No control over system balance
  • User CONTROL of running dataflows
  • Long-running dataflow apps are interactive
  • No control over user interaction
  • Sensor Nets: the next killer app
  • E.g. Smart Dust
  • No control over anything!
  • Telegraph
  • Dataflow Engine for these environments

4
Data Flood: Main Features
  • What does it look like?
  • Never ends: interactivity required
  • Online, controllable algorithms for all tasks!
  • Big: data reduction/aggregation is key
  • Volatile: this scale of devices and nets will not
    behave nicely

5
The Telegraph Dataflow Engine
  • Key technologies
  • Interactive Control
  • interactivity with early answers and examples
  • online aggregation for data reduction
  • Dataflow programming via paths/iterators
  • Elevate query processing frameworks out of DBMSs
  • Long tradition of static optimization here
  • Suggestive, but not sufficient for volatile
    environments
  • Continuously adaptive flow optimization
  • massively parallel, adaptive dataflow via Rivers
    and Eddies

6
CONTROL: Continuous Output and Navigation
Technology with Refinement On Line
  • Data-intensive jobs are long-running. How to
    give early answers and interactivity?
  • online interactivity over feeds
  • pipelining 'online' operators, data 'juggle'
  • online data correlation algs: ripple joins,
    online mining and aggregation
  • statistical estimators, and their performance
    implications
  • Deliver data to satisfy statistical goals
  • Appreciate interplay of massive data processing,
    stats, and HCI
  • "Of all men's miseries, the bitterest is this: to
    know so much and have control over nothing"
  • Herodotus
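
The idea of early answers backed by statistical estimators can be illustrated with a minimal online-aggregation sketch. This is not code from the CONTROL project; the function name `online_mean`, the use of Welford's running-variance method, and the normal-approximation confidence interval are all assumptions made for illustration. The interval is only meaningful if records arrive in random order.

```python
import math
import random

def online_mean(stream, z=1.96):
    """Yield (count, running_mean, half_width) after every record.

    half_width is an approximate 95% confidence half-interval from the
    central limit theorem. Assumption of this sketch: records arrive in
    random order, so the prefix seen so far behaves like a random sample.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
        else:
            half_width = float("inf")  # no spread estimate from one record
        yield n, mean, half_width

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
    for n, mean, hw in online_mean(data):
        if n in (10, 100, 1_000, 10_000):
            print(f"after {n:>6} rows: {mean:.2f} +/- {hw:.2f}")
```

A user watching this output could stop the job as soon as the interval is narrow enough, which is the kind of control over a long-running job that the slide argues for.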

7
Performance Regime for CONTROL
  • New Greedy Performance Regime
  • Maximize 1st derivative of the user-happiness
    function

[Figure: curves labeled CONTROL and Traditional plotted against Time, with 100 marked on the vertical axis and a "?" annotation]
8
CONTROL: Continuous Output and Navigation
Technology with Refinement On Line
9
CONTROL: Continuous Output and Navigation
Technology with Refinement On Line
10

11
Potter's Wheel: Anomaly Detection
12
River
  • We built the world's fastest sorting machine
  • On the NOW: 100 Sun workstations + SAN
  • But it only beat the record under ideal
    conditions!
  • River: performance adaptivity for data flows on
    clusters
  • simplifies management and programming
  • perfect for sensor-based streams
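
Why a record set under ideal conditions is fragile can be seen in a small simulation. The sketch below is a simplified stand-in, not River's actual mechanism: it compares an even static split of records against a policy that hands each record to whichever consumer frees up first. The function names and the speed figures are hypothetical.

```python
import heapq

def static_partition_time(n_records, speeds):
    """Finish time when records are split evenly, ignoring consumer speed."""
    share = n_records / len(speeds)
    return max(share / s for s in speeds)

def adaptive_time(n_records, speeds):
    """Finish time when each record goes to the consumer that is free first."""
    # Heap of (time_when_free, consumer_index).
    heap = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_records):
        free_at, i = heapq.heappop(heap)
        done = free_at + 1.0 / speeds[i]  # speeds are records per time unit
        finish = max(finish, done)
        heapq.heappush(heap, (done, i))
    return finish

if __name__ == "__main__":
    speeds = [1.0, 1.0, 1.0, 0.25]  # one consumer is running slow
    print("static split:", static_partition_time(400, speeds))
    print("adaptive    :", adaptive_time(400, speeds))
```

With one slow consumer, the static split finishes only when the slowest node does, while the adaptive policy shifts most of the work to the healthy nodes.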

13
Declarative Dataflow: NOT new
  • Database Systems have been doing this for years
  • Translate declarative queries into an efficient
    dataflow plan
  • query optimization considers:
  • Alternate data sources (access methods)
  • Alternate implementations of operators
  • Multiple orders of operators
  • A space of alternatives defined by transformation
    rules
  • Estimate costs and data rates, then search
    space
  • But in a very static way!
  • Gather statistics once a week
  • Optimize query at submission time
  • Run a fixed plan for the life of the query
  • And these ideas are ripe to elevate out of DBMSs
  • And outside of DBMSs, the world is very volatile
  • There are surely going to be lessons outside the
    box
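
The "estimate costs, then search" loop and its static weakness can be sketched for the simplest case: ordering commutative filters. This is an illustrative toy, not any particular DBMS optimizer; the names `expected_cost` and `optimize_once` and the statistics used are hypothetical.

```python
from itertools import permutations

def expected_cost(order, stats):
    """Expected per-tuple cost of applying filters in the given order.

    stats maps filter name -> (cost_per_tuple, selectivity), where
    selectivity is the estimated fraction of tuples that pass.
    """
    total, surviving = 0.0, 1.0
    for name in order:
        cost, selectivity = stats[name]
        total += surviving * cost   # only surviving tuples reach this filter
        surviving *= selectivity
    return total

def optimize_once(stats):
    """Exhaustively search all orders using the current estimates."""
    return min(permutations(stats), key=lambda order: expected_cost(order, stats))

if __name__ == "__main__":
    estimated = {"a": (1.0, 0.9), "b": (1.0, 0.1)}
    plan = optimize_once(estimated)          # chosen at submission time
    drifted = {"a": (1.0, 0.1), "b": (1.0, 0.9)}
    print("fixed plan:", plan)
    print("cost under old stats:", expected_cost(plan, estimated))
    print("cost after drift    :", expected_cost(plan, drifted))
    print("best after drift    :", expected_cost(optimize_once(drifted), drifted))
```

Once the selectivities drift, the plan fixed at submission time keeps paying the higher cost for the rest of the query, which is the static behavior the slide criticizes.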

14
Static Query Plans
  • Volatile environments like sensors need to adapt
    at a much finer grain

15
Continuous Adaptivity: Eddies
Eddy
  • How to order and reorder operators over time
  • based on performance, economic/admin feedback
  • Vs. River
  • River optimizes each operator horizontally
  • Eddies optimize a pipeline vertically
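
Reordering operators over time based on performance feedback can be sketched as follows. This is a minimal illustration under stated assumptions, not the Telegraph implementation: the class name `Eddy`, the restriction to simple filters, and the greedy "lowest observed pass rate first" routing policy are choices made for this example only.

```python
class Eddy:
    """Route each tuple through filters in an order chosen per tuple."""

    def __init__(self, filters):
        self.filters = dict(filters)  # name -> predicate
        # Counters start at 1 to avoid division by zero before any feedback.
        self.seen = {name: 1 for name in self.filters}
        self.passed = {name: 1 for name in self.filters}

    def order(self):
        # Greedy policy (an assumption of this sketch): try the filter
        # with the lowest observed pass rate first.
        return sorted(self.filters, key=lambda n: self.passed[n] / self.seen[n])

    def process(self, tup):
        for name in self.order():
            self.seen[name] += 1
            if not self.filters[name](tup):
                return False          # tuple dropped; later filters skipped
            self.passed[name] += 1
        return True

    def run(self, stream):
        return [t for t in stream if self.process(t)]

if __name__ == "__main__":
    eddy = Eddy({"even": lambda x: x % 2 == 0, "big": lambda x: x >= 900})
    out = eddy.run(range(1000))
    print(len(out), "tuples survive; current order:", eddy.order())
```

Because the order is recomputed for every tuple from running counters, the pipeline keeps adjusting if the data or operator behavior changes mid-query, in contrast to a plan fixed at submission time.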

16
Competitive Eddies
17
Telegraph: Putting it Together
  • Scalable, adaptive dataflow infrastructure. Apps
    include:
  • sensor nets
  • massively parallel and wide-area query engines
  • net appliances: chaining transformation/aggregation/
    compression/etc. in proxies
  • any volatile dataflow scenario
  • Technology: a marriage of
  • CONTROL, Rivers & Eddies
  • Many research questions here
  • E.g. how to combine River and Eddy adaptivity
  • E.g. how to tune Eddies for statistical
    performance goals
  • Combinations of browse/query/mine at UI
  • Storage management to handle new hardware
    realities
  • Look for a live service this summer!

18
Integration with Endeavour
  • Give
  • Be data-intensive backbone to diverse clients
  • Be replication/delivery dataflow engine for
    OceanStore
  • Telegraph Storage Manager provides storage
    (xactional/otherwise) for OceanStore
  • Provide platform for data-intensive tacit info
    mining
  • Take
  • Leverage OceanStore to manage distributed
    metadata, security
  • Leverage protocols out of TinyOS for sensors

19
Connectivity & Heterogeneity
  • Lots of folks working on data format translation,
    parsing
  • we will borrow, not build
  • currently using JDBC & Cohera Net Query
  • commercial tool, donated by Cohera Corp.
  • gateways XML/HTML (via http) to ODBC/JDBC
  • we may write Teletalk gateways from sensors
  • Heterogeneity
  • never a simple problem
  • CONTROL project developed an interactive, online
    data transformation tool: ABC

20
More Info
  • Collaborators
  • Mike Franklin, Eric Brewer, Christos
    Papadimitriou
  • Sirish Chandrasekaran, Amol Deshpande, Kris
    Hildrum, Sam Madden, Vijayshankar Raman, Mehul
    Shah
  • Me: jmh_at_cs.berkeley.edu
  • Web
  • http://db.cs.berkeley.edu/telegraph
  • http://control.cs.berkeley.edu

21
Extra slides for backup