Freddies: DHT-Based Adaptive Query Processing via Federated Eddies

Transcript and Presenter's Notes

1
Freddies: DHT-Based Adaptive Query Processing
via Federated Eddies
  • Ryan Huebsch
  • Shawn Jeffery
  • CS 294-4 Peer-to-Peer Systems
  • 12/9/03

2
Outline
  • Background: PIER
  • Motivation: Adaptive Query Processing (Eddies)
  • Federated Eddies (Freddies)
  • System Model
  • Routing Policies
  • Implementation
  • Experimental Results
  • Conclusions and Continuing Work

3
PIER
  • Fully decentralized relational query processing
    engine
  • Principles
  • Relaxed consistency
  • Organic Scaling
  • Data in its Natural Habitat
  • Standard Schemas via Grassroots software
  • Relational queries can be executed in a number of
    logically equivalent ways
  • An optimization step chooses the best-performing
    plan
  • Currently, PIER has no means to optimize queries

4
Adaptive Query Processing
  • Traditional query optimization occurs at query
    time and is based on statistics. This is hard
    because
  • Catalog (statistics) must be accurate and
    maintained
  • Cannot recover from poor choices
  • The story gets worse!
  • Long running queries
  • Changing selectivity/costs of operators
  • Assumptions made at query time may no longer hold
  • Federated/autonomous data sources
  • No control/knowledge of statistics
  • Heterogeneous data sources
  • Different arrival rates
  • Thus, Adaptive Query Processing systems attempt
    to change execution order during the query
  • Query Scrambling, Tukwila, Wisconsin, Eddies

5
Eddies
  • Eddy: a tuple router that dynamically chooses the
    order of operators in a query plan
  • Optimize query at runtime on a per-tuple basis
  • Monitors selectivities and costs of operators to
    determine where to send a tuple to next
  • Currently centralized in design and
    implementation
  • Some other efforts for distributed Eddies from
    Wisconsin and Singapore (neither uses a DHT)
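The core eddy idea can be sketched in a few lines of Python. This is an illustrative toy, not PIER's or the original Eddies implementation: per-operator selectivity is estimated online as tuples flow through, and each tuple is routed to the most selective (most tuple-dropping) remaining operator first.

```python
class ToyEddy:
    """Toy eddy: routes each tuple through all operators, preferring
    operators that have dropped the most tuples so far."""

    def __init__(self, operators):
        self.operators = operators                     # name -> predicate
        self.seen = {n: 1 for n in operators}          # tuples routed in
        self.passed = {n: 1 for n in operators}        # tuples that survived

    def selectivity(self, name):
        # Fraction of routed tuples the operator has let through
        return self.passed[name] / self.seen[name]

    def route(self, tuple_):
        done = set()
        while len(done) < len(self.operators):
            # Pick the remaining operator with the lowest pass rate
            name = min((n for n in self.operators if n not in done),
                       key=self.selectivity)
            self.seen[name] += 1
            if not self.operators[name](tuple_):
                return None                            # tuple eliminated
            self.passed[name] += 1
            done.add(name)
        return tuple_                                  # survived all operators
```

Routing the integers 0-19 through two predicates yields the same answers regardless of the learned order; only the work per tuple differs, which is exactly the property a per-tuple router exploits.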

6
Why use Eddies in P2P? (The easy answers)
  • Much of the promise of P2P lies in its fully
    distributed nature
  • No central point of synchronization → no central
    catalog
  • Distributed catalog with statistics helps, but
    does not solve all problems
  • Possibly stale, hard to maintain
  • Need CAP to do the best optimization
  • No knowledge of available resources or the
    current state of the system (load, etc)
  • This is the PIER Philosophy!
  • Eddies were designed for a federated query
    processor
  • Changing operator selectivities and costs
  • Federated/heterogeneous data sources

7
Why Eddies in P2P? (The not so obvious answers)
  • Available compute resources in a P2P network are
    heterogeneous and dynamically changing
  • Where should the query be processed?
  • In a large P2P system, local data distributions,
    arrival rates, etc. may differ from the global ones

8
Freddies: Federated Eddies
  • A Freddy is an adaptive query processing operator
    within the PIER framework
  • Goals
  • Show feasibility of adaptive query processing in
    PIER
  • Build foundation and infrastructure for smarter
    adaptive query processing
  • Establish baseline for Freddy performance to
    improve upon with smarter routing policies

9
An Example Freddy
[Diagram: tuples of R, S, and T arrive from the DHT via Get(R), Get(S), and Get(T); the Freddy routes them through the local operators R join S and S join T, rehashing matches into the DHT with Put(Join Value RS) and Put(Join Value ST), and sends finished tuples to the Output.]
10
System Model
  • Same functionality as centralized Eddy
  • Allows easy concept reuse
  • Freddy uses its Routing Policy to determine the
    next operator for a tuple
  • Tuples in a Freddy are tagged with DoneBits
    indicating which operators have processed them
  • Freddy does all state management, thus existing
    operators require no modifications
  • Local processing comes first (in most cases)
  • Conserve network bandwidth
  • Not as simple as it seems
  • The Freddy must decide how to rehash each tuple
  • This determines the join order
  • Challenge: the routing decision is decoupled from
    the operator, so most Eddy techniques are no
    longer valid

11
Query Processing in Freddies
  • Query origin creates a query plan with a Freddy
  • Possible routings determined at this time, but
    not the order
  • Freddy operators on all participating nodes
    initiate data flow
  • As tuples arrive, the Freddy determines the next
    operator for this tuple based on the DoneBits and
    routing policy
  • Source tuples tagged with clean DoneBits and
    routed appropriately
  • When all DoneBits are set, the tuple is sent to
    the output operator (return to query origin)
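The per-tuple flow above can be sketched with DoneBits as a plain bitmask. The function and policy signatures here are illustrative assumptions, not PIER's actual interfaces:

```python
NUM_OPS = 3                      # e.g. two joins plus a selection
ALL_DONE = (1 << NUM_OPS) - 1    # 0b111: every operator has seen the tuple

def freddy_process(tuple_, done_bits, operators, policy):
    """Route one tuple until its DoneBits are all set, then emit it.

    operators: list of functions, one per bit position; return None to drop
    policy:    picks the next operator index among the unset bits
    """
    while done_bits != ALL_DONE:
        pending = [i for i in range(NUM_OPS) if not done_bits & (1 << i)]
        nxt = policy(tuple_, pending)     # the routing decision
        tuple_ = operators[nxt](tuple_)
        if tuple_ is None:                # operator dropped the tuple
            return None
        done_bits |= 1 << nxt             # mark this operator as applied
    return tuple_                         # send to the output operator

# Source tuples enter with clean DoneBits (done_bits = 0); a tuple arriving
# from another node carries whatever bits that node already set.
```

Because the state lives in the tag rather than in the operators, existing operators need no modification, matching the system model above.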

12
Tuple Routing Policy
  • Determines to which operator to send a tuple
  • Local information
  • Messages expensive
  • Monitor local usage and adjust locally
  • Processing Buddy information
  • During processing, discover general trends in
    input/output nodes' processing capabilities,
    output rates, etc.
  • For instance, we may want to alert the previous
    Freddy of poor PUT decisions
  • Design space is huge → a large research area

13
Freddy Routing Policies
  • Simple (KISS)
  • Static
  • Random: not as bad as you may think
  • Local Stat Monitoring (sampling)
  • More complex
  • Queue lengths
  • Somewhat analogous to the back-pressure effect
  • Monitors DHT PUT ACKs
  • Load balancing through learning of global join
    key distribution
  • Piggyback stats on other messages
  • Don't need global information, only stats about
    processing buddies (nodes with which we
    communicate)
  • A different sample than the local one may or may
    not be better
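Two of these policies can be sketched as interchangeable modules. The `choose` interface is a hypothetical one, and the outstanding-PUT counters stand in for monitoring DHT PUT ACKs:

```python
import random

class RandomPolicy:
    """Random routing: pick any operator the tuple has not visited yet."""
    def choose(self, tuple_, pending):
        return random.choice(pending)

class QueueLengthPolicy:
    """Back-pressure style routing: prefer the destination with the
    fewest outstanding (un-ACKed) DHT PUTs."""
    def __init__(self):
        self.outstanding = {}            # operator -> un-ACKed PUT count

    def on_put(self, op):
        # Called when a tuple is rehashed toward op
        self.outstanding[op] = self.outstanding.get(op, 0) + 1

    def on_ack(self, op):
        # Called when the DHT acknowledges a PUT for op
        self.outstanding[op] -= 1

    def choose(self, tuple_, pending):
        # Route to the least-backed-up destination
        return min(pending, key=lambda op: self.outstanding.get(op, 0))
```

A node whose PUTs are slow to be acknowledged accumulates queue length and automatically receives fewer tuples, which is the back-pressure effect the slide alludes to.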

14
Implementation and Experimental Setup
  • Design Decisions
  • Simplicity is key
  • Roughly 300 NCSS (non-comment source statements;
    PIER is about 5300)
  • Single query processing operator
  • Separate routing policy module loaded at query
    time
  • Possible routing orders determined by simple
    optimizer
  • Required generalizations to the PIER execution
    engine to deal with generic operators
  • Allow PIER to run any dataflow operator
  • Simulator with 256 nodes, 100 tuples/table/node
  • Feasibility, not scalability
  • In the absence of global (or stale) knowledge, a
    static optimizer could choose any join ordering →
    we compare Freddy performance to all possible
    static plans

15
3-way join
  • R join S join T
  • R join S is highly selective (drops 90% of tuples)
  • S join T is expensive (multiplies the tuple count
    by 25)
  • Possible static join orderings

[Diagram: the possible static join trees over R, S, and T.]
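Why the ordering matters can be worked out directly. Assuming 100 S tuples, reading "drops 90" as 90% of tuples dropped, and a fan-out of 25 for S join T:

```python
S = 100                        # tuples of S entering the Freddy

# Order 1: apply the selective join first
after_rs = S * 0.10            # R join S drops 90% -> 10 tuples
order1_intermediate = after_rs
final1 = after_rs * 25         # then S join T fans out x25 -> 250

# Order 2: apply the expensive join first
after_st = S * 25              # S join T fans out x25 -> 2500 tuples
order2_intermediate = after_st
final2 = after_st * 0.10       # then R join S drops 90% -> 250
```

Both orders emit the same 250 answers, but the selective-first order ships 10 intermediate tuples through the DHT instead of 2500; that gap is what a good routing policy can exploit.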
16
3-Way Join Results
17
4-way join
  • R join S join T join U
  • S join T is still expensive
  • Possible static join orderings

[Diagram: the possible static join orderings over R, S, T, and U. Note: a traditional optimizer can't make one of these plans.]
18
4-Way Join
19
The Promise of Routing Policy
  • Illustrative example of how routing policy can
    improve performance
  • This is not meant to be an exhaustive comparison
    of policies, but rather to show the possibilities
  • EddyQL considers the number of outstanding PUTs
    (queue length) to decide where to send each tuple

20
Conclusions and Continuing Work
  • Freddies provide adaptive query processing in a
    P2P system
  • Require no global knowledge
  • Baseline performance shows promise for smarter
    policies
  • In the future
  • Explore Freddy performance in a dynamic
    environment
  • Explore more complex routing policies

21
Questions? Comments? Snide remarks for Ryan?
Glorious praise for Shawn?
Thanks!