Freddies: DHT-Based Adaptive Query Processing via Federated Eddies

Transcript and Presenter's Notes

1
Freddies: DHT-Based Adaptive Query Processing
via Federated Eddies
  • Ryan Huebsch
  • Shawn Jeffery
  • CS 294-4 Peer-to-Peer Systems
  • 12/9/03

2
Outline
  • Background: PIER
  • Motivation: Adaptive Query Processing (Eddies)
  • Federated Eddies (Freddies)
  • System Model
  • Routing Policies
  • Implementation
  • Experimental Results
  • Conclusions and Continuing Work

3
PIER
  • Fully decentralized relational query processing
    engine
  • Principles
  • Relaxed consistency
  • Organic Scaling
  • Data in its Natural Habitat
  • Standard Schemas via Grassroots software
  • Relational queries can be executed in a number of
    logically equivalent ways
  • An optimization step chooses the best-performing
    plan
  • Currently, PIER has no means to optimize queries

4
Adaptive Query Processing
  • Traditional query optimization occurs at query
    time and is based on statistics. This is hard
    because
  • Catalog (statistics) must be accurate and
    maintained
  • Cannot recover from poor choices
  • The story gets worse!
  • Long running queries
  • Changing selectivity/costs of operators
  • Assumptions made at query time may no longer hold
  • Federated/autonomous data sources
  • No control/knowledge of statistics
  • Heterogeneous data sources
  • Different arrival rates
  • Thus, Adaptive Query Processing systems attempt
    to change execution order during the query
  • Query Scrambling, Tukwila, Wisconsin, Eddies

5
Eddies
  • Eddy: a tuple router that dynamically chooses the
    order of operators in a query plan
  • Optimize query at runtime on a per-tuple basis
  • Monitors selectivities and costs of operators to
    determine where to send a tuple to next
  • Currently centralized in design and
    implementation
  • Some other efforts for distributed Eddies from
    Wisconsin and Singapore (neither uses a DHT)
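The core eddy idea can be sketched in a few lines of Python. This is an illustrative toy, not PIER's or the original Eddies implementation: per-operator selectivity is estimated online as tuples flow through, and each tuple is routed to the most selective (most tuple-dropping) remaining operator first.

```python
class ToyEddy:
    """Toy eddy: routes each tuple through all operators, preferring
    operators that have dropped the most tuples so far."""

    def __init__(self, operators):
        self.operators = operators                     # name -> predicate
        self.seen = {n: 1 for n in operators}          # tuples routed in
        self.passed = {n: 1 for n in operators}        # tuples that survived

    def selectivity(self, name):
        # Fraction of routed tuples the operator has let through
        return self.passed[name] / self.seen[name]

    def route(self, tuple_):
        done = set()
        while len(done) < len(self.operators):
            # Pick the remaining operator with the lowest pass rate
            name = min((n for n in self.operators if n not in done),
                       key=self.selectivity)
            self.seen[name] += 1
            if not self.operators[name](tuple_):
                return None                            # tuple eliminated
            self.passed[name] += 1
            done.add(name)
        return tuple_                                  # survived all operators
```

Routing the integers 0-19 through two predicates yields the same answers regardless of the learned order; only the work per tuple differs, which is exactly the property a per-tuple router exploits.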

6
Why use Eddies in P2P? (The easy answers)
  • Much of the promise of P2P lies in its fully
    distributed nature
  • No central point of synchronization → no central
    catalog
  • Distributed catalog with statistics helps, but
    does not solve all problems
  • Possibly stale, hard to maintain
  • Need CAP to do the best optimization
  • No knowledge of available resources or the
    current state of the system (load, etc)
  • This is the PIER Philosophy!
  • Eddies were designed for a federated query
    processor
  • Changing operator selectivities and costs
  • Federated/heterogeneous data sources

7
Why Eddies in P2P? (The not so obvious answers)
  • Available compute resources in a P2P network are
    heterogeneous and dynamically changing
  • Where should the query be processed?
  • In a large P2P system, local data distributions,
    arrival rates, etc. may differ from the global ones

8
Freddies: Federated Eddies
  • A Freddy is an adaptive query processing operator
    within the PIER framework
  • Goals
  • Show feasibility of adaptive query processing in
    PIER
  • Build foundation and infrastructure for smarter
    adaptive query processing
  • Establish baseline for Freddy performance to
    improve upon with smarter routing policies

9
An Example Freddy
[Diagram: tuples of R, S, and T arrive from the DHT via Get(R), Get(S), and Get(T); the Freddy routes them through the local operators R join S and S join T, rehashing matches into the DHT with Put(Join Value RS) and Put(Join Value ST), and sends finished tuples to the Output.]
10
System Model
  • Same functionality as centralized Eddy
  • Allows easy concept reuse
  • Freddy uses its Routing Policy to determine the
    next operator for a tuple
  • Tuples in a Freddy are tagged with DoneBits
    indicating which operators have processed them
  • Freddy does all state management, thus existing
    operators require no modifications
  • Local processing comes first (in most cases)
  • Conserve network bandwidth
  • Not as simple as it seems
  • The Freddy must decide how to rehash each tuple
  • This determines the join order
  • Challenge: the routing decision is decoupled from
    the operator, so most Eddy techniques are no
    longer valid

11
Query Processing in Freddies
  • Query origin creates a query plan with a Freddy
  • Possible routings determined at this time, but
    not the order
  • Freddy operators on all participating nodes
    initiate data flow
  • As tuples arrive, the Freddy determines the next
    operator for this tuple based on the DoneBits and
    routing policy
  • Source tuples tagged with clean DoneBits and
    routed appropriately
  • When all DoneBits are set, the tuple is sent to
    the output operator (return to query origin)
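The per-tuple flow above can be sketched with DoneBits as a plain bitmask. The function and policy signatures here are illustrative assumptions, not PIER's actual interfaces:

```python
NUM_OPS = 3                      # e.g. two joins plus a selection
ALL_DONE = (1 << NUM_OPS) - 1    # 0b111: every operator has seen the tuple

def freddy_process(tuple_, done_bits, operators, policy):
    """Route one tuple until its DoneBits are all set, then emit it.

    operators: list of functions, one per bit position; return None to drop
    policy:    picks the next operator index among the unset bits
    """
    while done_bits != ALL_DONE:
        pending = [i for i in range(NUM_OPS) if not done_bits & (1 << i)]
        nxt = policy(tuple_, pending)     # the routing decision
        tuple_ = operators[nxt](tuple_)
        if tuple_ is None:                # operator dropped the tuple
            return None
        done_bits |= 1 << nxt             # mark this operator as applied
    return tuple_                         # send to the output operator

# Source tuples enter with clean DoneBits (done_bits = 0); a tuple arriving
# from another node carries whatever bits that node already set.
```

Because the state lives in the tag rather than in the operators, existing operators need no modification, matching the system model above.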

12
Tuple Routing Policy
  • Determines to which operator to send a tuple
  • Local information
  • Messages expensive
  • Monitor local usage and adjust locally
  • Processing Buddy information
  • During processing, discover general trends in
    input/output nodes' processing capabilities,
    output rates, etc.
  • For instance, we may want to alert the previous
    Freddy of poor PUT decisions
  • Design space is huge → a large research area

13
Freddy Routing Policies
  • Simple (KISS)
  • Static
  • Random: not as bad as you may think
  • Local Stat Monitoring (sampling)
  • More complex
  • Queue lengths
  • Somewhat analogous to the back-pressure effect
  • Monitors DHT PUT ACKs
  • Load balancing through learning of global join
    key distribution
  • Piggyback stats on other messages
  • Don't need global information, only stats about
    processing buddies (nodes with which we
    communicate)
  • A different sample than the local one may or may
    not be better
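Two of these policies can be sketched as interchangeable modules. The `choose` interface is a hypothetical one, and the outstanding-PUT counters stand in for monitoring DHT PUT ACKs:

```python
import random

class RandomPolicy:
    """Random routing: pick any operator the tuple has not visited yet."""
    def choose(self, tuple_, pending):
        return random.choice(pending)

class QueueLengthPolicy:
    """Back-pressure style routing: prefer the destination with the
    fewest outstanding (un-ACKed) DHT PUTs."""
    def __init__(self):
        self.outstanding = {}            # operator -> un-ACKed PUT count

    def on_put(self, op):
        # Called when a tuple is rehashed toward op
        self.outstanding[op] = self.outstanding.get(op, 0) + 1

    def on_ack(self, op):
        # Called when the DHT acknowledges a PUT for op
        self.outstanding[op] -= 1

    def choose(self, tuple_, pending):
        # Route to the least-backed-up destination
        return min(pending, key=lambda op: self.outstanding.get(op, 0))
```

A node whose PUTs are slow to be acknowledged accumulates queue length and automatically receives fewer tuples, which is the back-pressure effect the slide alludes to.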

14
Implementation and Experimental Setup
  • Design Decisions
  • Simplicity is key
  • Roughly 300 NCSS (non-comment source statements;
    PIER is about 5300)
  • Single query processing operator
  • Separate routing policy module loaded at query
    time
  • Possible routing orders determined by simple
    optimizer
  • Required generalizations to the PIER execution
    engine to deal with generic operators
  • Allow PIER to run any dataflow operator
  • Simulator with 256 nodes, 100 tuples/table/node
  • Feasibility, not scalability
  • In the absence of global (or stale) knowledge, a
    static optimizer could choose any join ordering →
    we compare Freddy performance to all possible
    static plans

15
3-way join
  • R join S join T
  • R join S is highly selective (drops 90% of tuples)
  • S join T is expensive (multiplies the tuple count
    by 25)
  • Possible static join orderings

[Diagram: the possible static join trees over R, S, and T.]
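Why the ordering matters can be worked out directly. Assuming 100 S tuples, reading "drops 90" as 90% of tuples dropped, and a fan-out of 25 for S join T:

```python
S = 100                        # tuples of S entering the Freddy

# Order 1: apply the selective join first
after_rs = S * 0.10            # R join S drops 90% -> 10 tuples
order1_intermediate = after_rs
final1 = after_rs * 25         # then S join T fans out x25 -> 250

# Order 2: apply the expensive join first
after_st = S * 25              # S join T fans out x25 -> 2500 tuples
order2_intermediate = after_st
final2 = after_st * 0.10       # then R join S drops 90% -> 250
```

Both orders emit the same 250 answers, but the selective-first order ships 10 intermediate tuples through the DHT instead of 2500; that gap is what a good routing policy can exploit.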
16
3-Way Join Results
17
4-way join
  • R join S join T join U
  • S join T is still expensive
  • Possible static join orderings

[Diagram: the possible static join orderings over R, S, T, and U. Note: a traditional optimizer can't make one of these plans.]
18
4-Way Join
19
The Promise of Routing Policy
  • Illustrative example of how routing policy can
    improve performance
  • This is not meant to be an exhaustive comparison
    of policies, but rather to show the possibilities
  • EddyQL considers the number of outstanding PUTs
    (queue length) to decide where to send each tuple

20
Conclusions and Continuing Work
  • Freddies provide adaptive query processing in a
    P2P system
  • Require no global knowledge
  • Baseline performance shows promise for smarter
    policies
  • In the future
  • Explore Freddy performance in a dynamic
    environment
  • Explore more complex routing policies

21
Questions? Comments? Snide remarks for Ryan?
Glorious praise for Shawn?
Thanks!