Adaptive Query Processing with Eddies

About This Presentation

Title:

Adaptive Query Processing with Eddies

Description:

Junior. Joe. Level. Name. Enrolled Courses. Students. Enrolled ... Junior. Joe. Junior. Joe. Jr. Joe. No matches; Eddy processes. the next tuple. Output ... – PowerPoint PPT presentation

Number of Views:78

Avg rating:3.0/5.0

Slides: 88

Provided by: mega49

Learn more at: http://www.cs.umd.edu

Category:

more less

Transcript and Presenter's Notes

Title: Adaptive Query Processing with Eddies

1
Adaptive Query Processing with Eddies

Amol Deshpande
University of Maryland

2
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

3
Query Processing in Database Systems
Declarative Query
Database System
Results
4
Query Processing Example
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Database System
Students
Enrolled
Courses
5
Query Processing Example
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Courses
Students
Enrolled
6
Example Query Execution Plans
SEC
SEC
CE
SE
S
C
Students
Courses
C
S
E
E
Courses
Students
Enrolled
Enrolled
A Query Execution Plan
An alternate Execution Plan
7
Cost-based Query Optimization
Estimate cost of each plan and choose the best
SEC
Cost g(SE, C, R)
Input sizes

SE
C
Cost f(S, E, R)
Courses

S
E
Students
Enrolled
Runtime Parameters
Cost (Plan)
A Query Execution Plan
8
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Disk(s)
9
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Wide area data sources e.g. remote tables,
web data sources
Disk(s)
10
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Streaming data e.g. Stock tickers
Network logs Sensor networks
11
Estimation Errors
Cost g(SE, C, R)
SEC
SE
C
Courses
S
E
Students
Enrolled
A Query Execution Plan
12
Estimation Errors
Cost g(SE, C, R)
SEC
SE
Unknown runtime parameters
C
Courses
S
E
Students
Enrolled
A Query Execution Plan
13
How to solve this problem ?

More sophisticated estimation techniques
Sophisticated summary structures
e.g. MHists PI97, Wavelets VWI98
Feedback loop in the optimization process
e.g. SLMK01, BC02
Adaptive query processing
Cant always build and maintain synopses
Runtime environments can be very unpredictable
Soadapt query plans mid-way during execution

14
Eddies Extreme Adaptivity
static plans
per tuple
inter- operator
late binding
intra- operator
Dynamic QEP, Parametric, Competitive
Traditional DBMS
Query Scrambling, MidQuery Re-opt
XJoin, DPHJ Convergent QP
Eddies

Telegraph TelegraphCQ (at UC Berkeley)
Eddies AH00
SteMs RDH03
Continuous queries MSHR02, CF02, C03, K03
Implementation in PostgreSQL Des04
Fault-tolerance and load balancing SHB04
STAIRs DH03
Other work
Distributed eddies, Content-based Routing BB05

15
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

16
Eddies AH00
select from S where pred1(S) and pred2(S)
Plans considered by the optimizer
pred2(S)
pred1(S)
S
Output
pred1(S)
pred2(S)
S
Output
Decision made apriori based on statistics Sort
by (1-s)/c, where s selectivity, c cost
17
Eddies AH00
select from S where pred1(S) and pred2(S)
Executing the query using an Eddy

An eddy operator
Intercepts tuples from source(s) and output
tuples from operators
Query executed by routing tuples between the
operators
Uses feedback from the operators to route

Change routing gt Change query
execution plan used
18
Per-tuple State
select from S where pred1(S) and pred2(S)
Executing the query using an Eddy
pred2(S)
Eddy
Output
S

Two Bitmaps
Ready bits - which operators can a tuple be
routed to next
Done bits - which operators has a tuple already
been through

pred1(S)
Example Ready(t1) 1, 1 - can be routed to
either Done(t1) 0, 0 - not done either
Example Ready(t2) 1, 0 - can be routed to
pred1 Done(t2) 0, 1 - done pred2
For selection queries, ready is a bit-complement
of done
19
Eddies Routing Policy

Choosing which operator to route a given tuple to
The brain of the eddy

Send here 99 of the time Send to the other
operator 1 of the time
Lottery Scheduling Avnur 00 Simplified
Description 1. Maintain for each operator
tuples sent tuples returned
cost per tuple 2. Choose (roughly) based on the
above 3. Explore by randomly sending tuples in
the wrong orders
sent 100 received 2
pred2(S)
Eddy
Output
S
pred1(S)
sent 10 received 20
20
A Join Query
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Courses
Students
Enrolled
21
Eddies AH00
Query execution using an eddy
A traditional query plan
A key difference Tuples cant be arbitrarily
routed to any operator E.g. S tuples cant be
routed to E Join C Use ready bits to identify
this
22
Query Execution using Eddies
Probe to find matches
Insert with key hash(joe)
Eddy
S
E
Output
C
No matches Eddy processes the next tuple
23
Query Execution using Eddies
Probe
Insert
Eddy
S
E
Output
C
24
Query Execution using Eddies
Probe
Eddy
S
E
Output
C
Probe
25
Per-tuple State
Eddy
S
E
Output
C
26
Per-tuple State
Eddy
S
E
Output
C
27
Per-tuple State
Eddy
S
E
Output
C
28
Eddies Postmortem
Output
Output
E C
E S
S E
Courses
C E
Students
Students
Enrolled
Courses
Enrolled
Eddy executes different query execution plans for
different parts of data
29
Joins and Lottery Scheduling

Lottery scheduling doesnt work well with joins

30
Example Delayed Data Sources
SETUP
gtgt
Execution plan 1
Execution plan 2
SEC
SEC
CE
SE
S
C
C
E
S
E
Cost (Plan 1) gt Cost (Plan 2)
31
Example Delayed Data Sources
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
time
32
SETUP E and C arrive early S is
delayed
gtgt
S0
sent and received suggest S Join E is better
option
S
E
S
S0
S S0
E
C
time
C
S0E
SE
(S S0)E
Eddy learns the correct sizes
Too Late !!
33
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
C
SE
S
E
Execution Plan Used
Query is executed using the worse plan.
Too Late !!
34
Joins and Lottery Scheduling

Lottery scheduling doesnt work well with joins
Not clear how any routing policy can work without
reasonable knowledge of future
Whatever the current state in the join operators,
an adversary can send tuples to make it look very
bad
Two possible solutions
Allow manipulation of state (STAIRs) DH04
Dont embed state in the operators (SteMs)
RDH03

35
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

36
STAIRs DH04

Expose join state to the eddy
Provide state management primitives
That guarantee correctness of execution
That can be used to manipulate embedded state in
the operators
Also allow support for cyclic queries etc

37
New Operator STAIR
S E
HashTable S.Name
HashTable E.Name

Eddy
S
Output
E
C

HashTable E.Course
HashTable C.Course
E C
38
New Operator STAIR
Storage, Transformation and Access for
Intermediate Results
39
Query execution using STAIRS

Similar to using Join Operators

Probe into E.Name STAIR
s1
s1
s1
s1
40
STAIR Operations

Build (insert)
Insert the given tuple into the STAIR
Probe (lookup)
Find matching tuples for the given tuple
State Management Operations
Demotion
Promotion

41
State Management Primitive Demotion

Replace a tuple in a STAIR with a projection of
that tuple

S.Name STAIR
HashTable

E.Name STAIR
s1
Demoting e2c1 to e2
HashTable

e1
e2c1
e2
Eddy
S
E
Output
C
HashTable

e2
s1e1
HashTable

c1
E.Course STAIR
Can be thought of as undoing work
C.Course STAIR
42
State Management Primitive Promotion

Replace a tuple in a STAIR with the result of
joining it with other tuples

S.Name STAIR

Two arguments
A tuple
A join to be used to promote this
tuple

HashTable

E.Name STAIR
s1
HashTable

e1
e1c1
e2c1
Eddy
S
E
Output
C
HashTable

e2
s1e1
HashTable

c1
e1
E.Course STAIR
Can be thought of as precomputation of work
C.Course STAIR
43
STAIRs Correctness

Theorem For any sequence of applications of the
state management operations, STAIRs will produce
the correct query output.
STAIRs will produce every result tuple
There will be no spurious duplicates

44
Lifting Burden of History Delayed Data Sources
45
SETUP E and C arrive early S is
delayed
gtgt
S0
S
E
S0
E
C
time
C
S0E
Eddy learns the correct selectivities
46
SETUP E and C arrive early S is
delayed
gtgt
S0
E.Name STAIR

HashTable
S
E
E
Eddy
S
C
E
Output
C
time
E

HashTable
C
Eddy decides to migrate E
Eddy learns the correct selectivities
By promoting E using E C
C.Course STAIR
47
SETUP E and C arrive early S is
delayed
gtgt
S.Name STAIR

HashTable
S
S0
E.Name STAIR

HashTable
S
S S0
S S0
E
Eddy
S
C
(S S0) E C
E
Output
C
time
E

HashTable
C
C.Course STAIR
48
S.Name STAIR

HashTable
S
E.Name STAIR

HashTable
UNION
Eddy
S
E
Output
C
E

HashTable
C
Most of the data is processed using the correct
plan
C.Course STAIR
49
Further Motivating Adaptive State Management

Eager pre-computation for faster response times
Query scrambling UFA98
Partial results RH02
Selective caching of intermediate results
Continuous queries over streams
Cyclic queries
Adapting the join spanning tree used

50
Making State Migration Decisions

Another policy question
Optimal migration decisions
Requires knowledge of future selectivities and
the sizes of relations

51
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

52
Alternative SteMs RDH03

Dont embed the state in the operators at all
Note Not the original motivation for SteMs
Focus was on increasing opportunities for
adaptivity by breaking up the join operators
We will focus on a very simplistic version of the
operator

53
Query Execution using SteMs
54
Query Execution using SteMs
S SteM
Probe

Insert
E SteM

Eddy
S
E
C
C SteM

Probe
55
Query Execution using SteMs

State inside the operators is independent of
previous routing decisions
Because no intermediate tuples are ever stored
Doesnt have the same problem as the join or
STAIR operators
Optimal routing policy easy to write down
Similarities to queries with only selections
But not storing intermediate results increases
the computation cost significantly

56
SteMs Drawbacks

Recomputation of intermediate result tuples
Constrained plan choices
Available plans depend highly on the arrival
order

57
SETUP E and C arrive early S is
delayed
gtgt
S0
S SteM

S
S0
E SteM
E

E
C
Eddy
S
E
C
C SteM
time

C
Under the mechanism, there is no way to execute
the other plan for this setup
58
SteMs Drawbacks

Recomputation of intermediate result tuples
Constrained plan choices
Available plans depend highly on the arrival
order
Though more subtle, the second drawback might be
the more important one

59
Recap

An eddy operator
Can affect the query execution plan(s) used by
routing different tuples differently
Eddy w/ Selections
Well understood
Even if selections are correlated
Babu, Munagala et al SIGMOD 2004, ICDT 2005

60
Recap

Eddies for multi-way joins
Opportunities for adaptivity depend on the join
operators used
Higher adaptivity tends to push logic into the
eddy gt Routing policies very important

Sort-merge Hybrid-Hash
Index-nested loop joins
Nested-loop Joins
Pipelined/ Symmetric Hash Join
SteMs/ STAIRs
Blocking opeators Little adaptivity
Similarities to selections
Suffers from state accumulation problems
Policy issues not well-understood
See AH00
61
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

62
Implementation Details

In PostgreSQL Database System code base
In the context of TelegraphCQ project
Highly efficient implementation SIGREC04
Eddy, SteMs, STAIRs export get_next() functions
Routing decisions are made per batch
Can control batch size
Routing decisions made for all possible ready
bitmaps
Decisions are encoded in arrays that are indexed
with ready bits
Efficiently find the operator to route to

63
Results - Overheads (1)
All plans have identical costs, so adaptivity
plays no role
64
Results - Overheads (2)
65
Policies used for experiments

Routing policy
Observe
Selectivities of predicates on base tables
Domain sizes of join attributes
Compute join selectivities and use them to route
tuples
Migration policy
Tie state migration decisions to routing
decisions
Follow the routing policy decisions to make sure
that most tuples are routed correctly
Caveats
May end doing migrations late in the query
execution
May thrash

66
State Migration Illustrative Example
select from customer c, orders o, lineitem l
where c.custkey o.custkey and o.orderkey
l.orderkey and c.nationkey 1 and c.acctbal
gt 9000 and l.shipdate gt date 1996-01-01
Setup lineitem arrives sorted on shipdate gt
selectivity(l.shipdate gt ) very low
initially gt orders routed to join with
lineitem (bad) No explicit delays introduced
67
Illustrative Example (1)
68
Illustrative Example (2)
69
Experiments Synthetic Workload

Modeled after the Wisconsin Benchmark
20 Tables for varying sizes
Randomly generated queries
Environment
Rates proportional to table sizes no delays or
Random initial delays introduced or
Random data rates

70
Traditional vs STAIRs
71
SteMs vs STAIRs
72
Joins vs STAIRs
73
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

74
Continous Query Processing

Eddies ideal for executing continuous queries
over data streams
Dynamic runtime conditions make a static plan
unsuitable
Queries typically executed over sliding windows
Find average over last one week
Note Continuous vs Multi-query processing
Not identical
Data streams literature does not make this
difference explicit
Application environments tend to have a large
number of simultaneous queries

75
Continous Query Processing

CACQ Madden et al 2002
Focus on sharing work as much as adaptivity
Uses SteMs augmented with a deletion operator
To handle sliding windows
Also uses predicate indexes
For handling a large number of queries on the
same set of streams but with different predicates
E.g. millions of stock alerts over a few streams

76
Roadmap

Adaptive Query Processing Motivation
Eddies AH00
STAIRs DH04 and SteMs RDH03
Experimental Study
Implementation in PostgreSQL Des03
Continuous queries MSHR02 (very briefly)
Open problems

77
Some open problems (1)

Eddies for continuous query processing
Much work since CACQ, but not a solved problem
E.g. computational inefficiency of SteMs
Many other proposed CQ architectures face the
same problem
MJoins (NiagaraCQ)
Stanford STREAM processor (earlier version)
Later added intermediate result caches
Note These two dont use eddies explicitly
Routing policies for CQ still an open question
Different from routing policies for non-CQ queries

78
Some open problems (2)

Routing policies
Whether eddies will succeed depends on the
routing policies
Little work so far...
SteMs, STAIRs
Theoretical analysis of optimization space, and
practical viability analysis needed
Especially in the context of continuous query
processing

79
Some open problems (3)

Eddies for multi-query processing (non-CQ)
SteMs may be sufficient for CQ processing, but
not for normal multi-query processing
Parallel, distributed environments, P2P, Grid..
Disk
Flexibility demanded by adaptive techniques at
odds against the careful scheduling typically
done by DBMSs
XJoins
Very little work on understanding this

80
Some open problems (4)

Optimization with expanded plan space
Eddies can explore a plan space much larger than
traditional plan space
They allow relations to be broken into pieces,
with each piece executed separately
Can we explore this plan space in a non-adaptive
setting ?
Recent work on
Conditional Planning Deshpande et al, ICDE 2005
Content-based Routing Babu et al, VLDB 2005

81
Summary

Increasing need for adaptivity
Eddy A highly adaptive query processor
Executes queries by routing tuples through
operators
SteMs, STAIRs
New operators proposed to handle problems with
traditional join operators
Very promising especially for continuous and
wide-area query processing
Exciting research lies ahead

82
The End

Questions ?

83
Fatal Flaw Burden of Routing History
Routing decisions get embedded in the state
Eddy
S
E
Output
C
Future adaptibility is severly constrained
84
Example Delayed Data Sources
SETUP
gtgt
Execution plan 1
Execution plan 2
SEC
SEC
CE
SE
S
C
C
E
S
E
Cost (Plan 1) gt Cost (Plan 2)
85
Example Delayed Data Sources
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
time
A plan may have to be chosen without any
statistical information about the data
Earliest time sufficient information may be
available to choose optimal plan
86
Tricky State Configurations 1

Want to undo the decision to route E1 to S E

E1
S0
E2C
Result S0EC already produced
C
S0E1
E2
87
Tricky State Configurations 2
S E
HashTable E.Name

HashTable S.Name
S
E1
E2C1
E2C2I

HashTable E.Course
HashTable C.Course
Eddy
S
C1
SE1
E
C2I
E2
C
I
E C

HashTable C.Intstructor
HashTable I.Instructor
I
C2
SE1C1
SE2C1
C I

Write a Comment

User Comments (0)