Adaptive Query Processing

About This Presentation

Title:

Adaptive Query Processing

Description:

Actuate. Measure: monitor cardinalities, resource utilization, etc ... Actuate: switch to a new query plan, hopefully without wasting too much work ... – PowerPoint PPT presentation

Number of Views:93

Avg rating:3.0/5.0

Slides: 58

Provided by: Shan218

Learn more at: http://www.cs.umd.edu

Category:

more less

Transcript and Presenter's Notes

Title: Adaptive Query Processing

1
Adaptive Query Processing

Amol Deshpande, University of Maryland
Vijayshankar Raman, IBM Almaden Research Center

Title slide
2
Goal

Research in query processing and optimization has
taken a new track into adaptive query processing
(QP)
Most new research projects that need QP use some
adaptive approach
Whether regular QP, XML QP, text, continuous QP,
web QP,
Selinger-style query processing fallen out of
flavor
Also true in industrial projects, especially new
ones.
Educate audience on what this is all about
Where adaptive QP works, where it has known
problems, what is unknown
What is Out of Scope of this tutorial
Parallelism, distributed eddies, continuous query
processing

3
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Late binding and parametric optimization
Competition
Mid-query reoptimization
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

4
Background on Non-Adaptive Query Processing

Central value proposition of relational model
declarative queries
You decide what data access (query) you want to
do the DBMS will figure out the right
algorithm (plan)
right usually translated to good enough
Standard method of realizing this cost-based
query optimization

5
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Disk(s)
6
Non-Adaptive Query Processing

Several Improvements over the years
Wider plan space
Started out with left deep tree of binary join
operators
Now
lots more operators
Many more query transformations
multi-block SQL
bushy plans
Plan space has grown to be huge and messy
Use of statistics to get better costs
Better dynamic programming techniques
(transformational, memoization)
Good surveys exist Graefe 1993, Chaudhuri
1998

7
Success of Non-Adaptive QP

Used pretty much everywhere
every commercial DBMS uses some form of
cost-based optimization
Why?
Plan space is too wide and combinatorial
Other ideas like randomized optimization simply
dont search widely enough(e.g. all possible
ways of using indexes)
Benefits from 25 years of tuning
Widely viewed as success story for relational
databases, and as vindication of relational
model
Some qualifications
DBAs love and demand control (hints,
syntax-direction, plan-fixing across releases, )
Applications often help the query processor by
splitting a query into multiple phases (precursor
of adaptivity)

8
Where does traditional optimization break (1)

In recent years, optimizers have started to break
at the seams as they are extended to new
environments

SELECT p.profile_id, p.account_id,
p.profile_title, p.profile_default,
r.profile_detail_id AS registrant_detail_id,
r.profile_first_name AS registrant_first_name,
r.profile_last_name AS registrant_last_name,
r.profile_job_title AS registrant_job_title,
r.profile_organization_name AS registrant_organiza
tion_name, r.profile_address1 AS
registrant_address1, r.profile_address2 AS
registrant_address2, r.profile_city AS
registrant_city, r.state_id FROM
TABLE_ACCOUNT_PROFILES p LEFT JOIN
TABLE_ACCOUNT_PROFILE_DETAILS r ON p.profile_id
r.profile_id AND r.profile_type_id 1 LEFT JOIN
TABLE_STATES rs ON r.state_id rs.state_id LEFT
JOIN TABLE_COUNTRIES rc ON
Results
very complex user queries
Query Optimizer
Query Executor
Declarative Query
Disk(s)
9
Where does traditional optimization break (1)

In recent years, optimizers have started to break
at the seams as they are extended to new
environments

wide area federations
SELECT p.profile_id, p.account_id,
p.profile_title, p.profile_default,
r.profile_detail_id AS registrant_detail_id,
r.profile_first_name AS registrant_first_name,
r.profile_last_name AS registrant_last_name,
r.profile_job_title AS registrant_job_title,
r.profile_organization_name AS registrant_organiza
tion_name, r.profile_address1 AS
registrant_address1, r.profile_address2 AS
registrant_address2, r.profile_city AS
registrant_city, r.state_id FROM
TABLE_ACCOUNT_PROFILES p LEFT JOIN
TABLE_ACCOUNT_PROFILE_DETAILS r ON p.profile_id
r.profile_id AND r.profile_type_id 1 LEFT JOIN
TABLE_STATES rs ON r.state_id rs.state_id LEFT
JOIN TABLE_COUNTRIES rc ON
Results
very complex user queries
Query Optimizer
Query Executor
Declarative Query
data streams
X M L
10
Where does traditional optimization break (2)

Bad Statistics
DBA-challenged environment is the norm
New data types like XML and text
Deep Web and federated systems
Correlations (many optimizers implement only
marginal histograms)
Very dynamic environments
Continuous queries query runs for ever, so
things change
Wide area federations

11
Where does traditional optimization break (3)

Complex queries
Large numbers of tables (usually bail to greedy
beyond 10-20 tables in a join)
Queries with Parameter markers
Select average(salary) from Employee where age gt
1 and age lt 2
Need for sharing work between queries esp. in
continuous queries
Asynchronous (push) data sources (e.g. data
streams)
New Metrics for QP
E.g., Interactive metric
User preferences change too quickly for
optimization to pay off
Pipelined plans focus on early results

12
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Late binding and parametric optimization
Competition
Mid-query reoptimization
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

13
Idea of Adaptive Query Processing

Interleave the optimization and execution stages
We learn better statistics as the query executes
Two main styles

Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
14
Idea of Adaptive Query Processing

Interleave the optimization and execution stages
We learn better statistics as the query executes
Two main styles

Measure monitor cardinalities, resource
utilization, etc
Analyze verify compliance with performance goals
(very tricky problem during intermediate stages
of a query)
Plan can involve optimizer, or be a routing
policy
Actuate switch to a new query plan, hopefully
without wasting too much work

16
Dimensions of Adaptation
Measure
Analyze
Actuate
Plan

Traditional database system use a simple form of
inter-query adaptation
Measurements done on the tables directly, and
analyzed during query optimization time
Also, by measuring the cardinalities during
execution
Adaptive selectivity estimation
Chen/Roussopoulos 1994
LEO Volker et al 2001
SITS Bruno/Chaudhuri 2002
But always a single plan is used to execute the
query fully

17
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Late binding and parametric optimization
Competition
Mid-query reoptimization
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

18
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
19
Late Binding and Parametric Optimization

Idea let optimizer pick multiple plans for each
query
Choose the best one just before execution

Pick one and Actuate
Plan
Q
Plan
Plans
20
Late Binding and Parametric Optimization

Parametric optimization Graefe/Cole 1994,
Ioannidis 1992, Ganguly 1998, etc.
Pick optimal plans for various regions of the
parameter spaces
Do this during dynamic programming itself
Using plausible assumptions about cost functions
Challenges
makes optimization too expensive
Many plausible assumptions dont hold in practice
(see, e.g., Haritsa 2005)
Cannot address problems that 1st crop up during
execution

selectivity 2
selectivity 1
21
Late Binding and Parametric Optimization

Variation Validity Ranges (Markl/Raman/Lohman/Pi
raheh/Simmens 2005)
Attach to each edge of a query plan, a range of
cardinalities within which plan is optimal
Easier to compute during dynamic programming
optimizer
Use numerical root finding techniques works well
with real cost functions
Extends nicely to more general notions of robust
query optimization
e.g., range within which plan is within 50 of
optimal (Raman/Haas/Reder/Markl 2005).
But, Limited effectiveness
only tells you if a plan is broken, not how to
fix it
Meshes well with mid-query reoptimization

22
Competition
Measure
Run for a while
Actuate a few
Analyze
Pick faster one
Q
Plan
Plan
Plans

E.g. DEC RDB Antonshenkov 1993 run access
methods competitively
Measure monitor cardinalities, costs of access
methods
Analyze decide which access method has higher
rate
Advantage simple method
Challenge avoid redundant work, avoid duplicates
Use a shared hash table global timestamp
Raman/Deshpande/Hellerstein 2003
Needs more evaluation of overheads

23
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
24
Mid-Query Reoptimization

Kabra et al 1998, Markl et al 2005, Babu et al
2005
Switch query execution plans at well-defined
points
Typically, materialization points
Advantage
seems to work well in practice
Independent implementations
Challenge
coarse grained adaptation
Cannot always reuse work done in prior executions

If plan is sufficiently sub-optimal, switch to
new plan
25
Measure monitoring cardinalities

During execution, monitor cardinalities on each
edge of query plan
E.g., each call to getNext() increments count
must refine for operators that are executed
multiple times(e.g., inner of nested loops
joins) e.g. Stillger/Markl/Lohman 2001
As we monitor, we narrow down range of possible
values for true cardinality (primarily the lower
bound)
Can extrapolate to expected value of true
cardinality
E.g.
Expected value of c2 expected value of c1
(current c2) / (current c1)

c1
c2
op
26
Analyze Switching plans Mid-Query

Re-optimize the original query to form a new plan
Getting a better plan
Plug in all cardinality information learned
during this query
Reusing work
Plug in fully materialized relations from current
executionas materialized views
Materialization points
Sorts
Inner of hash joins
Explicit materializations (uncorrelated inners,
common subexpressions, )

27
Analyze when to switch plans

When no tuples have been output, and
When current plan is sufficiently sub-optimal,
and
Range of true cardinalities on an edge has no
overlap with validity range for that edge
Or, expected true cardinality falls far outside
of validity range
Cost of reoptimization gt cost of new plan
remaining cost of current plan
Hard to gauge this accurately many variables
How many intermediate results we will succeed in
reusing
how much better is the new plan
Relatively easy to gauge this if we switch at a
materialization point
Because cost of reoptimization is negligible

28
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
29
Query Scrambling

Amsaleg/Franklin/Tomasic/Urhan 1996,
Urhan/Franklin/Amsaleg 1998
Designed for coping with delays in a wide area
setting

Query Processor
Measure delays
Network
Response time optimization
Wide Area Data Sources
30
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
31
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

32
Eddies Avnur/Hellerstein 2000
Eddies combine all the dimensions of adaptivity
into a single operator
Measure
Eddy
Analyze
Actuate
Plan
33
Eddies Avnur/Hellerstein 2000
A traditional query plan queries executed using
iterator (getNext) model
Query execution using an eddy
Output
Eddy
S
Output
E
C
C

An eddy operator
Intercepts tuples from sources and output
tuples from operators
Uses feedback from the operators to route

S
E
34
Example Database
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Name Level Course Instructor
Jen Senior CS2 Smith
Name Level Course
Joe Junior CS1
Jen Senior CS2
Course Instructor
CS2 Smith
Courses
Name Course
Joe CS1
Jen CS2
Name Level
Joe Junior
Jen Senior
Students
Enrolled
35
Query Execution using Eddies
Probe to find matches
Insert with key hash(joe)

HashTable S.Name
HashTable E.Name
Joe Junior
Joe Junior
Joe Junior
Eddy
S
E
Output
C

HashTable E.Course
HashTable C.Course
No matches Eddy processes the next tuple
E C
36
Query Execution using Eddies
Probe
Insert

HashTable E.Name
HashTable S.Name
Joe CS1
Joe Jr
Joe CS1
Jen Sr
Joe Jr CS1
Eddy
S
Joe CS1
E
Output
C

HashTable E.Course
HashTable C.Course
Joe Jr CS1
Joe Jr CS1
CS2 Smith
37
Query Execution using Eddies
Probe

HashTable S.Name
HashTable E.Name
Joe CS1
Joe Jr
Jen CS2 Smith
Jen Sr
Jen CS2 Smith
Jen Sr. CS2 Smith
Eddy
S
Jen Sr. CS2 Smith
Jen CS2
E
Output
C

HashTable E.Course
HashTable C.Course
Probe
Jen CS2
CS2 Smith
Joe Jr CS1
Jen CS2 Smith
Jen CS2
Note this is a symmetric hash join
38
Eddies Postmortem
Output
Output
E C
E S
S E
Courses
C E
Students
Course Instructor
CS2 Smith
Name Level
Joe Junior
Jen Senior
Students
Enrolled
Courses
Enrolled
Name Level
Joe Junior
Jen Senior
Name Course
Joe CS1
Name Course
Jen CS2
Course Instructor
CS2 Smith

Eddy executes different query execution plans for
different parts of data
Eddy adapts the join order during query execution
Access methods and join algorithms still chosen
up front

39
Eddies and Pipelined Joins

Simplest way to describe eddies is in terms of
symmetric hash join
This is the approach we used in our example
Approach extends directly to index nested loops
joins
Can also extend this to non-pipelined joins (e.g.
sort-merge, or hybrid hash)
BUT, opportunities for adaptation are much less
Only at end of building whole hash table (just
like with mid-query re-optimizations)
Generalized using SteMs and STAIRs

40
Eddies Routing Policy Non-adaptive

Choosing which operator to route a given tuple to
The brain of the eddy

Send S and E tuples here Returns S JOIN E (SE)
tuples
Non-Adaptive Optimization 1. Run a regular
optimizer to figure out the join order 2. Map
join order into order of tuple routing
Eddy
S
E
Output
C
Send SE, and C tuples here Returns SEC tuples
41
Eddies Routing Policy Lottery Scheduling

Choosing which operator to route a given tuple to
The brain of the eddy

Send here 99 of the time Send to the other
operator 1 of the time
sent 100 received 2
Lottery Scheduling Avnur 00 Simplified
Description 1. Maintain for each operator
tuples sent tuples returned
cost per tuple 2. Choose (roughly) based on the
above 3. Explore by randomly sending tuples in
the wrong orders
Eddy
S
E
Output
C
sent 10 received 20
42
Eddies Routing Policy Statistics Based

Choosing which operator to route a given tuple to
The brain of the eddy

Deshpande, Hellerstein 04 1. Learn
statistics on each input table as data streams
in 2. Choose (roughly) based on the statistics
(analogous to traditional QO)
Eddy
S
Output
E
C
43
Eddies Routing Policy Interactivity Metric

Choosing which operator to route a given tuple to
The brain of the eddy

Raman, Hellerstein 02 Use user
preferences for different kinds of tuples and
different kinds of result columns to determine
the routing order
Eddy
S
Output
E
C
44
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

45
State Modules (SteMs) More aggressive adaptation

Eddies allow dynamic adaptation of join order
Within a particular spanning tree of the join
graph
Other adaptations
Access methods especially index vs scan access
methods
Join algorithms e.g., index join (IJ) vs hash
join (HJ)
To guard against wrong cardinality estimates
To simultaneously optimize for both pipelining
completion time (IJ?HJ)
To deal with memory overflows (HJ?IJ)

46
Query Processing with SteMs

Eddy execs queries by routing tuples to SteMs and
AMs
SteM
Dictionary of homogeneous tuples
(insert) build and (search) probe operations
Access modules (AMs)
probe with tuples and get matches
One Example Symmetric Hash Join
Build SteMR
Probe SteMS with R tuple
Build SteMS
Probe SteMR with R tuple
Generalizes to n-ary join
Can simulate other join algorithms, by using
other routing policy
Thus, changing the routing allows adaptation of
join algorithm
Subtle routing constraints needed to ensure
correct execution Raman et al 2003

S bld
R bld
S probe
R probe
Eddy
repeat
S
R
47
Another application of SteMs

Sharing State across queries
E.g. 1000 queries of the formselect from R
where R.a parameter
Equivalent toselect from R, Q where R.a
Q.parameter
Eddy routes tuples from R SteM and queries from Q
SteM
Useful for continuous-query proc. and multi-query
proc
Need to extend SteMs with eviction (deletion)
operation for window joins
See Madden et. al paper in SIGMOD 2002

48
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

49
Query execution using STAIRS Deshpande/Hellerstei
n 2004(Storage, Transformation and Access for
Intermediate Results)

Extend SteMs to handle reuse of intermediate
results

Probe into E.Name STAIR
s1
s1
s1
s1
50
Agenda

Background on non-adaptive Query Processing
History, where it works, and where it breaks
Adaptive Query Processing
Dimensions of Adaptivity
Evolutionary Adaptive query processing
Tuple-wise adaptive Query Processing
Eddies, SteMs, STAIRs
Commentary
What works and what fails
Open problems

51
So where are we

Evolutionary techniques
Relatively well understood
Commercial database support in recent years
But
Dont offer fine-grained adaptivity required for
the new environments
Going back to the list of reasons why adaptive
query processing is required

52
Evolutionary Techniques

Can handle to some extent
Queries with parameter markers
Complex queries with large tables
Bad Statistics/correlations (if used in
conjunction with sophisticated synopses)
Can not easily handle
Deep web and federated systems (maybe)
Interactive query metrics
Can not handle
Very dynamic environments
Continuous queries query runs for ever, so
things change
Extensive sharing of work between queries esp.
in continuous queries
Asynchronous (push) data sources, triggers, etc.
Open
XML (depends on the complexity of the query
language)
Text and other types of complex data

53
So where are we

Revolutionary techniques
Show great promise
Have the potential to handle most of the issues
But
Much harder to implement
Seem to have non-negligible overheads in the
common case
Very immature research area
More work is needed

54
Some Open Problems

Parametric query optimization
Can potentially avoid the need for adaptive query
processing in many scenarios
Especially parameter markers and correlations
Few practical techniques so far
Optimization with expanded plan space
Eddies can explore a plan space much larger than
traditional plan space
They allow relations to be broken into pieces,
with each piece executed separately
Can we explore this plan space in a non-adaptive
setting ?
Recent work on
Conditional Planning Deshpande et al, ICDE 2005
Content-based Routing Babu et al, VLDB 2005

55
Some Open Problems

Eddies
Routing policies
Whether eddies will succeed depends on the
routing policies
Little work so far...
SteMs, STAIRs
Theoretical analysis of optimization space
Practical viability analysis
Especially in the context of continuous query
processing
Parallel, distributed environments, P2P, Grid
Disk
Flexibility demanded by adaptive techniques at
odds against the careful scheduling typically
done by DBMSs
Very little work on understanding this

56
Conclusions

Declarative queries are the fundamental
justification for the relational model
Otherwise we could all go back to CODASYL
Traditional query optimization does not work all
that well outside of a very narrow domain
Evolutionary techniques
Well understood Offer limited adaptivity
Revolutionary techniques
Hold much more promise Immature area
Very exciting research lies ahead

57
Questions ?

Write a Comment

User Comments (0)