Title: Adaptive Execution of Variable-Accuracy Functions
1 Adaptive Execution of Variable-Accuracy Functions
Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley
- VLDB Conference
- Seoul
- September 2006
2 Introduction
- Many applications apply expensive functions to streams of data
  - Finance: real-time market monitoring with securities models
  - Power Management: overload prediction using current weather conditions
  - Supply Chain Management: inventory models using RFID data to find shortages in real time
3 Continuous Queries w/ UDFs
4 The Problem
- Analytical functions can be expensive!
  - Minutes or hours per data point.
- Query processor has no control over execution of individual function calls.
  - UDF API is a black box
- Earlier work aims to avoid UDF calls
  - predicate reordering (HS93, KMPS94, CS96)
  - memoization and caching (HN96, DF05)
- Remaining calls can still be a showstopper.
5 The Intuition
- Many functions have accuracy/cost tradeoffs, e.g., iterative solvers.
- UDFs often appear in predicates and aggregates where exact answers are not required.
6 Our Solution
- VAOs (Variable Accuracy Operators)
- New query operators that
  - Expose function cost/accuracy tradeoffs using a new UDF API.
  - Exploit this tradeoff to avoid excess work while correctly answering the query.
7 VAOs - Basic Idea
- Initially run the function to obtain a coarse answer.
  - This needs to be cheaper than running to a more accurate answer.
- If more accuracy is needed, iterate!
8 Traditional Execution - Select
SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > 100
9 VAO Execution - Select
SELECT BD.bondID
FROM BondData BD, IntRate IR [Rows 1]
WHERE model(BD, IR.rate) > 100
11 VAO API
- Use iterative interface
  - Traditional: <number> f(<args>)
  - VAO: <result object> f(<args>)
    - fields for (conservative) error bounds
    - iterate() method refines bounds with more work
    - for some VAOs, also need estimates for CPU cost and error reduction of next iteration
- Useful for
  - Any sort of iterative function (e.g., root finders, numerical integration)
  - Any technique with iterative step refinement (e.g., PDEs)
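As an illustration of the iterative interface above, here is a minimal Python sketch of a result object for a bisection-based square root. Only the idea of bounds fields plus an iterate() method comes from the slide; the names SqrtResult, vao_sqrt, est_cost, and est_error_reduction are hypothetical, and the talk's own framework may look quite different.

```python
from dataclasses import dataclass


@dataclass
class SqrtResult:
    """Hypothetical result object: bounds on sqrt(target), refined by bisection."""
    target: float
    lo: float  # conservative lower bound on the true value
    hi: float  # conservative upper bound on the true value

    def iterate(self) -> None:
        """Do one more unit of work: halve the interval that brackets the root."""
        mid = (self.lo + self.hi) / 2.0
        if mid * mid <= self.target:
            self.lo = mid
        else:
            self.hi = mid

    def est_cost(self) -> float:
        """Estimated CPU cost of the next iteration (constant for bisection)."""
        return 1.0

    def est_error_reduction(self) -> float:
        """Estimated shrinkage of the bound width from the next iteration."""
        return (self.hi - self.lo) / 2.0


def vao_sqrt(target: float) -> SqrtResult:
    """VAO-style call: return a coarse bracketing answer instead of a number."""
    return SqrtResult(target, 0.0, max(1.0, target))


r = vao_sqrt(2.0)
for _ in range(10):
    r.iterate()
print(r.lo, r.hi)  # a narrow interval that still contains sqrt(2)
```

The caller, not the function, decides how many times to call iterate(), which is what lets an operator stop as soon as the bounds are tight enough for the query.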
12 Iteration Strategy
- Selection iterates over an object until the predicate value is known.
- Aggregate operators are more difficult
  - Answer depends on sets of result objects
  - Need to decide how to iterate over multiple result objects
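The selection case above can be sketched in a few lines of Python. This is an illustrative toy, not the talk's implementation: the Bounded class, its halving behaviour, and the select_gt helper are assumptions made so the example runs on its own.

```python
class Bounded:
    """Hypothetical result object whose error bounds halve on every iteration."""

    def __init__(self, value, half_width):
        self._value = value  # true answer, hidden from the operator
        self.lo = value - half_width
        self.hi = value + half_width
        self.iterations = 0

    def iterate(self):
        half = (self.hi - self.lo) / 4.0  # new half-width is half the old one
        self.lo, self.hi = self._value - half, self._value + half
        self.iterations += 1


def select_gt(obj, threshold, max_iters=60):
    """Decide `f(x) > threshold`, iterating only while the bounds straddle it."""
    for _ in range(max_iters):
        if obj.lo > threshold:
            return True   # whole interval is above the threshold
        if obj.hi <= threshold:
            return False  # whole interval is at or below the threshold
        obj.iterate()
    # Iteration budget exhausted: fall back to the midpoint (an assumption).
    return (obj.lo + obj.hi) / 2.0 > threshold


far = Bounded(150.0, 40.0)   # coarse bounds already clear the predicate
near = Bounded(101.0, 40.0)  # value close to the predicate needs refinement
print(select_gt(far, 100.0), far.iterations)
print(select_gt(near, 100.0), near.iterations)
```

Values far from the predicate are decided from the coarse answer alone, while values near it pay for extra iterations.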
13 Example: MAX(f(x1), f(x2))
Need an iteration strategy that attempts to minimize cost.
14 Solution: Greedy Strategy
- Iterate over the object that has the best ratio of benefit to CPU cost among the current choices.
- Good strategy if functions converge
  - Later iterations are likely to have less benefit per unit cost
- Operator-dependent
15 Example Revisited
- Goal State: no overlap between f(x1) and f(x2)
- Greedy Strategy
  - Choose the best overlap reduction per CPU cost
  - Use error reduction estimates to estimate overlap reduction.
  - Cost estimation depends on the function.
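The greedy choice described above can be sketched as follows. This is a toy under stated assumptions, not the authors' operator: the Bounded class, the rule that each refinement halves the bounds and doubles in cost, and the use of the largest lower bound as the "educated guess" for the maximum are all hypothetical choices made for illustration.

```python
class Bounded:
    """Hypothetical result object whose bounds tighten by half per iteration."""

    def __init__(self, name, value, half_width, cost):
        self.name = name
        self._value = value      # true answer, hidden from the operator
        self._half = half_width  # current half-width of the error bounds
        self._cost = cost        # CPU cost of the next iteration
        self.spent = 0.0         # total CPU cost paid so far

    @property
    def lo(self):
        return self._value - self._half

    @property
    def hi(self):
        return self._value + self._half

    def est_cost(self):
        return self._cost

    def est_error_reduction(self):
        # How far each bound is expected to move on the next iteration.
        return self._half / 2.0

    def iterate(self):
        self.spent += self._cost
        self._half /= 2.0
        self._cost *= 2.0  # assumed: each refinement costs twice the last


def greedy_max(objs, max_steps=100):
    """Iterate until one object's lower bound clears every other upper bound."""
    for _ in range(max_steps):
        leader = max(objs, key=lambda o: o.lo)  # educated guess for the max
        rivals = [o for o in objs if o is not leader and o.hi > leader.lo]
        if not rivals:
            return leader  # goal state: nothing overlaps the leader

        def gain(o):
            # Estimated overlap removed per unit of CPU cost.
            step = o.est_error_reduction()
            if o is leader:
                removed = sum(min(step, r.hi - leader.lo) for r in rivals)
            else:
                removed = min(step, o.hi - leader.lo)
            return removed / o.est_cost()

        max([leader] + rivals, key=gain).iterate()
    return max(objs, key=lambda o: o.lo)  # step budget exhausted: best guess


a = Bounded("f(x1)", 105.0, 8.0, 1.0)
b = Bounded("f(x2)", 100.0, 8.0, 1.0)
winner = greedy_max([a, b])
print(winner.name, a.spent + b.spent)
```

Because later iterations cost more and remove less overlap, the sketch tends to alternate between the two objects rather than refining one of them to full accuracy.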
16 Example Revisited
- Determine if f(x1) > f(x2)
[Table: Overlap Red. Est. and CPU Cost Est. for f(x1) and f(x2); cell values not preserved]
18 Example Revisited
- Determine if f(x1) > f(x2)
[Figure: f(x) versus x, with x1 and x2 marked]
[Table: Overlap Red. Est. and CPU Cost Est. for f(x1) and f(x2); cell values not preserved]
19 Aggregates
Operator | Goal State | Greedy Heuristic
min/max (general) | No overlap between the minimum (maximum) value and the other functions' error bounds | Make an educated guess for the max. Choose the iteration that reduces the most overlap between the guess and the other error bounds per cycle
avg/sum | avg/sum of the error bounds has width less than a user-defined tolerance | Choose the iteration that reduces the avg/sum of the bounds the most per cycle
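The avg/sum row of the table can be illustrated with a short Python sketch for SUM. It is a toy under assumptions: the Bounded class, its halving bounds, its doubling cost, and the greedy_sum helper are hypothetical and stand in for whatever result objects a real function would return.

```python
class Bounded:
    """Hypothetical result object: bounds halve and cost doubles per iteration."""

    def __init__(self, value, half_width, cost):
        self._value = value      # true answer, hidden from the operator
        self._half = half_width  # current half-width of the error bounds
        self._cost = cost        # CPU cost of the next iteration
        self.iterations = 0

    @property
    def lo(self):
        return self._value - self._half

    @property
    def hi(self):
        return self._value + self._half

    def est_cost(self):
        return self._cost

    def est_error_reduction(self):
        # Expected shrinkage of (hi - lo): the width 2*half becomes half.
        return self._half

    def iterate(self):
        self._half /= 2.0
        self._cost *= 2.0
        self.iterations += 1


def sum_bounds(objs):
    """Bounds on the sum are the sums of the individual bounds."""
    return sum(o.lo for o in objs), sum(o.hi for o in objs)


def greedy_sum(objs, tolerance, max_steps=1000):
    """Refine until the bounds on the sum are narrower than `tolerance`."""
    for _ in range(max_steps):
        lo, hi = sum_bounds(objs)
        if hi - lo < tolerance:
            break
        # Pick the iteration that shrinks the summed width most per unit cost.
        max(objs, key=lambda o: o.est_error_reduction() / o.est_cost()).iterate()
    return sum_bounds(objs)


objs = [Bounded(10.0, 4.0, 1.0), Bounded(20.0, 1.0, 1.0), Bounded(30.0, 0.5, 1.0)]
lo, hi = greedy_sum(objs, tolerance=2.0)
print(lo, hi, [o.iterations for o in objs])
```

For an average, the same loop applies with the bounds divided by the number of objects, or multiplied by per-object weights when the average is weighted.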
20 Performance Setup
- Standalone implementation of VAO framework in C
- Used numeric bond model and bond data from DF05
  - Real Bond Data: 500 mortgage-backed securities.
  - Synthetic Bond Data: to stress test VAOs
- Single interest rate.
21 VAO Implementation
- Numeric bond model (S95) implemented with traditional and VAO interfaces
  - Based on a PDE solver
  - VAO iterate(): doubles the size of the PDE grid
  - Bounds and error reduction estimates derived by using current and previous iteration results and Richardson's Extrapolation (BF01)
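To make the Richardson step concrete, here is a hedged Python sketch that applies the same idea to a much simpler problem than the bond model: trapezoid-rule integration on a grid that doubles per iteration. The class RichardsonResult and its methods are hypothetical, and the factor of 3 assumes a second-order method; the talk's PDE solver and its exact bound derivation are not reproduced here.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal intervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


class RichardsonResult:
    """Hypothetical result object: bounds from two successive grid sizes."""

    def __init__(self, f, a, b, n=4):
        self.f, self.a, self.b = f, a, b
        self.prev = trapezoid(f, a, b, n)  # coarser grid
        self.n = 2 * n
        self.curr = trapezoid(f, a, b, self.n)  # current grid

    @property
    def error_est(self):
        # For a second-order method, halving h cuts the error about 4x,
        # so |curr - prev| / 3 approximates the error remaining in curr.
        return abs(self.curr - self.prev) / 3.0

    @property
    def lo(self):
        return self.curr - self.error_est

    @property
    def hi(self):
        return self.curr + self.error_est

    def est_error_reduction(self):
        # The next doubling is expected to remove about 3/4 of the error.
        return 0.75 * self.error_est

    def est_cost(self):
        # Function evaluations needed on the next grid (2n intervals).
        return 2 * self.n + 1

    def iterate(self):
        """Double the grid and keep the previous result for the next estimate."""
        self.prev = self.curr
        self.n *= 2
        self.curr = trapezoid(self.f, self.a, self.b, self.n)


r = RichardsonResult(math.sin, 0.0, math.pi)  # exact integral is 2
print(r.lo, r.hi)
r.iterate()
print(r.lo, r.hi)
```

A Richardson-based estimate is asymptotic rather than guaranteed, so a production implementation would likely widen these bounds by a safety factor to keep them conservative.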
22 Selection Performance
- 500 bonds, 1 interest rate
Runtime depends on the number of bonds close to the predicate.
23 Stress Test
- Generate bonds with accurate values near the predicate
  - Gaussian, mean = predicate value, vary std. dev.
- Std. dev. of real bonds: 7.78
24 In the Paper
- Other Results
  - Max
    - Real bonds: 111 sec. vs. 6953 sec.
    - Synthetic bonds: VAOs better than traditional above .05 std. dev.
  - Average
    - Up to 5x improvement if a small number of bonds are weighted heavily in the average.
- Details on error and cost estimates for the PDE-based bond model.
- Other types of models covered in Matt's thesis.
25 Conclusion
- Many emerging CQ applications require the repeated execution of expensive functions.
- VAOs are new operators that change how these functions execute
  - Use a new iterative API that exposes the work-accuracy tradeoff in functions
  - Do only enough work to answer the query, using a greedy strategy to choose iterations
- With real bond data and models, VAOs show 1-2 orders of magnitude improvement.
- For more detailed information
  - mdenny_at_cs.berkeley.edu
26 The Advisor's Dodge
[Figure: "Relative Contribution to Research". Percent Contribution (0 to 100) versus Time in Program (years, 0 to 5), with "This Work" marked. Courtesy of Jennifer Widom]
27 Bibliography
- HS93 J. M. Hellerstein and M. Stonebraker, Predicate Migration: Optimizing Queries with Expensive Predicates, SIGMOD 1993.
- HN96 J. M. Hellerstein and J. Naughton, Query Execution Techniques for Caching Expensive Predicates, SIGMOD 1996.
- DF05 M. Denny and M. J. Franklin, Predicate Result Range Caching for Continuous Queries, SIGMOD 2005.
28 Bibliography
- S95 R. Stanton, Rational Prepayment and the Valuation of Mortgage-Backed Securities, The Review of Financial Studies, Vol. 8, No. 3, 677-708.
- BF01 R. L. Burden and J. D. Faires, Numerical Analysis. Brooks/Cole, 2001.