Title: Partial Query-Evaluation in Internet Query Engines
1. Partial Query-Evaluation in Internet Query Engines
- Jayavel Shanmugasundaram, Kristin Tufte, David DeWitt, David Maier, Jeffrey Naughton
University of Wisconsin, Oregon Graduate Institute
2. Outline
- Motivation
- Desired Operator Properties
- Implementation Alternatives
- Performance Evaluation
- Conclusion
3. Querying the WWW: The Present
Who won the Nobel prize for Physics in 1999?
4. Querying the WWW: The Present
[Figure: multiple HTML files]
Want 1998 Red BMW, No accidents, 20 price
5. Querying the WWW: The Future?
Want 1998 Red BMW, No accidents, 20 price
6. Inside the Internet Query Engine
[Figure: query plan over the input "Red Used BMW Cars", with tuple schema (carId, model, price, otherinfo)]
7. The Problem
- Return results to users as soon as possible
- Results so far for queries with blocking operators
- Arbitrary blocking operators
- Not exists, Average, Nest
- Blocking operators occurring anywhere in the query
- Potentially intermixed with non-blocking operators
8. Outline
- Motivation
- Desired Operator Properties
- Implementation Alternatives
- Performance Evaluation
- Conclusion
9. What is a Partial Result of a Query?
- Let the Full Result of Query Q on Inputs A and B be
- Q(A, B)
- Then a Partial Result of Query Q on Inputs A and B is
- Q(PA, PB)
- PA ⊆ A
- PB ⊆ B
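As a small illustration of the definition above, the sketch below evaluates a hypothetical not-exists query on full inputs and on subsets of them. The function `q` and the sample data are assumptions made for illustration; they do not come from the slides.

```python
def q(cars, accidents):
    """Hypothetical query Q: cars whose carId has no accident report."""
    reported_ids = set(accidents)
    return {car for car in cars if car[0] not in reported_ids}

# Full inputs A and B (made-up sample data).
A = {(1, "Z3", 10000), (3, "400i", 20000), (5, "400i", 30000)}
B = {3, 5}

# Partial inputs: the subsets of A and B that have arrived so far.
PA = {(1, "Z3", 10000), (3, "400i", 20000)}
PB = {5}

full_result = q(A, B)        # Q(A, B)
partial_result = q(PA, PB)   # Q(PA, PB)

print(full_result)
print(partial_result)
```

In this sketch car 3 appears in the partial result but not in the full result, because its accident report has not arrived yet. With a blocking operator such as not-exists, a partial result is therefore not necessarily a subset of the full result.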
10. Maximal Output Property
- Produce correct results as soon as possible
- Why?
- If query is non-blocking
- Produces results soon
- If query is blocking
- Return non-blocking parts soon (e.g., outer join)
11. Inside the Internet Query Engine
[Figure: query plan over the inputs "Red 1998 BMW Cars" and "Accident Reports", with tuple schemas (carId, model, price, otherinfo) and (carId)]
12. Anytime Property
- Blocking operators should be able to return the result so far at any time
- Why?
- User can request partial results at any time
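One way to picture the anytime property is a grouped average that keeps running sums and counts, so it can report its result so far whenever asked. This is a minimal sketch under that assumption; the class name `AnytimeAverage` and its methods are hypothetical and not taken from the slides.

```python
from collections import defaultdict

class AnytimeAverage:
    """Group-by average that can report its result so far at any time."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, key, value):
        # Consume one input tuple; no output is produced here.
        self._sum[key] += value
        self._count[key] += 1

    def result_so_far(self):
        # Answer a partial result request without waiting for end of input.
        return {k: self._sum[k] / self._count[k] for k in self._sum}

avg = AnytimeAverage()
avg.add("400i", 20000)
avg.add("400i", 30000)
first = avg.result_so_far()    # requested before the input is complete
avg.add("400i", 20000)
second = avg.result_so_far()   # requested again after more input
```

The first request sees an average of 25000 for the 400i; the second, after one more tuple, sees roughly 23333.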
13. Inside the Internet Query Engine
[Figure: query plan over the inputs "Red 1998 BMW Cars" and "Accident Reports", with tuple schemas (carId, model, price, otherinfo) and (carId)]
14. Non-Monotonic Input/Output Property
- Operators should handle changes, not just additions to input
- Similarly, operators should produce changes, not just additions to output
- Both blocking and non-blocking operators
- Why?
- Partial results may represent wrong answers
- Need to be corrected later
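The idea of producing changes rather than only additions can be sketched with a not-exists operator that emits "+" and "-" deltas. The class `NotExists`, the delta encoding, and the choice to emit eagerly on every input (rather than only on a partial result request) are simplifying assumptions for illustration.

```python
class NotExists:
    """Anti-join sketch: cars with no accident report, emitted as +/- deltas."""

    def __init__(self):
        self.cars = {}              # carId -> car tuple
        self.accident_ids = set()   # carIds with a known accident report
        self.emitted = set()        # carIds currently in the output

    def on_car(self, car):
        self.cars[car[0]] = car
        if car[0] not in self.accident_ids:
            self.emitted.add(car[0])
            return [("+", car)]     # tentative answer: no report seen so far
        return []

    def on_accident(self, car_id):
        self.accident_ids.add(car_id)
        if car_id in self.emitted:
            self.emitted.remove(car_id)
            return [("-", self.cars[car_id])]   # retract an earlier answer
        return []

op = NotExists()
deltas = []
deltas += op.on_car((3, "400i", 20000))
deltas += op.on_accident(3)
deltas += op.on_car((1, "Z3", 10000))
```

Any operator downstream of this one has to accept the "-" delta for car 3, which is why the property applies to non-blocking operators as well.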
15. Inside the Internet Query Engine
[Figure: query plan over the inputs "Red 1998 BMW Cars" and "Accident Reports", with tuple schemas (carId, model, price, otherinfo) and (carId)]
16. Flexible Input Property
- Should be able to process data from any input at any time
- Processes data as it becomes available
- Why?
- If query is non-blocking
- Can return results soon
- If query is blocking
- Faster partial result response time
17. A Note on Partial Result Accuracy
- Focus is on producing partial results
- Architecture is general enough to exploit existing techniques
- Online aggregation (Hellerstein et al.)
- Nested aggregates (Tan et al.)
- Accuracy for general blocking operators?
18. Outline
- Motivation
- Desired Operator Properties
- Implementation Alternatives
- Performance Evaluation
- Conclusion
19. Where do we start?
- Use known flexible input, maximal output operator implementations
- Non-blocking: select, symmetric hash join, XJoin
- Blocking: group-by, symmetric outer join
- Blocking operator implementations should satisfy anytime property
- All operator implementations should satisfy non-monotonic input/output property
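For readers unfamiliar with the symmetric hash join named above, here is a minimal in-memory sketch. It keeps one hash table per input and probes the opposite table on every arrival, so it accepts tuples from either side at any time and emits matches immediately. The class and method names are hypothetical, and the sketch ignores memory limits and the change handling discussed in this talk.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Equi-join that accepts tuples from either input in any order."""

    def __init__(self, left_key, right_key):
        self.left_key = left_key
        self.right_key = right_key
        self.left_table = defaultdict(list)
        self.right_table = defaultdict(list)

    def on_left(self, tup):
        # Insert into the left table, then probe the right table.
        k = self.left_key(tup)
        self.left_table[k].append(tup)
        return [(tup, r) for r in self.right_table.get(k, [])]

    def on_right(self, tup):
        # Insert into the right table, then probe the left table.
        k = self.right_key(tup)
        self.right_table[k].append(tup)
        return [(l, tup) for l in self.left_table.get(k, [])]

# Join car tuples with (model, price) tuples on model.
join = SymmetricHashJoin(lambda car: car[1], lambda row: row[0])
out = []
out += join.on_left((1, "Z3", 10000))    # no match yet
out += join.on_right(("Z3", 15000))      # matches car 1
out += join.on_left((19, "Z3", 20000))   # matches the stored right tuple
```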
20. Non-Monotonic Input/Output
- Re-evaluation Approach
- On partial result request, compute results so far
- Then forget all potentially incorrect inputs
- Differential Approach
- On partial result request, compute results so far
- Update incorrect inputs for future result computation
21. Inside the Internet Query Engine
[Figure: query plan over the inputs "Red 1998 BMW Cars" and "Accident Reports", with tuple schemas (carId, model, price, otherinfo) and (carId)]
22. Re-evaluation Join
[Figure: join inputs. (carId, model, price) tuples: (1, Z3, 10000), (19, Z3, 20000), (3, 400i, 20000), (5, 400i, 30000). (model, price) tuples: (Z3, 15000), (400i, 25000).]
23. Re-evaluation Join
[Figure: join inputs after a new tuple arrives. (carId, model, price) tuples: (1, Z3, 10000), (19, Z3, 20000), (3, 400i, 20000), (5, 400i, 30000), (8, 400i, 20000). (model, price) tuples: (Z3, 15000), (400i, 23333).]
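The re-evaluation behaviour on this example can be sketched as follows. The sketch assumes that the two-field tuples are per-model average prices produced by an upstream blocking operator (the arithmetic is consistent: (20000 + 30000 + 20000) / 3 ≈ 23333) and that car tuples are final once received. The class `ReEvaluationJoin` and its methods are hypothetical names.

```python
def join(cars, averages):
    """Join cars with (model, price) tuples on model."""
    price_by_model = dict(averages)
    return [(car, (car[1], price_by_model[car[1]]))
            for car in cars if car[1] in price_by_model]

class ReEvaluationJoin:
    def __init__(self):
        self.cars = []           # inputs assumed final once received
        self.partial_avgs = []   # potentially incorrect inputs

    def add_car(self, car):
        self.cars.append(car)

    def add_partial_average(self, avg):
        self.partial_avgs.append(avg)

    def partial_result(self):
        result = join(self.cars, self.partial_avgs)
        self.partial_avgs = []   # forget potentially incorrect inputs
        return result

rj = ReEvaluationJoin()
for car in [(1, "Z3", 10000), (19, "Z3", 20000),
            (3, "400i", 20000), (5, "400i", 30000)]:
    rj.add_car(car)
for avg in [("Z3", 15000), ("400i", 25000)]:
    rj.add_partial_average(avg)
first = rj.partial_result()

# A new car arrives; upstream re-sends all of its current averages.
rj.add_car((8, "400i", 20000))
for avg in [("Z3", 15000), ("400i", 23333)]:
    rj.add_partial_average(avg)
second = rj.partial_result()
```

The second request recomputes every output row, including the unchanged Z3 rows.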
24. Differential Join
[Figure: join inputs. (carId, model, price) tuples: (1, Z3, 10000), (19, Z3, 20000), (3, 400i, 20000), (5, 400i, 30000). (model, price) tuples: (Z3, 15000), (400i, 25000).]
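25. Differential Join
[Figure: join inputs. (carId, model, price) tuples: (1, Z3, 10000), (19, Z3, 20000), (3, 400i, 20000), (5, 400i, 30000). (model, price) tuples: (Z3, 15000), (400i, 25000). Incoming change: update (400i, 23333).]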
26. Differential Join
[Figure: join inputs after the update. (carId, model, price) tuples: (1, Z3, 10000), (19, Z3, 20000), (3, 400i, 20000), (5, 400i, 30000), (8, 400i, 20000). (model, price) tuples: (Z3, 15000), (400i, 23333).]
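A differential join for the same example might look like the sketch below. It assumes the two-field tuples are per-model average prices and that changes arrive as explicit update messages. The class `DifferentialJoin`, its methods, and the "add"/"update" tags are hypothetical, not the authors' tuple format.

```python
from collections import defaultdict

class DifferentialJoin:
    """Join on model that propagates input updates as output changes."""

    def __init__(self):
        self.cars_by_model = defaultdict(list)
        self.price_by_model = {}

    def add_car(self, car):
        model = car[1]
        self.cars_by_model[model].append(car)
        if model in self.price_by_model:
            return [("add", car, (model, self.price_by_model[model]))]
        return []

    def add_average(self, avg):
        model, price = avg
        self.price_by_model[model] = price
        return [("add", car, avg) for car in self.cars_by_model[model]]

    def update_average(self, avg):
        # Only outputs that joined with the old value are touched.
        model, price = avg
        self.price_by_model[model] = price
        return [("update", car, avg) for car in self.cars_by_model[model]]

dj = DifferentialJoin()
for car in [(1, "Z3", 10000), (19, "Z3", 20000),
            (3, "400i", 20000), (5, "400i", 30000)]:
    dj.add_car(car)
dj.add_average(("Z3", 15000))
dj.add_average(("400i", 25000))

changes = dj.update_average(("400i", 23333))
changes += dj.add_car((8, "400i", 20000))
```

Only the three 400i rows are produced or changed; the Z3 rows are left alone.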
27. Re-evaluation vs. Differential
- Re-evaluation Approach
- Simple: just forget partial inputs
- Easier to extend (no changes to tuple structure)
- Unnecessary computation
- Differential Approach
- Need to handle deletions/updates of inputs
- Changes to tuple structure
- Re-computes only what is necessary
28. Outline
- Motivation
- Desired Operator Properties
- Implementation Alternatives
- Performance Evaluation
- Conclusion
29. Response Time
30. Outline
- Motivation
- Desired Operator Properties
- Implementation Alternatives
- Performance Evaluation
- Conclusion
31. Conclusion
- New properties for query engine operators
- Operator implementation alternatives
- Re-evaluation
- Differential
- Evaluation
- Partial results improve response time
- Re-evaluation approach is simpler
- Differential approach is more efficient
32. Future Work
- General GUI
- Partial result accuracy for general blocking operators
- Changes at finer granularities
- Consistent partial results
33. Related Work
- Online aggregation (Hellerstein et al.)
- Nested aggregates (Tan et al.)
- Online reordering (Raman et al.)
- Symmetric hash join (Wilschut et al.)
- Adaptive operators (Ives et al.)
- XJoin (Urhan et al.)