Title: Query Optimization over Web Services
1Query Optimization over Web Services
- Utkarsh Srivastava
- Jennifer Widom
- Kamesh Munagala
- Rajeev Motwani
2Performance Numbers
[Figure: Relative Contribution to Research. Y-axis: Percent Contribution (0 to 100); x-axis: Time in Program (years, 0 to 5); one label reads "This Work".]
3Future Directions (sample)
- Web services with monetary cost
- Web services with unstable response times (QoS guarantees?)
- Multiple web services for same data
- Caching web-service query results
- More expressive queries, also workflows
- Web service profiling and statistics-tracking
4First Steps in Big Problem
New Query Optimization Problem
5Web Services
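- Standardized way of sharing data and functionality
- Description and discovery
[Diagram: Data and functionality are exposed as Web Services; WSDL and UDDI cover description and discovery; Users/Clients communicate with the Web Services via SOAP.]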
6Example Web Services
- WS1 (Reuters): Stock symbol → Company info
- WS2 (NASDAQ): Stock symbol → Stock activity
7Querying Across Web Services
Query: Get info about all companies with high-activity stock
[Diagram: User/Client issues the Query and receives Results; WS1 (Reuters): Stock symbol → Company info; WS2 (NASDAQ): Stock symbol → Stock activity.]
- Easy
- Transparent
- Efficient
- Etc.
8Same Basic Goal as Traditional DBMS
[Diagram: User/Client sends a Query through a Declarative Interface to the Database Management System, which manages the Data and returns Results.]
- Easy
- Transparent
- Efficient
- Etc.
9Web Service Management System
Web Service Management System
- Easy
- Transparent
- Efficient
- Etc.
10WSMS Architecture
[Diagram: WSMS architecture]
- Client: submits query and input data through the Declarative Interface, receives Results
- Metadata Component: Schema mapper, Web service registration
- Query Processing Component: Plan selection, Plan execution
- Profiling and Statistics Component: Statistics tracker, Response-time profiler
- WS Invocations: WS1, WS2, ..., WSn
11Running Example
- Credit card company wants to send offers to people with
  - credit rating > 600, and
  - payment history "good" on prior credit card
- Company has at its disposal
  - L: List of potential recipients (identified by SSN)
  - WS1: SSN → credit rating
  - WS2: SSN → cc number(s)
  - WS3: cc number → payment history
12Plan 1
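[Diagram: query plan executed inside the WSMS on behalf of the Client]
- Input L(SSN): SSNs 1, 2
- WS1 (SSN → cr) returns (SSN, cr): (1, 500), (2, 700)
- Filter on cr, keep SSN: SSN 2
- WS2 (SSN → ccn) returns (SSN, ccn): (2, 123), (2, 456)
- WS3 (ccn → ph) returns (ccn, ph): (123, bad), (456, good), giving (SSN, ccn, ph)
- Filter on ph, keep SSN
Note: Pipelined processing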
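The pipelined flow of Plan 1 can be sketched with Python generators. This is only an illustration under stated assumptions, not the WSMS implementation: the dictionaries stand in for the three web services and hold just the sample rows from the slide, and the names `ws1`, `ws2`, `ws3`, and `plan1` are hypothetical.

```python
# Stand-ins for the web services, holding only the slide's sample rows.
CREDIT_RATING = {1: 500, 2: 700}             # WS1: SSN -> cr
CARD_NUMBERS = {2: [123, 456]}               # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {123: "bad", 456: "good"}  # WS3: ccn -> ph


def ws1(ssns):
    """WS1 plus its post-filter: keep SSNs with credit rating > 600."""
    for ssn in ssns:
        if CREDIT_RATING[ssn] > 600:
            yield ssn


def ws2(ssns):
    """WS2: emit one (SSN, ccn) pair per credit card."""
    for ssn in ssns:
        for ccn in CARD_NUMBERS.get(ssn, []):
            yield ssn, ccn


def ws3(pairs):
    """WS3 plus its post-filter: keep SSNs whose card has good history."""
    for ssn, ccn in pairs:
        if PAYMENT_HISTORY[ccn] == "good":
            yield ssn


def plan1(ssns):
    # The generators are chained, so each tuple moves through all stages
    # without waiting for earlier stages to finish the whole input.
    return list(ws3(ws2(ws1(ssns))))


print(plan1([1, 2]))  # [2]
```

Because WS1 filters first, WS2 and WS3 are only invoked for SSN 2 in this sample.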
13Simple Representation of Plan 1
L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results
14Plan 2
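[Diagram: query plan executed inside the WSMS on behalf of the Client; two branches read L(SSN) and are joined on SSN]
- Input L(SSN): SSNs 1, 2
- Branch 1: WS1 (SSN → cr) returns (SSN, cr): (1, 500), (2, 700); filter on cr, keep SSN
- Branch 2: WS2 (SSN → ccn) returns (SSN, ccn): (2, 123), (2, 456); WS3 (ccn → ph) returns (ccn, ph): (123, bad), (456, good), giving (SSN, ccn, ph); filter on ph, keep SSN
- Join the SSN outputs of the two branches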
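Plan 2 can be sketched as two branches that both read L and are joined on SSN. This is a minimal sketch under assumptions: the dictionaries stand in for the web services with the slide's sample rows, the function names are hypothetical, and a thread pool is just one convenient way to run the branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the web services, holding only the slide's sample rows.
CREDIT_RATING = {1: 500, 2: 700}             # WS1: SSN -> cr
CARD_NUMBERS = {2: [123, 456]}               # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {123: "bad", 456: "good"}  # WS3: ccn -> ph


def branch_ws1(ssns):
    """L -> WS1 -> filter on cr, keep SSN."""
    return {ssn for ssn in ssns if CREDIT_RATING[ssn] > 600}


def branch_ws2_ws3(ssns):
    """L -> WS2 -> WS3 -> filter on ph, keep SSN."""
    return {
        ssn
        for ssn in ssns
        for ccn in CARD_NUMBERS.get(ssn, [])
        if PAYMENT_HISTORY[ccn] == "good"
    }


def plan2(ssns):
    # Both branches read the same input list L and run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        good_rating = pool.submit(branch_ws1, ssns)
        good_history = pool.submit(branch_ws2_ws3, ssns)
        # Join on SSN: keep SSNs produced by both branches.
        return sorted(good_rating.result() & good_history.result())


print(plan2([1, 2]))  # [2]
```

Unlike a fully linear plan, the second branch here is invoked for every SSN in L, including those that the credit-rating filter would have removed.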
15Simple Representation of Plan 2
- Branch 1: L → WS1 (SSN → cr) → Results
- Branch 2: L → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results
16Quiz
Which plan is better?
- Plan 1: L → WS1 → WS2 → WS3 → Results
- Plan 2: L → WS1 in parallel with L → WS2 → WS3, both feeding Results
- Cost metric: steady-state throughput
- Assume join is free
Answer: Plan 1 is never worse
17Query Optimization Primer
- Possible query plans P1, ..., Pn
- Data/access statistics S
- Execution cost metric cost(Pi, S)
- GOAL: Find least-cost plan
18Query Optimization Primer
- Possible query plans P1, ..., Pn
- Data/access statistics S
- Execution cost metric cost(Pi, S)
- GOAL: Find least-cost plan
19Queries and Plans
- Select-Project-Join queries over input data L and set of web services WS1, ..., WSn
- Precedence constraints
  - Output of WSi may be needed as input for WSj
  - Ex.: WS2: SSN → ccn and WS3: ccn → ph
- Precedence DAG defines space of query plans
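To make the effect of precedence constraints concrete, the sketch below enumerates the linear orderings of the running example's services that respect the constraint "WS2 before WS3". It is a brute-force illustration with hypothetical names, and it covers only linear chains; the full plan space also contains non-linear plans in which services are invoked in parallel.

```python
from itertools import permutations


def linear_plans(services, precedence):
    """Yield every ordering of `services` that respects `precedence`.

    `precedence` is a set of (before, after) pairs, i.e. edges of the DAG.
    """
    for order in permutations(services):
        position = {ws: i for i, ws in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedence):
            yield order


plans = list(linear_plans(["WS1", "WS2", "WS3"], {("WS2", "WS3")}))
for plan in plans:
    print(" -> ".join(plan))
# WS1 -> WS2 -> WS3
# WS2 -> WS1 -> WS3
# WS2 -> WS3 -> WS1
```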
20Query Optimization Primer
- Possible query plans P1, ..., Pn
- Data/access statistics S
- Execution cost metric cost(Pi, S)
- GOAL: Find least-cost plan
21Statistics
- Web service response times
- Web service selectivities
New Query Optimization Problem
22Statistics Response Times
- ri: per-tuple response time of WSi from client
[Diagram: Client sends SSN to WS1 (SSN → cr) and receives cr; the elapsed time is r1]
- ri = 1/throughput; can be reduced by batching and parallel calls (see paper)
- Assume independent response times within query plans
New Query Optimization Problem
23Statistics Selectivities
- si: selectivity of WSi
  - Average output tuples per input tuple to WSi, including post-filtering in query plan
- WS1: SSN → cr, filter cr > 600
  - If 90% of SSNs have cr > 600, then s1 = 0.9
- WS2: SSN → ccn
  - If on average each SSN has 2 credit cards, then s2 = 2.0
- Assume independent selectivities within query plans
New Query Optimization Problem
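The two selectivity examples can be reproduced with a short calculation. The sample data below is hypothetical and chosen only so that the results match the figures on the slide.

```python
def selectivity(outputs_per_input):
    """Average number of output tuples per input tuple."""
    return sum(outputs_per_input) / len(outputs_per_input)


# WS1 with filter cr > 600: an SSN yields 1 tuple if it passes, else 0.
# Hypothetical sample of 10 credit ratings, 9 of which pass the filter.
ratings = [720, 650, 610, 580, 700, 690, 640, 800, 605, 630]
s1 = selectivity([1 if cr > 600 else 0 for cr in ratings])

# WS2: an SSN yields one tuple per credit card (hypothetical counts).
cards_per_ssn = [2, 1, 3, 2, 2]
s2 = selectivity(cards_per_ssn)

print(s1, s2)  # 0.9 2.0
```

A selectivity below 1 means the service (with its filter) shrinks the stream; a selectivity above 1 means it expands it.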
24Query Optimization Primer
- Possible query plans P1, ..., Pn
- Data/access statistics S
- Execution cost metric cost(Pi, S)
- GOAL: Find least-cost plan
25Bottleneck Cost Metric
New Query Optimization Problem
26Bottleneck Cost Metric
[Figure: Conference Lunch Buffet with Dish 1, Dish 2, Dish 3, Dish 4]
Average per-tuple processing time = response time of slowest (bottleneck) stage in pipeline
Note: selectivities = 1 in this example
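A small simulation illustrates why the slowest stage determines the average per-tuple time. It is a sketch under assumptions not stated on the slide: each stage handles one tuple at a time, buffers between stages are unbounded, all selectivities are 1, and the stage times are hypothetical.

```python
def average_time_per_tuple(stage_times, num_tuples):
    """Simulate a pipeline in which each stage handles one tuple at a time.

    Returns the finish time of the last tuple divided by num_tuples.
    """
    # finish[k] = time at which stage k finished its most recent tuple.
    finish = [0] * len(stage_times)
    for _ in range(num_tuples):
        ready = 0  # time at which this tuple left the previous stage
        for k, t in enumerate(stage_times):
            # A stage starts once the tuple has arrived and the stage is free.
            start = max(ready, finish[k])
            finish[k] = start + t
            ready = finish[k]
    return finish[-1] / num_tuples


times = [1, 3, 2, 1]  # hypothetical per-tuple times of four stages
print(average_time_per_tuple(times, 1000))  # 3.004
```

As the number of tuples grows, the average approaches 3, the time of the slowest stage, regardless of the other stages.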
27Cost Equation for Plan P
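- Ri(P): predecessors of WSi in plan P
- Fraction of input tuples seen by WSi: $\prod_{j \in R_i(P)} s_j$
- WSi response time per input tuple: $\left(\prod_{j \in R_i(P)} s_j\right) r_i$
$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$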
(assumes WSMS processing is not the bottleneck)
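The cost equation translates directly into a few lines of Python. This is a sketch: representing a plan as a mapping from each service to its predecessor set Ri(P) is an assumption made for illustration, and the response times and s3 are hypothetical values (s1 and s2 follow the selectivity examples).

```python
from math import prod


def bottleneck_cost(predecessors, r, s):
    """cost(P) = max over i of (product of s_j for j in R_i(P)) * r_i.

    predecessors[i] is R_i(P), the set of services that precede WS_i in P.
    """
    return max(
        prod(s[j] for j in predecessors[i]) * r[i]
        for i in predecessors
    )


# Hypothetical statistics for WS1, WS2, WS3.
r = {1: 2.0, 2: 5.0, 3: 4.0}
s = {1: 0.9, 2: 2.0, 3: 0.5}

# Linear plan L -> WS1 -> WS2 -> WS3 from the running example.
plan1 = {1: set(), 2: {1}, 3: {1, 2}}
print(bottleneck_cost(plan1, r, s))
```

For these numbers the per-service terms are 2.0, 0.9 × 5.0 = 4.5, and 0.9 × 2.0 × 4.0 = 7.2, so WS3 is the bottleneck.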
28Contrast with Sum Cost Metric
$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$
- Stream filter ordering
- Expensive predicate placement
[Figure: Polite Lunch Buffet with Dish 1, Dish 2, Dish 3, Dish 4]
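The difference between the two metrics shows up when both are computed over the same per-service terms. The sketch below uses a hypothetical plan representation (each service mapped to its predecessor set) and hypothetical statistics.

```python
from math import prod


def per_service_times(predecessors, r, s):
    """Time spent at each WS_i per input tuple of the plan."""
    return {
        i: prod(s[j] for j in predecessors[i]) * r[i]
        for i in predecessors
    }


def sum_cost(predecessors, r, s):
    """Sum metric: total work per input tuple."""
    return sum(per_service_times(predecessors, r, s).values())


def bottleneck_cost(predecessors, r, s):
    """Bottleneck metric: work at the slowest stage per input tuple."""
    return max(per_service_times(predecessors, r, s).values())


# Hypothetical statistics for a linear plan WS1 -> WS2 -> WS3.
r = {1: 2.0, 2: 5.0, 3: 4.0}
s = {1: 0.5, 2: 0.5, 3: 1.0}
plan = {1: set(), 2: {1}, 3: {1, 2}}

print(sum_cost(plan, r, s))         # 5.5
print(bottleneck_cost(plan, r, s))  # 2.5
```

The sum metric charges for all work done on a tuple, while the bottleneck metric charges only for the slowest stage, which is what limits throughput when stages overlap in a pipeline.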
29Problem Statement
- Input
  - Web services WS1, ..., WSn
  - Response times r1, ..., rn
  - Selectivities s1, ..., sn
  - Precedence constraints among web services
- Output
  - Web services arranged into a plan P
  - P respects all precedence constraints
  - cost(P) is minimized
30No Precedence Constraints
- All selectivities ≤ 1
  - Theorem: Optimal to order linearly by increasing ri (selectivities irrelevant)
- General case (optimal)
[Diagram: selective web services ordered by response time; proliferative web services; join at WSMS; Results]
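The ordering rule for selective services can be checked by brute force on a small instance. This is only a sanity check under assumptions: the statistics are hypothetical, and the comparison is restricted to linear chains rather than the full space of plans covered by the theorem.

```python
from itertools import permutations


def chain_cost(order, r, s):
    """Bottleneck cost of invoking the services one after another."""
    cost, fraction = 0.0, 1.0
    for ws in order:
        cost = max(cost, fraction * r[ws])
        fraction *= s[ws]  # share of tuples surviving to the next service
    return cost


# Hypothetical statistics with every selectivity <= 1.
r = {"A": 4.0, "B": 1.0, "C": 2.5}
s = {"A": 0.2, "B": 0.8, "C": 0.5}

by_response_time = sorted(r, key=r.get)
best = min(chain_cost(p, r, s) for p in permutations(r))

print(by_response_time)                            # ['B', 'C', 'A']
print(chain_cost(by_response_time, r, s) == best)  # True
```

Note that the sort key uses only the response times; the selectivities play no part in choosing the order.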
31With Precedence Constraints
$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$
32With Precedence Constraints
$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$
- Sum cost metric
- Hard to even obtain a factor O(n?) of optimal
33With Precedence Constraints
$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$
- Bottleneck (max) cost metric
- Surprisingly, optimal solution in polynomial time
- $O(n^5)$ algorithm in paper
  - Add one WS at a time to the plan
  - WS chosen by solving a linear program
34Example Revisited
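$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) r_i \right)$
- Plan 1: L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results
- Plan 2: L → WS1 (SSN → cr) in parallel with L → WS2 (SSN → ccn) → WS3 (ccn → ph), both feeding Results
[Diagram legend: Selective; Proliferative; Precedence constraint between WS2 and WS3]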
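The claim that Plan 1 is never worse than Plan 2 under the bottleneck metric can be spot-checked numerically. The sketch below rests on assumptions: the join is free, WS1 is selective (s1 ≤ 1), the statistics are drawn at random from arbitrary ranges, and the plan representation (each service mapped to its predecessor set) is hypothetical.

```python
import random
from math import prod


def bottleneck_cost(predecessors, r, s):
    """Max over services of (product of predecessor selectivities) * r_i."""
    return max(
        prod(s[j] for j in predecessors[i]) * r[i]
        for i in predecessors
    )


PLAN1 = {1: set(), 2: {1}, 3: {1, 2}}  # L -> WS1 -> WS2 -> WS3
PLAN2 = {1: set(), 2: set(), 3: {2}}   # L -> WS1 alongside L -> WS2 -> WS3


def plan1_never_worse(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        r = {i: rng.uniform(0.1, 10.0) for i in (1, 2, 3)}
        s = {
            1: rng.uniform(0.0, 1.0),  # WS1 assumed selective: s1 <= 1
            2: rng.uniform(1.0, 5.0),  # WS2 assumed proliferative: s2 >= 1
            3: rng.uniform(0.0, 1.0),  # unused: WS3 has no successors
        }
        if bottleneck_cost(PLAN1, r, s) > bottleneck_cost(PLAN2, r, s):
            return False
    return True


print(plan1_never_worse())  # True
```

The result follows from the cost terms: Plan 1 costs max(r1, s1·r2, s1·s2·r3) and Plan 2 costs max(r1, r2, s2·r3), so placing WS1 first can only scale the WS2 and WS3 terms down when s1 ≤ 1.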
35Implementation
- Built prototype WSMS query processor
  - Optimizer and execution engine
  - Assumes schema issues resolved, statistics provided
  - Written in Java and uses Apache Axis (open-source SOAP implementation)
- Experiments (see paper) validate analytical results
36Isn't the Problem the Same as ...?
- Web Service composition
  - Targeted for workflow-oriented applications
  - No provably optimal strategies
- Parallel/distributed query optimization
  - Freedom to place query operators
  - Much larger space of execution plans
- Data integration, mediators
  - For general sources of data
  - Optimization of total resource consumption
37Future Directions (sample)
- Web services with monetary cost
- Web services with unstable response times (QoS guarantees?)
- Multiple web services for same data
- Caching web-service query results
- More expressive queries, also workflows
- Web service profiling and statistics-tracking
38Conclusion
New Query Optimization Problem
39Conclusion
New Query Optimization Problem
Our contribution
40Questions?
[Figure: Percent Contribution (0 to 100) versus Time in Program (years, 0 to 5)]