Title: Planning for the Web II: Execution & Service Integration
1 Planning for the Web II: Execution & Service Integration
- Dan Weld
- University of Washington
- June, 2003
2 Acknowledgements
- Oren Etzioni
- Yolanda Gil
- Keith Golden
- Alon Halevy
- Zack Ives
- Tal Shaked
Caveat
3 Outline
- Execution for Data Integration
- Coping with incomplete statistics, latency
- Interleaved planning and execution
- Convergent query processing
- Service Integration
- Web service composition
- Background
- Representational issues
- Planning algorithms
- Automated data analysis
4 Optimization and Execution
- Problem
- Few and unreliable statistics about the data.
- Unexpected (possibly bursty) network transfer rates.
- Generally, unpredictable environment.
- General solution (research area)
- Adaptive query processing.
- Interleave optimization and execution. As you get
to know more about your data, you can improve
your plan.
5 Adaptivity Incremental Processing Query Performance
Evaluated within the Tukwila system
Ives PhD
6 Query Optimization Model: Query Plans, Execution
Choose the best plan
[Figure: alternative operator trees over the sources Restock (R), 100 tuples; Orders (O), 50 tuples; Shipping (S), 90 tuples]
From source sizes and stats, estimate result sizes and costs
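A minimal sketch of "estimate result sizes, costs, choose the best", under stated assumptions: the cardinalities are the ones on the slide, but the join selectivities and the cost model (sum of intermediate result sizes over left-deep plans) are hypothetical and not taken from the talk.

```python
from itertools import permutations

# Source cardinalities from the slide.
CARD = {"R": 100, "O": 50, "S": 90}

# Hypothetical join selectivities (NOT from the talk). Pairs without an
# entry are treated as cross products (selectivity 1.0).
SEL = {frozenset("RO"): 0.02, frozenset("OS"): 0.01}


def estimate(order):
    """Return (cost, result size) for a left-deep plan joining in `order`.

    Cost is the sum of the estimated intermediate result sizes.
    """
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= SEL.get(frozenset((prev, table)), 1.0)
        size = size * CARD[table] * sel
        cost += size
        joined.append(table)
    return cost, size


def best_plan():
    """Enumerate all left-deep join orders and pick the cheapest estimate."""
    return min(permutations(CARD), key=lambda order: estimate(order)[0])
```

With these made-up selectivities the optimizer prefers to join Orders and Shipping first. The point of the following slides is that in data integration the inputs to `estimate` are often missing or wrong.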
7 Why Does Data Integration Make Optimization Harder?
- Query optimization estimates costs using knowledge about environment and data
- Data source sizes (cardinalities)
- Often unavailable or not meaningful in data integration
- Histograms
- Too expensive to maintain in data integration
- I/O costs
- Network I/O costs fluctuate
- Need a way to gain this sort of knowledge!
8 Some Solutions
- Adaptive operators
- Mid-query reoptimization
- Convergent query processing
- Query scrambling (Franklin et al.)
- Eddies (Hellerstein et al.)
9 Tukwila Data Integration System
- Novel components
- Event handler
- Optimization-execution loop
- Adaptive operators
10 Double Pipelined Join
- Hybrid Hash Join
- No output until build relation read
- Asymmetric (build vs. probe): optimization requires source behavior knowledge
- Double Pipelined Hash Join
- Outputs data immediately
- Symmetric: requires less source knowledge to optimize
- Threads overlap I/O, computation
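A minimal single-threaded sketch of the symmetric idea behind the double pipelined hash join, assuming an interleaved stream of tagged tuples. The function name, the ("L"/"R", row) tagging, and the key extractors are illustrative assumptions; the threading and overflow handling used in Tukwila are not modeled here.

```python
from collections import defaultdict


def double_pipelined_join(stream, left_key, right_key):
    """Symmetric hash join over an interleaved stream of tagged tuples.

    `stream` yields ("L", row) or ("R", row) in arrival order. Each row is
    inserted into its own side's hash table and immediately probed against
    the other side's table, so a result is emitted as soon as both of its
    matching rows have arrived.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            key = left_key(row)
            left_table[key].append(row)
            for match in right_table.get(key, []):
                yield (row, match)
        else:
            key = right_key(row)
            right_table[key].append(row)
            for match in left_table.get(key, []):
                yield (match, row)
```

Unlike a hybrid hash join, neither input has to be read completely before output starts, so the optimizer does not need to guess which source is smaller or faster.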
11 Performance on Networked Data
Join of 3 tables (TPC-H Lineitem, Supplier, Order) sent via JDBC over 10Mb Ethernet
[Plot; axes: Time (sec), Tuples Output (1000s)]
12 Double Pipelined Join in Summary
- Benefits
- Easier to optimize (symmetric)
- Sub-operations scheduled flexibly
- Allows overlap of I/O and computation
- Incurs some overhead
- Threading, queues
- Required extensions to intelligently handle overflow
- Same hash function, number of buckets for each side
- Approaches: flush buckets on left side or flush symmetrically
13 Some Solutions
- Adaptive operators
- Mid-query reoptimization
- Interleaved planning and execution
- Convergent query processing
- Query scrambling
- Eddies
14 Mid-query reoptimization
Materialization point: write AB to disk
If actual statistics differ from predicted statistics, replan (Kabra & DeWitt)
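A rough sketch of the check made at a materialization point. Everything here is an assumption for illustration: the function names, the stage/estimate data layout, and the 2x tolerance are hypothetical and are not taken from Kabra & DeWitt's algorithm.

```python
def needs_replan(predicted_rows, actual_rows, tolerance=2.0):
    """True when the observed cardinality is off by more than `tolerance`x
    in either direction (the threshold is an assumed value)."""
    if predicted_rows <= 0 or actual_rows <= 0:
        return predicted_rows != actual_rows
    ratio = actual_rows / predicted_rows
    return ratio > tolerance or ratio < 1.0 / tolerance


def run_with_reoptimization(stages, estimates, replan):
    """Run `stages` in order, checking statistics at each materialization point.

    stages:    list of (name, callable); each callable returns the
               materialized intermediate result as a list.
    estimates: dict mapping stage name to predicted row count.
    replan:    callable(remaining_stages, name, actual_rows) returning the
               new list of remaining stages.
    """
    log = []
    remaining = list(stages)
    while remaining:
        name, run = remaining.pop(0)
        result = run()  # materialization point
        if needs_replan(estimates[name], len(result)):
            remaining = replan(remaining, name, len(result))
            log.append((name, "replanned"))
        else:
            log.append((name, "kept plan"))
    return log
```

Only the not-yet-executed remainder of the plan can change here, which is the limitation convergent query processing addresses.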
15 Some Solutions
- Adaptive operators
- Mid-query reoptimization
- Convergent query processing
- Query scrambling
- Eddies
16 Convergent Query Processing
- Instead of adapting remainder of plan
- after executing all data on plan prefix
- Adapt whole plan
- after executing whole plan on part of data
- Can better gather information this way
17 Convergent Query Processing in Action: Changing Join Plans in Mid-Stream
Join Restock, Orders, Shipping (R ⋈ O ⋈ S)
[Figure: join plans, with labels ROS and RS]
18 Breaking a Join into Phases: One Subset per Table, Each Phase
Restock (R)
Orders (O)
19 The Cleanup Plan: Reuses Previous Work Where Possible
Restock ⋈ Orders ⋈ Shipping
[Figure: per-phase inputs R0 O0 S0, R1 O1 S1, R2 O2 S2, and reused intermediate results labeled R2O2, R0 S0, O1S1]
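A simplified sketch of the phase-plus-cleanup structure, assuming each table's rows are already split into per-phase subsets. The function names and the shared `key` field are hypothetical. Unlike the system on the slides, this sketch uses the same nested-loop plan in every phase and recomputes cross-phase combinations from scratch instead of reusing intermediate results.

```python
from itertools import product


def join3(r_rows, o_rows, s_rows):
    """Nested-loop join of R, O, S on a shared `key` field (illustrative)."""
    return [
        (r, o, s)
        for r in r_rows
        for o in o_rows
        if r["key"] == o["key"]
        for s in s_rows
        if o["key"] == s["key"]
    ]


def cqp(r_parts, o_parts, s_parts):
    """Join phase by phase, then run a cleanup over cross-phase combinations.

    r_parts[i], o_parts[i], s_parts[i] hold the rows seen during phase i.
    """
    n = len(r_parts)
    results = []
    # Phase i: the whole plan runs over the i-th subset of every table.
    for i in range(n):
        results += join3(r_parts[i], o_parts[i], s_parts[i])
    # Cleanup: every (i, j, k) combination not already covered by a phase.
    for i, j, k in product(range(n), repeat=3):
        if not (i == j == k):
            results += join3(r_parts[i], o_parts[j], s_parts[k])
    return results
```

The union of the phase results and the cleanup results equals the full three-way join, which is what lets each phase use a different plan.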
20 CQP on a 100Mbps LAN: Nearly Optimal Performance
866MHz P-III, 256MB buffer pool, re-optimization every 10sec
[Chart annotation: cost to parse XML]
21 Slow WAN, Faster CPU: CQP Reduces Work
1GHz P-III, 256MB, re-optimization every 10sec, 1Mbps network, RTT 50msec
22 Outline
- Execution for Data Integration
- Coping with incomplete statistics, latency
- Interleaved planning and execution
- Convergent query processing
- Service Integration
- Web service composition
- Background
- Representational issues
- Planning algorithms
- Automated data analysis
23 What is a Web Service?
- A web service is a network accessible interface to application functionality, built using standard Internet protocols (TCP/IP, XML, SOAP, WSDL)
- Clients of a web service do NOT need to know how it is implemented.
- Why interesting?
- Increased automation
Web Service
Network
Application code
Application client
24 Case Study: Amazon
- Services Exported
- Product details (short, long, images, samples)
- Purchase functionality
- Ratings, reviews, collaborative filtering data, lists, ...
- Examples
- Store builder tools
- Amazon Browser visualization tool
- Windows desktop interfaces drag-n-drop
- MP3 Piranha
- Games
- Automatic review writer??
25 Case Study: Google
- Services Exported
- Search interface
- Limits on items returned, queries / day
- Examples
- Metacrawler functionality
- Geosearch: nearby Thai restaurants
- TIGER, FIPS -> lat, long of pages
- Robust hyperlinks
- Creates a signature for destination pages; tracks with query
26 Case Study: Fed Express
- Shipment tracking
- Proof of delivery
- Invoice reviewed, adjusted, settled
- Schedule pickup time, location
- Outgoing or returns
- Order supplies (airbills, envelopes, boxes)
- Review shipping history
- Rate requests
- Location, package size
- International trade
- Required documents, duties, taxes
27 Case Study: Hailstorm / MyServices
- Web Services
- MyDocuments
- MyAddressbook
- MyWallet
- MyNotifications ...
- Scenario
- Wallet keeps receipts, arranges product return
- Expedia uses notifications to warn of canceled flight
- Reality
- eBay, AmEx, Groove, ...
28 Case Study: OAA
- Common schema for travel industry
- Reservations
- Flights, trains, rental cars, hotels
- Time distances
- Payment, deposits, vouchers
- Vacation Packages
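29 Web Service Technology Stack
[Figure: a Web Service Client and a Web Service interacting across the stack layers]
- Discovery: client asks UDDI "shopping web service?" and gets back WSDL URIs
- Description: WSDL
- Packaging: SOAP pkg request / SOAP pkg response, via a Proxy
- Transport
- Network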
30 Step 1. Write Web Service Method
[Figure: Web Service Technology Stack diagram, repeated]
31 Step 2. Describe Web Service using WSDL
[Figure: Web Service Technology Stack diagram, repeated]
32 SOAP (Simple Object Access Protocol)
- SOAP Messages
- XML Payload
- Using SOAP as RPC (Remote Procedure Call) Messages
[Figure: SOAP client sends a Request message to the SOAP server, which returns a Response message]
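A small sketch of what an RPC-style SOAP request and response look like as XML payloads, built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the helper names, the `urn:example-service` namespace, and the method and parameter names are hypothetical, and no network transport is shown.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_rpc_request(method, params, service_ns="urn:example-service"):
    """Build an RPC-style message: Envelope / Body / <method> / <params>.

    `service_ns` is a made-up namespace used only for illustration.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_rpc_response(xml_text):
    """Return {child tag: text} for the first element inside the SOAP Body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    reply = list(body)[0]
    return {child.tag: child.text for child in reply}
```

The request is plain XML; a real client would send it to the server over a transport such as HTTP and hand the reply to something like `parse_rpc_response`.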
33 If a WS were a Phone Call
- XML
- represents the conversation,
- SOAP
- describes the rules for how to call someone
- UDDI
- is the phone book.
- WSDL
- describes what the phone call is about and how
you can participate.
34 WSDL
for int foo(int arg)
  <types>
    <schema targetNamespace="http://tempuri.org/xsd"
            xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
            xmlns:wsdl="http://schemas...l/"
            elementFormDefault="qualified">
    </schema>
  </types>
  <message name="Simple.foo">
    <part name="arg" type="xsd:int"/>
  </message>
  <message name="Simple.fooResponse">
    <part name="result" type="xsd:int"/>
  </message>
  <portType name="SimplePortType">
    <operation name="foo" parameterOrder="arg">
      <input message="wsdlns:Simple.foo"/>
      <output message="wsdlns:Simple.fooResponse"/>
    </operation>
  </portType>
35 DISCO
- If you know the URL for a service
- DISCO lets you query it
- And get back a WSDL description
- But what if you don't know the right URL?
36 UDDI
- Hosted Registries
- Microsoft, IBM, HP, SAP, NTT, BEA
- Entries defined with
- Business information
- Name, contacts, descriptions, identifier, yellow pages category
- Service information
- Entities, each of which describes a family of related services which together implement a business process
- Binding information
- How to invoke: URI, required parameters, options, Tmodel
- Service specifications (Tmodel)
- As a symbol: fingerprint to recognize a known service
- Decomposable to find WSDL description
37 Acronyms (W3C, MSFT, IBM)
- UDDI
- Discover, describe, register services
- SOAP-based service for locating WSDL-formatted service descriptions
- DISCO
- Discover / retrieve SCL/SDL descriptions
- SDL / NASSL
- SOAP description language: get params / types
- SCL
- SOAP contract language: extends SDL; orchestration of msgs
- WSDL
- Describe abstract interface and protocol bindings of arbitrary network services (extends SCL)
- XLANG / WSFL / BPEL4WS
- Language for biz processes, used in BizTalk
- Biz process execution language for web services
- MSFT, IBM, BEA proposal
[Figure: lineage diagram with nodes SDL, NASSL, SCL, WSDL]
38 The Layer Cake (TBL, XML2000)
39 RDF (Resource Description Framework)
- Way to describe resources via metadata
- Makes no assumptions about a particular application domain
- Based on XML
- Another one?
- Standard for semantic web
- Restricts resource descriptions to triples
- (subject, predicate, object)
- Provides a lightweight ontology system
- Subproperty, Subclass, Domain, Range
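A toy illustration of the triple model and of the kind of lightweight inference the Subclass primitive supports. It uses plain Python tuples rather than an RDF library; the function name and the `ex:` resources in the usage note are made up for the example, and only the rdfs:subClassOf rules are covered.

```python
def subclass_closure(triples):
    """Add triples implied by transitivity of rdfs:subClassOf and by
    propagating rdf:type along it.

    Triples are (subject, predicate, object) tuples of strings.
    """
    SUB, TYPE = "rdfs:subClassOf", "rdf:type"
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p not in (SUB, TYPE):
                continue
            # (s p o) and (o subClassOf o2) imply (s p o2).
            for s2, p2, o2 in facts:
                if p2 == SUB and s2 == o:
                    new.add((s, p, o2))
        if not new <= facts:
            facts |= new
            changed = True
    return facts
```

For example, given ("ex:rodney", "rdf:type", "ex:Softbot") and ("ex:Softbot", "rdfs:subClassOf", "ex:Agent"), the closure also contains ("ex:rodney", "rdf:type", "ex:Agent").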
40 DAML+OIL (www.daml.org)
- DAML extends RDF and RDFS with richer modeling primitives.
- disjointWith, intersectionOf, oneOf, cardinality
- Able to provide properties of properties
- uniqueness, transitivity, etc.
41 DAML-S
- DAML+OIL ontology describing Web Services
- Complements low level descriptions like WSDL
- Describes what a service does and why,
- Not just how to communicate with it.
- Goals: Discovery, Invocation, Composition, Verification, Execution Monitoring
(mapping to WSDL)
42 Outline
- Execution for Data Integration
- Coping with incomplete statistics, latency
- Interleaved planning and execution
- Convergent query processing
- Service Integration
- Web service composition
- Background
- Representational issues
- Planning algorithms
- Automated data analysis
43 Partial Survey of Planners
- UW Internet Softbot
- Planners: SENSp / XII / PUCCINI
- Repr. languages: UWL / SADL, LCW
- PKS
- Planning at the knowledge level
- McDermott
- Forward-chaining search w/ GRG guidance
- McIlraith et al.
- ConGolog (procs, loops, conditionals, w/ nondet)
- Papazoglou, Traverso et al.
- Stratified service arch, XSRL language, MBP
- Finin, Srivastava, Knoblock, Ambite, Nau
44 Planning for image processing tasks
- Many fielded systems
- Lansky's COLLAGE, Chien et al. MVP/ASIP,
- Golden ADLIM, Blythe GRID
- Spatial representations important
45 Motivating Scenarios
- Planning a trip
- Yahoo maps -> driving time -> travel prefs
- Automatic expense form filing
- Purchasing a group of items
- Aggregation from multiple vendors
- Select for payment types, stock level, deliv
- Local 3rd party reputation services (BBB)
- Monitoring marketplace
- Auction sites
- Events (check calendar / notification service)
46 UW Internet Softbot
- Software robot
- Effectors: mv, ftp, chmod, cd, lpr, rm, ...
- Sensors: ls, finger, INSPEC, netfind, wc, ...
- Say what we want, not how to do it
- Find phone numbers, fetch/print online papers, ...
- Integrate multiple resources
47 Motivation/Contributions
- Represent actions like ls, finger
- Represent goals such as
- "Rename paper.tex to kr.tex"
- "Print all files in directory papers."
- (even with incomplete information)
- No previous system could express these
48 The Middle Ground
1. Action Representation
2. Knowledge Representation
49 Softbot Architecture
50 SADL Family Tree
[Figure: family tree of action languages]
- STRIPS (Fikes & Nilsson, 71)
- ADL (Pednault, 89): ∀, conditional effects
- UWL (Etzioni et al., 92): incomplete info, noise-free sensors
- SADL (Golden & Weld, 96): represents ls, rename, finger...
51 SADL/UWL Annotations
- Goal annotations
- satisfy: achieve by any means
- hands-off: don't change (maintenance)
- Effect annotations
- cause: change world
- observe: change agent's knowledge
- "Delete the file named junk"
- satisfy (name (f, junk)) ∧ satisfy (deleted (f))
52 Information Goals are Temporal
- Two time points
- When proposition sampled
- When reply given
- "Tell me now who was President in 1883"
- "Tell me tomorrow who is President now"
- "Identify (ASAP) the file now named junk"
53 Information Goals are Temporal
- "Rename paper.tex to kr.tex"
- designator (name) changes
- UWL can't express
- SADL solution: initially (the time the goal was posed)
- initially (name (f, paper.tex)) ∧ satisfy (name (f, kr.tex))
- initially (name (f, core)) ∧ satisfy (deleted (f))
- Compare to more general temporal representation
54 Tidiness Goals
- "Print paper, but don't leave it uncompressed."
- initially (compressed (paper), tv) ∧
- satisfy (printed (paper)) ∧
- satisfy (compressed (paper), tv)
- State of paper.ps may change temporarily
- but must be restored
- Compare to more general goal lang, e.g. LTL
55 Unbounded Information Gain
- action ls (d)
- precondition: satisfy(current.shell(csh)) ∧ satisfy(readable(d))
- effect: ∀ f when in.dir(f, d)
- l, n, d: observe(length(f, l)) ∧ observe(name(f, n)) ∧ observe(in.dir(f, d))
56 Compare PKS Representation
Initial State
  Kf: ( (pwd) root), (indir papers root), (indir planner root), (dir root), (dir papers), (dir planner), (file paper_tex)
  Kx: ((indir paper_tex planner) (indir paper_tex papers))
Goal
  K(indir paper_tex (pwd))
57 The Internet Softbot
58 Knowledge Representation
- Closed World Assumption (CWA)
- Made by classical planners
- Anything not recorded as true is false
- Open World Assumption (OWA)
- Anything not recorded true or false is unknown
- Sensor abuse
- Can't handle ∀ goals
59 Sensor Abuse
- OWA: Don't know when to stop sensing
- Many ways to find same information
- Many plans containing same action
- After executing find / -name foo, should know
- ls bin won't reveal more files named foo
- ls tex won't reveal more files named foo
- Google may reveal more files named foo
60 How Classical Planners Handle ∀
- ∀ block (x) OnTable (x)
- replaced with
- OnTable (A) ∧ OnTable (B) ∧ OnTable (C)
- Relies on CWA
- Must know all blocks
- OWA: can never be sure
[Figure: blocks A, B, C]
61 Local Closed World Knowledge
- Complete info over restricted domain
- All blocks on table, all products at Amazon
- Local Closed World Knowledge (LCW)
- Restricted form of circumscription
- Provides fast closed world inference
- Allows fast updates
- Suited to planner action representations.
62 LCW Semantics
- "I know all files in directory bin"
- LCW(in.dir(f, bin))
- LCW(in.dir(f, bin)) ≡ ∀f: in.dir(f, bin) is known to be true ∨ in.dir(f, bin) is known to be false
63 LCW Representation
- M: ground literals in agent's model
- in.dir(icaps03, papers)
- in.dir(junk, papers)
- ¬executable(core)
- L: LCW formulas in agent's model
- LCW(in.dir(f, papers))
- If P ∉ M, and L ⊨ LCW(P), then ¬P
- Conclude ¬in.dir(foo, papers)
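A minimal sketch of the inference rule above (if P is not in M and L covers P, conclude ¬P), using the slide's example model. The data layout is an assumption for illustration: literals are tuples, M is a dict from ground literal to truth value, variables in LCW patterns are written with a leading "?", and only single-literal LCW formulas are handled (no conjunctions).

```python
def matches(pattern, literal):
    """True if ground `literal` is an instance of `pattern`.

    Both are tuples such as ("in.dir", "?f", "papers"); pattern arguments
    starting with "?" are variables (a notational assumption).
    """
    if len(pattern) != len(literal) or pattern[0] != literal[0]:
        return False
    binding = {}
    for p, a in zip(pattern[1:], literal[1:]):
        if p.startswith("?"):
            if binding.setdefault(p, a) != a:
                return False
        elif p != a:
            return False
    return True


def query(literal, M, L):
    """Return "T", "F", or "U" (unknown) for a ground literal."""
    if literal in M:
        return "T" if M[literal] else "F"
    # Not recorded in M, but local closed world knowledge covers it:
    # everything true in this region is already in M, so it is false.
    if any(matches(pattern, literal) for pattern in L):
        return "F"
    return "U"


# The model from the slide.
M = {
    ("in.dir", "icaps03", "papers"): True,
    ("in.dir", "junk", "papers"): True,
    ("executable", "core"): False,
}
L = [("in.dir", "?f", "papers")]
```

With this model, `query(("in.dir", "foo", "papers"), M, L)` returns "F", while the same question about directory bin returns "U" because no LCW formula covers it.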
64 LCW Reasoning
- Inference
- If I know all files in tex, and I know the size of every file, then do I know the size of every file in tex?
- Updates
- If I know the size of every file in tex, and I remove a file from tex, do I still know the size of every file in tex?
- What if I add a file to tex?
65 LCW Reasoning is Hard
- Theorem
- If LCW formulas can contain ∨ and ¬ then answering an LCW query is NP-hard.
- But we need fast inference!
- Solution: restrict representation
- Positive first-order conjunctions
- Fast: polynomial time inference/updates
(Etzioni et al. AIJ; Levy VLDB96; Friedman & Weld IJCAI97)
66 LCW Updates
- L must be updated when M changes.
- All changes to M fall into one of four categories
- Information loss: Δ(f, T, F → U)
- Information gain: Δ(f, U → T, F)
- Domain growth: Δ(f, F → T)
- Domain contraction: Δ(f, T → F)
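The four categories can be read as a classification of how a literal's truth value changes. A small sketch, assuming truth values are encoded as the strings "T", "F", and "U" (the encoding and the function name are illustrative choices):

```python
def classify_update(old, new):
    """Classify a change in a literal's truth value.

    `old` and `new` are each "T" (true), "F" (false), or "U" (unknown).
    """
    valid = {"T", "F", "U"}
    if old not in valid or new not in valid:
        raise ValueError("truth values must be 'T', 'F', or 'U'")
    if old == new:
        return "no change"
    if new == "U":
        return "information loss"    # T or F becomes unknown
    if old == "U":
        return "information gain"    # unknown becomes T or F
    if (old, new) == ("F", "T"):
        return "domain growth"
    return "domain contraction"      # T becomes F
```

Each category then gets its own rule for which LCW formulas in L remain valid after the change.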
67 Domain Growth
- Adding core to bin invalidates
- LCW(in.dir(f, bin) ∧ size(f, c))
- unless the size of core is known!
- Theorem
- If Δ(f, F → T) then
- L ← L − MREL(f)
- MREL(f) ≡ F ∈ REL(f) ? LCW(F-X)?
- REL(f) ≡ F ∈ L (X ∈ F, ?, a) X? fa ∧ ? ¬(F-X)?
68 LCW Updates
69 Pruning Redundant Sensing
[Plot; axes: Time (CPU seconds), Experience (problems attempted)]
70 The Internet Softbot
Task Manager
SADL Actions
LCW Knowledge
PUCCINI Planner
Sensors
Effectors
UNIX shell WWW
71 XII / Puccini Planner
- Based on UCPOP
- Generative, Partial-Order, Causal-Link
- I.e. much like Gerevini's LPG
- Efficient sensing (LCW control)
- Lifted support of ∀ goals
(Golden et al. 94; Golden PhD)
72 Satisfying ∀ Goals
- Link Directly to ∀ Effect
- Subgoal on LCW
- Then Expand to Ground Form
- Partition
[Figure: rm supports ∀f Satisfy(Deleted(f)); ls (LCW), then lpr foo, lpr bar support ∀f Satisfy(Printed(f))]
73 Threats to LCW, ∀
LCW(in.dir(f, /tex) ∧ size(f, l))
ls -l /tex
goal
74 Softbot Status
- Fully Implemented (1997)
- Hundreds of Unix, Internet Actions
- Daunting Combinatorics
- Declarative Search Control
- Laborious, Brittle
- Hence...
- ? Improved Declarative Control
- ? Reactive Control
- ? Less Expressive Language
75 PG-based Heuristics / Sensing
(Shaked03)
76 Using the Graph
- LPG-like search (local search on POP)
- Propagating sensing action links
- Executing to reach better states
- Sophisticated heuristics!
77 Conclusion
- Planning for the web is ripe for progress
- Data integration
- Modeling sources: GAV, LAV, ...
- Answering queries using views
- Interleaved planning and execution, eddies, CQP
- Service integration
- Web service composition
- Representing unbounded information gain
- Latest heuristic search techniques -> fast!
78 PKS
- Contingent, forward-chaining planner
- Constructs a complete, correct plan
- Separates plan-time and execution-time effects
- Less Expressive
- No universal quantification
- Still needs search control heuristics
(Petrick & Bacchus KR00, AIPS02)