Planning for the Web II Execution & Service Integration - PowerPoint PPT Presentation

About This Presentation
Title:

Planning for the Web II Execution & Service Integration

Description:

Planning for the Web II Execution & Service Integration Dan Weld University of Washington June, 2003 Acknowledgements Oren Etzioni Yolanda Gil Keith Golden Alon ... – PowerPoint PPT presentation

Slides: 77
Provided by: homesCsWa
Transcript and Presenter's Notes

Title: Planning for the Web II Execution & Service Integration


1
Planning for the Web II: Execution & Service Integration
  • Dan Weld
  • University of Washington
  • June, 2003

2
Acknowledgements
  • Oren Etzioni
  • Yolanda Gil
  • Keith Golden
  • Alon Halevy
  • Zack Ives
  • Tal Shaked

Caveat
3
Outline
  • Execution for Data Integration
  • Coping with incomplete statistics, latency
  • Interleaved planning & execution
  • Convergent query processing
  • Service Integration
  • Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  • Automated data analysis

4
Optimization and Execution
  • Problem
  • Few and unreliable statistics about the data.
  • Unexpected (possibly bursty) network transfer
    rates.
  • Generally, unpredictable environment.
  • General solution (research area)
  • Adaptive query processing.
  • Interleave optimization and execution. As you get
    to know more about your data, you can improve
    your plan.

5
Adaptivity Incremental Processing Query
Performance
Evaluated within the Tukwila system
Ives PhD
6
Query Optimization Model Query Plans Execution
Choose the Best
[Figure: candidate query plans, each a tree of join operators (op) over the sources Shipping (S) 90 tuples, Restock (R) 100 tuples, and Orders (O) 50 tuples]
From source sizes and stats, estimate result sizes and costs
7
Why Does Data Integration Make Optimization
Harder?
  • Query optimization estimates costs using
    knowledge about environment and data
  • Data source sizes (cardinalities)
  • Often unavailable or not meaningful in data
    integration
  • Histograms
  • Too expensive to maintain in data integration
  • I/O costs
  • Network I/O costs fluctuate
  • Need a way to gain this sort of knowledge!

8
Some Solutions
  • Adaptive operators
  • Mid-query reoptimization
  • Convergent query processing
  • Query scrambling [Franklin et al.]
  • Eddies [Hellerstein et al.]

9
Tukwila Data Integration System
  • Novel components
  • Event handler
  • Optimization-execution loop
  • Adaptive operators

10
Double Pipelined Join
  • Hybrid Hash Join
  • No output until build relation read
  • Asymmetric (build vs. probe): optimization
    requires source behavior knowledge
  • Double Pipelined Hash Join
  • Outputs data immediately
  • Symmetric: requires less source knowledge to
    optimize
  • Threads overlap I/O, computation
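
A minimal sketch of the double pipelined (symmetric) hash join idea, assuming in-memory inputs and a simple round-robin interleaving in place of the threads and queues the slide mentions. The function name, the key accessors, and the sample tuples are illustrative assumptions, not Tukwila code. Each arriving tuple is inserted into its own side's hash table and immediately probed against the other side's table, so results appear before either input is fully read.

```python
from collections import defaultdict
from itertools import zip_longest


def double_pipelined_join(left, right, left_key, right_key):
    """Symmetric hash join: emit a match as soon as both halves have arrived."""
    left_table = defaultdict(list)   # left tuples seen so far, by join key
    right_table = defaultdict(list)  # right tuples seen so far, by join key
    # Round-robin over both inputs to imitate tuples trickling in from two
    # sources (None marks an exhausted input, so inputs must not contain None).
    for l, r in zip_longest(left, right):
        if l is not None:
            k = left_key(l)
            left_table[k].append(l)          # build on the left side
            for match in right_table[k]:     # probe the right side
                yield (l, match)
        if r is not None:
            k = right_key(r)
            right_table[k].append(r)         # build on the right side
            for match in left_table[k]:      # probe the left side
                yield (match, r)


# Hypothetical sample data: (order_id, item) and (order_id, carrier).
orders = [(1, "widget"), (2, "gadget"), (3, "gizmo")]
shipping = [(2, "air"), (1, "ground"), (1, "rail")]
results = list(double_pipelined_join(orders, shipping,
                                     lambda o: o[0], lambda s: s[0]))
```

Because both sides are treated identically, the optimizer does not have to guess which input should be the build relation; the price is holding two hash tables in memory, which is why the overflow handling discussed in the summary matters.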

11
Performance on Networked Data
Join of 3 TPC-H tables (Lineitem, Supplier, Order) sent via JDBC over 10Mb Ethernet
[Figure: axes are Time (sec) and Tuples Output (1000s)]
12
Double Pipelined Join in Summary
  • Benefits
  • Easier to optimize (symmetric)
  • Sub-operations scheduled flexibly
  • Allows overlap of I/O and computation
  • Incurs some overhead
  • Threading, queues
  • Required extensions to intelligently handle
    overflow
  • Same hash function, number of buckets for each
    side
  • Approaches: flush buckets on left side, or flush
    symmetrically

13
Some Solutions
  • Adaptive operators
  • Mid-query reoptimization
  • Interleaved planning and execution
  • Convergent query processing
  • Query scrambling
  • Eddies

14
Mid-query reoptimization
Materialization point: write AB to disk
If actual statistics differ from predicted statistics, replan [Kabra & DeWitt]
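
The check at a materialization point can be sketched roughly as follows. This is an illustration under stated assumptions, not the Kabra and DeWitt algorithm itself: the factor-of-two tolerance, the function names, and the callback interface are all hypothetical.

```python
def should_replan(estimated_rows, actual_rows, tolerance=2.0):
    """True when the observed cardinality is off by more than `tolerance` x.

    The tolerance value is an assumption made for this sketch.
    """
    if estimated_rows <= 0 or actual_rows <= 0:
        return estimated_rows != actual_rows
    ratio = actual_rows / estimated_rows
    return ratio > tolerance or ratio < 1.0 / tolerance


def run_with_checkpoint(execute_prefix, estimated_rows, plan_rest, replan_rest):
    """Run the plan prefix, materialize it, then keep or replace the remainder."""
    materialized = list(execute_prefix())      # the materialization point
    if should_replan(estimated_rows, len(materialized)):
        rest = replan_rest(len(materialized))  # re-optimize with real statistics
    else:
        rest = plan_rest                       # the original remainder is fine
    return rest(materialized)


# Hypothetical usage: the optimizer predicted 100 rows but 500 arrive.
chosen = []

def original_rest(rows):
    chosen.append("original")
    return len(rows)

def make_new_rest(actual_rows):
    def new_rest(rows):
        chosen.append("replanned")
        return len(rows)
    return new_rest

count = run_with_checkpoint(lambda: range(500), 100, original_rest, make_new_rest)
```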
15
Some Solutions
  • Adaptive operators
  • Mid-query reoptimization
  • Convergent query processing
  • Query scrambling
  • Eddies

16
Convergent Query Processing
  • Instead of adapting remainder of plan
  • after executing all data on plan prefix
  • Adapt whole plan
  • after executing whole plan on part of data
  • Can better gather information this way

17
Convergent Query Processing in Action: Changing
Join Plans in Mid-Stream
(R ⋈ O ⋈ S)
Join Restock, Orders, Shipping
[Figure: plan diagram with nodes labeled ROS and RS]
18
Breaking a Join into Phases: One Subset per
Table, Each Phase
Restock (R)
Orders (O)
19
The Cleanup Plan: Reuses Previous Work Where
Possible
Restock ⋈ Orders ⋈ Shipping
[Figure: cleanup plan; node labels R0 O0S0, R1 O1S1, R2 O2S2, R2O2, R0 S0, O1S1]
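
To see why a cleanup step is needed at all, here is a small sketch, assuming each table has already been split into per-phase subsets. Each phase joins only its own subsets, so tuples that match across phases are missed until cleanup. The helper names and data are hypothetical, and unlike the cleanup plan on the slide this sketch recomputes cross-phase joins from scratch instead of reusing saved intermediate results.

```python
from itertools import product


def join3(r_part, o_part, s_part):
    """Nested-loop three-way join on the key in position 0 (illustrative only)."""
    return [(r, o, s) for r in r_part for o in o_part for s in s_part
            if r[0] == o[0] == s[0]]


def cqp_join(r_phases, o_phases, s_phases):
    """Join phase by phase, then run a cleanup over cross-phase combinations."""
    n = len(r_phases)
    results = []
    # Regular phases: phase i sees only the i-th subset of each table, and a
    # real system could pick a different plan for each phase.
    for i in range(n):
        results += join3(r_phases[i], o_phases[i], s_phases[i])
    # Cleanup: every combination of subsets that no single phase covered.
    for i, j, k in product(range(n), repeat=3):
        if not (i == j == k):
            results += join3(r_phases[i], o_phases[j], s_phases[k])
    return results


def flatten(phases):
    return [t for part in phases for t in part]


# Hypothetical data: matching tuples arrive in different phases.
r_phases = [[(1, "r1")], [(2, "r2")]]
o_phases = [[(2, "o2")], [(1, "o1")]]
s_phases = [[(1, "s1"), (2, "s2")], []]

phased = cqp_join(r_phases, o_phases, s_phases)
full = join3(flatten(r_phases), flatten(o_phases), flatten(s_phases))
```

In this data neither regular phase produces any output; both result tuples come from the cleanup step, and together the phases and cleanup reproduce the full join.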
20
CQP on a 100Mbps LAN: Nearly Optimal
Performance
866MHz P-III, 256MB buffer pool, re-optimization
every 10sec
cost to parse XML
21
Slow WAN, Faster CPU: CQP Reduces Work
1GHz P-III, 256MB, re-optimization every 10sec.
1Mbps network, RTT 50msec
22
Outline
  • Execution for Data Integration
  • Coping with incomplete statistics, latency
  • Interleaved planning & execution
  • Convergent query processing
  • Service Integration
  • Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  • Automated data analysis

23
What is a Web Service?
  • A web service is a network-accessible interface
    to application functionality, built using
    standard Internet protocols (TCP/IP, XML, SOAP,
    WSDL)
  • Clients of a web service do NOT need to know how
    it is implemented.
  • Why interesting?
  • Increased automation

Web Service
Network
Application code
Application client
24
Case Study: Amazon
  • Services Exported
  • Product details (short, long, images, samples)
  • Purchase functionality
  • Ratings, reviews, collaborative filtering data,
    lists,
  • Examples
  • Store builder tools
  • Amazon Browser visualization tool
  • Windows desktop interfaces drag-n-drop
  • MP3 Piranha
  • Games
  • Automatic review writer??

25
Case Study: Google
  • Services Exported
  • Search interface
  • Limits on items returned, queries / day
  • Examples
  • Metacrawler functionality
  • Geosearch: nearby Thai restaurants
  • TIGER, FIPS -> lat, long of pages
  • Robust hyperlinks
  • Creates a signature for destination pages;
    tracks with query

26
Case Study: Fed Express
  • Shipment tracking
  • Proof of delivery
  • Invoice reviewed, adjusted, settled
  • Schedule pickup time, location
  • Outgoing or returns
  • Order supplies (airbills, envelopes, boxes)
  • Review shipping history
  • Rate requests
  • Location, package size
  • International trade
  • Required documents, duties, taxes

27
Case Study: Hailstorm / MyServices
  • Web Services
  • MyDocuments
  • MyAddressbook
  • MyWallet
  • MyNotifications ...
  • Scenario
  • Wallet keeps receipts, arranges product return
  • Expedia uses notifications to warn of canceled
    flight
  • Reality
  • Ebay, AmEx, Groove,

28
Case Study: OAA
  • Common schema for travel industry
  • Reservations
  • Flights, trains, rental cars, hotels
  • Time distances
  • Payment, deposits, vouchers
  • Vacation Packages

29
Web Service Technology Stack
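[Diagram: the Web Service Client asks UDDI "shopping web service?" and gets back WSDL URIs; it retrieves the WSDL from the Web Service; its Proxy then exchanges a SOAP pkg request and a SOAP pkg response with the service]
Stack layers, top to bottom:
  • Discovery (UDDI)
  • Description (WSDL)
  • Packaging (SOAP)
  • Transport
  • Network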
30
Step 1. Write Web Service Method
31
Step 2. Describe Web Service using WSDL
32
SOAP (Simple Object Access Protocol)
  • SOAP Messages
  • XML Payload
  • Using SOAP as RPC (Remote Procedure Call) Messages

SOAP client
SOAP server
Request message
Response message
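
As a rough illustration of SOAP used as RPC, the sketch below builds and parses a request message with Python's standard `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace, the helper names, and the choice of `foo` with a single `arg` parameter are assumptions made for the example, and a real client would normally rely on a SOAP toolkit and the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.org/simple"  # hypothetical service namespace


def build_rpc_request(method, params):
    """Wrap a method call and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)  # XML payload is text
    return ET.tostring(envelope, encoding="unicode")


def parse_rpc_request(xml_text):
    """Server side: recover the method name and parameters from the message."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body
    method = call.tag.split("}")[1]                 # strip the namespace
    return method, {child.tag: child.text for child in call}


request = build_rpc_request("foo", {"arg": 42})
method, params = parse_rpc_request(request)
```

The response message has the same shape, with the Body carrying a result element instead of the call.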
33
If a WS were a Phone Call
  • XML
  • represents the conversation,
  • SOAP
  • describes the rules for how to call someone
  • UDDI
  • is the phone book.
  • WSDL
  • describes what the phone call is about and how
    you can participate.

34
WSDL
for int foo(int arg)
    <types>
      <schema targetNamespace="http://tempuri.org/xsd"
              xmlns="http://www.w3.org/2001/XMLSchema"
              xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
              xmlns:wsdl="http://schemas...l/"
              elementFormDefault="qualified">
      </schema>
    </types>
    <message name="Simple.foo">
      <part name="arg" type="xsd:int"/>
    </message>
    <message name="Simple.fooResponse">
      <part name="result" type="xsd:int"/>
    </message>
    <portType name="SimplePortType">
      <operation name="foo" parameterOrder="arg">
        <input message="wsdlns:Simple.foo"/>
        <output message="wsdlns:Simple.fooResponse"/>
      </operation>
    </portType>
35
DISCO
  • If you know the URL for a service
  • DISCO lets you query it
  • And get back a WSDL description
  • But what if you don't know the right URL?

36
UDDI
  • Hosted Registries
  • Microsoft, IBM, HP, SAP, NTT, BEA
  • Entries defined with
  • Business information
  • Name, contacts, descriptions, identifier, yellow
    pages category
  • Service information
  • Entities, each of which describes a family of
    related services which together implement a
    business process
  • Binding information
  • How to invoke: URI, required parameters, options,
    tModel
  • Service specifications (tModel)
  • As a symbol fingerprint to recognize a known
    service
  • Decomposable to find WSDL description

37
Acronyms (W3C, MSFT, IBM)
  • UDDI
  • Discover, describe, register services
  • SOAP-based service for locating WSDL-formatted
    service descriptions
  • DISCO
  • Discover / retrieve SCL/SDL descriptions
  • SDL / NASSL
  • SOAP description language: get params / types
  • SCL
  • SOAP contract language: extends SDL with orchestration
    of msgs
  • WSDL
  • Describe abstract interface and protocol
    bindings of arbitrary network services
    (extends SCL)
  • XLANG / WSFL / BPEL4WS
  • Language for business processes, used in BizTalk
  • Biz process execution language for web services
  • MSFT, IBM, BEA proposal

SDL
NASSL
SCL
WSDL
38
The Layer Cake [TBL, XML2000]
39
RDF (Resource Description Framework)
  • Way to describe resources via metadata
  • Makes no assumptions about a particular
    application domain
  • Based on XML
  • Another one?
  • Standard for semantic web
  • Restricts resource descriptions to triplets
  • (subject,predicate,object)
  • Provides a lightweight ontology system
  • Subproperty, Subclass, Domain, Range
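
A toy sketch of the triple model and the lightweight subclass reasoning it supports, assuming triples are plain Python tuples. The resource names are made up for illustration; `subClassOf` and `type` stand in for `rdfs:subClassOf` and `rdf:type`. A real application would use an RDF library and proper URIs.

```python
def superclasses(triples, cls):
    """All classes reachable from `cls` via subClassOf (transitive closure)."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(triples, cls):
    """Resources whose declared type is `cls` or one of its subclasses."""
    result = set()
    for s, p, o in triples:
        if p == "type" and (o == cls or cls in superclasses(triples, o)):
            result.add(s)
    return result


# Hypothetical (subject, predicate, object) triples.
triples = {
    ("WebService", "subClassOf", "Service"),
    ("ShoppingService", "subClassOf", "WebService"),
    ("amazon", "type", "ShoppingService"),
    ("fedex", "type", "WebService"),
}
services = instances_of(triples, "Service")
```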

40
DAML+OIL (www.daml.org)
  • DAML extends RDF and RDFS with richer modeling
    primitives.
  • disjointWith, intersectionOf, oneOf, cardinality
  • Able to provide properties of properties
  • uniqueness, transitivity, etc.

41
DAML-S
  • DAML+OIL ontology describing Web Services
  • Complements low level descriptions like WSDL
  • Describes what a service does and why,
  • Not just how to communicate with it.
  • Goals: Discovery, Invocation, Composition,
    Verification, Execution Monitoring

(mapping to WSDL)
42
Outline
  • Execution for Data Integration
  • Coping with incomplete statistics, latency
  • Interleaved planning & execution
  • Convergent query processing
  • Service Integration
  • Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  • Automated data analysis

43
Partial Survey of Planners
  • UW Internet Softbot
  • Planners: SENSp / XII / PUCCINI
  • Representation languages: UWL / SADL, LCW
  • PKS
  • Planning at the knowledge level
  • McDermott
  • Forward-chaining search w/ GRG guidance
  • McIlraith et al.
  • ConGolog (procs, loops, conditionals, w/ nondet)
  • Papazoglou, Traverso et al.
  • Stratified service architecture, XSRL language, MBP
  • Finin, Srivastava, Knoblock, Ambite, Nau

44
Planning for image processing tasks
  • Many fielded systems
  • Lansky's COLLAGE, Chien et al. MVP/ASIP,
  • Golden ADLIM, Blythe GRID
  • Spatial representations important

45
Motivating Scenarios
  • Planning a trip
  • Yahoo maps -> driving time -> travel prefs
  • Automatic expense form filing
  • Purchasing a group of items
  • Aggregation from multiple vendors
  • Select for payment types, stock level, deliv
  • Local 3rd party reputation services (BBB)
  • Monitoring marketplace
  • Auction sites
  • Events (check calendar / notification service)

46
UW Internet Softbot
  • Software robot
  • Effectors: mv, ftp, chmod, cd, lpr, rm, ...
  • Sensors: ls, finger, INSPEC, netfind, wc, ...
  • Say what we want, not how to do it
  • Find phone numbers, fetch/print online papers,
  • Integrate multiple resources

47
Motivation/Contributions
  • Represent actions like ls, finger
  • Represent goals such as
  • Rename paper.tex to kr.tex
  • Print all files in directory papers.
  • (even with incomplete information)
  • No previous system could express

48
The Middle Ground
1. Action Representation
2. Knowledge Representation
49
Softbot Architecture
50
SADL Family Tree
STRIPS [Fikes & Nilsson, 71]
ADL [Pednault, 89]: ∀, Conditional Effects
UWL [Etzioni et al, 92]: Incomplete info, Noise-free sensors
SADL [Golden & Weld, 96]: Represents ls, Rename, finger...
51
SADL/UWL Annotations
  • Goal annotations
  • satisfy: achieve by any means
  • hands-off: don't change (maintenance)
  • Effect annotations
  • cause: change world
  • observe: change agent's knowledge
  • Delete the file named junk
  • satisfy (name (f, junk)) ∧ satisfy (deleted (f))

52
Information Goals are Temporal
  • Two time points
  • When proposition sampled
  • When reply given
  • Tell me now who was President in 1883
  • Tell me tomorrow who is President now
  • Identify (ASAP) the file now named junk

53
Information Goals are Temporal
  • Rename paper.tex to kr.tex
  • designator (name) changes
  • UWL can't express
  • SADL solution: initially = time goal
    was posed
  • initially (name (f, paper.tex)) ∧
  • satisfy (name (f, kr.tex))
  • initially (name (f, core)) ∧ satisfy (deleted (f))
  • Compare to more general temporal representation

54
Tidiness Goals
  • Print paper, but don't leave it uncompressed.
  • initially (compressed (paper), tv) ∧
  • satisfy (printed (paper)) ∧
  • satisfy (compressed (paper), tv)
  • State of paper.ps may change temporarily
  • but must be restored
  • Compare to more general goal lang, e.g. LTL

55
Unbounded Information Gain
  • action ls (d)
  • precondition: satisfy(current.shell(csh)) ∧
  • satisfy(readable(d))
  • effect: ∀ f when in.dir(f, d)
  • l, n, d
    observe(length(f, l)) ∧
  • observe(name(f, n)) ∧
  • observe(in.dir(f, d))
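
The point of this slide is that a single `ls` yields an amount of new knowledge that depends on the directory's contents, which the planner cannot bound in advance. A small simulation, assuming a toy world model and a set of ground facts as the agent's knowledge (both hypothetical, not the Softbot's data structures):

```python
def ls(world, knowledge, directory):
    """Sensing action: observe name, length, and location of each file in
    `directory`. Returns the number of new facts; the world is not changed."""
    gained = 0
    for file_id, (parent, name, length) in world.items():
        if parent != directory:
            continue  # the universally quantified effect applies only here
        for fact in (("in.dir", file_id, directory),
                     ("name", file_id, name),
                     ("length", file_id, length)):
            if fact not in knowledge:
                knowledge.add(fact)
                gained += 1
    return gained


# Hypothetical world: file id -> (directory, name, length in bytes).
world = {
    "f1": ("papers", "kr.tex", 1200),
    "f2": ("papers", "junk", 10),
    "f3": ("bin", "ls", 900),
}
knowledge = set()
first = ls(world, knowledge, "papers")   # learns three facts per file found
second = ls(world, knowledge, "papers")  # repeating the sensor adds nothing
```

The second call gaining nothing is exactly the redundancy that the later slides on sensor abuse and LCW aim to detect before execution.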

56
Compare PKS Representation
Initial State:
  Kf: ( (pwd) root), (indir papers root), (indir planner root), (dir root), (dir papers), (dir planner), (file paper_tex)
  Kx: ((indir paper_tex planner) (indir paper_tex papers))
Goal: K(indir paper_tex (pwd))
57
The Internet Softbot
58
Knowledge Representation
  • Closed World Assumption (CWA)
  • Made by classical planners
  • Anything not recorded as true is false
  • Open World Assumption (OWA)
  • Anything not recorded true or false is unknown
  • Sensor abuse
  • Can't handle ∀ goals

59
Sensor Abuse
  • OWA: Don't know when to stop sensing
  • Many ways to find same information
  • Many plans containing same action
  • After executing find / -name foo, should know:
  • ls bin won't reveal more files named foo
  • ls tex won't reveal more files named foo
  • Google may reveal more files named foo

60
How Classical Planners Handle ∀
  • ∀x block (x) ⇒ OnTable (x)
  • replaced with
  • OnTable (A) ∧ OnTable (B)
  • ∧ OnTable (C)
  • Relies on CWA
  • Must know all blocks
  • OWA: can never be sure
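
The grounding step described above can be sketched as follows, assuming the planner's object table lists every object and its type. That assumption is the closed world assumption at work: an unlisted block would silently be left out of the expanded goal. Function and variable names are hypothetical.

```python
def ground_universal_goal(known_objects, type_name, goal_pred):
    """Expand 'for all x of type_name, goal_pred(x)' into ground goals.

    Correct only if `known_objects` really lists every object (CWA).
    """
    return [(goal_pred, obj)
            for obj, typ in sorted(known_objects.items())
            if typ == type_name]


# Hypothetical object table for a three-block world.
known = {"A": "block", "B": "block", "C": "block", "T": "table"}
goals = ground_universal_goal(known, "block", "OnTable")
```

Under the open world assumption the same expansion is unsound, because the agent cannot rule out a fourth block it has not yet observed.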

61
Local Closed World Knowledge
  • Complete info over restricted domain
  • All blocks on table, all products at Amazon
  • Local Closed World Knowledge (LCW)
  • Restricted form of circumscription
  • Provides fast closed world inference
  • Allows fast updates
  • Suited to planner action representations.

62
LCW Semantics
  • I know all files in directory bin
  • LCW(in.dir(f, bin))
  • LCW(in.dir(f, bin)) ≡
  • ∀f [⊨ in.dir(f, bin)] ∨
  • [⊨ ¬in.dir(f, bin)]

63
LCW Representation
  • M: Ground literals in agent's model
  • in.dir(icaps03, papers)
  • in.dir(junk, papers)
  • ¬executable(core)
  • L: LCW formulas in agent's model
  • LCW(in.dir(f, papers))
  • If P ∉ M, and L ⊨ LCW(P), then ¬P
  • Conclude ¬in.dir(foo, papers)
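
The inference rule on this slide can be sketched in a few lines, assuming literals are tuples, variables are strings starting with `?`, and each LCW formula is a single atom. Negative literals in the model, conjunctive formulas, and repeated variables within one pattern are deliberately left out, so this is a simplification rather than the Softbot's representation.

```python
def matches(pattern, literal):
    """A pattern matches when every non-variable term equals the literal's."""
    return len(pattern) == len(literal) and all(
        p == x or p.startswith("?") for p, x in zip(pattern, literal))


def truth_value(literal, model, lcw):
    """True if recorded, False if LCW covers it, None if unknown."""
    if literal in model:
        return True
    if any(matches(pattern, literal) for pattern in lcw):
        return False  # closed-world inference licensed by an LCW formula
    return None       # open world: not recorded, so unknown


M = {("in.dir", "icaps03", "papers"), ("in.dir", "junk", "papers")}
L = [("in.dir", "?f", "papers")]  # "I know all files in directory papers"

in_papers = truth_value(("in.dir", "foo", "papers"), M, L)
in_bin = truth_value(("in.dir", "foo", "bin"), M, L)
```

Here `foo` is concluded not to be in `papers`, while its membership in `bin` stays unknown because no LCW formula covers that directory.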

64
LCW Reasoning
  • Inference
  • If I know all files in tex, and I know the size
    of every file, then do I know the size of every
    file in tex?
  • Updates
  • If I know the size of every file in tex, and I
    remove a file from tex, do I still know the size
    of every file in tex?
  • What if I add a file to tex?

65
LCW Reasoning is Hard
  • Theorem
  • If LCW formulas can contain ∨ and ¬ then
  • answering an LCW query is NP-hard.
  • But we need fast inference!
  • Solution: restrict representation
  • Positive first-order conjunctions
  • Fast polynomial time inference/updates

[Etzioni et al. AIJ; Levy VLDB96; Friedman & Weld IJCAI97]
66
LCW Updates
  • L must be updated when M changes.
  • All changes to M fall into one of four
    categories
  • Information loss: Δ(φ, T ∨ F → U)
  • Information gain: Δ(φ, U → T ∨ F)
  • Domain growth: Δ(φ, F → T)
  • Domain contraction: Δ(φ, T → F)
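
The four categories amount to a case split on a literal's old and new truth value. A minimal sketch, assuming truth values are encoded as the strings "T", "F", and "U" for unknown (an encoding chosen for this example):

```python
VALUES = {"T", "F", "U"}  # true, false, unknown


def classify_update(old, new):
    """Name the kind of change when a literal goes from `old` to `new`."""
    if old not in VALUES or new not in VALUES:
        raise ValueError("truth values must be 'T', 'F', or 'U'")
    if old == new:
        return "no change"
    if new == "U":
        return "information loss"    # T or F becomes unknown
    if old == "U":
        return "information gain"    # unknown becomes T or F
    # Only F -> T and T -> F remain at this point.
    return "domain growth" if (old, new) == ("F", "T") else "domain contraction"


kinds = [classify_update(o, n)
         for o, n in [("T", "U"), ("U", "F"), ("F", "T"), ("T", "F")]]
```

Each category calls for a different update to L.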

67
Domain Growth
  • Adding core to bin invalidates
  • LCW(in.dir(f, bin) ∧ size(f, c))
  • unless the size of core is known!
  • Theorem
  • If Δ(φ, F → T) then
  • L := L - MREL(φ)
  • MREL(φ) ≡ Φ ∈ REL(φ) ? LCW(Φ-X)?
  • REL(φ) ≡ Φ ∈ L (X ∈ Φ, ?, α) X? φα ∧ ? ¬(Φ-X)?
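
A deliberately conservative sketch of the domain growth update, assuming LCW formulas are lists of atom patterns (a conjunction) with `?`-prefixed variables. It is not the MREL computation from the theorem: it drops every multi-atom formula that the newly true literal could instantiate, even in the case where the other conjuncts happen to be known. Discarding too much loses closed-world information but never licenses a wrong inference. All names are hypothetical.

```python
def matches(pattern, literal):
    """A pattern matches when every non-variable term equals the literal's."""
    return len(pattern) == len(literal) and all(
        p == x or p.startswith("?") for p, x in zip(pattern, literal))


def domain_growth(lcw, new_literal):
    """Drop LCW conjunctions that a newly true literal may invalidate."""
    kept = []
    for conj in lcw:
        touches = any(matches(atom, new_literal) for atom in conj)
        if touches and len(conj) > 1:
            # e.g. LCW(in.dir(f, bin) and size(f, c)): the new file's size
            # may be unknown, so completeness is no longer guaranteed.
            continue
        kept.append(conj)
    return kept


L = [
    [("in.dir", "?f", "bin")],                         # still complete
    [("in.dir", "?f", "bin"), ("size", "?f", "?c")],   # possibly invalidated
    [("in.dir", "?f", "tex"), ("size", "?f", "?c")],   # unaffected
]
L_after = domain_growth(L, ("in.dir", "core", "bin"))
```

The single-atom formula survives because the agent itself recorded that core is now in bin, so it still knows every file in that directory.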

68
LCW Updates
69
Pruning Redundant Sensing
Time (CPU seconds)
Experience (problems attempted)
70
The Internet Softbot

Task Manager
SADL Actions
LCW Knowledge
PUCCINI Planner
Sensors
Effectors
UNIX shell WWW
71
XII / Puccini Planner
  • Based on UCPOP
  • Generative, Partial-Order, Causal-Link
  • I.e. much like Gerevini's LPG
  • Efficient sensing (LCW control)
  • Lifted support of ? goals

[Golden et al. 94; Golden PhD]
72
Satisfying ∀ Goals
  • Link Directly to ∀ Effect
  • Subgoal on LCW
  • Then Expand to Ground Form
  • Partition

[Figure: plan fragments. rm: ∀f Satisfy(Deleted(f)). ls, LCW, lpr foo, lpr bar: ∀f Satisfy(Printed(f))]
73
Threats to LCW, ∀
LCW(in.dir(f, /tex) ∧ size(f, l))
ls -l /tex
goal
74
Softbot Status
  • Fully Implemented (1997)
  • Hundreds of Unix, Internet Actions
  • Daunting Combinatorics
  • Declarative Search Control
  • Laborious, Brittle
  • Hence...
  • ? Improved Declarative Control
  • ? Reactive Control
  • ? Less Expressive Language

75
PG-based Heuristics / Sensing
[Shaked03]
76
Using the Graph
  • LPG-like search (local search on POP)
  • Propagating sensing action links
  • Executing to reach better states
  • Sophisticated heuristics!

77
Conclusion
  • Planning for the web is ripe for progress
  • Data integration
  • Modeling sources: GAV, LAV, ...
  • Answering queries using views
  • Interleaved planning and execution, eddies, CQP
  • Service integration
  • Web service composition
  • Representing unbounded information gain
  • Latest heuristic search techniques -> fast!
78
PKS
  • Contingent, forward-chaining planner
  • Constructs a complete, correct plan
  • Separates plan-time and execution-time effects
  • Less Expressive
  • No universal quantification
  • Still needs search control heuristics

[Petrick & Bacchus KR00, AIPS02]