Title: Exchanging Intensional XML Data
1 Exchanging Intensional XML Data
- Tova Milo, INRIA & Tel Aviv University
- Serge Abiteboul, INRIA & Xyleme S.A.
- Bernd Amann, CNAM
- Omar Benjelloun, INRIA
- Frederic Dang Ngoc, INRIA
SIGMOD 2003, San Diego
2 Introduction
3 Intensional documents
- Early days of the web
  - Extensional data (static HTML)
  - CGI scripts (Perl, ...)
    - Code is executed to generate data.
- Intensional data
  - HTML with embedded code (PHP, JSP, ...)
    - Embedded code is executed before sending data.
  - XML with embedded calls to Web services
    - Calls are still evaluated before sending (Jelly, MX, ...).
  - Active XML
    - Calls do not have to be evaluated before sending data.
- Advantages of intensional data
  - More information: it shows how data is generated.
  - Dynamic: it provides the means, e.g., to refresh data.
- Control the exchange of intensional data (to call or not to call?).
4 Web services in a nutshell
- A number of standards
  - XML, SOAP, WSDL, UDDI, ...
- Means to provide, invoke and describe remote functions with XML input/output.
- They make intensional documents exchangeable.
5 Context: Active XML (AXML)
- A language: XML with embedded service calls
- A peer-to-peer system
  - Each peer is
    - a repository of intensional (AXML) documents,
    - a server: it provides Web services (XQuery),
    - a client: when invoking the embedded service calls.
  - And many more cool features
    - distribution and replication
    - continuous services
    - etc.
- AXML peers exchange intensional data.
6 Outline
- Introduction
- Intensional data
- Schema-controlled exchange of intensional data
- Safe rewriting algorithm
- Conclusion
7 Intensional data
8 Materialization
A document with two embedded service calls:
<?xml version="1.0"?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the same document after materializing both calls. It contains Le Monde, 06/10/2003, <temp>16C</temp>, and <exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>, i.e., the result of TimeOut.GetEvents itself contains a service call.]
- Materialization: replacing a service call by its result.
- It's a recursive process: the result may itself contain service calls.
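To make the recursive nature of materialization concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The functions `get_temp`, `get_events` and `get_exhibits`, the `SERVICES` registry, and the sample exhibit they return are hypothetical local stand-ins; a real AXML peer would invoke remote Web services instead.

```python
import xml.etree.ElementTree as ET


# Hypothetical stand-ins for the remote services on the slide.
def get_temp(call):
    return [ET.fromstring("<temp>16C</temp>")]


def get_events(call):
    # This result itself contains a service call, so materialization recurses.
    return [ET.fromstring(
        '<exhibits><call svc="Yahoo.GetExhibits">'
        "<city>Paris</city></call></exhibits>")]


def get_exhibits(call):
    return [ET.fromstring("<exhibit><title>Sample exhibit</title></exhibit>")]


SERVICES = {
    "Yahoo.GetTemp": get_temp,
    "TimeOut.GetEvents": get_events,
    "Yahoo.GetExhibits": get_exhibits,
}


def materialize(elem):
    """Replace every <call> below elem by its result, recursively."""
    i = 0
    while i < len(elem):
        child = elem[i]
        if child.tag == "call":
            result = SERVICES[child.get("svc")](child)
            elem.remove(child)
            for j, node in enumerate(result):
                elem.insert(i + j, node)
            # Do not advance i: the inserted nodes are examined next.
        else:
            materialize(child)
            i += 1
    return elem


doc = ET.fromstring(
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>')
materialize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The loop does not advance after a replacement, so calls returned inside a result are materialized too; this only terminates if the services eventually return call-free data.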
9 To call or not to call?
- Materialization can be performed
  - by the sender, before sending a document,
  - or by the receiver, after receiving it.
10 Why control the materialization of calls?
- For added functionality, e.g.:
  - intensional data allows the receiver to get up-to-date information.
- For security reasons or capabilities, e.g.:
  - I don't trust this Web service/domain,
  - I don't have the right credentials to invoke it,
  - it costs money,
  - maybe the receiver doesn't know Active XML!
- For performance reasons, e.g.:
  - a proxy can invoke all the services on behalf of a PDA.
- ...and many more reasons you can think of!
11 How to control it? Using types
- We extend XML Schema with intensional types: XML Schema_int.
[Figure: data exchange schema; sample documents whose nodes include calls to the services f, g, q and r]
- Static analysis algorithms use the signatures of services: WSDL_int.
12 Schema-controlled exchange
13 The extended schema language
To simplify, we use a DTD-like syntax here.
- Data
  - newspaper: title.date.(GetTemp | temp).(GetEvents | exhibit*)
  - title: data
  - date: data
  - temp: data
  - city: data
  - exhibit: title.(GetDate | date)
- Functions
  - GetTemp(city) -> temp
  - GetEvents(data) -> (exhibit | performance)*
  - GetDate(title) -> date
- Rewriting: replace call(s) by an arbitrary output of the service, i.e., any value of its declared output type.
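One way to experiment with these content models is to encode them as regular expressions over the sequence of child labels, where a call appears under its function name. The sketch below assumes every label is a single token; `MODELS`, `OUTPUT_TYPES`, `to_regex`, `matches` and `rewrite` are illustrative names, not part of the AXML system.

```python
import re

# Content models from the slide ("." = sequence, "|" = choice, "*" = repetition).
MODELS = {
    "newspaper": "title.date.(GetTemp|temp).(GetEvents|exhibit*)",
    "exhibit": "title.(GetDate|date)",
}

# Output types of the functions, written in the same syntax.
OUTPUT_TYPES = {
    "GetTemp": "temp",
    "GetEvents": "(exhibit|performance)*",
    "GetDate": "date",
}


def to_regex(model):
    """Match each label as the token 'label ' and drop the '.' separators."""
    wrapped = re.sub(r"[A-Za-z]+", lambda m: "(?:" + m.group(0) + " )", model)
    return wrapped.replace(".", "")


def matches(model, labels):
    """Does a sequence of child labels (data or call names) fit the model?"""
    word = "".join(label + " " for label in labels)
    return re.fullmatch(to_regex(model), word) is not None


def rewrite(children, position, output):
    """Replace the call at `position` by one output of that service."""
    call = children[position]
    if not matches(OUTPUT_TYPES[call], output):
        raise ValueError(f"{output} is not a possible output of {call}")
    return children[:position] + output + children[position + 1:]


doc = ["title", "date", "temp", "GetEvents"]
print(matches(MODELS["newspaper"], doc))  # True
ok = rewrite(doc, 3, ["exhibit", "exhibit"])
bad = rewrite(doc, 3, ["exhibit", "performance"])
print(matches(MODELS["newspaper"], ok))   # True
print(matches(MODELS["newspaper"], bad))  # False
```

Both rewritings use valid outputs of GetEvents, yet only one yields a valid newspaper, which is why the choice of calls to materialize needs analysis.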
14 Rewritings
- The goal
  - Given
    - an intensional document d,
    - a schema s,
  - can we rewrite d so that it matches s?
- Safe rewriting: one that is guaranteed to lead to s
  (we know this without making any call).
- Possible rewriting: one that possibly leads to s
  (depending on the answer of the service).
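The difference between safe and possible can be illustrated by brute force on the children of the newspaper node, against the type title.date.temp.(GetEvents | exhibit*) used later in the talk. This is only a sketch under stated assumptions: it samples GetEvents outputs up to length 2 (a hypothetical cut-off), fixes the set of invoked calls in advance, and ignores calls nested in results. A "possible" verdict is backed by a real witness and a real counterexample, but "safe" and "fails" are only trustworthy when no unbounded output is invoked; the automata-based approach of the slides avoids enumeration altogether.

```python
import itertools
import re


def to_regex(model):
    """DTD-like model -> regex over tokens 'label ' ('.' = sequence)."""
    wrapped = re.sub(r"[A-Za-z]+", lambda m: "(?:" + m.group(0) + " )", model)
    return wrapped.replace(".", "")


def matches(model, labels):
    word = "".join(label + " " for label in labels)
    return re.fullmatch(to_regex(model), word) is not None


TARGET = "title.date.temp.(GetEvents|exhibit*)"

# Hypothetical finite samples of each service's outputs; GetEvents returns
# (exhibit|performance)*, which we cut off at length 2.
OUTPUTS = {
    "GetTemp": [["temp"]],
    "GetEvents": [list(w) for n in range(3)
                  for w in itertools.product(["exhibit", "performance"], repeat=n)],
}


def outcomes(children, invoked):
    """Every document obtained by invoking the calls at positions `invoked`."""
    choices = [OUTPUTS[c] if i in invoked else [[c]]
               for i, c in enumerate(children)]
    for combo in itertools.product(*choices):
        yield [label for part in combo for label in part]


def classify(children, invoked):
    results = [matches(TARGET, doc) for doc in outcomes(children, invoked)]
    if all(results):
        return "safe"
    if any(results):
        return "possible"
    return "fails"


children = ["title", "date", "GetTemp", "GetEvents"]
print(classify(children, {2}))     # safe
print(classify(children, {2, 3}))  # possible
print(classify(children, set()))   # fails
```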
15 Difficulties
- Infinite search space
  - Vertical: service results may contain further calls, nested to any depth
  - Horizontal: output types such as (exhibit | performance)* allow arbitrarily long results
- Main problem
  - The result of a Web service call is unknown;
  - we just know its signature (input/output types).
- We want a very efficient solution.
- Foundations of the problem
  - tree automata,
  - with existential and universal transitions.
16 Results
- Restrictions on the considered rewritings
  - Left-to-right: no going back and forth
  - k-depth: a bound on the nesting of function calls
  - (The search space is still infinite but finitely representable.)
- Under these restrictions
  - we have algorithms to find safe/possible rewritings;
  - they are PTIME (for deterministic schemas);
  - we can also do it between schemas.
- Recent follow-up work by [MSS03]
  - The general problem is undecidable.
  - Some complexity results.
17 Safe rewriting algorithm
18 Safe rewriting algorithm
- Sketch
  - Deal with function parameters,
  - traverse the tree top-down,
  - for each data node, rewrite its children.
19 Rewriting the input parameters of calls
- To invoke a service, the parameters must match its signature.
- Start from the deepest calls.
- Finish by rewriting the document.
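The order prescribed above is a post-order traversal of the calls: a call's parameters, which may themselves contain calls, are visited before the call itself. A minimal sketch with `xml.etree.ElementTree`; the nested `GetCity` call and its `country` parameter are hypothetical, added only to place a call inside another call's parameters.

```python
import xml.etree.ElementTree as ET


def calls_deepest_first(elem):
    """Post-order traversal: the calls inside a call's parameters come first."""
    order = []
    for child in elem:
        order.extend(calls_deepest_first(child))
    if elem.tag == "call":
        order.append(elem.get("svc"))
    return order


# Hypothetical document: the parameter of GetTemp is itself a call (GetCity).
doc = ET.fromstring(
    "<newspaper>"
    '<call svc="GetTemp"><call svc="GetCity"><country>France</country></call></call>'
    '<call svc="GetEvents">exhibits</call>'
    "</newspaper>"
)
print(calls_deepest_first(doc))  # ['GetCity', 'GetTemp', 'GetEvents']
```

Once every call in this list has parameters matching its input signature, the document itself is rewritten top-down.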
21 Rewriting a node's children
- We have
  - the children: title.date.GetTemp.GetEvents
  - the type to match: title.date.temp.(GetEvents | exhibit*)
  - the output types of the services
    - GetTemp -> temp
    - GetEvents -> (exhibit | performance)*
- Three steps
  - Build an FSA that accepts all k-depth rewritings of the word.
  - Build an FSA that recognizes the complement of the type.
  - Compute their intersection to find a safe rewriting.
- Smarter algorithm in the system: lazy automata construction.
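The three steps above can be approximated in a few lines. The sketch below is our own simplified formulation, not the paper's algorithm: instead of building the complement automaton explicitly, it plays a game on pairs (position in the word, state of the target DFA), where the rewriter chooses whether to invoke each call and the service may return any word of its output type. Ending outside `TARGET_ACCEPT` (for instance in the sink `DEAD`) is exactly what the complement automaton would accept. It assumes k = 1 (calls returned by services are not invoked), that later decisions may depend on earlier results, and a deterministic target type; the state names and the `OUTPUTS` encoding are hypothetical.

```python
from functools import lru_cache

DEAD = "dead"  # implicit sink state of the target DFA

# Target type title.date.temp.(GetEvents | exhibit*) as a DFA;
# any missing transition goes to DEAD.
TARGET_DELTA = {
    ("p0", "title"): "p1",
    ("p1", "date"): "p2",
    ("p2", "temp"): "p3",
    ("p3", "GetEvents"): "p4",
    ("p3", "exhibit"): "p5",
    ("p5", "exhibit"): "p5",
}
TARGET_START = "p0"
TARGET_ACCEPT = {"p3", "p4", "p5"}

# Output types as small NFAs: (start, accepting states, transitions).
OUTPUTS = {
    "GetTemp": ("s0", {"s1"}, {("s0", "temp"): {"s1"}}),
    "GetEvents": ("s0", {"s0"}, {("s0", "exhibit"): {"s0"},
                                 ("s0", "performance"): {"s0"}}),
}


def step(p, label):
    """One move of the target DFA, completed with the sink DEAD."""
    return DEAD if p == DEAD else TARGET_DELTA.get((p, label), DEAD)


def states_after_some_output(p, func):
    """Target states reachable from p by reading SOME output word of func
    (reachability over the product of func's NFA and the target DFA)."""
    start, accept, trans = OUTPUTS[func]
    seen, stack, result = {(start, p)}, [(start, p)], set()
    while stack:
        s, q = stack.pop()
        if s in accept:
            result.add(q)
        for (src, label), targets in trans.items():
            if src == s:
                for t in targets:
                    pair = (t, step(q, label))
                    if pair not in seen:
                        seen.add(pair)
                        stack.append(pair)
    return result


def is_safe(word):
    """True if the rewriter always reaches the target type, whatever the
    services return (k = 1, left to right)."""

    @lru_cache(maxsize=None)
    def win(i, p):
        if p == DEAD:
            return False
        if i == len(word):
            return p in TARGET_ACCEPT
        symbol = word[i]
        if win(i + 1, step(p, symbol)):  # leave the label or call as it is
            return True
        if symbol in OUTPUTS:  # invoke: every possible output must be fine
            reachable = states_after_some_output(p, symbol)
            return bool(reachable) and all(win(i + 1, q) for q in reachable)
        return False

    return win(0, TARGET_START)


print(is_safe(("title", "date", "GetTemp", "GetEvents")))             # True
print(is_safe(("title", "date", "GetTemp", "GetEvents", "exhibit")))  # False
```

The first word is rewritten safely by invoking GetTemp and keeping GetEvents, which is the rewriting found on slide 24. For the second word, invoking GetEvents might return a performance, while keeping it leaves a GetEvents node followed by an exhibit, which the type forbids, so no safe rewriting exists.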
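22 Rewriting a node's children
- The FSA that accepts all k-depth rewritings of the word.
  - This one is for title.date.GetTemp.GetEvents.
- Output types of services
  - GetTemp -> temp
  - GetEvents -> (exhibit | performance)*
[Figure: the rewriting automaton; each call can be read as is or replaced by its output (temp for GetTemp, a loop over exhibit and performance for GetEvents); visible states include q2 and q3]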
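For k = 1 the language of all rewritings of a word is easy to write down: every call f is either kept or replaced by a word of f's output type. The sketch below builds this language as a Python regular expression; Python's `re` engine backtracks rather than running an FSA, but the expression denotes the same regular language. Names such as `rewritings_regex` are illustrative, and calls occurring inside service results are not expanded further.

```python
import re

# Output types of the services (from the slide).
OUTPUT_TYPES = {"GetTemp": "temp", "GetEvents": "(exhibit|performance)*"}


def token(label):
    """Each label is matched as the token 'label ' (label plus a space)."""
    return "(?:" + label + " )"


def type_regex(model):
    """Convert a type such as (exhibit|performance)* into a token regex."""
    return re.sub(r"[A-Za-z]+", lambda m: token(m.group(0)), model)


def rewritings_regex(word):
    """All 1-depth rewritings: each call is kept or replaced by one output."""
    parts = []
    for label in word:
        if label in OUTPUT_TYPES:
            parts.append("(?:" + token(label) + "|"
                         + type_regex(OUTPUT_TYPES[label]) + ")")
        else:
            parts.append(token(label))
    return "".join(parts)


REWRITINGS = re.compile(rewritings_regex(["title", "date", "GetTemp", "GetEvents"]))


def is_rewriting(labels):
    return REWRITINGS.fullmatch("".join(label + " " for label in labels)) is not None


print(is_rewriting(["title", "date", "temp", "GetEvents"]))                  # True
print(is_rewriting(["title", "date", "GetTemp", "exhibit", "performance"]))  # True
print(is_rewriting(["title", "date", "performance"]))                        # False
```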
23 Rewriting a node's children (2)
- The complement automaton for the target type.
  - newspaper: title.date.temp.(GetEvents | exhibit*)
[Figure: the complement automaton; visible labels include state p6 and transitions GetEvents and exhibit]
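Complementing is only straightforward for a deterministic automaton: complete it with a sink state for every missing transition, then swap accepting and non-accepting states. A minimal sketch for the target type above; the state names p0 to p5, the sink name and the alphabet are our own choice and need not coincide with the states drawn on the slide.

```python
# Target type title.date.temp.(GetEvents | exhibit*) as a partial DFA.
DELTA = {
    ("p0", "title"): "p1",
    ("p1", "date"): "p2",
    ("p2", "temp"): "p3",
    ("p3", "GetEvents"): "p4",
    ("p3", "exhibit"): "p5",
    ("p5", "exhibit"): "p5",
}
STATES = {"p0", "p1", "p2", "p3", "p4", "p5"}
START = "p0"
ACCEPT = {"p3", "p4", "p5"}
ALPHABET = {"title", "date", "temp", "GetTemp", "GetEvents", "exhibit", "performance"}


def complement(delta, states, accept, alphabet, sink="sink"):
    """Complete the DFA with a sink state, then swap accepting and rejecting."""
    full = dict(delta)
    all_states = states | {sink}
    for q in all_states:
        for a in alphabet:
            full.setdefault((q, a), sink)
    return full, all_states - accept


def accepts(delta, start, accept, word):
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accept


CO_DELTA, CO_ACCEPT = complement(DELTA, STATES, ACCEPT, ALPHABET)
print(accepts(CO_DELTA, START, CO_ACCEPT, ["title", "date", "temp", "performance"]))  # True
print(accepts(CO_DELTA, START, CO_ACCEPT, ["title", "date", "temp", "exhibit"]))      # False
```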
24 Rewriting a node's children (3)
[Figure: the product of the two automata, with state pairs from (q0,p0) to (q7,p6) and transitions labeled title, date, GetTemp, temp, GetEvents, exhibit and performance]
A safe rewriting exists!
title.date.GetTemp.GetEvents -> title.date.temp.GetEvents
25 Other algorithms (in the paper)
- Possible rewriting
- Schema compatibility
  - Verifies that all instances of a schema safely rewrite to instances of another schema.
  - Key idea: it is sufficient to check a finite number of instance representatives.
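The possible-rewriting check reduces to a plain intersection: build the product of the 1-depth rewriting automaton of the word with the target DFA and test whether an accepting pair is reachable. The breadth-first sketch below assumes output types encoded as small NFAs, a deterministic target type title.date.temp.(GetEvents | exhibit*), hypothetical state names, and k = 1.

```python
from collections import deque

# Target type title.date.temp.(GetEvents | exhibit*) as a partial DFA.
TARGET_DELTA = {
    ("p0", "title"): "p1",
    ("p1", "date"): "p2",
    ("p2", "temp"): "p3",
    ("p3", "GetEvents"): "p4",
    ("p3", "exhibit"): "p5",
    ("p5", "exhibit"): "p5",
}
TARGET_START, TARGET_ACCEPT = "p0", {"p3", "p4", "p5"}

# Output types as small NFAs: (start, accepting states, transitions).
OUTPUTS = {
    "GetTemp": ("s0", {"s1"}, {("s0", "temp"): {"s1"}}),
    "GetEvents": ("s0", {"s0"}, {("s0", "exhibit"): {"s0"},
                                 ("s0", "performance"): {"s0"}}),
}


def moves(word, state):
    """Moves of the 1-depth rewriting automaton as (label, next state);
    label None is an epsilon move. (i, None) means 'before symbol i';
    (i, s) means 'inside the output of the call at position i, in state s'."""
    i, s = state
    if s is None:
        if i == len(word):
            return []
        result = [(word[i], (i + 1, None))]  # keep the symbol as it is
        if word[i] in OUTPUTS:  # or start reading an output of the call
            result.append((None, (i, OUTPUTS[word[i]][0])))
        return result
    _, accept, trans = OUTPUTS[word[i]]
    result = [(a, (i, t)) for (src, a), ts in trans.items() if src == s for t in ts]
    if s in accept:  # the output may end here
        result.append((None, (i + 1, None)))
    return result


def possible_rewriting_exists(word):
    """Breadth-first search over the product of both automata."""
    start = ((0, None), TARGET_START)
    seen, queue = {start}, deque([start])
    while queue:
        state, p = queue.popleft()
        if state == (len(word), None) and p in TARGET_ACCEPT:
            return True
        for label, nxt in moves(word, state):
            q = p if label is None else TARGET_DELTA.get((p, label))
            if q is None:
                continue  # the target type rejects this label here
            pair = (nxt, q)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False


print(possible_rewriting_exists(["title", "date", "GetTemp", "GetEvents"]))             # True
print(possible_rewriting_exists(["title", "date", "GetTemp", "GetEvents", "exhibit"]))  # True
print(possible_rewriting_exists(["title", "date", "GetEvents"]))                        # False
```

The second word has a possible rewriting (invoke both calls and hope GetEvents returns only exhibits), even though the chance of a performance rules out a safe one.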
26 Conclusion
- Schema-controlled exchange of intensional data
- Implemented as part of the Active XML system
- Fun applications
  - Easy customization of Web services (VLDB'03 demo)
    - Types form the basis to match client preferences
  - Surveillance of an AXML application (call tracing)
- Perspectives
  - Extend with automatic data conversion
  - Further optimize the algorithm (notably, for simple cases)
27 Shameless advertisement
- Active XML
  - a language and peer-to-peer system based on XML with embedded calls to Web services
  - VLDB'02 demo
  - SIGMOD session, tomorrow morning
  - http://www-rocq.inria.fr/verso/Gemo/Projects/axml
  - (or google://ActiveXML)
- Lots of cool applications
  - mobile computing, network configuration, warehouse of web resources
28 Thank you