Title: On Views and XML
1 On Views and XML
- Serge Abiteboul
- INRIA
- PODS 1999
2 Organization
- Introduction
- XML View Query
- Change Control
- Objects
- Structured and Semistructured Data
- Active Features
- Incomplete Information
- more...
Many Facets!
3 Warning
- This is not a survey on database views
- This is not a tutorial on XML
- This is about the use of XML and e-commerce as excuses
to survey some work on views cast in a
fashionable context: O2Views, views of OEM,
ActiveViews, Lorel/Ozone...
- (and also to motivate future work)
4 Executive Summary
Database folks should be interested in XML Views,
and more and more are.
Footnote: this is a great way to recycle your old
results on views, incomplete information,
deductive databases, universal instance
assumption, dependency theory, etc.
5 Introduction: XML in short
- Document mark-up language; descendant of SGML
- Standard for data exchange on the Web
- We are interested here in data exchange and not
in document editing and retrieval
6 Example: EDI (Electronic Data Interchange)
- Standard for business data exchange
- 2 standards
- ANSI X12 in US -- all B2G by end 1999
- EDIFACT in world -- UN committee
- translate → EDI → transmit
7
<!DOCTYPE Book-Order PUBLIC "-//Editor//DTD Book Order Message//EN">
<Book-Order Supplier="4012345000094" Send-to="http://www.bic.org/order.in">
<title>Editor Lite-EDI Book Ordering</title>
<Order-No>967634</Order-No>
<Message-Date>19961002</Message-Date>
<Buyer-EAN>5412345000176</Buyer-EAN>
<Order-Line Reference-No="0528837">
<ISBN>0316907235</ISBN>
<Author-Title>Labaln, Brian/Chrome</Author-Title>
<Quantity>2</Quantity>
</Order-Line>
<Order-Line Reference-No="0528838">
<ISBN>0856674427</ISBN>
<Author-Title>Parry, Linda (ed)/William Morris</Author-Title>
<Quantity>1</Quantity>
</Order-Line>
<input type="checkbox" name="partial" value="allowed"/>
<text>Tick here if a delayed/partial supply of order is acceptable</text>
<input type="checkbox" name="confirmation" value="requested"/>
<text>Tick here if Confirmation of Acceptance of Order is to be returned by e-mail</text>
<input type="checkbox" name="DeliveryNote" value="required"/>
data in XML/EDI
8 I personally prefer
9 XML
- Some noise and confusion
- Is the syntax important? No
- What is XML?
- the means to exchange tree/graph data on the Web
- an object-oriented API for it
- more
10 A (simplified) model for XML
- XML-tree = list(node)
- node = string | element | ref node
- element = label × list(att × string) × list(node)
- label = string
- att = string
- an attribute occurs at most once
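The model on this slide can be sketched as Python types. This is only an illustration: it assumes the node rule is a choice among a string, an element, and a reference, and that an element pairs a label with its attributes and its ordered children. The names `Ref`, `Element`, and `text_of` are hypothetical, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Ref:
    """The 'ref node' case: a pointer to another node."""
    target: "Node"


@dataclass
class Element:
    """An element: a label, its attributes, and its ordered children."""
    label: str
    # A dict keyed by attribute name enforces "an attribute occurs at most once".
    atts: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# node = string | element | ref node (assumed reading of the slide)
Node = Union[str, Element, Ref]
# XML-tree = list(node)
XMLTree = List[Node]


def text_of(node: Node) -> str:
    """Concatenate the strings below a node, without following references."""
    if isinstance(node, str):
        return node
    if isinstance(node, Ref):
        return ""
    return "".join(text_of(child) for child in node.children)


person = Element("person", children=[
    Element("name", children=["Serge Abiteboul"]),
    Element("address", children=[
        Element("city", children=["Le Chesnay"]),
        Element("zip", children=["92310"]),
    ]),
])
tree: XMLTree = [person]
```

Keeping attributes in a dictionary is one simple way to get the uniqueness constraint for free; a list of pairs would need an explicit check.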
11 XML in short
<person>
<name>Serge Abiteboul</name> PODS invited speaker
<a xml:link="simple" href="gif/serge.gif">old picture</a>
<address> <city>Le Chesnay</city><zip>92310</zip> </address>
<a xml:link="simple" href="www-rocq.inria.fr/abitebou">Web</a>
</person>
- DTD: grammar; DCD: some typing
- DOM: object API; RDF: meta data
- XPOINTER/XLINK ...
12 XML Views
[Figure: view server architecture. Labels: Query, Publish/subscribe, Crawler/filter engine, Security manager, Request broker, Business intelligence, Output/report/delivery, Data Warehouse, OLAP, Image/video, reports, Information repository, View server, Web browsers]
13 What databases can bring to XML is query
optimization and query rewriting
View Query
14 View Query
- as for the relational model
- use of query optimization techniques
- use of query rewriting techniques
- processing queries using views
- main issue: virtual vs. materialized
15 B2C: Comparative Shopping
- http://www.addall.com
- 24 bookstores searched in about 10 seconds
- between 42 and 78
- that's why people will use them!
16 What DB can bring to XML is the control of
changes
View Change Control
17 Some of the most studied problems for relational
views
- update propagation
- incremental updates
- view update problem
18 D2V: Incremental Updates
- a customer has loaded portions of the catalog
- some prices change
- no need to reload the entire catalog
- many such examples on the Web
- ? updates
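The catalog scenario on this slide can be sketched as follows. The delta format (pairs of product code and new price) and the function name `apply_deltas` are assumptions made for illustration; the talk does not prescribe a representation.

```python
from typing import Dict, List, Tuple

# Hypothetical delta format: (product code, new price).
Delta = Tuple[str, float]


def apply_deltas(cached: Dict[str, float], deltas: List[Delta]) -> int:
    """Patch the customer's cached portion in place; return how many entries changed."""
    touched = 0
    for code, new_price in deltas:
        # Products the customer never loaded are skipped:
        # the cache holds only a portion of the catalog.
        if code in cached and cached[code] != new_price:
            cached[code] = new_price
            touched += 1
    return touched


cached = {"F2GHYYRF": 1200.0, "A1": 50.0}
touched = apply_deltas(cached, [("F2GHYYRF", 1100.0), ("ZZ9", 10.0)])
```

Only the changed prices travel to the customer, so the cost depends on the size of the change rather than on the size of the catalog.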
19 V2D: View Update
- Sometimes considered less of an issue: the Web is
read-only!
- Many Web applications involve updates
- We may be able to annotate the products of the
catalog
- some of the data is in read mode
- some data is not visible (this is only a view!)
- some data may be updated
20 Example: Change Detection
- A customer (self) is in a department
(self.department) and may want to see only the
current promotions of products in this department
(MyPromotions)
- let MyPromotions be
- select I.
- from I in Catalog.promotions.item
- where I.department = self.department
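As a rough illustration, the MyPromotions view can be evaluated over an XML catalog with Python's standard `xml.etree.ElementTree`. The sample document and the element names under `item` are assumptions; only the path `Catalog.promotions.item` and the `department` test come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog content, shaped to match Catalog.promotions.item.
CATALOG = """
<Catalog>
  <promotions>
    <item><name>Gismo223</name><department>electronic</department></item>
    <item><name>Novel</name><department>book</department></item>
  </promotions>
</Catalog>
"""


def my_promotions(catalog_root, self_department):
    """Items I in Catalog.promotions.item whose department equals self.department."""
    return [
        item
        for item in catalog_root.findall("./promotions/item")
        if item.findtext("department") == self_department
    ]


root = ET.fromstring(CATALOG)
names = [item.findtext("name") for item in my_promotions(root, "electronic")]
```

This evaluates the view from scratch on each call, which is the virtual option; detecting changes would compare successive results or work from deltas.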
21 Query Subscription: Changes (from Chawathe's thesis)
- Changes in labeled graphs, as in DOEM
[Figure: labeled graph. Labels: Catalog, name, Gismos78, item, promotion, department, electronic, price, 234, department, self, 278]
22 Query Subscription: Changes
- Change the value of an atomic vertex
- Creation of a new vertex
- Addition/removal of an edge
- Change of the label on an edge: add/remove
- Move a vertex: add/remove
- annotations on edges and vertices
23 Query Subscription: Queries
- select P.code, P.description
- from P in Catalog.product
- where P.price<changed>Q (vertex annotation)
- where P.<added>description (edge annotation)
- where P.price (data in annotation)
- <changed <old: Q', date: T>>Q
- and Q - Q' > 100 and T > 99/04/03
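The last query on this slide can be approximated in Python by storing the annotation next to the value. This is a sketch under assumptions: the annotation carries the previous price and the change date, the condition compares the new price with the old one, and 99/04/03 is read as 3 April 1999. `Changed`, `Product`, and `big_recent_increases` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Changed:
    """Vertex annotation: the value before the change and when it happened."""
    old: float
    when: date


@dataclass
class Product:
    code: str
    description: str
    price: float
    price_changed: Optional[Changed] = None  # None if the price never changed


def big_recent_increases(products: List[Product], min_rise: float,
                         after: date) -> List[Tuple[str, str]]:
    """Code and description of products whose price rose by more than min_rise after a date."""
    return [
        (p.code, p.description)
        for p in products
        if p.price_changed is not None
        and p.price - p.price_changed.old > min_rise
        and p.price_changed.when > after
    ]


products = [
    Product("A", "gismo", 350.0, Changed(200.0, date(1999, 4, 10))),
    Product("B", "widget", 350.0, Changed(300.0, date(1999, 4, 10))),
    Product("C", "gadget", 350.0, Changed(200.0, date(1999, 3, 1))),
    Product("D", "doodad", 350.0),
]
result = big_recent_increases(products, 100, date(1999, 4, 3))
```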
24 Query Subscription: Examples
- On the first of each month, send me the list of
all products in my interest list such that their
price increased by more than 10
- Each time there are ten new employees, send me
their names and departments
- Notify me if the price of this house decreases
- similarity: on event, when condition, do action
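The second subscription on this slide (a batch of ten new employees) maps naturally onto the event, condition, action pattern. The class below is a hypothetical sketch, not the talk's system: the event is a new employee, the condition is a full batch, and the action is a delivery callback.

```python
from typing import Callable, List, Tuple

Employee = Tuple[str, str]  # (name, department)


class BatchSubscription:
    """On event (new employee), when condition (batch is full), do action (deliver)."""

    def __init__(self, batch_size: int,
                 action: Callable[[List[Employee]], None]):
        self.batch_size = batch_size
        self.action = action
        self.pending: List[Employee] = []

    def on_new_employee(self, name: str, department: str) -> None:
        self.pending.append((name, department))       # event
        if len(self.pending) >= self.batch_size:      # condition
            batch, self.pending = self.pending, []
            self.action(batch)                        # action


delivered: List[List[Employee]] = []
sub = BatchSubscription(10, delivered.append)
for i in range(25):
    sub.on_new_employee(f"emp{i}", "sales")
```

After 25 events, two batches have been delivered and five employees are still pending.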
25 XML World of Objects
The underlying model for XML is object-based
and XML views should be based on OO(DB)
technology
26 Views World of objects
- API for XML: Document Object Model (DOM)
- Views XML as object-oriented
- Allows designing C or Java applications
- E.g.
- use subclass Promotion of XMLNode
- Catalog.promotions is only a set of virtual
elements
- the list of promotions is generated on demand
based on the nature of customers
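A small Python sketch of the idea that `Catalog.promotions` is virtual: nothing is stored, and the list is computed per customer when asked. The class layout and the promotion rule are invented for illustration and are not the O2 or DOM API.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]
Customer = Dict[str, object]


class Catalog:
    """Holds stored products; promotions are virtual and never stored."""

    def __init__(self, products: List[Product],
                 is_promoted_for: Callable[[Product, Customer], bool]):
        self._products = products
        self._is_promoted_for = is_promoted_for

    def promotions(self, customer: Customer) -> List[Product]:
        # Computed on each call, so the result depends on who is asking.
        return [p for p in self._products if self._is_promoted_for(p, customer)]


def student_rule(product: Product, customer: Customer) -> bool:
    # Hypothetical policy: students see promotions on books only.
    return customer["kind"] == "student" and product["department"] == "book"


catalog = Catalog(
    [{"name": "Novel", "department": "book"},
     {"name": "Gismo223", "department": "electronic"}],
    student_rule,
)
student_view = catalog.promotions({"kind": "student"})
other_view = catalog.promotions({"kind": "retailer"})
```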
27 Views in OODB: O2Views
- Virtual values
- like for relational views
- entirely virtual XML document, e.g., view of
relational data
- virtual attributes
- e.g., product: code, name, price,
- alternatives: the set of products that
are similar and are on promotion
28 Views in OODB: O2Views
- Virtual class: a set of database objects that are
grouped together and as such acquire a new
interface
- catalog1/DTD1, ..., catalog17/DTD17
- products are represented differently in each
catalog
- a unique DTD that allows viewing all products
- each product can be viewed with that DTD
29 Views in OODB: O2Views
- Imaginary class: groups objects that are all
virtual, e.g., join of two relations
- For more, see Souza's thesis
30 XML data/views: semistructured and structured
data
XML should also allow the exchange of
structured data as in relational/ODMG models
31 Semistructured and Structured Data
- If we know about the structure of data, not using
it may damage performance
- The use of structure facilitates the programming
of applications, e.g., in Java
- Structure may be useful to explain data to users
- For more, see Lahiri's thesis and Ozone (OQL and
Lorel)
32 Web catalog - continued
- Product-basic: all products (regular data)
- category: electronic, subcategory: sound,
- name: Gismo223, code: F2GHYYRF,
- selling-price: 1200FF
- Product-specific: for Gismos only (semistructured data)
- voltage: list(110,220), Gismo-norm: GHTF333
- External resources (external data)
- description: http://m.ec.fr/cat/Gismo
- reviews: http://reviews.com/Gismo
- Private data (other regular data)
- buying-price: 100, quantity-in-stock: 20000,
supplier: Sears, authorized-discount: 30
33 This data in XML
<product>
<basic>
<cat> electronic <subcat>sound</subcat></cat>
<n>Gismo223</n><c>F2GHYYRF</c>
<sp currency="French-franc">1200</sp> </basic>
<specific>
<v>110</v><v>220</v>
<Gismo-norm>GHTF333</Gismo-norm> </specific>
<external> </external>
<private>
<bp currency="dollar">100</bp> <qis>20000</qis>
<s>Sears</s> <ad>30</ad></private></product>
34 What is such data exactly?
- A mix of structured and semistructured data with
pointers between the two worlds
- Purely XML. Then
- use a relation as a materialized view
- Product(name, code, category, subcategory, price,
rest)
- Index on name and subcategory
- select P.name, P.price from P in Product
- where P.subcategory = "sound"
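One way to try the materialized-view idea is with Python's standard `sqlite3` and `xml.etree.ElementTree`. The sketch below makes several assumptions: the sample documents are invented, the two indexes are created separately, and the `rest` column simply stores the whole document rather than only the leftover part.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product documents following the slide's element names.
DOCS = [
    "<product><basic><cat>electronic<subcat>sound</subcat></cat>"
    "<n>Gismo223</n><c>F2GHYYRF</c><sp>1200</sp></basic></product>",
    "<product><basic><cat>electronic<subcat>video</subcat></cat>"
    "<n>Gismo9</n><c>X1</c><sp>900</sp></basic></product>",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (name TEXT, code TEXT, category TEXT,"
    " subcategory TEXT, price REAL, rest TEXT)"
)
# "Index on name and subcategory", read here as one index per column.
conn.execute("CREATE INDEX product_by_name ON Product (name)")
conn.execute("CREATE INDEX product_by_subcat ON Product (subcategory)")

for doc in DOCS:
    basic = ET.fromstring(doc).find("basic")
    cat = basic.find("cat")
    conn.execute(
        "INSERT INTO Product VALUES (?, ?, ?, ?, ?, ?)",
        (
            basic.findtext("n"),
            basic.findtext("c"),
            (cat.text or "").strip(),   # text of <cat> before its <subcat> child
            cat.findtext("subcat"),
            float(basic.findtext("sp")),
            doc,                        # simplification: keep the full document
        ),
    )

rows = conn.execute(
    "SELECT name, price FROM Product WHERE subcategory = ?", ("sound",)
).fetchall()
```

The query then runs against the relation and its index instead of scanning XML, at the cost of keeping the relation in sync with the documents.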
35 Digression: storage of XML
- as blobs
- generic mapping: ignore the structure
- specific mapping
- relational
- object
- hybrid
36 As blobs
- <product> <basic> <cat> electronic <subcat>sound</subcat></cat>
<n>Gismo223</n><c>F2GHYYRF</c>
<sp currency="French-franc">1200</sp> </basic>
<specific> <v>110</v><v>220</v>
<Gismo-norm>GHTF333</Gismo-norm> </specific>
<external> </external> <private> <bp
currency="dollar">100</bp> <qis>20000</qis>
<s>Sears</s> <ad>30</ad></private></product>
- full-text index
37 Generic mapping
element graph (source, label, target)
- root product o1
- o1 basic o2
- o2 cat o3
- o2 subcat o4
- o2 n o5
- o2 c o6
- o2 sp o7 ...
atomic objects (object, value)
- o3 electronic
- o4 sound
- o5 Gismo223
- o6 F2GHYYRF
- o7 1200 ...
attributes (object, name, value)
- o7 currency French-franc
- o12 currency dollar ...
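A generic mapping of this kind can be sketched in a few lines of Python: walk the document and emit rows for three tables, without looking at any schema. The function name `shred` and the object identifier scheme are assumptions, and text that follows a child element (mixed content) is ignored for brevity.

```python
import itertools
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Return (edges, atoms, atts) rows for one document, ignoring its structure."""
    counter = itertools.count(1)
    edges, atoms, atts = [], [], []

    def visit(parent_id, elem):
        oid = f"o{next(counter)}"
        edges.append((parent_id, elem.tag, oid))      # element graph
        for name, value in elem.attrib.items():
            atts.append((oid, name, value))           # attributes
        text = (elem.text or "").strip()
        if text:
            atoms.append((oid, text))                 # atomic objects
        for child in elem:
            visit(oid, child)

    visit("root", ET.fromstring(xml_text))
    return edges, atoms, atts


edges, atoms, atts = shred(
    '<product><basic><n>Gismo223</n>'
    '<sp currency="French-franc">1200</sp></basic></product>'
)
```

The same three tables hold any document, which is the appeal of the generic mapping; the price is that every path query turns into a chain of joins on the edge table.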
38 Specific
- Class Product
- type tuple( cat: string, subcat: set(string),
- n: string, c: string, price: Price,
- specific: OEM,
- external: list(tuple(label: string, val: URL)),
- private pr: tuple(
- bp: Price, qis: integer,
- supplier: Company ) )
- type Price: tuple(sum: int, currency: Currency)
39 What is better? Hybrid?
- Need for comparative studies
- My feeling/common sense?
- Use structure for very structured portions of
data
- Use semistructured storage for less structured portions
or portions whose structure evolves a lot
- Use blobs for components accessed mostly via
full-text indexing, e.g., paragraphs in a document
40 Views Active Features
41 Active Views
- System developed at INRIA
- Long term goals
- Declarative specification of data intensive
applications with cooperation between partners
- Ease of use and fast deployment
- (Automatic) verification
42 Architecture
[Figure: ActiveViews architecture. Labels: Java, AVApi, DOM, O2, Java application, O2 Notification, Java RMI, XML repository, ActiveViews manager, Web Browser, Java Client]
43 Motivations
- Database Applications
- passive behavior
- closed systems
- persistence, concurrency, access control
- New needs
- interactions between clients, e.g., notification
- change control
- reactive behavior
- E.g., e-commerce, cooperative work
44 Illustration of Interactions: Notification
- In the vendor view
- when Customer.entersDept(dept)
- if dept = self.dept
- then notifyme
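The vendor rule on this slide reads like an observer callback. Below is a minimal Python sketch of that reading; `Vendor`, `Store`, and their methods are hypothetical names, and the real ActiveViews system relays such events between servers and clients rather than inside one process.

```python
from typing import List


class Vendor:
    def __init__(self, name: str, dept: str):
        self.name = name
        self.dept = dept
        self.inbox: List[str] = []

    def on_enters_dept(self, customer: str, dept: str) -> None:
        # when Customer.entersDept(dept): if dept = self.dept then notify me
        if dept == self.dept:
            self.inbox.append(f"{customer} entered {dept}")


class Store:
    """Relays customer events to every registered vendor view."""

    def __init__(self) -> None:
        self.vendors: List[Vendor] = []

    def enters_dept(self, customer: str, dept: str) -> None:
        for vendor in self.vendors:
            vendor.on_enters_dept(customer, dept)


store = Store()
book_vendor = Vendor("v1", "book")
music_vendor = Vendor("v2", "music")
store.vendors.extend([book_vendor, music_vendor])
store.enters_dept("alice", "book")
```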
45 Notification
[Figure: notification flow. Labels: AVClient (customer), entersDept book, AVServer, notify, notify, AVServer, AVClient (vendor in book dept)]
46 Illustration of Interaction: Change Control
- In the customer view
- let monitored MyPromotions be
- select I.name, I.price
- from I in Catalog.promotions.item
- where I.department = self.department
- read, write, append, monitored, refresh,
deferred
- simpler case: monitoring of the catalog
47 Change control
[Figure: change control sequence between two AVClients and two AVServers. Steps: 1 Read, 2 Read, 3 Modification, 4 Write, 5 Notification, 6 Notification, 7 Read]
48 Choices
- All XML
- XML repository
- XML query language
- XML views
- Declarative specification
- almost no code to write
- compilation to an executable application
- active rules
49 Important Aspects
- workflow
- e.g., customization: to search for a biblio ref,
look first in my own files, otherwise look in
dblp, otherwise look ...
- activities (search, buy, accounting, chat)
- active rules
- logical traces
- notifications
50 View Incomplete Information
Use something like Imielinski-Lipski tables
51 Example: portal
- Q1: gismo vendors
- {V | ∃P sell(V,gismo,P)}
- Q1 = {v1, v2, v3, v4, v5}
- Q2: price for each vendor
- {V, P | sell(V,gismo,P)}
- Q3: cheap gismo vendors
- {V | ∃P (sell(V,gismo,P) and P<80)}
Q1
comp
v1
v2
v3
v4
v5
Q2
comp  price
v1    109
v2    X
v3    99
v4    89
v5    Y
Q3
comp  price  cond
v2    X      X<80
v5    Y      Y<80
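The Q3 table can be reproduced with a small Python sketch in the spirit of conditional tables. It assumes unknown prices are represented by the name of a null (a string such as "X") and that a row with a null is kept together with the condition under which it qualifies; `cheap_vendors` is a hypothetical helper.

```python
from typing import List, Optional, Tuple, Union

Price = Union[int, str]  # a known price, or the name of a null such as "X"
CondRow = Tuple[str, Price, Optional[str]]


def cheap_vendors(q2: List[Tuple[str, Price]], limit: int) -> List[CondRow]:
    """Return (comp, price, cond) rows; cond is None when the row certainly qualifies."""
    rows: List[CondRow] = []
    for comp, price in q2:
        if isinstance(price, str):
            # Unknown price: keep the row, guarded by a condition on the null.
            rows.append((comp, price, f"{price}<{limit}"))
        elif price < limit:
            rows.append((comp, price, None))
    return rows


Q2 = [("v1", 109), ("v2", "X"), ("v3", 99), ("v4", 89), ("v5", "Y")]
Q3 = cheap_vendors(Q2, 80)
```

Vendors with known prices of 80 or more are dropped, while v2 and v5 survive with the conditions X<80 and Y<80, matching the Q3 table.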
52 Example: more portal
- Load all electronic products
- expiration, e.g., to recover storage space
- for all products loaded before May 1st, discard
images and text of annotations
- give me the gismos that have been annotated by
Jeff Ullman and the annotations
53 View workspace, distribution, cache...
Just to say, there is much more to it...
54 Conclusion
55 Some Challenges: Semistructured Data Processing
- XML storage under non-generic form
- XML query language optimization
- XML bulk loading
- data conversion, integration
- incomplete information
56 Some Challenges: Change Control and View
Interaction
- update detection
- incremental propagation
- temporal XML versions, DOEM...
- rule and trigger management
- management of large number of user active views
(personalized)
57 Some Challenges: Workflow
- workflow management: task sequencing
- declarative specification of applications
- program verification
58 Conclusion
Database folks should be interested in XML Views,
and more and more are...