Title: Hyperset/Web-like Databases and the Experimental Implementation of the Query Language Delta
1 Hyperset/Web-like Databases and the Experimental Implementation of the Query Language Delta
- Mr Richard Molyneux
- Also co-authored by Dr Vladimir Sazonov
- Department of Computer Science
- University of Liverpool
2 Introduction
- Semi-structured databases (SSD)
- Schema-less.
- Self-describing.
- Typically represented by a graph.
- Examples: hypertext documents and XML.
- Web-like databases (WDB).
- Known query languages for SSD: Lorel, UnQL, UnCAL, G-Log, XML-QL, XSLT, XSL and XQuery (XPath).
3 Introduction: Hyperset Approach
- In general:
- arbitrary sets of sets of sets, even with cycles.
- no prescribed structure.
- Hypersets are a generalisation of the relational approach:
- relational database = set of relations.
- relation = set of tuples.
- tuple = set of labelled values.
4 Hyperset Approach
- Hyperset data are represented as graphs, or equivalently as systems of set equations.
- This is analogous to the Web, hence the name Web-like databases (WDB).
bob = {wife:alice, name:Bob}
alice = {husband:bob, name:Alice, pet:sam}
sam = {name:Sam, species:cat}
5 Set Equality: Bisimulation
b2 = {author:jones, title:Databases}
p3 = {author:jones, title:Databases}
- Therefore, b2 = p3,
- or b2 is bisimilar to p3.
- Thus, is a book equal to a paper?
- Or is this the result of bad database design?
- Either way, this WDB is formally allowed, like any other graph or system of set equations, and it will illustrate bisimulation issues.
6 Set Equality: Bisimulation (Better Database Design)
b2 = {author:jones, title:Databases, type:Book}
p3 = {author:jones, title:Databases, type:Paper}
- Therefore, b2 ≠ p3,
- or b2 is not bisimilar to p3.
7 Set Equality: Bisimulation
- Bisimulation: equality between any graph nodes or set names.
- Any two sets are equal if, for each (labelled) element of the first set, there exists an equal (bisimilar) element in the second set, and vice versa.
- In general, this is a recursive procedure for computing deep equality.
b2 = {author:jones, title:Databases, title:Databases}
p3 = {title:Databases, author:jones}
Therefore, b2 = p3, or b2 is bisimilar to p3.
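The recursive definition can be sketched in Python as a fixed-point computation. This is an illustration only, not the authors' implementation: the dictionary layout, the function name `bisimulation`, and the treatment of names without an equation (such as jones) as atomic values equal only to themselves are all assumptions. Starting from all candidate pairs and deleting pairs that violate the condition leaves the largest bisimulation, which also works when the graph has cycles.

```python
from itertools import product

def bisimulation(wdb):
    """Return the set of pairs (x, y) of bisimilar nodes.

    wdb maps each set name to its list of (label, member) pairs.
    Assumption: a member with no equation in wdb is an atomic value,
    bisimilar only to itself.
    """
    nodes = set(wdb)
    for members in wdb.values():
        nodes.update(member for _, member in members)

    # Candidate pairs: identical nodes, or two nodes that both have equations.
    related = {(x, y) for x, y in product(nodes, repeat=2)
               if x == y or (x in wdb and y in wdb)}

    def covered(x, y):
        # Every labelled member of x has an equally labelled, related member in y.
        return all(any(l1 == l2 and (m1, m2) in related
                       for l2, m2 in wdb.get(y, []))
                   for l1, m1 in wdb.get(x, []))

    changed = True
    while changed:  # refine until no remaining pair violates the definition
        changed = False
        for x, y in list(related):
            if not (covered(x, y) and covered(y, x)):
                related.discard((x, y))
                changed = True
    return related

# The example from this slide: duplicates and ordering do not matter.
wdb = {
    "b2": [("author", "jones"), ("title", "Databases"), ("title", "Databases")],
    "p3": [("title", "Databases"), ("author", "jones")],
}
print(("b2", "p3") in bisimulation(wdb))  # True
```

The loop removes at least one pair per pass, so it terminates after polynomially many steps in the size of the graph.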
8 Hyperset Approach - Implementation
- In our implementation, systems of set equations can be transformed into an XML representation (and vice versa).
- In general, arbitrary XML elements can participate.
<?xml version="1.0"?>
<set:eqns xmlns:set="...">
  <set:eqn set:id="bob">
    <name>Bob</name>
    <wife set:ref="alice" />
  </set:eqn>
  <set:eqn set:id="alice">
    <name>Alice</name>
    <husband set:ref="bob" />
    <pet set:ref="sam" />
  </set:eqn>
  <set:eqn set:id="sam">
    <name>Sam</name>
    <species>cat</species>
  </set:eqn>
</set:eqns>
Corresponding set equations:
bob = {wife:alice, name:Bob}
alice = {husband:bob, name:Alice, pet:sam}
sam = {name:Sam, species:cat}
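The two-way transformation between set equations and XML might look roughly like the following Python sketch. It assumes that the `set` prefix qualifies the names `eqns`, `eqn`, `id` and `ref`, and that a member is a reference exactly when its value is the name of another equation. The slide elides the namespace URI, so `urn:example:set` is a placeholder, and the function names `to_xml` and `from_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

SET_NS = "urn:example:set"  # placeholder: the real namespace URI is not given
ET.register_namespace("set", SET_NS)

def q(name):
    """Qualified name in the (assumed) set namespace."""
    return f"{{{SET_NS}}}{name}"

def to_xml(equations):
    """Build a set:eqns element from {set name: [(label, value), ...]}."""
    root = ET.Element(q("eqns"))
    for set_id, members in equations.items():
        eqn = ET.SubElement(root, q("eqn"), {q("id"): set_id})
        for label, value in members:
            child = ET.SubElement(eqn, label)
            if value in equations:      # value names another set: emit a reference
                child.set(q("ref"), value)
            else:                       # otherwise treat it as atomic text
                child.text = value
    return root

def from_xml(root):
    """Inverse of to_xml: recover the equations from a set:eqns element."""
    equations = {}
    for eqn in root.findall(q("eqn")):
        members = []
        for child in eqn:
            ref = child.get(q("ref"))
            members.append((child.tag, ref if ref is not None else (child.text or "")))
        equations[eqn.get(q("id"))] = members
    return equations

equations = {
    "bob": [("name", "Bob"), ("wife", "alice")],
    "alice": [("name", "Alice"), ("husband", "bob"), ("pet", "sam")],
    "sam": [("name", "Sam"), ("species", "cat")],
}
xml_text = ET.tostring(to_xml(equations), encoding="unicode")
print(from_xml(ET.fromstring(xml_text)) == equations)  # True
```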
9 Delta: Query Language for Hyperset WDB
- Previously a purely theoretical hyperset query language.
- Sound theoretical background.
- Expressive power is characterised in terms of polynomial time, thus:
- computationally viable (in theory),
- sufficiently complete (no gaps in the language).
- Considers WDB up to bisimulation.
10 Delta Operators
- Delta has rich expressive power provided by its operators.
- The implemented language retains this expressive power and adds further features.
- Delta expressions are divided into:
- Terms (set queries): set-valued.
- Formulas (Boolean queries): truth-valued.
11 Delta: Set-Valued Operations
- Collection (separation): similar to SQL select.
- Recursion: a recursive version of separate.
- Decoration: the plan performance operator, useful for restructuring queries.
- Other useful set-theoretic operations:
- Enumeration: {l1:x1, l2:x2, ..., ln:xn}
- Union: U x, x U y
- Transitive closure: TC(x)
collect {s(l,x) where l:x in t and F(l,x)}
separate {l:x in t where F(l,x)}
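As a rough Python analogue of these operators (not Delta itself), sets can be held as lists of (label, member) pairs in a dictionary keyed by set name. The function names and this data layout are assumptions made for illustration. Duplicates are removed only syntactically here, not up to bisimulation, and the slides do not state whether TC(x) contains x itself, so `tc` below simply returns every labelled member reachable from x in one or more steps.

```python
def separate(wdb, t, pred):
    """Members l:x of t that satisfy pred(l, x)."""
    return [(l, x) for l, x in wdb[t] if pred(l, x)]

def collect(wdb, t, build, pred):
    """Values build(l, x) for the members l:x of t that satisfy pred(l, x)."""
    return [build(l, x) for l, x in wdb[t] if pred(l, x)]

def union(wdb, x, y):
    """Members of x, then the members of y that are not already listed."""
    return list(dict.fromkeys(wdb[x] + wdb[y]))

def tc(wdb, x):
    """All labelled members reachable from x; terminates on cyclic graphs."""
    result, visited, todo = [], {x}, [x]
    while todo:
        node = todo.pop()
        for edge in wdb.get(node, []):   # atomic values have no members
            if edge not in result:
                result.append(edge)
            child = edge[1]
            if child not in visited:
                visited.add(child)
                todo.append(child)
    return result

wdb = {
    "bob": [("wife", "alice"), ("name", "Bob")],
    "alice": [("husband", "bob"), ("name", "Alice"), ("pet", "sam")],
    "sam": [("name", "Sam"), ("species", "cat")],
}
print(separate(wdb, "alice", lambda l, x: l == "pet"))
# [('pet', 'sam')]
print(collect(wdb, "alice", lambda l, x: ("relative", x), lambda l, x: x in wdb))
# [('relative', 'bob'), ('relative', 'sam')]
print(tc(wdb, "sam"))
# [('name', 'Sam'), ('species', 'cat')]
```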
12 Delta: Boolean-Valued Operations
- Equality: bisimulation.
- Label relations: l1 < l2, l1 substring of l2, ...
- Membership: l:x in y
- Logical operators: And, Or, Not, Implies
- Bounded (computable) quantifiers:
forall l:x in t F(x,l)
exists l:x in t F(x,l)
- Everything is bounded in Delta!
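Boundedness means that a quantifier only ever ranges over the members of a given set t, so evaluating it is a finite loop. A minimal Python sketch, assuming sets are stored as lists of (label, member) pairs in a dictionary; the function names and the argument order of `pred` are illustrative choices, not Delta syntax:

```python
def forall(wdb, t, pred):
    """True if pred(label, member) holds for every member of t."""
    return all(pred(l, x) for l, x in wdb[t])

def exists(wdb, t, pred):
    """True if pred(label, member) holds for at least one member of t."""
    return any(pred(l, x) for l, x in wdb[t])

wdb = {
    "alice": [("husband", "bob"), ("name", "Alice"), ("pet", "sam")],
    "sam": [("name", "Sam"), ("species", "cat")],
}
print(exists(wdb, "alice", lambda l, x: l == "pet"))  # True
print(forall(wdb, "sam", lambda l, x: x not in wdb))  # True: all members are atomic
```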
13 Delta: Other Features
- Set queries can participate in Boolean queries, and vice versa.
- The implemented language has block structure.
- Useful things:
- Libraries and declarations: the ability to define
- set constants,
- label constants,
- queries to be invoked by query calls.
- If-then-else.
- Full description of Delta syntax (as BNF).
14 Query Execution
- Three stages:
- Parsing: checking that the query is well-formed.
- Contextual analysis: checking that the query
- is well-typed,
- contains no undeclared identifiers.
- Query evaluation:
- Extend the WDB with a new set equation, Result = Query.
- Simplify the extended system of set equations until no complex set or Boolean expressions remain.
15 Example of Distributed WDB
- Distributed WDB: two XML files
- URL1 (represented as a system of set equations)
- URL2
URL1: grey nodes; URL2: white nodes
BibDB = {book:b1, book:b2, paper:URL2#p1, paper:URL2#p2, paper:URL2#p3}
b1 = {refers-to:b2, refers-to:URL2#p1}
b2 = {author:Jones, title:Jones}
16 Example of Query
- Example query (in natural language):
Find all publications which refer to the book b2 in the Bibliography database (BibDB).
- Example query (in Delta):
set query
let set constant BibDB be URL1#BibDB,
    set constant b2 be URL1#b2
in collect {pub-type:pub where pub-type:pub in BibDB
            and exists refers-to:ref in pub . ref = b2}
endlet
17 Example of Simple Query
- Recall the query: find all publications which refer to the book b2 in the Bibliography database (BibDB).
- Query result (after simplification):
result = {book:URL1#b1, paper:URL2#p2}
- p2 also refers to b2, because it refers to p3, which is bisimilar (equal) to b2.
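This result can be reproduced with a small Python sketch that evaluates the query over an in-memory graph, comparing references to b2 by bisimulation rather than by name. The slides give only part of the distributed WDB, so the contents of p1, p2 and p3 below are assumptions chosen to match what the slides state (p2 refers to p3, and p3 is bisimilar to b2). Site prefixes are omitted, and the function names are hypothetical.

```python
from itertools import product

def bisimulation(wdb):
    """Pairs of bisimilar nodes; names without an equation count as atoms."""
    nodes = set(wdb)
    for members in wdb.values():
        nodes.update(member for _, member in members)
    related = {(x, y) for x, y in product(nodes, repeat=2)
               if x == y or (x in wdb and y in wdb)}

    def covered(x, y):
        return all(any(l1 == l2 and (m1, m2) in related
                       for l2, m2 in wdb.get(y, []))
                   for l1, m1 in wdb.get(x, []))

    changed = True
    while changed:
        changed = False
        for x, y in list(related):
            if not (covered(x, y) and covered(y, x)):
                related.discard((x, y))
                changed = True
    return related

wdb = {
    "BibDB": [("book", "b1"), ("book", "b2"),
              ("paper", "p1"), ("paper", "p2"), ("paper", "p3")],
    "b1": [("refers-to", "b2"), ("refers-to", "p1")],
    "b2": [("author", "Jones"), ("title", "Databases")],  # values assumed
    "p1": [],                                             # assumed empty
    "p2": [("refers-to", "p3")],                          # stated on the slide
    "p3": [("author", "Jones"), ("title", "Databases")],  # assumed equal to b2
}

def publications_referring_to(wdb, db, target):
    """collect {pub-type:pub where ... exists refers-to:ref in pub . ref = target}"""
    same = bisimulation(wdb)
    return [(pub_type, pub) for pub_type, pub in wdb[db]
            if any(label == "refers-to" and (ref, target) in same
                   for label, ref in wdb.get(pub, []))]

print(publications_referring_to(wdb, "BibDB", "b2"))
# [('book', 'b1'), ('paper', 'p2')]
```

Comparing by name instead of by bisimulation would miss p2, since p2 never mentions b2 directly.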
18 Example of Query: Restructuring WDB
- Transform WDB to any required structure, e.g.
set query
let set constant BibDB = URL1#BibDB in
  let set constant restructuredBibDB be
    (U collect {null:if (L=Paper or L=Book)
                     then {publication:X, type:call Pair(call Second(X),L), L:call Pair(L,{})}
                     else {L:X} fi
                where L:X in call GraphOfPairs(BibDB)})
  in decorate (restructuredBibDB, BibDB)
  endlet
endlet
That one publication has both types, book and paper, is a result of the initial design of BibDB. It is not a failure of the above query.
19 Example of Query with Path Expressions (Not Yet Implemented)
- Useful feature for selecting nodes to arbitrary depth.
Query:
set query select {pub-type:x in BibDB
  where exists <b1>refers-to<x>refers-to<b2> . author:"Smith" in x}
Result:
result = {paper:URL2#p2}
20 Remaining Tasks
- Straightforward computation of bisimulation across a distributed WDB is intractable.
- Potential solutions to this problem:
- Local/global approximations: computing bisimulation locally at each site, then using these local approximations to compute global bisimulation.
- Maintaining strong extensionality: ensuring that the WDB and all updates maintain strong extensionality, i.e. all nodes must be non-bisimilar (no redundancies).
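To make the notion of strong extensionality concrete, the sketch below merges bisimilar nodes of a small in-memory graph so that no two remaining nodes are bisimilar. It is a centralised illustration under assumed data structures and names, and it does not address the distributed setting, which is the open problem named above.

```python
from itertools import product

def bisimulation(wdb):
    """Pairs of bisimilar nodes; names without an equation count as atoms."""
    nodes = set(wdb)
    for members in wdb.values():
        nodes.update(member for _, member in members)
    related = {(x, y) for x, y in product(nodes, repeat=2)
               if x == y or (x in wdb and y in wdb)}

    def covered(x, y):
        return all(any(l1 == l2 and (m1, m2) in related
                       for l2, m2 in wdb.get(y, []))
                   for l1, m1 in wdb.get(x, []))

    changed = True
    while changed:
        changed = False
        for x, y in list(related):
            if not (covered(x, y) and covered(y, x)):
                related.discard((x, y))
                changed = True
    return related

def strongly_extensional(wdb):
    """Keep one representative per bisimulation class and redirect all edges to it."""
    same = bisimulation(wdb)
    nodes = {x for x, _ in same}
    # Representative: the alphabetically smallest name in each class.
    rep = {n: min(m for m in nodes if (n, m) in same) for n in nodes}
    return {n: sorted({(label, rep[m]) for label, m in members})
            for n, members in wdb.items() if rep[n] == n}

wdb = {
    "b2": [("author", "Jones"), ("title", "Databases")],
    "p3": [("title", "Databases"), ("author", "Jones"), ("title", "Databases")],
    "p2": [("refers-to", "p3")],
}
print(strongly_extensional(wdb))
# {'b2': [('author', 'Jones'), ('title', 'Databases')], 'p2': [('refers-to', 'b2')]}
```

Here p3 is dropped as redundant, and the reference from p2 is redirected to b2.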
21 Comparative Analysis
- Our approach is top-down (theory to practice), compared to the bottom-up approach of most others.
- UnQL and UnCAL are closest to the hyperset approach:
- embeddable within Delta, but not vice versa.
- But their expressive power is theoretically unclear.
- Still more like graph query languages.
- Lorel is a pure graph query language:
- Formally incomparable with Delta.
- Ignores bisimulation.
- But there are some similarities with Delta.
- Most query languages for semi-structured databases derive their expressive power from path expressions, whereas in Delta these are only syntactic sugar: practically very convenient, but formally unnecessary.
22 Conclusion
- The current version of the implementation is complete.
- The implementation is available online at the Appendix Page to this talk:
- http://www.csc.liv.ac.uk/molyneux/ICSOFT2007appendix/
- Some key features are not implemented yet, such as:
- path expressions,
- distributed evaluation of bisimulation/equality in background time.
- Despite the theoretical nature of our approach, we have some important practical features, such as:
- XML representation of data,
- syntactic sugaring,
- libraries and declarations of queries.
- Delta has rich expressive power (= PTIME) and solid mathematical foundations.