Title: Converting Disjunctive Data to Disjunctive Graphs
- Lars Olson
- Data Extraction Group
- Funded by NSF
Introduction
- Disjunctive databases
  - Needed to represent disjunctive data
  - Queries are coNP-complete in general (Imielinski and Vadaparty, 1989)
- Transitive closure in disjunctive graphs
  - coNP-complete in general
  - Polynomial time under certain circumstances (Lobo et al., 1995)
The Problem
- How do we convert the data into a disjunctive graph?
- What is the complexity of the conversion?
  - Time
  - Space / memory
Implementation
- XML data repository
  - Shore / Niagara (Univ. of Wisconsin)
  - Xerces XML parser (Apache.org)
- How do we represent a disjunctive database in storage?
  - It needs to be easy to convert to a disjunctive graph
  - It needs to minimize changes to the DTD and, thus, to the existing data
XML → Graph Conversion
    <doc>
      <Node name="A">
        <EdgeTo ref="B"/>
      </Node>
      <Node name="B"></Node>
      ...
    </doc>
[Figure: DOM tree (doc, Node, EdgeTo) and the resulting graph edge A → B]
- Use a primary key to distinguish the doc → Node edges
- Use a foreign key to perform the join (EdgeTo.ref = Node.name)
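A minimal sketch of this conversion, using Python's standard-library `xml.etree.ElementTree` in place of Xerces and Shore/Niagara. The function name `xml_to_graph` and the output shape (sorted node names plus an edge list) are illustrative assumptions; `Node/@name` plays the primary key and `EdgeTo/@ref` the foreign key joined against it.

```python
import xml.etree.ElementTree as ET

# Small document in the shape shown above (hypothetical sample data).
DOC = """<doc>
  <Node name="A"><EdgeTo ref="B"/></Node>
  <Node name="B"></Node>
</doc>"""


def xml_to_graph(xml_text):
    """Return (sorted node names, edges) for a <doc> of <Node>/<EdgeTo> elements."""
    root = ET.fromstring(xml_text)

    # Pass 1: index every <Node> by its primary key (the name attribute).
    nodes = {}
    for node in root.iter("Node"):
        name = node.get("name")
        if name in nodes:
            raise ValueError(f"duplicate primary key {name!r}")
        nodes[name] = node

    # Pass 2: join each EdgeTo.ref (foreign key) against Node.name.
    edges = []
    for name, node in nodes.items():
        for edge_to in node.findall("EdgeTo"):
            ref = edge_to.get("ref")
            if ref not in nodes:
                raise ValueError(f"dangling foreign key {ref!r}")
            edges.append((name, ref))
    return sorted(nodes), edges


print(xml_to_graph(DOC))  # (['A', 'B'], [('A', 'B')])
```

A reference to a name that no `<Node>` declares is rejected, which is the usual foreign-key integrity rule.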
Disjunctions in XML, 1st Case
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    ...
[Figure: graph with edge A → B and disjunctive edge A → (C or D)]
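A sketch of how this first case might be converted, assuming a `<Disj>` inside a `<Node>` means "at least one of these edges holds" (our reading of the slide). Each `<Disj>` yields a single disjunctive edge, stored here as a tuple of alternative (tail, head) pairs; the function name and that representation are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
    <Disj><EdgeTo ref="C"/><EdgeTo ref="D"/></Disj>
  </Node>
  <Node name="B"/><Node name="C"/><Node name="D"/>
</doc>"""


def convert_disjunctive_heads(xml_text):
    """Split each node's outgoing edges into plain and disjunctive ones."""
    root = ET.fromstring(xml_text)
    plain, disjunctive = [], []
    for node in root.iter("Node"):
        tail = node.get("name")
        for child in node:
            if child.tag == "EdgeTo":
                # Ordinary edge: always present.
                plain.append((tail, child.get("ref")))
            elif child.tag == "Disj":
                # One disjunctive edge whose alternatives are all its children.
                alternatives = tuple(
                    (tail, e.get("ref")) for e in child.findall("EdgeTo")
                )
                disjunctive.append(alternatives)
    return plain, disjunctive


plain, disjunctive = convert_disjunctive_heads(DOC)
print(plain)        # [('A', 'B')]
print(disjunctive)  # [(('A', 'C'), ('A', 'D'))]
```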
But how do we represent a disjunctive tail?
Disjunctions in XML, 1st Case (continued)
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    <Disj>
      <Node name="E">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
      <Node name="F">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
    </Disj>
    ...
[Figure: the edges leaving E "or" the edges leaving F]
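A sketch of one possible reading of this form, assuming that a `<Disj>` wrapping whole `<Node>` elements means "the edges of one of these nodes hold", with each node's own `<EdgeTo>` children holding together. The function name and the tuple-of-frozensets representation are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <Disj>
    <Node name="E"><EdgeTo ref="G"/><EdgeTo ref="H"/></Node>
    <Node name="F"><EdgeTo ref="G"/><EdgeTo ref="H"/></Node>
  </Disj>
  <Node name="G"/><Node name="H"/>
</doc>"""


def convert_disjunctive_tails(xml_text):
    """Return one entry per top-level <Disj> that wraps whole <Node>s.

    Each entry is a tuple of alternatives; an alternative is the frozenset of
    edges leaving one wrapped <Node> (assumed to hold together).
    """
    root = ET.fromstring(xml_text)
    result = []
    for disj in root.findall("Disj"):  # direct children of <doc> only
        alternatives = []
        for node in disj.findall("Node"):
            tail = node.get("name")
            edges = frozenset((tail, e.get("ref")) for e in node.findall("EdgeTo"))
            alternatives.append(edges)
        result.append(tuple(alternatives))
    return result


for alternative in convert_disjunctive_tails(DOC)[0]:
    print(sorted(alternative))
# [('E', 'G'), ('E', 'H')]
# [('F', 'G'), ('F', 'H')]
```

Note that in this form the head list G, H is written out once per tail.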
Disjunctions in XML, 2nd Case
    <Disj>
      <Tail>
        <Node name="E"/>
        <Node name="F"/>
      </Tail>
      <Head>
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Head>
    </Disj>
    ...
[Figure: tails E and F, heads G and H]
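Assuming a `<Tail>`/`<Head>` pair denotes a disjunction over the full cross-product of its tails and heads, a sketch can keep the compact (tails, heads) form and expand it only on demand. `parse_tail_head` and `expand` are hypothetical names.

```python
import itertools
import xml.etree.ElementTree as ET

DISJ = """<Disj>
  <Tail><Node name="E"/><Node name="F"/></Tail>
  <Head><EdgeTo ref="G"/><EdgeTo ref="H"/></Head>
</Disj>"""


def parse_tail_head(disj):
    """Compact form: (tail names, head refs) for a single Tail/Head pair."""
    tails = tuple(n.get("name") for n in disj.find("Tail").findall("Node"))
    heads = tuple(e.get("ref") for e in disj.find("Head").findall("EdgeTo"))
    return tails, heads


def expand(tails, heads):
    """All alternatives of the disjunctive edge: the full cross-product."""
    return list(itertools.product(tails, heads))


tails, heads = parse_tail_head(ET.fromstring(DISJ))
print(tails, heads)          # ('E', 'F') ('G', 'H')
print(expand(tails, heads))  # [('E', 'G'), ('E', 'H'), ('F', 'G'), ('F', 'H')]
```

The compact form stores |T| + |H| names, while the expansion has |T| · |H| alternatives (for 10 tails and 10 heads, 20 versus 100), so a converter that wants to stay linear in the document size would keep the compact form.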
What if the disjunction isn't the full cross-product?
Disjunctions in XML, 3rd Case
    <Disj>
      <Tail>
        <Node name="I"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
      </Head>
      <Tail>
        <Node name="J"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
        <EdgeTo ref="L"/>
      </Head>
    </Disj>
    ...
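A sketch of the partial cross-product, assuming each `<Tail>` is paired with the `<Head>` that immediately follows it and that the alternatives are the union of each pair's own cross-product. `parse_pairs` and `alternatives` are hypothetical names.

```python
import xml.etree.ElementTree as ET

DISJ = """<Disj>
  <Tail><Node name="I"/></Tail>
  <Head><EdgeTo ref="K"/></Head>
  <Tail><Node name="J"/></Tail>
  <Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
</Disj>"""


def parse_pairs(disj):
    """Pair each <Tail> with the <Head> that immediately follows it."""
    pairs, pending = [], None
    for child in disj:
        if child.tag == "Tail" and pending is None:
            pending = tuple(n.get("name") for n in child.findall("Node"))
        elif child.tag == "Head" and pending is not None:
            heads = tuple(e.get("ref") for e in child.findall("EdgeTo"))
            pairs.append((pending, heads))
            pending = None
        else:
            raise ValueError(f"unexpected <{child.tag}> in <Disj>")
    if pending is not None:
        raise ValueError("<Tail> without a following <Head>")
    return pairs


def alternatives(pairs):
    """Union of each pair's own cross-product: a partial cross-product."""
    return [(t, h) for tails, heads in pairs for t in tails for h in heads]


pairs = parse_pairs(ET.fromstring(DISJ))
print(pairs)                # [(('I',), ('K',)), (('J',), ('K', 'L'))]
print(alternatives(pairs))  # [('I', 'K'), ('J', 'K'), ('J', 'L')]
```

The edge I → L is absent, which is what distinguishes this from the full cross-product {I, J} × {K, L}; the 2nd case is the special case with a single Tail/Head pair.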
Time and Space Complexity
- n = number of nodes in the DOM tree
  - counts edges as well
  - not necessarily proportional to the number of values in the database
- Ordinary XML: traverse the tree and add edges. Distinguish records with primary keys, and add edges for foreign keys. O(n) time, O(n) space.
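To make the O(n) bound concrete, a sketch that touches each DOM element exactly once with an explicit stack and counts the visits. Here n is the element count, which includes the `<EdgeTo>` elements, matching the note that n counts edges as well. The function name is hypothetical, and set lookups are average-case O(1), so the bound is expected rather than worst-case.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <Node name="A"><EdgeTo ref="B"/></Node>
  <Node name="B"></Node>
</doc>"""


def convert_one_pass(xml_text):
    """Visit each DOM element once, then resolve foreign keys with set lookups."""
    root = ET.fromstring(xml_text)
    keys, edges, visits = set(), [], 0
    stack = [(root, None)]  # (element, name of the enclosing <Node> or None)
    while stack:
        elem, owner = stack.pop()
        visits += 1
        if elem.tag == "Node":
            owner = elem.get("name")
            if owner in keys:
                raise ValueError(f"duplicate primary key {owner!r}")
            keys.add(owner)
        elif elem.tag == "EdgeTo":
            edges.append((owner, elem.get("ref")))
        stack.extend((child, owner) for child in elem)
    # Foreign-key check: one set lookup per edge.
    for _, ref in edges:
        if ref not in keys:
            raise ValueError(f"dangling foreign key {ref!r}")
    return keys, edges, visits


keys, edges, visits = convert_one_pass(DOC)
print(sorted(keys), edges, visits)  # ['A', 'B'] [('A', 'B')] 4
```

The key set, the edge list, and the stack each hold at most n entries, which gives the O(n) space bound.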
Time and Space Complexity (continued)
- <Disj>: same as ordinary XML, except that only one edge is added to all of its children. O(n) time, O(n) space.
- <Disj> with <Tail> and <Head>: traverse the tree, add the <Tail> and <Head> elements to a list, add one edge, and repeat for each Tail/Head pair. O(n) time, O(n) space.
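Pulling these cases together, a sketch of a single recursive traversal in the spirit of this slide: ordinary `<EdgeTo>` elements become plain edges, and each `<Disj>` becomes exactly one disjunctive edge, stored as a list of (tails, heads) pairs with no cross-product expanded. A `<Disj>` inside a `<Node>` is treated as one pair whose single tail is the enclosing node. All names are hypothetical, foreign-key checks are omitted for brevity, and the form with a `<Disj>` wrapping whole `<Node>`s is deliberately left unhandled.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
    <Disj><EdgeTo ref="C"/><EdgeTo ref="D"/></Disj>
  </Node>
  <Disj>
    <Tail><Node name="I"/></Tail><Head><EdgeTo ref="K"/></Head>
    <Tail><Node name="J"/></Tail><Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
  </Disj>
</doc>"""


def names(elem):
    return tuple(n.get("name") for n in elem.findall("Node"))


def refs(elem):
    return tuple(e.get("ref") for e in elem.findall("EdgeTo"))


def disj_edge(disj, owner):
    """One disjunctive edge, stored as a list of (tails, heads) pairs."""
    kids = list(disj)
    if kids and kids[0].tag == "Tail":
        # 2nd and 3rd cases: alternating <Tail>/<Head> children.
        tails, heads = kids[0::2], kids[1::2]
        if (len(tails) != len(heads)
                or any(t.tag != "Tail" for t in tails)
                or any(h.tag != "Head" for h in heads)):
            raise ValueError("expected alternating <Tail>/<Head> children")
        return [(names(t), refs(h)) for t, h in zip(tails, heads)]
    if owner is not None:
        # 1st case: <Disj> inside a <Node>, one tail and all children as heads.
        return [((owner,), refs(disj))]
    raise NotImplementedError("<Disj> over whole <Node>s is not handled here")


def convert(xml_text):
    plain, disjunctive = [], []

    def visit(elem, owner):
        if elem.tag == "Node":
            owner = elem.get("name")
        elif elem.tag == "EdgeTo":
            plain.append((owner, elem.get("ref")))
        elif elem.tag == "Disj":
            disjunctive.append(disj_edge(elem, owner))
            return  # disj_edge already consumed this subtree
        for child in elem:
            visit(child, owner)

    visit(ET.fromstring(xml_text), None)
    return plain, disjunctive


plain, disjunctive = convert(DOC)
print(plain)  # [('A', 'B')]
print(disjunctive)
# [[(('A',), ('C', 'D'))], [(('I',), ('K',)), (('J',), ('K', 'L'))]]
```

Each element is read once, either by `visit` or inside `disj_edge`, and the stored output grows with the number of `<Node>` and `<EdgeTo>` elements, so the sketch stays within O(n) time and space. Python's recursion limit would call for an explicit stack on very deep documents.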
Summary
- We need to introduce new XML constructs
  - <Disj>
  - Helper constructs <Tail> and <Head>
- Three cases
  - simple tail, compound head
  - full cross-product
  - partial cross-product
- Time and space requirements are consistent with the transitive closure algorithm
Future Work
- Solving path queries
- Adding XML constructs for more complicated disjunctions
  - e.g., Tail (A or B), Head ((C and D) or E)
- Determining the frequency of disjunctive data in real-world data
- Developing a normal form for disjunctive XML
  - Minimize redundancy
  - Minimize disjunctive tails