Title: Type Indexing in XML Database
1. Type Indexing in XML Database
Akhilesh Shirbhate (02329014)
2. Indexing in XML Databases
- Structured or semi-structured semantic tag system.
- Real useful data has some semantic meaning.
- Huge size of XML documents in its repository.
- Providing answers efficiently for queries is very
challenging.
- We need indices !!!
- The indexing for tree structured XML database is
still an active research area.
3. Terminology
- Judgements in type system
- Type rules in type system
- Type equivalence
- Subtyping
- DTD
- XML Schema
- XQuery
- Type
- Typed Language
4. Subtyping Proposals for XML
- XML Schema
- XQuery
- XDuce
- Tindex
5. XML Schema
- Simple and Complex Types
- Simple type >> value space, lexical space, set
of facets
- Facets >> fundamental, restrictive
- Complex types >> order constraint, occurrence
constraint
- Model groups >> particles held together.
- Supports restriction subtyping and extension
subtyping.
6. XQuery
- Expression rich
- path expr (based on XPath)
- element constructor
- FLWR expr (for, let, where, return)
- data type modification and comparing (cast)
- Kleene operators supported.
- Can compute joins !!
7. XDuce
- Functional language
- Provides
- Subtyping algorithm for regular expression types.
- Type inference algorithm for regular expression
pattern matching.
- Type definition
- T ::= ()       Empty sequence
- T ::= X        Type name
- T ::= l[T]     Label
- T ::= T , T    Concatenation
- T ::= T | T    Union
8. XDuce (continued)
- XDuce types are equivalent to XQuery types
DEFINE TYPE l1 { ELEMENT person { ELEMENT name
{ xs:string }, ELEMENT tel { xs:string }* } }
type l1 = person[name[String], Xtelstar]
type Xtelstar = tel[String], Xtelstar | ()
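The five productions of the type grammar and the two definitions on this slide can be written down as a small data structure. The following is only a sketch: the class names, the `env` dictionary and the `show` helper are hypothetical, `String` is treated as a named base type, and nothing here is taken from the XDuce implementation.

```python
from dataclasses import dataclass

# Hypothetical class names: one class per production of the type grammar.
@dataclass(frozen=True)
class Empty:              # ()     empty sequence
    pass

@dataclass(frozen=True)
class Name:               # X      reference to a named type
    name: str

@dataclass(frozen=True)
class Label:              # l[T]   element labelled l with content T
    label: str
    content: object

@dataclass(frozen=True)
class Concat:             # T , T  concatenation
    first: object
    second: object

@dataclass(frozen=True)
class Union:              # T | T  union
    left: object
    right: object

# Assumption of this sketch: String is just another named type.
STRING = Name("String")

# The two definitions from the slide. Xtelstar refers to itself by name,
# which is how the repetition of tel elements is expressed.
env = {
    "l1": Label("person", Concat(Label("name", STRING), Name("Xtelstar"))),
    "Xtelstar": Union(Concat(Label("tel", STRING), Name("Xtelstar")), Empty()),
}

def show(t):
    """Render a type in XDuce-like surface syntax (no precedence handling)."""
    if isinstance(t, Empty):
        return "()"
    if isinstance(t, Name):
        return t.name
    if isinstance(t, Label):
        return f"{t.label}[{show(t.content)}]"
    if isinstance(t, Concat):
        return f"{show(t.first)}, {show(t.second)}"
    if isinstance(t, Union):
        return f"{show(t.left)} | {show(t.right)}"
    raise TypeError(f"not a type: {t!r}")

for type_name, body in env.items():
    print(f"type {type_name} = {show(body)}")
```

Keeping recursion behind names, rather than building cyclic objects, keeps every value finite and hashable, which is convenient when types are later stored in an index.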
9. XDuce internal form
- Internal type expression T is
- T ::= ø           Empty set
- T ::= í           Leaf
- T ::= ß           Internal type
- T ::= T | T       Union
- T ::= l (X , X)   Label
- Type equivalence
- T | ø = T
- T | T = T
- ( T | U ) | R = T | ( U | R )
- T | U = U | T
- T | U = T | R if U = R
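The equivalence rules above say that union is idempotent, associative and commutative, with the empty set as its unit. One way to test equivalence under exactly these rules is to flatten a type into the set of its non-union alternatives. The tuple encoding and the function names below are assumptions made for this sketch. It compares type variables by name only, so it does not decide whether two differently named variables denote the same type.

```python
# Internal types encoded as nested tuples (an encoding chosen for this sketch):
#   ("empty",)             the empty set
#   ("leaf",)              the leaf
#   ("label", l, X1, X2)   l(X1, X2), with X1 and X2 type-variable names
#   ("union", T, U)        T | U
EMPTY = ("empty",)
LEAF = ("leaf",)

def alternatives(t):
    """Flatten nested unions into a set of alternatives, dropping the empty set."""
    if t[0] == "union":
        return alternatives(t[1]) | alternatives(t[2])
    if t == EMPTY:
        return frozenset()
    return frozenset([t])

def equivalent(t, u):
    """True when t and u are equal under the five union rules."""
    return alternatives(t) == alternatives(u)

a = ("label", "a", "X1", "X2")
b = ("label", "b", "X1", "X2")

print(equivalent(("union", a, EMPTY), a))             # unit rule
print(equivalent(("union", a, b), ("union", b, a)))   # commutativity
print(equivalent(a, b))                               # different labels
```

Using a set as the normal form makes idempotence, associativity and commutativity hold by construction, so no rewriting order has to be chosen.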
10. Tindex
- Has its roots in theorem proving systems and
XDuce.
- Terminology
- term, variable (*), clause, multiliteral clause
- Unification Problem
- The unification problem consists of selecting all
terms l in S such that there exists a
substitution µ that satisfies lµ = tµ. For
instance, the terms f (a, b) and f (*, *) are
both unifiable with the term f (a, *).
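The example on this slide can be checked with a few lines of code. The sketch below assumes that every `*` is an anonymous variable distinct from all others, so two terms unify exactly when they agree at every position where neither side is `*`. The tuple encoding of terms and the names `unifiable` and `select_unifiable` are choices of this sketch, not part of Tindex.

```python
STAR = "*"  # anonymous variable; each occurrence is assumed to be distinct

def unifiable(l, t):
    """Check whether terms l and t unify.

    Constants are strings such as "a"; compound terms are tuples of the
    form (functor, arg1, ..., argn).
    """
    if l == STAR or t == STAR:
        return True                      # a variable unifies with anything
    if isinstance(l, str) or isinstance(t, str):
        return l == t                    # constants must be identical
    return (len(l) == len(t)             # same arity
            and l[0] == t[0]             # same functor
            and all(unifiable(a, b) for a, b in zip(l[1:], t[1:])))

def select_unifiable(S, t):
    """Select all terms l in S that unify with the query term t (linear scan)."""
    return [l for l in S if unifiable(l, t)]

S = [("f", "a", "b"), ("f", STAR, STAR), ("f", "b", STAR), ("g", "a", STAR)]
query = ("f", "a", STAR)
print(select_unifiable(S, query))
```

With named variables that may repeat, a substitution would have to be threaded through the recursion. The anonymous-variable case avoids that and is enough to reproduce the slide's example.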
11. Problem Specified !!!
- Term Indexing problem
- Given a set S (called the set of indexed terms),
a binary relation R over terms (called the
retrieval condition), and a term t (called the
query term), identify the subset M of S
consisting of all terms l such that R(l,t) holds
true.
- Type Indexing problem
- Given a set S (called the set of indexed types),
a binary relation R over types (called the
retrieval condition), and a type t (called the
query type), identify the subset M of S
consisting of all types l such that R(l,t) holds
true.
12. Supertype selection problem
- Supertype selection problem
- Given a set S of types, and a query type t,
identify the subset M of S consisting of all
types l such that t <: l
- Solution is easy !!! Put unification as the retrieval
condition in the multiterm indexing problem.
- To get the exact set of supertypes, we need to
make a second sequential pass over the candidate set
returned by the above procedure.
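The two-pass idea can be illustrated with a deliberately simplified model. Here a type is a finite set of tag sequences and subtyping is plain set inclusion. The first pass applies a cheap necessary condition and may return too many candidates. The second pass walks the candidates sequentially and keeps only the true supertypes. The model, the coarse condition and all function names are assumptions of this sketch and stand in for unification-based retrieval, which is not reproduced here.

```python
def first_tags(ty):
    """Set of tags that can start a sequence of the type."""
    return {seq[0] for seq in ty if seq}

def coarse_match(l, t):
    """First pass: necessary but not sufficient condition for t <: l."""
    return first_tags(t) <= first_tags(l)

def is_subtype(t, l):
    """Exact check; in this toy model subtyping is set inclusion."""
    return t <= l

def supertypes(S, t):
    """Return (candidates after pass one, exact supertypes after pass two)."""
    candidates = sorted(name for name, l in S.items() if coarse_match(l, t))
    exact = [name for name in candidates if is_subtype(t, S[name])]
    return candidates, exact

# Indexed types (hypothetical data): each is a finite set of tag sequences.
S = {
    "a": frozenset({("name",), ("name", "tel"), ("name", "tel", "tel")}),
    "b": frozenset({("name", "email")}),
    "c": frozenset({("email",)}),
}
t = frozenset({("name", "tel")})

candidates, exact = supertypes(S, t)
print(candidates)  # includes "b", which is not a supertype of t
print(exact)
```

The first pass never discards a true supertype, because inclusion of the sequence sets implies inclusion of their first tags. This is what makes the sequential second pass sufficient to obtain the exact answer.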
13. Example of need for second pass
DEFINE TYPE l { ELEMENT person { ELEMENT name
{ xs:string } ELEMENT tel { xs:string } } }
person( X1, X0 ) where
M(X0) = í
M(X1) = name( X2, X3 )
M(X2) = ß
M(X3) = tel( X2, X3 ) | X0
l = person ( name ( ß, tel ( ß, * ) | í ) , í )
Similarly,
l1 = person ( name ( ß, * ) | tel ( ß, í ) , í )
l2 = person ( name ( ß, * ) | tel ( ( ß, * ) | í , í )
14. Tindex construction
DEFINE TYPE t1 { ELEMENT author {
ELEMENT name { xs:string }, ELEMENT email { xs:string } } }
DEFINE TYPE t2 { ELEMENT author {
ELEMENT name { xs:string }, ELEMENT email { xs:string }
ELEMENT tel { xs:string } } }
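The two author types above share a long common prefix, which is the situation a trie handles well. As an illustration only, the sketch below flattens each type into its sequence of tags and base types in document order and stores the sequences in a trie. The flattening, the `TrieNode` class and the helper names are assumptions of this sketch. The slide does not spell out the Tindex construction, and this code does not claim to reproduce it.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.types = []      # names of types whose token sequence ends here

def insert(root, tokens, name):
    """Insert one flattened type into the trie."""
    node = root
    for tok in tokens:
        node = node.children.setdefault(tok, TrieNode())
    node.types.append(name)

def with_prefix(root, tokens):
    """Names of all indexed types whose token sequence starts with tokens."""
    node = root
    for tok in tokens:
        node = node.children.get(tok)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.extend(current.types)
        stack.extend(current.children.values())
    return sorted(found)

def count_nodes(node):
    """Number of trie nodes below and including node."""
    return 1 + sum(count_nodes(child) for child in node.children.values())

# Flattened token sequences for t1 and t2 (flattening is an assumption).
T1 = ["author", "name", "xs:string", "email", "xs:string"]
T2 = T1 + ["tel", "xs:string"]

root = TrieNode()
insert(root, T1, "t1")
insert(root, T2, "t2")

print(with_prefix(root, ["author", "name"]))
print(with_prefix(root, T1 + ["tel"]))
print(count_nodes(root))
```

Because t2 extends t1, the trie stores the shared five tokens once and adds only two nodes for the extra `tel` element.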
15. Conclusion
- XML data > Repositories > XML database > Query
language > Indexing > Content/Structure > XML
Schema > Tindex/XDuce
- Problems > Type inference, subtyping, retrieval
and unification.
- Tindex solves the problem elegantly using theory
from the domain of theorem proving and
data structures like the trie.
- Further research is active because the indexing
needs of XML databases differ from those of
relational databases due to the semi-structured nature
of XML documents.