Type Indexing in XML Database

1
Type Indexing in XML Database
Akhilesh Shirbhate (02329014)
2
Indexing in XML Databases
  • Structured or semi-structured semantic tag
    system.
  • Real useful data has some semantic meaning.
  • Repositories hold very large XML documents.
  • Answering queries efficiently is very
    challenging.
  • We need indices !!!
  • Indexing for tree-structured XML databases is
    still an active research area.

3
Terminology
  • Judgements in type system
  • Type rules in type system
  • Type equivalence
  • Subtyping
  • DTD
  • XML Schema
  • XQuery
  • Type
  • Typed Language

4
Subtyping Proposals for XML
  • XML Schema
  • XQuery
  • XDuce
  • Tindex

5
XML Schema
  • Simple and Complex Types
  • Simple type >> value space, lexical space, set
    of facets
  • Facets >> fundamental, restrictive
  • Complex types >> order constraint, occurrence
    constraint
  • Model groups >> particles held together.
  • Supports restriction subtyping and extension
    subtyping.

6
XQuery
  • Expression rich
  • path expr (based on XPath)
  • element constructor
  • FLWR expr (for, let, where, return)
  • data type modification and comparing (cast)
  • Kleene operators supported.
  • Can compute joins !!

7
XDuce
  • Functional language
  • Provides
    • Subtyping algorithm for regular expression types.
    • Type inference algorithm for regular expression
      pattern matching.
  • Type definition
    • T ::= ()       Empty sequence
    • T ::= X        Type name
    • T ::= l[T]     Label
    • T ::= T , T    Concatenation
    • T ::= T | T    Union
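
The five type forms listed on this slide can be written down as a small data structure. The following is a minimal Python sketch, not XDuce code: the class names, the `env` dictionary of named type definitions, and the `nullable` helper (which asks whether a type admits the empty sequence) are assumptions made for illustration. Base types such as String are assumed to be type names without a definition in `env`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:            # ()  empty sequence
    pass


@dataclass(frozen=True)
class Name:             # X   type name
    name: str


@dataclass(frozen=True)
class Label:            # l[T]  labelled element
    tag: str
    content: object


@dataclass(frozen=True)
class Concat:           # T , T
    left: object
    right: object


@dataclass(frozen=True)
class Union:            # T | T
    left: object
    right: object


def nullable(t, env, visiting=frozenset()):
    """Return True if type t admits the empty sequence.

    env maps type names to their definitions. A name that is already
    being expanded (or has no definition, e.g. a base type) contributes
    nothing, which gives the least-fixpoint answer for recursive types.
    """
    if isinstance(t, Empty):
        return True
    if isinstance(t, Label):
        return False
    if isinstance(t, Concat):
        return nullable(t.left, env, visiting) and nullable(t.right, env, visiting)
    if isinstance(t, Union):
        return nullable(t.left, env, visiting) or nullable(t.right, env, visiting)
    if isinstance(t, Name):
        if t.name in visiting or t.name not in env:
            return False
        return nullable(env[t.name], env, visiting | {t.name})
    raise TypeError(f"not a type: {t!r}")


# A recursive "zero or more tel elements" type, written with the five forms.
STRING = Name("String")  # assumed base type, deliberately absent from env
env = {
    "Xtelstar": Union(Concat(Label("tel", STRING), Name("Xtelstar")), Empty()),
}
person = Label("person", Concat(Label("name", STRING), Name("Xtelstar")))
```

Here `nullable(Name("Xtelstar"), env)` is true because the union has an empty alternative, while `nullable(person, env)` is false because a label always stands for exactly one element.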

8
XDuce (continued)
  • XDuce types are equivalent to XQuery types

DEFINE TYPE l1 { ELEMENT person { ELEMENT name { xs:string },
  ELEMENT tel { xs:string }* } }
type l1 = person[name[String], Xtelstar]
type Xtelstar = tel[String], Xtelstar | ()
9
XDuce internal form
  • Internal type expression T is
    • T ::= ø            Empty set
    • T ::= í            Leaf
    • T ::= ß            Internal type
    • T ::= T | T        Union
    • T ::= l (X , X)    Label
  • Type equivalence
    • T | ø = T
    • T | T = T
    • ( T | U ) | R = T | ( U | R )
    • T | U = U | T
    • T | U = T | R if U = R
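
The equivalence rules on this slide say that union behaves like set union with the empty set as its unit. One way to check equivalence under exactly these rules is to flatten every union into its set of alternatives and compare the sets. The sketch below assumes a tuple encoding invented for illustration (`("empty",)`, `("leaf",)`, `("label", l, X1, X2)`, `("union", T, U)`); it does not unfold the state variables X, so it covers only the union laws listed above.

```python
def alternatives(t):
    """Flatten a type into the set of its non-union, non-empty alternatives."""
    kind = t[0]
    if kind == "union":
        # Set union is associative, commutative and idempotent,
        # which matches the corresponding rules for type union.
        return alternatives(t[1]) | alternatives(t[2])
    if kind == "empty":
        # The empty set is the unit of union.
        return frozenset()
    return frozenset([t])


def equivalent(t, u):
    """True if t and u are equal under the union laws alone."""
    return alternatives(t) == alternatives(u)


EMPTY = ("empty",)
LEAF = ("leaf",)
A = ("label", "a", "X1", "X2")
B = ("label", "b", "X1", "X2")
```

For example, `equivalent(("union", A, ("union", B, EMPTY)), ("union", B, A))` holds, while `equivalent(A, B)` does not.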

10
Tindex
  • Has its roots in theorem proving systems and
    XDuce.
  • Terminology
    • term, variable (*), clause, multiliteral clause
  • Unification Problem
    • The unification problem consists of selecting all
      terms l in S such that there exists a
      substitution µ that satisfies lµ = tµ. For
      instance, the terms f (a, b) and f (*, *) are
      both unifiable with the term f (a, *).
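
Unification-based retrieval can be sketched without any index by testing each indexed term against the query in turn. The Python below is a textbook first-order unification with an occurs check, offered as an illustration and not as the Tindex algorithm. The `Var` and `Fn` classes and the function names are assumptions, and variables are given explicit, distinct names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: tuple = ()


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a Fn is reached."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v occurs inside term t under subst."""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, Fn) and any(occurs(v, a, subst) for a in t.args)


def unify(a, b, subst=None):
    """Return a substitution making a and b equal, or None if none exists."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, Var):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if a.symbol != b.symbol or len(a.args) != len(b.args):
        return None
    for x, y in zip(a.args, b.args):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def retrieve_unifiable(indexed, query):
    """Select every indexed term that unifies with the query term."""
    return [l for l in indexed if unify(l, query) is not None]


a, b = Fn("a"), Fn("b")
S = [Fn("f", (a, b)), Fn("f", (Var("x"), Var("y"))), Fn("f", (b, b))]
query = Fn("f", (a, Var("z")))
matches = retrieve_unifiable(S, query)
```

With this data, `matches` contains the first two terms of `S`; the third is rejected because the constants a and b clash in the first argument. An index such as Tindex aims to avoid this linear scan.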

11
Problem Specified !!!
  • Term Indexing problem
    • Given a set S (called the set of indexed terms),
      a binary relation R over terms (called the
      retrieval condition), and a term t (called the
      query term), identify the subset M of S
      consisting of all terms l such that R(l, t) holds.
  • Type Indexing problem
    • Given a set S (called the set of indexed types),
      a binary relation R over types (called the
      retrieval condition), and a type t (called the
      query type), identify the subset M of S
      consisting of all types l such that R(l, t) holds.

12
Supertype selection problem
  • Supertype selection problem
    • Given a set S of types, and a query type t,
      identify the subset M of S consisting of all
      types l such that t < l.
  • Solution is easy !!! Use unification as the retrieval
    condition in the multiterm indexing problem.
  • To get the exact set of supertypes, we need to
    make a second sequential pass over the set
    returned by the above procedure.
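
The two-pass scheme on this slide is a filter-and-refine pattern: a cheap retrieval condition that may return too many candidates, followed by an exact subtype check on each one. The sketch below shows only that control flow. The toy model is an assumption made so the example can run: a type is a finite set of values, `t < l` means subset, and the coarse filter tests a single probe value in place of unification.

```python
def select_supertypes(indexed, query, may_be_supertype, is_subtype):
    """Two-pass supertype selection.

    Pass 1 keeps every l the coarse filter cannot rule out; the filter
    must never reject a true supertype. Pass 2 checks each candidate
    exactly and drops the false positives.
    """
    candidates = [l for l in indexed if may_be_supertype(l, query)]
    return [l for l in candidates if is_subtype(query, l)]


# Toy stand-ins (assumptions): types are frozensets of admitted values.
def shares_probe(l, t):
    """Coarse filter: the smallest value of t must also be admitted by l."""
    return not t or min(t) in l


def is_subset(t, l):
    """Exact check: every value of t is admitted by l."""
    return t <= l


S = [
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b"}),
]
t = frozenset({"a", "b"})
first_pass = [l for l in S if shares_probe(l, t)]
supertypes = select_supertypes(S, t, shares_probe, is_subset)
```

In this run the first pass keeps three candidates, including `{"a"}`, which shares the probe value but is not a supertype of `t`. The second pass removes it, which mirrors why a sequential check is still needed after unification-based retrieval.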

13
Example of need for second pass
DEFINE TYPE l { ELEMENT person { ELEMENT name { xs:string }
  ELEMENT tel { xs:string } } }
person( X1, X0) where
  M(X0) í
  M(X1) name(X2, X3)
  M(X2) ß
  M(X3) tel (X2, X3) X0
l person ( name ( ß, tel ( ß, ) í ) , í )
Similarly,
l1 person ( name ( ß, ) tel ( ß, í ) , í )
l2 person ( name ( ß, ) tel ( ( ß, ) í , í )
14
Tindex construction
DEFINE TYPE t1 { ELEMENT author { ELEMENT name { xs:string },
  ELEMENT email { xs:string } } }
DEFINE TYPE t2 { ELEMENT author { ELEMENT name { xs:string },
  ELEMENT email { xs:string } ELEMENT tel { xs:string } } }
15
Conclusion
  • XML data > Repositories > XML database > Query
    language > Indexing > Content/Structure > XML
    Schema > Tindex/XDuce
  • Problems > Type inference, subtyping, retrieval
    and unification.
  • Tindex solves the problem elegantly using theory
    from the domain of theorem proving and
    data structures such as tries.
  • Further research is active because the indexing
    needs of XML databases differ from those of
    relational databases due to the semi-structured
    nature of XML documents.
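
To make the trie idea concrete, here is a minimal Python sketch of a trie keyed by the preorder traversal of a type written as a nested term. It is not the Tindex data structure itself. The tuple encoding `label(content, rest)` and the placeholder leaves `"STR"` (a string-valued content) and `"END"` (end of a sequence) are assumptions made for the example. Each key carries the symbol's arity so that the flattened sequence stays unambiguous.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # "symbol/arity" -> TrieNode
        self.ids = []        # names of the types whose path ends at this node


def preorder(term):
    """Flatten a nested tuple (symbol, child, ...) into 'symbol/arity' keys."""
    symbol, *children = term
    keys = [f"{symbol}/{len(children)}"]
    for child in children:
        keys.extend(preorder(child))
    return keys


def insert(root, term, ident):
    """Add a type to the trie, sharing nodes with any common prefix."""
    node = root
    for key in preorder(term):
        node = node.children.setdefault(key, TrieNode())
    node.ids.append(ident)


def lookup(root, term):
    """Return the identifiers stored for exactly this term."""
    node = root
    for key in preorder(term):
        node = node.children.get(key)
        if node is None:
            return []
    return list(node.ids)


def count_nodes(node):
    """Number of trie nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


STR, END = ("STR",), ("END",)
# author with name and email; author with name, email and tel
t1 = ("author", ("name", STR, ("email", STR, END)), END)
t2 = ("author", ("name", STR, ("email", STR, ("tel", STR, END))), END)

root = TrieNode()
insert(root, t1, "t1")
insert(root, t2, "t2")
```

The two types share the prefix `author, name, STR, email, STR`, so the trie stores 11 nodes where two separate paths would need 16. This sketch supports exact lookup only; retrieval by unification or subtyping would need traversal rules for variables on top of it.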