XML Database Engines - PowerPoint PPT Presentation

Slides: 49
Provided by: deptgeo2

1
XML Database Engines
  • Rakesh Malhotra
  • Thesis Defense
  • July 20, 2001

2
Outline
  • Introduction
  • MMXDB
  • XML Database Engines
  • Lore, XSet, QuiXote
  • Conclusions

3
Introduction
  • A general database system
  • Parser
  • Query optimizer
  • Database engine
  • A native XML database
  • Defined as a system that is developed, from querying to
    storage, for XML data

4
Introduction
  • Database engine
  • Storage
  • Indexing
  • Evaluation
  • Snapshots, Transactions, Security

5
Introduction(working example)
  • <staff>
  • <employee>
  • <name>Smith</name>
  • <ssn>28656667</ssn>
  • <salary>28000</salary>
  • <dno>28</dno>
  • <office><building>A</building><room>6</room></office>
  • </employee>
  • <employee>
  • <name>Clark</name>
  • <ssn>12345678</ssn>
  • <salary>18000</salary>
  • <dno>18</dno>
  • <office><building>A</building><room>7</room></office>
  • </employee>
  • </staff>

6
Introduction
  • Working example
  • Find all the employees whose name is Smith
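
The working query can be tried directly against the working example. The sketch below uses Python's standard xml.etree.ElementTree module, which is an assumption made for illustration and not one of the engines discussed here; the helper name employees_named is hypothetical.

```python
import xml.etree.ElementTree as ET

STAFF_XML = """
<staff>
  <employee>
    <name>Smith</name><ssn>28656667</ssn><salary>28000</salary><dno>28</dno>
    <office><building>A</building><room>6</room></office>
  </employee>
  <employee>
    <name>Clark</name><ssn>12345678</ssn><salary>18000</salary><dno>18</dno>
    <office><building>A</building><room>7</room></office>
  </employee>
</staff>
"""

def employees_named(root, name):
    """Return every <employee> child of root that has a <name> child equal to name."""
    return [e for e in root.findall("employee")
            if any(n.text == name for n in e.findall("name"))]

root = ET.fromstring(STAFF_XML)
smiths = employees_named(root, "Smith")
print([e.findtext("ssn") for e in smiths])
```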

7
Introduction(DOM representation)
8
MMXDB
9
MMXDB
  • Thin client
  • AT&T algebra, whose syntax resembles a high-level
    query language, was used to specify queries
  • Query parser and optimizer produce the query
    evaluation tree
  • Query evaluator evaluates the query
  • Storage Manager is responsible for storage and
    indexing

10
Database Engine(Storage)
  • Map to existing relational/object-relational/object
    database
  • Several commercial systems (Oracle 9i)
  • SQL or OQL are used as the query language
  • Time Tested
  • Break up the data
  • Not suited to handle data where structure keeps
    changing

11
Database Engine(Storage)
  • Store XML data as text files (with or without
    compression)
  • Simple
  • No reconstruction cost in creating original
    document
  • The entire database has to be loaded for query
    processing (re-parse the data every time)
  • Drawbacks can be overcome by partial retrieval
    (store offsets to XML elements inside the text
    file)
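
A minimal sketch of partial retrieval, assuming the offsets are recorded once when the document is stored. The regular-expression scan only handles elements that do not nest inside themselves, and the names build_offsets and fetch are hypothetical; an in-memory byte stream stands in for the text file.

```python
import io
import re
import xml.etree.ElementTree as ET

DOC = (b"<staff>"
       b"<employee><name>Smith</name><dno>28</dno></employee>"
       b"<employee><name>Clark</name><dno>18</dno></employee>"
       b"</staff>")

def build_offsets(data, tag):
    """Record the (start, end) byte offsets of each <tag>...</tag> span."""
    pattern = re.compile(b"<" + tag + b">.*?</" + tag + b">", re.DOTALL)
    return [m.span() for m in pattern.finditer(data)]

def fetch(stream, span):
    """Seek to one element and parse only its bytes, not the whole file."""
    start, end = span
    stream.seek(start)
    return ET.fromstring(stream.read(end - start))

offsets = build_offsets(DOC, b"employee")   # done once, at store time
stored = io.BytesIO(DOC)                    # stands in for the file on disk
second = fetch(stored, offsets[1])
print(second.findtext("name"))
```

Only the bytes of the second employee are read and parsed; the rest of the file is never touched.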

12
Database Engine(Storage)
  • Native form
  • No additional layers of mapping
  • Native data structure that is hierarchical

13
Database Engine(Storage)
  • Paging
  • Issues

14
Storage(Lore)
  • Uses a system, Ozone, for storing Object Exchange
    Model (OEM) format data
  • Built on O2 (object database) / each object has
    an oid
  • Ozone is an extension of O2 built to handle
    semi-structured data
  • Ozone has OEMcomplex class and OEMatomic class
  • OEMcomplex can be OEMcomplexset (unordered) or
    OEMcomplexlist (ordered)

15
Storage(Lore)
  • OEM_Staff
  • (employees, OEM(list(Employee)))
  • OEM_Employee
  • (name, OEM_string)
  • (ssn, OEM_integer)
  • (salary, OEM_integer)
  • (dno, OEM_integer)
  • (office, OEM(Office))
  • OEM_Office
  • (building, OEM_string)
  • (room, OEM_integer)
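
The atomic/complex split can be pictured with a small sketch. The classes below are hypothetical Python stand-ins, not Ozone's actual classes: every object gets an oid, atomic objects hold a value, and complex objects hold labeled edges to children.

```python
from dataclasses import dataclass, field
from itertools import count

_next_oid = count(1)  # every object receives a fresh oid

@dataclass
class OEMAtomic:
    value: object
    oid: int = field(default_factory=lambda: next(_next_oid))

@dataclass
class OEMComplex:
    # Labeled edges to child objects, kept in insertion order (list-like).
    edges: list = field(default_factory=list)
    oid: int = field(default_factory=lambda: next(_next_oid))

    def add(self, label, child):
        self.edges.append((label, child))
        return child

    def children(self, label):
        return [c for (l, c) in self.edges if l == label]

staff = OEMComplex()
emp = staff.add("employee", OEMComplex())
emp.add("name", OEMAtomic("Smith"))
emp.add("salary", OEMAtomic(28000))
office = emp.add("office", OEMComplex())
office.add("building", OEMAtomic("A"))
office.add("room", OEMAtomic(6))
```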

16
Storage(XSet)
  • Entire document is stored either in main memory
    (index) or disk
  • Java serialization is used to store data on disk
  • Each document is assigned a monotonically
    increasing identifier
  • Memory overhead limits the size and number of
    documents that are kept in main memory

17
Storage(QuiXote)
  • Stores data as a set of <schema, setOfData>
    pairs.
  • QNX data model: tree structure
  • Uses Millau (compression system) to compress data
    and indices for storage on disk
  • Millau also permits partial retrieval of documents

18
Storage(QuiXote)
19
Storage(MMXDB)
  • Separate objects are created for each element
    node of the DOM that is not the parent of the
    leaf node
  • Collections are implemented as lists
  • Staff_extent is a hashtable that stores object
    instances of staff (e.g. staff0)
  • Staff0 has a List (employee) that stores the
    object ids of staff_employee
  • Java serialization is used for secondary storage
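
A rough sketch of this layout, under the assumption that each extent is a hashtable from object id to object. Python dicts and pickle stand in for the Java hashtables and Java serialization; the office extent name and the helper functions are hypothetical.

```python
import pickle

# One extent per element type that gets its own object (staff, employee, office).
staff_extent = {}
staff_employee_extent = {}
staff_employee_office_extent = {}

def add_employee(oid, name, ssn, salary, dno, building, room):
    """Store an employee; leaf-parent elements such as name become plain fields."""
    office_oid = f"staff_employee_office{len(staff_employee_office_extent)}"
    staff_employee_office_extent[office_oid] = {"building": building, "room": room}
    staff_employee_extent[oid] = {"name": name, "ssn": ssn, "salary": salary,
                                  "dno": dno, "office": [office_oid]}
    return oid

# staff0 holds a list of employee object ids, not the employee objects themselves.
staff_extent["staff0"] = {"employee": [
    add_employee("staff_employee0", "Smith", 28656667, 28000, 28, "A", 6),
    add_employee("staff_employee1", "Clark", 12345678, 18000, 18, "A", 7),
]}

def employees_of(staff_oid):
    """Follow the object ids stored in the staff object's employee list."""
    return [staff_employee_extent[oid] for oid in staff_extent[staff_oid]["employee"]]

# Round trip through pickle, standing in for Java serialization to disk.
blob = pickle.dumps((staff_extent, staff_employee_extent, staff_employee_office_extent))
restored = pickle.loads(blob)
```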

20
(No Transcript)
21
Storage(Comparison)
  • Lore
  • Object oriented database / complex and atomic
    classes
  • XSet
  • Relies heavily on indexing / pointers to
    documents are stored in the index
  • QuiXote
  • Tree like structure / every node is labeled /
    uses compression
  • MMXDB
  • Tree structure and extents in the main memory /
    stores data using Java serialization

22
Database Engine(Indexing)
  • Reduce search on all similar paths by clustering
    them together
  • Reduce search on all similar values by clustering
    them together
  • (Assist in solving regular expressions)

23
Database Engine(Indexing)
24
Indexing(Lore)
  • Value Index
  • Indexes all atomic objects (integer, real,
    string)
  • The incoming edge is associated with the value
  • Supports coercion
  • Vindex(l, Op, Value, x)
  • Label Index
  • Used to traverse up a tree from child to parent
  • Lindex(x,l,y)

25
Indexing(Lore)
  • Edge Index
  • Used to create an index on edges
  • Bindex(x,l,y)
  • Text Index
  • Information retrieval style full text index
  • For a word w, Tindex returns a pair (o,n) where o
    is the leaf node that contains the word w at the
    nth place
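
The value, label, and edge indexes can be sketched over a tiny edge list. This is an illustration under assumptions, not Lore's implementation: the oids and data are made up, the function signatures return the bound variable as a list, and coercion is left out.

```python
import operator
from collections import defaultdict

# Tiny OEM-like graph: (parent oid, label, child oid) edges plus atomic values.
EDGES = [(1, "employee", 2), (1, "employee", 3),
         (2, "name", 4), (2, "salary", 5),
         (3, "name", 6), (3, "salary", 7)]
VALUES = {4: "Smith", 5: 28000, 6: "Clark", 7: 18000}
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

vindex_data = defaultdict(list)   # label -> [(value, oid)] of atomic objects
lindex_data = defaultdict(list)   # (child oid, label) -> parent oids
bindex_data = defaultdict(list)   # label -> [(parent oid, child oid)]

for parent, label, child in EDGES:
    lindex_data[(child, label)].append(parent)
    bindex_data[label].append((parent, child))
    if child in VALUES:
        vindex_data[label].append((VALUES[child], child))

def vindex(label, op, value):
    """Atomic objects with incoming edge `label` whose value satisfies `op value`."""
    return [oid for (v, oid) in vindex_data[label]
            if type(v) is type(value) and OPS[op](v, value)]

def lindex(label, child):
    """Parents of `child` reached by going up an edge labeled `label`."""
    return lindex_data[(child, label)]

def bindex(label):
    """All (parent, child) pairs connected by an edge labeled `label`."""
    return bindex_data[label]

# Bottom-up plan for "employees named Smith": value index, then label index.
smith_parents = [p for oid in vindex("name", "=", "Smith") for p in lindex("name", oid)]
```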

26
Indexing(Lore)
  • Path Index
  • Indexes all paths
  • It is a dataguide
  • Dataguide: structural summary of all possible
    paths within the database
  • Lots of indices, lots of memory
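
For a tree-shaped document, a structural summary amounts to the set of distinct label paths, each pointing at the elements it reaches. The sketch below assumes that simplification; a dataguide over graph-structured data needs more machinery, and path_summary is a hypothetical name.

```python
import xml.etree.ElementTree as ET

STAFF_XML = """
<staff>
  <employee>
    <name>Smith</name><ssn>28656667</ssn><salary>28000</salary><dno>28</dno>
    <office><building>A</building><room>6</room></office>
  </employee>
  <employee>
    <name>Clark</name><ssn>12345678</ssn><salary>18000</salary><dno>18</dno>
    <office><building>A</building><room>7</room></office>
  </employee>
</staff>
"""

def path_summary(root):
    """Map each distinct root-to-element label path to the elements it reaches."""
    summary = {}
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        summary.setdefault(path, []).append(node)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return summary

summary = path_summary(ET.fromstring(STAFF_XML))
for path in sorted(summary):
    print("/".join(path), len(summary[path]))
```

Two employees with the same shape add no new paths, so the summary stays small even as the data grows.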

27
Indexing(XSet)
  • Data access revolves around indexing
  • Each document is parsed and merged into a
    hierarchical tag index
  • Elements are stored as sets inside treaps
  • Treaps use dual indices (data and priority)
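
A treap is a binary search tree on the data key that is also a heap on a random priority. The sketch below is a generic textbook treap with insert and lookup, offered as background; how XSet arranges its element sets inside treaps is not shown in these slides.

```python
import random

class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(node, key, rng):
    """Insert by key (search-tree order), then rotate to restore the max-heap on priority."""
    if node is None:
        return Node(key, rng.random())
    if key == node.key:
        return node
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

def contains(node, key):
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

rng = random.Random(0)
treap = None
for tag in ["staff", "employee", "name", "ssn", "salary", "dno", "office"]:
    treap = insert(treap, tag, rng)
```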

28
Indexing(XSet)
29
Indexing(QuiXote)
  • Value index
  • text or attribute
  • Structure index
  • similar to a path index
  • Link index
  • for intra-document references
  • mapping from ID names to elements

30
Indexing(MMXDB)
  • Path index
  • All elements that have the same path are stored
    in the same extent
  • Clustering of data

31
(No Transcript)
32
Database Engine(Query Evaluation)
  • Executes the user query based on the query
    evaluation tree provided by the query planner
  • Predicate approach
  • Logical tree is converted into a physical query
    plan
  • Takes advantage of storage and indexing
    mechanisms to efficiently evaluate the query
  • Physical tree is evaluated top-down, bottom-up,
    or hybrid

33
Database Engine(Query Evaluation)
  • Functional Approach
  • Evaluating a query is akin to evaluating
    functions
  • Algebra operators are expressed as functions and
    the evaluation is recursive
  • Easy to evaluate but inflexible towards operator
    re-arrangement
  • Relies on underlying language for optimization

34
Query Evaluation(Lore)
  • Predicate approach
  • Logical operators are converted to physical ones
  • Several options are explored and one chosen based
    on the cost model
  • Cost-based model chooses between top-down,
    bottom-up, or hybrid approach

35
Query Evaluation(XSet)
  • Queries themselves are XML documents with
    embedded query instructions
  • <staff>
  • <employee>
  • <name>Smith</name>
  • </employee>
  • </staff>
  • The index is searched and documents that match
    are retrieved and the query answered
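
Query-by-example of this kind can be sketched as tree matching: a document matches when the query tree can be embedded in it. The brute-force check below is only an illustration and scans every document; XSet answers the query from its tag index instead. The function name matches and the document ids are hypothetical.

```python
import xml.etree.ElementTree as ET

def matches(query, doc):
    """True if doc has the query's tag, its text (if any), and a match for each query child."""
    if query.tag != doc.tag:
        return False
    wanted = (query.text or "").strip()
    if wanted and wanted != (doc.text or "").strip():
        return False
    return all(any(matches(q, d) for d in doc) for q in query)

QUERY = ET.fromstring("<staff><employee><name>Smith</name></employee></staff>")

# Documents keyed by their (monotonically increasing) identifiers.
DOCS = {
    1: ET.fromstring("<staff><employee><name>Smith</name><dno>28</dno></employee></staff>"),
    2: ET.fromstring("<staff><employee><name>Clark</name><dno>18</dno></employee></staff>"),
}

hits = [doc_id for doc_id, doc in DOCS.items() if matches(QUERY, doc)]
print(hits)
```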

36
Query Evaluation(QuiXote)
  • Query Processing is carried out using two parts
  • Query pre-processor
  • pre-compiles structural relationships
  • produces indices
  • Query processor
  • processes the user query

37
Query Evaluation(QuiXote)
  • Query pre-processor
  • schema extractor
  • extracts the schema for documents that don't
    have one
  • relationship set generator
  • computes relationships, e.g. child, parent,
    ancestor, attribute, reachability
  • reachability for staff is
  • (employee, 1), (name, 2), (ssn, 2),
    (salary, 2), (dno, 2), (office, 2),
    (building, 3), (room, 3)
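
The reachability set above can be reproduced with a short traversal. This is a sketch of the idea only, computed from a document instance with Python's xml.etree.ElementTree; QuiXote derives its relationship sets from the schema, and the function name reachability is hypothetical.

```python
import xml.etree.ElementTree as ET

STAFF_XML = """
<staff>
  <employee>
    <name>Smith</name><ssn>28656667</ssn><salary>28000</salary><dno>28</dno>
    <office><building>A</building><room>6</room></office>
  </employee>
  <employee>
    <name>Clark</name><ssn>12345678</ssn><salary>18000</salary><dno>18</dno>
    <office><building>A</building><room>7</room></office>
  </employee>
</staff>
"""

def reachability(root):
    """Collect (tag, depth) for every element reachable below root."""
    pairs = set()

    def walk(node, depth):
        for child in node:
            pairs.add((child.tag, depth + 1))
            walk(child, depth + 1)

    walk(root, 0)
    return pairs

reach = reachability(ET.fromstring(STAFF_XML))
print(sorted(reach, key=lambda p: (p[1], p[0])))
```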

38
Query Evaluation(QuiXote)
  • Query processor
  • document filter
  • filters out documents that do not meet the
    criteria
  • query optimizer
  • performs strength reduction (using relationship
    sets)
  • e.g., to select all employees, the query can use
    relationship sets to see that employees are only
    at height 1 from the root, so a deeper search is
    not required.
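
A minimal sketch of that optimization, assuming the reachability set for staff is already available. The search stops descending once it passes the deepest level at which the requested tag can occur; the function name select and the visited counter are hypothetical and exist only to show the saving.

```python
from collections import deque
import xml.etree.ElementTree as ET

STAFF = ET.fromstring("""
<staff>
  <employee>
    <name>Smith</name><ssn>28656667</ssn><salary>28000</salary><dno>28</dno>
    <office><building>A</building><room>6</room></office>
  </employee>
  <employee>
    <name>Clark</name><ssn>12345678</ssn><salary>18000</salary><dno>18</dno>
    <office><building>A</building><room>7</room></office>
  </employee>
</staff>
""")

# Precomputed relationship set: (tag, depth below the root).
REACH = {("employee", 1), ("name", 2), ("ssn", 2), ("salary", 2),
         ("dno", 2), ("office", 2), ("building", 3), ("room", 3)}

def select(root, tag, reach):
    """Breadth-first search that never goes deeper than the tag can occur."""
    depths = {d for (t, d) in reach if t == tag}
    found, visited = [], 0
    if not depths:
        return found, visited          # tag cannot occur: no search at all
    limit = max(depths)
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        visited += 1
        if depth in depths and node.tag == tag:
            found.append(node)
        if depth < limit:
            queue.extend((child, depth + 1) for child in node)
    return found, visited

employees, visited_for_employees = select(STAFF, "employee", REACH)
rooms, visited_for_rooms = select(STAFF, "room", REACH)
```

Selecting employees touches only the root and its children, while selecting rooms still has to walk the whole tree.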

39
Query Evaluation(QuiXote)
  • query executor
  • query is executed based on the QEP

40
Query Evaluation(MMXDB)
  • The algebra (AT&T) expresses queries in a
    functional language
  • Query evaluation is recursive
  • for e in employee(staff0) do
  • for n in name(e) do
  • if data(n) = "Smith" then e

41
Query Evaluation(MMXDB)
42
Query Evaluation(MMXDB)
  • All evaluate functions return a list of values
    that is preceded by the type of the result
  • Evaluate functions are called recursively from
    the root (ForExp in the example)
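
The recursive style can be sketched with a small evaluator in which every call returns a result type followed by a list of values. Only ForExp is named in the slides; the tuple encoding and the other node kinds below are assumptions made for illustration, not MMXDB's classes.

```python
import xml.etree.ElementTree as ET

def evaluate(expr, env):
    """Recursively evaluate an expression tuple; return (type_name, list_of_values)."""
    kind = expr[0]
    if kind == "var":      # ("var", name)
        return ("element", [env[expr[1]]])
    if kind == "child":    # ("child", tag, sub) stands for tag(sub)
        _, nodes = evaluate(expr[2], env)
        return ("element", [c for n in nodes for c in n.findall(expr[1])])
    if kind == "data":     # ("data", sub): text content of each element
        _, nodes = evaluate(expr[1], env)
        return ("string", [n.text for n in nodes])
    if kind == "eq":       # ("eq", sub, constant)
        _, values = evaluate(expr[1], env)
        return ("boolean", [v == expr[2] for v in values])
    if kind == "if":       # ("if", cond, then): empty result when the test fails
        _, flags = evaluate(expr[1], env)
        return evaluate(expr[2], env) if any(flags) else ("element", [])
    if kind == "for":      # ("for", var, source, body): the ForExp case
        _, items = evaluate(expr[2], env)
        result_type, out = "element", []
        for item in items:
            result_type, values = evaluate(expr[3], {**env, expr[1]: item})
            out.extend(values)
        return (result_type, out)
    raise ValueError(f"unknown expression kind: {kind}")

# for e in employee(staff0) do for n in name(e) do if data(n) = "Smith" then e
QUERY = ("for", "e", ("child", "employee", ("var", "staff0")),
         ("for", "n", ("child", "name", ("var", "e")),
          ("if", ("eq", ("data", ("var", "n")), "Smith"), ("var", "e"))))

staff0 = ET.fromstring(
    "<staff>"
    "<employee><name>Smith</name><ssn>28656667</ssn></employee>"
    "<employee><name>Clark</name><ssn>12345678</ssn></employee>"
    "</staff>")
result_type, result = evaluate(QUERY, {"staff0": staff0})
```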

43
Comparison
  • Lore
  • Predicate approach and cost models
  • XSet
  • Relies on indexing
  • QuiXote
  • Predicate approach and cost models
  • MMXDB
  • Functional approach

44
Conclusions
  • Research issues of storage, indexing, and query
    evaluation were addressed
  • Storage
  • QuiXote (compression and partial document
    retrieval)
  • Indexing
  • Lore (five kinds of indices)
  • Evaluation
  • Lore (predicate based) and QuiXote

45
Conclusions
  • Native XML databases will not instantly become a
    standard
  • XML-specific applications would greatly benefit
  • Improvements to MMXDB
  • adopt storage similar to QuiXote?
  • strengthening of the index (add value index)
  • upward traversal of the tree
  • predicate approach of evaluation, including a
    cost model to compare several plans

46
  • QUESTIONS?

47
Demo
  • <bib>
  • <book year="1999" isbn="1234-A">
  • <title>Gimme the Power</title>
  • <author>Abiteboul</author>
  • <author>Buneman</author>
  • <author>Suciu</author>
  • </book>
  • <book year="2001" isbn="4567-B">
  • <title>Power is Taken</title>
  • <author>Suciu</author>
  • <author>Felix</author>
  • </book>
  • </bib>
  • bib.xml

48
Demo
  • <reviews>
  • <book>
  • <title>Gimme the Power</title>
  • <review>A fine book</review>
  • </book>
  • <book>
  • <title>Power is taken</title>
  • <review>This is great</review>
  • </book>
  • </reviews>
  • Reviews.xml