Lore: A Database Management System for Semistructured Data - PowerPoint PPT Presentation

About This Presentation
Title:

Lore: A Database Management System for Semistructured Data

Description:

The data may be irregular and thus not conform to a rigid schema. ... WHERE DBGroup.Member.Office(.Room%|.Cubicle)? Like '%2' Result Name 'Jones' Name 'Smith' ... – PowerPoint PPT presentation

Number of Views:109
Avg rating:3.0/5.0
Slides: 59
Provided by: sanjay70
Learn more at: https://web.mst.edu
Category:

less

Transcript and Presenter's Notes

Title: Lore: A Database Management System for Semistructured Data


1
Lore A Database Management System for
Semistructured Data
2
LORE
  • Lore Lightweight Object Repository
  • http//WWW-DB.Stanford.EDU/lore/demo/

3
Lore - motivation
  • The data may be irregular and thus not conform to
    a rigid schema.
  • Relational data model has null values, and OO
    models have inheritance and complex objects. Both
    have difficulties in designing schemas to
    incorporate irregular data.
  • It may be difficult to decide in advance on a
    single, correct schema, The structure of the data
    may evolve rapidly, data elements may change
    types, or data not conforming to previous
    structure may be added.

4
  • Thus, there is a need for management of
  • semi-structured data!
  • Lore system manages semi-structured data. The
    data managed by Lore is not confined to a schema
    and it may be irregular or incomplete.
  • OEM is the Lores data model. OEM - object
    Exchange Model - graph based self-describing
    object instance model where nodes are objects and
    edges are labeled with attribute names and leaf
    nodes have atomic values
  • Lore is light weight object repository and Lorel
    is Lores query language.

5
  • Path expression queries
  • Automatic type coercion
  • Use of Data guides Structural summary of the
    database

6
  • Objects with Oid
  • Atomic objects no outgoing edge and contain a
    value from one of the basic atomic types such as
    Integer, real, string, gif, java, audio etc.
  • Complex objects may have outgoing edges
  • Names Sp. Labels that serves as aliases for
    objects and as entry points to the database.

7
Object Exchange Model - OEM
  • Motivation - information exchange and extraction
  • Each value exchanged is given an explicit label.
  • Object ?temp-in-Fahrenheit, integer, 80? -
    temp-in-Fahrenheit is the label. Each object is
    self-describing, with a label, type and value.
  • ?set-of-temps, set, cmpnt1, cmpnt2 ?
  • cmpnt1 is ?temp-in-Fahrenheit, integer, 80?
  • cmpnt2 is ?temp-in-Celsius, integer, 20?

8
Labels
  • Plays two roles
  • identifying an object (component)
  • identifying the meaning of an object (component)

?person-record, set, cmpnt1, cmpnt2, cmpnt3 ?
cmpnt1 is ?person-name, string, Fred?
cmpnt2 is ?office-num-in-bldg-5, integer, 333?
cmpnt3 is ?department, string, toy?
  • Person-name both identifies cmpnt1 and coveys its
    meaning.
  • In relational data this corresponds to .

9
Labels - Issues
  • What does the label mean?
  • Database of labels
  • Ontology of labels - within each source
  • Labels are relative (more specific) to the source
    of the data object.
  • Similar labels from different sources need to be
    resolved.
  • Labels provide the flexibility in representing
  • object structure

10
Self-describing data models
  • Have been in existence for a long time? Why
    additional interest now?
  • Use the nature of self-describing data model
    for information exchange, and to extend the model
    to include object nesting.
  • To provide an appropriate object request language
    (query facility)

11
OEM - Specification
  • Each object in OEM has the following structure
  • Label A variable character string describing
    what the object represents.
  • Type The data type of the objects value. Each
    is either an atom type, or type set.
  • Value A variable-length value of the object.
  • Object-ID A unique variable-length identifier
    for the object or null.

12
OEM - Summary
  • OEM is an information exchange model. It does not
    specify how objects are stored at source.
  • OEM does specify how objects are received at a
    client, but after objects are received they can
    be stored in any way the client likes.
  • Each source has a distinguished object with
    lexical identifier root.
  • Note the schema-less nature of OEM is
    particularly useful when a client does not know
    in advance the labels or structure of OEM objects.

13
  • ltbiblio,set,doc1,doc2,,docngt
  • doc1 is ltdoc, set, auths1, topic1, call-no1gt
  • auths1 is ltauth-set,set auth11gt
  • auth11 is ltauth-ln, string, Ullmangt
  • topic1 is lttopic, string,Databasesgt
  • call-no1 is ltinternal-call-no, integer, 25gt
  • doc2 is ltdoc, set, auths2, topic2, call-no2gt
  • auths2 is ltauth-set,set auth21, auth22,
    auth23gt
  • auth21 is ltauth-ln, string, Ahogt
  • auth22 is ltauth-ln, string, Hopcroftgt
  • auth23 is ltauth-ln, string, Ullmangt

Example
  • topic2 is lttopic, string,Algorithmsgt
  • call-no1 is ltdewey-decimal, string, BR273gt
  • docn is ltdoc, set, authsn, topicn, call-nongt
  • authsn is ltauth,string, Crichtongt
  • topic1 is lttopic, string,Dinosaursgt
  • call-no1 is ltfictional-call-no, integer, 95gt
  • biblio is the root object.

14
OEM - QL
  • SELECT Fetch-expression
  • FROM Object
  • WHERE Condition
  • The result of this query is itself an object,
    with special label answer
  • ?answer, set, obj1, obj2, , objn ?
  • Each returned obji is a component of object
    specified in the From clause of the query, where
    the component is located by the Fetch-expression
    and satisfies the Condition.

15
Path
  • The notion of path is used in both
    Fetch-Expression in the Select clause and the
    condition in the Where clause.
  • Path describes traversals through an object using
    subobject structure and labels.
  • Example biblio.doc.auth
  • Paths are used in Fetch-Expression to specify
    which components are are returned in the answer
    object.
  • Paths are used in the condition to qualify the
    fetched objects or other (related) components in
    the same object structure.

16
Queries - Simple
  • Retrieve the topic of each document for which
    Ullman is one of the authors
  • SELECT biblio.doc.topic
  • FROM root
  • WHERE biblio.doc.auth-set.auth-ln Ullman
  • Intuitively, the querys where clause finds all
    paths through subobject structure with the
    sequence of labels biblio,doc,auth-set,auth-ln
    such that the object at the end of the path has
    value Ullman.
  • ltanswer, set, obj1, obj2gt
  • obj1 is lttopic, string, Databasesgt
  • obj2 is lttopic, string, Algorithmsgt

17
Queries - wild-cards
  • Retrieve all documents with internal call number
  • SELECT biblio.?.topic
  • FROM root
  • WHERE biblio.?.internal-call-no
  • ? label matches any label. For this query,
    the doc labels can be replaced by any other
    strings and query would produce the same result.
    By convention, two occurrences of ? In the same
    query must match the same label unless variables
    are used.
  • ltanswer, set, obj1gt
  • obj1 is lttopic, string, Databasesgt

18
Queries - wild-paths
  • Retrieve all documents with internal call number
  • SELECT .topic
  • FROM root
  • WHERE .internal-call-no
  • Symbol matches any path of length one or
    more. The use of followed by a single label is
    a convenient and common way to locate objects
    with a certain label in complex structure.
    Similar to ?, two occurrences of in the same
    query must match the same sequence of labels,
    unless variables are used.
  • ltanswer, set, obj1gt
  • obj1 is lttopic, string, Databasesgt

19
Queries - variables
  • Retrieve each document for which both
    Hopcroft and Aho are co-authors
  • SELECT biblio.doc
  • FROM root
  • WHERE biblio.doc.auth-set.auth-ln(a1)Aho and
  • biblio.doc.auth-set.auth-ln(a1)H
    opcroft
  • Here, the query finds all the paths with
    structure biblio, doc, auth-set, and with two
    distinct path completions with label auth with
    values Aho and Hopcroft
  • ltanswer, set, obj1gt
  • obj1 is the complete doc2

20
OEM (Cont.)
  • Examples
  • Object 3 is complex, and its subobjects are 8,
    9, 10, and 11.
  • Object 7 is atomic and has value Clark.
  • DBGroup is a name that denotes object 1.(Names
    are entry points into the database).

21
An OEM Database
DBGroup
1
Member
Project
Member
Member
Project
Member
2
3
4
5
6
Name
Project
Name
Office
Project
Age
Name
Age
Office
Office
9
11
8
10
12
13
14
7
15
16
Clark
Smith
46
Gates 252
Lore
Tsimmis
Jones
28
Room
Building
Room
Building
17
18
19
20
CIS
411
CIS
252
22
Lorel Queries - Simple Path Expression
  • Retrieve the offices of members with age greater
    than 30 years
  • Query SELECT DBGroup.Member.Office
  • WHERE DBGroup.Member.Age gt 30
  • Result Office Gates 252
  • Office
  • Building CIS
  • Room 411

23
Lorel Query Rewrite
  • Previous query rewritten to
  • select Ofrom DBGroup.Member M, M.Office Owhere
    exists y in M.Age y lt 30
  • Comparison on age transformed to existential
    condition.
  • Since all properties are set-valued in OEM.
  • A user can ask DBGroup.Member.Age lt 30 regardless
    of whether Age is single valued, set valued, or
    unknown.

24
Lorel Query Rewrite
  • Why?
  • Breaking query into simple path expressions
    necessary for query optimization.
  • Need to explicitly handle coercion.
  • Atomic objects and values. 0.5 lt 0.9 should
    return true
  • Comparing objects and sets of objects.
    DBGroup.Member.Age is a set of objects.

25
Queries - General Path Expression
  • Query SELECT DBGroup.Member.Name
  • WHERE DBGroup.Member.Office(.Room.Cubicle)?
  • Like 252
  • Result Name Jones
  • Name Smith
  • Room matches all labels starting from Room, like
    Room68. stands for disjunction. ? indicates
    that the label pattern is optional. like 252
    specifies that the data value should end with
    string 252.

26
Queries - SubQueries
Retrieve Lore project members who work on other
projects Query SELECT M.Name, ( SELECT
M.Project.Title WHERE M.Project.Title !
Lore) FROM DBGroup.Member M WHERE
M.Project.Title Lore Result Member Name
Jones Title Tsimmis
27
Data Guides
  • A DataGuide is a concise and accurate summary of
    the structure of an OEM database (stored as OEM
    database itself, kind of like the system
    catalog).
  • Why?
  • No explicit schema, how do we formulate
    meaningful queries?
  • Large databases (cant just view graph
    structure).
  • What if a path expression doesnt exist (waste).
  • Each possible path expression is encoded once.

28
9, 13
29
DataGuides As Histograms
  • Each object in the dataguide can have a link to
    its corresponding target set.
  • A target set is a set of oids reachable by that
    path.
  • TS of DBGroup.Member.Age is 9, 13.
  • This is a path index. Can find set of objects
    reachable by a particular path.
  • Can store statistics in DataGuide
  • For example, the of atomic objects of each type
    reachable by p.

30
Conclusions
  • Takes advantage of the structure where it exists.
  • Handles lack of structure well (data type
    coercion, general path expressions).
  • Query language allows users to get and update
    data from semistructured sources.
  • DataGuide allows users to determine what paths
    exist, and gives useful statistical information
  • Lore does facilitate query and updates on
    semi-structural databases

31
OEM vs. XML
  • OEMs objects correspond to elements in XML
  • Sub-elements in XML are inherently ordered.
  • XML elements may optionally include a list of
    attribute value pairs.
  • Graph structure for multiple incoming edges
    specified in XML with references (ID, IDREF
    attributes). i.e. the Project attribute.

32
OEM to XML
  • Example
  • ltMember project5 6gt ltnamegtJoneslt/namegt lta
    gegt46lt/agegt ltofficegt ltbuildinggtgateslt/buildin
    ggt ltroomgt252lt/roomgt lt/officegtlt/membergt
  • This corresponds to rightmost member in the
    example OEM, where project is an attribute.

33
Query Optimization for Semistructured Data
34
(No Transcript)
35
External Data Manager
  • Enables retrieval of information from other data
    sources, transparent to the user.
  • An external object in Lore is a placeholder for
    the external data and specifies how lore
    interacts with an external data source.
  • The spec for an external object includes
  • Location of a wrapper program to fetch and
    convert data to OEM, time interval until fetched
    information becomes stale, and a set of arguments
    used to limit info fetched from external source.

36
Layer 0
  • Access to Lore API, Applications
  • Parser textual query as input and parse tree as
    output
  • Preprocessor input parse tree to OQL like query
  • Query plan and query optimizer (index etc)

37
Layer 1
  • Object manager translation layer between OEM to
    lower level files, compare objects,
  • Query operator execute query plan, perform
    simple coercion, iterating over subobjects of a
    complex object

38
  • QUERY
  • select M.Name,
  • (select M.Project.Title where M.Project.Title !
    "Lore" )
  • from DBGroup.Member M
  • where M.Project.Title "Lore"
  • RESULT
  • Member
  • Name "Jones"
  • Title "Tsimmis"

39
  • update P.Member
  • ( select DBGroup.Member
  • where DBGroup.Member.Name "Clark" )
  • from DBGroup.Project P
  • where P.Title "Lore" or
  • P.Title "Tsimmis"
  • P.member specifies to add member edges between
    P and every object returned by subquery.

40
Query Execution Plan
  • Iterative approach in query processing
  • Execution begins at the top of the query plan,
    with each node in the plan requesting a tuple at
    a time from its children and performing some
    operation on the tuple.
  • After a node completes its operation, it passes a
    resulting tuple to its parent.

41
  • Object assignment OA is a simple data
    structures containing slots corresponding to
    range variables in the query and some additional
    slots
  • Each slot within an OA will hold the oid of a
    vertex on a data path currently being considered
    by the query engine
  • Example OA1 holds oid for member smith, then
    OA2 and OA3 can hold the oids for one of smiths
    office subobjects and one of his age subobjects
    resp.

42
  • Query Operators
  • Each operator takes a number of arguments with
    the last argument being OA slot that will contain
    the result of the operation.
  • Select , project has no target slot.
  • Scan returns all oids that are subobjects of a
    given object
  • Scan (starting OA slot, path exp., Target OA
    slot)
  • Scan until no subobjects that satisfies path
    expression
  • Scan (OA1, office, OA2) place into slot OA2
    one at a time all office subobjects appearing in
    slot OA1.

43
  • Join/select/project nearly identical to RDBMS
  • Project is to limit which objects should be
    returned
  • Select applied predicate to the object identified
    by the oid in the OA slot specified
  • Aggregate implements quantification and
    aggregation node calls its child exhaustively ,
    storing the results temp. or computing
    aggregation
  • A new object is created when no more valid OAs

44
  • select O
  • from DBGroup.Member M,
  • M.Office O
  • where exists A in M.Age A gt 30

45
(No Transcript)
46
Primary Operators
  • Setop, ArithOp, Create Set and Groupby
  • Setop handles union, intersect,except
  • Arithop addition, multiplication
  • Createset is to package the results of an
    arbitrary subquery before proceeding its called
    its child exhaustively , storing each oid
    returned as part of a newly created complex
    object.
  • It stores oid for the new set of objects within
    target slot
  • Group by handles subquery that includes a groupby
    expression

47
  • select M.Name, count(M.Publication)
  • from DBGroup.Member M
  • where M.Dept "CS"
  • select (select N from M.Name N), count(select P
  • from M.Publication P)
  • from DBGroup.Member M
  • where exists D in M.Dept D "CS"

48
(No Transcript)
49
Indexing
  • Value Index vindex
  • Lindex a link (edge) index
  • Lindex (oid, label)
  • and returns the oids of all parents via the
    specified label (provides parent pointers)
  • Vindex (label,operator,value) returns all
    atomic objects having an incoming edge with the
    specified label and a value satisfying the
    specified operator (lt)

50
Index Query Plan
  • Bottem Up
  • Locate all objects with desired values and
    appropriately labeled incoming edges via vindex
  • Using Lindex then traverse up from these objects
    to match the path exp

51
New Query Operators
  1. Vindex
  2. Lindex
  3. Once
  4. Named-obj

52
Indexes
  • Vindex(l, op, value, x) places into x all atomic
    objects that satisfy the op value condition
    with an incoming edge labeled l.
  • Vindex(Age, lt, 30,y) places into y objects with
    age lt 30.
  • Lindex(x, l, y) places into x all objects that
    are parents of y via edge labeled l.
  • Lindex(x, Age, y) places into x all parents of
    y via label Age.

53
Indexes (cont.)
  • Bindex(l, x, y) finds all parent-child object
    pairs connected by a label l.
  • Bindex(Age, x, y) locates all parent-child
    pairs with label Age.
  • Pindex(PathExpression, x) placed into x all
    objects reachable via the path expression.
  • Pindex(A.B x, x.C y, y) places into y all
    objects reachable by going from A to B to C.
  • Uses DataGuide.

54
  • Vindex as B tress
  • Lindex Linear Hashing
  • String Vindex index entries for all string
    based atomic values
  • Real Vindex index entries for all numeric based
    atomic values
  • String-coerced-to real Vindex
  • Contains all strings values that can be coerced
    into an integer or real

55
  • If the value is of type string then
  • Lookup in string index
  • If the value can be coerced to a real then lookup
    in the coerced value in real vindex
  • If the value is of type real (integer) then
  • Lookup in real vindex
  • Also lookup in the string-coerced to real vindex

56
(No Transcript)
57
Simple Query
  • select Ofrom DBGroup.Member M, M.Office Owhere
    exists y in M.Age y lt 30
  • Possible plans
  • Top-down (similar to pointer-chasing,
    nested-loops join)
  • Use Vindex to check y lt 30, traverse backwards
    from child to parent using Lindex(bottom-up).
  • Hybrid, both top down and bottom up. Meet in
    middle.

58
Coercion for Basic op.
  • Arg2 String Real Int
  • Arg1
  • String - Stirng Real both Real
  • Real Stirng Real - Int Real
  • Int both Real Int Real -
Write a Comment
User Comments (0)
About PowerShow.com