Lore: A Database Management System for - PowerPoint PPT Presentation

1 / 28
About This Presentation
Title:

Lore: A Database Management System for

Description:

Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom. Stanford University ... 1)The data can be irregular and not conform to a rigid schema. ... – PowerPoint PPT presentation

Number of Views:70
Avg rating:3.0/5.0
Slides: 29
Provided by: E83
Category:

less

Transcript and Presenter's Notes

Title: Lore: A Database Management System for


1
Lore A Database Management System for
Semi-structured Data
Jason McHugh, Serge Abiteboul, Roy Goldman,
Dallan Quass, Jennifer Widom Stanford University
Mustafa Emre ERKOÇ 20.11.2002
2
What is Lore (Lightweight Object Repository) A
DBMS designed specifically for managing semi
structured data. In the paper Storage manager
Indexing Query Processing and
Optimization User Interfaces is
described. Needs for such a DBMS 1)The data
can be irregular and not conform to a rigid
schema. 2)There can be changes in the
structure of the schema such as data elements
can change types, data not conforming previous
schema can be edit and other schema
modifications What Lore Does since it is
designed for managing semi structured data, there
is no existence of schema. Lore data model is
very simple, self describing and nested object
model called OEM(Object Exchange Model)
3
Outline of Presentation Representing and
Querying Semi Structured Model Data Model
Lorel Query Language System Architecture Query
and Update Processing Query Operators and Query
Plan Query Indexing Lindex and Vindex
Index Query Plan Update Query Plan Physical
Storage Novel Features External Data
Manager Data Guides Interfaces API Web System
Status and Future Work
4
What has done so far? The Lorel query language
for semi structured data. 11/96 Enabling query
formulation and optimization in semi structured
database. 08/97 Representing and querying
changes in semi structured data. 02/97 Querying
semi structured heterogeneous information.12/96 Q
uery optimization for semi structured
data. Integrating dynamically-fetched external
information into a dbms for semi structured data.
05/97
5
(No Transcript)
6
  • Representing and querying ? The object exchange
    model
  • Oem is designed for semistructured data Data can
    be thought as labeled directed graph
  • The vertices are objects and names are special
    labels serve as aliases for objects and as
  • entry points into database
  • Every object has a unique object identifier (oid)
    Example 5
  • Atomic Object object that has no outgoing edge.
    Contain a value from one of the basic
  • atomic types such as integer, real,string, gif
    etc
  • Example Object 7 with value Clark
  • Complex Objects all other objects that may have
    subobjects.
  • Example 3 and subobjects 8 9 10 and 11
  • Any object that cannot be accessed by a path from
    some name is considered to be deleted.
  • Members have zero, one or more offices
  • An office is sometimes string and sometimes
    integer and sometimes complex object
  • Room may be a string and integer

7
Representing and querying semistuctured
data?Lorel query language Lorel is an extension
of OQL. Basic building block of lorel is the
simple path expression which is a name followed
by a sequence of labels. Ex DBGruop.Member.Offi
ce Ex select DBGruop.Member.Office Where
DBGruop.Member.Agegt30 Result is office Gates
252 Office Building CIS Room
411 This query will not yield a run-time error
if an Age object has a string value or is complex
or They are singled-value, set-value or even
empty for some group members. The OQL style of
this query is Select O From DBGruop.Member M,
M.Office O Where exits A in M.Age Agt30 As will
be seen later the first step of query processing
is to rewriting the query into OQL style
8
Representing and querying semistuctured
data?lorel query language One can
specify pattern for path pattern for
labels pattern for atomic values. Example Selec
t DBGroup.Member.Name Where DBGroup.Member.Office(
.Room.Cubicle)? like252 Result Name
Jones Name Smith The following query
illustrates subqueries and constructed
results. Select N.Name, (select
M.Project.Title where M.Project.Title
!lore) From DBGroup.Member M Where
M.Project.Title lore Result Name
Jones Title Tsimmis
9
  • System Architecture
  • Access to Lore is through a variety of
    applications or directly from application
  • Program Interface (API )
  • There is a simple textual interface generally
    used by developers.
  • The graphical interface is the primary interface
    for the end users. It provides powerful
  • tools for query results, formulating simple
    queries and mechanism for viewing the
  • multimedia atomic types
  • The query compilation layer is consist of
  • The parser -Preprocessor
  • Query plan generator -Query optimizer.
  • Query optimizer, in addition to doing some
    transformations on the query plan, decides
  • whether the use of indexes is feasible.
  • The Data Engine Layer houses
  • - OEM object manager - Query operators
  • -External data manager - Various utilities.
  • the object manager functions as the translation
    layer between OEM and the low level file

10
System Architecture
11
  • Query and Update Processing in Lore
  • Query processing in Lorel is fairly conventional
    with some exceptions
  • Becuase of the flexibility of Lorel, the
    preprocessing of the parse tree to produce OQL
  • like query is complex.
  • Although the Lore engine is built around standard
    operators, some take an original flavor
  • Ex scan operator.
  • A unique feature of Lore is its automatic
    coercion of atomic values. Coercion has an
  • impact on the implementation of comparators.
  • The result of a Lorel query is always a set of
    OEM object which become subobject of
  • a newly created result object.

12
Query and Update Processing in Lore?Query
processing Query execution strategy is based on
familiar database operators. A recursive
iterator approach is used in query
processing. With the iterators execution begins
at the top of the query plan,with each node in
the plan requesting a tuple from its children and
performing some operations on the tuples After a
nodes complete its operation, it passes a
resulting tuple up to its parent. The tuples on
which the operation takes places is Object
Assignments. OA is a simple Data structure
containing slots corresponding to range variables
in the query
If OA1holds the oid for member Smith then OA2
and Oa3 can hold the oids For one of the
Smiths Office subobject and one of the Smiths
Age subobject respectively
13
Query and Update Processing in Lore?Query
operators Join, project, select operators are
nearly identical to their corresponding
relational operators. Aggregation operator calls
its child exhaustively, storing the result
temporarily or computing the aggregate
incrementally. When the child can produce no more
valid OAs, a new object is created whose value
the final aggregation. Other operators SetOp,
ArithOp, CreateSet, and Gruopby. SetOp handles
the set operations Union, intersect and
except. ArithOp handles arithmatic operators as
addition, multiplication etc. CreateSet is used
to package the result of an arbitrary subquery
before proceeding. Gruopby handles queries that
include a groupby expression.
14
Example select M.Name, count (M.Publication)
from DBGroup.Member M Where M/Dept
CS
OQL like translation is select (select N
from M.Name N), count (select P from
M.Publication) from DB.Group.Member M
where exist D in M.Dept D CS
15
Query and update processing in Lore?Query
Optimization Indexing Query processor
currently implement only a few simple heuristic
optimization technique E.g Selection operators
are pushed down the query tree, and in some cases
redundant operator are eliminated or
combined Lore explored query plans that use
indexes when feasible. In a relational DBMS, an
index is created on an attribute in order to
locate tuples with particular attribute values
quickly. In Lore, such a value index, alone is
not sufficient since path to an object is as
important as the value of object. So there are
two types of indexes Link (Edge) Index also
called Lindex, takes an oid and a label , and
returns the oids of All parents via specified
label. It essentially provides parent
pointers Value Index also called Vindex,
takes a label, operator and value. It returns all
atomic objects having an incoming edge with
specified label and a value satisfying the
specified operator and value.
16
Indexing? Value Indexing Value indexing in Lore
requires some novel features due to its
non-strict typing system. Indexing system deals
with coercion involving integers, reals, and
strings only. But the situation is simplified by
always coercing integers to reals as shown in the
table.
Three types of Vindexes are maintained String
vindex Real vindex String coerced to real
vindex When using a Vindex for comparison there
are two cases 1- If the comparison type string
then i-do a look up in the string vindex ii- if
the Value can be coerced to a real then look up
fir the coerced value in the real vindex 2- If
the value is real or iteger i-do a look up in
the real vindex ii- also look up in the String
coerced to real vindex
17
ExSelect OFrom DBGruop.Member M, M.Office
OWhere exits A in M.Age Agt30
Indexing? Index query plan If the query contains
a comparison between a path expression and an
integer, real or string and if the appropriate
indexes are exist then a query plan that uses
indexes will be created
Query plans using indexes are different in the
shape from those based on Scan operators index
plans traverse the database bottom-up , while
scan based plans perform a top-down traversal.
18
Indexing?Index query plan The new operators
are Vindex iteratively finds all atomic objects
with the value bigger than 30 and an
incoming edge labeled Age, placing their oids in
slot OA2 Lindex iteratively places into OA1 all
parents of the object in OA2 via an Age
edge. Once the object could potentially have
several partners via Age, however
since Age is existentially quantified in the
query, we only want to consider each
parent once, even if the it has several Age
subobject this is why we use once
here. Name_Obj in the second lindex, operator
finds all parents of OA1 object via member
edge and place them in OA0. The objects in AO0
should be named as DBGroup and the purpose of
Name_Obj operator is to do this.
19
Query and Update processing in Lore?Update query
plan Update is done by adding an Update operator
to the query execution engine. Ex update
P.Member (select DBGroup.Member
where DBGroup.Member.NameClark) from
DBGroup.Project p where P.Title Lore or
P.Title Tsimmis This update
operator add an edge between the projects whose
titles are Lore and Tsimmis, and members called
Clark. Here we use Create_edge operation. The
other valid operations are Destroy_edge, modify_a
tomic
20
  • Query and Update Processing in Lore? Bulk Loading
    and Physical Storage
  • Bulk Loading
  • Data can be added to a Lore database in two ways
  • By using update statement
  • By load file
  • In the second case, a textual description of an
    OEM database is accepted by a loas utility
  • which includes useful features such as symbolic
    references for shared subobject, as well
  • as the ability to incorporate new data into an
    existing database.
  • Physical Structure
  • Lore arranges objects in a physical disk pages
    each page has a number of slots with a
  • single object in each slot.
  • Object are variable length and so Lore places
    object according to a first-fit algorithm. Lore
  • use an object forwarding mechanism to handle
    object that grow too large objects that may
  • span page.
  • In addition, Lore supports for large objects that
    span many pages.

21
Novel Features ? External Data External Data
Manager enables dynamic retrieval of information
from other data sources based on the queries
issued to Lore. During the query evaluation,
externally obtained data is combined with
resident data and so The distinction between two
type of data is invisible to end user. During the
query processing, when the execution engine
discover an external object information is
fetched from external source to answer query.
Also the fetched information is stored in
database until it becomes stale.
22
Novel Features?External Data Specifications for
an external objects i-The location of wrapper
program that fetches the external data and
translate it into OEM ii-A quantum that
indicates that indicates the time interval until
the fetched information become stale. iii-A set
of arguments that are used to limit the
information fetched in a call to the external
source
Arguments sent to the external source can come
from 3 places i-the query being processed (query
defined) ii-values of other objects in the local
database(data defined) iii-constant values tied
to external object(hard-coded)
23
Novel Features?External Data Many calls to an
external sources quickly dominate query
processing time. To limit the number of calls
some mechanism is used. One of them is to track
the argument sets used by previous queries and
determine when previously fetched information or
entirely subsumes information required by the
current argument set
24
Novel Features? Data Guides Lore does not have
an explicit schema so query formulation and
optimization are particularly challenging. A
dataguide is a concise and accurate structure of
an OEM database, stored as an OEM object. Each
possible path expression of a database is encoded
exactly once in the dataguides.
Considering the role of missing schema,
the Dataguides can guide the query processor In
relational or O2 systems the schema is
explicitly created before any data is loaded,
However, dataguides are dynamically generated or
maintained.
25
  • Interfaces to Lore? Application Programming
    Interface
  • API provides a gateway between Lore and any user
    interface or client application.
  • For instance, it is used by the systems textual
    interface, which passes user commands to
  • Lore and present result of queries in a
    hierarchical display.
  • API is composed of a small collection of C
    classes. At the highest level it is used
  • as a client program to connect to a Lore
    database.
  • To submit queries
  • To process query result.
  • LoreConnection class is to connect a specific
    database
  • Submit function is to submit Lorel queries. It is
    also used for other Lore command such
  • as index creation and updates. When submit is
    called with a query, it returns query results
  • as a LoreOem object. LoreOem only contains oid
    and the actual value is fetched from
  • database on demand.

26
User Interfaces to Lore?Web interface
By visiting an url and choosing a database, a
user can connect to web interface. The user may
submit a textual Lorel query or select a sample
prewritten query. Result are displayed in an
HTML, hierarchical format.
27
System status and Future work Lore is consist of
approximately 60.000 lines of C codes. Some
language features such as external predicates and
functions are still under implementation.
General path expression are not implemented in
their full generality, although a substantial
and very useful is. It is planned to explore how
Lorel could be translated to SQL 3 and thus
implemented on top of an object relational
dbms. Performance Issues There is a little
performance analysis are done for Lore.
Comparing the performance of lore against the
implemented lorel on top of O2 There is a
significant additional research to do in query
optimization, including query writing, operation
ordering, selecting the best use of indexes in
query plans. Currently all expansions of path
expressions in query path are done at run-time.
It is planned to explore the compile-time
approach and compare its performance against
the Run-time approach.
28
System Status and Future Work It is also
considering to investigate more sophisticated
techniques for customizing the presentation of
OEM object in a Web environment. Finally, it is
planned to incorporate a special text type along
with a full-text indexing System because many
applications appropriate for a DMBS such as Lore,
include a Significant amount of text data.
Write a Comment
User Comments (0)
About PowerShow.com