Title: Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database
- Mo Yuanying and Ling Tok Wang
Contents
- 1. Main accomplishment
- 2. Related Works
- 3. ORA-SS
- 4. Storing Algorithm
- 5. Comparison with Related Works
- 6. Conclusion
Main Accomplishment
- This study provides efficient and consistent
storage for semistructured data by developing
algorithms that map an XML document to a logical
ORA-SS schema and then to an object-relational
data store.
Related Works (1): Using the file system
- Store each XML document as a separate operating system file and use a DOM or SAX parser whenever a query accesses the document.
- Disadvantages
- XML files are stored as plain (ASCII) text and must be parsed every time they are accessed, whether for browsing or for querying.
- With DOM, the entire parsed file must be memory-resident during query processing.
- It is hard to build and maintain indices on documents stored this way.
- Update operations are difficult to implement.
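To make the parsing cost concrete, here is a minimal Python sketch, assuming the document lives in an ordinary OS file and is queried with the standard-library DOM parser `xml.dom.minidom`. The sample document and the helpers `count_elements` and `student_names` are hypothetical. Every query re-parses the whole file into an in-memory tree, with no index to consult.

```python
import tempfile
from xml.dom import minidom

# Hypothetical sample document, written to a temporary OS file.
SAMPLE = """<db>
  <student sno="s1"><name>Ann</name></student>
  <student sno="s2"><name>Bob</name></student>
</db>"""

def count_elements(path: str, tag: str) -> int:
    """Answer a query by parsing the whole file with DOM.

    The full document tree is rebuilt in memory on every call.
    """
    doc = minidom.parse(path)  # full parse on every access
    return len(doc.getElementsByTagName(tag))

def student_names(path: str) -> list[str]:
    """A second query: it must parse the file again from scratch."""
    doc = minidom.parse(path)
    return [n.firstChild.data for n in doc.getElementsByTagName("name")]

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(SAMPLE)
    xml_path = f.name

print(count_elements(xml_path, "student"))  # 2
print(student_names(xml_path))  # ['Ann', 'Bob']
```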
Related Works (2): Using a relational DBMS
- XML data is stored in relations; queries in an XML
query language (for example, XQuery) are translated to
SQL and executed by the underlying relational
database system.
- Disadvantages
- A great deal of redundancy
- Search and update are difficult
- Handling multi-valued attributes is expensive
- Representative approaches: the Edge approach, the Attribute approach, the Universal Table, the Normalized Universal approach, and STORED
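For contrast with the ORA-SS mapping, the sketch below shreds a small document into a single edge table in the style of the Edge approach, using Python's `sqlite3` and `xml.etree.ElementTree`. The column layout (source, ordinal, name, target, value) and the sample data are assumptions for illustration, not the exact schema of any published system. Every element needs a generated node ID, and even a simple lookup of a student's multi-valued hobbies requires a self-join.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<db>
  <student><sno>s1</sno><hobby>chess</hobby><hobby>golf</hobby></student>
  <student><sno>s2</sno><hobby>tennis</hobby></student>
</db>"""

def shred_edges(xml_text: str) -> sqlite3.Connection:
    """Store every parent-child edge of the document as one row (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edge (source INTEGER, ordinal INTEGER, name TEXT,"
        " target INTEGER, value TEXT)"
    )
    next_id = 1  # node IDs must be generated for every element

    def visit(elem: ET.Element, node_id: int) -> None:
        nonlocal next_id
        for ordinal, child in enumerate(elem, start=1):
            next_id += 1
            child_id = next_id
            text = (child.text or "").strip() or None
            conn.execute(
                "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                (node_id, ordinal, child.tag, child_id, text),
            )
            visit(child, child_id)

    visit(ET.fromstring(xml_text), 1)  # the root element gets ID 1
    return conn

conn = shred_edges(SAMPLE)
# Finding the hobbies of student s1 needs a self-join on the edge table.
rows = conn.execute(
    """SELECT h.value FROM edge s
       JOIN edge h ON h.source = s.source AND h.name = 'hobby'
       WHERE s.name = 'sno' AND s.value = 's1'
       ORDER BY h.ordinal"""
).fetchall()
print([r[0] for r in rows])  # ['chess', 'golf']
```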
Related Works (3): Using a storage manager
- The XML query is parsed, translated into a suitable
operator-tree representation, optimized, and then
executed by an XML query engine.
- Examples: Shore, B-tree
- Disadvantage
- Search and update are inconvenient
Related Works (4): Our approach, storing ORA-SS in nested relations
- Problems in existing storage approaches
- Flat files: querying and updating are slow and difficult.
- Relational DBMS: these approaches cannot capture the semantic information of the data.
- ORA-SS reflects the nested structure of semistructured data and distinguishes between object classes, relationship types, and attributes. It can specify the degree of an n-ary relationship type and indicate whether an attribute belongs to a relationship type or to an object class. Such information is essential for designing an efficient, non-redundant storage organization for semistructured data.
- Nested relations handle multi-valued attributes better.
ORA-SS
- A semantically richer data model for semistructured data
- Three main concepts
- Object class
- Relationship type
- Attribute
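The three concepts can be made concrete with a small in-memory schema representation. The Python sketch below is hypothetical: the class and field names (`ObjectClass`, `RelationshipType`, `Attribute`, `degree`, `owner_of`) are illustrative assumptions, not ORA-SS notation, and placing `qty` on the ternary relationship type is an assumed example. It captures the two pieces of information the slides emphasize: the degree of a relationship type and whether an attribute belongs to an object class or to a relationship type.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    multi_valued: bool = False  # e.g. a student's several hobbies

@dataclass
class ObjectClass:
    name: str
    identifier: str
    attributes: list[Attribute] = field(default_factory=list)

@dataclass
class RelationshipType:
    name: str
    participants: list[ObjectClass]
    attributes: list[Attribute] = field(default_factory=list)

    @property
    def degree(self) -> int:
        """2 for a binary relationship type, 3 for a ternary one, and so on."""
        return len(self.participants)

def owner_of(attr_name: str, objects, relationships) -> str:
    """Report whether an attribute belongs to an object class or a relationship type."""
    for oc in objects:
        if any(a.name == attr_name for a in oc.attributes):
            return f"object class {oc.name}"
    for rt in relationships:
        if any(a.name == attr_name for a in rt.attributes):
            return f"relationship type {rt.name}"
    raise KeyError(attr_name)

# Hypothetical ternary example: qty is assumed to describe the relationship.
project = ObjectClass("project", "jno")
supplier = ObjectClass("supplier", "sno", [Attribute("sname")])
part = ObjectClass("part", "pno")
supply = RelationshipType("supply", [project, supplier, part], [Attribute("qty")])

objects, relationships = [project, supplier, part], [supply]
print(supply.degree)  # 3
print(owner_of("qty", objects, relationships))  # relationship type supply
```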
ORA-SS: Example
ORA-SS: Example (cont.)
- Ternary relationship type
ORA-SS: Example (cont.)
- The distinction between binary and ternary
relationship types cannot be made in other
semi-structured data models.
ORA-SS
- ORA-SS can specify the degree of n-ary relationship types.
- ORA-SS can indicate whether an attribute is an attribute of a relationship type or an attribute of an object class.
- Existing semistructured data models cannot specify such information, even though it is essential for storage design.
Storing Algorithm: ORA-SS to OR Database
- An object-relational database can handle multi-valued attributes efficiently.
- Multi-valued attributes are treated as repeating groups in nested relations.
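To see why repeating groups help, the sketch below stores the same hypothetical student data twice in plain Python: once as a flat first-normal-form relation and once as a nested relation in which the multi-valued attribute `hobby` becomes a repeating group. The data and type names are illustrative assumptions. In the flat form the single-valued `name` is stored once per hobby; in the nested form each student is a single tuple.

```python
from typing import NamedTuple

class FlatStudent(NamedTuple):
    sno: str
    name: str
    hobby: str

class NestedStudent(NamedTuple):
    sno: str
    name: str
    hobbies: tuple[str, ...]  # repeating group for the multi-valued attribute

flat = [
    FlatStudent("s1", "Ann", "chess"),
    FlatStudent("s1", "Ann", "golf"),  # 'Ann' stored a second time
    FlatStudent("s2", "Bob", "tennis"),
]

def nest(rows: list[FlatStudent]) -> list[NestedStudent]:
    """Group flat rows by key, collecting the multi-valued attribute."""
    grouped: dict[str, tuple[str, list[str]]] = {}
    for r in rows:
        _, hobbies = grouped.setdefault(r.sno, (r.name, []))
        hobbies.append(r.hobby)
    return [NestedStudent(sno, name, tuple(h)) for sno, (name, h) in grouped.items()]

nested = nest(flat)
# Renaming student s1 touches two flat rows but only one nested tuple.
print(sum(r.sno == "s1" for r in flat))    # 2
print(sum(r.sno == "s1" for r in nested))  # 1
```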
Storing Algorithm: ORA-SS to OR Database
- Main rules
- Each object class, together with its attributes, forms a nested relation in which multi-valued attributes become repeating groups (an object relation).
- Each relationship type (i.e., the object classes involved in it), together with its attributes, forms a nested relation in which multi-valued attributes become repeating groups (a relationship relation).
Storing Algorithm (1): Object class translation algorithm
- O1 The identifier of the object class becomes the primary key of the generated relation, and its candidate keys become candidate keys of that relation.
- O2 Each single-valued attribute of the object class becomes a single-valued attribute of the generated relation.
- O3 Composite attributes of the object class are not represented directly; they are replaced by their components in the generated relation.
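Rules O1 to O3 can be sketched as a function that turns a dictionary description of an object class into a relation schema. The input format (`identifier`, `candidate_keys`, and `attributes` mapping each name to `"single"` or to a dict of `components`), the output format, and the sample class are hypothetical assumptions for illustration.

```python
def translate_object_class(oc: dict) -> dict:
    """Apply rules O1-O3 to one object-class description (hypothetical format)."""
    columns: list[str] = []
    for attr, spec in oc["attributes"].items():
        if isinstance(spec, dict):
            # O3: a composite attribute is replaced by its components.
            columns.extend(spec["components"])
        else:
            # O2: a single-valued attribute becomes a column of the relation.
            columns.append(attr)
    return {
        "relation": oc["name"],
        # O1: the identifier becomes the primary key; candidate keys carry over.
        "primary_key": [oc["identifier"]],
        "candidate_keys": [list(k) for k in oc.get("candidate_keys", [])],
        "columns": [oc["identifier"], *columns],
    }

# Hypothetical object class with one composite attribute.
student = {
    "name": "student",
    "identifier": "sno",
    "candidate_keys": [["email"]],
    "attributes": {
        "email": "single",
        "name": {"components": ["first_name", "last_name"]},
        "age": "single",
    },
}
rel = translate_object_class(student)
print(rel["columns"])  # ['sno', 'email', 'first_name', 'last_name', 'age']
```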
Storing Algorithm (1): Object class translation algorithm (cont.)
- O4 Each multi-valued attribute of the object class forms a repeating group in the relation.
- O5 Each reference becomes a foreign key in the relation.
- O6 Each disjunctive attribute is treated as two attributes.
- O7 For an ID-dependency relationship type, the ID-dependent object class is translated by the same rules as a regular object class. The ID-dependent object class, together with its attributes, forms a nested relation within the relation of its parent object class.
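Rules O4 to O7 can be sketched as a recursive function over a hypothetical dictionary description of an object class. In this sketch multi-valued attributes become repeating groups (O4), references become foreign-key entries (O5), a disjunctive attribute written as `"a|b"` becomes two separate columns (O6), and each ID-dependent child class is translated by the same function and nested inside its parent (O7). The input format and the sample `department` are assumptions, and only these cases are handled.

```python
def to_nested_relation(oc: dict) -> dict:
    """Apply rules O4-O7 to a hypothetical object-class description."""
    rel = {
        "relation": oc["name"],
        "key": oc["identifier"],
        "columns": [oc["identifier"]],
        "repeating_groups": [],
        "foreign_keys": {},
        "nested": [],
    }
    for attr, kind in oc.get("attributes", {}).items():
        if kind == "multi":
            rel["repeating_groups"].append(attr)  # O4: repeating group
        elif kind == "disjunctive":
            rel["columns"].extend(attr.split("|"))  # O6: two attributes
        else:
            rel["columns"].append(attr)
    for ref_attr, target in oc.get("references", {}).items():
        rel["columns"].append(ref_attr)
        rel["foreign_keys"][ref_attr] = target  # O5: reference -> foreign key
    for child in oc.get("id_dependents", []):
        # O7: same rules as a regular class, nested within the parent relation.
        rel["nested"].append(to_nested_relation(child))
    return rel

department = {
    "name": "department",
    "identifier": "dno",
    "attributes": {"dname": "single", "phone": "multi", "fax|email": "disjunctive"},
    "references": {"manager_eno": "employee"},
    "id_dependents": [
        {"name": "room", "identifier": "room_no", "attributes": {"size": "single"}}
    ],
}
rel = to_nested_relation(department)
print(rel["columns"])  # ['dno', 'dname', 'fax', 'email', 'manager_eno']
print(rel["nested"][0]["columns"])  # ['room_no', 'size']
```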
Storing Algorithm: Translation Example 1
Storing Algorithm (2): Relationship type translation algorithm
- R1 The identifiers of all object classes participating in the relationship type become single-valued attributes of the nested relation.
- The key of the relationship relation is determined by the participation constraints of the relationship type.
- R2 Each single-valued attribute of the relationship type becomes a single-valued attribute of the generated relation.
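R1, R2, and the note on keys can be sketched as follows. The participation-constraint encoding is an assumption: each participant identifier is mapped to its maximum participation, `1` (each instance takes part at most once) or `"n"` (many times). The key rule is a common ER-to-relational heuristic used here as a stand-in, since the slides do not spell out the exact procedure: a participant with maximum participation 1 determines the relationship instance, otherwise all participating identifiers together form the key.

```python
def translate_relationship(rt: dict) -> dict:
    """Apply R1-R2 to a hypothetical relationship-type description.

    rt["participants"] maps each object-class identifier to its maximum
    participation: 1 (at most one instance) or "n" (many instances).
    """
    ids = list(rt["participants"])  # R1: participant identifiers as columns
    single = [i for i, m in rt["participants"].items() if m == 1]
    key = [single[0]] if single else ids  # assumed key heuristic
    return {
        "relation": rt["name"],
        "key": key,
        "columns": ids + list(rt.get("attributes", [])),  # R2: single-valued attrs
    }

# Hypothetical many-to-many relationship type with an attribute of its own.
enrol = {"name": "enrol", "participants": {"sno": "n", "code": "n"}, "attributes": ["grade"]}
# Hypothetical relationship type in which each project takes part at most once.
located = {"name": "located_in", "participants": {"jno": 1, "city": "n"}}

print(translate_relationship(enrol)["key"])    # ['sno', 'code']
print(translate_relationship(located)["key"])  # ['jno']
```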
Storing Algorithm (2): Relationship type translation algorithm (cont.)
- R3 Composite attributes of the relationship type are not represented directly; they are replaced by their components in the generated relation.
- R4 Each multi-valued attribute of the relationship type forms a repeating group in the relation.
- R5 A disjunctive relationship type is treated as two relationship types.
- R6 ID-dependency relationship types need no translation, because rule O7 already nests the ID-dependent object class within its parent.
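Rules R3 to R6 can be sketched over hypothetical relationship-type descriptions. Composite attributes are flattened into their components (R3), multi-valued attributes become repeating groups (R4), a disjunctive relationship type, encoded here as a list of alternative participant sets, is split into one relation per alternative (R5), and ID-dependency relationship types are skipped (R6). The encoding, the relation names such as `owns_1`, and the sample data are assumptions for illustration.

```python
def translate_relationship_types(rts: list[dict]) -> list[dict]:
    """Apply R3-R6 to hypothetical relationship-type descriptions."""
    out = []
    for rt in rts:
        if rt.get("id_dependency"):
            continue  # R6: handled by nesting the dependent class (rule O7)
        # R5: a disjunctive type lists alternatives; each becomes its own relation.
        alternatives = rt["alternatives"] if "alternatives" in rt else [rt["participants"]]
        for n, participants in enumerate(alternatives, start=1):
            columns = list(participants)
            groups = []
            for attr, spec in rt.get("attributes", {}).items():
                if isinstance(spec, dict):
                    columns.extend(spec["components"])  # R3: flatten composite
                elif spec == "multi":
                    groups.append(attr)  # R4: repeating group
                else:
                    columns.append(attr)
            name = rt["name"] if len(alternatives) == 1 else f"{rt['name']}_{n}"
            out.append({"relation": name, "columns": columns, "repeating_groups": groups})
    return out

rts = [
    {"name": "teach", "participants": ["lno", "code"],
     "attributes": {"period": {"components": ["start", "end"]},
                    "textbook": "multi", "hours": "single"}},
    # Hypothetical disjunction: a car is owned by a person OR by a company.
    {"name": "owns", "alternatives": [["pid", "car_no"], ["cid", "car_no"]]},
    {"name": "dept_room", "id_dependency": True, "participants": ["dno", "room_no"]},
]
result = translate_relationship_types(rts)
print([r["relation"] for r in result])  # ['teach', 'owns_1', 'owns_2']
```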
Storing Algorithm: Translation Example 1
Storing Algorithm: Translation for Ordering and ANY
- (3) Translation for ordering
- We define an additional attribute named ordinal within the ordered object class (i.e., the ordered attribute).
- (4) Translation for ANY
- An attribute whose structure is unknown, or may differ from instance to instance, is denoted ANY.
- We define a separate table (Identifier, ANY, ANY-value).
- Identifier is the identifier of the object class or relationship type to which this ANY belongs.
- ANY is the structure name (the tag) used by a given instance.
- ANY-value is its value.
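Both devices can be sketched with Python's `sqlite3`. The table and column names (`author`, `ordinal`, `any_attr`, `any_tag`, `any_value`) and the sample books are hypothetical; `ANY` itself is avoided as a column name because it is an SQL keyword. The `ordinal` column records each element's position so document order can be recovered with `ORDER BY`, and the separate (Identifier, ANY, ANY-value) table keeps the varying tag name as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# (3) Ordering: the ordered object class gets an extra 'ordinal' column.
conn.execute(
    "CREATE TABLE author (isbn TEXT, ordinal INTEGER, name TEXT,"
    " PRIMARY KEY (isbn, ordinal))"
)
# (4) ANY: a separate (Identifier, ANY, ANY-value) table.
conn.execute("CREATE TABLE any_attr (identifier TEXT, any_tag TEXT, any_value TEXT)")

# Hypothetical book b1 with two ordered authors, inserted out of order.
conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                 [("b1", 2, "Lee"), ("b1", 1, "Smith")])
# Each book's extra attribute may use a different tag per instance.
conn.executemany("INSERT INTO any_attr VALUES (?, ?, ?)",
                 [("b1", "award", "Best Paper"), ("b2", "pages", "12")])

authors = [r[0] for r in conn.execute(
    "SELECT name FROM author WHERE isbn = 'b1' ORDER BY ordinal")]
print(authors)  # ['Smith', 'Lee']
extras = conn.execute(
    "SELECT any_tag, any_value FROM any_attr WHERE identifier = 'b2'").fetchall()
print(extras)  # [('pages', '12')]
```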
Storing Algorithm: Translation Results
- Following these algorithms, a normal-form ORA-SS schema is translated into normal-form nested relations.
- Undesirable update anomalies in semistructured databases are removed, and any redundancy due to many-to-many and n-ary relationship types is controlled.
Comparison
- Other models
- Supply(J, S, P, price, Qty)
Conclusion
- Our approach uses ORA-SS as the data model and an object-relational database as the database management system.
- Semistructured data can be stored and accessed correctly and more efficiently, without avoidable redundancy.
- Our approach needs no node IDs.
Conclusion (cont.)
- Our approach can capture the semantic information that is essential for storage.
- Our approach can represent the degree of n-ary relationship types.
- Our approach can represent whether an attribute is an attribute of an object class or of a relationship type.