Title: Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database


1
Storing and Maintaining Semistructured Data
Efficiently in an Object-Relational Database
  • Mo Yuanying and Ling Tok Wang

2
Contents
  • 1. Main accomplishment
  • 2. Related Works
  • 3. ORA-SS
  • 4. Storing Algorithm
  • 5. Comparison with Related Works
  • 6. Conclusion

3
Main Accomplishment
  • This study provides efficient and consistent
    storage for semistructured data by developing
    algorithms that map an XML document to a logical
    ORA-SS model and then to an object-relational
    data store.

4
Contents
  • 1. Main accomplishment
  • 2. Related Works
  • 3. ORA-SS
  • 4. Storing Algorithm
  • 5. Comparison with Related Works
  • 6. Conclusion

5
(1) Using the file system
Related Works
  • Store each XML document as a separate operating
    system file and use a DOM or SAX parser whenever
    the document is accessed by a query.
  • Disadvantages:
  • XML files in ASCII format must be parsed every
    time they are accessed, whether for browsing
    or querying.
  • With DOM, the entire parsed file must be
    memory-resident during query processing.
  • It is hard to build and maintain indices on
    documents stored this way.
  • Update operations are difficult to implement.
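
The parsing cost described above can be sketched in a few lines of Python. This is only an illustration: the supplier document, the function name query_supplier_name and the counter parse_count are hypothetical and do not come from the slides. It uses the standard library's xml.dom.minidom to show that every lookup rebuilds the whole DOM tree in memory.

```python
import xml.dom.minidom

# Hypothetical document; in the file-system approach it would live in its own OS file.
DOC = """<suppliers>
  <supplier sno="s1"><name>Acme</name></supplier>
  <supplier sno="s2"><name>Bolt</name></supplier>
</suppliers>"""

parse_count = 0  # counts how many times the full document is parsed


def query_supplier_name(xml_text, sno):
    """Answer one lookup by re-parsing the whole document into a DOM tree."""
    global parse_count
    dom = xml.dom.minidom.parseString(xml_text)  # the entire tree is built in memory
    parse_count += 1
    for node in dom.getElementsByTagName("supplier"):
        if node.getAttribute("sno") == sno:
            name = node.getElementsByTagName("name")[0]
            return name.firstChild.data
    return None


# Three queries trigger three full parses; there is no index to consult.
names = [query_supplier_name(DOC, s) for s in ("s1", "s2", "s1")]
```

Without an index, each query walks the freshly built tree from the root, which is the behaviour the bullets above criticize.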

6
(2) Using a relational DBMS
Related Works
  • XML data is stored in relations, and the XML query
    language (for example, XQuery) is translated to
    SQL and executed by the underlying relational
    database system.
  • Examples: the Edge approach, the Attribute
    approach, the Universal Table approach, the
    Normalized Universal approach, and STORED.
  • Disadvantages:
  • A great deal of redundancy.
  • Searches and updates are difficult.
  • Handling multi-valued attributes is expensive.

7
(3) Using a storage manager
Related Works
  • The XML query is parsed, translated to a suitable
    operator tree representation, optimized, and then
    executed by an XML query engine.
  • Examples: Shore, B-tree.
  • Disadvantage:
  • Searches and updates are inconvenient.

8
(4) Our approach: store ORA-SS in nested
relations
Related Works
  • Problems in existing storage approaches:
  • Flat files: documents are long and difficult
    to query or update.
  • Relational DBMS: these approaches cannot capture
    the semantic information.
  • ORA-SS reflects the nested structure of
    semi-structured data and distinguishes between
    object classes, relationship types, and
    attributes. It can specify the degree of n-ary
    relationship types and indicate whether an
    attribute belongs to a relationship type or to
    an object class. Such information is essential
    for designing an efficient and non-redundant
    storage organization for semi-structured data.
  • Nested relations handle multi-valued attributes
    better.

9
Contents
  • 1. Main accomplishment
  • 2. Related Works
  • 3. ORA-SS
  • 4. Storing Algorithm
  • 5. Comparison with Related Works
  • 6. Conclusion

10
ORA-SS
  • A semantically richer data model for
    semi-structured data
  • 3 main concepts
  • Object class
  • Relationship type
  • Attribute

11
Example
ORA-SS
  • Binary relationship type

12
Example (Cont)
ORA-SS
  • Ternary relationship type

13
Example (Cont)
ORA-SS
  • The distinction between binary and ternary
    relationship types cannot be made in other
    semi-structured data models.

14
ORA-SS
  • ORA-SS can specify the degree of n-ary
    relationship types
  • ORA-SS can indicate if an attribute is an
    attribute of a relationship type or an attribute
    of an object class
  • Existing semi-structured data models cannot
    specify such information, although it is
    essential for storage design.
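
As a rough illustration of the two points above, the following Python sketch models object classes and relationship types as plain data classes. The supplier/part/project schema, the placement of price and qty, and the helper owner_of are all assumptions made for the example; they are not taken from the ORA-SS definition itself.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectClass:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class RelationshipType:
    name: str
    participants: list  # names of the participating object classes
    attributes: list = field(default_factory=list)

    @property
    def degree(self):
        """Degree of the relationship type: 2 = binary, 3 = ternary, and so on."""
        return len(self.participants)


# Hypothetical schema: price belongs to the binary relationship type sp,
# qty belongs to the ternary relationship type spj.
object_classes = [
    ObjectClass("supplier", ["sno", "sname"]),
    ObjectClass("part", ["pno", "pname"]),
    ObjectClass("project", ["jno", "jname"]),
]
relationship_types = [
    RelationshipType("sp", ["supplier", "part"], ["price"]),
    RelationshipType("spj", ["supplier", "part", "project"], ["qty"]),
]


def owner_of(attr):
    """Report whether an attribute belongs to an object class or a relationship type."""
    for oc in object_classes:
        if attr in oc.attributes:
            return ("object class", oc.name)
    for rt in relationship_types:
        if attr in rt.attributes:
            return ("relationship type", rt.name)
    return None
```

A model that records only parent-child nesting has no place to store degree or the owner of price, which is the information the storage design relies on.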

15
Contents
  • 1. Main accomplishment
  • 2. Related Works
  • 3. ORA-SS
  • 4. Storing Algorithm
  • 5. Comparison with Related Works
  • 6. Conclusion

16
ORA-SS to OR database
Storing Algorithm
  • An object-relational database can handle
    multi-valued attributes efficiently.
  • Multi-valued attributes are treated as repeating
    groups in nested relations.
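
A small Python sketch can make the repeating-group idea concrete. The student/hobby data and the function flatten are hypothetical; the sketch only contrasts a nested relation with the flat rows a first-normal-form table would need.

```python
# Hypothetical object class "student" with a multi-valued attribute "hobby",
# stored as a repeating group inside each nested tuple.
students_nested = [
    {"sno": "s1", "name": "Ann", "hobby": ["chess", "tennis"]},
    {"sno": "s2", "name": "Bob", "hobby": []},
]


def flatten(rows, group):
    """Unnest a repeating group into flat rows, one row per value."""
    flat = []
    for row in rows:
        scalars = {k: v for k, v in row.items() if k != group}
        # A tuple with an empty group still needs one flat row (with a null).
        for value in row[group] or [None]:
            flat.append({**scalars, group: value})
    return flat


students_flat = flatten(students_nested, "hobby")
# The flat form repeats the single-valued attributes once per hobby value.
repeated_names = sum(1 for r in students_flat if r["name"] == "Ann")
```

In the nested form each student appears once; in the flat form the single-valued attributes are duplicated for every value of the multi-valued attribute.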

17
ORA-SS to OR database
Storing Algorithm
  • Main rules:
  • Each object class together with its attributes
    forms a nested relation, with multi-valued
    attributes stored as repeating groups of this
    relation (an object relation).
  • Each relationship type (that is, the object
    classes involved in it) together with its
    attributes forms a nested relation, with
    multi-valued attributes stored as repeating
    groups of this relation (a relationship relation).

18
(1) Object class translation algorithm
Storing Algorithm
  • O1 The identifier and candidate keys of this
    object class become the primary key and candidate
    keys of the generated relation.
  • O2 Each single-valued attribute of this object
    class is a single-valued attribute of the
    generated relation.
  • O3 Composite attributes of the object class are
    represented directly: they are replaced by their
    components in the generated relation.

19
Object class translation algorithm (cont)
Storing Algorithm
  • O4 Each multi-valued attribute of this object
    class forms a repeating group in this relation.
  • O5 Each reference is a foreign key in this
    relation.
  • O6 Each disjunctive attribute is treated as two
    attributes.
  • O7 For an ID dependency relationship type, the
    ID-dependent object class is translated by the
    same rules as a regular object class.
    The ID-dependent object class together with its
    attributes forms a nested relation within its
    parent object class.
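
Rules O1 to O6 can be sketched as a single Python function. This is a minimal illustration under stated assumptions, not the authors' implementation: the classes Attribute and ObjectClass, the dictionary layout of the generated schema, and the student example are all hypothetical. Candidate keys (part of O1) and ID-dependent nesting (O7) are left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    multi_valued: bool = False
    components: list = field(default_factory=list)    # non-empty means composite
    alternatives: list = field(default_factory=list)  # non-empty means disjunctive
    references: str = ""                              # referenced object class, if any


@dataclass
class ObjectClass:
    name: str
    identifier: str
    attributes: list


def translate_object_class(oc):
    """Build a nested-relation schema (a plain dict) following rules O1 to O6."""
    rel = {
        "relation": oc.name,
        "primary_key": oc.identifier,       # O1: identifier becomes the primary key
        "single_valued": [oc.identifier],
        "repeating_groups": [],
        "foreign_keys": {},
    }
    for a in oc.attributes:
        if a.components:
            names = a.components            # O3: composite replaced by its components
        elif a.alternatives:
            names = a.alternatives          # O6: disjunctive treated as two attributes
        else:
            names = [a.name]
        # O4: multi-valued attributes form repeating groups; O2: the rest are single-valued
        target = "repeating_groups" if a.multi_valued else "single_valued"
        rel[target].extend(names)
        if a.references:
            rel["foreign_keys"][a.name] = a.references  # O5: reference becomes a foreign key
    return rel


# Hypothetical object class used only to exercise the rules.
student = ObjectClass(
    name="student",
    identifier="sno",
    attributes=[
        Attribute("name"),
        Attribute("address", components=["street", "city"]),
        Attribute("hobby", multi_valued=True),
        Attribute("advisor", references="staff"),
        Attribute("contact", alternatives=["phone", "email"]),
    ],
)
schema = translate_object_class(student)
```

The resulting schema keeps hobby as a repeating group inside the student relation instead of moving it to a separate table.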

20
Translation Example 1
Storing Algorithm

21
(2) Relationship type translation algorithm
Storing Algorithm
  • R1 The identifiers of all object classes
    participating in this relationship type become
    single-valued attributes of the nested relation.
  • The key of the relationship type can be
    determined from the participation constraints of
    the relationship type.
  • R2 Each single-valued attribute of this
    relationship type is a single-valued attribute of
    the generated relation.

22
Relationship type translation algorithm (cont)
Storing Algorithm
  • R3 Composite attributes of the relationship type
    are represented directly: they are replaced by
    their components in the generated relation.
  • R4 Each multi-valued attribute of this
    relationship type forms a repeating group in this
    relation.
  • R5 A disjunctive relationship type is treated as
    two relationship types.
  • R6 ID dependency relationship types need no
    translation, because rule O7 already nests the
    ID-dependent object class within its parent.
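
Rules R1 to R4 can be sketched in the same style. Again this is an assumption-laden illustration: the classes RelAttribute and RelationshipType, the dictionary layout, and the spj example are hypothetical. Key derivation from participation constraints is omitted, a disjunctive relationship type (R5) would simply be passed in as two separate relationship types, and R6 requires no code.

```python
from dataclasses import dataclass, field


@dataclass
class RelAttribute:
    name: str
    multi_valued: bool = False
    components: list = field(default_factory=list)  # non-empty means composite


@dataclass
class RelationshipType:
    name: str
    participants: dict  # object class name -> identifier attribute
    attributes: list


def translate_relationship_type(rt):
    """Build a nested-relation schema (a plain dict) following rules R1 to R4."""
    rel = {
        "relation": rt.name,
        "degree": len(rt.participants),
        # R1: identifiers of all participating object classes
        "single_valued": list(rt.participants.values()),
        "repeating_groups": [],
    }
    for a in rt.attributes:
        names = a.components or [a.name]    # R3: composite replaced by its components
        # R4: multi-valued attributes form repeating groups; R2: the rest are single-valued
        target = "repeating_groups" if a.multi_valued else "single_valued"
        rel[target].extend(names)
    return rel


# Hypothetical ternary relationship type used only to exercise the rules.
spj = RelationshipType(
    name="spj",
    participants={"supplier": "sno", "part": "pno", "project": "jno"},
    attributes=[
        RelAttribute("qty"),
        RelAttribute("delivery_date", multi_valued=True),
    ],
)
spj_schema = translate_relationship_type(spj)
```

Because qty is attached to the relationship relation rather than to any one object class, it is stored once per (supplier, part, project) combination.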

23
Translation Example 1
Storing Algorithm

24
Translation for Ordering and ANY
Storing Algorithm
  • (3) Translation for ordering
  • We define an additional attribute named ordinal
    within the ordered object class (i.e., the
    ordered attribute).
  • (4) Translation for ANY
  • ANY denotes an attribute whose structure is
    unknown or may differ from one instance to
    another.
  • We define a separate table (Identifier, ANY,
    ANY-value).
  • Identifier is the identifier of the object class
    or relationship type to which this ANY belongs.
  • ANY is the structure name (the tag) used by the
    particular instance.
  • ANY-value is its value.
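
The ordinal attribute and the (Identifier, ANY, ANY-value) table can be tried out with Python's built-in sqlite3 module. Note the substitution: SQLite is a plain relational engine, not the object-relational DBMS the slides target, so this only illustrates the two table shapes. The table names author and any_value, the column names, and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordering: an extra "ordinal" column records the position of each ordered item.
conn.execute("CREATE TABLE author (book_id TEXT, name TEXT, ordinal INTEGER)")
# ANY: one generic table holding (Identifier, ANY, ANY-value).
conn.execute("CREATE TABLE any_value (identifier TEXT, any_tag TEXT, any_value TEXT)")

# Rows are inserted out of order on purpose; ordinal restores document order.
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [("b1", "Bob", 2), ("b1", "Ann", 1)],
)
# Different instances may carry differently named (tagged) ANY content.
conn.executemany(
    "INSERT INTO any_value VALUES (?, ?, ?)",
    [("b1", "isbn", "hypothetical-123"), ("b2", "doi", "hypothetical-456")],
)

authors_in_order = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM author WHERE book_id = ? ORDER BY ordinal", ("b1",)
    )
]
any_for_b1 = conn.execute(
    "SELECT any_tag, any_value FROM any_value WHERE identifier = ?", ("b1",)
).fetchall()
```

Storing the tag name as data lets instances with different ANY structures share one table without schema changes.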

25
Translation Results
Storing Algorithm
  • Following these algorithms, a normal form ORA-SS
    schema is translated into normal form nested
    relations.
  • The undesirable update anomalies in
    semi-structured databases are removed, and any
    redundancy due to many-to-many relationships and
    n-ary relationships is controlled.

26
Contents
  • 1. Main accomplishment
  • 2. Related Works
  • 3. ORA-SS
  • 4. Storing Algorithm
  • 5. Comparison with Related Works
  • 6. Conclusion

27
Comparison
  • Other models
  • Supply(J, S, P, price, Qty)

28
Conclusion
  • Our approach uses ORA-SS as the data model and
    an object-relational database as the database
    management system.
  • We can store and access semi-structured data
    correctly, more efficiently, and without
    avoidable redundancy.
  • No node IDs are needed in our approach.

29
Conclusion (cont)
  • Our approach can capture the semantic information
    that is essential for storage.
  • Our approach can represent the degree of n-ary
    relationship types.
  • Our approach can represent whether an attribute
    belongs to an object class or to a relationship
    type.