Chapter 10: XML - PowerPoint PPT Presentation

Provided by: kers151
1
Chapter 10 XML
  • The world of XML

2
Context
  • The dawn of database technology: the 1970s
  • A DBMS is a flexible store-recall system for
    digital information
  • It provides permanent memory for structured
    information

3
Context
  • Database management technology for
    administrative settings was completed in the early
    1980s
  • Search for demanding application areas that could
    benefit from a database approach
  • A sound datamodel to structure the information
    and maintain integrity rules
  • A high level programming language model to
    manipulate the data
  • Separation of concerns between modelling and
    manipulation, and physical storage and order of
    execution thanks to query optimizer technology

4
Context
  • Demanding areas of research in DBMS core
    technology
  • Office Information systems, e.g. document
    modelling and workflow
  • CAD/CAM, e.g. how to manage the design of an
    airplane or nuclear power plant
  • GIS, e.g. managing remote sensing information
  • WWW, e.g. how to integrate heterogeneous sources
  • Agent-based systems, e.g. reactive systems
  • Multimedia, e.g. video storage/retrieval
  • Datamining, e.g. discovery of client profiles
  • Sensor networks, e.g. small footprint and
    energy-wise computing

5
Context
  • Demanding areas of research in DBMS core
    technology
  • Office Information systems: Extensible DBMS,
    blobs
  • CAD/CAM: Object-oriented DBMS, geometry
  • GIS: GIS DBMS, geometry and images
  • Agent-based systems: Active DBMS, triggers
  • Multimedia: MM DBMS, feature analysis
  • Datamining: Datawarehouse systems, cube,
    association rules
  • Sensor networks: P2P databases, ad-hoc networking

6
Context
  • Application interaction with DBMS
  • Proprietary application programming interface,
    shielding the hardware distinctions
  • Use readable interfaces to improve monitoring and
    development
  • Example: in MonetDB the interaction is based on
    ASCII text, with the first character indicating
    the message type
  • > prompt, awaiting the next request
  • ! error occurred, rest is the message
  • start of a tuple answer
  • Language embedding to remove the impedance
    mismatch, i.e. avoid cost of transforming data
  • Effectively failed in the OO world

7
Context
  • Learning points from the database perspective
  • Database systems should not be concerned with the
    user-interaction technology; they should be
    blind and deaf
  • The strong requirements on schema, integrity
    rules and processing are a harness
  • Interaction with applications should be
    self-descriptive as much as possible, because
    you cannot know a complete schema a priori
  • Need for semi-structured databases

8
Semi-structured data
  • Properties of semistructured databases
  • The schema is not given in advance and may be
    implicit in the data
  • The schema is relatively large and changes
    frequently
  • The schema is descriptive rather than
    prescriptive, integrity rules may be violated
  • The data is not strongly typed, the values of
    attributes may be of different type
  • Stanford Lore system is the prototypical first
    attempt to support semi-structured databases

9
Context
  • Accidentally, in the world of digital publishing
    there is a need for a simple datamodel to
    structure information
  • SGML, HTML, XML, XHTML
  • XPath, XQuery, XSLT
  • By the end of the 1990s, the document world meets the
    database world

10
Introduction
  • XML: Extensible Markup Language
  • Defined by the World Wide Web Consortium (W3C)
  • Originally intended as a document markup language,
    not a database language
  • Documents have tags giving extra information
    about sections of the document
  • E.g. <title> XML </title> <slide> Introduction
    </slide>
  • Derived from SGML (Standard Generalized Markup
    Language), but simpler to use than SGML
  • Extensible, unlike HTML
  • Users can add new tags, and separately specify
    how the tag should be handled for display

11
XML Introduction (Cont.)
  • The ability to specify new tags, and to create
    nested tag structures made XML a great way to
    exchange data, not just documents.
  • Much of the use of XML has been in data exchange
    applications, not as a replacement for HTML
  • Tags make data (relatively) self-documenting
  • E.g.
    <bank>
      <account>
        <account-number> A-101 </account-number>
        <branch-name> Downtown </branch-name>
        <balance> 500 </balance>
      </account>
      <depositor>
        <account-number> A-101 </account-number>
        <customer-name> Johnson </customer-name>
      </depositor>
    </bank>
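
The self-documenting nature of the tags can be illustrated with a short Python sketch. It uses the standard library's xml.etree.ElementTree parser on the bank example above; the variable names (BANK_XML, accounts, depositors) are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The bank document from the slide, with whitespace around values trimmed.
BANK_XML = """
<bank>
  <account>
    <account-number>A-101</account-number>
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <depositor>
    <account-number>A-101</account-number>
    <customer-name>Johnson</customer-name>
  </depositor>
</bank>
"""

bank = ET.fromstring(BANK_XML)

# Tag names double as field names, so the receiver needs no separate
# description of the record layout to read the data.
accounts = [
    {child.tag: child.text for child in account}
    for account in bank.findall("account")
]
depositors = [
    (d.findtext("customer-name"), d.findtext("account-number"))
    for d in bank.findall("depositor")
]
```

Note that every value arrives as a string; the tags name the fields but say nothing about their types.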

12
XML Motivation
  • Data interchange is critical in today's networked
    world
  • Examples
  • Banking: funds transfer
  • Order processing (especially inter-company
    orders)
  • Scientific data
  • Chemistry: ChemML
  • Genetics: BSML (Bio-Sequence Markup Language)
  • Paper flow of information between organizations
    is being replaced by electronic flow of
    information
  • Each application area has its own set of
    standards for representing information (W3C
    maintains ca 30 standards)
  • XML has become the basis for all new generation
    data interchange formats

13
XML Motivation (Cont.)
  • Each XML based standard defines what are valid
    elements, using
  • XML type specification languages to specify the
    syntax
  • DTD (Document Type Definition)
  • XML Schema
  • Plus textual descriptions of the semantics
  • XML allows new tags to be defined as required
  • However, this may be constrained by DTDs
  • A wide variety of tools is available for parsing,
    browsing and querying XML documents/data

14
Structure of XML Data
  • Tag: label for a section of data
  • Element: section of data beginning with <tagname>
    and ending with matching </tagname>
  • Elements must be properly nested
  • Proper nesting
  • <account> <balance> … </balance> </account>
  • Improper nesting
  • <account> <balance> … </account> </balance>
  • Formally: every start tag must have a unique
    matching end tag that is in the context of the
    same parent element.
  • Every document must have a single top-level
    element
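
A minimal sketch of these well-formedness rules, assuming Python's standard xml.etree.ElementTree parser as the checker. The helper name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <balance> closes before <account> does.
proper = "<account><balance>400</balance></account>"
# Improperly nested: the end tags are crossed.
improper = "<account><balance>400</account></balance>"
# Two top-level elements: violates the single-root rule.
two_roots = "<account/><account/>"

results = [is_well_formed(doc) for doc in (proper, improper, two_roots)]
```

Only the first document is accepted; the other two are rejected by the parser before any schema is consulted.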

15
Motivation for Nesting
  • Nesting of data is useful in data transfer
  • Example: elements representing customer-id,
    customer name, and address nested within an order
    element
  • Nesting is not supported, or discouraged, in
    relational databases
  • With multiple orders, customer name and address
    are stored redundantly
  • Normalization replaces nested structures in each
    order by a foreign key into the table storing customer
    name and address information
  • Nesting is supported in object-relational
    databases and NF2
  • But nesting is appropriate when transferring data
  • External application does not have direct access
    to data referenced by a foreign key

16
Example of Nested Elements
  • <bank-1>
      <customer>
        <customer-name> Hayes </customer-name>
        <customer-street> Main </customer-street>
        <customer-city> Harrison </customer-city>
        <account>
          <account-number> A-102 </account-number>
          <branch-name> Perryridge </branch-name>
          <balance> 400 </balance>
        </account>
        <account> … </account>
      </customer>
      …
    </bank-1>

17
Structure of XML Data (Cont.)
  • Mixture of text with sub-elements is legal in
    XML.
  • Example
  • <account>
      This account is seldom used any more.
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
    </account>
  • Useful for document markup, but discouraged for
    data representation

18
Attributes
  • Elements can have attributes
  • <account acct-type = "checking" >
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
    </account>
  • Attributes are specified by name=value pairs
    inside the starting tag of an element
  • An element may have several attributes, but each
    attribute name can only occur once
  • <account acct-type = "checking" monthly-fee="5">
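
As a small illustration, the following Python sketch reads the attributes and subelements of the account element separately and checks that a repeated attribute name is rejected. It assumes the standard xml.etree.ElementTree parser; the names attributes, subelements and parses are illustrative.

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking" monthly-fee="5">'
    "<account-number>A-102</account-number>"
    "<branch-name>Perryridge</branch-name>"
    "<balance>400</balance>"
    "</account>"
)

# Attributes live in the start tag; subelements form the element's content.
attributes = dict(account.attrib)
subelements = {child.tag: child.text for child in account}

def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The same attribute name twice in one start tag is not well-formed.
duplicate_accepted = parses('<account acct-type="checking" acct-type="savings"/>')
```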

19
Attributes Vs. Subelements
  • Distinction between subelement and attribute
  • In the context of documents, attributes are part
    of markup, while subelement contents are part of
    the basic document contents
  • In the context of data representation, the
    difference is unclear and may be confusing
  • Same information can be represented in two ways
  • <account account-number = "A-101"> … </account>
  • <account> <account-number>A-101</account-number>
    … </account>
  • Suggestion: use attributes for identifiers of
    elements, and use subelements for contents

20
More on XML Syntax
  • Elements without subelements or text content can
    be abbreviated by ending the start tag with a />
    and deleting the end tag
  • <account number="A-101" branch="Perryridge"
    balance="200" />
  • To store string data that may contain tags,
    without the tags being interpreted as
    subelements, use CDATA as below
  • <![CDATA[<account> … </account>]]>
  • Here, <account> and </account> are treated as
    just strings
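
Both points can be checked with a brief Python sketch using the standard xml.etree.ElementTree parser. The wrapping element name note is a hypothetical choice for the illustration, since a CDATA section has to sit inside some element.

```python
import xml.etree.ElementTree as ET

# The abbreviated form and the explicit start/end pair describe the same element.
short_form = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200" />')
long_form = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200"></account>')
same_attributes = short_form.attrib == long_form.attrib

# Inside CDATA the angle brackets are plain characters, not markup.
note = ET.fromstring("<note><![CDATA[<account> ... </account>]]></note>")
child_count = len(note)   # number of subelements the parser found
cdata_text = note.text    # the literal string, tags included
```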

21
Namespaces
  • XML data has to be exchanged between
    organizations
  • Same tag name may have different meaning in
    different organizations, causing confusion on
    exchanged documents
  • Specifying a unique string as an element name
    avoids confusion
  • Better solution: use unique-name:element-name
  • Avoid using long unique names all over document
    by using XML Namespaces
  • <bank xmlns:FB="http://www.FirstBank.com">
      <FB:branch>
        <FB:branchname>Downtown</FB:branchname>
        <FB:branchcity>Brooklyn</FB:branchcity>
      </FB:branch>
    </bank>
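
A short sketch of how a namespace-aware parser treats the FB prefix, assuming Python's standard xml.etree.ElementTree. The prefix fb used on the query side is an arbitrary local choice; only the namespace URI has to match.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<bank xmlns:FB="http://www.FirstBank.com">'
    "<FB:branch>"
    "<FB:branchname>Downtown</FB:branchname>"
    "<FB:branchcity>Brooklyn</FB:branchcity>"
    "</FB:branch>"
    "</bank>"
)

# The query-side prefix is mapped to the same URI as FB in the document.
ns = {"fb": "http://www.FirstBank.com"}

branch = doc.find("fb:branch", ns)
expanded_tag = branch.tag                      # the prefix is replaced by the URI
branch_name = branch.findtext("fb:branchname", namespaces=ns)
unqualified = doc.find("branch")               # a plain 'branch' is a different name
```

The parser expands the short prefix into the full unique name internally, which is what lets two organizations use the same local tag name without clashing.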

22
XML Document Schema
  • Database schemas constrain what information can
    be stored, and the data types of stored values
  • XML documents are not required to have an
    associated schema
  • However, schemas are very important for XML data
    exchange
  • Otherwise, a site cannot automatically interpret
    data received from another site
  • Two mechanisms for specifying XML schema
  • Document Type Definition (DTD)
  • Widely used
  • XML Schema
  • Newer, not yet widely used

23
Document Type Definition (DTD)
  • The type of an XML document can be specified
    using a DTD
  • A DTD constrains the structure of XML data
  • What elements can occur
  • What attributes can/must an element have
  • What subelements can/must occur inside each
    element, and how many times.
  • DTD does not constrain data types
  • All values represented as strings in XML
  • DTD syntax
  • <!ELEMENT element (subelements-specification) >
  • <!ATTLIST element (attributes) >

24
Element Specification in DTD
  • Subelements can be specified as
  • names of elements, or
  • #PCDATA (parsed character data), i.e., character
    strings
  • EMPTY (no subelements) or ANY (anything can be a
    subelement)
  • Example
  • <!ELEMENT depositor (customer-name,
    account-number)>
  • <!ELEMENT customer-name (#PCDATA)>
  • <!ELEMENT account-number (#PCDATA)>
  • Subelement specification may have regular
    expressions
  • <!ELEMENT bank ( ( account | customer |
    depositor)+)>
  • Notation
  • "|" - alternatives
  • "+" - 1 or more occurrences
  • "*" - 0 or more occurrences
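
Because a subelement specification is a regular expression over child names, it can be mimicked with an ordinary regex. The sketch below is only an illustration of that idea, not a DTD validator: the two content models are hand-translated from the declarations above, and CONTENT_MODELS and children_match are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content models rewritten as regexes over child tag names. Each tag is
# followed by one space so that names such as "account" and
# "account-number" cannot be confused.
CONTENT_MODELS = {
    "bank": re.compile(r"((account|customer|depositor) )+"),
    "depositor": re.compile(r"customer-name account-number "),
}

def children_match(element) -> bool:
    """True if the element's child sequence matches its content model."""
    sequence = "".join(child.tag + " " for child in element)
    return CONTENT_MODELS[element.tag].fullmatch(sequence) is not None

good = ET.fromstring(
    "<depositor><customer-name>Johnson</customer-name>"
    "<account-number>A-101</account-number></depositor>"
)
swapped = ET.fromstring(
    "<depositor><account-number>A-101</account-number>"
    "<customer-name>Johnson</customer-name></depositor>"
)
empty_bank = ET.fromstring("<bank/>")

checks = [children_match(e) for e in (good, swapped, empty_bank)]
```

The swapped depositor fails because a comma-separated list fixes the order, and the empty bank fails because "+" demands at least one child.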

25
Bank DTD
  • <!DOCTYPE bank [
      <!ELEMENT bank ( ( account | customer | depositor)+)>
      <!ELEMENT account (account-number, branch-name, balance)>
      <!ELEMENT customer (customer-name, customer-street,
                          customer-city)>
      <!ELEMENT depositor (customer-name, account-number)>
      <!ELEMENT account-number (#PCDATA)>
      <!ELEMENT branch-name (#PCDATA)>
      <!ELEMENT balance (#PCDATA)>
      <!ELEMENT customer-name (#PCDATA)>
      <!ELEMENT customer-street (#PCDATA)>
      <!ELEMENT customer-city (#PCDATA)>
    ]>

26
Attribute Specification in DTD
  • Attribute specification: for each attribute
  • Name
  • Type of attribute
  • CDATA
  • ID (identifier) or IDREF (ID reference) or IDREFS
    (multiple IDREFs)
  • more on this later
  • Whether
  • mandatory (#REQUIRED)
  • has a default value (value),
  • or neither (#IMPLIED)
  • Examples
  • <!ATTLIST account acct-type CDATA "checking">
  • <!ATTLIST customer
      customer-id ID     #REQUIRED
      accounts    IDREFS #REQUIRED >

27
IDs and IDREFs
  • An element can have at most one attribute of type
    ID
  • The ID attribute value of each element in an XML
    document must be distinct
  • Thus the ID attribute value is an object
    identifier
  • An attribute of type IDREF must contain the ID
    value of an element in the same document
  • An attribute of type IDREFS contains a set of (0
    or more) ID values. Each value must be
    the ID value of an element in the same document

28
Bank DTD with Attributes
  • Bank DTD with ID and IDREF attribute types.
  • <!DOCTYPE bank-2 [
      <!ELEMENT account (branch, balance)>
      <!ATTLIST account
          account-number ID     #REQUIRED
          owners         IDREFS #REQUIRED>
      <!ELEMENT customer (customer-name, customer-street,
                          customer-city)>
      <!ATTLIST customer
          customer-id ID     #REQUIRED
          accounts    IDREFS #REQUIRED>
      … declarations for branch, balance, customer-name,
        customer-street and customer-city
    ]>

29
XML data with ID and IDREF attributes
  • <bank-2>
      <account account-number="A-401" owners="C100 C102">
        <branch-name> Downtown </branch-name>
        <balance> 500 </balance>
      </account>
      <customer customer-id="C100" accounts="A-401">
        <customer-name>Joe</customer-name>
        <customer-street>Monroe</customer-street>
        <customer-city>Madison</customer-city>
      </customer>
      <customer customer-id="C102" accounts="A-401 A-402">
        <customer-name> Mary </customer-name>
        <customer-street> Erin </customer-street>
        <customer-city> Newark </customer-city>
      </customer>
    </bank-2>
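
The ID and IDREFS rules can be checked by hand with a small Python sketch. Python's standard library does not validate against a DTD, so the attribute roles are supplied in two hypothetical tables (ID_ATTRS, IDREFS_ATTRS) taken from the ATTLIST declarations; check_references is likewise an illustrative name.

```python
import xml.etree.ElementTree as ET

BANK2_XML = """<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name>Downtown</branch-name><balance>500</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401 A-402">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>"""

# Which attribute plays the ID / IDREFS role for each element type.
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

def check_references(root):
    """Return (duplicate IDs, dangling references) for the document."""
    ids, duplicates = {}, []
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in ids:
                duplicates.append(value)
            ids[value] = elem.tag
    dangling = []
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr:
            for ref in elem.attrib.get(attr, "").split():
                if ref not in ids:  # any ID counts: references are untyped
                    dangling.append((elem.attrib[ID_ATTRS[elem.tag]], ref))
    return duplicates, dangling

duplicates, dangling = check_references(ET.fromstring(BANK2_XML))
```

On the excerpt as shown, the check reports that customer C102 refers to A-402, an account that does not appear in this fragment. The check accepts any existing ID as a target, so nothing stops owners from pointing at another account.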

30
Limitations of DTDs
  • No typing of text elements and attributes
  • All values are strings, no integers, reals, etc.
  • Difficult to specify unordered sets of
    subelements
  • Order is usually irrelevant in databases
  • (A | B)* allows specification of an unordered
    set, but
  • Cannot ensure that each of A and B occurs only
    once
  • IDs and IDREFs are untyped
  • The owners attribute of an account may contain a
    reference to another account, which is
    meaningless
  • owners attribute should ideally be constrained to
    refer to customer elements

31
XML Schema
  • XML Schema is a more sophisticated schema
    language which addresses the drawbacks of DTDs.
    Supports
  • Typing of values
  • E.g. integer, string, etc
  • Also, constraints on min/max values
  • User defined types
  • Is itself specified in XML syntax, unlike DTDs
  • More standard representation, but verbose
  • Is integrated with namespaces
  • Many more features
  • List types, uniqueness and foreign key
    constraints, inheritance, …
  • BUT significantly more complicated than DTDs,
    not yet widely used.

32
XML Schema Version of Bank DTD
  • <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="bank" type="BankType"/>
    <xsd:element name="account">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="account-number" type="xsd:string"/>
          <xsd:element name="branch-name" type="xsd:string"/>
          <xsd:element name="balance" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    … definitions of customer and depositor …
    <xsd:complexType name="BankType">
      <xsd:sequence>
        <xsd:element ref="account" minOccurs="0"
                     maxOccurs="unbounded"/>
        <xsd:element ref="customer" minOccurs="0"
                     maxOccurs="unbounded"/>
        <xsd:element ref="depositor" minOccurs="0"
                     maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    </xsd:schema>

33
Storage of XML Data
  • XML data can be stored in
  • Non-relational data stores
  • Flat files
  • Natural for storing XML
  • But has all problems discussed in Chapter 1 (no
    concurrency, no recovery, …)
  • XML database
  • Database built specifically for storing XML data,
    supporting DOM model and declarative querying
  • Currently no commercial-grade scalable system
  • Relational databases
  • Data must be translated into relational form
  • Advantage: mature database systems
  • Disadvantages: overhead of translating data and
    queries

34
Storing XML in Relational Databases
  • Store as string
  • E.g. store each top level element as a string
    field of a tuple in a database
  • Use a single relation to store all elements, or
  • Use a separate relation for each top-level
    element type
  • E.g. account, customer, depositor
  • Indexing
  • Store values of subelements/attributes to be
    indexed, such as customer-name and account-number
    as extra fields of the relation, and build
    indices
  • Oracle 9 supports function indices which use the
    result of a function as the key value. Here, the
    function should return the value of the required
    subelement/attribute
  • SQL Server 2005: same

35
Storing XML in Relational Databases
  • Store as string
  • E.g. store each top level element as a string
    field of a tuple in a database
  • Benefits
  • Can store any XML data even without DTD
  • As long as there are many top-level elements in a
    document, strings are small compared to full
    document, allowing faster access to individual
    elements.
  • Drawback: need to parse strings to access values
    inside the elements; parsing is slow.
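
The store-as-string scheme with an extra indexed field can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for a full relational DBMS; the table name account_doc, its columns and the helper balance_of are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

BANK_XML = """<bank>
  <account><account-number>A-101</account-number>
    <branch-name>Downtown</branch-name><balance>500</balance></account>
  <account><account-number>A-102</account-number>
    <branch-name>Perryridge</branch-name><balance>400</balance></account>
</bank>"""

conn = sqlite3.connect(":memory:")
# One tuple per top-level element: the raw XML string plus one extracted
# subelement value that is stored redundantly so it can be indexed.
conn.execute("CREATE TABLE account_doc (account_number TEXT, doc TEXT)")
conn.execute("CREATE INDEX account_doc_by_number ON account_doc (account_number)")

for account in ET.fromstring(BANK_XML).findall("account"):
    conn.execute(
        "INSERT INTO account_doc VALUES (?, ?)",
        (account.findtext("account-number"), ET.tostring(account, encoding="unicode")),
    )

def balance_of(account_number):
    """Find the element through the indexed field, then parse its string."""
    row = conn.execute(
        "SELECT doc FROM account_doc WHERE account_number = ?", (account_number,)
    ).fetchone()
    return None if row is None else ET.fromstring(row[0]).findtext("balance")
```

The lookup by account number uses the index, but reading the balance still requires parsing the stored string, which is the drawback noted above.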

36
OEM model
  • Semi structured and XML databases can be modelled
    as graph-problems
  • Early prototypes directly supported the graph
    model as the physical implementation scheme.
    Querying the graph model was implemented using
    graph traversals
  • XML without IDREFS can be modelled as trees

37
(No Transcript)
38
Storing XML as Relations (Cont.)
  • Tree representation: model XML data as tree and
    store using relations
    nodes(id, type, label, value)
    child(child-id, parent-id)
  • Each element/attribute is given a unique
    identifier
  • Type indicates element/attribute
  • Label specifies the tag name of the element/name
    of attribute
  • Value is the text value of the element/attribute
  • The relation child notes the parent-child
    relationships in the tree
  • Can add an extra attribute to child to record
    ordering of children
  • Benefit: can store any XML data, even without DTD
  • Drawbacks
  • Data is broken up into too many pieces,
    increasing space overheads
  • Even simple queries require a large number of
    joins, which can be slow

39
Storing XML in Relations (Cont.)
  • Map to relations
  • If DTD of document is known, you can map data to
    relations
  • Bottom-level elements and attributes are mapped
    to attributes of relations
  • A relation is created for each element type
  • An id attribute to store a unique id for each
    element
  • all element attributes become relation attributes
  • All subelements that occur only once become
    attributes
  • For text-valued subelements, store the text as
    attribute value
  • For complex subelements, store the id of the
    subelement
  • Subelements that can occur multiple times
    represented in a separate table
  • Similar to handling of multivalued attributes
    when converting ER diagrams to tables
  • Benefits
  • Efficient storage
  • Can translate XML queries into SQL, execute
    efficiently, and then translate SQL results back
    to XML

40
Alternative mappings
  • Mapping the structure
  • The Edge approach
  • The Attribute approach
  • The Universal Table approach
  • The Normalized Universal approach
  • The Dataguide approach
  • Mapping values
  • Separate value tables
  • Inlining
  • Shredding

41
Edge approach
  • Use a single Edge table to capture the graph
    structure
  • Edge(source, ordinal, name, flag, target)
  • Flag: value, reference
  • Key: (source, ordinal)
  • Index: source, name, target

42
Attribute approach
  • Group all attributes with the same name into one
    table
  • Aname(source, ordinal, flag, target)
  • Key: source, ordinal
  • Index: target

43
Universal approach
  • Use the Universal Table, all attributes are
    stored as columns
  • Universal(source, ord-1, flag-1, target-1,
    …, ord-n, flag-n, target-n)
  • Key: source; index: target-i

44
Normalized Universal
  • Same as Universal, but factor out the repeating
    values
  • Universal(source, ord-1, flag-1, target-1,
    …, ord-n, flag-n, target-n)
  • Overflow_n(source, ord, flag, target)
  • Key: source, and (source, ord)
  • Index: target-i

45
Mapping values
  • Separate value tables
  • Use V_type(vid, value) tables, e.g. int(vid,val),
    str(vid,val), …

46
Mapping values
  • Inlining
  • As illustrated in previous mappings, inline the
    values in the structure relations

47
Shredding
  • Try to recognize repeating structures and map
    them to separate tables
  • Handle the remainder through any of the previous
    methods

48
Evaluation
  • Some results reported by Florescu, Kossmann using
    a commercial DBMS on documents of 100K objects in
    1999
  • Database storage overhead

49
Evaluation
  • Some results reported by Florescu, Kossmann using
    a commercial DBMS on documents of 100K objects in
    1999
  • Bulk loading

50
Evaluation
  • Some results reported by Florescu, Kossmann using
    a commercial DBMS on documents of 100K objects in
    1999
  • Reconstruction

51
The Data
  • Semistructured data instance = a large graph

52
The indexing problem
  • The storage problem
  • Store the graph in a relational DBMS
  • Develop a new database storage structure
  • The indexing problem
  • Input: large, irregular data graph
  • Output: index structure for evaluating (regular)
    path expressions, e.g.
  • bib.paper.author.firstname

53
XSet a simple index for XML
  • Part of the Ninja project at Berkeley
  • Example XML data

54
XSet a simple index for XML
  • Each node = a hashtable
  • Each entry = list of pointers to data nodes (not
    shown)

55
XSet: Efficient query evaluation
  • SELECT X FROM part.name X - yes
  • SELECT X FROM part.supplier.name X - yes
  • SELECT X FROM part.*.subpart.name X - maybe
  • SELECT X FROM *.supplier.name X - maybe
  • Will gain when index fits in memory

56
Region Algebras
  • structured text = text with tags (like XML)
  • data = sequence of characters [c1 c2 c3 …]
  • region = interval in the text
  • representation (x,y) = [cx, cx+1, …, cy]
  • example: <section> … </section>
  • region set = a set of regions
  • example: all <section> regions (may be nested)
  • region algebra = operators on region sets,
  • s1 op s2

57
Representation of a region set
  • Example: the <subpart> region set

58
Region algebra: some operators
  • s1 intersect s2 = { r | r ∈ s1, r ∈ s2 }
  • s1 included s2 = { r | r ∈ s1, ∃ r' ∈ s2, r ⊆ r' }
  • s1 including s2 = { r | r ∈ s1, ∃ r' ∈ s2, r ⊇ r' }
  • s1 parent s2 = { r | r ∈ s1, ∃ r' ∈ s2, r is a parent
    of r' }
  • s1 child s2 = { r | r ∈ s1, ∃ r' ∈ s2, r is a child of
    r' }

Examples: <subpart> included <part>, <part>
including <subpart>
59
Efficient computation of Region Algebra Operators
  • Example: s1 included s2
  • s1 = {(x1,x1'), (x2,x2'), …}
  • s2 = {(y1,y1'), (y2,y2'), …}
  • (i.e. assume each consists of disjoint regions)
  • Algorithm
  • if xi < yj then i := i + 1
  • if xi' > yj' then j := j + 1
  • otherwise print (xi,xi'), do i := i + 1
  • Can do in sub-linear time when one region set is very
    small
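
The merge-style algorithm above can be written out as a short Python sketch. It assumes, as the slide does, that each region set is a list of (start, end) pairs sorted by start and internally disjoint; the function name included and the sample data are illustrative.

```python
def included(s1, s2):
    """Regions of s1 that lie inside some region of s2.

    Both inputs are lists of (start, end) pairs, sorted by start,
    and the regions within each list do not overlap.
    """
    result = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        (x, x_end), (y, y_end) = s1[i], s2[j]
        if x < y:
            # s1[i] starts before s2[j]; no current or later s2 region holds it.
            i += 1
        elif x_end > y_end:
            # s1[i] runs past the end of s2[j]; later s1 regions do too.
            j += 1
        else:
            # y <= x and x_end <= y_end: s1[i] is inside s2[j].
            result.append(s1[i])
            i += 1
    return result

subparts = [(2, 3), (5, 9), (12, 14), (20, 22)]
parts = [(1, 4), (10, 15), (18, 19)]
inside = included(subparts, parts)
```

Each step advances one of the two cursors, so the loop runs in time linear in the combined size of the two sets. The sub-linear variant mentioned above would replace the step-by-step advance in the larger set with a search, which this sketch does not attempt.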

60
From path expressions to region expressions
Region expressions correspond to simple XPath
expressions
  • part.name = name child (part child
    root)
  • part.supplier.name = name child (supplier child
    (part child root))
  • *.supplier.name = name child supplier
  • part.*.subpart.name = name child (subpart
    included (part child root))

61
Storage structures for region algebras
  • Every node is characterised by an integer pair
    (x,y)
  • This means we have a 2-d space
  • Any 2-d space data structure can be used
  • If you use a (pre-order,post-order) numbering you
    get triangular filling of 2-d
  • (to be discussed later)
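
As a hedged illustration of the (pre-order, post-order) numbering, the sketch below assigns both ranks in one depth-first pass over a small document and uses them for an ancestor test. The document and the names number_nodes and is_ancestor are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def number_nodes(root):
    """Return {element: (pre, post)} ranks from one depth-first traversal."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(elem):
        pre = counters["pre"]
        counters["pre"] += 1
        for sub in elem:
            visit(sub)
        ranks[elem] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return ranks

def is_ancestor(a, d, ranks):
    """a is an ancestor of d iff a starts earlier and finishes later."""
    return ranks[a][0] < ranks[d][0] and ranks[a][1] > ranks[d][1]

part = ET.fromstring(
    "<part><name/><supplier><name/></supplier><subpart><name/></subpart></part>"
)
ranks = number_nodes(part)
supplier = part.find("supplier")
supplier_name = supplier.find("name")
subpart_name = part.find("subpart/name")
```

With each node reduced to one integer pair, an ancestor test is a comparison of two points in the plane, which is why any 2-d index structure can serve as the storage structure.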

62
Alternative mappings
  • Mapping the structure to the relational world
  • The Edge approach
  • The Attribute approach
  • The Universal Table approach
  • The Normalized Universal approach
  • The Monet/XML approach
  • The Dataguide approach
  • Mapping values
  • Separate value tables
  • Inlining
  • Shredding

63
Dataguide approach
  • Developed in the context of Lore, Lorel (Stanford
    Univ)
  • Predecessor of the Monet/XML model
  • Observation
  • queries in the graph-representation take a
    limited form
  • they are partial walks from the root to an object
    of interest
  • this behaviour was stressed by the query language
    Lorel, i.e. an SQL-based query language based on
    processing regular expressions

SELECT X FROM (Bib.*.author).(lastname|firstname).
Abiteboul X
64
DataGuides
  • Definition
  • given a semistructured data instance DB, a
    DataGuide for DB is a graph G s.t.
  • - every path in DB also occurs in G
  • - every path in G occurs in DB
  • - every path in G is unique

65
Dataguides
  • Example

66
DataGuides
  • Multiple DataGuides for the same data

67
DataGuides
  • Definition
  • Let w, w' be two words (i.e. word queries) and G
    a graph
  • w ≡G w' if w(G) = w'(G)
  • Definition
  • G is a strong dataguide for a database DB if ≡G
    is the same as ≡DB
  • Example
  • - G1 is a strong dataguide
  • - G2 is not strong
  • person.project !≡DB dept.project
  • person.project !≡G2 dept.project

68
DataGuides
  • Constructing the strong DataGuide G
  • Nodes(G) = {root}
  • Edges(G) = ∅
  • while changes do
  • choose s in Nodes(G), a in Labels
  • add s' = {y | x in s, (x -a-> y) in Edges(DB)} to
    Nodes(G)
  • add (s -a-> s') to Edges(G)
  • Use hash table for Nodes(G)
  • This is precisely the powerset automaton
    construction.
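
The construction can be sketched in Python as a worklist version of the powerset automaton construction. The edge-list input format, the function name strong_dataguide and the small person/dept database are assumptions chosen for illustration; they are not the example from the slides.

```python
from collections import deque

def strong_dataguide(edges, root):
    """Build the strong DataGuide of a labelled graph.

    edges: iterable of (source, label, target) triples of the database.
    Each DataGuide node is a frozenset of database objects (its extent).
    """
    out = {}
    for x, a, y in edges:
        out.setdefault(x, {}).setdefault(a, set()).add(y)

    start = frozenset([root])
    nodes = {start}            # a hash-based set, as suggested on the slide
    guide_edges = set()
    queue = deque([start])
    while queue:
        s = queue.popleft()
        by_label = {}
        for x in s:            # gather all a-successors of the whole extent
            for a, targets in out.get(x, {}).items():
                by_label.setdefault(a, set()).update(targets)
        for a, targets in by_label.items():
            s_next = frozenset(targets)
            guide_edges.add((s, a, s_next))
            if s_next not in nodes:
                nodes.add(s_next)
                queue.append(s_next)
    return nodes, guide_edges

# Hypothetical database: two persons and one department sharing projects.
DB = [
    (0, "person", 1), (0, "person", 2), (0, "dept", 3),
    (1, "project", 4), (2, "project", 5), (3, "project", 4),
]
nodes, guide_edges = strong_dataguide(DB, 0)
```

In this toy database person.project reaches objects {4, 5} while dept.project reaches only {4}, so the two paths end in different DataGuide nodes, as a strong DataGuide requires.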

69
DataGuides
  • How large are the dataguides ?
  • if DB is a tree, then size(G) ≤ size(DB)
  • why? answer: every node is in exactly one extent
    of G
  • here dataguide = XSet
  • How many nodes does the strong dataguide have for
    this DB ?

20 nodes (least common multiple of 4 and 5)
Dataguides usually fail on data with cyclic
schemas, like