XML Constraints - PowerPoint PPT Presentation

Learn more at: https://cse.buffalo.edu

1
XML Constraints
  • Wenfei Fan
  • University of Edinburgh
  • and
  • Bell Laboratories

2
Outline of Part IV
  • XML specifications: types and integrity
    constraints
  • Specification of XML constraints
  • keys, foreign keys, FDs
  • absolute vs. relative constraints
  • Analysis of XML constraints
  • Consistency analysis
  • Implication analysis
  • Applications of XML constraints, and research
    issues
  • Relational storage of XML data via constraint
    propagation
  • Schema-directed XML integration
  • Normal forms, query optimization, updates, data
    cleaning . . .

3
Introduction to XML specification
  • XML Specification
  • types
  • integrity constraints
  • the need for XML constraints

4
XML data - an example
  • Rooted, node-labeled tree
  • elements: db, province, capital, city;
    subtree/sub-document; elements/subelements, e.g.,
    the capital child of province
  • attributes: @name, @inProvince, carrying text
  • text nodes, with text but no label, e.g.,
    Hasselt

5
XML specification: DTD (type)
  • Production: constrains the subelement list of
    each element, e.g., <!ELEMENT db (province+,
    capital+)>
  • <!ELEMENT province (city, capital)>
  • Attributes: uniquely identified by name for each
    element, unordered
  • province: @name, capital: @inProvince

6
XML specification: integrity constraints
  • Keys and foreign keys (vs. relational
    constraints)
  • key: the value of @name uniquely identifies a
    province
  • province.@name → province
  • capital.@inProvince → capital
  • FK: @inProvince of a capital references @name of
    a province
  • capital.@inProvince ⊆ province.@name

7
XML specification
  • A type (DTD) D
  • A set of integrity constraints, Σ
  • Example
  • DTD D: structure of the document, vs. types in a
    programming language (PL)
  • <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • province: @name, capital: @inProvince
  • Constraints Σ: defined in terms of data values
    across elements
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name

8
Why XML constraints?
  • Supported by W3C XML standard, XML Schema
  • In databases (supported by SQL standard),
    constraints are
  • an essential part of the semantics of data,
  • fundamental to conceptual design,
  • useful for choosing efficient storage and access
    methods,
  • central to update anomaly prevention,
  • data cleaning
  • In the XML setting, constraints have proved
    useful in
  • database storage of XML data (via constraint
    propagation),
  • schema-directed database publishing/integration
    in XML,
  • XML query optimization and formulation,
  • design theory for XML specifications: normal
    forms,
  • data cleaning, ...

9
Data exchange on the Web: XML publishing
  • All members of a community (or industry) agree on
    a schema and exchange data w.r.t. the schema:
    e-commerce, health-care, ...
  • Schema-Directed XML Publishing/Integration
  • mapping data from traditional database to XML
  • satisfying the predefined DTD and constraints

[Figure: XML publishing, with labels Web, XML, Q (XML view), DB1, DB2]

10
Data exchange on the Web: XML shredding
  • XML shredding
  • mapping XML data to relations
  • relational design: normalization via constraint
    propagation from XML to relations
  • optimal relational storage of XML data
  • semantic connection: query/update optimization

[Figure: XML shredding, with labels Web, XML, XML keys, XML shredding, propagation, DB1, DB2, relational FDs]

11
XML constraints
  • Specification of XML constraints
  • keys, foreign keys, FDs
  • absolute vs. relative constraints

12
The limitations of the XML standard (DTD)
  • <!ATTLIST country name ID
    #REQUIRED>
  • <!ATTLIST province capital ID
    #REQUIRED>
  • <!ATTLIST capital inProvince IDREF
    #REQUIRED>
  • Scoping
  • ID: unique within the entire document (like oids),
    while a key needs only to uniquely identify a
    tuple within a relation
  • IDREF: untyped; one has no control over what it
    points to -- you point to something, but you
    don't know what it is!
  • <student id="01" name="Saddam"
    taking="qsx"/>
  • <student id="02" name="Bush"
    taking="qsx 01"/>
  • <course id="qsx"/>

13
The limitations of the XML standard (DTD)
  • keys need to be multi-valued, while IDs must be
    single-valued (unary)
  • enroll (sid: string, cid: string,
    grade: string)
  • a relation may have multiple keys, while an
    element can have at most one ID (primary)
  • ID/IDREF can only be defined in a DTD, while XML
    data may not come with a DTD/schema
  • ID/IDREF, and even relational keys/foreign keys,
    fail to capture the semantics of hierarchical data
    (as will be seen shortly)
  • A mixture of relational keys and object
    identities (oids)
  • Mild extensions of relational constraints do not
    work for XML!

14
Absolute constraints
  • Absolute keys and foreign keys are to hold on the
    entire document.
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name
  • Extensions of relational counterparts

15
Absolute keys and foreign keys (PODS00, 01, JACM)
  • key: τ[X] → τ. An XML document satisfies
    the key iff
  • ∀ x, y ∈ ext(τ) (⋀_{l ∈ X} (x.l = y.l) → x = y)
  • foreign key (FK): a combination of an inclusion
    constraint τ1[X] ⊆ τ2[Y] and a key
    τ2[Y] → τ2.
  • A document satisfies the FK iff it satisfies the
    key and
  • ∀ x ∈ ext(τ1) ∃ y ∈ ext(τ2) (x[X] = y[Y])
  • τ, τ1, τ2: element types; X, Y: sets (lists)
    of attributes
  • ext(τ): the set of τ elements in an XML document.
  • Equality issue
  • (string) value equality when comparing
    attributes
  • node identity when comparing XML elements
  • Unary: keys and foreign keys defined in terms of
    a single attribute.
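
The key and foreign-key semantics on this slide can be checked mechanically on a small document. The following is a minimal sketch, assuming attribute values are compared as strings and that every element carries the attributes in question (as a DTD with required attributes would guarantee). The function names and the sample document are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def ext(root, tau):
    """All elements of type tau in the document (the root included)."""
    return list(root.iter(tau))

def satisfies_key(root, tau, attrs):
    """Absolute key: no two distinct tau elements agree on all attrs."""
    seen = set()
    for node in ext(root, tau):
        value = tuple(node.get(a) for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def satisfies_fk(root, tau1, xs, tau2, ys):
    """Absolute FK: the key on tau2 holds, and every tau1 value on xs
    occurs as a tau2 value on ys (the inclusion constraint)."""
    if not satisfies_key(root, tau2, ys):
        return False
    targets = {tuple(n.get(a) for a in ys) for n in ext(root, tau2)}
    return all(tuple(n.get(a) for a in xs) in targets for n in ext(root, tau1))

# Hypothetical sample document with two capitals naming the same province.
DOC = ET.fromstring("""<db>
  <province name="Limburg"><city/><capital inProvince="Limburg"/></province>
  <capital inProvince="Limburg"/>
</db>""")

print(satisfies_key(DOC, "province", ["name"]))
print(satisfies_key(DOC, "capital", ["inProvince"]))
print(satisfies_fk(DOC, "capital", ["inProvince"], "province", ["name"]))
```

In this sample the province key and the foreign key hold, while the capital key fails because two capital elements share the same inProvince value.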

16
Relative constraints (WWW01, PODS02, SICOMP)
  • An XML tree specifies countries, provinces,
    province capitals.
  • What is a key for a province?
  • What does @inProvince of a capital reference?

[Figure: XML tree rooted at db, with country, province and capital elements; @name and @inProvince values include Belgium, Holland, Limburg, Hasselt and Maastricht]

17
Examples of relative constraints
  • Relative: constraints on a subdocument rooted at
    a country
  • key: country (province.@name →
    province)
  • country (capital.@inProvince → capital)
  • FK: country (capital.@inProvince ⊆
    province.@name)
  • Absolute: on the entire document, country.@name
    → country

[Figure: XML tree rooted at db, with country, province and capital elements; @name and @inProvince values include Belgium, Holland, Limburg, Hasselt and Maastricht]

18
Relative keys and foreign keys
  • key: τ(τ1[X] → τ1). A document satisfies the
    key iff
  • ∀ c ∈ ext(τ) ∀ y, z ∈ ext(τ1)
  • ((y ≺ c) ∧ (z ≺ c) ∧ ⋀_{l ∈ X} (y.l = z.l) →
    y = z)
  • foreign key (FK): τ(τ1[X] ⊆ τ2[Y]) and a key
    τ(τ2[Y] → τ2).
  • A document satisfies the FK iff it satisfies the
    key and
  • ∀ c ∈ ext(τ) ∀ y ∈ ext(τ1) ((y ≺ c) →
  • ∃ z ∈ ext(τ2) ((z ≺ c) ∧ y[X] = z[Y]))
  • where
  • (y ≺ c): y is a descendant of c (y is in the
    subtree rooted at c)
  • τ: context type
  • ext(τ): the set of τ elements in an XML document.
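
To make the scoping concrete, here is a hedged sketch of a relative key check: uniqueness is enforced separately inside each subtree rooted at a context element. The function name and the sample document are assumptions made for illustration; an absolute key is obtained by using the root type as the context.

```python
import xml.etree.ElementTree as ET

def satisfies_relative_key(root, context, tau, attrs):
    """Relative key: within each context element c, no two tau
    descendants of c agree on all attributes in attrs."""
    for c in root.iter(context):
        seen = set()
        for node in c.iter(tau):
            if node is c:
                continue  # only proper descendants of c count
            value = tuple(node.get(a) for a in attrs)
            if value in seen:
                return False
            seen.add(value)
    return True

# Hypothetical document: two countries, each with a province named Limburg.
DOC = ET.fromstring("""<db>
  <country name="Belgium">
    <province name="Limburg"><capital name="Hasselt" inProvince="Limburg"/></province>
  </country>
  <country name="Holland">
    <province name="Limburg"><capital name="Maastricht" inProvince="Limburg"/></province>
  </country>
</db>""")

# Province names are unique within each country ...
print(satisfies_relative_key(DOC, "country", "province", ["name"]))
# ... but not across the whole document (context = root type db).
print(satisfies_relative_key(DOC, "db", "province", ["name"]))
```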

19
Relative vs. Absolute
  • Absolute constraints are a special case of
    relative ones
  • country.@name → country ⇔ db (country.@name
    → country)
  • absolute: a fixed context type -- the root type
    r
  • Absolute constraints are scoped within the entire
    document, whereas relative ones are scoped within
    the context of a subdocument.
  • country (province.@name → province)
  • country (capital.@inProvince → capital)
  • country (capital.@inProvince ⊆
    province.@name)
  • country.@name → country
  • Together they specify constraints on the entire
    document
  • Beyond relational constraints: important for
    hierarchically structured data: XML, scientific
    databases, biomedical data, ...

20
Define keys with path expressions
  • XML data is hierarchically structured!
  • name as a key for employees of companies only:
    the target set is identified with a path
    expression //company//employee
  • XML data is semistructured: it may not have a
    DTD/schema!
  • key paths may be missing or have multiple
    occurrences
  • key specification should be independent of types

[Figure: example tree with name, @id, firstName and lastName nodes]

21
Path expressions
  • Path expression: navigating XML trees
  • A simple yet powerful path language
  • q ::= ε | l | q/q | //
  • ε: empty path
  • l: tag
  • q/q: concatenation
  • //: descendants and self -- recursively
    descending downward
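
A small evaluator shows how this path language navigates a tree. This is a sketch under assumptions: a path is encoded as a Python list of steps, where a tag name moves to children with that tag and the string "//" means descendant-or-self; the empty list stands for the empty path. The encoding and the sample document are illustrative, not taken from the slides.

```python
import xml.etree.ElementTree as ET

def evaluate(path, start):
    """Nodes reached from `start` by following `path` (a list of steps)."""
    current = [start]
    for step in path:
        reached, seen = [], set()
        for node in current:
            # "//" yields the node itself and all its descendants;
            # a tag yields the direct children carrying that tag.
            candidates = node.iter() if step == "//" else node.findall(step)
            for cand in candidates:
                if id(cand) not in seen:
                    seen.add(id(cand))
                    reached.append(cand)
        current = reached
    return current

DOC = ET.fromstring(
    '<db><company><dept><employee id="1"/></dept><employee id="2"/></company>'
    '<employee id="3"/></db>'
)

# //company//employee, evaluated from the root
hits = evaluate(["//", "company", "//", "employee"], DOC)
print(sorted(e.get("id") for e in hits))
```

Only employees below a company are selected; the employee directly under db is not reached.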

22
Absolute path constraints (WWW01)
  • Absolute key: (Q, {P1, . . ., Pk})
  • Path expressions Q, Pi: XPath, regular path
    expressions, ...
  • target path Q: to identify a target set [[Q]] of
    nodes on which the key is defined (vs. relation)
  • a set of key paths {P1, . . ., Pk}: to provide
    an identification for nodes in [[Q]] (vs. key
    attributes)
  • semantics: for any two nodes in [[Q]], if they
    have all the key paths and agree on them by value
    equality (existential), then they must be the
    same node (value equality and node identity)
  • Examples
  • (//company//employees, {name, phone}) --
    composite key
  • (//company//employees, {//@id}) --
    multiple keys
  • (//., {@id})
    -- capturing ID attributes in DTDs

23
Value equality on trees
  • Two nodes are value equal iff
  • either they are text nodes (PCDATA) with the same
    value
  • or they are attributes with the same tag and the
    same value
  • or they are elements having the same tag and
    their children are pairwise value equal
  • E.g. two value-equal names

...
24
Capturing the semistructured nature
  • independent of types
  • no structural requirement: tolerating
    missing/multiple paths
  • (person, {name}), (person, {name, @phone})

25
Relative path constraints (WWW01)
  • Relative key: (Q, K)
  • path Q identifies a set [[Q]] of nodes, called
    the context path
  • K = (Q', {P1, . . ., Pk}) is a key on
    sub-documents rooted at nodes in [[Q]] (relative
    to Q).
  • Example: (//country, (province, {@capital}))
  • (//country, {@name}) -- absolute key
  • Absolute keys are a special case of relative
    keys:
  • (Q, K) when Q is the empty path
  • Similarly for foreign keys
  • Specification of XML constraints is more involved
    than its relational counterparts

26
Keys and foreign keys in XML Schema
  • key: (Q, {P1, . . ., Pk})
  • Path expressions Q, Pi: fragments of XPath
  • Uniqueness and existence: for each node x in
    [[Q]] and each i in [1, k], there exists a unique
    node yi reached via Pi, and yi is either a text
    node or an attribute
  • Foreign keys: (Q, {P1, . . ., Pk}) ⊆ (S,
    {S1, . . ., Sk})
  • (S, {S1, . . ., Sk}) is a key
  • Uniqueness and existence: both Pi and Si
  • The uniqueness and existence condition
    complicates the consistency and implication
    analyses
  • Absolute constraint

27
Other constraints for XML
  • Functional dependencies: P1, . . ., Pk →
    S1, . . ., Sk
  • Generalizations of relational FDs: for deriving
    an extension of relational-schema normal forms
  • Absolute constraints (Arenas and Libkin, PODS02)
  • XICs: ∀ x1 … xn (B(x1, …, xn) →
  • ⋁_{i ∈ [1, l]} (∃ y1 … yk
    Ci(x1, …, xn, y1, …, yk)))
  • Generalization of relational embedded constraints
  • B, Ci: conjunction of simple XPath expressions
  • Subsuming relative keys and foreign keys (Deutsch
    and Tannen, KRDB01)

28
Constraint analysis
  • Analysis of XML constraints
  • Consistency analysis
  • Implication analysis
  • Absolute, relative, path-expression constraints

29
Consistency of XML specifications
  • Given: D, a DTD
  • Σ, a set of integrity constraints
    over D
  • Consistency: Is there an XML document that both
    conforms to D and satisfies Σ?
  • One wants to know whether XML specifications make
    sense!
  • Run-time check: attempts to validate documents
    with (D, Σ).
  • This would not tell us whether repeated failures
    are due to a bad specification or problems with
    the documents
  • ⇒ static analysis is desirable

30
An inconsistent specification
  • The specification with D and Σ is inconsistent!
  • DTD D
  • <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • province: @name, capital: @inProvince
  • Constraints Σ
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name
  • In contrast, one can specify keys and foreign
    keys in SQL without worrying about their
    consistency with schema.

31
Cardinality constraints by keys, foreign keys
  • Constraints Σ
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name
  • Notation
  • ext(τ): the set of τ elements in an XML document
  • ext(τ.l): the set of l attribute values of all τ
    elements
  • ⇒
  • |ext(province.@name)| =
    |ext(province)|
  • |ext(capital.@inProvince)| = |ext(capital)|
  • |ext(capital.@inProvince)| ≤
    |ext(province.@name)|
  • ⇒ |ext(capital)| ≤ |ext(province)|

32
Cardinality constraints imposed by DTDs
  • DTD D: <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city,
    capital)>
  • Variables
  • Xprovince: the number of province elements under
    the root
  • Xcapital: the number of capital subelements of
    the root
  • Ycapital: the number of capital subelements of
    provinces
  • ⇒
  • Xprovince ≥ 1, Xcapital ≥ 1
  • |ext(province)| = Xprovince,
    Xprovince = Ycapital
  • |ext(capital)| = Xcapital + Ycapital
  • ⇒
  • |ext(capital)| > |ext(province)|

33
The interaction
  • Contradiction
  • From the constraints Σ: |ext(capital)| ≤
    |ext(province)|
  • From the DTD D: |ext(capital)| >
    |ext(province)|
  • Thus there exists NO XML document that both
    conforms to D and satisfies Σ.
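
The contradiction can be illustrated by searching for integer counts that satisfy both families of cardinality constraints. This is only a sketch: it assumes at least one province and one capital under the root and exactly one capital per province, and a bounded search is an illustration rather than a proof. The variable names mirror the counting variables of the previous slides; the function name is an assumption.

```python
from itertools import product

def consistent_assignments(bound):
    """All (Xprovince, Xcapital) up to `bound` that satisfy both the
    DTD-induced and the constraint-induced cardinality requirements."""
    solutions = []
    for x_province, x_capital in product(range(1, bound + 1), repeat=2):
        y_capital = x_province              # one capital inside each province
        ext_province = x_province
        ext_capital = x_capital + y_capital
        # Keys plus the foreign key force |ext(capital)| <= |ext(province)|.
        if ext_capital <= ext_province:
            solutions.append((x_province, x_capital))
    return solutions

print(consistent_assignments(50))
```

The search finds no assignment, as expected: Xcapital + Xprovince ≤ Xprovince would require Xcapital ≤ 0, while the DTD demands Xcapital ≥ 1.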

34
Consistency analysis (PODS01, 02, JACM, SICOMP)
  • Trivial for relational databases: given any
    schema and keys, foreign keys, one can always
    find a nonempty instance of the schema satisfying
    the constraints.
  • Hard for XML: XML specifications may not be
    consistent!
  • Both DTDs and constraints impose cardinality
    constraints
  • The interaction between these two classes of
    cardinality constraints is rather complicated.

35
Consistency analysis of XML constraints
  • Theorem: The consistency problem is
  • undecidable for multi-attribute absolute keys and
    foreign keys
  • NP-complete for unary absolute keys and foreign
    keys, even for primary keys (primary: at most one
    key for each element type)
  • in NEXPTIME for primary multi-attribute absolute
    keys and unary foreign keys
  • in 2NEXPTIME and PSPACE-hard for unary absolute
    regular keys and foreign keys (target path β/τ,
    where β is a regular path expression and τ an
    element type; key paths: attributes)
  • undecidable for relative keys and foreign keys,
    even when all the constraints are unary and
    primary.
  • As opposed to the trivial analysis of the
    relational counterpart.

36
Proof ideas
  • Multi-attribute constraints: reduction from the
    implication problem for functional and inclusion
    dependencies in RDBs.
  • Unary keys and foreign keys:
  • a nontrivial encoding of DTDs and unary
    constraints in terms of linear integer
    constraints (O(n^2 log n)-time)
  • polynomially equivalent to LIP, linear integer
    programming
  • Multi-attribute primary keys and unary foreign
    keys:
  • polynomially equivalent to the Prequadratic
    Diophantine Problem (PDE): satisfiability of
    linear integer constraints and prequadratic
    constraints of the form x ≤ y · z
  • the precise complexity of PDE, a restriction of
    Hilbert's 10th problem, is open --
    nontrivial.

37
Proof idea for relative constraints
  • Theorem: The consistency problem is undecidable
    for relative keys and foreign keys, even when all
    the constraints are unary and are under the
    primary key restriction.
  • As opposed to the NP complexity of its absolute
    counterpart.
  • Proof idea: reduction from Hilbert's 10th
    problem.
  • Diophantine equation problem
  • P1 (x1, …, xk) = Q1 (x1, …, xk) + c1
  • . . .
  • Pn (x1, …, xk) = Qn (x1, …, xk) + cn

38
More on regular-expression constraints
  • XML data is hierarchically structured
  • define @eid as a key of employees of companies
    and schools
  • define @taughtBy as a foreign key of students
    referencing @eid of school employees.

39
Examples of regular constraints
  • Key: (university._* | company._*).employee.@eid
    →
  • (university._* | company._*).employee
  • FK: _*.student.@taughtBy ⊆
    university._*.employee.@eid
  • _ : wildcard that matches any label
  • _* : the Kleene closure of _

40
Regular path expression
  • Vertical regular expressions
  • β ::= ε | τ | _ | β.β | β|β | β*
  • ε: empty word; τ: element type; _:
    wildcard
  • ".", "|", "*": concatenation, disjunction, Kleene
    star
  • Example: (university._* | company._*).employee
  • university._*.employee
  • nodes(β.τ): the set of τ elements in an XML
    document that are reachable from the root by
    following β
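
One way to experiment with such vertical regular expressions is to match them against the label path of each node using Python's re module. The sketch below makes several assumptions that are not in the slides: the label path starts just below the root, each label is encoded followed by a "/" terminator so that the Kleene star composes cleanly, and the helpers ANY, label and nodes are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

ANY = r"(?:[^/]+/)"  # the wildcard _ : exactly one arbitrary label

def label(name):
    """Regex fragment matching one specific element type."""
    return re.escape(name) + "/"

def nodes(root, pattern):
    """Elements whose label path from below the root fully matches pattern."""
    compiled = re.compile(pattern)
    result = []

    def walk(node, prefix):
        for child in node:
            path = prefix + child.tag + "/"
            if compiled.fullmatch(path):
                result.append(child)
            walk(child, path)

    walk(root, "")
    return result

DOC = ET.fromstring("""<db>
  <university><school><employee eid="e1"/></school></university>
  <company><employee eid="e2"/></company>
  <club><employee eid="e3"/></club>
</db>""")

# (university._* | company._*).employee
PATTERN = ("(?:" + label("university") + ANY + "*|"
           + label("company") + ANY + "*)" + label("employee"))
print([n.get("eid") for n in nodes(DOC, PATTERN)])

# _*.employee reaches every employee element
print(len(nodes(DOC, ANY + "*" + label("employee"))))
```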

41
Regular expression constraints
  • key: β.τ[X] → β.τ. A document satisfies
    the key iff
  • ∀ x, y ∈ nodes(β.τ) (⋀_{l ∈ X} (x.l = y.l)
    → x = y)
  • foreign key: β1.τ1[X] ⊆ β2.τ2[Y], and a key
    β2.τ2[Y] → β2.τ2
  • A document satisfies the FK iff it satisfies
    the key and
  • ∀ x ∈ nodes(β1.τ1) ∃ y ∈ nodes(β2.τ2)
    (x[X] = y[Y])
  • where nodes(β.τ): the set of τ elements reachable
    from the root by following β.

42
Regular: an extension of absolute constraints
  • Example
  • Key: (university._* | company._*).employee.@eid
    →
  • (university._* | company._*).employee
  • FK: _*.student.@taughtBy ⊆
    university._*.employee.@eid
  • Observation: nodes(_*.τ) = ext(τ)
  • Recall absolute constraints
  • key: τ[X] → τ ⇔ _*.τ[X] → _*.τ
  • foreign key: τ1[X] ⊆ τ2[Y], τ2[Y] → τ2
    ⇔
  • _*.τ1[X] ⊆ _*.τ2[Y],
    _*.τ2[Y] → _*.τ2

43
Consistency analysis of regular constraints
  • Corollary: The consistency problem is undecidable
    for multi-attribute regular keys and foreign
    keys.
  • Theorem: It is decidable in 2NEXPTIME and is
    PSPACE-hard for unary regular constraints.
  • 2NEXPTIME: an involved encoding in terms of LIP
  • regular expressions in a DTD interact with
    (vertical) regular path expressions: reduce the
    DTD to a simple normal form
  • regular path expressions interact with each
    other: introduce exponentially many variables for
    all boolean combinations
  • encoding reachability (nodes(β.τ)) of a path
    expression: tag variables with states of finite
    state automata

44
Some tractable cases
  • Restrictions on constraints.
  • Theorem: For multi-attribute relative keys only,
    the consistency problem is in linear time for
    arbitrary DTDs.
  • Recall relative keys: country (province.@name
    → province)
  • In contrast, due to the existence and uniqueness
    condition:
  • Theorem: It is intractable for unary keys alone
    in XML Schema.
  • Restrictions on DTDs
  • Theorem: When the DTD is fixed, the consistency
    problem is in PTIME for absolute unary keys and
    foreign keys.
  • In practice, a DTD is designed at one time, but
    constraints are written in stages: constraints
    are incrementally added.

45
Implication analysis (PODS00, 01, 02, DBPL01)
  • Given: D, a DTD
  • Σ, a set of constraints expressed in
    C
  • φ, a property (a constraint of C)
  • Implication(C): Is it the case that for any
    XML document, if it conforms to D and satisfies
    Σ, then it must satisfy φ?
  • C: a constraint language
  • The need for studying implication:
  • data integration: constraint checking at virtual
    views
  • optimization of XML queries and XML relational
    storage
  • design theory for XML specifications:
    normalization

46
Some complexity results for implication analysis
  • Theorem: The implication problem is
  • undecidable for multi-attribute absolute keys
    and foreign keys, and for unary relative keys and
    foreign keys
  • PSPACE-hard for unary regular absolute keys and
    foreign keys
  • coNP-complete for unary absolute keys and foreign
    keys.
  • coNP-hard for XML-Schema unary keys
  • in linear time for absolute multi-attribute keys
  • in PTIME for arbitrary absolute keys and foreign
    keys when the DTD is fixed, and
  • in PTIME for relative path keys in the absence of
    DTDs
  • The analysis of XML constraints is far more
    intricate than its relational counterpart

47
Applications
  • Application of XML constraints, and open problems
  • Constraint propagation
  • Schema-directed XML integration
  • Normal form
  • Query rewriting/optimization
  • Update processing
  • Data cleaning
  • . . .

48
XML shredding: relational storage of XML data
  • XML shredding
  • mapping XML data to relations
  • relational design: normalization
  • optimal relational storage of XML data
  • semantic connection: query/update optimization

[Figure: XML shredding, with labels Web, XML, XML keys, XML shredding, propagation, DB1, DB2, relational FDs]

49
Example: XML constraints
  • (//book, {isbn}) -- isbn is an (absolute)
    key of book
  • (//book, (chapter, {number})) -- number is a
    key of chapter relative to book
  • (//book, (title, {})) -- each book has a
    unique title

50
Mapping from XML to a predefined relation
  • Predefined RDB: chapter(bookTitle, chapterNum,
    chapterTitle)
  • Mapping: for each book, extract its title, and
    the numbers and titles of all its chapters
  • Predefined relational key: (bookTitle,
    chapterNum)
  • Can the XML data be mapped to the RDB without
    violating the key?

51
A safe mapping
  • Now change the relational schema to
  • RDB: chapter(isbn, chapterNum, chapterTitle)
  • The relation can be populated without any
    violation. Why?
  • The relational key (isbn, chapterNum) for
    chapter is implied (entailed) by the keys on the
    original XML data:
  • (//book, {isbn}), (//book, (chapter,
    {number})), (//book, (title, {}))
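
The difference between the two target schemas can be seen by shredding a small document both ways. This is a minimal sketch with invented sample data; it assumes isbn and number are attributes and title is a subelement, which the slides do not specify, and the helper names shred and key_holds are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two different books that happen to share a title.
DOC = ET.fromstring("""<lib>
  <book isbn="111"><title>XML</title>
    <chapter number="1"><title>Intro</title></chapter>
    <chapter number="2"><title>Keys</title></chapter>
  </book>
  <book isbn="222"><title>XML</title>
    <chapter number="1"><title>Basics</title></chapter>
  </book>
</lib>""")

def shred(root, book_column):
    """One row per chapter: (book column, chapterNum, chapterTitle)."""
    rows = []
    for book in root.iter("book"):
        for chapter in book.findall("chapter"):
            rows.append((book_column(book),
                         chapter.get("number"),
                         chapter.findtext("title")))
    return rows

def key_holds(rows, key_width=2):
    """True if the first key_width columns identify each row."""
    keys = [row[:key_width] for row in rows]
    return len(keys) == len(set(keys))

by_title = shred(DOC, lambda b: b.findtext("title"))
by_isbn = shred(DOC, lambda b: b.get("isbn"))
print(key_holds(by_title))  # (bookTitle, chapterNum) as key
print(key_holds(by_isbn))   # (isbn, chapterNum) as key
```

The XML keys say that isbn identifies a book, but nothing prevents two books from sharing a title, so only the isbn-based relational key is guaranteed to hold.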

52
Constraint Propagation (ICDE03, JCSS)
  • Input:
  • a set K of XML keys (context and target paths: a
    fragment of XPath; key paths: attributes)
  • a predefined relational schema S,
  • a mapping f from XML to S (XPath, projection,
    join, union)
  • and a relational functional dependency FD over S
  • Output: is the FD propagated from K via f?
    I.e., does FD hold over the DB f(T) for any XML
    document T that satisfies K?
  • Theorem: The constraint propagation problem is in
    PTIME.
  • Checking the consistency of a predefined
    relational schema for storing XML data
  • XML schema/DTD is not required: K is the only
    semantics

53
Deriving relational schema for storing XML
  • One wants to find a good relational schema to
    store
  • chapter(isbn, bookTitle, author, chapterNum,
    chapterTitle)
  • What is a good schema? In normal form: BCNF, 3NF, ...
  • Prevent update anomaly (the relational theory)
  • Efficient storage, query optimization
  • But how to find a normalized design?

54
Constraint propagation and normalization
  • From the given XML keys
  • (//book, {isbn}), (//book, (chapter,
    {number})), (//book, (title, {}))
  • one can derive functional dependencies
  • isbn → bookTitle; isbn, chapterNum →
    chapterTitle
  • Normalize the relation by using these functional
    dependencies
  • chapter(isbn, bookTitle, author, chapterNum,
    chapterTitle)
  • book(isbn, bookTitle),
  • chapter(isbn, chapterNum, chapterTitle),
  • author(isbn, author)
  • The new schema is in BCNF!
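
The BCNF claim can be checked with the textbook attribute-closure test. The sketch below is a generic relational check, not the propagation algorithm from the cited papers; it enumerates attribute subsets, which is only reasonable for small schemas such as these. Function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(schema, fds):
    """A schema is in BCNF if every attribute set that determines anything
    new inside the schema determines the whole schema."""
    schema = set(schema)
    for size in range(1, len(schema) + 1):
        for subset in combinations(sorted(schema), size):
            determined = closure(subset, fds) & schema
            if determined != set(subset) and determined != schema:
                return False
    return True

FDS = [
    (("isbn",), ("bookTitle",)),
    (("isbn", "chapterNum"), ("chapterTitle",)),
]

universal = ["isbn", "bookTitle", "author", "chapterNum", "chapterTitle"]
print(is_bcnf(universal, FDS))
for relation in (["isbn", "bookTitle"],
                 ["isbn", "chapterNum", "chapterTitle"],
                 ["isbn", "author"]):
    print(relation, is_bcnf(relation, FDS))
```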

55
Computing minimum cover of propagated FDs
  • Input: a set K of XML keys, and a mapping f
    from XML to a universal schema U
  • Output: a minimum cover F of all the functional
    dependencies (FDs) propagated from the XML keys K
    via f
  • F is a cover (a set of FDs): any FD propagated
    from K via f is implied by F
  • F is minimum: F contains no redundant FDs, i.e.,
    no FD in F is entailed by the other FDs in F.
  • Theorem: There is a PTIME algorithm for computing
    a minimum cover of propagated FDs.
  • Normalize relational schema for storing/querying
    XML data!
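
The notion of "no redundant FDs" can be illustrated on an already given set of FDs. The sketch below is the standard relational redundancy-elimination loop, not the PTIME propagation algorithm the theorem refers to; it assumes the propagated FDs have been computed elsewhere, and the third FD in the sample is invented to show a redundant dependency.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonredundant_cover(fds):
    """Drop every FD that is entailed by the remaining ones."""
    cover = list(fds)
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if set(rhs) <= closure(lhs, rest):
            cover = rest      # implied by the others, so remove it
        else:
            i += 1
    return cover

FDS = [
    (("isbn",), ("bookTitle",)),
    (("isbn", "chapterNum"), ("chapterTitle",)),
    (("isbn", "chapterNum"), ("bookTitle",)),   # follows from the first FD
]
print(nonredundant_cover(FDS))
```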

56
Research issues
  • For general constraints/mapping languages:
    undecidable
  • if the mapping language is relationally complete
    (selection, projection, join, union, difference),
    even for XML keys alone
  • if both XML keys and foreign keys are considered,
    even for the identity transformation
  • Open
  • To identify (a) practical mapping languages and
    (b) practical XML constraints that allow
    efficient constraint propagation
  • Constraint propagation from relations to XML
  • Information preserving (lossless) data exchange
  • Query/update rewriting/optimization
  • Overcoming incompleteness of source data (foreign
    keys)

57
XML publishing/integration
  • All members of a community (or industry) agree on
    a schema and exchange data w.r.t. the schema:
    e-commerce, health-care, ...
  • Schema-directed XML Publishing/Integration
  • mapping data from traditional database to XML
  • satisfying the predefined DTD and constraints

[Figure: XML publishing, with labels Web, XML, Q (XML view), DB1, DB2]

58
Schema-directed integration (SIGMOD03)
[Figure: integration of multiple, distributed sources (DB) under a DTD and constraints]
  • Schema-directed: XML view conforming to a schema
    (D, Σ)
  • D: a DTD
  • Σ: a set of XML constraints (relative keys,
    foreign keys)
  • Attribute Integration Grammar (AIG)
  • DTD-directed view definition: recursive,
    nondeterministic
  • Inherited and synthesized attributes
  • Constraint compilation: automatically captures
    integrity constraints and DTD in a uniform
    framework

59
XML normal forms
  • 3NF, BCNF?
  • Extensions of (nested) relational normal forms,
    via XML FDs
  • M. Arenas and L. Libkin. A Normal Form for XML
    Documents. PODS 02. XNFs, decomposition
    algorithms, complexity, ...
  • M. Vincent, J. Liu and C. Liu. Strong functional
    dependencies and their application to normal
    forms in XML. TODS 29(3), 2004
  • X. Wu, T.W. Ling, S. Lee, M. Lee, G. Dobbie.
    NF-SS: A Normal Form for Semistructured Schema.
    ER (Workshops) 2001

60
Research issues for XML normal forms
  • Implication analysis: more intriguing than for
    relational FDs
  • Relative functional dependencies: hierarchical
    nature of XML
  • Right normal form for XML: to prevent update
    anomalies?
  • XML data is often static: update anomalies?
  • XML data is typically stored in RDBMS
  • When XML data is updated, it is done through
    RDBMS
  • Redundancy often helps, e.g., performance and
    reliability
  • Normal form: a right class of constraints to
    assure lossless shredding into relations of
    certain normal form
  • Unfortunately, no previous work has studied this

61
Run-time analysis: incremental constraint
checking
  • Input: XML tree T, constraints Σ, update ΔT,
    where T satisfies Σ
  • Question: does (T + ΔT) satisfy Σ?
  • ΔX. Code generator: incremental checking. Lucent
    applications
  • M. Benedikt, G. Brun, J. Gibson, R. Kuss and A.
    Ng. Automated update management for XML integrity
    constraints. PLANX02
  • Application of incremental techniques for
    attribute grammar
  • M. Abrao, B. Bouchou, M. Alves, D. Laurent, M.
    Musicante. Incremental Constraint Checking for
    XML Documents. XSym04
  • Research issues
  • Complexity of incremental constraint checking
  • XML editors: broken link detection and repair
  • Incremental checking techniques for XML data
    stored in RDBMS
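
A simple way to see why incremental checking helps: if the key values of the current tree are kept in an index, an insertion only needs to inspect the inserted subtree rather than revalidate the whole document. The sketch below is an illustrative assumption for an absolute attribute key and insert-only updates; it is not the technique of the cited systems, and the class and method names are hypothetical.

```python
import xml.etree.ElementTree as ET

class KeyIndex:
    """Maintains the key values of all tau elements for an absolute key."""

    def __init__(self, root, tau, attrs):
        self.tau = tau
        self.attrs = tuple(attrs)
        self.values = set()
        for node in root.iter(tau):
            value = self._value(node)
            if value in self.values:
                raise ValueError("initial tree violates the key")
            self.values.add(value)

    def _value(self, node):
        return tuple(node.get(a) for a in self.attrs)

    def try_insert(self, parent, subtree):
        """Attach subtree under parent only if the key still holds;
        the check touches the new subtree and the index, not the old tree."""
        new_values = [self._value(n) for n in subtree.iter(self.tau)]
        if len(set(new_values)) != len(new_values):
            return False
        if any(v in self.values for v in new_values):
            return False
        parent.append(subtree)
        self.values.update(new_values)
        return True

root = ET.fromstring('<db><province name="Limburg"/></db>')
index = KeyIndex(root, "province", ["name"])
rejected = index.try_insert(root, ET.Element("province", {"name": "Limburg"}))
accepted = index.try_insert(root, ET.Element("province", {"name": "Namur"}))
print(rejected, accepted, len(root))
```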

62
Query rewriting and optimization
  • Query translation from XQuery to SQL: XML data
    stored in RDBMS
  • encode XICs and XQuery in relational queries and
    constraints
  • extensions of chase and backchase
  • A. Deutsch and V. Tannen
  • Reformulation of XML Queries and Constraints.
    ICDT03
  • MARS: A System for Publishing XML from Mixed and
    Redundant Storage. VLDB03
  • R. Krishnamurthy, R. Kaushik, J. Naughton.
    Efficient XML-to-SQL Query Translation: Where to
    Add the Intelligence? VLDB 2004
  • Research issues
  • Rewriting queries over (recursive security) views
    of XML data
  • Query optimization for (compressed) XML data in
    native store

63
Data cleaning
  • Input: XML tree T, constraints Σ, DTD D
  • Question: if T does not satisfy D and Σ, find a
    repair T' such that (a) T' satisfies D and Σ, and
    (b) the distance between T and T' is minimal
    (update operations: insert, delete, modify)
  • G. Flesca, F. Furfaro, S. Greco, E. Zumpano.
    Repairs and Consistent Answers for XML Data with
    Functional Dependencies. XSym03
  • Research issues
  • Effective techniques for repairing integrated XML
    data: conflicts and inconsistencies may emerge as
    violations of constraints.
  • Various constraint languages,
  • XML schema
  • Automated tools for repairing Web pages: broken
    links

64
Summary
  • Specification of XML constraints
  • absolute vs. relative, path constraints: XML data
    is hierarchical and semi-structured
  • mild extensions of relational constraints are not
    sufficient
  • Consistency and implication analysis of XML
    constraints
  • DTDs interact with XML constraints
  • far more intricate than their relational
    counterparts
  • Applications of XML constraints
  • XML storage, query, update, integration,
    cleaning,
  • many practical issues remain to be explored

65
References
  • In addition to the papers mentioned earlier:
  • P. Buneman, S. Davidson, W. Fan, C. Hara, W.
    Tan. Keys for XML. Computer Networks, 39(5),
    pp. 473-487, August 2002.
  • Wenfei Fan and Leonid Libkin. On XML Integrity
    Constraints in the Presence of DTDs. Journal of
    the ACM (JACM), 49(3), pp. 368-406, May 2002.
  • Marcelo Arenas, Wenfei Fan and Leonid Libkin. On
    Verifying Consistency of XML Specifications.
    PODS 2002.
  • Marcelo Arenas, Wenfei Fan and Leonid Libkin.
    What's Hard about XML Schema Constraints?
    DEXA 2002.

66
References
  • Susan Davidson, Wenfei Fan, and Carmem Hara.
    Propagating XML Constraints to Relations.
    JCSS, 73(3), pp. 316-361, May 2007.
  • M. Benedikt, C. Chan, W. Fan, J. Freire, and R.
    Rastogi. Capturing both Types and Constraints in
    Data Integration. SIGMOD, 2003.
  • Wenfei Fan. XML Constraints: Specification,
    Analysis, and Applications. LAAIC, 2005.
  • Alin Deutsch, Val Tannen. Containment and
    Integrity Constraints for XPath. KRDB 2001.