On Verifying Consistency of XML Specifications

1
On Verifying Consistency of XML Specifications
  • Wenfei Fan
  • Internet Management Research Dept., Bell Labs
  • Dept. of CIS, Temple University

2
Overview
  • XML Specifications
  • types: DTDs (Document Type Definitions)
  • integrity constraints: keys and foreign keys
  • Interaction between DTDs and constraints
  • Consistency analysis of XML specifications
  • absolute constraints
  • relative constraints
  • regular constraints
  • Implication analysis of XML constraints
  • Joint work with L. Libkin and M. Arenas, Univ. of
    Toronto (PODS'01, PODS'02, JACM)

3
  • 1. XML specifications introduction

4
XML data - an example
  • Rooted, node-labeled tree
  • elements: db, province, capital, city;
    subtrees/sub-documents
  • subelements, e.g., the capital child of province
  • @attributes: @name, @inProvince, carrying text
  • text nodes, e.g., Hasselt

5
XML specifications with DTDs
  • Production: constrains the subelement list of
    each element
  • <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • Attributes: uniquely identified by name for each
    element, unordered
  • province: @name, capital: @inProvince

6
Document Type Definition: a formalism
  • DTD D = (E, A, P, R, r)
  • E: a set of element types, e.g., db, province,
    capital, city
  • A: a set of attributes, e.g., @name,
    @inProvince
  • P: element type definitions in terms of regular
    expressions, e.g., db → province+, capital+
  • R: attribute definitions,
  • e.g., province.@name, capital.@inProvince
  • r: the element type of the root, e.g., db.
  • ECFG (extended context-free grammar): nonterminals (E, A),
    productions (P, R), start symbol (r)
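
The tuple D = (E, A, P, R, r) can be made concrete with a small Python sketch. It assumes the example DTD from the previous slide, with the province production taken as transcribed, and it assumes that city and capital have no subelements. The class name `DTD`, the function `conforms`, and the trick of storing each production as a regular expression over space-terminated child labels are illustrative choices, not something the talk prescribes.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class DTD:
    E: set    # element types
    A: set    # attribute names
    P: dict   # element type -> regex over space-terminated child labels
    R: dict   # element type -> set of attributes defined for it
    r: str    # element type of the root


# Assumed example: db -> province+, capital+ ; province -> city, capital.
EXAMPLE = DTD(
    E={"db", "province", "capital", "city"},
    A={"name", "inProvince"},
    P={
        "db": r"(province )+(capital )+",
        "province": r"city capital ",
        "capital": r"",   # assumption: no subelements
        "city": r"",      # assumption: no subelements
    },
    R={"db": set(), "province": {"name"},
       "capital": {"inProvince"}, "city": set()},
    r="db",
)


def conforms(doc: ET.Element, dtd: DTD) -> bool:
    """Return True if the tree rooted at doc conforms to dtd."""
    if doc.tag != dtd.r:
        return False
    stack = [doc]
    while stack:
        node = stack.pop()
        if node.tag not in dtd.E:
            return False
        # Attributes are unordered; they must be exactly those listed in R.
        if set(node.attrib) != dtd.R[node.tag]:
            return False
        # The ordered child labels must match the production for this type.
        children = "".join(child.tag + " " for child in node)
        if re.fullmatch(dtd.P[node.tag], children) is None:
            return False
        stack.extend(node)
    return True
```

A document with one province (containing a city and a capital) followed by a root-level capital conforms; dropping the root-level capital makes `conforms` return False, because the db production requires at least one.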

7
XML specifications with constraints
  • Keys and foreign keys (vs. relational
    constraints)
  • key: the value of @name uniquely identifies a
    province
  • province.@name → province
  • capital.@inProvince → capital
  • FK: @inProvince of a capital references @name of
    a province
  • capital.@inProvince ⊆ province.@name

8
Why keys and foreign keys?
  • Supported by the XML standard, XML Schema, XML
    Data
  • In databases (supported by the SQL standard):
  • essential part of the semantics of data,
  • fundamental to conceptual design,
  • useful for choosing efficient storage and access
    methods,
  • central to update anomaly prevention.
  • In the XML setting, they have proved useful in:
  • database storage of XML data (query and update),
  • database publishing in XML,
  • data integration,
  • XML query optimization and formulation,
  • design theory for XML specifications.

9
XML specification
  • A DTD D
  • A set of keys and foreign keys, Σ
  • Example
  • DTD D: structure of the document
  • <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • province.@name, capital.@inProvince
  • Constraints Σ: fundamental semantics of the data
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name

10
  • 2. Interaction between DTDs and constraints

11
Consistency of XML specifications
  • Given: D, a DTD
  • Σ, a set of keys and foreign keys
    over D
  • Consistency: Is there an XML document that both
    conforms to D and satisfies Σ?
  • One wants to know whether XML specifications make
    sense!
  • Run-time check: attempts to validate documents
    with (D, Σ).
  • This would not tell us whether repeated failures
    are due to a bad specification or problems with
    the documents
  • ⇒ static analysis is a better approach

12
An inconsistent specification
  • The specification with D and Σ is inconsistent!
  • DTD D
  • <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • province.@name, capital.@inProvince
  • Constraints Σ
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name
  • In contrast, one can specify keys and foreign
    keys in SQL without worrying about their
    consistency with schema.

13
Cardinality constraints by keys, foreign keys
  • Constraints Σ
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name
  • Notation
  • ext(τ): the set of τ elements in an XML document
  • ext(τ.l): the set of l attribute values of all τ
    elements
  • ⇒
  • |ext(province.@name)| = |ext(province)|
  • |ext(capital.@inProvince)| = |ext(capital)|
  • |ext(capital.@inProvince)| ≤ |ext(province.@name)|
  • ⇒
  • |ext(capital)| ≤ |ext(province)|

14
Cardinality constraints imposed by DTDs
  • DTD D: <!ELEMENT db (province+, capital+)>
  • <!ELEMENT province (city, capital)>
  • Variables
  • X_province: the number of province elements under
    the root
  • X_capital: the number of capital subelements of
    the root
  • Y_capital: the number of capital subelements of
    provinces
  • ⇒
  • X_province ≥ 1, X_capital ≥ 1
  • |ext(province)| = X_province,
    X_province = Y_capital
  • |ext(capital)| = X_capital + Y_capital
  • ⇒
  • |ext(capital)| > |ext(province)|

15
The interaction
  • Contradiction
  • From the constraints Σ: |ext(capital)| ≤
    |ext(province)|
  • From the DTD D: |ext(capital)| >
    |ext(province)|
  • Thus there exists NO XML document that both
    conforms to D and satisfies Σ.
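
The contradiction can also be checked mechanically. The Python sketch below encodes the two families of cardinality constraints over the three counters (number of provinces under the root, capitals under the root, capitals under provinces) and searches a bounded range for a joint solution. The function names and the `min_root_capitals` parameter are illustrative assumptions, and the bounded search only illustrates the argument; the general claim follows from the algebra, since X_capital + Y_capital ≤ X_province = Y_capital forces X_capital ≤ 0.

```python
from itertools import product


def dtd_ok(x_province, x_capital, y_capital, min_root_capitals=1):
    """Cardinality constraints read off the DTD."""
    return (
        x_province >= 1
        and x_capital >= min_root_capitals
        and x_province == y_capital  # exactly one capital per province
    )


def sigma_ok(x_province, x_capital, y_capital):
    """|ext(capital)| <= |ext(province)|, derived from the keys and the FK."""
    return x_capital + y_capital <= x_province


def witnesses(bound, min_root_capitals=1):
    """All assignments in [0, bound]^3 satisfying both sets of constraints."""
    return [
        v
        for v in product(range(bound + 1), repeat=3)
        if dtd_ok(*v, min_root_capitals=min_root_capitals) and sigma_ok(*v)
    ]
```

With the DTD as given, `witnesses(20)` is empty. Under a hypothetical relaxation in which the root-level capital is optional (`min_root_capitals=0`), solutions such as one province with its one capital appear, which shows that the conflict comes from the required root-level capital.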

16
  • 3. Consistency analysis of XML specifications

17
The consistency problem
  • Given: D, a DTD
  • Σ, a set of keys and foreign keys
    expressed in C
  • Consistency (C): Is there an XML document that
    both conforms to D and satisfies Σ?
  • C: a constraint language, ranges over
  • absolute constraints
  • relative constraints
  • regular constraints
  • These constraint languages are important for
    hierarchically structured data, including but not
    limited to XML.

18
Absolute keys and foreign keys
  • key: τ[X] → τ. A document satisfies the key
    iff
  • ∀ x, y ∈ ext(τ) (⋀_{l ∈ X} (x.l = y.l) → x = y)
  • foreign key (FK): a combination of an inclusion
    constraint τ1[X] ⊆ τ2[Y], and a key
    τ2[Y] → τ2.
  • A document satisfies the FK iff it satisfies the
    key and
  • ∀ x ∈ ext(τ1) ∃ y ∈ ext(τ2) (x[X] = y[Y])
  • where τ, τ1, τ2: element types; X, Y: sets
    (lists) of attributes
  • ext(τ): the set of τ elements in an XML document.
  • Equality issue
  • value equality when comparing attributes
  • node identity when comparing XML elements
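
The two satisfaction conditions translate almost directly into code. The following Python sketch checks an absolute key and an absolute foreign key over a document parsed with the standard `xml.etree.ElementTree` module. The helper names (`ext`, `values`, `satisfies_key`, `satisfies_fk`) are illustrative, attribute tuples are compared by value, and elements are distinct nodes by construction, which mirrors the equality convention above. It assumes the document already conforms to the DTD, so every listed attribute is present.

```python
import xml.etree.ElementTree as ET


def ext(doc, tau):
    """All elements of type tau in the document (the root included)."""
    return list(doc.iter(tau))


def values(node, attrs):
    """The tuple of attribute values node[attrs]; None if an attribute is absent."""
    return tuple(node.get(a) for a in attrs)


def satisfies_key(doc, tau, X):
    """tau[X] -> tau: no two distinct tau elements agree on all attributes in X."""
    seen = set()
    for node in ext(doc, tau):
        v = values(node, X)
        if v in seen:
            return False
        seen.add(v)
    return True


def satisfies_fk(doc, tau1, X, tau2, Y):
    """tau1[X] ⊆ tau2[Y] together with the key tau2[Y] -> tau2."""
    if not satisfies_key(doc, tau2, Y):
        return False
    targets = {values(n, Y) for n in ext(doc, tau2)}
    return all(values(n, X) in targets for n in ext(doc, tau1))
```

On a document with one province named Limburg, one capital inside it, and one capital under the root, both capitals must reference Limburg to satisfy the foreign key, and then the key capital.@inProvince → capital fails.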

19
More on absolute constraints
  • Absolute constraints are to hold on the entire
    document.
  • Unary constraints: keys and foreign keys defined
    in terms of a single attribute.
  • Example of unary constraints
  • province.@name → province
  • capital.@inProvince → capital
  • capital.@inProvince ⊆ province.@name

20
Consistency analysis
  • Trivial for relational databases: given any
    schema and keys, foreign keys, one can always
    find a nonempty instance of the schema satisfying
    the constraints.
  • Hard for XML: XML specifications may not be
    consistent!
  • Both DTDs and constraints impose cardinality
    constraints
  • The interaction between these two classes of
    cardinality constraints is rather complicated.

21
Consistency analysis of absolute constraints
  • Theorem: The consistency problem is undecidable
    for multi-attribute keys and foreign keys.
  • Theorem: It becomes NP-complete for unary
    constraints.
  • Primary key restriction: at most one key for each
    element type.
  • Theorem: It remains intractable for unary
    constraints under the primary key restriction.
  • Theorem: For primary multi-attribute keys and
    unary foreign keys, the consistency problem is
    decidable in NEXPTIME.
  • This is in contrast to the trivial analysis of the
    relational counterpart.

22
Proof ideas
  • Multi-attribute constraints: reduction from the
    implication problem for functional and inclusion
    dependencies in RDBs.
  • Unary keys and foreign keys:
  • a nontrivial encoding of DTDs and unary
    constraints in terms of linear integer
    constraints (O(n² log n)-time)
  • polynomially equivalent to LIP, linear integer
    programming
  • Multi-attribute primary keys and unary foreign
    keys:
  • polynomially equivalent to the Prequadratic
    Diophantine Problem (PDE): satisfiability of
    linear integer constraints and prequadratic
    constraints of the form x ≤ y · z
  • the precise complexity of PDE, a restriction of
    Hilbert's 10th problem, is open (and
    nontrivial).

23
Introduction to relative constraints
  • An XML tree specifies countries, provinces,
    province capitals.
  • What is a key for a province?
  • What does @inProvince of a capital reference?

[Figure: XML tree with root db and two country subtrees (@name: Belgium, Holland), each containing province and capital elements; node labels include @name, @inProvince, Limburg, Hasselt, Maastricht]

24
Examples of relative constraints
  • Relative constraints: on a subdocument rooted at
    a country
  • key: country (province.@name →
    province)
  • country (capital.@inProvince → capital)
  • FK: country (capital.@inProvince ⊆
    province.@name)
  • Absolute: on the entire document, country.@name
    → country

[Figure: the same XML tree, with root db and two country subtrees (@name: Belgium, Holland), each containing province and capital elements; node labels include @name, @inProvince, Limburg, Hasselt, Maastricht]

25
Relative keys and foreign keys
  • key: τ(τ1[X] → τ1). A document satisfies the
    key iff
  • ∀ c ∈ ext(τ) ∀ y, z ∈ ext(τ1)
  • ((y ≺ c) ∧ (z ≺ c) ∧ ⋀_{l ∈ X} (y.l = z.l) →
    y = z)
  • foreign key (FK): τ(τ1[X] ⊆ τ2[Y]) and a key
    τ(τ2[Y] → τ2).
  • A document satisfies the FK iff it satisfies the
    key and
  • ∀ c ∈ ext(τ) ∀ y ∈ ext(τ1) ((y ≺ c) →
  • ∃ z ∈ ext(τ2) ((z ≺ c) ∧ y[X] = z[Y]))
  • where
  • (y ≺ c): y is a descendant of c (y in the
    subtree rooted at c)
  • τ: context type
  • ext(τ): the set of τ elements in an XML document.
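
The scoping by a context element can be sketched in Python with the standard `xml.etree.ElementTree` module. The function names are illustrative, and the sketch assumes every listed attribute is present. For each context element c it collects only proper descendants of c, which is the (y ≺ c) condition, and then applies the usual key or inclusion test inside that subtree.

```python
import xml.etree.ElementTree as ET


def descendants(c, tau):
    """Elements of type tau that are proper descendants of c."""
    return [n for n in c.iter(tau) if n is not c]


def satisfies_relative_key(doc, context, tau1, X):
    """context(tau1[X] -> tau1): the key holds inside each context subtree."""
    for c in doc.iter(context):
        seen = set()
        for y in descendants(c, tau1):
            v = tuple(y.get(a) for a in X)
            if v in seen:
                return False
            seen.add(v)
    return True


def satisfies_relative_fk(doc, context, tau1, X, tau2, Y):
    """context(tau1[X] ⊆ tau2[Y]) together with the key context(tau2[Y] -> tau2)."""
    if not satisfies_relative_key(doc, context, tau2, Y):
        return False
    for c in doc.iter(context):
        targets = {tuple(z.get(a) for a in Y) for z in descendants(c, tau2)}
        for y in descendants(c, tau1):
            if tuple(y.get(a) for a in X) not in targets:
                return False
    return True
```

For a document in which Belgium and Holland each contain a province named Limburg, the key on province.@name holds relative to country but fails when the root type db is used as the context, since both Limburg provinces then fall in the same scope.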

26
Relative vs. Absolute
  • Absolute constraints are a special case of
    relative ones
  • country.@name → country ⇔
  • db (country.@name → country)
  • absolute: a fixed context type r
  • Absolute constraints are scoped within the entire
    document whereas relative ones within the
    context of a subdocument.
  • country (province.@name → province)
  • country (capital.@inProvince → capital)
  • country (capital.@inProvince ⊆
    province.@name)
  • country.@name → country
  • Together they specify constraints on the entire
    document
  • Important for hierarchically structured data:
    XML, scientific databases, biomedical data, ...

27
Consistency analysis of relative constraints
  • Theorem: The consistency problem is undecidable
    for relative keys and foreign keys, even when all
    the constraints are unary and are under the
    primary key restriction.
  • This is in contrast to the NP complexity of its
    absolute counterpart.
  • Proof idea: reduction from Hilbert's 10th
    problem.
  • Diophantine equation problem
  • P_1(x_1, ..., x_k) = Q_1(x_1, ..., x_k) + c_1
  • . . .
  • P_n(x_1, ..., x_k) = Q_n(x_1, ..., x_k) + c_n

28
Introduction to regular constraints
  • XML data is hierarchically structured
  • define @eid as a key of employees of companies
    and schools
  • define @taughtBy as a foreign key of students
    referencing @eid of school employees.

29
Examples of regular constraints
  • Key: (university._* ∪ company._*).employee.@eid
    →
  • (university._* ∪ company._*).employee
  • FK: _*.student.@taughtBy ⊆
    university._*.employee.@eid
  • _: wildcard that matches any label
  • _*: the Kleene closure of _

30
Regular path expression
  • Vertical regular expressions
  • β ::= ε | τ | _ | β.β | β ∪ β | β*
  • ε: empty word; τ: element type; _:
    wildcard
  • ., ∪, *: concatenation, disjunction, Kleene
    star
  • Example: (university._* ∪ company._*).employee
  • university._*.employee
  • nodes(β.τ): the set of τ elements in an XML
    document that are reachable from the root by
    following β
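
One way to compute nodes(β.τ) is to compile the vertical expression into an ordinary regular expression over label paths. The Python sketch below does this under several assumptions that are not stated in the talk: labels are alphanumeric, union may be written as `|` or `∪`, and the path of a node consists of the labels strictly below the root down to the node itself. The names `to_regex` and `nodes` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Tokens: element labels, the wildcard "_", and the operators ( ) . * | ∪
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[_().*|∪]")


def to_regex(expr: str) -> str:
    """Translate a vertical regular expression into a regex over 'a/b/c/' paths."""
    parts = []
    for tok in TOKEN.findall(expr):
        if tok == "_":
            parts.append(r"(?:[^/]+/)")      # any single label
        elif tok == ".":
            continue                          # concatenation is implicit
        elif tok in ("|", "∪"):
            parts.append("|")
        elif tok == "(":
            parts.append("(?:")
        elif tok in (")", "*"):
            parts.append(tok)
        else:
            parts.append("(?:" + re.escape(tok) + "/)")
    return "".join(parts)


def nodes(doc, expr):
    """Elements whose label path from (below) the root matches expr."""
    pattern = re.compile(to_regex(expr))
    found = []

    def walk(node, path):
        for child in node:
            child_path = path + child.tag + "/"
            if pattern.fullmatch(child_path):
                found.append(child)
            walk(child, child_path)

    walk(doc, "")
    return found
```

With this reading, `nodes(doc, "(university._*|company._*).employee")` returns the employees nested anywhere under a university or company child of the root, while `nodes(doc, "_*.employee")` returns every employee element below the root.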

31
Regular expression constraints
  • key: β.τ[X] → β.τ. A document satisfies
    the key iff
  • ∀ x, y ∈ nodes(β.τ) (⋀_{l ∈ X} (x.l = y.l)
    → x = y)
  • foreign key: β1.τ1[X] ⊆ β2.τ2[Y], and a key
    β2.τ2[Y] → β2.τ2
  • A document satisfies the FK iff it satisfies
    the key and
  • ∀ x ∈ nodes(β1.τ1) ∃ y ∈ nodes(β2.τ2)
    (x[X] = y[Y])
  • where nodes(β.τ): the set of τ elements reachable
    from the root by following β.

32
Regular: an extension of absolute constraints
  • Example
  • Key: (university._* ∪ company._*).employee.@eid
    →
  • (university._* ∪ company._*).employee
  • FK: _*.student.@taughtBy ⊆
    university._*.employee.@eid
  • Observation: nodes(_*.τ) = ext(τ)
  • Recall absolute constraints
  • key: τ[X] → τ ⇔ _*.τ[X] → _*.τ
  • foreign key: τ1[X] ⊆ τ2[Y], τ2[Y] → τ2
    ⇔
  • _*.τ1[X] ⊆ _*.τ2[Y],
    _*.τ2[Y] → _*.τ2

33
Consistency analysis of regular constraints
  • Corollary: The consistency problem is undecidable
    for multi-attribute regular keys and foreign
    keys.
  • Theorem: It is decidable in NEXPTIME and is
    PSPACE-hard for unary regular constraints.
  • NEXPTIME: an involved encoding in terms of LIP
  • regular expressions in a DTD interact with
    (vertical) regular path expressions: reduce the DTD
    to a simple normal form
  • regular path expressions interact with each
    other: introduce exponentially many variables for
    all Boolean combinations
  • encoding reachability (nodes(β.τ)) of a path
    expression: tag variables with states of finite
    state automata

34
Some tractable cases
  • Restrictions on constraints: unary, primary.
  • Theorem: For multi-attribute relative keys only,
    the consistency problem is in linear time for
    arbitrary DTDs.
  • Recall relative keys: country (province.@name
    → province)
  • Restrictions on DTDs
  • Theorem: When the DTD is fixed, the consistency
    problem is in PTIME for absolute unary
    constraints.
  • In practice, a DTD is designed at one time, but
    constraints are written in stages: constraints
    are incrementally added.

35
Other restricted cases
  • Theorem: In the absence of recursion and Kleene
    star in the DTD involved, the consistency problem
    remains
  • undecidable for multi-attribute absolute
    constraints,
  • intractable for unary absolute constraints,
  • PSPACE-hard for unary regular constraints.
  • Recall: absolute ⊆ regular; unary =
    single-attribute
  • Other severely restricted cases
  • nonrecursive DTD of a bounded depth, a set of
    absolute unary constraints of a bounded size:
    NLOGSPACE
  • nonrecursive DTD, unary relative constraints that
    can be partitioned into sets local to each
    other with respect to the DTD (without
    interaction): PSPACE-complete

36
  • 4. Implication analysis of XML constraints

37
Implication of XML constraints
  • Given: D, a DTD
  • Σ, a set of keys and foreign keys
    expressed in C
  • φ, a property (a key or foreign key of C)
  • Implication (C): Is it the case that for any
    XML document, if it conforms to D and satisfies
    Σ, then it must satisfy φ?
  • C: a constraint language
  • The need for studying implication
  • data integration: constraints checking at virtual
    views
  • optimization of XML queries and XML relational
    storage
  • design theory for XML specifications:
    normalization

38
Some complexity results for implication
  • Proposition: For any class of XML constraints, if
    its consistency problem is K-hard, then its
    implication problem is coK-hard, where K is some
    complexity class that contains DLOGSPACE.
  • Corollary: The implication problem is
  • undecidable for multi-attribute absolute
    constraints and for unary relative constraints
  • PSPACE-hard for unary regular constraints
  • coNP-hard for unary absolute constraints.
  • Recall: relative: country (province.@name →
    province)
  • regular: _*.student.@taughtBy ⊆
    _*.professor.@id
  • absolute: country.@name → country

39
Upper bounds
  • Theorem: The implication problem is
    coNP-complete for unary absolute constraints.
  • Proof idea: a nontrivial encoding in terms of
    LIP and the Set Intersection Pattern Problem
  • Theorem: The implication problem is decidable in
  • linear time for absolute multi-attribute keys,
    and
  • in PTIME for arbitrary absolute constraints when
    the DTD is fixed.

40
Summary
  • XML specification: DTD and constraints (keys,
    foreign keys)
  • Consistency and implication analysis of XML
    constraints
  • DTDs interact with XML constraints
  • The analysis is far more intricate than its
    relational counterpart
  • The negative results carry over to XML Schema,
    XML Data and other more expressive specifications