Foundational Data Modeling and Schema Transformations for XML Data Engineering

1
Foundational Data Modeling and Schema
Transformations for XML Data Engineering
  • Stephen W. Liddle
  • Information Systems Department
  • Reema Al-Kamha & David W. Embley
  • Computer Science Department
  • Brigham Young University, Provo, Utah

2
XML Data Engineering
  • Model XML conceptually
  • Map conceptual models to XML
  • Reverse-engineer XML to conceptual models
  • Ensure properties
    • Information-preserving transformations
    • Constraint-preserving transformations
    • Redundancy-free guarantees

3
C-XML
4
Modeling XML Conceptually
  • Scaling the mountain of abstraction
  • Delicate balance
    • Enough modeling constructs
    • But not too many
  • High-level capture of essentials
  • Avoidance of low-level implementation details
  • Formal but easily understood
  • XML needs better abstractions

5
XML Schema/Model Mismatch
  • XML features not explicitly supported in traditional conceptual models
    • Ordered lists of concepts
    • Choice of concept from among several
    • Mixed content
    • Use of content from another model
    • Nested information hierarchies
  • C-XML

6
Missing Modeling Constructs (1)
  • Sequence structure
    • Parent concept
    • Ordered child concepts
    • Constrained recurrence of children
    • Constrained recurrence of sequence itself

<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
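One way to see what this sequence's occurrence constraints allow is to read the content model as a regular expression over child-element names. The sketch below is a hedged illustration, not part of C-XML: the helpers `to_tokens` and `matches_name_sequence` are hypothetical, and they check only the order and counts of child names, not element types or content.

```python
import re

def to_tokens(names):
    """Encode a list of child-element names as one string, each name ended by ';'."""
    return "".join(name + ";" for name in names)

# (FirstName, MiddleName{0,2}, LastName) repeated 1..2 times, mirroring
# minOccurs/maxOccurs on the xs:sequence and on the MiddleName element.
NAME_SEQUENCE = re.compile(r"(?:FirstName;(?:MiddleName;){0,2}LastName;){1,2}")

def matches_name_sequence(names):
    """Return True if the child names satisfy the sequence's occurrence constraints."""
    return NAME_SEQUENCE.fullmatch(to_tokens(names)) is not None

print(matches_name_sequence(["FirstName", "MiddleName", "LastName"]))
print(matches_name_sequence(["FirstName", "MiddleName", "MiddleName", "MiddleName", "LastName"]))
```

The first call accepts one name with one middle name; the second rejects three middle names because MiddleName has maxOccurs="2".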
7
Missing Modeling Constructs (1)
8
Missing Modeling Constructs (2)
  • Choice structure
    • Parent concept
    • Choose one child concept from several alternatives
    • Constrained recurrence of chosen child
    • Constrained recurrence of choice itself

<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
9
Missing Modeling Constructs (3)
  • Mixed attribute
    • Allows character data and element data to be intertwined
    • <xs:complexType mixed="true">
  • Any and anyAttribute structures
    • Insert structures from other namespaces
    • Constrained recurrence
    • <xs:any namespace="##other" minOccurs="0"/>
    • <xs:anyAttribute namespace="##any"/>
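With `mixed="true"`, character data may appear between child elements. Python's standard `xml.etree.ElementTree` exposes that interleaving through an element's `text` (data before its first child) and each child's `tail` (data after that child). The instance document and the helper `mixed_parts` below are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

def mixed_parts(elem):
    """List the character data and child elements of elem in document order."""
    parts = []
    if elem.text:
        parts.append(("text", elem.text))
    for child in elem:
        parts.append(("element", child.tag))
        if child.tail:
            parts.append(("text", child.tail))
    return parts

# Hypothetical instance of an element declared with <xs:complexType mixed="true">.
note = ET.fromstring("<Note>Call <Name>Ann</Name> after <Time>5pm</Time>.</Note>")
for kind, value in mixed_parts(note):
    print(kind, repr(value))
```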

10
Missing Modeling Constructs (4)
  • Nesting of hierarchical structures
    • Key organizational characteristic of XML
    • Arbitrarily complex nesting possible

11
C-XML Example
12
C-XML to XML Schema
13
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="Semester" use="required"/>
                        <xs:attribute ref="Course" use="required"/>
                        <!-- C-XML forall x (Course(x) => exists 0 <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3))) -->
                        <xs:attribute name="Grade" type="xs:string" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
                <xs:key name="Semester-Course-Grade-Key">
                  <xs:selector xpath="./Semester-Course-Grade"/>
                  <xs:field xpath="@Semester"/>
                  <xs:field xpath="@Course"/>
                  <xs:field xpath="@Grade"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="StudentOID" type="xs:string" use="required"/>
            <xs:attribute name="StudentID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="StudentOID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentOID"/>
    </xs:key>
    <xs:key name="StudentID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentID"/>
    </xs:key>
  </xs:element>
  <xs:element name="Courses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute ref="Course" use="required"/>
            <xs:attribute name="Department" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="Course-Key">
      <xs:selector xpath="./Course"/>
      <xs:field xpath="@Course"/>
    </xs:key>
  </xs:element>
  <xs:element name="GradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="GradStudentOID" type="xs:string" use="required"/>
            <xs:attribute name="Advisor" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="GradStudentOID-Key">
      <xs:selector xpath="./GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:element name="UndergradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="UndergradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="UndergradStudentOID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="UndergradStudentOID-Key">
      <xs:selector xpath="./UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:attribute name="Course" type="xs:string"/>
</xs:schema>
14
Algorithm Overview
  • Generate a forest of scheme trees
  • Translate an individual object set
  • Translate scheme-tree collections of object sets
  • Create a root node
  • Add uniqueness constraints
  • Translate generalization/specialization
    hierarchies

15
Generate Scheme Trees
(Student, StudentID, StudentName, FirstName,
LastName, (MiddleName), (Course, Semester,
Grade))
16
Generate Scheme Trees
(Course, Department)
17
Generate Scheme Trees
(GradStudent, Advisor)
(UndergradStudent)
18
Generate Scheme Trees
(Student, StudentID, StudentName, FirstName,
LastName, (MiddleName), (Course, Semester,
Grade))
(Course, Department)
(GradStudent, Advisor)
(UndergradStudent)
19
Generate Scheme Trees
Student, StudentID, StudentName, FirstName,
LastName
(Student, StudentID, StudentName, FirstName,
LastName, (MiddleName), (Course, Semester,
Grade))
(Course, Department)
(GradStudent, Advisor)
(UndergradStudent)
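The scheme-tree forest can be held in a small recursive data structure. The sketch below only represents and prints the trees shown on these slides in their nested-parenthesis notation; it does not implement the scheme-tree generation algorithm itself, and the names `SchemeTreeNode` and `render` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SchemeTreeNode:
    """A scheme-tree node: object sets stored together plus nested child nodes."""
    object_sets: list[str]
    children: list["SchemeTreeNode"] = field(default_factory=list)

def render(node):
    """Render a scheme tree in the slides' nested-parenthesis notation."""
    items = list(node.object_sets) + [render(child) for child in node.children]
    return "(" + ", ".join(items) + ")"

student_tree = SchemeTreeNode(
    ["Student", "StudentID", "StudentName", "FirstName", "LastName"],
    [SchemeTreeNode(["MiddleName"]), SchemeTreeNode(["Course", "Semester", "Grade"])],
)
forest = [
    student_tree,
    SchemeTreeNode(["Course", "Department"]),
    SchemeTreeNode(["GradStudent", "Advisor"]),
    SchemeTreeNode(["UndergradStudent"]),
]
for tree in forest:
    print(render(tree))
```

Each child node groups object sets that can repeat for a single parent (several middle names, several course/semester/grade triples per student), which is why they nest rather than sit in the parent node.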
20
Individual Object Sets
ltxsattribute name"Department"
type"xsstring"/gt ltxsattribute name"Course"
type"xsstring"/gt ltxsattribute
ref"Course"/gt ltxselement name"FirstName"
type"xsstring"/gt ltxselement name"Student"gt
ltxscomplexTypegt ... ltxsattribute
name"StudentOID" type"xsstring"
use"required"/gt lt/xscomplexTypegt lt/xselement
gt
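These declarations follow two visible patterns: value-like object sets such as Department and FirstName become string-typed attributes or elements, while Student becomes an element carrying a required StudentOID attribute. A string-template sketch of those two patterns, assuming the C-XML distinction between lexical object sets (plain values) and nonlexical ones (objects identified by an OID); the function names and the `as_attribute` switch are hypothetical, since the slides do not state when an attribute is chosen over an element.

```python
def lexical_declaration(name, as_attribute):
    """Declare a lexical object set (plain values) as a string attribute or element.

    as_attribute is a hypothetical switch: the attribute-vs-element rule is not
    given on the slide, so the caller decides.
    """
    kind = "attribute" if as_attribute else "element"
    return f'<xs:{kind} name="{name}" type="xs:string"/>'

def nonlexical_declaration(name, content="..."):
    """Declare a nonlexical object set as an element identified by a required OID attribute."""
    return "\n".join([
        f'<xs:element name="{name}">',
        "  <xs:complexType>",
        f"    {content}",
        f'    <xs:attribute name="{name}OID" type="xs:string" use="required"/>',
        "  </xs:complexType>",
        "</xs:element>",
    ])

print(lexical_declaration("Department", as_attribute=True))
print(lexical_declaration("FirstName", as_attribute=False))
print(nonlexical_declaration("Student"))
```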
21
Scheme-Tree Translation
(Diagram: scheme trees labeled Students, Student, MiddleNames, MiddleName, Course-Semester-Grades, Course-Semester-Grade, Courses, Course, GradStudents, GradStudent, UndergradStudents, UndergradStudent)
22
Scheme-Tree Translation
ltxselement name"Semester-Course-Grades"gt
ltxscomplexTypegt ltxssequencegt
ltxselement name"Semester-Course-Grade"
minOccurs"0" maxOccurs"unbounded"gt
ltxscomplexTypegt ...
lt/xscomplexTypegt lt/xselementgt
lt/xssequencegt lt/xscomplexTypegt
... lt/xselementgt
ltxselement name"Students"gt
ltxscomplexTypegt ltxssequencegt
ltxselement name"Student" maxOccurs"unbounded"
gt ltxscomplexTypegt
... lt/complexTypegt
lt/xselementgt lt/xssequencegt
lt/xscomplexTypegt lt/xselementgt
23
Scheme-Tree Translation
ltxselement name"Semester-Course-Grade"
minOccurs"0" maxOccurs"unbounded"gt
ltxscomplexTypegt ltxsattribute
name"Semester" use"required"/gt
ltxsattribute ref"Course" use"required"/gt
lt!-- C-XML forall x (Course(x)gt
exists 0 ltx1, x2, x3gt (Course(x) Student(x1)
Semester(x2) Grade(x3) )) --gt ltxsattribute
name"Grade" type"xsstring" use"required"/gt
lt/xscomplexTypegt lt/xselementgt
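The translation slides show a recurring pattern: a scheme-tree node becomes a plural container (Students, Semester-Course-Grades) wrapping a repeating element (Student, Semester-Course-Grade), and a nested node is placed inside its parent's content model. A minimal sketch of that nesting with Python's `xml.etree.ElementTree`, assuming a naive append-"s" naming rule and omitting attributes, keys, and constraint-derived bounds; `add_container` is a hypothetical helper, not the authors' translator.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

def xs(tag):
    """Return the namespace-qualified (Clark notation) name of an XML Schema construct."""
    return f"{{{XS}}}{tag}"

def add_container(parent, name, nested):
    """Append a plural container element that holds a repeating singular element.

    Hypothetical simplifications: the container name is the repeated name plus "s",
    nested repeating elements get minOccurs="0", and top-level ones keep the default 1.
    Returns the repeated element's xs:complexType so children can be added to it.
    """
    outer = ET.SubElement(parent, xs("element"), name=name + "s")
    sequence = ET.SubElement(ET.SubElement(outer, xs("complexType")), xs("sequence"))
    inner = ET.SubElement(sequence, xs("element"), name=name, maxOccurs="unbounded")
    if nested:
        inner.set("minOccurs", "0")
    return ET.SubElement(inner, xs("complexType"))

schema = ET.Element(xs("schema"))
student_type = add_container(schema, "Student", nested=False)
# The nested scheme-tree node becomes a container inside Student's own sequence.
student_children = ET.SubElement(student_type, xs("sequence"))
add_container(student_children, "Semester-Course-Grade", nested=True)

ET.indent(schema)  # pretty-printing; requires Python 3.9+
print(ET.tostring(schema, encoding="unicode"))
```

The MiddleNames node would be built the same way, except that the slide's MiddleName bounds (minOccurs="0", maxOccurs="2") come from the conceptual model rather than from the fixed defaults used here.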
24
(No Transcript)
25
Root Element
<xs:schema ...>
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>
(Diagram: root node over the scheme trees Students, Courses, GradStudents, UndergradStudents)
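The root step gathers the top-level containers under a single element. A minimal sketch with `xml.etree.ElementTree`; `root_element` is a hypothetical helper. Using `xs:all` lets the four containers appear in any order, and in XML Schema 1.0 each particle of an `xs:all` group may occur at most once, which fits one container per scheme tree.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def root_element(top_level_names):
    """Build a Root element whose xs:all references each top-level container once."""
    root = ET.Element(f"{{{XS}}}element", name="Root")
    group = ET.SubElement(ET.SubElement(root, f"{{{XS}}}complexType"), f"{{{XS}}}all")
    for name in top_level_names:
        ET.SubElement(group, f"{{{XS}}}element", ref=name)
    return root

root = root_element(["Students", "Courses", "GradStudents", "UndergradStudents"])
ET.register_namespace("xs", XS)
print(ET.tostring(root, encoding="unicode"))
```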
26
Uniqueness Constraints
ltxselement name"Students"gt
ltxscomplexTypegt ltxssequencegt
ltxselement name"Student"
maxOccurs"unbounded"gt
ltxscomplexTypegt ...
lt/xscomplexTypegt lt/xselementgt
lt/xssequencegt lt/xscomplexTypegt
ltxskey name"StudentOID-Key"gt
ltxsselector xpath"./Student"/gt
ltxsfield xpath"_at_StudentOID"/gt lt/xskeygt
ltxskey name"StudentID-Key"gt
ltxsselector xpath"./Student"/gt
ltxsfield xpath"_at_StudentID"/gt lt/xskeygt
lt/xselementgt
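An `xs:key` requires that every node picked by the selector has the field and that the field values are unique within the scope element. A minimal Python emulation for single-attribute keys, using `xml.etree.ElementTree` path expressions in place of XSD's XPath subset; it compares values as plain strings, whereas a schema validator compares typed values. `check_key` is a hypothetical helper, and the instance is an abbreviated example that omits Student's required children.

```python
import xml.etree.ElementTree as ET

def check_key(scope, selector, field_attr):
    """Emulate a single-field xs:key: the field is present and unique on every selected node.

    Returns the set of key values, or raises ValueError on a violation.
    """
    seen = set()
    for node in scope.findall(selector):
        value = node.get(field_attr)
        if value is None:
            raise ValueError(f"missing @{field_attr} on {node.tag}")
        if value in seen:
            raise ValueError(f"duplicate @{field_attr} value {value!r}")
        seen.add(value)
    return seen

students = ET.fromstring(
    "<Students>"
    '<Student StudentOID="s1" StudentID="100"/>'
    '<Student StudentOID="s2" StudentID="200"/>'
    "</Students>"
)
print(sorted(check_key(students, "./Student", "StudentOID")))
print(sorted(check_key(students, "./Student", "StudentID")))
```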
27
Generalization/Specialization
<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
28
XML Schema to C-XML
29
(No Transcript)
30
Algorithm Overview
  • Generate object sets for each element and attribute
  • Specify built-in and simple types in data frames
  • Obtain relationship sets from parent-child
    connections
  • Obtain participation constraints from minOccurs,
    maxOccurs, and use constraints
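The relationship-set and participation-constraint steps can be illustrated by walking element declarations with Python's `xml.etree.ElementTree`. This is a simplified sketch, not the authors' algorithm: it reads each child's minOccurs/maxOccurs (or an attribute's `use`) as the parent-side constraint, written `min:max` with `*` for unbounded, and it ignores occurrence constraints on enclosing `xs:sequence` or `xs:choice` groups. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def participation(decl):
    """Read a child declaration's occurrence constraints as a 'min:max' string."""
    if decl.tag == XS + "attribute":
        # An attribute occurs at most once; use="required" makes it mandatory.
        return "1:1" if decl.get("use") == "required" else "0:1"
    low = decl.get("minOccurs", "1")   # XSD default is 1
    high = decl.get("maxOccurs", "1")  # XSD default is 1
    return f"{low}:{'*' if high == 'unbounded' else high}"

def relationship_sets(decl, out=None):
    """Collect (parent, child, constraint) triples from nested element declarations."""
    if out is None:
        out = []
    parent = decl.get("name")

    def visit(node):
        for child in node:
            if child.tag in (XS + "element", XS + "attribute") and child.get("name"):
                out.append((parent, child.get("name"), participation(child)))
                if child.tag == XS + "element":
                    relationship_sets(child, out)  # the child becomes a parent in turn
            else:
                visit(child)  # look through xs:complexType, xs:sequence, xs:choice, ...

    visit(decl)
    return out

schema = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="Student"><xs:complexType><xs:sequence>'
    '<xs:element name="FirstName" type="xs:string"/>'
    '<xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>'
    "</xs:sequence>"
    '<xs:attribute name="StudentID" type="xs:string" use="required"/>'
    "</xs:complexType></xs:element>"
    "</xs:schema>"
)
for triple in relationship_sets(schema.find(XS + "element")):
    print(triple)
```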

31
Attribute Transformation
32
Element Transformation
33
Choice Transformation
34
Sequence Transformation
35
Key Constraints Transformation
36
Substitution Group Extension Transformation
37
Observation on Transformations
  • These transformations to and from C-XML are not
    inverses of one another
  • However,

(Diagram: transformations between C-XML and XML Schema)
38
Demo
39
Property Guarantees
40
Transformation Properties: C-XML to XML Schema
  • Theorem 1: The transformation preserves information.
    Proof: injective
  • Theorem 2: Allowing for pragma constraints, the transformation preserves constraints.
    Proof: by construction
  • Theorem 3: The transformation yields an XML Schema instance whose complying XML documents are redundancy free.
    Proof: TKDE, Aug. 2006

41
Transformation Properties: XML Schema to C-XML
  • Theorem 4: The transformation preserves information.
    Proof: injective
  • Theorem 5: The transformation preserves constraints.
    Proof: by construction

42
Conclusions
  • C-XML models XML conceptually
  • Transformations
    • C-XML to XML
    • Reverse-engineer XML to C-XML
  • Properties
    • Information-preserving
    • Constraint-preserving
    • Redundancy-free guarantee

www.deg.byu.edu