Title: Foundational Data Modeling and Schema Transformations for XML Data Engineering
1. Foundational Data Modeling and Schema Transformations for XML Data Engineering
- Stephen W. Liddle
- Information Systems Department
- Reema Al-Kamha, David W. Embley
- Computer Science Department
- Brigham Young University, Provo, Utah
2. XML Data Engineering
- Model XML conceptually
- Map conceptual models to XML
- Reverse-engineer XML to conceptual models
- Ensure properties
  - Information-preserving transformations
  - Constraint-preserving transformations
  - Redundancy-free guarantees
3. C-XML
4. Modeling XML Conceptually
- Scaling the mountain of abstraction
- Delicate balance
  - Enough modeling constructs
  - But not too many
- High-level capture of essentials
- Avoidance of low-level implementation details
- Formal but easily understood
- XML needs better abstractions
5. XML Schema/Model Mismatch
- XML features not explicitly supported in traditional conceptual models
  - Ordered lists of concepts
  - Choice of concept from among several
  - Mixed content
  - Use of content from another model
  - Nested information hierarchies
- C-XML
6. Missing Modeling Constructs (1)
- Sequence structure
  - Parent concept
  - Ordered child concepts
  - Constrained recurrence of children
  - Constrained recurrence of sequence itself
<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
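To make the occurrence constraints concrete, the sketch below compiles this sequence into a Python regular expression over the ordered names of a parent's child elements. It is an illustration under stated assumptions, not part of C-XML or of any XML Schema processor: the helpers `occurs`, `tag`, and `conforms` are hypothetical, and each child is assumed to be encoded as its element name followed by one space.

```python
import re
from typing import List, Optional


def occurs(pattern: str, lo: int, hi: Optional[int]) -> str:
    """Attach an XML Schema-style occurrence bound to a regex fragment.

    hi=None plays the role of maxOccurs="unbounded".
    """
    upper = "" if hi is None else str(hi)
    return f"(?:{pattern}){{{lo},{upper}}}"


def tag(name: str) -> str:
    # Hypothetical encoding: each child element is its name plus one space.
    return re.escape(name) + " "


# <xs:sequence minOccurs="1" maxOccurs="2">
#   FirstName, MiddleName (minOccurs="0" maxOccurs="2"), LastName
NAME_SEQUENCE = re.compile(
    occurs(tag("FirstName") + occurs(tag("MiddleName"), 0, 2) + tag("LastName"), 1, 2)
)


def conforms(children: List[str]) -> bool:
    """Return True if the ordered child-element names satisfy the sequence."""
    encoded = "".join(name + " " for name in children)
    return NAME_SEQUENCE.fullmatch(encoded) is not None


print(conforms(["FirstName", "MiddleName", "LastName"]))  # True
print(conforms(["LastName", "FirstName"]))                # False: order matters
```

Because the sequence itself may occur twice, FirstName, LastName, FirstName, LastName also conforms, while a third MiddleName within one occurrence does not.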
7. Missing Modeling Constructs (1)
8. Missing Modeling Constructs (2)
- Choice structure
  - Parent concept
  - Choose one child concept from several alternatives
  - Constrained recurrence of chosen child
  - Constrained recurrence of choice itself
<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
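The choice can be expressed with the same kind of hypothetical regex encoding: each occurrence of the choice picks exactly one alternative, and the bounds on PhoneNumber apply within that occurrence. This is a sketch for illustration only; `occurs`, `tag`, `choice`, and `conforms` are invented helper names.

```python
import re
from typing import List, Optional


def occurs(pattern: str, lo: int, hi: Optional[int]) -> str:
    """Attach an XML Schema-style occurrence bound to a regex fragment."""
    upper = "" if hi is None else str(hi)
    return f"(?:{pattern}){{{lo},{upper}}}"


def tag(name: str) -> str:
    # Hypothetical encoding: each child element is its name plus one space.
    return re.escape(name) + " "


def choice(*alternatives: str) -> str:
    """Exactly one alternative per occurrence of the choice."""
    return "|".join(alternatives)


# <xs:choice maxOccurs="2">  (minOccurs defaults to 1)
#   PhoneNumber (minOccurs="1" maxOccurs="2") | Email | Fax
CONTACT_CHOICE = re.compile(
    occurs(choice(occurs(tag("PhoneNumber"), 1, 2), tag("Email"), tag("Fax")), 1, 2)
)


def conforms(children: List[str]) -> bool:
    """Return True if the ordered child-element names satisfy the choice."""
    encoded = "".join(name + " " for name in children)
    return CONTACT_CHOICE.fullmatch(encoded) is not None


print(conforms(["PhoneNumber", "PhoneNumber", "Fax"]))  # True
```

Combining the two bounds, up to four PhoneNumber children conform (two per occurrence of the choice), a fifth does not, and Email, Fax, Email is rejected because it would need three occurrences of the choice.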
9. Missing Modeling Constructs (3)
- Mixed attribute
  - Allows character and element data to be intertwined
  - <xs:complexType mixed="true">
- Any and anyAttribute structures
  - Insert structures from other namespaces
  - Constrained recurrence
  - <xs:any namespace="##other" minOccurs="0"/>
  - <xs:anyAttribute namespace="##any"/>
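Mixed content and namespace wildcards can be observed in an instance with Python's standard `xml.etree.ElementTree`. The sketch below assumes a hypothetical `Note` element in the target namespace `urn:example:notes` whose content mixes text, a local `Name` child, and a `Place` child from a second namespace, as an `xs:any` wildcard with `namespace="##other"` would admit. The namespace URIs, element names, and helper functions are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical mixed-content instance with one child from another namespace.
DOC = (
    '<Note xmlns="urn:example:notes" xmlns:g="urn:example:geo">'
    "Call <Name>Ann</Name> near <g:Place>Provo</g:Place> today."
    "</Note>"
)
TARGET_NS = "urn:example:notes"


def namespace_of(qualified_tag: str) -> str:
    """ElementTree spells namespaced tags as '{uri}local'; '' means none."""
    if qualified_tag.startswith("{"):
        return qualified_tag[1:].split("}", 1)[0]
    return ""


def mixed_content(elem: ET.Element) -> List[Tuple[str, str]]:
    """Character data and child elements of elem, in document order."""
    items = []
    if elem.text:  # text before the first child
        items.append(("text", elem.text))
    for child in elem:
        items.append(("element", child.tag))
        if child.tail:  # text between this child and the next
            items.append(("text", child.tail))
    return items


def other_namespace_children(elem: ET.Element, target_ns: str) -> List[str]:
    """Children a ##other wildcard would admit. In XML Schema 1.0, ##other
    excludes both the target namespace and unqualified names."""
    return [
        child.tag
        for child in elem
        if namespace_of(child.tag) not in ("", target_ns)
    ]


root = ET.fromstring(DOC)
print(mixed_content(root))
print(other_namespace_children(root, TARGET_NS))  # ['{urn:example:geo}Place']
```

ElementTree has no separate node for character data: text before the first child is stored in `.text` and each later run of text in the preceding child's `.tail`, which is why `mixed_content` reads both.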
10. Missing Modeling Constructs (4)
- Nesting of hierarchical structures
  - Key organizational characteristic of XML
  - Arbitrarily complex nesting possible
11. C-XML Example
12. C-XML to XML Schema
13. (XML Schema listing)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="Semester" use="required"/>
                        <xs:attribute ref="Course" use="required"/>
                        <!-- C-XML: forall x (Course(x) => exists 0 <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3))) -->
                        <xs:attribute name="Grade" type="xs:string" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
                <xs:key name="Semester-Course-Grade-Key">
                  <xs:selector xpath="./Semester-Course-Grade"/>
                  <xs:field xpath="@Semester"/>
                  <xs:field xpath="@Course"/>
                  <xs:field xpath="@Grade"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="StudentOID" type="xs:string" use="required"/>
            <xs:attribute name="StudentID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="StudentOID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentOID"/>
    </xs:key>
    <xs:key name="StudentID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentID"/>
    </xs:key>
  </xs:element>
  <xs:element name="Courses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute ref="Course" use="required"/>
            <xs:attribute name="Department" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="Course-Key">
      <xs:selector xpath="./Course"/>
      <xs:field xpath="@Course"/>
    </xs:key>
  </xs:element>
  <xs:element name="GradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="GradStudentOID" type="xs:string" use="required"/>
            <xs:attribute name="Advisor" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="GradStudentOID-Key">
      <xs:selector xpath="./GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:element name="UndergradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="UndergradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="UndergradStudentOID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="UndergradStudentOID-Key">
      <xs:selector xpath="./UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:attribute name="Course" type="xs:string"/>
</xs:schema>
14. Algorithm Overview
- Generate a forest of scheme trees
- Translate an individual object set
- Translate scheme-tree collections of object sets
- Create a root node
- Add uniqueness constraints
- Translate generalization/specialization hierarchies
15. Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName), (Course, Semester, Grade))
16. Generate Scheme Trees
(Course, Department)
17. Generate Scheme Trees
(GradStudent, Advisor)
(UndergradStudent)
18. Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName), (Course, Semester, Grade))
(Course, Department)
(GradStudent, Advisor)
(UndergradStudent)
19. Generate Scheme Trees
Student, StudentID, StudentName, FirstName, LastName
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName), (Course, Semester, Grade))
(Course, Department)
(GradStudent, Advisor)
(UndergradStudent)
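For experimenting with the translation steps, the forest can be held in a small recursive structure whose rendering reproduces the parenthesized notation used on these slides. This is only a hypothetical representation (`SchemeNode`, `render`, and `FOREST` are invented names) with the trees entered by hand from the slides; it does not implement the scheme-tree generation algorithm itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemeNode:
    """One scheme-tree node: the object sets stored together at this level,
    plus nested nodes (shown in inner parentheses on the slides)."""
    object_sets: List[str]
    children: List["SchemeNode"] = field(default_factory=list)


def render(node: SchemeNode) -> str:
    """Parenthesized notation: own object sets first, then nested nodes."""
    parts = list(node.object_sets) + [render(child) for child in node.children]
    return "(" + ", ".join(parts) + ")"


# The forest shown on the slides, entered by hand.
FOREST = [
    SchemeNode(
        ["Student", "StudentID", "StudentName", "FirstName", "LastName"],
        [SchemeNode(["MiddleName"]), SchemeNode(["Course", "Semester", "Grade"])],
    ),
    SchemeNode(["Course", "Department"]),
    SchemeNode(["GradStudent", "Advisor"]),
    SchemeNode(["UndergradStudent"]),
]

for tree in FOREST:
    print(render(tree))
```

Each nested `SchemeNode` corresponds to a group that can repeat under its parent, which is where the later translation steps introduce a wrapper element such as MiddleNames.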
20. Individual Object Sets
<xs:attribute name="Department" type="xs:string"/>
<xs:attribute name="Course" type="xs:string"/>
<xs:attribute ref="Course"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="Student">
  <xs:complexType>
    ...
    <xs:attribute name="StudentOID" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
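The declarations above fall into a few shapes: a string-valued attribute (Department, Course), a string-valued element (FirstName), and an element for a nonlexical object set that carries a required OID attribute (Student). The sketch below emits those shapes with Python's `xml.etree.ElementTree`. Which shape an object set receives, and when a global attribute plus `ref` is used as for Course, is decided elsewhere in the algorithm and is simply taken as given here; the helper names and the `<name>OID` naming rule (read off StudentOID) are assumptions.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize the schema namespace as xs:


def xs(local: str) -> str:
    """Qualified ElementTree tag in the XML Schema namespace."""
    return f"{{{XS}}}{local}"


def lexical_attribute(name: str) -> ET.Element:
    # e.g. Department: a string-valued attribute
    return ET.Element(xs("attribute"), {"name": name, "type": "xs:string"})


def lexical_element(name: str) -> ET.Element:
    # e.g. FirstName: a string-valued element
    return ET.Element(xs("element"), {"name": name, "type": "xs:string"})


def nonlexical_element(name: str) -> ET.Element:
    # e.g. Student: an element with a required object-identifier attribute
    elem = ET.Element(xs("element"), {"name": name})
    ctype = ET.SubElement(elem, xs("complexType"))
    ET.SubElement(
        ctype,
        xs("attribute"),
        {"name": f"{name}OID", "type": "xs:string", "use": "required"},
    )
    return elem


for decl in (lexical_attribute("Department"), lexical_element("FirstName"),
             nonlexical_element("Student")):
    print(ET.tostring(decl, encoding="unicode"))
```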
21. Scheme-Tree Translation
[Figure: scheme trees labeled with the element names generated for them: Students/Student, MiddleNames/MiddleName, Course-Semester-Grades/Course-Semester-Grade, Courses/Course, GradStudents/GradStudent, UndergradStudents/UndergradStudent]
22. Scheme-Tree Translation
<xs:element name="Semester-Course-Grades">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:element>
<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
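Each scheme-tree node becomes a plural wrapper element containing an unbounded sequence of the singular node element, as Students/Student and Semester-Course-Grades/Semester-Course-Grade show. The hypothetical `wrapper_element` below builds that pattern with `xml.etree.ElementTree`. Passing `minOccurs` in as a parameter is an assumption: the slides show `minOccurs="0"` only for the nested node and do not state the rule that chooses it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def xs(local: str) -> str:
    """Qualified ElementTree tag in the XML Schema namespace."""
    return f"{{{XS}}}{local}"


def wrapper_element(wrapper: str, node: str, min_occurs: int) -> ET.Element:
    """Wrapper element holding an unbounded sequence of scheme-tree nodes.

    min_occurs=1 (the XML Schema default) is left implicit, as for Student;
    min_occurs=0 matches the nested Semester-Course-Grade case.
    """
    outer = ET.Element(xs("element"), {"name": wrapper})
    sequence = ET.SubElement(ET.SubElement(outer, xs("complexType")), xs("sequence"))
    attrs = {"name": node}
    if min_occurs != 1:
        attrs["minOccurs"] = str(min_occurs)
    attrs["maxOccurs"] = "unbounded"
    inner = ET.SubElement(sequence, xs("element"), attrs)
    # The node's own content (attributes, nested wrappers) would go here.
    ET.SubElement(inner, xs("complexType"))
    return outer


students = wrapper_element("Students", "Student", 1)
grades = wrapper_element("Semester-Course-Grades", "Semester-Course-Grade", 0)
print(ET.tostring(grades, encoding="unicode"))
```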
23. Scheme-Tree Translation
<xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute ref="Course" use="required"/>
    <!-- C-XML: forall x (Course(x) => exists 0 <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3))) -->
    <xs:attribute name="Grade" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
25. Root Element
<xs:schema ...>
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>
[Figure: the scheme-tree roots Students, Courses, GradStudents, and UndergradStudents placed under Root]
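The root element is a thin shell: an `xs:all` that references each top-level wrapper element. A sketch using Python's `xml.etree.ElementTree`; `root_element` is a hypothetical helper, and the default name Root follows the slide.

```python
import xml.etree.ElementTree as ET
from typing import List

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def xs(local: str) -> str:
    """Qualified ElementTree tag in the XML Schema namespace."""
    return f"{{{XS}}}{local}"


def root_element(top_level: List[str], name: str = "Root") -> ET.Element:
    """Root whose xs:all references each scheme-tree wrapper element.

    xs:all admits its children in any order, each at most once
    (exactly once here, since minOccurs defaults to 1).
    """
    root = ET.Element(xs("element"), {"name": name})
    group = ET.SubElement(ET.SubElement(root, xs("complexType")), xs("all"))
    for ref in top_level:
        ET.SubElement(group, xs("element"), {"ref": ref})
    return root


decl = root_element(["Students", "Courses", "GradStudents", "UndergradStudents"])
print(ET.tostring(decl, encoding="unicode"))
```

Using `xs:all` rather than `xs:sequence` leaves the order of the four collections unconstrained, which fits the fact that the scheme trees in the forest have no inherent order.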
26. Uniqueness Constraints
<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="StudentOID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentOID"/>
  </xs:key>
  <xs:key name="StudentID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentID"/>
  </xs:key>
</xs:element>
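An `xs:key` requires every node picked by its selector to have the field, and the field values to be unique within the element that declares the key. The sketch below approximates that check for single-attribute fields on a simplified, hypothetical instance (Student children are omitted, so the fragment is not schema-valid). It compares values as strings, whereas a schema processor compares typed values, and it supports only the XPath subset that ElementTree's `findall` accepts; `key_violations` is an invented name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified instance: only the attributes the keys look at.
STUDENTS = """
<Students>
  <Student StudentOID="s1" StudentID="100"/>
  <Student StudentOID="s2" StudentID="100"/>
</Students>
"""


def key_violations(scope: ET.Element, selector: str, field_attr: str) -> List[str]:
    """Approximate xs:key with one attribute field over one scope element."""
    seen = set()
    problems = []
    for node in scope.findall(selector):
        value = node.get(field_attr)
        if value is None:
            problems.append(f"{node.tag} lacks @{field_attr}")
        elif value in seen:
            problems.append(f"duplicate {field_attr}={value}")
        else:
            seen.add(value)
    return problems


scope = ET.fromstring(STUDENTS.strip())
print(key_violations(scope, "./Student", "StudentOID"))  # []
print(key_violations(scope, "./Student", "StudentID"))   # ['duplicate StudentID=100']
```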
27. Generalization/Specialization
<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
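The two keyrefs encode the specialization: every GradStudent and UndergradStudent OID must also be a StudentOID. Below is a hypothetical checker for that rule (`keyref_violations` is an invented name), run on a simplified instance with required children and the Courses wrapper omitted and one deliberately dangling reference (`s9`). Like `xs:keyref`, it skips nodes that lack the referencing attribute.

```python
import xml.etree.ElementTree as ET
from typing import List

DOC = """
<Root>
  <Students>
    <Student StudentOID="s1" StudentID="100"/>
    <Student StudentOID="s2" StudentID="101"/>
  </Students>
  <GradStudents>
    <GradStudent GradStudentOID="s2" Advisor="a1"/>
  </GradStudents>
  <UndergradStudents>
    <UndergradStudent UndergradStudentOID="s1"/>
    <UndergradStudent UndergradStudentOID="s9"/>
  </UndergradStudents>
</Root>
"""


def keyref_violations(root: ET.Element, key_selector: str, key_field: str,
                      ref_selector: str, ref_field: str) -> List[str]:
    """Referencing values that match no key value (string comparison)."""
    keys = {node.get(key_field) for node in root.findall(key_selector)}
    return [
        node.get(ref_field)
        for node in root.findall(ref_selector)
        if node.get(ref_field) is not None and node.get(ref_field) not in keys
    ]


root = ET.fromstring(DOC.strip())
print(keyref_violations(root, "./Students/Student", "StudentOID",
                        "./GradStudents/GradStudent", "GradStudentOID"))
print(keyref_violations(root, "./Students/Student", "StudentOID",
                        "./UndergradStudents/UndergradStudent", "UndergradStudentOID"))
```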
28. XML Schema to C-XML
30. Algorithm Overview
- Generate object sets for each element and attribute
- Specify built-in and simple types in data frames
- Obtain relationship sets from parent-child connections
- Obtain participation constraints from minOccurs, maxOccurs, and use constraints
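For the last step, one plausible reading is that the parent's participation in a parent-child relationship set is copied from `minOccurs`/`maxOccurs` on elements and from `use` on attributes, with the XML Schema defaults (`1`, `1`, `optional`) filled in. The slides do not spell out the exact rule, so the hypothetical `participation` function below is a sketch of that assumption only, written as min:max with `*` for unbounded; `prohibited` attributes are left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def participation(decl: ET.Element) -> str:
    """Parent-side participation constraint (min:max) read from one
    xs:element or xs:attribute declaration, applying schema defaults."""
    local = decl.tag.split("}", 1)[-1]
    if local == "element":
        lo = decl.get("minOccurs", "1")
        hi = decl.get("maxOccurs", "1")
        return f"{lo}:{'*' if hi == 'unbounded' else hi}"
    if local == "attribute":
        use = decl.get("use", "optional")
        if use == "required":
            return "1:1"
        if use == "optional":
            return "0:1"
        raise ValueError(f"use={use!r} is not modeled in this sketch")
    raise ValueError(f"not an element or attribute declaration: {decl.tag}")


SNIPPET = f"""
<xs:sequence xmlns:xs="{XS}">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="Student" maxOccurs="unbounded"/>
</xs:sequence>
"""

for decl in ET.fromstring(SNIPPET.strip()):
    print(decl.get("name"), participation(decl))
```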
31. Attribute Transformation
32. Element Transformation
33. Choice Transformation
34. Sequence Transformation
35. Key Constraints Transformation
36. Substitution Group Extension Transformation
37. Observation on Transformations
- These transformations to and from C-XML are not inverses of one another
- However,
[Figure: diagram relating C-XML and XML Schema under both transformations (labels: C-XML, XML Schema, XML Schema, C-XML)]
38. Demo
39. Property Guarantees
40. Transformation Properties: C-XML to XML Schema
- Theorem 1: The transformation preserves information.
  - Proof: the mapping is injective.
- Theorem 2: Allowing for pragma constraints, the transformation preserves constraints.
  - Proof: by construction.
- Theorem 3: The transformation yields an XML Schema instance whose complying XML documents are redundancy free.
  - Proof: TKDE, Aug. 2006.
41. Transformation Properties: XML Schema to C-XML
- Theorem 4: The transformation preserves information.
  - Proof: the mapping is injective.
- Theorem 5: The transformation preserves constraints.
  - Proof: by construction.
42. Conclusions
- C-XML models XML conceptually
- Transformations
  - C-XML to XML
  - Reverse-engineer XML to C-XML
- Properties
  - Information preserving
  - Constraint preserving
  - Redundancy-free guarantee
www.deg.byu.edu