Title: Normalization Theory for XML
1. Normalization Theory for XML
2. Why Normalization?
3. How to study normalization?
- How do we specify FDs, MVDs?
- What are normalization steps?
- Different Normal Forms.
4. XML Schema Structural Spec
- G = (N, T, P, S)
- N = {Book, Author, Publisher, PCDATA}
- T = {book, author, publisher, pcdata}
- S = Book
- Book → book (Author , Publisher)
- Author → author (PCDATA)
- Publisher → publisher (@name::String)
- PCDATA → pcdata (ε)
Regular Tree Grammar: every production rule is of the form A → a X, where A ∈ N, a ∈ T, and X is a regular expression over N.
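As a rough illustration of how a grammar of this form constrains a document, the following Python sketch checks an element tree against the Book grammar above. It is only a sketch under stated assumptions: every tag is produced by exactly one nonterminal, the repetition on Author is read as zero or more (the operator is not legible in the slide text), text content and attributes are not checked, and the names PRODUCTIONS, TAG_TO_NONTERMINAL and conforms are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# nonterminal -> (tag it produces, regular expression over child nonterminals).
# Each child nonterminal is written with a trailing space so that the content
# model can be matched with an ordinary string regex.
# ASSUMPTION: "(Author )*" reads the Author repetition as zero or more.
PRODUCTIONS = {
    "Book": ("book", r"(Author )*Publisher "),
    "Author": ("author", r""),        # PCDATA content is not modelled here
    "Publisher": ("publisher", r""),  # the @name attribute is not checked
}

# ASSUMPTION: every tag is produced by exactly one nonterminal.
TAG_TO_NONTERMINAL = {tag: nt for nt, (tag, _) in PRODUCTIONS.items()}


def conforms(elem, nonterminal):
    """Return True if elem can be derived from the given nonterminal."""
    tag, content_model = PRODUCTIONS[nonterminal]
    if elem.tag != tag:
        return False
    child_types = [TAG_TO_NONTERMINAL.get(child.tag) for child in elem]
    if None in child_types:
        return False  # a child tag that no production generates
    sequence = "".join(t + " " for t in child_types)
    if re.fullmatch(content_model, sequence) is None:
        return False
    return all(conforms(c, t) for c, t in zip(elem, child_types))


good = ET.fromstring(
    '<book><author>A. Writer</author><author>B. Writer</author>'
    '<publisher name="P"/></book>'
)
bad = ET.fromstring('<book><publisher name="P"/><author>A. Writer</author></book>')
print(conforms(good, "Book"))  # True
print(conforms(bad, "Book"))   # False
```

The second document is rejected because its children appear in the order Publisher, Author, which the content model of Book does not accept.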
5. XML Schema Constraint Spec
- (Library, Person, <@name>)
- (Library, Book, <@ISBN>)
- (Library, Paper, <@title>)
- (Person, Review, <@article>)
- @article::IDREF references (Book | Paper)
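The slides do not spell out the semantics of these triples. One plausible reading, assumed here, is that of a relative key: within each element of the first type, no two descendants of the second type share the listed attribute values. The Python sketch below checks that reading only; the function name key_holds and the sample document are hypothetical, and the IDREF reference is not covered.

```python
import xml.etree.ElementTree as ET


def key_holds(root, context_tag, target_tag, attrs):
    """Check (context, target, <attrs>) under the ASSUMED relative-key reading:
    within each context element, no two target descendants share attrs values."""
    for ctx in root.iter(context_tag):
        seen = set()
        for el in ctx.iter(target_tag):
            if el is ctx:
                continue  # iter() also yields the context element itself
            value = tuple(el.get(a) for a in attrs)
            if value in seen:
                return False
            seen.add(value)
    return True


# HYPOTHETICAL sample document.
doc = ET.fromstring(
    '<library>'
    '<person name="Ann"/><person name="Bob"/>'
    '<book ISBN="1"/><book ISBN="1"/>'
    '</library>'
)
print(key_holds(doc, "library", "person", ["name"]))  # True
print(key_holds(doc, "library", "book", ["ISBN"]))    # False
```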
6. Unnesting for XML: Proposal 1
Person
The following FD holds for this instance: article, rating → name
7. Unnesting for XML: Proposal 2
for $p in //person
for $name in $p//@name
for $paper in $p//@PID
for $article in $p//@article
for $rating in $p//@rating
return
  <person>
    <name>{data($name)}</name>
    <PID>{data($paper)}</PID>
    <article>{data($article)}</article>
    <rating>{data($rating)}</rating>
  </person>
Person
The following FD does not hold for this instance: article, rating → name
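To see why a query of this shape can break the FD, the Python sketch below mimics it: for each person it takes every combination of name, PID, article and rating found below that person, then tests the FD on the resulting rows. The instance is hypothetical (the one on the slide is not in the transcript), as are the placement of the attributes and the names attr_values, unnest_proposal_2 and fd_holds.

```python
import itertools
import xml.etree.ElementTree as ET


def attr_values(elem, attr):
    """Values of @attr on elem or any of its descendants (like elem//@attr)."""
    return [e.get(attr) for e in elem.iter() if e.get(attr) is not None]


def unnest_proposal_2(root):
    """One row per combination of name, PID, article and rating of a person."""
    rows = []
    for p in root.iter("person"):
        for name, pid, article, rating in itertools.product(
            attr_values(p, "name"), attr_values(p, "PID"),
            attr_values(p, "article"), attr_values(p, "rating"),
        ):
            rows.append({"name": name, "PID": pid,
                         "article": article, "rating": rating})
    return rows


def fd_holds(rows, lhs, rhs):
    """True if rows agreeing on all lhs columns also agree on the rhs column."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# HYPOTHETICAL instance; element names and attribute placement are assumed.
doc = ET.fromstring(
    '<people>'
    '<person name="Ann" PID="p1">'
    '<review article="x" rating="1"/><review article="y" rating="2"/>'
    '</person>'
    '<person name="Bob" PID="p2"><review article="x" rating="2"/></person>'
    '</people>'
)
rows = unnest_proposal_2(doc)
print(len(rows))                                      # 5
print(fd_holds(rows, ["article", "rating"], "name"))  # False
```

In this hypothetical instance Ann never gave article x the rating 2. The row (x, 2, Ann) appears only because every article of a person is paired with every rating of that person, and it then clashes with Bob's row (x, 2, Bob).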
8. Example
N = {Root, Library, Book, Author}
T = {root, library, book, author}
S = Root
Root → root (Library )
Library → library (@lname, @address, Book)
Book → book (@title, @loc, Author )
Author → author (@aname, @age)
- Key constraints
- (Root, Library, <@lname>)
- (Root, Book, <@title>)
- (Book, Author, <@aname>)
- FDs
- (Root, Book, parent::library/@lname → @loc)
- (Root, Author, @aname → @age)
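The two FDs above can be tested on a concrete document with a small checker. The Python sketch below assumes an FD (X, Y, S → a) means: within each X element, any two Y descendants that agree on every path in S also agree on a. It supports only the two path forms used on this slide, an attribute and an attribute of a named parent. The sample document is hypothetical and assumes a library may hold several books; evaluate and fd_holds are hypothetical names.

```python
import xml.etree.ElementTree as ET


def evaluate(el, path, parents):
    """Evaluate '@a' or 'parent::tag/@a' relative to el; None if it is absent."""
    if path.startswith("parent::"):
        step, rest = path.split("/", 1)
        parent = parents.get(el)
        if parent is None or parent.tag != step[len("parent::"):]:
            return None
        return evaluate(parent, rest, parents)
    return el.get(path.lstrip("@"))


def fd_holds(root, context_tag, target_tag, lhs, rhs):
    """Check (context, target, lhs -> rhs) under the ASSUMED reading above."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for ctx in root.iter(context_tag):
        seen = {}
        for el in ctx.iter(target_tag):
            key = tuple(evaluate(el, p, parents) for p in lhs)
            value = evaluate(el, rhs, parents)
            if seen.setdefault(key, value) != value:
                return False
    return True


# HYPOTHETICAL instance of the example schema.
doc = ET.fromstring(
    '<root><library lname="L1" address="A1">'
    '<book title="T1" loc="North"><author aname="Ann" age="40"/></book>'
    '<book title="T2" loc="North">'
    '<author aname="Ann" age="40"/><author aname="Bob" age="35"/>'
    '</book>'
    '</library></root>'
)
print(fd_holds(doc, "root", "book", ["parent::library/@lname"], "@loc"))  # True
print(fd_holds(doc, "root", "author", ["@aname"], "@age"))                # True
```

Both FDs hold here, but each fact is stored redundantly: the location is repeated on every book of library L1, and Ann's age on every book she wrote.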
9. Properties of FDs
- For any two types X, Y, the FD (X, Y, p → y) is true, where
- p is a path expression producing one or more descendant elements of Y
- E.g. (X, Book, author → book)
- For any two types X, Y, the FD (X, Y, y → a) is true, where
- a is an attribute or element that can occur only once for a given y
- E.g. (X, Book, book → @title)
- E.g. (X, Book, book → parent::library/@address)
- E.g. (X, Book, book → parent::library)
10. Properties of FDs (contd)
- To prove: for any FD (X, Y, S → a),
- all path expressions in S end only in attributes.
11. Normalization Step 1
- For an FD of the form (X, Y, S → a)
- If ∃ Z such that (X, Z, S) is a key constraint
- Move a to be a child of Z
(Root, Book, parent::library/@lname → @loc)
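For the FD shown here, Step 1 amounts to moving @loc from each book to its library, which is already keyed by @lname. The Python sketch below covers only this special case, where Z is the parent of Y. The function name move_attribute_to_parent and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def move_attribute_to_parent(root, child_tag, parent_tag, attr):
    """Move @attr from each child_tag element to its parent_tag parent."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter(child_tag):
        holder = parents.get(el)
        if holder is None or holder.tag != parent_tag or attr not in el.attrib:
            continue
        value = el.get(attr)
        # If the FD holds, all children of one parent carry the same value.
        if holder.get(attr, value) != value:
            raise ValueError("FD violated: conflicting values for @" + attr)
        holder.set(attr, value)
        del el.attrib[attr]


# HYPOTHETICAL instance: both books of library L1 repeat the same location.
doc = ET.fromstring(
    '<root><library lname="L1" address="A1">'
    '<book title="T1" loc="North"/><book title="T2" loc="North"/>'
    '</library></root>'
)
move_attribute_to_parent(doc, "book", "library", "loc")
library = doc.find("library")
print(library.get("loc"))                                 # North
print([book.get("loc") for book in library.iter("book")])  # [None, None]
```

After the move the location is stored once per library, so the FD can no longer be violated by two books of the same library disagreeing.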
13. Normalization Step 2
- For an FD of the form (X, Y, S → a)
- If ¬∃ Z such that (X, Z, S) is a key constraint
- Create a new type Z with (X, Z, S) as key
- Move a to be a child of Z
(Root, Author, @aname → @age)
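For the FD shown here, no existing type is keyed by @aname under Root, so Step 2 introduces one and moves @age to it. The Python sketch below illustrates this for the single-attribute case only. The tag name author1, the choice to place the new elements directly under the root, the function name extract_new_type and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def extract_new_type(root, source_tag, new_tag, key_attr, moved_attr):
    """Create one new_tag child of root per distinct key_attr value and move
    moved_attr from the source_tag elements onto it."""
    created = {}
    # list(...) because new children are appended to root while we loop.
    for el in list(root.iter(source_tag)):
        if key_attr not in el.attrib or moved_attr not in el.attrib:
            continue
        key, value = el.get(key_attr), el.get(moved_attr)
        if key not in created:
            created[key] = ET.SubElement(
                root, new_tag, {key_attr: key, moved_attr: value}
            )
        elif created[key].get(moved_attr) != value:
            raise ValueError("FD violated for " + key_attr + "=" + key)
        del el.attrib[moved_attr]


# HYPOTHETICAL instance: Ann's age is repeated on every book she wrote.
doc = ET.fromstring(
    '<root><library lname="L1" address="A1">'
    '<book title="T1"><author aname="Ann" age="40"/></book>'
    '<book title="T2">'
    '<author aname="Ann" age="40"/><author aname="Bob" age="35"/>'
    '</book>'
    '</library></root>'
)
extract_new_type(doc, "author", "author1", "aname", "age")
print([(a.get("aname"), a.get("age")) for a in doc.findall("author1")])
# [('Ann', '40'), ('Bob', '35')]
print([a.get("age") for a in doc.iter("author")])  # [None, None, None]
```

The original author elements keep @aname, which now acts as a reference to the new elements.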
14.
- Key constraints
- (Root, Library, <@lname>)
- (Root, Book, <@title>)
- (Book, Author, <@aname>)
- (Root, Author1, <@aname>)
- Foreign Key constraints
- (Book, Author, <@aname>) references (Root, Author1, <@aname>)
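The foreign key listed above can be checked in a simplified way: every @aname used by an author element should also appear on some author1 element. The Python sketch below ignores the scoping by Book and Root and works document-wide, which is an assumption. The tag name author1, the function name foreign_key_holds and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def foreign_key_holds(root, source_tag, target_tag, attr):
    """True if every @attr value on source_tag also occurs on some target_tag."""
    targets = {el.get(attr) for el in root.iter(target_tag)}
    return all(el.get(attr) in targets for el in root.iter(source_tag))


# HYPOTHETICAL normalized instance.
ok = ET.fromstring(
    '<root>'
    '<library lname="L1"><book title="T1"><author aname="Ann"/></book></library>'
    '<author1 aname="Ann" age="40"/>'
    '</root>'
)
# HYPOTHETICAL instance with a dangling reference to "Bob".
dangling = ET.fromstring(
    '<root>'
    '<library lname="L1"><book title="T1"><author aname="Bob"/></book></library>'
    '<author1 aname="Ann" age="40"/>'
    '</root>'
)
print(foreign_key_holds(ok, "author", "author1", "aname"))        # True
print(foreign_key_holds(dangling, "author", "author1", "aname"))  # False
```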
15. Conclusions and Future Work
- Studied 2 ways of unnesting XML documents
- Defined functional dependencies
- Studied normalization steps
- Improving the goodness of normalized XML schemas.
- Current restriction: path expressions cannot navigate IDREF(S)
- Inference of FDs and MVDs