Semi-structured Data - PowerPoint PPT Presentation

Slides: 31
Provided by: tson
Learn more at: https://www.cs.nmsu.edu


1
Semi-structured Data
2
Facts about the Web
  • Growing fast
  • Popular
  • Semi-structured data
  • Data is presented for human processing
  • Data is often self-describing (the names of
    attributes are included within the data fields)

3
Vision for Web data
  • Object-like: it can be represented as a
    collection of objects of the form described by
    the conceptual data model
  • Schemaless: it does not conform to any fixed type structure
  • Self-describing: necessary for machine-readable
    data

4
Facts about database systems
  • Integration of databases with different schemas
    is often needed
  • Sharing information between different databases
    on the World Wide Web becomes more and more
    important for business

5
Semi-structured data
  • Bridging different data models (relational,
    object-oriented)

6
Semi-structured data representation
  • A database of semi-structured data is a graph
    with:
  • A set of nodes, each of which is either a leaf or an
    interior node
  • Each interior node has a set of arcs coming out of
    it, each connecting it with another node; each
    arc has a label; and
  • A root that does not have an arc entering it.
    Every node must be reachable from the root.
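A minimal Python sketch of this graph model, assuming a hypothetical encoding in which each interior node maps to a list of (label, target) arcs and any target that is not a key is a leaf. The arcs are a trimmed, assumed version of the movie/star example on the next slide (addresses omitted), since that figure did not survive extraction; `reachable`, `is_rooted` and `follow` are illustrative names, not part of any library.

```python
from collections import deque

# Hypothetical encoding: interior node -> list of (arc label, target node).
# Any target that is not a key of GRAPH is treated as a leaf (an atomic value).
GRAPH = {
    "root": [("star", "cf"), ("star", "mh"), ("movie", "mv")],
    "cf": [("name", "Carrie Fisher"), ("starsIn", "mv")],
    "mh": [("name", "Mark Hamill"), ("starsIn", "mv")],
    "mv": [("title", "Star Wars"), ("year", "1977"), ("starOf", "cf"), ("starOf", "mh")],
}


def reachable(graph, root):
    """Breadth-first search: the set of nodes reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _label, target in graph.get(node, []):
            if target not in seen:  # the graph may contain cycles (starsIn/starOf)
                seen.add(target)
                queue.append(target)
    return seen


def is_rooted(graph, root):
    """True if no arc enters root and every node is reachable from root."""
    targets = {target for arcs in graph.values() for _label, target in arcs}
    nodes = set(graph) | targets
    return root not in targets and reachable(graph, root) == nodes


def follow(graph, start, labels):
    """All nodes reached from start along a path whose arc labels spell `labels`."""
    frontier = [start]
    for wanted in labels:
        frontier = [t for n in frontier for label, t in graph.get(n, []) if label == wanted]
    return frontier


print(is_rooted(GRAPH, "root"))                            # True
print(follow(GRAPH, "root", ["movie", "starOf", "name"]))  # ['Carrie Fisher', 'Mark Hamill']
```

Because stars and movies point at each other, the graph is cyclic; the search therefore records visited nodes instead of assuming a tree.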

7
[Figure: a graph with a Root node and nodes cf, mh and mv; arc labels include star, movie, name, address, street, city, title, year, starsIn and starOf; leaf values include Carrie Fisher, Mark Hamill, Maple, Oak, Locust, Hwood, Malibu, Star Wars and 1977]
Example of semi-structured data representing a
movie and stars
8
Information integration via semi-structured data
  • Simple
  • Semi-structured data as interface between users
    of different databases (with different schemas)

[Figure: User, Interface, DB1, DB2, Application of DB1, Application of DB2]
9
XML Overview
  • Simplifies data exchange between software
    agents
  • Popular thanks to the involvement of the W3C (World
    Wide Web Consortium, an independent organization,
    www.w3c.org)

10
XML Characteristics
  • Simple, open, widely accepted
  • HTML-like (tags) but extensible by users (no
    fixed set of tags)
  • No predefined semantics for the tags (because XML
    was not designed for displaying data)
  • Semantics is defined by stylesheets (covered later)

11
XML Documents
  • User-defined tags
  • <tag> info </tag>
  • Properly nested: <tag1> ... <tag2> </tag1> </tag2>
    is not well-formed
  • Root element: an element that contains all other
    elements
  • Processing instructions: <?command ...?>
  • Comments: <!-- comment -->
  • CDATA type
  • DTD
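The document parts listed above can be seen with Python's standard-library ElementTree. This sketch assumes Python 3.8 or later (for the `insert_comments`/`insert_pis` flags of `TreeBuilder`); the `render` processing instruction and the `Note` element are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<Star-Movie-Data>
  <?render mode="list"?>
  <!-- a comment: ignored by default -->
  <Note><![CDATA[<b>not a tag</b> & not an entity]]></Note>
</Star-Movie-Data>"""

# By default ElementTree drops comments and processing instructions; these
# flags (Python 3.8+) keep the ones that occur inside the root element.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(DOC, parser=ET.XMLParser(target=builder))

comments = [node for node in root if node.tag is ET.Comment]
pis = [node for node in root if node.tag is ET.ProcessingInstruction]
note = root.find("Note")

print(root.tag)                 # Star-Movie-Data: the root element encloses everything else
print(len(comments), len(pis))  # 1 1
print(note.text)                # CDATA content is returned as plain text, not parsed as markup
```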

12
XML element
  • Begins with an opening tag of the form
    <XML_element_name>
  • Ends with a closing tag
    </XML_element_name>
  • The text between the opening tag and the
    closing tag is called the content of the element

13
XML element
Callouts: Star element, Name element
    <Star-Movie-Data>
      <Star>
        <Name> Carrie Fisher </Name>
        <Address> <Street> 123 Maple St. </Street>
          <City> Hollywood </City> </Address>
        <Address> <Street> 5 Locust Ln. </Street>
          <City> Malibu </City> </Address>
      </Star>
      <Star>
        <Name> Mark Hamill </Name>
        <Address> <Street> 456 Oak Rd. </Street>
          <City> Brentwood </City> </Address>
      </Star>
      <Movie>
        <Title> Star Wars </Title> <Year>1977</Year>
      </Movie>
    </Star-Movie-Data>
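A short sketch of reading element content from the document above with Python's `xml.etree.ElementTree`. Note that the spaces written inside the tags are part of each element's content, so they are stripped here; `cities_by_star` is an illustrative helper, not a library function.

```python
import xml.etree.ElementTree as ET

DOC = """<Star-Movie-Data>
  <Star>
    <Name> Carrie Fisher </Name>
    <Address> <Street> 123 Maple St. </Street> <City> Hollywood </City> </Address>
    <Address> <Street> 5 Locust Ln. </Street> <City> Malibu </City> </Address>
  </Star>
  <Star>
    <Name> Mark Hamill </Name>
    <Address> <Street> 456 Oak Rd. </Street> <City> Brentwood </City> </Address>
  </Star>
  <Movie> <Title> Star Wars </Title> <Year>1977</Year> </Movie>
</Star-Movie-Data>"""


def cities_by_star(root):
    """Map each Star's Name content to the City contents of its Address children."""
    return {
        star.findtext("Name").strip(): [
            address.findtext("City").strip() for address in star.findall("Address")
        ]
        for star in root.findall("Star")
    }


root = ET.fromstring(DOC)
print(cities_by_star(root))
# {'Carrie Fisher': ['Hollywood', 'Malibu'], 'Mark Hamill': ['Brentwood']}
print(root.findtext("Movie/Year"))  # 1977
```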

14
XML element
Callouts: Attribute, Value of the attribute
    <Star-Movie-Data>
      <Star name="Carrie Fisher">
        ...
      </Star>
    </Star-Movie-Data>

15
Relationship between XML elements
  • Child-parent relationship
  • Elements nested directly in an element are the
    children of this element (Student is a child of
    PersonList, Name is a child of Student, etc.)
  • Ancestor/descendant relationship: extends the
    child/parent relationship transitively; important
    for querying XML documents
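ElementTree stores only links from parents to children, so a common idiom is to build a child-to-parent map and walk it upward for ancestors. This sketch assumes a tiny PersonList document modeled on the slide's example (the full document is not in this transcript).

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<PersonList><Student><Name>XYZ PQR</Name></Student></PersonList>")

# Child -> parent map, built from the downward links ElementTree provides.
parent = {child: p for p in root.iter() for child in p}


def ancestors(elem):
    """Tags of all ancestors of elem, nearest first (repeated child -> parent steps)."""
    tags = []
    while elem in parent:
        elem = parent[elem]
        tags.append(elem.tag)
    return tags


name = root.find("Student/Name")
print(parent[name].tag)   # Student
print(ancestors(name))    # ['Student', 'PersonList']
# iter() yields the root itself first, then every descendant in document order.
print([e.tag for e in root.iter()][1:])  # ['Student', 'Name']
```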

16
XML elements and Database Objects
  • XML elements can be converted into objects by
    considering the tag names of the children as
    attributes of the objects
  • Recursive process

    <Student StudentID="123"> <Name> XYZ PQR </Name>
      <CrsTaken> <CrsName>CS582</CrsName> <Grade>A</Grade> </CrsTaken>
    </Student>
Partially converted object:
    (099, Name: XYZ PQR, CrsTaken:
      <CrsName>CS582</CrsName> <Grade>A</Grade>)
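A minimal sketch of the recursive conversion, assuming objects are represented as Python dicts: attributes and child tag names become keys, and a childless, attribute-less element becomes its text. Object identifiers such as 099 are not generated, a repeated tag is collected into a list (an assumption, not from the slides), and text mixed in between child elements is ignored, which is one of the differences on the next slide.

```python
import xml.etree.ElementTree as ET

DOC = ('<Student StudentID="123"><Name> XYZ PQR </Name>'
       '<CrsTaken><CrsName>CS582</CrsName><Grade>A</Grade></CrsTaken></Student>')


def to_object(elem):
    """Recursively convert an element: attributes and child tag names become
    object attributes; a childless, attribute-less element becomes its text."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    obj = dict(elem.attrib)
    for child in children:
        value = to_object(child)  # recursive step
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)
        else:
            # A repeated tag (e.g. two Address children) becomes a list of values.
            obj[child.tag] = [obj[child.tag], value]
    return obj


print(to_object(ET.fromstring(DOC)))
# {'StudentID': '123', 'Name': 'XYZ PQR', 'CrsTaken': {'CrsName': 'CS582', 'Grade': 'A'}}
```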
17
XML elements and Database Objects
  • Difference: XML elements may contain additional
    text mixed in with their child elements

    <Student StudentID="123"> <Name> XYZ PQR </Name>
      has taken the following course
      <CrsTaken> Database management system II
        <CrsName>CS582</CrsName> with the grade
        <Grade>A</Grade> </CrsTaken>
    </Student>
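In ElementTree, text that appears before an element's first child is stored in `.text`, and text that follows a child is stored in that child's `.tail`. The sketch below collects those in-between fragments from the example above; a conversion that keeps only child tags and leaf text would silently lose them. `mixed_text` is an illustrative name.

```python
import xml.etree.ElementTree as ET

DOC = ('<Student StudentID="123"><Name>XYZ PQR</Name> has taken the following course '
       '<CrsTaken> Database management system II <CrsName>CS582</CrsName> with the grade '
       '<Grade>A</Grade></CrsTaken></Student>')


def mixed_text(elem):
    """Non-blank text fragments that sit beside child elements, in document order."""
    fragments = []
    # Text before the first child counts only if the element has children.
    if len(elem) and elem.text and elem.text.strip():
        fragments.append(elem.text.strip())
    for child in elem:
        fragments.extend(mixed_text(child))  # text inside the child comes first
        if child.tail and child.tail.strip():
            fragments.append(child.tail.strip())  # then the text right after it
    return fragments


print(mixed_text(ET.fromstring(DOC)))
# ['has taken the following course', 'Database management system II', 'with the grade']
```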
18
XML elements and Database Objects
  • Difference: XML elements are ordered

    <CrsTaken> <CrsName>CS582</CrsName> <Grade>A</Grade> </CrsTaken>
    <CrsTaken> <Grade>A</Grade> <CrsName>CS582</CrsName> </CrsTaken>
Both fragments correspond to the same object:
    (901, Grade: A, CrsName: CS582)
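The two fragments above can be compared under both views. In this sketch (helper names are illustrative), the XML view is the ordered list of children, while the object view is a dict, whose equality ignores key order.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring("<CrsTaken><CrsName>CS582</CrsName><Grade>A</Grade></CrsTaken>")
second = ET.fromstring("<CrsTaken><Grade>A</Grade><CrsName>CS582</CrsName></CrsTaken>")


def child_sequence(elem):
    """XML view: the ordered list of (tag, text) pairs of the children."""
    return [(child.tag, child.text) for child in elem]


def as_object(elem):
    """Object view: an unordered mapping from child tag to text."""
    return {child.tag: child.text for child in elem}


print(child_sequence(first) == child_sequence(second))  # False: element order differs
print(as_object(first) == as_object(second))            # True: same attributes and values
```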
19
XML Attributes
  • Can occur within an element's opening tag
    (arbitrarily many attributes, order unimportant,
    each attribute at most once)
  • Allow a more concise representation
  • Could be replaced by elements
  • Less powerful than elements (only string values,
    no children)
  • Can be declared to have unique values, good for
    integrity constraint enforcement (next slide)
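Two of these points in a small ElementTree sketch: attribute order does not matter (the parsed attributes form a dict), and an attribute can be rewritten as a child element carrying the same string value. `attributes_as_elements` is a hypothetical helper and handles only childless elements.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<Star name="Carrie Fisher" starID="cf"></Star>')
b = ET.fromstring('<Star starID="cf" name="Carrie Fisher"></Star>')

# attrib is a plain dict, so the order in which attributes are written does not matter.
print(a.attrib == b.attrib)  # True


def attributes_as_elements(elem):
    """Rewrite each attribute of a childless element as a child element
    whose content is the attribute's string value."""
    new = ET.Element(elem.tag)
    for name, value in elem.attrib.items():
        ET.SubElement(new, name).text = value
    return new


converted = attributes_as_elements(a)
print(converted.findtext("name"), converted.findtext("starID"))  # Carrie Fisher cf
```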

20
XML Attributes
  • Can be declared to be of type ID, IDREF, or
    IDREFS
  • ID: value unique throughout the document
  • IDREF: refers to a valid ID declared in the same
    document
  • IDREFS: a space-separated list of references
    to valid IDs
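Python's standard-library ElementTree does not validate against a DTD, so this sketch checks the ID/IDREF/IDREFS rules by hand, assuming the caller says which attribute names have which type. The names follow the movie/star DTD later in these slides; treating `starsOf` as IDREFS (so a movie can list several stars) and the dangling value `hf` are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_id_refs(root, id_attrs, idref_attrs, idrefs_attrs):
    """Return problems: duplicate ID values and references to undeclared IDs."""
    problems = []
    ids = set()
    for elem in root.iter():
        for name in id_attrs:
            value = elem.get(name)
            if value is None:
                continue
            if value in ids:
                problems.append(f"duplicate ID {value!r}")
            ids.add(value)
    for elem in root.iter():
        refs = [elem.get(n) for n in idref_attrs if elem.get(n) is not None]
        for n in idrefs_attrs:
            if elem.get(n) is not None:
                refs.extend(elem.get(n).split())  # IDREFS: space-separated list
        for ref in refs:
            if ref not in ids:
                problems.append(f"dangling reference {ref!r}")
    return problems


root = ET.fromstring("""<STARS-MOVIES>
  <STAR starID="cf" starredIn="sw"><NAME>Carrie Fisher</NAME></STAR>
  <STAR starID="mh" starredIn="sw"><NAME>Mark Hamill</NAME></STAR>
  <MOVIE movieID="sw" starsOf="cf mh hf"><TITLE>Star Wars</TITLE></MOVIE>
</STARS-MOVIES>""")

print(check_id_refs(root, ("starID", "movieID"), ("starredIn",), ("starsOf",)))
# ["dangling reference 'hf'"]
```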

21
Well-formed XML Document
  • It has a root element
  • Every opening tag is followed by a matching
    closing tag, and elements are properly nested
  • Any attribute can occur at most once in a given
    opening tag; its value must be provided and quoted
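Python's bundled expat-based parser rejects input that is not well-formed by raising `ET.ParseError`, so each rule above can be demonstrated with one small sample. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts text; ElementTree raises ParseError otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = {
    "ok": '<Star name="Carrie Fisher"><Name>Carrie Fisher</Name></Star>',
    "bad nesting": "<tag1><tag2></tag1></tag2>",
    "two roots": "<Star></Star><Movie></Movie>",
    "duplicate attribute": '<Star name="a" name="b"></Star>',
    "unquoted value": "<Star name=Carrie></Star>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")  # only "ok" prints True
```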

22
Document Type Definition
  • A set of rules (defined by the user) for structuring
    an XML document
  • Can be part of the document itself, or can be
    specified via a URL where the DTD can be found
  • A document that conforms to a DTD is said to be
    valid
  • Can be viewed as a grammar that specifies a legal
    XML document, based on the tags used in the document

23
DTD Components
  • A name, which must coincide with the tag of the root
    element of the document conforming to the DTD
  • A set of ELEMENT declarations: one for each allowed
    tag, including the root tag
  • ATTLIST statements: specify the allowed
    attributes and their types for each tag
  • *, +, ? as in grammar definitions
  • * zero or more (any finite number)
  • + at least one
  • ? zero or one
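Because `,` (sequence), `|` (choice), `*`, `+` and `?` have the same meaning as in regular expressions, an element-only content model can be checked by translating it into a regex over the sequence of child tag names. This is a sketch under that assumption: mixed content (`#PCDATA`) and `EMPTY` are not handled, and the helper names are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|*+?]")


def content_model_to_regex(model):
    """Translate an element-only content model such as "(NAME, ADDRESS+, MOVIES?)"
    into a compiled regex over child tag names, each followed by one space."""
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ")":
            parts.append(")")
        elif token == ",":
            continue  # sequence: plain concatenation in the regex
        elif token in {"|", "*", "+", "?"}:
            parts.append(token)  # choice and repetition mean the same thing in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")  # one child element name
    return re.compile("".join(parts))


def children_match(model, child_tags):
    """True if the sequence of child tag names is allowed by the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(NAME, ADDRESS+, MOVIES?)", ["NAME", "ADDRESS", "ADDRESS"]))  # True
print(children_match("(NAME, ADDRESS+, MOVIES?)", ["NAME"]))                        # False
```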

24
DTD Components Element
  • <!ELEMENT Name definition>
  • Name: name of the element; definition: type,
    element list, etc.
  • definition can be EMPTY, (#PCDATA), or an element
    list (e1,e2,...,en), where the list (e1,e2,...,en) can
    be shortened using grammar-like notation

25
DTD Components Element
  • <!ELEMENT Name (e1,...,en)>
  • Name: name of the element; e1: 1st element;
    en: nth element
  • <!ELEMENT PersonList (Title,Contents)>
  • <!ELEMENT Contents (Person*)>

26
DTD Components Element
  • <!ELEMENT Name EMPTY>
  • the element Name has no content (no children)
  • <!ELEMENT Name (#PCDATA)>
  • the value of Name is a character string
  • <!ELEMENT Title EMPTY>
  • <!ELEMENT Id (#PCDATA)>

27
DTD Components Attribute List
  • <!ATTLIST EName Att Type Property>
    where
  • EName: name of an element defined in the DTD
  • Att: attribute name allowed to occur in the
    opening tag of EName
  • Type: the type of the attribute (CDATA, ID,
    IDREF, IDREFS)
  • Property: either #REQUIRED or #IMPLIED
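Since ElementTree ignores DTDs, this sketch assumes the ATTLIST declarations have been written out by hand as a Python dict (a hypothetical encoding mirroring the movie/star DTD on the following slides) and reports undeclared attributes and missing #REQUIRED ones. The `rating` attribute in the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of two ATTLIST declarations:
#   <!ATTLIST STAR  starID ID #REQUIRED  starredIn IDREF #IMPLIED>
#   <!ATTLIST MOVIE movieID ID #REQUIRED starsOf IDREF #IMPLIED>
ATTLISTS = {
    "STAR": {"starID": ("ID", "#REQUIRED"), "starredIn": ("IDREF", "#IMPLIED")},
    "MOVIE": {"movieID": ("ID", "#REQUIRED"), "starsOf": ("IDREF", "#IMPLIED")},
}


def attribute_errors(root, attlists):
    """Report undeclared attributes and missing #REQUIRED attributes."""
    errors = []
    for elem in root.iter():
        declared = attlists.get(elem.tag, {})
        for att in elem.attrib:
            if att not in declared:
                errors.append(f"{elem.tag}: undeclared attribute {att}")
        for att, (_type, prop) in declared.items():
            if prop == "#REQUIRED" and att not in elem.attrib:
                errors.append(f"{elem.tag}: missing required attribute {att}")
    return errors


doc = ET.fromstring('<STARS-MOVIES><STAR starID="cf"></STAR>'
                    '<MOVIE starsOf="cf" rating="5"></MOVIE></STARS-MOVIES>')
print(attribute_errors(doc, ATTLISTS))
```

The type column is carried along but not enforced here; uniqueness of ID values and validity of IDREF values would need a separate cross-document check.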

28
    <!DOCTYPE STARS [
      <!ELEMENT STARS (STAR*)>
      <!ELEMENT STAR (NAME, ADDRESS, MOVIES)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT ADDRESS (STREET, CITY)>
      <!ELEMENT STREET (#PCDATA)>
      <!ELEMENT CITY (#PCDATA)>
      <!ELEMENT MOVIES (MOVIE*)>
      <!ELEMENT MOVIE (TITLE, YEAR)>
      <!ELEMENT TITLE (#PCDATA)>
      <!ELEMENT YEAR (#PCDATA)>
    ]>

A simple DTD for the movie and star database (no
integrity constraints)
29
    <!DOCTYPE STARS-MOVIES [
      <!ELEMENT STARS-MOVIES (STAR*, MOVIE*)>
      <!ELEMENT STAR (NAME, ADDRESS)>
      <!ATTLIST STAR starID ID #REQUIRED starredIn IDREF #IMPLIED>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT ADDRESS (STREET, CITY)>
      <!ELEMENT STREET (#PCDATA)>
      <!ELEMENT CITY (#PCDATA)>
      <!ELEMENT MOVIE (TITLE, YEAR)>
      <!ATTLIST MOVIE movieID ID #REQUIRED starsOf IDREF #IMPLIED>
      <!ELEMENT TITLE (#PCDATA)>
      <!ELEMENT YEAR (#PCDATA)>
    ]>

A DTD for the movie and star database with
attributes and integrity constraints
30
Homework 5 (Due Oct 23)
  • 4.2.3 (Pg 146, complete book) (10pt)
  • 4.4.1 (part c, Pg 164, complete book) (10pt)
  • 4.5.4 (Pg 172, complete book) (10pt)