Introduction to XML Databases - PowerPoint PPT Presentation

1
Introduction to XML Databases
  • Sylvia Osborn

2
What is XML?
  • XML (eXtensible Markup Language) is a language
    that describes text and tags.
  • It is defined by a number of documents available
    on the W3C (World Wide Web Consortium) web site
    (http://www.w3.org).
  • Traditionally, tagged text has been used for
    publishing documents (e.g. SGML).
  • HTML also consists of tags and text, but HTML has
    a pre-defined set of tags to mark up (i.e.
    format) text that is to be displayed in a web
    browser.
  • There was a lot of interest in the database
    community in semi-structured data, i.e. data
    with some structure, and with some parts that
    were just very long text. E.g. a book would have
    a title, author, publisher and then just the book
    contents.
  • When we have any collection of similar objects on
    a persistent storage medium, we want to query
    them. We will see a couple of query language
    ideas for XML.
  • Like most standards, things keep evolving.

3
A Sample XML Document
  • <?xml version="1.0" encoding="UTF-8"?>
  • <studentList
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="Z:\853\exampleschema.xsd">
  • <student>
  • <Name>John Doe</Name>
  • <StudentID>123456</StudentID>
  • <Address>
  • <StNumber>414</StNumber>
  • <StName>Pine Street</StName>
  • <City>London</City>
  • </Address>
  • </student>
  • <student>
  • <Name>Bob Day</Name>
  • <StudentID>9876543212</StudentID>
  • <Address>
  • <POBox>Box 255</POBox>
  • <City>London</City>
  • </Address>
  • </student>
  • </studentList>

4
Some parts of XML
  • an XML element is any text properly nested
    between two matching tags
  • <sampleTag> ... </sampleTag>.
  • the name of the element is the tag text (here
    sampleTag)
  • content is the text between the tags
  • <sampleTag>example content</sampleTag>
  • <sampleTag> example content </sampleTag>
  • has different content, i.e. the blanks are part
    of the tagged text.
  • parent-child relationships between elements are
    given by the nesting of the tags. In the previous
    example, Name is a child of student, and
    studentList is a parent of student.
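
The points on this slide can be tried out with a short sketch. It uses Python's standard xml.etree.ElementTree module, which is a choice made here for illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

tight = ET.fromstring("<sampleTag>example content</sampleTag>")
padded = ET.fromstring("<sampleTag> example content </sampleTag>")

# The element name is the tag text.
element_name = tight.tag

# Content keeps its blanks, so these two elements have different content.
tight_content = tight.text
padded_content = padded.text

# Nesting gives the parent-child relationships.
doc = ET.fromstring(
    "<studentList><student><Name>John Doe</Name></student></studentList>"
)
student = doc[0]                                  # child of studentList
child_tags = [child.tag for child in student]     # children of student
```

Comparing tight_content with padded_content shows that the surrounding blanks survive parsing.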

5
XML parts, cont'd
  • attributes can be attached to the opening tag. In
    the above example, there are two attributes for
    studentList. Their values are enclosed in " ".
  • attribute values must be enclosed in quotes.
  • Element text is not quoted.
  • empty elements can be given by <POBox/>.
  • An element can be empty but have attribute
    values, e.g. <sampleTag attr1="value"
    attr2="value2"/>
  • comments are any text enclosed in <!-- ... -->.
    In Galax, a comment is enclosed in (: ... :)
  • processing instructions are enclosed in <? ...
    ?> and may be used by the XML processor receiving
    the document.
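
A minimal sketch of the attribute rules, again assuming Python's standard xml.etree.ElementTree module as the parser. The helper name parses is made up for this example.

```python
import xml.etree.ElementTree as ET

# An empty element that still carries attribute values.
empty = ET.fromstring('<sampleTag attr1="value" attr2="value2"/>')
attributes = dict(empty.attrib)
has_text = empty.text is not None
child_count = len(empty)


def parses(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Attribute values must be quoted, so this one is rejected.
unquoted_ok = parses("<sampleTag attr1=value/>")
quoted_ok = parses('<sampleTag attr1="value"/>')
```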

6
How does XML differ from HTML?
  • XML documents can contain any number of different
    tags chosen by the document author, whereas HTML
    tags are predefined in the official specification
    of HTML
  • In XML, all opening tags must have a matching
    closing tag. Some HTML tags (like <p>, which
    starts a new paragraph) do not need closing tags.
    Other closing tags may be omitted and browsers
    won't complain.

7
A well-formed XML Document
  • is one in which
  • there is a root element
  • the tags are properly nested, and every opening
    tag is followed by the corresponding closing tag
  • attributes occur at most once in an opening tag,
    and their values must be given and must be in
    quotes.
  • At first glance, XML documents look like
    aggregation hierarchies, expressed a different
    way. There are some differences...
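
The well-formedness rules above can be checked by handing text to any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree; the function name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "ok": "<a><b>text</b></a>",
    "bad nesting": "<a><b>text</a></b>",
    "two roots": "<a/><b/>",
    "repeated attribute": '<a x="1" x="2"/>',
    "unclosed tag": "<a><b></a>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
```

Only the first sample satisfies all three rules; each of the others breaks exactly one of them.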

8
Differences between XML documents and database objects
  • 1. XML is a markup language, so the following is
    a legal <Address>
  • <Address> John Doe lives
  • at house number <StNumber> 123 </StNumber>
  • on <StName> Main St. </StName> in
  • <City> London </City>
  • which is on the north side, at the corner.
  • </Address>
  • The text outside the child elements (mixed
    content) would not be allowed in a database
    system.
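
To make the mixed-content point concrete, here is a sketch that separates the structured fields of such an Address from the free text around them. It assumes Python's standard xml.etree.ElementTree, which stores text before the first child in .text and text after each child in .tail.

```python
import xml.etree.ElementTree as ET

address = ET.fromstring(
    "<Address> John Doe lives at house number "
    "<StNumber> 123 </StNumber> on <StName> Main St. </StName> in "
    "<City> London </City> which is on the north side, at the corner. "
    "</Address>"
)

# Structured part: the child elements, which map naturally to fields.
fields = {child.tag: child.text.strip() for child in address}

# Unstructured part: the text that sits between the child elements.
pieces = [address.text] + [child.tail for child in address]
loose_text = [piece.strip() for piece in pieces if piece and piece.strip()]
```

The fields dictionary is what a database record would hold; loose_text is the part that has no place in such a record.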

9
Differences, cont'd
  • 2. XML documents are ORDERED. These 2 objects for
    an OODB are identical (the same)
  • OID: 123 StNumber: "123", StName: "Main Street"
  • OID: 123 StName: "Main Street", StNumber: "123"
  • whereas these 2 XML documents are different
  • <Address> <StName> Main St. </StName>
  • <StNumber> 123 </StNumber>
  • </Address>
  • <Address> <StNumber> 123 </StNumber>
  • <StName> Main St. </StName>
  • </Address>
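
A small sketch of the ordering difference. A Python dict stands in for the OODB object here (an analogy chosen for this example, not something the slides prescribe), and xml.etree.ElementTree from the standard library parses the two documents.

```python
import xml.etree.ElementTree as ET

# Object view: the same fields in a different order are the same object.
obj_a = {"StNumber": "123", "StName": "Main Street"}
obj_b = {"StName": "Main Street", "StNumber": "123"}
same_object = obj_a == obj_b

# Document view: the children come back in document order.
doc_a = ET.fromstring(
    "<Address><StName>Main St.</StName><StNumber>123</StNumber></Address>"
)
doc_b = ET.fromstring(
    "<Address><StNumber>123</StNumber><StName>Main St.</StName></Address>"
)
order_a = [child.tag for child in doc_a]
order_b = [child.tag for child in doc_b]
same_child_order = order_a == order_b
```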

10
Schema definitions for XML documents
  • There are two ways of specifying document type
    definitions: DTDs and XML Schema
  • DTDs are older; XML Schema is fairly new
  • DTD syntax is not in XML, XML Schema is
  • DTDs allow you to specify the elements, their
    nesting, the attributes, and to give a data type
    to the element contents.
  • With DTDs, the choice of data types is extremely
    limited - basically just #PCDATA which means
    parsed character data
  • attributes in a DTD can be of types string
    (CDATA), ID, IDREF, IDREFS
  • ID attributes must have a value which is unique
    throughout the document (can act like an object
    ID)
  • IDREF refers to an ID somewhere else, IDREFS is
    one or more IDREF
  • DTDs also do not know about namespaces

11
Sample DTD
  • <!DOCTYPE PersonList [
  • <!ELEMENT PersonList (Title, Contents)>
  • <!ELEMENT Title EMPTY>
  • <!ELEMENT Contents (Person*)>
  • <!ELEMENT Person (Name, Id, Address)>
  • <!ELEMENT Name (#PCDATA)>
  • <!ELEMENT Id (#PCDATA)>
  • <!ELEMENT Address (Number, Street)>
  • <!ELEMENT Number (#PCDATA)>
  • <!ELEMENT Street (#PCDATA)>
  • <!ATTLIST PersonList Type CDATA #IMPLIED
  • Date CDATA #IMPLIED>
  • <!ATTLIST Title Value CDATA #REQUIRED>
  • ]>

12
Namespaces
  • a namespace is a way of specifying where the
    definition of a tag should come from, so that
    multiple definitions of the same tag can be used
    in the same document.
  • Namespaces are defined by using the special
    attribute xmlns
  • <item xmlns="http://www.somesite.com/mypagepartdefs"
    xmlns:mice="http://www.somesite.com/mypagemicedefs">
  • <name>cpu</name> <mice:name>wireless</mice:name>
    </item>
  • the definitions of item and name come from the
    default namespace defined by the xmlns
    attribute. Anything prefixed by mice: is defined
    in the alternate micedefs namespace.
  • you can have many alternate namespaces as long as
    their prefixes are distinct.
  • you can even nest namespace definitions in the
    document. A scoping rule is used.
  • The idea is that within certain areas of
    discourse, e.g. e-commerce, some well accepted
    namespaces will evolve and be used by everyone
    doing this type of business.
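
The sketch below shows how a namespace-aware parser sees the item example: both children are called name, but they resolve to different namespaces. It assumes Python's standard xml.etree.ElementTree, which writes a qualified name as {namespace}localname. The prefixes p and m used for lookup are arbitrary choices made here.

```python
import xml.etree.ElementTree as ET

PARTS = "http://www.somesite.com/mypagepartdefs"
MICE = "http://www.somesite.com/mypagemicedefs"

item = ET.fromstring(
    f'<item xmlns="{PARTS}" xmlns:mice="{MICE}">'
    "<name>cpu</name> <mice:name>wireless</mice:name></item>"
)

# The parser replaces prefixes with the namespace they stand for.
root_tag = item.tag
child_tags = [child.tag for child in item]

# Lookup prefixes are local to this call and need not match the document.
ns = {"p": PARTS, "m": MICE}
part_name = item.find("p:name", ns).text
mice_name = item.find("m:name", ns).text
```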

13
XML Schema
  • XML Schema is a data definition language for XML
    documents (i.e. it gives the schema definition)
  • XML Schema uses XML syntax
  • it is integrated with the namespace mechanism
  • it provides a number of built-in types
  • it allows us to define complex types
  • it supports key and referential integrity
    constraints
  • it provides a better way than DTDs of specifying
    documents where the order does not matter

14
Structures in XML Schema
  • Built-in simple types include decimal, integer,
    float, boolean, date, string, ID and IDREF
  • Simple types can also include lists and unions
    (either this structure or that)
  • Simple types can be defined by patterns
  • There are constraints (called restrictions) like
    min and max value
  • Can specify enumerated types (just list all the
    valid values)

15
Examples of Simple Type Definitions in XML Schema
  • <xs:simpleType name="uwoStudentNumberType">
  • <xs:restriction base="xs:string">
  • <xs:pattern value="[1-9][0-9]{7}"/>
  • </xs:restriction>
  • </xs:simpleType>
  • <xs:simpleType name="workerAgeType">
  • <xs:restriction base="xs:integer">
  • <xs:minInclusive value="14"/>
  • <xs:maxInclusive value="65"/>
  • </xs:restriction>
  • </xs:simpleType>
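
What these two restrictions accept can be sketched by hand in Python. This is not a schema validator. It assumes the student number pattern is [1-9][0-9]{7} (eight digits, the first one non-zero), and the function names are made up for this example.

```python
import re

# Assumed reading of the xs:pattern facet: eight digits, first one non-zero.
STUDENT_NUMBER = re.compile(r"[1-9][0-9]{7}")
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_student_number(text):
    # An XML Schema pattern must match the whole value, hence fullmatch.
    return STUDENT_NUMBER.fullmatch(text) is not None


def is_worker_age(text):
    # xs:integer restricted by minInclusive 14 and maxInclusive 65.
    text = text.strip()
    if INTEGER.fullmatch(text) is None:
        return False
    return 14 <= int(text) <= 65
```

For example, is_student_number("12345678") holds, while "123456" is too short and "01234567" starts with zero.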

16
More Simple Type Examples
  • <xs:simpleType name="eyeColourType">
  • <xs:restriction base="xs:string">
  • <xs:enumeration value="blue"/>
  • <xs:enumeration value="brown"/>
  • <xs:enumeration value="hazel"/>
  • </xs:restriction>
  • </xs:simpleType>
  • <xs:simpleType name="studentIDsType">
  • <xs:list itemType="uwoStudentNumberType"/>
  • </xs:simpleType>
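
A hand-written sketch of what the enumeration and list types accept, in plain Python rather than a real schema validator. A list type holds whitespace-separated items, each of which must be valid for the item type. The student number pattern [1-9][0-9]{7} is an assumption, and the function names are made up here.

```python
import re

EYE_COLOURS = {"blue", "brown", "hazel"}

# Assumed pattern for uwoStudentNumberType: eight digits, first non-zero.
STUDENT_NUMBER = re.compile(r"[1-9][0-9]{7}")


def is_eye_colour(text):
    # An enumeration accepts exactly the listed values.
    return text in EYE_COLOURS


def parse_student_ids(text):
    """Split a list value on whitespace and check every item.

    Returns the items, or None if any item is not a valid student number.
    """
    items = text.split()
    if all(STUDENT_NUMBER.fullmatch(item) for item in items):
        return items
    return None
```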

17
Element Types and Complex Types
  • Simple elements are declared as follows
  • <element name="PersonName" type="string"/>
  • Attributes are declared as follows
  • <attribute name="age" type="workerAgeType"/>
  • Complex types can consist of a sequence of
    elements (which are ordered), an unordered set
    (specified by the tag all), or a choice
  • None of them is default, so a complex type must
    always say what it is

18
Example Complex Type
  • <xs:complexType name="addressType">
  • <xs:sequence>
  • <xs:choice>
  • <xs:element name="POBox" type="xs:string"/>
  • <xs:sequence>
  • <xs:element name="StNumber"
    type="xs:string"/>
  • <xs:element name="StName"
    type="xs:string"/>
  • </xs:sequence>
  • </xs:choice>
  • <xs:element name="City" type="xs:string"/>
  • </xs:sequence>
  • </xs:complexType>
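
Read as a rule, addressType allows exactly two shapes: a POBox followed by City, or StNumber and StName followed by City. The sketch below checks that by hand with Python's standard xml.etree.ElementTree. It only looks at the order of child tags and is not a substitute for a schema validator; the function name is made up here.

```python
import xml.etree.ElementTree as ET

# The two child sequences permitted by the choice inside the sequence.
ALLOWED_SHAPES = (
    ["POBox", "City"],
    ["StNumber", "StName", "City"],
)


def matches_address_type(address):
    """Return True if the child tags appear in one of the allowed orders."""
    return [child.tag for child in address] in ALLOWED_SHAPES


street = ET.fromstring(
    "<Address><StNumber>414</StNumber><StName>Pine Street</StName>"
    "<City>London</City></Address>"
)
po_box = ET.fromstring(
    "<Address><POBox>Box 255</POBox><City>London</City></Address>"
)
city_first = ET.fromstring(
    "<Address><City>London</City><POBox>Box 255</POBox></Address>"
)
both = ET.fromstring(
    "<Address><POBox>Box 255</POBox><StNumber>414</StNumber>"
    "<StName>Pine Street</StName><City>London</City></Address>"
)
```

Only street and po_box match; city_first violates the sequence order and both violates the choice.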

19
A type with some subelements
  • <xs:complexType name="StudentType">
  • <xs:sequence>
  • <xs:element name="Name" type="xs:string"/>
  • <xs:element name="StudentID"
    type="uwoStudentNumberType"/>
  • <xs:element name="Address" type="addressType"/>
  • </xs:sequence>
  • </xs:complexType>

20
To have lists of them
  • <xs:complexType name="studentListType">
  • <xs:sequence maxOccurs="unbounded">
  • <xs:element name="student" type="StudentType"/>
  • </xs:sequence>
  • </xs:complexType>
  • <xs:element name="studentList"
    type="studentListType"/>

21
Some data using this schema
  • <?xml version="1.0" encoding="UTF-8"?>
  • <studentList
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="Z:\853\exampleschema.xsd">
  • <student>
  • <Name>John Doe</Name>
  • <StudentID>12345678</StudentID>
  • <Address>
  • <StNumber>414</StNumber>
  • <StName>Pine Street</StName>
  • <City>London</City>
  • </Address>
  • </student>
  • <student>
  • <Name>Bob Day</Name>
  • <StudentID>98765432</StudentID>
  • <Address>
  • <POBox>Box 255</POBox>
  • <City>London</City>
  • </Address>
  • </student>
  • </studentList>

22
ID and IDREF
  • An attribute can be declared to be an ID
  • Values of type ID are what XML calls Names
  • These Names must be unique values throughout the
    whole document (i.e. provide a unique ID, given
    with the data)
  • Elements must have at most one ID attribute
  • Values of type IDREF must match the value of an
    ID attribute for some element in the document
  • Names start with a letter, _ or : and can contain
    letters, digits, and other characters (see the
    XML specification)
  • IDs are like user-defined object ids with a
    specific syntax, and an IDREF is a pointer to them
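
The two ID/IDREF rules (unique IDs, and every IDREF pointing at an existing ID) can be sketched as a hand-written check. Python's standard xml.etree.ElementTree does not read attribute types from a DTD or schema, so this sketch is told by name which attribute acts as the ID and which as the IDREF. The document, the attribute names id and advisor, and the function check_ids are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: "id" plays the ID role, "advisor" the IDREF role.
DOC = """
<people>
  <person id="p1"><Name>John Doe</Name></person>
  <person id="p2" advisor="p1"><Name>Bob Day</Name></person>
  <person id="p3" advisor="p9"><Name>Ann Lee</Name></person>
</people>
"""


def check_ids(root, id_attr, idref_attr):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    seen = set()
    duplicates = []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    dangling = []
    for elem in root.iter():
        ref = elem.get(idref_attr)
        if ref is not None and ref not in seen:
            dangling.append(ref)
    return duplicates, dangling


duplicates, dangling = check_ids(ET.fromstring(DOC), "id", "advisor")
```

Here every id is unique, but the advisor value p9 points at no element, so a validating processor would reject the document.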