XML - PowerPoint PPT Presentation

XML has become the basis for all new generation data ... XML-QL, Quilt, XQL, ... Database Design and Implementation, Yan Huang. Tree Model of XML Data ... – PowerPoint PPT presentation

Provided by: Yan174

Transcript and Presenter's Notes

Title: XML


1
XML
2
Introduction
  • XML: Extensible Markup Language
  • Defined by the WWW Consortium (W3C)
  • Originally intended as a document markup language,
    not a database language
  • Documents have tags giving extra information
    about sections of the document
  • E.g. <title> XML </title> <slide> Introduction
    </slide>
  • Derived from SGML (Standard Generalized Markup
    Language), but simpler to use than SGML
  • Extensible, unlike HTML
  • Users can add new tags, and separately specify
    how the tag should be handled for display
  • Goal was (is?) to replace HTML as the language
    for publishing documents on the Web

3
XML Introduction (Cont.)
  • The ability to specify new tags, and to create
    nested tag structures made XML a great way to
    exchange data, not just documents.
  • Much of the use of XML has been in data exchange
    applications, not as a replacement for HTML
  • Tags make data (relatively) self-documenting
      <bank>
        <account>
          <account-number> A-101 </account-number>
          <branch-name> Downtown </branch-name>
          <balance> 500 </balance>
        </account>
        <depositor>
          <account-number> A-101 </account-number>
          <customer-name> Johnson </customer-name>
        </depositor>
      </bank>

4
XML Motivation
  • Data interchange is critical in today's networked
    world
  • Examples:
  • Banking: funds transfer
  • Order processing (especially inter-company
    orders)
  • Scientific data
  • Chemistry: ChemML
  • Genetics: BSML (Bio-Sequence Markup Language)
  • Paper flow of information between organizations
    is being replaced by electronic flow of
    information
  • Each application area has its own set of
    standards for representing information
  • XML has become the basis for all new generation
    data interchange formats

5
XML Motivation (Cont.)
  • Earlier generation formats were based on plain
    text with line headers indicating the meaning of
    fields
  • Similar in concept to email headers
  • Does not allow for nested structures, no standard
    type language
  • Tied too closely to low level document structure
    (lines, spaces, etc)
  • Each XML based standard defines what are valid
    elements, using
  • XML type specification languages to specify the
    syntax
  • DTD (Document Type Definition)
  • XML Schema
  • Plus textual descriptions of the semantics
  • XML allows new tags to be defined as required
  • However, this may be constrained by DTDs
  • A wide variety of tools is available for parsing,
    browsing and querying XML documents/data

6
Structure of XML Data
  • Tag: label for a section of data
  • Element: section of data beginning with <tagname>
    and ending with matching </tagname>
  • Elements must be properly nested
  • Proper nesting:
  • <account> <balance> ... </balance> </account>
  • Improper nesting:
  • <account> <balance> ... </account> </balance>
  • Formally: every start tag must have a unique
    matching end tag that is in the context of the
    same parent element.
  • Every document must have a single top-level
    element
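
The nesting and single-root rules above are exactly what an XML parser checks. A minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative, not part of the slides:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


proper = "<account> <balance> 400 </balance> </account>"
improper = "<account> <balance> 400 </account> </balance>"
two_roots = "<account/> <account/>"  # breaks the single top-level element rule

print(is_well_formed(proper))
print(is_well_formed(improper))
print(is_well_formed(two_roots))
```

Well-formedness is independent of any schema: the parser only checks tag matching and the single top-level element, not which tags are allowed.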

7
Example of Nested Elements
      <bank-1>
        <customer>
          <customer-name> Hayes </customer-name>
          <customer-street> Main </customer-street>
          <customer-city> Harrison </customer-city>
          <account>
            <account-number> A-102 </account-number>
            <branch-name> Perryridge </branch-name>
            <balance> 400 </balance>
          </account>
          <account>
            ...
          </account>
        </customer>
        ...
      </bank-1>

8
Motivation for Nesting
  • Nesting of data is useful in data transfer
  • Example: elements representing customer-id,
    customer name, and address nested within an order
    element
  • Nesting is not supported, or discouraged, in
    relational databases
  • With multiple orders, customer name and address
    are stored redundantly
  • Normalization replaces nested structures in each
    order by a foreign key into a table storing customer
    name and address information
  • Nesting is supported in object-relational
    databases
  • But nesting is appropriate when transferring data
  • External application does not have direct access
    to data referenced by a foreign key

9
Structure of XML Data (Cont.)
  • Mixture of text with sub-elements is legal in
    XML.
  • Example:
      <account>
        This account is seldom used any more.
        <account-number> A-102 </account-number>
        <branch-name> Perryridge </branch-name>
        <balance> 400 </balance>
      </account>
  • Useful for document markup, but discouraged for
    data representation
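
To see how a parser exposes mixed content, here is a small sketch with Python's standard xml.etree.ElementTree, using the account example from this slide. The variable names are illustrative:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<account>This account is seldom used any more."
    "<account-number>A-102</account-number>"
    "<branch-name>Perryridge</branch-name>"
    "<balance>400</balance></account>"
)

# Text that precedes the first subelement is stored on the parent element.
leading_text = doc.text
# Subelements keep their document order.
child_tags = [child.tag for child in doc]

print(leading_text)
print(child_tags)
```

The free text and the structured fields end up in different places in the parsed object, which is one reason mixed content is awkward for data representation.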

10
Attributes
  • Elements can have attributes
      <account acct-type = "checking">
        <account-number> A-102 </account-number>
        <branch-name> Perryridge </branch-name>
        <balance> 400 </balance>
      </account>
  • Attributes are specified by name=value pairs
    inside the starting tag of an element
  • An element may have several attributes, but each
    attribute name can only occur once
      <account acct-type = "checking" monthly-fee="5">
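
A short sketch of both attribute rules, again with Python's standard xml.etree.ElementTree. The second document, which repeats acct-type, is a made-up counterexample:

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking" monthly-fee="5">'
    "<account-number>A-102</account-number>"
    "</account>"
)
# Attributes arrive as a name -> value mapping; all values are strings.
attrs = dict(account.attrib)
print(attrs)

# Repeating an attribute name inside one start tag is rejected by the parser.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```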

11
Attributes Vs. Subelements
  • Distinction between subelement and attribute
  • In the context of documents, attributes are part
    of markup, while subelement contents are part of
    the basic document contents
  • In the context of data representation, the
    difference is unclear and may be confusing
  • Same information can be represented in two ways:
  • <account account-number = "A-101"> ... </account>
  • <account> <account-number>A-101</account-number>
    </account>
  • Suggestion: use attributes for identifiers of
    elements, and use subelements for contents

12
More on XML Syntax
  • Elements without subelements or text content can
    be abbreviated by ending the start tag with a />
    and deleting the end tag
  • <account number="A-101" branch="Perryridge"
    balance="200" />
  • To store string data that may contain tags,
    without the tags being interpreted as
    subelements, use CDATA as below
  • <![CDATA[<account> ... </account>]]>
  • Here, <account> and </account> are treated as
    just strings
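
Both shorthand forms can be checked with Python's standard xml.etree.ElementTree. In this sketch the wrapping <note> element is hypothetical, added only because a CDATA section has to live inside some element:

```python
import xml.etree.ElementTree as ET

# CDATA: the markup inside is delivered as plain characters.
note = ET.fromstring("<note><![CDATA[<account> ... </account>]]></note>")
cdata_text = note.text
cdata_children = len(note)  # number of subelements created

# Empty-element shorthand versus an explicit start/end tag pair.
short = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200" />')
long_form = ET.fromstring(
    '<account number="A-101" branch="Perryridge" balance="200"></account>'
)
same_attributes = short.attrib == long_form.attrib

print(cdata_text, cdata_children, same_attributes)
```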

13
Namespaces
  • XML data has to be exchanged between
    organizations
  • Same tag name may have different meaning in
    different organizations, causing confusion on
    exchanged documents
  • Specifying a unique string as an element name
    avoids confusion
  • Better solution: use unique-name:element-name
  • Avoid using long unique names all over document
    by using XML Namespaces
      <bank xmlns:FB="http://www.FirstBank.com">
        <FB:branch>
          <FB:branchname> Downtown </FB:branchname>
          <FB:branchcity> Brooklyn </FB:branchcity>
        </FB:branch>
      </bank>
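
A sketch of how a namespace-aware parser treats the FB prefix, using Python's standard xml.etree.ElementTree. The prefix "first" in the lookup is an arbitrary choice made here to show that only the URI matters:

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring(
    '<bank xmlns:FB="http://www.FirstBank.com">'
    "<FB:branch>"
    "<FB:branchname>Downtown</FB:branchname>"
    "<FB:branchcity>Brooklyn</FB:branchcity>"
    "</FB:branch>"
    "</bank>"
)

# ElementTree expands each prefixed name to "{namespace-URI}local-name".
expanded_tag = bank[0].tag

# The prefix is only a local shorthand; a lookup may bind any prefix to the URI.
ns = {"first": "http://www.FirstBank.com"}
branch_name = bank.find("first:branch/first:branchname", ns).text

print(expanded_tag)
print(branch_name)
```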

14
XML Document Schema
  • Database schemas constrain what information can
    be stored, and the data types of stored values
  • XML documents are not required to have an
    associated schema
  • However, schemas are very important for XML data
    exchange
  • Otherwise, a site cannot automatically interpret
    data received from another site
  • Two mechanisms for specifying XML schema
  • Document Type Definition (DTD)
  • Widely used
  • XML Schema
  • Newer, increasing use

15
Document Type Definition (DTD)
  • The type of an XML document can be specified
    using a DTD
  • A DTD constrains the structure of XML data
  • What elements can occur
  • What attributes can/must an element have
  • What subelements can/must occur inside each
    element, and how many times.
  • DTD does not constrain data types
  • All values represented as strings in XML
  • DTD syntax:
  • <!ELEMENT element (subelements-specification) >
  • <!ATTLIST element (attributes) >

16
Element Specification in DTD
  • Subelements can be specified as
  • names of elements, or
  • #PCDATA (parsed character data), i.e., character
    strings
  • EMPTY (no subelements) or ANY (anything can be a
    subelement)
  • Example:
  • <!ELEMENT depositor (customer-name, account-number)>
  • <!ELEMENT customer-name (#PCDATA)>
  • <!ELEMENT account-number (#PCDATA)>
  • Subelement specification may have regular
    expressions
  • <!ELEMENT bank ( ( account | customer | depositor)+)>
  • Notation:
  • "|" - alternatives
  • "+" - 1 or more occurrences
  • "*" - 0 or more occurrences
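
Because a subelement specification is a regular expression over element names, it can be checked with an ordinary regex engine. The following is a rough sketch, not a DTD validator: it handles only content models made of element names and the operators listed above (no #PCDATA, EMPTY or ANY), and the function names are invented for this illustration:

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD content model of element names into a Python regex.

    Each name becomes a group matching "name;". The operators | + * ? ( )
    mean the same in regex syntax, and "," (sequence) is plain
    concatenation, so commas and whitespace are dropped.
    """
    grouped = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    return re.compile(re.sub(r"[\s,]+", "", grouped))


def children_match(element: ET.Element, model: str) -> bool:
    """Check the sequence of child element names against the content model."""
    child_names = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None


depositor = ET.fromstring(
    "<depositor><customer-name>Johnson</customer-name>"
    "<account-number>A-101</account-number></depositor>"
)
swapped = ET.fromstring(
    "<depositor><account-number>A-101</account-number>"
    "<customer-name>Johnson</customer-name></depositor>"
)
in_order = children_match(depositor, "(customer-name, account-number)")
out_of_order = children_match(swapped, "(customer-name, account-number)")
empty_bank = children_match(
    ET.fromstring("<bank/>"), "((account | customer | depositor)+)"
)
print(in_order, out_of_order, empty_bank)
```

The empty bank fails because "+" demands at least one occurrence; with "*" it would be accepted.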

17
Bank DTD
      <!DOCTYPE bank [
        <!ELEMENT bank ( ( account | customer | depositor)+)>
        <!ELEMENT account (account-number, branch-name, balance)>
        <!ELEMENT customer (customer-name, customer-street, customer-city)>
        <!ELEMENT depositor (customer-name, account-number)>
        <!ELEMENT account-number (#PCDATA)>
        <!ELEMENT branch-name (#PCDATA)>
        <!ELEMENT balance (#PCDATA)>
        <!ELEMENT customer-name (#PCDATA)>
        <!ELEMENT customer-street (#PCDATA)>
        <!ELEMENT customer-city (#PCDATA)>
      ]>

18
Attribute Specification in DTD
  • Attribute specification: for each attribute
  • Name
  • Type of attribute
  • CDATA
  • ID (identifier) or IDREF (ID reference) or IDREFS
    (multiple IDREFs)
  • more on this later
  • Whether
  • mandatory (#REQUIRED)
  • has a default value (value),
  • or neither (#IMPLIED)
  • Examples:
  • <!ATTLIST account acct-type CDATA "checking">
  • <!ATTLIST customer
        customer-id ID     #REQUIRED
        accounts    IDREFS #REQUIRED >

19
IDs and IDREFs
  • An element can have at most one attribute of type
    ID
  • The ID attribute value of each element in an XML
    document must be distinct
  • Thus the ID attribute value is an object
    identifier
  • An attribute of type IDREF must contain the ID
    value of an element in the same document
  • An attribute of type IDREFS contains a set of (0
    or more) ID values. Each of them must be the ID
    value of an element in the same document

20
Bank DTD with Attributes
  • Bank DTD with ID and IDREF attribute types.
      <!DOCTYPE bank-2 [
        <!ELEMENT account (branch, balance)>
        <!ATTLIST account
            account-number ID     #REQUIRED
            owners         IDREFS #REQUIRED>
        <!ELEMENT customer (customer-name, customer-street,
                            customer-city)>
        <!ATTLIST customer
            customer-id ID     #REQUIRED
            accounts    IDREFS #REQUIRED>
        ... declarations for branch, balance, customer-name,
            customer-street and customer-city
      ]>

21
XML data with ID and IDREF attributes
      <bank-2>
        <account account-number="A-401" owners="C100 C102">
          <branch-name> Downtown </branch-name>
          <balance> 500 </balance>
        </account>
        <customer customer-id="C100" accounts="A-401">
          <customer-name> Joe </customer-name>
          <customer-street> Monroe </customer-street>
          <customer-city> Madison </customer-city>
        </customer>
        <customer customer-id="C102" accounts="A-401 A-402">
          <customer-name> Mary </customer-name>
          <customer-street> Erin </customer-street>
          <customer-city> Newark </customer-city>
        </customer>
      </bank-2>
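
The ID and IDREFS rules can be checked by hand when no validating parser is available. The sketch below uses Python's standard xml.etree.ElementTree, which ignores DTDs, so the mapping from element name to ID and IDREFS attribute is hard-coded here as an assumption taken from the bank-2 DTD. The data is a shortened copy of this slide; since the slide shows no account A-402, that reference is reported as dangling:

```python
import xml.etree.ElementTree as ET

BANK_2 = """
<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401 A-402">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>
"""

# Assumed from the bank-2 DTD: the ID and IDREFS attribute of each element type.
ID_ATTR = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}


def dangling_references(root):
    """Return (referencing element's ID, missing ID) pairs."""
    ids = [e.get(ID_ATTR[e.tag]) for e in root.iter() if e.tag in ID_ATTR]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be distinct within the document")
    known = set(ids)
    missing = []
    for e in root.iter():
        if e.tag in IDREFS_ATTR:
            # An IDREFS value is a whitespace-separated list of IDs.
            for ref in e.get(IDREFS_ATTR[e.tag], "").split():
                if ref not in known:
                    missing.append((e.get(ID_ATTR[e.tag]), ref))
    return missing


missing = dangling_references(ET.fromstring(BANK_2))
print(missing)
```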

22
XML Schema
  • XML Schema is a more sophisticated schema
    language which addresses the drawbacks of DTDs.
    Supports
  • Typing of values
  • E.g. integer, string, etc
  • Also, constraints on min/max values
  • User defined types
  • Is itself specified in XML syntax, unlike DTDs
  • More standard representation, but verbose
  • Is integrated with namespaces
  • Many more features
  • List types, uniqueness and foreign key
    constraints, inheritance, ...
  • BUT significantly more complicated than DTDs,
    not yet widely used.

23
Querying and Transforming XML Data
  • Translation of information from one XML schema to
    another
  • Querying on XML data
  • Above two are closely related, and handled by the
    same tools
  • Standard XML querying/translation languages
  • XPath
  • Simple language consisting of path expressions
  • XSLT
  • Simple language designed for translation from XML
    to XML and XML to HTML
  • XQuery
  • An XML query language with a rich set of features
  • A wide variety of other languages have been
    proposed, and some served as a basis for the XQuery
    standard
  • XML-QL, Quilt, XQL, ...

24
Tree Model of XML Data
  • Query and transformation languages are based on a
    tree model of XML data
  • An XML document is modeled as a tree, with nodes
    corresponding to elements and attributes
  • Element nodes have children nodes, which can be
    attributes or subelements
  • Text in an element is modeled as a text node
    child of the element
  • Children of a node are ordered according to their
    order in the XML document
  • Element and attribute nodes (except for the root
    node) have a single parent, which is an element
    node
  • The root node has a single child, which is the
    root element of the document
  • We use the terminology of nodes, children,
    parent, siblings, ancestor, descendant, etc.,
    which should be interpreted in the above tree
    model of XML data.
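
The tree model can be inspected directly with Python's standard xml.dom.minidom. This is a sketch on a made-up two-element document. One caveat: the DOM keeps attributes outside the child list, whereas the model above counts them as children, so the attribute is fetched separately here:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<account acct-type="checking"><balance>400</balance></account>'
)

root_children = doc.childNodes        # children of the root (document) node
root_element = doc.documentElement    # the <account> element
balance = root_element.firstChild     # the <balance> subelement
text_node = balance.firstChild        # the text "400" as a text node

print(len(root_children), root_element.tagName)
print(balance.parentNode.tagName)
print(text_node.nodeType == text_node.TEXT_NODE, text_node.data)
print(root_element.getAttributeNode("acct-type").value)
```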

25
XPATH
  • XPath is used to address (select) parts of
    documents using path expressions
  • A path expression is a sequence of steps
    separated by /
  • Think of file names in a directory hierarchy
  • Result of a path expression: the set of values that,
    along with their containing elements/attributes,
    match the specified path

26
XPATH
  • E.g. /regionSet/region/COUNTRIES/COUNTRIES_ITEM/COUNTRY_NAME
    evaluated on the regions data returns
      ...
      <COUNTRY_NAME>Sudan</COUNTRY_NAME>
      <COUNTRY_NAME>Swaziland</COUNTRY_NAME>
      ...
      <COUNTRY_NAME>South Georgia and the South
      Sandwich Islands</COUNTRY_NAME>
      ...

27
XPath Example
  • Sample XML
  • The root element
  • /STATES
  • The SCODE of all STATE elements of STATES element
  • /STATES/STATE/SCODE
  • All the CAPITAL elements with a CNAME sub-element
    equal to Atlanta, under the STATE elements of the
    STATES element
  • /STATES/STATE/CAPITAL[CNAME='Atlanta']
  • All CITIES elements in the XML document
  • //CITIES
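
These path expressions can be tried with Python's standard xml.etree.ElementTree, which implements a subset of XPath. The document below is hypothetical sample data written for this sketch, since the slide's own sample file is not reproduced in the transcript. ElementTree paths are relative to the element they are called on, so the leading /STATES becomes "." evaluated at the root element:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (not the presenter's file).
STATES = ET.fromstring("""
<STATES>
  <STATE>
    <SCODE>GA</SCODE>
    <CAPITAL><CNAME>Atlanta</CNAME></CAPITAL>
    <CITIES><CITY>Savannah</CITY></CITIES>
  </STATE>
  <STATE>
    <SCODE>TX</SCODE>
    <CAPITAL><CNAME>Austin</CNAME></CAPITAL>
    <CITIES><CITY>Houston</CITY></CITIES>
  </STATE>
</STATES>
""")

# /STATES/STATE/SCODE
scodes = [e.text for e in STATES.findall("./STATE/SCODE")]
# /STATES/STATE/CAPITAL[CNAME='Atlanta']
atlanta = STATES.findall("./STATE/CAPITAL[CNAME='Atlanta']")
# //CITIES
cities = STATES.findall(".//CITIES")

print(scodes, len(atlanta), len(cities))
```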

28
More XPath Example
  • Element AA with two ancestors
  • /*/*/AA
  • First BB element of AA element
  • /AA/BB[1]
  • All the CC elements of the BB elements which have
    a sub-element A with value 3
  • /BB[A='3']/CC
  • Any elements AA or elements CC of elements BB
  • //AA | /BB/CC
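
A sketch of the same four patterns with Python's standard xml.etree.ElementTree, on a hypothetical document whose root is called R. ElementTree evaluates paths relative to an element and has no "|" operator, so the paths are rewritten relative to the root and the union is emulated by concatenating two result lists:

```python
import xml.etree.ElementTree as ET

# Hypothetical document built to exercise the patterns above.
R = ET.fromstring("""
<R>
  <X>
    <AA><BB>first</BB><BB>second</BB></AA>
  </X>
  <BB><A>3</A><CC>kept</CC></BB>
  <BB><A>4</A><CC>dropped</CC></BB>
</R>
""")

# /*/*/AA : AA with two ancestors (the root element is the first *).
aa_depth3 = R.findall("./*/AA")
# BB[1] : the first BB child of each AA.
first_bb = [e.text for e in R.findall(".//AA/BB[1]")]
# BB[A='3']/CC : CC children of BB elements whose A child has value 3.
cc_of_bb3 = [e.text for e in R.findall("./BB[A='3']/CC")]
# Union of two paths, emulated by list concatenation.
union_tags = [e.tag for e in R.findall(".//AA") + R.findall("./BB/CC")]

print(len(aa_depth3), first_bb, cc_of_bb3, union_tags)
```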

29
Even More XPath Example
  • Select all sub-elements of elements AA of
    elements BB
  • /BB/AA/*
  • When you do not know the sub-elements
  • Different from /BB/AA
  • Select all attributes named aa
  • //@aa
  • Select all CITIES elements with an attribute
    named aa
  • //CITIES[@aa]
  • Select all CITIES elements with an attribute
    named aa with value 123
  • //CITIES[@aa='123']
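
A sketch of the wildcard and attribute tests with Python's standard xml.etree.ElementTree, on a hypothetical document. ElementTree can filter elements by attribute but cannot return attribute nodes, so //@aa is emulated by collecting attribute values:

```python
import xml.etree.ElementTree as ET

# Hypothetical document built to exercise the expressions above.
root = ET.fromstring("""
<BB>
  <AA><X/><Y/></AA>
  <CITIES aa="123"/>
  <CITIES aa="456"/>
  <CITIES/>
</BB>
""")

aa_elements = [e.tag for e in root.findall("./AA")]    # /BB/AA
aa_children = [e.tag for e in root.findall("./AA/*")]  # /BB/AA/*
with_aa = root.findall(".//CITIES[@aa]")               # //CITIES[@aa]
with_aa_123 = root.findall(".//CITIES[@aa='123']")     # //CITIES[@aa='123']

# //@aa, emulated: the value of aa on every element that carries it.
all_aa_values = [e.get("aa") for e in root.iter() if "aa" in e.attrib]

print(aa_elements, aa_children, len(with_aa), len(with_aa_123), all_aa_values)
```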

30
Axis
  • Context node
  • Evaluation of XPath is from left to right
  • The context node: the current node (set) being
    evaluated
  • Axis
  • Specifies the relationship of the resulting
    nodes relative to the context node
  • Example:
  • /child::AA: the AA children of the context node,
    abbreviated by /AA
  • //AA/ancestor::BB: BB elements that are ancestors
    of any AA elements

31
Axes
  • ancestor: //BBB/ancestor::*
      <AAA>
        <BBB/>
        <CCC/>
        <BBB/>
        <BBB/>
        <DDD>
          <BBB/>
        </DDD>
        <CCC/>
      </AAA>

32
Axes
  • ancestor: //BBB/ancestor::DDD
      <AAA>
        <BBB/>
        <CCC/>
        <BBB/>
        <BBB/>
        <DDD>
          <BBB/>
        </DDD>
        <CCC/>
      </AAA>
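
Python's standard xml.etree.ElementTree has no ancestor axis, but the axis is easy to emulate once a child-to-parent map exists. A sketch on the sample tree above; the helper names ancestors and ancestor_axis are invented for this illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
)

# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Yield the ancestors of `node`, nearest first."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def ancestor_axis(context_nodes, name=None):
    """Emulate context/ancestor::name; name=None plays the role of *."""
    found = []
    for node in context_nodes:
        for anc in ancestors(node):
            if (name is None or anc.tag == name) and anc not in found:
                found.append(anc)
    return found


bbbs = root.findall(".//BBB")
all_ancestors = sorted(e.tag for e in ancestor_axis(bbbs))         # //BBB/ancestor::*
ddd_ancestors = sorted(e.tag for e in ancestor_axis(bbbs, "DDD"))  # //BBB/ancestor::DDD

print(all_ancestors, ddd_ancestors)
```

The result is a set of nodes, so AAA appears once even though it is an ancestor of all four BBB elements.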

33
Axes
  • attribute: contains all attributes of the current
    node
  • //BBB/attribute::* abbreviated by //BBB/@*
      <AAA>
        <BBB aa="1"/>
        <CCC/>
        <BBB aa="2"/>
        <BBB aa="3"/>
        <DDD>
          <BBB bb="31"/>
        </DDD>
        <CCC/>
      </AAA>
  • //BBB/attribute::bb
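
The attribute axis returns attribute nodes rather than elements. Python's standard xml.etree.ElementTree does not model attribute nodes, so this sketch emulates the two expressions above by collecting (name, value) pairs from the sample tree:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<AAA><BBB aa="1"/><CCC/><BBB aa="2"/><BBB aa="3"/>'
    '<DDD><BBB bb="31"/></DDD><CCC/></AAA>'
)

# //BBB/attribute::*  -> every attribute of every BBB element
all_bbb_attributes = [
    (name, value) for e in root.iter("BBB") for name, value in e.attrib.items()
]
# //BBB/attribute::bb -> only the attributes named bb
bb_attributes = [e.get("bb") for e in root.iter("BBB") if "bb" in e.attrib]

print(all_bbb_attributes)
print(bb_attributes)
```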

34
Axes
  • child
  • /AAA/DDD/child::BBB (child:: can be omitted for
    abbreviation)
      <AAA>
        <BBB/>
        <CCC/>
        <BBB/>
        <BBB/>
        <DDD>
          <BBB/>
        </DDD>
        <CCC/>
      </AAA>

35
Axes
  • descendant
  • /AAA/descendant::*
      <AAA>
        <BBB/>
        <CCC/>
        <BBB/>
        <BBB/>
        <DDD>
          <BBB/>
        </DDD>
        <CCC/>
      </AAA>
  • /AAA/descendant::CCC ?

36
Axes
  • parent
  • //BBB/parent::*
      <AAA>
        <BBB/>
        <CCC/>
        <BBB/>
        <BBB/>
        <DDD>
          <BBB/>
        </DDD>
        <CCC/>
      </AAA>
  • //BBB/parent::DDD ?
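
One way to work out the descendant and parent expressions on the last two slides is to emulate both axes with Python's standard xml.etree.ElementTree. This is a sketch on the same sample tree; iter() walks an element and everything below it, and the parent axis needs a child-to-parent map because ElementTree keeps no parent pointers:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
)

# /AAA/descendant::* : iter() also yields AAA itself, so skip it.
descendants = [e.tag for e in root.iter() if e is not root]
# /AAA/descendant::CCC
ccc_descendants = [e for e in root.iter("CCC") if e is not root]

# //BBB/parent::* : distinct parents of all BBB elements.
parent_of = {child: parent for parent in root.iter() for child in parent}
bbb_parents = []
for bbb in root.iter("BBB"):
    parent = parent_of[bbb]
    if parent not in bbb_parents:
        bbb_parents.append(parent)
parent_tags = [p.tag for p in bbb_parents]
# //BBB/parent::DDD : keep only the parents named DDD.
ddd_parent_tags = [p.tag for p in bbb_parents if p.tag == "DDD"]

print(descendants)
print(len(ccc_descendants), parent_tags, ddd_parent_tags)
```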

37
Axes
  • descendant-or-self
  • following
  • following-sibling
  • preceding
  • preceding-sibling
  • self

38
Predicates
  • Filters an element set
  • A predicate is placed inside square brackets
    ( [ ] )
  • Example: //BBB[position() mod 2 = 0]
      <AAA>
        <BBB/>
        <BBB/>
        <BBB/>
        <BBB/>
        <BBB/>
        <BBB/>
        <BBB/>
        <BBB/>
        <CCC/>
        <CCC/>
        <CCC/>
      </AAA>
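
Python's standard xml.etree.ElementTree does not support position() arithmetic, so the predicate in this example is emulated by hand below. The helper even_positioned is invented for this sketch:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<AAA>" + "<BBB/>" * 8 + "<CCC/>" * 3 + "</AAA>")


def even_positioned(tree_root, name):
    """Emulate //name[position() mod 2 = 0].

    position() counts from 1 among the `name` children of each parent,
    so the filter is applied per parent, not across the whole document.
    """
    selected = []
    for parent in tree_root.iter():
        same_name = [c for c in parent if c.tag == name]
        selected.extend(
            c for pos, c in enumerate(same_name, start=1) if pos % 2 == 0
        )
    return selected


even_bbb = even_positioned(root, "BBB")
print(len(even_bbb))
```

On the sample tree this keeps the second, fourth, sixth and eighth BBB elements.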

39
Predicates
  • //BBB[@aa='31']
      <AAA>
        <BBB aa="1"/>
        <CCC/>
        <BBB aa="2"/>
        <BBB aa="3"/>
        <DDD>
          <BBB bb="31"/>
        </DDD>
        <CCC/>
      </AAA>
  • Is it different from //BBB/attribute::bb?