XML - PowerPoint PPT Presentation

About This Presentation
Title:

XML

Slides: 21
Provided by: nikisa
Category:
Tags: xml | dom

Transcript and Presenter's Notes

1
XML
  • Name: Niki Sardjono
  • Class: CS 157A
  • Instructor: Prof. S. M. Lee

2
Introduction
  • XML stands for Extensible Markup Language
  • Its roots are in document management, and it is
    derived from the Standard Generalized Markup
    Language (SGML).
  • XML can represent database data and other kinds
    of structured data.

3
Background
  • XML's roots are in document markup languages.
  • Markup refers to anything in a document that is
    not meant to be part of the printed output.
  • For the family of markup languages (HTML, SGML,
    and XML), the markup takes the form of tags
    enclosed in angle brackets, <>. Tags are used in
    pairs, with <tag> and </tag> marking the beginning
    and end of the portion of the document to which
    the tag refers.
  • Example: <title> Database </title>

4
  • Unlike HTML, however, XML does not prescribe the
    set of allowed tags; tags can be specialized
    as needed.
  • Compared with storing data in a database, XML can
    be inefficient, since tag names are repeated
    throughout the document. However, XML has
    advantages when it is used to exchange data:
  • - The presence of tags makes a message
    self-documenting (a schema does not need to be
    consulted to understand the meaning of the text).
  • - The format of the document is not rigid.
  • - The XML format is widely accepted.
  • In this sense, XML is becoming a dominant
    format for data exchange.

5
Structure of XML Data
  • The fundamental construct in an XML document is
    the element (a pair of matching start- and
    end-tags and the text between them).
  • An XML document must have a single root element
    that encompasses all other elements in the
    document. Example: <account> <balance>
    </balance> </account>
  • Text is said to appear in the context of an
    element if it appears between the start-tag and
    end-tag of that element. Tags are properly
    nested if every start-tag has a unique matching
    end-tag that is in the context of the same parent
    element.
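
The single-root and proper-nesting rules can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Well-formed: one root element, tags properly nested.
well_formed = "<account><balance>400</balance></account>"

# Badly nested: <balance> is closed after its parent <account>.
badly_nested = "<account><balance>400</account></balance>"

# Two top-level elements, so there is no single root element.
two_roots = "<account></account><account></account>"


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(well_formed)
# Text that appears in the context of the <balance> element.
balance_text = root.find("balance").text
```

Only the first string parses; the other two are rejected by the parser before any application code sees the data.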

6
  • Nested representations are widely used in XML
    data interchange applications to avoid joins
  • XML specifies the notion of an attribute.
  • Attribute values are strings that do not contain
    markup, and a given attribute can appear only
    once in a given tag.

7
  • Example:
    <account acct-type = "checking">
      <account-number> A-120 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
    </account>
  • A namespace mechanism has been introduced in XML
    to allow organizations to specify globally unique
    names to be used as element tags.
  • The idea is to prepend each tag or attribute with
    a universal resource identifier (for example, a
    Web address). Because long identifiers would be
    inconvenient, the namespace standard provides a
    way to define an abbreviation for them.

8
  • Example: <bank xmlns:FB="http://www.FirstBank.com">
    <FB:branch>
  • ...
  • </FB:branch>
  • We can use a default namespace in the example
    above by using the attribute xmlns instead of
    xmlns:FB in the root element.
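
How a parser treats the abbreviation can be seen in a short sketch with Python's standard xml.etree.ElementTree module. The branchname element and its text are made up for illustration; the slides elide the content of the branch element.

```python
import xml.etree.ElementTree as ET

# Prefixed namespace: only tags written with the FB: prefix belong to it.
prefixed_doc = """
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
  </FB:branch>
</bank>
"""

root = ET.fromstring(prefixed_doc)
# ElementTree replaces the prefix with the full identifier: "{URI}localname".
expanded_tag = root[0].tag

# A prefix map lets path expressions keep using the short abbreviation.
ns = {"FB": "http://www.FirstBank.com"}
branch_name = root.find("FB:branch/FB:branchname", ns).text

# Default namespace: xmlns without a prefix applies to the root element
# and to every unprefixed element below it.
default_doc = """
<bank xmlns="http://www.FirstBank.com">
  <branch><branchname>Downtown</branchname></branch>
</bank>
"""
default_root = ET.fromstring(default_doc)
```

In the first document the unprefixed bank tag stays outside the namespace, while in the second every tag is expanded with the identifier.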

9
XML Document Schema: Document Type Definition
  • A DTD (Document Type Definition) is an optional
    part of an XML document.
  • The main purpose of a DTD:
  • To constrain and type the information present in
    the document. However, a DTD only constrains the
    appearance of subelements and attributes within
    an element.
  • A DTD is a list of rules for what pattern of
    subelements may appear within an element.
  • Operators used are:
  • + specifies one or more
  • | specifies or
  • * specifies zero or more
  • ? specifies optional elements
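
The four operators behave like their regular-expression counterparts applied to the sequence of child element names. The sketch below illustrates this in Python; it is not a DTD validator (the standard library does not include one), and the two content models and the helper children_match are hypothetical examples written for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, hand-translated into regular expressions
# over the space-terminated sequence of child tag names.
CONTENT_MODELS = {
    # <!ELEMENT bank ((account | customer)+)>
    "bank": r"(account |customer )+",
    # <!ELEMENT account (account-number, branch-name?, balance*)>
    "account": r"account-number (branch-name )?(balance )*",
}


def children_match(element: ET.Element) -> bool:
    """Check an element's direct children against its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no rule declared for this tag in the sketch
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, sequence) is not None


good = ET.fromstring(
    "<bank><account><account-number>A-120</account-number>"
    "<balance>400</balance></account><customer/></bank>"
)
empty_bank = ET.fromstring("<bank/>")
bad_account = ET.fromstring(
    "<account><account-number>A-1</account-number>"
    "<account-number>A-2</account-number></account>"
)
```

Here the empty bank fails because + requires at least one child, and the second account fails because account-number is not followed by * or +.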

10
  • Attributes can be declared with several types,
    such as:
  • CDATA: character data.
  • ID: a unique identifier for the element.
  • IDREF: a reference to an element; its value must
    appear in the ID attribute of some element in
    the document.
  • IDREFS: a list of such references.
  • Limitations of DTDs as a schema mechanism:
  • Individual text elements and attributes cannot be
    further typed, which is quite problematic for
    data processing and exchange applications.
  • It is difficult to use a DTD to specify unordered
    sets of subelements.
  • ID and IDREF are untyped, so it is impossible to
    specify the type of element to which an IDREF or
    IDREFS attribute should refer.
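
A small sketch of how an application might follow IDREFS values by hand, using Python's standard xml.etree.ElementTree module. The attribute names customer-id and owners and the sample data are assumptions made for this illustration. Without a DTD the parser does not know which attributes are IDs, so the index is built explicitly.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <customer customer-id="C100"><name>Joe</name></customer>
  <customer customer-id="C102"><name>Mary</name></customer>
</bank-2>
"""

root = ET.fromstring(doc)

# Index the elements that carry the ID-like attribute.
by_id = {c.get("customer-id"): c for c in root.findall("customer")}

# An IDREFS value is a space-separated list of identifiers.
account = root.find("account")
owner_names = [by_id[ref].findtext("name") for ref in account.get("owners").split()]
```

Nothing here stops owners from pointing at a non-customer element, which is the lack of typing noted in the limitations above.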

11
XMLSchema
  • XMLSchema is a more sophisticated schema language
    compared to DTD.
  • Benefits compared to DTDs:
  • Allows user-defined types to be created.
  • Allows the text that appears in elements to be
    constrained to specific types.
  • Allows types to be restricted to create
    specialized types, for instance by specifying min
    and max values.
  • Allows complex types to be extended by using a
    form of inheritance.
  • Is a superset of DTDs.
  • Allows uniqueness and foreign key constraints.
  • Is integrated with namespaces to allow
    different parts of a document to conform to
    different schemas.
  • Is itself specified in XML syntax.
  • Its disadvantage is that XMLSchema is
    significantly more complicated than DTDs.

12
Querying and Transformation
  • Querying and transformation are essential for
    extracting information from large bodies of XML
    data and for converting data between different
    representations (schemas) in XML.
  • Several languages provide increasing degrees of
    querying and transformation capabilities:
  • XPath is a language for path expressions, and is
    a building block for the remaining two query
    languages.
  • XSLT is the transformation language (part of the
    XSL style sheet system, used to control the
    formatting of XML data into HTML or other
    formats). It can also generate XML as output.
  • XQuery is the standard for querying XML data.
  • All of these languages use the tree model of XML
    data, where nodes correspond to elements and
    attributes.

13
XPath
  • A path expression in XPath is a sequence of
    location steps separated by "/". Example:
    /bank-2/customer/name/text()
  • This resembles a directory path: the initial "/"
    indicates the root of the document, an abstract
    root above the top-level <bank-2> tag. Path
    expressions are evaluated from left to right.
  • If an element name appears before the next "/",
    it refers to all the elements of the
    specified name that are children of elements in
    the current element set. Attributes can also be
    accessed by using the character "@". Example:
    /bank-2/account/@account-number returns
    the set of all values of the account-number
    attributes of account elements. By default,
    however, IDREF links are not followed.

14
  • XPath supports a number of other features:
  • Selection predicates may follow any step in a
    path and are contained in square brackets.
    Example: /bank-2/account[balance > 400]
  • XPath provides several functions that can be used
    as part of predicates, including testing the
    position of the current node in the sibling order
    and counting the number of nodes matched. Example:
    /bank-2/account[count(./customer) > 2]
  • The function id("foo") returns the node (if any)
    with an attribute of type ID and value "foo".
  • The operator "|" allows expression results to be
    unioned. For example, /bank-2/account/id(@owner)
    | /bank-2/loan/id(@borrower) returns
    customers with either accounts or loans. However,
    the "|" operator cannot be nested inside other
    operators.
  • Multiple levels of nodes can be skipped by using
    "//".
  • A step need not select from the children of
    the nodes in the current node set; it may also
    move to parents, siblings, ancestors, or
    descendants.
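
Several of the path expressions above can be tried with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. In particular it does not evaluate numeric comparisons inside predicates, so the sketch below applies that filter in Python instead. The sample document is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401">
    <balance>500</balance>
  </account>
  <account account-number="A-402">
    <balance>300</balance>
  </account>
  <customer>
    <name>Joe</name>
  </customer>
</bank-2>
"""

# fromstring returns the <bank-2> element, so paths are relative to it.
root = ET.fromstring(doc)

# /bank-2/customer/name/text()
names = [n.text for n in root.findall("customer/name")]

# /bank-2/account/@account-number
numbers = [a.get("account-number") for a in root.findall("account")]

# /bank-2/account[balance > 400], with the predicate applied in Python
large = [
    a.get("account-number")
    for a in root.findall("account")
    if float(a.findtext("balance")) > 400
]

# //balance: skip any number of levels (written ".//" in ElementTree)
balances = [b.text for b in root.findall(".//balance")]
```

A full XPath processor would evaluate all four expressions directly; the comments show the expression each line stands in for.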

15
XSLT
  • The Extensible Stylesheet Language (XSL) was
    originally designed for generating HTML from XML.
    The language, however, includes a general-purpose
    transformation mechanism, called XSL
    Transformations (XSLT).
  • XSLT transformations are expressed as a series of
    recursive rules, called templates.
  • Structural recursion is important in XSLT because
    the data are tree-structured: XSLT applies
    template rules recursively to subtrees.
  • XSLT has a feature called keys, which serves a
    similar purpose to id() but can use attributes
    other than ID attributes. Example:
    <xsl:key name="acctno" match="account"
    use="account-number"/> where name
    distinguishes different keys, match specifies
    which nodes the key applies to, and use specifies
    the expression to be used as the value of the key.
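
Structural recursion can be imitated in a few lines of Python. This is only a sketch of the idea, not XSLT itself and not an XSLT processor: the rule registry, the decorator, and the sample document are all hypothetical. Unlike XSLT's built-in rules, the default rule here descends into children without copying their text.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}


def template(tag):
    """Register a rule for a tag, loosely like <xsl:template match="tag">."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register


def apply_templates(node):
    """Apply the matching rule, or descend into the children by default."""
    rule = TEMPLATES.get(node.tag)
    if rule is not None:
        return rule(node)
    return apply_to_children(node)


def apply_to_children(node):
    out = []
    for child in node:
        out.extend(apply_templates(child))
    return out


@template("customer")
def customer_rule(node):
    # Emit some output, then recurse into the subtree.
    return ["customer:"] + apply_to_children(node)


@template("name")
def name_rule(node):
    return [node.text.strip()]


doc = (
    "<bank-2><customer><name> Joe </name><street>Main</street></customer>"
    "<account><balance>400</balance></account></bank-2>"
)
output = apply_templates(ET.fromstring(doc))
```

The street, account, and balance elements have no rule, so they contribute nothing to the output.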

16
XQuery
  • Developed by the World Wide Web Consortium (W3C).
  • Queries are organized into FLWR expressions,
    comprising four sections: for, let, where, and
    return.
  • for gives a series of variables that range over
    the results of XPath expressions. When more than
    one variable is specified, the results include
    the Cartesian product of the possible values the
    variables can take.
  • let allows complicated expressions to be assigned
    to variable names for simplicity of
    representation.
  • where performs additional tests on the joined
    tuples from the for section.
  • return allows the construction of results in XML.
  • Example:
    for $x in /bank-2/account
    let $acctno := $x/@account-number
    where $x/balance > 400
    return <account-number> $acctno </account-number>
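
For readers without an XQuery engine at hand, the four sections of the example query can be mirrored one by one in Python with the standard xml.etree.ElementTree module. This is a sketch of the query's logic, not XQuery itself, and the sample document is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
</bank-2>
"""
root = ET.fromstring(doc)

results = []
for x in root.findall("account"):           # for $x in /bank-2/account
    acctno = x.get("account-number")        # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:  # where $x/balance > 400
        out = ET.Element("account-number")  # return <account-number> ...
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
```

With a second loop nested inside the first, the same pattern would produce the Cartesian product that a for section with two variables ranges over.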

17
Application Program Interface
  • There are two standard APIs: DOM (Document Object
    Model) and SAX (Simple API for XML).
  • DOM treats XML content as a tree and can be used
    to access XML data stored in databases. XML
    databases can also be built using DOM as their
    primary interface for accessing and modifying
    data. DOM does not, however, support any form of
    declarative querying.
  • SAX is an event model that provides a common
    interface between parsers and applications.
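
The difference between the two models shows up clearly in code. The sketch below uses Python's standard xml.dom.minidom and xml.sax modules; the sample document and the handler class EventRecorder are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<account><account-number>A-120</account-number>"
    "<balance>400</balance></account>"
)

# DOM: the whole document becomes a tree that can be navigated and modified.
dom = minidom.parseString(doc)
balance_node = dom.getElementsByTagName("balance")[0]
balance_node.firstChild.data = "450"  # update the text node in place
dom_balance = dom.getElementsByTagName("balance")[0].firstChild.data


# SAX: the parser calls back into the application as it reads the input.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventRecorder()
xml.sax.parseString(doc.encode("utf-8"), handler)
tag_events = [e for e in handler.events if e[0] != "text"]
```

DOM keeps the full tree in memory, which makes updates easy; SAX holds nothing beyond what the handler chooses to record.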

18
Storage of XML Data
  • Using a relational database.
  • If the XML data was generated from a relational
    schema, converting it back is straightforward.
    If it was not, there are several alternative
    approaches:
  • Store as string
  • Store each child element of the top-level
    element as a string in a separate tuple in the
    database. This is easy to use; however, the
    database system does not know the schema of the
    stored elements. A partial solution is to store
    different types of elements in different
    relations, and also to store the values of some
    critical elements as attributes of the relation
    to enable indexing. The drawback of this
    approach is that a large part of the XML
    information is stored within strings.
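
A minimal sketch of the string approach with its partial solution, using Python's standard sqlite3 and xml.etree.ElementTree modules. The table names, column names, and sample data are assumptions made for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Hayes</customer-name></customer>
</bank>
"""

conn = sqlite3.connect(":memory:")
# One relation per element type. The critical account-number value is
# also kept as an ordinary column so that it can be indexed.
conn.execute("CREATE TABLE account_elements (account_number TEXT, data TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account_elements (account_number)")
conn.execute("CREATE TABLE customer_elements (data TEXT)")

for child in ET.fromstring(doc):
    text = ET.tostring(child, encoding="unicode").strip()
    if child.tag == "account":
        conn.execute(
            "INSERT INTO account_elements VALUES (?, ?)",
            (child.findtext("account-number"), text),
        )
    elif child.tag == "customer":
        conn.execute("INSERT INTO customer_elements VALUES (?)", (text,))

# The lookup can use the index, but the balance is still buried in a
# string and has to be parsed again by the application.
row = conn.execute(
    "SELECT data FROM account_elements WHERE account_number = ?", ("A-102",)
).fetchone()
balance = ET.fromstring(row[0]).findtext("balance")
```

Any query on a value that was not promoted to a column, such as the balance here, would have to scan and parse every stored string.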

19
  • Tree representation

    Use a tree structure in which each element and
    attribute in the XML data is given a unique
    identifier. Each one is stored as a tuple in a
    nodes relation with its identifier (id), its
    type (attribute or element), the name of the
    element or attribute (label), and the text value
    of the element or attribute (value). The
    advantage is that all XML information can be
    represented directly in relational form, and many
    XML queries can be translated into relational
    queries and executed inside the database system.
    The drawback is that each element gets broken up
    into many pieces, so a large number of joins is
    required to reassemble elements.
  • Map to relations

    XML elements whose schema is known are mapped to
    relations and attributes. Elements whose schema
    is unknown are stored as strings or in a tree
    representation.
  • There are also nonrelational data stores:
  • Store in flat files

    This approach lacks data isolation, integrity
    checks, atomicity, concurrent access, and security.
  • Store in an XML Database
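
The tree representation described above can be sketched with Python's standard sqlite3 module. The nodes columns follow the slide; the child relation that records parent links, the helper store, and the sample document are assumptions added for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

doc = (
    '<account acct-type="checking">'
    "<account-number>A-120</account-number><balance>400</balance></account>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)"
)
# Assumed helper relation linking each node to its parent.
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")

ids = count(1)


def store(element, parent_id=None):
    """Insert an element, its attributes, and its subtree; return its id."""
    node_id = next(ids)
    value = (element.text or "").strip() or None
    conn.execute(
        "INSERT INTO nodes VALUES (?, 'element', ?, ?)", (node_id, element.tag, value)
    )
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?)", (node_id, parent_id))
    for name, attr_value in element.attrib.items():
        attr_id = next(ids)
        conn.execute(
            "INSERT INTO nodes VALUES (?, 'attribute', ?, ?)", (attr_id, name, attr_value)
        )
        conn.execute("INSERT INTO child VALUES (?, ?)", (attr_id, node_id))
    for sub in element:
        store(sub, node_id)
    return node_id


root_id = store(ET.fromstring(doc))

# Every parent-to-child step in a path costs one more join.
balance = conn.execute(
    """
    SELECT b.value
    FROM nodes a
    JOIN child c ON c.parent_id = a.id
    JOIN nodes b ON b.id = c.child_id
    WHERE a.label = 'account' AND b.label = 'balance'
    """
).fetchone()[0]
```

Even this one-step lookup needs two joins, which illustrates the drawback of breaking each element into many pieces.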

20
XML Applications
  • The central goal is to make it easy to communicate
    information on the Web and between applications.