Data Storage and Retrieval in an XML World PowerPoint PPT Presentation

presentation player overlay
1 / 37
About This Presentation
Transcript and Presenter's Notes

Title: Data Storage and Retrieval in an XML World


1
Data Storage and Retrieval in an XML World
  • By
  • Dare Obasanjo

2
What is XML?
  • eXtensible Markup Language
  • Meta-markup language developed by W3C to deal
    with shortcomings of HTML
  • HTML grew too complex and unwieldy (over 100 tags
    in latest version).
  • XML allowed for domain specific markup.
  • Semantics not document specific but application
    specific.
  • XML is a subset of the Standardized and General
    Markup Language (SGML).

3
XML and Data
  • XML also provided a way to describe structured
    data
  • XML has many advantages as a data storage and
    interchange format
  • Built in support for internationalization via
    unicode.
  • Platform independence.
  • Human readable format makes it easier to debug.
  • Extensibility - new versions of a format dont
    have to break apps based on older versions of the
    format.
  • Large number of off-the-shelf tools for
    processing XML documents already exist.

4
Structuring XML
  • Since XML is a way to describe structured data
    there should be a means to specify the structure
    of an XML document.
  • DTDs and Schemas are different mechanisms for
    providing a grammar for an XML document.
  • An XML document that conforms to a DTD or schema
    is considered to be valid.

5
Document Type Definitions (DTDs)
  • DTDs were inherited from SGML.
  • DTDs have a different syntax from XML
  • They are used to specify legal elements that can
    occur in an XML document and the order they occur
    in.

6
Sample DTD and XML document
  • lt!ELEMENT gatech_student (name, age)gt
  • lt!ATTLIST gatech_student gtnum CDATAgt
  • lt!ELEMENT name (PCDATA)gt
  • lt!ELEMENT age (PCDATA)gt
  • ltgatech_student gtnum"gt000x"gt
  • ltnamegtGeorge Burdelllt/namegt
  • ltagegt21lt/agegt
  • lt/gatech_studentgt

7
DTDs Unsatisfactory
  • DTDs proved inadequate due to to a number of
    reasons. The main reasons being
  • They used a different syntax than XML
  • Non-existent support for datatypes
  • Lack of control over occurrence of elements

8
XML Data Reduced (XDR)
  • A recommendation for XML schemas was submitted to
    the W3C by the Microsoft Corporation as a
    potential XML schema standard.
  • XDR tackled some of the problems of DTDs
  • XDR schemas are XML files
  • Support for a number of datatypes analogous to
    those used in relational database management
    systems and popular programming languages
  • Occurrence of elements is controllable

9
Sample XDR and XML document
  • ltSchema name"myschema" xmlns"urnschemas-microso
    ft-comxml-data" xmlnsdt"urnschemas-microsoft-c
    omdatatypes"gt
  • ltElementType name"age" dttype"ui1" /gt
  • ltElementType name"name" dttype"string" /gt
  • ltAttributeType name"gtnum" dttype"string"
    /gt
  • ltElementType name"gatech_student"
    order"seq"gt
  • ltelement type"name" minOccurs"1"
    maxOccurs"1"/gt
  • ltelement type"age" minOccurs"1"
    maxOccurs"1"/gt
  • ltattribute type"gtnum" /gt
  • lt/ElementTypegt
  • lt/Schemagt
  • ltgatech_student gtnum"gt000x"gt
  • ltnamegtGeorge Burdelllt/namegt
  • ltagegt21lt/agegt
  • lt/gatech_studentgt

10
XML Schema Definitions (XSD)
  • W3C standard
  • XSD outshines XDR in the following ways
  • Supports more datatypes
  • Provides the ability to create custom data types
  • Supports object oriented programming concepts
    like inheritance and polymorphism.

11
Sample XSD and XML document
  • ltschema xmlns"http//www.w3.org/2001/XMLSchema"
    gt
  • ltelement name"gatech_student"gt
  • ltcomplexTypegt
  • ltsequencegt
  • ltelement name"name" type"string"/gt
  • ltelement name"age" type"unsignedInt"/gt
  • lt/sequencegt
  • ltattribute name"gtnum"gt
  • ltsimpleTypegt
  • ltrestriction base"string"gt ltpattern
    value"gt\d3A-Za-z1"/gt lt/restrictiongt
  • lt/simpleTypegt lt/attributegt lt/complexTypegt
    lt/elementgt lt/schemagt
  • ltgatech_student gtnum"gt000x"gt
  • ltnamegtGeorge Burdelllt/namegt
  • ltagegt21lt/agegt
  • lt/gatech_studentgt

12
Querying XML
  • It is sometimes necessary to extract subsets of
    the data stored within an XML document.
  • A number of languages have been created for
    querying XML documents including Lorel, Quilt,
    UnQL, Xduce, XML-QL, Xpath, XQL, Xquery and YaTL.
  • XPath is a W3C recommendation and XQuery is a W3C
    working draft.

13
XML Path Language (XPath)
  • XPath is a language for addressing parts of an
    XML document using a syntax that resembles
    hierarchical paths used to address parts of a
    filesystem or URL
  • Also provides functions for interacting with
    selected data
  • Functions for the accessing information about
    document nodes
  • Functions for the manipulating of strings,
    numbers and booleans.
  • Developers can add functions to the XPath
    library.

14
Sample XPath queries
  • /gatech_student/name
  • Selects all name elements that are children of
    the root element gatech_student.
  • //age
  • Selects all age elements in the document.
  • /gatech_student/
  • Selects all child elements of the root element
    gatech_student.
  • /gatech_student_at_gtnum
  • Selects all gtnum attributes of the
    gatech_student elements in the document.
  • //name()'age'
  • Selects all elements that are named "age".
  • /gatech_student/age/ancestor
  • Selects all ancestors of all the age elements
    that are children of the gatech_student element
    (which should select the gatech_student element).

15
XML Query Language (XQuery)
  • XQuery is an attempt to provide a query language
    that provides the same breadth of functionality
    and underlying formalism as SQL does for
    relational databases.
  • XQuery is a functional language where each query
    is an expression.
  • XQuery has a sophisticated type system based on
    XML schema datatypes and supports the
    manipulation of the document nodes unlike XPath.
  • W3C is also working towards creating an alternate
    version of XQuery that has the same semantics but
    uses XML based syntax instead called XQueryX.

16
XQuery Expressions
  • path expressions
  • element constructors
  • FLWR expressions
  • expressions involving operators and functions
  • conditional expressions
  • quantified expressions
  • expressions that test or modify datatypes

17
XML Usage Models I (Review)
  • Document-centric
  • Semi structured documents
  • Irregular content
  • Human creation and/or consumption is primary
    aspect
  • Sample XHTML document
  • lthtml xmlns "http//www.w3.org/1999/xhtml"gt
  • ltheadgt
  • lttitlegtSample Web Pagelt/titlegt
  • lt/headgt
  • ltbodygt
  • ltpgt All XHTML documents must be
    well-formed and valid. lt/pgt
  • ltimg src"http//www.example.com
    /sample.jpg" height "50" width "25"/gt
  • ltbr /gt ltbr /gt
  • lt/bodygt lt/htmlgt

18
XML Usage Models II (Review)
  • Data-centric
  • Structured
  • Appears in a regular order
  • Mechanical creation and consumption.
  • XML usage is incidental
  • Sample SOAP message
  • ltSOAP-ENVEnvelope xmlnsSOAP-ENV"http//schemas
    .xmlsoap.org/soap/envelope/" SOAP-ENVencodingStyl
    e"http//schemas.xmlsoap.org/soap/encoding/"gt
  • ltSOAP-ENVBodygt
  • ltmGetLastTradePrice
    xmlnsm"Some-URI"gt
  • ltsymbolgtDISlt/symbolgt
  • lt/mGetLastTradePricegt
  • lt/SOAP-ENVBodygt
  • lt/SOAP-ENVEnvelopegt

19
XML Storage in a data centric model
  • Stored in database (typically an RDBMS)
  • One may want to extract data from a database as
    XML, store XML into a database or both
  • Primary choices for retrieving data
  • Middleware components
  • XML-enabled databases

20
Middleware components
  • Could be full blown application or an API.
  • Different strategies used
  • ADO same API for XML SQL access.
  • jxTransformer custom queries used to specify
    how the results of a SQL query should be
    converted to XML.
  • DatabaseDOM - user created template file
    contains the SQL to XML mappings for the SQL
    query results.
  • DB2XML - default mapping of SQL results to XML
    data exists that cannot be altered by the user

21
Sample jxTransformer Query
  • SQL
  • SELECT EmployeeID, FirstName, LastName, Title,
    HireDate, Salary
  • FROM Employees e WHERE HireDate gt d
    2000-01-01
  • jxTransformer custom query
  • xml_document(
  • xml_element(result,
  • SELECT
  • xml_element(Employees_Info
  • xml_attribute(ID,
    e.EmployeeID),
  • xml_element(name,
  • xml_element(first,
    e.FirstName),
  • xml_element(last,
    e.LastName) ),
  • xml_element(title,
    e.Title),
  • xml_element(hiredate,
    e.HireDate),
  • xml_element(salary, e.Salary) )
  • FROM EMPLOYEES e WHERE e.HireDate gt d
    2000-01-01 ) )

22
Sample DatabaseDOM template
  • ltXMLDATABASEMAPgt
  • ltTEMPLATEgt
  • ltEMPLOYEE_LISTgt
  • ltEMPLOYEE NO"EMPNO" GENDER"SEX"gt
  • ltNAMEgt
  • ltFIRSTgtFIRSTNMElt/FIRSTgt
  • ltMIDDLE_INITIALgtMIDINITlt/MIDDLE_INITIA
    Lgt
  • ltLASTgtLASTNAMElt/LASTgt lt/NAMEgt
  • lt/EMPLOYEEgt
  • lt/EMPLOYEE_LISTgt
  • lt/TEMPLATEgt
  • ltDATABASEgt
  • ltMAXRETURNROWSgt1000lt/MAXRETURNROWSgt
  • ltJDBCgt
  • ltURLgtjdbcdb2samplelt/URLgt
  • ltDRIVERgtCOM.ibm.db2.jdbc.app.DB2Driverlt/DRIVER
    gt
  • lt/JDBCgt
  • ltUSERIDgtpaullt/USERIDgt ltPASSWORDgtXXXXXXlt/PASSWOR
    Dgt
  • ltTABLEgtEMPLOYEElt/TABLEgt

23
XML-enabled databases
  • An XML-enabled database understands how to
    convert data to XML and back
  • Big 3 RDBMS vendors all have different XML
    strategies.

24
XML and DB2
  • Uses DB2 Extender to add XML support
  • Can store an entire XML document and its DTD as a
    user-defined column of an xml type
  • XMLCLOB
  • XMLVARCHAR
  • XMLFile
  • Option to shred the document into multiple tables
    and columns also available.
  • XML data can be queried with syntax that is
    compliant with W3C XPath recommendation.
  • Updating of XML data is also possible using
    stored procedures.

25
SAMPLE DB2 XML EXTENDER TABLE AND QUERY
  • TABLE
  • TABLE mail_user user_name
  • VARCHAR(20) NOT NULL PRIMARY KEY
  • passwd VARCHAR(10)
  • mailbox XMLVARCHAR
  • QUERY
  • SELECT user_name FROM mail_user WHERE
    extractVarchar(mailbox,"/Mailbox/Inbox/Email/Subje
    ct") LIKE "XML"

26
XML and Oracle 9i
  • XML documents can be stored as whole documents in
    user-defined columns of type XMLType or CLOB/BLOB
  • Shredded documents can be reconstituted using the
    XML SQL Utility.
  • Querying XML possible via two means
  • Oracle Text (CONTAINS WITHIN clauses in SQL)
    for BLOB or VARCHAR2 columns
  • XMLType columns can be queried via extract() and
    existsNode() functions which use XPath.
  • Relational views of XML data possible.

27
SAMPLE ORACLE 9i TABLE AND QUERY
  • TABLE
  • CREATE TABLE mail_user(
  • user_name VARCHAR2(20),
  • passwd VARCHAR2(10),
  • mailbox SYS.XMLTYPE )
  • QUERY
  • SELECT user_name FROM mail_user m WHERE
    m.mailbox.extract('/Mailbox/Inbox/Email/Subject/te
    xt()').getStringVal() like 'XML'

28
XML and SQL Server
  • XML can be retrieved from relational rows using
    FOR XML clause in SQL
  • RAW
  • AUTO
  • EXPLICIT
  • XML views of relational data possible.
  • Specified using XSD files for mapping
  • Queried using XPath

29
XML Storage in a document centric model
  • Stored in a content management system
  • A content management system typically consists of
    a repository that stores a variety of XML
    documents, an editor and an engine that provides
    one or more of the following features
  • version, revison and access control
  • ability to reuse documents in different formats
  • collaboration
  • web publishing facilities
  • support for a variety of text editors (e.g.
    Microsoft Word, Adobe Framemaker, etc)
  • indexing and search capabilities

30
XML Storage in a Hybrid Model
  • Where both data-centric and document-centric
    models are in use best choice is native XML
    database.
  • A native XML database is a database that has an
    XML document as its fundamental (logical) unit of
    storage and defines a (logical) model for an XML
    document, as opposed to the data in that
    document, and stores and retrieves documents
    according to that model.
  • At a minimum, the model must include elements,
    attributes, PCDATA, and document order.

31
Tamino a commercial native XML database
  • Created by Software AG.
  • Features
  • Storage retrieval of XML documents
  • Storage retrieval relational data
  • Interfacing with external applications and data
    sources.
  • Transactional (ACID properties)
  • Querying via X-Query (based on XPath NOT XQuery)
  • Indexing
  • GUI tools
  • Web based administration
  • Schema editor
  • Interactive query interface

32
Tamino Schemas
  • Schemas in Tamino are DTD-based and are mainly
    used as a way to describe how the XML data should
    be indexed
  • Document storage choices
  • Specify a pre-existing DTD which is then
    converted to a Tamino schema,
  • store a well-formed XML document without a schema
    which means that default indexing ensues
  • Create a schema from scratch for the XML document
    being stored
  • Schemas are also used as a way to specify data
    types which is important for type based
    operations during querying (e.g. numeric ops)

33
Tamino and SQL
  • Tamino ships with a SQL engine
  • Schemas can be used to creating mappings from SQL
    to XML
  • XML can be retrieved from RDBMS sources either
    internal (SQL engine) or external
  • Schemas can also be used to represent joins
    across different document types (which could mean
    different data sources)

34
Tamino programming support
  • APIs available for accessing XML store from both
    Java and Microsoft's Jscript
  • C programmers can interact with the SQL engine
    using the SQL precompiler
  • ODBC, OLE DB and JDBC clients can communicate
    with SQL Engine
  • X-Tensions framework allows developers to extend
    the functionality of Tamino by using C COM
    objects or Java objects

35
dbXML an Open Source native XML database
  • Created by the dbXML group.
  • Lightweight and modular
  • Can easily be embedded in applications
  • XML Documents arranged in hierarchical
    filesystem-like manner.
  • Querying via XPath.
  • Indexing support but no transactions or schemas.
  • Command line administration tools

36
dbXML programming support
  • Written in Java and has implementation of XMLDB
    initatives XML Database API.
  • Exposes CORBA API to enable access from any
    language with CORBA bindings.

37
Conclusion
  • Paper on this topic
  • http//www.25hoursaday.com/StoringAndQueryingXML.h
    tml
  • Missed Opportunities
  • eXcelon
  • Questions???
Write a Comment
User Comments (0)
About PowerShow.com