XML In An RDBMS World - PowerPoint PPT Presentation

About This Presentation
Title:

XML In An RDBMS World

Description:

XML In An RDBMS World Michael D. Thomas mdthomas_at_ibiblio.org Data, data, data In general, computing is never far from data XML, RDBMSes deal with data play in the ... – PowerPoint PPT presentation

Number of Views:108
Avg rating:3.0/5.0
Slides: 76
Provided by: Micha169
Learn more at: http://www.ibiblio.org
Category:
Tags: rdbms | xml | world

less

Transcript and Presenter's Notes

Title: XML In An RDBMS World


1
XML In An RDBMS World
  • Michael D. Thomas
  • mdthomas_at_ibiblio.org

2
Data, data, data
  • In general, computing is never far from data
  • XML, RDBMSes deal with data play in the same
    very large playground
  • Extensive overlap between RDBMSes and XML
  • Goal Learn how to choose between RDBMS vs. XML,
    how to integrate the two technologies

3
First, some pragmatism
  • When all you have is a hammer, everything looks
    like a nail.
  • What if all you have is a hammer?
  • When all you have is a hammer, everything must
    look like a nail.
  • Best to have XML and RDBMS in your toolset
  • Architectural purity is often impossible
  • But we should know when what we are
    compromising

4
Tonights Agenda
  • Compare XML and SQL RDBMS Technologies
  • Understand Their Sweet Spots
  • Examine Their Use Through Studies of
    Anti-Patterns and Patterns Of Usage
  • Discuss XQuery and XML Storage In The RDBMS

5
Topics
  • Defining The Playground XML vs. RDBMS
  • Anti-Pattern XML As Transactional Data Store
  • Data Stores The ACID Test
  • Anti-Pattern Document Storage The RDBMS
  • XQuery RDBMS Doc Storage
  • Anti-Pattern XML Declarative Languages
    Gratuitous Encapsulation Of SQL
  • Understanding Declarative Languages, Such As SQL
  • Anti-Pattern XML Meta-data The Overly Abstract
    Database
  • Managing The Split Between XML-based
    RDBMS-based Data Models In The Same Application

6
XML As Database Anti-Pattern
  • Ill use an XML file as a database
  • Advantages Simple, Cheap
  • An XML Schema defines the metadata of the
    database
  • Data is right on the filesystem, not hidden in
    a complicated SQL RDBMS

7
XML As Database Problems
  • Updates are hard
  • Have to re-invent concurrency handling multiple
    users changing data at the same time
  • No constraint checking
  • No optimization, such as indexes, caching, etc.

8
Organization of Data
  • Been formalizing the storage information for a
    long time
  • Age of computing called for new approaches
  • Relational model highly successful
  • XML is relatively new
  • Good for loosely structured data documents --
    and data transmission

9
Transmission Interoperability
  • Electronic transmission is newer than electronic
    storage
  • With the growth of the Internet, transmission of
    data is exploding
  • No Interoperability problems in 1950 lots of
    interoperability problems now
  • XML is an important standard for transmission and
    interoperability

10
Vocabulary
  • Datastore Anything that stores data
  • SQL Relational Database A database that
    organizes info in tables and adheres to
    relational theory
  • XML 1) eXtensible Markup Language 2) The XML
    standard 3) All or part of an XML document

11
Vocabulary
  • XML Database A database specialized for the
    storage of XML documents
  • Object Database A database that stores objects
  • Object-Relational Database A relational
    database with an extensible type system

12
Data Centric vs. Document Centric
  • Data Centric approach the datastore is focused
    on handling highly structured, fine grained data.
    Favors the relational model
  • Document centric approach the datastore is
    focused on handling semi-structured data, such as
    web pages, books, etc. Favors XML

13
Data Centric vs. Document Centric
  • Best queries are largely a result of structuring
    the data well (Messy desk vs. organized file
    cabinet)
  • Documents are semi-structured data with ad hoc
    structures
  • The overhead of defining rigorous structure for
    each type of document increases the overall cost
    of management
  • (Would still give you the best queries)

14
Storage vs. Transmission
  • Data Storage and Data Transmission are two
    different concepts
  • XML is very strong for interoperable transmission
  • Can store by writing XML to a file
  • You can transmit relational data by exporting a
    few tables and ftping, but isnt a strong solution

15
Datastore Application Stack
16
XML/Relational Comparison
Relational/SQL XML
Metadata Definition Create Table Define XML Schema (Optional)
Data Persistence Insert Update Create XML Document
Data Query Select XPath, XQuery
17
Application With XML Transmissions
18
Three Data Models
  • Persistent Model how data is stored
  • Active Model (Object Model) how data is
    arranged when it is being manipulated by a
    program, usually written in an imperative
    language
  • Presentation/Transmission Model how data is
    transmitted, usually as XML

19
Datastore Basics
  • Any Datastore must tackle the following issues
  • Concurrency
  • Transactions the ACID test
  • Locking
  • Joins
  • Normalization
  • Administration Issues

20
Concurrency
  • Datastores support concurrency if multiple users
    can access the same datastore at the same time
  • Datastores must not allow the same data to be
    modified at the same time

21
Transactions
  • Atomicity No matter how complex, a transaction
    is atomic and indivisible. Transactions are all
    or nothing.
  • Consistency Transactions must leave the
    database in a consistent state, i.e., consistent
    with the rules
  • Isolation Transaction is isolated from other
    transactions
  • Durability The effects of a transaction persist

22
Isolation levels
  • TRANSACTION_READ_UNCOMMITTED Dirty reads,
    non-repeatable reads, phantom reads
  • TRANSACTION_READ_COMMITTED
  • Non-repeatable reads, phantom reads
  • TRANSACTION_REPEATABLE_READ
  • phantom reads
  • TRANSACTION_SERIALIZABLE

23
Locking
  • Data must be locked for transactions to be
    isolated
  • Locking is both a datastore and an application
    concern
  • How much extraneous data is locked? (Page level
    locking, document level locking)
  • Pessimistic locking prevents reading of locked
    data
  • Optimistic locking generates an error when
    inappropriate data updates are attempted

24
Joins
  • A Join joins the data between two different data
    entities
  • E.g., SELECT from emp, dept WHERE emp.deptno
    dept.deptno
  • Joins are the cornerstone of SQL
  • XPath doesnt do joins between XML documents!
  • XQuery, others can

25
Normalization
  • Normalization organizing data to minimize
    redundancy
  • Normalized data is easier to maintain and easier
    to understand conceptually
  • In relational db design, normalized DBs need to
    be denormalized for performance reasons
  • XML docs can also be normalized, but not as much
    support for tying elements together
  • Normalization is important when designing, less
    crucial at implementation

26
Administrative Issues
  • Includes backup recovery, installation,
    upgrades, optimization, maintenance, etc.
  • In general
  • Bigger is better (economies of scale, 24x7
    support)
  • More popular, more standard is better (law of
    increasing returns)
  • Whatever is already working in your organization
    is better (no need to hire, re-train
    administrators)
  • Existing DB vendors have a huge advantage

27
Datastores
  • Different types
  • Relational Database highly structured data such
    as account balances, inventory quantities, etc.)
  • Document Database used to store documents,
    probably in XML format
  • The same DB can serve as both, e.g., Oracle

28
Overview Of Relational Databases
  • The Relational Model
  • SQL
  • Entity Relationship Diagrams

29
Relational Model -- structure
  • Data is grouped in tables
  • Tables have columns and rows
  • Columns are fairly fixed the set of columns
    shouldnt change much (if at all) over the life
    of a table
  • A table can have any number of rows
  • Rows change constantly

30
Relational model primary key
  • In general, a table should have one or more
    columns defined as the primary key
  • The primary key is unique and non-null
  • Usually only one column
  • Can consist of more than one column (composite
    primary key)

31
Relational model foreign key
  • A foreign key describes a relationship between
    two tables
  • The foreign key column of tableA points to a
    column in tableB
  • TableB is said to be the parent of tableA
  • Often, the foreign key points at a primary key

32
SQL
  • Three types
  • Query SELECT FROM emp WHERE deptno10 ORDER BY
    ename
  • Data Modification Language (DML) Update, Insert
  • Data Definition Language (DDL) Create Table

33
Joins
  • A Join is a Cartesian Cross-Product
  • SELECT count() FROM emp
  • SELECT count() FROM emp
  • SELECT count() FROM emp, dept
  • SELECT ename, emp.deptno, dept.deptno, dname FROM
    emp, dept
  • SELECT ename, emp.deptno, dept.deptno, dname FROM
    emp, dept WHERE emp.deptno dept.deptno

34
Joins
  • SELECT ename, dname FROM emp, dept WHERE
    emp.deptno dept.deptno

35
Entity Relationship Diagrams
  • Used to create a map of your data
  • Describes tables, attributes of tables and
    relationships between tables
  • Come at it from two directions lay out the
    entities, assign the attributes group the
    attributes into entities

36
XML vs. SQL Tables
  • XML order matters! (Rows in a relational table
    are unordered)
  • XML is a tree structure
  • XML documents tend to be semi-structured, SQL
    tables are highly structured
  • SQL tables arent as flexible or inter-changable
  • XML joins arent straightforward
  • An XML document/element doesnt serve multiple
    purposes as well as DB schema/table

37
Anti-Pattern XML Document Storage As A SQL BLOB
  • Need to store XML in a database
  • Better than storing in a filesystem
  • Make a BLOB (Binary Large Object) and store a
    document as an element in a row of a table
  • Problem cant query contents of document without
    extracting it from the db!
  • For simple queries, might have to extract all of
    the documents into the application very
    inefficient

38
Anti-Pattern One-off SQL Schema For A Particular
XML Document
  • BLOB storage is bad, so why not shred the
    document across multiple DB tables?
  • I.e., for XML elements named Dept make a Dept
    table, Emp elements make a Emp table,
    describe the hierarchy with foreign key
    constraints
  • Query performance is much, much better probably
    better than XPath against XML as a file

39
Problems
  • A lot of work! Have to do this for every XML
    schema
  • Hard! XML schemas are inherently more flexible
    than SQL schemas. Some mappings can be difficult
  • Negates flexibility of XML
  • Not as learnable Programmers have to learn to
    query your SQL schema, not just XPath and XML
    Schema

40
Anti-Pattern Developing A General Shredding
Solution For All Of XML
  • Your Application has several XML schemas, and
    its time-consuming to develop shredding for
    all of them
  • So, you try to do a more generalized shredding
  • Is possible, but is a very horizontal problem.
    You probably wont get the time to solve it
    completely.
  • Still might present learnability problems

41
XML Datastore Architectures
  • XML Views Of Relational Data
  • Relational Wrappers Of XML
  • Independent Storage of XML Documents (Native XML
    Database)
  • Text Storage Of XML In RDBMS
  • XML Shredding Across Relational Tables
  • Storing XML as Objects in Object- Relational DBs
    (Oracle XMLType)
  • Everything is XML (XQuery approach)

42
Vocabulary XML Collections
  • XML Collection is a collection of XML documents
  • A row is to a table as an XML document is to an
    XML collection.

43
XML Derived From RDBMS
  • Data exists naturally as relational data
  • Needs to be represented as an XML document for
    some reason
  • The derived XML is usually used for transmission

44
Independent XML Store
  • A specialized database is used to store XML
    documents
  • Typically, an application will either have two
    datastores relational and XML or relational
    data will be stored as XML
  • In the dual datastore case, the application code
    has to join the different data sets

45
Text Storage Of XML In Databases
  • XML is stored in a column of a relational table
  • Allows you to mix structured and semi-structured
    approaches
  • However, hard to query against the XML directly

46
XML Shredding Across Tables
  • An XML document is stored across many tables
  • Is possible to use SQL queries against the
    documents parts
  • Vendors can implement an XPath-to-SQL translator
  • Can structure the tables based on a schema
  • A pain to handle yourself

47
Exercise
  • Shred an XML schema across relational tables

48
XMLType
  • Object-Relational databases allow you to define
    your own types
  • You could define an Address object, define
    functions for the object, and store instances of
    the object in a column in the database
  • Oracle defines an object-relational type,
    XMLType, for the storage of XML documents in the
    database
  • SELECT e.poDoc.getClobval() AS poXML FROM
    po_xml_tab e WHERE e.poDoc.existsNode('/POPNAME
    "po_2"') 1
  • Shredding is managed encapsulated for you

49
Oracle 9i XMLType, Text
50
Everything Is XML (XQuery)
  • With XMLType, Oracle says that everything fits in
    to the Object-Relational realm
  • XQuery says that everything can be represented
    and queried as XML
  • Relational data is derived from RDBMS using
    SQL/XML

51
XQuery
52
SQL/XML
  • select xmlelement("emp",
  • 'Employee ' ,
  • xmlelement( "name", e.job '
    ' e.ename),
  • ' was hired on ',
  • xmlelement("hiredate",
    e.hiredate)) as result
  • from emp e
  • --------------------------------------------------
    ----------------------------
  • ltempgtEmployee
  • ltnamegtCLERK SMITHlt/namegt
  • was hired on
  • lthiredategt17-DEC-80lt/hiredategt
  • lt/empgt

53
Query Soup
  • SQL queries relational data
  • XPath queries a particular XML document
  • SQL/XML a standard for deriving XML from
    relational data
  • XQuery queries a collection of XML documents
    and uses XPath to query particular XML documents

54
Query Soup
  • There are other XML query languages, such as
    XML-QL and Quilt
  • XSLT is a transformation, not a query, language
  • XSQL MS XML/SQL are wrapper languages

55
Anti-Pattern Encapsulating SQL With XML
  • General Rule Encapsulation is good, but
    encapsulating good things is bad.
  • SQL is good
  • (Also, HTML is good)
  • Why hide SQL with XML?

56
XML encapsulating SQL
  • SELECT emp.ename, dept.dname FROM emp, dept WHERE
    emp.deptno dept.deptno
  • ltsql-querygt
  • ltfieldsgt
  • ltfieldgt ename lt/fieldgt
  • ltfieldgt dname lt/fieldgt
  • lt/fieldsgt
  • lttablesgt
  • lttablegt emp lt/tablegt
  • lttablegt dept lt/tablegt
  • lt/tablesgt
  • ltjoinsgt
  • ltjoingtemp.deptno dept.deptno lt/joingt
  • lt/joinsgt
  • lt/sql-querygt

57
Same concept in XQueryX
  • XQueryX is an XML representation of an XQuery
    The result is not particularly convenient for
    humans to read and write, but it is easy for
    programs to parse

58
XQuery Example
  • XQuery
  • for b in doc("http//www.bn.com/bib.xml")/bib/b
    ook where
  • b/publisher "Addison-Wesley" and
  • b/_at_year gt 1991 return ltbook year
  • " b/_at_year "gt b/title lt/bookgt

59
XQueryX Version
  • lt?xml version"1.0" encoding"UTF-8"?gt
    lt?xml-stylesheet type"text/xsl"
    href"xqueryx.xsl"?gt ltxqxmodule
    xmlnsxsi"http//www.w3.org/2001/XMLSchema-instan
    ce" xmlnsxqx"http//www.w3.org/2003/12/XQueryX"
    xsischemaLocation"http//www.w3.org/2003/12/XQue
    ryX xqueryx.xsd"gt ltxqxmainModulegt
    ltxqxqueryBodygt ltxqxexpr xsitype"xqxelementCon
    structor"gt ltxqxtagNamegtbiblt/xqxtagNamegt
    ltxqxelementContentgt ltxqxexpr xsitype"xqxflwrE
    xpr"gt ltxqxforClausegt ltxqxforClauseItemgt
    ltxqxtypedVariableBindinggt ltxqxvarNamegtblt/xqxvar
    Namegt lt/xqxtypedVariableBindinggt ltxqxforExprgt
    ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
    xsitype"xqxfunctionCallExpr"gt
    ltxqxfunctionNamegtdocumentlt/xqxfunctionNamegt
    ltxqxparametersgt ltxqxexpr xsitype"xqxstringCon
    stantExpr"gt ltxqxvaluegtbib.xmllt/xqxvaluegt
    lt/xqxexprgt lt/xqxparametersgt lt/xqxexprgt
    ltxqxstepExprgt ltxqxxpathAxisgtchildlt/xqxxpathAxis
    gt ltxqxelementTestgt ltxqxnodeNamegt
    ltxqxQNamegtbiblt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxelementTestgt lt/xqxstepExprgt ltxqxstepExprgt
    ltxqxxpathAxisgtchildlt/xqxxpathAxisgt
    ltxqxelementTestgt ltxqxnodeNamegt
    ltxqxQNamegtbooklt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
    lt/xqxforExprgt lt/xqxforClauseItemgt
    lt/xqxforClausegt ltxqxwhereClausegt ltxqxexpr
    xsitype"xqxoperatorExpr" xqxinfix"true"gt
    ltxqxopTypegtANDlt/xqxopTypegt ltxqxparametersgt
    ltxqxexpr xsitype"xqxoperatorExpr"
    xqxinfix"true"gt ltxqxopTypegtlt/xqxopTypegt
    ltxqxparametersgt ltxqxexpr xsitype"xqxpathExpr"
    gt ltxqxexpr xsitype"xqxvariable"gt
    ltxqxnamegtblt/xqxnamegt lt/xqxexprgt ltxqxstepExprgt
    ltxqxxpathAxisgtchildlt/xqxxpathAxisgt
    ltxqxelementTestgt ltxqxnodeNamegt
    ltxqxQNamegtpublisherlt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
    ltxqxexpr xsitype"xqxstringConstantExpr"gt
    ltxqxvaluegtAddison-Wesleylt/xqxvaluegt \

60
XQueryX Version (cont.)
  • ltxqxQNamegtpublisherlt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
    ltxqxexpr xsitype"xqxstringConstantExpr"gt
    ltxqxvaluegtAddison-Wesleylt/xqxvaluegt lt/xqxexprgt
    lt/xqxparametersgt lt/xqxexprgt ltxqxexpr
    xsitype"xqxoperatorExpr" xqxinfix"true"gt
    ltxqxopTypegtgtlt/xqxopTypegt ltxqxparametersgt
    ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
    xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
    lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
    qxxpathAxisgt ltxqxattributeTestgt ltxqxnodeNamegt
    ltxqxQNamegtyearlt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxattributeTestgt lt/xqxstepExprgt lt/xqxexprgt
    ltxqxexpr xsitype"xqxintegerConstantExpr"gt
    ltxqxvaluegt1991lt/xqxvaluegt lt/xqxexprgt
    lt/xqxparametersgt lt/xqxexprgt lt/xqxparametersgt
    lt/xqxexprgt lt/xqxwhereClausegt ltxqxreturnClausegt
    ltxqxexpr xsitype"xqxelementConstructor"gt
    ltxqxtagNamegtbooklt/xqxtagNamegt
    ltxqxattributeListgt ltxqxexpr xsitype"xqxattrib
    uteConstructor"gt ltxqxattributeNamegtyearlt/xqxattr
    ibuteNamegt ltxqxattributeValuegt ltxqxexpr
    xsitype"xqxpathExpr"gt ltxqxexpr
    xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
    lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
    qxxpathAxisgt ltxqxattributeTestgt ltxqxnodeNamegt
    ltxqxQNamegtyearlt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxattributeTestgt lt/xqxstepExprgt lt/xqxexprgt
    lt/xqxattributeValuegt lt/xqxexprgt
    lt/xqxattributeListgt ltxqxelementContentgt
    ltxqxexpr xsitype"xqxpathExpr"gt ltxqxexpr
    xsitype"xqxpathExpr"gt ltxqxexpr
    xsitype"xqxvariable"gt ltxqxnamegtblt/xqxnamegt
    lt/xqxexprgt ltxqxstepExprgt ltxqxxpathAxisgtchildlt/x
    qxxpathAxisgt ltxqxelementTestgt ltxqxnodeNamegt
    ltxqxQNamegttitlelt/xqxQNamegt lt/xqxnodeNamegt
    lt/xqxelementTestgt lt/xqxstepExprgt lt/xqxexprgt
    lt/xqxexprgt lt/xqxelementContentgt lt/xqxexprgt
    lt/xqxreturnClausegt lt/xqxexprgt
    lt/xqxelementContentgt lt/xqxexprgt
    lt/xqxqueryBodygt lt/xqxmainModulegt lt/xqxmodulegt

61
When To Use Declarative XML
62
XML As User Interface
  • Development teams often forget to write use cases
    for administrators, downstream developers
  • XML declarative languages must be considered as
    ways to provide user interfaces
  • Documentation, examples important
  • Usability, Learnability issues important

63
Declarative XML In App. Lifecycle
  • Compile-Time Best Performance, Least
    Flexibility
  • Start-Time XML Parsed Once, Good Performance,
    Good Flexibility. App must be restarted or
    manually refreshed to get flexibility
  • Run-Time Can Impact Performance, But Great
    Flexibility. Web Services Approach

64
Start-Time Configuration
65
Runtime Configuration
66
Anti-Pattern The Overly Abstract Database
  • By using XML, you can achieve runtime
    configurability
  • Instead of hard coding SQL table names, do the
    same job in XML configuration files
  • Problem database is so abstract that it is very
    hard or impossible to do SQL queries against it
    directly
  • Database is closed just a persistence extension
    to the application
  • The application (and thus, the app dev team)
    assumes all responsibility for any work with the
    data

67
The Split Persistence Data Model
  • Applications Have Persistence Data Models
  • Persistence Data Model any data stored on disk
  • Everything might be in the DB
  • Everything might be in XML docs
  • Usually, there is some split i.e., 95 in a SQL
    DB, but 5 in a properties file
  • Managing the split between DB-based data and
    XML-based data is accidental complexity and can
    be a headache

68
Traditional Architecture
69
Typical Architecture
70
Joins between data models
  • Can use XQuery to join the two data models
  • Could use XPath enhanced SQL to join the two
    models
  • Often, the two models are joined at the object
    level

71
Databases Have Meta Data
72
Databases Have Metadata
  • Column names, table names constraints are all
    metadata
  • Defined at DB design time, which is usually app
    design time
  • App is often hard coded against the database meta
    data

73
XML as DB Metadata
74
Pros vs. Cons
  • More flexible XML can change at runtime
  • More complex at the application level
  • Database must be more abstract
  • Can make the application harder to extend
  • Dont forget a DB table can also serve the same
    purpose as an XML meta data doc

75
Alternative Tables As Meta Data
Write a Comment
User Comments (0)
About PowerShow.com