Title: XML and the Semi-Structured Data Model
1XML and the Semi-Structured Data Model
2Motivation
- We have seen that relational databases are very
convenient to query. However - There is a LOT of data not in relational
databases!! - Perhaps the most widely accessed database is the
web, and it certainly isnt a relational database.
3Documents Vs. Databases
Documents Databases
Paragraphs, Sentences Tables, tuples
Easy for people to understand Easy for computers to understand
Static Dynamic
4Querying the Web
- The web can be queried using a search engine,
however, we cant ask questions like - What is the weather in Zanzibar today?
- What is the lowest price for which a Jaguar is
sold on the web? - Problems
- There are no facilities for asking complex
questions, such as aggregation of data - Words have overloaded meanings (Jaguar)
5Understanding the Web
- In order to query the web, we must be able to
understand it. - 2 Computer Science Approaches
- Artificial Intelligence Approach
- Database Approach
6Artificial Intelligence Approach
- The web is unstructured and we must deal with
it - Use techniques for machine learning to understand
the web. - Example To understand the word Jaguar check if
it appears on a page with the word car or
automobile or rather with jungle and Africa - Problem Such techniques tend to be inexact and
have a large percentage of mistakes
7Database Approach
- The web is unstructured and we will structure
it - Sometimes problems that are very difficult can be
solved easily by enforcing a standard - Encourage the use of XML as a standard for data
exchange on the web
8Example XML Document
Opening Tag
- lt?xml version1.0?gt
- lttransactiongt
- ltaccountgt89-344lt/accountgt
- ltbuy shares 100gt
- ltticker exch NASDAQgtWEBMlt/tickergt
- lt/buygt
- ltsell shares 30gt
- ltticker exch NYSEgtGElt/tickergt
- lt/sellgt
- lt/transactiongt
Closing Tag
Element
Attribute Name
Attribute Value
9XML Representation of a Table
- lt?xml version1.0?gt
- ltROWSETgt
- ltROW num 1 gt
- ltENAMEgtKING lt/ENAMEgt
- ltSALgt5000lt/SALgt
- lt/ROWgt
- ltROW num 2 gt
- ltENAMEgtSCOTT lt/ENAMEgt
- ltSALgt3000lt/SALgt
- lt/ROWgt
- lt/ROWSETgt
ENAME SAL
KING 5000
SCOTT 3000
10Very Unstructured XML
- lt?xml version1.0?gt
- ltDamageReportgt
- The insureds ltVehicle Make Volksgt Beetle
lt/Vehiclegt broke through the guard rail and
plummeted into the ravine. The cause was
determined to be ltCausegtfaulty brakes lt/Causegt.
Amazingly there were no casualties. - lt/DamageReportgt
11XML Vs. HTML
- XML and HTML are brothers. They are both special
cases of SGML. - HTML has specific tag and attribute names. These
are associated with a specific meaning - XML can have any tag and attribute name. These
are not associated with any meaning - HTML is used to specify visual style
- XML is used to specify meaning
12Rules for Creating XML Documents
13Rule 1 XML Declaration
- An XML document should begin with an XML
declaration. A simple declaration is - lt?xml version1.0?gt
-
- Other things can be specified, such as
character encoding.
14Rule 2 Document Element
- Use exactly one top-level document element
- Example
- lt?xml version1.0?gt
- ltQuestiongt This is legal lt/Questiongt
- lt?xml version1.0?gt
- ltQuestiongt Is this legal? lt/Questiongt
- ltAnswergt No. lt/Answergt
15Rule 3 Match Opening and Closing Tags
- XML is case sensitive. The following examples are
all illegal - Example
- ltQuestiongt This is legal lt/QUESTIONgt
- ltQuestiongt ltBgt Is this legal? lt/Questiongt lt/Bgt
16Rule 4 Comments
- Comments are between lt!-- and --gt characters.
Comments cant appear as attribute values or
within a tag. - Example
- lt!-- This is a legal comment --gt
- ltQuestion lt!-- This is illegal --gtgt
- Why is this illegal
- lt!-- This is a legal comment --gt
- lt/Questiongt
17Rule 5 Element Names
- Element and attribute names must be continuous
sequences of letters or hyphens or underscores. - Example
- Legal Names
- lt_legalgt
- ltThis-is-OKgt
- I Illegal Names
- lt2-Part-Questiongt
- ltTwo Part Questiongt ltQuestion 4You
Yesgt
18Rule 6 Attribute Values
- Attribute values
- go in opening tags.
- should be enclosed by matching quotes ( or )
- should have only text and not tags
- Legal Example
- ltQuestion Poster YitzchakgtDo you like
XML? lt/Questiongt - ltAnswer Poster YaakovgtI do.lt/Answergt
19Rule 6 Continued
- Illegal Examples
- ltQuestion Poster YitzchakgtDo you like
XML? lt/Questiongt - ltQuestiongtDo you like XML? lt/Question
Poster Yitzchakgt - ltQuestion Poster ltfirstgtYitzchaklt/firstgt
gtDo you like XML? lt/Questiongt -
20Rule 7 Empty Elements
- Empty elements are elements that do not contain
text or nested elements. They can be written in a
compact syntax - ltPerson First Shmuel Last Levygtlt/Persongt
- is the same as
- ltPerson First Shmuel Last Levy /gt
21Abstract View of XML
22A Different Data Model
Relational Semi-Structured
Abstract Model Sets of tuples Labeled Directed Graph
Concrete Model Tables XML Documents
Standard for Storing Data Data Exchange
23An Example
- lt?xml version1.0?gt
- lttransactiongt
- ltaccountgt89-344lt/accountgt
- ltbuy shares 100gt
- ltticker exch NASDAQgtWEBMlt/tickergt
- lt/buygt
- ltsell shares 30gt
- ltticker exch NYSEgtGElt/tickergt
- lt/sellgt
- lt/transactiongt
24Corresponding Tree
transaction
account
buy
sell
89-344
shares
shares
ticker
ticker
100
30
exch
exch
NASDAQ
NYSE
WEBM
GE
25Using XML
- Quering XML There are query languages that query
XML and return XML. Examples XQuery, XPath,
SQL4X - Displaying XML An XML document can have an
associated style-sheet which specifies how the
document should be translated to HTML. Examples
CSS, XSL
26Namespaces
- Namespaces are used to attach an accepted meaning
to a set of tags. - Syntax for defining a namespace
- ltSomeElement xmlnsprefixnamenamespaceURL gt
- the namespace will be recognized within the
SomeElement element.
27Example Namespace
- ltirsForm id1040 xmlnsirshttp//www.irs.gov
gt - ltirsNamegtTina Wellslt/irsNamegt
- ltPhoneNumbergt03-5655666lt/PhoneNumbergt
- lt/irsNamegt
- In order for the namespace to be recognized in
all elements, the declaration should be in the
document element
28XSQL Pages
29What are XSQL Pages?
- XSQL pages are XML documents that have SQL
queries embedded in them. - When a user requests to view an XSQL page, the
web server - Dynamically computes the embedded queries
- Translates the query results into XML
- Inserts the results in the proper places in the
document - Transforms the result to HTML if a stylesheet is
given
30A Simple Example
- lt?xml version1.0?gt
- ltxsqlquery connectionscott xmlnsxsqlurnora
cle-xsqlgt - SELECT sname
- FROM Sailors
- lt/xsqlquerygt
You should specify the connection and the
namespace on the document element
31Page Seen in Browser
- lt?xml version1.0?gt
- ltROWSETgt
- ltROW num 1 gt
- ltSNAMEgtRustylt/SNAMEgt
- lt/ROWgt
- ltROW num 2 gt
- ltSNAMEgtJustin lt/SNAMEgt
- lt/ROWgt
- lt/ROWSETgt
- A ROWSET element encloses query result
- Each ROW element encloses each row
- Each column in the row is within a tag with its
columns name
32Another Example
- lt?xml version1.0?gt
- ltRESULTS connectionscott xmlnsxsqlurnoracle
-xsqlgt - Here is something interesting
- ltxsqlquerygt
- SELECT sname, age rating as ra
- FROM Sailors
- WHERE sid 13
- lt/xsqlquerygt
- lt/RESULTSgt
33Resulting Document
- lt?xml version1.0?gt
- ltRESULTSgt
- Here is something interesting
- ltROWSETgt
- ltROW num 1 gt
- ltSNAMEgtRustylt/SNAMEgt
- ltRAgt55lt/RAgt
- lt/ROWgt
- lt/ROWSETgt
- lt/RESULTSgt
34Using Parameters
- Your page can use parameters. The value of a
parameter param is determined in the following
fashion - The value of the URL parameter param if supplied
- The value of the HTTP session object param if
supplied - The value of the closest ancestors attribute
named param, if present - An empty string
35Example with Parameters
- lt?xml version1.0?gt
- ltxsqlquery connectionscott xmlnsxsqlurnora
cle-xsql - sname Joegt
- SELECT
- FROM Sailors
- WHERE sname _at_sname
- lt/xsqlquerygt
36Evaluating the Query
- Suppose the XSQL document is at
- http//cs.huji.ac.il/db/query1.xsql
- Then, requesting the url
- http//cs.huji.ac.il/db/query1.xsql?snameJim
- will return all the details of Jim.
- Requesting
- http//cs.huji.ac.il/db/query1.xsql
- will return all the details of Joe (the defualt
value)
37A Strange Example
- lt?xml version1.0?gt
- ltxsqlquery connectionscott xmlnsxsqlurnora
cle-xsql - select where 11 order1gt
- SELECT _at_select
- FROM _at_from
- WHERE _at_where
- ORDER BY _at_order
- lt/xsqlquerygt
38Customizing Results
- The query tag can have different attributes that
customize the query results. Here are some of the
important options - max-rows The maximum number of rows returned
- skip-rows The number of rows to skip before
returning rows - rowset-element The name of the rowset element
- row-element The name of the row element
39Customizing Results
- lt?xml version1.0?gt
- ltxsqlquery connectionscott xmlnsxsqlurnora
cle-xsql - skip 0 max-rows2 skip-rows_at_skip gt
- SELECT
- FROM Program
- ORDER BY url
- lt/xsqlquerygt
By calling the same page with different values
for skip, we can see the different programs
40Notes
- An XSQL document can have many queries.
- The queries can appear within arbitrary XML tags
- We can produce XML that has a more nested
structure using the CURSOR function...
41Remembering Subqueries in the SELECT Clause
- Subqueries in the SELECT clause must return a
single value. What do we do if we want for each
boat, all the sailors who reserved the boat? - We want each bid to be associated with a table of
Sailors data!
42Using the CURSOR Function
- lt?xml version1.0?gt
- ltxsqlquery connectionscott xmlnsxsqlurnora
cle-xsqlgt - SELECT bid,
- CURSOR(SELECT sid, sname
- FROM Sailors S, Reserves R
- WHERE S.sid R.sid
- and R.bid B.bid)
- as Reservers
- FROM Boats B
- lt/xsqlquerygt
43- lt?xml version1.0?gt
- ltROWSETgt
- ltROW num 1 gt
- ltBIDgt113lt/BIDgt
- ltRESERVERSgt
- ltRESERVERS_ROW num 1 gt
- ltSIDgt 13 lt/SIDgt
- ltSNAMEgt Joe lt/SNAMEgt
- lt/RESERVERS_ROWgt
- ltRESERVERS_ROW num 2 gt
- .... lt/RESERVERS_ROWgt
- lt/RESERVERSgt
- lt/ROWgt
- lt/ROWSETgt
Note use of select query alias instead of inner
row set and row tags.
44Setting Page Level Parameters
- The following statement defines a parameter
pname. The value of pname is the value in the
first column of the first row of the query - The variable pname will be recognized in the page
- ltxsqlset-page-param namepnamegt
- SELECT Statement
- lt/xsqlset-page-paramgt
45Example
- lt?xml version1.0?gt
- ltpage connectionscott xmlnsxsqlurnoracle-xs
qlgt - ltxsqlset-page-param namenum-storiesgt
- SELECT headings_num
- FROM user_prefs
- WHERE userid_at_user
- lt/xsqlset-page-paramgt
- ltxsqlquery max-rows_at_num-stories gt
- SELECT title, url
- FROM latest_news
- lt/xsqlquerygt
- lt/pagegt
46Another Way to Define a Page Level Parameter
- Page level parameters can also be set with the
statement - ltxsqlset-page-param namepname valueval/gt
- For example
- ltxsqlset-page-param namenum-stories
value10/gt
47Additional Options
- The set-page-param element can have the following
attributes - only-if-unset If the value is yes then the
parameter will be set only if it has no value - ignore-empty-value If value is yes then the
parameter will be set only if its value will not
be an empty string
48Setting Cookie Values
- The following statement defines a parameter
pname. The value of pname is the value in the
first column of the first row of the query - The variable pname will be recognized until the
cookie expires - ltxsqlset-cookie namepnamegt
- SELECT Statement
- lt/xsqlset-cookiegt
49Additional Attributes for Set-Cookie
- The set-cookie element can have the following
attributes - max-age The number of seconds before the cookie
expires (defaults to expire when user exits
current browser instance) - only-if-unset
- ignore-empty-value
50Example
- lt?xml version1.0?gt
- ltpage connectionscott xmlnsxsqlurnoracle-xs
qlgt - ltxsqlset-cookie namesiteuser
max-age31536000 - only-if-unsetyes ignore-empty-valueyes
gt - SELECT username
- FROM site_users
- WHERE username _at_username and
- password_at_password
- lt/xsqlset-cookiegt
- lt!-- Other Actions Here --gt
- lt/pagegt
51DML or PL/SQL
- We can do DML (update, insert, delete) or call
PL/SQL procedures with the following basic
syntax - ltxsqldmlgt
- DML Statement
- lt/xsqldmlgt
- or
- ltxsqldmlgt
- BEGIN
- Any valid PL/SQL Statement
- END
- lt/xsqldmlgt
52Example
- ltxsqldmlgt
- INSERT INTO page_requests_log(page,userid)
- VALUES(page12.xsql, _at_siteuser)
- lt/xsqldmlgt
- If successful the following element is added to
the page - ltxsql-status actionxsqldml rowsn /gt
- Otherwise, an error element is added
- ltxsql-error actionxsqldmlgt ...
- lt/xsql-errorgt