Title: Semistructured Data and XML and RDF
1Semistructured Data and XML and RDF
Some of the materials of this lecture are from
Prof. Dan Suciu, Univ. of Washington and John
Punin, Rensselaer Polytechnic Inst.
2Overview
- Semistructured Data
  - Model
  - Syntax
  - Comparison with relational data
- XML
  - Motivation
  - Syntax, DTDs
- RDF
  - Motivation
  - Syntax, RDFS
  - Semantics
3Semistructured Data
- Schemaless, self-describing
- Labeled tree
4XML vs. Semistructured Data
- both described best by a graph
- both are schema-less, self-describing
5More Differences
- XML is ordered, ssd is not
- XML can mix text and elements:
    <talk> Making Java easier to type and easier to type
      <speaker> Phil Wadler </speaker>
    </talk>
- XML has lots of other stuff: entities, processing instructions, etc.
These differences make XML data management harder
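To make the first two differences concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (the choice of Python and of this parser is an assumption made for illustration, not something the lecture prescribes). It parses the talk example and shows where the parser stores mixed text and that child order is preserved. The `<r>` document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# The talk example: text and a child element mixed inside one element.
doc = ("<talk>Making Java easier to type and easier to type "
       "<speaker>Phil Wadler</speaker></talk>")
talk = ET.fromstring(doc)

# Text before the first child is stored on the parent (.text);
# text that follows a child is stored on that child (.tail).
leading_text = talk.text.strip()
speaker = talk.find("speaker")
speaker_name = speaker.text
trailing_text = speaker.tail  # None here: nothing follows </speaker>

# Children are kept in document order, unlike edges in an unordered ssd graph.
ordered = ET.fromstring("<r><a/><b/><a/></r>")  # hypothetical document
child_order = [child.tag for child in ordered]

print(leading_text)
print(speaker_name)
print(child_order)
```

A data manager for XML has to preserve both the interleaving of text with elements and the sibling order, which an unordered labeled graph does not need to track.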
6XML
- a W3C standard to complement HTML
- origins: structured text, SGML
- motivation:
  - HTML describes presentation
  - XML describes content
- http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, Second Edition, 10/2000)
7From HTML to XML
HTML describes the presentation
8HTML
    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i>
        Abiteboul, Hull, Vianu
        <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i>
        Abiteboul, Buneman, Suciu
        <br> Morgan Kaufmann, 1999
9XML
    <bibliography>
      <book> <title> Foundations </title>
             <author> Abiteboul </author>
             <author> Hull </author>
             <author> Vianu </author>
             <publisher> Addison Wesley </publisher>
             <year> 1995 </year>
      </book>
      ...
    </bibliography>
XML describes the content
10XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- an XML document: single root element
A well-formed XML document: one that has matching tags
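As a rough illustration of well-formedness, the sketch below asks Python's standard `xml.etree.ElementTree` parser whether it accepts a string as a single XML document. The helper name `is_well_formed` and the three sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title><author>Hull</author></book>"
mismatched = "<book><title>Foundations</book></title>"  # tags closed in the wrong order
two_roots = "<book/><book/>"                            # more than one root element

print(is_well_formed(good), is_well_formed(mismatched), is_well_formed(two_roots))
```

Both failure cases correspond to bullets above: tags must match and nest properly, and a document has exactly one root element.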
11More XML: Attributes
    <book price = "55" currency = "USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      ...
      <year> 1995 </year>
    </book>
attributes are alternative ways to represent data
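The sketch below shows the same price recorded once as an attribute and once as a subelement, read with Python's standard `xml.etree.ElementTree`. The element-based variant is a hypothetical re-encoding added here for comparison; only the attribute form appears on the slide.

```python
import xml.etree.ElementTree as ET

# Attribute form, as on the slide (other children omitted for brevity).
as_attributes = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title></book>"
)

# Hypothetical element form carrying the same information.
as_elements = ET.fromstring(
    "<book><price>55</price><currency>USD</currency>"
    "<title>Foundations of Databases</title></book>"
)

price_from_attribute = as_attributes.get("price")    # attribute lookup
price_from_element = as_elements.findtext("price")   # child element lookup

print(price_from_attribute, price_from_element)
```

Either encoding yields the same value; the practical differences are that an attribute can occur at most once per element and holds only text, while a subelement can repeat and nest.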
12More XML: Oids and References
    <person id="o555"> <name> Jane </name> </person>
    <person id="o456"> <name> Mary </name>
                       <children idref="o123 o555"/>
    </person>
    <person id="o123" mother="o456"><name>John</name>
    </person>
oids and references in XML are just syntax
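Because the references are only syntax, it is the application that has to turn them back into graph edges. A minimal sketch of that step in Python follows, assuming the three persons are wrapped in a hypothetical `<people>` root so that the fragment parses as one document; the helper `name_of` is also an assumption.

```python
import xml.etree.ElementTree as ET

doc = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""
root = ET.fromstring(doc)

# Index every person by its id attribute so references can be followed.
by_id = {person.get("id"): person for person in root.iter("person")}

def name_of(oid: str) -> str:
    """Follow a reference and return the name of the person it points to."""
    return by_id[oid].findtext("name")

# idref holds a whitespace-separated list of ids.
child_ids = by_id["o456"].find("children").get("idref").split()
children_of_mary = [name_of(oid) for oid in child_ids]
mother_of_john = name_of(by_id["o123"].get("mother"))

print(children_of_mary, mother_of_john)
```

Nothing in the parser checks that `o123` exists or is unique; that guarantee only appears once a DTD declares the attributes as ID and IDREF and the document is validated.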
13XML Namespaces
- http://www.w3.org/TR/REC-xml-names (1/99)
- name ::= [prefix:]localpart
    <book xmlns:isbn="www.isbn-org.org/def">
      <title> ... </title> <number> 15 </number>
      <isbn:number> ... </isbn:number> </book>
14XML Namespaces
- syntactic: <number>, <isbn:number>
- semantic: provide URL for schema
    <tag xmlns:mystyle = "http://...">
      <mystyle:title> ... </mystyle:title>
      <mystyle:number> ...
    </tag>
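A short sketch of how a namespace-aware parser sees the book example: Python's standard `xml.etree.ElementTree` expands each prefixed name into `{namespace-URI}localpart`, so the two `number` elements no longer collide. The title text and the ISBN value are hypothetical placeholders for the parts the slide elides.

```python
import xml.etree.ElementTree as ET

doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title>Placeholder title</title>
  <number>15</number>
  <isbn:number>0-000-00000-0</isbn:number>
</book>
"""
root = ET.fromstring(doc)

# The prefix is replaced by the namespace URI it was bound to.
tags = [child.tag for child in root]

# Queries can use a prefix again if a prefix-to-URI map is supplied.
ns = {"isbn": "www.isbn-org.org/def"}
plain_number = root.findtext("number")
isbn_number = root.findtext("isbn:number", namespaces=ns)

print(tags)
print(plain_number, isbn_number)
```

The prefix itself carries no meaning: binding a different prefix to the same URI would produce the same expanded names.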
15XLink
- Generalizes HTML's href
- Many types: simple, extended, locator, ...
- Discuss only simple links
    <person xmlns:xlink="http://www.w3.org/1999/xlink"
            xlink:type="simple"
            xlink:href="http://a.b.c/myhomepage.html"
            xlink:title="The Homepage"
            xlink:show="replace"
            xlink:actuate="onRequest"> ..... </person>
Slide callouts distinguish required attributes from optional attributes.
16XPointer
- An extension of XPath
- Usage:
  - href="www.a.b.c/document.xml#xpointerExpr"
- An xpointer expression points to:
  - A point
  - A range
17XML: Document Type Definitions
- part of the original XML specification
- an XML document may have a DTD
- terminology for XML:
  - well-formed: if tags are correctly closed
  - valid: if it has a DTD and conforms to it
- validation is useful in data exchange
19Very Simple DTD
Example of valid XML document:
    <company> <person> <ssn> 123456789 </ssn>
                       <name> John </name>
                       <office> B432 </office>
                       <phone> 1234 </phone> </person>
              <person> <ssn> 987654321 </ssn>
                       <name> Jim </name>
                       <office> B123 </office> </person>
              <product> ... </product> ... </company>
20Content Model
- Element content: what we can put in an element (aka content model)
- Content model:
  - Complex: a regular expression over other elements
  - Text-only: #PCDATA
  - Empty: EMPTY
  - Any: ANY
  - Mixed content: (#PCDATA | A | B | C)*
    (i.e. very restricted)
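The idea that a complex content model is a regular expression over child elements can be sketched directly with Python's `re` module. This is an illustration of the idea only, not a DTD validator: the content model `(ssn, name, office, phone?)` is translated by hand into a regex, and the names `PERSON_MODEL` and `matches_model` and the third sample person are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content model (ssn, name, office, phone?) as a regex over child tag names.
# Each tag is followed by a space so adjacent names cannot run together.
PERSON_MODEL = re.compile(r"ssn name office (phone )?")

def matches_model(element: ET.Element, model: re.Pattern) -> bool:
    """Check the sequence of child tags of `element` against a content model."""
    child_tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_tags) is not None

john = ET.fromstring(
    "<person><ssn>123456789</ssn><name>John</name>"
    "<office>B432</office><phone>1234</phone></person>"
)
jim = ET.fromstring(
    "<person><ssn>987654321</ssn><name>Jim</name><office>B123</office></person>"
)
# Hypothetical invalid person: children appear in the wrong order.
out_of_order = ET.fromstring(
    "<person><name>Ann</name><ssn>111</ssn><office>B1</office></person>"
)

print(matches_model(john, PERSON_MODEL),
      matches_model(jim, PERSON_MODEL),
      matches_model(out_of_order, PERSON_MODEL))
```

The optional `phone?` in the content model becomes an optional group in the regex, which is why a person without a phone is still accepted while a reordered one is rejected.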
21Attributes in DTDs
    <!ELEMENT person (ssn, name, office, phone?)>
    <!ATTLIST person age     CDATA  #REQUIRED
                     id      ID     #REQUIRED
                     manager IDREF  #REQUIRED
                     manages IDREFS #REQUIRED >

    <person age="25" id="p29432"
            manager="p48293" manages="p34982 p423234">
      <name> .... </name>
      ... </person>
22Attributes in DTDs
- Kind:
  - #REQUIRED
  - #IMPLIED: optional
  - value: default value
  - #FIXED value: the only value allowed
23XML Schemas
- http://www.w3.org/TR/xmlschema-1/ (10/2000)
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
  - http://www.w3.org/TR/xmlschema-1
  - http://www.w3.org/TR/xmlschema-2
- XML-Schema is very complex
  - often criticized
  - some alternative proposals
24RDF
- RDF Model and Syntax
- Language to describe resources
- Uses metadata (data about data) to describe Web resources
- Provides interoperability between applications that exchange
  machine-understandable information on the Web
- Uses XML as a syntax
25RDF Triple
- Resources: things being described by RDF expressions. Resources are
  always named by URIs
  - HTML document
  - Specific XML element within the document source
  - Collection of pages
- Properties: specific aspect, characteristic, attribute or relation
  used to describe a resource
  - Creator
  - Title
  - Name
- Property value: the corresponding value of the property related to
  the resource
  - John Smith
  - Science Fiction
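A triple can be sketched as a plain three-field record. The Python fragment below is a toy in-memory representation, not an RDF library; the `http://example.org/...` URIs, the page title, the `Subject` property, and the helper `values` are all hypothetical and exist only to give the example properties and values something to attach to.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    resource: str  # URI naming the thing being described
    prop: str      # property, e.g. Creator or Title
    value: str     # property value

triples = [
    Triple("http://example.org/doc.html", "Creator", "John Smith"),
    Triple("http://example.org/doc.html", "Title", "A Hypothetical Page"),
    Triple("http://example.org/novel", "Subject", "Science Fiction"),
]

def values(resource: str, prop: str) -> list:
    """Return every value recorded for a given resource and property."""
    return [t.value for t in triples if t.resource == resource and t.prop == prop]

creators = values("http://example.org/doc.html", "Creator")
described = sorted({t.resource for t in triples})

print(creators)
print(described)
```

Since every statement has the same shape, a collection of triples can be merged with another collection simply by taking the union, which is part of what makes the model convenient for exchanging metadata.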
26RDF Schema
- Basic vocabulary to describe RDF
- Defines properties of the resources (e.g., title, author, subject, etc.)
- Defines kinds of resources being described (books, Web pages, people, etc.)
- XML Schema gives specific constraints on the structure of an XML document
- RDF Schema provides information about the interpretation of the RDF
  statements
27RDF Entailment
- Generate new RDF facts
- Schema and instance information
- Entailment Rules
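As a hedged illustration of how entailment rules combine schema and instance information, the sketch below applies two of the standard RDFS rules (rdfs11, transitivity of `rdfs:subClassOf`, and rdfs9, propagation of `rdf:type` to superclasses) until no new triples appear. It is a naive fixed-point loop over tuples, not a full RDFS reasoner, and the `ex:` class and instance names are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(facts):
    """Apply rules rdfs9 and rdfs11 repeatedly until a fixed point is reached."""
    inferred = set(facts)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if p2 == SUBCLASS and o == s2:
                    if p == SUBCLASS:
                        # rdfs11: subClassOf is transitive
                        new.add((s, SUBCLASS, o2))
                    elif p == TYPE:
                        # rdfs9: an instance of a class is an instance of its superclasses
                        new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new

facts = {
    ("ex:Novel", SUBCLASS, "ex:Book"),      # schema information
    ("ex:Book", SUBCLASS, "ex:Document"),   # schema information
    ("ex:dune", TYPE, "ex:Novel"),          # instance information
}
closure = rdfs_closure(facts)
new_facts = closure - facts

for triple in sorted(new_facts):
    print(triple)
```

The three derived triples state that `ex:Novel` is a subclass of `ex:Document` and that `ex:dune` is both an `ex:Book` and an `ex:Document`; none of them was asserted explicitly.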