Title: Representing Data with XML
1. Representing Data with XML
- September 27, 2005
- Shawn Henry
- with slides from Neal Arthorne
2. Data Representation
- Design goals for data representation
  - Portable (platform independent)
  - Easy for machines to process
  - Human legible
  - Flexible and usable over the Internet and other networks
  - Concisely defined with formal rules
3. Extensible Markup Language
- World Wide Web Consortium (W3C) defines the Extensible Markup Language (XML)
- W3C also defined HTML, CSS, SVG and other Web standards, and contributed to HTTP (an IETF standard)
- XML Working Group formed in 1996
- XML 1.0 (Third Edition): 4 February 2004 (original Recommendation in 1998)
4. XML Example
    <?xml version="1.0" encoding="UTF-8"?>
    <foods>
      <pizza title="Deluxe Pizza">
        <name>The Deluxe</name>
        <toppings>
          <topping>peppers</topping>
          <topping>pepperoni</topping>
          <topping>mushrooms</topping>
          <topping>cheese</topping>
          <topping>tomato sauce</topping>
        </toppings>
        <price>7.99</price>
      </pizza>
    </foods>
- XML documents must be well-formed (correct syntax, properly nested and closed tags, etc.)
- XML documents are valid if they conform to a specified grammar (usually a DTD or XML Schema)
- DTDs (Document Type Definitions) provide a grammar for the XML by defining elements, attributes and entities
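Well-formedness can be checked with any XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is made up for this illustration. Checking validity against a DTD or XML Schema needs a validating parser and is not covered here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<pizza><name>The Deluxe</name></pizza>"
bad = "<pizza><name>The Deluxe</pizza>"  # <name> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```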
6. XML Advantages
- XML provides
  - Logical structure for data in a textual representation
  - Formal rules for validating documents
  - Flexibility to define your own markup language
  - Portability across networks and platforms
- Becoming a widely accepted data interchange format
- Processed with off-the-shelf tools
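As an illustration of processing with off-the-shelf tools, here is a rough sketch that reads the pizza document with Python's standard `xml.etree.ElementTree` parser. The variable names are chosen for this example only, and the XML declaration is left out of the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

PIZZA_XML = """<foods>
  <pizza title="Deluxe Pizza">
    <name>The Deluxe</name>
    <toppings>
      <topping>peppers</topping>
      <topping>pepperoni</topping>
      <topping>mushrooms</topping>
      <topping>cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <price>7.99</price>
  </pizza>
</foods>"""

root = ET.fromstring(PIZZA_XML)
pizza = root.find("pizza")

title = pizza.get("title")                    # attribute value
name = pizza.findtext("name")                 # text of a child element
toppings = [t.text for t in pizza.findall("toppings/topping")]
price = float(pizza.findtext("price"))        # XML text is untyped, so convert by hand

print(title, name, toppings, price)
```

The explicit `float(...)` call shows that the parser itself attaches no data types to the text it reads.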
7. XML Disadvantages
- XML drawbacks
  - Not a binary format, so it requires a lot of overhead for a little bit of data
  - Very little support for binary or mixed-media data; such data must be embedded as text (hex or base64 encoding)
  - Represents only data; holds no semantics or reasoning
- DTDs do not provide
  - Data types for each element or attribute
  - Complex structural rules for documents
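To make the binary-data drawback concrete, the sketch below embeds some arbitrary bytes in an XML element using base64 and measures the size increase. The element name `image` and the `encoding` attribute are hypothetical, picked only for this illustration.

```python
import base64
import xml.etree.ElementTree as ET

raw = bytes(range(256)) * 3                      # 768 bytes of arbitrary binary data
encoded = base64.b64encode(raw).decode("ascii")  # text-safe representation

# Wrap the encoded text in a (hypothetical) element.
image = ET.Element("image", {"encoding": "base64"})
image.text = encoded
document = ET.tostring(image, encoding="unicode")

# Round trip: parse the document and decode the payload again.
decoded = base64.b64decode(ET.fromstring(document).text)

overhead = len(encoded) / len(raw)               # base64 emits 4 characters per 3 bytes
print(len(raw), len(encoded), round(overhead, 2))
```

Base64 alone adds about one third to the payload size, before counting the surrounding tags.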
8. XML Schema
- XML Schema defines a new schema language to replace DTD
- Standardized by W3C in 2001
- Advantages
  - Provides data typing and logical structure
  - Written in XML (easy to process)
- Drawback: higher complexity than DTD
9. XML Schema Example
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="pizza">
        <xsd:complexType>
          <xsd:all>
            <xsd:element name="name" type="xsd:string" />
            <xsd:element name="toppings" type="Toppings" />
            <xsd:element name="price" type="xsd:float" />
          </xsd:all>
          <xsd:attribute name="title" type="xsd:string" />
        </xsd:complexType>
      </xsd:element>
      <xsd:complexType name="Toppings">
        <xsd:sequence>
          <xsd:element name="topping" minOccurs="1"
                       maxOccurs="unbounded" type="xsd:string" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
- An XML document is an instance document of an XML Schema
10. Simple Types
- Simple Types are of three varieties
- Atomic: built-in or derived, e.g.
    <xsd:simpleType name="myInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10000"/>
        <xsd:maxInclusive value="99999"/>
      </xsd:restriction>
    </xsd:simpleType>
- List: multiple items of the same type
    <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
- Union: union of two or more Simple Types
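The three varieties can be mimicked in ordinary code. This is only a simplified sketch of what a schema processor would check, not a real XML Schema validator; the function names are invented here, and the union of `myInteger` with a boolean is an assumed example, since the slides do not show one.

```python
def parse_my_integer(text: str) -> int:
    """Atomic type: an integer restricted to 10000..99999 inclusive."""
    value = int(text)
    if not 10000 <= value <= 99999:
        raise ValueError(f"{value} is outside 10000..99999")
    return value

def parse_list_of_my_int(text: str) -> list:
    """List type: whitespace-separated items, each a myInteger."""
    return [parse_my_integer(item) for item in text.split()]

def parse_union(text: str):
    """Union type (assumed members: myInteger, boolean): try each in turn."""
    try:
        return parse_my_integer(text)
    except ValueError:
        pass
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"{text!r} matches no member type of the union")

print(parse_list_of_my_int("20003 15037 95977 95945"))
print(parse_union("true"), parse_union("12345"))
```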
11. Built-in Types
- XML Schema defines numerous built-in types
  - integer, decimal, token, byte, boolean, date, time, short, long, float, anyURI, language
- Facets can be used to restrict existing types
  - min/maxInclusive, min/maxExclusive, pattern, enumeration, min/maxLength, length, totalDigits, ...
12. Complex Types
- Complex Types define logical structures with attributes and nested elements
- They use a sequence, choice or all group whose elements use Simple Types or other Complex Types
- May reference types defined elsewhere in the schema or imported using an import statement
13. In the Schema of Things
- XML Schema supersedes DTD
- Defines a typed data format with no semantics or relations between data
- Next step: a higher level of abstraction and the ability to define objects and relations
14. Resource Description Framework
- W3C standard for describing resources on the World Wide Web (1999, revised 2004)
- Objects identified by Uniform Resource Identifiers (URIs)
- Generalized to identify objects that may not be retrievable on the Web
- RDF is represented as a directed graph and in XML
15. RDF Example
[Figure: RDF graph of the statement below; one node is labelled "Federico Diaz"]
- In English: http://www.example.com/people/diaz/contact has the full name Federico Diaz and has an employer called Fisher and Sons.
16. RDF Parts
- Each RDF statement is a triple containing a subject (identified by a URI), a predicate (e.g. creator, title, full name) and an object
- An object can be either a literal value (e.g. Federico Diaz) or another RDF resource
- All three parts can be identified with a URI and fragment identifier; only the object may instead be a literal
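A triple maps naturally onto a three-field record. The sketch below models the Federico Diaz statements that way. The vocabulary namespace `VOCAB` and the company URI are hypothetical, and treating the employer as a separate resource (rather than a plain literal) is an assumption made to show both kinds of object.

```python
from typing import NamedTuple, Union

class Literal(NamedTuple):
    value: str                      # a plain literal value

class Triple(NamedTuple):
    subject: str                    # URI of the thing described
    predicate: str                  # URI of the property
    obj: Union[str, Literal]        # URI of another resource, or a literal

CONTACT = "http://www.example.com/people/diaz/contact"
VOCAB = "http://www.example.com/terms#"                        # hypothetical
COMPANY = "http://www.example.com/companies/fisher-and-sons"   # hypothetical

graph = [
    Triple(CONTACT, VOCAB + "fullName", Literal("Federico Diaz")),
    Triple(CONTACT, VOCAB + "employer", COMPANY),
    Triple(COMPANY, VOCAB + "name", Literal("Fisher and Sons")),
]

def objects(graph, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]

print(objects(graph, CONTACT, VOCAB + "fullName"))
```

Because the employer is itself a subject of another triple, the list of triples already forms a small directed graph.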
17. RDF Semantics
- RDF attaches no specific meaning to RDF statements, just as the name of a database field is meaningless to an SQL engine
- RDF does provide a way to attach data types to literal values, but RDF does not define data types
- Generally RDF software uses the XML Schema data types
    <size rdf:datatype="&xsd;int">10</size>
- Arbitrary XML can also be used as a literal
    <xprop rdf:parseType="Literal">...</xprop>
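How software might act on a datatype attached to a literal can be sketched as a lookup from datatype URI to a conversion function. The table `CONVERTERS` and the helper `to_python` are invented for this illustration and cover only three XML Schema types.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from datatype URI to a Python conversion function.
CONVERTERS = {
    XSD + "int": int,
    XSD + "float": float,
    XSD + "string": str,
}

def to_python(lexical: str, datatype: str = None):
    """Convert a literal's text using its datatype URI; unknown types stay text."""
    converter = CONVERTERS.get(datatype)
    return converter(lexical) if converter else lexical

print(to_python("10", XSD + "int"))   # typed literal, as in the <size> example
print(to_python("10"))                # untyped literal stays a string
```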
18. RDF Schema
- RDF Schema is a vocabulary description language that relates resources to each other using RDF
- RDFS uses classes of objects like in Object-Oriented (OO) systems
- Class properties relate to other classes using OO concepts such as generalization
19. RDF Schema Use
- Differs from OO in that Properties are defined in terms of the resources to which they apply (their domain); they are not restricted to the scope of a single class
  - domain: Classes to which a Property applies
  - range: the Class of a Property's values (i.e. its type)
- Allows new Properties to be created that apply to the same domain without redefining the domain
20. RDFS Classes
- Classes introduced by RDFS
  - Resource: top-level class
  - Literal: all literal values, like text strings
  - Class: the class of all classes
  - Datatype: top-level class of RDF datatypes
- Properties introduced by RDFS
  - subClassOf
  - subPropertyOf
  - domain: domain of a Property
  - range: range of a Property
  - label, comment, seeAlso: human-readable labels, descriptions and references
21. RDFS Example
    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/schemas/food">
      <rdfs:Class rdf:ID="Food"/>
      <rdfs:Class rdf:ID="Pizza">
        <rdfs:subClassOf rdf:resource="#Food"/>
      </rdfs:Class>
      <rdfs:Class rdf:ID="Topping">
        <rdfs:subClassOf rdf:resource="#Food"/>
      </rdfs:Class>
      <rdfs:Datatype rdf:about="&xsd;float"/>
      <rdf:Property rdf:ID="hasTopping">
      ...
22. RDF Example
    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/schemas/food#"
             xml:base="http://example.org/things">
      <ex:Pizza rdf:ID="ShawnsPizza">
        <ex:price rdf:datatype="&xsd;float">12.99</ex:price>
        <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
        <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
      </ex:Pizza>
    </rdf:RDF>
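The RDF/XML above is just another serialization of triples. The sketch below pulls them out with Python's standard `xml.etree.ElementTree`. It handles only the small subset of RDF/XML used in this example (typed nodes with `rdf:ID`, `rdf:resource` and `rdf:datatype`), the function names are invented, and the entity reference is written out as a full URI to keep the snippet self-contained. A real application would use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schemas/food#"
         xml:base="http://example.org/things">
  <ex:Pizza rdf:ID="ShawnsPizza">
    <ex:price rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.99</ex:price>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
  </ex:Pizza>
</rdf:RDF>"""

def uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    return tag[1:].replace("}", "", 1)

def to_triples(text: str) -> list:
    """Extract (subject, predicate, object) tuples from the simple subset above."""
    root = ET.fromstring(text)
    base = root.get("{%s}base" % XML_NS, "")
    triples = []
    for node in root:
        # rdf:ID="x" names the resource <base#x>
        subject = base + "#" + node.get("{%s}ID" % RDF)
        triples.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF)
            if resource is not None:
                obj = resource                                   # another resource
            else:
                obj = (prop.text, prop.get("{%s}datatype" % RDF))  # typed literal
            triples.append((subject, uri(prop.tag), obj))
    return triples

for triple in to_triples(RDF_XML):
    print(triple)
```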
- RDF Schema lets authors create vocabularies of Classes and Properties and show how the terms should be used to describe resources, e.g.
  - Property author applies to class Book
  - Class Employee is a subclass of Person
- Does not define descriptive properties such as dateOfIssue or title but references them using URIs
- Like in XML/XML Schema, an RDF instance document can be validated against its RDF Schema
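One common way to use such a vocabulary is to derive class membership from it. The sketch below applies the two examples above (author applies to Book, Employee is a subclass of Person) as simple rules. The range of `author`, the resource names and the single-superclass simplification are assumptions made for this illustration.

```python
SUBCLASS_OF = {"Employee": "Person"}   # class -> direct superclass
DOMAIN = {"author": "Book"}            # property -> class of its subjects
RANGE = {"author": "Person"}           # assumed; not stated in the slides

def superclasses(cls: str) -> list:
    """Follow subClassOf links upward, guarding against cycles."""
    seen = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in seen:
        cls = SUBCLASS_OF[cls]
        seen.append(cls)
    return seen

def infer_types(triples) -> set:
    """Derive (resource, class) pairs from domain, range and subClassOf."""
    types = set()
    for s, p, o in triples:
        if p == "type":
            types.add((s, o))
        if p in DOMAIN:
            types.add((s, DOMAIN[p]))
        if p in RANGE:
            types.add((o, RANGE[p]))
    for resource, cls in list(types):
        for sup in superclasses(cls):
            types.add((resource, sup))
    return types

triples = [
    ("moby_dick", "author", "melville"),   # hypothetical resources
    ("alice", "type", "Employee"),
]
print(sorted(infer_types(triples)))
```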
24. Machines Understanding the Web
- RDF/RDFS along with XML/XML Schema provide a means to describe resources on the web with basic generalization
- For a higher conceptual level, applications require semantic information
- Ontologies serve as a starting point for
25. Ontologies on the Web
- Ontologies define the terms used to represent an area of knowledge (OWL Use Cases & Requirements, 2004)
- Example use cases
  - A web portal that needs to classify information
  - Multimedia archive that requires a taxonomy of media or content-specific properties
  - Corporate portal website that integrates vocabularies from different departments
26. Web Ontology Language (OWL)
- Supersedes DAML+OIL
- DARPA Agent Markup Language (DAML) was based on RDF/RDFS and includes much of what is now OWL
- Adds terms used to better describe relations between classes of RDF resources
- With OWL, ontologies can be integrated, extended and shared
27. Web Ontology Language
- Individuals
  - OWL does not honour the Unique Names Assumption (UNA)
- Properties
  - Binary relations between individuals
  - Functional, transitive or symmetric
- Classes
  - Sets containing individuals
  - Organized into a taxonomy with subclasses and
28. Three Flavours of OWL
- OWL Lite
  - For classification hierarchies with simple constraints
- OWL DL
  - Expressiveness with computational completeness and decidability
- OWL Full
  - Maximum expressiveness
  - No computational guarantees
  - Extension of RDF
29. OWL Features
- OWL improvements on RDF/RDFS
- Cardinality
  - min/maxCardinality for Properties with respect to a Class
- Equality, disjointness
  - equivalentClass, equivalentProperty, sameAs, differentFrom, disjointWith
- Transitive, Symmetric, Functional Properties
  - Labelling a Property allows for reasoning
  - A has B and B has C implies A has C (Transitive)
  - A has B implies B has A (Symmetric)
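The two inference rules for transitive and symmetric properties can be sketched over plain sets of pairs. The property names `locatedIn` and `adjacentTo` and their data are hypothetical; an OWL reasoner does far more than this.

```python
def symmetric_closure(pairs) -> set:
    """A has B implies B has A."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs) -> set:
    """A has B and B has C implies A has C, repeated until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

# Hypothetical transitive property.
located_in = {("Ottawa", "Ontario"), ("Ontario", "Canada")}
# Hypothetical symmetric property.
adjacent_to = {("Ontario", "Quebec")}

print(sorted(transitive_closure(located_in)))
print(sorted(symmetric_closure(adjacent_to)))
```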
30. OWL Features (contd)
- Boolean expressions of Class relations
  - unionOf, complementOf, intersectionOf
- Property restrictions
  - Limits how properties can be used by an instance of a class
- Versioning
  - priorVersion, versionInfo, incompatibleWith, ...
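Since classes are sets of individuals, the Boolean class expressions correspond to set operations. The sketch below uses made-up individuals and classes. Taking the complement against a fixed, finite universe is a simplification, because OWL does not assume that all individuals are known.

```python
# Hypothetical individuals and classes, for illustration only.
individuals = {"margherita", "deluxe", "veggie", "cola"}
Pizza = {"margherita", "deluxe", "veggie"}
Vegetarian = {"margherita", "veggie", "cola"}

vegetarian_pizza = Pizza & Vegetarian    # intersectionOf
pizza_or_vegetarian = Pizza | Vegetarian # unionOf
not_pizza = individuals - Pizza          # complementOf (within this universe)

print(sorted(vegetarian_pizza))
print(sorted(not_pizza))
```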
[Figure: layered stack, highest level first; side labels: Knowledge, Data]
- Conceptual level: reasoning, smart applications
- Knowledge processing and reasoning
- RDF Schema: resource description and vocabulary
- XML Schema: data formatting and data types
- Unicode/ISO byte streams: machine data representation
- World Wide Web Consortium: http://www.w3.org
- XML: http://www.w3.org/TR/REC-xml
- XML Schema Part 0: Primer: http://www.w3.org/TR/xmlschema-0/
- RDF Primer: http://www.w3.org/TR/rdf-primer/
- RDF Concepts: http://www.w3.org/TR/rdf-concepts/
- RDF/XML Syntax: http://www.w3.org/TR/rdf-syntax-grammar/
- RDF Schema: http://www.w3.org/TR/rdf-schema/
- OWL Use Cases & Requirements: http://www.w3.org/TR/webont-req/
- OWL Overview: http://www.w3.org/TR/owl-features/