Title: COMPSCI 732: Semantic Web Technologies
Semantic Web Architecture
Where are we?
Title
1 Introduction
2 Semantic Web Architecture
3 Resource Description Framework (RDF)
4 Web of Data
5 Generating Semantic Annotations
6 Storage and Querying
7 Web Ontology Language (OWL)
8 Rule Interchange Format (RIF)
Overview
- Introduction and motivation
- Technical solutions
- Semantic Web architecture
- Uniform Resource Identifier
- eXtensible Markup Language (XML)
- XML Schema
- Namespaces
- Extensions
- Illustration by a large example
- Summary
- References
INTRODUCTION AND MOTIVATION
A Semantic Web Scenario From Today
- Queries
  - Which type of music is played by UK radio stations?
  - Which UK radio station is playing titles by Swedish composers?
- The information needed to answer these queries is available on the Web
- Web search engines analyze Web content one page at a time
- The Semantic Web provides a better framework to answer such queries, because it combines data
  - distributed across different sources, and
  - described in a machine-interpretable manner
Steps in Answering Queries
- Playlists of BBC radio shows are published online in Semantic Web formats
- Music groups such as ABBA have an identifier: http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist
- The identifier can relate the music group to information at MusicBrainz
  - A music community portal exposing data on the Semantic Web
  - http://musicbrainz.org
  - Knows about band members (e.g. Benny Andersson)
  - Aligns its information with Wikipedia
- Information on UK radio stations may be found in lists on Web pages
  - These can be translated into a similar Semantic Web representation
Describing Things and Their Relationships
- The meaning of relationships, e.g. band memberships, is explained online, too
- Using collections of ontologies available on the Web
  - Dublin Core (general properties of information resources): http://dublincore.org/
  - SKOS (covering taxonomic descriptions): http://www.w3.org/2004/02/skos/
  - Specialized ontologies (covering the music domain)
- Data at the BBC currently use at least nine different ontologies
  - http://www.bbc.co.uk/ontologies/programmes
- The availability of data in these formats enables queries to be answered
  - Based on a query language
Towards the Required Infrastructure
- What infrastructure is required to implement the scenario above?
  - Generic software components, languages, protocols
  - Their seamless interaction to satisfy requests
- Purpose of this lecture
  - Investigate the Semantic Web architecture
  - Analyze requirements arising from the technical need to identify and relate data
  - Analyze organizational needs to maintain the Semantic Web as a whole
Web Architecture
- The Semantic Web is an evolution of the Web
- Important for the fast growth and adoption of the Web:
  - Many people can set up Web servers easily and independently of each other
  - More people can create documents, put them online, and link them to each other
  - Even more people can browse and access any Web server to retrieve documents
- Web architecture allows graceful degradation of the user experience when
  - the network is partially slow ("World Wide Wait"), while other parts still operate at full speed
  - single Web servers break, because others still work
  - hyperlinks are broken, because other links still lead somewhere
- Separation of concerns justifies lower-quality outputs
  - Users can easily create and access documents
- The distributed nature of the system, with no need for a central coordinator, results in robustness
Web Architecture Principles
- Explicit, simple data representation
  - A common data representation hides the underlying technologies (e.g. HTML)
- Distributed system
  - Data sources without a centralized instance controlling who owns what type of information
  - Distributed ownership and control can facilitate adoption and scalability
  - E.g. Web pages are under the full control of their producers
- Cross-referencing
  - Reuse of existing data and data definitions from different authorities (e.g. hyperlinks)
- Loose coupling achieved by common language layers
  - Communication in standardized languages
  - These must be easy to customize
  - Overall communication must not be jeopardized by such specialization
  - E.g. coupling of Web clients and servers: HTTP for transport, HTML for Web content
- Ease of publishing and consumption
  - Easy publishing and consumption of simple data
  - Comprehensive publishing and consumption of complex data
  - E.g. HTML is simple enough to convey textual information, while powerful browsers and content management systems handle the rest
Semantic Web Requirements and Examples
- Must be able to represent entities and their relationships (1)
  - A person, the birthday of a person, the name of a person (Benny Andersson)
- Must be serializable in a standardized manner to exchange data easily between different computing nodes (1, 2, 4)
  - Ease of joining information from MusicBrainz, BBC, DBpedia
- Entities must be referable across borders of ownership or computing systems to allow for cross-linking of data (1, 2, 3, 4)
  - Otherwise ABBA's Benny Andersson becomes hard to distinguish from other Benny Anderssons
- Expressive, machine-understandable data description language (1, 4, 5)
  - Manual inspection is not scalable, and refinements of the basic model would be impossible
  - BBC data involves radio stations, shows, their versions, songs and their artists
- A query and manipulation language to select and aggregate data (5)
  - E.g. the number of Swedish composers being broadcast on a specific program
- Reasoning is desirable to facilitate querying (5)
  - E.g. a direct relationship between a program and a song derived by inference
- Transport of data, queries and their results by agreed-upon protocols (HTTP)
  - May involve encrypted data requests and transports (HTTPS), and signatures of data items to ensure the authenticity of user requests and to control access to resources
Additional Requirements
- Core requirements not yet included in the language architecture
- Versatile means for user interaction
  - Broad accessibility requires viewing, searching, browsing and querying of data
  - while at the same time abstracting from the intricacies of its underlying distributed origin
- On-the-fly data integration of multiple data sources: assemble information from a multitude of sources without a priori knowledge about the domain or structure of the data
- Facilitation of data production and publishing: metadata creation and migration of data must be made convenient, independent of the origin of the data
- Provenance and trust
  - Authorship and ownership get lost during data processing and aggregation
  - Origin, reliability and trustworthiness must be rethought so that they apply to individual and aggregated data items, to establish faithful authentication at Semantic Web scale
- Alignment of unconnected sets of data
  - Interlinking implies the capability to suggest alignments between identifiers or concepts from different sets of data, beyond the mere use of identifiers such as URIs/IRIs
  - Such alignment may be necessary to enable a real Web of Data
Semantic Web Architecture
- Formalized components and their relationships
- What technologies make up the Semantic Web
- What are the dependencies between components
- Roadmap for steps of developing the Semantic Web
TECHNICAL SOLUTION
- The Semantic Web architecture and its foundations
Search and Query the Web I
- The Web is a constantly growing network of distributed resources
  - More than 1 trillion unique URLs
  - More than 100 billion pages
  - More than 200 million web sites
  - Check the most up-to-date data at http://news.netcraft.com/archives/web_server_survey.html
- Users need to be able to search resources/content over the Web efficiently
  - When I google "Milan", do I find information on the city or on the soccer team?
- Users need to be able to run queries over widely distributed resources
  - When is the next performance of the rock band U2, where will it take place, what are the best ways to reach the location, and what attractions are nearby?
Search and Query the Web II
- On2Broker is the evolution of Ontobroker, a system that aims to solve the problems discussed on the previous slides by adopting Semantic Technologies
- On2Broker processes distributed information sources and provides intelligent information retrieval and query answering
- On2Broker relies on components of the Semantic Web architecture
- D. Fensel, S. Decker, M. Erdmann, R. Studer: Ontobroker in a Nutshell. ECDL 1998: 663-664
On2Broker Architecture
On2Broker Components I
- Query Interface
  - Provides a structured input that enables users to define their queries without any knowledge of the query language
  - Input queries are then transformed into the query language (e.g. SPARQL)
- Repository
  - Decouples query answering, information retrieval and reasoning
  - Provides support for materialization of inferred knowledge
On2Broker Components II
- Crawlers and Wrappers (or Info Agent)
  - Extract knowledge from different distributed and heterogeneous data sources
  - RDFa pages and RDF repositories can be included directly
  - HTML and XML data sources require processing by wrappers to derive RDF data
- Inference Engine
  - Relies on knowledge imported by the crawlers and on axioms contained in the repository to support query answering
  - Adopts Horn logic and the closed world assumption
On2Broker Example
- Tim Berners-Lee knows Christian Bizer and Tom Heath
1. Whom does Tim Berners-Lee know?
2. SELECT DISTINCT ?s ?o WHERE { ?s foaf:knows ?o . }
- Extract RDF from http://www.w3.org/People/Berners-Lee/dblp
- Extract RDF from fensel.comdblp
- Extend the KB: if x dblp:coauthor y then x foaf:knows y
- if y foaf:knows x then x foaf:knows y
SemWeb Architecture Requirements
- Extensibility
  - Each layer should extend the previous one(s)
- Support for data interchange
  - Using data from one source in other applications
- Support for ontology description with different complexity
  - Including rules
- Support for data query
- Support for data provenance and trust evaluation
See the Semantic Web Roadmap: http://www.w3.org/DesignIssues/Semantic.html
Semantic Web Stack
[Figure: Semantic Web Stack layer diagram, including the Rules layer (RIF)]
Adapted from http://en.wikipedia.org/wiki/Semantic_Web_Stack
UNICODE, URI and XML
- UNICODE is the standard international character set
  - E.g. used to encode the data in the repository
- Uniform Resource Identifiers (URIs) identify things and concepts
  - E.g. used to identify resources on the Web and in the repository
  - Be careful to distinguish between information resources and non-information resources
  - http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist vs. http://dbpedia.org/resource/ABBA
- Data publishers on the Semantic Web use Linked Data principles
  - Use URIs as names for things
  - Use HTTP URIs so that people can look up those names
  - When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  - Include links to other URIs, so that they can discover more things
- eXtensible Markup Language (XML) is used for data exchange
  - Used on the Semantic Web to exchange descriptions of resources
  - E.g. a format that can be transformed into RDF and imported into the repository
RDF, RDFS and OWL
- Resource Description Framework (RDF)
  - is the HTML of the Semantic Web
  - A simple way to describe resources on the Web
  - Based on triples <subject, predicate, object>
  - Various serializations, including one based on XML
- A simple ontology language (RDFS)
  - E.g. the language used to store the data in the repository
  - More in lecture 3
- Web Ontology Language (OWL)
  - A more complex ontology language than RDFS
  - Layered language based on Description Logics
  - Overcomes some RDF(S) limitations
  - E.g. the ontology language used to define the schemas used in the repository
  - More in lecture 7
RDF Graph Encoding a Description of ABBA
RDF Serialized in RDF/XML
    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [
      <!ENTITY bbca "http://www.bbc.co.uk/music/artists/">
      <!ENTITY bbci "http://www.bbc.co.uk/music/images/artists/">
      <!ENTITY mba "http://musicbrainz.org/artist/">
    ]>
    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:mo="http://purl.org/ontology/mo/">
      <mo:MusicArtist rdf:about="http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist">
        <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup"/>
        <foaf:name>ABBA</foaf:name>
        <foaf:homepage rdf:resource="http://www.abbasite.com/"/>
        <mo:image rdf:resource="&bbci;542x305/d87e52c5-bb8d-4da8-b941-9f4928627dc8.jpg"/>
        <mo:member rdf:resource="&bbca;042c35d3-0756-4804-b2c2-be57a683efa2#artist"/>
        <mo:member rdf:resource="&bbca;2f031686-3f01-4f33-a4fc-fb3944532efa#artist"/>
      </mo:MusicArtist>
    </rdf:RDF>
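In this RDF/XML, a typed node element carries the subject IRI in `rdf:about`, and each child element is one predicate. The object is either an IRI in `rdf:resource` or the element's text. A small sketch with Python's ElementTree makes those triples explicit. It assumes the flat shape used above (one level of node elements, no `rdf:parseType`, no nested descriptions), with the entities expanded by hand, and it does not distinguish literals from IRIs. It is an illustration, not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened version of the document above, with &bbca; expanded by hand.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:mo="http://purl.org/ontology/mo/">
  <mo:MusicArtist rdf:about="http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist">
    <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup"/>
    <foaf:name>ABBA</foaf:name>
    <foaf:homepage rdf:resource="http://www.abbasite.com/"/>
    <mo:member rdf:resource="http://www.bbc.co.uk/music/artists/042c35d3-0756-4804-b2c2-be57a683efa2#artist"/>
  </mo:MusicArtist>
</rdf:RDF>"""


def iri(tag: str) -> str:
    """ElementTree writes names as '{namespace}local'; RDF concatenates both into one IRI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples(root: ET.Element) -> List[Tuple[str, str, str]]:
    out = []
    for node in root:  # each child of rdf:RDF describes one subject
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # A typed node element such as <mo:MusicArtist> abbreviates an rdf:type triple
            out.append((subject, RDF + "type", iri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            # rdf:resource holds an IRI object; otherwise the element text is a literal
            out.append((subject, iri(prop.tag), resource if resource is not None else prop.text))
    return out


for t in triples(ET.fromstring(RDF_XML)):
    print(t)
```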
RDF Serialized in Turtle
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix mo: <http://purl.org/ontology/mo/> .

    <http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist>
        rdf:type mo:MusicArtist, mo:MusicGroup ;
        foaf:name "ABBA" ;
        foaf:homepage <http://www.abbasite.com/> .
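Turtle's compactness comes from three conventions. Prefixed names replace long IRIs, `;` repeats the subject, and `,` repeats both subject and predicate. The sketch below serializes a list of triples that way. It is a simplification: `Literal`, `term` and `to_turtle` are hypothetical helpers, only local names made of letters and digits are compacted, and only plain string literals are supported.

```python
from typing import Dict, List, Tuple

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "mo": "http://purl.org/ontology/mo/",
}


class Literal(str):
    """Marks a string literal, as opposed to an IRI."""


def term(value: str) -> str:
    if isinstance(value, Literal):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    for prefix, namespace in PREFIXES.items():
        local = value[len(namespace):]
        # Only compact simple local names (a conservative subset of Turtle's rules)
        if value.startswith(namespace) and local.isalnum():
            return f"{prefix}:{local}"
    return f"<{value}>"


def to_turtle(triples: List[Tuple[str, str, str]]) -> str:
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    grouped: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:  # dicts keep insertion order, so output follows input order
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    for subject, predicates in grouped.items():
        lines += ["", term(subject)]
        # ',' separates objects of one predicate; ';' separates predicates of one subject
        body = [f"    {term(p)} {', '.join(term(o) for o in objs)}" for p, objs in predicates.items()]
        lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


ABBA = "http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist"
RDF_NS, FOAF, MO = PREFIXES["rdf"], PREFIXES["foaf"], PREFIXES["mo"]
TTL = to_turtle([
    (ABBA, RDF_NS + "type", MO + "MusicArtist"),
    (ABBA, RDF_NS + "type", MO + "MusicGroup"),
    (ABBA, FOAF + "name", Literal("ABBA")),
    (ABBA, FOAF + "homepage", "http://www.abbasite.com/"),
])
print(TTL)
```

The printed body matches the ABBA description above, apart from the unused `owl` prefix.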
RDFS and OWL Example
- Reasoning example in RDFS
  - rdfs:subClassOf can model class hierarchies
  - mo:MusicGroup and mo:MusicArtist specify two classes
  - Axiom <mo:MusicGroup, rdfs:subClassOf, mo:MusicArtist>
  - Stating that ABBA is an instance of type MusicGroup enables reasoners to conclude that ABBA is also an instance of type MusicArtist
  - When a query asks for all MusicArtists, ABBA will be contained in the query result, even though there is no explicit assertion of this
- Reasoning example in OWL
  - owl:sameAs can be used to specify that two resources are identical
  - To consolidate information about ABBA from multiple sources, we can specify that http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist and http://dbpedia.org/resource/ABBA are the same
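Both reasoning examples can be expressed as rules applied until a fixpoint:
- the RDFS subclass rule (x rdf:type C, C rdfs:subClassOf D gives x rdf:type D);
- a partial owl:sameAs rule, which makes sameAs symmetric and copies statements from one name to the other.

The sketch below is minimal and makes several assumptions:
- It ignores sameAs transitivity and objects that are sameAs-equal.
- `ex:formedIn` and its value are hypothetical facts standing in for data from the second source.
- `infer` is an illustrative name, not a reasoner's API.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

BBC_ABBA = "http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist"
DBP_ABBA = "http://dbpedia.org/resource/ABBA"

GRAPH: Set[Triple] = {
    ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
    (BBC_ABBA, "rdf:type", "mo:MusicGroup"),
    (BBC_ABBA, "foaf:name", "ABBA"),
    (DBP_ABBA, "ex:formedIn", "Stockholm"),  # hypothetical fact from the second source
    (BBC_ABBA, "owl:sameAs", DBP_ABBA),
}


def infer(graph: Set[Triple]) -> Set[Triple]:
    kb = set(graph)
    while True:
        derived = set()
        for s, p, o in kb:
            if p == "rdf:type":
                # RDFS: x rdf:type C and C rdfs:subClassOf D  =>  x rdf:type D
                derived |= {(s, "rdf:type", d) for c, q, d in kb
                            if q == "rdfs:subClassOf" and c == o}
            if p == "owl:sameAs":
                # OWL (partial): sameAs is symmetric, and statements about s also hold for o
                derived.add((o, "owl:sameAs", s))
                derived |= {(o, q, v) for t, q, v in kb if t == s and q != "owl:sameAs"}
        if derived <= kb:
            return kb
        kb |= derived


KB = infer(GRAPH)
artists = sorted(s for s, p, o in KB if p == "rdf:type" and o == "mo:MusicArtist")
print(artists)  # both IRIs for ABBA
```

Neither IRI was asserted to be a MusicArtist. Both end up in the result, and each also carries the facts stated about the other.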
SPARQL and Rule Languages
- SPARQL
  - Query language for RDF triples
  - A protocol for querying RDF data over the Web
  - E.g. the language used to query the repository from the user interface
  - Can also be used for updates
  - More in lecture 6
- Rule languages (esp. the Rule Interchange Format, RIF)
  - W3C recommendation for exchanging rule sets between rule engines
  - Extend ontology languages with proprietary axioms
  - Based on different types of logics
    - Description Logic
    - Logic Programming
  - E.g. used to enable reasoning over data to infer new knowledge
  - More in lecture 8
SPARQL Example
- SPARQL query for other music groups that members of ABBA sing in
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX mo: <http://purl.org/ontology/mo/>
    SELECT ?memberName ?groupName
    WHERE {
      <http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist> mo:member ?m .
      ?x mo:member ?m .
      ?x rdf:type mo:MusicGroup .
      ?m foaf:name ?memberName .
      ?x foaf:name ?groupName
      FILTER (?groupName != "ABBA")
    }
SPARQL Example
- SPARQL query for other music groups that members of ABBA sing in
- Graphical representation of the WHERE clause
[Figure: graph of the triple patterns <http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist> mo:member ?m . ?x mo:member ?m . ?x rdf:type mo:MusicGroup . ?m foaf:name ?memberName . ?x foaf:name ?groupName]
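A WHERE clause like this one is a basic graph pattern. Each triple pattern is matched against the data, and the variable bindings are joined so that shared variables (`?m`, `?x`) take the same value everywhere. The naive left-to-right evaluator below makes that join explicit. Its data is hypothetical: short made-up identifiers such as `bbc:abba` and `bbc:hepstars` replace the full BBC URIs. It is a sketch of the idea, not how a SPARQL engine is implemented.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data: short made-up identifiers instead of the full BBC URIs.
DATA: List[Triple] = [
    ("bbc:abba", "rdf:type", "mo:MusicGroup"),
    ("bbc:abba", "foaf:name", "ABBA"),
    ("bbc:abba", "mo:member", "bbc:benny"),
    ("bbc:benny", "foaf:name", "Benny Andersson"),
    ("bbc:hepstars", "rdf:type", "mo:MusicGroup"),
    ("bbc:hepstars", "foaf:name", "Hep Stars"),
    ("bbc:hepstars", "mo:member", "bbc:benny"),
]

# The WHERE clause, one triple pattern per entry; terms starting with '?' are variables.
WHERE: List[Triple] = [
    ("bbc:abba", "mo:member", "?m"),
    ("?x", "mo:member", "?m"),
    ("?x", "rdf:type", "mo:MusicGroup"),
    ("?m", "foaf:name", "?memberName"),
    ("?x", "foaf:name", "?groupName"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None on a clash."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def evaluate(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Naive join of the triple patterns, left to right."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = [(b["?memberName"], b["?groupName"])
        for b in evaluate(WHERE, DATA)
        if b["?groupName"] != "ABBA"]  # FILTER (?groupName != "ABBA")
print(rows)  # [('Benny Andersson', 'Hep Stars')]
```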
Two RIF rules for mapping FOAF predicates
- True statements in the antecedent of a rule mean true statements in its conclusion
    if   ?x foaf:firstName ?first
            foaf:surname ?last
    then ?x foaf:family_name ?last
            foaf:givenname ?first
            foaf:name func:string-join(?first ?last)

    if   ?x foaf:name ?name and
         pred:contains(?name, " ")
    then ?x foaf:firstName func:substring-before(?name, " ")
            foaf:surname func:substring-after(?name, " ")
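The sketch below expresses each rule as a function from the facts about one `?x` to the facts it derives. Python's `str.join` and `str.partition` stand in for the RIF built-ins `func:string-join`, `func:substring-before` and `func:substring-after`. Some parts are assumptions:
- Joining with a single space is assumed.
- The second rule splits at the first space, so multi-word surnames stay intact but multi-word first names do not.
- `join_rule`/`split_rule` are hypothetical names.

```python
from typing import Dict

Facts = Dict[str, str]  # property -> value, for a single ?x


def join_rule(x: Facts) -> Facts:
    """if ?x foaf:firstName ?first and foaf:surname ?last then family_name, givenname, name."""
    if "foaf:firstName" not in x or "foaf:surname" not in x:
        return {}
    first, last = x["foaf:firstName"], x["foaf:surname"]
    return {
        "foaf:family_name": last,
        "foaf:givenname": first,
        # Assumption: the two parts are joined with a single space
        "foaf:name": " ".join([first, last]),
    }


def split_rule(x: Facts) -> Facts:
    """if ?x foaf:name ?name and pred:contains(?name, " ") then firstName, surname."""
    name = x.get("foaf:name")
    if name is None or " " not in name:
        return {}
    before, _, after = name.partition(" ")  # text before / after the first space
    return {"foaf:firstName": before, "foaf:surname": after}


print(join_rule({"foaf:firstName": "Benny", "foaf:surname": "Andersson"}))
print(split_rule({"foaf:name": "Anni-Frid Lyngstad"}))
```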
Logics, Proof and Trust
- Security and encryption
  - HTTPS provides data integrity and confidentiality when transmitting data and queries
  - Digital signing of RDF graphs provides authenticity and non-repudiation
- Unifying logic
  - Bring together the various ontology and rule languages
  - Connect unlinked data to give more meaning to data and drive data integration
  - E.g. identity management and alignment via http://sameas.org
- Proof
  - Explanation of inference results, data provenance
- Trust
  - Trust that the system performs correctly
  - Trust that the system can explain what it is doing
  - Network of trust for data sources and services
  - Technology and user interface
  - Many open problems, topics for future research
Foundations
[Figure: Semantic Web Stack layer diagram, including the Rules layer (RIF)]
UNICODE
Character Sets
- ASCII: 7 bit, 128 characters (a-z, A-Z, 0-9, punctuation)
- Extension code pages: 128 more characters (ß, Ä, ñ, ø, Š, etc.)
- Different systems, many different code pages
  - ISO Latin 1, CP1252: Western languages (197 = Å)
  - ISO Latin 2, CP1250: Eastern Europe (197 = Ĺ)
- A code page is an interpretation, not a property of the text
  - A Swedish programmer would have to write ä aÄiÜ = 'Ön'; ü instead of { a[i] = '\n'; }
- Thus, if the code page is not interpreted correctly, the displayed result will not be the expected one
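The claim that a code page is an interpretation can be checked directly. The sketch decodes the single byte 197 (0xC5) under the four code pages named above using Python's standard codecs. `interpret` is just an illustrative helper.

```python
from typing import Dict, List


def interpret(byte_value: int, code_pages: List[str]) -> Dict[str, str]:
    """Decode one and the same byte under several code pages."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in code_pages}


readings = interpret(197, ["latin-1", "cp1252", "iso8859-2", "cp1250"])
for cp, ch in readings.items():
    print(f"{cp:>9}: {ch} (U+{ord(ch):04X})")
```

The Western code pages yield Å (U+00C5) and the Central European ones yield Ĺ (U+0139). The bytes alone do not say which reading is meant.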
UNICODE: an unambiguous code
- We need a solution that can be interpreted unambiguously, i.e. each code corresponds to a single character and vice versa
- That's why UNICODE was created!
    $        Å        Ĺ        Æ        ή
    U+0024   U+00C5   U+0139   U+00C6   U+03AE
    ك        ⅝        ♥        Ж        ญ
    U+0643   U+215D   U+2665   U+0416   U+0E0D
UNICODE
- ISO standard
- About 100,000 characters, space for about 1,000,000
- Unique code points from U+0000 through U+FFFF, extended up to U+10FFFF
- Well-defined process for adding characters
- When dealing with any text, simply use UNICODE
- Character code charts: http://www.unicode.org/charts/
- See also
  - http://www.tbray.org/talks/rubyconf2006.pdf
  - http://tbray.org/ongoing/When/200x/2003/04/06/Unicode
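Each character has exactly one code point. An encoding such as UTF-8 then turns that code point into 1 to 4 bytes. The sketch prints the standard Unicode name and UTF-8 bytes for the code points listed on the previous slide, using only Python's `unicodedata` module and `str.encode`.

```python
import sys
import unicodedata

# Code points listed on the "UNICODE: an unambiguous code" slide
CODE_POINTS = [0x0024, 0x00C5, 0x0139, 0x00C6, 0x03AE, 0x0643, 0x215D, 0x2665, 0x0416, 0x0E0D]

for cp in CODE_POINTS:
    ch = chr(cp)
    utf8 = ch.encode("utf-8")
    print(f"U+{cp:04X}  {unicodedata.name(ch):<40} {len(utf8)} byte(s): {utf8.hex()}")

print(hex(sys.maxunicode))  # highest code point Python accepts: 0x10ffff
```

ASCII characters such as `$` stay one byte in UTF-8. That backward compatibility is one reason UTF-8 is the usual choice on the Web.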
URI: UNIFORM RESOURCE IDENTIFIERS
- How to identify things on the Web
Identifier, Resource, Representation
[Figure taken from http://www.w3.org/TR/webarch/]
URI, URN, URL
- A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet
- A URI can be a URL or a URN
- A Uniform Resource Name (URN) defines an item's identity
  - The URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. the International Standard Book Number (ISBN), as well as the unique reference within that system. It allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it
- A Uniform Resource Locator (URL) provides a method for finding the resource
  - The URL http://www.auckland.ac.nz identifies a resource (UoA's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.auckland.ac.nz
URI Syntax
- Examples
  - http://www.ietf.org/rfc/rfc3986.txt
  - mailto:John.Doe@example.com
  - news:comp.infosystems.www.servers.unix
  - telnet://melvyl.ucop.edu/
- URI syntax: scheme://authority/path?query#fragid
  - The scheme distinguishes different kinds of URIs
  - The authority normally identifies a server
  - The path normally identifies a directory and a file
  - The query adds extra parameters
  - The fragment ID identifies a secondary resource
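The generic syntax above can be tried with Python's `urllib.parse.urlsplit`, which returns exactly these five components. The sketch uses the example URIs from this slide, the URN from the previous slide, and one hypothetical URI (`http://example.com/music/search?q=abba#top`) that uses all five parts. The `components` helper is illustrative.

```python
from urllib.parse import urlsplit


def components(uri: str) -> dict:
    """Split a URI into scheme://authority/path?query#fragment."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path,
            "query": parts.query, "fragment": parts.fragment}


for uri in [
    "http://www.ietf.org/rfc/rfc3986.txt",
    "mailto:John.Doe@example.com",
    "urn:isbn:0-395-36341-1",                      # a URN: no authority, nothing to fetch
    "http://example.com/music/search?q=abba#top",  # hypothetical URI using all five parts
]:
    print(components(uri))
```

The URN has an empty authority, so it names the book without pointing to a host. The `http` URLs name a host from which a representation can be retrieved.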
URI Syntax (cont'd)
- Reserved characters (like : / ? # [ ] @)
- Many allowed characters
- The rest is percent-encoded via UTF-8
  - http://google.com/search?q=technikerstra%C3%9Fe
- IRI: Internationalized Resource Identifier
  - Allows the whole of UNICODE
  - Specifies a transformation into a URI, mostly by UTF-8 encoding
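The IRI-to-URI transformation can be sketched with `urllib.parse.quote`, which encodes a string as UTF-8 and percent-encodes every byte outside its safe set. The `iri_to_uri` helper is hypothetical and simplified. It keeps reserved characters and existing `%` escapes, and otherwise encodes only what `quote` considers unsafe. RFC 3987 defines the full mapping.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Rough IRI-to-URI mapping: percent-encode non-ASCII characters as UTF-8 bytes.

    Reserved characters and existing '%' escapes are left alone (a simplification).
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~")


iri = "http://google.com/search?q=technikerstra\u00dfe"  # '\u00df' is the letter ß
uri = iri_to_uri(iri)
print(uri)           # http://google.com/search?q=technikerstra%C3%9Fe
print(unquote(uri))  # back to the IRI form
```

The ß (U+00DF) becomes the two UTF-8 bytes C3 9F, which is exactly the `%C3%9F` in the example above.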
URI Schemes
Scheme | Description                           | RFC
file   | Host-specific file names              | 1738
ftp    | File Transfer Protocol                | 1738
http   | Hypertext Transfer Protocol           | 2616
https  | Hypertext Transfer Protocol Secure    | 2818
im     | Instant Messaging                     | 3860
imap   | Internet Message Access Protocol      | 5092
ipp    | Internet Printing Protocol            | 3510
iris   | Internet Registry Information Service | 3981
ldap   | Lightweight Directory Access Protocol | 4516
mailto | Electronic mail address               | 2368
mid    | Message identifier                    | 2392
- Schemes partition the URI space into subspaces
- Schemes can add or clarify properties of resources
  - Ownership (how authorities are formed)
  - Persistence (how stable the URIs should be)
  - Protocol (default access protocol)
From http://www.iana.org/assignments/uri-schemes.html
XML: EXTENSIBLE MARKUP LANGUAGE
- How to exchange structured data on the Web
eXtensible Markup Language
- Language for creating languages
  - Meta-language
  - XHTML is a language: HTML expressed in XML
- W3C Recommendation (standard)
- XML is, for the information industry, what the container is for international shipping
- For structured and semistructured data
- Main plus: wide support, interoperability
  - Platform-independent
  - Applying new tools to old data
Structure of XML Documents
- Elements, attributes, content
- One root element in document
- Characters, child elements in content
XML Element
- Syntax: <name>contents</name>
  - <name> is called the opening tag
  - </name> is called the closing tag
- Examples
  - <gender>Female</gender>
  - <story>Once upon a time there was...</story>
- Element names are case-sensitive
Attributes to XML Elements
- Name/value pairs, written inside the opening tag
- Syntax
  - <name attribute_name="attribute_value">contents</name>
- Values surrounded by single or double quotes
- Example
  - <temperature unit="F">64</temperature>
  - <swearword language='fr'>con</swearword>
Empty Elements
- Empty element: <name></name>
- This can be shortened to <name/>
- Empty elements may have attributes
- Example
  - <grade value='A'/>
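Attributes and empty elements look like this to a program reading the XML. The sketch uses Python's standard `xml.etree.ElementTree` on the examples from the last two slides. It also checks that `<name></name>` and `<name/>` produce the same element.

```python
import xml.etree.ElementTree as ET

temperature = ET.fromstring('<temperature unit="F">64</temperature>')
print(temperature.tag, temperature.attrib, temperature.text)  # temperature {'unit': 'F'} 64

# Single or double quotes around attribute values parse the same way
grade = ET.fromstring("<grade value='A'/>")
print(grade.attrib, grade.text)  # {'value': 'A'} None

# <name></name> and <name/> denote the same empty element
print(ET.tostring(ET.fromstring("<name></name>")) == ET.tostring(ET.fromstring("<name/>")))  # True
```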
Comments
- May occur anywhere in element contents or outside the root element
- Start with <!--
- End with -->
- May not contain a double hyphen
- Comments cannot be nested
- Example: <element>content <!-- a comment, will be ignored in processing --></element><!-- comment outside the root element -->
Nesting Elements
- Elements may contain other (child) elements
- The containing element is the parent element
- Elements must be properly nested
- Example with improper nesting
  - <b>bold <i>bold-italic</b> italic?</i>
- The above is not XML (not well-formed)
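Any conforming XML parser must reject input that is not well-formed. This gives a quick way to test the rules on the previous slides. The helper `is_well_formed` is a hypothetical name. It wraps Python's `xml.etree.ElementTree` and treats a `ParseError` as a well-formedness error.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the XML parser accepts the text; ParseError signals a well-formedness error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<b>bold <i>bold-italic</i></b>"))          # True
print(is_well_formed("<b>bold <i>bold-italic</b> italic?</i>"))  # False: improper nesting
print(is_well_formed("<gender>Female</Gender>"))                 # False: names are case-sensitive
print(is_well_formed("<a/><b/>"))                                # False: two root elements
```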
Special Characters in XML
- < and > are obviously reserved in content
  - Written as &lt; and &gt;
- Same for ' and " in attribute values
  - Written as &apos; and &quot;
- Now & is also reserved
  - Written as &amp;
- Any character: &#223; or &#xdf; (ß)
  - Decimal or hexadecimal Unicode code point
- Elements and attributes whose names start with "xml" are also special
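Escaping and character references can be tried with two standard Python modules. `xml.sax.saxutils.escape` writes text safely into XML content. The parser turns the predefined entities and numeric character references back into characters. `&` must be escaped first so that existing escapes are not mangled, and `escape` already does this.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Escaping text for element content: &, < and > are replaced
print(escape("if a < b && b > c"))  # if a &lt; b &amp;&amp; b &gt; c

# Parsing turns predefined entities and character references back into characters
print(ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text)  # <>&'"
print(ET.fromstring("<t>&#223; &#xdf;</t>").text)              # ß ß
```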
Uses of XML
- Document mark-up: XHTML
  - HTML is a language, so it can be expressed in XML
- Exchanged data
  - Scalable vector graphics: SVG
  - E-commerce: ebXML
  - Messaging in general: SOAP
  - And many more standards
- Internal data
- Databases
- Configuration files
- Etc.
Why XML?
- For semistructured data
- Loose but constrained structure
- Unspecified content length
- For structured data
- Table(s) or similar rows
- Well-defined structure, data types
- Good interoperability
- But requirements for quick access, processing
XML Parsers
- Document Object Model (DOM) builder
  - Creates an object model of the XML document, with a tree-traversal API
  - In-memory representation, random access
  - DOM is complex; simpler alternatives such as JDOM exist
- Simple API for XML parsing (SAX)
  - Views XML as a stream of events
    - el_start("date"), attribute("day", "10"), el_end("date")
  - Content is reported as callbacks to methods on a handler object of your design
  - A DOM builder can use SAX
- Pull parsers
  - Intermediate parsed results can be accessed as local variables
  - StAX (Java), XMLReader (PHP), System.Xml.XmlReader (.NET)
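Python's standard library ships all three parser styles, so one document can be fed to each. The sketch parses `<date day="10"/>` with SAX (callbacks), with `xml.dom.minidom` (in-memory DOM), and with `ElementTree.XMLPullParser` (the application pulls events). `EventRecorder` is a hypothetical handler. It records callbacks in the `el_start`/`attribute`/`el_end` style used above.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

DOC = b'<date day="10"/>'


class EventRecorder(xml.sax.ContentHandler):
    """SAX: the parser calls these methods; we just record the event stream."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("el_start", name))
        for key in attrs.getNames():
            self.events.append(("attribute", key, attrs.getValue(key)))

    def endElement(self, name):
        self.events.append(("el_end", name))


handler = EventRecorder()
xml.sax.parseString(DOC, handler)
print(handler.events)  # [('el_start', 'date'), ('attribute', 'day', '10'), ('el_end', 'date')]

# DOM: the whole tree is built in memory and can be accessed at random
dom = xml.dom.minidom.parseString(DOC)
print(dom.documentElement.getAttribute("day"))  # 10

# Pull parsing: the application asks for the next events when it is ready
pull = ET.XMLPullParser(events=("start", "end"))
pull.feed(DOC)
pull.close()
pull_events = [(event, elem.tag) for event, elem in pull.read_events()]
print(pull_events)  # [('start', 'date'), ('end', 'date')]
```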
NAMESPACES
- How to distinguish categories of resources
The Problem
- Documents use different vocabularies
  - Example 1: CD music collection
  - Example 2: online order transaction
- When multiple documents are merged, name collisions can occur
  - Example 1: albums have a <name>
  - Example 2: customers have a <name>
- How do you differentiate between the two?
The Solution: Namespaces!
- What is a namespace?
  - A syntactic way to differentiate similar names in an XML document
- Binding namespaces
  - Uses a Uniform Resource Identifier (URI)
  - e.g. http://example.com/NS
  - Can bind to a named or default prefix
Namespace Binding Syntax
- Use the xmlns attribute
- Named prefix
  - <a:foo xmlns:a="http://example.com/NS"/>
- Default prefix
  - <foo xmlns="http://example.com/NS"/>
- Element and attribute names are qualified
  - (URI, local part (or local name)) pair
  - e.g. {http://example.com/NS, foo}
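The prefix itself carries no meaning. Only the (URI, local name) pair identifies an element. Python's ElementTree makes this visible, because it stores qualified names as `{URI}local` and drops the prefix. The sketch parses the two binding forms above plus an unqualified `<foo/>` for contrast.

```python
import xml.etree.ElementTree as ET

named = ET.fromstring('<a:foo xmlns:a="http://example.com/NS"/>')
default = ET.fromstring('<foo xmlns="http://example.com/NS"/>')
plain = ET.fromstring("<foo/>")

# ElementTree stores each qualified name as '{URI}local'; the prefix is not kept
print(named.tag)                 # {http://example.com/NS}foo
print(named.tag == default.tag)  # True: same (URI, local name) pair
print(plain.tag)                 # foo: no namespace, so a different name
```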
Example Document I
- Namespace binding
    <?xml version="1.0" encoding="UTF-8"?>
    <order>
      <item code="BK123">
        <name>Care and Feeding of Wombats</name>
        <desc xmlns:html="http://www.w3.org/1999/xhtml">
          The <html:b>best</html:b> book ever written!
        </desc>
      </item>
    </order>
Example Document II
- Namespace scope
    <?xml version="1.0" encoding="UTF-8"?>
    <order>
      <item code="BK123">
        <name>Care and Feeding of Wombats</name>
        <desc xmlns:html="http://www.w3.org/1999/xhtml">
          The <html:b>best</html:b> book ever written!
        </desc>
      </item>
    </order>
Example Document III
- Bound elements
    <?xml version="1.0" encoding="UTF-8"?>
    <order>
      <item code="BK123">
        <name>Care and Feeding of Wombats</name>
        <desc xmlns:html="http://www.w3.org/1999/xhtml">
          The <html:b>best</html:b> book ever written!
        </desc>
      </item>
    </order>
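A namespace-aware parser applies the binding on `desc` to `desc` and its descendants. In the order document, `html:b` is therefore in the XHTML namespace, while `order`, `item` and `name` are in no namespace. The sketch queries the document with ElementTree. The prefix map `ns` passed to `find` is our own choice for the lookup and need not match the document's prefix.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
  <item code="BK123">
    <name>Care and Feeding of Wombats</name>
    <desc xmlns:html="http://www.w3.org/1999/xhtml">
      The <html:b>best</html:b> book ever written!
    </desc>
  </item>
</order>"""

root = ET.fromstring(ORDER)
ns = {"html": "http://www.w3.org/1999/xhtml"}  # lookup prefix chosen by us

bold = root.find("item/desc/html:b", ns)
print(bold.tag, bold.text)          # {http://www.w3.org/1999/xhtml}b best
print(root.find("item/name").text)  # Care and Feeding of Wombats (no namespace)
print(root.find("item/desc/b"))     # None: an unqualified 'b' is a different name
```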
XML SCHEMA
- How to define XML document structures
What is it?
- A grammar definition language
  - More expressive than Document Type Definitions (DTDs)
- Uses XML syntax
  - Defined by W3C
- Primary features
  - Datatypes
    - e.g. integer, float, date, etc.
  - More powerful content models
    - e.g. namespace-aware, type derivation, etc.
XML Schema Types
- Simple types
- Basic datatypes
- Can be used for attributes and element text
- Extendable
- Complex types
- Defines structure of elements
- Extendable
- Types can be named or anonymous
Simple Types
- DTD datatypes
- Strings, ID/IDREF, NMTOKEN, etc
- Numbers
- Integer, long, float, double, etc
- Other
- Binary (base64, hex)
- QName, URI, date/time
- etc
Deriving Simple Types
- Apply facets
- Specify enumerated values
- Add restrictions to data
- Restrict lexical space
- Allowed length, pattern, etc
- Restrict value space
- Minimum/maximum values, etc
- Extend by list or union
A Simple Type Example
- Integer with value in (1234, 5678]
    <xsd:simpleType name="MyInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minExclusive value="1234"/>
        <xsd:maxInclusive value="5678"/>
      </xsd:restriction>
    </xsd:simpleType>
A Simple Type Example II
- Validating integer with value in (1234, 5678]
    <data xsi:type='MyInteger'></data>        INVALID
    <data xsi:type='MyInteger'>Andy</data>    INVALID
    <data xsi:type='MyInteger'>-32</data>     INVALID
    <data xsi:type='MyInteger'>1233</data>    INVALID
    <data xsi:type='MyInteger'>1234</data>    INVALID
    <data xsi:type='MyInteger'>1235</data>
    <data xsi:type='MyInteger'>5678</data>
    <data xsi:type='MyInteger'>5679</data>    INVALID
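The validation results above follow from two separate checks. First, the text must lie in the lexical space of `xsd:integer`, which rules out the empty string and `Andy`. Second, the value must satisfy both facets. The sketch reproduces the table in plain Python. It hand-codes this one type rather than reading the schema, and `is_my_integer` is an illustrative name, not part of an XML Schema library.

```python
import re

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")  # lexical space of xsd:integer


def is_my_integer(text: str) -> bool:
    """Check a value against MyInteger: xsd:integer with minExclusive 1234, maxInclusive 5678."""
    lexical = text.strip()  # xsd:integer collapses surrounding whitespace
    if not INTEGER_LEXICAL.fullmatch(lexical):
        return False  # '' and 'Andy' are not integers at all
    return 1234 < int(lexical) <= 5678  # the two facets


for value in ["", "Andy", "-32", "1233", "1234", "1235", "5678", "5679"]:
    print(repr(value), "valid" if is_my_integer(value) else "INVALID")
```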
Complex Types
- Element content models
  - Simple
  - Mixed
    - Unlike DTDs, elements in mixed content can be ordered
  - Sequences and choices
    - Can contain nested sequences and choices
  - All
    - All elements required, but order is not important
A Complex Type Example I
- Mixed content that allows <b>, <i>, and <u>
    <xsd:complexType name="RichText" mixed="true">
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="b" type="RichText"/>
        <xsd:element name="i" type="RichText"/>
        <xsd:element name="u" type="RichText"/>
      </xsd:choice>
    </xsd:complexType>
A Complex Type Example II
- Validation of RichText
    <content xsi:type='RichText'></content>
    <content xsi:type='RichText'>Andy</content>
    <content xsi:type='RichText'>XML is <i>awesome</i>.</content>
    <content xsi:type='RichText'><B>bold</B></content>    INVALID
    <content xsi:type='RichText'><foo/></content>         INVALID
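RichText is recursive:
- `mixed="true"` allows text anywhere;
- the unbounded `xsd:choice` allows any number of `b`, `i` or `u` children in any order;
- each child is RichText again.

A hand-written check over an ElementTree reproduces the results above. The sketch makes these assumptions:
- It hard-codes this one type rather than interpreting the schema.
- It omits the `xsi:type` attributes.
- It rejects attributes on the children, because RichText declares none.
- `is_rich_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"b", "i", "u"}  # the alternatives of the xsd:choice in RichText


def is_rich_text(element: ET.Element) -> bool:
    """Mixed content: text anywhere, plus any number of <b>/<i>/<u> children, each RichText too."""
    return all(
        child.tag in ALLOWED and not child.attrib and is_rich_text(child)
        for child in element
    )


for doc in [
    "<content></content>",
    "<content>Andy</content>",
    "<content>XML is <i>awesome</i>.</content>",
    "<content><B>bold</B></content>",  # names are case-sensitive: B is not b
    "<content><foo/></content>",
]:
    print(doc, "valid" if is_rich_text(ET.fromstring(doc)) else "INVALID")
```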
EXTENSIONS
Building On The Foundations
- RDF for semantic data
  - Graphs of linked data
  - Semantic Web
- Any XML or HTML can support translation to RDF
  - GRDDL: a pointer to a transformation
  - RDFa: RDF in XHTML
  - Makes existing data part of the Semantic Web
- XML has encryption and digital signature
  - Necessary technologies for data provenance, trust
Web of Linked Data
RDF Teaser
- Resource Description Framework
  - Metadata about Web resources
  - But also any other data
- Graphs of resources interlinked with properties
  - Rafael Nadal plays Tennis
  - Rafael Nadal knows Shakira
  - Shakira sings "Waka Waka"
- Ontology languages for data schemas
  - Various properties: knows, plays, sings
  - Classes of resources: Person, Athlete, Singer, Sport, Song
- SPARQL for querying the data
ILLUSTRATION BY A LARGER EXAMPLE
- The Semantic Web architecture in practice
Semantic Conference
- All the data about the conference is part of the Semantic Web
  - Date, location
  - Organizers, peer-review committees
  - Articles (papers), their authors
  - Detailed program schedule
- Each Semantic Web architecture layer plays a role
- ISWC is annotating conference data using Semantic Web technologies
  - http://data.semanticweb.org/conference/iswc/2011/html
  - Currently available data covers only papers and authors
  - This could be extended to support the features discussed above
Foundation Layers
- UNICODE
  - All participants' names should be in UNICODE because they are international: Denny Vrandečić, Diego Meroño, François Maué
  - Same for paper titles: "α-decay and β-decay of heavy atoms"
- URI: all important things must have identifiers, for example
  - Conference: http://data.semanticweb.org/conference/iswc/2011
  - Participant: http://data.semanticweb.org/person/piero-bonatti
  - Participant's affiliation: http://data.semanticweb.org/organization/talis-information-limited
  - Paper: http://data.semanticweb.org/conference/iswc/2011/paper/tutorial/7
Data Layers
- XML
  - The HTML pages should be in XHTML
  - The RDF data (below) should be in RDF/XML
  - The news feed should be in Atom (an XML format)
- RDF
  - The conference dataset, and any useful subsets, should be published in RDF for download, for example
  - http://data.semanticweb.org/conference/iswc/2011/rdf
Ontologies, Query
- RDFS, OWL
  - The conference would use various vocabularies and ontologies, such as
    - FOAF (Friend of a Friend) for talking about the attendees and authors/presenters
    - Dublin Core for paper metadata
    - Calendar ontology for the program
- SPARQL
  - The conference server should have a public SPARQL endpoint that can be used for queries over the conference data
  - http://data.semanticweb.org/snorql/
Browsing ISWC Data
http://data.semanticweb.org/person/tom-heath/html
Querying ISWC Data
http://data.semanticweb.org/snorql/
SUMMARY
- That's almost all for today
Things to Keep in Mind
- Semantic Web builds on the Web
- For any text, use UNICODE, probably UTF-8
- URIs can identify anything
- Not only documents on the Web
- XML helps with data exchange, interoperability
- XML languages are distinguished with namespaces
References
- Mandatory
  - http://www.w3.org/TR/webarch/
  - http://www.w3.org/DesignIssues/Architecture.html
- Further reading
  - http://www.w3.org/Provider/Style/URI
  - http://www.ietf.org/rfc/rfc3986.txt
  - http://www.unicode.org/charts/
  - http://www.tbray.org/talks/rubyconf2006.pdf
  - http://tbray.org/ongoing/When/200x/2003/04/06/Unicode
  - http://www.w3.org/TR/xml/
  - http://www.w3.org/TR/xml-names/
  - http://www.w3.org/TR/xmlschema-1/
  - Fensel et al., On2broker: Semantic-Based Access to Information Sources at the WWW
  - Fensel et al., Ontobroker in a Nutshell
  - http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
  - http://www.w3.org/DesignIssues/Semantic.html
References
- Wikipedia links
  - http://en.wikipedia.org/wiki/Semantic_Web_Stack
  - http://en.wikipedia.org/wiki/URI
  - http://en.wikipedia.org/wiki/Unicode
  - http://en.wikipedia.org/wiki/XML
  - http://en.wikipedia.org/wiki/XML_Namespaces
  - http://en.wikipedia.org/wiki/Resource_Description_Framework
  - http://en.wikipedia.org/wiki/RDF_Schema
  - http://en.wikipedia.org/wiki/Web_Ontology_Language
  - http://en.wikipedia.org/wiki/SPARQL
  - http://en.wikipedia.org/wiki/Rule_Interchange_Format
Next Lecture
Title
1 Introduction
2 Semantic Web Architecture
3 Resource Description Framework (RDF)
4 Web of Data
5 Generating Semantic Annotations
6 Storage and Querying
7 Web Ontology Language (OWL)
8 Rule Interchange Format (RIF)
Questions?