Title: Semantic Web Standards
1Semantic Web Standards
Slides based on Ian Horrocks' class
2Where we are Today: the Syntactic Web
Hendler & Miller 02
3The Syntactic Web is
- A hypermedia, a digital library
  - A library of documents (called web pages) interconnected by a hypermedia of links
- A database, an application platform
  - A common portal to applications accessible through web pages, and presenting their results as web pages
- A platform for multimedia
  - BBC Radio 4 anywhere in the world! Terminator 3 trailers!
- A naming scheme
  - Unique identity for those documents
- A place where computers do the presentation (easy) and people do the linking and interpreting (hard).
- Why not get computers to do more of the hard work?
Goble 03
4Hard Work using the Syntactic Web
Find images of Peter Patel-Schneider, Frank van
Harmelen and Alan Rector
Rev. Alan M. Gates, Associate Rector of the
Church of the Holy Spirit, Lake Forest, Illinois
5Impossible (?) using the Syntactic Web
- Complex queries involving background knowledge
  - Find information about animals that use sonar but are not either bats or dolphins
- Locating information in data repositories
  - Travel enquiries
  - Prices of goods and services
  - Results of human genome experiments
- Finding and using web services
  - Visualise surface interactions between two proteins
- Delegating complex tasks to web agents
  - Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English
6What is the Problem?
- Consider a typical web page
- Markup consists of
  - Rendering information (e.g., font size and colour)
  - Hyperlinks to related content
- Semantic content is accessible to humans but not (easily) to computers
7What information can we see
- WWW2002
- The eleventh international world wide web conference
- Sheraton waikiki hotel
- Honolulu, hawaii, USA
- 7-11 may 2002
- 1 location 5 days learn interact
- Registered participants coming from
- australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
- Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed
- Tim berners-lee
- Tim is the well known inventor of the Web,
- Ian Foster
- Ian is the pioneer of the Grid, the next generation internet
8What information can a machine see
- WWW2002
- The eleventh international world wide web conference
- Sheraton waikiki hotel
- Honolulu, hawaii, USA
- 7-11 may 2002
- 1 location 5 days learn interact
- Registered participants coming from
- australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
- Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed
- Tim berners-lee
- Tim is the well known inventor of the Web,
- Ian Foster
- Ian is the pioneer of the Grid, the next generation internet
9Solution: XML markup with meaningful tags?
- <name>WWW2002
- The eleventh international world wide webcon</name>
- <location>Sheraton waikiki hotel
- Honolulu, hawaii, USA</location>
- <date>7-11 may 2002</date>
- <slogan>1 location 5 days learn interact</slogan>
- <participants>Registered participants coming from
- australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
- <introduction>Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed</introduction>
- <speaker>Tim berners-lee</speaker>
- <bio>Tim is the well known inventor of the Web,</bio>
10But What About
- <conf>WWW2002
- The eleventh international world wide webcon</conf>
- <place>Sheraton waikiki hotel
- Honolulu, hawaii, USA</place>
- <date>7-11 may 2002</date>
- <slogan>1 location 5 days learn interact</slogan>
- <participants>Registered participants coming from
- australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
- <introduction>Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed</introduction>
- <speaker>Tim berners-lee</speaker>
- <bio>Tim is the well known inventor of the Web,
11Machine sees
- <name>WWW2002
- The eleventh international world wide webc</name>
- <location>Sheraton waikiki hotel
- Honolulu, hawaii, USA</location>
- <date>7-11 may 2002</date>
- <slogan>1 location 5 days learn interact</slogan>
- <participants>Registered participants coming from
- australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
- <introduction>Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed</introduction>
- <speaker>Tim berners-lee</speaker>
- <bio>Tim is the well known inventor of the W</bio>
- <speaker>Ian Foster</speaker>
- <bio>Ian is the pioneer of the Grid, the ne</bio>
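The point of the last three slides can be sketched in a few lines of Python: to a program, a tag name is only a string. The two page fragments and the function name below are illustrative assumptions, not part of the slides; the sketch only shows that code written against one tag vocabulary finds nothing when a page uses another.

```python
import xml.etree.ElementTree as ET

# Two hypothetical annotations of the same page, using different tag names.
PAGE_A = "<page><name>WWW2002</name><location>Sheraton waikiki hotel</location></page>"
PAGE_B = "<page><conf>WWW2002</conf><place>Sheraton waikiki hotel</place></page>"


def conference_name(xml_text):
    """Return the text of the <name> child, or None if the page has no such tag."""
    root = ET.fromstring(xml_text)
    element = root.find("name")
    return element.text if element is not None else None


name_a = conference_name(PAGE_A)  # the tag the program was written for
name_b = conference_name(PAGE_B)  # same information, different tag name
print(name_a)
print(name_b)
```

Nothing in the XML itself tells the program that `conf` and `name` mean the same thing; that agreement has to come from outside the markup.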
12Need to Add Semantics
- External agreement on meaning of annotations
  - E.g., Dublin Core
    - Agree on the meaning of a set of annotation tags
  - Problems with this approach
    - Inflexible
    - Limited number of things can be expressed
- Use Ontologies to specify meaning of annotations
  - Ontologies provide a vocabulary of terms
  - New terms can be formed by combining existing ones
  - Meaning (semantics) of such terms is formally specified
  - Can also specify relationships between terms in multiple ontologies
13History of the Semantic Web
- Web was invented by Tim Berners-Lee (amongst others), a physicist working at CERN
- TBL's original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web
- TBL (and others) have since been working towards realising this vision, which has become known as the Semantic Web
  - E.g., article in May 2001 issue of Scientific American
14Scientific American, May 2001
Beware of the Hype
15Beware of the Hype
- Hype seems to suggest that Semantic Web means: "semantics + web = AI"
  - "A new form of Web content that is meaningful to computers will unleash a revolution of new abilities"
- More realistic to think of it as meaning: "semantics + web + AI = more useful web"
  - Realising the complete vision is too hard for now (probably)
  - But we can make a start by adding semantic annotation to web resources
Images from Christine Thompson and David Booth
16Web Schema Languages
- Existing Web languages extended to facilitate content description
  - XML → XML Schema (XMLS)
  - RDF → RDF Schema (RDFS)
- XMLS not an ontology language
  - Changes format of DTDs (document schemas) to be XML
  - Adds an extensible type hierarchy
    - Integers, Strings, etc.
    - Can define sub-types, e.g., positive integers
- RDFS is recognisable as an ontology language
  - Classes and properties
  - Sub/super-classes (and properties)
  - Range and domain (of properties)
17RDF and RDFS
- RDF stands for Resource Description Framework
- It is a W3C Recommendation (http://www.w3.org/RDF)
- RDF is a graphical formalism (+ XML syntax + semantics)
  - for representing metadata
  - for describing the semantics of information in a machine-accessible way
- RDFS extends RDF with schema vocabulary, e.g.
  - Class, Property
  - type, subClassOf, subPropertyOf
  - range, domain
18The RDF Data Model
- Statements are <subject, predicate, object> triples
- Can be represented using XML serialisation, e.g.
  - <Ian,hasColleague,Uli>
- Statements describe properties of resources
- A resource is a URI representing a (class of) object(s)
  - a document, a picture, a paragraph on the Web
    - http://www.cs.man.ac.uk/index.html
  - a book in the library, a real person (?)
    - isbn://5031-4444-3333
- Properties themselves are also resources (URIs)
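As a minimal sketch of the data model above, triples can be held as plain tuples and queried by pattern. The `http://example.org/` URIs, the `Triple` type, and the `match` helper are assumptions made for illustration; a real application would normally use an RDF library instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, not from the slides

triples = {
    Triple(EX + "Ian", EX + "hasColleague", EX + "Uli"),
    Triple(EX + "Carole", EX + "hasColleague", EX + "Uli"),
    Triple(EX + "Uli", EX + "hasHomePage", "http://example.org/home/uli"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Who is stated to have Uli as a colleague?
colleagues_of_uli = [t.subject for t in match(triples, p=EX + "hasColleague", o=EX + "Uli")]
print(colleagues_of_uli)
```

Note that the predicate is itself a URI, so it could appear as the subject of further statements.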
19URIs
- URI = Uniform Resource Identifier
- "The generic set of all names/addresses that are short strings that refer to resources"
- URIs may or may not be dereferenceable
  - URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages)
- In RDF, URIs typically look like normal URLs, often with fragment identifiers to point at specific parts of a document
  - http://www.somedomain.com/some/path/to/file#fragmentID
20Linking Statements
- The subject of one statement can be the object of another
- Such collections of statements form a directed, labeled graph
- Note that the object of a triple can also be a literal (a string)
- Note also that RDF triples don't by themselves give meaning
- You know that (1) Ian and Carole are most likely colleagues (barring multiple jobs for Uli) and (2) (Uli hasColleague Ian) holds (colleagueness, unlike love, is symmetric). But DOES YOUR PROGRAM KNOW THIS?
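A small sketch makes the closing question concrete. With only the stated triples, a program answers "no" to (Uli hasColleague Ian); it answers "yes" only after symmetry is supplied as an explicit rule. The helper names are illustrative assumptions.

```python
# The stated facts, using short names instead of full URIs for readability.
facts = {
    ("Ian", "hasColleague", "Uli"),
    ("Carole", "hasColleague", "Uli"),
}


def holds(store, s, p, o):
    """A triple holds only if it is literally present in the store."""
    return (s, p, o) in store


def symmetric_closure(store, predicate):
    """Add (o, p, s) for every (s, p, o) whose predicate is declared symmetric."""
    return store | {(o, p, s) for (s, p, o) in store if p == predicate}


before = holds(facts, "Uli", "hasColleague", "Ian")
closed = symmetric_closure(facts, "hasColleague")
after = holds(closed, "Uli", "hasColleague", "Ian")
print(before, after)
```

The knowledge that `hasColleague` is symmetric lives in the rule, not in the triples; that is the kind of background knowledge a schema or ontology language is meant to carry.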
21RDF Syntax
- RDF has an XML syntax that has a specific meaning
  - Every Description element describes a resource
  - Every attribute or nested element inside a Description is a property of that Resource with an associated object resource
  - Resources are referred to using URIs

<Description about="some.uri/person/ian_horrocks">
  <hasColleague resource="some.uri/person/uli_sattler"/>
</Description>
<Description about="some.uri/person/uli_sattler">
  <hasHomePage>http://www.cs.man.ac.uk/sattler</hasHomePage>
</Description>
<Description about="some.uri/person/carole_goble">
  <hasColleague resource="some.uri/person/uli_sattler"/>
</Description>
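The reading rules above (each Description is a subject, each nested element is a property) can be sketched with the standard library XML parser. This follows the slide's simplified, namespace-free syntax; the enclosing `<RDF>` root element and the `to_triples` function are assumptions added so the fragment parses as one document. Real RDF/XML uses namespaced `rdf:` elements and is better handled by a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# The slide's simplified markup, wrapped in a hypothetical <RDF> root.
DOC = """<RDF>
  <Description about="some.uri/person/ian_horrocks">
    <hasColleague resource="some.uri/person/uli_sattler"/>
  </Description>
  <Description about="some.uri/person/uli_sattler">
    <hasHomePage>http://www.cs.man.ac.uk/sattler</hasHomePage>
  </Description>
  <Description about="some.uri/person/carole_goble">
    <hasColleague resource="some.uri/person/uli_sattler"/>
  </Description>
</RDF>"""


def to_triples(xml_text):
    """Turn each property nested in a Description into a (subject, predicate, object) tuple."""
    triples = []
    for description in ET.fromstring(xml_text).iter("Description"):
        subject = description.get("about")
        for prop in description:
            # A resource attribute names another resource; otherwise the text is a literal.
            obj = prop.get("resource") or (prop.text or "").strip()
            triples.append((subject, prop.tag, obj))
    return triples


triples = to_triples(DOC)
for triple in triples:
    print(triple)
```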
22RDF Schema (RDFS)
- RDF gives a formalism for meta data annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type
  - Interpretation is an arbitrary binary relation
  - I.e., <Person,subClassOf,Animal> has no special meaning
- RDF Schema defines schema vocabulary that supports definition of ontologies
  - gives extra meaning to particular RDF predicates and resources (such as subClassOf)
  - this extra meaning, or semantics, specifies how a term should be interpreted
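One way to picture the extra meaning RDFS attaches to subClassOf and type is as inference rules applied to the triples. The sketch below implements just two such rules (subClassOf is transitive, and instances of a class are instances of its superclasses) with a naive fixed-point loop. It is an illustration under those assumptions, not a complete or efficient RDFS reasoner, and the function name is made up.

```python
def rdfs_closure(triples):
    """Apply two RDFS-style rules until nothing new can be derived."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            if p != "subClassOf":
                continue
            for (x, q, y) in inferred:
                # Rule 1: a subClassOf b, b subClassOf y  =>  a subClassOf y
                if q == "subClassOf" and x == b:
                    new.add((a, "subClassOf", y))
                # Rule 2: x type a, a subClassOf b  =>  x type b
                if q == "type" and y == a:
                    new.add((x, "type", b))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("Professor", "subClassOf", "Person"),
    ("Person", "subClassOf", "Animal"),
    ("Carole", "type", "Professor"),
}
closed = rdfs_closure(facts)
print(sorted(closed - facts))
```

Without the rules, `("Carole", "type", "Animal")` is simply absent; with them, it follows from the stated facts.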
23Background Theory
RDF Schema is really RDF background knowledge!
Instances
24RDF/RDFS vs. General Knowledge Rep & Reasoning
- We noted that RDF can be seen as base level facts and RDFS can be seen as background theory/facts/rules
- At this level, inference with RDF/RDFS seems to be just a special case of Knowledge Representation & Reasoning
  - This is good (CSE471 Ahoy!) and bad (reasoning over most non-trivial logics is NP-hard or much much worse).
- RDF/RDFS can be seen as an attempt to limit the complexity of reasoning by limiting the expressiveness of what can be expressed
  - RDF/RDFS together can be seen as capturing a certain tractable subset of First Order Logic
- ..already there is trouble in paradise with people complaining that the expressiveness is not enough
- Enter OWL, which attempts to provide expressiveness equivalent to description logics (a sort of inheritance reasoning in First-order logic)
25Problems with RDFS
- RDFS too weak to describe resources in sufficient detail
  - No localised range and domain constraints
    - Can't say that the range of hasChild is person when applied to persons and elephant when applied to elephants
  - No existence/cardinality constraints
    - Can't say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
  - No transitive, inverse or symmetrical properties
    - Can't say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical
- Difficult to provide reasoning support
  - No native reasoners for non-standard semantics
  - May be possible to reason via FO axiomatisation
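To see what the missing property characteristics would buy, here is a sketch of the inferences a reasoner could draw if isPartOf could be declared transitive and hasPart declared its inverse. RDFS has no vocabulary for these declarations, so the rules are hard-coded as Python functions here; the body-part data is a made-up example.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


def inverse(pairs):
    """Swap the direction of every pair."""
    return {(b, a) for (a, b) in pairs}


# Hypothetical stated facts: only the direct part-of links.
is_part_of = {("finger", "hand"), ("hand", "arm"), ("arm", "body")}

all_parts = transitive_closure(is_part_of)  # isPartOf treated as transitive
has_part = inverse(all_parts)               # hasPart treated as inverse of isPartOf
print(("finger", "body") in all_parts, ("body", "finger") in has_part)
```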
26RDFS Examples
- RDF Schema terms (just a few examples)
  - Class
  - Property
  - type
  - subClassOf
  - range
  - domain
- These terms are the RDF Schema building blocks (constructors) used to create vocabularies
  - <Person,type,Class>
  - <hasColleague,type,Property>
  - <Professor,subClassOf,Person>
  - <Carole,type,Professor>
  - <hasColleague,range,Person>
  - <hasColleague,domain,Person>
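The range and domain statements above also license inferences: anything that appears as the subject of hasColleague is a Person, and so is anything that appears as its object. The following sketch applies that reading to one data triple. The function name and the data triple are illustrative, and the code assumes at most one domain and one range per property, which is a simplification.

```python
schema = {
    ("hasColleague", "domain", "Person"),
    ("hasColleague", "range", "Person"),
}
data = {("Ian", "hasColleague", "Uli")}


def infer_types(schema, data):
    """Derive type triples from domain and range declarations (one per property assumed)."""
    domains = {p: c for (p, kind, c) in schema if kind == "domain"}
    ranges = {p: c for (p, kind, c) in schema if kind == "range"}
    inferred = set()
    for (s, p, o) in data:
        if p in domains:
            inferred.add((s, "type", domains[p]))  # subject belongs to the domain class
        if p in ranges:
            inferred.add((o, "type", ranges[p]))   # object belongs to the range class
    return inferred


types = infer_types(schema, data)
print(sorted(types))
```

In RDFS, domain and range act as inference rules of this kind rather than as constraints that reject data.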
27RDF/RDFS Liberality
- No distinction between classes and instances (individuals)
  - <Species,type,Class>
  - <Lion,type,Species>
  - <Leo,type,Lion>
- Properties can themselves have properties
  - <hasDaughter,subPropertyOf,hasChild>
  - <hasDaughter,type,familyProperty>
- No distinction between language constructors and ontology vocabulary, so constructors can be applied to themselves/each other
  - <type,range,Class>
  - <Property,type,Class>
  - <type,subPropertyOf,subClassOf>
28RDF Schema is now being superseded by OWL
29Web Ontology Language Requirements
- Desirable features identified for Web Ontology Language
  - Extends existing Web standards
    - Such as XML, RDF, RDFS
  - Easy to understand and use
    - Should be based on familiar KR idioms
  - Formally specified
  - Of adequate expressive power
  - Possible to provide automated reasoning support
30From RDF to OWL
- Two languages developed to satisfy above requirements
  - OIL: developed by group of (largely) European researchers (several from EU OntoKnowledge project)
  - DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML programme)
- Efforts merged to produce DAML+OIL
  - Development was carried out by Joint EU/US Committee on Agent Markup Languages
  - Extends (DL subset of) RDF
- DAML+OIL submitted to W3C as basis for standardisation
  - Web-Ontology (WebOnt) Working Group formed
  - WebOnt group developed OWL language based on DAML+OIL
  - OWL language now a W3C Recommendation (i.e., a standard like HTML and XML)
31OWL Language
- Three species of OWL
  - OWL Full is union of OWL syntax and RDF
  - OWL DL restricted to FOL fragment (≈ DAML+OIL)
  - OWL Lite is easier to implement subset of OWL DL
- Semantic layering
  - OWL DL ≈ OWL Full within DL fragment
  - DL semantics officially definitive
- OWL DL based on SHIQ Description Logic
  - In fact it is equivalent to SHOIN(Dn) DL
- OWL DL benefits from many years of DL research
  - Well defined semantics
  - Formal properties well understood (complexity, decidability)
  - Known reasoning algorithms
  - Implemented systems (highly optimised)
32Layer 4½ Mapping Between Ontologies
- Taxonomy Crisis
  - How can your agent know that my title is your name?!
  - How can my agent know that some of your address objects are post-boxes, not physical addresses?!
  - How can my agent know that many Asian first names correspond to Western surnames?
- Semantic Web Solution: Services for translating/mapping between related ontologies.
  - Suppose Amazon.com uses Dublin Core (title), while Fred Hanna uses its own document ontology (name). So far my agent is forced to choose an ontology, or must be carefully crafted to understand both languages
  - A better solution: A niche now exists for an independent entity (UniversalBookInfo.com) that maps title ↔ name etc
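The mapping service idea can be sketched as a published term-to-term table that any agent applies before reading a record. Only the title/name pair comes from the slide; the table name, the `translate` helper, and the sample record (including its price field) are hypothetical.

```python
# Hypothetical mapping published by a third party such as UniversalBookInfo.com:
# Fred Hanna's term on the left, the Dublin Core term on the right.
FRED_HANNA_TO_DUBLIN_CORE = {"name": "title"}


def translate(record, mapping):
    """Rename the keys of a record using the mapping; unmapped keys pass through unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


fred_hanna_record = {"name": "War and Peace", "price": "9.99"}  # made-up record
dublin_core_record = translate(fred_hanna_record, FRED_HANNA_TO_DUBLIN_CORE)
print(dublin_core_record)
```

With the table maintained in one place, an agent only needs to understand one vocabulary, which is the contrast the next two slides draw.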
33without UniversalBookInfo.com
[Figure labels: Nick wants to buy War & Peace; Nick's very complicated agent; programmer's bank account; Amazon ontology; Fred Hanna ontology; Amazon; Fred Hanna]
34with UniversalBookInfo.com
[Figure labels: Nick wants to buy War & Peace; Nick's agent; Joe's agent; Jane's agent; UniversalBookInfo.com; Amazon; Fred Hanna; bank account]
35(In)famous Layer Cake
[Figure: layer cake diagram; recoverable labels: Semantics + reasoning, Relational Data, Data Exchange]
- Relationship between layers is not clear
- OWL DL extends DL subset of RDF
36Who will annotate the data?
- Semantic web works if the users annotate their pages using some existing ontology (or their own ontology, but with mapping to other ontologies)
  - But users typically do not conform to standards..
  - and are not patient enough for delayed gratification
- Three Solutions
  - 1. Intercede in the way pages are created (act as if you are helping them write web-pages)
    - What if we change the MS Frontpage/Claris Homepage so that they (slyly) add annotations?
    - E.g. The Mangrove project at U. Wash.
    - Help user in tagging their data (allow graphical editing)
    - Provide instant gratification by running services that use the tags.
  - 2. Collaborative tagging!
    - Folksonomies (look at Wikipedia article)
    - Flickr, Technorati, del.icio.us etc
  - 3. Automated information extraction (next topic)
37Folksonomies: The good
- Bottom-up approach to taxonomies/ontologies
- In systems like Furl, Flickr and Del.icio.us... people classify their pictures/bookmarks/web pages with tags (e.g. wedding), and then the most popular tags float to the top (e.g. Flickr's tags or Del.icio.us on the right)....
- Folksonomies can work well for certain kinds of information because they offer a small reward for using one of the popular categories (such as your photo appearing on a popular page). People who enjoy the social aspects of the system will gravitate to popular categories while still having the freedom to keep their own lists of tags.
Classic case of research playing catch-up with practice :-)
38Works best when Many people Tag the same Info
39Folksonomies: the bad
- On the other hand, not hard to see a few reasons why a folksonomy would be less than ideal in a lot of cases
  - None of the current implementations have synonym control (e.g. "selfportrait" and "me" are distinct Flickr tags, as are "mac" and "macintosh" on Del.icio.us).
  - Also, there's a certain lack of precision involved in using simple one-word tags--like which Lance are we talking about? (Though this is great for discovery, e.g. hot or Edmonton)
  - And, of course, there's no hierarchy and the content types (bookmarks, photos) are fairly simple.
- For indexing and library people, folksonomies are about as appealing as Wikipedia is to encyclopedia editors.
- But.. there's some interesting stuff happening around them.
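Both sides of the folksonomy argument show up in a few lines of counting. The tag assignments below are invented for illustration; the sketch shows popular tags floating to the top by simple frequency, and also that "me" and "selfportrait" are counted as unrelated strings because nothing links synonyms.

```python
from collections import Counter

# Hypothetical (item, tag) assignments made by different users.
taggings = [
    ("photo1", "wedding"),
    ("photo2", "wedding"),
    ("photo3", "me"),
    ("photo4", "selfportrait"),
    ("photo5", "wedding"),
    ("photo6", "me"),
]

counts = Counter(tag for _, tag in taggings)

# The good: the most used tags rise to the top with no central taxonomy.
top_tags = counts.most_common(2)
print(top_tags)

# The bad: synonyms are distinct strings, so their counts are never merged.
print(counts["me"], counts["selfportrait"])
```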
Computizing Eyeballs
(brain) cycle stealing
40Collaborative Computing AKA Brain Cycle Stealing AKA Computizing Eyeballs
- A lot of exciting research related to the web currently involves co-opting the masses to help with large-scale tasks
  - It is like cycle stealing, except we are stealing human brain cycles (the most idle of the computers if there is ever one :-)
  - Remember the mice in The Hitchhiker's Guide to the Galaxy? (..who were running a mass-scale experiment on the humans to figure out the question..)
- Collaborative knowledge compilation (Wikipedia!)
- Collaborative Curation
- Collaborative tagging
- Many big open issues
  - How do you pose the problem such that it can be solved using collaborative computing?
  - How do you incentivize people into letting you steal their brain cycles?