Title: The Semantic Web -- an overview --
1The Semantic Web -- an overview --
- Dr Yuri A. Tijerino
- Computer Science Department
- Brigham Young University
2The Book of Genesis tells of a great tower built
by men not only from fear of a second Flood but
above all to make a name for themselves. Gods
punishment was the Babylonian confusion of
tongues, with men unable to understand each
other, the result being that the tower was never
finished.
3Todays Web
- information overload
- massive, heterogeneous data sets
- unstructured documents (e.g. e-mails)
- lack of context and meaning
- new forms of content
- software programs, sensors, ambient devices
- blur between content services
- mixing Web content with smart executables
4Limitations of the Web Today
5The Semantic Web
- Tim Berners-Lee
- an extension of the current web in which
information is given well-defined meaning, better
enabling computers and people to work in
cooperation - An open platform allowing information to be
shared and processed automatically - adding context and structure via metadata
6The Agent Computing Paradigm
- The old way of thinking about computer programs
a program - begins executing
- takes input
- gives output
- finishes executing
- The new way programs
- interact with each other
- are always active
- should be robust (ie, able to deal with the
unexpected)
7From Agents to Knowledge Markup
- Almost everything we need to know is on the web.
- What a great resource for agents!
- But Agents dont understand web pages.
- Natural Language processing is too hard for
computers, and will remain so for a long time. - The solution Knowledge Markup.
8Knowledge Markup in a Nutshell
- A web page describes objects.
- Datasets, human beings, services, items for sale,
etc. - The semantics of an object are defined by the
place it occupies in some domain ontology. - The basic idea of knowledge markup is to use
ontologies to markup a web page according to the
location its objects occupy in the ontology. - Essentially, knowledge markup is knowledge
representation done in ontologies.
9Benefits of Knowledge Markup
- Agents can parse a page, and immediately
understand its semantics. - No need for natural language processing.
- Searches can be done on concepts. The inheritance
mechanisms of the back-end knowledge base obviate
the need for keywords. - Data and knowledge sharing.
10Knowledge Markup Example (Hypothetical)
- You ask the system Show me all universities near
the beach. - The UCLA page doesnt say anything about the
beach, but it does say (through knowledge markup)
that its near the Pacific Ocean. - UCLA makes use of a geography ontology which
includes the rule Ocean(x) ?hasBeaches(x). - When your search agent parses the UCLA page, it
loads in the relevant ontologies, deduces that
UCLA is near the beach, and returns the page.
11Semantic Web Vision
12XML is a first step
- Semantic markup
- HTML ? layout
- XML ? meaning
- Metadata
- within documents
- not across documents
13XML example
- ltplaygt
- lttitlegtThe Life and Death of King Johnlt/titlegt
- ltDramatis Personaegt
- ltpersonagtThe Earl of PEMBROKElt/personagt
- ltpersonagtThe Earl of ESSEXlt/personagt
-
- lt/Dramatis Personaegt
- ltStagedirgtSCENE England, the
Court.lt/Stagedirgt - ltactgtAct 1
- ltscenegtScene I.
- ltspeechgt
- ltspeakergtJOHNlt/speakergt
- ltlinegtNow, Chatillon, what would
France with us?lt/linegt - lt/speechgt
14Resource Description Framework (RDF)
- A standard of W3C
- Relationships between documents
- (or parts of documents)
- Can be an XML application
- Consisting of triples or sentences
- subject
- property or predicate (verb)
- object
- RDF RDFS used to define ontologies
15A simple example
- Tolkein wrote The Lord of the Rings
- hasWritten (http//www.famouswriters.org/tolkein/
, http//www.books.org/ISBN00001047582
/)
16RDF Schema
- RDF Schema is a frame based language used for
defining RDF vocabularies. - Introduces properties rdfssubPropertyOf and
rdfssubClassOf - Defines semantics for inheritance and
transitivity. - Introduces notions of rdfsDomain and rdfsRange
- Also provides rdfsConstraintProperty
17RDF Schema Lexicon
18The Recapitulation of AI Research
- The last 5 years have seen a recapitulation of 40
years of AI history. - Data Structures ?XML
- Semantic Networks ?RDF
- Early Frame Based Systems ?RDFS
-
- As a mechanism for metadata encapsulation,
RDFS works just fine. But it is unsuited for
general purpose knowledge representation. This is
where the AI community steps in, saying,
essentially, We know how to do this please let
us help.
19What are Ontologies?
- Ontologies provide a shared and common
understanding of a domain - a (shared) specification of a conceptualisation
- concept map
- a simple example - Yahoo
- BusinessEconomy gt Finance gt Banking
- universal, non-consensual, manual, changes slowly
- for WWW, defined using RDF-Schema (RDFS)
20A History of Knowledge Representation
- Knowledge representation (KR) is the branch of
artificial intelligence (AI) that deals with the
construction, description and use of ontologies. - How do we model a domain for input into the
machine? - Ontology is the branch of philosophy that answers
the question - what is there?
- Some big names in ontology Parmenides, Plato,
Aristotle, Kant, Pierce, Husserl - For a program to reason, it must have a
conceptual understanding of the world. This
understanding is provided by us. Thus we have to
answer questions that weve been considering for
several thousand years. - Today, in computer science, an ontology is
typically a hierarchical collection of classes,
permissible relationships amongst those classes,
and inference rules
21Ontology as Taxonomy
Computer Networks
Distributed Systems
Network Architectures
ISDN
Wireless Communication
ATM
Client/ Server
Network OS
22Ontology of People and their Roles
Employee
Expert
Analyst
Manager
Programme Mgr
Project Mgr
23Ontology and Logic
- Reasoning over ontologies
- Inferencing capabilities
- X is author of Y ? Y is written by X
- X is supplier to Y Y is supplier to Z ?
- X and Z are part of the same
supply chain - Based on Description Logic research
24Example Ontology (Scientific Pedagogy)
- Classes
- Experimental science (ES)
- Theoretical science (TS)
- Good Teaching Example (GTE)
- Relationships
- Motivates A particular instance of TS may
motivate an instance of ES. - Demonstrates A particular instance of ES may
demonstrate an instance of TS. - Inference Rules
- ES(X) and TS(Y) and Demonstrates(X,Y) ?GTE(X,Y)
25The DAML Program
- DAML DARPA Agent Markup Language
- Defense Advanced Research Agency (DARPA) program
- Program Managers James Hendler, Murray Burke
- Begin in August 2000
- Goal achieve semantic interoperability between
Web pages, databases, programs, and sensors - Integration contractor and 16 technology
development teams - MIT (Tim Berners-Lee, Ben Grosof)
- Stanford (Gio Weiderhold, Richard Fikes, Deborah
McGuinness) - UMBC (Tim Finin)
- U West Florida (Pay Hayes)
- Yale (Drew McDermott)
-
- Advisors Ramanthan Guha, Peter Patel-Schneider,
- Web site http//www.daml.org/
- Cycorp (Doug Lenat)
- Nokia (Ora Lassila)
- Teknowledge (Bob Balzer)
26DAMLOIL OWL
- A representation language for user-defined
ontologies - An ontology added to RDF and RDF-Schema
- Specification document
- http//www.daml.org/2001/03/damloil-index.html
- Expressive power analogous to
- Description logics (e.g., CLASSIC)
- Monotonic frame languages (e.g., OKBC knowledge
model) - Designed in collaboration with the European
Community - Designers of the Ontology Inference Layer (OIL)
- Basis for Web Ontology Language (OWL), the
candidate W3C standard
27DAMLOIL Classes
- Thing
- Restriction
- List
- Ontology
- AbstractProperty
- TransitiveProperty
- DatatypeProperty
- UniqueProperty
- UnambiguousProperty
- Nothing
28DAMLOIL Properties
- Classes
- disjointWith
- Defining Non-primitive classes
- unionOf, disjointUnionOf, intersectionOf,
complementOf, oneOf - Restrictions
- onProperty, toClass, hasValue, hasClass,
hasClassQ - minCardinality, maxCardinality, cardinality
- minCardinalityQ, maxCardinalityQ, cardinalityQ
- Equivalence
- equivalentTo, sameClassAs, samePropertyAs
- Lists
- first, rest, item
- Properties
- inverseOf
- Ontologies
- versionInfo, imports
29Property Restrictions on Classes
ltClass ID "Person"gt ltcommentgt Person is a
subclass of objects whose parents are persons.
lt/commentgt ltrdfssubClassOfgt
ltdamlRestrictiongt ltdamlonProperty
rdfresource hasParent /gt
ltdamltoClass rdfresource Person /gt
lt/damlRestrictiongt lt/rdfssubClassOfgt
ltcomment gt Person is a subclass of resources
that have one father. lt/commentgt
ltrdfssubClassOfgt ltdamlRestrictiongt
ltdamlonProperty rdfresource hasFather
/gt ltdamlcardinalitygt 1
lt/damlcardinalitygt lt/damlRestrictiongt
lt/rdfssubClassOfgt
30Comments on DAMLOIL (OWL)
- Expressive power of a description logic
- Representation language for both classes and
instances - Additional expressive power needed (at least FOL)
- No rationale for excluding any axiom from an
ontology that is - Not a tautology
- Satisfied by the intended interpretation of the
ontology - Example of need for additional expressive power
- The magnitude of a physical quantity in a given
unit of measure - (gt (AND (Quantity-Magnitude ?q ?u ?m)
(Quantity-Dimension ?q ?d)) - (AND (type Physical-Quantity ?q) (type
Unit-Of-Measure ?u) - (type Magnitude ?m)
(Unit-Dimension ?u ?d))) - May be too difficult for the Web community to
understand - Acceptance will be depend on user-friendly tools
- Ok to support development of Semantic Web
technology
31Issues Facing OWL Need for Really Good
Annotation Tools
- OWL is not meant to be read or written by human
beings. - Humans will make assertions through intuitive
user interfaces, which will generate the
appropriate OWL markup. - In fact, the markup should fall out of the
activity of building a web page. - This requires some thought.
32Proof and Trust
33 Example 1 Focused Crawling
- Special purpose search engines will increasingly
replace all-purpose engines. - The notion of an all-purpose search engine is
yielding to that of special-purpose engines. - Such engines do not want to index irrelevant
pages. - Current focused crawling techniques employ
heuristics based on text mining, and
collaborative filtering. - A cleaner approach would be for web sites to
describe themselves with RDF or DAML. - An entire site map could be expressed in RDF,
along with metadata descriptions of each node in
the map. - An agent would know precisely which of the sites
pages are worth checking out.
34Example 2 Indexing the Hidden Web
- Search engines google, infoseek, etc. work by
constantly crawling the web, and building huge
indexes, with entries for every word encountered. - But a lot of web information is not linked to
directly. It is hidden behind forms. - eg www.allmovies.com allows you to search a vast
database of movies and actors. But it does not
link to those movies and actors. You are required
to enter a search term. - A web-spider, not knowing how to interact with
such sites, cannot penetrate any deeper than the
page with the form.
35Indexing the Hidden Web (Contd.)
- Now imagine that allmovies.com had some DAML
attached, which said - I am allmovies.com. I am an interface to a
vast database of movie and actor information. If
you input a movie title into the box, I will
return a page with the following information
about the movie If you input an actor name, I
will return a page with the following information
about the actor
36Indexing the Hidden Web (Contd.)
- An OWL aware spider can come to such a page and
do one of two things - If it is a spider for a specialized search
engine, it may ignore the site altogether. - If not, it can say to itself I know some movie
titles. Ill input them (being careful not to
overwhelm the site), and index the results (and
keep on spidering from the result pages). - At the least, the search engine can record the
fact that - www.allmovies.com/execperson?namex returns
information about the actor with name x.
37Example 3 Knowledge Sharing/Corporate Memory
- Our problem The Church is growing, and various
organizations, departments and divisions need to
collaborate and share knowledge. - The wheel often gets reinvented.
- Our proposal
- Build an ontology which captures gospel, family
history, education and other relevant knowledge. - Mark up talks, scriptures, curriculum materials,
etc. according to this ontology. - Harvest the information with OWL aware
web-crawlers. - Build OWL aware query agents.
38Example 3 (Contd.)
- Leaders and members should be able to tell the
query agent the current form of their data (e.g.
a articles and a subject), their desired output
(e.g. all other related lessons prepared by
others), and get back the series of available
lessons and advice necessary to prepare the
lesson. - We also have a chicken and egg problem here.
- Leaders and members dont want to invest time in
yet another knowledge technology. - How do we do it?
39DAML Example 4 ittalks.org
- www.ittalks.org will be a repository of
information about information technology (IT)
talks given at universities and research
organization across America. - A users information (research interests,
schedule, constraints, etc.) will be stored on
their personal DAML page. - When a new talk is added, the personal agents of
interested users will be notified. - The personal agents will determine, based on
schedule, driving time, more refined interest
specifications, etc, whether or not to inform the
user.
40ittalks.org (Contd.)
- Example Scenario
- You are going to be in Boston for a few days. You
enter this in your schedule, and you are
automatically notified of several talks, at
several Boston universities, that match your
interests. You select one that you would like to
attend. You get a call on your cell-phone letting
you know when it is time to leave for the talk.
41The Road Ahead
- Enormous synergy between KM, ubiquitous
computing, and agents. - Start Trek, here we come.
- The concept is clear, but many details need to be
worked out. - Semantic Web systems can be built incrementally.
- Start small. Even a very modest effort can
massively improve search results.
42What Gartner says ...
- To 2004, significant Semantic Web activity in
improving information access for eBusiness - To 2006, broader-scale success is still
uncertain for several reasons - potentially prolonged poor economic climate
- lack of clear business models around ontologies
- but much sector-specific activity
- overhead and complexity of creating and
maintaining ontologies - Despite this, information-intensive enterprises
should definitely do more than just take note.
43Conclusions
- To conclude
- The first version of the Web lacked a metadata
framework which was needed to describe resources - W3C developed RDF to provide this framework
- As well as providing an framework for metadata
applications, RDF allows software to reach beyond
individual Web sites - The Semantic Web will be based on registries of
machine-understandable definition - The Semantic Web standard language (OWL) was
derived from US and EC efforts (DAMLOIL) - The Semantic Web will be difficult to achieve
- It will be expensive to provide rich
interoperable services without a Semantic Web
44Find Out More (1)
- Semantic Web, W3Clthttp//www.w3c.org/2001/sw/gt
- Semantic Web Road map, Tim Berners-Leelthttp//www
.w3c.org/DesignIssues/Semantic.htmlgt - The Semantic Web, Scientific Americanlthttp//www.
sciam.com/2001/0501issue/0501berners-lee.htmlgt - The Semantic Web Community Portal,
lthttp//www.semanticweb.org/gt - The Semantic Web A Primerlthttp//www.xml.com/pub
/a/2000/11/01/semanticweb/gt - All found using Google to search for semantic
Web
45Find Out More (2)
- An introduction to RDF
- lthttp//www-106.ibm.com/developerworks/xml/librar
y/w-rdf/gt - The Semantic Web Community Portal
- lthttp//www.semanticweb.org/gt
- The Semantic Web An Introduction
- lthttp//infomesh.net/2001/swintro/gt