Title: Dickson K.W. Chiu
1CSIT600f Introduction to Semantic Web
- Dickson K.W. Chiu
- PhD, SMIEEE
- Text: Antoniou & van Harmelen, A Semantic Web Primer
- Ref: Ivan Herman, Tutorial on Semantic Web Technology
2Towards a Semantic Web
- WWW is an impressive success
- amount of available information (> 1 Giga-page)
- number of human users (> 200 Mega-user)
- The current Web represents information using
- natural language (English, Hungarian, Chinese, ...)
- graphics, multimedia, page layout
- Humans can process this easily
- can deduce facts from partial information
- can create mental associations
- are used to various sensory information
- (well, sort of: people with disabilities may have serious problems on the Web with rich media!)
3Need for understanding Web info
- Tasks often require combining data on the Web
- hotel and travel information may come from different sites
- searches in different digital libraries
- etc.
- Again, humans combine this information easily
- even if different terminologies are used!
4However
- However, machines are ignorant!
- partial information is unusable
- difficult to make sense of, e.g., an image
- drawing analogies automatically is difficult
- difficult to combine information
- is <foo:creator> the same as <bar:author>?
- how to combine different XML hierarchies?
5Example Searching
- The best-known example
- Google et al. are great, but there are too many false hits
- adding descriptions to resources should improve this
6Where we are Today the Syntactic Web
[Hendler & Miller 02]
7The Syntactic Web is
- A hypermedia, a digital library
- A library of documents (called web pages) interconnected by a hypermedia of links
- A database, an application platform
- A common portal to applications accessible through web pages, and presenting their results as web pages
- A platform for multimedia
- BBC Radio 4 anywhere in the world!
- Peer-to-peer sharing (BT, eDonkey, PPLive, ...)
- A naming scheme
- Unique identity for those documents
- A place where computers do the presentation (easy) and people do the linking and interpreting (hard).
- Why not get computers to do more of the hard work?
8Hard using the Syntactic Web
- Finding the image of something
- Find pictures that contain red birds with a blue background
- Complex queries involving background knowledge
- Find information about animals that use sonar but are neither bats nor dolphins
- Locating information in data repositories
- Travel enquiries
- Prices of goods and services
- Results of human genome experiments
- Finding and using web services
- Visualise surface interactions between two proteins
- Delegating complex tasks to web agents
- Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English
9What is the Problem?
- Markup consists of
- rendering information (e.g., font size and colour)
- Hyper-links to related content
- Semantic content is accessible to humans but not (easily) to computers
Consider a typical web page
10What information can we see
- WWW2002
- The eleventh international world wide web conference
- Sheraton Waikiki Hotel
- Honolulu, Hawaii, USA
- 7-11 May 2002
- 1 location, 5 days: learn, interact
- Registered participants coming from
- Australia, Canada, Chile, Denmark, France, Germany, Ghana, Hong Kong, India, Ireland, Italy, Japan, Malta, New Zealand, the Netherlands, Norway, Singapore, Switzerland, the United Kingdom, the United States, Vietnam, Zaire
- Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed
- Tim Berners-Lee
- Tim is the well known inventor of the Web, ...
- Ian Foster
- Ian is the pioneer of the Grid, the next generation internet
11Information a machine may see
- WWW2002
- The eleventh international world wide web conference
- Sheraton Waikiki Hotel
- Honolulu, Hawaii, USA
- 7-11 May 2002
- 1 location, 5 days: learn, interact
- Registered participants coming from
- Australia, Canada, Chile, Denmark, France, Germany, Ghana, Hong Kong, India, Ireland, Italy, Japan, Malta, New Zealand, the Netherlands, Norway, Singapore, Switzerland, the United Kingdom, the United States, Vietnam, Zaire
- Register now
- On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
- Speakers confirmed
- Tim Berners-Lee
- Tim is the well known inventor of the Web, ...
- Ian Foster
- Ian is the pioneer of the Grid, the next generation internet
12Solution XML markup with meaningful tags?
<name>WWW2002 The eleventh international world wide webcon</name>
<location>Sheraton Waikiki Hotel Honolulu, Hawaii, USA</location>
How about
<conf>WWW2002 The eleventh international world wide webcon</conf>
<place>Sheraton Waikiki Hotel Honolulu, Hawaii, USA</place>
Then how about
<??>WWW2002 The eleventh international world wide webcon</??>
<??>Sheraton Waikiki Hotel Honolulu, Hawaii, USA</??>
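The tag-naming problem on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the two documents, the wrapping <event> element and the TAG_MAP dictionary are hypothetical, not taken from any real site. A standard XML parser treats <name>/<location> and <conf>/<place> as unrelated strings, and the two descriptions only line up once someone supplies an agreed mapping from outside.

```python
import xml.etree.ElementTree as ET

# Two hypothetical annotations of the same conference, using different tag names.
DOC_A = "<event><name>WWW2002</name><location>Honolulu, Hawaii, USA</location></event>"
DOC_B = "<event><conf>WWW2002</conf><place>Honolulu, Hawaii, USA</place></event>"


def fields(xml_text):
    """Return a {tag: text} dict for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


a, b = fields(DOC_A), fields(DOC_B)

# The values agree, but the parser finds no tag names in common.
shared_tags = set(a) & set(b)
same_values = sorted(a.values()) == sorted(b.values())

# Only an externally agreed mapping (assumed here) reconciles the two documents.
TAG_MAP = {"conf": "name", "place": "location"}
b_mapped = {TAG_MAP.get(tag, tag): text for tag, text in b.items()}

print(shared_tags)      # set()
print(same_values)      # True
print(b_mapped == a)    # True
```

The mapping is the part that XML alone does not provide, which is the gap that shared vocabularies are meant to fill.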
13What Is Needed?
- A resource should provide information about itself
- also called metadata
- metadata should be in a machine processable format
- agents should be able to reason about (meta)data
- metadata vocabularies should be defined
14What Is Needed (Technically)?
- To make metadata machine processable, we need
- unambiguous names for resources (URIs)
- a common data model for expressing metadata (RDF)
- and ways to access the metadata on the Web
- common vocabularies (Ontologies)
- The Semantic Web is a metadata based infrastructure for reasoning on the Web
- It extends the current Web (and does not replace it)
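A rough sketch of how the ingredients listed on this slide fit together, written in plain Python rather than a real RDF library. The Dublin Core element namespace is real; the example.org URI, the two "sources" and the match helper are assumptions made up for illustration. Each statement is a (subject, predicate, object) triple whose subject and predicate are URIs, so metadata from different sources can be merged by simple set union.

```python
# Dublin Core element namespace (real); the example.org URI below is made up.
DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://example.org/www2002"

# Two hypothetical sources describing the same resource with the same vocabulary.
source_a = {
    (PAGE, DC + "title", "WWW2002"),
    (PAGE, DC + "date", "2002-05-07"),
}
source_b = {
    (PAGE, DC + "title", "WWW2002"),
    (PAGE, DC + "coverage", "Honolulu, Hawaii, USA"),
}

# Because names are unambiguous URIs, merging is just a set union;
# the duplicated title statement collapses into one triple.
merged = source_a | source_b


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(len(merged))                       # 3
print(match(merged, p=DC + "title"))
```

A real deployment would use an RDF toolkit and a serialization format; the tuple representation here only mirrors the data model.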
15Adding Semantics
- External agreement on meaning of annotations
- E.g., Dublin Core (http://dublincore.org/)
- Agree on the meaning of a set of annotation tags
- Problems with this approach
- Inflexible
- Limited number of things can be expressed
- Use Ontologies to specify meaning of annotations
- Ontologies provide a vocabulary of terms
- New terms can be formed by combining existing ones
- Meaning (semantics) of such terms is formally specified
- Can also specify relationships between terms in multiple ontologies
16History of the Semantic Web
- The Web was invented by Tim Berners-Lee (amongst others), a physicist working at CERN
- TBL's original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web
- TBL (and others) have since been working towards realising this vision, which has become known as the Semantic Web
- E.g., article in May 2001 issue of Scientific American
... a goal of the Web was that, if the
interaction between person and hypertext could be
so intuitive that the machine-readable
information space gave an accurate representation
of the state of people's thoughts, interactions,
and work patterns, then machine analysis could
become a very powerful management tool, seeing
patterns in our work and facilitating our working
together through the typical problems which beset
the management of large organizations.
17Berners-Lee's Architecture
(Figure: layered architecture; recoverable layer labels: Semantics and reasoning, Relational Data, Data Exchange)
- Relationship between layers is not clear
- OWL DL extends the DL subset of RDF
18A Spectrum of Ontology
(Figure: ontology spectrum; labels in roughly increasing expressiveness: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, Inverse, part-of; General logical constraints)
19Ontology Origins and History
- Ontology in philosophy: a philosophical discipline, a branch of philosophy that deals with the nature and the organization of reality
- Science of Being (Aristotle, Metaphysics, IV, 1)
- studies being or existence as well as the basic categories thereof
- trying to find out what entities and what types of entities exist
- has strong implications for the conceptions of reality.
20Ontology in Linguistics
(Figure; recoverable label: "Tank")
21Ontology in Computer Science
- An ontology is an engineering artifact [Neches91]
- defines basic terms and relations comprising the vocabulary of a topic area
- the rules for combining terms and relations to define extensions to the vocabulary
- An explicit specification of a conceptualization [Gruber93]
- Formal specification of a shared conceptualization (of a certain domain) [Borst97]
- Shared understanding of a domain of interest
- Formal and machine manipulable model of a domain of interest
22Structure of an Ontology
- Ontologies typically have two distinct components
- Names for important concepts in the domain
- Elephant is a concept whose members are a kind of animal
- Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
- Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
- Background knowledge/constraints on the domain
- Adult_Elephants weigh at least 2,000 kg
- All Elephants are either African_Elephants or Indian_Elephants
- No individual can be both a Herbivore and a Carnivore
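The two components above (concept definitions and background constraints) can be mimicked with a small Python sketch. This is not an ontology language, only an illustration under assumptions: the Animal class, its field names and the violations helper are hypothetical, and "either African or Indian" is read here simply as "at least one of the two".

```python
from dataclasses import dataclass


@dataclass
class Animal:
    name: str
    concepts: set      # concept names asserted for this individual
    age_years: float
    weight_kg: float


def is_adult_elephant(a):
    """Defined concept: exactly those elephants older than 20 years."""
    return "Elephant" in a.concepts and a.age_years > 20


def violations(a):
    """Check one individual against the background constraints on the slide."""
    problems = []
    if is_adult_elephant(a) and a.weight_kg < 2000:
        problems.append("Adult_Elephant must weigh at least 2,000 kg")
    if "Elephant" in a.concepts and not (
        {"African_Elephant", "Indian_Elephant"} & a.concepts
    ):
        problems.append("Elephant must be African_Elephant or Indian_Elephant")
    if {"Herbivore", "Carnivore"} <= a.concepts:
        problems.append("Herbivore and Carnivore are disjoint")
    return problems


ok = Animal("Dumbo", {"Elephant", "African_Elephant", "Herbivore"}, 25, 4000)
bad = Animal("Odd", {"Elephant", "Herbivore", "Carnivore"}, 30, 1500)

print(violations(ok))    # []
print(violations(bad))   # three messages, one per broken constraint
```

In a formal ontology language the same statements are written declaratively and a reasoner performs the checks, rather than hand-written functions.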
23Ontology Elements
- Concepts (classes) and their hierarchy
- Concept properties (slots / attributes)
- Property restrictions (type, cardinality, domain, etc.)
- Relations between concepts (disjoint, equality, etc.)
- Instances
- E-R diagram / UML diagram ???
- Note: Property ≈ Slot ≈ Relation ≈ Relation type ≈ Attribute ≈ Semantic link type
24A Semantic Web First Steps
Make web resources more accessible to automated
processes
- Extend existing rendering markup with semantic markup
- Metadata annotations that describe content/function of web accessible resources
- Use Ontologies to provide vocabulary for annotations
- Formal specification is accessible to machines
- A prerequisite is a standard web ontology language
- Need to agree common syntax before we can share semantics
- Syntactic web based on standards such as HTTP and HTML
25More Example Automatic Assistant
- Your own personal (digital) automatic assistant
- knows about your preferences
- builds up knowledge base using your past
- can combine the local knowledge with remote services
- hotel reservations, airline preferences
- dietary requirements
- medical conditions
- calendaring
- etc
- It communicates with remote information (i.e., on the Web!)
26Example Database Integration
- Databases are very different in structure, in content
- Lots of applications require managing several databases
- after company mergers
- combination of administrative data for e-Government
- biochemical, genetic, pharmaceutical research
- etc.
- Most of these data are now on the Web
- The semantics of the data(bases) should be known
- how this semantics is mapped on internal structures is immaterial
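The last two bullets can be illustrated with a minimal Python sketch. All table layouts, column names and mappings below are hypothetical. Each database keeps its own internal structure; what matters for integration is that each column is mapped to a term in a shared vocabulary, after which records can be combined without knowing how either database stores them.

```python
# Hypothetical mappings from each database's columns to shared vocabulary terms.
HR_MAP = {"emp_name": "name", "dept": "department"}
PAYROLL_MAP = {"full_name": "name", "salary_usd": "salary"}

# Rows as they might come out of two differently structured databases.
hr_row = {"emp_name": "Alice Example", "dept": "Research", "row_id": 17}
payroll_row = {"full_name": "Alice Example", "salary_usd": 5000, "pk": 903}


def to_shared(row, mapping):
    """Re-express a row in shared terms; unmapped internal columns are dropped."""
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def merge_on(key, *records):
    """Combine records that agree on the value of the shared term `key`."""
    merged = {}
    for record in records:
        merged.setdefault(record[key], {}).update(record)
    return merged


combined = merge_on(
    "name",
    to_shared(hr_row, HR_MAP),
    to_shared(payroll_row, PAYROLL_MAP),
)
print(combined)
```

Matching on a plain name string is a simplification; in practice a unique identifier such as a URI would be a safer join key.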
27Example Digital Libraries
- It is a bit like the search example
- It means catalogs on the Web
- librarians have known how to do that for centuries
- goal is to have this on the Web, World-wide
- extend it to multimedia data, too
- But it is more: software agents should also be librarians!
- help you in finding the right publications
28Example Semantics of Web Services
- Web services technology is great
- But if services are ubiquitous, the search issue comes up, for example
- find me the most elegant Schrödinger equation solver
- what does it mean to be
- elegant?
- most elegant?
- mathematicians ask these questions all the time
- It is necessary to characterize the service
- not only in terms of input and output parameters
- but also in terms of its semantics
29How Simple Ontologies Help
- not as costly to build and potentially
- more importantly, many are available
- provide a controlled vocabulary
- website organization and navigation support
- support expectation setting (e.g. user interface)
- umbrella structures from which to extend content (e.g., UNSPSC)
- searching support
- sense disambiguation support (e.g., terms belong to different categories)
Deborah McGuinness. Ontologies Come of Age. The Semantic Web: Why, What and How, MIT Press, 2001.
30How Structured Ontologies Help
- more structure => more power
- consistency checking
- completion (of unspecified attributes and relations)
- interoperability support
- validation and verification testing or even encode entire test suites
- structured, comparative, and customized search
- intelligence in application, e.g., system configuration support
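As a hedged illustration of what extra structure buys for search, the sketch below follows a formal is-a hierarchy so that a query for a general concept also returns instances typed with more specific concepts. The IS_A table, the instance names and both helper functions are assumptions invented for this example.

```python
# Hypothetical formal is-a hierarchy: child concept -> direct parent concepts.
IS_A = {
    "African_Elephant": {"Elephant"},
    "Indian_Elephant": {"Elephant"},
    "Elephant": {"Animal"},
    "Bat": {"Animal"},
}

# Hypothetical instances, each typed with its most specific concept.
ITEMS = {"Dumbo": "African_Elephant", "Hathi": "Indian_Elephant", "Stellaluna": "Bat"}


def ancestors(concept, is_a=IS_A):
    """Return every concept reachable by following is-a links upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(concept, items=ITEMS):
    """Structured search: match instances of the concept or of any sub-concept."""
    return sorted(
        name for name, typed_as in items.items()
        if typed_as == concept or concept in ancestors(typed_as)
    )


print(instances_of("Elephant"))   # ['Dumbo', 'Hathi']
print(instances_of("Animal"))     # ['Dumbo', 'Hathi', 'Stellaluna']
```

A keyword search for "Elephant" would miss both individuals, since neither is labelled with that exact word; the hierarchy supplies the missing link.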
31Benefits of Semantic Web
- Communication between people
- Interoperability between software agents
- Reuse of domain knowledge
- Make domain knowledge explicit
- Analyze domain knowledge
32The Semantic Web is Not
- Artificial Intelligence on the Web
- although it uses elements of logic
- it is much more down-to-Earth (we will see later)
- it is all about properly representing and characterizing metadata
- of course AI systems may use the metadata of the SW
- but it is a layer way above it
- A purely academic research topic
- SW is out of the university labs now
- lots of applications exist already (see examples later)
- big players of the industry use it (Sun, Adobe, HP, IBM, ...)
- of course, much is still to be done!
- Building an ontology is not a goal in itself