Title: Semantic Web Introduction
1Semantic Web Introduction
2Outline
- The current Web
- What is Semantic Web?
- How the Semantic Web Will Be Possible?
- Framework of the Semantic Web
3Weaving the Web Vision of Tim Berners-Lee
- "The first step is putting data on the Web in a
form that machines can naturally understand, or
converting it to that form. This creates what I
call a Semantic Web (SW): a web of data that can
be processed directly or indirectly by
machines." Tim Berners-Lee, Weaving the Web,
Harper San Francisco, 1999
- Requires that there be machine-understandable
semantics for some or all of the information
presented on the WWW
4Web Today
- Information repository.
- Simplicity.
- Primarily for human interpretation and use.
5Current Web for Knowledge Management
- Searching for information: keyword-based search
- Extracting information: human browsing and
reading
- Automatic agents lack the commonsense knowledge
required to extract "relevant" information from
textual representations and fail to integrate
information spread over different sources
- Maintenance
- Keeping collections consistent, correct, and up
to date requires a mechanized representation of
semantics and constraints that help to detect
anomalies
- Automatic document generation
- According to user profiles or other relevant
aspects
- Requires a machine-accessible representation of
the semantics of these information sources
6Current Web for Knowledge Management (Cont.)
- Semantic Web technology will enable structural
and semantic definitions of documents, providing
completely new possibilities
- Intelligent search instead of keyword matching
- Query answering instead of information retrieval
- Document exchange among departments via ontology
mappings
- Definition of customized views on documents
7Current Web for Web Commerce
- B2C: online stores, auction houses, online
marketplaces
- Shop-bots visit several stores, extract product
information, and present to the customer an
instant market overview
- Functionality is provided via wrappers written
for each online store
- Keyword search, regularities in the presentation
format of stores' Web sites, text extraction
heuristics
- Effort: writing wrappers is a time-consuming
activity
- Quality: limited product information, error
prone, incomplete
- Why: most product information on Web sites is
provided in natural language, and automatic text
recognition is still a research area with
significant unsolved problems
- Requires machine-processable semantics for the
information provided
8Current Web for Web Commerce (Cont.)
- When standard representation formalisms for the
structure and semantics of data are available
- Software agents can then be built that can
understand the product information the Web sites
provide
- Meta-online stores can then be constructed with
little effort, and this technique will also
enable complete market transparency in various
dimensions of diverse product properties
- The low-level programming of wrappers based on
text extraction and format heuristics will be
replaced by semantic mappings that translate
different formats used to represent products and
can be used to navigate and search automatically
for the required information
9Toward Semantic Web
Information repository → Information-service
provider
10What Is the Semantic Web?
11Vision of Tim Berners-Lee for the Future of the
Web
- Two-part vision
- Make the Web a more collaborative medium
- Make the Web understandable, and thus
processable, by machines
- Original Web proposal to CERN
- Relations between information items like
"includes," "describes," and "wrote" are not
currently captured on the Web
- Use RDF to capture such relationships
12Original Web Proposal to CERN
Encompass additional metadata above and beyond
what is currently in the Web. This additional
metadata is needed for machines to be able to
process information on the Web
13Web vs. Semantic Web
[Figure: the Web compared with the Semantic Web]
14Smart Data
- How do we create a web of data that machines can
process?
- MAKE THE DATA SMARTER
- The smart data continuum
- Text and databases (pre-XML)
- XML documents for a single domain
- Taxonomies and documents with mixed vocabularies
- Ontologies and rules
15The Smart Data Continuum
16Text and databases
- Pre-XML
- The initial stage where most data is proprietary
to an application
- The "smarts" are in the application and not in
the data
17XML documents for a single domain
- The stage where data achieves application
independence within a specific domain
- Data is now smart enough to move between
applications in a single domain
- Example: XML standards in the healthcare
industry, insurance industry, or real estate
industry
18Taxonomies and documents with mixed vocabularies
- Data can be composed from multiple domains and
accurately classified in a hierarchical taxonomy
- The classification can be used for discovery of
data
- Simple relationships between categories in the
taxonomy can be used to relate and thus combine
data
- Data is now smart enough to be easily discovered
and sensibly combined with other data
19Ontologies and rules
- New data can be inferred from existing data by
following logical rules
- Data is now smart enough to be described with
concrete relationships and sophisticated
formalisms where logical calculations can be made
on this "semantic algebra."
- This allows the combination and recombination of
data at a more atomic level and very fine-grained
analysis of data
- Data no longer exists as a blob but as part of a
sophisticated microcosm
- Example: automatic translation of a document in
one domain to the equivalent (or as close as
possible) document in another domain
20Definition of Semantic Web
- A machine-processable web of smart data
- Smart data can be further defined as data that is
application-independent, composable, classified,
and part of a larger information ecosystem
(ontology)
- An extension of the current web in which
information is given well-defined meaning, better
enabling computers and people to work in
cooperation.
- An infrastructure that enables machines to
COMPREHEND semantic documents and data.
- A brain for humankind, which assists the
evolution of human knowledge as a whole.
21How the Semantic Web Will Be Possible?
22Achieving a Semantic Web Requires
- Developing languages for expressing
machine-understandable meta-information for
documents and developing terminologies (i.e.,
namespaces or ontologies) using these languages
and making them available on the Web
- Developing tools and new architectures that use
such languages and terminologies to provide
support in finding, accessing, presenting, and
maintaining information sources
- Realizing applications that provide a new level
of service to the human users of the Semantic Web
23Languages Two Aspects
- Languages that provide formal syntax and formal
semantics to enable automated processing of
content
- Languages that provide standardized vocabulary
referring to real-world semantics, enabling
automatic and human agents to share information
and knowledge: ontologies
24Formal Languages
- Layered language model for the Semantic Web
- HTML: just for presentation, not for processing
- XML: separates content and data from presentation
- The Semantic Web is an XML application
- RDF: defines a syntactic convention and a simple
data model for representing machine-processable
semantics of data
- RDF Schema (RDFS): defines basic ontological
modeling primitives on top of RDF
- Ontology Inference Layer (OIL) and DARPA Agent
Markup Language-Ontology (DAML-ONT): full-blown
ontology modeling languages as extensions of RDFS
25Data representation
- XML addresses only document structure.
- RDF is a web-enabled language of Subject, Verb,
Object triples to represent relationships between
data.
- An ontology is the specification of a
conceptualization: it defines terms and
relationships between them.
- Logic allows us to reason across the RDF elements.
26How does XML Fit into the Semantic Web?
- XML is the syntactic foundation layer of the
Semantic Web
- XML guarantees a base level of interoperability
- XML is built upon Unicode characters and Uniform
Resource Identifiers (URIs)
- Unicode allows XML to be authored using
international characters
- URIs are used as unique identifiers for concepts
in the Semantic Web
- XML is not enough (only syntactic
interoperability)
- Sharing XML documents adds meaning to the
content, but only when both parties understand
the element names
- <price>12.00</price> vs. <cost>12.00</cost>
- Requires SW technologies like ontologies
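The price/cost mismatch can be illustrated with a minimal Python sketch. It assumes a hand-written lookup table (SHARED_CONCEPT, a hypothetical name) standing in for the shared vocabulary an ontology would provide; it is not how any particular Semantic Web toolkit works.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from vendor-specific element names to one shared concept.
# In a real system this agreement would come from an ontology, not a dict.
SHARED_CONCEPT = {"price": "amount", "cost": "amount"}

def read_amount(xml_text):
    """Return the numeric amount if the root element maps to the shared concept."""
    element = ET.fromstring(xml_text)
    if SHARED_CONCEPT.get(element.tag) != "amount":
        raise ValueError(f"no shared meaning known for <{element.tag}>")
    return float(element.text)

store_a = "<price>12.00</price>"
store_b = "<cost>12.00</cost>"

# Syntactically, the two documents use different element names...
tags_differ = ET.fromstring(store_a).tag != ET.fromstring(store_b).tag
# ...but once both names map to the same concept, they yield the same value.
same_amount = read_amount(store_a) == read_amount(store_b)
```

Without the mapping, a parser sees two unrelated tags; the agreement on meaning has to come from outside the XML itself.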
27RDF and RDFS
- RDF: standard for Web metadata developed by W3C
- Suitable for describing any Web resources
- Provides interoperability among applications that
exchange machine-understandable information on
the Web
- An XML application that adds a simple data model
on top of XML
- Objects, properties, and values of properties
- RDFS: candidate recommendation
- Defines additional modeling primitives on top of
RDF
- Allows the definition of classes (i.e., concepts),
inheritance hierarchies for classes and
properties, and domain and range restrictions for
properties
28RDF Application Integration Hub
29Logical Assertions
- An assertion is the smallest expression of useful
information
- Use Resource Description Framework (RDF) to
capture the association (assertion) between
subjects and objects
- How can we use these assertions?
- The author of a document has written other
articles on similar topics
- A well-known authority on the subject has refuted
the main points of an article
- Assertions are not free-form commentary but
instead add logical statements to a resource or
about a resource
[Figure: RDF graph. Subject OnlineMary "knows" object OnlineMike; an "age" property points to the literal 33. Each subject/object is a resource.]
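As a rough illustration of assertions as subject, predicate, object triples, the sketch below stores them as plain Python tuples and queries them by pattern. The resource names come from the slide's figure; attaching "age" to OnlineMary and the helper name match are assumptions made for the example.

```python
# Each assertion is a (subject, predicate, object) triple.
# Assumption: the "age" literal 33 belongs to OnlineMary.
triples = {
    ("OnlineMary", "knows", "OnlineMike"),
    ("OnlineMary", "age", 33),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(store, key=str)
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Whom does OnlineMary know?
known_by_mary = [o for (_, _, o) in match(triples, subject="OnlineMary", predicate="knows")]
```

Real RDF would identify each resource with a URI rather than a bare string; the tuple form only shows the shape of the data model.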
30Classification Taxonomy
- We classify things to establish groupings by
which generalizations can be made
- Downside of classification systems
- Categories can be arbitrary
- Membership criteria are often ambiguous
- Lack rigorous logic for machines to make
inferences from; useful for humans browsing for
information
- XML Topic Maps (XTM)
- Example: Linnaean classification of a house cat
- Kingdom Animalia
- Phylum Chordata
- Class Mammalia
- Order Carnivora
- Family Felidae
- Genus Felis
- Species Felis domesticus
31Ontologies Formal Class Model
- An ontology is a formal, explicit specification
of a shared conceptualization
- Conceptualization: an abstract model of some
phenomenon in the world that identifies the
relevant concepts of that phenomenon
- Explicit: the type of concepts used and the
constraints on their use are explicitly defined
- Formal: the ontology should be machine
understandable
- Shared: an ontology captures consensual
knowledge. It is not restricted to some
individual but accepted by a group
- Ontology examples
- WordNet (http://www.cogsci.princeton.edu/wn):
a thesaurus for over 100,000 terms explained in
natural language
- CYC (http://www.cyc.com): formal axiomatizing
theories for many aspects of commonsense
knowledge
32Ontologies (Cont.)
- An ontology is a formal representation of classes
and relationships between classes to enable
inference
- Formal class hierarchies, constrained properties,
relations between classes
- An ontology contains classes, subclasses,
properties of classes, and relations between
classes
- An ontology captures logical information in a
manner that can allow inference
- John is a Leader → John is a Person and John may
lead an organization
- Additional formalisms are added to enable
inference
- Symmetric property, transitive property
(hasAncestor)
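The two kinds of inference named above, class membership via is-a links and a transitive property, can be sketched in a few lines of Python. Only Leader, Person, John, and hasAncestor come from the slide; the individuals Ann, Bob, and Carl and the function names are hypothetical.

```python
# Class hierarchy and facts. Leader is-a Person is from the slide.
subclass_of = {"Leader": "Person"}
instance_of = {"John": "Leader"}
# Hypothetical hasAncestor facts: Ann hasAncestor Bob, Bob hasAncestor Carl.
has_ancestor = {("Ann", "Bob"), ("Bob", "Carl")}

def classes_of(individual):
    """All classes an individual belongs to, following is-a links upward."""
    result = []
    cls = instance_of.get(individual)
    while cls is not None:
        result.append(cls)
        cls = subclass_of.get(cls)
    return result

def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

john_classes = classes_of("John")              # John is a Leader, hence a Person
all_ancestors = transitive_closure(has_ancestor)  # includes Ann hasAncestor Carl
```

Neither derived fact was stated explicitly; both follow from the declared hierarchy and the property's transitivity.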
33Key Ontology Components
[Figure: ontology graph with the classes Resource, Person (Birthdate: date, Gender: char), Leader, Organization, and Image, linked by the relations is-A, knows, worksfor, leads, published, and depiction]
34Ontologies (Cont.)
- Shared formal conceptualizations of particular
domains.
- Enable Web-based knowledge processing, sharing,
and reuse.
- Provide a common understanding of topics.
- Ensure that everyone agrees on terms, types,
constraints, etc.
35Ontologies (Cont.)
- Developed in AI to facilitate knowledge sharing
and reuse
- Applications: knowledge engineering, NLP,
knowledge representation, intelligent information
integration, cooperative information systems,
information retrieval, electronic commerce, and
KM
- Why so popular: they promise a shared and
common understanding of some domain that can be
communicated among people and application systems
- Aim at consensual domain knowledge: a cooperative
development process
36Rules
- With XML, RDF, and inference rules, the Web can
be transformed from a collection of documents
into a knowledge base
- Modus ponens: "If P is TRUE, then Q is TRUE"
- P is TRUE; therefore, Q is TRUE
- "An apple is tasty if it is not cooked. This
apple is not cooked. Therefore, it is tasty."
- SW can use information in an ontology with logic
rules to infer new information
- "Let's say one company decides that if someone
sells more than 100 of our products, then they
are a member of the Super Salesman Club." → a
smart program can now follow this rule to make a
simple deduction: "John has sold 102 things,
therefore John is a member of the Super Salesman
Club"
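The Super Salesman Club deduction can be written out as a small Python sketch. John and the threshold of 100 come from the slide; the second salesperson, Jane, and the names sales and super_salesman_rule are hypothetical additions for contrast.

```python
# Known facts: units sold per person. Jane is a hypothetical counterexample.
sales = {"John": 102, "Jane": 40}

def super_salesman_rule(units_sold):
    """Rule from the slide: selling more than 100 products implies club membership."""
    return units_sold > 100

# Apply the rule to every known fact (modus ponens) to derive new facts.
club_members = {name for name, units in sales.items() if super_salesman_rule(units)}
```

The membership fact for John is never stored; it is derived each time the rule is applied to the sales data.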
37Using rules to infer the uncleOf Relation
- If a person C is a male and childOf a person A,
then person C is a "sonOf" person A
- If a person B is a male and siblingOf a person A,
then person B is a "brotherOf" person A
- If a person C is a "sonOf" person A, and person B
is a "brotherOf" person A, then person B is the
"uncleOf" person C
[Figure: PersonB siblingOf PersonA, PersonC childOf PersonA, inferred PersonB uncleOf PersonC]
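The three rules chain together, which a short Python sketch can make concrete. The predicates and rules are the slide's; the people Alice, Bob, and Carl are hypothetical, and set comprehensions stand in for a real rule engine.

```python
# Hypothetical base facts.
male = {"Bob", "Carl"}
child_of = {("Carl", "Alice")}    # Carl childOf Alice
sibling_of = {("Bob", "Alice")}   # Bob siblingOf Alice

# Rule 1: C is male and C childOf A  =>  C sonOf A
son_of = {(c, a) for (c, a) in child_of if c in male}
# Rule 2: B is male and B siblingOf A  =>  B brotherOf A
brother_of = {(b, a) for (b, a) in sibling_of if b in male}
# Rule 3: C sonOf A and B brotherOf A  =>  B uncleOf C
uncle_of = {(b, c) for (c, a) in son_of for (b, a2) in brother_of if a == a2}
```

Here uncle_of ends up holding the single pair (Bob, Carl), a relation that appears nowhere in the base facts.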
38OIL An Ontology Language
- OIL (http://www.ontoknowledge.org/oil)
- Ontology Inference Layer or Ontology Interchange
Language
- Three requirements for an ontology language
- It must be highly intuitive to the human user.
- Given the current success of the frame-based and
OO modeling paradigm, it should have a frame-like
look and feel
- It must have a well-defined formal semantics with
established reasoning properties in terms of
completeness, correctness, and efficiency
- It must have a proper link with existing Web
languages like XML and RDF, ensuring
interoperability
- OIL fulfills the above three requirements
39DAML-ONT Another Ontology Language
- DAML-ONT (http://www.daml.org)
- Funded by the U.S. DARPA
- Still in an early stage of development and lacks a
formal definition of its semantics
40Tools
- Formal languages to express and represent
ontologies
- Editors and semiautomatic construction to build
new ontologies
- Reusing and merging existing ontologies (ontology
environment)
- Reasoning services (instance and schema
inferences that enable advanced query answering
service, support ontology creation, and help map
between different terminologies)
- Annotation tools to link unstructured and
semi-structured information sources with metadata
- Tools for information access and navigation that
enable intelligent information access for human
users
- Translation and integration services between
different ontologies that enable multi-standard
data interchange and multiple view definitions
(especially for B2B electronic commerce)
41Ontology Editors
- Help human knowledge engineers build ontologies
- Ontology development and maintenance
- Define concept hierarchies, attributes for
concepts, axioms and constraints
- Inspect, browse, codify, and modify ontologies
- GUI
- Conform to existing standards in Web-based
software development
- Example: Protégé (Stanford)
42Protégé Editor
43Semi-Automatic Ontology Constructor
- Manually building ontologies is time-consuming
- Tools that learn ontologies from natural language
exploit the interaction constraints on the
various levels (morphology, syntax, semantics,
pragmatics, background knowledge) in order to
discover new concepts and stipulate relationships
among concepts
- These tools combine machine learning, information
extraction, and linguistic techniques to extract
relevant concepts, build is-a hierarchies, and
determine relationships among concepts
44Semi-Automatic Ontology Constructor Text-To-Onto
- KM group of the Institute AIFB, Karlsruhe
University
- Learns ontologies from text
- Select a relevant corpus of domain texts (NLP
texts or HTML texts)
- Use a domain lexicon to perform domain-specific
parsing
- Existing knowledge structures (e.g., a taxonomy of
concepts) are incorporated as background
knowledge
- Discover new knowledge structures, which are then
captured in the ontology modeling module to
expand the existing ontology
45Text-To-Onto
46Ontology Environment
- Reuse existing ontologies to save time and labor
- Must allow adaptation and merging of existing
ontologies to make them fit for new tasks and
domains
- Operations for combining ontologies: ontology
inclusion, restriction, and polymorphic
refinement
- Use ontologies in different formats, reorganize
taxonomies, resolve name conflicts, browse
ontologies, edit terms
- Example: Chimaera (Stanford), for merging and
diagnosing (and evolving) ontologies
47Chimaera
48Reasoning Services
- Reasoning over instances of an ontology
- Derive a certain value for an attribute applied
to an object
- Powerful support in formulating rules and
constraints and answering queries over schema
information
- Used to answer queries about the explicit and
implicit knowledge specified by an ontology
- Help to build ontologies
- Example: Ontobroker (Ontoprise,
http://www.ontoprise.de)
49Reasoning Services (Cont.)
- Reasoning over concepts of an ontology
- Automatically derives the right position for a
new concept in a given concept hierarchy
- Help to build ontologies
- Example: FaCT (Fast Classification of
Terminologies) (Manchester University), which
derives concept hierarchies automatically
50Annotation Tools
- Ontologies can be used to describe a large number
of instances
- Annotation tools help the knowledge engineer to
establish such links via
- Linking an ontology with a database schema or
deriving a database schema from an ontology (in
cases of structured data)
- Deriving an XML DTD, an XML schema, and an RDF
schema from an ontology (in cases of
semi-structured data)
- Manually or semi-automatically adding ontological
annotation to unstructured data
51The Ontology-Learning Process
52Tools for Information Access and Navigation
- Low-level navigation now: clicking on links and
using keyword searches
- Keyword-based search retrieves irrelevant
information that uses a particular word in a
different meaning from the one intended, and it
may miss relevant links in which different words
than the keyword are used to describe the content
for which the user is searching
- Navigation is supported only by predefined links;
current navigation technology does not support
clustering and linking of pages based on semantic
similarity
- Query responses require human browsing and
reading to extract the relevant information from
the information sources returned
- This burdens Web users with an additional loss of
time and seriously limits information retrieval by
automatic agents
53Tools for Information Access and Navigation
(Cont.)
- Low-level navigation now (Cont.)
- Keyword-based document retrieval fails to
integrate information spread over different
sources
- Current retrieval services can retrieve only
information that is directly represented on the
Web. No further inference service is provided for
deriving implicit information that must be
derived from the explicit text
- Ontologies can
- Support IR based on the actual content of a page
- Help users navigate the information space based on
semantic concepts
- Enable advanced query answering and information
extraction services, integrating heterogeneous
and distributed information sources enriched by
inferred background knowledge
54Hyperbolic Browsing Interface
Semantic Information Visualization
55Automatically Generated Semantic Structure Map
56Translation and Integration Services
- Current B2B: the heterogeneity of product
descriptions on Web sites and the exponentially
increasing effort that must be devoted to mapping
these heterogeneous descriptions as the number of
Web sites increases
- Effective and efficient CM of heterogeneous
product catalogues is critical for the success of
B2B
- Mapping heterogeneous descriptions
- Different representations of product catalogs
must be merged
- XML with DTD1 ↔ XML with DTD2
- Different vocabularies used to describe products
must be merged
- Languages, concepts, attributes, values and value
types
57Applications
- Applications for Knowledge Management
- On-To-Knowledge
- SwissLife (http://www.swisslife.ch)
- British Telecom (http://www.bt.com/innovations)
- Enersearch (http://www.enersearch.se)
- Applications for B2C
- Semantic Edge (http://www.semanticedge.com)
58Web Services
- Web services are software services identified by
a URI that are described, discovered, and
accessed using Web protocols
- Web services consume and produce XML
- Furthering the adoption of XML, or more smart
data
- As Web services proliferate, they are more
difficult to discover → use SW technologies to
solve the Web service discovery problem
- Enabling Web services to interact with other Web
services
59Semantic Web services and agents
- Automation tasks
- Service discovery
- Service comparison
- Service execution
- Service composition and interoperation
- What should be marked up?
- Web services
- User constraints and preferences
- Agent procedure
60Framework of Semantic Web
61Semantic Web Principles
- Everything is identifiable
- Any abstract thing can have a URI.
- Partial information
- Anyone can say anything about anything.
- Don't expect global consistency.
- Evolution
- Allow effective combination of the independent
work of diverse communities.
- Minimalist design
- Make the simple things simple, and the complex
things possible.
62What achieves the Semantic Web?
- Knowledge representation
- Ontology, logic, inference, learning
- Semantic Web services
- Agents
63Semantic interoperability
64Proofs exchange between agents
65Trust
- Add semantics to make trust determination better
- You may want to allow access to information if a
trusted friend vouches (via a digital signature)
for a third party
- Digital signatures are crucial to the "web of
trust"
- Because anyone can make logical statements
about resources, smart applications will only
want to make inferences on statements that they
can trust
- Verifying the source of statements is a key part
of SW
66Semantic Web wave
67Who are involved?
- Artificial intelligence
- The web
- Databases
- Agents
- Theoretical computer science and logic
- Systems
- Computational linguistics and pattern recognition
- Document engineering and digital libraries
- Human-computer interfaces
- Social and human sciences
68Discussions
- The Semantic Web is not a separate Web but an
extension of the current one.
- Markup makes data computer-interpretable,
use-apparent, and agent-ready.
- The Semantic Web will break out of the virtual
realm and extend into our physical world.
- Ubiquitous knowledge and computing
- Can you imagine the Semantic Web as a brain for
humankind?