The Semantic Web - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The Semantic Web


1
The Semantic Web
  • David Cornforth
  • School of ITEE
  • University of NSW @ ADFA
  • Partially based on previous material by Bob
    McKay, Yin Shan and Biao Wang, also of UNSW @
    ADFA, and material by Jim Hendler at
    http://www.cs.umd.edu/hendler/presentations/

2
Outline
  • Current Search Technologies
  • Why a Semantic Web?
  • Metadata and RDF
  • Ontologies

3
Search Engines
  • Definition
  • Huge databases of web page files that have been
    assembled automatically by machine.
  • One of the primary ways that Internet users find
    information
  • "Search engine" is often used generically for both
  • crawler-based search engines (e.g. Google,
    AltaVista)
  • human-powered directories (e.g. Yahoo)

4
Crawler-Based Search Engines
  • "crawl" or "spider" the web, and create listings
    automatically
  • People search through what they have found
  • Web page changes are found based on regular
    re-crawl
  • A crawler-based search engine has three parts
  • Spider (crawler): a robot program that wanders
    the web
  • Index (catalog): built up by the spider
  • Search software: sifts the indexed page database
    to find matches
  • Can be spammed by careful tailoring of website
    information (e.g. extra text in the same colour
    as the background)
  • Constant wars between search engines and spammers
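
The three parts of a crawler-based engine can be illustrated with a toy sketch. It is not how any production engine works: the PAGES dictionary stands in for what a spider would have fetched, and the URLs and the names build_index and search are hypothetical.

```python
import re
from collections import defaultdict

# Stand-in for the spider's output: URL -> page text (hypothetical pages).
PAGES = {
    "http://example.org/twain": "Mark Twain was an American humorist",
    "http://example.org/insurance": "Call Mark Twain Insurance today",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Index (catalog): map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Search software: return the URLs that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "Mark Twain")))
print(sorted(search(index, "humorist")))
```

Because the index is built ahead of time, a query only touches the word-to-URL map and never the pages themselves, which is why changes to a page stay invisible until the next re-crawl.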

5
Human-Powered Directories
  • The original version of Yahoo was a human-powered
    directory, and depended on humans for its
    listings
  • A search looks for matches only in the
    descriptions submitted
  • Changing web pages has no effect on existing
    listing
  • Almost impossible to spam, because of the human
    gateway
  • Impractical today: only a minute portion of the
    web can be indexed this way

6
Meta-Search Engines
  • E.g. Chubba, Copernic and MetaCrawler
  • No database for Web pages
  • Submit users' queries to major search engines
  • Collect and display results to the user, maybe
    re-ranking them or aggregating them into one list
  • Advantage: maximized coverage
  • Disadvantage: hard to handle complex queries
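
One way the aggregation and re-ranking step could work is sketched below. The slides do not say how any particular meta-search engine merges results; scoring by summed reciprocal rank is an assumption of this sketch, and the engine result lists and URLs are hypothetical.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Aggregate several ranked URL lists into one list, best first.

    Each URL earns 1/position from every engine that returned it
    (an assumed scoring rule); ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical answers from two underlying engines for the same query.
engine_a = ["http://example.org/1", "http://example.org/2", "http://example.org/3"]
engine_b = ["http://example.org/2", "http://example.org/4", "http://example.org/1"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

A URL returned by both engines collects score from each list, so it tends to rise above URLs that only one engine found.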

7
Ranking Web Pages
  • Ranking rules (or algorithms) are the core of
    search engines and the main point for competition
    between search engines
  • Older-style algorithms base the ranking on the
    relevance of the contents of the page
  • if a page contains the exact query term in the
    title
  • if the term appears early in the document
    (location)
  • if the term is repeated in the document
    (frequency)

8
Ranking Web Pages
  • More recent engines (especially Google) use
    variants of the PageRank algorithm
  • Ranking is based on the connectivity of the page
  • Especially, on the number of pages which refer to
    this page
  • And on their ranking
  • Can be spammed
  • Search engines have (usually secret) algorithms
    for detecting and punishing spamming
  • Usually combined with relevance ranking
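
The idea that a page's rank depends on how many pages refer to it, and on their rank, can be sketched as a simplified textbook PageRank computed by power iteration. This is not the algorithm any engine actually runs; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions of the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. The sketch
    assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base share, then receives shares from its referrers.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: A, B and D all link to C; C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because three pages refer to it. A outranks B and D even though it has only one referrer, because that referrer is the highly ranked C.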

9
Ranking Relevance
Page 1

<html>
<head>
<title>Mark Twain</title>
</head>
<body>
<h1>Mark Twain</h1>
Nationality: American<p>
Genre: Fiction<p>
Summary: Mark Twain was the pen name of Samuel
Clemens, an American humorist who lived from
1835-1910
Work:
<ul>
<li>Adventures of Huckleberry Finn 1884
<li>...
</body>
</html>

Page 2

<html>
<head>
<title>Mark Twain Insurance Company</title>
</head>
<body>
<h1>Call Mark Twain Insurance today</h1>
The Mark Twain Insurance has been in business
since 1956. During that time, the folks at
Mark Twain have ...
</body>
</html>

10
Ranking and Relevance
  • Obviously, most people who search for Mark Twain
    on the Web hope to get page 1 back.
  • Unfortunately, most relevance rules would rank
    page 2 as more relevant
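
The effect can be reproduced with a toy scorer built on the title, location and frequency rules listed earlier. The weights are arbitrary assumptions and the page texts are abridged from the two example pages, so this is only a sketch of why content-based relevance can favour the insurance page.

```python
def relevance(title, body, term):
    """Toy relevance score from title, location and frequency (assumed weights)."""
    term = term.lower()
    title, body = title.lower(), body.lower()
    score = 0.0
    if term in title:
        score += 3.0                         # exact query term appears in the title
    position = body.find(term)
    if position != -1:
        score += 1.0 / (1 + position / 100)  # location: an earlier first hit scores higher
        score += body.count(term)            # frequency: one point per occurrence
    return score

PAGE1 = (
    "Mark Twain",
    "Mark Twain Nationality: American Genre: Fiction "
    "Summary: Mark Twain was the pen name of Samuel Clemens, "
    "an American humorist who lived from 1835-1910",
)
PAGE2 = (
    "Mark Twain Insurance Company",
    "Call Mark Twain Insurance today The Mark Twain Insurance "
    "has been in business since 1956. During that time, "
    "the folks at Mark Twain have",
)

score1 = relevance(*PAGE1, "Mark Twain")
score2 = relevance(*PAGE2, "Mark Twain")
print(score1, score2)
```

Both titles contain the query and both bodies mention it near the start, so the extra repetition on the insurance page is enough to push it ahead of the page about the author.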

11
Other Ranking Rules
  • Most commercial search engines have their own
    additional ranking rules.
  • Infoseek and Hotbot factor meta-content into
    their formula
  • Hotbot promotes pages that can attract visitors
    by watching what results someone selects for a
    particular search.
  • But they only incrementally improve search results

12
Why Aren't Search Engines Enough?
  • The scale and dynamicity of Web information
  • Requires Machine-dependent information searching
  • The majority of resources are designed for human
    browsing rather than machine browsing
  • Imprecise search results
  • Current web searches turn up lots of totally
    irrelevant pages
  • You have to search through the search results
  • One approach is Artificial Intelligence
  • More sophisticated searching techniques

13
The Semantic Web: A Solution?
  • Vision
  • Data on the Web defined and linked in a way that
    it can be used by machines
  • Not just for display purposes
  • For automation, integration and reuse of data
    across various applications
  • Establish a machine-understandable Web that is more
  • Homogeneous
  • Data-like
  • Amenable to search
  • Approach
  • Establish metadata architecture for Web resources

14
What is Metadata?
  • For traditional database
  • Data about data
  • For Web
  • Data describing Web resources
  • Tim Berners-Lee: metadata is machine-understandable
    information about web resources or other things
  • The distinction between "data" and "metadata" is
    not absolute
  • many times the same resource will be interpreted
    in both ways simultaneously

15
Features of Metadata
  • Metadata is data
  • Metadata can describe metadata
  • Metadata may refer to any resource which has a
    URI
  • Metadata may be stored in any resource
  • no matter to which resource it refers
  • Metadata can be regarded as a set of assertions
  • each assertion being about a resource
  • Assertions which state a named relationship
    between two resources are known as links

16
Main Principles of Semantic Web
  • Everything can be identified by URIs
  • Resources and links can have types
  • Partial information is tolerated
  • There is no need for absolute truth
  • Evolution is supported
  • Minimalist design

17
Main Purpose of Semantic Web
  • providing an infrastructure that enables not just
    web pages, but
  • Databases
  • Services
  • Programs
  • Sensors
  • personal devices
  • even household appliances
  • to both consume and produce data on the web

18
Layers of Semantic Web

19
Roles of Building Blocks
  • Unicode
  • Means for use of international character sets
  • URI
  • Means for identifying the objects in Semantic Web
  • XML
  • Interoperable syntactical foundation
  • Upon which the more important issue of
    representing relationships and meaning can be
    built
  • Resource Description Framework (RDF) and RDF
    Schema: a system for
  • Making statements about objects with URIs
  • Defining vocabularies that can be referred to by
    URIs

20
Roles of Building Blocks (cont)
  • Ontology
  • Supports the evolution of vocabularies
  • To allow definition of relations between
    different concepts
  • Digital Signature
  • Detecting alterations to documents
  • Logic Layer
  • Permits the writing of rules
  • Proof layer
  • Execution of the rules
  • Evaluation together with the Trust layer
    mechanism
  • Whether to trust the given proof
  • Hence whether to trust the data

21
An Example of Metadata
<html>
<head>
<title>Document</title>
<meta name="keywords" content="web search, RDF, metadata">
</head>
...
  • The most widely known is probably the simple
    keywords and descriptions embedded into HTML
    META tags
  • Collected and indexed by the large Web search
    engines like Alta Vista.
  • Only useful if everyone uses the same standards
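
How an indexer might collect such META tags can be sketched with Python's standard html.parser module. The MetaCollector class is a hypothetical helper written for this illustration, and the page is the small document from this slide.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

HTML_DOC = """<html>
<head>
<title>Document</title>
<meta name="keywords" content="web search, RDF, metadata">
</head>
</html>"""

collector = MetaCollector()
collector.feed(HTML_DOC)
keywords = [k.strip() for k in collector.meta.get("keywords", "").split(",")]
print(keywords)
```

The parser recovers only plain strings. Nothing in the markup says what vocabulary the keywords come from, which is the standardisation gap the last bullet points at.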

22
What is RDF?
  • Resource Description Framework
  • A framework for metadata
  • provides interoperability between applications
    that exchange machine-understandable information
    on the Web
  • emphasizes facilities to enable automated
    processing of Web resources
  • provides the basic building blocks for supporting
    the Semantic Web

23
How can RDF Help?
  • Resource discovery - provide better search engine
    capabilities
  • Cataloguing
  • Describing content, content relationships
    available at a particular Web site, page, or
    digital library
  • Intelligent software agents
  • Knowledge sharing and exchange
  • Content rating (e.g. PICS - Platform for Internet
    Content Selection)
  • Describing collections of pages
  • Representing a single logical document
  • Describing intellectual property rights of Web
    pages

24
Simple RDF Statement (assertion)
Tom is the author of the paper. Pictorially:
  Paper (Subject) --AuthorOf (Predicate)--> Tom (Object)

25
RDF Model
  • A model is a set of statements
  • A statement has the form
    Predicate(subject, object)
  • Predicate is a resource
  • Subject is a resource
  • Object is either a resource or a literal
  • Equivalently, Object = Predicate(subject)
  • Previous example: Tom = AuthorOf(Paper)
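
A model of this kind can be sketched as a plain set of (subject, predicate, object) triples. The sketch reads the slide's example as AuthorOf applied to Paper yielding Tom; the Statement type and the objects helper are hypothetical names, and real RDF would use URIs rather than bare words.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One RDF-style assertion: predicate(subject, object)."""
    subject: str
    predicate: str
    obj: str

# A model is a set of statements; this one holds the slide's single example.
model = {Statement("Paper", "AuthorOf", "Tom")}

def objects(model, subject, predicate):
    """Return every object such that predicate(subject, object) is in the model."""
    return {s.obj for s in model if s.subject == subject and s.predicate == predicate}

print(objects(model, "Paper", "AuthorOf"))
```

Using a set reflects the fact that asserting the same statement twice adds nothing to the model.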

26
RDF Resources
  • A resource can be anything that has identity.
    Familiar examples include an electronic document,
    an image, a service (e.g., "today's weather
    report for Los Angeles"), and a collection of
    other resources. Not all resources are network
    "retrievable" e.g., human beings, corporations,
    and bound books in a library can also be
    considered resources. The resource is the
    conceptual mapping to an entity or set of
    entities, not necessarily the entity which
    corresponds to that mapping at any particular
    instance in time. Thus, a resource can remain
    constant even when its content---the entities to
    which it currently corresponds---changes over
    time, provided that the conceptual mapping is not
    changed in the process.
  • (Quoted from RFC 2396)

27
RDF Resources (cont)
  • A resource is identified by a Resource Identifier
  • URI plus optional anchor id (sub-component)
  • The resource identified by a URI may be abstract
  • not network retrievable
  • Even the link can have a URI
  • Making a statement about another statement (a
    link) is called reification

28
RDF Syntax
  • An RDF data model needs a concrete representation
    syntax so that metadata can be created and
    exchanged
  • XML is an obvious choice
  • The RDF specification uses it
  • BUT
  • The RDF data model is not tied to a particular
    syntax

29
Basic RDF XML Syntax
  • Multiple statements for the same resource are
    grouped into a Description element
  • The Description has an attribute about, which
    names the resource
  • Description contains other elements, which
    cause the creation of statements in the model
    instance
  • Other descriptions or property values can refer
    to it, using the value of the resource's id
    attribute in their own about attribute

30
XML Syntax for First Example
<rdf:RDF>
  <rdf:Description rdf:href="dcorn/Publications/Paper01.pdf">
    <Title>Building your own perpetual motion machine</Title>
    <Author>David Cornforth</Author>
  </rdf:Description>
</rdf:RDF>
  • (Description has an initial uppercase)

31
Example
  • A set of statements can be visualized as a graph.

dcorn/Publications/Paper01.pdf --Title--> Building your own perpetual motion machine
dcorn/Publications/Paper01.pdf --Author--> David Cornforth

32
Namespaces in RDF
  • BUT
  • Is my Title the same as your Title?
  • XML namespaces can be used to uniquely identify
    elements
  • Each namespace has a URI associated with it.
  • RDF schemas identify allowable property types

33
Namespaces in RDF
  • Predicates must also be labelled by URI
  • To eliminate ambiguities that arise from using
    only word identifiers.
  • E.g. vocabulary providers can define different
    versions of the predicate hasHomepage.
  • The XML-namespace syntax can be used to
    abbreviate URIs in statements
  • E.g. we can define the substitution of the
    namespace prefix w6 for http://www.w6.org/schema/
  • Then simply use w6:hasHomepage.
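
The prefix substitution can be sketched as a small lookup. The w6 prefix and its URI come from the slide; the second prefix, its example.org URI and the expand function are hypothetical, added only to show two vocabularies defining a predicate with the same local name.

```python
PREFIXES = {
    "w6": "http://www.w6.org/schema/",
    "other": "http://example.org/other-schema/",  # hypothetical second vocabulary
}

def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'w6:hasHomepage' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local

print(expand("w6:hasHomepage"))
print(expand("other:hasHomepage"))
```

The two expanded URIs differ even though both predicates are called hasHomepage, so a program comparing full URIs never confuses them.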

34
XML for Decker Example
<?xml version="1.0"?>
<rdf:RDF
    xmlns="http://www.w6.org/schema/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:so="http://www.semantic.org/elements/">
  <rdf:Description about="http://www.corn.org/research">
    <hasHomepage>
      <rdf:Description about="http://www.corn.org/homepage">
        <so:Author>Tom Decker</so:Author>
      </rdf:Description>
    </hasHomepage>
  </rdf:Description>
</rdf:RDF>

35
Graphical view is
http://www.corn.org/research --hasHomepage--> http://www.corn.org/homepage
http://www.corn.org/homepage --Author--> Tom Decker

36
Alternative Form
<rdf:RDF>
  <rdf:Description about="http://www.corn.org/research/group1">
    <hasHomepage rdf:resource="http://www.corn.org/homepage"/>
  </rdf:Description>
  <rdf:Description about="http://www.corn.org/homepage">
    <so:Author>Ora Lassila</so:Author>
  </rdf:Description>
</rdf:RDF>
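
To show how this flat form maps back onto statements, here is a minimal sketch that walks it with Python's standard xml.etree.ElementTree. It is not a real RDF parser: it handles only top-level Description elements with simple properties, it adds the namespace declarations from the previous example so the XML is well-formed, and it writes the about attribute with the rdf prefix, which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns="http://www.w6.org/schema/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:so="http://www.semantic.org/elements/">
  <rdf:Description rdf:about="http://www.corn.org/research/group1">
    <hasHomepage rdf:resource="http://www.corn.org/homepage"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.corn.org/homepage">
    <so:Author>Ora Lassila</so:Author>
  </rdf:Description>
</rdf:RDF>"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) for flat rdf:Description elements."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; join the parts into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # The object is either a referenced resource or the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj

for triple in triples(RDF_XML):
    print(triple)
```

Each property element becomes one statement, and the namespace prefixes expand into full predicate URIs.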

37
Compared to XLink
  • RDF is able to attach URIs to the link properties
    themselves. Example of XLink:
<supervisor xlink:href="ITEE/staff/fstein.xml" ...>
  <supervisorname xml:lang="en">
    <name>
      <title>Dr.</title>
      <given>Frankie</given>
      <family>Stein</family>
    </name>
  </supervisorname>
</supervisor>

38
RDF Application: Dublin Core Metadata
  • A set of elements for describing documents
  • Intended use
  • internet resource discovery tools
  • Readily implemented in RDF
  • Although the original implementation was in HTML
    meta elements

39
Dublin Core Properties
  • Title: a name for the resource
  • Creator: entity primarily responsible for
    creating the content
  • Subject: the topic of the content
  • Description: a summary of the content
  • Publisher: entity responsible for making the
    resource available

40
Dublin Core Properties
  • Contributor: an entity responsible for making
    contributions to the content
  • Date: a date associated with an event in the
    life cycle
  • Type: the nature or genre of the content
  • Format: the physical or digital manifestation
  • Identifier: an unambiguous reference to the
    resource in a context, e.g. a URI

41
Dublin Core Properties
  • Source: a reference to a resource from which the
    current resource is derived
  • Language: a language of the content
  • Relation: a reference to a related resource
  • Coverage: extent or scope of the content
  • Rights: information about rights held in/over
    the resource
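
Since the original Dublin Core implementation was in HTML meta elements, a record can be sketched as a dictionary rendered into such elements. The record values reuse the paper from the earlier RDF example, the to_meta helper is hypothetical, and the "DC." name prefix follows the common convention for embedding Dublin Core in HTML.

```python
from html import escape

# A small Dublin Core record; values reuse the paper from the RDF example.
record = {
    "title": "Building your own perpetual motion machine",
    "creator": "David Cornforth",
    "identifier": "dcorn/Publications/Paper01.pdf",
}

def to_meta(record):
    """Render each property as an HTML meta element named DC.<property>."""
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]

for line in to_meta(record):
    print(line)
```

Escaping the content matters because titles may contain quotes or ampersands that would otherwise break the attribute.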

42
What is an Ontology?
  • Philosophy
  • A systematic account of Existence
  • AI
  • A Knowledge-Based System
  • Definition
  • An explicit formal specification of how to
    represent the objects, concepts and other
    entities that are assumed to exist in some area
    of interest and the relationships that hold among
    them.
  • Purpose
  • Enabling knowledge sharing and reuse

43
Why Web Ontology?
  • To solve the problem "Is my Title the same as
    your Title?"
  • Explicitly represent
  • the meaning of terms in vocabularies
  • the relationships between those terms
  • Knowledge management in large/distributed
    organisations
  • Explicitly represent semantics of semi-structured
    information
  • Support for Information
  • Acquisition
  • Maintenance
  • Access

44
Ontology Layer
  • More meta information, such as
  • Transitive property
  • Unique, Unambiguous, Cardinality, etc
  • Ontology definition languages
  • DL
  • OIL
  • SHOE
  • OWL
  • etc.
  • Huge extra usage for extra functionality
  • Not Turing complete or tractable
  • Wide interoperability: interconversion required
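
What declaring a property transitive buys can be sketched as a closure computation: from the stated facts, a reasoner may infer every pair reachable by chaining. This is a naive illustration, not how an ontology reasoner is implemented, and the kindOf facts are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical stated facts for a transitive property "kindOf".
stated = {("Sedan", "Car"), ("Car", "Vehicle"), ("Vehicle", "Artifact")}
inferred = transitive_closure(stated) - stated
print(sorted(inferred))
```

The inferred pairs were never written down by anyone, which is the sense in which ontology-level meta information adds functionality beyond plain RDF statements.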

45
OWL
  • Web Ontology Language
  • Intended to extend RDF/Schema
  • to permit more powerful reasoning about resources
  • Three sub-languages
  • OWL Lite
  • OWL DL
  • OWL Full
  • Only OWL Full is actually a superset of RDF/Schema

46
OWL Full
  • Permits maximum expressiveness
  • For example handles classes of classes
  • Reasoning in OWL Full is undecidable
  • No complete reasoning system can be built

47
OWL DL
  • Computational completeness
  • All conclusions can be computed
  • Decidability
  • Computations will finish in finite time
  • Some limits on expressivity
  • Based on Description Logics
  • Restrictions from OWL Full
  • For example, a class cannot be an instance of
    another class

48
OWL Lite
  • Lightweight reasoning
  • Hence easy to support
  • Efficient
  • Supports
  • Classification hierarchy
  • Simple constraints
  • E.g. cardinality constraints, but only 0 or 1

49
Search in the Future Semantic Web
  • Example: try to find a medical specialist by
    invoking a software agent
  • Agent needs to take into account
  • Potential providers
  • Insurance coverages
  • Location maps
  • Your schedule
  • How can Google do it for you?

50
What are Agents?
  • Common sense definition
  • Agent does something on behalf of another
  • Real Estate agent
  • Flight agent
  • Computer system
  • variety of definitions
  • "Autonomous agents are computational systems that
    inhabit some complex dynamic environment, sense
    and act autonomously in this environment, and by
    doing so realize a set of goals or tasks for
    which they are designed." (Maes 1995)
  • "Intelligent agents are software entities that
    carry out some set of operations on behalf of a
    user or another program with some degree of
    independence or autonomy, and in so doing, employ
    some knowledge or representation of the user's
    goals or desires." (IBM)

51
Characteristics of Agents
  • The following basic characteristics distinguish
    agents from normal programs (most agents don't
    have all of them)
  • Autonomous - requiring limited outside direction
  • Reactive - sensing, acting
  • Goal-oriented - pro-active, purposeful
  • Communicative - social
  • Learning - adaptive
  • Persistent - continuous
  • Personality

52
Categories of Internet Agents
  • Web Search Agents: MetaCrawler
  • Information-filtering Agents: NewsPage Direct
  • Off-line Delivery Agents: PointCast Network
  • Notification Agents: BotSpot
  • Negotiation Agents: Kasbah, AuctionBot, eMediator
  • Many others: Job Agents, Book Agents, ...

53
Questions?