Presented by eBiquity group, UMBC


1
Swoogle
search and metadata for the semantic web
  • Presented by eBiquity group, UMBC
  • CIKM'04, Nov 12, 2004

Partial research support was provided by DARPA
contract F30602-00-0591 and by NSF awards
NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
2
Outline
  • Motivation
  • Concepts
  • Demo
  • Architecture
    • document discovery
    • metadata creation
    • ontology rank
  • Status
  • Summary

http://swoogle.umbc.edu/
3
Motivation
  • (Google Web) has made us all smarter
  • something similar is needed by people and
    software agents for information on the semantic
    web

4
Motivation: Common Questions
  • Find an ontology
    • What are the ontologies about time?
    • Shall I use an existing ontology or create one?
  • Find instance data
    • Show me the instances of a class
      http://foo.com/Person
    • Gather relevant information for my application.
  • Characterize the Semantic Web
    • How many RDF documents are online?
    • What are the most popular ontologies?
    • What graph properties does the semantic web have?
    • Does a namespace URI link to the corresponding
      ontology?

5
The Role of Swoogle in the Semantic Web
Swoogle
6
Related work
  • Ontology-based annotation search
    • Annotate web documents
      • SHOE (UMCP, 1997)
      • Ontobroker (AIFB, Karlsruhe, 1998)
      • WebKB (Martin & Eklund, 1999)
      • QuizRDF (BT, 2002)
    • Annotate proper reference relations
      • CREAM (AIFB, 2003)
  • Ontology repositories
    • Ontology level
      • DAML Ontology Library
      • Schema Web
      • SemWebCentral
    • Term level
      • W3C's Ontaria (2004)
  • Ontology management systems
    • Stanford's Ontolingua
    • IBM's Snobase
  • Based on both ontology and instance documents
  • Automated discovery
  • Search and rank ontologies and terms
  • Digest but not store
  • Create metadata based on RDF and OWL semantics
  • Provide services to both human and software agents

Swoogle aims to be a Google-like online ontology
repository
7
Concepts
  • Document
    • A Semantic Web Document (SWD) is an online
      document written in semantic web languages
      (i.e., RDF and OWL).
    • An ontology document (SWO) is a SWD that contains
      mostly term definitions (i.e., classes and
      properties). It corresponds to the T-Box in
      Description Logic.
    • An instance document (SWI or SWDB) is a SWD that
      contains mostly class individuals. It corresponds
      to the A-Box in Description Logic.
  • Term
    • A term is a non-anonymous RDF resource which is
      the URI reference of either a class or a
      property.
  • Individual
    • An individual is a non-anonymous RDF resource
      which is the URI reference of a class member.

In Swoogle, a document D is a valid SWD iff
Jena correctly parses D and produces at least
one triple.
Jena is a Java framework for writing Semantic
Web applications.
http://www.hpl.hp.com/semweb/jena2.htm
[Figure: two example triples.
 foaf:Person rdf:type rdfs:Class;
 http://.../foaf.rdf#finin rdf:type foaf:Person]
8
Concepts: Example
[Figure: an example RDF graph spanning two SWDs.
 The SWI http://foo.com/foaf.rdf contains the
 individual http://foo.com/foaf.rdf#finin, which has
 rdf:type foaf:Person and foaf:mbox finin@umbc.edu.
 The SWO http://xmlns.com/foaf/1.0/ contains the
 terms foaf:Person (a class: rdf:type rdfs:Class,
 rdfs:subClassOf wordNet:Agent) and foaf:mbox (a
 property: rdf:type rdf:Property, rdfs:domain
 foaf:Person).]
NOTE: Qualified Names (QNames) are used to
shorten well-known namespaces as follows:
rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs => http://www.w3.org/2000/01/rdf-schema#
foaf => http://xmlns.com/foaf/1.0/
wordNet => http://xmlns.com/wordnet/1.6/
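The QName shorthand in the note above can be illustrated with a small sketch. The prefix table is taken from the slide; the helper name `expand_qname` is hypothetical, and the trailing `#` on the rdf and rdfs namespaces is assumed from the W3C namespace URIs.

```python
# Prefix table from the slide's NOTE. The "#" suffixes are an assumption
# based on the W3C namespace URIs; the function name is illustrative only.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname: str) -> str:
    """Expand 'prefix:local' into a full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return NAMESPACES[prefix] + local


print(expand_qname("foaf:Person"))  # http://xmlns.com/foaf/1.0/Person
print(expand_qname("rdf:type"))     # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

The local part is appended unchanged, so foaf:Person and foaf:person expand to different URI references.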
9
Demo
  • Demo 1: Find Time Ontology (Swoogle Search)
  • Demo 2: Digest Time Ontology
    • Document view
    • Term view
  • Demo 3: Find Term Person (Ontology Dictionary)
  • Demo 4: Digest Term Person
    • Class properties
    • (Instance) properties
  • Demo 5: Swoogle Statistics
10
Find Time Ontology
Demo1
We can use a set of keywords to search for an
ontology. For example, "time", "before", and "after"
are basic concepts of a Time ontology.
11
Usage of Terms in SWD
[Figure: the example graph annotated with term usage.
 In the instance documents
 http://www.cs.umbc.edu/finin/foaf.rdf and
 http://foo.com/foaf.rdf, foaf:Person is a populated
 class (object of rdf:type) and foaf:mbox is a
 populated property (with value finin@umbc.edu);
 http://foo.com/foaf.rdf#finin is a defined
 individual. In http://xmlns.com/foaf/1.0/,
 foaf:Person is a defined class (rdf:type rdfs:Class,
 rdfs:subClassOf wordNet:Agent) and foaf:mbox is a
 defined property (rdf:type rdf:Property,
 rdfs:domain).]
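The defined/populated distinction on this slide can be sketched as a simple pass over a document's triples. This is a simplified illustration, not Swoogle's actual rules: triples are plain tuples of QName strings, the function name `term_usage` is hypothetical, and only rdfs:Class and rdf:Property are treated as defining types.

```python
RDF_TYPE = "rdf:type"
# Assumption: only these two meta-classes mark a term as "defined".
DEFINING_TYPES = {"rdfs:Class": "class", "rdf:Property": "property"}


def term_usage(triples):
    """Return {term: set of usage labels} for the triples of one document."""
    usage = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in DEFINING_TYPES:
            # e.g. foaf:Person rdf:type rdfs:Class -> defined class
            usage.setdefault(subject, set()).add("defined " + DEFINING_TYPES[obj])
        elif predicate == RDF_TYPE:
            # e.g. #finin rdf:type foaf:Person -> foaf:Person is populated
            usage.setdefault(obj, set()).add("populated class")
        else:
            # any other predicate is a property being used on an instance
            usage.setdefault(predicate, set()).add("populated property")
    return usage


ontology_doc = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
]
instance_doc = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]

print(term_usage(ontology_doc))
print(term_usage(instance_doc))
```

Running it on the two example documents labels foaf:Person and foaf:mbox as defined in the ontology document and as populated in the instance document.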
12
Digest Time Ontology (term view)
Demo2(a)
[Screenshot: term view listing terms such as
 TimeZone, before, ..., intAfter]
13
Document Metadata
  • Web document metadata
    • When/how discovered/fetched
    • Suffix of URL
    • Last modified time
    • Document size
  • SWD metadata
    • Language features
      • OWL species
      • RDF encoding
    • Statistical features
      • Defined/used terms
      • Declared/used namespaces
      • Ontology Ratio
      • Ontology Rank
    • Ontology annotation
      • Label
      • Version
      • Comment
  • Related relational metadata
    • Links to other SWDs
      • Imported SWDs
      • Referenced SWDs
      • Extended SWDs
      • Prior version
    • Links to terms
      • Classes/Properties defined/used

14
Digest Time Ontology (document view)
Demo2(b)
15
Find Term Person
Demo3
Not capitalized! URIref is case sensitive!
16
Term Metadata: An integrated definition
  • Class Definition
    • rdfs:subClassOf -- foaf:Agent
    • rdfs:label -- Person
  • Properties (from SWO)
    • foaf:mbox
    • foaf:name
  • Properties (from SWI)
    • foaf:name
    • dc:title

foaf:Person
17
Digest Term Person
Demo4
167 different properties
562 different properties
18
Demo5
Swoogle Statistics
19
Swoogle Architecture
[Figure: Swoogle architecture diagram with four
 components: SWD discovery (The Web, Candidate URLs,
 Web Crawler, SWD Reader), metadata creation (SWD
 Cache, SWD Metadata), data analysis (IR analyzer,
 SWD analyzer), and interface (Web Server, Web
 Service, Agent Service).]
20
1. SWD Discovery
  • Swoogle uses three crawlers to discover likely
    SWD URLs
    • A Google Crawler uses Google to find URLs using
      • keywords: http://www.w3.org/2000/01/rdf-schema,...
      • file type suffixes: .rdf, .owl
    • A Focused Crawler crawls through HTML files
      recursively within a given website.
    • A SWD Crawler crawls through SWDs and discovers
      URLs according to term semantics.
  • To determine the likely SWD URLs
    • Non-SWD extension filter: .jpg, .mp3, etc.
    • Protocol filter: file://, urn:, etc.
    • Namespace of RDF resources in SWD
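The extension and protocol filters above could look roughly like the following sketch. The extension and scheme sets contain only the examples named on the slide (the real lists are longer), and the function name and the three return labels are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples given on the slide; the actual filter lists are longer.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}
LIKELY_SWD_EXTENSIONS = {".rdf", ".owl"}


def classify_candidate(url: str) -> str:
    """Return 'reject', 'likely', or 'unknown' for a candidate URL."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return "reject"  # protocol filter
    extension = posixpath.splitext(parsed.path.lower())[1]
    if extension in NON_SWD_EXTENSIONS:
        return "reject"  # non-SWD extension filter
    if extension in LIKELY_SWD_EXTENSIONS:
        return "likely"
    return "unknown"     # would need to be fetched and parsed to decide


for candidate in (
    "http://foo.com/foaf.rdf",
    "http://foo.com/photo.JPG",
    "file:///tmp/foaf.rdf",
    "http://xmlns.com/foaf/1.0/",
):
    print(candidate, "->", classify_candidate(candidate))
```

A URL classified as "unknown" is not rejected: namespace URIs such as http://xmlns.com/foaf/1.0/ carry no file suffix but may still resolve to a SWD.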

21
2. Metadata Creation
  • Document metadata
    • General metadata
    • SWD metadata
    • Ontology metadata
  • Term metadata (definition)
    • Class property
    • (Instance) property, i.e. class-property bond
  • Relational metadata

Relation from \ to   Term                           Document
Term                 rdfs:subClassOf, rdfs:domain   rdfs:seeAlso, ...
Document             Uses, Defines, ...             owl:imports, ...
22
2.1 Ontology Ratio
  • Why?
    • The fuzzy distinction between ontology and
      instance document
  • Given a SWD foo, let
    • C(foo): the set of classes defined in foo
    • P(foo): the set of properties defined in foo
    • I(foo): the set of instances defined in foo
  • Ontology Ratio as a heuristic to do the
    classification
    • 0: pure SWI
    • 1: pure SWO
    • > 0.8: foo is said to be an ontology.
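The slide's formula did not survive in this transcript. The sketch below assumes the ratio is the share of defined terms among everything a document defines, (|C| + |P|) / (|C| + |P| + |I|), which agrees with the stated endpoints (0 for a pure SWI, 1 for a pure SWO). The function names are hypothetical; the 0.8 threshold is from the slide.

```python
def ontology_ratio(num_classes: int, num_properties: int, num_instances: int) -> float:
    """Assumed form: (|C| + |P|) / (|C| + |P| + |I|)."""
    defined_terms = num_classes + num_properties
    total = defined_terms + num_instances
    if total == 0:
        raise ValueError("document defines no classes, properties, or instances")
    return defined_terms / total


def is_ontology(ratio: float, threshold: float = 0.8) -> bool:
    """Threshold from the slide: a ratio above 0.8 marks an ontology."""
    return ratio > threshold


print(ontology_ratio(0, 0, 5))  # 0.0, a pure SWI
print(ontology_ratio(3, 2, 0))  # 1.0, a pure SWO
print(ontology_ratio(1, 0, 3))  # 0.25, one class populated with three instances
```

Under this assumed definition, a document that defines one class and three instances of it is classified as an instance document.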

23
2.2 Relational Metadata
  • Inter-document relation
  • rdfs:seeAlso
  • IMport (IM), e.g. owl:imports
  • Similar/Equal SWD
  • Inter-term relation
  • EXtension (EX), e.g. rdfs:subClassOf
  • use-TerM (TM), e.g. rdfs:range
  • use-INdividual (IN), e.g. owl:sameAs
  • Prior Version (PV, IPV, CPV)
  • Generalized inter-document relations
  • Generalized from individual-level relations
  • Capture more relations with less complexity
  • Usage
  • Link SWDs
  • Ontology rank
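One way to picture the generalization from term-level relations to inter-document relations is the sketch below. The property-to-relation table uses only the "e.g." examples from the slide. The `home_of` index (which document is assumed to define each resource), the function name, and the rdfs:range example triple are hypothetical.

```python
from collections import Counter

# Only the example properties named on the slide.
RELATION_OF_PROPERTY = {
    "owl:imports": "IM",
    "rdfs:subClassOf": "EX",
    "rdfs:range": "TM",
    "owl:sameAs": "IN",
}


def document_links(doc, triples, home_of):
    """Count generalized (doc, relation, target_doc) links for one document.

    home_of maps a resource to the document assumed to define it.
    """
    links = Counter()
    for _subject, predicate, obj in triples:
        relation = RELATION_OF_PROPERTY.get(predicate)
        target = home_of.get(obj)
        if relation and target and target != doc:
            links[(doc, relation, target)] += 1
    return links


FOAF = "http://xmlns.com/foaf/1.0/"
WORDNET = "http://xmlns.com/wordnet/1.6/"
RDFS = "http://www.w3.org/2000/01/rdf-schema"

triples = [
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdfs:range", "rdfs:Literal"),  # illustrative triple
]
home_of = {"wordNet:Agent": WORDNET, "rdfs:Literal": RDFS}

links = document_links(FOAF, triples, home_of)
for link, count in links.items():
    print(link, count)
```

Many term-level triples collapse into a few typed document-level links, which is what makes the resulting graph cheap enough to use for linking SWDs and for ranking.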

24
3. Data analysis: Ranking SWD
  • Why?
    • Ranking captures page importance and popularity
    • Ranking has proven useful in HTML search.
    • SWD is different from HTML and has more semantics
    • So, a new SWD ranking mechanism is needed!
  • Related ideas?
    • Google's PageRank
    • Kleinberg's HITS

25
3.1 Random surfer model (PageRank)
  • How is PageRank computed?
    • Page A's rank is
      PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
    • Where
      • Ti are the pages that link to A
      • C(X) is the number of page X's out-links
      • d is a damping factor (e.g., 0.85)
    • Computed by iterating until convergence
  • A uniform probability of following any link is
    the convention on the Web but not on the SW
    • Links have semantics that influence the
      probability of following them
    • Rational users read an ontology and all
      ontologies it references.

[Figure: flowchart of the random surfer. Jump to a
 random page, then read the page. If bored, jump to
 another random page; otherwise follow a random
 link.]
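The iteration mentioned on this slide can be sketched as follows, assuming the standard PageRank recurrence PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) over the pages Ti that link to A. The function name, the tolerance, and the three-page example graph are hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until it converges.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {}
        for page in pages:
            incoming = sum(
                rank[src] / len(links[src])
                for src in links
                if page in links[src]
            )
            new_rank[page] = (1 - d) + d * incoming
        converged = max(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

Every out-link of a page receives the same share PR(T)/C(T). That uniform split is the assumption the slide calls into question for the Semantic Web.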
26
3.2 Rational Random Surfer Model
  • Weighted random behavior
  • Rational behavior
  • Rank of a SWI
  • Rank of a SWO

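[Figure: flowchart of the rational random surfer.
 Jump to a random page, then read the page. If the
 page is a SWO, read the referenced SWOs. If bored,
 jump to another random page; otherwise follow a
 random link.]
where TC(A) is the transitive closure of SWOs
referencing A.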
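The slide's formulas are not preserved in this transcript. As a sketch of the "weighted random behavior" part, the code below assumes that the probability of following a link from x to A is proportional to the summed weights of the typed links between them, so that rawPR(A) = (1 - d) + d * sum(rawPR(x) * f(x, A) / f(x)). The weight values and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights per link type; the slide does not give values.
EXAMPLE_WEIGHTS = {"TM": 1.0, "EX": 3.0}


def raw_rank(links, weights, d=0.85, tol=1e-9, max_iter=1000):
    """Weighted PageRank over typed links given as (source, relation, target)."""
    flow = defaultdict(lambda: defaultdict(float))  # flow[x][a] = f(x, a)
    docs = set()
    for source, relation, target in links:
        flow[source][target] += weights[relation]
        docs.update((source, target))
    total = {x: sum(targets.values()) for x, targets in flow.items()}  # f(x)

    rank = {doc: 1.0 for doc in docs}
    for _ in range(max_iter):
        new_rank = {doc: 1 - d for doc in docs}
        for x, targets in flow.items():
            for a, weight in targets.items():
                new_rank[a] += d * rank[x] * weight / total[x]
        delta = max(abs(new_rank[doc] - rank[doc]) for doc in docs)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Document "a" uses terms from o1 (TM) and extends o2 (EX).
raw = raw_rank([("a", "TM", "o1"), ("a", "EX", "o2")], EXAMPLE_WEIGHTS)
for doc in sorted(raw):
    print(doc, round(raw[doc], 6))
```

With these example weights the extended ontology o2 receives three quarters of the rank that "a" passes on, and o1 receives one quarter.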
27
3.3 Ontology Rank Example
[Figure: the SWI http://www.cs.umbc.edu/finin/foaf.rdf
 contains an individual with rdf:type foaf:Person
 and foaf:mbox finin@umbc.edu]
28
3.3 Ontology Rank Example (cont'd)
Document                                 rawPR   PR
http://www.w3.org/2000/01/rdf-schema     300     403
http://xmlns.com/wordnet/1.6/            3       103
http://xmlns.com/foaf/1.0/               100     100
http://www.cs.umbc.edu/finin/foaf.rdf    0.2     0.2
[Figure: link graph between the four documents, with
 three TM links and one EX link]
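The PR values in this example are consistent with a simple reading of the rational surfer model: a SWI keeps its rawPR, while a SWO sums rawPR over itself and every SWO that transitively references it. The sketch below checks that reading against the slide's numbers. The reference edges between the documents are inferred from the earlier example graph and are an assumption, as are the function and variable names.

```python
def ontology_rank(raw_pr, swos, references):
    """PR = rawPR for a SWI; for a SWO, the sum of rawPR over the SWO itself
    and all SWOs that (transitively) reference it."""
    referrers = {}  # target -> set of SWOs that reference it
    for source, targets in references.items():
        if source in swos:
            for target in targets:
                referrers.setdefault(target, set()).add(source)

    ranks = {}
    for doc, raw in raw_pr.items():
        if doc not in swos:
            ranks[doc] = raw
            continue
        closure, stack = {doc}, [doc]
        while stack:
            for referrer in referrers.get(stack.pop(), ()):
                if referrer not in closure:
                    closure.add(referrer)
                    stack.append(referrer)
        ranks[doc] = sum(raw_pr[member] for member in closure)
    return ranks


RDFS = "http://www.w3.org/2000/01/rdf-schema"
WORDNET = "http://xmlns.com/wordnet/1.6/"
FOAF = "http://xmlns.com/foaf/1.0/"
FININ = "http://www.cs.umbc.edu/finin/foaf.rdf"

raw_pr = {RDFS: 300, WORDNET: 3, FOAF: 100, FININ: 0.2}
swos = {RDFS, WORDNET, FOAF}
# Assumed edges: foaf.rdf uses FOAF; FOAF extends WordNet and uses RDFS;
# WordNet uses RDFS.
references = {FININ: {FOAF}, FOAF: {WORDNET, RDFS}, WORDNET: {RDFS}}

ranks = ontology_rank(raw_pr, swos, references)
for doc in (RDFS, WORDNET, FOAF, FININ):
    print(doc, ranks[doc])
```

Under these assumptions the sketch reproduces the slide's values: 403 = 300 + 3 + 100 for RDF Schema, 103 = 3 + 100 for WordNet, 100 for FOAF, and 0.2 for the instance document.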
29
Current Status
  • Swoogle Watch reported (Nov 7, 2004)
    • 40 M triples
    • 270 K SWDs, 4 K ontologies
    • 144 K terms, 91 K classes, 51 K properties
  • Ongoing work
    • Ontology Dictionary
    • Swoogle Statistics
    • Web Service interface (see Swoogle website)
    • IR with the Semantic Web (content search)
      • Character N-Grams
      • Bag of URIrefs
      • Swangling

30
Summary
Swoogle (Mar, 2004)
  • Automated SWD discovery
  • SWD metadata creation and search
  • Ontology rank (rational surfer model)
  • Swoogle watch
  • Web Interface

Swoogle2 (Sep, 2004)
  • Ontology dictionary
  • Swoogle statistics
  • Web service interface (WSDL)
  • Bag of URIref IR search

Swoogle3 (2005)
  • Better crawl refresh strategies
  • More metadata (ontology mapping)
  • More IR features
  • Better web service interfaces
  • Capture and store all triples
  • More reasoning
31
The End
Questions?
  • Website: http://swoogle.umbc.edu
  • Slides at
    http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
  • Demo:
    http://ebiquity.umbc.edu/v2.1/resource/html/id/65/