Swoogle Tutorial Part I: Swoogle R&D

1
Swoogle Tutorial (Part I: Swoogle R&D)
Presented by eBiquity Lab, CSEE, UMBC
  • A brief introduction to Swoogle
  • An overview of Swoogle research
  • A summary of Swoogle development

2
1. Introduction
  • Motivation
  • Swoogle in the Semantic Web
  • Glossary
  • Swoogle Architecture

3
Motivation
  • (Google + Web) has made us all smarter
  • Something similar is needed by people and
    software agents for information on the Semantic
    Web

4
The Role of Swoogle in the Semantic Web
5
Concepts Explained
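[Figure: an instance document (SWI) and an ontology document (SWO)]
  • SWI http://foo.com/foaf.rdf contains the individual
    http://foo.com/foaf.rdf#finin
  • http://foo.com/foaf.rdf#finin rdf:type foaf:Person
  • http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu
  • SWO http://xmlns.com/foaf/0.1/ defines the terms
    foaf:Person (a class) and foaf:mbox (a property)
  • foaf:Person rdf:type rdfs:Class
  • foaf:Person rdfs:subClassOf wordNet:Agent
  • foaf:mbox rdf:type rdf:Property
  • foaf:mbox rdfs:domain foaf:Person
NOTE: Qualified names (QNames) are used to
shorten well-known namespaces as follows:
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs = http://www.w3.org/2000/01/rdf-schema#
foaf = http://xmlns.com/foaf/0.1/
wordNet = http://xmlns.com/wordnet/1.6/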
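The QName note on this slide amounts to concatenating a namespace URI with a local name. A minimal Python sketch, assuming the prefix table from the slide (with FOAF's published namespace, http://xmlns.com/foaf/0.1/); `expand_qname` and `PREFIXES` are hypothetical names, not part of Swoogle:

```python
# Prefix table from the slide note (FOAF uses its published 0.1 namespace).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or malformed QName: {qname!r}")
    return prefixes[prefix] + local


print(expand_qname("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
print(expand_qname("rdf:type"))     # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```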
6
Glossary
  • Document
  • A Semantic Web Document (SWD) is an online
    document written in Semantic Web languages (i.e.
    RDF and OWL).
  • An ontology document (SWO) is an SWD that contains
    mostly term definitions (i.e. classes and
    properties). It corresponds to the T-Box in
    Description Logic.
  • An instance document (SWI or SWDB) is an SWD that
    contains mostly class individuals. It corresponds
    to the A-Box in Description Logic.
  • Term
  • A term is a non-anonymous RDF resource that is
    the URI reference of either a class or a
    property.
  • Individual
  • An individual is a non-anonymous RDF resource
    that is the URI reference of a class member.
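The glossary separates SWOs from SWIs by what a document "mostly" contains. A minimal sketch, assuming triples are given as QName strings and that "mostly" means a strict majority of defined terms over individuals (Swoogle's actual criterion, e.g. its Ontology Ratio, may differ); `classify_swd` and `TERM_TYPES` are hypothetical names:

```python
# rdf:type targets that mark a subject as a term (class or property).
TERM_TYPES = {
    "rdfs:Class", "owl:Class",
    "rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty",
}


def classify_swd(triples):
    """Label a document 'SWO' or 'SWI' from its (subject, predicate, object) triples."""
    terms, individuals = set(), set()
    for s, p, o in triples:
        if p != "rdf:type" or s.startswith("_:"):
            continue  # only typed, non-anonymous resources count
        (terms if o in TERM_TYPES else individuals).add(s)
    # Assumption: "mostly" = strict majority; ties fall back to SWI.
    return "SWO" if len(terms) > len(individuals) else "SWI"


ontology = [("foaf:Person", "rdf:type", "rdfs:Class"),
            ("foaf:mbox", "rdf:type", "rdf:Property")]
print(classify_swd(ontology))  # SWO
```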

In Swoogle, a document D is a valid SWD iff
JENA correctly parses D and produces at least
one triple.
JENA is a Java framework for writing Semantic
Web applications: http://www.hpl.hp.com/semweb/jena2.htm
[Figure: foaf:Person rdf:type rdfs:Class (a term); http://.../foaf.rdf#finin rdf:type foaf:Person (an individual)]
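The validity rule above reduces to "parse, then count triples". JENA is a Java library, so this Python sketch substitutes a deliberately simplified N-Triples reader (no escapes, language tags or datatypes) as a stand-in parser; `parse_ntriples` and `is_valid_swd` are hypothetical helpers, and a real crawler would need a full RDF parser that also handles RDF/XML:

```python
import re

# Simplified N-Triples: <uri> or _:blank subject, <uri> predicate,
# <uri>, _:blank or plain "literal" object. No escapes, language tags or datatypes.
_NODE = r'(<[^<>\s]*>|_:\w+)'
_OBJECT = r'(<[^<>\s]*>|_:\w+|"[^"\\]*")'
_TRIPLE = re.compile(rf'^\s*{_NODE}\s+(<[^<>\s]*>)\s+{_OBJECT}\s*\.\s*$')


def parse_ntriples(text):
    """Return a list of (s, p, o) strings; raise ValueError on a bad line."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _TRIPLE.match(line)
        if match is None:
            raise ValueError(f"syntax error on line {lineno}")
        triples.append(match.groups())
    return triples


def is_valid_swd(text):
    """Valid iff the document parses and yields at least one triple."""
    try:
        return len(parse_ntriples(text)) >= 1
    except ValueError:
        return False


doc = '<http://foo.com/foaf.rdf#finin> <http://xmlns.com/foaf/0.1/mbox> "finin@umbc.edu" .'
print(is_valid_swd(doc))                   # True
print(is_valid_swd("<html><body></body>"))  # False
```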
7
Swoogle Architecture
[Figure: Swoogle architecture]
  • SWD discovery: the Web Crawler collects Candidate URLs from The Web
  • Metadata creation: SWD Reader, SWD Cache, SWD Metadata
  • Data analysis: IR analyzer, SWD analyzer
  • Interface: Web Server, Web Service, Agent Service
8
2. Swoogle Research
  • Discovery
  • Digest
  • Search & Navigation
  • Rank
  • Statistics

9
Discovery -- research
  • Discovering URLs of possible SWDs automatically
  • Google-crawler
  • Focused-crawler
  • Semantic-Web-crawler, e.g. scutter
  • Revisiting URLs

10
Discovery -- results
  • Crawler performance
  • Google crawler is the best
  • Focused crawler needs to be improved
  • Verified pure SWDs are only 1/3 of the discovered
    URLs
  • Some non-SWDs (NSWDs) contain embedded RDF graphs.

Source: Swoogle (2005-Jan-05)
SELECT discovered_by, sum(isRDF), sum(1-isRDF),
count(*) FROM digest_url WHERE 1 group by
discovered_by
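To make the query above concrete, this sketch runs it against an in-memory SQLite table. The `digest_url` schema (url, discovered_by, isRDF) and the rows are hypothetical toy data, not Swoogle's; `ORDER BY` is added only to make the output order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digest_url (url TEXT, discovered_by TEXT, isRDF INTEGER)")
conn.executemany(
    "INSERT INTO digest_url VALUES (?, ?, ?)",
    [  # hypothetical toy rows, not Swoogle data
        ("http://a.example.org/x.rdf", "google", 1),
        ("http://a.example.org/y.html", "google", 0),
        ("http://b.example.org/z.owl", "focused", 1),
    ],
)
# Per crawler: confirmed SWDs, non-SWDs, and all discovered URLs.
rows = conn.execute(
    "SELECT discovered_by, sum(isRDF), sum(1-isRDF), count(*) "
    "FROM digest_url WHERE 1 GROUP BY discovered_by ORDER BY discovered_by"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

The `sum(1-isRDF)` column counts non-SWDs because `isRDF` is stored as 0 or 1.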
11
Digest -- research
  • Document metadata
  • Annotative
  • General metadata
  • SWD metadata
  • Ontology metadata
  • Inter-document relations
  • Document-term relations
  • Term metadata
  • Term Definition
  • Inter-term Relation
  • Class-property bond (C-P bond): rdfs:domain
  • Property-class bond (P-C bond): rdfs:range
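C-P and P-C bonds can be read directly off rdfs:domain and rdfs:range triples. A minimal sketch over QName-string triples; `extract_bonds` is a hypothetical helper and the FOAF triples below are illustrative input:

```python
from collections import defaultdict


def extract_bonds(triples):
    """Return (C-P bonds, P-C bonds) from rdfs:domain / rdfs:range triples."""
    cp = defaultdict(set)  # class -> properties declaring it as rdfs:domain
    pc = defaultdict(set)  # property -> classes declared as its rdfs:range
    for s, p, o in triples:
        if p == "rdfs:domain":
            cp[o].add(s)
        elif p == "rdfs:range":
            pc[s].add(o)
    return cp, pc


triples = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
]
cp, pc = extract_bonds(triples)
print(sorted(cp["foaf:Person"]))  # ['foaf:knows', 'foaf:mbox']
```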

12
Document Metadata
  • Web document metadata
  • When/how discovered/fetched
  • Suffix of URL
  • Last modified time
  • Document size
  • SWD metadata
  • Language features
  • OWL species
  • RDF encoding
  • Statistical features
  • Number of defined/used terms
  • Number of declared/used namespaces
  • Ontology Ratio
  • Ontology Rank
  • Ontology annotation
  • Label
  • Version
  • Comment
  • Relations
  • Links to other SWDs
  • Imported SWDs
  • Referenced SWDs
  • Extended SWDs
  • Prior version
  • Links to terms
  • Classes/properties defined
  • Classes/properties used

13
Digest Time Ontology (document view)
Demo2(a)
14
Document-Term Relation
[Figure: document-term relations]
  • Instance documents http://www.cs.umbc.edu/finin/foaf.rdf
    and http://foo.com/foaf.rdf each describe an individual
    with rdf:type foaf:Person and foaf:mbox finin@umbc.edu
  • populated Class: foaf:Person
  • populated Property: foaf:mbox
  • defined Individual: http://foo.com/foaf.rdf#finin
  • Ontology document http://xmlns.com/foaf/0.1/
  • defined Class: foaf:Person (rdf:type rdfs:Class;
    rdfs:subClassOf wordNet:Agent)
  • defined Property: foaf:mbox (rdf:type rdf:Property;
    rdfs:domain foaf:Person)
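The relations in this figure can be derived from a document's own triples. A minimal sketch, assuming that rdf:type rdfs:Class / rdf:Property marks a definition, any other rdf:type populates a class and defines a (non-anonymous) individual, and any other predicate populates a property; the relation names follow the figure, while `document_term_relations` and `DEFINING_TYPES` are hypothetical:

```python
# rdf:type targets that mark a term definition (assumed set).
DEFINING_TYPES = {"rdfs:Class": "definesClass", "rdf:Property": "definesProperty"}


def document_term_relations(triples):
    """Return a set of (relation, term_or_individual) pairs for one document."""
    relations = set()
    for s, p, o in triples:
        if p == "rdf:type":
            if o in DEFINING_TYPES:
                relations.add((DEFINING_TYPES[o], s))
            else:
                relations.add(("populatesClass", o))
                if not s.startswith("_:"):  # individuals are non-anonymous
                    relations.add(("definesIndividual", s))
        else:
            relations.add(("populatesProperty", p))
    return relations


instance_doc = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]
for rel in sorted(document_term_relations(instance_doc)):
    print(rel)
```

Under this simple rule, RDFS vocabulary used in an ontology (e.g. rdfs:subClassOf) would also count as a populated property; a real digest would likely treat it separately.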
15
Digest Time Ontology (term view)
Demo2(b)
16
Term Metadata (foaf:Person)
  • Term Definition
  • rdfs:subClassOf -- foaf:Agent
  • rdfs:label Person
  • C-P bond (from SWO)
  • foaf:mbox
  • foaf:name
  • C-P bond (from SWI)
  • foaf:name
  • dc:title
17
Digest Term Person
Demo4
18
Term Distribution (grouped by local name)
19
Digest -- result
Ontological Term Distribution (populated,
defined)
Source: Swoogle (2005-Jan-05)
SELECT res_type, sign(cnt_instance_populate>0),
sign(cnt_swd_def>0), count(*), sum(cnt_instance_populate)
FROM digest_term WHERE 1 group by
res_type, sign(cnt_instance_populate>0),
sign(cnt_swd_def>0)
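The grouping in this query can be mirrored in Python. This sketch takes hypothetical (res_type, cnt_instance_populate, cnt_swd_def) rows and returns, per group, the equivalent of count(*) and sum(cnt_instance_populate); the column names come from the query, while `term_distribution` and the sample rows are assumed:

```python
from collections import Counter


def term_distribution(rows):
    """Group (res_type, cnt_instance_populate, cnt_swd_def) rows like the query.

    Returns two Counters keyed by (res_type, populated?, defined?):
    the row count per group and the summed cnt_instance_populate.
    """
    counts, populated = Counter(), Counter()
    for res_type, cnt_instance_populate, cnt_swd_def in rows:
        key = (res_type, cnt_instance_populate > 0, cnt_swd_def > 0)
        counts[key] += 1
        populated[key] += cnt_instance_populate
    return counts, populated


# Hypothetical rows, not Swoogle data.
rows = [("class", 5, 1), ("class", 0, 1), ("class", 3, 0), ("property", 2, 1)]
counts, populated = term_distribution(rows)
print(counts[("class", True, True)], populated[("class", True, True)])  # 1 5
```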
20
Search & Navigation -- research
  • The Semantic Web is not the Web
  • Search service
  • Document search: an RDF document is not free text
  • Term search: URIref and compound local name
  • Navigation service
  • The RDF graph: typed links
  • The web of RDF documents: few hyperlinks
  • The social network of agents: trust, provenance

21
Find Time Ontology
Demo1
We can use a set of keywords to search for ontologies.
For example, "time", "before" and "after" are basic
concepts of a time ontology.
22
Find Term Person
Demo3
Not capitalized! URIrefs are case-sensitive!
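Slides 21 and 22 contrast loose keyword search for ontologies with exact lookup of a term. A minimal sketch, assuming a local name is the text after the last '#' or '/', that keyword scoring is a case-insensitive substring match on local names (so compound names such as TimeInterval still match "time"), and that term lookup is an exact, case-sensitive match; all function names and the example.org URIs are hypothetical:

```python
def local_name(uri):
    """Text after the last '#' or '/' of a URIref."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def score_ontology(ontology_terms, keywords):
    """Count keywords that occur (case-insensitively) in some term's local name."""
    names = [local_name(t).lower() for t in ontology_terms]
    return sum(any(k.lower() in name for name in names) for k in keywords)


def find_terms(terms, name):
    """Exact, case-sensitive match on the local name: URIrefs are case-sensitive."""
    return [t for t in terms if local_name(t) == name]


time_terms = ["http://example.org/time#before",
              "http://example.org/time#after",
              "http://example.org/time#TimeInterval"]
print(score_ontology(time_terms, ["time", "before", "after"]))  # 3

terms = ["http://xmlns.com/foaf/0.1/Person", "http://example.org/vocab#person"]
print(find_terms(terms, "Person"))  # ['http://xmlns.com/foaf/0.1/Person']
```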
23
Current Swoogle Navigation Model
  • A URIref refers to
  • A term, i.e. an instance of rdfs:Class or rdf:Property
  • An individual, i.e. a member of a populated class
  • An SWD could be
  • SWO: term definitions
  • SWI: individuals
  • Observations
  • RDF Resources are semantically linked in RDF
    graph
  • SWDs are poorly linked due to the absence of
    explicit hyperlink concept
  • Ontologies are more interesting
  • Approach
  • Build inter-document relations
  • Rational surfing model

24
Semantic Web Navigation Model (new!)
[Figure: navigation among Resource (URIref, reached by Term Search), RDF Document (URL, reached by Document Search), Namespace and Ontology]
  • Resource to resource: RDF Graph Navigation, rdfs:subClassOf,
    sameNamespace, sameLocalname
  • Resource to namespace: usesNamespace
  • Resource to document: isDefinedBy, isUsedBy
  • Document to resource: definesClass, definesProperty,
    populatesClass, populatesProperty, refersClass, refersProperty
  • Ontology (RDFS or OWL DL) to ontology: owl:imports,
    owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith
  • Document to document: rdfs:seeAlso, rdfs:isDefinedBy
25
Ranking -- research
  • Surfing models
  • Ranking method
  • PageRank variation
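The rational surfer is a PageRank variation in which the surfer does not pick outgoing links uniformly. A minimal weighted-PageRank sketch by power iteration, assuming each link carries a weight (for example, a hypothetical heavier weight for owl:imports than for rdfs:seeAlso) and that a node's score is split in proportion to those weights: R(v) = (1 - d)/N + d * sum over u of R(u) * w(u, v) / W(u), with dangling nodes spreading their score uniformly. This is a generic formulation, not Swoogle's exact ranking formula, and `rank` is a hypothetical name:

```python
def rank(links, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    links maps a node to a list of (target, weight) pairs; weights model
    how likely a (hypothetical) rational surfer is to follow each link.
    """
    nodes = set(links)
    for outs in links.values():
        nodes.update(target for target, _ in outs)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump share, identical for every node.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            total = sum(w for _, w in outs)
            if total == 0:
                # Dangling node: spread its score uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * score[v] / n
            else:
                for target, w in outs:
                    new[target] += damping * score[v] * w / total
        score = new
    return score


links = {
    "a.rdf": [("foaf.rdf", 1.0), ("b.rdf", 1.0)],
    "b.rdf": [("foaf.rdf", 2.0)],
    "foaf.rdf": [],
}
scores = rank(links)
print(max(scores, key=scores.get))  # foaf.rdf
```

As on the "Swoogle top 10" slide, a heavily referenced ontology (here the toy `foaf.rdf`) collects the highest score.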

26
Ranking with Rational Surfing Model An Example
[Figure: http://www.cs.umbc.edu/finin/foaf.rdf describes an individual with rdf:type foaf:Person and foaf:mbox finin@umbc.edu]
27
Demo6
Swoogle top 10
Swoogle uses a PageRank-like algorithm to rank
Semantic Web documents. Well-known ontologies are
ranked highly.
This report is generated dynamically from the
latest data and takes 5 to 10 seconds.
28
Statistics -- research
  • Summarize the dataset collected by Swoogle
  • Swoogle Watch
  • Swoogle Today
  • Distribution of visited URLs
  • Document discovery log
  • Term discovery log
  • Semantic Web Watch
  • SWD distribution by last-modified month
  • SWD distribution by website
  • SWD distribution by suffix
  • Ontology Watch
  • Term (class/property) usage
  • Namespace usage
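The Semantic Web Watch distributions are simple group-by counts over SWD metadata. A minimal sketch, assuming each SWD record is a dict with a `url` and an ISO 8601 `last_modified` string; the field names, helper names and example.org URLs are hypothetical:

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def by_suffix(swd):
    # File extension of the URL path, e.g. ".rdf" or ".owl".
    ext = posixpath.splitext(urlparse(swd["url"]).path)[1]
    return ext.lower() or "(none)"


def by_website(swd):
    return urlparse(swd["url"]).hostname or "(unknown)"


def by_month(swd):
    # Assumes last_modified is an ISO 8601 string such as "2004-12-01".
    return swd["last_modified"][:7]


def distribution(swds, key):
    return Counter(key(swd) for swd in swds)


swds = [  # hypothetical records
    {"url": "http://a.example.org/x.rdf", "last_modified": "2004-12-01"},
    {"url": "http://a.example.org/y.owl", "last_modified": "2004-12-20"},
    {"url": "http://b.example.org/z.rdf", "last_modified": "2005-01-03"},
]
print(distribution(swds, by_suffix).most_common())  # [('.rdf', 2), ('.owl', 1)]
```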

29
Demo5(a)
Swoogle Today
30
Demo5(b)
Swoogle Statistics
[Figure: statistics chart; labels include FOAF, Trustix, W3C and Stanford]
31
Demo5(c)
Swoogle Statistics
32
Miscellaneous
  • Submit URL for focused crawler
  • Swoogle Web Service (delivered in Sept.):
    http://swoogle.umbc.edu/webservice/
  • Search document
  • Search term
  • Term digest

33
Demo7
Submit URL for focused crawler
When you can't find your ontologies in Swoogle,
they may not have been indexed by Swoogle yet.
Please submit them to increase their visibility.
When your query fails
From the site map
34
3. Summary
  • Summary
  • Current Status

35
Summary
2004
  • Automated SWD discovery
  • SWD metadata creation and search
  • Ontology rank (rational surfer model)
  • Swoogle watch
  • Web Interface

Swoogle (Mar, 2004)
  • Ontology dictionary
  • Swoogle statistics
  • Web service interface (WSDL)
  • Bag of URIref IR search

Swoogle2 (Sep, 2004)
  • Better discovery revisit strategies
  • Better navigation models
  • Semantic web dataset
  • Index Instance data
  • More metadata (ontology mapping)
  • Better web service interfaces

2005
Swoogle3
36
Current Status
  • Swoogle Watch reported (Jan 6, 2005):
  • 46.7M triples
  • 336K SWDs, 4K ontologies
  • 153K terms: 94K classes, 59K properties
  • Ongoing work
  • Research
  • Self-adaptive SWD Discovery
  • Efficient SWD digest and RDF Graph Abstract
  • Semantic Web navigation model
  • Engineering
  • Enhancing Web Service interface