Title: Swoogle Tutorial Part I: Swoogle R&D
Swoogle Tutorial (Part I: Swoogle R&D)
Presented by eBiquity Lab, CSEE, UMBC
- A brief introduction to Swoogle
- An overview of Swoogle research
- A summary of Swoogle development
1. Introduction
- Motivation
- Swoogle in the Semantic Web
- Glossary
- Swoogle Architecture
Motivation
- (Google + Web) has made us all smarter
- Something similar is needed by people and software agents for information on the Semantic Web
The Role of Swoogle in the Semantic Web
[Figure: Swoogle's role in the Semantic Web]
Concepts Explained
[Figure: an example RDF graph spanning two Semantic Web documents (SWDs)]
- SWI (instance document): the individual http://foo.com/foaf.rdf#finin has rdf:type foaf:Person and foaf:mbox finin_at_umbc.edu
- SWO (ontology document) http://xmlns.com/foaf/1.0/: the class foaf:Person (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and the property foaf:mbox (rdf:type rdf:Property, rdfs:domain foaf:Person)
- Node roles labelled in the figure: Individual, Class, Property, Term
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
foaf: http://xmlns.com/foaf/1.0/
wordNet: http://xmlns.com/wordnet/1.6/
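The QName shorthand in the note above amounts to a prefix lookup. The following is a minimal sketch in Python, assuming the four prefixes from the note; the function names `expand_qname` and `shorten_uri` are illustrative and not part of Swoogle.

```python
# Prefix table taken from the note above. The trailing '#' on the rdf and
# rdfs namespaces follows the W3C namespace URIs.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


def shorten_uri(uri, prefixes=PREFIXES):
    """Return a QName when a known namespace prefixes the URI, else the URI unchanged."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and len(uri) > len(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


print(expand_qname("foaf:Person"))
print(shorten_uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
```

A URI outside the table, such as http://foo.com/foaf.rdf#finin, is returned unchanged by `shorten_uri`.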
Glossary
- Document
- A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
- An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
- An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
- Term
- A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
- Individual
- An individual refers to a non-anonymous RDF resource which is the URI reference of a class member.
In Swoogle, a document D is a valid SWD if and only if Jena correctly parses D and produces at least one triple.
Jena is a Java framework for writing Semantic Web applications: http://www.hpl.hp.com/semweb/jena2.htm
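The validity rule can be sketched as a predicate over a parser's result. Swoogle relies on Jena, a Java library; to stay self-contained, the sketch below substitutes a toy parser that accepts only simple N-Triples lines. Both `parse_ntriples` and `is_valid_swd` are hypothetical names, and the toy grammar is an assumption made for illustration.

```python
import re

# Toy stand-in for a real RDF parser such as Jena. It accepts only lines of
# the form  <s> <p> <o> .  or  <s> <p> "literal" .
_TRIPLE = re.compile(
    r'^<([^<>\s]+)>\s+<([^<>\s]+)>\s+(?:<([^<>\s]+)>|"([^"]*)")\s*\.$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object); raise ValueError on bad input."""
    triples = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _TRIPLE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not a triple")
        s, p, o_uri, o_lit = match.groups()
        triples.append((s, p, o_uri if o_uri is not None else o_lit))
    return triples


def is_valid_swd(text, parse=parse_ntriples):
    """Valid SWD: the parser succeeds AND yields at least one triple."""
    try:
        return len(parse(text)) >= 1
    except ValueError:
        return False


DOC = (
    "<http://foo.com/foaf.rdf#finin> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/1.0/Person> .\n"
)
print(is_valid_swd(DOC))
print(is_valid_swd("<html><body>hello</body></html>"))
```

Note that an empty document parses without error but yields no triples, so it is rejected by the second half of the rule.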
[Figure: http//.../foaf.rdf#finin has rdf:type foaf:Person; foaf:Person has rdf:type rdfs:Class]
Swoogle Architecture
[Figure: Swoogle architecture diagram]
- Components: SWD discovery, metadata creation, data analysis, interface
- Boxes shown: The Web, Web Crawler, Candidate URLs, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service
2. Swoogle Research
- Discovery
- Digest
- Search & Navigation
- Rank
- Statistics
Discovery -- research
- Automatically discovering URLs of possible SWDs
- Google-crawler
- Focused-crawler
- Semantic-Web-crawler, e.g. scutter
- Revisiting URLs
Discovery -- results
- Crawler performance
- Google crawler is the best
- Focused crawler needs to be improved
- Verified pure SWDs are only 1/3 of discovered URLs
- Some NSWDs (non-SWDs) contain embedded RDF graphs.
Source: Swoogle (2005-Jan-05)
SELECT discovered_by, sum(isRDF), sum(1-isRDF), count(*) FROM digest_url WHERE 1 GROUP BY discovered_by
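The per-crawler query above can be tried out on a scratch table. The sketch below uses Python's built-in `sqlite3` module. The table layout (a `url` column plus `discovered_by` and `isRDF`) is an assumption inferred from the query, and the sample rows are invented for illustration; they are not Swoogle data. The no-op `WHERE 1` is dropped and an `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3


def crawler_summary(rows):
    """rows: iterable of (url, discovered_by, isRDF) with isRDF in {0, 1}.

    Returns (discovered_by, #SWD, #non-SWD, #total) per crawler.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE digest_url (url TEXT, discovered_by TEXT, isRDF INTEGER)"
    )
    conn.executemany("INSERT INTO digest_url VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT discovered_by, sum(isRDF), sum(1 - isRDF), count(*) "
        "FROM digest_url GROUP BY discovered_by ORDER BY discovered_by"
    ).fetchall()
    conn.close()
    return result


# Invented sample rows, for illustration only.
SAMPLE = [
    ("http://example.org/a.rdf", "google", 1),
    ("http://example.org/b.html", "google", 0),
    ("http://example.org/c.owl", "focused", 1),
    ("http://example.org/d.rdf", "google", 1),
]

for row in crawler_summary(SAMPLE):
    print(row)
```

Summing `isRDF` counts verified SWDs, summing `1 - isRDF` counts the remaining URLs, and `count(*)` gives the total discovered by each crawler.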
Digest -- research
- Document metadata
- Annotative
- General metadata
- SWD metadata
- Ontology metadata
- Inter-document relations
- Document-term relations
- Term metadata
- Term Definition
- Inter-term Relation
- Class-property bond (C-P bond): rdfs:domain
- Property-Class bond (P-C bond): rdfs:range
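Both bond types can be read directly off schema triples. This is a minimal sketch, assuming triples are plain (subject, predicate, object) tuples of QName strings. The function name `extract_bonds` is illustrative, and the foaf:knows triple is a sample added here, not taken from the slides.

```python
def extract_bonds(triples):
    """Collect C-P bonds from rdfs:domain and P-C bonds from rdfs:range.

    A triple (p, rdfs:domain, c) yields the C-P bond (c, p);
    a triple (p, rdfs:range, c) yields the P-C bond (p, c).
    """
    cp_bonds = set()
    pc_bonds = set()
    for s, p, o in triples:
        if p == "rdfs:domain":
            cp_bonds.add((o, s))  # class o can carry property s
        elif p == "rdfs:range":
            pc_bonds.add((s, o))  # property s points at class o
    return cp_bonds, pc_bonds


SCHEMA = [
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),  # illustrative sample
]

cp, pc = extract_bonds(SCHEMA)
print(sorted(cp))
print(sorted(pc))
```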
Document Metadata
- Web document metadata
- When/how discovered/fetched
- Suffix of URL
- Last modified time
- Document size
- SWD metadata
- Language features
- OWL species
- RDF encoding
- Statistical features
- Number of defined/used terms
- Number of declared/used namespaces
- Ontology Ratio
- Ontology Rank
- Ontology annotation
- Label
- Version
- Comment
- Relations
- Links to other SWDs
- Imported SWDs
- Referenced SWDs
- Extended SWDs
- Prior version
- Links to terms
- Classes/properties defined
- Classes/properties used
Digest Time Ontology (document view)
Demo2(a)
Document-Term Relation
[Figure: the example RDF graph annotated with document-term relations]
- Instance documents http://www.cs.umbc.edu/finin/foaf.rdf and http://foo.com/foaf.rdf: foaf:Person is a populated Class, foaf:mbox is a populated Property, and http://foo.com/foaf.rdf#finin (foaf:mbox finin_at_umbc.edu) is a defined Individual
- Ontology document http://xmlns.com/foaf/1.0/: foaf:Person is a defined Class (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and foaf:mbox is a defined Property (rdf:type rdf:Property, rdfs:domain foaf:Person)
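The defined/populated distinction can be sketched as a single pass over one document's triples. The relation names follow the definesClass, definesProperty, populatesClass and populatesProperty labels used in the navigation model. The classification rules themselves, including which meta-classes count as a definition, are a simplifying assumption and not Swoogle's documented criteria.

```python
# Assumed meta-level types: typing a resource with one of these counts as
# defining a term rather than populating a class.
META_CLASSES = {"rdfs:Class", "owl:Class"}
META_PROPERTIES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def document_term_relations(triples):
    """Map each relation name to the set of terms one document relates to."""
    rel = {
        "definesClass": set(),
        "definesProperty": set(),
        "populatesClass": set(),
        "populatesProperty": set(),
    }
    for s, p, o in triples:
        if p == "rdf:type":
            if o in META_CLASSES:
                rel["definesClass"].add(s)
            elif o in META_PROPERTIES:
                rel["definesProperty"].add(s)
            else:
                rel["populatesClass"].add(o)  # s is an instance of class o
        else:
            rel["populatesProperty"].add(p)  # p is used as a predicate
    return rel


SWI = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin_at_umbc.edu"),
]
SWO = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
]

print(document_term_relations(SWI))
print(document_term_relations(SWO))
```

Under these rules the instance document populates foaf:Person and foaf:mbox, while the ontology document defines them.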
Digest Time Ontology (term view)
Demo2(b)
Term Metadata
Example term: foaf:Person
- Term Definition
- rdfs:subClassOf -- foaf:Agent
- rdfs:label -- Person
- C-P bond (from SWO)
- foaf:mbox
- foaf:name
- C-P bond (from SWI)
- foaf:name
- dc:title
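C-P bonds "from SWI" come from instance data rather than from schema declarations. One plausible reading, sketched below, is that whenever an individual typed with class C appears as the subject of property P, the pair (C, P) is recorded. This reading is an assumption, as are the function name and the placeholder literal values.

```python
from collections import defaultdict


def instance_cp_bonds(triples):
    """Return the set of (class, property) pairs observed in instance data."""
    types = defaultdict(set)  # subject -> classes it is declared an instance of
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    bonds = set()
    for s, p, o in triples:
        if p != "rdf:type":
            for cls in types.get(s, ()):
                bonds.add((cls, p))
    return bonds


# Placeholder literals, for illustration only.
INSTANCE_DATA = [
    ("ex:someone", "rdf:type", "foaf:Person"),
    ("ex:someone", "foaf:name", "a name"),
    ("ex:someone", "dc:title", "a title"),
    ("ex:untyped", "foaf:name", "another name"),
]

print(sorted(instance_cp_bonds(INSTANCE_DATA)))
```

The untyped subject contributes no bond, because there is no class to pair its property with.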
Digest Term Person
Demo4
Term Distribution (grouped by local name)
Digest -- result
Ontological Term Distribution (populated, defined)
Source: Swoogle (2005-Jan-05)
SELECT res_type, sign(cnt_instance_populate0), sign(cnt_swd_def0), count(*), sum(cnt_instance_populate) FROM digest_term WHERE 1 GROUP BY res_type, sign(cnt_instance_populate0), sign(cnt_swd_def0)
Search & Navigation -- research
- The Semantic Web is not the Web
- Search service
- Document search: an RDF document is not free text
- Term search: URIref and compound local name
- Navigation service
- The RDF graph: typed links
- The web of RDF documents: few hyperlinks
- The social network of agents: trust, provenance
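Term search over URIrefs and compound local names suggests two preprocessing steps: splitting a URIref into namespace and local name, then breaking a compound local name such as subClassOf into its words. The sketch below shows one way to do this. The splitting rules and function names are assumptions, not Swoogle's implementation. Case is preserved, since URIrefs are case sensitive.

```python
import re


def split_uriref(uriref):
    """Split at the last '#', or else at the last '/', into (namespace, local name)."""
    for sep in ("#", "/"):
        idx = uriref.rfind(sep)
        if idx != -1:
            return uriref[: idx + 1], uriref[idx + 1:]
    return "", uriref


def tokenize_local_name(local_name):
    """Break a compound local name into words at case changes, digits and punctuation."""
    # Alternatives: an acronym run, a (capitalised) lowercase word, a digit run.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local_name)


ns, local = split_uriref("http://www.w3.org/2000/01/rdf-schema#subClassOf")
print(ns, local, tokenize_local_name(local))
```

A keyword index could then be built over these words while the full URIref remains the exact-match key.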
Find Time Ontology
Demo1
We can use a set of keywords to search for an ontology. For example, "time", "before", and "after" are basic concepts for a Time ontology.
Find Term Person
Demo3
Not capitalized! URIrefs are case sensitive!
Current Swoogle Navigation Model
- A URIref refers to
- A term, i.e. an instance of an RDFS class/property
- An individual, i.e. populated terms
- A SWD could be
- SWO: term definitions
- SWI: individuals
- Observations
- RDF resources are semantically linked in the RDF graph
- SWDs are poorly linked due to the absence of an explicit hyperlink concept
- Ontologies are more interesting
- Approach
- Build inter-document relations
- Rational surfing model
Semantic Web Navigation Model (new!)
[Figure: navigation model diagram]
- Entry points: Term Search, Document Search, RDF Graph Navigation
- Nodes: URIref, Resource, Namespace, URL, RDF Document, Ontology
- Link labels: sameNamespace, sameLocalname, usesNamespace, isUsedBy, isDefinedBy, populatesClass, populatesProperty, refersClass, refersProperty, definesClass, definesProperty, rdfs:subClassOf, owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith, rdfs:seeAlso, rdfs:isDefinedBy
- Other labels: rdfsOntology, owldlOntology
Ranking -- research
- Surfing models
- Ranking method
- PageRank variation
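A PageRank variation in which a surfer prefers some link types over others can be sketched as weighted PageRank. The code below is a generic illustration, not Swoogle's actual ranking formula. The damping factor, the link weights and the document names are all assumptions.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """links: {source: {target: weight}}. Returns {node: rank}; ranks sum to 1."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random-jump share that every node receives.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            out = links.get(source, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[source] / n
                for node in nodes:
                    new_rank[node] += share
            else:
                # Follow each link with probability proportional to its weight.
                for target, weight in out.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# Hypothetical link graph: two instance documents that use one ontology.
LINKS = {
    "foaf.rdf": {"foaf-ontology": 1.0},
    "other.rdf": {"foaf-ontology": 1.0, "foaf.rdf": 0.5},
}

for doc, score in sorted(weighted_pagerank(LINKS).items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```

In this toy graph the ontology receives rank from both instance documents and ends up on top, in line with the observation that well-known ontologies rank highly.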
Ranking with Rational Surfing Model: An Example
[Figure: example graph with http://www.cs.umbc.edu/finin/foaf.rdf, rdf:type, foaf:Person, foaf:mbox, finin_at_umbc.edu]
Demo6
Swoogle top 10
Swoogle uses a PageRank-like algorithm to rank Semantic Web documents. Well-known ontologies are highly ranked.
This report is generated dynamically from the latest data and takes 5 to 10 seconds.
Statistics -- research
- Summarize the dataset collected by Swoogle
- Swoogle Watch
- Swoogle Today
- Distribution of visited URLs
- Document discovery log
- Term discovery log
- Semantic Web Watch
- SWD distribution by last-modified month
- SWD distribution by website
- SWD distribution by suffix
- Ontology Watch
- Term (class/property) usage
- Namespace usage
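The distribution reports listed above are essentially group-and-count operations over the collected URLs. This is a minimal sketch for the by-suffix and by-website views, assuming the input is a plain list of URL strings. The helper names and the sample URLs are illustrative.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def suffix_of(url):
    """File suffix of the URL path, lowercased, or '(none)' when there is none."""
    ext = posixpath.splitext(urlparse(url).path)[1]
    return ext.lstrip(".").lower() or "(none)"


def website_of(url):
    """Host part of the URL, lowercased."""
    return urlparse(url).netloc.lower()


def distribution(urls, key):
    """Count URLs per value of key(url)."""
    return Counter(key(url) for url in urls)


# Sample URLs, for illustration only.
URLS = [
    "http://foo.com/foaf.rdf",
    "http://www.cs.umbc.edu/finin/foaf.rdf",
    "http://example.org/onto.OWL",
    "http://xmlns.com/foaf/1.0/",
]

print(distribution(URLS, suffix_of))
print(distribution(URLS, website_of))
```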
Demo5(a)
Swoogle Today
Demo5(b)
Swoogle Statistics
[Chart labels: FOAF, Trustix, W3C, Stanford]
Demo5(c)
Swoogle Statistics
Miscellaneous
- Submit URL for focused crawler
- Swoogle Web Service (delivered in Sept.): http://swoogle.umbc.edu/webservice/
- Search document
- Search term
- Term digest
Demo7
Submit URL for focused crawler
When you can't find your ontologies in Swoogle, they may not have been indexed by Swoogle yet. Please submit them to increase their visibility.
When your query fails
From site map
3. Summary
Summary
2004
Swoogle (Mar, 2004)
- Automated SWD discovery
- SWD metadata creation and search
- Ontology rank (rational surfer model)
- Swoogle watch
- Web interface
Swoogle2 (Sep, 2004)
- Ontology dictionary
- Swoogle statistics
- Web service interface (WSDL)
- Bag of URIref IR search
2005
Swoogle3
- Better discovery and revisit strategies
- Better navigation models
- Semantic web dataset
- Index instance data
- More metadata (ontology mapping)
- Better web service interfaces
Current Status
- Swoogle Watch reported (Jan 6, 2005)
- 46.7 M triples
- 336 K SWDs (4 K ontologies)
- 153 K terms (94 K classes, 59 K properties)
- Ongoing work
- Research
- Self-adaptive SWD Discovery
- Efficient SWD digest and RDF Graph Abstract
- Semantic Web navigation model
- Engineering
- Enhancing Web Service interface