Title: Presented by eBiquity group, UMBC
1 Swoogle
search and metadata for the semantic web
- Presented by eBiquity group, UMBC
- CIKM04, Nov 12, 2004
Partial research support was provided by DARPA
contract F30602-00-0591 and by NSF awards
NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
2 Outline
- Motivation
- Concepts
- Demo
- Architecture
- document discovery
- metadata creation
- ontology rank
- Status
- Summary
http://swoogle.umbc.edu/
3 Motivation
- Google plus the Web has made us all smarter
- People and software agents need something similar for information on the Semantic Web
4 Motivation: Common Questions
- Find an ontology
- Which ontologies are about time?
- Shall I use an existing ontology or create one?
- Find instance data
- Show me the instances of the class http://foo.com/Person
- Gather relevant information for my application
- Characterize the Semantic Web
- How many RDF documents are online?
- What are the most popular ontologies?
- What graph properties does the Semantic Web have?
- Does a namespace URI link to the corresponding ontology?
5 The Role of Swoogle in the Semantic Web
[Figure: Swoogle's role in the Semantic Web]
6 Related work
- Ontology based annotation search
- Annotate web documents
- SHOE (UMCP, 1997)
- Ontobroker (AIFB, Karlsruhe, 1998)
- WebKB (Martin & Eklund, 1999)
- QuizRDF (BT, 2002)
- Annotate proper reference relations
- CREAM (AIFB, 2003)
- Ontology repositories
- Ontology level
- DAML Ontology Library
- Schema Web
- SemWebCentral
- Term level
- W3C's Ontaria (2004)
- Ontology management systems
- Stanford's Ontolingua
- IBM's Snobase
Swoogle aims to be a Google-like online ontology repository
- Based on both ontology and instance documents
- Automated discovery
- Search and rank ontologies and terms
- Digest but not store
- Create metadata based on RDF and OWL semantics
- Provide services to both humans and software agents
7 Concepts
- Document
- A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e., RDF and OWL).
- An ontology document (SWO) is a SWD that contains mostly term definitions (i.e., classes and properties). It corresponds to the T-Box in Description Logic.
- An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
- Term
- A term is a non-anonymous RDF resource that is the URI reference of either a class or a property.
- Individual
- An individual is a non-anonymous RDF resource that is the URI reference of a class member.
In Swoogle, a document D is a valid SWD iff Jena correctly parses D and produces at least one triple.
Jena is a Java framework for writing Semantic Web applications: http://www.hpl.hp.com/semweb/jena2.htm
[Figure: two example triples, foaf:Person rdf:type rdfs:Class (a term) and http://.../foaf.rdf#finin rdf:type foaf:Person (an individual)]
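The validity rule above can be sketched in a few lines. Swoogle itself uses Jena (Java); the sketch below is in Python and swaps in a deliberately tiny, hypothetical N-Triples line parser (`parse_ntriples`) as a stand-in. It only illustrates the rule "parses cleanly and yields at least one triple" and does not reproduce Jena's behavior.

```python
import re

# Hypothetical stand-in for a real RDF parser: accepts only simple
# N-Triples lines of the form  <s> <p> <o> .  or  <s> <p> "literal" .
_TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\w+)\s+(<[^>]*>)\s+(<[^>]*>|_:\w+|"[^"]*")\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object); raise ValueError on bad input."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = _TRIPLE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        triples.append(match.groups())
    return triples


def is_valid_swd(text, parser=parse_ntriples):
    """A document is a valid SWD iff it parses and yields at least one triple."""
    try:
        return len(parser(text)) >= 1
    except ValueError:
        return False
```

Passing a different `parser` callable would let the same check wrap a full RDF/XML parser.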
8 Concepts: Example
[Figure: the SWD http://foo.com/foaf.rdf is an instance document (SWI). It describes the individual http://foo.com/foaf.rdf#finin, which has rdf:type foaf:Person and foaf:mbox finin_at_umbc.edu. The SWD http://xmlns.com/foaf/1.0/ is an ontology document (SWO). It defines the terms foaf:Person (a class: rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and foaf:mbox (a property: rdf:type rdf:Property, with an rdfs:domain statement).]
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:
rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs => http://www.w3.org/2000/01/rdf-schema#
foaf => http://xmlns.com/foaf/1.0/
wordNet => http://xmlns.com/wordnet/1.6/
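The QName convention in the note above amounts to a prefix table plus string concatenation. A minimal sketch, assuming the four prefixes from the note (the trailing `#` on the rdf and rdfs namespaces is the standard form of those URIs); the helper names `expand` and `shorten` are illustrative only.

```python
# Prefix table taken from the NOTE on this slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand(qname, prefixes=PREFIXES):
    """Turn 'foaf:Person' into the full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI reference into a QName, or return it unchanged."""
    best = None
    for prefix, namespace in prefixes.items():
        # Prefer the longest matching namespace.
        if uri.startswith(namespace) and (
            best is None or len(namespace) > len(prefixes[best])
        ):
            best = prefix
    if best is None:
        return uri
    return f"{best}:{uri[len(prefixes[best]):]}"
```

For example, `expand("foaf:Person")` yields the URI under the foaf namespace listed in the note, and `shorten` reverses it.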
9 Demo
1. Find Time Ontology (Swoogle Search)
2. Digest Time Ontology
- Document view
- Term view
3. Find Term Person (Ontology Dictionary)
4. Digest Term Person
- Class properties
- (Instance) properties
5. Swoogle Statistics
10 Find Time Ontology
Demo 1
A set of keywords can be used to search for an ontology. For example, time, before, and after are basic concepts of a Time ontology.
11 Usage of Terms in SWD
[Figure: two instance documents, http://www.cs.umbc.edu/finin/foaf.rdf and http://foo.com/foaf.rdf, each state rdf:type foaf:Person and foaf:mbox finin_at_umbc.edu about an individual such as http://foo.com/foaf.rdf#finin (defined Individual). This makes foaf:Person a populated Class and foaf:mbox a populated Property. The ontology http://xmlns.com/foaf/1.0/ states foaf:Person rdf:type rdfs:Class (defined Class, with rdfs:subClassOf wordNet:Agent) and foaf:mbox rdf:type rdf:Property (defined Property, with an rdfs:domain statement).]
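The labels in this figure (defined/populated Class, defined/populated Property, defined Individual) can be read as a classification of how each term is used in a set of triples. The sketch below is one possible reading, not Swoogle's implementation: the function name `term_usage`, the use of QName strings, and the decision to skip schema predicates such as rdfs:subClassOf are all assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
CLASS_TYPES = {"rdfs:Class"}        # assumption: other class types could be added
PROPERTY_TYPES = {"rdf:Property"}
# Assumption: schema-level predicates are not counted as populated properties.
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}


def term_usage(triples):
    """Classify terms and individuals by how (subject, predicate, object) triples use them."""
    usage = {
        "defined_class": set(),
        "defined_property": set(),
        "populated_class": set(),
        "populated_property": set(),
        "defined_individual": set(),
    }
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in CLASS_TYPES:
            usage["defined_class"].add(subject)
        elif predicate == RDF_TYPE and obj in PROPERTY_TYPES:
            usage["defined_property"].add(subject)
        elif predicate == RDF_TYPE:
            # The subject is an instance, so the object class is populated.
            usage["populated_class"].add(obj)
            usage["defined_individual"].add(subject)
        elif predicate not in SCHEMA_PREDICATES:
            usage["populated_property"].add(predicate)
    return usage


# Triples transcribed from the figure, written with QNames.
EXAMPLE = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin_at_umbc.edu"),
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
```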
12 Digest Time Ontology (term view)
Demo 2(a)
[Screenshot: term view listing terms such as TimeZone, before, ..., intAfter]
13 Document Metadata
- Web document metadata
- When/how discovered/fetched
- Suffix of URL
- Last modified time
- Document size
- SWD metadata
- Language features
- OWL species
- RDF encoding
- Statistical features
- Defined/used terms
- Declared/used namespaces
- Ontology Ratio
- Ontology Rank
- Ontology annotation
- Label
- Version
- Comment
- Related Relational Metadata
- Links to other SWDs
- Imported SWDs
- Referenced SWDs
- Extended SWDs
- Prior version
- Links to terms
- Classes/Properties defined/used
14 Digest Time Ontology (document view)
Demo 2(b)
15 Find Term Person
Demo 3
Not capitalized! URIref is case sensitive!
16 Term Metadata: An integrated definition
Term: foaf:Person
- Class Definition
- rdfs:subClassOf -- foaf:Agent
- rdfs:label -- Person
- Properties (from SWO)
- foaf:mbox
- foaf:name
- Properties (from SWI)
- foaf:name
- dc:title
17 Digest Term Person
Demo 4
167 different properties
562 different properties
18 Swoogle Statistics
Demo 5
19 Swoogle Architecture
[Figure: architecture diagram with four component groups]
- SWD discovery: The Web, Web Crawler, Candidate URLs, SWD Reader
- metadata creation: SWD Cache, SWD Metadata
- data analysis: IR analyzer, SWD analyzer
- interface: Web Server, Web Service, Agent Service
20 1. SWD Discovery
- Swoogle uses three crawlers to discover likely SWD URLs
- A Google Crawler uses Google to find URLs using
- keywords: http://www.w3.org/2000/01/rdf-schema, ...
- file type suffixes: .rdf, .owl
- A Focused Crawler crawls through HTML files recursively within a given website.
- A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
- To determine the likely SWD URLs
- Non-SWD extension filter: .jpg, .mp3, etc.
- Protocol filter: file://, urn:, etc.
- Namespace of RDF resources in SWD
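The extension and protocol filters listed above can be sketched as a single predicate over candidate URLs. This is a minimal illustration: the slide names only .jpg, .mp3, file:// and urn as examples, so the sets below contain just those entries, and the function name `is_candidate_swd_url` is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples named on the slide; a real crawler would list more.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url):
    """Return False for URLs that the two filters rule out as likely SWDs."""
    parts = urlparse(url)
    if parts.scheme.lower() in REJECTED_SCHEMES:
        return False  # protocol filter
    extension = posixpath.splitext(parts.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS  # extension filter
```

URLs that pass both filters are only candidates; whether they are SWDs is still decided by fetching and parsing them.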
21 2. Metadata Creation
- Document metadata
- General metadata
- SWD metadata
- Ontology metadata
- Term metadata (definition)
- Class property
- (Instance) property, i.e., class-property bond
- Relational metadata
- Term to Term: rdfs:subClassOf, rdfs:domain
- Term to Document: rdfs:seeAlso, ...
- Document to Term: Uses, Defines, ...
- Document to Document: owl:imports, ...
22 2.1 Ontology Ratio
- Why?
- The distinction between ontology and instance documents is fuzzy
- Given a SWD foo, let
- C(foo) be the set of classes defined in foo
- P(foo) be the set of properties defined in foo
- I(foo) be the set of instances defined in foo
- Ontology Ratio is a heuristic for the classification
- 0: pure SWI
- 1: pure SWO
- > 0.8: foo is said to be an ontology.
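The slide gives the three sets and the endpoints (0 for a pure SWI, 1 for a pure SWO) but not the formula itself. A ratio consistent with those endpoints is (|C| + |P|) / (|C| + |P| + |I|); the sketch below assumes that form, and the function names are illustrative.

```python
def ontology_ratio(n_classes, n_properties, n_instances):
    """Assumed form: share of defined terms among all defined resources."""
    total = n_classes + n_properties + n_instances
    if total == 0:
        raise ValueError("document defines no classes, properties, or instances")
    return (n_classes + n_properties) / total


def is_ontology(ratio, threshold=0.8):
    """The slide's rule: a ratio above 0.8 marks the document as an ontology."""
    return ratio > threshold
```

A document defining 8 classes, 1 property, and 1 instance has ratio 0.9 under this assumption and would be treated as an ontology; a document with only instances has ratio 0.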
23 2.2 Relational Metadata
- Inter-document relation
- rdfs:seeAlso
- IMport (IM), e.g., owl:imports
- Similar/Equal SWD
- Inter-term relation
- EXtension (EX), e.g., rdfs:subClassOf
- use-TerM (TM), e.g., rdfs:range
- use-INdividual (IN), e.g., owl:sameAs
- Prior Version (PV, IPV, CPV)
- Generalized inter-document relations
- Generalized from individual-level relations
- Capture more relations with less complexity
- Usage
- Link SWDs
- Ontology rank
24 3. Data analysis: Ranking SWDs
- Why?
- Ranking captures page importance and popularity
- Ranking has proven useful in HTML search.
- A SWD differs from an HTML page and carries more semantics
- So a new SWD ranking mechanism is needed!
- Related ideas?
- Google's PageRank
- Kleinberg's HITS
25 3.1 Random surfer model (PageRank)
- How is PageRank computed?
- Page A's rank is
$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
- where
- T_1, ..., T_n are the pages that link to A
- C(X) is the number of page X's out-links
- d is a damping factor (e.g., 0.85)
- Compute by iterating until convergence
- Following every link with uniform probability is the convention on the Web but not on the SW
- Links have semantics that influence the probability of following them
- Rational users read an ontology and all ontologies it references.
[Flowchart: jump to a random page, read page, bored? If yes, jump to a random page; if no, follow a random link.]
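The "iterate until convergence" step can be sketched as follows, using the classic PageRank update with damping factor d and out-link count C(X). This is a minimal illustration of the random surfer model, not Swoogle's code; the function name and the tolerance parameter are assumptions, and pages without out-links simply pass on no rank.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=1000):
    """out_links maps each page to the list of pages it links to."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    # Invert the graph: for each page, which pages link to it (the T_i).
    in_links = {page: [] for page in pages}
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].append(source)

    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {
            page: (1 - d)
            + d * sum(rank[t] / len(out_links[t]) for t in in_links[page])
            for page in pages
        }
        converged = max(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

With two pages A and C that each link only to B, A and C keep the base rank 1 - d = 0.15 and B receives 0.15 + 0.85 * (0.15 + 0.15) = 0.405.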
26 3.2 Rational Random Surfer Model
- Weighted random behavior
- Rational behavior
- Rank of a SWI
- Rank of a SWO, where TC(A) is the transitive closure of SWOs referencing A.
[Flowchart: (1) jump to a random page, read page, SWO? If yes, (2) read referenced SWOs. Bored? If yes, jump to a random page; if no, (1) follow a random link.]
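One way to read "weighted random behavior" is that the surfer follows a link with probability proportional to a weight attached to its type, instead of uniformly. The sketch below illustrates that reading only. The weight values in `LINK_WEIGHTS` are invented for the example, the link-type codes (IM, EX, TM, IN) are borrowed from the relational metadata slide, and the update rule is an assumption rather than Swoogle's formula.

```python
# Hypothetical weights per link type; the values are made up for illustration.
LINK_WEIGHTS = {"IM": 4.0, "EX": 3.0, "TM": 2.0, "IN": 1.0}


def weighted_rank(links, d=0.85, iterations=100):
    """links is a list of (source, target, link_type) tuples."""
    docs = {doc for source, target, _ in links for doc in (source, target)}

    # Total outgoing weight of each document, used to normalize probabilities.
    out_weight = {doc: 0.0 for doc in docs}
    for source, _, kind in links:
        out_weight[source] += LINK_WEIGHTS[kind]

    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1 - d for doc in docs}
        for source, target, kind in links:
            share = LINK_WEIGHTS[kind] / out_weight[source]
            new_rank[target] += d * rank[source] * share
        rank = new_rank
    return rank
```

With these made-up weights, a document reached through an EX link receives a larger share of its source's rank than one reached through a TM link from the same source.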
27 3.3 Ontology Rank Example
[Figure: the instance document http://www.cs.umbc.edu/finin/foaf.rdf, which uses rdf:type, foaf:Person, and foaf:mbox (finin_at_umbc.edu)]
28 3.3 Ontology Rank Example (cont'd)
- http://www.w3.org/2000/01/rdf-schema: rawPR 300, PR 403
- http://xmlns.com/wordnet/1.6/: rawPR 3, PR 103
- http://xmlns.com/foaf/1.0/: rawPR 100, PR 100
- http://www.cs.umbc.edu/finin/foaf.rdf: rawPR 0.2, PR 0.2
[Figure: the documents are connected by links labeled TM (use-term) and EX (extension)]
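The numbers in this example are consistent with reading the rank of a SWO as its own rawPR plus the rawPR of every SWO that references it directly or transitively (403 = 300 + 3 + 100 and 103 = 3 + 100), while a SWI keeps its rawPR. The sketch below reproduces the arithmetic under that reading. The reference edges in `REFERENCES` are an assumption, chosen as the simplest chain that yields the stated values, and the function names are illustrative.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema"
WORDNET = "http://xmlns.com/wordnet/1.6/"
FOAF = "http://xmlns.com/foaf/1.0/"
FININ = "http://www.cs.umbc.edu/finin/foaf.rdf"

RAW_PR = {RDFS: 300, WORDNET: 3, FOAF: 100, FININ: 0.2}

# Assumed SWO-to-SWO references (keys are the SWOs): foaf -> wordnet -> rdfs.
REFERENCES = {RDFS: set(), WORDNET: {RDFS}, FOAF: {WORDNET}}


def referencing_closure(target, references):
    """All SWOs that reference `target` directly or transitively."""
    closure, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for source, targets in references.items():
            if current in targets and source != target and source not in closure:
                closure.add(source)
                frontier.append(source)
    return closure


def ontology_rank(doc, raw_pr=RAW_PR, references=REFERENCES):
    """SWOs add the rawPR of their referencing closure; other documents keep rawPR."""
    if doc not in references:
        return raw_pr[doc]
    return raw_pr[doc] + sum(raw_pr[x] for x in referencing_closure(doc, references))
```

Under these assumptions the function returns 403, 103, 100, and 0.2 for the four documents, matching the PR values listed above.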
29 Current Status
- Swoogle Watch reported (Nov 7, 2004)
- 40 M triples
- 270 K SWDs, 4 K ontologies
- 144 K terms, 91 K classes, 51 K properties
- Ongoing work
- Ontology Dictionary
- Swoogle Statistics
- Web Service interface (see Swoogle website)
- IR with the Semantic Web (Content search)
- Character N-Grams
- Bag of URIrefs
- Swangling
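Two of the content-search ideas listed above, character n-grams and a bag of URIrefs, are ways of turning a document into countable tokens. A minimal sketch, assuming lowercased trigrams and treating any triple component that starts with "http://" as a URIref; both choices and the function names are illustrative, not Swoogle's settings.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def bag_of_urirefs(triples):
    """Count every URIref that appears as subject, predicate, or object."""
    return Counter(
        part
        for triple in triples
        for part in triple
        if part.startswith("http://")
    )
```

Either counter can then be fed to a conventional IR index in place of word tokens.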
30 Summary
Swoogle (Mar, 2004)
- Automated SWD discovery
- SWD metadata creation and search
- Ontology rank (rational surfer model)
- Swoogle watch
- Web Interface
Swoogle2 (Sep, 2004)
- Ontology dictionary
- Swoogle statistics
- Web service interface (WSDL)
- Bag of URIref IR search
Swoogle3 (2005)
- Better crawl refresh strategies
- More metadata (ontology mapping)
- More IR features
- Better web service interfaces
- Capture and store all triples
- More reasoning
31 The End
Questions?
- Website: http://swoogle.umbc.edu
- Slides: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
- Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/