Scalability: A Semantic Web Perspective

1
Scalability: A Semantic Web Perspective
  • Jeff Heflin
  • Lehigh University

2
My Background
  • Semantic Web
  • eight years of experience
  • helped design DAML+OIL and OWL
  • research focus
  • scalable reasoning
  • distributed semantics
  • distributed queries

3
Reasoning vs. Resources
  • Lehigh University Benchmark
  • used to evaluate semantic web reasoning systems
  • becoming a de facto standard in the Semantic Web
    community
  • helping to push research on scalability
  • Features
  • OWL ontology for university domain (moderate
    complexity)
  • customizable data generation
  • can select number of universities and random
    number generator seed
  • arbitrary size
  • repeatable
  • plausible
  • real world constraints are applied
  • Naming Scheme
  • LUBM(# of universities, seed)
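
The "repeatable" property of LUBM(N, seed) can be illustrated with a seeded generator. This is only a minimal sketch: the function name, the URI pattern, and the department range are hypothetical and are not taken from the actual LUBM data generator.

```python
import random

def generate_universities(num_universities, seed):
    """Return a deterministic list of department URIs per university.

    Hypothetical stand-in for a benchmark data generator: the same
    (num_universities, seed) pair always yields the same data.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    universities = []
    for u in range(num_universities):
        num_departments = rng.randint(2, 4)  # hypothetical range, for illustration
        universities.append(
            [f"http://www.University{u}.example/Department{d}"
             for d in range(num_departments)]
        )
    return universities

# Two runs with the same parameters produce identical data sets.
first = generate_universities(3, seed=0)
second = generate_universities(3, seed=0)
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the output independent of any other random calls in the program.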

4
Metrics
  • initialization metrics
  • load time
  • total time to load input files and do any
    pre-processing
  • repository size
  • disk space utilized (for systems with secondary
    storage only)
  • query metrics
  • 14 queries that test a range of features
  • query response time
  • each query is executed 10 times and averaged to
    account for caching
  • degree of completeness
  • # of correct answers / # of entailed answers
  • degree of soundness
  • # of correct answers / # of returned answers
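
The query metrics above can be written down directly. The sketch below assumes answers are represented as sets of hashable values, and it treats an empty denominator as a score of 1.0; that convention is an assumption, not something the slides specify. The function names are illustrative.

```python
import time
from statistics import mean

def completeness(returned, entailed):
    """# of correct answers / # of entailed answers."""
    returned, entailed = set(returned), set(entailed)
    if not entailed:
        return 1.0  # assumed convention: nothing to find
    return len(returned & entailed) / len(entailed)

def soundness(returned, entailed):
    """# of correct answers / # of returned answers."""
    returned, entailed = set(returned), set(entailed)
    if not returned:
        return 1.0  # assumed convention: nothing wrong was returned
    return len(returned & entailed) / len(returned)

def average_response_time(run_query, repetitions=10):
    """Execute the query several times and average the elapsed seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return mean(timings)

# Example: 2 of 4 entailed answers found, plus one wrong answer returned.
entailed = {"a", "b", "c", "d"}
returned = {"a", "b", "x"}
c = completeness(returned, entailed)
s = soundness(returned, entailed)
```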

5
Experiment: Data Loading
  • Time: Sesame-mem is the fastest in loading up to
    10 univ.
  • DLDB scales better
  • OWLJessKB is the slowest, can only load 1 univ.
  • Space: DLDB also scales better
  • Only DLDB loaded 50 univ. in a day

6
Experiment: Query Time
7
Completeness vs. Response Time
8
State-of-the-art Scalable Reasoners
  • Sesame 2 (Aduna)
  • RDF(S): simple semantic net style reasoning
  • LUBM(500,0) (~70 million statements)
  • BRAHMS (U. of Georgia)
  • RDF(S): simple semantic net style reasoning
  • LUBM(700,0) (~100 million statements)
  • query time: 1-5 minutes
  • OWLIM (Ontotext)
  • complete for most of OWL Lite (a simple DL)
  • LUBM(300,0) (~40 million statements)
  • DLDB (Lehigh)
  • complete for a large subset of OWL DL (an
    expressive DL)
  • LUBM(100,0) (~13 million statements)
  • 45 million statements (real Semantic Web
    documents)

9
Challenges for MANET Reasoners
  • scalability
  • millions of devices each with potentially
    millions of facts (changing over time)
  • no big servers
  • complete reasoning is infeasible
  • how to find the best answers first?
  • other considerations
  • power consumption
  • bandwidth consumption
  • how do we empirically compare approaches?
  • develop a MANET benchmark
  • to drive research on reasoners for the unique
    aspects of this problem

10
MANET Benchmark
  • more distributed than LUBM
  • dynamically changing data?
  • ontologies
  • policy ontologies, device ontologies, etc.
  • new metrics
  • power consumption
  • bandwidth consumption
  • time to first answer
  • relative quality of answers

11
For more information...
  • My information
  • heflin_at_cse.lehigh.edu
  • http://www.cse.lehigh.edu/~heflin/
  • For more on the Semantic Web
  • http://www.semwebcentral.org/
  • http://www.w3.org/2001/sw/
  • http://www.daml.org/
  • http://www.semanticweb.org/

12
The End
13
Ontology
  • Definition
  • a logical theory that accounts for the intended
    meaning of a formal vocabulary (Guarino 98)
  • has a formal syntax and unambiguous semantics
  • inference algorithms can compute what logically
    follows
  • Relevance to Web
  • identify context
  • provide shared definitions
  • eases the integration of distinct resources

14
RDF and RDF Schema
Schema:
<rdf:Property rdf:ID="name">
  <rdfs:domain rdf:resource="#Person"/>
</rdf:Property>
<rdfs:Class rdf:ID="Chair">
  <rdfs:subClassOf rdf:resource="http://schema.org/gen#Person"/>
</rdfs:Class>

Data:
<rdf:RDF xmlns:g="http://schema.org/gen#"
         xmlns:u="http://schema.org/univ#">
  <u:Chair rdf:ID="john">
    <g:name>John Smith</g:name>
  </u:Chair>
</rdf:RDF>

[Diagram: the same statements drawn as a graph; node and edge labels include rdfs:Class, rdf:Property, rdf:type, rdfs:domain, rdfs:subClassOf, g:Person, u:Chair, g:name, and the literal "John Smith"]
15
URIs and Namespaces
  • URI
  • Uniform Resource Identifier
  • includes URLs
  • but also anything that you can design an
    identification scheme for
  • helps to prevent collision of names
  • all the symbols in RDF are either URIs or
    Literals
  • Namespace
  • a mechanism for abbreviating URIs
  • by assigning a prefix for a URI fragment
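
Namespace abbreviation amounts to a prefix lookup. The sketch below uses the `g` and `u` prefixes from the RDF example in this deck; the trailing `#` on each namespace URI and the function names are assumptions made for illustration.

```python
# Prefix table; the '#' separators are assumed, not shown on the slide.
PREFIXES = {
    "g": "http://schema.org/gen#",
    "u": "http://schema.org/univ#",
}

def expand(qname, prefixes=PREFIXES):
    """Expand an abbreviated name such as 'g:Person' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local

def abbreviate(uri, prefixes=PREFIXES):
    """Abbreviate a full URI using the first matching namespace, if any."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching namespace: leave the URI unchanged

full = expand("g:Person")
short = abbreviate("http://schema.org/univ#Chair")
```

Because two vocabularies can both define a local name such as `name`, the expanded URIs stay distinct even when the local parts collide.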

16
OWL
markup linked to semantics
  • Web Ontology Language
  • W3C Recommendation
  • released Feb. 2004
  • based on RDF

semantic markup:
<rdf:Description rdf:about="">
  <imports resource="www.books.com/bookont"/>
</rdf:Description>
<Book rdf:ID="book26489">
  <author>E.B. White</author>
  <title>Charlotte's Web</title>
  <price>6.99</price>
  <subject rdf:resource="bookont#FictionChild"/>
</Book>

bookont ontology (linked from the markup by imports):
<Class ID="Book"/>
<Property ID="subject">
  <domain resource="#Book"/>
  <range resource="#Topic"/>
</Property>
<Class ID="FictionChild">
  <subclassOf resource="#Fiction"/>
  <subclassOf resource="#Childrens"/>
</Class>
17
Species of OWL
  • OWL Full
  • very expressive (e.g., classes as instances)
  • theoretical properties not well understood
  • OWL DL
  • has a standard model theoretic semantics
  • OWL Lite
  • subset of OWL DL
  • easier to reason with

18
OWL Class Constructors
borrowed from Ian Horrocks
19
OWL Axioms
borrowed from Ian Horrocks
20
Benefit of Description Logic
  • optimized computation of subsumption
  • calculate implicit subClassOf relations
  • ontology integration
  • if two ontologies use class expressions to define
    their vocabularies in terms of a third ontology,
    then subsumption can be used to compute an
    integrated ontology

21
OWL RDF Syntax
  • <owl:Class rdf:ID="Band">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasMember"/>
          <owl:allValuesFrom rdf:resource="#Musician"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
  • A Band is a subset of the set of objects which
    only have Musicians as members

22
OWL Inference
<owl:ObjectProperty rdf:ID="head">
  <rdfs:subPropertyOf rdf:resource="#member"/>
</owl:ObjectProperty>
<owl:Class rdf:ID="Terrorist">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#member"/>
      <owl:someValuesFrom rdf:resource="#TerroristOrg"/>
    </owl:Restriction>
  </owl:equivalentClass>
</owl:Class>
  • The head of an organization is also a member of
    it
  • A member of a terror organization is a terrorist
  • Therefore, the head of a terror organization is a
    terrorist

[Diagram: Bin Laden --head--> Al Qaeda; Al Qaeda --type--> TerrorOrg; inferred: Bin Laden --type--> Terrorist]
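
The inference on this slide can be reproduced with a small forward-chaining loop over triples. This is an illustrative sketch, not how any particular OWL reasoner works: it hard-codes just two rule shapes (subproperty propagation and the "if" direction of an existential restriction), and all identifiers are made up for the example.

```python
def infer(triples, sub_properties, some_values_rules):
    """Apply two rule shapes until no new facts appear.

    sub_properties: pairs (p, q) meaning p is a subproperty of q.
    some_values_rules: triples (prop, filler, cls) meaning anything with
        a `prop` value of type `filler` is itself of type `cls`.
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for sub, sup in sub_properties:
                if p == sub:
                    new.add((s, sup, o))
            for prop, filler, cls in some_values_rules:
                if p == prop and (o, "type", filler) in facts:
                    new.add((s, "type", cls))
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new

facts = infer(
    triples=[("BinLaden", "head", "AlQaeda"),
             ("AlQaeda", "type", "TerroristOrg")],
    sub_properties=[("head", "member")],
    some_values_rules=[("member", "TerroristOrg", "Terrorist")],
)
```

The first pass derives the `member` triple from `head`; the second pass uses it together with the type of the organization to derive the `Terrorist` membership.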
23
A Web of Ontologies
[Diagram: a web of ontologies; nodes A1, A2, B1, B2, B3, C1, D1, E1, F1 and S1 to S5, connected by edges labeled "extends", "revises", and "commits to"]
24
Criticisms of the Semantic Web
  • Who will create all of the RDF/OWL data?
  • How do you integrate heterogeneous ontologies?
  • How can you handle spam / deceit /
    misinformation?
  • How can a system based on formal logic achieve
    Web scale?

25
Semantic Web Scalability
  • Questions
  • what inference algorithms are best for large
    scale data?
  • can AI reasoning be combined with databases to
    achieve the best of both worlds?
  • how do we accurately evaluate systems when there
    is relatively little real world data available?
  • how do we compare systems with very different
    capabilities?

26
DLDB
  • approach
  • lightweight coupling of a database and a
    description logic reasoner
  • optimized table design
  • implementation
  • DL: Description Logics (FaCT reasoner)
  • rich inference capability
  • close correspondence to semantics of OWL
  • DB: Relational Database (Microsoft Access)
  • ubiquitous DBMS for small to medium size databases

27
Design: RDF(S) Entailment
  • Use views to reason about class membership

<owl:Class rdf:ID="Student"/>
<owl:Class rdf:ID="UndergraduateStudent">
  <rdfs:subClassOf rdf:resource="#Student"/>
</owl:Class>

CREATE VIEW Student_view AS
  SELECT * FROM Student
  UNION
  SELECT * FROM UndergraduateStudent_view
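
The view-based approach can be tried end to end with an in-memory database. The sketch below swaps in SQLite (via Python's `sqlite3` module) for the DBMS used in the talk; the single `id` column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per class holds the explicitly asserted instances.
    CREATE TABLE Student (id TEXT PRIMARY KEY);
    CREATE TABLE UndergraduateStudent (id TEXT PRIMARY KEY);

    -- A leaf class's view is just its own table.
    CREATE VIEW UndergraduateStudent_view AS
        SELECT * FROM UndergraduateStudent;

    -- A superclass's view unions its table with its subclasses' views.
    CREATE VIEW Student_view AS
        SELECT * FROM Student
        UNION
        SELECT * FROM UndergraduateStudent_view;

    INSERT INTO Student VALUES ('alice');
    INSERT INTO UndergraduateStudent VALUES ('bob');
""")

# 'bob' was only asserted to be an UndergraduateStudent, yet the view
# reports him as a Student as well.
students = sorted(row[0] for row in conn.execute("SELECT id FROM Student_view"))
```

Queries against `Student_view` pick up subclass instances without copying any rows, so the entailment is computed at query time rather than at load time.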
28
Design: OWL Entailment
Ontology:
  Student ≡ Person who takes a Course
  GraduateStudent ≡ Person who takes a GraduateCourse
  GraduateCourse ⊑ Course
DL Reasoner -> Inferred Hierarchy:
  GraduateStudent ⊑ Student
Table view creation (database operation):
  CREATE VIEW Student_1_view AS
    SELECT * FROM Student_1
    UNION SELECT * FROM UndergraduateStudent_1_view
    UNION SELECT * FROM GraduateStudent_1_view
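
Once a DL reasoner has produced the inferred hierarchy, the view definitions can be generated mechanically. The sketch below assumes the hierarchy arrives as a mapping from each class to its direct subclasses; the function name is hypothetical, and the `_1` suffix is simply copied from the slide and passed in as a parameter.

```python
def view_statements(hierarchy, root, suffix="_1"):
    """Emit CREATE VIEW statements for `root` and everything below it.

    hierarchy maps a class name to a list of its direct subclasses.
    Subclass views are emitted before the views that reference them.
    """
    statements = []
    seen = set()

    def visit(cls):
        if cls in seen:  # a class with several parents is emitted once
            return
        seen.add(cls)
        children = hierarchy.get(cls, [])
        for child in children:
            visit(child)
        selects = [f"SELECT * FROM {cls}{suffix}"]
        selects += [f"SELECT * FROM {child}{suffix}_view" for child in children]
        statements.append(
            f"CREATE VIEW {cls}{suffix}_view AS " + " UNION ".join(selects)
        )

    visit(root)
    return statements

inferred = {"Student": ["UndergraduateStudent", "GraduateStudent"]}
stmts = view_statements(inferred, "Student")
```

The last statement in `stmts` matches the `Student_1_view` definition shown on the slide; the earlier ones define the subclass views it depends on.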
29
Implementation: DB Schema
Student_1
Ontologies_Index
Source_Index
TakeCourse_1
URI_Index
30
Implementation: Query
Query Interface (application) -> Query API -> Query Translation Algorithm -> SQL sentences -> RDBMS
KIF-like conjunctive query:
  (Type GraduateStudent ?X)
  (TakeCourse ?X http://www.foo.edu/dept0/course0)
Translated SQL:
  SELECT GraduateStudent_2_view.ID
  FROM GraduateStudent_2_view, takeCourse_2_view
  WHERE GraduateStudent_2_view.id = takeCourse_2_view.subject
    AND takeCourse_2_view.object = 'http://www.foo.edu/dept0/course0'
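
A translation of this kind can be sketched in a few lines. This is not DLDB's actual algorithm, only an illustration under stated assumptions: class views expose an `id` column, property views expose `subject` and `object` columns, view names follow the `<Name>_2_view` pattern from the slide, and the query contains at least one variable. Constants are passed as SQL parameters rather than inlined.

```python
def translate(atoms, suffix="_2"):
    """Translate a conjunctive query into (sql, params).

    Each atom is ("Type", class_name, term) or (property, subject, object).
    Terms starting with '?' are variables; anything else is a constant.
    """
    tables, conditions, params = [], [], []
    bindings = {}  # variable -> first column that binds it

    def bind(term, column):
        if term.startswith("?"):
            if term in bindings:
                # Repeated variable: join on equality with its first column.
                conditions.append(f"{bindings[term]} = {column}")
            else:
                bindings[term] = column
        else:
            conditions.append(f"{column} = ?")
            params.append(term)

    for i, atom in enumerate(atoms):
        alias = f"t{i}"
        if atom[0] == "Type":
            _, cls, term = atom
            tables.append(f"{cls}{suffix}_view AS {alias}")
            bind(term, f"{alias}.id")
        else:
            prop, subj, obj = atom
            tables.append(f"{prop}{suffix}_view AS {alias}")
            bind(subj, f"{alias}.subject")
            bind(obj, f"{alias}.object")

    select = ", ".join(f"{col} AS {var[1:]}" for var, col in bindings.items())
    sql = f"SELECT DISTINCT {select} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

sql, params = translate([
    ("Type", "GraduateStudent", "?X"),
    ("TakeCourse", "?X", "http://www.foo.edu/dept0/course0"),
])
```

Each atom becomes one aliased view in the FROM clause, each repeated variable becomes a join condition, and each constant becomes an equality test, which mirrors the shape of the SQL shown on the slide.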
31
Benchmark System
32
Initial Experiment
  • Conducted in 2004
  • Four systems tested
  • Sesame Memory, Sesame DB, OWLJessKB, DLDB
  • Five data sizes
  • ranging from 15 files (8 MB) to 999 files (583
    MB)
  • Test Environment
  • 1.8 GHz CPU / 256 MB memory / 80 GB disk / Windows XP Pro
  • JDK 1.4.1, 512MB max heap size, (1 GB for
    OWLJessKB)
  • note, this is a very inexpensive platform

33
Results - Completeness
34
Results - Soundness
  • Sesame and DLDB were sound on all queries
  • OWLJessKB was unsound on some queries
  • this problem has been fixed in the most recent
    release of OWLJessKB

35
Results - Query Time Scaling
  • On some queries DLDB did better; on others
    Sesame-DB did better

36
Results - Overall
37
High Performance DLDB
  • Sun W2100z workstation
  • dual 64-bit Opteron / 2GB / Solaris10
  • RDBMS: PostgreSQL 8.0
  • Racer on 2.4GHz / 256MB / Windows XP
  • Racer not available for Solaris
  • Two machines are connected via 100 Base-T
    Ethernet
  • Additional features
  • support for owl:inverseOf
  • complete on 12 out of 14 queries.

38
Improved Scalability
  • Unlike MS Access, PostgreSQL has no limitations
    on number of tables and DB sizes
  • Conducted experiment with up to 13 million
    triples
  • The load times grew about proportionally to the
    dataset sizes
  • 2 GB disk space for the largest data set

39
Real Semantic Web Data
  • Used Swoogle's crawl of Semantic Web documents
    (RDF and OWL)
  • High performance DLDB loaded 343,977 SW documents
    in 15.6 days
  • 41,741 ontologies
  • 45 million triples were stored using 8 GB
  • 50,976 classes and 24,094 properties
  • Sample queries showed reasonable response time
  • many queries under a second

40
Benchmark Architecture
41
Determination of Query Completeness and Soundness
42
Experimental Results and Their Interpretation
  • Combined metric (multi-metrics, multi-datasets)

43
Recent Work
  • High performance DLDB
  • DLDB's architecture allows easy composition of
    any SQL-compliant RDBMS and DIG-compliant DL
    reasoner.
  • Benchmarking of other systems

44
Improved Scalability (II)
  • Most query response times increase linearly as
    the data set size increases.
  • DLDB added support for owl:inverseOf, making it
    complete on 12 out of 14 queries.

45
Knowledge Acquisition
  • data
  • create or find relevant ontology
  • then either
  • convert existing forms to RDF
  • e.g., XML, relational DBs, CGs, etc.
  • information extraction
  • natural language processing
  • controlled English? (Sowa, yesterday)
  • ontologies
  • import existing ontologies
  • manual creation (e.g., Protégé)
  • machine learning
  • formal concept analysis? (Rudolph, yesterday)

46
Semantic Web Timeline
  • Mar. 1996: SHOE 0.90 (simple frames in HTML)
  • Jan. 1998: SHOE 1.0 (frames + Horn logic)
  • Feb. 1998: XML (semi-structured data for Web)
  • Sep. 1998: Berners-Lee's Semantic Web Roadmap
  • Feb. 1999: RDF (semantic nets in XML)
  • Mar. 2001: DAML+OIL (expressive DL in RDF)
  • May 2001: Berners-Lee et al. Scientific American article
  • June 2002: 1st Int'l Semantic Web Conference
  • Feb. 2004: OWL (W3C Rec.)
47
Semantic Web Challenges
  • The Web is distributed
  • many sources, varying authority
  • inconsistency
  • The Web is dynamic
  • representational needs may change
  • The Web is enormous
  • systems must scale well
  • The Web is an open-world