Title: RDFS Reasoning and Query Answering on Top of DHTs
1RDFS Reasoning and Query Answering on Top of DHTs
- Zoi Kaoudi, Iris Miliaraki and Manolis Koubarakis
- Department of Informatics and Telecommunications
- National and Kapodistrian University of Athens
2Outline
- Introduction
- Background
- Algorithms
- Evaluation
- Future Work
3Introduction
- RDFS reasoning essential for Semantic Web applications
- Centralized RDF stores
- Forward chaining
- Backward chaining
- Hybrid approach
- Time-space trade-off
4Introduction
- DHT-based RDF stores
- No support for RDFS reasoning (RDFPeers, papers by Liarou et al., etc.)
- Only BabelPeers [Battre06] has considered RDFS reasoning, and with a forward chaining approach only
5Our work
- Implementation of both forward chaining and backward chaining algorithms in a real DHT system that enables RDFS reasoning
- Comparative study of algorithms
- Analytically
- Experimentally
6Background in DHTs
- Structured overlay networks
- Solve the item location problem in a distributed and dynamic network of nodes (in O(log N) hops)
- Let x be some data item. Find the node that holds x!
- For data items we use a key (K) to compute an identifier (id)
- Distributed version of the hash table data structure
- id = Hash(K)
- Main operations
- Put(K, x): given a key K (for a data item x), map the key onto a node.
- Get(K): return the data item with the given key.
- O(log N) hops
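The Put/Get interface above can be sketched in a few lines of Python. This is a single-process toy, not the Bamboo DHT used later in the talk: the names `ToyDHT`, `hash_id` and `responsible_node`, the 32-bit identifier space and the successor-style key assignment are all assumptions made for illustration, and the O(log N) routing itself is not modeled.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # assumed identifier length for this sketch


def hash_id(key: str) -> int:
    """Map a key K to an identifier: id = Hash(K)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyDHT:
    """In-memory stand-in for a DHT; a real overlay would route in O(log N) hops."""

    def __init__(self, node_names):
        # Nodes get identifiers from the same space as the keys.
        self.ring = sorted((hash_id(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        """First node whose id is >= Hash(key), wrapping around the ring (assumed policy)."""
        ids = [node_id for node_id, _ in self.ring]
        pos = bisect_left(ids, hash_id(key)) % len(self.ring)
        return self.ring[pos][1]

    def put(self, key: str, item) -> None:
        """Put(K, x): store item x at the node responsible for key K."""
        self.storage[self.responsible_node(key)].setdefault(key, []).append(item)

    def get(self, key: str):
        """Get(K): return the items stored under key K."""
        return list(self.storage[self.responsible_node(key)].get(key, []))


dht = ToyDHT([f"node{i}" for i in range(8)])
dht.put("painter", ("painter", "rdfs:subClassOf", "artist"))
```

Calling `dht.get("painter")` returns the stored triple from whichever node is responsible for the key `painter`; the caller never needs to know which node that is.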
7Data Model
- RDF data and RDFS descriptions can be written as RDF triples
- RDF(S) database
- RDFS entailment rules from W3C RDF Semantics
- Not considered in this paper
- Axiomatic triples
- Rules with blank nodes
8RDFS Entailment Rules
- subClass(X, Y) :- triple(X, rdfs:subClassOf, Y).
- subClass(X, Y) :- triple(X, rdfs:subClassOf, Z), subClass(Z, Y).
- subProperty(X, Y) :- triple(X, rdfs:subPropertyOf, Y).
- subProperty(X, Y) :- triple(X, rdfs:subPropertyOf, Z), subProperty(Z, Y).
- type(X, Y) :- triple(X, rdf:type, Y).
- type(X, Y) :- type(X, Z), subClass(Z, Y).
- type(X, Y) :- triple(X, P, Z), triple(P, rdfs:domain, Y).
- type(X, Y) :- triple(Z, P, X), triple(P, rdfs:range, Y).
edb relation: triple
idb relations: subClass, subProperty, type
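To make the rule set concrete, here is a minimal centralized sketch that evaluates the three idb relations over a set of triples. It is not the distributed algorithm of the talk; the function names and the instance data (`rubens`, `picasso`, `paints`, `guernica`) are hypothetical and exist only for this example.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROPERTY = "rdfs:subPropertyOf"
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def transitive_closure(pairs):
    """Naive fixpoint: keep adding (x, y) whenever (x, z) and (z, y) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, z) in list(closure):
            for (z2, y) in list(closure):
                if z == z2 and (x, y) not in closure:
                    closure.add((x, y))
                    changed = True
    return closure


def rdfs_closure(triples):
    """Compute subClass, subProperty and type from the edb relation triple."""
    sub_class = transitive_closure({(s, o) for s, p, o in triples if p == SUBCLASS})
    sub_property = transitive_closure({(s, o) for s, p, o in triples if p == SUBPROPERTY})
    domain = {(s, o) for s, p, o in triples if p == DOMAIN}
    range_ = {(s, o) for s, p, o in triples if p == RANGE}
    types = {(s, o) for s, p, o in triples if p == TYPE}
    for s, p, o in triples:
        types |= {(s, c) for (prop, c) in domain if prop == p}   # domain rule
        types |= {(o, c) for (prop, c) in range_ if prop == p}   # range rule
    # type(X, Y) :- type(X, Z), subClass(Z, Y).
    # sub_class is already transitively closed, so a single pass is enough.
    types |= {(x, y) for (x, z) in types for (z2, y) in sub_class if z == z2}
    return sub_class, sub_property, types


example = {
    ("flemish", SUBCLASS, "painter"),
    ("painter", SUBCLASS, "artist"),
    ("artist", SUBCLASS, "person"),
    ("rubens", TYPE, "flemish"),          # hypothetical instance
    ("paints", DOMAIN, "painter"),        # hypothetical property
    ("picasso", "paints", "guernica"),    # hypothetical instance data
}
sub_class, sub_property, types = rdfs_closure(example)
```

With this data, `sub_class` contains the inferred pair `("flemish", "person")`, and `types` contains `("rubens", "person")` via the subclass rule and `("picasso", "artist")` via the domain rule followed by the subclass rule.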
9Indexing
triple t = (s1, p1, o1)
- Index identifier Hash(s1): t is stored at the responsible node for s1
- Index identifier Hash(p1): t is stored at the responsible node for p1
- Index identifier Hash(o1): t is stored at the responsible node for o1
query q = (?s, p1, ?o)
- Index identifier Hash(p1): q is sent to the responsible node for p1
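The triple indexing scheme can be illustrated as follows. This is a sketch under simplifying assumptions: nodes are numbered 0 to 7 and a key is mapped to a node with a plain modulo instead of DHT routing, and when a query has several constants the first one is used as the key (the slide only shows the case of a single constant). All function names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # assumed network size for this sketch


def responsible_node(term: str) -> int:
    """Stand-in for DHT routing: hash the term and map it to a node number."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_NODES


# node number -> key -> set of triples indexed under that key
node_store = defaultdict(lambda: defaultdict(set))


def index_triple(triple):
    """Store the triple three times: under its subject, predicate and object."""
    for term in triple:
        node_store[responsible_node(term)][term].add(triple)


def route_query(pattern):
    """Send a triple pattern to the node responsible for one of its constants."""
    constants = [term for term in pattern if not term.startswith("?")]
    if not constants:
        raise ValueError("pattern needs at least one constant to be routed")
    key = constants[0]  # assumption: use the first constant as the key
    candidates = node_store[responsible_node(key)][key]
    return {
        t for t in candidates
        if all(q.startswith("?") or q == v for q, v in zip(pattern, t))
    }


for t in [("s1", "p1", "o1"), ("s2", "p1", "o2"), ("s1", "p2", "o2")]:
    index_triple(t)

answers = route_query(("?s", "p1", "?o"))
```

Here `answers` holds the two triples whose predicate is `p1`, all found at the single node responsible for `p1`.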
10Algorithms
- Forward chaining
- Compute all inferences a priori
- Backward chaining
- Compute inferences on demand
11Distributed Forward Chaining
Figure: example RDFS class hierarchy with the classes person, artist, painter and flemish, connected by sc (rdfs:subClassOf) edges.
12Distributed Forward Chaining
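Figure: the class hierarchy (person, artist, painter, flemish; sc = rdfs:subClassOf) and the triples stored at each node.
- Node responsible for painter: (painter, rdfs:subClassOf, artist), (flemish, rdfs:subClassOf, painter)
- Node responsible for artist: (painter, rdfs:subClassOf, artist), (artist, rdfs:subClassOf, person)
- Node responsible for flemish: (flemish, rdfs:subClassOf, painter)
- Node responsible for person: (artist, rdfs:subClassOf, person)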
13Distributed Forward Chaining
Figure: each node applies the rules to the triples it stores locally.
- Triples at the node responsible for artist: (painter, rdfs:subClassOf, artist), (artist, rdfs:subClassOf, person)
- Rules:
- subClass(X, Y) :- triple(X, rdfs:subClassOf, Y).
- subClass(X, Y) :- triple(X, rdfs:subClassOf, Z), subClass(Z, Y).
- Infer: (painter, rdfs:subClassOf, person)
14Distributed Forward Chaining
Figure: the inferred triples are sent to be stored in the network.
- Node responsible for painter infers and sends (flemish, rdfs:subClassOf, artist)
- Node responsible for artist infers and sends (painter, rdfs:subClassOf, person)
15Distributed Forward Chaining
Figure: the inferred triples have been stored and trigger a new round of inferences.
- Node responsible for painter now also holds (painter, rdfs:subClassOf, person) and infers (flemish, rdfs:subClassOf, person)
- Node responsible for artist now also holds (flemish, rdfs:subClassOf, artist) and infers (flemish, rdfs:subClassOf, person)
- Node responsible for flemish now also holds (flemish, rdfs:subClassOf, artist)
- Node responsible for person now also holds (painter, rdfs:subClassOf, person)
16Distributed Forward Chaining
Figure: same state as on the previous slide; (flemish, rdfs:subClassOf, person) is inferred both at the node responsible for painter and at the node responsible for artist.
The same triple is generated in two nodes!!
The same triple is sent to be stored twice!!
17Distributed Forward Chaining
Figure: final state after forward chaining; (flemish, rdfs:subClassOf, person) has been sent to be stored twice.
- Node responsible for painter: (painter, rdfs:subClassOf, artist), (flemish, rdfs:subClassOf, painter), (painter, rdfs:subClassOf, person)
- Node responsible for artist: (painter, rdfs:subClassOf, artist), (artist, rdfs:subClassOf, person), (flemish, rdfs:subClassOf, artist)
- Node responsible for flemish: (flemish, rdfs:subClassOf, painter), (flemish, rdfs:subClassOf, artist), (flemish, rdfs:subClassOf, person)
- Node responsible for person: (artist, rdfs:subClassOf, person), (painter, rdfs:subClassOf, person), (flemish, rdfs:subClassOf, person)
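The walkthrough on these slides can be replayed with a small single-process simulation. It is only a sketch under stated assumptions: it covers the rdfs:subClassOf rules alone, gives every key its own node, indexes triples by subject and object only (the predicate index is left out), and processes messages in FIFO order. The function name and the `stats` counters are hypothetical.

```python
from collections import defaultdict, deque

SC = "rdfs:subClassOf"


def distributed_forward_chaining(base_triples):
    stored = defaultdict(set)  # key -> triples held by the node responsible for that key
    stats = {"messages": 0, "duplicates": 0}
    pending = deque()          # (key, triple) store requests travelling in the network

    def send(triple):
        # A triple is sent to the nodes responsible for its subject and its object.
        pending.append((triple[0], triple))
        pending.append((triple[2], triple))

    for triple in base_triples:
        send(triple)

    while pending:
        key, triple = pending.popleft()
        stats["messages"] += 1
        if triple in stored[key]:
            stats["duplicates"] += 1  # the same triple was sent to be stored again
            continue
        stored[key].add(triple)
        subject, _, obj = triple
        # Join at the node for `key`: (x, sc, key) and (key, sc, y) give (x, sc, y).
        if obj == key:
            for (_, _, y) in [t for t in stored[key] if t[0] == key]:
                send((subject, SC, y))
        if subject == key:
            for (x, _, _) in [t for t in stored[key] if t[2] == key]:
                send((x, SC, obj))
    return stored, stats


base = [
    ("flemish", SC, "painter"),
    ("painter", SC, "artist"),
    ("artist", SC, "person"),
]
stored, stats = distributed_forward_chaining(base)
```

In this run the triple `("flemish", SC, "person")` is derived at both the `painter` node and the `artist` node, so two of the store requests are redundant, which is the effect highlighted on the slides. After the run, `stored["person"]` holds every subclass of person, so a query for them needs a single lookup.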
18Querying after FC
Query: Find all subclasses of person
q = (X, rdfs:subClassOf, person)
Figure: the query is sent to the node responsible for person, which after forward chaining stores:
- (artist, rdfs:subClassOf, person)
- (painter, rdfs:subClassOf, person)
- (flemish, rdfs:subClassOf, person)
Answer is found!
19Algorithms
- Forward chaining
- Compute all inferences a priori
- Backward chaining
- Compute inferences on demand
20Data Model - revisited
- Recursive rules
- Rule adornment from recursive query processing
- Good orderings for evaluating predicates
- e.g. subClass(X, artist) → subClass^fb(X, Y)
- Extended adornment
- Ordered string of f, b, k
- k: an argument that is bound and is the key
- b: bound argument (not the key)
- f: free argument
- e.g. at the node responsible for key artist
- triple(X, rdf:type, artist) → triple^fbk(X, rdf:type, Y).
- Good ordering for evaluating predicates in a distributed environment
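As an illustration of the extended adornment, the helper below computes the f/b/k string for a predicate's argument list. It is a sketch with assumed conventions: variables start with an uppercase letter (Datalog style), everything else is a constant, and the function names are hypothetical.

```python
def is_variable(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def adorn(args, key=None):
    """Return the adornment of args: f = free, b = bound, k = bound and the key.

    With key=None this gives the classical f/b adornment.
    """
    letters = []
    for arg in args:
        if is_variable(arg):
            letters.append("f")
        elif arg == key:
            letters.append("k")
        else:
            letters.append("b")
    return "".join(letters)


classical = adorn(("X", "artist"))                            # subClass(X, artist)
extended = adorn(("X", "rdf:type", "artist"), key="artist")   # at the node for key artist
```

Here `classical` is `"fb"` and `extended` is `"fbk"`, matching the two examples on the slide.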
21RDFS Entailment Rules - revisited
- subClass^kf(X, Y) :- triple^kbf(X, rdfs:subClassOf, Y).
- subClass^kf(X, Y) :- triple^kbf(X, rdfs:subClassOf, Z), subClass^ff(Z, Y).
- subClass^fk(X, Y) :- triple^fbk(X, rdfs:subClassOf, Y).
- subClass^fk(X, Y) :- subClass^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y).
- subProperty^kf(X, Y) :- triple^kbf(X, rdfs:subPropertyOf, Y).
- subProperty^kf(X, Y) :- triple^kbf(X, rdfs:subPropertyOf, Z), subProperty^ff(Z, Y).
- subProperty^fk(X, Y) :- triple^fbk(X, rdfs:subPropertyOf, Y).
- subProperty^fk(X, Y) :- subProperty^ff(X, Z), triple^fbk(Z, rdfs:subPropertyOf, Y).
- type^kf(X, Y) :- triple^kbf(X, rdf:type, Y).
- type^kf(X, Y) :- triple^kff(X, P, Z), triple^fbf(P, rdfs:domain, Y).
- type^kf(X, Y) :- triple^ffk(Z, P, X), triple^fbf(P, rdfs:range, Y).
- type^kf(X, Y) :- triple^kbf(X, rdf:type, Z), subClass^ff(Z, Y).
- type^fk(X, Y) :- triple^fbk(X, rdf:type, Y).
- type^fk(X, Y) :- triple^fff(X, P, Z), triple^fbk(P, rdfs:domain, Y).
- type^fk(X, Y) :- triple^fff(Z, P, X), triple^fbk(P, rdfs:range, Y).
- type^fk(X, Y) :- type^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y).
22Distributed Backward Chaining
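Query: Find all subclasses of person
q = (X, rdfs:subClassOf, person)
Figure: class hierarchy with person, artist, painter, sculptor and flemish (sc = rdfs:subClassOf) and the triples stored at each node; no inferred triples are stored.
- Node responsible for painter: (painter, rdfs:subClassOf, artist), (flemish, rdfs:subClassOf, painter)
- Node responsible for sculptor: (sculptor, rdfs:subClassOf, artist)
- Node responsible for artist: (painter, rdfs:subClassOf, artist), (artist, rdfs:subClassOf, person), (sculptor, rdfs:subClassOf, artist)
- Node responsible for flemish: (flemish, rdfs:subClassOf, painter)
- Node responsible for person: (artist, rdfs:subClassOf, person)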
23Distributed Backward Chaining
Query: Find all subclasses of person
q = (X, rdfs:subClassOf, person)
Figure: the same network and stored triples as on the previous slide (animation step).
24Distributed Backward Chaining
subClass^fk(X, person)
25Distributed Backward Chaining
Which predicate should we evaluate first?
subClass^fk(X, person)
  r1: triple^fbk(X, rdfs:subClassOf, person)
  r2: subClass^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, person)
(r1) subClass^fk(X, Y) :- triple^fbk(X, rdfs:subClassOf, Y).
(r2) subClass^fk(X, Y) :- subClass^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y).
We choose to evaluate first the one that has a k in its adornment
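The ordering choice stated above (evaluate first the predicate with a k in its adornment) can be written down as a small helper. This is a sketch: the representation of a rule body as a list of (predicate name, adornment) pairs and the function name are assumptions made for the example.

```python
def order_body(body):
    """Reorder a rule body so that predicates whose adornment contains k come first.

    sorted() is stable, so predicates keep their relative order otherwise.
    """
    return sorted(body, key=lambda atom: "k" not in atom[1])


# Body of (r2): subClass^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y)
r2_body = [("subClass", "ff"), ("triple", "fbk")]
evaluation_order = order_body(r2_body)
```

For (r2) this puts `triple^fbk` first: it can be evaluated locally at the node responsible for the key, and its answers bind Z before the recursive subClass predicate is evaluated.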
26RDFS Entailment Rules - revisited
- subClass^kf(X, Y) :- triple^kbf(X, rdfs:subClassOf, Y).
- subClass^kf(X, Y) :- triple^kbf(X, rdfs:subClassOf, Z), subClass^ff(Z, Y).
- subClass^fk(X, Y) :- triple^fbk(X, rdfs:subClassOf, Y).
- subClass^fk(X, Y) :- subClass^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y).
- subProperty^kf(X, Y) :- triple^kbf(X, rdfs:subPropertyOf, Y).
- subProperty^kf(X, Y) :- triple^kbf(X, rdfs:subPropertyOf, Z), subProperty^ff(Z, Y).
- subProperty^fk(X, Y) :- triple^fbk(X, rdfs:subPropertyOf, Y).
- subProperty^fk(X, Y) :- subProperty^ff(X, Z), triple^fbk(Z, rdfs:subPropertyOf, Y).
- type^kf(X, Y) :- triple^kbf(X, rdf:type, Y).
- type^kf(X, Y) :- triple^kff(X, P, Z), triple^fbf(P, rdfs:domain, Y).
- type^kf(X, Y) :- triple^ffk(Z, P, X), triple^fbf(P, rdfs:range, Y).
- type^kf(X, Y) :- triple^kbf(X, rdf:type, Z), subClass^ff(Z, Y).
- type^fk(X, Y) :- triple^fbk(X, rdf:type, Y).
- type^fk(X, Y) :- triple^fff(X, P, Z), triple^fbk(P, rdfs:domain, Y).
- type^fk(X, Y) :- triple^fff(Z, P, X), triple^fbk(P, rdfs:range, Y).
- type^fk(X, Y) :- type^ff(X, Z), triple^fbk(Z, rdfs:subClassOf, Y).
Notice that every predicate that has a k in its adornment is the edb relation triple!
27Distributed Backward Chaining
subClass^fk(X, person)
  r1: triple^fbk(X, rdfs:subClassOf, person)
  r2: triple^fbk(Z, rdfs:subClassOf, person), subClass^ff(X, Z)
    Z / artist
    subClass^fk(X, artist)
      r1: triple^fbk(X, rdfs:subClassOf, artist)
      r2: triple^fbk(Z, rdfs:subClassOf, artist), subClass^ff(X, Z)
        Z / sculptor
        Z / painter
28Distributed Backward Chaining
subClass^fk(X, person)
  r1: triple^fbk(X, rdfs:subClassOf, person)
  r2: triple^fbk(Z, rdfs:subClassOf, person), subClass^ff(X, Z)
    Z / artist
    subClass^fk(X, artist)
      r1: triple^fbk(X, rdfs:subClassOf, artist)
      r2: triple^fbk(Z, rdfs:subClassOf, artist), subClass^ff(X, Z)
        Z / sculptor
        subClass^fk(X, sculptor)
          r1: triple^fbk(X, rdfs:subClassOf, sculptor)
          r2: triple^fbk(Z, rdfs:subClassOf, sculptor), subClass^ff(X, Z)
        Z / painter
        subClass^fk(X, painter)
          r1: triple^fbk(X, rdfs:subClassOf, painter)
          r2: triple^fbk(Z, rdfs:subClassOf, painter), subClass^ff(X, Z)
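The expansion shown on these slides can be mimicked with a short recursive function. It is a sketch, not the system's implementation: the `Network` class stands in for the DHT (one lookup per key), the `visited` set is an added guard against cyclic hierarchies, and all names are hypothetical. The data is the example hierarchy from the slides.

```python
from collections import defaultdict

SC = "rdfs:subClassOf"


class Network:
    """Stand-in for the DHT: answers triple^fbk lookups and counts them."""

    def __init__(self, triples):
        self.by_object = defaultdict(set)
        for s, p, o in triples:
            self.by_object[(p, o)].add(s)
        self.lookups = 0

    def triple_fbk(self, predicate, key):
        """Evaluate triple^fbk(X, predicate, key) at the node responsible for key."""
        self.lookups += 1
        return set(self.by_object.get((predicate, key), ()))


def subclass_fk(network, cls, visited=None):
    """Evaluate subClass^fk(X, cls) on demand, without any precomputed inferences."""
    visited = set() if visited is None else visited
    if cls in visited:
        return set()
    visited.add(cls)
    # r1 and the triple predicate of r2 are the same local lookup at the node for cls.
    direct = network.triple_fbk(SC, cls)
    answers = set(direct)
    for z in direct:
        # r2: Z is now bound, so subClass^ff(X, Z) becomes subClass^fk(X, z),
        # which is evaluated at the node responsible for z.
        answers |= subclass_fk(network, z, visited)
    return answers


network = Network([
    ("painter", SC, "artist"),
    ("flemish", SC, "painter"),
    ("sculptor", SC, "artist"),
    ("artist", SC, "person"),
])
subclasses_of_person = subclass_fk(network, "person")
```

The query visits the nodes responsible for person, artist, painter, sculptor and flemish, one lookup each, and only the base triples are ever stored. The cost is moved from storage time to query time.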
29Evaluation
- Analytical cost model
- Experimental evaluation
30Experimental Setup
- Both algorithms have been implemented as a real distributed system using Bamboo DHT
- Experiments were conducted in PlanetLab (123 nodes available at the time of the experiments)
- Synthetic data from RBench generator [Theoharis05]
- number of instances: 10^3, 10^4
- RDFS class hierarchy: tree depth 2-6 (7 to 128 classes)
- distribution: both uniform and Zipf (z = 1)
- Query: Give me the instances of the root class
31Metrics
- Network traffic
- number of messages sent
- bandwidth
- Storage load
- total number of triples stored
- Storage time
- Query response time
32Network traffic (while storing)
33Storage load
34Query response time
35Comparison
- Forward chaining
- Query response time
- Storage load
- Storage time
- Network traffic due to generated redundancies
- Backward chaining
- Storage load
- Storage time
- No redundancies
- Query response time
- Query processing load
36Summary
- How to implement forward and backward chaining in a distributed environment
- Both algorithms have been integrated in the conjunctive query processing algorithms of our system Atlas (http://atlas.di.uoa.gr)
- What techniques we need to extend to make this implementation feasible
- How these algorithms perform in a real decentralized environment (PlanetLab)
- Our algorithms could be adapted for general recursive query processing
37Future Work
- Ongoing work
- Support for all RDFS entailment rules
- Experimenting with complex queries (LUBM benchmark)
- Future work
- Optimize forward chaining
- Hybrid approach
- Network churn
38Thank you!!
Questions?