Title: RDF Aggregate Queries and Views
1RDF Aggregate Queries and Views
- Edward Hung, Yu Deng, V.S. Subrahmanian
- University of Maryland, College Park
2Maintenance of RDF Aggregate Views
- Introduction of RDF and RDQL
- RDQL Extension for Aggregate Views
- Aggregate View Maintenance Algorithms AMX
- Implementation and Experiments
- Related Work
3Publication
- Edward Hung, Yu Deng, V.S. Subrahmanian, "RDF
Aggregate Queries and Views", to appear in the
Proc. of the 21st International Conference on
Data Engineering (ICDE), Tokyo, Japan, 2005.
4Introduction
- Resource Description Framework (RDF)
- W3C Recommendation
- Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI))
- Triple (Resource, Property, Value)
- (Artist, rdf:type, rdfs:Class)
- (Painter, rdf:type, rdfs:Class)
- (Painter, rdfs:subClassOf, Artist)
5RDF Schema
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.auctionschema.com/schema1">
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Painter"><rdfs:subClassOf rdf:resource="#Artist"/></rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;string"/>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="&xsd;string"/>
  </rdf:Property>
</rdf:RDF>
RDF Instance
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns1="http://www.auctionschema.com/schema1#">
  <rdf:Description rdf:about="http://www.artist.net#guyrose">
    <rdf:type rdf:resource="&ns1;Painter"/>
    <ns1:fname rdf:datatype="&xsd;string">Guy</ns1:fname>
6RDF Schema and RDF Instance (same listings as on the previous slide), shown with their graph representation
Schema graph: Painter --subClassOf--> Artist, Artist --fname--> String
Instance graph: r1 --fname--> Guy, where r1 = http://www.artist.net#guyrose
8RDQL: RDF Query Language
SELECT ?highprice
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 < ?date < 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
view pattern
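To make the query semantics concrete, here is a minimal Python sketch of how a conjunction of triple patterns could be matched against an in-memory triple set. It is not Jena's RDQL engine: the `match` helper, the resource names (`r1`, `a1`, `p1`, ...) and the sample prices and dates are all assumptions made for illustration, and dates are compared as ISO strings.

```python
def is_var(term):
    """RDQL-style variables start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for p, t in zip(first, triple):
            if is_var(p):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

# Hypothetical instance data (not from the slides).
triples = [
    ("r1", "ns1:lname", "Rose"), ("r1", "ns1:fname", "Guy"),
    ("r1", "ns1:creates", "a1"), ("a1", "ns1:estimated", "p1"),
    ("p1", "ns1:high", 800000), ("a1", "ns1:presented", "2004-04-15"),
    ("r1", "ns1:creates", "a2"), ("a2", "ns1:estimated", "p2"),
    ("p2", "ns1:high", 500000), ("a2", "ns1:presented", "2004-04-20"),
    ("r1", "ns1:creates", "a3"), ("a3", "ns1:estimated", "p3"),
    ("p3", "ns1:high", 900000), ("a3", "ns1:presented", "2004-06-01"),
]

patterns = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
    ("?artifact", "ns1:presented", "?date"),
]

# The AND clause is applied as a filter over the bindings.
high_prices = sorted(
    b["?highprice"]
    for b in match(patterns, triples)
    if "2004-04-01" < b["?date"] < "2004-04-30"
)
print(high_prices)  # [500000, 800000]
```

The nested-loop matching is deliberately naive; a real engine would index triples by property before joining.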
9RDQL Extension for Aggregates and Views
CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 < ?date < 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
11- We are expanding the syntax of RDQL so that it allows constants in SELECT clauses, which equivalently creates new resources and properties using the constants.
- For example, the previous query can be modified as follows:
CREATEVIEW AS
SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 < ?date < 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
- The result is a valid RDF statement:
(<ns1:works_by_guyrose>, <ns1:maxprice>, "800000" ns1:USD)
14Aggregate View Maintenance
- Relational Approach
- Store all triples in a relational table with schema (Resource, Property, Value)
- OR
- Store resources and values of the same property in a separate relational table with schema (Resource, Value)
- Number of self-joins = (number of triples in where-clause) - 1
- Large number of delta rules during relational view maintenance, which makes it expensive
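The self-join cost can be illustrated with a small sketch using Python's standard `sqlite3` module. It assumes the single-table layout (Resource, Property, Value); the table name, sample rows and property names are hypothetical. A where-clause with three triple patterns needs three aliases of the same table, hence two self-joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (resource TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("r1", "lname", "Rose"), ("r1", "fname", "Guy"),
        ("r1", "creates", "a1"), ("r1", "creates", "a2"),
        ("r2", "lname", "Rose"), ("r2", "fname", "Ann"),
        ("r2", "creates", "a3"),
    ],
)

# Three triple patterns sharing ?artist become three aliases (t1, t2, t3)
# of the same table, joined on the shared variable: two self-joins.
sql = """
SELECT t3.value
FROM triples AS t1
JOIN triples AS t2 ON t2.resource = t1.resource
JOIN triples AS t3 ON t3.resource = t1.resource
WHERE t1.property = 'lname'   AND t1.value = 'Rose'
  AND t2.property = 'fname'   AND t2.value = 'Guy'
  AND t3.property = 'creates'
ORDER BY t3.value
"""
artifacts = [row[0] for row in conn.execute(sql)]
print(artifacts)  # ['a1', 'a2']
```

Each additional triple pattern adds one more alias and one more join, and a relational view maintenance scheme would typically need delta rules for each of them.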
15Aggregate View Maintenance
- Graph-structured DB (GSDB) [Zhuge, Garcia-Molina, ICDE 1998]
- GSDB assumes a rooted graph model while RDF is a general graph
- A GSDB view contains a set of nodes while our RDF views can contain nodes, edges, or any combinations.
16Aggregate View Maintenance
- Our Approach
- Localized search in RDF graphs
- breadth-first search starting at the inserted/deleted edge
- auxiliary data are needed for certain aggregate views
- min, max, avg
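A rough sketch of what a localized breadth-first search around an updated edge might look like. This is not the AMX algorithm itself: the `neighbourhood` function, the hop bound `max_hops` (which one might derive from the size of the view pattern) and the choice to follow edges in both directions are all assumptions.

```python
from collections import defaultdict, deque

def neighbourhood(triples, edge, max_hops):
    """Return the nodes within max_hops of either endpoint of `edge`."""
    adjacency = defaultdict(set)
    for s, _, o in triples:
        adjacency[s].add(o)
        adjacency[o].add(s)  # assumption: patterns may traverse edges both ways

    start_s, _, start_o = edge
    distance = {start_s: 0, start_o: 0}
    queue = deque([start_s, start_o])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the bound
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance)

# Hypothetical graph: only the part near the inserted edge is visited.
graph = [
    ("r1", "creates", "a1"), ("a1", "estimated", "p1"), ("p1", "high", "800000"),
    ("r9", "creates", "a9"), ("a9", "estimated", "p9"),
]
inserted = ("r1", "creates", "a1")
print(sorted(neighbourhood(graph, inserted, max_hops=1)))  # ['a1', 'p1', 'r1']
```

Only triples whose endpoints fall inside this neighbourhood need to be re-matched against the view pattern; the unrelated `r9` subgraph is never touched.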
17Compute Aggregates Algorithm CAA
18view pattern
19BAG
20BAG 800000
21BAG 800000, 500000
SELECT max(?highprice)
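The walk-through above collects matched values into a bag and then applies the aggregate. A minimal sketch of that final step, assuming the bag is kept as a `collections.Counter` (the `aggregate` helper is hypothetical and is not the CAA pseudocode from the slide):

```python
from collections import Counter

def aggregate(bag, func):
    """Apply an aggregate function to a bag (multiset) of numeric values."""
    values = list(bag.elements())
    if func == "count":
        return len(values)
    if not values:
        return None  # sum/avg/max/min of an empty bag left undefined here
    if func == "sum":
        return sum(values)
    if func == "avg":
        return sum(values) / len(values)
    if func == "max":
        return max(values)
    if func == "min":
        return min(values)
    raise ValueError(f"unsupported aggregate: {func}")

# The bag grows one matched value at a time, as in the slides.
bag = Counter()
for price in (800000, 500000):
    bag[price] += 1

print(aggregate(bag, "max"))  # 800000
```

A bag rather than a set is used so that two artifacts with the same high price are both counted.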
22Aggregate View Maintenance Algorithms AMX
- AMI: Insertion
- AMD: Deletion
- AMT: Triple Modification
- AMR: Resource Modification
23BAG 800000, 500000
Update Insertion
paints
24BAG 800000, 500000
paints
25BAG 800000, 500000, 60000
SELECT max(?highprice)
paints
26AMI for Insertion
28Distributive Aggregate Function
- An aggregate function f is distributive w.r.t. a source update operation if and only if, after such an operation, the updated value of the function can be computed based on its old value and the value(s) of the source update without reference to the source.
- More formally, f is distributive w.r.t. an update operation U if and only if there exists a function g such that f(I') = g(f(I), v), where f(I) is the aggregate value, I' is the updated instance after the update operation U(I, v), and v is the value(s) used in the update (e.g., the new value to add, the old value to remove, etc.).
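As a worked instance of the definition, assume the update U inserts or deletes a single value v in the bag of matched values. The function g can then be written down directly for the following cases:

```latex
\begin{align*}
\text{insertion:}\quad
  & \mathrm{count}(I') = \mathrm{count}(I) + 1, \quad
    \mathrm{sum}(I') = \mathrm{sum}(I) + v, \quad
    \max(I') = \max\bigl(\max(I),\, v\bigr) \\
\text{deletion:}\quad
  & \mathrm{count}(I') = \mathrm{count}(I) - 1, \quad
    \mathrm{sum}(I') = \mathrm{sum}(I) - v
\end{align*}
```

No such g exists for max under deletion. The bags {800000, 500000} and {800000, 300000} have the same f(I) = 800000, and deleting v = 800000 from each leaves different maxima (500000 and 300000), so the new value cannot be a function of f(I) and v alone.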
29Distributive Aggregate Function
- Examples of distributive aggregate functions
- count, sum, average w.r.t. insertion, deletion and update
- For average, we will need an additional attribute size which stores the size of S (in line 3 of CAA) in order to compute the correct updated value (or, we can use sum, count to calculate it)
- max and min are distributive w.r.t. insertion, but not deletion and update
- Auxiliary data computed from the source (such as S) can help to maintain non-distributive aggregate functions to avoid the need to refer to the source.
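The remark about average can be sketched as follows, using the sum-and-count variant mentioned in the bullet. The `AvgView` class and its method names are hypothetical; the point is only that none of the three update operations needs to look at the source graph again.

```python
from dataclasses import dataclass

@dataclass
class AvgView:
    """Maintains an average from a running total and a size."""
    total: float = 0.0
    size: int = 0

    def insert(self, v):
        self.total += v
        self.size += 1

    def delete(self, v):
        self.total -= v
        self.size -= 1

    def update(self, old, new):
        # A modification changes the total but not the size.
        self.total += new - old

    @property
    def value(self):
        return self.total / self.size if self.size else None

view = AvgView()
view.insert(800000)
view.insert(500000)
avg_after_inserts = view.value   # 650000.0
view.delete(800000)
avg_after_delete = view.value    # 500000.0
view.update(500000, 60000)
avg_after_update = view.value    # 60000.0
```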
30TMaintainI
31BAG 800000, 500000, 60000
Update Deletion
paints
32BAG 800000, 500000, 60000
paints
33BAG 500000, 60000
SELECT max(?highprice)
paints
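The insertion and deletion walk-throughs can be mimicked with a small sketch that keeps the bag as auxiliary data next to the current max. The `MaxView` class is hypothetical and is not the AMI/AMD pseudocode; it only shows why the bag makes deletion cheap without going back to the source.

```python
from collections import Counter

class MaxView:
    """Maintains max(?highprice) together with the bag of matched values."""

    def __init__(self, values=()):
        self.bag = Counter(values)
        self.value = max(self.bag) if self.bag else None

    def insert(self, v):
        self.bag[v] += 1
        # max is distributive w.r.t. insertion: old value and v suffice.
        if self.value is None or v > self.value:
            self.value = v

    def delete(self, v):
        if self.bag[v] == 0:
            raise KeyError(v)
        self.bag[v] -= 1
        if self.bag[v] == 0:
            del self.bag[v]
            # Only when the last copy of the current max disappears do we
            # recompute, and then from the bag rather than from the source.
            if v == self.value:
                self.value = max(self.bag) if self.bag else None

view = MaxView([800000, 500000])
view.insert(60000)
max_after_insert = view.value    # 800000
view.delete(800000)
max_after_delete = view.value    # 500000
```

Recomputing from the bag is linear in the number of distinct values; a sorted structure or heap would be a natural refinement, at the cost of more bookkeeping.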
34AMD for Deletion
35TMaintainD
36Implementation and Experiment
- Implemented in Java
- Jena RDQL Engine of HP
- Comparison with Relational Approach (standard view maintenance algorithm on relational tables)
- Counting Algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993
- Dataset: Chef Moz Project RDF dump
- Data stored in memory
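For readers unfamiliar with the relational baseline, the core idea of a counting-style algorithm is to store, for each view tuple, the number of derivations that produce it, and to adjust those counts from the deltas. The sketch below is a rough illustration of that idea only, not the implementation compared in the experiments; the `CountedView` class and the sample view ("artists who created at least one artifact") are assumptions.

```python
from collections import Counter

class CountedView:
    """A view whose tuples carry the number of derivations supporting them."""

    def __init__(self):
        self.counts = Counter()

    def apply_delta(self, view_tuple, delta):
        new_count = self.counts[view_tuple] + delta
        if new_count < 0:
            raise ValueError("more deletions than derivations")
        if new_count == 0:
            self.counts.pop(view_tuple, None)  # last derivation is gone
        else:
            self.counts[view_tuple] = new_count

    def tuples(self):
        return set(self.counts)

# View: artists with at least one 'creates' triple (projection on Resource).
view = CountedView()
for artist, artifact in [("r1", "a1"), ("r1", "a2"), ("r2", "a3")]:
    view.apply_delta(artist, +1)          # each source row is one derivation

view.apply_delta("r1", -1)                # delete (r1, creates, a1)
after_first_delete = view.tuples()        # r1 still has one derivation left
view.apply_delta("r1", -1)                # delete (r1, creates, a2)
after_second_delete = view.tuples()       # r1 drops out of the view
```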
38Other Related Work
- Volz et al. [DBFUSION02]
- the first to introduce a view mechanism for RDF data
- Their views require that
- the results contain class instances (i.e., a subject or object variable), or
- the result itself has the pattern of RDF statement (i.e., a triple containing subject, predicate and object).
- Magkanaraki et al. [ISWC03]
- proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies such that new resources, property values, classes and property types can be created.
- None of these works specifically address (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.
39Summary
- RDQL Extension for Views and Aggregates
- Compute Aggregates Algorithm CAA
- Aggregate View Maintenance Algorithms AMX