Title: gStore: Answering SPARQL Queries Via Subgraph Matching
1gStore: Answering SPARQL Queries Via Subgraph Matching
- Lei Zou (1), Jinghui Mo (1), Lei Chen (2), M. Tamer Özsu (3), Dongyan Zhao (1)
(1) Peking University, (2) Hong Kong University of Science and Technology, (3) University of Waterloo
2Outline
- Background & Related Work
- Overview of gStore
- Encoding Technique
- VS-tree & Query Algorithm
- Experiments
- Conclusions
4Semantic Web
Semantic Web technologies are a collection of standard technologies for realizing a Web of Data.
5RDF Data Model
(Figure: example RDF triples, with labels marking which terms are URIs and which are literals.)
6RDF Graph
(Figure legend: Literal Vertex, Entity Vertex)
7SPARQL Queries
SPARQL Query:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
Query Graph
8Subgraph Match vs. SPARQL Queries
9Naïve Triple Store
SPARQL Query:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
Too many Self-Joins
SQL:
Select T3.Object
From T as T1, T as T2, T as T3
Where T1.Predicate = 'BornOnDate' and T1.Object = '1809-02-12'
  and T2.Predicate = 'DiedOnDate' and T2.Object = '1865-04-15'
  and T3.Predicate = 'hasName'
  and T1.Subject = T2.Subject and T2.Subject = T3.Subject
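A minimal sketch of the naive approach, using Python's built-in sqlite3 module. The table name T and its three columns follow the slide; the sample triples and the y: subject identifiers are made up for illustration. The query returns the object of the hasName triple as ?name, and it needs one copy of T per triple pattern, which is where the self-joins come from.

```python
import sqlite3

# Hypothetical sample data: (Subject, Predicate, Object) triples.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
]


def names_via_self_joins(triples):
    """Answer the example query over one triple table T using two self-joins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?)", triples)
    rows = conn.execute(
        """
        SELECT T3.Object
        FROM T AS T1, T AS T2, T AS T3
        WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
          AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
          AND T3.Predicate = 'hasName'
          AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
        """
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(names_via_self_joins(TRIPLES))
```

Every additional triple pattern in the SPARQL query adds another copy of T to the From clause.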
10Existing Solutions
- Three categories of solutions have been proposed to speed up query processing:
- 1. Property Table
- Jena [K. Wilkinson et al., SWDB 03]
- 2. Vertically Partitioned Solution
- SW-store [D. J. Abadi et al., VLDB 07]
- 3. Exhaustive Indexing
- RDF-3x [T. Neumann et al., VLDB 08], Hexastore [C. Weiss et al., VLDB 08]
11Existing Solutions-Property Table
SPARQL Query:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
Fewer join steps
SQL:
Select People.hasName From People
Where People.BornOnDate = '1809-02-12' and People.DiedOnDate = '1865-04-15'
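As a rough illustration of the property-table idea, the sketch below stores one row per subject with one column per property, so the example query needs no join at all. The People table and its property columns come from the slide; the Subject column, the sample rows, and the helper name are assumptions made for this example.

```python
import sqlite3

# Hypothetical sample rows: (Subject, hasName, BornOnDate, DiedOnDate).
PEOPLE = [
    ("y:Abraham_Lincoln", "Abraham Lincoln", "1809-02-12", "1865-04-15"),
    ("y:Charles_Darwin", "Charles Darwin", "1809-02-12", "1882-04-19"),
]


def names_via_property_table(people):
    """Answer the example query with a single scan of a property table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE People ("
        "Subject TEXT PRIMARY KEY, hasName TEXT, BornOnDate TEXT, DiedOnDate TEXT)"
    )
    conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", people)
    rows = conn.execute(
        "SELECT People.hasName FROM People "
        "WHERE People.BornOnDate = '1809-02-12' "
        "AND People.DiedOnDate = '1865-04-15'"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(names_via_property_table(PEOPLE))
```

The trade-off is that the set of columns has to be chosen in advance.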
12Existing Solutions-Vertically Partitioned
Solution
Fast Merge Join
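The sketch below illustrates why merge joins are cheap in a vertically partitioned layout: each predicate gets its own two-column (subject, object) table kept sorted by subject, so two tables can be joined on subject in one linear pass. The sample tables and the function name are hypothetical, and this is a generic sort-merge join, not SW-store's actual implementation.

```python
def merge_join(left, right):
    """Join two (subject, object) lists, both sorted by subject, on subject.

    Returns (subject, left_object, right_object) tuples.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with the same subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return out


# Hypothetical per-predicate tables, each sorted by subject.
BORN_ON_DATE = [
    ("y:Abraham_Lincoln", "1809-02-12"),
    ("y:Charles_Darwin", "1809-02-12"),
]
HAS_NAME = [
    ("y:Abraham_Lincoln", "Abraham Lincoln"),
    ("y:Charles_Darwin", "Charles Darwin"),
    ("y:Marie_Curie", "Marie Curie"),
]

print(merge_join(BORN_ON_DATE, HAS_NAME))
```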
13Existing Solutions- Exhaustive-Indexing
Range query Merge Join
- Each triple pattern (statement) in a SPARQL query can be translated into one range query.
- SPARQL Query:
- Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
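To make the range-query idea concrete, here is a small sketch that keeps the triples in one sorted (predicate, object, subject) ordering and answers a pattern with a fixed predicate and object by a binary search followed by a contiguous scan. Exhaustive-indexing systems keep several such orderings; only one is shown here, and the sample triples and function names are assumptions for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
]


def build_pos_index(triples):
    """Sort the triples in (predicate, object, subject) order."""
    return sorted((p, o, s) for s, p, o in triples)


def prefix_range(index, prefix):
    """Return the contiguous run of index rows that start with `prefix`."""
    start = bisect_left(index, prefix)
    matches = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        matches.append(row)
    return matches


def subjects(index, predicate, obj):
    """Answer the pattern `?m <predicate> obj`; subjects come back sorted."""
    return [s for _, _, s in prefix_range(index, (predicate, obj))]


POS = build_pos_index(TRIPLES)
born = subjects(POS, "BornOnDate", "1809-02-12")
died = subjects(POS, "DiedOnDate", "1865-04-15")
print(born, died)
```

Because each range comes back sorted by subject, the per-pattern results can then be combined with a merge join.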
14Some Limitations
- Difficult to handle wildcard queries.
- Difficult to handle updates.
16Intuition of gStore
Finding Matches over a Large Graph is not a
trivial task.
17Preliminaries
Literal Vertex
Entity Vertex
18Storage Schema in gStore
Encode all neighbors of a vertex into a bit-string, called its signature.
19Encoding Technique (1)
(Figure: computing a vertex signature. A neighbor literal such as "Abraham Lincoln" is split into 3-grams (Abr, bra, rah, aha, ...) that are hashed into a bit-string, e.g. 0000 0010 0000 0000 OR 1000 0000 0000 0000 OR 0000 0000 0100 0000 OR 0000 0000 0000 0001 = 1000 0010 0100 0001. Each adjacent (edge label, neighbor) pair is encoded as a bit-string, and the bit-strings of all pairs are combined with bitwise OR.)
Adjacent (edge label, neighbor) pairs in the example:
( hasName, Abraham Lincoln)
( BornOnDate, 1809-02-12)
( DiedOnDate, 1865-04-15)
( DiedIn, y:Washington_D.C)
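The encoding can be sketched roughly as follows. This is an illustration under stated assumptions, not gStore's actual hash functions: the 12-bit label part and 16-bit neighbor part mirror the bit-string lengths visible on the slide, each label or 3-gram sets exactly one bit, and MD5 is used only as a convenient deterministic hash. All function names are hypothetical.

```python
import hashlib

LABEL_BITS = 12      # assumed width of the edge-label part
NEIGHBOR_BITS = 16   # assumed width of the neighbor part


def _one_bit(text, width):
    """Deterministically map a string to a single set bit within `width` bits."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:4], "big") % width)


def ngrams(text, n=3):
    """Split a string into overlapping n-grams (the whole string if too short)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)] or [text]


def encode_pair(label, neighbor):
    """Encode one (edge label, neighbor) pair as label bits followed by neighbor bits."""
    label_sig = _one_bit(label, LABEL_BITS)
    neighbor_sig = 0
    for gram in ngrams(neighbor):
        neighbor_sig |= _one_bit(gram, NEIGHBOR_BITS)
    return (label_sig << NEIGHBOR_BITS) | neighbor_sig


def vertex_signature(pairs):
    """OR together the encodings of all adjacent (edge label, neighbor) pairs."""
    sig = 0
    for label, neighbor in pairs:
        sig |= encode_pair(label, neighbor)
    return sig


def may_match(query_sig, data_sig):
    """A data vertex can match only if it has every bit set in the query signature."""
    return (query_sig & data_sig) == query_sig


PAIRS = [
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
    ("DiedIn", "y:Washington_D.C"),
]
data_sig = vertex_signature(PAIRS)
query_sig = vertex_signature(PAIRS[1:3])  # a query that fixes only the two dates
print(format(data_sig, "028b"), may_match(query_sig, data_sig))
```

The containment test can produce false positives (unrelated strings may hash to the same bits) but no false negatives, so it works as a filter ahead of exact matching.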
20Encoding Technique (2)
21Encoding Technique (3)
23A Straightforward Solution (1)
(Figure: query vertices u1 and u2 with candidate lists L1 and L2; vertices shown: 001, 004, 006 and 002, 003, 006.)
24A Straightforward Solution (2)
Large join space!
(Figure: joining candidate lists L1 and L2; vertices shown: 001, 004, 006 and 002, 003, 006.)
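A minimal sketch of the straightforward approach, under assumptions: each query vertex gets a candidate list by scanning every data signature for containment, and the lists are then joined by testing every pair against the data edges. The 4-bit signatures, the edge set, and the function names are invented for illustration; only the vertex IDs are borrowed from the slide.

```python
from itertools import product

# Hypothetical 4-bit signatures for data vertices 001..006.
DATA_SIGS = {
    "001": 0b1001,
    "002": 0b0110,
    "003": 0b0011,
    "004": 0b1100,
    "005": 0b0101,
    "006": 0b1010,
}
# Hypothetical directed data edges.
DATA_EDGES = {("001", "002"), ("004", "005")}


def candidate_list(query_sig, data_sigs):
    """Scan all data vertices and keep those whose signature covers the query's."""
    return [v for v, sig in sorted(data_sigs.items())
            if (sig & query_sig) == query_sig]


def join_candidates(l1, l2, edges):
    """Check every pair from the two lists against the data edges."""
    return [(a, b) for a, b in product(l1, l2) if (a, b) in edges]


L1 = candidate_list(0b1000, DATA_SIGS)  # candidates for query vertex u1
L2 = candidate_list(0b0010, DATA_SIGS)  # candidates for query vertex u2
join_space = len(L1) * len(L2)
print(L1, L2, join_space, join_candidates(L1, L2, DATA_EDGES))
```

With three candidates per list, nine pairs are examined to find one match; the join space grows with the product of the list sizes.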
25VS-tree
26Pruning Technique
Reduced join space!
(Figure: query vertices u1 and u2, bit-string 10010, and candidate vertices 001, 004, 006 and 002, 003, 006.)
27An Example for Pruning Effect
Query:
?x1 y:hasGivenName ?x5.
?x1 y:hasFamilyName ?x6.
?x1 rdf:type <wordnet_scientist_110560637>.
?x1 y:bornIn ?x2.
?x1 y:hasAcademicAdvisor ?x4.
?x2 y:locatedIn <Switzerland>.
?x3 y:locatedIn <Germany>.
?x4 y:bornIn ?x3.

      Before Pruning   After Pruning
x1    810              810
x2    424              197
x3    66               66
x4    36187            6686
28Query Algorithm-Top-Down
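The slide gives only the title, so the following is a simplified sketch of top-down filtering over a signature tree, assuming that each internal node stores the bitwise OR of its children's signatures. It does not model the edges between VS-tree nodes that gStore uses for further pruning; the class, the tree shape, and the 4-bit signatures are hypothetical.

```python
class Node:
    """A signature-tree node: a leaf holds one vertex, an internal node holds children."""

    def __init__(self, children=None, vertex=None, sig=0):
        self.children = children or []
        self.vertex = vertex
        self.sig = sig
        # An internal node's signature is the OR of its children's signatures.
        for child in self.children:
            self.sig |= child.sig


def search(node, query_sig, visited=None):
    """Descend from the root, skipping any subtree whose signature cannot cover the query."""
    if visited is not None:
        visited.append(node)
    if (node.sig & query_sig) != query_sig:
        return []  # nothing below this node can match
    if not node.children:
        return [node.vertex]
    matches = []
    for child in node.children:
        matches.extend(search(child, query_sig, visited))
    return matches


def leaf(vertex, sig):
    return Node(vertex=vertex, sig=sig)


# Hypothetical tree over six data vertices with 4-bit signatures.
ROOT = Node(children=[
    Node(children=[leaf("001", 0b1001), leaf("004", 0b1100)]),
    Node(children=[leaf("002", 0b0110), leaf("003", 0b0011)]),
    Node(children=[leaf("005", 0b0101), leaf("006", 0b1010)]),
])

visited = []
print(search(ROOT, 0b1000, visited), len(visited))
```

In this example the middle subtree has signature 0111, which lacks the query bit, so its two leaves are never visited.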
30Datasets
Dataset   Triples      Size
Yago      20 million   3.1 GB
DBLP      8 million    0.8 GB
31Exact Queries
32Wildcard Queries
34Conclusions
- Vertex Encoding Technique
- An Efficient Index Structure: VS-tree
- A Novel Filtering Technique
35Q/A
Thank You!
zoulei_at_pku.edu.cn
36Updates- Insertion in G
37Updates- Insertion in VS-tree
38Updates- Deletion in VS-tree
To be deleted
39Framework in gStore
40A Straightforward Solution (1)