Title: Computer Science and Engineering
1- Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search
- Computer Science and Engineering
Chengyuan Zhang (1), Ying Zhang (1), Wenjie Zhang (1), Xuemin Lin (2, 1)
(1) The University of New South Wales, Australia
(2) East China Normal University
2Background
- An enormous number of spatio-textual objects are available in many applications
- Online local search
- e.g., online yellow pages
- Social network services
- e.g., Facebook, Flickr
3[Figure: example query with keywords pizza, coffee over five objects: p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi), p4 (coffee, sushi), p5 (pizza, steak, seafood)]
4Top k spatial keyword search (TOPK-SK)
- Data
- A set of spatio-textual objects
- Each object is represented by a location and a set of keywords
- Query
- Query location (q.loc)
- A set of query keywords (q.T)
- Answer
- The closest k objects, each of which contains
all query keywords
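The definition above can be written down as a brute-force reference. This is only a sketch of the query semantics, not an indexing method from the talk; the object layout, the function name `topk_sk`, and the coordinates are assumptions made for illustration (the keyword sets are those of the pizza/coffee example).

```python
import heapq
from math import dist


def topk_sk(objects, q_loc, q_terms, k):
    """Return ids of the k closest objects whose keywords contain all of q_terms.

    objects: dict of object id -> ((x, y), set of keywords)  (assumed layout)
    """
    q_terms = set(q_terms)
    candidates = (
        (dist(loc, q_loc), oid)
        for oid, (loc, terms) in objects.items()
        if q_terms <= terms  # AND semantics: every query keyword must be present
    )
    return [oid for _, oid in heapq.nsmallest(k, candidates)]


# Keyword sets from the example figure; the coordinates are made up.
restaurants = {
    "p1": ((3.0, 4.0), {"pizza", "coffee", "sushi"}),
    "p2": ((1.0, 1.0), {"pizza", "coffee", "steak"}),
    "p3": ((0.5, 0.0), {"pizza", "sushi"}),
    "p4": ((0.0, 1.0), {"coffee", "sushi"}),
    "p5": ((2.0, 0.0), {"pizza", "steak", "seafood"}),
}

nearest = topk_sk(restaurants, (0.0, 0.0), {"pizza", "coffee"}, k=2)
```

With these made-up coordinates, p3 and p4 are closer to the query than p2, but each lacks one query keyword, so they are skipped.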
5Naïve Approach
Running Example
- 11 spatio-textual objects
- Vocabulary: t1, t2, t3
- Query q with q.T = {t1, t2} and k = 1
- Objects: p1 (t1, t2), p2 (t1, t2), p3 (t1, t3), p4 (t1), p5 (t2, t3), p6 (t2, t3), p7 (t3), p8 (t3), p9 (t2), p10 (t1), p11 (t2)
- Distance order: p3, p4, p7, p8, p5, p1, p10, p9, p6, p2, p11
- Objects accessed: p3, p4, p7, p8, p5, p1
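The naïve approach on the running example can be traced with a short scan. This is a sketch under the assumption that some spatial index already delivers objects in increasing distance from q; that incremental nearest-neighbour step is replaced here by the distance order given on the slide.

```python
# Distance order and keyword sets as given in the running example.
DISTANCE_ORDER = ["p3", "p4", "p7", "p8", "p5", "p1",
                  "p10", "p9", "p6", "p2", "p11"]
KEYWORDS = {
    "p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"},
    "p4": {"t1"}, "p5": {"t2", "t3"}, "p6": {"t2", "t3"},
    "p7": {"t3"}, "p8": {"t3"}, "p9": {"t2"},
    "p10": {"t1"}, "p11": {"t2"},
}


def naive_topk(distance_order, keywords, q_terms, k):
    """Visit objects by increasing distance; keep those containing all q_terms."""
    q_terms = set(q_terms)
    accessed, results = [], []
    for oid in distance_order:
        accessed.append(oid)  # every object in the search region is touched
        if q_terms <= keywords[oid]:
            results.append(oid)
            if len(results) == k:
                break
    return results, accessed


results, accessed = naive_topk(DISTANCE_ORDER, KEYWORDS, {"t1", "t2"}, k=1)
```

The scan stops at p1 after touching six objects, five of which cannot be answers.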
6Inverted R-tree [Y. Zhou, et al., CIKM 2005]
- For each keyword t, construct an R-tree for the objects containing t
- Example: k = 1, q.T = {t1, t2}
- Distance order: p3, p4, p7, p8, p5, p1, p10, p9, p6, p2, p11
- R1 (t1): entries E1, E2; leaf nodes {p1, p2, p3} and {p4, p10}
- R2 (t2): entries E1, E2; leaf nodes {p2, p5} and {p1, p6, p11}
- R3 (t3): entries E1, E2; leaf nodes {p6, p7, p9} and {p3, p5, p8}
- Objects accessed: p3, p4, p5, p1
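The access pattern of the inverted R-tree can be imitated with per-keyword lists. In this sketch each R-tree is replaced by a list sorted by distance (an assumption standing in for incremental nearest-neighbour search on that tree), and the lists of the query keywords are consumed together in distance order. The keyword sets follow the running example's object list; the function names are hypothetical.

```python
import heapq

DISTANCE_ORDER = ["p3", "p4", "p7", "p8", "p5", "p1",
                  "p10", "p9", "p6", "p2", "p11"]
RANK = {oid: i for i, oid in enumerate(DISTANCE_ORDER)}
KEYWORDS = {
    "p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"},
    "p4": {"t1"}, "p5": {"t2", "t3"}, "p6": {"t2", "t3"},
    "p7": {"t3"}, "p8": {"t3"}, "p9": {"t2"},
    "p10": {"t1"}, "p11": {"t2"},
}


def build_inverted_lists(keywords):
    """One distance-sorted list per keyword (stand-in for one R-tree each)."""
    lists = {}
    for oid, terms in keywords.items():
        for t in terms:
            lists.setdefault(t, []).append(oid)
    for posting in lists.values():
        posting.sort(key=lambda oid: RANK[oid])
    return lists


def inverted_topk(lists, q_terms, k):
    """Merge the query keywords' lists by distance; q_terms must be distinct."""
    streams = [lists.get(t, []) for t in q_terms]
    hits, accessed, results = {}, [], []
    for oid in heapq.merge(*streams, key=lambda o: RANK[o]):
        if oid not in hits:
            accessed.append(oid)
        hits[oid] = hits.get(oid, 0) + 1
        if hits[oid] == len(streams):  # seen in every query keyword's list
            results.append(oid)
            if len(results) == k:
                break
    return results, accessed


results, accessed = inverted_topk(build_inverted_lists(KEYWORDS), ["t1", "t2"], k=1)
```

Objects without any query keyword (p7, p8) never appear, but p3, p4 and p5 are still touched because each list is processed without knowledge of the others.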
7IR2-tree [I. D. Felipe, et al., ICDE 2008]
- Index Structure
- Combination of an R-tree and a signature technique
- Each node contains a rectangle and a signature (a fixed-length bitmap)
- Each word is hashed to a particular bit
- The signature of a node is the bitwise OR of all the signatures of its child nodes
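A minimal sketch of the signature test follows. The 2-bit assignment (t1 to 10, t2 to 01, t3 to 01) mirrors the one used in the IR2-tree example of this deck; the function names are hypothetical, and a real IR2-tree would use longer bitmaps and a hash function rather than a fixed table.

```python
# Bit assignment as read from the IR2-tree example: t1 -> 10, t2 -> 01, t3 -> 01.
BIT_OF = {"t1": 0b10, "t2": 0b01, "t3": 0b01}


def signature(keywords):
    """Signature of an object: OR of the bits of its keywords."""
    sig = 0
    for t in keywords:
        sig |= BIT_OF[t]
    return sig


def node_signature(child_signatures):
    """Signature of a node: bitwise OR of its children's signatures."""
    sig = 0
    for child in child_signatures:
        sig |= child
    return sig


def may_contain_all(sig, query_sig):
    """True if every bit of the query is set; False means safe to prune."""
    return (sig & query_sig) == query_sig


query_sig = signature({"t1", "t2"})                    # 0b11
p4_sig, p7_sig = signature({"t1"}), signature({"t3"})  # 0b10, 0b01
leaf_sig = node_signature([p4_sig, p7_sig])            # 0b11

# A leaf holding p4 (t1) and p7 (t3) passes the test although neither
# object contains both query keywords: a false positive.
leaf_passes = may_contain_all(leaf_sig, query_sig)
# p3 (t1, t3) also passes, because t2 and t3 share a bit.
p3_passes = may_contain_all(signature({"t1", "t3"}), query_sig)
```

The test can only rule nodes out; a passing node still has to be opened and its objects checked against the actual keywords.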
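8Example
- k = 1, q.T = {t1, t2}
- Keyword signatures: t1 = 10, t2 = 01, t3 = 01
- Distance order: p3, p4, p7, p8, p5, p1, p10, p9, p6, p2, p11
- Objects accessed: p3, p4, p7, p1
- False positive!
[Figure: IR2-tree over the running example. Node signatures: E1 = 11, E2 = 01, E3 = 11, E4 = 01, E5 = 11, E6 = 11, E7 = 11, E8 = 11, E9 = 11, E10 = 11, E11 = 11, E12 = 11. Leaf nodes with object signatures: {p1 = 11, p3 = 11}, {p2 = 11}, {p8 = 01, p5 = 01}, {p10 = 10, p11 = 01}, {p6 = 01, p9 = 10}, {p4 = 10, p7 = 01}.]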
9Observations
- Cost model: n = s · p
- s: number of objects within the search region
- n: number of objects accessed
- p: average probability that an object is accessed
- Naïve approach
- Disadvantages: all objects in the search region are accessed (large s and p = 1)
- Inverted R-tree
- Advantages: excludes unrelated objects (small s)
- Disadvantages: cannot take advantage of AND semantics (p = 1)
- IR2-tree
- Advantages: has a filtering technique to reduce p
- Disadvantages: large s, and p is affected by non-related objects
- Other single augmented R-trees
- Other spatial keyword search: KR tree [R. Hariharan, et al., SSDBM 2007], WIR tree [D. Wu, et al., TKDE 2011]
- Spatial keyword ranking query: IR tree [G. Cong, et al., PVLDB 2009], CM-CDIR tree [D. Wu, et al., VLDBJ 2012]
- Their shortcomings: same as IR2-tree
10Motivation
- Index structure
- have a small number of objects within the search region
- can prune objects within the search region
- Properties
- falls in the category of inverted index
- exploit the AND semantics
- adaptive to the distribution of the objects for
each keyword
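11Motivation
- Signature of a region regarding a keyword: 1 if non-empty, 0 if empty
- Query keywords: t1, t2
- Objects: p1 (t1), p2 (t1, t2), p3 (t2)
[Figure: two regions with their per-keyword signatures. One region has t1 = 1 and t2 = 0, giving 0 when combined; the other has t1 = 1 and t2 = 1, giving 1.]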
12Linear Quadtree Structure
- Regular space partition based indexing
- Each node can be identified by its split sequence (Morton code, a.k.a. Z-order)
- A circle and a square denote a non-leaf node and a leaf node, respectively
- A leaf node is black if it is not empty; otherwise, it is a white leaf node
- Keep the black leaf nodes (B+ tree)
- Example codes: split sequence NE gives 1100; split sequence SW, SE gives 0001
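The two example codes are consistent with a 2-bit quadrant encoding (SW = 00, SE = 01, NW = 10, NE = 11) padded with zeros to a fixed length. That encoding is inferred from the examples rather than stated on the slide, so the sketch below should be read under that assumption; the function names are hypothetical.

```python
# Assumed quadrant encoding, inferred from the codes 1100 (NE) and 0001 (SW, SE).
QUADRANT_BITS = {"SW": "00", "SE": "01", "NW": "10", "NE": "11"}


def code_of_split_sequence(sequence, max_depth):
    """Morton code of a node: 2 bits per split, zero-padded to 2 * max_depth bits."""
    bits = "".join(QUADRANT_BITS[quadrant] for quadrant in sequence)
    return bits.ljust(2 * max_depth, "0")


def code_of_cell(col, row, depth):
    """Morton code of a cell in a 2^depth x 2^depth grid by bit interleaving.

    col counts from the west edge and row from the south edge; at every
    level the row bit comes first, matching the quadrant encoding above.
    """
    bits = []
    for level in range(depth - 1, -1, -1):
        bits.append(str((row >> level) & 1))
        bits.append(str((col >> level) & 1))
    return "".join(bits)


ne_node = code_of_split_sequence(["NE"], max_depth=2)
sw_se_node = code_of_split_sequence(["SW", "SE"], max_depth=2)

# Sorting cells by code yields the Z-order, which is what lets a
# one-dimensional index such as a B+ tree store the black leaves.
z_order = sorted((code_of_cell(c, r, 1), (c, r)) for c in range(2) for r in range(2))
```

Because of the padding, a node and its first (SW) descendant share the same bit string, so a linear quadtree normally stores the node depth alongside the code.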
13IL-Quadtree
- For each keyword ti ∈ V, we build a linear quadtree, denoted by LQi, for the objects that contain the keyword ti
- Besides the black leaf nodes, we also keep the quadtree node information (signature)
- 1 for black leaf nodes and non-leaf nodes, and 0 otherwise
14Search Algorithm
- k = 1, q.T = {t1, t2}
- Distance order: p3, p4, p7, p8, p5, p1, p10, p9, p6, p2, p11
- Objects accessed: p4, p1
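The search idea (descend only into quadtree cells that are non-empty for every query keyword) can be sketched as a best-first traversal. This is a simplified illustration, not the authors' algorithm: every per-keyword quadtree is split to a fixed depth `DEPTH` instead of using a split threshold, Python sets replace the B+ tree and bitmaps, points are assumed to lie in the unit square [0, 1) x [0, 1), and all names and coordinates are hypothetical.

```python
import heapq
from math import hypot

DEPTH = 4  # assumed fixed depth of every per-keyword quadtree


def cell_code(x, y, depth=DEPTH):
    """Quadrant digits (0=SW, 1=SE, 2=NW, 3=NE) of the cell containing (x, y)."""
    code, x0, y0, size = "", 0.0, 0.0, 1.0
    for _ in range(depth):
        size /= 2
        east, north = x >= x0 + size, y >= y0 + size
        code += str(2 * north + east)
        x0, y0 = x0 + size * east, y0 + size * north
    return code


def min_dist(q, code):
    """Smallest distance from q to the square cell identified by code."""
    x0, y0, size = 0.0, 0.0, 1.0
    for digit in code:
        size /= 2
        x0, y0 = x0 + size * (int(digit) & 1), y0 + size * (int(digit) >> 1)
    dx = max(x0 - q[0], 0.0, q[0] - (x0 + size))
    dy = max(y0 - q[1], 0.0, q[1] - (y0 + size))
    return hypot(dx, dy)


def build(objects):
    """Per keyword: codes of non-empty nodes (signature) and black-leaf contents."""
    signature, leaves = {}, {}
    for oid, ((x, y), terms) in objects.items():
        code = cell_code(x, y)
        for t in terms:
            nodes = signature.setdefault(t, set())
            nodes.update(code[:i] for i in range(len(code) + 1))
            leaves.setdefault(t, {}).setdefault(code, set()).add(oid)
    return signature, leaves


def search(objects, signature, leaves, q, q_terms, k):
    """Best-first search; q_terms is a non-empty list of distinct keywords."""
    heap, results = [(0.0, 0, "")], []  # (distance, 0=node or 1=object, payload)
    while heap and len(results) < k:
        _, kind, item = heapq.heappop(heap)
        if kind == 1:
            results.append(item)
        elif len(item) == DEPTH:
            # Black leaf for every query keyword: intersect the object sets.
            for oid in set.intersection(*(leaves[t][item] for t in q_terms)):
                (x, y), _ = objects[oid]
                heapq.heappush(heap, (hypot(x - q[0], y - q[1]), 1, oid))
        else:
            for digit in "0123":
                child = item + digit
                # AND of the per-keyword signatures prunes the cell early.
                if all(child in signature.get(t, ()) for t in q_terms):
                    heapq.heappush(heap, (min_dist(q, child), 0, child))
    return results


objects = {
    "a": ((0.10, 0.10), {"t1"}),
    "b": ((0.12, 0.15), {"t2"}),
    "c": ((0.60, 0.60), {"t1", "t2"}),
    "d": ((0.90, 0.20), {"t1", "t2", "t3"}),
}
signature, leaves = build(objects)
top1 = search(objects, signature, leaves, (0.1, 0.1), ["t1", "t2"], k=1)
```

In this toy data, a and b are the nearest objects but sit in different small cells, so the cell around the query fails the AND test two levels down and neither object is ever loaded.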
15Direction-aware spatial keyword search [G. Li, et al., ICDE 2012]
- Data
- A set of spatio-textual objects
- Each object has a location and a set of keywords
- Query
- A location (q.loc)
- A set of query keywords (q.T)
- A direction [α, β]
- Answer
- The closest k objects, each of which contains all keywords in q.T and lies in the search direction
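The direction constraint can be illustrated with a brute-force filter. This sketch assumes the direction is an angular interval given by two angles in radians within [0, 2π), measured counter-clockwise from the positive x axis and swept counter-clockwise from the first to the second; the slide does not spell out this convention, and the function names and data are hypothetical.

```python
import heapq
from math import atan2, dist, tau


def in_direction(q_loc, p_loc, alpha, beta):
    """True if the ray from q_loc to p_loc falls in the sweep from alpha to beta."""
    angle = atan2(p_loc[1] - q_loc[1], p_loc[0] - q_loc[0]) % tau
    # Measuring both offsets from alpha handles intervals that wrap past 2*pi.
    return (angle - alpha) % tau <= (beta - alpha) % tau


def direction_aware_topk(objects, q_loc, q_terms, alpha, beta, k):
    """Closest k objects that contain all q_terms and lie in the direction."""
    q_terms = set(q_terms)
    candidates = (
        (dist(loc, q_loc), oid)
        for oid, (loc, terms) in objects.items()
        if q_terms <= terms and in_direction(q_loc, loc, alpha, beta)
    )
    return [oid for _, oid in heapq.nsmallest(k, candidates)]


places = {
    "north": ((0.0, 1.0), {"cafe"}),
    "east": ((1.0, 0.0), {"cafe"}),
    "northeast": ((2.0, 2.0), {"cafe"}),
    "west": ((-0.5, 0.0), {"cafe", "bar"}),
}

ahead = direction_aware_topk(places, (0.0, 0.0), {"cafe"}, 0.1, 1.5, k=2)
```

Here the nearest cafe overall is "west", but only "northeast" lies inside the requested direction.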
16Spatial Keyword Based Ranking [G. Cong, et al., PVLDB 2009, VLDBJ 2012]
- Query
- Spatial location
- Query keywords
- Returns the k best objects ranked by
- Spatial distance to the query location
- Textual relevance to the query keywords
- Spatio-textual ranking score
- The spatial proximity (d) is the normalized Euclidean distance between p and q
- The textual relevance is the tf-idf based textual similarity between the description of p and the query keywords
- Our Solution
- The maximal keyword weight replaces the bit signature: aggregate inverted linear quadtree
- The spatial distance ranking function is replaced by the spatio-textual ranking score function
- Score based pruning, based on the weight and region of the quadtree node
17Experimental Setting
- Implemented in Java
- Debian Linux
- Intel Xeon 2.40GHz dual CPU
- 4 GB memory
- Datasets
- GN: US Board on Geographic Names
- Tigers, Cars
- Spatial datasets from Rtree-Portal
- Textual content from 20 Newsgroups
- SYN: synthetic dataset
- Queries (1000): a location and l query keywords
- Evaluate: response time and I/O
18Important Statistics
Parameters evaluated
Definition | Notation | Default Value
Number of required results | k | 10
Number of query keywords | l | 3
Term frequency of vocabulary | z | 1.1
Number of objects | n | 1,000,000
Vocabulary size | v | 100,000
Avg. keywords per object | m | 15
19Tuning
- w: minimal depth of the black leaf node
- c: the split threshold
- Best performance
- w = 8 and c = 64
20- l: the number of query keywords
- Grid [M. Christoforaki, et al., CIKM 2011]
- GridSIG: the extension of Grid, utilizing the signature technique
21Algorithms Evaluated
- ILQ
- Inverted Linear Quadtree based techniques
- IVR
- Inverted R-tree [Y. Zhou, et al., CIKM 2005]
- MIR2
- [I. D. Felipe, et al., ICDE 2008]
- KR
- [R. Hariharan, et al., SSDBM 2007]
- WIR
- [D. Wu, et al., TKDE 2011]
- IR
- [G. Cong, et al., PVLDB 2009]
- CM-CDIR
- [D. Wu, et al., VLDBJ 2012]
22Evaluation on different datasets
23Comparison Varying l
24Comparison Varying k
25Comparison Varying Parameters
26Conclusion
- Identified important properties of indexing techniques to support top k spatial keyword search
- Proposed the inverted linear quadtree structure to efficiently support top k spatial keyword search
- Extensive experiments on both real and synthetic data
Future work
- Enhance the region based signature technique: group objects to reduce false positives
- Support top k spatial keyword search on other metric spaces
27Thank you!
28Spatial Keyword Ranking Query
- Our Algorithm
- Aggregate ILQ
- Compare with
- IR [G. Cong, et al., PVLDB 2009]
- CM-CDIR [D. Wu, et al., VLDBJ 2012]
- Dataset Tiger
29Direction-Aware TOPK-SK Query
- Our Algorithm
- ILQ
- Compare with
- DESKS [G. Li, et al., ICDE 2012]
30Comparison Varying k
31IR-Tree
32KR Tree