Computer Science and Engineering - PowerPoint PPT Presentation


Description:

Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search. Computer Science and Engineering. Chengyuan Zhang¹, Ying Zhang¹, Wenjie Zhang¹, Xuemin Lin²,¹ (PowerPoint PPT presentation)

Slides: 33


1
  • Inverted Linear Quadtree: Efficient Top K Spatial
    Keyword Search
  • Computer Science and Engineering

Chengyuan Zhang¹, Ying Zhang¹, Wenjie Zhang¹, Xuemin Lin²,¹
¹The University of New South Wales, Australia
²East China Normal University
2
Background
  • An enormous number of spatio-textual objects are
    available in many applications
  • online local search
  • e.g., online yellow pages
  • social network services
  • e.g., Facebook, Flickr

3
Example (figure): query keywords {pizza, coffee}; objects
p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi),
p4 (coffee, sushi), p5 (pizza, steak, seafood)
4
Top k spatial keyword search (TOPK-SK)
  • Data
  • A set of spatio-textual objects
  • Each object is represented by a location and a set
    of keywords
  • Query
  • Query location (q.loc)
  • A set of query keywords (q.T)
  • Answer
  • The closest k objects, each of which contains
    all query keywords

5
Naïve Approach
Running Example
  • 11 spatio-textual objects
  • Vocabulary {t1, t2, t3}
  • Query q with q.T = {t1, t2} and k = 1
Objects: p1 (t1, t2), p2 (t1, t2), p3 (t1, t3), p4 (t1), p5 (t2, t3),
p6 (t2, t3), p7 (t3), p8 (t3), p9 (t2), p10 (t1), p11 (t2)
Distance order from q: p3, p4, p7, p8, p5, p1, p10, p9, p6, p2, p11
Objects accessed: p3, p4, p7, p8, p5, p1
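A minimal sketch of the naive approach on this running example, assuming the objects can be enumerated in increasing distance from q.loc (the slide gives only the order, not coordinates). `naive_topk` is a hypothetical name. Every object visited counts as accessed, which reproduces the slide's access list.

```python
# Sketch of the naive approach: visit objects nearest-first and stop once
# k of them contain every query keyword (AND semantics).
# Assumption: DISTANCE_ORDER replaces real coordinates.
KEYWORDS = {
    "p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"},
    "p4": {"t1"}, "p5": {"t2", "t3"}, "p6": {"t2", "t3"},
    "p7": {"t3"}, "p8": {"t3"}, "p9": {"t2"}, "p10": {"t1"}, "p11": {"t2"},
}
DISTANCE_ORDER = ["p3", "p4", "p7", "p8", "p5", "p1",
                  "p10", "p9", "p6", "p2", "p11"]


def naive_topk(distance_order, keywords, query_terms, k):
    accessed, results = [], []
    for obj in distance_order:
        accessed.append(obj)  # every object in the search region is read
        if query_terms <= keywords[obj]:  # contains all query keywords
            results.append(obj)
            if len(results) == k:
                break
    return results, accessed


results, accessed = naive_topk(DISTANCE_ORDER, KEYWORDS, {"t1", "t2"}, k=1)
print(results)   # ['p1']
print(accessed)  # ['p3', 'p4', 'p7', 'p8', 'p5', 'p1']
```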
6
Inverted R-tree (Y. Zhou, et al., CIKM 2005)
  • For each keyword t, construct an R-tree for the
    objects containing t
Example (figure): k = 1, q.T = {t1, t2}; one R-tree per keyword,
R1 (t1), R2 (t2) and R3 (t3), each with root entries E1 and E2
Objects accessed: p3, p4, p5, p1
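To make the access count concrete, the sketch below models the inverted R-tree as one nearest-first stream per query keyword, merged in global distance order. An object becomes an answer once every query keyword's tree has returned it. It reuses the running example, treats the rank in the distance order as the distance (coordinates are not given), and ignores R-tree node accesses; `inverted_rtree_topk` is a hypothetical name.

```python
import heapq

# Running example; the rank in DISTANCE_ORDER stands in for the distance.
KEYWORDS = {
    "p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1", "t3"},
    "p4": {"t1"}, "p5": {"t2", "t3"}, "p6": {"t2", "t3"},
    "p7": {"t3"}, "p8": {"t3"}, "p9": {"t2"}, "p10": {"t1"}, "p11": {"t2"},
}
DISTANCE_ORDER = ["p3", "p4", "p7", "p8", "p5", "p1",
                  "p10", "p9", "p6", "p2", "p11"]


def inverted_rtree_topk(distance_order, keywords, query_terms, k):
    rank = {obj: i for i, obj in enumerate(distance_order)}
    # One nearest-first stream per query keyword; each list stands in for an
    # incremental nearest-neighbor search on that keyword's R-tree.
    streams = [
        sorted((rank[obj], obj) for obj, kw in keywords.items() if term in kw)
        for term in sorted(query_terms)
    ]
    hits, accessed, results = {}, [], []
    for _, obj in heapq.merge(*streams):  # global nearest-first order
        if obj not in hits:
            accessed.append(obj)  # first time this object is read
        hits[obj] = hits.get(obj, 0) + 1
        # An answer once every query keyword's tree has returned it.
        if hits[obj] == len(query_terms):
            results.append(obj)
            if len(results) == k:
                break
    return results, accessed


results, accessed = inverted_rtree_topk(DISTANCE_ORDER, KEYWORDS, {"t1", "t2"}, k=1)
print(results)   # ['p1']
print(accessed)  # ['p3', 'p4', 'p5', 'p1']
```

The answer matches the naive scan, and p7 and p8 (t3 only) are never read. However, p3, p4 and p5 are still accessed although none holds both keywords: with one tree per keyword, the index cannot exploit the AND semantics.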
7
IR2-tree (I. D. Felipe, et al., ICDE 2008)
  • Index structure
  • Combines an R-tree with a signature technique
  • Each node contains a rectangle and a signature (a
    fixed-length bitmap)
  • Each word is hashed to a particular bit
  • The signature of a node is the bitwise OR of the
    signatures of all its child nodes
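The signature rules above can be sketched in a few lines. The word-to-bit table `BIT_OF` is hypothetical: it is chosen to match the 2-bit example (t1 → 10, t2 → 01, t3 → 01), where the shared bit of t2 and t3 plays the role of a hash collision; a real IR2-tree hashes words into a much longer bitmap.

```python
from functools import reduce

# Hypothetical word-to-bit table standing in for a hash function.
BIT_OF = {"t1": 1, "t2": 0, "t3": 0}
SIG_BITS = 2


def signature(words):
    """Object signature: set the bit each word is mapped to."""
    return reduce(lambda sig, w: sig | (1 << BIT_OF[w]), words, 0)


def node_signature(child_signatures):
    """Node signature: bitwise OR of all child signatures."""
    return reduce(lambda a, b: a | b, child_signatures, 0)


def may_contain(sig, query_sig):
    """Filter test: every query bit must be set (false positives possible)."""
    return (sig & query_sig) == query_sig


def show(sig):
    return format(sig, f"0{SIG_BITS}b")


q = signature({"t1", "t2"})
p3 = signature({"t1", "t3"})
p4 = signature({"t1"})
p7 = signature({"t3"})
print(show(q), show(p3), show(p4), show(p7))  # 11 11 10 01
print(may_contain(p3, q))  # True, although p3 lacks t2: a false positive
print(may_contain(p4, q))  # False: p4 alone is filtered out
print(may_contain(node_signature([p4, p7]), q))  # True: a node grouping them is not
```

The last line shows why filtering at the node level is weak: unrelated objects (p4 with t1, p7 with t3) together set every query bit, so their node must still be visited.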

8
Example
Figure: IR2-tree over the running example with k = 1 and q.T = {t1, t2};
2-bit signatures t1 = 10, t2 = 01, t3 = 01. Because t2 and t3 hash to the
same bit, p3 (t1, t3) has signature 11 and passes the filter: a false positive.
Objects accessed: p3, p4, p7, p1
Result: p1
9
Observations
s: number of objects within the search region
n: number of objects accessed
p: average probability that an object is accessed
  • Naïve approach
  • Disadvantage: all objects in the search region
    are accessed (large s and p = 1)
  • Inverted R-tree
  • Advantage: excludes unrelated objects (small s)
  • Disadvantage: cannot take advantage of the AND
    semantics (p = 1)
  • IR2-tree
  • Advantage: a filtering technique reduces p
  • Disadvantages: large s, and p is affected by
    unrelated objects
  • Other single augmented R-trees
  • Spatial keyword search: KR tree (R. Hariharan,
    et al., SSDBM 2007)
  • WIR tree (D. Wu, et al., TKDE 2011)
  • Spatial keyword ranking query: IR tree (G. Cong,
    et al., PVLDB 2009)
  • CM-CDIR tree (D. Wu, et al., VLDBJ 2012)
  • Their shortcomings are the same as those of the
    IR2-tree

Cost model: n = s × p
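As a worked illustration, read the cost model as n = s × p (s: objects within the search region, p: average probability that one of them is accessed, n: objects accessed). Assume the search region is the disk around q.loc that reaches the answer p1, and that for the inverted R-tree only objects holding at least one query keyword lie in its region. Under those assumptions, the access lists reported for the running example decompose as follows:

```latex
\begin{align*}
n &= s \times p \\
\text{Naive approach: } s &= 6,\; p = 1 \;\Rightarrow\; n = 6
  && (p_3, p_4, p_7, p_8, p_5, p_1) \\
\text{Inverted R-tree: } s &= 4,\; p = 1 \;\Rightarrow\; n = 4
  && (p_3, p_4, p_5, p_1) \\
\text{IR}^2\text{-tree: } s &= 6,\; n = 4 \;\Rightarrow\; p = \tfrac{4}{6} = \tfrac{2}{3}
  && (p_3, p_4, p_7, p_1)
\end{align*}
```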
10
Motivation
  • Index structure
  • has a small number of objects within the search
    region (small s)
  • can prune objects within the search region
    (small p)
  • Properties
  • falls into the category of inverted indexes
  • exploits the AND semantics
  • adapts to the distribution of the objects for
    each keyword

11
Motivation
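Signature of a region regarding a keyword: 1 = non-empty, 0 = empty
Example (figure): query keywords {t1, t2}; objects p1 (t1), p2 (t1, t2), p3 (t2).
A region with signature t1 = 1, t2 = 0 holds no object containing both query
keywords, so it can be pruned; a region with t1 = 1, t2 = 1 must be explored.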
12
Linear Quadtree Structure
  • Regular space-partition-based indexing
  • Each node can be identified by its split sequence
    (Morton code, a.k.a. Z-order)
  • A circle denotes a non-leaf node and a square
    denotes a leaf node
  • A leaf node is black if it is not empty;
    otherwise, it is a white leaf node
  • Only the black leaf nodes are kept (in a B+-tree)

Figure: example linear quadtree with quadrants labeled NE, SW and SE;
example split sequences 1100 and 0001
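Identifying a node by its split sequence amounts to interleaving the bits of its cell column and row. A minimal sketch, assuming x occupies the even bit positions and y the odd ones (swapping them gives an equally valid Z-order; the paper's quadrant labeling may differ). Both function names are hypothetical.

```python
def morton_code(x, y, depth):
    """Z-order (Morton) code of cell (x, y) in a 2^depth x 2^depth grid."""
    code = 0
    for i in range(depth):
        code |= ((x >> i) & 1) << (2 * i)       # bit i of x -> bit 2i
        code |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> bit 2i+1
    return code


def split_sequence(x, y, depth):
    """Quadrant (0-3) chosen at each split, from the root down."""
    code = morton_code(x, y, depth)
    return [(code >> (2 * (depth - 1 - level))) & 0b11 for level in range(depth)]


# Depth-1 cells in Z order: (0,0), (1,0), (0,1), (1,1)
print([morton_code(x, y, 1) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]])  # [0, 1, 2, 3]
print(morton_code(3, 5, 3), split_sequence(3, 5, 3))  # 39 [2, 1, 3]
```

Because the code is a single integer whose prefix is the parent's code, black leaves can be kept as sorted keys in an ordered index, and a node's descendants occupy a contiguous key range.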
13
IL-Quadtree
  • For each keyword ti ∈ V, we build a linear
    quadtree, denoted by LQi, for the objects that
    contain ti
  • Besides the black leaf nodes, we also keep the
    quadtree node information (signature)
  • 1 for black leaf nodes and non-leaf nodes, and
    0 otherwise
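A minimal in-memory sketch of the IL-Quadtree idea under simplifying assumptions: coordinates lie in the unit square, every keyword's quadtree is cut to the same fixed depth (`DEPTH`, hypothetical) instead of adapting its leaves, and each keyword's signatures are a Python set of its non-empty nodes rather than a disk-resident linear quadtree. The search is a standard best-first traversal by minimum distance; a child is visited only if its signature is 1 for every query keyword, which is how the AND semantics prunes regions. All names are hypothetical.

```python
import heapq
import math

DEPTH = 3  # hypothetical fixed leaf depth; ILQ adapts leaves per keyword


def cell_of(x, y, depth):
    """Column/row of the depth-level cell holding (x, y) in the unit square."""
    n = 1 << depth
    return min(int(x * n), n - 1), min(int(y * n), n - 1)


def build_index(objects):
    """objects: {id: ((x, y), set_of_keywords)}.

    Returns per-keyword signatures (nodes that are non-empty for the keyword,
    i.e. signature 1) and the object ids stored in each leaf cell.
    """
    sig, leaves = {}, {}
    for oid, ((x, y), words) in objects.items():
        leaves.setdefault(cell_of(x, y, DEPTH), []).append(oid)
        for d in range(DEPTH + 1):
            node = (d, *cell_of(x, y, d))
            for w in words:
                sig.setdefault(w, set()).add(node)
    return sig, leaves


def mindist(qx, qy, node):
    """Smallest Euclidean distance from the query point to the node's square."""
    d, cx, cy = node
    size = 1.0 / (1 << d)
    dx = max(cx * size - qx, 0.0, qx - (cx + 1) * size)
    dy = max(cy * size - qy, 0.0, qy - (cy + 1) * size)
    return math.hypot(dx, dy)


def topk_sk(objects, sig, leaves, q, terms, k):
    qx, qy = q
    heap = [(0.0, 0, (0, 0, 0))]  # entries: (distance, is_object, node or id)
    results, accessed = [], []
    while heap and len(results) < k:
        _, is_obj, item = heapq.heappop(heap)
        if is_obj:  # nearest remaining candidate: it is an answer
            results.append(item)
            continue
        d, cx, cy = item
        if d == DEPTH:  # leaf: read its objects and keep full matches
            for oid in leaves.get((cx, cy), []):
                accessed.append(oid)
                (x, y), words = objects[oid]
                if terms <= words:
                    heapq.heappush(heap, (math.hypot(x - qx, y - qy), 1, oid))
            continue
        for ox in (0, 1):
            for oy in (0, 1):
                child = (d + 1, 2 * cx + ox, 2 * cy + oy)
                # AND semantics: skip unless signature is 1 for every keyword
                if all(child in sig.get(t, ()) for t in terms):
                    heapq.heappush(heap, (mindist(qx, qy, child), 0, child))
    return results, accessed


objects = {
    "a": ((0.10, 0.10), {"t1"}),
    "b": ((0.12, 0.10), {"t2"}),
    "c": ((0.90, 0.90), {"t1", "t2"}),
    "d": ((0.40, 0.40), {"t1"}),
    "e": ((0.60, 0.60), {"t2", "t3"}),
}
sig, leaves = build_index(objects)
results, accessed = topk_sk(objects, sig, leaves, (0.0, 0.0), {"t1", "t2"}, k=1)
print(results, accessed)  # ['c'] ['a', 'b', 'c']  (d and e are never read)
```

Objects d and e lie in regions that are empty for one of the query keywords, so they are pruned without being read; a and b share a leaf cell that holds both keywords in different objects, the residual false positive that region signatures cannot rule out.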

14
Search Algorithm
Example (figure): k = 1, q.T = {t1, t2}; distance order p3, p4, p7, p8, p5,
p1, p10, p9, p6, p2, p11
Objects accessed: p4, p1
15
Direction-aware spatial keyword search
G. Li, et al., ICDE 2012
  • Data
  • A set of spatio-textual objects
  • Each object has a location and a set of keywords
  • Query
  • A location (q.loc)
  • A set of query keywords (q.T)
  • A search direction, given as an angular range
  • Answer
  • The closest k objects, each of which contains all
    keywords in q.T and lies in the search direction
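The query semantics can be illustrated with a brute-force reference, assuming the direction is an angular range [lo, hi] in radians around q.loc, measured counterclockwise from the +x axis (the slide's own direction symbols did not survive extraction). This is not the DESKS index, only the definition of the answer; `in_direction` and `direction_topk` are hypothetical names.

```python
import math


def in_direction(q, p, lo, hi):
    """True if p lies in the angular range [lo, hi] as seen from q.

    Assumes the range is narrower than a full turn; if lo > hi after
    normalization, the range wraps past angle 0.
    """
    two_pi = 2 * math.pi
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % two_pi
    lo, hi = lo % two_pi, hi % two_pi
    if lo <= hi:
        return lo <= angle <= hi
    return angle >= lo or angle <= hi


def direction_topk(q, objects, terms, lo, hi, k):
    """The k closest objects that contain all terms and lie in the direction."""
    hits = sorted(
        (math.dist(q, loc), oid)
        for oid, (loc, words) in objects.items()
        if terms <= words and in_direction(q, loc, lo, hi)
    )
    return [oid for _, oid in hits[:k]]


objects = {
    "a": ((1.0, 0.1), {"t1", "t2"}),
    "b": ((-0.5, 0.0), {"t1", "t2"}),
    "c": ((2.0, 2.0), {"t1", "t2"}),
    "d": ((0.5, 0.2), {"t1"}),
}
print(direction_topk((0.0, 0.0), objects, {"t1", "t2"}, 0.0, math.pi / 2, k=2))
# ['a', 'c']: b is behind the query, d lacks t2
```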

16
Spatial Keyword Based Ranking (G. Cong, et al.,
PVLDB 2009; VLDBJ 2012)
  • Query
  • Spatial location
  • Query keywords
  • Returns the k best objects ranked by
  • Spatial distance to the query location
  • Textual relevance to the query keywords
  • Spatio-textual ranking score
  • The spatial proximity (d) is the normalized
    Euclidean distance between p and q
  • The textual relevance is the tf-idf based
    textual similarity between the description of p
    and the query keywords
  • Our solution
  • Aggregate inverted linear quadtree: the maximal
    keyword weight replaces the bit signature
  • The spatial distance ranking function is replaced
    by the spatio-textual ranking score function
  • Score-based pruning using the keyword weights and
    the region of each quadtree node
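The slide does not give the ranking function, so the sketch below assumes a common linear form, score = ALPHA · (1 − d) + (1 − ALPHA) · relevance (higher is better), with relevance taken as the sum of an object's non-negative weights for the query keywords and d normalized to [0, 1]. `ALPHA` and all function names are hypothetical. It shows why keeping the maximal keyword weight per quadtree node enables score-based pruning: the node's minimum distance and maximal weights give an upper bound on the score of any object inside it.

```python
ALPHA = 0.5  # hypothetical balance between spatial proximity and relevance


def relevance(weights, terms):
    """Assumed textual relevance: sum of the weights for the query terms."""
    return sum(weights.get(t, 0.0) for t in terms)


def score(dist_norm, rel, alpha=ALPHA):
    """Assumed linear spatio-textual score (higher is better)."""
    return alpha * (1.0 - dist_norm) + (1.0 - alpha) * rel


def node_upper_bound(mindist_norm, max_weights, terms, alpha=ALPHA):
    """Bound for every object in a node: none is closer than mindist_norm,
    and none has a larger weight for a term than the node's maximum."""
    return score(mindist_norm, relevance(max_weights, terms), alpha)


def can_prune(upper_bound, kth_best):
    """Skip the node if it cannot beat the current k-th best score."""
    return upper_bound <= kth_best


terms = {"t1", "t2"}
o1 = score(0.4, relevance({"t1": 0.3, "t2": 0.2}, terms))  # about 0.55
o2 = score(0.6, relevance({"t1": 0.1, "t2": 0.5}, terms))  # about 0.50
ub = node_upper_bound(0.3, {"t1": 0.3, "t2": 0.5}, terms)  # about 0.75
print(ub >= max(o1, o2), can_prune(ub, 0.8), can_prune(ub, 0.7))  # True True False
```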

17
Experimental Setting
  • Implemented in Java
  • Debian Linux
  • Intel Xeon 2.40GHz dual CPU
  • 4 GB memory
  • Datasets
  • GN: US Board on Geographic Names
  • Tigers, Cars
  • Spatial datasets from Rtree-Portal
  • Textual content from 20 Newsgroups
  • SYN: synthetic dataset
  • Queries: 1,000 queries, each with a location and
    l query keywords
  • Metrics: response time and I/O

18
Important Statistics
Parameters evaluated
Definition                          | Notation | Default value
Number of required results          | k        | 10
Number of query keywords            | l        | 3
Term frequency of vocabulary        | z        | 1.1
Number of objects                   | n        | 1,000,000
Vocabulary size                     | v        | 100,000
Avg. number of keywords per object  | m        | 15
19
Tuning
  • w: minimal depth of the black leaf nodes
  • c: the split threshold
  • Best performance
  • w = 8 and c = 64
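The two tuning parameters suggest a per-keyword construction rule. A minimal sketch, assuming a node is split while its depth is below w or it holds more than c objects; the function name, the unit-square coordinates and the `max_depth` guard are hypothetical, and the paper's exact rule may differ. The usage example uses tiny values rather than w = 8 and c = 64.

```python
def build_black_leaves(points, w, c, max_depth=16):
    """Split-rule sketch for one keyword's quadtree over the unit square.

    A node is split while its depth is below w (minimal depth of black
    leaves) or it holds more than c points (split threshold). max_depth
    guards against duplicate points. Returns {(depth, cx, cy): points} for
    the black (non-empty) leaves; white leaves are simply not stored.
    """
    black = {}
    stack = [(0, 0, 0, list(points))]
    while stack:
        depth, cx, cy, pts = stack.pop()
        if not pts:
            continue
        if depth < max_depth and (depth < w or len(pts) > c):
            n = 1 << (depth + 1)  # grid resolution one level down
            children = {}
            for x, y in pts:
                key = (min(int(x * n), n - 1), min(int(y * n), n - 1))
                children.setdefault(key, []).append((x, y))
            for (ccx, ccy), cpts in children.items():
                stack.append((depth + 1, ccx, ccy, cpts))
        else:
            black[(depth, cx, cy)] = pts
    return black


pts = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.9, 0.9)]
print(sorted(build_black_leaves(pts, w=1, c=2)))
# [(1, 1, 1), (2, 0, 0), (2, 1, 0)]
```

A larger c yields fewer, fuller leaves (fewer nodes, coarser signatures), while a larger w forces finer cells everywhere; the reported best setting balances the two.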

20
  • l: the number of query keywords
  • Grid (M. Christoforaki, et al., CIKM 2011)
  • GridSIG: an extension of Grid that uses the
    signature technique

21
Algorithms Evaluated
  • ILQ
  • Inverted Linear Quadtree based techniques
  • IVR
  • Inverted R-tree, Y. Zhou, et al., CIKM 2005
  • MIR2
  • I. D. Felipe, et al., ICDE 2008
  • KR
  • R. Hariharan, et al., SSDBM 2007
  • WIR
  • D. Wu, et al., TKDE 2011
  • IR
  • G. Cong, et al., PVLDB 2009
  • CM-CDIR
  • D. Wu, et al., VLDBJ 2012

22
Evaluation on different datasets
23
Comparison Varying l
24
Comparison Varying k
25
Comparison Varying Parameters
26
Conclusion
  • Identify important properties of indexing
    techniques that support top k spatial keyword search
  • Propose the inverted linear quadtree structure
    to efficiently support top k spatial keyword
    search
  • Extensive experiments on both real and synthetic
    data

Future work
  • Enhance the region-based signature technique by
    grouping objects to reduce false positives
  • Support top k spatial keyword search in other
    metric spaces

27
Thank you!
28
Spatial Keyword Ranking Query
  • Our Algorithm
  • Aggregate ILQ
  • Compare with
  • IR G. Cong, et al., PVLDB 2009
  • CM-CDIR D. Wu, et al., VLDBJ 2012
  • Dataset: Tiger

29
Direction-Aware TOPK-SK Query
  • Our Algorithm
  • ILQ
  • Compare with
  • DESKS G. Li, et al., ICDE 2012

30
Comparison Varying k
31
IR-Tree
32
KR Tree