Keyword Search on Spatial Databases - PowerPoint PPT Presentation

1
Keyword Search on Spatial Databases
  • Ian De Felipe
  • Vagelis Hristidis
  • Naphtali Rishe
  • School of Computing and Information Sciences,
    Florida International University, Miami, FL

2
Roadmap
  • Motivation - Problem Definition
  • Baseline Methods
  • IR2-Tree and Search Algorithms
  • Experiments
  • Related Work
  • Conclusions

4
Motivation
  • Applications require a combination of spatial and
    keyword search.
  • E.g., online yellow pages allow users to specify
    an address and a set of keywords.
  • Efficient algorithms exist to tackle each separately:
  • Spatial search: Nearest Neighbor (NN) queries
  • Keyword search

5
Problem Definition
  • A spatial keyword query consists of a query area
    and a set of keywords.
  • The answer is a list of objects ranked according
    to a combination of their distance to the query
    area and their relevance to the query keywords.
  • A variant is the distance-first spatial keyword
    query, where objects are ranked by distance and
    the keywords are applied as a conjunctive filter.
  • A distance-first top-k spatial keyword query
    returns only the top k objects.
  • This presentation focuses on this variant; the
    generalization is presented in the paper.

6
Example Distance-First Spatial Keyword Query
  • Find the nearest hotels to point (30.5, 100.0)
    that contain the keywords "internet" and "pool".
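The distance-first semantics can be pinned down with a brute-force reference sketch. It makes three assumptions. First, it computes Euclidean distance directly on the coordinate pairs, which reproduces the distances shown in the example executions. Second, it takes coordinates and keyword membership for hotels H1 to H8 from those example slides. Third, it models only the keywords "internet" and "pool". The names `HOTELS` and `distance_first_topk` are hypothetical.

```python
import math

# Hypothetical data taken from the example slides: id -> ((lat, lon), keywords).
# Only the keywords "internet" and "pool" are modeled.
HOTELS = {
    "H1": ((25.4, -80.1), {"internet"}),
    "H2": ((47.3, -122.2), {"internet", "pool"}),
    "H3": ((35.5, 139.4), {"pool"}),
    "H4": ((39.5, 116.2), {"pool"}),
    "H5": ((51.3, -0.5), set()),
    "H6": ((40.4, -73.5), {"internet"}),
    "H7": ((-33.2, -70.4), {"internet", "pool"}),
    "H8": ((-41.1, 174.4), {"pool"}),
}

def distance_first_topk(objects, point, keywords, k):
    """Rank by distance; keywords act as a conjunctive (AND) filter."""
    matches = [
        (math.dist(point, loc), oid)
        for oid, (loc, words) in objects.items()
        if keywords <= words  # the object must contain every query keyword
    ]
    matches.sort()
    return [(oid, round(d, 1)) for d, oid in matches[:k]]

print(distance_first_topk(HOTELS, (30.5, 100.0), {"internet", "pool"}, 2))
# [('H7', 181.9), ('H2', 222.8)]
```

This scans every object, so it only serves as a correctness baseline for the index-based methods.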

8
Nearest Neighbor Queries: First Baseline Algorithm
  • Many algorithms have been proposed.
  • Incremental NN [Hjaltason and Samet 99]
  • Navigates the R-Tree in order of distance to the
    query point
  • R-Tree Baseline: execute Incremental NN and, for
    each output object, check whether it contains the
    query keywords.

9
Example Execution of the R-Tree Baseline Algorithm
on the Distance-First Top-2 Spatial Keyword Query
(30.5, 100.0) with Keywords "internet" and "pool"
R-Tree (each MBR is given by its minimum and maximum
corners; leaf entries point to hotels H1 to H8):
  • Root node N1: N2 (-33.2, -122.2) to (47.3, -70.4);
    N3 (-41.1, -0.5) to (51.3, 174.4)
  • Node N2: N4 (40.4, -122.2) to (47.3, -73.5);
    N5 (-33.2, -80.1) to (25.4, -70.4)
  • Node N3: N6 (-41.1, 139.4) to (35.5, 174.4);
    N7 (39.5, -0.5) to (51.3, 116.2)
  • Node N4: H2 at (47.3, -122.2); H6 at (40.4, -73.5)
  • Node N5: H7 at (-33.2, -70.4); H1 at (25.4, -80.1)
  • Node N6: H8 at (-41.1, 174.4); H3 at (35.5, 139.4)
  • Node N7: H5 at (51.3, -0.5); H4 at (39.5, 116.2)
Execution (each priority queue entry is shown with
its distance to the query point):
  • Enqueue N1 (0.0)
  • Dequeue N1; enqueue N2 (170.4) and N3 (0.0)
  • Dequeue N3; enqueue N6 (39.4) and N7 (9.0)
  • Dequeue N7; enqueue H5 (102.6) and H4 (18.5)
  • Dequeue H4: H4 does not contain the keywords,
    hence it is discarded
  • If we continue, objects H3, H5, H8, H6, H1, H7,
    H2 are dequeued in that order. Only H7 and H2 are
    output, since they contain both "internet" and
    "pool".

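The R-Tree baseline can be sketched as a best-first traversal with a min-heap keyed on MINDIST. The traversal yields objects in distance order, and a keyword filter is applied to that stream. The sketch rebuilds the example tree above and assumes Euclidean distance on the raw coordinates, which reproduces the queue distances on the slide. Points are stored as degenerate rectangles. The helper names (`mindist`, `incremental_nn`, `rtree_baseline`) are hypothetical, and this is a simplified sketch rather than the authors' implementation.

```python
import heapq
import math

def mindist(point, rect):
    """Minimum Euclidean distance from a point to an axis-aligned rectangle."""
    (lo_x, lo_y), (hi_x, hi_y) = rect
    dx = max(lo_x - point[0], 0.0, point[0] - hi_x)
    dy = max(lo_y - point[1], 0.0, point[1] - hi_y)
    return math.hypot(dx, dy)

def pt(x, y):
    """Store a point object as a degenerate rectangle, as on the slide."""
    return ((x, y), (x, y))

# Example R-Tree: node -> list of (MBR, child). Children that are not keys are objects.
TREE = {
    "N1": [(((-33.2, -122.2), (47.3, -70.4)), "N2"), (((-41.1, -0.5), (51.3, 174.4)), "N3")],
    "N2": [(((40.4, -122.2), (47.3, -73.5)), "N4"), (((-33.2, -80.1), (25.4, -70.4)), "N5")],
    "N3": [(((-41.1, 139.4), (35.5, 174.4)), "N6"), (((39.5, -0.5), (51.3, 116.2)), "N7")],
    "N4": [(pt(47.3, -122.2), "H2"), (pt(40.4, -73.5), "H6")],
    "N5": [(pt(-33.2, -70.4), "H7"), (pt(25.4, -80.1), "H1")],
    "N6": [(pt(-41.1, 174.4), "H8"), (pt(35.5, 139.4), "H3")],
    "N7": [(pt(51.3, -0.5), "H5"), (pt(39.5, 116.2), "H4")],
}
# Keyword sets (only "internet" and "pool" are modeled).
WORDS = {"H1": {"internet"}, "H2": {"internet", "pool"}, "H3": {"pool"}, "H4": {"pool"},
         "H5": set(), "H6": {"internet"}, "H7": {"internet", "pool"}, "H8": {"pool"}}

def incremental_nn(tree, root, point):
    """Best-first traversal: yields (object, distance) in increasing distance."""
    heap = [(0.0, root)]
    while heap:
        dist, item = heapq.heappop(heap)
        if item not in tree:  # an object: it is the next nearest neighbor
            yield item, dist
            continue
        for rect, child in tree[item]:  # a node: enqueue all of its entries
            heapq.heappush(heap, (mindist(point, rect), child))

def rtree_baseline(tree, root, point, keywords, words, k):
    """Filter the incremental NN stream with the conjunctive keyword condition."""
    results = []
    for oid, dist in incremental_nn(tree, root, point):
        if keywords <= words[oid]:
            results.append((oid, round(dist, 1)))
            if len(results) == k:
                break
    return results

print(rtree_baseline(TREE, "N1", (30.5, 100.0), {"internet", "pool"}, WORDS, 2))
# [('H7', 181.9), ('H2', 222.8)]
```

The object order produced by `incremental_nn` is H4, H3, H5, H8, H6, H1, H7, H2, matching the slide. Six objects are fetched and rejected before the first answer appears.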
10
Keyword Search Queries: Second Baseline Algorithm
  • Keyword search on documents is well studied in
    IR. Two major methods:
  • Inverted index
  • Signature files [Faloutsos and Christodoulakis
    84]
  • Inverted Index Only (IIO) Baseline:
  • For each keyword, find the spatial objects that
    contain it
  • Intersect the resulting sets
  • For each object, compute its distance to the
    query point
  • Sort by distance and return the results to the
    user

11
Example Execution of the IIO Baseline Algorithm on
the Distance-First Top-2 Spatial Keyword Query
(30.5, 100.0) with Keywords "internet" and "pool"
  • Execute the inverted index for keyword
    "internet": H2, H6, H1, H7
  • Execute the inverted index for keyword "pool":
    H3, H4, H2, H7, H8
  • Intersect the two result sets: H2, H7
  • Get the coordinates for H2, calculate its
    distance (222.8), and add it to the result list
  • Get the coordinates for H7, calculate its
    distance (181.9), and add it to the result list
  • Sort; the top-2 results are H7 (181.9) and
    H2 (222.8)

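The IIO steps can be sketched with in-memory posting lists. The sketch uses the inverted lists and coordinates from the example above and assumes Euclidean distance on the raw coordinates. `INVERTED`, `COORDS` and `iio_topk` are hypothetical names, and a real inverted index would live on disk.

```python
import math

# Inverted lists from the example: keyword -> objects that contain it.
INVERTED = {
    "internet": ["H2", "H6", "H1", "H7"],
    "pool": ["H3", "H4", "H2", "H7", "H8"],
}
# Object coordinates, fetched only for objects that survive the intersection.
COORDS = {"H1": (25.4, -80.1), "H2": (47.3, -122.2), "H3": (35.5, 139.4),
          "H4": (39.5, 116.2), "H5": (51.3, -0.5), "H6": (40.4, -73.5),
          "H7": (-33.2, -70.4), "H8": (-41.1, 174.4)}

def iio_topk(inverted, coords, point, keywords, k):
    """Inverted Index Only baseline (expects at least one keyword)."""
    # 1. Look up each keyword's posting list and intersect them.
    postings = [set(inverted.get(word, ())) for word in keywords]
    candidates = set.intersection(*postings)
    # 2. Compute the distance from each candidate to the query point.
    scored = [(math.dist(point, coords[oid]), oid) for oid in candidates]
    # 3. Sort by distance and return the k nearest.
    scored.sort()
    return [(oid, round(d, 1)) for d, oid in scored[:k]]

print(iio_topk(INVERTED, COORDS, (30.5, 100.0), ["internet", "pool"], 2))
# [('H7', 181.9), ('H2', 222.8)]
```

As the slide describes it, this baseline computes distances for every object in the intersection before it can return anything.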
13
Information Retrieval R-Tree (IR2-Tree)
  • Combination of R-Tree and Signature Files.
  • Each node contains a rectangle and a signature.
  • The signature of a node is the superimposition
    (OR-ing) of all the signatures of its entries.
  • Bottom-up construction.
  • Multi-level IR2-Tree (MIR2-Tree)
  • Uses different signature lengths for different
    levels
  • More complex update operations
  • Fewer False Positives
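The superimposition rule can be checked against the IR2-Tree example (slide 15). OR-ing the signatures of each node's entries reproduces every internal-node signature shown there. The sketch assumes 16-bit signatures written as two 8-bit groups, as on the slide. It does not model how keyword signatures are generated (for example, by hashing). The helper names are hypothetical.

```python
def parse_sig(text):
    """Parse a signature written as on the slides, e.g. '10001011 00000010'."""
    return int(text.replace(" ", ""), 2)

def format_sig(value, bits=16):
    """Format an integer signature back into 8-bit groups."""
    s = format(value, f"0{bits}b")
    return " ".join(s[i:i + 8] for i in range(0, bits, 8))

# Tree shape and leaf (hotel) signatures from the IR2-Tree example.
CHILDREN = {"N1": ["N2", "N3"], "N2": ["N4", "N5"], "N3": ["N6", "N7"],
            "N4": ["H2", "H6"], "N5": ["H7", "H1"], "N6": ["H8", "H3"], "N7": ["H5", "H4"]}
LEAF_SIGS = {name: parse_sig(s) for name, s in {
    "H2": "10001011 00000010", "H6": "00001110 00100011",
    "H7": "10000011 00010110", "H1": "01111110 10000010",
    "H8": "00011001 01001011", "H3": "10011001 00001010",
    "H5": "01100101 10000011", "H4": "00001001 10010010"}.items()}

def build_signatures(node, children, leaf_sigs, out):
    """Bottom-up: a node's signature is the OR of its children's signatures."""
    if node not in children:
        out[node] = leaf_sigs[node]
    else:
        sig = 0
        for child in children[node]:
            sig |= build_signatures(child, children, leaf_sigs, out)
        out[node] = sig
    return out[node]

sigs = {}
build_signatures("N1", CHILDREN, LEAF_SIGS, sigs)
print(format_sig(sigs["N4"]))  # 10001111 00100011
print(format_sig(sigs["N2"]))  # 11111111 10110111
```

Because OR only adds 1-bits, every keyword bit set at a leaf is also set in all of its ancestors. A node whose signature lacks a query bit therefore cannot have a matching object below it.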

14
IR2-Tree Search Algorithm
  • Calculate the query signature.
  • Navigate the IR2-Tree as in the Incremental NN
    algorithm.
  • Discard nodes whose signatures do not match the
    query signature (i.e., do not have all of the
    query signature's 1-bits set).
  • Check returned objects for false positives.

15
Example Execution of the IR2-Tree Algorithm on the
Distance-First Top-2 Spatial Keyword Query
(30.5, 100.0) with Keywords "internet" and "pool"
IR2-Tree (same MBRs as in the R-Tree Baseline
example; each entry also stores a 16-bit signature):
  • Root node N1: N2 11111111 10110111;
    N3 11111101 11011011
  • Node N2: N4 10001111 00100011;
    N5 11111111 10010110
  • Node N3: N6 10011001 01001011;
    N7 01101101 10010011
  • Node N4: H2 10001011 00000010;
    H6 00001110 00100011
  • Node N5: H7 10000011 00010110;
    H1 01111110 10000010
  • Node N6: H8 00011001 01001011;
    H3 10011001 00001010
  • Node N7: H5 01100101 10000011;
    H4 00001001 10010010
Query signature:
  • The signature for "internet" is 00000010 00000000
  • The signature for "pool" is 00000001 00000000
  • Therefore the query signature is
    00000011 00000000
Execution (each priority queue entry is shown with
its distance to the query point):
  • Enqueue N1 (0.0)
  • Dequeue N1; enqueue N2 (170.4). Note that N3 is
    pruned.
  • Dequeue N2; enqueue N4 (173.8) and N5 (170.5)
  • Dequeue N5; enqueue H7 (181.9). Note that H1 is
    pruned.
  • Dequeue N4; enqueue H2 (222.8). Note that H6 is
    pruned.
  • Dequeue H7 and check whether it is a false
    positive; it is not, so it is our first result
  • Dequeue H2 and check whether it is a false
    positive; it is not, so it is our second result

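The search above can be sketched as the same best-first traversal as the R-Tree baseline, with one extra test per entry. An entry is enqueued only if `(entry_sig & query) == query`, and objects are verified against their real keyword sets at the end. The sketch rebuilds the example IR2-Tree with the slide's MBRs and signatures and assumes Euclidean distance on raw coordinates. `ir2_search`, `IR2`, `WORD_SIG` and `WORDS` are hypothetical names, and this is a simplified illustration, not the authors' code.

```python
import heapq
import math

def sig(text):
    """Parse a slide-style signature such as '10001011 00000010'."""
    return int(text.replace(" ", ""), 2)

def pt(x, y):
    """A point object stored as a degenerate rectangle."""
    return ((x, y), (x, y))

def mindist(point, rect):
    """Minimum Euclidean distance from a point to an axis-aligned rectangle."""
    (lo_x, lo_y), (hi_x, hi_y) = rect
    dx = max(lo_x - point[0], 0.0, point[0] - hi_x)
    dy = max(lo_y - point[1], 0.0, point[1] - hi_y)
    return math.hypot(dx, dy)

# Example IR2-Tree: node -> list of (MBR, signature, child).
IR2 = {
    "N1": [(((-33.2, -122.2), (47.3, -70.4)), sig("11111111 10110111"), "N2"),
           (((-41.1, -0.5), (51.3, 174.4)), sig("11111101 11011011"), "N3")],
    "N2": [(((40.4, -122.2), (47.3, -73.5)), sig("10001111 00100011"), "N4"),
           (((-33.2, -80.1), (25.4, -70.4)), sig("11111111 10010110"), "N5")],
    "N3": [(((-41.1, 139.4), (35.5, 174.4)), sig("10011001 01001011"), "N6"),
           (((39.5, -0.5), (51.3, 116.2)), sig("01101101 10010011"), "N7")],
    "N4": [(pt(47.3, -122.2), sig("10001011 00000010"), "H2"),
           (pt(40.4, -73.5), sig("00001110 00100011"), "H6")],
    "N5": [(pt(-33.2, -70.4), sig("10000011 00010110"), "H7"),
           (pt(25.4, -80.1), sig("01111110 10000010"), "H1")],
    "N6": [(pt(-41.1, 174.4), sig("00011001 01001011"), "H8"),
           (pt(35.5, 139.4), sig("10011001 00001010"), "H3")],
    "N7": [(pt(51.3, -0.5), sig("01100101 10000011"), "H5"),
           (pt(39.5, 116.2), sig("00001001 10010010"), "H4")],
}
# Keyword signatures from the slide.
WORD_SIG = {"internet": sig("00000010 00000000"), "pool": sig("00000001 00000000")}
# Actual keyword sets, used only to reject false positives.
WORDS = {"H1": {"internet"}, "H2": {"internet", "pool"}, "H3": {"pool"}, "H4": {"pool"},
         "H5": set(), "H6": {"internet"}, "H7": {"internet", "pool"}, "H8": {"pool"}}

def ir2_search(point, keywords, k, tree=IR2, root="N1"):
    """Distance-first top-k search that prunes entries by signature."""
    query = 0
    for word in keywords:
        query |= WORD_SIG[word]  # query signature = OR of keyword signatures
    heap, results, pruned = [(0.0, root)], [], []
    while heap and len(results) < k:
        dist, item = heapq.heappop(heap)
        if item not in tree:
            # Object: the signature only says "maybe", so verify the keywords.
            if keywords <= WORDS[item]:
                results.append((item, round(dist, 1)))
            continue
        for rect, entry_sig, child in tree[item]:
            if (entry_sig & query) == query:  # all query bits set: may match
                heapq.heappush(heap, (mindist(point, rect), child))
            else:
                pruned.append(child)  # cannot contain all query keywords
    return results, pruned

print(ir2_search((30.5, 100.0), {"internet", "pool"}, 2))
# ([('H7', 181.9), ('H2', 222.8)], ['N3', 'H1', 'H6'])
```

The final check is not redundant. H5's signature happens to have the "pool" bit set even though H5 is not in the "pool" inverted list. A "pool"-only query with k = 3 therefore dequeues H5 as a false positive and discards it.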
17
Experiments
  • Athlon 64 3400 (Newcastle) with 2 GB of RAM and
    a 74 GB 10,000 RPM drive
  • Block size is 4,096 bytes
  • Two real datasets provided by the High Performance
    Database Research Center (http://hpdrc.fiu.edu/)
  • Only results on the Hotels dataset are presented

18
Varying k
  • 2 keywords
  • Signature length 189 bytes (longer at the top
    levels of the MIR2-Tree)

19
Varying keywords
  • k = 10
  • Signature length 189 bytes (longer at the top
    levels of the MIR2-Tree)

20
Varying signature length
  • k = 10
  • 2 keywords
  • Tradeoff: number of nodes in the tree (determined
    by entries per block) vs. number of false positives
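The tradeoff can be made concrete with a back-of-the-envelope model. The sketch assumes the usual superimposed-coding approximation: each keyword sets `m` bits chosen uniformly at random, and bits are treated as independent. It also assumes a hypothetical 4,096-byte block, a 32-byte MBR, an 8-byte pointer, 3 bits per keyword and 50 keywords per hotel. None of these parameters come from the experiments, and the function names are hypothetical.

```python
def false_match_probability(sig_bits, bits_per_word, words_in_entry, query_words):
    """Approximate probability that an entry lacking the query keywords
    still has every query bit set (a false positive)."""
    # Probability that one particular bit is still 0 after OR-ing all words.
    p_zero = (1 - 1 / sig_bits) ** (bits_per_word * words_in_entry)
    # The query sets about bits_per_word * query_words bits; all must be 1.
    return (1 - p_zero) ** (bits_per_word * query_words)

def entries_per_block(block_bytes, sig_bytes, mbr_bytes=32, pointer_bytes=8):
    """Number of (MBR, signature, pointer) entries that fit in one block."""
    return block_bytes // (mbr_bytes + sig_bytes + pointer_bytes)

# Hypothetical parameters: 4,096-byte blocks, 3 bits per keyword,
# 50 keywords per hotel, 2 query keywords.
for sig_bytes in (64, 189, 512):
    fp = false_match_probability(8 * sig_bytes, 3, 50, 2)
    print(f"{sig_bytes:4d}-byte signature: "
          f"{entries_per_block(4096, sig_bytes):2d} entries/block, "
          f"false-positive probability ~ {fp:.2e}")
```

Under this model, a longer signature lowers the false-positive probability but fits fewer entries per block, so the tree has more nodes. An internal node superimposes the keywords of every object below it, which raises its false-positive probability at a fixed length. That is consistent with the MIR2-Tree using longer signatures at the top levels.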

21
Index Size (MB)
23
Related Work
  • Nearest neighbor queries:
  • N. Roussopoulos, S. Kelley, and F. Vincent.
    Nearest neighbor queries. SIGMOD, 1995.
  • G. R. Hjaltason and H. Samet. Distance browsing
    in spatial databases. ACM TODS, 24(2), 1999.
  • Combination of spatial and keyword queries:
  • D. Park and H. Kim. An enhanced technique for
    k-nearest neighbor queries with non-spatial
    selection predicates. Multimedia Tools and
    Applications, 19(1):79-103, January 2003.
  • Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma.
    Hybrid index structures for location-based web
    search. ACM CIKM, 2005.
  • Signature files:
  • C. Faloutsos and S. Christodoulakis. Signature
    files: an access method for documents and its
    analytical performance evaluation. ACM TOIS,
    2(4):267-288, 1984.
  • D. L. Lee, Y. M. Kim, and G. Patel. Efficient
    signature file methods for text retrieval. IEEE
    TKDE, 7(3):423-435, June 1995.

25
Conclusions
  • A framework for top-k spatial keyword search
    queries and their variants.
  • A proposed index, the IR2-Tree, combining the
    R-Tree with signature files.
  • An algorithm for top-k spatial keyword search.
  • A comprehensive experimental study.

26
Thank You!
  • Questions?