A Rule-Based Optimizer for Spatial Join Operations

1
A Rule-Based Optimizer for Spatial Join Operations
  • Miguel Fornari
  • João Luiz Comba
  • Cirano Iochpe
  • Instituto de Informática
  • Universidade Federal do Rio Grande do Sul
  • Porto Alegre - Brazil

2
Outline
  • Introduction and motivation
  • Spatial Hash Algorithms
  • The Validation System Architecture
  • The Rules
  • Conclusions

3
Introduction and Motivation
  • The spatial join operation is fundamental and
    expensive in GIS
  • Combines two sets of spatial features based on a
    spatial predicate
  • Traditionally, a DBMS improves query performance
    using heuristic rules and cost expressions
  • A spatial DBMS can include a module specific to
    spatial operations

4
Goal
  • Reduce the response time of spatial join
    algorithms in the filter step
  • A set of rules to optimize the performance of
    some well-known algorithms
  • Which parameters are relevant?
  • What is the best value for each important
    parameter?

5
Spatial Join (SJ) Algorithms
  • Classified according to the file organization

6
SJ algorithms
  • For each algorithm, two cost expressions are
    important
  • I/O cost
  • CPU cost
  • Some cost expressions are already known, but not
    all
  • All expressions are written in a common notation

7
The System Architecture
  • The performance analysis, although correct,
    simplifies many cases
  • Real cases are more complex
  • Real data sets obtained from the Internet

8
Plane-sweep algorithm
  • All SJ algorithms load objects into memory and
    run a plane-sweep algorithm to check whether
    pairs of objects satisfy the spatial predicate.
  • The traditional bound is O(k + n log n), where
    k is the number of object intersections
  • But O(c + n log n), where c is the number of
    comparisons actually performed, is more exact.
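The gap between k and c is easiest to see in code. Below is a minimal Python sketch of a sorted plane-sweep filter step over minimum bounding rectangles (MBRs). The name `plane_sweep_join`, the `(xmin, ymin, xmax, ymax)` tuple layout, and the choice of x as the sweep axis are assumptions for illustration, not the authors' implementation. The counter `c` records every candidate pair whose x-extents overlap, which can be far larger than the number k of pairs actually reported.

```python
from typing import List, Tuple

# Minimum bounding rectangle (MBR) as (xmin, ymin, xmax, ymax); layout assumed.
Rect = Tuple[float, float, float, float]


def plane_sweep_join(a: List[Rect], b: List[Rect]) -> Tuple[List[Tuple[int, int]], int]:
    """Filter-step join of two MBR sets by sweeping along the x-axis.

    Returns the intersecting (index in a, index in b) pairs and c, the
    number of pairwise MBR comparisons actually performed.
    """
    ea = sorted(range(len(a)), key=lambda i: a[i][0])  # sort by xmin
    eb = sorted(range(len(b)), key=lambda j: b[j][0])
    pairs: List[Tuple[int, int]] = []
    c = 0
    i = j = 0
    while i < len(ea) and j < len(eb):
        if a[ea[i]][0] <= b[eb[j]][0]:
            r = a[ea[i]]
            # Compare r with every remaining b whose xmin lies within r's x-extent.
            k = j
            while k < len(eb) and b[eb[k]][0] <= r[2]:
                s = b[eb[k]]
                c += 1
                if r[1] <= s[3] and s[1] <= r[3]:  # y-extents overlap too
                    pairs.append((ea[i], eb[k]))
                k += 1
            i += 1
        else:
            s = b[eb[j]]
            # Symmetric case: s has the smallest remaining xmin.
            k = i
            while k < len(ea) and a[ea[k]][0] <= s[2]:
                r = a[ea[k]]
                c += 1
                if r[1] <= s[3] and s[1] <= r[3]:
                    pairs.append((ea[k], eb[j]))
                k += 1
            j += 1
    return pairs, c
```

Sorting contributes the n log n term; each inner-loop step costs one comparison, so the running time tracks c rather than k.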

9
Plane-sweep algorithm
  • Divide the space into strips
  • Count the number of objects in each strip
  • The strip width is the average object size
  • Estimation of c: sum of all values

10
Rule 1 - Plane-sweep
  • The DBMS can estimate c for each axis and choose
    the one with the smaller value of c, optimizing
    the plane sweep.
  • The shape of the objects affects the response time
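The strip-based estimate of c and the axis choice of Rule 1 can be sketched together: estimate c along each axis from strip counts, then sweep along the axis with the smaller estimate. The slides do not say exactly which value each strip contributes, so this Python sketch assumes that a strip contributes n_A × n_B (the objects of each set that touch it) and that an object is counted in every strip its extent touches. The names `estimate_c` and `choose_sweep_axis` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def estimate_c(a: List[Rect], b: List[Rect], axis: int) -> int:
    """Estimate plane-sweep comparisons along axis 0 (x) or 1 (y).

    Assumed scheme: strips as wide as the average object extent on that
    axis; each object counts in every strip it touches; a strip
    contributes n_A * n_B, and the estimate is the sum over all strips.
    """
    lo, hi = axis, axis + 2
    objs = a + b
    width = sum(r[hi] - r[lo] for r in objs) / len(objs)
    width = width or 1.0  # guard against degenerate (point) data
    origin = min(r[lo] for r in objs)

    def counts(rects: List[Rect]) -> Counter:
        cnt: Counter = Counter()
        for r in rects:
            first = math.floor((r[lo] - origin) / width)
            last = math.floor((r[hi] - origin) / width)
            for s in range(first, last + 1):
                cnt[s] += 1
        return cnt

    ca, cb = counts(a), counts(b)
    return sum(n * cb[s] for s, n in ca.items())  # missing strips count as 0


def choose_sweep_axis(a: List[Rect], b: List[Rect]) -> str:
    """Rule 1: sweep along the axis with the smaller estimated c."""
    cx, cy = estimate_c(a, b, 0), estimate_c(a, b, 1)
    return "x" if cx <= cy else "y"
```

For 10 + 10 tall, thin rectangles placed side by side along x, e.g. `a = [(i, 0, i + 0.5, 10) for i in range(10)]` and `b = [(i + 0.25, 0, i + 0.75, 10) for i in range(10)]`, the sketch estimates c = 20 along x and c = 200 along y, so it sweeps along x. Transposing the same rectangles flips the choice to y, which is how object shape changes the cost.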

11
Synchronized Tree Traversal (STT)
  • Well-known algorithm for R-trees
  • The performance depends on the height of the
    R-trees and the average node size.
  • The space division reduces the number of object
    comparisons (c).
  • Available memory is not important.
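A minimal recursive sketch of synchronized traversal over two R-trees follows: the traversal descends only into pairs of entries whose MBRs intersect, which is how the space division prunes comparisons. The `Node` class is a simplified, hypothetical layout (object entries carry an `obj_id`, directory entries carry `children`). Trees of different heights are handled by descending only the deeper side, a nested loop over entry pairs stands in for the plane sweep that practical implementations run inside each node pair, and buffering is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


@dataclass
class Node:
    """Simplified R-tree entry: object entries set obj_id, directory entries have children."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    obj_id: Optional[int] = None


def intersects(r: Rect, s: Rect) -> bool:
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def stt(n1: Node, n2: Node, out: List[Tuple[int, int]]) -> None:
    """Synchronized traversal: prune any pair of entries whose MBRs do not intersect."""
    if not intersects(n1.mbr, n2.mbr):
        return
    if n1.obj_id is not None and n2.obj_id is not None:
        out.append((n1.obj_id, n2.obj_id))  # candidate pair for the refinement step
    elif n1.obj_id is not None:  # only n2 can go deeper
        for c2 in n2.children:
            stt(n1, c2, out)
    elif n2.obj_id is not None:  # only n1 can go deeper
        for c1 in n1.children:
            stt(c1, n2, out)
    else:
        for c1 in n1.children:
            for c2 in n2.children:
                stt(c1, c2, out)
```

Calling `stt(root_a, root_b, out)` fills `out` with candidate object pairs; each level visited costs one node access per tree, which is why the height of the R-trees and the node size drive the cost.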

12
Rule 2 - STT
  • The STT algorithm is optimized by defining nodes
    with a small number of entries.
  • However, the total number of nodes then grows,
    which sets a lower limit for the rule.
  • Optimal value: between 50 and 75 entries per node
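The lower limit in Rule 2 follows from simple arithmetic on the tree shape. This Python sketch assumes completely full nodes (real R-trees are rarely 100% full) and computes the height and total node count for a given number of entries per node. `rtree_shape` is a hypothetical helper, and the sketch does not reproduce the measured 50 to 75 optimum.

```python
from typing import Tuple


def rtree_shape(n_objects: int, capacity: int) -> Tuple[int, int]:
    """Return (height, total_nodes) of an R-tree whose nodes are all completely full.

    Full nodes are an assumption; real R-trees usually have lower fill factors.
    """
    if n_objects < 1 or capacity < 2:
        raise ValueError("need at least one object and a capacity of at least 2")
    height, total, level = 0, 0, n_objects
    while True:
        level = -(-level // capacity)  # integer ceiling of level / capacity
        total += level
        height += 1
        if level == 1:
            return height, total


for m in (10, 50, 75, 200):
    print(m, rtree_shape(1_000_000, m))
```

With one million objects the sketch gives height 6 and 111,111 nodes for 10 entries per node, versus height 4 and 20,409 nodes for 50 entries per node: smaller nodes mean fewer comparisons per node pair but many more nodes to visit.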

13
Rule 2 - STT
  • The performance of the STT algorithm is constant
    as the memory buffer size increases,
  • except for very small buffer sizes
  • Set the buffer to any value greater than 4 × the
    height of the R-trees

14
Iterative Stripped SJ (ISSJ)
  • Iterative spatial join with strips (Jacox and
    Samet)
  • Strips divide the space and reduce c
  • Objects that cross strip boundaries can occur
  • The sorting can be either internal or external
  • The performance depends on the available memory,
    the number of strips, and the number of replicas.
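The strip idea behind ISSJ can be sketched in memory as follows. This is a simplified Python illustration, not Jacox and Samet's algorithm: objects are assigned to every vertical strip their x-extent touches (the replicas), a nested loop inside each strip stands in for the per-strip plane sweep, and sorting and external memory are ignored. Duplicate results caused by replicated objects are suppressed with a reference-point test (report a pair only in the strip that contains the left edge of its intersection), which is an assumption on our part. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def strip_of(x: float, x0: float, width: float, s: int) -> int:
    """Index of the vertical strip containing x, clamped to [0, s - 1]."""
    return min(s - 1, max(0, int((x - x0) // width)))


def assign_strips(rects: List[Rect], x0: float, x1: float, s: int) -> Dict[int, List[int]]:
    """Put each object in every strip its x-extent touches; extra copies are replicas."""
    width = (x1 - x0) / s
    strips: Dict[int, List[int]] = {k: [] for k in range(s)}
    for i, r in enumerate(rects):
        for k in range(strip_of(r[0], x0, width, s), strip_of(r[2], x0, width, s) + 1):
            strips[k].append(i)
    return strips


def strip_join(a: List[Rect], b: List[Rect], x0: float, x1: float, s: int) -> List[Tuple[int, int]]:
    """Join strip by strip; a nested loop stands in for the per-strip plane sweep."""
    width = (x1 - x0) / s
    sa, sb = assign_strips(a, x0, x1, s), assign_strips(b, x0, x1, s)
    result: List[Tuple[int, int]] = []
    for k in range(s):
        for i in sa[k]:
            for j in sb[k]:
                r, t = a[i], b[j]
                if r[0] <= t[2] and t[0] <= r[2] and r[1] <= t[3] and t[1] <= r[3]:
                    # Reference point: report the pair only in the strip holding the
                    # left edge of the intersection, so replicas add no duplicates.
                    if strip_of(max(r[0], t[0]), x0, width, s) == k:
                        result.append((i, j))
    return result
```

Summing `len(ids)` over the strips returned by `assign_strips` gives the number of processed objects including replicas, the quantity that grows as strips are added.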

15
Rule 3 - ISSJ
  • The ISSJ algorithm is optimized by defining a
    large number of strips. Each strip then holds few
    objects, but the gain is limited by the replicas
    that are added.
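A back-of-the-envelope model, which is an assumption here rather than a result from the slides, shows why replicas bound this rule. Take n objects with average extent w along the strip axis, a space of extent W split into s equal strips, and object positions uniform within the space (ignoring its borders). An interval of length w crosses on average w/(W/s) strip boundaries, so the expected number of strips per object, the total number N_s of processed objects (replicas included), and the load per strip are:

```latex
\[
\mathbb{E}[\text{strips per object}] \approx 1 + \frac{w}{W/s} = 1 + \frac{w\,s}{W},
\qquad
N_s \approx n\left(1 + \frac{w\,s}{W}\right),
\qquad
\frac{N_s}{s} \approx \frac{n}{s} + \frac{n\,w}{W}
\]
```

The per-strip load therefore approaches n·w/W instead of shrinking toward zero, while the total number of processed objects keeps growing linearly with s.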

16
Rule 3 - ISSJ
  • It is important to allocate enough memory to
    perform an internal sort of each set

17
Partition Based Spatial-Merge Join (PBSM)
  • Calculates the number of partitions
  • Uses a regular grid to divide the space
  • Partitions can overflow
  • The performance depends on the number of replicas
    and overflowed partitions
  • The number of object comparisons (c) is reduced
    by the space subdivision
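The PBSM partitioning step can be sketched in Python as follows. The partition-count formula (total entry size divided by available memory) is the usual estimate for PBSM and is assumed here; mapping grid tiles to partitions round-robin (`tile_id % p`) is likewise an assumption, and the per-partition join and duplicate elimination are left out. Function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def num_partitions(n_r: int, n_s: int, entry_bytes: int, memory_bytes: int) -> int:
    """Assumed estimate: enough partitions for each one's entries to fit in memory."""
    return max(1, math.ceil((n_r + n_s) * entry_bytes / memory_bytes))


def partition(rects: List[Rect], space: Rect, tiles_per_side: int, p: int) -> Dict[int, List[int]]:
    """Map each object to every partition that owns a grid tile it overlaps.

    Tiles are assigned to partitions round-robin (tile_id % p), so objects
    that straddle tiles owned by different partitions become replicas.
    """
    x0, y0, x1, y1 = space
    tw = (x1 - x0) / tiles_per_side
    th = (y1 - y0) / tiles_per_side

    def cell(v: float, origin: float, size: float) -> int:
        return min(tiles_per_side - 1, max(0, int((v - origin) // size)))

    parts: Dict[int, List[int]] = {k: [] for k in range(p)}
    for i, r in enumerate(rects):
        owners: Set[int] = set()  # a set avoids copying twice into one partition
        for cx in range(cell(r[0], x0, tw), cell(r[2], x0, tw) + 1):
            for cy in range(cell(r[1], y0, th), cell(r[3], y0, th) + 1):
                owners.add((cy * tiles_per_side + cx) % p)
        for k in owners:
            parts[k].append(i)
    return parts


def overflowed(parts: Dict[int, List[int]], capacity: int) -> List[int]:
    """Partitions holding more entries than fit in memory (capacity in entries)."""
    return [k for k, ids in parts.items() if len(ids) > capacity]
```

Raising `p` spreads objects over more partitions, but every object that straddles tiles owned by different partitions is copied into each of them, so the total number of entries grows with the number of partitions.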

18
Rule 4 - PBSM
  • PBSM is improved by setting a high number of
    partitions, either by using a small memory size
    or simply by setting a lower bound on the number
    of partitions.

19
Rule 4 - PBSM
  • This rule is limited by the number of replicas,
    which increase the number of processed objects.

20
Histogram Hash Stripped Join
  • A histogram of object distribution guides the
    space partitioning to avoid overflow
  • Replicas are counted in the histogram
  • The objects are kept in a hash file and are
    loaded into memory only once.
  • The performance is affected by the number of
    replicas and the space subdivision
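The slides do not detail the histogram, so the following Python sketch only illustrates the idea under stated assumptions: a one-dimensional histogram over equal-width x-cells, objects counted once in every cell they touch (so replicas are included), and consecutive cells merged greedily while the estimated load fits in memory. Because an object spanning two cells of the same partition is counted twice, the estimate is conservative. The names `histogram` and `plan_partitions` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def histogram(rects: List[Rect], x0: float, x1: float, cells: int) -> List[int]:
    """Objects per x-cell; an object spanning k cells is counted k times (replicas)."""
    width = (x1 - x0) / cells
    hist = [0] * cells

    def cell(x: float) -> int:
        return min(cells - 1, max(0, int((x - x0) // width)))

    for r in rects:
        for c in range(cell(r[0]), cell(r[2]) + 1):
            hist[c] += 1
    return hist


def plan_partitions(hist: List[int], capacity: int) -> List[Tuple[int, int]]:
    """Greedily merge consecutive cells while the (over-)estimated load fits in memory.

    Returns inclusive (first_cell, last_cell) ranges, one per partition.
    """
    plans: List[Tuple[int, int]] = []
    start, load = 0, 0
    for c, n in enumerate(hist):
        if load + n > capacity and c > start:
            plans.append((start, c - 1))  # close the current partition
            start, load = c, 0
        load += n
    plans.append((start, len(hist) - 1))
    return plans
```

A single cell denser than `capacity` still forms a partition on its own; the sketch does not split it further.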

21
Rule 5 - HHSJ
  • HHSJ is improved by setting a large number of
    partitions and a large number of strips.

22
Rule 5 - HHSJ
  • This rule is also limited by the number of
    replicas, which increase the number of processed
    objects.

23
Conclusions and Future Work
  • Our main contribution
  • The use of rules can reduce the response time of
    individual algorithms, in some cases by more than
    50%.
  • The rules can be incorporated into real GDBMSs
  • Future work
  • Use 3D data sets to perform the tests
  • Include other spatial operations
  • Implement in PostGIS

24
Contact and Questions
  • Miguel Fornari
  • fornari_at_ieee.org
  • www.inf.ufrgs.br/fornari
  • João Comba
  • comba_at_inf.ufrgs.br
  • www.inf.ufrgs.br/comba
  • Cirano Iochpe
  • ciochpe_at_inf.ufrgs.br
  • www.inf.ufrgs.br/ciochpe