Title: Generalized Multidimensional Data Mapping and Query Processing GiMP


1
Generalized Multi-dimensional Data Mapping and
Query Processing (GiMP)
  • Authors: Rui Zhang et al.
  • ACM TODS 2005
  • Presented by Youngdae Kim @ IDS Lab.
  • 18 Sep, 2007

2
Background
  • Multi-dimensional data
  • spatial data
  • geographic information
  • ex) Pohang located at (129, 35)
  • object with many fields
  • ex) employee relation with fields id, salary,
    name, age, address, ...
  • Queries
  • point query
  • give me object(s) located at (3, 5)
  • give me employee(s) with age = 35 and name = Jack
  • range query (window query)
  • give me all objects whose location overlaps with
    the range [3, 7] and [4, 6]
  • kNN query
  • give me the k nearest neighbors of object a

[Figure: two-dimensional data space with axes d1 and d2]
3
Background (cont.)
  • Index structure
  • R-tree
  • pack regions into rectangles close to each other
    recursively
  • does not use the stable DBMS index structure
    (e.g., B-tree)
  • not easy to integrate with current DBMSs
    (complicated concurrency and recovery problems
    exist)
  • why not use B-tree?
  • not easy to assign orders (or keys) to
    multi-dimensional data sequentially while
    preserving their proximity
  • but efficiency and reliability are high if we can
    use B-tree

[Figure: R-tree; points that are close in multi-d: are they still close in one-d?]


4
Mapping-based Indexing Schemes
  • General strategy
  • mapping
  • multi-dimensional data → one-dimensional
    data (key)
  • one-to-one or many-to-one
  • query processing using B-tree
  • transform multi-dimensional query into key
    range(s)
  • get matched entries using B-tree
  • discard false positives
  • we obtain a superset of the answers → it may
    contain irrelevant data
  • discard them
  • Examples
  • UB-tree, Pyramid technique, iMinMax, iDistance
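
The three steps of the general strategy can be sketched as follows. This is only an illustration under stated assumptions: the mapping `key_of` (sum of coordinates) is a hypothetical many-to-one mapping chosen for brevity, not one of the schemes listed above, and a sorted Python list searched with `bisect` stands in for the B-tree.

```python
import bisect

def key_of(point):
    # Hypothetical many-to-one mapping: sum of the coordinates.
    return sum(point)

def window_query(points, lo, hi):
    """Return (candidates, answers) for the window lo[i] <= p[i] <= hi[i]."""
    # Stand-in for the B-tree: (key, point) entries kept in key order.
    entries = sorted((key_of(p), p) for p in points)
    keys = [k for k, _ in entries]
    # 1. Transform the multi-dimensional window into a key range.
    #    Any point inside the window has a key in [key_of(lo), key_of(hi)].
    k_lo, k_hi = key_of(lo), key_of(hi)
    # 2. One-dimensional range search over the keys.
    start = bisect.bisect_left(keys, k_lo)
    stop = bisect.bisect_right(keys, k_hi)
    candidates = [p for _, p in entries[start:stop]]
    # 3. Discard false positives by checking the original coordinates.
    answers = [p for p in candidates
               if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
    return candidates, answers

points = [(3, 5), (1, 9), (4, 4), (8, 8), (7, 1), (5, 6)]
candidates, answers = window_query(points, lo=(3, 4), hi=(7, 6))
```

Here the key range returns five candidates, of which two, (7, 1) and (1, 9), are false positives that step 3 removes.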

5
Observations
  • Crux of mapping-based indexing scheme
  • mapping method
  • distance from reference point + scattering factor
  • query transformation
  • multi-dimensional window query transformed into
    one-dimensional range query
  • for kNN query, use the incremental mapping
    mechanism

[Figure: point p1 and reference point r; key (p1) = distance + scattering factor]
6
Contributions
  • Generalizes the mapping-based indexing and query
    processing process (GiMP)
  • defined a framework for easy extension
  • cf) GiST: generalizes tree-search indexing
    schemes
  • Suggests a measurement to predict performance of
    mapping-based indexing scheme
  • Solves the mappability problem
  • Is there a one-to-one mapping for a given data
    space?

7
GiMP Structure
  • Underlying index: B-tree
  • Data mapping
  • components: Reference(P), Distance(P1, P2), Base(P)
  • Queries: point query, range query, nearest neighbor
  • components: MapRange(rg), MapAnnulus(Q, rmin, rmax)
  • Basic operations: insert, delete
  • components: Insert(P), Delete(P)
8
GiMP Data Mapping
  • Components
  • Reference (P)
  • reference point for P
  • ex) starting point with Z-value 0
  • Distance (P1, P2)
  • distance between P1 and P2 in multi-dimensional
    space
  • can be L1, Euclidean, Max, or any user-defined
    distance
  • Base (P)
  • value to be added to the transformed value
  • usually used for scattering keys
  • Key (P) = Base (P) + Distance (P, Reference (P))
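
A minimal sketch of how the three data-mapping components could be plugged together. The class name `DataMapping` and the two example mappings are hypothetical and only illustrate the formula Key (P) = Base (P) + Distance (P, Reference (P)); they are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Point = Sequence[float]

@dataclass
class DataMapping:
    # The three user-supplied components named on the slide.
    reference: Callable[[Point], Point]
    distance: Callable[[Point, Point], float]
    base: Callable[[Point], float]

    def key(self, p: Point) -> float:
        # Key(P) = Base(P) + Distance(P, Reference(P))
        return self.base(p) + self.distance(p, self.reference(p))

def l1(p: Point, q: Point) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))

def origin(p: Point) -> Point:
    return (0.0,) * len(p)

# Hypothetical instantiation 1: Euclidean distance from the origin, no base.
euclid_mapping = DataMapping(reference=origin, distance=math.dist,
                             base=lambda p: 0.0)
# Hypothetical instantiation 2: L1 distance from the origin, shifted by 100.
l1_mapping = DataMapping(reference=origin, distance=l1,
                         base=lambda p: 100.0)

k1 = euclid_mapping.key((3.0, 4.0))
k2 = l1_mapping.key((3.0, 4.0))
```

Swapping in a different `distance` or `base` changes the key without touching the rest of the index, which is the point of defining the mapping through components.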

9
GiMP Query Processing
  • Components
  • MapRange (rg)
  • transform given range (rg) into key range
  • MapAnnulus (Q, rmin, rmax)
  • transform given annulus into one-dimensional
    intervals, usually incremental mapping
  • used for kNN search

[Figure: MapRange returns a set of intervals [a1, b1], [a2, b2], ..., [an, bn]; MapAnnulus takes rmin and rmax and also returns a set of intervals]
10
GiMP Basic Operations
  • Components
  • Insert (P)
  • calculate Key (P) and insert into B-tree using
    the usual B-tree insertion operation
  • Delete (P)
  • use the usual B-tree deletion operation
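
The two basic operations can be sketched as below. As an assumption for illustration, a sorted Python list of (key, point) entries stands in for the B-tree, and the class name `MappedIndex` and the `key_fn` parameter are hypothetical.

```python
import bisect

class MappedIndex:
    """Sorted list of (key, point) entries standing in for the B-tree."""

    def __init__(self, key_fn):
        self.key_fn = key_fn   # computes Key(P) for a point
        self.entries = []      # kept sorted by (key, point)

    def insert(self, p):
        # Calculate Key(P), then do an ordinary ordered insertion.
        bisect.insort(self.entries, (self.key_fn(p), p))

    def delete(self, p):
        # Locate the entry by its key, then remove it if present.
        entry = (self.key_fn(p), p)
        i = bisect.bisect_left(self.entries, entry)
        if i < len(self.entries) and self.entries[i] == entry:
            del self.entries[i]
            return True
        return False

index = MappedIndex(key_fn=sum)   # sum of coordinates as a toy key
for point in [(3, 5), (1, 1), (2, 2)]:
    index.insert(point)
after_insert = list(index.entries)
removed = index.delete((2, 2))
missing = index.delete((9, 9))
```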

11
GiMP UB-tree Instantiation
  • Data mapping (one-to-one)
  • use Z-value to map multi-dimensional data
  • P → Z-value, one-to-one mapping
  • Reference (P) = the point with Z-value 0
  • Distance (P1, P2) = difference of Z-values
  • Base (P) = 0, since the Z-value mapping is
    one-to-one
  • Key (P) = Base (P) + Distance (P, Reference (P))
    = Z-value of P
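
A Z-value can be computed by interleaving the bits of the coordinates. The sketch below assumes non-negative integer coordinates and puts dimension 0 in the lowest bit; the bit order and the `bits` parameter are assumptions made for illustration.

```python
def z_value(point, bits=8):
    """Interleave the bits of the coordinates (dimension 0 in the lowest bit)."""
    d = len(point)
    z = 0
    for i in range(bits):                  # bit position within each coordinate
        for j, coord in enumerate(point):  # dimension index
            z |= ((coord >> i) & 1) << (i * d + j)
    return z

# Key(P) = Base(P) + Distance(P, Reference(P)) = 0 + (Z(P) - 0) = Z(P)
keys = {(x, y): z_value((x, y)) for x in range(8) for y in range(8)}
```

Every cell of the 8 x 8 grid gets a distinct key, which is what makes this mapping one-to-one.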

12
GiMP UB-tree Instantiation (cont.)
  • Query processing
  • MapRange (rg)
  • find the Z-value range corresponding to the rg
  • ex) suppose rg is the orange region


[Figure: the orange query region on the Z-curve; intervals to search: [12, 15], [24, 27]; then B-tree search]

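
A brute-force sketch of MapRange for Z-values in two dimensions. The original figure is lost, so the query rectangle used here (x in 2..5, y in 2..3) is a hypothetical one, chosen because under the bit order assumed below it produces the intervals quoted on the slide. A real UB-tree does not enumerate every cell like this.

```python
def z_value(x, y, bits=8):
    """Interleave the bits of x and y, with x in the lowest bit."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def map_range(x_range, y_range):
    """Return the Z-value intervals covering an inclusive integer rectangle."""
    zs = sorted(z_value(x, y)
                for x in range(x_range[0], x_range[1] + 1)
                for y in range(y_range[0], y_range[1] + 1))
    intervals = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]

intervals = map_range((2, 5), (2, 3))
```

Each resulting interval becomes one range search on the B-tree.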
13
GiMP Pyramid Instantiation
  • Data mapping (many-to-one)
  • divide the d-dimensional space into 2d pyramids
    that share the center point of the space as their
    top and a (d-1)-dimensional surface of the data
    space as their base
  • each of the 2d pyramids is divided into several
    partitions
  • each data point has a height
  • key (P) = height of P + pyramid number

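[Figure: data space divided into pyramids p0 to p3 around the center point; each pyramid has a (d-1)-dimensional surface as its base and is split into partitions; the height of a point v is marked]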
14
GiMP Pyramid Instantiation (cont.)
  • Data mapping (many-to-one) (cont.)
  • Reference (P) = center point
  • Distance (P, Reference (P)) = height of P
  • Base (P) = pyramid number
  • Key (P) = Base (P) + Distance (P, Reference (P))
  • = pyramid number + height of P
  • ex) assume height of v = 2.5, then Key (v) = 1 +
    2.5 = 3.5
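
A sketch of the pyramid key computation. It assumes a unit data space [0, 1]^d, so heights lie in [0, 0.5], and it uses the numbering convention of the original Pyramid technique (pyramid j or j + d for dimension j). Both assumptions may differ from the scale and numbering of the slide's figure and example.

```python
def pyramid_key(v):
    """Return (pyramid number, height, key) for a point v in [0, 1]^d."""
    d = len(v)
    # The dimension in which v is farthest from the center decides the pyramid.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Lower side of that dimension -> pyramid j_max, upper side -> j_max + d.
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])   # Distance(P, Reference(P))
    return pyramid, height, pyramid + height

p_a = pyramid_key((0.2, 0.6))
p_b = pyramid_key((0.6, 0.9))
```

Because the height stays below 1 in this setting, keys of different pyramids never overlap, while many points inside one pyramid share the same key (many-to-one).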

15
GiMP Pyramid Instantiation (cont.)
  • Query processing
  • MapRange (rg)
  • find the key range for the partitions which
    overlap the rg
  • ex) suppose rg is the dark-shaded region

[Figure: pyramids p0 to p3 in the (d0, d1) plane; the intervals to search correspond to the light-shaded partitions]
16
GiMP Pyramid Instantiation (cont.)
  • Query processing (cont.)
  • MapAnnulus (Q, rmin, rmax)
  • incremental key range search
  • ex) suppose we first try (a) and then (b) for kNN
    search
  • at (a), range query transforms to [2+hQ-r0,
    2+hQ+r0] for pyr2
  • save the lower bound (2+hQ-r0) and upper bound
    (2+hQ+r0)
  • at (b), range query transforms to [2, 2+hQ-r0],
    [2+hQ+r0, 2+hQ+r0+dr] for pyr2 → the keys to be
    searched form a continuous range
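
The incremental key ranges for the pyramid that contains the query point can be sketched as below. This is an illustration under assumptions: the function name, the parameter `h_max` (largest height in a pyramid, 0.5 for a unit data space) and the numeric values are hypothetical, and the other pyramids, which need their own treatment, are ignored.

```python
def annulus_key_ranges(pyramid, h_q, r_min, r_max, h_max=0.5):
    """Key ranges not yet searched when the radius grows from r_min to r_max."""
    center = pyramid + h_q
    low_key, high_key = pyramid, pyramid + h_max   # key span of this pyramid
    if r_min == 0:
        # First step: one range around the key of the query point.
        pieces = [(center - r_max, center + r_max)]
    else:
        # Later steps: only the two strips outside the saved bounds.
        pieces = [(center - r_max, center - r_min),
                  (center + r_min, center + r_max)]
    ranges = []
    for a, b in pieces:
        a, b = max(a, low_key), min(b, high_key)   # clip to the pyramid
        if a <= b:
            ranges.append((a, b))
    return ranges

step_a = annulus_key_ranges(pyramid=2, h_q=0.25, r_min=0.0, r_max=0.1)
step_b = annulus_key_ranges(pyramid=2, h_q=0.25, r_min=0.1, r_max=0.3)
```

With these values step (a) covers roughly [2.15, 2.35] and step (b) adds [2, 2.15] and [2.35, 2.5], so the searched keys together form one continuous range.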

17
GiMP iDistance Instantiation
  • Data mapping (many-to-one)
  • data space is divided into Np partitions
  • each partition has a reference point
  • data point P belongs to partition Ni if i =
    argmin dist(P, ri)
  • key (P) = distance (P, ri) + i * c
  • Reference (P) = nearest reference point to P
  • Base (P) = i * c
  • Distance (P1, P2) = Euclidean distance between P1
    and P2
  • Key (P) = Base (P) + Distance (P, Reference (P))
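
A short sketch of the iDistance key. It assumes 0-based partition indices and a constant `c` larger than any distance inside a partition, so that keys of different partitions do not overlap; the reference points and `c = 100` are made-up values.

```python
import math

def idistance_key(p, refs, c):
    """Key(P) = i * c + dist(P, r_i), where r_i is the nearest reference point."""
    # Reference(P): index of the nearest reference point.
    i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
    # Base(P) = i * c, Distance = Euclidean distance to that reference point.
    return i * c + math.dist(p, refs[i])

refs = [(0, 0), (10, 10)]
key_a = idistance_key((3, 4), refs, c=100)    # nearest to refs[0]
key_b = idistance_key((10, 7), refs, c=100)   # nearest to refs[1]
```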

[Figure: N partitions with reference points r1, r2, ..., rN; point p lies at distance d from its reference point]
18
GiMP iDistance Instantiation (cont.)
  • Query processing
  • MapAnnulus (Q, rmin, rmax)
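
One possible way to realize MapAnnulus for iDistance keys, sketched under assumptions. By the triangle inequality, a point within distance r of Q lies at a distance between dist(Q, ri) - r and dist(Q, ri) + r from its reference point ri. The sketch also assumes that the key range for radius rmin was already searched in an earlier step. The `radii` parameter (largest distance inside each partition) and all numeric values are hypothetical.

```python
import math

def map_annulus(q, r_min, r_max, refs, radii, c):
    """Key intervals to search when the radius grows from r_min to r_max."""
    intervals = []
    for i, (ref, radius) in enumerate(zip(refs, radii)):
        dq = math.dist(q, ref)
        if r_min == 0:
            pieces = [(dq - r_max, dq + r_max)]
        else:
            # Skip the inner band, which the earlier step already covered.
            pieces = [(dq - r_max, dq - r_min), (dq + r_min, dq + r_max)]
        for a, b in pieces:
            a, b = max(a, 0.0), min(b, radius)   # clip to the partition
            if a <= b:
                intervals.append((i * c + a, i * c + b))
    return intervals

refs = [(0, 0), (6, 8)]
found = map_annulus(q=(3, 4), r_min=1, r_max=2, refs=refs,
                    radii=[10, 10], c=100)
```

Points fetched from these intervals are still only candidates; their true distance to Q has to be checked before they enter the kNN answer set.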

19
Performance of GiMP
  • Direct implementation vs GiMP

20
Performance Prediction
  • What dominates the overall performance?
  • the mapping process
  • how the query is mapped to the one-dimensional
    ranges
  • redundant mapping causes performance degradation
  • Mapping redundancy
  • ratio between the mapped region and the query
    region
  • mr = 1 is optimal
  • mr = nm / na
  • nm: the number of pages that contain the data
    points that are in the mapped region
  • na: the minimum number of pages that contain the
    data points in the answer set of a query Q
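
A small arithmetic sketch of the measure, assuming mr is the ratio nm / na and taking na as the answer-set size divided by the page capacity, rounded up. The function names and the numbers are hypothetical.

```python
import math

def mapping_redundancy(pages_in_mapped_region, answer_count, page_capacity):
    """mr = nm / na for one query (the answer set must be non-empty)."""
    n_m = pages_in_mapped_region
    # Fewest pages that could hold all answers if they were packed together.
    n_a = math.ceil(answer_count / page_capacity)
    return n_m / n_a

def average_mapping_redundancy(values):
    """amr: mapping redundancy averaged over a set of queries."""
    return sum(values) / len(values)

mr = mapping_redundancy(pages_in_mapped_region=6, answer_count=25,
                        page_capacity=10)
amr = average_mapping_redundancy([mr, 1.0, 3.0])
```

In this example the mapped region touches 6 pages while 3 pages would be enough to hold the 25 answers, so the query reads twice as many pages as the optimum.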
21
Performance Prediction (cont.)
  • Experimental results with amr (averaged mapping
    redundancy)

22
Mappability Problem
  • Observation
  • naturally, one-to-one mapping shows better
    performance than many-to-one mapping indexing
    scheme
  • Mappability
  • whether a one-to-one mapping exists from the
    d-dimensional data space to a one-dimensional
    domain
  • existence of one-to-one mapping depends on the
    nature of the data space (countable or
    uncountable property)

23
Conclusion
  • Users can define their own mapping-based indexing
    scheme by implementing the components of GiMP
  • MR (mapping redundancy) is a governing factor in
    the efficiency of mapping-based indexing schemes,
    so that it can be used as a performance
    prediction measurement
  • Existence of one-to-one mapping depends on the
    nature of the data space