Title: Pivoting M-tree: A Metric Access Method for Efficient Similarity Search

1
Pivoting M-tree: A Metric Access Method for
Efficient Similarity Search

Tomáš Skopal, tomas.skopal_at_vsb.cz
Department of Computer Science, VŠB-Technical
University of Ostrava

2
Presentation Outline

Similarity search in Metric Spaces
M-tree
PM-tree
  structure
  range queries
  hyper-ring storage
Experimental Results

3
Similarity search in Metric Spaces

Similarity search methods serve content-based
retrieval in multimedia databases (and in
Information Retrieval, respectively)
Similarity is modelled by a metric d
Restricting similarity to a metric yields a
paradigmatic discrepancy with several similarity
theories; nevertheless, the triangular inequality
is the basic tool for constructing metric regions,
which leads to efficient similarity search
Metric queries
range query (specified by a query object Q and a
query radius rQ)
k-NN query (specified by a query object Q and the
number of nearest neighbours k)
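
As a point of reference, both query types can be written as a plain sequential scan. This is only a minimal sketch, not part of the talk: the function names (`range_query`, `knn_query`, `l2`) and the sample points are illustrative assumptions, and a metric access method exists precisely to avoid computing d for every object as this scan does.

```python
import heapq
import math


def l2(a, b):
    """Euclidean (L2) distance, used here as the example metric d."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(dataset, q, r_q, d=l2):
    """Return all objects whose distance to Q is at most the radius rQ."""
    return [o for o in dataset if d(q, o) <= r_q]


def knn_query(dataset, q, k, d=l2):
    """Return the k objects closest to Q, nearest first."""
    return heapq.nsmallest(k, dataset, key=lambda o: d(q, o))


# Hypothetical sample data, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(range_query(points, (0.0, 0.0), 5.0))
print(knn_query(points, (0.0, 0.0), 2))
```

Both functions compute one distance per object, so their cost is linear in the dataset size.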

4
Metric Access Methods

Designed to search metric datasets while keeping
the search costs minimal (number of distance
computations).
When searching large multimedia databases, the
I/O search costs also have to be minimized.
Many MAMs have been developed so far: M-tree,
GH-tree, GNAT, LAESA, D-index, VP-tree, MVP-tree,
SAT, ...
The majority of MAMs are not suitable for
similarity search in large datasets (they are
either static methods or incur high I/O search
costs)
only M-tree and (recently) D-index are suitable
candidates

5
M-tree

dynamic, balanced, and paged metric tree (like
e.g. B-tree, R-tree)
the leaves are clusters of objects
routing entries in the inner nodes represent
metric regions, recursively bounding the object
clusters in leaves
during query evaluation, the triangular
inequality allows irrelevant M-tree branches
(i.e. their metric regions) to be discarded
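
The discarding step can be sketched as follows. By the triangular inequality, a subtree whose region is a ball with routing object R and covering radius rR cannot contain a result of the range query (Q, rQ) when d(Q, R) > rR + rQ. The `Node` class and `range_search` function below are hypothetical simplifications: a real M-tree is paged, stores routing entries inside parent nodes, and uses precomputed parent distances to skip further distance computations.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified in-memory stand-in for an M-tree node (assumption)."""
    routing_object: tuple
    covering_radius: float
    children: list = field(default_factory=list)  # inner node: sub-nodes
    objects: list = field(default_factory=list)   # leaf: data objects


def range_search(node, q, r_q, d, out):
    """Collect objects within r_q of q, pruning non-intersecting regions."""
    # Triangular inequality: the ball around the routing object cannot
    # intersect the query ball if the centres are too far apart.
    if d(q, node.routing_object) > node.covering_radius + r_q:
        return out
    for o in node.objects:
        if d(q, o) <= r_q:
            out.append(o)
    for child in node.children:
        range_search(child, q, r_q, d, out)
    return out


# Hypothetical two-leaf tree, chosen only for illustration.
leaf_a = Node((0.0, 0.0), 1.5, objects=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
leaf_b = Node((10.0, 10.0), 1.5, objects=[(10.0, 10.0), (11.0, 10.0)])
root = Node((5.0, 5.0), 8.0, children=[leaf_a, leaf_b])

print(range_search(root, (0.5, 0.0), 1.0, math.dist, []))
```

For this query, `leaf_b` is rejected by the single test on its routing object, so none of its objects are compared with Q.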

6
PM-tree, motivation

metric regions in M-tree are unnecessarily large
→ indexing of large portions of empty space (the
dead space)
→ higher probability of intersection with the
query region
→ less efficient search
reducing the volume of metric regions should lead
to more effective discarding of irrelevant
subtrees
the way to achieve this is to specify a metric
region that bounds all the objects more tightly

7
PM-tree, structure

Pivoting M-tree (PM-tree) is a combination of
M-tree with the pivot-based methods (LAESA-like)
given a fixed set of p pivots Pi (selected from
the dataset), a PM-tree region is additionally
defined by p hyper-ring regions (Pi, HRi)
each routing entry contains an array HR of p
intervals <HRi.min, HRi.max>
each interval HRi bounds the distances of
objects to the respective pivot Pi
the intersection of the hyper-sphere and the
hyper-rings forms a smaller region bounding all
the objects
the more pivots, the more tightly bounded the
region
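
A minimal sketch of how such an HR array could be computed, assuming in-memory objects and Euclidean distance. The names `hyper_rings` and `merge_rings` are illustrative and not taken from the talk. Merging child intervals is shown because an interval that must bound every object in a subtree has to cover the intervals of all its children.

```python
import math


def hyper_rings(objects, pivots, d=math.dist):
    """For each pivot Pi, return (HRi.min, HRi.max) over the given objects."""
    rings = []
    for p in pivots:
        dists = [d(o, p) for o in objects]
        rings.append((min(dists), max(dists)))
    return rings


def merge_rings(child_rings):
    """Combine the HR arrays of several children into one bounding array."""
    return [
        (min(r[0] for r in per_pivot), max(r[1] for r in per_pivot))
        for per_pivot in zip(*child_rings)
    ]


# Hypothetical data: two pivots and two small object clusters.
pivots = [(0.0, 0.0), (10.0, 0.0)]
cluster_a = [(3.0, 4.0), (6.0, 8.0)]
cluster_b = [(1.0, 0.0), (2.0, 0.0)]

hr_a = hyper_rings(cluster_a, pivots)
hr_b = hyper_rings(cluster_b, pivots)
print(hr_a)
print(merge_rings([hr_a, hr_b]))
```

Each additional pivot adds one interval per routing entry, which is the trade-off between tighter regions and larger entries.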

8
PM-tree, query processing

prior to processing a query (Q, rQ), the
distances d(Q, Pi) for all i ≤ p must be computed
a metric region is relevant to a range query if
and only if all the hyper-rings and the
hyper-sphere intersect the range query region
→ the more hyper-rings, the lower the probability
of intersection with the query
→ no additional distance computations are
needed for the intersection test
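
The relevance test can be sketched as below. A hyper-ring with bounds [HRi.min, HRi.max] around pivot Pi intersects the query ball exactly when d(Q, Pi) - rQ ≤ HRi.max and d(Q, Pi) + rQ ≥ HRi.min. The function name and argument layout are assumptions made for illustration; the distances d(Q, Pi) are passed in precomputed, so the hyper-ring checks are pure comparisons.

```python
def region_is_relevant(d_q_r, r_cover, rings, d_q_pivots, r_q):
    """Return True if a PM-tree region may contain results of (Q, rQ).

    d_q_r      -- distance d(Q, R) to the routing object R
    r_cover    -- covering radius of the hyper-sphere
    rings      -- list of (HRi.min, HRi.max), one per pivot
    d_q_pivots -- distances d(Q, Pi), computed once per query
    r_q        -- query radius rQ
    """
    # Hyper-sphere test, as in the plain M-tree.
    if d_q_r > r_cover + r_q:
        return False
    # Hyper-ring tests: comparisons only, no distance computations.
    for (hr_min, hr_max), d_q_p in zip(rings, d_q_pivots):
        if d_q_p - r_q > hr_max or d_q_p + r_q < hr_min:
            return False
    return True


# Illustrative numbers: the sphere intersects the query, but the single
# hyper-ring does not, so the region is discarded.
print(region_is_relevant(4.0, 5.0, [(5.0, 10.0)], [2.0], 1.0))
# Same region, query closer to the ring: the region must be visited.
print(region_is_relevant(4.0, 5.0, [(5.0, 10.0)], [6.0], 1.0))
```

In the first call a test on the hyper-sphere alone would have kept the region, which is the case where the extra pivots pay off.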

[Figure: an M-tree region compared with a PM-tree region]
9
PM-tree, hyper-ring storage

The routing entries of PM-tree nodes are enlarged
by the additional pivot-based information stored
in the HR arrays
To keep the space overhead minimal, a compact
storage of the HRi intervals is necessary
A distance histogram is created for each pivot
Pi, and an interval <d_i^min, d_i^max> is chosen
such that e.g. 90% of the distances in the
histogram fall into that interval
Each value HRi.min, HRi.max is scaled to the
<d_i^min, d_i^max> interval and stored in a single
byte, i.e. each hyper-ring HRi takes 2 bytes
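
One way the two-byte encoding might look is sketched below. Several details are assumptions rather than statements from the talk: the interval is taken as the central 90% of the sorted distances, the lower bound is rounded down and the upper bound rounded up so that the decoded ring never shrinks, and the codes 0 and 255 are decoded as "no lower bound" and "no upper bound" so that values outside the chosen interval stay safely bounded. The names `central_interval`, `encode_ring` and `decode_ring` are hypothetical.

```python
import math


def central_interval(distances, coverage=0.90):
    """Pick an interval holding roughly the central `coverage` share
    of the sampled pivot distances (assumed selection rule)."""
    s = sorted(distances)
    tail = (1.0 - coverage) / 2.0
    lo_idx = round(tail * (len(s) - 1))
    hi_idx = round((1.0 - tail) * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def encode_ring(hr_min, hr_max, d_lo, d_hi):
    """Quantize (HRi.min, HRi.max) to two byte codes in 0..255."""
    scale = 255.0 / (d_hi - d_lo)
    lo_code = math.floor((hr_min - d_lo) * scale)  # round down
    hi_code = math.ceil((hr_max - d_lo) * scale)   # round up
    return max(0, min(255, lo_code)), max(0, min(255, hi_code))


def decode_ring(lo_code, hi_code, d_lo, d_hi):
    """Recover conservative bounds from the two byte codes."""
    step = (d_hi - d_lo) / 255.0
    lo = 0.0 if lo_code == 0 else d_lo + lo_code * step
    hi = math.inf if hi_code == 255 else d_lo + hi_code * step
    return lo, hi


# Illustrative round trip with an interval of width 255 (step size 1).
codes = encode_ring(10.4, 20.6, 0.0, 255.0)
print(codes, decode_ring(*codes, 0.0, 255.0))
```

The decoded ring is slightly wider than the original one, which costs some pruning power but cannot cause a relevant region to be discarded.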

10
Experimental results (synthetic)

synthetic dataset of 100,000 30-dimensional
tuples distributed within 1000 clusters, L2
distance, query selectivity 50 objects

11
Experimental results (images)

collection of 10,000 images represented by
256-dimensional vectors (gray histograms), L2
distance, query selectivity 50 objects

12
Recent results (not included in proceedings)

Cost models for range queries in PM-tree (→
ADBIS04)
Experiments on image dataset (→ ADBIS04)
Optimal k-NN query algorithm for PM-tree cost
models (to be published...)

13
Reference

[1] Skopal T., Pokorný J., Snášel V.: PM-tree:
Pivoting Metric Tree for Similarity Search in
Multimedia Databases, submitted to ADBIS 2004,
Budapest, Hungary
[2] Skopal T.: Pivoting M-tree: A Metric Access
Method for Efficient Similarity Search, DATESO
2004, Desná
[3] Skopal T., Pokorný J., Krátký M., Snášel V.:
Revisiting M-tree Building Principles. ADBIS
2003, LNCS 2798, Springer-Verlag, Dresden,
Germany
