Metric Inverted - An efficient inverted indexing method for metric spaces

1
Metric Inverted - An efficient inverted
indexing method for metric spaces

Benjamin Sznajder
Jonathan Mamou
Yosi Mass
Michal Shmueli-Scheuer
IBM Research - Haifa

Presented by Shai Erera
2
Outline

Motivation
Problem Definition
Metric Inverted Index
Retrieval
Experiments
Conclusions

4
Motivation

Web 2.0 enables mass multimedia production
Still, search is limited to manually added metadata
State-of-the-art solutions for CBIR (Content-Based Image Retrieval) do not scale
Their cost grows linearly with the collection size, due to the large number of distance computations
Can we use text IR methods to scale up CBIR?

6
Problem definition

Low-level image features can be generalized to metric spaces
Metric space: an ordered pair (S, d), where S is a domain and d: S × S → R is a distance function
d satisfies non-negativity, reflexivity, symmetry and the triangle inequality
The best-k results for a query in a metric space are the k objects with the smallest distance to the query
Convert distances to scores in [0, 1] (small distance, high score)
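
The definitions above can be sketched in a few lines of Python. The slides do not say which distance or which distance-to-score conversion was used, so the L1 distance and the linear mapping `1 - d / d_max` below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
def l1_distance(u, v):
    """Manhattan (L1) distance between two equal-length vectors.

    L1 is a standard metric: it is non-negative, zero for identical
    inputs, symmetric, and satisfies the triangle inequality.
    """
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_to_score(d, d_max):
    """Map a distance to a score in [0, 1]; a small distance gives a high score.

    ASSUMPTION: a linear mapping with a known upper bound d_max.
    The slides only state that scores lie between 0 and 1.
    """
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return 1.0 - min(d, d_max) / d_max


def best_k(query, objects, k, dist=l1_distance):
    """Return the k objects with the smallest distance to the query."""
    return sorted(objects, key=lambda obj: dist(query, obj))[:k]
```

For example, `best_k((0, 0), [(5, 5), (1, 0), (0, 2)], 2)` keeps the two points closest to the origin under L1.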

7
Problem definition

Top-K problem
Assume m metric spaces, a query Q, an aggregate function f and a score function sd_i() per metric space
Retrieve the k objects D with the highest aggregate score
f(sd_1(Q,D), sd_2(Q,D), ..., sd_m(Q,D))

Figure: example with k = 5
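
As a reference point, the top-k problem can be solved exactly by scoring every object in the collection. The sketch below does that; it is the brute-force baseline the problem statement implies, not the indexing method of the talk. The data layout (one list of m feature values per object) and the function name are assumptions.

```python
import heapq


def top_k(query, collection, score_fns, aggregate, k):
    """Exhaustive top-k over m metric spaces.

    query      -- list of m feature values
    collection -- dict mapping object id -> list of m feature values
    score_fns  -- list of m functions sd_i(q, v) returning a score in [0, 1]
    aggregate  -- function f taking the list of m scores
    Returns a list of (aggregate score, object id), best first.
    """
    scored = []
    for obj_id, features in collection.items():
        per_feature = [sd(q, v) for sd, q, v in zip(score_fns, query, features)]
        scored.append((aggregate(per_feature), obj_id))
    return heapq.nlargest(k, scored)
```

Every query costs one score computation per feature per object, which is the linear behaviour the motivation slide complains about.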
9
Metric Inverted Index

Assume a collection of objects, each having m features
Object D = {F1:v1, F2:v2, ..., Fm:vm}
m metric spaces (one per feature)
Indexing steps:
Lexicon creation (select candidates)
Invert objects (canonization to lexicon terms)

10
Metric inverted indexing: Lexicon creation

The number of different feature values is too large
Need to select candidates
Naïve solution:
Lexicon of fixed size l
Randomly select l/m documents and extract their features
These l features form our lexicon
Improvement:
Replace the random choice by clustering (K-Means etc.)
Keep the lexicon in an M-Tree structure
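
The naïve lexicon creation can be sketched as follows. This covers only the random-sampling variant; the clustering improvement and the M-Tree storage are not shown. The data layout (a dict of feature name to value per document), the fixed seed, and the function name are assumptions made for the example.

```python
import random


def build_lexicon(collection, lexicon_size, seed=0):
    """Naive lexicon: sample l/m documents and keep all their features.

    collection   -- dict mapping doc id -> dict of feature name -> value
    lexicon_size -- target total number of lexicon entries (l)
    Returns a dict mapping feature name -> list of sampled feature values.
    """
    feature_names = sorted(next(iter(collection.values())))
    m = len(feature_names)
    # l/m documents, each contributing one value per feature, give l entries.
    n_docs = min(lexicon_size // m, len(collection))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sampled = rng.sample(sorted(collection), n_docs)
    lexicon = {name: [] for name in feature_names}
    for doc_id in sampled:
        for name in feature_names:
            lexicon[name].append(collection[doc_id][name])
    return lexicon
```

With m = 3 features and l = 12000, this would sample 4000 documents and produce 4000 lexicon entries per feature.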

11
Metric inverted indexing: Invert objects

Given object D = {F1:v1, F2:v2, ..., Fm:vm}
Canonization: map features (Fi:vi) to lexicon entries
For each feature, select the n nearest lexicon terms
D → {F1:v11, F1:v12, ..., F1:v1n, F2:v21, F2:v22, ..., F2:v2n, ..., Fm:vm1, Fm:vm2, ..., Fm:vmn}
Index D in the relevant posting lists
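
A minimal sketch of canonization and inversion, under stated assumptions: lexicon terms are identified by a hypothetical `(feature name, position)` pair, the nearest terms are found by a linear scan rather than through an M-Tree, and L1 stands in for whatever distance each feature really uses.

```python
import heapq
from collections import defaultdict


def l1(u, v):
    """L1 distance, used here as a stand-in metric (assumption)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_terms(feature, value, lexicon, n, dist=l1):
    """Return the ids of the n lexicon entries of `feature` nearest to `value`.

    A term id is the pair (feature name, position in the lexicon list).
    """
    entries = lexicon[feature]
    nearest = heapq.nsmallest(
        n, range(len(entries)), key=lambda i: dist(value, entries[i])
    )
    return [(feature, i) for i in nearest]


def index_collection(collection, lexicon, n, dist=l1):
    """Build posting lists: term id -> set of doc ids canonized to that term."""
    postings = defaultdict(set)
    for doc_id, features in collection.items():
        for feature, value in features.items():
            for term in nearest_terms(feature, value, lexicon, n, dist):
                postings[term].add(doc_id)
    return postings
```

Each document ends up in m × n posting lists, so n trades index size against the chance that a query and a similar document share at least one term.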

13
Retrieval stage: Term selection

Given Q = {F1:qv1, F2:qv2, ..., Fm:qvm}
Canonization:
For each feature, select the n nearest lexicon terms
Q → {F1:qv11, F1:qv12, ..., F1:qv1n, F2:qv21, F2:qv22, ..., F2:qv2n, ..., Fm:qvm1, Fm:qvm2, ..., Fm:qvmn}

14
Retrieval stage: Boolean filtering

These m × n posting lists are queried via a Boolean query
Two possible modes:
Strict-query-mode
Fuzzy-query-mode
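
The slides name the two modes without defining them. The sketch below shows one plausible reading, which is an assumption and not taken from the slides: in strict mode a document must match at least one query term for every feature (an AND over features of an OR over terms), while in fuzzy mode matching any of the m × n terms is enough (a single OR). The function and parameter names are hypothetical.

```python
def boolean_candidates(query_terms, postings, mode="fuzzy"):
    """Filter documents with a Boolean query over posting lists.

    query_terms -- dict mapping feature name -> list of term ids for the query
    postings    -- dict mapping term id -> set of doc ids
    mode        -- "strict" or "fuzzy" (ASSUMED semantics, see lead-in)
    """
    # For each feature, the documents matching any of its n query terms.
    per_feature = [
        set().union(*(postings.get(term, set()) for term in terms))
        for terms in query_terms.values()
    ]
    if not per_feature:
        return set()
    if mode == "strict":
        return set.intersection(*per_feature)
    if mode == "fuzzy":
        return set.union(*per_feature)
    raise ValueError("mode must be 'strict' or 'fuzzy'")
```

Under this reading, strict mode returns fewer candidates and therefore needs fewer distance computations in the scoring stage, at the risk of missing relevant objects.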

15
Retrieval stage: Scoring

Documents retrieved by the Boolean query are fully scored
Return the k objects with the highest aggregate score
f(sd_1(Q,D), sd_2(Q,D), ..., sd_m(Q,D))
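
The scoring stage can be sketched as a re-ranking of the filtered candidates only. The data layout (dicts keyed by feature name) and the function name are assumptions for the example.

```python
import heapq


def rank_candidates(query, candidate_ids, collection, score_fns, aggregate, k):
    """Fully score only the documents that passed the Boolean filter.

    query         -- dict mapping feature name -> query feature value
    candidate_ids -- iterable of doc ids returned by the Boolean query
    collection    -- dict mapping doc id -> dict of feature name -> value
    score_fns     -- dict mapping feature name -> sd_i(q, v) in [0, 1]
    aggregate     -- function f taking the list of per-feature scores
    Returns a list of (aggregate score, doc id), best first.
    """
    scored = []
    for doc_id in candidate_ids:
        features = collection[doc_id]
        per_feature = [
            score_fns[name](query[name], features[name]) for name in query
        ]
        scored.append((aggregate(per_feature), doc_id))
    return heapq.nlargest(k, scored)
```

Documents outside the candidate set are never scored, which is where the savings come from and also why the result is an approximation of the exact top-k.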

17
Experiments

Focus on:
Efficiency
Effectiveness
Collection of 160,000 images from Flickr
3 features are extracted from each image:
EdgeHistogram, ScalableColor and ColorLayout
180 queries, in Fuzzy-query-mode
Sampled from the collection of images
Compared to the M-Tree data structure

18
Experiments: Measures used

Effectiveness: MAP (Mean Average Precision) is a natural candidate measure
Problem: in image retrieval, no document is irrelevant
Solution: we define as relevant the k highest-scored documents in the collection (according to the M-Tree computation)
MAP_at_K: MAP computed on relevant and retrieved lists of size k
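
A sketch of how MAP_at_K could be computed, assuming the standard average-precision formula applied to lists of size k. The slides do not give the exact formula, so treat the normalization (dividing by the number of relevant documents) as an assumption; the function names are hypothetical.

```python
def average_precision_at_k(relevant, retrieved, k):
    """Average precision of `retrieved` against the exact top-k `relevant`.

    relevant  -- exact top-k doc ids (for example, from the M-Tree)
    retrieved -- doc ids returned by the approximate method, best first
    """
    relevant_set = set(relevant[:k])
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant_set:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant_set)


def map_at_k(relevant_lists, retrieved_lists, k):
    """Mean of average_precision_at_k over all queries."""
    scores = [
        average_precision_at_k(rel, ret, k)
        for rel, ret in zip(relevant_lists, retrieved_lists)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Because relevance is defined as membership in the exact top-k, a retrieved list that contains the same k documents in any order scores 1.0.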

19
Experiments: Measures used (contd.)

Efficiency: we count the number of distance computations per query
A computation unit (cu) is one distance computation call between two feature values
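
One simple way to count computation units is to wrap the distance function in a counter. This is an illustrative sketch with a hypothetical class name, not the instrumentation used in the experiments.

```python
class CountingDistance:
    """Wrap a distance function and count how many times it is called.

    Each call corresponds to one computation unit (cu).
    """

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, u, v):
        self.calls += 1
        return self.dist(u, v)

    def reset(self):
        """Reset the counter, for example before each query."""
        self.calls = 0


# Usage: a linear scan over three values costs three computation units.
counted = CountingDistance(lambda u, v: abs(u - v))
nearest = min([3, 1, 4], key=lambda x: counted(2, x))
```

Passing the wrapped function to both the baseline and the indexed search lets the two be compared by `counted.calls` per query.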

20
Effectiveness

Figure: MAP vs. number of nearest terms (lexicon size: 12000)

21
Effectiveness

Figure: MAP vs. lexicon size (number of nearest terms: 30)

22
Effectiveness vs. Efficiency

Figure: MAP vs. number of comparisons (number of nearest terms: 30)

23
M-Tree vs. Metric Inverted

Figure: number of comparisons vs. top-k (number of nearest terms: 30)

25
Conclusions

We reduce the gap between text IR and multimedia retrieval
Our method achieves a very good approximation (MAP of 98%)
Our method drastically improves efficiency (by 90%) over state-of-the-art methods
