Metric Inverted - An efficient inverted indexing method for metric spaces

1
Metric Inverted - An efficient inverted
indexing method for metric spaces
  • Benjamin Sznajder
  • Jonathan Mamou
  • Yosi Mass
  • Michal Shmueli-Scheuer
  • IBM Research - Haifa

Presented by Shai Erera
2
Outline
  • Motivation
  • Problem Definition
  • Metric Inverted Index
  • Retrieval
  • Experiments
  • Conclusions

4
Motivation
  • Web 2.0 enables mass multimedia production
  • Still, search is limited to manually added
    metadata
  • State-of-the-art solutions for CBIR (Content
    Based Image Retrieval) do not scale
  • Their cost grows linearly with the collection
    size due to the large number of distance
    computations
  • Can we use text IR methods to scale up CBIR?

6
Problem definition
  • Low level image features can be generalized to
    Metric Spaces
  • Metric space: an ordered pair (S, d), where S is
    a domain and d is a distance function
    d: S × S → R such that
  • d satisfies non-negativity, reflexivity,
    symmetry and the triangle inequality
  • The best-k results for a query in a metric space
    are the k objects with the smallest distance to
    the query
  • Convert distances to scores in [0, 1] (small
    distance → high score)
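
Note (illustrative sketch, not from the slides): the slides do not say how distances are converted to scores. The linear mapping below, the hypothetical bound `d_max`, and the choice of Euclidean distance as the example metric are all assumptions made only to show the idea that a smaller distance yields a higher score in [0, 1].

```python
import math

def euclidean(a, b):
    """Example metric: Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_to_score(d, d_max):
    """Map a distance in [0, d_max] to a score in [0, 1].

    Smaller distance gives a higher score. The linear form and the
    d_max bound are assumptions, not taken from the slides.
    """
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return 1.0 - min(d, d_max) / d_max

q = (0.0, 0.0)
near, far = (3.0, 4.0), (6.0, 8.0)
score_near = distance_to_score(euclidean(q, near), d_max=10.0)
score_far = distance_to_score(euclidean(q, far), d_max=10.0)
```

Any monotonically decreasing mapping into [0, 1] would serve the same purpose; the linear one is simply the easiest to read.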

7
Problem definition
  • Top-K Problem
  • Assume m metric spaces, a Query Q, an aggregate
    function f and a score function sd()
  • Retrieve the best k objects D with the highest
    f(sd_1(Q,D), sd_2(Q,D), ..., sd_m(Q,D))
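
Note (illustrative sketch, not from the slides): the top-k definition can be written as an exhaustive scan that scores every object in every metric space. The function name `top_k`, the toy score function and the use of `sum` as the aggregate f are assumptions for illustration.

```python
import heapq

def top_k(query, objects, score_fns, aggregate, k):
    """Exhaustive top-k: score every object in each metric space, then aggregate.

    `query` and every object are tuples holding one value per feature;
    `score_fns` holds one score function sd_i per feature.
    """
    scored = []
    for obj_id, features in objects.items():
        per_space = [sd(q, v) for sd, q, v in zip(score_fns, query, features)]
        scored.append((aggregate(per_space), obj_id))
    return heapq.nlargest(k, scored)

# Toy setup: m = 2 one-dimensional features, scores in [0, 1].
sd = lambda q, v: 1.0 - min(abs(q - v), 1.0)
objects = {"a": (0.1, 0.2), "b": (0.9, 0.9), "c": (0.5, 0.4)}
best = top_k((0.0, 0.0), objects, [sd, sd], aggregate=sum, k=2)
best_ids = [obj_id for _, obj_id in best]
```

This scan performs m score computations per object, which is the linear cost in the collection size that the motivation slide points to.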

(Figure: top-k example with k = 5)

9
Metric Inverted Index
  • Assume a collection of objects each having m
    features
  • Object D = {F1:v1, F2:v2, ..., Fm:vm}
  • m metric spaces
  • Indexing steps
  • Lexicon creation (select candidates)
  • Invert objects (canonization to lexicon terms)

10
Metric inverted indexing: lexicon creation
  • The number of distinct features is too large
  • Need to select candidates
  • Naïve solution:
  • Lexicon of fixed size l
  • Randomly select l/m documents and extract their
    features
  • These l features form our lexicon
  • Improvement:
  • Replace the random choice by clustering (K-Means
    etc.)
  • Keep the lexicon in an M-Tree structure
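
Note (illustrative sketch, not from the slides): the naïve lexicon creation can be sketched as below. The data layout (a dict of documents, each a dict of feature name to value), the function name `build_lexicon` and the fixed random seed are assumptions. The clustering improvement and the M-Tree storage are not shown.

```python
import random

def build_lexicon(collection, lexicon_size, seed=0):
    """Naive lexicon: sample l/m documents at random and keep their features.

    `collection` maps doc id -> {feature name: value}.
    Returns a list of (feature name, value) lexicon terms.
    """
    m = len(next(iter(collection.values())))   # features per object
    n_docs = lexicon_size // m                 # l/m documents
    rng = random.Random(seed)
    sampled = rng.sample(sorted(collection), n_docs)
    return [(name, collection[doc][name])
            for doc in sampled
            for name in sorted(collection[doc])]

collection = {f"img{i}": {"color": (i, i + 1), "edge": (i * 2,)} for i in range(10)}
lexicon = build_lexicon(collection, lexicon_size=6)
```

With m = 2 features and l = 6, three documents are sampled and each contributes one term per feature.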

11
Metric inverted indexing: invert objects
  • Given object D = {F1:v1, F2:v2, ..., Fm:vm}
  • Canonization: map features (Fi:vi) to lexicon
    entries
  • For each feature, select the n nearest lexicon
    terms
  • D → {F1:v11, F1:v12, ..., F1:v1n, F2:v21,
    F2:v22, ..., F2:v2n, ..., Fm:vm1, Fm:vm2, ...,
    Fm:vmn}
  • Index D in the relevant posting lists
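
Note (illustrative sketch, not from the slides): canonization and inversion of one object might look as follows. The lexicon layout, the helper names `nearest_terms` and `index_object`, and the linear scan over the lexicon are assumptions. The slides suggest keeping the lexicon in an M-Tree, which would replace the scan.

```python
from collections import defaultdict

def nearest_terms(name, value, lexicon, dist, n):
    """Return the indices of the n lexicon terms of feature `name` closest to `value`."""
    candidates = [(dist(value, v), i)
                  for i, (f, v) in enumerate(lexicon) if f == name]
    return [i for _, i in sorted(candidates)[:n]]

def index_object(doc_id, features, lexicon, dist, n, postings):
    """Canonize each feature to its n nearest terms and post doc_id under each term."""
    for name, value in features.items():
        for term in nearest_terms(name, value, lexicon, dist, n):
            postings[term].append(doc_id)

# Toy lexicon with scalar feature values; term ids are list positions.
lexicon = [("color", 0.0), ("color", 0.5), ("color", 1.0),
           ("edge", 0.0), ("edge", 1.0)]
dist = lambda a, b: abs(a - b)
postings = defaultdict(list)
index_object("img1", {"color": 0.4, "edge": 0.9}, lexicon, dist, n=2, postings=postings)
```

Here the colour value 0.4 is posted under the two nearest colour terms (0.5 and 0.0) and not under the third (1.0).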

13
Retrieval stage: term selection
  • Given Q = {F1:qv1, F2:qv2, ..., Fm:qvm}
  • Canonization
  • For each feature, select the n nearest lexicon
    terms
  • Q → {F1:qv11, F1:qv12, ..., F1:qv1n,
    F2:qv21, F2:qv22, ..., F2:qv2n, ...,
    Fm:qvm1, Fm:qvm2, ..., Fm:qvmn}

14
Retrieval stage: Boolean filtering
  • These m × n posting lists are queried via a
    Boolean query
  • Two possible modes
  • Strict-query-mode
  • Fuzzy-query-mode
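
Note (illustrative sketch, not from the slides): the slides name the two modes but do not define them. The sketch below ASSUMES that strict mode keeps only objects matching at least one query term in every feature (AND across features), and that fuzzy mode keeps objects matching in at least one feature (OR across features). The function name `boolean_filter` and the data layout are also assumptions.

```python
def boolean_filter(query_terms, postings, mode="fuzzy"):
    """Select candidate ids from the posting lists of the query terms.

    `query_terms` holds one list of term ids per feature (its n nearest terms).
    Within a feature the posting lists are OR-ed. Across features (assumed
    semantics): 'strict' intersects the per-feature sets, 'fuzzy' unions them.
    """
    per_feature = [set().union(*(postings.get(t, ()) for t in terms))
                   for terms in query_terms]
    if not per_feature:
        return set()
    if mode == "strict":
        return set.intersection(*per_feature)
    if mode == "fuzzy":
        return set.union(*per_feature)
    raise ValueError(f"unknown mode: {mode}")

postings = {0: ["a", "b"], 1: ["c"], 3: ["a"], 4: ["c", "d"]}
query_terms = [[0, 1], [3]]   # m = 2 features
strict = boolean_filter(query_terms, postings, "strict")
fuzzy = boolean_filter(query_terms, postings, "fuzzy")
```

Under these assumed semantics, strict mode returns a smaller candidate set, and fuzzy mode trades more scoring work for fewer missed objects.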

15
Retrieval stage: scoring
  • Documents retrieved by the Boolean query are
    fully scored
  • Return the best k objects with the highest
    aggregate score
  • f(sd_1(Q,D), sd_2(Q,D), ..., sd_m(Q,D))
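
Note (illustrative sketch, not from the slides): the scoring step computes full scores only for the candidates that passed the Boolean filter. The function name `rescore`, the in-memory feature store and the toy score function are assumptions.

```python
import heapq

def rescore(query, candidates, store, score_fns, aggregate, k):
    """Fully score only the filtered candidates; return the k best (score, id) pairs."""
    def total(doc_id):
        return aggregate(sd(q, v)
                         for sd, q, v in zip(score_fns, query, store[doc_id]))
    return heapq.nlargest(k, ((total(d), d) for d in candidates))

sd = lambda q, v: 1.0 - min(abs(q - v), 1.0)
store = {"a": (0.1, 0.2), "b": (0.9, 0.9), "c": (0.5, 0.4), "d": (0.0, 0.0)}
candidates = {"a", "b", "c"}   # "d" was not returned by the Boolean query
result = rescore((0.0, 0.0), candidates, store, [sd, sd], aggregate=sum, k=2)
result_ids = [doc_id for _, doc_id in result]
```

In this toy case "d" would have been the best match but never reaches scoring, which shows why the result is an approximation of the exact top-k.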

17
Experiments
  • Focus on
  • Efficiency
  • Effectiveness
  • Collection of 160,000 images from Flickr
  • 3 features are extracted from each image
  • EdgeHistogram, ScalableColor and ColorLayout
  • 180 queries (fuzzy-query-mode)
  • Sampled from the collection of images
  • Compared to the M-Tree data structure

18
Experiments: measures used
  • Effectiveness: MAP (mean average precision) is a
    natural candidate measure
  • Problem: in image retrieval, no document is
    irrelevant
  • Solution: we define as relevant the k highest
    scored documents in the collection (according to
    the M-Tree computation)
  • MAP_at_K: MAP computed on relevant and retrieved
    lists of size k
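
Note (illustrative sketch, not from the slides): the slides do not spell out the formula for MAP_at_K. The sketch below assumes the standard average-precision definition, with the exact top-k list as the relevant set and the approximate top-k list as the retrieved ranking. The function names are hypothetical.

```python
def average_precision_at_k(relevant, retrieved, k):
    """AP over the first k retrieved ids, using the exact top-k as the relevant set."""
    relevant_set = set(relevant[:k])
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant_set:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / len(relevant_set) if relevant_set else 0.0

def map_at_k(runs, k):
    """Mean of AP@k over (relevant, retrieved) pairs, one pair per query."""
    return sum(average_precision_at_k(rel, ret, k) for rel, ret in runs) / len(runs)

runs = [
    (["a", "b", "c"], ["a", "b", "c"]),   # perfect approximation
    (["a", "b", "c"], ["a", "x", "b"]),   # one miss at rank 2
]
score = map_at_k(runs, k=3)
```

A query whose approximate list equals the exact list contributes an AP of 1, and each miss lowers the AP according to the rank where it occurs.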

19
Experiments: measures used (contd.)
  • Efficiency: we count the number of distance
    computations per query
  • A computation unit (cu) is one distance
    computation call between two feature values
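
Note (illustrative sketch, not from the slides): one simple way to count computation units is to wrap the distance function in a counter. The class name `CountingDistance` and the toy data are assumptions.

```python
class CountingDistance:
    """Wrap a distance function and count its calls (one call = one cu)."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)

dist = CountingDistance(lambda a, b: abs(a - b))
values = [0.1, 0.4, 0.7, 0.9]
# Linear scan for the nearest value: one cu per stored value.
nearest = min(values, key=lambda v: dist(0.5, v))
```

Passing the same wrapped function to both an exact index and an approximate one makes their costs per query directly comparable.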

20
Effectiveness
  • MAP vs. number of nearest terms
  • Lexicon size: 12,000

21
Effectiveness
  • MAP vs. lexicon size
  • Number of nearest terms: 30

22
Effectiveness vs. Efficiency
  • MAP vs. number of comparisons
  • Number of nearest terms: 30

23
M-Tree vs. Metric Inverted
  • Number of comparisons vs. top-k
  • Number of nearest terms: 30

25
Conclusions
  • We reduce the gap between text IR and multimedia
    retrieval
  • Our method achieves a very good approximation
    (MAP 98%)
  • Our method drastically improves efficiency
    (90%) over state-of-the-art methods