1
XRCE at ImageCLEF 07
  • Stephane Clinchant, Jean-Michel Renders and
    Gabriela Csurka
  • Xerox Research Centre Europe
  • France

2
Outline
  • Problem statement
  • Image Similarity
  • Text Similarity
  • Fusion between text and image
  • Cross-Media Similarities
  • Experimental results
  • Conclusion

3
Problem Statement
  • Problem
  • Retrieve relevant images from a given cross-media
    database (images with text) given a set of query
    images and a query text
  • Proposed solutions
  • Rank the images in the database based on image
    similarity (1), text similarity (2) and
    cross-media similarities (3)

4
Image Similarity
  • The goal is to define an image similarity measure
    that is able to best reflect a semantic
    similarity of the images.
  • E.g., for two semantically similar images and a
    dissimilar one:
  • sim(img1, img2) > sim(img1, img3)
  • (the example images themselves are not reproduced
    in this transcript)
  • Our proposed solution (detailed in next slides)
    is to
  • consider both local color and local texture
    features
  • build a generative model (GMM) in the low level
    feature space
  • represent the image based on Fisher Kernel
    Principles
  • define a similarity measure between Fisher Vectors

5
Fisher Vector
  • Given a generative model (GMM) with parameters λ,
    the gradient vector
  • ∇_λ log p(X | λ)
  • normalized by the Fisher information matrix
  • leads to a unique, model-dependent
    representation of the image, called the Fisher Vector
  • As similarity between Fisher Vectors, the L1 norm
    was used

Fisher Kernels on Visual Vocabularies for Image
Categorization, F. Perronnin and C. Dance, CVPR
2007.
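A minimal sketch of the Fisher Vector construction and the L1 similarity used between them. For brevity this takes gradients with respect to the GMM means only, with the closed-form normalization; the full representation in Perronnin & Dance (CVPR 2007) also includes weight and variance gradients. All variable names are illustrative.

```python
import numpy as np

def fisher_vector(descriptors, weights, means, sigmas):
    """Simplified Fisher Vector w.r.t. GMM means (diagonal covariances).

    descriptors: (T, D) local features; weights: (K,);
    means, sigmas: (K, D) per-Gaussian parameters.
    """
    T, D = descriptors.shape
    K = len(weights)
    # Soft-assignment (posterior) of each descriptor to each Gaussian
    log_probs = np.stack([
        -0.5 * np.sum(((descriptors - means[k]) / sigmas[k]) ** 2
                      + np.log(2 * np.pi * sigmas[k] ** 2), axis=1)
        + np.log(weights[k])
        for k in range(K)
    ], axis=1)                                  # (T, K)
    log_probs -= log_probs.max(axis=1, keepdims=True)
    gamma = np.exp(log_probs)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradient w.r.t. each mean, with the closed-form Fisher normalization
    return np.concatenate([
        (gamma[:, k, None] * (descriptors - means[k]) / sigmas[k]).sum(0)
        / (T * np.sqrt(weights[k]))
        for k in range(K)
    ])                                          # (K * D,)

def l1_similarity(fv1, fv2):
    """Negative L1 distance, used as similarity between Fisher Vectors."""
    return -np.abs(fv1 - fv2).sum()
```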
6
Text similarity
  • The text is first pre-processed including
  • Tokenization, lemmatization, word decompounding
    and stop-word removal
  • The text is modeled by a multinomial language
    model, smoothed via the Jelinek-Mercer method:
  • p(w | d) = (1 − λ) p_ML(w | d) + λ p_ML(w | C)
  • where p_ML(w | d) ∝ #(w, d) and p_ML(w | C) ∝
    Σ_d #(w, d)
  • The textual similarity between two documents is
    defined by the cross-entropy function
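The smoothed language model and cross-entropy ranking above can be sketched in Python. Function names and the mixing weight λ = 0.5 are illustrative; terms absent from both the document and the corpus are assumed not to occur in queries.

```python
import math
from collections import Counter

def jelinek_mercer_lm(doc_tokens, corpus_counts, corpus_len, lam=0.5):
    """Jelinek-Mercer smoothed document language model:
        p(w | d) = (1 - lam) * p_ML(w | d) + lam * p_ML(w | C)
    lam = 0.5 is a hypothetical value; the slides do not state it."""
    tf = Counter(doc_tokens)
    n = len(doc_tokens)
    return lambda w: (1 - lam) * tf[w] / n + lam * corpus_counts[w] / corpus_len

def cross_entropy_score(query_tokens, doc_lm):
    """Rank score: cross-entropy of the query's maximum-likelihood model
    against the smoothed document model. Higher means more similar."""
    qtf = Counter(query_tokens)
    qn = len(query_tokens)
    return sum(c / qn * math.log(doc_lm(w)) for w, c in qtf.items())
```

A document sharing more query terms then receives a higher (less negative) score, while the corpus back-off keeps unseen terms from zeroing out the product.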

7
Enriching the text using external corpus
  • Reason: the texts associated with the images in the
    corpus are poor (title only).
  • Each text in the corpus was enriched as follows:
  • for each term in the document, we add related
    terms based on an analysis of their clustered usage
    in an external corpus
  • The external corpus was the Flickr image database
  • The relationship between terms was based on the
    frequency of their co-occurrence as tags for
    the same image in Flickr (see top 5 ex. below)

classroom → school, class, students, teacher, children
Riviera   → france, nice, sea, beach, french
Jesus     → christ, church, cross, religion, god
Ecuador   → galapagos, quito, southamerica, germany, worldcup
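The enrichment step above can be sketched as follows: count how often two tags label the same image, then append each term's most frequent co-occurring tags to the document. The helper names and the tiny tag lists are stand-ins for the real Flickr statistics.

```python
from collections import Counter, defaultdict

def build_cooccurrence(tagged_images):
    """Count how often two tags are attached to the same image
    (a stand-in for the Flickr tag statistics used in the talk)."""
    co = defaultdict(Counter)
    for tags in tagged_images:
        tags = set(tags)
        for t in tags:
            for u in tags - {t}:
                co[t][u] += 1
    return co

def enrich_text(doc_terms, co, top_k=5):
    """Append, for each document term, its top-k co-occurring tags."""
    enriched = list(doc_terms)
    for t in doc_terms:
        enriched.extend(u for u, _ in co[t].most_common(top_k))
    return enriched
```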
8
Fusion between image and text
  • Early fusion
  • Simple concatenation of image and text features
    (e.g. bag-of-words and bag-of-visual-words)
  • Estimating their co-occurrences or joint
    probabilities (Mori et al., Vinokourov et al.,
    Duygulu et al., Blei et al., Jeon et al., etc.)
  • Late fusion
  • Simply combining the scores of mono-media
    searches (Maillot et al, Clinchant et al)
  • Intermediate level fusion
  • Relevance models (Jeon et al )
  • Trans-media (or intermedia) feedback (Maillot et
    al, Chang et al)

9
Intermediate level fusion
  • Compute mono-media similarities between an
    aggregate of objects coming from a first
    retrieval step and a multimodal object.
  • Use the duality of the data to switch media during
    the feedback process

1. Pseudo-feedback: top-N ranked images based on
   image similarity
2. Aggregate their textual information
3. Final rank: documents re-ranked based on textual
   similarity
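This feedback loop can be sketched end to end. The `img_sim`/`txt_sim` callables and the `(image, text)` corpus representation are assumed interfaces, not the actual XRCE implementation.

```python
def transmedia_feedback_rank(query_img, corpus, img_sim, txt_sim, n=10):
    """Intermediate-level fusion sketch:
    1) rank the corpus by image similarity to the query image,
    2) aggregate the texts of the top-n hits (the pseudo-feedback set),
    3) re-rank the whole corpus by text similarity to that aggregate.
    corpus is a list of (image, text) pairs."""
    by_image = sorted(corpus, key=lambda d: img_sim(query_img, d[0]),
                      reverse=True)
    aggregate = " ".join(text for _, text in by_image[:n])
    return sorted(corpus, key=lambda d: txt_sim(aggregate, d[1]),
                  reverse=True)
```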

10
Aggregate information from pseudo-feedback
  • Aim
  • Compute similarities between an aggregate of
    objects Nimg(q) corresponding to a first
    retrieval for query q and a new multimodal object
    u in the Corpus
  • Where Nimg(q) = { T(I1), T(I2), ..., T(IN) } and
    T(Ik) is the textual part of the k-th image Ik in
    the (pseudo-)feedback set obtained by image similarity
  • Possible solutions
  • Direct Concatenation: aggregate (concatenate) the
    T(Ik), k = 1..N, into a single object and compute the
    text similarity between it and T(u).
  • Trans-media document re-ranking: aggregate all
    similarity measures between pairs of objects.
  • Complementary (or Inter-media) Feedback: use a
    pseudo-feedback algorithm to extract relevant
    features of Nimg(q) and use them to compute the
    similarity with T(u).

11
Trans-media document re-ranking
  • We define the following similarity measure
    between an aggregate of objects Nimg and a
    multimodal object u
  • Notes
  • This approach can be seen as a document
    re-ranking method rather than a query expansion
    mechanism.
  • The values simTXT(T(u), T(v)) can be pre-computed
    offline if the corpus is of reasonable size.
  • By duality, we can invert the roles of images and
    text
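The slide's formula is not reproduced in this transcript, so here is one plausible instantiation of the aggregated similarity: each feedback document v "votes" for candidate u, weighted by how visually close v is to the query. Names and interfaces are illustrative.

```python
def transmedia_rerank_score(q_img, feedback, u_text, img_sim, txt_sim):
    """Aggregated trans-media similarity between the feedback set and a
    multimodal candidate u: sum over feedback pairs (v_img, v_text) of
    the image similarity to the query times the text similarity to u."""
    return sum(img_sim(q_img, v_img) * txt_sim(u_text, v_text)
               for v_img, v_text in feedback)
```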

12
Complementary Feedback
  • We derive a language model θF for the relevance
    concept from the text set F = Nimg(q)
  • θF is assumed to be multinomial (peaked at
    relevant terms) and is estimated by EM from the mixture
  • λ θF(w) + (1 − λ) p(w | C)          (1)
  • where p(w | C) is the word probability built upon the
    Corpus, and λ (= 0.5) a fixed parameter.
  • The similarity between Nimg and T(u) is given
    by the cross-entropy similarity
  • between θF and T(u); alternatively, we can first
    interpolate θF with the query text
  • Notes
  • λ (0.5 in our experiments) can be seen as a mixing
    weight between image and text
  • Unlike the trans-media re-ranking method, it needs a
    second retrieval step.
  • We can invert the roles of images and text if we
    use Rocchio's method instead of (1).

A Study of smoothing methods for Language Models
applied to Information Retrieval, Zhai and
Lafferty, SIGIR 2001.
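The EM estimation of θF above can be sketched as follows: the feedback text is assumed to be generated by the mixture λ·θF(w) + (1 − λ)·p(w|C), so EM concentrates θF on terms the corpus model cannot explain. Argument names and the iteration count are illustrative.

```python
import numpy as np

def estimate_relevance_lm(feedback_tokens, p_corpus, vocab, lam=0.5, iters=50):
    """EM sketch for the relevance model theta_F, fit so that
    lam * theta_F(w) + (1 - lam) * p(w | C) explains the feedback text.
    vocab: list of words; p_corpus: dict w -> p(w | C)."""
    counts = np.array([feedback_tokens.count(w) for w in vocab], float)
    pc = np.array([p_corpus[w] for w in vocab])
    theta = np.full(len(vocab), 1.0 / len(vocab))
    for _ in range(iters):
        # E-step: posterior that an occurrence of w came from theta_F
        post = lam * theta / (lam * theta + (1 - lam) * pc)
        # M-step: re-estimate theta_F from the expected counts
        theta = counts * post
        theta /= theta.sum()
    return dict(zip(vocab, theta))
```

Because common words are well explained by p(w|C), they receive low posterior mass and θF peaks on the distinctive feedback terms, as the slide describes.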
13
XRCE's ImageCLEF Runs
Run Name                                      Modality    Query    Approach    MAP
1. EN-EN-AUTO-FB-TXT_FLR                      Text only   TXT      LM FLR      0.2075
2. AUTO-NOFB-IMG_COMBFK                       Image only  IMG      FV L1       0.1890
3. AUTO-FB-TXTIMG_PREFFKTXT                   Mixed       IMG      TR          0.2801
4. AUTO-FB-TXTIMG_PREFFKTXT_FLR               Mixed       IMG      TR FLR      0.2761
5. EN-EN-AUTO-FB-TXTIMG_QTXT_COMBPREFFKTXT    Mixed       IQ+TQ    TR R1       0.3020
6. EN-EN-AUTO-FB-TXTIMG_MPRF                  Mixed       IQ+TQ    CF          0.3168
7. DE-EN-AUTO-FB-TXTIMG_MPRF_FLR              Mixed       IQ+TQ    QT CF FLR   0.2899
8. EN-DE-AUTO-FB-TXTIMG_MPRF                  Mixed       IQ+TQ    QT CF       0.2776
  • LM    = language model with cross-entropy
  • FV L1 = Fisher Vector with L1 norm
  • FLR   = text enriched by Flickr tags
  • TR    = Trans-media Re-ranking
  • CF    = Complementary Feedback
  • Ri    = Run i
  • QT    = Query Translation

14
Conclusion
  • Our image similarity measure (L1 norm on Fisher
    Vectors) seems to be quite suitable for CBIR.
  • It was the second-best Visual Only system and,
    unlike the first system, it did not use any
    query expansion (or feedback)
  • Combining it with text similarity within an
    intermediate level fusion allowed for a
    significant improvement.
  • Mixing the modalities increased the performance
    by about 50% (relative) over mono-media (pure
    text or pure image) systems.
  • Three of the six proposed cross-media systems
    were the best three Automatic Mixed Runs.
  • The system performed well even when the query and
    the Corpus were in different languages (English
    versus German).

15
Thank you for your attention!