A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets

1
A Web-based Kernel Function for Measuring the
Similarity of Short Text Snippets
  • WWW 2006
  • M. Sahami, T. Heilman
  • Google Inc.

2
Determining the similarity of short text
  • Traditional document similarity measures often
    produce inadequate results
  • Need a method that captures more of the semantic
    context of the snippets rather than simply
    measuring their term-wise similarity.

3
Other means of determining query similarity
  • Raghavan & Sever, 1995
  • Differences in the ordering of documents
    retrieved in response to the two queries
  • Fitzpatrick & Dent, 1997
  • Normalized set overlap (intersection) of the top
    200 documents retrieved for each query.
  • In the context of Machine Learning, a great deal
    of work has been devoted to measuring semantic
    similarity between documents
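
The normalized set overlap attributed to Fitzpatrick & Dent above can be sketched as follows. This is an illustration only: the slides do not say how the overlap is normalized, so dividing by k is an assumption, and `normalized_overlap` is a hypothetical helper name.

```python
def normalized_overlap(results_a, results_b, k=200):
    """Share of the top-k retrieved documents that two queries have in common.

    results_a, results_b: ranked lists of document ids, best first.
    Assumption: the intersection size is normalized by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    return len(top_a & top_b) / k
```

Two queries whose top four results share two documents would score 0.5 with k=4.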

4
The approach's advantage
  • Based on query expansion techniques
  • A lazy approach: we need not compute an
    expansion for a given text snippet until we want
    to evaluate the kernel function
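
One way to picture the lazy evaluation described above is to compute an expansion only when the kernel first asks for it, then cache it. The sketch below is illustrative only: `expand` is a hypothetical stand-in for the real (expensive) expansion step, and the toy kernel uses Jaccard overlap of words rather than the presentation's kernel.

```python
from functools import lru_cache

calls = []  # records which snippets were actually expanded, in order


@lru_cache(maxsize=None)
def expand(snippet):
    """Hypothetical stand-in for the expensive query expansion step."""
    calls.append(snippet)
    return frozenset(snippet.lower().split())


def kernel(x, y):
    """Toy kernel (Jaccard overlap); expansions are computed on first use."""
    ex, ey = expand(x), expand(y)
    union = ex | ey
    return len(ex & ey) / len(union) if union else 0.0
```

Nothing is expanded until `kernel` is called, and a snippet that appears in several kernel evaluations is expanded once.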

5
Leverage the large volume of documents
  • Treat each snippet as a query
  • By examining documents that contain the text
    snippet terms we can discover other contextual
    terms
  • Create a context vector that contains many words
    that tend to occur in context with the original
    snippet terms
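
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `corpus` is a hypothetical in-memory stand-in for a search engine, retrieval is plain term matching, and the TF-IDF weighting, the `top_terms` cutoff, and the centroid of normalized document vectors are assumptions modeled on the published paper.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a sparse vector (term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def context_vector(snippet, corpus, top_terms=50):
    """Build a context vector for `snippet` from documents containing its terms."""
    query_terms = set(snippet.lower().split())
    tokenized = [doc.lower().split() for doc in corpus]
    # "Retrieve" every document sharing at least one term with the snippet.
    retrieved = [toks for toks in tokenized if query_terms & set(toks)]
    # Document frequencies over the whole corpus, used for IDF weighting.
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(tokenized)
    total = Counter()
    for toks in retrieved:
        tf = Counter(toks)
        tfidf = {t: c * math.log(1 + n_docs / df[t]) for t, c in tf.items()}
        # Keep only the highest-weighted terms of each document.
        top = dict(sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:top_terms])
        for t, w in l2_normalize(top).items():
            total[t] += w
    # Normalizing the sum gives the same direction as normalizing the centroid.
    return l2_normalize(total)
```

With a toy corpus such as `["apple releases new macbook laptop", "apple iphone sales rise", "bananas are yellow fruit"]`, the vector for "apple" picks up terms like "macbook" and "iphone" that never appear in the snippet itself.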

6
Compute the query expansion
7
Some issues
  • Create vectors using just the contextually
    descriptive text snippet for each document
  • 1000 characters is sufficient
  • Given two short text snippets x and y, the
    semantic similarity kernel between them is the
    inner product of their query expansions,
    K(x, y) = QE(x) · QE(y)
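
A minimal sketch of evaluating such a kernel, assuming (as in the published paper) that it is the inner product of the two L2-normalized expansion vectors. The term weights below are hand-made for illustration and are not results from the presentation.

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector (term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def kernel(qe_x, qe_y):
    """Inner product of two sparse expansion vectors."""
    # Iterate over the shorter vector; terms missing from the other contribute 0.
    small, large = sorted((qe_x, qe_y), key=len)
    return sum(w * large.get(t, 0.0) for t, w in small.items())


# Purely illustrative expansions for three snippets (weights are made up).
qe_ai = l2_normalize({"machine": 2.0, "learning": 2.0, "robot": 1.0})
qe_artificial_intelligence = l2_normalize({"machine": 1.0, "learning": 2.0, "research": 2.0})
qe_pizza = l2_normalize({"cheese": 1.0, "oven": 1.0})
```

Because the vectors have unit length, the score lies between 0 and 1 for non-negative weights. Here the snippets "AI" and "artificial intelligence" share no words, yet their expansions overlap, so `kernel(qe_ai, qe_artificial_intelligence)` is well above `kernel(qe_ai, qe_pizza)`.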

8
Evaluation
9
Related query suggestion
  • Initial repository Q of previously issued queries
  • sampling search logs from the Google search
    engine
  • For any newly issued query u, compute our kernel
    function K(u, qi)
  • Do some post-filtering to eliminate queries that
    are too linguistically similar to each other
  • Suggest related queries qi which have the highest
    kernel score with u
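
The suggestion procedure above might look like the following sketch. The kernel and the similarity filter are passed in as functions; the word-overlap filter, its threshold, and the lookup-table kernel with its scores are all hypothetical stand-ins chosen for illustration, since the slides do not specify the post-filtering rule.

```python
def too_similar(a, b, threshold=0.5):
    """Assumed post-filter: two queries are near-duplicates if they share most words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return bool(union) and len(wa & wb) / len(union) >= threshold


def suggest(u, repository, kernel, is_too_similar=too_similar, max_suggestions=5):
    """Return repository queries with the highest kernel score against u."""
    scored = [(kernel(u, q), q) for q in repository if q != u]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    suggestions = []
    for score, q in scored:
        if score <= 0 or len(suggestions) >= max_suggestions:
            break
        # Drop candidates too close to the new query or to an earlier suggestion.
        if is_too_similar(q, u) or any(is_too_similar(q, s) for s in suggestions):
            continue
        suggestions.append(q)
    return suggestions


# Toy data: made-up kernel scores against the query "space exploration".
toy_scores = {"nasa": 0.8, "space shuttle": 0.7, "space shuttle launch": 0.65, "pizza": 0.0}
toy_kernel = lambda u, q: toy_scores[q]
result = suggest("space exploration", list(toy_scores), toy_kernel)
```

In this toy run, "space shuttle launch" is filtered out as too close to the already suggested "space shuttle", and "pizza" is dropped for having a zero score.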

10
Query suggestion system
11
Conclusion and Comments
  • The approach is relatively simple, but
    surprisingly quite powerful
  • Works well even when the short texts being
    considered have no common terms