A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets

Slides: 12

Provided by: Supe1

Transcript and Presenter's Notes

1
A Web-based Kernel Function for Measuring the
Similarity of Short Text Snippets

WWW 2006
M. Sahami, T. Heilman
Google Inc.

2
Determining the similarity of short text

Traditional document similarity measures often
produce inadequate results
Need a method that captures more of the semantic
context of the snippets rather than simply
measuring their term-wise similarity.

3
Other means of determining query similarity

Raghavan & Sever, 1995
Differences in the ordering of documents
retrieved in response to the two queries
Fitzpatrick & Dent, 1997
Normalized set overlap (intersection) of the top
200 documents retrieved for each query.
In the context of Machine Learning, a great deal
of work has been devoted to measuring semantic
similarity between documents
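The normalized set overlap attributed to Fitzpatrick and Dent above can be illustrated with a minimal sketch. The slide does not say how the overlap is normalized, so dividing by the size of the smaller result set is an assumption here, and the function name `normalized_overlap` is hypothetical.

```python
def normalized_overlap(results_a, results_b, k=200):
    """Overlap between the top-k results of two queries, scaled to [0, 1].

    results_a and results_b are ranked lists of document identifiers.
    Assumption: normalize by the size of the smaller top-k set.
    """
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a or not top_b:
        return 0.0
    return len(top_a & top_b) / min(len(top_a), len(top_b))


# Two queries whose result lists share two of three documents.
print(normalized_overlap(["d1", "d2", "d3"], ["d2", "d3", "d4"]))
```

A measure of this kind only compares which documents come back, not what they say, which is the limitation the kernel approach in this talk tries to address.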

4
The approach's advantage

Based on query expansion techniques
A lazy approach: we need not compute an
expansion for a given text snippet until we want
to evaluate the kernel function
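One way to picture the lazy evaluation described above is to wrap the expansion step in a cache, so that a snippet is expanded only the first time the kernel touches it. This is a sketch under assumptions: `make_lazy_kernel` and `toy_expand` are hypothetical names, the expansion is taken to return a `{term: weight}` dict, and the kernel is taken to be an inner product of the two expansions.

```python
from functools import lru_cache


def make_lazy_kernel(expand):
    """Return a kernel that expands each snippet at most once, on first use."""
    cached_expand = lru_cache(maxsize=None)(expand)

    def kernel(x, y):
        qe_x, qe_y = cached_expand(x), cached_expand(y)
        # Sparse inner product over the terms of the first expansion.
        return sum(w * qe_y.get(t, 0.0) for t, w in qe_x.items())

    return kernel


def toy_expand(snippet):
    # Stand-in for a real query expansion: one unit weight per word.
    return {term: 1.0 for term in snippet.split()}


kernel = make_lazy_kernel(toy_expand)  # nothing is expanded yet
print(kernel("a b", "b c"))            # both snippets are expanded here
```

Nothing has to be precomputed for the whole collection; the cost is paid per snippet, and only for snippets that are actually compared.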

5
Leverage the large volume of documents

Treat each snippet as a query
By examining documents that contain the text
snippet terms we can discover other contextual
terms
Create a context vector that contains many words
that tend to occur in context with the original
snippet terms
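The steps above can be sketched in code, assuming the documents retrieved for a snippet are already available as plain strings (the search call itself is left out). The TF-IDF weighting, the per-document truncation to `max_terms`, and the centroid of unit-length vectors are one plausible way to build the context vector, not a detail stated on this slide; all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would use a proper tokenizer.
    return text.lower().split()


def l2_normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def context_vector(retrieved_docs, max_terms=50):
    """Build a unit-length context vector from documents retrieved for a snippet."""
    tokenized = [tokenize(d) for d in retrieved_docs]
    n_docs = len(tokenized)
    # Assumption: document frequency is computed over the retrieved set only.
    df = Counter(t for toks in tokenized for t in set(toks))
    centroid = Counter()
    for toks in tokenized:
        tf = Counter(toks)
        weights = {t: c * math.log(1 + n_docs / df[t]) for t, c in tf.items()}
        # Keep only the highest-weighted terms of each document.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        for t, w in l2_normalize(dict(ranked[:max_terms])).items():
            centroid[t] += w / n_docs
    return l2_normalize(centroid)


docs = ["apple pie recipe", "apple computer mac", "apple iphone mac"]
vector = context_vector(docs)
print(sorted(vector, key=vector.get, reverse=True))
```

Terms such as "mac" that never appear in the snippet itself end up in the vector, which is how two snippets with no words in common can still be compared.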

6
Compute the query expansion

7
Some issues

Create vectors using just the contextually
descriptive text snippet for each document
1000 characters is sufficient
Given two short text snippets x and y, the
semantic similarity kernel between them is the
inner product of their query expansions:
K(x, y) = QE(x) · QE(y)
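A minimal sketch of the kernel evaluation, assuming each query expansion is stored as a sparse `{term: weight}` dict that has already been normalized to unit length, and that the kernel is the inner product of the two expansions. The function name `kernel` and the sample vectors are illustrative only.

```python
def kernel(qe_x, qe_y):
    """Inner product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller vector to keep the lookup count low.
    if len(qe_x) > len(qe_y):
        qe_x, qe_y = qe_y, qe_x
    return sum(w * qe_y.get(t, 0.0) for t, w in qe_x.items())


# Hypothetical unit-length expansions for two snippets.
qe_x = {"apple": 0.6, "mac": 0.8}
qe_y = {"mac": 1.0}
print(kernel(qe_x, qe_y))
```

With unit-length vectors the score behaves like a cosine similarity: a snippet compared with itself scores 1, and snippets with no shared context terms score 0.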

8
Evaluation

9
Related query suggestion

Initial repository Q of previously issued queries,
obtained by sampling search logs from the Google
search engine
For any newly issued query u, compute our kernel
function K(u, qi) for each qi in Q
Do some post-filtering to eliminate queries that
are too linguistically similar to each other
Suggest related queries qi which have the highest
kernel score with u
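The suggestion procedure above can be sketched as a rank-then-filter loop. The slide does not specify the post-filter, so the character-level similarity from `difflib.SequenceMatcher` and the `max_surface_sim` threshold are stand-in assumptions; `suggest_related` and `toy_kernel` are hypothetical names, and the kernel is passed in as any callable that scores two query strings.

```python
from difflib import SequenceMatcher


def suggest_related(u, repository, kernel, top_n=5, max_surface_sim=0.8):
    """Rank stored queries by kernel score with u, skipping near-duplicate wordings."""
    ranked = sorted(repository, key=lambda q: kernel(u, q), reverse=True)
    suggestions = []
    for q in ranked:
        # Assumed post-filter: drop candidates whose surface form is too close
        # to the new query or to a suggestion that was already kept.
        too_close = any(
            SequenceMatcher(None, q.lower(), other.lower()).ratio() > max_surface_sim
            for other in [u] + suggestions
        )
        if not too_close:
            suggestions.append(q)
        if len(suggestions) == top_n:
            break
    return suggestions


def toy_kernel(a, b):
    # Stand-in scorer (word overlap); the real kernel compares query expansions.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


repo = ["cheap flights", "cheap flight", "airline tickets cheap", "pizza recipe"]
print(suggest_related("cheap flights to paris", repo, toy_kernel, top_n=2))
```

Filtering after ranking keeps the highest-scoring wording of each idea and discards trivial variants such as singular and plural forms.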