Title: Context-Aware Query Classification

1
Context-Aware Query Classification

Huanhuan Cao (1), Derek Hao Hu (2), Dou Shen (3),
Daxin Jiang (4), Jian-Tao Sun (4), Enhong Chen (1) and
Qiang Yang (2)
(1) University of Science and Technology of China
(2) Hong Kong University of Science and Technology
(3) Microsoft Corporation
(4) Microsoft Research Asia

2
Motivation

Understanding a Web user's information need is one
of the most important problems in Web search.
Such information can help improve
the quality of many Web search services, such as
Ranking
Online advertising
Query suggestion, etc.

3
Challenges

The main challenges of query classification are
Lack of feature information
Ambiguity
Multiple intents
The first problem has been widely studied, e.g., by
Query expansion using top search results
Leveraging a Web directory
However, the second and third problems are
far from solved.

4
Why is context useful?

Given a query, its context is the previous queries and clicked
URLs in the same session.
It is assumed that
Context is semantically related to the current
query.
Context may help assign appropriate categories
to the current query.
It therefore makes sense to exploit context to make
the current query more specific.

5
Example
6
Example
7
Example
8
Overview

Problem statement
Model query context by CRF
Features of CRF
Experiment
Conclusion and future work

9
Problem Statement: Context

In a user search session, suppose the user has
issued a series of queries $q_1 q_2 \cdots q_{T-1}$ and
clicked some returned URLs $U_1 U_2 \cdots U_{T-1}$.
If the user issues a query $q_T$ at time $T$, we call
$q_1 q_2 \cdots q_{T-1}$ and $U_1 U_2 \cdots U_{T-1}$ the query context of $q_T$,
and we call each $q_t$ ($t \in [1, T-1]$) a contextual
query of $q_T$.
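The definition above can be made concrete with a small data structure. The following is a minimal sketch, assuming each step of a session stores its query and its set of clicked URLs. The names `SearchStep` and `split_query_context` and the sample queries are hypothetical. The function returns $q_T$ together with everything before it, which is the query context.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SearchStep:
    """One step of a session: a query q_t and the set U_t of URLs clicked for it."""
    query: str
    clicked_urls: List[str] = field(default_factory=list)


def split_query_context(session: List[SearchStep]) -> Tuple[List[SearchStep], SearchStep]:
    """Split a session q_1 ... q_T into (query context of q_T, q_T).

    The context is every earlier step, i.e. q_1 ... q_{T-1} together with
    their clicked URL sets U_1 ... U_{T-1}.
    """
    if not session:
        raise ValueError("a session must contain at least one query")
    return session[:-1], session[-1]


session = [
    SearchStep("cheap flights paris", ["http://example.com/flights"]),
    SearchStep("louvre opening hours", ["http://example.com/louvre"]),
    SearchStep("paris metro"),
]
context, current = split_query_context(session)
print(current.query)                      # paris metro
print([step.query for step in context])   # ['cheap flights paris', 'louvre opening hours']
```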

10
Query Context
[Figure: query context of $q_T$]
11
Problem Statement: QC with Context and Taxonomy

The objective of query classification (QC) with
context is to classify a user query $q_T$ into a
ranked list of $K$ categories $c_{T,1}, c_{T,2}, \ldots, c_{T,K}$
among $N_c$ categories $c_1, c_2, \ldots, c_{N_c}$, given the
context of $q_T$.
A target taxonomy is a tree of categories in which
$c_1, c_2, \ldots, c_{N_c}$ are the leaf nodes.
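The output format of QC can be illustrated as follows. This is a sketch under stated assumptions: the taxonomy is a two-level tree, some scoring model has already assigned a score to each leaf category, and QC returns the $K$ best leaves. The category names and both helper functions are hypothetical placeholders, not the KDDCUP05 category list.

```python
from typing import Dict, List

# Illustrative two-level taxonomy (level-1 category -> level-2 leaf categories).
# These names are placeholders, not the actual KDDCUP05 category list.
TAXONOMY: Dict[str, List[str]] = {
    "Living": ["Travel", "Food"],
    "Computers": ["Software", "Hardware"],
}


def leaf_categories(taxonomy: Dict[str, List[str]]) -> List[str]:
    """Collect the leaf nodes c_1 ... c_{N_c} of a two-level taxonomy tree."""
    return [leaf for children in taxonomy.values() for leaf in children]


def top_k_categories(scores: Dict[str, float], leaves: List[str], k: int) -> List[str]:
    """Return the K leaf categories with the highest scores, best first.

    Leaves without a score are treated as scoring 0.0.
    """
    return sorted(leaves, key=lambda c: scores.get(c, 0.0), reverse=True)[:k]


leaves = leaf_categories(TAXONOMY)
print(top_k_categories({"Travel": 0.7, "Food": 0.1, "Software": 0.15}, leaves, k=2))
# ['Travel', 'Software']
```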

12
Modeling Query Context by CRF

where $\mathbf{q}$ represents $q_1 q_2 \cdots q_t$
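The slide's formula did not survive in this transcript. For reference, a linear-chain CRF over the labels $\mathbf{c} = c_1 c_2 \cdots c_t$ of a session has the standard form below. In it, the $f_k$ are feature functions, the $\lambda_k$ are their learned weights, and $c_0$ is a fixed start label. We assume, but cannot confirm from the transcript, that the slide showed this form.

$$
p(\mathbf{c} \mid \mathbf{q}) = \frac{1}{Z(\mathbf{q})} \exp\Big( \sum_{i=1}^{t} \sum_{k} \lambda_k f_k(c_{i-1}, c_i, \mathbf{q}, i) \Big),
\qquad
Z(\mathbf{q}) = \sum_{\mathbf{c}'} \exp\Big( \sum_{i=1}^{t} \sum_{k} \lambda_k f_k(c'_{i-1}, c'_i, \mathbf{q}, i) \Big)
$$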

13
Why CRF?

The two main advantages of CRFs are
1) They can incorporate general feature functions
to model the relation between observations and
unobserved states.
2) They do not need prior knowledge of the type of
conditional distribution.
Given 1), we can incorporate external Web
knowledge.
Given 2), we do not need any assumption about the
type of $p(c \mid q)$.
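Both advantages show up when the model is evaluated. Arbitrary feature functions are simply combined into weighted scores, and the CRF normalizes those scores itself rather than assuming a distribution family. The sketch below illustrates this for the last query of a session. It uses the standard forward algorithm to turn query-label scores and adjacent-label scores into $p(c_T \mid q_1 \cdots q_T)$. Ranking categories by this marginal is one plausible way to obtain the ranked list of $K$ categories; the transcript does not say how the authors did it. The function names and the score layout are hypothetical, the weights are assumed to be already learned, and start-label features are left out for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def last_label_marginal(
    node_scores: List[Dict[str, float]],
    edge_scores: Dict[Tuple[str, str], float],
    labels: List[str],
) -> Dict[str, float]:
    """Compute p(c_T = c | q_1 ... q_T) for every label c with the forward algorithm.

    node_scores[i][c]     : weighted sum of query-label features at position i
    edge_scores[(c_p, c)] : weighted sum of adjacent-label features
    Missing entries count as a score of 0.0; start-label features are omitted.
    """
    # alpha[c]: log of the total unnormalized score of all label prefixes ending in c
    alpha = {c: node_scores[0].get(c, 0.0) for c in labels}
    for scores in node_scores[1:]:
        alpha = {
            c: scores.get(c, 0.0)
            + logsumexp([alpha[p] + edge_scores.get((p, c), 0.0) for p in labels])
            for c in labels
        }
    log_z = logsumexp(list(alpha.values()))  # log Z(q)
    return {c: math.exp(alpha[c] - log_z) for c in labels}


# The contextual query strongly suggests Travel; the current query alone is neutral.
marginal = last_label_marginal(
    node_scores=[{"Travel": 2.0}, {}],
    edge_scores={("Travel", "Travel"): 1.0},
    labels=["Travel", "Food"],
)
print(sorted(marginal, key=marginal.get, reverse=True))  # ['Travel', 'Food']
```

In the example, the current query carries no evidence on its own. Travel still ranks first because the context and the Travel-to-Travel transition score favor it.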

14
Features of CRF

When we use a CRF to model query context, one of
the most important parts is choosing effective
feature functions.
We should consider
Relevance between queries and category labels,
to leverage the local information of queries
Relevance between adjacent labels, to leverage
contextual information

15
Relevance between queries and category labels

Term occurrence
The terms of $q_t$ are obvious features for
supporting $c_t$.
However, due to the limited size of the training data, many
useful terms indicating category information may
not be covered.
General label confidence
Leverage an external Web directory such as
Google Directory.
The confidence of label $c_t$ for query $q_t$ is computed from $M$, the number of
returned results, and $M_{c_t,q_t}$, the
number of returned results with label $c_t$ after
mapping.
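Both features on this slide can be sketched in a few lines. The slide's exact formula for general label confidence is not in the transcript. From the two quantities it defines, this sketch assumes the ratio $GConf(c_t, q_t) = M_{c_t,q_t} / M$. It also assumes that each returned directory result has already been mapped to a single target-taxonomy label. The function names and the feature-key format are hypothetical.

```python
from typing import Dict, List


def term_occurrence_features(query: str, label: str) -> Dict[str, float]:
    """Binary features pairing each term of q_t with a candidate label c_t."""
    return {f"term={term}|label={label}": 1.0 for term in query.lower().split()}


def general_label_confidence(result_labels: List[str], label: str) -> float:
    """Assumed form GConf(c_t, q_t) = M_{c_t,q_t} / M.

    result_labels holds one mapped target-taxonomy label per returned
    directory result, so M = len(result_labels).
    """
    m = len(result_labels)
    if m == 0:
        return 0.0
    return sum(1 for lbl in result_labels if lbl == label) / m


print(term_occurrence_features("Paris Hotels", "Travel"))
# {'term=paris|label=Travel': 1.0, 'term=hotels|label=Travel': 1.0}
print(general_label_confidence(["Travel", "Travel", "Food", "Sports"], "Travel"))  # 0.5
```

The general label confidence is a single real-valued feature per label. This lets it cover query terms that never appeared in the training data, which is the gap noted for term occurrence.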

16
Relevance between queries and category labels

Click-aware label confidence
Combines click information with the
knowledge of an external Web directory.
$CConf(c_t, u_t)$ can be calculated by multiple
approaches.
Here, we use the vector space model (VSM) to calculate the cosine similarity
between the term vectors of $c_t$ and $u_t$.
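The cosine-similarity computation can be sketched as follows. This is a minimal version under stated assumptions: the term vector of $c_t$ comes from some text associated with the category, and the term vector of $u_t$ comes from the clicked page's text. The sketch uses raw term frequencies with whitespace tokenization and no IDF weighting. The helper names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector of a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def click_aware_label_confidence(label_text: str, clicked_page_text: str) -> float:
    """Sketch of CConf(c_t, u_t) as the VSM cosine similarity of the two term vectors."""
    return cosine_similarity(term_vector(label_text), term_vector(clicked_page_text))


print(click_aware_label_confidence("travel guide", "travel"))  # roughly 0.707
```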

17
Relevance between Adjacent Labels

Direct relevance between adjacent labels
Occurrence of the adjacent label pair $\langle c_{t-1}, c_t \rangle$
The weight indicates how likely the two labels are to
co-occur.
Taxonomy-based relevance between adjacent labels
Limited by the sampling approach and the size of the
training data, some reasonable adjacent label
pairs may occur less often than they should, or not
occur at all.
We therefore also consider indirect relevance between adjacent
labels via the taxonomy.
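The two kinds of adjacent-label relevance can be contrasted in code. The sketch below counts adjacent label pairs in labeled training sessions and adds a taxonomy-based fallback for pairs that never occurred. The transcript does not give the taxonomy-based formula. The rule used here (1.0 for the same leaf, 0.5 for leaves sharing a level-1 parent, otherwise 0.0) is a hypothetical illustration, and so are the function names and sample labels.

```python
from collections import Counter
from typing import Dict, List


def adjacent_pair_counts(labeled_sessions: List[List[str]]) -> Counter:
    """Count each adjacent label pair <c_{t-1}, c_t> over the labeled training sessions."""
    counts: Counter = Counter()
    for labels in labeled_sessions:
        counts.update(zip(labels, labels[1:]))
    return counts


def taxonomy_relevance(prev: str, cur: str, parent: Dict[str, str]) -> float:
    """Hypothetical indirect relevance from the taxonomy.

    1.0 for identical leaves, 0.5 for distinct leaves under the same level-1
    parent, 0.0 otherwise. These values are illustrative, not from the paper.
    """
    if prev == cur:
        return 1.0
    p = parent.get(prev)
    if p is not None and p == parent.get(cur):
        return 0.5
    return 0.0


counts = adjacent_pair_counts([["Travel", "Travel", "Food"], ["Food", "Travel"]])
parent = {"Travel": "Living", "Food": "Living", "Software": "Computers", "Hardware": "Computers"}
print(counts[("Software", "Hardware")])                   # 0: never observed
print(taxonomy_relevance("Software", "Hardware", parent))  # 0.5: siblings in the taxonomy
```

The example shows the motivation from the slide. The pair (Software, Hardware) never occurs in the tiny training sample, yet the taxonomy still marks it as plausible.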

18
Experiment

Data set
10,000 randomly selected sessions from one day's
search log of a commercial search engine.
Three human labelers first label each unique query of the
training data with all of its possible categories in the
KDDCUP05 taxonomy.

19
Examples of multiple-category queries

A large ratio of multiple-category queries
indicates that QC without context is difficult.
20
Label Sessions

Then the three human labelers are asked to cross-label
each session of the data set with a
sequence of level-2 category labels.
For each query, a labeler gives the most
appropriate category label by considering
Query itself
The query context
Clicked URLs of the query.

21
Tested Approaches

Baselines
Non-context-aware baseline: Bridging
classifier (BC) proposed by Shen et al.
Naïve context-aware baseline: Collaborating
classifier (CC), which combines a test query with the
previous query and classifies them with BC.
CRFs
CRF-B: CRF with basic features (term
occurrence, general label confidence and direct
relevance between adjacent labels)
CRF-B-C: CRF with basic features + click-aware
label confidence
CRF-B-C-T: CRF with basic features + click-aware
label confidence + taxonomy-based relevance

22
Evaluation Metrics

Given a test session $q_1 q_2 \cdots q_T$, we let $q_T$ be
the test query and let the queries $q_1 q_2 \cdots q_{T-1}$ and the
corresponding clicked URL sets $U_1 U_2 \cdots U_{T-1}$ be the
query context.
For $q_T$, we evaluate a tested approach by
Precision: $P = \delta(c_T \in C_{T,K}) / K$
Recall: $R = \delta(c_T \in C_{T,K})$
F1 score: $F_1 = 2PR / (P + R)$
where $c_T$ is the ground-truth label, $C_{T,K}$
is the set of the top $K$ labels, and $\delta(\cdot)$ is a
Boolean function indicating whether its argument is true
(1) or false (0).
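For a single test query, these metrics reduce to a few lines. The sketch below computes them from a ranked label list. The function name is hypothetical. F1 is set to 0 when the ground-truth label is missing from the top $K$, because $P + R = 0$ in that case.

```python
from typing import List, Tuple


def evaluate_query(true_label: str, ranked_labels: List[str], k: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one test query q_T, following the definitions above."""
    hit = 1.0 if true_label in ranked_labels[:k] else 0.0  # delta(c_T in C_{T,K})
    precision = hit / k
    recall = hit
    # F1 = 2PR / (P + R); defined as 0 when there is no hit (P = R = 0)
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f1


print(evaluate_query("Travel", ["Food", "Travel", "Sports"], k=3))
```

Because there is only one ground-truth label per query, precision is capped at $1/K$. Larger $K$ therefore trades precision for recall.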

23
Overall results
1) The naïve context-aware baseline consistently
outperforms the non-context-aware baseline.
2) CRFs consistently outperform the two
baselines.
3) CRF-B-C-T > CRF-B-C > CRF-B: click
information and taxonomy-based relevance are
useful.
24
Case study
Context about travel
A travel guide Web page is clicked
Our approach gives the most appropriate label in the first
position
25
Efficiency of Our Approach

Offline training
Each iteration takes about 300 ms
The time cost of training a CRF is acceptable
Online cost
Calculating features
Label confidence

26
Conclusion and Future work

In this paper, we propose a novel approach to
query classification that models query context
via CRFs.
Experiments on a real search log clearly show
that our approach outperforms a non-context-aware
baseline and a naïve context-aware baseline.
The current approach cannot leverage contextual
information for queries at the beginning of sessions,
which motivates our future research on
leveraging more contextual information beyond
sessions.

Thanks!
