Context-Aware Query Classification

1
Context-Aware Query Classification
  • Huanhuan Cao (1), Derek Hao Hu (2), Dou Shen (3),
    Daxin Jiang (4), Jian-Tao Sun (4), Enhong Chen (1)
    and Qiang Yang (2)
  • (1) University of Science and Technology of China
  • (2) Hong Kong University of Science and Technology
  • (3) Microsoft Corporation
  • (4) Microsoft Research Asia

2
Motivation
  • Understanding Web users' information needs is one
    of the most important problems in Web search.
  • Such understanding can help improve the quality
    of many Web search services, such as:
      • Ranking
      • Online advertising
      • Query suggestion

3
Challenges
  • The main challenges of query classification:
      • Lack of feature information
      • Ambiguity
      • Multiple intents
  • The first problem has been studied widely:
      • Query expansion using top search results
      • Leveraging a web directory
  • However, the second and third problems are far
    from being solved.

4
Why is context useful?
  • For a given query, context means the previous
    queries and clicked URLs in the same session.
  • It is assumed that:
      • Context is semantically related to the
        current query.
      • Context may help assign appropriate
        categories to the current query.
  • It therefore makes sense to exploit context to
    clarify the intent of the current query.

5-7
Example (the figures on these three slides are not preserved in the transcript)

8
Overview
  • Problem statement
  • Model query context by CRF
  • Features of CRF
  • Experiment
  • Conclusion and future work

9
Problem Statement: Context
  • In a user search session, suppose the user has
    issued a series of queries q_1, q_2, ..., q_{T-1}
    and clicked some returned URLs U_1, U_2, ...,
    U_{T-1}.
  • If the user issues a query q_T at time T, we call
    q_1, q_2, ..., q_{T-1} and U_1, U_2, ..., U_{T-1}
    the query context of q_T.
  • We call each q_t (t ∈ [1, T-1]) a contextual
    query of q_T.
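
The definitions above can be sketched as a small data structure. This is only an illustration: the `Session` class, the `split_context` helper and the example queries and URLs are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """One search session: queries q_1..q_T and the URL set clicked for each."""
    queries: List[str]
    clicked_urls: List[List[str]] = field(default_factory=list)


def split_context(session: Session) -> Tuple[str, List[str], List[List[str]]]:
    """Return (q_T, [q_1..q_{T-1}], [U_1..U_{T-1}]) for the last query."""
    if not session.queries:
        raise ValueError("session has no queries")
    current = session.queries[-1]
    context_queries = session.queries[:-1]
    # Keep only the click sets that belong to the contextual queries.
    context_clicks = session.clicked_urls[:len(context_queries)]
    return current, context_queries, context_clicks


# Hypothetical session, invented for illustration only.
session = Session(
    queries=["jaguar", "jaguar habitat", "jaguar diet"],
    clicked_urls=[["example.org/big-cats"], ["example.org/rainforest"], []],
)
current, ctx_queries, ctx_clicks = split_context(session)
```

Here `current` is the query to classify, while `ctx_queries` and `ctx_clicks` together form its query context.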

10
Query Context
Figure: query context of q_T

11
Problem Statement: QC with Context and Taxonomy
  • The objective of query classification (QC) with
    context is to classify a user query q_T into a
    ranked list of K categories c_{T,1}, c_{T,2}, ...,
    c_{T,K}, chosen among N_c categories c_1, c_2, ...,
    c_{N_c}, given the context of q_T.
  • The target taxonomy is a tree of categories in
    which c_1, c_2, ..., c_{N_c} are the leaf nodes.
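
As a minimal sketch of the output format only: once some model has assigned a score to each leaf category, the ranked list of K categories is simply the K highest-scoring leaves. The function name, the category names and the scores below are hypothetical.

```python
from typing import Dict, List


def top_k_categories(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k leaf categories with the highest scores, best first."""
    ranked = sorted(scores, key=lambda category: scores[category], reverse=True)
    return ranked[:k]


# Hypothetical scores for one query over three leaf categories.
scores = {"Travel/Lodging": 0.6, "Autos/Rental": 0.1, "Travel/Flights": 0.3}
ranked = top_k_categories(scores, k=2)
```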

12
Modeling Query Context by CRF

  • Here q represents the query sequence q_1, q_2,
    ..., q_T.
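
The equation on this slide did not survive in the transcript. For reference, a linear-chain CRF over a label sequence c = c_1 ... c_T given the observations q is conventionally written as below. The feature functions f_k, the weights λ_k and the normalizer Z(q) are the usual textbook symbols and are assumed here, not taken from the slide.

```latex
p(\mathbf{c} \mid \mathbf{q})
  = \frac{1}{Z(\mathbf{q})}
    \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{q}, t) \right),
\qquad
Z(\mathbf{q})
  = \sum_{\mathbf{c}'}
    \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(c'_{t-1}, c'_t, \mathbf{q}, t) \right)
```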

13
Why CRF?
  • The two main advantages of CRFs are:
      • 1) They can incorporate general feature
        functions to model the relation between
        observations and unobserved states.
      • 2) They need no prior knowledge of the type of
        the conditional distribution.
  • Given 1), we can incorporate external web
    knowledge.
  • Given 2), we don't need any assumptions about the
    type of p(c | q).

14
Features of CRF
  • When we use a CRF to model query context, one of
    the most important parts is choosing effective
    feature functions.
  • We should consider:
      • Relevance between queries and category
        labels, to leverage the local information of
        queries
      • Relevance between adjacent labels, to
        leverage contextual information
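
To illustrate how the two kinds of relevance interact, the sketch below scores a label sequence as the sum of per-query (local) scores and adjacent-label (pair) scores, and finds the best sequence with Viterbi decoding. It is a simplified stand-in rather than the authors' implementation: the slides do not say which inference procedure is used, and the function name, labels and scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def best_label_sequence(
    local_scores: List[Dict[str, float]],
    pair_scores: Dict[Tuple[str, str], float],
) -> List[str]:
    """Viterbi decoding: maximize the sum of local and adjacent-pair scores."""
    labels = list(local_scores[0])
    # best[t][c] = best score of any label sequence ending in label c at step t
    best = [dict(local_scores[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(local_scores)):
        best.append({})
        back.append({})
        for c in labels:
            prev = max(
                labels,
                key=lambda p: best[t - 1][p] + pair_scores.get((p, c), 0.0),
            )
            best[t][c] = (
                best[t - 1][prev]
                + pair_scores.get((prev, c), 0.0)
                + local_scores[t][c]
            )
            back[t - 1][c] = prev
    # Trace back from the best final label.
    last = max(labels, key=lambda c: best[-1][c])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-query session: the first query is clearly about travel,
# the second is ambiguous when judged on its own.
local_scores = [
    {"Travel": 2.0, "Autos": 0.1},
    {"Travel": 0.5, "Autos": 0.6},
]
pair_scores = {("Travel", "Travel"): 1.0, ("Autos", "Autos"): 1.0}
sequence = best_label_sequence(local_scores, pair_scores)
```

Judged in isolation, the second query would be labeled "Autos" (0.6 against 0.5). With the pair scores included, the context pulls it to "Travel".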

15
Relevance between queries and category labels
  • Term occurrence
      • The terms of q_t are obvious features for
        supporting c_t.
      • Due to the limited size of the training data,
        many useful terms that indicate category
        information may not be covered.
  • General label confidence
      • Leverage an external web directory such as
        Google Directory.
      • Here, M is the number of returned results and
        M_{c_t,q_t} is the number of returned results
        with label c_t after mapping.
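
The formula for general label confidence is missing from the transcript. The two definitions suggest a simple ratio, M_{c_t,q_t} / M, and the sketch below assumes that reading. The function name and the example labels are hypothetical, and the mapping from directory categories to taxonomy labels is assumed to have been done already.

```python
from typing import List


def general_label_confidence(result_labels: List[str], category: str) -> float:
    """Fraction of returned directory results mapped to `category`.

    Assumes the confidence is M_{c,q} / M; the slide's own formula is missing.
    """
    total = len(result_labels)  # M: number of returned results
    if total == 0:
        return 0.0
    # M_{c,q}: number of returned results carrying this label after mapping
    matching = sum(1 for label in result_labels if label == category)
    return matching / total


# Hypothetical mapped labels of four directory results for one query.
labels = ["Travel", "Travel", "Autos", "Travel"]
confidence = general_label_confidence(labels, "Travel")
```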

16
Relevance between queries and category labels
  • Click-aware label confidence
      • Combines click information with the knowledge
        of an external web directory.
      • CConf(c_t, u_t) can be calculated by multiple
        approaches.
      • Here, we use the vector space model (VSM) to
        calculate the cosine similarity between the
        term vectors of c_t and u_t.
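
A minimal sketch of the cosine similarity between two term vectors follows. It uses raw term counts as weights, which is an assumption: the slides do not say how terms are weighted (TF-IDF would be an equally plausible choice). The function name and the example terms are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def cosine_similarity(terms_a: Iterable[str], terms_b: Iterable[str]) -> float:
    """Cosine similarity between two bags of terms, weighted by raw counts."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    # A Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term lists for a category c_t and a clicked URL u_t.
category_terms = ["travel", "hotel", "flight"]
url_terms = ["travel", "guide", "hotel"]
similarity = cosine_similarity(category_terms, url_terms)
```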

17
Relevance between Adjacent Labels
  • Direct relevance between adjacent labels
      • Occurrence of the adjacent label pair
        <c_{t-1}, c_t>
      • The weight implies how likely the two labels
        are to co-occur.
  • Taxonomy-based relevance between adjacent labels
      • Limited by the sampling approach and the size
        of the training data, some reasonable adjacent
        label pairs may be under-represented or may
        not occur at all.
      • Consider indirect relevance between adjacent
        labels by using the taxonomy.
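
The sketch below counts adjacent label pairs in labeled sessions, then repeats the count one level up the taxonomy so that sparse leaf-level pairs can fall back on their parents. The slides do not describe how the taxonomy-based feature is actually computed, so the parent-level count is an assumed illustration. The "Top/Sub" label format, the function names and the example sessions are hypothetical.

```python
from collections import Counter
from typing import List


def adjacent_pair_counts(label_sequences: List[List[str]]) -> Counter:
    """Count how often each pair <c_{t-1}, c_t> occurs in labeled sessions."""
    counts: Counter = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def parent(label: str) -> str:
    """Parent category, assuming labels are written as 'Top/Sub'."""
    return label.split("/")[0]


def parent_pair_counts(label_sequences: List[List[str]]) -> Counter:
    """Count adjacent pairs after replacing each label with its parent."""
    lifted = [[parent(label) for label in labels] for labels in label_sequences]
    return adjacent_pair_counts(lifted)


# Hypothetical labeled sessions.
sessions = [
    ["Travel/Lodging", "Travel/Flights"],
    ["Travel/Flights", "Travel/Car Rental", "Travel/Lodging"],
]
direct = adjacent_pair_counts(sessions)
by_parent = parent_pair_counts(sessions)
```

In this toy data the pair <Travel/Flights, Travel/Lodging> never occurs directly, yet its parent-level pair <Travel, Travel> occurs three times. This is the kind of indirect evidence a taxonomy can supply.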

18
Experiment
  • Data set
      • 10,000 randomly selected sessions from one
        day's search log of a commercial search engine.
      • Three labelers first label all possible
        categories from the KDD Cup 2005 taxonomy for
        each unique query in the training data.

19
Examples of multiple-category queries
  • The large proportion of multiple-category
    queries illustrates the difficulty of QC without
    context.

20
Label Sessions
  • The three human labelers are then asked to
    cross-label each session of the data set with a
    sequence of level-2 category labels.
  • For each query, a labeler gives the most
    appropriate category label by considering:
      • The query itself
      • The query context
      • The clicked URLs of the query

21
Tested Approaches
  • Baselines
      • Non-context-aware baseline: the bridging
        classifier (BC) proposed by Shen et al.
      • Naïve context-aware baseline: the
        collaborating classifier (CC), which combines
        a test query with the previous query and
        classifies the result with BC.
  • CRFs
      • CRF-B: CRF with basic features (term
        occurrence, general label confidence and
        direct relevance between adjacent labels)
      • CRF-B-C: CRF with basic features + click-aware
        label confidence
      • CRF-B-C-T: CRF with basic features +
        click-aware label confidence + taxonomy-based
        relevance

22
Evaluation Metrics
  • Given a test session q_1, q_2, ..., q_T, we let q_T
    be the test query and let the queries q_1, q_2,
    ..., q_{T-1} and the corresponding clicked URL
    sets U_1, U_2, ..., U_{T-1} be the query context.
  • For q_T, we evaluate a tested approach by:
      • Precision (P) = δ(c_T ∈ C_{T,K}) / K
      • Recall (R) = δ(c_T ∈ C_{T,K})
      • F1 score (F1) = 2PR / (P + R)
  • Here c_T is the ground-truth label and C_{T,K} is
    the set of the top K labels. δ(·) is a Boolean
    function indicating whether its argument is true
    (1) or false (0).
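
The three metrics can be computed for a single test query as sketched below, reading the F1 denominator as P + R. The function name and the example labels are hypothetical.

```python
from typing import List, Tuple


def precision_recall_f1(
    truth: str, ranked: List[str], k: int
) -> Tuple[float, float, float]:
    """Per-query P, R and F1 for one ground-truth label and a top-k list."""
    # hit plays the role of the indicator: 1 if the truth is in the top k.
    hit = 1.0 if truth in ranked[:k] else 0.0
    precision = hit / k
    recall = hit
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ranked output for one test query.
ranked = ["Travel/Lodging", "Travel/Flights", "Autos/Rental"]
p, r, f1 = precision_recall_f1("Travel/Flights", ranked, k=3)
```

Since each query has exactly one ground-truth label, a hit at K = 3 gives P = 1/3 and R = 1, so larger K trades precision for recall.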

23
Overall results
1) The naïve context-aware baseline consistently
   outperforms the non-context-aware baseline.
2) The CRFs consistently outperform the two
   baselines.
3) CRF-B-C-T > CRF-B-C > CRF-B: click information
   and taxonomy-based relevance are useful.

24
Case study
  • The context is about travel.
  • The user clicks a travel guide web page.
  • The most appropriate label is given in the
    first position.

25
Efficiency of Our Approach
  • Offline training
      • Each iteration takes about 300 ms.
      • The time cost of training a CRF is acceptable.
  • Online cost
      • Calculating features
      • Label confidence

26
Conclusion and Future work
  • In this paper, we propose a novel approach to
    query classification that models query context
    with CRFs.
  • Experiments on a real search log clearly show
    that our approach outperforms a non-context-aware
    baseline and a naïve context-aware baseline.
  • The current approach cannot leverage contextual
    information for the queries at the beginning of
    sessions, which motivates our future research on
    leveraging contextual information from outside
    the session.

27
  • Thanks!