Web Query Analysis: Aligning Queries to Periodic Events - PowerPoint PPT Presentation

About This Presentation

Slides: 2
Provided by: Tym

Web Query Analysis: Aligning Queries to Periodic Events
Student: Trinh Hoang Minh
Supervisor: Dr. Min-Yen Kan
Project: IDH079370
Overview
  • We describe a method to determine the temporal correlation between web queries. In particular, we:
    • study how to identify periodic queries-as-events;
    • correlate other, non-periodic queries to these events;
    • develop a prototype that analyzes such temporal correlation between queries, and assess its performance, reaching over 90% accuracy.

Data Collection
  • We collected 29 popular queries representing annual events and 39 queries related to those events.
  • Using the popular trend search from Google Trends, we obtained the query volume histogram images. Numerical data were then extracted from the downloaded images with a graph digitizer.
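The poster does not say which graph digitizer was used. Most digitizers calibrate each axis from two reference points and then map every traced pixel linearly onto data values; the sketch below shows that step, assuming linear (not logarithmic) axes. The names `AxisCalibration` and `digitize` are hypothetical, and the calibration numbers mentioned after the code are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Two reference points on one axis: pixel positions and the data values they mark."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation between the two calibration points
        # (assumes a linear axis; pixel_a and pixel_b must differ).
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points, x_axis: AxisCalibration, y_axis: AxisCalibration):
    """Convert traced (x_pixel, y_pixel) points into (x_value, y_value) pairs."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points]
```

For example, if pixel row 200 marks a relative volume of 0 and row 20 marks 100, the y-axis calibration has a negative scale (image rows grow downward) and maps pixel row 110 to a volume of about 50.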

Evaluation
  • Two classifiers were trained: a periodic classifier and a correlation classifier.
  • Periodic Classifier Evaluation
    • We obtained judgments from 7 staff members for all 68 queries; their agreement gave a Fleiss' kappa of κ = 0.802.
    • The classifier achieved a high true positive rate of 93.1%, with κ = 0.794 when compared with the human judges.
  • Correlation Classifier Evaluation
    • We divided the queries into periodic and non-periodic categories based on the periodic classification.
    • We applied the correlation classifier to pairs of queries, one drawn from each category.
    • Each pair was manually judged as to whether the two queries were thought to be correlated.
    • The classifier achieved a high true positive rate of 93.3%, with a Fleiss' kappa of 0.70 when compared with the human judges.
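Fleiss' kappa, used above to measure agreement among the judges, can be computed from a table that records, for each query, how many judges chose each category. Below is a minimal sketch of the standard formula in plain Python; the function name `fleiss_kappa` is illustrative, and the input layout (one row per query, one column per category, every row summing to the number of judges) is an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters who put subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Per-subject agreement: fraction of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)
```

For the periodic-classifier study, this table would have 68 rows and 2 columns (periodic / non-periodic), with each row summing to 7.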

Task 1. Periodic Classification
  • A recurring event has regular, repeated peaks in its histogram, corresponding to the event's actual date.
  • We train a supervised Bayesian network classifier using two main features:
    • Autocorrelation Function (ACF), with the lag value k set to one year.
    • Correlation Coefficient Value (CCV) of pairwise yearly histograms (2005, 2006, 2007). To reduce noise and variability, dynamic time warping (DTW) was applied to find the best match among the yearly histograms.
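The two Task 1 features can be sketched as follows. This assumes weekly data, so a one-year lag is k = 52; the poster does not state the sampling interval. Here DTW uses absolute difference as the local cost and aligns two yearly histograms before their Pearson correlation is taken. Averaging the pairwise correlations into a single CCV is also a hypothetical choice, because the poster says only that DTW was used to find the best match among yearly histograms. All function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A flat series has no defined correlation; report 0.0 instead of dividing by zero.
    return cov / (sx * sy) if sx and sy else 0.0


def acf(series, lag):
    """Sample autocorrelation of `series` at `lag` (standard biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom if denom else 0.0


def dtw_path(a, b):
    """Classic O(len(a) * len(b)) DTW; returns the optimal warping path as index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end, always stepping to the cheapest predecessor
    # (ties favour the diagonal step).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]


def dtw_correlation(a, b):
    """Pearson correlation of two yearly histograms after DTW alignment."""
    path = dtw_path(a, b)
    return pearson([a[i] for i, _ in path], [b[j] for _, j in path])


def yearly_ccv(series, period=52):
    """Hypothetical CCV: mean DTW-aligned correlation over all pairs of full years."""
    years = [series[s:s + period] for s in range(0, len(series) - period + 1, period)]
    pairs = [(p, q) for p in range(len(years)) for q in range(p + 1, len(years))]
    if not pairs:
        raise ValueError("need at least two full years of data")
    return sum(dtw_correlation(years[p], years[q]) for p, q in pairs) / len(pairs)
```

On a toy pair where the spike falls one step later in the second year, `pearson([1, 1, 10, 1, 1], [1, 1, 1, 10, 1])` is about -0.25. By contrast, `dtw_correlation` aligns the two spikes and returns a correlation of about 1.0. This is the kind of date drift the DTW step is meant to absorb.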

Task 2. Temporal Correlation Classification
  • To identify whether other (non-periodic) queries are correlated to these periodic events-as-queries, we again used supervised classification, with four main features:
    • Overall Correlation: the correlation coefficient between the two query histograms over the full period.
    • Most Recent Year Correlation: the correlation coefficient over the last 12 months (i.e., 2007), treated as a separate feature.
    • Conjunctive Data: two features measure the strength of the conjoined queries:
      • the number of web search results returned by a search engine (Google's Search API, in our case);
      • the number of times the two queries appear together in the titles of the top ten results.
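The correlation-based Task 2 features and the title co-occurrence count might be computed as shown below. This sketch rests on several assumptions. Both histograms are taken to be aligned weekly series, so the most recent year is the last 52 points. A "key period" is taken to be any time step where the periodic query's volume exceeds its mean plus `k` standard deviations; the poster defines key periods only as periods of relatively high search volume and describes that feature under its Conclusion heading. A title is treated as containing a query when the query string appears in it, ignoring case. The search-result-count feature is omitted because it needs a live search API. All names are hypothetical.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def overall_correlation(periodic, other):
    """Correlation of the two histograms over the full period."""
    return pearson(periodic, other)


def recent_year_correlation(periodic, other, period=52):
    """Correlation over the last `period` samples only (one year, if data are weekly)."""
    return pearson(periodic[-period:], other[-period:])


def key_period_correlation(periodic, other, k=0.0):
    """Correlation restricted to hypothetical 'key periods': time steps where the
    periodic query's volume exceeds its mean by more than k standard deviations."""
    threshold = statistics.fmean(periodic) + k * statistics.pstdev(periodic)
    idx = [t for t, v in enumerate(periodic) if v > threshold]
    if len(idx) < 2:
        return 0.0  # too few key points to correlate
    return pearson([periodic[t] for t in idx], [other[t] for t in idx])


def title_cooccurrence(titles, q1, q2, top_n=10):
    """Number of the top-n result titles that contain both query strings."""
    q1, q2 = q1.lower(), q2.lower()
    return sum(1 for t in titles[:top_n] if q1 in t.lower() and q2 in t.lower())
```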

Conclusion
  • Key Period Correlation (the fourth Task 2 feature): a key period is defined as a period with high search volume relative to other periods.
    • The correlation coefficient is then computed over these key periods only.
  • Contribution
    • Periodic query classification.
    • Temporal correlation for web queries: correlating queries to periodic events with reasonable accuracy, using only relative volume histograms and search results.
    • Facilitating proactive query suggestion or re-ranking of search results, which we plan to explore as applications.
  • Future work
    • Extend our work by integrating more query trend data from news and blog trends.
    • Extend our work to use partial correlation to correct for overall query volume growth.
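Partial correlation, proposed above as future work, removes the part of the association between two queries that a third series, such as overall query volume, explains. Below is a minimal sketch using the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). Using total search volume as the control series `z` is an assumption about how this would be applied, and the names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom
```

In the query setting, `x` and `y` would be the two query histograms and `z` an overall-volume series covering the same weeks. That series would have to be obtained separately from the relative histograms used here. The result equals the correlation between the residuals of `x` and `y` after each is linearly regressed on `z`.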