

1
Why We Search
  • Eytan Adar
  • University of Washington
  • May 12, 2007
  • Dan Weld, Brian Bershad, and Steve Gribble

2
Power in prediction
  • Based on blogs, can we figure out which ad words to buy?
  • Based on events on TV, can we gauge the online response?
  • What kinds of news events do groups respond to? How do they respond?
  • Integrate other behavioral data
      • Purchase habits
      • Brand awareness
      • Etc.

3
Power in prediction
  • Can we understand which events impact, predict, or correlate with online behavior?
  • Who responds to an event?
  • When do they respond?
  • How much?
  • Why do they respond?
  • Attention as a resource
  • Indicator for other investments

4
  • Daily lives
  • Information side effects
  • Attention
  • searches, mentions, news, votes, etc.

5
[Figure: searches about news and blog posts about news over time]
Predictive, Correlated
6
[Figure: suntan lotion sales and sunshine over time]
Predictive, Causal
7
Agenda
  • Transform text behavioral data to more useful
    form
  • Infrastructure to compare different behavioral
    data
  • Analysis and visualization technique to compare behaviors over time
  • Some observations

8
Query Event Stream (QES): iraq war
[Figure: "iraq war" as % of all queries (in that period); 15M queries (MSN logs), 12.2M queries (AOL logs), May 06]
9
Blogs: 14M posts
[Figure: % of blog posts that mention the phrase]
10
News: 13K articles from CNN/BBC
[Figure: % of news articles that mention the phrase; number of inlinks]
11
TV: 2.5K shows (TV.com)
[Figure: % of episodes that mention the phrase; number of votes]
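Slides 8-11 turn each source into a time series: the share of items (queries, posts, articles, episodes) in each period that mention a phrase. The Python sketch below assumes hourly bins and plain case-insensitive substring matching. `event_stream` is a hypothetical helper. The slides list inlinks and votes next to the news and TV streams, but treating them as per-item weights is an assumption, not something the slides confirm.

```python
from collections import defaultdict
from datetime import datetime
from itertools import repeat
from typing import Iterable, Optional


def event_stream(items: Iterable[tuple[datetime, str]], phrase: str,
                 weights: Optional[Iterable[float]] = None,
                 bin_hours: int = 1) -> list[tuple[datetime, float]]:
    # Hypothetical helper: the share of items in each time bin that mention `phrase`.
    # If weights are given (assumed, e.g. inlinks for news or votes for TV episodes),
    # each item counts `weight` times instead of once.
    if weights is None:
        weights = repeat(1.0)
    totals: dict[datetime, float] = defaultdict(float)
    hits: dict[datetime, float] = defaultdict(float)
    needle = phrase.lower()
    for (ts, text), w in zip(items, weights):
        # Floor the timestamp to the start of its bin (bin_hours should divide 24).
        start = ts.replace(hour=ts.hour - ts.hour % bin_hours,
                           minute=0, second=0, microsecond=0)
        totals[start] += w
        if needle in text.lower():
            hits[start] += w
    # Normalize by bin volume so busy and quiet hours are comparable.
    return [(b, hits[b] / totals[b]) for b in sorted(totals) if totals[b] > 0]
```

Normalizing by the total volume in each bin keeps daily traffic cycles from showing up as spurious interest in the phrase.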
12
Phrases/Queries → Topics
  • We want to know that "britney spears" is the same as "spears britney" or just "britney"
  • Solution: look at clicks and results
  • 1M queries from MSN logs that appear 2 times
  • Overlapping clicks/result sets indicate
    relatedness of queries (similarity measure)
  • Naïve clustering
  • Query Event Stream (QES) → Topic Event Stream (TES)
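One way to read "overlapping clicks/result sets" as a similarity measure is the Jaccard overlap between the URL sets of two queries, followed by naive single-link clustering. The sketch below assumes exactly that. `jaccard`, `cluster_queries`, and the 0.5 threshold are illustrative choices, not the measure or threshold used in the talk.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two URL sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks: dict[str, set[str]],
                    threshold: float = 0.5) -> list[set[str]]:
    # Hypothetical naive clustering: link any two queries whose clicked/result
    # URL sets overlap at or above `threshold`, then return connected components.
    parent = {q: q for q in clicks}

    def find(q: str) -> str:
        # Union-find root lookup with path halving.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for q1, q2 in combinations(clicks, 2):
        if jaccard(clicks[q1], clicks[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    groups: dict[str, set[str]] = {}
    for q in clicks:
        groups.setdefault(find(q), set()).add(q)
    return list(groups.values())
```

The pairwise loop is quadratic, which is fine for a toy example. At the scale of 1M queries, an inverted index from URL to queries would avoid comparing pairs that share no URLs at all.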

13
Experimental Set
  • We take the 3638 most frequent queries from MSN
  • AOL: 3627 (99%)
  • BLOG: 1975 (54%)
  • NEWS: 1704 (47%)
  • TV: 1602 (44%)
  • Compare topic A in one set to topic A in another
  • Limits spurious correlations

14
Correlations
  • Do we even have a chance?
  • Equivalent to convolution (with one series time-reversed)
  • Try each delay d in some range and find the maximum correlation
  • Negative/Positive correlations

[Figure: correlation r as a function of delay d, with d = 0 marked]
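The delay search on this slide can be sketched as computing Pearson's r at every shift d within a range. The sketch keeps the shift with the largest absolute correlation, so strong negative correlations are caught as well. `best_delay` is a hypothetical helper. Using Pearson's r and using the absolute value are both assumptions, and the two series are assumed to have equal length.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    # Pearson's r for two equal-length series; 0.0 if either series is constant.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_delay(x: list[float], y: list[float], max_delay: int) -> tuple[int, float]:
    # Hypothetical helper: correlate x[t] with y[t + d] for every d in
    # [-max_delay, max_delay]. A positive d means y lags x by d bins.
    # Returns (d, r) for the largest |r|.
    n = len(x)
    best_d, best_r = 0, 0.0
    for d in range(-max_delay, max_delay + 1):
        if n - abs(d) < 2:
            continue  # too little overlap to correlate
        if d >= 0:
            a, b = x[: n - d], y[d:]
        else:
            a, b = x[-d:], y[: n + d]
        r = pearson(a, b)
        if abs(r) > abs(best_r):
            best_d, best_r = d, r
    return best_d, best_r
```

For example, if y is x shifted two bins later, `best_delay(x, y, 3)` returns a delay of 2 with r close to 1.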
15
Delays (high correlation)
38% are at 0
16
Explorer
17
Explorer
18
Explorer
19
Explorer
20
Explorer
21
Explorer
22
Explorer
23
Max-correlation delay: 3 hours
[Figure: two time series plotted over time]
Same correlation delays, but very different shapes
24
How do we compare these? Visual summary of
differences?
25
[Figure: magnitude vs. time]
26
Capture not just delay or difference, but specific behaviors
[Figure: curve annotated with behaviors: peak, fall, rise, run]
27
Dynamic Time Warping (DTW)
$$DTW_{i,j} = \min\left(DTW_{i-1,j} + \mathit{cost},\; DTW_{i,j-1} + \mathit{cost},\; DTW_{i-1,j-1} + \mathit{cost}\right)$$
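The DTW recurrence can be sketched in Python as follows. The sketch assumes absolute difference as the cost between two aligned points and DTW[0][0] = 0 as the base case; the slide states neither. It also backtracks through the table to recover the warping path, which records which point of one curve is aligned with which point of the other. `dtw` is a hypothetical name.

```python
def dtw(a: list[float], b: list[float]) -> tuple[float, list[tuple[int, int]]]:
    # Cumulative-cost table padded by one row and column; D[0][0] = 0 is the
    # assumed base case, and every other border cell is unreachable (infinity).
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed local cost
            D[i][j] = min(D[i - 1][j] + cost,      # advance in a only
                          D[i][j - 1] + cost,      # advance in b only
                          D[i - 1][j - 1] + cost)  # advance in both
    # Walk back from (n, m) along the cheapest predecessor to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))  # 0-based indices into a and b
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path
```

For two curves of the same shape where one is stretched, the total cost stays at 0 and the path absorbs the stretch. For example, `dtw([0, 0, 1, 2], [0, 1, 2])` aligns both leading zeros of the first curve with the single leading zero of the second.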
28
[Figure: Curve 1 and reference curve]
29
DTWRadar: summary of differences between two time series
[Figure: Curve 1 vs. reference curve, with radar summary]
Curve 1 has a bigger response on average
Curve 1 lags on average
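One plausible reading of the radar summary works from the DTW warping path. Each link (i, j) on the path becomes an offset vector, with time offset j - i and magnitude offset curve[j] - ref[i], and the centroid of those vectors summarizes the whole comparison. A positive average time offset means Curve 1 lags the reference. A positive average magnitude offset means Curve 1 has a bigger response. `radar_centroid` is hypothetical and takes a precomputed warping path as input.

```python
def radar_centroid(ref: list[float], curve: list[float],
                   path: list[tuple[int, int]]) -> tuple[float, float]:
    # Hypothetical DTWRadar-style summary. Each warping-path link (i, j) aligns
    # ref[i] with curve[j] and is translated into an offset vector:
    #   time offset      = j - i              (positive: curve lags ref)
    #   magnitude offset = curve[j] - ref[i]  (positive: curve responds more)
    # The centroid (mean) of those vectors is returned as (time, magnitude).
    if not path:
        raise ValueError("empty warping path")
    dt = sum(j - i for i, j in path) / len(path)
    dm = sum(curve[j] - ref[i] for i, j in path) / len(path)
    return dt, dm
```

With ref = [0, 2, 0], curve = [0, 0, 4, 0], and path [(0, 0), (0, 1), (1, 2), (2, 3)], the centroid is (0.75, 0.5). It falls in the "lags, bigger response" quadrant, the same situation described by the two annotations on this slide.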
30
[Figure: reference curve and Curve 1]
31
Explorer with Radar
32
Some Findings
  • Randomly selected some topics and labeled them
      • People, places, events, news, etc.
  • So why do we search? Or blog? Or react to news?

33
1) News of the Weird
  • Bloggers pick up on weird stories first
      • igor vovkovinskiy
      • uss oriskany

[Figure: Blog vs. Search (MSN) for each example; curves normalized to max value for readability]
34
Blogs lead versus lag in the news
35
2) Anticipated Events
  • Pressure to be new
  • Bloggers don't talk about anticipated events
      • TV shows

[Figure: Search (MSN) vs. Blog]
36
3) Familiarity Breeds Contempt
  • We get tired of certain kinds of news
  • It takes a really big spike for us to get excited
      • enron trial

[Figure: Search (MSN) vs. News]
37
4) Correlation vs. Causation
  • poseidon
  • Both respond to the movie release, but one to marketing and one to satire
  • Need other, more specific data streams to infer causation

[Figure: TV vs. Search (MSN)]
38
5) Influence of the portal
  • mothers day
  • Demographics?
  • Hypothesis: what's on the front page drives search
      • Portals present news stories and information
      • Users react to that information
      • Different portals → different searches

39
Summary
Unstructured Source Data → Conversion / Data Cleaning → Time Series → Explorer, DTWRadar → Model Building → Models → Predictions
  • Conversion from a number of different data sources
  • A number of findings indicating the relationship of the data
  • Time Series Analysis Algorithms