The predictive power of online chatter

About This Presentation

Slides: 15

Provided by: raluc8

Transcript and Presenter's Notes

1
The predictive power of online chatter

Presented by R. Paiu
2
Motivation & Methodology

Goal
Demonstrate that online blog postings can be used
to predict spikes in sales ranks
Methodology
Blog postings ↔ Amazon book sales
Create hand-crafted queries and use them for
predictions
Use queries to discover blog posts discussing a
given product
Plot sales ranks and number of blog posts as two
time series
Implement and test automated query generation
Develop automated prediction algorithms

3
Data Sources

4
Correlation of Blog Posts and Sales Rank

5
Detecting Spikes in Sales Ranks (Step 1)

Minimal rank m must occur more than 2 weeks from
the start and the end of the considered period
(120 days)
One week before and after m occurs, the rank
value must be greater than max(m + 50, 1.5m)
50 books (out of 2430) contain a spike
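
The spike test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' code: the garbled threshold is read as max(m + 50, 1.5m), "one week before and after" is taken to mean the rank exactly 7 days either side of the minimum, and the function name and parameters are hypothetical.

```python
def has_spike(ranks, margin_days=14, window_days=7, offset=50, factor=1.5):
    """Return True if a daily sales-rank series contains a spike.

    ranks: one rank per day (lower rank = better sales).
    Assumptions: threshold read as max(m + 50, 1.5 * m); the rank is
    checked exactly window_days before and after the minimum.
    """
    if len(ranks) <= 2 * margin_days:
        return False
    m = min(ranks)
    day = ranks.index(m)  # first day on which the minimal rank occurs
    # The minimum must lie more than margin_days from both ends.
    if day <= margin_days or (len(ranks) - 1 - day) <= margin_days:
        return False
    threshold = max(m + offset, factor * m)
    before = ranks[day - window_days]
    after = ranks[day + window_days]
    return before > threshold and after > threshold


# Usage: a 120-day series with a sharp dip in rank on day 60.
series = [1000] * 120
series[60] = 100
print(has_spike(series))
```

A flat series, or one whose minimum falls within two weeks of either end, is rejected by the same function.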

6
Locating Blog Mentions about Books (Step 2)

7
Plotting and Correlating Time Series (Step 3)

Plot the sales rank of the product and the number
of corresponding postings vs. time
y-axis is scaled so that values are between 0 and 1
Best lag k: the lag at which cross-correlation is maximum
Best lag
Leading, if negative
Trailing, if positive
Results
Sales rank spikes potentially
correlated with blog activity
Spikes in sales rank may occur
despite insignificant blog activity
Causes include discount pricing,
bulk buying of books, etc.
Highly correlated spikes in sales ranks and
blog mentions were obtained for 10 books
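
A minimal sketch of the lag search in Step 3, using only the Python standard library. Several things here are assumptions rather than details from the slides: the sales series is assumed to be already oriented so that higher values mean more sales (a raw rank series would need inverting first), Pearson correlation stands in for the cross-correlation measure, and the names scale01, pearson, best_lag and the max_lag parameter are hypothetical. The sign convention pairs sales[t] with mentions[t + k], so a negative k means blog mentions lead sales.

```python
from statistics import mean


def scale01(series):
    """Rescale a series so its values lie between 0 and 1 (for plotting)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def best_lag(sales, mentions, max_lag=14):
    """Return (k, correlation) for the lag k with maximal correlation."""
    n = len(sales)
    best_k, best_c = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        # Pair sales[t] with mentions[t + k] wherever both indices are valid.
        ts = range(max(0, -k), min(n, n - k))
        a = [sales[t] for t in ts]
        b = [mentions[t + k] for t in ts]
        if len(a) < 2:
            continue
        c = pearson(a, b)
        if c > best_c:
            best_k, best_c = k, c
    return best_k, best_c


# Usage: mentions peak on day 10, sales peak on day 13.
sales = [0] * 30
sales[13] = 1
mentions = [0] * 30
mentions[10] = 5
print(best_lag(scale01(sales), scale01(mentions)))
```

Because Pearson correlation is unaffected by rescaling, scale01 matters only for plotting both series on a shared axis.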

8
Queries: Automatic Generation

9
Sales Rank Prediction

Given the time series representing sales data up
to a point t, does the addition of blog mention
data for the same period improve the prediction
of what will happen to the sales rank?
Outcomes
MOTION & VOLATILITY: Predicting whether
tomorrow's sales rank for a particular book will
be higher or lower appears to be hard
SPIKES: Analysis of blog data up to a point t
makes it possible to effectively predict when
there will be a future spike in the sales rank,
without recourse to information from the future
and even without recourse to the history of sales
ranks.

10
Predicting Motion & Volatility

Analysis of various predictor algorithms
Volatility prediction: predict whether tomorrow's
sales rank will differ from today's by more than
a certain threshold value.
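
To make the two prediction targets concrete, the sketch below derives the day-by-day labels a predictor would be scored against. It shows only the targets, not any of the predictor algorithms the slide refers to; the function names and the example threshold are hypothetical.

```python
def motion_labels(ranks):
    """For each day, True if the next day's rank is higher than today's."""
    return [nxt > cur for cur, nxt in zip(ranks, ranks[1:])]


def volatility_labels(ranks, threshold):
    """For each day, True if the next day's rank differs by more than threshold."""
    return [abs(nxt - cur) > threshold for cur, nxt in zip(ranks, ranks[1:])]


# Usage with a short, made-up rank series and an arbitrary threshold of 20.
ranks = [100, 130, 125, 90, 95]
print(motion_labels(ranks))
print(volatility_labels(ranks, threshold=20))
```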

11
Predicting Spikes 1

Original data is processed as follows (truthed
data)
The point of the minimal sales rank, m, is located
A threshold is set,
The spike region is taken to be the maximum
interval containing the point of minimal sales
rank and no point of sales rank greater than
the threshold
Steps
A product and a time t are fixed
The predictor is given as input the number of
blog postings for this product, for all days up
to and including t
The predictor outputs a bit indicating whether it
believes a spike in sales rank will occur in the
near future
Results of the predictor are evaluated against
the truthed data
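
The truthing and evaluation steps above might look like the following sketch. The threshold value is not given in the transcript, so it is passed in as a parameter. The predictor shown is a deliberately simple stand-in (flag a spike when today's mention count exceeds a multiple of the historical average); it is hypothetical and is not the predictor from the presentation. The horizon parameter, standing in for "near future", is likewise an assumption.

```python
def spike_region(ranks, threshold):
    """Maximal interval around the minimal rank with no rank above threshold.

    Returns (start, end) as inclusive day indices, or None if even the
    minimal rank exceeds the threshold.
    """
    center = ranks.index(min(ranks))
    if ranks[center] > threshold:
        return None
    start = center
    while start > 0 and ranks[start - 1] <= threshold:
        start -= 1
    end = center
    while end < len(ranks) - 1 and ranks[end + 1] <= threshold:
        end += 1
    return start, end


def toy_predictor(mentions_up_to_t, factor=3.0):
    """Hypothetical rule: spike expected if today's mentions jump well
    above the average of all earlier days. Uses blog counts only."""
    *history, today = mentions_up_to_t
    if not history:
        return False
    baseline = sum(history) / len(history)
    return today > factor * max(baseline, 1.0)


def evaluate(mentions, ranks, threshold, t, horizon=7):
    """Return (predicted, actual) for a fixed product and day t."""
    region = spike_region(ranks, threshold)
    predicted = toy_predictor(mentions[: t + 1])  # days up to and including t
    actual = region is not None and t < region[0] <= t + horizon
    return predicted, actual


# Usage: mentions jump on day 8, the spike region starts on day 10.
ranks = [500] * 10 + [200, 100, 150] + [500] * 10
mentions = [1] * 8 + [10] + [1] * 14
print(spike_region(ranks, threshold=250))
print(evaluate(mentions, ranks, threshold=250, t=8))
```

The predictor never sees the rank series; ranks are used only to build the truthed data it is checked against.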

12
Predicting Spikes 2

13
Predicting Spikes 3

14
Conclusions & Future Work

Conclusions
Rank motion and volatility prediction is
difficult
The volume of blog postings related to a product
can be used to predict sales rank spikes
Automatic query generation for detecting blog
post mentions of products produces good results
Future work
Develop better understanding of when blogs are
useful for prediction
Create new tool features and new techniques for
automated query generation and prediction
Expand the area of prediction to other product
sales ranks, electoral voting, and public opinion
on important decisions
Study and analyze proposed Hidden Markov Model
relating discussion involvement of the blogger to
sales prediction lag
