Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo

1
Efficient Behavior Targeting Using SVM Ensemble Indexing

Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
Chinese Academy of Sciences
State Grid Energy Institute, China

2
Behavior targeting

Behavior Targeting (BT) uses users' historical behavior data to select the most relevant ads for display.
Example from Yahoo! Research

[Figure: user behavior data, ads, targeted users]
3
Regression for BT

Poisson regression model (Ye Chen, eBay, 2009).
x: ad clicks and views, page views, search queries and clicks.
y: click-through rate (CTR).

[Figure: for each ad category, view data and click data are each assumed to follow a Poisson distribution, with one Poisson regression on views and one on clicks]
Ye Chen et al., Large-scale behavioral targeting (KDD 2009, best paper award)
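As a rough illustration of the regression approach on this slide, the sketch below fits a generic log-linear Poisson regression by gradient ascent, once on view counts and once on click counts, and takes the ratio of the two predicted rates as a CTR estimate. The log-linear form, the toy data, and the names `fit_poisson` and `predict_rate` are assumptions made for illustration; this is not necessarily the formulation used by Chen et al.

```python
import math

def predict_rate(w, x):
    """Expected count under a log-linear Poisson model: exp(w . x)."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))

def fit_poisson(X, y, lr=0.02, epochs=5000):
    """Maximize the Poisson log-likelihood by plain gradient ascent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            mu = predict_rate(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += (yi - mu) * xj   # d/dw_j of the log-likelihood
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: x = [bias, searched_for_cameras] per user.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
views = [10, 14, 40, 44]    # ad views per user (made-up numbers)
clicks = [1, 1, 4, 6]       # ad clicks per user (made-up numbers)

w_views = fit_poisson(X, views)
w_clicks = fit_poisson(X, clicks)

user = [1, 1]
ctr = predict_rate(w_clicks, user) / predict_rate(w_views, user)
print(f"estimated CTR for user {user}: {ctr:.3f}")
```

Even in this small sketch the learning rate has to be matched to the scale of the counts for the fit to converge, which hints at why parameter tuning is a practical concern for this family of models.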
4
Limitations

Limitations
Parameter tuning is very difficult.
The Poisson assumption does not always hold for real-world behavior data.
Clicks are typically several orders of magnitude fewer than views.
User interests are not fixed; they are often transient.

5
Classification for BT

SVM for classification
Example 1: three users and Nikon's (www.nikon.com) ad a.

[Figure: for each ad category, view data and click data are used to train an SVM classifier. View-and-click data are positive examples (+); view-but-no-click data are negative examples (-).]
Challenges 1,2,3
6
Classification for BT

Ensemble SVM on data streams
Merits
No complicated parameters
No statistical assumptions
Dynamic model on data streams

Challenge 4
7
Limitations

Ensemble prediction is too slow for online computing.
Time cost: A × W × N × T, where A is the number of advertisers, W the ensemble size, N the number of support vectors, and T the number of features.

Example 2: We collect 2 million behavior events in 1 minute; with W = 10, prediction takes 53 minutes.
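To make the cost on this slide concrete, here is a minimal sketch of naive ensemble prediction for a single advertiser, reading the time cost as the product of the listed factors. It assumes linear-kernel SVMs combined by majority vote and dense feature vectors; the function name `naive_ensemble_predict` and the data layout are hypothetical. The returned operation count is W × N × T, and scoring A advertisers repeats the whole loop A times.

```python
def naive_ensemble_predict(ensemble, x):
    """Score one user record x against every support vector of every classifier.

    ensemble: list of W classifiers, each a (bias, support_vectors) pair,
              where support_vectors is a list of (coefficient, dense_vector).
    Returns (label, number_of_multiplications).
    """
    votes = 0
    ops = 0
    for bias, support_vectors in ensemble:          # W classifiers
        score = bias
        for coef, sv in support_vectors:            # N support vectors
            dot = 0.0
            for sv_value, x_value in zip(sv, x):    # T features, zeros included
                dot += sv_value * x_value
                ops += 1
            score += coef * dot
        votes += 1 if score >= 0 else -1
    return (1 if votes >= 0 else -1), ops
```

Most of the multiplications in the inner loop involve zeros when the vectors are sparse, which is the waste an index over the support vectors is meant to avoid.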
8
Solutions

Construct an index structure for the ensemble SVM.
Why does the index work?
It trades space for time.
Features are shared among multiple support vectors.
Support vectors have a sparse structure.

Mapping from text indexing to the ensemble SVM:
Text terms -> Features
Document -> Support vector
Document set -> Ensemble SVM
P. Zhang et al., knowledge index for online data streams (KDD 2011, ICDM 2011)
9
The index structure

The SVM-index structure
Example 3: based on Example 1, consider an SVM with 3 support vectors.

[Figure: the SVM-index, made up of an inverted hashing table, the support vectors, and ensemble information]
Time complexity: O(T)
10
The index structure

Operations
Search: predict the label of each incoming user record x.
Step 1: look up the matching support vectors in the inverted indexes on the left.
Step 2: calculate x's class label.
Insert: integrate new classifiers into the ensemble.
Delete: drop outdated classifiers from the ensemble.
Memory

See our source code.
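The three operations on this slide can be sketched with a small inverted index that maps each feature to the support vectors containing it. This is an illustrative sketch, not the authors' implementation: it assumes linear-kernel SVMs, sparse feature dictionaries, and majority voting, and the class name `SVMIndex` and its data layout are hypothetical.

```python
from collections import defaultdict

class SVMIndex:
    """Inverted index over the support vectors of an SVM ensemble (sketch)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # feature -> {sv_id: feature value}
        self.sv_info = {}                  # sv_id -> (classifier id, coefficient)
        self.bias = {}                     # classifier id -> bias term
        self._next_id = 0

    def insert(self, clf_id, bias, support_vectors):
        """Integrate a classifier; support_vectors is [(coef, {feature: value}), ...]."""
        self.bias[clf_id] = bias
        for coef, sv in support_vectors:
            sv_id = self._next_id
            self._next_id += 1
            self.sv_info[sv_id] = (clf_id, coef)
            for feature, value in sv.items():
                self.postings[feature][sv_id] = value

    def delete(self, clf_id):
        """Drop an outdated classifier together with its support vectors."""
        del self.bias[clf_id]
        dead = {s for s, (c, _) in self.sv_info.items() if c == clf_id}
        for sv_id in dead:
            del self.sv_info[sv_id]
        for feature in list(self.postings):
            for sv_id in self.postings[feature].keys() & dead:
                del self.postings[feature][sv_id]
            if not self.postings[feature]:
                del self.postings[feature]

    def search(self, x):
        """Predict +1 or -1 for a sparse user record x = {feature: value}."""
        # Step 1: touch only support vectors that share a feature with x.
        dots = defaultdict(float)
        for feature, x_value in x.items():
            for sv_id, sv_value in self.postings.get(feature, {}).items():
                dots[sv_id] += x_value * sv_value
        # Step 2: combine per-classifier scores and take a majority vote.
        scores = dict(self.bias)
        for sv_id, dot in dots.items():
            clf_id, coef = self.sv_info[sv_id]
            scores[clf_id] += coef * dot
        votes = sum(1 if s >= 0 else -1 for s in scores.values())
        return 1 if votes >= 0 else -1

# Usage with made-up features: one classifier, then one query.
index = SVMIndex()
index.insert("c1", -0.5, [(1.0, {"camera": 1.0, "lens": 1.0}), (-1.0, {"shoes": 1.0})])
print(index.search({"camera": 1.0}))
```

In this sketch the search cost depends on the number of non-zero features in x and the length of their posting lists, not on the total number of support vectors, at the price of the extra memory held by the posting lists.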
11
Experiments

Data sets
Search engine data
Comparisons
Poisson
E-SVM
E-Index (our method)

12
Comparisons

Observations

E-Index has sub-linear prediction time
E-SVM consumes more memory
13
Comparisons
Ensemble models are more accurate than the Poisson regression model.
14
Comparisons
The index method significantly improves prediction efficiency, especially when the ensemble size is large.
15
Related Work

Behavior targeting
Regression models vs. classification models
Stream indexing
Boolean expression indexing in publish/subscribe systems
Ensemble models
Concept drift

16
Conclusions

Contributions
Identify and address the prediction efficiency problem of ensemble models for behavior targeting.
Convert the ensemble SVM model to a document set, and propose a new type of inverted text index structure to achieve sub-linear prediction time.
Future work
Index more complicated SVM models with non-linear kernels.