Title: Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
1. Efficient Behavior Targeting Using SVM Ensemble Indexing
- Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
- Chinese Academy of Sciences
- State Grid Energy Institute, China
2. Behavior targeting
- Behavior Targeting (BT) uses users' historical behavior data to select the most relevant ads for display.
- Example from Yahoo! Research
[Figure: user behavior data and ads are matched to produce targeted users]
3. Regression for BT
- Poisson regression model (Ye Chen, eBay, 2009).
- x: ad clicks and views, page views, search queries and clicks.
- y: click-through rate (CTR).
[Figure: for each ad category, view data and click data are each assumed Poisson-distributed, with one Poisson regression on views and one on clicks]
Ye Chen et al., Large-scale behavior targeting (KDD'09 best paper award)
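As a rough sketch of the model named above, the block below writes out the standard Poisson likelihood for a count and one way the two fitted rates could be combined into a CTR estimate. The symbols \lambda_{\text{click}} and \lambda_{\text{view}} and the ratio form are assumptions for illustration; the slide only names the two regressions and does not state the link function or the combination rule.

```latex
% Standard Poisson likelihood for an observed count y with rate \lambda
p(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}, \qquad y = 0, 1, 2, \dots

% Assumed combination: per ad category, one rate for clicks and one for views,
% each a function of the behavior features x
\widehat{\mathrm{CTR}}(x) \approx \frac{\lambda_{\text{click}}(x)}{\lambda_{\text{view}}(x)}
```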
4. Limitations
- Limitations
- Parameter tuning is very difficult.
- The Poisson assumption is not always true for real-world behavior data.
- Clicks are typically several orders of magnitude fewer than views.
- User interests are not always fixed, but rather transient.
5. Classification for BT
- SVM for classification
- Example 1: 3 users on Nikon's (www.nikon.com) ad
[Figure: for each ad category, view data and click data are fed to an SVM classifier; view and click data (+), view but no click data (-)]
Challenges 1, 2, 3
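The labeling rule in Example 1 can be sketched as follows: a user who viewed and clicked the ad becomes a positive example, and a user who viewed without clicking becomes a negative one. The event tuples, user names, and the `label_users` helper are hypothetical and only illustrate the rule; they are not taken from the authors' data or code.

```python
from typing import Dict, List, Tuple

# Hypothetical behavior log for one ad: (user_id, viewed, clicked).
# Three users, as in Example 1; the values themselves are made up.
EVENTS: List[Tuple[str, bool, bool]] = [
    ("user_a", True, True),
    ("user_b", True, False),
    ("user_c", True, True),
]


def label_users(events: List[Tuple[str, bool, bool]]) -> Dict[str, int]:
    """Assign +1 to users who viewed and clicked, -1 to users who only viewed."""
    labels: Dict[str, int] = {}
    for user_id, viewed, clicked in events:
        if not viewed:
            continue  # a user who never saw the ad gives no training signal
        labels[user_id] = 1 if clicked else -1
    return labels


labels = label_users(EVENTS)
```

The resulting labels, paired with each user's behavior features, would form the training set for a per-ad (or per-category) SVM.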
6. Classification for BT
- Ensemble SVM on data streams
- Merits
- no complicated parameters
- no statistical assumptions
- Dynamic model on data streams
Challenge 4
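A minimal sketch of an ensemble over a data stream, assuming a fixed-size window of the W most recent classifiers and an unweighted majority vote. The slides do not state the voting rule or how the window is maintained, so both are assumptions, and pre-trained linear models (weights plus bias) stand in for trained SVMs.

```python
from collections import deque
from typing import Deque, Dict, Tuple

SparseVector = Dict[str, float]
LinearModel = Tuple[SparseVector, float]  # (weights, bias), hypothetical format


def linear_predict(model: LinearModel, x: SparseVector) -> int:
    """Return +1 or -1 from the sign of w.x + b for a sparse input."""
    weights, bias = model
    score = bias + sum(weights.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0 else -1


class StreamEnsemble:
    """Keep the W most recent classifiers and predict by majority vote."""

    def __init__(self, ensemble_size: int) -> None:
        # deque(maxlen=W) drops the oldest model when a new one is appended
        self.models: Deque[LinearModel] = deque(maxlen=ensemble_size)

    def add_model(self, model: LinearModel) -> None:
        self.models.append(model)

    def predict(self, x: SparseVector) -> int:
        votes = sum(linear_predict(m, x) for m in self.models)
        return 1 if votes >= 0 else -1  # ties go to +1 in this sketch


ensemble = StreamEnsemble(ensemble_size=3)
ensemble.add_model(({"camera": 1.0}, -0.5))
ensemble.add_model(({"camera": 0.8, "lens": 0.4}, -0.5))
ensemble.add_model(({"sports": 1.0}, -0.5))
ensemble.add_model(({"lens": 1.0}, -0.5))  # evicts the first model
prediction = ensemble.predict({"camera": 1.0, "lens": 1.0})
```

Replacing the oldest classifier as new data chunks arrive is one simple way to follow transient user interests without retraining a single large model.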
7. Limitations
- Time cost is heavy for online computing
- ensemble prediction
- time cost: A (advertisers) × W (ensemble size) × N (support vectors) × T (features)
Example 2: We collect 2 million behavior events (W = 10) in 1 minute, and prediction takes 53 minutes.
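To make the A, W, N, T cost concrete, the sketch below multiplies the four factors for one incoming event and restates the slowdown in Example 2. Only W = 10 and the 1-minute / 53-minute figures come from the slide; the values chosen for A, N and T are placeholders, and `per_event_cost` is a hypothetical helper.

```python
def per_event_cost(advertisers: int, ensemble_size: int,
                   support_vectors: int, features: int) -> int:
    """Multiply-add count for scoring one event against every SVM: A * W * N * T."""
    return advertisers * ensemble_size * support_vectors * features


# W = 10 is from Example 2; A, N and T are placeholder values.
cost = per_event_cost(advertisers=100, ensemble_size=10,
                      support_vectors=500, features=50)

# Example 2: one minute of collected events takes 53 minutes to score,
# so prediction runs 53 times slower than the arrival rate.
collect_minutes = 1
predict_minutes = 53
slowdown = predict_minutes / collect_minutes
```

Because the cost is a product, reducing any single factor (for instance, touching only the support vectors that share a feature with the input) shrinks the total proportionally.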
8. Solutions
- Construct an index structure for the ensemble SVM.
- Why does the index work?
- Trade space for time.
- Shared features among multiple support vectors
- The sparse structure of support vectors
[Figure: mapping from the ensemble SVM to text retrieval: features map to text terms, a support vector maps to a document, and the ensemble SVM maps to a document set]
P. Zhang et al., knowledge index for online data streams (KDD 2011, ICDM 2011)
9. The index structure
- The SVM-index structure
- Example 3: based on Example 1, consider an SVM with 3 support vectors
[Figure: the SVM-index, made up of an inverted hashing table, the support vectors, and ensemble information]
Time complexity: O(T)
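The idea of an inverted hashing table over support vectors can be sketched as below, assuming a linear kernel and sparse support vectors. The three support vectors, their coefficients, the bias, and the feature names are invented for illustration; the authors' actual layout of the table and of the ensemble information is not given on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]

# Hypothetical linear SVM with 3 sparse support vectors.
# Each entry is (alpha_i * y_i, support vector); values are illustrative only.
SUPPORT_VECTORS: List[Tuple[float, SparseVector]] = [
    (0.5, {"camera": 1.0, "lens": 1.0}),
    (0.5, {"camera": 1.0, "photo": 1.0}),
    (-1.0, {"sports": 1.0}),
]
BIAS = -0.25


def build_index(svs: List[Tuple[float, SparseVector]]):
    """Map each feature to postings of (support vector id, feature value)."""
    index = defaultdict(list)
    for sv_id, (_, vector) in enumerate(svs):
        for feature, value in vector.items():
            index[feature].append((sv_id, value))
    return index


def predict(index, svs, bias: float, x: SparseVector) -> int:
    """Score x by visiting only the postings of features that x contains."""
    dots = defaultdict(float)  # partial dot products <sv_i, x>
    for feature, x_value in x.items():
        for sv_id, sv_value in index.get(feature, []):
            dots[sv_id] += x_value * sv_value
    score = bias + sum(svs[sv_id][0] * dot for sv_id, dot in dots.items())
    return 1 if score >= 0 else -1


INDEX = build_index(SUPPORT_VECTORS)
label = predict(INDEX, SUPPORT_VECTORS, BIAS, {"camera": 1.0, "lens": 1.0})
```

The shared feature `camera` is stored once as a hash key with two postings, so support vectors that share no feature with the input are never touched. This sketch still walks each posting list, so it illustrates the lookup pattern rather than reproducing the O(T) bound stated on the slide.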
10. The index structure
- Operations
- Search: predict the label of each incoming user data record x.
- Step 1: search for support vectors in the left inverted index.
- Step 2: calculate x's class label.
- Insert: integrate new classifiers into the ensemble.
- Delete: drop outdated classifiers from the ensemble.
- Memory
See our source code.
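The three operations can be sketched on a toy index as follows. This is an illustration under assumptions (linear kernels, unweighted majority vote, postings keyed by classifier and support vector id), not the authors' implementation; the class name `EnsembleIndex` and the classifier tuple format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]
# Hypothetical classifier format: (bias, [(alpha_i * y_i, support vector), ...])
Classifier = Tuple[float, List[Tuple[float, SparseVector]]]
PostingKey = Tuple[str, int]  # (classifier id, support vector id)


class EnsembleIndex:
    """Inverted index over the support vectors of several linear SVMs."""

    def __init__(self) -> None:
        # feature -> {(classifier id, support vector id): feature value}
        self.postings: Dict[str, Dict[PostingKey, float]] = defaultdict(dict)
        self.classifiers: Dict[str, Classifier] = {}

    def insert(self, clf_id: str, classifier: Classifier) -> None:
        """Integrate a new classifier by posting each of its support vectors."""
        self.classifiers[clf_id] = classifier
        for sv_id, (_, vector) in enumerate(classifier[1]):
            for feature, value in vector.items():
                self.postings[feature][(clf_id, sv_id)] = value

    def delete(self, clf_id: str) -> None:
        """Drop an outdated classifier together with all of its postings."""
        _, svs = self.classifiers.pop(clf_id)
        for sv_id, (_, vector) in enumerate(svs):
            for feature in vector:
                del self.postings[feature][(clf_id, sv_id)]
                if not self.postings[feature]:
                    del self.postings[feature]

    def search(self, x: SparseVector) -> int:
        """Step 1: collect matching support vectors. Step 2: vote on the label."""
        dots: Dict[PostingKey, float] = defaultdict(float)
        for feature, x_value in x.items():
            for key, sv_value in self.postings.get(feature, {}).items():
                dots[key] += x_value * sv_value
        votes = 0
        for clf_id, (bias, svs) in self.classifiers.items():
            score = bias + sum(coef * dots.get((clf_id, i), 0.0)
                               for i, (coef, _) in enumerate(svs))
            votes += 1 if score >= 0 else -1
        return 1 if votes >= 0 else -1  # ties go to +1 in this sketch


index = EnsembleIndex()
index.insert("t1", (-0.5, [(1.0, {"sports": 1.0})]))
index.insert("t2", (-0.5, [(1.0, {"sports": 1.0, "news": 1.0})]))
index.insert("t3", (-0.5, [(1.0, {"camera": 1.0}), (-1.0, {"sports": 1.0})]))
before = index.search({"sports": 1.0})
index.delete("t1")
index.delete("t2")
after = index.search({"sports": 1.0})
```

In this toy run the two older classifiers outvote the newest one until they are deleted, after which the prediction for the same input flips.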
11. Experiments
- Data sets
- Search engine data
- Comparisons
- Poisson
- E-SVM
- E-Index (our method)
12. Comparisons
E-Index has sub-linear prediction time
E-SVM consumes more memory
13. Comparisons
Ensemble models are more accurate than the Poisson regression model
14. Comparisons
The index method can significantly improve the
efficiency, especially when the ensemble size is
large.
15. Related Work
- Behavior targeting
- Regression models vs. classification models
- Stream indexing
- Boolean expression indexing in publish/subscribe systems
- Ensemble models
- Concept drifting
16. Conclusions
- Contributions
- Identify and address the prediction efficiency problem of ensemble models for behavior targeting.
- Convert the ensemble SVM model to a document set, and propose a new type of inverted text index structure to achieve sub-linear prediction time.
- Future work
- Index more complicated SVM models with non-linear kernels.
17. Questions?
- For source code, visit our website
- streamming.org/homepages/lijun.html