Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo - PowerPoint PPT Presentation


1
Efficient Behavior Targeting Using SVM Ensemble Indexing
  • Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
  • Chinese Academy of Sciences
  • State Grid Energy Institute, China

2
Behavior targeting
  • Behavior Targeting (BT) uses users' historical
    behavior data to select the most relevant ads for
    display.
  • Example from Yahoo! Research

[Figure: user behavior data, ads, targeted users]
3
Regression for BT
  • Poisson Regression model (Ye Chen, eBay, 2009).
  • x: ad clicks and views, page views, search
    queries and clicks.
  • y: click-through rate (CTR).

[Figure: for each ad category, view data and click data are each modeled with a Poisson distribution, with one Poisson regression on views and another on clicks]
Ye Chen et al., Large-scale behavioral targeting
(KDD'09 best paper award)
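The regression view above can be sketched as follows: a minimal example, assuming a log-link Poisson regression fit by plain batch gradient ascent (not necessarily the solver used by Chen et al.), one model for view counts and one for click counts, and CTR taken as the ratio of expected clicks to expected views, as the figure suggests. The function names and the toy data are hypothetical.

```python
import math
from typing import List

def fit_poisson(X: List[List[float]], y: List[float],
                lr: float = 0.01, iters: int = 20000) -> List[float]:
    """Fit a log-link Poisson regression, E[y | x] = exp(w . x), by batch gradient ascent.

    X: feature vectors whose first entry is 1.0 (intercept).
    y: non-negative event counts (views or clicks) per row.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            rate = math.exp(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                # gradient of the log-likelihood y*(w.x) - exp(w.x) with respect to w_j
                grad[j] += (yi - rate) * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_rate(w: List[float], x: List[float]) -> float:
    """Expected count exp(w . x) under the fitted model."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))

def predict_ctr(w_click: List[float], w_view: List[float], x: List[float]) -> float:
    """CTR estimate: expected clicks divided by expected views."""
    return predict_rate(w_click, x) / predict_rate(w_view, x)

# Toy data (made up for illustration): one behavior feature plus an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
views = [math.exp(2.0 + 0.5 * x[1]) for x in X]    # expected views per row
clicks = [math.exp(-1.0 + 0.8 * x[1]) for x in X]  # expected clicks per row
w_view = fit_poisson(X, views)
w_click = fit_poisson(X, clicks)
print(w_view, w_click)                           # close to [2.0, 0.5] and [-1.0, 0.8]
print(predict_ctr(w_click, w_view, [1.0, 1.0]))  # close to exp(-2.7)
```

Even this tiny sketch exposes a step size and an iteration budget that must be chosen by hand, which is one small instance of the tuning burden listed next.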
4
Limitations
  • Parameter tuning is very difficult.
  • The Poisson assumption does not always hold for
    real-world behavior data.
  • Clicks are typically several orders of magnitude
    fewer than views.
  • User interests are not always fixed; they are often
    transient.

5
Classification for BT
  • SVM for classification
  • Example 1: three users on Nikon's (www.nikon.com) ad

[Figure: view data and click data for an ad category; SVM classification with view-and-click data labeled positive (+) and view-but-no-click data labeled negative (-)]
Addresses Challenges 1, 2, 3
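The labeling rule in the figure can be sketched as below: a minimal example, assuming each user's behavior is already summarized as a sparse feature dictionary. The feature names, user ids, and the function `label_users` are hypothetical; the slides only fix the rule itself (view and click is +, view but no click is -).

```python
from typing import Dict, List, Set, Tuple

Features = Dict[str, float]  # sparse behavior vector: feature name -> value

def label_users(views: Dict[str, Features], clicked: Set[str]) -> List[Tuple[Features, int]]:
    """Turn one ad's logs into SVM training examples.

    views:   user id -> behavior feature vector, for every user who viewed the ad.
    clicked: ids of users who also clicked the ad.
    Viewed and clicked -> +1; viewed but not clicked -> -1.
    Users who never viewed the ad produce no example.
    """
    examples = []
    for user_id in sorted(views):  # sorted only to make the output order deterministic
        label = 1 if user_id in clicked else -1
        examples.append((views[user_id], label))
    return examples

# Three hypothetical users who saw the same camera ad (cf. Example 1).
views = {
    "u1": {"query:camera": 2.0, "page:photography": 1.0},
    "u2": {"query:lens": 1.0},
    "u3": {"page:shoes": 3.0},
}
examples = label_users(views, clicked={"u1", "u2"})
print([label for _, label in examples])  # [1, 1, -1]
```

The resulting (features, label) pairs can be fed to any binary SVM trainer; no distributional assumption about counts is needed.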
6
Classification for BT
  • Ensemble SVM on data streams
  • Merits
  • No complicated parameters
  • No statistical assumptions
  • Dynamic model on data streams

Addresses Challenge 4
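A stream ensemble of this kind can be sketched as a sliding window of per-chunk classifiers: a minimal example, assuming each chunk of the stream yields one linear classifier, the ensemble keeps the W most recent ones, and predictions average the members' decision values. The averaging rule, the class `StreamEnsemble`, and the helper `linear` are assumptions for illustration; the slides do not specify the combination rule.

```python
from collections import deque
from typing import Callable, Deque, Dict

Features = Dict[str, float]
Classifier = Callable[[Features], float]  # returns a real-valued decision score

def linear(weights: Features, bias: float = 0.0) -> Classifier:
    """Hypothetical helper: wrap a sparse linear model as a scoring function."""
    return lambda x: sum(weights.get(f, 0.0) * v for f, v in x.items()) + bias

class StreamEnsemble:
    """Keep the W most recent chunk classifiers and average their scores."""

    def __init__(self, window: int) -> None:
        # deque(maxlen=W) discards the oldest classifier when a new one is appended
        self.members: Deque[Classifier] = deque(maxlen=window)

    def add(self, clf: Classifier) -> None:
        self.members.append(clf)

    def score(self, x: Features) -> float:
        if not self.members:
            raise ValueError("ensemble is empty")
        return sum(clf(x) for clf in self.members) / len(self.members)

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0 else -1

ens = StreamEnsemble(window=2)
ens.add(linear({"query:camera": 1.0}, -0.5))  # chunk 1 (will be dropped)
ens.add(linear({"query:camera": 0.5}, -0.5))  # chunk 2
ens.add(linear({"query:camera": -1.0}))       # chunk 3: interest has drifted
x = {"query:camera": 1.0}
print(len(ens.members), ens.score(x), ens.predict(x))  # 2 -0.5 -1
```

Dropping the oldest chunk model is what lets the ensemble follow transient user interests.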
7
Limitations
  • The time cost is too high for online computing
  • Ensemble prediction
  • Time cost: A × W × N × T, where A = number of advertisers,
    W = ensemble size, N = number of support vectors,
    T = number of features

Example 2: We collect 2 million behavior events
(W = 10) in 1 minute, and computing the predictions takes
53 minutes.
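The A × W × N × T cost can be made concrete with a brute-force predictor: a minimal sketch, assuming linear-kernel SVMs (non-linear kernels are listed as future work) whose decision value is the sum over support vectors of alpha_i * y_i * <s_i, x> plus a bias, averaged over the ensemble. The data layout and the function `naive_predict` are hypothetical; the point is the multiplication counter.

```python
from typing import List, Tuple

# One linear-kernel SVM: (list of (alpha_i * y_i, dense support vector s_i), bias b)
SVM = Tuple[List[Tuple[float, List[float]]], float]

def naive_predict(ensembles: List[List[SVM]], x: List[float]) -> Tuple[List[float], int]:
    """Score one user x against every advertiser's ensemble by brute force.

    ensembles[a] holds the W SVMs of advertiser a.
    Returns the averaged decision value per advertiser and the number of
    feature multiplications performed, which is A * W * N * T.
    """
    mults = 0
    scores = []
    for ensemble in ensembles:            # A advertisers
        total = 0.0
        for svs, bias in ensemble:        # W classifiers
            f = bias
            for coef, sv in svs:          # N support vectors
                dot = 0.0
                for s, xv in zip(sv, x):  # T features
                    dot += s * xv
                    mults += 1
                f += coef * dot
            total += f
        scores.append(total / len(ensemble))
    return scores, mults

# A = 2 advertisers, W = 3 classifiers, N = 4 support vectors, T = 5 features.
svm: SVM = ([(1.0, [0.0] * 5)] * 4, 0.0)
ensembles = [[svm] * 3] * 2
scores, mults = naive_predict(ensembles, [1.0] * 5)
print(mults)  # 120 = 2 * 3 * 4 * 5
```

This cost is paid for every incoming behavior event, which is why it dominates at stream rates.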
8
Solutions
  • Construct an index structure for the ensemble SVM.
  • Why does the index work?
  • Trade space for time.
  • Shared features among multiple support vectors
  • The sparse structure of support vectors

[Figure: mapping from text retrieval to ensemble SVM: text terms map to features, a document maps to a support vector, and a document set maps to an ensemble SVM]
P. Zhang et al., knowledge index for online data
streams (KDD 2011, ICDM 2011)
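The document analogy can be sketched by treating each support vector as a sparse "document" whose "terms" are features, and building an ordinary inverted index over them: a minimal example in which the feature names and the function `build_postings` are hypothetical. The index stores one entry per non-zero (the space traded for time), and a shared feature such as `query:camera` collects several support vectors in one posting list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # feature name -> value; only non-zeros are stored

def build_postings(support_vectors: Dict[str, SparseVec]) -> Dict[str, List[Tuple[str, float]]]:
    """Inverted index over support vectors: feature -> [(support-vector id, value), ...]."""
    postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for sv_id in sorted(support_vectors):
        for feature, value in support_vectors[sv_id].items():
            if value != 0.0:
                postings[feature].append((sv_id, value))
    return dict(postings)

svs = {
    "sv1": {"query:camera": 1.0, "page:photography": 0.5},
    "sv2": {"query:camera": 2.0},
    "sv3": {"page:shoes": 1.0},
}
postings = build_postings(svs)
print(postings["query:camera"])  # [('sv1', 1.0), ('sv2', 2.0)]

# A user who only issued camera queries touches one posting list (2 entries)
# instead of scanning all 3 support vectors.
x = {"query:camera": 1.0}
touched = sum(len(postings.get(f, [])) for f in x)
print(touched)  # 2
```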
9
The index structure
  • The SVM-index structure
  • Example 3: based on Example 1, consider an SVM
    with 3 support vectors

[Figure: SVM-index with three parts: inverted hashing table, support vectors, ensemble information]
Time complexity: O(T)
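One way to see why an inverted index can reach the O(T) bound is to swap the order of summation in the ensemble's decision function. This derivation assumes a linear kernel and an ensemble that averages decision values; both are assumptions, since the slides state neither. Here $S_k$ is the support-vector set of classifier $k$, $\alpha_i y_i$ the coefficient of support vector $s_i$, $b_k$ the bias, and $P(t)$ the posting list of feature $t$:

```latex
\begin{aligned}
F(x) &= \frac{1}{W}\sum_{k=1}^{W}\Big(\sum_{i\in S_k}\alpha_i y_i\,\langle s_i, x\rangle + b_k\Big) \\
     &= \frac{1}{W}\Big(\sum_{t:\,x_t\neq 0} x_t \sum_{i\in P(t)} \alpha_i y_i\, s_{i,t} \;+\; \sum_{k=1}^{W} b_k\Big),
\qquad P(t) = \Big\{\, i \in \textstyle\bigcup_k S_k : s_{i,t} \neq 0 \,\Big\}.
\end{aligned}
```

Only features present in $x$ contribute, and if each inner sum over $P(t)$ is accumulated in the hash table when classifiers are inserted, scoring costs one lookup per non-zero feature of $x$. If the ensemble instead votes, each classifier needs its own accumulator, so the inner sums cannot be merged across classifiers.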
10
The index structure
  • Operations
  • Search: predict the label of each incoming user
    data x
  • Step 1: search the support vectors in the inverted
    indexes (left)
  • Step 2: calculate x's class label
  • Insert: integrate new classifiers into the ensemble
  • Delete: drop outdated classifiers from the ensemble
  • Memory

See our source code.
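The three operations can be sketched as a small class: a minimal example, not the authors' released code, assuming linear-kernel SVMs, an inverted hash table from feature to (classifier id, alpha_i * y_i * s_{i,t}) entries, a bias table as the "ensemble information", and averaging of the per-classifier decision values. The class `SVMIndex` and its method signatures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]

class SVMIndex:
    """Inverted index over the support vectors of an SVM ensemble (linear kernel)."""

    def __init__(self) -> None:
        # feature -> list of (classifier id, alpha_i * y_i * s_{i,t}) entries
        self.postings: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
        # classifier id -> bias b_k (the ensemble information)
        self.biases: Dict[int, float] = {}

    def insert(self, clf_id: int, support_vectors: List[Tuple[float, SparseVec]], bias: float) -> None:
        """Integrate a new classifier; support_vectors holds (alpha_i * y_i, s_i) pairs."""
        self.biases[clf_id] = bias
        for coef, sv in support_vectors:
            for feature, value in sv.items():
                self.postings[feature].append((clf_id, coef * value))

    def delete(self, clf_id: int) -> None:
        """Drop an outdated classifier and all of its posting entries (full scan here)."""
        self.biases.pop(clf_id, None)
        for feature in list(self.postings):
            kept = [(c, w) for c, w in self.postings[feature] if c != clf_id]
            if kept:
                self.postings[feature] = kept
            else:
                del self.postings[feature]

    def search(self, x: SparseVec) -> int:
        """Step 1: visit the postings of x's features; step 2: combine into a label."""
        if not self.biases:
            raise ValueError("index is empty")
        scores = dict(self.biases)  # every classifier starts at its bias
        for feature, xv in x.items():
            for clf_id, weight in self.postings.get(feature, []):
                scores[clf_id] += weight * xv
        avg = sum(scores.values()) / len(scores)
        return 1 if avg >= 0 else -1

index = SVMIndex()
index.insert(1, [(1.0, {"query:camera": 1.0}), (-1.0, {"page:shoes": 1.0})], bias=0.0)
index.insert(2, [(0.5, {"query:camera": 2.0})], bias=-0.5)
print(index.search({"query:camera": 1.0}))  # 1
index.delete(1)                             # classifier 1 falls out of the window
print(index.search({"page:shoes": 1.0}))    # -1
```

The full scan in `delete` keeps the sketch short; keeping a per-classifier list of posting positions would make deletion proportional to that classifier's size.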
11
Experiments
  • Data sets
  • Search engine data
  • Comparisons
  • Poisson
  • E-SVM
  • E-Index (our method)

12
Comparisons
  • Observations

E-Index has sub-linear prediction time
E-SVM consumes more memory
13
Comparisons
Ensemble models are more accurate than Poisson
regression model
14
Comparisons
The index method can significantly improve the
efficiency, especially when the ensemble size is
large.
15
Related Work
  • Behavior targeting
  • Regression models vs. classification models
  • Stream indexing
  • Boolean expression indexing in publish/subscribe
    systems
  • Ensemble models
  • Concept drifting

16
Conclusions
  • Contributions
  • Identify and address the prediction efficiency
    problem of ensemble models for behavior
    targeting.
  • Convert the ensemble SVM model into a document set, and
    propose a new type of inverted text index structure
    to achieve sub-linear prediction time.
  • Future work
  • Index more complicated SVM models with non-linear
    kernels.

17
Questions?
  • For source code, visit our website
  • streamming.org/homepages/lijun.html