Title: Estimating Rates of Rare Events at Multiple Resolutions
1. Estimating Rates of Rare Events at Multiple Resolutions
- Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian
2. Estimation in the tail
- Contextual Advertising
- Show an ad on a webpage (impression)
- Revenue is generated if a user clicks
- Problem: Estimate the click-through rate (CTR) of an ad on a page
- Most (ad, page) pairs have very few impressions, if any,
- and even fewer clicks
- Severe data sparsity
3. Estimation in the tail
- Use an existing, well-understood hierarchy
- Categorize ads and webpages to leaves of the hierarchy
- CTR estimates of siblings are correlated
- The hierarchy allows us to aggregate data
- Coarser resolutions
- provide reliable estimates for rare events
- which then influence estimation at finer resolutions
4. System overview
- Retrospective data: URL, ad, isClicked
- Crawl URLs (a sample of URLs)
- Classify pages and ads
- Rare event estimation using hierarchy
- Impute impressions, fix sampling bias
5. Sampling of webpages
- Naïve strategy: sample at random from the set of URLs
- Sampling errors in impression volume AND click volume
- Instead, we propose
- Crawling all URLs with at least one click, and
- a sample of the remaining URLs
- Variability is only in impression volume
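6. Imputation of impression volume
- Impressions in cell ij: $n_{ij} + m_{ij} + x_{ij}$
- Row constraint: each row sums to $\sum n_{ij} + K \cdot \sum m_{ij}$ (sums taken over the cells in that row)
- Column constraint: each column sums to the number of impressions on ads of this ad class
- The whole table sums to the total impressions (known)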
7. Imputation of impression volume
- Region = (page node, ad node)
- Region Hierarchy
- A cross-product of the page hierarchy and the ad hierarchy
[Figure: page hierarchy and ad hierarchy with levels 0 through i; a region pairs a page class with an ad class]
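The cross-product structure can be illustrated with a small sketch. It assumes each page and each ad is labeled with its path from the root of its hierarchy, and that a region at level i pairs the depth-i ancestors of the page and the ad. The class names and the helpers `region_at_level` and `aggregate` are hypothetical, not taken from the talk.

```python
from collections import Counter


def region_at_level(page_path, ad_path, level):
    """Return the (page node, ad node) region at the given level.

    Each path is a tuple of class names from the root down to a leaf,
    so the first `level` entries identify the ancestor at that depth.
    """
    return (page_path[:level], ad_path[:level])


def aggregate(leaf_counts, level):
    """Sum leaf-level counts into the coarser regions at `level`."""
    totals = Counter()
    for (page_path, ad_path), count in leaf_counts.items():
        totals[region_at_level(page_path, ad_path, level)] += count
    return totals


# Hypothetical impression counts keyed by (page leaf path, ad leaf path).
leaf_impressions = {
    (("sports", "golf"), ("retail", "shoes")): 5,
    (("sports", "tennis"), ("retail", "shoes")): 3,
    (("news", "local"), ("auto", "suv")): 2,
}

for level in (0, 1, 2):
    print(level, dict(aggregate(leaf_impressions, level)))
```

Level 0 collapses everything into a single root region, and each deeper level splits the same total into finer regions.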
8. Imputation of impression volume
- Block constraint: the regions of a block at level i+1 sum to their parent region at level i
9. Imputing $x_{ij}$
- Iterative Proportional Fitting [Darroch/1972]
- Initialize $x_{ij} = n_{ij} + m_{ij}$
- Iteratively scale $x_{ij}$ values to match row/col/block constraints
- Ordering of constraints: top-down, then bottom-up, and repeat
[Figure: a block at level i+1 under its parent at level i; axes are page classes and ad classes]
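As a rough illustration of the scaling step, here is a minimal two-dimensional iterative proportional fitting sketch. It handles only row and column totals on a single level; the block constraints and the top-down/bottom-up ordering across levels are left out. The function name `ipf` and all numbers are illustrative assumptions.

```python
def ipf(table, row_targets, col_targets, iterations=100):
    """Alternately rescale rows and columns toward the target totals.

    `table` is a list of rows of positive starting values; the targets
    must share the same grand total for the procedure to converge.
    """
    x = [row[:] for row in table]  # work on a copy of the input
    for _ in range(iterations):
        # Scale each row so that it sums to its row target.
        for i, row in enumerate(x):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                x[i] = [value * factor for value in row]
        # Scale each column so that it sums to its column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in x)
            if total > 0:
                factor = target / total
                for row in x:
                    row[j] *= factor
    return x


# Hypothetical starting values and known marginal totals.
fitted = ipf([[1.0, 2.0], [3.0, 4.0]], [30.0, 70.0], [40.0, 60.0])
for row in fitted:
    print([round(value, 3) for value in row])
```

Each pass preserves the ratios within the row or column being scaled, so the starting values shape how the known totals are distributed across cells.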
10. Imputation Summary
- Given
- $n_{ij}$ (impressions in clicked pool)
- $m_{ij}$ (impressions in sampled non-clicked pool)
- number of impressions on ads of each ad class in the ad hierarchy
- We get
- Estimated impression volume $\tilde{N}_{ij} = n_{ij} + m_{ij} + x_{ij}$ in each region ij of every level
11. System overview
- Retrospective data: page, ad, isclicked
- Crawl pages (a sample of pages)
- Classify pages and ads
- Rare event estimation using hierarchy
- Impute impressions, fix sampling bias
12. Rare rate modeling
- Freeman-Tukey transform
- $y_{ij}$ = F-T(clicks and impressions at ij), a transformed CTR
- Variance-stabilizing transformation: Var(y) is independent of E[y] → needed in further modeling
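A minimal sketch of one common form of the Freeman-Tukey transform for rates, y = (sqrt(c/N) + sqrt((c+1)/N)) / 2 for c clicks in N impressions. The slide does not spell out the formula, so this exact form is an assumption, as is the function name `freeman_tukey`.

```python
import math


def freeman_tukey(clicks, impressions):
    """Freeman-Tukey style transform of a click-through rate.

    Assumed form: the average of sqrt(c / N) and sqrt((c + 1) / N).
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if clicks < 0:
        raise ValueError("clicks must be non-negative")
    low = math.sqrt(clicks / impressions)
    high = math.sqrt((clicks + 1) / impressions)
    return 0.5 * (low + high)


# Two regions with zero clicks but different impression volumes
# receive different transformed values.
print(freeman_tukey(0, 100))
print(freeman_tukey(0, 10000))
```

Under this form a zero-click region with more impressions gets a smaller transformed value, so the raw CTR of zero is no longer the same for every such region.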
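13. Rare rate modeling
- Generative Model (Tree-structured Markov Model)
[Figure: tree-structured Markov model linking region ij to parent(ij). Labels: unobserved states $S_{ij}$ and $S_{parent(ij)}$; variances $W_{ij}$ and $W_{parent(ij)}$; covariates $\beta_{ij}$ and $\beta_{parent(ij)}$; variances $V_{ij}$ and $V_{parent(ij)}$; observations $y_{ij}$ and $y_{parent(ij)}$]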
14. Rare rate modeling
- Model fitting with a 2-pass Kalman filter
- Filtering: leaf to root
- Smoothing: root to leaf
- Linear in the number of regions
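The two passes can be sketched as follows. This is a simplified stand-in, not the authors' recursions: it uses scalar Gaussian message passing in information form on a tree, assumes each state equals its parent's state plus noise of a single shared variance `W`, assumes each observation equals the state plus noise of variance `V[r]`, and leaves out the covariates β. The function name `two_pass_smooth` and the prior parameters `mu0` and `P0` are hypothetical.

```python
def two_pass_smooth(parent, y, V, W, mu0=0.0, P0=1.0):
    """Posterior mean and variance of every node state on a tree.

    parent[r] is the parent index of node r (None for the root, node 0);
    every parent index must be smaller than the indices of its children.
    y[r] is the observation at node r, or None if unobserved.
    """
    n = len(parent)
    # Information form: J is a precision, h is precision times mean.
    J = [0.0] * n
    h = [0.0] * n
    for r in range(n):
        if y[r] is not None:
            J[r] = 1.0 / V[r]
            h[r] = y[r] / V[r]

    # Pass 1 (filtering): leaves to root. Each node sends its parent a
    # summary of all the data in its subtree.
    msg_J = [0.0] * n
    msg_h = [0.0] * n
    for r in range(n - 1, 0, -1):
        scale = 1.0 + W * J[r]
        msg_J[r] = J[r] / scale
        msg_h[r] = h[r] / scale
        J[parent[r]] += msg_J[r]
        h[parent[r]] += msg_h[r]

    # Pass 2 (smoothing): root to leaves. Each node combines its own
    # subtree summary with what the rest of the tree says via its parent.
    post_J = [0.0] * n
    post_h = [0.0] * n
    post_J[0] = J[0] + 1.0 / P0
    post_h[0] = h[0] + mu0 / P0
    for r in range(1, n):
        p = parent[r]
        # Remove this node's own message so its data is not counted twice.
        cav_J = post_J[p] - msg_J[r]
        cav_h = post_h[p] - msg_h[r]
        scale = 1.0 + W * cav_J
        post_J[r] = J[r] + cav_J / scale
        post_h[r] = h[r] + cav_h / scale

    means = [post_h[r] / post_J[r] for r in range(n)]
    variances = [1.0 / post_J[r] for r in range(n)]
    return means, variances


# Root with two children; only the first child has an observation.
means, variances = two_pass_smooth(
    parent=[None, 0, 0], y=[None, 0.1, None], V=[1.0, 1.0, 1.0], W=1.0
)
print(means)
print(variances)
```

Each node is visited once per pass, so the cost grows linearly with the number of nodes. In the example the unobserved child inherits its parent's posterior mean, which mirrors how sparse regions borrow from the rest of the tree.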
15. Experiments
- 503M impressions
- 7-level hierarchy, of which the top 3 levels were used
- Zero clicks in
- 76 regions in level 2
- 95 regions in level 3
- Full dataset DFULL, and a 2/3 sample DSAMPLE
16. Experiments
- Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE
- Some of these regions, $R_{>0}$, get clicks in DFULL
- A good model should predict higher CTRs for $R_{>0}$ than for the other regions in R
17. Experiments
- We compared 4 models
- TS: our tree-structured model
- LM (level-mean): each level smoothed independently
- NS (no smoothing): CTR proportional to $1/\tilde{N}$
- Random: Assuming $R_{>0}$ is given, randomly predict the membership of $R_{>0}$ out of R
18. Experiments
[Figure: results plot with curves labeled TS, Random, and LM, NS]
19. Experiments
- Few impressions → estimates depend more on siblings
- Enough impressions → little borrowing from siblings
20. Related Work
- Multi-resolution modeling
- studied in time series modeling and spatial statistics [Openshaw/79, Cressie/90, Chou/94]
- Imputation
- studied in statistics [Darroch/1972]
- Application of such models to estimation of such rare events (rates of $10^{-3}$) is novel
21. Conclusions
- We presented a method to estimate
- rates of extremely rare events
- at multiple resolutions
- under severe sparsity constraints
- Our method has two parts
- Imputation → incorporates hierarchy, fixes sampling bias
- Tree-structured generative model → extremely fast parameter fitting
22. Rare rate modeling
- Freeman-Tukey transform
- Distinguishes between regions with zero clicks based on the number of impressions
- Variance-stabilizing transformation: Var(y) is independent of E[y] → needed in further modeling
[Formula labels: clicks in region r; impressions in region r]
23. Rare rate modeling
- Generative Model
- $S_{ij}$ values can be quickly estimated using a Kalman filtering algorithm
- Kalman filter requires knowledge of β, V, and W
- EM wrapped around the Kalman filter
[Figure: filtering and smoothing passes]
24. Rare rate modeling
- Fitting using a Kalman filtering algorithm
- Filtering: recursively aggregate data from leaves to root
- Smoothing: propagate information from root to leaves
- Complexity: linear in the number of regions, for both time and space
[Figure: filtering and smoothing passes]
25. Rare rate modeling
- Fitting using a Kalman filtering algorithm
- Filtering: recursively aggregate data from leaves to root
- Smoothing: propagate information from root to leaves
- Kalman filter requires knowledge of β, V, and W
- EM wrapped around the Kalman filter
[Figure: filtering and smoothing passes]
26. Imputing $x_{ij}$
- Iterative Proportional Fitting [Darroch/1972]
- Initialize $x_{ij} = n_{ij} + m_{ij}$
- Top-down
- Scale all $x_{ij}$ in every block in Z(i+1) to sum to its parent in Z(i)
- Scale all $x_{ij}$ in Z(i+1) to sum to the row totals
- Scale all $x_{ij}$ in Z(i+1) to sum to the column totals
- Repeat for every level Z(i)
- Bottom-up: similar
[Figure: a block in Z(i+1) under its parent in Z(i); axes are page classes and ad classes]