Title: Optimizing Online Yield via Predictive Modeling of Individual Site Visitors
1Optimizing Online Yield via Predictive Modeling
of Individual Site Visitors
David Lapayowker Marissa Quitt Elaine Shaver
(PM) Devin Smith
Magnify360 Liaisons Olivier Chaine, Jim Healy,
Nate Pool, Gilles ?????
HMC Advisor Zachary Dodds
2Magnify360
Designs multiple versions of each client's website,
with each version customized to meet the needs of
a different type of user. Analyzes clickstream data
from site visitors in order to serve the version
that will best suit each one. The result: more
visitors convert than would with a single page.
old Facebook
new Facebook
3System Overview
Diagram: system overview
User Actions: visitor (visitor_at_gmail.com) navigates to a site, receives tailored interactions, results in a "Conversion"
Dataflow: clickstream data
- user data
- pages served
- conversion data
Our system, online classifier: classify user, choose page, serve page
Our system, offline analysis: clustering into user groups (Musician, Pasadena resident, Bioengineer, Insomniac, Pachyphile)
4Problem Statement
Detailed problem statement here
5Clickstream Data
example columns
Database
80 tables
13 GB
110,000,000 rows
Ethics: anonymous data, no purchased data!
6User profiles
A profile is a binary attribute that captures a
specific combination of data values.
Currently 42 of them, hand-specified
from Mag360's site
insomniac
something
something
Tradeoffs
+ captures experienced intuition about what is important
+ takes advantage of Magnify360's site-design expertise
- binary attributes
- may miss patterns not captured by the user profiles
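A minimal sketch of how a profile can be treated as a yes/no test on a visitor's data, and how a visitor then becomes a vector of profile attributes. The field names (`local_hour`, `prior_visits`, `device`), the thresholds, and every profile definition here are hypothetical; the actual 42 hand-specified profiles are not reproduced.

```python
from typing import Callable, Dict, List

Visitor = Dict[str, object]          # one visitor's raw data values
Profile = Callable[[Visitor], bool]  # a profile: true or false for a visitor

# Hypothetical profiles. Field names and thresholds are made up for illustration.
PROFILES: Dict[str, Profile] = {
    "insomniac": lambda v: v["local_hour"] < 5,
    "returning": lambda v: v["prior_visits"] > 0,
    "mobile": lambda v: v["device"] == "phone",
}

def profile_vector(visitor: Visitor) -> List[int]:
    """Evaluate every profile in a fixed order, giving a 0/1 attribute vector."""
    return [int(test(visitor)) for test in PROFILES.values()]

visitor = {"local_hour": 3, "prior_visits": 0, "device": "phone"}
print(profile_vector(visitor))  # [1, 0, 1]
```

Vectors of this kind are the input that a clustering algorithm or classifier would work on.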
7Conversion data
The site yield, or conversion, is client-specified
Amount of transaction(s)
Time spent on (a part of) the site
Contact information
presence and/or time of an email address
3 conversion
table
Goal: to determine those clusters of visitors
who will be best served (convert) via a
particular version of a client site
8Offline analysis: user clustering
one big cluster "best page"
hand-tuned clusters
hierarchical clustering
growing neural gas
decision-tree clustering
fuzzy k-means clustering
support vector machines
Visitors: vectors of profile attributes
12Support vector machine example
Can we get one of the real data pages?
13From clusters to sites
Training data from each cluster determines the
best site
page A: yields 7, 1, 1 over 3 visits, score = (7 + 1 + 1) / 3 = 3.0
page B: yields 7, 8, 3 over 3 visits, score = (7 + 8 + 3) / 3 = 6.0
This cluster of six people responds better to
site B,
14Time-based site choice
Magnify360 wants to adapt quickly to new
preferences
Time-weighted average yields
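Each yield is weighted by 2^-t, where t = age of data
page A score = (2^-3 * 1 + 2^0 * 7 + 2^-4 * 1) / (2^0 + 2^-3 + 2^-4) = 6.05
page B score = (2^-5 * 8 + 2^-4 * 7 + 2^-1 * 3) / (2^-4 + 2^-5 + 2^-1) = 3.68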
but site A has had better recent performance.
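The two scoring rules can be checked with a short sketch. The plain score is the average yield per visit. The time-weighted score weights each yield by 2^-t. The function names are hypothetical, and the (age, yield) pairs are read off the slide's numbers. The unit of the age t is not stated in the source.

```python
def average_yield(yields):
    """Plain score: total yield divided by number of visits."""
    return sum(yields) / len(yields)

def time_weighted_yield(observations):
    """Time-weighted score. observations holds (age, yield) pairs, weight = 2 ** -age."""
    weights = [2.0 ** -age for age, _ in observations]
    total = sum(w * y for w, (_, y) in zip(weights, observations))
    return total / sum(weights)

# (age, yield) pairs for one cluster, taken from the slide.
page_a = [(3, 1), (0, 7), (4, 1)]
page_b = [(5, 8), (4, 7), (1, 3)]

for name, obs in (("A", page_a), ("B", page_b)):
    plain = average_yield([y for _, y in obs])
    weighted = time_weighted_yield(obs)
    print(name, round(plain, 2), round(weighted, 2))
# A 3.0 6.05
# B 6.0 3.68
```

Page B wins on the plain average, but page A's single recent high-yield visit dominates once older data is discounted.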
15Online classification
procedure
Possible results
16Results Packet 8
all on one graph
what about hand-tuned system results?
comments
17A closer look
talk about SVM parameters here?
comments
18Sensitivity to scoring parameters?
David's charts
comments
19Software structure
Diagram
What's done and not done
comments
20Software structure
Diagram
What's done and not done
comments
21Perspective
Concluding comments
Questions?
23Clickstream Data
The Good
We have DATA!
The Bad
Too much?
The Ugly
What is this data!?
80 tables
13 GB
24One of our tables
25ID, anyone?
26Fun Statistics
27Data To do
Understand the purpose of each table / column
Understand relationships between tables
Create a single table (or file) of relevant
information in order to test and evaluate our
clustering algorithms.
(table demodularization, against all design
principles)
28Clustering Algorithms
k-Means: Choose centroids at random, and assign each
point to its nearest centroid so that distances inside
clusters are minimized. Recalculate centroids
and repeat until a steady state is reached.
Fuzzy k-Means: Similar, but every datapoint belongs to
each cluster to some degree, not just in or out.
Hierarchical Clustering: Uses a bottom-up
approach to bring together points and clusters
that are close together.
FuzME's best 10-cluster results (synthetic data)
Bottom line: These clustering algorithms are
simple and effective techniques for categorizing
data, but they cannot exist in a vacuum. We are
investigating other techniques that may be used
in parallel or instead.
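A minimal sketch of the k-means loop described on this slide, in plain Python. This is an illustration under stated assumptions, not the code or tool used in the project. The function name, the fixed random seed, and the toy points are all hypothetical.

```python
import random
from math import dist

def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, assignment), where assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # choose k starting centroids at random
    assignment = None
    for _ in range(iterations):
        # Place each point in the cluster whose centroid is closest.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # steady state reached
            break
        assignment = new_assignment
        # Recalculate each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Fuzzy k-means would replace the hard assignment step with a degree of membership in every cluster.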
29Growing Neural Gas
- A clustering algorithm masquerading as a neural network
- Given a data distribution, dynamically determines nodes or centroids to represent the data
- Dynamic because it adds or deletes nodes as necessary, as well as adapting nodes toward changes in the data.
Representative Nodes
User Profiles
32How it works
Given some input x
- Find the closest node, s, and the next closest, t.
- Update the error of s based on its distance from x.
- Shift s and its neighbors toward x, and increment the age of all those edges.
- If s and t are adjacent, set the age of that edge to 0. Otherwise, create that edge.
- Remove edges that are too old, and decrease the error of all nodes by a small amount.
- Add a node every λ generations, putting it between the node with the largest error and its largest-error neighbor.
- Repeat!
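The steps above can be sketched in plain Python. This is a simplified illustration, not the implementation behind the GUI. The parameter names and default values are assumptions. The error of s is accumulated as squared distance, as in the standard formulation of the algorithm, which may differ from the exact error term used on the slide. The `max_nodes` cap is a hypothetical addition to keep the sketch bounded.

```python
import random
from math import dist

def growing_neural_gas(data, steps=3000, lam=100, max_age=50, eps_w=0.05,
                       eps_n=0.006, alpha=0.5, beta=0.0005, max_nodes=20, seed=0):
    rng = random.Random(seed)
    nodes = {0: list(rng.choice(data)), 1: list(rng.choice(data))}  # id -> position
    error = {0: 0.0, 1: 0.0}                                        # id -> error
    edges = {}                                                      # frozenset({a, b}) -> age
    next_id = 2

    def neighbors(n):
        return [m for e in edges if n in e for m in e if m != n]

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Find the closest node s and the next closest node t.
        s, t = sorted(nodes, key=lambda n: dist(nodes[n], x))[:2]
        # Update the error of s (assumption: squared distance).
        error[s] += dist(nodes[s], x) ** 2
        # Shift s and its neighbors toward x, and age the edges leaving s.
        nodes[s] = [w + eps_w * (xi - w) for w, xi in zip(nodes[s], x)]
        for n in neighbors(s):
            nodes[n] = [w + eps_n * (xi - w) for w, xi in zip(nodes[n], x)]
            edges[frozenset((s, n))] += 1
        # Create the edge s-t, or reset its age to 0 if it already exists.
        edges[frozenset((s, t))] = 0
        # Remove edges that are too old, then any nodes left without edges.
        for e in [e for e, age in edges.items() if age > max_age]:
            del edges[e]
        for n in [n for n in nodes if not neighbors(n)]:
            del nodes[n], error[n]
        # Every lam steps, insert a node between the largest-error node q
        # and q's largest-error neighbor f.
        if step % lam == 0 and len(nodes) < max_nodes:
            q = max(error, key=error.get)
            f = max(neighbors(q), key=error.get)
            nodes[next_id] = [(a + b) / 2 for a, b in zip(nodes[q], nodes[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, next_id))] = 0
            edges[frozenset((f, next_id))] = 0
            error[q] *= alpha  # decrease the error of the parent nodes
            error[f] *= alpha
            error[next_id] = error[q]
            next_id += 1
        # Decrease the error of all nodes by a small amount.
        for n in error:
            error[n] *= 1 - beta
    return nodes, edges

# Toy data: two square blobs of 2-D points.
_rng = random.Random(1)
data = [(_rng.uniform(0, 1), _rng.uniform(0, 1)) for _ in range(100)]
data += [(_rng.uniform(5, 6), _rng.uniform(5, 6)) for _ in range(100)]
nodes, edges = growing_neural_gas(data, steps=2000)
print(len(nodes), "nodes,", len(edges), "edges")
```

Because every update moves a node part of the way toward a data point, or places a new node midway between two existing ones, nodes never leave the region spanned by the data.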
33A Few Parameters
(Making sense of the GUI)
- λ: Controls how frequently new nodes are inserted
- Max Edge Age: Dictates how old an edge can get before it is deleted
- εw: Factor to scale the value of the winning node
- εn: Factor to scale the value of the next nearest node
- α: Scale factor for decreasing the error of parent nodes
- β: Scale factor for decreasing error of all nodes
34 and the difference they make.
λ = 1000
λ = 100
- Larger λ, nodes inserted less often
- Takes longer, but yields more accurate placement of nodes
- Smaller λ, nodes inserted more often
- Leaves straggler nodes that don't accurately match data
35Support Vector Machines
36Clearly planar
37Planar in feature space
38Support Vector Regression (Machine?)
Goal: Minimize error between hyper-plane and data points.
SVM: Maximize cluster separation
SVR: Minimize plane-to-data distance
39Getting the correct page
What do we want from a technique?
CLASSIFICATION: Input: user data. Output: page to serve.
REGRESSION: Input: user data and possible page. Output: predicted success.
Both require multiple SVMs.
40Using Classification via SVMs
DATA -> classifier outputs C, B, C -> Predicted Page C
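One way to read this diagram is that several classifiers each name a page and the most common answer wins. The sketch below assumes that majority-vote reading. The stand-in classifiers are hypothetical placeholders for trained SVMs and simply return the outputs shown on the slide.

```python
from collections import Counter

def predict_page_by_vote(visitor, classifiers):
    """Ask every classifier for a page and return the page with the most votes."""
    votes = Counter(clf(visitor) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained SVMs, returning the slide's outputs.
classifiers = [lambda v: "C", lambda v: "B", lambda v: "C"]
visitor = [1, 0, 1]  # a vector of profile attributes
print(predict_page_by_vote(visitor, classifiers))  # C
```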
41Using Regression via SVRs
DATA -> Page A Predictor: 0.42
DATA -> Page B Predictor: 0.24
DATA -> Page C Predictor: 0.78
Highest predicted success -> Predicted Page C
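A short sketch of the regression route: one predictor per page estimates success for the visitor, and the page with the highest estimate is served. The stand-in predictors are hypothetical placeholders for trained SVRs and simply return the values shown on the slide.

```python
def predict_page_by_score(visitor, predictors):
    """Score the visitor with each page's predictor and pick the best page."""
    scores = {page: predict(visitor) for page, predict in predictors.items()}
    return max(scores, key=scores.get), scores

# Hypothetical stand-ins for trained SVRs, returning the slide's values.
predictors = {
    "A": lambda v: 0.42,
    "B": lambda v: 0.24,
    "C": lambda v: 0.78,
}
visitor = [1, 0, 1]  # a vector of profile attributes
page, scores = predict_page_by_score(visitor, predictors)
print(page)  # C
```

Unlike the classification route, this one also yields a predicted success value for every page, not only the winner.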
47Goal Breakdown
48Short-term Plan
49Plan for Algorithm Comparison
50Plan for Algorithm Comparison
51Plan for Algorithm Comparison
52Schedule and Conclusion
- Friday November 14
- Prototype algorithm comparison method
- Friday November 21
- Initial testing on real data
- Meeting with Magnify360
- Friday December 5
- Initial composition of classification algorithms
- Friday December 12
- Midyear Report
Questions?
53Questions?
54SVM vs SVR
SVM: Maximize Distance
SVR: Minimize Distance
55Data
The Bad, or, The Challenges
Lots of SQL data
56Some Data Tables
80 tables total
57Data Size
58Problem Statement
Officially: Develop an innovative predictive
modeling system to predict shopping cart
abandonment based on profiles, clusters, and
shopping cart contents
Most importantly: GRAB from email! Research and
implement various AI techniques to optimize the
process of matching users with websites
Individualized Online Experiences
59Classifying Users
Unsupervised clustering: points are clustered
without knowledge of the results
Supervised clustering: clusters are built using
prior knowledge of the results
Ethical concerns?
60Recap What Magnify360 Does
Individualize a website for different types of users
Collect data on users from their clickstream, and
give each user the site that will appeal to them most
Appeal to a larger base of users by making the
site more interesting to a larger group
serving both!
old Facebook