Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling

1
Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling
ECML/PKDD Discovery Challenge, October 7, 2005, Porto, Portugal
  • Peter I. Hofgesang (hpi_at_few.vu.nl)
  • Wojtek Kowalczyk (wojtek_at_few.vu.nl)

2
Web server data
  • 7 internet shops (home electronics)
  • 80,000 visitors (IP-addresses) in 25 days
  • 0.5 million sessions
  • 3 million clicks (records in a log file)
  • Example record:
  • 11 1076262912 193.170.198.122
  • eb5cbe50997fcb7f9155c6c194c832a8 /znacka/?c=162&tisk=ano
  • http://www.google.com./search?hl=cs&q=Sennheiser+HD650&btnG=Vyhledat+Googlem&lr=lang_cs
  • Objective: discover interesting patterns!
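
Before any pattern discovery, each log record has to be split into its fields. The sketch below is one way to do that, assuming semicolon-separated fields in the order shop ID, Unix timestamp, IP address, session ID, visited page, referrer. The separator, the field names, the `Click` type and the sample line are all assumptions made for illustration; the slides do not spell out the file layout.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Click(NamedTuple):
    """One log record (field names are illustrative, not from the slides)."""
    shop_id: str
    time: datetime
    ip: str
    session_id: str
    page: str
    referrer: str


def parse_click(line: str, sep: str = ";") -> Click:
    # Split into at most 6 fields so a separator inside the referrer survives.
    shop_id, ts, ip, sid, page, referrer = line.rstrip("\n").split(sep, 5)
    when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    return Click(shop_id, when, ip, sid, page, referrer)


# Hypothetical record in the assumed layout.
sample = ("11;1076262912;193.170.198.122;eb5cbe50997fcb7f9155c6c194c832a8;"
          "/znacka/?c=162;http://www.google.com./search?q=Sennheiser+HD650")
click = parse_click(sample)
print(click.time.date(), click.ip, click.page)
```

Keeping the timestamp as a timezone-aware `datetime` makes the later duration and gap checks less error-prone than working with raw strings.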

3
Data Mining Process
4
Anomalies/Strange things I
  • Multiple IP-addresses per session:
  • 2 IP-addresses: 3,051 sessions
  • 3 IP-addresses: 362 sessions
  • 4 IP-addresses: 113 sessions
  • 22 IP-addresses: 1 session
  • Some sessions involve IPs from different countries
  • A few sessions (12) refer to multiple shops
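
A count like the one above can be obtained by grouping clicks on the original session identifier and counting distinct IP addresses. The following is a minimal sketch with made-up toy data; the function names and the `(session_id, ip)` input shape are assumptions, not the authors' code.

```python
from collections import Counter, defaultdict


def ips_per_session(clicks):
    """Map each session ID to the set of IP addresses seen in it."""
    ips = defaultdict(set)
    for session_id, ip in clicks:
        ips[session_id].add(ip)
    return ips


def ip_count_histogram(clicks):
    """Count how many sessions used 1, 2, 3, ... distinct IP addresses."""
    return Counter(len(addresses) for addresses in ips_per_session(clicks).values())


# Toy data (invented): session s2 is seen from two different addresses.
clicks = [
    ("s1", "10.0.0.1"), ("s1", "10.0.0.1"),
    ("s2", "10.0.0.1"), ("s2", "10.0.0.2"),
    ("s3", "10.0.0.3"),
]
hist = ip_count_histogram(clicks)
for n_ips in sorted(hist):
    print(n_ips, "IP-address(es):", hist[n_ips], "session(s)")
```

The same grouping with shop IDs in place of IP addresses would expose the sessions that refer to multiple shops.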

5
Anomalies/Strange things II
  • Sessions with long duration:
  • 476 sessions longer than 24 hours (up to 18 days)
  • Very intensive sessions:
  • 2,865 sessions with more than 100 visited pages
  • 19 sessions with more than 1,000 visited pages
  • 2 sessions with more than 10,000 visited pages
  • Frequent IP-addresses with short sessions:
  • E.g. 29,320 sessions in less than 20 hours from
    147.229.205.80
  • Parallel sessions:
  • Overlapping sequences of clicks from the same IP
    to the same shop within a short period, under
    multiple session IDs (SIDs). Opening a new window?
    Making a transaction?
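
Long and very intensive sessions can be flagged from per-session click counts and durations. The sketch below assumes clicks arrive as `(session_id, unix_time)` pairs and uses the number of clicks as a stand-in for the number of visited pages; the thresholds mirror the figures quoted above, while the function names and toy data are invented for illustration.

```python
from collections import defaultdict

DAY = 24 * 3600  # seconds


def session_stats(clicks):
    """Return {session_id: (number of clicks, duration in seconds)}."""
    first, last, count = {}, {}, defaultdict(int)
    for sid, t in clicks:
        first[sid] = min(t, first.get(sid, t))
        last[sid] = max(t, last.get(sid, t))
        count[sid] += 1
    return {sid: (count[sid], last[sid] - first[sid]) for sid in count}


def flag_sessions(clicks, max_duration=DAY, max_clicks=100):
    """Return {session_id: [reasons]} for sessions exceeding either threshold."""
    flags = {}
    for sid, (n_clicks, duration) in session_stats(clicks).items():
        reasons = []
        if duration > max_duration:
            reasons.append("long")
        if n_clicks > max_clicks:
            reasons.append("intensive")
        if reasons:
            flags[sid] = reasons
    return flags


# Toy data (invented): "a" spans more than a day, "b" has 101 clicks.
clicks = [("a", 0), ("a", 90_000)]
clicks += [("b", t) for t in range(101)]
clicks += [("c", 0), ("c", 5), ("c", 9)]
flags = flag_sessions(clicks)
print(flags)
```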

6
Anomalies/Strange things III
  • Sequences of short sessions that together form a single session
  • Example: clicks from 62.209.194.163 on 31 Jan 2004 (time, page, referrer)
  • 09:40:09 /dt/?c=13654 http://www.shop5.cz/
  • 09:41:21 /dt/param.php?id=115
  • 09:41:21 /
  • 09:41:37 /ls/?id=20 http://www.shop5.cz/dt/?c=13654
  • 09:41:42 /
  • 09:42:24 /ls/?id=20&view=1,2,3,8&pozice=20 http://www.shop5.cz/ls/
  • 09:42:25 /
  • 09:42:48 /ls/?id=20&view=1,2,3,8 http://www.shop5.cz/ls/?id=20
  • 09:42:48 /
  • 09:42:53 /ls/?id=20&view=1,2,3,8&pozice=40 http://www.shop5.cz/ls/
  • Each click has a different session identifier!

7
Fixing the data
  • A new definition of session:
  • A chronologically ordered sequence of clicks
    from the same IP-address to the same shop with
    no gaps longer than 30 minutes
  • Sessions longer than 50 clicks were ignored (12,000)
  • Number of sessions dropped from 522,410 to 281,153
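
The new session definition can be sketched as follows: sort clicks by shop, IP address and time, start a new session whenever the gap to the previous click exceeds 30 minutes, and discard sessions with more than 50 clicks. The original session identifiers are deliberately not used. The `(shop_id, ip, unix_time, page)` tuple layout and the function name are assumptions for illustration, not the authors' implementation.

```python
from itertools import groupby

GAP = 30 * 60      # maximum allowed gap between consecutive clicks, in seconds
MAX_CLICKS = 50    # sessions longer than this are ignored


def sessionize(clicks, gap=GAP, max_clicks=MAX_CLICKS):
    """Rebuild sessions from (shop_id, ip, unix_time, page) tuples."""
    ordered = sorted(clicks, key=lambda c: (c[0], c[1], c[2]))
    sessions = []
    # One group per (shop, IP) pair, already in chronological order.
    for _, group in groupby(ordered, key=lambda c: (c[0], c[1])):
        current = []
        for click in group:
            if current and click[2] - current[-1][2] > gap:
                sessions.append(current)   # gap too long: close the session
                current = []
            current.append(click)
        sessions.append(current)
    return [s for s in sessions if len(s) <= max_clicks]


# Toy data (invented): the third click comes more than 30 minutes after the second.
clicks = [
    ("5", "1.1.1.1", 0, "/"),
    ("5", "1.1.1.1", 600, "/ls/"),
    ("5", "1.1.1.1", 600 + 1801, "/dt/"),
    ("6", "1.1.1.1", 10, "/"),
]
sessions = sessionize(clicks)
print(len(sessions), [len(s) for s in sessions])
```

A gap of exactly 30 minutes keeps the click in the same session here, which matches "no gaps longer than 30 minutes".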

8
Old and New Sessions
9
Visitor Profiling
  • Motivation: on the internet, each shop is just
    one click away. If a user is not satisfied with
    the service, he/she simply goes to the next shop
    and will likely never return.

10
Visitor Profiling Scheme
  • Clustering of user sessions
  • Analysis/interpretation of the clusters
  • Assign a cluster label to each session
  • Analysis of the profile sequences
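
The last two steps of the scheme can be illustrated with a small sketch: once every session carries a cluster (profile) label, the sessions of each visitor are ordered in time to give a profile sequence, and transitions between consecutive profiles are counted. The `(visitor, start_time, profile)` input shape and the function names are assumptions for illustration; the slides do not describe how the sequences were analysed in detail.

```python
from collections import Counter, defaultdict


def profile_sequences(labelled_sessions):
    """Return {visitor: [profile labels in chronological order]}."""
    by_visitor = defaultdict(list)
    for visitor, start_time, profile in labelled_sessions:
        by_visitor[visitor].append((start_time, profile))
    return {v: [p for _, p in sorted(items)] for v, items in by_visitor.items()}


def transition_counts(sequences):
    """Count transitions between consecutive profiles over all visitors."""
    counts = Counter()
    for seq in sequences.values():
        counts.update(zip(seq, seq[1:]))
    return counts


# Toy data (invented): visitor, session start time, assigned profile.
labelled = [
    ("a", 10, 1), ("a", 5, 2), ("a", 20, 3),
    ("b", 1, 1), ("b", 2, 3),
]
sequences = profile_sequences(labelled)
transitions = transition_counts(sequences)
print(sequences)
print(transitions)
```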

11
Clustering
  • Cadez et al. (2001): predictive profiles from
    historical transaction data
  • Mixture of multinomials
  • Full data likelihood
  • The unknown parameters (the mixture weights and
    the multinomial probabilities of each component)
    are estimated by the expectation-maximization
    (EM) algorithm.
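
To make the clustering step concrete, here is a minimal EM sketch for a mixture of multinomials in plain Python. It assumes each session is summarised as a vector of click counts per page category, and that the likelihood of a session is sum_k w_k * prod_j theta_kj^n_j (ignoring the multinomial coefficient, which does not depend on the parameters). The smoothing constant, the random initialisation and all names are assumptions; this is not the authors' implementation.

```python
import math
import random


def em_multinomial_mixture(counts, k, iters=50, seed=0, smooth=1e-2):
    """Fit k multinomial components to count vectors; return (weights, theta, resp)."""
    rng = random.Random(seed)
    n, j = len(counts), len(counts[0])
    # Start from random soft assignments (responsibilities).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])
    for _ in range(iters):
        # M-step: re-estimate mixture weights and category probabilities.
        weights = [(smooth + sum(resp[i][c] for i in range(n))) / (n + k * smooth)
                   for c in range(k)]
        theta = []
        for c in range(k):
            num = [smooth + sum(resp[i][c] * counts[i][f] for i in range(n))
                   for f in range(j)]
            total = sum(num)
            theta.append([x / total for x in num])
        # E-step: posterior probability of each component, computed in log space.
        for i in range(n):
            logp = [math.log(weights[c])
                    + sum(counts[i][f] * math.log(theta[c][f]) for f in range(j))
                    for c in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp[i] = [v / total for v in p]
    return weights, theta, resp


# Toy data (invented): three sessions dominated by category 0, three by category 2.
counts = [[9, 1, 0], [8, 2, 0], [10, 0, 0], [0, 1, 9], [0, 2, 8], [1, 0, 9]]
weights, theta, resp = em_multinomial_mixture(counts, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

Taking the most probable component for each session, as in `labels`, is one simple way to assign a cluster label to each session.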

12
Interpretation of the clusters
Profile 1: General overview of the products
Profile 2: Focused search
Profile 3: Potential buyers
Profile 4: Parameter-based search
13
The transitions of profiles
14
Tree of user profiles
15
Tree of potential buyers
16
Conclusion
  • We spotted several anomalies, so background
    information about pre-processing and data
    preparation is important
  • Important features were missing (who is a buyer?)
  • Four clear user profiles