Title: Guide to the Clickstream Data
1. Guide to the Clickstream Data
- Petr Berka
- University of Economics, Prague
- berka_at_vse.cz
2. Web Usage Mining Domain
- click-stream: a sequential series of page view requests (a page view is what is displayed in the user's browser at one time),
- server session: the click-stream of page views for a single user at a particular web site,
- user session: the click-stream of page views for a single user across the entire web.
3. The Clickstream Data
- 3 million records (24 days) from the web server log of a www shop
- Each record contains: time, IP address, session ID, page request, referer
- There are hundreds of thousands of sessions, most of them very short; the average length is 16 pages
- Each page request in this www shop has the same structure: page type / content ID (product ID)
- Page types are, for example, dp (detail of product), sb (shopping basket), ct (contact)
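The page request structure described above can be split into its two parts with a short parser. This is only a sketch: the exact URL shape (`/<page type>/` optionally followed by `?id=<content ID>`) is inferred from the example requests, and the name `parse_page_request` is hypothetical.

```python
import re

# Assumed shape: "/<page type>/" optionally followed by "?id=<content ID>".
# The "=" is treated as optional so a request written as "?id124" also parses.
_REQUEST = re.compile(r"^/(?P<type>[^/?]*)/?(?:\?id=?(?P<id>\d+))?$")


def parse_page_request(request):
    """Return (page_type, content_id); either part is None when absent."""
    match = _REQUEST.match(request)
    if match is None:
        return None, None
    page_type = match.group("type") or None
    content_id = match.group("id")
    return page_type, int(content_id) if content_id is not None else None


if __name__ == "__main__":
    for request in ["/dp/?id=124", "/sb/", "/"]:
        print(request, parse_page_request(request))
```

A request for the shop's root page (`/`) carries neither a page type nor a content ID, so both parts come back as `None`.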
4. Example of the Data
unix time   IP address      session ID                        page request  referer
1074589200  193.179.144.2   1993441e8a0a4d7a4407ed9554b64ed1  /dp/?id=124   www.google.cz
1074589201  194.213.35.234  3995b2c0599f1782e2b40582823b1c94  /dp/?id=182
1074589202  194.138.39.56   2fd3213f2edaf82b27562d28a2a747aa  /             www.seznam.cz
1074589233  193.179.144.2   1993441e8a0a4d7a4407ed9554b64ed1  /dp/?id=148   /dp/?id=124
1074589245  193.179.144.2   1993441e8a0a4d7a4407ed9554b64ed1  /sb/          /dp/?id=148
1074589248  194.138.39.56   2fd3213f2edaf82b27562d28a2a747aa  /contacts/    /
1074589290  193.179.144.2   1993441e8a0a4d7a4407ed9554b64ed1  /sb/          /sb/
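To turn such records into server sessions, the rows can be grouped by session ID. The sketch below uses the seven example rows and assumes whitespace-separated fields with an optional referer; the delimiter of the real log file is not stated here, and the names `parse_record` and `group_by_session` are hypothetical.

```python
from collections import defaultdict

# The seven example rows; whitespace-separated fields are an assumption.
SAMPLE = """\
1074589200 193.179.144.2 1993441e8a0a4d7a4407ed9554b64ed1 /dp/?id=124 www.google.cz
1074589201 194.213.35.234 3995b2c0599f1782e2b40582823b1c94 /dp/?id=182
1074589202 194.138.39.56 2fd3213f2edaf82b27562d28a2a747aa / www.seznam.cz
1074589233 193.179.144.2 1993441e8a0a4d7a4407ed9554b64ed1 /dp/?id=148 /dp/?id=124
1074589245 193.179.144.2 1993441e8a0a4d7a4407ed9554b64ed1 /sb/ /dp/?id=148
1074589248 194.138.39.56 2fd3213f2edaf82b27562d28a2a747aa /contacts/ /
1074589290 193.179.144.2 1993441e8a0a4d7a4407ed9554b64ed1 /sb/ /sb/
"""


def parse_record(line):
    """Split one log line into a dict; the referer may be missing."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    return {
        "time": int(fields[0]),
        "ip": fields[1],
        "session_id": fields[2],
        "request": fields[3],
        "referer": fields[4] if len(fields) > 4 else None,
    }


def group_by_session(lines):
    """Map each session ID to its records, ordered by time."""
    sessions = defaultdict(list)
    for line in lines:
        if line.strip():
            record = parse_record(line)
            sessions[record["session_id"]].append(record)
    for records in sessions.values():
        records.sort(key=lambda record: record["time"])
    return dict(sessions)


if __name__ == "__main__":
    for session_id, records in group_by_session(SAMPLE.splitlines()).items():
        print(session_id, [record["request"] for record in records])
```

In the example, the session starting from www.google.cz views two product details and then opens the shopping basket twice.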
5. Data Description
- table obchod (shop): name of the internet shop (7 entries),
- table kategorie (category): info about the category of products (64 entries),
- table list (sheet): info about a specific product of a more detailed type (157 entries),
- table znacka (brand): name of the producer or brand of a product (197 entries),
- table tema (theme): info about themes discussed in the on-line advice (36 entries)
6. Data Summary (1/3)
- 522 410 sessions
- 318 523 single-page sessions
- 203 887 sessions with length > 1
- avg. length 16
- median 8
- mode 2
- longest 15 454
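Figures of this kind can be derived from a list of per-session page counts. Since more than half of all sessions are single-page, a median of 8 and a mode of 2 can only refer to the sessions longer than one page, so the sketch below computes the length statistics over that subset. The function name `summarize_lengths` and the toy input are hypothetical.

```python
from statistics import mean, median, mode


def summarize_lengths(lengths):
    """Summarize session lengths (pages per session).

    Average, median, mode and maximum are taken over sessions with
    length > 1 only (an assumption about how the summary was computed).
    """
    longer = [n for n in lengths if n > 1]
    return {
        "sessions": len(lengths),
        "single_page": len(lengths) - len(longer),
        "longer": len(longer),
        "avg": mean(longer),
        "median": median(longer),
        "mode": mode(longer),
        "longest": max(longer),
    }


if __name__ == "__main__":
    # Toy page counts, not taken from the real data.
    print(summarize_lengths([1, 1, 1, 2, 2, 5, 9]))
```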
7. Data Summary (2/3)
- time spent during a session (hh:mm:ss)
- avg. time 00:24:46
- median 00:03:08
- mode 00:00:09
- longest 433:27:53
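How the time spent in a session was measured is not stated. A common approach, assumed here, is to take the difference between the last and the first request time of the session; the time spent on the last page is then not counted, because the log does not record when the user left. The helper names below are hypothetical.

```python
def session_duration(timestamps):
    """Seconds between the first and last request (assumed definition)."""
    return max(timestamps) - min(timestamps)


def format_duration(seconds):
    """Format a number of seconds as hh:mm:ss; hours may exceed two digits."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Unix times of the four requests of one session from the example data.
    times = [1074589200, 1074589233, 1074589245, 1074589290]
    print(format_duration(session_duration(times)))
```

Under this definition a single-page session always has a duration of zero, which is one reason to report time statistics separately for longer sessions.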
8. Data Summary (3/3)
- Figure: distribution of sessions with length > 1