Title: Web Analytics: A Brief Tutorial by Dr. Robert J. Boncella Professor of Information Systems
1 Web Analytics: A Brief Tutorial
by Dr. Robert J. Boncella
Professor of Information Systems Technology
School of Business
Washburn University
- Presented March 2008 to SAIS 2008
2 Introduction
- Web analytics is the study of the behavior of website visitors.
- In a commercial context, web analytics refers to the use of data collected from a website to determine which aspects of the website achieve the business objectives.
- Tutorial Outline
  - Web Analytics Context
  - Web Analytics Technology and Terminology
  - Web Analytics Tools and Case Studies
3 Context for Web Analytics
- DSS (Decision Support System)
  - A conceptual framework for a process of supporting managerial decision-making, usually by modeling problems and employing quantitative models for solution analysis
- BI (Business Intelligence), a subset of DSS
  - An umbrella term that combines architectures, tools, databases, applications, and methodologies
- BA (Business Analytics), a subset of BI
  - The application of models directly to business data
  - Assists in making strategic decisions
- WA (Web Analytics), a subset of BA
  - The application of business analytics activities to Web-based processes, including e-commerce
4 Web Analytics - Details
- Relevant Technology
  - Internet and TCP/IP
  - Client/Server Computing
  - HTTP (HyperText Transfer Protocol)
  - Server Log Files and Cookies
  - Web Bugs
- Data Collection
  - The Clickstream
  - Server Log Files
  - Page Tagging
- Data Analysis
  - Data Preparation
  - Pattern Discovery
  - Pattern Analysis
5 Client/Server Computing
6 Internet and TCP/IP
- The Internet
  - The infrastructure that provides for the delivery of data between computer-based processes
- TCP/IP
  - The protocols that provide for reliable delivery of data on the Internet
7 HTTP Protocol
- Client sends a request to a server
- Server sends a response to the client
- Connectionless
  - Client
    - Opens connection to server
    - Sends request
  - Server
    - Responds to request
    - Closes connection
- Stateless
  - Client and server have no memory of prior connections
  - Server cannot distinguish one client's requests from another client's
8 Cookies
- Used to solve the statelessness of the HTTP protocol
- Used to store and retrieve user-specific information on the web
- When an HTTP server responds to a request, it may send additional information that is stored by the client (state information)
- When the client makes a later request to this server, the client returns the cookie that contains its state information
- State information may be a client ID that can be used as an index to a client data record on the server
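The cookie exchange described above can be sketched with Python's standard `http.cookies` module. This is only an illustrative sketch, not a real web server: the cookie name `client_id`, the `CLIENT_RECORDS` table, and the `server_respond` helper are hypothetical names that do not appear in the tutorial.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical server-side table: client ID -> client data record.
CLIENT_RECORDS = {}

def server_respond(cookie_header):
    """Handle one request; return (Set-Cookie header or None, client record)."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if "client_id" in cookie and cookie["client_id"].value in CLIENT_RECORDS:
        # Returning client: the ID in the cookie indexes the stored record.
        record = CLIENT_RECORDS[cookie["client_id"].value]
        record["requests"] += 1
        return None, record
    # New client: create a record and ask the client to store its ID.
    client_id = uuid.uuid4().hex
    record = {"requests": 1}
    CLIENT_RECORDS[client_id] = record
    out = SimpleCookie()
    out["client_id"] = client_id
    return out.output(header="Set-Cookie:"), record

# First request: no cookie yet, so the server issues one.
set_cookie, record = server_respond(None)
# The client stores "client_id=<value>" and returns it on the next request.
stored = set_cookie.split(":", 1)[1].strip()
_, record2 = server_respond(stored)
print(record2["requests"])
```

The second call finds the same record as the first, which is how a cookie lets a stateless protocol recognize a returning client.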
9 Web Bug Process
[Figure: web bug process. Pages A, B, and C are served by Servers A, B, and C. Each page contains URLs and an image source tag for a web bug image hosted at WBS.TRKSTRM.COM. The browser (1) renders a page and (2) clicks on a URL. The cookie for My_Brwsr records Pg A - Server A, Pg B - Server B, Pg C - Server C.]
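A rough simulation may help show why the web bug lets a third party link page views across unrelated servers. Everything below is a sketch under assumptions: the `TrackingServer` class, the `brwsr-N` ID scheme, and the `/bug.gif` path are invented for illustration; only the host name and the page labels come from the figure.

```python
import itertools

class TrackingServer:
    """Stand-in for the third-party web bug host (WBS.TRKSTRM.COM in the figure)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.log = []  # (browser ID, page that embedded the web bug)

    def serve_web_bug(self, cookie, referer):
        # Reuse the browser's cookie if it sent one, otherwise issue a new ID.
        browser_id = cookie if cookie is not None else f"brwsr-{next(self._ids)}"
        self.log.append((browser_id, referer))
        return browser_id  # returned to the browser as its cookie value

def page_html(body, bug_url="http://WBS.TRKSTRM.COM/bug.gif"):
    """Embed a 1x1 image whose request goes to the tracking host (path is hypothetical)."""
    return f'<html><body>{body}<img src="{bug_url}" width="1" height="1" alt=""></body></html>'

tracker = TrackingServer()
cookie = None
for page in ["Pg A - Server A", "Pg B - Server B", "Pg C - Server C"]:
    html = page_html(page)  # each server's page carries the same web bug
    cookie = tracker.serve_web_bug(cookie, referer=page)

for browser_id, page in tracker.log:
    print(browser_id, page)
```

Because the browser sends the same cookie to the tracking host each time, the three pages from three different servers end up in one log under one browser ID.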
10 Common Clickstream Data Sources
- Server Log Files
  - Passive data collection
  - Normal part of the web browser/web server transaction
- Page Tagging
  - Active data collection
  - Often requires a third party (a vendor) to implement
11 Server Log Files
Each time a client requests a resource, the server of that resource may record the following in its log files:
- The name or IP address of the client computer
- The time of the request
- The URL that was requested
- The time it took to send the resource
- If HTTP authentication is used, the username of the user of the client
- Any errors that occurred
- The referer link
- The kind of web browser that was used
12 Server Log Files
- Example
  - 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
- Fields
  - 127.0.0.1 - remote host
  - frank - user name
  - [10/Oct/2000:13:55:36 -0700] - date and time
  - "GET /apache_pb.gif HTTP/1.0" - request
  - 200 - status
  - 2326 - bytes
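As a sketch of how such an entry might be split into its fields, the following parses the example line with a regular expression. It assumes the standard Common Log Format, in which the timestamp is written in brackets as [10/Oct/2000:13:55:36 -0700]; the names `CLF_PATTERN` and `parse_clf` are illustrative only.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no content was returned.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf(line)
print(entry["host"], entry["user"], entry["request"], entry["status"], entry["bytes"])
```

Real log files often use extended formats that add the referer and browser fields, so a production parser would need a longer pattern.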
13 Server Log Files
- Technical issues for server log data
  - Data Preparation
  - Pageview Identification
  - User Identification
  - Session Identification
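One common heuristic for session identification, shown here only as a sketch, is to start a new session whenever a user is idle for longer than a fixed timeout. The 30-minute timeout, the use of the client address as the user key, and the sample requests are all assumptions made for illustration; the tutorial does not prescribe a method.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed idle limit, not from the tutorial

def identify_sessions(requests):
    """Group (user_key, timestamp, url) records into per-user sessions."""
    by_user = defaultdict(list)
    for user_key, timestamp, url in requests:
        by_user[user_key].append((timestamp, url))

    sessions = {}
    for user_key, hits in by_user.items():
        hits.sort()
        user_sessions = [[hits[0]]]
        for previous, current in zip(hits, hits[1:]):
            # A gap longer than the timeout closes the current session.
            if current[0] - previous[0] > SESSION_TIMEOUT:
                user_sessions.append([])
            user_sessions[-1].append(current)
        sessions[user_key] = user_sessions
    return sessions

t0 = datetime(2008, 3, 1, 9, 0)
requests = [  # made-up data
    ("127.0.0.1", t0, "/index.html"),
    ("127.0.0.1", t0 + timedelta(minutes=5), "/products.html"),
    ("127.0.0.1", t0 + timedelta(hours=2), "/index.html"),
    ("10.0.0.7", t0 + timedelta(minutes=1), "/index.html"),
]
sessions = identify_sessions(requests)
for user_key, user_sessions in sessions.items():
    print(user_key, [len(s) for s in user_sessions])
```

Identifying users by address alone is unreliable when several people share a proxy, which is one reason user identification is listed as a separate technical issue.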
14 Page Tags as Data Source
- Provided by a third party (vendor)
  - Vendor supplies page tags
  - Vendor collects the data
  - Vendor analyzes the data
- Business accesses the data
  - Online, or
  - Reports sent to the business
15 Web Data Abstractions
- Abstractions concerning Web usage, content, and structure
- Establishes precise semantics for the concepts
  - Web site
  - Users or Visitors
  - User Sessions
  - Server Sessions or Visits
  - Pageviews
  - Clickstreams
16 Data Abstractions
- Web Site - a collection of interlinked Web pages, including a host page, residing at the same network location
- User or Visitor - a principal using a client to interactively retrieve and render resources or resource manifestations
  - An individual who is accessing files from a Web server, using a browser
- User Session - a delimited set of user clicks across one or more Web servers
17 Data Abstractions
- Server Session or Visit - a collection of user clicks to a single Web server during a user session
- Pageview - the visual rendering of a Web page in a specific environment at a specific point in time
  - A pageview consists of several items (frames, text, graphics, and scripts) that construct a single Web page
- Clickstream - a sequential series of pageview requests made by a single user
18 Web Data Abstractions (High Level)
- Abstractions concerning visitors
- Establishes precise semantics for the concepts
  - Unique Visitor
  - Conversion Rate
  - Abandonment Rate
  - Attrition
  - Loyalty
  - Frequency
  - Recency
19 Data Abstractions
- Unique Visitor
  - A unique visitor is counted when a human being uses a web browser to visit a web site.
  - A visitor may be unique for different periods of time.
  - The individual is defined by a cookie in the visitor's web browser.
20 Data Abstractions
- Conversion Rate
  - A conversion rate is the number of completers divided by the number of starters for any online activity that is more than one logical step in length
  - Starting and finishing any activity
    - Purchase
    - Download a research article
    - Etc.
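The definition above reduces to a single division. A minimal sketch follows, with made-up counts; the function name `conversion_rate` is illustrative.

```python
def conversion_rate(starters, completers):
    """Completers divided by starters for one multi-step activity."""
    if starters <= 0:
        raise ValueError("starters must be positive")
    return completers / starters

# Made-up example: 400 visitors began a purchase and 50 finished it.
rate = conversion_rate(starters=400, completers=50)
print(f"{rate:.1%}")
```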
21 Data Abstractions
- Abandonment Rate
  - The abandonment rate for any step in a multi-step process is one minus the number of units that make it to step n+1 divided by the number at step n
  - The formula is 1 - ((n+1)/n), where (n+1) and n stand for the number of units at step n+1 and at step n
  - Consider a 10-step process to acquire a resource
    - How many quit after step 1 or 2 or 3 or 4 or ...
  - Consider a 5-step process to acquire a resource
    - How many quit after step 1 or 2 or 3 or 4 or ...
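To make the per-step calculation concrete, the sketch below applies one minus (units at step n+1 divided by units at step n) to each step of a funnel. The five counts are invented for illustration, and the function assumes every step has a positive count.

```python
def abandonment_rates(units_per_step):
    """Abandonment rate for each step except the last.

    units_per_step[i] is the number of units that reached step i + 1.
    """
    return [
        1 - following / current
        for current, following in zip(units_per_step, units_per_step[1:])
    ]

# Made-up 5-step process: units remaining at steps 1 through 5.
funnel = [1000, 600, 450, 360, 270]
rates = abandonment_rates(funnel)
for step, rate in enumerate(rates, start=1):
    print(f"after step {step}: {rate:.0%} abandoned")
```

Laying the rates out step by step shows where the largest losses occur, which is the point of the 10-step versus 5-step comparison.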
22 Data Abstractions
- Attrition
  - Attrition is a measurement of people you have been able to successfully convert but are unable to retain to convert again
  - Consider the eBay web site vs. a web site for technical information
23 Data Abstractions
- Loyalty
  - Loyalty is a measure of the number of visits any visitor is likely to make over their lifetime as a visitor
  - Reported as number of visits per visitor
    - 100 visitors made 3 visits each, 87 visitors made 4, etc.
  - Avoid double counting (i.e., do not count the 87 in with the 100)
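A loyalty report of this kind can be sketched as a count of counts: first tally visits per visitor, then tally how many visitors share each total. Each visitor lands in exactly one bucket, which avoids the double counting mentioned above. The visitor IDs and the function name `loyalty_distribution` are made up for illustration.

```python
from collections import Counter

def loyalty_distribution(visits):
    """Map number of visits -> number of visitors who made exactly that many.

    visits is an iterable of visitor IDs, one entry per visit.
    """
    visits_per_visitor = Counter(visits)
    return dict(Counter(visits_per_visitor.values()))

# Made-up visit log: one visitor ID per recorded visit.
visit_log = ["a", "b", "a", "c", "a", "b", "d"]
distribution = loyalty_distribution(visit_log)
for n_visits in sorted(distribution):
    print(f"{distribution[n_visits]} visitor(s) made {n_visits} visit(s)")
```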
24 Data Abstractions
- Frequency
  - Frequency is a measure of the activity a visitor generates on a web site in terms of time between visits
  - Measured in terms of days between visits
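For a single visitor, days between visits can be computed from the sorted visit dates, as in this small sketch. The dates are invented, and summarizing the gaps with a mean is an assumption rather than something the tutorial specifies.

```python
from datetime import date

def days_between_visits(visit_dates):
    """Gaps, in days, between consecutive visits by one visitor."""
    ordered = sorted(visit_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

# Made-up visit dates for one visitor.
gaps = days_between_visits([date(2008, 3, 11), date(2008, 3, 1), date(2008, 3, 4)])
mean_gap = sum(gaps) / len(gaps)
print(gaps, mean_gap)
```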
25 Data Abstractions
- Recency
  - Recency is the number of days since the last visit (or purchase)
  - Reported as the number of visitors who returned after n days
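One possible reading of this report, sketched below, is to compute each visitor's days since last visit as of a reference date and then count visitors per value of n. The reference date, the visitor IDs, and the function names are assumptions for illustration.

```python
from collections import Counter
from datetime import date

def recency_days(last_visit, today):
    """Number of days since the visitor's most recent visit."""
    return (today - last_visit).days

def recency_report(last_visits, today):
    """Map n days -> number of visitors whose last visit was n days ago."""
    return dict(Counter(recency_days(d, today) for d in last_visits.values()))

# Made-up data: most recent visit date per visitor.
today = date(2008, 3, 31)
last_visits = {
    "a": date(2008, 3, 30),
    "b": date(2008, 3, 24),
    "c": date(2008, 3, 30),
}
report = recency_report(last_visits, today)
print(report)
```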
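26 Pyramid Model of Web Analytics Data
[Figure: pyramid of data levels. From top to bottom: Uniquely Identified Visitors, Unique Visitors, Visits, Page Views, Hits. The value of the data increases toward the top; the volume of available data increases toward the bottom.]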
27 Web Usage Mining
- Web usage mining applies statistical and data mining techniques to the processed server log data in order to discover useful patterns
- Data mining methods and algorithms that have been adapted for the Web domain
  - Association rules
  - Sequential pattern discovery
  - Clustering
  - Classification
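As a small illustration of the first technique listed, the sketch below derives association rules between pairs of pages from a set of sessions, using the usual support and confidence measures. It is a brute-force toy rather than a full algorithm such as Apriori, and the sessions, page names, and 0.5 support threshold are made up.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for page pairs.

    sessions is a list of sets of pages viewed in each session.
    """
    n_sessions = len(sessions)
    page_counts = Counter(page for s in sessions for page in set(s))
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(set(s)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_sessions  # share of sessions containing both pages
        if support < min_support:
            continue
        # Confidence: of the sessions with the antecedent, the share with both.
        rules.append((a, b, support, count / page_counts[a]))
        rules.append((b, a, support, count / page_counts[b]))
    return rules

sessions = [  # made-up sessions
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
rules = pair_rules(sessions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every session that includes /cart also includes /products, so that rule has the highest confidence.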
28 Web Usage Data Mining
- After discovering patterns from usage data, a further analysis has to be conducted.
- Common ways of analyzing such patterns
  - Using a query mechanism on a database where the results are stored
  - Loading the results into a data cube and then performing OLAP operations
  - Using visualization techniques for easier interpretation of the results
- Using these results in association with content and structure information concerning the Web site, useful knowledge can be extracted for modifying the site according to the correlation between user and content groups.
29 Web Analytics Tools and Case Studies
- Tools
  - VisiStat - www.visistat.com
- Web Analytics Case Studies
  - Communications Provider - TuVox.com
  - Online Retailer - TicketsByInternet.com
  - Winery and Entertainment Venue - The Mountain Winery
  - Non-Profit Organization - SFBallet.org
  - Public Relations and Media Agency - BLASTmedia
  - Technology Provider for Real Estate Professionals - Pullan.com
  - Real Estate Agency - Intero Real Estate
  - Start-Up Online Business - GuruPrint.com