Web Analytics - PowerPoint PPT Presentation

Provided by: kelly131
Transcript and Presenter's Notes

Title: Web Analytics


1
Web Analytics User Behavior
An update on Ajaxalytics and CBMGs
Kelly Storm
March 17th, 2008
2
Web Analytics
  • The study of the behavior of website users.
  • Use of the data collected from a website to
    determine which aspects of the website are
    'working' or 'effective'.
  • Data is currently derived from web server logfiles

3
Web Server Logfile Analysis
  • Web servers record all transactions in a logfile
  • ex. a single line from Toxodb's logfile:
    58.61.39.237 - - [19/Dec/2007:23:26:01 -0500]
    "GET /restricted/data/Genome/nuc/TgME49B7_GENOMIC_2003.04.17_TIGR.fasta.gz HTTP/1.1"
    404 347
    "http://www.toxodb.org/restricted/data/Genome/nuc/"
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  • These logfiles can be read by a program to
    provide data on the popularity / usefulness of
    the website.
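
As a minimal sketch of how such a program might read one line, the Python below assumes the logfile uses the Apache "combined" format, which the sample line appears to follow. The names LOG_PATTERN and parse_line are illustrative, not part of Ajaxalytics.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # The request field normally holds "METHOD PATH PROTOCOL".
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry

sample = (
    '58.61.39.237 - - [19/Dec/2007:23:26:01 -0500] '
    '"GET /restricted/data/Genome/nuc/TgME49B7_GENOMIC_2003.04.17_TIGR.fasta.gz HTTP/1.1" '
    '404 347 "http://www.toxodb.org/restricted/data/Genome/nuc/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["path"])
```

Lines that do not fit the assumed format come back as None, so a caller can count or skip them rather than crash.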

4
Web Server Logfile Analysis
  • Initially, statistics consisted primarily of
    counting the number of client requests (or hits)
    made to the web server.
  • Now we wish to more accurately gauge the amount
    of human activity on web servers by analyzing
    page views, visits (sessions), and user-agents

5
Web Server Logfile Analysis
  • Page views are different from hits. A page view
    is a user's request for a particular web page;
    hits include page views as well as image loads,
    JavaScript includes, etc.
  • Visits, or sessions, are sequences of requests
    from a uniquely identified client (host IP
    address) that expire after a certain amount of
    inactivity, usually 30 minutes.
  • User-agents indicate what operating system /
    browser type and version the host is currently
    using.
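
The definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Ajaxalytics implementation: the STATIC_SUFFIXES list used to tell page views from other hits is hypothetical, and the hosts are placeholder addresses.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

# Hypothetical rule: requests for these file types are hits but not page views.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def is_page_view(path):
    """Treat any request that is not a static asset as a page view."""
    return not path.lower().split("?", 1)[0].endswith(STATIC_SUFFIXES)

def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (host, time, path) tuples into per-host sessions.

    A new session starts when a host has been idle for longer than `timeout`.
    """
    sessions = {}  # host -> list of sessions, each a list of (time, path)
    for host, time, path in sorted(requests, key=lambda r: (r[0], r[1])):
        host_sessions = sessions.setdefault(host, [])
        if host_sessions and time - host_sessions[-1][-1][0] <= timeout:
            host_sessions[-1].append((time, path))
        else:
            host_sessions.append([(time, path)])
    return sessions

base = datetime(2007, 12, 19, 10, 0)
requests = [
    ("192.0.2.1", base, "/index.html"),
    ("192.0.2.1", base + timedelta(minutes=10), "/images/logo.gif"),
    ("192.0.2.1", base + timedelta(minutes=60), "/search.jsp"),
    ("192.0.2.2", base + timedelta(minutes=5), "/index.html"),
]
page_views = [r for r in requests if is_page_view(r[2])]
sessions = sessionize(requests)
print(len(requests), "hits,", len(page_views), "page views")
print({host: len(s) for host, s in sessions.items()})
```

Identifying a client by IP address alone follows the slide's definition; users behind a shared proxy would be merged into one client under this rule.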

6
Web Server Logfile Analysis
  • By analyzing page views and user-agents we're
    able to provide detailed reports showing:
  • Monthly, weekly, daily history
  • Hourly Average
  • Usage by location (country, state, city)
  • Page load activity
  • Percentage of operating system and browser
  • References
  • Google Keywords

7
Web Server Logfile Analysis
  • By analyzing sessions, instead of counting hits,
    we can better understand user behavior:
  • By creating model graphs of user sessions
  • By creating customer behavior model graphs
    (CBMG)
  • By grouping users together based upon their
    individual usage and separating them with a
    clustering algorithm

8
Customer Behavior Model Graphs
9
Customer Behavior Model Graphs
10
Customer Behavior Model Graphs
  • For instance, this shows that half the users came
    into the website and browsed, and half came into
    the website and searched
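
A CBMG of this kind can be estimated from sessions by counting transitions between states and normalizing each row to probabilities. The sketch below assumes each session has already been reduced to a sequence of state labels. The labels "Browse" and "Search", the added "Entry" and "Exit" states, and the function name build_cbmg are illustrative choices.

```python
from collections import defaultdict

def build_cbmg(sessions):
    """Return {source_state: {destination_state: probability}} from state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for states in sessions:
        # Bracket every session with artificial Entry and Exit states.
        path = ["Entry"] + list(states) + ["Exit"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    cbmg = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        cbmg[src] = {dst: n / total for dst, n in dsts.items()}
    return cbmg

# Made-up sessions: two start by browsing, two start by searching.
sessions = [
    ["Browse", "Browse", "Search"],
    ["Search", "Browse"],
    ["Browse"],
    ["Search", "Search"],
]
cbmg = build_cbmg(sessions)
print(cbmg["Entry"])  # {'Browse': 0.5, 'Search': 0.5}
```

The "Entry" row reproduces the half-browse, half-search split described above; the remaining rows give the probability of each next step from a given state.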

11
Customer Behavior Model Graphs
  • When you have a website with thousands of pages
    the CBMGs quickly become unreadable and thus
    unhelpful, so we'd like to be able to group
    users according to:
  • user-agent information
  • time
  • how often they view the site / at what frequency
  • how deep they're diving into the site
  • As well as group pages according to keywords to
    limit the size of the CBMG and give us a clearer
    view of user behavior.

12
Clustering
  • By analyzing each user's sessions and deriving a
    distance for each particular user, we can
    cluster groups of users together based upon their
    performance.
  • Doing so will allow us to separate the normal web
    users from the power users.
  • Observing the power users will give us more
    pertinent information as to the effectiveness of
    the site.

13
For the Comp Sci Kid in all of Us
  • By distance, I mean Euclidean distance
  • Each user will have a particular matrix showing
    the transition frequency from one page to the
    next; the distance will come from examining
    each individual value with respect to its
    domain.
  • Once the numbers are crunched, each matrix will
    be associated with its own unique value.
  • This process is repeated for all users, and each
    user's matrix is grouped accordingly
  • Rinse, lather, and repeat until alpha
    (inter/intra) is large enough.
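
One way to picture the procedure is the Python sketch below. It assumes k-means as the clustering algorithm, which the slide does not name, and it treats each user's transition matrix as a flat vector for the Euclidean distance. The matrices are made-up 2x2 examples, and the initial centroids are simply the first k users, a simplification that a real run would replace with a better seeding. The alpha (inter/intra) stopping test is not shown because the slide does not define it precisely.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flatten(matrix):
    """Turn a transition matrix (list of rows) into one flat vector."""
    return [value for row in matrix for value in row]

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain k-means; returns (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive, deterministic seeding
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

# Hypothetical per-user matrices; rows and columns are (Browse, Search).
user_matrices = [
    [[0.9, 0.1], [0.8, 0.2]],  # mostly browses
    [[0.1, 0.9], [0.2, 0.8]],  # mostly searches
    [[0.8, 0.2], [0.9, 0.1]],  # mostly browses
    [[0.2, 0.8], [0.1, 0.9]],  # mostly searches
]
points = [flatten(m) for m in user_matrices]
assignment, centroids = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1]
```

Like any k-means run, the result depends on the starting centroids, so several seedings are usually compared before the clusters are trusted.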

14
Since last we met
  • Create a definitions page for Ajaxalytics to help
    clarify some of the verbiage for future users
  • Create schema for parsing the log files to pick
    up where it last left off using timestamps
  • Shift log parser to C for faster parsing
  • Drill-down pages to give precise information (149
    returning visits for a particular day become a
    list of all the IPs, their frequency, the pages
    viewed). Drill-down information will be available
    for each month, day, and hour.
  • Rules for CBMGs based upon Apidb specific
    information.
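
The timestamp-based resume mentioned in the list above could look roughly like the Python sketch below. It assumes bracketed Apache-style timestamps and keeps the last seen timestamp in a plain variable. The function name read_new_entries and the sample lines are hypothetical, and a real schema would store the marker persistently.

```python
import re
from datetime import datetime

TIME_PATTERN = re.compile(r'\[([^\]]+)\]')

def read_new_entries(lines, last_seen=None):
    """Yield (timestamp, line) pairs strictly newer than last_seen."""
    for line in lines:
        match = TIME_PATTERN.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if last_seen is None or stamp > last_seen:
            yield stamp, line

log_lines = [
    '192.0.2.1 - - [19/Dec/2007:23:26:01 -0500] "GET /a HTTP/1.1" 200 10',
    '192.0.2.1 - - [19/Dec/2007:23:27:30 -0500] "GET /b HTTP/1.1" 200 10',
    '192.0.2.2 - - [19/Dec/2007:23:30:00 -0500] "GET /c HTTP/1.1" 200 10',
]

# First run: only the first two lines exist yet.
first_run = list(read_new_entries(log_lines[:2]))
last_seen = first_run[-1][0]

# Second run: the whole file is re-read, but only newer entries are returned.
second_run = list(read_new_entries(log_lines, last_seen))
print(len(first_run), len(second_run))
```

Because the comparison is strict, entries that share the exact second of the saved timestamp but were written after the previous run would be skipped; storing a byte offset alongside the timestamp is one way to close that gap.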

15
Papers in the pipeline
  • ARPC in Apidb (Working with Cary)
  • Asynchronous Remote Procedure Calls in Apidb
  • CBMGs in Apidb (Case study of use through
    Toxodb)
  • Ajaxalytics write-up for the biological community
    (to preface the release of the software once
    it's ready to go)