Web Analytics - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Web Analytics


1
Web Analytics
  • Xuejiao Liu
  • INF 385F WIRED
  • Fall 2004

2
Outline
  • Introduction
  • What is Web Analytics
  • Why Web Analytics matter
  • Secondary readings
  • Log files analysis
  • Web usage mining
  • Data preparation
  • KDD process
  • Document access in repositories

3
Log File Lowdown (Michael Calore, 2001)
  • Log file
  • What is in a log file
  • Traffic
  • Audience
  • Browsers/Platforms
  • Errors
  • Referers

4
Log File Lowdown
  • Sample Log File
  • adsl-63-183-164.ilm.bellsouth.net - -
    [09/May/2001:13:42:07 -0700]
  • "GET /about.htm HTTP/1.1" 200 3741
  • http://www.e-angelica.com
  • "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
  • Log File Analyzers
  • WebTrends, Sawmill, Analog, Webalizer,
    HTTP-analyze
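
The sample entry above can be split into its fields with a regular expression. This is a minimal sketch, assuming the sample follows the NCSA combined log format with the referrer in double quotes; the names `COMBINED` and `parse_line` are illustrative and do not come from any of the analyzers listed.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamp such as 09/May/2001:13:42:07 -0700 (month names assume an English locale)
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # The request is "METHOD path protocol"; keep the path when it is present
    parts = rec["request"].split()
    rec["path"] = parts[1] if len(parts) >= 2 else ""
    return rec

# The sample entry from the slide, joined into a single line
sample = (
    'adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] '
    '"GET /about.htm HTTP/1.1" 200 3741 "http://www.e-angelica.com" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"'
)
record = parse_line(sample)
```

Each field maps onto one of the report categories from the previous slide: host for audience, path for traffic, agent for browsers and platforms, status for errors, and referer for referrers.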

5
WebTrends
  • log file analyzer
  • Advantages
  • Fast and effective
  • User-friendly interface
  • Feature-rich
  • Supports different operating systems
  • Disadvantages
  • Not free

6
WebTrends

7
The KDD Process for Extracting Useful Knowledge
from Volumes of Data (Fayyad, U., G.
Piatetsky-Shapiro, et al. 1996)
  • KDD: Knowledge Discovery in Databases
  • The value of data
  • Definitions
  • KDD
  • Data mining

8
The KDD Process
  • The KDD process
  • 1. Creating a target dataset
  • 2. Preprocessing and data cleaning
  • 3. Data reduction and projection
  • 4. Data mining
  • Choosing the data mining function
  • Choosing the data mining algorithm
  • 5. Interpretation and evaluation

9
The KDD Process
  • Data Mining
  • Data mining involves fitting models to or
    determining patterns from observed data
  • Data mining algorithms
  • The model
  • The preference criterion
  • The search algorithm

10
The KDD Process
  • Data Mining
  • Model functions
  • Classification
  • Regression
  • Clustering
  • Dependency modeling
  • Link analysis
  • Goals of Data Mining
  • Predictive and descriptive

11
Data Preparation for Mining World Wide Web
Browsing Patterns (Cooley, R. W., B. Mobasher,
et al. 1999)
  • Web Usage Mining vs. data mining
  • The WEBMINER process
  • Preprocessing
  • Mining algorithms
  • Pattern Analysis

12
Data Preparation
  • Preprocessing
  • Data cleaning
  • User identification
  • Session identification
  • Path completion
  • Formatting
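
The first three preprocessing steps can be sketched in a few lines. This is an illustration under stated assumptions, not the WEBMINER implementation: records are dicts with `host`, `agent`, `path`, and `time` keys (hypothetical names), a user is approximated by the host and user-agent pair, and a session ends after a 30-minute gap, which is a commonly used but assumed threshold.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold
SKIP_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")  # assumed list

def clean(records):
    """Data cleaning: drop requests for embedded files such as images."""
    return [r for r in records if not r["path"].lower().endswith(SKIP_SUFFIXES)]

def sessionize(records):
    """User and session identification; returns a list of (user, hits) pairs."""
    by_user = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        # User identification heuristic: same host and same user agent
        by_user[(r["host"], r["agent"])].append(r)
    sessions = []
    for user, hits in by_user.items():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # Session identification: a long gap starts a new session
            if hit["time"] - prev["time"] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions
```

Path completion and formatting are left out because they depend on the site's link structure and on the input format of the chosen mining algorithm.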

13
Data Preparation

14
Data Preparation

15
Tracking the Growth of a Site (Nielsen, Jakob,
1998)
  • Exponential growth of the web and the internet
  • Statistical method
  • Logarithmic transformation to get a linear
    regression
  • Statistical analysis
  • Hypothesis: the site is growing (number of
    pageviews and date are correlated)
  • R² and significance
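
The statistical method above can be illustrated with an ordinary least-squares fit on log-transformed counts. This is a minimal sketch using only the standard library; the pageview numbers are hypothetical, `fit_log_linear` is an illustrative name, and the significance test is omitted.

```python
import math

def fit_log_linear(days, pageviews):
    """Fit log(pageviews) = slope * day + intercept; return slope, intercept, R^2."""
    ys = [math.log(v) for v in pageviews]  # the log transform makes exponential growth linear
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation between day and log(pageviews)
    return slope, intercept, r2

# Hypothetical daily pageview counts growing by roughly 10% per day
days = [0, 1, 2, 3, 4, 5]
pageviews = [100, 110, 121, 133, 146, 161]
slope, intercept, r2 = fit_log_linear(days, pageviews)
growth_per_day = math.exp(slope) - 1  # convert the slope back to a growth rate
```

A slope near zero would mean the site is not growing; an R² close to 1 means the exponential model explains most of the variation in the log counts.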

16
Tracking the Growth of a Site
R² = 0.96, p [value missing in source]

17
Tracking the Growth of a Site
  • Predict growth rate
  • Clean noise
  • Confidence interval

18
Predicting Document Access in Large, Multimedia
Repositories (Recker, M. R. and J. E. Pitkow,
1996)
  • Patterns of document requests in
    network-accessible multimedia databases
  • Main idea
  • Two related domains: human memory and libraries
  • Borrow models and research results from them

19
Predicting Document Access
  • The model: human memory (Anderson and Schooler)
  • The relationship of recency and performance is a
    power function
  • The relationship of frequency and performance is
    a power function
  • Two parameters for performance
  • Need probability p and need odds p/(1-p)
  • The linear function
  • Log(Need odds) = a Log(Frequency) + b
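
The relationship between need probability, need odds, and the linear function can be written out as a short sketch. It assumes that the linear function relates log need odds to log frequency with slope `a` and intercept `b`, and that natural logarithms are used; both the log base and the function names are assumptions for illustration.

```python
import math

def need_odds(p):
    """Need odds for a need probability p, i.e. p / (1 - p)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Inverse of need_odds: recover the probability from the odds."""
    return odds / (1.0 + odds)

def predicted_need_probability(frequency, a, b):
    """Apply log(odds) = a * log(frequency) + b, then convert the odds to a probability."""
    log_odds = a * math.log(frequency) + b
    return probability_from_odds(math.exp(log_odds))
```

A straight line in log-log space corresponds to the power function named on the slide: odds = exp(b) * frequency^a.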

20
Predicting Document Access
  • Apply Human Memory Analysis in Document Requests
    Model
  • Dataset: log file of the Georgia Tech WWW repository
  • A dynamic information ecology
  • Frequency analysis
  • Regression equation
  • Log(Need Odds) = .99 Log(Frequency) + 1.30
  • Recency analysis
  • Regression equation
  • Log(Need Odds) = -1.15 Log(Days) + .41
  • Combining recency and frequency

21
Predicting Document Access
  • Conclusion
  • Recency and frequency of past document access are
    strong predictors of future document access
  • Recency proved to be a stronger predictor than
    frequency
  • Applications for the design of information
    systems
  • Determine optimal ordering of retrieved items
  • Inform design decisions
  • Design of caching algorithms