Web Usage Mining: An Overview

1
Web Usage Mining: An Overview
  • Lin Lin
  • Department of Management
  • Lehigh University
  • Jan. 30th

2
Agenda
  • Web Usage Mining: Definition
  • Research Issues in Web Usage Mining
  • Current Research in Web Usage Mining
  • Going Forward

3
Web Usage Mining: A Definition
  • The process of applying data mining techniques to
    the discovery of usage patterns from Web data,
    targeted towards various applications
  • Different from content mining and structure mining
  • (Adamic, L. A., and Adar, E. 2003.
    Friends and neighbors on the web. Social
    Networks 25(3): 211-230.)

4
Web Usage Mining: Data Source
  • Typical data sources for web usage mining are:
  • Web structure data (site map, links, etc.)
  • Web content data
  • User profile (may not be available)
  • Web log (web usage data, clickstream data)

5
Web Usage Mining: Procedure

6
Preprocessing Challenges
  • WHO are the users?
  • IP vs. real people
  • HOW LONG did the users stay?
  • Measuring session time (L. Catledge and J.
    Pitkow. Characterizing browsing behaviors on the
    world wide web. Computer Networks and ISDN
    Systems, 27(6), 1995) (B. Berendt, B. Mobasher, M.
    Nakagawa, and M. Spiliopoulou. The impact of site
    structure and user environment on session
    reconstruction in web usage analysis. In
    Proceedings of the 4th WebKDD 2002 Workshop, at
    the ACM-SIGKDD Conference on Knowledge Discovery
    in Databases (KDD 2002), Edmonton, Alberta,
    Canada, July 2002)
  • WHERE did the users go?
  • Server side vs. Client side
  • WHAT did the users view?
  • Content processing (Moe, Wendy W. 2003. Buying,
    searching, or browsing: Differentiating between
    online shoppers using in-store navigational
    clickstream. J. Consumer Psych. 13(1, 2): 29-40.)
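
The WHO and HOW LONG questions above are usually settled in practice with heuristics. A minimal sketch of time-based session splitting follows, assuming each log record is a hypothetical (visitor_id, timestamp_in_seconds, url) tuple, that the visitor id is simply the IP address (which, as noted, may not map to a real person), and that a gap longer than 25.5 minutes (the threshold quoted later in this deck) starts a new session. The function name and record layout are illustrative, not taken from any of the cited papers.

```python
from collections import defaultdict

# Assumed inactivity threshold: 25.5 minutes, expressed in seconds.
SESSION_TIMEOUT_SECONDS = 25.5 * 60


def sessionize(hits, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (visitor_id, timestamp, url) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same visitor exceeds `timeout` seconds.
    Returns a list of (visitor_id, [url, ...]) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in hits:
        by_visitor[visitor].append((timestamp, url))

    sessions = []
    for visitor in sorted(by_visitor):
        current = []
        last_timestamp = None
        for timestamp, url in sorted(by_visitor[visitor]):
            if last_timestamp is not None and timestamp - last_timestamp > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_timestamp = timestamp
        if current:
            sessions.append((visitor, current))
    return sessions


if __name__ == "__main__":
    log = [
        ("10.0.0.1", 0, "/home"),
        ("10.0.0.1", 60, "/products"),
        ("10.0.0.1", 4000, "/home"),  # more than 25.5 minutes later
        ("10.0.0.2", 30, "/about"),
    ]
    for visitor, pages in sessionize(log):
        print(visitor, pages)
```

The timeout is a parameter because the right value depends on the site; the session lists produced here are the usual input to the pattern discovery methods on the following slides.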

For the best review of preprocessing methods, refer to
R. Cooley, B. Mobasher, J. Srivastava, Data
preparation for mining world wide web browsing
patterns, Knowledge and Information Systems 1 (1)
(1999) 5-32.

7
Usage Pattern Discovery: Methods
  • Statistical Methods (including dependency
    modeling and stochastic modeling)
  • Association Rule Mining
  • Clustering (user cluster vs. page cluster)
  • Classification
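
As one illustration of the methods listed above, here is a minimal sketch of association rule mining restricted to page pairs. It assumes sessions are already available as lists of URLs; the function name and the support and confidence thresholds are hypothetical choices for the example, and a full miner (for instance Apriori) would also consider larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) over pairs of pages.

    support    = fraction of sessions containing both pages
    confidence = sessions containing both / sessions containing lhs
    """
    session_count = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page at most once per session
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))

    rules = []
    for (page_a, page_b), count in pair_counts.items():
        support = count / session_count
        if support < min_support:
            continue
        for lhs, rhs in ((page_a, page_b), (page_b, page_a)):
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    sessions = [
        ["/home", "/products", "/cart"],
        ["/home", "/products"],
        ["/home", "/about"],
        ["/products", "/cart"],
    ]
    for lhs, rhs, support, confidence in pair_rules(sessions):
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

A rule such as "/cart -> /products" reads as "sessions that visit /cart also tend to visit /products"; the same counts, grouped by user instead of by session, are a common starting point for user clustering.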

8
Usage Pattern Discovery: Research Streams
  • Why am I interested in web usage mining? (a.k.a.,
    what's the big deal?)
  • Blattberg, Robert C. and John Deighton (1991),
    "Interactive Marketing: Exploiting the Age of
    Addressability," Sloan Management Review, 33
    (1), 5-14
  • Ghosh, S. 1998. Making business sense of the
    Internet. Harvard Business Review 76(2): 126-135
  • Bucklin, R. E., Lattin, J. M., Ansari, A., Bell,
    D., Coupey, E., Gupta, S., Little, J. D. C., Mela,
    C., Montgomery, A., Steckel, J. Choice and the
    Internet: From Clickstream to Research Stream.
    Marketing Letters, 2002, Vol. 13, No. 3, pp.
    245-258

9
Usage Pattern Discovery: Research Streams
  • Lin's two cents on current research streams
  • Build a better site
  • For everybody: system improvement (caching and
    web design)
  • For individuals: personalization
  • For search engines: SEO
  • Know your visitors better
  • Customer behavior
  • Be a better business

10
Build a Better Site: System Improvement
  • Server-side caching of web pages (association
    rules)
  • Y.-H. Wu, A.L.P. Chen, Prediction of web page
    accesses by proxy server log, World Wide Web 5
    (1) (2002) 67-88
  • Preprocessing: No IP discussion, sessions split
    by time-based heuristics
  • Method: Sequential pattern mining
  • Data: Usage
  • Contribution: Use frequent sequences to predict
    candidate pages; personalize based on user
    maturity
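
The idea of predicting a candidate page from frequent sequences can be sketched as follows. This is a deliberately simplified stand-in that only counts length-2 sequences (page followed immediately by page); it is not the algorithm of Wu and Chen, and the function names and the min_count threshold are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is immediately followed by each other page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return counts


def predict_next(counts, current_page, min_count=2):
    """Return the most frequent follower of `current_page`, or None.

    A follower is only returned if the pair was seen at least
    `min_count` times, i.e. if the length-2 sequence is "frequent".
    Ties are broken by page name to keep the result deterministic.
    """
    followers = counts.get(current_page)
    if not followers:
        return None
    page, count = max(followers.items(), key=lambda item: (item[1], item[0]))
    return page if count >= min_count else None


if __name__ == "__main__":
    sessions = [
        ["/home", "/products", "/cart"],
        ["/home", "/products"],
        ["/home", "/about"],
    ]
    counts = build_transition_counts(sessions)
    print(predict_next(counts, "/home"))      # candidate page to cache or prefetch
    print(predict_next(counts, "/products"))  # not frequent enough
```

A cache or proxy could use the predicted page as a prefetch candidate; longer sequences would give more context at the cost of sparser counts.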

11
Build a Better Site: System Improvement
  • Improvement of general web design (AR, SP, MM)
  • Fang, X. and O. R. L. Sheng (2004).
    LinkSelector: A web mining approach to hyperlink
    selection for web portals. ACM Transactions on
    Internet Technology 4, 209-237
  • Preprocessing: No IP distinguished, sessions
    split by a 25.5-minute threshold
  • Method: Association mining
  • Data: Usage + Structure
  • Contribution: Combine structure info. and usage
    info. to optimize portal page design
  • Where are we headed: adaptive web design
  • Y. Fu, M. Creado, C. Ju, Reorganizing web sites
    based on user access patterns, in Proceedings of
    the Tenth International Conference on Information
    and Knowledge Management, ACM Press, 2001, pp.
    583-585 (usage + content)

12
Build a Better Site: Personalization
  • Personalize the web site based on usage patterns
    (AR, Clustering)
  • A key research domain: recommender systems
  • Content clustering vs. user clustering vs.
    hybrid approach
  • C. Shahabi and F. Banaei-Kashani. Efficient and
    anonymous web usage mining for web
    personalization. INFORMS Journal on Computing,
    Special Issue on Data Mining, 2002
  • Method: Clustering of sessions
  • Data: Client-side usage data
  • Where are we headed: incorporate time and web 2.0
  • Refer to Adomavicius, G., and Tuzhilin, A.
    (2005). Toward the next generation of recommender
    systems: A survey of the state-of-the-art and
    possible extensions. IEEE Transactions on
    Knowledge and Data Engineering, 17(6), 734-749
    for a good review of recommender systems
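
Clustering of sessions can be illustrated with a very small example. The sketch below uses Jaccard similarity over the set of pages in each session and a one-pass "leader" clustering; both are assumptions chosen for brevity, not the method of Shahabi and Banaei-Kashani, and the function names and threshold are hypothetical.

```python
def jaccard(pages_a, pages_b):
    """Jaccard similarity of two sets of pages (0.0 when both are empty)."""
    union = pages_a | pages_b
    return len(pages_a & pages_b) / len(union) if union else 0.0


def leader_cluster(sessions, threshold=0.5):
    """Assign each session to the first cluster whose leader is similar enough.

    The first session placed in a cluster acts as its leader. Returns a
    list of clusters, each a list of session indices.
    """
    clusters = []  # list of (leader_page_set, member_indices)
    for index, session in enumerate(sessions):
        pages = set(session)
        for leader_pages, members in clusters:
            if jaccard(pages, leader_pages) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((pages, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sessions = [
        ["/home", "/laptops", "/cart"],
        ["/home", "/laptops"],
        ["/blog", "/about"],
        ["/blog", "/about", "/contact"],
    ]
    print(leader_cluster(sessions))
```

Once sessions are grouped, pages popular within a visitor's cluster but not yet seen by that visitor are natural candidates for recommendation.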

13
Build a Better Site: SEO
  • Adding usage information into PageRank
  • Kalyan Beemanapalli, Ramya Rangarajan, Jaideep
    Srivastava, Usage-Aware Average Clicks, In
    Proc. of WebKDD 2006: KDD Workshop on Web Mining
    and Web Usage Analysis, in conjunction with the
    12th ACM SIGKDD International Conference on
    Knowledge Discovery and Data Mining (KDD 2006),
    August 20-23 2006
  • Method: Association rule in spirit

14
Know your visitors better: Customer behavior
  • A favorite research stream by marketers and MIS
    researchers
  • Statistical models are used most of the time
  • Macro-level behavior is often the focus
  • Interesting questions related to firm performance
    and profitability

15
Know your visitors better: Customer behavior
  • Johnson, E. J., Wendy Moe, Peter S. Fader, Steven
    Bellman, and Jerry Lohse. "On the Depth and
    Dynamics of Online Search Behavior," Management
    Science, Vol. 50, No. 3, March 2004, pp. 299-308
  • model an individual's tendency to search as a
    logarithmic process
  • hierarchical Bayesian model with depth of search,
    dynamics of search, and activity of search
  • interested in the number of unique sites searched
    by each household within a given product category
  • Preprocessing: Households identified by
    client-side programs, session is month-based
  • Method: Statistical Modeling (log model)
  • Data: Usage (search)

16
Know your visitors better: Customer behavior
  • Moe, Wendy W. 2003. Buying, searching, or
    browsing: Differentiating between online shoppers
    using in-store navigational clickstream. J.
    Consumer Psych. 13(1, 2): 29-40
  • WHY do the customers visit?
  • Preprocessing: Content Processing
  • Method: Clustering of sessions by visiting
    behavior parameters and content parameters
  • Data: Usage + Content
  • Conclusion

17
Know your visitors better: Customer behavior
  • Moe, Wendy W. 2003. Buying, searching, or
    browsing: Differentiating between online shoppers
    using in-store navigational clickstream. J.
    Consumer Psych. 13(1, 2): 29-40

18
Know your visitors better: Customer behavior
  • Sismeiro, Catarina, Randolph E. Bucklin. 2004.
    Modeling Purchase Behavior at an E-Commerce Web
    Site: A Task-Completion Approach. Journal of
    Marketing Research. 41 (3), 306-323
  • How do the customers visit?
  • Predicts online buying by linking the purchase
    decision to what visitors do and to what they are
    exposed while at the site.
  • Preprocessing: Content Processing
  • Method: Statistical Modeling
  • Data: Usage + Content
  • Conclusion

19
Know your visitors better: Customer behavior
  • Sismeiro, Catarina, Randolph E. Bucklin. 2004.
    Modeling Purchase Behavior at an E-Commerce Web
    Site: A Task-Completion Approach. Journal of
    Marketing Research. 41 (3), 306-323
  • browsing behavior (i.e., time and page views)
  • repeat visitation to the site (return and total
    number of sessions)
  • use of interactive decision aids
  • Data input effort and information gathering and
    processing
  • a series of page specific characteristics

20
Know your visitors better: Customer behavior
  • My Research: Online Customer Lifetime
  • predict an individual's tendency to stay with an
    e-tailer
  • Hybrid BG/NBD model + Neural Networks
  • interested in the relationship between online
    customer lifetime and firm profitability
  • Preprocessing: Households identified by
    client-side programs, session is month-based
  • Method: Statistical Modeling + Classification
  • Data: Usage

21
Know your visitors better: Customer behavior
  • My Research: Online Customer Lifetime
  • Given N customers with visiting history (X_i, t_xi,
    T)
  • T: the observed time period
  • X_i: number of visits customer i made during T
  • t_xi: time of the last visit made by customer i
  • Find the best fit for the following maximum
    likelihood equation to estimate the four
    parameters r, a, b and α
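
The likelihood equation itself is not preserved in this transcript. As a hedged sketch, the code below evaluates the log-likelihood of the standard BG/NBD model (Fader, Hardie, and Lee 2005), assuming that is the equation the slide refers to and that the fourth parameter is alpha; the hybrid model described in this talk may differ. Each customer is a hypothetical (x, t_x, T) tuple matching the visiting history defined above.

```python
from math import exp, lgamma, log


def log_beta(p, q):
    """Natural log of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def bgnbd_log_likelihood(params, customers):
    """Sum of individual BG/NBD log-likelihoods.

    params    : (r, alpha, a, b), all strictly positive
    customers : iterable of (x, t_x, T) with x visits, last visit at t_x,
                observed over a period of length T
    """
    r, alpha, a, b = params
    total = 0.0
    for x, t_x, T in customers:
        # Factor shared by both terms: Gamma(r + x) * alpha^r / Gamma(r)
        shared = lgamma(r + x) - lgamma(r) + r * log(alpha)
        # Term for a customer who is still active at time T
        active = log_beta(a, b + x) - log_beta(a, b) - (r + x) * log(alpha + T)
        if x > 0:
            # Term for a customer who dropped out right after the last visit
            dropped = (log_beta(a + 1, b + x - 1) - log_beta(a, b)
                       - (r + x) * log(alpha + t_x))
            # Numerically stable log(exp(active) + exp(dropped))
            top = max(active, dropped)
            combined = top + log(exp(active - top) + exp(dropped - top))
        else:
            combined = active
        total += shared + combined
    return total


if __name__ == "__main__":
    history = [(0, 0.0, 1.0), (1, 0.5, 1.0)]
    print(bgnbd_log_likelihood((1.0, 1.0, 1.0, 1.0), history))
```

Estimating the parameters means maximizing this function over (r, alpha, a, b) with a numerical optimizer, which is left out here.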

22
Know your visitors better: Customer behavior
  • Given r, a, b and α, we can predict
  • Total number of visits during a time period t
    (starting from time 0)
  • Number of visits an individual will make in the
    future t time units, Y(t) (from T+1 to T+t)
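
The prediction formulas are not preserved in this transcript. Assuming the standard BG/NBD model (Fader, Hardie, and Lee 2005) with parameters r, \alpha, a, b, the two quantities above would be the expressions below; the hybrid model in this talk may use something different. Here x, t_x and T are one customer's visiting history, {}_2F_1 is the Gauss hypergeometric function, and \delta_{x>0} equals 1 when x > 0 and 0 otherwise.

```latex
% Expected number of visits in a period of length t, starting from time 0
\[
E[X(t) \mid r, \alpha, a, b]
  = \frac{a+b-1}{a-1}
    \left[ 1 - \left(\frac{\alpha}{\alpha+t}\right)^{r}
      {}_2F_1\!\left(r,\, b;\; a+b-1;\; \frac{t}{\alpha+t}\right) \right]
\]

% Expected number of visits in the t time units after T,
% given the history (x, t_x, T)
\[
E[Y(t) \mid x, t_x, T, r, \alpha, a, b]
  = \frac{\dfrac{a+b+x-1}{a-1}
      \left[ 1 - \left(\dfrac{\alpha+T}{\alpha+T+t}\right)^{r+x}
        {}_2F_1\!\left(r+x,\, b+x;\; a+b+x-1;\; \dfrac{t}{\alpha+T+t}\right) \right]}
    {1 + \delta_{x>0}\, \dfrac{a}{b+x-1}
      \left(\dfrac{\alpha+T}{\alpha+t_x}\right)^{r+x}}
\]
```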

23
Know your visitors better: Customer behavior
  • My Research: Online Customer Lifetime

24
Web Usage Mining: The Future