Title: Web Usage Mining: An Overview
1. Web Usage Mining: An Overview
- Lin Lin
- Department of Management
- Lehigh University
- Jan. 30th
2. Agenda
- Web Usage Mining Definition
- Research Issues in Web Usage Mining
- Current Research in Web Usage Mining
- Going Forward
3. Web Usage Mining: A Definition
- The process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications
- Different from content mining and structure mining
- (Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the Web. Social Networks 25(3): 211-230.)
4. Web Usage Mining: Data Sources
- Typical data sources for web usage mining are:
- Web structure data (site map, links, etc.)
- Web content data
- User profile (may not be available)
- Web log (web usage data, clickstream data)
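The web log is usually the richest of these sources. As a minimal sketch, assuming the server writes Apache's "combined" log format (host, identity, user, timestamp, request, status, size, referrer, user agent), one line can be parsed and filtered down to page views as below. The asset-suffix list is an illustrative assumption, not a standard, and `parse_line` / `is_page_view` are hypothetical helper names.

```python
import re
from datetime import datetime

# Apache "combined" format:
# host ident authuser [time] "method url protocol" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative (assumed) list of embedded resources to drop during data cleaning
ASSET_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Parse one combined-format log line into a dict, or return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def is_page_view(rec):
    """Keep successful page requests; drop images, scripts and other embedded assets."""
    return rec["status"] == 200 and not rec["url"].lower().endswith(ASSET_SUFFIXES)
```

Filtering out embedded assets matters because a single page view can generate dozens of image and script requests that would otherwise swamp the usage patterns.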
5. Web Usage Mining: Procedure
6. Preprocessing Challenges
- WHO are the users?
- IP address vs. real people
- HOW LONG did the users stay?
- Measuring session time (L. Catledge and J. Pitkow. Characterizing browsing behaviors on the World Wide Web. Computer Networks and ISDN Systems, 27(6), 1995) (B. Berendt, B. Mobasher, M. Nakagawa, and M. Spiliopoulou. The impact of site structure and user environment on session reconstruction in web usage analysis. In Proceedings of the 4th WebKDD 2002 Workshop, at the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD 2002), Edmonton, Alberta, Canada, July 2002.)
- WHERE did the users go?
- Server side vs. client side
- WHAT did the users view?
- Content processing (Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1-2): 29-40.)
For the best review of preprocessing methods, refer to R. Cooley, B. Mobasher, J. Srivastava, Data preparation for mining World Wide Web browsing patterns, Knowledge and Information Systems 1(1) (1999) 5-32.
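The WHO and HOW LONG questions are often answered with simple heuristics. The sketch below assumes each (IP address, user agent) pair is one user, which is exactly the IP-vs-person approximation noted above, and starts a new session after 25.5 minutes of inactivity, the cut-off used in the Fang and Sheng study on a later slide. `sessionize` is a hypothetical helper, not code from any of the cited papers.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Hit = Tuple[str, str, datetime, str]    # (ip, user_agent, timestamp, url)
Session = List[Tuple[datetime, str]]    # time-ordered (timestamp, url) pairs

# Inactivity timeout; 25.5 minutes matches the Fang and Sheng setting (30 minutes is another common choice)
TIMEOUT = timedelta(minutes=25.5)


def sessionize(hits: Iterable[Hit], timeout: timedelta = TIMEOUT) -> List[Session]:
    """Group page views into sessions, using IP + user agent as a proxy for the user."""
    by_user = defaultdict(list)
    for ip, agent, ts, url in hits:
        # Heuristic user identification: one "user" per (IP, user-agent) pair
        by_user[(ip, agent)].append((ts, url))

    sessions: List[Session] = []
    for clicks in by_user.values():
        clicks.sort()                       # order each user's clicks by time
        current = [clicks[0]]
        for prev, nxt in zip(clicks, clicks[1:]):
            if nxt[0] - prev[0] > timeout:  # gap too long: close the current session
                sessions.append(current)
                current = []
            current.append(nxt)
        sessions.append(current)
    return sessions
```

Both heuristics can misfire: a shared proxy merges many people into one "user", and a reader who pauses on a long page is split into two sessions.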
7. Usage Pattern Discovery Methods
- Statistical methods (including dependency modeling and stochastic modeling)
- Association rule mining
- Clustering (user clusters vs. page clusters)
- Classification
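Association rule mining over sessions can be illustrated with single-page rules of the form "visitors who view page A also view page B". This minimal sketch treats each session as a set of pages and counts page pairs by brute force rather than running a full Apriori search; the `min_support` and `min_confidence` defaults are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.1, min_confidence=0.5):
    """sessions: list of sets of pages.
    Returns rules (lhs, rhs, support, confidence) for single-page rules lhs -> rhs."""
    n = len(sessions)
    item_count = Counter()   # sessions containing each page
    pair_count = Counter()   # sessions containing each unordered page pair
    for pages in sessions:
        pages = set(pages)
        item_count.update(pages)
        pair_count.update(combinations(sorted(pages), 2))

    rules = []
    for (a, b), c in pair_count.items():
        support = c / n                       # share of sessions with both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_count[lhs]  # P(rhs in session | lhs in session)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest-confidence rules first; ties broken alphabetically
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

Rules like these are what the caching and web-design studies below mine from the logs.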
8. Usage Pattern Discovery Research Streams
- Why am I interested in web usage mining? (a.k.a., what's the big deal?)
- Blattberg, Robert C. and John Deighton (1991), "Interactive Marketing: Exploiting the Age of Addressability," Sloan Management Review, 33(1), 5-14
- Ghosh, S. 1998. Making business sense of the Internet. Harvard Business Review 76(2): 126-135
- Bucklin, R. E., Lattin, J. M., Ansari, A., Bell, D., Coupey, E., Gupta, S., Little, J. D. C., Mela, C., Montgomery, A., Steckel, J. Choice and the Internet: From Clickstream to Research Stream. Marketing Letters, 2002, Vol. 13, No. 3, pp. 245-258
9. Usage Pattern Discovery Research Streams
- Lin's two cents on current research streams
- Build a better site
- For everybody: system improvement (caching, web design)
- For individuals: personalization
- For search engines: SEO
- Know your visitors better
- Customer behavior
- Be a better business
10. Build a Better Site: System Improvement
- Server-side caching of web pages (association rules)
- Y.-H. Wu, A.L.P. Chen, Prediction of web page accesses by proxy server log, World Wide Web 5(1) (2002) 67-88
- Preprocessing: no discussion of IP-based user identification; sessions split by time-based heuristics
- Method: sequential pattern mining
- Data: usage
- Contribution: uses frequent sequences to predict candidate pages, and personalizes predictions based on user maturity
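The idea of predicting a candidate page from frequent sequences can be sketched as follows. This is not Wu and Chen's algorithm: it is a simplified assumption that counts contiguous page sequences, keeps (sequence, next page) pairs seen at least `min_count` times, and predicts with the longest matching context, much like a variable-order Markov model. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_model(sessions, max_context=2, min_count=2):
    """Count which page follows each contiguous page sequence (length 1..max_context)
    and keep only (sequence, next page) pairs seen at least min_count times."""
    follow = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            for k in range(1, min(max_context, i) + 1):
                follow[tuple(s[i - k:i])][s[i]] += 1
    model = {}
    for context, successors in follow.items():
        frequent = Counter({page: c for page, c in successors.items() if c >= min_count})
        if frequent:
            model[context] = frequent
    return model


def predict_next(model, history, max_context=2):
    """Predict the next page from the longest frequent context matching the end of history."""
    for k in range(min(max_context, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # no frequent context matches: make no prediction
```

A proxy could prefetch the predicted page into its cache; using the longest context first lets a two-page history disambiguate cases where the last page alone has several equally frequent successors.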
11. Build a Better Site: System Improvement
- Improvement of general web design (AR, SP, MM)
- Fang, X. and O. R. L. Sheng (2004). LinkSelector: A web mining approach to hyperlink selection for web portals. ACM Transactions on Internet Technology 4, 209-237
- Preprocessing: users not distinguished by IP; sessions split with a 25.5-minute timeout
- Method: association mining
- Data: usage and structure
- Contribution: combines structure information and usage information to optimize portal page design
- Where are we headed: adaptive web design
- Y. Fu, M. Creado, C. Ju, Reorganizing web sites based on user access patterns, in Proceedings of the Tenth International Conference on Information and Knowledge Management, ACM Press, 2001, pp. 583-585 (usage and content)
12. Build a Better Site: Personalization
- Personalize the web site based on usage patterns (AR, clustering)
- A key research domain: recommender systems
- Content clustering vs. user clustering vs. hybrid approach
- C. Shahabi and F. Banaei-Kashani. Efficient and anonymous web usage mining for web personalization. INFORMS Journal on Computing, Special Issue on Data Mining, 2002
- Method: clustering of sessions
- Data: client-side usage data
- Where are we headed: incorporating time and Web 2.0
- Refer to Adomavicius, G., Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749, for a good review of recommender systems
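Clustering of sessions can be illustrated with plain k-means over binary session-by-page vectors. This is a generic sketch, not Shahabi and Banaei-Kashani's method; seeding the centroids with the first k sessions is an assumption made to keep the result deterministic, and the helper names are hypothetical.

```python
def to_vectors(sessions, pages):
    """Binary session-by-page matrix: 1.0 if the session visited the page, else 0.0."""
    return [[1.0 if p in s else 0.0 for p in pages] for s in sessions]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each session joins its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members (unchanged if empty)
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each centroid acts as a usage profile: pages with a high centroid weight that a new session has not yet visited are natural recommendation candidates.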
13. Build a Better Site: SEO
- Adding usage information into PageRank
- Kalyan Beemanapalli, Ramya Rangarajan, Jaideep Srivastava, Usage-Aware Average Clicks, In Proc. of WebKDD 2006: KDD Workshop on Web Mining and Web Usage Analysis, in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), August 20-23, 2006
- Method: association rules in spirit
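One simple way to add usage information to PageRank is to bias each page's out-link transitions by how often visitors actually follow them. The sketch below is a hypothetical variant for illustration only, not the Usage-Aware Average Clicks formula; the `1 + clicks` weighting and the 0.85 damping factor are assumptions.

```python
def usage_pagerank(links, clicks, damping=0.85, iterations=100):
    """links: {page: [pages it links to]}; clicks: {(src, dst): observed click count}.
    Each out-link is weighted by (1 + clicks), so heavily used links pass on more rank."""
    pages = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for src in pages:
            outs = links.get(src, [])
            if not outs:
                # Dangling page: spread its rank uniformly over all pages
                for p in pages:
                    new[p] += damping * rank[src] / n
                continue
            weights = [1.0 + clicks.get((src, dst), 0) for dst in outs]
            total = sum(weights)
            for dst, w in zip(outs, weights):
                new[dst] += damping * rank[src] * w / total
        rank = new
    return rank
```

With no click data every link gets weight 1 and this reduces to ordinary PageRank; as clicks accumulate, rank flows toward the pages visitors actually reach.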
14. Know your visitors better: Customer behavior
- A favorite research stream among marketing and MIS researchers
- Statistical models are used most of the time
- Macro-level behavior is often the focus
- Interesting questions related to firm performance and profitability
15. Know your visitors better: Customer behavior
- Johnson, E. J., Wendy Moe, Peter S. Fader, Steven Bellman, and Jerry Lohse. "On the Depth and Dynamics of Online Search Behavior," Management Science, Vol. 50, No. 3, March 2004, pp. 299-308
- Models an individual's tendency to search as a logarithmic process
- Hierarchical Bayesian model with depth of search, dynamics of search, and activity of search
- Interested in the number of unique sites searched by each household within a given product category
- Preprocessing: households identified by client-side programs; session is month-based
- Method: statistical modeling (log model)
- Data: usage (search)
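The depth-of-search measure described above (unique sites per household per month within a product category) is a simple aggregation over panel data. A minimal sketch, assuming each record is a (household, month, category, site) tuple; it computes only this input measure, not Johnson et al.'s hierarchical Bayesian model, and `search_depth` is a hypothetical name.

```python
from collections import defaultdict


def search_depth(visits):
    """visits: iterable of (household_id, month, category, site).
    Returns {(household_id, month, category): number of distinct sites visited}."""
    sites = defaultdict(set)
    for household, month, category, site in visits:
        sites[(household, month, category)].add(site)  # repeat visits count once
    return {key: len(s) for key, s in sites.items()}
```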
16. Know your visitors better: Customer behavior
- Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1-2): 29-40
- WHY do the customers visit?
- Preprocessing: content processing
- Method: clustering of sessions by visiting-behavior parameters and content parameters
- Data: usage and content
- Conclusion
17. Know your visitors better: Customer behavior
- Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1-2): 29-40
18. Know your visitors better: Customer behavior
- Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research 41(3): 306-323
- How do the customers visit?
- Predicts online buying by linking the purchase decision to what visitors do and to what they are exposed to while at the site
- Preprocessing: content processing
- Method: statistical modeling
- Data: usage and content
- Conclusion
19. Know your visitors better: Customer behavior
- Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research 41(3): 306-323
- Browsing behavior (i.e., time and page views)
- Repeat visitation to the site (return and total number of sessions)
- Use of interactive decision aids
- Data input effort and information gathering and processing
- A series of page-specific characteristics
20. Know your visitors better: Customer behavior
- My research: online customer lifetime
- Predict an individual's tendency to stay with an e-tailer
- Hybrid: BG/NBD model and neural networks
- Interested in the relationship between online customer lifetime and firm profitability
- Preprocessing: households identified by client-side programs; session is month-based
- Method: statistical modeling and classification
- Data: usage
21. Know your visitors better: Customer behavior
- My research: online customer lifetime
- Given N customers with visiting histories (X_i, t_xi, T):
- T: the observed time period
- X_i: number of visits customer i made during T
- t_xi: time of the last visit made by customer i
- Estimate the four parameters r, α, a and b by maximum likelihood
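For reference, assuming the model is the standard BG/NBD formulation of Fader, Hardie and Lee (2005), the likelihood of customer i's history and the estimation problem can be written as follows, where B is the beta function, Γ the gamma function, and δ the indicator that equals 1 when X_i > 0:

```latex
\begin{aligned}
L_i(r,\alpha,a,b \mid X_i, t_{x_i}, T)
  &= \frac{B(a,\, b + X_i)}{B(a,b)}\,
     \frac{\Gamma(r + X_i)\,\alpha^{r}}{\Gamma(r)\,(\alpha + T)^{r + X_i}} \\
  &\quad + \delta_{X_i > 0}\,
     \frac{B(a + 1,\, b + X_i - 1)}{B(a,b)}\,
     \frac{\Gamma(r + X_i)\,\alpha^{r}}{\Gamma(r)\,(\alpha + t_{x_i})^{r + X_i}} \\[4pt]
(\hat r, \hat\alpha, \hat a, \hat b)
  &= \arg\max_{r,\alpha,a,b > 0} \; \sum_{i=1}^{N} \ln L_i(r,\alpha,a,b \mid X_i, t_{x_i}, T)
\end{aligned}
```

The first term covers a customer who is still active at T; the second covers a customer who dropped out right after the last observed visit at t_xi, which is only possible when at least one visit occurred.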
22. Know your visitors better: Customer behavior
- Given r, α, a and b, we can predict:
- Expected total number of visits during a time period t (starting from time 0)
- Expected number of visits an individual will make in the next t time units, Y(t) (from T+1 to T+t)
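Both predictions have closed forms. A minimal sketch, assuming the standard BG/NBD expressions of Fader, Hardie and Lee (2005): it covers only the BG/NBD half of the hybrid (the neural-network part is not shown), the function names are hypothetical, and fitting r, α, a, b (for example by passing the summed log-likelihood to a numerical optimizer) is left out. The Gaussian hypergeometric function 2F1 is evaluated by its power series, which converges here because its argument is always below 1. The formulas require a ≠ 1.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric series 2F1(a, b; c; z), valid for |z| < 1."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z  # ratio of successive terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def bgnbd_loglik(r, alpha, a, b, x, tx, T):
    """Log-likelihood of one history: x visits, last one at tx, observed over (0, T]."""
    common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
    term1 = log_beta(a, b + x) - log_beta(a, b) + common - (r + x) * math.log(alpha + T)
    if x == 0:
        return term1
    term2 = log_beta(a + 1, b + x - 1) - log_beta(a, b) + common - (r + x) * math.log(alpha + tx)
    m = max(term1, term2)  # log(exp(term1) + exp(term2)), computed stably
    return m + math.log(math.exp(term1 - m) + math.exp(term2 - m))


def expected_visits(r, alpha, a, b, t):
    """E[X(t)]: expected visits in (0, t] for a randomly chosen customer."""
    z = t / (alpha + t)
    return (a + b - 1) / (a - 1) * (1 - (alpha / (alpha + t)) ** r * hyp2f1(r, b, a + b - 1, z))


def conditional_expected_visits(r, alpha, a, b, x, tx, T, t):
    """E[Y(t) | x, tx, T]: expected visits in the next t time units after T."""
    z = t / (alpha + T + t)
    num = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(r + x, b + x, a + b + x - 1, z))
    den = 1.0
    if x > 0:  # chance that the customer already dropped out after the last visit
        den += a / (b + x - 1) * ((alpha + T) / (alpha + tx)) ** (r + x)
    return num / den
```

For example, with illustrative parameters, `conditional_expected_visits(r, alpha, a, b, 5, 38.0, 39.0, 39.0)` exceeds the same call with `tx=5.0`: of two customers with the same number of visits, the one who visited recently is more likely to still be active.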
23. Know your visitors better: Customer behavior
- My research: online customer lifetime
24. Web Usage Mining: The Future