Title: Evaluation for Web Mining Applications
1. Evaluation for Web Mining Applications
- Bettina Berendt, Humboldt University Berlin
- Ernestina Menasalvas, Universidad Politécnica de Madrid
- Myra Spiliopoulou, Otto von Guericke University Magdeburg
- www.wiwi.hu-berlin.de/berendt/Evaluation
2. Evaluation
- The act of ascertaining the value and the functioning of an object according to specified criteria, operationalised by measures.
- → to assess concrete achievements
- → to give feedback towards improvement
3. Evaluation for Web mining applications, or evaluation of Web applications
- Is this a good Website?
4. Agenda
- Evaluation and Web mining
- Mining for evaluation perspectives and measures
- A case study
- Outlook: Evaluation of mining
  - Web mining as a project: towards a methodology
  - Evaluation and experimentation
5. What is Web Mining?
- Despite its success, one problem of the current WWW is that much of the knowledge it contains lies dormant in the data.
- Web mining tries to overcome this problem by applying data mining techniques to the content, (hyperlink) structure, and usage of Web resources.
- Web mining areas: Web content mining, Web structure mining, Web usage mining
6. Application problems and typical pattern discovery techniques

| Application problem | Typical pattern discovery technique |
| --- | --- |
| Prediction of next event | Markov chains, sequence mining |
| Discovery of associated events / application objects | Association rules |
| Discovery of visitor groups with common properties / interests | Clustering |
| Discovery of visitor groups with common behaviour | Session clustering |
| Characterization of visitors with respect to a set of predefined classes (e.g., card fraud detection) | Classification |
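The first row can be made concrete with a few lines of code. Below is a minimal sketch of next-event prediction via a first-order Markov chain; the sessions and page names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream sessions (page names are invented for illustration).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "product", "cart"],
    ["home", "search", "product", "exit"],
]

# Count first-order transitions: page -> next page, across all sessions.
transitions = defaultdict(Counter)
for session in sessions:
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

def predict_next(page):
    """Return the most likely next page and its estimated probability."""
    counts = transitions[page]
    if not counts:
        return None, 0.0
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(predict_next("search"))   # ('product', 1.0)
print(predict_next("product"))  # ('cart', 0.5)
```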
7. Knowledge discovery steps: the Cross-Industry Standard Process for Data Mining (CRISP-DM)
8. Agenda
- Evaluation and Web mining
- Mining for evaluation perspectives and measures
- A case study
- Outlook: Evaluation of mining
  - Web mining as a project: towards a methodology
  - Evaluation and experimentation
9. Application problems and goals (1)
- Top-level goal 1: The Web exists in order to be used.
- → Evaluation focuses on usage.
- Goals of usage depend on stakeholder and viewpoint.
10. Application problems and goals (2)
- Stakeholders:
  - Site users
  - Site owners / sponsors (technical, marketing, management, ...)
- Viewpoints: a Web site / a collection of Web sites or pages as ...
  - ... a piece of software → usability?
  - ... a distribution channel for a business or organization → profitability? market analysis, recommendations for cross-selling, ...
  - ... a collection of documents → frequency of use / public perception? competition analysis
  - ... a medium for a given content and tasks (e.g., e-Learning) → cf. distribution channel
  - ... a web of connections (e.g., a social network) → what properties does the network have?
11. Is the site a good site? → Is it successful? But what does success mean?
- Before talking of success:
  - Why does the site exist?
  - Why should someone visit it?
  - Why should someone return to it?
- After answering these questions:
  - Does the site satisfy its owner?
  - Does the site satisfy its users?
  - ALL the users?
12. The object of evaluation: usability
- Usability: The effectiveness, efficiency, and satisfaction with which specified users achieve specified goals in particular environments.
- Effectiveness: The accuracy and completeness with which specified users can achieve specified goals in particular environments.
- Efficiency: The resources expended in relation to the accuracy and completeness of goals achieved.
- Satisfaction: The comfort and acceptability of the work system to its users and other people affected by its use.
13. The measures: examples of usability metrics

| Usability Objective | Effectiveness Measures | Efficiency Measures | Satisfaction Measures |
| --- | --- | --- | --- |
| Suitability for the Task | Percentage of goals achieved | Time to complete a task | Rating scale for satisfaction |
| Appropriate for trained users | Number of "power features" used | Relative efficiency compared with an expert user | Rating scale for satisfaction with "power features" |
| Learnability | Percentage of functions learned | Time to learn criterion | Rating scale for "ease of learning" |
| Error Tolerance | Percentage of errors corrected successfully | Time spent on correcting errors | Rating scale for error handling |
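As a toy illustration of how such metrics are computed, the following sketch derives one effectiveness, efficiency, and satisfaction value from hypothetical task observations (all records and numbers are invented).

```python
# Hypothetical task observations: (goal achieved?, seconds, satisfaction 1-5).
tasks = [
    (True, 42.0, 4), (True, 55.0, 5), (False, 120.0, 2), (True, 38.0, 4),
]

achieved = [t for t in tasks if t[0]]
effectiveness = len(achieved) / len(tasks)                 # share of goals achieved
efficiency = sum(t[1] for t in achieved) / len(achieved)   # mean time on successful tasks
satisfaction = sum(t[2] for t in tasks) / len(tasks)       # mean rating

print(f"effectiveness={effectiveness:.0%}, "
      f"efficiency={efficiency:.1f}s, satisfaction={satisfaction:.2f}/5")
```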
14. Examples of usability measures derived from Web mining
- Berendt & Spiliopoulou (2000): sequential patterns
  - Search criteria (interface): selection-based is most popular (→ user satisfaction), but least efficient; type-in is least popular, most efficient.
  - Search criteria (content): location is most popular.
- Kralisch & Berendt (2004): quasi-experimental design, support, sequential patterns → search criteria popularity is influenced by country / culture
- Poblete & Baeza-Yates (2004): query clustering → identify the need for hyperlinks and new content
- Stojanovic et al. (2002): popularity → identify need for new content / concepts, and concepts to be dropped (ontology evolution); crawler obtains content
15. What does success mean? (continued)
- Before talking of success:
  - Why does the site exist? → Business goals
  - Why should someone visit it? → Value creation
  - Why should someone return to it? → Sustainable value
- After answering these questions:
  - Does the site satisfy its owner? → Application-centric measures
  - Does the site satisfy its users? → User-centric measures
  - ALL the users? → User types
16. The object of evaluation: satisfaction of business goals
- 1. Sale of products/services online (personalisation, cross-/up-selling, site design, selling): Amazon sells books (etc.) online. The site should help the users find the most suitable books for their needs, identify more related products of interest and, finally, purchase them in a secure and intuitive way.
- 2. Marketing for products/services to be acquired offline: insurance companies, banks, application service providers, etc. Providers of services based on a long-term relationship with the customer do not sell online to unknown users. The site should demonstrate to the users the quality of the product/service and the trustworthiness of its owner, and initiate an offline contact.
- 3. Reduction of internal costs, information dissemination, ...
17. The measures: example e-marketing metrics based on the sales process
- Customer-company interaction phases: Information → Acquisition → Negotiation → Transaction → After-sales support
- Ratio of persons going from one phase to the next → positive and negative measures
- Example: conversion rate = customers / contacted prospects (see the sketch below)
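A minimal sketch of these ratios with invented phase counts (the phase names and numbers are assumptions, not data from the case study):

```python
# Hypothetical visitor counts per interaction phase (invented numbers).
phases = ["information", "acquisition", "negotiation", "transaction", "after-sales"]
visitors = {"information": 10000, "acquisition": 3200,
            "negotiation": 900, "transaction": 310, "after-sales": 120}

# Phase-to-phase ratios: how many people move on to the next phase.
for prev, nxt in zip(phases, phases[1:]):
    print(f"{prev} -> {nxt}: {visitors[nxt] / visitors[prev]:.1%}")

# Conversion rate = customers / contacted prospects.
print(f"overall conversion: {visitors['transaction'] / visitors['information']:.2%}")
```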
18. Agenda
- Evaluation and Web mining
- Mining for evaluation perspectives and measures
- A case study
- Outlook: Evaluation of mining
  - Web mining as a project: towards a methodology
  - Evaluation and experimentation
19. Objectives of the application: the largest European full multi-channel e-tailer, selling consumer electronics online and in >5000 shops
- General objectives: standard e-tailer goals, i.e., attract users/shoppers and convert them into customers
- Specific objectives: assess the success of the Web site in relation to other distribution channels
- → Questions of the evaluation:
  - What business metrics can be calculated from Web usage data, transaction and demographic data for determining online success?
  - Are there cross-channel effects between a company's e-shop and its physical stores?
- Background: Internet market shares (BCG 2002)
- Teltzrow & Berendt, Proc. WebKDD 2003; Günther, Proc. 4th IBM eBusiness Conference 2003
20. Outline of the KDD process
- Business understanding: see previous slide
- Data: >90K Web server sessions, >10K transaction records (21 days in 2002)
- Data understanding (main step): modelling the semantics of the site in terms of a hierarchy of service concepts that follows the phases of the sales process
- Data preparation: session IDs, usual data cleaning steps; linking of sessions and transaction information (anonymized); see the sessionization sketch below
- Modelling / pattern discovery: Web metrics, cluster analysis, association rules, sequence mining; correlation analysis, questionnaire study, qualitative market analysis
- Pattern evaluation: interesting patterns
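The data-preparation step can be illustrated with a minimal sessionization sketch: requests are grouped into sessions with a 30-minute inactivity timeout. The log format and entries are hypothetical; real preparation also involves cleaning and transaction linking.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

# Illustrative pre-parsed log entries: (visitor_id, timestamp, url).
log = [
    ("v1", datetime(2002, 5, 1, 10, 0), "/home"),
    ("v1", datetime(2002, 5, 1, 10, 5), "/tv/overview"),
    ("v1", datetime(2002, 5, 1, 11, 30), "/home"),   # >30 min gap: new session
    ("v2", datetime(2002, 5, 1, 10, 2), "/checkout"),
]

sessions, last_seen, current = [], {}, {}
for visitor, ts, url in sorted(log, key=lambda e: (e[0], e[1])):
    # Start a new session on first sight of a visitor or after a timeout.
    if visitor not in current or ts - last_seen[visitor] > TIMEOUT:
        current[visitor] = []
        sessions.append(current[visitor])
    current[visitor].append(url)
    last_seen[visitor] = ts

print(sessions)  # [['/home', '/tv/overview'], ['/home'], ['/checkout']]
```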
21. Starting point: Web life-cycle metrics, micro-conversion rates (Cutler and Sterne, 2001)
- Funnel: W (whole population) → S (suspects / site visitors) → P (prospects / active investigators) → C (customers); at each stage the remainder drops out (non-suspects, non-prospects, non-customers; Cb: abandoned cart).
- C splits into C1 (one-time customers), CR (repeat customers), and CA (attrited customers).
- Metrics example: click-through rate = M2 / M1
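A minimal sketch of such funnel ratios with invented counts (M2/M1 refers to Cutler and Sterne's numbered metrics; the values and ratio names below are assumptions):

```python
# Hypothetical funnel counts (invented numbers).
funnel = {"S": 90000, "P": 12000, "C": 1500}  # suspects, prospects, customers

look_to_click = funnel["P"] / funnel["S"]  # visitors who become active investigators
click_to_buy = funnel["C"] / funnel["P"]   # prospects who become customers
conversion = funnel["C"] / funnel["S"]     # visitors who become customers

print(f"look-to-click: {look_to_click:.1%}, "
      f"click-to-buy: {click_to_buy:.1%}, conversion: {conversion:.2%}")
```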
22. Extension for application-oriented success measurement: multi-channel metrics
- Payment channel (M5): customers C split into WM5 (paid online) and SM5 (paid in store). Follow-up metrics: WM5 customers who belong to WM5 in every following transaction vs. those who belong to SM5 in at least one following transaction (and analogously for SM5 customers).
- Delivery channel (M6): customers C split into WM6 (direct delivery) and SM6 (pick up in store), with the analogous follow-up metrics.
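A minimal sketch of such a channel-migration metric, assuming per-customer transaction histories with a channel flag ('W' = online, 'S' = store); the histories are invented:

```python
# Hypothetical per-customer transaction histories (invented for illustration).
histories = {
    "c1": ["W", "W", "W"],   # stays online in every following transaction
    "c2": ["W", "S"],        # migrates to the store at least once
    "c3": ["S", "S"],
    "c4": ["W", "W", "S"],
}

# Customers whose first transaction was online (WM5-style group).
first_online = {c: h for c, h in histories.items() if h[0] == "W"}
stayed = sum(all(ch == "W" for ch in h[1:]) for h in first_online.values())
migrated = len(first_online) - stayed

print(f"started online: {len(first_online)}, "
      f"stayed online: {stayed}, migrated to store: {migrated}")
```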
23. Internal consistency of preferences: payment and delivery preferences
- Online payment → direct delivery (s=0.27, c=0.97): < 1/3 traditional online users!
- Online payment → in-store pickup (s=0.02, c=0.03)
- Cash on delivery → direct delivery (s=0.02, c=0.03)
- In-store payment → in-store pickup (s=0.69, c=0.94)
- → The site is primarily used to collect information.
(s = support, c = confidence of the sequence; the sketch below shows how such values are computed)
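How such support and confidence values are obtained can be sketched as follows; the transaction records here are invented, whereas the real analysis ran over the linked session/transaction data:

```python
# Hypothetical transactions: (payment method, delivery method).
transactions = [
    ("online", "direct"), ("online", "direct"), ("online", "pickup"),
    ("in-store", "pickup"), ("in-store", "pickup"), ("cash-on-delivery", "direct"),
]

def support_confidence(payment, delivery):
    """Support and confidence of the rule: payment -> delivery."""
    matching = sum(1 for p, d in transactions if p == payment and d == delivery)
    antecedent = sum(1 for p, _ in transactions if p == payment)
    support = matching / len(transactions)
    confidence = matching / antecedent if antecedent else 0.0
    return support, confidence

s, c = support_confidence("online", "direct")
print(f"online payment -> direct delivery: s={s:.2f}, c={c:.2f}")  # s=0.33, c=0.67
```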
24. Development of preferences over time
- Direct delivery → in-store pickup in ≥1 following transaction (s=0.001, c=0.15)
- Direct delivery → direct delivery in all following transactions (s=0.003, c=0.85)
- In-store pickup → direct delivery in ≥1 following transaction (s=0.001, c=0.10) (*)
- In-store pickup → in-store pickup in all following transactions (s=0.004, c=0.90)
- Results for payment migration are similar.
- → 90% of repeat customers did not change transaction preferences at all.
- → Rule (*) as an indicator of the development of trust?!
25. Agenda
- Evaluation and Web mining
- Mining for evaluation perspectives and measures
- A case study
- Outlook: Evaluation of mining
  - Web mining as a project: towards a methodology
  - Evaluation and experimentation
26. Evaluation of Web mining applications, or Web mining as a project
- Is it worthwhile to do the mining project?
- Are the data appropriate for the mining project?
- Is the result valuable for the application?
- Are the techniques appropriate for the expected results?
- Are (all) the tasks performed well?
27. Evaluation, its foci, and design of evaluation studies

| Mode | Formative | Summative |
| --- | --- | --- |
| Purpose | Understand how something works; analyze strengths and weaknesses towards improvement; give feedback | Assess concrete achievements; give results and evidence |
| Conceptualisation | Holistic, interdependent system | Independent and dependent variables |
| Design | Naturalistic inquiry | Experimental design |
| Relationship to prior knowledge | Exploratory, hypothesis-generating (→ pattern discovery) | Confirmatory, hypothesis-testing |
| Sampling | Purposeful, key informants (→ in mining: interesting patterns) | Random, probabilistic |
| Analysis | Case studies, content and pattern analysis | Descriptive and inferential statistics |
28. End of Part I
Questions thus far?
29. Evaluation of Web mining applications, or Web mining as a project
- Is it worthwhile to do the mining project?
- Is the result valuable for the application?
- Are (all) the tasks performed well?
30. For which measures are field data from Web server logs (in)adequate data sources?

| Usability Objective | Effectiveness Measures | Efficiency Measures | Satisfaction Measures | What server logs lack |
| --- | --- | --- | --- | --- |
| Suitability for the Task | Percentage of goals achieved | Time to complete a task | Rating scale for satisfaction | The user's task / intentions → assumptions can be made if there is background knowledge about site and users |
| Appropriate for trained users | Number of "power features" used | Relative efficiency compared with an expert user | Rating scale for satisfaction with "power features" | The user's level of expertise → requires (1) target-group-specific logins, (2) induction from requested content, or (3) other methods, usually involving reactive data collection |
| Learnability | Percentage of functions learned | Time to learn criterion | Rating scale for "ease of learning" | Definitions of what there is to learn and measures of what the users learned → usually requires methods involving reactive data collection |
| Error Tolerance | Percentage of errors corrected successfully | Time spent on correcting errors | Rating scale for error handling | A definition of what an error is, or what indicates an error → usually requires detailed knowledge of users' tasks and intentions, i.e., reactive data collection |

Further objectives: internationalization, accessibility, personalization.