Empirical Evaluations of Organizational Memory Information Systems
Felix-Robinson Aschoff, Ludger van Elst
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
What is Empirical Evaluation?
"Empirical evaluation refers to the appraisal of a theory by observation in experiments." (Chin, 2001)
Experiment or not?

Experiment
- Advantages: influencing variables can be controlled; causal statements can be inferred
- Problems: artificial setting; transfer to the normal user context; requires concrete hypotheses; subjects must be found, recruited and paid

Less controlled exploratory study
- Advantages: more realistic (higher external validity); can be easier and faster to design
- Problems: influencing variables cannot be controlled; requires cooperation with people during their everyday work
Artificial Intelligence vs. Intelligence Amplification

AI (expert systems)
- Development goal: mind-imitating; a user-independent system working by itself
- Evaluation: focus on technical evaluation, i.e. whether the system meets its requirements

IA (OMIS, FRODO)
- Development goal: hybrid solution; cooperation between system and human user; constant interaction
- Evaluation: focus must be on the cooperation of system and user; human-in-the-loop studies
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
Contributions from related fields
1. Knowledge Engineering
   1.1 General approaches: the Sisyphus Initiative, High Performance Knowledge Bases, Essential Theory Approach, Critical Success Metrics
   1.2 Knowledge Acquisition
   1.3 Ontologies
2. Human Computer Interaction
3. Information Retrieval
4. Software Engineering (Goal-Question-Metric Technique)
The Sisyphus Initiative
A series of challenge problems for the development of KBS by different research groups, with a focus on PSMs.
- Sisyphus-I: Room allocation
- Sisyphus-II: Elevator configuration
- Sisyphus-III: Lunar igneous rock classification
- Sisyphus-IV: Integration over the web
- Sisyphus-V: High quality knowledge base initiative (hQkb) (Menzies, 1999b)
Problems of the Sisyphus Initiative
Sisyphus I and II:
- No higher referees
- No common metrics
- Focus on modelling of knowledge; the effort to build a model of the domain knowledge was usually not recorded
- Important aspects like the accumulation of knowledge and cost-effectiveness calculations were not paid any attention
Sisyphus III:
- Funding
- Willingness of researchers to participate

"...none of the Sisyphus experiments have yielded much evaluation information (though at the time of this writing Sisyphus-III is not complete)" (Shadbolt et al., 1999)
High Performance Knowledge Bases
- Run by the Defense Advanced Research Projects Agency (DARPA) in the USA
- Goal: to increase the rate at which knowledge can be modified in a KBS
- Three groups of researchers:
  1) challenge problem developers
  2) technology developers
  3) integration teams
HPKB Challenge Problem
- International crisis scenario in the Persian Gulf:
  - Hostilities between Saudi Arabia and Iran
  - Iran closes the Strait of Hormuz to international shipping
- Integration of the following KBs:
  - the HPKB upper-level ontology (Cycorp)
  - the World Fact Book knowledge base (Central Intelligence Agency)
  - the Units and Measures Ontology (Stanford)
- Example questions the system should be able to answer:
  - With what weapons is Iran capable of firing upon tankers in the Strait of Hormuz?
  - What risk would Iran face in closing the strait to shipping?
- The answer key to the second question contains, for example: economic sanctions from Saudi Arabia, GCC, U.S., UN, because Iran violates an international norm promoting freedom of the seas. (Source: The Convention on the Law of the Sea)
HPKB Evaluation
- Systems' answers were rated on four official criteria by challenge problem developers and subject matter experts (scale 0-3):
  - the correctness of the answer
  - the quality of the explanation of the answer
  - the completeness and quality of the cited sources
  - the quality of the representation of the question
- Two-phase, test-retest schedule
Essential Theory Approach (Menzies & van Harmelen, 1999)
Different schools of knowledge engineering
Technical evaluation of ontologies (Gómez-Pérez, 1999)
Criteria:
1) Consistency
2) Completeness
3) Conciseness
4) Expandability
5) Sensitiveness
Errors in developing taxonomies:
- Circularity errors (a detection sketch follows below)
- Partition errors
- Redundancy errors
- Grammatical errors
- Semantic errors
- Incompleteness errors
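Of these error classes, circularity is the easiest to check mechanically. A minimal, hypothetical Python sketch (not from Gómez-Pérez or any FRODO tool) that flags classes occurring among their own transitive superclasses:

```python
# Minimal sketch: detecting circularity errors (Gómez-Pérez, 1999) in a
# taxonomy given as {class: [direct superclasses]}. All names are
# illustrative, not part of FRODO or any specific tool.

def find_circularities(taxonomy):
    """Return classes that (transitively) occur among their own superclasses."""
    def reachable(cls, seen):
        for sup in taxonomy.get(cls, []):
            if sup in seen:
                continue
            seen.add(sup)
            reachable(sup, seen)
        return seen

    return [c for c in taxonomy if c in reachable(c, set())]

taxonomy = {
    "Dog": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": ["Dog"],   # circularity: Dog -> Mammal -> Animal -> Dog
    "Fish": ["Animal"],
}
print(find_circularities(taxonomy))  # ['Dog', 'Mammal', 'Animal']
```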
Related Fields
- Knowledge Acquisition
  - Shadbolt, N., O'Hara, K. & Crow, L. (1999). The experimental evaluation of knowledge acquisition techniques and methods: history, problems and new directions. International Journal of Human-Computer Studies, 51, 729-755.
- Human Computer Interaction
  - "HCI is the study of how people design, implement, and use interactive computer systems, and how computers affect individuals and society." (Myers et al., 1996)
  - facilitate interaction between users and computer systems
  - make computers useful to a wider population
- Information Retrieval
  - Recall and Precision (sketched below)
  - e.g. keyword-based IR vs. ontology-enhanced IR (Aitken & Reid, 2000)
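Recall and precision are simple set ratios; a minimal Python sketch (function name and document IDs are illustrative, no specific corpus or tool is implied):

```python
# Minimal sketch of the standard IR metrics named on the slide.

def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|;
    recall = |retrieved & relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# e.g. comparing keyword-based vs. ontology-enhanced retrieval runs
p, r = precision_recall(retrieved=["d1", "d3", "d7"], relevant=["d1", "d2", "d3"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```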
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
Guideline for Evaluation
- Formulate the main purposes of your framework or application.
- Formulate precise hypotheses.
- Define clear performance metrics.
- Standardize the measurement of your performance metrics.
- Be thorough when creating your (experimental) research design.
- Consider the use of inference statistics. (Cohen, 1995; see the sketch below)
- Meet common standards for the report of your results.
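As one concrete reading of the "inference statistics" item, a two-sample t-test comparing task-completion times of two randomized groups is a typical choice. The sketch below uses invented numbers and assumes SciPy is available:

```python
# Minimal sketch of the "inference statistics" step (Cohen, 1995):
# comparing task-completion times of two randomized groups with a
# two-sample t-test. The data are invented for illustration.
from scipy import stats

group_a = [12.1, 10.4, 11.8, 13.0, 12.5, 11.1]  # e.g. with tool support
group_b = [14.2, 13.8, 15.1, 14.9, 13.5, 15.6]  # e.g. without tool support

t, p = stats.ttest_ind(group_a, group_b)
print(f"t={t:.2f}, p={p:.4f}")  # reject H0 at alpha=0.05 if p < 0.05
```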
Evaluation of Frameworks
Frameworks are general in scope and designed to cover a wide range of tasks and problems. The systematic control of influencing variables becomes very difficult.

"Only a whole series of experiments across a number of different tasks and a number of different domains could control for all the factors that would be essential to take into account." (Shadbolt et al., 1999)

Approaches:
- Sisyphus Initiative
- Essential Theory Approach (Menzies & van Harmelen, 1999)
Problems with the Evaluation of FRODO
- Difficulty of controlling influencing variables when evaluating entire frameworks
- FRODO is not a running system (yet)
- Only a few prototypical implementations are based on FRODO
- FRODO is probably underspecified for evaluation in many areas
Goal-Question-Metric Technique (Basili, Caldiera & Rombach, 1994)
Informal FRODO Project Goals
- FRODO will provide a flexible, scalable framework for evolutionary growth of distributed OMs
- FRODO will provide a comprehensive toolkit for the automatic or semi-automatic construction and maintenance of domain ontologies
- FRODO will improve information delivery by the OM by developing more integrated and more easily adaptable DAU techniques
- FRODO will develop a methodology and tools for business-process oriented knowledge management relying on the notion of weakly-structured workflows
- FRODO is based on the assumption that a hybrid solution, where the system supports humans in the decision-making process, is more appropriate for OMIS than mind-imitating AI systems (IA > AI)
Task Type and Workflows

Task types form a spectrum: negotiation - co-decision making - projects - workflow processes
- Negotiation end: unique, low volume, communication-intensive
- Workflow-process end: repetitive, high volume, heads-down
FRODO GQM Goal concerning workflows

GQM goal of FRODO (a data-structure sketch follows after this list):
- Purpose: compare
- Quality issue: efficiency
- Object (process): task completion with workflows
- Viewpoint: the end-user's viewpoint
- Context: knowledge-intensive tasks

Conceptual level (goals)
- GQM goals should specify:
  - a Purpose
  - a quality Issue
  - a measurement Object
  - a Viewpoint
- The object of measurement can be:
  - Products
  - Processes
  - Resources
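The four-part goal template can be captured in a small data structure; the class below is a hypothetical illustration (not FRODO code), instantiated with the workflow goal from this slide:

```python
# Minimal sketch of the GQM goal template (Basili, Caldiera & Rombach,
# 1994) as a data structure; field names follow the slide, the class
# itself is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class GQMGoal:
    purpose: str        # e.g. "compare", "monitor"
    quality_issue: str  # e.g. "efficiency"
    object: str         # product, process, or resource under study
    viewpoint: str      # whose perspective the measurement takes
    context: str = ""   # optional environment of the measurement

workflow_goal = GQMGoal(
    purpose="compare",
    quality_issue="efficiency",
    object="task completion with workflows",
    viewpoint="end-user",
    context="knowledge-intensive tasks",
)
print(workflow_goal)
```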
GQM Abstraction Sheet for FRODO

Quality factors: efficiency of task completion
Variation factors: task types as described in Abecker (2001) (dimensions: negotiation, co-decision making, projects, workflow processes)
Baseline hypothesis: the experimental design will provide a control group for comparison
Impact of variation factors: KiTs are more successfully supported by weakly-structured, flexible workflows based on FRODO than by a-priori strictly-structured workflows
GQM Questions and Metrics
- What is the efficiency of task completion using FRODO weakly-structured flexible workflows for KiTs?
- What is the efficiency of task completion using a-priori strictly-structured workflows for KiTs?
- What is the efficiency of task completion using FRODO weakly-structured flexible workflows for classical workflow processes?

Metrics:
- Efficiency of task completion: quality of result (expert judgement) divided by the time needed for completion of the task (see the sketch below)
- User-friendliness as judged by users
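The efficiency metric defined above reduces to a single division; a minimal sketch, assuming a 0-10 expert quality scale and time measured in minutes (both assumptions, not fixed by the slide):

```python
# Minimal sketch of the efficiency metric defined on the slide:
# expert-judged result quality divided by completion time.

def task_efficiency(quality_rating: float, completion_minutes: float) -> float:
    """Efficiency = quality of result (expert judgement) / time needed."""
    if completion_minutes <= 0:
        raise ValueError("completion time must be positive")
    return quality_rating / completion_minutes

# e.g. quality rated 0..10 by an expert, time in minutes
print(task_efficiency(quality_rating=8.0, completion_minutes=40.0))  # 0.2
```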
Hypotheses
H1: For KiTs, weakly-structured flexible workflows as proposed by FRODO will yield higher efficiency of task completion than a-priori strictly-structured workflows.
H2: For classical workflow processes, FRODO weakly-structured flexible workflows will be as good as a-priori strictly-structured workflows, or better.
Experimental Design
2 x 2 factorial experiment
- Independent variables: workflow type, task type
- Dependent variable: efficiency of task completion

Conditions:
- weakly-structured flexible workflow / KiT
- strictly-structured workflow / KiT
- weakly-structured flexible workflow / classical workflow process
- strictly-structured workflow / classical workflow process

Within-subject design vs. between-subject design
Randomized groups (15-20 subjects for statistical inference; an assignment sketch follows below)
Possibilities: degradation studies, benchmarking
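For a between-subject design, subjects are randomly assigned to the four cells. A minimal sketch; cell names follow the slide and group sizes match the 15-20 suggested above:

```python
# Minimal sketch of randomized assignment to the four cells of the
# 2 x 2 design (workflow type x task type).
import random

conditions = [
    ("weakly-structured wf", "KiT"),
    ("strictly-structured wf", "KiT"),
    ("weakly-structured wf", "classical wf process"),
    ("strictly-structured wf", "classical wf process"),
]

def randomize(subjects, conditions, seed=42):
    """Shuffle subjects and deal them round-robin into the design cells
    (between-subject design: each subject sees exactly one condition)."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, s in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(s)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 61)]  # 60 subjects -> 15 per cell
for cond, members in randomize(subjects, conditions).items():
    print(cond, len(members))
```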
Empirical Evaluation of Organizational Memory Information Systems
Felix-Robinson Aschoff, Ludger van Elst

1 Introduction
2 Contributions from Related Fields
  2.1 Knowledge Engineering
    2.1.1 General Methods and Guidelines (Essential Theories, Critical Success Metrics, Sisyphus, HPKB)
    2.1.2 Knowledge Acquisition
    2.1.3 Ontologies
  2.2 Human Computer Interaction
  2.3 Information Retrieval
  2.4 Software Engineering (Goal-Question-Metric Technique)
3 Implications for Organizational Memory Information Systems
  3.1 Implications for the evaluation of OMIS
  3.2 Relevant aspects of OMs for evaluations and rules of thumb for conducting evaluative research
  3.3 Preliminary sketch of an evaluation of FRODO
References
Appendix A: Technical evaluation of Ontologies
References

Aitken, S. & Reid, S. (2000). Evaluation of an ontology-based information retrieval tool. Proceedings of the 14th European Conference on Artificial Intelligence. http://delicias.dia.fi.upm.es/WORKSHOP/ECAI00/accepted-papers.html

Basili, V.R., Caldiera, G. & Rombach, H.D. (1994). Goal question metric paradigm. In J.J. Marciniak (Ed.), Encyclopedia of Software Engineering, Vol. 1, 528-532. John Wiley & Sons.

Berger, B., Burton, A.M., Christiansen, T., Corbridge, C., Reichelt, H. & Shadbolt, N.R. (1989). Evaluation criteria for knowledge acquisition. ACKnowledge project deliverable ACK-UoN-T4.1-DL-001B. University of Nottingham, Nottingham.

Chin, D.N. (2001). Empirical evaluation of user models and user-adapted systems. User Modeling and User-Adapted Interaction, 11, 181-194.

Cohen, P. (1995). Empirical Methods for Artificial Intelligence. Cambridge: MIT Press.

Cohen, P.R., Schrag, R., Jones, E., Pease, A., Lin, A., Starr, B., Easter, D., Gunning, D. & Burke, M. (1998). The DARPA high performance knowledge bases project. AI Magazine, 19(4), 25-49.

Gómez-Pérez, A. (1999). Evaluation of taxonomic knowledge in ontologies and knowledge bases. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html

Grüninger, M. & Fox, M.S. (1995). Methodology for the design and evaluation of ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95, Montreal.

Hays, W.L. (1994). Statistics. Orlando: Harcourt Brace.

Kagolovsky, Y. & Moehr, J.R. (2000). Evaluation of information retrieval: old problems and new perspectives. Proceedings of the 8th International Congress on Medical Librarianship. http://www.icml.org/tuesday/ir/kagalovosy.htm

Martin, D.W. (1995). Doing Psychological Experiments. Pacific Grove: Brooks/Cole.

Menzies, T. (1999a). Critical success metrics: evaluation at the business level. International Journal of Human-Computer Studies, 51, 783-799.

Menzies, T. (1999b). hQkb - the high quality knowledge base initiative (Sisyphus V: learning design assessment knowledge). Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html

Menzies, T. & van Harmelen, F. (1999). Editorial: evaluating knowledge engineering techniques. International Journal of Human-Computer Studies, 51, 715-727.

Myers, B., Hollan, J. & Cruz, I. (Eds.) (1996). Strategic directions in human computer interaction. ACM Computing Surveys, 28(4).

Nick, M., Althoff, K. & Tautz, C. (1999). Facilitating the practical evaluation of knowledge-based systems and organizational memories using the goal-question-metric technique. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html

Shadbolt, N., O'Hara, K. & Crow, L. (1999). The experimental evaluation of knowledge acquisition techniques and methods: history, problems and new directions. International Journal of Human-Computer Studies, 51, 729-755.

Tallis, M., Kim, J. & Gil, Y. (1999). User studies of knowledge acquisition tools: methodology and lessons learned. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html

Tennison, J., O'Hara, K. & Shadbolt, N. (1999). Evaluating KA tools: lessons from an experimental evaluation of APECKS. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers/Tennison1/
Tasks for Workflow Evaluation
Possible tasks for a workflow evaluation experiment:

KiT: Please write a report about your personal greatest learning achievements during the last semester. Find sources related to these scientific areas on the Internet. Prepare a PowerPoint presentation. To help you with this task you will be provided with a FRODO weakly-structured workflow / a classical workflow.

Simple structured task: Please install Netscape on your computer and use the Internet to find all universities in Iowa that offer computer science. Use e-mail to ask for further information. To help you with this task you will be provided with a FRODO weakly-structured workflow / a classical workflow.
GQM Goals for CBR-PEB (Nick, Althoff & Tautz, 1999)

GOAL 2: Economic utility
- Analyze: retrieved information
- For the purpose of: monitoring
- With respect to: economic utility
- From the viewpoint of: CBR system developers
- In the context of: decision support for CBR development

Conceptual level (goals)
- GQM goals should specify:
  - a Purpose
  - a quality Issue
  - a measurement Object
  - a Viewpoint
- The object of measurement can be:
  - Products
  - Processes
  - Resources
GQM Abstraction Sheet for CBR-PEB
Goal 2: Economic utility for CBR-PEB

Quality factors:
1. Similarity of retrieved information as modeled in CBR-PEB (Q-12)
2. Degree of maturity (desired: max.): development, prototype, pilot use (Q-13) ...

Variation factors:
1. Amount of background knowledge, e.g. number of attributes (Q-8.1.1) ...
2. Case origin: university, industrial research, industry ...

Baseline hypothesis: 1. M.M.0.2, N.N.0.5 (scale 0..1) ... (the estimates are averages)

Impact of variation factors:
1. The higher the amount of background knowledge, the higher the similarity. (Q-8)
2. The more industrial the case origin, the higher the degree of maturity. (Q-9) ...
GQM Questions and Metrics
GQM plan for CBR-PEB, Goal 2: Economic utility

Q-9: What is the impact of the case origin on the degree of maturity? (a cross-tabulation sketch follows below)
  Q-9.1: What is the case origin?
    M-9.1.1: per retrieval attempt, for each chosen case: case origin (university, industrial research, industry)
  Q-9.2: What is the degree of maturity of the system?
    M-9.2.1: per retrieval attempt, for each chosen case: case attribute "status" (prototype, being developed, pilot system, application in practical use, unknown)
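Q-9 can be answered descriptively by cross-tabulating the two metrics M-9.1.1 and M-9.2.1 over retrieval attempts; the records below are invented for illustration:

```python
# Minimal sketch of evaluating Q-9 (impact of case origin on degree of
# maturity) by cross-tabulating the two metrics. Data are invented.
from collections import Counter

records = [  # (case origin, maturity status) per chosen case
    ("industry", "application in practical use"),
    ("industry", "pilot system"),
    ("industrial research", "prototype"),
    ("university", "prototype"),
    ("university", "being developed"),
]

crosstab = Counter(records)
for (origin, status), n in sorted(crosstab.items()):
    print(f"{origin:20s} {status:30s} {n}")
```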
FRODO GQM Goal concerning Ontologies
For the circumstances FRODO is designed for, hybrid solutions are more successful than AI solutions.
- Purpose: compare
- Issue: the efficiency of
- Object (process): ontology construction and use
- With respect to: stability, sharing scope, formality of information
- Viewpoint: from the users' viewpoint
GQM Abstraction Sheet for FRODO (Ontologies)

Quality factors: efficiency of ontology construction and use
Variation factors: sharing scope, stability, formality
Baseline hypothesis: the experimental design will provide a control group for comparison
Impact of variation factors:
- high sharing scope, medium stability, low formality -> FRODO more successful
- low sharing scope, high stability, high formality -> AI more successful
GQM Questions and Metrics
- What is the efficiency of the ontology construction and use process using FRODO for a situation with high sharing scope, medium stability and low formality?
- What is the efficiency of the ontology construction and use process using FRODO for a situation with low sharing scope, high stability and high formality?
- What is the efficiency of the ontology construction and use process using AI systems for these situations?

Metrics:
- Efficiency of ontology construction: number of definitions / time (sketched below)
- Efficiency of ontology use: information retrieval (recall and precision)
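Both metrics are straightforward to compute; the sketch below covers construction efficiency, while ontology use would reuse the precision/recall sketch from the Information Retrieval slide. The time unit and numbers are assumptions:

```python
# Minimal sketch of the construction metric named on the slide:
# ontology-construction efficiency = number of definitions / time.

def construction_efficiency(num_definitions: int, hours: float) -> float:
    """Definitions produced per hour of knowledge-engineering work."""
    if hours <= 0:
        raise ValueError("time must be positive")
    return num_definitions / hours

# e.g. 120 concept/relation definitions produced in 16 hours of work
print(construction_efficiency(120, 16.0))  # 7.5 definitions per hour
```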
Hypotheses
H1: For situation 1 (high sharing scope, medium stability, low formality) FRODO will yield a higher efficiency of ontology construction and use.
H2: For situation 2 (low sharing scope, high stability, high formality) an AI system will yield a higher efficiency of ontology construction and use.
Experimental Design
2 x 2 factorial experiment
- Independent variables: situation (1/2), system (FRODO/AI)
- Dependent variable: efficiency of ontology construction and use

Conditions:
- situation 1 / FRODO
- situation 2 / FRODO
- situation 1 / AI system
- situation 2 / AI system

Within-subject design vs. between-subject design
Randomized groups (15-20 subjects for statistical inference; an analysis sketch follows below)
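Data from such a 2 x 2 design are conventionally analysed with a two-way ANOVA (main effects of system and situation plus their interaction). A minimal sketch, assuming pandas/statsmodels are available and using invented efficiency scores:

```python
# Minimal sketch of a two-way ANOVA over the 2 x 2 design. Data are
# invented for illustration; real cells would hold 15-20 subjects each.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "system":     ["FRODO"] * 4 + ["AI"] * 4,
    "situation":  ["s1", "s1", "s2", "s2"] * 2,
    "efficiency": [7.5, 8.1, 5.2, 5.6, 4.9, 5.3, 7.8, 8.4],
})

model = ols("efficiency ~ C(system) * C(situation)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p per effect and interaction
```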
Big evaluation versus small evaluation (van Harmelen, 1998)
- Distinguish different types of evaluation:
  - Big evaluation: evaluation of KA/KE methodologies
  - Small evaluation: evaluation of KA/KE components (e.g. a particular PSM)
  - Micro evaluation: evaluation of a KA/KE product (e.g. a single system)
- Some are more interesting than others:
  - Big evaluation is impossible to control
  - Micro evaluation is impossible to generalize
  - Small evaluation might just be the only option
Knowledge Acquisition
Problems with the evaluation of the KA process (Shadbolt et al., 1999):
1) the availability of human experts
2) the need for a gold standard of knowledge
3) the question of how many different domains and tasks should be included
4) the difficulty of isolating the value added by a single technique or tool
5) how to quantify knowledge and knowledge engineering effort
Knowledge Acquisition
4) The difficulty of isolating the value added by a single technique or tool:
- Conduct a series of experiments
- Test different implementations of the same technique against each other, or against a paper-and-pencil version
- Test groups of tools in complementary pairings, or different orderings of the same set of tools
- Test the value of single sessions against multiple sessions, and the effect of feedback in multiple sessions
- Exploit techniques from the evaluation of standard software to control for effects from interface, implementation etc.
- Problem: scale-up of the experimental programme
Essential Theory Approach
- Identify a process of interest.
- Create an essential theory T for that process.
- Identify some competing process description, ¬T.
- Design a study that explores core pathways in both ¬T and T.
- Acknowledge that your study may not be definitive.

Advantage: broad conceptual approach; results are of interest for the entire community.
Problem: interpretation of results is difficult (due to the KE school, or due to concrete technology such as implementation, interface etc.?)
Three Aspects of Ontology Evaluation
- the process of constructing the ontology
- the technical evaluation
- end-user assessment and ontology-user interaction
Assessment and Ontology-User Interaction
"Assessment is focused on judging the understanding, usability, usefulness, abstraction, quality and portability of the definitions from the users' point of view." (Gómez-Pérez, 1999)

Ontology-user interaction in OMIS:
- more dynamic
- the success of an OMIS relies on active use
- users with heterogeneous skills, backgrounds and tasks