1
Empirical Evaluations of Organizational Memory
Information Systems
Felix-Robinson Aschoff & Ludger van Elst
2
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
3
What is Empirical Evaluation?
"Empirical evaluation refers to the appraisal of a theory by observation in experiments." (Chin, 2001)
4
Experiment or not?
Experiment:
  • Advantages: influencing variables can be controlled; causal statements can be inferred
  • Problems: artificial; transfer to the normal user context; requires concrete hypotheses; subjects must be found and paid

Less controlled exploratory study:
  • Advantages: more realistic (higher external validity); can be easier and faster to design
  • Problems: influencing variables cannot be controlled; requires cooperation with people during their everyday work
5
Artificial Intelligence vs Intelligence
Amplification
AI (expert systems):
  • Development goal: mind-imitating; a user-independent system working by itself
  • Evaluation: focus on technical evaluation, i.e. whether the system meets its requirements

IA (OMIS, FRODO):
  • Development goal: hybrid solution; cooperation between system and human user; constant interaction
  • Evaluation: focus must be on the cooperation of system and user; human-in-the-loop studies
6
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
7
Contributions from related fields
1. Knowledge Engineering
   1.1 General Approaches
       - The Sisyphus Initiative
       - High Performance Knowledge Bases
       - Essential Theory Approach
       - Critical Success Metrics
   1.2 Knowledge Acquisition
   1.3 Ontologies
2. Human Computer Interaction
3. Information Retrieval
4. Software Engineering (Goal-Question-Metric Technique)
8
The Sisyphus Initiative
A series of challenge problems for the development of KBS by different research groups, with a focus on PSMs:
Sisyphus-I: Room allocation
Sisyphus-II: Elevator configuration
Sisyphus-III: Lunar igneous rock classification
Sisyphus-IV: Integration over the web
Sisyphus-V: High quality knowledge base initiative (hQkb) (Menzies, 1999)
9
Problems of the Sisyphus Initiative
  • Sisyphus I & II
    • No higher referees
    • No common metrics
    • Focus on modelling of knowledge; the effort to build a model of the domain knowledge was usually not recorded.
    • Important aspects like the accumulation of knowledge and cost-effectiveness calculation were not paid any attention.
  • Sisyphus III
    • Funding
    • Willingness of researchers to participate

"...none of the Sisyphus experiments have yielded much evaluation information (though at the time of this writing Sisyphus-III is not complete)" (Shadbolt et al., 1999)
10
High Performance Knowledge Bases
  • Run by the Defense Advanced Research Projects Agency (DARPA) in the USA
  • Goal: to increase the rate at which knowledge can be modified in a KBS
  • Three groups of researchers:
    1) challenge problem developers
    2) technology developers
    3) integration teams

11
HPKB Challenge Problem
  • International crisis scenario in the Persian Gulf:
    • Hostilities between Saudi Arabia and Iran
    • Iran closes the Strait of Hormuz to international shipping
  • Integration of the following KBs:
    • the HPKB upper-level ontology (Cycorp)
    • the World Fact Book knowledge base (Central Intelligence Agency)
    • the Units and Measures Ontology (Stanford)
  • Example questions the system should be able to answer:
    • With what weapons is Iran capable of firing upon tankers in the Strait of Hormuz?
    • What risk would Iran face in closing the strait to shipping?
  • The answer key to the second question contains, for example: economic sanctions from Saudi Arabia, the GCC, the U.S. and the UN, because Iran violates an international norm promoting freedom of the seas. Source: The Convention on the Law of the Sea

12
HPKB Evaluation
  • System answers were rated on four official criteria by challenge problem developers and subject matter experts (scale: 0-3):
    • the correctness of the answer
    • the quality of the explanation of the answer
    • the completeness and quality of the cited sources
    • the quality of the representation of the question
  • Two-phase, test-retest schedule

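A small sketch of how such ratings might be aggregated per criterion across judges; the scores below are invented for illustration on the 0-3 scale described above:

```python
from statistics import mean

# Hypothetical ratings (scale 0-3) per official criterion, one value per judge.
ratings = {
    "correctness of the answer":        [3, 2, 3],
    "quality of the explanation":       [2, 2, 1],
    "completeness/quality of sources":  [1, 2, 2],
    "representation of the question":   [3, 3, 2],
}
for criterion, scores in ratings.items():
    print(f"{criterion}: mean score {mean(scores):.2f}")
```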
13
Essential Theory Approach
(Menzies & van Harmelen, 1999)
Different schools of knowledge engineering
14
Technical evaluation of ontologies
(Gómez-Pérez, 1999)
Evaluation criteria:
1) Consistency
2) Completeness
3) Conciseness
4) Expandability
5) Sensitiveness
  • Errors in developing taxonomies:
    • Circularity errors
    • Partition errors
    • Redundancy errors
    • Grammatical errors
    • Semantic errors
    • Incompleteness errors

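To illustrate how one of these checks could be automated, the sketch below looks for circularity errors in a subclass taxonomy, i.e. classes that end up as subclasses of themselves; the class names and the edge-list representation are hypothetical, not taken from Gómez-Pérez (1999).

```python
from collections import defaultdict

def find_circularity_errors(subclass_of):
    """Return classes that are (directly or indirectly) subclasses of themselves."""
    graph = defaultdict(list)
    for child, parent in subclass_of:
        graph[child].append(parent)

    errors = []
    for start in list(graph):
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for parent in graph.get(node, []):
                if parent == start:          # walked back to where we began: a cycle
                    errors.append(start)
                    stack = []
                    break
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return errors

# Hypothetical taxonomy containing a circular definition:
taxonomy = [("Report", "Document"), ("Document", "Artifact"), ("Artifact", "Report")]
print(find_circularity_errors(taxonomy))  # ['Report', 'Document', 'Artifact']
```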
15
Related Fields
  • Knowledge Acquisition
    Shadbolt, N., O'Hara, K. & Crow, L. (1999). The experimental evaluation of knowledge acquisition techniques and methods: history, problems and new directions. International Journal of Human-Computer Studies, 51, 729-755.
  • Human Computer Interaction
    "HCI is the study of how people design, implement, and use interactive computer systems, and how computers affect individuals and society." (Myers et al., 1996)
    • facilitate interaction between users and computer systems
    • make computers useful to a wider population
  • Information Retrieval
    • Recall and Precision
    • e.g. key-word based IR vs. ontology-enhanced IR (Aitken & Reid, 2000)

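As a sketch of how the recall/precision comparison mentioned above could be computed, assuming hypothetical document identifiers and retrieval runs:

```python
def recall_precision(retrieved, relevant):
    """Recall = hits / |relevant|; precision = hits / |retrieved|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical comparison: key-word based IR vs. ontology-enhanced IR
relevant = {"d1", "d2", "d3", "d4"}
print(recall_precision({"d1", "d5"}, relevant))        # keyword run:  (0.25, 0.5)
print(recall_precision({"d1", "d2", "d3"}, relevant))  # ontology run: (0.75, 1.0)
```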
16
Empirical Evaluations of OMIS
1. Evaluation: Definition and general approaches
2. Contributions from related fields
3. Implications for FRODO
17
Guideline for Evaluation
  • Formulate the main purposes of your framework or application.
  • Formulate precise hypotheses.
  • Define clear performance metrics.
  • Standardize the measurement of your performance metrics.
  • Be thorough when working out your (experimental) research design.
  • Consider the use of inferential statistics (Cohen, 1995).
  • Meet common standards for the report of your results.

18
Evaluation of Frameworks
Frameworks are general in scope and designed to cover a wide range of tasks and problems. The systematic control of influencing variables therefore becomes very difficult.

"Only a whole series of experiments across a number of different tasks and a number of different domains could control for all the factors that would be essential to take into account." (Shadbolt et al., 1999)
  • Approaches:
    • Sisyphus Initiative
    • Essential Theory Approach (Menzies & van Harmelen, 1999)

19
Problems with the Evaluation of FRODO
  • Difficulty of controlling influencing variables when evaluating entire frameworks
  • FRODO is not a running system (yet)
  • Only a few prototypical implementations are based on FRODO
  • FRODO is probably underspecified for evaluation in many areas

20
Goal-Question-Metric Technique
(Basili, Caldiera & Rombach, 1994)
21
Informal FRODO Project Goals
  • FRODO will provide a flexible, scalable framework for the evolutionary growth of distributed OMs.
  • FRODO will provide a comprehensive toolkit for the automatic or semi-automatic construction and maintenance of domain ontologies.
  • FRODO will improve information delivery by the OM by developing more integrated and more easily adaptable DAU techniques.
  • FRODO will develop a methodology and tools for business-process oriented knowledge management relying on the notion of weakly-structured workflows.
  • FRODO is based on the assumption that a hybrid solution, where the system supports humans in the decision-making process, is more appropriate for OMIS than mind-imitating AI systems (IA > AI).

22
Task Type and Workflows
Task types form a continuum from negotiation and co-decision making, through projects, to workflow processes: at one end tasks are unique, low volume and communication intensive; at the other they are repetitive, high volume and heads down.
23
FRODO GQM Goal concerning workflows
GQM goal of FRODO:
  • Purpose: compare
  • Quality issue: efficiency
  • Object (process): task completion with workflows
  • Viewpoint: the end-user
  • Context: knowledge intensive tasks

Conceptual level (goals):
  • GQM goals should specify
    • a Purpose
    • a quality Issue
    • a measurement Object
    • a Viewpoint
  • The object of measurement can be
    • Products
    • Processes
    • Resources

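A minimal sketch of how such a GQM goal could be captured as a data structure and later refined into questions and metrics; the class and field names are our own invention, not part of GQM or FRODO:

```python
from dataclasses import dataclass, field

@dataclass
class GQMGoal:
    """One GQM goal: purpose, quality issue, object of measurement, viewpoint, context."""
    purpose: str
    quality_issue: str
    object_of_measurement: str   # a product, a process, or a resource
    viewpoint: str
    context: str
    questions: list = field(default_factory=list)  # GQM questions refined from this goal

frodo_workflow_goal = GQMGoal(
    purpose="compare",
    quality_issue="efficiency",
    object_of_measurement="task completion with workflows (process)",
    viewpoint="end-user",
    context="knowledge intensive tasks",
)
print(frodo_workflow_goal)
```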
24
GQM Abstraction Sheet for FRODO
Quality factors: efficiency of task completion
Variation factors: task types as described in Abecker (2001) (dimension: negotiation, co-decision making, projects, workflow processes)
Baseline hypothesis: the experimental design will provide a control group for comparison
Impact of variation factors: KiTs are more successfully supported by weakly-structured, flexible workflows based on FRODO than by a-priori strictly structured workflows.
25
GQM Questions and Metrics
Questions:
  • What is the efficiency of task completion using FRODO weakly-structured flexible workflows for KiTs?
  • What is the efficiency of task completion using a-priori strictly-structured workflows for KiTs?
  • What is the efficiency of task completion using FRODO weakly-structured flexible workflows for classical workflow processes?
Metrics:
  • Efficiency of task completion: quality of the result (expert judgement) divided by the time needed to complete the task
  • User-friendliness as judged by users
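A minimal sketch of how the efficiency metric defined above could be computed from collected data; the rating scale and variable names are assumptions made for illustration:

```python
def task_efficiency(expert_quality_rating, completion_time_minutes):
    """Efficiency of task completion = quality of result (expert judgement) / time needed."""
    if completion_time_minutes <= 0:
        raise ValueError("completion time must be positive")
    return expert_quality_rating / completion_time_minutes

# Hypothetical data: quality rated 0-10 by an expert, time in minutes.
print(task_efficiency(8, 40))  # 0.2
print(task_efficiency(6, 20))  # 0.3
```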
26
Hypotheses
H1: For KiTs, weakly-structured flexible workflows as proposed by FRODO will yield a higher efficiency of task completion than a-priori strictly-structured workflows.
H2: For classical workflow processes, FRODO weakly-structured flexible workflows will be as good as, or better than, a-priori strictly-structured workflows.
27
Experimental Design
2 x 2 factorial experiment
Independent variables: workflow type, task type
Dependent variable: efficiency of task completion

Design cells:
  • KiT with weakly-structured flexible workflow
  • KiT with strictly-structured workflow
  • classical workflow process with weakly-structured flexible workflow
  • classical workflow process with strictly-structured workflow

Within-subject vs. between-subject design; randomized groups (15-20 participants per group for statistical inference). Further possibilities: degradation studies, benchmarking.
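As a sketch of how the resulting data could be analysed with inferential statistics (Cohen, 1995), the snippet below runs a two-way ANOVA over the 2 x 2 design; the simulated group sizes, effect sizes and the use of statsmodels are assumptions, not part of the FRODO plan:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Simulated efficiency scores for a 2 x 2 between-subject design,
# 20 randomized participants per cell (hypothetical effect sizes).
rows = []
for workflow in ("weakly_structured", "strictly_structured"):
    for task in ("KiT", "classical"):
        mean = 0.30 if (workflow == "weakly_structured" and task == "KiT") else 0.25
        for score in rng.normal(mean, 0.05, size=20):
            rows.append({"workflow": workflow, "task": task, "efficiency": score})
data = pd.DataFrame(rows)

# Two-way ANOVA: main effects of workflow and task type, plus their interaction.
model = ols("efficiency ~ C(workflow) * C(task)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```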
28
Empirical Evaluation of Organizational Memory Information Systems
Felix-Robinson Aschoff & Ludger van Elst

1 Introduction
2 Contributions from Related Fields
  2.1 Knowledge Engineering
    2.1.1 General Methods and Guidelines (Essential Theories, Critical Success Metrics, Sisyphus, HPKB)
    2.1.2 Knowledge Acquisition
    2.1.3 Ontologies
  2.2 Human Computer Interaction
  2.3 Information Retrieval
  2.4 Software Engineering (Goal-Question-Metric Technique)
3 Implications for Organizational Memory Information Systems
  3.1 Implications for the evaluation of OMIS
  3.2 Relevant aspects of OMs for evaluations and rules of thumb for conducting evaluative research
  3.3 Preliminary sketch of an evaluation of FRODO
References
Appendix A: Technical evaluation of ontologies
29
References
Aitken, S. & Reid, S. (2000). Evaluation of an ontology-based information retrieval tool. Proceedings of the 14th European Conference on Artificial Intelligence. http://delicias.dia.fi.upm.es/WORKSHOP/ECAI00/accepted-papers.html
Basili, V.R., Caldiera, G. & Rombach, H.D. (1994). Goal question metric paradigm. In J.J. Marciniak (Ed.), Encyclopedia of Software Engineering, Volume 1, 528-532. John Wiley & Sons.
Berger, B., Burton, A.M., Christiansen, T., Corbridge, C., Reichelt, H. & Shadbolt, N.R. (1989). Evaluation criteria for knowledge acquisition. ACKnowledge project deliverable ACK-UoN-T4.1-DL-001B. University of Nottingham, Nottingham.
Chin, D.N. (2001). Empirical evaluation of user models and user-adapted systems. User Modeling and User-Adapted Interaction, 11, 181-194.
Cohen, P. (1995). Empirical Methods for Artificial Intelligence. Cambridge: MIT Press.
Cohen, P.R., Schrag, R., Jones, E., Pease, A., Lin, A., Starr, B., Easter, D., Gunning, D. & Burke, M. (1998). The DARPA high performance knowledge bases project. Artificial Intelligence Magazine, 19(4), 25-49.
Gómez-Pérez, A. (1999). Evaluation of taxonomic knowledge in ontologies and knowledge bases. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html
Grüninger, M. & Fox, M.S. (1995). Methodology for the design and evaluation of ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95, Montreal.
Hays, W.L. (1994). Statistics. Orlando: Harcourt Brace.
Kagolovsky, Y. & Moehr, J.R. (2000). Evaluation of information retrieval: old problems and new perspectives. Proceedings of the 8th International Congress on Medical Librarianship. http://www.icml.org/tuesday/ir/kagalovosy.htm
Martin, D.W. (1995). Doing Psychological Experiments. Pacific Grove: Brooks/Cole.
Menzies, T. (1999a). Critical success metrics: evaluation at the business level. International Journal of Human-Computer Studies, 51, 783-799.
Menzies, T. (1999b). hQkb - the high quality knowledge base initiative (Sisyphus V: learning design assessment knowledge). Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html
Menzies, T. & van Harmelen, F. (1999). Editorial: Evaluating knowledge engineering techniques. International Journal of Human-Computer Studies, 51, 715-727.
Myers, B., Hollan, J. & Cruz, I. (Eds.) (1996). Strategic directions in human computer interaction. ACM Computing Surveys, 28(4).
Nick, M., Althoff, K. & Tautz, C. (1999). Facilitating the practical evaluation of knowledge-based systems and organizational memories using the goal-question-metric technique. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html
Shadbolt, N., O'Hara, K. & Crow, L. (1999). The experimental evaluation of knowledge acquisition techniques and methods: history, problems and new directions. International Journal of Human-Computer Studies, 51, 729-755.
Tallis, M., Kim, J. & Gil, Y. (1999). User studies of knowledge acquisition tools: methodology and lessons learned. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html
Tennison, J., O'Hara, K. & Shadbolt, N. (1999). Evaluating KA tools: lessons from an experimental evaluation of APECKS. Proceedings of KAW'99. http://sern.ucalgary.ca/KSI/KAW/KAW99/papers/Tennison1/
30
Tasks for Workflow Evaluation
Possible tasks for the workflow evaluation experiment:

KiT: "Please write a report about your personal greatest learning achievements during the last semester. Find sources related to these scientific areas on the Internet. Prepare a PowerPoint presentation. To help you with this task you will be provided with a FRODO weakly-structured workflow / a classical workflow."

Simple structured task: "Please install Netscape on your computer and use the Internet to find all universities in Iowa that offer computer science. Use e-mail to ask for further information. To help you with this task you will be provided with a FRODO weakly-structured workflow / a classical workflow."
31
(No Transcript)
32
GQM Goals for CBR-PEB
GQM of CBR-PEB (Nick, Althoff & Tautz, 1999), Goal 2: Economic utility
  • Analyze: retrieved information
  • For the purpose of: monitoring
  • With respect to: economic utility
  • From the viewpoint of: CBR system developers
  • In the context of: decision support for CBR development
Conceptual level (goals)
  • GQM-Goals should specify
  • a Purpose
  • a quality Issue
  • a measurement Object
  • a Viewpoint
  • Object of Measurement can be
  • Products
  • Processes
  • Resources

33
GQM Abstraction Sheet for CBR-PEB
Goal 2: Economic utility for CBR-PEB

Quality factors:
1. Similarity of retrieved information as modeled in CBR-PEB (Q-12)
2. Degree of maturity (desired: max.): development, prototype, pilot use (Q-13) ...

Variation factors:
1. Amount of background knowledge, e.g. number of attributes (Q-8.1.1) ...
2. Case origin: university, industrial research, industry ...

Baseline hypothesis:
1. M.M. 0.2; N.N. 0.5 (scale 0..1) ... The estimates are averages.

Impact of variation factors:
1. The higher the amount of background knowledge, the higher the similarity. (Q-8)
2. The more industrial the case origin, the higher the degree of maturity. (Q-9) ...
34
GQM Questions and Metrics
GQM plan for CBR-PEB, Goal 2: Economic utility
Q-9: What is the impact of the case origin on the degree of maturity?
  Q-9.1: What is the case origin?
    M-9.1.1: per retrieval attempt, for each chosen case: case origin (university, industrial research, industry)
  Q-9.2: What is the degree of maturity of the system?
    M-9.2.1: per retrieval attempt, for each chosen case: case attribute "status" (prototype, being developed, pilot system, application in practical use, unknown)
35
FRODO GQM-Goal concerning Ontologies
For the circumstances FRODO is designed for, hybrid solutions are more successful than AI solutions.
  • Purpose: compare
  • Issue: the efficiency of
  • Object (process): ontology construction and use
  • With respect to: stability, sharing scope, formality of information
  • Viewpoint: the user's viewpoint
36
GQM Abstraction Sheet for FRODO (Ont.)
Quality factors: efficiency of ontology construction and use
Variation factors: sharing scope, stability, formality
Baseline hypothesis: the experimental design will provide a control group for comparison
Impact of variation factors: high sharing scope, medium stability, low formality -> FRODO more successful; low sharing scope, high stability, high formality -> AI more successful
37
GQM Questions and Metrics
Questions:
  • What is the efficiency of the ontology construction and use process using FRODO for a situation with high sharing scope, medium stability and low formality?
  • What is the efficiency of the ontology construction and use process using FRODO for a situation with low sharing scope, high stability and high formality?
  • What is the efficiency of the ontology construction and use process using AI systems for these situations?
Metrics:
  • Efficiency of ontology construction: number of definitions / time
  • Efficiency of ontology use: Information Retrieval (recall and precision)
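A minimal sketch of the construction metric defined above, assuming hypothetical session logs; efficiency of use would reuse the recall/precision computation sketched earlier:

```python
def ontology_construction_efficiency(num_definitions, hours_spent):
    """Efficiency of ontology construction = number of definitions / time."""
    if hours_spent <= 0:
        raise ValueError("time spent must be positive")
    return num_definitions / hours_spent

# Hypothetical session logs: (definitions added, hours spent)
frodo_sessions = [(35, 4.0), (22, 2.5)]
total_defs = sum(d for d, _ in frodo_sessions)
total_hours = sum(h for _, h in frodo_sessions)
print(ontology_construction_efficiency(total_defs, total_hours))  # definitions per hour
```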
38
Hypotheses
H1: For situation 1 (high sharing scope, medium stability, low formality) FRODO will yield a higher efficiency of ontology construction and use.
H2: For situation 2 (low sharing scope, high stability, high formality) an AI system will yield a higher efficiency of ontology construction and use.
39
Experimental Design
2 x 2 factorial experiment
Independent variables: situation (1/2), system (FRODO/AI)
Dependent variable: efficiency of ontology construction and use

Design cells:
  • Situation 1 / FRODO, Situation 2 / FRODO
  • Situation 1 / AI system, Situation 2 / AI system

Within-subject vs. between-subject design; randomized groups (15-20 participants per group for statistical inference).
40
Big evaluation versus small evaluation
(van Harmelen, 1998)
  • Distinguish different types of evaluation:
    • Big evaluation: evaluation of KA/KE methodologies
    • Small evaluation: evaluation of KA/KE components (e.g. a particular PSM)
    • Micro evaluation: evaluation of a KA/KE product (e.g. a single system)
  • Some are more interesting than others:
    • Big evaluation is impossible to control
    • Micro evaluation is impossible to generalize
    • Small evaluation might just be the only option

41
Knowledge Acquisition
  • Problems with the evaluation of the KA process (Shadbolt et al., 1999):
    • the availability of human experts
    • the need for a gold standard of knowledge
    • the question of how many different domains and tasks should be included
    • the difficulty of isolating the value added by a single technique or tool
    • how to quantify knowledge and knowledge engineering effort

42
Knowledge Acquisition
4) The difficulty of isolating the value added by a single technique or tool
  • Conduct a series of experiments:
    • Test different implementations of the same technique against each other or against a paper-and-pencil version
    • Test groups of tools in complementary pairings or different orderings of the same set of tools
    • Test the value of single sessions against multiple sessions and the effect of feedback in multiple sessions
    • Exploit techniques from the evaluation of standard software to control for effects from interface, implementation etc.
  • Problem: scale-up of the experimental programme

43
Essential Theory Approach
  1. Identify a process of interest.
  2. Create an essential theory T for that process.
  3. Identify some competing process description, ¬T.
  4. Design a study that explores core pathways in both ¬T and T.
  5. Acknowledge that your study may not be definitive.

Advantage: broad conceptual approach; results are of interest for the entire community.
Problem: interpretation of results is difficult (due to the KE school or due to concrete technology such as implementation or interface?)
44
Three Aspects of Ontology Evaluation
  • Three aspects of evaluating ontologies
  • the process of constructing the ontology
  • the technical evaluation
  • end user assessment and ontology-user interaction

45
Assessment and Ontology-User Interaction
"Assessment is focused on judging the understanding, usability, usefulness, abstraction, quality and portability of the definitions from the users' point of view." (Gómez-Pérez, 1999)
  • Ontology-user interaction in OMIS:
    • more dynamic
    • the success of an OMIS relies on active use
    • users with heterogeneous skills, backgrounds and tasks