Title: COM362 Knowledge Engineering


1
Testing and Validation of Knowledge-Based Systems
  • John MacIntyre
  • 0191 515 3778
  • john.macintyre@sunderland.ac.uk

2
Testing and Validation
  • Testing is not the same as validation!
  • Testing involves measuring the functional
    correctness of the system
  • Validation involves assessing the accuracy and
    scope of the system
  • Like knowledge acquisition, testing and
    validation are a bottleneck in the development of
    KBS

3
Put simply...
  • TESTING is the process which determines if the
    prototype or final system does what it is
    supposed to do
  • VALIDATION is the process which determines
    whether the system actually solves the problem
    which it is supposed to address

4
Problems in Prototyping
  • Two arguments are often made
  • ARGUMENT 1
  • It is more difficult to validate prototypes
    because the development process is less
    structured
  • ARGUMENT 2
  • Prototypes take more effort than traditional
    software development, especially in testing and
    validation

5
Argument 1
  • Interactive development process involves experts
    who carry out inherent validation during each
    prototyping stage
  • Testing and validation at each stage of the
    prototype actually helps to ensure successful and
    timely completion
  • Testing and validation of KBS is, therefore, a
    CONSTANT process (unlike traditional software)

6
Argument 2
  • Research has shown that software built by
    prototyping showed the following when compared to
    traditionally-built systems
  • 40 per cent less code
  • 45 per cent less effort
  • comparable performance
  • shorter development time overall
  • (Boehm et al., 1984)

7
Issues to be Resolved
  • How will the Knowledge Engineer plan the testing
    and validation process?
  • What techniques will be used in testing and
    validation?
  • Who will carry out the testing and validation
    procedures?
  • How will the results be recorded, and amendments
    made?

8
Planning
  • Testing and validation must be planned BEFORE the
    construction of the prototype
  • Test cases should be identified at an early stage
  • May be necessary to include hooks or special
    testing facilities in the software
  • May also be necessary to build in special
    reporting (at various levels of the KBS)

9
Testing Objectives
  • To check that the software is functioning
    correctly at all levels
  • To determine how the software will perform when
    presented with untested or unexpected cases
  • Desirable to test all possible routes through the
    knowledge base
  • However, many KBS are much too complex for this
    to be possible

10
...continued
  • Four major points to be considered
  • COMPLETENESS
  • is anything missing?
  • CONSISTENCY
  • does each part interface with every other part of
    the system?
  • ROBUSTNESS
  • how does the system perform when items of data
    are missing and/or incorrect?
  • SEQUENCE INDEPENDENCE
  • is the system's non-procedural knowledge completely
    free of sequence dependencies?

11
Completeness
  • Difficult to define - how do we know?
  • Can check that the system has a mechanism for
    dealing with any type of input (eg +ve, -ve, and 0)
  • Two approaches
  • DIRECT - KE can work with expert to check every
    possible input
  • INDIRECT - KE generates test values independent
    of the expert, focusing on the structure of the
    knowledge base
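
As an illustration of the indirect approach, the sketch below (a toy Python rule base; the rule names, conditions and test values are all invented and not from the lecture system) generates one representative value per input category and reports any category for which no rule can fire:

  # Illustrative sketch only: a toy rule base and an indirect completeness
  # check. Rule names, conditions and test values are all invented.
  rules = {
      "positive_reading": lambda x: x > 0,
      "negative_reading": lambda x: x < 0,
      # Deliberately no rule handles zero, so the check reports a gap.
  }

  # One representative value per input category (+ve, -ve and 0).
  categories = {"+ve": 5.0, "-ve": -5.0, "zero": 0.0}

  for name, value in categories.items():
      covered = any(condition(value) for condition in rules.values())
      print(name, "covered" if covered else "NO RULE FIRES - possible gap")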

12
Consistency
  • What happens if part of the system does not
    interface correctly with other parts?
  • Check data formats (eg one part generates a paint
    colour where another expects a stock number; also
    check units of measurement)
  • Check either that data received has already been
    checked, or is checked by this component (eg
    range checking)
  • Check interface design (input, output)
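
A minimal sketch of checking at a component boundary; the field names, units and limits below are invented for illustration:

  # Illustrative sketch: format and range checking where one component
  # receives data from another. Fields, units and limits are hypothetical.
  EXPECTED = {
      "stock_no":  {"type": int,   "min": 1,   "max": 99999},
      "weight_kg": {"type": float, "min": 0.0, "max": 500.0},
  }

  def check_incoming(record):
      """Return a list of consistency problems found in one record."""
      problems = []
      for field, spec in EXPECTED.items():
          if field not in record:
              problems.append(f"missing field: {field}")
          elif not isinstance(record[field], spec["type"]):
              problems.append(f"{field}: wrong type {type(record[field]).__name__}")
          elif not spec["min"] <= record[field] <= spec["max"]:
              problems.append(f"{field}: {record[field]} out of range")
      return problems

  print(check_incoming({"stock_no": 42, "weight_kg": 750.0}))  # range error
  print(check_incoming({"weight_kg": "red"}))                  # missing + type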

13
Robustness
  • Robustness means the system's ability to
    produce correct output in the face of
    deteriorating input
  • Input may deteriorate in
  • QUANTITY - more values unknown
  • QUALITY - more values incorrect
  • Robust systems display graceful degradation
  • slow decay in performance in response to a slow
    decay in input quality/quantity

14
...continued
  • It is easier to test for robustness than it is to
    build it into the system!
  • Quantity and quality of test data can be
    gradually decreased
  • This can be done randomly, but it is better to
    corrupt the data in a realistic way (reflecting
    actual use)
  • Can be difficult to determine a measure of how
    close the system is to the right answer - an
    error measure
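
One way to sketch this in code (kbs_classify below is only a stand-in for the real system, and the case data is invented) is to start from a complete case and progressively replace known values with unknowns, checking whether the conclusion degrades gracefully:

  # Illustrative sketch: probing graceful degradation by reducing the
  # quantity of input. kbs_classify stands in for the system under test.
  import random

  def kbs_classify(inputs):
      # Placeholder: unknown values (None) contribute nothing to the score.
      score = sum(v for v in inputs.values() if v is not None)
      return "alarm" if score > 10 else "normal"

  complete_case = {"temp": 9, "vibration": 4, "pressure": 2}
  expected = "alarm"

  random.seed(1)
  fields = list(complete_case)
  for n_missing in range(len(fields) + 1):
      degraded = dict(complete_case)
      for field in random.sample(fields, n_missing):
          degraded[field] = None        # simulate an unknown value
      result = kbs_classify(degraded)
      print(n_missing, "unknown:", result,
            "(ok)" if result == expected else "(WRONG)")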

15
Sequence Independence
  • Rules are usually constructed on the premise that
    they can be fired whenever all of their
    antecedents are satisfied
  • Inference engine will determine the order in
    which the rules are fired
  • However, many developers include implicit
    procedural information in rules
  • Can lead to a rule base which has been working
    well suddenly producing unexpected results

16
...continued
  • Varying the order in which data is presented to
    the system can highlight dependencies
  • However, some situations will require a broad
    range of data to test all dependencies
  • Knowledge Engineer may decide to impose
    dependencies by specifying the order of rule
    firing - but beware!
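
A small sketch of the idea, assuming a toy forward-chaining stand-in for the real inference engine (the facts and rule are invented): present the same facts in every order and check that the conclusion never changes.

  # Illustrative sketch: checking sequence independence by permuting the
  # order in which the same facts are asserted. The mini rule base is invented.
  from itertools import permutations

  def run_kbs(facts_in_order):
      facts = set(facts_in_order)       # rules fire on the final fact set
      if {"pump_on", "no_flow"} <= facts:
          return "blockage_suspected"
      return "no_diagnosis"

  facts = ["pump_on", "no_flow", "valve_open"]
  results = {run_kbs(order) for order in permutations(facts)}

  if len(results) == 1:
      print("sequence independent:", results.pop())
  else:
      print("ORDER DEPENDENCE DETECTED:", results)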

17
Testing Techniques
  • The following are the most common test techniques
    for KBS
  • TEST CASES
  • TRACING
  • CONSISTENCY CHECKING FUNCTIONS
  • RULE ORDER FIRING VARIATION
  • REGRESSION TESTING

18
Test Cases
  • Aim to determine if the system can arrive at a
    proper conclusion for a given set of inputs
  • Also aim to determine why an improper conclusion
    is derived
  • Test cases must be carefully designed, and a test
    programme developed before the prototype is built
  • Correct and acceptable results must be defined
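
A minimal sketch of a planned test programme, assuming a hypothetical diagnose() prototype and invented cases; each case records the result agreed with the expert in advance:

  # Illustrative sketch: a planned set of test cases with pre-agreed
  # expected results. diagnose() stands in for the prototype under test.
  def diagnose(symptoms):
      if "high_temp" in symptoms and "low_oil" in symptoms:
          return "bearing_failure"
      return "no_fault_found"

  test_cases = [
      # (case id, inputs, result agreed with the expert beforehand)
      ("TC01", {"high_temp", "low_oil"}, "bearing_failure"),
      ("TC02", {"high_temp"},            "no_fault_found"),
      ("TC03", {"low_oil"},              "oil_leak"),        # will fail
  ]

  for case_id, inputs, expected in test_cases:
      actual = diagnose(inputs)
      verdict = "pass" if actual == expected else f"FAIL (got {actual})"
      print(f"{case_id}: {verdict}")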

19
Tracing
  • Aims to determine how the system arrived at its
    conclusions
  • Actually used for both testing and debugging
  • Can be focused very specifically on parts of the
    knowledge base
  • Data still required to give a meaningful path to
    trace

20
...continued
  • However, trace results can be difficult to
    interpret - even for the KE!

21
Consistency Checking
  • Particularly appropriate for object-oriented
    applications
  • Each object can have a function (or method) to
    perform consistency checking
  • Check for
  • Existence of all required objects
  • State of all required objects
  • Assess missing objects/states
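
A brief object-oriented sketch of the idea (the Valve and Pump classes and their states are invented for illustration): each object exposes its own check_consistency method, and the checks are run across the whole object base.

  # Illustrative sketch: per-object consistency-checking methods.
  class Valve:
      def __init__(self, name, state):
          self.name, self.state = name, state

      def check_consistency(self, world):
          if self.state not in ("open", "closed"):
              return [f"{self.name}: invalid state '{self.state}'"]
          return []

  class Pump:
      def __init__(self, name, feed_valve):
          self.name, self.feed_valve = name, feed_valve

      def check_consistency(self, world):
          if self.feed_valve not in world:      # required object missing?
              return [f"{self.name}: feed valve '{self.feed_valve}' not found"]
          return []

  world = {"v1": Valve("v1", "ajar"), "p1": Pump("p1", "v2")}
  for obj in world.values():
      for problem in obj.check_consistency(world):
          print(problem)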

22
Varying Rule Firing Order
  • Dependencies can be very difficult to identify
  • Change sequence of presentation of the same data
    set to all possible permutations - is the same
    result achieved?
  • Change the position of rules in lists/hierarchies
    - is there any difference in behaviour?
  • Check tie-breaking mechanisms

23
Regression Testing
  • Based on iterative nature of KBS development
  • Requires a standard set of test cases used at
    each stage of development
  • Each successive prototype presented with the
    standard test set
  • Can the new version at least duplicate the
    results of the last version?
  • Measures can be used to determine the level of
    performance (automatically?)
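
A minimal sketch, assuming two hypothetical prototype versions and an invented standard test set: each new version is run against the same cases, and any result that no longer matches is reported.

  # Illustrative sketch: regression testing with a fixed standard test set.
  # prototype_v1/v2 stand in for successive versions of the KBS.
  def prototype_v1(reading):
      return "alarm" if reading > 100 else "normal"

  def prototype_v2(reading):
      # New version adds a "warning" band - must not break earlier results.
      if reading > 100:
          return "alarm"
      return "warning" if reading > 90 else "normal"

  standard_test_set = [(50, "normal"), (95, "normal"), (120, "alarm")]

  regressions = []
  for reading, expected in standard_test_set:
      if prototype_v2(reading) != expected:
          regressions.append((reading, expected, prototype_v2(reading)))

  print("regressions:", regressions or "none")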

24
Validation Objectives
  • To check that the system actually solves the
    problem
  • To check that the specified result(s) are the
    right result(s)
  • To check that the system helps the user to make
    better, faster decisions
  • If the system arrives at the right results by the
    wrong process, this is a TESTING issue

25
...continued
  • Major points to be considered
  • COVERAGE
  • Scope
  • Productivity
  • Effectiveness
  • Skill level
  • Training
  • Compatibility
  • AUTHORITY
  • validity of the validation!

26
Coverage
  • There are many issues which must be covered to
    ensure that the system is valid for deployment
  • Difficult to encompass these all into a single
    measure of validity
  • Therefore usually treated separately, sometimes
    scored on a grid system with appropriate
    weightings
  • Criteria for validity (for each issue and
    overall) need to be defined
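
A small sketch of such a weighted grid (the scores, weights and pass threshold below are invented; in practice each score would come from the appropriate assessor):

  # Illustrative sketch: a weighted scoring grid over the coverage issues.
  scores = {            # 0-10 rating agreed by the relevant assessor
      "scope": 8, "productivity": 6, "effectiveness": 7,
      "skill_level": 9, "training": 5, "compatibility": 7,
  }
  weights = {
      "scope": 0.25, "productivity": 0.20, "effectiveness": 0.25,
      "skill_level": 0.10, "training": 0.05, "compatibility": 0.15,
  }

  overall = sum(scores[k] * weights[k] for k in scores)
  print(f"weighted coverage score: {overall:.2f} / 10")
  print("valid for deployment" if overall >= 7.0 else "not yet valid")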

27
Coverage Issues
  • SCOPE
  • Can the system address the full range of problems
    originally targeted?
  • PRODUCTIVITY
  • Can the user process problems at the expected (or
    acceptable) rate?
  • EFFECTIVENESS
  • Can the user use the system to produce solutions
    of acceptable quality?

28
...continued
  • SKILL LEVEL
  • Can users with the expected level of training use
    the full capabilities of the system?
  • TRAINING
  • Do users develop a better understanding of the
    problem by using the system?
  • COMPATIBILITY
  • Is the normal operation of the system compatible
    with normal business operations?

29
Authority
  • Many people may be involved in validating the
    system
  • Issues of cost, scope, productivity,
    effectiveness etc probably need to be evaluated
    by different people
  • It is essential that the right people handle the
    appropriate parts of the process
  • Individual assessments must be integrated into an
    agreed framework

30
Validation Techniques
  • Many techniques for validation, but the most
    common are
  • Testing with actual data
  • Testing with contrived data
  • Direct examination of the knowledge base by
    expert(s)
  • Direct examination of the knowledge base by
    knowledge engineer(s)
  • Parallel use of system by expert(s)
  • Parallel use of system by non-expert(s)
  • Decision/contingency tables

31
Using Actual Data
  • Usually available in the standard test set
  • Allows validator to assess the system in real
    life situations
  • Will highlight issues directly relevant to
    deployment
  • Can allow the users to develop a level of
    comfort with the system - especially if
    effectiveness outweighs effort

32
Using Contrived Data
  • Usually difficult to derive
  • Needs to be carefully designed
  • Can allow the validator to assess the system's
    performance in extreme cases
  • Gives no guarantee that problems will be
    highlighted

33
Direct Examination of KB
  • Can be done if the knowledge is in a readable
    form
  • Can be done by
  • EXPERT
  • understands the domain
  • may not understand the structures
  • KNOWLEDGE ENGINEER
  • will be familiar with the structures
  • may not fully understand the domain

34
Parallel Use
  • System can be deployed in parallel with current
    system, and comparisons made
  • Results should be carefully documented, and not
    anecdotal
  • Can be used in parallel by
  • EXPERT - to compare with normal approach to
    problem resolution or decision making, and
    evaluate scope
  • NON-EXPERT - to evaluate effectiveness and
    productivity

35
Decision Tables etc.
  • Other methods are Decision or Contingency Tables,
    or Statistical Analysis
  • However, these methods comprise only about 1% of
    all commercial KBS validation effort (O'Leary, 1994)
  • Require careful compilation of grids for all
    possible decisions or contingencies, or
    mathematical measures of accuracy
  • Difficult to implement on large systems
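
For illustration, a tiny contingency table can be built by tallying system conclusions against the expert's decisions on the same cases (the case data below is invented), giving a simple statistical measure of agreement:

  # Illustrative sketch: a contingency table of expert vs system decisions.
  from collections import Counter

  pairs = [            # (expert decision, system conclusion) per case
      ("approve", "approve"), ("approve", "reject"), ("reject", "reject"),
      ("reject", "reject"), ("approve", "approve"), ("reject", "approve"),
  ]

  table = Counter(pairs)
  for (expert, system), count in sorted(table.items()):
      print(f"expert={expert:<8} system={system:<8} cases={count}")

  agreement = sum(c for (e, s), c in table.items() if e == s) / len(pairs)
  print(f"agreement rate: {agreement:.0%}")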