Title: COM362 Knowledge Engineering


1
Testing and Validation of Knowledge-Based Systems
  • John MacIntyre
  • 0191 515 3778
  • john.macintyre@sunderland.ac.uk

2
Testing and Validation
  • Testing is not the same as validation!
  • Testing involves measuring the functional
    correctness of the system
  • Validation involves assessing the accuracy and
    scope of the system
  • Like knowledge acquisition, testing and
    validation are a bottleneck in the development of
    KBS

3
Put simply...
  • TESTING is the process which determines if the
    prototype or final system does what it is
    supposed to do
  • VALIDATION is the process which determines
    whether the system actually solves the problem
    which it is supposed to address

4
Problems in Prototyping
  • Two arguments are often made
  • ARGUMENT 1
  • It is more difficult to validate prototypes
    because the development process is less
    structured
  • ARGUMENT 2
  • Prototypes take more effort than traditional
    software development, especially in testing and
    validation

5
Argument 1
  • Interactive development process involves experts
    who carry out inherent validation during each
    prototyping stage
  • Testing and validation at each stage of the
    prototype actually helps to ensure successful and
    timely completion
  • Testing and validation of KBS is, therefore, a
    CONSTANT process (unlike traditional software)

6
Argument 2
  • Research has shown that software built by
    prototyping showed the following when compared to
    traditionally-built systems
  • 40 per cent less code
  • 45 per cent less effort
  • comparable performance
  • shorter development time overall
  • (Boehm et al., 1984)

7
Issues to be Resolved
  • How will the Knowledge Engineer plan the testing
    and validation process?
  • What techniques will be used in testing and
    validation?
  • Who will carry out the testing and validation
    procedures?
  • How will the results be recorded, and amendments
    made?

8
Planning
  • Testing and validation must be planned BEFORE the
    construction of the prototype
  • Test cases should be identified at an early stage
  • May be necessary to include hooks or special
    testing facilities in the software
  • May also be necessary to build in special
    reporting (at various levels of the KBS)

9
Testing Objectives
  • To check that the software is functioning
    correctly at all levels
  • To determine how the software will perform when
    presented with untested or unexpected cases
  • Desirable to test all possible routes through the
    knowledge base
  • However, many KBS are much too complex for this
    to be possible

10
...continued
  • Four major points to be considered
  • COMPLETENESS
  • is anything missing?
  • CONSISTENCY
  • does each part interface with every other part of
    the system?
  • ROBUSTNESS
  • how does the system perform when items of data
    are missing and/or incorrect?
  • SEQUENCE INDEPENDENCE
  • is the system's non-procedural knowledge completely
    free of sequence dependencies?

11
Completeness
  • Difficult to define - how do we know?
  • Can check that the system has a mechanism for
    dealing with any type of input (eg +ve, -ve, and 0)
  • Two approaches
  • DIRECT - KE can work with expert to check every
    possible input
  • INDIRECT - KE generates test values independent
    of the expert, focusing on the structure of the
    knowledge base
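
As an illustration of the indirect approach, the sketch below (a toy Python rule base; the rule names, conditions and test values are all invented and not from the lecture system) generates one representative value per input category and reports any category for which no rule can fire:

  # Illustrative sketch only: a toy rule base and an indirect completeness
  # check. Rule names, conditions and test values are all invented.
  rules = {
      "positive_reading": lambda x: x > 0,
      "negative_reading": lambda x: x < 0,
      # Deliberately no rule handles zero, so the check reports a gap.
  }

  # One representative value per input category (+ve, -ve and 0).
  categories = {"+ve": 5.0, "-ve": -5.0, "zero": 0.0}

  for name, value in categories.items():
      covered = any(condition(value) for condition in rules.values())
      print(name, "covered" if covered else "NO RULE FIRES - possible gap")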

12
Consistency
  • What happens if part of the system does not
    interface correctly with other parts?
  • Check data formats (eg one part generates a paint
    colour where another expects a stock number; also
    check units of measurement)
  • Check either that data received has already been
    checked, or is checked by this component (eg
    range checking)
  • Check interface design (input, output)
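
A minimal sketch of checking at a component boundary; the field names, units and limits below are invented for illustration:

  # Illustrative sketch: format and range checking where one component
  # receives data from another. Fields, units and limits are hypothetical.
  EXPECTED = {
      "stock_no":  {"type": int,   "min": 1,   "max": 99999},
      "weight_kg": {"type": float, "min": 0.0, "max": 500.0},
  }

  def check_incoming(record):
      """Return a list of consistency problems found in one record."""
      problems = []
      for field, spec in EXPECTED.items():
          if field not in record:
              problems.append(f"missing field: {field}")
          elif not isinstance(record[field], spec["type"]):
              problems.append(f"{field}: wrong type {type(record[field]).__name__}")
          elif not spec["min"] <= record[field] <= spec["max"]:
              problems.append(f"{field}: {record[field]} out of range")
      return problems

  print(check_incoming({"stock_no": 42, "weight_kg": 750.0}))  # range error
  print(check_incoming({"weight_kg": "red"}))                  # missing + type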

13
Robustness
  • Robustness means the system's ability to
    produce correct output in the face of
    deteriorating input
  • Input may deteriorate in
  • QUANTITY - more values unknown
  • QUALITY - more values incorrect
  • Robust systems display graceful degradation
  • slow decay in performance in response to a slow
    decay in input quality/quantity

14
...continued
  • It is easier to test for robustness than it is to
    build it into the system!
  • Quantity and quality of test data can be
    gradually decreased
  • This can be done randomly, but it is better to
    corrupt the data in a realistic way (reflecting
    actual use)
  • Can be difficult to determine a measure of how
    close the system is to the right answer - an
    error measure
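
One way to sketch this in code (kbs_classify below is only a stand-in for the real system, and the case data is invented) is to start from a complete case and progressively replace known values with unknowns, checking whether the conclusion degrades gracefully:

  # Illustrative sketch: probing graceful degradation by reducing the
  # quantity of input. kbs_classify stands in for the system under test.
  import random

  def kbs_classify(inputs):
      # Placeholder: unknown values (None) contribute nothing to the score.
      score = sum(v for v in inputs.values() if v is not None)
      return "alarm" if score > 10 else "normal"

  complete_case = {"temp": 9, "vibration": 4, "pressure": 2}
  expected = "alarm"

  random.seed(1)
  fields = list(complete_case)
  for n_missing in range(len(fields) + 1):
      degraded = dict(complete_case)
      for field in random.sample(fields, n_missing):
          degraded[field] = None        # simulate an unknown value
      result = kbs_classify(degraded)
      print(n_missing, "unknown:", result,
            "(ok)" if result == expected else "(WRONG)")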

15
Sequence Independence
  • Rules are usually constructed on the premise that
    they can be fired whenever all of their
    antecedents are satisfied
  • Inference engine will determine the order in
    which the rules are fired
  • However, many developers include implicit
    procedural information in rules
  • Can lead to a rule base which has been working
    well suddenly producing unexpected results

16
...continued
  • Varying the order in which data is presented to
    the system can highlight dependencies
  • However, some situations will require a broad
    range of data to test all dependencies
  • Knowledge Engineer may decide to impose
    dependencies by specifying the order of rule
    firing - but beware!
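
A small sketch of the idea, assuming a toy forward-chaining stand-in for the real inference engine (the facts and rule are invented): present the same facts in every order and check that the conclusion never changes.

  # Illustrative sketch: checking sequence independence by permuting the
  # order in which the same facts are asserted. The mini rule base is invented.
  from itertools import permutations

  def run_kbs(facts_in_order):
      facts = set(facts_in_order)       # rules fire on the final fact set
      if {"pump_on", "no_flow"} <= facts:
          return "blockage_suspected"
      return "no_diagnosis"

  facts = ["pump_on", "no_flow", "valve_open"]
  results = {run_kbs(order) for order in permutations(facts)}

  if len(results) == 1:
      print("sequence independent:", results.pop())
  else:
      print("ORDER DEPENDENCE DETECTED:", results)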

17
Testing Techniques
  • The following are the most common test techniques
    for KBS
  • TEST CASES
  • TRACING
  • CONSISTENCY CHECKING FUNCTIONS
  • RULE ORDER FIRING VARIATION
  • REGRESSION TESTING

18
Test Cases
  • Aim to determine if the system can arrive at a
    proper conclusion for a given set of inputs
  • Also aim to determine why an improper conclusion
    is derived
  • Test cases must be carefully designed, and a test
    programme developed before the prototype is built
  • Correct and acceptable results must be defined
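
A minimal sketch of a planned test programme, assuming a hypothetical diagnose() prototype and invented cases; each case records the result agreed with the expert in advance:

  # Illustrative sketch: a planned set of test cases with pre-agreed
  # expected results. diagnose() stands in for the prototype under test.
  def diagnose(symptoms):
      if "high_temp" in symptoms and "low_oil" in symptoms:
          return "bearing_failure"
      return "no_fault_found"

  test_cases = [
      # (case id, inputs, result agreed with the expert beforehand)
      ("TC01", {"high_temp", "low_oil"}, "bearing_failure"),
      ("TC02", {"high_temp"},            "no_fault_found"),
      ("TC03", {"low_oil"},              "oil_leak"),        # will fail
  ]

  for case_id, inputs, expected in test_cases:
      actual = diagnose(inputs)
      verdict = "pass" if actual == expected else f"FAIL (got {actual})"
      print(f"{case_id}: {verdict}")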

19
Tracing
  • Aims to determine how the system arrived at its
    conclusions
  • Actually used for both testing and debugging
  • Can be focused very specifically on parts of the
    knowledge base
  • Data still required to give a meaningful path to
    trace

20
...continued
  • However, trace results can be difficult to
    interpret - even for the KE!

21
Consistency Checking
  • Particularly appropriate for object-oriented
    applications
  • Each object can have a function (or method) to
    perform consistency checking
  • Check for
  • Existence of all required objects
  • State of all required objects
  • Assess missing objects/states
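
A brief object-oriented sketch of the idea (the Valve and Pump classes and their states are invented for illustration): each object exposes its own check_consistency method, and the checks are run across the whole object base.

  # Illustrative sketch: per-object consistency-checking methods.
  class Valve:
      def __init__(self, name, state):
          self.name, self.state = name, state

      def check_consistency(self, world):
          if self.state not in ("open", "closed"):
              return [f"{self.name}: invalid state '{self.state}'"]
          return []

  class Pump:
      def __init__(self, name, feed_valve):
          self.name, self.feed_valve = name, feed_valve

      def check_consistency(self, world):
          if self.feed_valve not in world:      # required object missing?
              return [f"{self.name}: feed valve '{self.feed_valve}' not found"]
          return []

  world = {"v1": Valve("v1", "ajar"), "p1": Pump("p1", "v2")}
  for obj in world.values():
      for problem in obj.check_consistency(world):
          print(problem)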

22
Varying Rule Firing Order
  • Dependencies can be very difficult to identify
  • Change sequence of presentation of the same data
    set to all possible permutations - is the same
    result achieved?
  • Change the position of rules in lists/hierarchies
    - is there any difference in behaviour?
  • Check tie-breaking mechanisms

23
Regression Testing
  • Based on iterative nature of KBS development
  • Requires a standard set of test cases used at
    each stage of development
  • Each successive prototype presented with the
    standard test set
  • Can the new version at least duplicate the
    results of the last version?
  • Measures can be used to determine the level of
    performance (automatically?)
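
A minimal sketch, assuming two hypothetical prototype versions and an invented standard test set: each new version is run against the same cases, and any result that no longer matches is reported.

  # Illustrative sketch: regression testing with a fixed standard test set.
  # prototype_v1/v2 stand in for successive versions of the KBS.
  def prototype_v1(reading):
      return "alarm" if reading > 100 else "normal"

  def prototype_v2(reading):
      # New version adds a "warning" band - must not break earlier results.
      if reading > 100:
          return "alarm"
      return "warning" if reading > 90 else "normal"

  standard_test_set = [(50, "normal"), (95, "normal"), (120, "alarm")]

  regressions = []
  for reading, expected in standard_test_set:
      if prototype_v2(reading) != expected:
          regressions.append((reading, expected, prototype_v2(reading)))

  print("regressions:", regressions or "none")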

24
Validation Objectives
  • To check that the system actually solves the
    problem
  • To check that the specified result(s) are the
    right result(s)
  • To check that the system helps the user to make
    better, faster decisions
  • If the system arrives at the right results by the
    wrong process, this is a TESTING issue

25
...continued
  • Major points to be considered
  • COVERAGE
  • Scope
  • Productivity
  • Effectiveness
  • Skill level
  • Training
  • Compatibility
  • AUTHORITY
  • validity of the validation!

26
Coverage
  • There are many issues which must be covered to
    ensure that the system is valid for deployment
  • Difficult to encompass these all into a single
    measure of validity
  • Therefore usually treated separately, sometimes
    scored on a grid system with appropriate
    weightings
  • Criteria for validity (for each issue and
    overall) need to be defined
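
A small sketch of such a weighted grid (the scores, weights and pass threshold below are invented; in practice each score would come from the appropriate assessor):

  # Illustrative sketch: a weighted scoring grid over the coverage issues.
  scores = {            # 0-10 rating agreed by the relevant assessor
      "scope": 8, "productivity": 6, "effectiveness": 7,
      "skill_level": 9, "training": 5, "compatibility": 7,
  }
  weights = {
      "scope": 0.25, "productivity": 0.20, "effectiveness": 0.25,
      "skill_level": 0.10, "training": 0.05, "compatibility": 0.15,
  }

  overall = sum(scores[k] * weights[k] for k in scores)
  print(f"weighted coverage score: {overall:.2f} / 10")
  print("valid for deployment" if overall >= 7.0 else "not yet valid")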

27
Coverage Issues
  • SCOPE
  • Can the system address the full range of problems
    originally targeted?
  • PRODUCTIVITY
  • Can the user process problems at the expected (or
    acceptable) rate?
  • EFFECTIVENESS
  • Can the user use the system to produce solutions
    of acceptable quality?

28
...continued
  • SKILL LEVEL
  • Can users with the expected level of training use
    the full capabilities of the system?
  • TRAINING
  • Do users develop a better understanding of the
    problem by using the system?
  • COMPATIBILITY
  • Is the normal operation of the system compatible
    with normal business operations?

29
Authority
  • Many people may be involved in validating the
    system
  • Issues of cost, scope, productivity,
    effectiveness etc probably need to be evaluated
    by different people
  • It is essential that the right people handle the
    appropriate parts of the process
  • Individual assessments must be integrated into an
    agreed framework

30
Validation Techniques
  • Many techniques for validation, but the most
    common are
  • Testing with actual data
  • Testing with contrived data
  • Direct examination of the knowledge base by
    expert(s)
  • Direct examination of the knowledge base by
    knowledge engineer(s)
  • Parallel use of system by expert(s)
  • Parallel use of system by non-expert(s)
  • Decision/contingency tables

31
Using Actual Data
  • Usually available in the standard test set
  • Allows validator to assess the system in real
    life situations
  • Will highlight issues directly relevant to
    deployment
  • Can allow the users to develop a level of
    comfort with the system - especially if
    effectiveness outweighs effort

32
Using Contrived Data
  • Usually difficult to derive
  • Needs to be carefully designed
  • Can allow the validator to assess the system's
    performance in extreme cases
  • Gives no guarantee that problems will be
    highlighted

33
Direct Examination of KB
  • Can be done if the knowledge is in a readable
    form
  • Can be done by
  • EXPERT
  • understands the domain
  • may not understand the structures
  • KNOWLEDGE ENGINEER
  • will be familiar with the structures
  • may not fully understand the domain

34
Parallel Use
  • System can be deployed in parallel with current
    system, and comparisons made
  • Results should be carefully documented, and not
    anecdotal
  • Can be used in parallel by
  • EXPERT - to compare with normal approach to
    problem resolution or decision making, and
    evaluate scope
  • NON-EXPERT - to evaluate effectiveness and
    productivity

35
Decision Tables etc.
  • Other methods are Decision or Contingency Tables,
    or Statistical Analysis
  • However, these methods comprise only about 1% of
    all commercial KBS validation effort (O'Leary, 1994)
  • Require careful compilation of grids for all
    possible decisions or contingencies, or
    mathematical measures of accuracy
  • Difficult to implement on large systems
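
For illustration, a tiny contingency table can be built by tallying system conclusions against the expert's decisions on the same cases (the case data below is invented), giving a simple statistical measure of agreement:

  # Illustrative sketch: a contingency table of expert vs system decisions.
  from collections import Counter

  pairs = [            # (expert decision, system conclusion) per case
      ("approve", "approve"), ("approve", "reject"), ("reject", "reject"),
      ("reject", "reject"), ("approve", "approve"), ("reject", "approve"),
  ]

  table = Counter(pairs)
  for (expert, system), count in sorted(table.items()):
      print(f"expert={expert:<8} system={system:<8} cases={count}")

  agreement = sum(c for (e, s), c in table.items() if e == s) / len(pairs)
  print(f"agreement rate: {agreement:.0%}")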