Transcript and Presenter's Notes

Title: Evaluation


1
Evaluation
2
Evaluation
  • Personal evaluation
  • Software validation
  • Software evaluation

3
Personal evaluation
  • What have I achieved?
  • Have I achieved what I set out to achieve?
  • Where have I fallen short?
  • Why?
  • What could I have done better?
  • Assumes an a priori statement of what you
    hope/expect/intend to achieve

4
Self evaluation in your dissertation
  • Dissertation plan
  • Introduction
  • Background
  • Success criteria
  • Design
  • Realisation
  • Evaluation/Testing
  • Conclusions / Further Work
  • Ch 3 lays out the success criteria by which the
    success of the project is to be judged
  • Ch 6 will review work done in Ch 5 with respect
    to these criteria, including reflection on
    overall validity of the approach
  • But this is not software evaluation

5
Program validation
  • Systematically check all functions in your
    program/application
  • Systematically check all sequences of inputs etc.
    (a minimal test sketch follows below)
  • Does your program/application do what you think
    it is supposed to do?
  • This is important, but ...
  • This is not software evaluation
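
A minimal sketch of such systematic checking, written as unit tests (Python; normalise_name is a hypothetical function standing in for whatever your program does):

  import unittest

  def normalise_name(name):
      # Hypothetical function under validation: collapses whitespace
      # and title-cases a person's name.
      return " ".join(part.capitalize() for part in name.split())

  class TestNormaliseName(unittest.TestCase):
      # Systematically cover the normal case, messy input, and the
      # empty edge case, rather than trying a few inputs by hand.
      def test_simple(self):
          self.assertEqual(normalise_name("ada lovelace"), "Ada Lovelace")

      def test_extra_whitespace(self):
          self.assertEqual(normalise_name("  alan   turing "), "Alan Turing")

      def test_empty_input(self):
          self.assertEqual(normalise_name(""), "")

  if __name__ == "__main__":
      unittest.main()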

6
Software evaluation
  • Note: we are using the term "software" in a very
    broad sense; it could include a program, a web
    application, or any sort of implementation that
    does something
  • Evaluate the appropriateness of the software with
    respect to its intended use
  • Large range of aspects of software that can be
    evaluated

7
Evaluation evaluation
  • In your dissertation you are asked to evaluate
    what you have achieved
  • Your research could (should?) include an
    evaluation element
  • So you will need to evaluate your evaluation
  • Your evaluation might have negative results, but
    still be an informative experiment which you can
    evaluate positively
  • Your research could even be to compare evaluation
    schemes!

8
A case study
  • Last year a student of mine did a project which
    was a comparative evaluation of a number of
    speech synthesis devices
  • His dissertation discussed:
    • Factors in setting up a comparative evaluation
    • A description of the actual evaluation
    • A discussion of the results
  • His personal evaluation then considered how well
    the experiment (i.e. the evaluation) had been
    conducted

9
Software evaluation
  • Functionality: does it do what it is supposed to
    do?
  • Reliability: does it do the same thing under the
    same conditions?
  • Usability: is it user-friendly?
  • Efficiency: cost, speed, etc.
  • Maintainability: can you modify it? Is it
    robust?
  • Portability: can it be transferred from one
    environment/platform to another?

10
Software evaluation
  • Evaluating commercial software is different from
    evaluating something you have constructed
  • Even if you have constructed it from commercially
    available components
  • Again, note the difference between validation and
    evaluation
  • Especially concerning functionality
  • Also, evaluation is not the same as a software
    review, as found eg in a magazine

11
Stakeholders
  • Developers
    • Researchers
    • Commercial developers
  • End-users
    • Actual end-users (is this a single type?)
    • Their managers (buyers)
  • Vendors
  • Investors

12
Evaluation types
  • Feasibility / Suitability
    • For any of the above stakeholders
  • Internal evaluation
    • For development
    • Iterative testing, to evaluate progress
  • Adequacy evaluation
  • Diagnostic evaluation (debugging)
  • Black box vs. glass box evaluation

13
Evaluation types
  • Declarative evaluation
    • How well does it perform?
    • Comparison with a gold standard (ideal
      performance)
    • Comparison with a baseline (a "wooden block";
      a scoring sketch follows below)
  • Usability evaluation
    • How long does each step take?
    • Is it natural, intuitive?
    • Is it easy to learn to use?
    • Is it well documented?
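
A minimal scoring sketch (Python; the labels are invented) comparing a system against a gold standard and against a trivial baseline:

  # Score a system against a gold standard, and compare it with a
  # trivial baseline that always guesses the most frequent gold label.
  gold   = ["cat", "dog", "cat", "cat", "dog", "cat"]
  system = ["cat", "dog", "dog", "cat", "dog", "cat"]

  def accuracy(predictions, gold):
      correct = sum(p == g for p, g in zip(predictions, gold))
      return correct / len(gold)

  majority = max(set(gold), key=gold.count)
  baseline = [majority] * len(gold)

  print("system  :", accuracy(system, gold))    # 0.833...
  print("baseline:", accuracy(baseline, gold))  # 0.666...

A system is only interesting to the extent that it beats such a baseline.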

14
Evaluation types
  • Operational evaluation
  • ROI
  • Compatibility with other software
  • Consistency of interfaces
  • Internal
  • With respect to standards (eg Microsoft)
  • Failsofts
  • Role of humans
  • Preparation, throughput, correction, output
  • Backup
  • Documentation
  • Support
  • Corporate situation of provider

15
Framework for evaluation
  • Definition of the relevant quality
    characteristics: what is it you want to
    evaluate? Be specific
  • Definition of attributes pertinent to this
    quality
  • Definition of a measure able to provide values
    for these attributes
  • Definition of a method whereby the measure can be
    made (a sketch of the whole framework follows
    below)
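
One way to make the framework concrete (a sketch only; the type and field names are my own, not from the slides) is to record the four definitions together for each thing you evaluate:

  from dataclasses import dataclass

  @dataclass
  class EvaluationSpec:
      # The four definitions of the framework, kept together so that
      # each evaluation is forced to be specific about all of them.
      quality: str    # what you want to evaluate
      attribute: str  # an attribute pertinent to that quality
      measure: str    # a measure yielding values for the attribute
      method: str     # how the measure will actually be made

  spec = EvaluationSpec(
      quality="ability to identify wrongly-spelled words",
      attribute="success rate at that task",
      measure="percentage of wrongly-spelled words identified",
      method="run the checker over a text with known misspellings",
  )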

16
Framework for evaluation
  • Important to be sure that:
  • The quality to be evaluated is genuinely a
    quality that is claimed of the software
  • The attribute to be measured does reflect the
    quality in question
  • The measure does genuinely measure that attribute
    (and not some other one)
  • The method is sufficient to deliver a meaningful
    measure

17
Example spell checker
  • Function:
  • (a) identify wrongly-spelled words
  • (b) suggest an appropriate correction
  • (among other features)
  • Quality: ability to do (a)
  • Attribute: success rate in performing that task
  • Measure: recall, ie the percentage of the
    wrongly-spelled words in a document that are
    correctly identified
  • Method: give it a text with some wrongly-spelled
    words and count how many it spots (a sketch of
    this method follows below)
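
A minimal sketch of that method in code (Python; check_document is a hypothetical stand-in for the spell checker being evaluated):

  # Run the checker over a text whose misspellings are known in
  # advance, and report what percentage of them it spots.
  def measure_detection(check_document, text, known_misspellings):
      flagged = check_document(text)
      spotted = known_misspellings & flagged
      return 100.0 * len(spotted) / len(known_misspellings)

  # Toy run: a fake checker that flags any word not in a tiny lexicon.
  lexicon = {"the", "cat", "sat", "on", "mat"}

  def fake_checker(text):
      return {w for w in text.split() if w not in lexicon}

  print(measure_detection(fake_checker, "the cat sat on teh mat", {"teh"}))  # 100.0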

18
Example spell checker
  • Good evaluation, but not A+
  • Success means
  • Identifying misspelled words (true positives)
  • Ignoring correctly spelled words (true negatives)
  • So is the measure really appropriate? We are only
    counting true positives and false negatives; we
    are not giving credit for the true negatives, nor
    penalising false positives (a sketch counting all
    four outcomes follows below)
  • The method is underspecified:
  • How much text?
  • What sort of text?
  • Should we take into account what we know about
    spell checking (a certain class of error is very
    hard to detect)?
  • Should we classify misspellings and measure
    different classes separately?
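
The sketch promised above (Python, toy data): it counts all four outcomes and derives measures which do credit true negatives and penalise false positives:

  # Score a checker's flags against the known status of every word.
  def score(flagged, misspelled, all_words):
      tp = len(flagged & misspelled)              # wrong words, flagged
      fp = len(flagged - misspelled)              # good words, flagged
      fn = len(misspelled - flagged)              # wrong words, missed
      tn = len(all_words - flagged - misspelled)  # good words, ignored
      precision = tp / (tp + fp) if tp + fp else 0.0  # penalises FP
      recall    = tp / (tp + fn) if tp + fn else 0.0  # the slide-17 measure
      accuracy  = (tp + tn) / len(all_words)          # credits TN as well
      return precision, recall, accuracy

  words      = {"the", "cat", "sat", "on", "teh", "matt"}
  misspelled = {"teh", "matt"}
  flagged    = {"teh", "cat"}  # one hit, one false alarm, one miss
  print(score(flagged, misspelled, words))  # (0.5, 0.5, 0.666...)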

19
Attributes
  • Different types imply different measures/methods
  • Example: dish-washers
    (a) pre-wash rinse cycle; (b) independent rinse
    cycle
20
Methods and measures
  • Objective measures
  • Measuring, counting, timing
  • Doing a specific task
  • For usability issues, you need to evaluate with a
    number of subjects (not just do it yourself)
  • Comparison against a gold standard
  • Precision
  • Recall
  • Other measures that also take false positives and
    negatives into account

21
Methods and measures
  • Subjective measures
  • Interview after use
  • Feedback questionnaire
  • Rating scales (usually 5 or 7 points, plus DK and
    N/A; a summarising sketch follows below)
  • Open-ended questions?
  • Questions should relate to some specific point
  • Repeat (some) questions in a disguised way
  • Performance analysis
  • Video the session, analyse afterwards
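
A minimal sketch (Python; the response coding is my assumption) of summarising 5-point ratings while keeping DK / N/A responses out of the average:

  # Summarise rating-scale responses; "DK" (don't know) and "NA"
  # (not applicable) are counted but never averaged in as numbers.
  def summarise(responses):
      ratings = [r for r in responses if isinstance(r, int)]
      mean = sum(ratings) / len(ratings) if ratings else None
      return {"n": len(ratings), "mean": mean,
              "dk_na": len(responses) - len(ratings)}

  q1 = [5, 4, "DK", 4, 2, "NA", 5]
  print(summarise(q1))  # {'n': 5, 'mean': 4.0, 'dk_na': 2}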

22
Methods and measures
  • Don't try to measure too many different things
    with the same instrument
  • Though this can be possible to some extent
  • But extraneous factors need to be controlled
    carefully
  • The problem of statistical significance: do you
    have enough subjects to know that the differences
    (and similarities) are not just random
    fluctuations? (a significance-test sketch follows
    below)
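
As a sketch of that question (Python, assuming SciPy is available; the timings are invented), a two-sample t-test on task times under two interface variants:

  # Are the timing differences between two variants more than random
  # fluctuation? All numbers are invented for illustration.
  from scipy import stats

  times_a = [41.2, 38.5, 44.0, 39.9, 42.3, 40.1]  # seconds, variant A
  times_b = [36.0, 37.4, 35.1, 38.8, 34.9, 36.7]  # seconds, variant B

  result = stats.ttest_ind(times_a, times_b)
  print("t =", round(result.statistic, 2), " p =", round(result.pvalue, 4))

With only six subjects per group, a non-significant p-value would not show the variants are equivalent; it may simply mean there were too few subjects.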

23
Example
  • Simulated doctor-patient interviews with patients
    with limited English, using a computer-based
    communication device with symbols and digitised
    speech
  • two devices (laptop + mousepad, tablet + stylus)
  • doctors and nurses
  • literate and illiterate patients
    (the resulting 2 x 2 x 2 design is sketched below)
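
A sketch (Python) of the design implied by the three two-way factors above:

  # The three binary factors give a 2 x 2 x 2 design: eight conditions.
  from itertools import product

  devices   = ["laptop + mousepad", "tablet + stylus"]
  providers = ["doctor", "nurse"]
  patients  = ["literate", "illiterate"]

  for condition in product(devices, providers, patients):
      print(condition)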

24
(No Transcript)
25
Example
  • General question: could they get to the end of
    the consultation? (How did we measure this?)
  • Objective measures (derived in the sketch below):
  • How long did it take?
  • How many questions did they ask?
  • How many answers were (apparently) correctly
    understood?
  • Subjective measures
  • Feedback questionnaire with satisfaction ratings
  • Open-ended questions about specific issues
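
A sketch (Python; the session records are invented) of deriving those objective measures from logged consultations:

  # Each record: (completed?, duration in minutes, questions asked,
  # answers apparently understood). All values are invented.
  sessions = [
      (True, 12.5, 9, 8),
      (True, 15.0, 11, 9),
      (False, 20.0, 6, 3),
  ]

  completed = [s for s in sessions if s[0]]
  print("completion rate:", len(completed) / len(sessions))
  print("mean duration  :", sum(s[1] for s in completed) / len(completed))
  print("understood     :", sum(s[3] for s in completed),
        "of", sum(s[2] for s in completed), "questions")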

26
Subjects
  • Many types of evaluation require volunteers
  • How many do you need?
  • Where will you get them from?
  • Are they suitable?
  • Exclusion factors, eg prior familiarity with your
    topic
  • Need to control for irrelevant differences in
    their profile (a balanced-assignment sketch
    follows below)
  • How will you guarantee their cooperation?
  • Ethical issues
  • Officially, you need ethics clearance for any
    experiments involving living beings!
  • In any case, it is important that volunteers know
    what they are letting themselves in for
  • Also important that you don't waste people's
    time, eg by evaluating a useless task (for
    example as a baseline)
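
One common way to spread profile differences evenly (a sketch; the balanced-block scheme is my choice, not from the slides) is to randomise subjects over conditions in shuffled balanced blocks:

  # Assign volunteers to conditions in shuffled balanced blocks, so
  # each condition receives (near-)equal numbers of subjects.
  import random

  def balanced_assignment(subjects, conditions, seed=0):
      rng = random.Random(seed)
      assignment, block = {}, []
      for subject in subjects:
          if not block:              # start a fresh balanced block
              block = list(conditions)
              rng.shuffle(block)
          assignment[subject] = block.pop()
      return assignment

  print(balanced_assignment(["s1", "s2", "s3", "s4"], ["A", "B"]))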

27
Summary
  • What are you trying to evaluate?
  • Be specific, not general: eg not just "What do
    you think of this interface?"
  • What is the best way to measure what you are
    interested in?
  • How feasible is it to do what you want?
  • After Easter: how to write it all up!

28
Next session
  • No class next week
  • First week after Easter (19 Apr)
  • No class on Thursday
  • Instead, practical sessions on Library Resources
    with Barry White
  • choose one of three sessions
  • each at 2pm-4pm
  • Wed 18, Thur 19 or Fri 20 April
  • in the Joule Library
  • Do we need a sign-up sheet?