Title: The Value in Evaluation
1. The Value in Evaluation
- Erica Friedman
- Assistant Dean of Evaluation
- MSSM
2. [Pyramid diagram relating assessment methods (longitudinal clinical observation, OSCE, SP, practical, oral, application, essay, note review, MEQ, MCQ) to the levels Does, Shows how, Knows how and Knows, with axes running from written to observed methods and from formative to summative uses]
3. Session Goals
- Understand the purposes of assessment
- Understand the framework for selecting and developing assessment methods
- Recognize the benefits and limitations of different methods of assessment
- Conference Objectives
- Review the goals and objectives for your course or clerkship in the context of assessment
- Identify the best methods of assessing your goals and objectives
4. Purpose of Evaluation
- To certify individual competence
- To assure successful completion of goals/objectives
- To provide feedback
- To students
- To faculty, course and clerkship directors
- As a statement of values (what is most critical to learn)
- For program evaluation: evaluation of an aggregate, not an individual (e.g., the average ability of students to perform a focused history and physical)
5. Consequences of evaluation
- Steering effect: exams drive the learning; students study/learn for the exam
- Impetus for change (feedback from students, Executive Curriculum, LCME)
6. Definitions: Reliability
- The consistency of a measurement over time or by different observers (e.g., a thermometer that always reads 98 degrees C when placed in boiling, distilled water at sea level)
- The proportion of variability in a score due to the true difference between subjects (e.g., the difference between Greenwich time and the time on your watch)
- Inter-rater reliability (correlation between the scores of 2 raters)
- Internal reliability (correlation between items within an exam); both are illustrated in the sketch after this list
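As an added illustration (not from the original slides), both kinds of reliability can be estimated from score data. The Python sketch below uses hypothetical rater and item scores: a Pearson correlation for inter-rater reliability, and Cronbach's alpha as one common estimate of internal (inter-item) reliability.

    import statistics

    def pearson(x, y):
        # Inter-rater reliability: correlation between two raters' scores.
        mx, my = statistics.mean(x), statistics.mean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den

    def cronbach_alpha(items):
        # Internal reliability: consistency among items within one exam.
        # items is a list of per-item score lists (same students, same order).
        k = len(items)
        item_var = sum(statistics.variance(scores) for scores in items)
        totals = [sum(per_student) for per_student in zip(*items)]
        return (k / (k - 1)) * (1 - item_var / statistics.variance(totals))

    rater_a = [7, 5, 9, 6, 8]     # hypothetical ratings by rater A
    rater_b = [6, 5, 8, 7, 8]     # the same five students rated by rater B
    items = [[1, 0, 1, 1, 0],     # hypothetical item scores: 3 items x 5 students
             [1, 1, 1, 0, 0],
             [0, 0, 1, 1, 0]]
    print("inter-rater r:", round(pearson(rater_a, rater_b), 2))
    print("Cronbach's alpha:", round(cronbach_alpha(items), 2))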
7. Definitions: Validity
- The ability to measure what was intended (the thermometer above is reliable but not valid, since boiling water at sea level is 100 degrees C, not 98)
- Four types:
- Face/content
- Criterion
- Construct/predictive
- Internal
8. Types of validity
- Face/content: would experts agree that it assesses what's important? (a driver's test mirroring actual driving situations and conditions)
- Criterion: draws an inference from test scores to actual performance (e.g., if a simulated driver's test score predicts the road test score, the simulation test is claimed to have a high degree of criterion validity)
- Construct/predictive: does it assess what it intended to assess? (e.g., the driver's test as a predictor of the likelihood of accidents; results of your course exam predict the students' performance on that section of Step 1)
- Internal: do other methods assessing the same domain obtain similar results? (similar scores from multiple SPs assessing history-taking skills)
9. Types of Evaluations: Formative and Summative Definitions
- Formative evaluation: provides feedback so the learner can modify their learning approach. When the chef tastes the sauce, that's formative evaluation.
- Summative evaluation: done to decide whether a student has met the minimum course requirements (pass or fail), usually judged against normative standards. When the customer tastes the sauce, that's summative evaluation.
10. Conclusions about formative assessments
- Stakes are lower (not determining passing or failing, so lower reliability is tolerated)
- Desire more information, so they may require multiple modalities for validity and reliability (it is rare for one assessment method to identify all critical domains)
- Use evaluation methods that support and reinforce teaching modalities and steer students' learning
- May only identify deficiencies, not define how to remediate
11. Conclusions about summative assessments
- Stakes are higher: an incompetent student might pass, or a student might fail and require remediation
- Desire high reliability (> 0.8), so often require multiple questions, problems or cases (20-30 stations for an OSCE, 15-20 cases for oral presentations, 700 questions for an MCQ); the sketch after this list illustrates why
- Desire high content validity (single cases have low content validity and are not representative)
- Desire high predictive validity (correlation with future performance), which is often hard to achieve
- Consider reliability, validity, benefit and cost (resources, time and money) in determining the best assessment tools
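The link between test length and reliability in the second bullet can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor of n. In the Python sketch below, the single-case reliability of 0.15 and the case counts are assumed for illustration, not taken from the slides.

    def spearman_brown(r_single, n):
        # Predicted reliability of a test lengthened by a factor of n,
        # given the reliability r_single of a single case or station.
        return (n * r_single) / (1 + (n - 1) * r_single)

    r_single = 0.15  # assumed reliability of a single case or station
    for n in (1, 5, 10, 20, 30):
        print(f"{n:2d} cases -> predicted reliability {spearman_brown(r_single, n):.2f}")

Under this assumption, predicted reliability only passes 0.8 somewhere between 20 and 30 cases, which is consistent with the station and case counts quoted above.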
12. How to Match Assessment to Goals and Teaching Methods
- Define the type of learning (lecture, small group, computer module/self-study, etc.)
- Define the domain to be assessed (knowledge, skill, behavior) and the level of performance expected (knows, knows how, shows how or does)
- Determine the type of feedback required
13. Purpose of feedback
- For students: to provide a good platform to support and enhance student learning
- For faculty: to determine what works (what facilitated learning and who were appropriate role models)
- For students and faculty: to determine areas that require improvement
14. Types of Feedback
- Quantitative
- Total score compared to other students, providing the high, low and mean scores and the minimum requirement for a passing grade (see the sketch after this list)
- Qualitative
- Written personal feedback identifying areas of strength and weakness
- Oral feedback, one on one or in a group, to discuss the areas of deficiency and help guide further learning
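A minimal sketch of the quantitative feedback described above, assuming a hypothetical list of class scores and a hypothetical minimum passing score:

    import statistics

    scores = [62, 71, 78, 84, 90, 88, 55, 73]  # hypothetical class scores
    passing = 65                               # hypothetical minimum passing score

    # Class-level summary: high, low and mean scores plus the passing requirement.
    print(f"high {max(scores)}, low {min(scores)}, "
          f"mean {statistics.mean(scores):.1f}, pass >= {passing}")

    # Each student's score in the context of that summary.
    for i, s in enumerate(scores, start=1):
        print(f"student {i}: {s} ({'pass' if s >= passing else 'below minimum'})")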
15. Evaluation Bias: Pitfalls
- Can occur with any evaluation requiring interpretation by an individual (all methods other than MCQ)
- Expectation bias (halo effect): prior knowledge or expectation of the outcome influences the ratings (especially a global rating)
- Audience effect: a learner's performance is influenced by the presence of an observer (seen especially with skills and behaviors)
- Rater traits: the training or traits of the rater affect the reliability of the observation
16. Types of assessment tools: Written
- Does not require an evaluator to be present during the assessment; can be open or closed book
- Multiple choice question (MCQ)
- Modified short-answer essay question (MEQ); the patient management problem is a variation of this
- Essay
- Application test
- Medical note/chart review
17. Types of assessment tools: Observer-Dependent Interaction
- Usually requires active involvement of an assessor and occurs as a single event
- Practical
- Medical record review
- Standardized Patient(s) (SP)
- Objective Structured Clinical Examination (OSCE)
- Oral examination: chart-stimulated recall, triple jump, or direct observation
18. Types of assessment tools: Observer-Dependent Longitudinal Interaction
- Continual evaluation over time
- Preceptor evaluation: either completion of a critical incident report or a structured rating form based on direct observation over time
- Peer evaluation
- Self evaluation
19. MCQ
- Definition: a test composed of questions in which each stem is followed by several alternative answers; the examinee must select the most correct answer
- Measures: Knows and Knows how
- Pros: efficient; cheap; samples a large content domain (60 questions/hour); high reliability; easy, objective scoring; direct correlate of knowledge with expertise
- Cons: often a recall of facts; provides opportunity for guessing (rewards the good test-taker); unrealistic; doesn't provide information about the thought process; encourages learning to recall
- Suggestions: create questions that can be answered from the stem alone; avoid "always," "frequently," "all" or "none"; randomly assign the position of correct answers; can correct for guessing with a penalty formula (see the sketch after this list)
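The "penalty formula" in the last bullet usually refers to formula scoring: the number right minus the number wrong divided by (k - 1), where k is the number of options per item, so that random guessing gains nothing on average. A brief Python sketch with hypothetical counts:

    def corrected_score(num_right, num_wrong, options_per_item):
        # Formula scoring: omitted items carry no penalty, and random guessing
        # is expected to contribute zero to the corrected score.
        return num_right - num_wrong / (options_per_item - 1)

    # Hypothetical 60-item exam, 5 options per item: 45 right, 10 wrong, 5 omitted.
    print(corrected_score(45, 10, 5))  # 42.5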
20. MEQ
- Definition: a series of sequential questions in a linear format based on an initial limited amount of information; it requires immediate short answers, followed by additional information and subsequent questions (the patient management problem is a variation of this type)
- Measures: Knows and Knows how
- Pros: can assess problem solving, hypothesis generation and data interpretation
- Cons: low inter-case reliability; less content validity; harder to administer; time-consuming to grade; variable inter-rater reliability
- Suggestions: use directed (not open-ended) questions; provide an extensive answer key
21. Open-ended essay question
- Definition: a question allowing a student the freedom to decide the topic to address and the position to take; it can be take-home
- Measures: Knows, Knows how
- Pros: assesses the ability to think (generate ideas, weigh arguments, organize information, build and support conclusions, and communicate thoughts); high face validity
- Cons: low reliability; time-intensive to grade; narrow coverage of content
- Suggestions: strictly define the response and the rating criteria
22. Application test
- Definition: an open-book problem-solving test incorporating a variety of MCQs and MEQs; it provides a description of a problem with data, and the examinee is asked to interpret the data to solve the problem (e.g., quiz item 3)
- Measures: Knows and Knows how
- Pros: assesses higher learning; good face/content validity; reasonable reliability; useful for formative and summative feedback
- Cons: harder to create and grade
23. Practical Exam
- Definition: a hands-on exam to demonstrate and apply knowledge (e.g., culture and identify the bacteria on the glove of a Sinai cafeteria worker, or perform a history and physical on a patient)
- Measures: Knows, Knows how, and possibly Shows how and Does
- Pros: can test multiple domains; actively involves the learner (good steering effect); best suited for procedural/technical skills; higher face validity
- Cons: labor-intensive (creation and grading); hard to identify a gold standard, so grading is subjective; high rate of item failure (unanticipated problems with administration)
- Suggestions: pilot first; provide adequate, specific instructions and goals; use specific, defined criteria for grading and train raters for direct observation; require multiple encounters for higher reliability
24. Medical record/note review
- Definition: the examiner reviews a document the learner previously created; the selection can be random
- Measures: Knows how and Does
- Pros: can review multiple records for higher reliability; high face validity; less costly than an oral exam (done without the learner and at the examiner's convenience)
- Cons: lower inter-rater reliability; less immediate feedback; unable to determine the basis for decisions
- Suggestions: create a template with specific ratings for skills
25. Standardized Patients
- Definition: a simulated patient/actor trained to present a history in a reliable, consistent manner and to use a checklist to assess students' skills and behaviors
- Measures: Knows, Knows how, Shows how and Does
- Pros: high face validity; can assess multiple domains; can be standardized; can give immediate feedback
- Cons: costly; labor-intensive; must use multiple SPs for high reliability
26. OSCE (Objective Structured Clinical Exam)
- Definition: a task-oriented, multi-station exam; stations can be 5-30 minutes and require written answers or observation (e.g., take orthostatic vital signs, perform a cardiac exam, provide smoking cessation counseling, read and interpret CXR or EKG results, communicate lab results and advise a patient)
- Measures: Knows, Knows how, Shows how and Does
27. OSCE (Objective Structured Clinical Exam)
- Pros: assesses clinical competency; tests a wide range of knowledge, skills and behaviors; can give immediate feedback; good test-retest reliability; good content and construct validity; less patient and examiner variability than with direct observation
- Cons: costly (manpower and money); case-specific; requires > 20 stations for internal consistency; weaker criterion validity
28. Oral Examination
- Definition: a method of evaluating a learner's knowledge by asking a series of questions; the process is open-ended, with the examiner directing the questions (e.g., chart-stimulated recall or a triple jump)
- Measures: Knows, Knows how, sometimes Shows how and Does
29. Oral Exam
- Pros: can measure clinical judgment, interpersonal skills (communication) and behavior; high face validity; flexible; can provide direct feedback
- Cons: poor inter-rater reliability (dove vs. hawk raters and observer bias); content-specific, so low reliability (must use > 6 cases to increase reliability); labor-intensive
- Suggestions: use multiple short cases; define the questions and answers; provide simple rating scales and train raters
30. Triple Jump
- Definition: a three-step exam with a written part, a research part and then an oral part (e.g., COMPASS 1)
- Measures: Knows, Knows how, Shows how and Does
- Pros: assesses hypothesis generation, use of resources, application of knowledge to problem solving and self-directed learning; provides immediate feedback; high face validity
- Cons: only for formative assessment (poor reliability); time- and faculty-intensive; too content-specific; inconsistent rater evaluations
31. Clinical Observations
- Definition: assessment of various domains longitudinally by an observer, either a preceptor, a peer or the learner (small group evaluations during the first two years and preceptor ratings during clinical exposure)
- Measures: Knows, Knows how, Shows how and Does
- Pros: simple; efficient; high face validity; both formative and summative
32. Clinical Observations
- Cons: low reliability (often only recent encounters influence the grade); halo effect (lack of domain discrimination); more often a judgment of personality; Lake Wobegon effect (all students are rated above average); unwillingness to document negative ratings (fear of failing someone)
- Suggestions: frequent ratings and feedback; increase the number of observations; use multiple assessors (with group discussion about specific ratings)
33. Peer/Self Evaluation
- Pros: useful for formative feedback
- Cons: lack of correlation with faculty evaluations; same cons as other observer-dependent methods (a measure of the "nice guy," low reliability, halo effect); peer evaluations can also suffer from a friend effect, fear of retribution, or a desire to penalize
- Suggestions: limit the number of behaviors assessed; clarify the difference between evaluation of professional and personal aspects; develop operationally proven criteria for rating; provide multiple opportunities for students to do this; and provide feedback from faculty
34. Erica Friedman's Educational Pyramid
- Does: direct observation, practical
- Shows how: OSCE, triple jump, oral, SP, practical, chart review
- Knows how: MEQ, essay
- Knows: MCQ
36. Critical factors for choosing an evaluation tool
- Type of evaluation and feedback desired
- Formative/summative
- Focus of evaluation
- Knowledge, skills, behaviors (attitudes)
- Level of evaluation
- Knows, Knows how, Shows how, Does
- Pros/cons
- Validity, reliability, cost (time, resources)
37. How to be successful
- Students should be clear about the course/clerkship goals, the specific types of assessments used, and the criteria for passing (and, if relevant, for just short of honors and for honors)
- Make sure the choice of assessments is consistent with the values of your course and the school
- Final judgments about students' progress should be based on multiple assessments using a variety of methods over a period of time (instead of a single time point)
38. Number of courses or clerkships using a specific assessment tool (assessing our assessment methods)
39. Why assess ourselves?
- Assure successful completion of our course goals and objectives
- Assure integration with the mission of the school
- Direct our teaching/learning (determine what worked and what needs changing)
40. How we currently assess ourselves
- Student evaluations (quantitative and qualitative), most often summative
- Performance of students on our exam and on specific sections of the USMLE
- Focus and feedback groups (formative; currently done by the Dean's office)
- Peer evaluations of the course/clerkship by the ECC
- Self evaluations: a yearly grid completed by course directors and core faculty
- Consider peer evaluation of teaching and teaching materials