Title: The Value in Evaluation
1. The Value in Evaluation
- Erica Friedman
- Assistant Dean of Evaluation
- MSSM
2. [Pyramid diagram relating assessment methods (longitudinal clinical observation, OSCE, SP, practical, oral, application, essay, note review, MEQ, MCQ) to the levels Does, Shows how, Knows how and Knows, with axes running from written to observed methods and from formative to summative uses]
3. Session Goals
- Understand the purposes of assessment
- Understand the framework for selecting and developing assessment methods
- Recognize the benefits and limitations of different methods of assessment
- Conference Objectives
- Review the goals and objectives for your course or clerkship in the context of assessment
- Identify the best methods of assessing your goals and objectives
4. Purpose of Evaluation
- To certify individual competence
- To assure successful completion of goals/objectives
- To provide feedback
- To students
- To faculty, course and clerkship directors
- As a statement of values (what is most critical to learn)
- For program evaluation: evaluation of an aggregate, not an individual (e.g., the average ability of students to perform a focused history and physical)
5. Consequences of evaluation
- Steering effect: exams drive the learning; students study/learn for the exam
- Impetus for change (feedback from students, Executive Curriculum, LCME)
6. Definitions: Reliability
- The consistency of a measurement over time or by different observers (e.g., a thermometer that always reads 98 degrees C when placed in boiling, distilled water at sea level)
- The proportion of variability in a score due to the true difference between subjects (e.g., the difference between Greenwich time and the time on your watch)
- Inter-rater reliability (correlation between the scores of 2 raters)
- Internal reliability (correlation between items within an exam); both are illustrated in the sketch after this list
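As an added illustration (not from the original slides), both kinds of reliability can be estimated from score data. The Python sketch below uses hypothetical rater and item scores: a Pearson correlation for inter-rater reliability, and Cronbach's alpha as one common estimate of internal (inter-item) reliability.

    import statistics

    def pearson(x, y):
        # Inter-rater reliability: correlation between two raters' scores.
        mx, my = statistics.mean(x), statistics.mean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den

    def cronbach_alpha(items):
        # Internal reliability: consistency among items within one exam.
        # items is a list of per-item score lists (same students, same order).
        k = len(items)
        item_var = sum(statistics.variance(scores) for scores in items)
        totals = [sum(per_student) for per_student in zip(*items)]
        return (k / (k - 1)) * (1 - item_var / statistics.variance(totals))

    rater_a = [7, 5, 9, 6, 8]     # hypothetical ratings by rater A
    rater_b = [6, 5, 8, 7, 8]     # the same five students rated by rater B
    items = [[1, 0, 1, 1, 0],     # hypothetical item scores: 3 items x 5 students
             [1, 1, 1, 0, 0],
             [0, 0, 1, 1, 0]]
    print("inter-rater r:", round(pearson(rater_a, rater_b), 2))
    print("Cronbach's alpha:", round(cronbach_alpha(items), 2))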
7. Definitions: Validity
- The ability to measure what was intended (the thermometer above is reliable but not valid, since boiling water at sea level is 100 degrees C, not 98)
- Four types:
- Face/content
- Criterion
- Construct/predictive
- Internal
8. Types of validity
- Face/content: would experts agree that it assesses what's important? (a driver's test mirroring actual driving situations and conditions)
- Criterion: draws an inference from test scores to actual performance (e.g., if a simulated driver's test score predicts the road test score, the simulation test is claimed to have a high degree of criterion validity)
- Construct/predictive: does it assess what it intended to assess? (e.g., the driver's test as a predictor of the likelihood of accidents; results of your course exam predict the students' performance on that section of Step 1)
- Internal: do other methods assessing the same domain obtain similar results? (similar scores from multiple SPs assessing history-taking skills)
9. Types of Evaluations: Formative and Summative Definitions
- Formative evaluation: provides feedback so the learner can modify their learning approach. When the chef tastes the sauce, that's formative evaluation.
- Summative evaluation: done to decide whether a student has met the minimum course requirements (pass or fail), usually judged against normative standards. When the customer tastes the sauce, that's summative evaluation.
10. Conclusions about formative assessments
- Stakes are lower (not determining passing or failing, so lower reliability is tolerated)
- Desire more information, so they may require multiple modalities for validity and reliability (it is rare for one assessment method to identify all critical domains)
- Use evaluation methods that support and reinforce teaching modalities and steer students' learning
- May only identify deficiencies, not define how to remediate
11. Conclusions about summative assessments
- Stakes are higher: an incompetent student might pass, or a student might fail and require remediation
- Desire high reliability (> 0.8), so often require multiple questions, problems or cases (20-30 stations for an OSCE, 15-20 cases for oral presentations, 700 questions for an MCQ); the sketch after this list illustrates why
- Desire high content validity (single cases have low content validity and are not representative)
- Desire high predictive validity (correlation with future performance), which is often hard to achieve
- Consider reliability, validity, benefit and cost (resources, time and money) in determining the best assessment tools
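The link between test length and reliability in the second bullet can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor of n. In the Python sketch below, the single-case reliability of 0.15 and the case counts are assumed for illustration, not taken from the slides.

    def spearman_brown(r_single, n):
        # Predicted reliability of a test lengthened by a factor of n,
        # given the reliability r_single of a single case or station.
        return (n * r_single) / (1 + (n - 1) * r_single)

    r_single = 0.15  # assumed reliability of a single case or station
    for n in (1, 5, 10, 20, 30):
        print(f"{n:2d} cases -> predicted reliability {spearman_brown(r_single, n):.2f}")

Under this assumption, predicted reliability only passes 0.8 somewhere between 20 and 30 cases, which is consistent with the station and case counts quoted above.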
12. How to Match Assessment to Goals and Teaching Methods
- Define the type of learning (lecture, small group, computer module/self-study, etc.)
- Define the domain to be assessed (knowledge, skill, behavior) and the level of performance expected (knows, knows how, shows how or does)
- Determine the type of feedback required
13. Purpose of feedback
- For students: to provide a good platform to support and enhance student learning
- For faculty: to determine what works (what facilitated learning and who were appropriate role models)
- For students and faculty: to determine areas that require improvement
14. Types of Feedback
- Quantitative
- Total score compared to other students, providing the high, low and mean scores and the minimum requirement for a passing grade (see the sketch after this list)
- Qualitative
- Written personal feedback identifying areas of strength and weakness
- Oral feedback, one on one or in a group, to discuss the areas of deficiency and help guide further learning
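A minimal sketch of the quantitative feedback described above, assuming a hypothetical list of class scores and a hypothetical minimum passing score:

    import statistics

    scores = [62, 71, 78, 84, 90, 88, 55, 73]  # hypothetical class scores
    passing = 65                               # hypothetical minimum passing score

    # Class-level summary: high, low and mean scores plus the passing requirement.
    print(f"high {max(scores)}, low {min(scores)}, "
          f"mean {statistics.mean(scores):.1f}, pass >= {passing}")

    # Each student's score in the context of that summary.
    for i, s in enumerate(scores, start=1):
        print(f"student {i}: {s} ({'pass' if s >= passing else 'below minimum'})")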
15. Evaluation Bias: Pitfalls
- Can occur with any evaluation requiring interpretation by an individual (all methods other than MCQ)
- Expectation bias (halo effect): prior knowledge or expectation of the outcome influences the ratings (especially a global rating)
- Audience effect: a learner's performance is influenced by the presence of an observer (seen especially with skills and behaviors)
- Rater traits: the training or traits of the rater affect the reliability of the observation
16. Types of assessment tools: Written
- Does not require an evaluator to be present during the assessment; can be open or closed book
- Multiple choice question (MCQ)
- Modified short-answer essay question (MEQ); the patient management problem is a variation of this
- Essay
- Application test
- Medical note/chart review
17. Types of assessment tools: Observer-Dependent Interaction
- Usually requires active involvement of an assessor and occurs as a single event
- Practical
- Medical record review
- Standardized Patient(s) (SP)
- Objective Structured Clinical Examination (OSCE)
- Oral examination: chart-stimulated recall, triple jump, or direct observation
18. Types of assessment tools: Observer-Dependent Longitudinal Interaction
- Continual evaluation over time
- Preceptor evaluation: either completion of a critical incident report or a structured rating form based on direct observation over time
- Peer evaluation
- Self evaluation
19. MCQ
- Definition: a test composed of questions in which each stem is followed by several alternative answers; the examinee must select the most correct answer
- Measures: Knows and Knows how
- Pros: efficient; cheap; samples a large content domain (60 questions/hour); high reliability; easy, objective scoring; direct correlate of knowledge with expertise
- Cons: often a recall of facts; provides opportunity for guessing (rewards the good test-taker); unrealistic; doesn't provide information about the thought process; encourages learning to recall
- Suggestions: create questions that can be answered from the stem alone; avoid "always," "frequently," "all" or "none"; randomly assign the position of correct answers; can correct for guessing with a penalty formula (see the sketch after this list)
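The "penalty formula" in the last bullet usually refers to formula scoring: the number right minus the number wrong divided by (k - 1), where k is the number of options per item, so that random guessing gains nothing on average. A brief Python sketch with hypothetical counts:

    def corrected_score(num_right, num_wrong, options_per_item):
        # Formula scoring: omitted items carry no penalty, and random guessing
        # is expected to contribute zero to the corrected score.
        return num_right - num_wrong / (options_per_item - 1)

    # Hypothetical 60-item exam, 5 options per item: 45 right, 10 wrong, 5 omitted.
    print(corrected_score(45, 10, 5))  # 42.5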
20. MEQ
- Definition: a series of sequential questions in a linear format based on an initial limited amount of information; it requires immediate short answers, followed by additional information and subsequent questions (the patient management problem is a variation of this type)
- Measures: Knows and Knows how
- Pros: can assess problem solving, hypothesis generation and data interpretation
- Cons: low inter-case reliability; less content validity; harder to administer; time-consuming to grade; variable inter-rater reliability
- Suggestions: use directed (not open-ended) questions; provide an extensive answer key
21. Open-ended essay question
- Definition: a question allowing a student the freedom to decide the topic to address and the position to take; it can be take-home
- Measures: Knows, Knows how
- Pros: assesses the ability to think (generate ideas, weigh arguments, organize information, build and support conclusions, and communicate thoughts); high face validity
- Cons: low reliability; time-intensive to grade; narrow coverage of content
- Suggestions: strictly define the response and the rating criteria
22. Application test
- Definition: an open-book problem-solving test incorporating a variety of MCQs and MEQs; it provides a description of a problem with data, and the examinee is asked to interpret the data to solve the problem (e.g., quiz item 3)
- Measures: Knows and Knows how
- Pros: assesses higher learning; good face/content validity; reasonable reliability; useful for formative and summative feedback
- Cons: harder to create and grade
23. Practical Exam
- Definition: a hands-on exam to demonstrate and apply knowledge (e.g., culture and identify the bacteria on the glove of a Sinai cafeteria worker, or perform a history and physical on a patient)
- Measures: Knows, Knows how, and possibly Shows how and Does
- Pros: can test multiple domains; actively involves the learner (good steering effect); best suited for procedural/technical skills; higher face validity
- Cons: labor-intensive (creation and grading); hard to identify a gold standard, so grading is subjective; high rate of item failure (unanticipated problems with administration)
- Suggestions: pilot first; provide adequate, specific instructions and goals; use specific, defined criteria for grading and train raters for direct observation; require multiple encounters for higher reliability
24. Medical record/note review
- Definition: the examiner reviews a document the learner previously created; the selection can be random
- Measures: Knows how and Does
- Pros: can review multiple records for higher reliability; high face validity; less costly than an oral exam (done without the learner and at the examiner's convenience)
- Cons: lower inter-rater reliability; less immediate feedback; unable to determine the basis for decisions
- Suggestions: create a template with specific ratings for skills
25. Standardized Patients
- Definition: a simulated patient/actor trained to present a history in a reliable, consistent manner and to use a checklist to assess students' skills and behaviors
- Measures: Knows, Knows how, Shows how and Does
- Pros: high face validity; can assess multiple domains; can be standardized; can give immediate feedback
- Cons: costly; labor-intensive; must use multiple SPs for high reliability
26. OSCE (Objective Structured Clinical Exam)
- Definition: a task-oriented, multi-station exam; stations can be 5-30 minutes and require written answers or observation (e.g., take orthostatic vital signs, perform a cardiac exam, provide smoking cessation counseling, read and interpret CXR or EKG results, communicate lab results and advise a patient)
- Measures: Knows, Knows how, Shows how and Does
27. OSCE (Objective Structured Clinical Exam)
- Pros: assesses clinical competency; tests a wide range of knowledge, skills and behaviors; can give immediate feedback; good test-retest reliability; good content and construct validity; less patient and examiner variability than with direct observation
- Cons: costly (manpower and money); case-specific; requires > 20 stations for internal consistency; weaker criterion validity
28. Oral Examination
- Definition: a method of evaluating a learner's knowledge by asking a series of questions; the process is open-ended, with the examiner directing the questions (e.g., chart-stimulated recall or a triple jump)
- Measures: Knows, Knows how, sometimes Shows how and Does
29. Oral Exam
- Pros: can measure clinical judgment, interpersonal skills (communication) and behavior; high face validity; flexible; can provide direct feedback
- Cons: poor inter-rater reliability (dove vs. hawk raters and observer bias); content-specific, so low reliability (must use > 6 cases to increase reliability); labor-intensive
- Suggestions: use multiple short cases; define the questions and answers; provide simple rating scales and train raters
30. Triple Jump
- Definition: a three-step exam with a written part, a research part and then an oral part (e.g., COMPASS 1)
- Measures: Knows, Knows how, Shows how and Does
- Pros: assesses hypothesis generation, use of resources, application of knowledge to problem solving and self-directed learning; provides immediate feedback; high face validity
- Cons: only for formative assessment (poor reliability); time- and faculty-intensive; too content-specific; inconsistent rater evaluations
31. Clinical Observations
- Definition: assessment of various domains longitudinally by an observer, either a preceptor, a peer or the learner (small group evaluations during the first two years and preceptor ratings during clinical exposure)
- Measures: Knows, Knows how, Shows how and Does
- Pros: simple; efficient; high face validity; both formative and summative
32. Clinical Observations
- Cons: low reliability (often only recent encounters influence the grade); halo effect (lack of domain discrimination); more often a judgment of personality; Lake Wobegon effect (all students are rated above average); unwillingness to document negative ratings (fear of failing someone)
- Suggestions: frequent ratings and feedback; increase the number of observations; use multiple assessors (with group discussion about specific ratings)
33. Peer/Self Evaluation
- Pros: useful for formative feedback
- Cons: lack of correlation with faculty evaluations; same cons as other observer-dependent methods (a measure of the "nice guy," low reliability, halo effect); peer evaluations can also suffer from a friend effect, fear of retribution, or a desire to penalize
- Suggestions: limit the number of behaviors assessed; clarify the difference between evaluation of professional and personal aspects; develop operationally proven criteria for rating; provide multiple opportunities for students to do this; and provide feedback from faculty
34. Erica Friedman's Educational Pyramid
- Does: direct observation, practical
- Shows how: OSCE, triple jump, oral, SP, practical, chart review
- Knows how: MEQ, essay
- Knows: MCQ
36. Critical factors for choosing an evaluation tool
- Type of evaluation and feedback desired
- Formative/summative
- Focus of evaluation
- Knowledge, skills, behaviors (attitudes)
- Level of evaluation
- Knows, Knows how, Shows how, Does
- Pros/cons
- Validity, reliability, cost (time, resources)
37. How to be successful
- Students should be clear about the course/clerkship goals, the specific types of assessments used, and the criteria for passing (and, if relevant, for just short of honors and for honors)
- Make sure the choice of assessments is consistent with the values of your course and the school
- Final judgments about students' progress should be based on multiple assessments using a variety of methods over a period of time (instead of a single time point)
38. Number of courses or clerkships using a specific assessment tool (assessing our assessment methods)
39. Why assess ourselves?
- Assure successful completion of our course goals and objectives
- Assure integration with the mission of the school
- Direct our teaching/learning (determine what worked and what needs changing)
40. How we currently assess ourselves
- Student evaluations (quantitative and qualitative), most often summative
- Performance of students on our exam and on specific sections of the USMLE
- Focus and feedback groups (formative; currently done by the Dean's office)
- Peer evaluations of the course/clerkship by the ECC
- Self evaluations: a yearly grid completed by course directors and core faculty
- Consider peer evaluation of teaching and teaching materials