Title: Assessment of Education Outcomes
Resource: ACGME Toolbox of Assessment Methods
ACGME and Curriculum Design
- Focus on competency-based education
- Competency-based education focuses on learner
performance (learning outcomes) in reaching
specific objectives (goals and objectives of the
curriculum).
ACGME Requires That
- Learning opportunities in each competency domain
- Evidence of multiple assessment methods
- Use of aggregate data to improve the educational
program
What are the competencies?
- Medical Knowledge
- Patient Care
- Practice Based Learning and Improvement
- Systems Based Practice
- Professionalism
- Interpersonal and Communication Skills
Glossary of Terms--Reliability/Reproducibility
- When scores on a given test are consistent with prior scores for the same or similar individuals.
- Measured as a correlation, with 1.0 being perfect reliability and 0.5 being unreliable (see the sketch below).
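To make the correlation idea concrete, here is a minimal Python sketch (not part of the ACGME toolbox; the score lists are made up) that treats test-retest reliability as the Pearson correlation between two administrations of the same exam:

    from math import sqrt

    def pearson_r(x, y):
        # Pearson correlation coefficient between two equal-length score lists
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    first_attempt = [72, 85, 64, 90, 78]    # hypothetical resident scores, administration 1
    second_attempt = [70, 88, 61, 93, 80]   # same residents, administration 2
    print(f"test-retest reliability: {pearson_r(first_attempt, second_attempt):.2f}")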
Glossary of Terms--Validity
- How well the assessment measures represent or predict a resident's ability or behavior.
- It is the scores, not the kind of test, that are valid; i.e., it is possible to determine whether the written exam scores for a group of residents are valid, but incorrect to say that all written exams are valid.
Glossary of Terms--Generalizable
- Measurements (scores) derived from an assessment
tool are considered generalizable if they can
apply to more than the sample of cases or test
questions used in a specific assessment
Glossary of Terms--Types of Evaluation
- Formative: intended to provide constructive feedback, not to make a go/no-go decision
- Summative: designed to accumulate all evaluations into a go/no-go decision
360-Degree Evaluation Instrument
- Measurement tools completed by multiple people in a person's sphere of influence.
- Most use a rating scale of 1-5, with 5 meaning "all the time" and 1 meaning "never."
- Evaluators provide more accurate and less lenient ratings when the evaluation is used for formative rather than summative purposes.
360-Degree Evaluation Instrument
- Published reports of use are very limited.
- Reports describe various categories of people evaluating residents at the same time with different instruments.
- Reproducible results were most easily obtained when 5-10 nurses rated residents, whereas greater numbers of faculty and patients were necessary for the same degree of reliability.
- Higher reliability has been seen in military and education settings.
360-Degree Evaluation Instrument
- Two practical challenges:
- Constructing surveys that are appropriate for use by a variety of evaluators
- Orchestrating data collection from a large number of individuals
- Use of an electronic database is helpful in collecting these data (see the sketch below)
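As one way to picture the data-collection step, here is a minimal Python sketch (hypothetical names, roles, and scores) that aggregates 1-5 ratings from multiple evaluator groups into a per-resident summary:

    from collections import defaultdict
    from statistics import mean

    ratings = [
        # (resident, evaluator role, score on the 1-5 scale)
        ("Resident A", "nurse", 4),
        ("Resident A", "nurse", 5),
        ("Resident A", "faculty", 3),
        ("Resident A", "patient", 5),
        ("Resident B", "nurse", 4),
        ("Resident B", "faculty", 4),
    ]

    summary = defaultdict(lambda: defaultdict(list))
    for resident, role, score in ratings:
        summary[resident][role].append(score)

    for resident, by_role in summary.items():
        means = {role: round(mean(scores), 2) for role, scores in by_role.items()}
        print(resident, means)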
Chart Stimulated Recall (CSR)
- An examination in which the resident's own patient cases are assessed in a standardized oral examination.
- A trained physician examiner questions the examinee about the care provided, probing for the reasons behind the workup, diagnoses, interpretation, and treatment plans.
- CSR takes 5-10 minutes per patient case.
Chart Stimulated Recall
- Cases are chosen to be samples of patients the examinee should be able to manage.
- Scores are derived based on predefined scoring rules.
- The examinee's performance is determined by combining scores from all cases into a pass/fail decision, overall or for each session.
Chart Stimulated Recall
- Exam score reliability has been reported between 0.65 and 0.88.
- Physician examiners need to be trained in how to question the examinee and score the responses.
- Mock orals can use residents' cases with less standardization to help familiarize residents with the upcoming orals.
- CSR oral exams require resources and expertise to fairly test competency and accurately standardize the exam.
Checklist Evaluation
- Consists of essential or desired specific behaviors.
- Typical response options are check boxes or "yes" to indicate that the behavior occurred.
- Forms provide information for the purpose of making a judgment regarding the adequacy of overall performance.
Checklist Evaluation
- Useful for evaluating a competency that can be broken down into specific individual behaviors.
- Checklists have been shown to be useful for demonstrating specific clinical skills, procedural skills, history taking, and physical examination.
Checklist Evaluation
- When users are trained, reliability is in the 0.7 to 0.8 range.
- To ensure validity, checklists require consensus by several experts.
- Requires trained evaluators.
Global Rating of Live or Recorded Performance
- The rater judges general rather than specific skills (e.g., clinical judgment, medical knowledge).
- Judgments are made retrospectively, based on general impressions formed over time.
- All rating forms have some scale on which the resident is rated.
- Written comments are important to allow the evaluator to explain the rating.
Global Rating of Live or Recorded Performance
- Most often used to rate a resident at the end of a rotation, summarizing performance over days or weeks.
- Scores can be highly subjective.
- Sometimes all competencies are rated the same in spite of variable performance.
- Some scores are biased when raters refuse to use the extreme ends of the scale to avoid being harsh or extreme.
Global Rating of Live or Recorded Performance
- More skilled physicians give more reproducible ratings than physicians with less experience.
- Faculty give more lenient ratings than residents.
- Training of raters is important for reproducibility of the results.
Objective Structured Clinical Exam (OSCE)
- One or more assessment tools are administered over 12-20 separate patient encounter stations.
- All candidates move from station to station in a set sequence and with similar time constraints.
- Standardized patients are the primary evaluation
tool in OSCE exams
Objective Structured Clinical Exam (OSCE)
- Useful for measuring patient/doctor encounters in a standardized manner.
- Not useful for measuring outcomes of continuity care or procedural outcomes.
- A separate performance score is tallied for each station and combined into a global score (see the sketch below).
- An OSCE with 14 to 18 stations has been recommended to obtain reliable measures of performance.
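As an illustration of score combination, here is a minimal Python sketch (the stations and cut score are hypothetical) that averages per-station scores into a global score and applies a preset pass/fail standard:

    from statistics import mean

    station_scores = {
        "history taking": 82,
        "physical exam": 75,
        "counseling": 68,
        "ECG interpretation": 90,
    }
    PASS_STANDARD = 70  # illustrative cut score set in advance

    global_score = mean(station_scores.values())
    result = "PASS" if global_score >= PASS_STANDARD else "FAIL"
    print(f"global score: {global_score:.1f} -> {result}")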
Objective Structured Clinical Exam (OSCE)
- Very useful to measure specific skills
- Very difficult to administer
- Most cost-effective with large programs
Procedural, Operative, or Case Logs
- Document each patient encounter
- Logs may or may not include numbers of cases; details may vary from log to log.
- There is no known study looking at procedure logs and outcomes.
- Electronic databases make storing these data feasible.
Patient Surveys
- Surveys about the patient experience often include questions about physician care, such as amount of time spent, overall quality of care, competency, courtesy, empathy, and interest.
- Rated according to a scale, or with yes/no responses to statements such as "the doctor kept me waiting."
Patient Surveys
- Reliability estimates of 0.9 or greater have been achieved for patient satisfaction survey forms used in hospitals and clinics.
- Reliability is much lower for ratings of residents, in the range of 0.7-0.82, using an American Board of Internal Medicine Patient Satisfaction Questionnaire.
- Use of rating scales such as "yes, definitely," "yes, somewhat," or "no" may produce more reproducible results.
Patient Surveys
- Available from commercial developers and medical organizations.
- Focus on desirable and undesirable physician behaviors.
- Can be filled out quickly.
- Difficulty with language barriers.
- Difficulty obtaining enough surveys per resident to provide reproducible results.
Portfolios
- A collection of products prepared by the resident that provides evidence of learning and achievement related to a learning plan.
- Can include written documents, video and audio recordings, photographs, and other forms of information.
- Reflection on what has been learned is an important part of constructing a portfolio.
Portfolios
- Can be used for both summative and formative evaluation.
- Most useful for evaluating mastery of competencies that are difficult to evaluate in other ways, such as practice-based improvement and use of scientific evidence in patient care.
Portfolios
- Reproducible assessments are feasible when there is agreement on criteria and standards for the portfolio.
- Can be more useful for assessing an educational program than an individual.
- May be counterproductive when standard criteria are used to demonstrate individual learning gains relative to individual goals.
- Validity is determined by the extent to which the products or documentation included demonstrate mastery of the expected learning.
Record Review
- Trained staff at the institution review medical records and abstract information such as medications, tests ordered, procedures performed, and patient outcomes.
- Records are summarized and compared to accepted patient care standards.
- Standards of care exist for more than 1600 diseases on the website of the Agency for Healthcare Research and Quality (AHRQ).
Record Review
- A sample of 8-10 patient records is sufficient for a reliable assessment of care for a diagnosis or procedure.
- Fewer are necessary if records are chosen at random.
- Missing or incomplete documentation is interpreted as not meeting the accepted standard (see the sketch below).
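To make the abstraction rule concrete, here is a minimal Python sketch (the criteria and chart fields are hypothetical) in which each abstracted record is checked against agreed-upon care criteria and a missing or incomplete entry counts as not meeting the standard:

    # Illustrative criteria for a single diagnosis; real criteria would come
    # from accepted care standards (e.g., those catalogued by AHRQ).
    CRITERIA = ["hba1c_ordered", "foot_exam_done", "statin_prescribed"]

    records = [
        {"hba1c_ordered": True, "foot_exam_done": True, "statin_prescribed": True},
        {"hba1c_ordered": True, "foot_exam_done": False},  # statin entry missing
    ]

    for i, record in enumerate(records, start=1):
        met = [c for c in CRITERIA if record.get(c) is True]  # absent -> not met
        print(f"record {i}: met {len(met)}/{len(CRITERIA)} criteria")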
Record Review
- Takes 20-30 minutes per record on average.
- A certain number of patients with a given diagnosis must be seen, which can delay reports.
- Criteria of care must be agreed upon.
- Staff training in identifying and coding information is critical.
Simulation and Models
- Used to assess performance through experiences that closely resemble reality and imitate, but do not duplicate, the real clinical problem.
- Allow examinees to reason through a clinical problem with little or no cueing.
- Permit examinees to make life-threatening errors without hurting a real patient.
- Provide instant feedback.
Simulation and Models--Types
- Paper and pencil patient branching problems
- Computerized clinical case simulations
- Role-playing situations and standardized patients
- Anatomical models and mannequins
- Virtual reality combines computers and sometimes mannequins; good for assessing procedural competence
Simulation and Models--Use
- Used to train and assess surgeons doing arthroscopy
- Major wound debridement
- Anesthesia training for life-threatening critical incidents during surgery
- Cardiopulmonary incidents
- Written and computerized simulations test reasoning and the development of diagnostic plans
Simulation and Models
- Studies have demonstrated content validity for high-quality simulations designed to resemble real patients.
- One or more scores are derived from each simulation based on preset scoring rules from experts in the discipline.
- The examinee's performance is determined by combining scores to derive an overall performance score.
- Can be part of an OSCE.
- Expensive to create; many grants and contracts are available to support development.
Standardized Oral Exams
- Uses realistic patient cases, with a trained physician examiner questioning the examinee.
- The clinical problem is presented as a scenario.
- Questions probe the reasoning for requesting clinical tests, interpretation of findings, and treatment plans.
- Exams last 90 minutes to 2 ½ hours.
- 1-2 physicians serve as examiners.
Standardized Oral Exams
- Test clinical decision making with real-life scenarios.
- 15 of 24 ABMS Member Boards use standardized oral exams as the final examination for initial certification.
- A committee of experts in the specialty carefully crafts the scenarios.
- Focus on assessment of the key features of the case.
- Exam score reliability is between 0.65 and 0.88.
Standardized Oral Exams
- Examiners need to be well trained for the exams to be reliable.
- Mock orals can be used to prepare but are much less standardized.
- Extensive resources and expertise are needed to develop and administer a standardized oral exam.
Standardized Patient Exam
- Standardized patients (SPs) are well persons trained to simulate a medical condition in a standardized way.
- The exam consists of multiple SPs, each presenting a different condition in a 10-12 minute patient encounter.
- Performance criteria are set in advance.
- Included as stations in the OSCE.
Standardized Patient Exam
- Used to assess history-taking skills, physical exam skills, communication skills, differential diagnosis, laboratory utilization, and treatment.
- Reproducible scores are more readily obtained for history taking, physical exam, and communication skills.
- Most often used as a summative performance exam for clinical skills.
- A single SP can assess targeted skills and knowledge.
Standardized Patient Exam
- Standardized patient exams can generate reliable scores for individual stations.
- Training of raters is critical.
- Takes at least a half-day of testing to obtain reliable scores for hands-on skills.
- Research on validity has found better performance by senior than junior residents (construct validity) and modest correlations between SP exams and clinical ratings or written exams (concurrent validity).
Standardized Patient Exam
- Development and implementation require substantial resources.
- Can be more efficient when SPs are shared across multiple residency programs.
- Requires a large facility with multiple exam rooms, one for each station.
Written Exams
- Usually made up of multiple-choice questions.
- Each contains an introductory statement followed by four or five options.
- The examinee selects one of the options as the presumed correct answer by marking it on a coded answer sheet.
- The in-training exam is an example of this format.
- A typical half-day exam has 175-250 test questions.
Written Exams
- Medical knowledge and understanding can be measured.
- Comparing test scores with national statistics can serve to identify strengths and limitations of individual residents to help improvement.
- Comparing test results aggregated for residents each year can help identify residency training experiences that might be improved.
Written Exams
- A committee of experts designs the test and agrees on the knowledge to be assessed.
- Creates a test blueprint specifying the number of test questions for each topic (see the sketch below).
- When tests are used to make pass/fail decisions, the test should be piloted and statistically analyzed.
- Standards for passing should be set by a committee of experts prior to administering the exam.
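As a concrete picture of blueprinting and standard setting, here is a minimal Python sketch (the topics, weights, and cut score are hypothetical) that allocates questions across topics and applies a preset pass standard to a candidate's score:

    TOTAL_QUESTIONS = 200
    blueprint_weights = {"cardiology": 0.30, "pulmonology": 0.25,
                         "endocrinology": 0.25, "nephrology": 0.20}

    # Questions per topic according to the blueprint
    blueprint = {topic: round(w * TOTAL_QUESTIONS)
                 for topic, w in blueprint_weights.items()}
    print(blueprint)

    PASS_STANDARD = 0.65          # illustrative cut score set before the exam
    candidate_correct = 142
    score = candidate_correct / TOTAL_QUESTIONS
    print("PASS" if score >= PASS_STANDARD else "FAIL", f"({score:.0%})")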
Written Exams
- If performance is compared from year to year, at least 20-30 percent of the same test questions should be repeated each year.
- For in-training exams, each residency administers an exam purchased from a vendor.
- Tests are scored by the vendor and the scores are returned to the residency director.
- Comparable national scores are provided.
- All 24 ABMS Member Boards use MCQ exams for initial certification.
Use of These Tools in Medical Education
- Field is changing
- Technology will provide new opportunities, particularly in simulating and assessing medical problems
- The ACGME is requiring programs to use multiple valid tools to assess resident performance