The Value in Evaluation - PowerPoint PPT Presentation

1
The Value in Evaluation
  • Erica Friedman
  • Assistant Dean of Evaluation
  • MSSM

2
[Diagram: the levels Does, Shows how, Knows how, and Knows set against assessment methods (longitudinal clinical observation, OSCE, SP, practical, oral, application test, essay, note review, MEQ, MCQ), spanning written to observed formats and formative to summative purposes]
3
Session Goals
  • Understand the purposes of assessment
  • Understand the framework for selecting and
    developing assessment methods
  • Recognize the benefits and limitations of
    different methods of assessment
  • Conference Objectives
  • Review the goals and objectives for your course
    or clerkship in the context of assessment
  • Identify the best methods of assessing your goals
    and objectives

4
Purpose of Evaluation
  • To certify individual competence
  • To assure successful completion of
    goals/objectives
  • To provide feedback
  • To students
  • To faculty, course and clerkship directors
  • As a statement of values (what is most critical
    to learn)
  • For Program Evaluation- evaluation of an
    aggregate, not an individual (ex. average ability
    of students to perform a focused history and
    physical)

5
Consequences of evaluation
  • Steering effect: exams 'drive the learning'; students study and learn for the exam
  • Impetus for change (feedback from students,
    Executive Curriculum, LCME)

6
Definitions- Reliability
  • The consistency of a measurement over time or by different observers (ex. a thermometer that always reads 98 degrees C when placed in boiling, distilled water at sea level)
  • The proportion of variability in a score due to true differences between subjects (ex. the difference between Greenwich time and the time on your watch)
  • Inter-rater reliability (correlation between the scores of 2 raters)
  • Internal reliability (correlation between items within an exam); a short computational sketch follows this list
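To make these two indices concrete, here is a minimal Python sketch (an addition, not from the slides; all scores are hypothetical). It computes an inter-rater correlation for two raters scoring the same students, and Cronbach's alpha as one common measure of internal reliability.

  # Minimal sketch: two ways of quantifying reliability (hypothetical scores).
  # In classical test theory, reliability is the share of observed-score
  # variance attributable to true differences between subjects.
  import numpy as np

  # Inter-rater reliability: correlation between two raters scoring the same students.
  rater_a = np.array([7, 5, 9, 6, 8, 4])
  rater_b = np.array([8, 5, 9, 5, 7, 4])
  inter_rater_r = np.corrcoef(rater_a, rater_b)[0, 1]

  # Internal reliability (Cronbach's alpha): consistency of items within one exam.
  # Rows = students, columns = item scores; real exams need many items for alpha > 0.8.
  items = np.array([[1, 1, 0, 1],
                    [1, 0, 0, 1],
                    [1, 1, 1, 1],
                    [0, 0, 0, 1],
                    [1, 1, 1, 0]])
  k = items.shape[1]
  sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
  total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
  alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)

  print(f"inter-rater r = {inter_rater_r:.2f}, Cronbach's alpha = {alpha:.2f}")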

7
Definitions-Validity
  • The ability to measure what was intended (the
    thermometer reading is reliable but not valid)
  • Four types-
  • Face/content
  • Criterion
  • Construct/predictive
  • Internal

8
Types of validity
  • Face/content: Would experts agree that it assesses what's important? (ex. a driver's test mirroring actual driving situations and conditions)
  • Criterion: An inference can be drawn from test scores to actual performance (ex. if a simulated driver's test score predicts the road test score, the simulation test is claimed to have a high degree of criterion validity)
  • Construct/predictive: Does it assess what it intended to assess? (ex. the driver's test as a predictor of the likelihood of accidents; results of your course exam predict the student's performance on that section of Step 1)
  • Internal: Do other methods assessing the same domain obtain similar results? (ex. similar scores from multiple SPs assessing history-taking skills)

9
Types of Evaluations- Formative and Summative
Definitions
  • Formative evaluation: provides feedback so the learner can modify their learning approach. When the chef tastes the sauce, that's formative evaluation.
  • Summative evaluation: done to decide whether a student has met the minimum course requirements (pass or fail), usually judged against normative standards. When the customer tastes the sauce, that's summative evaluation.

10
Conclusions about formative assessments
  • Stakes are lower (not determining passing or
    failing, so lower reliability is tolerated)
  • More information is desired, so multiple modalities may be required for validity and reliability (it is rare for one assessment method to identify all critical domains)
  • Use evaluation methods that support and reinforce teaching modalities and steer students' learning
  • May only identify deficiencies but not define how
    to remediate

11
Conclusions about summative assessments
  • Stakes are higher: students may pass who are incompetent, or may fail and require remediation
  • High reliability (> 0.8) is desired, so multiple questions, problems, or cases are often required (20-30 stations for an OSCE, 15-20 cases for oral presentations, 700 questions for an MCQ exam); see the sketch after this list
  • High content validity is desired (single cases have low content validity and are not representative)
  • High predictive validity (correlation with future performance) is desired, which is often hard to achieve
  • Consider reliability, validity, benefit, and cost (resources, time, and money) in determining the best assessment tools
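The need for 20-30 stations or many cases follows from how reliability grows with test length. As an illustration (an addition, not from the slides), the Spearman-Brown prophecy formula below estimates the reliability of an exam assembled from n comparable cases, starting from a hypothetical single-case reliability of 0.15.

  # Illustrative sketch: why summative exams need many cases or stations.
  # Spearman-Brown prophecy formula for a test lengthened n-fold.
  def spearman_brown(single_case_reliability: float, n_cases: int) -> float:
      r = single_case_reliability
      return (n_cases * r) / (1 + (n_cases - 1) * r)

  # 0.15 is a hypothetical reliability for one station or case, not a figure from the slides.
  for n in (1, 5, 10, 20, 30):
      print(f"{n:2d} cases -> estimated reliability {spearman_brown(0.15, n):.2f}")

  # With these assumptions, the estimate crosses the 0.8 threshold only around 20-30 cases.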

12
How to Match Assessment to Goals and Teaching
Methods
  • Define the type of learning (lecture, small
    group, computer module/self study, etc)
  • Define the domain to be assessed (knowledge,
    skill, behavior) and the level of performance
    expected (knows, knows how, shows how or does)
  • Determine the type of feedback required

13
Purpose of feedback
  • For students: To provide a good platform to support and enhance student learning
  • For faculty: To determine what works (what facilitated learning and who were appropriate role models)
  • For students and faculty: To determine areas that require improvement

14
Types of Feedback
  • Quantitative
  • Total score compared to other students, providing the high, low, and mean score and the minimum requirement for a passing grade (a brief sketch follows this list)
  • Qualitative
  • Written personal feedback identifying areas of strength and weakness
  • Oral feedback, one on one or in a group, to discuss the areas of deficiency and help guide further learning
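As a minimal illustration of the quantitative feedback described above (an added sketch; the scores and passing threshold are hypothetical), a student's total can be reported against the class high, low, and mean and the minimum passing score.

  # Sketch of a quantitative feedback report (all numbers hypothetical).
  from statistics import mean

  class_scores = [62, 71, 75, 78, 80, 84, 88, 91, 95]
  passing_score = 70
  student_score = 78

  print(f"Your score: {student_score}")
  print(f"Class high {max(class_scores)}, low {min(class_scores)}, "
        f"mean {mean(class_scores):.1f}; minimum passing score: {passing_score}")
  print("Result:", "Pass" if student_score >= passing_score else "Below passing")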

15
Evaluation Bias-Pitfall
  • Can occur with any evaluation requiring
    interpretation by an individual (all methods
    other than MCQ)
  • Expectation bias (halo effect): prior knowledge or expectation of the outcome influences the ratings (especially a global rating)
  • Audience effect: a learner's performance is influenced by the presence of an observer (seen especially with skills and behaviors)
  • Rater traits: the training of the rater or the rater's traits affects the reliability of the observation

16
Types of assessment tools-Written
  • Does not require an evaluator to be present
    during the assessment and can be open or closed
    book
  • Multiple choice question (MCQ)
  • Modified short answer essay question (MEQ)-
    Patient management problem is a variation of this
  • Essay
  • Application test
  • Medical note/chart review

17
Types of assessment tools-Observer Dependent
Interaction
  • Usually requires active involvement of an
    assessor and occurs as a single event
  • Practical
  • Medical record review
  • Standardized Patient(s) (SP)
  • Objective Structured Clinical Examination (OSCE)
  • Oral examination: chart-stimulated recall, triple jump, or direct observation

18
Types of assessment tools- Observer Dependent
Longitudinal Interaction
  • Continual evaluation over time
  • Preceptor evaluation: either completion of a critical incident report or a structured rating form based on direct observation over time
  • Peer evaluation
  • Self evaluation

19
MCQ
  • Definition: A test composed of questions in which each stem is followed by several alternative answers. The examinee must select the most correct answer.
  • Measures: Knows and Knows how
  • Pros: Efficient; cheap; samples a large content domain (60 questions/hour); high reliability; easy, objective scoring; direct correlation of knowledge with expertise
  • Cons: Often a recall of facts; provides an opportunity for guessing (rewards the good test-taker); unrealistic; doesn't provide information about the thought process; encourages learning for recall
  • Suggestions: Create questions that can be answered from the stem alone; avoid "always," "frequently," "all," or "none"; randomly assign correct answers; can correct for guessing with a penalty formula (see the sketch after this list)
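The penalty formula mentioned in the last suggestion is usually the conventional correction for guessing: corrected score = number right minus number wrong divided by (options per item minus 1). The sketch below (an addition, with hypothetical numbers) applies it.

  # Conventional correction-for-guessing ("penalty") formula; numbers are hypothetical.
  def corrected_score(num_right: int, num_wrong: int, options_per_item: int) -> float:
      return num_right - num_wrong / (options_per_item - 1)

  # 60-item exam, 5 options per question: 45 right, 10 wrong, 5 omitted.
  print(corrected_score(num_right=45, num_wrong=10, options_per_item=5))  # 42.5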

20
MEQ
  • Definition: A series of sequential questions in a linear format based on an initial, limited amount of information. It requires immediate short answers, followed by additional information and subsequent questions. (The patient management problem is a variation of this type.)
  • Measures: Knows and Knows how
  • Pros: Can assess problem solving, hypothesis generation, and data interpretation
  • Cons: Low inter-case reliability; less content validity; harder to administer; time consuming to grade; variable inter-rater reliability
  • Suggestions: Use directed (not open-ended) questions; provide an extensive answer key

21
Open ended essay question
  • Definition: A question allowing the student the freedom to decide the topic to address and the position to take; it can be a take-home exam
  • Measures: Knows, Knows how
  • Pros: Assesses the ability to think (generate ideas, weigh arguments, organize information, build and support conclusions, and communicate thoughts); high face validity
  • Cons: Low reliability; time intensive to grade; narrow coverage of content
  • Suggestions: Strictly define the expected response and the rating criteria

22
Application test
  • Definition: An open-book, problem-solving test incorporating a variety of MCQs and MEQs. It provides a description of a problem with data, and the examinee is asked to interpret the data to solve the problem. (ex. Quiz item 3)
  • Measures: Knows and Knows how
  • Pros: Assesses higher learning; good face/content validity; reasonable reliability; useful for formative and summative feedback
  • Cons: Harder to create and grade

23
Practical Exam
  • Definition: A hands-on exam to demonstrate and apply knowledge (ex. culture and identify the bacteria on the glove of a Sinai cafeteria worker, or perform a history and physical on a patient)
  • Measures: Knows, Knows how, and possibly Shows how and Does
  • Pros: Can test multiple domains; actively involves the learner (good steering effect); best suited for procedural/technical skills; higher face validity
  • Cons: Labor intensive (creation and grading); hard to identify a gold standard, so grading is subjective; high rate of item failure (unanticipated problems with administration)
  • Suggestions: Pilot first; give adequate, specific instructions and goals; use specific, defined criteria for grading and train raters for direct observation; require multiple encounters for higher reliability

24
Medical record/note review
  • Definition: The examiner reviews a document previously created by the learner; record selection can be random
  • Measures: Knows how and Does
  • Pros: Can review multiple records for higher reliability; high face validity; less costly than an oral exam (done without the learner present and at the examiner's convenience)
  • Cons: Lower inter-rater reliability; less immediate feedback; unable to determine the basis for decisions
  • Suggestions: Create a template with specific ratings for skills

25
Standardized Patients
  • Definition: A simulated patient/actor trained to present a history in a reliable, consistent manner and to use a checklist to assess students' skills and behaviors
  • Measures: Knows, Knows how, Shows how, and Does
  • Pros: High face validity; can assess multiple domains; can be standardized; can give immediate feedback
  • Cons: Costly; labor intensive; must use multiple SPs for high reliability

26
OSCE (Objective Structured Clinical Exam)
  • Definition: A task-oriented, multi-station exam; stations can be 5-30 minutes and require written answers or observation (ex. take orthostatic vital signs, perform a cardiac exam, provide smoking cessation counseling, read and interpret CXR or EKG results, communicate lab results and advise a patient)
  • Measures: Knows, Knows how, Shows how, and Does

27
OSCE (Objective Structured Clinical Exam)
  • Pros: Assesses clinical competency; tests a wide range of knowledge, skills, and behaviors; can give immediate feedback; good test-retest reliability; good content and construct validity; less patient and examiner variability than with direct observation
  • Cons: Costly (manpower and money); case specific; requires > 20 stations for internal consistency; weaker criterion validity

28
Oral Examination
  • Definition: A method of evaluating a learner's knowledge by asking a series of questions. The process is open-ended, with the examiner directing the questions. (ex. chart-stimulated patient recall or a triple jump)
  • Measures: Knows, Knows how, and sometimes Shows how and Does

29
Oral Exam
  • Pros: Can measure clinical judgement, interpersonal skills (communication), and behavior; high face validity; flexible; can provide direct feedback
  • Cons: Poor inter-rater reliability (dove vs. hawk raters and observer bias); content specific, so low reliability (must use > 6 cases to increase reliability); labor intensive
  • Suggestions: Use multiple short cases; define questions and answers; provide simple rating scales; train raters

30
Triple Jump
  • Definition: A three-step written and oral exam: a written part, a research part, and then an oral part (ex. COMPASS 1)
  • Measures: Knows, Knows how, Shows how, and Does
  • Pros: Assesses hypothesis generation, use of resources, application of knowledge to problem solving, and self-directed learning; provides immediate feedback; high face validity
  • Cons: Only suitable for formative assessment (poor reliability); time/faculty intensive; too content specific; inconsistent rater evaluations

31
Clinical Observations
  • Definition: Assessment of various domains longitudinally by an observer, either preceptor, peer, or self (ex. small group evaluations during the first two years and preceptor ratings during clinical exposure)
  • Measures: Knows, Knows how, Shows how, and Does
  • Pros: Simple; efficient; high face validity; both formative and summative

32
Clinical Observations
  • Cons: Low reliability (only recent encounters often influence the grade); halo effect (lack of domain discrimination); more often a judgement of personality; Lake Wobegon effect (all students are necessarily above average); unwillingness to document negative ratings (fear of failing someone)
  • Suggestions: Provide frequent ratings and feedback; increase the number of observations; use multiple assessors (with group discussion about specific ratings)

33
Peer/Self Evaluation
  • Pros: Useful for formative feedback
  • Cons: Lack of correlation with faculty evaluations; same cons as the others (a measure of the "nice guy," low reliability, halo effect); peer evaluations can have a "friend effect," fear of retribution, or a desire to penalize
  • Suggestions: Limit the number of behaviors assessed; clarify the difference between evaluation of professional and personal aspects; develop operationally proven criteria for rating; provide multiple opportunities for students to do this and provide feedback from faculty

34
Erica Friedman's Educational Pyramid
  • Does: Direct observation, Practical
  • Shows how: OSCE, Triple jump, Oral, SP, Practical, Chart review
  • Knows how: MEQ, Essay
  • Knows: MCQ
36
Critical factors for choosing an evaluation tool
  • Type of evaluation and feedback desired (formative/summative)
  • Focus of evaluation
  • Knowledge, skills, behaviors (attitudes)
  • Level of evaluation
  • Knows, Knows how, Shows how, Does
  • Pros/Cons
  • Validity, Reliability, Cost (time, resources)

37
How to be successful
  • Students should be clear about the course/clerkship goals, the specific types of assessments used, and the criteria for passing (and, if relevant, for just short of honors and for honors)
  • Make sure the choice of assessments is consistent with the values of your course and the school
  • Final judgments about students' progress should be based on multiple assessments using a variety of methods over a period of time (instead of one time point)

38
Number of courses or clerkships using a specific assessment tool (assessing our assessment methods)
39
Why assess ourselves?
  • Assure successful completion of our course goals
    and objectives
  • Assure integration with the mission of the school
  • Direct our teaching/learning (determine what worked and what needs changing)

40
How we currently assess ourselves
  • Student evaluations (quantitative and
    qualitative)- most often summative
  • Performance of students on our exam and specific
    sections of USMLE
  • Focus and feedback groups (formative; currently done by the Dean's office)
  • Peer evaluations of course/clerkship- by ECC
  • Self evaluations- yearly grid completed by course
    directors and core faculty
  • Consider peer evaluation of teaching and teaching
    materials