Title: The Science and Art of Exam Development
The Science and Art of Exam Development
Paul E. Jones, PhD, Thomson Prometric
What is validity, and how do I know if my test has it?
Validity
- Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests. (APA Standards, 1999, p. 9)
A test may yield valid judgments about people
- If it measures the domain it was defined to measure.
- If the test items have good measurement properties.
- If the test scores and the pass/fail decisions are reliable.
- If alternate forms of the test are on the same scale.
- If you apply defensible judgment criteria.
- If you allow enough time for competent (but not necessarily speedy) candidates to take the test.
- If it is presented to the candidate in a standardized fashion, without environmental distractions.
- If the test taker is not cheating and the test has not deteriorated.
Is this a Valid Test?
1. 4 - 3 = _____     6. 3 - 2 = _____
2. 9 - 2 = _____     7. 8 - 7 = _____
3. 4 - 4 = _____     8. 9 - 5 = _____
4. 7 - 6 = _____     9. 6 - 2 = _____
5. 5 - 1 = _____    10. 8 - 3 = _____
The Validity / Technical Quality of the Testing System
(Testing-system diagram: Design, Item Bank)
The Validity Argument is Part of the Testing System
(Testing-system diagram: Design, Item Bank)
How should I start a new testing initiative?
A Testing System Begins with Design
(Testing-system diagram: Design, Item Bank)
Test Design Begins with Test Definition
- Test Title
- Credential Name
- Test Purpose ("This test will certify that the successful candidate has important knowledge and skills necessary to …")
- Intended Audience
- Candidate Preparation
- High-Level Knowledge and Skills Covered
- Products or Technologies Addressed
- Knowledge and Skills Assumed but Not Tested
- Knowledge and Skills Related to the Test but Not Tested
- Borderline Candidate Description
- Testing Methods
- Test Organization
- Test Stakeholders
- Other Information
Test Definition Begins with Program Design
Test Definition Leads to Practice Analysis
Practice Analysis Leads to Test Objectives
Test Objectives are Embedded in a Blueprint
Once I have a blueprint, how do I develop appropriate exam items?
The Testing System
(Testing-system diagram: Design, Item Bank)
Creating Items
(Item-creation diagram: an item combines content characteristics with a response mode and a scoring rule.
Content options, choose many: text, graphics, audio, video, simulations, applications.
Response modes, choose one: single M/C, multiple M/C, single PC, multiple PC, drag-and-drop, brief FR, essay FR, simulation/application.)
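As one concrete way to read the diagram above, the sketch below shows a hypothetical item-bank record carrying an item's objective linkage, response mode, content, and scoring rule. The field names and the dichotomous scoring are illustrative assumptions, not an actual banking-tool schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """Minimal item-bank record (illustrative fields, not a real banking schema)."""
    item_id: str
    objective_id: str                 # item-objective linkage back to the blueprint
    response_mode: str                # e.g. "single_mc", "multiple_mc", "drag_drop", "essay_fr"
    stem: str                         # item text (could also reference graphics, audio, video)
    options: List[str] = field(default_factory=list)
    key: List[int] = field(default_factory=list)    # indices of the correct option(s)

    def score(self, selected: List[int]) -> int:
        """Dichotomous scoring: 1 only if the selected options exactly match the key."""
        return int(sorted(selected) == sorted(self.key))

# Example: a single multiple-choice item linked to one blueprint objective
item = Item("ITM-001", "OBJ-2.3", "single_mc", "4 - 3 = ?", ["0", "1", "2"], key=[1])
print(item.score([1]))   # 1 (correct)
print(item.score([2]))   # 0 (incorrect)
```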
Desirable Measurement Properties of Items
- Item-objective linkage
- Appropriate difficulty
- Discrimination
- Interpretability
Item-Objective Linkage
Good Item Development Practices
- SME writers in a social environment
- Industry-accepted item writing principles
- Item banking tool
- Mentoring
- Rapid editing
- Group technical reviews
How can I gather and use data to develop an item bank?
The Testing System
(Testing-system diagram: Design, Item Bank)
Classical Item Analysis: Difficulty and Discrimination
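Classical item analysis typically summarizes difficulty as the proportion of candidates answering correctly (the p-value) and discrimination as the correlation between the item score and the rest of the test (a corrected point-biserial). A minimal sketch, with an invented 0/1 response matrix:

```python
import numpy as np

# rows = candidates, columns = items; 1 = correct, 0 = incorrect (invented data)
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

total = responses.sum(axis=1)

for j in range(responses.shape[1]):
    item = responses[:, j]
    p = item.mean()                          # difficulty: proportion correct
    rest = total - item                      # total score with this item removed
    r_pb = np.corrcoef(item, rest)[0, 1]     # corrected point-biserial discrimination
    print(f"item {j + 1}: p = {p:.2f}, r_pb = {r_pb:.2f}")
```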
Classical Option Analysis: Good Item
(Table: for each response option, the number choosing it (n), the overall proportion, a discrimination index, and the proportion choosing it within each total-score quintile, Q1 through Q5.)
Classical Option Analysis: Problem Item
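Option analysis goes one step further: for each response option it tabulates the proportion of candidates choosing that option within each total-score group (the Q1 through Q5 quintiles in these slides). On a good item the keyed option is chosen more often as total score rises while distractors fall away; a problem item shows a flat or reversed pattern. The sketch below builds such a table from invented data, with "B" assumed to be the keyed option.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
total_score = rng.normal(50, 10, n)                     # invented total scores

# invented option choices, loosely tied to ability: higher scorers pick "B" (the key) more often
p_key = 1 / (1 + np.exp(-(total_score - 50) / 5))
choices = np.where(rng.random(n) < p_key, "B",
                   rng.choice(["A", "C", "D"], n))

quintile = pd.qcut(total_score, 5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])
table = pd.crosstab(choices, quintile, normalize="columns").round(2)
print(table)   # rows = options, columns = score quintiles, cells = proportion choosing the option
```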
IRT Item Analysis: Difficulty and Discrimination
Good IRT Model Fit
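Under a two-parameter logistic (2PL) IRT model, each item has a discrimination a and a difficulty b, and the probability of a correct response at ability theta is P(theta) = 1 / (1 + exp(-a(theta - b))). A minimal sketch of the item response function, plus a crude fit check against observed proportions in ability groups (all numbers invented):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
a, b = 1.2, 0.5                       # invented item parameters
expected = p_2pl(theta, a, b)

# crude fit check: compare model-expected proportions with observed proportions
# in ability groups (observed values invented for illustration)
observed = np.array([0.04, 0.10, 0.22, 0.40, 0.63, 0.80, 0.92])
residuals = observed - expected
print(np.round(expected, 2))
print(np.round(residuals, 2))         # small residuals across theta suggest good model fit
```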
How can I assemble test forms from my item bank?
The Testing System
(Testing-system diagram: Design, Item Bank)
Reliability
- Reliability refers to the degree to which test scores are free from errors of measurement. (APA Standards, 1985, p. 19)
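For a dichotomously scored test, a common single-administration estimate of score reliability is KR-20 (a special case of Cronbach's alpha): alpha = k/(k-1) * (1 - sum(p_j * q_j) / var(X)), where k is the number of items, p_j is each item's proportion correct, q_j = 1 - p_j, and var(X) is the total-score variance. A minimal sketch with an invented response matrix:

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 reliability for a 0/1 response matrix (rows = candidates, cols = items)."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                       # item difficulties
    item_var = (p * (1 - p)).sum()                   # sum of item variances p*q
    total_var = responses.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# invented 0/1 responses: 6 candidates x 5 items
responses = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
])
print(round(kr20(responses), 2))   # about 0.76 for this invented matrix
```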
More Reliable Test
Less Reliable Test
How to Enhance Reliability When Assembling Test Forms
- Score reliability/generalizability
  - Select items with good measurement properties.
  - Present enough items (see the Spearman-Brown sketch after this list).
  - Target items at candidate ability level.
  - Sample items consistently from across the content domain (use a clearly defined test blueprint).
- Score dependability
  - Same as above.
  - Minimize differences in test difficulty.
- Pass/fail consistency
  - Select enough items.
  - Target items at the cut score.
  - Maintain the same score distribution shape between forms.
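The "present enough items" point can be made concrete with the Spearman-Brown prophecy formula, which projects what the reliability would become if the test were lengthened (or shortened) by a factor k with comparable items: r_new = k*r / (1 + (k-1)*r). A minimal sketch with illustrative numbers:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# e.g. a test with reliability 0.70 doubled in length (numbers purely illustrative)
print(round(spearman_brown(0.70, 2.0), 2))   # ~0.82
```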
Building Simultaneous Parallel Forms Using Classical Theory
Building Simultaneous Parallel Forms Using IRT
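A deliberately simplified sketch of simultaneous form assembly under classical theory: split a calibrated item set into two forms while balancing mean difficulty (an IRT assembly would instead match a target test information function). The item statistics are invented, and a real assembler would also enforce blueprint coverage, discrimination targets, enemy-item rules, and equal form lengths.

```python
# Greedy split of a small calibrated pool into two roughly parallel forms by
# balancing total classical difficulty (p-values). Purely illustrative.
items = [("I1", 0.85), ("I2", 0.40), ("I3", 0.62), ("I4", 0.75),
         ("I5", 0.55), ("I6", 0.68), ("I7", 0.48), ("I8", 0.80)]   # (item id, p-value), invented

form_a, form_b = [], []
for item_id, p in sorted(items, key=lambda it: it[1], reverse=True):
    # place each item on whichever form currently has the lower total difficulty
    target = form_a if sum(q for _, q in form_a) <= sum(q for _, q in form_b) else form_b
    target.append((item_id, p))

for name, form in (("Form A", form_a), ("Form B", form_b)):
    mean_p = sum(p for _, p in form) / len(form)
    print(name, [i for i, _ in form], f"mean p = {mean_p:.2f}")
```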
What options do I have for setting the passing score for my exam?
The Testing System
(Testing-system diagram: Design, Item Bank)
Setting Cut Scores
Why not just set the cut score at 75% correct?
Setting Cut Scores
Why not just set the cut score so that 80% of the candidates pass?
The logic of criterion-based cut score setting
- Certain knowledge and skills are necessary for practice.
- The test measures an important subset of these knowledge and skills, and thus readiness for practice.
- The passing cut score is such that those who pass have a high enough level of mastery of the KSJs to be ready for practice at the level defined in the test definition, while those who fail do not. (Kane, Crooks, and Cohen, 1997)
The Main Goal in Setting Cut Scores
Meeting the Goldilocks Criteria
"We want the passing score to be neither too high nor too low, but at least approximately, just right." (Kane, Crooks, and Cohen, 1997, p. 8)
Two General Approaches to Setting Cut Scores
- Test-Centered Approaches
  - Modified Angoff (see the sketch after this list)
  - Bookmark
- Examinee-Centered Approaches
  - Borderline Group
  - Contrasting Groups
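In a Modified Angoff study, each SME judges, item by item, the probability that a borderline (minimally qualified) candidate would answer correctly; the recommended raw cut score is typically the sum of these probabilities, averaged across judges. A minimal sketch with invented ratings:

```python
import numpy as np

# rows = judges, columns = items; each cell is the judged probability that a
# borderline candidate answers the item correctly (invented ratings)
ratings = np.array([
    [0.70, 0.55, 0.80, 0.60, 0.45],
    [0.65, 0.60, 0.75, 0.55, 0.50],
    [0.75, 0.50, 0.85, 0.65, 0.40],
])

per_judge_cut = ratings.sum(axis=1)        # expected raw score of a borderline candidate, per judge
cut_score = per_judge_cut.mean()           # recommended raw cut score (out of 5 items here)
print(per_judge_cut, round(cut_score, 2))  # around 3.1 out of 5 for these invented ratings
```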
The Testing System
(Testing-system diagram: Design, Item Bank)
What should I consider as I manage my testing system?
Security of a Testing System
(Testing-system diagram: Design, Item Bank)
- Write more items!
- Create authentic items.
- Use isomorphs.
- Use Automated Item Generation.
- Use secure banking software and connectivity.
- Use in-person development.
Security of a Testing System
(Testing-system diagram: Design, Item Bank)
- Establish prerequisite qualifications.
- Use narrow testing windows.
- Establish test/retest restrictions.
- Use identity verification and biometrics.
- Require test takers to sign NDAs.
- Monitor test takers on site.
- Intervene if cheating is detected.
- Monitor individual test center performance.
- Track suspicious test takers over time.
Security of a Testing System
(Testing-system diagram: Design, Item Bank)
- Perform frequent, detailed psychometric review.
- Restrict the use of items and test forms.
- Analyze response times.
- Perform DRIFT analyses.
- Calibrate items efficiently.
Item Parameter Drift
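A drift analysis compares an item's parameter estimates across calibration windows and flags items that have shifted, which can signal exposure or compromise. A minimal sketch using IRT b-parameters; all values are invented, and the 0.5-logit flag threshold is an arbitrary illustration:

```python
# (item id, b at original calibration, b at recent recalibration) -- invented values
calibrations = [
    ("I101", 0.20, 0.25),
    ("I102", -0.50, -1.40),   # much easier now: possible exposure/compromise
    ("I103", 1.10, 1.05),
    ("I104", 0.00, 0.70),     # harder now: possible content or key problem
]

THRESHOLD = 0.5   # logits; arbitrary illustrative cutoff

for item_id, b_old, b_new in calibrations:
    drift = b_new - b_old
    if abs(drift) > THRESHOLD:
        print(f"{item_id}: b shifted {drift:+.2f} logits -> review for drift or compromise")
```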
Security of a Testing System
(Testing-system diagram: Design, Item Bank)
- Many unique fixed forms
- Linear on-the-fly testing (LOFT)
- Computerized adaptive testing (CAT); see the item-selection sketch after this list
- Computerized mastery testing (CMT)
- Multi-stage testing (MST)
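As one illustration of these delivery models, a computerized adaptive test typically selects, at each step, the unadministered item with the greatest Fisher information at the current ability estimate. A minimal 2PL sketch; the item parameters are invented, and real CAT engines also add exposure control and content balancing:

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

# (item id, a, b) -- invented calibrated pool
pool = [("I1", 0.8, -1.0), ("I2", 1.2, 0.0), ("I3", 1.5, 0.4), ("I4", 1.0, 1.2)]

theta_hat = 0.3                      # current provisional ability estimate
administered = {"I2"}                # items already given

candidates = [(info_2pl(theta_hat, a, b), iid) for iid, a, b in pool if iid not in administered]
best_info, best_item = max(candidates)
print(f"next item: {best_item} (information {best_info:.2f} at theta = {theta_hat})")
```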
Item Analysis Activity