Presented at CLEAR's 23rd Annual Conference
Toronto, Ontario, September 2003

Transcript and Presenter's Notes
1
Defending Your Licensing Examination Programme
  • Deborah Worrad
  • Registrar and Executive Director
  • College of Massage Therapists of Ontario

2
Critical Steps
  • Job Analysis Survey
  • Blueprint for Examination
  • Item Development
  • Test Development
  • Cut Scores
  • Scoring/Analysis
  • Security

3
Subject Matter Experts: Selection
  • Broadly representative of the profession
  • Specialties of practice
  • Ethnicity
  • Age distribution
  • Education level
  • Gender distribution
  • Representation from newly credentialed
    practitioners
  • Geographical distribution
  • Urban vs. rural practice locations
  • Practice settings

4
Job Analysis Survey
  • Provides
  • the framework for the examination development
  • a critical element for ensuring that valid
    interpretations are made about an individual's
    exam performance
  • a link between what is done on the job and how
    candidates are evaluated for competency

5
Job Analysis Survey
  • Comprehensive survey of critical knowledge,
    skills and abilities (KSAs) required by an
    occupation
  • Relative importance, frequency and level of
    proficiency of tasks must be established

6
Job Analysis Survey
  • Multiple sources of information should be used to
    develop the KSAs
  • The survey must provide sufficient detail in
    order to provide enough data to support exam
    construction (blueprint)

7
Job Analysis Survey
  • Good directions
  • User-friendly, simple layout
  • Demographic information requested from
    respondents
  • Reasonable rating scale
  • Pilot test

8
Job Analysis Survey
  • Survey is sent to either a representative sample
    (large profession) or all members (small)
  • With computer technology the JAS can be done
    online, saving costs associated with printing and
    mailing
  • Motivating members to complete the survey may be
    necessary

9
Job Analysis Survey
  • Statistical analysis of results must include
    elimination of outliers and respondents with
    personal agendas
  • A final technical report with the data analysis
    must be produced

10
Blueprint for Examination
  • An examination based on a JAS provides the
    foundation for the programme's content validity
  • The data from the JAS on tasks and KSAs critical
    to effective performance is used to create the
    examination blueprint
  • Subject Matter Experts review the blueprint to
    confirm results from data analysis

11
Item Development
  • Items must fit the test blueprint and be properly
    referenced
  • Principles of item writing must be followed and
    the writers trained to create items that will
    properly discriminate at an entry level
  • The writers must be demographically
    representative of practitioners

12
Item Development
  • Item editing is completed by a team of Subject
    Matter Experts (SMEs) for content review and
    verification of accuracy
  • Items are translated into a second language at
    this point if required
  • Items should be pre-tested with large enough
    samples

13
Examination Psychometrics
  • Options
  • Computer adaptive model
  • Paper and pencil model with item response theory
    (IRT) and pre-testing
  • Equipercentile equating, using an embedded set of
    common items on every form to equate forms and
    establish a pass score (a rough sketch follows
    below)
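
To make the equipercentile idea above concrete, here is a minimal, unsmoothed sketch in Python: raw scores on a new form are mapped to the reference form's scale by matching percentile ranks. The score distributions, form names, and sample sizes are hypothetical, and an operational equating would add smoothing and use the embedded common-item set rather than whole-group distributions.

```python
import numpy as np

def percentile_ranks(scores):
    """Percentile rank of each possible raw score (midpoint convention)."""
    scores = np.asarray(scores)
    n = len(scores)
    return {x: (np.sum(scores < x) + 0.5 * np.sum(scores == x)) / n * 100
            for x in range(int(scores.max()) + 1)}

def equipercentile_equate(new_form_scores, ref_form_scores):
    """Map each raw score on the new form to the reference-form raw score
    with the closest percentile rank (unsmoothed, illustrative only)."""
    new_pr = percentile_ranks(new_form_scores)
    ref_pr = list(percentile_ranks(ref_form_scores).items())
    return {x: min(ref_pr, key=lambda kv: abs(kv[1] - pr))[0]
            for x, pr in new_pr.items()}

# Hypothetical raw-score samples for two 60-item forms
rng = np.random.default_rng(0)
ref = rng.binomial(60, 0.70, size=500)   # reference form (slightly easier)
new = rng.binomial(60, 0.65, size=500)   # new form (slightly harder)

table = equipercentile_equate(new, ref)
print(table[40])   # equated reference-form score for a raw 40 on the new form
```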

14
Test Development
  • Relationship between test specifications and
    content must be logical and defensible
  • Test questions are linked to blueprint which is
    linked to the JAS
  • Exam materials must be secure

15
Test Development
  • Elements of test development differ depending on
    model you are using
  • Generally - develop a test form ensuring
  • Items selected meet statistical requirements
  • Items match the blueprint
  • No item cues another item
  • No repetition of same items

16
Cut Scores
  • Use an approved method to establish minimal
    competence standards required to pass the
    examination
  • This establishes the cut score (pass level)

17
Cut Scores
  • One method is the modified Angoff, in which an
    SME panel makes judgements about a minimally
    competent candidate's ability to answer each item
    correctly
  • This is frequently used by testing programmes and
    does not take too long to complete

18
Cut Scores
  • The SMEs provide an estimate of the proportion of
    minimally competent candidates who would respond
    correctly to each item
  • This process is completed for all items and an
    average rating established for each item
  • Individual item rating data are analyzed to
    establish the passing score
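
A minimal sketch of the averaging described above, using a hypothetical matrix of Angoff ratings (five judges by six items; each entry is the judged proportion of minimally competent candidates expected to answer that item correctly). The raw passing score is simply the sum of the per-item mean ratings.

```python
# Hypothetical Angoff ratings: 5 judges x 6 items
ratings = [
    [0.60, 0.75, 0.80, 0.55, 0.90, 0.70],
    [0.65, 0.70, 0.85, 0.50, 0.85, 0.75],
    [0.55, 0.80, 0.75, 0.60, 0.95, 0.65],
    [0.60, 0.70, 0.80, 0.55, 0.90, 0.70],
    [0.70, 0.75, 0.85, 0.60, 0.85, 0.75],
]

n_judges = len(ratings)
n_items = len(ratings[0])

# Average rating per item across judges
item_means = [sum(judge[i] for judge in ratings) / n_judges
              for i in range(n_items)]

# Raw passing score = sum of the item means (in raw-score points)
cut_score = sum(item_means)
print([round(m, 2) for m in item_means])
print(round(cut_score, 2))   # about 4.33 out of 6 items
```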

19
Scoring
  • Scoring must be correct in all aspects
  • Scanning
  • Error checks
  • Proper key
  • Quality control
  • Reporting

20
Scoring/Analysis
  • Test item analysis on item difficulty and item
    discrimination must be conducted
  • Adopt a model of scoring appropriate for your
    exam (IRT, equipercentile equating)
  • Must ensure that the passing scores are fair and
    consistent, eliminating the impact of varying
    difficulty among forms
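
A minimal sketch of the item analysis referred to above, assuming a hypothetical 0/1 scored response matrix: difficulty as the proportion correct (p-value) and discrimination as the corrected point-biserial correlation of each item with the rest-of-test score.

```python
import numpy as np

# Hypothetical scored responses: rows = candidates, columns = items (1 = correct)
rng = np.random.default_rng(1)
responses = (rng.random((200, 10)) < np.linspace(0.4, 0.9, 10)).astype(int)

difficulty = responses.mean(axis=0)          # p-value per item
total = responses.sum(axis=1)

discrimination = []
for j in range(responses.shape[1]):
    rest = total - responses[:, j]           # rest-of-test score (corrected)
    discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])

for j, (p, r) in enumerate(zip(difficulty, discrimination)):
    print(f"item {j + 1}: p = {p:.2f}, corrected point-biserial = {r:.2f}")
```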

21
Scoring
  • Adopting a scaled score for reporting results to
    candidates may be beneficial
  • Scaling scores facilitates the reporting of any
    shifts in the passing point due to ease or
    difficulty of a form
  • Cut scores may vary depending on the test form so
    scaling enables reporting on a common scale
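
One simple way to implement the scaling described above is a linear transformation that anchors each form's raw cut score to a fixed reported value. The scale used below (cut reported as 500, maximum as 700) and the raw cut scores are hypothetical.

```python
def scale_score(raw, cut_raw, max_raw, cut_scaled=500, max_scaled=700):
    """Linearly rescale a raw score so a form's cut score always reports as
    cut_scaled and the maximum raw score as max_scaled (hypothetical scale)."""
    slope = (max_scaled - cut_scaled) / (max_raw - cut_raw)
    return cut_scaled + slope * (raw - cut_raw)

# Two hypothetical forms of a 200-item exam with different raw cut scores
print(scale_score(130, cut_raw=128, max_raw=200))  # form A: raw 130 is above the cut
print(scale_score(130, cut_raw=134, max_raw=200))  # form B is easier, so its raw cut
                                                   # is higher; raw 130 falls below it
```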

22
Security
  • For all aspects of the work related to
    examinations, proper security procedures must be
    followed including
  • Passwords and password maintenance
  • Programme software security
  • Back-ups
  • Encryption for email transmissions
  • Confidentiality agreements

23
Security
  • Exam administration security must include
  • Exam materials locked in fire proof vaults
  • Security of delivery of exam materials
  • Diligence in dealing with changes in technology
    if computer delivery of the exam is used

24
Presentation Follow-up
  • Please pick up a handout from this presentation
    -AND/OR-
  • Presentation materials will be posted on CLEAR's
    website

25
Defending Your Licensing Examination Program
With Data
  • Robert C. Shaw, Jr., PhD

26
The Defense Triangle
(triangle diagram linking Test Score Use, Reliability, Criterion, and Content)
27
Content
  • Standard 14.14 (1999): "The content domain to be
    covered by a credentialing test should be defined
    clearly and justified in terms of the importance
    of the content . . ."
  • We typically evaluate tasks along an
  • importance dimension or a significance dimension
    that incorporates importance and frequency
  • extent dimension

28
Content
  • Task importance/significance scale points
  • 4. Extremely
  • 3. Very
  • 2. Moderately
  • 1. Not
  • Task extent scale point
  • 0. Never Performed

29
Content
  • We require each task to independently surpass
    the importance/significance and extent exclusion
    rules (a sketch of such rules follows below)
  • We do not composite task ratings
  • We are concerned about diluting tests with
    relatively trivial content (high extent-low
    importance) or including content that may be
    unfair to test (low extent-high importance)
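
A minimal sketch of independent exclusion rules of this kind, using hypothetical tasks, ratings, and thresholds; the actual cut-offs a programme applies would come from its own policy, not from this example.

```python
# Hypothetical task-inventory summary: (task, mean significance 0-4,
# % of respondents who perform the task)
tasks = [
    ("Task A", 3.6, 98),
    ("Task B", 1.4, 35),   # relatively trivial content (low significance)
    ("Task C", 3.2, 95),
    ("Task D", 3.4, 12),   # high significance but rarely performed
]

SIGNIFICANCE_MIN = 2.0   # illustrative exclusion threshold
EXTENT_MIN = 50          # illustrative: performed by at least 50% of respondents

# Each task must independently pass BOTH rules; ratings are not composited,
# so high extent cannot compensate for low significance or vice versa
retained = [name for name, sig, extent in tasks
            if sig >= SIGNIFICANCE_MIN and extent >= EXTENT_MIN]
print(retained)   # -> ['Task A', 'Task C']
```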

30
Content
  • Selecting a subset of tasks and labeling them
    critical is only defensible when the original
    list was reasonably complete
  • We typically ask task inventory respondents how
    adequately the task list covered the job
  • completely, adequately, inadequately
  • We then calculate percentages of respondents who
    selected each option

31
Content
  • Evaluate task rating consistency
  • Were the people consistent?
  • Intraclass correlation
  • Were tasks consistently rated within each content
    domain?
  • Coefficient alpha
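
A minimal sketch of the coefficient alpha check listed above, assuming a hypothetical respondents-by-tasks rating matrix for one content domain.

```python
import numpy as np

def coefficient_alpha(ratings):
    """Cronbach's alpha for a respondents x tasks rating matrix."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 0-4 significance ratings: 6 respondents on 4 tasks in one domain
domain_ratings = [
    [4, 3, 4, 3],
    [3, 3, 3, 2],
    [4, 4, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 4, 3],
    [4, 3, 4, 3],
]
print(round(coefficient_alpha(domain_ratings), 2))
```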

32
Content
33
Content
  • We typically ask task inventory respondents in
    what percentages they would allocate items across
    content areas to lend support to the structure of
    the outline
  • I encourage a task force to explicitly follow
    these results or follow the rank order
  • Because items are specified according to the
    outline, we feel these results demonstrate
    broader support for test specifications beyond
    the task force
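
Averaged allocation percentages of this kind can be turned into item counts for a fixed-length form. A small sketch with hypothetical content areas, hypothetical mean percentages, and a hypothetical 150-item test; largest-remainder rounding is just one reasonable way to make the counts sum exactly to the test length.

```python
# Hypothetical mean allocation percentages from task-inventory respondents
allocation = {"Assessment": 30.4, "Treatment": 42.1,
              "Professional Practice": 15.8, "Safety": 11.7}
TEST_LENGTH = 150

# Largest-remainder rounding so counts sum exactly to TEST_LENGTH
exact = {area: pct / 100 * TEST_LENGTH for area, pct in allocation.items()}
counts = {area: int(v) for area, v in exact.items()}
shortfall = TEST_LENGTH - sum(counts.values())
for area in sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)[:shortfall]:
    counts[area] += 1

print(counts)   # counts per content area, summing to 150
```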

34
Content
What percentage of items would you allocate to
each content area?
35
Reliability
  • Test scores lack utility until one can show the
    measurement scale is reasonably precise

36
Reliability
  • Test score precision is often expressed in terms
    of
  • Kuder-Richardson Formula 20 (KR 20) when items
    are dichotomously (i.e., 0 or 1) scored
  • Coefficient Alpha when items are scored on a
    broader scale (e.g., 0 to 5)
  • Standard Error of Measurement
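
A minimal sketch of KR-20 and the standard error of measurement for dichotomously scored items, computed from a hypothetical simulated response matrix (coefficient alpha for polytomously scored items follows the same pattern, with item variances in place of p times q).

```python
import numpy as np

def kr20_and_sem(responses):
    """KR-20 reliability and standard error of measurement for 0/1 scored items."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    p = x.mean(axis=0)                  # proportion correct per item
    q = 1 - p
    total = x.sum(axis=1)
    kr20 = k / (k - 1) * (1 - (p * q).sum() / total.var(ddof=1))
    sem = total.std(ddof=1) * np.sqrt(1 - kr20)
    return kr20, sem

# Hypothetical scored responses: 300 candidates x 50 items
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=(300, 1))
difficulty = rng.normal(0, 1, size=(1, 50))
responses = (rng.random((300, 50)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

kr20, sem = kr20_and_sem(responses)
print(f"KR-20 = {kr20:.2f}, SEM = {sem:.2f} raw-score points")
```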

37
Reliability
  • Standard 14.15 (1999) Estimates of the
    reliability of test-based credentialing decisions
    should be provided.
  • Comment . . . Other types of reliability
    estimates and associated standard errors of
    measurement may also be useful, but the
    reliability of the decision of whether or not to
    certify is of primary importance

38
Reliability
  • Decision Consistency Index
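
The slide does not say which index is used; one crude single-administration approximation is to classify each candidate pass/fail on two half-tests and report the agreement rate, while operational programmes more often use model-based estimates (e.g., Subkoviak or Livingston-Lewis). A hedged sketch with simulated, hypothetical data:

```python
import numpy as np

def split_half_decision_consistency(responses, cut_proportion):
    """Crude estimate: classify each candidate pass/fail on the odd-item and
    even-item half-tests and return the proportion classified the same way."""
    x = np.asarray(responses)
    odd, even = x[:, 0::2], x[:, 1::2]
    pass_odd = odd.mean(axis=1) >= cut_proportion
    pass_even = even.mean(axis=1) >= cut_proportion
    return (pass_odd == pass_even).mean()

# Hypothetical 0/1 responses for 300 candidates on a 60-item form
rng = np.random.default_rng(3)
ability = rng.normal(0, 1, size=(300, 1))
difficulty = rng.normal(-0.5, 1, size=(1, 60))
responses = (rng.random((300, 60)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

print(round(split_half_decision_consistency(responses, cut_proportion=0.65), 2))
```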

39
Criterion
  • The criterion to which test scores are related
    can be represented by two planks

40
Criterion
  • Most programs rely on the minimal competence
    criterion expressed in a passing point study

41
Criterion
  • Judges' expectations are expressed through
  • text describing minimally competent practitioners
  • item difficulty ratings
  • We calculate an intraclass correlation to focus
    on the consistency with which judges gave
    ratings
  • We find confidence intervals around the mean
    rating
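
A minimal sketch of a t-based confidence interval around the mean rating for a single item, using hypothetical ratings from eight judges.

```python
import statistics
from scipy.stats import t

# Hypothetical Angoff ratings for one item from 8 judges
ratings = [0.60, 0.65, 0.70, 0.55, 0.75, 0.60, 0.65, 0.70]

n = len(ratings)
mean = statistics.mean(ratings)
se = statistics.stdev(ratings) / n ** 0.5
margin = t.ppf(0.975, df=n - 1) * se          # 95% two-sided interval
print(f"mean = {mean:.3f}, 95% CI = ({mean - margin:.3f}, {mean + margin:.3f})")
```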

42
Criterion
  • We use the discrimination value to look for
    aberrant behavior from judges

43
Criterion
Mean of judges' ratings
Passing score
44
Criterion
  • One of my clients was sued in 1975
  • In spite of evidence linking test content to a
    1973 role delineation study, the court would not
    dismiss the case
  • Issues that required defense were
  • discrimination or adverse impact from test
    score use
  • job-relatedness of test scores

45
Criterion
  • Only after a criterion-related validation study
    was conducted was the suit settled

46
Criterion
  • Theoretic model of these studies (diagram linking
    Critical Content, a Supervisor Rating Inventory,
    the Test, and the Correlation of Ratings and Test
    Scores)
47
Criterion
  • Test Bias Study
  • Compare regression lines predicting job
    performance from test scores for focal and
    comparator groups (see the sketch below)
  • There are statistical procedures available to
    determine whether slopes and intercepts
    significantly differ
  • Differences in mean scores are not necessarily a
    critical indicator
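
A minimal sketch of the comparison described above, with simulated (hypothetical) test scores and job-performance ratings for a focal and a comparator group. The group term tests for an intercept difference and the score-by-group interaction tests for a slope difference; statsmodels is used here as one convenient option, not necessarily what the presenter's programme uses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150

# Hypothetical test scores and supervisor job-performance ratings
focal_score = rng.normal(70, 10, n)
comp_score = rng.normal(74, 10, n)
focal_perf = 0.05 * focal_score + rng.normal(0, 0.6, n)
comp_perf = 0.05 * comp_score + rng.normal(0, 0.6, n)

score = np.concatenate([focal_score, comp_score])
perf = np.concatenate([focal_perf, comp_perf])
group = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = focal group

# Design matrix: intercept, score, group, and score x group interaction
X = sm.add_constant(np.column_stack([score, group, score * group]))
model = sm.OLS(perf, X).fit()

# Index 2 = group (intercept difference), index 3 = interaction (slope difference);
# large p-values indicate no evidence that the regression lines differ
print(model.params)
print(model.pvalues[[2, 3]])
```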

48
The Defense Triangle
(triangle diagram linking Test Score Use, Reliability, Criterion, and Content)
49
Presentation Follow-up
  • Presentation materials will be posted on CLEAR's
    website
  • rshaw@goamp.com

50
Defending Your Program: Strengthening Validity in
Existing Examinations
  • Ron Rodgers, Ph.D.
  • Director of Measurement Services
  • Continental Testing Services (CTS)

51
What Can Go Wrong?
  • Job/practice analysis and test specs
  • Item development and documentation
  • Test assembly procedures and controls
  • Candidate information before and after
  • Scoring accuracy and item revalidation
  • Suspected cheating and candidate appeals
  • Practical exam procedures and scoring

52
Job Analysis and Test Specs
  • Undocumented (or no) job analysis
  • Embedded test specifications
  • Unrepresentative populations for job analysis or
    pilot testing
  • Misuse of trial forms and data to support
    live examinations

53
Item Development
  • Do item authors and reviewers sign and understand
    non-disclosure agreements?
  • How does each question reflect job analysis
    results and test specifications?
  • Should qualified candidates be able to answer Qs
    correctly with information available during the
    examination?

54
Item Development
  • Do any questions offer cues that answer other
    questions on an exam?
  • Do item patterns offer cues to marginally
    qualified, test-savvy candidates?
  • Is longest answer always correct?
  • If "None of the above" or "All of the above" Qs
    are used, are these always correct?
  • True-False questions with clear patterns?
  • Do other detectable patterns cue answers?

55
Item Documentation
  • Are all Qs supported by references cited for and
    available to all candidates?
  • Do any questions cite item authors or committee
    members as references?
  • Are page references cited for each Q?
  • Are citations updated as new editions of each
    reference are published?

56
Candidate Information
  • Are all references identified to and equally
    available to all candidates?
  • Are content outlines for each test provided to
    help candidates prepare?
  • Are sample Qs given to all candidates?
  • Are candidates told what they must/may bring and
    use during the examination?

57
Test Assembly Controls
  • Are parallel forms assembled to be of
    approximately equal difficulty?
  • Is answer key properly balanced?
  • Approx. equal numbers of each option
  • Limit consecutive Qs with same answer
  • Avoid repeated patterns of responses
  • Avoid long series of Qs in which an option never
    appears as the key (see the sketch below)
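
A minimal sketch of automated checks for the bullets above, using a hypothetical 20-item answer key; the flagging thresholds would be set by programme policy.

```python
from collections import Counter
from itertools import groupby

# Hypothetical answer key for a 20-item form
key = list("BADCABDCBADCABCDABDC")

# 1. Approximately equal use of each option
print(Counter(key))

# 2. Longest run of consecutive identical answers (flag runs longer than, say, 3)
longest_run = max(len(list(g)) for _, g in groupby(key))
print("longest run:", longest_run)

# 3. Longest stretch of questions without a given option appearing as the key
def longest_gap(key, option):
    gap = best = 0
    for answer in key:
        gap = 0 if answer == option else gap + 1
        best = max(best, gap)
    return best

print({opt: longest_gap(key, opt) for opt in "ABCD"})
```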

58
Suspected Cheating
  • Is potential cheating behavior at the test site
    clearly defined for onsite staff?
  • Are candidates informed of possible consequences
    of suspected cheating?
  • Are staff trained to respond fairly and
    appropriately to suspected cheating?
  • Are procedures in place to help staff
    document/report suspected cheating?

59
Scoring Controls
  • How is accuracy of answer key verified?
  • Do item analyses show any anomalies in candidate
    performance on test?
  • Are oddly performing Qs revalidated?
  • Identify ambiguities in sources or Qs
  • Verify that each Q has one right answer
  • Give credit to all candidates when needed
  • Are scoring adjustments applied fairly?
  • Are rescores/refunds issued as needed?

60
Candidate Appeals
  • How do candidates request rescoring?
  • Do policies allow cancellation of scores when
    organized cheating is found?
  • Harvested Qs on websites, in print
  • Are appeal procedures available?
  • Are appeal procedures explained?
  • How is test security protected during candidate
    appeal procedures?

61
Practical Examinations
  • Is test uniform for all candidates?
  • Is passing score defensible?
  • Are scoring controls in place to limit bias for
    or against individual candidates?
  • Are scoring criteria well-documented?
  • Are judges well-trained to apply scoring criteria
    consistently?
  • Are scoring judgments easy to record?
  • How are marginal scores resolved?

62
Presentation Follow-up
  • Please pick up a handout from this presentation
    -AND/OR-
  • Presentation materials will be posted on CLEAR's
    website