Presented at CLEAR's 23rd Annual Conference
Toronto, Ontario, September 2003

Transcript and Presenter's Notes
1
Defending Your Licensing Examination Programme
  • Deborah Worrad
  • Registrar and Executive Director
  • College of Massage Therapists of Ontario

2
Critical Steps
  • Job Analysis Survey
  • Blueprint for Examination
  • Item Development
  • Test Development
  • Cut Scores
  • Scoring/Analysis
  • Security

3
Subject Matter Experts: Selection
  • Broadly representative of the profession
  • Specialties of practice
  • Ethnicity
  • Age distribution
  • Education level
  • Gender distribution
  • Representation from newly credentialed
    practitioners
  • Geographical distribution
  • Urban vs. rural practice locations
  • Practice settings

4
Job Analysis Survey
  • Provides
  • the framework for the examination development
  • a critical element for ensuring that valid
    interpretations are made about an individual's
    exam performance
  • a link between what is done on the job and how
    candidates are evaluated for competency

5
Job Analysis Survey
  • Comprehensive survey of critical knowledge,
    skills and abilities (KSAs) required by an
    occupation
  • Relative importance, frequency and level of
    proficiency of tasks must be established

6
Job Analysis Survey
  • Multiple sources of information should be used to
    develop the KSAs
  • The survey must provide sufficient detail in
    order to provide enough data to support exam
    construction (blueprint)

7
Job Analysis Survey
  • Good directions
  • User-friendly, simple layout
  • Demographic information requested from
    respondents
  • Reasonable rating scale
  • Pilot test

8
Job Analysis Survey
  • Survey is sent to either a representative sample
    (large profession) or all members (small)
  • With computer technology the JAS can be done
    online, saving costs associated with printing and
    mailing
  • Motivating members to complete the survey may be
    necessary

9
Job Analysis Survey
  • Statistical analysis of results must include
    elimination of outliers and respondents with
    personal agendas
  • A final technical report with the data analysis
    must be produced

10
Blueprint for Examination
  • An examination based on a JAS provides the
    foundation for the programme's content validity
  • The data from the JAS on tasks and KSAs critical
    to effective performance is used to create the
    examination blueprint
  • Subject Matter Experts review the blueprint to
    confirm results from data analysis

11
Item Development
  • Items must fit the test blueprint and be properly
    referenced
  • Principles of item writing must be followed and
    the writers trained to create items that will
    properly discriminate at an entry level
  • The writers must be demographically
    representative of practitioners

12
Item Development
  • Item editing is completed by a team of Subject
    Matter Experts (SMEs) for content review and
    verification of accuracy
  • Items are translated into a second language at
    this point if required
  • Items should be pre-tested with large enough
    samples

13
Examination Psychometrics
  • Options
  • Computer adaptive model
  • Paper and pencil model with item response theory
    (IRT) and pre-testing
  • Equipercentile equating, using an embedded set of
    common items on every form to equate forms and
    establish a pass score (a rough sketch follows
    below)
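
To make the equipercentile idea above concrete, here is a minimal, unsmoothed sketch in Python: raw scores on a new form are mapped to the reference form's scale by matching percentile ranks. The score distributions, form names, and sample sizes are hypothetical, and an operational equating would add smoothing and use the embedded common-item set rather than whole-group distributions.

```python
import numpy as np

def percentile_ranks(scores):
    """Percentile rank of each possible raw score (midpoint convention)."""
    scores = np.asarray(scores)
    n = len(scores)
    return {x: (np.sum(scores < x) + 0.5 * np.sum(scores == x)) / n * 100
            for x in range(int(scores.max()) + 1)}

def equipercentile_equate(new_form_scores, ref_form_scores):
    """Map each raw score on the new form to the reference-form raw score
    with the closest percentile rank (unsmoothed, illustrative only)."""
    new_pr = percentile_ranks(new_form_scores)
    ref_pr = list(percentile_ranks(ref_form_scores).items())
    return {x: min(ref_pr, key=lambda kv: abs(kv[1] - pr))[0]
            for x, pr in new_pr.items()}

# Hypothetical raw-score samples for two 60-item forms
rng = np.random.default_rng(0)
ref = rng.binomial(60, 0.70, size=500)   # reference form (slightly easier)
new = rng.binomial(60, 0.65, size=500)   # new form (slightly harder)

table = equipercentile_equate(new, ref)
print(table[40])   # equated reference-form score for a raw 40 on the new form
```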

14
Test Development
  • Relationship between test specifications and
    content must be logical and defensible
  • Test questions are linked to blueprint which is
    linked to the JAS
  • Exam materials must be secure

15
Test Development
  • Elements of test development differ depending on
    model you are using
  • Generally - develop a test form ensuring
  • Items selected meet statistical requirements
  • Items match the blueprint
  • No item cues another item
  • No repetition of same items

16
Cut Scores
  • Use an approved method to establish minimal
    competence standards required to pass the
    examination
  • This establishes the cut score (pass level)

17
Cut Scores
  • One method is the modified Angoff, in which an
    SME panel makes judgements about a minimally
    competent candidate's ability to answer each item
    correctly
  • This is frequently used by testing programmes and
    does not take too long to complete

18
Cut Scores
  • The SMEs provide an estimate of the proportion of
    minimally competent candidates who would respond
    correctly to each item
  • This process is completed for all items and an
    average rating established for each item
  • Individual item rating data are analyzed to
    establish the passing score
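
A minimal sketch of the averaging described above, using a hypothetical matrix of Angoff ratings (five judges by six items; each entry is the judged proportion of minimally competent candidates expected to answer that item correctly). The raw passing score is simply the sum of the per-item mean ratings.

```python
# Hypothetical Angoff ratings: 5 judges x 6 items
ratings = [
    [0.60, 0.75, 0.80, 0.55, 0.90, 0.70],
    [0.65, 0.70, 0.85, 0.50, 0.85, 0.75],
    [0.55, 0.80, 0.75, 0.60, 0.95, 0.65],
    [0.60, 0.70, 0.80, 0.55, 0.90, 0.70],
    [0.70, 0.75, 0.85, 0.60, 0.85, 0.75],
]

n_judges = len(ratings)
n_items = len(ratings[0])

# Average rating per item across judges
item_means = [sum(judge[i] for judge in ratings) / n_judges
              for i in range(n_items)]

# Raw passing score = sum of the item means (in raw-score points)
cut_score = sum(item_means)
print([round(m, 2) for m in item_means])
print(round(cut_score, 2))   # about 4.33 out of 6 items
```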

19
Scoring
  • Scoring must be correct in all aspects
  • Scanning
  • Error checks
  • Proper key
  • Quality control
  • Reporting

20
Scoring/Analysis
  • Test item analysis on item difficulty and item
    discrimination must be conducted
  • Adopt a model of scoring appropriate for your
    exam (IRT, equipercentile equating)
  • Must ensure that the passing scores are fair and
    consistent, eliminating the impact of varying
    difficulty among forms
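
A minimal sketch of the item analysis referred to above, assuming a hypothetical 0/1 scored response matrix: difficulty as the proportion correct (p-value) and discrimination as the corrected point-biserial correlation of each item with the rest-of-test score.

```python
import numpy as np

# Hypothetical scored responses: rows = candidates, columns = items (1 = correct)
rng = np.random.default_rng(1)
responses = (rng.random((200, 10)) < np.linspace(0.4, 0.9, 10)).astype(int)

difficulty = responses.mean(axis=0)          # p-value per item
total = responses.sum(axis=1)

discrimination = []
for j in range(responses.shape[1]):
    rest = total - responses[:, j]           # rest-of-test score (corrected)
    discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])

for j, (p, r) in enumerate(zip(difficulty, discrimination)):
    print(f"item {j + 1}: p = {p:.2f}, corrected point-biserial = {r:.2f}")
```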

21
Scoring
  • Adopting a scaled score for reporting results to
    candidates may be beneficial
  • Scaling scores facilitates the reporting of any
    shifts in the passing point due to ease or
    difficulty of a form
  • Cut scores may vary depending on the test form so
    scaling enables reporting on a common scale
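
One simple way to implement the scaling described above is a linear transformation that anchors each form's raw cut score to a fixed reported value. The scale used below (cut reported as 500, maximum as 700) and the raw cut scores are hypothetical.

```python
def scale_score(raw, cut_raw, max_raw, cut_scaled=500, max_scaled=700):
    """Linearly rescale a raw score so a form's cut score always reports as
    cut_scaled and the maximum raw score as max_scaled (hypothetical scale)."""
    slope = (max_scaled - cut_scaled) / (max_raw - cut_raw)
    return cut_scaled + slope * (raw - cut_raw)

# Two hypothetical forms of a 200-item exam with different raw cut scores
print(scale_score(130, cut_raw=128, max_raw=200))  # form A: raw 130 is above the cut
print(scale_score(130, cut_raw=134, max_raw=200))  # form B is easier, so its raw cut
                                                   # is higher; raw 130 falls below it
```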

22
Security
  • For all aspects of the work related to
    examinations, proper security procedures must be
    followed including
  • Passwords and password maintenance
  • Programme software security
  • Back-ups
  • Encryption for email transmissions
  • Confidentiality agreements

23
Security
  • Exam administration security must include
  • Exam materials locked in fire proof vaults
  • Security of delivery of exam materials
  • Diligence in dealing with changes in technology
    if computer delivery of the exam is used

24
Presentation Follow-up
  • Please pick up a handout from this presentation
    -AND/OR-
  • Presentation materials will be posted on CLEAR's
    website

25
Defending Your Licensing Examination Program
With Data
  • Robert C. Shaw, Jr., PhD

26
The Defense Triangle
(triangle diagram linking Test Score Use, Reliability, Criterion, and Content)
27
Content
  • Standard 14.14 (1999): "The content domain to be
    covered by a credentialing test should be defined
    clearly and justified in terms of the importance
    of the content . . ."
  • We typically evaluate tasks along an
  • importance dimension or a significance dimension
    that incorporates importance and frequency
  • extent dimension

28
Content
  • Task importance/significance scale points
  • 4. Extremely
  • 3. Very
  • 2. Moderately
  • 1. Not
  • Task extent scale point
  • 0. Never Performed

29
Content
  • We require each task to independently surpass
    the importance/significance and extent exclusion
    rules (a sketch of such rules follows below)
  • We do not composite task ratings
  • We are concerned about diluting tests with
    relatively trivial content (high extent-low
    importance) or including content that may be
    unfair to test (low extent-high importance)
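
A minimal sketch of independent exclusion rules of this kind, using hypothetical tasks, ratings, and thresholds; the actual cut-offs a programme applies would come from its own policy, not from this example.

```python
# Hypothetical task-inventory summary: (task, mean significance 0-4,
# % of respondents who perform the task)
tasks = [
    ("Task A", 3.6, 98),
    ("Task B", 1.4, 35),   # relatively trivial content (low significance)
    ("Task C", 3.2, 95),
    ("Task D", 3.4, 12),   # high significance but rarely performed
]

SIGNIFICANCE_MIN = 2.0   # illustrative exclusion threshold
EXTENT_MIN = 50          # illustrative: performed by at least 50% of respondents

# Each task must independently pass BOTH rules; ratings are not composited,
# so high extent cannot compensate for low significance or vice versa
retained = [name for name, sig, extent in tasks
            if sig >= SIGNIFICANCE_MIN and extent >= EXTENT_MIN]
print(retained)   # -> ['Task A', 'Task C']
```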

30
Content
  • Selecting a subset of tasks and labeling them
    critical is only defensible when the original
    list was reasonably complete
  • We typically ask task inventory respondents how
    adequately the task list covered the job
  • completely, adequately, inadequately
  • We then calculate percentages of respondents who
    selected each option

31
Content
  • Evaluate task rating consistency
  • Were the people consistent?
  • Intraclass correlation
  • Were tasks consistently rated within each content
    domain?
  • Coefficient alpha
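
A minimal sketch of the coefficient alpha check listed above, assuming a hypothetical respondents-by-tasks rating matrix for one content domain.

```python
import numpy as np

def coefficient_alpha(ratings):
    """Cronbach's alpha for a respondents x tasks rating matrix."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 0-4 significance ratings: 6 respondents on 4 tasks in one domain
domain_ratings = [
    [4, 3, 4, 3],
    [3, 3, 3, 2],
    [4, 4, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 4, 3],
    [4, 3, 4, 3],
]
print(round(coefficient_alpha(domain_ratings), 2))
```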

32
Content
33
Content
  • We typically ask task inventory respondents in
    what percentages they would allocate items across
    content areas to lend support to the structure of
    the outline
  • I encourage a task force to explicitly follow
    these results or follow the rank order
  • Because items are specified according to the
    outline, we feel these results demonstrate
    broader support for test specifications beyond
    the task force
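
Averaged allocation percentages of this kind can be turned into item counts for a fixed-length form. A small sketch with hypothetical content areas, hypothetical mean percentages, and a hypothetical 150-item test; largest-remainder rounding is just one reasonable way to make the counts sum exactly to the test length.

```python
# Hypothetical mean allocation percentages from task-inventory respondents
allocation = {"Assessment": 30.4, "Treatment": 42.1,
              "Professional Practice": 15.8, "Safety": 11.7}
TEST_LENGTH = 150

# Largest-remainder rounding so counts sum exactly to TEST_LENGTH
exact = {area: pct / 100 * TEST_LENGTH for area, pct in allocation.items()}
counts = {area: int(v) for area, v in exact.items()}
shortfall = TEST_LENGTH - sum(counts.values())
for area in sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)[:shortfall]:
    counts[area] += 1

print(counts)   # counts per content area, summing to 150
```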

34
Content
What percentage of items would you allocate to
each content area?
35
Reliability
  • Test scores lack utility until one can show the
    measurement scale is reasonably precise

36
Reliability
  • Test score precision is often expressed in terms
    of
  • Kuder-Richardson Formula 20 (KR 20) when items
    are dichotomously (i.e., 0 or 1) scored
  • Coefficient Alpha when items are scored on a
    broader scale (e.g., 0 to 5)
  • Standard Error of Measurement
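
A minimal sketch of KR-20 and the standard error of measurement for dichotomously scored items, computed from a hypothetical simulated response matrix (coefficient alpha for polytomously scored items follows the same pattern, with item variances in place of p times q).

```python
import numpy as np

def kr20_and_sem(responses):
    """KR-20 reliability and standard error of measurement for 0/1 scored items."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    p = x.mean(axis=0)                  # proportion correct per item
    q = 1 - p
    total = x.sum(axis=1)
    kr20 = k / (k - 1) * (1 - (p * q).sum() / total.var(ddof=1))
    sem = total.std(ddof=1) * np.sqrt(1 - kr20)
    return kr20, sem

# Hypothetical scored responses: 300 candidates x 50 items
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=(300, 1))
difficulty = rng.normal(0, 1, size=(1, 50))
responses = (rng.random((300, 50)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

kr20, sem = kr20_and_sem(responses)
print(f"KR-20 = {kr20:.2f}, SEM = {sem:.2f} raw-score points")
```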

37
Reliability
  • Standard 14.15 (1999) Estimates of the
    reliability of test-based credentialing decisions
    should be provided.
  • Comment . . . Other types of reliability
    estimates and associated standard errors of
    measurement may also be useful, but the
    reliability of the decision of whether or not to
    certify is of primary importance

38
Reliability
  • Decision Consistency Index
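
The slide does not say which index is used; one crude single-administration approximation is to classify each candidate pass/fail on two half-tests and report the agreement rate, while operational programmes more often use model-based estimates (e.g., Subkoviak or Livingston-Lewis). A hedged sketch with simulated, hypothetical data:

```python
import numpy as np

def split_half_decision_consistency(responses, cut_proportion):
    """Crude estimate: classify each candidate pass/fail on the odd-item and
    even-item half-tests and return the proportion classified the same way."""
    x = np.asarray(responses)
    odd, even = x[:, 0::2], x[:, 1::2]
    pass_odd = odd.mean(axis=1) >= cut_proportion
    pass_even = even.mean(axis=1) >= cut_proportion
    return (pass_odd == pass_even).mean()

# Hypothetical 0/1 responses for 300 candidates on a 60-item form
rng = np.random.default_rng(3)
ability = rng.normal(0, 1, size=(300, 1))
difficulty = rng.normal(-0.5, 1, size=(1, 60))
responses = (rng.random((300, 60)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

print(round(split_half_decision_consistency(responses, cut_proportion=0.65), 2))
```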

39
Criterion
  • The criterion to which test scores are related
    can be represented by two planks

40
Criterion
  • Most programs rely on the minimal competence
    criterion expressed in a passing point study

41
Criterion
  • Judges' expectations are expressed through
  • text describing minimally competent practitioners
  • item difficulty ratings
  • We calculate an intraclass correlation to focus
    on the consistency with which judges gave
    ratings
  • We find confidence intervals around the mean
    rating
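
A minimal sketch of a t-based confidence interval around the mean rating for a single item, using hypothetical ratings from eight judges.

```python
import statistics
from scipy.stats import t

# Hypothetical Angoff ratings for one item from 8 judges
ratings = [0.60, 0.65, 0.70, 0.55, 0.75, 0.60, 0.65, 0.70]

n = len(ratings)
mean = statistics.mean(ratings)
se = statistics.stdev(ratings) / n ** 0.5
margin = t.ppf(0.975, df=n - 1) * se          # 95% two-sided interval
print(f"mean = {mean:.3f}, 95% CI = ({mean - margin:.3f}, {mean + margin:.3f})")
```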

42
Criterion
  • We use the discrimination value to look for
    aberrant behavior from judges

43
Criterion
Mean of judges' ratings
Passing score
44
Criterion
  • One of my clients was sued in 1975
  • In spite of evidence linking test content to a
    1973 role delineation study, the court would not
    dismiss the case
  • Issues that required defense were
  • discrimination or adverse impact from test
    score use
  • job-relatedness of test scores

45
Criterion
  • Only after a criterion-related validation study
    was conducted was the suit settled

46
Criterion
  • Theoretic model of these studies (diagram linking
    Critical Content, a Supervisor Rating Inventory,
    the Test, and the Correlation of Ratings and Test
    Scores)
47
Criterion
  • Test Bias Study
  • Compare regression lines predicting job
    performance from test scores for focal and
    comparator groups (see the sketch below)
  • There are statistical procedures available to
    determine whether slopes and intercepts
    significantly differ
  • Differences in mean scores are not necessarily a
    critical indicator
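
A minimal sketch of the comparison described above, with simulated (hypothetical) test scores and job-performance ratings for a focal and a comparator group. The group term tests for an intercept difference and the score-by-group interaction tests for a slope difference; statsmodels is used here as one convenient option, not necessarily what the presenter's programme uses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150

# Hypothetical test scores and supervisor job-performance ratings
focal_score = rng.normal(70, 10, n)
comp_score = rng.normal(74, 10, n)
focal_perf = 0.05 * focal_score + rng.normal(0, 0.6, n)
comp_perf = 0.05 * comp_score + rng.normal(0, 0.6, n)

score = np.concatenate([focal_score, comp_score])
perf = np.concatenate([focal_perf, comp_perf])
group = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = focal group

# Design matrix: intercept, score, group, and score x group interaction
X = sm.add_constant(np.column_stack([score, group, score * group]))
model = sm.OLS(perf, X).fit()

# Index 2 = group (intercept difference), index 3 = interaction (slope difference);
# large p-values indicate no evidence that the regression lines differ
print(model.params)
print(model.pvalues[[2, 3]])
```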

48
The Defense Triangle
(triangle diagram linking Test Score Use, Reliability, Criterion, and Content)
49
Presentation Follow-up
  • Presentation materials will be posted on CLEAR's
    website
  • rshaw@goamp.com

50
Defending Your Program: Strengthening Validity in
Existing Examinations
  • Ron Rodgers, Ph.D.
  • Director of Measurement Services
  • Continental Testing Services (CTS)

51
What Can Go Wrong?
  • Job/practice analysis and test specs
  • Item development and documentation
  • Test assembly procedures and controls
  • Candidate information before and after
  • Scoring accuracy and item revalidation
  • Suspected cheating and candidate appeals
  • Practical exam procedures and scoring

52
Job Analysis and Test Specs
  • Undocumented (or no) job analysis
  • Embedded test specifications
  • Unrepresentative populations for job analysis or
    pilot testing
  • Misuse of trial forms and data to support
    live examinations

53
Item Development
  • Do item authors and reviewers sign and understand
    non-disclosure agreements?
  • How does each question reflect job analysis
    results and test specifications?
  • Should qualified candidates be able to answer Qs
    correctly with information available during the
    examination?

54
Item Development
  • Do any questions offer cues that answer other
    questions on an exam?
  • Do item patterns offer cues to marginally
    qualified, test-savvy candidates?
  • Is longest answer always correct?
  • If "None of the above" or "All of the above" Qs
    are used, are these always correct?
  • True-False questions with clear patterns?
  • Do other detectable patterns cue answers?

55
Item Documentation
  • Are all Qs supported by references cited for and
    available to all candidates?
  • Do any questions cite item authors or committee
    members as references?
  • Are page references cited for each Q?
  • Are citations updated as new editions of each
    reference are published?

56
Candidate Information
  • Are all references identified to and equally
    available to all candidates?
  • Are content outlines for each test provided to
    help candidates prepare?
  • Are sample Qs given to all candidates?
  • Are candidates told what they must/may bring and
    use during the examination?

57
Test Assembly Controls
  • Are parallel forms assembled to be of
    approximately equal difficulty?
  • Is answer key properly balanced?
  • Approx. equal numbers of each option
  • Limit consecutive Qs with same answer
  • Avoid repeated patterns of responses
  • Avoid long series of Qs in which an option never
    appears as the key (see the sketch below)
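
A minimal sketch of automated checks for the bullets above, using a hypothetical 20-item answer key; the flagging thresholds would be set by programme policy.

```python
from collections import Counter
from itertools import groupby

# Hypothetical answer key for a 20-item form
key = list("BADCABDCBADCABCDABDC")

# 1. Approximately equal use of each option
print(Counter(key))

# 2. Longest run of consecutive identical answers (flag runs longer than, say, 3)
longest_run = max(len(list(g)) for _, g in groupby(key))
print("longest run:", longest_run)

# 3. Longest stretch of questions without a given option appearing as the key
def longest_gap(key, option):
    gap = best = 0
    for answer in key:
        gap = 0 if answer == option else gap + 1
        best = max(best, gap)
    return best

print({opt: longest_gap(key, opt) for opt in "ABCD"})
```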

58
Suspected Cheating
  • Is potential cheating behavior at the test site
    clearly defined for onsite staff?
  • Are candidates informed of possible consequences
    of suspected cheating?
  • Are staff trained to respond fairly and
    appropriately to suspected cheating?
  • Are procedures in place to help staff
    document/report suspected cheating?

59
Scoring Controls
  • How is accuracy of answer key verified?
  • Do item analyses show any anomalies in candidate
    performance on test?
  • Are oddly performing Qs revalidated?
  • Identify ambiguities in sources or Qs
  • Verify that each Q has one right answer
  • Give credit to all candidates when needed
  • Are scoring adjustments applied fairly?
  • Are rescores/refunds issued as needed?

60
Candidate Appeals
  • How do candidates request rescoring?
  • Do policies allow cancellation of scores when
    organized cheating is found?
  • Harvested Qs on websites, in print
  • Are appeal procedures available?
  • Are appeal procedures explained?
  • How is test security protected during candidate
    appeal procedures?

61
Practical Examinations
  • Is test uniform for all candidates?
  • Is passing score defensible?
  • Are scoring controls in place to limit bias for
    or against individual candidates?
  • Are scoring criteria well-documented?
  • Are judges well-trained to apply scoring criteria
    consistently?
  • Are scoring judgments easy to record?
  • How are marginal scores resolved?

62
Presentation Follow-up
  • Please pick up a handout from this presentation
    -AND/OR-
  • Presentation materials will be posted on CLEAR's
    website