1
Center for Distance and Independent Study

2
Creating and Using a Statistical System to
Improve Objective Items
  • Terrie Nagel
  • AACIS, November 10, 2000

3
Steps on Our Journey
  • Reasons
  • Assessment Resource Center (ARC)
  • Quantitative Analysis (Creating and Using a
    Statistical System to Improve Objective Items)
  • Qualitative Analysis
  • Conclusion

4
Reasons
  • Over 17,000 enrollments annually
  • Over 20 middle school, 140 high school, and 160 university courses available
  • MU High School Diploma program launched Fall 1999

5
Reasons
  • Simplistic test assessment used in past

6
Reasons
  • Test Reliability - the consistency with which a test measures whatever it is measuring; a test is reliable if it consistently yields the same, or nearly the same, ranks over repeated administrations, during which we would not expect the trait being measured to have changed.

7
Reasons
  • Content Validity - the extent to which test items match or reflect a teacher's instructional objectives

8
Assessment Resource Center
  • Focus on 10 high school courses initially
  • ARC expert in field of test assessment
  • Sound advice for independent study curriculum
    development in general
  • Suggestions regarding current trends

9
Assessment Resource Center
  • Sound advice for independent study curriculum
    development in general
  • Tests should not repeat items verbatim from homework questions (otherwise, the test measures how well students memorize items rather than how well they understand the content)
  • Each item should be independent of other items. One item should not provide clues to the answers to other items on the same test.

10
Assessment Resource Center
  • Sound advice for independent study curriculum
    development in general
  • True/False Items - ARC doesn't advocate them: it is difficult to write good ones that test more than rote memorization, and students have a 50 percent chance of getting the item right. To be true, a statement has to be clearly true 100 percent of the time. To be false, it has to be definitely false, with no additional qualifications. Often only trivial statements can be said to be unambiguously true or false.

11
Assessment Resource Center
  • Sound advice for independent study curriculum
    development in general
  • Matching questions are also a concern: they should include more distractors than are used (otherwise the chance of getting an item correct by guessing increases). Matching questions are basically multiple true/false items and have some of the same shortcomings. Major disadvantage - suitable only for measuring associations, not for assessing higher levels of understanding.

12
Assessment Resource Center
  • Suggestions regarding current trends
  • Ask course authors to provide a large pool of test questions; doubling the number of questions gives the course editor a chance to use well-written ones. Consider paying a bonus for well-written questions, or paying only for well-written questions that are used or added to the pool of available questions.

13
Assessment Resource Center
  • Suggestions regarding current trends
  • Computer adaptive testing - must ensure:
  • 1) a large pool of available test questions (ARC felt this would entail an even larger bank of questions, perhaps 8 times as large)
  • 2) questions cover instructional objectives
  • 3) items range in difficulty level
  • 4) cognitive complexity of items varies appropriately (comprehension, application)

14
Quantitative Analysis
  • For practical reasons, we assess reliability using internal consistency methods rather than test-retest or alternate-form methods
  • If the test is designed to measure a single basic concept, it is reasonable to assume students who get one item right will also be more likely to get other, similar items right (items should be fairly highly correlated with each other). Common measures are coefficient alpha and Kuder-Richardson 20 (KR-20).
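A minimal computational sketch of these two measures (not the presenter's actual SAS macros; it assumes a NumPy matrix of 0/1 item scores, rows = students, columns = items):

  import numpy as np

  def coefficient_alpha(scores):
      # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
      scores = np.asarray(scores, dtype=float)
      k = scores.shape[1]
      item_vars = scores.var(axis=0, ddof=1)
      total_var = scores.sum(axis=1).var(ddof=1)
      return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

  def kr20(scores):
      # KR-20: the special case of alpha for dichotomous (0/1) items,
      # with item variances written as p*q (p = proportion correct per item)
      scores = np.asarray(scores, dtype=float)
      k = scores.shape[1]
      p = scores.mean(axis=0)
      total_var = scores.sum(axis=1).var(ddof=1)
      return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)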

15
Quantitative Analysis
  • Coefficient alpha and KR-20 estimates can reasonably be interpreted as reflecting the reliability of the test as long as every item measures the same general trait or ability as every other item (e.g., they would be appropriately calculated for a 20-item spelling test but not for a 20-item test that had 5 items each to measure spelling, reading, math, and science). KR-21 is another measure, but it assumes all items are of equal difficulty, which is seldom true in practice, so it is not very useful.
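For comparison, a sketch of KR-21 under the same assumptions as the functions above; it uses only the mean and variance of total scores, which is exactly where the equal-difficulty assumption enters:

  def kr21(scores):
      # KR-21: treats every item as equally difficult, so it needs only the
      # mean and variance of total scores; typically lower than KR-20
      scores = np.asarray(scores, dtype=float)
      k = scores.shape[1]
      totals = scores.sum(axis=1)
      m, var = totals.mean(), totals.var(ddof=1)
      return (k / (k - 1)) * (1 - m * (k - m) / (k * var))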

16
Quantitative Analysis
  • Limitations of Mermac
  • Batch mainframe environment
  • Additional CPU mainframe costs to the dept.
  • Cannot be edited or personalized
  • Test keys manually entered in JCL, coding errors
    possible
  • Only KR-20 and KR-21 reliability coefficients are reported; KR-20 is considered misleading since it is generally the highest, so other coefficients should also be reported

17
Quantitative Analysis
  • SAS (Statistical Analysis System)
  • Powerful, easy to use
  • Macro-driven, personalized reports
  • Easily handles variety of input
  • Test keys read in to avoid data entry errors
  • Additional statistical computations are easy (coefficient alpha; individual item analysis more accurately targets items for inclusion/removal to improve test reliability)

18
Quantitative Analysis
  • Item Analysis
  • How difficult is the item?
  • Does the item distinguish between the higher and
    lower scoring examinees?
  • Are all the options selected by at least some examinees, or are there some options that no examinees choose?

19
Quantitative Analysis
  • Item Difficulty Index - refers to the proportion of students who answered an item correctly.
  • Item difficulty can range from 1.00 (everyone got it right) to 0 (no one got it right). Items with difficulty levels > .75 are considered relatively easy, whereas items with difficulty levels < .25 are considered relatively difficult.
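A one-line sketch of the index, assuming the same 0/1 score matrix as in the earlier examples:

  def item_difficulty(scores):
      # proportion of students answering each item correctly
      # (1.00 = everyone right, 0 = no one right; > .75 easy, < .25 difficult)
      return np.asarray(scores, dtype=float).mean(axis=0)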

20
Quantitative Analysis
  • Item Discrimination Index - a measure of the extent to which the item being analyzed distinguishes between students who did well overall on the test and those who did not do well overall on the test.
  • The procedure for calculating this can be described as follows:
  • 1) Rank the test scores from high to low

21
Quantitative Analysis
  • 2) Form upper and lower groups based on total test scores. The formation of these groups varies considerably: if the test sample is small, the data can be divided in half and all scores considered. Since I generally have a lot of data, I use quintiles and divide the data into fifths, so my upper and lower groups approximate the top 20% and bottom 20%, respectively. Groups won't always be equal in size due to duplicate scores.

22
Quantitative Analysis
  • 3) For each item, count the number of students in the upper and lower groups that chose the correct response.
  • 4) Compute item discrimination as the difference between the proportions answering correctly in the two groups (a minimal sketch follows below)
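A sketch of the quintile-based procedure described above (illustrative names; group sizes can differ slightly when total scores tie):

  def item_discrimination(scores):
      # form upper/lower groups from the top and bottom 20% of total scores,
      # then take the per-item difference in proportion correct
      scores = np.asarray(scores, dtype=float)
      totals = scores.sum(axis=1)
      upper = scores[totals >= np.percentile(totals, 80)]
      lower = scores[totals <= np.percentile(totals, 20)]
      return upper.mean(axis=0) - lower.mean(axis=0)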

23
Quantitative Analysis
  • Item Discrimination Index - can range from -1.00 to 1.00. Items with positive discrimination indices greater than .30 are good, whereas those with negative values should be replaced.
  • If your goal is improving test reliability, be careful to check both the item's discrimination index and its correlation with the total score (unless the correlation with the total is negative, you will actually decrease test reliability if the item is deleted from the test).
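Two illustrative checks for that decision, reusing the coefficient_alpha sketch above: an item-total (point-biserial) correlation computed against the total excluding the item itself, and alpha-if-item-deleted:

  def item_total_correlations(scores):
      # correlate each 0/1 item with the total score of the remaining items
      scores = np.asarray(scores, dtype=float)
      return np.array([np.corrcoef(scores[:, j],
                                   scores.sum(axis=1) - scores[:, j])[0, 1]
                       for j in range(scores.shape[1])])

  def alpha_if_deleted(scores):
      # estimated reliability of the test with each item removed in turn
      scores = np.asarray(scores, dtype=float)
      return np.array([coefficient_alpha(np.delete(scores, j, axis=1))
                       for j in range(scores.shape[1])])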

24
Quantitative Analysis
  • An item has satisfactory discriminatory power if
    more students in the upper group than in the
    lower group choose the correct option, and more
    students in the lower than in the upper group
    choose each distractor.
  • Results of these analyses reveal whether or not
    any of your items are miskeyed, subject to
    guessing, or ambiguous.

25
Quantitative Analysis
  • Miskeying - this should be checked as a
    possibility anytime the majority of students in
    the upper group choose a particular distractor
    rather than the correct answer.

26
Quantitative Analysis
  • Guessing - this is indicated when students in
    the upper group appear to be responding in a
    random fashion. If each option was chosen by
    about the same number of students in the upper
    group, guessing is probably going on.

27
Quantitative Analysis
  • Ambiguity - this is suspected when one
    particular distractor is chosen by about as many
    students in the upper group as is the correct
    response. Items that are ambiguous or prone to
    guessing need to be rewritten.
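A rough screening sketch for these three patterns. The data layout, option labels, and cut-offs here are assumptions for illustration, not thresholds from the presentation: responses_upper is the list of options chosen by the upper group on one item, and correct is the keyed answer.

  from collections import Counter

  def screen_item(responses_upper, correct):
      counts = Counter(responses_upper)
      n = len(responses_upper)
      flags = []
      top_option, top_count = counts.most_common(1)[0]
      # miskeying: most of the upper group picks one particular distractor
      if top_option != correct and top_count > n / 2:
          flags.append("possible miskeying")
      # guessing: upper-group choices spread about evenly across the options
      if len(counts) >= 3 and max(counts.values()) - min(counts.values()) <= max(1, n // 10):
          flags.append("possible guessing")
      # ambiguity: a distractor drawn about as often as the keyed answer
      for option, count in counts.items():
          if option != correct and abs(count - counts.get(correct, 0)) <= max(1, n // 20):
              flags.append("possible ambiguity with option " + str(option))
      return flags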

28
Quantitative Analysis
  • American History to 1898 Example
  • Descriptive Statistics and Measures of Variability
  • Internal Consistency Reliability (Coefficient alpha, KR-20, Spearman-Brown)
  • Frequency Distribution
  • Quintile Summary
  • Matrix of Responses by Fifths, Including Item Difficulty Index and Rank Point-Biserial Index (a more complex form of the Item Discrimination Index)

29
Qualitative Analysis
  • Rewriting/Editing Problematic Items (see General
    Guidelines in presentation folders)
  • Example
  • Clue provided in stem
  • 1. The ability to lead and motivate others
    is known as
  • A. leadership.
  • B. direct attack.
  • C. progression.
  • D. production.

30
Qualitative Analysis
  • Content validity is reinforced by a test blueprint matrix, which has two components:
  • The content outline lists the content topics (from the instructional objectives) to be covered by the test items
  • The categories serve as a check on the level of cognitive complexity of the test items (ranging from simple knowledge or recall to higher-level cognitive skills)
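A hypothetical blueprint layout (the topics and item counts below are invented for illustration, not taken from the course):

  Content topic (from objectives)   Knowledge   Comprehension   Application   Total
  Colonial settlement                   1             3               2          6
  Road to the Revolution                1             4               2          7
  The early republic                    2             3               3          8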

31
Qualitative Analysis
  • Suggestions for cognitive complexity of objective or multiple-choice items:
  • Items at the knowledge level of Bloom's taxonomy involve remembering or recalling previously learned information; since this is rote memorization, these items would not promote meaningful learning
  • Objective items lend themselves well to comprehension and application outcomes; 60% comprehension items and 40% application items would be a good ratio on a test

32
Conclusion
  • Current Status and Future Plans
  • One-fifth of the 140 high school courses have been or are in the process of being adapted after undergoing test assessment; our goal is to assess all of the computer-evaluated exams, given enough data
  • Computer-adaptive testing is in preliminary stages and helps reinforce the trend toward use of test blueprints and independent test items
  • Targeted university courses will undergo test assessment as well; this may involve departmental approval

33
Terrie Nagel (nagelt@missouri.edu)