Transcript and Presenter's Notes

Title: Test Development


1
Test Development
2
Test Development Process
  • Test Conceptualization
  • Test Construction
  • Test Tryout
  • Item Analysis
  • Test Revision

3
Test Conceptualization
  • Role of self-talk
  • Preliminary questions
  • What is the test designed to measure?
  • What is the objective of the test?
  • Is there a need for the test?
  • Potential harm/benefits?
  • What content will be covered?

4
Test Conceptualization (cont'd)
  • How will meaning be attributed to scores on this
    test?
  • Norm-referenced: compare an individual's score to
    the scores of others who have already taken the test
  • Criterion-referenced: compare the score to a
    criterion group (known to have the trait)

5
Test Conceptualization (cont'd)
  • Pilot work: preliminary research surrounding the
    creation of the prototype of the test
  • Aim: determine how best to measure the targeted
    construct

6
Test Construction
  • Three steps
  • Scaling
  • Writing items
  • Scoring items

7
Test Construction (cont'd)
  • Scaling: setting the rules for assigning numbers in
    measurement
  • Decision on the type of scale(s)
  • Types of scales
  • Age-based
  • Grade-based
  • Stanine: transformation of raw scores (see the
    sketch after this list)
  • Uni- or multi-dimensional
  • Method of paired comparisons
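To make the stanine option concrete, here is a minimal sketch, assuming hypothetical raw scores and the common z-score conversion (stanine = 2z + 5, rounded and clipped to the 1-9 range):

```python
from statistics import mean, pstdev

def to_stanines(raw_scores):
    # Convert each raw score to a z-score, then to a stanine:
    # stanine = 2z + 5, rounded and clipped to the range 1-9.
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [min(9, max(1, round(2 * (x - m) / sd + 5))) for x in raw_scores]

# Hypothetical raw test scores
raw = [12, 15, 18, 22, 25, 27, 30, 33, 36, 40]
print(list(zip(raw, to_stanines(raw))))
```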

8
Method of Paired Comparisons
  • Select the behavior you think would be more
    justified
  • a. cheating on taxes if one has a chance
  • b. accepting a bribe in the course of one's
    duties
  • Which picture do you prefer?
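A minimal sketch of how paired-comparison judgments can be turned into scale values (a simplified Thurstone-style tally; the third behavior and all judgment counts are hypothetical):

```python
from statistics import NormalDist

# Hypothetical tallies from 100 judges: (option_a, option_b, times_a_was_chosen)
pairs = [
    ("cheating on taxes", "accepting a bribe", 62),
    ("cheating on taxes", "running a red light", 35),
    ("accepting a bribe", "running a red light", 28),
]
n_judges = 100

# For each behavior, average the z-score of the proportion of judges preferring it
z_sums, counts = {}, {}
for a, b, a_wins in pairs:
    p_a = a_wins / n_judges
    for option, p in ((a, p_a), (b, 1 - p_a)):
        z_sums[option] = z_sums.get(option, 0.0) + NormalDist().inv_cdf(p)
        counts[option] = counts.get(option, 0) + 1

scale_values = {opt: z_sums[opt] / counts[opt] for opt in z_sums}
for opt, value in sorted(scale_values.items(), key=lambda kv: kv[1]):
    print(f"{opt}: {value:+.2f}")
```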

9
Test Construction (cont'd)
  • Writing Items
  • Consider content, item formats, and the number of
    items
  • Item pool: the group of items from which items will
    be drawn for, or discarded from, the final version
    of the test

10
Test Construction (cont'd)
  • Writing items (cont'd)
  • Item format
  • Selected-response
  • Constructed-response

11
Constructed Response
  • The standard deviation is generally considered
    the most useful measure of _________.
  • Answer: variability

12
Construction-Writing Items
  • Writing Items (cont'd)
  • Selected response formats
  • Dichotomous
  • Polytomous
  • Likert
  • Categorical
  • Checklists
  • Matching
  • Subjective response format

13
Dichotomous True/False
  • Variables such as the form, plan, structure,
    arrangement, and layout of individual test items
    are collectively referred to as item format.
  • True False

14
Selected-Response Multiple Choice
  • Item A
  • A psychological test, an interview, and a case
    study are
  • Psychological assessment tools
  • Standardized behavioral samples
  • Reliable assessment instruments
  • Theory-linked measures

(Labels for the parts of Item A: stem, correct alternative, distractors)
15
Selected Response MC
  • Item B
  • A good multiple-choice item in an achievement
    test
  • Has one correct alternative
  • Has grammatically parallel alternatives
  • Has alternatives of similar length
  • Has alternatives that fit grammatically with the
    stem
  • Includes as much of the item as possible in the
    stem to avoid unnecessary repetition
  • Avoids ridiculous distractors
  • Is not excessively long
  • All of the above
  • None of the above

16
Likert Scales
  • How effective was the textbook in facilitating
    your learning in this course?
  • 1 = Not at all effective
  • 2 = A little effective
  • 3 = Average effectiveness
  • 4 = More effective than usual
  • 5 = Extremely effective

17
Categorical
  • What level of education have you completed?
  • Kindergarten through 5th grade
  • Middle school education (6th-8th grade)
  • Portion of high school (9th-11th grade)
  • High school diploma
  • Associate's degree
  • Master's degree
  • Professional degree (Ph.D., M.D., J.D., D.O.)

18
Checklists
  • Which symptoms have you experienced in the past
    month?
  • ___ Feeling down ___ Anxiety
  • ___ Irritability ___ Restlessness
  • ___ Sadness ___ Appetite changes
  • ___ Crying ___ Less interest in sex

19
Matching
  • ___ A. Samuel L. Jackson   1. Mission Impossible
  • ___ B. Brad Pitt           2. Dumb & Dumber
  • ___ C. Jim Carrey          3. Shaft
  • ___ D. Tom Cruise          4. Fight Club

20
Subjective Response Formats
  • Fill-in-the-blank (e.g., regression is
    _________________)
  • Short answer
  • Essay
  • The longer and more complex the answer, the more
    difficult it is to score reliably.

21
Summary for writing items
  • 1. Use a theory or model to guide your
    test/survey when possible
  • 2. Try not to confuse the participant
  • 3. Use simple, clear language
  • 4. PROOFREAD
  • 5. Anticipate confusion
  • 6. Consider boredom and fatigue
  • 7. Consider short-term memory limitations
  • 8. Remember: item writing should proceed with a
    plan in mind; we should have a clearly defined
    notion of the construct we wish to measure!

22
Test Construction (cont'd)
  • Scoring Items
  • Class scoring: responses earn credit toward
    placement in a particular class
  • Category scoring: responses earn credit toward
    placement in a particular category
  • Ipsative scoring: compares a testtaker's score on
    one scale within the test with that testtaker's
    score on another scale within the same test

23
Ipsative Scoring
  • Edwards Personal Preference Schedule (EPPS): forced
    choice between two socially desirable responses
    yields information on the strength of the
    testtaker's various needs in relation to the
    strength of that testtaker's other needs (not
    relative to the needs of the general population),
    so only intra-individual (within-person) conclusions
    can be drawn, NOT inter-individual (between-person)
    conclusions (see the sketch below)
  • e.g.,
  • I feel depressed when I fail at something
  • I feel nervous when giving a talk before a group.
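A minimal sketch of the within-person logic of ipsative scoring, assuming hypothetical forced-choice tallies for a single testtaker (the need labels are illustrative):

```python
# Hypothetical tallies for ONE testtaker: how many times each need was chosen
# over a competing need across the forced-choice item pairs.
choices = {"achievement": 18, "affiliation": 11, "autonomy": 7, "dominance": 14}

# Ipsative interpretation: rank the needs WITHIN this person. The ranking says
# which needs are relatively stronger for this testtaker; it supports
# intra-individual conclusions only, not comparisons with other people.
for rank, (need, count) in enumerate(
        sorted(choices.items(), key=lambda kv: kv[1], reverse=True), start=1):
    print(f"{rank}. {need} (chosen {count} times)")
```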

24
Test Tryout
  • Use people similar to those for whom the test was
    developed
  • 5-10 people per test item (see the sketch after
    this list)
  • e.g., if the test is meant to aid in the selection
    of corporate executives with management potential,
    then try it out on corporate employees at the
    targeted level
  • The more people, the weaker the role of chance in
    the data analysis
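A quick sketch of the arithmetic behind the 5-10 people-per-item guideline, assuming a hypothetical item count:

```python
n_items = 30  # hypothetical number of items in the tryout version
low, high = 5 * n_items, 10 * n_items
print(f"For {n_items} items, try the test out on roughly {low}-{high} people.")
```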

25
Item Analysis
  • Item difficulty: how many people get the item
    right. The more who get it right, the easier the
    item
  • Optimum difficulty level: first, find half of the
    difference between 100% success and chance
    performance; second, add this value to the
    probability of performing correctly by chance alone
    to get the midway point, as sketched below
  • With 100% success = 1.0 and chance = .20 (for an
    item with 5 alternatives):
  • (1.0 - .20) / 2 = .40
  • .20 (chance) + .40 = .60 (optimum difficulty level)
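A minimal sketch of both computations (item difficulty as the proportion passing, and the optimum difficulty level for a five-alternative item), using hypothetical response data:

```python
# Hypothetical 0/1 responses to one multiple-choice item (1 = correct)
responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Item difficulty (p): the proportion of testtakers answering correctly
p = sum(responses) / len(responses)
print(f"item difficulty p = {p:.2f}")

# Optimum difficulty for an item with 5 alternatives (chance = 1/5 = .20)
chance = 1 / 5
halfway = (1.0 - chance) / 2      # half the distance between chance and 1.0
optimum = chance + halfway        # .20 + .40 = .60
print(f"optimum difficulty level = {optimum:.2f}")
```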

26
Item Analysis
  • Item discriminability: indicates whether people who
    have done well on a particular item have also done
    well on the test as a whole
  • Extreme group method: compares item performance of
    those who did well on the whole test with that of
    those who didn't (see the sketch below)
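A minimal sketch of the extreme group method, assuming hypothetical item and total-score data and the common convention of comparing the top and bottom 27% of scorers:

```python
# Hypothetical data: (score on the item: 0/1, total test score) per testtaker
data = [(1, 88), (1, 92), (0, 55), (1, 76), (0, 48), (1, 81),
        (0, 62), (1, 95), (0, 51), (1, 70), (0, 58), (1, 84)]

data.sort(key=lambda d: d[1])                 # sort by total test score
k = max(1, round(0.27 * len(data)))           # size of each extreme group
lower, upper = data[:k], data[-k:]

p_upper = sum(item for item, _ in upper) / k  # proportion correct, top scorers
p_lower = sum(item for item, _ in lower) / k  # proportion correct, bottom scorers
print(f"item discrimination d = {p_upper:.2f} - {p_lower:.2f} = {p_upper - p_lower:.2f}")
```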

27
Item Analysis
  • Item reliability
  • Item-reliability index: a higher index means a more
    reliable item (i.e., a measure of the item's
    contribution to internal consistency); see the
    sketch below
  • Factor analysis: can show whether items load on the
    factors you want them to, or whether several
    factors are emerging; items can be eliminated based
    on what you want the test to do
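A minimal sketch of one common formulation of the item-reliability index (the item-score standard deviation multiplied by the item-total correlation), using hypothetical scores (requires Python 3.10+ for statistics.correlation):

```python
import statistics as stats

# Hypothetical 0/1 scores on one item and total test scores for 8 testtakers
item  = [1, 0, 1, 1, 0, 1, 0, 1]
total = [34, 22, 30, 36, 19, 28, 25, 33]

s_item = stats.pstdev(item)                    # item-score standard deviation
r_item_total = stats.correlation(item, total)  # item-total correlation
print(f"item-reliability index = {s_item * r_item_total:.3f}")
```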

28
Item Analysis
  • Item Validity
  • Item-validity index: indicates the degree to which
    the test measures what it purports to measure
  • Higher is better
  • Uses the item-score SD and the correlation between
    the item score and the criterion score (see the
    sketch below)
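A minimal sketch of the item-validity index as described above (item-score SD times the item-criterion correlation), with hypothetical data:

```python
import statistics as stats

# Hypothetical 0/1 item scores and an external criterion score (e.g., a job rating)
item      = [1, 0, 1, 1, 0, 1, 0, 1]
criterion = [4.5, 2.0, 3.8, 4.9, 2.4, 3.5, 2.9, 4.2]

s_item = stats.pstdev(item)                       # item-score SD
r_item_crit = stats.correlation(item, criterion)  # item-criterion correlation
print(f"item-validity index = {s_item * r_item_crit:.3f}")
```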

29
Item Analysis
  • Item Characteristic Curve: the relationship between
    performance on the item and performance on the test

30
Item Characteristic Curves
[Figure: four item characteristic curves (A, B, C, D) plotting
probability of a correct response (low to high) on the y-axis
against ability (low to high) on the x-axis]
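Curves like these are commonly drawn with a logistic function of ability; a minimal sketch, assuming hypothetical discrimination (a) and difficulty (b) parameters for items A-D:

```python
import math

def icc(theta, a, b):
    # Two-parameter logistic ICC: probability of a correct response at ability theta
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters: (discrimination a, difficulty b)
items = {"A": (2.0, -1.0), "B": (1.0, 0.0), "C": (2.0, 1.0), "D": (0.5, 0.0)}

for theta in (-2, -1, 0, 1, 2):   # low to high ability
    row = "  ".join(f"{name}={icc(theta, a, b):.2f}" for name, (a, b) in items.items())
    print(f"ability {theta:+d}: {row}")
```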
31
Test Revision
  • Mold test into its final form
  • Evaluate strengths/weaknesses of items
  • Delete weaker items
  • e.g.,
  • Some items may be too easy or too hard (these
    lack reliability and validity because of their
    restricted ranges of testtaker performance)
  • Items could have high reliability but poor
    criterion validity, or could be unbiased but too
    easy
  • Also reflect on the purpose of the test (for an
    educational placement test, the developer will be
    very concerned about item bias)
  • If the test should identify the most skilled
    individuals (e.g., astronaut program candidates),
    then high item discrimination is wanted.

32
Test Revision
  • Administer the test under standardized conditions
    to a second appropriate sample of testtakers
  • Standardization: once the test is in its final
    form, this process is used to introduce objectivity
    and uniformity into test administration, scoring,
    and interpretation
  • Cross-validation: revalidating the test on another
    sample of people (see the sketch below)
  • Validity shrinkage: the decrease in item validities
    that occurs with cross-validation
33
Example
  • Affirmative Action Knowledge Test
  • 5 phases of development
  • Item-level analysis
  • Scale-level analysis
  • Convergent/discriminant validity