Title: Test Development
1. Test Development
2. Test Development Process
- Test Conceptualization
- Test Construction
- Test Tryout
- Item Analysis
- Test Revision
3. Test Conceptualization
- Role of self-talk
- Preliminary questions
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for the test?
- Potential harm/benefits?
- What content will be covered?
4. Test Conceptualization (cont'd)
- How will meaning be attributed to scores on this test?
- Norm-referenced: compare an individual's score to the scores of others who have already taken the test
- Criterion-referenced: compare the score to that of a criterion group (known to have the trait)
5. Test Conceptualization (cont'd)
- Pilot work
- Preliminary research surrounding the creation of the prototype of the test
- Aim: determine how best to measure the targeted construct
6. Test Construction
- Three steps
- Scaling
- Writing items
- Scoring items
7. Test Construction (cont'd)
- Scaling: setting rules for assigning numbers in measurement
- Deciding on the type of scale
- Types of scales
- Age-based
- Grade-based
- Stanine transformation of raw scores
- Uni- or multi-dimensional
- Method of paired comparisons
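The stanine transformation listed above maps raw scores onto a nine-point standard scale (mean 5, SD 2). A minimal sketch, assuming raw scores have already been converted to percentile ranks; the cumulative cut points are the conventional stanine bands (4, 7, 12, 17, 20, 17, 12, 7, 4 percent):

```python
# Conventional cumulative percentage cut points for stanines 1-8
# (stanine 9 takes everything above the last cut).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile: float) -> int:
    """Convert a percentile rank (0-100) to a stanine (1-9)."""
    for s, cut in enumerate(STANINE_CUTS, start=1):
        if percentile <= cut:
            return s
    return 9

print(stanine(50))  # a median score falls in stanine 5
```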
8. Method of Paired Comparisons
- Select the behavior you think would be more justified:
- a. cheating on taxes if one has a chance
- b. accepting a bribe in the course of one's duties
- Which picture do you prefer?
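Paired-comparison judgments like the ones above can be tallied into scale values. A minimal sketch that scores each stimulus by the proportion of comparisons it wins (Thurstone's full scaling method would further convert these proportions to z-scores):

```python
from collections import Counter

def paired_comparison_scale(judgments):
    """Score stimuli from paired-comparison data.

    judgments: iterable of (option_a, option_b, chosen) tuples,
    one per judge per pair. Returns each stimulus's win proportion.
    """
    wins, appearances = Counter(), Counter()
    for a, b, chosen in judgments:
        appearances[a] += 1
        appearances[b] += 1
        wins[chosen] += 1
    return {s: wins[s] / appearances[s] for s in appearances}

# Three judges compare behaviors a and b from the slide:
data = [("a", "b", "a"), ("a", "b", "a"), ("a", "b", "b")]
print(paired_comparison_scale(data))  # a wins 2 of 3 comparisons
```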
9. Test Construction (cont'd)
- Writing items
- Consider content, item formats, and number of items
- Item pool: the group from which items will be drawn for, or discarded from, the final version of the test
10. Test Construction (cont'd)
- Writing items (cont'd)
- Item format
- Selected-response
- Constructed-response
11. Constructed Response
- The standard deviation is generally considered the most useful measure of _________.
- Answer: variability
12. Construction: Writing Items
- Writing items (cont'd)
- Selected response formats
- Dichotomous
- Polytomous
- Likert
- Categorical
- Checklists
- Matching
- Subjective response format
13. Dichotomous (True/False)
- Variables such as the form, plan, structure, arrangement, and layout of individual test items are collectively referred to as item format.
- True / False
14. Selected-Response: Multiple Choice
- Item A
- A psychological test, an interview, and a case study are
- Psychological assessment tools
- Standardized behavioral samples
- Reliable assessment instruments
- Theory-linked measures
- (Item anatomy: stem, correct alternative, distractors)
15. Selected-Response: Multiple Choice (cont'd)
- Item B
- A good multiple-choice item in an achievement test
- Has one correct alternative
- Has grammatically parallel alternatives
- Has alternatives of similar length
- Has alternatives that fit grammatically with the stem
- Includes as much of the item as possible in the stem to avoid unnecessary repetition
- Avoids ridiculous distractors
- Is not excessively long
- All of the above
- None of the above
16. Likert Scales
- How effective was the textbook in facilitating your learning in this course?
- 1 = Not at all effective
- 2 = A little effective
- 3 = Average effectiveness
- 4 = More effective than usual
- 5 = Extremely effective
17. Categorical
- What level of education have you completed?
- Kindergarten through 5th grade
- Middle school education (6th-8th grade)
- Some high school (9th-11th grade)
- High school diploma
- Associate's degree
- Master's degree
- Professional degree (Ph.D., M.D., J.D., D.O.)
18. Checklists
- Which symptoms have you experienced in the past month?
- ___ Feeling down ___ Anxiety
- ___ Irritability ___ Restlessness
- ___ Sadness ___ Appetite changes
- ___ Crying ___ Less interest in sex
19. Matching
- ___ A. Samuel L. Jackson    1. Mission Impossible
- ___ B. Brad Pitt            2. Dumb and Dumber
- ___ C. Jim Carrey           3. Shaft
- ___ D. Tom Cruise           4. Fight Club
20. Subjective Response Formats
- Fill-in-the-blank (e.g., regression is _________________)
- Short answer
- Essay
- The longer and more complex the answer, the more
difficult it is to score reliably.
21. Summary for Writing Items
- 1. Use a theory or model to guide your test/survey when possible
- 2. Try not to confuse the participant
- 3. Use simple, clear language
- 4. PROOFREAD
- 5. Anticipate confusion
- 6. Consider boredom and fatigue
- 7. Consider short-term memory limitations
- 8. Remember: item writing should proceed with a plan in mind; we should have a clearly defined notion of the construct we wish to measure!
22. Test Construction (cont'd)
- Scoring items
- Class scoring: responses earn credit toward placement in a particular class
- Category scoring: responses earn credit toward placement in a particular category
- Ipsative scoring: compares a testtaker's score on one scale within the test with that testtaker's score on another scale of the same test
23. Ipsative Scoring
- Edwards Personal Preference Schedule (EPPS): forced choice between two equally socially desirable responses yields information on the strength of the testtaker's various needs relative to the strength of that testtaker's other needs (not relative to the needs of the general population), so only intra-individual (within-person) conclusions can be drawn, NOT inter-individual (between-person) ones
- e.g.,
- I feel depressed when I fail at something.
- I feel nervous when giving a talk before a group.
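Ipsative interpretation can be sketched as ranking scales against each other within a single testtaker. The scale names below are illustrative EPPS-style needs, not actual EPPS output:

```python
def ipsative_ranking(scale_scores: dict) -> list:
    """Rank one testtaker's scales against each other (ipsative):
    the ordering is meaningful only within this person, so it
    supports intra-individual, not inter-individual, conclusions."""
    return sorted(scale_scores, key=scale_scores.get, reverse=True)

# Illustrative need scores for a single testtaker:
needs = {"achievement": 18, "affiliation": 12, "autonomy": 15}
print(ipsative_ranking(needs))  # ['achievement', 'autonomy', 'affiliation']
```

Comparing two testtakers' rankings element by element would be an inter-individual claim, which ipsative scores do not support.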
24. Test Tryout
- Try the test out on people similar to those for whom the test was developed
- 5-10 people per test item
- e.g., if the test is meant to aid in the selection of corporate executives with management potential, try it out on corporate employees at the targeted level
- The more people in the tryout, the weaker the role of chance in the data analysis
25. Item Analysis
- Item difficulty: how many people get the item right; the more who get it right, the easier the item
- Optimum difficulty level: first, find half of the difference between 100% success and chance performance; second, add this value to the probability of answering correctly by chance alone (the midway point)
- With 100% success (1.0) and chance at .2 (for 5 alternatives):
- (1.0 - .2) / 2 = .40
- .20 (chance) + .40 = .60 (optimum difficulty level)
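The arithmetic above can be sketched directly, taking chance performance as one over the number of alternatives:

```python
def item_difficulty(responses):
    """Item difficulty: the proportion of testtakers answering
    correctly (1 = correct, 0 = incorrect). Higher = easier item."""
    return sum(responses) / len(responses)

def optimum_difficulty(n_alternatives: int) -> float:
    """Midway point between chance performance and 100% success."""
    chance = 1.0 / n_alternatives          # e.g., .20 for 5 alternatives
    return chance + (1.0 - chance) / 2     # .20 + .40 = .60

print(round(optimum_difficulty(5), 2))  # 0.6
```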
26. Item Analysis
- Item discriminability: determines whether people who have done well on a particular item have also done well on the whole test
- Extreme group method: compares those who do well on the test with those who haven't
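The extreme group method can be sketched as follows: split testtakers into upper and lower groups by total test score (the 27% cut used here is a common convention, not stated in the slides) and subtract the groups' proportions correct on the item:

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Extreme group method: proportion of the upper group answering
    the item correctly minus the proportion of the lower group.
    Values near +1 discriminate well; near 0 (or negative), poorly."""
    order = sorted(range(len(total_scores)), key=total_scores.__getitem__)
    n = max(1, round(fraction * len(total_scores)))
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

# Ten testtakers; only the high scorers got this item right:
got_it = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(discrimination_index(got_it, totals))  # 1.0
```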
27. Item Analysis
- Item reliability
- Item-reliability index: a higher index means a more reliable item (i.e., a measure of internal consistency)
- Factor analysis: can show whether items load on the factors you want them to, or whether several unintended factors are emerging; items can then be eliminated based on what you want the test to do
28. Item Analysis
- Item validity
- Item-validity index: indicates the degree to which a test measures what it says it measures
- Higher is better
- Uses the item-score standard deviation and the correlation between the item score and the criterion score
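Both the item-reliability index (slide 27) and the item-validity index share the same form: the item-score standard deviation times a correlation, with the item correlated against the total test score for reliability and against an external criterion score for validity. A minimal sketch:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_index(item_scores, other_scores):
    """Item SD times the item's correlation with other_scores.
    Pass total test scores for the item-reliability index, or
    criterion scores for the item-validity index."""
    m = sum(item_scores) / len(item_scores)
    sd = sqrt(sum((a - m) ** 2 for a in item_scores) / len(item_scores))
    return sd * pearson_r(item_scores, other_scores)
```

For a dichotomous item scored 0/1 with difficulty p, the item SD reduces to sqrt(p * (1 - p)).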
29. Item Analysis
- Item characteristic curve: the relationship between performance on the item and performance on the test
30Item Characteristic Curves
A
B
C
D
High Prob of correct response Low
Low High
Ability
Low High
Ability
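The slides show the curves without a formula; a common way to model an item characteristic curve (an assumption here, borrowed from item response theory) is the two-parameter logistic, where a controls discrimination (steepness) and b controls difficulty (location):

```python
from math import exp

def icc(ability: float, a: float = 1.0, b: float = 0.0) -> float:
    """Two-parameter logistic item characteristic curve:
    probability of a correct response as a function of ability,
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + exp(-a * (ability - b)))

# At ability equal to the item's difficulty, P(correct) = .5;
# a higher-a curve rises more steeply around b.
print(icc(0.0))          # 0.5
print(icc(2.0, a=2.0))   # high ability: close to 1
```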
31. Test Revision
- Mold the test into its final form
- Evaluate strengths/weaknesses of items
- Delete weaker items
- e.g.,
- Some items may be too easy or too hard (these lack reliability and validity because of their restricted ranges of testtaker performance)
- Items could have high reliability but poor criterion validity, or could be unbiased but too easy
- Also reflect on the purpose of the test (for an educational placement test, the developer will be very concerned about item bias)
- If the test should identify the most skilled individuals (e.g., astronaut program candidates), then high item discrimination is wanted
32. Test Revision (cont'd)
- Administer the test under standardized conditions to a second appropriate sample of testtakers
- Standardization: once the test is in its final form, this process introduces objectivity and uniformity into test administration, scoring, and interpretation
- Cross-validation: revalidating the test on another sample of people
- Validity shrinkage: the decrease in item validities that occurs upon cross-validation
33. Example
- Affirmative Action Knowledge Test
- 5 phases of development
- Item-level analysis
- Scale-level analysis
- Convergent/discriminant validity