Title: A Review of the Literature on Examinee Choice in Test Development: Implications for Universally Designed High Stakes Tests
1 Creating Innovative Tests Applying Universal
Design to Assessment Practices
Assessment Colloquium November 30, 2007
Manju Banerjee, Ph.D. Assistant Professor in
Residence Special Education
2 Just imagine: what if there were no tests, no
assessment, no accountability as we know it?
Student perspective Teacher perspective
Policy maker perspective
3 Opportunities borne of new technologies, desires
borne of new understandings of learning -- a
new generation of assessment beckons. To realize
the vision, we must reconceive how we think about
assessment, from purposes and designs to
production and delivery. (Mislevy, Steinberg, &
Almond, 1999, p. 6)
4 Computer-based tests (CBT) are the next
frontier in high stakes assessment (Thompson,
Johnstone, & Thurlow, 2002)
5 What is the appeal of computer-based tests?
Opportunity to create tests that support
accessibility needs of diverse test takers --
Universal Design (UD)
UD is anchored in the belief that a design that
works well for examinees with disabilities
improves usability for all individuals (Center
for an Accessible Society, 2006)
- What is universal design? (Center for Universal Design, 1997)
- What makes a test universally designed?
- Seven Elements of a universally designed test (Thompson, Johnstone, & Thurlow, 2002)
6 Application of Universal Design to High Stakes
Tests
Maximum usability
Widest range of consumers
Without design adaptations
Minimize construct irrelevant features
Disabilities, ELL, Non-traditional age
Built-in from the start
Include test taking features
EXAMINEE CHOICE
Examinee choice is flexibility to access and
express in the mode or methods that best suit
the individual (Hall, 2005, p. 2) (Russell,
Goldberg, & O'Connor, 2003)
7 I. Objective of Study
- Inform product development of high stakes tests
- Based on current research on features that
support examinee choice in high stakes test
design
Test taking tools
On-screen item display tools
Access tools
- Bridgeman, Lennon, & Jackenthal, 2002
- Mazzeo & Harvey, 1988
- Pommerich, 2004
- Pommerich & Burden, 2004
- Goldberg & Pedulla, 2002
- Peak, 2005
- Lunz & Bergstrom, 1994
- Vispoel et al., 2000
- Mandinach et al., 2005
- Sireci, Li, & Scarpati, 2003
- Tindal & Fuchs, 2000
8 II. Background Information
Features of Examinee Choice
- Tindal, 1998
- CTB/McGraw-Hill, 2004
Construct neutral
Construct related
Test taking tools
Access tools
On-screen item display
9 II. Background Information (Cont.)
- UD increases accessibility for all examinees
- Accessibility is maximized when examinees have choice over features of test design
- Research on features of test design falls into three broad categories: (1) Test taking tools (2) Item display (3) Access tools
- Some features are construct neutral/construct irrelevant; others are construct related (including test accommodations)
- Allowing examinees to choose features of test design based on individual preferences needs to be explored for a wide range of features, including features that affect test construct
- UD suggests a framework, but research is still emerging on the application of UD to high stakes CBTs.
10 III. Methodology and Procedures
Research questions
What are college students' stated preferences for features and combinations of features of test design from among test taking tools, on-screen item display, and access tools for the Passage Comprehension section of the GRE?
Are stated preferences for features and combinations of features from among test-taking tools, on-screen item display, and access tools different among students with and without learning disabilities (LD), Attention Deficit Hyperactivity Disorder (ADHD), or both?
11 III. Methodology and Procedures (cont.)
Research Design, Instrumentation, Pilot Study
Exploratory study - Survey design
Participants responded to an online survey instrument:
(1) Student background questionnaire
(2) Demonstration of selected features of CBT
(3) Opportunity for practice
(4) Two choice exercises
- Rank-ordered choice exercise
- Voluntary top feature choice exercise
Two pilot studies
12 III. Methodology and Procedures (contd.)
Instrumentation: Test Features
Attribute 1: Test taking tools
- Highlighting
- Tagging
- Strike-out
- Change answer
Attribute 2: On-screen item display tools
- Font size
- Note pad
- Question reorder
Attribute 3: Access tools
- Self-voicing less 20 points
- 50% extra time less 20 points
- Self-voicing less 40 points
- 50% extra time less 40 points
- Self-voicing less 60 points
- 50% extra time less 60 points
- No selection
13 III. Methodology and Procedures (contd.)
Instrumentation
Introduction: http://www.education.uconn.edu/jamison/highstakestesting/intro.cfm
Highlighting feature: http://www.education.uconn.edu/jamison/highstakestesting/tool1video.cfm
Strike out feature: http://www.education.uconn.edu/jamison/highstakestesting/tool3video.cfm
14 III. Methodology and Procedures (cont.)
Instrumentation
Choice exercise 1: http://www.education.uconn.edu/jamison/highstakestesting/choice1.cfm
Choice exercise 2: http://www.education.uconn.edu/jamison/highstakestesting/choice2.cfm
15 III. Methodology and Procedures (contd.)
Instrumentation: Creating the 1st choice exercise
- Given 4 x 3 x 7 (features) = 84 combinations
- Select a unique group of 4 from 84 combinations
Attribute | Range of occurrence of features
Test taking tools | 176 - 196
On-screen item display tools | 230 - 250
Access tools | 75 - 99 ("No selection" feature: 185)
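As a rough illustration of the counting on this slide, the sketch below enumerates every combination of one feature per attribute and confirms the total of 84. The feature labels follow the Test Features slide (with the extra time labels shortened). The random draw of a group of 4, the fixed seed, and the helper name `draw_choice_set` are assumptions for illustration only; the slides do not describe how the groups were actually constructed.

```python
import itertools
import random

# Features per attribute, as listed on the Test Features slide.
TEST_TAKING_TOOLS = ["Highlighting", "Tagging", "Strike-out", "Change answer"]
ITEM_DISPLAY_TOOLS = ["Font size", "Note pad", "Question reorder"]
ACCESS_TOOLS = [
    "Self-voicing less 20 points", "Extra time less 20 points",
    "Self-voicing less 40 points", "Extra time less 40 points",
    "Self-voicing less 60 points", "Extra time less 60 points",
    "No selection",
]

# One feature from each attribute: 4 x 3 x 7 = 84 combinations.
combinations = list(
    itertools.product(TEST_TAKING_TOOLS, ITEM_DISPLAY_TOOLS, ACCESS_TOOLS)
)


def draw_choice_set(rng, size=4):
    """Draw `size` distinct combinations.

    Hypothetical helper: simple random sampling without replacement is an
    assumption, not the study's documented procedure.
    """
    return rng.sample(combinations, size)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
choice_set = draw_choice_set(rng)

print(len(combinations))  # 84
for combo in choice_set:
    print(combo)
```

Counting how often each feature appears across many such groups would give occurrence ranges of the kind tabulated on the slide, although the exact figures depend on the selection procedure used.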
16 III. Methodology and Procedures (contd.)
Data Analysis
Research Question 1
Research Question 2
Rank-ordered choice exercise data
Voluntary top feature choice exercise data
Rank-ordered Logit Regression
Multinomial Logit Regression
17 III. Methodology and Procedures (contd.)
Data Analysis: Rank-ordered logit regression
Dependent variable: Ranks assigned to the combinations of features
Independent variables: Features and attributes
Utility/Preference
- Rank is a proxy for preference
- A non-significant coefficient indicates baseline preference
- Non-significant does not mean zero probability of selection
Relative Utility
- One feature is dropped from each attribute for the model to be determinate
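The slides do not spell out the model equations. As a minimal sketch, assuming the standard rank-ordered (exploded) logit form, the probability of an observed ranking is a product of successive logit choices: the top-ranked combination is chosen from all alternatives, the second from those remaining, and so on. The function name and the utility values below are hypothetical.

```python
import math


def rank_ordered_logit_loglik(utilities_in_rank_order):
    """Log-probability of one observed ranking under an exploded logit.

    `utilities_in_rank_order` lists the utility of each alternative from
    top-ranked to bottom-ranked (hypothetical inputs for illustration).
    """
    loglik = 0.0
    remaining = list(utilities_in_rank_order)
    while len(remaining) > 1:
        # Log of the logit probability that the first remaining
        # alternative is picked from everything still unranked.
        max_v = max(remaining)
        log_denom = max_v + math.log(sum(math.exp(v - max_v) for v in remaining))
        loglik += remaining[0] - log_denom
        remaining.pop(0)
    return loglik


# With four equally attractive combinations, every ordering is equally
# likely, so the probability of any one ranking is 1/4! = 1/24.
equal = rank_ordered_logit_loglik([0.0, 0.0, 0.0, 0.0])
print(math.isclose(math.exp(equal), 1 / 24))  # True

# Made-up utilities for a group of four ranked combinations.
example = rank_ordered_logit_loglik([0.5, 0.2, 0.0, -0.1])
shifted = rank_ordered_logit_loglik([1.5, 1.2, 1.0, 0.9])
print(math.isclose(example, shifted))  # True
```

The second check shows that adding a constant to every utility leaves the likelihood unchanged. Only differences in utility are identified, which is why one feature in each attribute has to be dropped as the reference level.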
18 III. Methodology and Procedures (contd.)
Data Analysis: Rank-ordered logit regression
Three models were estimated:
Model 1: Three attributes as independent variables
Model 2: Three attributes as independent variables, with the no selection feature omitted
Model 3: Features within each attribute as independent variables
19 III. Methodology and Procedures (contd.)
Data Analysis: Multinomial logit regression
Dependent variable: Top pick feature within an attribute
Independent variables: Demographic characteristics
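For the voluntary top feature choice data, a multinomial logit relates each participant's top pick within an attribute to demographic characteristics. The sketch below, assuming the usual baseline-category parameterization, turns linear predictors into choice probabilities. The coefficient values, the single demographic indicator, and the function name are made up for illustration and are not estimates from the study.

```python
import math


def multinomial_logit_probs(x, coefs):
    """Choice probabilities for one participant.

    x: predictor values, with a leading 1.0 for the intercept.
    coefs: maps each feature to its coefficient list; the baseline
           feature carries all zeros.
    """
    scores = {
        feature: sum(b * xi for b, xi in zip(beta, x))
        for feature, beta in coefs.items()
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {f: math.exp(s - max_score) for f, s in scores.items()}
    total = sum(exp_scores.values())
    return {f: e / total for f, e in exp_scores.items()}


# Hypothetical coefficients: [intercept, indicator for one demographic group].
hypothetical_coefs = {
    "Font size": [0.0, 0.0],  # baseline category
    "Note pad": [0.8, -0.3],
    "Question reorder": [0.1, 0.4],
}

probs = multinomial_logit_probs([1.0, 1.0], hypothetical_coefs)
for feature, p in probs.items():
    print(feature, round(p, 3))
```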
20 IV. Results - Participant demographics
21 IV. Results - Model 1
Demographic Characteristics | Test-taking tools β(SE) | On-screen item display tools β(SE) | Access tools β(SE)
All participants | .03(.06) | .04(.09) | .04(.03)
No LD/ADHD | .03(.07) | -.03(.09) | .06(.03)
LD/ADHD | -.06(.17) | .51(.24) | -.12(.08)
Graduate | -.03(.08) | .02(.11) | .11(.03)
Undergraduate | .10(.09) | .07(.14) | -.05(.04)
No disability | .05(.07) | -.03(.11) | .07(.03)
Disability | -.02(.12) | .20(.15) | -.02(.05)
p < 0.10, p < 0.05, p < 0.01
22 IV. Results - Model 1 (contd.)
Demographic Characteristics | Test-taking tools β(SE) | On-screen item display tools β(SE) | Access tools β(SE)
High GPA (≥3.0) | .04(.07) | -.04(.09) | .04(.03)
Low GPA (<3.0) | .004(.17) | .54(.26) | .06(.08)
Prior CBT exp. | .05(.09) | -.10(.12) | .09(.03)
No CBT exp. | .10(.08) | .15(.12) | .01(.04)
Male | .02(.11) | -.01(.15) | .09(.04)
Female | .04(.07) | .05(.10) | .01(.03)
p < 0.10, p < 0.05, p < 0.01
23 IV. Results - Model 2
Demographic Characteristics | Test-taking tools β(SE) | On-screen item display tools β(SE) | SV & Time β(SE)
All participants | .03(.06) | .03(.09) | -.19(.15)
No LD/ADHD | .03(.07) | -.04(.09) | -.24(.16)
LD/ADHD | -.04(.17) | .49(.24) | .40(.45)
Graduate | -.03(.08) | .02(.11) | -.39(.19)
Undergraduate | .10(.09) | .07(.14) | .12(.24)
No disability | .05(.07) | -.04(.11) | -.25(.18)
Disability | -.02(.11) | .02(.15) | .01(.26)
p < 0.10, p < 0.05, p < 0.01
24 IV. Results - Model 2 (contd.)
Demographic Characteristics | Test-taking tools β(SE) | On-screen item display tools β(SE) | SV & Time β(SE)
High GPA (≥3.0) | .03(.07) | -.04(.09) | -.24(.16)
Low GPA (<3.0) | -.01(.17) | .50(.26) | -.05(.43)
Prior CBT exp. | -.05(.09) | -.10(.12) | -.09(.04)
No CBT exp. | .10(.08) | .15(.12) | .01(.04)
Male | .02(.11) | -.02(.15) | -.55(.28)
Female | .03(.11) | -.02(.16) | -.55(.31)
p < 0.10, p < 0.05, p < 0.01
25 IV. Results - Model 3 (contd.)
Features above baseline preference; Features at baseline preference; Features below baseline preference
Strike-out, Tagging for review, Highlighting, Change answer, Question reorder, Extra time, Font size, SV less 20 pt., SV less 40 pt.
All, LD/ADHD, No LD/ADHD, Grad, Undergrad
26 IV. Results - Model 3 (contd.)
Features above baseline preference; Features at baseline preference; Features below baseline preference
Tagging for review, Highlighting, Change answer, Question reorder, Extra time, Note pad, Strike-out, SV less 40 pt., Extra time less 40 pt.
High GPA ≥ 3.0, Low GPA < 3.0, Prior CBT experience, No prior CBT experience, Male, Female
27 IV. Results (contd.)
Test of equality of regression coefficients for
LD/ADHD status
Model 1: On-screen item display, Z = 2.107, p = 0.02
Model 2: On-screen item display, Z = 2.07, p = 0.02; SV & ET, Z = 1.34, p = 0.09
Model 3: Tagging for later review, Z = 1.76, p = 0.04; Strike-out, Z = 2.40, p = 0.01; Self-voicing less 40 points, Z = 2.30, p = 0.01
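The slide does not state which formula was used for this test. As a sketch, the standard large-sample z test for the difference between coefficients estimated on two independent subsamples, z = (b1 - b2) / sqrt(SE1^2 + SE2^2), reproduces the reported Model 1 value when applied to the LD/ADHD and No LD/ADHD coefficients for on-screen item display. The one-tailed p-value and the function name are assumptions; the one-tailed choice is simply the one consistent with the reported p.

```python
import math


def coefficient_equality_z(b1, se1, b2, se2):
    """z statistic and one-tailed p for H0: the two coefficients are equal.

    Assumes the two estimates come from independent subsamples, so their
    sampling variances add.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_tailed


# Model 1, on-screen item display: LD/ADHD .51(.24) vs. No LD/ADHD -.03(.09)
z, p = coefficient_equality_z(0.51, 0.24, -0.03, 0.09)
print(round(z, 3), round(p, 2))  # 2.107 0.02
```

The same function applied to the Model 2 coefficients, .49(.24) vs. -.04(.09) for on-screen item display and .40(.45) vs. -.24(.16) for the access tools column, gives values that round to the reported 2.07 and 1.34.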
28 IV. Results - Voluntary Top Feature Choice Exercise
29 V. Summary of Results and Discussion
College students' preferences for combinations of features of test design varied by demographic strata
At the attribute level (rank-ordered exercise):
- Students with LD and/or ADHD AND students with low GPA indicated above baseline level of preference for on-screen item display relative to test-taking tools and access tools
- Students without LD/ADHD, graduates, students with no disabilities, students with prior CBT experience, and males preferred access tools when no selection was included, BUT indicated below baseline preference when no selection was removed (except for those with no disabilities)
30 V. Summary of Results and Discussion (contd.)
At the features level (rank-ordered exercise): Strike-out, Tagging (LD/ADHD, with disabilities, undergraduates, GPA < 3.0)
At the features level (voluntary top choice exercise) among Test-taking tools: Highlight (all participants, No LD/ADHD, High GPA, Graduates, Undergraduates, No disabilities, Male, Female)
31 V. Summary of Results and Discussion (contd.)
At the features level (voluntary top choice exercise) among On-screen item display: Note pad (across all demographic strata)
At the features level (voluntary top choice exercise) among Access tools: Extra time (across all demographic strata)
32 V. Implications for Future Research
- Further investigation of examinee choice in high stakes computer-based test development
- Explore other features of test design
- Investigate concept of examinee choice with different college populations
- Expand UD in assessment to include construct irrelevant and construct related features
- Provide examinee choice for features within high stakes test preparation material
33 V. Limitations of Study
- Sample selection did not follow formal procedures for stratified random sampling (Levy & Lemeshow, 1991)
- Participants were all from a competitive Research One university (external validity)
- Focus on hypothetical choices rather than real time choices (stated vs. revealed preference)
- No way to determine if all participants clearly understood the trade-off exercise (notion of students with LD/ADHD and penalty)
34 END OF PRESENTATION