Title: Computerized Adaptive Testing
1Reducing the duration and cost of assessment with
the GAIN Computer Adaptive Testing
2Evidence-Based Practice
- Requires accurate diagnosis, treatment placement,
and outcomes monitoring
- Assessment over a wide range of domains
- The cost of evidence-based assessment is
  - Time
  - Respondent burden
  - Increased staff resources (including training)
3Improving Efficiency
- The use of screeners and short-form instruments
has significantly improved the efficiency of the
assessment process
- Can help determine whether a full assessment is
warranted
- But not a substitute for a full assessment
  - Lack of precision
  - Floor and ceiling effects
  - Limited content validity
4Computerized Adaptive Testing
- Selects items from a large bank of items based on
the responses made to previous items.
- Continues to select and administer items until
sufficient measurement precision is obtained.
- Combines the precision and comprehensiveness of a
full assessment with the efficiency of a screener.
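The select-administer-stop cycle described above can be sketched in a few lines. This is a minimal illustration, assuming a dichotomous Rasch model, a maximum-likelihood score, and a fixed standard-error target; the names `run_cat`, `estimate_theta`, `se_target`, and the `answer` callback are hypothetical and are not taken from the GAIN CAT software.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_theta(responses, difficulties, theta=0.0, iterations=20):
    """Newton-Raphson maximum-likelihood score; returns (theta, standard error)."""
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        step = sum(x - p for x, p in zip(responses, probs)) / info
        theta += max(-1.0, min(1.0, step))   # damp large steps
        theta = max(-6.0, min(6.0, theta))   # keep all-0 / all-1 patterns finite
    info = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b))
               for b in difficulties)
    return theta, 1.0 / math.sqrt(info)

def run_cat(bank, answer, se_target=0.4):
    """bank: dict of item id -> difficulty; answer: callable returning 0 or 1."""
    remaining = dict(bank)
    theta, se = 0.0, float("inf")
    asked, responses, difficulties = [], [], []
    while remaining and se > se_target:
        # Under the Rasch model the most informative item is the one whose
        # difficulty is closest to the current score.
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        difficulties.append(remaining.pop(item))
        asked.append(item)
        responses.append(answer(item))
        theta, se = estimate_theta(responses, difficulties, theta)
    return theta, se, asked
```

In this sketch the loop ends either when the standard error reaches the target or when the bank is exhausted, so a looser `se_target` translates directly into fewer items administered.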
5CAT Process
[Figure: Typical Pattern of Responses. Labels: Increased
Difficulty, Middle Difficulty, Decreased Difficulty;
Correct and Incorrect responses; +/- 1 Std. Error.]
- Score is calculated and the next best item is
selected based on item difficulty
6CAT in Clinical Assessment
7CAT in Clinical Assessment Issues
- Triage of individuals to support clinical
decision making
- Measurement of multiple clinical dimensions and
subdimensions
- Persons with atypical presentation of symptoms
- Generalizability of assessment to various groups
8Clinical Decision Making
- How severe are the symptoms?
- What type of treatment is most appropriate?
- Can CAT be used to answer these questions more
efficiently?
9Strategy
- Use CAT to place persons into low, moderate and
high levels of substance abuse and dependency.
- Starting Rules
  - Using screener measures to set the initial
    measure and select the first item
- Variable Stop Rules
  - Tight precision around cut points
  - Less precision away from cut points
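One way to express a variable stop rule is to stop as soon as the confidence interval around the current measure no longer contains a cut point, and otherwise to keep testing until a tight standard error is reached. The sketch below is an assumption about how such a rule could look, not the rule used in the study; `should_stop`, `classify`, the 1.96 multiplier, the `tight_se` value, and the cut points are all hypothetical.

```python
from bisect import bisect_right

def should_stop(theta, se, cut_points, z=1.96, tight_se=0.25):
    """Return True when testing can end for this person."""
    if se <= tight_se:
        # Near a cut point: stop only once the tight precision target is met.
        return True
    lower, upper = theta - z * se, theta + z * se
    # Away from every cut point: the classification is already clear,
    # so a larger standard error is acceptable.
    return not any(lower <= cut <= upper for cut in cut_points)

def classify(theta, cut_points, labels=("low", "moderate", "high")):
    """Map a measure to a severity group using sorted cut points."""
    return labels[bisect_right(sorted(cut_points), theta)]
```

For example, with hypothetical cut points at -1 and 1, a person measured at 3.0 with a standard error of 0.5 can stop immediately, while a person at 0.9 with the same standard error needs more items.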
10CAT Standard Error
11Results
- CAT to full-measure correlations ranged from .87
to .99
- Agreement between CAT and full-measure
classification of persons into treatment groups
(kappa coefficients) ranged from .66 to .71.
- Screener starting rule improved CAT efficiency by
7 percent
- Variable stop rules improved efficiency by 15-38
percent
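For readers unfamiliar with the kappa coefficient reported above, it is the observed agreement between two classifications corrected for the agreement expected by chance. A minimal sketch follows; the function name `cohens_kappa` and the toy labels in the usage note are illustrative only and are not data from the study.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two equal-length label lists."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected if the two classifications were independent.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

With made-up labels, `cohens_kappa(["low", "low", "high", "high"], ["low", "high", "high", "high"])` gives 0.5: three of four cases agree (0.75), and half of that agreement would be expected by chance.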
12Measuring Multiple Dimensions
13Assessment on Multiple Dimensions
- Instruments often measure multiple domains
- In CAT, treating a multi-domain measure as
measuring one domain is problematic
- Some subdimensions may not be adequately measured
14Strategy Content Balancing
- Set an item quota for each subscale
  - Maximum number of subscale items to administer
    during the CAT
- An item is selected if
  - Its subscale quota has not been met
  - It provides maximum information
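The quota rule above can be sketched as a filter applied before the usual maximum-information choice. This is an illustration under the assumption of a Rasch model; `select_item`, the bank layout, and the quota values are hypothetical.

```python
import math

def rasch_information(theta, difficulty):
    """Fisher information of a Rasch item at the current measure."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def select_item(theta, remaining, administered, quotas):
    """Pick the most informative item whose subscale quota is not yet met.

    remaining:    dict of item id -> (subscale, difficulty)
    administered: dict of subscale -> number of items already given
    quotas:       dict of subscale -> maximum number of items to give
    """
    eligible = {
        item: difficulty
        for item, (subscale, difficulty) in remaining.items()
        if administered.get(subscale, 0) < quotas.get(subscale, 0)
    }
    if not eligible:
        return None  # every quota is met, so the CAT can stop
    return max(eligible, key=lambda item: rasch_information(theta, eligible[item]))
```

Because the quota check runs first, an item from a subscale that has already reached its quota is skipped even when it would be the most informative item in the bank.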
15Content Balancing Procedures
Method     Screener   Content Balanced
None       No         No
Screener   Yes        No
Mixed      Yes        Yes
Full       No         Yes
16Percentage of Items Administered by Subscale
IMDS Scale           N Items   None   Screener   Mixed   Full
Depression           1         99     100        100     100
Depression           3         79     77         100     100
Homicidal/Suicidal   1         21     100        100     100
Homicidal/Suicidal   3         8      8          100     100
Anxiety              1         100    100        100     100
Anxiety              3         100    100        100     100
Trauma               1         100    100        100     100
Trauma               3         100    100        100     100
17Cont. Balancing CAT to Full IMDS Correlations
IMDS Scales          None   Screener   Mixed   Full
IMDS                 0.98   0.98       0.98    0.97
Depression           0.96   0.94       0.96    0.96
Homicidal/Suicidal   0.60   0.83       0.96    0.95
Anxiety              0.96   0.95       0.96    0.96
Trauma               0.97   0.97       0.97    0.97
Average r            0.89   0.93       0.97    0.96
18Identifying Persons with Atypical Presentation of
Symptoms
19Overview
- Implications: Clients sometimes endorse severe
clinical symptoms that are not reflected by
overall scores on standard assessments.
- Statistics that can detect atypical presentation
of symptoms have important clinical implications.
- Strategy: Identify fit statistics sensitive to
atypical presentation in a CAT context
20Rasch Fit Statistics
- Fit statistics are used to test particular
hypotheses.
- Atypicalness: Used to detect unexpected outlying,
off-target responses. Outlier sensitive.
- Example: A person with a high level on the
measured trait misses an easy item.
- Randomness: Used to detect unexpected inlying,
targeted responses.
- Both infit and outfit are chi-square statistics.
An infit or outfit value of 1.0 indicates perfect
fit to the Rasch model.
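As a rough illustration of how such statistics behave, the sketch below computes the conventional Rasch mean squares for one person: outfit as the unweighted mean of squared standardized residuals (outlier sensitive) and infit as the information-weighted version. It assumes dichotomous items and a person measure that is already known; the function name and the example values are hypothetical.

```python
import math

def fit_mean_squares(responses, difficulties, theta):
    """Return (infit, outfit) mean squares for one response pattern."""
    probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
    variances = [p * (1.0 - p) for p in probs]
    squared_residuals = [(x - p) ** 2 for x, p in zip(responses, probs)]
    # Outfit: every item counts equally, so one surprising response to a
    # far-off-target item can dominate the statistic.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: residuals are weighted by item information, which emphasizes
    # items targeted near the person's measure.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit
```

For a person at a measure of 2.0 who misses a very easy item (difficulty -3.0) but otherwise responds plausibly to items near 2.0, the outfit value is far above 1.0 while the infit value rises much less, which matches the example of a high-trait person missing an easy item.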
21Problems with Fit
Responses by Severity (Low to High)   Randomness   Atypicalness
111 11111100000 0000                  0.3          0.5
111 10101100010 0000                  0.6          1.0
111 11101010000 0000                  1.0          1.0
111 00001110000 0000                  0.9          1.3
011 11111110000 0000                  3.8          1.0
111 11111100000 0001                  3.8          1.0
101 01010101010 1010                  4.0          2.3
000 00000000011 1111                  12.6         4.3
22Clinical Implications of Misfit
- Our analyses indicate that there are subgroups
who endorse severe symptoms without endorsement
of milder symptoms.
- Examples
  - Atypical suicide
  - Substance use withdrawal without dependence
23Atypicalness by Number of Items
                  Atypicalness Categories
Number of Items   Uber Typical   Typical   Atypical
16                30.2           48.1      21.7
12                34.3           51.1      14.6
8                 38.4           53.2      8.4
4                 58.2           40.0      1.8
24Content Balancing and Atypicalness
Atypicalness Category   None   Screener   Mixed   Full   Full IMDS
Proto Typical           26.7   34.6       48.3    50.5   49.2
Typical                 69.0   58.7       40.8    38.9   38.4
Atypical                4.3    6.5        10.9    10.6   12.4
Kappa                   .27    .32        .48     .50    --
25Future Research
- Identify alternative fit statistics that are more
sensitive to atypical presentation of symptoms
- Determine when it is likely that someone may
present with atypical symptoms, and if so, select
items to confirm atypicalness.
26Generalizability of CAT to Various Groups
27Overview
- Persons at the same severity level may differ in
their endorsement of specific items.
- This is called differential item functioning
(DIF)
- On the GAIN, DIF has been detected by
  - Age (adolescent vs. adult)
  - Gender
  - Ethnicity/Race
  - Drug of choice
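To make the idea of DIF concrete, the sketch below compares endorsement of a single item between two groups after matching persons on severity. It uses the Mantel-Haenszel common odds ratio, a widely used screening statistic; this is a swapped-in illustration, since the slides do not say which DIF method was applied to the GAIN. The function name and record layout are hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """records: iterable of (group, severity_stratum, response), where group
    is "reference" or "focal" and response is 0 or 1 for the studied item."""
    # One 2x2 table per stratum: [ref yes, ref no, focal yes, focal no]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, response in records:
        column = (0 if group == "reference" else 2) + (0 if response else 1)
        tables[stratum][column] += 1
    numerator = denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in tables.values():
        total = ref_yes + ref_no + focal_yes + focal_no
        numerator += ref_yes * focal_no / total
        denominator += ref_no * focal_yes / total
    # A value near 1.0 suggests no DIF; values far from 1.0 suggest the item
    # is easier to endorse for one group at the same severity level.
    return numerator / denominator if denominator else float("nan")
```

In practice the strata would be score bands on the rest of the scale, so that the comparison is between persons at the same severity level.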
28DIF By GAIN Scale
Scale                      Total   Age   Gender   Race   Prim. Drug
Internal Mental Distress   43      13    5        10     26
Crime Violence             31      11    14       22     27
Behavioral Complexity      33      12    12       17     22
Substance Problems         16      8     5        9      16
29DIF and CAT
- The presence of DIF can limit our ability to
generalize measurement findings across different
groups.
- Controlling for DIF becomes complicated as the
number of DIF items and groups/factors increases.
- Currently exploring a number of methods for
controlling DIF in CAT.
30Potential of CAT in Clinical Practice
- Reduce respondent burden
- Reduce staff resources
- Reduce data fragmentation
- Streamline complex assessment procedures
- Assist in clinical decision making
- Identify persons with atypical profiles
- Improve measurement generalizability
31Future Research
- How do we put it all together?
- Much of the research in the area of CAT has used
computer simulation. There is a need to test
working CAT systems in clinical practice.
32Contact Information
- A copy of this presentation will be at
www.chestnut.org/li/posters
- For more information, please contact Barth Riley
at bbriley_at_chestnut.org