Using State Tests to Measure Student Achievement in Large-Scale Randomized Experiments

1
Using State Tests to Measure Student Achievement in Large-Scale Randomized Experiments
An Empirical Assessment Based on Four Recent Evaluations
Marie-Andrée Somers (Presenter), Pei Zhu, Edmond Wong
MDRC

IES Research Conference
June 28th, 2010

2
Two key concerns with using state tests in an evaluation

They may not be suitable for the evaluation
Validity concerns: They may not be aligned with outcomes of interest (do not provide a valid inference about program impacts)
Reliability concerns: They may be too difficult for low-performing students (unreliable)
Variation in scale/content of state tests also complicates the task of combining impact findings across states and grades

3
About This Study

Funded by the Institute of Education Sciences (IES)
Purpose is to bring data to bear on several topics covered in the May et al. discussion paper
Are state tests suitable for evaluation purposes?
As a measure of the outcome(s) of interest?
As a measure of student achievement at baseline?
How should impacts on state tests be pooled?
Are impact findings sensitive to methods of rescaling and aggregating test scores across states and/or grades?

4
Overview of Analytical Approach

We identified 4 large-scale randomized experiments where achievement was measured using both (i) state tests AND (ii) a study test
The study test provides a benchmark for gauging the suitability of state tests
Two types of analyses
Impact analyses: We compared estimated impacts on state tests and on the benchmark study test
Descriptive analyses: We also examined published information on the characteristics/content of tests

5
Data and Samples
                    | Study A                     | Study B                  | Study C                  | Study D
Targeted Outcome    | General Reading Achievement | General Math Achievement | Specific Reading Outcome | Specific Math Outcome
Level               | Elementary                  | Elementary               | High School              | Middle School
Sample for Analysis | 1,032 (9 states)            | 944 (7 states)           | 1,065 (4 states)         | 4,387 (9 states)

Studies represent diversity with respect to grade levels and outcomes
Analysis sample includes students with a state test score and a study test score

6
Approach for Estimating Impacts

Impact on state tests
Rescaling: Scores are z-scored by state and grade using the sample mean and standard deviation
Pooling approach: Impacts by state and grade are aggregated using precision weighting
Impact on the study test
Rescaled/pooled using the same approach for comparability
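
The rescaling and pooling steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the record layout are hypothetical, the standard deviation is assumed to be the sample (n-1) version computed over all students in a state-by-grade cell, and "precision weighting" is assumed to mean inverse-variance weights, which the slide does not spell out.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_state_and_grade(records):
    """Add a 'z' field: each score standardized within its state-by-grade cell.

    records: list of dicts with keys 'state', 'grade', 'score' (hypothetical layout).
    Each cell needs at least two students with scores that are not all identical.
    """
    scores_by_cell = defaultdict(list)
    for rec in records:
        scores_by_cell[(rec["state"], rec["grade"])].append(rec["score"])

    # Sample mean and sample (n-1) standard deviation per cell (an assumption).
    cell_stats = {
        cell: (mean(vals), stdev(vals)) for cell, vals in scores_by_cell.items()
    }

    rescaled = []
    for rec in records:
        cell_mean, cell_sd = cell_stats[(rec["state"], rec["grade"])]
        rescaled.append({**rec, "z": (rec["score"] - cell_mean) / cell_sd})
    return rescaled


def precision_weighted_average(impacts):
    """Pool (estimate, standard_error) pairs using inverse-variance weights.

    Returns the pooled estimate and its standard error, treating the
    state-by-grade estimates as independent.
    """
    weights = [1.0 / se ** 2 for _, se in impacts]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, impacts)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se
```

Standardizing within each state-by-grade cell puts tests with different scales on a common metric before impacts are estimated, and the weighting gives more influence to cells whose impacts are estimated more precisely.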

7
Criteria for Assessing Suitability

Two dimensions of suitability
Validity
Whether the content of state tests is aligned with the outcomes of interest in the evaluation
Reliability
Whether state tests provide a reliable measure of achievement for the target population (in this case, low-performing students)
A key concern: State tests may have low reliability and may not yield valid inferences about program effectiveness

8
Criteria for Assessing Suitability

Implications for the impact findings
Poor Validity
Could fail to detect impacts on the outcome of interest (invalid inference about program effectiveness)
Affects the magnitude of the estimated impact on state tests
Low Reliability
Student achievement is estimated with greater error
Affects the standard error of the estimated impact on state tests
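
The link between low reliability and a larger standard error can be illustrated under classical test theory, where observed score variance equals true score variance divided by reliability. The sketch below is an assumption-laden illustration rather than the study's model: it works in raw score units, uses a simple difference in means with no covariates or clustering, and the function name and inputs are hypothetical.

```python
import math


def impact_standard_error(true_sd, reliability, n_treatment, n_control):
    """Standard error of a treatment-control difference in mean test scores.

    Assumes classical test theory: observed variance = true variance / reliability,
    with equal variance in both groups and independent students.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    observed_variance = true_sd ** 2 / reliability
    return math.sqrt(observed_variance * (1 / n_treatment + 1 / n_control))
```

Under these assumptions, lowering reliability from 1.0 to 0.8 inflates the standard error by a factor of sqrt(1 / 0.8), or about 1.12, for any sample size.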

9
Criteria for Assessing Suitability

Reliability: Compare the standard error of the estimated impact on state tests vs. the study test
Smaller standard error is better (more precision)
Validity: Compare the magnitude of the impact estimates, in light of estimation error
Compare the statistical significance of the impact findings (i.e., conclusions about program effectiveness based on p-value)
If both estimates are statistically significant, then also compare their magnitudes
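
These comparison rules can be written out as a short Python sketch. It is illustrative only: the function names, the 0.05 significance threshold, and the normal approximation used for the p-value are assumptions, and the study itself may have relied on t-tests from its impact models.

```python
import math


def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for H0: impact = 0, using a normal approximation."""
    z = abs(estimate / standard_error)
    return math.erfc(z / math.sqrt(2))


def compare_impacts(state, study, alpha=0.05):
    """Compare impact estimates on the state tests and on the study test.

    state, study: (estimate, standard_error) tuples in the same z-score units.
    """
    state_est, state_se = state
    study_est, study_se = study
    state_p = two_sided_p_value(state_est, state_se)
    study_p = two_sided_p_value(study_est, study_se)
    return {
        # Reliability check: a ratio above 1 means the state tests are less precise.
        "se_ratio": state_se / study_se,
        "state_p": state_p,
        "study_p": study_p,
        # Validity check, step 1: do both tests lead to the same conclusion?
        "same_conclusion": (state_p < alpha) == (study_p < alpha),
        # Validity check, step 2: magnitudes are compared only if both are significant.
        "compare_magnitudes": state_p < alpha and study_p < alpha,
    }
```

For example, `compare_impacts((0.10, 0.05), (0.12, 0.04))` uses made-up estimates and standard errors; the returned dictionary gives the state-to-study standard error ratio and flags whether a magnitude comparison applies.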

10
Criteria for Assessing Validity

The extent to which the magnitudes of the impact estimates are expected to differ depends on the outcome that state tests are intended to measure
Two types of intervention
Targeted outcome is general achievement (Studies A and B)
The outcome of interest is general achievement in math or reading
Both state tests and the study test measure the targeted outcome (general achievement)
If state tests are valid, then the impact on the study test and state tests should be similar

11
Criteria for Assessing Validity

Two types of intervention (continued)
Targeted outcome is a specific skill (Studies C and D)
There are two outcomes of interest: the targeted skill (short-term) and general achievement (longer-term)
Study test is used to measure the short-term outcome (specific skill), while state tests are used to measure the longer-term outcome (general achievement)
If state tests are valid, then the impact on state tests should be smaller than the impact on the study test

12
Benchmark Impact on the Study Test
[Chart not recoverable from the transcript]

13
P-Value and Magnitude (Validity)
Targeted Outcome is General Achievement
[Chart] p-values shown: p = 0.119, p = 0.055

14
P-Value and Magnitude (Validity)
Targeted Outcome is General Achievement
[Chart] p-values shown: p = 0.119, p = 0.189, p = 0.055, p = 0.229

15
P-Value and Magnitude (Validity)
Targeted Outcome is a Specific Skill
[Chart] p-values shown: p = 0.002, p = 0.578

16
P-Value and Magnitude (Validity)
Targeted Outcome is a Specific Skill
[Chart] p-values shown: p = 0.002, p = 0.007, p = 0.578

17
P-Value and Magnitude (Validity)
Targeted Outcome is a Specific Skill
[Chart] p-values shown: p = 0.002, p = 0.007, p = 0.578, p = 0.219

18
Standard Errors (Reliability)
[Chart not recoverable from the transcript]

19
Standard Errors (Reliability)
[Chart] State-Study Ratio values shown: 1.20, 1.07, 1.04, 1.03

20
Conclusion

Findings suggest that state tests can be used as a complement to a study-administered test
State tests are suitable (valid and reliable) in 3 of 4 studies
Whether state tests can be used as a substitute for a study test is an open question
Limited availability in some grades and subjects
Available for all states/grades in only 1 of 4 studies
May not be able to use them to measure a specific targeted skill
Possibly less reliable
Findings from descriptive analysis lead to the same conclusions as the impact analysis

21
Questions?

Marie-Andrée Somers
marie-andree.somers@mdrc.org
Pei Zhu
pei.zhu@mdrc.org
Edmond Wong
edmond.wong@mdrc.org
