Title: The Determinants of Student Achievement: Different Estimates for Different Measures
Tim Sass, Department of Economics, Florida State University
CALDER Conference, October 4, 2007
Different Measures
- Types of Tests
  - Criterion-Referenced Tests
    - Test whether the student has learned the elements in state-established instructional standards
    - State specific
  - Nationally Normed Tests
    - Test whether the student has learned a set of concepts and skills that may or may not correspond to any particular state's curriculum benchmarks
    - Allow interstate comparisons
Different Measures
- Scaling
  - Non-Vertically Aligned Scale Scores
    - Scale potentially different at each grade level
    - Can't compare learning gains
    - Criterion-referenced tests are typically not vertically aligned
  - Vertical or Developmental Scales
    - A single equal-interval scale that spans all grade levels
    - A one-unit change means the same thing at all levels, within and between grades
    - Some norm-referenced exams are of this type
      - Stanford Achievement Test
Non-Vertically Aligned Scores
[Figure: separate score scales for Grades 3 through 10, with content ranging from single-digit addition to trigonometry]
Vertically Scaled Scores
[Figure: a single vertical scale spanning Grades 3 through 10, from simple addition to trigonometry]
If done right, a vertically scaled exam is ideal for analyzing learning gains, since a one-point change has the same meaning everywhere on the scale.
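As a toy illustration of the scaling point above (not taken from the study), the sketch below maps a hypothetical latent achievement level `theta` onto two made-up scales: one vertical scale shared by all grades, and grade-specific scales that each have their own origin and unit. All scale parameters are invented for illustration. A student whose achievement does not change between Grades 4 and 5 shows a zero gain on the vertical scale, but a spurious "gain" appears when the two grade-specific scores are subtracted.

```python
# Toy illustration only: the latent "theta" and all scale parameters are hypothetical.

def vertical_score(theta: float) -> float:
    """One equal-interval scale shared by every grade."""
    return 300.0 + 50.0 * theta

# Hypothetical non-aligned scales: each grade has its own (origin, unit),
# so the same theta maps to different numbers in different grades.
GRADE_SCALES = {4: (200.0, 40.0), 5: (250.0, 25.0)}

def grade_specific_score(theta: float, grade: int) -> float:
    """Score on a grade-specific (non-vertically aligned) scale."""
    origin, unit = GRADE_SCALES[grade]
    return origin + unit * theta

# The student's latent achievement is unchanged between Grades 4 and 5.
theta_g4, theta_g5 = 0.5, 0.5

vertical_gain = vertical_score(theta_g5) - vertical_score(theta_g4)
naive_gain = grade_specific_score(theta_g5, 5) - grade_specific_score(theta_g4, 4)

print(vertical_gain)  # 0.0
print(naive_gain)     # 42.5 (an artifact of the change in scale, not learning)
```

The nonzero "gain" on the grade-specific scales comes entirely from the change in origin and unit between grades. That is why learning gains cannot be compared on scores that are not vertically aligned.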
Different Measures
- Scale Scores Normalized by Grade and Year
  - Frequently used by researchers to compare a student's performance on criterion-referenced tests over time
  - Compares a student's performance with the performance of other students taking the same grade-level exam in the same year
  - Unit of measure is the standard deviation
  - If the performance distribution changes from grade to grade, normalized scores may not be comparable
  - Also sometimes used to try to equate performance on different exams when a state changes its test midstream
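Normalized Scores
[Figure: distribution of Grade 5 scores after normalization, centered at 0]
Normalizing sets the mean score to zero and rescales scores into standard deviation units.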
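The normalization described above can be sketched as a within-group z-score, where each group is one grade-year combination. This is a minimal sketch under stated assumptions. The record layout (dicts with `grade`, `year`, `score`) and the function name `normalize_by_grade_year` are hypothetical. The sketch also uses the population standard deviation, and a real analysis might use the sample standard deviation instead.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_by_grade_year(records):
    """Convert raw scale scores to z-scores within each (grade, year) group.

    records: list of dicts with hypothetical keys 'grade', 'year', 'score'.
    Returns z-scores in the same order as the input records.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["grade"], r["year"])].append(r["score"])

    # Mean and population standard deviation for each grade-year group.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    z_scores = []
    for r in records:
        mu, sd = stats[(r["grade"], r["year"])]
        if sd == 0:
            raise ValueError(f"no score variation in group {(r['grade'], r['year'])}")
        z_scores.append((r["score"] - mu) / sd)
    return z_scores

records = [
    {"grade": 5, "year": 2006, "score": 290},
    {"grade": 5, "year": 2006, "score": 310},
    {"grade": 6, "year": 2006, "score": 300},
    {"grade": 6, "year": 2006, "score": 340},
]
print(normalize_by_grade_year(records))  # [-1.0, 1.0, -1.0, 1.0]
```

The Grade 5 and Grade 6 raw scores differ, but each student receives the same z-score relative to their own grade-year group. This also shows the caveat on the slide: if the spread of performance changes across grades, one standard deviation stands for a different amount of achievement in each grade.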
Different Results
- Analysis of the Effectiveness of NBPTS-Certified Teachers
  - Harris and Sass, "The Effects of NBPTS-Certified Teachers on Student Achievement" (February 2007)
  - Compares the effectiveness of NBPTS-certified teachers (NBCTs) with that of non-NBCTs in Florida
  - In many cases, results differ depending on whether scores come from Florida's criterion-referenced test, the FCAT-Sunshine State Standards exam (FCAT-SSS), or the Stanford Achievement Test, a norm-referenced test (FCAT-NRT)
Value-Added Estimates of Reading Achievement
Selected Explanatory Variables | FCAT-SSS, Developmental Scale | FCAT-NRT, Developmental Scale | FCAT-SSS, Normalized by Grade and Year | FCAT-NRT, Normalized by Grade and Year
NBPTS Certified | 0.0163 | 0.0020 | 0.0186 | 0.0011
First-Year Teacher | -0.0403 | -0.0219 | -0.0324 | -0.0266
1-2 Years of Teaching Experience | -0.0071 | -0.0106 | -0.0075 | -0.0120
3-4 Years of Teaching Experience | -0.0123 | -0.0129 | -0.0112 | -0.0134
5-9 Years of Teaching Experience | -0.0075 | -0.0109 | -0.0098 | -0.0116
Advanced Degree | -0.0128 | 0.0007 | -0.0101 | -0.0001
Class Size | -0.0028 | -0.0017 | -0.0026 | -0.0017
Note: All coefficients are expressed in standard deviation units. The omitted experience category is teachers with 10 or more years of experience. Coefficients in green are statistically significant at the 95% confidence level.
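The slides do not describe the model behind these estimates. As a hedged illustration only, the sketch below fits a stripped-down value-added style regression by ordinary least squares on simulated data. It regresses the current score on the prior score, a certification indicator, and class size. The variable names, the data-generating process, and the 5% certification rate are all assumptions, and Harris and Sass's actual specification is not reproduced here.

```python
import numpy as np

# Simulated (hypothetical) data; nothing here comes from the Florida study.
rng = np.random.default_rng(0)
n = 20_000

prior = rng.normal(size=n)                               # prior-year score
certified = (rng.random(n) < 0.05).astype(float)         # low-incidence dummy (assumed 5%)
class_size = rng.integers(15, 31, size=n).astype(float)  # 15..30 students

# Hypothetical data-generating process with known coefficients.
true_beta = np.array([0.1, 0.7, 0.02, -0.003])  # const, prior, certified, class_size
X = np.column_stack([np.ones(n), prior, certified, class_size])
score = X @ true_beta + rng.normal(scale=0.1, size=n)

# OLS estimates.
beta_hat, *_ = np.linalg.lstsq(X, score, rcond=None)

# Conventional (homoskedastic) OLS standard errors.
resid = score - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, s in zip(["const", "prior", "certified", "class_size"], beta_hat, se):
    print(f"{name:>10}: {b: .4f} (se {s:.4f})")
```

The certification indicator applies to only a small share of observations, so its standard error is several times larger than that of the prior-score coefficient. This is a general property of OLS: the precision of a dummy-variable coefficient depends on how many observations fall in each category.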
Different Results
- More variation in estimated effects across exams than across different scalings of the same exam
- Estimated effects of variables representing small proportions of teachers are the most variable
  - NBPTS certification
  - Advanced degrees
- Why are there differences across exams?
  - Differences in material covered
  - Differential ceiling effects
Vertically Scaled Scores With Ceiling
[Figure: vertically scaled scores for Grades 3 through 10, from simple addition to trigonometry, with a ceiling on the test scale]
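Here is a minimal sketch of how a ceiling can distort measured gains. It assumes a hypothetical test that cannot record scores above a fixed maximum. The `ceiling` values, the scores, and the function name `measured_gain` are invented for illustration. Every student gains the same 10 true points, but under the lower ceiling the student who starts near the top is recorded as gaining only 5.

```python
def measured_gain(prior: float, true_gain: float, ceiling: float) -> float:
    """Gain recorded by a hypothetical test that cannot score above `ceiling`."""
    observed_prior = min(prior, ceiling)
    observed_current = min(prior + true_gain, ceiling)
    return observed_current - observed_prior

priors = [60.0, 80.0, 95.0]  # hypothetical starting scores
true_gain = 10.0             # identical true learning for every student

# Two hypothetical exams that differ only in where their ceiling sits.
gains_low = [measured_gain(p, true_gain, ceiling=100.0) for p in priors]
gains_high = [measured_gain(p, true_gain, ceiling=120.0) for p in priors]

print(gains_low)   # [10.0, 10.0, 5.0]
print(gains_high)  # [10.0, 10.0, 10.0]
```

Suppose two exams have different ceilings. Then teachers whose students cluster near the top of the distribution could appear more or less effective depending on which exam is used. This is one way to read the "differential ceiling effects" explanation.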
Conclusions
- Not much difference between developmental-scale scores and non-vertically aligned scores that are normalized by grade and year
- Different tests can yield different results
- Low-incidence variables seem to be the most sensitive to the test instrument
- Not clear whether the differences are due to the material tested or to differential ceiling effects