Title: State-NAEP Standard Mappings: Cautions and Alternatives
1. State-NAEP Standard Mappings: Cautions and Alternatives
- Andrew Ho
- University of Iowa
2. State-NAEP Percent Proficient Comparisons
3. Visualizing Proficiency
- Setting cut scores is a judgmental process.
- Shifting the cut score changes the percent of proficient students in a nonlinear fashion (see the sketch below).
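A minimal sketch of that nonlinearity, assuming a normal score distribution; the mean of 250 and SD of 35 are hypothetical values, not from the talk:

```python
# Percent proficient as a function of the cut score. A constant 10-point
# shift in the cut changes percent proficient by different amounts
# depending on where the cut sits in the score distribution.
from scipy.stats import norm

MEAN, SD = 250.0, 35.0  # hypothetical score distribution

def percent_proficient(cut):
    """Percent of students at or above the cut score."""
    return 100.0 * norm.sf(cut, loc=MEAN, scale=SD)

for cut in (215, 225, 235, 245, 255, 265, 275):
    print(f"cut={cut}: {percent_proficient(cut):5.1f}% proficient")
# Successive 10-point shifts move percent proficient by roughly 8, 10,
# 11, 11, 11, and 10 points: largest near the middle of the
# distribution, smallest in the tails.
```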
4. Percent Proficiency by Cut Score
5. Why a State-NAEP Link?
- State A and state B both report that 50% of students are proficient.
- Are the states comparable in student proficiency, or are their definitions of "proficient" different?
- If state A has a higher standard for proficiency, then state A's students are more proficient.
- How can we tell?
- Rationale: If both states use a NAEP-like test, and state A's students are more proficient on NAEP, then state A must have the higher standard for proficiency.
6. States A and B on NAEP
7. Map State Standards Onto NAEP
8. Interpretations
- State A (250) has a higher standard for proficiency than state B (225).
- The mapping essentially adjusts or handicaps state proficiency standards based on NAEP performance.
- All else being equal (!), higher-scoring NAEP states will appear to have higher standards.
- What percent of proficient students would state B have to report to have the same as or a higher standard than state A? (See the sketch below.)
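A minimal sketch of the mapping arithmetic. The state A and state B means (250 and 225) come from the slides; normality, the common SD of 35, and the use of scipy are my assumptions. The mapped standard is simply the NAEP score exceeded by exactly the state's reported percent proficient:

```python
# Sketch of the mapping logic: the NAEP-equivalent cut is the NAEP score
# whose exceedance rate, in a state's NAEP distribution, matches the
# state's reported percent proficient.
from scipy.stats import norm

def naep_equivalent_cut(pct_proficient, naep_mean, naep_sd=35):
    """NAEP score exceeded by exactly pct_proficient percent of students."""
    return norm.isf(pct_proficient / 100.0, loc=naep_mean, scale=naep_sd)

# Both states report 50% proficient, but state A scores higher on NAEP,
# so its standard maps to a higher NAEP point.
print(naep_equivalent_cut(50, naep_mean=250))  # state A -> 250
print(naep_equivalent_cut(50, naep_mean=225))  # state B -> 225

# For state B's standard to map to state A's (250), state B would have
# to report far fewer proficient students:
print(100 * norm.sf(250, loc=225, scale=35))   # ~23.8% proficient
```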
9. Same Standards, Lower Proficient
10. A Closer Look (Braun, Qian, McLaughlin)
(Figure: NAEP distributions for students taking both tests, alongside percents proficient on state tests for those same students. Braun and Qian (BQ) average percents proficient; McLaughlin averages scores.)
11. Strong Inferences, What Support?
- Handicapping state percents proficient by NAEP performance logically requires a strong relationship between state tests and NAEP.
- Does this method require a strong relationship between state tests and NAEP?
- Does this method provide evidence for a strong relationship between state tests and NAEP?
12. Throw Percents, See What Falls Out
13. It Doesn't Matter What Percents
- The method is blind to the meaning of the percentages (see the sketch after this list).
- We can map state football standards to NAEP.
- States A and B both claim that 10% of the students who took NAEP are JV- or Varsity-ready.
- Conclusion: State A has higher standards due to its higher NAEP performance. State B should let fewer students onto its teams to match state A's standards.
- We can map height to NAEP.
- States A and B both claim that 60% of the students who took NAEP are tall.
- Conclusion: State A has higher standards due to its higher NAEP performance. State B should consider its students shorter to match state A's standards.
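The same arithmetic, reusing the hypothetical naep_equivalent_cut sketch from above, happily produces "standards" for the slide's football and height percentages; nothing in the computation knows or cares what the percent measures:

```python
# The mapping never inspects what the percentage means. Feeding it the
# slide's football and height percents yields "standards" just as
# readily as proficiency rates do (same assumed N(250, 35) NAEP
# distribution as in the earlier sketch).
print(naep_equivalent_cut(10, naep_mean=250))  # "JV- or Varsity-ready" cut, ~294.9
print(naep_equivalent_cut(60, naep_mean=250))  # "tall" cut, ~241.1
```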
14. Can the Method Check Itself?
15. Mapped Standard and Percent Proficient
- There is a strong negative correlation between the NAEP mapping of the state standard and the percent of proficient students.
- Braun and Qian conclude that most of the observed differences among state percents proficient are due to the stringency of performance standards.
- Perhaps true, but unfounded.
16. Look again at how the method works
17. Unfounded (see also Koretz, 2007)
- The plot does not support the inference.
- It doesn't matter whether you are throwing percents of proficient students, percents of football players, or percents of tall students: you will see a strong negative correlation (see the simulation after this list).
- It's standard setting, and not:
- Test content?
- Motivation?
- State policies?
- This method cannot make these distinctions.
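A small simulation, under assumed normal distributions and arbitrary parameters of my own choosing, shows that the negative correlation is built into the machinery: the map is a decreasing function of the reported percent, so random percents of anything produce it.

```python
# Throw arbitrary percents at the mapping, and the mapped "standard"
# correlates strongly negatively with the percent, by construction.
# All distributions and parameters here are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pcts = rng.uniform(10, 90, size=50)          # arbitrary reported percents
naep_means = rng.normal(240, 10, size=50)    # hypothetical state NAEP means
mapped = norm.isf(pcts / 100.0, loc=naep_means, scale=35)

print(np.corrcoef(pcts, mapped)[0, 1])       # strongly negative (about -0.9)
```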
18. If You Link It, They Will Come
- Braun and Qian have built a provocative, speculative, casual mapping that will be dramatically overinterpreted.
- Braun and Mislevy (2005) argue against "Intuitive Test Theory" tenets such as "A Test Measures What it Says at the Top of the Page" and "A Test is a Test is a Test."
- Would you handicap your golf game based on your tennis performance? The analogy is largely untested.
19. Three Higher-Order Issues
- Mappings are unlikely to last over time.
- NAEP and state trends are dissimilar.
- Percent proficient is horrible anyway.
- Absolutely abysmal for reporting trends and gaps.
- NAEP-state content analyses are the way to proceed.
- But the outcome variables should not be percent-proficient-based statistics.
- And content comparisons are just not easy to do.
20. Proficiency Trends Over Time?
21. Proficiency Trends Are Not Linear
22. Revisiting the Distribution
23. Percent Proficiency by Cut Score
24. Two Time Points and a Cut Score
25. Another Perspective (Holland, 2002)
26. Proficiency Trends Depend on Cut Score
27. Five States Under NCLB
28. Sign Reversal?! Sure.
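A minimal sketch of how a sign reversal can arise, with hypothetical normal distributions: if the mean rises while the spread shrinks between two years, percent proficient goes up at a low cut and down at a high cut.

```python
# Hypothetical year-1 and year-2 score distributions: higher mean,
# smaller spread. The trend in percent proficient reverses sign
# depending on where the cut score sits.
from scipy.stats import norm

year1 = dict(loc=250, scale=40)
year2 = dict(loc=255, scale=30)

for cut in (200, 300):
    p1 = 100 * norm.sf(cut, **year1)
    p2 = 100 * norm.sf(cut, **year2)
    print(f"cut={cut}: {p1:.1f}% -> {p2:.1f}% ({'up' if p2 > p1 else 'down'})")
# cut=200: 89.4% -> 96.7% (up)
# cut=300: 10.6% -> 6.7% (down)
```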
29. Six Blind Men and an Elephant
(Figure: the parable's six partial perceptions, labeled fan, spear, wall, rope, snake, and trunk.)
30. NAEP R4 Trends by Cut Score
31. State R4 Trends by Cut Score
32. Percent Proficient-Based Trends
- Focus on only a slice of the distribution.
- Are not expected to be linear over time.
- Depend on the choice of cut score.
And now you want to compare them across tests? This is where Braun and Qian can help. Alternative methods exist.
33. NAEP vs. State Trends
(Figure: state trend plotted against NAEP trend, each expressed as a scale-invariant effect size.)
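A sketch of one scale-invariant trend summary, the standardized mean difference; whether this is the exact effect size behind the figure is my assumption, and all score data here are simulated:

```python
# Trends expressed as standardized mean differences can be compared
# across tests with different scales; percent proficient cannot.
import numpy as np

def effect_size_trend(scores_t1, scores_t2):
    """(mean2 - mean1) / pooled SD: a scale-level trend summary."""
    s1, s2 = np.asarray(scores_t1), np.asarray(scores_t2)
    pooled_sd = np.sqrt((s1.var(ddof=1) + s2.var(ddof=1)) / 2)
    return (s2.mean() - s1.mean()) / pooled_sd

rng = np.random.default_rng(1)
naep_t1 = rng.normal(250, 35, 2000)   # hypothetical NAEP scores, year 1
naep_t2 = rng.normal(253, 35, 2000)   # year 2
state_t1 = rng.normal(500, 80, 2000)  # hypothetical state scores, year 1
state_t2 = rng.normal(516, 80, 2000)  # year 2

print(effect_size_trend(naep_t1, naep_t2))    # ~0.09 SD on NAEP
print(effect_size_trend(state_t1, state_t2))  # ~0.20 SD on the state test
```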
34. Removing Pogo Sticks
- BQ: remove pogo sticks at proficiency.
- Alternatively, address the full distribution.
- Still leaves the necessarily prior question:
- How similar are NAEP and the state test?
35. Model Trend Discrepancies
- Model discrepancy as a function of content (Ho, 2005; Wei, Lukoff, Shen, Ho, & Haertel, 2006).
- Overlapping content trends should be similar; nonoverlapping content trends should account for the discrepancies (see the sketch below).
- Doesn't work well in practice, so far.
- The tests are just too different!
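A sketch of the decomposition logic only, not the cited papers' actual models: compute trend effect sizes separately for content the two tests share and content they do not, and check whether the NAEP-state discrepancy concentrates in the non-shared content. The subscale structure and every number here are invented for illustration.

```python
# If the hypothesis held, shared-content trends would agree across tests,
# and the overall discrepancy would be explained by non-shared content.
# All subscale means, SDs, and samples are hypothetical.
import numpy as np

rng = np.random.default_rng(2)

def trend(mu1, mu2, sd, n=2000):
    """Simulate subscale scores at two time points; return the
    standardized mean difference between them."""
    t1 = rng.normal(mu1, sd, n)
    t2 = rng.normal(mu2, sd, n)
    pooled = np.sqrt((t1.var(ddof=1) + t2.var(ddof=1)) / 2)
    return (t2.mean() - t1.mean()) / pooled

print("shared content:     NAEP", round(trend(250, 253, 35), 2),
      "  state", round(trend(500, 506, 80), 2))   # similar trends
print("non-shared content: NAEP", round(trend(250, 251, 35), 2),
      "  state", round(trend(500, 522, 80), 2))   # discrepancy lives here
```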