Title: State-NAEP Standard Mappings: Cautions and Alternatives
1. State-NAEP Standard Mappings: Cautions and Alternatives
- Andrew Ho
- University of Iowa
2. When People Want a Method
- Give it to them?
- So someone more dangerous won't?
- But build in limitations?
- Make strenuous cautions?
- And try to change the subject?
3. State-NAEP Percent Proficient Comparisons
4. Visualizing Proficiency
- Setting cut scores is a judgmental process.
- Shifting the cut score changes the percent of
proficient students in a nonlinear fashion.
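The nonlinearity is easy to see with a toy example. The sketch below assumes a normal score distribution with an invented mean of 250 and SD of 35; equal 15-point shifts in the cut score produce unequal changes in percent proficient.

```python
# Hypothetical illustration: percent proficient as a function of the cut
# score. The mean (250) and SD (35) are invented, not NAEP parameters.
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=35)

for cut in (215, 230, 245, 260, 275):
    pct = 100 * (1 - scores.cdf(cut))  # percent at or above the cut
    print(f"cut {cut}: {pct:.1f}% proficient")
```

Near the middle of the distribution a small shift in the cut moves many students across the line; in the tails the same shift moves few.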
5. Percent Proficiency by Cut Score
6. Why a State-NAEP Link?
- State A and state B both report that 50% of students are proficient.
- Are states comparable in student proficiency, or are their definitions of "proficient" different?
- If state A has a higher standard for proficiency, then state A students are more proficient.
- How can we tell?
- Rationale: If both states use a NAEP-like test, and state A students are more proficient on NAEP, then state A must have the higher standard for proficiency.
7. States A and B on NAEP
8. Map State Standards Onto NAEP
9. Interpretations
- State A (250) has a higher standard for proficiency than state B (225).
- Essentially adjusts or "handicaps" state proficiency standards based on NAEP performance.
- All else being equal (!), higher-scoring NAEP states will appear to have higher standards.
- What percent of proficient students would state B have to report to have the same as or a higher standard than state A?
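With invented numbers, that question can be made concrete. Suppose state B's students score roughly N(240, 30) on NAEP (an assumption for illustration), state A's standard maps to NAEP 250, and state B's maps to 225; the sketch computes what state B would have to report.

```python
# Hedged sketch with invented numbers: state B's NAEP distribution,
# and the mapped cuts (A at 250, B at 225), are illustrative only.
from statistics import NormalDist

state_b_naep = NormalDist(mu=240, sigma=30)

pct_now  = 100 * (1 - state_b_naep.cdf(225))  # at B's current standard
pct_need = 100 * (1 - state_b_naep.cdf(250))  # at A's higher standard

print(f"B reports {pct_now:.0f}% proficient now; to match A's standard "
      f"it would report only {pct_need:.0f}%")
```

To claim a standard as stringent as state A's, state B would have to report a much smaller percent proficient.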
10. Same Standards, Lower Percent Proficient
11. A Closer Look (Braun, Qian, McLaughlin)
- NAEP distributions for students taking both tests
- Braun and Qian: average percents proficient
- Percents proficient on state tests for those taking both tests
- McLaughlin: averages scores
12. Strong Inferences, What Support?
- "Handicapping" state percents proficient by NAEP performance logically requires a strong relationship between state tests and NAEP.
- Does this method require a strong relationship between state tests and NAEP?
- Does this method provide evidence for a strong relationship between state tests and NAEP?
13. Throw Percents, See What Falls Out
14. It Doesn't Matter What the Percents Mean
- The method is blind to the meaning of the percentages.
- We can map state football standards to NAEP.
- States A and B both claim that 10% of the students who took NAEP are JV- or Varsity-ready.
- Conclusion: State A has higher standards due to its higher NAEP performance. State B should let fewer students on its teams to match state A's standards.
- We can map height to NAEP.
- States A and B both claim that 60% of the students who took NAEP are "tall."
- Conclusion: State A has higher standards due to its higher NAEP performance. State B should consider its students shorter to match state A's standards.
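A minimal sketch of this blindness: the equipercentile machinery converts any "percent above" into a NAEP cut score, with no regard for what the percent means. The NAEP distribution parameters and the helper name `mapped_cut` are invented for illustration.

```python
# The mapping accepts any percentage: proficiency, football readiness,
# or height. Distribution parameters (250, 35) are invented.
from statistics import NormalDist

naep = NormalDist(mu=250, sigma=35)

def mapped_cut(percent_above):
    """NAEP score with `percent_above`% of students at or above it."""
    return naep.inv_cdf(1 - percent_above / 100)

print(f"60% 'tall'          -> NAEP cut {mapped_cut(60):.0f}")
print(f"35% 'proficient'    -> NAEP cut {mapped_cut(35):.0f}")
print(f"10% 'varsity-ready' -> NAEP cut {mapped_cut(10):.0f}")
```

The code runs identically whatever label is attached to the percentage, which is exactly the slide's point.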
15. Can the Method Check Itself?
Braun and Qian (2007)
16. Mapped Standard and Percent Proficient
- There is a strong negative correlation between the NAEP mapping of the state standard and the percent of proficient students.
- Braun and Qian: "We assert that most of the observed differences among states in the proportions of students meeting states' proficiency standards are the result of differences in the stringency of their standards."
- Perhaps true, but unfounded.
17. Look Again at How the Method Works
18. Unfounded (see also Koretz, 2007)
- The plot does not support the inference.
- It doesn't matter whether you are throwing in percents of proficient students, percents of football players, or percents of tall students; you will see a strong negative correlation.
- It's standard setting, and not...
- Test content?
- Motivation?
- State policies?
- This method cannot make these distinctions.
19. If You Link It, They Will Come
- Braun and Qian have built a provocative, speculative, casual mapping that will be dramatically overinterpreted.
- Braun and Mislevy (2005) argue against "Intuitive Test Theory" tenets such as "A Test Measures What it Says at the Top of the Page" and "A Test is a Test is a Test."
- Would you handicap your golf game based on your tennis performance? The analogy is largely untested.
20. Three Higher-Order Issues
- Mappings are unlikely to last over time.
- NAEP and state trends are dissimilar.
- Percent proficiency is horrible anyway.
- Absolutely abysmal for reporting trends and gaps.
- NAEP-state content analyses are the way to proceed.
- But the outcome variables should not be percent-proficient-based statistics.
- And this is just not easy to do.
21. Proficiency Trends Over Time?
22. Proficiency Trends Are Not Linear
23. Revisiting the Distribution
24. Percent Proficiency by Cut Score
25. Two Time Points and a Cut Score
26. Another Perspective (Holland, 2002)
27. Proficiency Trends Depend on Cut Score
28. Five States Under NCLB
29. Sign Reversal?! Sure.
30. Six Blind Men and an Elephant
[Figure: the blind men's guesses: fan, spear, wall, rope, snake, trunk]
31. NAEP R4 Trends by Cut Score
32. State R4 Trends by Cut Score
33. Percent Proficient-Based Trends
- Focus on only a slice of the distribution.
- Are not expected to be linear over time.
- Depend on the choice of cut score.
And now you want to compare them across tests?!
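A toy illustration of that cut-score dependence: with an invented pair of year-to-year distributions in which the mean rises but the spread narrows, a low cut shows improvement while a high cut shows decline, so the sign of the "trend" depends entirely on where the cut sits.

```python
# Made-up distributions: mean rises (250 -> 255) while the spread
# narrows (40 -> 25), so low and high cuts tell opposite stories.
from statistics import NormalDist

year1 = NormalDist(mu=250, sigma=40)
year2 = NormalDist(mu=255, sigma=25)

for cut in (220, 290):
    p1 = 100 * (1 - year1.cdf(cut))
    p2 = 100 * (1 - year2.cdf(cut))
    trend = "up" if p2 > p1 else "down"
    print(f"cut {cut}: {p1:.0f}% -> {p2:.0f}% ({trend})")
```

The same pair of distributions yields a sign reversal purely as a function of the cut score: this is the elephant the blind men keep grabbing.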
34. NAEP vs. State Trends
[Figure: state trend plotted against NAEP trend, each as a scale-invariant effect size]
35. Model Trend Discrepancies
- As a function of content (Ho, 2005; Wei, Lukoff, Shen, Ho, and Haertel, 2006)
- Overlapping content trends should be similar; nonoverlapping content trends should account for the discrepancies.
- Doesn't work well in practice, so far.
- The tests are just too different!
36. 2005 NAEP Black-White Gap by PAC
37. Gap Trend Flipping
38. NAEP Math Grade 4 Gap Trends
39. Don't Forget the Elephant
- Gaps Naturally Bow
- Trends Naturally Bow, Occasionally Flip
- Gap Trends NATURALLY Flip
- High-Stakes Ambiguity
- Students are learning more and less.
- Teachers are teaching better and worse.
- NCLB is working and not working.
- There is and is not progress toward equity in
educational opportunity.
40. State vs. NAEP: Two Kids on Pogo Sticks
41. State vs. NAEP
- Averages are better than PACs.
- But even averages are weak.
- Distorting the scale can reverse the sign of an average-based trend, too!
- Not a problem for one test, but for cross-test comparisons for tests with different scales.
- How about a scale-free statistic?
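One candidate, sketched here in the spirit of the Ho and Haertel (2006) reference, is a rank-based effect size built from the probability that a randomly chosen later-year score exceeds a randomly chosen earlier-year score; because it uses only orderings, any monotone rescaling of the test leaves it unchanged. The score samples and the function name are invented.

```python
# Hedged sketch of a scale-free trend statistic: sqrt(2) * Phi^-1 of
# P(later score > earlier score), ties counted half. Scores are invented.
from statistics import NormalDist

def rank_effect_size(before, after):
    """Rank-based effect size for the change from `before` to `after`."""
    wins = sum((b < a) + 0.5 * (b == a) for a in after for b in before)
    p = wins / (len(before) * len(after))
    return 2 ** 0.5 * NormalDist().inv_cdf(p)

before = [210, 230, 240, 255, 270]
after  = [225, 245, 250, 265, 280]

d = rank_effect_size(before, after)

# A monotone distortion of the scale (here, squaring the positive
# scores) leaves the statistic exactly unchanged:
squashed = rank_effect_size([x ** 2 for x in before],
                            [x ** 2 for x in after])
assert abs(d - squashed) < 1e-9
print(f"effect size: {d:.2f}")
```

Because average-based trends can flip sign under scale distortion, a statistic that survives any monotone rescaling is a natural requirement for cross-test comparison.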
42. Further Considerations
- Which states have comparable 03-05 trends?
- Changed cut scores, tests, or scales?
- Incorporate Alternate Assessments?
- Fall vs. Spring Testing?
- Grade 3 or 5 in place of Grade 4?
- Reading vs. English and Language Arts?
43. References
- Koretz, McCaffrey and Hamilton (2001)
- Linn, Baker and Betebenner (2002)
- Haertel, Thrash and Wiley (1978)
- Spencer (1983)
- Holland (2002)
- Ho and Haertel (2006)
- And thanks to Tracey Magda, my RA.
44. You Don't Need the State Distribution!
[Figure: NAEP proficiency curve and state proficiency curve]
But a Mapping Cannot Validate Itself Alone!!!
45. Mapping State Football Standards Onto the NAEP Scale
[Figure: state football proficiency curve mapped to the NAEP proficiency curve; levels: PeeWee, Pop, Jr., JV, HS, C, Pro]
46. Mapping School Doorway Height Onto the NAEP Scale
[Figure: state height cutoff curve mapped to the NAEP proficiency curve; cutoffs from 4'9" to 6'3"]
47. Tabular Equipercentile Mapping
- For all students who take NAEP,
- Sort their heights in descending order.
- Sort their NAEP scores in descending order.
- Mapping
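The steps above can be sketched as follows, with invented heights and NAEP scores. Pairing equal ranks is the natural reading of the truncated "Mapping" step, so treat that detail as an assumption.

```python
# Minimal sketch of the tabular equipercentile mapping the slide
# outlines. Heights (inches) and NAEP scores are invented, and are
# listed for the same hypothetical students who took both measures.
heights = [55, 60, 52, 58, 63, 57, 61, 54]
naep    = [240, 262, 231, 250, 275, 246, 268, 236]

# Sort each column in descending order, independently of which
# student earned which value.
height_col = sorted(heights, reverse=True)
naep_col   = sorted(naep, reverse=True)

# The "mapping" pairs equal ranks: the k-th tallest height maps to
# the k-th highest NAEP score, whatever the real relationship is.
mapping = list(zip(height_col, naep_col))
for h, s in mapping:
    print(f"height {h} in. -> NAEP {s}")
```

Note that the procedure produces a smooth-looking monotone table for height just as readily as for a state test score, which is why a mapping cannot validate itself.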