State-NAEP Standard Mappings: Cautions and Alternatives

1
State-NAEP Standard Mappings: Cautions and
Alternatives
  • Andrew Ho
  • University of Iowa

2
State-NAEP Percent Proficient Comparisons
3
Visualizing Proficiency
  • Setting cut scores is a judgmental process.
  • Shifting the cut score changes the percent of
    proficient students in a nonlinear fashion.
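The nonlinearity can be shown with a small sketch (not from the slides; the normal distribution and its mean/SD are illustrative, not NAEP parameters): equal shifts in the cut score produce unequal changes in percent proficient.

```python
# Minimal sketch: percent proficient as a function of the cut score,
# assuming a hypothetical Normal(250, 35) score distribution.
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=35)

def percent_proficient(cut):
    """Percent of students at or above the cut score."""
    return 100 * (1 - scores.cdf(cut))

# Equal 10-point shifts in the cut score change percent proficient
# by unequal amounts: the relationship is nonlinear.
for cut in (230, 240, 250, 260, 270):
    print(cut, round(percent_proficient(cut), 1))
```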

4
Percent Proficiency by Cut Score
5
Why a State-NAEP Link?
  • State A and state B both report that 50% of
    students are proficient.
  • Are states comparable in student proficiency, or
    are their definitions of proficient different?
  • If state A has a higher standard for proficiency,
    then state A students are more proficient.
  • How can we tell?
  • Rationale: If both states use a NAEP-like test,
    and state A students are more proficient on NAEP,
    then state A must have the higher standard for
    proficiency.
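The mapping logic can be sketched in equipercentile style (assumed normal distributions and made-up parameters, chosen to match the numbers on the later slides): if a state reports p% proficient, place its standard at the NAEP score that the top p% of its NAEP takers exceed.

```python
# Sketch of the mapping: a state reporting p% proficient gets its
# standard placed at the NAEP score exceeded by exactly p% of its
# NAEP distribution (assumed normal here for illustration).
from statistics import NormalDist

def mapped_naep_cut(pct_proficient, naep_mean, naep_sd=35):
    return NormalDist(naep_mean, naep_sd).inv_cdf(1 - pct_proficient / 100)

# Both states report 50% proficient, but state A scores higher on NAEP,
# so its standard maps to a higher NAEP score.
cut_a = mapped_naep_cut(50, naep_mean=250)
cut_b = mapped_naep_cut(50, naep_mean=225)
print(round(cut_a), round(cut_b))  # cut_a > cut_b
```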

6
States A and B on NAEP
7
Map State Standards Onto NAEP
8
Interpretations
  • State A (250) has a higher standard for
    proficiency than state B (225).
  • Essentially adjusts or handicaps state
    proficiency standards based on NAEP performance.
  • All else being equal (!), higher scoring NAEP
    states will appear to have higher standards.
  • What percent of proficient students would state B
    have to report to match or exceed state A's
    standard?
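The closing question can be answered under the same illustrative assumptions (normal NAEP distributions with invented parameters; state A's mapped standard of 250 comes from the slide above): find the share of state B's NAEP distribution above state A's mapped cut.

```python
# Sketch: what percent proficient would state B have to report for its
# standard to map at least as high as state A's? Hypothetical numbers.
from statistics import NormalDist

state_b_naep = NormalDist(mu=225, sigma=35)
cut_a = 250  # state A's standard mapped onto the NAEP scale

# Percent of state B's NAEP takers scoring above state A's mapped cut:
required_pct = 100 * (1 - state_b_naep.cdf(cut_a))
print(round(required_pct, 1))  # well below the 50% it currently reports
```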

9
Same Standards, Lower Proficient
10
A Closer Look (Braun, Qian, McLaughlin)
(Figure: NAEP distributions for students taking both tests, and percents
proficient on state tests for those same students. BQ average percents
proficient; McLaughlin averages scores.)
11
Strong Inferences, What Support?
  • Handicapping state percents proficient by NAEP
    performance logically requires a strong
    relationship between state tests and NAEP.
  • Does this method require a strong relationship
    between state tests and NAEP?
  • Does this method provide evidence for a strong
    relationship between state tests and NAEP?

12
Throw Percents, See What Falls Out
13
It Doesn't Matter What the Percents Mean
  • The method is blind to the meaning of the
    percentages.
  • We can map state football standards to NAEP.
  • States A and B both claim that 10% of the
    students who took NAEP are JV- or Varsity-ready.
  • Conclusion: State A has higher standards due to
    its higher NAEP performance. State B should let
    fewer students on its teams to match state A's
    standards.
  • We can map height to NAEP.
  • States A and B both claim that 60% of the
    students who took NAEP are tall.
  • Conclusion: State A has higher standards due to
    its higher NAEP performance. State B should
    consider its students shorter to match state A's
    standards.
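The football example can be run through the mapping directly (a sketch with made-up numbers, assuming normal NAEP distributions): the method uses only the reported percentage and the state's NAEP distribution, so it "works" no matter what the percentage means.

```python
# Sketch: mapping an arbitrary, non-proficiency percentage onto NAEP.
# The method cannot tell "proficient" from "JV- or Varsity-ready".
from statistics import NormalDist

def mapped_naep_cut(pct, naep_mean, naep_sd=35):
    return NormalDist(naep_mean, naep_sd).inv_cdf(1 - pct / 100)

# Both states say 10% of their NAEP takers are JV- or Varsity-ready.
football_cut_a = mapped_naep_cut(10, naep_mean=250)
football_cut_b = mapped_naep_cut(10, naep_mean=225)
# The method "concludes" state A has the higher football standard,
# purely because its NAEP mean is higher.
print(football_cut_a > football_cut_b)  # True
```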

14
Can the Method Check Itself?
15
Mapped Standard and Proficient
  • There is a strong negative correlation between
    the NAEP mapping of the state standard and the
    percent of proficient students.
  • Braun and Qian conclude that most of the observed
    differences among state percents proficient are
    due to the stringency of performance standards.
  • Perhaps true, but unfounded.

16
Look again at how the method works
17
Unfounded (see also Koretz, 2007)
  • The plot does not support the inference.
  • It doesn't matter whether you are throwing
    percents of proficient students, percents of
    football players, or percents of tall students:
    you will see a strong negative correlation.
  • It's standard setting, and not
  • Test Content?
  • Motivation?
  • State policies?
  • This method cannot make these distinctions.

18
If You Link It, They Will Come
  • Braun and Qian have built a provocative,
    speculative, casual mapping that will be
    dramatically overinterpreted.
  • Braun and Mislevy (2005) argue against "Intuitive
    Test Theory" tenets such as "A Test Measures What
    it Says at the Top of the Page" and "A Test is
    a Test is a Test."
  • Would you handicap your golf game based on your
    tennis performance? The analogy is largely
    untested.

19
Three Higher Order Issues
  • Mappings are unlikely to last over time.
  • NAEP and state trends are dissimilar.
  • Percent Proficiency is Horrible Anyway
  • Absolutely abysmal for reporting trends and gaps.
  • NAEP-State Content analyses are the way to
    proceed.
  • But the outcome variables should not be
    percent-proficient-based statistics.
  • And content comparisons are just not easy to do.

20
Proficiency Trends Over Time?
21
Proficiency Trends Are Not Linear
22
Revisiting the Distribution
23
Percent Proficiency by Cut Score
24
Two Time Points and a Cut Score
25
Another Perspective (Holland, 2002)
26
Proficiency Trends Depend on Cut Score
27
Five States Under NCLB
28
Sign Reversal?! Sure.
29
Six Blind Men and an Elephant
(Figure: each blind man's report of the elephant: fan, spear, wall,
rope, snake, trunk.)
30
NAEP R4 Trends by Cut Score
31
State R4 Trends by Cut Score
32
Percent Proficient-Based Trends
  • Focus on only a slice of the distribution.
  • Are not expected to be linear over time.
  • Depend on the choice of cut score.

And now you want to compare them across tests?
This is where Braun and Qian can help.
Alternative methods exist.
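The cut-score dependence, including the sign reversal shown earlier, can be demonstrated with a sketch (all numbers invented): when the mean rises but the spread narrows, the trend in percent proficient is positive at a low cut score and negative at a high one.

```python
# Sketch of the sign reversal: the same distributional change reads as
# improvement at a low cut score and as decline at a high one.
from statistics import NormalDist

year1 = NormalDist(mu=250, sigma=40)
year2 = NormalDist(mu=255, sigma=25)  # mean up, spread down

def pct_above(dist, cut):
    return 100 * (1 - dist.cdf(cut))

low_cut, high_cut = 230, 290
print(pct_above(year2, low_cut) - pct_above(year1, low_cut))    # positive
print(pct_above(year2, high_cut) - pct_above(year1, high_cut))  # negative
```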
33
NAEP vs. State Trends
(Figure: state trend plotted against NAEP trend, each as a
scale-invariant effect size.)
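One simple version of such an effect size (a sketch; the talk's exact metric may differ, and fully scale-invariant comparisons in this literature are often rank-based) is a standardized mean difference between two years, which is unchanged by linear rescaling of the score scale.

```python
# Sketch: a standardized mean difference as a trend effect size.
# Rescaling every score linearly (a different score scale) leaves it
# identical, so NAEP and state trends can be compared on one plot.
from statistics import mean, stdev

def trend_effect_size(scores_y1, scores_y2):
    """(mean2 - mean1) / pooled SD."""
    pooled_sd = ((stdev(scores_y1) ** 2 + stdev(scores_y2) ** 2) / 2) ** 0.5
    return (mean(scores_y2) - mean(scores_y1)) / pooled_sd

y1 = [240, 250, 260, 245, 255]          # invented year-1 scores
y2 = [248, 258, 268, 253, 263]          # invented year-2 scores
d = trend_effect_size(y1, y2)
rescaled = trend_effect_size([2 * x + 100 for x in y1],
                             [2 * x + 100 for x in y2])
print(round(d, 3), round(rescaled, 3))  # identical
```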
34
Removing Pogo Sticks
  • BQ: Remove Pogo Sticks at Proficiency
  • Alternatively, Address the Full Distribution
  • Still leaves the necessarily prior question:
  • How similar are NAEP and the state test?

35
Model Trend Discrepancies
  • Model discrepancy as a function of content (Ho,
    2005; Wei, Lukoff, Shen, Ho, and Haertel, 2006)
  • Overlapping content trends should be similar
    nonoverlapping content trends should account for
    the discrepancies.
  • Doesn't work well in practice, so far.
  • The tests are just too different!