Title: Valid OR Reliable: Subgroups in NCLB's AYP
1. Valid OR Reliable? Subgroups in NCLB's AYP
- Brian Gong
- Center for Assessment
- CCSSO Large-Scale Assessment Conference
- San Francisco, CA June 27, 2006
2. Agreement on NCLB
- One of the BEST things about the NCLB law: attention to subgroups
- QUESTIONED: Equal goals for SWD, ELL
- PROBLEMATIC: How to handle reliability of school accountability decisions (e.g., minimum-n, confidence intervals, conjunctive decisions, student membership in multiple subgroups)
3. Current political context
- Anticipation that every school will be identified, sooner because of subgroups or later because of the AMO rising to 100% by 2013-14
- Flurry of activities to decrease the numbers of schools/districts identified by states and by USDOE
- Less attention on how to raise scores by legitimate learning and school capacity
- Even less attention on whether the right schools are identified (and not identified)
4. Pressure to identify the right number of schools
Figure adapted from data published in Education Week, "Taking Root," by Lynn Olson, Dec. 8, 2004, retrieved on 3/7/04 from http://www.edweek.org/ew/articles/2004/12/08/15nclb-1.h24.html
5. Assertion: States/USED should develop more valid ways to deal with subgroups
- Understand challenges and problems
- Develop solutions
- Create political environment to clarify values and adopt appropriate solutions
- Develop implementation supports
6. Problems: Validity
- Invalid: goals don't fit subgroup/system capacity
- Invalid: simple conjunctive rule doesn't reflect conception of school quality
- Invalid: all-or-nothing identification doesn't reflect school performance
- Invalid: sanctions don't fit shortcomings in school performance
- Invalid: large impact on inclusion
- Invalid: multiple membership clouds portrayal of school performance
- Invalid: implementation so uneven as to be unfair
7. Problems: Reliability (of school accountability decisions)
- Unreliable: multiple conjunctive decisions add error to overall judgment
- Unreliable: decisions based on small sizes are less reliable
- Unreliable: decisions based on gains are less reliable
- Unreliable/unknown: how minimum-n, n-tested, and confidence intervals interact
- Unknown: what is the right size for confidence intervals, minimum-n, and n-tested?
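One way to see why multiple conjunctive decisions add error is a rough calculation. The sketch below is only illustrative: it assumes each decision is made at the same confidence level and that the decisions are statistically independent, which real subgroup results are not (students overlap across subgroups). The function name is hypothetical.

```python
def familywise_confidence(per_decision_confidence: float, n_decisions: int) -> float:
    """Chance that none of n decisions is wrong by chance alone.

    Assumption (not from the slides): decisions are independent and
    share the same per-decision confidence level.
    """
    return per_decision_confidence ** n_decisions


# Confidence in the overall school judgment shrinks as hurdles are added.
for k in (1, 2, 9, 18, 37):
    print(k, round(familywise_confidence(0.95, k), 3))
```

Under these simplifying assumptions, holding the overall judgment at a given confidence level requires a stricter level on each individual decision as the number of hurdles grows.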
8. Solutions: What IS Known
- States should be concerned about reliability of school accountability decisions, especially under NCLB, and implement sufficient safeguards
- For subgroup accountability, validity and reliability often involve trade-offs: more valid often means less reliable
- Current USDOE guidance and approval often favor reliability over validity, and Type II error (low identification) over reliability
9. Recap of NCLB's Subgroup Accountability
- Up to 37 hurdles for a school
  - 9 each in Reading/ELA and Math performance
    - School as a whole
    - 5 race/ethnicity subgroups
    - Economically Disadvantaged
    - SWD
    - ELL
  - 9 each in Reading/ELA and Math participation
  - Other Academic Indicator (graduation rate in high school; state's choice in pre-secondary)
- Conjunctive: Missing any one hurdle means the school does not meet AYP for Status (may meet by safe harbor, appeal, etc.)
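The count of 37 and the conjunctive rule can be sketched in a few lines. This is a minimal illustration, not any state's actual procedure: the group labels and function names are hypothetical, and safe harbor and appeals are deliberately left out.

```python
from itertools import product

# Hypothetical labels for the 9 groups listed on the slide.
GROUPS = (
    ["whole_school"]
    + [f"race_ethnicity_{i}" for i in range(1, 6)]
    + ["econ_disadvantaged", "swd", "ell"]
)
SUBJECTS = ["reading_ela", "math"]
MEASURES = ["performance", "participation"]


def all_hurdles():
    """List every possible hurdle: 2 subjects x 2 measures x 9 groups, plus 1."""
    hurdles = list(product(SUBJECTS, MEASURES, GROUPS))
    hurdles.append(("other_academic_indicator",))
    return hurdles


def meets_ayp_status(results):
    """Conjunctive rule: every applicable hurdle must be met.

    `results` maps each applicable hurdle to True/False. Hurdles left out
    (for example, a subgroup below minimum-n) are treated as not applicable.
    """
    return all(results.values())


print(len(all_hurdles()))  # 2 * 2 * 9 + 1
```

A school that meets 36 hurdles and misses one gets the same Status result as a school that misses all of them, which is the all-or-nothing property the validity slides question.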
10. How a school is eligible to have a subgroup for accountability
- Minimum number of students
- Definition of membership (e.g., SWD: IEP or 504 plan, but not GT)
- Note: SWD identification and classification very probably inconsistent across states and schools
- FAY
- Definition of significant subgroup for race/ethnicity
11. Minimum-n
- Minimum-n size originally intended to help address sampling error and provide some reliability around school decisions, along with the "do not meet two years in a row" rule
- Threatened by high numbers of schools identified, states and USED have used minimum-n as a way out
- Approved subgroup minimum size increasing to well beyond 30, plus proposed percentages (e.g., 15% of total student body)
12. Increasing Minimum-n: "Lose the baby and bathwater" solution
- Statistically inferior to use of confidence intervals
- Biased against large, diverse schools
- Protection against decision inconsistency for status has diminishing returns
- Demonstrably insufficient to guard against unreliability in safe harbor decisions
- Can have tremendous impact on invalidity of AYP design
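The diminishing returns from raising minimum-n can be illustrated with the standard error of a proportion. This sketch assumes a simple binomial sampling model for percent proficient, which is an assumption made here for illustration rather than the analysis behind the slides.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion under simple binomial sampling."""
    return math.sqrt(p * (1 - p) / n)


# Each additional increase in group size buys less precision than the last.
for n in (10, 20, 30, 40, 50, 100):
    print(n, round(standard_error(0.5, n), 3))
```

Because the standard error falls with the square root of n, moving a minimum-n from 30 to 50 reduces sampling error far less than moving from 10 to 30, while excluding many more subgroups from accountability.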
13. AYP biased by minimum-n
14. Impact of Increasing Minimum-n (1): current AMOs, n-sizes, five states, only SPED
15. Impact of Increasing Minimum-n (2): Percent of schools meeting AYP
16. Impact of Increasing Minimum-n (3): Percent of passing schools not meeting minimum-n for SPED
17. Impact of Increasing Minimum-n (4): Percent of SPED students in the state excluded
18. Impact of Increasing Confidence Intervals: Percent of schools identified as meeting AYP (status)
19. Logic of Sampling Error
- Why do we need to consider sampling error if all the students in a school are tested in any one year?
- Interested in generalizing to future performance with future students
- Generalizing from performance of some finite sample
- Empirical support that year-to-year differences in cohorts are like samples drawn from a population (Hill)
20. Adjustment 1: Approve high confidence intervals on status and safe harbor
- Do not approve high minimum-n sizes for subgroups if high CIs (99%) are allowed on both status and safe harbor
- 95% on each test avg. equivalent to 90% on family of decisions across multiple conjunctive decisions (see Hill & DePascale, 2003)
- Discuss safe harbor confidence intervals in depth another time
21. Make minimum-n more valid
- If not using a confidence interval, then minimum-n creates a sharp break
- School with 30 students is in, school with 29 students is out, no matter their performance; e.g., school with 5 students of 29 proficient declared "Meets AYP" by virtue of minimum-n
- Using an "optimizing" calculation, or "benefit of the doubt" approach, regarding minimum-n, could make reliable judgments about these schools
- School in example could have a maximum of 6 students proficient: would it meet the AMO (with a CI)?
- Aggregate over subgroups (see Utah)
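The "benefit of the doubt" idea can be sketched as follows: fill the group up to minimum-n with hypothetical proficient students, then ask whether even that best case reaches the AMO once a confidence interval is added. Everything specific here is an assumption for illustration: the function name, the one-sided normal-approximation interval, the 99% level, and the AMO values. It is not presented as any state's approved method.

```python
import math
from statistics import NormalDist


def best_case_meets_amo(n_tested, n_proficient, minimum_n, amo, confidence=0.99):
    """Would the group meet the AMO even in its best case?

    Students needed to reach minimum-n are all counted as proficient, and
    the upper bound of a one-sided normal-approximation confidence interval
    is compared with the AMO (a proportion between 0 and 1).
    """
    missing = max(minimum_n - n_tested, 0)
    n = n_tested + missing
    p = (n_proficient + missing) / n
    z = NormalDist().inv_cdf(confidence)
    upper_bound = p + z * math.sqrt(p * (1 - p) / n)
    return upper_bound >= amo


# Slide example: 5 of 29 proficient, minimum-n of 30, so at most 6 of 30.
print(best_case_meets_amo(29, 5, 30, amo=0.50))  # hypothetical AMO of 50%
print(best_case_meets_amo(29, 5, 30, amo=0.30))  # hypothetical AMO of 30%
```

If the best case still falls short of the AMO, the group can be judged as not meeting it even though it is one student below minimum-n, rather than being declared "Meets AYP" by default.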
22. Attend to distribution of students
- Minimum-n attends only to count of students
- But what about distribution of students across subgroups? (see Delaware discussion of multiple group membership)
- Also need to consider representation of subgroups within school (fundamental logic of NCLB that a very small proportion of students can determine a school's standing)
23. Coherence with Subgroup
- Focus on adjustments that increase the validity of the AYP system
- Solve real problems that don't make sense to schools and public (like "small offense, large consequence" and "different offense, same consequence") as well as political problems (like over 80% of districts identified as not meeting AYP)
24. Qualitative Differences in Not Meeting AYP by Subgroups
- Who doesn't meet
- How far from meeting (e.g., Lamitina)
- Assistance/Sanctions to match
25. For more information
- Center for Assessment
- www.nciea.org
- Brian Gong
- bgong_at_nciea.org