Title: NCLB and Growth Models: In Conflict or in Concert?
1NCLB and Growth Models: In Conflict or in Concert?
- Susan L. Rigney, United States Department of Education
- Joseph A. Martineau, Michigan Department of Education
- Presented at the MARCES conference on Longitudinal Modeling of Student Achievement
- College Park, MD
- November 7, 2005
2Introduction
- "In response to your concerns about giving schools credit for improving student achievement, we are also considering the idea of a growth model"
- Margaret Spellings, 9/13/05
3Author Perspectives
- Sue Rigney
- Education Specialist in the office of Student Assessment and School Accountability (Title I) at the U.S. Department of Education.
- Primary responsibility: monitoring state compliance with the standards, assessment and accountability requirements of NCLB
- Secondary responsibility: contributing to ongoing discussion, clarification and implementation of policies related to assessment and accountability.
4Author Perspectives
- Joseph Martineau
- Psychometrician for the Michigan Office of Educational Assessment and Accountability.
- Primary concerns: congruence of accountability systems with values of educational research; adequacy of statistical and psychometric methodology
- Secondary concerns: philosophy and policy of accountability in terms of both practicality and feasibility
- Authorship should not be construed as an endorsement of NCLB as a whole.
5In conflict?
- CRS says
- Substantial interest in the possible use of individual/cohort growth models. Such AYP models are not consistent with certain statutory provisions of NCLB as currently interpreted by USED
- But, NCLB (Sec 4) says
- The Secretary shall take such steps as are
necessary to provide for the orderly transition
to, and implementation of, programs authorized by
this Act
6In concert?
- USED Growth Model Study Group
- IES grant for longitudinal data systems
- State Accountability Workbook Amendments
7Types of Models
- Definitions developed by a State collaborative through CCSSO (Goldschmidt et al., 2005)
- Definitions
- Cross-sectional models
- Status Models
- Improvement Models
- Longitudinal Models
- Growth Models
- Residual Growth (RG) Models
- Commonly labeled Value Added Models
- Why we use the term RG
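As a rough illustration of how these model types differ, the sketch below computes a status measure, an improvement measure, and a growth measure from the same toy data. The scores, the cut score of 300, and all function names are invented for illustration. The working definitions used here (status as percent proficient in one year, improvement as the change in percent proficient between successive cohorts in the same grade, growth as the mean score change for the same students followed across years) are a common reading of these terms and may not match the CCSSO wording exactly.

```python
# Toy data, invented for illustration: scale scores keyed by student id.
CUT = 300  # hypothetical proficiency cut score

grade4_2004 = {"a": 280, "b": 310, "c": 295, "d": 330}  # last year's 4th graders
grade4_2005 = {"e": 305, "f": 290, "g": 320, "h": 315}  # this year's 4th graders
grade5_2005 = {"a": 300, "b": 325, "c": 310, "d": 335}  # same students as grade4_2004


def percent_proficient(scores):
    """Status: share of students at or above the cut in a single year."""
    return 100.0 * sum(s >= CUT for s in scores.values()) / len(scores)


def improvement(earlier_cohort, later_cohort):
    """Improvement: change in status between two different cohorts in one grade."""
    return percent_proficient(later_cohort) - percent_proficient(earlier_cohort)


def mean_growth(before, after):
    """Growth: average score change for the same students matched across years.

    Assumes both years are reported on one common scale, which is itself
    an assumption that needs checking.
    """
    matched = [sid for sid in before if sid in after]
    return sum(after[sid] - before[sid] for sid in matched) / len(matched)


status_2005 = percent_proficient(grade4_2005)             # cross-sectional
improvement_2005 = improvement(grade4_2004, grade4_2005)  # cross-sectional
growth_2005 = mean_growth(grade4_2004, grade5_2005)       # longitudinal
```

The first two measures never link a student to his or her own earlier score; only the third needs matched longitudinal records.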
8The Intersection of Policy and Growth Models
- 3-8 Assessments Provide Longitudinal Data
- Safe Harbor
- Use of Improvement Index in AYP
- CCSSO SCASS Activities
- USED Assistant Secretary Luce
9Systemic Coherence: A Standard for Evaluating Models
- Three broad principles of systemic coherence
- Models are consistent with policy goals
- Models are integrated as a part of a consistent system of content standards, assessments, performance standards, and accountability criteria
- Models are implemented in a manner consistent with the values of educational research
101. Standards-based
- Assessments must cover depth and breadth
- Results expressed in terms of performance levels
- Proficient is most influential component of AYP
112. All Students
- Participate (95% rule)
- Results reported for all
- AYP: Not all visible
- Full Academic Year
- Minimum n
- LEP exemption for ELA test
- Held to same standards
- Alternate based on alternate achievement standards
123. School Improvement
- Annual Measurable Objectives
- Increased in 2004-05
- Adjustment for transition in 2005-06
- School accountable for subgroups
- More visible in 2005-06
- Consequences
- Can/should growth moderate consequences?
13Consistency of Content Standards, Assessments,
Performance Standards, and Accountability Criteria
- Accountability based on academic indicators
- Peer Review of State Assessment Systems
- Alignment
- Performance descriptors
- Alternate assessments
14Coherent Assessment System
- State assessments
- Rational, coherent design
- Relative contribution of different tests
- Matrix forms equivalent
- Comparability
- English vs Spanish
- Computer vs paper pencil
- Local assessments
- Aligned, equivalent, comparable results for
subgroups, aggregable
15Results understandable
- Educators know what to do
- Articulation across grades
- Articulation across performance levels
- A progression matrix that shows
- Proficient is different from basic because
- Proficient in third grade is different from proficient in fourth grade because
- Administrators know how to allocate resources
16Consistency with Values of Educational Research
- As defined by Gregory N. Derry (1)
- Free flow of information
- Curiosity
- Replicability
- Thorough peer review
- Improvement
- Honesty and Open-mindedness
- Willingness to consider multiple alternatives
- Scrupulous investigations of weaknesses
- Flexibility to adopt feasible improvements
(1) Gregory N. Derry is Professor of Physics at Loyola University and author of What Science Is and How It Works (Princeton University Press, 1999)
17Attributes of Systemic Coherence Applicable in
this Context
- Alignment of standards and assessments
- The same performance standards for all
- Inclusion of all student groups
- Explicit tracking of achievement gaps
- Appropriate statistical and psychometric models
- A program of ongoing research
- Consistency of reports with all other attributes
181. Alignment of Standards and Assessments
- Foundation of validity of school accountability decisions
- USED expects independent verification of
- Full range of content standards?
- Address content and process skills?
- Same degree and pattern of emphasis?
- Scores reflect full range of achievement?
- Procedures to maintain/improve?
19Alignment methods
- Alignment Methodology
- Webb (SCASS TILSA)
- Porter (SCASS SEC)
- Achieve
- Buros
- Methods do not address articulation across grades
- JM: Current instantiations of independent review may underestimate alignment
202. The Same Standards for All Students
- Grade-level achievement standards
- Except for students with most significant cognitive disabilities (1%)
- All students proficient by 2013-14
- What about growth toward proficient?
- What about length of time in system?
- Proposals to balance fairness toward both
educators and student groups should also be a
part of any plan to implement growth models for
accountability purposes. Fairness toward one
should not be sacrificed for fairness toward the
other.
212. The Same Standards for All Students
- JM: The NCLB expectation that all students will be proficient by a given date seems unreasonable. The recognition that there will always be individual differences among students (and aggregate differences across schools in their intake populations) should also be incorporated in setting policy targets.
- SR: Safe harbor recognizes that adequate yearly progress may be met with less than 100% meeting annual and long-range goals.
- JM: The safe harbor provision of NCLB is a good beginning, but does not fully account for these realities.
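For readers unfamiliar with safe harbor, the sketch below shows only the core arithmetic as it is commonly described: a group that misses the annual target can still make AYP if its percentage of non-proficient students falls by at least 10 percent relative to the prior year. The function name and the example percentages are hypothetical, and the other statutory conditions (for example, progress on another academic indicator and the participation requirement) are deliberately left out.

```python
def meets_safe_harbor(pct_not_proficient_prior, pct_not_proficient_now,
                      required_reduction=0.10):
    """Check the relative-reduction part of safe harbor only.

    required_reduction is a relative reduction (0.10 = 10 percent of last
    year's non-proficient percentage), not a drop in percentage points.
    Other conditions attached to safe harbor are not modeled here.
    """
    if pct_not_proficient_prior <= 0:
        return True  # no students were below proficient last year
    target = pct_not_proficient_prior * (1 - required_reduction)
    return pct_not_proficient_now <= target


# Hypothetical subgroup: 60% not proficient last year, so the target is 54%.
made_it = meets_safe_harbor(60.0, 53.0)  # 53% is at or below 54%
missed = meets_safe_harbor(60.0, 57.0)   # 57% is above 54%
```

The example makes the SR point concrete: the subgroup in `made_it` is still 53% non-proficient, far from the annual goal, yet it satisfies this part of the test.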
222. The Same Standards for All Students
- JM: The punitive nature of NCLB consequences can actually undermine policy objectives by adding turbulence to schools serving low-achieving students.
- SR: The pressures of accountability have resulted in remarkable successes (Ed Trust), and there are multiple safeguards to prevent Type I error.
- JM: The multiple safeguards are an important start, but policies that encourage more assistance to low-achieving schools, and that attract highly effective educators to them, are more likely to support the policy objectives.
- SR: NCLB funds are available for recruitment and retention bonuses, and data indicate that states are beginning to use these funds in this way.
23Implications for growth model
- Expectation of same growth for all maintains achievement gap
- Expectation of 12 months growth in 1 year maintains achievement gap
- Expectation of normative growth maintains achievement gap
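The arithmetic behind these three bullets can be sketched in a few lines. The starting scores and annual gains below are invented, and growth is assumed to be linear purely for simplicity: if two groups are held to the same expected gain, the difference between them never changes, and the gap closes only when the lower-scoring group is expected to gain more per year.

```python
def project(start, annual_gain, years):
    """Project a group's mean score forward under a constant annual gain."""
    return [start + annual_gain * y for y in range(years + 1)]


def gap_by_year(higher, lower):
    """Year-by-year difference between two projected group means."""
    return [h - l for h, l in zip(higher, lower)]


# Hypothetical group means on a common scale.
higher_group = project(300, 10, 4)
lower_same_growth = project(260, 10, 4)    # same expected gain as higher group
lower_faster_growth = project(260, 20, 4)  # larger expected gain

gap_same = gap_by_year(higher_group, lower_same_growth)
gap_faster = gap_by_year(higher_group, lower_faster_growth)
```

Here `gap_same` stays at 40 points in every year, while `gap_faster` shrinks by 10 points a year until it reaches zero in year 4.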
243. Inclusion of All Student Groups
- Missing data means missing students
- How many missing students does it take to compromise validity?
- Robustness to missing data does not imply that it is OK to leave out data where it can reasonably be obtained
254. Explicitly Tracking Achievement Gaps
- Closing the achievement gap is a
- Policy objective
- Matter of ethics
- Attainable
- Tracking the achievement gap makes inequities
publicly visible
264. Explicitly Tracking Achievement Gaps,
continued
- Separate models from those used to track attainment of growth targets
- Include in the model variables identifying policy-defined subgroups
- Interaction of grade with subgroup variables
- Simple graphical representation of the results
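One way to read the grade-by-subgroup interaction is as the difference between the two groups' score trends across grades. The sketch below is a minimal illustration of that idea, not the authors' model: the records are invented, a single 0/1 subgroup flag stands in for the policy-defined subgroups, and the trend is a plain least-squares line. In a linear model with grade, subgroup, and their product as predictors, the coefficient on the product equals the difference between the slopes fitted separately to each group, which is what the code computes.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented records: (grade, subgroup flag, scale score).
records = [
    (3, 0, 300), (3, 0, 310), (3, 1, 270), (3, 1, 280),
    (4, 0, 315), (4, 0, 325), (4, 1, 290), (4, 1, 300),
    (5, 0, 330), (5, 0, 340), (5, 1, 310), (5, 1, 320),
]


def slope_for(flag):
    """Score trend across grades for one group."""
    rows = [(g, s) for g, f, s in records if f == flag]
    return ols_slope([g for g, _ in rows], [s for _, s in rows])


def mean_score(grade, flag):
    scores = [s for g, f, s in records if g == grade and f == flag]
    return sum(scores) / len(scores)


# Positive interaction: the flagged subgroup gains more per grade.
interaction = slope_for(1) - slope_for(0)

# Gap at each grade, suitable for a simple line or bar chart.
gap_by_grade = {g: mean_score(g, 0) - mean_score(g, 1) for g in (3, 4, 5)}
```

With these toy numbers the gap falls from 30 to 25 to 20 points, matching an interaction of 5 points per grade; `gap_by_grade` is the kind of quantity a simple graphical report could plot.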
275. Appropriate Statistical and Psychometric Models
- Statistical concerns
- Match of model to data structure
- Violations of assumptions
- Do random effects models cheat?
- How do we integrate results from alternate assessments?
- What is the sample, and what is the population?
- Different models needed for different purposes
- Meeting growth targets
- Tracking achievement gaps
- Primary research
285. Appropriate Statistical and Psychometric Models
- Statistical concerns
- Are the models correlational or causal? The mandated data collection is correlational.
- JM: The mandated policy uses are more causal. The descriptive statistics are used to label schools as in need of improvement, and if students are not achieving reasonable goals, it is hard to argue with this label. However, the distinction between schools in need of improvement and ineffective educators is unlikely to be either fathomed or appreciated by many people. The nature of NCLB consequences invites the unfounded interpretation that a school in need of improvement has ineffective educators.
- SR: The statute provides substantial resources for professional development and instructional materials in order to help educators meet the extraordinary needs of the children they serve.
295. Appropriate Statistical and Psychometric
Models, continued
- Unwarranted assumptions
- No equating error
- Vertical: Doran (2005)
- Horizontal: not studied, but most assessments only have a few anchor items in common across years
- Interval-level scale
- If using scale scores, most models assume equal interval measurement
- Psychometrically suspect
- Effects not well studied
305. Appropriate Statistical and Psychometric
Models, continued
- Unwarranted assumptions, continued
- A single continuous scale on the same construct across grades (vertical or developmental scales)
- Mathematical demonstrations (Martineau, 2004, in press)
- We purposely build content shift into our assessments across grades
- High correlations among sub-constructs do not take care of the problem
- Students whose growth is occurring outside the curriculum-defined range for the grade are not measured well
- Effects of prior schools/grades become attributed to later schools/grades
- Practically significant effects of the misattributions occur in all reasonably conceivable assessment scenarios
- Empirical validation (Lockwood et al., under peer review)
- Subscales of math assessment: greater variability within teacher across subscales than across teachers within subscale
- Low correlations in value added across subscales
- The sub-content matters tremendously
315. Appropriate Statistical and Psychometric
Models, continued
- Unwarranted assumptions, continued
- We need to account for equating error
- We need to study the effects of the interval-level measurement assumption and either
- Validate the assumption, or
- Not make the assumption
- We need to either
- Develop psychometric models that can account for change in content across grades, or
- Not assume the same content across grades
- Analytical models that avoid scale assumptions
- Hill's Value Table approach (this conference)
- Betebenner transition matrix approach (2005)
- Standards-based interpretations, can use baseline
data
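To make the idea of avoiding scale assumptions concrete, the sketch below scores a school from movements between performance levels rather than from scale-score differences. It is written in the spirit of a value table and a transition matrix, not as a reproduction of either presenter's procedure: the four level names, the point values, and the function names are all hypothetical. Because only the ordered levels are used, nothing depends on an interval-level or vertically linked scale.

```python
from collections import Counter

# Hypothetical point values, invented for illustration only.
# VALUE_TABLE[prior_level][current_level] -> points credited to the school.
VALUE_TABLE = {
    "below basic": {"below basic": 0, "basic": 100, "proficient": 150, "advanced": 200},
    "basic":       {"below basic": 0, "basic": 50,  "proficient": 125, "advanced": 175},
    "proficient":  {"below basic": 0, "basic": 25,  "proficient": 100, "advanced": 150},
    "advanced":    {"below basic": 0, "basic": 0,   "proficient": 75,  "advanced": 100},
}


def transition_counts(pairs):
    """Count students in each (prior level, current level) cell."""
    return Counter(pairs)


def value_table_score(pairs):
    """Average the table values over all matched students."""
    return sum(VALUE_TABLE[prior][current] for prior, current in pairs) / len(pairs)


# Each tuple is one matched student: (last year's level, this year's level).
students = [
    ("basic", "proficient"),
    ("below basic", "basic"),
    ("proficient", "proficient"),
    ("advanced", "proficient"),
]

counts = transition_counts(students)
school_score = value_table_score(students)
```

The point values are where the policy judgments live: choosing them amounts to deciding how much credit each kind of movement deserves, which is a standards-based decision rather than a statistical one.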
326. An Ongoing Program of Research
- A turbulent field (in its adolescence, to quote Lissitz)
- Large-scale implementation in a turbulent field requires extraordinary flexibility to keep up with the state of the art
- And yet, too much flexibility can thwart useful interpretation of trend data
337. Consistency of Reports with Other Attributes
- Responsive to instruction?
- Understandable to stakeholders?
- Grounded in policy aims?
- Valid and reliable?
34Setting standards for growth
- What's reasonable?
- vs
- What do we hope to accomplish?
- What's fair?
35Growth and school consequences
36Conclusions
- Can we add growth?
- Yes!
- Should we add growth?
- Yes, where there is an evaluative framework tied to policy objectives, a systemic approach, and alignment with the values of educational research
- Must we add growth?
- An option, not a requirement, because of the extraordinary infrastructure it requires
37Recommendations for Policymakers
- Understand the basic differences between models
- Run simulations with real data
- Understand the limitations
- Listen to practitioners
- Listen to methodologists
- Anticipate cost/benefits
- Lack of stability corrupts meaning
- Do not over-specify the details in statute
- This field moves ahead quickly
- Flexibility to implement advances is key
38Recommendations for Accountability Implementation
Staff
- State Directors: give your staff time to write it up!!
- Require greater detail in the Technical Manuals that allows for comprehensive review of the procedures
- Explain it (as much as you can) to your legislators and Congresspersons
- Challenge assumptions
- Status quo is good
- Change is good
- Resource assumptions
- Claims of proponents
39Recommendations for Technical Researchers
- Validity need not conflict with transparency
- Validity
- Maintain sufficient complexity to produce valid results
- Transparency for non-technical stakeholders
- Simple, but accurate reports
- Grounded interpretations
- Transparency for technical stakeholders
- Comprehensive documentation of the entire system, including psychometric and statistical models
- Facilitation of replication
- Facilitation of primary research on strengths and weaknesses
40Recommendations for Technical Researchers
- Pay systemic attention to
- Assumptions of psychometric models
- Assumptions of content standard models
- Assumptions of statistical models
- Think carefully about what the models can tell us and cannot tell us about instruction, curriculum, and student development
- Develop simple graphical representations of the model and its important concepts for policymaker consumption
- Become involved in public policy forums as a community lobby in order to promote appropriate interpretation of data.
- We cannot give our cautions, wash our hands of how the data are used, and stand on the outside of the political process
41Recommendations for All Stakeholders
- Realize that with all of the high stakes surrounding accountability uses of student achievement data, there are forces that can work against community interests
- Economic benefits, reputations, and other personal investments can cause proponents of specific systems to avoid scrupulous investigations of the shortcomings of those systems and/or the benefits of competing approaches
- Willingness to be, and accountability for being, rigorously honest and open-minded about multiple approaches is an essential part of improving and evaluating growth-based accountability systems