Title: Test-Based Accountability Systems: Asking the Right Questions
1. Test-Based Accountability Systems: Asking the
Right Questions
- Laura Hamilton
- RAND
- February 1, 2003
- Presented at the Annenberg School for
Communication, University of Southern California
2. Testing and information are cornerstones of state
and federal education policy
- Information is power: testing and gathering
independent data are the ways to get information
into the hands of parents, educators and
taxpayers.
- Until teachers and parents recognize what their
students know and can do, they can't help them
improve. Testing will raise expectations for all
students and ensure that no child slips through
the cracks.
- Accountability begins with informed parents,
communities and elected leaders so we can work
together to improve schools.
- (Source: U.S. Department of Education NCLB web
site)
3. NCLB mandates state-level test-based
accountability (TBA) systems
- Key components
  - Standards that communicate what students must
    learn
  - Tests to measure attainment of those standards
  - Publication of information from tests
  - Systems of consequences attached to test scores
- TBA is often treated as synonymous with
accountability
- Information value of tests is emphasized
- All stakeholders are believed to benefit from
testing and the information it provides
4. TBA is neither a panacea nor a curse
- As with any education reform, the design and
implementation of test-based accountability
policies are subject to enormous variation.
- Effects have not been uniformly positive or
negative.
- Debates typically fail to address the importance of
policy variation, implementation differences, and
local context.
5. What do we know about tests?
- Extensive variation across states
  - Format (multiple-choice, essay)
  - Subjects tested
  - Methods for score reporting (norm-referenced,
    criterion-referenced)
- Scores may be unstable even when technical
quality (e.g., reliability) is high
- Alignment between tests and standards is often
weak or unmeasured
- Quality of tests is key, but may be undermined by
insufficient capacity.
6. What do we know about TBA?
- Scores often rise when stakes are introduced
- Information value of scores is limited
  - Scores become inflated
  - Teachers do not find most existing tests useful
    for instructional purposes.
- Testing influences classroom activities
  - Tests have stronger influence than standards
- Targets are often unrealistic
- ...but TBA systems may be designed to address
problems and maximize benefits.
7. Rise in scores
- In almost every case, scores have risen for the
first few years after a TBA system is introduced
- Makes TBA a relatively inexpensive way for
policymakers to demonstrate progress
- Difficult to separate the effects of TBA from
other policy initiatives that occur at the same
time
- Question: What evidence is there to support
inferences about the positive effects of TBA?
8. Information value of tests
- Score inflation is common
- Audit mechanisms can help (e.g., NAEP)
- However, discrepancies in test-score trends are
difficult to understand without mapping tests to
standards
- Teachers typically do not rely heavily on
information from standardized tests.
- There are efforts to improve information value of
tests for teachers
- Supplementary assessment system may help
- Information value of tests for other stakeholders
is unknown
- Questions: How do stakeholders use test data,
and what changes are needed to make data more
informative? What inferences do users make from
scores?
9. Test scores rise, but increases do not always
generalize
(Figure not reproduced. Source: Linn, 1999)
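The audit idea from the information-value slide can be made concrete with a small sketch. It assumes, hypothetically, that mean scores and a baseline standard deviation are available for both the high-stakes test and an audit test such as NAEP. The function names and all numbers below are illustrative and are not taken from Linn (1999) or from any state's data.

```python
def standardized_gain(mean_before, mean_after, sd_before):
    """Change in mean score, in units of the baseline standard deviation."""
    if sd_before <= 0:
        raise ValueError("baseline standard deviation must be positive")
    return (mean_after - mean_before) / sd_before


def gain_gap(high_stakes_gain, audit_gain):
    """Portion of the high-stakes gain that the audit test does not show."""
    return high_stakes_gain - audit_gain


# Hypothetical numbers, for illustration only.
state_gain = standardized_gain(mean_before=250.0, mean_after=262.0, sd_before=30.0)
audit_gain = standardized_gain(mean_before=220.0, mean_after=223.0, sd_before=30.0)
gap = gain_gap(state_gain, audit_gain)

print(f"state test gain: {state_gain:.2f} SD")
print(f"audit test gain: {audit_gain:.2f} SD")
print(f"unmatched gain:  {gap:.2f} SD")
```

A large unmatched gain is a signal worth investigating rather than proof of inflation: as the slides note, the two tests may simply cover different content, which is why mapping tests to standards matters.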
10. Influence on curriculum and instruction
- Teachers and administrators respond to testing by
shifting curriculum and instruction toward tested
content and away from untested content
- School personnel learn how to game the system
- These actions affect validity of information
- Questions: What changes have teachers and
principals made in response to tests? What do
teachers perceive as key leverage points? What
resources are teachers given to promote positive
changes?
11. Teachers' responses to tests can affect validity
of scores
Positive Teacher Responses
1. Providing more instructional time
2. Working harder to cover more material
3. Working more effectively
Ambiguous Teacher Responses
4. Reallocating classroom instruction time
5. Aligning instruction with standards
6. Coaching students to do better by focusing
instruction on incidental aspects of the test
Negative Teacher Responses
7. Cheating
Based on Koretz, McCaffrey and Hamilton, 2001
12. Practices are affected more by tests than by
standards
Ideal model (diagram components):
- Content and Performance Standards
- Testing Program
- School Policy: curriculum, professional
development, instructional materials
- Classroom Practices: curriculum emphasis,
instructional strategies, student groupings
- Student Outcomes: knowledge, skills, attitudes
13. Practices are affected more by tests than by
standards
Reality (diagram components):
- Content and Performance Standards
- Testing Program
- School Policy: curriculum, professional
development, instructional materials
- Classroom Practices: curriculum emphasis,
instructional strategies, student groupings
- Student Outcomes: knowledge, skills, attitudes
14. Targets are difficult to reach
- NCLB and many state policies demand gains that
have never been achieved before
- Burden is highest on initially low-performing
schools
- Expectation for universal proficiency fails to
address outside influences
- Questions: How are low-performing schools
attempting to meet targets? Do staff view
targets as realistic and attainable?
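The point that the burden falls hardest on initially low-performing schools follows from simple arithmetic, sketched below. The function name, the two schools, their starting proficiency rates, and the 12-year horizon are all hypothetical values chosen for illustration. The sketch also assumes a straight-line path to the target, which is a simplification and not a description of any state's actual rules.

```python
def required_annual_gain(current_pct, years, target_pct=100.0):
    """Average yearly gain (percentage points) needed to reach the target."""
    if years <= 0:
        raise ValueError("years must be positive")
    # A school already at or above the target needs no further gain.
    return max(target_pct - current_pct, 0.0) / years


# Hypothetical schools and horizon, for illustration only.
YEARS_TO_TARGET = 12
starting_proficiency = {"School A": 30.0, "School B": 70.0}

for name, pct in starting_proficiency.items():
    gain = required_annual_gain(pct, YEARS_TO_TARGET)
    print(f"{name}: starts at {pct:.0f}% proficient, "
          f"needs about {gain:.1f} points per year")
```

Under these assumptions the school starting at 30 percent proficient must sustain more than twice the yearly gain of the school starting at 70 percent, every year, for the whole period.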
15. NCLB relative gains model
16. Conditions required for TBA to work
- Solvable problems
- Attainable, publicly-endorsed standards
- High-quality information
- Salient and appropriate incentives
- Effective intervention
- External political environment supportive of
reform
17. Solvable problems
- TBA policies rely on incentives and
locally-developed interventions to fix failing
schools.
- The assumption is that problems are due either to
unmotivated staff or to staff who don't know how
to improve their effectiveness.
- TBA may not work if problems stem from other
sources (e.g., high levels of mobility or other
external factors that affect students' ability to
learn; severe lack of resources in schools).
18. Attainable, publicly-endorsed standards
- Quality of standards is central, but their
effectiveness also depends on how they are
communicated and measured.
- Must reflect some degree of public consensus
regarding what knowledge and skills are valued.
- Must be perceived as attainable by teachers,
parents, students, administrators.
19. High-quality information
- Tests must measure student attainment of
standards with a sufficient degree of reliability
and validity.
- Evaluating technical quality of tests is
difficult: the appropriate way of measuring
reliability, for example, depends on how the test
was constructed and what kinds of scores are
reported.
- Alignment between tests and standards is
critical, but there is disagreement on what
alignment means and how to measure it.
- System can be designed to reduce (but not
eliminate) score inflation.
- Information must be communicated in a way that
meets stakeholders' needs.
- Limitations must also be communicated.
20. Salient and appropriate incentives
- Incentives must motivate improved performance on
the part of educators and must be perceived as
meaningful and important.
- Incentive system must recognize role of students
and families in influencing outcomes.
- System must balance individual- and group-level
incentives.
- Effort must be made to reduce incentives that
encourage undesirable actions.
21. Effective intervention
- Intervention strategies must work better than
what schools are currently doing.
- States or districts must have resources to
intervene in all necessary cases.
- Interventions must not promote narrow test
preparation or other undesirable practices.
22. External political environment supportive of
reform
- TBA functions in a broader political environment
- Design of TBA systems is influenced by political
considerations
- Understanding the broader context is critical for
understanding how TBA will work in practice