Title: Cultivating Success in Fertile Soil
1. Cultivating Success in Fertile Soil
Raising Expectations and Outcomes for Students with Disabilities through Assessment and Accountability Systems
Martha Thurlow, Rachel Quenemoen, Sandy Thompson, John Bielinski, and Jane Minnema
National Center on Educational Outcomes (NCEO), University of Minnesota
2. Clinic Agenda
- Part I: Separate the Wheat from the Chaff
- Part II: Plow New Ground
- Part III: Apply the Best Fertilizers
- Part IV: Harvest a Rich Return
3. NCEO Resources
Visit education.umn.edu/nceo or search for NCEO
4. Part I. Separate the Wheat from the Chaff
5. In 1989, the Education Summit set an agenda for
education reform that called for
- Higher Expectations
- Rigorous Educational Standards
- Assessments of Progress toward Standards
NCEO was funded in 1990 to look at the
educational outcomes of students with disabilities
6. IDEA 97: New Assessment Provisions
- Participation of students with disabilities in state and district assessments
- Alternate assessments for those students unable to participate in general state or district assessments
- Inclusion of disaggregated participation and performance data of students with disabilities in public reports whenever data are provided for all students
7. Title I Includes ALL Students
- All eligible students can receive Title I services, regardless of other services provided
- Title I evaluation is based on statewide assessment, which is to include all students
- States must report statewide data, with disaggregations for students with disabilities, LEP students, and other groups
- States must define adequate yearly progress (AYP) and evaluate schools against AYP
8. Common Themes Include
- Participation of ALL students in state and district assessments
- Reported information about the performance of special populations, relative to other students
- Measurement against consistent goals and standards for ALL students (to the maximum extent appropriate)
9. Standards-Based Reform Context
High Standards
All Students
--- Everything else is negotiable ---
schedules, place, time, structure, curriculum, methods of assessment, instructional methods . . .
AcCOUNTability
10. Accountability System Components
- Goals (Content Standards)
- Indicators of Success (Performance Standards)
- Measures of Performance (Assessment System)
- Reporting
- Consequences
11. Clarification of Assessments
- Eligibility Assessments
- Classroom Tests
- Large-Scale Assessments (Districtwide, Statewide, National)
12. VARYING Context of State Assessments
- Some measure basics, others high standards
- Some are high stakes for students, some high stakes for systems, some are both
- Grades administered vary, as do content areas (all have Reading and Math)
- Some are norm-referenced, some are criterion-referenced, and some are both
- Varying approaches to accommodations and alternate assessments
13. Principles of Inclusive Assessment and Accountability Systems
- Principle 1. All students with disabilities are included in the assessment system.
- Principle 2. Decisions about how students with disabilities participate in the assessment system are the result of clearly articulated decision-making processes.
14. Principles of Inclusive Assessment and Accountability Systems
- Principle 3. All students with disabilities are included when student scores are publicly reported, in the same frequency and format as all other students.
- Principle 4. The assessment performance of students with disabilities has the same impact on the final accountability index as the performance of other students.
15. Principles of Inclusive Assessment and Accountability Systems
- Principle 5. There is improvement of both the assessment system and the accountability system over time.
- Principle 6. Every policy and practice reflects the belief that all students must be included in state and district assessment and accountability systems.
16. Part II. Plow New Ground
17. High Stakes Testing
- Student Accountability: students are held responsible and consequences are assigned to them (e.g., must pass a test to graduate or move to the next grade). 20 states
- System Accountability: educators, schools, or districts are held responsible and consequences are assigned to them (e.g., schools rated according to test scores, teachers receive rewards for student performance). 38 states
18. Definitions
- Norm-Referenced Test (NRT): A test that allows its users to make score interpretations of a test taker's performance in relationship to the performance of other people in a specified reference population.
- Criterion-Referenced Test (CRT): A test that allows its users to make score interpretations in relation to a functional performance level, typically through cut score definitions.
- From OCR Resource Guide, December 2000
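The contrast between the two definitions can be sketched numerically. The scores and cut points below are illustrative, not from the slides: an NRT interpretation locates a score within a norm group, while a CRT interpretation locates it against fixed cut scores.

```python
# Hypothetical norm-group scores and proficiency cut scores.
norm_group = [48, 52, 55, 60, 63, 67, 70, 74, 78, 85]
cut_scores = [("Level 1", 0), ("Level 2", 55), ("Level 3", 65), ("Level 4", 80)]

def percentile_rank(score):
    """NRT view: percent of the norm group scoring at or below this score."""
    return 100 * sum(s <= score for s in norm_group) / len(norm_group)

def performance_level(score):
    """CRT view: highest level whose cut score the student has reached."""
    reached = [label for label, cut in cut_scores if score >= cut]
    return reached[-1]

# The same raw score of 67 yields two different kinds of statement:
print(percentile_rank(67))     # 60.0 (at or above 60% of the norm group)
print(performance_level(67))   # Level 3 (reached the Level 3 cut score)
```

The same score supports both interpretations; what differs is the reference (other examinees vs. a fixed performance standard).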
19. Norm-Referenced Tests vs. Criterion-Referenced Tests
[Diagram: a norm-referenced score distribution contrasted with criterion-referenced proficiency levels 1-4 and a "Proficient" cut point.]
20. Reliability
Reliability is an index of the precision with which an examinee's score is estimated with a particular set of items.
[Diagram: three targets illustrating (1) not reliable or valid, (2) reliable but not valid, (3) reliable and valid.]
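One standard way to express this precision, from classical test theory (not shown on the slide, but implied by the definition above), is the standard error of measurement, which converts a reliability coefficient into a score-point margin of error:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Example with made-up figures: a test with score SD = 15 and
# reliability 0.91 estimates each examinee's score within roughly
# +/- 4.5 points (one SEM).
print(round(standard_error_of_measurement(15.0, 0.91), 1))  # 4.5
```

A lower reliability widens this band, which is why imprecise scores cannot support fine-grained decisions about individual students.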
21. Validity
Validity is the degree to which test scores accurately reflect the types of inferences made.
[Diagram comparing score distributions under non-accommodated and accommodated conditions.]
22. Test score utility for school improvement, resulting in improved outcomes for students with disabilities, depends on alignment with standards.
[Diagram: Content Standards linking the standardized test to curriculum and instruction improvement.]
23. When the Numbers Are Not Enough
- IDEA and IASA require states to report the number of students with disabilities participating in the regular assessment, and the number participating in the alternate assessment
- Two important numbers
24. How does the performance of students receiving special ed. services compare to the performance of all students?

Math Proficiency | Number of students | Number receiving SpEd
Partially Proficient | 300 | 70
Proficient | 250 | 20
Advanced | 50 | 10

Now, look how easy it is to compare groups when percentages are reported in each category.
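The percentage comparison the slide recommends can be sketched directly from the counts above (300/250/50 for all students, 70/20/10 for students receiving special education services):

```python
# Counts by math proficiency level, from the slide's table.
all_students = {"Partially Proficient": 300, "Proficient": 250, "Advanced": 50}
sped_students = {"Partially Proficient": 70, "Proficient": 20, "Advanced": 10}

def to_percentages(counts):
    """Convert raw counts to the percent of the group in each category."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}

print(to_percentages(all_students))
# {'Partially Proficient': 50.0, 'Proficient': 41.7, 'Advanced': 8.3}
print(to_percentages(sped_students))
# {'Partially Proficient': 70.0, 'Proficient': 20.0, 'Advanced': 10.0}
```

Raw counts (300 vs. 70) are hard to compare because the groups differ in size; percentages put both groups on the same scale and make the performance gap visible at a glance.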
25. Who is counted affects interpretation of results
26. Louisiana Data (NRT)

Grade | IEP In | IEP Out | 504 In | 504 Out
3 | 5.7 | 94.3 | 12.8 | 87.2
5 | 5.8 | 94.2 | 15.4 | 84.6
6 | 6.1 | 93.9 | 19.2 | 81.8
9 | 11.2 | 88.8 | 22.6 | 77.4
27. Assessment Participation Rates of Students with Disabilities
- Rates vary tremendously across states
- The lowest rate is 15%
- The highest rate is 100%
28. 14 states disaggregated data on the participation of students with disabilities; 17 states disaggregated data on the performance of students with disabilities.
29. Current Research and Technical Challenges
- Accommodations and Modifications
- Alternate Assessment
- Out-of-Level Testing
- Other GAP Assessments
30. Accommodations and Modifications: Fertilizer Guaranteed to Boost Yield
31. Accommodation Use
- Is on the rise
- About 50% of students with LD are accommodated
- Most common accommodations are:
  - small group administration
  - read-aloud
  - extended time
- Accumulating evidence from experimental studies indicates that some accommodations boost performance
32. The Metaphor
- An accommodation is a change in testing materials or procedures that:
  - increases access to the test for students with disabilities
  - results in measurement of student abilities, not disabilities
  - levels the playing field
33. Psychometric Definition
- An accommodation represents an alteration to standard test conditions that neutralizes extraneous sources of difficulty that result from an interaction between standard administration and the student's disability, while preserving the measurement goals of the test.
34. Example
[Chart: an example score around 85, shown against a boost scale of 0, 5, and 10 points.]
35. Test Score Boost
- An accommodation should boost performance for students with disabilities but not for students without disabilities
  - necessary but NOT sufficient
- Since 1995, there have been 38 empirical studies of test score boost and 6 studies examining construct validity
- Single subject design:
  - test individuals under many conditions
  - use very short (usually single item) performance tests
  - look for accommodations that result in large boost
  - do not account for measurement error in the comparison of performance
36. Preserving Measurement Goals
- The construct the test was designed to measure should remain unchanged by the presence of an accommodation
- Requires construct validation studies:
  - test score boost
  - associations with other measures
  - invariance of the item characteristics
- Difficult to do with small samples
- Extant data are well suited for construct validity study:
  - large samples
  - real-world
  - less expensive/time-consuming
37. Construct Validation
- Compare item characteristics across groups:
  - Differential Item Functioning (DIF) analysis
  - Structural Equation Modeling
- Four possibilities if an effect is found:
  - Accommodation not appropriately administered
  - Accommodation not administered to appropriate population
  - Accommodation doesn't work
  - Some combination of these
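One common DIF statistic is the Mantel-Haenszel odds ratio, sketched below with made-up counts (this is an illustration of the general technique, not the analysis reported in these slides). Within each total-score stratum, it compares the odds of answering a studied item correctly in a reference group (e.g., non-accommodated) vs. a focal group (e.g., accommodated):

```python
# Each stratum groups examinees with similar total scores:
# (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
strata = [
    (30, 20, 14, 11),   # low total-score stratum
    (45, 15, 22, 8),    # middle stratum
    (50, 5, 24, 3),     # high stratum
]

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio pooled across total-score strata."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n   # reference-correct x focal-incorrect
        den += b * c / n   # reference-incorrect x focal-correct
    return num / den

# A value near 1.0 means no evidence of DIF on this item: examinees of
# comparable overall ability answer it correctly at comparable rates.
print(round(mantel_haenszel_odds_ratio(strata), 2))
```

Matching on total score is what separates true item-level functioning differences from overall ability differences between the groups.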
38. Research Findings
- DIF analysis across four groups:
  - Non-disabled, non-accommodated
  - Low performing, non-disabled, non-accommodated
  - Reading disabled, no read-aloud accommodation
  - Reading disabled, with read-aloud accommodation
- Results [presented in a chart on the original slide]
39. Alternate Assessment: for those students unable to participate in general state assessments
New part of state and district assessment systems
- Did not exist in most places before IDEA
- Lots of activity in the past year!
40. Alternate Assessments are intended to provide the missing piece that makes it possible to include ALL students with disabilities. Many states have found the need for more than one missing piece.
41. Focus of Alternate Assessments is Evolving

Number of States | '99 | '00 | '01
State Standards/Expanded | 19 | 28 | 19
Skills Linked to Standards | -- | 3 | 14
Standards + Additional Skills | 1 | 7 | 9
Skills Only | 16 | 9 | 4
Other or Uncertain | 24 | 3 | 3
42. As Focus Evolves, So Does Assessment Decision-Making Process
43. Example from MA training: Who should take MCAS-Alt?
- A student with a disability:
  - who requires substantial modifications to instructional level and learning standards in a content area, and
  - who requires intensive, individualized instruction in order to acquire and generalize knowledge, and
  - who is unable to demonstrate achievement of learning standards on a paper-and-pencil test, even with accommodations
44. Variations in Approach
- Body of Evidence/Portfolio: 24 states
- Checklist: 9 states
- IEP team determines strategy: 4 states
- IEP analysis: 3 states
- Combination of strategies: 4 states
- Specific performance assessment: 4 states
- No decision: 2 states
45. Stakeholders Bring Different Values and Beliefs to the Table
- Alternate assessment developers in nearly all states included:
  - State special education and assessment personnel
  - Local administrators, special and general educators, assessment coordinators, and related service providers
  - Parents and advocates
- A few states included students and adults with disabilities
46. Variations in Student Performance Measures
- Skill/competence: 40 states
- Independence: 32 states
- Progress: 24 states
- Ability to generalize: 18 states
- Other: 7 states
47. Variations in System Performance Measures
- Variety of Settings: 21 states
- Staff Support: 20 states
- Appropriateness (e.g., age, challenge): 20 states
- Gen. Ed. Participation: 12 states
- Parent Satisfaction: 9 states
- No system measures: 8 states
48. Example: Arkansas Scoring Domain Definitions
- Performance: demonstration of skill while attempting a given task. Each entry is scored.
- Support: assistance provided to a student during performance of tasks. Each entry is scored.
- Appropriateness: the degree to which the tasks (1) reflect the chronological age of a student, (2) provide a challenge for the student, and (3) are representative of real-world activities that promote increased independence. Each entry is scored.
- Settings: settings or environments in which tasks are administered/performed, for math entries and for ELA entries. Scored once for each content area across entries.
49. Alternate Assessment Performance Descriptors
About one-third of states are using the same performance descriptors for their alternate and general assessments. Slightly more states are using different performance descriptors.
50. Absolute vs. Relative Performance Standards
- Some states emphasize measurement against absolute standards over the relative emphasis on individualized needs and abilities.
- In these states, most students participating in the alternate assessment are performing at the 0 or 1 levels.
- Other states have a separate definition of performance levels for the alternate assessment that emphasizes student-by-student growth of skill toward the relative standard based on the high expectation bridge, not in comparison to absolute standards.
- With this approach, student results can be at any of the proficiency levels.
51. Number of States Reporting Alternate Assessment Results
- Blended with General Assessment Results: 1
- Results Reported Separately: 0
52. Out-of-Level Testing: What Does it Offer?
- Semantics: out-of-level/functional level; alternative/alternate
- Standards-Based Measurement: assess proficiency against curriculum standards; use proficiency levels; dissatisfaction about the inability to detect progress for students in the lowest proficiency level
53. Proficiency Standards: the technical view
54. Measuring Progress
- Further splitting the bottom level compounds already unreliable measurement
- Unreliable measurement of achievement leads to unreliable measurement of progress
- If measuring progress is important, and measuring progress within groups (e.g., SpEd) is required, then we need reliable measurement for all kids
55. PRECISION
- Precision (reliability) is the cap of validity: poor precision means poor validity
- Precision decreases exponentially as test scores move toward the tails
- There are too few items to indicate what the examinee can and cannot do
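The claim that precision collapses in the tails can be illustrated with a small item response theory sketch. The item difficulties below are hypothetical and a Rasch model is assumed: an item is most informative when its difficulty matches the examinee's ability, so a test whose items cluster near average difficulty measures extreme examinees poorly.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response at ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sem(theta, difficulties):
    """Standard error of measurement = 1 / sqrt(test information)."""
    info = sum(p * (1 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return 1.0 / math.sqrt(info)

# 40 items clustered near average difficulty, as on a typical grade-level test.
items = [-1.0, -0.5, 0.0, 0.5, 1.0] * 8

print(round(sem(0.0, items), 2))   # SEM near the middle of the scale
print(round(sem(-3.0, items), 2))  # SEM for a very low-performing examinee: about twice as large
```

This is the technical motivation behind linked multi-level tests: placing some items near a low-performing examinee's ability restores information, and hence precision, in the tail.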
56. Less Precision in the Tails
57. Precision Increased with 3 Linked Tests
58. Multilevel System
- Requires larger item banks
- Requires linking (e.g., concurrent calibration)
- Concepts and content should overlap across levels
- Requires a mechanism for assignment to levels, with safeguards to ensure appropriate assignment
59. Out-of-Level Testing: A policy and practice view
- "...the administration of a test at a level above or below the level that is generally recommended for a student based on his or her age or grade." (Study Group on Alternate Assessment, 1999)
60. Out-of-Level Controversy
- Surrounded by contentious issues
- Debated at federal, state, district, and school levels
- Opinions vary across and within multiple stakeholder groups
61. Caution signs: OOLT in standards-based settings
- While the psychometric basis for out-of-level testing may apply to instructional assessments, the logic may not hold up when measuring against standards.
- The consequences of out-of-level testing have not been adequately addressed: does performance begin to plateau? Do expectations drop over time, further affecting instruction?
62. Expansion and Variability
- Rapid expansion of out-of-level testing programs
- Wide variability in policy content and
implementation practices
63. Increase from 1993 to 2001
- The number of states allowing out-of-level testing is increasing
- Wide variability in policy and implementation
- 1993: 1 state (Georgia)
- 1995: 5 states (Connecticut, Georgia, Kansas, North Carolina, Oregon)
- 1997: 10 states (Alaska, Connecticut, Georgia, Maine, Missouri, New Hampshire, New York, North Dakota, Vermont, West Virginia)
- 2001: 17 states (Alabama, Arizona, California, Connecticut, Delaware, Georgia, Hawaii, Iowa, Louisiana, Mississippi, North Dakota, Oregon, South Carolina, Texas, Utah, Vermont, West Virginia)
64. Who Gets Tested Out of Level?
- Only students receiving special education services: most states
- Students with 504 Accommodation Plans: a few states
- LEP students: a few states
- Any student: one state
65. Where Out-of-Level Testing Fits in Assessment Systems
- Accommodation
- Non-standard Accommodation
- Modification
- Adapted Assessment
- Alternate Assessment
66. Number of Levels Tested Below Grade Level (n = 12 states)

 | NRT | CRT
1 level | 2 | 2
1-2 levels | 1 | 0
3-4 levels | 2 | 0
Instructional level | 1 | 4
Test levels only | 0 | 2
67. Pros and Cons for Students
- More accurate instructional decisions BUT lower expectations
- Grade retention may decrease BUT may not receive a regular diploma
- Students may have less test anxiety BUT less motivation to complete a developmentally inappropriate test
68. Pros and Cons for Systems
- More students included in state tests BUT students with disabilities treated differently... may result in exclusion from reporting or accountability
- Test scores may be more valid BUT test scores may not be usable
- Way to improve assessment and accountability systems BUT state systems may not actually be inclusive
69. Current Research
- Describing the prevalence of out-of-level testing
- Determining how students are selected for out-of-level testing
- Investigating the impact of out-of-level testing on academic performance
70. Other Gap Assessments
71. Ways to Participate
- Same way as other students
- With accommodations
- Alternate assessment
- Not as simple as it looks
- Some states are identifying other ways
72. Hybrid Plants
73. Some hybrids are better than others . . .
PRETTY GOOD, GOOD, NOT SO GOOD, or REALLY NOT GOOD
How do you tell?
Back to the Principles!
74. Levels Testing
- Different from out-of-level testing?
- Really assessing the same standards for all students?
- Implications for performance over time?
75. Developmental Scales and Other Assessments
- Really assess the same standards as for other students?
- How can scores be aggregated with other scores?
- Implications for standards-based instruction?
76. Alternative Assessments (e.g., juried assessments)
- Enough basis for good alternatives in the performance assessment literature?
- Should these be available only to students with disabilities?
- How can this information be aggregated with test scores?
77. SUMMARY: A work in progress!
- Accommodations and Modifications
- Alternate Assessment
- Out-of-Level Testing
- Other GAP Assessments
78. Interpreting Performance Trends for Students in Special Education
79. What goes in must come out! Methods for Reporting Trends
- Cross-sectional: across grades, within year
- Cohort-dynamic: within grade, across years
- Cohort-static: across grades, across years; a group is defined in the base year and tracked over time
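The three trend-reporting methods can be sketched on a toy grade-by-year table; the mean scale scores below are made up purely for illustration:

```python
# Mean scale scores indexed by (grade, year).
scores = {
    (3, 1999): 200, (3, 2000): 204, (3, 2001): 208,
    (4, 1999): 210, (4, 2000): 213, (4, 2001): 217,
    (5, 1999): 220, (5, 2000): 222, (5, 2001): 226,
}

# Cross-sectional: across grades within a single year.
cross_sectional_2000 = [scores[(g, 2000)] for g in (3, 4, 5)]

# Cohort-dynamic: the same grade compared across years (different students each year).
cohort_dynamic_grade3 = [scores[(3, y)] for y in (1999, 2000, 2001)]

# Cohort-static: one cohort followed across grades and years
# (grade 3 in 1999, grade 4 in 2000, grade 5 in 2001).
cohort_static = [scores[(3 + i, 1999 + i)] for i in range(3)]

print(cross_sectional_2000)   # [204, 213, 222]
print(cohort_dynamic_grade3)  # [200, 204, 208]
print(cohort_static)          # [200, 213, 226]
```

Each method slices the same table differently and so answers a different question, which is why the considerations on the next slide (shifting disability labels, participation rates, and drop-out) can bias each trend in its own way.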
80. Considerations for SWD
- Disability label is NOT static
- Participation rates tend to increase with grade level
- Accommodation use tends to decrease with grade level
- Drop-out occurs mostly among students with a mild disability
81. Transitions
82. Transitions Tied to Performance
83. Impact on Achievement Trends
84. Part III. Apply the Best Fertilizers
85. How High is High Enough?
HIGH EXPECTATION BRIDGE
Content and Performance Standards
IEP Goals and Objectives
86. What Other Information Exists . . .
- Beginning results of a study on out-of-level testing
- Analyses of the effects of accommodations on test score comparability and validity
- Exploration of issues in assessment for students with disabilities who are also English Language Learners
- Studies of related policies: graduation requirements, social promotion, appeals/waiver procedures
- Continued analyses of state data to better understand accommodation, reporting, and performance issues
- Identification of procedures for reviewing items for bias related to disabilities or accommodations
87. Part IV. Harvest a Rich Return
88. Positive Consequences
89. 2001 State Directors Told Us: All Students with Disabilities are Included in All Components of the Accountability System in 25 States
90. NCEO's 2001 Survey of States: What State Directors Say About Changes in Performance
- About 28% of states reported increases in state test performance of students with disabilities
- Nearly one-third of the states were not able to make comparisons because of previous unavailability of data
91. Survey of 100 Students with Learning Disabilities in Minnesota
- The majority of students surveyed:
  - know about graduation tests
  - know how they are doing on tests
  - use accommodations on tests
  - understand accommodations and other things that help them learn
- Schools attended by most students surveyed are teaching them about accommodations
92. Actual Consequences 2001 (a few examples)
- New York: More students with disabilities PASSED the Regents Exam than took it in previous years
- Kentucky: Higher performance levels on the alternate assessment were correlated with integration of instruction and assessment, and with the level of involvement of the student in constructing his or her own portfolio
- Wyoming: Lara's Story
93. Future
It is not going away: the push will continue to include students with disabilities and LEP students in assessments and accountability systems. That is a GOOD thing! It is important to get on with it . . . .
94. Remember: What goes in affects what comes out!
It is important to focus not just on measuring the cow's milk/cream output (although that IS important) . . . . But we need to get on with making sure that the cow increases production!
95. IT ALL COMES DOWN TO TEACHING AND LEARNING