Title: National Council on Measurement in Education
1National Council on Measurement in Education
Symposium - Setting Performance Standards for
Schools in Accountability Programs: Policy,
Technical, and Operational Issues
Thursday, April 12, 2007, Chicago, Illinois
2Setting Performance Standards for Schools in
Accountability Programs: Policy, Technical, and
Operational Issues
Moderator: Anita Rawls, University of South
Carolina
3Presenters
- Eugene Kennedy, Louisiana State University:
Standard Setting Challenges for School
Performance Rating Systems
- Charity Smith, Arkansas Department of Education:
School Performance Index: The Arkansas Experience
from Act 35 to Field Review and State Board of
Education
- Robert Kennedy, University of Arkansas for
Medical Sciences: Use of Policy-induced and
School Descriptor Methodology
- Huynh Huynh, University of South Carolina:
Validity, Reliability and Other Technical
Considerations
- Charity Smith, Arkansas Department of Education:
Final Deliberations by State Board of Education
4Discussants
- Peter Behuniak, University of Connecticut
- William Schafer, University of Maryland
5Standard Setting Challenges for School
Performance Rating Systems
Eugene Kennedy
6Standard Setting Challenges For School
Performance Rating Systems
- Why Rate Schools?
- On What Characteristics Should Schools Be Rated?
- What Steps Are Involved In Rating Schools?
7How Do We Define Performance?
- For Students: Achievement Scores
- For Schools: Aggregated Achievement Data
(Adjusted/Not-Adjusted for Input?)
- Challenges At The Student Level: Special
Populations, Retention, etc.
- Challenges At The School Level: Grade
Organization, Differential Input, Stakeholders,
etc.
8Creation of a Performance Index
- Students: Summary Scores, Item Response Theory
(IRT) Scale, etc.
- Schools: Weighted Index, IRT, etc.
9Procedures for Setting Standards
- Students: Defining Performance Levels, Judges,
etc.
- Schools: Definitions of Performance Levels,
Judges and Stakeholders, etc.
10Validity and Reliability of Results
- Students: Internal Consistency, Classification
Accuracy, Predictive Validity, etc.
- Schools: Stability, Face Validity, etc.
11Performance Labels and Their Implications
- Students: Advanced, Proficient, etc.
- Schools: High Performing, Low Performing, etc.
12School Performance Index: The Arkansas Experience
from Act 35 to Field Review and State Board of
Education
13Act 35: The Arkansas Student Assessment and
Accountability Act of 2004
- Like many other states, Arkansas has experienced
many initiatives designed to improve its public
education system.
- Act 35, which was passed in the Second
Extraordinary Session of the 84th General
Assembly in 2003, mandates that the Arkansas State
Board of Education (SBE) adopt content standards
which reflect what students should know and be
able to do
- Develop a criterion-referenced test (CRT)
- Establish rewards and sanctions
- Identify underperforming schools
- Assess the annual learning gains of students
14The Arkansas Comprehensive Assessment Requirements
- Act 35: Big Changes in Testing and
Accountability
- More grades added
- Standard setting to set or reset assessment cut
scores
- Vertically scaled the CRT for public school
students in grades 3-8
- Specific analyses of student achievement data
15The Arkansas Comprehensive Accountability
Requirements
- Develop a two-tiered annual accountability rating
system approach: performance and growth
- Rate schools in five category levels (ranging
from excellent, category 5, to schools in need of
immediate improvement, category 1)
- Develop value-added longitudinal calculations for
growth
- Ensure that school ratings are valid, replicable,
transparent, and easily understood
- Use a team of relevant technical experts
- Ensure that the accountability ratings approach
is approved by the SBE
16Timeline
- 2005-06 School Year
- Report spring 2005 test results against newly
adopted standards for grades 3 through 8
- Administer the new tests in grades 3 to 8 in
spring 2006
- Summer 2006
- Report results for grades 3 to 8 against newly
adopted standards
- Prepare 2006 School Performance Rating System
- Implement School Improvement Rating System
showing growth from 2005-06
17Technical Advisory Committees
- Implementation of Act 35 and adherence to the
demanding timeline noted above required extensive
work by officials at the Arkansas Department of
Education (ADE).
- As part of this process, the ADE created two
technical advisory committees (TACs), one for
assessment and one for accountability. These
TACs act in an advisory capacity for major
aspects of the implementation of Act 35.
- They meet as needed and offer advice and
recommendations to the ADE. Given the reliance
of the accountability program on the statewide
assessments, there is considerable overlap
in the composition of the two committees.
18School Accountability Ratings
- The ADE is required to produce an annual report
which will identify schools as being in one of
five categories based on performance outcomes on
the criterion-referenced benchmark examinations.
These categories (levels) and their qualitative
interpretations are:
- Level 1: Schools in Need of Immediate
Improvement
- Level 2: Schools on Alert
- Level 3: Schools Meeting Standards
- Level 4: Schools Exceeding Standards
- Level 5: Schools of Excellence
19Assignment of School Accountability Ratings
- Schools in Arkansas will not be assigned
performance ratings during the period 2004-05
through 2008-09, unless they specifically request
that this be done.
- The baseline year for improvement gains will be
the 2006-07 school year. Actual improvement
ratings (growth) will be assigned starting with
the 2007-08 school year.
- Once improvement and performance ratings are
assigned, they will carry significant
consequences for schools.
20Creation of School Weighted Average Index and
General Considerations in Setting Standards for
School Performance
- Initially, the TAC/Accountability and the ADE
considered three options for developing the annual
school performance ratings required by Act 35:
quintiles, stanines, and setting cut scores using
a standard setting conference.
- Deliberations also covered how to compute a
school index that would be used for categorizing
schools.
- Following are the chronological steps in the
TAC/Accountability deliberations and field
presentations to major groups of Arkansas
stakeholders.
21School Weighted Average Index
- The development of a school performance rating
system in Arkansas involved three distinct steps.
- First, the TAC/Accountability and the ADE
examined ways to compute a school index to be
used to assign a performance category to each
school.
- Second, the TAC/Accountability deliberated
on how to set the cut scores for this index in
order to define each of the five performance
categories legislated by Act 35.
- Third, the TAC/Accountability made
recommendations to the ADE as to how it could
interact with various stakeholders in order to
get their endorsement of the proposed rating
system for consideration and adoption by the SBE.
Note: The ADE conducted awareness training with
more than 1,100 stakeholders.
22General Considerations in Setting Standards for
School Performance and Adoption of
Criterion-Referenced Approach
- The TAC/Accountability and the ADE considered
three options for developing annual school
performance ratings:
- norm-referenced (quintiles and stanines)
- criterion-referenced (expert judgment)
- After statewide focus groups and recommendations,
the SBE adopted the third option, the
criterion-referenced approach.
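
The difference between the norm-referenced and criterion-referenced options can be illustrated with a short Python sketch. Everything here is hypothetical: the function names, the index values, and especially the cut scores, which are placeholders and not the cut scores adopted in Arkansas. The sketch also assumes that a school sitting exactly on a cut score is placed in the higher level.

```python
import bisect


def quintile_levels(indices):
    """Norm-referenced: assign levels 1-5 by rank, about 20% of schools each."""
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    levels = [0] * len(indices)
    for rank, i in enumerate(order):
        levels[i] = rank * 5 // len(indices) + 1
    return levels


def cut_score_levels(indices, cuts):
    """Criterion-referenced: assign levels 1-5 from four ascending cut scores."""
    return [bisect.bisect_right(cuts, x) + 1 for x in indices]


# Hypothetical school index values and cut scores (illustration only).
indices = [1.6, 2.1, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.2]
cuts = [1.5, 2.0, 2.5, 3.0]

print(quintile_levels(indices))         # two schools per level, by construction
print(cut_score_levels(indices, cuts))  # levels depend only on the cut scores
```

With quintiles, a fixed share of schools lands in each level regardless of how well schools perform; with cut scores, every school could in principle reach the top level.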
23Computation of Weighted Average
- The weighted average index began with numerical
values, or weights, tentatively assigned to each
student's performance category from ACTAAP
proficiency levels (Advanced = 4, Proficient = 3,
Basic = 2, Below Basic = 1).
- A different set of weights could be assigned if
policy makers decided to value the performance
for each performance level differently.
- With these weights assigned to the performance
levels, the performance index for the school
could be computed by multiplying the weight of
each performance level by the number of
students scoring in that performance category.
- This would be done for each grade and subject.
The weighted sum would then be divided by the
total number of students tested in the various
subjects and grades.
- The resulting average for the school would range
between 1.0 and 4.0.
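
The computation described above can be sketched in Python as follows. This is a minimal illustration that assumes the tentative weights listed on the slide; the function name `weighted_average_index`, the data layout, and the student counts are hypothetical and are not Arkansas data.

```python
# Tentative weights from the slide; policy makers could choose others.
WEIGHTS = {"Advanced": 4, "Proficient": 3, "Basic": 2, "Below Basic": 1}


def weighted_average_index(counts_by_test, weights=WEIGHTS):
    """Return a school's weighted average index.

    counts_by_test maps a (grade, subject) pair to a dict of
    {performance level: number of students scoring at that level}.
    """
    weighted_sum = 0
    total_tested = 0
    for level_counts in counts_by_test.values():
        for level, n_students in level_counts.items():
            weighted_sum += weights[level] * n_students
            total_tested += n_students
    if total_tested == 0:
        raise ValueError("no students tested")
    return weighted_sum / total_tested


# Hypothetical counts for one school (illustration only).
school = {
    (3, "math"): {"Advanced": 10, "Proficient": 20, "Basic": 15, "Below Basic": 5},
    (3, "literacy"): {"Advanced": 5, "Proficient": 25, "Basic": 15, "Below Basic": 5},
}
index = weighted_average_index(school)
print(round(index, 2))  # 2.65
```

Here the weighted sum is 135 + 130 = 265 over 100 tested students, giving 2.65, which falls inside the 1.0 to 4.0 range.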
24Preliminary Considerations and Use of School
Descriptor Methodology
25Preliminary Steps in the Standard Setting Process
- Tentative categories
- Information provided
- Statewide data profile
26Initial Considerations for Preliminary Cut Scores
27General Considerations for Preliminary Cut Scores
- March 8th, bad weather
- March 15th benefited
28Data for School Profile
- Information provided
- weighted average index
- economically disadvantaged, LEP, and special
education
- Adequate Yearly Progress
- accreditation
- number tested
- percentages at each level
29School Profile in Each Preliminary Level
- Level 1: Schools in need of immediate
improvement (42 schools)
- Level 2: Schools on alert (117)
- Level 3: Schools meeting standards (795)
- Level 4: Schools exceeding standards (112)
- Level 5: Schools of excellence (24)
Note: This preliminary level analysis includes
high schools.
30School Profile in Each Pairwise Overlapped Schools
- Pairwise overlap of school ratings
- Levels 1 and 2: 1.68 to 1.73
- Levels 2 and 3: 1.75 to 2.17
- Levels 3 and 4: 2.68 to 2.92
- Levels 4 and 5: 2.86 to 3.07
- Panelists set cut points where they felt
comfortable.
31Composition of the Panel
- Facilitators: black female, black male, Hispanic
male, white female, and white male
- Panelists also racially and geographically
diverse: PTA, business, AAEA, AEA, ASBA
- Each group named 12 representatives, for a total
of 60 panelists (52 actually participated)
- Monitored by TAC/Accountability
32Beginning Plenary Session
- Plenary meetings and group sessions.
- Purpose of the meeting
- Advisory role of the TAC/Accountability
- Background, objectives, procedures
- The criterion-referenced approach explained
33Group Session (Round 1)
- Role-alike groups
- Panelists' discussions
- Initial break points
- Medians and ranges
34Round 1 Group Median Cut Scores and All-Group
Results
35Second Plenary Meeting
- Key points
- Lunch
- Reconsideration
- New cut scores
- New group means and medians
36Round 2 Group Median Cut Scores and All-Group
Results
37Final Plenary Meeting
- Maintained individual confidentiality
- State Board consideration
- Panelists' evaluation
- Thanks to the panelists
38Validity, Reliability and Other Technical
Considerations
39Technical Characteristics
- Act 35 levels of school performance status are:
- Level 1: Schools in Need of Immediate
Improvement
- Level 2: Schools on Alert
- Level 3: Schools Meeting Standards
- Level 4: Schools Exceeding Standards
- Level 5: Schools of Excellence
- Cut scores on the performance index scale have
been established.
- I will now present some major psychometric
characteristics of the Weighted Average Index and
the status classifications.
40Internal Consistency of Performance Index
- The Weighted Average Index (PI) is a (linear)
average of the performance of all students in
a school.
- The internal consistency (reliability) of the
index was computed using an analog of the
split-half (Spearman-Brown) reliability in
classical test theory. There are three steps:
- Step 1: Students in each school were randomly
split into equal (or nearly equal) half groups
and the index was computed for each half.
- Step 2: The Pearson correlation (r12) was
computed for the two half-group indices using all
available schools (with at least 40 students).
- Step 3: The Spearman-Brown formula was used to
compute the reliability (r) of the performance
index for the entire school: r = 2 r12 /
(1 + r12).
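
The three steps above can be sketched in Python. This is only an illustration under stated assumptions: each school is represented as a list of per-student weights (1 to 4), the 40-student minimum is taken from the slide, and the function names and the random seed are hypothetical rather than the procedure's actual implementation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r12):
    """Step 3: step the half-group correlation up to full-school reliability."""
    return 2 * r12 / (1 + r12)


def split_half_reliability(schools, min_students=40, seed=0):
    """schools: list of lists of per-student weights, one list per school."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for scores in schools:
        if len(scores) < min_students:
            continue  # only schools with enough students are used
        shuffled = list(scores)
        rng.shuffle(shuffled)  # Step 1: random split into two halves
        mid = len(shuffled) // 2
        half_a.append(sum(shuffled[:mid]) / mid)
        half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    r12 = pearson(half_a, half_b)  # Step 2: correlate the half-group indices
    return spearman_brown(r12)
```

For example, a half-group correlation of 0.80 steps up to a full-school reliability of 1.6 / 1.8, or about 0.89.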
41Summary Data for Split-Half Reliability (2005)
42Summary Data for Split-Half Reliability (2006)
43Yearly Stability of Weighted Average Index
44Stability of Act 35 Performance Level
- The yearly stability of the performance index was
also studied through the performance level
classification.
- We looked at the cross-tabulation data for the
2005 and 2006 performance levels for the large
schools, that is, those with at least 40 students
with complete data in both years. There are 854
large schools.
- Out of these, a total of 556 (65%) retained the
same level from 2005 to 2006.
- 282 (33%) moved up by one level.
- One school moved up two categories and 9
schools moved down one category.
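
A tabulation of this kind can be sketched in Python as below. The function name `level_change_summary` and the small paired lists are hypothetical; only the counts 556, 282, and 854 come from the slide, and they are used to check the reported percentages.

```python
from collections import Counter


def level_change_summary(levels_2005, levels_2006):
    """Map each level change (2006 minus 2005) to (count, percent of schools)."""
    changes = Counter(b - a for a, b in zip(levels_2005, levels_2006))
    total = sum(changes.values())
    return {delta: (n, 100 * n / total) for delta, n in sorted(changes.items())}


# Hypothetical paired levels for four schools (illustration only).
print(level_change_summary([1, 2, 3, 3], [1, 3, 3, 4]))

# Percentages implied by the counts reported on the slide.
LARGE_SCHOOLS = 854
same_pct = 100 * 556 / LARGE_SCHOOLS    # about 65.1
up_one_pct = 100 * 282 / LARGE_SCHOOLS  # about 33.0
print(round(same_pct, 1), round(up_one_pct, 1))
```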
45Tabulation of Act 35 Performance Category with
AYP Category for 2005
Note: AYP categories are coded as No = 0 and Yes
= 1 for correlation calculation.
46Tabulation of Act 35 Performance Category with
AYP Category for 2006
Note: AYP categories are coded as No = 0 and Yes
= 1 for correlation calculation.
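
The 0/1 coding mentioned in the notes can be illustrated with a short Python sketch that correlates the Act 35 level (1 to 5) with the coded AYP status. It assumes a plain Pearson correlation; the function name `ayp_level_correlation` and the six schools are hypothetical and do not reproduce the tabulated data.

```python
import math


def ayp_level_correlation(levels, ayp_met):
    """Pearson correlation between Act 35 level and AYP coded No = 0, Yes = 1."""
    coded = [1 if met else 0 for met in ayp_met]
    n = len(levels)
    mean_l = sum(levels) / n
    mean_a = sum(coded) / n
    cov = sum((l - mean_l) * (a - mean_a) for l, a in zip(levels, coded))
    var_l = sum((l - mean_l) ** 2 for l in levels)
    var_a = sum((a - mean_a) ** 2 for a in coded)
    return cov / math.sqrt(var_l * var_a)


# Hypothetical schools (illustration only, not the tabulated data).
levels = [1, 2, 3, 3, 4, 5]
ayp_met = [False, False, True, False, True, True]
r = ayp_level_correlation(levels, ayp_met)
print(round(r, 3))
```

A positive value indicates that schools at higher Act 35 levels are more likely to have met AYP.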
47Final Deliberation by the State Board of
Education
48State Board of Education Action
- The SBE and other stakeholders were kept
informed.
- The SBE did the following:
- Adopted the Weighted Average Index for
calculating performance ratings for schools
- Recommended detailed communication with
stakeholders to ensure transparency
- Approved official cut scores recommended by the
standards setting team
- Adopted appropriate ratings criteria through
approved rules and regulations
- Reviewed the Standard Setting Technical Report