Title: Growth Scales and Pathways
Growth Scales and Pathways
- William D. Schafer, University of Maryland
- Jon S. Twing, Pearson Educational Measurement
NCLB leaves some unmet policy needs
- Assessment of student-level growth
- Sensitivity to change within achievement levels
- Assessment and accountability at all grades
- Broad representation of schooling outcomes
- Descriptions of what students are able to do in terms of next steps
- Cost-benefit accountability
How can we meet these needs?
- Our approach starts with measurement of growth through cross-grade scaling of achievement.
- Current work is being done around vertical scales, in which common items for adjacent grades are used to generate a common scale across grades.
- Another approach is grade equivalents.
- Both are continuous cross-grade scales.
- We only have three problems with continuous cross-grade scales:
- The Past
- The Present
- The Future
Why the Past?
- Ignores instructional history
- The same student score should be interpreted
differently depending on the grade level of the
student
Why the Present?
- Relationships among items may (and probably do) differ depending on the grade level of the student (e.g., easy fifth-grade items may be difficult for fourth graders).
- Lack of true equating. It is better for fourth graders to take fourth-grade tests and for fifth graders to take fifth-grade tests.
Why the Future?
- Instructional expectations differ. A score of GE 5.0 (or VS 500) carries different growth expectations for a current fourth grader, who faces a fifth-grade experience next year, than for a current fifth grader, who does not.
- We do need to take seriously the interests of policymakers in continuous scaling.
- But the problems with grade equivalents and vertical scaling may be too severe to recommend them.
- Here are seven criteria that an alternate system should demonstrate.
1. Implement the Fundamental Accountability Mission
- Test all students on what they are supposed to be learning.
2. Assess all contents at all grades.
- Educators should be accountable for all public expenditures.
- Apply this principle at least to all non-affective outcomes of schooling.
3. Define tested domains explicitly.
- Teachers need to understand their learning targets in terms of
- Knowledge (what students know)
  - Factual
  - Conceptual
  - Procedural
- Cognition (what they do with it)
4. Base test interpretations on the future.
- We can't change the past, but we can design the future.
- It can be more meaningful to think about what students are prepared for than about what they have learned.
5. Inform decision making about students, teachers, and programs.
- Within the limits of privacy, gathering data for accountability judgments about everyone and everything (within reason) will help decision makers reach the most informed decisions.
- This also means that we will associate assessments with those who are responsible for improving them.
6. Emphasize predictive evidence of validity.
- Basing assessment interpretations on the future (see point 4) suggests that our best evidence for validating our interpretations is how well they have predicted in the past.
7. Capitalize on both criterion and norm referencing.
- Score reports need to satisfy the needs of the recipients. Both criterion referencing (what students are prepared to do) and norm referencing (how many are as, more, or less prepared) convey useful information.
- Other things being equal, more information is better than less.
Our Approach to the Criteria
- Many of the criteria are self-satisfying.
- Some recent and new concepts are needed.
- Four recent or new concepts:
- Socially moderated standard setting
- Operationally defined exit competencies
- Growth scaling
- Growth pathways
Socially Moderated Standard Setting
- Ferrara, Johnson, & Chen (2005)
- Judges set achievement-level cut points where students have the prerequisites for the same achievement level next year.
- Note the future orientation of the achievement levels. This concept also underlies Lissitz and Huynh's (2003) concept of vertically moderated standards.
Operationally Defined Exit Competencies
- If we implement socially moderated standards, where do the cut points for the 12th grade come from?
- Our suggestion is to base them on what students are prepared for, such as (1) college credit, (2) ready for college, (3) needs college remediation, (4) satisfies federal ability-to-benefit rules, (5) capable of independent living, (6) below.
- Modify as needed for lower grades (e.g., fewer levels) and certain contents (e.g., athletics, music).
Growth Scaling
- Some elements of this have been used in Texas and Washington State.
- Test at each grade level separately for any content (i.e., only grade-level items).
- Report using a three-digit scale (see the sketch after this list).
- First digit is the grade level.
- Second two digits are a linear transform that places the lower (proficient) cut point at, e.g., 40 and the advanced cut point at, e.g., 60. One could transform non-linearly to fit all cut points when there are more than three levels.
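
A minimal sketch of how such a three-digit score might be computed, assuming hypothetical within-grade cut scores on the underlying (e.g., theta) metric. The 40/60 anchors follow the example above; the function and variable names are illustrative, not part of any operational program.

```python
def growth_scale_score(theta: float, grade: int,
                       proficient_cut: float, advanced_cut: float) -> int:
    """Three-digit growth scale: first digit is the grade; the last two
    digits are a linear transform placing the proficient cut at 40 and the
    advanced cut at 60. Cut scores here are hypothetical within-grade values."""
    slope = (60 - 40) / (advanced_cut - proficient_cut)
    two_digit = 40 + slope * (theta - proficient_cut)
    # Keep the last two digits in a reportable 0-99 range.
    two_digit = max(0, min(99, round(two_digit)))
    return grade * 100 + two_digit

# Example: a grade 5 student just above a hypothetical proficient cut -> 541.
print(growth_scale_score(theta=0.15, grade=5, proficient_cut=0.10, advanced_cut=1.20))
```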
Growth Pathways
- Given that content is backmapped (Wiggins & McTighe, 1998) and achievement levels are socially moderated, achievement results can be expressed in terms of readiness for growth (next year, at 12th grade, or both).
- Transition matrices can be generated to express the likelihoods of various futures for students (see the sketch below).
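
A small sketch of how such a transition matrix could be estimated from matched-cohort achievement levels and chained to describe multi-year pathways. The data and the stability assumption are hypothetical, not drawn from the Texas results discussed later.

```python
import numpy as np

# Hypothetical matched-cohort data: achievement-level index (0..3) for the
# same students in year 1 and year 2.
year1 = np.array([0, 1, 1, 2, 2, 2, 3, 1, 2, 3])
year2 = np.array([1, 1, 2, 2, 3, 2, 3, 0, 2, 3])

n_levels = 4
counts = np.zeros((n_levels, n_levels))
for a, b in zip(year1, year2):
    counts[a, b] += 1

# Row-normalize: transition[i, j] = Pr(level j next year | level i this year).
transition = counts / counts.sum(axis=1, keepdims=True)

# Chaining transitions approximates pathways further out (e.g., toward grade
# 12), under the strong assumption that transition probabilities are stable.
two_years_ahead = transition @ transition
print(np.round(transition, 2))
print(np.round(two_years_ahead, 2))
```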
Adequate Yearly Progress
- Capitalizing on Hill et al. (2005), growth pathways can be used as the bases for expectations, with point awards given for students falling below, meeting, or exceeding their expectations based on year-ago achievement levels (a sketch follows below).
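
An illustrative point-award scheme of this kind. The expected-level lookup and the point values are assumptions made for the sketch, not the scheme proposed by Hill et al. (2005).

```python
# Expected achievement level this year given last year's level (hypothetical).
expected_level = {0: 1, 1: 1, 2: 2, 3: 3}

def ayp_points(last_year_level: int, this_year_level: int) -> float:
    """Award points for exceeding, meeting, or falling below expectation."""
    expected = expected_level[last_year_level]
    if this_year_level > expected:
        return 1.25   # exceeded expectation
    if this_year_level == expected:
        return 1.0    # met expectation
    return 0.5        # fell below expectation

# A school's AYP index could be the mean award across its students.
students = [(0, 1), (1, 0), (2, 3), (3, 3)]  # (last year, this year) pairs
index = sum(ayp_points(a, b) for a, b in students) / len(students)
print(round(index, 2))
```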
Existing Empirical State Data
- Using existing data, we explored some of these concepts.
- Two data sets from Texas were used.
- All data are in the public domain and can be obtained from the Texas website.
- Current Texas data used: TAKS
- Previous Texas data used: TAAS
TAAS Data (2000-2002)
Immediate Observations - TAAS Data
- Passing standards appear to be relatively lenient.
- Actual standards were set in the fall of 1989.
- A curriculum change occurred in 2000.
- The Texas Learning Index (TLI)
- Is a variation of the Growth Scaling model previously discussed.
- Will be discussed in more detail shortly.
- Despite the leniency of the standard, average cross-sectional gain is shown with the TLI.
- About a 2.5 TLI-point gain on average (across grades).
TAKS Data (2003-2005)
Immediate Observations - TAKS Data
- Passing standards appear to be more severe than TAAS, but, for the most part, the majority of students still pass.
- Standards were set using Item Mapping and field-test data in 2003.
- Standards were phased in by the SBOE.
- Passing is labeled "Met the Standard."
- Scale scores are transformed within grade-and-subject calibrations using the Rasch model.
- Scales were set such that 2100 is always passing.
- There is a socially moderated expectation that a 2100 this year is equal to a 2100 next year.
- We will look at this in another slide shortly.
Immediate Observations - TAKS Data
- Some issues/problems seem obvious
- Use of field-test data and the lack of student motivation in the first year.
- The phase-in of the standards makes the meaning of passing difficult to understand.
- The construct changes between grades 8 and 9.
- Math increases in difficulty across the grades.
- Cross-sectional gain scores show some progress, with gains of between 20 and 35 points in average scaled score across grades and subjects.
- Finally, the classification percentages (impact) resulting from the Item Mapping standard setting are quite varied.
A Pre-Organizer
- Socially Moderated Standard Setting
- Really sets the expectation of student performance in the next grade.
- Growth Scaling
- A different definition of growth.
- Growth by fiat.
- Operationally Defined Exit Competencies
- How does a student exit the program?
- How to migrate this definition down to other grades.
- Growth Pathways
- Cumulative probability of success.
- Not addressed in this paper with Texas data.
Socially Moderated Standard Setting
- Consider the TAKS data in light of Socially Moderated Standard Setting.
- The cut scores were determined separately by grade and subject using an Item Mapping procedure.
- 2100 was selected as the transformation of the Rasch theta scale associated with passing (a sketch of one such transformation follows below).
- 2100 became the passing standard for all grades and subjects.
- This is similar to the quasi-vertical scale scores procedure described by Ferrara et al. (2005).
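
A sketch of a within-grade, within-subject transformation that pins the passing cut at 2100. The slope and the passing theta used here are placeholders; the actual TAKS transformation constants are not given in this presentation.

```python
PASSING_SCORE = 2100

def taks_style_scale_score(theta: float, theta_passing: float,
                           slope: float = 100.0) -> int:
    """Linear transform of the Rasch theta scale so that
    theta == theta_passing maps to the 2100 passing standard."""
    return round(PASSING_SCORE + slope * (theta - theta_passing))

# Example with a hypothetical grade/subject passing theta of 0.4.
print(taks_style_scale_score(theta=0.75, theta_passing=0.4))  # above 2100
```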
Socially Moderated Standard Setting
- Despite the implementation procedures, the standard setting yielded a somewhat inconsistent set of cut scores.
- Panels consisted of on-grade and adjacent-grade educators.
- Performance level descriptors were discussed both for the current grade and for the next.
- A review panel was convened to ensure continuity between grades within subjects.
- This review panel comprised educators from all grades participating in the standard setting and used impact data for all grades as well as traditionally estimated vertical scaling information.
Socially Moderated Standard Setting
- Yet some inconsistencies are hard to explain.
- For example, the standards yielded the following passing rates for Reading:
- Grade 3: 81%
- Grade 4: 76%
- Grade 5: 67%
- Grade 6: 71%
- Clearly, social moderation did not occur, perhaps because of:
- Differences in content standards from grade to grade.
- Lack of a clearly defined procedure setting up the expectation at the next grade.
- Mitigating factors (e.g., "kids cry," raw-score percent correct, etc.).
Socially Moderated Standard Setting
- What about unanticipated consequences?
- Are teachers, parents, and the public calculating gain-score differences between the grades based on these horizontal scale scores?
- Will the expectation not be that 2100 this year means 2100 next year? This is similar to one of the concerns in Ferrara et al. (2005) that prohibited that research from being conducted.
- In fact, based on a simple regression using matched cohorts, the expectation is that a student with a scaled score of 2100 in grade 3 reading will earn a 2072 in grade 4 reading on average (see the sketch below).
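
A sketch of the matched-cohort regression idea: regress grade 4 scale scores on grade 3 scale scores for the same students, then evaluate the fitted line at 2100. The data below are hypothetical; the 2072 figure quoted above comes from the actual Texas cohorts, not from this sketch.

```python
import numpy as np

# Hypothetical matched-cohort scale scores (same students in both years).
grade3 = np.array([2010, 2050, 2100, 2150, 2200, 2250, 2300])
grade4 = np.array([1995, 2030, 2075, 2120, 2180, 2230, 2270])

# Ordinary least squares fit of grade 4 on grade 3.
slope, intercept = np.polyfit(grade3, grade4, 1)

# Expected grade 4 score for a student at the grade 3 passing standard.
expected_at_2100 = slope * 2100 + intercept
print(round(expected_at_2100))
```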
Growth Scaling
- The TAAS TLI is an example of this type of growth scale.
- A standard setting was performed for the Exit Level TAAS test.
- This cut score was expressed in standard deviation units above or below the mean (i.e., as a standard score).
- This same distance was then articulated down to the other grades.
- The logic was one of defining growth in terms of maintaining relative status as students move across the grades.
- For example, if the passing standard was 1.0 standard deviation above the mean at Exit Level, then students who are 1.0 standard deviation above the mean in the lower-grade distributions are on track to pass the Exit Level test, provided they maintain their current standing/progress.
Growth Scaling
- For convenience, the scales were transformed such that the passing standards were at 70 (a sketch of such a transform follows below).
- Grade-level designations were then added to further enhance the meaning of the score.
- This score had some appealing reporting properties
- Passing was 70 at each grade.
- Since the TLI is a standard score, gain measures could be calculated for value-added statements.
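
A sketch of a TLI-style transform: express the score as a standard score, anchor the articulated passing distance at 70, and attach a grade designation. The 10-points-per-SD slope, the 1.0-SD passing distance, and the grade prefix are placeholders for illustration, not the actual TLI constants.

```python
def tli_style_score(raw: float, grade_mean: float, grade_sd: float,
                    grade: int, passing_z: float = 1.0,
                    points_per_sd: float = 10.0) -> float:
    """Standardize within grade, anchor the passing distance at 70, and
    attach a grade-level designation (here, as a numeric prefix)."""
    z = (raw - grade_mean) / grade_sd
    score = 70 + points_per_sd * (z - passing_z)
    return grade * 100 + score

# A grade 4 student exactly at the articulated passing distance scores 470.
print(tli_style_score(raw=46.0, grade_mean=40.0, grade_sd=6.0, grade=4))
```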
Growth Scaling
- Some concerns were also noted
- Outside of the first cut score, the TLI was essentially content-standard free.
- Because it was based on distribution statistics, the distributions (like norms) would become dated.
- Differences in the shapes of the distributions (e.g., in test difficulty) would have an unknown impact on students actually being able to hold their own.
- Differences in the content being measured across the grades are treated as essentially irrelevant.
Operationally Defined Exit Competencies
- The TAKS actually has such a component at the Exit Level.
- This is called the Higher Education Readiness Component (HERC) Standard.
- Students must reach this standard to earn dual college credit and to be allowed credit for college-level work.
- Two types of research were conducted to provide information for traditional standard setting
- Correlations with existing measures (ACT & SAT).
- An empirical study examining how well second-semester freshmen performed on the Exit Level TAKS test.
Operationally Defined Exit Competencies
- This research yielded the following results (table not reproduced here).
Operationally Defined Exit Competencies
- Some interesting observations
- The HERC standard was taken to be 2200, different from that needed to graduate.
- Second-semester college freshmen did marginally better than the TAKS passing standard required to graduate.
- Predicted ACT and SAT scores support the notion that the TAKS passing standards are moderately difficult.
- Given the content of the TAKS assessments, how could this standard be articulated down to lower grades?
Concluding Remarks
- Three possible enhancements that may or may not be intriguing for policymakers
- Grades as Achievement Levels
- Information-Rich Classrooms
- Monetary Metric
Grades as Achievement Levels
- Associating letter grades with achievement levels would
- Provide meaningful interpretations for grades
- Provide consistent meanings for grades
- Force use as experts recommend
- Enable concurrent evaluations of grades
- Enable predictive evaluations of grades
- Require help for teachers to implement
Information-Rich Classrooms
- The concept is from Schafer & Moody (2004).
- Achievement goals would be clarified through test maps.
- Progress would be tracked at the content-strand level throughout the year using combinations of formative and summative assessments (with a heavy role for computers).
- Achievement-level assignments would occur incrementally throughout the year.
Monetary Metric for Value Added
- Economists would establish the value of each exit achievement level by estimating lifetime earned income.
- The earnings would be amortized across grade levels and contents.
- The value added for each student each year is the sum, across contents, of the product of the vector of probabilities of exit achievement levels (given the student's current achievement level) and the vector of amortized monetary values (a sketch follows below).
- This enables cost-benefit analysis of education in a consistent metric for inputs and outputs.
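
A sketch of that computation. All values are hypothetical: the exit-level monetary values stand in for amortized lifetime-income estimates, and each probability vector stands in for Pr(exit level | current achievement level) in one content area.

```python
import numpy as np

# Hypothetical amortized monetary value of each exit achievement level.
amortized_value = np.array([5000., 9000., 14000., 20000.])

# One exit-level probability vector per content area, chosen to match a
# particular student's current achievement level in that content.
prob_exit_given_current = {
    "reading": np.array([0.10, 0.30, 0.40, 0.20]),
    "math":    np.array([0.20, 0.40, 0.30, 0.10]),
}

# Value added this year = sum across contents of the inner product of the
# exit-level probability vector and the amortized monetary values.
value_added = sum(p @ amortized_value for p in prob_exit_given_current.values())
print(round(value_added, 2))
```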