Title: Growth Scales and Pathways
Growth Scales and Pathways
- William D. Schafer, University of Maryland
- Jon S. Twing, Pearson Educational Measurement
NCLB leaves some unmet policy needs
- Assessment of student-level growth
- Sensitivity to change within achievement levels
- Assessment and accountability at all grades
- Broad representation of schooling outcomes
- Descriptions of what students are able to do in terms of next steps
- Cost-benefit accountability
How can we meet these needs?
- Our approach starts with measurement of growth through cross-grade scaling of achievement.
- Current work is being done around vertical scales, in which common items for adjacent grades are used to generate a common scale across grades.
- Another approach is grade equivalents.
- Both are continuous cross-grade scales.
- We only have three problems with continuous cross-grade scales:
- The Past
- The Present
- The Future
Why the Past?
- Ignores instructional history
- The same student score should be interpreted
differently depending on the grade level of the
student
Why the Present?
- Relationships among items may (and probably do) differ depending on the grade level of the student (e.g., easy fifth-grade items may be difficult for fourth graders).
- Lack of true equating. It is better for fourth graders to take fourth-grade tests and for fifth graders to take fifth-grade tests.
Why the Future?
- Instructional expectations differ. A score of GE 5.0 (or VS 500) carries different growth expectations for a current fourth grader, who faces a fifth-grade experience next year, than for a current fifth grader, who does not.
- We do need to take seriously the interests of policymakers in continuous scaling.
- But the problems with grade equivalents and vertical scaling may be too severe to recommend them.
- Here are seven criteria that an alternate system should demonstrate.
1. Implement the Fundamental Accountability Mission
- Test all students on what they are supposed to be learning.
2. Assess all contents at all grades.
- Educators should be accountable for all public expenditures.
- Apply this principle at least to all non-affective outcomes of schooling.
3. Define tested domains explicitly.
- Teachers need to understand their learning targets in terms of
- Knowledge (what students know)
  - Factual
  - Conceptual
  - Procedural
- Cognition (what they do with it)
4. Base test interpretations on the future.
- We can't change the past, but we can design the future.
- It can be more meaningful to think about what students are prepared for than about what they have learned.
5. Inform decision making about students, teachers, and programs.
- Within the limits of privacy, gathering data for accountability judgments about everyone and everything (within reason) will help decision makers reach the most informed decisions.
- This also means that we will associate assessments with those who are responsible for improving them.
6. Emphasize predictive evidence of validity.
- Basing assessment interpretations on the future (see point 4) suggests that our best evidence for validating our interpretations is how well they have predicted in the past.
7. Capitalize on both criterion and norm referencing.
- Score reports need to satisfy the needs of the recipients. Both criterion referencing (what students are prepared to do) and norm referencing (how many are as, more, or less prepared) convey useful information.
- Other things being equal, more information is better than less.
Our Approach to the Criteria
- Many of the criteria are self-satisfying.
- Some recent and new concepts are needed.
- Four recent or new concepts:
- Socially moderated standard setting
- Operationally defined exit competencies
- Growth scaling
- Growth pathways
Socially Moderated Standard Setting
- Ferrara, Johnson, & Chen (2005)
- Judges set achievement-level cut points where students have the prerequisites for the same achievement level next year.
- Note the future orientation of the achievement levels. This concept also underlies Lissitz and Huynh's (2003) concept of vertically moderated standards.
Operationally Defined Exit Competencies
- If we implement socially moderated standards, where do the cut points for the 12th grade come from?
- Our suggestion is to base them on what students are prepared for, such as (1) college credit, (2) ready for college, (3) needs college remediation, (4) satisfies federal ability-to-benefit rules, (5) capable of independent living, (6) below.
- Modify as needed for lower grades (e.g., fewer levels) and certain contents (e.g., athletics, music).
Growth Scaling
- Some elements of this have been used in Texas and Washington State.
- Test at each grade level separately for any content (i.e., only grade-level items).
- Report using a three-digit scale (see the sketch after this list).
- First digit is the grade level.
- Second two digits are a linear transform that places the lower (proficient) cut point at, e.g., 40 and the advanced cut point at, e.g., 60. One could transform non-linearly to fit all cut points when there are more than three levels.
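
A minimal sketch of how such a three-digit score might be computed, assuming hypothetical within-grade cut scores on the underlying (e.g., theta) metric. The 40/60 anchors follow the example above; the function and variable names are illustrative, not part of any operational program.

```python
def growth_scale_score(theta: float, grade: int,
                       proficient_cut: float, advanced_cut: float) -> int:
    """Three-digit growth scale: first digit is the grade; the last two
    digits are a linear transform placing the proficient cut at 40 and the
    advanced cut at 60. Cut scores here are hypothetical within-grade values."""
    slope = (60 - 40) / (advanced_cut - proficient_cut)
    two_digit = 40 + slope * (theta - proficient_cut)
    # Keep the last two digits in a reportable 0-99 range.
    two_digit = max(0, min(99, round(two_digit)))
    return grade * 100 + two_digit

# Example: a grade 5 student just above a hypothetical proficient cut -> 541.
print(growth_scale_score(theta=0.15, grade=5, proficient_cut=0.10, advanced_cut=1.20))
```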
Growth Pathways
- Given that content is backmapped (Wiggins & McTighe, 1998) and achievement levels are socially moderated, achievement results can be expressed in terms of readiness for growth (next year, at 12th grade, or both).
- Transition matrices can be generated to express the likelihoods of various futures for students (see the sketch below).
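
A small sketch of how such a transition matrix could be estimated from matched-cohort achievement levels and chained to describe multi-year pathways. The data and the stability assumption are hypothetical, not drawn from the Texas results discussed later.

```python
import numpy as np

# Hypothetical matched-cohort data: achievement-level index (0..3) for the
# same students in year 1 and year 2.
year1 = np.array([0, 1, 1, 2, 2, 2, 3, 1, 2, 3])
year2 = np.array([1, 1, 2, 2, 3, 2, 3, 0, 2, 3])

n_levels = 4
counts = np.zeros((n_levels, n_levels))
for a, b in zip(year1, year2):
    counts[a, b] += 1

# Row-normalize: transition[i, j] = Pr(level j next year | level i this year).
transition = counts / counts.sum(axis=1, keepdims=True)

# Chaining transitions approximates pathways further out (e.g., toward grade
# 12), under the strong assumption that transition probabilities are stable.
two_years_ahead = transition @ transition
print(np.round(transition, 2))
print(np.round(two_years_ahead, 2))
```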
Adequate Yearly Progress
- Capitalizing on Hill et al. (2005), growth pathways can be used as the bases for expectations, with point awards given for students falling below, meeting, or exceeding their expectations based on year-ago achievement levels (a sketch follows below).
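
An illustrative point-award scheme of this kind. The expected-level lookup and the point values are assumptions made for the sketch, not the scheme proposed by Hill et al. (2005).

```python
# Expected achievement level this year given last year's level (hypothetical).
expected_level = {0: 1, 1: 1, 2: 2, 3: 3}

def ayp_points(last_year_level: int, this_year_level: int) -> float:
    """Award points for exceeding, meeting, or falling below expectation."""
    expected = expected_level[last_year_level]
    if this_year_level > expected:
        return 1.25   # exceeded expectation
    if this_year_level == expected:
        return 1.0    # met expectation
    return 0.5        # fell below expectation

# A school's AYP index could be the mean award across its students.
students = [(0, 1), (1, 0), (2, 3), (3, 3)]  # (last year, this year) pairs
index = sum(ayp_points(a, b) for a, b in students) / len(students)
print(round(index, 2))
```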
Existing Empirical State Data
- Using existing data, we explored some of these concepts.
- Two data sets from Texas were used.
- All data are in the public domain and can be obtained from the Texas website.
- Current Texas data used: TAKS
- Previous Texas data used: TAAS
TAAS Data (2000-2002)
Immediate Observations - TAAS Data
- Passing standards appear to be relatively lenient.
- Actual standards were set in the fall of 1989.
- A curriculum change occurred in 2000.
- The Texas Learning Index (TLI)
- Is a variation of the Growth Scaling model previously discussed.
- Will be discussed in more detail shortly.
- Despite the leniency of the standard, average cross-sectional gain is shown with the TLI.
- About a 2.5 TLI-point gain on average (across grades).
TAKS Data (2003-2005)
Immediate Observations - TAKS Data
- Passing standards appear to be more severe than TAAS, but, for the most part, the majority of students still pass.
- Standards were set using Item Mapping and field-test data in 2003.
- Standards were phased in by the SBOE.
- Passing is labeled "Met the Standard."
- Scale scores are transformed within grade-and-subject calibrations using the Rasch model.
- Scales were set such that 2100 is always passing.
- There is a socially moderated expectation that a 2100 this year is equal to a 2100 next year.
- We will look at this in another slide shortly.
Immediate Observations - TAKS Data
- Some issues/problems seem obvious
- Use of field-test data and the lack of student motivation in the first year.
- The phase-in of the standards makes the meaning of passing difficult to understand.
- The construct changes between grades 8 and 9.
- Math increases in difficulty across the grades.
- Cross-sectional gain scores show some progress, with gains of between 20 and 35 points in average scaled score across grades and subjects.
- Finally, the classification percentages (impact) resulting from the Item Mapping standard setting are quite varied.
A Pre-Organizer
- Socially Moderated Standard Setting
- Really sets the expectation of student performance in the next grade.
- Growth Scaling
- A different definition of growth.
- Growth by fiat.
- Operationally Defined Exit Competencies
- How does a student exit the program?
- How to migrate this definition down to other grades.
- Growth Pathways
- Cumulative probability of success.
- Not addressed in this paper with Texas data.
Socially Moderated Standard Setting
- Consider the TAKS data in light of Socially Moderated Standard Setting.
- The cut scores were determined separately by grade and subject using an Item Mapping procedure.
- 2100 was selected as the transformation of the Rasch theta scale associated with passing (a sketch of one such transformation follows below).
- 2100 became the passing standard for all grades and subjects.
- This is similar to the quasi-vertical scale scores procedure described by Ferrara et al. (2005).
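
A sketch of a within-grade, within-subject transformation that pins the passing cut at 2100. The slope and the passing theta used here are placeholders; the actual TAKS transformation constants are not given in this presentation.

```python
PASSING_SCORE = 2100

def taks_style_scale_score(theta: float, theta_passing: float,
                           slope: float = 100.0) -> int:
    """Linear transform of the Rasch theta scale so that
    theta == theta_passing maps to the 2100 passing standard."""
    return round(PASSING_SCORE + slope * (theta - theta_passing))

# Example with a hypothetical grade/subject passing theta of 0.4.
print(taks_style_scale_score(theta=0.75, theta_passing=0.4))  # above 2100
```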
Socially Moderated Standard Setting
- Despite the implementation procedures, the standard setting yielded a somewhat inconsistent set of cut scores.
- Panels consisted of on-grade and adjacent-grade educators.
- Performance level descriptors were discussed both for the current grade and for the next.
- A review panel was convened to ensure continuity between grades within subjects.
- This review panel comprised educators from all grades participating in the standard setting and used impact data for all grades as well as traditionally estimated vertical scaling information.
Socially Moderated Standard Setting
- Yet some inconsistencies are hard to explain.
- For example, the standards yielded the following passing rates for Reading:
- Grade 3: 81%
- Grade 4: 76%
- Grade 5: 67%
- Grade 6: 71%
- Clearly, social moderation did not occur, perhaps because of:
- Differences in content standards from grade to grade.
- Lack of a clearly defined procedure setting up the expectation at the next grade.
- Mitigating factors (e.g., "kids cry," raw-score percent correct, etc.).
Socially Moderated Standard Setting
- What about unanticipated consequences?
- Are teachers, parents, and the public calculating gain-score differences between the grades based on these horizontal scale scores?
- Will the expectation not be that 2100 this year means 2100 next year? This is similar to one of the concerns in Ferrara et al. (2005) that prohibited that research from being conducted.
- In fact, based on a simple regression using matched cohorts, the expectation is that a student with a scaled score of 2100 in grade 3 reading will earn a 2072 in grade 4 reading on average (see the sketch below).
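
A sketch of the matched-cohort regression idea: regress grade 4 scale scores on grade 3 scale scores for the same students, then evaluate the fitted line at 2100. The data below are hypothetical; the 2072 figure quoted above comes from the actual Texas cohorts, not from this sketch.

```python
import numpy as np

# Hypothetical matched-cohort scale scores (same students in both years).
grade3 = np.array([2010, 2050, 2100, 2150, 2200, 2250, 2300])
grade4 = np.array([1995, 2030, 2075, 2120, 2180, 2230, 2270])

# Ordinary least squares fit of grade 4 on grade 3.
slope, intercept = np.polyfit(grade3, grade4, 1)

# Expected grade 4 score for a student at the grade 3 passing standard.
expected_at_2100 = slope * 2100 + intercept
print(round(expected_at_2100))
```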
Growth Scaling
- The TAAS TLI is an example of this type of growth scale.
- A standard setting was performed for the Exit Level TAAS test.
- This cut score was expressed in standard deviation units above or below the mean (i.e., as a standard score).
- This same distance was then articulated down to the other grades.
- The logic was one of defining growth in terms of maintaining relative status as students move across the grades.
- For example, if the passing standard was 1.0 standard deviation above the mean at Exit Level, then students who are 1.0 standard deviation above the mean in the lower-grade distributions are on track to pass the Exit Level test, provided they maintain their current standing/progress.
Growth Scaling
- For convenience, the scales were transformed such that the passing standards were at 70 (a sketch of such a transform follows below).
- Grade-level designations were then added to further enhance the meaning of the score.
- This score had some appealing reporting properties
- Passing was 70 at each grade.
- Since the TLI is a standard score, gain measures could be calculated for value-added statements.
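
A sketch of a TLI-style transform: express the score as a standard score, anchor the articulated passing distance at 70, and attach a grade designation. The 10-points-per-SD slope, the 1.0-SD passing distance, and the grade prefix are placeholders for illustration, not the actual TLI constants.

```python
def tli_style_score(raw: float, grade_mean: float, grade_sd: float,
                    grade: int, passing_z: float = 1.0,
                    points_per_sd: float = 10.0) -> float:
    """Standardize within grade, anchor the passing distance at 70, and
    attach a grade-level designation (here, as a numeric prefix)."""
    z = (raw - grade_mean) / grade_sd
    score = 70 + points_per_sd * (z - passing_z)
    return grade * 100 + score

# A grade 4 student exactly at the articulated passing distance scores 470.
print(tli_style_score(raw=46.0, grade_mean=40.0, grade_sd=6.0, grade=4))
```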
Growth Scaling
- Some concerns were also noted
- Outside of the first cut score, the TLI was essentially content-standard free.
- Because it was based on distribution statistics, the distributions (like norms) would become dated.
- Differences in the shapes of the distributions (e.g., in test difficulty) would have an unknown impact on students actually being able to hold their own.
- Differences in the content being measured across the grades are treated as essentially irrelevant.
Operationally Defined Exit Competencies
- The TAKS actually has such a component at the Exit Level.
- This is called the Higher Education Readiness Component (HERC) Standard.
- Students must reach this standard to earn dual college credit and to be allowed credit for college-level work.
- Two types of research were conducted to provide information for traditional standard setting
- Correlations with existing measures (ACT & SAT).
- An empirical study examining how well second-semester freshmen performed on the Exit Level TAKS test.
Operationally Defined Exit Competencies
- This research yielded the following results (table not reproduced here).
Operationally Defined Exit Competencies
- Some interesting observations
- The HERC standard was taken to be 2200, different from that needed to graduate.
- Second-semester college freshmen did marginally better than the TAKS passing standard required to graduate.
- Predicted ACT and SAT scores support the notion that the TAKS passing standards are moderately difficult.
- Given the content of the TAKS assessments, how could this standard be articulated down to lower grades?
Concluding Remarks
- Three possible enhancements that may or may not be intriguing for policymakers
- Grades as Achievement Levels
- Information-Rich Classrooms
- Monetary Metric
Grades as Achievement Levels
- Associating letter grades with achievement levels would
- Provide meaningful interpretations for grades
- Provide consistent meanings for grades
- Force use as experts recommend
- Enable concurrent evaluations of grades
- Enable predictive evaluations of grades
- Require help for teachers to implement
Information-Rich Classrooms
- The concept is from Schafer & Moody (2004).
- Achievement goals would be clarified through test maps.
- Progress would be tracked at the content-strand level throughout the year using combinations of formative and summative assessments (with a heavy role for computers).
- Achievement-level assignments would occur incrementally throughout the year.
Monetary Metric for Value Added
- Economists would establish the value of each exit achievement level by estimating lifetime earned income.
- The earnings would be amortized across grade levels and contents.
- The value added for each student each year is the sum, across contents, of the product of the vector of probabilities of exit achievement levels (given the student's current achievement level) and the vector of amortized monetary values (a sketch follows below).
- This enables cost-benefit analysis of education in a consistent metric for inputs and outputs.
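
A sketch of that computation. All values are hypothetical: the exit-level monetary values stand in for amortized lifetime-income estimates, and each probability vector stands in for Pr(exit level | current achievement level) in one content area.

```python
import numpy as np

# Hypothetical amortized monetary value of each exit achievement level.
amortized_value = np.array([5000., 9000., 14000., 20000.])

# One exit-level probability vector per content area, chosen to match a
# particular student's current achievement level in that content.
prob_exit_given_current = {
    "reading": np.array([0.10, 0.30, 0.40, 0.20]),
    "math":    np.array([0.20, 0.40, 0.30, 0.10]),
}

# Value added this year = sum across contents of the inner product of the
# exit-level probability vector and the amortized monetary values.
value_added = sum(p @ amortized_value for p in prob_exit_given_current.values())
print(round(value_added, 2))
```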