Title: Growing Pains: The State of the Art in Value-Added Modeling
1Growing Pains: The State of the Art in Value-Added Modeling
- Presentation on March 2, 2005 to
- Michigan School Testing Conference
- By Joseph A. Martineau
- Psychometrician
- Office of Educational Assessment and Accountability
- Michigan Department of Education
2Why Value Added?
- Value Added measures of achievement are being discussed as a possible addition to the regulations of No Child Left Behind (NCLB)
- Various ways of implementing Value Added in NCLB are possible
- One likely implementation of Value Added is as another way to make safe harbor if the percent proficiency targets are not met
3What is Value Added?
- In accountability, Value Added is a term that describes the part of achievement (or change in achievement) that is attributable to the effectiveness of a unit (teacher or school)
- Positive estimates indicate units that are above average; negative estimates indicate units that are below average
- Defining what is attributable to the effectiveness of a unit is a matter of philosophical debate
4The Logic of Value Added
- Holding educators accountable for student performance has many pitfalls
  - Educators cannot control their students' incoming achievement
  - Educators cannot control the effectiveness of their students' previous teachers/schools
  - Educators cannot control the effects of non-instructional student characteristics such as
    - Poverty
    - Parental education
    - Mobility
    - Home environment
    - Etcetera
5The Logic of Value Added, Continued
- Value Added Models (VAM) attempt to obtain pure estimates of the contribution of educators to student achievement and/or growth in achievement
- The promise of VAM is that educators are held accountable only for their impact on student learning
- The idea is not rocket science (Sanders), but the implementation is (Reckase)
6The Idea Is Not Rocket Science
- For each school
  - Estimate the expected average achievement or gain score
  - Calculate the observed average achievement or gain score
  - Subtract the expected from the observed average score
  - Define the resulting difference between expected and observed scores as the value added by the school
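The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `value_added` and the gain scores are hypothetical, and the sketch takes the expected scores as given rather than estimating them, which is where the real difficulty lies.

```python
from statistics import mean

def value_added(observed_scores, expected_scores):
    """Observed average minus expected average for one school."""
    return mean(observed_scores) - mean(expected_scores)

# Hypothetical gain scores for the students in one school.
observed = [12.0, 15.0, 9.0, 14.0]   # gains the students actually made
expected = [11.0, 13.0, 10.0, 12.0]  # gains the model expected of them

# A positive result means the school is above expectation,
# a negative result means it is below expectation.
school_value_added = value_added(observed, expected)
```

With these made-up numbers the observed average gain is 12.5 and the expected average gain is 11.5, so the school's value added is 1.0.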
7The Idea Is Not Rocket Science: Adjusting Achievement Targets to be More Fair to Educators
8The Idea Is Not Rocket Science: Adjusting Gain Targets to be More Fair to Educators (Tennessee Model)
9The Idea Is Not Rocket Science: Adjusting Gain Targets to be More Fair to Educators (Dallas Model)
10The Idea Is Getting Closer to Rocket Science: Adjusting Yearly Gain Targets to Meet a Final Achievement Goal (Thum Model)
11The Implementation IS Rocket Science: In a Growth-Based VAM, For Each School You Must
- Specify a Mixed Model (a sophisticated statistical procedure that accounts for the structure of data coming from multiple occasions for each student, and multiple students per unit)
- Estimate an overall average gain for each school year, and for the entire set of students and schools
- Estimate a unique expected average gain for each school year and school
- Estimate the difference between the school's actual average trajectory and the expected average trajectory for each school year and school
- Keep track of previous schools' effects so that they don't get counted toward later schools
- Estimate a unique expected gain for each school year, student, and school
- Estimate the difference between the expected gain and the actual gain for each school year, student, and school
- Keep track of all differences across years so that a student's high growth in one year is not counted toward all subsequent years
- Estimate all of these expected and actual gains together so that they are unbiased and reliable
- Do this all using a sparse data matrix, which causes ordinary software to choke
- So, you write your own software, and develop new applications of statistical theory to make your idea work
- Communicate the results in an understandable fashion to stakeholders
12The Problem with Rocket Science
- And with rocket science, many things can cause large distortions in the results of VAM, including
  - Small problems with the scales of measurement
  - Small programming errors
  - Small errors in assumptions needed for the statistical models to work appropriately
13Statistical Issues in VAM
- 50 years ago, researchers despaired of ever being able to measure growth validly, because the statistical issues seemed insurmountable
- Most of the statistical issues have been solved by the introduction of Statistical Mixed Models
14Statistical Issues in VAM, Continued
- For VAM, one very significant statistical issue remains
  - The parts of the statistical models that produce estimates of Value Added were originally included in statistical models with the purpose of accounting for sources of error so that other effects were easier to identify
  - Therefore, estimates of value added can also be classified as error terms
  - Estimates of Value Added are technically the portion of achievement or gains that cannot be explained by anything else included in the model
  - In effect, the implementation of a Value-Added Model says "whatever portion of achievement and/or growth we do not know how to explain is to be attributed to schools"
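The "whatever is left over" idea can be illustrated with a deliberately naive sketch. It fits an ordinary least-squares line of gain on a single student covariate and then averages the residuals by school. This is not a mixed model and not any state's actual procedure; the function name `school_residual_effects`, the covariate, and all records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def school_residual_effects(records):
    """records: list of (school, covariate, gain) tuples.

    Returns {school: mean residual}, where the residual is the part of
    each student's gain that the covariate does not explain.
    """
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least-squares slope and intercept for gain on the covariate.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Whatever the line does not explain is credited to the school.
    residuals = defaultdict(list)
    for school, x, y in records:
        residuals[school].append(y - (intercept + slope * x))
    return {school: mean(values) for school, values in residuals.items()}

# Hypothetical data: the covariate is a 0/1 student characteristic.
records = [
    ("A", 0, 10.0), ("A", 1, 12.0), ("A", 0, 11.0),
    ("B", 1, 9.0), ("B", 0, 8.0), ("B", 1, 10.0),
]
effects = school_residual_effects(records)
```

In this toy data school A ends up with a positive "value added" and school B with an equal negative one. Adding or removing a covariate changes what counts as unexplained, and therefore changes what is attributed to the schools.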
15Statistical Issues in VAM, Continued
- Philosophical, ethical, and political considerations of attributing to schools all achievement/gains that cannot be explained any other way
  - Do we have to remove differences explained by ethnicity before we can attribute the rest to schools?
  - Do we have to remove differences explained by poverty before we can attribute the rest to schools?
  - Etcetera
- Is it possible to ever satisfy the majority of stakeholders that what's left over is pure enough to hold schools accountable for?
- No matter how we answer these questions, the answers raise additional philosophical, ethical, and political concerns.
16Ethical Issues in VAM, Continued
- VAMs as Currently Implemented
  - Focus lies squarely on being fair to educators
  - In TN and OH
    - All educators are expected to produce the same average gains in their students
    - The achievement gap is expected to remain as it was because educators of lower-achieving groups of students are not expected to help their students catch up
  - In Dallas
    - All educators are expected to produce gains in their students that are equivalent to the average gains achieved by similar groups of students
    - The achievement gap may be expected to widen because lower performing groups of students may achieve lower average gains than other groups of students
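The contrast between the two kinds of gain target can be sketched as follows. This is a simplification for illustration only, not the Tennessee, Ohio, or Dallas statistical models themselves; the function names, the two student groups, and the gains are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical students, each with a prior-achievement group and a gain.
students = [
    {"id": 1, "group": "lower", "gain": 8.0},
    {"id": 2, "group": "lower", "gain": 10.0},
    {"id": 3, "group": "higher", "gain": 12.0},
    {"id": 4, "group": "higher", "gain": 14.0},
]

def same_gain_targets(records):
    """Every student gets the same target: the overall average gain."""
    overall = mean(r["gain"] for r in records)
    return {r["id"]: overall for r in records}

def similar_group_targets(records):
    """Each student's target is the average gain of similar students."""
    gains_by_group = defaultdict(list)
    for r in records:
        gains_by_group[r["group"]].append(r["gain"])
    group_means = {g: mean(v) for g, v in gains_by_group.items()}
    return {r["id"]: group_means[r["group"]] for r in records}

tn_style = same_gain_targets(students)          # same target for all
dallas_style = similar_group_targets(students)  # target depends on group
```

With these made-up numbers the common target is 11 points for everyone, while the group-based targets are 9 points for the lower group and 13 for the higher group. If every student exactly meets a common target, the gap stays where it was; if every student exactly meets a group-based target, the gap grows by 4 points.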
17Ethical Issues in VAM, Continued
- Where does VAM take into account fairness for low-performing students?
  - Currently implemented VAMs say basically, "I need to see one year's growth for one year of instruction," where (as in the Dallas model) one year's worth of growth can be less for some groups of students than for others
  - Because of concerns about being fair to educators, groups of students that start out behind are left behind by the same amount (or even more)
- The Thum model is a compromise that expects a modest amount more of educators serving low-achieving students, with the expectation that the gap will be closed over many grades
  - Not really a VAM
  - A mixture of status and growth
18Political Issues in VAM
- Complexity
  - "Rocket Science" is a political liability
  - As more of the statistical and ethical issues of VAM are addressed, VAMs are likely to become even more inaccessible to the lay audience
  - VAM requires an extraordinary amount of trust in those who implement the system
- Ethical issues will be decided by a political process that does not necessarily account for the best interest of students and educators, e.g.
  - Dallas: Focus on best interests of educators at the possible price of increasing achievement gaps
  - TN, OH: Focus on best interests of educators at the possible price of leaving achievement gaps as they are
  - Thum: Focus on best interests of low-performing groups at the possible expense of (1) high-performing groups of students, and (2) making low-achieving schools less attractive to qualified teachers
- The state of the art in VAM is incapable of providing for both high achievement for all students and fairness in evaluating educators of lower-performing students
19Measurement Issues in VAM
- Having solved most of the statistical issues in
VAM, the measurement issues have been forgotten
in the excitement
20Measurement Issues in VAM, Continued
- Assumes that the same thing is being measured at every grade level of the test
- Presents a dilemma
  - In order to measure validly, we have to measure what is being taught, which changes over grade levels
  - In order to calculate growth, gains, and value-added, we have to measure the same thing every time we measure
- Value added models are being applied to construct-shifting scales as if the scales were interval-level measures of student achievement on unchanging content
21Cautions in using Vertical Scales
- Scholars have been warning against the use of construct-shifting scales to measure growth for 50 years
- However, the use of vertical scales in growth models has become increasingly prevalent in scholarly literature with the advent of recent statistical developments (HLM and SEM)
- So am I just straining at gnats?
  - Can't I just use vertical scales to measure growth?
  - What harm can it do?
  - How big is the effect of changing content on growth- and growth-based value-added models?
22Hypothetical example
- A vertically scaled mathematics test
- Grades 3-8
- Composed of only two constructs
- Basic Computation (BC)
- Problem Solving (PS)
- BC is heavily represented in early grades
- PS is heavily represented in later grades
- Only the single, combined math score is available
(BC and PS are just in the background)
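A small simulation can make this setup concrete. The sketch below assumes the combined math score is a weighted average of BC and PS, with the BC weight falling from grade 3 to grade 8. The weights, the starting scores, and the growth rates are invented for illustration and are not taken from the presentation.

```python
# Assumed share of the combined score that comes from Basic Computation
# (BC) at each grade; the rest comes from Problem Solving (PS).
BC_WEIGHTS = {3: 0.9, 4: 0.8, 5: 0.6, 6: 0.4, 7: 0.2, 8: 0.1}
GRADES = sorted(BC_WEIGHTS)

def combined_score(grade, bc, ps):
    """Single reported math score: a grade-specific blend of BC and PS."""
    weight = BC_WEIGHTS[grade]
    return weight * bc + (1 - weight) * ps

def observed_gains(bc_by_grade, ps_by_grade):
    """Year-to-year gains on the combined scale, grades 3-4 through 7-8."""
    scores = [combined_score(g, bc_by_grade[g], ps_by_grade[g]) for g in GRADES]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

# Two hypothetical students who both start at 100 on BC and PS and both
# grow 10 points a year, but on different constructs.
growing = {g: 100.0 + 10.0 * i for i, g in enumerate(GRADES)}
flat = {g: 100.0 for g in GRADES}

bc_only_gains = observed_gains(growing, flat)  # grows on BC only
ps_only_gains = observed_gains(flat, growing)  # grows on PS only
```

Under these assumptions the student who grows only on BC shows combined-scale gains of about 8, 4, 0, -4, and -3, while the student who grows only on PS shows gains of about 2, 6, 10, 14, and 13. Both students learn the same amount each year, yet the shifting mix of content makes their measured growth look very different.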
23Hypothetical example
24Hypothetical Example
25Hypothetical Example
26The Effects of Construct Shift
- Construct shift affects
  - The estimation of educational effectiveness (the results of Value-Added Models)
    - Does not accurately identify effectiveness if student achievement is outside the range measured well by the grade-level test
    - Attributes effectiveness of prior teachers/schools to current teachers/schools (violates the promise of Value-Added Models)
28Reliability
- Ratio of construct-related variance to total variance (construct-related plus non-construct-related variance)
- Extend to Value-Added Models
  - Ratio of variance in true value added to total variance (true value-added variance plus variance of distortions)
- How important is this distortion, especially when the constructs are correlated?
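The ratio defined above is simple to compute once the two variance components are known. The sketch below shows only that ratio; the function name `vam_reliability` and the variance values are hypothetical, and it does not attempt the derivation of the upper bound.

```python
def vam_reliability(true_value_added_variance, distortion_variance):
    """Variance in true value added divided by total variance."""
    total = true_value_added_variance + distortion_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return true_value_added_variance / total

# Hypothetical variance components.
example_reliability = vam_reliability(4.0, 1.0)
```

With a true value-added variance of 4 and a distortion variance of 1, the reliability is 0.8. As the distortion variance grows relative to the true variance, the ratio falls toward zero.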
29Reliability
Martineau (in press) derived an upper bound on reliability of VAM
- Affected by content balance (more balanced means lower reliability)
- Affected by correlation in value added (higher correlation means higher reliability)
- Affected by grade level (later grades have lower reliability)
- Affected by magnitude of changes in content across grades (larger changes mean lower reliability)
30Reliability of VAM Results
31Reliability
- Only in extraordinary circumstances are the results reliable enough for high-stakes use
- For research use, the results may be reliable enough in some limited circumstances
32Alleviating low reliability of value-added analyses
- Twice a year testing
  - Not politically viable
  - Completely eliminates low reliability
- Once yearly testing, new equating design
  - Embed the entire set of below-grade items on the current grade test by including a small portion of the set on each of multiple test forms
  - Calibrate a separate vertical scale for each adjacent pair of grades (e.g. 3/4, 4/5, 5/6)
    - Concurrent calibration of grade 3 and 4 items together, 4 and 5 items together, 5 and 6 items together
  - Should markedly reduce the amount of construct shift, and increase the reliability to an acceptable degree
33Contact Information
- Joseph Martineau
- Office of Educational Assessment and Accountability
- Michigan Department of Education
- P.O. Box 30008
- Lansing, MI 48909
- (517) 241-4710
- martineauj_at_michigan.gov