Growing Pains: The State of the Art in Value-Added Modeling


1
Growing Pains The State of the Art in
Value-Added Modeling
  • Presentation on March 2, 2005 to
  • Michigan School Testing Conference
  • By Joseph A. Martineau
  • Psychometrician
  • Office of Educational Assessment and Accountability
  • Michigan Department of Education

2
Why Value Added?
  • Value Added measures of achievement are being
    discussed as a possible addition to the
    regulations of No Child Left Behind (NCLB).
  • Various ways of implementing Value Added in NCLB
    are possible
  • One likely implementation of Value Added is as
    another way to make safe harbor if the percent
    proficiency targets are not met

3
What is Value Added?
  • In accountability, Value Added is a term that
    describes the part of achievement (or change in
    achievement) that is attributable to the
    effectiveness of a unit (teacher or school)
  • Positive estimates indicate units that are above
    average, negative estimates indicate that units
    are below average
  • Defining what is attributable to the
    effectiveness of a unit is a matter of
    philosophical debate

4
The Logic of Value Added
  • Holding educators accountable for student
    performance has many pitfalls
  • Educators cannot control their students' incoming
    achievement
  • Educators cannot control the effectiveness of
    their students' previous teachers/schools
  • Educators cannot control the effects of
    non-instructional student characteristics such
    as
  • Poverty
  • Parental education
  • Mobility
  • Home environment
  • Etcetera

5
The Logic of Value Added, Continued
  • Value Added Models (VAM) attempt to obtain pure
    estimates of the contribution of educators to
    student achievement and/or growth in achievement
  • The promise of VAM is that educators are held
    accountable only for their impact on student
    learning
  • The idea is not rocket science (Sanders), but the
    implementation is (Reckase)

6
The Idea Is Not Rocket Science
  • For each school
  • Estimate the expected average achievement or gain
    score
  • Calculate the observed average achievement or
    gain score
  • Subtract the expected from the observed average
    score
  • Define the resulting difference between expected
    and observed scores as the value added by the
    school
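As a minimal sketch of the four steps above, the calculation could look like this in Python. It assumes the simplest possible expectation, namely that every school is expected to match the mean gain of all students; real models derive the expectation from a statistical model. The school names and gain scores are hypothetical.

```python
from statistics import mean

def value_added(gains_by_school):
    """Return observed-minus-expected average gain for each school.

    Assumption: the expected average gain is the mean gain over all
    students in all schools, which is the simplest possible choice.
    """
    all_gains = [g for gains in gains_by_school.values() for g in gains]
    expected = mean(all_gains)
    return {school: mean(gains) - expected
            for school, gains in gains_by_school.items()}

# Hypothetical gain scores for three schools.
gains = {
    "School A": [12, 15, 9],
    "School B": [8, 7, 9],
    "School C": [10, 11, 9],
}
estimates = value_added(gains)
```

With these numbers the expected gain is 10, so School A comes out above average, School B below average, and School C exactly at zero.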

7
The Idea Is Not Rocket Science: Adjusting
Achievement Targets to be More Fair to Educators
8
The Idea Is Not Rocket Science: Adjusting Gain
Targets to be More Fair to Educators (Tennessee
Model)
9
The Idea Is Not Rocket Science: Adjusting Gain
Targets to be More Fair to Educators (Dallas
Model)
10
The Idea Is Getting Closer to Rocket
Science: Adjusting Yearly Gain Targets to Meet a
Final Achievement Goal (Thum Model)
11
The Implementation IS Rocket Science: In a
Growth-Based VAM, For Each School You Must
  • Specify a Mixed Model (a sophisticated
    statistical procedure that accounts for the
    structure of data coming from multiple occasions
    for each student, and multiple students per unit)
  • Estimate an overall average gain for each school
    year, and for the entire set of students and
    schools
  • Estimate a unique expected average gain for each
    school year and school
  • Estimate the difference between the school's
    actual average trajectory and the expected
    average trajectory for each school year and
    school
  • Keep track of previous schools' effects so that
    they don't get counted toward later schools
  • Estimate a unique expected gain for each school
    year, student, and school
  • Estimate the difference between the expected gain
    and the actual gain for each school year,
    student, and school
  • Keep track of all differences across years so
    that a student's high growth in one year is not
    counted toward all subsequent years
  • Estimate all of these expected and actual gains
    together so that they are unbiased and reliable
  • Do this all using a sparse data matrix, which
    causes ordinary software to choke
  • So, you write your own software, and develop new
    applications of statistical theory to make your
    idea work
  • Communicate the results in an understandable
    fashion to stakeholders

12
The Problem with Rocket Science
  • And with rocket science, many things can cause
    large distortions in the results of VAM,
    including
  • Small problems with the scales of measurement
  • Small programming errors
  • Small errors in assumptions needed for the
    statistical models to work appropriately

13
Statistical Issues in VAM
  • 50 years ago, researchers despaired of ever
    being able to measure growth validly, because the
    statistical issues seemed insurmountable
  • Most of the statistical issues have been solved
    by the introduction of Statistical Mixed Models

14
Statistical Issues in VAM, Continued
  • For VAM, one very significant statistical issue
    remains
  • The parts of the statistical models that produce
    estimates of Value Added were originally included
    in statistical models with the purpose of
    accounting for sources of error so that other
    effects were easier to identify
  • Therefore, estimates of value added can also be
    classified as error terms
  • Estimates of Value Added are technically the
    portion of achievement or gains that cannot be
    explained by anything else included in the model
  • In effect, the implementation of a Value-Added
    Model says "whatever portion of achievement
    and/or growth we do not know how to explain is to
    be attributed to schools"
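One way to see the point above is to write out a generic two-level random-intercept model, a common form of mixed model. This is an illustrative sketch only, not the specification of any particular state's VAM, and the symbols are assumptions introduced here.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij},
\qquad u_j \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2)
```

Here $y_{ij}$ is the score or gain of student $i$ in school $j$, $\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}$ is the part explained by the predictors included in the model, $e_{ij}$ is student-level error, and $u_j$ is the school-level error term. The estimate of $u_j$ is what would be reported as the school's value added: whatever remains at the school level after everything else in the model has been accounted for.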

15
Statistical Issues in VAM, Continued
  • Philosophical, ethical, and political
    considerations of attributing to schools all
    achievement/gains that cannot be explained any
    other way
  • Do we have to remove differences explained by
    ethnicity before we can attribute the rest to
    schools?
  • Do we have to remove differences explained by
    poverty before we can attribute the rest to
    schools?
  • Etcetera
  • Is it possible to ever satisfy the majority of
    stakeholders that what's left over is pure enough
    to hold schools accountable for?
  • No matter how we answer these questions, it
    raises additional philosophical, ethical, and
    political concerns.

16
Ethical Issues in VAM, Continued
  • VAMs as Currently Implemented
  • Focus lies squarely on being fair to educators
  • In TN and OH
  • All educators are expected to produce the same
    average gains in their students
  • The achievement gap is expected to remain as it
    was because educators of lower-achieving groups
    of students are not expected to help their
    students catch up
  • In Dallas
  • All educators are expected to produce gains in
    their students that are equivalent to the average
    gains achieved by similar groups of students
  • The achievement gap may be expected to widen
    because lower performing groups of students may
    achieve lower average gains than other groups of
    students
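The arithmetic behind the two expectations above can be sketched as follows, assuming every educator exactly meets the expected gain. The starting scores and yearly gains are hypothetical numbers chosen only to illustrate the pattern; they are not taken from any of the models named.

```python
def gap_over_time(start_high, start_low, gain_high, gain_low, years):
    """Return the achievement gap (high minus low) at the end of each year."""
    gaps = []
    high, low = start_high, start_low
    for _ in range(years):
        high += gain_high
        low += gain_low
        gaps.append(high - low)
    return gaps

# Hypothetical scale scores: the two groups start 20 points apart.
# Same expected gain for every group (the TN and OH expectation).
equal_gain_gaps = gap_over_time(520, 500, gain_high=10, gain_low=10, years=3)

# Expected gain equals the average gain of similar students, which is
# assumed here to be lower for the lower-performing group (the Dallas
# expectation).
group_gain_gaps = gap_over_time(520, 500, gain_high=10, gain_low=8, years=3)
```

Under equal expected gains the gap stays at 20 points every year; under group-specific expected gains it grows to 22, 24, and then 26 points.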

17
Ethical Issues in VAM, Continued
  • Where does VAM take into account fairness for
    low-performing students?
  • Currently implemented VAMs basically say, "I need
    to see one year's growth for one year of
    instruction," where (as in the Dallas model) one
    year's worth of growth can be less for some
    groups of students than for others
  • Because of concerns about being fair to
    educators, groups of students that start out
    behind are left behind by the same amount (or
    even more)
  • The Thum model is a compromise that expects
    modestly more of educators serving low-achieving
    students, so that the gap is closed, but only
    over many grades
  • Not really a VAM
  • A mixture of status and growth

18
Political Issues in VAM
  • Complexity
  • Rocket Science is a political liability
  • As more of the statistical and ethical issues of
    VAM are addressed, VAMs are likely to become even
    more inaccessible to the lay audience
  • VAM requires an extraordinary amount of trust in
    those who implement the system
  • Ethical issues will be decided by a political
    process that does not necessarily account for the
    best interest of students and educators, e.g.
  • Dallas: Focus on best interests of educators at
    the possible price of increasing achievement gaps
  • TN, OH: Focus on best interests of educators at
    the possible price of leaving achievement gaps as
    they are
  • Thum: Focus on best interests of low-performing
    groups at the possible expense of (1)
    high-performing groups of students, and (2)
    making low-achieving schools less attractive to
    qualified teachers
  • The state of the art in VAM is incapable of
    providing for both high achievement for all
    students and fairness in evaluating educators of
    lower-performing students

19
Measurement Issues in VAM
  • With most of the statistical issues in VAM
    solved, the measurement issues have been
    forgotten in the excitement

20
Measurement Issues in VAM, Continued
  • VAM assumes that the same thing is being measured
    at every grade level of the test
  • Presents a dilemma
  • In order to measure validly, we have to measure
    what is being taught, which changes over grade
    levels
  • In order to calculate growth, gains, and
    value-added, we have to measure the same thing
    every time we measure
  • Value added models are being applied to
    construct-shifting scales as if the scales were
    interval-level measures of student achievement on
    unchanging content

21
Cautions in using Vertical Scales
  • Scholars have been warning against the use of
    construct-shifting scales to measure growth for
    50 years
  • However, the use of vertical scales in growth
    models has become increasingly prevalent in
    scholarly literature with the advent of recent
    statistical developments (HLM and SEM)
  • So am I just straining at gnats?
  • Can't I just use vertical scales to measure
    growth?
  • What harm can it do?
  • How big is the effect of changing content on
    growth- and growth-based value-added models?

22
Hypothetical example
  • A vertically scaled mathematics test
  • Grades 3-8
  • Composed of only two constructs
  • Basic Computation (BC)
  • Problem Solving (PS)
  • BC is heavily represented in early grades
  • PS is heavily represented in later grades
  • Only the single, combined math score is available
    (BC and PS are just in the background)
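The setup above can be sketched in Python to show why a shifting mix of constructs distorts measured growth. The grade-by-grade weights and the student scores below are hypothetical values chosen for illustration, and the weighted-average scoring rule is an assumption made for this sketch, not the scaling method used in the presentation.

```python
# Hypothetical share of the combined score that comes from Basic
# Computation (BC) at each grade; the remainder comes from Problem
# Solving (PS). These weights are assumptions for illustration only.
BC_WEIGHT = {3: 0.8, 4: 0.7, 5: 0.6, 6: 0.4, 7: 0.3, 8: 0.2}

def combined_score(grade, bc, ps):
    """Single reported math score: a grade-specific mix of BC and PS."""
    w = BC_WEIGHT[grade]
    return w * bc + (1 - w) * ps

def reported_gain(grade, bc_before, ps_before, bc_gain, ps_gain):
    """Gain on the combined scale from `grade` to `grade + 1`."""
    before = combined_score(grade, bc_before, ps_before)
    after = combined_score(grade + 1, bc_before + bc_gain, ps_before + ps_gain)
    return after - before

# A hypothetical student strong in BC (60) and weak in PS (40) who gains
# exactly 10 points on both constructs between grades 5 and 6.
gain = reported_gain(5, bc_before=60, ps_before=40, bc_gain=10, ps_gain=10)
```

In this sketch the student grew 10 points on each construct, but the reported gain is about 6 points, because the grade 6 test gives more weight to the student's weaker construct. A student with equal BC and PS scores would show the full 10 points.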

23
Hypothetical example
24
Hypothetical Example
25
Hypothetical Example
26
The Effects of Construct Shift
  • Construct shift affects
  • The estimation of educational effectiveness (the
    results of Value-Added Models)
  • Does not accurately identify effectiveness if
    student achievement is outside the range measured
    well by the grade-level test
  • Attributes effectiveness of prior
    teachers/schools to current teachers/schools
    (violates the promise of Value-Added Models)

27
(No Transcript)
28
Reliability
  • Ratio of construct-related variance to total
    variance (construct-related plus
    non-construct-related variance)
  • Extend to Value-Added Models
  • Ratio of variance in true value added to total
    variance (true value-added variance plus variance
    of distortions)
  • How important is this distortion, especially when
    the constructs are correlated?
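The two ratios above can be written symbolically as follows. The symbol names are introduced here for illustration and are not taken from the slides.

```latex
\rho_{\text{test}} =
  \frac{\sigma^2_{\text{construct}}}
       {\sigma^2_{\text{construct}} + \sigma^2_{\text{non-construct}}},
\qquad
\rho_{\text{VAM}} =
  \frac{\sigma^2_{\text{true VA}}}
       {\sigma^2_{\text{true VA}} + \sigma^2_{\text{distortion}}}
```

For example, if the variance of the distortions were equal to the variance in true value added, the second ratio would be 0.5.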

29
Reliability
Martineau (in press) derived an upper bound
on reliability of VAM
  • Affected by content balance (more balanced means
    lower reliability)
  • Affected by correlation in value added (higher
    correlation means higher reliability)
  • Affected by grade level (later grades have lower
    reliability)
  • Affected by magnitude of changes in content
    across grades (larger changes mean lower
    reliability)

30
Reliability of VAM Results
31
Reliability
  • Only in extraordinary circumstances are the
    results reliable enough for high-stakes use
  • For research use, the results may be reliable
    enough in some limited circumstances

32
Alleviating low reliability of value-added
analyses
  • Twice a year testing
  • Not politically viable
  • Completely eliminates low reliability
  • Once yearly testing, new equating design
  • Embed the entire set of below-grade items on the
    current grade test by including a small portion
    of the set on each of multiple test forms
  • Calibrate a separate vertical scale for each
    adjacent pair of grades (e.g. 3/4, 4/5, 5/6)
  • Concurrent calibration of grade 3 and 4 items
    together, 4 and 5 items together, 5 and 6 items
    together
  • Should markedly reduce the amount of construct
    shift, and increase the reliability to an
    acceptable degree

33
Contact Information
  • Joseph Martineau
  • Office of Educational Assessment and Accountability
  • Michigan Department of Education
  • P.O. Box 30008
  • Lansing, MI 48909
  • (517) 241-4710
  • martineauj_at_michigan.gov