Title: Teachers and Student Achievement in the Chicago Public High Schools
1. Teachers and Student Achievement in the Chicago Public High Schools
- Daniel Aaronson
- Federal Reserve Bank of Chicago
- Lisa Barrow
- Federal Reserve Bank of Chicago
- William Sander
- DePaul University
- latest version, on my hard disk in Chicago
2. What Are We Trying To Do?
- Estimate the importance of teachers to educational achievement.
- Why does the Fed care about this?
- Productivity study of teachers and (in the future) students.
- Test scores are an indicator of future student productivity: Grogger and Eide (1995), Murnane et al. (1995), Neal and Johnson (1996), Hanushek and Kimko (2000), Bishop (1992).
3. Lots of Policy Implications Along the Way
- How to compensate teachers.
- In most industries, compensation tracks productivity.
- How to set up accountability standards.
- How sensitive are the teacher rankings to specification issues? Does this matter for accountability systems?
- Can the econometrician predict who the good teachers are?
- Or, more importantly, can the principal?
- How to determine hiring and tenure policy.
- Critical for thinking about other policy levers with quality/quantity tradeoffs, e.g., reducing class size.
4. New Literature on Teacher Effects
- Original U.S. study: Coleman (1966).
- 1990s use of administrative records, pioneered in Tennessee and Texas.
- Advantages:
- Micro data: lots of cross-sectional variation.
- Longitudinal data make it possible to minimize sorting behavior and other confounding factors by using fixed-effect models, repeated measures, and multiple cohorts.
- Many students per teacher.
5. What Do We Contribute?
- Large urban, mostly minority, mostly poor school system.
- Critical for policy.
- Chicago is particularly useful since it was doing so poorly (perhaps less so now), e.g., Secretary Bennett's "fondness" for the Chicago schools.
6. School Districts in the U.S., 2000-01
7. What Do We Contribute?
- Match students with teachers at the classroom level.
- No aggregation issues: this is the level that plausibly corresponds to the intervention effects.
8. What Do We Contribute?
- High schools (most studies look at elementary schools).
- Can look at subject teachers rather than general teachers.
- Populations rather than samples.
- Know everyone who is in every classroom (including non-math classes).
- Decent info on teacher characteristics.
- Covers major compensation factors.
- Can isolate quality coming from observed and unobserved characteristics.
- Many of these features are available in other datasets, but rarely together.
9. Data
- Administrative records from the Chicago Public High Schools for 1996-97 to 1999-2000 (3 years).
- Only use 9th grade to this point, but have all of high school.
- Population: 27,000 to 29,000 9th-grade students in each year.
- Sample: 53,000 unique students.
- See paper for discussion of sample selection issues.
10. Table 1 -- Student Descriptive Statistics
11. Table 2 -- More Student Test Score Statistics
12. Sampling of other stuff available to us (Table 1)
13. Table 4 -- Teacher Characteristics
14. What Do We Do?
- Step 1: Estimate teacher quality.
- Step 2: Estimate the relationship between measured teacher quality and observable teacher characteristics.
15. Estimating Teacher Quality
- Simple strategy: a value-added model (include the lagged dependent variable(s) on the right-hand side). This picks up cumulative inputs from prior years while allowing for a flexible autoregressive relationship in test scores.
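The following is a minimal sketch of the value-added idea in the simplest form: score_i = alpha * lagged_i + tau_j(i) + e_i, where j(i) is the student's teacher. It assumes each student is linked to exactly one teacher and has a single lagged score. The actual data link students to teachers by student-semester, and this sketch ignores that. With a full set of teacher dummies, OLS can be computed by demeaning within teacher (Frisch-Waugh). The function name and the simulated data are hypothetical.

```python
import random
from collections import defaultdict


def value_added_fe(scores, lagged, teacher):
    """OLS of score on lagged score plus a full set of teacher dummies.

    Uses the within-teacher (Frisch-Waugh) transformation, so no matrix
    algebra is needed. Returns (alpha_hat, {teacher_id: tau_hat}).
    """
    groups = defaultdict(list)
    for y, x, j in zip(scores, lagged, teacher):
        groups[j].append((y, x))
    # Teacher-specific means of the outcome and of the lagged score.
    means = {
        j: (sum(y for y, _ in obs) / len(obs), sum(x for _, x in obs) / len(obs))
        for j, obs in groups.items()
    }
    # The slope on the lagged score comes only from within-teacher variation.
    num = den = 0.0
    for y, x, j in zip(scores, lagged, teacher):
        ybar, xbar = means[j]
        num += (y - ybar) * (x - xbar)
        den += (x - xbar) ** 2
    alpha = num / den
    # Each teacher dummy equals that teacher's mean outcome net of alpha * lagged score.
    tau = {j: ybar - alpha * xbar for j, (ybar, xbar) in means.items()}
    return alpha, tau


# Hypothetical simulated data: 20 teachers with 60 students each.
rng = random.Random(0)
true_tau = {j: rng.gauss(0, 0.2) for j in range(20)}
scores, lagged, teacher = [], [], []
for j in range(20):
    for _ in range(60):
        x = rng.gauss(0, 1)
        lagged.append(x)
        scores.append(0.8 * x + true_tau[j] + rng.gauss(0, 0.5))
        teacher.append(j)

alpha, tau = value_added_fe(scores, lagged, teacher)
print(f"alpha_hat = {alpha:.3f}")
```

The within transformation gives the same coefficients as OLS with explicit dummies. It just avoids building a large design matrix.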
16. Estimating Teacher Quality
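- Problem: the estimated teacher effect is biased by (simple representation)
$$\hat{\tau}_j = \tau_j + \underbrace{\rho_s}_{\text{schools}} + \underbrace{\theta_t}_{\text{time}} + \underbrace{\frac{1}{N_j}\sum_{i \in j}\varepsilon_{it}}_{\text{white noise}} + \underbrace{\frac{1}{N_j}\sum_{i \in j}\nu_i}_{\text{family, indiv}}$$
where τ_j is teacher j's true effect, ρ_s and θ_t are school and time effects, ν_i captures individual and family factors, ε_it is white noise, and N_j is the number of students per teacher.
- I.e., the teacher dummies may be confounded by time, school, individual and family factors (especially nonrandom sorting), and random fluctuations that should not be attributed to the teacher effect.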
17. Individual and Family Effects
- Using gains helps here.
- We can also control for lots of observables (see paper).
- For bias to remain, there would have to be sorting into certain teachers based on changes in unobserved characteristics.
- Throw out transition schools, where this is likely.
- Clotfelter et al. (2004).
- How much within-school classroom sorting is there? Table 3: mean variance of lagged test scores by teacher.
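One simple way to gauge within-school classroom sorting, in the spirit of Table 3 (the paper's exact statistic may differ), is to compare the average within-teacher variance of lagged scores with the overall variance. Random assignment to reasonably large classrooms gives a ratio close to 1. Strong tracking by prior achievement pushes it toward 0. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def sorting_ratio(lagged, teacher):
    """Mean within-teacher variance of lagged scores divided by the overall variance.

    Close to 1 under random classroom assignment (for reasonably large classes);
    close to 0 if classrooms are tracked by prior achievement.
    """
    groups = defaultdict(list)
    for x, j in zip(lagged, teacher):
        groups[j].append(x)
    within = mean(pvariance(xs) for xs in groups.values())
    return within / pvariance(lagged)


# Perfect tracking: each teacher gets students with identical prior scores.
print(sorting_ratio([1, 1, 2, 2], ["a", "a", "b", "b"]))
# No tracking: each class mirrors the overall distribution.
print(sorting_ratio([1, 2, 1, 2], ["a", "a", "b", "b"]))
```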
18. School Effects
- Sorting across schools is likely important.
- I.e., school-level policies (e.g., curriculum), personnel (the principal), and latent family or neighborhood characteristics that might influence school choice.
- Note that funding and most curriculum decisions are made centrally by the district and thus are not in play here.
- School fixed effects use only within-school variation.
- We don't have to assign the school effect to particular measures.
- Which variation is used? Alternatives.
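To see which variation survives school fixed effects, the sketch below re-expresses teacher effects relative to their school's mean. This is one normalization of teacher effects once a school effect is included. Anything common to a school, such as the principal or the neighborhood, drops out. A teacher who is alone in a school contributes no identifying variation. Names and numbers are hypothetical.

```python
from collections import defaultdict


def within_school(tau, school):
    """Express each teacher effect relative to the mean effect in that teacher's school.

    tau: {teacher_id: effect}; school: {teacher_id: school_id}.
    """
    by_school = defaultdict(list)
    for j, t in tau.items():
        by_school[school[j]].append(t)
    school_mean = {s: sum(v) / len(v) for s, v in by_school.items()}
    return {j: t - school_mean[school[j]] for j, t in tau.items()}


tau = {"a": 1.0, "b": 3.0, "c": 10.0, "d": 12.0, "e": 7.0}
school = {"a": "s1", "b": "s1", "c": "s2", "d": "s2", "e": "s3"}
print(within_school(tau, school))

# Shifting every teacher in school s2 by the same amount leaves the result unchanged.
shifted = {j: t + (5.0 if school[j] == "s2" else 0.0) for j, t in tau.items()}
print(within_school(shifted, school) == within_school(tau, school))
```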
19. Sampling Variation
- Kane and Staiger (2002): a big problem.
- Fixed effects in small samples can be severely problematic. Sampling variation can overwhelm the signal; a few good (or bad) apples can upset the cart.
- Variability is strongly related to the number of observations that make up the teacher fixed effect. I.e., teachers with few students tend to be the highest- and lowest-performing in a literal interpretation of the fixed-effect distribution.
- This artificially inflates the dispersion of our fixed-effect estimates.
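A small simulation makes the point concrete. It assumes every teacher's true effect is zero and that the estimate is simply the mean of N_j independent student-level noise draws, so its variance is sigma^2 / N_j. The most extreme estimates then come from teachers with few students. Class sizes and the noise level are hypothetical.

```python
import random
from statistics import pvariance


def simulate_estimates(class_sizes, noise_sd=1.0, seed=1):
    """Teacher-effect estimates when every true effect is zero.

    Each estimate is the mean of n student-level noise draws, so its
    sampling variance is noise_sd**2 / n.
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0, noise_sd) for _ in range(n)) / n for n in class_sizes]


sizes = [10] * 200 + [200] * 200  # hypothetical: 200 small and 200 large teachers
est = simulate_estimates(sizes)
small, large = est[:200], est[200:]
print("variance, N=10:", pvariance(small), " variance, N=200:", pvariance(large))

# Class size of the teacher with the single most extreme estimate.
extreme = max(range(len(est)), key=lambda i: abs(est[i]))
print("most extreme estimate comes from a teacher with N =", sizes[extreme])
```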
20. Figure 2 -- Teacher Effect Estimates versus the Number of Student-Semester Observations
Regressing on the number of student-semester observations: -0.00047 (0.00008). The relationship disappears when N_j > 200.
21. Sampling Variation -- What Do We Do?
- Trim outlying observations on test score gains.
- Set a minimum number (15, 50, 100) of student-semester observations for identification.
- Adjust for the size of sampling error by assuming that the estimated teacher effect is the sum of the actual effect and noise.
- Use the mean of the squared standard errors of the estimated teacher effects as an estimate of the sampling variance, and subtract this from the observed variance.
- If N_j is big enough (around 200), this problem essentially goes away. Practically, though, we can't restrict the sample to these teachers (that would miss an interesting group).
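The sampling-variance correction is simple arithmetic: Var(true effects) ≈ Var(estimates) - mean(SE^2). It is valid under the slide's assumption that each estimate is the true effect plus noise uncorrelated with it. The sketch below implements the correction together with the minimum-observation restriction. The function names and the simulated data are hypothetical.

```python
import random
from statistics import mean, pvariance


def adjusted_variance(estimates, std_errors):
    """Observed variance of teacher-effect estimates minus the estimated sampling
    variance (mean of squared standard errors)."""
    return pvariance(estimates) - mean(se ** 2 for se in std_errors)


def restrict_min_obs(estimates, std_errors, n_obs, min_obs):
    """Keep only teachers with at least min_obs student-semester observations."""
    keep = [i for i, n in enumerate(n_obs) if n >= min_obs]
    return [estimates[i] for i in keep], [std_errors[i] for i in keep]


# Hypothetical data: true effects with SD 0.2, noise SD 1 / sqrt(N_j).
rng = random.Random(3)
n_obs = [rng.randint(15, 300) for _ in range(2000)]
std_errors = [1 / n ** 0.5 for n in n_obs]
estimates = [rng.gauss(0, 0.2) + rng.gauss(0, se) for se in std_errors]

print("raw variance:", pvariance(estimates))
# The true variance of the simulated effects is 0.2 ** 2 = 0.04.
print("adjusted variance:", adjusted_variance(estimates, std_errors))
est100, se100 = restrict_min_obs(estimates, std_errors, n_obs, 100)
print("adjusted variance, N_j >= 100:", adjusted_variance(est100, se100))
```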
22. Table 7 -- Distribution of Teacher Effects
23. Is This Teacher Quality?
- Transition matrices: year-to-year movement in teacher quality, a measure of stability (permanent vs. transitory).
- Reestimate the production function, but with time subscripts on the teacher effects.
- First, separate teachers into quartiles (to reduce measurement error).
- There should be masses on the diagonals.
- Pure noise would give equal shares in each cell. We easily reject the random-draw scenario.
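Here is a minimal sketch of a quartile transition matrix. It assumes two lists of teacher effects covering the same teachers in years t and t+1, since only teachers observed in both years enter. Each row is normalized, so pure noise gives shares near 0.25 in every cell, while persistent quality concentrates mass on the diagonal. Names and data are hypothetical.

```python
import random


def quartiles(values):
    """Quartile index (0 = bottom, 3 = top) for each value, based on rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 4 // len(values)
    return q


def transition_matrix(effect_t, effect_t1):
    """Row-normalized 4x4 matrix: entry [r][c] is the share of teachers in quartile r
    in year t who are in quartile c in year t+1."""
    counts = [[0] * 4 for _ in range(4)]
    for r, c in zip(quartiles(effect_t), quartiles(effect_t1)):
        counts[r][c] += 1
    return [[n / sum(row) if sum(row) else 0.0 for n in row] for row in counts]


# Pure noise: year t+1 effects unrelated to year t, so every cell should be near 0.25.
rng = random.Random(4)
noise_t = [rng.gauss(0, 1) for _ in range(4000)]
noise_t1 = [rng.gauss(0, 1) for _ in range(4000)]
for row in transition_matrix(noise_t, noise_t1):
    print(["%.2f" % share for share in row])
```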
24. Table 8 -- Transition Matrices
25. More on Year-to-Year Movement
- Can do this with a continuous measure, too.
- If pure noise, the correlation in gains will be -0.5.
- We get about 0.5 for t-1 and 0.3 for t-2.
- However, looking at year-to-year movements intensifies sampling variability. Probably a no-no.
- Similar to results in Kane and Staiger (2002) for North Carolina schools.
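The continuous version can be illustrated under a simple assumed model: each year's estimate equals a fixed teacher quality plus independent noise. Under this model the year-to-year correlation of the estimates equals the share of their variance that is persistent. With no persistent component, it is near zero. The model, function names, and parameters are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def simulate_two_years(n_teachers, signal_sd, noise_sd, seed=2):
    """Estimate in each year = persistent quality + independent noise."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, signal_sd) for _ in range(n_teachers)]
    year1 = [q + rng.gauss(0, noise_sd) for q in quality]
    year2 = [q + rng.gauss(0, noise_sd) for q in quality]
    return year1, year2


# Equal signal and noise variance: expected correlation 1 / (1 + 1) = 0.5.
r_half = pearson(*simulate_two_years(5000, signal_sd=1.0, noise_sd=1.0))
# No persistent quality at all: expected correlation 0.
r_noise = pearson(*simulate_two_years(5000, signal_sd=0.0, noise_sd=1.0))
print(round(r_half, 2), round(r_noise, 2))
```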
26. More on Year-to-Year Movement
- Of those in the top decile in year t:
- 18% are there in year t+1 (random: 10%). Statistically significant.
- 22% of those in 1997 are there in 1999.
- Of those in the bottom decile in year t:
- 12% are there in year t+1.
- Turnover is higher. To appear in the transition matrix, a teacher must be in the records 2 years in a row, but those at the bottom are less likely to reappear, so the random-draw benchmark is no longer 10%. After adjusting, this looks statistically significant.
- 23% of those in 1997 are there in 1999.
27. How consistent are the rankings across specifications?
- Each regression produces a teacher quality score.
- Q: Does the way the regression is specified matter for how a teacher is ranked?
- If so, this suggests a potential concern about how accountability standards are set up.
28. How consistent are the rankings across specifications?
Correlation matrix of teacher FE from Specifications 1-5 in Table 7.
Warning label: Preliminary. No adjustment for sampling variation by individual teacher; these are lower bounds.
All specifications include year FE and lagged scores. Specification 2 adds basic student demographics, 3 adds richer student covariates and peer measures, 4 adds school FE (but no peer measures and limited student controls), and 5 is the kitchen sink.
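Because the concern is how a teacher is ranked, a rank (Spearman) correlation is a natural complement to correlations of the fixed effects themselves. The sketch below assumes each specification's estimates are stored in the same teacher order. The function names and data are hypothetical, and, as the warning label notes, no sampling-error adjustment is made.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def rank_correlation_matrix(specs):
    """specs: one list of teacher effects per specification, same teacher order.
    Returns the Spearman rank-correlation matrix across specifications."""
    rk = [ranks(s) for s in specs]
    return [[pearson(a, b) for b in rk] for a in rk]


# Hypothetical: spec 2 rescales spec 1 (same ranking); spec 3 reverses it.
specs = [[0.1, 0.3, -0.2, 0.0], [0.2, 0.6, -0.4, 0.0], [-0.1, -0.3, 0.2, 0.0]]
for row in rank_correlation_matrix(specs):
    print([round(x, 2) for x in row])
```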
29. How consistent are the rankings across specifications? Predicting the bottom 10 percent of teachers
Share of teachers commonly ranked in the bottom 10 percent
30. How consistent are the rankings across specifications? Predicting the top 10 percent of teachers
Share of teachers commonly ranked in the top 10 percent
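The "commonly ranked" shares ask how often two specifications flag the same teachers as bottom (or top) 10 percent. Below is a minimal sketch of that overlap for one pair of specifications. The names and data are hypothetical, and ties and sampling error are ignored.

```python
def extreme_set(effects, share=0.10, top=False):
    """Indices of teachers in the bottom (or top) `share` of the distribution."""
    n = len(effects)
    k = max(1, round(share * n))
    order = sorted(range(n), key=lambda i: effects[i], reverse=top)
    return set(order[:k])


def common_share(spec_a, spec_b, share=0.10, top=False):
    """Share of teachers flagged by spec_a who are also flagged by spec_b."""
    a = extreme_set(spec_a, share, top)
    b = extreme_set(spec_b, share, top)
    return len(a & b) / len(a)


# Hypothetical effects for 20 teachers under three specifications.
spec1 = [i / 10 for i in range(20)]
spec2 = spec1[:18] + [spec1[19], spec1[18]]  # top two swap places
spec3 = [-x for x in spec1]                  # ranking fully reversed
print(common_share(spec1, spec2, top=True), common_share(spec1, spec3))
```

Swapping the order of the two top teachers changes their ranks but not their membership in the top 10 percent. That is why this overlap measure can look more stable than a rank correlation.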
31. Robustness -- Cream Skimming
- Discourage certain students from taking the test. Look at the correlation between the estimated teacher effect and the share of missing scores in teacher j's classes: -0.044 (0.196).
- No evidence.
- Report scores of those who do well but are exempt. Correlation between the estimated teacher effect and the share of students excluded: 0.083 (0.015).
- Hmmm. So we excluded everyone who is exempt (12% of the sample) and reran the results. This did not change anything.
32. More Robustness Checks (Table 9)
33. By Student Initial Ability (Table 9)
34. Table 10 -- Controlling for English Teachers
35. Can we predict teacher quality from resume items?
- Because of concerns raised by Moulton (1986) about the efficiency of OLS estimates of teacher attributes in the presence of a school-specific fixed effect and multiple teachers per student, we do not include the teacher characteristics directly in the production function.
- Rather, we use a GLS estimator that regresses the estimated teacher effects on teacher characteristics.
- See paper for technical details.
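The paper's GLS details are not reproduced here. As an illustration only, one standard approach weights each estimated teacher effect by the inverse of its total variance, sigma_u^2 + SE_j^2. Here sigma_u^2 is the variance of true effects left unexplained by the characteristics, so imprecisely estimated teachers count for less. This sketch handles a single characteristic plus an intercept and takes sigma_u^2 as given. Both simplifications, and all names and data, are assumptions.

```python
def wls_one_regressor(effects, x, std_errors, sigma_u2):
    """Weighted least squares of teacher effects on one characteristic plus an intercept.

    Weights are 1 / (sigma_u2 + se**2): teachers whose effects are estimated
    imprecisely get less weight. Returns (intercept, slope).
    """
    w = [1.0 / (sigma_u2 + se ** 2) for se in std_errors]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, effects))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical: x = 1 for teachers with some credential, 0 otherwise.
# The last teacher has a very noisy estimate (SE = 10) and an outlying effect.
effects = [0.0, 0.0, 1.0, 5.0]
credential = [0, 0, 1, 1]
std_errors = [0.1, 0.1, 0.1, 10.0]
print(wls_one_regressor(effects, credential, std_errors, sigma_u2=0.01))
```

Unweighted OLS on the same four points would put the slope at 3. The weighting pulls it close to 1 because the outlying estimate is very noisy.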
36. Can we predict teacher quality from resume items?
- The vast majority of teacher quality (90%) is unexplained by observables, even after correcting for sampling error. See Table 11.
- Measures used for compensation purposes (tenure, advanced degrees, certification) probably explain 3 percent or less.
- BA major matters. But most demographic traits (except female) and human capital traits don't.
- But at least principals can look at the autoregression!!
37. Conclusions
- Lots of issues related to using test scores to evaluate teachers.
- Dispersion of teacher effects can be way off in naïve regressions.
- But the consistency of teacher rankings is not too bad (more work to be done here), especially if we include school fixed effects.
- Teachers matter, and to all groups of students.
- Perhaps differentially (more to be done here).
- Unobservable teacher characteristics seem to drive much of the dispersion in teacher quality. But the principal can observe productivity over time.