Title: Evaluating Impacts of MSP Grants
1 Evaluating Impacts of MSP Grants
Common Issues and Potential Solutions
Ellen Bobronnikov, March 30, 2009
2 Overview
- Purpose of MSP Program
- GPRA Indicators
- Teacher Content Knowledge
- Student Achievement
- Evaluation Design
- Timeliness
- Application of Rubric to Determine Rigor of Evaluation Design
- Key Criteria for a Rigorous Design
- Common Issues and Potential Solutions
3 Purpose of MSP Program
- The MSP Program supports partnerships between STEM faculty at institutions of higher education (IHEs) and teachers in high-need school districts. These partnerships focus on:
  - Facilitating professional development activities for teachers that focus on improving teacher content knowledge and instruction,
  - Improving classroom instruction, and
  - Improving student achievement.
- These are linked to indicators that the MSP Program needs to report on annually.
4 GPRA Indicators for MSP Program
- Under the Government Performance and Results Act (GPRA), all federal agencies are required to develop indicators in order to report to the U.S. Congress on federal program impacts and outcomes. For the MSP Program, the following indicators have been developed to look at the effects of the program on teacher and student outcomes:
- Teacher Knowledge
  - The percentage of MSP teachers who significantly increase their content knowledge, as reflected in project-level pre- and post-assessments.
- Student Achievement
  - The percentage of students in classrooms of MSP teachers who score at the basic/proficient level or above on State assessments of mathematics or science.
- Note: The information necessary to report on these indicators is taken directly from the APR.
5 GPRA Indicators for MSP Program (continued)
- In order to provide information about the impact of the MSP intervention on teacher and student outcomes, a rigorous evaluation design is necessary. The following indicator addresses design issues.
- Evaluation Design
  - The percentage of MSP projects that use an experimental or quasi-experimental design for their evaluations, that are conducted successfully, and that yield scientifically valid results.
6 Measuring GPRA Indicators: Evaluation Design
- Criteria for Evaluating Designs
  - We apply the Criteria for Classifying Designs of MSP Evaluations (hereafter referred to as the rubric) to projects to determine which projects had rigorous evaluations.
  - The rubric sets the minimum criteria for an MSP evaluation to be considered rigorous. There are seven criteria; an evaluation has to meet each of the seven in order to meet the GPRA indicator.
  - Based on our previous experience, we have found that one of the most common issues in meeting all of the criteria is missing data. Therefore, throughout this presentation, we will let you know the information we need to apply the rubric.
- Information Sources
  - We apply the rubric to final-year projects only.
  - We primarily use the information contained in the final evaluation reports, but we compare it to the evaluation data contained in the APRs, and the data do not always agree. It is important to ensure that the information contained in all sources is consistent and that the information contained in the final evaluation report is complete.
7 Rubric Criteria
- Type of design: needs to be experimental or quasi-experimental with a comparison group.
- Equivalence of groups at baseline: for quasi-experimental designs, groups should be matched at baseline on variables related to key outcomes.
- Sufficient sample size: large enough to detect a real impact rather than chance findings.
- Quality of measurement instruments: instruments need to be valid and reliable.
- Quality of data collection methods: methods, procedures, and timeframes used to collect the key outcome data need to be comparable for both groups.
- Attrition: at least 70% of the original sample retained; differential attrition of more than 15% between groups must be accounted for in the statistical model.
- Relevant statistics reported: treatment and comparison group post-test means, and tests of statistical significance for key outcomes.
8 Applying the Rubric: Type of Design
- 1. Type of Design
  - To determine impact on teacher and student outcomes, projects need to use an experimental or quasi-experimental design with a comparison group.
- Common Issues
  - Many projects used one-group-only pre-post studies. These do not account for differences that would have occurred naturally in the absence of the intervention.
- Potential Solutions
  - Using a comparison group will make for a much more rigorous study.
9 Applying the Rubric: Baseline Equivalence
- 2. Baseline Equivalence of Groups (Quasi-Experimental Only)
  - Demonstration of no significant differences between treatment and comparison groups at baseline on variables related to the study's key outcomes.
  - Pre-test scores should be provided for treatment and comparison groups.
  - A statistical test of differences should be applied to the treatment and comparison groups.
10 Applying the Rubric: Baseline Equivalence
- Common Issues
  - No pre-test information on outcome-related measures.
  - Pre-test results given for the treatment and comparison groups, but no tests of between-group differences.
- Potential Solutions
  - Administer the pre-test to both groups and test for differences between groups (see the sketch below).
  - Alternatively, provide means, standard deviations, and sample sizes of pre-test scores for both groups, so differences can be tested.
  - If there were differences at baseline, control for the differences between groups in statistical analyses.
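To illustrate the first potential solution above, here is a minimal sketch of a between-group test on pre-test scores, assuming raw scores are available for both groups; the score values and variable names are hypothetical placeholders, not MSP data.

```python
# Minimal sketch: testing baseline equivalence of pre-test scores.
# The score arrays below are hypothetical placeholders.
from scipy import stats

treatment_pretest = [42.0, 38.5, 45.0, 40.5, 37.0, 44.0, 41.5, 39.0]
comparison_pretest = [41.0, 36.5, 43.0, 42.0, 38.5, 40.0, 39.5, 44.5]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(treatment_pretest, comparison_pretest,
                                  equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A large p-value (e.g., p > 0.05) is consistent with baseline equivalence;
# a small p-value indicates a baseline difference that should be controlled
# for in the impact analysis.
```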
11 Applying the Rubric: Sample Size
- 3. Sample Size
  - Sample size is adequate.
  - Based on a power analysis (see the sketch below) with recommended
    - significance level = 0.05
    - power = 0.8
    - minimum detectable effect informed by the literature or otherwise justified
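A power analysis along these lines can be run with standard statistical software. The sketch below uses Python's statsmodels package with the recommended values above; the minimum detectable effect of 0.25 standard deviations is a hypothetical placeholder that a real study would justify from the literature.

```python
# Minimal sketch of a power analysis for a two-group comparison.
# The minimum detectable effect size (0.25 SD) is a hypothetical value.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.25,        # minimum detectable effect, in standard deviations
    alpha=0.05,              # significance level
    power=0.8,               # desired power
    ratio=1.0,               # equal-sized treatment and comparison groups
    alternative='two-sided',
)

print(f"Required sample size per group: {n_per_group:.0f}")
```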
12 Applying the Rubric: Sample Size
- Common Issues
  - Power analyses are rarely conducted.
  - Different sample sizes are given throughout the APR and evaluation report.
  - Sample sizes and subgroup sizes are not reported for all teacher and student outcomes, or are reported inconsistently.
- Potential Solutions
  - Conduct power analyses.
  - Provide sample sizes for all groups and subgroups.
13 Applying the Rubric: Measurement Instruments
- 4. Quality of the Measurement Instruments
  - The study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes, or
  - Data collection instruments developed specifically for the study were sufficiently pre-tested with subjects who were comparable to the study sample, and the instruments were found to be valid and reliable.
14 Applying the Rubric: Measurement Instruments
- Common Issues
  - Locally developed instruments are not tested for validity or reliability.
  - Instruments are identified in the APR as not tested for validity or reliability, even though they are pre-existing instruments that had already been tested.
  - Many instruments are used, but validity and reliability are not reported for all of them.
  - Assessments are aligned with the intervention (this provides an unfair advantage to treatment participants).
- Potential Solutions
  - Report on validity and reliability for all instruments. If an instrument was designed for the study, conduct a validity and reliability study (see the reliability sketch below).
  - If using a pre-existing instrument, cite the validity and reliability of the instrument. If using parts of existing instruments, consider using full subscales rather than selecting a limited number of items.
  - Do not use instruments that may provide an unfair advantage to a particular group.
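For a locally developed instrument, a common reliability check is internal consistency (Cronbach's alpha). The sketch below computes alpha for a hypothetical pilot-test response matrix; it covers only the reliability piece, since validity evidence (e.g., expert review, correlation with established measures) must be gathered separately.

```python
# Minimal sketch: Cronbach's alpha for a pilot-tested instrument.
# The response matrix (rows = respondents, columns = items) is hypothetical.
import numpy as np

responses = np.array([
    [3, 4, 3, 5, 4],
    [2, 2, 3, 3, 2],
    [4, 5, 4, 4, 5],
    [3, 3, 2, 4, 3],
    [5, 4, 5, 5, 4],
    [2, 3, 2, 2, 3],
])

n_items = responses.shape[1]
item_variances = responses.var(axis=0, ddof=1)      # variance of each item
total_variance = responses.sum(axis=1).var(ddof=1)  # variance of total scores

alpha = (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
# Values of roughly 0.7 or higher are commonly treated as acceptable reliability.
```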
15 Applying the Rubric: Data Collection Methods
- 5. Quality of the Data Collection Methods
  - The methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were comparable.
- Common Issues
  - Little to no information is provided about data collection in general, or information is provided only for the treatment group.
  - Timing of the tests was not comparable for treatment and comparison groups.
- Potential Solutions
  - Provide the names and timing of all assessments given to both groups.
16 Applying the Rubric: Attrition
- 6. Attrition
  - Need to retain at least 70% of the original sample, AND
  - Show that if there is differential attrition of more than 15% between groups, it is accounted for in the statistical model.
- Common Issues
  - Attrition information is typically not reported, or is reported for the treatment group only.
  - Sample and subsample sizes are not reported for all groups or are reported inconsistently.
- Potential Solutions
  - Provide initial and final sample sizes for all groups and subgroups (a worked attrition example follows below).
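As a worked example of this criterion, the sketch below computes overall retention and differential attrition from initial and final sample sizes; the counts are hypothetical.

```python
# Minimal sketch: overall retention and differential attrition.
# Initial and final counts are hypothetical placeholders.
initial = {"treatment": 120, "comparison": 110}
final = {"treatment": 98, "comparison": 80}

attrition = {g: 1 - final[g] / initial[g] for g in initial}
overall_retention = sum(final.values()) / sum(initial.values())
differential_attrition = abs(attrition["treatment"] - attrition["comparison"])

print(f"Overall retention: {overall_retention:.0%}")            # needs to be at least 70%
print(f"Treatment attrition: {attrition['treatment']:.0%}")
print(f"Comparison attrition: {attrition['comparison']:.0%}")
print(f"Differential attrition: {differential_attrition:.0%}")  # over 15% must be modeled
```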
17 Applying the Rubric: Statistics Reported
- 7. Relevant Statistics Reported
  - Include treatment and comparison group post-test means and tests of significance for key outcomes, OR
  - Provide sufficient information for calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error); a sketch of such a calculation follows below.
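The sketch below illustrates the second option: computing a between-group test of significance from reported summary statistics alone, using Python's scipy package. The post-test means, standard deviations, and sample sizes are hypothetical.

```python
# Minimal sketch: two-sample t-test from reported summary statistics only.
# The post-test means, SDs, and sample sizes below are hypothetical.
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=78.4, std1=9.2, nobs1=95,    # treatment group post-test
    mean2=74.1, std2=10.1, nobs2=88,   # comparison group post-test
    equal_var=False,                   # Welch's t-test
)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```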
18 Applying the Rubric: Statistics Reported
- Common Issues
  - Projects report that the results were significant or non-significant but do not provide supporting data.
  - Projects provide p-values but do not provide means or standard deviations.
  - Projects report gain scores for the treatment and comparison groups but do not provide between-group tests of significance.
- Potential Solutions
  - Provide full data (means, sample sizes, and standard deviations/errors) for treatment and comparison groups on all key outcomes.
  - Provide complete information about the statistical tests that were performed for both groups.
19 Projects with Rigorous Designs
- Projects that meet all of the rubric criteria will be able to make a more accurate determination of the impact of their program on teacher and student outcomes.