Title: Assessment Adjustments Updates
1 Assessment Adjustments Updates
- Debbie Swensen
- USOE
- January 10, 2008
2 The Primary Issues
- Math CRT Implications resulting from curriculum
changes
- CBT/PBT Comparability issues related to Equating
3 Math CRT Implications resulting from curriculum
changes
4 Math
- Test is based on 2007 content standards (Core),
while instruction is supposed to be based on 2008
Core
- Changes evident in the new Core generally require
the teaching of key concepts earlier than was the
case with the old Core
- NCTM Focal Points guide new Core
- Implementation fidelity of the new Core is
uncertain; however, there is strong anecdotal
evidence of high implementation
5 Why Not Break the Chain
- Why not just say the two tests are different and
reset standards now?
- Still using old test, so we couldn't even set
standards on the new Core if we wanted
- Will have more of a chance to teach the new Core
with a year of preparation
6 A Bridge
- Therefore, we must find a way to bridge the 2007
and 2008 test scores that:
- Is defensible for AYP and UPASS calculations
- Supports good instruction
- Validly reflects the general interpretation of
the results (that the test results reflect the
new curriculum)
7 Plans for Scoring the Math CRTs
- Provide scores based only on those items that
match the new Core
8 Plans for scoring the math CRTs: Support for
Decision
- The PAC and the district assessment directors
expressed concern with having the scores for the
2008 test include content that was not supposed
to be taught and learned
- Members of the TAC originally questioned this
assumed level of implementation fidelity, but the
district representatives as well as the USOE
curriculum section feel quite strongly that
teachers have shifted to the new Core for this
school year
- Evidence
- Core Academy
- District plans and Professional Development
- USOE generated old/new curriculum comparison
tools
9 Plans for scoring the math CRTs: Support for
Decision
- The TAC suggested several studies to evaluate
both the instructional sensitivity of the test
items and the ability of schools to shift
instruction to fully implement the new Core
10 Scoring Plan Details
- The TAC recommends including only items that
match the 2008 Core in the test scores
- In other words, those items not matching the new
Core at all will be deleted from scoring
- Why not delete these items from the test overall?
- Production realities
- Only a few of these items per test
- Remaining items will not distract from overall
test response
11 Raw score reports
- The raw score reports generated from either the
CBT or the USOE-scanned paper tests will be
based ONLY on the items used to generate the
student scores
- For example, 4th grade student scores and raw
score reports will be based on the 60 items
remaining from the original 65-item set
- Subscore (standard/objective) reports will be
based on the 2008 Core objectives
- Some objective reports will be based on fewer
items than we would feel comfortable reporting in
this manner for validity concerns, but this is
the best we can do this year
- This will provide excellent reasons to provide
education on appropriately interpreting test
results
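The scoring rule described on slides 10 and 11 can be illustrated with a minimal Python sketch. Only the 65-item and 60-item counts come from the slides; the item IDs, the choice of which five items are dropped, and the sample responses are hypothetical.

```python
from typing import Dict, Set


def raw_score(responses: Dict[str, bool], retained_items: Set[str]) -> int:
    """Count correct responses, ignoring items that were dropped from scoring."""
    return sum(
        1
        for item, correct in responses.items()
        if correct and item in retained_items
    )


# Hypothetical 65-item form; Q61..Q65 stand in for the five items
# that do not match the new Core (the real item list is not given).
all_items = [f"Q{i}" for i in range(1, 66)]
dropped = {"Q61", "Q62", "Q63", "Q64", "Q65"}
retained = set(all_items) - dropped

# Made-up student who answered every odd-numbered item correctly.
responses = {item: int(item[1:]) % 2 == 1 for item in all_items}

score = raw_score(responses, retained)
max_score = len(retained)
print(f"Raw score: {score} of {max_score}")
```

The student still sees all 65 items; only the reported score and its maximum change.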
12 CBT/PBT Comparability issues related to Equating
13 Scaling and Equating plans for 2008: Wrestling
with issues of comparability
- Lord's maxim: the only valid equating design is
when the same examinees are taking the same test
items. In other words, we don't need to equate!
- Given that the best equating plans violate
Frederic Lord's maxim about equating, we know that
we are facing an uphill challenge here
- The USOE national technical advisory committee
(TAC) recommended the following general approach
for equating the 2008 and 2007 scales
14 Premises for Decisions
- We do not expect to find a significant degree of
variance due to modality
- We will not disadvantage CBT testers
15 2008 CRT Equating Framework
- We will conduct our normal equating to place the
2008 scores on the 2007 scale, except that we
will base the equating on the 2008 CBT results to
serve as a basis for going forward to 2009 and
beyond
- Based on what we find for mode differences (we do
not expect to find much variance; see following
slides), we will adjust the scores such that the
CBT testers will not be disadvantaged (for this
year only)
16 2008 CRT Equating Framework
- USOE technical advisors and the TAC will continue
to work on the specifics of this approach in
coming weeks
- Because the CBT and PBT testers are not randomly
assigned, we need to adjust for pre-existing
performance differences
- Once we adjust for these pre-existing differences
in student performance, we will then evaluate the
differences due to test modality
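The slides do not say how the adjustment for pre-existing differences will be made. One common approach, assumed here purely for illustration, is a covariate adjustment: estimate how current scores relate to a prior score within each group, then remove the part of the CBT/PBT gap that the prior-score gap explains. All data and function names below are made up.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group slope of current score on prior score."""
    sxy = 0.0
    sxx = 0.0
    for prior, current in groups:
        mx, my = mean(prior), mean(current)
        sxy += sum((x - mx) * (y - my) for x, y in zip(prior, current))
        sxx += sum((x - mx) ** 2 for x in prior)
    return sxy / sxx


def adjusted_mode_difference(cbt, pbt):
    """CBT minus PBT mean difference after adjusting for prior score.

    Each argument is a (prior_scores, current_scores) pair of lists.
    """
    slope = pooled_within_slope([cbt, pbt])
    raw_diff = mean(cbt[1]) - mean(pbt[1])
    prior_diff = mean(cbt[0]) - mean(pbt[0])
    return raw_diff - slope * prior_diff


# Hypothetical scores: PBT testers started 10 points higher last year.
cbt = ([150, 160, 170], [155, 165, 175])
pbt = ([160, 170, 180], [165, 175, 185])

raw_gap = mean(cbt[1]) - mean(pbt[1])
adjusted = adjusted_mode_difference(cbt, pbt)
print(f"Raw gap: {raw_gap}, adjusted gap: {adjusted}")
```

In this made-up data the unadjusted gap is entirely explained by prior performance, so the remaining difference attributable to test mode is zero.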
17 Meta-Analysis of Multiple-Choice Comparability
Studies (from Neal Kingston)
- Looked for K-12 studies of multiple-choice tests
published or presented 1997-2006
- Journal tables of contents
- Conference programs
- Internet search (Google, Google Scholar)
- Contacting researchers
- Found 14 usable articles reporting on 81 studies
- Sample sizes ranged between 42 and 4,333
18 From Neal Kingston
19 Calculate Effect Sizes (N. Kingston)
Positive number means students did better on computer;
negative means they did better on paper
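The slides do not state which effect-size formula the meta-analysis used. A minimal sketch, assuming a standardized mean difference (computer mean minus paper mean, divided by the pooled standard deviation), shows how the sign convention works. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def effect_size(computer, paper):
    """Standardized mean difference: positive favors computer, negative favors paper."""
    n_c, n_p = len(computer), len(paper)
    pooled_var = (
        (n_c - 1) * variance(computer) + (n_p - 1) * variance(paper)
    ) / (n_c + n_p - 2)
    return (mean(computer) - mean(paper)) / sqrt(pooled_var)


# Hypothetical scores for two small groups of examinees.
computer_scores = [12, 14, 16]
paper_scores = [10, 12, 14]

d = effect_size(computer_scores, paper_scores)
print(f"Effect size: {d:+.2f}")
```

Swapping the two groups flips the sign, which matches the convention on the slide.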
20 Note: This plot indicates that over all of the
studies, there was NO average effect of CBT vs.
PBT (from N. Kingston)
[Plot of study effect sizes; labels: Paper, Computer, median]
21 Weighted Mean Effect Size by Grade (Note that all
effects hover close to zero; Kingston, 2007)
22 Weighted Mean Effect Size by Subject (Note all
effects hover around zero; Kingston, 2007)
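A weighted mean effect size combines per-study effects so that larger studies count for more. The sketch below assumes weights proportional to sample size; the actual weighting scheme in Kingston's analysis is not given in these slides, and the effect sizes and sample sizes shown are hypothetical.

```python
def weighted_mean_effect(effects, weights):
    """Weighted average of effect sizes; weights must not sum to zero."""
    total_weight = sum(weights)
    return sum(e * w for e, w in zip(effects, weights)) / total_weight


# Hypothetical per-study effect sizes and sample sizes.
study_effects = [0.10, -0.05, 0.00]
study_sizes = [100, 200, 100]

overall = weighted_mean_effect(study_effects, study_sizes)
print(f"Weighted mean effect size: {overall:+.3f}")
```

Here a small positive effect in one study is offset by a smaller negative effect in a study twice as large, so the weighted mean comes out at zero.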