Title: On-demand learning-embedded benchmark assessment using classroom-accessible technology
1. On-demand learning-embedded benchmark assessment using classroom-accessible technology
- Discussant Remarks
- Mark Wilson
- UC Berkeley
2. Outline
- What does validity look like for these papers?
- What is it that these papers are distinguishing themselves from?
- Where might one go from here?
3. Need for strong concern about validity
- Effect of NCLB requirements
- Schools are instituting frequent benchmark tests
  - Intended to guide teachers as to students' strengths and weaknesses
  - Often just little copies of the state test
- Teachers are complaining that this puts a vice-like grip on the curriculum
4. The Triangle of Learning: standard interpretation
5. The vicious triangle
6. Validity
- 1999 AERA/APA/NCME Standards for Educational and Psychological Testing
- Five types of validity evidence:
  - Evidence based on test content
  - Evidence based on response processes
  - Evidence based on internal structure
  - Evidence based on external structure
  - Evidence based on consequences
7. Paper 1: Falmagne et al. (ALEKS)
- Reliability > Validity
- "the collection of all the problems potentially used in any assessment represents a fully comprehensive coverage of a particular curriculum, ... hence ... arguing that such an assessment, if it is reliable, is also automatically endowed with a corresponding amount of validity is plausible."
8. Paper 1: Falmagne et al. (ALEKS)
- Test content
  - Theory of the learning space
    - inner fringe and outer fringe
    - the summary is meaningful for an instructor
  - Database of problems
    - "a consensus among educators that the database of problems is a comprehensive compendium for testing the mastery of a scholarly subject. This phase is relatively straightforward."
  - Evidence: Who were the experts? What did they do? How much did they agree?
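The fringe notions above can be computed directly from a knowledge structure: the inner fringe of a state K holds the items just mastered, the outer fringe the items the student is ready to learn. The following is a toy sketch with a hypothetical three-item structure, not the ALEKS implementation:

```python
# Toy sketch of knowledge-space fringes; not the ALEKS implementation.
# A knowledge structure is a family of feasible knowledge states (item sets).

def inner_fringe(K, structure):
    """Items just mastered: q in K such that K minus {q} is also a state."""
    return {q for q in K if frozenset(K - {q}) in structure}

def outer_fringe(K, structure, items):
    """Items ready to learn: q outside K such that K plus {q} is a state."""
    return {q for q in items - K if frozenset(K | {q}) in structure}

# Hypothetical three-item structure (illustrative only).
items = {"a", "b", "c"}
structure = {frozenset(), frozenset({"a"}), frozenset({"a", "b"}),
             frozenset({"a", "c"}), frozenset(items)}
K = frozenset({"a", "b"})
print(sorted(inner_fringe(K, structure)))          # ['b']
print(sorted(outer_fringe(K, structure, items)))   # ['c']
```

This is the sense in which "the summary is meaningful for an instructor": the two fringes compress a whole state into what was just learned and what to teach next.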
9. Paper 1: Falmagne et al. (ALEKS)
- Evidence based on response processes
  - E.g., for a selected knowledge state K, do students in K say things that are consistent/inconsistent with it?
- Evidence based on internal structure
  - E.g., for a selected K, do students in K have high/low success rates on instances in K?
- Evidence based on external structure
  - E.g., comparison with teacher judgments of student ability
- Evidence based on consequences
  - E.g., use of fringes: does this help or hinder teacher interpretations?
10. Paper 2: Shute et al. (ACED)
- Two validity studies
- Study 1: Evidence based on external structure
  - Prediction of residuals from an external post-test after controlling for pre-test
  - Informative design on conditions: elaborated feedback better
- Study 2: Evidence based on response processes
  - Usability study for students with disabilities
11. Paper 2: Shute et al. (ACED)
- Evidence based on test content
  - Reference to earlier paper
- Evidence based on internal structure
  - Could easily be investigated, as there is interesting internal structure (Fig. 1)
- Evidence based on consequences
  - Probably not any real consequences yet
12. Paper 3: Heffernan et al. (ASSISTment System)
- Evidence based on test content
  - Items coded by 2 experts, 7 hrs.
  - skill of "Venn Diagram"
- Evidence based on internal structure
  - Which skill model fits best: 1, 5, 39, or 106 skills?
  - Which number is different?
    - 4.10, 4.11, 4.12, 4.10, 4.10
    - 1, 5, 39, 106 (twice)
13. Paper 3: Heffernan et al. (ASSISTment System)
- Evidence based on external structure
  - Prediction of MCAS
  - 23/38 (61%) don't fit well under the best model (WPI-39 (B))
14. Paper 3: Heffernan et al. (ASSISTment System)
- Evidence based on response processes
  - ?
- Evidence based on consequences
  - There probably are real consequences
15. Paper 4: Junker (ASSISTment System)
- Two validity studies
- Study 1: Evidence based on external structure
  - Prediction of MCAS scores
- Study 2: Evidence based on internal structure
  - 4 internal structure patterns
  - 2 questions:
    - Q1: Regarding how scaffolds get easier, what happens when you get a scaffold wrong?
    - Q2: What about the gap?
16. (No transcript)
17. Paper 4: Junker (ASSISTment System)
- Remaining types of validity evidence: see Paper 3
18. Looking beyond
- What does this group of papers have to offer?
- What should it be looking out for?
19. Paper 1: Falmagne et al. (ALEKS)
- Inner and outer fringe
  - What do teachers think of them? What do they do with them?
- Standardized tests and psychometrics as straw men
  - Alternative: compare one's work to the latest developments in item response modeling (e.g., EIRM)
20. Paper 2: Shute et al. (ACED)
- Weight of evidence
  - Good alternative to Fisher information
  - Transparent, easily interpretable
- Models for people with disabilities
  - Most likely going to have different internal structure
  - Need to develop a broader view of internal structure criteria
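The weight-of-evidence statistic praised above is, in I. J. Good's sense, just a log likelihood ratio, which is why it is transparent and easy to interpret. A minimal sketch (the probabilities are made-up illustrations, not ACED's values):

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """I. J. Good's weight of evidence W(H : e) = log10 P(e|H) / P(e|~H),
    scaled by 100 to read in centibans."""
    return 100 * math.log10(p_e_given_h / p_e_given_not_h)

# Hypothetical item: a proficient student answers correctly with
# probability 0.8, a non-proficient student with probability 0.2.
print(round(weight_of_evidence(0.8, 0.2), 1))   # 60.2 centibans
# Evidence that is equally likely under both hypotheses carries no weight:
print(weight_of_evidence(0.5, 0.5))             # 0.0
```

Because weights on independent observations simply add, a teacher can see exactly how much each response shifted the balance for or against proficiency, which Fisher information does not offer.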
21. Paper 3: Heffernan et al. (ASSISTment System)
- MCAS as starting point for diagnostic testing?
  - Using released items?!?
- What is unidimensionality?
22. Paper 3: Heffernan et al. (ASSISTment System)
- In a latent class model, the latent class looks like this
- In an item response model (e.g., the Rasch model), unidimensionality looks like this
- See Karelitz, T.M., Wilson, M.R., & Draney, K.L. (2005). Diagnostic assessment using continuous vs. discrete ability models. Paper presented at the NCME Annual Meeting, San Francisco, CA.
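The contrast the slide draws can be sketched numerically: under the Rasch model a single continuous ability drives success probability on every item, while a latent class model assigns each examinee to one of a few discrete classes. A minimal illustration, with the usual Rasch symbols theta (ability) and b (difficulty) supplied here for exposition:

```python
import math

def rasch_p(theta, b):
    """Rasch model: P(X = 1 | theta, b) = logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Unidimensional continuous ability: success probability rises smoothly
# with theta for every item (here one item of difficulty b = 0).
for theta in (-2.0, 0.0, 2.0):
    print(round(rasch_p(theta, 0.0), 3))   # 0.119, then 0.5, then 0.881
# A latent class model instead assigns each examinee to one of a few
# discrete classes, each class carrying its own fixed success
# probability per item, with no ordering along a continuum assumed.
```

The Karelitz, Wilson, and Draney paper cited above compares exactly these continuous and discrete ability representations for diagnostic purposes.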
23. Paper 4: Junker (ASSISTment System)
- What is the effect of assuming MCAR/MAR when neither holds?
  - Relevant to all CAT
  - Or of assuming you know the response under NMAR
- Is there a discrimination paradox in DINA models?
- Why do scaffold questions get easier?
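The discrimination question can be made concrete with the DINA item response function itself; the following is a minimal sketch with made-up slip and guess values, not Junker's implementation:

```python
def dina_p_correct(alpha, q, slip, guess):
    """DINA item response function: eta = 1 iff the examinee's skill
    vector alpha covers every skill the item's q-vector requires;
    P(correct) = 1 - slip if eta == 1, else guess."""
    eta = all(a >= qk for a, qk in zip(alpha, q))
    return (1.0 - slip) if eta else guess

# Hypothetical item requiring skills 1 and 2; slip/guess values made up.
print(dina_p_correct(alpha=[1, 1], q=[1, 1], slip=0.1, guess=0.2))  # 0.9
# Every examinee missing at least one required skill gets the same guess
# probability, so the item cannot discriminate among them:
print(dina_p_correct(alpha=[1, 0], q=[1, 1], slip=0.1, guess=0.2))  # 0.2
print(dina_p_correct(alpha=[0, 0], q=[1, 1], slip=0.1, guess=0.2))  # 0.2
```

The conjunctive collapse shown in the last two lines, where an examinee with one of two required skills looks identical to one with none, is the kind of behavior the "discrimination paradox" question is probing.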
24. Future directions
- What is a knowledge state (KS)?
  - How do we test whether it's a unitary thing?
  - What if it isn't?
    - Mixture models: structured KSs
- Do teachers (and other practitioners) find the KSs useful?
  - How to adjust if they don't?
    - finer/coarser grained
    - structured