1. Chapter 4. Validity
- Does the test cover what we are told (or believe) it covers?
- To what extent?
- Is the assessment being used for an appropriate purpose?
2. Validity Topics
- Definition (usual and refined)
- Categories of validity evidence
- A. face validity
- B. content validity: table of specifications, alignment analysis, opportunity to learn
- C. criterion-related validity
- D. construct validity
- E. consequential validity
- Test fairness
3. Introduction
- Without good validity, all else is lost. Validity is the most important characteristic of a test or assessment technique.
- Usual Definition
- It measures what it purports to measure.
- Refined Definition
- It involves the interpretation of a score for a particular purpose or use (because a score may be valid for one use but not another).
- It is a matter of degree, not all-or-none. As a practical matter, our concern is to determine the extent (for example, in non-mathematical terms we might say slight, moderate, considerable).
4. Some Helpful Terms
- Construct
- The trait or characteristic that interests us. We might call it a target, or what we want to get at. We create a test to cover this attribute.
- Validity addresses how well an assessment technique provides useful information about the construct / target.
- Construct underrepresentation
- The test we made is not assessing all of the construct; our test misses things we should be assessing.
- Construct irrelevant variance
- The test we made is assessing things that are not really part of our construct; we are assessing irrelevant material that we don't want.
- See the next two slides for illustrations.
5. The Construct and Valid Measurement (illustration)
6. Varying Degrees of Construct Underrepresentation and Construct Irrelevant Variance (illustration)
7. A. Face Validity: Think of the idiom "on the face of it" . . .
- A test is said to have face validity if it "looks like" it is going to measure what it is supposed to measure.
- Face validity is not empirical: one is saying that the test appears as if it will work, as opposed to saying it has been shown to work.
- Face validity is often created to influence the opinions of participants who are not expert in testing methodologies, e.g. test takers, parents, politicians.
8. B. Content Validity: Most used in achievement tests and employment exams
- Meaning of this type of validity
- There is a good match between the content of the test and some well-defined domain of knowledge or behavior. Reference to content defines the orientation of the test.
- For teachers, considered the most important type of validity for
- your own classroom tests
- achievement tests
- Where do we find the well-defined domain?
- Examination of textbooks in the field, with special attention to the learning objectives at the beginning of chapters and the terms at the end
- Curriculum guides of school districts
- Ohio's Academic Content Standards
- So, now that we have the content topics identified, what should we actually expect students to know and be able to do in relation to these topics? This question deals with process or depth indicators. How should we make sure we include both the content and the depth expected in our tests?
9. The Table of Specifications: Building content validity into my own classroom tests
- Table of Specifications: this connects the content determined earlier to the mental processes students are expected to employ regarding this content
- Two-way table
- Content
- Bloom's taxonomy (simplest mental operation to the most complex)
- Each test item I create then falls into one cell.
- By creating the table, I can see the relative weight assigned to each cell. Is this what I want? (A small sketch of such a table follows below.)
10. Alignment Analysis: Checking content validity in existing tests
- These steps parallel building your own good test and constructing a table of specifications. There are some things to watch for and consider as you do this:
- Be wary of using the summary outline provided by the test maker; examine the actual test items.
- Match items on the test with content you are teaching; watch for mismatches:
- Items on the test you are not teaching
- Content you are teaching that is not tested
- This matching requires considerable judgment.
- The test does not have to cover every detail; it could be a representative sample.
- If stakes are high, use a panel of individuals. (A simple matching sketch follows below.)
11. Opportunity to Learn: But was it taught? . . .
- An emerging idea related to content validity is a concern called instructional validity. This relates to your behavior as a teacher. The content may be in the book; the content may be in the state standards . . . BUT . . . did you actually teach it? Some teachers skip items of instruction they don't like, don't understand, or don't have time for.
- If related items appear on a test, this would reduce the validity of the test, since the students had no opportunity to learn the knowledge or skill being assessed.
12. C. Criterion-Related Validity: While the term "test" is used, also think "measure" or "procedure"
- The basic idea: demonstrate the degree of accuracy of a test by comparing it with another test, measure, or procedure that has been demonstrated to be valid (i.e., a valued criterion).
- Two general contexts
- Predictive validity: one measure is given now, one later. The later test is known to be valid. This approach allows me to show my current test is valid by comparing it to a future valid test.
- For example, a behind-the-wheel driving test has been shown to be an accurate test of driving skills. By comparing the scores on a written rules-of-the-road test with the scores from the driving test, the written test can be validated using a criterion-related strategy.
- Concurrent validity: both measures are current. This approach allows me to show my test is valid by comparing it with an already valid test. I can do this if I can show my test varies directly with a measure of the same construct or inversely with a measure of an opposite construct.
- The computed statistic in both cases is r (which we now call a validity coefficient), and it has all the characteristics we have already discussed about correlation coefficients in general. (A computation sketch follows below.)
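For illustration, a minimal sketch of computing a validity coefficient in Python, using made-up scores for ten students on a new written test and an already validated criterion measure (the driving test from the example above):

from statistics import correlation  # Python 3.10+

written_test = [72, 85, 90, 66, 78, 95, 59, 88, 74, 81]  # new measure
road_test    = [70, 82, 93, 60, 75, 97, 55, 90, 72, 84]  # validated criterion

r = correlation(written_test, road_test)  # the validity coefficient
print(f"validity coefficient r = {r:.2f}")

The resulting r is read like any other correlation coefficient: the closer to 1.0 (or to -1.0, for a measure of an opposite construct), the stronger the criterion-related evidence.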
13. Special Considerations for Interpreting Criterion-Related Validity
- Group Variability
- The greater the variability, the greater the r.
- Reliability-Validity Relationship
- Reliability limits validity; reliability is a prerequisite to validity (see the bound below).
- Validity of the Criterion
- How good is the criterion? Do you agree with the operational definition of the criterion?
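In classical test theory, the reliability-validity relationship has a standard algebraic form: the validity coefficient between a test x and a criterion y cannot exceed the square root of the product of their reliabilities. In LaTeX notation:

r_{xy} \le \sqrt{r_{xx}\, r_{yy}}

where r_{xx} and r_{yy} are the reliability coefficients of the test and the criterion. For example, a test with reliability 0.64 cannot achieve a validity coefficient above \sqrt{0.64} = 0.80, even against a perfectly reliable criterion.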
14. D. Construct Validity
- When we ask about a test's construct validity, we are taking a broad view of the test. Does the test adequately measure the underlying, unobserved construct? The question is asked both in terms of
- convergent validity: are test scores related to behaviors and tests that they should be related to? and
- divergent validity: are test scores unrelated to behaviors and tests that they should be unrelated to?
- There is no single measure of construct validity. Construct validity is based on the accumulation of knowledge about the test and its relationship to other tests and behaviors.
- To establish construct validity, we demonstrate that the measure changes in a logical way when other conditions change. (A small convergent/divergent check is sketched below.)
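As a minimal sketch of the convergent/divergent logic, the Python below uses hypothetical scores on a new anxiety scale, an established anxiety scale (same construct, so r should be high), and a reading-speed measure (unrelated construct, so r should be near zero):

from statistics import correlation  # Python 3.10+

new_scale   = [12, 18, 25, 9, 30, 22, 15, 27]            # new anxiety scale
old_scale   = [14, 17, 27, 8, 29, 20, 16, 25]            # established anxiety scale
reading_wpm = [190, 220, 215, 205, 200, 185, 200, 185]   # unrelated construct

print(f"convergent r = {correlation(new_scale, old_scale):.2f}")    # expect high
print(f"divergent  r = {correlation(new_scale, reading_wpm):.2f}")  # expect near zero

No single pair of correlations settles the question; it is the accumulated pattern across many such comparisons that builds the construct-validity case.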
15. E. Consequential Validity: A recent, controversial entry into the assessment lexicon . . .
- Some professionals feel that, in the real world, the consequences that follow from the use of assessments are important indications of validity.
- Some professionals feel that these consequences are matters of politics and policymaking: important considerations, yes, but not matters of validity.
- On which side are we? As educators, we sometimes see the consequences as more important than the technical validity of the test. Judgments based on assessments we give and use have value implications and social consequences.
- What is the intended use of these test scores?
- How are the scores really being used?
- Does this testing lead to educational benefits?
- Are there negative spin-offs?
16. Test Fairness, Test Bias
- Test fairness and test bias have the same meaning, with opposite connotations.
- Fairness: an assessment or test measures a trait, construct, or target with equal validity for different groups.
- Bias: the groups do not differ in terms of real status on the trait, construct, or target being assessed, yet the test suggests they do.
17. Methods of Reviewing Fairness
- Test Companies (look in the test manual to see what a particular company did about test fairness issues on this test)
- Panel review: most popular, but is this just face validity?
- Differential item functioning (DIF): examines subsets of items (see the sketch after this list)
- Criterion-related validity: examines the whole test
- Teacher-Created Assessments (teachers need to be knowledgeable about, and sensitive to, issues of test fairness)
- Is there anything about my test that will unfairly advantage or disadvantage a student or group of students?
- Is there anything about the mechanics of the test that calls for skills other than those I intend to measure?
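To illustrate the idea behind DIF (not a full procedure such as Mantel-Haenszel, which a real review would use), the hypothetical sketch below compares two groups' success on a single item after matching students on overall score band. If matched students from the two groups succeed at clearly different rates, the item deserves scrutiny.

from collections import defaultdict

# Hypothetical data: (group, total-score band, answered the item correctly).
records = [
    ("A", "high", True), ("A", "high", True), ("A", "low", False),
    ("A", "low", True),  ("B", "high", True), ("B", "high", False),
    ("B", "low", False), ("B", "low", False),
]

counts = defaultdict(lambda: [0, 0])  # (group, band) -> [n correct, n total]
for group, band, correct in records:
    counts[(group, band)][0] += int(correct)
    counts[(group, band)][1] += 1

# Within each ability band, compare the groups' proportions correct.
for band in ("high", "low"):
    for group in ("A", "B"):
        n_correct, n_total = counts[(group, band)]
        print(f"band={band:4} group={group}: {n_correct}/{n_total} correct")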
18. Practical Advice
- For building your own tests, think content validity.
- For judging an externally prepared achievement test, start with a clear definition of what's to be covered.
- For criterion-related validity, take into account group variability and think about the validity of the criterion.
- For test fairness (bias), distinguish between differences in groups' average scores and group status on the trait.
- For your own assessments, try to eliminate the influence of any factors not related to what you want to measure.
19. Terms and Concepts to Review and Study on Your Own (1)
- alignment analysis
- Bloom's taxonomy
- concurrent validity
- consequential validity
- construct
- construct irrelevant variance
- construct underrepresentation
- construct validity
- content validity
- criterion-related validity
20. Terms and Concepts to Review and Study on Your Own (2)
- differential item functioning (DIF)
- external criterion
- face validity
- fairness (or its opposite, bias)
- instructional validity
- opportunity to learn
- predictive validity
- table of specifications (two-way table)
- validity
- validity coefficient