Title: Validity
1. Validity: Outline
- Definition
- Two different views: Traditional
- Two different views: CSEPT
- Face Validity
- Content Validity: CSEPT
- Content Validity: Borsboom
2. Validity: Outline
- Criterion Validity: CSEPT
- Predictive vs. Concurrent
- Validity Coefficients
- Criterion Validity: Borsboom
- Construct Validity: CSEPT
- Convergent
- Discriminant
3. Validity: Definition
- Validity measures agreement between a test score
and the characteristic it is believed to measure
4. Validity: CSEPT view
- Validity is a property of test score
interpretations
- Validity exists when actions based on the
interpretation are justified given a theoretical
basis and social consequences
5. Validity: Traditional view
- Validity is a property of tests
- Does the test measure what you think it measures?
6. Note the difference
- Validity exists when actions based on the
interpretation are justified given a theoretical
basis and social consequences
- Does the test measure what you think it measures?
7. A problem with the CSEPT view
- Who is to say the social consequences of test
use are good or bad?
- According to CSEPT, validity is a subjective judgment
- In my view, this makes the concept useless: if you like the result the test gives you, you will consider it valid. If you don't, you won't.
- That's not how scientists think.
8. Borsboom et al. (2004)
- Borsboom et al. reject CSEPT's view
- "Validity is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure." (p. 1061)
9. Borsboom et al. (2004)
- Variations in what you are measuring cause
variations in your measurements.
- E.g., variations across people in intelligence cause variations in their IQ scores
- This is not a correlational model of validity
10. Borsboom et al. (2004)
- You don't create a test and then do the analysis necessary to establish its validity
- Rather, you begin by doing the theoretical work necessary to understand your subject and create a valid test in the first place.
- On this view, validity is not a big problem.
11. Borsboom et al. vs. CSEPT
- Who is right?
- Each scientist has to make up his or her own
mind on that question
- I agree with Borsboom et al.'s arguments.
- Other psychologists may disagree.
12. The CSEPT view
- CSEPT recognizes 3 types of evidence for test validity:
- Content-related
- Criterion-related
- Construct-related
- Boundaries not clearly defined
- Cronbach (1980): Construct is basic, while Content and Criterion are subtypes.
13. Parenthetical Point: Face Validity
- Face validity refers to the appearance that a
test measures what it is intended to measure.
- Face validity has P.R. value: test-takers may be better motivated if the test appears to be a sensible way to measure what it is supposed to measure.
14. Content validity: CSEPT
- Content-related evidence considers coverage of
the conceptual domain tested.
- Important in educational settings
- Like face validity, it is determined by logic rather than statistics
- Typically assessed by expert judges
15. Content validity: CSEPT
- Construct-irrelevant variance
- arises when irrelevant items are included
- or when external factors such as illness influence test scores
- requires a judgment about what is truly external
- Construct under-representation
- Is domain adequately covered or are parts of it
left out?
16. Content validity: Borsboom et al.
- Borsboom et al. would say that content validity
is not something to be established after the test
has been created.
- Rather, you build it into your test by having a
good theory of what you are testing
17. Criterion validity: CSEPT
- Criterion-related evidence tells us how well a
test score corresponds to a particular criterion
measure.
- Generally, we want the test score to tell us something about the criterion score.
- How well the test does this provides criterion-related evidence
18. Criterion validity: CSEPT
- CSEPT: we could compare undergraduate GPAs to SAT scores to produce evidence of the validity of conclusions drawn on the basis of SAT scores.
- Two basic types
- Predictive
- Concurrent
19. Criterion validity: CSEPT
- Test scores are used to predict future performance: how good is the prediction?
- E.g., the SAT is used to predict final undergraduate GPA
- SAT and GPA are moderately correlated
20. Criterion validity: CSEPT
- Predictive validity
- Concurrent validity
- Correlation between test scores and criterion when the two are measured at the same time.
- Test illuminates current performance rather than predicting future performance (e.g., why does the patient have a temperature? Why can't the student do math?)
21. Criterion validity: Borsboom et al.
- Criterion validity involves a correlation of test scores with some criterion, such as GPA
- That does not establish the test's validity, only its utility.
- E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.
22. Criterion validity: Borsboom et al.
- The SAT is valid because it was developed on the sensible theory that past academic achievement is a good guide to future academic achievement
- Validity is built into the test, not established
after the test has been created
23. Criterion validity
- Note: no point in developing a test if you already have a criterion, unless impracticality or expense makes use of the criterion difficult.
- Criterion measure only available in the future?
- Criterion too expensive to use?
24. Criterion validity
- Compute the correlation (r) between test score and criterion.
- r = .30 or .40 would be considered normal.
- r > .60 is rare
- Note: r varies between -1.0 and +1.0
25. Criterion validity
- r² gives the proportion of variance in the criterion explained by the test score.
- E.g., if r = .30, then r² = .09, so 9% of the variability in Y can be explained by variation in X (worked through in the sketch below)
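A minimal Python sketch of this arithmetic, using simulated (made-up) test and criterion scores; the variable names and the target correlation of .30 are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical examinees: a test score and a criterion score built so that
# they correlate only moderately (about .30), roughly like SAT and GPA.
test = rng.normal(size=n)
criterion = 0.30 * test + np.sqrt(1 - 0.30**2) * rng.normal(size=n)

r = np.corrcoef(test, criterion)[0, 1]   # validity coefficient
r_squared = r ** 2                       # proportion of criterion variance explained

print(f"r   = {r:.2f}")
print(f"r^2 = {r_squared:.2f} -> about {100 * r_squared:.0f}% of the variance in the criterion")
```

With r near .30, r² lands near .09, i.e., the 9% figure above; the exact sample values will fluctuate from run to run.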
26. Interpreting validity coefficients
- Changes in causal relationships
- What does the criterion mean? Is it valid and reliable?
- Is the subject population for the validity study appropriate?
- Sample size
27. Interpreting validity coefficients
- Criterion/predictor confusion
- Range restrictions
- Do validity study results generalize?
- Differential predictions
28. Construct validity: CSEPT
- Problem: for many psychological characteristics of interest there is no agreed-upon universe of content and no clear criterion
- We cannot assess content or criterion validity for such characteristics
- These characteristics involve constructs: something built by mental synthesis.
29. Construct validity: CSEPT
- Examples of constructs
- Intelligence
- Love
- Curiosity
- Mental health
- CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.
- This is bootstrapping.
30. Bootstrapping construct validity
- assemble evidence about what a test means: in other words, about the characteristic it is testing.
- CSEPT: this process is never finished
31. Bootstrapping construct validity
- assemble evidence about what a test means: in other words, about the characteristic it is testing.
- Borsboom: this is part of the process of creating the test in the first place, not something done after the fact
32. Bootstrapping construct validity
- assemble evidence
- show relationships between a test and other tests
- CSEPT: none of the other tests is a criterion, but the web of relationships tells us what the test means
33. Bootstrapping construct validity
- assemble evidence
- show relationships between a test and other tests
- Borsboom: these relationships do not tell us what a test score means
- (E.g., age is correlated with annual income, but a measure of age is not a measure of annual income.)
34. Bootstrapping construct validity
- assemble evidence
- show relationships
- each new relationship adds meaning to the test
- CSEPT: a test's meaning is gradually clarified over time
35. Bootstrapping construct validity
- assemble evidence
- show relationships
- each new relationship adds meaning to the test
- Borsboom would say: why all the mystery? The meaning of many tests (e.g., WAIS, academic exams, Piaget's tests) is clear right from the start
36. Construct validity
- Example from the text: Rubin's work on Love.
- Rubin collected a set of items for a Love scale
- He read poetry and novels; he asked people for definitions
- created a scale of Love and one of Liking
37. CSEPT: Construct validity
- Rubin gave the scale to many subjects and factor-analyzed the results (a toy version of such an analysis is sketched below)
- Love integrates Attachment, Caring, and Intimacy
- Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence
- The two are independent: you can love someone you don't like (as song-writers know)
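To show what "factor-analyzed the results" involves, here is a small sketch on simulated item responses. The items, loadings, and the use of scikit-learn's FactorAnalysis (with varimax rotation, available in recent versions) are illustrative assumptions, not Rubin's actual analysis:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500

# Two independent latent traits, standing in for "Love" and "Liking".
love = rng.normal(size=n)
liking = rng.normal(size=n)

# Six hypothetical items: the first three are driven mainly by the Love
# trait, the last three by the Liking trait, plus item-specific noise.
items = np.column_stack([
    0.8 * love   + 0.4 * rng.normal(size=n),
    0.7 * love   + 0.5 * rng.normal(size=n),
    0.9 * love   + 0.3 * rng.normal(size=n),
    0.8 * liking + 0.4 * rng.normal(size=n),
    0.7 * liking + 0.5 * rng.normal(size=n),
    0.9 * liking + 0.3 * rng.normal(size=n),
])

# Extract two factors; the loading pattern (large vs. near-zero entries)
# recovers the two independent clusters of items.
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
print(np.round(fa.components_, 2))
```

The point of the example is only the shape of the output: two factors, each loading on a distinct set of items, mirroring the independent Love and Liking scales.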
38. Rubin's study of Love
- Borsboom et al.: when creating a test, the researcher specifies the processes that convey the effect of the measured attribute on the test score.
- Rubin laboriously built a theory about what the construct Love means.
- Rubin's process (reading poetry and novels, asking people for definitions) was a good process, so his test has construct validity.
39. Campbell & Fiske (1959)
- Two types of Construct-related Evidence
- Convergent evidence
- When a test correlates well with other tests
believed to measure the same construct
40. Campbell & Fiske (1959)
- Two types of Construct-related Evidence
- Convergent evidence
- Discriminant evidence
- When a test does not correlate with other tests
believed to measure some other construct.
41. Convergent validity
- Scores correlated with age, number of symptoms, chronic medical conditions, and physiological measures
- Treatments designed to improve health should increase Health Index scores. They do.
42. Discriminant validity
- Low correlations between the new test and tests believed to tap unrelated constructs.
- Evidence that the new test measures something unique (both kinds of evidence are sketched below)
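A hedged sketch of how convergent and discriminant evidence is usually summarized: correlate the new test with an established measure of (supposedly) the same construct and with a measure of an unrelated construct. All test names, constructs, and data here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Hypothetical latent constructs.
health = rng.normal(size=n)         # what the new test is meant to measure
extraversion = rng.normal(size=n)   # an unrelated construct

# Hypothetical observed scores: construct signal plus measurement noise.
new_test = health + 0.5 * rng.normal(size=n)
established_health_test = health + 0.5 * rng.normal(size=n)
extraversion_test = extraversion + 0.5 * rng.normal(size=n)

convergent = np.corrcoef(new_test, established_health_test)[0, 1]
discriminant = np.corrcoef(new_test, extraversion_test)[0, 1]

print(f"convergent evidence:   r = {convergent:.2f}  (should be substantial)")
print(f"discriminant evidence: r = {discriminant:.2f}  (should be near zero)")
```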
43. Validity & Reliability: CSEPT
- CSEPT: no point in trying to establish the validity of an unreliable test.
- It's possible to have a reliable test that is not valid (has no meaning).
- Logically impossible to produce evidence of validity for an unreliable test (illustrated in the sketch below).
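One way to see why: as measurement error grows (reliability falls), the observed correlation between the test and any criterion shrinks, so a validity coefficient can never outrun the test's reliability. A minimal simulation with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

true_score = rng.normal(size=n)   # the attribute the test is supposed to measure
criterion = true_score            # a criterion perfectly tied to that attribute

for error_sd in (0.0, 1.0, 3.0):  # more random error = lower reliability
    test = true_score + error_sd * rng.normal(size=n)
    reliability = 1.0 / (1.0 + error_sd**2)   # true-score variance / total variance
    r = np.corrcoef(test, criterion)[0, 1]    # observed validity coefficient
    print(f"reliability ~ {reliability:.2f} -> validity coefficient r = {r:.2f}")
```

Even with a criterion that is a perfect copy of the attribute, the unreliable versions of the test show much weaker correlations.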
44. Validity & Reliability: Borsboom
- Borsboom et al.: what does it mean to say that a test is reliable but not valid?
- What is it a test of?
- It isn't a test at all, just a collection of items
45. Blanton & Jaccard: arbitrary metrics
- We observe a behavior in order to learn about the underlying psychological characteristic
- A person's test score represents their standing on that underlying dimension
- Such scores form an arbitrary metric
- That is, we do not know how the observed scores
are related to the true scores on the underlying
dimension
46. [Figure: Persons A and B occupy fixed positions on an underlying dimension that has a neutral point, but the same positions map onto a Test 1 score scale running 0 to 6 and a Test 2 score scale running 6 to 0. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]
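To make the figure concrete, here is a small sketch in which two hypothetical tests are different monotone mappings of the same underlying dimension. The particular mapping functions and the latent positions of persons A and B are invented for illustration:

```python
import numpy as np

# Latent (underlying) positions of two people; 0 marks the neutral point.
person_a, person_b = -0.2, 1.5

def test_1(latent):
    # One hypothetical monotone mapping of the latent dimension onto a 0-6 score scale.
    return 6 / (1 + np.exp(-3 * latent))

def test_2(latent):
    # A different, equally legitimate monotone mapping onto a 0-6 score scale.
    return 4 + 1.2 * latent

for name, mapping in [("Test 1", test_1), ("Test 2", test_2)]:
    a, b = mapping(person_a), mapping(person_b)
    print(f"{name}: A = {a:.1f}, B = {b:.1f}, gap = {b - a:.1f}, "
          f"score at the underlying neutral point = {mapping(0.0):.1f}")
```

Both tests rank A and B the same way, yet the observed scores, the size of the gap between the two people, and the score corresponding to the underlying neutral point all differ: that is what makes the metric arbitrary.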
47. Arbitrary metrics: the IAT
- The Implicit Association Test (IAT) is claimed to diagnose implicit attitudinal preferences or racist attitudes
- IAT authors say you may have prejudices you don't know you have.
- Are these claims true?
48. Arbitrary metrics: the IAT
- Task: categorize stimuli using 2 pairs of categories
- 2 buttons to press, 2 assignments of categories
to buttons, used in sequence
49. Arbitrary metrics: the IAT
- Assignment pattern A
- Button 1: press if the stimulus refers to the category White or the category Pleasant
- Button 2: press if the stimulus refers to the category Black or the category Unpleasant
- Assignment pattern B
- Button 1: press if the stimulus refers to the category White or the category Unpleasant
- Button 2: press if the stimulus refers to the category Black or the category Pleasant
50. Arbitrary metrics: the IAT
- IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a preference for Whites over Blacks; in other words, a racist attitude
- IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B (a simplified version of that difference score is sketched below)
- This is inappropriate
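For concreteness, here is a simplified sketch of the kind of response-time difference that feedback is based on. The simulated response times are invented, and this is not the published IAT scoring algorithm, only a bare-bones difference of block means:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated response times in milliseconds for one hypothetical test-taker.
pattern_a_rt = rng.normal(loc=750, scale=120, size=40)  # White+Pleasant / Black+Unpleasant block
pattern_b_rt = rng.normal(loc=820, scale=120, size=40)  # White+Unpleasant / Black+Pleasant block

# A positive difference (slower on Pattern B) is what gets interpreted as a "preference".
diff_ms = pattern_b_rt.mean() - pattern_a_rt.mean()
print(f"mean RT difference (Pattern B - Pattern A) = {diff_ms:.0f} ms")
```

The number the sketch prints is perfectly well defined in milliseconds; the problem raised on the following slides is that the mapping from that number to attitude strength is unknown, so calling a preference weak or strong on that basis rests on an arbitrary metric.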
51. Arbitrary metrics: the IAT
- The IAT does not tell us about racist attitudes
- IAT authors take a dimension which is non-arbitrary when used by physicists (time) and use it in an arbitrary way in psychology
52. Arbitrary metrics: the IAT
- The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown
- Zero on the (Pattern A - Pattern B) difference may not be zero on the underlying attitude/preference dimension
- There are alternative models of how that (Pattern A - Pattern B) difference could arise
53. Review
- CSEPT
- Validity is a characteristic of evidence, not of tests.
- Valid evidence supports conclusions drawn using test results
- Validity is determined by social consequences of test use
- Borsboom et al.
- Validity is not a methodological issue, but a substantive (theoretical) issue
- A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores
54. Review
- CSEPT
- Validity can be established in three ways, though boundaries between them are fuzzy
- Content-related evidence
- Criterion-related evidence
- Construct-related evidence
- Borsboom et al.
- It's all the same validity: a test is valid if it measures what you think it measures
- Validity is not mysterious
55. Review
- CSEPT
- Content-related evidence: do test items represent the whole domain of interest?
- Criterion-related evidence: do test scores relate to a criterion, either now (concurrent) or in the future (predictive)?
- Borsboom et al.
- These questions are properly part of the process
of creating a test
56. Review
- CSEPT
- Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.
- A test can be reliable but not valid. A test cannot be valid if not reliable.
- Borsboom et al.
- A test must be valid for a reliability estimate
to have any meaning
57. Review
- Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric
- For an arbitrary metric, we have no idea how the
test scores are actually related to the
underlying dimension