Title: Validity
1. Validity
- In our last class, we began to discuss some of
the ways in which we can assess the quality of
our measurements.
- We discussed the concept of reliability (i.e.,
the degree to which measurements are free of
random error).
2. Why reliability alone is not enough
- Understanding the degree to which measurements
are reliable, however, is not sufficient for
evaluating their quality.
- In-class scale example
- Recall that test-retest estimates of reliability
tend to range between 0 (low reliability) and 1
(high reliability).
- Note: An online correlation calculator is
available at
http://easycalculation.com/statistics/correlation.php
3. Validity
- In this example, the measurements appear
reliable, but there is a problem . . .
- Validity reflects the degree to which
measurements are free of both random error, E,
and systematic error, S.
- O = T + E + S (observed score = true score +
random error + systematic error)
- Systematic errors reflect the influence of any
non-random factor beyond what we're attempting to
measure.
4. Validity: Does systematic error accumulate?
- Question: If we sum or average multiple
observations (i.e., using a multiple indicators
approach), how will systematic errors influence
our estimates of the true score?
5. Validity: Does error accumulate?
- Answer: Unlike random errors, systematic errors
accumulate.
- Systematic errors exert a constant source of
influence on measurements. We will always
overestimate (or underestimate) T if systematic
error is present.
6. Note: Each measurement is 2 points higher than
the true value of 10. The errors do not average
out.
7. Note: Even when random error is present, E
averages to 0 but S does not. Thus, we have
reliable measures that have validity problems.
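The contrast between E and S can be checked with a short simulation. This is a minimal sketch and not part of the original slides: the true score of 10 and the bias of 2 come from the example above, while the spread of the random error, the number of observations, and the helper name `observe` are assumptions chosen only for illustration.

```python
import random
import statistics

T = 10.0    # true score (from the slide example)
S = 2.0     # systematic error: every measurement reads 2 points high
SD_E = 1.5  # assumed spread of the random error E (illustrative)


def observe(rng):
    """Return one observed score, O = T + E + S."""
    e = rng.gauss(0.0, SD_E)  # random error, centered on 0
    return T + e + S


rng = random.Random(42)  # fixed seed so the run is repeatable
observations = [observe(rng) for _ in range(10_000)]
mean_o = statistics.mean(observations)

# E averages out across many observations; S does not, so the mean
# settles near T + S rather than near T.
print(f"mean of O = {mean_o:.2f}, T = {T}, T + S = {T + S}")
```

Averaging more observations shrinks the contribution of E, but the mean stays about S points away from T no matter how many observations are collected.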
8. Validity: Ensuring validity
- What can we do to minimize the impact of
systematic errors?
- One way to minimize their impact is to use a
variety of indicators (different sources of
information).
- Different kinds of indicators of a latent
variable may not share the same systematic errors.
- If true, then S will behave like random error
across measurements (but not within measurements).
9. Example
- As an example, let's consider the measurement of
self-esteem.
- Some methods, such as self-report questionnaires,
may lead people to over-estimate their
self-esteem. Most people want to think highly of
themselves.
- Other methods, such as clinical ratings by
trained observers, may lead to under-estimates of
self-esteem. Clinicians, for example, may be
prone to assume that people are not as well-off
as they say they are.
10. Self-reports (Method 1) and clinical ratings (Method 2)
Note: Method 1 systematically overestimates T
whereas Method 2 systematically underestimates T.
In combination, however, those systematic errors
cancel out.
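The cancellation can be illustrated with a few lines of arithmetic. This is only a sketch: the true score of 7 and the bias sizes are hypothetical numbers that do not appear in the slides, and the function name `combined_estimate` is made up for this example.

```python
import statistics

T = 7.0  # hypothetical true self-esteem score (not from the slides)


def combined_estimate(true_score, biases):
    """Average the scores from several methods, each with its own bias S."""
    scores = [true_score + s for s in biases]
    return statistics.mean(scores)


# Assumed biases: self-reports read 1 point high, clinical ratings 1 point low.
balanced = combined_estimate(T, [+1.0, -1.0])

# If the biases are not equal and opposite, some systematic error remains.
unbalanced = combined_estimate(T, [+1.0, -0.5])

print(balanced, unbalanced)
```

The errors cancel completely only when the biases happen to balance. With unequal biases, combining methods reduces the systematic error without removing it.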
11. Another example
- One problem with the use of self-report
questionnaire rating scales is that some people
tend to give high (or low) answers consistently
(i.e., regardless of the question being asked).
- This is sometimes referred to as a yea-saying
or nay-saying bias.
12. Response scale: 1 = strongly disagree, 5 = strongly agree
Item                                                          T  S  O
I think I am a worthwhile person.                             4  1  5
I have high self-esteem.                                      4  1  5
I am confident in my ability to meet challenges in life.      4  1  5
My friends and family value me as a person.                   4  1  5
Average score                                                 4  1  5
In this example, we have someone with relatively
high self-esteem, but this person systematically
rates questions one point higher than he or she
should.
13. Response scale: 1 = strongly disagree, 5 = strongly agree
If we reverse-key half of the items, the bias
averages out. Responses to reverse-keyed items
are counted in the opposite direction.
T = (4 + 4 + (6 - 2) + (6 - 2)) / 4 = 4
O = (5 + 5 + (6 - 3) + (6 - 3)) / 4 = 4
Item                                                          T  S  O
I think I am a worthwhile person.                             4  1  5
I have high self-esteem.                                      4  1  5
I am NOT confident in my ability to meet challenges in life.  2  1  3
My friends and family DO NOT value me as a person.            2  1  3
Average score                                                 4  1  4
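The reverse-keying arithmetic can be written out as a small scoring routine. This is a sketch under the assumption of a 1-to-5 response scale, as in the example; the function names `reverse_score` and `scale_score` are illustrative and not taken from the slides.

```python
SCALE_MAX = 5  # responses run from 1 (strongly disagree) to 5 (strongly agree)


def reverse_score(response, scale_max=SCALE_MAX):
    """Count a response in the opposite direction: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return (scale_max + 1) - response


def scale_score(responses):
    """Average the items after reverse-scoring the reverse-keyed ones.

    `responses` is a list of (observed_response, is_reverse_keyed) pairs.
    """
    scored = [reverse_score(r) if is_reversed else r
              for r, is_reversed in responses]
    return sum(scored) / len(scored)


# Observed responses (O) from the two tables above: every answer is inflated
# by S = 1 point.
all_forward = [(5, False), (5, False), (5, False), (5, False)]
half_reversed = [(5, False), (5, False), (3, True), (3, True)]

print(scale_score(all_forward))    # bias remains in the average
print(scale_score(half_reversed))  # bias averages out
```

With all items keyed in the same direction, the one-point bias carries through to the average. With half of the items reverse-keyed, the inflation on the forward items is offset by the inflation on the reversed items.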
14. Validity
- To the extent that a measure has validity, we
say that it measures what it is supposed to
measure.
- Question: How do you assess validity?
- Very tough question to answer!
15. Different ways to think about validity
- To the extent that a measure has validity, we can
say that it measures what it is supposed to
measure.
- There are different reasons for measuring
psychological variables. The precise way in
which we assess validity depends on the reason
that we're taking the measurements in the first
place.
16. Prediction
- As an example, if one's goal is to develop a way
to determine who is at risk for developing
schizophrenia, one's goal is prediction.
17. Predictive Validity
- We may begin by obtaining a group of people who
have schizophrenia and a group of people who do
not.
- Then, we may try to figure out which kinds of
antecedent variables differentiate the two groups.
18. Correct classifications
Lost a parent before the age of 10                                    10
Parent or grandparent had schizophrenia                               50
Mother was cold and aloof to the person when he or she was a child    15
19. Predictive Validity
- In short, some of these variables appear to be
better than others at discriminating
schizophrenics from non-schizophrenics.
- The degree to which a measure can predict what it
is supposed to predict is called its predictive
validity.
- When we are taking measurements for the purpose
of prediction, we assess validity as the degree
to which those predictions are accurate or useful.
20.
                          Reality: Schizophrenic
                          Yes
Measure: Schizophrenic
  No                      10
  Yes                     40
21.
                          Reality: Schizophrenic
                          Yes     No
Measure: Schizophrenic
  No                      10      10
  Yes                     40      40
50% ((40 + 10) / 100) of people were correctly
classified (with a 50% base rate. Yuck.)
22.
                          Reality: Schizophrenic
                          Yes     No
Measure: Schizophrenic
  No                      0       98
  Yes                     1       1
99% ((98 + 1) / 100) of people were correctly
classified, but note the base rate problem.
Cohen's kappa is used to account for this
problem. Kappa in this example is .66.
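The percent-correct and kappa figures can be reproduced from the four cell counts. This is a minimal sketch using the standard formula for Cohen's kappa, kappa = (p_observed - p_expected) / (1 - p_expected). The function name `accuracy_and_kappa` is made up for this example, and the assignment of counts to cells follows one reading of the flattened tables above (columns are reality, rows are the measure), which is an assumption.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (proportion correct, Cohen's kappa) for a 2 x 2 table.

    tp: measure says yes, reality yes    fp: measure says yes, reality no
    fn: measure says no,  reality yes    tn: measure says no,  reality no
    """
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n

    # Agreement expected by chance, from the row and column totals.
    measure_yes, measure_no = tp + fp, fn + tn
    reality_yes, reality_no = tp + fn, fp + tn
    p_expected = (measure_yes * reality_yes + measure_no * reality_no) / (n * n)

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# First table: 50% correct with a 50% base rate.
acc_a, kappa_a = accuracy_and_kappa(tp=40, fp=40, fn=10, tn=10)

# Second table: 99% correct, but only 1 person in 100 is a true case.
acc_b, kappa_b = accuracy_and_kappa(tp=1, fp=1, fn=0, tn=98)

print(f"table 1: accuracy = {acc_a:.2f}, kappa = {kappa_a:.2f}")
print(f"table 2: accuracy = {acc_b:.2f}, kappa = {kappa_b:.2f}")
```

Kappa compares the observed agreement with the agreement expected by chance given the row and column totals, which is why 99% correct classification shrinks to a more modest value when nearly everyone falls in the same category.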
23. Construct Validity
- Sometimes we're not interested in measuring
something just for technological purposes, such
as prediction.
- We may be interested in measuring a construct in
order to learn more about it.
- Example: We may be interested in measuring
self-esteem not because we want to predict
something with the measure per se, but because we
want to know how self-esteem develops, whether it
develops differently for males and females, etc.
24. Construct Validity
- Notice that this is quite different from what we
were discussing before. In our schizophrenia
example, it doesn't matter whether our measure of
schizophrenia really measured schizophrenic
tendencies per se.
- As long as the measure helps us predict
schizophrenia well, we don't really care what it
measures.
25. Construct Validity
- When we are interested in the theoretical
construct per se, however, the issue of exactly
what is being measured becomes much more
important.
- The general strategy for assessing construct
validity involves (a) explicating the theoretical
relations among relevant variables and (b)
examining the degree to which the measure of the
construct relates to things that it should and
fails to relate to things that it should not.
26. Nomological Network
- The nomological network represents the
interrelations among variables involving the
construct of interest.
[Diagram: self-esteem connected to "achieve in school",
"ability to cope", and "distrust friends"; one path is
marked with a minus sign.]
27. Nomological Network Validity
- The process of assessing construct validity
basically involves determining the degree to
which our measure of the construct behaves in the
way assumed by the theoretical network in which
it is embedded.
- If, theoretically, people with high self-esteem
should be more likely to succeed in school, then
our measure of self-esteem should be able to
predict people's grades in school.
28. Construct Validity
- Notice here that establishing construct validity
involves prediction. The difference between
prediction in this context and prediction in the
previous context is that we are no longer trying
to predict school performance as best as we
possibly can.
- Our measure of self-esteem should only predict
performance to the degree to which we would
expect these two variables to be related
theoretically.
29. Discriminant Validity
- The measure should also fail to be related to
variables that, theoretically, are unrelated to
self-esteem.
- The ability of a measure to fail to predict
irrelevant variables is referred to as the
measure's discriminant validity.
[Diagram: self-esteem connected to "achieve in school",
"ability to cope", and "distrust friends", with
"like coffee" added as a theoretically unrelated variable;
one path is marked with a minus sign.]
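One way to see what this pattern looks like in data is to simulate it. Everything in the sketch below is hypothetical: the data are randomly generated, the effect sizes are arbitrary, and the variable names (`latent`, `measure`, `grades`, `coffee`) are made up to mirror the diagram. It only illustrates the logic of checking that a measure correlates with what it should and not with what it should not.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Simulated latent self-esteem and a noisy measure of it.
latent = [rng.gauss(0, 1) for _ in range(n)]
measure = [t + rng.gauss(0, 0.5) for t in latent]

# Grades depend partly on the latent variable (assumed for illustration);
# liking coffee is generated independently of it.
grades = [0.5 * t + rng.gauss(0, 1) for t in latent]
coffee = [rng.gauss(0, 1) for _ in range(n)]

r_grades = pearson_r(measure, grades)
r_coffee = pearson_r(measure, coffee)

print(f"measure vs. grades: r = {r_grades:.2f}")
print(f"measure vs. coffee: r = {r_coffee:.2f}")
```

In this simulated setup, a clearly positive correlation with grades together with a near-zero correlation with liking coffee is the pattern the network predicts. A sizable correlation with coffee would call either the measure or the network into question.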
30. Validity: Assessing validity
- Finally, it is useful, but not necessary, for a
measure to have face validity.
- Face validity: The degree to which a measure
appears to be measuring what it is supposed to
measure.
- A questionnaire item designed to measure
self-esteem that reads "I have high self-esteem"
has face validity. An item that reads "I like
cabbage in my Frosted Flakes" does not.
- In the context of prediction, face validity
doesn't matter. In the context of construct
validity, it matters more.
31. A Final Note on Construct Validity
- The process of establishing construct validity is
one of the primary enterprises of psychological
research.
- When we are measuring the association between two
variables to assess a measure's predictive or
discriminant validity, we are evaluating both (a)
the quality of the measure and (b) the soundness
of the nomological network.
- It is not unusual for researchers to refine the
nomological network as they learn more about how
various measures are inter-related.