Title: Ethical and Social Implications of Testing
2 Basic Concepts of Validity
- Validity: a definition
- Content validity
- Criterion-related validity
- Construct validity
- Extravalidity concerns and the widening scope of test validity
3 VALIDITY
- The standard tripartite division of basic concepts of validity: content, criterion-related, and construct validity.
- Extravalidity concerns include side effects and unintended consequences of testing.
- Validity must be built into the test from the outset rather than being limited to the final stage of test development.
- The validity of a test is the extent to which it measures what it claims to measure.
(Gregory, 2007, p.119-120)
4 VALIDITY (Cont.)
- Validity defines the meaning of test scores. Reliability is important, too, but only insofar as it constrains validity.
- To the extent that a test is unreliable, it cannot be valid. Reliability is a necessary but not a sufficient precursor of validity.
- Test validation is a developmental process that begins with test construction and continues indefinitely.
- Test validity hinges upon the accumulation of research findings.
(Gregory, 2007, p.119-120)
5 VALIDITY: A DEFINITION
- A definition of validity paraphrased from the influential Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985, 1999, as cited in Gregory, 2007, p.101): A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
- Classical test theory was the basis for test development throughout most of the twentieth century.
(Gregory, 2007, p.120-121)
6 VALIDITY: A DEFINITION (Cont.)
- Validity reflects an evolutionary, research-based judgment of how adequately a test measures the attribute it was designed to measure.
- The validity of a test is characterized on a continuum ranging from weak to acceptable to strong.
- The traditionally different ways of accumulating validity evidence have been grouped into three categories:
- Content validity
- Criterion-related validity
- Construct validity
(Gregory, 2007, p.120-121)
7 CONTENT VALIDITY
- Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.
- Content validity is a useful concept when a great deal is known about the variable that the researcher wishes to measure.
- When evaluating content validity, response specification is also an integral part of defining the relevant universe of behaviors.
(Gregory, 2007, p.121-122)
8 CONTENT VALIDITY (Cont.)
- Content validity is more difficult to assure when the test measures an ill-defined trait. What usually passes for content validity is the considered opinion of expert judges.
- The test developer asserts that a panel of experts reviewed the domain specification carefully and judged the test questions to possess content validity.
(Gregory, 2007, p.121-122)
9 Quantification of Content Validity (Cont.)
- Example coefficient of content validity: 87 / (4 + 4 + 5 + 87) = .87
(Gregory, 2007, p.122-123)
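Assuming the slide's counts come from two judges' item-relevance ratings, with 87 items endorsed as strongly relevant by both judges out of 4 + 4 + 5 + 87 = 100 items total, the computation is a simple proportion (the function name below is ours, for illustration):

```python
def content_validity_coefficient(both_strong, total_items):
    """Proportion of items that both expert judges rate as strongly relevant."""
    return both_strong / total_items

# Counts assumed from the slide's example: 4 + 4 + 5 + 87 = 100 items,
# 87 of which both judges rated as strongly relevant.
coefficient = content_validity_coefficient(87, 4 + 4 + 5 + 87)
print(round(coefficient, 2))  # 0.87
```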
10 Quantification of Content Validity (Cont.)
- A coefficient of content validity is just one piece of evidence in the evaluation of a test.
- The commonsense approach to content validity cannot identify nonexistent items that should be added to a test to help make the pool of questions more representative of the intended domain.
(Gregory, 2007, p.123)
11 Face Validity
- A test has face validity if it looks valid to test users, examiners, and especially the examinees.
- Face validity is really a matter of social acceptability.
- Face validity should not be confused with objective validity, which is determined by the relationship of test scores to other sources of information.
(Gregory, 2007, p.123)
12 CRITERION-RELATED VALIDITY
- Criterion-related validity is demonstrated when a test is shown to be effective in estimating an examinee's performance on some outcome measure. The variable of primary interest is the outcome measure, called a criterion.
- Concurrent validity - the criterion measures are obtained at approximately the same time as the test scores.
- Predictive validity - the criterion measures are obtained in the future, usually months or years after the test scores are obtained.
(Gregory, 2007, p.102)
13 Characteristics of a Good Criterion
- A criterion is any outcome measure against which a test is validated.
- Criteria must be more than just imaginative; they must also be reliable, appropriate, and free of contamination from the test itself (criterion contamination).
- An unreliable criterion will be inherently unpredictable, regardless of the merits of the test.
- The theoretical upper limit of the validity coefficient is constrained by the reliability of both the test and the criterion: r_xy cannot exceed the square root of the product of the two reliabilities.
(Gregory, 2007, p.103)
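The ceiling can be computed directly. This sketch assumes the standard psychometric bound, r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy are the reliabilities of test and criterion; the example reliabilities are made up:

```python
import math

def max_validity(test_reliability, criterion_reliability):
    """Theoretical ceiling on the validity coefficient r_xy,
    given the reliabilities of the test (r_xx) and criterion (r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)

# A test with reliability .81 validated against a criterion with
# reliability .64 can correlate with it at most .72.
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```

Note how an unreliable criterion caps validity no matter how good the test is, which is the point of the slide's third bullet.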
14 Concurrent Validity
- An evaluation of concurrent validity indicates the extent to which test scores accurately estimate an individual's present position on the relevant criterion.
- A test with demonstrated concurrent validity provides a shortcut for obtaining information that might otherwise require the extended investment of professional time.
- Correlations between a new test and existing tests are often cited as evidence of concurrent validity.
- The criterion tests must have been validated through correlation with appropriate nontest behavioral data.
- The instrument being validated must measure the same construct as the criterion tests.
(Gregory, 2007, p.125)
15 Predictive Validity
- Predictive validity is particularly relevant for entrance examinations and employment tests.
- A regression equation describes the best-fitting straight line for estimating the criterion from the test.
(Gregory, 2007, p.125-126)
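A best-fitting line of this kind (Y' = a + bX) can be obtained with ordinary least squares; a minimal sketch with numpy, using made-up test and criterion scores:

```python
import numpy as np

# Hypothetical test scores (X) and later-obtained criterion scores (Y),
# e.g., an entrance exam predicting first-year GPA.
x = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
y = np.array([2.0, 2.4, 2.9, 3.3, 3.8])

# Least-squares fit of the line Y' = a + bX (slope b, intercept a).
b, a = np.polyfit(x, y, 1)

# Predicted criterion score for an applicant who scores 85 on the test.
predicted = a + b * 85
```

With these illustrative numbers the slope is 0.045 and the intercept is -0.27, so a test score of 85 predicts a criterion score of about 3.56.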
16 Validity Coefficient and the Standard Error of Estimate
- Perhaps the most popular approach to expressing the relationship between test scores and criterion measures is to compute the correlation between test and criterion (r_xy), known as the validity coefficient.
- The higher the validity coefficient, the more accurate is the test in predicting the criterion.
(Gregory, 2007, p. 126-127)
17 Validity Coefficient and the Standard Error of Estimate (Cont.)
- The standard error of estimate (SEest) is the margin of error to be expected in the predicted criterion score.
- The SEM indicates the margin of measurement error caused by the imperfect reliability of the test, whereas SEest indicates the margin of prediction error caused by the imperfect validity of the test.
(Gregory, 2007, p. 126-127)
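The slide does not show the formula, but under the standard definition (assumed here) SEest depends on the criterion's standard deviation and the validity coefficient:

```python
import math

def se_estimate(sd_criterion, validity_r):
    """Standard error of estimate: SEest = SD_Y * sqrt(1 - r^2),
    the expected margin of error in predicted criterion scores."""
    return sd_criterion * math.sqrt(1 - validity_r ** 2)

# With a criterion SD of 15 and validity coefficient .60, predictions
# carry a margin of error of 12 points; with perfect validity (r = 1)
# the prediction error vanishes entirely.
print(round(se_estimate(15, 0.60), 2))  # 12.0
```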
18 Decision Theory Applied to Psychological Tests
- Proponents of decision theory stress that the purpose of psychological testing is not measurement per se but measurement in the service of decision making.
(Gregory, 2007, p. 127-128)
19 Decision Theory Applied to Psychological Tests (Cont.)
- Proponents of decision theory make two fundamental assumptions about the use of selection tests:
- The value of various outcomes to the institution can be expressed in terms of a common utility scale.
- In institutional selection decisions, the most generally useful strategy is one that maximizes the average gain on the utility scale (or minimizes average loss) over many similar decisions.
(Gregory, 2007, p. 127-128)
20 Taylor-Russell Tables
- Taylor and Russell (1939) published statistical tables that permit a test user to determine the expected proportion of successful applicants selected with use of a test.
- In order to use the Taylor-Russell tables, the tester must specify (1) the predictive validity of the test, (2) the selection ratio, and (3) the base rate for successful applicants. A change in any of these factors will alter the selection accuracy of the test.
(Gregory, 2007, p.129-131)
21 Taylor-Russell Tables (Cont.)
- The base rate is the proportion of successful applicants who would be selected using the current method, without benefit of the new test.
- When all three of these factors are known, the Taylor-Russell tables can be consulted to determine the proportion of successes expected through the application of the test. In this manner, the test user can determine the extent to which using a new test would improve selection over the base rate obtained from existing methods.
(Gregory, 2007, p.129-131)
22 Taylor-Russell Tables (Cont.)
(Gregory, 2007, p.130)
23 Taylor-Russell Tables (Cont.)
- The most intriguing conclusion to emerge from the Taylor-Russell tables is that tests with poor validity can, nonetheless, substantially improve selection accuracy if the selection ratio is low enough.
(Gregory, 2007, p.129-131)
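This conclusion can be illustrated by a simple Monte Carlo sketch rather than by the tables themselves: draw test-criterion pairs with a modest correlation, select the top fraction on the test, and compare success rates among those selected. All numbers below are illustrative assumptions, not values from Gregory.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
validity = 0.30  # a modest predictive validity coefficient

# Simulate standardized test and criterion scores correlated at r = .30.
test = rng.standard_normal(n)
criterion = validity * test + np.sqrt(1 - validity**2) * rng.standard_normal(n)

# Base rate .50: half of all applicants would succeed without the test.
base_rate = 0.50
success = criterion > np.quantile(criterion, 1 - base_rate)

# Success rate among those selected, at a lenient and a strict selection ratio.
rates = {}
for selection_ratio in (0.50, 0.10):
    cutoff = np.quantile(test, 1 - selection_ratio)
    rates[selection_ratio] = success[test > cutoff].mean()
```

Even with validity of only .30, the success rate among selected applicants rises above the .50 base rate, and it rises further as the selection ratio drops from .50 to .10, matching the slide's conclusion.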
24 CONSTRUCT VALIDITY
- A construct is a theoretical, intangible quality or trait in which individuals differ (Messick, 1995, as cited in Gregory, 2007, p.107).
- A test designed to measure a construct must estimate the existence of an inferred, underlying characteristic (e.g., leadership ability) based on a limited sample of behavior.
- Construct validity refers to the appropriateness of these inferences about the underlying construct.
(Gregory, 2007, p. 131)
25 CONSTRUCT VALIDITY (Cont.)
- All psychological constructs possess two characteristics in common:
- There is no single external referent sufficient to validate the existence of the construct; that is, the construct cannot be operationally defined.
- Nonetheless, a network of interlocking suppositions can be derived from existing theory about the construct.
- Construct validity pertains to psychological tests that claim to measure complex, multifaceted, and theory-bound psychological attributes such as leadership ability, intelligence, and the like.
(Gregory, 2007, p.131-132)
26 CONSTRUCT VALIDITY (Cont.)
- The crucial point to understand about construct validity is that no criterion or universe of content is accepted as entirely adequate to define the quality to be measured (Cronbach & Meehl, 1955, as cited in Gregory, 2007, p.132).
- To evaluate the construct validity of a test, we must amass a variety of evidence from numerous sources.
- Individual studies of content, concurrent, and predictive validity are regarded merely as supportive evidence in the cumulative quest for construct validation.
(Gregory, 2007, p.131-132)
27 APPROACHES TO CONSTRUCT VALIDITY
- Most studies of construct validity fall into one of the following categories:
- Analysis to determine whether the test items or subtests are homogeneous and therefore measure a single construct.
- Study of developmental changes to determine whether they are consistent with the theory of the construct.
- Research to ascertain whether group differences on test scores are theory-consistent.
- Analysis to determine whether intervention effects on test scores are theory-consistent.
(Gregory, 2007, p.132)
28 APPROACHES TO CONSTRUCT VALIDITY (Cont.)
- Correlation of the test with other related and unrelated tests and measures.
- Factor analysis of test scores in relation to other sources of information.
- Analysis to determine whether test scores allow for the correct classification of examinees.
(Gregory, 2007, p.108)
29 Test Homogeneity
- If a test measures a single construct, then its component items (or subtests) likely will be homogeneous (also referred to as internally consistent).
- The aim of test development is to select items that form a homogeneous scale.
- Homogeneity is an important first step in certifying the construct validity of a new test, but standing alone it is weak evidence.
(Gregory, 2007, p.108-110)
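Internal consistency of this kind is commonly indexed with coefficient alpha. A minimal sketch using the standard formula (the item scores below are made up for illustration):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a score matrix: rows = examinees, columns = items.
    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Illustrative scores for 4 examinees on a 3-item scale.
scores = [[2, 3, 3],
          [4, 4, 5],
          [1, 2, 2],
          [3, 3, 4]]
alpha = cronbach_alpha(scores)
```

A high alpha indicates a homogeneous scale, but, as the slide notes, homogeneity alone is weak evidence of construct validity.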
30 Appropriate Developmental Changes
- Many constructs can be assumed to show regular age-graded changes from early childhood into mature adulthood and perhaps beyond (e.g., vocabulary knowledge).
- This approach does not provide information about how the construct relates to other constructs.
(Gregory, 2007, p.133)
31 Theory-Consistent Group Differences
- One way to bolster the validity of a new instrument is to show that, on average, persons with different backgrounds and characteristics obtain theory-consistent scores on the test.
- Crandall (1981) developed a social interest scale that illustrates the use of theory-consistent group differences in the process of construct validation.
(Gregory, 2007, p.133)
32 Theory-Consistent Intervention Effects
- Test scores should change in the appropriate direction and amount in reaction to planned or unplanned interventions (e.g., Willis & Schaie, 1986).
(Gregory, 2007, p.133-134)
33 Convergent and Discriminant Validation
- Convergent validity is demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs.
- Discriminant validity is demonstrated when a test does not correlate with variables or tests from which it should differ.
- Campbell and Fiske (1959) proposed a systematic experimental design for simultaneously confirming the convergent and discriminant validities of a psychological test, called the multitrait-multimethod matrix. This matrix is a rich resource of data on reliability, convergent validity, and discriminant validity.
(Gregory, 2007, p.134-135)
34 Convergent and Discriminant Validation (Cont.)
(Gregory, 2007, p.135)
35 Factor Analysis
- Factor analysis is a specialized statistical technique that is particularly useful for investigating construct validity.
- The purpose of factor analysis is to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests.
- The goal in factor analysis is to find a smaller set of dimensions, called factors, that can account for the observed array of intercorrelations among individual tests.
(Gregory, 2007, p.135-136)
36 Factor Analysis (Cont.)
- A factor loading is actually a correlation between an individual test and a single factor.
- The final outcome of a factor analysis is a table depicting the correlation of each test with each factor.
- A table of factor loadings helps describe the factorial composition of a test and thereby provides information relevant to construct validity.
(Gregory, 2007, p.114)
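A sketch of how such a loading table can be extracted from a correlation matrix. The matrix values are illustrative assumptions, and this eigendecomposition (principal-component style) extraction is only one of several factoring methods:

```python
import numpy as np

# Hypothetical intercorrelations among four tests: the first two tests
# correlate strongly with each other, as do the last two, suggesting
# that two factors underlie the battery.
R = np.array([[1.0, 0.7, 0.2, 0.1],
              [0.7, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])

# Eigendecomposition of the (symmetric) correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # largest eigenvalues first

# Loadings on the two retained factors: each entry is the correlation
# of one test with one factor.
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])
```

The squared loadings in each row sum to at most 1, since a test cannot share more than all of its variance with the factors.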
37 Factor Analysis (Cont.)
(Gregory, 2007, p.136)
38 Classification Accuracy
- Many tests are used for screening purposes to identify examinees who meet (or don't meet) certain diagnostic criteria. For these instruments, accurate classification is an essential index of validity.
- The MMSE is one of the most widely researched screening tests in existence. In exploring its utility, researchers have paid special attention to two psychometric features that bear upon validity: sensitivity and specificity.
(Gregory, 2007, p.137)
39 Classification Accuracy (Cont.)
- Sensitivity has to do with accurate identification of patients who have a syndrome (e.g., dementia).
- Specificity has to do with accurate identification of normal patients.
- The concepts of sensitivity and specificity are chiefly helpful in dichotomous diagnostic situations in which individuals are presumed either to manifest a syndrome or not.
- Screening tests typically provide a cutoff score used to identify possible cases of the syndrome in question.
(Gregory, 2007, p.137)
40 Classification Accuracy (Cont.)
- In general, the validity of a screening test is bolstered to the extent that it possesses both high sensitivity and high specificity.
- There are no exact cutoffs, but for many purposes a test will need sensitivity and specificity that exceed 80 or 90 percent in order to justify its use.
- The reality of assessment is that the examiner must choose a cutoff score that provides a balance between sensitivity and specificity.
- Sensitivity and specificity typically stand in an inverse relationship: raising one by moving the cutoff lowers the other.
(Gregory, 2007, p.137-139)
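Both indices follow directly from a 2x2 classification of screening results against true diagnostic status. A minimal sketch with made-up data, assuming (as with the MMSE) that low scores flag possible cases:

```python
def sensitivity_specificity(scores, has_syndrome, cutoff):
    """Classify an examinee as a possible case when score <= cutoff.
    Returns (sensitivity, specificity)."""
    tp = sum(s <= cutoff and d for s, d in zip(scores, has_syndrome))
    fn = sum(s > cutoff and d for s, d in zip(scores, has_syndrome))
    tn = sum(s > cutoff and not d for s, d in zip(scores, has_syndrome))
    fp = sum(s <= cutoff and not d for s, d in zip(scores, has_syndrome))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative screening scores; True marks examinees who actually
# have the syndrome by an independent diagnostic workup.
scores = [20, 22, 25, 28, 29, 18, 21, 30]
cases = [True, True, True, False, False, True, False, False]
sens, spec = sensitivity_specificity(scores, cases, cutoff=23)
```

With this toy data both indices come out at .75; lowering the cutoff would raise specificity at the cost of sensitivity, illustrating the inverse relationship noted above.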
41 Classification Accuracy (Cont.)
(Gregory, 2007, p.138)
42 EXTRAVALIDITY CONCERNS AND THE WIDENING SCOPE OF TEST VALIDITY
- Extravalidity concerns include side effects and unintended consequences of testing.
- Even if a test is valid, unbiased, and fair, the decision to use it may be governed by additional considerations.
(Gregory, 2007, p.139)
43 Unintended Side Effects of Testing
- The examiner must determine whether the benefits of giving the test outweigh the costs of the potential side effects. Furthermore, by anticipating unintended side effects, the examiner might be able to deflect or diminish them.
- A consideration of side effects should influence an examiner's decision to use a particular test for a specified purpose.
(Gregory, 2007, p.139-140)
44 The Widening Scope of Test Validity
- Several psychometric theoreticians have introduced a wider, functionalist definition of validity that asserts that a test is valid if it serves the purpose for which it is used.
- Test validity, then, is an overall evaluative judgment of the adequacy and appropriateness of inferences and actions that flow from test scores.
(Gregory, 2007, p.140-141)
45 The Widening Scope of Test Validity (Cont.)
- Messick (1980, 1995, as cited in Gregory, 2007, p.141) argues that the new, wider conception of validity rests on four bases:
- Traditional evidence of construct validity.
- An analysis of the value implications of the test interpretation.
- Evidence for the usefulness of test interpretation.
- An appraisal of the potential and actual social consequences, including side effects, of test use.
- A valid test is one that answers well to all four facets of test validity.
(Gregory, 2007, p.140-141)
46 The Widening Scope of Test Validity (Cont.)
- Psychological measurement is not a neutral endeavor; it is an applied science that occurs in a social and political context.
(Gregory, 2007, p.140-141)
47 SUMMARY
- Validity is the extent to which a test measures what it claims to measure; a test is valid to the extent that the inferences made from it are appropriate, meaningful, and useful. Reliability is a necessary but not sufficient precursor of validity.
- Validity evidence has traditionally been grouped into three categories: content, criterion-related, and construct validity.
- Content validity concerns how well the test items represent the intended universe of behavior. Face validity (whether a test looks valid to examinees) is a matter of social acceptability, not objective validity.
(Gregory, 2007, p.141)
48 SUMMARY (Cont.)
- Criterion-related validity is demonstrated when a test effectively estimates an examinee's performance on an outcome measure, the criterion. In concurrent validity, criterion measures are obtained at about the same time as the test scores; in predictive validity, they are obtained months or years later.
- A good criterion must be reliable, appropriate, and free of contamination from the test itself.
(Gregory, 2007, p.141)
49 SUMMARY (Cont.)
- The correlation between test and criterion (r_xy) is the validity coefficient; the higher it is, the more accurately the test predicts the criterion.
- The standard error of estimate (SEest) is the margin of error to be expected in the predicted criterion score.
- Decision theory views testing as measurement in the service of decision making; the Taylor-Russell tables show how predictive validity, selection ratio, and base rate jointly determine selection accuracy.
(Gregory, 2007, p.141)
50 SUMMARY (Cont.)
- Construct validity concerns the appropriateness of inferences about an underlying construct, a theoretical, intangible quality in which individuals differ; no single criterion or universe of content is adequate to define it.
- Evidence for construct validity is accumulated from many sources, including test homogeneity, developmental changes, theory-consistent group and intervention effects, convergent and discriminant validation, and factor analysis.
(Gregory, 2007, p.142)
51 SUMMARY (Cont.)
- For screening tests, classification accuracy is an essential index of validity, expressed as sensitivity (accurate identification of patients who have the syndrome) and specificity (accurate identification of normal patients).
- Extravalidity concerns include the side effects and unintended consequences of testing; even a valid, unbiased, and fair test may warrant additional considerations before use.
- In the widening conception of validity, test validity is an overall evaluative judgment of the inferences and actions that flow from test scores; psychological measurement occurs in a social and political context.
52 REFERENCE
- Gregory, R. J. (2007). Psychological testing: History, principles, and applications (5th ed.). Boston, MA: Pearson Education.