Title: Ethical and Social Implications of Testing
2 Basic Concepts of Validity
- Validity: a definition
- Content validity
- Criterion-related validity
- Construct validity
- Extravalidity concerns and the widening scope of test validity
3 VALIDITY
- The standard tripartite division of basic concepts of validity: content, criterion-related, and construct validity.
- Extravalidity concerns include side effects and unintended consequences of testing.
- Validity must be built into the test from the outset rather than being limited to the final stage of test development.
- The validity of a test is the extent to which it measures what it claims to measure.
(Gregory, 2007, p.119-120)
4 VALIDITY (Cont.)
- Validity defines the meaning of test scores. Reliability is important, too, but only insofar as it constrains validity.
- To the extent that a test is unreliable, it cannot be valid. Reliability is a necessary but not a sufficient precursor of validity.
- Test validation is a developmental process that begins with test construction and continues indefinitely.
- Test validity hinges upon the accumulation of research findings.
(Gregory, 2007, p.119-120)
5 VALIDITY: A DEFINITION
- A definition of validity paraphrased from the influential Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985, 1999, as cited in Gregory, 2007, p.101): A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
- Classical test theory was the basis for test development throughout most of the twentieth century.
(Gregory, 2007, p.120-121)
6 VALIDITY: A DEFINITION (Cont.)
- Validity reflects an evolutionary, research-based judgment of how adequately a test measures the attribute it was designed to measure.
- The validity of a test is characterized on a continuum ranging from weak to acceptable to strong.
- The traditionally different ways of accumulating validity evidence have been grouped into three categories:
- Content validity
- Criterion-related validity
- Construct validity
(Gregory, 2007, p.120-121)
7 CONTENT VALIDITY
- Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.
- Content validity is a useful concept when a great deal is known about the variable that the researcher wishes to measure.
- When evaluating content validity, response specification is also an integral part of defining the relevant universe of behaviors.
(Gregory, 2007, p.121-122)
8 CONTENT VALIDITY (Cont.)
- Content validity is more difficult to assure when the test measures an ill-defined trait. What usually passes for content validity is the considered opinion of expert judges.
- The test developer asserts that a panel of experts reviewed the domain specification carefully and judged the test questions to possess content validity.
(Gregory, 2007, p.121-122)
9 Quantification of Content Validity (Cont.)
- Example coefficient of content validity: 87 / (4 + 4 + 5 + 87) = .87
(Gregory, 2007, p.122-123)
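Assuming the slide's counts come from two judges' item-relevance ratings, with 87 items endorsed as strongly relevant by both judges out of 4 + 4 + 5 + 87 = 100 items total, the computation is a simple proportion (the function name below is ours, for illustration):

```python
def content_validity_coefficient(both_strong, total_items):
    """Proportion of items that both expert judges rate as strongly relevant."""
    return both_strong / total_items

# Counts assumed from the slide's example: 4 + 4 + 5 + 87 = 100 items,
# 87 of which both judges rated as strongly relevant.
coefficient = content_validity_coefficient(87, 4 + 4 + 5 + 87)
print(round(coefficient, 2))  # 0.87
```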
10 Quantification of Content Validity (Cont.)
- A coefficient of content validity is just one piece of evidence in the evaluation of a test.
- The commonsense approach to content validity cannot identify nonexistent items that should be added to a test to help make the pool of questions more representative of the intended domain.
(Gregory, 2007, p.123)
11 Face Validity
- A test has face validity if it looks valid to test users, examiners, and especially the examinees.
- Face validity is really a matter of social acceptability.
- Face validity should not be confused with objective validity, which is determined by the relationship of test scores to other sources of information.
(Gregory, 2007, p.123)
12 CRITERION-RELATED VALIDITY
- Criterion-related validity is demonstrated when a test is shown to be effective in estimating an examinee's performance on some outcome measure. The variable of primary interest is the outcome measure, called a criterion.
- Concurrent validity - the criterion measures are obtained at approximately the same time as the test scores.
- Predictive validity - the criterion measures are obtained in the future, usually months or years after the test scores are obtained.
(Gregory, 2007, p.102)
13 Characteristics of a Good Criterion
- A criterion is any outcome measure against which a test is validated.
- Criteria must be more than just imaginative; they must also be reliable, appropriate, and free of contamination from the test itself (criterion contamination).
- An unreliable criterion will be inherently unpredictable, regardless of the merits of the test.
- The theoretical upper limit of the validity coefficient is constrained by the reliability of both the test and the criterion: r_xy cannot exceed the square root of the product of the two reliabilities.
(Gregory, 2007, p.103)
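The ceiling can be computed directly. This sketch assumes the standard psychometric bound, r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy are the reliabilities of test and criterion; the example reliabilities are made up:

```python
import math

def max_validity(test_reliability, criterion_reliability):
    """Theoretical ceiling on the validity coefficient r_xy,
    given the reliabilities of the test (r_xx) and criterion (r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)

# A test with reliability .81 validated against a criterion with
# reliability .64 can correlate with it at most .72.
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```

Note how an unreliable criterion caps validity no matter how good the test is, which is the point of the slide's third bullet.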
14 Concurrent Validity
- An evaluation of concurrent validity indicates the extent to which test scores accurately estimate an individual's present position on the relevant criterion.
- A test with demonstrated concurrent validity provides a shortcut for obtaining information that might otherwise require the extended investment of professional time.
- Correlations between a new test and existing tests are often cited as evidence of concurrent validity.
- The criterion tests must have been validated through correlation with appropriate nontest behavioral data.
- The instrument being validated must measure the same construct as the criterion tests.
(Gregory, 2007, p.125)
15 Predictive Validity
- Predictive validity is particularly relevant for entrance examinations and employment tests.
- A regression equation describes the best-fitting straight line for estimating the criterion from the test.
(Gregory, 2007, p.125-126)
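A best-fitting line of this kind (Y' = a + bX) can be obtained with ordinary least squares; a minimal sketch with numpy, using made-up test and criterion scores:

```python
import numpy as np

# Hypothetical test scores (X) and later-obtained criterion scores (Y),
# e.g., an entrance exam predicting first-year GPA.
x = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
y = np.array([2.0, 2.4, 2.9, 3.3, 3.8])

# Least-squares fit of the line Y' = a + bX (slope b, intercept a).
b, a = np.polyfit(x, y, 1)

# Predicted criterion score for an applicant who scores 85 on the test.
predicted = a + b * 85
```

With these illustrative numbers the slope is 0.045 and the intercept is -0.27, so a test score of 85 predicts a criterion score of about 3.56.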
16 Validity Coefficient and the Standard Error of Estimate
- Perhaps the most popular approach to expressing the relationship between test scores and criterion measures is to compute the correlation between test and criterion (r_xy), known as the validity coefficient.
- The higher the validity coefficient, the more accurate is the test in predicting the criterion.
(Gregory, 2007, p. 126-127)
17 Validity Coefficient and the Standard Error of Estimate (Cont.)
- The standard error of estimate (SEest) is the margin of error to be expected in the predicted criterion score.
- The SEM indicates the margin of measurement error caused by the imperfect reliability of the test, whereas SEest indicates the margin of prediction error caused by the imperfect validity of the test.
(Gregory, 2007, p. 126-127)
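The slide does not show the formula, but under the standard definition (assumed here) SEest depends on the criterion's standard deviation and the validity coefficient:

```python
import math

def se_estimate(sd_criterion, validity_r):
    """Standard error of estimate: SEest = SD_Y * sqrt(1 - r^2),
    the expected margin of error in predicted criterion scores."""
    return sd_criterion * math.sqrt(1 - validity_r ** 2)

# With a criterion SD of 15 and validity coefficient .60, predictions
# carry a margin of error of 12 points; with perfect validity (r = 1)
# the prediction error vanishes entirely.
print(round(se_estimate(15, 0.60), 2))  # 12.0
```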
18 Decision Theory Applied to Psychological Tests
- Proponents of decision theory stress that the purpose of psychological testing is not measurement per se but measurement in the service of decision making.
(Gregory, 2007, p. 127-128)
19 Decision Theory Applied to Psychological Tests (Cont.)
- Proponents of decision theory make two fundamental assumptions about the use of selection tests:
- The value of various outcomes to the institution can be expressed in terms of a common utility scale.
- In institutional selection decisions, the most generally useful strategy is one that maximizes the average gain on the utility scale (or minimizes average loss) over many similar decisions.
(Gregory, 2007, p. 127-128)
20 Taylor-Russell Tables
- Taylor and Russell (1939) published statistical tables that permit a test user to determine the expected proportion of successful applicants selected with use of a test.
- In order to use the Taylor-Russell tables, the tester must specify (1) the predictive validity of the test, (2) the selection ratio, and (3) the base rate for successful applicants. A change in any of these factors will alter the selection accuracy of the test.
(Gregory, 2007, p.129-131)
21 Taylor-Russell Tables (Cont.)
- The base rate is the proportion of successful applicants who would be selected using the current method, without benefit of the new test.
- When all three of these factors are known, the Taylor-Russell tables can be consulted to determine the proportion of successes expected through the application of the test. In this manner, the test user can determine the extent to which using a new test would improve selection over the base rate obtained from existing methods.
(Gregory, 2007, p.129-131)
22 Taylor-Russell Tables (Cont.)
(Gregory, 2007, p.130)
23 Taylor-Russell Tables (Cont.)
- The most intriguing conclusion to emerge from the Taylor-Russell tables is that tests with poor validity can, nonetheless, substantially improve selection accuracy if the selection ratio is low enough.
(Gregory, 2007, p.129-131)
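This conclusion can be illustrated by a simple Monte Carlo sketch rather than by the tables themselves: draw test-criterion pairs with a modest correlation, select the top fraction on the test, and compare success rates among those selected. All numbers below are illustrative assumptions, not values from Gregory.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
validity = 0.30  # a modest predictive validity coefficient

# Simulate standardized test and criterion scores correlated at r = .30.
test = rng.standard_normal(n)
criterion = validity * test + np.sqrt(1 - validity**2) * rng.standard_normal(n)

# Base rate .50: half of all applicants would succeed without the test.
base_rate = 0.50
success = criterion > np.quantile(criterion, 1 - base_rate)

# Success rate among those selected, at a lenient and a strict selection ratio.
rates = {}
for selection_ratio in (0.50, 0.10):
    cutoff = np.quantile(test, 1 - selection_ratio)
    rates[selection_ratio] = success[test > cutoff].mean()
```

Even with validity of only .30, the success rate among selected applicants rises above the .50 base rate, and it rises further as the selection ratio drops from .50 to .10, matching the slide's conclusion.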
24 CONSTRUCT VALIDITY
- A construct is a theoretical, intangible quality or trait in which individuals differ (Messick, 1995, as cited in Gregory, 2007, p.107).
- A test designed to measure a construct must estimate the existence of an inferred, underlying characteristic (e.g., leadership ability) based on a limited sample of behavior.
- Construct validity refers to the appropriateness of these inferences about the underlying construct.
(Gregory, 2007, p. 131)
25 CONSTRUCT VALIDITY (Cont.)
- All psychological constructs possess two characteristics in common:
- There is no single external referent sufficient to validate the existence of the construct; that is, the construct cannot be operationally defined.
- Nonetheless, a network of interlocking suppositions can be derived from existing theory about the construct.
- Construct validity pertains to psychological tests that claim to measure complex, multifaceted, and theory-bound psychological attributes such as leadership ability, intelligence, and the like.
(Gregory, 2007, p.131-132)
26 CONSTRUCT VALIDITY (Cont.)
- The crucial point to understand about construct validity is that no criterion or universe of content is accepted as entirely adequate to define the quality to be measured (Cronbach & Meehl, 1955, as cited in Gregory, 2007, p.132).
- To evaluate the construct validity of a test, we must amass a variety of evidence from numerous sources.
- Individual studies of content, concurrent, and predictive validity are regarded merely as supportive evidence in the cumulative quest for construct validation.
(Gregory, 2007, p.131-132)
27 APPROACHES TO CONSTRUCT VALIDITY
- Most studies of construct validity fall into one of the following categories:
- Analysis to determine whether the test items or subtests are homogeneous and therefore measure a single construct.
- Study of developmental changes to determine whether they are consistent with the theory of the construct.
- Research to ascertain whether group differences on test scores are theory-consistent.
- Analysis to determine whether intervention effects on test scores are theory-consistent.
(Gregory, 2007, p.132)
28 APPROACHES TO CONSTRUCT VALIDITY (Cont.)
- Correlation of the test with other related and unrelated tests and measures.
- Factor analysis of test scores in relation to other sources of information.
- Analysis to determine whether test scores allow for the correct classification of examinees.
(Gregory, 2007, p.108)
29 Test Homogeneity
- If a test measures a single construct, then its component items (or subtests) likely will be homogeneous (also referred to as internally consistent).
- The aim of test development is to select items that form a homogeneous scale.
- Homogeneity is an important first step in certifying the construct validity of a new test, but standing alone it is weak evidence.
(Gregory, 2007, p.108-110)
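Internal consistency of this kind is commonly indexed with coefficient alpha. A minimal sketch using the standard formula (the item scores below are made up for illustration):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a score matrix: rows = examinees, columns = items.
    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Illustrative scores for 4 examinees on a 3-item scale.
scores = [[2, 3, 3],
          [4, 4, 5],
          [1, 2, 2],
          [3, 3, 4]]
alpha = cronbach_alpha(scores)
```

A high alpha indicates a homogeneous scale, but, as the slide notes, homogeneity alone is weak evidence of construct validity.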
30 Appropriate Developmental Changes
- Many constructs can be assumed to show regular age-graded changes from early childhood into mature adulthood and perhaps beyond (e.g., vocabulary knowledge).
- This approach does not provide information about how the construct relates to other constructs.
(Gregory, 2007, p.133)
31 Theory-Consistent Group Differences
- One way to bolster the validity of a new instrument is to show that, on average, persons with different backgrounds and characteristics obtain theory-consistent scores on the test.
- Crandall (1981) developed a social interest scale that illustrates the use of theory-consistent group differences in the process of construct validation.
(Gregory, 2007, p.133)
32 Theory-Consistent Intervention Effects
- Test scores should change in the appropriate direction and amount in reaction to planned or unplanned interventions (e.g., Willis & Schaie, 1986).
(Gregory, 2007, p.133-134)
33 Convergent and Discriminant Validation
- Convergent validity is demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs.
- Discriminant validity is demonstrated when a test does not correlate with variables or tests from which it should differ.
- Campbell and Fiske (1959) proposed a systematic experimental design for simultaneously confirming the convergent and discriminant validities of a psychological test, called the multitrait-multimethod matrix. This matrix is a rich resource of data on reliability, convergent validity, and discriminant validity.
(Gregory, 2007, p.134-135)
34 Convergent and Discriminant Validation (Cont.)
(Gregory, 2007, p.135)
35 Factor Analysis
- Factor analysis is a specialized statistical technique that is particularly useful for investigating construct validity.
- The purpose of factor analysis is to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests.
- The goal in factor analysis is to find a smaller set of dimensions, called factors, that can account for the observed array of intercorrelations among individual tests.
(Gregory, 2007, p.135-136)
36 Factor Analysis (Cont.)
- A factor loading is actually a correlation between an individual test and a single factor.
- The final outcome of a factor analysis is a table depicting the correlation of each test with each factor.
- A table of factor loadings helps describe the factorial composition of a test and thereby provides information relevant to construct validity.
(Gregory, 2007, p.114)
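A sketch of how such a loading table can be extracted from a correlation matrix. The matrix values are illustrative assumptions, and this eigendecomposition (principal-component style) extraction is only one of several factoring methods:

```python
import numpy as np

# Hypothetical intercorrelations among four tests: the first two tests
# correlate strongly with each other, as do the last two, suggesting
# that two factors underlie the battery.
R = np.array([[1.0, 0.7, 0.2, 0.1],
              [0.7, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])

# Eigendecomposition of the (symmetric) correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # largest eigenvalues first

# Loadings on the two retained factors: each entry is the correlation
# of one test with one factor.
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])
```

The squared loadings in each row sum to at most 1, since a test cannot share more than all of its variance with the factors.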
37 Factor Analysis (Cont.)
(Gregory, 2007, p.136)
38 Classification Accuracy
- Many tests are used for screening purposes to identify examinees who meet (or don't meet) certain diagnostic criteria. For these instruments, accurate classification is an essential index of validity.
- The MMSE is one of the most widely researched screening tests in existence. In exploring its utility, researchers have paid special attention to two psychometric features that bear upon validity: sensitivity and specificity.
(Gregory, 2007, p.137)
39 Classification Accuracy (Cont.)
- Sensitivity has to do with accurate identification of patients who have a syndrome (e.g., dementia).
- Specificity has to do with accurate identification of normal patients.
- The concepts of sensitivity and specificity are chiefly helpful in dichotomous diagnostic situations in which individuals are presumed either to manifest a syndrome or not.
- Screening tests typically provide a cutoff score used to identify possible cases of the syndrome in question.
(Gregory, 2007, p.137)
40 Classification Accuracy (Cont.)
- In general, the validity of a screening test is bolstered to the extent that it possesses both high sensitivity and high specificity.
- There are no exact cutoffs, but for many purposes a test will need sensitivity and specificity that exceed 80 or 90 percent in order to justify its use.
- The reality of assessment is that the examiner must choose a cutoff score that provides a balance between sensitivity and specificity.
- Sensitivity and specificity typically stand in an inverse relationship: raising one by moving the cutoff lowers the other.
(Gregory, 2007, p.137-139)
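Both indices follow directly from a 2x2 classification of screening results against true diagnostic status. A minimal sketch with made-up data, assuming (as with the MMSE) that low scores flag possible cases:

```python
def sensitivity_specificity(scores, has_syndrome, cutoff):
    """Classify an examinee as a possible case when score <= cutoff.
    Returns (sensitivity, specificity)."""
    tp = sum(s <= cutoff and d for s, d in zip(scores, has_syndrome))
    fn = sum(s > cutoff and d for s, d in zip(scores, has_syndrome))
    tn = sum(s > cutoff and not d for s, d in zip(scores, has_syndrome))
    fp = sum(s <= cutoff and not d for s, d in zip(scores, has_syndrome))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative screening scores; True marks examinees who actually
# have the syndrome by an independent diagnostic workup.
scores = [20, 22, 25, 28, 29, 18, 21, 30]
cases = [True, True, True, False, False, True, False, False]
sens, spec = sensitivity_specificity(scores, cases, cutoff=23)
```

With this toy data both indices come out at .75; lowering the cutoff would raise specificity at the cost of sensitivity, illustrating the inverse relationship noted above.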
41 Classification Accuracy (Cont.)
(Gregory, 2007, p.138)
42 EXTRAVALIDITY CONCERNS AND THE WIDENING SCOPE OF TEST VALIDITY
- Extravalidity concerns include side effects and unintended consequences of testing.
- Even if a test is valid, unbiased, and fair, the decision to use it may be governed by additional considerations.
(Gregory, 2007, p.139)
43 Unintended Side Effects of Testing
- The examiner must determine whether the benefits of giving the test outweigh the costs of the potential side effects. Furthermore, by anticipating unintended side effects, the examiner might be able to deflect or diminish them.
- A consideration of side effects should influence an examiner's decision to use a particular test for a specified purpose.
(Gregory, 2007, p.139-140)
44 The Widening Scope of Test Validity
- Several psychometric theoreticians have introduced a wider, functionalist definition of validity that asserts that a test is valid if it serves the purpose for which it is used.
- Test validity, then, is an overall evaluative judgment of the adequacy and appropriateness of inferences and actions that flow from test scores.
(Gregory, 2007, p.140-141)
45 The Widening Scope of Test Validity (Cont.)
- Messick (1980, 1995, as cited in Gregory, 2007, p.141) argues that the new, wider conception of validity rests on four bases:
- Traditional evidence of construct validity.
- An analysis of the value implications of the test interpretation.
- Evidence for the usefulness of test interpretation.
- An appraisal of the potential and actual social consequences, including side effects, of test use.
- A valid test is one that answers well to all four facets of test validity.
(Gregory, 2007, p.140-141)
46 The Widening Scope of Test Validity (Cont.)
- Psychological measurement is not a neutral endeavor; it is an applied science that occurs in a social and political context.
(Gregory, 2007, p.140-141)
47 SUMMARY
- Validity is the extent to which a test measures what it claims to measure; a test is valid to the extent that the inferences made from it are appropriate, meaningful, and useful. Reliability is a necessary but not sufficient precursor of validity.
- Validity evidence has traditionally been grouped into three categories: content, criterion-related, and construct validity.
- Content validity concerns how well the test items represent the intended universe of behavior. Face validity (whether a test looks valid to examinees) is a matter of social acceptability, not objective validity.
(Gregory, 2007, p.141)
48 SUMMARY (Cont.)
- Criterion-related validity is demonstrated when a test effectively estimates an examinee's performance on an outcome measure, the criterion. In concurrent validity, criterion measures are obtained at about the same time as the test scores; in predictive validity, they are obtained months or years later.
- A good criterion must be reliable, appropriate, and free of contamination from the test itself.
(Gregory, 2007, p.141)
49 SUMMARY (Cont.)
- The correlation between test and criterion (r_xy) is the validity coefficient; the higher it is, the more accurately the test predicts the criterion.
- The standard error of estimate (SEest) is the margin of error to be expected in the predicted criterion score.
- Decision theory views testing as measurement in the service of decision making; the Taylor-Russell tables show how predictive validity, selection ratio, and base rate jointly determine selection accuracy.
(Gregory, 2007, p.141)
50 SUMMARY (Cont.)
- Construct validity concerns the appropriateness of inferences about an underlying construct, a theoretical, intangible quality in which individuals differ; no single criterion or universe of content is adequate to define it.
- Evidence for construct validity is accumulated from many sources, including test homogeneity, developmental changes, theory-consistent group and intervention effects, convergent and discriminant validation, and factor analysis.
(Gregory, 2007, p.142)
51 SUMMARY (Cont.)
- For screening tests, classification accuracy is an essential index of validity, expressed as sensitivity (accurate identification of patients who have the syndrome) and specificity (accurate identification of normal patients).
- Extravalidity concerns include the side effects and unintended consequences of testing; even a valid, unbiased, and fair test may warrant additional considerations before use.
- In the widening conception of validity, test validity is an overall evaluative judgment of the inferences and actions that flow from test scores; psychological measurement occurs in a social and political context.
52 REFERENCE
- Gregory, R. J. (2007). Psychological testing: History, principles, and applications (5th ed.). Boston, MA: Pearson Education.