Ethical and Social Implications of Testing - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Ethical and Social Implications of Testing


1
(No Transcript)
2
Basic Concepts of Validity

3
VALIDITY
  • The standard tripartite division of basic
    concepts of validity: content, criterion-related,
    and construct validity.
  • Extravalidity concerns include side effects and
    unintended consequences of testing.
  • Validity must be built into the test from the
    outset rather than being limited to the final
    stage of test development.
  • The validity of a test is the extent to which it
    measures what it claims to measure.

(Gregory, 2007, p.119-120)
4
VALIDITY (Cont.)
  • Validity defines the meaning of test scores.
    Reliability is important, too, but only insofar
    as it constrains validity.
  • To the extent that a test is unreliable, it
    cannot be valid. Reliability is a necessary but
    not a sufficient precursor of validity.
  • Test validation is a developmental process that
    begins with test construction and continues
    indefinitely.
  • Test validity hinges upon the accumulation of
    research findings.

(Gregory, 2007, p.119-120)
5
VALIDITY A DEFINITION
  • A definition of validity paraphrased from the
    influential Standards for Educational and
    Psychological Testing (AERA, APA, NCME, 1985,
    1999, as cited in Gregory, 2007, p.101): A test
    is valid to the extent that inferences made from
    it are appropriate, meaningful, and useful.
  • Classical test theory was the basis for test
    development throughout most of the twentieth
    century.

(Gregory, 2007, p.120-121)
6
VALIDITY A DEFINITION (Cont.)
  • Validity reflects an evolutionary, research-based
    judgment of how adequately a test measures the
    attribute it was designed to measure.
  • The validity of a test is characterized on a
    continuum ranging from weak to acceptable to
    strong.
  • The traditionally different ways of accumulating
    validity evidence have been grouped into three
    categories:
  • Content validity
  • Criterion-related validity
  • Construct validity

(Gregory, 2007, p.120-121)
7
CONTENT VALIDITY
  • Content validity is determined by the degree to
    which the questions, tasks, or items on a test
    are representative of the universe of behavior
    the test was designed to sample.
  • Content validity is a useful concept when a great
    deal is known about the variable that the
    researcher wishes to measure.
  • When evaluating content validity, response
    specification is also an integral part of
    defining the relevant universe of behaviors.

(Gregory, 2007, p.121-122)
8
CONTENT VALIDITY (Cont.)
  • Content validity is more difficult to assure when
    the test measures an ill-defined trait. What
    usually passes for content validity is the
    considered opinion of expert judges.
  • The test developer asserts that a panel of
    experts reviewed the domain specification
    carefully and judged the following test questions
    to possess content validity.

(Gregory, 2007, p.121-122)
9
Quantification of Content Validity (Cont.)
content validity = 87 / (4 + 4 + 5 + 87) = .87
(Gregory, 2007, p.122-123)
10
Quantification of Content Validity (Cont.)
  • A coefficient of content validity is just one
    piece of evidence in the evaluation of a test.
  • The commonsense approach to content validity
    cannot identify nonexistent items that should be
    added to a test to help make the pool of
    questions more representative of the intended
    domain.

(Gregory, 2007, p.123)
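The agreement-table approach to a coefficient of content validity can be sketched as follows: two expert judges independently rate each item's relevance, and the coefficient is the proportion of items both judges endorse as strongly relevant. The counts below are hypothetical.

```python
# Sketch of a two-judge content validity coefficient.
# The four cells are the counts from a hypothetical agreement table.

def content_validity_coefficient(both_weak, disagree_1, disagree_2, both_strong):
    """Proportion of items that both judges rate as strongly relevant."""
    total = both_weak + disagree_1 + disagree_2 + both_strong
    return both_strong / total

# Hypothetical 100-item pool: 4 items both judges call weakly relevant,
# 4 + 5 items on which the judges disagree, and 87 items both judges
# call strongly relevant.
coefficient = content_validity_coefficient(4, 4, 5, 87)
print(coefficient)  # 0.87
```

As the slides note, such a coefficient is only one piece of evidence: it cannot reveal items that are missing from the pool altogether.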
11
Face Validity
  • A test has face validity if it looks valid to
    test users, examiners, and especially the
    examinees.
  • Face validity is really a matter of social
    acceptability.
  • Face validity should not be confused with
    objective validity, which is determined by the
    relationship of test scores to other sources of
    information.

(Gregory, 2007, p.123)
12
CRITERION-RELATED VALIDITY
  • Criterion-related validity is demonstrated when a
    test is shown to be effective in estimating an
    examinee's performance on some outcome measure.
    The variable of primary interest is the outcome
    measure, called a criterion.
  • Concurrent validity: the criterion measures are
    obtained at approximately the same time as the
    test scores.
  • Predictive validity: the criterion measures are
    obtained in the future, usually months or years
    after the test scores are obtained.

(Gregory, 2007, p.102)
13
Characteristics of a Good Criterion
  • A criterion is any outcome measure against which
    a test is validated.
  • Criteria must be more than just imaginative; they
    must also be reliable, appropriate, and free of
    contamination from the test itself (criterion
    contamination).
  • An unreliable criterion will be inherently
    unpredictable, regardless of the merits of the
    test.
  • The theoretical upper limit of the validity
    coefficient is constrained by the reliability of
    both the test and the criterion.

(Gregory, 2007, p.103)
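The upper-limit point above is often expressed as the validity coefficient being bounded by the square root of the product of the two reliabilities. A minimal sketch, with hypothetical reliability values:

```python
import math

def max_validity(test_reliability, criterion_reliability):
    """Theoretical ceiling on the validity coefficient:
    sqrt(reliability of the test * reliability of the criterion)."""
    return math.sqrt(test_reliability * criterion_reliability)

# Hypothetical reliabilities: even a highly reliable test cannot
# out-predict an unreliable criterion.
print(round(max_validity(0.90, 0.60), 3))  # 0.735
```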
14
Concurrent Validity
  • An evaluation of concurrent validity indicates
    the extent to which test scores accurately
    estimate an individual's present position on the
    relevant criterion.
  • A test with demonstrated concurrent validity
    provides a shortcut for obtaining information
    that might otherwise require the extended
    investment of professional time.
  • Correlations between a new test and existing
    tests are often cited as evidence of concurrent
    validity.
  • The criterion tests must have been validated
    through correlation with appropriate nontest
    behavioral data.
  • The instrument being validated must measure the
    same construct as the criterion tests.

(Gregory, 2007, p.125)
15
Predictive Validity
  • Predictive validity is particularly relevant for
    entrance examinations and employment tests.
  • A regression equation describes the best-fitting
    straight line for estimating the criterion from
    the test.

(Gregory, 2007, p.125-126)
16
Validity Coefficient and the Standard Error of
Estimate
  • Perhaps the most popular approach to expressing
    the relationship between test scores and
    criterion measures is to compute the correlation
    between test and criterion (r), known as the
    validity coefficient.
  • The higher the validity coefficient, the more
    accurate is the test in predicting the criterion.

(Gregory, 2007, p. 126-127)
17
Validity Coefficient and the Standard Error of
Estimate (Cont.)
  • The standard error of estimate (SEest) is the
    margin of error to be expected in the predicted
    criterion score.
  • The SEM indicates the margin of measurement error
    caused by the imperfect reliability of the test,
    whereas SEest indicates the margin of prediction
    error caused by the imperfect validity of the
    test.

(Gregory, 2007, p. 126-127)
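Both quantities above can be computed directly: the validity coefficient as a Pearson correlation, and the standard error of estimate as SD_y * sqrt(1 - r^2). A minimal sketch; the standard-deviation value is hypothetical.

```python
import math

def pearson_r(x, y):
    """Validity coefficient: the correlation between test and criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def standard_error_of_estimate(sd_criterion, r):
    """SEest = SD_y * sqrt(1 - r^2): margin of prediction error."""
    return sd_criterion * math.sqrt(1 - r ** 2)

# With perfect validity (r = 1) the prediction error vanishes; with
# r = 0 it equals the criterion's own standard deviation.
print(standard_error_of_estimate(10.0, 1.0))  # 0.0
print(standard_error_of_estimate(10.0, 0.0))  # 10.0
```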
18
Decision Theory Applied to Psychological Tests
  • Proponents of decision theory stress that the
    purpose of psychological testing is not
    measurement per se but measurement in the service
    of decision making.

(Gregory, 2007, p. 127-128)
19
Decision Theory Applied to Psychological Tests
(Cont.)
  • Proponents of decision theory make two
    fundamental assumptions about the use of
    selection tests:
  • The value of various outcomes to the institution
    can be expressed in terms of a common utility
    scale.
  • In institutional selection decisions, the most
    generally useful strategy is one that maximizes
    the average gain on the utility scale (or
    minimizes average loss) over many similar
    decisions.

(Gregory, 2007, p. 127-128)
20
Taylor-Russell Tables
  • Taylor and Russell (1939) published the
    statistical tables that permit a test user to
    determine the expected proportion of successful
    applicants selected with use of a test.
  • In order to use the Taylor-Russell tables, the
    tester must specify (1) the predictive validity
    of the test, (2) the selection ratio, and (3) the
    base rate for successful applicants. A change in
    any of these factors will alter the selection
    accuracy of the test.

(Gregory, 2007, p.129-131)
21
Taylor-Russell Tables (Cont.)
  • The base rate is the proportion of successful
    applicants who would be selected using current
    methods, without benefit of the new test.
  • When all three of these factors are known, the
    Taylor-Russell tables can be consulted to
    determine the proportion of successes expected
    through the application of the test. In this
    manner, the test user can determine the extent
    to which using a new test would improve selection
    over the base rate obtained from existing
    methods.

(Gregory, 2007, p.129-131)
22
Taylor-Russell Tables (Cont.)
(Gregory, 2007, p.130)
23
Taylor-Russell Tables (Cont.)
  • The most intriguing conclusion to emerge from the
    Taylor-Russell tables is that tests with poor
    validity can, nonetheless, substantially improve
    selection accuracy if the selection ratio is
    low enough.

(Gregory, 2007, p.129-131)
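The Taylor-Russell logic can be illustrated with a simple Monte Carlo sketch: simulate applicants whose test and criterion scores correlate at the stated validity, select the top fraction by test score, and compare the success rate among those selected with the base rate. The validity, selection ratio, and base rate below are hypothetical; in practice the published tables would be consulted.

```python
import random

random.seed(42)

def selection_success_rate(validity, selection_ratio, base_rate, n=100_000):
    """Proportion of selected applicants who succeed on the criterion."""
    pairs = []
    for _ in range(n):
        x = random.gauss(0, 1)  # test score
        # criterion score correlated with the test at the stated validity
        y = validity * x + (1 - validity ** 2) ** 0.5 * random.gauss(0, 1)
        pairs.append((x, y))
    # "success" = criterion above the cutoff implied by the base rate
    criterion_cut = sorted(p[1] for p in pairs)[int(n * (1 - base_rate))]
    pairs.sort(key=lambda p: -p[0])          # best test scores first
    selected = pairs[: int(n * selection_ratio)]
    return sum(1 for _, y in selected if y > criterion_cut) / len(selected)

# Even a modest validity (.30) improves markedly on a .50 base rate
# when only 10 percent of applicants are selected.
rate = selection_success_rate(validity=0.30, selection_ratio=0.10,
                              base_rate=0.50)
```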
24
CONSTRUCT VALIDITY
  • A construct is a theoretical, intangible quality
    or trait in which individuals differ (Messick,
    1995, as cited in Gregory, 2007, p.107).
  • A test designed to measure a construct must
    estimate the existence of an inferred, underlying
    characteristic (e.g., leadership ability) based
    on a limited sample of behavior.
  • Construct validity refers to the appropriateness
    of these inferences about the underlying
    construct.

(Gregory, 2007, p. 131)
25
CONSTRUCT VALIDITY (Cont.)
  • All psychological constructs possess two
    characteristics in common:
  • There is no single external referent sufficient
    to validate the existence of the construct; that
    is, the construct cannot be operationally
    defined.
  • Nonetheless, a network of interlocking
    suppositions can be derived from existing theory
    about the construct.
  • Construct validity pertains to psychological
    tests that claim to measure complex,
    multifaceted, and theory-bound psychological
    attributes such as leadership ability,
    intelligence, and the like.

(Gregory, 2007, p.131-132)
26
CONSTRUCT VALIDITY (Cont.)
  • The crucial point to understand about construct
    validity is that no criterion or universe of
    content is accepted as entirely adequate to
    define the quality to be measured (Cronbach &
    Meehl, 1955, as cited in Gregory, 2007, p.132).
  • To evaluate the construct validity of a test, we
    must amass a variety of evidence from numerous
    sources.
  • Individual studies of content, concurrent, and
    predictive validity are regarded merely as
    supportive evidence in the cumulative quest for
    construct validation.

(Gregory, 2007, p.131-132)
27
APPROACHES TO CONSTRUCT VALIDITY
  • Most studies of construct validity fall into one
    of the following categories:
  • Analysis to determine whether the test items or
    subtests are homogeneous and therefore measure a
    single construct.
  • Study of developmental changes to determine
    whether they are consistent with the theory of
    the construct.
  • Research to ascertain whether group differences
    on test scores are theory-consistent.
  • Analysis to determine whether intervention
    effects on test scores are theory-consistent.

(Gregory, 2007, p.132)
28
APPROACHES TO CONSTRUCT VALIDITY (Cont.)
  • Correlation of the test with other related and
    unrelated tests and measures.
  • Factor analysis of test scores in relation to
    other sources of information.
  • Analysis to determine whether test scores allow
    for the correct classification of examinees.

(Gregory, 2007, p.108)
29
Test Homogeneity
  • If a test measures a single construct, then its
    component items (or subtests) likely will be
    homogeneous (also referred to as internally
    consistent).
  • The aim of test development is to select items
    that form a homogeneous scale.
  • Homogeneity is an important first step in
    certifying the construct validity of a new test,
    but standing alone it is weak evidence.

(Gregory, 2007, p.108-110)
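Homogeneity (internal consistency) is commonly indexed with Cronbach's alpha, a standard statistic not named on the slide but consistent with it. A minimal sketch with hypothetical item responses:

```python
def cronbach_alpha(items):
    """Internal-consistency estimate for a list of item-score columns:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-examinee totals
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 3-item scale answered by 5 examinees
# (each inner list is one item's scores across examinees)
item_scores = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(item_scores)
```

A high alpha supports homogeneity, but, as the slide notes, this alone is weak evidence of construct validity.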
30
Appropriate Developmental Changes
  • Many constructs are assumed to show regular
    age-graded changes from early childhood into
    mature adulthood and perhaps beyond (e.g.,
    vocabulary knowledge).
  • This approach does not provide information about
    how the construct relates to other constructs.

(Gregory, 2007, p.133)
31
Theory-Consistent Group Differences
  • One way to bolster the validity of a new
    instrument is to show that, on average, persons
    with different backgrounds and characteristics
    obtain theory-consistent scores on the test.
  • Crandall (1981) developed a social interest scale
    that illustrates the use of theory-consistent
    group differences in the process of construct
    validation.

(Gregory, 2007, p.133)
32
Theory-Consistent Intervention Effects
  • Test scores change in the appropriate direction
    and amount in reaction to planned or unplanned
    interventions.
  • Willis and Schaie (1986)

(Gregory, 2007, p.133-134)
33
Convergent and Discriminant Validation
  • Convergent validity is demonstrated when a test
    correlates highly with other variables or tests
    with which it shares an overlap of constructs.
  • Discriminant validity is demonstrated when a test
    does not correlate with variables or tests from
    which it should differ.
  • Campbell and Fiske (1959) proposed a systematic
    experimental design for simultaneously confirming
    the convergent and discriminant validities of a
    psychological test, called the
    multitrait-multimethod matrix. This matrix is a
    rich resource of data on reliability, convergent
    validity, and discriminant validity.

(Gregory, 2007, p.134-135)
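The convergent/discriminant contrast can be sketched numerically: a new test should correlate highly with an established measure of the same construct and near zero with a measure of an unrelated construct. All scores below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical scores for 6 examinees on three measures
new_anxiety = [10, 14, 9, 16, 12, 18]   # new anxiety test
old_anxiety = [11, 15, 10, 17, 11, 19]  # established anxiety test
spelling    = [29, 23, 23, 29, 27, 25]  # unrelated construct

convergent   = pearson_r(new_anxiety, old_anxiety)  # should be high
discriminant = pearson_r(new_anxiety, spelling)     # should be near zero
```

In a full multitrait-multimethod matrix these comparisons are made systematically across several traits measured by several methods.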
34
Convergent and Discriminant Validation (Cont.)
(Gregory, 2007, p.135)
35
Factor Analysis
  • Factor analysis is a specialized statistical
    technique that is particularly useful for
    investigating construct validity.
  • The purpose of factor analysis is to identify the
    minimum number of determiners (factors) required
    to account for the intercorrelations among a
    battery of tests.
  • The goal in factor analysis is to find a smaller
    set of dimensions, called factors, that can
    account for the observed array of
    intercorrelations among individual tests.

(Gregory, 2007, p.135-136)
36
Factor Analysis (Cont.)
  • A factor loading is actually a correlation
    between an individual test and a single factor.
  • The final outcome of a factor analysis is a table
    depicting the correlation of each test with each
    factor.
  • A table of factor loadings helps describe the
    factorial composition of a test and thereby
    provides information relevant to construct
    validity.

(Gregory, 2007, p.114)
37
Factor Analysis (Cont.)
(Gregory, 2007, p.136)
38
Classification Accuracy
  • Many tests are used for screening purposes to
    identify examinees who meet (or don't meet)
    certain diagnostic criteria. For these
    instruments, accurate classification is an
    essential index of validity.
  • The MMSE is one of the most widely researched
    screening tests in existence. In exploring its
    utility, researchers have paid special attention
    to two psychometric features that bear upon
    validity: sensitivity and specificity.

(Gregory, 2007, p.137)
39
Classification Accuracy (Cont.)
  • Sensitivity has to do with accurate
    identification of patients who have a syndrome
    (e.g., dementia).
  • Specificity has to do with accurate
    identification of normal patients.
  • The concepts of sensitivity and specificity are
    chiefly helpful in dichotomous diagnostic
    situations in which individuals are presumed
    either to manifest a syndrome or not.
  • Screening tests typically provide a cutoff score
    used to identify possible cases of the syndrome
    in question.

(Gregory, 2007, p.137)
40
Classification Accuracy (Cont.)
  • In general, the validity of a screening test is
    bolstered to the extent that it possesses both
    high sensitivity and high specificity.
  • There are no exact cutoffs, but for many purposes
    a test will need sensitivity and specificity that
    exceed 80 or 90 percent in order to justify its
    use.
  • The reality of assessment is that the examiner
    must choose a cutoff score that strikes a
    balance between sensitivity and specificity.
  • Sensitivity and specificity stand in an inverse
    relationship: a cutoff that raises one typically
    lowers the other.

(Gregory, 2007, p.137-139)
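The cutoff trade-off described above can be sketched directly: computing sensitivity and specificity at two different cutoffs shows one rising as the other falls. The screening scores below are hypothetical (a low score is taken to suggest impairment, as on the MMSE).

```python
def sensitivity_specificity(cases, normals, cutoff):
    """Classify 'impaired' when a screening score falls at or below
    the cutoff; return (sensitivity, specificity)."""
    true_pos = sum(1 for s in cases if s <= cutoff)     # cases detected
    true_neg = sum(1 for s in normals if s > cutoff)    # normals cleared
    return true_pos / len(cases), true_neg / len(normals)

# Hypothetical screening scores on a 0-30 scale
dementia_scores = [14, 18, 20, 22, 23, 24, 25, 26]
normal_scores   = [22, 24, 25, 26, 27, 28, 29, 30]

# A higher cutoff catches more true cases (sensitivity rises) but
# misclassifies more normals (specificity falls), and vice versa.
sens_low, spec_low = sensitivity_specificity(
    dementia_scores, normal_scores, cutoff=23)
sens_high, spec_high = sensitivity_specificity(
    dementia_scores, normal_scores, cutoff=26)
```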
41
Classification Accuracy (Cont.)
(Gregory, 2007, p.138)
42
EXTRAVALIDITY CONCERNS AND THE WIDENING SCOPE OF
TEST VALIDITY
  • Extravalidity concerns include side effects and
    unintended consequences of testing.
  • Even if a test is valid, unbiased, and fair, the
    decision to use it may be governed by additional
    considerations.

(Gregory, 2007, p.139)
43
Unintended Side Effects of Testing
  • The examiner must determine whether the benefits
    of giving the test outweigh the costs of the
    potential side effects. Furthermore, by
    anticipating unintended side effects, the
    examiner might be able to deflect or diminish
    them.
  • A consideration of side effects should influence
    an examiner's decision to use a particular test
    for a specified purpose.

(Gregory, 2007, p.139-140)
44
The Widening Scope of Test Validity
  • Several psychometric theoreticians have
    introduced a wider, functionalist definition of
    validity that asserts that a test is valid if it
    serves the purpose for which it is used.
  • Test validity, then, is an overall evaluative
    judgment of the adequacy and appropriateness of
    inferences and actions that flow from test
    scores.

(Gregory, 2007, p.140-141)
45
The Widening Scope of Test Validity (Cont.)
  • Messick (1980, 1995, as cited in Gregory, 2007,
    p.141) argues that the new, wider conception of
    validity rests on four bases:
  • Traditional evidence of construct validity.
  • An analysis of the value implications of the test
    interpretation.
  • Evidence for the usefulness of test
    interpretation.
  • An appraisal of the potential and actual social
    consequences, including side effects, from test
    use.
  • A valid test is one that answers well to all four
    facets of test validity.

(Gregory, 2007, p.140-141)
46
The Widening Scope of Test Validity (Cont.)
  • Psychological measurement is not a neutral
    endeavor; it is an applied science that occurs in
    a social and political context.

(Gregory, 2007, p.140-141)
47
SUMMARY
  • Validity is the extent to which a test measures
    what it claims to measure; a test is valid to the
    extent that inferences made from it are
    appropriate, meaningful, and useful.
  • Validity evidence has traditionally been grouped
    into three categories: content, criterion-related,
    and construct validity.
  • Content validity is the degree to which the test
    items represent the universe of behavior the test
    was designed to sample; in practice it usually
    rests on the judgment of expert reviewers. Face
    validity, by contrast, concerns only whether a
    test looks valid to users and examinees.

(Gregory, 2007, p.141)
48
SUMMARY (Cont.)
  • Criterion-related validity is demonstrated when a
    test effectively estimates an examinee's
    performance on an outcome measure, the criterion.
  • In concurrent validity the criterion measures are
    obtained at about the same time as the test
    scores; in predictive validity they are obtained
    months or years later.
  • A good criterion must be reliable, appropriate,
    and free of contamination from the test itself;
    an unreliable criterion is inherently
    unpredictable.

(Gregory, 2007, p.141)
49
SUMMARY (Cont.)
  • The relationship between test and criterion is
    usually expressed as a correlation, the validity
    coefficient; the higher the coefficient, the more
    accurate the prediction.
  • The standard error of estimate (SEest) is the
    margin of error to be expected in the predicted
    criterion score.
  • From the standpoint of decision theory, testing
    serves decision making; the Taylor-Russell tables
    show that even a test with modest validity can
    substantially improve selection accuracy when the
    selection ratio is low.

(Gregory, 2007, p.141)
50
SUMMARY (Cont.)
  • Construct validity pertains to tests that measure
    complex, theory-bound attributes; no single
    criterion or universe of content is adequate to
    define the quality being measured.
  • Evidence for construct validity accumulates from
    many sources, including test homogeneity,
    developmental changes, theory-consistent group
    differences and intervention effects, convergent
    and discriminant validation, factor analysis, and
    classification accuracy.

(Gregory, 2007, p.142)
51
SUMMARY (Cont.)
  • Even a valid, unbiased, and fair test may have
    unintended side effects; the examiner must weigh
    the benefits of testing against these costs.
  • On the wider, functionalist view, a test is valid
    if it serves the purpose for which it is used;
    validity is an overall judgment of the adequacy
    and appropriateness of inferences and actions
    that flow from test scores.
  • Psychological measurement is an applied science
    that occurs in a social and political context.

52
REFERENCE
  • Gregory, R. J. (2007). Psychological testing:
    History, principles, and applications (5th ed.).
    Boston, MA: Pearson Education.