1
More About Reliability and Introduction to Validity
2
Techniques for Boosting Reliability
  • The formula for alpha indicates that to boost the reliability of a test, you may either
  • (1) increase the homogeneity of the test (its mean inter-item correlation), or
  • (2) increase the number of items (both levers are illustrated in the sketch below)
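A minimal sketch of these two levers, using the standardized-alpha formula $\alpha = k\,\bar{r} / (1 + (k-1)\,\bar{r})$; the item counts and mean correlations below are hypothetical:

```python
def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized Cronbach's alpha for k items with mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Either lever raises alpha: more homogeneity (higher r_bar) or more items (higher k)
print(standardized_alpha(10, 0.2))   # ~0.71  (baseline)
print(standardized_alpha(10, 0.3))   # ~0.81  (more homogeneous items)
print(standardized_alpha(20, 0.2))   # ~0.83  (twice as many items)
```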

3
Item Analysis
  • Once data are gathered for a test, it is possible to study the items one by one to see which are contributing most to the test, and which are contributing the least
  • This process, called item analysis, can be valuable for refining and improving a test
  • It gives item means and variances (to look for floor and ceiling effects), and correlations of each item with the total test score, called item-total correlations

4
Item-Total Correlations (ITCs)
  • Items with the highest ITCs contribute the most to the test's homogeneity (and therefore to the test's reliability)
  • Items with ITCs near 0 (or even negative ITCs) detract from homogeneity (and therefore depress reliability)
  • With this item-by-item information, we can select and delete items to improve the reliability of a test (see the sketch below)
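A minimal item-analysis sketch in Python, assuming a simulated response matrix; the "corrected" ITC correlates each item with the total of the remaining items, so an item does not inflate its own correlation:

```python
import numpy as np

# Hypothetical data: 100 test-takers by 5 items, responses on a 1-5 scale
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(100, 5)).astype(float)

total = scores.sum(axis=1)
for j in range(scores.shape[1]):
    rest = total - scores[:, j]              # exclude the item from its own total
    itc = np.corrcoef(scores[:, j], rest)[0, 1]
    # Extreme means flag floor/ceiling effects; ITCs near 0 or negative
    # mark items that are candidates for deletion
    print(f"item {j}: mean={scores[:, j].mean():.2f}, "
          f"var={scores[:, j].var(ddof=1):.2f}, item-total r={itc:.2f}")
```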

5
Boosting Reliability: Adding More Items
  • The Spearman-Brown prophecy formula, solved for test length:

  k = \frac{r_{desired}\,(1 - r_{existing})}{r_{existing}\,(1 - r_{desired})}

  where k = the multiple of the current number of items needed, r_{existing} = the reliability of the existing test, and r_{desired} = the desired reliability
6
Examples
  • If k = 2, then we would need to double the number of items on our test to achieve the desired reliability
  • Suppose our 10-item test has a disappointing reliability of .5, and we want to obtain a reliability of .8: then k = .8(1 − .5) / (.5(1 − .8)) = 4
  • Therefore, our test needs to be 4(10) = 40 items long
  • We will need to add (40 − 10) = 30 items
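This arithmetic can be checked with a short function implementing the prophecy formula from the previous slide (the function name is mine):

```python
def length_multiple(r_existing: float, r_desired: float) -> float:
    """Multiple of the current test length needed to reach r_desired (Spearman-Brown)."""
    return (r_desired * (1 - r_existing)) / (r_existing * (1 - r_desired))

k = length_multiple(0.5, 0.8)
print(k)                         # 4.0
print(int(k * 10))               # 40 items needed in total
print(int(k * 10) - 10)          # 30 items to add
```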

7
Stability
  • A.k.a. test-retest reliability, stability indexes how well a test correlates with itself at a later time
  • Stability is the consistency of test scores over time
  • Stability and internal consistency reliability for the most part refer to different sources of random measurement error (e.g., mood vs. a silly scale)
  • Measurement of traits requires both high stability and high internal consistency; measurement of states requires only high internal consistency

8
What is Adequate Reliability?
  • For a test to be used in research (for computing
    group means and correlations with other
    variables), reliability should be at least .6
  • For a test to be used to make judgments about
    individuals, such as in clinical and counseling
    work, reliability should be at least .8 and
    preferably more like .9 or higher

9
Standard Error of Measurement
  • For the clinical problem of interpreting the score of a particular person, an index of the likely amount of error in the person's test score is useful
  • The standard error of measurement is the most common such index:

  s_{err} = s\,\sqrt{1 - r_{xx}}

  where s = the standard deviation of the test and r_{xx} = the reliability of the test
10
Using the Standard Error of Measurement
  • An individual's test score ± s_err yields a 68% CI: we can be 68% confident that the test-taker's true score lies within this interval
  • An individual's test score ± 2(s_err) yields a 95% CI: we can be 95% confident that the test-taker's true score lies within this interval
  • Example: a WISC-IV subtest with sd = 3 and reliability of .75 gives s_err = 3√(1 − .75) = 1.5
  • For a person who scores 8, the 95% CI = 8 ± 2(1.5) = (5, 11)
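A sketch of the computation behind this example (the function name is mine):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

s_err = sem(sd=3, reliability=0.75)                           # 1.5
score = 8
print(f"68% CI: ({score - s_err}, {score + s_err})")          # (6.5, 9.5)
print(f"95% CI: ({score - 2 * s_err}, {score + 2 * s_err})")  # (5.0, 11.0)
```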

11
Reliability of Differences Between Tests
  • Random measurement error becomes most vexing when
    we are considering differences in test scores
  • E.g., assessing change on a test, such as before vs. after therapy, or discrepancies among several subtests
  • The difference between 2 measures is usually less reliable than either original measure, and can be much less reliable

12
Reliability of Difference Scores
  • The reliability of a difference between subtest x and subtest y is

  r_{diff} = \frac{\tfrac{1}{2}(r_{xx} + r_{yy}) - r_{xy}}{1 - r_{xy}}

  where r_{xx} = the reliability of subtest x, r_{yy} = the reliability of subtest y, and r_{xy} = the correlation of subtest x with subtest y
13
Example
  • If we're interested in the difference between two subtests whose average reliability is .75 and whose intercorrelation is .6, the resulting reliability is (.75 − .6) / (1 − .6) = .375
  • The proportion of random measurement error more than doubles (from .25 to .625) when looking at discrepancies between these tests
  • This is due to subtracting out the systematic variance the two subtests share, while the random measurement errors of both persist
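A sketch verifying this example with the difference-score formula from slide 12 (the function name is mine):

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference x - y."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

r_diff = difference_reliability(0.75, 0.75, 0.6)
print(r_diff)                    # 0.375
print(1 - 0.75, 1 - r_diff)      # error proportion rises from 0.25 to 0.625
```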

14
Correction For Attenuation

  r_{XY} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}

  where r_{XY} = the latent or true correlation between variables X and Y (measured without error), r_{xy} = the manifest or observed correlation between our tests of X and Y (measured with error), r_{xx} = the reliability of the test of X, and r_{yy} = the reliability of the test of Y
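A minimal sketch of the correction, with hypothetical reliabilities and an assumed observed correlation of .42:

```python
import math

def disattenuate(r_observed: float, r_xx: float, r_yy: float) -> float:
    """Latent (true) correlation: observed r divided by sqrt(r_xx * r_yy)."""
    return r_observed / math.sqrt(r_xx * r_yy)

# If tests with reliabilities .7 and .8 correlate .42, the latent
# correlation between the constructs is noticeably higher
print(round(disattenuate(0.42, 0.7, 0.8), 2))   # 0.56
```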
15
Test Validity
  • Validity has to do with whether a test measures
    what it is supposed to measure, and not something
    else instead
  • Validity of a test is examined by correlating it
    with an external variable (i.e., a measure
    outside of the test in question)
  • Such a correlation is called a validity
    coefficient

16
Sources of Test Invalidity
  • The 2 sources of test invalidity are:
  • Random measurement error (RME)
  • Systematic measurement error (SME)
  • RME limits test validity in a way we already know about: the maximum a validity coefficient can attain is the square root of the test's reliability (see the sketch after this list)
  • Thus, demonstrating high test validity is also necessarily a demonstration of high test reliability
  • The role of SME is the aspect of test invalidity we'll focus on now
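A one-loop illustration of this ceiling (the reliabilities shown are hypothetical):

```python
import math

# The maximum validity coefficient a test can attain is the square
# root of its reliability
for rel in (0.5, 0.8, 0.9):
    print(f"reliability {rel}: max validity = {math.sqrt(rel):.2f}")
```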

17
Correlating With External Variables
  • A test could correlate with an external variable
  • Due to its non-error test variance. That is, the
    external variable measures, or is related to,
    what our test measures.
  • Due to its systematic measurement error. That is,
    the external variable measures what our test
    should not measure.

18
Non-Error Test Variance (NETV)
  • Predictive Criterion Validity: Whether our test correlates with some future criterion, such as future violence
  • Concurrent Criterion Validity: Whether our test correlates with a same-time criterion (such as an unimpeachable test that is quite expensive to administer)
  • Concurrent Validity: Whether our test correlates well with other, previously validated tests that use similar measurement methods

19
Non-Error Test Variance (NETV), cont.
  • Convergent Validity: Whether our test correlates well with other, previously validated tests that use different measurement methods
  • Construct Validity: Whether our test can be used to verify hypotheses about the construct; these hypotheses stem from a theory of the construct. This is the most inclusive form of validity
  • Content Validity: Whether our test looks like it has appropriate content (as judged by a panel of experts)
  • Face Validity: Whether our test looks like it has appropriate content (to the untrained eye)

20
Systematic Measurement Error (SME)
  • Discriminant Validity: Demonstration that our test does not correlate with what it should not be measuring

21
Several External Variables
  • It is possible to examine the correlations of a test with several external variables; some may correlate due to NETV and some due to SME, as in the factor plot below

[Figure: tests X1–X4 and external variables A–G plotted against two axes, Factor I and Factor II]