1
Reliability & Validity of Instruments
  • Lesson 8

2
The MaxMinCon Principle
  • Maximize
  • The variance of the variables under study
  • You want greater variance in the dependent variable
    as a result of the independent variable.
  • Make treatments as different as possible.

3
The MaxMinCon Principle
  • Minimize
  • Error or random variance including errors in
    measurement.

4
From Week 1
  • Typically concerned with three aspects of
    validity
  • Measurement validity exists when a measure
    measures what we think it measures.
  • Generalizability exists when a conclusion holds
    true for the population, group, setting, or event
    that we say it does, given the conclusions that
    we specify.

5
Introduction to Validity
  • (continued)
  • Internal validity (also called causal validity)
    exists when a conclusion that A leads to or
    results in B is correct.

6
Measurement Validity
  • How do we know that this measuring instrument
    measures what we think it measures?
  • Deals with accuracy
  • There is an inference involved
  • Between the indicators we can observe and the
    construct we aim to measure.

7
Types of Validity
  • Content
  • Face validity
  • Criterion-related
  • Concurrent
  • Predictive
  • Construct
  • Convergent
  • Discriminant

8
Content Validity
  • Focuses on whether the full content of a
    conceptual definition is represented in the
    measure
  • Essentially, checks the operationalization
    against the relevant content domain of the
    construct(s) you are measuring.

9
Content Validity
  • Driven by the literature review.
  • Did you include all the content that you should
    have to measure the construct?
  • What are the criteria to measure the construct?

10
Content Validation
  • Two steps
  • Specify the content of a definition.
  • Develop indicators which sample from all areas of
    the content in the definition.

11
Content Validity
  • Face validity
  • A visual inspection of the test by expert (or
    non-expert) reviewers.
  • Often included as a part (or subset) of content
    validity.

12
Criterion-Related Validity
  • An indicator is compared with another measure of
    the same construct in which the researcher has
    confidence.
  • Comparing test or scale scores with one or more
    external variables known or believed to measure
    the attributes under study.

13
Criterion-Related Validity
  • An instrument high in criterion-related validity
    helps test users make better decisions in terms
    of placement, classification, selection, and
    assessment.

14
Criterion-Related Validity
  • Two methods
  • Concurrent
  • Predictive
  • The difference is the temporal (time) relationship
    between the operationalization and the criterion.

15
Concurrent
  • The criterion variable exists in the present
  • Compares the operationalization's ability to
    distinguish between groups that it should
    theoretically be able to distinguish between.

16
Concurrent
  • Example
  • A researcher might wish to establish the
    awareness of students about their performance in
    school during the past year.
  • Ask the student, "What was your grade point average
    last year?"
  • Compare to school records
  • Calculate the correlation (see the sketch below)
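As a hedged illustration (not part of the original slides), the correlation step in this example could be computed as follows; the GPA values are hypothetical.

```python
# Minimal sketch of the concurrent-validity check described above: correlate
# self-reported GPA with the GPA taken from school records.
# The values below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

self_reported_gpa = np.array([3.2, 2.8, 3.9, 3.5, 2.4, 3.0, 3.7, 2.9])
school_record_gpa = np.array([3.1, 2.9, 3.8, 3.4, 2.6, 2.8, 3.7, 3.0])

r, p_value = pearsonr(self_reported_gpa, school_record_gpa)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p_value:.3f})")
```

The same calculation applies to the predictive example on the next slide, except that the criterion (the recorded GPA) is collected only after the year is completed.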

17
Predictive
  • The criterion variable will not exist until later.
  • You assess the operationalization's ability to
    predict something it should theoretically be able
    to predict.

18
Predictive
  • Example
  • A researcher might wish students to anticipate
    their performance in school during the next year.
  • Ask the student, "What do you think your GPA will be
    next year?"
  • Compare to school records after the year is completed
  • Calculate the correlation

19
Construct Validity
  • Focuses on how well a measure conforms with
    theoretical expectations
  • Established by relating a presumed measure of a
    construct to some behavior that it is
    hypothesized to underlie.

20
Construct Validity
  • Think of construct validity as the distinction
    between two broad territories: the land of theory
    and the land of observation.

21
Construct Validity
22
Construct Validity Methods
  • Convergent
  • Examine the degree to which the
    operationalization is similar to (converges on)
    other operationalizations to which it
    theoretically should be similar.

23
Construct Validity Methods
  • Discriminant
  • Examine the degree to which the
    operationalization is not similar to (diverges
    from) other operationalizations that it
    theoretically should not be similar to.

24
Reliability
  • Reliability is another important consideration,
    since researchers want consistent results from
    instrumentation
  • Consistency gives researchers confidence that the
    results actually represent the achievement of the
    individuals involved.

25
Two Main Aspects to Reliability
  • Consistency over time (or stability)
  • Internal consistency

26
Consistency Over Time
  • Usually expressed by this question:
  • If the same instrument were given to the same
    people, under the same circumstances, but at a
    different time, to what extent would they get the
    same scores?
  • Takes two administrations of the instrument to
    establish

27
Internal Consistency
  • Relates to the concept-indicator idea of
    measurement
  • Since we will use multiple items (indicators) to
    infer a concept, the question concerns the extent
    to which these items are consistent with each
    other.
  • All working in the same direction
  • Requires only one administration of the instrument

28
Reliability
  • Scores obtained can be reliable but not valid.
  • An instrument should be both reliable and valid
    (Figure 8.2) for the context in which it is used.

29
Errors of Measurement
  • Because errors of measurement are always present
    to some degree, variation in test scores is
    common.
  • This is due to
  • Differences in motivation
  • Energy
  • Anxiety
  • Different testing situations

30
Some Factors Influencing Reliability
  • The greater the number of items, the more
    reliable the test.
  • In general, the longer the test administration
    time, the greater the reliability.
  • The narrower the range of difficulty of items,
    the greater the reliability.
  • The more objective the scoring, the greater the
    reliability.

31
Some Factors Influencing Reliability
  • The greater the probability of success by chance,
    the lower the reliability.
  • Inaccuracy in scoring leads to unreliability.
  • The more homogeneous the material, the greater
    the reliability.

32
Some Factors Influencing Reliability
  • The more common the experiences of the
    individuals tested, the greater the reliability
  • Catch/trick questions lower reliability
  • Subtle factors leading to misinterpretation of
    the test item lead to unreliability.

33
Reliability Coefficient
  • Expresses the relationship between scores on the
    same instrument at two different times, or between
    parts of the instrument.
  • The three best-known methods are
  • Test-retest
  • Equivalent forms method
  • Internal consistency method

34
Test-Retest Method
  • Involves administering the same test twice to the
    same group after a certain time interval has
    elapsed.
  • A reliability coefficient is calculated to
    indicate the relationship between the two sets of
    scores.
  • This yields a coefficient of stability (a Pearson
    product-moment correlation), as sketched below.
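As a minimal sketch (not part of the slides), the coefficient of stability could be computed like this; the scores are hypothetical results from two administrations of the same instrument to the same group.

```python
# Minimal sketch of a test-retest coefficient of stability: correlate the
# scores from the first and second administrations of the same instrument.
# The scores below are hypothetical.
import numpy as np

scores_time1 = np.array([34, 28, 41, 25, 38, 30, 45, 33])
scores_time2 = np.array([36, 27, 40, 28, 37, 29, 44, 35])

stability = np.corrcoef(scores_time1, scores_time2)[0, 1]
print(f"coefficient of stability r = {stability:.2f}")
```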

35
Test-Retest Method
  • Reliability coefficients are affected by the
    lapse of time between the administrations of the
    test.
  • An appropriate time interval should be selected.
  • Greater than zero but less than 6 months

36
Test-Retest Method
  • When reporting a coefficient of stability
  • Always report the time interval between
    administrations (as a subscript to the r)
  • Report any significant experiences that may have
    intervened in the measurements.

37
Test-Retest Method
  • When reporting a coefficient of stability
  • Describe the conditions of each measurement to
    account for measurement error due to poor
    lighting, loud noises and the like.

38
Equivalent-Forms Method
  • Two different but equivalent (alternate or
    parallel) forms of an instrument are administered
    to the same group during the same time period.
  • Also called the parallel-forms reliability procedure.

39
Equivalent-Forms Method
  • A reliability coefficient is then calculated
    between the two sets of scores.
  • A coefficient of equivalence (again, a Pearson
    r correlation coefficient)

40
Equivalent-Forms Method
  • Parallel forms should
  • Contain the same number of items
  • Contain items of equal difficulty
  • Have means, variances, and intercorrelations with
    other variables that are not significantly
    different from each other (one way to screen for
    this is sketched below).
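One way to screen whether two candidate forms meet these requirements is sketched below; this is only an illustration under stated assumptions, not a procedure from the slides. A paired t-test compares the means and Levene's test compares the variances; the form scores are hypothetical.

```python
# Minimal sketch of checking that two candidate parallel forms do not differ
# significantly in mean or variance. The scores are hypothetical; the same
# group is assumed to have taken both forms.
import numpy as np
from scipy.stats import ttest_rel, levene

form_a = np.array([34, 28, 41, 25, 38, 30, 45, 33])
form_b = np.array([36, 27, 39, 27, 37, 31, 44, 34])

t_stat, p_mean = ttest_rel(form_a, form_b)   # paired t-test on the means
w_stat, p_var = levene(form_a, form_b)       # Levene's test on the variances
print(f"mean difference p = {p_mean:.3f}, variance difference p = {p_var:.3f}")
```

Non-significant results (conventionally p greater than .05) are what you hope to see here.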

41
Equivalent-Forms Method
  • Parallel-form tests are often needed for studies
    with pretests and posttests
  • Where it is important that the tests are not
    identical but measure the same things.

42
Equivalent-Forms Method
  • It is possible to combine the test-retest and
    equivalent-forms methods by giving two different
    forms of the test with a time interval between the
    two administrations.
  • This produces a coefficient of stability and
    equivalence.

43
Internal-Consistency Methods
  • There are several internal-consistency methods
    that require only one administration of an
    instrument.
  • In essence, the instrument is split into two halves
    and the scores from each half are correlated.
  • Because splitting halves the effective length, a
    short instrument's reliability will be estimated
    conservatively.

44
Internal-Consistency Methods
  • Split-half Procedure
  • Involves scoring two halves of a test separately
    for each subject and calculating the correlation
    coefficient between the two scores.
  • Split systematically (odd-even) or randomly
  • A Spearman-Brown correction formula is then used to
    adjust for the shortened length (see the sketch
    below).
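A minimal sketch of the split-half procedure, assuming dichotomously scored (0/1) items; the item matrix below is hypothetical.

```python
# Minimal sketch of the split-half procedure: split the items odd-even,
# correlate the two half scores, then apply the Spearman-Brown correction
# for the shortened length. Rows = subjects, columns = items (hypothetical).
import numpy as np

items = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)       # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```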

45
Internal-Consistency Methods
  • Other measures do not require the researcher to
    split the instrument in half
  • Assess the homogeneity of the items.

46
Internal-Consistency Methods
  • Two sources of random error are reflected in the
    measures of reliability using homogeneity of the
    items.
  • Content sampling (as in split-half)
  • Heterogeneity of the behavior domain sampled
  • The more homogeneous the domain is, the less the
    error and thus the higher the reliability.

47
Internal-Consistency Methods
  • Kuder-Richardson Approaches (KR20 and KR21)
  • KR21 requires only 3 pieces of information
  • Number of items on the test
  • The mean
  • The standard deviation

48
Internal-Consistency Methods
  • Kuder-Richardson Approaches (KR20 and KR21)
  • Both are used with dichotomously scored (right/wrong)
    items
  • KR21 additionally assumes that all items have equal
    difficulty
  • If they do not, KR21 estimates will be lower (both
    formulas are sketched below)
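A minimal sketch of both Kuder-Richardson formulas, assuming dichotomously scored items; the item matrix is hypothetical (rows = subjects, columns = items).

```python
# Minimal sketch of KR20 and KR21 on hypothetical dichotomous (0/1) item data.
import numpy as np

items = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
])

k = items.shape[1]                  # number of items on the test
totals = items.sum(axis=1)          # total score for each subject
mean_total = totals.mean()          # the mean
var_total = totals.var(ddof=1)      # square of the standard deviation

p = items.mean(axis=0)              # proportion passing each item (KR20 only)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)
kr21 = (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))
print(f"KR20 = {kr20:.2f}, KR21 = {kr21:.2f}")
```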

49
Internal-Consistency Methods
  • Alpha Coefficient (Cronbach's Alpha)
  • A general form of the KR20 used to calculate the
    reliability of items that are not scored right
    vs. wrong (sketched below).
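A minimal sketch of Cronbach's alpha for items that are not scored right/wrong (for example, 1-5 ratings); the data are hypothetical.

```python
# Minimal sketch of Cronbach's alpha: rows = subjects, columns = items
# scored on a rating scale rather than right/wrong (hypothetical data).
import numpy as np

items = np.array([
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```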

50
Internal-Consistency Methods
  • KR21 and Cronbach's alpha are the most widely used
    because they require only one administration.
  • The choice may depend on which statistical package
    you use (SPSS provides Cronbach's alpha).

51
Running Our Reliability
  • Design items that appear to measure the same
    domain.
  • Collect data using a pilot test.
  • Run the Cronbach's alpha procedure.

52
Running Our Reliability
  • View the correlation matrix
  • No 0s or 1s?
  • No negatives?
  • Eliminate items that produce negatives, ones, and zeros
  • View "alpha if item deleted"
  • If alpha can be raised by deletion, decide whether
    the reduction in the number of items will hurt
    content validity.
  • Rerun alpha and repeat the procedure (see the sketch
    below)
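A minimal sketch of the "alpha if item deleted" step in the procedure above; the data are hypothetical, and the last item is deliberately inconsistent with the others so that deleting it raises alpha.

```python
# Minimal sketch of the "alpha if item deleted" check: recompute alpha with
# each item removed in turn. The data are hypothetical; the fifth item is
# deliberately inconsistent with the others.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

items = np.array([
    [4, 5, 4, 3, 2],
    [2, 2, 3, 2, 5],
    [5, 5, 4, 5, 1],
    [3, 4, 3, 3, 4],
    [1, 2, 2, 1, 5],
])

print(f"alpha with all items: {cronbach_alpha(items):.2f}")
for i in range(items.shape[1]):
    reduced = np.delete(items, i, axis=1)   # drop item i
    print(f"alpha if item {i + 1} deleted: {cronbach_alpha(reduced):.2f}")
```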

53
How high must the reliability of a measurement be?
  • There is no absolute answer to this question.
  • May depend on competition -- Are there stronger
    instruments available?
  • In the early stages of developing a test for a
    construct, a reliability of .50 or .60 may
    suffice.
  • Usually the higher the better (at least .70)

54
How high must the reliability of a measurement be?
  • Remember
  • A reliability coefficient of .90 says that 90% of
    the variance in scores is due to true variance in
    the characteristic measured, leaving only 10% due
    to error.

55
Standard Error of Measurement
  • An index that shows the extent to which a
    measurement would vary under changed
    circumstances.
  • There are many possible standard errors for a given
    set of scores.
  • Also known as measurement error, it is a range of
    scores that shows the amount of error which can be
    expected (Appendix D), as sketched below.
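As a hedged illustration (not from the slides), a commonly used formula for the standard error of measurement is SEM = SD * sqrt(1 - reliability); the values below are hypothetical.

```python
# Minimal sketch of the standard error of measurement using the common
# formula SEM = SD * sqrt(1 - reliability); the values are hypothetical.
import math

sd_of_scores = 12.0    # standard deviation of the obtained scores
reliability = 0.90     # reliability coefficient of the instrument

sem = sd_of_scores * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f}")   # roughly a +/- band of expected error around a score
```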

56
Scoring Agreement
  • Scoring agreement requires a demonstration that
    independent scorers can achieve satisfactory
    agreement in their scoring.
  • Instruments that use direct observations are
    highly vulnerable to observer differences (e.g.
    qualitative research methods).

57
Scoring Agreement
  • A correlation of at least .90 among scorers is
    desired as an acceptable level of agreement (see the
    sketch below).
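A minimal sketch of checking agreement between two independent scorers; the ratings are hypothetical.

```python
# Minimal sketch of scoring agreement: correlate the scores that two
# independent scorers assigned to the same set of responses (hypothetical).
import numpy as np

scorer_a = np.array([78, 85, 62, 90, 71, 88, 66, 95])
scorer_b = np.array([80, 83, 60, 92, 70, 90, 68, 94])

agreement = np.corrcoef(scorer_a, scorer_b)[0, 1]
print(f"inter-scorer correlation = {agreement:.2f}")   # at least .90 is desired
```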