Reliability - PowerPoint PPT Presentation

About This Presentation
Title:

Reliability

Description:

Reliability & Validity Overview for this lecture Ethical considerations in testing Reliability of tests Split-half reliability Validity of tests Reliability and ... – PowerPoint PPT presentation

Slides: 37
Provided by: sdreed03
Transcript and Presenter's Notes

Title: Reliability


1
Reliability & Validity
2
Overview for this lecture
  • Ethical considerations in testing
  • Reliability of tests
  • Split-half reliability
  • Validity of tests
  • Reliability and validity in designed research
  • Internal and external validity

3
What does this resemble?
4
Rorschach test
  • You look at several images like this, and say
    what they resemble
  • At the end of the test, the tester says
  • - "you need therapy"
  • - or "you can't work for this company"
  • What assurance would you expect about the test?

5
Or imagine someone asks your child to draw a human
figure
  • The tester says this shows signs that your
    child is a victim of sexual abuse.
  • What questions would you ask?

6
What questions would you ask?
  • Is it valid for the purpose to which you plan to
    put it?
  • Can it be faked?
  • How were the norms constructed?
  • Can we see the data on which the norm is based?
  • Are there tester effects?
  • Is scoring reliable?
  • Is it culture-fair? Are there separate norms for
    my culture?

7
Ethics: developmental role for a test
  • It is sometimes said that a good test will let you
    give the subject a debrief that they can use to help with
  • - personal decisions
  • - career
  • - choice of therapy
  • - personal development targets
  • e.g. learning styles, study practices

But how reliable / specific is the test, really?
8
Psychological Testing
  • Occurs widely
  • - in personnel selection
  • - in clinical settings
  • - in education
  • Test construction is an industry
  • There are many standard tests available
  • What constitutes a good test?

9
Working assumption: a test is
  • a set of items (e.g. questions, pictures)
  • to which an individual responds (e.g. rating, comment, yes/no)
  • The responses to these items are added up
    (combined in some way) to create an overall score
    that assesses one psychological construct

Also called a scale
10
E.g. the Warwick Sweetness Scale
  • Each item is rated on a scale of 1, 2, 3, 4, 5
  • How much do you like sugar in coffee?
  • How much do you like toffee?
  • How much do you like ice-cream?
  • How much do you like pudding?
  • How much do you like chocolate cake?
  • How much do you like honey?

11
Specificity & sensitivity
  • Critical for diagnostic tests (dyslexic,
    autistic, diabetic)
  • Sensitivity: the test picks out people who really
    do have the condition
  • Specificity: the test excludes people who do not
    have the condition
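
As a rough illustration of the two definitions above, the sketch below counts hits and misses for a diagnostic test. The function name and the ten screening results are made up for illustration and do not come from the lecture.

```python
def sensitivity_specificity(has_condition, test_positive):
    """Return (sensitivity, specificity) from two paired lists of booleans."""
    tp = fn = tn = fp = 0
    for truth, flagged in zip(has_condition, test_positive):
        if truth and flagged:
            tp += 1      # has the condition, test picks them out
        elif truth:
            fn += 1      # has the condition, test misses them
        elif flagged:
            fp += 1      # does not have it, test wrongly flags them
        else:
            tn += 1      # does not have it, test correctly excludes them
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical results for ten people (illustrative data only).
has_condition = [True, True, True, True, False, False, False, False, False, False]
test_positive = [True, True, True, False, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(has_condition, test_positive)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Here three of the four people with the condition are detected, and five of the six people without it are correctly excluded.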

12
Reliability
  • consistency
  • Test-retest reliability
  • Parallel forms reliability
  • Split-half reliability
  • Intraclass correlation (ICC, Cronbach's alpha)
  • Inter-rater reliability (kappa, ICC)

13
Split-half reliability
  Item                  Score   Odd   Even
  1. sugar in coffee?     3      3
  2. toffee?              4            4
  3. ice-cream?           2      2
  4. pudding?             3            3
  5. chocolate cake?      5      5
  6. honey?               4            4

Total Warwick Sweetness score: 21 (odd items 10, even items 11)
14
Split-half reliability
  • Split the test into two halves: do you get similar
    scores on the halves?
  • - Separate sub-totals for odd and even items
    (for each subject)
  • - Correlate these sub-totals (r_half)
  • Adjust the reliability estimate with the
    Spearman-Brown correction
  • r_test = (2 r_half) / (1 + r_half)
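
The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the lecturer's own procedure: the helper names are invented here, the first respondent reuses the item scores from the Warwick Sweetness example, and the other four respondents are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(responses):
    """Return (r_half, r_test) for a list of per-respondent item scores."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6
    r_half = pearson_r(odd_totals, even_totals)
    r_test = (2 * r_half) / (1 + r_half)                 # Spearman-Brown
    return r_half, r_test


responses = [
    [3, 4, 2, 3, 5, 4],  # scores from the slide: odd total 10, even total 11
    [1, 2, 1, 1, 2, 2],  # hypothetical respondents from here on
    [5, 5, 4, 5, 5, 4],
    [2, 3, 3, 2, 3, 3],
    [4, 3, 4, 4, 3, 5],
]

r_half, r_test = split_half_reliability(responses)
print(f"r_half = {r_half:.3f}, r_test = {r_test:.3f}")
```

The correction raises the estimate because each half is only half as long as the full test, and shorter tests tend to be less reliable.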

15
Reliability v. accuracy
  • Can be reliable but not accurate

m1 m2 m3
1 11 21
2 12 22
3 13 23
4 14 24
5 15 25
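
One way to read the table above, assuming m1, m2 and m3 are three instruments measuring the same five objects, is that the instruments agree perfectly with one another yet differ by constant offsets, so they cannot all be accurate. The sketch below checks that reading; the interpretation and the variable names are assumptions, not something the slide states.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# The three columns of the table.
m1 = [1, 2, 3, 4, 5]
m2 = [11, 12, 13, 14, 15]
m3 = [21, 22, 23, 24, 25]

r12 = pearson_r(m1, m2)   # agreement between instruments 1 and 2
r13 = pearson_r(m1, m3)   # agreement between instruments 1 and 3

# Average disagreement in the raw readings.
offset_12 = sum(b - a for a, b in zip(m1, m2)) / len(m1)
offset_13 = sum(c - a for a, c in zip(m1, m3)) / len(m1)

print(f"r12 = {r12:.2f}, r13 = {r13:.2f}")
print(f"mean offset m2 - m1 = {offset_12}, m3 - m1 = {offset_13}")
```

The correlations show perfect consistency, while the offsets show that at most one of the three sets of readings can match the true values.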
17
Validity
  • Interpretation: the link to reality
  • The relationship between test scores and the
    conclusions we draw from them.
  • "The degree to which evidence and theory support
    the interpretation of test scores entailed by
    proposed use of tests." (AERA/APA/NCME, 1999)
  • IQ tests: intelligence
  • Personality tests: personality

18
Validity
  • Fast cars
  • - move quickly: the speed test
  • - are powerful: the bhp test
  • - are red: the colour test
19
Validity
  • "Validation is inquiry into the soundness of the
    interpretations proposed for scores from a test"
    Cronbach (1990, p. 145)
  • Face validity
  • Content validity
  • Construct validity
  • Criterion validity

20
Face validity
  • Does a test, on the face of it, seem to be a good
    measure of the construct
  • E.g., how fast can a particular car go?
  • time it over a fixed distance
  • So direct measurement of speed has good face
    validity

21
Face validity
  • The bishop / colonel question

22
Content validity
  • Does the test systematically cover all parts of
    the construct?

E.g. the examination for a module
Topics taught: Soup, Fish, Beetroot, Custard, Rice
Topics examined: Soup, Beetroot, Custard
23
Content validity
Spider phobia
Aspects of the construct:
  • Strength of fear reaction
  • Persistence of reaction
  • Invariability of reaction
  • Recognition that reaction is unreasonable
  • Avoidance of spiders
Aspects assessed
X
24
Construct validity
  • Measuring things that are in our theory of a
    domain.
  • e.g. engine power propels car
  • A construct is a mechanism that is believed to
    account for some aspect of behaviour
  • working memory
  • trait introversion/extroversion
  • E.g., children's spelling ability in native
    language is correlated with learning of second
    language

25
Construct validity
  • The construct is sometimes called a latent
    variable
  • You can't directly observe the construct
  • You can only measure its surface manifestations

Construct (latent variable): Extroversion
Measurement (manifest variable): Personality questionnaire, Behavioural observation
26
Construct validity
  • Measuring construct validity
  • - Convergent validity: agrees with other measures of the same thing
  • - Divergent validity: does not agree with measures of different things
  • (Campbell & Fiske, 1959)

E.g. Warwick spider phobia questionnaire: positive
correlation with SPQ, no correlation with BDI
27
Criterion validity
  • A test has high criterion validity if it
    correlates highly with some external benchmark
  • e.g. a spelling test predicts learning of a 2nd language
  • e.g. "Bishop/colonel" test might predict good
    cleaners

Concurrent validity, predictive validity
28
Criterion / predictive validity
  • Graphology for job selection
  • - Candidate writes something: validity .18
  • - But the same holds for untrained graphologists, too
  • - Candidate copies something: validity none
  • Schmidt & Hunter (1998), Psychological
    Bulletin, 124, 262-274

29
Reliability and validity
  • Reliability limits validity
  • without reliability, there is no validity
  • Measures of validity cannot exceed measures of
    reliability
  • validity ≤ reliability
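
The limit is stated loosely on this slide. In classical test theory it is usually written with the square root of the reliability (the "reliability index"). The symbols below are notation introduced here rather than taken from the lecture: \rho_{XY} is the validity coefficient between test X and criterion Y, and \rho_{XX'} and \rho_{YY'} are the reliabilities of the test and of the criterion.

```latex
|\rho_{XY}| \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}
```

For example, a test with reliability 0.64 can have a validity coefficient of at most 0.80, and a test with reliability 0 cannot correlate with any criterion at all.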

30
Replicability
  • Can the result be repeated?
  • Drachnik (1994)
  • - 43 abused children: 14 included tongues
  • - 194 not abused: only 2
  • - d = 1.44

31
Replicability
  • Does it replicate?
  • 1. Chase (1987)
  • - 34 abused, 26 not abused: d = 0.09
  • 2. Grobstein (1996)
  • - 81 abused, 82 not abused: d = 0.08

32
Reliability in designed research
  • Use reliable measurement instruments
  • Standardized questionnaires
  • Accurate and reliable clocks
  • Repeat measurements
  • Many participants
  • Many trials
  • Eliminate (control) sources of noise:
    irrelevant factors that randomly affect the
    outcome variable
  • Temperature
  • Time of day

33
Reliability in designed research
  • Eliminate (control) sources of noise:
    irrelevant factors that randomly affect the
    outcome variable
  • Temperature
  • Time of day
  • Tip
  • Reduce irrelevant individual differences
  • e.g. test only female participants
  • test only a narrow age band
  • Why? reduces error variance, makes test more
    powerful
  • Cost? ability to generalise to other groups or
    situations is reduced

F = variance due to effect / error variance
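
To illustrate why reducing irrelevant individual differences makes a test more powerful, the sketch below computes a one-way ANOVA F ratio for two made-up data sets with the same difference between group means but different amounts of within-group spread. The function name and all numbers are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F: between-groups mean square / within-groups mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variance due to the effect (differences between group means).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error variance (spread of scores around their own group mean).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Both data sets have group means of 16 and 20 (hypothetical scores).
heterogeneous = [[10, 14, 18, 22], [14, 18, 22, 26]]  # large individual differences
homogeneous = [[15, 16, 16, 17], [19, 20, 20, 21]]    # narrow, similar participants

f_noisy = f_ratio(heterogeneous)
f_narrow = f_ratio(homogeneous)
print(f"F (heterogeneous sample) = {f_noisy:.1f}")
print(f"F (homogeneous sample)   = {f_narrow:.1f}")
```

The effect is the same size in both data sets. Only the error variance in the denominator changes, and that alone makes F much larger in the homogeneous sample.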
34
Validity in designed research
  • Internal validity
  • Are there flaws in the design or method?
  • Can the study generate data that allows
    suitable conclusions to be drawn?
  • External validity
  • How well do the results carry over from sample
    to populations? How well do they generalise?

35
Lecture Overview
  • Ethical considerations in testing
  • Results can be used to make important decisions,
    is the test good enough to justify these?
  • Reliability
  • Test-retest, internal consistency (split-half)
  • Accuracy, specificity, sensitivity
  • Validity
  • Face, content, construct, criterion
  • Divergent, convergent
  • Replicability
  • Reliability and validity in designed research
  • Internal and external validity

36
http://wilderdom.com/personality/L3-2EssentialsGoodPsychologicalTest.html
  • Standardization
  • Standardized tests are:
  • administered under uniform conditions. i.e. no
    matter where, when, by whom or to whom it is
    given, the test is administered in a similar way.
  • scored objectively, i.e. the procedures for
    scoring the test are specified in detail so that
    any number of trained scorers will arrive at the
    same score for the same set of responses. So for
    example, questions that need subjective
    evaluation (e.g. essay questions) are generally
    not included in standardized tests.
  • designed to measure relative performance. i.e.
    they are not designed to measure ABSOLUTE ability
    on a task. In order to measure relative
    performance, standardized tests are interpreted
    with reference to a comparable group of people,
    the standardization, or normative sample. e.g.
    Highest possible grade in a test is 100. Child
    scores 60 on a standardized achievement test. You
    may feel that the child has not demonstrated
    mastery of the material covered in the test
    (absolute ability) BUT if the average of the
    standardization sample was 55 the child has done
    quite well (RELATIVE performance).
  • The normative sample should (for hopefully
    obvious reasons!) be representative of the target
    population - however this is not always the case,
    thus norms and the structure of the test would
    need to be interpreted with appropriate caution.
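
The relative-performance example above (a score of 60 against a normative average of 55) can be sketched as follows. The normative sample is invented for illustration: the source gives only its mean, so the spread and the individual scores are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical normative sample, chosen so that its mean is 55.
norm_sample = [40, 45, 50, 50, 55, 55, 60, 60, 65, 70]
child_score = 60

norm_mean = mean(norm_sample)
norm_sd = pstdev(norm_sample)  # spread of the normative sample (assumed)

# Standard score: how many standard deviations above the normative mean.
z = (child_score - norm_mean) / norm_sd

# Percentage of the normative sample scoring strictly below the child.
pct_below = 100 * sum(s < child_score for s in norm_sample) / len(norm_sample)

print(f"normative mean = {norm_mean}, z = {z:.2f}, scored above {pct_below:.0f}% of sample")
```

The same raw score would give a very different standard score against a different normative sample, which is why the representativeness of that sample matters.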