Introduction to measurement - PowerPoint PPT Presentation

Slides: 43
Provided by: janemc1
Transcript and Presenter's Notes

Title: Introduction to measurement


1
  • LECTURE 5 - PRINCIPLES OF MEASUREMENT I
  • Objectives
  • To review levels of measurement (nominal,
    ordinal, interval, ratio)
  • To introduce the concepts of reliability (inter-
    and intra-rater reliability, test-retest
    reliability) and validity (content/face validity;
    criterion validity (concurrent, predictive);
    construct validity (discriminant, convergent))
  • To overview methods of measuring reliability
    (percent agreement, Kappa, ICC)
  • To review the kinds of biases that affect
    measurement (response bias, social desirability)
  • To understand the concept of responsiveness
  • Required Reading
  • Streiner DL, Norman GR. Health Measurement
    Scales: A Practical Guide to Their Development
    and Use. Third Edition. New York: Oxford
    University Press, 2004. Chapters 8, 10.
  • Optional Reading
  • Foltynie T, Matthews F, Ishihara L, Brayne C. The
    frequency and validity of self-reported diagnosis
    of Parkinson's Disease in the UK elderly: MRC
    CFAS Cohort. BMC Neurology 2006, 6:29.
  • Miyashita M, Yamaguchi A, Kayama M, Narita Y,
    Kawada N, Akiyama M, Hagiwara A, Suzukamo Y,
    Fukuhara S. Validation of the Burden Index of
    Caregivers (BIC), a multidimensional short care
    burden scale from Japan. Health and Quality of
    Life Outcomes 2006, 4:52.

2
Context
  • Epidemiologists look for variability (want to
    explain variability)
  • Must reflect on measurement in the context of
    variability
  • If there is little variability, it is very
    difficult to detect differences (age of students
    and insulin resistance); must have highly valid
    and reliable measures
  • If there is a lot of variability, it will be
    easier to pick up differences (CRP and cigarette
    use in adolescents)
  • Measurement properties (such as validity and
    reliability) are context dependent; they are not
    stable characteristics of a measure.
  • We should reflect on these issues in each and
    every measurement situation
  • Definitions vary

3
Concepts to retain
  • Level of measurement  (nominal, ordinal,
    interval, ratio)
  • Single- and multi-item measures (index, scale)
  • Response options (categorical vs. continuous)
  • Common response scales (Likert, visual analogue,
    semantic differential)
  • Reliability (inter-rater, intra-rater,
    test-retest)
  • Measuring reliability (percent agreement, Kappa
    coefficient, ICC)
  • Internal consistency (Cronbach's alpha)
  • Validity (content/face; criterion (concurrent
    and predictive); construct (discriminant and
    convergent))
  • Biases (response bias, recall bias, acquiescence
    bias, social desirability)
  • Responsiveness

4
True/False
5
Traditions of measurement theory
  • Clinimetric - clinical, epidemiological (focus on
    screening and diagnostic tests)
  • Psychometric - psychology (focus on scales)

6
What do we measure in epidemiology?
  • Outcomes (health, disability, mortality,
    behavior, satisfaction)
  • Exposures, determinants, correlates, risk factors
  • Intervening variables
  • Confounders
  • Effect modifiers
  • Objective is to maximize the validity of the
    study results

7
Sources of data
  • Primary
  • Clinical observations
  • Questionnaires and interviews
  • Secondary
  • Reportable diseases, registries
  • Administrative databases (hospital discharges,
    medication prescriptions)
  • Hospital charts
  • Vital statistics

8
Measurement
  • Researcher must have a very clear idea of the
  • - concept that needs to be measured
  • - the type and amount of information needed
    for analysis and to make inferences
  • - operational definition
  • A measure comprises
  • - question(s)/item(s) (single vs. multi-item
    index/scale)
  • - response options (open- vs. closed-ended;
    categorical vs. continuous)
  • - many options and many decisions need to be
    made

9
Criteria to select measure
  • Appropriate to purpose (describe health, evaluate
    intervention, compare groups, predict outcome)
  • Feasible
  • Respondent burden
  • Method of administration
  • - self-administered (in-person, mail)
  • - interviewer (face-to-face, telephone)
  • - informant or proxy
  • Cost
  • Acceptable
  • Simplicity
  • Parsimonious
  • Meaningful
  • Reliability
  • Validity
  • Responsiveness (sensitivity to change)

10
Single vs. Multi-item Measures

11
Single-item measures
  • Used when underlying concept is simple and easy
    to measure
  • Examples of simple concepts
  • How old are you?
  • What color are your eyes?
  • What is your date of birth?
  • Not so simple
  • How good is your diet?
  • What is your ethnicity?
  • On a scale of 1 to 10, how addicted are you to
    cigarettes?
  • Does the medication work?
  • How satisfied are you with your physician?

12
Multi-item measures (index)
  • - Sets of items measuring a latent construct
  • - Items are interrelated with each other more than
    with items representing other latent variables
  • - Cronbach's alpha is a common test of whether
    items are sufficiently interrelated to justify
    their combination in an index
  • - Items summed, averaged, weighted
  • - Sub-scales
  • - Scale - ordinal index

13
Fagerstrom Test for Nicotine Dependence
  • 1. How soon after you wake up do you smoke your
    first cigarette?
  • - After 60 minutes (0)
  • - 31-60 minutes (1)
  • - 6-30 minutes (2)
  • - Within 5 minutes (3)
  • 2. Do you find it difficult to refrain from
    smoking in places where it is forbidden?
  • - No (0)
  • - Yes (1)
  • 3. Which cigarette would you hate most to give
    up?
  • - The first in the morning (1)
  • - Any other (0)
  • 4. How many cigarettes per day do you smoke?
  • - 10 or less (0)
  • - 11-20 (1)
  • - 21-30 (2)
  • - 31 or more (3)
  • 5. Do you smoke more frequently during the first
    hours after awakening than during the rest of the
    day?
  • - No (0)
  • - Yes (1)
  • 6. Do you smoke even if you are so ill that you
    are in bed most of the day?
  • - No (0)
  • - Yes (1)
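
The six items above form a multi-item index in which item scores are summed. A minimal Python sketch of that scoring follows, assuming the item scores shown on the slide; the dictionary keys, the function name ftnd_score, and the example respondent are hypothetical and are not part of the published instrument.

```python
# Item scores copied from the slide; the key names are hypothetical labels.
FTND_ITEMS = {
    "time_to_first_cigarette": {
        "after 60 minutes": 0, "31-60 minutes": 1,
        "6-30 minutes": 2, "within 5 minutes": 3,
    },
    "difficult_to_refrain": {"no": 0, "yes": 1},
    "cigarette_hate_to_give_up": {"first in the morning": 1, "any other": 0},
    "cigarettes_per_day": {
        "10 or less": 0, "11-20": 1, "21-30": 2, "31 or more": 3,
    },
    "smoke_more_after_waking": {"no": 0, "yes": 1},
    "smoke_when_ill": {"no": 0, "yes": 1},
}


def ftnd_score(responses):
    """Sum the item scores for one respondent (simple unweighted index)."""
    missing = set(FTND_ITEMS) - set(responses)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    return sum(FTND_ITEMS[item][responses[item]] for item in FTND_ITEMS)


# Made-up respondent, for illustration only.
example = {
    "time_to_first_cigarette": "within 5 minutes",
    "difficult_to_refrain": "yes",
    "cigarette_hate_to_give_up": "first in the morning",
    "cigarettes_per_day": "21-30",
    "smoke_more_after_waking": "no",
    "smoke_when_ill": "yes",
}
print(ftnd_score(example))  # 3 + 1 + 1 + 2 + 0 + 1 = 8
```

With these item scores the total can range from 0 to 10. Refusing to score a respondent with missing items is one possible design choice; imputation rules would need to be justified separately.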

14
Choice of Response Options
15
Open- vs. Closed Ended
  • Open-ended: What do you like most about the
    epidemiology program at McGill? _________________
  • - useful in exploratory research
  • - used to develop more structured
    questions
  • - analysis time-consuming (requires
    qualitative methods)
  • Closed-ended: What I like most about McGill is
    the (choose one response)
  • (i) the teachers in 611
  • (ii) the walk up the hill to Purvis in
    the winter
  • (iii) the fascinating Monday seminars
  • (iv) other
  • - used more frequently
  • - easier to analyze

16
Categorical (Discrete)
  • Dichotomous, binary
  • - two response categories
  • - Are you able to climb stairs? (yes,
    no)
  • Polychotomous - multiple response categories
  • - nominal - What is your marital status?
    (single, married, divorced)
  • - ordinal - categorical data where there is a
    logical ordering in the categories (Do you
    have difficulty walking? 0 - no; 1 - some
    problems; 2 - confined to bed)
  • - can be analyzed as continuous
    (pseudo-continuous)
  • - disadvantages
  • - need to make judgments
  • - not clear if the distance
    between categories is equivalent
  • - loss of information
    (precision)

17
Continuous (Quantitative)
  • Interval scale
  • - measures quantitative differences between
    values of a variable
  • - equal distances between values
  • - scores can be added and subtracted but not
    multiplied or divided
  • - no true 0 value (or it is hard to define)
  • - intelligence, temperature (Celsius or
    Fahrenheit)
  • Ratio scale
  • - a numerical interval scale with a true
    zero point
  • - a given size interval has the same
    interpretation for the entire scale
  • - weight, no. of cigarettes/day, no. of nights
    spent in a hospital
  • Continuous measures can be categorized

18
Common Types of Scales
19
Likert scale
  • Ordinal scales commonly used in attitudinal
    measurements
  • Please circle the response that corresponds
    best to your opinion. I am able to get up early
    enough in the morning to exercise before work.
  • 1. Totally agree
  • 2. Agree
  • 3. No opinion
  • 4. Disagree
  • 5. Totally disagree

20
Visual analog scale
  • A bipolar scale (absence vs. highest degree)
    used to determine the degree of stimuli
    experienced, commonly used as a visual
    measurement of pain or stimuli.
  • To help people say how good or bad their
    health is, let's say the best state you can
    imagine is 100, and the worst is 0. In your
    opinion, how good or bad is your health today?
    Please mark an X on the line below.
  • 0 __________________________________________ 100
  • How severe has your arthritic pain been
    today?
  • Pain as bad as can be ______________________ No pain

21
Semantic differential scale
  • A technique for obtaining a value for
    subjective response in which the subject is asked
    to denote the intensity of a stimulus by choosing
    a subdivision between two extremes
  • My illness is
  • Painful ________________________Painless
  • Serious________________________Mild
  • Boring ________________________Interesting
  • Costly ________________________Not costly

22
Reliability
  • Refers to the degree to which the results
    obtained by a measurement procedure can be
    replicated
  • Reliability of a measure can vary across
    situations
  • Measures with low reliability will vary across
    interviewers, time, method of administration
  • Internal consistency
  • Reproducibility (stability)
  • Test-retest reliability
  • Inter-rater and intra-rater reliability

23
Internal Consistency
  • Concept that is relevant to multi-item index
  • Inter-correlation between items of a scale that
    are meant to measure different dimensions of the
    same construct
  • Based on a single administration of an index
  • Scales with more items tend to have higher
    internal consistency
  • Cronbach's alpha (psychometric property)
  • - assesses the extent to which a set of items
    can be treated as measuring a single latent
    variable

24
Measure of Internal Reliability (Consistency)
  • Split-half reliability - correlation between
    scores on an arbitrary half of the measure and
    scores on the other half
  • Cronbach's alpha estimates the average of the
    split-half estimates over all possible ways of
    dividing the scale
  • May be used to reduce the number of items in a
    scale
  • Usually ranges between 0.0 and 1.0
  • Widely accepted cut-off is that alpha should be
    .70 or higher; some use .75 or .80, while others
    are as lenient as .60
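
To make the alpha statistic concrete, here is a minimal Python sketch using the standard variance form of Cronbach's alpha, k/(k-1) x (1 - sum of item variances / variance of the total score). The slide does not give a formula, so this form, the function name cronbach_alpha, and the small data set are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses the variance form: k/(k-1) * (1 - sum(item variances) / var(total)).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: 5 respondents x 3 items (illustration only).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Because the calculation needs only one administration of the index, it can be run on any single wave of data; the cut-offs listed above are conventions, not properties of the formula.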

25
Chen et al. Use of the Fagerstrom tolerance
questionnaire for measuring nicotine dependence
among adolescent smokers in China: a pilot test.
Institute for Health Promotion and Disease
Prevention Research, University of Southern
California, USA. jim_chen_at_abtassoc.com
  • The validity of the Prokhorov adolescent version
    of the Fagerstrom Tolerance Questionnaire (FTQ)
    has not been demonstrated in assessing nicotine
    dependence among Chinese adolescents in China.
  • Data for 48 tenth-grader 30-day smokers in Wuhan,
    China (ages 16-17 years), were analyzed. Two
    different item scoring protocols were used, and
    self-reports of smoking were validated with
    saliva cotinine.
  • When items were scored using Protocol A,
    Cronbach's alphas were .42 and .63 for the 7-item
    and the 4-item scales, respectively, while using
    Protocol B, the alphas were .67 and .79 for the
    7-item and 4-item scales, respectively.
  • The total FTQ scores were significantly
    associated with self-reported smoking and saliva
    cotinine levels. These results support the
    reliability and validity of the Prokhorov FTQ.
26
To measure test-retest, inter-rater, and intra-rater
reproducibility
  • Need at least two administrations
  • Intra-rater - repeated measurements by the same
    rater
  • Inter-rater - two or more raters assess the same
    measure
  • Test-retest - measure is taken two or more times
    under identical conditions
  • - for constructs that fluctuate, a 2-week
    interval is often used to reduce the effects of
    memory and true change
  • - some constructs should not fluctuate
    (personality traits)

27
Measures of reliability for categorical data
  • Percent agreement
  • - limitation: value is affected by prevalence
    (higher if prevalence is very low or very high)
  • Kappa statistic
  • - takes chance agreement into account
  • - defines fraction of observed agreement not due
    to chance
  • - $\kappa = \dfrac{p(\text{obs}) - p(\text{exp})}{1 - p(\text{exp})}$
  • - Where
  • p(obs) = proportion of observed agreement
  • p(exp) = proportion of agreement expected
    by chance
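
The two statistics on this slide can be compared directly in a short Python sketch. It assumes two raters and the usual Cohen's kappa, with chance agreement computed from each rater's marginal proportions; the function names and both data sets are made up for illustration.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Proportion of subjects on which the two raters give the same rating."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohen_kappa(rater1, rater2):
    """Kappa = (p_obs - p_exp) / (1 - p_exp) for two raters."""
    n = len(rater1)
    p_obs = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over all categories.
    categories = set(rater1) | set(rater2)
    p_exp = sum(counts1[c] * counts2[c] for c in categories) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Made-up data, 100 subjects each.
# Balanced prevalence: 40 both "yes", 40 both "no", 20 disagreements.
balanced_1 = ["yes"] * 40 + ["no"] * 40 + ["yes"] * 10 + ["no"] * 10
balanced_2 = ["yes"] * 40 + ["no"] * 40 + ["no"] * 10 + ["yes"] * 10

# Low prevalence: 2 both "yes", 88 both "no", 10 disagreements.
rare_1 = ["yes"] * 2 + ["no"] * 88 + ["yes"] * 5 + ["no"] * 5
rare_2 = ["yes"] * 2 + ["no"] * 88 + ["no"] * 5 + ["yes"] * 5

for label, r1, r2 in [("balanced", balanced_1, balanced_2),
                      ("rare", rare_1, rare_2)]:
    print(label, round(percent_agreement(r1, r2), 2),
          round(cohen_kappa(r1, r2), 2))
```

In the low-prevalence data set percent agreement is higher than in the balanced one even though kappa is much lower, which is the prevalence limitation noted above.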

28
(No Transcript)
29
Interpretation of Kappa
  • Range 0.0-1.0 (negative values are possible when
    agreement is worse than chance)
  • Excellent > 0.75
  • Fair to good 0.40 - 0.75
  • Poor < 0.40

30
Measures of reliability for continuous data
  • Correlation coefficients measure pair-wise
    comparison
  • Pearson's r
  • - assesses linear association between 2 sets of
    observations
  • - cumbersome when there are more than two sets
    of observations
  • - sensitive to range of values, especially
    outliers
  • Spearman's r
  • - ordinal or rank order correlation
  • - less influenced by outliers
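
The contrast between the two coefficients can be sketched in plain Python. Spearman's r is computed here as Pearson's r applied to ranks, with tied values given their average rank; the function names and the two small data sets are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Made-up ratings: the second y series differs only in one extreme value.
x = [1, 2, 3, 4, 5, 6]
y_clean = [2, 1, 4, 3, 5, 6]
y_outlier = [2, 1, 4, 3, 5, 60]

print(round(pearson_r(x, y_clean), 3), round(pearson_r(x, y_outlier), 3))
print(round(spearman_r(x, y_clean), 3), round(spearman_r(x, y_outlier), 3))
```

The single extreme value changes Pearson's r but leaves the ranks, and therefore Spearman's r, unchanged.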

31
Intra-class correlation coefficient (ICC)
  • Equivalent to kappa and same range of values
    (0.0-1.0)
  • Assesses reliability by comparing the variability
    of different ratings of the same subject to the
    total variation across all ratings and all
    subjects.
  • Estimates proportion of total measurement
    variability due to between-individuals (vs error
    variance)
  • Interpretation: ICC = 0.88 means that 88% of
    the variation in the score relates to true
    variance between subjects (reflects true
    agreement, including systematic differences)
  • Affected by range of values - if less variation
    between individuals, ICC will be lower
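
The slide does not say which form of the ICC is meant. As an illustration only, the sketch below assumes the one-way random-effects form, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of ratings per subject. The function name and both data sets are made up.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for n subjects with k ratings each.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject mean square (true differences plus error).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (disagreement among ratings of one subject).
    ms_within = sum(
        (value - mean) ** 2
        for row, mean in zip(ratings, subject_means)
        for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Made-up data: two ratings per subject, same rating error in both sets,
# but subjects differ much more from each other in the first set.
wide_range = [[10, 11], [20, 19], [30, 31], [40, 39]]
narrow_range = [[10, 11], [11, 10], [12, 13], [13, 12]]

print(round(icc_oneway(wide_range), 3))
print(round(icc_oneway(narrow_range), 3))
```

The rating error is identical in the two data sets, yet the ICC is lower when subjects are more alike, which illustrates the last bullet above. Other ICC forms (two-way models, average-rating versions) would give different values for the same data.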

32
To improve reliability
  • Increase the number of items in a scale
  • Increase the number of response choices for each
    item
  • Reduce inter-observer variation through training
    of interviewers, use of standardized protocols
  • Reduce ambiguity in questions

33
Validity
  • An expression of the degree to which a
    measurement measures what it purports to measure.
    Does it measure what it is intended to?
  • Types
  • - Face, content
  • - Criterion (concurrent, predictive)
  • - Construct (discriminant, convergent)
  • - Responsiveness
  • Depends on purpose
  • - Develop new scale: content
  • - Screening: discriminant construct validity
  • - Outcome of treatment: responsiveness,
    sensitivity to change
  • - Prognosis: predictive validity

34
Content and face validity
  • Judgment of experts and/or members of target
    population
  • Face validity - extent to which, on the face of
    it, the measurement appears to be measuring the
    desired qualities (eyeball test)
  • Content validity - extent to which the
    measurement incorporates all the relevant content
    or domains of the construct under study
  • Content can be developed through lit reviews,
    interviews with target population, focus groups,
    review of existing instruments

35
Criterion validity
  • Extent to which a measure correlates with an
    external criterion (gold standard)
  • Convergent (concurrent) criterion validity -
    correlation between the measurement of interest
    and another measure known to measure the same
    concept. Both measures are taken at the same time
  • - 0.4-0.8
  • - screening test vs. diagnostic test
  • Predictive criterion validity - ability of the
    measure to predict the criterion
  • - cancer staging test vs 5-year survival

36
Construct validity
  • Is the theoretical construct underlying the
    measure valid?
  • Development and testing of hypotheses
  • Requires multiple data sources and ongoing
    investigation
  • - convergent validity - measure is correlated
    with other measures of similar constructs (e.g.,
    food frequency questionnaire and food records;
    Fagerstrom correlates with saliva cotinine)
  • - discriminant validity - measure is not
    correlated with measures of different constructs
    (e.g., Fagerstrom not correlated with depression)

37
Response bias
  • Tendency to respond in a particular way or style
    to items on a scale that yields systematic error
  • Recall bias - systematic error due to the
    differences in accuracy or completeness of recall
    to memory of past events or experiences
  • Acquiescence bias - tendency to agree with
    statements of opinions
  • Social desirability - tendency to respond in a
    way that is perceived to be more socially
    desirable than true response

38
Factors affecting response
  • Question wording/response scale
  • Characteristics of subjects (age, sex,
    education)
  • Method of data collection (questionnaire,
    interview, telephone vs face-to-face)
  • Training of interviewers

39
Responsiveness
  • Ability of measure to detect clinically
    important change over time or differences between
    treatments
  • Sensitivity to change - ability to detect any
    change
  • Important when testing the effectiveness of an
    intervention

40
Translation
  • Not a simple matter
  • Double back translation
  • Need to retest validity and reliability in target
    population

41
True/False
42
Ask yourself.
  • How will you measure the outcome? Exposures?
    Confounders?
  • Are your measures reliable? In the population you
    will target? How was reliability established?
  • Is there any evidence that your measures are
    valid? In the population you will target? How was
    validity established?