Title: Concepts to retain
- Level of measurement (nominal, ordinal, interval, ratio)
- Single- and multi-item measures (index, scale) 
- Response options (categorical vs. continuous) 
- Common response scales (Likert, visual analogue, semantic differential)
- Reliability (inter-rater, intra-rater, 
 test-retest)
- Measuring reliability (agreement, kappa coefficient, ICC)
- Internal consistency (Cronbach's alpha)
- Validity (content/face, criterion (concurrent and predictive), construct (discriminant and convergent))
- Biases (response bias, recall bias, acquiescence bias, social desirability)
- Responsiveness 
True/False
Traditions of measurement theory
- Clinimetric - clinical, epidemiological (focus on 
 screening and diagnostic tests)
- Psychometric - psychology (focus on scales) 
What do we measure in epidemiology?
- Health outcomes 
- Exposures, determinants, risk factors 
- Confounders 
- Effect modifiers 
- Objective is to maximize the validity of the 
 study results
Sources of data
- Primary
-  - Clinical observations
-  - Questionnaires and interviews
- Secondary
-  - Reportable diseases, registries
-  - Administrative databases (hospital discharges, medication prescriptions)
-  - Vital statistics
Measurement
- Researcher must have a very clear idea of the 
-  - concept that needs to be measured 
-  - the type and amount of information needed 
 for analysis
-  - operational definition 
- A measure comprises 
-  - question(s)/item(s) (i.e., single vs. 
 multi-item index/scale)
-  - response options (open- vs. closed-ended; categorical vs. continuous)
-  - many options and many decisions need to be 
 made
Criteria for selecting a measure
- Appropriate to purpose (describe health, evaluate an intervention, compare groups, predict an outcome)
- Feasible 
- Respondent burden 
- Method of administration 
- - self-administered (in-person, mail) 
- - interviewer (face-to-face, telephone) 
- - informant or proxy 
- Cost 
- Acceptable 
- Simplicity 
- Parsimonious 
- Meaningful 
- Reliability 
- Validity 
- Responsiveness (sensitivity to change) 
Single vs. multi-item measures
- Single item measures 
-  - used when underlying concept is simple and 
 easy to measure
- Multi-item measures (index) 
-  - sets of items measuring a latent construct 
-  - items correlate more strongly with each other than with items representing other latent variables
-  - items summed, averaged, weighted 
-  - sub-scales 
-  - Cronbach's alpha is a common test of whether 
 items are sufficiently interrelated to justify
 their combination in an index
-  - scale - ordinal index 
Single-item measure of nicotine dependence
- On a scale of 1 to 10, how addicted are you to 
 cigarettes?
Fagerström Test for Nicotine Dependence (FTND)
- 1. How soon after you wake up do you smoke your 
 first cigarette?
-  - After 60 minutes (0) 
-  - 31-60 minutes (1) 
-  - 6-30 minutes (2) 
-  - Within 5 minutes (3) 
- 2. Do you find it difficult to refrain from 
 smoking in places where it is forbidden?
-  - No (0) 
-  - Yes (1) 
- 3. Which cigarette would you hate most to give 
 up?
-  - The first in the morning (1) 
-  - Any other (0) 
- 4. How many cigarettes per day do you smoke? 
-  - 10 or less (0) 
-  - 11-20 (1) 
-  - 21-30 (2) 
-  - 31 or more (3) 
- 5. Do you smoke more frequently during the first 
 hours after awakening than during the rest of the
 day?
-  - No (0) 
-  - Yes (1) 
- 6. Do you smoke even if you are so ill that you 
 are in bed most of the day?
-  - No (0) 
-  - Yes (1) 
Choice of response options
- Open-ended: What do you like most about the epidemiology program at McGill? ____________________
-  - useful in exploratory research 
-  - used to develop more structured 
 questions
-  - analysis is time-consuming and requires qualitative methods
- Closed-ended: What I like most about McGill is (choose one response):
-  (i) the teachers in 611 
-  (ii) the walk up the hill to Purvis in 
 the winter
-  (iii) the fascinating Monday seminars 
-  (iv) other 
-  - used more frequently 
-  - easier to analyze 
Choice of response options - categorical (discrete)
- Dichotomous, binary 
-  - two response categories 
-  - Are you able to climb stairs? (yes, 
 no)
- Polychotomous - multiple response categories 
-  - nominal - What is your marital status? 
 (single, married, divorced)
-  - ordinal - categorical data with a logical ordering of the categories (e.g., Do you have difficulty walking? 0 - no, 1 - some problems, 2 - confined to bed)
-  - can be analyzed as continuous 
 (pseudocontinuous)
- Disadvantages 
-  - need to make judgments 
-  - loss of information (precision) 
Choice of response options - continuous (quantitative)
- Interval scale
-  - measures quantitative differences between values of a variable
-  - equal distances between values
-  - scores can be added and subtracted but not meaningfully multiplied or divided
-  - no true zero point (or it is hard to define)
-  - intelligence (IQ), temperature (in °C)
- Ratio scales
-  - a numerical interval scale with a true zero point
-  - a given size interval has the same interpretation for the entire scale, and ratios are meaningful (20 cigarettes/day is twice as many as 10)
-  - no. of cigarettes/day, no. of nights spent in a hospital, weight
- Continuous measures can be categorized
Visual analog scale
- A bipolar scale anchored at two extremes (absence vs. highest degree) on which respondents mark the degree of a stimulus they experience; commonly used as a visual measurement of pain.
- To help people say how good or bad their health is, let's say the best state you can imagine is 100 and the worst is 0. In your opinion, how good or bad is your health today? Please mark an X on the line below.
-  0 ________________________________________ 100
- How severe has your arthritic pain been today?
-  Pain as bad as can be ________________________________ No pain
Likert scale
-  Ordinal scales commonly used in attitudinal 
 measurements
-  Please circle the response that corresponds 
 best to your opinion. I am able to get up early
 enough in the morning to exercise before work.
-  1. Totally agree 
-  2. Agree 
-  3. No opinion 
-  4. Disagree 
-  5. Totally disagree
Semantic differential scale
- A technique for obtaining a value for subjective 
 response in which the subject is asked to denote
 the intensity of a stimulus by choosing a
 subdivision between two extremes
- My illness is 
- Painful ________________________ Painless
- Serious ________________________ Mild
- Boring ________________________ Interesting
Level of dependence on nicotine (FTND total score)
- 0-2 Very low dependence 
- 3-4 Low dependence 
- 5 Medium dependence 
- 6-7 High dependence 
- 8-10 Very high dependence 
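A minimal sketch of how the six FTND items are summed into a 0-10 index and mapped to the levels above. It assumes answers are already coded (minutes to first cigarette and cigarettes/day as whole numbers, yes/no items as True/False); the function names are hypothetical, not part of any official scoring software.

```python
def time_to_first_points(minutes: int) -> int:
    """Item 1: within 5 min (3), 6-30 min (2), 31-60 min (1), after 60 min (0)."""
    if minutes <= 5:
        return 3
    if minutes <= 30:
        return 2
    if minutes <= 60:
        return 1
    return 0


def cigs_per_day_points(cigs: int) -> int:
    """Item 4: 10 or less (0), 11-20 (1), 21-30 (2), 31 or more (3)."""
    if cigs <= 10:
        return 0
    if cigs <= 20:
        return 1
    if cigs <= 30:
        return 2
    return 3


def ftnd_score(minutes_to_first: int, hard_to_refrain: bool,
               hate_most_first: bool, cigs_per_day: int,
               more_in_morning: bool, smokes_when_ill: bool) -> int:
    """Sum the six items; True (yes / first in the morning) scores 1 on items 2, 3, 5, 6."""
    return (time_to_first_points(minutes_to_first)
            + int(hard_to_refrain)
            + int(hate_most_first)
            + cigs_per_day_points(cigs_per_day)
            + int(more_in_morning)
            + int(smokes_when_ill))


def dependence_level(score: int) -> str:
    """Cut-offs from the 'Level of dependence on nicotine' slide."""
    if not 0 <= score <= 10:
        raise ValueError("FTND score must be between 0 and 10")
    if score <= 2:
        return "very low"
    if score <= 4:
        return "low"
    if score == 5:
        return "medium"
    if score <= 7:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Hypothetical respondent: first cigarette 10 minutes after waking, 15 cigarettes/day
    score = ftnd_score(minutes_to_first=10, hard_to_refrain=True,
                       hate_most_first=True, cigs_per_day=15,
                       more_in_morning=False, smokes_when_ill=False)
    print(score, dependence_level(score))  # 5 medium
```

Note how item 4 turns a ratio-scale count into ordered categories: 11 and 20 cigarettes/day earn the same point.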
Reliability
- Refers to the degree to which the results 
 obtained by a measurement procedure can be
 replicated
- Results from measures with low reliability vary across interviewers, time, and methods of administration
- Internal consistency
- Reproducibility (stability)
-  - Test-retest reliability
-  - Inter-rater and intra-rater reliability
Internal consistency
- Concept relevant to multi-item indexes
- Inter-correlation between items of a scale that 
 are meant to measure different dimensions of the
 same construct
- Based on a single administration of an index 
- Other things being equal, scales with more items have higher internal consistency
- Cronbach's alpha (psychometric property)
-  - assesses the extent to which a set of items 
 can be treated as measuring a single latent
 variable
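For reference, the usual formula for Cronbach's alpha and its standardized version (which assumes standardized items) are shown below. The standardized form makes the point about scale length concrete: for a fixed average inter-item correlation, alpha rises with the number of items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{i}}{\sigma^{2}_{X}}\right)
\qquad
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

Here k is the number of items, σ²_i the variance of item i, σ²_X the variance of the total score, and r̄ the average inter-item correlation. With r̄ = 0.30, for example, 5 items give 1.5/2.2 ≈ 0.68 and 10 items give 3.0/3.7 ≈ 0.81.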
Measures of internal consistency
- Split-half reliability - correlation between scores on an arbitrary half of the items and scores on the other half
- Cronbach's alpha is, in effect, the average of the split-half reliabilities for all possible ways of dividing the scale into two halves
- May be used to reduce the number of items in a 
 scale
- Usually ranges between 0.0 and 1.0 (it can be negative when items are negatively correlated)
- A widely accepted cut-off is that alpha should be .70 or higher; some use .75 or .80, while others are as lenient as .60
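A minimal sketch of Cronbach's alpha and of the "alpha if item deleted" check often used when reducing the number of items. The data are a made-up 5-respondent, 4-item example in which item 4 does not go with the others; the function names are illustrative, not from a particular package.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all rows the same length)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each item
    total_var = variance([sum(r) for r in rows])        # variance of the summed score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [cronbach_alpha([[x for j, x in enumerate(r) if j != i] for r in rows])
            for i in range(k)]


if __name__ == "__main__":
    rows = [
        [1, 1, 1, 4],
        [2, 2, 3, 2],
        [3, 3, 2, 5],
        [4, 4, 4, 3],
        [5, 5, 5, 1],
    ]
    print(f"alpha (all items): {cronbach_alpha(rows):.2f}")  # 0.44
    for i, a in enumerate(alpha_if_deleted(rows), start=1):
        print(f"alpha without item {i}: {a:.2f}")
```

Here dropping item 4 raises alpha from about 0.44 to about 0.98, while dropping any other item lowers it (items 1 and 2 even give a negative value, which is why 0-1 is only the usual range).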
Use of the Fagerstrom Tolerance Questionnaire for measuring nicotine dependence among adolescent smokers in China: a pilot test.
Chen X, Zheng H, Steve S, Gong J, Stacy A, Xia J, Gallaher P, Dent C, Azen S, Shan J, Unger JB, Johnson CA.
Institute for Health Promotion and Disease Prevention Research, University of Southern California, USA. jim_chen_at_abtassoc.com
The validity of the Prokhorov adolescent version of the Fagerstrom Tolerance Questionnaire (FTQ) has not been demonstrated in assessing nicotine dependence among Chinese adolescents in China. Data for 48 tenth-grader 30-day smokers in Wuhan, China (ages 16-17 years), were analyzed. Two different item scoring protocols were used, and self-reports of smoking were validated with saliva cotinine. When items were scored using Protocol A, Cronbach's alphas were .42 and .63 for the 7-item and the 4-item scales, respectively, while using Protocol B, the alphas were .67 and .79 for the 7-item and 4-item scales, respectively. The total FTQ scores were significantly associated with self-reported smoking and saliva cotinine levels. These results support the reliability and validity of the Prokhorov FTQ.
To measure reproducibility
- Need at least two administrations 
- Intra-rater - repeated measurements by the same 
 rater
- Inter-rater - two or more raters assess the same 
 measure
- Test-retest - measure is taken two or more times 
 under identical conditions
-  - for constructs that fluctuate, an interval of about 2 weeks is often used: long enough to reduce memory effects, short enough to limit true change
-  - some constructs should not fluctuate 
 (personality traits)
Two measures of reliability for categorical data
- Percent agreement
-  - limitation: its value is affected by prevalence; it is higher when prevalence is very low or very high
- Kappa statistic 
-  - takes chance agreement into account 
-  - expresses the agreement beyond chance as a fraction of the maximum possible agreement beyond chance
-  - $\kappa = \dfrac{p_{\text{obs}} - p_{\text{exp}}}{1 - p_{\text{exp}}}$
-  - $p_{\text{obs}}$ = proportion of observed agreement
-  - $p_{\text{exp}}$ = proportion of agreement expected by chance
Interpretation of Kappa
- Usual range 0.0-1.0 (negative values indicate less agreement than expected by chance)
- Excellent: > 0.75
- Fair to good: 0.40 - 0.75
- Poor: < 0.40
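A minimal sketch of percent agreement and Cohen's kappa for two raters, using the kappa formula and the cut-offs above (function names are hypothetical). The two 2 x 2 tables are made-up counts chosen to show the prevalence problem: both have 90% agreement, but chance agreement is much higher when one category dominates, so kappa is lower.

```python
def agreement_and_kappa(table):
    """Percent agreement and Cohen's kappa for two raters.

    table[i][j] = number of subjects put in category i by rater 1
    and in category j by rater 2 (square table of counts).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # chance agreement: product of the raters' marginal proportions, summed over categories
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, kappa


def interpret_kappa(kappa):
    """Cut-offs from the 'Interpretation of Kappa' slide."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


if __name__ == "__main__":
    # Made-up yes/no ratings of 100 subjects by two raters (rows: rater 1, columns: rater 2)
    balanced = [[45, 5], [5, 45]]  # 'yes' in about half the subjects
    skewed = [[85, 5], [5, 5]]     # 'yes' in about 90% of subjects
    for name, table in [("balanced", balanced), ("skewed", skewed)]:
        p_obs, kappa = agreement_and_kappa(table)
        print(f"{name}: agreement {p_obs:.2f}, kappa {kappa:.2f} ({interpret_kappa(kappa)})")
    # balanced: agreement 0.90, kappa 0.80 (excellent)
    # skewed: agreement 0.90, kappa 0.44 (fair to good)
```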
Two measures of reliability for continuous data
- Correlation coefficients compare two sets of paired measurements (e.g., test and retest scores)
- Pearson's r
-  - assesses linear association between 2 sets of observations
-  - sensitive to the range of values, and especially to outliers
- Spearman's r (rho)
-  - ordinal or rank-order correlation
-  - less influenced by outliers
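A minimal pure-Python sketch contrasting the two coefficients on made-up test-retest scores for five subjects (helper names are hypothetical; in practice a statistics package would be used). One retest value is replaced by an extreme value that keeps the same rank: Pearson's r changes, Spearman's r does not.

```python
import math


def pearson(x, y):
    """Pearson's r: linear association between two paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's r: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    test = [1, 2, 3, 4, 5]
    retest = [2, 1, 4, 3, 5]
    retest_outlier = [2, 1, 4, 3, 50]  # last value extreme, but still the largest
    print(f"{pearson(test, retest):.2f} {spearman(test, retest):.2f}")                  # 0.80 0.80
    print(f"{pearson(test, retest_outlier):.2f} {spearman(test, retest_outlier):.2f}")  # 0.73 0.80
```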
Intraclass correlation coefficient (ICC)
- Analogous to kappa, with the same range of values (0.0-1.0)
- Reflects true agreement, including systematic differences: in its agreement forms, the ICC is lowered when one set of ratings is consistently higher than another, whereas Pearson's r is not
- Assesses reliability by comparing the variability of different ratings of the same subject to the total variation across all ratings and all subjects
- Estimates the proportion of total measurement variability that is due to differences between individuals (vs. error variance)
- Interpretation: an ICC of 0.88 means that 88% of the variation in scores reflects true variance between subjects
- Affected by the range of values: if there is less variation between individuals, the ICC will be lower
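Several ICC forms exist (one-way vs. two-way models, consistency vs. absolute agreement), and the slide does not say which one is meant. The sketch below assumes the simplest, the one-way random-effects ICC for single ratings (Shrout and Fleiss ICC(1,1)), computed from ANOVA mean squares; the function name is hypothetical. In the made-up example, rater B always scores 2 points higher than rater A, which would give a Pearson's r of 1.0 but an ICC of about 0.74.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (Shrout & Fleiss ICC(1,1)).

    ratings: one row per subject, each row holding the k ratings of that subject.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # between-subject mean square: spread of the subject means around the grand mean
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # within-subject mean square: disagreement among the ratings of the same subject
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, subject_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    rater_a = [2, 4, 6, 8]
    rater_b = [4, 6, 8, 10]  # always 2 points higher than rater A
    print(f"{icc_oneway([[a, b] for a, b in zip(rater_a, rater_b)]):.2f}")  # 0.74
    print(f"{icc_oneway([[a, a] for a in rater_a]):.2f}")                   # 1.00
```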
The Fagerström Test for Nicotine Dependence in a Dutch sample of daily smokers and ex-smokers
J. M. Vink, G. Willemsen, A. Beem, D. Boomsma
Abstract: We explored the performance of the Fagerström Test for Nicotine Dependence (FTND) in a sample of 1378 daily smokers and 1058 ex-smokers who participated in a survey study of the Netherlands Twin Register. FTND scores were higher for smokers than for ex-smokers. Nicotine dependence level was not associated with age. FTND score was highly correlated with the maximum number of cigarettes smoked (even after excluding the item "number of cigarettes per day" from the FTND), but the FTND score showed a low correlation with age of first cigarette and total number of years smoked. In a subsample of smokers (n = 143) and ex-smokers (n = 181) the test-retest correlations for the FTND were high. In general, the performance of the FTND in ex-smokers was comparable with that in smokers. These findings suggest the FTND to be a valuable tool for studies of nicotine dependence in large epidemiological samples.
In the test-retest sample, the mean FTND score of the first measurement was not significantly different from the mean FTND score at the second measurement occasion. The test-retest correlations (Pearson-Lawley correction) were .70 for male smokers, .83 for female smokers, .91 for male ex-smokers, and .83 for female ex-smokers. These correlations did not differ much from the regular Pearson product-moment correlations (.72 for male smokers, .85 for female smokers, .92 for male ex-smokers, and .86 for female ex-smokers).
To improve reliability
- Increase the number of items in a scale 
- Increase the number of response choices for each 
 item
- Reduce inter-observer variation through training 
 of interviewers, use of standardized protocols
- Reduce ambiguity in questions 
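The effect of adding items can be quantified with the Spearman-Brown prophecy formula (not on the slide), which predicts the reliability of a scale lengthened by a factor m: r_new = m r / (1 + (m - 1) r). A minimal sketch, assuming the added items are comparable to the existing ones; the function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying the number of items by length_factor.

    Assumes the added items are comparable (parallel) to the existing ones.
    """
    r, m = reliability, length_factor
    return m * r / (1 + (m - 1) * r)


def length_factor_needed(reliability: float, target: float) -> float:
    """How many times longer the scale must be to reach the target reliability."""
    r = reliability
    return target * (1 - r) / (r * (1 - target))


if __name__ == "__main__":
    print(f"{spearman_brown(0.60, 2):.2f}")           # 0.75: doubling a scale with reliability 0.60
    print(f"{length_factor_needed(0.60, 0.80):.2f}")  # 2.67: times as many items to reach 0.80
```

The gain shrinks as reliability rises, so lengthening a scale is most useful when its reliability is low.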
Validity
- An expression of the degree to which a 
 measurement measures what it purports to measure.
 Does it measure what it is intended to?
- Types
-  - Face, content
-  - Criterion (concurrent (convergent), predictive)
-  - Construct (discriminant, convergent)
-  - Responsiveness
- Depends on purpose
-  - Developing a new scale - content validity
-  - Screening - discriminant construct validity
-  - Outcome of treatment - responsiveness, sensitivity to change
-  - Prognosis - predictive validity
Content and face validity
- Judgment of experts and/or members of target 
 population
- Face validity - extent to which, on the face of it, the measurement appears to be measuring the desired qualities (eyeball test)
- Content validity - extent to which the measurement incorporates all the relevant content or domains of the construct under study
- Content can be developed through literature reviews, interviews with the target population, focus groups, and review of existing instruments
Criterion validity
- Extent to which a measure correlates with an 
 external criterion (gold standard)
- Convergent (concurrent) criterion validity - correlation between the measurement of interest and another measure known to measure the same concept, with both measures taken at the same time
-  - correlations of 0.4-0.8
-  - screening test vs. diagnostic test 
- Predictive criterion validity - ability of the measure to predict the criterion
-  - cancer staging test vs. 5-year survival
Construct validity
- Is the theoretical construct underlying the 
 measure valid?
- Development and testing of hypotheses 
- Requires multiple data sources and investigations 
- - convergent validity - the measure is correlated with other measures of similar constructs (e.g., food frequency questionnaire and food records; Fagerström score and saliva cotinine)
- - discriminant validity - the measure is not correlated with measures of different constructs (e.g., Fagerström score not correlated with depression)
Table 1. Correlation among FTND scores of daily smokers and ex-smokers and other smoking variables [table not reproduced]
All correlations are significant at the P < .05 level.
Response bias
- Tendency to respond in a particular way or style 
 to items on a scale that yields systematic error
- Recall bias - systematic error due to differences in the accuracy or completeness of recall of past events or experiences
- Acquiescence bias - tendency to agree with statements of opinion, regardless of their content
- Social desirability - tendency to respond in a way that is perceived to be more socially desirable than the true response
Factors affecting response
- Question wording/response scale 
- Characteristics of subjects (age, sex, 
 education)
- Method of data collection (questionnaire, interview; telephone vs. face-to-face)
- Training of interviewers 
Responsiveness
- Ability of measure to detect clinically 
 important change over time or differences between
 treatments
- Sensitivity to change 
- Important when testing the effectiveness of an intervention
Translation
- Not a simple matter
- Double back translation 
- Need to retest validity and reliability in target 
 population
Ask yourself
- How will you measure the outcome? Exposures? 
 Confounders?
- Are your measures reliable? In the population you 
 will target? How was reliability established?
- Is there any evidence that your measures are 
 valid? In the population you will target? How was
 validity established?
True/False