Class 5 Additional Psychometric Characteristics: Validity and Bias, Responsiveness, Sensitivity to Change

1
Class 5: Additional Psychometric Characteristics
Validity and Bias, Responsiveness, Sensitivity to Change
October 16, 2008

Anita L. Stewart
Institute for Health & Aging
University of California, San Francisco

2
Overview

Validity
Including bias
How bias affects validity
Responsiveness, sensitivity to change
Meaningfulness of change

3
Validity

Does a measure (or instrument) measure what it is
supposed to measure?
And: Does a measure NOT measure what it is NOT
supposed to measure?

4
Valid Scale? No!

There is no such thing as a valid scale
We accumulate evidence of validity in a variety
of populations in which it has been tested
Similar to reliability

5
Validation of Measures is an Iterative, Lengthy
Process

Accumulation of evidence
Different samples
Longitudinal designs

6
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
7
Content Validity

Relevant when writing items
Extent to which a set of items represents the
defined concept

8
Relevance of Content Validity to Selecting
Measures

Conceptual adequacy
Does candidate measure represent adequately the
concept YOU are intending to measure

9
Content Validity Appropriate at Two Levels

Battery or instrument: Are all relevant domains
represented in an instrument?
Measure: Are all aspects of a defined concept
represented in the items of a scale?

10
Example of Content Validity of Instrument

You are studying health-related quality of life
(HRQL) in clinical depression
Your HRQL concept includes sleep problems,
ability to work, and social functioning
SF-36 - a candidate
Missing sleep problems

11
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
12
Criterion Validity

How well a measure correlates with another
measure considered to be an accepted standard
(criterion)
Can be
Concurrent
Predictive

13
Criterion Validity of Self-reported Health Care
Utilization

Compare self-report with objective data
(computer records of utilization)
MD visits past 6 months (self-report)
correlated .64 with computer records
hospitalizations past 6 months (self-report)
correlated .74 with computer records

Ritter PL et al., J Clin Epid, 2001;54:136-141
14
Criterion Validity of Screening Measure

Develop depression screening tool to identify
persons likely to have disorder
Do clinical assessment only on those who screen
'likely'
Criterion validity
Extent to which the screening tool detects
(predicts) those with disorder
sensitivity and specificity, ROC curves
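
As a minimal sketch of how sensitivity and specificity could be computed for a screening cutoff, the example below compares screening scores with a clinical diagnosis. The scores, diagnoses, cutoffs, and the function name `sens_spec` are all hypothetical and are not taken from the slides. Repeating the calculation over every possible cutoff gives the points of an ROC curve.

```python
def sens_spec(scores, has_disorder, cutoff):
    """Return (sensitivity, specificity) when 'score >= cutoff' means screen positive."""
    pairs = list(zip(scores, has_disorder))
    true_pos = sum(1 for s, d in pairs if s >= cutoff and d)
    false_neg = sum(1 for s, d in pairs if s < cutoff and d)
    false_pos = sum(1 for s, d in pairs if s >= cutoff and not d)
    true_neg = sum(1 for s, d in pairs if s < cutoff and not d)
    sensitivity = true_pos / (true_pos + false_neg)   # share of cases detected
    specificity = true_neg / (true_neg + false_pos)   # share of non-cases cleared
    return sensitivity, specificity

# Hypothetical screening scores and clinical diagnoses (the criterion).
scores = [2, 3, 4, 5, 6, 7, 9, 11, 12, 14]
has_disorder = [False, False, False, False, True, False, True, True, True, True]

# A lower cutoff catches more cases but also flags more non-cases.
for cutoff in (6, 10):
    sens, spec = sens_spec(scores, has_disorder, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up data, lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off an ROC curve displays.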

15
Criterion Validity of Measure to Predict Outcome

If goal is to predict health or other outcome
Extent to which the measure predicts the outcome
Example: Develop self-reported war-related stress
measure to identify vets at risk of PTSD
How well does it predict subsequent PTSD? (Vogt et
al., 2004, readings)

16
Interpreting Validity Coefficients

Magnitude and conformity to hypothesis are
important, not statistical significance
Nunnally: validity coefficients rarely exceed .30
to .40, which may be adequate (1994, p. 99)
McDowell and Newell: typically between 0.40 and
0.60 (1996, p. 36)
Max correlation between 2 measures = square root
of product of their reliabilities
2 scales with .70 reliabilities: max correlation
.70
Correlation of .60 would be high
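
The ceiling on a validity coefficient can be checked with a few lines of arithmetic. The sketch below reproduces the slide's example (two scales with reliabilities of .70). It also shows the classical correction for attenuation, which is not on the slide and is included only to illustrate why .60 counts as high. Function names are illustrative.

```python
import math

def max_correlation(reliability_x, reliability_y):
    """Upper bound on the observed correlation between two measures."""
    return math.sqrt(reliability_x * reliability_y)

def disattenuated(observed_r, reliability_x, reliability_y):
    """Classical correction for attenuation (an addition, not from the slide)."""
    return observed_r / max_correlation(reliability_x, reliability_y)

ceiling = max_correlation(0.70, 0.70)
print(f"maximum observable correlation: {ceiling:.2f}")
print(f"observed .60 relative to that ceiling: {disattenuated(0.60, 0.70, 0.70):.2f}")
```

An observed correlation of .60 is about 86% of the largest value these two scales could show, which is why it would be considered high.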

17
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
18
Construct Validity Basics

Does measure relate to other measures in
hypothesized ways?
Do measures behave as expected?
3-step process
State hypothesis: direction and magnitude
Calculate correlations
Do results confirm hypothesis?
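
The 3-step process can be sketched as follows. The scores are made up, and the range used for a "moderate" correlation (.30 to .60) is an assumption chosen for illustration, not a threshold given in the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def hypothesis_supported(r, direction, low, high):
    """True if r has the hypothesized sign and |r| falls within [low, high]."""
    return r * direction > 0 and low <= abs(r) <= high

# Step 1: state the hypothesis (positive direction, moderate magnitude).
direction, low, high = +1, 0.30, 0.60   # bounds are an assumption

# Step 2: calculate the correlation (hypothetical scores for 8 respondents).
depression = [4, 9, 12, 6, 15, 8, 11, 3]
psychosocial_problems = [5, 4, 9, 8, 7, 3, 10, 6]
r = pearson_r(depression, psychosocial_problems)

# Step 3: do the results confirm the hypothesis?
print(f"r = {r:.2f}, hypothesis supported: {hypothesis_supported(r, direction, low, high)}")
```

Stating the direction and magnitude before calculating keeps the judgment about support from drifting toward whatever correlation happens to be observed.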

19
Source of Hypotheses in Construct Validity

Prior literature in which associations between
constructs have been observed
e.g., other samples, with other measures of
constructs you are testing
Theory, that specifies how constructs should be
related
Clinical experience

20
Who Tests for Validity?

When measure is being developed, investigators
should test construct validity
As measure is applied, results of other studies
provide information that can be used as evidence
of construct validity

21
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
22
Convergent Validity

Hypotheses stated as expected direction and
magnitude of correlations
We expect X measure of depression to be
positively and moderately correlated with two
measures of psychosocial problems
The higher the depression, the higher the level
of problems on both measures

23
Testing Validity of Expectations Regarding Aging
Measure

Hypothesis 1: ERA-38 total score would correlate
moderately with ADLs, PCS, MCS, depression,
comorbidity, and age
Hypothesis 2: Functional independence scale would
show strongest associations with ADLs, PCS, and
comorbidity

Sarkisian CA et al. Gerontologist. 2002;42:534
24
Testing Validity of Expectations Regarding Aging
Measure

Hypothesis 1: ERA-38 total score would correlate
moderately with ADLs, PCS, MCS, depression,
comorbidity, and age (convergent)
Hypothesis 2: Functional independence scale would
show strongest associations with ADLs, PCS, and
comorbidity

Sarkisian CA et al. Gerontologist. 2002;42:534
25
ERA-38 Convergent Validity Results: Hypothesis 1
26
ERA-38 Non-Supporting Convergent Validity Results
27
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
28
Discriminant Validity: Known Groups

Does the measure distinguish between groups
known to differ in concept being measured?
Tests for mean differences between groups

29
Example of a Known Groups Validity Hypothesis

Among three groups
General population
Patients visiting providers
Patients in a public health clinic
Hypothesis: scores on functioning and well-being
measures will be the best in a general population
and the worst in patients in a public health
clinic

30
Mean Scores on MOS 20-item Short Form in Three
Groups

                     General      MOS        Public health
                     population   patients   patients
Physical function    91           78         50
Role function        88           78         39
Mental health        78           73         59
Health perceptions   74           63         41

Bindman AB et al., Med Care 1990;28:1142
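
As a small illustration, the ordering predicted by the known-groups hypothesis can be checked directly against the mean scores on this slide. This sketch only confirms the direction of the differences. A formal test of mean differences would need the individual-level data, which the slide does not provide.

```python
# Mean scores from the slide: (general population, MOS patients, public health patients).
mos_means = {
    "Physical function": (91, 78, 50),
    "Role function": (88, 78, 39),
    "Mental health": (78, 73, 59),
    "Health perceptions": (74, 63, 41),
}

def ordered_as_hypothesized(means):
    """True if each group scores strictly higher than the next group listed."""
    return all(a > b for a, b in zip(means, means[1:]))

results = {scale: ordered_as_hypothesized(m) for scale, m in mos_means.items()}
for scale, ok in results.items():
    print(f"{scale}: ordering matches hypothesis = {ok}")
```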

31
PedsQL Known Groups Validity

Hypothesis: PedsQL scores would be lower in
children with a chronic health condition than
without

JW Varni et al. PedsQL 4.0: Reliability and
Validity of the Pediatric Quality of Life
Inventory ..., Med Care, 2001;39:800-812.
32
Types of Measurement Validity

Content
Criterion
Construct
Convergent
Discriminant
Convergent/discriminant

All can be: Concurrent or Predictive
33
Convergent/Discriminant Validity

Does measure correlate lower with measures it is
not expected to be related to than to measures
it is expected to be related to?
The extent to which the pattern of correlations
conforms to hypothesis is confirmation of
construct validity

34
Basis for Convergent/Discriminant Hypotheses

All measures of health will correlate to some
extent
Hypothesis is of relative magnitude

35
Example of Convergent/Discriminant Validity
Hypothesis

Expected pattern of relationships
A measure of physical functioning is
hypothesized to be more highly related to a
measure of mobility than to a measure of
depression

36
Example of Convergent/Discriminant Validity
Evidence

Pearson correlation
                       Mobility   Depression
Physical functioning   .57        .25
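
Because the hypothesis concerns relative magnitude, the check is a comparison between correlations rather than a test of any single coefficient. The sketch below applies that comparison to the two correlations on this slide. The function name and the dictionary layout are illustrative assumptions.

```python
def pattern_supported(correlations, expected_stronger):
    """True if every 'expected stronger' correlate exceeds every other correlate in |r|."""
    strong = [abs(r) for name, r in correlations.items() if name in expected_stronger]
    weak = [abs(r) for name, r in correlations.items() if name not in expected_stronger]
    return min(strong) > max(weak)

# Correlations of the physical functioning measure, taken from the slide.
physical_functioning = {"Mobility": 0.57, "Depression": 0.25}

print(pattern_supported(physical_functioning, expected_stronger={"Mobility"}))
```

This only compares the size of the two coefficients. Whether the difference between them is statistically reliable would require the sample size and the mobility-depression correlation, neither of which is given.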

37
Testing Validity of Expectations Regarding Aging
Measure

Hypothesis 1: ERA-38 total score would correlate
moderately with ADLs, PCS, MCS, depression,
comorbidity, and age (convergent)
Hypothesis 2: Functional independence scale would
show strongest associations with ADLs, PCS, and
comorbidity (convergent/discriminant)

Sarkisian CA et al. Gerontologist. 2002;42:534
38
ERA-38 Convergent/Discriminant Validity Results:
Hypothesis 2
39
ERA-38 Non-Supporting Validity Results
40
Construct Validity Thoughts: Lee Sechrest

There is no point at which construct validity is
established
It can only be established incrementally
Our attempts to measure constructs help us better
understand and revise these constructs

Sechrest L, Health Serv Res, 2005;40(5 part II):1596
41
Construct Validity Thoughts: Lee Sechrest (cont.)

An impression of construct validity emerges from
examining a variety of empirical results that
together make a compelling case for the assertion
of construct validity

42
Construct Validity Thoughts: Lee Sechrest (cont.)

Because of the wide range of constructs in the
social sciences, many of which cannot be exactly
defined ...
once measures are developed and in use, we must
continue efforts to understand them and their
relationships to other measured variables.

43
Overview

Validity
Including bias
Responsiveness, sensitivity to change
Meaningfulness of change

44
Components of an Individual's Observed Item Score
(from Class 3)

Observed item score = true score + random error
+ systematic error

45
Random versus Systematic Error

Observed item score = true score + random error
+ systematic error

Random error: relevant to reliability
Systematic error: relevant to validity
46
Bias is Systematic Error

Affects validity of scores
If scores contain systematic error, cannot know
the true mean score
Will obtain an observed score that is either
systematically higher or lower than the true
score
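
A short simulation can illustrate the difference between the two kinds of error. In this sketch every respondent has the same hypothetical true score of 50. Random error is drawn from a normal distribution, and systematic error is a constant shift. All values are invented for illustration.

```python
import random
import statistics

def simulate_observed(true_scores, random_sd, bias, seed=0):
    """Observed score = true score + random error + systematic error (bias)."""
    rng = random.Random(seed)  # fixed seed so both runs share the same random errors
    return [t + rng.gauss(0, random_sd) + bias for t in true_scores]

true_scores = [50.0] * 5000  # hypothetical: everyone has the same true score

random_only = simulate_observed(true_scores, random_sd=10, bias=0)
with_bias = simulate_observed(true_scores, random_sd=10, bias=-5)

# Random error averages out across respondents; systematic error does not.
print(f"mean with random error only: {statistics.mean(random_only):.1f}")
print(f"mean with added bias of -5:  {statistics.mean(with_bias):.1f}")
```

The mean with only random error stays close to the true mean, while the biased scores sit about 5 points lower no matter how many respondents are added.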

47
Bias or Systematic Error?

Bias implies that the direction of error is known
Systematic error: direction neutral
Same error applies to entire sample

48
Sources of Bias in Observed Scores of
Individuals

Respondent
Socially desirable responding
Acquiescent response bias
Cultural beliefs (e.g., not reporting distress)
Halo effects
Observer
Belief that respondent is ill
Instrument

49
Socially Desirable Responding

Tendency to respond in socially desirable ways to
present oneself favorably
Observed score is consistently lower or higher
than true score in the direction of a more
socially acceptable score

50
Socially Desirable Response Set: Looking Good

After coming up with an answer to a question,
respondent screens the answer
Will this make the person like me less?
May edit their answer to be more desirable
Example: a woman has 2 drinks of alcohol a day,
but responds that she drinks a few times a week
Systematic underreporting of risk behavior

51
Ways to Minimize Socially Desirable Responding

Write items to increase acceptability of an
undesirable response
Instead of
Have you followed your doctor's
recommendations?
Use
Have you had any of the following problems
following your doctor's recommendations?

52
Example of Bias Due to Cultural Norms or Beliefs

A person feels sad most of the time
Unwilling to admit this to the interviewer, so
answers "a little of the time"
Not culturally appropriate to admit to negative
feelings
Always present a positive personality
Observed response reflects less sadness than
true sadness of respondent

53
Acquiescent Response Set - Yea Saying

Tendency to
agree with statements regardless of content
give positive response such as yes, true,
satisfied
Extent and nature of bias depends on direction of
wording of the questions
Minimizing acquiescence
Include positively- and negatively-worded items
in the same scale
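
To see why mixing item directions helps, the sketch below scores a hypothetical 4-item scale in which two items are negatively worded and therefore reverse-scored. The item names, the 1-5 agreement format, and the scoring functions are assumptions for illustration, not part of the slides.

```python
def reverse_score(response, min_val=1, max_val=5):
    """Flip a response so that a high value always means more of the concept."""
    return min_val + max_val - response

def scale_score(responses, negatively_worded, min_val=1, max_val=5):
    """Average item score after reverse-scoring the negatively worded items."""
    scored = [
        reverse_score(value, min_val, max_val) if item in negatively_worded else value
        for item, value in responses.items()
    ]
    return sum(scored) / len(scored)

# Hypothetical respondent who agrees strongly (5) with every statement.
yea_sayer = {"calm": 5, "happy": 5, "nervous": 5, "downhearted": 5}
negatively_worded = {"nervous", "downhearted"}

print(scale_score(yea_sayer, negatively_worded))  # (5 + 5 + 1 + 1) / 4 = 3.0
```

With a balanced scale, a respondent who agrees with everything lands at the midpoint rather than at the top, so the acquiescence no longer inflates the score in one direction.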

54
Discrepancies in Various Information Sources:
Bias or Different Perspectives?

In reporting on a patient's well-being
Patients report highest levels
Clinicians report levels in the middle
Family members report the lowest levels
No way to know which is the true score
To say one score is biased implies another one
is the true score

55
Overview

Validity
Including bias
Responsiveness, sensitivity to change
Meaningfulness of change

56
Two Meanings of Sensitivity and Responsiveness
to Change

Measure able to detect true changes
One knows how much change is meaningful
regardless of statistical significance
change scores are interpretable in terms of
meaningfulness

57
Sensitivity to Change: Detects True Change

Sensitive to true differences or changes in the
attribute being measured
Sensitive enough to measure differences in
outcomes that might be expected given the
relative effectiveness of treatments
Ability of a measure to detect change
statistically

58
Instrument has Potential Distribution of Scores
to Detect Change

Evidence of good variability in sample like yours
(at baseline)
Room to improve
Multi-item scales: many scale levels

59
Importance of Sensitivity

Need to know measure can detect change if
planning to use it as outcome of intervention
Approaches for testing sensitivity are often
simultaneous tests of
effectiveness of an intervention
sensitivity or responsiveness of measures

60
Considerations in Developing CHAMPS Physical
Activity (PA) Questionnaire

Needed outcome measure to detect changes in PA
due to CHAMPS intervention
increase PA levels in everyday life (e.g.,
walking, stretching) in activities of their
choice
Existing measures designed to capture younger
persons' PA

Stewart AL et al. Med Sci Sports Exerc,
2001;33:1126-1141.
61
Changes in Measure Resulting from Intervention:
Validity Evidence for Others

After intervention detected PA change, others
used our results as evidence of sensitivity to
change
Used in Project ACTIVE because of its
sensitivity to change in CHAMPS (S Wilcox et al,
Am J Pub Health, 2006;96:1201-1209)
Changes in a depression measure in a drug trial
are evidence that the measure is capable of
detecting change in another study

62
Measuring Sensitivity

Score is stable in those who are not changing
Score changes in those who are actually changing
(true change)
Not easy to quantify
can administer multiple measures of same concept
in intervention
see which measures change the most
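
One common way to compare how much different measures change is to standardize the mean change. The slides do not name a specific index, so the two shown here (the effect size and the standardized response mean) are offered as widely used options, and the baseline and follow-up scores are hypothetical.

```python
import statistics

def effect_size(baseline, followup):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return statistics.mean(changes) / statistics.stdev(baseline)

def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical scores for 5 participants before and after an intervention.
baseline = [10, 12, 14, 16, 18]
followup = [13, 14, 18, 19, 23]

print(f"effect size: {effect_size(baseline, followup):.2f}")
print(f"standardized response mean: {standardized_response_mean(baseline, followup):.2f}")
```

Computing the same index for each candidate measure in one intervention sample puts them on a common scale, so the measure with the larger value is the one that changed the most.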

63
Responsiveness to Change

Used DSM-IV criteria to classify patients who had
major depression at earlier time into
Persistent depression
Partial remission
Full remission
Examined PHQ-9 change scores in relation to these
criteria
PHQ-9: a short screener for depression

Löwe B et al. Med Care, 2004;42:1194-1201
64
Changes in PHQ-9 Scores by Criteria of Change in
Depression
Löwe et al, 2004, p. 1200
65
Relevant or Meaningful Change