Title: Assessing Intervention Fidelity in RCTs: Models, Methods and Modes of Analysis
1Assessing Intervention Fidelity in RCTs: Models, Methods and Modes of Analysis
- David S. Cordray & Chris Hulleman
- Vanderbilt University
- Presentation for the IES Research Conference
- Washington, DC
- June 9, 2009
2Overview
- Fidelity and Achieved Relative Strength
- Definitions, distinctions and illustrations
- Conceptual foundation for assessing fidelity in RCTs
- Achieved relative strength, a special case in RCTs
- Modes of analysis
- Approaches and challenges
- Chris Hulleman -- Assessing implementation fidelity and achieved relative strength indices: The single core component case
- Questions and discussion
3Distinguishing Implementation Assessment from the Assessment of Implementation Fidelity
- Two ends on a continuum of intervention implementation/fidelity
- A purely descriptive model
- Answering the question: What transpired as the intervention was put in place (implemented)?
- Based on an a priori intervention model, with explicit expectations about implementation of program components
- Fidelity is the extent to which the realized intervention (tTx) is faithful to the pre-stated intervention model (TTx)
- Infidelity = TTx - tTx
- Most implementation fidelity assessments involve descriptive and model-based approaches.
4Dimensions of Intervention Fidelity
- Aside from agreement at the extremes, there is little consensus on what is meant by the term "intervention fidelity."
- Most frequent definitions
- True Fidelity: Adherence or compliance
- Program components are delivered/used/received, as prescribed
- With a stated criterion for success or full adherence
- The specification of these criteria is relatively rare
- Intervention Exposure
- Amount of program content, processes, activities delivered/received by all participants (aka receipt, responsiveness)
- This notion is most prevalent
- Intervention Differentiation
- The unique features of the intervention are distinguishable from other programs, including the control condition
- A unique application within RCTs
5Linking Intervention Fidelity Assessment to Contemporary Models of Causality
- Rubin's Causal Model
- True causal effect of X is (YiTx - YiC)
- RCT methodology is the best approximation to this true effect
- In RCTs, the difference between conditions, on average, is the causal effect
- Fidelity assessment within RCTs entails examining the difference between causal components in the intervention and control conditions.
- Differencing causal conditions can be characterized as achieved relative strength of the contrast.
- Achieved Relative Strength (ARS) = tTx - tC
- ARS is a default index of fidelity
6Expected Relative Strength = (0.40 - 0.15) = 0.25
7Why is this Important?
- Statistical Conclusion Validity
- Unreliability of Treatment Implementation: variations across participants in the delivery/receipt of the causal variable (e.g., treatment). This increases error and reduces the size of the effect, which decreases the chances of detecting covariation.
- Resulting in a reduction in statistical power or the need for a larger study.
8The Effects of Structural Infidelity on Power
[Figure: statistical power shown for fidelity levels of .60, .80 and 1.0]
9Influence of Infidelity on Study-size
[Figure: required study size shown for fidelity levels of 1.0, .80 and .60]
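The relationship between fidelity, power and study size can be roughed out numerically. The sketch below is illustrative only: it assumes that the realized effect size shrinks in proportion to fidelity (realized ES = fidelity x intended ES), uses a normal approximation for a simple two-group comparison, and takes a hypothetical intended effect size of 0.40. None of these values or function names come from the presentation.

```python
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group comparison."""
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _NORMAL.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

def power_at_n(effect_size, n, alpha=0.05):
    """Approximate power with n per group (opposite tail ignored)."""
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)
    return _NORMAL.cdf(effect_size * (n / 2) ** 0.5 - z_alpha)

INTENDED_ES = 0.40  # hypothetical effect size under full fidelity
planned_n = n_per_group(INTENDED_ES)

results = {}
for fidelity in (1.0, 0.80, 0.60):
    # Assumption: the effect attenuates linearly with fidelity.
    realized_es = fidelity * INTENDED_ES
    results[fidelity] = {
        "power_at_planned_n": power_at_n(realized_es, planned_n),
        "n_ratio": n_per_group(realized_es) / planned_n,
    }
    print(f"fidelity={fidelity:.2f}  "
          f"power at planned n={results[fidelity]['power_at_planned_n']:.2f}  "
          f"required n multiplier={results[fidelity]['n_ratio']:.2f}")
```

Under the linear-attenuation assumption the required sample size grows with 1 / fidelity squared, so a study planned for full fidelity is underpowered as soon as implementation slips.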
10If That Isn't Enough...
- Construct Validity
- Which is the cause? (TTx - TC) or (tTx - tC)
- Poor implementation: essential elements of the treatment are incompletely implemented.
- Contamination: the essential elements of the treatment group are found in the control condition (to varying degrees).
- Pre-existing similarities between T and C on intervention components.
- External validity: generalization is about (tTx - tC)
- This difference needs to be known for proper generalization and future specification of the intervention components
11So what is the cause? The achieved relative
difference in conditions across components
12Review: Concepts and Definitions
[Figure: Treatment Strength scale (.00 to .45) plotted against Outcome (50 to 100), marking TTx, tTx, tC and TC. Labels: Positive Infidelity, True Fidelity, Intervention Exposure, Intervention Differentiation, Tx Contamination, Augmentation of C. Achieved Relative Strength = .15]
13Some Sources and Types of Infidelity
- If delivery or receipt could be dichotomized (yes or no)
- Simple fidelity involves compliers
- Simple infidelity involves "no shows" and cross-overs.
- Structural flaws in implementing the intervention
- Missing or incomplete resources, processes
- External constraints (e.g., snow days)
- Incomplete delivery of core intervention components
- Implementer failures or incomplete delivery
14A Tutoring Program: Variation in Exposure
- 4-5 tutoring sessions per week, 25 minutes each, 11 weeks
- Expectations: 44-55 sessions
[Figure: Random Assignment of Students, with tutoring cycles laid out over time]
Cycle | Average Sessions Delivered | Range
Cycle 1 | 47.7 | 16-56
Cycle 2 | 33.1 | 12-42
Cycle 3 | 31.6 | 16-44
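One way to read the tutoring numbers is to express delivered sessions as a share of the expected dose. The sketch below assumes the lower bound of the expected range (44 sessions) as the benchmark; that choice, and the function name, are illustrative rather than taken from the presentation.

```python
EXPECTED_MIN, EXPECTED_MAX = 44, 55  # expected sessions (4-5 per week x 11 weeks)

# Average sessions delivered and observed range, per tutoring cycle.
cycles = {
    "Cycle 1": {"mean": 47.7, "low": 16, "high": 56},
    "Cycle 2": {"mean": 33.1, "low": 12, "high": 42},
    "Cycle 3": {"mean": 31.6, "low": 16, "high": 44},
}

def exposure_ratio(sessions, benchmark=EXPECTED_MIN):
    """Delivered sessions as a proportion of the benchmark dose (assumed: lower bound)."""
    return sessions / benchmark

ratios = {name: exposure_ratio(c["mean"]) for name, c in cycles.items()}

for name, c in cycles.items():
    status = "meets" if c["mean"] >= EXPECTED_MIN else "falls short of"
    print(f"{name}: {ratios[name]:.0%} of benchmark; average {status} the expected range; "
          f"lowest student received {exposure_ratio(c['low']):.0%}")
```

On this reading only the first cycle reaches the expected dose on average, and in every cycle some students received well under half of it.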
15Variation in Exposure: Tutor Effects
The other fidelity question: How faithful to the tutoring model is each tutor?
16In Practice...
- Identify core components in the intervention group
- e.g., via a Model of Change
- Establish benchmarks (if possible) for TTx and TC
- Measure core components to derive tTx and tC
- e.g., via a Logic Model based on the Model of Change
- Measurement (deriving indicators)
- Converted to Achieved Relative Strength and implementation fidelity scales
- Incorporated into the analysis of effects
17What do we measure?
- What are the options?
- (1) Essential or core components (activities, processes)
- (2) Necessary, but not unique, activities, processes and structures (supporting the essential components of T) and
- (3) Ordinary features of the setting (shared with the control group)
- Focus on 1 and 2.
18Fidelity Assessment Starts With a Model or
Framework for the Intervention
From Gamse et al. 2008
19Core Reading Components for Local Reading First
Programs
Design and Implementation of Research-Based
Reading Programs
Use of research-based reading programs,
instructional materials, and assessment, as
articulated in the LEA/school application
1) Teacher use of instructional strategies and content based on five essential components of reading instruction
2) Use of assessments to diagnose student needs and measure progress
3) Classroom organization and supplemental services and materials that support five essential components
Teacher professional development in the use of
materials and instructional approaches
After Gamse et al. 2008
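20From Major Components to Indicators
[Diagram: Major Components, broken down into Sub-components, Facets and Indicators]
- Major Components: Reading Instruction; Support for Struggling Readers; Assessment; Professional Development
- Sub-components (of Reading Instruction): Instructional Time; Instructional Material; Instructional Activities/Strategies
- Facets (of Instructional Time): Block; Actual Time
- Indicators: Scheduled block?; Reported time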
21Reading First Implementation: Specifying Components and Operationalization
Components | Sub-components | Facets | Indicators (I/F)
Reading Instruction | Instructional Time | 2 | 2 (1)
Reading Instruction | Instructional Materials | 4 | 12 (3)
Reading Instruction | Instructional Activities/Strategies | 8 | 28 (3.5)
Support for Struggling Readers (SR) | Intervention Services | 3 | 12 (4)
Support for Struggling Readers (SR) | Supports for Struggling Readers | 2 | 16 (8)
Support for Struggling Readers (SR) | Supports for ELL/SPED | 2 | 5 (2.5)
Assessment | Selection/Interpretation | 5 | 12 (2.4)
Assessment | Types of Assessment | 3 | 9 (3)
Assessment | Use by Teachers | 1 | 7 (7)
Professional Development | Improved Reading Instruction | 11 | 67 (6.1)
Total: 4 | 10 | 41 | 170 (4)
Adapted from Moss et al. 2008
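The I/F column is the number of indicators per facet, and the totals can be checked directly from the rows. A minimal sketch that recomputes them from the table above (the variable names are illustrative):

```python
# (component, sub-component, facets, indicators), copied from the table.
rows = [
    ("Reading Instruction", "Instructional Time", 2, 2),
    ("Reading Instruction", "Instructional Materials", 4, 12),
    ("Reading Instruction", "Instructional Activities/Strategies", 8, 28),
    ("Support for Struggling Readers", "Intervention Services", 3, 12),
    ("Support for Struggling Readers", "Supports for Struggling Readers", 2, 16),
    ("Support for Struggling Readers", "Supports for ELL/SPED", 2, 5),
    ("Assessment", "Selection/Interpretation", 5, 12),
    ("Assessment", "Types of Assessment", 3, 9),
    ("Assessment", "Use by Teachers", 1, 7),
    ("Professional Development", "Improved Reading Instruction", 11, 67),
]

# Indicators per facet (I/F) for each sub-component.
indicators_per_facet = {sub: ind / fac for _, sub, fac, ind in rows}

n_components = len({component for component, *_ in rows})
n_subcomponents = len(rows)
total_facets = sum(fac for _, _, fac, _ in rows)
total_indicators = sum(ind for _, _, _, ind in rows)

for sub, ratio in indicators_per_facet.items():
    print(f"{sub}: {ratio:.1f} indicators per facet")
print(n_components, n_subcomponents, total_facets, total_indicators,
      round(total_indicators / total_facets, 1))
```

The ratio shows how unevenly the measurement effort is spread: some facets rest on a single indicator while others are covered by seven or eight.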
22Reading First Implementation: Some Results
Components | Sub-components | Performance Level: RF | Performance Level: Non-RF | ARSI (U3)
Reading Instruction | Instructional Time (minutes) | 101 | 78 | 0.33 (63)
Reading Instruction | Support | 79 | 58 | 0.50 (69)
Struggling Readers | More Tx, Time, Supplemental Service | 83 | 74 | 0.20 (58)
Professional Development | Hours of PD | 41.5 | 17.6 | 0.42 (66)
Professional Development | Five reading dimensions | 86 | 62 | 0.55 (71)
Assessment | Grouping, progress, needs | 84 | 71 | 0.32 (63)
Average | | | | 0.39 (65)
Adapted from Moss et al. 2008
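The U3 values in parentheses are consistent with Cohen's U3, the percentage of the comparison group that falls below the average member of the treated group when both distributions are normal with equal variance, i.e. U3 = 100 x Phi(ARSI). The sketch below recomputes them under that assumption; the presentation does not spell out the formula, so treat this as a plausibility check rather than the authors' procedure.

```python
from statistics import NormalDist

def u3_percent(arsi):
    """Cohen's U3 (in percent) for a standardized difference, assuming normality."""
    return 100 * NormalDist().cdf(arsi)

# (ARSI, reported U3) pairs copied from the table.
reported = {
    "Instructional Time (minutes)": (0.33, 63),
    "Support": (0.50, 69),
    "More Tx, Time, Supplemental Service": (0.20, 58),
    "Hours of PD": (0.42, 66),
    "Five reading dimensions": (0.55, 71),
    "Grouping, progress, needs": (0.32, 63),
}

mean_arsi = sum(arsi for arsi, _ in reported.values()) / len(reported)

for name, (arsi, u3) in reported.items():
    print(f"{name}: ARSI={arsi:.2f}  U3 computed={u3_percent(arsi):.1f}  reported={u3}")
print(f"Average ARSI={mean_arsi:.2f}  U3={u3_percent(round(mean_arsi, 2)):.1f}")
```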
23So What Do I Do With All This Data?
- Start with
- Scale construction, aggregation over facets, sub-components, components
- Use as
- Descriptive analyses
- Explanatory (AKA exploratory) analyses
- There are a lot of options
- In this section we describe a hierarchy of analyses, from higher to lower levels of causal inference
- Caveat: Except for descriptive analyses, most approaches are relatively new and not fully tested.
24Hierarchy of Approaches to Analysis
- ITT (Intent-to-treat) estimates (e.g., ES) plus
- an index of true fidelity
- ES = .50, Fidelity = 96%
- an index of Achieved Relative Strength (ARS).
- Hulleman's initial analysis: ES = 0.45, ARS = 0.92.
- LATE (Local Average Treatment Effect)
- If treatment receipt/delivery can be meaningfully dichotomized and there is experimentally induced receipt or non-receipt of treatment
- adjust the ITT estimate by T and C treatment receipt rates.
- The simple model can be extended to an Instrumental Variable Analysis (see Bloom's 2005 book).
- ITT retains causal status; LATE can approximate causal statements.
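The adjustment of the ITT estimate by receipt rates is commonly written as LATE = ITT / (receipt rate in T - receipt rate in C). A minimal sketch of that calculation follows; the input values are hypothetical and the function name is illustrative.

```python
def late_estimate(itt, receipt_rate_tx, receipt_rate_c):
    """Rescale an ITT estimate by the difference in treatment receipt rates.

    Assumes receipt is dichotomous and that assignment raises receipt
    (receipt_rate_tx > receipt_rate_c).
    """
    compliance_gap = receipt_rate_tx - receipt_rate_c
    if compliance_gap <= 0:
        raise ValueError("Receipt rate in T must exceed the receipt rate in C.")
    return itt / compliance_gap

# Hypothetical inputs: ITT effect size of 0.20, 80% of the treatment group
# received the intervention, 10% of the control group crossed over.
example_late = late_estimate(itt=0.20, receipt_rate_tx=0.80, receipt_rate_c=0.10)
print(f"ITT = 0.20, LATE = {example_late:.2f}")
```

With no-shows and cross-overs the denominator is below 1, so the LATE is larger in magnitude than the ITT estimate; with perfect compliance the two coincide.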
25More on Fidelity to Outcome Linkages
- TOT (Treatment-on-Treated)
- Simple ITT estimate adjusted for compliance rate in Tx, no randomization.
- Two-level linear production function, modeling the effects of implementation factors in Tx and modeling factors affecting C in separate Level 2 equations.
- Regression-based model, exchanging implementation fidelity scales for the treatment exposure variable.
26Descriptive Analyses
- Fidelity is often examined in the intervention group only.
- Dose-response relationship
- Partition intervention sites into "high" and "low" implementation fidelity
- In my review of some ATOD prevention studies:
- ES_HIGH = 0.13 to 0.18
- ES_LOW = 0.00 to 0.03
27Some Challenges
- Interventions are rarely clear
- Measurement involves novel constructs
- How should components be weighted, if at all?
- Fidelity assessment occurs at multiple levels
- Fidelity indicators are used in 2nd and 3rd levels of HLM models, with few degrees of freedom
- There is uncertainty about the psychometric properties of fidelity indicators and
- Functional form of fidelity and outcome measures is not always known.
- But, despite these challenges, Chris Hulleman has a dandy example
28Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a Motivation Intervention
29The Theory of Change
[Diagram linking MANIPULATED RELEVANCE, PERCEIVED UTILITY VALUE, INTEREST and PERFORMANCE]
Model adapted from Eccles et al. (1983); Hulleman et al. (2009)
30Methods (Hulleman & Cordray, 2009)
Feature | Laboratory | Classroom
Sample | N = 107 undergraduates | N = 182 ninth-graders, 13 classes, 8 teachers, 3 high schools
Task | Mental Multiplication Technique | Biology, Physical Science, Physics
Treatment manipulation | Write about how the mental math technique is relevant to your life. | Pick a topic from science class and write about how it relates to your life.
Control manipulation | Write a description of a picture from the learning notebook. | Pick a topic from science class and write a summary of what you have learned.
Number of manipulations | 1 | 2-8
Length of Study | 1 hour | 1 semester
Dependent Variable | Perceived Utility Value | Perceived Utility Value
31Motivational Outcome
g = 0.05 (p = .67)
32Fidelity Measurement and Achieved Relative Strength
- Simple intervention: one core component
- Intervention fidelity
- Exposure: quality of participant responsiveness
- Rated on scale from 0 (none) to 3 (high)
- 2 independent raters, 88% agreement
33Exposure
Quality of Responsiveness | Lab C (N) | Lab C (%) | Lab Tx (N) | Lab Tx (%) | Class C (N) | Class C (%) | Class Tx (N) | Class Tx (%)
0 | 47 | 100 | 7 | 11 | 86 | 96 | 38 | 41
1 | 0 | 0 | 15 | 24 | 4 | 4 | 40 | 43
2 | 0 | 0 | 29 | 46 | 0 | 0 | 14 | 15
3 | 0 | 0 | 12 | 19 | 0 | 0 | 0 | 0
Total | 47 | 100 | 63 | 100 | 90 | 100 | 92 | 100
Mean | 0.00 | | 1.73 | | 0.04 | | 0.74 |
SD | 0.00 | | 0.90 | | 0.21 | | 0.71 |
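The means and standard deviations in the exposure table follow directly from the rating counts. A minimal sketch that recomputes them, assuming the sample (n - 1) standard deviation, which is the version that reproduces the reported values:

```python
from math import sqrt

# Number of participants at each quality-of-responsiveness score (0, 1, 2, 3).
counts = {
    "Lab C": [47, 0, 0, 0],
    "Lab Tx": [7, 15, 29, 12],
    "Class C": [86, 4, 0, 0],
    "Class Tx": [38, 40, 14, 0],
}

def mean_sd(freqs):
    """Mean and sample SD of scores 0..k given the frequency of each score."""
    n = sum(freqs)
    mean = sum(score * f for score, f in enumerate(freqs)) / n
    sum_sq = sum(f * (score - mean) ** 2 for score, f in enumerate(freqs))
    return mean, sqrt(sum_sq / (n - 1))

summary = {group: mean_sd(freqs) for group, freqs in counts.items()}

for group, (mean, sd) in summary.items():
    print(f"{group}: n={sum(counts[group])}  mean={mean:.2f}  SD={sd:.2f}")
```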
34Indexing Fidelity
- Absolute
- Compare observed fidelity (tTx) to absolute or maximum level of fidelity (TTx)
- Average
- Mean levels of observed fidelity (tTx)
- Binary
- Yes/No treatment receipt based on fidelity scores
- Requires selection of cut-off value
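The three indices can be computed from the quality-of-responsiveness counts reported on the Exposure slide. In the sketch below the absolute index divides the mean by the scale maximum of 3, and the binary index uses a cut-off of 2 or higher; the cut-off is an assumption (the presentation does not state the value used), chosen here because it is consistent with the treatment-group proportions reported later in the deck.

```python
MAX_SCORE = 3       # top of the 0-3 rating scale (TTx)
BINARY_CUTOFF = 2   # assumed cut-off for "received treatment"

# Number of participants at each score (0, 1, 2, 3), from the Exposure slide.
counts = {
    "Lab Tx": [7, 15, 29, 12],
    "Lab C": [47, 0, 0, 0],
    "Class Tx": [38, 40, 14, 0],
    "Class C": [86, 4, 0, 0],
}

def fidelity_indices(freqs, max_score=MAX_SCORE, cutoff=BINARY_CUTOFF):
    """Return the absolute, average and binary fidelity indices for one group."""
    n = sum(freqs)
    average = sum(score * f for score, f in enumerate(freqs)) / n
    absolute = average / max_score          # observed relative to maximum
    binary = sum(freqs[cutoff:]) / n        # proportion at or above the cut-off
    return {"absolute": absolute, "average": average, "binary": binary}

indices = {group: fidelity_indices(freqs) for group, freqs in counts.items()}

for group, idx in indices.items():
    print(f"{group}: absolute={idx['absolute']:.2f}  "
          f"average={idx['average']:.2f}  binary={idx['binary']:.2f}")
```

A different cut-off would change only the binary index, which is why that index is the most sensitive to analyst judgment.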
35Fidelity Indices
Index | Group | Conceptual | Laboratory | Classroom
Absolute | Tx | | |
Absolute | C | | |
Average | Tx | | 1.73 | 0.74
Average | C | | 0.00 | 0.04
Binary | Tx | | |
Binary | C | | |
36Indexing Fidelity as Achieved Relative Strength
- Intervention Strength = Treatment - Control
- Achieved Relative Strength (ARS) Index
- Standardized difference in fidelity index across Tx and C
- Based on Hedges' g (Hedges, 2007)
- Corrected for clustering in the classroom (ICCs from .01 to .08)
- See Hulleman & Cordray (2009)
37Average ARS Index
[Equation with three labeled parts: Group Difference, Sample Size Adjustment, Clustering Adjustment]
- Where,
- mean for group 1 (tTx)
- mean for group 2 (tC)
- ST = pooled within-groups standard deviation
- nTx = treatment sample size
- nC = control sample size
- n = average cluster size
- ρ = intra-class correlation (ICC)
- N = total sample size
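The equation on this slide did not survive in the text. A form that is consistent with the three labeled parts, the symbol list and Hedges (2007) is g = [(mean Tx - mean C) / ST] x [1 - 3 / (4(nTx + nC) - 9)] x sqrt(1 - 2(n - 1)ρ / (N - 2)). The sketch below implements that reconstructed form; it is an assumption, not necessarily the exact expression shown in the presentation. The classroom ICC of 0.05 is a hypothetical value inside the reported .01 to .08 range, and the average cluster size of 14 is 182 students divided by 13 classes.

```python
from math import sqrt

def ars_average(mean_tx, mean_c, sd_tx, sd_c, n_tx, n_c,
                cluster_size=1.0, icc=0.0):
    """Standardized Tx-C difference in fidelity with small-sample and clustering adjustments."""
    total_n = n_tx + n_c
    pooled_var = ((n_tx - 1) * sd_tx ** 2 + (n_c - 1) * sd_c ** 2) / (total_n - 2)
    group_difference = (mean_tx - mean_c) / sqrt(pooled_var)
    sample_size_adjustment = 1 - 3 / (4 * total_n - 9)
    clustering_adjustment = sqrt(1 - 2 * (cluster_size - 1) * icc / (total_n - 2))
    return group_difference * sample_size_adjustment * clustering_adjustment

# Laboratory: individuals, so no clustering (cluster_size=1 makes the last term 1).
lab_g = ars_average(1.73, 0.00, 0.90, 0.00, n_tx=63, n_c=47)

# Classroom: students nested in classes; ICC value is hypothetical.
class_g = ars_average(0.74, 0.04, 0.71, 0.21, n_tx=92, n_c=90,
                      cluster_size=14, icc=0.05)

print(f"Lab ARS (average index): {lab_g:.2f}")
print(f"Classroom ARS (average index): {class_g:.2f}")
```

Fed with the rounded means and standard deviations from the Exposure slide, this form comes out close to the average-index g values reported later in the deck, which is some reassurance that the reconstruction is on the right track.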
38Absolute and Binary ARS Indices
[Equation with three labeled parts: Group Difference, Sample Size Adjustment, Clustering Adjustment]
- Where,
- pTx = proportion for the treatment group (tTx)
- pC = proportion for the control group (tC)
- nTx = treatment sample size
- nC = control sample size
- n = average cluster size
- ρ = intra-class correlation (ICC)
- N = total sample size
39Average ARS Index
[Figure: Treatment Strength scale from 0 to 3 (0 to 100%), marking TTx, tTx, tC and TC, with the infidelity gaps between TTx and tTx and between tC and TC]
tTx - tC = (0.74) - (0.04) = 0.70
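40Achieved Relative Strength Indices
Index | Row | Observed Fidelity: Lab | Observed Fidelity: Class | Lab vs. Class Contrast (Lab - Class)
Absolute | Tx | 0.58 | 0.25 |
Absolute | C | 0.00 | 0.01 |
Absolute | g | 1.72 | 0.80 | 0.92
Average | Tx | 1.73 | 0.74 |
Average | C | 0.00 | 0.04 |
Average | g | 2.52 | 1.32 | 1.20
Binary | Tx | 0.65 | 0.15 |
Binary | C | 0.00 | 0.00 |
Binary | g | 1.88 | 0.80 | 1.08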
41Linking Achieved Relative Strength to Outcomes
42Sources of Infidelity in the Classroom
- Student behaviors were nested within teacher behaviors
- Teacher dosage
- Frequency of student exposure
- Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness/exposure).
43Sources of Infidelity: Multi-level Analyses
- Part I: Baseline Analyses
- Identified the amount of residual variability in fidelity due to students and teachers.
- Due to missing data, we estimated a 2-level model (153 students, 6 teachers)
- Student: $Y_{ij} = b_{0j} + b_{1j}(\text{TREATMENT})_{ij} + r_{ij}$
- Teacher: $b_{0j} = \gamma_{00} + u_{0j}$
- $b_{1j} = \gamma_{10} + u_{10j}$
44Sources of Infidelity: Multi-level Analyses
- Part II: Explanatory Analyses
- Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage
- Student: $Y_{ij} = b_{0j} + b_{1j}(\text{TREATMENT})_{ij} + b_{2j}(\text{RESPONSE FREQUENCY})_{ij} + r_{ij}$
- Teacher: $b_{0j} = \gamma_{00} + u_{0j}$
- $b_{1j} = \gamma_{10} + b_{10}(\text{TEACHER DOSAGE})_{j} + u_{10j}$
- $b_{2j} = \gamma_{20} + b_{20}(\text{TEACHER DOSAGE})_{j} + u_{20j}$
45Sources of Infidelity: Multi-level Analyses
Variance Component | Baseline Model: Residual Variance | Baseline Model: % of Total | Explanatory Model: Variance | Explanatory Model: % Reduction
Level 1 (Student) | 0.15437 | 52 | 0.15346 | < 1
Level 2 (Teacher) | 0.13971 | 48 | 0.04924 | 65
Total | 0.29408 | | 0.20270 |
p < .001.
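The percentage columns can be recovered from the variance components themselves: each level's share of the baseline total, and the proportional drop in each component once the explanatory predictors are added. A minimal sketch of that arithmetic (variable and function names are illustrative):

```python
# Variance components copied from the table.
baseline = {"student": 0.15437, "teacher": 0.13971}
explanatory = {"student": 0.15346, "teacher": 0.04924}

def percent_of_total(components):
    """Each level's share of total residual variance, in percent."""
    total = sum(components.values())
    return {level: 100 * value / total for level, value in components.items()}

def percent_reduction(before, after):
    """Proportional reduction in each variance component, in percent."""
    return {level: 100 * (before[level] - after[level]) / before[level]
            for level in before}

shares = percent_of_total(baseline)
reductions = percent_reduction(baseline, explanatory)

for level in baseline:
    print(f"{level}: {shares[level]:.0f}% of baseline variance, "
          f"{reductions[level]:.1f}% explained")
```

Almost half of the variability in fidelity sits between teachers, and the predictors account for most of that teacher-level variance while leaving the student-level component essentially unchanged.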
46Case Summary
- The motivational intervention was more effective in the lab (g = 0.45) than in the field (g = 0.05).
- Using 3 indices of fidelity and, in turn, achieved relative treatment strength, revealed that
- Classroom fidelity < Lab fidelity
- Achieved relative strength was about 1 SD less in the classroom than in the laboratory
- Differences in achieved relative strength = differences in motivational outcome, especially in the lab.
- Sources of fidelity: teacher (not student) factors
47Key Points and Issues
- Identifying and measuring, at a minimum, should include model-based core and necessary components
- Collaboration among researchers and practitioners (e.g., developers and implementers) is essential for specifying
- Intervention models
- Core and essential components
- Benchmarks for TTx (e.g., an educationally meaningful dose; what level of X is needed to instigate change)
- Tolerable adaptation
48Key Points and Issues
- Fidelity assessment serves two roles
- Average causal difference between conditions and
- Using fidelity measures to assess the effects of variation in implementation on outcomes.
- Post-experimental (re)specification of the intervention
49Thank You: Questions and Discussion