Title: Reading and reporting evidence from trial-based evaluations
1. Reading and reporting evidence from trial-based evaluations
- Professor David Torgerson
- Director, York Trials Unit
- www.rcts.org
2. Background
- Good-quality randomised controlled trials (RCTs) are the best form of evidence to inform policy and practice.
- However, poorly conducted RCTs may be more misleading than other types of evidence.
3. RCTs: a reminder
- Randomised controlled trials (RCTs) provide the strongest basis for causal inference by:
- Controlling for regression to the mean
- Controlling for temporal changes
- Providing a basis for statistical inference
- Removing selection bias.
4. Selection Bias
- Selection bias can occur in non-randomised studies when group selection is related to a known or unknown prognostic variable.
- If the variable is unknown or imperfectly measured, it is not possible to control for this confound and the observed effect may be biased.
5. Randomisation
- Randomisation ONLY removes selection bias if all those who are randomised are retained in the analysis within the groups to which they were originally allocated.
- If we lose participants, or the analyst moves participants out of their original randomised groups, the randomisation is violated and selection bias can be introduced.
6. Is it randomised?
- "The students were assigned to one of three groups, depending on how revisions were made: exclusively with computer word processing, exclusively with paper and pencil, or a combination of the two techniques."
Greda and Hannafin, J Educ Res 1992;85:144.
7. The Perfect Trial
- Does not exist.
- All trials can be criticised methodologically, but it is best to be transparent in trial reporting so that the results can be interpreted in light of the quality of the trial.
8. Types of randomisation
- Simple randomisation
- Stratified randomisation
- Matched design
- Minimisation
9. Simple randomisation
- Use of a coin toss, random number tables, etc.
- Characteristics: tends to produce some numerical imbalance (e.g., for a total n = 30 we might get 14 vs 16); exact numerical balance is unlikely. For sample sizes of <50 units it is less efficient than restricted randomisation. However, it is more resistant to subversion in a sequentially recruiting trial.
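As an illustration (not from the slides), simple randomisation is just an independent coin toss per participant, so group sizes are not guaranteed to be equal:

```python
import random

def simple_randomise(n, groups=("A", "B"), seed=None):
    """Allocate each of n participants independently at random,
    like a coin toss: no guarantee of equal group sizes."""
    rng = random.Random(seed)
    return [rng.choice(groups) for _ in range(n)]

# For n = 30, splits such as 14 vs 16 are common; exactly 15/15 is not guaranteed.
allocation = simple_randomise(30, seed=1)
```

Because each allocation is independent of the previous ones, a recruiter cannot predict the next assignment, which is the subversion resistance mentioned above.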
10. Stratified randomisation
- To ensure balance on known covariates, restrictions are placed on the randomisation: allocation is made in blocks, e.g. ABBA, AABB, etc.
- Characteristics: ensures numerical balance within each complete block; increases subversion risk in sequentially recruiting trials; small trials with numerous covariates can still result in imbalances.
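A minimal sketch (an illustration, not the slides' own procedure) of blocked allocation, permuting a fixed block so that numbers balance after every complete block:

```python
import random

def blocked_sequence(n_blocks, block=("A", "A", "B", "B"), seed=None):
    """Generate an allocation sequence from randomly permuted blocks
    (ABBA, AABB, ...): arms are exactly balanced after each block."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        b = list(block)
        rng.shuffle(b)  # random permutation of the block
        sequence.extend(b)
    return sequence

seq = blocked_sequence(5, seed=2)  # 20 allocations, exactly 10 per arm
```

Note the subversion risk: with a block of four, anyone who has observed the first three allocations in a block can deduce the fourth.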
11. Matched Designs
- Participants are matched on some characteristic (e.g., pre-test score) and then one member of each pair (or triplet) is allocated to the intervention.
- Characteristics: numerical equivalence; loss of numbers if the total is not divisible by the number of groups; can lose power if matched on a weak covariate; difficult to match on numerous covariates; can reduce power in small samples.
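A sketch of one common version of this design (an assumption for illustration: matching neighbours after sorting on the pre-test score, then tossing a coin within each pair):

```python
import random

def matched_pair_allocation(pretest_scores, seed=None):
    """Sort participants by pre-test score, pair neighbours, then flip a
    coin within each pair; an odd leftover participant stays unallocated,
    which is the 'loss of numbers' noted above."""
    rng = random.Random(seed)
    order = sorted(range(len(pretest_scores)), key=lambda i: pretest_scores[i])
    allocation = {}
    for j in range(0, len(order) - 1, 2):
        a, b = order[j], order[j + 1]
        if rng.random() < 0.5:   # coin toss within the pair
            a, b = b, a
        allocation[a] = "intervention"
        allocation[b] = "control"
    return allocation

groups = matched_pair_allocation([12, 7, 15, 9, 11, 14, 8, 10], seed=3)
```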
12. Minimisation
- Rarely used in social science trials. Balance is achieved across several covariates using a simple arithmetical algorithm.
- Characteristics: numerical and known-covariate balance; good for small trials with several important covariates; increases the risk of subversion in sequentially recruiting trials and the risk of technical error.
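The arithmetic can be sketched as follows (a simplified Taves/Pocock-style algorithm, assumed for illustration; real implementations differ in the scoring rule):

```python
import random

def minimisation(participants, covariates, arms=("A", "B"), p=0.8, seed=None):
    """Each arriving participant goes to the arm that currently minimises
    covariate imbalance, with probability p (the random element guards
    against complete predictability)."""
    rng = random.Random(seed)
    counts = {arm: {cov: {} for cov in covariates} for arm in arms}
    allocation = []
    for person in participants:
        # imbalance score per arm: how many existing members of that arm
        # already share this person's covariate levels
        scores = {arm: sum(counts[arm][cov].get(person[cov], 0)
                           for cov in covariates) for arm in arms}
        best = min(arms, key=scores.get)
        arm = best if rng.random() < p else rng.choice(
            [a for a in arms if a != best])
        for cov in covariates:
            level = person[cov]
            counts[arm][cov][level] = counts[arm][cov].get(level, 0) + 1
        allocation.append(arm)
    return allocation

# hypothetical covariates, for illustration only
people = [{"sex": s, "pretest": t} for s in ("m", "f") for t in ("lo", "hi")] * 5
arms = minimisation(people, ["sex", "pretest"], seed=4)
```

Because the next allocation depends on running totals, anyone who can reconstruct the totals can often predict it, which is the subversion risk noted above.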
13. Characteristics of a rigorous trial
- Once randomised, all participants are included within their allocated groups.
- Random allocation is undertaken by an independent third party.
- Outcome data are collected blindly.
- The sample size is sufficient to exclude an important difference.
- A single analysis is prespecified before the data are analysed.
14. Problems with RCTs
- Failure to keep to the random allocation
- Attrition, which can introduce selection bias
- Unblinded ascertainment, which can lead to ascertainment bias
- Small samples, which can lead to Type II errors
- Multiple statistical tests, which can give Type I errors
- Poor reporting of uncertainty (e.g., lack of confidence intervals).
15. Are these RCTs?
- We took two groups of schools: one group had high ICT use and the other low ICT use; we then took a random sample of pupils from each school and tested them.
- We put the students into two groups; we then randomly allocated one group to the intervention whilst the other formed the control.
- We formed the two groups so that they were approximately balanced on gender and pre-test scores.
- We identified 200 children with a low reading age and then randomly selected 50 to whom we gave the intervention. They were then compared to the remaining 150.
16. Examples
- "Of the eight schools two randomly chosen schools served as a control group." [1]
- "From the 51 children we formed 17 sets of triplets… One child from each triplet was randomly assigned to each of the 3 experimental groups." [2]
- "Stratified random assignment was used in forming 2 treatment groups, with strata (low, medium, high) based on kindergarten teachers' estimates of reading." [3]
1. Kim et al. J Drug Ed 1993;23:67. 2. Torgesen et al. J Ed Psychology 1992;84:364. 3. Uhry and Shepherd, RRQ 1993;28:219.
17. What is the problem here?
- "A random-block technique was used to ensure greater homogeneity among the groups. We attempted to match age, sex, and diagnostic category of the subjects. The composition of the final 3 treatment groups is summarized in Table 1."
Roberts and Samuels. J Ed Res 1993;87:118.
18. Stratifying variables
[Table of stratifying variables not reproduced.] With 3 groups for each bottom cell there are 24 cells in all; sample size 36.
19. Blocking
- With so many stratifying variables and a small sample size, blocked allocation results in on average 1.5 children per cell. It is likely that some cells will be empty, and this technique can result in greater imbalances than less restricted allocation.
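The empty-cell problem can be checked by simulation (a sketch that assumes, for illustration, that the covariate combinations are equally likely):

```python
import random

def average_empty_cells(n_children=36, n_cells=24, trials=2000, seed=5):
    """Scatter the children across strata cells uniformly at random and
    return the average number of cells left empty."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        filled = {rng.randrange(n_cells) for _ in range(n_children)}
        empty += n_cells - len(filled)
    return empty / trials

# Expected empties are about 24 * (23/24)**36, i.e. roughly 5 of the 24 cells.
avg_empty = average_empty_cells()
```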
20. Mixed allocation
- "Students were randomly assigned to either Teen Outreach participation or the control condition either at the student level (i.e., sites had more students sign up than could be accommodated and participants and controls were selected by picking names out of a hat or choosing every other name on an alphabetized list) or less frequently at the classroom level."
Allen et al. Child Development 1997;64:729-42.
21. Is it randomised?
- "The groups were balanced for gender and, as far as possible, for school. Otherwise, allocation was randomised."
Thomson et al. Br J Educ Psychology 1998;68:475-91.
22. Class or Cluster Allocation
- Randomising intact classes is a useful approach to undertaking trials. However, to balance out class-level covariates we must have several units per group (a minimum of 5 classes per group is recommended); otherwise we cannot balance out possible confounders.
23. What is wrong here?
- "…the remaining 4 classes of fifth-grade students (n = 96) were randomly assigned, each as an intact class, to the 4 prewriting treatment groups."
Brodney et al. J Exp Educ 1999;68:5-20.
24. Misallocation issues
- "We used a matched pairs design. Children were matched on gender and then 1 of each pair was allocated to the intervention whilst the remaining child acted as a control. 31 children were included in the study: 15 in the control group and 16 in the intervention."
- "23 offenders from the treatment group could not attend the CBT course and they were then placed in the control group."
25. Attrition
- Rule of thumb: 0-5%, not likely to be a problem; 6-20%, worrying; >20%, selection bias likely.
- How to deal with attrition?
- Sensitivity analysis.
- Dropping the remaining participant in a matched design does NOT deal with the problem.
26. What about matched pairs?
- We can only match on observable variables; we trust to randomisation to ensure that unobserved covariates or confounders are equally distributed between groups.
- If we lose a participant, dropping the matched pair does not address the unobservable confounders, which are one of the main reasons we randomise.
27. Matched Pairs on Gender
28. Drop-out of 1 girl
29. Removing the matched pair does not balance the groups!
30. Dropping matched pairs
- In that example, by dropping the matched pair we make the situation worse.
- The groups are balanced on gender but imbalanced on high/low.
- We can correct for gender in the statistical analysis, as it is an observed variable; we cannot correct for high/low if it is unobserved.
- Removing the matched pair reduces our statistical power but does not solve the problem.
31. Sensitivity analysis
- In the presence of attrition we can see whether our results change because of it. For example, for the group with the better outcome, we can give the worst possible scores to its missing participants, and vice versa.
- If the difference remains significant, we can be reassured that attrition did not alter the findings.
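The worst-case imputation described above can be sketched as follows (hypothetical scores on an assumed 0-20 test, for illustration only):

```python
def worst_case_difference(intervention, control, miss_int, miss_ctl,
                          min_score, max_score):
    """Give missing participants in the better-performing (intervention)
    arm the worst possible score and missing controls the best possible
    score, then recompute the difference in means."""
    imputed_int = list(intervention) + [min_score] * miss_int
    imputed_ctl = list(control) + [max_score] * miss_ctl
    mean = lambda xs: sum(xs) / len(xs)
    return mean(imputed_int) - mean(imputed_ctl)

# If the gap survives this pessimistic imputation, attrition is unlikely
# to explain the result; here it does not survive.
gap = worst_case_difference([15, 16, 14], [10, 11, 12],
                            miss_int=1, miss_ctl=1,
                            min_score=0, max_score=20)
```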
32. Flow Diagrams
Hatcher et al. J Child Psych Psychiatry 2005 (online).
33. Flow Diagram
- In health care trials reported in the main medical journals, authors are required to produce a CONSORT flow diagram.
- The trial by Hatcher et al. clearly shows the fate of the participants from randomisation to analysis.
34. Poorly reported attrition
- In an RCT of foster carers, extra training was given.
- "Some carers withdrew from the study once the dates and/or location were confirmed… others withdrew once they realized that they had been allocated to the control group… 117 participants comprised the final sample."
- No split between groups is given, except in one table showing 67 in the intervention group and 50 in the control group: 17 more in the intervention group. Unequal attrition is a hallmark of potential selection bias. But we cannot be sure.
Macdonald and Turner, Brit J Social Work 2005;35:1265.
35. Recent Blocked Trial
- "This was a block randomised study (four patients to each block) with separate randomisation at each of the three centres. Blocks of four cards were produced, each containing two cards marked with 'nurse' and two marked with 'house officer.' Each card was placed into an opaque envelope and the envelope sealed. The block was shuffled and, after shuffling, was placed in a box."
Kinley et al. BMJ 325:1323.
36. What is wrong here?
Kinley et al. BMJ 325:1323.
37. Type I error issues
- A 3-group trial with 14 outcome variables: pre-test to post-test scores improved for most of them. With three pairwise group comparisons per variable there are 42 potential tests. The authors actually did more, reporting within-group pre-test/post-test tests as well as between-group tests, giving 82 tests.
Roberts and Samuels. J Ed Res 1993;87:118.
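The scale of the problem can be sketched with the standard independent-tests approximation (an illustration, not the authors' calculation):

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests comes up
    'significant' at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

# With 42 tests the chance of at least one spurious finding is about 88%;
# with 82 tests it exceeds 98%.
p42 = prob_any_false_positive(42)
p82 = prob_any_false_positive(82)
```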
38. Type II errors
- Most social science interventions show small effect sizes (typically 0.5 or lower). To have an 80% chance of observing a 0.5 effect size we need 128 participants. For smaller effects we need much larger studies (e.g., 512 participants for an effect size of 0.25).
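These figures can be approximated with the usual normal-approximation formula, n per group = 2(z_alpha + z_beta)^2 / d^2 (a sketch; the slide's 128 and 512 presumably come from slightly more conservative tables):

```python
from math import ceil

def n_per_group(effect_size, z_alpha=1.96, z_beta=0.84):
    """Sample size per arm for a two-arm comparison of means
    (defaults: two-sided 5% significance, 80% power)."""
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# 63 per group (126 total) for d = 0.5; 251 per group (about 500) for d = 0.25.
n_medium = 2 * n_per_group(0.5)
n_small = 2 * n_per_group(0.25)
```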
39. Analytical Errors
- Many studies do the following:
- Paired tests of pre- and post-test scores: unnecessary and misleading in an RCT, as we should compare group means.
- Do not take cluster allocation into account.
- Use gain scores without adjusting for baseline values.
- Do multiple tests.
40. Pre-treatment differences
- A common approach is to statistically test baseline covariates:
- "The first issue we examined was whether there were pretreatment differences between the experimental groups and the control groups on the following independent variables… There were two pretreatment differences that attained statistical significance… However, since they were statistically significant these 2 variables are included as covariates in all statistical tests."
Davis and Taylor, Criminology 1997;35:307-33.
41. What is wrong with that?
- If randomisation has been carried out properly, the null hypothesis is true at baseline: any differences have occurred by chance.
- The statistical significance of a baseline difference gives no clue as to whether the covariate is important enough to include in the analysis. Including a significant but unimportant covariate reduces power, whilst ignoring an important covariate just because it happens to be balanced also reduces power.
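That baseline tests of a properly randomised trial are uninformative can be shown by simulation (a sketch, not from the slides): with purely random allocation, about 5% of baseline covariates will still test "significant" by chance alone.

```python
import random
import statistics
from math import sqrt

def baseline_false_positive_rate(n_covariates=1000, n_per_arm=200, seed=6):
    """Simulate baseline covariates in a properly randomised two-arm trial:
    every 'difference' is pure noise, yet roughly 5% of covariates are
    'significant' at the 5% level (two-sample z test)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_covariates):
        a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        se = sqrt(statistics.variance(a) / n_per_arm +
                  statistics.variance(b) / n_per_arm)
        z = (statistics.fmean(a) - statistics.fmean(b)) / se
        hits += abs(z) > 1.96
    return hits / n_covariates

rate = baseline_false_positive_rate()  # close to the nominal 0.05
```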
42. The CONSORT statement
- Many journals require authors of RCTs to conform to the CONSORT guidelines.
- This is a useful approach to deciding whether or not trials are of good quality.
43. Modified CONSORT quality criteria
44. Review of Trials
- In a review of RCTs in health care and education, the quality of the trial reports was compared over time.
Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. Br Ed Res J 2005;31:761-85.
45. Study Characteristics
46. Change in concealed allocation
[Chart not reproduced: P = 0.04; P = 0.70]
NB: No education trial used concealed allocation.
47. Blinded Follow-up
[Chart not reproduced: P = 0.03; P = 0.13; P = 0.54]
48. Underpowered
[Chart not reproduced: P = 0.22; P = 0.76; P = 0.01]
49. Mean Change in Items
[Chart not reproduced: P = 0.03; P = 0.001; P = 0.07]
50. Summary
- There is a lot of evidence from health care trials that poor-quality studies give different results from high-quality studies.
- Social science trials tend to be poorly reported; it is often difficult to distinguish between poor quality and poor reporting.
- Reporting quality can easily be improved.