Statistics in Medical Research: RCTs and Cohort

1
Statistics in Medical Research: RCTs and Cohort
  • Jemila Hamid
  • Clinical Epidemiology & Biostatistics
  • Pathology & Molecular Medicine
  • McMaster University
  • jhamid_at_mcmaster.ca
  • July 21, 2011

2
Outline
  • I. Introduction
  • II. Group comparison
  • Paper I: Edwards et al., submitted
  • III. Survival Analysis
  • Paper II: Krag et al., Lancet Oncol, 2010
  • Paper III: Weaver et al., NEJM, 2011
  • IV. Design Issues: sample size
  • V. Summary

3
I. Introduction
  • Study types are classified into two broad
    categories
  • Experimental: the researcher investigates the
    effects of an intervention
  • These are prospective studies and are usually
    comparative in nature; they may be longitudinal
    or cross sectional, with parallel or crossover
    designs, e.g. clinical trials (Krag et al.,
    Lancet Oncol)
  • Observational: the researcher doesn't influence
    events
  • Case-control (retrospective and cross sectional
    in general), cohort (prospective and mostly
    longitudinal) or surveys (cross sectional), e.g.
    epidemiology, diagnostic testing, public health
    (Weaver et al., NEJM)

4
Statistical Question?
  • Estimation? Estimating prevalence of disease,
    treatment effect, risk, hazard, accuracy etc.
  • Comparative? Comparing treatment effect with a
    constant (single sample)? Comparing a new
    diagnostic test with a gold standard? Comparing
    two or more treatments? Comparing before and
    after an intervention?
  • Association and regression? Relationship between
    two variables? Effect of one variable on another?
    Effect of multiple variables on an outcome?
  • Prediction? Predict (classify into) disease
    subtype? Predict an outcome based on risk
    factors?

5
  • Paper II: Krag et al.
  • Comparing survival and disease free survival
    between two surgical procedures:
    sentinel-lymph-node resection and
    axillary-lymph-node dissection
  • Paper III: Weaver et al.
  • Comparison of disease recurrence and survival
    between two groups of patients: those with occult
    lymph-node metastases and those in whom no occult
    metastases were detected
  • In both papers, estimation and confidence
    intervals are also part of the statistical
    question, e.g. estimating overall survival,
    disease free survival, hazard ratio etc.

6
Outcome Measures?
  • Continuous - mean, standard deviation, mean
    difference
  • Normal distribution is often assumed
  • Transformations, non-parametric approaches
  • Binary - risk, odds, relative risk, risk
    difference, odds ratio, sensitivity, specificity,
    classification accuracy, AUC
  • Logistic regression
  • Binomial distribution
  • Count - mean count, proportions, rates
  • Poisson regression
  • Negative binomial
  • Survival - hazard, hazard ratio, time to event
  • Cox regression
  • Weibull, lognormal and generalized Gamma
    distributions

7
Statistical Analysis
  • Descriptive: Table 1 of medical articles
  • Summarizing and evaluating data using
    graphical and tabular methods
  • This can be done using boxplots, histograms,
    normal probability plots
  • Gives a good feel for the data
  • Assess distributions: normal? need
    transformation?
  • Outliers? What are we going to do about them?
  • Missing values? Why are they missing? What are we
    going to do about them?

8
Paper II
9
Paper III
10
Estimation and Confidence Intervals
  • Estimation: the parameter of interest could be a
    mean, response rate, proportion etc.
  • A confidence interval (CI) quantifies the
    imprecision or uncertainty associated with an
    estimate
  • The reader can assess whether a result is
    estimated precisely or not, definitive or not
  • Presenting CIs has been widely promoted in the
    literature

11
Interpretation of 95% CI
  • If we repeated the experiment many, many times,
    95% of the time the TRUE parameter value would be
    in the interval
  • Before performing the experiment, the probability
    that the interval would contain the true
    parameter value was 0.95

12
Set of 95% CIs from samples of size n = 12 drawn
from a normal distribution with μ = 211 and σ² =
46.
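The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch, not the code behind the slide's figure: it assumes the caption's garbled parameters read as mean 211 and variance 46, treats the variance as known, and uses an arbitrary seed and number of repetitions.

```python
import random
from statistics import NormalDist

def ci_coverage(mu=211.0, sigma2=46.0, n=12, reps=10_000, level=0.95, seed=1):
    """Fraction of repeated-sample CIs (variance treated as known) that contain mu."""
    rng = random.Random(seed)                    # fixed seed: an arbitrary choice
    sigma = sigma2 ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The printed proportion should sit close to 0.95; any single interval either contains the true mean or does not.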
13
95% CI for continuous outcome
  •  
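The formula on this slide is not reproduced in the transcript. A minimal sketch of the usual t-based interval for a mean, assuming that is what was shown, applied to the group I summary from the red cell folate example used later in the talk (mean 316.6, SD 58.7, n = 8). The critical value 2.365 is the tabulated 97.5% point of the t distribution with 7 degrees of freedom, entered by hand because the Python standard library has no t quantile function.

```python
from math import sqrt

def mean_ci(xbar, sd, n, t_crit):
    """Two-sided CI for a mean: xbar +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Group I of the red cell folate example: mean 316.6, SD 58.7, n = 8.
# t_crit = 2.365 is a table value for 7 df (assumption: looked up, not computed).
low, high = mean_ci(316.6, 58.7, 8, t_crit=2.365)
print(f"95% CI: {low:.1f} to {high:.1f}")
```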

14
  •  

15
95% CI for proportions
  •  
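The formula on this slide is not reproduced in the transcript either. The sketch below assumes the simple normal-approximation (Wald) interval, which may not be the version the slide showed, and applies it to a count reported in Paper I: 45 of the 97 DLBL patients were male.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(events, n, level=0.95):
    """Wald (normal approximation) CI for a binomial proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to [0, 1] so the limits stay valid proportions.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Male gender among DLBL patients in Paper I: 45 of 97.
p, low, high = proportion_ci(45, 97)
print(f"{p:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wald interval is known to behave poorly for small samples or proportions near 0 or 1, where other intervals are usually preferred.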

16
  •  

17
Examples from Papers II and III
  • Paper II
  • 8 year overall survival was 91.8% (95% CI
    90.4-93.3) for group 1 and 90.3% (95% CI
    88.8-91.8) for group 2
  • HR for disease free survival 1.05 (95% CI
    0.90-1.22)
  • Paper III
  • Occult metastases were detected in 15.9% of the
    patients (95% CI 14.7-17.1)
  • Adjusted hazard ratio: HR 1.40 (95% CI
    1.05-1.86)

18
Hypothesis Testing
  •  

19
Examples from Papers I-III
  • Papers I-III: overall and disease free survival
    were compared among the groups considered, e.g.
    Paper II: HR = 1.2, p-value = 0.12; Paper III:
    HR = 1.40, p-value = 0.03
  • Paper I
  • Statistical question - comparison of clinical
    characteristics of B-cell lymphoma unclassifiable
    (BCLU) with those of Burkitt lymphoma (BL) and
    diffuse large B-cell lymphoma (DLBL)
  • Several clinical variables were considered, some
    binary and some categorical, e.g. the researchers
    compared DLBL and BCLU with respect to gender
    (p-value = 0.51) and CNS involvement
    (p-value = 0.01)

20
Analysis of variance (ANOVA)
  •  

21
  • ANOVA table

Source          df    SS   MS   F   P-value
Between Groups  k-1
Within Groups   n-k
Total           n-1
22
Example for ANOVA
  • In Paper I, if the researchers were to compare all
    three tumor types (DLBL, BCLU and BL) with
    respect to a continuous clinical
    characteristic, ANOVA would be appropriate
  • For the other, non-continuous characteristics,
    one can apply an appropriate transformation
    before applying ANOVA
  • If they don't use ANOVA and opt for pairwise
    comparisons, there will be an issue of multiple
    comparisons
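To see why pairwise comparisons raise a multiple comparison issue, the sketch below computes the chance of at least one false positive when all pairs of groups are tested at the 5% level. It assumes the tests are independent, which is a simplification, and shows the Bonferroni adjustment as one common remedy; the slides do not say which correction, if any, the speaker had in mind.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise tests and the chance of at least one false positive,
    assuming the tests are independent (a simplifying assumption)."""
    n_tests = comb(n_groups, 2)
    return n_tests, 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / comb(n_groups, 2)

# Three tumor types (DLBL, BCLU and BL) give three pairwise comparisons.
n_tests, fwer = familywise_error(3)
print(n_tests, round(fwer, 3), round(bonferroni_alpha(3), 4))
```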

23
Example ANOVA
  • E.g. (Altman, 1991): twenty-two patients
    undergoing cardiac bypass surgery were randomized
    to one of three ventilation groups
  • Group I: 50% nitrous oxide and 50% oxygen mixture
    for 24 hours
  • Group II: same as group I, but received the
    treatment only during the operation
  • Group III: no nitrous oxide but received 35-50%
    oxygen for 24 hours
  • Compare whether the three groups have the same
    red cell folate (RCL) levels

24
Summary of RCL levels
Group   Mean    Std Dev      Freq.
I       316.6   58.7 (s1)    8 (n1)
II      256.4   37.1 (s2)    9 (n2)
III     278.0   33.8 (s3)    5 (n3)
Total   283.2                22 (n)
25
ANOVA table
Source          df   SS        MS       F      P-value
Between Groups  2    15515.9   7757.9   3.71   0.04
Within Groups   19   39716.1   2090.3
Total           21   55232     2630.1
At the 5% level, there is evidence to suggest
there is a significant difference in RCL levels
among the three groups.
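The ANOVA table can be rebuilt, up to rounding, from the group means, standard deviations and sizes in the summary table. This is a minimal sketch using only the Python standard library. The p-value helper uses a closed form that holds only when the numerator has 2 degrees of freedom, as it does here with three groups; it is not a general F distribution routine.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (mean, sd, n) per group.
    Returns (ss_between, ss_within, df_between, df_within, F)."""
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

def f_pvalue_2df(f_stat, df_within):
    """Upper-tail p-value of F(2, df_within); valid only for 2 numerator df."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# (mean, SD, n) for groups I, II and III of the red cell folate example.
rcl = [(316.6, 58.7, 8), (256.4, 37.1, 9), (278.0, 33.8, 5)]
ssb, ssw, dfb, dfw, f_stat = anova_from_summary(rcl)
p_value = f_pvalue_2df(f_stat, dfw)
print(f"F({dfb}, {dfw}) = {f_stat:.2f}, p = {p_value:.3f}")
```

Because the inputs are rounded summaries, the sums of squares differ slightly from the table, but the F statistic and the conclusion at the 5% level are the same.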
26
Regression Methods
  • So far we considered estimation, confidence
    intervals and comparisons. Some of these can be
    framed as a simple regression model
  • Anova can, for example, be framed as a regression
    model where the treatment groups are independent
    variables
  • Estimation is a big part of regression methods
  • We will present regression models in general
    and then talk about special cases, e.g. Cox regression

27
Regression Methods
  • A method for analyzing relationship between two
    or more variables
  • There is a causal direction: the investigator seeks
    to ascertain the causal effect of one variable
    upon another
  • Otherwise, it is association or correlation
    analysis: one measures the strength of association
    between variables without assuming any causal
    relationship

28
  • Two kinds of variables: outcome (dependent
    variable) and predictor (independent variable)
  • Predictors are sometimes called risk factors,
    exposure variables or prognostic factors, depending
    on the nature of the data
  • Linear regression model
  • Y = α + β1X1 + β2X2 + ... + βpXp
  • Depending on the distribution of the outcome
    variable, we have different types of regression:
    ANOVA (MD), logistic regression (RR and OR), Cox
    regression (HR)

29
Example
  • E.g. (Altman, 1991): twenty-two patients
    undergoing cardiac bypass surgery were randomized
    to one of three ventilation groups
  • Group I: 50% nitrous oxide and 50% oxygen mixture
    for 24 hours
  • Group II: same as group I, but received the
    treatment only during the operation
  • Group III: no nitrous oxide but received 35-50%
    oxygen for 24 hours
  • Compare whether the three groups have the same
    red cell folate (RCL) levels

30
Examples
  • Study of biomarkers: Kazu et al., work in
    progress
  • Evaluate the diagnostic ability of a panel of
    five immunohistochemical markers in
    distinguishing between endometrioid
    adenocarcinoma (EC) and serous carcinoma (SC)
  • Study the relative contribution of each of the
    five markers towards predicting the two
    histologic types
  • Clinical covariates such as age, body mass index
    (BMI), stage, and history of hormone replacement
    therapy
  • Multiple logistic regression and ROC analysis were
    performed to estimate odds ratios and construct a
    predictive model

31
Examples
  • Paper II
  • Cox regression is used to estimate the hazard
    ratio and to model and compare survival between
    the two surgical procedures
  • Here the outcomes (dependent variables) are
    survival and disease free survival; the
    predictor variable is surgical group, and other
    covariates are also included in the model to
    estimate the adjusted HR
  • Paper III
  • Again, Cox regression is used to model and
    compare survival between the two groups of
    patients
  • Outcome variables: overall survival, disease free
    survival and distant-disease free interval.
    The predictor variable is the patient group;
    other covariates are also included here

32
Other Methods
  • Diagnostic testing
  • Agreement studies
  • Multivariate methods: cluster analysis,
    discriminant analysis, factor analysis, PCA, CCA
  • Meta-analysis: combining data from different
    studies
  • Methods for correlated data - longitudinal and
    repeated measures data
  • Methods for high-dimensional data: genomics and
    genetics

33
II. Group Comparison
  • We will talk about Paper I
  • We will focus only on the comparative aspect of
    the paper
  • Talk about group comparison, multiple comparison
    using same data

34
Paper I
  • Statistical Question - Comparison of clinical
    characteristics of B-Cell lymphoma unclassifiable
    (BCLU) and Diffuse Large B-Cell Lymphoma (DLBL)
  • Several clinical variables were considered, some
    binary and some categorical, e.g. the researchers
    compared DLBL and BCLU with respect to the
    clinical variables
  • Survival is also considered in this paper but
    we will focus on the comparative aspect of the
    paper here

35
Materials and methods Paper I
  • A ten-year retrospective examination of the
    clinical characteristics, survival, treatment
    response and molecular profile of BCLU (n = 34)
    compared to DLBL (n = 97)
  • Variables considered include gender, age at
    diagnosis, International Prognostic Index (IPI),
    Eastern Cooperative Oncology Group (ECOG)
    performance status, Ann Arbor stage, presence of
    B-symptoms, bone marrow (BM) and central nervous
    system (CNS) involvement, extranodal and bulky
    disease
  • Chi-square tests and one-way analysis of variance
    were used for categorical and continuous data,
    respectively, to compare the baseline
    characteristics between groups
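As an illustration of the chi-square comparison, the sketch below applies the Pearson chi-square test to the male-gender counts reported in Table 2 of Paper I (45 of 97 DLBL and 18 of 34 BCLU patients). It assumes no continuity correction was used; the slides do not say either way.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value

# Rows: DLBL, BCLU. Columns: male, female.
stat, p_value = chi_square_2x2(45, 97 - 45, 18, 34 - 18)
print(f"chi-square = {stat:.2f}, p = {p_value:.2f}")
```

The resulting p-value agrees with the 0.51 reported for gender in Table 2.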

36
Results paper I
Table 2. Clinical characteristics at diagnosis
and treatment regimes.
Variable              DLBL n (%)   BCLU n (%)   P-value
Male gender           45 (46)      18 (53)      0.51
B symptoms            37 (41)      12 (40)      0.95
Positive BM           21 (26)      5 (19)       0.46
CNS involved          4 (5)        6 (20)       0.01
Bulky disease         18 (20)      12 (40)      0.03
Extranodal disease    35 (39)      14 (47)      0.43
IPI score                                       0.22
  0                   6 (7)        3 (10)
  1                   16 (19)      5 (17)
  2                   19 (23)      8 (28)
  3                   18 (21)      6 (21)
  4                   11 (13)      7 (24)
  5                   14 (17)      0 (0)
ECOG                                            0.43
  0                   31 (34)      11 (37)
  1                   27 (30)      10 (33)
  2                   12 (13)      6 (20)
  3                   15 (17)      1 (3)
  4                   6 (7)        2 (7)
Ann Arbor stage                                 0.33
  1                   18 (20)      8 (27)
  2                   18 (20)      5 (17)
  3                   14 (15)      8 (27)
  4                   41 (45)      9 (30)
Treatment regime
  BL-like             4 (4)        4 (12)
  DLBL-like           74 (76)      24 (71)
  Palliative          13 (13)      1 (3)
  No treatment        6 (6)        5 (15)
37
III. Survival Analysis
  • Focus on statistical methods used in Papers II and
    III
  • We will discuss
  • Study type and design
  • Materials and Methods
  • Statistical Analysis
  • Results

38
Paper II
39
Paper III
40
Study type - Paper II
  • Randomized controlled phase 3 trial done at 80
    centers across Canada and the USA
  • Women with invasive breast cancer were randomly
    assigned to one of two surgical procedures
  • Group 1: sentinel-lymph-node resection (SLN) plus
    axillary-lymph-node dissection (ALND)
  • Group 2: SLN alone, with ALND only if the SLNs
    were positive
  • Randomization was stratified by age (≤ 49, ≥ 50),
    tumor size and surgical plan (lumpectomy,
    mastectomy)
  • Primary outcome was overall survival, but other
    secondary outcomes were also considered

41
(No Transcript)
42
Study type - Paper III
  • Retrospective and observational study based on a
    previously conducted RCT
  • Paraffin-embedded tissue blocks of sentinel lymph
    nodes obtained from patients with pathologically
    negative SLNs were centrally evaluated for occult
    metastases
  • The objective is to estimate the proportion of
    patients with occult metastases and to compare
    survival between the groups of patients with and
    without occult metastases

43
(No Transcript)
44
Methods, survival analysis
  • In both papers, the primary outcome is overall
    survival; secondary outcomes include disease free
    survival and regional control
  • Both papers used the log rank test and Cox
    proportional hazard models
  • Kaplan-Meier curves were used in both papers
  • In both papers, HRs and adjusted HRs with 95% CIs
    were provided

45
Survival analysis
  • Survival analysis is used to analyze
    time-to-event data, which arise in both clinical
    and cohort studies
  • Event: death, disease occurrence, disease
    recurrence, recovery, or other experience of
    interest
  • Time: the time from the beginning of an
    observation period (e.g., surgery) to (a) an
    event, or (b) end of the study, or (c) loss of
    contact or withdrawal from the study
  • We almost never observe the event of interest in
    all subjects; for patients without an observed
    event, we don't know their survival time

46
Survival analysis
  • Censoring/censored observation
  • When a subject does not have an event during the
    observation time, they are described as censored,
    meaning that we cannot observe what has happened
    to them subsequently.
  • A censored subject may or may not have an event
    after the end of observation time
  • Such survival times are called censored.

47
Survival analysis
  •  

48
Survival analysis
  • Median survival - the time point at which 50% of
    the population is still surviving
  • Mean survival - the average survival time (not
    commonly used), calculated as the number of
    years survived by all patients divided by the
    number of deaths
  • 5 year survival - the proportion of patients who
    survive 5 years (it can be 1 year, 2 years, 5
    years or 10 years depending on the nature of the
    event)
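The Kaplan-Meier estimate used in both papers handles censored observations by shrinking the risk set without counting an event. A minimal sketch follows, with median survival and survival at a fixed time read off the estimated curve. The follow-up times are hypothetical, invented for illustration, and the function names are not from any library.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events[i] is 1 for an observed event, 0 if censored.
    Returns [(time, survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # everyone leaving at time t
        deaths = sum(tied)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)                          # events and censored both leave
        i += len(tied)
    return curve

def median_survival(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None   # median not reached during follow-up

def survival_at(curve, t):
    """Estimated proportion surviving beyond time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s

# Hypothetical follow-up times in years; 0 marks a censored observation.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve), survival_at(curve, 12))
```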

Time to event
49
Paper II
50
Paper III
51
Paper III
52
Paper I
53
Comparing survival curves
  • One use of KM curves is to compare survival between
    two or more groups
  • We can visually compare median survival, mean
    survival, 5 year survival etc.
  • We need a more objective way of comparing: the
    log rank test

Remission time for acute myelogenous leukemia.
Group 1 received maintenance chemo; group 2
did not.
54
Log rank test
Alive Dead Total
Group 1 a1 d1 n1
Group 2 a2 d2 n2
Total a d n
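The log rank test builds a 2 x 2 table like the one above at every distinct event time, then compares the observed number of events in group 1 with the number expected if the two groups had the same survival. A minimal sketch, using the standard hypergeometric variance and hypothetical data; it is not the code used in either paper.

```python
from math import erfc, sqrt

def log_rank(times1, events1, times2, events2):
    """Two-group log rank test. events are 1 (event) or 0 (censored).
    Returns (chi_square, p_value) on 1 degree of freedom."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)   # at risk, group 1
        n2 = sum(1 for tt, _, g in data if tt >= t and g == 2)   # at risk, group 2
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n1 + n2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed1 - expected1) ** 2 / variance
    return chi_square, erfc(sqrt(chi_square / 2))

# Hypothetical remission times (any time unit); 0 marks a censored observation.
group1 = ([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1])
group2 = ([1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 1, 0])
chi_square, p_value = log_rank(*group1, *group2)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```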
 
55
Cox proportional hazard model
  •  
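The formula on this slide is not reproduced in the transcript. For reference, the standard form of the Cox proportional hazards model is sketched below. The notation is conventional and assumed here rather than taken from the slide: $h_0(t)$ is the baseline hazard, $x_1$ is a 0/1 group indicator, and the remaining $x_j$ are covariates.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p),
\qquad
\mathrm{HR} = \frac{h(t \mid x_1 = 1, x_2, \dots, x_p)}{h(t \mid x_1 = 0, x_2, \dots, x_p)} = \exp(\beta_1)
```

With only $x_1$ in the model, $\exp(\beta_1)$ is the unadjusted HR; with other covariates included it is the adjusted HR. The ratio does not depend on $t$, which is the proportional hazards assumption.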

56
Results from paper II
  • Comparison of overall survival between group 1
    (SLN + ALND) and group 2 (SLN alone) resulted in
    unadjusted HR 1.20 (95% CI 0.96-1.50,
    p-value = 0.12)
  • 8 year KM estimates for overall survival were
    91.8% (95% CI 90.4-93.3) for group 1 and 90.3%
    (95% CI 88.8-91.8) for group 2
  • Comparison of disease free survival resulted in
    HR of 1.05 (95% CI 0.90-1.22, p-value = 0.54)
  • 8 year KM estimates for disease free survival
    were 82.4% (95% CI 80.5-84.4) for group 1 and
    81.5% (95% CI 79.6-83.4) for group 2
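A hazard ratio and its confidence interval carry roughly the same information as the p-value. The sketch below back-calculates an approximate two-sided p-value from the disease free survival result (HR 1.05, 95% CI 0.90-1.22) by working on the log scale. It assumes a Wald-type interval and is only a rough check: the inputs are rounded, and the paper's own test need not match it exactly.

```python
from math import log
from statistics import NormalDist

def p_from_hr_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided Wald p-value recovered from a hazard ratio
    and its confidence limits, working on the log scale."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)   # SE of log(HR)
    z = log(hr) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Disease free survival in Paper II: HR 1.05 (95% CI 0.90-1.22).
p_dfs = p_from_hr_ci(1.05, 0.90, 1.22)
print(f"approximate p-value: {p_dfs:.2f}")
```

The result is close to the reported 0.54, and the interval containing 1 already signals that the difference is not significant at the 5% level.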

57
Results from Paper III
  • Occult metastases were detected in 15.9% (95% CI
    14.7-17.1) of the patients
  • There was a significant difference in overall
    survival (p-value = 0.03), disease free survival
    (p-value = 0.02) and distant disease free survival
    (p-value = 0.04) between the two groups of patients
  • The 5 year KM survival estimates for group 1 (in
    whom occult metastases were detected) were 94.6%,
    86.4% and 89.7% for overall, disease free and
    distant disease free survival, respectively
  • For group 2, the estimates were 95.8%, 89.2% and
    92.5%, respectively

58
(No Transcript)
59
IV. Design Issues: sample size
  • Determining the sample size for a study is a
    crucial component of study design
  • A small sample size leads to imprecise estimates,
    and the study will be underpowered to detect
    differences, in particular when the effect size
    is very small
  • Using too many subjects may result in
    statistically significant conclusions and clear
    future study directions; however, the same
    answer could have been obtained with fewer
    subjects, so the study is overpowered and
    resources are wasted
  • Some treatments are also invasive
60
  • It is important to choose the primary outcome
  • Important parameters are needed for sample
    size calculations: desired power (1 - type II
    error), level of the test (type I error, or α),
    variance (or standard deviation) and effect size
    (the minimum difference to be detected)
  • E.g. comparing two means: 80% power, 0.05 level,
    range of sd, range of MD
  • N = 102 (50 for each group) is needed to achieve
    80% power when MD = 3.5 and sd = 3
  • Only N = 9 (3 for each group) is needed to achieve
    80% power when MD = 1 and sd = 1
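The way power, test level, standard deviation and effect size combine can be sketched with the usual normal-approximation formula for comparing two means with equal group sizes. The slides do not say which formula or software produced the figures above, and the transcript does not fully preserve their inputs, so the example uses its own hypothetical values and is not meant to reproduce them.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mean_diff, sd, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (two-sided test,
    equal group sizes, common SD, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mean_diff ** 2)

# Hypothetical inputs: detect a mean difference of 5 when the SD is 10.
print(n_per_group(5, 10))
# Halving the difference to be detected roughly quadruples the sample size.
print(n_per_group(2.5, 10))
```

A t-based calculation gives slightly larger numbers for small samples.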

61
  • Multiple logistic regression: we need to choose
    the primary predictor variable
  • Provide parameter values based on this primary
    predictor
  • Consider the correlation between this predictor
    and the other predictors or covariates
  • When the correlation is high, we need a larger
    sample size

62
Summary
  • Statistical question, outcome variable, primary
    and secondary outcomes
  • Estimation, CI, hypothesis testing
  • Multiple comparison
  • Survival analysis and linear regression
    (comparison between two or more groups)
  • Design issues: sample size