Size matters: the value of large scale epidemiology presentation

1
Size matters: the value of large scale
epidemiology

Paul Burton
Professor of Genetic Epidemiology
University of Leicester
P³G Consortium
PHOEBE

2
A daunting task!

Need for extensive, valid information
Developments in biotechnology, IT
Pre-morbid and longitudinal life-style/environment
relevant
Bioclinical complexity → low statistical power!!

3
Large scale genetic epidemiology

Focus on the aetiology of complex diseases
Common disease, common variant hypothesis
a shift in paradigm from linkage to association
BUT serious failure to identify associations
that can consistently be replicated

4
Hattersley AT, McCarthy MI. Lancet 2005;366:1315-1323
Examples of some polymorphisms or haplotypes that
have shown consistent association with complex
disease
5
Why has replication proved to be so difficult?

Poorly designed studies
e.g. wrong controls, family v non-family designs
Poorly conducted analyses and meta-analyses
e.g. use of inefficient or inconsistent methods;
failure to take proper account of extreme
multiple testing; publication and/or reporting
bias
Inconsistent definitions of outcome or exposure
e.g. what do we mean by asthma?
Poor methods of assessment
e.g. bad choice of SNP genotyping platform

6
Why has replication proved to be so difficult?

Heterogeneity
e.g. stroke encompasses important
subcategories; phenocopies; pleiotropy
Population substructure
Latent stratification and admixture pertaining to
population of origin

7
Why has replication proved to be so difficult?

LOW STATISTICAL POWER!!
A key feature of almost all proffered
explanations, and/or of the approaches needed to
correct for them
If we need 5,000 cases to test for a given
aetiological effect with a power of 80%, and with
a critical p-value of 0.0001, how much power
would there be for a study with 500 cases?

8
Why has replication proved to be so difficult?

LOW STATISTICAL POWER!!
A key feature of almost all proffered
explanations, and/or of the approach needed to
correct for them
If we need 5,000 cases to test for a given
aetiological effect with a power of 80%, and with
a critical p-value of 0.0001, how much power
would there be for a study with 500 cases?

≈ 0.008!!
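
The 0.008 figure can be checked with a rough calculation. The sketch below is not taken from the talk: it assumes a two-sided test with a normally distributed test statistic whose noncentrality grows with the square root of the number of cases, and the function name and parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_at_reduced_size(n_ref, power_ref, n_new, alpha):
    """Approximate power at n_new cases, given the power achieved at n_ref cases.

    Assumption: two-sided z-test; noncentrality is proportional to sqrt(n).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)        # critical value for the two-sided test
    ncp_ref = z_crit + nd.inv_cdf(power_ref)  # noncentrality that gives power_ref at n_ref
    ncp_new = ncp_ref * sqrt(n_new / n_ref)   # rescale noncentrality to the new sample size
    return nd.cdf(ncp_new - z_crit)           # ignores the negligible opposite tail


# 5,000 cases give 80% power at p = 0.0001; what do 500 cases give?
print(round(power_at_reduced_size(5000, 0.80, 500, 1e-4), 3))  # -> 0.008
```

Under these assumptions, a tenfold cut in cases drops power from 80% to under 1%.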
9
How should we respond?

Increase the quality of individual studies
Limit measurement/assessment error
Increase the size of individual studies
Promote harmonization to enable data pooling and
integration

10
How should we respond?

Increase the quality of individual studies
Limit measurement/assessment error
Increase the size of individual studies
Promote harmonization to enable data pooling and
integration

→ MAJOR international investment in biobanks
and biobank harmonization
11
What is a biobank?

An organised collection of human biological
material and associated information stored for
one or more research purposes
Population Biobanks Lexicon (P3G, PHOEBE)
Types
Disease-specific
Exposure-focused
Population-based

12
Justification for large-scale genetic-epidemiology
programs
13
BIG per se

No argument about
Need to increase statistical power
Benefit of constructing biobanks containing
extensive case-series for case-control studies
Benefit of constructing large acceptably
representative series of controls for each nation

14
BIG cohort studies

Studies of the joint effects of genes and
environment/life-style
Genotype-based studies
The genetics of disease progression
Direct association of genes with disease
Population-based replication studies
Universal controls

15
BUT how big is big?
With Anna Hansell, Imperial College
16
The statistical power of case-control studies

Contemporary pre-eminence of genetic association
studies rather than genetic linkage studies
Covers both stand-alone case-control studies, and
nested case-control studies in large cohorts.
Main issue is the number of cases.
This determines the required sample size in both settings

17
Simulation-based power calculations

Work with the least powerful (common) setting
Disease outcome and exposures all binary
Logistic regression: interactions = departure
from a multiplicative model
Complexity (arbitrary but realistic).
Four controls per case
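
A minimal sketch of what a simulation-based power calculation can look like for the simplest setting listed above (binary genotype, binary disease, four controls per case). It is not the simulation used for this talk: it leaves out the "realistic complexity" entirely, and it swaps in a pooled two-proportion z-test as a stand-in for logistic regression. The function name, defaults and seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(n_cases, odds_ratio, geno_freq, alpha,
                   controls_per_case=4, n_sims=200, seed=1):
    """Monte Carlo power for a binary genotype in an unmatched case-control study."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_controls = controls_per_case * n_cases
    # Genotype frequency among cases implied by the odds ratio
    # (assumes controls carry the at-risk genotype at the population frequency).
    odds_cases = odds_ratio * geno_freq / (1 - geno_freq)
    p_cases = odds_cases / (1 + odds_cases)
    hits = 0
    for _ in range(n_sims):
        a = sum(rng.random() < p_cases for _ in range(n_cases))       # at-risk cases
        b = sum(rng.random() < geno_freq for _ in range(n_controls))  # at-risk controls
        pooled = (a + b) / (n_cases + n_controls)
        se = sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
        z = (a / n_cases - b / n_controls) / se if se > 0 else 0.0
        hits += abs(z) > z_crit  # count simulated studies that reach significance
    return hits / n_sims


# Genotype frequency 0.1, odds ratio 1.5, critical p-value 0.0001
print(simulate_power(n_cases=2000, odds_ratio=1.5, geno_freq=0.1, alpha=1e-4))
```

Because this toy version has no misclassification, heterogeneity or other complexity, it should be expected to give more optimistic power than the figures reported in the talk.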

18
Diabetes mellitus defined by HbA1C above the
97.5th percentile
19
Genetic main effects
Prevalence of at-risk genotype 0.1, 0.5
20
Lifestyle main effects
Prevalence of at-risk life-style determinant 0.5
Reliability
1.0 measured height
0.9 self-reported weight
0.7 office BP, measured serum cholesterol
0.5 dietary recall of many components
(4 × 24 hr recalls)
21
Gene-lifestyle interactions
Prevalence of at-risk genotype 0.1
Prevalence of at-risk life-style determinant
0.5
22
Mean power ≈ 55%
23
What is needed?

Genetic main effects
2,500-10,000 cases
Life-style main effects
5,000-20,000 cases
Gene-lifestyle interactions
Probably need at least 20,000 cases

24
How can this be achieved?

Large disease-based biobanks
Very large cohort-based biobanks
But how large do these need to be?

25
Expected event rates in UK Biobank
With Anna Hansell, Imperial College
26
Taking account of

Age range at recruitment 40-69 years
Recruitment over 5 years
All cause mortality
Disease incidence (healthy cohort effect)
Migration overseas
Withdrawal from the study

27
(No Transcript)
28
Conclusions

Having taken account of realistic bioclinical
complexity, a cohort-based biobank needs to be
very large if it is to provide a stand-alone
infrastructure
Anything much less than 500,000 recruits severely
curtails the number of diseases that will be able
to be studied based on that biobank alone
The value of any biobank will be greatly
augmented if it proves possible to set up a
coherent and scientifically harmonized
international network of biobanks

29
What is biobank harmonization?
30
Biobank harmonization

A set of procedures that promote, both now and in
the future, the effective interchange of valid
information and samples between a number of
studies or biobanks, accepting that there may be
important differences between those studies
With thanks to Alastair Kent

31
Biobank harmonization

Prospective harmonization
Aims to modify study design and conduct, ahead of
time, in order to render subsequent data and
sample pooling more efficient and more
straightforward
Retrospective harmonization
Aims to optimize the pooling of data, samples and
phenotypes that have already been collected,
between studies with inevitably heterogeneous
designs.

32
Why harmonize?

Investigate less common (but not rare!!!)
conditions
UKBB: stomach cancer, 2,500 cases in 29 years
6 UKBB equivalents → 10,000 cases in 20 years
Investigate smaller ORs
Genetic main effect OR 1.5 → 1.2 requires
2,000 → 12,600 cases
6.3 UKBB equivalents
Analysis based on subsets: homogeneous classes
of phenotype, or e.g. by sex

33
Why harmonize?

Earlier analyses
UKBB: Alzheimer's disease, 10,000 cases in 18 yrs
5 UKBB equivalents → 9 years
Events at younger ages
Broad range of environmental exposures
Aim for 5-6 UKBB equivalents
2.5M-3M recruits
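
The "UKBB equivalents" arithmetic on the two "Why harmonize?" slides can be reproduced as below. This is only a sketch: it assumes that, for the same follow-up, the number of incident cases scales linearly with the number of recruits, and it takes the 500,000-recruit cohort size quoted elsewhere in this talk as one UKBB equivalent. The names are illustrative.

```python
UKBB_RECRUITS = 500_000  # cohort size quoted for UK Biobank in this talk


def ukbb_equivalents(cases_needed, cases_available):
    """How many UK Biobank-sized cohorts are needed to reach cases_needed.

    Assumption: cases accrue in proportion to the number of recruits.
    """
    return cases_needed / cases_available


# Odds ratio 1.5 needs 2,000 cases; odds ratio 1.2 needs 12,600 cases.
print(ukbb_equivalents(12_600, 2_000))       # -> 6.3

# Recruits implied by aiming for 5-6 UKBB equivalents.
print(5 * UKBB_RECRUITS, 6 * UKBB_RECRUITS)  # -> 2500000 3000000
```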

34
Some key issues

Scientifically and politically VERY challenging
Laboratory science, clinical science, population
science, IT challenges, ethico-legal issues
A need for REAL collaboration and tools that are
ACCESSIBLE and USABLE
Case-control and cohort studies

35
International biobank harmonization programs

Public Population Project in Genomics (P3G)
Tom Hudson, Bartha Knoppers, Isabel Fortier
Population Biobanks
FP6 Co-ordination Action (PHOEBE: Promoting
Harmonization Of Epidemiological Biobanks in
Europe)
Jennifer Harris, Leena Peltonen, Paul Burton
Human Genome Epidemiology Network (HuGENet)
Muin Khoury, Julian Little
ESSENTIAL THAT ALL INITIATIVES WORK TOGETHER!!

36
Extra slides
37
Rarer genotypes

Genetic main effects

38
Proposed assessment visit model
39
Taking account of

Age range at recruitment 40-69 years
Recruitment over 5 years
All cause mortality
Disease incidence (healthy cohort effect)
Migration overseas
Comprehensive withdrawal (max 1/500 p.a.)
Partial withdrawal (c.f. 1958 Birth Cohort)

40
(No Transcript)
41
Necessary to contact subjects
42
Issues that are often ignored in standard power
calculations

Multiple testing/low prior probability of
association
Interactions
Unobserved frailty
Misclassification
Genotype
Environmental determinant
Case-control status
Subgroup analyses
Population substructure

43
Harmonisation

Prospective
Retrospective
Description
Comparison
Harmonised synthesis

44
(No Transcript)
45
(No Transcript)
46
Recruitment and assessment

Recruitment via centrally held list of
individuals registered with Primary Care
Practitioners (GPs)
Assessment in large centres (approx. 100 subjects
per day)
Assessment approx. 70 minutes
Questionnaire, physical examination, bloods

47
Assessment visit model
48
Summary

80% power for genotype frequency 0.1
Genetic main effect OR 1.5, p = 10^-4 → 2,000 cases
Genetic main effect OR 1.3, p = 10^-4 → 5,500 cases
Genetic main effect OR 1.2, p = 10^-4 → 12,600 cases
Genetic main effect OR 1.7, p = 10^-7 → 2,000 cases
Genetic main effect OR 1.5, p = 10^-7 → 3,400 cases
Genetic main effect OR 1.3, p = 10^-7 → 9,500 cases
Genetic main effect OR 1.2, p = 10^-7 → 21,500 cases
G×E interaction with environmental exposure
prevalence 0.5
Interaction OR 2.0, p = 10^-4 → 10,000 to 30,000
cases

49
UK Biobank

A prospective cohort study
500,000 adults (40-69 years) across UK
A population-based biobank
Not disease or exposure based
Recruitment via electronic GP lists
Broad spectrum, not fully representative
Individuals, not families
MRC, Wellcome Trust, DH, Scottish Executive
£61M

50
UK Biobank

Initial data/sample collection and subsequent
longitudinal health tracking
Nested case-control studies
Long time-horizon
Owned by the Nation
Central administration: Manchester
PI Prof Rory Collins - Oxford
6 collaborating groups (RCCs) of university
scientists

51
Smaller sample sizes
