
Title: Size matters: the value of large scale epidemiology


1
Size matters: the value of large scale epidemiology
  • Paul Burton
  • Professor of Genetic Epidemiology
  • University of Leicester
  • P³G Consortium
  • PHOEBE

2
A daunting task!
  • Need for extensive, valid information
  • Developments in biotechnology, IT
  • Pre-morbid and longitudinal life-style/environment
    relevant
  • Bioclinical complexity → low statistical power!!

3
Large scale genetic epidemiology
  • Focus on the aetiology of complex diseases
  • Common disease, common variant hypothesis
  • A shift in paradigm from linkage to association
  • BUT serious failure to identify associations
    that can consistently be replicated

4
Hattersley AT, McCarthy MI. Lancet 2005; 366: 1315-1323.
Examples of some polymorphisms or haplotypes that have shown
consistent association with complex disease
5
Why has replication proved to be so difficult?
  • Poorly designed studies
  • e.g. wrong controls, family v non-family designs
  • Poorly conducted analyses and meta-analyses
  • e.g. use of inefficient or inconsistent methods;
    failure to take proper account of extreme
    multiple testing; publication and/or reporting
    bias
  • Inconsistent definitions of outcome or exposure
  • e.g. what do we mean by asthma?
  • Poor methods of assessment
  • e.g. bad choice of SNP genotyping platform

6
Why has replication proved to be so difficult?
  • Heterogeneity
  • e.g. stroke encompasses important
    subcategories; phenocopies; pleiotropy
  • Population substructure
  • Latent stratification and admixture pertaining to
    population of origin

7
Why has replication proved to be so difficult?
  • LOW STATISTICAL POWER!!
  • A key feature of almost all proffered
    explanations, and/or of the approaches needed to
    correct for them
  • If we need 5,000 cases to test for a given
    aetiological effect with a power of 80%, and with
    a critical p-value of 0.0001, how much power
    would there be for a study with 500 cases?

8
Why has replication proved to be so difficult?
  • LOW STATISTICAL POWER!!
  • A key feature of almost all proffered
    explanations, and/or of the approaches needed to
    correct for them
  • If we need 5,000 cases to test for a given
    aetiological effect with a power of 80%, and with
    a critical p-value of 0.0001, how much power
    would there be for a study with 500 cases?

≈ 0.008!!
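
The figure of about 0.008 can be reproduced, at least approximately, with a normal-approximation sketch. The assumptions here are not from the slides: a two-sided z-test, and a non-centrality parameter that scales with the square root of the number of cases. The function name is illustrative only.

```python
from statistics import NormalDist


def power_with_fewer_cases(n_ref, power_ref, alpha, n_new):
    """Approximate power at n_new cases, given power_ref at n_ref cases.

    Assumes a two-sided z-test and ignores the negligible opposite tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Non-centrality implied by the reference design
    ncp_ref = z_alpha + z.inv_cdf(power_ref)
    # Non-centrality scales with the square root of the sample size
    ncp_new = ncp_ref * (n_new / n_ref) ** 0.5
    return z.cdf(ncp_new - z_alpha)


# 5,000 cases give 80% power at p = 0.0001; what do 500 cases give?
print(round(power_with_fewer_cases(5000, 0.80, 1e-4, 500), 3))  # 0.008
```

Under these assumptions a tenfold cut in cases reduces power from 80% to under 1%, which is the point the slide is making.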
9
How should we respond?
  • Increase the quality of individual studies
  • Limit measurement/assessment error
  • Increase the size of individual studies
  • Promote harmonization to enable data pooling and
    integration

10
How should we respond?
  • Increase the quality of individual studies
  • Limit measurement/assessment error
  • Increase the size of individual studies
  • Promote harmonization to enable data pooling and
    integration

→ MAJOR international investment in biobanks
and biobank harmonization
11
What is a biobank?
  • An organised collection of human biological
    material and associated information stored for
    one or more research purposes
  • Population Biobanks Lexicon (P3G, PHOEBE)
  • Types
  • Disease-specific
  • Exposure-focused
  • Population-based

12
Justification for large-scale genetic-epidemiology
programs
13
BIG per se
  • No argument about
  • Need to increase statistical power
  • Benefit of constructing biobanks containing
    extensive case-series for case-control studies
  • Benefit of constructing large, acceptably
    representative series of controls for each nation

14
BIG cohort studies
  • Studies of the joint effects of genes and
    environment/life-style
  • Genotype-based studies
  • The genetics of disease progression
  • Direct association of genes with disease
  • Population-based replication studies
  • Universal controls

15
BUT how big is big?
With Anna Hansell, Imperial College
16
The statistical power of case-control studies
  • Contemporary pre-eminence of genetic association
    studies rather than genetic linkage studies
  • Covers both stand-alone case-control studies, and
    nested case-control studies in large cohorts.
    Main issue is the number of cases.
  • Sample size determination in both settings

17
Simulation-based power calculations
  • Work with the least powerful (common) setting
  • Disease outcome and exposures all binary
  • Logistic regression: interactions = departure
    from a multiplicative model
  • Complexity (arbitrary but realistic).
  • Four controls per case
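
A stripped-down version of a simulation-based power calculation might look like the sketch below. It is an illustration under stated assumptions, not the simulation used for the presentation. It has a single binary exposure with no measurement error, heterogeneity or interaction terms, and a Wald test on the log odds ratio of the 2×2 table, which matches logistic regression with one binary covariate. It also takes the exposure prevalence among controls to equal the population prevalence. The function name, seed and number of replicates are arbitrary.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n_cases, odds_ratio, exposure_prev=0.1,
                   controls_per_case=4, alpha=1e-4, n_sims=200, seed=1):
    """Estimate power by simulating case-control 2x2 tables."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_controls = controls_per_case * n_cases
    # Exposure probability among cases implied by the odds ratio
    odds_cases = odds_ratio * exposure_prev / (1 - exposure_prev)
    p_cases = odds_cases / (1 + odds_cases)
    hits = 0
    for _ in range(n_sims):
        a = sum(rng.random() < p_cases for _ in range(n_cases))  # exposed cases
        c = sum(rng.random() < exposure_prev for _ in range(n_controls))  # exposed controls
        b, d = n_cases - a, n_controls - c
        if min(a, b, c, d) == 0:
            continue  # log odds ratio undefined; treat as non-significant
        log_or = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        if abs(log_or / se) > z_crit:
            hits += 1
    return hits / n_sims


print(simulate_power(n_cases=2000, odds_ratio=1.5))
```

Because this sketch leaves out the realistic complexity the presentation builds in, it will tend to report higher power than the figures quoted on later slides for the same number of cases.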

18
Diabetes mellitus defined by HbA1c above the 97.5th
percentile
19
Genetic main effects
Prevalence of at-risk genotype 0.1, 0.5
20
Lifestyle main effects
Prevalence of at-risk life-style determinant 0.5
Reliability
  • 1.0: measured height
  • 0.9: self-reported weight
  • 0.7: office BP, measured serum cholesterol
  • 0.5: dietary recall of many components (4 × 24 hr
    recalls)
21
Gene-lifestyle interactions
Prevalence of at-risk genotype 0.1
Prevalence of at-risk life-style determinant
0.5
22
Mean power ≈ 55%
23
What is needed?
  • Genetic main effects
  • 2,500-10,000 cases
  • Life-style main effects
  • 5,000-20,000 cases
  • Gene-lifestyle interactions
  • Probably need at least 20,000 cases

24
How can this be achieved?
  • Large disease-based biobanks
  • Very large cohort-based biobanks
  • But how large do these need to be?

25
Expected event rates in UK Biobank
With Anna Hansell, Imperial College
26
Taking account of
  • Age range at recruitment 40-69 years
  • Recruitment over 5 years
  • All cause mortality
  • Disease incidence (healthy cohort effect)
  • Migration overseas
  • Withdrawal from the study

27
(No Transcript)
28
Conclusions
  • Having taken account of realistic bioclinical
    complexity, a cohort-based biobank needs to be
    very large if it is to provide a stand-alone
    infrastructure
  • Anything much less than 500,000 recruits severely
    curtails the number of diseases that will be able
    to be studied based on that biobank alone
  • The value of any biobank will be greatly
    augmented if it proves possible to set up a
    coherent and scientifically harmonized
    international network of biobanks

29
What is biobank harmonization?
30
Biobank harmonization
  • A set of procedures that promote, both now and in
    the future, the effective interchange of valid
    information and samples between a number of
    studies or biobanks, accepting that there may be
    important differences between those studies
  • With thanks to Alastair Kent

31
Biobank harmonization
  • Prospective harmonization
  • Aims to modify study design and conduct, ahead of
    time, in order to render subsequent data and
    sample pooling more efficient and more
    straightforward
  • Retrospective harmonization
  • Aims to optimize the pooling of data, samples and
    phenotypes that have already been collected,
    between studies with inevitably heterogeneous
    designs.

32
Why harmonize?
  • Investigate less common (but not rare!!!)
    conditions
  • UKBB: stomach cancer, 2,500 cases in 29 years
  • 6 UKBB equivalents → 10,000 cases in 20 years
  • Investigate smaller ORs
  • Genetic main effect 1.5 → 1.2 requires 2,000 →
    12,600 cases
  • 6.3 UKBB equivalents
  • Analysis based on subsets: homogeneous classes
    of phenotype, or e.g. by sex
33
Why harmonize?
  • Earlier analyses
  • UKBB: Alzheimer's disease, 10,000 cases in 18 yrs
  • 5 UKBB equivalents → 9 years
  • Events at younger ages
  • Broad range of environmental exposures
  • Aim for 5-6 UKBB equivalents
  • 2.5M-3M recruits
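
The arithmetic behind the "UKBB equivalents" on the two "Why harmonize?" slides can be checked directly. This small sketch takes the 500,000-recruit cohort size from elsewhere in the presentation. The constant and function names are illustrative only.

```python
UKBB_RECRUITS = 500_000  # UK Biobank cohort size quoted in the presentation


def recruits_needed(ukbb_equivalents):
    """Total recruits across a network worth this many UK Biobanks."""
    return ukbb_equivalents * UKBB_RECRUITS


# Aiming for 5-6 UKBB equivalents
print(recruits_needed(5), recruits_needed(6))  # 2500000 3000000

# Moving from an effect of 1.5 to 1.2 raises the cases needed
# from 2,000 to 12,600, i.e. this many times more cases
print(12_600 / 2_000)  # 6.3
```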

34
Some key issues
  • Scientifically and politically VERY challenging
  • Laboratory science, clinical science, population
    science, IT challenges, ethico-legal issues
  • A need for REAL collaboration and tools that are
    ACCESSIBLE and USABLE
  • Case-control and cohort studies

35
International biobank harmonization programs
  • Public Population Project in Genomics (P3G)
  • Tom Hudson, Bartha Knoppers, Isabel Fortier
  • Population Biobanks
  • FP6 Co-ordination Action (PHOEBE: Promoting
    Harmonization Of Epidemiological Biobanks in
    Europe)
  • Jennifer Harris, Leena Peltonen, Paul Burton
  • Human Genome Epidemiology Network (HuGENet)
  • Muin Khoury, Julian Little
  • ESSENTIAL THAT ALL INITIATIVES WORK TOGETHER!!

36
Extra slides
37
Rarer genotypes
  • Genetic main effects

38
Proposed assessment visit model
39
Taking account of
  • Age range at recruitment 40-69 years
  • Recruitment over 5 years
  • All cause mortality
  • Disease incidence (healthy cohort effect)
  • Migration overseas
  • Comprehensive withdrawal (max 1/500 p.a.)
  • Partial withdrawal (c.f. 1958 Birth Cohort)
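
How the factors listed on this slide combine into an expected number of incident cases can be sketched with a simple year-by-year projection. This is a simplified illustration, not the model behind the slides. The function name is hypothetical, and the incidence and mortality rates in the example call are placeholders. Recruitment is spread evenly over the recruitment period, rates are held constant across age, and migration is folded into the withdrawal rate. Only the 500,000 recruits, the 5-year recruitment window and the 1/500 p.a. withdrawal ceiling come from the presentation.

```python
def expected_cases(n_recruits, follow_up_years, recruit_years,
                   incidence, mortality, withdrawal):
    """Project cumulative incident cases in a cohort with yearly attrition."""
    at_risk = 0.0
    cases = 0.0
    for year in range(follow_up_years):
        if year < recruit_years:
            at_risk += n_recruits / recruit_years  # even recruitment
        new_cases = at_risk * incidence
        cases += new_cases
        at_risk -= new_cases
        # Deaths and withdrawals remove people from follow-up
        at_risk *= (1 - mortality) * (1 - withdrawal)
    return cases


# Hypothetical annual incidence and mortality, for illustration only
total = expected_cases(500_000, 20, 5, incidence=0.0005,
                       mortality=0.01, withdrawal=1 / 500)
print(round(total))
```

Even this crude version shows why attrition matters: setting mortality and withdrawal to zero gives a noticeably larger case count for the same cohort and follow-up.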

40
(No Transcript)
41
Necessary to contact subjects
42
Issues that are often ignored in standard power
calculations
  • Multiple testing/low prior probability of
    association
  • Interactions
  • Unobserved frailty
  • Misclassification
  • Genotype
  • Environmental determinant
  • Case-control status
  • Subgroup analyses
  • Population substructure

43
Harmonisation
  • Prospective
  • Retrospective
  • Description
  • Comparison
  • Harmonised synthesis

44
(No Transcript)
45
(No Transcript)
46
Recruitment and assessment
  • Recruitment via centrally held list of
    individuals registered with Primary Care
    Practitioners (GPs)
  • Assessment in large centres (≈ 100 subjects per
    day)
  • Assessment ≈ 70 minutes
  • Questionnaire, physical examination, bloods

47
Assessment visit model
48
Summary
  • 80% power for genotype frequency 0.1
  • Genetic main effect of 1.5, p < 10⁻⁴ → 2,000 cases
  • Genetic main effect of 1.3, p < 10⁻⁴ → 5,500 cases
  • Genetic main effect of 1.2, p < 10⁻⁴ → 12,600 cases
  • Genetic main effect of 1.7, p < 10⁻⁷ → 2,000 cases
  • Genetic main effect of 1.5, p < 10⁻⁷ → 3,400 cases
  • Genetic main effect of 1.3, p < 10⁻⁷ → 9,500 cases
  • Genetic main effect of 1.2, p < 10⁻⁷ → 21,500
    cases
  • GE interaction with environmental exposure
    prevalence 0.5
  • Effect of 2.0, p < 10⁻⁴ → 10,000 to 30,000
    cases
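
The relationship between the two significance thresholds in this summary (10^-4 and 10^-7) can be approximated with a normal-theory rule of thumb. At fixed effect size and 80% power, the number of cases needed is proportional to the square of (z for the two-sided threshold + z for the power). This is a rough check under that assumption, not the simulation used for the slides, and the function name is illustrative.

```python
from statistics import NormalDist


def case_inflation(alpha_old, alpha_new, power=0.80):
    """Factor by which cases must grow when the threshold is tightened."""
    z = NormalDist()

    def ncp(alpha):
        # Non-centrality needed for the given two-sided alpha and power
        return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

    return (ncp(alpha_new) / ncp(alpha_old)) ** 2


factor = case_inflation(1e-4, 1e-7)
print(round(factor, 2))
for cases in (2_000, 5_500, 12_600):
    # Case numbers quoted at the 10^-4 threshold, rescaled to 10^-7
    print(cases, "->", round(cases * factor))
```

The factor comes out at roughly 1.7, which is broadly in line with the step from 2,000 to 3,400 cases, 5,500 to 9,500 and 12,600 to 21,500 in the list above.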

49
UK Biobank
  • A prospective cohort study
  • 500,000 adults (40-69 years) across UK
  • A population-based biobank
  • Not disease or exposure based
  • Recruitment via electronic GP lists
  • Broad spectrum, not fully representative
  • Individuals, not families
  • MRC, Wellcome Trust, DH, Scottish Executive
  • £61M

50
UK Biobank
  • Initial data/sample collection and subsequent
    longitudinal health tracking
  • Nested case-control studies
  • Long time-horizon
  • Owned by the Nation
  • Central administration: Manchester
  • PI: Prof Rory Collins, Oxford
  • 6 collaborating groups (RCCs) of university
    scientists

51
Smaller sample sizes