Statistical Issues in Development and Evaluation of Genetic Risk Prediction Models

1
Statistical Issues in Development and Evaluation
of Genetic Risk Prediction Models
  • Nilanjan Chatterjee, PhD
  • Chief and Senior Investigator
  • Biostatistics Branch, Division of Cancer
    Epidemiology and Genetics

2
Thanks to team science!
  • Biostatistics Branch
  • JuHyun Park, Fellow
  • Paige Maas, Fellow
  • Jianxin Shi, TT Investigator
  • Joshua Sampson, TT Investigator
  • Bin Zhu, TT Investigator
  • Mitchell Gail, Investigator
  • Minsun Song, Fellow
  • DCEG
  • Stephen Chanock, Director
  • Nat Rothman, Investigator
  • Debra Silverman, Investigator

  • Other Institutions/Collaborations
  • Peter Kraft, HSPH
  • Montserrat Garcia-Closas, ICR, UK
  • Cambridge University, UK
  • German Cancer Research Center
  • BPC3 Consortium
  • BCAC Consortium
4
Utility of Risk Models
  • Individual counseling
  • Weighing risks and benefits for various
    preventive interventions
  • Screening, medication, risk-factor modification
  • Understanding the distribution of risk at the
    population level to inform public health
    strategies for prevention
  • Comparative effectiveness studies
  • Design of intervention trials

5
Methodological Issues
  • Sample size and study design
  • Model building
  • Polygenic risk score (PRS)
  • Incorporating environmental risk-factors
  • Using external information
  • Model calibration
  • Model validation and evaluation

6
Limited Discriminatory Ability of Early GWAS
Discoveries
"A tiny step to personalized risk prediction of
breast cancer"
- Devilee and Rookus, NEJM, Editorial
7
Many more to be found
8
Utility of Foreseeable Cancer SNPs
Cancer Site   FH Only   Known SNPs   Foreseeable SNPs   FH and Known SNPs   FH and Foreseeable SNPs   Epidemiologic Risk-Factors and Foreseeable SNPs
BREAST        0.536     0.599        0.635              0.613               0.646                     0.670
PROSTATE      0.549     0.647        0.676              0.668               0.694                     -
COLORECTUM    0.528     0.582        0.616              0.598               0.629                     0.658
OVARY         0.509     0.557        0.568              0.564               0.575                     -
BLADDER       0.514     0.596        0.615              0.602               0.620                     0.726
GLIOMA        0.503     0.597        0.621              0.598               0.622                     -
PANCREAS      0.517     0.576        0.600              0.588               0.610                     -
FH: family history; -: no value given
Park et al., JCO, 2012
9
Hidden Heritability for Complex Traits
Trait                                        HT     BMI    TC     HDL    LDL    CD     T1D    T2D    PrCA   CAD
Narrow sense heritability                    0.45   0.14   -      0.12   -      0.22   0.30   0.51   0.22   -
Effective sample-size for the largest GWAS   133K   162K   100K   100K   95K    25K    22K    36K    28K    73K
No. of detected SNPs                         108    31     45     35     36     64     30     22     20     21
Heritability explained by detected SNPs      0.066  0.014  0.063  0.046  0.059  0.066  0.053  0.034  0.061  0.024
  • Heritability measure: fraction of total variance
    attributable to susceptibility (quantitative
    traits) and sibling recurrence risks (qualitative
    traits)

10
Challenges
  • Many loci with very small effects are
    undetectable at genome-wide significance level
  • Can we still exploit them to improve risk
    prediction?
  • Using a more liberal threshold or a fancier
    penalized regression method?
  • Needs an understanding of power in the context
    of prediction

11
Predictive Correlation Coefficient (PCC)
  • Covariances and variances are taken with respect
    to the randomness of a new observation for which
    prediction is desired
  • The remaining randomness is due to that of the
    training dataset
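The formula on this slide did not survive in the transcript. As a minimal sketch, assuming PCC means the Pearson correlation between the trait and its polygenic prediction over new observations while the training estimates are held fixed, the simulation below shows the idea. The trait model, the sample sizes, the effect sizes, and all function names are hypothetical, and SNP scores are drawn as standard normal values purely for simplicity.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, betas, rng):
    """Draw n subjects with independent standardized SNP scores and a unit-variance trait."""
    noise_sd = math.sqrt(1.0 - sum(b * b for b in betas))
    X = [[rng.gauss(0, 1) for _ in betas] for _ in range(n)]
    y = [sum(b * g for b, g in zip(betas, row)) + rng.gauss(0, noise_sd) for row in X]
    return X, y

def marginal_slopes(X, y):
    """Least-squares slope of the trait on each SNP, fitted one SNP at a time."""
    n = len(y)
    my = sum(y) / n
    slopes = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sxx = sum((a - mx) ** 2 for a in col)
        slopes.append(sxy / sxx)
    return slopes

def pcc(weights, X_new, y_new):
    """Correlation between the polygenic score and the trait over new observations."""
    score = [sum(w * g for w, g in zip(weights, row)) for row in X_new]
    return pearson(score, y_new)

rng = random.Random(1)
true_betas = [0.1] * 20                        # hypothetical: 20 SNPs, 1% of variance each
X_train, y_train = simulate(500, true_betas, rng)
X_new, y_new = simulate(5000, true_betas, rng)

beta_hat = marginal_slopes(X_train, y_train)   # fixed once the training data are drawn
pcc_trained = pcc(beta_hat, X_new, y_new)      # changes if the training sample changes
pcc_oracle = pcc(true_betas, X_new, y_new)     # reference value when true effects are known
```

Rerunning with a different training seed changes `pcc_trained` but not the model that generates new observations, which is the second bullet's point about the remaining randomness.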


12
The Expected PCC value for GWAS Polygenic Models
  • Parameters of genetic architecture
  • Properties of the statistical method
  • For fixed N, the optimal threshold α_opt(N) can be
    chosen by maximizing μ(N, α)

Chatterjee et al., Nature Genetics, 2013
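The slide chooses the threshold by maximizing a theoretical expression that is not reproduced in the transcript. As a stand-in only, the sketch below picks a threshold empirically on a grid. It assumes independent standardized SNPs and a unit-variance trait, in which case the PCC for a fixed set of estimated weights has a closed form. The number of SNPs, the effect-size distribution, the sample size, and every name in the code are hypothetical.

```python
import math
import random

def pcc_given_training(beta_hat, beta_true, selected):
    """PCC for a new observation with the training estimates held fixed.

    Assumes independent standardized SNPs and a unit-variance trait, so that
    cov(Y, score) = sum(beta_hat * beta_true) and var(score) = sum(beta_hat ** 2).
    """
    cov = sum(beta_hat[j] * beta_true[j] for j in selected)
    var_score = sum(beta_hat[j] ** 2 for j in selected)
    return cov / math.sqrt(var_score) if var_score > 0 else 0.0

def select(beta_hat, n, alpha):
    """Indices of SNPs whose two-sided z-test p-value falls below alpha."""
    chosen = []
    for j, b in enumerate(beta_hat):
        z = b * math.sqrt(n)                        # standard error taken as 1/sqrt(n)
        p_value = math.erfc(abs(z) / math.sqrt(2))
        if p_value < alpha:
            chosen.append(j)
    return chosen

rng = random.Random(7)
n = 20000                                           # hypothetical training sample size
m_causal, m_null = 200, 4800                        # hypothetical genetic architecture
beta_true = [rng.gauss(0, math.sqrt(0.3 / m_causal)) for _ in range(m_causal)]
beta_true += [0.0] * m_null
beta_hat = [b + rng.gauss(0, 1 / math.sqrt(n)) for b in beta_true]

grid = [1e-7, 1e-5, 1e-3, 1e-2, 0.05, 0.2, 1.0]
pcc_by_alpha = {
    alpha: pcc_given_training(beta_hat, beta_true, select(beta_hat, n, alpha))
    for alpha in grid
}
alpha_opt = max(pcc_by_alpha, key=pcc_by_alpha.get)
```

In practice the true effects are unknown, so the same comparison would be made on an independent validation sample or through the analytic expected PCC referred to on the slide.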
13
Further Results
  • Many measures of discriminatory performance of
    risk models have a one-to-one relationship with
    PCC
  • Can project performance of models that include
    polygenic risk score (PRS) and family history
  • Family history effect is attenuated by a quantity
    related to PCC

Chatterjee et al., Nature Genetics, 2013
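The exact mapping between PCC and discrimination measures is in the cited paper and is not reproduced here. As a related illustration only, the sketch below uses a standard result that is assumed rather than taken from the slides: if a log-relative-risk score is normal with standard deviation sigma in controls and is shifted by sigma squared in cases, then AUC = Φ(sigma / √2), so AUC is a one-to-one function of a single summary of the score. The value of sigma and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def auc_normal_score(sigma):
    """AUC when the score is N(0, sigma^2) in controls and N(sigma^2, sigma^2) in cases."""
    return NormalDist().cdf(sigma / math.sqrt(2))

def auc_empirical(case_scores, control_scores):
    """Share of case-control pairs in which the case has the higher score (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

rng = random.Random(3)
sigma = 0.6                                          # hypothetical SD of the log-relative-risk score
controls = [rng.gauss(0.0, sigma) for _ in range(1000)]
cases = [rng.gauss(sigma ** 2, sigma) for _ in range(1000)]

auc_theory = auc_normal_score(sigma)
auc_sim = auc_empirical(cases, controls)
```

Because the closed form is strictly increasing in sigma, any measure that determines sigma also determines the AUC under these assumptions.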
14
AUC (Contd.)
Trait (AUC with FH alone)   Model       Current sample size (N)      3xN                          5xN
                                        α = 10^-7     α_opt          α = 10^-7     α_opt          α = 10^-7     α_opt
T2D (0.595)                 SNPs        0.570         0.598          0.617         0.704          0.660         0.750
T2D (0.595)                 SNPs + FH   0.632         0.654          0.667         0.736          0.700         0.776
PrCA (0.552)                SNPs        0.621         0.625          0.637         0.648          0.646         0.673
PrCA (0.552)                SNPs + FH   0.648         0.651          0.661         0.670          0.669         0.692
CAD (0.601)                 SNPs        0.582-0.584   0.587-0.589    0.595-0.604   0.612-0.650    0.603-0.629   0.635-0.676
CAD (0.601)                 SNPs + FH   0.647-0.648   0.651-0.652    0.656-0.663   0.669-0.697    0.663-0.681   0.686-0.717
15
Architecture of Joint Effects: Implications for
Disease Prevention
16
Breast Cancer Risk Modeling: BPC3 Study
  • 17,176 cases and 19,860 controls from 8
    prospective studies
  • Risk factors: family history, height, reproductive
    risk-factors, smoking, BMI, alcohol and HRT use
  • SNPs: 24 genotyped SNPs, imputed PRS for 86 SNPs

17
Steps for Building Absolute Risk Model and
Projecting Risk Distribution
  • Develop models for relative-risk
  • Construction of efficient PRS, Model selection
    for gene-gene/gene-environment interaction
  • Utilize rates from SEER cancer registry to
    calibrate absolute risk to the US population
  • Use national survey data to project risk
    distribution
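The calibration and projection steps above can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes the baseline hazard equals the population incidence rate divided by the population-average relative risk, and it ignores competing mortality. The incidence rates and relative risks are made-up numbers, not SEER or survey values, and all names are hypothetical.

```python
import math

def absolute_risk(relative_risk, annual_pop_rates, mean_relative_risk):
    """Probability of developing disease over the years covered by annual_pop_rates.

    Assumption: baseline hazard = population rate / population-average relative
    risk, with no adjustment for competing causes of death.
    """
    cumulative_hazard = sum(
        rate / mean_relative_risk * relative_risk for rate in annual_pop_rates
    )
    return 1.0 - math.exp(-cumulative_hazard)

# Made-up annual incidence rates for a 10-year window (stand-in for registry rates).
annual_pop_rates = [0.002] * 10

# Made-up relative risks for a reference sample (stand-in for national survey data).
reference_rr = [0.5, 1.0, 1.5, 3.0]
mean_rr = sum(reference_rr) / len(reference_rr)

# Projected distribution of 10-year absolute risk across the reference sample.
risk_distribution = [
    absolute_risk(rr, annual_pop_rates, mean_rr) for rr in reference_rr
]
```

Dividing by the average relative risk keeps the mean hazard across the reference sample equal to the population rate, which is what calibrating to registry rates is meant to achieve.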

18
Gene-gene/Gene-Environment Interactions in
Disease-risk
  • Interaction in what scale?
  • Logistic, probit (liability threshold),
    additive
  • Little evidence of SNP-SNP/SNP-E interactions
    under the logistic scale
  • Lack of power or are risks truly multiplicative?
  • Does the scale matter?
  • Important to have good model-fit at extremes of
    disease risks
  • Clinically important

19
Linear Logistic vs Linear Additive Null Models
  • Linear logistic
  • Linear additive
  • Can be fitted in the logistic scale under rare
    disease assumption
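The model formulas for this slide are missing from the transcript. As a hedged sketch, assuming the two null models are the usual no-interaction forms, the code below compares the joint relative risk they imply. Under the linear logistic model log odds add, so with a rare disease the relative risks multiply. Under the linear additive model excess risks add, so the joint relative risk is 1 + Σ(RR_k − 1). The per-factor relative risks and the function names are hypothetical.

```python
import math

def joint_rr_multiplicative(relative_risks):
    """Joint relative risk when effects multiply (linear on the log scale)."""
    return math.prod(relative_risks)

def joint_rr_additive(relative_risks):
    """Joint relative risk when excess risks add (linear on the risk scale)."""
    return 1.0 + sum(rr - 1.0 for rr in relative_risks)

# Hypothetical subject carrying ten risk factors, each with relative risk 1.2.
per_factor_rr = [1.2] * 10
rr_mult = joint_rr_multiplicative(per_factor_rr)
rr_add = joint_rr_additive(per_factor_rr)

# With a single factor the two models agree; they diverge as factors accumulate.
single_mult = joint_rr_multiplicative([1.2])
single_add = joint_rr_additive([1.2])
```

The gap between the two values grows with the number of risk factors carried, which is why the choice of scale matters most at the extremes of the risk distribution.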

22
A Tail-based Goodness-of-fit Test (also a global
test for interaction)
Song et al. (Biostatistics, In Press)
23
                           Multiplicative Model                                  Additive Model
                           Complete case         Analysis including subjects     Complete case
                           analysis              with missing genotypes          analysis
Test                       Hom OR    Het OR      Hom OR    Het OR                Hom OR    Het OR
Hosmer and Lemeshow test   0.11      0.87        .         .                     0.0003    0.01
Tail-based Test
  C25                      0.11      0.85        0.16      0.11                  0         0
  C100                     0.20      0.77        0.23      0.17                  0         0
24
Statistically Speaking
  • Multiplicative model could not be rejected even
    with a large dataset and a powerful method
  • Fit seems adequate even at extremes
  • Modest departure cannot be ruled out
  • Additive model is soundly rejected
  • Plethora of gene-gene interactions in the
    additive scale

25
Does the Scale Matter Clinically?
  • Stronger risk variation (or risk stratification)
    under the multiplicative than the additive model
  • Proportion of the population identified at 2-fold
    or higher than average risk
  • 1.16 under multiplicative model
  • 0.02 under additive model
  • Correlation in PRS under the two models: 0.93 (AUC
    is hardly different)
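To illustrate why wider risk variation places more people above a 2-fold threshold, the sketch below assumes relative risk is log-normally distributed in the population. That distributional assumption, the value of sigma, and the function name are hypothetical; this is not how the figures on the slide were obtained. If log RR is N(μ, σ²), the mean relative risk is exp(μ + σ²/2), so the share at or above `fold` times the average is 1 − Φ((ln fold + σ²/2) / σ).

```python
import math
import random
from statistics import NormalDist

def fraction_above_fold(sigma, fold=2.0):
    """Share of the population at or above `fold` times the average risk.

    Assumes log relative risk is normal with standard deviation sigma.
    """
    z = (math.log(fold) + sigma ** 2 / 2) / sigma
    return 1.0 - NormalDist().cdf(z)

# Monte Carlo check of the closed form for one hypothetical sigma.
rng = random.Random(5)
sigma = 0.5
mean_rr = math.exp(sigma ** 2 / 2)                  # mean of exp(N(0, sigma^2))
draws = [math.exp(rng.gauss(0.0, sigma)) for _ in range(100000)]
frac_sim = sum(rr >= 2.0 * mean_rr for rr in draws) / len(draws)
frac_theory = fraction_above_fold(sigma)

# A wider spread of log relative risk puts more people above the threshold.
narrow, wide = fraction_above_fold(0.3), fraction_above_fold(0.6)
```

Two models can rank individuals almost identically, giving similar AUC, and still differ in this tail fraction, which is the contrast the slide draws between the multiplicative and additive scales.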

26
Concluding Remarks
  • Translating heritability to predictability is
    hard
  • Due to highly polygenic (non-sparse) architecture
  • Multiplicative model for gene-gene and
    gene-environment interaction works amazingly well
  • Time to seriously think about public health
    implications for joint effects
  • Evaluate risk stratification
  • Stop using AUC
