Statistical Issues in Development and Evaluation of Genetic Risk Prediction Models - PowerPoint PPT Presentation

About This Presentation

Slides: 29

Provided by: NCI87

Transcript and Presenter's Notes

1
Statistical Issues in Development and Evaluation
of Genetic Risk Prediction Models

Nilanjan Chatterjee, PhD
Chief and Senior Investigator
Biostatistics Branch, Division of Cancer
Epidemiology and Genetics

2
Thanks to team science!

Biostatistics Branch
JuHyun Park, Fellow
Paige Maas, Fellow
Jianxin Shi, TT Investigator
Joshua Sampson, TT Investigator
Bin Zhu, TT Investigator
Mitchell Gail, Investigator
Minsun Song, Fellow
DCEG
Stephen Chanock, Director
Nat Rothman, Investigator
Debra Silverman, Investigator

Other Institutions/Collaborations
Peter Kraft, HSPH
Montserrat Garcia-Closas, ICR, UK
Cambridge University, UK
German Cancer Research Center
BPC3 Consortium
BCAC Consortium
4
Utility of Risk Models

Individual counseling
weighing risks and benefits for various
preventive interventions
Screening, medication, risk-factor modification
Understanding the distribution of risk at the
population level to inform public health
strategies for prevention
Comparative effectiveness studies
Design of intervention trials

5
Methodological Issues

Sample size and study design
Model building
Polygenic risk score (PRS)
Incorporating environmental risk-factors
Using external information
Model calibration
Model validation and evaluation

6
Limited Discriminatory Ability of Early GWAS
Discoveries
A tiny step to personalized risk prediction of
breast cancer
- Devilee and Rookus, NEJM, Editorial
7
Many more to be found
8
Utility of Foreseeable Cancer SNPs
Cancer Site | Family History Only | Known SNPs | Foreseeable SNPs | Family History and Known SNPs | Family History and Foreseeable SNPs | Epidemiologic Risk-Factors and Foreseeable SNPs
BREAST | 0.536 | 0.599 | 0.635 | 0.613 | 0.646 | 0.670
PROSTATE | 0.549 | 0.647 | 0.676 | 0.668 | 0.694 | -
COLORECTUM | 0.528 | 0.582 | 0.616 | 0.598 | 0.629 | 0.658
OVARY | 0.509 | 0.557 | 0.568 | 0.564 | 0.575 | -
BLADDER | 0.514 | 0.596 | 0.615 | 0.602 | 0.620 | 0.726
GLIOMA | 0.503 | 0.597 | 0.621 | 0.598 | 0.622 | -
PANCREAS | 0.517 | 0.576 | 0.600 | 0.588 | 0.610 | -
Park et al., JCO, 2012
9
Hidden Heritability for Complex Traits
Trait | HT | BMI | TC | HDL | LDL | CD | T1D | T2D | PrCA | CAD
Narrow-sense heritability | 0.45 | 0.14 | - | 0.12 | - | 0.22 | 0.30 | 0.51 | 0.22 | -
Effective sample size for the largest GWAS | 133K | 162K | 100K | 100K | 95K | 25K | 22K | 36K | 28K | 73K
No. of detected SNPs | 108 | 31 | 45 | 35 | 36 | 64 | 30 | 22 | 20 | 21
Heritability explained by detected SNPs | 0.066 | 0.014 | 0.063 | 0.046 | 0.059 | 0.066 | 0.053 | 0.034 | 0.061 | 0.024

Heritability is measured as the fraction of total
variance attributable to susceptibility (quantitative
traits) or through sibling recurrence risks
(qualitative traits)

10
Challenges

Many loci with very small effects are
undetectable at genome-wide significance level
Can we still exploit them to improve risk
prediction?
Using a more liberal threshold or a fancier
penalized regression method?
Needs an understanding of power in the context
of prediction
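
As a minimal sketch of what "a more liberal threshold" means in practice, the Python below builds a polygenic score from only those SNPs whose training p-value falls below a chosen cutoff. The function name, toy genotypes, effect sizes, and p-values are all hypothetical and are not taken from the talk.

```python
def polygenic_risk_score(genotypes, betas, p_values, alpha=1e-7):
    """Weighted allele-count score using SNPs with training p-value < alpha.

    genotypes: list of per-subject lists of allele counts (0, 1, or 2)
    betas:     estimated per-allele log odds ratios from the training GWAS
    p_values:  association p-values from the same training GWAS
    """
    selected = [j for j, p in enumerate(p_values) if p < alpha]
    scores = [sum(betas[j] * row[j] for j in selected) for row in genotypes]
    return scores, len(selected)


# Toy data (hypothetical): two subjects, three SNPs.
G = [[0, 1, 2], [2, 0, 1]]
beta = [0.10, 0.05, 0.02]
p = [1e-9, 1e-4, 0.03]

strict_scores, n_strict = polygenic_risk_score(G, beta, p, alpha=1e-7)
liberal_scores, n_liberal = polygenic_risk_score(G, beta, p, alpha=1e-3)
```

Lowering the bar from 1e-7 to 1e-3 admits a second SNP into the score. Whether that helps prediction depends on how many of the extra SNPs are true signals rather than noise, which is the power question the slide raises.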

11
Predictive Correlation Coefficient (PCC)

covariances and variances are taken with respect
to randomness of a new observation for which
prediction is desired
Remaining randomness is due to that of the
training dataset
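
The formula on this slide did not survive in the transcript. A plausible reading, consistent with the two notes above, is the ordinary correlation between the outcome Y of a new subject and its prediction from the fitted model; the symbols Y and \hat{Y} are assumed here, not recovered from the slide.

```latex
\[
\mathrm{PCC} \;=\;
\frac{\operatorname{Cov}\!\left(Y, \hat{Y}\right)}
     {\sqrt{\operatorname{Var}(Y)\,\operatorname{Var}\!\left(\hat{Y}\right)}}
\]
```

Under this reading PCC is still a random quantity, because \hat{Y} depends on the estimates obtained from the training dataset.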

12
The Expected PCC value for GWAS Polygenic Models

Parameters of genetic architecture
Properties of the statistical method
For fixed N, the optimal threshold (α_opt(N)) can be
chosen by maximizing μ(N, α)

Chatterjee et al., Nature Genetics, 2013
13
Further Results

Many measures of discriminatory performance of
risk-model have a one-to-one relationship with
PCC
Can project performance of models that include
polygenic-risk-score (PRS) and family history
Family hx effect is attenuated by a quantity
related to PCC

Chatterjee et al., Nature Genetics, 2013
14
AUC (Contd)
Trait (AUC with FH alone) | Model | Current sample size (N), α = 10^-7 | Current sample size (N), α_OPT | 3xN, α = 10^-7 | 3xN, α_OPT | 5xN, α = 10^-7 | 5xN, α_OPT
T2D (0.595) | SNPs | 0.570 | 0.598 | 0.617 | 0.704 | 0.660 | 0.750
T2D (0.595) | SNPs+FH | 0.632 | 0.654 | 0.667 | 0.736 | 0.700 | 0.776
PrCA (0.552) | SNPs | 0.621 | 0.625 | 0.637 | 0.648 | 0.646 | 0.673
PrCA (0.552) | SNPs+FH | 0.648 | 0.651 | 0.661 | 0.670 | 0.669 | 0.692
CAD (0.601) | SNPs | 0.582-0.584 | 0.587-0.589 | 0.595-0.604 | 0.612-0.650 | 0.603-0.629 | 0.635-0.676
CAD (0.601) | SNPs+FH | 0.647-0.648 | 0.651-0.652 | 0.656-0.663 | 0.669-0.697 | 0.663-0.681 | 0.686-0.717
15
Architecture of Joint Effects: Implications for
Disease Prevention
16
Breast Cancer Risk Modeling: BPC3 Study

17,176 cases and 19,860 controls from 8
prospective studies
Risk factors
Family history, height, reproductive
risk-factors, smoking, BMI, alcohol and HRT use
SNPs
24 genotyped SNPs, imputed PRS for 86 SNPs

17
Steps for Building Absolute Risk Model and
Projecting Risk Distribution

Develop models for relative-risk
Construction of efficient PRS, Model selection
for gene-gene/gene-environment interaction
Utilize rates from SEER cancer registry to
calibrate absolute risk to the US population
Use national survey data to project risk
distribution
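
The steps above can be sketched in Python under strong simplifying assumptions: a constant incidence rate over the projection interval, no competing mortality, and a baseline hazard obtained by dividing the registry rate by the population-average relative risk. The rate, the relative risks, and the function names are hypothetical placeholders, not SEER or BPC3 values.

```python
import math


def baseline_hazard(population_rate, relative_risks, weights=None):
    """Calibrate the baseline hazard so the weighted average hazard in the
    reference sample equals the registry incidence rate."""
    if weights is None:
        weights = [1.0] * len(relative_risks)  # equal survey weights
    total = sum(weights)
    mean_rr = sum(w * rr for w, rr in zip(weights, relative_risks)) / total
    return population_rate / mean_rr


def absolute_risk(relative_risk, base_hazard, years):
    """Risk of disease within `years`, ignoring competing causes of death."""
    return 1.0 - math.exp(-base_hazard * relative_risk * years)


# Hypothetical inputs: registry rate per person-year and model relative
# risks for subjects in a nationally representative survey sample.
population_rate = 0.003
survey_rr = [0.5, 1.0, 1.5, 2.0]

base = baseline_hazard(population_rate, survey_rr)
risks = [absolute_risk(rr, base, years=10) for rr in survey_rr]
```

The list `risks` is the projected 10-year risk distribution for the survey sample. A real application would use age-specific rates and account for competing mortality.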

18
Gene-gene/Gene-Environment Interactions in
Disease-risk

Interaction in what scale?
Logistic, probit (liability threshold),
additive
Little evidence of SNP-SNP/SNP-E interactions
under the logistic scale
Lack of power or are risks truly multiplicative?
Does the scale matter?
Important to have good model-fit at extremes of
disease risks
Clinically important

19
Linear Logistic vs Linear Additive Null Models

Linear logistic
Linear additive
Can be fitted in the logistic scale under rare
disease assumption
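
The two model formulas are missing from the transcript. For two risk factors they can be written as follows; the notation (D for disease status, G_1 and G_2 for the factors, and the coefficient symbols) is assumed for illustration.

```latex
\begin{align*}
\text{Linear logistic:}\quad
  & \operatorname{logit}\Pr(D = 1 \mid G_1, G_2) = \beta_0 + \beta_1 G_1 + \beta_2 G_2 \\
\text{Linear additive:}\quad
  & \Pr(D = 1 \mid G_1, G_2) = \beta_0 + \beta_1 G_1 + \beta_2 G_2 \\
\text{Rare disease:}\quad
  & \mathrm{OR}(G_1, G_2) \approx \mathrm{RR}(G_1, G_2)
    = 1 + \gamma_1 G_1 + \gamma_2 G_2,
    \qquad \gamma_k = \beta_k / \beta_0
\end{align*}
```

The last line is why the additive null model can be fitted on the logistic scale: when the disease is rare, odds ratios approximate relative risks, so additivity of risks becomes a linear constraint on the odds ratios.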

22
A Tail-based Goodness-of-fit Test (also a global
test for interaction)
Song et al. (Biostatistics, In Press)
23
Test | Multiplicative model, complete case analysis, Hom OR | Multiplicative model, complete case analysis, Het OR | Multiplicative model, analysis including subjects with missing genotypes, Hom OR | Multiplicative model, analysis including subjects with missing genotypes, Het OR | Additive model, complete case analysis, Hom OR | Additive model, complete case analysis, Het OR
Hosmer and Lemeshow test | 0.11 | 0.87 | . | . | 0.0003 | 0.01
Tail-based test, C25 | 0.11 | 0.85 | 0.16 | 0.11 | 0 | 0
Tail-based test, C100 | 0.20 | 0.77 | 0.23 | 0.17 | 0 | 0
24
Statistically Speaking

Multiplicative model could not be rejected even
with a large dataset and a powerful method
Fit seems adequate even at extremes
Modest departure cannot be ruled out
Additive model is soundly rejected
Plethora of gene-gene interactions in the
additive scale

25
Does the Scale Matter Clinically?

Stronger risk variation (or risk stratification)
under the multiplicative than the additive model
Percentage of the population identified at 2-fold
or higher than average risk
1.16% under the multiplicative model
0.02% under the additive model
Correlation in PRS under the two models: 0.93 (AUC is
hardly different)
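
To illustrate how risk stratification can be quantified under a multiplicative model, the sketch below assumes the log relative risk is normally distributed in the population (an assumption made here, not stated on the slide). It returns the share of the population at or above a given multiple of the average risk. The variance values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def fraction_above_fold(variance_log_rr, fold=2.0):
    """Share of the population whose risk is at least `fold` times the
    population-average risk, when log relative risk ~ Normal(m, variance).

    The average risk is exp(m + variance / 2), so the event reduces to
    Z >= (log(fold) + variance / 2) / sigma for a standard normal Z.
    """
    sigma = sqrt(variance_log_rr)
    z = (log(fold) + variance_log_rr / 2.0) / sigma
    return 1.0 - NormalDist().cdf(z)


# Hypothetical variances of the log relative risk score.
low_spread = fraction_above_fold(0.1)
high_spread = fraction_above_fold(0.3)
```

A wider spread of the score places a larger share of the population above the 2-fold mark. Two models can differ substantially on this measure even when their AUCs are close.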

26
Concluding Remarks

Translating heritability to predictability is
hard
Due to highly polygenic (non-sparse) architecture
Multiplicative model for gene-gene and
gene-environment interaction works amazingly well
Time to seriously think about public health
implications for joint effects
Evaluate risk stratification
Stop using AUC

