Genome-Wide Association Studies (GWAS) - PowerPoint PPT Presentation

Slides: 49
Provided by: Zha131
Learn more at: http://www.ph.ucla.edu
Transcript and Presenter's Notes

1
Genome-Wide Association Studies (GWAS)
  • Epidemiology 243
  • Molecular Epidemiology of Cancer
  • Spring 2008

2
Association Studies of Genetic Factors
  • 1st generation
  • Very small studies (<100 cases)
  • Usually not an epidemiologic study design; 1-2 SNPs
  • 2nd generation
  • Small studies (100-500 cases)
  • More epi focus; a few SNPs
  • 3rd generation
  • Large molecular epi studies (>500 cases)
  • Proper epi design; pathways
  • 4th generation
  • Consortium-based pooled analyses (>2000 cases)
  • GxE analyses
  • 5th generation
  • Post-GWS studies

Boffetta, 2007
3
International Lung Cancer Consortium (ILCCO)
Wichmann
Risch
McLaughlin
Schwartz
Wild
Boffetta
Kiyohara
Harris
Brennan
Goodman
Benhamou
Wiencke
Tajima
Christiani
Zhang
Landi
Hong
Stucker
Vineis
Yang
Chen
Berwick
Lan
Lazarus
Spitz
Thun
Le Marchand

3 cohort studies
17 population-based case-control studies
13 hospital-based case-control studies
2 studies with mixed controls
1 cross-sectional study
4
Issues in genetic association studies
  • Many genes
  • 25,000 genes, many can be candidates
  • Many SNPs
  • 12,000,000 SNPs, ability to predict functional
    SNPs is limited
  • Methods to select SNPs
  • Only functional SNPs in a candidate gene
  • Systematic screen of SNPs in a candidate gene
  • Systematic screen of SNPs in an entire pathway
  • Genomewide screen
  • Systematic screen for all coding changes

5
Introduction
  • A genome-wide association study is an approach
    that involves rapidly scanning markers across the
    complete sets of DNA, or genomes, of many people
    to find genetic variations associated with a
    particular disease.
  • Once new genetic associations are identified,
    researchers can use the information to develop
    better strategies to detect, treat and prevent
    the disease. Such studies are particularly useful
    in finding genetic variations that contribute to
    common, complex diseases, such as asthma, cancer,
    diabetes, heart disease and mental illnesses.

http://www.genome.gov/20019523
6
Definition of GWAS
  • A genome-wide association study is defined as
    any study of genetic variation across the entire
    human genome that is designed to identify genetic
    associations with observable traits (such as
    blood pressure or weight), or the presence or
    absence of a disease (such as cancer) or
    condition.

7
Potential of GWAS
  • Whole genome information, when combined with
    epidemiological, clinical and other phenotype
    data, offers the potential for increased
    understanding of basic biological processes
    affecting human health, improvement in the
    prediction of disease and patient care, and
    ultimately the realization of the promise of
    personalized medicine.
  • In addition, rapid advances in understanding the
    patterns of human genetic variation and maturing
    high-throughput, cost-effective methods for
    genotyping are providing powerful research tools
    for identifying genetic variants that contribute
    to health and disease.

8
Potential of GWAS
9
(No Transcript)
10
Selection of SNPs (Genome-wide association
studies)
  • Molecular
  • Higher requirements: Affymetrix and Illumina
  • Analytical
  • Highest requirements: data management, automation
  • Advantages
  • No biological assumptions and can identify novel
    genes/pathways
  • Excellent chance to identify risk alleles
  • Utility in individual risk assessment
  • Disadvantages
  • High costs
  • Concern of multiple tests

11
SNP Selection
12
SNP Selection
13
Affymetrix Genome-Wide Human SNP Array
  • The new Affymetrix Genome-Wide Human SNP Array
    6.0 features 1.8 million genetic markers,
    including more than 906,600 single nucleotide
    polymorphisms (SNPs) and more than 946,000 probes
    for the detection of copy number variation. The
    SNP Array 6.0 represents more genetic variation
    on a single array than any other product,
    providing maximum panel power and the highest
    physical coverage of the genome.

14
The need for GWA
  • Current understanding of disease etiology is
    limited
  • Therefore, candidate genes or pathways are
    insufficient
  • Current understanding of functional variants is
    limited
  • Therefore, focusing on nonsynonymous changes
    is not sufficient
  • Results from linkage studies are often
    inconsistent and broad
  • Therefore, the utility of identified linkage
    regions is limited
  • GWA studies offer an effective and objective
    approach
  • Better chance to identify disease associated
    variants
  • Improve understanding of disease etiology
  • Improve ability to test gene-gene interaction and
    predict disease risk

Xu JF, 2007
15
GWA is promising
  • Many diseases and traits are influenced by
    genetic factors
  • i.e., they are caused by sequence variants in the
    genome
  • Over 12 million SNPs are known in the genome
  • i.e., some SNPs will be directly or indirectly
    associated with causal variants
  • The cost of SNP genotyping has fallen
  • i.e., it is affordable to genotype a large number
    of SNPs in the genome
  • Large numbers of cases and controls are available
  • i.e., there is statistical power to detect
    variants with modest effect
  • When the above conditions are met
  • associated SNPs will have different frequencies
    between cases and controls

16
GWA is challenging
  • Many diseases and traits are influenced by
    genetic factors
  • But probably due to multiple modest risk variants
  • They confer a stronger risk when they interact
  • Truly associated SNPs are not necessarily highly
    significant
  • Too many SNPs are evaluated
  • False positives due to multiple tests
  • Single studies tend to be underpowered
  • False negatives
  • Considerable heterogeneity among studies
  • Phenotypic and genetic heterogeneity
  • False positives due to population stratification

Xu, 2007
17
Genome coverage
  • Two major platforms for GWA
  • Illumina HumanHap300, HumanHap550, and
    HumanHap1M
  • Affymetrix GeneChip 100K, 500K, 1M, and 2.3M
  • Genome-wide coverage
  • The percentage of known SNPs in the genome that
    are in LD with the genotyped SNPs
  • Calculated based on HapMap
  • Calculated based on ENCODE

Xu, 2007
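
The coverage definition on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: LD is measured by r^2 computed from phased 0/1 haplotypes, a SNP counts as "in LD" when r^2 is at least 0.8 with some genotyped SNP, and genotyped SNPs count as covered. The function names, the threshold, and the data layout are hypothetical and do not come from the slides.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased 0/1 alleles per haplotype."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic, so r^2 is undefined
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom


def coverage(haplotypes, genotyped, r2_min=0.8):
    """Fraction of known SNPs that are genotyped or tagged by a genotyped SNP.

    haplotypes: dict mapping SNP id -> list of 0/1 alleles (one per haplotype)
    genotyped:  set of SNP ids present on the array
    r2_min:     assumed tagging threshold (0.8 is a common choice)
    """
    covered = 0
    for snp, hap in haplotypes.items():
        if snp in genotyped:
            covered += 1
        elif any(r_squared(hap, haplotypes[tag]) >= r2_min for tag in genotyped):
            covered += 1
    return covered / len(haplotypes)


# Toy reference panel: s2 is a perfect proxy for s1, s3 is independent of it.
panel = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1]}
print(coverage(panel, {"s1"}))
```

In practice the reference haplotypes would come from a panel such as HapMap or ENCODE, as the slide notes, and the estimate depends on which panel is used.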
18
Strategies for pre-association analysis
  • Quality control
  • Filter SNPs by genotype call rates
  • Filter SNPs by minor allele frequencies
  • Filter SNPs by testing for Hardy-Weinberg
    Equilibrium

19
Data Analysis
  • Single SNP analysis using pre-specified genetic
    models
  • 2 x 3 table (2-df)
  • Additive model (1-df), and test for additivity
  • All possible genetic models (recessive, dominant)
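
The single-SNP tests listed above can be sketched in plain Python. The sketch assumes genotype counts ordered by number of copies of the allele being counted (0, 1, 2) and uses the Cochran-Armitage trend test for the additive model. The function names and the example counts are made up for illustration. The departure-from-additivity statistic is taken as the difference between the 2-df and 1-df chi-squares.

```python
import math


def genotype_chi2(cases, controls):
    """Pearson chi-square (2 df) for a 2 x 3 case/control-by-genotype table.

    Assumes both groups are non-empty. Returns (chi2, p_value).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    chi2 = 0.0
    for r, s in zip(cases, controls):
        col = r + s
        if col == 0:
            continue  # an empty genotype column contributes nothing
        for obs, row_total in ((r, n_cases), (s, n_controls)):
            exp = row_total * col / n
            chi2 += (obs - exp) ** 2 / exp
    return chi2, math.exp(-chi2 / 2.0)  # chi-square upper tail for 2 df


def trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df). Returns (chi2, p_value)."""
    n_cases = sum(cases)
    n = n_cases + sum(controls)
    cols = [r + s for r, s in zip(cases, controls)]
    s_wr = sum(w * r for w, r in zip(weights, cases))
    s_wn = sum(w * c for w, c in zip(weights, cols))
    s_w2n = sum(w * w * c for w, c in zip(weights, cols))
    denom = n_cases * (n - n_cases) * (n * s_w2n - s_wn ** 2)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (n * s_wr - n_cases * s_wn) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail for 1 df


# Hypothetical counts for genotypes with 0, 1 and 2 copies of the allele.
cases, controls = [100, 80, 20], [120, 70, 10]
geno, geno_p = genotype_chi2(cases, controls)
trend, trend_p = trend_chi2(cases, controls)
departure = max(0.0, geno - trend)  # 1-df check of additivity
print(geno, trend, departure)
```

Passing `weights=(0, 1, 1)` or `weights=(0, 0, 1)` to `trend_chi2` gives the dominant and recessive codings for the counted allele. Testing several models for every SNP adds to the multiple-testing burden.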

20
Data Analysis
  • Haplotype analysis
  • Gene-gene and gene-environment interactions
  • Interaction with main effect
  • Logistic regression
  • Interaction without main effect: data mining
  • Classification and regression tree (CART)
  • Multifactor Dimensionality Reduction (MDR)
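
For the logistic-regression item above, the simplest case (binary genotype G, binary exposure E) can be worked out directly from counts. In a saturated logistic model with G, E and a G x E product term, the exponentiated product-term coefficient equals the ratio OR(G and E) / (OR(G only) x OR(E only)). The sketch below computes that ratio. The function names and the counts are hypothetical, and no covariate adjustment is attempted.

```python
import math


def odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref):
    """Odds ratio of an exposed stratum against the reference stratum."""
    return (cases_exp * controls_ref) / (cases_ref * controls_exp)


def interaction_summary(table):
    """Multiplicative G x E interaction from a stratified case-control table.

    table maps (g, e), with g and e in {0, 1}, to a (cases, controls) pair.
    Assumes every cell count is non-zero.
    """
    ref = table[(0, 0)]
    or_g = odds_ratio(*table[(1, 0)], *ref)    # genotype only
    or_e = odds_ratio(*table[(0, 1)], *ref)    # exposure only
    or_ge = odds_ratio(*table[(1, 1)], *ref)   # both
    ratio = or_ge / (or_g * or_e)              # 1.0 means no interaction
    return {"or_g": or_g, "or_e": or_e, "or_ge": or_ge,
            "interaction_ratio": ratio, "log_interaction": math.log(ratio)}


# Hypothetical counts: (cases, controls) per genotype/exposure stratum.
counts = {(0, 0): (100, 200), (1, 0): (60, 100),
          (0, 1): (80, 100), (1, 1): (90, 50)}
summary = interaction_summary(counts)
print(summary)
```

A ratio above 1 means the joint effect exceeds the product of the separate effects on the odds scale. A real analysis would fit the regression so that confidence intervals and covariates can be handled.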

21
Sample size needs as a function of genotype
prevalence and OR for main effects
Boffetta, 2007
22
(No Transcript)
23
False Positives
  • False positives: too many dependent tests
  • Adjust for number of tests
  • Bonferroni correction
  • Nominal significance level = study-wide
    significance level / number of tests
  • Nominal significance level = 0.05/500,000 = 10^-7
  • Effective number of tests
  • Take LD into account
  • Permutation procedure
  • Permute case-control status
  • Mimic the actual analyses
  • Obtain empirical distribution of maximum test
    statistic under null hypothesis
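
The Bonferroni threshold and the permutation procedure above can be sketched as follows. The per-SNP statistic used here is N times the squared correlation between genotype (0/1/2) and case status, which equals the Cochran-Armitage trend chi-square. That choice, the function names, and the toy data are assumptions for illustration. A real analysis would permute with the same test it actually reports.

```python
import random


def bonferroni_threshold(study_wide_alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return study_wide_alpha / n_tests


def trend_stat(genotypes, status):
    """N * r^2 between genotype codes and 0/1 case status for one SNP."""
    n = len(status)
    mx = sum(genotypes) / n
    my = sum(status) / n
    sxy = sum((g - mx) * (s - my) for g, s in zip(genotypes, status))
    sxx = sum((g - mx) ** 2 for g in genotypes)
    syy = sum((s - my) ** 2 for s in status)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP, or only one phenotype group present
    return n * sxy * sxy / (sxx * syy)


def max_stat(snps, status):
    """Largest single-SNP statistic across all SNPs."""
    return max(trend_stat(g, status) for g in snps)


def permutation_p(snps, status, n_perm=1000, seed=0):
    """Empirical p-value of the observed maximum statistic.

    Case-control labels are shuffled while genotypes stay fixed, so the LD
    between SNPs is preserved in every permuted data set.
    """
    rng = random.Random(seed)
    observed = max_stat(snps, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_stat(snps, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


print(bonferroni_threshold(0.05, 500_000))

# Toy data: 10 controls then 10 cases; the first SNP separates them perfectly.
status = [0] * 10 + [1] * 10
snps = [[0] * 10 + [2] * 10, [0, 1] * 10]
observed, p_emp = permutation_p(snps, status, n_perm=200)
print(observed, p_emp)
```

Adding 1 to both numerator and denominator keeps the empirical p-value strictly above zero, so the smallest value it can report is limited by the number of permutations.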

24
False Positives
  • False discovery rate (FDR)
  • Expected proportion of false discoveries among
    all discoveries
  • Offers more power than Bonferroni
  • Holds under weak dependence of the tests
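
One common way to control the FDR is the Benjamini-Hochberg step-up procedure. The slides do not name a specific procedure, so the sketch below is an assumption about which method is meant, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m, then
    reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Invented p-values for ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh = benjamini_hochberg(pvals, q=0.05)
bonf = [p <= 0.05 / len(pvals) for p in pvals]
print(sum(bh), sum(bonf))
```

On these values the step-up rule rejects two hypotheses while Bonferroni at the same level rejects one, which illustrates the gain in power noted above.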

25
False Positives
  • Bayesian approach
  • Taking the prior probability of association into
    account: False-Positive Report Probability (FPRP)
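
As usually defined, FPRP is the probability that there is no true association given a statistically significant finding: FPRP = alpha(1 - pi) / [alpha(1 - pi) + (1 - beta)pi], where alpha is the significance level, 1 - beta the power, and pi the prior probability of a true association. A small sketch follows. The function name and the input values are illustrative assumptions.

```python
def fprp(alpha, power, prior):
    """False-positive report probability for a significant finding.

    alpha: significance level used to declare the finding
    power: probability of detecting a true association (1 - beta)
    prior: prior probability that the association is real (pi)
    """
    false_pos = alpha * (1.0 - prior)   # P(significant and no association)
    true_pos = power * prior            # P(significant and true association)
    if false_pos + true_pos == 0:
        raise ValueError("alpha and power * prior cannot both be zero")
    return false_pos / (false_pos + true_pos)


# With a 1% prior, a p < 0.05 finding is still most likely a false positive.
print(fprp(alpha=0.05, power=0.8, prior=0.01))
# A genome-wide threshold changes the picture even with a small prior.
print(fprp(alpha=1e-7, power=0.8, prior=1e-4))
```

The result is sensitive to the prior, which for most SNPs in a genome-wide scan is very small.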

26
Confirmation in independent study populations
  • The approach may limit the number of false
    positives
  • Confirmation is needed to distinguish true from
    false positives
  • Replication: examine the results from the 2nd
    stage only
  • Joint analysis: combine data from the 1st stage
    with the 2nd stage
  • Multiple stages

27
(No Transcript)
28
Issues of GWAS
  • Population stratification
  • Multiple testing: false positives
  • Gene-environment interaction
  • High Costs

29
Kingsmore, 2008
30
Kingsmore, 2008
31
(No Transcript)
32
GWAS
33
Proposed GWAS of Lung Cancer among Non-smokers
34
Motives and Conceptual Framework For Study of
Genetic Susceptibility to Lung Cancer among
Non-smokers
  • About 16% of male smokers and 10% of female
    smokers will eventually develop lung cancer,
    which suggests that exposure to other environmental
    carcinogens and individual genetic susceptibility
    may play an important role in lung cancer among
    non-smokers.
  • It is suggested that 26% of lung cancers are
    associated with genetic susceptibility
    (Lichtenstein P, et al. NEJM, 2000).
  • We hypothesize that variation in genetic
    susceptibility, or single nucleotide polymorphisms
    (SNPs) of genes in inflammation, DNA repair, and
    cell cycle control pathways, may be important in
    the development of lung cancer among non-smokers.

35
(No Transcript)
36
DNA damage repaired
Defective DNA repair gene
If DNA damage not repaired
G0
If cell cycle control is lost
37
500K SNP Coverage
Median intermarker distance: 3.3 kb
Mean intermarker distance: 5.4 kb
Average heterozygosity: 0.30
Average minor allele frequency: 0.22
SNPs in genes: 196,384
80% of genome within 10 kb of a SNP
38
Figure 1. The effects of SNPs on the Risk of Lung
Cancer among Smokers and Non-smokers
OR
39
Hypothesis
  • The overall hypothesis is that multiple sequence
    variants in the genome are associated with the
    risk of lung cancer among non-smokers.
    Specifically, we hypothesize that a number of
    common SNPs that modify lung cancer risk among
    non-smokers are in strong LD with the SNPs
    arrayed on the 500K GeneChip.

40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Specific Aims
  • Aim 1. To perform exploratory tests for
    association between 500K SNPs across the genome
    and lung cancer risk among 200 non-smoking lung
    cancer patients and 200 controls.
  • Aim 2. To perform the first stage of confirmatory
    association tests between lung cancer risk and
    more than 1,000 SNPs implicated in Aim 1 among an
    independent set of 600 pairs of cases and
    controls.

44
Specific Aims
  • Aim 3. To perform the second stage of confirmatory
    association tests between lung cancer risk and
    more than 500 SNPs that were replicated in Aim 2
    among an additional 600 cases and 600 controls.
    Additional SNPs will also be added from our
    ongoing pathway specific analyses of DNA repair,
    cell cycle regulation, inflammation and metabolic
    pathways based on non-smokers in our lung cancer
    study.
  • Aim 4. To perform fine mapping association
    studies in the flanking regions of each of the
    30-100 SNPs confirmed in Aim 3 among the entire
    1,400 cases and 1,400 controls. The large number
    of cases with non-smoking lung cancer in this
    study population also allows us to identify SNPs
    that are associated with risk of the disease
    among nonsmokers.

45
Specific Aims
  • Aim 5. To explore the generalizability of the
    SNPs identified in Specific Aims 1-4 within a
    Chinese population of 600 nonsmoking lung cancer
    cases and 600 nonsmoking controls. The relatively
    homogeneous Chinese population not only allows us
    to further confirm the associations, but also
    improves our ability to finely map the SNPs
    associated with lung cancer risk among
    non-smokers.

46
Discussion: Costs
  • Affy 500K SNP chip: $1,000/case
  • 2,000 x $1,000 = $2M
  • 1,000 x $1,000 = $1M
  • 500 x $1,000 = $0.5M
  • 500 x 3,000 (SNPs) x $0.15 = $225,000
  • 500 x 30 (SNPs) x $0.15 = $2,250
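
The cost arithmetic on this slide can be reproduced with two small helpers. This reads the slide's figures as a chip price of 1,000 per sample and a custom genotyping price of 0.15 per genotype. That reading is an interpretation of the transcript, and the function names are made up for the example.

```python
def chip_cost(n_samples, price_per_sample=1000):
    """Cost of whole-genome chips: one chip per sample."""
    return n_samples * price_per_sample


def custom_genotyping_cost(n_samples, n_snps, price_per_genotype=0.15):
    """Cost of follow-up genotyping: one genotype per sample per SNP."""
    return n_samples * n_snps * price_per_genotype


for n in (2000, 1000, 500):
    print(n, "samples on the chip:", chip_cost(n))

print("500 samples x 3,000 SNPs:", custom_genotyping_cost(500, 3000))
print("500 samples x 30 SNPs:", custom_genotyping_cost(500, 30))
```

The comparison shows why staged designs are attractive: genotyping a few thousand selected SNPs in follow-up samples costs less than running the full chip on the same number of samples.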

47
(No Transcript)
48
(No Transcript)