Title: Selection of Candidate Genes for Population Studies
- Zuo-Feng Zhang, MD, PhD
- Epidemiology 243 Molecular Epidemiology
2 Gene Selection for Molecular Studies
- Selection of putative genetic factors is a central issue in molecular epidemiological studies, even though the selection of putative environmental risk factors is equally important, because the focus of molecular epidemiology is the assessment of gene-environment interaction
3 Two Types of Genes
- High Risk Genes
- Low Risk Genes
4 Familial Disease Genes (High-Risk Genes)
- High penetrance
- High AR/RR
- Low gene frequency (<1%)
- Study setting: families
- Study type: linkage
- PAR (population attributable risk): low
- Role of environment: modest
5 Examples of High-Risk Genes
- Mutations of the TP53 gene
- BRCA1 and BRCA2
- RB gene mutations
6 Susceptibility Genes (Low-Risk Genes)
- Low penetrance
- Low AR/RR
- High gene frequency (>1% to 90%)
- Study setting: population
- Study type: association
- PAR: high
- Role of environment: critical
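Why a rare, high-penetrance gene can have a low PAR while a common, low-penetrance variant has a high one follows from Levin's formula, PAR = p(RR - 1) / [1 + p(RR - 1)], where p is the proportion of the population carrying the risk genotype. The sketch below uses hypothetical carrier proportions and relative risks chosen only to show the contrast; they are not values from these slides.

```python
def levin_par(carrier_proportion: float, relative_risk: float) -> float:
    """Population attributable risk via Levin's formula.

    carrier_proportion: fraction of the population carrying the risk genotype
    relative_risk: risk in carriers relative to non-carriers
    """
    excess = carrier_proportion * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical values for illustration only (not from the slides).
par_high_risk_gene = levin_par(carrier_proportion=0.005, relative_risk=10.0)  # rare, high RR
par_low_risk_gene = levin_par(carrier_proportion=0.40, relative_risk=1.5)     # common, low RR

print(f"Rare high-risk gene:  PAR = {par_high_risk_gene:.3f}")  # 0.043
print(f"Common low-risk gene: PAR = {par_low_risk_gene:.3f}")   # 0.167
```

Even with a tenfold relative risk, the rare genotype accounts for a much smaller share of cases in the population than the common genotype with a modest relative risk.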
7 Approaches for High-Risk Genes
- Functional approach (forward genetics): from genotype to phenotype
- Positional approach (reverse genetics): from phenotype to genotype
8 Functional Approach: An Example
- Start from patients with DNA repair defects
- Create a cell line
- Add a fragment of a human chromosome
- Look for a fragment that produces a repair-competent phenotype
9 Positional Approach
- Linkage analysis
- Loss of heterozygosity (LOH)
- Chromosome abnormalities
10 Linkage Analysis
- A method to identify disease loci
- Family-based; needs a sufficient sample size
- Germline DNA from affected and unaffected individuals
- A specified genetic mechanism (autosomal dominant/recessive)
- A set of markers
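The core quantity in linkage analysis is the LOD score, the log10 ratio of the likelihood of the marker data under linkage at recombination fraction theta to the likelihood under no linkage (theta = 0.5); a LOD of 3 or more is the conventional threshold for declaring linkage. The sketch below covers only the simplest case, assuming phase-known, fully informative meioses; real pedigree analyses with a specified mode of inheritance use dedicated likelihood software (e.g., Elston-Stewart or Lander-Green algorithms). The counts are hypothetical.

```python
import math
from typing import Tuple


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^NR] - log10[0.5^(R + NR)]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be in [0, 0.5]")
    if recombinants > 0 and theta == 0.0:
        return float("-inf")  # any recombinant rules out complete linkage
    log_l_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, nonrecombinants: int) -> Tuple[float, float]:
    """Maximum LOD over theta; the maximum-likelihood theta is R / N, capped at 0.5."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return lod_score(recombinants, nonrecombinants, theta_hat), theta_hat


# Hypothetical data: 2 recombinants among 20 informative meioses.
lod, theta_hat = max_lod(2, 18)
print(f"theta_hat = {theta_hat:.2f}, max LOD = {lod:.2f}")  # theta_hat = 0.10, max LOD = 3.20
```

Twenty informative meioses with two recombinants just clear the LOD 3 threshold, which illustrates why linkage studies need enough families with informative meioses.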
11 Loss of Heterozygosity (LOH)
- Requires both normal and tumor tissue
- LOH is the loss of signal from one allele in the target tissue (tumor) compared with normal tissue
- If LOH is consistently observed in a particular region, this indicates that an important gene may lie in that region
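A common way to score LOH at a polymorphic marker compares the tumor allele ratio with the normal allele ratio, (T1/T2)/(N1/N2); only markers heterozygous in normal DNA are informative. The sketch below assumes per-allele signal intensities (e.g., peak heights) are already available and uses cut-offs of 0.5 and 2.0 as an assumption, since laboratories use different thresholds. Marker names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkerSignals:
    """Allele signal intensities (e.g., peak heights) at one polymorphic marker."""
    name: str
    normal_a1: float
    normal_a2: float
    tumor_a1: float
    tumor_a2: float


def allelic_ratio(m: MarkerSignals) -> Optional[float]:
    """(T1/T2) / (N1/N2); None if the marker is uninformative or the tumor failed."""
    if m.normal_a1 == 0 or m.normal_a2 == 0:
        return None  # homozygous in normal DNA: LOH cannot be assessed
    if m.tumor_a1 == 0 and m.tumor_a2 == 0:
        return None  # no tumor signal at all
    if m.tumor_a2 == 0:
        return float("inf")  # allele 2 completely lost in the tumor
    return (m.tumor_a1 / m.tumor_a2) / (m.normal_a1 / m.normal_a2)


def call_loh(m: MarkerSignals, low: float = 0.5, high: float = 2.0) -> Optional[bool]:
    """True if the ratio falls outside (low, high); the cut-offs are an assumption."""
    ratio = allelic_ratio(m)
    if ratio is None:
        return None
    return ratio <= low or ratio >= high


# Hypothetical markers across one chromosomal region.
markers = [
    MarkerSignals("marker_1", 1000, 950, 1000, 300),  # allele 2 reduced in tumor
    MarkerSignals("marker_2", 800, 820, 780, 790),    # balanced
    MarkerSignals("marker_3", 1200, 0, 1100, 0),      # homozygous: uninformative
]
calls = {m.name: call_loh(m) for m in markers}
informative = [c for c in calls.values() if c is not None]
print(calls)
print(f"LOH in {sum(informative)}/{len(informative)} informative markers")
```

Repeating this across many tumors and summarizing the LOH frequency per marker is what points to a region that is consistently lost.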
12 Chromosome Abnormalities
- Deletion
- Insertion
- Microsatellite instability
13 In-depth Approaches to Identify Candidate Genes
- When the above three methods point to a chromosomal region, further work is needed to identify particular candidate genes:
- -Mutation screening
- -Restoration of the normal phenotype by transfection of a normal allele
- -A mouse model of the disease created by introducing defective mutations
14 Approaches for Low-Risk Genes
- Linkage analysis may not be feasible because it requires a relatively large sample size (if the OR = 2, 2,500 families would be needed)
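For contrast with the linkage requirement on this slide, a population-based case-control association test of a carrier genotype can be sized with the standard two-proportion formula. The sketch assumes a carrier vs. non-carrier comparison, equal numbers of cases and controls, a hypothetical carrier frequency of 30% among controls, two-sided alpha = 0.05, and 80% power; it ignores genotyping error and multiple testing and is not meant to reproduce the 2,500-family figure, whose assumptions the slide does not state.

```python
import math
from statistics import NormalDist


def cases_per_group(odds_ratio: float, p0: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Cases (= controls) needed to detect a given OR for a binary exposure.

    p0: exposure (carrier) frequency among controls.
    Uses the standard unmatched two-proportion sample-size formula.
    """
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))  # carrier frequency among cases
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Hypothetical: 30% of controls carry the risk genotype, OR = 2.
n = cases_per_group(odds_ratio=2.0, p0=0.30)
print(f"About {n} cases and {n} controls")  # about 141 per group
```

Under these assumptions a few hundred unrelated subjects suffice, which is the practical argument for population-based association designs for low-risk genes.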
15 Approaches for Low-Risk Genes
- New techniques will be needed to identify low-risk susceptibility genes:
- -Automated microarray gene chips
- -SNP identification
16 Selection of Putative Genes (1)
- Inter-individual variation in the trait exists in the population
- -If there is very little variation in the phenotype in the population, the rationale for examining the genotype is weak.
- -If there is very large variation in the phenotype, other potential factors need to be considered.
17 Selection of Putative Genes (2)
- The gene is involved in a process related to carcinogenesis:
- -DNA repair
- -Chromosome stability
- -Activities of oncogenes/tumor suppressor genes
- -Cell cycle control/signal transduction
18 Selection of Putative Genes (3)
- The trait exhibits an inheritance pattern consistent with Mendelian transmission
- The phenotype of interest should have a genetic basis
19 Selection of Putative Genes (3)
- Certain phenotypes, such as mutagen sensitivity, have been reported to be associated with many smoking-related cancers; however, the precise nature of this susceptibility factor remains incompletely understood because the genotype associated with mutagen sensitivity is still unclear.
20 Selection of Putative Genes (4)
- The gene acts in the relevant organ:
- -CYP1A1 is largely absent from the liver but present in the lung
- -CYP2D6 is expressed in the brain
- -GSTM1 has some expression in the lung
- -GSTP1 is expressed in the lung
21 Selection of Putative Genes (5)
- Gene location and characterization:
- -Similar gene structure may indicate similar function
- -Most mutations occur in the coding sequence, but mutations in intragenic noncoding regions may also occur
- -Specific point mutations may indicate specific exposures
22 Selection of Putative Genes
- Polymorphisms and mutations
- Gene-Gene interactions
- Animal models
- Human studies
- Genotype-phenotype
- Relation to disease
- Ethnic variation
23 Selection of Putative Genes
- Gene-gene interaction (phase I and phase II enzymes):
- -CYP1A1 and GSTM1 and lung cancer risk (PAH carcinogens)
- -CYP2A6 and CYP2D6 (NNK)
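Gene-gene (or gene-environment) interaction of this kind is often examined in a "2 x 4" layout: each combination of the two risk genotypes is compared with the group carrying neither. The counts below are hypothetical and labeled CYP1A1 variant and GSTM1 null only for orientation. The sketch computes the joint ORs, the ratio OR11 / (OR10 x OR01) as a measure of departure from a multiplicative model, and RERI (relative excess risk due to interaction) as a measure of departure from additivity; using ORs in RERI approximates risk ratios when the disease is rare.

```python
from typing import Dict, Tuple

# Hypothetical (cases, controls) counts; keys are (CYP1A1 variant, GSTM1 null).
counts: Dict[Tuple[int, int], Tuple[int, int]] = {
    (0, 0): (40, 80),  # neither risk genotype: reference group
    (1, 0): (30, 40),
    (0, 1): (50, 60),
    (1, 1): (45, 30),
}


def joint_odds_ratios(table: Dict[Tuple[int, int], Tuple[int, int]]) -> Dict[Tuple[int, int], float]:
    """OR of each genotype combination versus the (0, 0) reference group."""
    ref_cases, ref_controls = table[(0, 0)]
    ref_odds = ref_cases / ref_controls
    return {k: (cases / controls) / ref_odds for k, (cases, controls) in table.items()}


ors = joint_odds_ratios(counts)
multiplicative_interaction = ors[(1, 1)] / (ors[(1, 0)] * ors[(0, 1)])
reri = ors[(1, 1)] - ors[(1, 0)] - ors[(0, 1)] + 1.0

for key, value in sorted(ors.items()):
    print(key, round(value, 2))
print("OR11 / (OR10 * OR01) =", round(multiplicative_interaction, 2))  # 1.2
print("RERI =", round(reri, 2))  # 0.83
```

In practice the same comparisons are usually made in a logistic model with a product term, which also allows adjustment for confounders.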
24 2-1. Background: Summary of the characteristics and significance of the genes of interest
25 2-1. Background: Theoretical model of gene-gene/environmental interaction pathway
[Figure, built up across slides 25-29 under the same title: pathway diagram labeled GSTM1; DNA damage repaired; defective DNA repair gene; if DNA damage is not repaired; if cell cycle control is lost]
30 2-1. Background: Summary of the epidemiological literature for the genes of interest
37 UCLA Prostate Cancer SPORE Development Project: Single Nucleotide Polymorphisms (SNPs) of Genes in the DNA Double Strand Break Repair (DSBR) Pathways and Risk of Prostate Cancer, A Preliminary Study
- Zuo-Feng Zhang, MD, PhD
- Department of Epidemiology, UCLA School of Public Health
38 Epidemiological Observations: Involvement of DSBR Pathway Genes in Prostate Cancer Risk
- The risk of prostate cancer is known to be elevated in carriers of germline BRCA2 mutations
- Increased risk of prostate cancer is also observed in carriers of BRCA1 and CHEK2 mutations, and has been associated with SNPs of the ATM gene
- These observations indicate possible involvement of DNA DSBR pathway genes
39 [Figure: DSBR pathways: non-homologous recombination; homologous recombination (BRCA1, BRCA2); damage recognition cell cycle delay response (DRCCD) (ATM, CHEK2 (RAD53), BRCA1)]
40 Hypotheses
- Single Nucleotide Polymorphisms (SNPs) of genes in the DNA Double Strand Break Repair (DSBR) pathways may be associated with susceptibility to prostate cancer.
- We further hypothesize that SNPs of DSBR genes may interact with each other and may modify the effects of environmental factors on the risk of prostate cancer.
41 Specific Aim 1
- To assay Single Nucleotide Polymorphisms (SNPs) of genes in the double strand break (DSB) repair pathway, including RAD51, RAD52, RAD54L, NBS1, XRCC2, XRCC3, BRCA1, and BRCA2 in Homologous Recombinational Repair (HRR); LIG4 and XRCC4 in Non-homologous end-joining (NHEJ); and ATM, BRCA1, CHEK1, CHEK2 (RAD53), P53, and HUS1 in the damage recognition cell cycle delay response (DRCCD) pathway.
42 Specific Aim 2
- To evaluate the independent effects of SNPs of the DSB repair pathway while controlling for potential confounding factors, such as age, race, and education, and to assess potential combined effects of SNPs
- To explore possible effect modification by nutritional factors on the risk of prostate cancer.
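The slides do not state how the adjustment for age, race, and education was carried out; multivariable logistic regression is the usual tool. As a simpler stratified alternative for one categorical confounder, the Mantel-Haenszel summary OR pools the stratum-specific 2 x 2 tables. The two age strata and all counts below are hypothetical.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    """2 x 2 counts within one stratum of a confounder (e.g., an age group)."""
    exposed_cases: int       # a: SNP carriers among cases
    exposed_controls: int    # b: SNP carriers among controls
    unexposed_cases: int     # c
    unexposed_controls: int  # d


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = denominator = 0.0
    for s in strata:
        n = sum(s)  # total subjects in the stratum
        numerator += s.exposed_cases * s.unexposed_controls / n
        denominator += s.exposed_controls * s.unexposed_cases / n
    return numerator / denominator


def crude_or(strata: List[Stratum]) -> float:
    """OR from the collapsed table, ignoring the stratification variable."""
    a = sum(s.exposed_cases for s in strata)
    b = sum(s.exposed_controls for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    d = sum(s.unexposed_controls for s in strata)
    return (a * d) / (b * c)


# Hypothetical strata: younger and older men.
strata = [Stratum(10, 20, 15, 40), Stratum(20, 15, 10, 20)]
print(f"crude OR = {crude_or(strata):.2f}, age-adjusted MH OR = {mantel_haenszel_or(strata):.2f}")
```

The gap between the crude and adjusted estimates shows how much of the crude association is attributable to the stratification variable in these made-up data.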
43 Proposed Experimental Approach
- This study is based on a case-control study of 122 prostate cancer cases and 135 healthy controls. All cases and controls were interviewed by a research nurse using a standard epidemiological questionnaire at MSKCC from 1993 to 1997.
- Blood samples and tumor tissue specimens were collected.
- The SNPs will be genotyped in individual DNA samples using the ABI SNPlex platform. The UCLA Sequencing and Genotyping Core Facility has recently added the Applied Biosystems high-throughput SNP genotyping assay SNPlex to its available services. This assay is flexible, robust, and highly reproducible.
44 JCCC Genotyping Core: ABI SNPlex, a New High-Throughput Approach to Identify SNPs of Susceptibility Genes
www.genetics.ucla.edu/genotyping
45 Zhang Lab SNP Genotyping Pilot Project
- 75 SNPs passed the design process
- 48 SNPs chosen for the first pool
- Whole-genome amplification of DNA for 3080 samples
- 122,496 SNPs genotyped since January
46 Preliminary Results
- 99.4% reproducibility by automated scoring
- 99.7% reproducibility by manual scoring
- 6 SNPs never worked
- 96% call rate for the remaining markers
- Comparable to results reported by other labs
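The slide does not define its quality metrics; the sketch below assumes the common definitions: call rate is the fraction of attempted genotypes that produced a call, and reproducibility is the concordance between duplicate runs over genotypes called in both. Allele order is ignored so that "AG" and "GA" match. The genotype data are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # e.g., "AG"; None means no call


def normalize(g: Genotype) -> Genotype:
    """Treat "GA" and "AG" as the same unordered genotype."""
    return None if g is None else "".join(sorted(g))


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of attempted genotypes that produced a call."""
    return sum(g is not None for g in calls) / len(calls)


def reproducibility(run1: Sequence[Genotype], run2: Sequence[Genotype]) -> float:
    """Concordance between duplicate runs, over pairs where both runs made a call."""
    pairs = [(normalize(a), normalize(b)) for a, b in zip(run1, run2)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no jointly called genotypes")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical duplicate genotyping of one SNP in five samples.
run1 = ["AA", "AG", None, "GG", "AG"]
run2 = ["AA", "GA", "AA", "GG", "GG"]
print(f"call rate (run 1): {call_rate(run1):.0%}")                # 80%
print(f"reproducibility:   {reproducibility(run1, run2):.0%}")    # 75%
```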
47 Study Population
48 Progress of the Study
- For Specific Aim 1, we have assayed selected single nucleotide polymorphisms (SNPs) of genes in the double strand break repair (DSBR) pathway, including BRCA1, NBS1, TP53, APEX1, CHEK1, CHEK2, and ATM, in 68 prostate cancer cases and 90 healthy male controls using the ABI SNPlex platform.
49 Progress of the Study
- For Specific Aim 2, we explored the independent effects of SNPs of the genes mentioned above in the DSBR pathway while controlling for potential confounding factors, such as age, race, and education.
51 Results of Preliminary Study
- The adjusted ORs are:
- -4.6 (95% CI 0.6-34.1) for BRCA1 (rs8176109)
- -5.0 (95% CI 1.1-22.3) for NBS1 (rs9995)
- -3.1 (95% CI 0.46-21.2) for TP53 (rs2909430)
- -2.0 (95% CI 0.49-8.02) for APEX1 (rs3136820)
- -2.6 (95% CI 0.58-11.6) for CHEK1 (rs506504)
- -0.6 (95% CI 0.17-2.3) for ATM (rs228591)
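The slide reports ORs with 95% CIs but no p-values. Assuming each CI was computed on the log-OR scale (as in logistic regression), the standard error can be back-calculated as SE = (ln U - ln L) / (2 x 1.96), giving an approximate Wald test. This is a reader's approximation, not the authors' analysis; rounding in the reported values makes the p-values approximate.

```python
import math
from statistics import NormalDist

# Adjusted ORs and 95% CIs as reported on the slide: (OR, lower, upper).
reported = {
    "BRCA1 (rs8176109)": (4.6, 0.6, 34.1),
    "NBS1 (rs9995)": (5.0, 1.1, 22.3),
    "TP53 (rs2909430)": (3.1, 0.46, 21.2),
    "APEX1 (rs3136820)": (2.0, 0.49, 8.02),
    "CHEK1 (rs506504)": (2.6, 0.58, 11.6),
    "ATM (rs228591)": (0.6, 0.17, 2.3),
}


def approx_wald_p(odds_ratio: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate two-sided Wald p-value back-calculated from an OR and its CI.

    Assumes a symmetric CI on the log-OR scale: SE = (ln U - ln L) / (2 z).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


for snp, (or_, lo, hi) in reported.items():
    print(f"{snp:20s} OR={or_:<4} approx p={approx_wald_p(or_, lo, hi):.3f}")
```

Only the NBS1 interval excludes 1, and correspondingly only its back-calculated p-value falls below 0.05 (about 0.04); the very wide intervals reflect the small preliminary sample.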
52 Future Plan
- We will continue our proposed specific aims by assaying additional SNPs in the DSBR pathway genes as well as in other pathways, including other DNA repair pathways and metabolic, inflammatory, and cell cycle pathways, among prostate cancer cases and controls. We will explore the independent effects of these SNPs on the risk of prostate cancer. We will also add haplotype-tagging SNPs of the DSBR pathways in order to identify haplotypes associated with prostate cancer risk. These additional studies will have a greater impact on the translational research objectives of the SPORE.
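Haplotype-tagging SNPs are usually chosen so that every common SNP in a region has pairwise r^2 above a threshold (often 0.8) with at least one tag. The sketch below shows a simple greedy version of that idea, assuming phased haplotypes coded 0/1 are available; dedicated tools (e.g., Tagger in Haploview) add refinements such as multi-marker tests and forced inclusion of functional SNPs. For simplicity, candidates here are restricted to still-untagged SNPs, and the haplotypes are hypothetical.

```python
from typing import List, Sequence, Set

Haplotype = Sequence[int]  # phased alleles coded 0/1, one entry per SNP


def allele_freq(haps: List[Haplotype], i: int) -> float:
    return sum(h[i] for h in haps) / len(haps)


def r_squared(haps: List[Haplotype], i: int, j: int) -> float:
    """Pairwise LD r^2 = D^2 / (pA(1-pA) pB(1-pB)) from phased haplotypes."""
    p_i, p_j = allele_freq(haps, i), allele_freq(haps, j)
    p_ij = sum(h[i] * h[j] for h in haps) / len(haps)
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def greedy_tag_snps(haps: List[Haplotype], threshold: float = 0.8) -> List[int]:
    """Repeatedly pick the SNP that tags (r^2 >= threshold) the most untagged SNPs."""
    n_snps = len(haps[0])
    untagged = {i for i in range(n_snps) if 0 < allele_freq(haps, i) < 1}  # skip monomorphic
    tags: List[int] = []
    while untagged:
        best, best_cover = -1, set()  # type: int, Set[int]
        for cand in sorted(untagged):
            cover = {j for j in untagged if j == cand or r_squared(haps, cand, j) >= threshold}
            if len(cover) > len(best_cover):
                best, best_cover = cand, cover
        tags.append(best)
        untagged -= best_cover
    return tags


# Hypothetical phased haplotypes for 4 SNPs: SNPs 0, 1, 2 are in perfect LD.
haps = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(greedy_tag_snps(haps))  # [0, 3]
```

Two tags capture all four SNPs here, which is the economy that makes tagging attractive for pathway-wide genotyping.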
53 BRCA1 Haplotypes and Risk of Prostate Cancer
54 Translational Potential of the Study
- Our results, based on a relatively small sample size, suggest potential involvement of SNPs of DSBR pathway genes in the development of prostate cancer.
- If confirmed by studies with larger sample sizes, SNPs in DSBR pathway genes may be used in individual risk assessment and in identifying high-risk populations for intervention and chemoprevention.
56 The Selection Criteria of SNPs
- Functional SNPs if possible:
- -Amino-acid-changing SNPs
- -SNPs in a functional region of the gene, or SNPs without amino acid changes that were hypothesized to affect transcription/translation of the protein
- The rare allele frequency of the SNP must be equal to or higher than 5% in the general population
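These criteria can be applied mechanically once genotype counts from a reference population and functional annotations are in hand. The sketch below treats the 5% minor allele frequency rule as a hard filter and the "functional if possible" rule as a preference, ranking amino-acid-changing SNPs first, then SNPs in a functional region, then the rest. The SNP records, their counts, and the annotation fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSnp:
    snp_id: str                 # hypothetical identifiers below
    n_hom_ref: int              # genotype counts in a reference population sample
    n_het: int
    n_hom_alt: int
    changes_amino_acid: bool
    in_functional_region: bool  # e.g., promoter/regulatory region (annotation assumed)


def minor_allele_frequency(s: CandidateSnp) -> float:
    n_alleles = 2 * (s.n_hom_ref + s.n_het + s.n_hom_alt)
    alt_freq = (2 * s.n_hom_alt + s.n_het) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def select_snps(candidates: List[CandidateSnp], min_maf: float = 0.05) -> List[str]:
    """Keep SNPs with MAF >= min_maf; order functional SNPs first ("if possible")."""
    def priority(s: CandidateSnp) -> int:
        if s.changes_amino_acid:
            return 0
        if s.in_functional_region:
            return 1
        return 2

    eligible = [s for s in candidates if minor_allele_frequency(s) >= min_maf]
    return [s.snp_id for s in sorted(eligible, key=priority)]  # stable sort keeps input order on ties


candidates = [
    CandidateSnp("snp_A", 80, 18, 2, True, False),    # MAF 0.11, amino acid change
    CandidateSnp("snp_B", 96, 4, 0, True, False),     # MAF 0.02, too rare
    CandidateSnp("snp_C", 50, 40, 10, False, False),  # MAF 0.30
    CandidateSnp("snp_D", 30, 50, 20, False, True),   # MAF 0.45, functional region
]
print(select_snps(candidates))  # ['snp_A', 'snp_D', 'snp_C']
```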
60 Proposed Study of Lung Cancer among Non-smokers
61 Motives and Conceptual Framework for the Study of Genetic Susceptibility to Lung Cancer among Non-smokers
- About 16% of male smokers and 10% of female smokers will eventually develop lung cancer, which suggests that exposure to other environmental carcinogens and individual genetic susceptibility may play an important role in lung cancer among non-smokers.
- It has been suggested that 26% of lung cancer is associated with genetic susceptibility (Lichtenstein P, et al. NEJM, 2000).
- We hypothesize that variation in genetic susceptibility, i.e., single nucleotide polymorphisms (SNPs) of genes in inflammation, DNA repair, and cell cycle control pathways, may be important in the development of lung cancer among non-smokers.
63 [Figure: pathway diagram labeled DNA damage repaired; defective DNA repair gene; if DNA damage is not repaired; G0; if cell cycle control is lost]
64 Issues in genetic association studies
- Many genes
- -25,000 genes; many can be candidates
- Many SNPs
- -10,000,000 SNPs; the ability to predict functional SNPs is limited
- Methods to select SNPs
- -Only functional SNPs in a candidate gene
- -Systematic screen of SNPs in a candidate gene
- -Systematic screen of SNPs in an entire pathway
- -Genome-wide screen
- -Systematic screen for all coding changes
65 Selection of SNPs (Genome-wide Association Studies)
- Molecular
- -Higher requirements: Affymetrix and Perlegen
- Analytical
- -Highest requirements: data management, automation
- Advantages
- -No biological assumptions; can identify novel genes/pathways
- -Excellent chance to identify risk alleles
- -Utility in individual risk assessment
- Disadvantages
- -High costs
- -Concern about multiple testing
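The multiple-testing concern can be made concrete with simple arithmetic. With 500,000 SNPs tested, a nominal p < 0.05 would flag about 25,000 SNPs even if none were truly associated, and a Bonferroni correction requires p < 1e-7 per SNP. Bonferroni is conservative when SNPs are correlated through LD; multistage designs, such as carrying roughly the top 1,000 SNPs from a first stage into replication, are one way to manage this trade-off. The sketch below shows only the arithmetic.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(per_test_alpha: float, n_null_tests: int) -> float:
    """Expected number of nominally significant results if every tested SNP is null."""
    return per_test_alpha * n_null_tests


n_snps = 500_000  # SNPs on the 500K array
print(f"Bonferroni per-SNP threshold: {bonferroni_threshold(0.05, n_snps):.1e}")  # 1.0e-07
print(f"Expected false positives at p < 0.05: {expected_false_positives(0.05, n_snps):,.0f}")  # 25,000
```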
66 500K SNP Coverage
- Median intermarker distance: 3.3 kb
- Mean intermarker distance: 5.4 kb
- Average heterozygosity: 0.30
- Average minor allele frequency: 0.22
- SNPs in genes: 196,384
- 80% of the genome within 10 kb of a SNP
68 LIG SNP and Passive Smoking
69 Figure 1. The effects of SNPs on the risk of lung cancer among smokers and non-smokers (odds ratios)
70 Hypothesis
- The overall hypothesis is that multiple sequence variants in the genome are associated with the risk of lung cancer among non-smokers. Specifically, we hypothesize that a number of common SNPs that modify lung cancer risk in non-smokers are in strong linkage disequilibrium (LD) with the SNPs arrayed on the 500K GeneChip.
74 Specific Aims
- Aim 1. To perform exploratory tests for association between 500K SNPs across the genome and lung cancer risk among 200 non-smoking lung cancer patients and 200 controls.
- Aim 2. To perform the first stage of confirmatory association tests between lung cancer risk and more than 1,000 SNPs implicated in Aim 1 among an independent set of 600 pairs of cases and controls.
75 Specific Aims
- Aim 3. To perform the second stage of confirmatory association tests between lung cancer risk and more than 500 SNPs that were replicated in Aim 2 among an additional 600 cases and 600 controls. Additional SNPs will also be added from our ongoing pathway-specific analyses of DNA repair, cell cycle regulation, inflammation, and metabolic pathways based on non-smokers in our lung cancer study.
- Aim 4. To perform fine-mapping association studies in the flanking regions of each of the 30-100 SNPs confirmed in Aim 3 among all 1,400 cases and 1,400 controls. The large number of non-smoking lung cancer cases in this study population also allows us to identify SNPs associated with disease risk among non-smokers.
76 Specific Aims
- Aim 5. To explore the generalizability of the
SNPs identified in Specific Aims 1-4 within a
Chinese population of 600 nonsmoking lung cancer
cases and 600 nonsmoking controls. The relatively
homogeneous Chinese population not only allows us
to further confirm the associations, but also
improves our ability to finely map the SNPs
associated with lung cancer risk among
non-smokers.
77 Discussion: Costs
- Affymetrix 500K SNP chip: $1,000/case
- 2,000 x $1,000 = $2M
- 1,000 x $1,000 = $1M
- 500 x $1,000 = $0.5M
- 500 x 3,000 (SNPs) x $0.15 = $225,000
- 500 x 30 (SNPs) x $0.15 = $2,250
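The cost lines above reduce to two simple models: a flat per-subject price for the genome-wide chip and a per-genotype price for targeted SNP assays. The sketch reproduces that arithmetic, assuming the slide's figures are in US dollars; amounts are kept in integer cents to avoid floating-point rounding.

```python
CHIP_COST_CENTS = 100_000  # $1,000 per subject on the 500K chip (figure from the slide)
PER_GENOTYPE_CENTS = 15    # $0.15 per SNP genotype (figure from the slide)


def chip_cost_cents(n_subjects: int) -> int:
    """Genome-wide chip: flat cost per subject."""
    return n_subjects * CHIP_COST_CENTS


def per_snp_cost_cents(n_subjects: int, n_snps: int) -> int:
    """Targeted genotyping: cost scales with subjects x SNPs."""
    return n_subjects * n_snps * PER_GENOTYPE_CENTS


def dollars(cents: int) -> str:
    return f"${cents / 100:,.2f}"


for n in (2000, 1000, 500):
    print(f"{n} subjects on the 500K chip: {dollars(chip_cost_cents(n))}")
for n_snps in (3000, 30):
    print(f"500 subjects x {n_snps} SNPs: {dollars(per_snp_cost_cents(500, n_snps))}")
```

The comparison shows why the staged design reserves the chip for the first, smallest stage and uses per-SNP assays for replication.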
78 1,040 controls, 601 head and neck cancer cases, and 611 lung cancer cases
80 GSTM1 and Lung Cancer among Non-Smokers
[Figure: OR 1.19 (95% CI 0.69-2.03); genotype groups: GSTM1 normal, GSTM1 null; non-smokers only]
81 GSTT1 and Lung Cancer among Non-Smokers
[Figure: OR 1.53 (95% CI 0.83-2.81); genotype groups: GSTT1 normal, GSTT1 null; non-smokers only]
82 p53 codon 72 and Lung Cancer among non-smokers
[Figure: OR 0.79 (95% CI 0.32-1.95); genotype groups: p53 A/A or A/P, P/P; non-smokers only]
83 GSTP1 and Lung Cancer among Non-Smokers
[Figure: OR 0.69 (95% CI 0.39-1.24); genotype groups: GSTP1 Ile/Ile, any Val; non-smokers only]