Title: Slide 1
1Introduction to the concept of functional genomics
David Meyre, Associate Professor, McMaster University (meyred_at_mcmaster.ca)
HRM 728 Graduate Course: Genetic Epidemiology
October 24th, 2014
Population Genomics Program
2Introduction to the concept of functional genomics
What Is Functional Genomics? The goal of
functional genomics is to understand the
relationship between an organism's genome and its
phenotype. Functional genomics is a field of
molecular biology that is attempting to make use
of the vast wealth of data produced by genome
sequencing projects to describe genome function.
Functional genomics uses high-throughput
techniques like DNA microarrays, proteomics,
epigenomics, metagenomics, metabolomics and
mutation analysis to describe the function and
interactions of genes.
3The genomic revolution
Human genome sequence
High-throughput technologies
GENES
Biostatistics Bioinformatics
Large human biobanks
FUNCTIONAL GENOMICS
4Gene identification approaches
Genome-wide linkage
Candidate gene
GENES
Genome-wide association
Homozygosity mapping
Full exome / genome sequencing
5Classification of human genetic diseases
OBESITY
Syndromic disease (< 0.004)
Monogenic disease (< 2)
Polygenic disease ( 20)
6Genes and causality
SYNDROMIC / MONOGENIC DISEASE
Beyond co-segregation studies, additional
arguments are needed to demonstrate the causal
role of a mutation in the disease
→ functional genomics
7Genes and causality
POLYGENIC DISEASE (e.g. type 2 diabetes)
Beyond association studies, additional arguments
are needed to demonstrate the causal role of a
variant / gene in the disease
→ functional genomics
Sladek et al., Nature 2007
8Introduction to the concept of functional genomics
TRANS-ETHNIC FINE MAPPING APPROACH
9Candidate gene
Hypothesis free approaches
We are here
Fine mapping
Gene / locus Identification
Variant identification Functional prediction
(in silico)
Functional validation (in vitro, in vivo)
10Trans-ethnic fine mapping approach
- Linkage disequilibrium is the non-random association of alleles at two or more loci
- The human genome is composed of blocks of linkage disequilibrium
- The extent of linkage disequilibrium blocks varies according to ethnic background
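As an illustration of "non-random association of alleles", here is a minimal sketch of the standard pairwise LD measures (D, D' and r²) for two biallelic loci. The function name `ld_stats` and the example frequencies are hypothetical and not taken from the lecture.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D measures the departure from independence (p_ab == p_a * p_b)
    d = p_ab - p_a * p_b
    # D' rescales D by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: the two alleles always travel together
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

Under independence (for example `ld_stats(0.25, 0.5, 0.5)`) all three values are zero; the further they are from zero, the stronger the LD between the two loci.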
11Trans-ethnic fine mapping approach
[Figure: disease-associated LD block and causal SNP, with SNP1 to SNP5 shown for Icelandic, French, Asian and African populations; x-axis: distance (kb)]
12Trans-ethnic fine mapping approach
- Large-scale resequencing and case-control association studies in Icelandic, Danish, West African and African American subjects identified rs7903146 as the likely causal type 2 diabetes-associated SNP
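The logic of trans-ethnic fine mapping can be sketched as a set intersection: a causal SNP should stay inside the disease-associated LD block in every population, so SNPs that drop out in populations with shorter blocks become less likely candidates. The SNP memberships below are invented toy data and `shared_candidates` is a hypothetical helper, not the method used in the cited study.

```python
def shared_candidates(blocks):
    """Return the SNPs present in the associated LD block of every population."""
    populations = iter(blocks.values())
    shared = set(next(populations))
    for snps in populations:
        shared &= set(snps)  # keep only SNPs also associated in this population
    return shared


# Toy data (assumption): which SNPs lie in the associated LD block per population
ld_blocks = {
    "Icelandic": {"SNP1", "SNP2", "SNP3", "SNP4", "SNP5"},
    "French": {"SNP1", "SNP2", "SNP3", "SNP4"},
    "Asian": {"SNP1", "SNP3", "SNP4"},
    "African": {"SNP1", "SNP2"},  # shorter LD blocks narrow the interval most
}

print(sorted(shared_candidates(ld_blocks)))
```

In practice the comparison relies on association statistics and LD measures rather than simple membership, but the narrowing effect of adding populations with shorter LD blocks is the same.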
13We are here
Candidate gene
Hypothesis free approaches
Fine mapping
Gene / locus Identification
Variant identification Functional prediction
(in silico)
Functional validation (in vitro, in vivo)
14Gene candidacy
- Are genes in the disease-associated LD block involved in syndromic / monogenic forms of the same disease?
  - Loci associated with polygenic obesity: MC4R, BDNF, POMC, PCSK1, SIM1
  - GWAS for complex traits: 20% of the GWAS loci include genes involved in Mendelian disorders for the same trait
- Are genes in the disease-associated LD block involved in a corresponding phenotype in animal models (KO, Tg, siRNA)?
  - Loci associated with polygenic obesity: MC4R, BDNF, POMC, PCSK1, SIM1, FTO, GIPR, NPC1, SH2B1, TBC1D1, NEGR1
  - > 170 genes induce a phenotype of severe obesity in genetic mouse models
- Gene function, biology
  - Function related to energy metabolism
15Gene candidacy
In order to find the causal gene in a disease-associated linkage disequilibrium block, mRNA expression studies can be useful (microarrays, RT-PCR):
1- Is the gene expressed in target tissues for the disease (obesity: brain, adipocytes; T2D: pancreas)?
2- Is the gene mRNA expression modulated by the disease status in a relevant tissue?
3- Is the gene mRNA expression modulated by the disease-associated SNP in a relevant tissue?
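Question 3 is essentially an eQTL test. A minimal sketch, assuming an additive model in which expression is regressed on the number of risk alleles carried (0, 1 or 2), is shown below. The function `eqtl_slope` and the expression values are hypothetical; a real analysis would adjust for covariates and report a p-value.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx              # change in expression per risk allele
    r = sxy / (sxx * syy) ** 0.5   # strength of the linear relationship
    return slope, r


# Toy data (assumption): risk-allele counts and mRNA levels in a relevant tissue
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope, r = eqtl_slope(dosages, expression)
print(f"slope per allele = {slope:.2f}, r = {r:.3f}")
```

A slope clearly different from zero in the disease-relevant tissue supports the gene as a candidate at the locus; a flat slope does not exclude it, since the effect may be restricted to another tissue or condition.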
16Gene candidacy
- ORMDL3 is one of the 19 genes located in the asthma-associated LD block
- ORMDL3 is expressed in the lung
- ORMDL3 mRNA level is modulated by asthma disease status in lymphoblastoid cell lines
- ORMDL3 mRNA level is strongly modulated by the asthma-associated SNP in lymphoblastoid cell lines
- ORMDL3 is a highly relevant candidate gene at this locus
Moffatt et al., Nature 2007
17Gene candidacy
- Combination of mRNA expression and GWAS studies
- 27 genes differentially regulated in adipose tissue of monozygotic twins discordant for obesity
- Hypothesis-driven GWAS analysis for these 27 genes, followed by replication in a second independent sample, identified a novel obesity gene: F13A1
Naukkarinen et al., PLOS Genet 2010
18Candidate gene
Hypothesis free approaches
Fine mapping
Gene / locus Identification
We are here
Functional prediction (in silico)
Functional validation (in vitro, in vivo)
19Introduction to the concept of functional genomics
GENE VARIANT AND FUNCTION
20Gene variant and function
- Two major types of variants
- 1- Variants that affect the protein structure and function of the gene in which they occur
  - Missense, nonsense, frameshift (indels) coding mutations → altered protein function
  - Intron / exon mutations, splicing branch points → exon skipping/adding
21Gene variant and function
- 2- Variants that affect expression and regulation of the gene in which they occur or other distal genes (eQTLs)
  - Gene variant in the promoter (transcription factor binding site) → change in gene expression
  - Gene variant in the 3'UTR → altered mRNA stability
  - Gene variant in microRNA binding sites → change in expression
  - Gene variant in enhancers / silencers / insulators → change in expression of a distal gene (or a group of genes)
  - Gene variant in a CpG methylation site → change in DNA methylation pattern
  - Copy number variants (CNVs) → modulation of gene expression, haplo-insufficiency
How to prove causality between a genetic variant
and a biological effect?
22In silico prediction studies for coding variants
Mutations PolyPhen-2 PANTHER SIFT SNAP PMUT
K26E - - - - -
M125I - - - -
T175M
N180S -
Y181H -
G226R -
S325N -
T558A NA - - -
G593R NA
deleterious - neutral
Eight coding non-synonymous mutations in the PCSK1 gene have been identified in extremely obese patients. The PolyPhen-2 software (conservation of the amino acid across evolution, protein structure) is 100% concordant with in vitro studies.
Creemers et al., Diabetes 2012
23prediction studies for regulatory variants
- Combine both in silico and indirect experimental data, e.g. ANNOVAR, FunciSNP, PMCA, GWAS3D
- These tools attribute a score to each variant in the LD block of the genomic region thought to cause the phenotype and predict its functionality based on its proximity to:
  - Transcription factor binding sites and cis-regulatory modules
  - Phylogenetically conserved sites
  - Specific epigenetic marks (e.g. enhancer / silencer / insulator-specific proteins, promoter proteins, DNA methylation, DNase hypersensitivity)
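The scoring idea can be sketched as a weighted count of the functional annotations that overlap each variant. This is only a toy illustration: the weights, annotation labels and function names are hypothetical and do not reproduce the scoring scheme of any of the tools named above.

```python
# Hypothetical weights per annotation class (assumption, for illustration only)
WEIGHTS = {
    "tfbs": 2,             # transcription factor binding site
    "conserved": 1,        # phylogenetically conserved site
    "enhancer_mark": 2,    # enhancer-associated epigenetic mark
    "dnase_hs": 1,         # DNase hypersensitivity
    "cpg_methylation": 1,  # CpG methylation site
}


def score_variant(annotations):
    """Sum the weights of the known annotations overlapping a variant."""
    return sum(WEIGHTS[a] for a in annotations if a in WEIGHTS)


def rank_variants(variants):
    """Return variant IDs ordered from highest to lowest score."""
    return sorted(variants, key=lambda v: score_variant(variants[v]), reverse=True)


# Toy LD block (assumption): variant ID -> annotations it overlaps
ld_block = {
    "snp_a": {"tfbs", "dnase_hs"},
    "snp_b": set(),
    "snp_c": {"tfbs", "conserved", "enhancer_mark"},
}

for variant in rank_variants(ld_block):
    print(variant, score_variant(ld_block[variant]))
```

The ranking only prioritises variants for follow-up; it does not replace functional validation in vitro or in vivo.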
24Introduction to the concept of functional genomics
EVOLUTIONARY GENETICS
We are here
25Evolutionary genetics
Natural selection is the gradual, non-random process by which biological traits become either more or less common in a population as a function of differential reproduction of their bearers. It is a key mechanism of evolution. The term "natural selection" was popularized by Charles Darwin.
Evolutionary genetics (Huxley 1942)
- Advantageous mutations have been positively selected in human populations during recent evolution
- Disadvantageous mutations have been negatively selected in human populations during recent evolution
26Evolutionary genetics
THRIFTY GENOTYPE HYPOTHESIS: the 'thrifty' genotype would have been advantageous for hunter-gatherer populations, especially child-bearing women, because it would allow them to fatten more quickly during times of abundance. Fatter individuals carrying the thrifty genes would thus better survive times of food scarcity.
→ Obesity and type 2 diabetes predisposing mutations may show a signature of positive selection
27Evolutionary genetics
- The LCT rs4988235 T variant confers lactase persistence
- The LCT rs4988235 T variant is associated with higher milk / dairy product consumption and increased body mass index
- The LCT rs4988235 T variant has a selective advantage in dairy farming populations and has been subject to positive selection in relation to cattle domestication
- The LCT rs4988235 T allele is more frequent in Northern (frequency 0.7) than in Southern Europe (frequency 0.1)
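For reference, an allele frequency such as the 0.7 and 0.1 quoted above is obtained by counting alleles across genotypes. The sketch below uses invented genotype counts chosen only so that the result matches those two values; `allele_frequency` is a hypothetical helper.

```python
def allele_frequency(n_hom_alt, n_het, n_hom_ref):
    """Frequency of the alternate allele from diploid genotype counts."""
    n = n_hom_alt + n_het + n_hom_ref
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Each homozygote carries two copies, each heterozygote one, out of 2n alleles
    return (2 * n_hom_alt + n_het) / (2 * n)


# Toy counts (assumption): TT, CT and CC carriers among 100 individuals per region
north = allele_frequency(n_hom_alt=49, n_het=42, n_hom_ref=9)
south = allele_frequency(n_hom_alt=1, n_het=18, n_hom_ref=81)

print(f"T allele frequency: north={north:.2f}, south={south:.2f}")
```

A gradient of this size between neighbouring populations is one of the patterns that motivates testing a locus for recent positive selection.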
28Evolutionary genetics
LCT rs4988235 T allele frequency in UK
Davey-Smith et al., EJHG 2009
29Evolutionary genetics
- Genome-wide approaches in diverse ethnic backgrounds have identified several hundred regions showing recent positive natural selection
- New methods are able to identify causal variants in regions with a positive natural selection signature
- The amino-acid change Lys109Arg in the LEPR gene is a causal variant subject to positive selection
- The Lys109Arg variant is associated with body mass index variation
Grossman et al., Science 2010
31Introduction to the concept of functional genomics
Sources of data for variant functionality
prediction
We are here
32ENCODE PROJECT (ENCyclopedia Of DNA Elements)
- Integrative analysis of
- 3545 biosamples (2441 in humans)
- from different cell lines / tissues
- 971 epigenetic marks
- 5194 assays (ChIP-seq, RNA-seq, IP, DNase-seq, transcription profiling)
- https://www.encodeproject.org/
33NIH Roadmap Epigenomics Mapping Consortium
- Integrative analysis of
- 111 reference human cells/tissues
- 40 epigenetic marks
- http://genomebrowser.wustl.edu/
34Genotype-Tissue Expression (GTEx) project
- Correlations between genotype and tissue-specific gene expression levels in
- 42 cell lines / tissues
- 100 - 200 RNA-seq and genotyped samples
- http://www.gtexportal.org/home/
35 Examples of coding and regulatory variants
36Gene variation in the promoter and gene
expression
- The -11391 G>A variant in the promoter of the ACDC/adiponectin gene is associated with higher in vitro promoter activity and with higher plasma adiponectin level in lean and in obese children
Bouatia-Naji et al., Diabetes 2006
37Gene variation and long-range enhancer
- The obesity-associated FTO intron 1 region directly interacts with the promoter of the IRX3 gene (580 kb downstream of FTO)
- The intron 1 SNP in FTO modulates IRX3 (but not FTO) expression
- Irx3-deficient mice display a leanness phenotype
Smemo et al., Nature 2014
38Gene variation at a CpG methylation site
- Gene variant rs1421085 in intron 1 of FTO is the main contributor to polygenic obesity (Dina et al., Nat Genet 2007)
- Gene variant rs7202116, in full linkage disequilibrium with rs1421085, creates a CpG methylation site and is associated with increased methylation of a 7.7 kb regulatory region within FTO
- The 7.7 kb regulatory region encapsulates a highly conserved non-coding element that acts as a long-range gene expression enhancer
Bell et al., PLOS One 2010
39Intron / exon mutations and exon skipping
- Extreme obesity cosegregates with homozygosity for a G/A substitution in the splice donor site of exon 16 of the LEPR gene
- The intron / exon mutation induces skipping of exon 16 and a truncated, inactive leptin receptor
Clement et al., Nature 1998
40CNVs are highly causal variants in Mendelian diseases
- A 600 kb heterozygous deletion (30 genes) on chromosome 16p11.2 explains 0.7% of morbid hyperphagic obesity and is associated with developmental delays
- Duplications in the same chromosomal region are associated with underweight and restrictive eating disorders
- SH2B1, a key modulator of the response to the satiety hormone leptin and a Mendelian hyperphagic obesity gene, is located in the deleted interval
Walters et al., Nature 2010; Jacquemont et al., Nature 2012
41Gene variation in 3'UTR and mRNA stability
- The A>G 1044 TGA SNP is included in the ENPP1 risk haplotype associated with higher ENPP1 plasma level and risk of obesity / T2D
- A>G 1044 TGA forms a linkage disequilibrium block in the 3'UTR with A>C 1092 TGA and C>T 1157 TGA
- In HLA cells transfected with either 3'UTR variant or wild-type cDNA, specific ENPP1 mRNA half-life was increased for those transfected with 3'UTR variant cDNA (t1/2 4.35 vs. 2.55 h, p = 0.001)
Meyre et al., unpublished
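To make the stability difference concrete, the sketch below converts a half-life into a first-order decay constant and into the fraction of transcript remaining after a given time. It assumes simple exponential decay and reads the two values above as half-lives of 4.35 h (variant) and 2.55 h (wild type); the function names and the 6 h time point are hypothetical.

```python
import math


def decay_constant(t_half):
    """First-order decay constant k (per hour) for a half-life in hours."""
    return math.log(2) / t_half


def fraction_remaining(t, t_half):
    """Fraction of the initial mRNA left after t hours of exponential decay."""
    return 0.5 ** (t / t_half)


# Assumed reading of the slide: half-lives in hours
HALF_LIFE_VARIANT = 4.35
HALF_LIFE_WILD_TYPE = 2.55

for label, t_half in (("variant", HALF_LIFE_VARIANT), ("wild type", HALF_LIFE_WILD_TYPE)):
    k = decay_constant(t_half)
    left = fraction_remaining(6.0, t_half)
    print(f"{label}: k = {k:.3f} per h, fraction left after 6 h = {left:.2f}")
```

A longer half-life means more transcript persists at any time point, which is consistent with the higher ENPP1 levels associated with the risk haplotype.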
42Cis versus Trans e-QTLs?
. The polymorphism rs9585056 is associated with
T1D, modulates the expression of the cis-gene
GPR183 and the expression of the IRF7 network
genes
Heinig et al., Nature 2010
43Gene candidate
Hypothesis free approaches
Fine mapping
Gene / locus Identification
Functional prediction (in silico)
We are here
Functional validation (in vitro, in vivo)
44In vitro functional studies
68% of non-synonymous mutations found in obese patients are deleterious (alpha-MSH test)
Stutzmann et al., Diabetes 2008
45In vitro/ In vivo functional studies
CRISPR/Cas9 system, a powerful genetic tool for genome editing and the study of functional variants
→ The rs1421085 variant of FTO, identified by PMCA, has been shown to modulate the expression of the IRX3 and IRX5 genes using the CRISPR/Cas9 method
46Introduction to the concept of functional genomics
STUDY OF ENDOPHENOTYPES
47Study of endophenotypes
- rs17782313 near MC4R has been associated with BMI by GWAS
- Deleterious coding mutations in MC4R are the commonest form of monogenic obesity, with hyperphagia and increased stature
- If the SNP modulates the expression / function of MC4R, we can predict associations with the same traits in an appropriate direction
- The SNP rs17782313 obesity-predisposing allele is associated with more snacking and overeating and increased stature
→ MC4R is a highly relevant candidate gene at this locus
Stutzmann et al., Int J Obes 2009
48Gene candidate
Hypothesis free approaches
Fine mapping
Gene / locus Identification
Functional prediction (in silico)
Functional validation (in vitro, in vivo)
49FTO, a good illustration of integrative approach
- Novel variants identified in African populations
- FTO SNP shows evidence of positive natural selection
- The SNP is associated with different patterns of methylation (demethylase)
- FTO SNPs in intron 1 affect the expression of other genes (IRX3 and IRX5) implicated in fat storage and energy expenditure
- FTO complete deficiency leads to a lethal polymalformative syndrome in humans
- FTO partial deficiency does not relate to leanness/obesity in humans
- FTO knock-out mice are lean, FTO transgenic mice are obese
- FTO is highly expressed in hypothalamus and is regulated by fasting and feeding
- FTO SNP is associated with food intake in humans
50Ichimura et al., Nature 2012
52ANY QUESTIONS?
The French fair-play!