An introduction to Genetical Genomics and Systems Genetics to better understand cancer and other complex diseases


1
An introduction to Genetical Genomics and Systems
Genetics to better understand cancer and other
complex diseases
  • Ina Hoeschele
  • September 2006

2
Genetic base of complex disease traits
  • Understanding the genetic determination of
    complex, disease-related traits is a long-standing
    goal
  • Some standard approaches
  • Mapping of quantitative trait loci (QTL) via
    linkage or association mapping
  • In human populations
  • In animal models of specific diseases
  • Collaboration with Professor Miller (Cancer
    Biology) to identify candidate genes responsible
    for the observed phenotypic variance in lung
    tumor incidence of adult mice following in utero
    exposure to chemical carcinogens
  • There is still no ready-to-use software for some
    animal models (AIL, RIX)

3
Genetic base of complex disease traits
  • Some standard approaches
  • Gene expression studies
  • Global gene expression profiling of individuals
    with a disease and individuals without the
    disease, or two groups of individuals with
    different sub-types of a disease (observational
    data)
  • List of differentially expressed genes (candidate
    genes for QTL)
  • Identification of differentially expressed and/or
    differentially regulated gene networks
  • Collaboration with Professor Miller:
    identification of time-dependent alterations in
    signaling networks that drive tumor progression
    from benign to advanced

4
Genetic base of complex disease traits
  • Systems Biology: reconstruction of cellular
    networks involving genes, proteins, metabolites
  • Gene networks
  • phenomenological, not physical
  • network of interactions or regulations among
    genes
  • provide valuable information about the genetic
    architecture of complex diseases
  • classical genetics concepts such as dominance and
    epistasis can be understood in terms of gene
    networks and their properties
  • important practical applications, e.g.,
    identification of drug targets

5
(No Transcript)
6
Gene networks from observational data
  • This network can be obtained using linear
    graphical methods
  • This is an interaction network: edges connect any
    pair of genes that directly interact with each
    other
  • the edges are undirected; it is not a causal or
    regulatory network
  • it does not tell us which gene regulates which
    other gene(s)
  • We compute such a network as an Undirected
    Dependency Graph

7
Gene networks from observational data
  • Undirected Dependency Graph (UDG)
  • Based on partial correlations
  • Corr(gene A, gene B) = 0.7, Corr(A, C) = 0.4
  • A - B - C: Corr(A, C | B) = 0
  • Sometimes, we can find regulation
  • A → C ← B: A and B jointly regulate C
  • A and B do not regulate each other
  • Corr(A, B | C) ≠ 0 BUT Corr(A, B) = 0
  • Observational data
  • Power and sample size
  • Several simulation studies: 100 genes, 200
    edges, sample sizes 50-200: power from < 0.20 to
    0.60
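The two patterns on this slide (a chain that conditioning breaks, and a collider that conditioning creates) can be checked with the standard first-order partial correlation formula. This is a minimal sketch, not the authors' code: the chain uses the slide's Corr(A, B) = 0.7 and Corr(A, C) = 0.4 and assumes a pure linear chain, so Corr(B, C) = Corr(A, C) / Corr(A, B); the collider values (0.5) are hypothetical.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation Corr(X, Y | Z) from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


# Chain A - B - C with Corr(A, B) = 0.7 and Corr(A, C) = 0.4 (values from the slide).
# Assumption: a pure linear chain, so Corr(A, C) = Corr(A, B) * Corr(B, C).
r_ab, r_ac = 0.7, 0.4
r_bc = r_ac / r_ab
chain_pc = partial_corr(r_ac, r_ab, r_bc)  # Corr(A, C | B): zero up to rounding

# Collider A -> C <- B with independent regulators (hypothetical correlations):
# Corr(A, B) = 0, but conditioning on the common target C induces a dependence.
collider_pc = partial_corr(0.0, 0.5, 0.5)  # Corr(A, B | C)

print(f"Corr(A,C|B) = {chain_pc:.3f}, Corr(A,B|C) = {collider_pc:.3f}")
```

The collider case is what lets a UDG method orient some edges: a non-zero conditional correlation between two genes that are marginally uncorrelated points to a common target.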

8
Causal, regulatory gene networks
  • Also represented by graph with gene nodes but now
    edges are directed
  • A → B: gene A regulates gene B
  • Two approaches
  • Time series experiment
  • Specific perturbation experiment
  • One-at-a-time specific perturbations in same
    genetic background (several genes are knocked
    down, one at a time, RNA interference)
  • Multifactorial perturbations: genetical
    genomics or expression genetics (Jansen and
    Nap 2003, Trends in Genetics)

9
Design and Data
  • Specific Perturbation Experiment

10
Design and Data
  • Multi-factorial perturbation experiment
    Genetical Genomics (M marker tested genome
    location)

11
Genetical Genomics / Systems Genetics
  • Segregating population of n x 100 individuals
    (n = 1, 2, 3, ...); each individual is
  • DNA marker genotyped (genome-wide)
  • Expression profiled (genome-wide)
  • Phenotyped for disease-complex related traits

We need to find out which DNA markers are
expression QTL (eQTL)
12
Genetical Genomics / Systems Genetics
  • Goal: causal, regulatory gene network
  • Identification of DNA markers affecting the
    expression profiles (etraits) of genes: eQTLs
  • Identification of regulator-target gene pairs
    from
  • the set of genes physically located in an eQTL region
    (candidate regulators)
  • the set of genes affected by the eQTL region
  • Construction of an Encompassing Network (EN)
  • Search for a set of optimal networks within the EN

13
Genetical Genomics / Systems Genetics: Identification of expression QTLs
  • Identification of a DNA marker influencing an
    etrait via linear regression
  • $E_{ig} = \mu_g + b_g X_{ik} + e_{ig}$, with $X_{ik} = 0$ (AA) or $1$ (BB),
    where $E_{ig}$ is the expression of gene $g$ in individual $i$
    and $X_{ik}$ is the genotype of individual $i$ at marker $k$
  • $H_0\colon b_g = 0$
  • Standard approach: genome-wide search of each
    etrait separately
  • Mapping of principal components based on PCA of
    all genes or of subsets of genes obtained by cluster
    analysis
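A minimal sketch of the single-marker scan described on slide 13, fitting $E_{ig} = \mu_g + b_g X_{ik} + e_{ig}$ by ordinary least squares at each marker and returning the t statistic for $b_g$. It assumes 0/1 genotype coding (AA = 0, BB = 1) and simulated data; the function names are hypothetical. It also maps the first principal component of a hypothetical gene cluster, as in the PC-mapping bullet. Genome-wide significance thresholds (for example, by permutation) are left out.

```python
import numpy as np


def marker_t_stats(expr: np.ndarray, genotypes: np.ndarray) -> np.ndarray:
    """t statistic for b_g in E_ig = mu_g + b_g X_ik + e_ig, one marker k at a time.

    expr: (n,) etrait values; genotypes: (n, m) 0/1 codes (0 = AA, 1 = BB).
    """
    n, m = genotypes.shape
    t = np.empty(m)
    for k in range(m):
        design = np.column_stack([np.ones(n), genotypes[:, k]])
        coef, _, _, _ = np.linalg.lstsq(design, expr, rcond=None)
        resid = expr - design @ coef
        sigma2 = resid @ resid / (n - 2)                     # residual variance
        cov = sigma2 * np.linalg.inv(design.T @ design)      # covariance of (mu_g, b_g)
        t[k] = coef[1] / np.sqrt(cov[1, 1])
    return t


def first_pc_scores(expr_matrix: np.ndarray) -> np.ndarray:
    """Scores on the first principal component of an (n, genes) expression matrix."""
    centered = expr_matrix - expr_matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


rng = np.random.default_rng(1)
n, m = 100, 20
X = rng.integers(0, 2, size=(n, m))           # hypothetical 0/1 marker genotypes
E = 2.0 * X[:, 3] + rng.normal(size=n)        # simulated etrait driven by marker 3
print(int(np.argmax(np.abs(marker_t_stats(E, X)))))

# PC-mapping: treat the first PC of a (simulated) co-expressed cluster as one etrait.
cluster = np.column_stack([E + rng.normal(scale=0.5, size=n) for _ in range(5)])
pc1 = first_pc_scores(cluster)
print(int(np.argmax(np.abs(marker_t_stats(pc1, X)))))
```

The sign of a principal component is arbitrary, which is why the scan ranks markers by $|t|$ rather than by $t$.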

14
Genetical Genomics / Systems Genetics: Genetic Mapping of Expression QTLs
  • Cis- and trans-eQTL mapping
  • Cis-eQTL
  • eQTL affects the expression of a gene located at
    the eQTL; the eQTL is a DNA polymorphism in the
    promoter of the gene
  • Test only the marker(s) closest to the gene (not
    genome-wide)
  • Cis-trans regulation
  • Test the effects of any cis-eQTL on other genes:
    cis-eQTL → Gene A → Gene B
  • The cis-eQTL will then also affect the expression of gene B
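The cis and cis-trans tests differ from the genome-wide scan mainly in which marker is tested: only the marker nearest the gene. A minimal sketch under stated assumptions: the marker map, positions and effect sizes are hypothetical, and the regression t statistic is computed from the correlation as $t = r\sqrt{(n-2)/(1-r^2)}$, which is equivalent to the simple-regression slope test.

```python
import numpy as np


def nearest_marker(gene_chrom, gene_pos, marker_chrom, marker_pos):
    """Index of the marker on the gene's chromosome that is closest to the gene (cis test)."""
    same = np.flatnonzero(marker_chrom == gene_chrom)
    return int(same[np.argmin(np.abs(marker_pos[same] - gene_pos))])


def regression_t(x, y):
    """t statistic for the slope of y on x, computed as r * sqrt((n - 2) / (1 - r^2))."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt((len(x) - 2) / (1.0 - r**2))


# Hypothetical marker map (chromosome, position) and a gene A on chromosome 2 at 150.
marker_chrom = np.array([1, 1, 2, 2, 2])
marker_pos = np.array([10.0, 200.0, 50.0, 140.0, 300.0])
k = nearest_marker(2, 150.0, marker_chrom, marker_pos)

rng = np.random.default_rng(3)
n = 300
X = rng.integers(0, 2, size=(n, len(marker_pos))).astype(float)
gene_a = 1.5 * X[:, k] + rng.normal(size=n)    # simulated cis-eQTL effect on gene A
gene_b = 1.0 * gene_a + rng.normal(size=n)     # simulated regulation A -> B

t_cis = regression_t(X[:, k], gene_a)          # cis test: nearest marker only
t_cis_trans = regression_t(X[:, k], gene_b)    # cis-trans test: same marker, another gene
print(k, round(t_cis, 1), round(t_cis_trans, 1))
```

Because only one marker is tested per gene, the multiple-testing burden of cis-mapping is far smaller than that of a genome-wide scan.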

15
Genetical Genomics / Systems Genetics: Genetic Mapping of Expression QTLs
  • Cis- and trans-eQTL mapping
  • Trans-eQTL: A → B, with the trans-eQTL located in gene A
  • Coding region polymorphism in gene A, which
    regulates gene B
  • Test jointly the effect of the candidate regulator
    gene ($k = A$) and its nearest DNA marker on the
    expression of the target gene ($g = B$)
  • $E_{ig} = \mu_g + b_{1g} X_{ik} + b_{2g} E_{ik} \; (+\, b_{3g} E_{ik} X_{ik}) + e_{ig}$
  • $X_{ik} = 0/1$
  • Intersection-Union test to identify cases where
    $b_{1g}$ and $b_{2g}$ are both non-zero
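The joint trans test regresses the target's expression on the regulator's nearest marker and on the regulator's own expression, then asks whether both slopes are non-zero. An intersection-union test rejects only if every component test rejects, so its p-value is the largest component p-value. This is a minimal sketch with simulated data; it uses a large-sample normal approximation for the p-values (an assumption, not the slide's stated choice) and omits the optional interaction term, which could be added by appending `e_k * x_k` to the predictor list.

```python
import math

import numpy as np


def ols_t(y, predictors):
    """t statistics of the slopes in an OLS fit of y on an intercept plus the predictors."""
    n = len(y)
    design = np.column_stack([np.ones(n), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return coef[1:] / se[1:]


def p_two_sided_normal(t):
    """Two-sided p-value from a normal approximation to the t distribution (large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def iut_pvalue(p_values):
    """Intersection-union test: 'all effects non-zero' holds only if every test rejects."""
    return max(p_values)


rng = np.random.default_rng(11)
n = 200
x_k = rng.integers(0, 2, size=n).astype(float)   # marker nearest the candidate regulator k
e_k = rng.normal(size=n)                          # expression of regulator k (simulated)
e_g = 1.0 * x_k + 1.0 * e_k + rng.normal(scale=0.5, size=n)  # target g, both effects present

t1, t2 = ols_t(e_g, [x_k, e_k])                   # b_1g (marker) and b_2g (regulator expression)
p_iut = iut_pvalue([p_two_sided_normal(t1), p_two_sided_normal(t2)])
print(f"t1={t1:.1f}, t2={t2:.1f}, IUT p={p_iut:.2e}")
```

Taking the maximum p-value keeps the test at level α without any multiplicity correction across the two components, which is the usual appeal of the intersection-union construction.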

16
eQTL overlap for SPA, PC-mapping, Cis-mapping and
Trans-mapping
[Figure: overlap of the eQTLs detected by SPA, Cis-mapping, PC-mapping and Trans-mapping]
17
Genetical Genomics / Systems Genetics: Identification of regulator-target pairs
  • Regulators: genes physically located in the eQTL region
  • Targets: genes whose etraits are affected by the eQTL
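For a single eQTL, the candidate regulator-target pairs are every regulator in the eQTL region combined with every affected target. A minimal sketch, assuming genes are described by hypothetical (chromosome, position) coordinates and the eQTL by a (chromosome, start, end) interval; the names and coordinates are illustrative only. Each candidate would still have to pass the cis, cis-trans or trans tests on the following slides before it is retained.

```python
from itertools import product


def candidate_pairs(eqtl_region, gene_positions, affected_targets):
    """All (regulator, target) candidates for one eQTL.

    eqtl_region: (chromosome, start, end) of the eQTL region.
    gene_positions: {gene: (chromosome, position)}.
    affected_targets: genes whose etraits map to this eQTL.
    """
    chrom, start, end = eqtl_region
    regulators = sorted(
        gene for gene, (c, pos) in gene_positions.items()
        if c == chrom and start <= pos <= end
    )
    # A gene is not paired with itself; that case is cis regulation, not a gene-gene edge.
    return [(r, t) for r, t in product(regulators, sorted(affected_targets)) if r != t]


# Hypothetical genes and one eQTL on chromosome 4 spanning positions 90-200.
genes = {"G1": (4, 100), "G2": (4, 180), "G3": (7, 50), "G4": (4, 500)}
pairs = candidate_pairs((4, 90, 200), genes, {"G2", "G3"})
print(pairs)  # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
```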
18
Genetical Genomics / Systems Genetics: Identification of regulator-target pairs
  • Cis- versus cis-trans regulation

19
Genetical Genomics / Systems Genetics: Identification of regulator-target pairs
  • Cis-trans versus trans regulation

20
Genetical Genomics / Systems Genetics: Identification of regulator-target pairs
  • Trans regulation (is gene A a trans-regulator?)
  • Intersection-Union test for $b_{1D}$ and $b_{2D}$

21
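Regulator-target pairs - SPA
[Figure: counts of regulator-target pairs, SPA]
22
Regulator-target pairs - Cis mapping
[Figure: counts of regulator-target pairs, Cis mapping]
23
Regulator-target pairs - PC mapping
[Figure: counts of regulator-target pairs, PC mapping]
24
Regulator-target pairs - Trans-mapping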
25
Genetical Genomics / Systems Genetics: Encompassing (Directed) Network (EDN)
  • The EDN is obtained by combining all retained
    regulator-target pairs
  • cis-eQTL → target gene
  • cis-regulated gene → target gene (cis-trans
    regulation)
  • trans-eQTL → target gene
  • regulator gene → target gene (trans-regulation)
  • Next step: constrained network search within the
    EDN
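Combining the retained pairs amounts to taking the union of directed edges, so a pair supported by more than one test is stored once. A minimal sketch that builds the gene-gene part of such a network as an adjacency map and reports the kinds of summary counts given for the yeast data on the next slides (number of pairs, gene nodes, regulators, targets, top regulator and top target). The pair list is hypothetical, and eQTL → gene edges, which could be stored the same way, are left out.

```python
from collections import defaultdict


def build_edn(retained_pairs):
    """Union of retained (regulator, target) pairs as a directed adjacency map."""
    edn = defaultdict(set)
    for regulator, target in retained_pairs:
        edn[regulator].add(target)  # a pair found by several tests is stored once
    return edn


def summarize_edn(edn):
    """Summary counts of a non-empty network (gene-gene edges only)."""
    in_degree = defaultdict(int)
    for targets in edn.values():
        for t in targets:
            in_degree[t] += 1
    regulators = set(edn)
    targets = set(in_degree)
    top_regulator = max(edn, key=lambda r: len(edn[r]))
    top_target = max(in_degree, key=in_degree.get)
    return {
        "pairs": sum(len(ts) for ts in edn.values()),
        "gene_nodes": len(regulators | targets),
        "regulators": len(regulators),
        "targets": len(targets),
        "top_regulator": (top_regulator, len(edn[top_regulator])),
        "top_target": (top_target, in_degree[top_target]),
    }


# Hypothetical retained pairs from the cis-trans and trans tests (R1 -> T1 found twice).
retained = [("R1", "T1"), ("R1", "T2"), ("R2", "T2"), ("R1", "T1")]
summary = summarize_edn(build_edn(retained))
print(summary)
```

A gene can appear both as a regulator and as a target, which is why the node count is taken over the union of the two sets rather than their sum.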

26
Yeast Data Encompassing Network
  • Yeast data of Brem and Kruglyak (2005)
  • 112 haploid offspring of a cross between a wild
    and a laboratory strain of yeast
  • Expression-profiled for 6000 genes
  • DNA marker genotyped for 3000 markers
  • Used 4589 etraits and 2956 markers
  • Encompassing network
  • 28,609 regulator-target pairs
  • 4,274 gene nodes
  • 2118 gene regulators, 4116 gene targets

27
Yeast Data Encompassing Network
  • Encompassing network
  • Regulator with most targets (PHM7): 468
  • Target with most regulators (YLR152C): 32
  • Confirmed regulators
  • Amn1: top cis-trans regulator with 408 targets
  • MAK5: 110 trans targets
  • GPA1: 60 targets (half trans)
  • Heme-dependent transcription factor HAP1: 141
    targets (100 cis-trans)

28
Genetical Genomics / Systems Genetics: Gene network reconstruction
  • Popular tool: Bayesian Network (BN)
  • Represents conditional independence: A → B → C
    (A and C are independent given B)
  • Suitable for noisy data
  • Search among equivalence classes (equivalent
    models cannot be distinguished based on the available
    data): A - B stands for A → B or A ← B
  • Limited to DAGs (directed acyclic graphs): no
    cycles or feedback loops
  • Time dimension: A1 → B2 → C3 → A4 → ... (dynamic BN)
  • Usually, the expression data are discretized
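Why a BN search works with equivalence classes can be seen with the standard criterion for DAGs: two DAGs are Markov equivalent exactly when they have the same skeleton (undirected edges) and the same v-structures (unshielded colliders a → c ← b with a and b non-adjacent). A minimal sketch on the three-gene example; the helper names are hypothetical, and the criterion applies to DAGs only, not to the cyclic graphs handled by SEM on the next slide.

```python
def skeleton(edges):
    """Undirected version of a set of directed edges."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Unshielded colliders a -> c <- b in which a and b are not adjacent."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    found = set()
    for child, pa in parents.items():
        for a in pa:
            for b in pa:
                if a < b and frozenset((a, b)) not in skel:
                    found.add((a, child, b))
    return found


def markov_equivalent(dag1, dag2):
    """DAGs are Markov equivalent iff they share the skeleton and the v-structures."""
    return skeleton(dag1) == skeleton(dag2) and v_structures(dag1) == v_structures(dag2)


chain = [("A", "B"), ("B", "C")]       # A -> B -> C
reverse = [("C", "B"), ("B", "A")]     # A <- B <- C
fork = [("B", "A"), ("B", "C")]        # A <- B -> C
collider = [("A", "B"), ("C", "B")]    # A -> B <- C

print(markov_equivalent(chain, reverse), markov_equivalent(chain, fork),
      markov_equivalent(chain, collider))  # True True False
```

The chain, its reverse and the fork all encode "A and C independent given B", so observational data alone cannot orient their edges; only the collider is distinguishable.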

29
Genetical Genomics / Systems Genetics: Gene network reconstruction
  • Our tool: Structural Equation Modeling (SEM)
  • Represents conditional independence: A → B → C
  • Uses continuous expression data with a normality
    assumption (robustness?)
  • Suitable for noisy data
  • Can model DCGs (directed cyclic graphs)
  • Search among models, not equivalence classes
  • Edge directions are fixed
  • Between two equivalent models with different
    numbers of edges (possible in DCGs), we prefer the
    sparser one
  • No efficient algorithm for identification of
    equivalence classes

30
Genetical Genomics / Systems Genetics: Gene network reconstruction
  • Structural Equation Modeling (SEM)

[Figure: SEM path diagram relating regulator gene A, regulator gene C, target gene B, eQTL 1 and eQTL 2; labeled path coefficients include b_BA, b_BC, f_B1, f_C2, h_BA1 and k_B12 (y_n: expression data; x_n: eQTL genotype codes)]
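In standard SEM notation, a path diagram like this one corresponds to $y = B y + F x + e$, where $y$ holds expression data, $x$ eQTL genotype codes, $B$ gene → gene coefficients (such as $b_{BA}$) and $F$ eQTL → gene coefficients (such as $f_{B1}$); the system has the reduced form $y = (I - B)^{-1}(F x + e)$ when $I - B$ is invertible, which allows cycles. The transcript does not say how the coefficients are estimated, so as an explicit assumption this sketch swaps in two-stage least squares (2SLS): each gene's own eQTL, excluded from the other gene's equation, serves as the instrument that identifies a two-gene cycle A ⇄ B. All coefficient values are hypothetical, and only point estimates are used, since naive second-stage standard errors are not valid.

```python
import numpy as np


def two_stage_ls(y_dep, y_endog, x_own, x_instr):
    """2SLS estimate of (b, f) in y_dep = c + b * y_endog + f * x_own + e,
    using x_instr (an eQTL excluded from this equation) as instrument for y_endog."""
    ones = np.ones(len(y_dep))
    Z = np.column_stack([ones, x_own, x_instr])          # all exogenous variables
    stage1, *_ = np.linalg.lstsq(Z, y_endog, rcond=None)
    y_hat = Z @ stage1                                    # stage 1: fitted regulator expression
    W = np.column_stack([ones, y_hat, x_own])
    coef, *_ = np.linalg.lstsq(W, y_dep, rcond=None)     # stage 2: structural equation
    return coef[1], coef[2]


# Hypothetical two-gene cycle A <-> B, each gene with its own cis-eQTL:
#   y_A = b_AB * y_B + f_A * x_A + e_A,   y_B = b_BA * y_A + f_B * x_B + e_B
rng = np.random.default_rng(7)
n = 5000
b_AB, b_BA, f_A, f_B = 0.4, 0.6, 1.0, 1.0
B = np.array([[0.0, b_AB], [b_BA, 0.0]])      # gene -> gene coefficients
F = np.diag([f_A, f_B])                        # eQTL -> gene coefficients
x = rng.integers(0, 2, size=(n, 2)).astype(float)
e = rng.normal(scale=0.5, size=(n, 2))
# Reduced form y = (I - B)^{-1} (F x + e); requires I - B to be invertible.
y = np.linalg.solve(np.eye(2) - B, (x @ F.T + e).T).T

b_ab_hat, f_a_hat = two_stage_ls(y[:, 0], y[:, 1], x[:, 0], x[:, 1])
b_ba_hat, f_b_hat = two_stage_ls(y[:, 1], y[:, 0], x[:, 1], x[:, 0])
print(round(b_ab_hat, 2), round(b_ba_hat, 2), round(f_a_hat, 2), round(f_b_hat, 2))
```

Ordinary least squares on each equation would be biased here, because in a cycle y_B depends on e_A through y_A; the eQTL instrument avoids that dependence.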
31
RESULTS
  • SEM is widely used in econometrics, sociology and
    psychology (confirmatory, not exploratory)
  • Typically applied to at most tens of variables
  • Available in many software packages (Lisrel, Mx,
    ...) but not feasible for genome-size data
  • Our own C-based implementation can handle hundreds
    of genes (in the Genetical Genomics context)
  • SEM applied to a sub-network of the Yeast Encompassing
    Network
  • 265 genes, 241 eQTL, 832 gene-gene edges, 640
    eQTL-gene edges, cycle with 168 genes
  • Sparsified network with 475 gene-gene edges and
    468 eQTL-gene edges

32
Yeast network
EDN: 265 genes, 241 QTLs, 832 gene → gene edges
and 640 QTL → gene edges. After sparsification:
475 gene → gene edges and 468 QTL → gene edges
33
RESULTS
  • SEM applied to artificial gene expression data
    (known network structure)
  • 10 data sets with different, random network
    topologies, 100 genes, 100 eQTLs
  • mRNA levels simulated with non-linear ODEs
    (Gepasi)
  • On average 148 gene-gene and 123 eQTL-gene edges
  • EN with 360 gene-gene and 301 eQTL-gene edges
  • On average 42 genes in cycles (1-3)
  • False Discovery Rate (number of wrongly identified
    edges / total number of identified edges): 0-15%
  • Power (number of edges correctly inferred / total
    number of edges in the true network): 72-100%
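The two accuracy measures on this slide reduce to set operations on edge lists. A minimal sketch, assuming edges are directed (source, target) tuples, so an edge inferred in the reverse direction counts as wrong; the example networks are hypothetical.

```python
def edge_metrics(true_edges, inferred_edges):
    """FDR = wrong inferred edges / all inferred edges; power = correct edges / true edges."""
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    correct = len(true_set & inferred_set)
    fdr = (len(inferred_set) - correct) / len(inferred_set) if inferred_set else 0.0
    power = correct / len(true_set) if true_set else float("nan")
    return fdr, power


# Hypothetical networks: gene-gene and eQTL-gene edges mixed in one set.
true_net = {("A", "B"), ("B", "C"), ("C", "A"), ("Q1", "A")}
inferred = {("A", "B"), ("B", "C"), ("C", "B")}   # ("C", "B") has the wrong direction
fdr, power = edge_metrics(true_net, inferred)
print(f"FDR = {fdr:.0%}, power = {power:.0%}")  # FDR = 33%, power = 50%
```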

34
Systems Genetics
  • The merger of Systems Biology with the study of
    genetic variation
  • The integration and anchoring of
    multi-dimensional data types (transcriptomic,
    phenomic, metabolomic) to underlying genetic
    variation

35
Systems Genetics
  • Using a segregating population, we can
  • Reconstruct a network of direct relationships
    among various phenotypic measurements related to
    a disease complex
  • Reconstruct a causal gene regulatory network
    (with DNA marker and expression profiling)
  • Combine the above into a causal network of genes
    and disease phenotypes
  • Determine the extent of genetic control of
    metabolomic variation (with additional
    metabolomic profiling) and transcriptional
    control of metabolic reactions

36
Systems Genetics
  • Using a segregating population, we can
  • Reconstruct a network of direct relationships
    among various phenotypic measurements related to
    a disease complex
  • e.g., Cardiovascular System
  • Bone Fragility (morphology, mineral content,
    mechanical properties, body composition)

37
Systems Genetics
  • Our prospects for investigating the complex
    interactions between gene variants, disease, and
    the environment will be significantly improved
  • Our understanding of the gene and metabolic
    regulatory circuitry and its relationship with
    disease phenotypes will be greatly enhanced

38
Questions / Comments
39
Gene networks from observational data
  • Undirected Dependency Graph (UDG)
  • Observational data: tumor and normal samples
    with measured gene expression for all genes
  • Method A
  • Start with a network that has an edge between any
    pair of genes that are significantly correlated
    (many of these interactions are indirect, through
    other genes)
  • For each pair of genes, compute Corr(G1, G2 | Gk),
    where k denotes any gene other than G1 and G2;
    retain only those edges whose 1st-order partial
    correlation is significant for every such k
  • For each pair of genes, compute Corr(G1, G2 | Gk,
    Gm), where k and m are any two genes other than
    G1 and G2; retain only those edges whose 2nd-order
    partial correlation is significant for every such k, m
  • Etc.
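Method A can be sketched as order-by-order pruning: an edge survives an order only if every partial correlation of that order is significant, and only surviving edges are tested at the next order. This sketch assumes a Fisher z-test with $|z| > 1.96$ and no multiple-testing correction (choices not stated on the slide), computes each partial correlation from the inverse of a correlation sub-matrix, and runs on a hypothetical population correlation matrix for a chain G0 - G1 - G2 with n = 100 samples.

```python
import math
from itertools import combinations

import numpy as np


def partial_corr(R, i, j, cond):
    """Corr(G_i, G_j | G_cond) from a correlation matrix, via the inverse of its sub-matrix."""
    idx = [i, j, *cond]
    P = np.linalg.inv(R[np.ix_(idx, idx)])
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])


def prune_udg(R, n, max_order=2, z_crit=1.96):
    """Drop an edge as soon as one partial correlation of the current order is not
    significant (Fisher z-test); surviving edges are tested again at the next order."""
    p = R.shape[0]
    edges = set(combinations(range(p), 2))
    for order in range(max_order + 1):
        kept = set()
        for i, j in edges:
            others = [k for k in range(p) if k not in (i, j)]
            significant = True
            for cond in combinations(others, order):   # order 0: one empty set = simple corr
                r = partial_corr(R, i, j, cond)
                z = math.sqrt(n - order - 3) * math.atanh(r)
                if abs(z) <= z_crit:
                    significant = False
                    break
            if significant:
                kept.add((i, j))
        edges = kept
    return edges


# Hypothetical chain G0 - G1 - G2 given as a population correlation matrix.
R = np.array([[1.0, 0.7, 0.42],
              [0.7, 1.0, 0.6],
              [0.42, 0.6, 1.0]])
print(sorted(prune_udg(R, n=100)))  # [(0, 1), (1, 2)]
```

The indirect edge G0 - G2 is significant at order 0 but disappears at order 1, because conditioning on G1 drives its partial correlation to zero.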

40
Gene networks from observational data
  • Undirected Dependency Graph (UDG)
  • Method B
  • Estimate the covariance matrix of all genes and
    invert this matrix.
  • With many more genes than observations, the sample
    covariance matrix does not have an inverse
  • Use a special shrinkage estimator
  • The inverse yields the matrix of partial correlations
    Corr(G1, G2 | all other genes)
  • Begin with a network having edges between any
    genes with significant partial correlation
  • For each pair of genes, compute Corr(G1, G2);
    retain only those edges with significant simple
    correlation
  • For each pair of genes, compute Corr(G1, G2 | Gk);
    retain only those edges with significant 1st-order
    partial correlation
  • For each pair of genes, compute Corr(G1, G2 | Gk,
    Gm); retain only those edges with significant 2nd-order
    partial correlation
  • Etc.
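The first steps of Method B (shrink, invert, read off full-order partial correlations) can be sketched as follows. The slide does not name its shrinkage estimator, so this sketch assumes a simple linear shrinkage of the sample covariance toward its diagonal with a fixed, hypothetical λ = 0.2; data-driven choices of λ exist. Partial correlations are taken from the precision matrix $P$ as $-P_{ij}/\sqrt{P_{ii}P_{jj}}$. The cutoff used to form the starting network is also hypothetical; an actual analysis would use a significance test.

```python
import numpy as np


def shrinkage_partial_corr(data, lam=0.2):
    """data: (n observations, p genes). Returns the p x p matrix Corr(G_i, G_j | all others)."""
    S = np.cov(data, rowvar=False)                             # singular when p > n - 1
    S_shrunk = (1.0 - lam) * S + lam * np.diag(np.diag(S))    # pull off-diagonals toward 0
    P = np.linalg.inv(S_shrunk)                                # precision matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def initial_edges(pcor, cutoff):
    """Starting network: gene pairs whose |partial correlation| exceeds a (hypothetical) cutoff."""
    p = pcor.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(pcor[i, j]) > cutoff}


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 50))                          # hypothetical: 10 samples, 50 genes
print(np.linalg.matrix_rank(np.cov(data, rowvar=False)))  # at most 9, so S cannot be inverted
pcor = shrinkage_partial_corr(data)
edges = initial_edges(pcor, cutoff=0.3)
```

Any λ > 0 makes the shrunk matrix positive definite (a convex combination of a positive semi-definite matrix and a positive diagonal), so the inverse always exists; the surviving edges would then enter the order-by-order pruning listed above.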