Computational Analysis of Genome-Wide Gene Expression: From Genes to Networks to Evolution

1
Computational Analysis of Genome-Wide Gene
Expression: From Genes to Networks to Evolution
  • Junhyong Kim
  • University of Pennsylvania
  • Department of Biology
  • Penn Center for Bioinformatics

2
DNA makes RNA makes Proteins
3
The organism/cell is a dynamical system
  • Autonomous Dynamics
  • Development, Cell-cycle
  • Environmental Interaction
  • Metabolism
  • Behavior
  • Phenotypic Plasticity
  • Cell fate: survival, replication,
    differentiation, cell death

4
Control of Gene Expression
5
(No Transcript)
6
(No Transcript)
7
Expression Microarray/DNA Chips
Probes
Target
Oligonucleotide cDNA
8
Microarray Measurement of Genome-wide Gene
Expression
Technologies differ on:
  • Type of probe (length, multiplicity, gene content)
  • How the probe is synthesized (independently, directly on the
    media)
  • Substrate for the probe
  • Detection mechanism (fluorescence, direct)
Target: pool of nucleotide strings (mRNA, mRNA copies, genomic DNA)
Probes: whole gene copies (cDNA), segments of the gene (amplicons),
long oligomers (50-70 bp), short oligomers (14-25 bp)
Target capture by the probes is in proportion to target
availability
9
Sources of variability in microarray measurements
  • Biological
  • Inherent noise
  • Expression kinetics
  • Physiological noise
  • Developmental noise
  • Measurement problems
  • Tissue heterogeneity
  • Temporal heterogeneity
  • Transcript extraction and preparation
  • Instrumentation
  • Inherent noise
  • Probe arraying (printing)
  • Target/probe labeling
  • Detector noise
  • Measurement Bias
  • Probe/target composition effects (GC, gene
    family, secondary structure, genomic location)
  • Differential labeling
  • Background signal, contamination, edge effects,
    focus
  • Optics

10
Computational Analysis in Microarray Experiment
Here Lie Pipe Dreams
  • Biological Inference: differential expression, classification,
    pathway reconstruction, other
  • Array Design: composition, Grey coding, secondary structure, etc.
  • Image Analysis: alignment, foreground selection, noise
  • Quantification: bias correction, signal definition, signal
    aggregation
11
Transcriptome Analysis
  • Classification (e.g., Cancer types)
  • Lots of characters available where there used to
    be few
  • Targeted Experiments
  • Gene cloning: lots of prior knowledge
  • Systems Analysis
  • Want the logical understanding of gene action at
    the whole system level

12
Lac Operon
Catabolite activator protein (CAP) / cAMP receptor
protein (CRP)
cAMP: low when glucose is high, high when glucose is low
13
Logic of Lac Operon
[Figure: transcription (low to high) plotted against cAMP (low to high). Labels: lac repressor (ON/OFF switch), rheostat.]
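The switch-and-rheostat logic on this slide can be illustrated with a toy function. This is only a sketch under stated assumptions: the function name, the `basal` parameter, the linear response to cAMP, and the 0-to-1 scaling are all hypothetical and are not taken from the presentation. It assumes the repressor is released when lactose is present and that cAMP (via CAP) then tunes the transcription level.

```python
def lac_transcription(lactose_present: bool, camp_level: float,
                      basal: float = 0.05) -> float:
    """Toy model of lac operon output as a relative rate in [0, 1].

    lactose_present: if False, the lac repressor stays bound (switch OFF).
    camp_level: relative cAMP level in [0, 1]; high when glucose is low.
    basal: hypothetical leak rate when the switch is ON but cAMP is absent.
    """
    if not 0.0 <= camp_level <= 1.0:
        raise ValueError("camp_level must be between 0 and 1")
    if not lactose_present:
        # Repressor bound: the ON/OFF switch is OFF regardless of cAMP.
        return 0.0
    # Switch is ON: cAMP acts like a rheostat that scales the output.
    return basal + (1.0 - basal) * camp_level


for lactose in (False, True):
    for camp in (0.0, 0.5, 1.0):
        print(lactose, camp, round(lac_transcription(lactose, camp), 3))
```

The point of the sketch is the separation of roles: one input gates transcription entirely, the other only modulates its level.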
14
Genes, Pathways, Phenotypes
How is the Gene Circuitry Connected and how does
it relate to the phenotype?
15
Dang hard and not enough data
16
Holy Grail: de novo deduction of molecular
interactions
Cast of characters:
  • Receptor tyrosine kinase
  • GRB2: growth factor receptor-bound protein 2
  • SOS: Son of Sevenless
  • Ras
  • GAP: GTPase-activating protein
  • Raf
  • MEK: MAP kinase/ERK kinase
  • ERK: extracellular signal-regulated kinase
  • Elk-1: Ets-like protein 1
17
Some baby step problems
  • What is the best way to visualize gene expression
    data?
  • How do we know if the expression level of a gene
    is functionally important?
  • What is the genetic nature of expression data and
    how does it evolve?
  • Can we meaningfully deduce some of the gene
    interactions from the expression data?
  • Give up the idea of deducing circuitry?

18
Visualizing Dynamically Complex High-Dimensional
Gene Expression Data
Rifkin and Kim, Bioinformatics 2002
19
Problem: Microarray Data is Noisy, Short, and
High Dimensional
20
Project onto a line (vector) such that the
temporal variation is best captured
[Figure labels: Gene 1, Gene 2, Gene 3; Expression Level; Time]
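One way to make "project onto a line such that the temporal variation is best captured" concrete is to take the direction of largest variance across time points (the first principal component). The slide does not say which criterion the authors used, so the sketch below is an assumption, and the function name `best_projection` is hypothetical.

```python
import numpy as np


def best_projection(expr):
    """Project a (time points x genes) matrix onto one direction.

    Returns the unit vector in gene space along which the centered
    time series varies the most, and the projected value at each
    time point.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=0)        # remove each gene's mean
    # Rows of vt are orthonormal directions in gene space, ordered by
    # how much of the variation across time points they capture.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    scores = centered @ direction              # one value per time point
    return direction, scores


# Tiny example: only the first of three genes changes over time.
t = np.arange(5.0)
expr = np.column_stack([t, np.ones(5), 2 * np.ones(5)])
direction, scores = best_projection(expr)
print(np.round(np.abs(direction), 3))
```

The sign of the direction is arbitrary, which is why the example prints absolute values.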
21
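Dynamical reconstruction from single-valued
measurements (Takens, 1981; Packard et al., 1980)
[Figure: the time series E(t) plotted against t, and the same series re-plotted in delay coordinates: E(t) together with copies of itself at lags d and 2d]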
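A delay-coordinate reconstruction of this kind can be sketched in a few lines. The function name `delay_embed` and its parameters are hypothetical, and the sketch uses forward lags (t, t + d, t + 2d); using backward lags would trace out the same set of points.

```python
import numpy as np


def delay_embed(series, dim=3, delay=1):
    """Build delay vectors from a single measured time series.

    Row i of the result is
    (E[i], E[i + delay], ..., E[i + (dim - 1) * delay]).
    """
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and delay")
    columns = [series[k * delay: k * delay + n_vectors] for k in range(dim)]
    return np.column_stack(columns)


# A noiseless oscillation traces a closed loop in delay coordinates.
t = np.linspace(0, 4 * np.pi, 40)
embedded = delay_embed(np.sin(t), dim=3, delay=5)
print(embedded.shape)
```

Each row is one point of the reconstructed trajectory, so plotting the columns against each other gives the kind of picture the slide shows.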
22
Data from Spellman et al. 1998
23
Human HeLa Cell Line Time Series, Whitfield et
al. 2002
24
Evolution of Gene Expression and Using
Comparative Methods to Assess Functional
Significance
  • Rifkin, Kim, and White, Nat Genet, 2003

25
Thousands of genes being expressed, are they all
functionally active? What is the structure of
their activity? Is gene expression under
selection?
26
Goal: Use comparative data to decompose the gene
expression into functional categories
  • (analogous to sequence analysis)

27
Comparing Species A and Species B
Sequence: ACGTTACGGTAC vs. AGGTTACGGAAA
Gene Expression: 1.24 vs. 1.33
28
Gene Expression is a Polygenic Quantitative
Phenotypic Functional Trait
Holstege et al. 1998
29
Brownian Motion Model of Mutation-Drift
(Felsenstein, 1973; Lande, 1979; Lynch & Hill, 1986;
Hansen & Martins, 1996)
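Under a Brownian motion model with no selection, the variance of a trait among independent lineages grows linearly with time. The simulation below is a minimal sketch of that property only. The function name, the per-generation variance `step_var`, and all numbers are illustrative assumptions, not values from the presentation.

```python
import numpy as np


def simulate_brownian(n_lineages, n_generations, step_var, rng):
    """Simulate independent lineages drifting by Brownian motion.

    Each generation adds a Normal(0, step_var) change to the trait
    (for example, log expression level). Returns an array of shape
    (n_lineages, n_generations) holding the trait value after each
    generation, starting from a common ancestor at 0.
    """
    steps = rng.normal(0.0, np.sqrt(step_var),
                       size=(n_lineages, n_generations))
    return steps.cumsum(axis=1)


rng = np.random.default_rng(0)
paths = simulate_brownian(n_lineages=20000, n_generations=100,
                          step_var=0.01, rng=rng)
# Expected among-lineage variance after t generations is step_var * t.
for t in (25, 50, 100):
    print(t, round(float(paths[:, t - 1].var()), 3), "expected", 0.01 * t)
```

The linear growth is what makes the model usable as a neutral baseline: observed divergence can be compared with what mutation and drift alone would produce over the same time.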
30
Need to Know: for each gene, how much mutational
change accumulates in gene expression under no
selection
31
Experimental Design
  • Single genetic line
  • Accumulate mutations with small population size
    to minimize selection
  • Measure gene expression
D. melanogaster: 230 generations of mutation
accumulation (4 mating pairs per line), from D.
Houle (FSU); 6 lines assayed at the larvae-pupae
transition for 13,076 genes
32
Mutation Accumulation Line Results
  • 1080 genes with significant mutational variation
    at p < 0.05, 536 genes at p < 0.01
  • Mutational heritability for the 536 significant
    genes: average 0.784, range (0.417, 5.77)
  • These values are generally higher than for other
    phenotypic measurements (Clark et al., 1995:
    0.08-0.315; Lynch, 1988: 0.001; Mackay et al.,
    1992: 0.03)
  • No detectable pattern in cytological position

33
Map Position in the Drosophila Genome for Genes
Showing Significant Mutational Variance
34
Comparative Gene Expression in Drosophila
[Figure: phylogeny of the lines compared: four D. melanogaster strains (CS, OR, Samarkand, Netherlands 2), D. simulans, and D. yakuba. Divergence-time labels: 4-5 mya?, 2-3 mya, 20,000 ya]
35
Experimental Setup
  • Measure expression contrast across the late
    larvae/pupae transition
  • Four strains of D. melanogaster, plus D. simulans
    and D. yakuba
  • All measured on D. melanogaster arrays with
    13,000 genes
  • 4-6 replicates with dye swap design

36
Compare Expression Difference Between
Larvae/Pupae Transition
data from Arbeitman et al., 2002 (in press)
37
Highlights
  • 3000 genes change developmentally for each
    strain/species
  • 50% of the genome changes developmentally in at
    least one lineage
  • A common set of 1000 genes change
    developmentally in all strain/spp
  • 3,600 genes show evolutionary change across the
    six lines

38
Brownian Motion Model of Mutation-Drift
(Felsenstein, 1973; Lande, 1979; Lynch & Hill, 1986;
Hansen & Martins, 1996)
39
all genes: 13061
  • changing developmentally in some lineage: 6618
      • stabilizing selection across the clade: 4565
      • not stabilized: 2053
          • stabilizing selection within D. mel., implying
            directional selection between species: 1556
          • polymorphic within melanogaster: 497
              • not neutral, implying lineage-specific
                directional selection: 209
              • neutral: 288
  • not changing: 6443
      • variable: 469
      • not changing: 5974
40
Even when a gene's expression differs
(significantly) across some treatment, it may not
be functionally important
all genes: 13061
  • changing developmentally in some lineage: 6618
      • stabilizing selection across the clade: 4565
      • not stabilized: 2053
          • stabilizing selection within D. mel., implying
            directional selection between species: 1556
  • not changing: 6443
      • variable: 469
      • not changing: 5974
41
Summary of Drosophila Gene Expression Evolution
  • Gene expression evolves as a polygenic
    quantitative trait with mutational variability
    comparable to or higher than standard phenotypic
    traits
  • Not all significantly different expression can be
    interpreted as functionally important
  • There seems to be more functional conservation
    for turning on a gene than for turning off
    the gene

42
Deducing Gene Regulatory Networks
43
Estimating Networks Using FOCI
  • FOCI: First Order Conditional Independence
  • Related to Graphical Models and Bayesian Networks
  • Based on conditional independence relationships
  • Start with covariances or correlations between
    variables of interest

44
Why Conditional Independence?
Gene X
Gene A
Gene Y
The expression of two genes may seem related when
the relationship is actually due to the influence
of a third gene. Thus, Gene X and Gene Y are
conditionally independent given Gene A.
(We limit ourselves to first order because of
data density)
45
What is FOCI?
  • Two variables, X and Y, are first order
    conditionally independent if there is at least
    one other variable in the analysis, Z, for which
    (X ⊥ Y | Z)
  • Ignores higher moment conditional interactions
    (as used in Graphical Modeling or Bayesian
    Networks)
  • However, it is very fast to compute and so can be
    employed to study very large problems

46
Building a Network Using First Order Conditional
Independence (FOCI)
  • Start with a saturated network (all edges connected)
  • Calculate conditional interactions
  • Remove edges between pairs of variables which are
    conditionally independent
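The procedure on this slide can be sketched with first-order partial correlations. This is not the authors' implementation: the function name `foci_network`, the fixed `threshold` used in place of a significance test, and the extra check for marginal (zero-order) independence are all assumptions made for illustration.

```python
import numpy as np


def foci_network(data, threshold=0.1):
    """Sketch of a first-order conditional independence network.

    data: array of shape (samples, variables).
    Returns a boolean adjacency matrix. An edge x-y is removed if the
    correlation between x and y is near zero, or if it becomes near
    zero after conditioning on any single third variable z.
    """
    r = np.corrcoef(data, rowvar=False)
    n = r.shape[0]
    adj = ~np.eye(n, dtype=bool)              # saturated network
    for x in range(n):
        for y in range(x + 1, n):
            # Assumption: also drop marginally uncorrelated pairs.
            independent = abs(r[x, y]) < threshold
            for z in range(n):
                if independent or z in (x, y):
                    continue
                denom = np.sqrt((1 - r[x, z] ** 2) * (1 - r[y, z] ** 2))
                if denom == 0:
                    continue                  # z duplicates x or y
                # First-order partial correlation of x and y given z.
                partial = (r[x, y] - r[x, z] * r[y, z]) / denom
                independent = abs(partial) < threshold
            if independent:
                adj[x, y] = adj[y, x] = False
    return adj


# Gene A drives both X and Y, so the X-Y edge should disappear.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
x = a + 0.5 * rng.normal(size=5000)
y = a + 0.5 * rng.normal(size=5000)
print(foci_network(np.column_stack([a, x, y])).astype(int))
```

Because only pairs and triples of variables are examined, the cost grows roughly with the cube of the number of variables, which is what keeps the approach practical for large gene sets.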
47
FOCI networks reveal patterns of functional and
regulatory interaction
48
Example Subnetwork Mating Response
49
Ideal Case
Davidson et al. 2002
50
Not enough data!
600 million transistors
millions of regulatory elements
51
Give up the idea of circuitry!
  • Organizational Principles can be understood at an
    abstract level
  • Computation
  • CPU Engineering
  • Statistical Mechanics

52
Abstraction
Universal Turing Machine
53
Architecture
54
Architectural Problems in Gene Regulation
  • What is the connectivity of genes in terms of
    their expression regulation?
  • Can we partition the genes into functionally
    modular components?
  • What are the dynamical and functional
    relationships between such modules?

55
Yeast Regulatory Network Statistics
56
Rosetta Compendium Data Set (274 gene knockouts
in Yeast)
affecting change in value > 2 S.E. over
control variability
57
Modularity of the Yeast Genetic Network
Clustering Based on Topological Overlap among
Vertices
Sorted Adjacency Matrix
Sorted Correlation Matrix
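The slide does not give the formula used for topological overlap. One commonly used definition for an unweighted network counts the neighbors two vertices share, adds one if they are directly linked, and normalizes by the smaller of the two degrees. The sketch below assumes that definition; the function name is hypothetical.

```python
import numpy as np


def topological_overlap(adj):
    """Topological overlap for an undirected, unweighted network.

    adj: symmetric 0/1 adjacency matrix.
    overlap[i, j] = (shared_neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    where k is the vertex degree and a_ij is 1 if i and j are linked.
    """
    a = np.array(adj, dtype=float)            # copy so input is untouched
    np.fill_diagonal(a, 0.0)
    degree = a.sum(axis=1)
    shared = a @ a                            # shared[i, j] = common neighbors
    min_degree = np.minimum.outer(degree, degree)
    overlap = (shared + a) / (min_degree + 1.0 - a)
    np.fill_diagonal(overlap, 1.0)
    return overlap


# Triangle 0-1-2 with an extra vertex 3 attached to vertex 2.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(np.round(topological_overlap(adj), 2))
```

Vertices with high mutual overlap can then be grouped by any standard clustering method, and sorting the adjacency matrix by the resulting order makes modules appear as blocks along the diagonal.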
58
Algorithmically Extracted Modules
59
Positive vs. Negative Interactions in the Yeast
FOCI Network
60
Contingency Table: Direction of Marginal Correlation with Respect to Clusters
(observed counts, with expected counts in parentheses; totals with percent of grand total)

Direction of Marginal Correlation   Within Clusters   Between Clusters   Row Total
Positive                            2449 (2349.5)     640 (739.5)        3089 (90.3%)
Negative                            153 (252.5)       179 (79.5)         332 (9.7%)
Column Total                        2602 (76%)        819 (24%)          Grand Total 3421

Chi-square: 179.61, 1 df, p < 0.0001
Conclusion: Negative interactions preferentially
occur between modules
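The test on this slide can be recomputed from the four observed counts. The sketch below uses only the standard library. The function name is hypothetical, and the use of Yates' continuity correction is an assumption: the slide does not say which variant was used, but the corrected statistic appears to be the one that matches the reported value.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]] observed counts.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)   # continuity correction
            chi2 += diff ** 2 / expected
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: positive, negative. Columns: within clusters, between clusters.
chi2, p = chi_square_2x2([[2449, 640], [153, 179]])
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

The largest contribution comes from the negative, between-cluster cell (179 observed against about 79.5 expected), which is what the slide's conclusion rests on.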
61
Network Responses to Selection
Expression higher in evolved strains
Expression higher in parental strain
Data from Ferea et al. 1999
62
Towards Comprehensive Analysis of Gene Expression
Dynamics
  • Measure Fundamental Parameters (e.g., Mutational
    Variance-Covariance, RNA decay constants)
  • More external sources of information (e.g.,
    literature, direct interaction experiments)
  • More Data (at least on the order of number of
    genes)

63
All natural science progresses from asking "What
is it?" to asking "What are its organizational
principles?" Biology has reached the information
density where this transition is taking place. A
key part of this transition is asking different
kinds of questions, separate from traditional
questions.
64
Thanks to...
  • Scott Rifkin
  • Paul Magwene
  • Yu Sun
  • Sheng Guo
  • Kevin White (Yale)

65
Rosetta Compendium Study for Yeast
Transcriptome (Hughes et al., Cell 2000)
  • Nearly Isogenic Background, Identical Growth
    Conditions
  • 276 deletions of known and unknown genes
    (Mutational variance)
  • 24 perturbations with compounds (Environmental
    variance)
  • 63 control lines (Control variance)

66
Dimension Estimation
Noise (3D)
Model (2D)
Data (3D)
67
Effective Dimensions of Rosetta Mutational
Variation
Max noise
17 dimensions out of 275 degrees of freedom
68
Expression is a very integrated trait
affecting change in value > 2 S.E. over
control variability