Title: Two examples
1 Liquid Association (LA)
2 Liquid Association (LA)
- LA is a generalized notion of association for describing a certain kind of ternary relationship between variables in a system (Li 2002, PNAS).
- Green points represent four conditions for cellular state 1.
- Red points represent four conditions for cellular state 2.
- Blue points represent the transit state between cellular states 1 and 2.
- (X,Y) forms an LA pair.
Profiles of genes X and Y are displayed in the above scatter plot.
Important! Correlation between X and Y is 0.
3 Mathematical Statistics on LA
- E(X) = 0, E(Y) = 0, SD(X) = SD(Y) = 1
- LA is defined by the following equation: LA(X,Y|Z) = E(g'(Z)), where g(z) = E(XY|Z = z).
- g(Z) is the conditional expectation of XY given Z, that is, the conditional correlation between X and Y.
- LA(X,Y|Z) is the expected change of the correlation between X and Y.
4 Stein Lemma
- Computing E(g'(Z)) is not easy. With help from mathematical statistics theory, LA(X,Y|Z) can be simplified to E(XYZ) when Z follows a standard normal distribution.
Stein's lemma: E(g'(Z)) = E(g(Z)Z)
5 Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
6 Gene-expression data
          cond 1  cond 2  ..  cond p
gene 1    x11     x12     ..  x1p
gene 2    x21     x22     ..  x2p
..
gene n    xn1     xn2     ..  xnp
7 Correlation Coefficient
- Has been used by Gauss, Bravais, Edgeworth
- Sweeping impact in data analysis is due to Galton (1822-1911), "Typical laws of heredity in man"
- Karl Pearson modifies and popularizes its use
- A building block in multivariate analysis, of which clustering, classification, dimension reduction are recurrent themes
9 Two classes problem
An application
- ALL (acute lymphoblastic leukemia)
- AML (acute myeloid leukemia)
10 Why does clustering make sense biologically?
The rationale is:
Genes with a high degree of expression similarity are likely to be functionally related. They
- may form a structural complex,
- may participate in common pathways,
- may be co-regulated by common upstream regulatory elements.
Simply put,
Profile similarity implies functional association
11 However, the converse is not true
- The expression profiles of the majority of functionally associated genes are in fact uncorrelated
- Microarray is too noisy
- Biology is complex
12 Why no correlation?
- Protein rarely works alone
- Protein has multiple functions
- Different biological processes or pathways have to be synchronized
- Competing use of finite resources: metabolites, hormones, ...
- Protein modification: phosphorylation, proteolysis, shuttle, ...
- Transcription factors serving both as activators and repressors
13 Transcription factors: proteins that bind to DNA
Activators, repressors
14 Going subtle: protein modification
Histone inhibits transcription. To activate transcription, the lysine side chain must be acetylated. Weaver (2001)
15 Corepressor: histone deacetylase
Thyroid hormone
Coactivator: histone acetyltransferase
16 Math. modeling: a nightmare
[Diagram labels: Current, Next; mRNA; Fitness; Function; Observed; Hidden; protein kinase; ATP, GTP, cAMP, etc.; localization (cytoplasm, nucleus, mitochondria, vacuolar); DNA methylation, chromatin structure; nutrients (carbon, nitrogen sources), temperature, water]
Statistical methods become useful
17 What is LA? PLA?
18 Schematic illustration of LA
19 Example 1. Positive-to-negative
- X = ARP4, Y = LAS17, Z = MCM1
- Corr = 0 in each plot
- For low Z (marked points in A), X and Y are co-expressed
- For high Z (marked points in B), X and Y are contra-expressed
Arp4: Protein that interacts with core histones, member of the NuA4 histone acetyltransferase complex; actin-related protein
Las17: Component of the cortical actin cytoskeleton
21 Example 2. Negative-to-positive
- X = QCR9, Y = ROX1, Z = MCM1
- Corr = 0 in each plot
- For low Z (marked points in A), X and Y are contra-expressed
- For high Z (marked points in B), X and Y are co-expressed
Rox1: Heme-dependent transcriptional repressor of hypoxic genes including CYC7 (iso-2-cytochrome c) and ANB1 (translation initiation, ribosome)
Qcr9: Ubiquinol cytochrome c reductase subunit 9
23 A Challenge
- What genes behave like that?
- Can we identify all of them?
- N = 5878 ORFs
- N choose 3 = 33.8 billion triplets to inspect
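The triplet count quoted on this slide can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and not part of the original slides.

```python
import math

N_ORFS = 5878  # number of ORFs quoted on the slide

# Number of unordered gene triplets that would have to be inspected.
n_triplets = math.comb(N_ORFS, 3)

print(f"{n_triplets:,}")                    # 33,831,075,876
print(f"{n_triplets / 1e9:.1f} billion")    # 33.8 billion
```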
24 Statistical theory for LA
- X, Y, Z: random variables with mean 0 and variance 1
- Corr(X,Y) = E(XY) = E(E(XY|Z)) = E(g(Z))
- g(z): an ideal summary of the association pattern between X and Y when Z = z
- g'(z) = derivative of g(z)
- Definition. The LA of X and Y with respect to Z is LA(X,Y|Z) = E(g'(Z))
25 Statistical theory - LA
- Theorem. If Z is standard normal, then LA(X,Y|Z) = E(XYZ)
- Proof. By Stein's Lemma: E(g'(Z)) = E(g(Z)Z)
- = E(E(XY|Z)Z) = E(XYZ)
- Additional math. properties
- bounded by third moment
- = 0, if jointly normal
- transformation
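The theorem can be sanity-checked by simulation. The sketch below is not from the slides: it assumes a hypothetical model in which Z is standard normal and the conditional correlation is g(z) = tanh(z), so that g'(z) = 1 - tanh(z)^2. Under that assumption Corr(X,Y) is 0 by symmetry, while the Monte Carlo averages of XYZ and of g'(Z) should agree up to sampling noise.

```python
import math
import random


def simulate(n=200_000, seed=0):
    """Monte Carlo estimates of E(XY), E(XYZ) and E(g'(Z)) under the assumed model."""
    rng = random.Random(seed)
    sum_xy = sum_xyz = sum_gprime = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        rho = math.tanh(z)  # assumed g(z) = E(XY | Z = z)
        # Y has mean 0 and variance 1, with conditional correlation rho to X.
        y = rho * x + math.sqrt(1.0 - rho * rho) * w
        sum_xy += x * y
        sum_xyz += x * y * z
        sum_gprime += 1.0 - rho * rho  # g'(z) for g(z) = tanh(z)
    return sum_xy / n, sum_xyz / n, sum_gprime / n


corr_xy, e_xyz, e_gprime = simulate()
print(f"Corr(X,Y) ~ {corr_xy:.3f}")
print(f"E(XYZ)    ~ {e_xyz:.3f}")
print(f"E(g'(Z))  ~ {e_gprime:.3f}")
```

The choice of tanh is arbitrary; any smooth g bounded by 1 in absolute value would serve for this check.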
26 Normality?
- Convert each gene expression profile by taking the normal score transformation
- LA(X,Y|Z) = average of the triplet product of three gene profiles
- = (x1y1z1 + x2y2z2 + ...) / n
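A minimal sketch of the estimator described on this slide. The slide does not say how ranks are mapped to normal scores; the code assumes the quantile at rank/(n+1) and breaks ties by position, and the function names normal_scores and la_score are illustrative.

```python
from statistics import NormalDist


def normal_scores(profile):
    """Replace each value by the standard-normal quantile of its rank.

    Assumption: rank r (1-based) maps to the quantile at r / (n + 1).
    """
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def la_score(x, y, z):
    """Average triplet product of the normal-score transformed profiles."""
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)


# Toy, made-up profiles over six conditions (not real expression data).
x = [1, 6, 2, 5, 3, 4]
y = [6, 2, 4, 5, 3, 1]
z = [1, 2, 3, 4, 5, 6]
print(la_score(x, y, z))
```

With this convention the scores have mean 0 but variance slightly below 1 for small n; rescaling them to unit variance would be a reasonable refinement.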
27 How does LA work in yeast?
- Urea cycle/arginine biosynthesis
28 Yeast Cell Cycle (adapted from Molecular Cell Biology, Darnell et al.)
Most visible event
29-30 [Pathway diagram labels: ARG1, Glutamate, ARG2]
31 [Pathway diagram, adapted from KEGG. Labels: ARG1; X; Y; Head; Backdoor; 8th place negative]
- Compute LA(X,Y|Z) for all Z
- Rank and find leading genes
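The two steps on this slide (score every candidate Z for a fixed pair X, Y, then rank) can be sketched as follows. This is an illustration rather than the original implementation: the function rank_by_la, the rank/(n+1) normal-score convention, and the toy profiles are all assumptions.

```python
from statistics import NormalDist


def normal_scores(profile):
    """Normal score transformation (assumed convention: quantile at rank / (n + 1))."""
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def rank_by_la(profiles, x_name, y_name):
    """Return (gene, LA score) pairs for every candidate Z, largest LA first."""
    scored = {name: normal_scores(p) for name, p in profiles.items()}
    xs, ys = scored[x_name], scored[y_name]
    results = []
    for name, zs in scored.items():
        if name in (x_name, y_name):
            continue  # X and Y themselves are not candidates for Z
        la = sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(zs)
        results.append((name, la))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy, made-up profiles over six conditions (not real expression data).
profiles = {
    "X": [1, 6, 2, 5, 3, 4],
    "Y": [6, 2, 4, 5, 3, 1],
    "Z_pos": [1, 2, 3, 4, 5, 6],
    "Z_neg": [6, 5, 4, 3, 2, 1],
    "W": [3, 1, 6, 2, 5, 4],
}
ranked = rank_by_la(profiles, "X", "Y")
for gene, la in ranked:
    print(f"{gene}: {la:+.3f}")
```

Leading genes are read off both ends of the sorted list: the top entries have the largest positive LA scores and the bottom entries the most negative ones.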
32 Why negative LA?
High CPA2: signal for arginine demand. Up-regulation of ARG2 concomitant with down-regulation of CAR2 prevents ornithine from leaving the urea cycle. When the demand is relieved, CPA2 is lowered and CAR2 is up-regulated, opening up the channel for ornithine to leave the urea cycle.
33 Other examples (see Li 2002)
- X = GLN3 (transcription factor), Y = CAR1, Z = ARG4 (8th place, negative end)
- Electron transport: X = CYT1 (cytochrome c1), gives ATP1 (11 times), ATP5 (subunits of ATPase)
- Calmodulin: CMD1, NUF1 (binding target of CMD1), CMK1 (calmodulin-regulated kinase), YGL149W
- Glycolysis genes: PFK1, PFK2 (6-phosphofructokinase)
- CYR1 (adenylate cyclase), GSY1 (glycogen synthase), GLC2 (glucan branching), SCH9 (serine/threonine protein kinase, longevity)
34 SCH9
- Protein kinase that regulates signal transduction activity and G1 progression, controls cAPK activity, required for nitrogen activation of the FGM pathway, involved in life span regulation, homologous to mammalian Akt/PKB (SGD summary)
- Science. 2001 Apr 13;292(5515):288-90. Regulation of longevity and stress resistance by Sch9 in yeast. Fabrizio P, Pozza F, Pletcher SD, Gendron CM, Longo VD.
- The protein kinase Akt/protein kinase B (PKB) is
implicated in insulin signaling in mammals and
functions in a pathway that regulates longevity
and stress resistance in Caenorhabditis elegans.
We screened for long-lived mutants in nondividing
yeast Saccharomyces cerevisiae and identified
mutations in adenylate cyclase and SCH9, which is
homologous to Akt/PKB, that increase resistance
to oxidants and extend life-span by up to
threefold. Stress-resistance transcription
factors Msn2/Msn4 and protein kinase Rim15 were
required for this life-span extension. These
results indicate that longevity is associated
with increased investment in maintenance and show
that highly conserved genes play similar roles in
life-span regulation in S. cerevisiae and higher
eukaryotes.
35
- Blue: low SCH9
- Red: high SCH9