Title: Bioinformatics
1. Extraction of functional information from large-scale gene expression data
- Bioinformatics
- 91.580 2003 Spring
- Jianping Zhou
2. Contents
- A prominence feature of cell cycle-regulated genes: they show more remarkable and active functions than other genes
- SP (shortest-path) analysis to extract functional information: an alternative and complement to clustering analysis
3. Prominence Feature
- Because of their regulatory role, the cell cycle-regulated genes are assumed to be more active and remarkable than other genes in the yeast Saccharomyces cerevisiae genome.
- When the original dataset is filtered by significance thresholds, a higher survival ratio for the cell cycle-regulated genes than for the other genes lets us conclude that they are more active and remarkable.
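The survival-ratio comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `survival_ratio` and `compare_survival` and the toy gene identifiers are hypothetical, and the slides do not prescribe any particular implementation.

```python
def survival_ratio(before, after):
    """Fraction of the genes in `before` that are still present in `after`."""
    before = set(before)
    if not before:
        return 0.0
    return len(before & set(after)) / len(before)


def compare_survival(all_before, all_after, regulated):
    """Return (ratio for regulated genes, ratio for all other genes)."""
    all_before = set(all_before)
    regulated = set(regulated)
    regulated_before = all_before & regulated
    others_before = all_before - regulated
    return (
        survival_ratio(regulated_before, all_after),
        survival_ratio(others_before, all_after),
    )


# Toy data (hypothetical gene names, not taken from the real dataset).
original = {"g1", "g2", "g3", "g4", "g5", "g6"}
filtered = {"g1", "g2", "g4"}
cell_cycle = {"g1", "g2", "g3"}

reg_ratio, other_ratio = compare_survival(original, filtered, cell_cycle)
print(reg_ratio > other_ratio)
```

If the first ratio is clearly larger than the second across a range of thresholds, that supports the claim on this slide.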
4. Prominence Feature
- The preprocessing utility of the GEPAS package can be used to prepare the dataset for comparison.
- Microarray gene expression data are the ideal data source.
- The 800 cell cycle-regulated genes that Spellman identified for the yeast Saccharomyces cerevisiae are the most complete set available at this point.
5. Prominence Feature
A single Common Lisp expression counts the hit genes:
    (length (intersection regu '(plain text file content)))
- regu: the preset CL list holding the names of the 800 cell cycle-regulated genes. It is defined in CL as
    (setf regu '(plain text of 800 cell cycle-regulated genes))
  The plain text of the 800 cell cycle-regulated genes can be obtained by copying and pasting the ORF column of CellCycle98.xls.
- plain text file content: copied and pasted from the plain-text output file of the preprocessing or clustering step, which contains the ORFs of the selected genes.
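For readers who do not use Common Lisp, the same hit count can be sketched in Python. The helper names `parse_orfs` and `count_hits` are hypothetical, and the short ORF strings below are toy input rather than the real 800-gene list. The upper-casing step mirrors the default Lisp reader, which folds symbol names to upper case.

```python
def parse_orfs(text):
    """Split pasted plain text on whitespace into a set of upper-cased ORF names."""
    return {token.upper() for token in text.split()}


def count_hits(regulated_text, output_text):
    """Count ORFs that occur both in the regulated list and in the output file text."""
    return len(parse_orfs(regulated_text) & parse_orfs(output_text))


# Toy input: in practice both strings would be pasted from CellCycle98.xls
# and from the preprocessing or clustering output file.
regu_text = "YPR045C YPL221W YIL056W"
output_text = "ypl221w\nYBR053C\nYPR045C"

print(count_hits(regu_text, output_text))
```

Using sets means a duplicated ORF is counted once, which may differ from a list-based `intersection` when the pasted text contains repeats.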
9. Prominence Feature
Parameters Pe, Pk, Sd:
- Pe: minimum percentage of existing values. Patterns whose missing values exceed the rate this allows are removed.
- Pk: minimum number of peaks. Patterns with fewer peaks than this value are removed.
- Sd: threshold for standard deviation. Patterns with a standard deviation below the threshold are removed.
Profile counts:
- P0: total profiles in the original file
- P1: profiles removed because of missing values, determined by the minimum percentage of existing values
- P2: profiles mended by imputing missing values, determined by the minimum number of peaks
- P3: profiles removed by filtering out flat profiles by number of peaks
- P4: profiles removed by filtering out flat profiles by standard deviation
- P5: profiles remaining in the result dataset
- Hit: count of genes present in both the result dataset and the 800 Spellman cell cycle-regulated gene dataset
- Hit rate: Hit / P5
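The Pe and Sd filters and the hit rate can be illustrated with a small Python sketch. It is not the GEPAS implementation: the function names, the use of `None` for a missing value, the population standard deviation, and the toy profiles are all assumptions made for illustration. The Pk filter is left out because the slides do not define how a peak is detected.

```python
from statistics import pstdev


def filter_profiles(profiles, pe, sd):
    """Apply the Pe and Sd filters to {ORF: [values or None]}.

    Returns (kept profiles, count removed for missing values,
    count removed as flat by standard deviation).
    """
    kept = {}
    removed_missing = 0
    removed_flat = 0
    for orf, values in profiles.items():
        present = [v for v in values if v is not None]
        pct_present = 100.0 * len(present) / len(values) if values else 0.0
        if not present or pct_present < pe:
            removed_missing += 1
        elif pstdev(present) < sd:
            removed_flat += 1
        else:
            kept[orf] = values
    return kept, removed_missing, removed_flat


def hit_rate(kept, regulated):
    """Return (Hit, Hit / P5) for the surviving profiles."""
    hits = len(set(kept) & set(regulated))
    rate = hits / len(kept) if kept else 0.0
    return hits, rate


# Toy profiles with hypothetical ORF names.
profiles = {
    "A": [1.0, -1.0, 1.0, -1.0],
    "B": [0.0, 0.0, 0.0, 0.0],    # flat, removed by Sd
    "C": [1.0, None, None, -1.0],  # half missing, removed by Pe
    "D": [2.0, -2.0, 2.0, -2.0],
}
kept, n_missing, n_flat = filter_profiles(profiles, pe=95.0, sd=0.5)
print(sorted(kept), n_missing, n_flat, hit_rate(kept, {"A", "X"}))
```

In the notation of this slide, `len(profiles)` plays the role of P0 and `len(kept)` the role of P5.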
11. Prominence Feature
Pe = 95
12. SP (shortest-path) analysis
- SP (shortest-path) analysis is used to identify transitive genes between two given genes from the same biological process.
- Transitive expression similarity among genes can be used as an important attribute to link genes of the same biological pathway.
- Recent advances in computational and experimental technologies have opened up real opportunities for annotating gene functions not only at the phenomenological level but also at the mechanistic level.
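The idea of finding transitive genes on a shortest path can be sketched as follows. This is a minimal illustration, not the published method: the edge rule (connect two genes when the absolute Pearson correlation of their profiles reaches a threshold), the edge weight `1 - |r|`, and all function names are assumptions chosen for the example. Dijkstra's algorithm is used for the shortest path.

```python
import heapq
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_graph(profiles, min_abs_corr):
    """Connect genes whose |r| >= min_abs_corr; edge weight is 1 - |r| (assumed rule)."""
    genes = list(profiles)
    graph = {g: {} for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = abs(pearson(profiles[a], profiles[b]))
            if r >= min_abs_corr:
                graph[a][b] = graph[b][a] = 1.0 - r
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the list of genes on the path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def transitive_genes(graph, source, target):
    """Genes strictly between source and target on the shortest path."""
    path = shortest_path(graph, source, target)
    return [] if path is None else path[1:-1]
```

With a hand-built graph such as `{"A": {"B": 0.1}, "B": {"A": 0.1, "C": 0.1}, "C": {"B": 0.1}}`, calling `transitive_genes(graph, "A", "C")` returns the intermediate gene `B`, even though `A` and `C` share no direct edge.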
13. SP (shortest-path) analysis
- For the yeast Saccharomyces cerevisiae genome, the author, X. Zhou [5], constructed the cytoplasm graph (the other two graphs cover mitochondria and nucleus), which contains 398 genes. All of those genes are involved in the same biological pathway.
- Matching the cytoplasm outcome against Spellman's CellCycle98.xls identifies six genes:
- YPR045C, YPL221W (BOP1), YIL056W, YHR029C, YDR130C, YBR053C
14. SP (shortest-path) analysis
- According to CellCycle98.xls, all of these genes have an unknown process, and their cluster order numbers are far apart from each other.
- In the SOM clustering output for the normalized file, which has 561 hits against the 800 Spellman genes, three of these genes are found: YPR045C in cluster (2, 4), YPL221W in cluster (1, 1), and YBR053C in cluster (2, 7). The other three are not found.
- Across all my clustering outputs, none is found in clustering.
- None of the Ftigo-linked databases returns results for these five genes or ORFs.
- No evidence shows that these six genes belong in the same cluster.
15. References
[1] Paul T. Spellman, Gavin Sherlock, Michael Q. Zhang, Vishwanath R. Iyer, Kirk Anders, Michael B. Eisen, Patrick O. Brown, David Botstein, and Bruce Futcher. Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. MBC, Vol. 9, Issue 12, 3273-3297, December 1998.
[2] Oliveros, J.C., Blaschke, C., Herrero, J., Dopazo, J., Valencia, A. (2000). Expression profiles and biological function. Genome Informatics Workshop 2000, 11, 106-117.
[3] M. Q. Zhang. Extracting functional information from microarrays: A challenge for functional genomics. PNAS, October 1, 2002, 99(20), 12509-12511.
[4] M. Q. Zhang. Large-Scale Gene Expression Data Analysis: A New Challenge to Computational Biologists. Genome Res., August 1, 1999, 9(8), 681-688.
[5] X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, October 1, 2002, 99(20), 12783-12788.
[6] www.biostat.harvard.edu/complab/SP/