Bioinformatics - PowerPoint PPT Presentation

About This Presentation
Title:

Bioinformatics

Description:

Bioinformatics. 91.580 2003 Spring. Jianping Zhou. Extraction of functional information ... show more remarkable and active functions than others ... – PowerPoint PPT presentation

Slides: 16
Provided by: jian4
Learn more at: https://www.cs.uml.edu

Transcript and Presenter's Notes

Title: Bioinformatics


1
Extraction of functional information from
large-scale gene expression data
  • Bioinformatics
  • 91.580 2003 Spring
  • Jianping Zhou

2
Contents
  • A prominence feature of cell cycle-regulated
    genes
  • ----- They show more remarkable and active
    functions than other genes
  • SP (shortest-path) analysis to extract
    functional information
  • ----- An alternative and complement to
    clustering analysis

3
Prominence Feature
  • Assumption
  • Because of their regulatory role, the cell
    cycle-regulated genes are assumed to be more
    active and remarkable than other genes in the
    yeast Saccharomyces cerevisiae genome.
  • When the original dataset is filtered with
    significance thresholds, a higher survival ratio
    for the cell cycle-regulated genes than for the
    other genes supports the conclusion that they
    are more active and remarkable.
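
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy gene names are hypothetical, and "survival ratio" is taken to mean the fraction of a gene group that is still present after filtering.

```python
def survival_ratio(before, after):
    """Fraction of the genes in `before` that are still present in `after`."""
    before = set(before)
    if not before:
        raise ValueError("empty gene set")
    return len(before & set(after)) / len(before)


def compare_survival(all_genes, regulated, surviving):
    """Return (ratio for cell cycle-regulated genes, ratio for all other genes)."""
    all_genes = set(all_genes)
    regulated = set(regulated) & all_genes
    others = all_genes - regulated
    return survival_ratio(regulated, surviving), survival_ratio(others, surviving)


# Toy data with made-up names: 10 genes, 4 of them flagged as regulated.
all_genes = [f"G{i}" for i in range(10)]
regulated = ["G0", "G1", "G2", "G3"]
surviving = ["G0", "G1", "G2", "G7"]  # genes left after filtering

reg_ratio, other_ratio = compare_survival(all_genes, regulated, surviving)
print(reg_ratio, other_ratio)
```

In this toy case three of the four regulated genes survive, against one of the six others, which is the pattern the assumption predicts.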

4
Prominence Feature
  • Methods
  • The preprocessing utility of the GEPAS package
    can be used to prepare the dataset for comparison
  • Microarray gene expression data are the ideal
    data source
  • The 800 cell cycle-regulated genes that Spellman
    et al. [1] identified for the yeast Saccharomyces
    cerevisiae are the most complete set available
    at this point

5
Prominence Feature
  • Methods (cont)

Use a single Common Lisp expression to count the
hit genes:

    (length (intersection regu '(plain text file content)))

regu: the preset CL list holding the names of the
800 cell cycle-regulated genes. It is defined in
CL as

    (setf regu '(plain text of 800 cell cycle-regulated genes))

Note the quote before the list; without it, CL
would try to evaluate the list as a function call.
The plain text of the 800 cell cycle-regulated
genes can be obtained by copying and pasting the
ORF column of CellCycle98.xls.

plain text file content: the pasted content of a
preprocessing or clustering output file, which
contains the ORFs of the selected genes.
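
For readers working outside Lisp, the same hit count can be sketched in Python. The function names below are hypothetical, and the short ORF strings are stand-ins for the pasted ORF column and the pasted output file. One difference from the list-based Lisp expression: Python sets ignore duplicate names, and this sketch also ignores letter case.

```python
def parse_orfs(text):
    """Split pasted plain text into a set of upper-cased ORF names."""
    return {token.upper() for token in text.split()}


def count_hits(regulated_text, output_text):
    """Number of ORF names present in both pasted texts."""
    return len(parse_orfs(regulated_text) & parse_orfs(output_text))


# Stand-in for the pasted ORF column (the real list has 800 entries).
regu_text = "YPR045C YPL221W YIL056W YHR029C"

# Stand-in for a pasted preprocessing or clustering output.
output_text = """
ypr045c
YBR053C
YIL056W
"""

hits = count_hits(regu_text, output_text)
print(hits)
```
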
6
Prominence Feature
  • Steps

7
Prominence Feature
  • Steps (cont)

8
Prominence Feature
 
  • Steps (cont)

 
9
Prominence Feature
  • Steps (cont)

Parameters: Pe, Pk, Sd
  Pe   Minimum percentage of existing values --
       patterns whose percentage of existing
       (non-missing) values is below this rate are
       removed.
  Pk   Minimum number of peaks -- patterns with
       fewer peaks than this value are removed.
  Sd   Threshold for standard deviation -- patterns
       with a standard deviation below the threshold
       are removed.

Profile counts
  P0   Total profiles in the original file
  P1   Profiles removed because of missing values,
       determined by the minimum percentage of
       existing values
  P2   Profiles mended by imputing missing values,
       determined by the minimum number of peaks
  P3   Profiles removed as flat by the number of
       peaks
  P4   Profiles removed as flat by the standard
       deviation
  P5   Profiles remaining in the result dataset

Hit        Count of genes present in both the result
           dataset and the 800-gene Spellman cell
           cycle-regulated dataset
Hit rate   Hit / P5
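
The filtering steps and counters above can be illustrated with a small Python sketch. It is not the GEPAS implementation. Several details are assumptions made only for illustration: missing values are encoded as None, imputation uses the profile mean, and a value counts as a peak when its absolute value reaches a hypothetical `peak_threshold`.

```python
from statistics import mean, pstdev


def preprocess(profiles, pe, pk, sd, peak_threshold=1.0):
    """Filter expression profiles; return (remaining profiles, counts).

    profiles: dict mapping ORF name -> list of floats, None for missing.
    peak_threshold is a hypothetical parameter, not taken from the slides.
    """
    counts = {"P0": len(profiles), "P1": 0, "P2": 0, "P3": 0, "P4": 0}
    remaining = {}
    for orf, values in profiles.items():
        present = [v for v in values if v is not None]
        # Pe: drop profiles with too small a percentage of existing values.
        if not present or 100.0 * len(present) / len(values) < pe:
            counts["P1"] += 1
            continue
        # Impute missing values with the profile mean (assumed stand-in).
        if len(present) < len(values):
            fill = mean(present)
            values = [fill if v is None else v for v in values]
            counts["P2"] += 1
        # Pk: drop flat profiles with too few peaks.
        if sum(abs(v) >= peak_threshold for v in values) < pk:
            counts["P3"] += 1
            continue
        # Sd: drop flat profiles with a small standard deviation.
        if pstdev(values) < sd:
            counts["P4"] += 1
            continue
        remaining[orf] = values
    counts["P5"] = len(remaining)
    return remaining, counts


# Toy profiles with made-up names and values.
profiles = {
    "A": [2.0, -2.0, 0.0, 2.0],
    "B": [None, None, None, 1.0],
    "C": [0.1, None, 0.1, 0.2],
    "D": [1.0, 1.0, 1.0, 1.1],
    "E": [1.5, None, -1.5, 0.0],
}
regulated = {"A", "C", "X"}  # stand-in for the 800 Spellman genes

remaining, counts = preprocess(profiles, pe=75, pk=2, sd=0.5)
hit = len(set(remaining) & regulated)
hit_rate = hit / counts["P5"] if counts["P5"] else 0.0
print(counts, hit, hit_rate)
```

In this sketch P0 = P1 + P3 + P4 + P5, and P2 counts every profile that was imputed, including profiles that a later filter removes.
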
10
Prominence Feature
  • Result

11
Prominence Feature
  • Result (cont)

Pe 95
12
SP (shortest-path) analysis
  • Introduction
  • SP (shortest-path) analysis identifies the
    transitive genes that link two given genes from
    the same biological process.
  • Transitive expression similarity among genes can
    be used as an important attribute to link genes
    of the same biological pathway.
  • Recent advances in computational and
    experimental technologies have opened up real
    opportunities for annotating gene functions not
    only at the phenomenological levels but also at
    the mechanistic levels.
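
The idea of transitive genes can be sketched in Python as follows. This is a minimal illustration, not the method of reference 5: the edge length 1 - |r| (with r the Pearson correlation of two expression profiles), the correlation cutoff `min_abs_r`, and the toy profiles are all assumptions chosen for the example. Genes are nodes, sufficiently correlated genes are joined by an edge, and the genes strictly inside the shortest path between two genes are reported as transitive genes.

```python
import heapq
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_graph(profiles, min_abs_r):
    """Link genes whose |r| reaches min_abs_r; edge length is 1 - |r|."""
    genes = sorted(profiles)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = abs(pearson(profiles[g], profiles[h]))
            if r >= min_abs_r:
                graph[g][h] = graph[h][g] = 1.0 - r
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the list of genes on the path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy profiles: A and C are uncorrelated, but both correlate with B.
profiles = {
    "A": [1.0, 0.0, -1.0, 0.0],
    "B": [1.0, 1.0, -1.0, -1.0],
    "C": [0.0, 1.0, 0.0, -1.0],
    "D": [1.0, -1.0, 1.0, -1.0],
}
graph = build_graph(profiles, min_abs_r=0.6)
path = shortest_path(graph, "A", "C")
transitive = path[1:-1]
print(path, transitive)
```

Here A and C have no direct edge, so clustering on pairwise similarity alone would not group them, while the path through B links them. Gene D is uncorrelated with the rest and stays unreachable.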

13
SP (shortest-path) analysis
  • Discovery
  • For the yeast Saccharomyces cerevisiae genome,
    X. Zhou et al. [5] constructed a cytoplasm graph
    (two further graphs cover the mitochondria and
    the nucleus) that contains 398 genes. All of
    these genes are involved in the same biological
    pathway.
  • Matching the cytoplasm result against Spellman's
    CellCycle98.xls identifies six genes:
  • YPR045C YPL221W(BOP1) YIL056W YHR029C YDR130C
    YBR053C

14
SP (shortest-path) analysis
  • Discovery (cont)
  • According to CellCycle98.xls, all of these genes
    have an unknown process, and their cluster order
    numbers are far apart from each other.
  • In the SOM clustering output for the normalized
    file, which has 561 hits among the 800 Spellman
    genes, three of these genes appear: YPR045C in
    cluster (2, 4), YPL221W in cluster (1, 1), and
    YBR053C in cluster (2, 7). The other three are
    not found.
  • As for all my other clustering outputs, none of
    these genes is found in them.
  • None of the Ftigo-linked databases returns
    results for these five genes or ORFs.
  • No evidence shows that these six genes belong in
    the same cluster.
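
The co-clustering check on the SOM output can be written as a short Python sketch. The cluster coordinates are the ones reported above; the function name is hypothetical, and None marks the three genes that were not found in the output.

```python
from collections import defaultdict

# SOM node assignments reported above; None marks a gene absent from the output.
som_cluster = {
    "YPR045C": (2, 4),
    "YPL221W": (1, 1),
    "YBR053C": (2, 7),
    "YIL056W": None,
    "YHR029C": None,
    "YDR130C": None,
}


def shared_clusters(assignments):
    """Return the SOM nodes that contain more than one of the given genes."""
    members = defaultdict(list)
    for gene, node in assignments.items():
        if node is not None:
            members[node].append(gene)
    return {node: genes for node, genes in members.items() if len(genes) > 1}


shared = shared_clusters(som_cluster)
print(shared)
```

An empty result means that no two of the six genes fall into the same SOM node.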

15
References
[1] Paul T. Spellman, Gavin Sherlock, Michael Q. Zhang,
    Vishwanath R. Iyer, Kirk Anders, Michael B. Eisen,
    Patrick O. Brown, David Botstein, and Bruce Futcher.
    Comprehensive Identification of Cell Cycle-regulated
    Genes of the Yeast Saccharomyces cerevisiae by
    Microarray Hybridization. MBC, Vol. 9, Issue 12,
    3273-3297, December 1998.
[2] Oliveros, J.C., Blaschke, C., Herrero, J., Dopazo, J.,
    Valencia, A. (2000). Expression profiles and
    biological function. Genome Informatics Workshop
    2000, 11, 106-117.
[3] M. Q. Zhang. Extracting functional information from
    microarrays: A challenge for functional genomics.
    PNAS, October 1, 2002, 99(20), 12509-12511.
[4] M. Q. Zhang. Large-Scale Gene Expression Data
    Analysis: A New Challenge to Computational
    Biologists. Genome Res., August 1, 1999, 9(8),
    681-688.
[5] X. Zhou, M.-C. J. Kao, and W. H. Wong. From the
    Cover: Transitive functional annotation by
    shortest-path analysis of gene expression data.
    PNAS, October 1, 2002, 99(20), 12783-12788.
[6] www.biostat.harvard.edu/complab/SP/