Characterizing Gene Functional Expression Profiles

1
Characterizing Gene Functional Expression Profiles
  • Zoran Obradovic
  • Slobodan Vucetic
  • Hongbo Xie, Hao Sun, Pooja Hedge
  • Information Science and Technology Center, Temple
    University

2
Outline
  • Microarray Data Analysis Process
  • Functional Expression Profile Analysis
  • Functional Expression Profile Ranking
  • Functional Expression Profile Clustering
  • Functional Characterization of Plasmodium
    Falciparum, Saccharomyces Cerevisiae, Mus
    Musculus and Homo Sapiens

3
What is a DNA Microarray?
DNA microarray technology allows measuring
expressions for tens of thousands of genes at a
time
(Source: Analysis of Replicated Experiments, Gordon Smyth,
Walter and Eliza Hall Institute)
4
Scanning/Signal Detection
Cy3 channel
Cy5 channel
5
Microarray Data Analysis Process
  • Designing gene expression experiments
  • Image processing and analysis
  • Preprocessing raw intensity data
  • Discovering differentially expressed genes
  • Advanced analysis
  • Finding relevant pathways
  • Discovering gene expression patterns
  • Understanding gene functions
  • More information
  • www.ist.temple.edu/research/biocore.html

6
Designing Gene Expression Experiments
reference design
loop design
Design experiment
A saturated design
Comparative designing
http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp
7
Image Processing and Analysis (figure is
obtained using Imagene software)
8
Preprocessing Raw Intensity Data
normalize
(Source: Analysis of Replicated Experiments, Gordon Smyth,
Walter and Eliza Hall Institute)
9
Discovering Differentially Expressed Genes
  • Fold change (log ratio)
  • Statistical methods
  • 1) t-test
  • 2) ANOVA
  • 3) Non-parametric analysis
  • Wilcoxon rank-sum test
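
The first two criteria on this slide can be sketched for a single gene as follows. This is a minimal illustration, not the presenters' pipeline: the function names and the replicate intensities are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption, since the slide only says "t-test".

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Log2 ratio of the mean treated intensity to the mean control intensity."""
    return math.log2(mean(treated) / mean(control))


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical replicate intensities for one gene (not from the presentation).
control = [100.0, 110.0, 90.0, 100.0]
treated = [400.0, 420.0, 380.0, 400.0]

lfc = log2_fold_change(treated, control)  # means are 400 and 100, so lfc is 2.0

# The t statistic is computed on log2 intensities, a common choice for
# microarray data; turning it into a p-value needs a t distribution,
# which is omitted here.
t_stat = welch_t([math.log2(x) for x in treated],
                 [math.log2(x) for x in control])
```

A gene would then be flagged when both the fold change and the test statistic pass thresholds chosen by the analyst.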

10
Advanced Analysis: Finding Relevant Pathways
(figure is obtained using Ingenuity software)
11
Advanced Analysis: Discovering Gene Expression
Patterns
  • Plasmodium Falciparum intraerythrocytic
    developmental cycle
  • Genes are sorted based on expression time peaks
  • (Bozdech Z et al., (2003) PLoS Biol. Oct 1(1))

12
Advanced Analysis: Identifying Unknown Gene
Functions Based on Expression Profiles
Is this alignment reliable?
  • Standard practice
  • Basic assumption: expression profiles of
    functionally related genes are correlated
  • Objectives: confirm a specific biological
    hypothesis, predict functional properties of less
    characterized genes, or uncover new/unexpected
    biological knowledge
  • Methodology: cluster genes based on similarity
    of their expression profiles, then perform
    functional analysis of the obtained clusters

(Figure: gene 1 expression profile with function A; gene 2
expression profile with function B. The unknown sequence tag
has high correlation with the gene 1 expression profile, so
the sequence tag is assigned function A.)
13
Problems with old approaches
  • Genes with the same function do not necessarily
    have the same expression profiles
  • Clustering on the expression profiles of all
    genes could be unreliable

14
Advanced Analysis: Identifying Unknown Gene
Functions Based on Expression Profiles
Our Approach: Analyzing Microarray Functional
Expression Profiles (FEPs)
Compute the FEP as the average profile of all genes
associated with a given highly correlated GO term
GO:0004721 phosphoprotein phosphatase activity
GO:0016311 dephosphorylation
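
A small sketch of the averaging step, assuming each gene's profile is a list of expression values over the same time points. The gene names, the profiles and the `functional_expression_profile` helper are hypothetical and only illustrate the idea.

```python
def functional_expression_profile(profiles):
    """Average equal-length expression profiles point by point."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


# Hypothetical annotation (GO term -> genes) and expression data
# (gene -> time-course profile); neither comes from the presentation.
annotation = {"GO:0016311": ["geneA", "geneB"]}
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [0.0, 0.0, 0.0],
}

fep = {
    term: functional_expression_profile([expression[g] for g in genes])
    for term, genes in annotation.items()
}
```

Note that the two example genes are anti-correlated, so their average is flat. That is why the approach restricts FEPs to GO terms whose genes are highly correlated.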
15
Questions that we address
  • How to perform functional analysis in an
    objective manner
  • How to estimate biological significance of
    discoveries

16
Tools and Applications
  • Developed tools to
  • (1) Explore which functions have conserved
    expression profiles
  • (Tool 1: functional expression profile
    ranking package)
  • (2) Explore which functions have similar
    expression profiles and test their functional
    similarity
  • (Tool 2: functional expression profile
    clustering package)
  • Applications
  • Functional characterization of gene expression
    related to Intraerythrocytic Developmental Cycle
    of Plasmodium Falciparum, Saccharomyces
    Cerevisiae, Mus Musculus and Homo Sapiens

17
Tools Architecture
Microarray raw data
Report
List of significantly correlated GO terms
Clusters of functional Expression profiles
Data pre-processing
Gene function annotation database
Functional expression profile ranking
Functional expression profile clustering
Gene Function Semantic Distance Mapping Space
18
Tool 1: Functional Expression Profile (FEP)
Ranking Package
  • Objective
  • Identify genes with same function having
    correlated expression profiles
  • Task
  • Evaluate gene expression correlation within each
    FEP
  • Methodology
  • Step 1: calculate the average pairwise correlation
    coefficient S among the n gene expression profiles
    for a given function term
  • Step 2: randomly select n genes from the whole
    dataset and compute their average pairwise
    correlation coefficient
  • Step 3: repeat Step 2 m times (m > 10,000) and
    compare the distribution of the random values to
    the original S to evaluate the p-value
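
The three steps can be sketched as a permutation test. This is an illustrative reading of the slide, not the ranking package itself: Pearson correlation is assumed as the correlation measure, the `(hits + 1) / (m + 1)` p-value estimate is a common convention rather than something stated on the slide, and the synthetic data and all function names are hypothetical. The demo uses far fewer repetitions than the slide's m > 10,000 to keep it quick.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_pairwise_correlation(profiles):
    """Step 1: mean correlation over all pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def fep_p_value(term_genes, expression, m=1000, seed=0):
    """Steps 2 and 3: compare S for the term with S of random gene sets."""
    rng = random.Random(seed)
    observed = average_pairwise_correlation([expression[g] for g in term_genes])
    all_genes = sorted(expression)
    hits = 0
    for _ in range(m):
        sample = rng.sample(all_genes, len(term_genes))
        random_s = average_pairwise_correlation([expression[g] for g in sample])
        if random_s >= observed:
            hits += 1
    return observed, (hits + 1) / (m + 1)


# Synthetic data: 40 noise genes plus 4 genes that share a periodic profile.
data_rng = random.Random(1)
timepoints = 12
expression = {
    f"noise{i}": [data_rng.gauss(0, 1) for _ in range(timepoints)]
    for i in range(40)
}
for i in range(4):
    expression[f"term{i}"] = [
        math.sin(2 * math.pi * t / timepoints) + data_rng.gauss(0, 0.1)
        for t in range(timepoints)
    ]

observed, p_value = fep_p_value([f"term{i}" for i in range(4)], expression, m=500)
```

GO terms can then be ranked by `p_value`, with a small value indicating that genes sharing the term are more correlated than randomly chosen gene sets of the same size.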

19
Dataset 1: Plasmodium Falciparum
Intraerythrocytic Developmental Cycle
  • (Bozdech Z et al., (2003) PLoS Biol. Oct 1(1))

  • Objective: Identification of P. falciparum genes
    whose RNA levels vary periodically within the
    asexual intraerythrocytic developmental cycle
    (IDC) transcriptome
  • Materials: 5080 ORFs, 3532 unique genes, 46
    assays (sampled in time) using cDNAs
  • Methods: Permutation test with Fast Fourier
    Transform algorithm and correlations
  • Found: 60% of genes transcriptionally active and
    most genes only active once during the IDC
  • Figure: Major morphological stages during the
    IDC and transcriptional profiles of 2712 genes
20
Dataset 2: Saccharomyces Cerevisiae Cell Cycle
(Spellman et al., (1998) Molecular Biology of
the Cell 9, 3273-3297)
  • Objective: Identification of yeast genes whose
    RNA levels vary periodically within the cell
    cycle process
  • Materials: 6178 ORFs, 4450 unique genes, 77
    assays (sampled in time) using cDNAs
  • Methods: Periodicity and correlation algorithm
  • Found: 800 genes that meet an objective minimum
    criterion for cell cycle regulation
  • Figure: The M/G1 clusters

21
Dataset 3: Homo Sapiens Cell Cycle (R. Cho et al.
(2001) Nature, 27)
  • Objective: Identification of human genes whose
    RNA levels vary periodically within the cell
    cycle process
  • Materials: 6800 ORFs, 5795 unique genes, 14
    assays (sampled in time) using Affymetrix arrays
  • Methods: Fold change
  • Found: 700 genes that display transcriptional
    fluctuation with a periodicity consistent with
    that of the cell cycle
  • Figure: Clustering analysis of
    cell-cycle-regulated transcripts

22
Dataset 4: Mus Musculus Cell Cycle (Ishida, S et
al. (2001) Mol. Cell. Biol. 21, 4684-4699)
  • Objective: Analysis of gene regulation during the
    mammalian cell cycle
  • Materials: 6347 unique genes, 14 assays
  • Methods: Clustering
  • Found: 7 distinct clusters of genes that exhibit
    unique patterns of expression
  • Figure: Patterns of gene expression following
    growth stimulation and during the mammalian cell
    cycle

23
Applying FEP Ranking Package: Cumulative
Distributions of GO Term p-Values of Human,
Yeast, Mouse and P.F.
24
Applying FEP Ranking Package: GO Terms with the
Most Conserved FEP Among Multi-organisms
25
Applying FEP Ranking Package: Selection of GO
Terms with Significantly Correlated Expression
Patterns in Plasmodium Falciparum Developmental
Cycle Data
Cumulative distribution of p-values for GO terms
associated with at least two genes
GO:0016311 dephosphorylation
GO:0007028 cytoplasm organization and
biosynthesis
46% of all function GO terms are significantly
correlated; 52% of all process GO terms are
significantly correlated
Selected
26
Plasmodium Falciparum: Processes and Functions
with the Highest/Lowest Correlation
Highest correlation
Lowest correlation
27
Plasmodium Falciparum: Findings by FEP Ranking
Package
  • Of 12 FEPs referenced by Bozdech et al., two have
    a p-value larger than 0.05.
  • E.g., the average correlation coefficient among
    genes associated with the Ribonucleotide Synthesis
    function is only 0.258 (p-value 0.11), which
    weakens the claim that it is related to the Ring
    stage of the IDC.
  • No linear relationship was found between the
    number of genes associated with a given GO term
    and the average correlation coefficient among
    these genes
  • Ranking of GO terms based on p-value could be
    useful for rapid identification of functions that
    are closely related to a specific developmental
    stage (of Plasmodium Falciparum)

28
All Datasets: Findings by FEP Ranking Package
  • To some extent, genes with identical functions
    have similar expression profiles
  • However, a large fraction of functions do not
    follow the underlying hypothesis!
  • Higher-level organisms seem to have a lower
    fraction of significantly correlated expression
    profiles for identical functions.
  • Fractions of correlated FEPs
  • Saccharomyces Cerevisiae: 59% (643/1,083)
  • Plasmodium Falciparum: 48.4% (428/884)
  • Homo Sapiens: 16.4% (249/1,514)
  • Mus Musculus: 13.3% (182/1,366)
  • Fractions are for both processes and functions

29
Tool 2: FEP Clustering Package
  • Objective
  • Identifying genes with similar functions and
    similar expression profiles
  • Tasks
  • Cluster FEPs selected by FEP ranking package
  • Evaluate found clusters for biological relevance
    by
  • Identifying similar functions based on GO term
    hierarchy tree structure
  • Evaluating inter-cluster GO term distance
  • Methodology
  • Randomly generate k sets, each containing the
    same number of GO terms as the corresponding
    cluster
  • Calculate the total GO term distance within each
    generated set and sum the total distances of all
    sets to get the overall score S
  • Repeat the procedure 1000 times and compare the
    distribution of S to the overall distance
    obtained through clustering
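
The randomization procedure can be sketched as below. Several details are assumptions made for illustration: the random sets are drawn by shuffling the clustered terms and cutting them into sets of the original sizes, the comparison is summarized as a z-score of the form (mean of random S minus found S) divided by the standard deviation of random S, and the toy terms with an absolute-difference distance stand in for GO terms and the GO tree distance.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def total_distance(terms, dist):
    """Sum of pairwise distances between the terms of one set."""
    return sum(dist(a, b) for a, b in combinations(terms, 2))


def overall_score(sets, dist):
    """Overall score S: total within-set distance summed over all sets."""
    return sum(total_distance(s, dist) for s in sets)


def evaluate_clustering(clusters, dist, repeats=1000, seed=0):
    """Compare the found clusters with random sets of the same sizes."""
    rng = random.Random(seed)
    terms = [t for cluster in clusters for t in cluster]
    sizes = [len(cluster) for cluster in clusters]
    found = overall_score(clusters, dist)
    random_scores = []
    for _ in range(repeats):
        shuffled = rng.sample(terms, len(terms))
        sets, start = [], 0
        for size in sizes:
            sets.append(shuffled[start:start + size])
            start += size
        random_scores.append(overall_score(sets, dist))
    # Larger z means the found clusters are tighter than random groupings.
    z = (mean(random_scores) - found) / pstdev(random_scores)
    return found, z


# Toy "terms" on a line; the distance is the absolute difference.
clusters = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
found, z = evaluate_clustering(clusters, lambda a, b: abs(a - b))
```

In practice `dist` would be the GO tree distance between two terms, and the z-score could be compared across different numbers of clusters k.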

30
Structure of GO Term Tree (Example)
GO:0008150 biological process
Level 1
GO:0007275 development
GO:0007582 physiological process
Level 2
GO:0007389 pattern specification
GO:0008152 metabolism
Level 3
GO:0000003 reproduction
GO:0009798 axis specification
Level 4
Level 5
GO:0009948 anterior/posterior axis specification
  • Measuring Distance of GO Terms
  • -- length of the minimal chain
    between X and Y terms in GO tree
  • -- is length of maximal chain from the top
    to the bottom
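
The two chain lengths named above can be computed on a small tree as follows. This is a sketch under assumptions: the parent links are read off the example tree (GO:0000003 is left out because its parent is not clear from the slide), the real GO is a directed acyclic graph rather than a strict tree, and how the two lengths are combined into the final distance is not recoverable from the transcript, so that step is not shown.

```python
# Child -> parent links, assumed from the levels of the example tree.
PARENT = {
    "GO:0007275": "GO:0008150",  # development -> biological process
    "GO:0007582": "GO:0008150",  # physiological process -> biological process
    "GO:0007389": "GO:0007275",  # pattern specification -> development
    "GO:0008152": "GO:0007582",  # metabolism -> physiological process
    "GO:0009798": "GO:0007389",  # axis specification -> pattern specification
    "GO:0009948": "GO:0009798",  # anterior/posterior axis spec. -> axis spec.
}


def path_to_root(term):
    """Return the term followed by all of its ancestors up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def chain_length(x, y):
    """Number of edges on the minimal chain between terms x and y."""
    steps_from_y = {term: i for i, term in enumerate(path_to_root(y))}
    for steps_from_x, term in enumerate(path_to_root(x)):
        if term in steps_from_y:  # first common ancestor
            return steps_from_x + steps_from_y[term]
    raise ValueError("terms do not share a root")


def max_chain_length():
    """Number of edges on the longest chain from the top to the bottom."""
    return max(len(path_to_root(term)) - 1 for term in PARENT)


d = chain_length("GO:0009948", "GO:0008152")  # the two terms meet only at the root
```

For the example tree, sibling or parent-child terms give short chains, while terms in different branches are connected only through the root.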

31
Determination of Number of Clusters
  • Measured
  • Larger z-score indicates a better grouping of
    functions within clusters.

32
Number of Clusters vs Z-score: Results for
Plasmodium Falciparum
Plasmodium Falciparum biological processes: number
of clusters vs z-scores
Plasmodium Falciparum molecular functions: number
of clusters vs z-scores
33
Applying FEP Clustering Package: Results on
Plasmodium Falciparum Processes
k-means clustering profiles of FEPs for 238
identified processes
(Figure: clusters 1 to 4; cluster vs stage of IDC)
34
Applying FEP Clustering Package: Results on
Plasmodium Falciparum Functions
k-means clustering profiles of FEPs for 199
identified molecular functions
(Figure: clusters 1 to 4; cluster vs stage of IDC)
35
GO Trees of Functions: 4 Clusters of Plasmodium
Falciparum
36
Statistical Evaluation: Found vs. Random Clusters
for P. Falciparum
(Figure panels: Biological Processes, Molecular Functions;
found clusters are marked in each panel)
  • Larger distance from the found clusters to the
    random clusters for biological processes.
  • Random clusters for biological processes have
    smaller variance

37
Statistical Evaluation: Clustering All GO Terms
for P. Falciparum
  • Clustering all GO terms leads to a smaller
    z-score, which means that the clusters are of
    worse quality
  • Right figure is the P.F. functional clustering
    result. The z-score is 8.5, compared to 12 for
    clustering correlated GO terms only

found clusters
38
Statistical Evaluation: Found vs. Random Clusters
for S. Cerevisiae and Homo Sapiens
(Figure panels: Yeast Processes, Human Processes, Yeast
Functions, Human Functions; found clusters are marked in
each panel)
39
Remarks
  • Statistical significance of identified clusters
    (separation between clusters and random
    groupings) is increased by
  • Normalizing data (Plasmodium Falciparum)
  • Eliminating noise through singular value
    decomposition (SVD)
  • Reducing data through Principal Component
    Analysis

40
Conclusions
  • Proposed microarray tools help identify
  • genes with the same function and correlated
    expression profiles
  • genes with similar functions and similar
    expression profiles
  • Measuring GO tree based distance was useful for
    evaluating biological relevance of clusters;
    however,
  • many GO terms have only 1 associated gene
  • many genes do not even have a GO term
  • parenthood and sibling relationships in GO trees
    should be differentiated, with a smaller penalty
    for sibling relationships than for parenthood
  • More robust clustering methods could be used

41
Thank You!
More information: www.ist.temple.edu/research/biocore.html
Contact: Zoran Obradovic, director, IST Center, Temple
University, 215 204-6265, zoran_at_ist.temple.edu