1
FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS
  • Jayesh Pandey, Mehmet Koyuturk, Wojciech
    Szpankowski, and Ananth Grama.
  • PURDUE UNIVERSITY
  • DEPARTMENT OF COMPUTER SCIENCE
  • Supported by the National Institutes of Health

2
GENE REGULATION
  • Gene expression is the process of synthesizing a
    functional protein coded by the corresponding
    gene
  • Genes (and their products) regulate the extent of
    each other's expression
  • Any step of gene expression can be modulated
  • Transcription, translation, post-transcriptional
    modification, RNA transport, mRNA degradation

[Figure: Ligand-independent transcriptional regulation at chromatin level]
3
GENE REGULATORY NETWORKS
  • Model the organization of regulatory interactions
    in the cell
  • Genes are nodes, regulatory interactions are
    directed edges
  • Boolean network model: edges are signed,
    indicating up-regulation (promotion) and
    down-regulation (suppression)
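A minimal sketch of this representation, assuming edges are given as (regulator, target, sign) triples with sign +1 for up-regulation and -1 for down-regulation. The Boolean update rule below (a regulated gene is ON when at least one active regulator promotes it and no active regulator suppresses it) is one common convention chosen for illustration, not necessarily the rule used in this work; the gene names are hypothetical.

```python
from typing import Dict, List, Tuple

# (regulator, target, sign): +1 = up-regulation (promotion), -1 = down-regulation (suppression)
Edge = Tuple[str, str, int]
Inputs = Dict[str, List[Tuple[str, int]]]


def build_network(edges: List[Edge]) -> Inputs:
    """Map every gene to the list of (regulator, sign) pairs acting on it."""
    inputs: Inputs = {}
    for regulator, target, sign in edges:
        if sign not in (1, -1):
            raise ValueError(f"edge sign must be +1 or -1, got {sign}")
        inputs.setdefault(regulator, [])  # regulators are nodes too
        inputs.setdefault(target, []).append((regulator, sign))
    return inputs


def boolean_step(inputs: Inputs, state: Dict[str, bool]) -> Dict[str, bool]:
    """One synchronous update under an assumed rule: a regulated gene is ON iff
    some active regulator promotes it and no active regulator suppresses it.
    Genes with no regulators keep their current value."""
    new_state: Dict[str, bool] = {}
    for gene, regs in inputs.items():
        if not regs:
            new_state[gene] = state[gene]
            continue
        promoted = any(state[r] for r, s in regs if s == 1)
        suppressed = any(state[r] for r, s in regs if s == -1)
        new_state[gene] = promoted and not suppressed
    return new_state


# Hypothetical three-gene cascade: g1 promotes g2, g2 suppresses g3.
net = build_network([("g1", "g2", 1), ("g2", "g3", -1)])
print(boolean_step(net, {"g1": True, "g2": False, "g3": True}))
# {'g1': True, 'g2': True, 'g3': False}
```

Keeping the sign on each edge, rather than in two separate graphs, lets the same adjacency structure serve both for Boolean simulation and for the path-based analysis on later slides.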

[Figure: Flowering time in Arabidopsis]
4
MOLECULAR ANNOTATION
  • Similar systems involving different molecules
    (genes, proteins) in different species
  • Functional annotation of genes provides a
    unified understanding of the underlying
    principles
  • Molecular function: what is the role of a gene?
  • Biological process: in which processes is a gene
    involved?
  • Cellular component: where is a gene's product
    localized?
  • The Gene Ontology (GO) provides a library of
    molecular annotations
  • We refer to each annotation class as a functional
    attribute

5
FROM MOLECULES TO SYSTEMS
  • Networks are species-specific
  • Annotation is at the molecular level
  • Map networks from gene space to function space
  • Can generate a library of annotated modular
    (sub-) networks

[Figure: network of Gene Ontology terms based on the significance of pairwise interactions in the yeast synthetic genetic array (SGA) network (Tong et al., Science, 2004)]
6
INDIRECT REGULATION
  • Assessing pairwise interactions is simple but not
    adequate: regulation often acts indirectly, through
    intermediate genes

[Figure: two example networks over genes g1 to g5 illustrating indirect regulation]
7
FUNCTIONAL ATTRIBUTE NETWORKS
  • Multigraph model
  • A gene is associated with multiple functional
    attributes
  • A functional attribute is associated with
    multiple genes
  • Functional attributes are represented by nodes
  • Genes are represented by ports, reflecting
    context

[Figure panels: functional attribute network; gene network]
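A minimal sketch of this mapping, assuming the gene network is a list of directed (regulator, target) edges and the annotation is a dictionary from each gene to its set of functional attributes (both hypothetical inputs). Each gene-level edge u → v contributes one parallel edge a → b for every attribute a of u and every attribute b of v, and the gene pair is kept on that edge as its port, i.e., its context.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

GeneEdge = Tuple[str, str]


def attribute_multigraph(
    gene_edges: List[GeneEdge], annotation: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], List[GeneEdge]]:
    """Map each attribute pair (a, b) to the gene edges (ports) that realize it.

    Genes without any annotation contribute no edges."""
    multi: Dict[Tuple[str, str], List[GeneEdge]] = defaultdict(list)
    for u, v in gene_edges:
        for a in sorted(annotation.get(u, set())):
            for b in sorted(annotation.get(v, set())):
                multi[(a, b)].append((u, v))
    return dict(multi)


# Hypothetical genes and attributes
annotation = {"g1": {"A"}, "g2": {"B", "C"}, "g3": {"C"}}
graph = attribute_multigraph([("g1", "g2"), ("g2", "g3")], annotation)
print(graph[("A", "B")], graph[("C", "C")])  # [('g1', 'g2')] [('g2', 'g3')]
```

Because several gene edges can map to the same attribute pair, the result is a multigraph: the length of the list stored for (a, b) is the number of parallel edges between a and b.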
8
FREQUENCY OF A MULTIPATH
  • A pathway of functional attributes occurs in
    various contexts in the gene network
  • Each such pathway is a multipath in the
    functional attribute network

[Figure: the frequency of the multipath is 4 in the left network and 0 in the right network]
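A minimal sketch of counting the frequency of a multipath, under two assumptions the slide does not state: the frequency is the number of gene paths g1 → ... → gk whose i-th gene carries the i-th attribute, and a gene may not repeat within one occurrence. The inputs and names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def multipath_frequency(
    gene_edges: List[Tuple[str, str]],
    annotation: Dict[str, Set[str]],
    attr_path: Sequence[str],
) -> int:
    """Count gene paths whose i-th gene is annotated with attr_path[i]."""
    if not attr_path:
        return 0
    succ: Dict[str, List[str]] = {}
    for u, v in gene_edges:
        succ.setdefault(u, []).append(v)

    def completions(gene: str, i: int, visited: Set[str]) -> int:
        # `gene` already matches attr_path[i]; count the ways to finish the path.
        if i == len(attr_path) - 1:
            return 1
        total = 0
        for nxt in succ.get(gene, []):
            if nxt not in visited and attr_path[i + 1] in annotation.get(nxt, set()):
                total += completions(nxt, i + 1, visited | {nxt})
        return total

    return sum(
        completions(g, 0, {g}) for g, attrs in annotation.items() if attr_path[0] in attrs
    )


# Hypothetical example: two A-genes feed two B-genes, which feed one C-gene.
annotation = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"}, "g4": {"B"}, "g5": {"C"}}
edges = [("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g3", "g5"), ("g4", "g5")]
print(multipath_frequency(edges, annotation, ["A", "B", "C"]))  # 3
```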
9
SIGNIFICANCE OF A PATHWAY
  • We want to identify multipaths with unusual
    frequency
  • These might correspond to modular pathways
  • Frequency alone is not a good measure of
    statistical significance
  • The distribution of functional attributes among
    genes is highly skewed
  • The degree distribution in the gene network is
    highly skewed
  • Pathways that contain common functional
    attributes have high frequency, but they are not
    necessarily interesting

10
STATISTICAL INTERPRETABILITY
  • Additional positive observation → increased
    significance
  • Additional negative observation → decreased
    significance

[Figure: patterns A and B in two examples; P(B) < P(A) in one, P(B) > P(A) in the other]
Frequency is not statistically interpretable!
11
MONOTONICITY
  • Frequency is a monotonic measure
  • If a pathway is frequent, then all of its
    sub-paths are frequent
  • Algorithmic advantage: enumerate all frequent
    patterns in a bottom-up fashion
  • Commonly exploited in traditional data mining
    applications
  • Statistically interpretable measures are not
    monotonic!
  • Statistical significance fluctuates in the search
    space
  • Existing data mining algorithms do not apply
  • The significance of pathways is non-monotonic in
    two dimensions: GO hierarchy and pathway length
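A minimal sketch of the bottom-up (level-wise) enumeration that monotonicity enables: a pathway is extended only if it is already frequent, so infrequent branches are never explored. To make this pruning provably safe, the sketch swaps in a support measure that cannot increase when a path is extended: the minimum, over positions, of the number of distinct genes occupying that position in some occurrence (often called minimum-image support in graph mining). That measure, the right-extension-only growth, and all names are assumptions for illustration, not the authors' definition of frequency.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Path = Tuple[str, ...]


def occurrences(succ, annotation, attr_path: Path) -> Iterator[Tuple[str, ...]]:
    """Yield every gene path (no repeated genes) matching attr_path position by position."""

    def extend(prefix: List[str]) -> Iterator[Tuple[str, ...]]:
        i = len(prefix)
        if i == len(attr_path):
            yield tuple(prefix)
            return
        candidates = succ.get(prefix[-1], []) if prefix else list(annotation)
        for g in candidates:
            if g not in prefix and attr_path[i] in annotation.get(g, set()):
                yield from extend(prefix + [g])

    yield from extend([])


def min_image_support(succ, annotation, attr_path: Path) -> int:
    """Smallest number of distinct genes seen at any position over all occurrences."""
    images: List[Set[str]] = [set() for _ in attr_path]
    for occ in occurrences(succ, annotation, attr_path):
        for i, g in enumerate(occ):
            images[i].add(g)
    return min(len(s) for s in images)


def frequent_paths(gene_edges, annotation, min_support: int, max_len: int) -> List[Path]:
    succ: Dict[str, List[str]] = defaultdict(list)
    for u, v in gene_edges:
        succ[u].append(v)
    attributes = sorted({a for attrs in annotation.values() for a in attrs})
    level = [
        (a,) for a in attributes if min_image_support(succ, annotation, (a,)) >= min_support
    ]
    found = list(level)
    while level and len(level[0]) < max_len:
        # Only frequent paths are extended: by anti-monotonicity, no extension
        # of an infrequent path can be frequent.
        level = [
            path + (a,)
            for path in level
            for a in attributes
            if min_image_support(succ, annotation, path + (a,)) >= min_support
        ]
        found.extend(level)
    return found


# Hypothetical example
annotation = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"}, "g4": {"B"}, "g5": {"C"}}
edges = [("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g3", "g5"), ("g4", "g5")]
print(frequent_paths(edges, annotation, min_support=2, max_len=3))
# [('A',), ('B',), ('A', 'B')]
```

This kind of pruning is exactly what breaks once the threshold is placed on a p-value instead of a support count, since significance can rise again after dropping.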

12
GO HIERARCHY
  • Functional attributes are organized in a
    hierarchical manner
  • 'regulation of steroid biosynthetic process' is a
    'regulation of steroid metabolic process' and is
    part of 'steroid biosynthetic process'
  • Interpretable statistical measures are not
    monotonic with respect to GO hierarchy

[Figure: p-values of a pathway over example genes g1 to g5, compared at different levels of the GO hierarchy]
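The hierarchy matters because an annotation to a GO term implies annotation to all of its ancestors along is_a and part_of links. A minimal sketch of that propagation, assuming the hierarchy is given as a dictionary from each term to its parents; the two relations come from the slide's example, and the gene names are hypothetical. Moving a term up the hierarchy can only enlarge its gene set, so frequency grows, while statistical significance need not move in the same direction.

```python
from typing import Dict, Iterable, Set

Parents = Dict[str, Iterable[str]]


def ancestors(term: str, parents: Parents) -> Set[str]:
    """All terms reachable from `term` through is_a / part_of parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotation: Dict[str, Set[str]], parents: Parents) -> Dict[str, Set[str]]:
    """Add every ancestor term to each gene's annotation."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotation.items()
    }


def genes_for(term: str, annotation: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated with `term`."""
    return {g for g, terms in annotation.items() if term in terms}


parents = {
    # relations taken from the slide's example
    "regulation of steroid biosynthetic process": [
        "regulation of steroid metabolic process",  # is_a
        "steroid biosynthetic process",  # part_of
    ],
}
annotation = {  # hypothetical genes
    "geneX": {"regulation of steroid biosynthetic process"},
    "geneY": {"regulation of steroid metabolic process"},
}
full = propagate(annotation, parents)
print(sorted(genes_for("regulation of steroid metabolic process", full)))  # ['geneX', 'geneY']
```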
13
PATHWAY LENGTH
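[Figure: p-values of a pathway and of its extensions compared; the inequality goes one way in one example and the opposite way in the other]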
  • Open problems
  • How can we effectively search in the pathway
    space, where significance fluctuates?
  • How can we find optimal resolution in functional
    attribute space?

14
STATISTICAL MODEL
  • Emphasize modularity of pathways
  • Condition on frequency of building blocks!
  • We denote each frequency random variable by N and
    its realization by n
  • Significance of pathway p123:
  • $p_{123} = P(N_{123} \ge n_{123} \mid N_{12} = n_{12}, N_{23} = n_{23}, N_1 = n_1, N_2 = n_2, N_3 = n_3)$

[Figure: pathway p123 through attributes 1, 2 and 3, labeled with the frequency variables N1, N2, N3, N12, N23 and N123]
15
SIGNIFICANCE OF A PATHWAY
  • Assume that regulatory interactions are
    independent
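  • There are $n_{12}$ occurrences of $p_{12}$ and $n_{23}$
    occurrences of $p_{23}$
  • The probability that a given pair of these
    occurrences goes through the same gene is $1/n_2$
  • The probability that at least $n_{123}$ of the
    $n_{12} n_{23}$ pairs of edges go through the same
    gene can be bounded by
  • $p_{123} \le \exp(n_{12} n_{23} H_q(t))$, where $q = 1/n_2$ and
    $t = n_{123} / (n_{12} n_{23})$
  • $H_q(t) = t \log(q/t) + (1-t) \log((1-q)/(1-t))$ is the
    weighted entropy of t with respect to q, i.e., the
    negated relative entropy between Bernoulli(t) and
    Bernoulli(q); the bound is informative when $t > q$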
  • Can be generalized to pathways of arbitrary length
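A minimal numeric sketch of this bound with hypothetical counts. For comparison it also computes the exact binomial tail obtained by treating the $n_{12} n_{23}$ pairs as independent Bernoulli(q) trials, a simplifying assumption suggested by the independence model above; the bound is the Chernoff bound on that tail, so it should never fall below it.

```python
import math


def weighted_entropy(t: float, q: float) -> float:
    """H_q(t) = t log(q/t) + (1 - t) log((1 - q)/(1 - t)), with 0 log(.) taken as 0."""
    h = t * math.log(q / t) if t > 0 else 0.0
    if t < 1:
        h += (1 - t) * math.log((1 - q) / (1 - t))
    return h


def pathway_pvalue_bound(n12: int, n23: int, n123: int, n2: int) -> float:
    """Upper bound exp(n12 n23 H_q(t)) on p123, with q = 1/n2 and t = n123/(n12 n23)."""
    pairs = n12 * n23
    if not 0 <= n123 <= pairs:
        raise ValueError("n123 must lie between 0 and n12 * n23")
    q = 1.0 / n2
    t = n123 / pairs
    if t <= q:
        return 1.0  # at or below the expected rate the bound carries no information
    return math.exp(pairs * weighted_entropy(t, q))


def binomial_tail(m: int, k: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(m, q)."""
    return sum(math.comb(m, i) * q**i * (1 - q) ** (m - i) for i in range(k, m + 1))


# Hypothetical counts: 10 and 8 occurrences of p12 and p23, 20 genes carry
# attribute 2, and 15 of the 80 edge pairs share their middle gene (4 expected).
bound = pathway_pvalue_bound(10, 8, 15, 20)
exact = binomial_tail(10 * 8, 15, 1 / 20)
print(f"bound = {bound:.2e}, binomial tail = {exact:.2e}")
```

The closed-form bound avoids summing a long tail for every candidate pathway, which matters when many pathways must be scored.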

16
SIGNIFICANCE OF AN EDGE
  • A single regulatory interaction is the shortest
    pathway
  • Statistical significance is evaluated with
    respect to a baseline model
  • The number of edges leaving and entering each
    functional attribute is specified
  • Edges are assumed to be independent
  • The frequency of a regulatory interaction is a
    hypergeometric random variable
  • Can derive a similar bound for the p-value of a
    single regulatory interaction
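A minimal sketch of the single-edge p-value under one natural reading of this baseline model (an assumption): there are E gene-level edges in total, d_in of them enter attribute j, and the d_out edges leaving attribute i behave like a uniform draw without replacement, so the number N_ij that land on j is hypergeometric. Parameter names are hypothetical, and the exact tail sum stands in for the slide's bound.

```python
from math import comb


def edge_pvalue(n_ij: int, d_out_i: int, d_in_j: int, total_edges: int) -> float:
    """P(N_ij >= n_ij) for N_ij ~ Hypergeometric(total_edges, d_in_j, d_out_i)."""
    if not (0 <= d_out_i <= total_edges and 0 <= d_in_j <= total_edges):
        raise ValueError("degrees must lie between 0 and total_edges")
    upper = min(d_out_i, d_in_j)
    numerator = sum(
        comb(d_in_j, k) * comb(total_edges - d_in_j, d_out_i - k)
        for k in range(max(n_ij, 0), upper + 1)
    )
    return numerator / comb(total_edges, d_out_i)


# Hypothetical: 10 edges in total, 3 leave attribute i, 4 enter attribute j,
# and all 3 edges leaving i enter j.
print(edge_pvalue(3, 3, 4, 10))  # 0.03333333333333333 (= 4/120)
```

With SciPy installed, `scipy.stats.hypergeom.sf(n_ij - 1, total_edges, d_in_j, d_out_i)` computes the same tail.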

17
ALGORITHMIC ISSUES
  • Significance is not monotonic
  • Need to enumerate all pathways?
  • Strongly significant pathways
  • A pathway is strongly significant if all its
    building blocks are significant (defined
    recursively)
  • Allows effective pruning of the search space
  • Shortcutting common functional attributes
  • Transcription factors, DNA binding genes, etc.
    are responsible for mediating regulation
  • Shortcut these terms, consider regulatory effect
    of different processes on each other directly
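A minimal sketch of both ideas, assuming pathways are tuples of attributes, length counts attributes (so a single edge has length 2), and a pathway's building blocks are the two sub-paths obtained by dropping its first or its last attribute. `is_significant` stands in for whichever test is used (for instance, the bounds from the preceding slides) and, like the other names here, is hypothetical. Because a candidate is tested only when both building blocks survived, whole branches of the search space are never visited.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Path = Tuple[str, ...]
Edge = Tuple[str, str]


def successors(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    succ: Dict[str, Set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    return succ


def strongly_significant_paths(
    edges: Set[Edge], is_significant: Callable[[Path], bool], max_len: int
) -> List[Path]:
    """Level-wise search: a path is kept only if it is significant and both of its
    building blocks were kept at the previous level."""
    succ = successors(edges)
    level: Set[Path] = {e for e in edges if is_significant(e)}  # length-2 pathways
    found = sorted(level)
    length = 2
    while level and length < max_len:
        nxt: Set[Path] = set()
        for path in level:  # the prefix building block is strongly significant
            for c in succ.get(path[-1], ()):
                cand = path + (c,)
                if cand[1:] in level and is_significant(cand):  # check the suffix block
                    nxt.add(cand)
        found.extend(sorted(nxt))
        level, length = nxt, length + 1
    return found


def shortcut(edges: Set[Edge], common: Set[str]) -> Set[Edge]:
    """Drop common (mediator) attributes, linking each remaining attribute directly
    to every non-common attribute it reaches through a chain of common ones."""
    succ = successors(edges)
    result: Set[Edge] = set()
    for u in {a for a, _ in edges} - common:
        stack, seen = list(succ.get(u, ())), set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in common:
                stack.extend(succ.get(v, ()))
            else:
                result.add((u, v))
    return result


# Hypothetical significance results: (C, D) fails, so (B, C, D) is excluded
# even though it would pass the test on its own.
sig = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "B", "C"), ("B", "C", "D"), ("A", "B", "C", "D")}
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")}
print(strongly_significant_paths(edges, lambda p: p in sig, max_len=4))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
print(sorted(shortcut({("A", "TF"), ("TF", "B"), ("A", "C")}, common={"TF"})))
# [('A', 'B'), ('A', 'C')]
```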

18
NARADA (http://www.cs.purdue.edu/homes/jpandey/narada/)
  • A software tool for identifying significant
    pathways
  • Queries
  • Given functional attribute T, find all
    significant pathways that originate at T
  • Given functional attribute T, find all
    significant pathways that terminate at T
  • Given a sequence of functional attributes T1, T2,
    ..., Tk, find all occurrences of the corresponding
    pathway
  • Identified pathways are displayed as a tree
  • User can explore back and forth between the gene
    network and the functional attribute network
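One way to organize query results as a tree, sketched under assumptions: pathways are tuples of attributes, the tree is a nested dictionary keyed by attribute, and the "terminate at T" query reuses the same code on reversed pathways. This is a hypothetical illustration, not NARADA's implementation.

```python
from typing import Dict, Iterable, List, Tuple

Tree = Dict[str, "Tree"]


def pathway_tree(paths: Iterable[Tuple[str, ...]], root: str, reverse: bool = False) -> Tree:
    """Prefix tree of the pathways that originate at `root`
    (or terminate at `root` when reverse=True)."""
    tree: Tree = {}
    for p in paths:
        p = tuple(reversed(p)) if reverse else tuple(p)
        if p and p[0] == root:
            node = tree
            for attr in p[1:]:
                node = node.setdefault(attr, {})
    return tree


def render(tree: Tree, root: str, depth: int = 0) -> List[str]:
    """Indented text rendering, one attribute per line."""
    lines = ["  " * depth + root]
    for child, subtree in sorted(tree.items()):
        lines.extend(render(subtree, child, depth + 1))
    return lines


# Hypothetical significant pathways
paths = [("T", "A"), ("T", "A", "B"), ("T", "C"), ("X", "T")]
print("\n".join(render(pathway_tree(paths, "T"), "T")))
```

Sharing prefixes in a tree keeps the display compact when many significant pathways start with the same few attributes.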

19
RESULTS
  • E. coli transcription network obtained from
    RegulonDB
  • 3159 regulatory interactions between 1364 genes
  • Using Gene Ontology, 881 of these genes are
    mapped to 318 processes

Pathway length             2     3     4     5
All                      427   580  1401   942
Strongly significant     427   208   183   142
Common terms shortcut    184   119     3     1
20
MOLYBDATE ION TRANSPORT
[Figure panels: significant regulatory pathways that originate at molybdate ion transport; their occurrences in the gene network]
21
WHAT IS SIGNIFICANT?
  • Molybdate ion transport regulates various
    processes directly
  • Mo-molybdopterin cofactor biosynthesis,
    oligopeptide transport, cytochrome complex
    assembly
  • It regulates various other processes indirectly
  • Through DNA-dependent regulation of
    transcription, two-component signal transduction
    system, nitrate assimilation
  • Direct regulation of these mediator processes is
    not significant
  • NARADA captures modularity of indirect
    regulation!

22
CONCLUSION
  • Mapping gene regulatory networks to functional
    attribute space demonstrates great potential
  • Abstract, unified understanding of regulatory
    systems
  • Algorithmically, a wide range of new challenges
  • How can we bound interpretable statistical
    measures?
  • How can we handle hierarchy in functional
    attribute space?
  • Discovering new information
  • How can we project identified canonical
    patterns on other species to discover new
    regulatory relationships?