Title: How are navigation in networks and splicing in parasites related?
1How are navigation in networks and splicing in
parasites related?
- Shai Carmi, Bar-Ilan University, Department of Physics and the Faculty of Life Sciences
Summer 2010, USA
2Navigation in networks with local information
- Navigation is important in communication networks, transportation networks, and social networks.
- Knowledge of the entire network is usually not feasible.
- Use greedy navigation.
The Internet at the Autonomous Systems level: Carmi et al., PNAS 104, 11150 (2007)
MapQuest
Boguñá and Krioukov, PRL 102, 058701 (2009)
3Scale-free networks
- Nomenclature: in a network (graph), links (edges) connect nodes (vertices). The degree of a node, $k$, is its number of links.
- In the last decade, measurements showed that almost all natural networks are scale-free.
- Nodes in scale-free networks have degrees spanning many orders of magnitude, including nodes with an extremely large number of links (hubs).
- Degree distribution: $P(k) \sim k^{-\gamma}$.
- Small $\gamma$: the network is highly heterogeneous; many hubs exist.
- Large $\gamma$: the network is homogeneous, with fewer hubs, similar to purely random networks.
4Navigation models
- Navigating to the hub. S. Carmi, P. L. Krapivsky, and D. ben-Avraham, Physical Review E 78, 066111 (2008).
- Kleinberg's navigation model. S. Carmi, S. Carter, J. Sun, and D. ben-Avraham, Physical Review Letters 102, 238702 (2009).
5How to find the most connected node in the network: an algorithm
- Start from a given node.
- Go to the neighbor with the highest degree (break ties arbitrarily).
- Keep going until reaching a peak: a node whose degree is greater than the degrees of all of its neighbors.
- Only knowledge of the neighbors' degrees is required!
Basins of attraction are formed around each hub.
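The search rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the adjacency-dictionary representation, the function names `find_peak` and `basins`, and the toy graph are all assumptions made for the example. A node with no neighbor of strictly higher degree is treated as a peak.

```python
from collections import Counter

def find_peak(adj, start):
    """Follow the highest-degree neighbor from `start` until a peak is reached."""
    node = start
    while True:
        # Neighbor of highest degree; max() breaks ties arbitrarily.
        best = max(adj[node], key=lambda v: len(adj[v]), default=node)
        if len(adj[best]) <= len(adj[node]):
            return node  # no neighbor of strictly higher degree: a peak
        node = best

def basins(adj):
    """Map each peak to the number of nodes attracted to it."""
    return Counter(find_peak(adj, v) for v in adj)

# Hypothetical toy graph: hub 0 and a smaller hub 4, bridged by node 7.
adj = {
    0: {1, 2, 3, 7},
    1: {0}, 2: {0}, 3: {0},
    7: {0, 4},
    4: {7, 5, 6},
    5: {4}, 6: {4},
}
print(basins(adj))
```

Each step only inspects the degrees of the current node's neighbors, so the walk uses local information only. The degree increases strictly along the walk, which guarantees termination.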
6Example
Courtesy of Hernan Rozenfeld
7Who cares?
- Practical interest: fast message routing to the most connected node (for example, in wireless sensor networks).
- Theoretical interest:
  - A new decomposition procedure based on association to hubs.
  - The number and sizes of basins can be used to characterize networks.
Rao et al., JMB 2004
8Basins distribution in scale-free networks
- How does the basins' topology depend on the degree exponent $\gamma$ ($P(k) \sim k^{-\gamma}$)?
- For $\gamma < \gamma_c = 3$ the largest hub attracts all nodes, forming a giant basin.
- For larger $\gamma$ the network is fragmented into numerous basins whose size distribution decays as a power law.
- Mathematically:
  - The size of the largest basin scales as $S \sim N^{\delta}$, where $\delta = 1$ for $\gamma < \gamma_c$ and $\delta = 1/(\gamma - 1)$ for large $\gamma$.
  - The probability of a node to belong to a basin of size $s$ is $Q(s) \sim s^{-\alpha}$ for small $s$, with $\alpha \approx \gamma - 1$.
The giant basin
9Theory: a transition at $\gamma = 3$
- We prove that the probability of a node of degree $k$ to be a peak is approximately $\exp(-A k^{3-\gamma})$, where $A$ is a $k$-independent constant.
- For $\gamma < 3$, the probability approaches zero: only the true hub is a peak.
- For $\gamma > 3$, many nodes with large degree will be peaks.
- For large $\gamma$, we prove that the size of the largest basin scales as $S \sim N^{1/(\gamma - 1)}$.
- The first two moments of the number of basins and the number of solitary basins can be approximated analytically.
10Deterministic fractal (u,v)-nets
- The behavior $Q(s) \sim s^{-\alpha}$ can be explained using a fractal scale-free network model.
- Each link in generation $n$ splits into $u + v \equiv w$ links in generation $n + 1$.
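One way to picture the link-splitting rule is the sketch below. It assumes the usual (u, v)-flower construction, in which every link is replaced by two parallel paths of u and v links. That construction, the single-link seed, and the function name `uv_net` are assumptions, since the slide only states that each link becomes u + v links.

```python
def uv_net(u, v, generations):
    """Edge list of a (u, v)-net grown from a single link (assumed seed)."""
    edges = [(0, 1)]
    next_id = 2
    for _gen in range(generations):
        new_edges = []
        for a, b in edges:
            # Replace the link a-b by two parallel paths of u and v links.
            for length in (u, v):
                prev = a
                for _step in range(length - 1):
                    new_edges.append((prev, next_id))
                    prev = next_id
                    next_id += 1
                new_edges.append((prev, b))
        edges = new_edges
    return edges

edges = uv_net(1, 2, 3)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(len(edges), "links,", len(degree), "nodes, max degree", max(degree.values()))
```

In this construction the number of links grows by a factor of w = u + v per generation, and the degree of every existing node doubles, which is what produces hubs.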
11A short summary
- Greedy search for the most connected node partitions the network into basins of attraction.
- For scale-free networks with $\gamma < 3$, a giant basin exists (and thus greedy search works).
- For $\gamma > 3$, there are many basins (corresponding to the network modules).
- The transition at $\gamma = 3$ and the power-law distribution of small basin sizes can be analytically explained.
- The Internet and the glass network have a giant basin.
12Generalization to lattices
- All degrees are equal.
- A node's importance is determined by its height or energy.
- Assume each node is attracted to its lowest neighbor.
- Basins of attraction have a simple physical interpretation.
Figure labels: valleys, peaks, and saddles of the surface.
13A fun exercise in probability
- The number of valleys
- $R(s)$: the probability of a node to be the valley of a basin of size $s$.
- In 1D, $R(1) = 1/30$, and $R(s)$ decays as $1/s!$, much faster than the power law for networks.
- In 2D, the density of peaks and valleys is 1/5 each, and of saddles 1/15.
- $R(1) = 109/4290$.
- Density of craters is 3/715.
- Density of ridges is 1/20.
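The 1D numbers can be checked with a short simulation. The sketch below assumes a ring of nodes with independent uniform random heights, where each node is attracted to its lower neighbor if that neighbor is lower than the node itself. The periodic boundary, the function name `basin_sizes_1d`, and the sample size are illustrative choices.

```python
import random
from collections import Counter

def basin_sizes_1d(n, seed=0):
    """Basin sizes on a ring of n nodes with i.i.d. random heights."""
    rng = random.Random(seed)
    h = [rng.random() for _ in range(n)]
    # Each node points to its lowest neighbor if that neighbor is lower; else it is a valley.
    target = []
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        low = left if h[left] < h[right] else right
        target.append(low if h[low] < h[i] else i)
    sizes = Counter()
    for i in range(n):
        j = i
        while target[j] != j:  # heights strictly decrease, so this terminates
            j = target[j]
        sizes[j] += 1
    return sizes  # valley index -> basin size

n = 200_000
sizes = basin_sizes_1d(n)
valley_density = len(sizes) / n
r1 = sum(1 for s in sizes.values() if s == 1) / n
print(f"valley density ~ {valley_density:.4f}, R(1) ~ {r1:.4f}")
```

A node is a valley when it is the lowest of three consecutive heights, so the valley density should come out near 1/3. The estimate of R(1) can be compared with the value 1/30 quoted on the slide.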
14The navigation problem and the Kleinberg model
- We know short paths exist in social networks (six degrees of separation). But how do people find them?
- The Kleinberg model (Nature 406, 845 (2000)):
  - An underlying lattice.
  - One long-range link for each node.
  - A long-range link has length $r$ with probability $\sim r^{-\alpha}$.
  - Greedy navigation: the message is always sent to the neighbor geographically nearest to the destination.
- Kleinberg proved ($T$: delivery time; $d$: dimension; $L$: lattice linear size):
  - For $\alpha = d$, $T \sim \ln^2 L$.
  - For $\alpha \neq d$, $T \sim L^x$ for some exponent $x$.
- For $\alpha = d$, greedy navigation can find short paths!
- An accurate expression for the delivery time was an open problem for 9 years.
- We prove such an expression.
- We also show that short paths can be found for $\alpha \neq d$ if messages can be lost.
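Greedy navigation in this model can be sketched in 1D as follows. This is an illustration under simplifying assumptions, not the analysis from the cited papers: the lattice is a ring of n nodes, link lengths are drawn from 1 to n/2 with weight r^(-alpha), and the names `make_long_links` and `greedy_delivery_time` are made up for the example.

```python
import random

def ring_distance(a, b, n):
    d = abs(a - b)
    return min(d, n - d)

def make_long_links(n, alpha, rng):
    """Give each node one long-range contact at distance r with weight r**(-alpha)."""
    lengths = list(range(1, n // 2 + 1))
    weights = [r ** (-alpha) for r in lengths]
    drawn = rng.choices(lengths, weights=weights, k=n)
    return [(i + rng.choice((-1, 1)) * r) % n for i, r in enumerate(drawn)]

def greedy_delivery_time(n, links, source, target):
    """Hops needed when every hop goes to the contact nearest to the target."""
    node, steps = source, 0
    while node != target:
        contacts = ((node - 1) % n, (node + 1) % n, links[node])
        node = min(contacts, key=lambda c: ring_distance(c, target, n))
        steps += 1
    return steps

rng = random.Random(0)
n = 2000
for alpha in (0.0, 1.0, 2.0):  # alpha = d = 1 is the navigable case
    links = make_long_links(n, alpha, rng)
    times = [greedy_delivery_time(n, links, rng.randrange(n), rng.randrange(n))
             for _ in range(200)]
    print(f"alpha = {alpha}: mean delivery time {sum(times) / len(times):.1f}")
```

One of the two lattice neighbors always lies closer to the target, so the distance decreases at every hop and the delivery time never exceeds the initial lattice distance.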
15A sharp transition
16Trypanosoma brucei
- Parasitic eukaryotes that diverged 200-500 million years ago.
- Pathogens causing African sleeping sickness (30,000 deaths per year; the best treatment is from 1916).
- Transfer from the gut of the tsetse fly to the bloodstream of humans and cattle.
- Unique biology:
  - Kinetoplast
  - RNA editing with gRNA
  - Antigenic variation
  - trans-splicing
17mRNA processing
- T. brucei genes have no promoters.
- Gene expression is regulated by controlling mRNA stability and translation.
18Splicing overview
SL: Spliced Leader RNA
See also Liang et al., Euk. Cell (2003).
19Open questions
- Where are the splice sites?
- Is there alternative trans-splicing?
20Mapping transcript boundaries: a deep-sequencing approach
N. G. Kolev, J. B. Franklin, S. Carmi, H. Shi, S. Michaeli, and C. Tschudi, PLoS Pathogens (in press).
21Data analysis results
- 532 transcripts with misannotated start codon.
- 898 annotated genes not producing a transcript.
- 1,114 new transcripts, including conserved coding and non-coding.
- 394 genes with non-coding transcripts in their 3' UTR.
- Trans-splicing and polyadenylation of snoRNA clusters.
- Transcription initiation sites of the polycistronic units.
- Digital gene expression.
22Splice-site composition
Non-AG splice sites are due to sequencing errors and strain differences.
No signal is observed in the exon, except for a small purine excess.
The 3' splice site: no G at -3.
Figure labels: 5' UTR, ORF, PPT (polypyrimidine tract), Human.
Pyrimidine peak at about -25; distance from the AG varies (unique to trypanosomes).
23Splice site composition
Define the PPT as the longest stretch of pyrimidines (separated by no more than one purine) in the 200 nt upstream of the splice site.
Median- 43nts
Median- 18nts
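The PPT definition above can be turned into a small search routine. The sketch assumes one particular reading of "separated by no more than one purine": runs of C/T may be joined by single isolated purines, but never by two purines in a row. The function name `find_ppt`, the coordinate convention (`splice_site` is the index of the A in the AG), and the toy sequence are all hypothetical.

```python
import re

# Assumed reading: pyrimidine runs joined by single, isolated purines.
PPT_PATTERN = re.compile(r"[CT]+(?:[AG][CT]+)*")

def find_ppt(seq, splice_site, window=200):
    """Return (start, end) of the longest tract upstream of splice_site, or None."""
    lo = max(0, splice_site - window)
    upstream = seq[lo:splice_site].upper()
    best = max(PPT_PATTERN.finditer(upstream),
               key=lambda m: m.end() - m.start(), default=None)
    if best is None:
        return None
    return lo + best.start(), lo + best.end()  # end is exclusive

# Made-up example sequence: upstream region, then the AG, then the start codon.
upstream_part = "GGAA" + "TTCTTTCACTTTTC" + "GGAAGGAC"
seq = upstream_part + "AG" + "ATG"
ag = len(upstream_part)

start, end = find_ppt(seq, ag)
print("PPT:", seq[start:end], "length:", end - start, "PPT-AG distance:", ag - end)
```

With the tract located, both quantities discussed on this slide (PPT length and PPT-AG distance) follow directly from the returned coordinates.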
24UTRs
Median- 130nts
Median- 388nts
25Alternative splicing
Uncertainty of splice-site usage
(Shannon entropy).
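As a minimal sketch of the uncertainty measure, the Shannon entropy of splice-site usage for one gene can be computed from per-site read counts. The input format (a list of counts per site), the function name `splice_site_entropy`, and the choice of bits (log base 2) are assumptions; the slide does not specify them.

```python
import math

def splice_site_entropy(read_counts):
    """Shannon entropy (in bits) of the usage distribution over a gene's splice sites."""
    total = sum(read_counts)
    probs = [c / total for c in read_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical read counts per splice site for three genes.
print(splice_site_entropy([120]))         # a single site: no uncertainty
print(splice_site_entropy([60, 60]))      # two equally used sites
print(splice_site_entropy([100, 15, 5]))  # one dominant site plus weak alternatives
```

The entropy is zero when one site takes all the reads and grows as usage spreads more evenly over more sites.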
26Alternative splicing
Alternative splicing dispersion: average distance (nt) of all weak splice sites from the strongest one.
Sites near the ORF are stronger. Some sites are found in frame.
Figure axes: position relative to the primary splice site (nt), from -150 to 150.
Figure axes: relative usage of trans-splice sites (gene number) versus nt position relative to the START codon (ATG), from -300 to 300.
27Why alternative splicing?
- Usually does not create protein isoforms.
- Noise?
- Regulatory role?
  - Affinity of splice sites could depend on environmental conditions.
  - Different 5' UTRs can carry sequences that determine the fate of the mRNA.
- Future studies will find out whether splice-site usage varies between environments, life-cycle stages, and strains.
28Polyadenylation sites
Median 142nts
29Summary
- Deep sequencing of Trypanosoma brucei mRNA reveals the transcriptome of the parasite at single-nucleotide resolution.
- Hundreds of genes reannotated.
- Splice sites and polyadenylation sites mapped for the first time.
- Splice site sequence is HAG.
- PPT length and distance from the splice site are highly variable.
- A considerable amount of alternative splicing, previously unpredicted.
- Polyadenylation occurs preferentially at adenosines, but its location is highly irregular.
- Evidence for coupling of polyadenylation and trans-splicing of the downstream gene.
30Does splicing regulate gene expression?
- Gene expression is regulated by the presence of splicing factors.
- What is the molecular mechanism?
- No significant sequence motifs.
Splicing factor silenced
31Downregulation
- Tb11.02.1100: nucleobase/nucleoside transporter 8.1.
- Downregulated in all lines.
- Regulatory sequence: CAGTATCATCCCCACTTAAGGAAACTGTA
AGCTTAGTCACTTCCCTCCTTTCTCTTTCTTTTTGTACGAAGGTTAAAGC
CACAAGACTCTCTTACTGAACTCAGGCAAGTGAACAACACCGCACTAAAC
CAGAATCGCATAAGTTACATCCACTATCCATCCACTCGGGTTTAACTGAA
TTGCATCGCTGGATACCTTTCGTGTGCAATG
Particularly short PPT-AG distance!
Polypyrimidine tract (PPT)
3' splice site
C-rich PPT!
5' UTR
START codon
32Hypothesis
- Binding of splicing factors (U2AF65) to the PPT is weak because of the short distance to the AG.
- Binding of the PTB (PPT-binding) protein to its target, the C-rich PPT, is required for efficient splicing.
- Knockdown of U2AF65 or PTB1 decreases the splicing factors' affinity and the splicing efficiency.
Diagram (normal): U2AF35, U2AF65; rest of intron, PPT, AG, 5' UTR.
Diagram (short PPT-AG distance and C-rich PPT): U2AF65, U2AF35, PTB; rest of intron, PPT, AG, 5' UTR.
33Experiment design
Labels: Tb11.02.1100, Luciferase, Procyclin.
Construct 1 (labels): promoter, intron, PPT, spacer, AG, 5' UTR, reporter.
Construct 2 (labels): promoter, intron, TTTTTTTTT, spacer, AG, 5' UTR, reporter.
Construct 3 (labels): promoter, intron, PPT, spacer, AG, 5' UTR, reporter.
Transfect constructs into U2AF65-silenced cells.
Expect: (1) downregulation of luciferase activity in response to U2AF65 silencing; (2-3) elimination of downregulation.
34Upregulation
Tb927.7.1110: asparagine synthetase a, putative.
Upregulated upon U2AF65 silencing.
35Hypothesis
- Biochemical evidence that upregulation is due to cytoplasmic binding of U2AF65 to the 3' UTR of the mature mRNA.
- U2AF65 binding is expected when trans-splicing occurs in the 3' UTR.
- It is possible that U2AF65 binding to the 3' UTR of the mature mRNA is responsible for downregulation of the species with the downstream polyadenylation site.
Diagram (mRNA species degraded in the presence of U2AF65): 5' UTR, ORF, 3' UTR, PPT, U2AF65, PolyA tail.
Diagram (other species): 5' UTR, ORF, 3' UTR, PolyA tail.
36Experiment design
Labels: Luciferase, Procyclin, Tb927.7.1110 3' UTR.
Construct 1 (labels): promoter, intron, 5' UTR, reporter, PA, PPT.
Construct 2 (labels): promoter, intron, 5' UTR, reporter, PA.
Transfect constructs into U2AF65-silenced cells.
Expect: (1) upregulation of luciferase activity in response to U2AF65 silencing; (2) elimination of upregulation.
Results are expected in the upcoming few months.
37Summary
- The mapping of splice sites and polyadenylation sites by deep sequencing improves our understanding of these processes.
- The presence/absence of specific splicing factors regulates the expression of some genes.
- Regulation is likely to be related to structural features of the mRNA rather than sequence motifs.
- Model genes were selected for which we have conjectures about the molecular mechanism of regulation.
- Reporter gene assays are carried out to test these conjectures.
38Acknowledgements
- Navigation in networks:
  - Prof. Daniel ben-Avraham (Clarkson University, NY); students: Dr. Hernan Rozenfeld, Stephen Carter, Jie Sun
  - Prof. Paul Krapivsky (Boston University)
- Splicing in trypanosomes:
  - Prof. Shulamit Michaeli (Bar-Ilan); students: Sachin Kumar-Gupta, Asher Pivko, Ilana Naboishchikov
  - Prof. Elisabetta Ullu, Prof. Christian Tschudi (Yale); staff: Dr. Joseph Franklin, Dr. Nikolay Kolev, Dr. Huafang Shi
- Thesis advisor: Prof. Shlomo Havlin (Bar-Ilan).
- Funding: Adams Fellowship Program of the Israel Academy of Sciences and Humanities
39Thank you for your attention!
40My research interests
- Biology (general)
- Protein interaction (comp)
- DNA editing (comp)
- Trypanosomes
- Unfolded protein response (comp expr)
- Splicing regulation (comp expr)
- Mapping alternative splicing (comp)
- Networks
- Modeling
- Flow
- Diffusion
- Percolation
- Disease spreading
- Navigation
- Data analysis
- The Internet
- Glass models
- Diffusion
- Anomalous functionals (theory)
- Microscopy (biophysics)
41Random network models
- In a network, links (edges) connect computers/individuals (nodes).
- Simplest model: a regular lattice. Good for purely spatial, local interactions.
- Erdős-Rényi (ER) network model ($G_{N,p}$): fully random. Number of nodes $N$, probability of a link $p$. Narrow degree distribution (Poisson).
- Scale-free (SF) networks: emergence of hubs. Broad degree distribution. Nodes with extremely high degree exist (hubs). Found to describe most real-world systems.
42Basins of attraction vs. community detection
- The calculation of the basins of attraction provides a decomposition of the network.
- How does it compare with state-of-the-art community detectors?
- Most community detectors use global information.
- More importantly, community detection and separation into basins have different goals.
Consider this example:
Community detectors: maximize links within communities, minimize links between communities.
Basins of attraction: separate nodes by the hub they associate with.
Not really two communities!
43Tie breaking
- What happens when the highest-degree neighbor of a node has the same degree as the node itself?
- In our local search, a node can be a peak even if it has neighbors of equal degree.
- In a recursive search, we surf over ridges of connected nodes of equal degree to reach the true hub.
- Fewer basins exist, but other results remain qualitatively the same.
442D random surface example
45Kleinberg model simulations
- Our solution agrees with numerical results
(navigation simulations and iteration of the
master equation).
46Message loss probability
- Kleinberg's model is unrealistic: why does the network need to be fine-tuned (have $\alpha = d$) for greedy routing to work?
- The missing ingredient: message loss probability.
- We calculated $T_z(L)$ analytically, where $z$ is the probability of successful completion of a single step.
- The system is small-world for a much wider range of $\alpha$!
- This explains why the system need not be fine-tuned to become navigable.
Figure panels: no message loss; with message loss ($z = 0.9$, 1D).
47Splicing machinery and sequence
mammalian
Yeast conserved branch site TACTAAC
48Splicing regulation
SR proteins create bridges to stabilize the
spliceosome
- In trypanosomes
- U2AF65 and 35 exist and do not interact.
- U2AF65 interacts with SF1.
- Interacting SR proteins were identified.
- hnRNP proteins exist.
hnRNP
splicing enhancer
splicing silencer
49Predicting splicing heterogeneity
- What determines if a gene will be differentially spliced?
- Look at 100 nt upstream and downstream of the strongest site.
- Rank all potential splice sites: TAG: 3; AAG, CAG: 2; GAG: 1.
- Heterogeneity rank of a gene: the sum of the ranks of all other AG dinucleotides divided by the rank of the strongest site.
- The average heterogeneity rank is about 10 for high-uncertainty genes, but only about 7 for low-uncertainty genes ($P \approx 10^{-20}$).
- Signatures do not look meaningful, but analysis shows that longer 5' UTRs, shorter PPTs, and longer PPT-AG distance also contribute significantly to heterogeneity.
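The heterogeneity rank described above can be sketched as follows. The rank table is an assumed reading of the slide (TAG = 3, AAG = CAG = 2, GAG = 1), and the function name `heterogeneity_rank`, the coordinate convention (`strongest` is the index of the A in the strongest site's AG), and the toy sequence are hypothetical.

```python
SITE_RANK = {"TAG": 3, "AAG": 2, "CAG": 2, "GAG": 1}  # assumed reading of the slide

def heterogeneity_rank(seq, strongest, window=100):
    """Sum of ranks of the other AG sites within the window, over the strongest site's rank."""
    seq = seq.upper()
    # Rank of the strongest site; assumes it is preceded by A, C, G, or T.
    own = SITE_RANK[seq[strongest - 1:strongest + 2]]
    lo = max(1, strongest - window)
    hi = min(len(seq) - 1, strongest + window + 1)
    others = sum(
        SITE_RANK.get(seq[i - 1:i + 2], 0)
        for i in range(lo, hi)
        if i != strongest and seq[i:i + 2] == "AG"
    )
    return others / own

# Made-up sequence with three candidate sites: TAG (index 3), GAG (8), AAG (13).
seq = "CCTAGCCGAGCCAAGCC"
print(heterogeneity_rank(seq, 3))   # strongest site is the TAG
print(heterogeneity_rank(seq, 13))  # strongest site is the AAG
```

A gene scores high when many well-ranked AG dinucleotides surround a comparatively weak primary site.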
50Explaining abundance
- A-rich exons are more abundant.
Splice-site ambiguity is anti-correlated with
abundance.
Figure axes: abundance versus dispersion.
Other correlations: genes with longer PPT and shorter 5' UTR are more abundant.