How are navigation in networks and splicing in parasites related?


Title: How are navigation in networks and splicing in parasites related?


1
How are navigation in networks and splicing in
parasites related?
  • Shai Carmi, Bar-Ilan University, Department of
    Physics and the Faculty of Life Sciences

Summer 2010, USA
2
Navigation in networks with local information
  • Navigation is important in communication
    networks, transportation networks, and social
    networks.
  • Knowledge of the entire network is usually not
    feasible.
  • Use greedy navigation.

The Internet at the Autonomous Systems level:
Carmi et al., PNAS 104, 11150 (2007)
MapQuest
Boguñá and Krioukov, PRL 102, 058701 (2009)
3
Scale-free networks
  • Nomenclature: in a network (graph), links
    (edges) connect nodes (vertices). The degree of a
    node, k, is its number of links.
  • In the last decade, measurements showed that
    almost all natural networks are scale-free.
  • Nodes in scale-free networks have degrees spanning
    all orders of magnitude, including nodes with an
    extremely large number of links (hubs).
  • Degree distribution: $P(k) \sim k^{-\gamma}$.
  • Small $\gamma$: the network is highly heterogeneous;
    many hubs exist.
  • Large $\gamma$: the network is homogeneous, with fewer
    hubs, similar to purely random networks.

4
Navigation models
  1. Navigating to the hub. S. Carmi, P. L.
    Krapivsky, and D. ben-Avraham, Physical Review E
    78, 066111 (2008).
  2. Kleinberg's navigation model. S. Carmi, S.
    Carter, J. Sun, and D. ben-Avraham, Physical
    Review Letters 102, 238702 (2009).

5
How to find the most connected node in the
network: an algorithm
  • Start from a given node.
  • Go to the neighbor with the highest degree (break
    ties arbitrarily).
  • Keep going until reaching a peak: a node whose
    degree is greater than the degrees of all of
    its neighbors.
  • Only knowledge of the neighbors' degrees is
    required!

Basins of attraction are formed around each hub.
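
The greedy search above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the adjacency-dictionary representation, the function names `find_peak` and `basins`, and the toy graph are all assumptions made for the example. A node is treated as a peak when no neighbor has a strictly higher degree.

```python
from collections import defaultdict

def find_peak(adj, start):
    """Greedy ascent: repeatedly move to the highest-degree neighbor.

    Stops at a peak, taken here as a node with no neighbor of strictly
    higher degree (an assumption about how ties at the top are handled).
    """
    node = start
    while True:
        best = max(adj[node], key=lambda n: len(adj[n]), default=node)
        if len(adj[best]) <= len(adj[node]):
            return node  # no neighbor has a strictly larger degree
        node = best

def basins(adj):
    """Group nodes by the peak their greedy search ends at."""
    result = defaultdict(set)
    for node in adj:
        result[find_peak(adj, node)].add(node)
    return dict(result)

# Hypothetical toy graph: hub "a" (degree 4) and a smaller hub "x" (degree 3),
# joined through node "e".
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("x", "y"), ("x", "z"), ("x", "e")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = dict(adj)

print(basins(adj))
```

Each step only reads the degrees of the current node's neighbors, so no global knowledge of the network is used. In the toy graph, node "e" joins the basin of "a" because "a" has the larger degree.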
6
Example
Courtesy of Hernan Rozenfeld
7
Who cares?
  • Practical interest: fast message routing to the
    most connected node (for example, in wireless
    sensor networks).
  • Theoretical interest:
    - A new decomposition procedure based on
      association with hubs.
    - The number and sizes of basins can be used to
      characterize networks.

Rao et al., JMB 2004
8
Basins distribution in scale-free networks
  • How does the basin topology depend on the degree
    exponent $\gamma$ ($P(k) \sim k^{-\gamma}$)?
  • For $\gamma < \gamma_c \approx 3$ the largest hub attracts all
    nodes, forming a giant basin.
  • For larger $\gamma$ the network fragments into
    numerous basins whose size distribution decays
    as a power law.
  • Mathematically:
  • The size of the largest basin scales as
    $S \sim N^{\delta}$, where $\delta = 1$ for $\gamma < \gamma_c$ and
    $\delta = 1/(\gamma - 1)$ for large $\gamma$.
  • The probability that a node belongs to a basin of
    size s is $Q(s) \sim s^{-\alpha}$ for small s, with
    $\alpha \approx \gamma - 1$.

The giant basin
9
Theory: a transition at $\gamma = 3$
  • We prove that the probability that a node of degree
    k is a peak is approximately $\exp(-A k^{3-\gamma})$,
    where A is a k-independent constant.
  • For $\gamma < 3$, the probability approaches zero: only
    the true hub is a peak.
  • For $\gamma > 3$, many nodes with large degree will be
    peaks.
  • For large $\gamma$, we prove that the size of the
    largest basin scales as $S \sim N^{1/(\gamma-1)}$.
  • The first two moments of the number of basins
    and the number of solitary basins can be
    approximated analytically.
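
To see the transition numerically, the peak probability exp(-A k^(3-γ)) can be evaluated on either side of γ = 3. This is only an illustration: the constant A is not given on the slide, so A = 1 below is an arbitrary assumption, and `peak_probability` is a name made up for the example.

```python
import math

def peak_probability(k, gamma, A=1.0):
    """Approximate probability that a node of degree k is a peak.

    A is a k-independent constant; A = 1.0 is an arbitrary placeholder.
    """
    return math.exp(-A * k ** (3.0 - gamma))

for gamma in (2.5, 3.5):
    values = [peak_probability(k, gamma) for k in (10, 100, 1000)]
    print(gamma, values)
```

Below γ = 3 the exponent 3 - γ is positive, so the probability falls towards zero as k grows. Above γ = 3 the exponent is negative, so the probability rises towards one for large k.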

10
Deterministic fractal (u,v)-nets
  • The behavior $Q(s) \sim s^{-\alpha}$ can be explained using a
    fractal scale-free network model.
  • Each link in generation n splits into $u + v = w$
    links in generation n + 1.

11
A short summary
  • Greedy search for the most connected node
    partitions the network into basins of attraction.
  • For scale-free networks with $\gamma < 3$, a giant basin
    exists (and thus greedy search works).
  • For $\gamma > 3$, there are many basins (corresponding to
    the network modules).
  • The transition at $\gamma = 3$ and the power-law
    distribution of small basin sizes can be
    analytically explained.
  • The Internet and the glass network have a giant
    basin.

12
Generalization to lattices
  • All degrees are equal.
  • A node's importance is determined by its height or
    energy.
  • Assume each node is attracted to its lowest
    neighbor.
  • Basins of attraction have simple physical
    interpretation.

[Figure: a landscape with its peaks, valleys, and saddles labeled.]
13
A fun exercise in probability
  • The number of valleys
  • R(s): the probability that a node is the valley
    of a basin of size s.
  • In 1D, R(1) = 1/30, and R(s) decays as 1/s!,
    much faster than the power law for networks.
  • In 2D, the densities of peaks and of valleys are
    each 1/5, and that of saddles is 1/15.
  • R(1) = 109/4290.
  • Density of craters is 3/715.
  • Density of ridges is 1/20.
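
Some of these numbers can be checked by brute force. The sketch below assumes independent, continuous random heights, so every ordering of a site and its neighbors is equally likely, and assumes each node flows to its lowest neighbor when that neighbor is lower than itself. It also assumes a saddle on the square lattice is a site with one opposite pair of neighbors higher and the other pair lower. The function names are made up for the example.

```python
from fractions import Fraction
from itertools import permutations

def solitary_valley_1d():
    """R(1) on a chain: the centre of 5 sites is a valley of basin size 1."""
    hits = total = 0
    for h in permutations(range(5)):          # h[2] is the centre site
        total += 1
        centre_is_valley = h[2] < h[1] and h[2] < h[3]
        # Each neighbor must flow away from the centre, which happens when
        # the site on its far side is lower than the centre.
        neighbors_leave = h[0] < h[2] and h[4] < h[2]
        if centre_is_valley and neighbors_leave:
            hits += 1
    return Fraction(hits, total)

def densities_2d():
    """Peak, valley and saddle densities on a square lattice."""
    peaks = valleys = saddles = total = 0
    for c, n, e, s, w in permutations(range(5)):  # centre and 4 neighbors
        total += 1
        peaks += c > max(n, e, s, w)
        valleys += c < min(n, e, s, w)
        # Assumed saddle: one opposite pair higher, the other pair lower.
        ns_high = min(n, s) > c > max(e, w)
        ew_high = min(e, w) > c > max(n, s)
        saddles += ns_high or ew_high
    return (Fraction(peaks, total), Fraction(valleys, total),
            Fraction(saddles, total))

print(solitary_valley_1d())   # 1/30
print(densities_2d())
```

Only the relative order of the heights matters, so enumerating the 120 orderings of five sites gives exact fractions rather than Monte Carlo estimates.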

14
The navigation problem and the Kleinberg model
  • We know short paths exist in social networks
    (six degrees of separation). But how do people
    find them?
  • The Kleinberg model (Nature 406, 845 (2000)):
    - An underlying lattice.
    - One long-range link for each node.
    - A long-range link has length r with
      probability proportional to $r^{-\alpha}$.
    - Greedy navigation: a message is always sent to
      the neighbor geographically nearest to the
      destination.
  • Kleinberg proved (T: delivery time; d: dimension;
    L: lattice linear size):
    - For $\alpha = d$, $T \sim \ln^2 L$.
    - For $\alpha \neq d$, $T \sim L^x$ for some exponent x.
  • For $\alpha = d$, greedy navigation can find short paths!
  • An accurate expression for the delivery time was an
    open problem for 9 years.
  • We prove
  • We also show that short paths can be found for
    $\alpha \neq d$ if messages can be lost.
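
A small simulation makes the model concrete. The sketch below is an assumed one-dimensional variant on a ring of n nodes (so d = 1), with one directed long-range link per node whose length r is drawn with weight r^(-α). The ring geometry, the function names, and the parameter values are choices made for this illustration only.

```python
import random

def ring_distance(a, b, n):
    """Shortest distance between sites a and b on a ring of n sites."""
    d = abs(a - b) % n
    return min(d, n - d)

def build_long_links(n, alpha, rng):
    """One long-range link per node; length r drawn with weight r**(-alpha)."""
    lengths = list(range(1, n // 2 + 1))
    weights = [r ** (-alpha) for r in lengths]
    links = []
    for node in range(n):
        r = rng.choices(lengths, weights=weights)[0]
        links.append((node + rng.choice((-1, 1)) * r) % n)
    return links

def greedy_delivery_time(source, target, n, links):
    """Number of hops when each step goes to the neighbor nearest the target."""
    node, steps = source, 0
    while node != target:
        options = ((node - 1) % n, (node + 1) % n, links[node])
        node = min(options, key=lambda v: ring_distance(v, target, n))
        steps += 1
    return steps

rng = random.Random(0)            # fixed seed so runs are repeatable
n = 1000
links = build_long_links(n, alpha=1.0, rng=rng)   # alpha = d = 1
print(greedy_delivery_time(0, n // 2, n, links))
```

One lattice neighbor is always a step closer to the target, so the search terminates and never takes more hops than the lattice distance. Averaging the delivery time over many sources and targets for several values of α and n is one way to explore the scaling numerically.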

15
A sharp transition
16
Trypanosoma brucei
  • Parasitic eukaryotes that diverged 200-500
    million years ago.
  • Pathogens of African sleeping sickness
    (30,000 deaths per year; the best treatment
    dates from 1916).
  • Transferred from the gut of the tsetse fly to the
    bloodstream of humans and cattle.
  • Unique biology:
    - Kinetoplast
    - RNA editing with gRNA
    - Antigenic variation
    - Trans-splicing

17
mRNA processing
  • T. brucei genes have no promoters.
  • Gene expression is regulated by controlling
    mRNA stability and translation.

18
Splicing overview
SL: Spliced Leader RNA
See also Liang et al., Euk. Cell (2003).
19
Open questions
  • Where are the splice sites?
  • Is there alternative trans-splicing?

20
Mapping transcript boundaries: a deep-sequencing
approach
N. G. Kolev, J. B. Franklin, S. Carmi, H. Shi,
S. Michaeli, and C. Tschudi, PLoS Pathogens (in
press).
21
Data analysis results
  • 532 transcripts with misannotated start codon.
  • 898 annotated genes not producing a transcript.
  • 1,114 new transcripts, including conserved coding
    and non-coding.
  • 394 genes with non-coding transcripts in their
    3' UTR.
  • Trans-splicing and polyadenylation of snoRNA
    clusters.
  • Transcription initiation sites of the
    polycistronic units.
  • Digital gene expression.

22
Splice-site composition
Non-AG splice sites are due to sequencing errors and
strain differences.
No signal is observed in the exon, except for a small
purine excess.
The 3' splice site: no G at -3.
[Figure labels: 5' UTR, ORF, PPT (polypyrimidine tract), Human]
Pyrimidine peak at about -25; the distance from the AG
varies, which is unique to trypanosomes.
23
Splice site composition
Define the PPT as the longest stretch of
pyrimidines (separated by no more than one
purine) in the 200 nt upstream of the splice site.
Median: 43 nt
Median: 18 nt
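
The PPT definition above translates naturally into a regular expression. The sketch below reads "separated by no more than one purine" as "pyrimidine runs that may be interrupted by single purines, never by two in a row"; that reading, the function name `longest_ppt`, the indexing convention, and the example sequence are all assumptions for illustration.

```python
import re

# Pyrimidine runs that may be interrupted by single purines (assumed reading
# of "separated by no more than one purine").
PPT_PATTERN = re.compile(r"[CT]+(?:[AG][CT]+)*")

def longest_ppt(sequence, splice_site, window=200):
    """Return (start, end) of the longest PPT upstream of splice_site.

    `splice_site` is taken as the index of the A in the AG dinucleotide;
    the returned slice indices refer to `sequence`. Returns None if the
    window contains no pyrimidine.
    """
    offset = max(0, splice_site - window)
    upstream = sequence[offset:splice_site].upper().replace("U", "T")
    best = max(PPT_PATTERN.finditer(upstream),
               key=lambda m: m.end() - m.start(), default=None)
    if best is None:
        return None
    return offset + best.start(), offset + best.end()

# Made-up example sequence with an AG splice site at index 26.
seq = "AAGCTTGGACTTTCTTTACCTCGAAAAGGT"
start, end = longest_ppt(seq, 26)
print(seq[start:end], "PPT length:", end - start,
      "PPT-AG distance:", 26 - end)
```

The same call yields both quantities discussed on these slides: the PPT length and the distance from the end of the PPT to the AG.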
24
UTRs
Median: 130 nt
Median: 388 nt
25
Alternative splicing
Uncertainty of splice-site usage
(Shannon entropy).
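
As a concrete reading of this measure, the Shannon entropy can be computed from the number of reads supporting each splice site of a gene. The sketch below assumes read counts as input and entropy measured in bits; `splice_site_entropy` is a name made up for the example.

```python
import math

def splice_site_entropy(read_counts):
    """Shannon entropy (bits) of the usage distribution over splice sites."""
    total = sum(read_counts)
    probs = [c / total for c in read_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

print(splice_site_entropy([10, 10]))   # two equally used sites
print(splice_site_entropy([98, 1, 1])) # one dominant site
```

A gene with a single splice site has zero entropy, and the entropy grows as usage is spread more evenly over more sites.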
26
Alternative splicing
Alternative splicing dispersion: the average
distance (nt) of all weak splice sites from the
strongest one.
Sites near the ORF are stronger. Some sites are
found in frame.
[Figure axis labels: "Position relative to primary splice site, nt" (-150 to 150); "nt position relative to START codon" (-300 to 300, ATG marked); "Gene number" (0 to 60); "relative usage of trans-splice sites".]
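
The dispersion can be written down directly from its definition as the average distance of the weak splice sites from the strongest one. The sketch below assumes an unweighted mean of absolute distances and takes the strongest site to be the one with the most reads; the input format and the name `splicing_dispersion` are illustrative assumptions.

```python
def splicing_dispersion(site_usage):
    """Mean distance (nt) of the weak splice sites from the strongest one.

    site_usage maps splice-site position (nt) to its read count.
    """
    strongest = max(site_usage, key=site_usage.get)
    weak = [pos for pos in site_usage if pos != strongest]
    if not weak:
        return 0.0
    return sum(abs(pos - strongest) for pos in weak) / len(weak)

print(splicing_dispersion({100: 50, 90: 5, 130: 5}))
```

Unlike the entropy, this measure depends on where the alternative sites lie, not only on how often they are used.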
27
Why alternative splicing?
  • Usually does not create protein isoforms.
  • Noise?
  • Regulatory role?
    - The affinity of splice sites could depend on
      environmental conditions.
    - Different 5' UTRs can carry sequences that
      determine the fate of the mRNA.
  • Future studies will find out whether splice-site
    usage varies between environments, life cycles,
    and strains.

28
Polyadenylation sites
Median: 142 nt
29
Summary
  • Deep sequencing of Trypanosoma brucei mRNA
    reveals the transcriptome of the parasite at
    single nucleotide resolution.
  • Hundreds of genes reannotated.
  • Splice sites and polyadenylation sites mapped for
    the first time.
  • Splice site sequence is HAG.
  • PPT length and distance from splice site highly
    variable.
  • A considerable amount of previously unpredicted
    alternative splicing.
  • Polyadenylation occurs preferentially at
    adenosines, but its location is highly irregular.
  • Evidence for coupling of polyadenylation and
    trans-splicing of the downstream gene.

30
Does splicing regulate gene expression?
  • Gene expression is regulated by the presence of
    splicing factors.
  • What is the molecular mechanism?
  • No significant sequence motifs.

Splicing factor silenced
31
Downregulation
  • Tb11.02.1100: nucleobase/nucleoside transporter
    8.1.
  • Downregulated in all lines.
  • Regulatory sequence: CAGTATCATCCCCACTTAAGGAAACTGTA
    AGCTTAGTCACTTCCCTCCTTTCTCTTTCTTTTTGTACGAAGGTTAAAGC
    CACAAGACTCTCTTACTGAACTCAGGCAAGTGAACAACACCGCACTAAAC
    CAGAATCGCATAAGTTACATCCACTATCCATCCACTCGGGTTTAACTGAA
    TTGCATCGCTGGATACCTTTCGTGTGCAATG

[Figure annotations on the sequence: polypyrimidine tract (PPT), 3' splice site, 5' UTR, START codon.]
Particularly short PPT-AG distance!
C-rich PPT!
32
Hypothesis
  • Binding of splicing factors (U2AF65) to the PPT
    is weak because of the short distance to the AG.
  • Binding of the PTB (PPT-binding) protein to its
    target, the C-rich PPT, is required for efficient
    splicing.
  • Knockdown of U2AF65 or PTB1 decreases
    splicing-factor affinity and splicing efficiency.

[Figure, two panels, each labeled: rest of intron, PPT, AG, 5' UTR.
Normal: U2AF65 and U2AF35.
Short PPT-AG distance and C-rich PPT: U2AF65, U2AF35, and PTB.]
33
Experiment design
[Figure: three reporter constructs. Labels: Tb11.02.1100, luciferase, procyclin.
Construct 1 labels: promoter, intron, 5' UTR, reporter, AG, PPT, spacer.
Construct 2 labels: promoter, intron, 5' UTR, reporter, AG, TTTTTTTTT, spacer.
Construct 3 labels: promoter, intron, 5' UTR, reporter, AG, PPT, spacer.]
Transfect the constructs into U2AF65-silenced cells.
Expected: (1) downregulation of luciferase
activity in response to U2AF65 silencing; (2-3)
elimination of downregulation.
34
Upregulation
Tb927.7.1110: asparagine synthetase a, putative.
Upregulated upon U2AF65 silencing.
35
Hypothesis
  • Biochemical evidence that upregulation is due to
    cytoplasmic binding of U2AF65 to the 3' UTR of
    the mature mRNA.
  • U2AF65 binding is expected when trans-splicing
    occurs in the 3' UTR.
  • It is possible that U2AF65 binding to the 3' UTR of
    the mature mRNA is responsible for downregulation
    of the species with the downstream polyadenylation
    site.

[Figure, two mRNA species.
mRNA species degraded in the presence of U2AF65; labels: U2AF65, 5' UTR, ORF, 3' UTR, PPT, poly(A) tail.
Other species; labels: 5' UTR, ORF, 3' UTR, poly(A) tail.]
36
Experiment design
[Figure: two reporter constructs. Labels: luciferase, procyclin, Tb927.7.1110 3' UTR.
Construct 1 labels: promoter, intron, 5' UTR, reporter, PA, PPT.
Construct 2 labels: promoter, intron, 5' UTR, reporter, PA.]
Transfect the constructs into U2AF65-silenced cells.
Expected: (1) upregulation of luciferase activity
in response to U2AF65 silencing; (2) elimination
of upregulation.
Results are expected in the upcoming few months.
37
Summary
  • The mapping of splice sites and polyadenylation
    sites by deep sequencing improves our
    understanding of these processes.
  • The presence/absence of specific splicing factors
    regulates the expression of some genes.
  • Regulation is likely to be related to structural
    features of the mRNA rather than sequence motifs.
  • Model genes were selected for which we have
    conjectures about the molecular mechanism of
    regulation.
  • Reporter gene assays are carried out to test
    these conjectures.

38
Acknowledgements
  • Navigation in networks
  • Prof. Daniel ben-Avraham (Clarkson University,
    NY); students: Dr. Hernan Rozenfeld, Stephen
    Carter, Jie Sun
  • Prof. Paul Krapivsky (Boston University)
  • Splicing in trypanosomes
  • Prof. Shulamit Michaeli (Bar-Ilan); students:
    Sachin Kumar-Gupta, Asher Pivko, Ilana
    Naboishchikov
  • Prof. Elisabetta Ullu, Prof. Christian Tschudi
    (Yale); staff: Dr. Joseph Franklin, Dr. Nikolay
    Kolev, Dr. Huafang Shi
  • Thesis advisor: Prof. Shlomo Havlin (Bar-Ilan).
  • Funding: Adams Fellowship Program of the Israel
    Academy of Sciences and Humanities

39
Thank you for your attention!
40
My research interests
  • Biology (general)
  • Protein interaction (comp)
  • DNA editing (comp)
  • Trypanosomes
  • Unfolded protein response (comp expr)
  • Splicing regulation (comp expr)
  • Mapping alternative splicing (comp)
  • Networks
  • Modeling
  • Flow
  • Diffusion
  • Percolation
  • Disease spreading
  • Navigation
  • Data analysis
  • The Internet
  • Glass models
  • Diffusion
  • Anomalous functionals (theory)
  • Microscopy (biophysics)

41
Random network models
  • In a network, links (edges) connect
    computers/individuals (nodes).
  • Simplest model: a regular lattice. Good for
    purely spatial, local interactions.
  • Erdős-Rényi (ER) network model ($G_{N,p}$): fully
    random. Number of nodes N, probability of a link
    p. Narrow degree distribution (Poisson).
  • Scale-free (SF) networks: emergence of hubs.
    Broad degree distribution. Nodes with
    extremely high degree exist (hubs). Found to
    describe most real-world systems.

42
Basins of attraction vs. community detection
  • The calculation of the basins of attraction
    provides a decomposition of the network.
  • How does it compare with state-of-the-art
    community detectors?
  • Most community detectors use global information.
  • More importantly, community detection and
    separation into basins have different goals.
    Consider this example:

Community detectors: maximize links within
communities; minimize links between communities.
Basins of attraction: separate nodes by the hub
they associate with.
Not really two communities!
43
Tie breaking
  • What happens when the neighbor of highest degree
    of a node has the same degree as the node itself?
  • In our local search, a node can be a peak even
    if it has neighbors of equal degree.
  • In a recursive search, we surf over ridges of
    connected nodes of equal degree to reach the
    true hub.
  • Fewer basins exist, but other results remain
    qualitatively the same.

44
2D random surface example
45
Kleinberg model simulations
  • Our solution agrees with numerical results
    (navigation simulations and iteration of the
    master equation).

46
Message loss probability
  • Kleinberg's model is unrealistic: why does the
    network need to be fine-tuned (have $\alpha = d$) for
    greedy routing to work?
  • The missing ingredient: message loss probability.
  • We calculated $T_z(L)$ analytically, where z is the
    probability of successful completion of a single
    step.
  • The system is small-world for a much wider range
    of $\alpha$!
  • Explains why the system need not be fine-tuned
    to become navigable.

No message loss
With message loss
z = 0.9, 1D
47
Splicing machinery and sequence
mammalian
Yeast: conserved branch site TACTAAC
48
Splicing regulation
SR proteins create bridges to stabilize the
spliceosome
  • In trypanosomes
  • U2AF65 and 35 exist and do not interact.
  • U2AF65 interacts with SF1.
  • Interacting SR proteins were identified.
  • hnRNP proteins exist.

hnRNP
splicing enhancer
splicing silencer
49
Predicting splicing heterogeneity
  • What determines if a gene will be differentially
    spliced?
  • Look at 100 nt upstream and downstream of the
    strongest site.
  • Rank all potential splice sites: TAG, 3; AAG and
    CAG, 2; GAG, 1.
  • Heterogeneity rank of a gene = (sum of ranks of
    all other AG dinucleotides) / (rank of strongest
    site).
  • The average heterogeneity rank is about 10 for
    high-uncertainty genes, but only about 7 for
    low-uncertainty genes ($P \approx 10^{-20}$).
  • Signatures do not look meaningful, but analysis
    shows that longer 5' UTRs, shorter PPTs, and
    longer PPT-AG distance also contribute
    significantly to heterogeneity.
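
The heterogeneity rank described above can be sketched as follows. The ranks follow the slide (TAG 3, AAG and CAG 2, GAG 1); everything else is an assumption made for illustration: a site is indexed by the A of its AG, the window is counted from that index, and the function name and example sequence are invented.

```python
SITE_RANK = {"T": 3, "A": 2, "C": 2, "G": 1}  # rank by the base preceding AG

def heterogeneity_rank(sequence, strongest, window=100):
    """Sum of ranks of all other AG sites divided by the strongest site's rank.

    `strongest` is the index of the A in the strongest site's AG.
    """
    seq = sequence.upper()
    lo = max(1, strongest - window)
    hi = min(len(seq) - 2, strongest + window)
    others = 0
    for i in range(lo, hi + 1):
        if i != strongest and seq[i:i + 2] == "AG":
            others += SITE_RANK.get(seq[i - 1], 0)
    return others / SITE_RANK[seq[strongest - 1]]

# Made-up sequence with AG sites at 2 (TAG), 6 (GAG), 11 (CAG) and 15 (AAG).
example = "CTAGGGAGTTCAGAAAGC"
print(heterogeneity_rank(example, strongest=11))
```

A gene scores high when its strongest site is weak by sequence and is surrounded by many competing AG dinucleotides in favorable contexts.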

50
Explaining abundance
  • A-rich exons are more abundant.

Splice-site ambiguity is anti-correlated with
abundance.
[Figure axes: abundance, dispersion]
Other correlations: genes with a longer PPT and a
shorter 5' UTR are more abundant.