Title: Bioinformatics Computational methods to discover ncRNA in bacteria
1. Bioinformatics: Computational methods to discover ncRNA in bacteria
- Ulf Schmitz
- ulf.schmitz_at_informatik.uni-rostock.de
- Bioinformatics and Systems Biology Group
- www.sbi.informatik.uni-rostock.de
2. Outline
- Problem description
- Streptococcus pyogenes
- The RNome, transcriptome
- Characteristics of bacterial ncRNA
- Approaches to find fRNA
- Conclusion / Outlook
3. Streptococcus pyogenes
- important human pathogen (group A streptococcus, or GAS)
- causes the following diseases:
- pyoderma (111 million cases/year)
- pharyngitis (616 million cases/year and 517,000 deaths/year)
- completely adapted to humans as its only natural host
- causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases
4. Streptococcus pyogenes
- varies in multiplication rate, which is associated with the type of infection
- to understand this regulation, growth-phase regulatory factors and gene expression in response to specific environmental differences within the host were studied
- a novel growth-phase-associated two-component-type regulator was identified
- fasBCA operon, present in all 12 tested M serotypes
- contains two potential HPK genes (FasB, FasC) and one RR (FasA)
- shows its maximum expression and activity at the transition phase
- and potentially supports the aggressive spreading of the bacteria in its host
HPK: histidine protein kinase; RR: response regulator
5. Streptococcus pyogenes
- downstream of the fas operon, a 300-nucleotide transcript (fasX) was identified
- it does not encode a peptide/protein
- but is also growth-phase related
- main effector molecule of the fas regulon
- ncRNA or fRNA
6. ncRNA
[Figure: the fas locus; labels: gltX-L, fasB, fasC, fasA, fasX, rnpA-L, promoters pfasX and prnpA, terminator tt]
7. RNome or transcriptome
putative gene expression regulators (protein-interacting and housekeeping ncRNAs were also found)
8. RNome or transcriptome
types of RNA
Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins; the candidate classes are shown here.
9. Functions of ncRNA
- target mRNAs via imperfect sequence complementarity
- binding may result in
- blockage of ribosome entry (translation repression)
- melting of inhibitory secondary structures (translation activation)
[Figure labels: dissolving the fold-back structure; loop-loop kissing complex]
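Imperfect complementarity can be illustrated with a naive scan: slide a candidate ncRNA segment along an mRNA region and count the positions that form Watson-Crick or G-U wobble pairs. This is a minimal sketch under loudly stated assumptions: ungapped antiparallel pairing only, no energy model, and made-up sequences; real target predictors score hybridization free energies and allow bulges and loops.

```python
# Hypothetical sketch: ungapped, antiparallel pairing between an ncRNA segment
# and an mRNA window. No thermodynamics; real tools use hybridization energies.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_count(ncrna: str, mrna_window: str) -> int:
    """Count positions where ncrna pairs with the mRNA window read 3'->5'."""
    # Antiparallel binding: compare the ncRNA 5'->3' with the reversed window.
    return sum((a, b) in PAIRS for a, b in zip(ncrna, reversed(mrna_window)))


def best_site(ncrna: str, mrna: str) -> tuple[int, int]:
    """Return (offset, paired positions) of the best-pairing mRNA window."""
    n = len(ncrna)
    best = (0, -1)  # (-1) means no window of length n fits into mrna
    for start in range(len(mrna) - n + 1):
        score = pair_count(ncrna, mrna[start:start + n])
        if score > best[1]:  # keep the first offset on ties
            best = (start, score)
    return best


if __name__ == "__main__":
    mrna = "AAAGGAGGAAUCCAUGAAA"  # made-up mRNA with a Shine-Dalgarno-like motif
    ncrna = "CCUCCUU"             # made-up antisense segment
    print(best_site(ncrna, mrna))
```

Counting pairs rather than requiring exact matches is what makes the comparison "imperfect": a window with a few mismatches still scores, just lower.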
10. Streptococcus pyogenes genomes
[Table: genome info and features]
11. Intergenic sequence inspector (ISI)
[Figure: ISI workflow. Data stages: bacterial genomes database, annotated genome, IGR databank, filtered IGR databank, BLAST results, aligned features, sequence features, final results. Tools: IGR extractor, IGR filtering, BLAST, BLAST Analyser, Genview]
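The first ISI stage, the IGR extractor, can be sketched as taking gene coordinates from an annotated genome and returning the gaps between genes. This is not ISI's code; it assumes genes given as 0-based, half-open (start, end) intervals on one linear sequence, ignores strand, and uses a hypothetical minimum IGR length of 50 nt (the threshold quoted in the Freiburg pipeline).

```python
def extract_igrs(genome: str, genes: list[tuple[int, int]],
                 min_len: int = 50) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each gap of at least min_len nt.

    genes are 0-based, half-open (start, end) intervals on a linear sequence;
    strand is ignored and the chromosome is NOT treated as circular.
    """
    igrs = []
    covered_to = 0  # rightmost gene end seen so far
    for start, end in sorted(genes):
        if start - covered_to >= min_len:
            igrs.append((covered_to, start, genome[covered_to:start]))
        # max() keeps overlapping or nested genes from reopening a gap
        covered_to = max(covered_to, end)
    if len(genome) - covered_to >= min_len:
        igrs.append((covered_to, len(genome), genome[covered_to:]))
    return igrs


if __name__ == "__main__":
    toy_genome = "A" * 300  # made-up sequence
    toy_genes = [(60, 120), (100, 150), (220, 280)]
    for start, end, seq in extract_igrs(toy_genome, toy_genes):
        print(start, end, len(seq))
```

Tracking the rightmost covered position, instead of only the previous gene's end, is what makes overlapping annotations safe.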
12. Characteristics of bacterial ncRNA
- intergenic sequence/structure conservation between related genomes
- encoded by free-standing genes, oriented opposite to both flanking genes
- 50 to 400 nt long (avg. > 200 nt)
- higher GC content than the average intergenic space
- σ70 promoter
- ρ-independent terminator
- imperfect sequence complementarity with target mRNA
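Two of these characteristics, length (50 to 400 nt) and GC content above the intergenic background, translate directly into a cheap pre-filter. A minimal sketch, assuming candidates and IGRs are plain DNA strings; promoters, terminators, and conservation are not modelled here, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def prefilter(candidates: list[str], igrs: list[str],
              min_len: int = 50, max_len: int = 400) -> list[str]:
    """Keep candidates of ncRNA-like length with GC above the IGR background."""
    background = gc_content("".join(igrs))  # pooled GC of all intergenic DNA
    return [c for c in candidates
            if min_len <= len(c) <= max_len and gc_content(c) > background]


if __name__ == "__main__":
    toy_igrs = ["AT" * 50, "GC" * 25 + "AT" * 25]  # made-up background, GC 0.25
    toy_candidates = ["GC" * 30, "AT" * 30, "GC" * 10, "GCA" * 20]
    print(prefilter(toy_candidates, toy_igrs))
```

Pooling all IGRs before computing GC weights each intergenic region by its length, so short regions do not dominate the background estimate.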
13. Characteristics of bacterial ncRNA
14. The structure approach with RNAz
The function of many ncRNAs depends on a defined secondary structure
- multiple sequence alignment
- measure of thermodynamic stability (z-score)
- measure of RNA secondary structure conservation (structure conservation index, SCI)
15. The structure approach
Thermodynamic stability
- calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
- the MFE depends on the length and the base composition of the sequence
- and is therefore difficult to interpret in absolute terms
- RNAz calculates a normalized measure of thermodynamic stability:
- it compares the MFE $m$ of a given (native) sequence
- with the MFEs of a large number of random sequences with similar length and base composition
- a z-score is calculated as $z = \frac{m - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the MFEs of the random samples
- a negative z-score indicates that a sequence is more stable than expected by chance
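The z-score recipe can be sketched as: shuffle the native sequence many times (a permutation keeps length and base composition exactly), fold each shuffle, and standardize the native MFE against those values. Folding is outside this sketch: the MFEs are passed in as numbers, which in practice a folding program (for example ViennaRNA's RNAfold) would supply. This follows the sampling description above; RNAz itself may compute $\mu$ and $\sigma$ differently, and all values in the demo are made up.

```python
import random
import statistics


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Random permutation of seq: same length, same base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mfe_z_score(native_mfe: float, random_mfes: list[float]) -> float:
    """z = (m - mu) / sigma, with mu and sigma taken from the shuffled MFEs."""
    mu = statistics.mean(random_mfes)
    sigma = statistics.stdev(random_mfes)  # sample standard deviation
    return (native_mfe - mu) / sigma


if __name__ == "__main__":
    rng = random.Random(0)
    native = "GGGAAACCCAGGGAAACCC"  # made-up sequence
    shuffled = [shuffle_sequence(native, rng) for _ in range(1000)]
    # In practice: random_mfes = [fold(s) for s in shuffled], where fold is an
    # MFE folding program. The numbers below are made-up placeholders.
    print(mfe_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0]))
```

A mononucleotide shuffle is the simplest composition-preserving randomization; it does not preserve dinucleotide frequencies, which also influence stacking energies.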
16. The structure approach
Structural conservation
- RNAz predicts a consensus secondary structure for an alignment
- this results in a consensus MFE $E_A$
- RNAz compares this consensus MFE to the average MFE of the individual sequences, $\bar{E}$, and calculates a structure conservation index, $\mathrm{SCI} = E_A / \bar{E}$
- the SCI will be low if no consensus fold can be found
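The SCI itself is a single ratio once the MFEs are known. A minimal sketch, assuming the consensus MFE $E_A$ (from an alignment-folding tool such as RNAalifold) and the single-sequence MFEs are already computed; the all-zero case is handled by a convention chosen for this sketch, not taken from RNAz.

```python
def structure_conservation_index(consensus_mfe: float,
                                 individual_mfes: list[float]) -> float:
    """SCI = E_A / mean(E): consensus MFE over the mean single-sequence MFE."""
    mean_mfe = sum(individual_mfes) / len(individual_mfes)
    if mean_mfe == 0:
        # No sequence folds at all; returning 0.0 here is this sketch's choice.
        return 0.0
    return consensus_mfe / mean_mfe


if __name__ == "__main__":
    # Made-up MFEs (kcal/mol) for a two-sequence alignment
    print(structure_conservation_index(-20.0, [-25.0, -15.0]))
    print(structure_conservation_index(-5.0, [-25.0, -15.0]))
```

An SCI near 1 means the sequences fold as well jointly as they do individually; a consensus MFE much weaker than the individual ones pushes the SCI toward 0.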
17. The structure approach
- the z-score and SCI are used to classify an alignment as "structural RNA" or "other"
- RNAz uses a support vector machine (SVM) learning algorithm trained on a set of known ncRNAs
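To show how a (z-score, SCI) pair becomes a "structural RNA"/"other" decision, here is a plain linear SVM (hinge loss, trained by dual coordinate descent) on those two features. This is not RNAz's pre-trained model, whose kernel, features, and training set differ; the training points are made-up toy values, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=10.0, epochs=1000):
    """Soft-margin linear SVM (hinge loss) by dual coordinate descent.

    X: feature vectors, e.g. (z-score, SCI); y: +1 (structural RNA) or -1.
    A constant 1 is appended to each vector so the bias is learned as a weight.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)  # dual variables, kept inside [0, C]
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            grad = yi * dot(w, xi) - 1.0  # dual gradient for alpha_i
            new = min(max(alpha[i] - grad / dot(xi, xi), 0.0), C)
            step = new - alpha[i]
            if step:
                # keep w = sum_j alpha_j * y_j * x_j in sync with alpha
                w = [wj + step * yi * xj for wj, xj in zip(w, xi)]
                alpha[i] = new
    return w


def classify(w, x):
    return "structural RNA" if dot(w, list(x) + [1.0]) > 0 else "other"


if __name__ == "__main__":
    # Made-up (z-score, SCI) values: low z and high SCI for the RNA class
    X = [(-3.0, 0.9), (-2.5, 0.8), (-4.0, 1.0), (0.2, 0.3), (-0.5, 0.2), (0.8, 0.4)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(classify(w, (-3.5, 0.95)), classify(w, (0.5, 0.25)))
```

Appending a constant feature regularizes the bias along with the weights, a common simplification in linear SVM solvers.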
18. Analysis pipeline of the Freiburg group
- extraction of intergenic regions (IGRs; length threshold 50 nt)
- local alignment of IGRs with BLASTN
- hits with E-value above 10^-8 are discarded
- reverse complement of candidate sequences
- unify overlapping candidates to reduce redundancy
- clustering using ClustalW
- scoring using RNAz
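The filtering and redundancy-reduction steps of this pipeline can be sketched in a few helpers. Assumptions: BLASTN hits have already been parsed into hypothetical `Hit` records (parsing is not shown), coordinates refer to a single IGR as half-open intervals, and the 10^-8 E-value cutoff is treated as inclusive. This is not the Freiburg group's code.

```python
from typing import NamedTuple

# Complement table for DNA; U is not handled because genomic input is assumed.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class Hit(NamedTuple):
    """Hypothetical simplified BLASTN hit: IGR coordinates and E-value."""
    start: int
    end: int
    evalue: float


def significant(hits: list[Hit], max_evalue: float = 1e-8) -> list[Hit]:
    """Discard hits whose E-value exceeds the cutoff (cutoff assumed inclusive)."""
    return [h for h in hits if h.evalue <= max_evalue]


def unify_overlapping(hits: list[Hit]) -> list[tuple[int, int]]:
    """Merge overlapping or touching hit intervals into non-redundant regions."""
    merged: list[tuple[int, int]] = []
    for h in sorted(hits, key=lambda h: h.start):
        if merged and h.start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], h.end))
        else:
            merged.append((h.start, h.end))
    return merged


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    toy_hits = [Hit(10, 60, 1e-12), Hit(40, 90, 1e-9),
                Hit(200, 260, 1e-3), Hit(300, 350, 5e-20)]  # made-up hits
    print(unify_overlapping(significant(toy_hits)))
    print(reverse_complement("AACGT"))
```

Sorting by start position lets a single pass merge all overlaps, since any interval that overlaps the current region must begin before that region ends.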
19. Summary / Conclusion
- there are reliable computational methods to find ncRNA-coding genes in bacteria
- key methods involve
- IGR extraction and filtering
- observing sequence conservation in related genomes (BLAST search, ClustalW alignment)
- checking for structure conservation and thermodynamic stability
- the next step is to prove their existence experimentally via microarrays or Northern blots
20. Outlook
- might it be possible to predict target mRNAs?
Thanks for your attention!