Title: IslandPath: A computational aid for identifying genomic islands
IslandPath: A computational aid for identifying
genomic islands that may play a role in
microbial pathogenicity
William Hsiao (1), Nancy Price (2), Ivan Wan (3),
Steven J. Jones (3), and Fiona S. L. Brinkman (1)
(1) Department of Molecular Biology and
Biochemistry, Simon Fraser University, Burnaby,
(2) Department of Medical Genetics, University of
British Columbia, Vancouver, and (3) Genome
Sequence Centre, B.C. Cancer Agency,
British Columbia, Canada
www.pathogenomics.bc.ca/brinkman
Abstract
As more genomes from bacterial pathogens
are sequenced, it is becoming apparent that a
significant proportion of virulence factors are
encoded in clusters of genes, termed
Pathogenicity Islands (reviewed in 1). These
islands and other genomic islands tend to have
atypical guanine and cytosine content (GC),
contain mobility genes (e.g. transposases and
integrases), and are associated with tRNA
sequences. We have developed a web-based
computational tool, IslandPath, to aid the
visualization of these features in a full genome
display in order to facilitate the identification
of genes in new genome sequences that may be
involved in virulence or have horizontal origins.
The ability to visualize these features within
the genomic context can facilitate better
detection of the genomic island borders and
neighbouring genes. Atypical GC by itself is
not indicative of the horizontal origin of the
sequence involved; however, the predictive power
increases when such regions are associated with
mobile elements or direct repeats, or contain
genes with similarity to known virulence factors.
Therefore, we are incorporating into IslandPath
algorithms to detect partial tRNAs in new genomic
sequences that are likely to be remnants of
phage insertion events, and we are also comparing
the genomic sequences to a custom-built database
of a subset of known virulence factors.
Preliminary results from our investigation of
the ability of IslandPath to visualize known
Pathogenicity Islands as distinct regions within
the genomes are encouraging. This computational
tool also permitted us to perform a more in-depth
analysis of GC variance in genomes and enabled
us to detect correlations not previously
reported. As more and more genome data become
available, tools like IslandPath, which can be
updated in an automated fashion, will become
valuable for genomic research.
Frequencies of ORF GC in Genomes
GC Analysis for Complete Genome Sequences
Histograms of frequencies of GC were plotted
for several organisms.
Bacterial pathogen | Primary disease(s) | Cellular localization | No. of ORFs | GC mean (ORFs >300 bp) | GC S.D. (ORFs >300 bp)
Neisseria meningitidis serogroup B strain MC58 | meningitis | extracellular | 2025 | 52.4 | 6.9
Neisseria meningitidis serogroup A strain Z2491 | meningitis | extracellular | 2121 | 52.6 | 6.5
Xylella fastidiosa | citrus variegated chlorosis | extracellular | 2766 | 53.4 | 5.4
Escherichia coli O157:H7 (E. coli O157:H7_EDL933) | diarrhoea | facultative intracellular | 5361 (5349) | 51.1 (51.9) | 5.3 (5.3)
Mycoplasma pneumoniae M129 | mycoplasmal pneumonia ("walking pneumonia") | extracellular | 677 | 40.3 | 4.9
Yersinia pestis strain CO92 | bubonic plague and pneumonic plague | facultative intracellular | 3885 | 48.3 | 4.7
Streptococcus pneumoniae TIGR4 (S. pneumoniae R6) | bacterial pneumonia, meningitis, sepsis, and otitis media | extracellular | 2094 (2043) | 40.3 (40.4) | 4.4 (4.3)
Treponema pallidum Nichols | syphilis | extracellular | 1031 | 51.4 | 4.2
Mycoplasma pulmonis | murine respiratory mycoplasmosis | extracellular | 782 | 27.2 | 3.8
Pseudomonas aeruginosa PAO1 | variety of mucosal infections (opportunistic) | extracellular | 5565 | 67.0 | 3.8
Rickettsia conorii Malish 7 | Mediterranean spotted fever | obligate intracellular | 1374 | 32.4 | 3.8
Ureaplasma urealyticum serovar 3 | urethritis | extracellular | 613 | 25.8 | 3.8
Vibrio cholerae N16961 | cholera | extracellular | I: 2736; II: 1092 | I: 48.1; II: 46.9 | I: 3.7; II: 4.3
Borrelia burgdorferi B31 | Lyme disease | facultative intracellular | 851 | 28.7 | 3.6
Streptococcus pyogenes | scarlet fever, toxic shock like syndrome | extracellular | 1696 | 38.9 | 3.6
Mycoplasma genitalium G37 | urethritis (opportunistic, usually HIV patients) | extracellular | 484 | 31.4 | 3.5
Campylobacter jejuni NCTC11168 | gastroenteritis | extracellular | 1654 | 30.6 | 3.5
Helicobacter pylori 26695 (H. pylori J99) | peptic ulcers and gastritis | extracellular | 1566 (1491) | 39.4 (39.7) | 3.4 (3.3)
Haemophilus influenzae Rd-KW20 | upper respiratory infection, meningitis | extracellular | 1709 | 38.5 | 3.4
Mycobacterium tuberculosis CDC1551 (M. tuberculosis H37Rv) | tuberculosis | facultative intracellular | 4187 (3918) | 65.5 (65.6) | 3.3 (3.3)
Pasteurella multocida PM70 | fowl cholera, cattle septicemia, etc. | extracellular | 2014 | 40.8 | 3.3
Rickettsia prowazekii Madrid E | epidemic typhus | obligate intracellular | 834 | 30.1 | 3.3
Staphylococcus aureus Mu50 (S. aureus N315) | food poisoning, toxic shock syndrome, necrotizing fasciitis | extracellular | 2714 (2595) | 33.3 (32.2) | 3.0 (3.0)
Mycobacterium leprae | leprosy | obligate intracellular | 2720 | 60.0 | 2.9
Agrobacterium tumefaciens C58 (Cereon) | crown gall (in plants) | extracellular | c: 2721; l: 1833 | c: 59.8; l: 59.7 | c: 2.7; l: 2.9
Chlamydophila pneumoniae AR39 (C. pneumoniae J138); C. pneumoniae CWL029 | chlamydial pneumonia | obligate intracellular | 1110 (1070); 1052 | 41.1 (41.1); 41.1 | 2.6 (2.6); 2.6
Chlamydia trachomatis D | chlamydia | obligate intracellular | 894 | 41.5 | 2.3
Chlamydia muridarum MoPn | chlamydia | obligate intracellular | 909 | 40.8 | 2.2
Observations
- Lowest kurtosis occurs most commonly with a
mode of 33.33 for GC values of ORFs in a genome
(e.g. M. jannaschii DSM2661). This GC value
corresponds to maximum A/T in synonymous sites
for the standard codon usage table.
- Long tails in the frequency plots occur more
frequently downward (e.g. H. pylori J99 and
N. meningitidis) than upward.
These observations likely reflect either a bias
in gene identification in high GC genomes or
selection toward higher AT content.
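The tail and kurtosis observations above can be quantified with standard shape statistics. The following is a minimal sketch, not the poster's own analysis: it assumes population (biased) moment estimators, and the function name `shape_stats` and the GC values are hypothetical. A negative skewness corresponds to a long tail toward low GC.

```python
import math


def shape_stats(values):
    """Return (mean, sd, skewness, excess_kurtosis) from population moments.

    Assumes at least two distinct values; otherwise the variance is zero
    and the ratios below are undefined.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, sd, skewness, excess_kurtosis


# Hypothetical per-ORF GC percentages: most ORFs near 52, two low-GC outliers.
toy_gc = [52.0, 53.0, 51.0, 52.5, 53.5, 52.0, 40.0, 38.0]
toy_mean, toy_sd, toy_skew, toy_kurt = shape_stats(toy_gc)
print(f"mean={toy_mean:.1f} sd={toy_sd:.2f} skew={toy_skew:.2f}")
```

For the toy values the skewness is negative, which matches a histogram whose long tail points downward.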
Discussion
- IslandPath appears to be an effective automated
tool to visualize and detect genomic islands.
Previous reports have expressed concern about the
use of GC to detect HGT; however, these reports
examined GC for individual genes. We propose
that GC analysis is effective if clusters of
genes containing motifs associated with mobility
elements are considered.
- Foreign genes with GC similar to the organism's
genome are not detected, and due to gene
amelioration, only recent HGT can be detected.
This tool represents one approach that can be
complemented with others to prioritize
particular genomic islands that merit further
research.
Future developments
- Virulence factor homology search (based on
comparison to our VGS dataset)
- Alternative DNA signatures (e.g. codon usage)
- Allow users to input their own sequences for
analysis
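The Discussion's proposal, that GC is more informative for clusters of genes that include mobility elements than for individual genes, can be illustrated with a short sketch. This is an assumption-laden illustration rather than IslandPath's actual procedure: the function `candidate_islands`, the one-standard-deviation cutoff `k`, the minimum run length `min_len`, and the ORF records are all hypothetical.

```python
def candidate_islands(orfs, mean, sd, k=1.0, min_len=3):
    """Return inclusive (start, end) index pairs of candidate islands.

    A candidate is a run of at least `min_len` consecutive ORFs whose GC
    differs from the genome mean by more than k * sd, and which contains
    at least one ORF flagged as a mobility gene.
    `orfs` is a list of dicts with keys "gc" (float) and "mobility" (bool),
    in genome order. The cutoffs are illustrative assumptions.
    """
    islands = []
    run_start = None
    # A trailing None acts as a sentinel so a run at the genome end is closed.
    for i, orf in enumerate(orfs + [None]):
        atypical = orf is not None and abs(orf["gc"] - mean) > k * sd
        if atypical and run_start is None:
            run_start = i
        elif not atypical and run_start is not None:
            run = orfs[run_start:i]
            if len(run) >= min_len and any(o["mobility"] for o in run):
                islands.append((run_start, i - 1))
            run_start = None
    return islands


# Hypothetical ORFs in genome order: (GC percent, annotated as mobility gene).
toy = [(52, False), (51, False), (40, False), (39, True), (41, False),
       (52, False), (60, False), (61, False), (62, False), (51, False)]
toy_orfs = [{"gc": float(gc), "mobility": mob} for gc, mob in toy]
found = candidate_islands(toy_orfs, mean=51.0, sd=4.0)
print(found)
```

In the toy data both the low-GC run (indices 2 to 4) and the high-GC run (indices 6 to 8) are atypical, but only the first contains a mobility gene, so only it is reported. This mirrors the caveat in the text that atypical GC alone is weak evidence.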
Methods
- Core scripts written in Perl and CGI/Perl
- Sequence data: NCBI Genome FTP site
- Potential mobility elements: COG analysis (2, 3)
plus keyword scan
- RNA locations: NCBI data plus tRNAscan-SE (4)
- GC calculated for each ORF
- Mean and Std. Dev. for all ORFs in genome
calculated
- File containing all ORF information used to
generate a graphical representation
- Virulence Gene Subset (VGS) database developed
through literature analysis of genes identified
as virulence factors using the Molecular Koch's
Postulates (i.e. gene knockout affects virulence)
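The per-ORF GC calculation, the genome-wide mean and standard deviation, and the keyword scan listed in the Methods can be sketched as follows. The original scripts were written in Perl; this Python version is only an illustration. The keyword list is taken from the examples in the abstract (transposases and integrases), the 300 bp length filter follows the table headers, and the use of the sample standard deviation, the function names, and the ORF records are assumptions.

```python
import statistics

# Keywords taken from the abstract's examples; the real list is not given.
MOBILITY_KEYWORDS = ("transposase", "integrase")


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def summarize(orfs, min_len=300):
    """Annotate ORFs and return (rows, mean_gc, sd_gc).

    `orfs` is a list of (name, product_description, sequence) tuples.
    Mean and S.D. use only ORFs longer than `min_len` bp, and at least
    two such ORFs are required. The sample S.D. is an assumption.
    """
    rows = []
    for name, product, seq in orfs:
        rows.append({
            "name": name,
            "length": len(seq),
            "gc": gc_percent(seq),
            # Simple case-insensitive keyword scan of the annotation text.
            "mobility": any(k in product.lower() for k in MOBILITY_KEYWORDS),
        })
    long_gc = [r["gc"] for r in rows if r["length"] > min_len]
    return rows, statistics.mean(long_gc), statistics.stdev(long_gc)


# Hypothetical ORF records; sequences are synthetic repeats, not real genes.
toy_records = [
    ("orf1", "hypothetical protein", "ATGC" * 100),   # 400 bp, 50% GC
    ("orf2", "putative Transposase", "AT" * 200),     # 400 bp, 0% GC
    ("orf3", "DNA polymerase", "GC" * 200),           # 400 bp, 100% GC
    ("orf4", "short peptide", "GGGGCCCC"),            # 8 bp, excluded
]
rows, mean_gc, sd_gc = summarize(toy_records)
print(f"mean GC = {mean_gc:.1f}, S.D. = {sd_gc:.1f}")
```

The resulting rows carry the same fields (GC and a mobility flag) that a downstream display or cluster search would need.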
References
1. Hacker J and Kaper JB, 2000, Annu Rev
Microbiol. 54:641-79
2. Tatusov RL, et al., 1997, Science
278(5338):631-7
3. Tatusov RL, et al., 2001, Nucleic Acids Res.
29(1):22-8
4. Lowe TM and Eddy SR, 1997, Nucleic Acids Res.
25(5):955-64
5. Heidelberg JF, et al., 2000, Nature 406:477-84
Acknowledgements
This project is funded by the Peter Wall
Institute for Advanced Studies. We wish to thank
Tatiana Tatusov of NCBI for providing helpful
files for IslandPath and acknowledge the efforts
of the many genome projects that have made our
analysis possible.
Non-pathogen | No. of ORFs | GC mean (ORFs >300 bp) | GC S.D. (ORFs >300 bp)
Escherichia coli K12 | 4289 | 51.3 | 4.7