Title: Genome Analysis of Burkholderia pseudomallei
1. Genome Analysis of Burkholderia pseudomallei
2. Introduction
- Burkholderia pseudomallei is a free-living, gram-negative soil bacillus that causes melioidosis
- The genome was completely sequenced in May 2003 by the Sanger Institute
- Two replicons
  - 4 Mbp (chromosome 1)
  - 3 Mbp (chromosome 2)
- No annotation yet
3. Objectives
- To construct the genome annotation database
- To analyze the genome using the database
  - General analysis
  - Comparison of the two chromosomes
  - Prediction of putative Pathogenicity Islands (clusters of virulence genes)
4. Creation of B. pseudomallei Genome Database - Annotation
- Predict ORFs with GeneMark: 5679 ORFs predicted
- ORFs > BLAST: 4987 ORFs have homologs
- ORFs > NCBI Conserved Domain Database: 4161 ORFs contain Pfam domains; 4332 ORFs contain COG domains
- ORFs > Functional Categories: 3553 ORFs in categories
- ORFs > Gene Ontology (vocabulary to classify genes in more detail): 3066 ORFs in GO
- 30 Java or Perl programs have been created to
carry out the tasks
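The counts on this slide can be turned into coverage percentages with simple arithmetic. The snippet below is only an illustrative sketch: it is written in Python for brevity (the original tools were Java or Perl), and the dictionary keys are labels chosen for this example.

```python
# Counts taken from the annotation slide; the key names are illustrative labels.
TOTAL_ORFS = 5679
annotated = {
    "BLAST homolog": 4987,
    "Pfam domain": 4161,
    "COG domain": 4332,
    "Functional category": 3553,
    "Gene Ontology term": 3066,
}


def coverage(counts, total):
    """Return the percentage of ORFs covered by each annotation step."""
    return {step: 100.0 * n / total for step, n in counts.items()}


if __name__ == "__main__":
    for step, pct in coverage(annotated, TOTAL_ORFS).items():
        print(f"{step:<20} {pct:5.1f}%")
```

For instance, roughly 88% of the predicted ORFs have a BLAST homolog, and about 54% receive a GO term.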
5. B. pseudomallei Genome Database
- 16 tables were created in the database to store the annotation results
- 19 Perl scripts were written for the web interface of the database
- http://origin.bic.nus.edu.sg/xiechao/query.html
6. Analyses of B. pseudomallei Genome - General features
7. Analyses of B. pseudomallei Genome - Comparative genomics
- The ORFs were compared with all 155 completely sequenced microbial genomes
- 37% of B. pseudomallei genes have their best homolog among R. solanacearum genes
- Percentage of B. pseudomallei ORFs with best homolog matches, by source organism:
  - No homolog: 15%
  - R. solanacearum: 37%
  - P. aeruginosa: 7%
  - C. violaceum: 5%
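A best-homolog breakdown of this kind could be tallied roughly as follows. This is a minimal sketch, not the original Java or Perl code: the function name, the `(orf_id, organism, bit_score)` hit format, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def best_hit_shares(hits, all_orfs):
    """Percentage of ORFs whose best-scoring hit falls in each organism.

    hits: iterable of (orf_id, organism, bit_score) tuples (assumed format).
    all_orfs: list of every ORF id; ORFs with no hit count as "No homolog".
    """
    best = {}
    for orf, organism, score in hits:
        if orf not in best or score > best[orf][1]:
            best[orf] = (organism, score)
    counts = Counter(organism for organism, _ in best.values())
    counts["No homolog"] = sum(1 for orf in all_orfs if orf not in best)
    total = len(all_orfs)
    return {organism: 100.0 * n / total for organism, n in counts.items()}


# Toy data for illustration only; not taken from the real genome.
toy_hits = [
    ("orf1", "R. solanacearum", 410.0),
    ("orf1", "P. aeruginosa", 220.0),
    ("orf2", "P. aeruginosa", 95.0),
    ("orf3", "R. solanacearum", 300.0),
]
toy_shares = best_hit_shares(toy_hits, ["orf1", "orf2", "orf3", "orf4"])
```

Running the same tally separately on the ORFs of each replicon would give a per-replicon comparison.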
8. Replicon 2 is more diverse than Replicon 1
[Figure: percentage of ORFs with best homolog matches and their source organism for each B. pseudomallei replicon. Chart labels: No homolog; R. solanacearum 19, 49; P. aeruginosa 10, 5; C. violaceum 6, 5]
9. Chromosome vs Plasmid?
- No clear definition
- Origin of replication: DnaA vs RepA
- Chromosome - generally carries housekeeping genes
- Plasmid - carries genes conferring a growth advantage, virulence factors, etc., but none that are essential
- Therefore, next examine the origin of replication and the genes encoded by the two replicons
10. Is Replicon 2 a megaplasmid? - Analysis of origins of replication
- A Java program, ORIPredict, was written to predict the origin of replication
- GC-skew analysis (based on the bias toward G in the leading strand during replication) (Lobry, JR 1996)
- Skew = (C-G)/(C+G); window size 20 kbp, sliding step 5 kbp
[Figure labels: Replicon 2; Origin of replication]
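A sliding-window GC-skew scan of the kind described on this slide might look like the sketch below. It is not the ORIPredict source: the function names and the cumulative-skew heuristic for picking a candidate origin are assumptions. The skew uses the slide's sign convention, (C-G)/(C+G), so the cumulative skew peaks where the strand switches from C-rich to G-rich.

```python
def gc_skew_windows(seq, window=20_000, step=5_000):
    """Return (start, skew) pairs, with skew = (C - G) / (C + G) per window."""
    seq = seq.upper()
    skews = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        c, g = chunk.count("C"), chunk.count("G")
        skews.append((start, (c - g) / (c + g) if c + g else 0.0))
    return skews


def candidate_origin(skews):
    """Start of the window at which the cumulative skew is largest.

    Heuristic assumed for this sketch: with skew = (C - G) / (C + G), the
    running total rises over the C-rich stretch and falls once G dominates.
    """
    total, best_total, best_start = 0.0, float("-inf"), None
    for start, skew in skews:
        total += skew
        if total > best_total:
            best_total, best_start = total, start
    return best_start


# Toy sequence: a C-rich half followed by a G-rich half (switch at position 100).
toy_seq = "ACCT" * 25 + "AGGT" * 25
toy_origin = candidate_origin(gc_skew_windows(toy_seq, window=20, step=5))
```

On a real replicon the defaults (20 kbp window, 5 kbp step) match the parameters quoted on the slide. A circular sequence would also need windows that wrap around the end, which this sketch omits.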
11. Replicon 2 has a plasmid-like ori
- Plasmid-type origin of replication
- Plasmid replication initiator RepA
- Direct repeats in AT-rich region 15 (iteron)
- Low-copy-number plasmid partition proteins ParA and ParB
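Direct repeats such as iterons can be looked for by counting repeated k-mers in the candidate origin region. The sketch below is a generic illustration under assumed parameters (`k`, `min_copies`) and a made-up toy sequence. It does not reproduce how the repeats on this slide were actually found.

```python
from collections import defaultdict


def direct_repeats(seq, k, min_copies=3):
    """Map each k-mer occurring at least `min_copies` times to its start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}


# Toy region: one 8-mer repeated three times with short spacers (illustrative only).
toy_region = "TTGACAAT" + "GG" + "TTGACAAT" + "CC" + "TTGACAAT"
toy_repeats = direct_repeats(toy_region, k=8)
```

Exact matching is the simplest choice. Real iterons are often imperfect repeats, so a practical search would tolerate a few mismatches.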
12. Housekeeping genes: complete in Replicon 1, incomplete in Replicon 2
- Replicon 1 encodes a complete set of essential housekeeping genes required for
  - (1) DNA replication and cell division
  - (2) transcription
  - (3) translation
- Replicon 1 also encodes other essential pathways
  - purine and pyrimidine biosynthesis and salvage
  - coenzyme biosynthesis
  - amino acid biosynthesis
  - electron transport and phosphorylation
- None of the above pathways is complete on replicon 2
  - Encodes some genes in the pathways
  - Keeps the pathways working in various environments?
  - More transcriptional regulators (p = 3.1e-6): more robust control?
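The completeness check on this slide amounts to a set comparison: for each pathway, which required genes are missing from a replicon? The sketch below illustrates that idea. The pathway definitions and the two toy gene sets are heavily abbreviated placeholders invented for this example, not the annotation of either replicon.

```python
def pathway_completeness(pathways, replicon_genes):
    """Return {pathway: (is_complete, missing_genes)} for one replicon."""
    present = set(replicon_genes)
    report = {}
    for name, required in pathways.items():
        missing = sorted(set(required) - present)
        report[name] = (not missing, missing)
    return report


# Hypothetical, abbreviated gene sets for illustration only.
toy_pathways = {
    "DNA replication": {"dnaA", "dnaB", "dnaE", "dnaN"},
    "transcription": {"rpoA", "rpoB", "rpoC", "rpoD"},
}
toy_complete = {"dnaA", "dnaB", "dnaE", "dnaN", "rpoA", "rpoB", "rpoC", "rpoD"}
toy_partial = {"dnaB", "rpoD"}

complete_report = pathway_completeness(toy_pathways, toy_complete)
partial_report = pathway_completeness(toy_pathways, toy_partial)
```

A replicon that carries only some genes of every pathway, like `toy_partial` here, is reported as incomplete together with the list of missing genes.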
13. Conclusion I
- Chromosome 2 is very probably a megaplasmid
- Possibly dispensable, as replicon 1 encodes a full set of housekeeping genes and other essential genes
- Size: a 3 Mbp plasmid?
14. Prediction of Pathogenicity Islands in B. pseudomallei
- Virulence genes are often clustered together in a Pathogenicity Island (PAI)
- Most PAIs are putatively alien
  - Different genomic properties compared to the whole genome
- Karlin (2001) reviewed 5 criteria to identify PAIs
  - GC frequency
  - Dinucleotide bias
  - Codon usage bias
  - Amino acid usage bias
  - Putative alien gene clusters
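As an illustration of the dinucleotide bias criterion, the sketch below compares the dinucleotide relative abundances of two sequences, for example a window and the whole genome. It is a simplified single-strand version written for this example, whereas Karlin's measure is normally computed on the sequence combined with its reverse complement. The function names are assumptions, and nothing here is taken from PAIPredict.

```python
from itertools import product

BASES = "ACGT"


def dinucleotide_rho(seq):
    """Relative abundance rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides.

    Assumes a sequence of at least two bases; single strand only.
    """
    seq = seq.upper()
    base_freq = {b: seq.count(b) / len(seq) for b in BASES}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    rho = {}
    for x, y in product(BASES, repeat=2):
        observed = pairs.count(x + y) / len(pairs)
        expected = base_freq[x] * base_freq[y]
        rho[x + y] = observed / expected if expected else 0.0
    return rho


def dinucleotide_delta(seq_a, seq_b):
    """Mean absolute difference between the two relative-abundance profiles."""
    rho_a, rho_b = dinucleotide_rho(seq_a), dinucleotide_rho(seq_b)
    return sum(abs(rho_a[d] - rho_b[d]) for d in rho_a) / 16
```

A window whose delta against the whole genome is unusually large is a candidate alien region under this criterion. What counts as large is a threshold choice that the slides do not specify.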
15. Prediction of PAI in B. pseudomallei - Implementation
- A Java program, PAIPredict, with a graphical user interface was written
- The prediction of B. pseudomallei PAIs was carried out using this program
16. Prediction of PAI in B. pseudomallei - Replicon 1
Window size 100 kbp; sliding step 5 kbp
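The window and step sizes above describe a sliding-window scan along the replicon. The sketch below shows such a scan for the simplest criterion, GC frequency, flagging windows that deviate from the whole-sequence value. The function names and the `threshold` value are assumptions for illustration, and PAIPredict itself may combine the criteria differently.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def scan_gc_deviation(genome, window=100_000, step=5_000, threshold=0.05):
    """Return (start, gc, deviation) for windows whose GC fraction differs
    from the whole-sequence GC fraction by more than `threshold`."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > threshold:
            flagged.append((start, gc, gc - genome_gc))
    return flagged


# Toy sequence: a GC-rich background with an AT-rich insert at 400-500.
toy_genome = "GC" * 200 + "AT" * 50 + "GC" * 200
toy_flagged = scan_gc_deviation(toy_genome, window=100, step=50, threshold=0.2)
```

In the toy run only the windows overlapping the AT-rich insert are flagged. Adjacent flagged windows would then be merged into one candidate region.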
17. Replicon 1 has 3 putative PAIs
18. Replicon 2 results
Window size 100 kbp; sliding step 5 kbp
19. Replicon 2 has 4 putative PAIs
20. Conclusion
- The B. pseudomallei Genome Database (BpmDB) was constructed, and a web interface for it was designed
- Many software programs were created
  - PAI prediction package (PAIPredict)
  - Origin of replication prediction (ORIPredict)
  - Database interface Perl scripts
  - 30 other Java or Perl programs or scripts
- Chromosome 2 is probably a 3.2 Mbp megaplasmid (the largest known so far is 2.1 Mbp)
  - Experimental verification needed
  - E.g. knock out the replicon 2 ori
- Discovery of 5 new putative PAIs in the B. pseudomallei genome; 2 PAIs confirmed
21. Acknowledgement
- I would like to express my sincere appreciation
to my supervisors, Dr Chua Kim Lee and A/P Tan
Tin Wee, for their guidance, patience, and frank
advice throughout the entire project