Title: Poxviruses, Biodefense and Bioinformatics
1Poxviruses, Biodefense and Bioinformatics
- Working towards a better understanding of viral
pathogenesis and evolution
2Bioinformatics
- Managing Complexity
- Technology development
- Enhancing Understanding
- Research
3Managing Complexity
- Data
- Acquisition
- Storage
- Manipulation
- Retrieval
4Managing Complexity
- Data Analysis
- Development and Utilization of
- Analytical tools
- Visualization tools
5Enhancing Understanding
- What distinguishes one organism from another?
- Sequence
- Molecular Biology
- Physiology
- Pathogenesis
- Epidemiology
- Evolution
- Will the genomic sequence provide an explanation
for the differences?
6What is Bioinformatics?
- Computer-aided analysis of biological information
- Discerning the characteristic (repeatable)
patterns in biological information that help to
explain the properties and interactions of
biological systems.
- Caveat
- In the end, bioinformatics (a.k.a. computers) can
only help in making inferences concerning
biological processes.
- These inferences (or hypotheses) have to be
tested in the laboratory
7The Poxvirus Bioinformatic Resource
PBR
8PBR Collaborators
- UAB
- Elliot Lefkowitz
- St. Louis University
- Mark Buller
- University of Victoria
- Chris Upton
- ATCC
- Charles Buck
- Medical College of Wisconsin
- Paula Traktman
9The UAB MGBF Contingent: Molecular and Genetic
Bioinformatics Facility
- Programmers
- Jim Moon
- Don Dempsey
- Uma Dave
- Bei Hu
- Students
- Chunlin Wang
- Fellows
- Shankar Changayil
- Xiaosi Han
10Poxviruses
- Large dsDNA genome
- 150,000 to 300,000 base pairs
- 150 to 260 genes
- Complex virion morphology
- Cytoplasmic replication
- Array of immunoevasion strategies.
- Human pathogens
- Molluscum contagiosum
- Variola
- Monkeypox
11The PBR is Designed to Support
- Basic and applied research on Poxviruses,
including the development of new
- Environmental Detectors
- Diagnostic Reagents
- Animal Models
- Vaccines
- Antiviral Compounds
12PBR Design Philosophy
- Useful and Used
- Supporting all poxvirus investigators
- UAB PBR Web-based application requirements
- Web Browser
- Java plugin
- In-depth analyses
- UVic analytical tools
20BLAST
- Search a sequence database for primary sequence
similarities to some query sequence
- Provides a measure of the significance of the
similarity
- Does not necessarily imply common evolutionary
origin
- Developed at NCBI
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W.
& Lipman, D.J. (1990) "Basic local alignment
search tool." J. Mol. Biol. 215:403-410.
30 18 Genomes; 563 genes; Avg. 31 genes/genome
35PBR Knowledge Database
- Mini review of available structure-function
information
- Human-curated database based on the literature
- Bibliographic information
- Available scientific resources
- clones, mutants, and antibodies
- Empirically-derived properties
- MW, pI . . .
- Post-translational modifications
- Expression
- Functional Assignments
- Gene Ontology controlled vocabulary
- Molecular function
- Biological Process
- Cellular component
- Virulence Ontology
37Molecular Evolution and Genomic Analyses of
Poxviruses
38Objectives
- To better understand the role individual genes
and groups of genes (or other genetic elements)
play in poxvirus (especially smallpox) host range
and virulence
- Try to describe and understand poxvirus diversity
via reconstruction of the family's evolutionary
history
39Orthopoxvirus Phylogeny
40Orthopoxvirus Phylogeny
132-gene tree possible
41 65-gene tree possible for Chordopoxviruses
42Horizontal Gene Transfer
- The acquisition of genetic material from another
organism that becomes a permanent addition to
the recipient's genome
- Many poxvirus genes involved in immune evasion
may have been acquired through HGT
- Detection of HGT
- Alternative base composition
- Alternative codon usage pattern
- Alternative evolutionary inheritance pattern
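The first two detection signals listed above can be sketched in a few lines of Python. This is a minimal illustration, not the PBR implementation: the function names are hypothetical, and the distance measure (sum of absolute differences in codon frequency) is an assumption chosen for simplicity. A gene whose GC fraction or codon usage differs strongly from the rest of the genome would be a candidate for closer inspection.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def codon_usage_distance(gene: str, reference: str) -> float:
    """Sum of absolute codon-frequency differences (hypothetical measure).

    0.0 means identical usage; 2.0 means no codons in common.
    """
    f, g = codon_frequencies(gene), codon_frequencies(reference)
    return sum(abs(f.get(c, 0.0) - g.get(c, 0.0)) for c in set(f) | set(g))
```

In practice the reference would be the concatenated coding sequences of the whole genome, and each gene would be compared against it.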
43Detecting HTGs by plotting codon usage
44GC distribution of Molluscum Contagiosum
MOCV-SB1_011
MOCV-SB1_055
MOCV-SB1_132
GC distribution in the Molluscum contagiosum genome,
smoothed using a wavelet technique. The blue
numbers give the position in the genome. The green bars
mark significant deviations, and a putative gene is
marked at each.
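The idea of scanning a genome for regions of unusual GC content can be sketched as follows. Note the substitution: the slide uses wavelet smoothing, whereas this sketch uses a plain fixed-size window and a z-score cutoff, which is simpler but cruder. The function names, window size, and cutoff are all assumptions for illustration.

```python
from statistics import mean, pstdev


def windowed_gc(seq: str, window: int, step: int) -> list:
    """Return (start position, GC fraction) for each full window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return profile


def deviant_windows(profile: list, z_cutoff: float = 2.0) -> list:
    """Windows whose GC fraction is more than z_cutoff SDs from the mean."""
    values = [gc for _, gc in profile]
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform profile: nothing deviates
    return [(pos, gc) for pos, gc in profile if abs(gc - mu) / sigma > z_cutoff]
```

Genes overlapping the flagged windows would then be checked by the other HGT criteria, since base composition alone is weak evidence.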
45VARV Proteins with Similarity to Human Proteins
- 3-beta-hydroxysteroid dehydrogenase
- Ankyrin
- CD47 antigen
- Carbonic Anhydrase
- Casein kinase 1
- Complement control protein
- DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide
- DNA ligase
- Glutaredoxin
- Hypothetical protein
- JNK-stimulating phosphatase
- Kelch-like protein
- Lymphocyte activation-associated protein
- Makorin zinc-finger protein
- Myosin heavy chain
- Plasminogen activator inhibitor
- Profilin
- RNA polymerase
- Ribonucleotide reductase M2
46Ribonucleotide Reductase Homolog Evolution
48TNF Receptor Homolog Evolution
50TNF Receptor GenBank nr Hits
56VARV B22R BLASTN Results
57Genome Comparison Variola major vs. minor
58Genome vs. Gene Phylogeny
59Molecular Evolution and Genomic Analyses of
Poxviruses
64Poxvirus Gene Prediction
- Little consistency from one genome to another
- Methods employed
- Minimum ORF size
- Similarity with previously described proteins
65Consistently predict and annotate the gene set
for all Poxvirus genomes
- Development of a comprehensive gene prediction
tool
- Discovery of new or missed genes
- Removal of pseudogenes
- As an added bonus
- Computational annotation of each predicted gene
66What is a gene?
- Does it look like a gene?
- Open Reading Frame
- Base composition
- Codon usage
- Is it expressed?
- Regulatory signals
- Transcription
- Translation
- Has it been previously recognized?
- Similarity searching
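The first criterion, an open reading frame above a minimum size, is easy to illustrate. The sketch below scans all six reading frames for ATG-to-stop ORFs. It is a simplified example under stated assumptions: the function names and the `min_codons` default are hypothetical, only ATG is treated as a start codon, and coordinates are reported on the strand that was scanned rather than mapped back to the forward strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 60):
    """Yield (strand, start, end, orf) for ATG...stop ORFs in all six frames.

    start and end are 0-based, end-exclusive offsets on the scanned strand;
    min_codons counts codons before the stop codon.
    """
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame
```

A size cutoff alone explains some of the inconsistency noted earlier: changing `min_codons` changes the predicted gene set, which is why the remaining criteria are needed.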
67Proposed gene finding tool
- Combination of a series of complementary gene
prediction algorithms
- DNA Signals
- ORF detection
- Base composition
- Codon preference
- HMM gene models
- Similarity searching
- BLAST similarity searches
- Similarity to identified poxvirus protein domains
using an HMM-based domain database
- Promoter detection
- Neural Network promoter detection tool
- Patterns of amino acid sequence conservation
- Biodictionary-based analysis
- Knowledge-based integration of all predictive
methods
- Computational conclusions
- Visualization tool for human inspection
68Using High Performance Computing to Speed Up
Bioinformatic Applications
70Features to consider in porting an application to
a cluster environment
- Balancing the processing workload among nodes is
critical to successful implementation
- A computational method with a lower percentage
load imbalance (PLIB) is more efficient than one
with a higher PLIB. The workload is perfectly
balanced if PLIB is equal to zero.
- Similarity searching workload can be difficult to
estimate
- Dependent on the nature of both the database and
query sequences
- sequence length
- number of sequences
- complexity of the sequences
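The slide does not give a formula for PLIB. The sketch below assumes one common definition, the gap between the largest and smallest per-node load expressed as a percentage of the largest; treat both the formula and the function name as assumptions. It does agree with the slide in the one respect stated there: the result is zero when every node carries the same load.

```python
def percent_load_imbalance(node_loads):
    """PLIB under the assumed definition: (largest - smallest) / largest * 100.

    node_loads is a non-empty sequence of per-node loads, for example
    wall-clock seconds each node spent searching.
    """
    largest, smallest = max(node_loads), min(node_loads)
    if largest == 0:
        return 0.0  # no work was done anywhere, so nothing is imbalanced
    return (largest - smallest) * 100.0 / largest
```

For example, if one node finishes in 80 seconds and another needs 100, the faster node idles for 20% of the run, so the PLIB is 20.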
71Data Segmentation
- Database Sequences
- Utilize when the database size is larger than
physical memory of each computational node
- Results need to be combined and statistics
recalculated
- Not possible with some applications (PSI-BLAST)
- Query Sequences
- Flexible and allows for better balancing of the
workload
- Statistics remain valid
- Database remains intact
- Best performance when the database can be fully
loaded into available memory
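Why statistics must be recalculated after database segmentation can be shown with a short example. Under Karlin-Altschul statistics the expected number of chance hits (the E-value) scales linearly with the size of the search space, so an E-value computed against one fragment understates the value for the full database. The sketch below rescales by the ratio of total to fragment residues. This is an approximation that ignores effective-length corrections, and the function names and the hit tuple layout are hypothetical.

```python
def rescale_evalue(fragment_evalue, fragment_residues, total_residues):
    """Approximate full-database E-value from a fragment search E-value."""
    return fragment_evalue * total_residues / fragment_residues


def merge_hits(per_fragment_hits, fragment_sizes):
    """Combine hits from all fragments and sort by rescaled E-value.

    per_fragment_hits: one list per fragment of (subject_id, evalue) tuples.
    fragment_sizes: residues in each fragment, in the same order.
    """
    total = sum(fragment_sizes)
    merged = [
        (subject, rescale_evalue(evalue, size, total))
        for hits, size in zip(per_fragment_hits, fragment_sizes)
        for subject, evalue in hits
    ]
    return sorted(merged, key=lambda hit: hit[1])
```

With query segmentation none of this is needed, because every query is searched against the intact database.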
72Work Flow for Database segmentation
- Database is split evenly and formatted
- Database fragments are sent to each node
- Query file is distributed to all nodes
- The search is initiated
- Output is collected for merging and formatting
73Work Flow for Query Segmentation
- Database is distributed to all nodes
- 90% of the query sequences are split into bins
and distributed among the available nodes
- Balanced for sequence length and number
- The remaining 10% of the query sequences
are delivered to nodes as they finish the initial
search
- Individual results are merged and reported
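The query segmentation plan above can be sketched as a small scheduling function. The details are assumptions, since the slide does not say how bins are balanced or which queries are held back: this sketch uses a greedy longest-first assignment to the least loaded node and reserves the shortest queries for dynamic dispatch. The function name and parameters are hypothetical.

```python
import heapq


def plan_query_segmentation(query_lengths, n_nodes, initial_fraction=0.9):
    """Return (bins, reserve) for a two-phase query distribution.

    query_lengths: dict mapping query id to sequence length.
    bins: one list of query ids per node, balanced greedily by total length.
    reserve: ids held back to hand out as nodes finish their initial bin.
    """
    # Longest first; ties are broken by id so the plan is deterministic.
    ordered = sorted(query_lengths, key=lambda q: (-query_lengths[q], q))
    n_initial = round(len(ordered) * initial_fraction)

    bins = [[] for _ in range(n_nodes)]
    # Heap entries are (total length, query count, node index).
    heap = [(0, 0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    for query in ordered[:n_initial]:
        load, count, node = heapq.heappop(heap)  # least loaded node
        bins[node].append(query)
        heapq.heappush(heap, (load + query_lengths[query], count + 1, node))

    return bins, ordered[n_initial:]
```

Holding back short queries means the final, dynamically dispatched tasks are small, which limits how long any node waits at the end of the run.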
74Implementation
- Utilizes the LAM/MPI Message Passing Interface
package from Indiana University
- The application executables are not altered
- The implementation wraps the executable and data
and sends them to each node
- Easily accommodates application updates
- Easily extends to similar applications
- Currently have implemented two wrappers
- BLAST
- HMMPFAM
- Sean Eddy, Washington University School of
Medicine, St. Louis, Missouri
- Benchmarks performed on the UAB School of
Engineering Linux cluster
- 2 storage servers (IBM x345)
- one compile node and 64 compute nodes (IBM x335)
- 2 x 2.4 GHz Xeon processors per node
- 2-4 GB of RAM per node
- 18 GB SCSI hard drive
- connected via Gigabit Ethernet to a Cisco 4006
switch
78Comparison of gene finding methods
79Gene prediction: Putting it all together
80Now the real work can begin
- More rigorous comparative analysis
- Shared and unique sets of gene composition
- SNP analysis of gene differences
- Whole genome phylogenetic prediction
- Individual gene phylogenetic prediction
- Unique patterns of evolutionary inheritance
- Clustering of evolutionary inheritance with
pathogenesis