Poxviruses, Biodefense and Bioinformatics

Slides: 81
Provided by: elli63

Transcript and Presenter's Notes

1
Poxviruses, Biodefense and Bioinformatics
  • Working towards a better understanding of viral
    pathogenesis and evolution

2
Bioinformatics
  • Managing Complexity
  • Technology development
  • Enhancing Understanding
  • Research

3
Managing Complexity
  • Data
  • Acquisition
  • Storage
  • Manipulation
  • Retrieval

4
Managing Complexity
  • Data Analysis
  • Development and Utilization of
  • Analytical tools
  • Visualization tools

5
Enhancing Understanding
  • What distinguishes one organism from another?
  • Sequence
  • Molecular Biology
  • Physiology
  • Pathogenesis
  • Epidemiology
  • Evolution
  • Will the genomic sequence provide an explanation
    for the differences?

6
What is Bioinformatics?
  • Computer-aided analysis of biological information
  • Discerning the characteristic (repeatable)
    patterns in biological information that help to
    explain the properties and interactions of
    biological systems.
  • Caveat
  • In the end, bioinformatics (a.k.a. computers) can
    only help in making inferences concerning
    biological processes.
  • These inferences (or hypotheses) have to be
    tested in the laboratory

7
The Poxvirus Bioinformatic Resource
PBR
  • www.poxvirus.org

8
PBR Collaborators
  • UAB
  • Elliot Lefkowitz
  • St. Louis University
  • Mark Buller
  • University of Victoria
  • Chris Upton
  • ATCC
  • Charles Buck
  • Medical College of Wisconsin
  • Paula Traktman

9
The UAB MGBF Contingent
Molecular and Genetic Bioinformatics Facility
  • Programmers
  • Jim Moon
  • Don Dempsey
  • Uma Dave
  • Bei Hu
  • Students
  • Chunlin Wang
  • Fellows
  • Shankar Changayil
  • Xiaosi Han

10
Poxviruses
  • Large dsDNA genome
  • 150,000 to 300,000 base pairs
  • 150 to 260 genes
  • Complex virion morphology
  • Cytoplasmic replication
  • Array of immunoevasion strategies.
  • Human pathogens
  • Molluscum contagiosum
  • Variola
  • Monkeypox

11
The PBR is Designed to Support
  • Basic and applied research on Poxviruses
    including the development of new
  • Environmental Detectors
  • Diagnostic Reagents
  • Animal Models
  • Vaccines
  • Antiviral Compounds

12
PBR Design Philosophy
  • Useful and Used
  • Supporting all poxvirus investigators
  • UAB PBR Web-based application requirements
  • Web Browser
  • Java plugin
  • In-depth analyses
  • UVic analytical tools

20
BLAST
  • Search a sequence database for primary sequence
    similarities to some query sequence
  • Provides a measure of the significance of the
    similarity
  • Does not necessarily imply common evolutionary
    origin
  • Developed at NCBI
  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W.,
    and Lipman, D.J. (1990) "Basic local alignment
    search tool." J. Mol. Biol. 215:403-410.
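
The significance measure BLAST reports is usually an E-value: the number of alignments scoring at least S that would be expected by chance in a search space of that size. A minimal sketch of the Karlin-Altschul relation E = K * m * n * exp(-lambda * S) follows. The default values of `k` and `lam` are illustrative placeholders (assumptions), not the parameters of any particular scoring matrix.

```python
import math

def expected_hits(raw_score, query_len, db_len, k=0.13, lam=0.32):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values (assumptions); real values depend on
    the scoring matrix and gap penalties in use.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, k=0.13, lam=0.32):
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)
```

With the bit score, the same E-value can be written as m * n * 2^(-S'), which is why bit scores can be compared across scoring systems. A small E-value says the similarity is unlikely to be chance; as the slide notes, it does not by itself establish common ancestry.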

30
18 genomes, 563 genes, avg. 31 genes/genome
35
PBR Knowledge Database
  • Mini review of available structure-function
    information
  • Human-curated database based on the literature
  • Bibliographic information
  • Available scientific resources
  • clones, mutants, and antibodies
  • Empirically-derived properties
  • MW, pI . . .
  • Post-translational modifications
  • Expression
  • Functional Assignments
  • Gene Ontology controlled vocabulary
  • Molecular function
  • Biological Process
  • Cellular component
  • Virulence Ontology

37
Molecular Evolution and Genomic Analyses of
Poxviruses
38
Objectives
  • To better understand the role individual genes
    and groups of genes (or other genetic elements)
    play in poxvirus (especially smallpox) host range
    and virulence
  • To describe and understand poxvirus diversity
    by reconstructing the family's evolutionary
    history

39
Orthopoxvirus Phylogeny
40
Orthopoxvirus Phylogeny
132-gene tree possible
41
65-gene tree possible for Chordopoxviruses
42
Horizontal Gene Transfer
  • The acquisition of genetic material from another
    organism that becomes a permanent addition to
    the recipient's genome
  • Many poxvirus genes involved in immune evasion
    may have been acquired through HGT
  • Detection of HGT
  • Alternative base composition
  • Alternative codon usage pattern
  • Alternative evolutionary inheritance pattern
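
As an illustration of the codon usage criterion, the sketch below compares the codon frequencies of a single gene with a reference profile, for example one pooled from all genes in the genome. The function names and the distance measure (sum of absolute frequency differences) are assumptions chosen for simplicity, not the method used in the presentation.

```python
from collections import Counter

def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def usage_distance(freq_a, freq_b):
    """Sum of absolute frequency differences (0 = identical, 2 = no shared codons)."""
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)
```

A gene whose distance from the genome-wide profile is unusually large is a candidate for horizontal transfer, not proof of it: short genes give noisy frequencies, and highly expressed genes can have atypical codon usage for unrelated reasons.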

43
Detecting HTGs by plotting codon usage
44
GC distribution of Molluscum Contagiosum
GC distribution in the Molluscum contagiosum genome,
smoothed by a wavelet technique. The blue numbers
give the position in the genome. The green bars
mark significant deviations, each labeled with a
putative gene (MOCV-SB1_011, MOCV-SB1_055,
MOCV-SB1_132).
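
A simplified version of this kind of scan can be sketched as follows. Note the substitution: the figure uses wavelet smoothing, whereas this sketch uses a plain fixed-width window and a z-score cutoff. The window size, step, and cutoff are arbitrary assumptions.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def windowed_gc(genome, window=1000, step=500):
    """(start, GC fraction) for each full-length window along the genome."""
    return [(start, gc_fraction(genome[start:start + window]))
            for start in range(0, len(genome) - window + 1, step)]

def deviant_windows(profile, z_cutoff=2.0):
    """Windows whose GC fraction is more than z_cutoff standard deviations from the mean."""
    if not profile:
        return []
    values = [gc for _, gc in profile]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(start, gc) for start, gc in profile if abs(gc - mu) / sigma > z_cutoff]
```

The flagged windows would then be checked against the gene map to see which annotated or putative genes they overlap.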
45
VARV Proteins with Similarity to Human Proteins
  • 3-beta-hydroxysteroid dehydrogenase
  • Ankyrin
  • CD47 antigen
  • Carbonic Anhydrase
  • Casein kinase 1
  • Complement control protein
  • DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide
  • DNA ligase
  • Glutaredoxin
  • Hypothetical protein
  • JNK-stimulating phosphatase
  • Kelch-like protein
  • Lymphocyte activation-associated protein
  • Makorin zinc-finger protein
  • Myosin heavy chain
  • Plasminogen activator inhibitor
  • Profilin
  • RNA polymerase
  • Ribonucleotide reductase M2

46
Ribonucleotide Reductase Homolog Evolution
48
TNF Receptor Homolog Evolution
50
TNF Receptor GenBank nr Hits
56
VARV B22R BLASTN Results
57
Genome Comparison Variola major vs. minor
58
Genome vs. Gene Phylogeny
59
Molecular Evolution and Genomic Analyses of
Poxviruses
  • We have a problem

64
Poxvirus Gene Prediction
  • Little consistency from one genome to another
  • Methods employed
  • Minimum ORF size
  • Similarity with previously described proteins
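
The minimum-ORF-size method can be sketched as a six-frame scan for ATG-to-stop open reading frames. The default threshold of 60 codons is a hypothetical value for illustration; the inconsistency the slide describes comes partly from different groups choosing different cutoffs.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=60):
    """Return (strand, start, end) for each ATG..stop ORF of at least min_codons codons.

    The codon count includes the stop codon. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

Lowering `min_codons` recovers more short genes but also more spurious ORFs, which is why length alone is usually combined with other evidence.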

65
Consistently predict and annotate the gene set
for all Poxvirus genomes
  • Development of a comprehensive gene prediction
    tool
  • Discovery of new or missed genes
  • Removal of pseudogenes
  • As an added bonus
  • Computational annotation of each predicted gene

66
What is a gene?
  • Does it look like a gene?
  • Open Reading Frame
  • Base composition
  • Codon usage
  • Is it expressed?
  • Regulatory signals
  • Transcription
  • Translation
  • Has it been previously recognized?
  • Similarity searching

67
Proposed gene-finding tool
  • Combination of a series of complementary gene
    prediction algorithms
  • DNA Signals
  • ORF detection
  • Base composition
  • Codon preference
  • HMM gene models
  • Similarity searching
  • BLAST similarity searches
  • Similarity to identified poxvirus protein domains
    using an HMM-based domain database
  • Promoter detection
  • Neural Network promoter detection tool
  • Patterns of amino acid sequence conservation
  • Biodictionary-based analysis
  • Knowledge-based integration of all predictive
    methods
  • Computational conclusions
  • Visualization tool for human inspection
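
One very simple way to integrate several predictors is a weighted average of their scores. The sketch below is only an illustration of the idea. The method names, the weights, and the assumption that every method reports a score between 0 and 1 are all hypothetical, and the slides do not say how the integration was actually done.

```python
def combine_evidence(scores, weights):
    """Weighted average of per-method scores in [0, 1].

    scores:  method name -> score for one candidate gene (hypothetical names).
    weights: method name -> relative trust in that method (hypothetical values).
    Methods with no score for this candidate are skipped.
    """
    used = [method for method in weights if method in scores]
    total_weight = sum(weights[method] for method in used)
    if total_weight == 0:
        return 0.0
    return sum(weights[method] * scores[method] for method in used) / total_weight
```

For example, with weights `{"orf": 1, "codon": 1, "blast": 2}` and scores `{"orf": 1.0, "blast": 0.5}`, the candidate scores 2/3. Borderline candidates would still go to the visualization tool for human inspection.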

68
Using High Performance Computing to Speed Up
Bioinformatic Applications
70
Features to consider in porting an application to
a cluster environment
  • Balancing the processing workload among nodes is
    critical to successful implementation
  • A computational method with a lower percentage
    load imbalance (PLIB) is more efficient than one
    with a higher PLIB. The workload is perfectly
    balanced if PLIB is equal to zero.
  • Similarity searching workload can be difficult to
    estimate
  • Dependent on the nature of both the database and
    query sequences
  • sequence length
  • number of sequences
  • complexity of the sequences
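
The slide does not give a formula for PLIB. The sketch below assumes the form 100 * (max - mean) / max over per-node run times, which is one of several definitions in use and may differ from the one the authors applied. It does satisfy the property stated above: PLIB is zero when every node takes the same time.

```python
def percent_load_imbalance(node_times):
    """Assumed definition: 100 * (max - mean) / max over per-node run times."""
    if not node_times:
        raise ValueError("need at least one node time")
    slowest = max(node_times)
    if slowest == 0:
        return 0.0
    average = sum(node_times) / len(node_times)
    return 100.0 * (slowest - average) / slowest
```

Under this definition, two nodes finishing in 10 s and 20 s give a PLIB of 25: a quarter of the slowest node's run time is, on average, spent idle elsewhere.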

71
Data Segmentation
  • Database Sequences
  • Utilize when the database size is larger than
    physical memory of each computational node
  • Results need to be combined and statistics
    recalculated
  • Not possible with some applications (PSI-BLAST)
  • Query Sequences
  • Flexible and allows for better balancing of the
    workload
  • Statistics remain valid
  • Database remains intact
  • Best performance when the database can be fully
    loaded into available memory
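
The need to recalculate statistics under database segmentation follows from the E-value being proportional to the size of the search space: a hit found in a fragment looks more significant than it would against the full database. A minimal sketch of the merge step, assuming simple linear rescaling by residue count and ignoring the edge-effect corrections that real BLAST statistics apply:

```python
def merge_fragment_hits(fragment_hits, fragment_lengths):
    """Merge per-fragment hit lists and rescale E-values to the full database.

    fragment_hits:    one list per fragment of (subject_id, evalue) tuples.
    fragment_lengths: total residues in each fragment (must be non-zero).
    Returns all hits sorted by rescaled E-value, most significant first.
    """
    total = sum(fragment_lengths)
    merged = []
    for hits, length in zip(fragment_hits, fragment_lengths):
        scale = total / length  # E grows linearly with database size
        merged.extend((subject, evalue * scale) for subject, evalue in hits)
    return sorted(merged, key=lambda hit: hit[1])
```

Query segmentation avoids this step entirely because every node searches the whole database, which is why the statistics remain valid.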

72
Work Flow for Database segmentation
  • Database is split evenly and formatted
  • Database fragments are sent to each node
  • Query file is distributed to all nodes
  • The search is initiated
  • Output is collected for merging and formatting

73
Work Flow for Query Segmentation
  • Database is distributed to all nodes
  • 90% of the query sequences are split into bins
    and distributed among the available nodes
  • Balanced for sequence length and number
  • The remaining 10% of the query sequences
    are delivered to nodes as they finish the initial
    search
  • Individual results are merged and reported
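
The planning step of this workflow can be sketched as below. Two choices here are assumptions rather than details from the slides: the bins are balanced by total sequence length with a greedy longest-first assignment, and the reserved queries are the shortest ones, so that late hand-outs are small units of work.

```python
import heapq

def plan_query_segmentation(query_lengths, n_nodes, reserve_fraction=0.10):
    """Split queries into per-node bins plus a reserve queue.

    query_lengths: query id -> sequence length.
    Returns (bins, reserve). bins is a list of n_nodes lists of query ids;
    reserve holds the ids kept back to hand out as nodes finish.
    """
    ids = sorted(query_lengths, key=query_lengths.get, reverse=True)
    n_reserve = int(len(ids) * reserve_fraction)
    reserve = ids[len(ids) - n_reserve:]   # shortest queries (assumption)
    initial = ids[:len(ids) - n_reserve]
    heap = [(0, node) for node in range(n_nodes)]  # (assigned length, node index)
    bins = [[] for _ in range(n_nodes)]
    for qid in initial:
        # Give the next-longest query to the node with the least work so far.
        load, node = heapq.heappop(heap)
        bins[node].append(qid)
        heapq.heappush(heap, (load + query_lengths[qid], node))
    return bins, reserve
```

Sequence length is only a proxy for search time, which also depends on sequence complexity and database content. The reserve queue exists to absorb that estimation error: nodes that finish early pick up extra queries instead of idling.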

74
Implementation
  • Utilizes the LAM/MPI Message Passing Interface
    package from Indiana University
  • The application executables are not altered
  • The implementation wraps the executable and data
    and sends it to each node
  • Easily accommodates application updates
  • Easily extends to similar applications
  • Currently have implemented two wrappers
  • BLAST
  • HMMPFAM
  • Sean Eddy, Washington University School of
    Medicine, St. Louis, Missouri
  • Benchmarks performed on the UAB School of
    Engineering Linux cluster
  • 2 storage servers (IBM x345).
  • one compile node and 64 compute nodes (IBM x335)
  • 2 x 2.4 GHz Xeon processors per node
  • 2-4 GB of RAM per node
  • 18 GB SCSI hard drive
  • connected via Gigabit Ethernet to a Cisco 4006
    switch

78
Comparison of gene finding methods
79
Gene prediction: Putting it all together
(Figure; axis ticks from 32000 to 40000)
80
Now the real work can begin
  • More rigorous comparative analysis
  • Shared and unique sets of gene composition
  • SNP analysis of gene differences
  • Whole genome phylogenetic prediction
  • Individual gene phylogenetic prediction
  • Unique patterns of evolutionary inheritance
  • Clustering of evolutionary inheritance with
    pathogenesis