Developing novel web-based Bioinformatics analysis tools for Comparative Genomics


Title: Developing novel web-based Bioinformatics analysis tools for Comparative Genomics


1
Developing novel web-based Bioinformatics
analysis tools for Comparative Genomics
  • Kashi Vishwanath Revanna,
  • Capstone Presentation,
  • May 1, 2009
  • Primary Advisor
  • Dr. Qunfeng Dong,
  • The Center for Genomics and Bioinformatics (CGB)

2
Introduction
  • Comparative genomics
  • It is the analysis and comparison of genomes from
    different species.
  • Identify
  • gene duplications.
  • gene inversions.
  • gene translocations.
  • gene clusters.
  • orthologs and paralogs.

3
Overview
  • Blast Output Visualization (BOV) Tool.
  • visual representation of BLAST output.
  • Perl scripts from Rajesh Gollapudi, CGB.
  • Comparative Genome Cluster Viewer (CGCV)
  • gene clusters across multiple genomes.
  • database developed by Vivek Krishnakumar, CGB.
  • Multiple Genome Browser (MGB)
  • synteny regions between genomes.

4
BOV: BLAST Output Visualization Tool
5
Motivation
  • Commonly used tool for comparative genomics
  • Basic Local Alignment Search Tool (BLAST)
  • web-based at NCBI or standalone local
    installation.
  • input: nucleotide/protein sequence(s).
  • database: nucleotide sequences of genes or
    genomes, or protein sequences.
  • output: textual format.
  • BLAST output consists of High-scoring Segment
    Pairs (HSPs), each a matching region
    between the query and a database hit sequence.
  • Manual interpretation of these regions can
    be difficult.

Altschul SF, Madden TL, Schaffer AA, Zhang J,
Zhang Z, Miller W, Lipman DJ: Gapped BLAST and
PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res 1997,
25(17):3389-3402.
6
Requirement
  • Post-processing BLAST Output.
  • Programs are available to
  • flexibly select BLAST matching regions. (e.g.
    MuSeqBox, BioParser).
  • parse the output into database to facilitate
    keyword search. (e.g. NuclearBLAST program, PLAN
    web server).
  • Need
  • A tool that graphically represents the HSPs
    extracted from the BLAST output and provides
    options to interactively select and analyze them.

7
Specifications
  • To develop the tool
  • parse uploaded BLAST output.
  • extract HSP co-ordinates.
  • store the information in the database.
  • provide summary of query sequences and
    corresponding hit sequences.
  • generate visual representation of HSPs.
  • ability to manipulate the HSPs.
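
The parsing and HSP-extraction steps listed above can be sketched in a few lines. This is only an illustration, not the BOV implementation (which, per the slides, uses Perl and BioPerl). It assumes the BLAST run was saved in the 12-column tabular format rather than the default text report, and the `HSP` type, the function name, and the sample rows are all made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class HSP(NamedTuple):
    """One High-scoring Segment Pair (hypothetical record layout)."""
    query: str
    hit: str
    q_start: int
    q_end: int
    h_start: int
    h_end: int
    evalue: float
    bit_score: float


def parse_tabular_blast(text):
    """Group HSPs by (query, hit) from 12-column tabular BLAST output."""
    hsps = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        # Columns 7-10 hold query start/end and hit start/end;
        # columns 11-12 hold the E-value and bit score.
        hsp = HSP(f[0], f[1], int(f[6]), int(f[7]),
                  int(f[8]), int(f[9]), float(f[10]), float(f[11]))
        hsps[(hsp.query, hsp.hit)].append(hsp)
    return dict(hsps)


# Invented sample rows, for illustration only.
SAMPLE = "\n".join([
    "q1\tgeneA\t98.0\t100\t2\t0\t1\t100\t501\t600\t1e-50\t180",
    "q1\tgeneA\t90.0\t50\t5\t0\t150\t199\t700\t749\t1e-10\t60.2",
    "q1\tgeneB\t85.0\t80\t12\t0\t10\t89\t300\t221\t2e-8\t52.0",
])

summary = parse_tabular_blast(SAMPLE)
for (query, hit), group in summary.items():
    print(query, hit, len(group))
```

Grouping by (query, hit) gives the per-query summary directly, and each record keeps the coordinates needed for storage and drawing. A hit on the reverse strand shows up as a hit start larger than the hit end, as in the third sample row.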

8
Implementation
CGB server (Perl 5, Linux Platform)
Web interface (DHTML, Perl, CGI)
Blast Output (BLASTN/P/X, TBLASTN/X)
Perl Scripts (BioPerl Modules)
Email
MySQL (HSPs, Projects, ..)
Summary
Visualization (Javascript)
Create Image (Perl GD Library)
Download (Sequences, HSP, image, ..)
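
Drawing HSPs as an image comes down to scaling sequence coordinates to pixel positions. The sketch below shows one simple way to do that. It is an assumption about how such a mapping could work, not the tool's actual Perl GD code, and the function name and parameters are hypothetical.

```python
def to_pixels(start, end, seq_length, width_px):
    """Map 1-based inclusive sequence coordinates to a pixel span.

    Returns (left, right) with right > left, so that even a very
    short HSP is drawn at least one pixel wide.
    """
    lo, hi = min(start, end), max(start, end)  # tolerate reverse-strand order
    scale = width_px / seq_length
    left = int((lo - 1) * scale)
    right = max(left + 1, int(hi * scale))
    return left, right


# A 1000-residue sequence drawn on a 500-pixel-wide track.
print(to_pixels(1, 100, 1000, 500))
print(to_pixels(600, 501, 1000, 500))
```

Normalizing the start/end order first means hits on either strand map to a left-to-right box; strand could then be shown with color or an arrowhead.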
9
Screenshots
  • BLAST output submission

Query information
10
Screenshots
11
Screenshots
12
Program Release
  • BOV ver-1.0.7 is live and hosted at
  • http://bioportal.cgb.indiana.edu/bov
  • Web-pages
  • in-depth tutorial on using the tool.
  • download and installation manual.
  • Publication
  • Rajesh Gollapudi, Kashi Vishwanath Revanna,
    Chris Hemmerich, Sarah Schaack, and Qunfeng Dong
    (2008) BOV - A Web-based BLAST Output
    Visualization Tool. BMC Genomics. 2008 Sep
    15;9(1):414.
  • contributed equally

13
CGCV: Comparative Genome Cluster Viewer
14
Motivation
  • Standard practice in comparative genomics
  • identification of conserved gene clusters across
    multiple genomes.
  • Existing tools rely on pre-computation strategies
    and algorithms that are genome wide and
    computationally intensive.
  • Genome-wide orthologs for all gene families based
    on identifying reciprocal best BLAST hits.
  • Limitations
  • no optimal universal BLAST parameters for all
    gene families
  • distinguishing orthologs from paralogs on a
    genome-wide scale
  • when new organisms are available, time-consuming
    updates.
  • Requirement
  • Updated Database.
  • A tool that considers only a set of genes,
    performs a dynamic search against selected genomes,
    and interactively visualizes gene cluster
    conservation across the selected genomes.
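
For context, the reciprocal-best-hit approach mentioned above can be sketched as follows. This is a simplified illustration of the general technique that pre-computation tools rely on, not code from any of the cited tools. The input tuples `(query, subject, bit_score)` and all gene names are invented for the example.

```python
def best_hits(hits):
    """Return {query: best subject}, keeping the highest bit score per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented hits between genome A genes (a*) and genome B genes (b*).
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150),
          ("a2", "b2", 90), ("a3", "b1", 120)]
b_vs_a = [("b1", "a1", 195), ("b2", "a1", 140), ("b2", "a2", 80)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data only a1/b1 survive. The genes a2 and a3 are dropped because their best hits prefer a1, which illustrates why paralogs are hard to resolve with one fixed rule applied genome-wide.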

15
Specification
  • To develop the web-based tool
  • maintain a database of prokaryotic and eukaryotic
    sequences and annotated gene information.
  • keep the database in sync with NCBI and Ensembl.
  • use the BLAST program to search uploaded query
    sequences against it.
  • User selects the BLAST database and parameters.
  • Generate Phylogenetic Profiling Table,
  • i.e., count of HSPs against a given genome with
    respect to each query sequence.
  • Provide interactive tools to manipulate the
    visual representation of the gene clusters across
    genomes.
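
The Phylogenetic Profiling Table described above is a count matrix with one row per query sequence and one column per genome. A minimal sketch, assuming the HSPs have already been reduced to `(query, genome)` pairs; the function name and the sample names are hypothetical, and this is not the CGCV code itself.

```python
from collections import Counter


def profile_table(hsps, queries, genomes):
    """Count HSPs per (query, genome); rows follow queries, columns follow genomes."""
    counts = Counter(hsps)  # each item is one (query, genome) pair per HSP
    return [[counts[(q, g)] for g in genomes] for q in queries]


# Invented example data: one tuple per HSP.
queries = ["geneA", "geneB"]
genomes = ["genome1", "genome2", "genome3"]
hsps = [("geneA", "genome1"), ("geneA", "genome1"),
        ("geneA", "genome2"), ("geneB", "genome3")]

table = profile_table(hsps, queries, genomes)
for query, row in zip(queries, table):
    print(query, row)
```

A zero in a cell means the query had no HSP against that genome, so a row of the table shows at a glance which genomes carry a match for that query.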

16
Implementation
CGB Server (Perl 5, Linux Platform)
Web Interface (DHTML, Perl, CGI, Ajax)
Database (CGB)
- Select Genomes - Query Sequences
BLAST Program
MySQL (Sequences, GFF, GTF)
Email
Perl Scripts (BioPerl Modules)
Phylogenetic Profiling Table
Perl Scripts (download, daily updates)
GFF format file
Visualization (Javascript)
NCBI
Create Image (Perl, GD Library)
Ensembl
Download (BLAST output, ..)
17
Screenshots
18
Screenshots
19
(No Transcript)
20
Program Release
  • CGCV ver-1.0.5 is live and hosted at
  • http://cgcv.cgb.indiana.edu/
  • Web pages also provide
  • in-depth tutorial to use the tool
  • step-by-step procedure for local installation.
  • update information on database.
  • Publication
  • Kashi Vishwanath Revanna, Vivek Krishnakumar,
    and Qunfeng Dong (2009) A web-based software system
    for dynamic gene cluster comparison across
    multiple genomes. Bioinformatics, 25(7):956-957.

21
MGB: Multiple Genome Browser
22
Motivation
  • Comparative Genomics involves determination of
    the synteny regions between two or more genomes.
  • Synteny is the preserved order of genes between
    related species.
  • Currently available tools such as SynBrowse
    visualize synteny between genomes,
    but they require pre-computed alignments.
  • Pan X, Stein L, Brendel V: SynBrowse: a synteny
    browser for comparative sequence analysis.
    Bioinformatics 2005, 21(17):3461-3468.

23
Specification
  • To develop a web-based tool for visualizing
    synteny for multiple genomes.
  • To allow users to determine the synteny by using
    their choice of sequence comparison
    methods/tools.
  • To be portable with simple installation procedure.
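
As an illustration of what "preserved gene order" means computationally, the sketch below groups matched gene pairs into collinear runs. It is a deliberately strict toy version that requires the genes to be immediate neighbors in both genomes. It is not the MGB design, which leaves the comparison method to the user, and the function name and input format are assumptions.

```python
def collinear_blocks(pairs, min_size=2):
    """Group matched gene pairs (pos_a, pos_b) into runs that keep gene order.

    pos_a and pos_b are gene indices along genome A and genome B.
    A run continues while both indices move by exactly one step and
    the direction in genome B stays the same (+1 forward, -1 inverted).
    """
    blocks, current, direction = [], [], 0
    for pair in sorted(pairs):
        if current:
            da = pair[0] - current[-1][0]
            db = pair[1] - current[-1][1]
            if da == 1 and abs(db) == 1 and direction in (0, db):
                current.append(pair)
                direction = db
                continue
            if len(current) >= min_size:
                blocks.append(current)
        current, direction = [pair], 0  # start a new candidate run
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Invented matches: a forward run, an inverted run, and a lone match.
pairs = [(1, 10), (2, 11), (3, 12), (4, 30), (5, 29), (6, 28), (8, 50)]
for block in collinear_blocks(pairs):
    print(block)
```

The second run decreases in genome B, which is how an inverted segment appears. Practical synteny detection usually tolerates gaps and small rearrangements, which this strict sketch does not.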

24
Progress
  • Currently building this tool.
  • Expected time of completion: end of June.

25
Conclusion
  • Web-based tools were built to assist a Biologist
    in Comparative Genomics.
  • Design, implementation, testing, maintenance, and
    user support.
  • Balance between usability, functionality and
    portability.
  • Future work
  • further development.
  • incorporate these tools into biologists' workflows.

26
References
  • Altschul SF, Madden TL, Schaffer AA, Zhang J,
    Zhang Z, Miller W, Lipman DJ: Gapped BLAST and
    PSI-BLAST: a new generation of protein database
    search programs. Nucleic Acids Res 1997,
    25(17):3389-3402.
  • Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD,
    Kurtz S, Lushbough C, Brendel V: Comparative
    plant genomics resources at PlantGDB. Plant
    Physiol 2005, 139(2):610-618.
  • Xing L, Brendel V: Multi-query sequence BLAST
    output examination with MuSeqBox. Bioinformatics
    2001, 17(8):744-745.
  • Catanho M, Mascarenhas D, Degrave W, de Miranda
    AB: BioParser: a tool for processing of sequence
    similarity analysis reports. Appl Bioinformatics
    2006, 5(1):49-53.
  • Stajich JE, Block D, Boulez K, Brenner SE,
    Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG,
    Korf I, Lapp H, Lehvaslaiho H, Matsalla C,
    Mungall CJ, Osborne BI, Pocock MR, Schattner P,
    Senger M, Stein LD, Stupka E, Wilkinson MD,
    Birney E: The Bioperl toolkit: Perl modules for
    the life sciences. Genome Res 2002,
    12(10):1611-1618.
  • Pan X, Stein L, Brendel V: SynBrowse: a synteny
    browser for comparative sequence analysis.
    Bioinformatics 2005, 21(17):3461-3468.
  • Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger
    JC: SynView: a GBrowse-compatible approach to
    visualizing comparative genome data.
    Bioinformatics 2006, 22(18):2308-2309.
  • Fong C, et al.: PSAT: a web tool to compare
    genomic neighborhoods of multiple prokaryotic
    genomes. BMC Bioinformatics 2008, 9:170.
  • Koski LB, Golding GB: The closest BLAST hit is
    often not the nearest neighbor. J Mol Evol
    2001, 52:540-542.
  • Markowitz VM, et al.: The integrated microbial
    genomes (IMG) system in 2007: data content and
    analysis tool extensions. Nucleic Acids Res
    2008, 36:D528-D533.
  • Uchiyama I, et al.: CGAT: a comparative genome
    analysis tool for visualizing alignments in the
    analysis of complex evolutionary changes between
    closely related genomes. BMC Bioinformatics
    2006, 7:472.

27
Acknowledgment
  • Dr. Qunfeng Dong.
  • Bioinformatics Director,
  • The Center for Genomics and Bioinformatics (CGB)
  • Bioinformatics Faculty and Staff,
  • School of Informatics.
  • Friends and Colleagues at CGB for their support
    and resources.
  • Special Thanks to my family.
  • Thank You.

28
Questions?