Title: NCBI Data Mining Tools: NCBI Genome Workbench
1NCBI Data Mining Tools: NCBI Genome Workbench
- http://www.ncbi.nlm.nih.gov/projects/gbench/
June 10, 2008
Ceske Budejovice, Czech Republic
2Tools for Data Mining at NCBI
Nucleotide Sequence Analysis
Structures
Protein Sequence Analysis and Proteomics
Genome Analysis
Gene Expression
http://www.ncbi.nlm.nih.gov/Tools/
4Genome Workbench
- What is it?
- Provides an interactive, client-side GUI
- Provides a suite of annotation tools
- Provides a platform for visualization and analysis
- Provides an easily extensible platform
- View data from publicly available sequence
databases (e.g. NCBI) alongside your own private data.
- Display sequence data as graphical sequence views,
various alignment views, phylogenetic tree views,
and tabular views.
- Align your private data to data in public
databases, display your data in the context of
public data, and retrieve BLAST results.
5Genome Workbench Home Page
http://www.ncbi.nlm.nih.gov/projects/gbench/
6Genome Workbench All Screens
7Workspace vs Project
- Projects hold data. Workspaces hold projects. It
is best to combine data that belong together inside
one project, and to use workspaces to hold
collections of projects that may or may not be
related.
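The containment relationship above can be pictured with a small Python sketch. This is only an illustration of the idea; the class names `Project` and `Workspace` and the method `all_items` are hypothetical and are not Genome Workbench's own object model or file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """Hypothetical model: a project holds data items that belong together."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Hypothetical model: a workspace holds a collection of projects."""
    name: str
    projects: List[Project] = field(default_factory=list)

    def all_items(self) -> List[str]:
        # Flatten every project's items, in project order.
        return [item for project in self.projects for item in project.items]


# Two unrelated projects can share one workspace.
workspace = Workspace(
    "demo workspace",
    [
        Project("genome comparison", ["genome_A", "genome_B"]),
        Project("private reads", ["my_sequences.fasta"]),
    ],
)
```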
8Project Tree
- The Project Tree is a view on your workspace and
projects. It shows a hierarchical expansion of your
data and lets you group data items into folders.
- The Project Tree is available from the main menu
at View -> Project Tree. It is on by default and
appears on the left-hand side.
9Tools: Selection Inspector
- The selection inspector provides a means for
evaluating all the selected objects in Genome
Workbench.
- The selection inspector has three modes of
operation (Table, Brief Text, and Full Text),
selectable by using the icons in the right-hand
corner of the view.
- Aggregating selection across views
- Show selection from all views
10Data Mining View
- The data mining view combines many modes of
searching into one interface. From it, you can
search for items in the public sequence
repository; search for gene records from Entrez
Gene; search for annotations in a given view; and
search for patterns of sequences.
- The data mining view is on by default and is
available from View -> Data Mining View. It is
generally docked along the bottom.
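To make "searching for patterns of sequences" concrete, here is a minimal Python sketch of a pattern scan over a nucleotide string. It is not how Genome Workbench implements its search; the function name `find_pattern` and the toy sequence are assumptions for illustration. The pattern used, GAATTC, is the EcoRI recognition site.

```python
import re


def find_pattern(sequence, pattern):
    """Return (start, end, matched_text) for each non-overlapping match.

    Coordinates are 0-based and half-open; matching ignores case.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(sequence)]


# Toy sequence (made up for this example) containing the site twice.
hits = find_pattern("ccGAATTCaagaattcTT", "GAATTC")
for start, end, text in hits:
    print(start, end, text)
```

Because the pattern is a regular expression, degenerate motifs can be written directly, for example `GA[AT]TC`.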
11Supported Data Formats
- FASTA sequence files
- GFF2/GTF format (NOTE: GFF3 support will be added
soon)
- RepeatMasker .out format
- Sequin-style 5-Column Feature Table format
- Newick-format phylogenetic trees
- Phrap/ACE assembly files
- AGP sequence assembly files
- NCBI ASN.1 objects (in ASN.1 text or binary, or in
XML format)
- NOT GenBank or EMBL flatfile format
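As an illustration of the simplest format in the list, the sketch below reads FASTA text into a dictionary in Python, which can be a quick way to sanity-check a private file before loading it. The function name `parse_fasta` and the sample records are hypothetical, and this is not the parser Genome Workbench itself uses.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines are concatenated; blank lines are ignored.
    """
    chunks = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(lines) for key, lines in chunks.items()}


# Made-up records for demonstration only.
example = """>seq1 first example record
ACGTAC
GTTA
>seq2
GGCC
"""
records = parse_fasta(example)
```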
12Manage/Add Plug-ins
Plug-ins developed in Python (best support) and
Perl
Maybe too cluttered to load everything
13Tools: Alignment
14Tools: Composition
15Tools: Phylogenetic Trees
16Tools: Send to Web Pages
17Example
18Example 2
20Genome Workbench Screenshots
21Streptococcus pyogenes Alignments Using Genome
Workbench
- Workspace containing ONE project that shows the
relationship of two bacterial genomes, both
serotyped as Streptococcus pyogenes strains. The
alignments are notable for two large inversions.
- This project includes an alignment of
Streptococcus pyogenes (serotype M3, strain
SSI-1, NC_004606) to Streptococcus pyogenes
(serotype M6, strain MGAS10394, NC_006086).
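For readers who want local FASTA copies of the two genomes, one possible route (an assumption, not something shown in these slides) is the NCBI E-utilities `efetch` service. The Python sketch below only builds the request URLs; the helper name `efetch_fasta_url` is hypothetical, and the download step is left as a comment.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL requesting one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# The two accessions named on this slide.
urls = [efetch_fasta_url(acc) for acc in ("NC_004606", "NC_006086")]

# To download (requires network access), something like:
#   import urllib.request
#   urllib.request.urlretrieve(urls[0], "NC_004606.fasta")
```

The resulting FASTA files are in one of the supported input formats listed earlier in the deck.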
22Streptococcus pyogenes Alignments