Supported by the NSF Plant Genome Research and REU Programs

Learn more at: https://cellwall.genomics.purdue.edu

1
Tutorial of bioinformatics and tree generation at
the Cell Wall Genomics website
Bryan Penning
Supported by the NSF Plant Genome Research and
REU Programs
2
Bioinformatics Goals

We currently have a wealth of Arabidopsis
thaliana cell wall gene information on the
website. We wanted to:
Add family information about rice and maize Type
II cell walls to compare to A. thaliana Type I
cell walls
Add links to outside information on rice genes
like we have for A. thaliana
Include annotated composite trees of A. thaliana,
rice and maize gene families
Add links to sites used to generate the data
Add the source protein sequences used for our
family trees so other researchers can make their
own trees, adding their genes of interest
Generate a tutorial on how researchers can make
use of the bioinformatics data on our site

3
Diagram of our bioinformatics approach
[Flowchart. Boxes: Blast TIGR; Choose genes; Make
tree. Decision branches (Y/N): Too few genes,
Blast other sites; Too many genes, tighten
criteria.]
Diagram of the process used to find the genes and
draw family trees for cell wall related rice
genes. The same approach is used for maize.
4
Diagram of our bioinformatics approach
[Flowchart. Inputs: A. thaliana genes; Rice genes;
Maize genes. Steps: Draw tree with all family
members; Annotate; Publish to web.]
Diagram of the process used to integrate cell
wall related genes from all three family trees
into a composite tree.
5
BLASTing genes

To be considerate of the bioinformatics community,
given the number of BLASTs to be performed, and to
speed the process, we downloaded the text (flat)
file of the TIGR rice protein sequences
(available at
http://www.tigr.org/tdb/e2k1/osa1/data_download.shtml)
and performed local BLASTs using blastall from
NCBI (available at
http://www.ncbi.nlm.nih.gov/BLAST/download.shtml)
Directions for using these tools are available at
the above sites and are beyond the scope of this
tutorial
For a small number of BLASTs, you can use
web-based methods and common programs such as
Word and Excel plus any of a number of
downloadable tree drawing programs to make these
kinds of trees on your own if you are not
familiar with programming languages such as Perl
to automate the process. Although web searches
can be more time consuming, they work just as
well for a few sequences
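
As a rough illustration of the local BLAST step, the sketch below only builds the command lines for the legacy NCBI tools (formatdb to index the downloaded protein file, blastall to search it); it does not run them. The file names and the E-value cutoff are placeholders chosen for this example, not the settings used for the website.

```python
from typing import List


def formatdb_command(protein_fasta: str) -> List[str]:
    """Build the command that indexes a protein FASTA file for local BLAST."""
    # -i: input FASTA file, -p T: the sequences are protein
    return ["formatdb", "-i", protein_fasta, "-p", "T"]


def blastall_command(query_fasta: str, database: str, output: str,
                     evalue: float = 1e-10) -> List[str]:
    """Build a blastp search of query_fasta against a formatted database."""
    return [
        "blastall",
        "-p", "blastp",      # protein query against a protein database
        "-d", database,      # database name that was given to formatdb
        "-i", query_fasta,   # query sequences in FASTA format
        "-o", output,        # file that will receive the hits
        "-e", str(evalue),   # expectation-value cutoff (assumed value)
        "-m", "8",           # tabular output, one hit per line
    ]


if __name__ == "__main__":
    # Placeholder file names, not the names used by the authors.
    commands = (
        formatdb_command("rice_proteins.fasta"),
        blastall_command("arabidopsis_family.fasta",
                         "rice_proteins.fasta", "hits.tsv"),
    )
    for command in commands:
        print(" ".join(command))
```

The printed lines can be pasted into a terminal where the legacy BLAST tools are installed, or passed to subprocess.run.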

6
Web BLASTing

For smaller numbers of BLASTs to the rice genome,
TIGR provides an excellent Web BLAST at
http://tigrblast.tigr.org/euk-blast/index.cgi?project=osa1
You can also use the new BLAST tool at Gramene
(http://www.gramene.org/multi/blastview) for most
cereal sequences
Note: gene model versions sometimes differ
between Gramene and TIGR, as one site may update
to the latest model before the other

7
Web BLASTing

After downloading the protein sequence for
Arabidopsis SUD1 (At3g46440) from TIGR, you can
BLAST it against the TIGR Rice Pseudomolecules
Protein database using BLASTp

8
Web BLASTing

You get a series of hits to the gene of
interest
A higher score and a smaller probability indicate
a better match to the original gene
This procedure is followed for all of the genes
in a family to gather the best possible hits; the
hits are then sorted to remove duplicates, and the
best rice matches to the Arabidopsis families are
chosen
You can use NCBI's blastall tool for multiple
simultaneous BLASTs, as we do for this step
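
One way to automate the "gather hits, remove duplicates, keep the best matches" step is sketched below. It assumes the hits were saved in the 12-column tab-separated output that blastall writes with -m 8 (query, subject, ..., E-value, bit score); the web BLAST output described above has a different layout and would need its own parser. The hit values in the usage example are invented for illustration.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, E-value, bit score)


def parse_tabular_hits(text: str) -> List[Hit]:
    """Read 12-column tab-separated BLAST hits, skipping blanks and comments."""
    hits: List[Hit] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 1, 2, 11 and 12: query, subject, E-value, bit score
        hits.append((fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def best_hit_per_subject(hits: List[Hit]) -> List[Hit]:
    """Keep the highest-scoring hit for each subject, best score first."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        subject, score = hit[1], hit[3]
        if subject not in best or score > best[subject][3]:
            best[subject] = hit
    return sorted(best.values(), key=lambda h: h[3], reverse=True)


if __name__ == "__main__":
    # Invented example rows; rice_gene_A and rice_gene_B are placeholder names.
    example = "\n".join([
        "SUD1\trice_gene_A\t80.0\t400\t80\t0\t1\t400\t1\t400\t1e-150\t520",
        "SUD2\trice_gene_A\t78.0\t400\t88\t0\t1\t400\t1\t400\t1e-140\t500",
        "SUD1\trice_gene_B\t45.0\t380\t200\t5\t1\t380\t1\t380\t1e-50\t200",
    ])
    for query, subject, evalue, score in best_hit_per_subject(parse_tabular_hits(example)):
        print(subject, query, evalue, score)
```

Here a rice gene hit by both SUD1 and SUD2 is reported once, with whichever query scored higher.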

9
Organizing BLASTs

This is a Word document generated by BLASTing
SUD1 and SUD2 of Arabidopsis against the TIGR
Rice Protein database
The hits were copied into Word, set to the
font Courier New, 9 pt, and saved as a text-only
document (to remove the HTML code)
The file was reloaded in Word and converted to a
table (Table menu) using Other and the |
character (Shift + \) to separate the columns

10
Organizing BLASTs

The Word file is copied into Excel and the Data
Sort menu is used to sort by the first column
This brings all of the same named genes together
(the two highlighted lines for example)
Duplicate genes are removed from the spreadsheet,
and only the far right column (the
LOC_Osxxgxxxxxx tags) is copied back to Word
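
For readers comfortable with a little scripting, the sort-and-deduplicate step done here in Word and Excel can be approximated as follows. This is a sketch that assumes each saved hit line contains a rice locus tag of the form LOC_Os, two digits, g, and more digits; the exact column layout of the hit lines is not assumed. The second tag in the usage example is a made-up placeholder.

```python
import re
from typing import List

# Rice locus tags such as LOC_Os05g29990 (pattern assumed from the tag format).
LOCUS_TAG = re.compile(r"LOC_Os\d{2}g\d+")


def unique_locus_tags(hit_lines: List[str]) -> List[str]:
    """Collect every rice locus tag in the hit lines, sorted, without duplicates."""
    tags = set()
    for line in hit_lines:
        tags.update(LOCUS_TAG.findall(line))
    return sorted(tags)


if __name__ == "__main__":
    # Invented hit lines; only the tag pattern matters here.
    lines = [
        "hit from SUD1 | score 520 | LOC_Os05g29990",
        "hit from SUD2 | score 500 | LOC_Os05g29990",
        "hit from SUD1 | score 200 | LOC_Os01g00001",
    ]
    print("\n".join(unique_locus_tags(lines)))
```

The result is the same kind of one-gene-per-line list that the Word "convert table to text" step produces.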

11
Organizing BLASTs

You can use the Table menu to convert the table
to text (Paragraph Marks) to generate a list of
genes
These genes can be searched through a downloaded
database using the NCBI fastacmd (included in the
BLAST download tools) or you can search them one
at a time using a web-based database such as the
locus name search on TIGR
(http://www.tigr.org/tdb/e2k1/osa1/LocusNameSearch.shtml)
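
If fastacmd is not available, pulling the listed genes out of a downloaded protein flat file can also be done with a few lines of plain Python. This is a substitute sketch, not the fastacmd method itself, and it assumes the sequence ID is the first word of each header line, cut at any "|" character; adjust that rule to match the headers in your download.

```python
from typing import Dict, Iterable


def read_fasta(text: str) -> Dict[str, str]:
    """Map each sequence ID to its sequence (header annotation is dropped)."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # Assumed rule: ID = first word of the header, up to any "|"
            name = words[0].split("|")[0] if words else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def select_records(records: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Return only the wanted IDs that are present, in the order requested."""
    return {name: records[name] for name in wanted if name in records}


if __name__ == "__main__":
    # Tiny invented database; the sequences are not real proteins.
    database = ">geneA some annotation\nMKVL\nAAGT\n>geneB|extra\nMSTP\n"
    chosen = select_records(read_fasta(database), ["geneB", "geneC"])
    for name, sequence in chosen.items():
        print(">" + name)
        print(sequence)
```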

12
Generating a tree

Once you have found all of your sequences, check
that each sequence name starts with a > (denoting
a new sequence name) and that the sequence
starts on a new line
Copy and paste all of your sequences into an
alignment program like ClustalW (we use
http://align.genome.jp/ from the Kyoto University
Bioinformatics center, but any ClustalW program
will work)
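
Before pasting a long sequence file into an alignment program, it can help to check its layout automatically. The sketch below assumes FASTA-style input, where each name line begins with ">" and the sequence follows on the lines after it. It reports names with no sequence, a ">" that is not at the start of a line, and sequence text that appears before any name. The function name is hypothetical.

```python
from typing import List


def fasta_problems(text: str) -> List[str]:
    """Return a list of layout problems found in FASTA-style text."""
    problems: List[str] = []
    current = None        # name of the record being read
    has_sequence = False  # whether that record has any sequence lines yet
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None and not has_sequence:
                problems.append(f"'{current}' has no sequence")
            current = line[1:].strip()
            has_sequence = False
            if not current:
                problems.append(f"line {number}: '>' with no name")
        elif ">" in line:
            problems.append(f"line {number}: '>' is not at the start of the line")
        elif current is None:
            problems.append(f"line {number}: sequence before any '>' name")
        else:
            has_sequence = True
    if current is not None and not has_sequence:
        problems.append(f"'{current}' has no sequence")
    return problems


if __name__ == "__main__":
    # Invented two-record example with one mistake: geneA has no sequence.
    for problem in fasta_problems(">geneA\n>geneB\nMKVL\n"):
        print(problem)
```

An empty list means the file passed these particular checks; it does not guarantee that the alignment program will accept every character in the sequences.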

13
Generating a tree

For our trees we use Slow/Accurate pair-wise
comparisons and Gonnet for our Weight Matrix (two
spots on the website)
Click execute alignment to get your sequence
alignment
At the end of the alignment page will be the
information needed for tree drawing programs
You can click on clustal.dnd for a quick tree or
take the information after it (a Newick format
tree) and copy it into a new Word file, saving it
as a text file (include all parentheses)
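
Because a missing parenthesis is the easiest way to break a copied tree, a quick check of the saved text can save a failed load later. The sketch below tests that the parentheses balance and that the text ends with a semicolon, and lists the leaf names; it is a simple check written for this tutorial, not a full Newick parser, and it does not handle quoted names. The branch lengths in the example are invented.

```python
import re
from typing import List


def newick_is_complete(newick: str) -> bool:
    """True if parentheses balance and the tree ends with a semicolon."""
    depth = 0
    for char in newick:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0 and newick.strip().endswith(";")


def leaf_names(newick: str) -> List[str]:
    """List leaf names: text that follows "(" or "," up to ":", ",", ")" or ";"."""
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


if __name__ == "__main__":
    # Invented tree using gene names mentioned in this tutorial.
    tree = "((At3g46440:0.1,\nOs05g29990:0.2)\n:0.05,\nCV523101_wheat:0.3);"
    print(newick_is_complete(tree))
    print(leaf_names(tree))
```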

14
Creating a tree

We use the program TreeDyn to generate our trees
(available at http://www.treedyn.org/)
This is an example of the Arabidopsis and rice
1.1 family
The tree text file was loaded into TreeDyn and
the frame enlarged
The red text for Arabidopsis sequences was done
by changing the font color to red and using the
find panel to find all At sequences (which turn
red)
The scale at the bottom was added by right
clicking on that space and choosing the tree
name, annotation, and scale sub-menus
This square tree is useful to see associations of
genes for different species

15
Square tree example

This is part of the family 1.1 square dendrogram
of Arabidopsis, rice and maize from our website
The red names are Arabidopsis sequences, the
black names are rice, and the green names are
maize
Regions alternate between grey shaded and white
backgrounds (added with Photoshop) to indicate
clades of genes with similar sequences, which may
be related in function (such as AUD/SUD or GME,
etc)

16
Radial dendrograms

TreeDyn can also draw radial dendrograms such as
the one shown for rice family 1.1
This can be done by right clicking on the tree
area to bring up the grey box in TreeDyn,
choosing your tree, then Conformation - Radial
TreeDyn allows you to resize, rotate, and flip
clades around (see http://www.treedyn.org/ for
detailed tutorials on these processes)
For our site, we export the radial trees as jpeg
images

17
Finishing a radial dendrogram
The TreeDyn tree jpeg is finished as a FLASH file
where the ovals and family names are added (Rice
family 1.1 shown)
Each individual clade of a family tree is also
prepared in TreeDyn, and link buttons are added
later in FLASH (AUD/SUD-like shown)
18
Viewing your gene of interest

We provide protein sequence information you can
download and add in your own sequence of interest
for comparison to these three species
Under each tree (family 1.1 shown) is the link
'View the protein sequence file'
Right click and choose 'Save Target As' to
download the sequence with a filename and
location you will remember
You can do this for each Arabidopsis, rice, and
maize family

19
Viewing your gene of interest

You may have a sequence you think is related to
a particular family, such as the nucleotide
interconversion pathway (family 1.1)
For example, the wheat EST CV523101 from
GenBank
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=CV523101
might be related to the TIGR rice gene
Os05g29990 in the AUD/SUD clade of family 1.1
according to information from Gramene

20
Viewing your gene of interest

You can take the nucleotide sequence and convert
it to protein sequence using a program such as
GeneMark
(http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
Protein sequence returned
>CV523101_wheat
IARIFNTYGPRMCIDDGRVVSNFVAQALRKEPLTVYGDGKQTRSFQYVS
DLVEGLMRLMEGDHIGPFNLGNPGEFTMLELAKVVQDTIDPNARIEFREN
TQDDPHKRKPDITKAKEQLGWEPKIALRDGLPLMVTDFRKRIFGDQDSAA
TATEG
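
GeneMark predicts the coding region before translating it. As a much simpler stand-in for readers who only want to see what "converting to protein" means, the sketch below translates a nucleotide sequence directly in one chosen reading frame using the standard genetic code. It does not find genes, pick the correct frame, or remove introns, so it is not a replacement for GeneMark on real ESTs.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in the given frame (0, 1 or 2); '*' marks a stop codon."""
    dna = "".join(dna.split()).upper().replace("U", "T")
    protein = []
    for start in range(frame, len(dna) - 2, 3):
        # Codons with ambiguous bases such as N are written as X
        protein.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Short invented sequence, not part of CV523101.
    print(translate("ATGGCCTAA"))
```

Trying frames 0, 1, and 2 (and the reverse complement, which this sketch does not compute) and keeping the longest stretch without a stop is a common manual check when the frame is unknown.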

21
Viewing your gene of interest

Paste all of the sequences for family 1.1
(Arabidopsis, rice, and maize) plus the wheat
EST, CV523101_wheat, converted to protein, into a
ClustalW program such as
http://align.genome.jp/
from the Kyoto University Bioinformatics center
Perform the multiple alignment, copy the Newick
tree data generated into a new Word file, and
save a text file as previously shown
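
Combining a downloaded family file with your own protein can also be scripted rather than pasted by hand. The sketch below appends one named sequence to existing FASTA text, wrapping the sequence at a fixed width; the function name and the 60-character width are choices made for this example.

```python
def add_sequence(family_fasta: str, name: str, protein: str, width: int = 60) -> str:
    """Return the family FASTA text with one more named sequence appended."""
    protein = "".join(protein.split())  # drop spaces and line breaks
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    record = ">" + name + "\n" + "\n".join(lines) + "\n"
    existing = family_fasta.rstrip("\n")
    return (existing + "\n" if existing else "") + record


if __name__ == "__main__":
    # The family text is an invented stand-in for a downloaded family file;
    # the wheat fragment is the start of the translated EST shown earlier.
    family = ">geneA\nMKVLAAGT\n"
    print(add_sequence(family, "CV523101_wheat", "IARIFNTYGPRMCIDDGRVV"), end="")
```

The combined text can then be pasted into the alignment page in one step.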

22
Viewing your gene of interest

Loading the Newick tree from ClustalW into
TreeDyn as previously shown will allow you to
visualize the tree
The AUD/SUD clade of the tree generated by
TreeDyn shows that the wheat EST (in blue) is
most closely related to the rice gene Os05g29990
in the AUD clade

The AUD/SUD clade of the family 1.1 tree for
Arabidopsis (red), Rice (black), Maize (green),
and a wheat EST (blue) added to demonstrate how
you can visualize relatedness of your own genes
using our protein sequences
23
Bioinformatics sites used

General
Multiple alignment for trees, ClustalW
(http://align.genome.jp/)
Making trees, TreeDyn (http://www.treedyn.org/)
BLASTing, NCBI (http://www.ncbi.nlm.nih.gov/BLAST/)
Proteins translated by GeneMark
(http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
Rice
Sequence BLAST using TIGR
(http://www.tigr.org/tdb/e2k1/osa1/)
Downloading rice protein sequences from TIGR
(http://www.tigr.org/tdb/e2k1/osa1/LocusNameSearch.shtml)
Maize
Sequence BLAST using TIGR ZmGI
(http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=maize)
Sequence BLAST using Gramene
(http://www.gramene.org/multi/blastview)