Supported by the NSF Plant Genome Research and REU Programs - PowerPoint PPT Presentation
1
Tutorial of bioinformatics and tree generation at
the Cell Wall Genomics website
Bryan Penning
Supported by the NSF Plant Genome Research and
REU Programs
2
Bioinformatics Goals
  • We currently have a wealth of Arabidopsis
    thaliana cell wall gene information on the
    website; we wanted to:
  • Add family information about rice and maize Type
    II cell walls to compare to A. thaliana Type I
    cell walls
  • Add links to outside information on rice genes
    like we have for A. thaliana
  • Include annotated composite trees of A. thaliana,
    rice and maize gene families
  • Add links to sites used to generate the data
  • Add the source protein sequences used for our
    family trees so other researchers can make their
    own trees, adding their genes of interest
  • Generate a tutorial on how researchers can make
    use of the bioinformatics data on our site

3
Diagram of our bioinformatics approach
[Flowchart: Blast TIGR, choose genes, make tree; if too few genes, BLAST other sites; if too many genes, tighten criteria]
Diagram of the process used to find the genes and
draw family trees for cell wall related rice
genes. The same approach is used for maize.
4
Diagram of our bioinformatics approach
[Flowchart: A. thaliana, rice, and maize genes; draw tree with all family members; publish to web; annotate]
Diagram of the process used to integrate cell
wall related genes from all three family trees
into a composite tree.
5
BLASTing genes
  • To be considerate of the bioinformatics community
    given the number of BLASTs to be performed, and to
    speed the process, we downloaded the text (flat)
    file of the TIGR rice protein sequences
    (available at http://www.tigr.org/tdb/e2k1/osa1/data_download.shtml)
    and performed local BLASTs using blastall from NCBI
    (available at http://www.ncbi.nlm.nih.gov/BLAST/download.shtml)
  • Directions for using these tools are available at
    the above sites and are beyond the scope of this
    tutorial
  • For a small number of BLASTs, you can make these
    kinds of trees on your own with web-based tools,
    common programs such as Word and Excel, and any
    of a number of downloadable tree-drawing
    programs, without needing a programming language
    such as Perl to automate the process. Web searches
    are more time consuming, but they work just as
    well for a few sequences
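For readers who do script the local BLAST step, the Python sketch below only assembles the legacy NCBI commands mentioned above: formatdb to index the TIGR protein flat file as a database, then blastall with blastp. The file names and the E-value cutoff are hypothetical; the flags (-i, -p, -d, -o, -e, and -m 8 for tabular output) are those of the legacy NCBI BLAST package, so check them against the documentation in your download. By default the sketch only prints the commands.

```python
import subprocess

def formatdb_command(fasta_path: str) -> list[str]:
    """Legacy formatdb call that indexes a protein FASTA file as a BLAST database."""
    # -i: input FASTA file; -p T: the sequences are protein
    return ["formatdb", "-i", fasta_path, "-p", "T"]

def blastall_command(query: str, database: str, output: str,
                     evalue: float = 1e-10) -> list[str]:
    """Legacy blastall call: protein queries against a protein database."""
    # -p blastp: protein vs protein; -m 8: tabular output, one hit per line.
    # The default E-value cutoff is an assumption, not the authors' setting.
    return ["blastall", "-p", "blastp", "-d", database, "-i", query,
            "-o", output, "-e", str(evalue), "-m", "8"]

def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical file names for the TIGR rice proteins and an Arabidopsis family
    run(formatdb_command("rice_proteins.fasta"))
    run(blastall_command("arabidopsis_family.fasta", "rice_proteins.fasta",
                         "family_vs_rice.tab"))
```

Keeping a dry-run default lets you inspect the exact command lines before committing to a long batch of BLASTs.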

6
Web BLASTing
  • For smaller numbers of BLASTs to the rice genome,
    TIGR provides an excellent Web BLAST at
    http://tigrblast.tigr.org/euk-blast/index.cgi?project=osa1
  • You can also use the new BLAST tool at Gramene
    (http://www.gramene.org/multi/blastview) for most
    cereal sequences
  • Note: gene model versions sometimes differ
    between Gramene and TIGR, as one site may update
    to the latest model before the other

7
Web BLASTing
  • After downloading the protein sequence for
    Arabidopsis SUD1 (At3g46440) from TIGR, you can
    BLAST it against the TIGR Rice Pseudomolecules
    Protein database using BLASTp

8
Web BLASTing
  • You get a series of hits to the gene of
    interest
  • A higher score and a smaller probability indicate
    a better match to the original gene
  • This procedure is repeated for every gene in a
    family to gather the best possible hits; the hits
    are then sorted to remove duplicates, and the best
    rice matches to the Arabidopsis families are chosen
  • You can use NCBI's blastall tool to run many
    BLASTs at once, as we do for this step
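When blastall is run with tabular output (-m 8), each hit is one tab-separated line with 12 columns: query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, and bit score. A minimal Python sketch for collecting the hits of several queries at once and ranking each query's hits, best first; the E-value cutoff is an assumption, and ranking by bit score then E-value is our choice for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str
    subject: str
    identity: float
    evalue: float
    bit_score: float

def parse_tabular(lines):
    """Parse blastall -m 8 lines into Hit records, skipping blanks and comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(query=f[0], subject=f[1], identity=float(f[2]),
                        evalue=float(f[10]), bit_score=float(f[11])))
    return hits

def ranked_hits(hits, max_evalue=1e-10):
    """Group hits by query; within each query, highest bit score first."""
    by_query = {}
    for hit in hits:
        if hit.evalue <= max_evalue:
            by_query.setdefault(hit.query, []).append(hit)
    for query_hits in by_query.values():
        query_hits.sort(key=lambda h: (-h.bit_score, h.evalue))
    return by_query

if __name__ == "__main__":
    # Made-up lines in -m 8 layout with placeholder subject names (not real results)
    sample = [
        "SUD1\triceA\t80.0\t400\t80\t0\t1\t400\t1\t400\t1e-150\t530",
        "SUD1\triceB\t40.0\t300\t180\t2\t5\t300\t3\t298\t1e-20\t95.1",
    ]
    for query, hits in ranked_hits(parse_tabular(sample)).items():
        print(query, [h.subject for h in hits])
```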

9
Organizing BLASTs
  • This is a Word document generated by BLASTing
    SUD1 and SUD2 of Arabidopsis against the TIGR
    Rice Protein database
  • The hits were copied into Word, set to
    Courier New 9 pt, and saved as a text-only
    document (to remove the HTML code)
  • The file was reopened in Word and converted to a
    table (Table menu), choosing Other and the |
    character (Shift+\) to separate the columns
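The column split that Word's table conversion performs can also be sketched in Python: each text line is split on the | character, the cells are trimmed, and the rows are written as CSV, which Excel opens directly. The sample line layout is hypothetical; real saved BLAST summaries may place the | separators differently.

```python
import csv
import io

def split_pipe_lines(text: str) -> list[list[str]]:
    """Split each non-blank line on '|' into trimmed cells."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append([cell.strip() for cell in line.split("|")])
    return rows

def to_csv(rows: list[list[str]]) -> str:
    """Write rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    # Hypothetical hit lines saved as text
    text = "geneA | 530 | 1e-150\ngeneB | 95 | 1e-20\n"
    print(to_csv(split_pipe_lines(text)))
```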

10
Organizing BLASTs
  • The Word file is copied into Excel, and the
    Data > Sort menu is used to sort by the first column
  • This brings all of the same-named genes together
    (the two highlighted lines, for example)
  • Duplicate genes are removed from the spreadsheet,
    and only the far-right column (the LOC_Osxxgxxxxx
    tags) is copied back to Word
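The sort-and-deduplicate step can be sketched in Python as well. This assumes, as on the slide, that the gene name is in the first column and the rice locus tag in the last; it sorts by the first column, keeps the first row for each gene, and returns only tags matching the LOC_Os##g##### pattern (our assumption about the tag format).

```python
import re

# Assumed rice locus tag format, e.g. LOC_Os05g29990
LOCUS = re.compile(r"LOC_Os\d{2}g\d{5}")

def unique_locus_tags(rows: list[list[str]]) -> list[str]:
    """Sort rows by the first column, drop repeated genes, return last-column tags."""
    seen = set()
    tags = []
    for row in sorted(rows, key=lambda r: r[0]):
        gene = row[0]
        if gene in seen:
            continue  # a duplicate hit to a gene that is already kept
        seen.add(gene)
        tag = row[-1]
        if LOCUS.fullmatch(tag):
            tags.append(tag)
    return tags
```

Printing "\n".join(tags) gives the same one-gene-per-line list that the Word conversion produces.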

11
Organizing BLASTs
  • Use Table > Convert Table to Text (separated by
    paragraph marks) to generate a list of genes
  • These genes can be retrieved from a downloaded
    database using NCBI's fastacmd (included in the
    BLAST download tools), or you can search for them
    one at a time using a web-based database such as
    TIGR's Locus Name Search
    (http://www.tigr.org/tdb/e2k1/osa1/LocusNameSearch.shtml)
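As an alternative to fastacmd, the sequences for a list of locus tags can be pulled out of the downloaded TIGR protein flat file with a short Python sketch. It assumes the file is in FASTA format and that each header starts with a gene model name such as a locus tag plus a model suffix (for example .1); both are assumptions to check against the actual download.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record's first header word to its (joined) sequence."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def fetch_loci(records: dict[str, str], loci: list[str]) -> dict[str, str]:
    """Keep records named exactly like a locus, or like locus.N (a gene model)."""
    wanted_loci = set(loci)
    wanted = {}
    for name, seq in records.items():
        if name in wanted_loci or name.split(".")[0] in wanted_loci:
            wanted[name] = seq
    return wanted
```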

12
Generating a tree
  • Once you have found all of your sequences, check
    that each sequence name begins with a > (denoting
    a new sequence name) and that the sequence
    starts on a new line
  • Copy and paste all of your sequences into an
    alignment program such as ClustalW (we use
    http://align.genome.jp/ from the Kyoto University
    Bioinformatics Center, but any ClustalW program
    will work)
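The layout check described above (a > before each sequence name, the sequence on the following lines) can be automated before pasting into ClustalW. A minimal Python sketch; treating letters, * and - as the only valid sequence characters is our assumption.

```python
import re

# Assumed set of acceptable sequence characters (residues, stop '*', gap '-')
SEQUENCE_LINE = re.compile(r"[A-Za-z*\-]+")

def check_fasta(text: str) -> list[str]:
    """Return a list of layout problems; an empty list means it looks right."""
    problems = []
    seen_header = False
    awaiting_sequence = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if awaiting_sequence:
                problems.append(f"line {number}: previous name has no sequence")
            if len(line) == 1:
                problems.append(f"line {number}: '>' with no sequence name")
            seen_header = True
            awaiting_sequence = True
        elif not seen_header:
            problems.append(f"line {number}: sequence before the first '>' name")
        elif not SEQUENCE_LINE.fullmatch(line):
            # e.g. a name and sequence run together on one line without '>'
            problems.append(f"line {number}: unexpected characters in sequence")
        else:
            awaiting_sequence = False
    if awaiting_sequence:
        problems.append("last name has no sequence")
    return problems
```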

13
Generating a tree
  • For our trees we use Slow/Accurate pairwise
    comparisons and Gonnet for the Weight Matrix (two
    settings on the website)
  • Click Execute Alignment to get your sequence
    alignment
  • At the end of the alignment page is the
    information needed for tree-drawing programs
  • You can click on clustal.dnd for a quick tree, or
    take the text after it (a Newick-format tree),
    copy it into a new Word file, and save it as a
    text file (include all parentheses)
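Because a Newick tree with a missing parenthesis will not load, it can help to check the copied text before saving it. A minimal Python sketch that strips line breaks picked up from the web page, confirms the parentheses balance and the tree ends with ";", and writes the file; it assumes the leaf names contain no spaces, and the output path is hypothetical.

```python
from pathlib import Path

def clean_newick(text: str) -> str:
    """Remove whitespace/line breaks and check parentheses and the final ';'."""
    # Assumes leaf names contain no spaces, so all whitespace can be dropped.
    tree = "".join(text.split())
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("')' without a matching '('")
    if depth != 0:
        raise ValueError(f"{depth} unclosed '('")
    if not tree.endswith(";"):
        raise ValueError("a Newick tree should end with ';'")
    return tree

def save_newick(text: str, path: str) -> None:
    """Validate the tree text, then save it as a plain text file."""
    Path(path).write_text(clean_newick(text) + "\n")
```

For example, save_newick(copied_text, "family_1_1_tree.txt") writes a file that can then be loaded into a tree drawing program.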

14
Creating a tree
  • We use the program TreeDyn to generate our trees
    (available at http://www.treedyn.org/)
  • This is an example of the Arabidopsis and rice
    family 1.1 tree
  • The tree text file was loaded into TreeDyn and
    the frame enlarged
  • The Arabidopsis sequences were colored red by
    setting the font color to red and using the
    Find panel to find all At sequences (which then
    turn red)
  • The scale at the bottom was added by
    right-clicking on that space and choosing the
    tree name, annotation, and scale submenus
  • This square tree is useful for seeing
    associations between genes from different species

15
Square tree example
  • This is part of the family 1.1 square dendrogram
    of Arabidopsis, rice and maize from our website
  • The red names are Arabidopsis sequences, the
    black names are rice, and the green names are maize
  • Regions alternate between grey-shaded and white
    backgrounds (added with Photoshop) to indicate
    clades of genes with similar sequences, which may
    be related in function (such as AUD/SUD or GME)
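The coloring itself is done by hand in TreeDyn; to check in advance which leaf labels should receive which color, a small Python sketch can classify names by species. The patterns are assumptions: AGI codes for Arabidopsis, TIGR locus tags for rice, and the _wheat suffix used for the EST later in this tutorial. Maize label formats are not given here, so unmatched labels fall back to the maize color.

```python
import re

# Assumed label patterns; adjust them to the names actually in your tree.
RULES = [
    (re.compile(r"^At[1-5CM]g\d{5}", re.IGNORECASE), "red"),  # Arabidopsis
    (re.compile(r"^(LOC_)?Os\d{2}g\d{5}"), "black"),          # rice
    (re.compile(r"_wheat$"), "blue"),                         # added wheat EST
]
DEFAULT_COLOR = "green"  # assumption: anything else is a maize sequence

def label_color(label: str) -> str:
    """Return the display color for one leaf label."""
    for pattern, color in RULES:
        if pattern.search(label):
            return color
    return DEFAULT_COLOR

def color_table(labels: list[str]) -> dict[str, str]:
    """Map every leaf label to its color."""
    return {label: label_color(label) for label in labels}
```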

16
Radial dendrograms
  • TreeDyn can also draw radial dendrograms such as
    the one shown for rice family 1.1
  • This can be done by right-clicking on the tree
    area to bring up the grey box in TreeDyn,
    choosing your tree, then Conformation > Radial
  • TreeDyn allows you to resize, rotate, and flip
    clades (see http://www.treedyn.org/ for
    detailed tutorials on these processes)
  • For our site, we export the radial trees as JPEG
    images

17
Finishing a radial dendrogram
The TreeDyn JPEG tree is finished as a Flash file,
where the ovals and family names are added (rice
family 1.1 shown)
Each individual clade of a family tree is also
prepared in TreeDyn, and link buttons are added
later in Flash (AUD/SUD-like clade shown)
18
Viewing your gene of interest
  • We provide protein sequence information you can
    download and add in your own sequence of interest
    for comparison to these three species
  • Under each tree (family 1.1 shown) is the link
    'View the protein sequence file'
  • Right-click and choose Save Target As to
    download the sequence file under a filename and
    location you will remember
  • You can do this for each Arabidopsis, rice, and
    maize family

19
Viewing your gene of interest
  • You may have a sequence you think is related to
    a particular family, such as the nucleotide
    interconversion pathway (family 1.1)
  • For example, the wheat EST CV523101 from GenBank
    (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=CV523101)
    might be related to the TIGR rice gene
    Os05g29990 in the AUD/SUD clade of family 1.1,
    according to information from Gramene

20
Viewing your gene of interest
  • You can take the nucleotide sequence and convert
    it to a protein sequence using a program such as
    GeneMark (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
  • Protein sequence returned:
    >CV523101_wheat
    IARIFNTYGPRMCIDDGRVVSNFVAQALRKEPLTVYGDGKQTRSFQYVS
    DLVEGLMRLMEGDHIGPFNLGNPGEFTMLELAKVVQDTIDPNARIEFREN
    TQDDPHKRKPDITKAKEQLGWEPKIALRDGLPLMVTDFRKRIFGDQDSAA
    TATEG
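GeneMark predicts the coding region with a gene model, and this is not what the sketch below does. As a rough cross-check only, a naive six-frame translation with the standard genetic code can show which frame of an EST gives the longest stretch without stop codons. Real ESTs can contain sequencing errors and frameshifts that this simple approach cannot handle.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def normalize(dna: str) -> str:
    """Drop whitespace, upper-case, and treat U as T."""
    return "".join(dna.split()).upper().replace("U", "T")

def reverse_complement(dna: str) -> str:
    return normalize(dna).translate(COMPLEMENT)[::-1]

def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks stops, 'X' unknown codons."""
    dna = normalize(dna)
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(frame, len(dna) - 2, 3))

def longest_stop_free(dna: str) -> tuple[str, str]:
    """Return (frame label, peptide) for the longest stop-free stretch in 6 frames."""
    best = (0, "", "")
    for strand, seq in (("+", normalize(dna)), ("-", reverse_complement(dna))):
        for frame in range(3):
            for peptide in translate(seq, frame).split("*"):
                if len(peptide) > best[0]:
                    best = (len(peptide), f"{strand}{frame + 1}", peptide)
    return best[1], best[2]
```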

21
Viewing your gene of interest
  • Paste all of the family 1.1 sequences
    (Arabidopsis, rice, and maize), plus the wheat
    EST CV523101_wheat converted to protein, into a
    ClustalW program such as http://align.genome.jp/
    from the Kyoto University Bioinformatics Center
  • Perform the multiple alignment, copy the
    generated Newick tree data into a new Word file,
    and save it as a text file as shown previously
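Combining the downloaded family files with your own sequence can be scripted too. A minimal Python sketch that joins the file contents and appends extra records as FASTA, wrapping sequences at 60 characters (the wrap width is our choice). Reading the files, for example with pathlib's Path(name).read_text(), is left to the caller, and any file names you use are your own.

```python
def wrap(seq: str, width: int = 60) -> str:
    """Break a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))

def combine_fasta(family_texts: list[str], extra: dict[str, str]) -> str:
    """Join downloaded family FASTA texts and append extra sequences as records."""
    parts = [text.strip() for text in family_texts if text.strip()]
    for name, seq in extra.items():
        parts.append(f">{name}\n{wrap(seq)}")
    return "\n".join(parts) + "\n"
```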

22
Viewing your gene of interest
  • Loading the Newick tree from ClustalW into
    TreeDyn, as shown previously, lets you visualize
    the tree
  • The AUD/SUD clade of the tree generated by
    TreeDyn shows that the wheat EST (in blue) is
    most closely related to the rice gene Os05g29990
    in the AUD clade

The AUD/SUD clade of the family 1.1 tree for
Arabidopsis (red), Rice (black), Maize (green),
and a wheat EST (blue) added to demonstrate how
you can visualize relatedness of your own genes
using our protein sequences
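TreeDyn shows the relationship visually. As a rough numerical cross-check, the branch lengths in the saved Newick file can be summed along the path between two leaves to find which leaf lies closest to the wheat EST. A minimal Python sketch, assuming an unquoted Newick string with numeric branch lengths; the summed distance is only a rough guide, not a substitute for inspecting the tree.

```python
import re
from itertools import count

PUNCTUATION = set("(),:;")

def parse_newick(text: str):
    """Parse Newick into parent and branch-length maps plus a leaf-name index."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    node_ids = count()
    parent, branch, leaves = {}, {}, {}
    pos = 0

    def read_node(parent_id):
        nonlocal pos
        me = next(node_ids)
        parent[me] = parent_id
        is_leaf = tokens[pos] != "("
        if not is_leaf:
            pos += 1                      # skip '('
            read_node(me)
            while tokens[pos] == ",":
                pos += 1
                read_node(me)
            pos += 1                      # skip ')'
        label = ""
        if pos < len(tokens) and tokens[pos] not in PUNCTUATION:
            label = tokens[pos]
            pos += 1
        length = 0.0
        if pos < len(tokens) and tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        branch[me] = length
        if is_leaf:
            leaves[label] = me

    read_node(None)
    return parent, branch, leaves

def path_length(parent, branch, a, b) -> float:
    """Sum of branch lengths on the path between nodes a and b."""
    up = {}
    total, node = 0.0, a
    while node is not None:               # distance from a to each ancestor
        up[node] = total
        total += branch[node]
        node = parent[node]
    total, node = 0.0, b
    while node not in up:                 # climb from b to the shared ancestor
        total += branch[node]
        node = parent[node]
    return total + up[node]

def closest_leaf(newick: str, query: str) -> str:
    """Name of the leaf with the shortest summed branch length to `query`."""
    parent, branch, leaves = parse_newick(newick)
    q = leaves[query]
    return min((path_length(parent, branch, q, node), name)
               for name, node in leaves.items() if name != query)[1]
```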
23
Bioinformatics sites used
  • General
  • Multiple alignment for trees, ClustalW
    (http://align.genome.jp/)
  • Making trees, TreeDyn (http://www.treedyn.org/)
  • BLASTing, NCBI (http://www.ncbi.nlm.nih.gov/BLAST/)
  • Proteins translated by GeneMark
    (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
  • Rice
  • Sequence BLAST using TIGR
    (http://www.tigr.org/tdb/e2k1/osa1/)
  • Downloading rice protein sequences from TIGR
    (http://www.tigr.org/tdb/e2k1/osa1/LocusNameSearch.shtml)
  • Maize
  • Sequence BLAST using TIGR ZmGI
    (http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=maize)
  • Sequence BLAST using Gramene
    (http://www.gramene.org/multi/blastview)