The Poor Beginner's Guide to Bioinformatics - PowerPoint PPT Presentation

Slides: 24
Provided by: rskcv



1
The Poor Beginner's Guide to Bioinformatics
2
What we have and don't have...
  • a computer connected to the Internet (incl. Web
    browser)
  • a text editor (Notepad or better)
  • public databases of genomic sequences
  • public databases of cDNA EST
  • public databases of protein sequences, structures
    and motifs
  • money for specialised software packages
  • public servers capable of (almost) anything we
    wish to do

3
Dealing with a sequence: model tasks
  • basic (DNA) sequence manipulation: restriction
    analysis, translation
  • sequence similarity and pattern/motif searches
  • gene building: modelling exon-intron structures
  • protein domain searches, structure analysis
  • construction and interpretation of sequence
    alignments
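The first model task, basic DNA manipulation, needs nothing beyond a text editor and a scripting language. Below is a minimal Python sketch, assuming the standard genetic code, forward-strand reading only, and EcoRI's GAATTC as an example recognition site. `translate` and `find_sites` are hypothetical helper names, not functions of any server mentioned in these slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate the forward strand in reading frame 0, 1 or 2.

    Stop codons appear as '*'; codons containing non-ACGT characters become 'X'.
    A trailing partial codon is ignored."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_sites(dna: str, site: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match of site."""
    dna, site = dna.upper(), site.upper()
    positions, start = [], dna.find(site)
    while start != -1:
        positions.append(start + 1)
        start = dna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    seq = "ATGGAATTCTAA"
    print(translate(seq))             # MEF*
    print(find_sites(seq, "GAATTC"))  # [4]
```

For the three reverse-strand frames, translate the reverse complement; restriction-analysis servers also handle degenerate recognition sites, which this sketch does not.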

4
Notes on basic sequence handling
  • Make sure you have the correct format.
  • FASTA format is (almost) always correct:
      >sequencename
      thisisasequenceinfastaformat
  • If not, you can always use raw data.
  • If things don't work, check for gaps in the sequence,
    empty lines, and the file extension.
  • BEWARE OF MICROSOFT!
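The checks above can be run automatically before a sequence is pasted into a server. This is a small sketch: `check_fasta` is a hypothetical helper, and the carriage-return and non-ASCII tests are assumptions about what "BEWARE OF MICROSOFT!" warns against (word processors saving Windows line endings or typographic characters).

```python
def check_fasta(text: str) -> list[str]:
    """List common problems in FASTA text (hypothetical helper, not a standard tool).

    Checks for a missing '>' header, empty lines, and gaps or spaces inside
    sequence lines, plus two assumed word-processor side effects:
    carriage-return line endings and non-ASCII characters."""
    problems = []
    if "\r" in text:
        problems.append("carriage-return (Windows-style) line endings")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is fine
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header line")
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
        elif not line.startswith(">"):
            if any(ch.isspace() or ch == "-" for ch in line):
                problems.append(f"line {number}: gap or space inside the sequence")
            if not line.isascii():
                problems.append(f"line {number}: non-ASCII characters")
    return problems


if __name__ == "__main__":
    print(check_fasta(">sequencename\nthisisasequenceinfastaformat\n"))  # []
    print(check_fasta(">seq1\nACGT ACGT\n\nACGT\n"))
```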

5
(No Transcript)
6
Model tasks continued
  • basic (DNA) sequence manipulation: restriction
    analysis, translation
  • sequence similarity and pattern/motif searches
  • gene building: modelling exon-intron structures
  • protein domain searches, structure analysis
  • construction and interpretation of sequence
    alignments

7
Defining a gene family
  • By overall domain structure
  • By domain sequence
  • Based on a peptide motif

L-X-X-G-N-X-[ML]-N
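A peptide motif like the one above can be searched with an ordinary regular expression. Below is a sketch in Python, assuming the position printed as `ML` means "M or L" (written `[ML]` in PROSITE notation). `prosite_to_regex` and `find_motif` are hypothetical names, and only a subset of PROSITE syntax is handled (`x`, `[...]`, `{...}`, `(n)`/`(n,m)`, `<`, `>`).

```python
import re

# One PROSITE-style element: optional N-terminal anchor '<', a residue or residue
# class, an optional repeat "(n)" or "(n,m)", and an optional C-terminal anchor '>'.
_ELEMENT = re.compile(
    r"(<?)([A-Za-z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core in ("x", "X"):
            piece = "."                      # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core.upper()             # one residue, or [..] alternatives
        if low is not None:
            piece += f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        pieces.append(("^" if n_term else "") + piece + ("$" if c_term else ""))
    return "".join(pieces)


def find_motif(protein: str, pattern: str) -> list[tuple[int, str]]:
    """(1-based start, matched peptide) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("L-X-X-G-N-X-[ML]-N"))          # L..GN.[ML]N
    print(find_motif("MALKAGNSMNQ", "L-X-X-G-N-X-[ML]-N"))  # [(3, 'LKAGNSMN')]
```

Unlike BLAST, a search like this returns no statistics: every hit is reported, however likely it is to occur by chance in a sequence of that length.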
8
Sequence comparison-based searches
  • Entrez related sequences
      - easy identification of false starts
      - no organism selection
  • BLAST/FASTA
      - all DNA/protein combinations
      - taxonomy selection possible
      - statistical data provided
      - domain structure comparison available
      - divergent motifs may be missed

Two methods are better than one.
9
Notes on all sequence comparisons, searches,
alignments
  • Start with defaults (the authors know what they
    are doing)
  • BUT don't be afraid to vary the parameters
  • Choose a reasonable scoring matrix:
      - distant sequences: low BLOSUM, high PAM (e.g. BLOSUM45, PAM250)
      - closely related sequences: low PAM, high BLOSUM (e.g. PAM30, BLOSUM80)
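The scoring matrix is simply the table of substitution scores used inside the alignment recursion. Below is a minimal Smith-Waterman local alignment sketch; the toy +5/-4 DNA scores and the linear gap penalty of -4 are assumptions for illustration, not recommended defaults, and `smith_waterman` is a hypothetical name.

```python
from typing import Callable


def simple_dna_score(x: str, y: str) -> float:
    """Toy substitution score (an assumption): +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def smith_waterman(a: str, b: str,
                   score: Callable[[str, str], float] = simple_dna_score,
                   gap: float = -4) -> float:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            h[i][j] = max(
                0,                                             # start a new local alignment
                h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),   # match or mismatch
                h[i - 1][j] + gap,                             # gap in b
                h[i][j - 1] + gap,                             # gap in a
            )
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))  # 15: the shared "ACG"
```

For proteins, pass a lookup into a real BLOSUM or PAM table instead; assuming Biopython is installed, `m = substitution_matrices.load("BLOSUM62")` (from `Bio.Align`) followed by `score=lambda x, y: m[x, y]` is one way to do it. Real search tools also use affine rather than linear gap penalties.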

10
(No Transcript)
11
Motif-based searches
  • sensitive
  • no statistics
  • only protein databases can be searched
  • TAIR PatMatch
      - Arabidopsis-specific
      - problematic user interface
  • ISREC - INSECTS
      - admirable technology
      - access to SwissProt and TrEMBL
      - no organism selection

12
(No Transcript)
13
Model tasks continued
  • basic (DNA) sequence manipulation: restriction
    analysis, translation
  • sequence similarity and pattern/motif searches
  • gene building: modelling exon-intron structures
  • protein domain searches, structure analysis
  • construction and interpretation of sequence
    alignments

14
Some genes are more alike than others
  • A number of splicing prediction servers are available
  • Agreement between different methods is a good sign, but
    not an absolute measure
  • Always align ESTs if possible
  • Beware of non-conventional intron boundaries
    (GC-AG instead of GT-AG)
  • Plant data for predicting transcription start sites and
    transcription factor binding sites are limited
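Once an EST has been aligned to the genomic sequence, the intron boundaries can be checked directly. Below is a sketch, assuming exon coordinates are 1-based and inclusive. `intron_boundaries` and `classify` are hypothetical helpers, and the example gene is made up.

```python
def intron_boundaries(genomic: str,
                      exons: list[tuple[int, int]]) -> list[tuple[str, str]]:
    """Donor/acceptor dinucleotides of each intron between consecutive exons.

    exons: sorted, 1-based, inclusive (start, end) coordinates on the genomic
    sequence, e.g. as read off an EST-to-genome alignment."""
    genomic = genomic.upper()
    pairs = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start - 1]  # bases prev_end+1 .. next_start-1
        if len(intron) < 4:
            raise ValueError(f"intron after base {prev_end} is too short")
        pairs.append((intron[:2], intron[-2:]))
    return pairs


def classify(donor: str, acceptor: str) -> str:
    """Label an intron by its first and last two bases."""
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-canonical GC-AG"
    return "unusual boundary: re-check the alignment"


if __name__ == "__main__":
    # Made-up gene: exon 1, a GT-AG intron, exon 2, a GC-AG intron, exon 3.
    genomic = "ATGCCC" + "GTTTTTAG" + "GGGAAA" + "GCTTTTAG" + "TTTTAA"
    exons = [(1, 6), (15, 20), (29, 34)]
    for donor, acceptor in intron_boundaries(genomic, exons):
        print(donor, acceptor, classify(donor, acceptor))
```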

15
Model tasks continued
  • basic (DNA) sequence manipulation: restriction
    analysis, translation
  • sequence similarity and pattern/motif searches
  • gene building: modelling exon-intron structures
  • protein domain searches, structure analysis
  • construction and interpretation of sequence
    alignments

16
Searching for known domains/motifs
  • Searching for PROSITE patterns allowing
    ambiguities
  • PROSITE and Pfam profile searches
  • SMART, CD-Search (domains and more)
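Profile searches score every position of a query against a position-specific table built from an alignment of known family members. PROSITE profiles and Pfam models are considerably richer (they model insertions and deletions, and Pfam uses hidden Markov models), so the ungapped position-specific scoring matrix below is only a simplified stand-in for the idea. The motif instances are invented, and the uniform background and pseudocount of 1 are assumptions. `build_pssm` and `best_window` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Natural-log log-odds scores per column, from equal-length ungapped motif
    instances, against a uniform background of 1/20 for every amino acid."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(len(instances[0])):
        column = [seq[i] for seq in instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log((column.count(aa) + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(protein: str, pssm: list[dict[str, float]]) -> tuple[int, float]:
    """Highest-scoring window as (1-based start, score); unknown residues score 0."""
    width = len(pssm)
    scores = [
        (start + 1,
         sum(col.get(aa, 0.0) for col, aa in zip(pssm, protein[start:start + width])))
        for start in range(len(protein) - width + 1)
    ]
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Invented members of a hypothetical family sharing the L-X-X-G-N-X-[ML]-N motif.
    pssm = build_pssm(["LKAGNSMN", "LRSGNALN", "LEKGNTMN"])
    print(best_window("MALKAGNSMNQ", pssm))
```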

17
(No Transcript)
18
Predicting protein localisation
  • predicting signal peptides/anchors
      - 2 methods available
      - possibility to predict organelle localisation
  • transmembrane segment prediction
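The oldest approach to transmembrane prediction is a hydropathy plot. Below is a sketch using the Kyte-Doolittle scale with the commonly quoted window of 19 residues and threshold of 1.6. It is not either of the methods referred to on the slide, and dedicated servers perform considerably better. `hydropathy_profile` and `candidate_tm_segments` are hypothetical names.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; element k is the window starting at residue k+1."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]  # unknown residues count as 0
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) spans covered by windows whose mean exceeds threshold."""
    segments: list[tuple[int, int]] = []
    for k, mean in enumerate(hydropathy_profile(protein, window)):
        if mean <= threshold:
            continue
        start, end = k + 1, k + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Made-up test protein: a 20-residue leucine stretch between lysine flanks.
    print(candidate_tm_segments("K" * 15 + "L" * 20 + "K" * 15))  # [(11, 40)]
```

Because a whole window is reported once its mean passes the threshold, the predicted span overlaps the hydrophobic stretch at both ends; the boundaries are only approximate.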

19
Model tasks continued
  • basic (DNA) sequence manipulation: restriction
    analysis, translation
  • sequence similarity and pattern/motif searches
  • gene building: modelling exon-intron structures
  • protein domain searches, structure analysis
  • construction and interpretation of sequence
    alignments

20
Alignment: manual or automated?
  • objective results
  • a number of servers available
  • recommended for well-conserved proteins
  • empiric parameters (e.g. gap penalties)
  • bad for divergent sequences
  • locally installed, free, for Mac and PC
  • interactive domain definition
  • statistical data provided
  • may produce false-positive blocks (read the
    on-line manual!)
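Automated aligners are built on dynamic programming, and their gap penalties are chosen empirically. Below is a pairwise Needleman-Wunsch sketch with a linear gap penalty. The default scores are assumptions, `needleman_wunsch` is a hypothetical name, and multiple-alignment servers build on pairwise steps like this one with far more elaborate scoring.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s, f[i - 1][j] + gap, f[i][j - 1] + gap)

    # Trace back from the bottom-right cell, following any move consistent with f.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if f[i][j] == f[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and f[i][j] == f[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), f[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2)
```

Re-running with different `gap` values on the same pair is a quick way to see how much an alignment depends on this empirical choice.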

21
Phylogenetic analyses
  • Two methods are better than one.
  • Your phylogeny cannot be better than your
    alignment.
  • Gaps are not data.
  • Always do bootstrapping (100-500 replicates)
  • Certain questions cannot be answered from an
    unrooted tree.
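Two of the rules above translate directly into code. Below is a sketch, assuming an alignment stored as a dict of equal-length strings with `-` for gaps. `p_distance` skips gapped columns instead of counting them as differences, and `bootstrap_replicate` draws one pseudo-replicate by resampling alignment columns with replacement. Both names are hypothetical, and tree building itself is not shown.

```python
import random


def p_distance(s1: str, s2: str, gap: str = "-") -> float:
    """Proportion of differing sites, skipping any column where either sequence has a gap."""
    compared = differences = 0
    for x, y in zip(s1, s2):
        if x == gap or y == gap:
            continue  # gaps are treated as missing data, not as an extra character state
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def bootstrap_replicate(alignment: dict[str, str],
                        rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: draw alignment columns with replacement, keeping the length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    print(p_distance("ACGT-A", "ACTTAA"))  # 0.2 (1 difference in 5 gap-free columns)
    aln = {"a": "ACGTAC", "b": "ACGTTC", "c": "GCGTAC"}
    rng = random.Random(1)
    replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
    print(replicates[0])
```

Each replicate is then analysed with the same tree-building method, and the support for a clade is the percentage of replicate trees that contain it.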

22
Points to take off...
  • go to the Bioinformatics page http://www2.rhul.ac.uk/ujba110/Bioinfo.htm
  • select your exercise (A,B,C,D,E)
  • and enjoy it!
  • If you mean it seriously
  • create your own bookmarks (seed provided on the
    course web page)

23
(No Transcript)