Sequence Analysis Tools Introduction - PowerPoint PPT Presentation

Title:

Sequence Analysis Tools Introduction

Description:

The operations are colour coded to suggest which category of bioinformatics tool discussed above they would typically employ.

Slides: 37
Provided by: hum6

Transcript and Presenter's Notes

Title: Sequence Analysis Tools Introduction


1
Sequence Analysis Tools: Introduction
2
Bioinformatics: a definition?
The design, construction and use of software
tools to generate, store, annotate, access and
analyse data and information relating to
Molecular Biology.
Here we consider the use of Bioinformatics tools
rather than their design and construction.
Here we consider the access and analysis of data
and information items rather than their
generation, storage or annotation.
3
Introduction to sequence analysis tools: different
analysis tools
  • Standard Unix tools (e.g., the grep family, sed,
    awk, and cut).
  • Publicly available tools (e.g., BLAST, the EMBOSS
    package, DNASTAR, Vector NTI).
  • Open source libraries (e.g., BioPerl, BioJava,
    BioPython, BioRuby).
  • Custom tools.

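As a rough illustration of the first category, counting the records in a FASTA file is the kind of task usually handled with `grep -c '^>'`. A minimal Python sketch of the same idea follows; the function name `count_fasta_records` and the sample data are illustrative assumptions, not part of the original slides.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Hypothetical two-record FASTA input, held in memory for the demonstration.
sample = io.StringIO(">seq1\nACGT\nGGCC\n>seq2\nTTAA\n")
print(count_fasta_records(sample))
```

Any open text file can be passed in place of the in-memory sample, since the function only iterates over lines.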
4
Sequence Analysis: an Overview
Sequencing Project Management
Database Retrieval
Restriction Mapping
Primer Design
DNA/RNA Folding
Nucleic Acid Sequence Analysis
Database Retrieval
Seeking Coding regions
Database Similarity Searching
Translation to amino acids
Pairwise Sequence Comparison
Multiple Sequence Alignment
Protein Sequence analysis
Prediction of Function
Structure prediction
Phylogeny
Motifs and Patterns
Structure analysis
5
The field of sequence analysis
  • Many software tools are used in sequence
    analysis, which is one of the many subfields of
    computational molecular biology. The field of
    sequence analysis includes:
  • Primer Design
  • Pattern and motif searching
  • Sequence comparison
  • Multiple sequence alignment
  • Sequence composition determination
  • Secondary structure and 3D prediction
  • Phylogenetic analysis

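Sequence composition determination, one of the items listed above, is simple enough to sketch directly. The example below counts symbols and reports the G+C fraction; the function names `base_composition` and `gc_content` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def base_composition(seq):
    """Return per-symbol counts for a sequence, ignoring case."""
    return Counter(seq.upper())


def gc_content(seq):
    """Return the fraction of G and C bases; 0.0 for an empty sequence."""
    counts = base_composition(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


print(base_composition("ATGCGCTA"))
print(gc_content("ATGCGCTA"))
```

The same counting approach works for protein sequences, although the G+C fraction is only meaningful for nucleotides.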
7
Software Tools for Sequence Analysis
WWW Resources
Database Retrieval
Sequence Retrieval System
Retrieves MUCH more than sequences
Core elements free to academic sites
Implemented in many places
It is possible to integrate analysis tools
Elements of SRS are incorporated into EMBOSS
8
Software Tools for Sequence Analysis
WWW Resources
Database Retrieval
Retrieves MUCH more than sequences
Access to NCBI databases only
Entrez client software available by anonymous ftp
Most general packages include tools to access
local sequence databases
EMBOSS programs can access sequences from remote
SRS servers
9
Readseq
  • Readseq is a classic sequence format conversion
    tool.
  • Developed by Don Gilbert in 1989.
  • Function: this program reads and writes nucleotide
    and protein sequences in many useful formats.
  • To run Readseq, use:
  • java -cp readseq.jar run options inputfiles
  • Supported formats include GCG, FASTA, GenBank,
    EMBL and MSF.

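A format converter first has to recognise what it is reading. The sketch below guesses a format from the first non-blank line, relying on three well-known conventions: FASTA headers start with `>`, GenBank entries start with `LOCUS`, and EMBL entries start with an `ID` line. It is an illustration only, not a description of how Readseq itself detects formats, and the function name `guess_format` is hypothetical.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl"
        return "unknown"  # first real line matched no known marker
    return "unknown"  # empty input


print(guess_format(">seq1 example\nACGTACGT\n"))
```

Formats such as GCG and MSF need more than the first line to identify, so they are deliberately left out of this sketch.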
11
Software Tools for Sequence Analysis
Specialised Packages
Sequencing Project Management
Free academic licence
The Phred - Phrap Package By Phil Green et al
Excellent base call confidence estimation (phred)
Excellent large scale contig assembler (phrap)
Available by anonymous ftp
Excellent GUI
Excellent contig editor
Excellent finishing tools
Simple confidence estimation; contig assembler
not good for big projects, BUT phred and phrap can
be accessed from the Staden GUI
13
Primer Design
  • Oligo 6
  • Primer Premier
  • Vector NTI Suite
  • DNASIS
  • Omiga
  • DNASTAR
  • Primer3

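One quantity every primer design program has to estimate is the melting temperature. A minimal sketch using the Wallace rule, Tm = 2(A+T) + 4(G+C), is shown below. This rule is a rough approximation intended for short oligonucleotides; the programs listed above generally rely on more elaborate models. The function names and the sample primer are assumptions made for illustration.

```python
def wallace_tm(primer):
    """Estimate melting temperature in degrees C with the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def gc_percent(primer):
    """Return the G+C percentage of a primer; 0.0 for an empty string."""
    p = primer.upper()
    if not p:
        return 0.0
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
print(wallace_tm(primer), gc_percent(primer))
```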
15
Software Tools for Sequence Analysis
Specialised Packages
DNA/RNA Folding
Free for academic use
Can be installed locally or run via a WWW page
Incorporated into the GCG general package
Michael Zuker's Programs
Protein Structure Analysis
Nominal fee for academic use
LINUX, IRIX, Windows
WHAT IF by Gert Vriend
16
Software Tools for Sequence Analysis
Specialised Packages
Protein Structure Analysis for very rich people
IRIX, HP-UX, LINUX
IRIX, AIX, LINUX
Both systems are very impressive and very expensive
18
http://evolution.genetics.washington.edu/phylip/software.html
Here are some 195 of the phylogeny packages, and
18 free servers, that I know about. It is an
attempt to be completely comprehensive.
19
Software Tools for Sequence Analysis
Specialised Packages
Phylogeny
Available by anonymous ftp
Windows, Macintosh, UNIX
Incorporated into the EMBOSS general package
Commercial, but reasonable
UNIX, VMS, DOS and Windows
Incorporated into the GCG general package
21
Software Tools for Sequence Analysis
WWW Resources
Database Similarity Searching
Very popular, very widely available
Not sensitive, but extremely fast
Popular, widely available
Not sensitive much slower than blast
Can be installed locally or run via a WWW page
Available by anonymous ftp (blast, fasta)
BOTH blast and fasta:
DNA/Protein query v DNA/Protein database
Incorporated into the GCG general package
22
Clustal
  • Clustal is a general-purpose multiple sequence
    alignment program for nucleotide sequences or
    proteins.
  • Function: it produces biologically meaningful
    multiple sequence alignments of divergent
    sequences. It calculates the best match for the
    selected sequences, and lines them up so that the
    identities, similarities and differences can be
    seen.

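Once sequences are lined up, the identities can be marked column by column, much as Clustal output places a `*` under fully conserved columns. The sketch below takes an alignment that has already been computed (it does not perform the alignment itself) and builds such a marker line. The function name `conservation_line` and the toy alignment are hypothetical.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns where every sequence has the same residue."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        # A column counts as conserved only if it is gap-free and uniform.
        same = first != "-" and all(c == first for c in column)
        marks.append("*" if same else " ")
    return "".join(marks)


alignment = ["ACG-T", "ACGAT", "ACCAT"]  # toy pre-aligned sequences
for row in alignment:
    print(row)
print(conservation_line(alignment))
```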
23
Clustal
  • Download address:
  • ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/clustalx1.8.msw.zip
  • ClustalW: command-line version (DOS, Unix/Linux)
  • ClustalX: graphical version (Windows and other platforms)
  • ClustalV: older predecessor of ClustalW

25
HMMER
  • HMMER is a collection of programs that create a
    hidden Markov model (HMM) of a sequence family
    which can be utilized as a query against a
    sequence database to identify (and/or align)
    additional homologs of the sequence family.
  • HMMER was developed by Sean Eddy at Washington
    University.

26
HMMER
  • Download address:
  • ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/2.2g/hmmer-2.2g.bin.dos-cygwin.zip
  • Different builds for Linux, Solaris, Mac, IRIX and DOS

27
HMMER
28
Application of HMM
  • Pfam: Protein families database of alignments and
    HMMs
  • Pfam is a collection of protein families and
    domains. Pfam contains multiple protein
    alignments and profile-HMMs of these families.
    Pfam is a semi-automatic protein family database,
    which aims to be comprehensive as well as
    accurate.
  • http://www.sanger.ac.uk/Software/Pfam/

29
Application of HMM: Pfam
30
Application of HMM
  • TMHMM: Prediction of transmembrane helices in
    proteins
  • http://www.cbs.dtu.dk/services/TMHMM/

31
The MEME/MAST
  • Motif Discovery and Search tools
  • MEME: Discover motifs (highly conserved regions)
    in groups of related DNA or protein sequences.
  • MAST: Search sequence databases using motifs.
  • http://meme.sdsc.edu/meme/website/intro.html

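Searching sequences with a motif can be illustrated with a position-specific scoring matrix built from known motif instances. The sketch below is a simplified stand-in and not the algorithm used by MEME or MAST: it assumes the instances are already aligned and of equal length, uses a uniform background, and adds a pseudocount of 1. The function names and the sample motif are hypothetical.

```python
import math


def build_pssm(instances, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length motif instances."""
    width = len(instances[0])
    background = 1.0 / len(alphabet)  # uniform background assumption
    pssm = []
    for i in range(width):
        column = [s[i] for s in instances]
        total = len(column) + pseudocount * len(alphabet)
        scores = {}
        for a in alphabet:
            freq = (column.count(a) + pseudocount) / total
            scores[a] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, seq):
    """Return (start, score) of the best window, or None if seq is shorter than the motif.

    Assumes seq contains only symbols from the matrix alphabet.
    """
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pssm[i][c] for i, c in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


motif_instances = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # toy examples
pssm = build_pssm(motif_instances)
print(best_hit(pssm, "GGCGTATAATGCC"))
```

Real motif search tools also estimate the statistical significance of each hit, which this sketch omits.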
32
Software Tools for Sequence Analysis
General Packages
Commercial
UNIX only
WWW and X GUIs
Comprehensive
Widely available
Open source
UNIX only
Several GUIs (java, WWW, X)
Comprehensive
Similar structure to the GCG package
Open source
Windows, MacOS X, UNIX
Excellent GUI including interactive graphical
output
Not comprehensive but allows access to EMBOSS
33
Genetics Computer Group
Molecular biologists worldwide use the GCG
Wisconsin Package as their software of choice
for comprehensive sequence analysis. The
Wisconsin Package meets research
34
Founded in 1982 as a service of the Department of
Genetics at the University of Wisconsin, GCG
became a private company in 1990 and was acquired
by Oxford Molecular Group in 1997. The company
was one of the pioneers of bioinformatics and its
Wisconsin Package sequence analysis tools are
widely used and well regarded throughout the
pharmaceutical and biotechnology industries and
in academia.
35
EMBOSS
  • EMBOSS (European Molecular Biology Open Software
    Suite) is an open source package of sequence
    analysis tools. This software covers a wide range
    of functionality and can handle data in a variety
    of formats.
  • Download address:
  • ftp://ftp.uk.embnet.org/pub/EMBOSS/EMBOSS-2.8.0.tar.gz
  • Only a Linux/UNIX version is available; there is no
    version for Windows/DOS.

36
  • 150 programs in total
  • Sequence alignment.
  • Rapid database searching with sequence patterns.
  • Protein motif identification, including domain
    analysis.
  • Nucleotide sequence pattern analysis, for example
    to identify CpG islands or repeats.
  • Codon usage analysis for small genomes.
  • Rapid identification of sequence patterns in
    large scale sequence sets.
  • Presentation tools for publication.
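
The CpG island example mentioned above usually rests on an observed/expected CpG ratio: the number of CG dinucleotides multiplied by the sequence length, divided by the product of the C and G counts. The sketch below computes that ratio for a single sequence. It illustrates the statistic only and is not the EMBOSS implementation; the function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq):
    """Return the observed/expected CpG ratio, or 0.0 when it is undefined."""
    s = seq.upper()
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cpg * len(s) / (c * g)


print(cpg_obs_exp("CGCGCGAT"))
```

In practice the ratio is evaluated in a sliding window along a longer sequence, together with a G+C content threshold.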