Title: Genome
1 Genome and Protein Sequence Analysis Programs: application in establishing Epidemiology and Variability
Rajesh Kumar, Ph.D. 1st yr, Dairy Microbiology Division, N.D.R.I.
2 Introduction
- Bio-informatics/Computational Biology-
- Proteomics- large-scale study of proteins.
- Genomics- study of an organism's genome and the use of its genes.
- Comparative Genomics- comparison of genomes.
- Structural Genomics- determination of the three-dimensional structure of all proteins of a given organism.
3- Major research efforts of bio-informatics-
- sequence analysis / alignment.
- gene finding.
- genome assembly.
- protein structure alignment.
- protein structure prediction.
- prediction of gene expression and protein-protein interactions.
- modeling of evolution.
4 Sequence Analysis
Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and the proteins they code for.
DNA sequences → decoded → stored in electronic databases → analysis → phylogenetic tree, comparative genomics
5 Shotgun Sequencing
Used in genetics for sequencing long DNA strands.
DNA → small segments → sequenced → assembled by computer programs
Sequence Alignment- arrangement of two or more sequences so that their similarity is highlighted.
tcctctgcctctgccatcat---caaccccaaagt
tcctgtgcatctgcaatcatgggcaaccccaaagt
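The aligned pair above can be summarized column by column. The following is a minimal sketch, not taken from any of the programs discussed here; the function name `alignment_stats` and the choice to count every gap column against identity are assumptions made for illustration.

```python
# Minimal sketch: count matches, mismatches and gap columns in a
# pairwise alignment whose two rows are already aligned.
# alignment_stats is a hypothetical helper, not part of any package.

def alignment_stats(row_a, row_b, gap="-"):
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            gaps += 1          # column with a gap in either row
        elif x == y:
            matches += 1       # identical residues
        else:
            mismatches += 1    # substitution
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        # identity over all alignment columns (one of several conventions)
        "identity": matches / len(row_a),
    }

# The two rows shown on the slide.
ROW_A = "tcctctgcctctgccatcat---caaccccaaagt"
ROW_B = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
stats = alignment_stats(ROW_A, ROW_B)
print(stats)
```

Other definitions of percent identity exclude gap columns from the denominator, so the reported value depends on the convention chosen.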
6 Structural Alignment
More reliable than sequence alignment over long evolutionary distances. Useful in identifying structurally conserved regions.
Multiple Alignment
Extension of pairwise alignment to incorporate more than two sequences into an alignment. Helps in the identification of common regions between the sequences.
Programs- Clustal, which is used in cladistics to build phylogenetic trees.
7 Framesearch
An extension of Smith-Waterman for pairwise alignment between a protein sequence and a nucleotide sequence. It dynamically considers every possible single-nucleotide insertion or deletion to generate the translation that best matches the protein sequence.
Software- Ssearch
Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
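For orientation, the Smith-Waterman local alignment idea can be sketched in a few lines. This is a teaching sketch only, not the Ssearch or Framesearch implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values 3, -3 and -2 are illustrative), whereas real tools use substitution matrices and affine gap costs.

```python
# Minimal Smith-Waterman sketch with a linear gap penalty.
# Scoring values are illustrative assumptions, not Ssearch defaults.

def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + score:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")   # gap in b
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])   # gap in a
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

score, aligned_a, aligned_b = smith_waterman("TGTTACGG", "GGTTGACTA")
print(score)
print(aligned_a)
print(aligned_b)
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths, which is why heuristic tools are preferred for database-scale searches.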
8- BLAST
- An algorithm for comparing biological sequences.
- One of the most widely used tools for searching protein and DNA databases for sequence similarities.
- It helps answer questions such as-
- Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?
- Where does the DNA that I've just sequenced come from?
- What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?
9- To run, BLAST requires two inputs-
- a query sequence (also called the target sequence)
- a sequence database.
- BLAST searches the database for high-scoring sequence alignments with the query.
- Three stages of BLAST-
- 1st stage- BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. Each such match is a seed.
- 2nd stage- BLAST tries to extend the match in both directions, starting at the seed.
- If a high-scoring ungapped alignment is found, the database sequence is passed on to the 3rd stage.
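The first two stages can be illustrated with a small seed-and-extend sketch. It is not the BLAST source code: the function names, the +1/-1 scoring, and the simple drop-off rule (`x_drop`) used to stop extension are all assumptions chosen to keep the example short.

```python
# Seed-and-extend sketch of BLAST stages 1 and 2 (illustrative only).

def find_seeds(query, subject, w):
    """Stage 1: exact word matches of length w, as (query_pos, subject_pos)."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]

def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Stage 2: extend a seed in both directions without gaps.
    Returns (score, query_start, subject_start, length)."""
    best = score = w * match
    best_right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > x_drop:      # score fell too far below the best
            break
    score = best                         # continue leftwards from the best right end
    best_left = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
    return best, i - best_left, j - best_left, w + best_left + best_right

QUERY, SUBJECT, W = "AACACGTGA", "TTCACGTGG", 4
seeds = find_seeds(QUERY, SUBJECT, W)
hsps = {extend_ungapped(QUERY, SUBJECT, i, j, W) for i, j in seeds}
print(seeds)
print(hsps)
```

In this toy case several overlapping seeds extend to the same ungapped segment, which is why the results are collected in a set.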
10- In the 3rd stage, BLAST performs a gapped alignment between the query sequence and the database sequence.
- An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).
- FASTA-
- Slower but more sensitive than BLAST.
- DNA and protein sequence alignment software package.
- The original FASTP program was designed for protein sequence similarity searching.
- FASTA added a more sophisticated shuffling program for evaluating statistical significance.
11- Programs in this package-
- FASTA is pronounced "FAST-Aye" and stands for "FAST-All".
- "FAST-P" (protein) alignment.
- "FAST-N" (nucleotide) alignment.
- The current FASTA package contains programs for-
- protein:protein
- DNA:DNA
- protein:translated DNA
- ordered or unordered peptide searches.
- Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data.
12 Clustal
Clustal is a widely used multiple alignment computer program.
i) ClustalW
ii) ClustalX
Sequence Analysis Programmes- EMBOSS
The European Molecular Biology Open Software Suite (EMBOSS) is a program suite for nucleic acid and protein sequence analysis. EMBOSS programs manipulate, analyze, and display nucleic acid and protein sequences. It is similar in functionality to the commercial GCG Wisconsin software.
13 PhyloGibbs
Designed to identify where regulatory molecules bind to DNA. PhyloGibbs compares DNA from multiple species in order to identify areas in which the genetic code is statistically similar, and singles out the segments that are most likely to be of interest to scientists.
AutoEditor
A tool for the automated correction of sequencing and basecaller errors using sequence alignment and chromatogram data. On average AutoEditor corrects 80% of erroneous base calls. It also greatly improves the ability to discover SNPs between closely related strains and isolates of the same species.
14- MUMmer
- System for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides.
- MUMmer 3.0
- Open source.
- Improved efficiency.
- Ability to find non-unique, repetitive matches as well as unique matches.
- New graphical output modules.
- Applications-
- MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.
15- MUMmer 2.1 was used to align all human chromosomes to one another and to detect numerous large-scale.
- PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.
- Current uses of MUMmer 3.0-
- 1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.
- 2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.
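The matches MUMmer is named after, maximal unique matches (MUMs), can be illustrated on toy sequences. The sketch below is not MUMmer's algorithm: instead of a suffix tree it uses a naively sorted suffix array, which is only practical for short strings, and it assumes the characters "#" and "$" do not occur in the input. The function name is hypothetical.

```python
# Toy MUM finder using a naive suffix array in place of MUMmer's suffix tree.

def maximal_unique_matches(a, b, min_len=3):
    """Return (pos_a, pos_b, length) for substrings that occur exactly
    once in a, exactly once in b, and cannot be extended either way."""
    text = a + "#" + b + "$"            # assumes '#' and '$' are absent from a and b
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])   # naive suffix array

    def lcp(i, j):
        k = 0
        while i + k < n and j + k < n and text[i + k] == text[j + k]:
            k += 1
        return k

    # lcps[r]: longest common prefix of the suffixes ranked r and r + 1
    lcps = [lcp(sa[r], sa[r + 1]) for r in range(n - 1)]
    mums = []
    for r, length in enumerate(lcps):
        if length < min_len:
            continue
        left = lcps[r - 1] if r > 0 else 0
        right = lcps[r + 1] if r + 1 < len(lcps) else 0
        if left >= length or right >= length:
            continue                    # the substring occurs more than twice
        i, j = sorted((sa[r], sa[r + 1]))
        if not (i < len(a) < j):
            continue                    # both occurrences lie in the same sequence
        pos_a, pos_b = i, j - len(a) - 1
        if pos_a > 0 and pos_b > 0 and a[pos_a - 1] == b[pos_b - 1]:
            continue                    # extendable to the left, so not maximal
        mums.append((pos_a, pos_b, length))
    return sorted(mums)

mums = maximal_unique_matches("GATTACA", "CATTAGA", min_len=3)
print(mums)
```

A suffix tree gives the same information in time roughly linear in the genome length, which is what makes whole-genome comparison feasible.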
16-
- E. coli K12 vs. E. coli O157:H7
- S. cerevisiae vs. S. pombe
- A. fumigatus vs. A. nidulans
- P. falciparum vs. P. yoelii
- PSORT WWW Server
- PSORT is a computer program for the prediction of protein localization sites in cells.
- WoLF PSORT
- WoLF PSORT Prediction
- PSORT II (Recommended for animal/yeast sequences)
- PSORT II Users' Manual
- PSORT II Prediction
- PSORT (Old version for bacterial/plant sequences)
- PSORT-B (Recommended for Gram-negative bacteria)
- PSORT-B Prediction
- PSORT-B, a program applicable to the sequences of Gram-negative bacteria.
17 PSORT Prediction
- Source of input sequence- Gram-positive bacterium, Gram-negative bacterium, yeast, animal, or plant.
- Sequence ID (default is MYSEQ).
- Enter your amino acid sequence below (by copy and paste). Characters other than the 20 standard codes will be removed.
- To submit the query, press the Submit button.
18- PHIRE
- This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences.
- It discovers and extracts blocks displaying sequence similarity, without any prior experimental or predictive knowledge.
- MB Advanced DNA Analysis
- MB is a relatively small and easy-to-use program.
- Main features of MB are-
- restriction analysis
- amino acid analysis
- multiple sequence alignment tool
- dot plot
- calculation of molecular weights and chemical properties of proteins
- prediction of 3D structures for small amino acid sequences.
19 UniPro DPview
A tool for finding and analyzing matches between genomes.
SEQtools
Program package for routine handling and analysis of DNA and protein sequences. The package includes general facilities for sequence and contig editing, restriction enzyme mapping, translation, and repeat identification.
DNA Club
DNA analysis software. Features- remove vector sequence, find ORF, sequence editing, translate to protein sequence, protein sequence editing, RE map, RE map with translation, PCR primer selection, primer or probe evaluation.
20- ZCURVE
- A new, highly accurate system for recognizing protein-coding genes in bacterial and archaeal genomes, based on the Z curve theory of DNA sequence.
- DNA for Windows
- A compact, easy-to-use DNA analysis program, ideal for small-scale sequencing projects.
- Webcutter
- A free on-line tool to help restriction map nucleotide sequences.
- Features-
- a simple, customizable interface
- worldwide platform-independent accessibility via the web
- seamless interfaces to NCBI's GenBank DNA sequence database and to a restriction enzyme database.
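Restriction mapping, as offered by several of the tools above, comes down to locating enzyme recognition sites in a sequence. The sketch below is an independent illustration, not Webcutter's code: the three-enzyme table and the function name are assumptions, and it reports 0-based start positions of recognition sites rather than cut positions.

```python
# Minimal restriction-site finder (illustrative; not Webcutter's method).

# Small illustrative table of well-known recognition sequences.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

def restriction_map(seq, sites=SITES):
    """Return {enzyme: [0-based start positions of its recognition site]}."""
    seq = seq.upper()
    result = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)   # also finds overlapping sites
        result[name] = positions
    return result

site_map = restriction_map("ttGAATTCaaGGATCCccGAATTC")
print(site_map)
```

The three sites in this table are palindromic, so scanning one strand is enough; non-palindromic sites would also require a search on the reverse complement.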
21 Multilocus sequence typing (MLST)
Compares sequence variation in numerous housekeeping gene targets. Developed for Neisseria gonorrhoeae, Streptococcus pneumoniae, and Staphylococcus aureus. Based on the classic multilocus enzyme electrophoresis (MLEE) method used to study the genetic variability of a species.
Drawbacks- labor-intensive, time-consuming, and costly.
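The bookkeeping behind MLST can be sketched as follows: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers (the allelic profile) receives a sequence type. This is a simplified illustration; the locus names, isolate names, and sequences are made up, and real schemes assign numbers from a curated central database rather than in order of appearance.

```python
# Toy MLST bookkeeping: allele numbers per locus, then sequence types.
# All names and sequences below are hypothetical.

def allelic_profiles(isolates):
    """isolates: {isolate: {locus: sequence}} -> {isolate: tuple of allele numbers}."""
    loci = sorted({locus for seqs in isolates.values() for locus in seqs})
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            table = allele_ids[locus]
            # a new sequence gets the next unused allele number for this locus
            profile.append(table.setdefault(seqs[locus], len(table) + 1))
        profiles[name] = tuple(profile)
    return profiles

def sequence_types(profiles):
    """Give each distinct allelic profile a sequence type number."""
    st_ids = {}
    return {name: st_ids.setdefault(p, len(st_ids) + 1)
            for name, p in profiles.items()}

isolates = {
    "iso1": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso2": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso3": {"locusA": "ACGA", "locusB": "GGCC"},   # one base differs at locusA
}
profiles = allelic_profiles(isolates)
types = sequence_types(profiles)
print(profiles)
print(types)
```

Isolates that share a sequence type are indistinguishable at the loci examined, which is the basis for grouping them in an epidemiological comparison.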
22 Single-locus sequence typing (SLST)
Compares sequence variation of a single target. Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria. Using a single target depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution. Loci with short sequence repeat (SSR) regions may have suitable variability for discriminating outbreaks.
23 Two S. aureus genes conserved within the species, protein A (spa) and coagulase (coa), have variable SSR regions constructed from closely related 24- and 81-bp tandem repeat units, respectively. The genetic alterations in SSR regions include both point mutations and intragenic recombination that arise by slipped-strand mispairing during chromosomal replication and that result in a high degree of polymorphism.
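One way to compare SSR regions between isolates is to cut the region into its repeat units and give each distinct unit a code, so that point mutations and gains or losses of repeats both show up in the resulting profile. The sketch below is a hypothetical illustration with synthetic 24-bp units, not real spa repeat sequences, and it assumes the region has already been trimmed to a whole number of units.

```python
# Toy SSR repeat profiling with synthetic 24-bp units (not real spa repeats).

def repeat_profile(region, unit_len, codes):
    """Split region into consecutive units and return their codes.
    codes is a shared {unit_sequence: number} table, extended in place."""
    if len(region) % unit_len != 0:
        raise ValueError("region length must be a multiple of unit_len")
    units = [region[k:k + unit_len] for k in range(0, len(region), unit_len)]
    return [codes.setdefault(u, len(codes) + 1) for u in units]

UNIT_A = "ACGT" * 6              # synthetic 24-bp repeat unit
UNIT_B = "ACGT" * 5 + "ACGA"     # same unit carrying one point mutation

codes = {}
profile_1 = repeat_profile(UNIT_A + UNIT_A + UNIT_B, 24, codes)
profile_2 = repeat_profile(UNIT_A + UNIT_B, 24, codes)   # one repeat unit lost
print(profile_1)
print(profile_2)
```

Sharing the `codes` table across isolates keeps the unit numbers comparable between profiles.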