Genome - PowerPoint PPT Presentation

About This Presentation
Title:

Genome

Description:

Genome & Protein Sequence Analysis Programs: application in establishing Epidemiology and Variability. RAJESH KUMAR, Ph.D 1st yr, Dairy Microbiology Division


Transcript and Presenter's Notes



1
Genome & Protein Sequence Analysis Programs:
application in establishing Epidemiology and
Variability

RAJESH KUMAR
Ph.D 1st yr
Dairy Microbiology Division
N.D.R.I
2
Introduction
  • Bio-informatics/Computational Biology-
  • Proteomics- large-scale study of proteins.
  • Genomics- study of an organism's genome and the
    use of its genes.
  • Comparative Genomics- comparison of genomes.
  • Structural Genomics- determination of the
    three-dimensional structure of all proteins of a
    given organism.

3
  • Major research efforts of Bio-informatics-
  • Sequence analysis / alignment.
  • Gene finding.
  • Genome assembly.
  • Protein structure alignment.
  • Protein structure prediction.
  • Prediction of gene expression and protein-protein
    interactions.
  • Modeling of evolution.

4
Sequence Analysis
Encompasses the use of various bioinformatic
methods to determine the biological function and
structure of genes and proteins.
DNA sequences → Decoded → Stored in electronic
databases → Analysis → Phylogenetic Tree,
Comparative Genomics
5
Shotgun Sequencing
Used in genetics for sequencing long DNA strands.
DNA → small segments → sequenced → computer
programs (which assemble the overlapping reads)
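The last step of the shotgun workflow can be illustrated with a minimal sketch, assuming error-free reads and a simple greedy strategy (repeatedly merge the two reads with the longest suffix/prefix overlap). The function names, the `min_len` parameter, and the toy reads are illustrative assumptions, not part of any real assembler, which must also handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads by largest overlap until no overlap remains."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        # Find the ordered pair (i, j) with the largest suffix/prefix overlap.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge; return the remaining contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy reads (hypothetical) that tile a 13-base sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATG"])
print(contigs)
```

Greedy merging is easy to follow but can misassemble repetitive sequence, which is one reason real assemblers use more elaborate graph-based methods.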
Sequence Alignment- arrangement of two or more
sequences highlighting their similarity.
tcctctgcctctgccatcat---caaccccaaagt
tcctgtgcatctgcaatcatgggcaaccccaaagt
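To make "highlighting their similarity" concrete, the sketch below walks the two rows of the example alignment column by column and counts matches, mismatches, and gap columns. The function name `alignment_stats` is an illustrative assumption; the two rows are the slide's sequences with the line breaks removed.

```python
def alignment_stats(row_a, row_b):
    """Count matching, mismatching, and gapped columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1          # one sequence has no residue in this column
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


row_a = "tcctctgcctctgccatcat---caaccccaaagt"
row_b = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
matches, mismatches, gaps = alignment_stats(row_a, row_b)
print(matches, mismatches, gaps)  # 29 3 3
print("identity over aligned columns: %.1f%%" % (100.0 * matches / len(row_a)))
```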
6
Structural Alignment
More reliable over long evolutionary distances.
Useful in identifying structurally-conserved
regions.
Multiple Alignment
Extension of pairwise alignment to incorporate
more than two sequences into an alignment. Helps
in the identification of common regions between
the sequences.
Programs- Clustal is used in cladistics to build
phylogenetic trees.
7
Framesearch
An extension of Smith-Waterman for pairwise
alignment between a protein sequence and a
nucleotide sequence. It dynamically considers
every possible single-nucleotide insertion or
deletion to generate the translation that best
matches the protein sequence.
Software- Ssearch
Smith-Waterman remains the gold standard for
protein-protein or nucleotide-nucleotide pairwise
alignment.
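For readers unfamiliar with Smith-Waterman, here is a minimal sketch of the local alignment recurrence with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions for illustration; production tools such as Ssearch use substitution matrices and affine gap penalties, and Framesearch's frameshift handling is not shown here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The floor at zero in the recurrence is what makes the alignment local: a poorly matching prefix is simply dropped rather than penalised.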
8
  • BLAST
  • An algorithm for comparing biological sequences.
  • A widely used tool for searching protein and DNA
    databases for sequence similarities.
  • It answers questions such as-
  • Which bacterial species have a protein that is
    related in lineage to a certain protein whose
    amino-acid sequence I know?
  • Where does the DNA that I've just sequenced come
    from?
  • What other genes encode proteins that exhibit
    structures or motifs such as the one I've just
    determined?

9
  • To run, BLAST requires two inputs-
  • a query sequence
  • a target sequence or a sequence database.
  • It searches for high-scoring sequence alignments
    between them.
  • Three stages of BLAST-
  • 1st stage: BLAST searches for exact matches of
    a small fixed length W between the query and
    sequences in the database.
  • 2nd stage: BLAST tries to extend the match in
    both directions, starting at the seed.
  • If a high-scoring ungapped alignment is found,
    the database sequence is passed on to the 3rd
    stage.
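The first two stages listed above can be sketched as follows. This is a simplified illustration, not BLAST's actual implementation: it uses exact word matches of length `w` on nucleotide strings, a +1/-1 scoring scheme, and a simple drop-off rule (`x_drop`) to stop the ungapped extension. All function names and parameter values are assumptions chosen for the example.

```python
def find_seeds(query, subject, w):
    """Stage 1: positions (i, j) where query[i:i+w] == subject[j:j+w]."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Stage 2: extend a seed in both directions without gaps.

    Returns (score, query_start, query_end, subject_start), end exclusive.
    Extension in a direction stops once the running score falls more than
    `x_drop` below the best score seen so far.
    """
    score = best = w * match
    best_right, k = 0, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > x_drop:
            break
    score = best  # continue leftwards from the best right-hand extension
    best_left, k = 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
    return best, i - best_left, i + w + best_right, j - best_left


query, subject = "ACGTACGT", "TTACGTACGTTT"
for i, j in find_seeds(query, subject, w=4):
    print((i, j), extend_ungapped(query, subject, i, j, w=4))
```

Only seeds whose extension reaches a high enough score would then be handed to the slower gapped alignment stage.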

10
  • 3rd stage: BLAST performs a gapped alignment
    between the query sequence and the database
    sequence.
  • An alternative to BLAST is BLAT (BLAST-Like
    Alignment Tool).
  • FASTA-
  • Slower but more sensitive than BLAST.
  • DNA and protein sequence alignment software
    package.
  • The original FASTP program was designed for
    protein sequence similarity searching.
  • FASTA provided a more sophisticated shuffling
    program for evaluating statistical significance.

11
  • Programs in this package-
  • FASTA is pronounced "FAST-Aye" and stands for
    "FAST-All".
  • "FAST-P" (protein) alignment.
  • "FAST-N" (nucleotide) alignment.
  • The current FASTA package contains programs for-
  • protein:protein
  • DNA:DNA
  • protein:translated DNA
  • ordered or unordered peptide searches.
  • Recent versions of the FASTA package include
    special translated search algorithms that
    correctly handle frameshift errors when comparing
    nucleotide to protein sequence data.

12
Clustal
Clustal is a widely used multiple alignment
computer program.
i) ClustalW
ii) ClustalX
Sequence Analysis Programmes- EMBOSS
European Molecular Biology Open Software Suite
(EMBOSS) is a program suite for nucleic acid and
protein sequence analysis. EMBOSS programs
manipulate, analyze, and display nucleic acid and
protein sequences. Similar in functionality to
the commercial GCG Wisconsin Software.
13
PhyloGibbs
Designed to identify where regulatory molecules
bind to DNA. PhyloGibbs compares DNA from
multiple species in order to identify areas in
which the genetic code is statistically similar
and to filter out the segments that are most
likely to be of interest to scientists.
AutoEditor
Automated correction of sequencing and basecaller
errors: a tool for correcting sequencing and
basecaller errors using sequence alignment and
chromatogram data. On average AutoEditor corrects
80% of erroneous base calls. It also greatly
improves the ability to discover SNPs between
closely related strains and isolates of the same
species.
14
  • MUMmer
  • System for aligning whole genome sequences. Using
    an efficient data structure called a suffix tree,
    the system is able to rapidly align sequences
    containing millions of nucleotides.
  • MUMmer 3.0
  • Open source.
  • Improved efficiency.
  • Ability to find non-unique, repetitive matches as
    well as unique matches.
  • New graphical output modules.
  • Applications-
  • MUMmer 1.0 was used to detect numerous
    large-scale inversions in bacterial genomes.

15
  • MUMmer 2.1 was used to align all human
    chromosomes to one another and to detect numerous
    large-scale.
  • PROmer was used to compare the human and mouse
    malaria parasites P. falciparum and P. yoelii.
  • Current use of MUMmer 3.0-
  • Identifying SNPs and other mutations in a large
    collection of Bacillus anthracis strains.
  • Comparing different assemblies of the same
    genome at different stages of sequencing and
    finishing.
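MUMmer takes its name from maximal unique matches (MUMs): exact matches that occur once in each sequence and cannot be extended in either direction. The sketch below finds them by brute force on short strings purely to illustrate the definition. It is an assumption-laden toy, not MUMmer's method: MUMmer uses a suffix tree to do this efficiently on whole genomes, whereas this version is quadratic and only suitable for tiny inputs. The function names and `min_len` are illustrative.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=4):
    """Return (pos_in_a, pos_in_b, length) for every MUM of length >= min_len."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip matches that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            n = 0
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if n < min_len:
                continue
            word = a[i:i + n]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, word) == 1 and count_occurrences(b, word) == 1:
                mums.append((i, j, n))
    return mums


print(maximal_unique_matches("GATTACA", "CCGATTACAGG"))  # [(0, 2, 7)]
```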

16
  • E. coli K12 vs. E. coli O157:H7
  • S. cerevisiae vs. S. pombe
  • A. fumigatus vs. A. nidulans
  • P. falciparum vs. P. yoelii
  • PSORT WWW Server
  • PSORT is a computer program for the prediction of
    protein localization sites in cells.
  • WoLF PSORT
  • WoLF PSORT Prediction
  • PSORT II (Recommended for animal/yeast sequences)
  • PSORT II Users' Manual
  • PSORT II Prediction
  • PSORT (Old version for bacterial/plant
    sequences)
  • PSORT-B (Recommended for Gram-negative bacteria)
  • PSORT-B Prediction
  • PSORT-B, a program applicable to the sequences of
    Gram-negative bacteria.

17
PSORT Prediction
Source of input sequence: Gram-positive bacterium,
Gram-negative bacterium, yeast, animal, plant
Sequence ID (default is MYSEQ)
Enter your amino acid sequence below (by copy and
paste). Characters other than the standard 20
codes will be removed.
To submit the query, press the Submit button.
18
  • PHIRE
  • This Visual Basic program performs an algorithmic
    string-based search on bacteriophage genome
    sequences.
  • Discovering and extracting blocks displaying
    sequence similarity, without any prior
    experimental or predictive knowledge.
  • MB Advanced DNA Analysis
  • MB is a relatively small and easy-to-use program.
  • Main features of MB are-
  • restriction analysis
  • amino acid analysis
  • multiple sequence alignment tool
  • dot plot
  • calculation of molecular weights and chemical
    properties of proteins
  • prediction of 3D structures for short amino acid
    sequences.

19
UniPro DPview
A tool for finding and analyzing matches between
genomes.
SEQtools
Program package for routine handling and analysis
of DNA and protein sequences. The package
includes general facilities for sequence and
contig editing, restriction enzyme mapping,
translation, and repeat identification.
DNA Club
DNA analysis software. Features- remove vector
sequence, find ORF, sequence editing, translate
to protein sequence, protein sequence editing,
RE map, RE map with translation, PCR primer
selection, primer or probe evaluation.
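As an illustration of the "find ORF" feature mentioned for DNA Club, the sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is not DNA Club's implementation. The function name, the `min_codons` threshold, and the toy sequence are assumptions, and the reverse strand is ignored for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon, and
    `min_codons` counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None                     # close it at the stop codon
    return orfs


seq = "CCATGAAATTTTAGGG"  # hypothetical toy sequence
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```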
20
  • ZCURVE
  • New highly accurate system for recognizing
    protein coding genes in bacterial and archaeal
    genomes based on the Z curve theory of DNA
    sequence.
  • DNA for Windows
  • is a compact, easy to use DNA analysis program,
    ideal for small-scale sequencing projects.
  • Webcutter
  • is a free on-line tool to help restriction map
    nucleotide sequences.
  • Features-
  • a simple, customizable interface
  • worldwide platform-independent accessibility via
    the web
  • seamless interfaces to NCBI's GenBank DNA
    sequence database and a restriction enzyme
    database.
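Restriction mapping of the kind Webcutter offers amounts to locating enzyme recognition sequences in a nucleotide string. The sketch below is a minimal illustration with three well-known enzymes, not Webcutter's code or interface. The function name and the toy sequence are assumptions, and it reports the 1-based start of each recognition site rather than the exact cut position.

```python
import re

# Recognition sequences of three common enzymes. All three are
# palindromic, so scanning one strand is enough.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def restriction_map(dna, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    dna = dna.upper()
    sites = {}
    for name, site in enzymes.items():
        # A lookahead lets overlapping occurrences be reported too.
        pattern = "(?=%s)" % re.escape(site)
        sites[name] = [m.start() + 1 for m in re.finditer(pattern, dna)]
    return sites


print(restriction_map("TTGAATTCAAGGATCCTT"))
```

A fuller tool would also handle degenerate recognition sequences (for example sites containing N or R) and report fragment sizes.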

21
Multilocus sequence typing (MLST)
Compares sequence variation in numerous
housekeeping gene targets. Developed for
Neisseria gonorrhoeae, Streptococcus pneumoniae,
and S. aureus. Based on the classic multilocus
enzyme electrophoresis (MLEE) method used to
study the genetic variability of a species.
Drawbacks- labor-intensive, time-consuming, and
costly.
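The idea of comparing variation across several housekeeping loci can be sketched as building an allelic profile per isolate: each distinct sequence at a locus gets an allele number, and isolates with identical profiles are treated as the same type. The locus names, isolate names, and sequences below are hypothetical toy data, and the allele numbers are assigned locally in order of first appearance, whereas real MLST schemes use curated allele databases.

```python
def allelic_profiles(isolates):
    """Map {isolate: {locus: sequence}} to (loci, {isolate: allele tuple})."""
    loci = sorted({locus for seqs in isolates.values() for locus in seqs})
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            # A new sequence at this locus gets the next allele number.
            profile.append(known.setdefault(seqs[locus], len(known) + 1))
        profiles[name] = tuple(profile)
    return loci, profiles


# Hypothetical isolates typed at two made-up loci.
isolates = {
    "iso1": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso2": {"locusA": "ACGT", "locusB": "GGCA"},
    "iso3": {"locusA": "ACGT", "locusB": "GGCC"},
}
loci, profiles = allelic_profiles(isolates)
print(loci)
print(profiles)
```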
22
Single-locus sequence typing (SLST)
Compares sequence variation of a single target.
Provides an inexpensive, rapid, objective, and
portable genotyping method to subspeciate
bacteria. Using a single target depends on
finding a region for sequencing that is
sufficiently polymorphic to provide useful strain
resolution. Loci with short sequence repeat (SSR)
regions may have suitable variability for
discriminating outbreaks.
23
Two S. aureus genes conserved within the species,
protein A (spa) and coagulase (coa), have
variable SSR regions constructed from closely
related 24- and 81-bp tandem repeat units,
respectively. The genetic alterations in SSR
regions include both point mutations and
intragenic recombination that arise by
slipped-strand mispairing during chromosomal
replication and that result in a high degree of
polymorphism.
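One way to picture how an SSR region yields a strain-level signature is to split it into its tandem repeat units and give each distinct unit a code, so that the region becomes a short repeat profile. The sketch below assumes the region has already been trimmed to a whole number of units of known length (24 bp for spa, 81 bp for coa). The function name and the 3-bp toy region are illustrative assumptions, chosen short only to keep the example readable.

```python
def repeat_profile(region, unit_len):
    """Split an SSR region into units and code each distinct unit.

    Returns (profile, codes), where profile lists the code of each
    successive unit and codes maps unit sequence -> code number.
    """
    if len(region) % unit_len != 0:
        raise ValueError("region length must be a multiple of unit_len")
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    codes = {}
    # Codes are assigned in order of first appearance along the region.
    profile = [codes.setdefault(unit, len(codes) + 1) for unit in units]
    return profile, codes


# Toy region of four 3-bp units, one of which carries a point mutation.
profile, codes = repeat_profile("AAGAAGAAAAAG", 3)
print(profile)  # [1, 1, 2, 1]
print(codes)
```

Two isolates can then be compared by their profiles: a changed unit reflects a point mutation, while a gained or lost unit reflects a change in repeat number.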