Essential Bioinformatics and Biocomputing LSM2104: Section I Biological Databases and Bioinformatics

1
Essential Bioinformatics and Biocomputing (LSM2104 Section I)
Biological Databases and Bioinformatics Software
Prof. Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, NUS
January 2003
2
Lecture 5: Bioinformatics software

Outline
Types of bioinformatics software
Sequence, pattern and domain
Evolutionary analysis
Visualization
Modeling and prediction (sequence, structure and
function)
Data mining (bibliographic and text searches)
Examples

3
Types of Bioinformatics software

Analysis of biological data and systems, and characterization of molecules and sequences
Analysis and interpretation of experimental results
Simulation of laboratory experiments, which is important for tackling large-scale problems
Predictions that lead to the design of experiments
Bioinformatics software can be accessed via the WWW or through integrated software packages (such as EMBOSS, GCG, Staden, DNAstar, ...). It may be coupled with databases or may stand alone.

4
Bioinformatics software

Major sources
Software packages at the ExPASy Molecular Biology Server: http://www.expasy.org, http://au.expasy.org
Software at PBIL (Pôle Bio-Informatique Lyonnais): http://pbil.univ-lyon1.fr/
Toolbox at EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/Tools/index.html

5
Bioinformatics software

Major types of bioinformatics tools
Sequence analysis tools
Sequence comparison
Pattern and domain search
Evolutionary analysis
Prediction of sequence structure and function
Visualization of molecular structures
Structure modeling
Bibliographic and text searches
Specialized and other tools

6
Bioinformatics software

Sequence analysis tools
This kind of software focuses on extraction and
comparison of properties in DNA and protein
sequences
Sequence analysis provides for identification of
domains, structure, and function, and other
properties
The analysis of individual sequences helps with
sequence comparison
Textbook chapter 5, pages 81-93

7
Bioinformatics software

Sequence analysis tools
This kind of software focuses on extraction and
comparison of DNA and protein sequence
properties such as
composition of nucleotide or protein sequences
codon usage in DNA
translation and backtranslation
Textbook chapter 5, pages 81-93

8
Bioinformatics software

Composition of nucleotide or protein sequences
Composition (frequency of occurrence of a
nucleotide or of an amino acid) is the most basic
analysis. It can give us important functional and
structural clues.
For example, CG-rich regions called CpG islands
are often found in promoters. A short region just
before the splice site at the end of introns
often has high CT content.
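The composition analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any of the tools in this lecture: the function names and the example sequence are hypothetical, and the CpG observed/expected ratio is one commonly used indicator of CpG islands that the slide itself does not define.

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each symbol in a DNA or protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {symbol: n / len(seq) for symbol, n in counts.items()}


def gc_content(dna):
    """Return the fraction of G and C nucleotides in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("sequence must not be empty")
    return (dna.count("G") + dna.count("C")) / len(dna)


def cpg_observed_expected(dna):
    """Return (number of CG dinucleotides * length) / (number of C * number of G)."""
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c * g)


# Hypothetical sequence, used only to exercise the functions.
example = "ACGCGCGTATCGCGAT"
print(composition(example))
print(gc_content(example))
print(cpg_observed_expected(example))
```

A high GC fraction together with a high observed/expected CG ratio over a window is the kind of signal that suggests a CpG island; the window size and thresholds are left out here because the slide does not specify them.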

9
Bioinformatics software

Composition of protein and DNA sequences
Web
NPS@ (Network Protein Sequence @nalysis): http://npsa-pbil.ibcp.fr/ (Amino-acid composition)
AA Composition: http://molbiol.soton.ac.uk/compute/aacomp.html
JEMBOSS (in our own laboratory): http://srs1.bic.nus.edu.sg/jnlp/ (nucleic, composition, compseq)

10
Bioinformatics software
11
Bioinformatics software
12
Bioinformatics software

Codon usage in DNA
Web
Count-codon program in the Codon Usage Database: http://www.kazusa.or.jp/codon/countcodon.html (needs start and stop codons at the start and the end of the sequence)
Tool for Gene to Codon Usage Table: http://www.entelechon.com/eng/genetocut.html (does not require start and stop codons)
JEMBOSS (in the laboratory): http://srs1.bic.nus.edu.sg/jnlp/ (nucleic, codon usage, cusp)
A DNA coding region should contain only one stop codon, at its 3' end.
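Counting codons and checking the stop-codon rule can be sketched as follows. This is an illustrative sketch only, not the algorithm used by cusp or the web tools listed above; the function names are hypothetical, and the sequence is assumed to be read in frame from its first base using the standard stop codons TAA, TAG and TGA.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_usage(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def stop_codon_positions(cds):
    """Return the 0-based codon indices of all in-frame stop codons."""
    cds = cds.upper()
    return [
        i // 3
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] in STOP_CODONS
    ]


# Hypothetical coding sequence: ATG GCC GCC TAA
cds = "ATGGCCGCCTAA"
print(codon_usage(cds))
print(stop_codon_positions(cds))
```

For a well-formed coding region, `stop_codon_positions` should return a single index pointing at the last codon; any earlier index signals a premature in-frame stop.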

13
Bioinformatics software
14
Bioinformatics software
15
Bioinformatics software

Translation (DNA to protein) and back translation (protein to DNA)
Web
Translate tool at ExPASy: http://au.expasy.org/tools/dna.html (DNA to protein)
JEMBOSS (in the laboratory): http://srs1.bic.nus.edu.sg/jnlp/ (DNA to protein and reverse)
(nucleic, translation, transeq; nucleic, translation, backtranseq)
If we translate a DNA sequence and then back-translate the result, we will typically not recover the starting sequence, because most amino acids are encoded by more than one codon.
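The round trip can be illustrated with a small Python sketch built on the standard genetic code. It is not how transeq or backtranseq work internally: here back translation simply picks the first codon listed for each amino acid, which is an arbitrary assumption made for the example, and all names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# For back translation, keep the first codon encountered for each amino acid.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def translate(dna):
    """Translate DNA in frame from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def back_translate(protein):
    """Return one of the many DNA sequences that encode the protein."""
    return "".join(FIRST_CODON[aa] for aa in protein.upper())


original = "ATGCTGAGCTAA"            # hypothetical sequence
protein = translate(original)        # "MLS*"
recovered = back_translate(protein)  # "ATGTTATCTTAA"
print(protein, recovered, recovered == original)
```

The recovered DNA differs from the original even though both translate to the same protein, which is the point made on the slide.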

16
Bioinformatics Software

Sequence comparison (the most important software)
This will be taught next month by A/P Tan Tin
Wee.
Web
Local alignment (BLAST, FASTA)
http://www.ebi.ac.uk/fasta33/
http://www.ncbi.nlm.nih.gov/BLAST/
http://www.ebi.ac.uk/blast2/
Multiple alignment (Clustal W)
http://www.ebi.ac.uk/clustalw/index.html
JEMBOSS (in the laboratory)
http://srs1.bic.nus.edu.sg/jnlp/
Local alignment: Smith-Waterman (alignment, local, water)
Global alignment: Needleman-Wunsch (alignment, global, needle)
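The difference between global and local alignment can be sketched with two short dynamic-programming functions. This is a simplified illustration under stated assumptions: it uses a fixed match/mismatch score and a linear gap penalty chosen for the example, computes only the optimal score (no traceback), and is not the implementation behind the water or needle programs.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of subsequences of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Hypothetical sequences sharing only the short region "ACG".
print(global_score("TTTACGTTT", "GGACGGG"))
print(local_score("TTTACGTTT", "GGACGGG"))
```

The local score never drops below zero, so a short shared region is rewarded regardless of the unrelated flanking sequence, whereas the global score is penalised for every mismatch and gap along the full length.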

17
Bioinformatics software

Evolutionary analysis
Multiple sequence alignments can be used to estimate evolutionary distances between proteins. Phylogeny programs represent the evolutionary distances between sequences.
WebPhylip: http://sdmc.krdl.org.sg:8080/lxzhang/phylip/
GeneBee: http://www.genebee.msu.su/services/phtree_reduced.html
Read textbook, page 83.
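One simple way to turn a multiple alignment into distances is the proportion of differing positions between each pair of aligned sequences. The sketch below assumes this uncorrected measure and skips columns containing a gap; phylogeny packages generally offer more refined distance models and then build a tree from the matrix, which is not shown. The alignment and names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every pair in the alignment."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in alignment
        for n in alignment
    }


# Hypothetical three-sequence alignment.
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "AC-TTCGA",
}
for pair, dist in distance_matrix(alignment).items():
    print(pair, round(dist, 3))
```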

18
Bioinformatics software
19
Bioinformatics software

Prediction of sequence structure and function
Sequences that have similar structure often have
similar function. For many sequences we can
extract secondary and tertiary structure from the
PDB database.
What if our sequence is not in the PDB? We can
predict structure of a biological sequence using
appropriate software.
Several programs are available for predicting secondary structure. Tertiary structure can be predicted by modelling.
http://npsa-pbil.ibcp.fr (PHD method for secondary structure prediction)

20
Bioinformatics software

Secondary structure prediction

21
Bioinformatics software

Secondary structure prediction
The PHD program predicted four alpha helices in
the human IL-2 (red). The number of helices is
correct, but their lengths and boundaries are not
correct (purple).
When we make a prediction in bioinformatics, we
must have an idea about the accuracy of
prediction programs.
To assess the accuracy of a program, we can test
it with known data. Our test must have sufficient
examples, so that we can make reasonable
conclusions.

22
Bioinformatics software

Secondary structure prediction
alpha-Lactalbumin (PDB 1A4V)
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html

23
Bioinformatics software

We used nine different programs to predict the secondary structure of alpha-lactalbumin (PDB 1A4V).
The best predictions for this molecule came from Predator, while DSC performed worst.
This test does not mean that Predator is the best of the tested programs, nor that DSC is the worst. To draw such conclusions we must first build a test set, which should contain examples from the protein family that our query protein belongs to.
The learning point: no prediction program is 100% accurate, and this applies across all bioinformatics software, not only secondary structure prediction. Users must be cautious when interpreting results from predictive software.
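Comparing predictors against a known structure can be sketched as a per-residue accuracy calculation. The measure used here, the fraction of residues whose predicted state matches the observed one (often called Q3 when the states are helix, strand and coil), is an assumption; the slides do not say how the nine programs were scored. The strings and method names below are invented for illustration and are not the alpha-lactalbumin results.

```python
def per_residue_accuracy(observed, predicted):
    """Fraction of residues whose predicted state equals the observed state.

    States are single letters, e.g. H (helix), E (strand), C (coil).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("strings must not be empty")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed structure and two hypothetical predictions.
observed = "CCHHHHHHCCEEEECC"
predictions = {
    "method_1": "CCHHHHHCCCEEEECC",
    "method_2": "CCCCHHHHCCCCEECC",
}

ranking = sorted(
    predictions,
    key=lambda name: per_residue_accuracy(observed, predictions[name]),
    reverse=True,
)
for name in ranking:
    print(name, per_residue_accuracy(observed, predictions[name]))
```

A ranking obtained from a single protein, as here, says little about the methods in general; repeating the calculation over a representative test set is what supports a conclusion.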

24
Bioinformatics software

Common measures (other measures also exist)
Sensitivity: SE = TP/(TP+FN)
Specificity: SP = TN/(TN+FP)
For example, prediction of binding peptides to a particular receptor:
            Experimental   Predicted    Class
Example 1   Binder         Binder       True positive (TP)
Example 2   Non-binder     Non-binder   True negative (TN)
Example 3   Binder         Non-binder   False negative (FN)
Example 4   Non-binder     Binder       False positive (FP)
A prediction system with SE = 0.8 and SP = 0.9 will correctly predict 8 of every 10 experimental positives, and for every 10 experimental negatives it will make one false positive prediction. This accuracy may be very good for prediction of peptide binding, but it is not very good for some other predictions, for example gene prediction.
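The sensitivity and specificity calculation can be written out as a short Python sketch. The function names and the data are hypothetical; the data are constructed to reproduce the SE = 0.8 and SP = 0.9 case from the slide (10 experimental binders, 10 experimental non-binders).

```python
def confusion_counts(experimental, predicted):
    """Count TP, FN, TN, FP from paired boolean labels (True = binder)."""
    tp = fn = tn = fp = 0
    for exp, pred in zip(experimental, predicted):
        if exp and pred:
            tp += 1
        elif exp and not pred:
            fn += 1
        elif not exp and not pred:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """SE = TP / (TP + FN): fraction of experimental positives found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """SP = TN / (TN + FP): fraction of experimental negatives rejected."""
    return tn / (tn + fp)


# 10 experimental binders followed by 10 experimental non-binders.
experimental = [True] * 10 + [False] * 10
# 8 binders predicted correctly, 2 missed; 9 non-binders correct, 1 false alarm.
predicted = [True] * 8 + [False] * 2 + [False] * 9 + [True] * 1

tp, fn, tn, fp = confusion_counts(experimental, predicted)
print(tp, fn, tn, fp)
print(sensitivity(tp, fn), specificity(tn, fp))
```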

25
Bioinformatics software

Prediction of 3-D structure
Various modelling programs:
comparative modelling, using known structures as templates
ab initio modelling, using atomic simulation, residue statistics, etc.
These methods will be covered later in the course.
An example of comparative modelling software is SWISS-MODEL: http://www.expasy.org/swissmod/SWISS-MODEL.html
The model is returned by email.
The tool includes a facility for assessing the quality of predictions.

26
Bioinformatics software
27
Bioinformatics software
28
Bioinformatics software
29
Bioinformatics software

Software for visualisation of 3-D structures
Provides different views of 3-D molecular structures; this will be taught by A/P Shoba.
Chime, RasMol (they use files in PDB format)
The Scorpion database uses Chime. Chime can be downloaded from http://www.mdli.com/downloads/downloads.html?uidkeyid1

30
Bioinformatics software
31
Bioinformatics software
32
Bioinformatics software

Text searches
Text-searching software is used in association with databases. Most commonly we search by keywords or combinations of keywords.
Examples of PubMed searches:
Query                                  Matches
Diabetes                               181,672
Diabetes AND IDDM                      35,841
Diabetes AND IDDM AND autoimmunity     1,109
Diabetes OR autoimmunity               190,674
Diabetes[Title/Abstract]               114,624
The last example uses a more advanced PubMed option, Preview/Index.
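Why AND narrows a search and OR broadens it can be seen in a toy keyword index. This is only a sketch of Boolean keyword matching with set operations; it does not describe how PubMed is implemented, and the records and function names are hypothetical.

```python
def build_index(records):
    """Map each lowercase word to the set of record ids that contain it."""
    index = {}
    for rec_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(rec_id)
    return index


def search_and(index, *terms):
    """Records containing every term (set intersection)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Records containing at least one term (set union)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Hypothetical record titles.
records = {
    1: "diabetes and autoimmunity in IDDM",
    2: "diabetes treatment review",
    3: "autoimmunity in thyroid disease",
    4: "IDDM diabetes genetics",
}
index = build_index(records)
print(search_and(index, "diabetes"))
print(search_and(index, "diabetes", "IDDM"))
print(search_and(index, "diabetes", "IDDM", "autoimmunity"))
print(search_or(index, "diabetes", "autoimmunity"))
```

Each added AND term can only shrink the result set, while OR can only enlarge it, which matches the pattern of match counts in the PubMed examples.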

33
Bioinformatics software: Summary of Today's lecture

Why bioinformatics software?
Types of software: sequence, motif, evolution, visualization, structural modeling, simulation, text search.
Examples of selected software
Sequence composition
DNA-protein sequence translation
Evolutionary analysis
Protein secondary structure prediction
Comparative modeling
Text search
To be taught later: sequence comparison, visualization, etc.

34
Summary of the Section: Biological databases and bioinformatics software

We first focused on biological databases. We covered these topics:
types of biological databases
brief descriptions of popular databases
structure of the GenBank and SWISS-PROT entries
searching biological databases
types of questions that can be answered by searching databases
completeness and errors in the databases

35
Summary of the Section: Biological databases and bioinformatics software

The second topic was bioinformatics software. We covered:
why we need bioinformatics software
major types of bioinformatics software
software for sequence composition, codon usage, translation and backtranslation
the concepts of sequence alignment and evolutionary analysis
secondary and tertiary structure prediction, molecular visualization
accuracy of prediction software
text searching