Essential Bioinformatics and Biocomputing LSM2104: Section I Biological Databases and Bioinformatics

1
Essential Bioinformatics and Biocomputing
(LSM2104 Section I) Biological Databases
and Bioinformatics Software
Prof. Chen Yu Zong
Tel 6874-6877
Email csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, NUS
January 2003
2
Lecture 5: Bioinformatics software
  • Outline
  • Types of bioinformatics software
  • Sequence, pattern and domain
  • Evolutionary analysis
  • Visualization
  • Modeling and prediction (sequence, structure and
    function)
  • Data mining (bibliographic and text searches)
  • Examples

3
Types of Bioinformatics software
  • Analysis of biological data/systems and
    characterization of molecules and sequences.
  • Analysis and interpretation of experimental
    results
  • Simulation of laboratory experiments, important
    for tackling large-scale problems
  • Predictions that lead to the design of
    experiments
  • Bioinformatics software can be accessed via the
    WWW or through integrated software packages (such
    as EMBOSS, GCG, Staden and DNAstar). It may be
    coupled with databases or may stand alone.

4
Bioinformatics software
  • Major sources
  • Software packages at the ExPASy Molecular Biology
    Server http://www.expasy.org
    http://au.expasy.org
  • Software at PBIL (Bio-Informatique Lyonnais)
    http://pbil.univ-lyon1.fr/
  • Toolbox at EBI (European Bioinformatics Institute)
    http://www.ebi.ac.uk/Tools/index.html

5
Bioinformatics software
  • Major types of bioinformatics tools
  • Sequence analysis tools
  • Sequence comparison
  • Pattern and domain search
  • Evolutionary analysis
  • Prediction of sequence structure and function
  • Visualization of molecular structures
  • Structure modeling
  • Bibliographic and text searches
  • Specialized and other tools

6
Bioinformatics software
  • Sequence analysis tools
  • This kind of software focuses on extraction and
    comparison of properties in DNA and protein
    sequences
  • Sequence analysis helps identify domains,
    structure, function and other properties
  • The analysis of individual sequences helps with
    sequence comparison
  • Textbook chapter 5, pages 81-93

7
Bioinformatics software
  • Sequence analysis tools
  • This kind of software focuses on extraction and
    comparison of DNA and protein sequence
    properties such as
  • composition of nucleotide or protein sequences
  • codon usage in DNA
  • translation and backtranslation
  • Textbook chapter 5, pages 81-93

8
Bioinformatics software
  • Composition of nucleotide or protein sequences
  • Composition (frequency of occurrence of a
    nucleotide or of an amino acid) is the most basic
    analysis. It can give us important functional and
    structural clues.
  • For example, CG-rich regions called CpG islands
    are often found in promoters. A short region just
    before the splice site at the end of introns
    often has a high C+T (pyrimidine) content.
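As a rough illustration of how composition gives such clues, the Python sketch below counts nucleotides and computes GC content and the CpG observed/expected ratio that is often used when screening for CpG islands. It is a minimal sketch only: the function name `base_composition` is hypothetical, non-ACGT characters are simply dropped, and no island-calling thresholds are applied.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Count nucleotides and derive GC content and the CpG observed/expected ratio.

    Hypothetical helper for illustration: characters other than A, C, G, T
    are ignored, and no CpG-island thresholds are applied.
    """
    seq = "".join(b for b in seq.upper() if b in "ACGT")
    counts = Counter(seq)
    n = len(seq)
    gc = (counts["C"] + counts["G"]) / n if n else 0.0
    # Count "CG" dinucleotides (a C immediately followed by a G).
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    # Observed/expected CpG = (CpG count * length) / (C count * G count).
    c_times_g = counts["C"] * counts["G"]
    cpg_oe = cpg * n / c_times_g if c_times_g else 0.0
    return {"counts": dict(counts), "gc": gc, "cpg": cpg, "cpg_oe": cpg_oe}


if __name__ == "__main__":
    print(base_composition("ACGTACGT"))
```

A CpG-rich stretch gives an observed/expected ratio well above that of a sequence of similar composition in which CG pairs are rare.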

9
Bioinformatics software
  • Composition of protein and DNA sequences
  • Web
  • NPS@ (Network Protein Sequence @nalysis)
    http://npsa-pbil.ibcp.fr/ (amino-acid
    composition)
  • AA Composition
    http://molbiol.soton.ac.uk/compute/aacomp.html
  • JEMBOSS (in our own laboratory)
  • http://srs1.bic.nus.edu.sg/jnlp/ (nucleic,
    composition, compseq)
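Composition tools such as compseq count "words" (subsequences of a chosen length k) in a sequence; with a word length of 1 this is plain residue composition, for example the amino-acid composition of a protein. A minimal Python sketch of that idea, assuming overlapping windows; the function names are hypothetical and this is not the EMBOSS implementation.

```python
from collections import Counter


def word_counts(seq: str, k: int = 1) -> Counter:
    """Count overlapping words (k-mers) of length k in a sequence."""
    if k < 1:
        raise ValueError("word length must be at least 1")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def percent_composition(seq: str) -> dict:
    """Percentage of each residue (word length 1), e.g. amino-acid composition."""
    counts = word_counts(seq, 1)
    total = sum(counts.values())
    return {res: 100.0 * c / total for res, c in sorted(counts.items())}


if __name__ == "__main__":
    print(word_counts("ATATAT", 2))       # dinucleotide counts
    print(percent_composition("MKKLLA"))  # amino-acid composition (%)
```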

10
Bioinformatics software
11
Bioinformatics software
12
Bioinformatics software
  • Codon usage in DNA
  • Web
  • Count-codon program in the Codon Usage Database
    http://www.kazusa.or.jp/codon/countcodon.html
    (needs start and stop codons at the start and the
    end of the sequence)
  • Tool for Gene to Codon Usage Table
    http://www.entelechon.com/eng/genetocut.html
    (does not care about start and stop codons)
  • JEMBOSS (in the laboratory)
  • http://srs1.bic.nus.edu.sg/jnlp/ (nucleic, codon
    usage, cusp)
  • A DNA coding region should contain only one
    in-frame stop codon, at its end
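Counting codon usage amounts to reading a coding sequence three bases at a time in a fixed frame. The Python sketch below counts codons in frame 1 and also checks that the only in-frame stop codon is the last one. It assumes the input begins at the first base of the start codon and uses the standard stop codons TAA, TAG and TGA; the function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def codons(cds: str) -> list:
    """Split a coding sequence into codons in frame 1, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_usage(cds: str) -> Counter:
    """Count how often each codon occurs in frame 1."""
    return Counter(codons(cds))


def has_single_terminal_stop(cds: str) -> bool:
    """True if exactly one in-frame stop codon is present and it is the last codon."""
    cs = codons(cds)
    stop_positions = [i for i, c in enumerate(cs) if c in STOP_CODONS]
    return stop_positions == [len(cs) - 1]


if __name__ == "__main__":
    orf = "ATGAAAAAGTAA"
    print(codon_usage(orf))
    print(has_single_terminal_stop(orf))
```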

13
Bioinformatics software
14
Bioinformatics software
15
Bioinformatics software
  • Translation (DNA to protein) and back translation
    (protein to DNA)
  • Web
  • Translate tool at ExPASy
    http://au.expasy.org/tools/dna.html (DNA to protein)
  • JEMBOSS (in the laboratory)
  • http://srs1.bic.nus.edu.sg/jnlp/ (DNA to protein
    and reverse)
  • (nucleic, translation, transeq; nucleic,
    translation, backtranseq)
  • If we translate and back translate the same
    sequence, we will typically not get back the
    starting sequence, because most amino acids are
    encoded by more than one codon.
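A minimal Python sketch of translation and back translation with the standard genetic code shows why the round trip is not lossless. The back-translation rule used here (always take the first codon for each amino acid, in TCAG order) is an arbitrary choice made for illustration and is not claimed to match any particular tool; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): codons enumerated in TCAG order, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Back translation: keep only the first codon seen for each amino acid (arbitrary choice).
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def translate(dna: str) -> str:
    """Translate DNA (A, C, G, T only) in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def back_translate(protein: str) -> str:
    """Return one possible DNA sequence that encodes the protein."""
    return "".join(BACK_TABLE[aa] for aa in protein.upper())


if __name__ == "__main__":
    original = "ATGAAGTTCTGA"
    protein = translate(original)        # "MKF*"
    roundtrip = back_translate(protein)  # a different but synonymous DNA sequence
    print(original, protein, roundtrip, translate(roundtrip) == protein)
```

The protein is recovered exactly, but the DNA is not: AAG and AAA both encode lysine, and TTC and TTT both encode phenylalanine.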

16
Bioinformatics Software
  • Sequence comparison (the most important type of
    software)
  • This will be taught next month by A/P Tan Tin
    Wee.
  • Web
  • Local alignment (BLAST, FASTA)
  • http://www.ebi.ac.uk/fasta33/
  • http://www.ncbi.nlm.nih.gov/BLAST/
  • http://www.ebi.ac.uk/blast2/
  • Multiple alignment (Clustal W)
  • http://www.ebi.ac.uk/clustalw/index.html
  • JEMBOSS (in the laboratory)
  • http://srs1.bic.nus.edu.sg/jnlp/
  • Local alignment: Smith-Waterman (alignment,
    local, water)
  • Global alignment: Needleman-Wunsch (alignment,
    global, needle)
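Both Needleman-Wunsch and Smith-Waterman are dynamic-programming algorithms that fill a score matrix cell by cell. The Python sketch below computes only the optimal score (no traceback) under a deliberately simple, hypothetical scoring scheme: +1 for a match, -1 for a mismatch and -1 per gap position. needle and water themselves use substitution matrices and separate gap-opening and gap-extension penalties, so their scores will differ.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch (global alignment, score of the bottom-right cell).
    local=True:  Smith-Waterman (local alignment; cells floored at 0, best cell returned).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))              # global
    print(alignment_score("ACGT", "AGT", local=True))  # local
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read from.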

17
Bioinformatics software
  • Evolutionary analysis
  • Multiple sequence alignments can be used to
    estimate evolutionary distances between
    proteins. Phylogenetic trees are used to
    represent evolutionary distances between
    sequences.
  • WebPhylip
  • http://sdmc.krdl.org.sg:8080/lxzhang/phylip/
  • GeneBee
  • http://www.genebee.msu.su/services/phtree_reduced.html
  • Read textbook, page 83.
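One common way to turn an alignment into an evolutionary distance is to take the proportion of differing sites (the p-distance) and, for nucleotides, correct it for multiple substitutions at the same site with the Jukes-Cantor model, d = -3/4 ln(1 - 4p/3). The Python sketch below assumes two already aligned sequences of equal length and skips columns containing a gap. It is illustrative only and not a description of what WebPhylip or GeneBee do internally; the p-distance also applies to protein alignments, but the correction shown is the nucleotide version.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance for DNA: d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # correction undefined: sequences too divergent
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACGTAA"
    print(p_distance(a, b), jukes_cantor(a, b))
```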

18
Bioinformatics software
19
Bioinformatics software
  • Prediction of sequence structure and function
  • Sequences that have similar structure often have
    similar function. For sequences whose structures
    have been determined experimentally, we can obtain
    secondary and tertiary structure from the PDB.
  • What if our sequence is not in the PDB? We can
    predict the structure of a biological sequence
    using appropriate software.
  • There are several programs for prediction of
    secondary structure. For prediction of tertiary
    structure we can use modelling.
  • http://npsa-pbil.ibcp.fr (PHD method for
    secondary structure prediction)

20
Bioinformatics software
  • Secondary structure prediction

21
Bioinformatics software
  • Secondary structure prediction
  • The PHD program predicted four alpha helices in
    human IL-2 (red). The number of helices is
    correct, but their lengths and boundaries are not
    correct (purple).
  • When we make a prediction in bioinformatics, we
    must have an idea of the accuracy of the
    prediction programs.
  • To assess the accuracy of a program, we can test
    it with known data. The test must include enough
    examples for us to draw reliable conclusions.
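A simple way to test a secondary structure predictor against known data is per-residue accuracy, often called Q3 when residues are grouped into three states (helix H, strand E, coil C). A minimal Python sketch, assuming predicted and observed states are given as strings of equal length; averaging with a plain mean over the proteins of a test set is one possible choice among several, and the function names are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def mean_q3(test_set: list) -> float:
    """Average Q3 over a test set given as (predicted, observed) pairs."""
    if not test_set:
        raise ValueError("test set is empty")
    scores = [q3(pred, obs) for pred, obs in test_set]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(q3("HHHHCCCEEE", "HHHCCCCEEE"))
```

A single protein gives one number; only the average over many known structures says much about the program itself.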

22
Bioinformatics software
  • Secondary structure prediction
  • Alpha-lactalbumin (PDB 1A4V)
  • http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html

23
Bioinformatics software
  • We used nine different programs to predict the
    secondary structure of alpha-lactalbumin (PDB
    1A4V).
  • The results show that the best predictions for
    this molecule came from Predator, while DSC gave
    the poorest.
  • This test does not mean that Predator is the best
    of the tested programs, nor that DSC is the
    worst. To draw such conclusions we must first
    build a test set. The test set should contain
    examples from the protein family that our query
    protein belongs to.
  • The learning point: none of the prediction
    programs (and this applies across all
    bioinformatics software, not only secondary
    structure prediction) is 100% accurate. Users
    must be cautious when interpreting results from
    predictive software.

24
Bioinformatics software
  • Common measures (other measures also exist)
  • Sensitivity SE = TP/(TP + FN)
  • Specificity SP = TN/(TN + FP)
  • For example, prediction of binding peptides to a
    particular receptor:
                Experimental   Predicted     Class
    Example 1   Binder         Binder        True positive (TP)
    Example 2   Non-binder     Non-binder    True negative (TN)
    Example 3   Binder         Non-binder    False negative (FN)
    Example 4   Non-binder     Binder        False positive (FP)
  • A prediction system with SE = 0.8 and SP = 0.9
    will, on average, correctly predict 8 of every 10
    experimental positives, and for every 10
    experimental negatives it will make one false
    (positive) prediction. This accuracy may be very
    good for prediction of peptide binding, but is not
    good enough for some other predictions, for
    example gene prediction.
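The two formulas can be checked with a small Python sketch that tallies TP, TN, FP and FN from paired experimental and predicted labels (True = binder, False = non-binder). The labels in the usage example are made up to reproduce the SE = 0.8, SP = 0.9 case above; they are not real binding data, and the function names are hypothetical.

```python
def confusion_counts(experimental: list, predicted: list) -> dict:
    """Tally TP, TN, FP and FN for binary labels (True = binder, False = non-binder)."""
    if len(experimental) != len(predicted):
        raise ValueError("label lists must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for exp, pred in zip(experimental, predicted):
        if exp and pred:
            counts["TP"] += 1
        elif not exp and not pred:
            counts["TN"] += 1
        elif exp:
            counts["FN"] += 1  # experimental binder predicted as non-binder
        else:
            counts["FP"] += 1  # experimental non-binder predicted as binder
    return counts


def sensitivity_specificity(experimental: list, predicted: list) -> tuple:
    """Return (SE, SP) with SE = TP/(TP+FN) and SP = TN/(TN+FP)."""
    c = confusion_counts(experimental, predicted)
    se = c["TP"] / (c["TP"] + c["FN"])
    sp = c["TN"] / (c["TN"] + c["FP"])
    return se, sp


if __name__ == "__main__":
    # Made-up labels: 10 binders (8 predicted correctly), 10 non-binders (1 false positive).
    exp = [True] * 10 + [False] * 10
    pred = [True] * 8 + [False] * 2 + [True] * 1 + [False] * 9
    print(sensitivity_specificity(exp, pred))
```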

25
Bioinformatics software
  • Prediction of 3-D structure
  • Various modelling programs
  • comparative modelling, using known structures as
    templates
  • ab initio modelling, using atomic simulation,
    residue statistics, etc.
  • These methods will be covered later in the course
  • An example of comparative modelling software
    is SWISS-MODEL
    http://www.expasy.org/swissmod/SWISS-MODEL.html
  • The resulting model is returned by email.
  • This tool also provides facilities for assessing
    the quality of the predicted models.

26
Bioinformatics software
27
Bioinformatics software
28
Bioinformatics software
29
Bioinformatics software
  • Software for visualisation of 3-D structures
    provides different views of the 3-D molecular
    structure; this will be taught by A/P Shoba.
  • Chime, RasMol (both use files in PDB format)
  • The Scorpion database uses Chime. Chime can be
    downloaded from
    http://www.mdli.com/downloads/downloads.html?uidkeyid1
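Since viewers such as Chime and RasMol read coordinates from PDB-format files, it may help to see what such a file holds. Each ATOM or HETATM record is a fixed-column text line; following the PDB format specification, the atom name is in columns 13-16, the residue name in 18-20, the chain identifier in 22, the residue number in 23-26 and the x, y, z coordinates in 31-54. A minimal Python sketch that extracts these fields; the function names are hypothetical, other fields are ignored, and the example line is made up for illustration.

```python
def parse_atom_record(line: str) -> dict:
    """Extract selected fields from a PDB ATOM/HETATM line (spec columns are 1-based)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "record": record,
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


def read_atoms(path: str) -> list:
    """Parse every ATOM/HETATM record in a PDB-format file."""
    with open(path) as handle:
        return [parse_atom_record(rec_line) for rec_line in handle
                if rec_line.startswith(("ATOM", "HETATM"))]


if __name__ == "__main__":
    # Made-up example record.
    example = "ATOM      1  N   MET A   1      38.198  19.582  28.064  1.00 38.67           N"
    print(parse_atom_record(example))
```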

30
Bioinformatics software
31
Bioinformatics software
32
Bioinformatics software
  • Text searches
  • Text-searching software is used together with
    databases. Most commonly we search by keywords or
    combinations of keywords.
  • Examples of PubMed searches (number of matches)
  • Diabetes                              181,672
  • Diabetes AND IDDM                      35,841
  • Diabetes AND IDDM AND autoimmunity      1,109
  • Diabetes OR autoimmunity              190,674
  • Diabetes[Title/Abstract]              114,624
  • The last example uses a more advanced PubMed
    option (Preview/Index)
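The pattern in these counts follows from treating each keyword as the set of records that contain it: AND takes the intersection, so adding terms can only shrink the result, while OR takes the union, so it can only grow. A toy Python sketch over a made-up inverted index; the record IDs and index contents are hypothetical, and real PubMed searching does considerably more (for example, field tags such as [Title/Abstract]).

```python
# Hypothetical inverted index: keyword -> set of record IDs containing it.
INDEX = {
    "diabetes": {1, 2, 3, 4, 5},
    "iddm": {2, 3, 6},
    "autoimmunity": {3, 7},
}


def _lookup(terms) -> list:
    """Record sets for each term (case-insensitive); unknown terms match nothing."""
    return [INDEX.get(t.lower(), set()) for t in terms]


def search_and(*terms: str) -> set:
    """Records containing every term (set intersection)."""
    sets = _lookup(terms)
    return set.intersection(*sets) if sets else set()


def search_or(*terms: str) -> set:
    """Records containing at least one term (set union)."""
    return set().union(*_lookup(terms))


if __name__ == "__main__":
    print(len(search_and("Diabetes")))
    print(len(search_and("Diabetes", "IDDM")))
    print(len(search_and("Diabetes", "IDDM", "autoimmunity")))
    print(len(search_or("Diabetes", "autoimmunity")))
```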

33
Bioinformatics software: Summary of Today's
lecture
  • Why bioinformatics software?
  • Types of software: sequence, motif, evolution,
    visualization, structural modeling, simulation,
    text search.
  • Examples of selected software
  • Sequence composition
  • DNA-protein sequence translation
  • Evolutionary analysis
  • Protein secondary structure prediction
  • Comparative modeling
  • Text search
  • To be taught later: sequence comparison,
    visualization, etc.

34
Summary of the Section: Biological databases and
bioinformatics software
  • We first focused on biological databases. We
    covered the following topics:
  • discussed types of biological databases
  • briefly described popular databases
  • structure of the GenBank and SWISS-PROT entries
  • searching biological databases
  • types of questions that can be answered by
    searching databases
  • completeness and errors in the databases

35
Summary of the Section: Biological databases and
bioinformatics software
  • The second topic was bioinformatics software. We
    covered:
  • why we need bioinformatics software
  • briefly described major types of bioinformatics
    software
  • described software for sequence composition,
    codon usage, translation and backtranslation
  • introduced the concept of sequence alignment,
    evolutionary analysis
  • secondary and tertiary structure prediction,
    molecular visualization
  • accuracy of prediction software
  • text searching