Basic Overview of Bioinformatics Tools and Biocomputing Applications II

1
Basic Overview of Bioinformatics Tools and
Biocomputing Applications II
  • Dr Tan Tin Wee
  • Director
  • Bioinformatics Centre

2
Common Computational Analyses
  • Sequence Assembly
  • Simple sequence analysis
  • Translation and reverse complement, ORF
  • Composition statistics (protein, DNA)
  • Molecular mass
  • Total charge and pI, local hydropathy
  • Simple determination of secondary structures
  • Restriction site analysis
  • Internal repeat analysis
  • Detection of active sites, functional residues,
    characteristic structures, substrates, and
    processing signals
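Several of the simple sequence analyses listed above take only a few lines of code. The Python sketch below shows reverse complement, translation with the standard genetic code, composition statistics and an exact-match restriction site search. The function names are hypothetical, and the sketch ignores ambiguity codes (unknown codons simply translate as `X`).

```python
from collections import Counter

# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks stop codons, 'X' unknown codons."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def composition(seq: str) -> dict:
    """Fraction of each residue (works for DNA or protein)."""
    counts = Counter(seq.upper())
    return {res: n / len(seq) for res, n in sorted(counts.items())}


def find_sites(dna: str, site: str) -> list:
    """0-based start positions of an exact recognition site, e.g. a restriction site."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


dna = "ATGGAATTCTGA"
print(reverse_complement(dna))    # TCAGAATTCCAT
print(translate(dna))             # MEF*
print(find_sites(dna, "GAATTC"))  # [3]
```

Here `GAATTC` is the EcoRI recognition site; real restriction mapping tools also handle degenerate sites and report fragment sizes.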

3
Common Computational Analyses
  • Database sequence search
  • Multiple alignment
  • Secondary and tertiary structure prediction,
    transmembrane helix detection
  • Structure modeling
  • Docking prediction and design
  • Hidden Markov model searches

4
Database Searching
  • Text-based Database Searching: using a text
    string to match an annotation in a sequence
    database record, i.e. keyword search
  • Sequence-based Database Searching: using a
    biological sequence to match all or part of it
    against the sequence in every sequence database
    record

5
Text-Based Database Searching
  • Examples: Entrez, SRS, DBGET, AceDB (common
    integrated database systems)
  • Search Concepts
  • Boolean Search - AND, OR, NOT
  • Broadening Search
  • Narrowing the Search
  • Proximity searching, soundex
  • Wildcards and stemming, e.g. "thala" for
    thalasemia, thalassemia, thalassemic
  • Use standard string search algorithms, Boolean
    operations and vocabulary matches
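The Boolean and wildcard concepts above can be sketched in a few lines, assuming a handful of made-up annotation records held in memory (real systems such as Entrez or SRS query prebuilt indexes rather than scanning every record). Treating a trailing `*` as the stemming wildcard is an assumption of this sketch, not the syntax of any particular system, and the `search` parameters are hypothetical names.

```python
import re

# Hypothetical annotation records standing in for database entries.
RECORDS = {
    "rec1": "Homo sapiens beta-globin gene, mutation in thalassemia",
    "rec2": "Human hemoglobin variant found in beta thalassemic patients",
    "rec3": "Drosophila melanogaster period (per) gene, circadian rhythm",
    "rec4": "Mouse alpha-globin pseudogene",
}


def term_matches(term: str, text: str) -> bool:
    """Case-insensitive whole-word match; a trailing '*' matches any word ending (stemming)."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def search(all_of=(), any_of=(), none_of=()) -> list:
    """Boolean search: every all_of term (AND), at least one any_of term (OR), no none_of term (NOT)."""
    hits = []
    for rec_id, text in RECORDS.items():
        if not all(term_matches(t, text) for t in all_of):
            continue
        if any_of and not any(term_matches(t, text) for t in any_of):
            continue
        if any(term_matches(t, text) for t in none_of):
            continue
        hits.append(rec_id)
    return hits


print(search(all_of=["thala*"]))                     # ['rec1', 'rec2']
print(search(all_of=["globin"], none_of=["mouse"]))  # ['rec1']
print(search(any_of=["human", "sapiens"]))           # ['rec1', 'rec2']
```

Broadening a search corresponds to adding OR terms or wildcards; narrowing it corresponds to adding AND or NOT terms. Note that the whole-word match for "globin" misses "hemoglobin", which is exactly the kind of gap that stemming and controlled vocabularies try to close.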

6
Text-based Database Searching
  • Example: to find the human homolog of the
    Drosophila per (period) gene
  • Procedure
  • Go to Entrez on the Web
  • In All Fields, enter "human" "per"
  • The hits returned are irrelevant, so broaden
    the search
  • "human" "period" returns more hits
  • Check every one to find the human RIGUI gene
  • Hit and miss, requiring clever guesswork: free
    form or controlled vocabulary (MeSH terms)? Use
    Boolean searches?

7
Sequence-based Database Searching
  • Homology Search
  • Global or Local Sequence Alignment
  • Needleman-Wunsch Algorithm
  • Smith-Waterman Algorithm
  • Lipman-Pearson FASTA
  • Altschul's BLAST
  • Take a query sequence and compare it pairwise
    with each sequence in the database
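The exhaustive form of this procedure can be sketched as a Smith-Waterman local alignment score computed against every database entry, followed by ranking. This is roughly the approach of SSEARCH; FASTA and BLAST add heuristics and significance statistics on top. The scoring values (match +2, mismatch -1, linear gap -2) and the three-sequence database are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty), using two DP rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start afresh anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict) -> list:
    """Compare the query pairwise with every entry and rank entries by score."""
    scores = {name: smith_waterman_score(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy database.
DATABASE = {"seqA": "TTTTGATTACATTTT", "seqB": "CCCCCCCCCC", "seqC": "GATCACA"}
print(search_database("GATTACA", DATABASE))
# [('seqA', 14), ('seqC', 11), ('seqB', 2)]
```

Raw scores like these are not comparable across queries or database sizes, which is why real search tools report statistics such as E-values alongside them.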

8
Sequence-based Database Searching
  • Basic Assumptions
  • Sequences of homologous genes/proteins diverge
    over time even though their structure and/or
    function change little
  • Significant sequence similarity is taken to
    indicate potential structural/functional
    similarity or common evolutionary origin
  • Based on a well-characterised protein, infer the
    function of an unknown sequence at the gene or
    protein sequence level.

9
Sequence-based Database Searching
  • Global Alignment: forces an end-to-end alignment
    of the two input sequences over their full
    lengths
  • Local Alignment: looks for local stretches of
    similarity and tries to align the most similar
    segments
  • The algorithms used may be similar, but the
    outputs differ; statistics are needed to assess
    the results
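To make the global case concrete, here is a Needleman-Wunsch sketch with traceback, assuming simple match/mismatch scores and a linear gap penalty (all values hypothetical). Because the traceback starts at the bottom-right cell of the matrix, the result always spans both sequences end to end; a local (Smith-Waterman) variant would instead clamp cell scores at zero and trace back from the best-scoring cell.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading gaps are penalised in global alignment
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner back to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s, x, y = needleman_wunsch("GATTACA", "GATACA")
print(s)  # 4: six matches and one gap
print(x)
print(y)
```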

10
Sequence-based Database Searching
  • Alignment Scoring
  • Substitution score and substitution matrix: PAM,
    BLOSUM
  • Affine gap costs / gap penalty and gap scores
  • Optimal alignments by dynamic programming:
    Needleman-Wunsch algorithm, Smith-Waterman
    algorithm (SSEARCH)
  • Additional heuristics to speed up the search:
    FASTA, BLAST
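The scoring ingredients above can be combined in a short function that scores an existing alignment. This sketch assumes a hypothetical toy DNA substitution matrix (+5 match, -1 transition, -4 transversion) in place of a real PAM or BLOSUM table, and charges a gap of length k as `gap_open + k * gap_extend`, matching the affine gap cost definition in these slides. The default penalties are illustrative only.

```python
PURINES = set("AG")


def substitution_score(x: str, y: str) -> int:
    """Hypothetical toy DNA matrix: +5 match, -1 transition, -4 transversion."""
    if x == y:
        return 5
    same_class = (x in PURINES) == (y in PURINES)  # purine-purine or pyrimidine-pyrimidine
    return -1 if same_class else -4


def alignment_score(top: str, bottom: str, gap_open: int = -10, gap_extend: int = -1) -> int:
    """Score a gapped pairwise alignment ('-' marks a gap).

    A gap of length k costs gap_open + k * gap_extend (affine gap cost).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    gap_in = None  # which row the current gap run is in: "top", "bottom" or None
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # Charge the opening penalty only on the first column of a gap run.
            total += gap_extend + (0 if gap_in == row else gap_open)
            gap_in = row
        else:
            total += substitution_score(x, y)
            gap_in = None
    return total


print(alignment_score("GATTACA", "GA--ACA"))  # 13
```

The result is 25 for the five matched columns minus 12 for a single two-residue gap (-10 to open, -1 per residue). Splitting the same residues into two separate gaps would pay the opening penalty twice, which is why affine costs favour fewer, longer gaps.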

11
Some definitions
  • Affine gap costs - scoring system for gaps within
    alignments that charges a fixed penalty for
    opening a gap plus an additional per-residue
    penalty, so the total cost grows in proportion
    to the length of the gap
  • Alignment score - numerical value indicating the
    overall quality of an alignment; the higher the
    score, the better the alignment.
  • Algorithm - fixed procedure embodied in a
    computer program
  • Heuristics - a computer science term referring to
    guesses made by the program to approximate
    results, usually based on arbitrary or predefined
    rules.
  • Gapped Alignment - alignment of sequences where
    gaps are permitted

12
Computational Genefinding
  • A major challenge in genome projects
  • Given a DNA sequence, where does a gene start and
    stop? (ORF)
  • Where are the exons and introns?
  • Where are the transcription elements?
  • Gene structure and other regulatory elements?
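The simplest answer to "where does a gene start and stop?" is an ORF scan. The sketch below reports stretches running from an ATG to the first in-frame stop codon in the three forward reading frames. The `min_codons` cutoff is a hypothetical parameter; the reverse strand would be scanned the same way on its reverse complement. Because it does not model introns, it only approximates gene boundaries, most plausibly in bacterial sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (start, end) pairs (0-based, end exclusive, stop codon included)
    for forward-strand ORFs that begin with ATG and end at the first in-frame stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons before the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


seq = "CCATGAAATTTGGGTAACC"
print(find_orfs(seq))  # [(2, 17)]
print(seq[2:17])       # ATGAAATTTGGGTAA
```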

13
Genomic Elements
  • Intron-exon splice sites
  • Start and stop codons
  • Branch points
  • Promoters and terminators of transcription
  • Polyadenylation sites
  • Ribosome binding sites
  • Topoisomerase II binding sites
  • Topoisomerase I cleavage sites
  • Transcription factor binding sites

14
Detecting Genomic Elements
  • Local sites and motifs/patterns for such
    elements: signals and signal sensors
  • Extended variable-length regions, e.g. exons and
    introns: contents and content sensors
  • Linguistic technique: gene structure described
    in a formal grammar, e.g. the GeneLang
    genefinding program

15
Signal sensors
  • Simple consensus sequence: use of pattern-matching
    algorithms
  • Weight matrices: allow a weighted score for each
    position of the site to be summed
  • Use of Artificial Neural Networks (ANN)
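A weight-matrix signal sensor can be sketched as follows. The four aligned "sites" are made up for illustration and do not represent a real motif; the weights are log2 odds against a uniform base background with a pseudocount of 1, and the threshold of 3.0 is an arbitrary choice. Unlike an exact consensus match, the summed score still recognises a site that deviates at one position.

```python
import math

# Hypothetical aligned examples of a made-up 4-bp signal (not a real motif).
SITES = ["TATA", "TATA", "TATT", "TACA"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Per-position log2-odds weights versus a uniform base background."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Signal score = sum of the weights for the base seen at each position."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan(pwm, dna, threshold):
    """Slide the matrix along the sequence and report windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(dna) - width + 1):
        score = score_window(pwm, dna[i:i + width])
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


PWM = build_pwm(SITES)
print(scan(PWM, "GGTATAGG", 3.0))  # [(2, 4.64)]
print(scan(PWM, "GGTACAGG", 3.0))  # [(2, 3.64)]
print("GGTACAGG".find("TATA"))     # -1: an exact consensus match misses this site
```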

16
Content Sensors
  • Long ORFs (for bacteria)
  • Statistical models, e.g. Markov models: GeneMark
    (statistical models of nucleotide frequencies
    and dependencies in codon structure)
  • Neural nets, e.g. Grail (exon detection by a
    neural network combined with signal sensors for
    exon-intron splice sites)
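A Markov-model content sensor can be illustrated with two first-order Markov chains, one trained on "coding" and one on "noncoding" examples, plus a log-odds score for a candidate region. This is a simplified sketch: the training strings are invented (GC-rich versus AT-rich), and GeneMark itself uses higher-order, codon-position-dependent models rather than a single first-order chain.

```python
import math
from itertools import product


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | previous) from training sequences."""
    counts = {a + b: pseudocount for a, b in product("ACGT", repeat=2)}
    for s in seqs:
        for i in range(len(s) - 1):
            pair = s[i:i + 2]
            if pair in counts:  # skip pairs containing non-ACGT characters
                counts[pair] += 1
    probs = {}
    for a in "ACGT":
        row_total = sum(counts[a + b] for b in "ACGT")
        for b in "ACGT":
            probs[a + b] = counts[a + b] / row_total
    return probs


def log_odds(seq, coding, noncoding):
    """Sum over transitions of log2(P_coding / P_noncoding); positive favours 'coding'."""
    return sum(
        math.log2(coding[seq[i:i + 2]] / noncoding[seq[i:i + 2]])
        for i in range(len(seq) - 1)
    )


# Hypothetical training data: GC-rich "coding" versus AT-rich "noncoding" examples.
CODING = train_markov(["GCCGCGGCAGCGCTG", "GGCGCCGCTGCAGCC"])
NONCODING = train_markov(["ATTATAAATTTAATA", "TTAAATATTTATAAT"])

print(log_odds("GCGCCG", CODING, NONCODING) > 0)  # True
print(log_odds("ATATTA", CODING, NONCODING) < 0)  # True
```

In a genefinder, a score like this would be computed over windows or candidate exons and combined with the signal sensor scores.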

17
Some Definitions
  • Artificial Neural Nets - statistical pattern
    recognition method - a type of nonlinear
    regression
  • Markov Models - statistical models for sequences
    in which the probability of each residue depends
    on the residues preceding it.
  • Dynamic Programming - type of algorithm widely
    used for constructing sequence alignments and for
    evaluating all possible candidate gene structures

18
Other Genefinding methods
  • Use of dynamic programming, linguistic rules for
    functional features, and parameters of a Markov
    process on hidden variables: hidden Markov
    models (HMMs)
  • HMM genefinders: EcoParse, Xpound, GeneMark HMM,
    Veil, HMMgene, GenScan
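The HMM idea can be shown with a two-state model decoded by the Viterbi algorithm, which is itself a dynamic programming method. All states and probabilities below are hypothetical (an AT-rich background state `N` and a GC-rich state `C`); real HMM genefinders such as GenScan or HMMgene use many more states for exons, introns and signals.

```python
import math

STATES = ("N", "C")  # hypothetical: N = AT-rich background, C = GC-rich "gene-like" region
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq: str) -> str:
    """Most probable hidden-state path, computed by dynamic programming in log space."""
    if not seq:
        return ""
    log = math.log
    v = [{s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + log(TRANS[p][s]))
            col[s] = v[-1][prev] + log(TRANS[prev][s]) + log(EMIT[s][x])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AAAAAAGGGGGGCCCCCCTTTTTT"))
# NNNNNNCCCCCCCCCCCCNNNNNN
```

The decoded path switches to `C` exactly over the twelve G/C bases: each G or C emitted in state `C` gains more in log probability than the transition penalty for entering and leaving that state.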