Basic Overview of Bioinformatics Tools and Biocomputing Applications II presentation

1
Basic Overview of Bioinformatics Tools and
Biocomputing Applications II

Dr Tan Tin Wee
Director
Bioinformatics Centre

2
Common Computational Analyses

Sequence Assembly
Simple sequence analysis
Translation and reverse complement, ORF
Composition statistics (protein, DNA)
Molecular mass
Total charge and pI, local hydropathy
Simple determination of secondary structures
Restriction site analysis
Internal repeat analysis
Detection of active sites, functional residues,
characteristic structures, substrates, and
processing signals
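
Two of the simple analyses listed on this slide, reverse complement and composition statistics, can be sketched in a few lines of Python. This is a minimal illustration only; the function names are made up for this example and are not taken from any particular package.

```python
from collections import Counter

# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def composition(seq: str) -> dict:
    """Return the fraction of each residue (DNA base or amino acid)."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}


def gc_content(dna: str) -> float:
    """Return the combined fraction of G and C in a DNA string."""
    comp = composition(dna)
    return comp.get("G", 0.0) + comp.get("C", 0.0)
```

For instance, reverse_complement("ATGC") gives "GCAT", and gc_content("ATGC") gives 0.5.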

3
Common Computational Analyses

Database sequence search
Multiple alignment
Secondary (2°) and tertiary (3°) structure
prediction, transmembrane helix detection
Structure modeling
Docking prediction and design
Hidden Markov model searches

4
Database Searching

Text-based Database Searching - using a text
string to match an annotation in a sequence
database record, i.e. keyword search
Sequence-based Database Searching - using a
biological sequence to match all or part of
its sequence against the sequence of every
record in a sequence database

5
Text-Based Database Searching

Examples: Entrez, SRS, DBGET, AceDB - common
integrated database systems
Search Concepts
Boolean Search - AND, OR, NOT
Broadening Search
Narrowing the Search
Proximity searching, soundex
Wild Card, Stemming, e.g. Thala* for thalasemia,
thalassemia, thalassemic
Uses standard string search algorithms, Boolean
operations, and vocabulary matches
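
The search concepts on this slide (AND, OR, NOT, and a trailing wildcard) can be illustrated with a small sketch. It assumes the database is just a dictionary of record IDs and annotation text; the records and the function names are hypothetical, and real systems such as Entrez or SRS use indexed fields rather than a linear scan.

```python
import re


def matches(term: str, text: str) -> bool:
    """True if term occurs as a whole word in text (case-insensitive).

    A trailing '*' acts as a wildcard for any word ending.
    """
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def search(records: dict, all_of=(), any_of=(), none_of=()) -> list:
    """Return IDs of records satisfying the AND / OR / NOT term lists."""
    hits = []
    for rec_id, annotation in records.items():
        if not all(matches(t, annotation) for t in all_of):      # AND
            continue
        if any_of and not any(matches(t, annotation) for t in any_of):  # OR
            continue
        if any(matches(t, annotation) for t in none_of):          # NOT
            continue
        hits.append(rec_id)
    return hits


# Hypothetical annotation records, for illustration only.
RECORDS = {
    "R1": "Human beta-thalassemia mutation",
    "R2": "Mouse thalassemic model",
    "R3": "Human period homolog",
}
```

Adding terms to all_of or none_of narrows the search; adding terms to any_of or using a wildcard broadens it. With the toy records above, search(RECORDS, all_of=["thala*"]) returns R1 and R2, while search(RECORDS, all_of=["human"], none_of=["thala*"]) returns only R3.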

6
Text-based Database Searching

Example: to find the human homolog of the
Drosophila per gene
Procedure
Go to Entrez on the Web
All Fields: enter "human" "per"
Hits returned are irrelevant - broaden search
"human" "period" - more hits
Check every one, find the human RIGUI gene
Hit and miss, clever guesswork. Free-form or
controlled vocabulary (MeSH terms)? Use Boolean
searches?

7
Sequence-based Database Searching

Homology Search
Global or Local Sequence Alignment
Needleman-Wunsch Algorithm
Smith-Waterman Algorithm
Lipman-Pearson FASTA
Altschul's BLAST
Take a sequence, pairwise comparison with each
sequence in the database

8
Sequence-based Database Searching

Basic Assumptions
Sequences of homologous genes/proteins diverge
over time even though structure and/or function
change little
Significant sequence similarity is taken to imply
potential structural/functional similarity or
common evolutionary origin
Based on a well-characterised protein, infer the
function of an unknown sequence at gene or
protein sequence level.

9
Sequence-based Database Searching

Global Alignment - forces complete alignment of
the two input sequences in a pairwise comparison
Local Alignment - looks for local stretches of
similarity and tries to align the most similar
segments
The algorithms used may be similar, but the output
differs; statistics are needed to assess results

10
Sequence-based Database Searching

Alignment Scoring
Substitution score and substitution matrix - PAM,
BLOSUM
Affine gap costs/gap penalty and gap scores
Optimal alignments, dynamic programming -
Needleman-Wunsch algorithm, Smith-Waterman
algorithm (SSEARCH)
Additional heuristics to speed up the search -
FASTA, BLAST
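
The difference between global and local alignment shows up directly in the dynamic programming recurrence. The sketch below computes only the optimal score (no traceback). It assumes a simple match/mismatch score in place of a PAM or BLOSUM matrix and a linear gap penalty rather than affine gap costs; the default values are arbitrary.

```python
def global_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    # First row: aligning a prefix of b against nothing costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Smith-Waterman: best score over all pairs of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The two functions differ only in the boundary values and the extra 0 in the maximum, which is why the algorithms look similar but the output differs. A database search would run one of these against every database sequence, which is the cost that the FASTA and BLAST heuristics are designed to avoid.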

11
Some definitions

Affine gap costs - scoring system for gaps within
alignments which charges a penalty for gap
formation and an additional per-residue penalty
proportional to the size of the gap
Alignment score - numerical value indicating the
overall quality of an alignment; the higher the
score, the better the alignment.
Algorithm - fixed procedure embodied in a
computer program
Heuristics - a computer science term referring to
guesses made by the program to approximate
results, usually based on arbitrary or predefined
rules.
Gapped Alignment - alignment of sequences where
gaps are permitted
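
The affine gap cost defined above can be written as a one-line function. This is a sketch under one common convention (a fixed opening penalty plus an extension penalty for every gapped residue); some programs charge the extension penalty only from the second residue onward, and the default values here are arbitrary.

```python
def affine_gap_cost(length: int, gap_open: float = 10.0,
                    gap_extend: float = 1.0) -> float:
    """Cost of a single gap spanning `length` residues.

    Assumed convention: opening penalty + extension penalty * length.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length
```

With these values one gap of four residues costs 14, while two separate gaps of two residues cost 24, so the scoring favours a few long gaps over many short ones.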

12
Computational Genefinding

Major challenge in genome projects
Given a DNA sequence, where does a gene begin and
stop? - ORF
Where are the exons and introns?
Where are the transcription elements?
Gene structure and other regulatory elements?
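
The first question on this slide, where an open reading frame begins and stops, has a simple answer when there are no introns. The sketch below scans the three forward reading frames for ATG ... stop stretches. It is an illustration only: it ignores the reverse strand, alternative start codons, and splicing, and the min_codons threshold is an arbitrary choice.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return (start, end, frame) for ORFs on the forward strand.

    start is the index of the A in ATG; end is exclusive and includes
    the stop codon. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                     # look for the next ORF
    return orfs
```

For example, find_orfs("CCATGAAATTTTAGCC") reports a single ORF in frame 2 covering ATGAAATTTTAG.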

13
Genomic Elements

Intron-exon splice sites
Start-Stop codons
Branch Points
Promoters and terminators of transcription
Polyadenylation sites
Ribosomal binding sites
Topoisomerase II binding sites
Topoisomerase I cleavage sites
Transcription factor binding sites

14
Detecting Genomic Elements

Local sites and motifs/patterns for such elements
- signals and signal sensors
Extended variable-length regions, e.g. exons and
introns - contents and content sensors
Linguistic technique - gene structure described
in formal grammar - GeneLang genefinding program

15
Signal sensors

Simple consensus sequence - use of pattern matching
algorithms
Weight matrices - allow the weighted scores from
each weight matrix sensor to be summed
Use of Artificial Neural Networks (ANN)
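
A weight matrix signal sensor can be sketched as follows. The example builds a log-odds matrix from a handful of aligned sites and slides it along a sequence. The pseudocount, the uniform background of 0.25, and the toy sites used in the usage note are assumptions made for illustration, not values from the presentation.

```python
import math


def build_pwm(sites: list, pseudocount: float = 1.0,
              background: float = 0.25) -> list:
    """Build a log-odds weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm: list, window: str) -> float:
    """Sum the per-position weights for one window of the sequence."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm: list, dna: str) -> tuple:
    """Return (score, position) of the highest-scoring window.

    Assumes dna is at least as long as the matrix.
    """
    width = len(pwm)
    return max((score_window(pwm, dna[i:i + width]), i)
               for i in range(len(dna) - width + 1))
```

With the toy sites ["TATAAT", "TATGAT", "TACAAT", "TATAAT"], best_hit on "GGCGTATAATGCGC" returns position 4, where the consensus TATAAT occurs. Unlike a plain consensus match, windows that differ from the consensus at a weakly conserved position still receive a reasonably high score.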

16
Content Sensors

Long ORF for bacteria
Statistical models, e.g. Markov models -
GeneMark: statistical models of nucleotide
frequencies and dependencies in codon structure
Neural nets, e.g. Grail: exon detection by neural
network combined with signal sensors for
exon-intron splice sites
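
The idea behind a Markov model content sensor can be shown with a first-order chain: train one model on each class of sequence, then score a new region by the log-odds of the two models. This is a deliberately simplified sketch. It is not GeneMark's actual model, which uses higher-order, codon-position-dependent statistics, and the function names and training strings in the usage note are made up.

```python
import math


def train_markov(seqs: list, pseudocount: float = 1.0) -> dict:
    """Estimate first-order transition probabilities P(next | previous)."""
    counts = {a: {b: pseudocount for b in "ACGT"} for a in "ACGT"}
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in "ACGT"}
        for a in "ACGT"
    }


def log_odds(seq: str, model_a: dict, model_b: dict) -> float:
    """Log2 ratio of the probability of seq under model_a vs model_b.

    Positive values favour model_a, negative values favour model_b.
    """
    return sum(math.log2(model_a[p][n] / model_b[p][n])
               for p, n in zip(seq, seq[1:]))
```

As a usage sketch, train one model on GC-rich strings such as "GCGCGGCCGCGC" and another on AT-rich strings such as "ATATTAATTATA"; log_odds("GCGCGC", gc_model, at_model) is then positive and log_odds("ATATAT", gc_model, at_model) is negative. A real content sensor would be trained on known coding and non-coding regions instead.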

17
Some Definitions

Artificial Neural Nets - statistical pattern
recognition method - a type of nonlinear
regression
Markov Models - statistical models for sequences
in which the probability of each residue depends
on the residues preceding it.
Dynamic Programming - type of algorithm widely
used for constructing sequence alignments and for
evaluating all possible candidate gene structures

18
Other Genefinding methods

Use of dynamic programming
Linguistic rules for functional features
Parameters of a Markov process on hidden
variables - hidden Markov models (HMM)
HMM genefinders - EcoParse, Xpound, GeneMark HMM,
Veil, HMMgene, GenScan
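
How an HMM genefinder combines dynamic programming with a Markov process on hidden variables can be sketched with the Viterbi algorithm on a two-state toy model. The states, probabilities, and names below are invented for illustration and are far simpler than the models used by the programs listed above.

```python
import math

# Toy model: all numbers are assumptions chosen for illustration.
STATES = ("gene", "intergenic")
START = {"gene": 0.5, "intergenic": 0.5}
TRANS = {
    "gene": {"gene": 0.9, "intergenic": 0.1},
    "intergenic": {"gene": 0.1, "intergenic": 0.9},
}
EMIT = {
    "gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intergenic": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(obs: str, states=STATES, start_p=START,
            trans_p=TRANS, emit_p=EMIT) -> list:
    """Return the most probable hidden state path for a non-empty obs."""
    # Work in log space to avoid underflow on long sequences.
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
              for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s.
            prev = max(states,
                       key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

On the sequence "ATATATGCGCGC" this toy model labels the first six bases intergenic and the last six gene, switching state exactly where the composition changes.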
