Definitions - PowerPoint PPT Presentation

About This Presentation
Title:

Definitions

Description:

Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically meaningful. – PowerPoint PPT presentation

Slides: 26
Provided by: duan85

Transcript and Presenter's Notes

Title: Definitions


1
Definitions
  • Optimal alignment - one that exhibits the most
    correspondences. It is the alignment with the
    highest score. May or may not be biologically
    meaningful.
  • Global alignment - Needleman-Wunsch (1970)
    maximizes the number of matches between the
    sequences along the entire length of the
    sequences.
  • Local alignment - Smith-Waterman (1981) gives the
    highest scoring local match between two sequences.

2
Pairwise Global Alignment
  • Global alignment - Needleman-Wunsch (1970)
    maximizes the number of matches between the
    sequences along the entire length of the
    sequences.
  • Reasons for making a global alignment
    • checking minor differences between two sequences
    • analyzing polymorphisms (e.g., SNPs) between
      closely related sequences

3
Pairwise Global Alignment
  • Computationally
    • Given: a pair of sequences (strings of characters)
    • Output: an alignment that maximizes the similarity

4
How can we find an optimal alignment?
  • ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT
  • How many possible alignments?
    • C(27,7) ways to place the 7 gaps ≈ 888,000 possibilities
  • Dynamic programming: the Needleman-Wunsch
    algorithm
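The 888,000 figure is a binomial coefficient: both rows are 27 columns long, and the 20-residue sequence needs 7 gap columns. A quick Python check, assuming (as drawn above) that gaps go only into the shorter sequence:

```python
from math import comb

# Choose which 7 of the 27 alignment columns hold a gap in the shorter sequence
n_columns, n_gaps = 27, 7
print(comb(n_columns, n_gaps))  # 888030
```

This counts only one fixed layout; allowing gaps in both sequences makes the number far larger, which is why enumerating alignments one by one is hopeless and dynamic programming is used instead.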

5
Time Complexity
  • Consider two sequences
  • AAGT
  • AGTC
  • How many possible alignments do the two sequences
    have?
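One way to answer, sketched in Python: treat every monotone path through the dynamic-programming grid (diagonal, down, or right steps) as a distinct alignment. This is an assumption about what counts as "different": it forbids gap-over-gap columns but counts A-/-C and -A/C- separately. Under that convention the counts are Delannoy numbers; `count_alignments` is a hypothetical helper name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of global alignments of sequences of lengths m and n,
    counted as monotone paths through the DP grid (Delannoy numbers)."""
    if m == 0 or n == 0:
        return 1  # only one option left: remaining characters against gaps
    return (count_alignments(m - 1, n - 1)   # diagonal: pair two characters
            + count_alignments(m - 1, n)     # gap in the second sequence
            + count_alignments(m, n - 1))    # gap in the first sequence


print(count_alignments(len("AAGT"), len("AGTC")))  # 321
```

Even two 4-letter sequences already have hundreds of alignments, and the count grows exponentially with length.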

6
Scoring a sequence alignment
  • Match/mismatch score: 1/0
  • Gap open/extension penalty: 2/1
    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
  • Matches: 18 × 1
  • Mismatches: 2 × 0
  • Gap opens: 2 × (-2)
  • Gap extensions: 5 × (-1)

Score = 18 - 4 - 5 = 9
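The tally above can be reproduced with a small scoring function. It reads "open/extension penalty 2/1" as: the first position of each gap costs 2 and every further position costs 1. With that reading, the gaps of length 4 and 3 give 2 opens and 5 extensions. The function name and parameters are illustrative, not from any library.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = 0,
                    gap_open: int = 2, gap_extend: int = 1) -> int:
    """Score a pairwise alignment with an affine gap penalty:
    a gap of length k costs gap_open + (k - 1) * gap_extend."""
    assert len(top) == len(bottom), "aligned rows must have equal length"
    score = 0
    prev_gap = None  # which row had a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # continuing a gap in the same row is an extension, otherwise an open
            score -= gap_extend if prev_gap == row else gap_open
            prev_gap = row
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


print(score_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                      "----CTGATTCGC---ATCGTCTATCT"))  # 9
```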
8
Needleman Wunsch
  • Place each sequence along one axis
  • Place score 0 in the upper-left corner
  • Fill in the 1st row and 1st column with multiples
    of the gap penalty
  • Fill in each remaining cell with the maximum of 3
    possible moves
    • Vertical move: score above + gap penalty
    • Horizontal move: score to the left + gap penalty
    • Diagonal move: score up-left + match/mismatch score
  • The optimal alignment score is in the lower-right
    corner
  • To reconstruct the optimal alignment, trace back
    where the max at each step came from, stopping when
    the origin is reached.

9
Example
  • Let gap -2
  • match 1
  • mismatch -1.

Optimal alignments of AAAC and AGC (score -1) include:
AAAC    AAAC
A-GC    -AGC
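The Needleman-Wunsch steps on slide 8 can be sketched in Python as below, using this example's scoring (gap -2, match 1, mismatch -1) as defaults. It is a minimal sketch with a linear gap penalty. When several moves tie during traceback it prefers the diagonal, then the vertical move, so it returns only one of the equally optimal alignments. The function name is hypothetical.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment of s (rows) and t (columns) with a linear gap penalty.
    Returns (optimal score, aligned s, aligned t)."""
    n, m = len(s), len(t)

    def sub(a, b):
        # match/mismatch score for a diagonal move
        return match if a == b else mismatch

    # H[i][j] = best score for aligning s[:i] with t[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap  # 1st column: multiples of the gap penalty
    for j in range(1, m + 1):
        H[0][j] = j * gap  # 1st row: multiples of the gap penalty

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # diagonal
                          H[i - 1][j] + gap,                          # vertical
                          H[i][j - 1] + gap)                          # horizontal

    # Trace back from the lower-right corner until the origin is reached
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return H[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = needleman_wunsch("AAAC", "AGC")
print(score)  # -1
print(a)      # AAAC
print(b)      # -AGC
```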
10
Time Complexity Analysis
  • Initialize the 1st column and 1st row: O(n) and O(m)
  • Fill in the rest of the matrix: O(nm)
  • Traceback: O(n + m), since every step moves up, left,
    or diagonally toward the origin
  • If both strings have length n, total time is O(n²)

11
Local Alignment
  • Problem first formulated by Smith and Waterman (1981)
  • Problem: find an optimal alignment between a
    substring of s and a substring of t
  • Algorithm: a variant of the basic algorithm for
    global alignment

12
Motivation
  • Searching for unknown domains or motifs within
    proteins from different families
  • Proteins encoded by homeobox genes (conserved only
    in one region, the homeodomain, 60 amino acids
    long)
  • Identifying active sites of enzymes
  • Comparing long stretches of anonymous DNA
  • Querying databases where the query word is much
    smaller than the sequences in the database
  • Analyzing repeated elements within a single
    sequence

13
Local Alignment
  • Let gap -2
  • match 1
  • mismatch -1.

Sequences: GATCACCT and GATACCC
(Figure: Smith-Waterman scoring matrix for the two sequences; the highest cell score is 4.)
14
Smith Waterman
  • Place each sequence along one axis
  • Place score 0 in the upper-left corner
  • Fill in the 1st row and 1st column with 0s
  • Fill in each remaining cell with the maximum of 4
    possible values
    • 0
    • Vertical move: score above + gap penalty
    • Horizontal move: score to the left + gap penalty
    • Diagonal move: score up-left + match/mismatch score
  • The optimal alignment score is the maximum value in
    the matrix
  • To reconstruct the optimal alignment, start at that
    maximum cell, trace back where the max at each step
    came from, and stop when a 0 is reached
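A minimal Python sketch of the Smith-Waterman steps above, run on the slide 13 example (GATCACCT vs. GATACCC, gap -2, match 1, mismatch -1). It assumes a linear gap penalty and returns a single best local alignment. When the maximum or a traceback step is tied, other equally good answers may exist. The function name is hypothetical.

```python
def smith_waterman(s, t, match=1, mismatch=-1, gap=-2):
    """Best local alignment of s (rows) and t (columns).
    Returns (optimal score, aligned substring of s, aligned substring of t)."""
    n, m = len(s), len(t)

    def sub(a, b):
        return match if a == b else mismatch

    H = [[0] * (m + 1) for _ in range(n + 1)]  # 1st row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                                          # restart
                          H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # diagonal
                          H[i - 1][j] + gap,                          # vertical
                          H[i][j - 1] + gap)                          # horizontal
            if H[i][j] > best:  # remember the first cell holding the maximum
                best, bi, bj = H[i][j], i, j

    # Trace back from the maximum cell until a 0 is reached
    i, j = bi, bj
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GATCACCT", "GATACCC"))  # (4, 'GATCACC', 'GAT-ACC')
```

The only changes from the global version are the extra 0 in the max, the zero-initialized borders, and starting the traceback at the matrix maximum instead of the lower-right corner.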

15
Exercise
  • Let gap -2
  • match 1
  • mismatch -1.
  • Find the best local alignment
  • CGATGAAATGGA

16
Semi-global Alignment
  • Example
  • CAGCA-CTTGGATTCTCGG
  • CAGCGTGG
  • CAGCACTTGGATTCTCGG
  • CAGCGTGG
  • We like the first alignment much better. In
    semiglobal comparison, we score the alignments
    ignoring some of the end spaces.

17
Global Alignment
  • Example
  • AAACCC
  • A--CCC

empty A A A C C C
empty 0 -2 -4 -6 -8 -10 -12
A -2 1 -1 -3 -5 -7 -9
C -4 -1 0 -2 -2 -4 -6
C -6 -3 -2 -1 -1 -1 -3
C -8 -5 -4 -3 0 0 0
We would prefer to see AAACCC aligned with --ACCC
We do not want to penalize the end spaces
18
SemiGlobal Alignment
  • Example
  • s AAACCC
  • t --ACCC

empty A A A C C C
empty 0 0 0 0 0 0 0
A -2 1 1 1 -1 -1 -1
C -4 -1 0 0 2 0 0
C -6 -3 -2 -1 1 3 1
C -8 -5 -4 -3 0 2 4
19
SemiGlobal Alignment
  • Example
  • s AAACCCG
  • t --ACCC-

empty A A A C C C G
empty 0 0 0 0 0 0 0 0
A -2 1 1 1 -1 -1 -1 -1
C -4 -1 0 0 2 0 0 -2
C -6 -3 -2 -1 1 3 1 -1
C -8 -5 -4 -3 0 2 4 2
20
SemiGlobal Alignment
  • Summary of end space charging procedures

Place where spaces are not penalized, and the action required
  • Beginning of 1st sequence: initialize 1st row with zeros
  • End of 1st sequence: look for max in last row
  • Beginning of 2nd sequence: initialize 1st column with zeros
  • End of 2nd sequence: look for max in last column
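The four actions in the summary table can be folded into one scoring routine. In this Python sketch each boolean flag switches one action on, and the flag names are hypothetical, named after the table's actions. s runs along the top (columns) and t down the side (rows), as in the matrices on slides 17 to 19. With every flag off it reduces to ordinary global alignment.

```python
def semiglobal(s, t, match=1, mismatch=-1, gap=-2,
               zero_first_row=False, zero_first_col=False,
               max_last_row=False, max_last_col=False):
    """Alignment score with optional free end spaces; returns (score, matrix)."""
    rows, cols = len(t) + 1, len(s) + 1
    H = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        H[0][j] = 0 if zero_first_row else j * gap   # 1st row
    for i in range(1, rows):
        H[i][0] = 0 if zero_first_col else i * gap   # 1st column
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if t[i - 1] == s[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + sub,   # diagonal
                          H[i - 1][j] + gap,       # vertical
                          H[i][j - 1] + gap)       # horizontal
    best = H[-1][-1]                                 # lower-right corner
    if max_last_row:
        best = max(best, max(H[-1]))
    if max_last_col:
        best = max(best, max(row[-1] for row in H))
    return best, H


print(semiglobal("AAACCC", "ACCC")[0])                       # 0 (global)
print(semiglobal("AAACCC", "ACCC", zero_first_row=True)[0])  # 4
print(semiglobal("AAACCCG", "ACCC", zero_first_row=True,
                 max_last_row=True)[0])                      # 4
```

The three calls reproduce the lower-right scores of slides 17 and 18 and the last-row maximum of slide 19.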
21
Pairwise Sequence Comparison over Internet
lalign www.ch.embnet.org/software/LALIGN_form.html Global/Local
lalign fasta.bioch.virginia.edu/fasta_www/plalign.htm Global/Local
USC www-hto.usc.edu/software/seqaln/seqaln-query.html Global/Local
alion fold.stanford.edu/alion Global/Local
genome.cs.mtu.edu/align.html Global/Local
align www.ebi.ac.uk/emboss/align Global/Local
xenAliTwo www.soe.ucsc.edu/kent/xenoAli/xenAliTwo.html Local for DNA
blast2seqs www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html Local BLAST
blast2seqs web.umassmed.edu/cgi-bin/BLAST/blast2seqs Local BLAST
lalnview www.expasy.ch/tools/sim-prot.html Visualization
prss www.ch.embnet.org/software/PRSS_form.html Evaluation
prss Fasta.bioch.virginia.edu/fasta/prss.htm Evaluation
graph-align Darwin.nmsu.edu/cgi-bin/graph_align.cgi Evaluation
Bioinformatics for Dummies
22
Significance of Sequence Alignment
  • Consider randomly generated sequences. What
    distribution do you think the best local
    alignment score of two sequences of the same length
    should follow?
  • Uniform distribution
  • Normal distribution
  • Binomial distribution (n Bernoulli trials)
  • Poisson distribution (n → ∞, np → λ)
  • others

23
Extreme Value Distribution
  • $Y_{EV} = \exp\left(-x - e^{-x}\right)$

24
Extreme Value Distribution vs. Normal Distribution
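The density exp(-x - e^(-x)) from the previous slide is the standard (location 0, scale 1) extreme value, or Gumbel, distribution, whose CDF is exp(-e^(-x)). This Python sketch checks that the density integrates to 1 and compares right-tail probabilities with a standard normal. The helper names are illustrative, not from any statistics library. The point of the comparison is that the EVD tail decays like e^(-x) rather than e^(-x²/2), so a normal model would make high alignment scores look far rarer than they are.

```python
import math


def gumbel_pdf(x):
    """Standard extreme value (Gumbel) density, exp(-x - e^(-x))."""
    return math.exp(-x - math.exp(-x))


def gumbel_sf(x):
    """Right tail P(X > x) = 1 - exp(-e^(-x)) of the standard Gumbel."""
    return -math.expm1(-math.exp(-x))


def normal_sf(x):
    """Right tail P(Z > x) of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2))


# Trapezoid-rule check that the density integrates to 1 over [-10, 40]
step, n_steps = 0.001, 50_000
xs = [-10 + k * step for k in range(n_steps + 1)]
area = step * (sum(gumbel_pdf(x) for x in xs)
               - 0.5 * (gumbel_pdf(xs[0]) + gumbel_pdf(xs[-1])))
print(round(area, 4))  # 1.0

# Tail comparison: the EVD column shrinks much more slowly than the normal one
for x in (2, 4, 6):
    print(x, f"{gumbel_sf(x):.3g}", f"{normal_sf(x):.3g}")
```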
25
Twilight Zone
  • Some proteins with less than 15% similarity have
    exactly the same 3-D structure, while some
    proteins with 20% similarity have different
    structures. Homology (or non-homology) can never be
    taken for granted in the twilight zone.