Sequence Alignments - PowerPoint PPT Presentation

Provided by: cclo
Transcript and Presenter's Notes

Title: Sequence Alignments


1
Sequence Alignments
  • BIOL/CHEM 4900

2
Reading
  • Chapter 2 in your textbook

3
Sequence Alignments
  • Alignment between two or more nucleotide or amino
    acid sequences
  • Similarity between sequences
  • What can this tell you?
  • In this chapter
  • How do we align two or more sequences?
  • How do we evaluate these alignments?
  • What conclusions can we make based on these
    alignments?

4
Dot Plots
  • Used to visualize regions of similarity
  • One sequence placed on the x-axis, the other on
    the y-axis
  • Dots are placed in the plot where the two
    sequences are identical
  • Diagonal lines in plot indicate regions of
    similarity
  • Example: compare ATCG to GATC
  • Advantages: easy, quick
  • Disadvantages: only gives regions of similarity,
    not actual alignment
  • What would the dot plot look like with longer
    sequences?
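
The dot plot rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the DotPlot program used later in the course; the function name `dot_plot` is made up for this example.

```python
def dot_plot(x_seq, y_seq):
    """Return one text row per residue of y_seq.

    A '*' marks a position where the x-axis and y-axis residues are
    identical; a '.' marks every other position.
    """
    return [
        "".join("*" if x == y else "." for x in x_seq)
        for y in y_seq
    ]


# ATCG on the x-axis, GATC on the y-axis (the example from the slide).
print("  ATCG")
for residue, row in zip("GATC", dot_plot("ATCG", "GATC")):
    print(residue, row)
```

The shared ATC shows up as three dots along one diagonal, which is the pattern to look for in larger plots.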

5
Noise in Dot Plots
  • Control by adjusting the following
  • Window size
  • Similarity cutoff
  • Removing too much noise might conceal small
    regions of similarity
  • Example: GCTAGTCAGA and GATGGTCACA

Complete this plot!
  • Window of 1, similarity cutoff of 1
  • Window of 4, similarity cutoff of 3
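
One way to see how window size and similarity cutoff filter noise is the sketch below. It assumes the window starts at the plotted position and that a dot is drawn when at least `cutoff` of the `window` residue pairs are identical; real dot plot programs may center the window or count differently. The name `windowed_dot_plot` is hypothetical.

```python
def windowed_dot_plot(x_seq, y_seq, window=1, cutoff=1):
    """Dot at (row i, column j) when at least `cutoff` of the `window`
    residue pairs starting at y_seq[i] and x_seq[j] are identical."""
    rows = []
    for i in range(len(y_seq) - window + 1):
        row = ""
        for j in range(len(x_seq) - window + 1):
            matches = sum(
                x_seq[j + k] == y_seq[i + k] for k in range(window)
            )
            row += "*" if matches >= cutoff else "."
        rows.append(row)
    return rows


x, y = "GCTAGTCAGA", "GATGGTCACA"
for w, c in [(1, 1), (4, 3)]:
    print(f"window={w}, cutoff={c}")
    print("\n".join(windowed_dot_plot(x, y, w, c)))
```

With a window of 1 and cutoff of 1 this reduces to the plain identity plot; the larger window keeps the shared GTCA region and drops most isolated dots.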
6
Dot Plots in Excel
7
Try the DotPlot Program
  • Download the program from this link
  • It will automatically save the program and
    several files to your desktop
  • Open DotPlot application
  • Load sequences as FASTA text files
  • File, Open Horizontal, Browse
  • File, Open Vertical, Browse
  • Parameters menu changes length and cutoff
  • Draw, Identities shows plot
  • Clear the screen after changing parameters to
    see the new plot
  • Example: Bos taurus and porcine myoglobin mRNA
    sequences (sequences on course website)

8
Simple Alignments
  • Molecular changes occur when organisms evolve
  • Mutation
  • Most common
  • Insertion
  • Deletion
  • Gaps in alignments
  • Added to account for insertions/deletions
  • Goal: obtain the optimal alignment
  • Most likely to represent the true relationship
    between homologous sequences
  • Consider the following sequences: AATCTATA and
    AAGATA
  • Either 2 insertions in first sequence or 2
    deletions in second sequence
  • What is the optimal alignment?

9
  • If no gaps are allowed, there are three ways the
    sequences can be aligned
  • AATCTATA    AATCTATA    AATCTATA
  • AAGATA       AAGATA       AAGATA
  • Which alignment is optimal?
  • Scoring alignments
  • Match score: credit for each identical aligned pair
  • Mismatch score: penalty for nonidentical
    residues
  • Total score: sum of match and mismatch scores
  • Higher score: better alignment
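
The three gap-free placements can be scored mechanically. The sketch below slides the shorter sequence along the longer one and sums the scores; the values match = 1 and mismatch = 0 are borrowed from a later slide and are an assumption here, as is the function name `ungapped_scores`.

```python
def ungapped_scores(long_seq, short_seq, match=1, mismatch=0):
    """Score every gap-free placement of short_seq along long_seq."""
    scores = []
    for offset in range(len(long_seq) - len(short_seq) + 1):
        window = long_seq[offset:offset + len(short_seq)]
        score = sum(
            match if a == b else mismatch
            for a, b in zip(window, short_seq)
        )
        scores.append(score)
    return scores


# One score per placement: offset 0, offset 1, offset 2.
print(ungapped_scores("AATCTATA", "AAGATA"))
```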

10
  • If gaps are allowed, there are many more ways the
    sequences can be aligned
  • Three examples
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
  • Scoring must now account for gaps
  • Gap penalty: penalty for each residue aligned
    with a gap (-)
  • Total score: sum of match scores, mismatch scores,
    and gap penalties

11
  • If match = 1, mismatch = 0, and gap penalty = -1,
    what are the scores for these three alignments?
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
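
To check an answer to this exercise, a column-by-column scorer is enough. This is a minimal sketch using the slide's values (match = 1, mismatch = 0, gap penalty = -1); the function name `alignment_score` is made up for illustration.

```python
def alignment_score(top, bottom, match=1, mismatch=0, gap=-1):
    """Sum per-column scores for two aligned strings of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap          # residue aligned with a gap
        elif a == b:
            score += match        # identical pair
        else:
            score += mismatch     # nonidentical pair
    return score


for bottom in ("AAG-AT-A", "AA-G-ATA", "AA--GATA"):
    print(bottom, alignment_score("AATCTATA", bottom))
```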

12
Gap Penalties
  • Is it more likely to have one longer
    insertion/deletion, or multiple smaller ones?
  • Two types of gap penalties
  • Length penalty
  • Penalty for each residue aligned with -
  • Origination penalty
  • Penalty for presence of a gap
  • Allows differentiation between alignments with
    many short gaps and those with fewer, longer gaps
  • Further penalizes for rare insertion/deletion
    (indel) events

13
  • If match = 1, mismatch = 0, length penalty = -1,
    and origination penalty = -2, what are the scores
    for these three alignments?
  • AATCTATA AATCTATA AATCTATA
  • AAG-AT-A AA-G-ATA AA--GATA
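
The two-part gap penalty can be sketched as follows: every gap position costs the length penalty, and the first position of each run of gaps also costs the origination penalty. The function name `affine_alignment_score` is hypothetical, and the sketch assumes no column contains a gap in both rows.

```python
def affine_alignment_score(top, bottom, match=1, mismatch=0,
                           length_penalty=-1, origination_penalty=-2):
    """Score an alignment with separate origination and length penalties."""
    score = 0
    previous_gap_row = None  # row that held '-' in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            if gap_row != previous_gap_row:
                score += origination_penalty  # a new gap starts here
            score += length_penalty           # charged for every gap position
            previous_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            previous_gap_row = None
    return score


for bottom in ("AAG-AT-A", "AA-G-ATA", "AA--GATA"):
    print(bottom, affine_alignment_score("AATCTATA", bottom))
```

Under this scheme the alignment with one two-residue gap scores higher than the alignments with two separate one-residue gaps, even though all three have the same number of gap positions.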

14
Terminal Gaps
  • Might not actually be indels
  • Data could be incomplete
  • Sometimes ignored in scoring
  • AATCTATAGC
  • AAG--ATA--
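
The effect of ignoring terminal gaps can be seen by scoring this alignment both ways. The sketch assumes match = 1, mismatch = 0, and gap penalty = -1 (the values used on the earlier slides); `score_alignment` and its `ignore_terminal` flag are illustrative names only.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1,
                    ignore_terminal=False):
    """Score an alignment, optionally skipping gaps at either end."""
    columns = list(zip(top, bottom))
    if ignore_terminal:
        # Drop leading and trailing columns that contain a gap.
        while columns and "-" in columns[0]:
            columns.pop(0)
        while columns and "-" in columns[-1]:
            columns.pop()
    score = 0
    for a, b in columns:
        if a == "-" or b == "-":
            score += gap
        else:
            score += match if a == b else mismatch
    return score


top, bottom = "AATCTATAGC", "AAG--ATA--"
print("terminal gaps counted:", score_alignment(top, bottom))
print("terminal gaps ignored:",
      score_alignment(top, bottom, ignore_terminal=True))
```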

15
Mismatch Penalties
  • Different mismatch scores depending on particular
    nucleotide or amino acid that is mismatched
  • Reward mismatches that are more likely to occur
    (common substitutions)
  • Nucleotides
  • Purine vs. pyrimidine
  • Transitions vs. transversions
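
A short sketch of how a scoring function might distinguish the two kinds of nucleotide mismatch. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); a transversion swaps one class for the other. The score values in `SCORES` are hypothetical, chosen only to show the more common transitions being penalized less.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores, for illustration only.
SCORES = {"match": 1, "transition": -1, "transversion": -5}


def substitution_type(a, b):
    """Classify an aligned nucleotide pair."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def nucleotide_score(a, b):
    """Look up the score for an aligned nucleotide pair."""
    return SCORES[substitution_type(a, b)]


for pair in ("AA", "AG", "CT", "AC", "GT"):
    print(pair, substitution_type(*pair), nucleotide_score(*pair))
```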

16
Scoring Matrices
  • Show scores for all non-gap positions in
    alignment
  • For nucleotide sequences

[Figure: example nucleotide scoring matrices: identity (sparse), BLAST, and transition/transversion]
17
Matrices for Proteins
  • Amino acids
  • 1. Structure and properties
  • Substitution of similar AAs
  • more likely to retain protein function
    (conservative substitution)
  • 2. Genetic code
  • Minimum number of nucleotide substitutions needed
    to convert a codon

18
Matrices for Proteins
  • 3. Actual observed substitution rates
  • Point accepted mutation (PAM)
  • Alignment constructed from sequences with high
    similarity (>85%)
  • Calculate relative mutability (m_j)
  • Number of times one amino acid (j) is substituted
    by any other
  • Calculate specific substitution (A_ij)
  • Number of times j is substituted by a specific
    amino acid i
  • See Box 2.1 (page 40)

19
PAM Example
  • Ambiguities
  • X: ambiguous amino acid
  • B: Asn or Asp
  • Z: Gln or Glu
  • Some algorithms take ambiguities into account when
    scoring: some count them as identical, others
    ignore them
  • If the sequence has lots of ambiguities, scores
    may not be reliable with certain types of software
  • Identical amino acids: highest score
  • Conservative substitution: next highest score
  • Non-conservative substitution: lowest score

20
PAM Matrices
  • A PAM matrix is normalized to represent
    substitution over a fixed period of evolutionary
    change
  • PAM-1
  • 1 substitution per 100 residues
  • Matrix represents probability of AA substitution
    in the time it takes for 1% of all residues to be
    substituted
  • Used to compare sequences that are closely
    related
  • PAM-1000
  • Used for sequences with distant relationships
  • PAM-250
  • Commonly used middle ground

21
BLOSUM Matrix
  • Also derived from observing substitution rates in
    proteins
  • Looks at clusters of amino acid sequences
  • Lower numbered matrices used for more distantly
    related sequences
  • BLOSUM-45 vs. BLOSUM-80
  • BLOSUM-62 is default

22
PAM and BLOSUM
Less divergent <----------------------> More divergent
BLOSUM 80       BLOSUM 62       BLOSUM 45
PAM 1           PAM 250         PAM 1000
23
Types of Scores
  • Raw Score
  • Protein and nucleotide alignments
  • Sum the scores for matches, mismatches, and gaps
  • Percent identities
  • Protein and nucleotide alignments
  • Ratio of residues that match up in both sequences
    to total number of residues compared
  • Percent positives
  • Protein alignments only
  • Matrix values >= 1 are called positives
  • Ratio of positive values to total number of
    residues compared

24
An Example
  • Alignment of mouse and crayfish trypsin
  • Raw score
  • Identities
  • Positives

Mouse     I  V  G  G  Y  N  C  E  E  N  S  V  P  Y  Q
Score     5  4  5  5 -3  2 -2  2  3  0  0 -1  6 10  4
Crayfish  I  V  G  G  T  D  A  V  L  G  E  F  P  Y  Q
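
The three score types can be computed directly from this example. The sketch below uses the sequences and per-position scores exactly as they appear in the transcript, and assumes that a "positive" is any position whose matrix value is at least 1.

```python
mouse = "IVGGYNCEENSVPYQ"
crayfish = "IVGGTDAVLGEFPYQ"
# Per-position matrix values as listed on the slide.
scores = [5, 4, 5, 5, -3, 2, -2, 2, 3, 0, 0, -1, 6, 10, 4]

compared = len(scores)
raw_score = sum(scores)
identities = sum(a == b for a, b in zip(mouse, crayfish))
positives = sum(value >= 1 for value in scores)  # assumed threshold

print("Raw score:", raw_score)
print(f"Identities: {identities}/{compared} "
      f"({100 * identities / compared:.0f}%)")
print(f"Positives: {positives}/{compared} "
      f"({100 * positives / compared:.0f}%)")
```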
25
Algorithms for Alignments
  • Global
  • Dynamic programming
  • Breaking a problem down into smaller subproblems,
    then rebuilding
  • Needleman and Wunsch
  • Aligns whole sequences
  • All gaps accounted for (internal and terminal)
  • Semiglobal
  • Revised by Needleman and Wunsch
  • Aligns whole sequences
  • Only internal gaps count
  • Local
  • Smith and Waterman
  • Aligns localized regions of similarity
  • Ignore gaps

26
Partial Scores Table
  • Used to align sequences
  • Top and left axes labeled with sequences
  • Contains alignment scores for all alignment
    options
  • Used to determine optimal alignment
  • Example: alignment of ACTCG and ACAGTAG
  • Rules for global alignment
  • Horizontal move: -1 (indicates gap in left axis)
  • Vertical move: -1 (indicates gap in top axis)
  • Diagonal move: +1 for match or 0 for mismatch
  • First row and column are initialized with
    multiples of gap penalty
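
The rules above translate into a short dynamic programming routine. This is a minimal sketch of a Needleman-Wunsch style global alignment using the slide's scores; the function name `global_align` and the tie-breaking order in the traceback (diagonal, then vertical, then horizontal) are choices made for this example, so other equally good alignments may exist.

```python
def global_align(top, left, match=1, mismatch=0, gap=-1):
    """Fill the partial scores table and trace back one optimal path."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and column hold multiples of the gap penalty.
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        table[i][0] = i * gap

    def pair_score(i, j):
        return match if left[i - 1] == top[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + pair_score(i, j),  # diagonal
                table[i - 1][j] + gap,   # vertical: gap in top axis
                table[i][j - 1] + gap,   # horizontal: gap in left axis
            )

    # Trace back from the bottom right to the top left.
    i, j = rows - 1, cols - 1
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + pair_score(i, j):
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
    return table[-1][-1], "".join(reversed(top_aln)), "".join(reversed(left_aln))


score, top_row, left_row = global_align("ACAGTAG", "ACTCG")
print(score)
print(top_row)
print(left_row)
```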

27
Initial Partial Scores Table
28
  • Start in outlined box
  • Calculate the possible scores from diagonal,
    above, and left
  • Put the LARGEST (best) score in the box
  • Move across table to complete first row
  • Move to second row, etc., until table is complete

Diagonal: 0 + 1 (match) = 1
Top: -1 - 1 = -2
Left: -1 - 1 = -2
29
Diagonal: -1 + 0 (mismatch) = -1
Top: -2 - 1 = -3
Left: 1 - 1 = 0
30
Completed Table
Now, trace the optimal path. Start at the bottom
right, and move in the direction that gave that
score. End at the top left.
31
Completed Path
Now, write the alignment
32
Writing the Alignment from the Partial Scores
Table
  • A diagonal arrow means the two residues are aligned
  • A vertical arrow means there is a gap in top axis
  • A horizontal arrow means there is a gap in left axis

33
Semiglobal Alignments
  • Only internal gaps count
  • Do not penalize gaps at ends of sequence
  • Rules for semiglobal alignment
  • Horizontal move: -1 (indicates gap in left axis)
    EXCEPT in bottom row
  • Vertical move: -1 (indicates gap in top axis)
    EXCEPT in last column
  • Diagonal move: +1 for match or 0 for mismatch
  • First row and column are initialized to zero
  • Example: align ACACTG and ACACTGATCG
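
The semiglobal rules differ from the global ones only in where gap penalties are waived. The sketch below fills the table under those rules and returns the final score; `semiglobal_score` is a hypothetical name, and the traceback is omitted to keep the example short.

```python
def semiglobal_score(top, left, match=1, mismatch=0, gap=-1):
    """Fill a partial scores table in which terminal gaps are free."""
    rows, cols = len(left) + 1, len(top) + 1
    # First row and column stay at zero: leading gaps are not penalized.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            diagonal = table[i - 1][j - 1] + pair
            # Horizontal moves are free in the bottom row.
            horizontal = table[i][j - 1] + (0 if i == rows - 1 else gap)
            # Vertical moves are free in the last column.
            vertical = table[i - 1][j] + (0 if j == cols - 1 else gap)
            table[i][j] = max(diagonal, horizontal, vertical)
    return table[-1][-1], table


score, table = semiglobal_score("ACACTGATCG", "ACACTG")
print(score)
for row in table:
    print(row)
```

Because the four trailing gaps cost nothing, the score equals the number of matched residues.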

34
Initial Partial Scores Table
35
Diagonal: 0 + 0 (mismatch) = 0
Top: 0 + 0 (no penalty, last column) = 0
Left: 0 - 1 = -1
36
Diagonal: 0 + 0 (mismatch) = 0
Top: 0 - 1 = -1
Left: 0 + 0 (no penalty, last row) = 0
37
Completed Table
38
Completed Path and Alignment
ACACTGATCG
ACACTG----
39
Local Alignments
  • Used to find best matching subsequences within
    two sequences
  • Rules for local alignment
  • Horizontal move: -1
  • Vertical move: -1
  • Diagonal move: +1 for match or -1 for mismatch
  • First row and column are initialized to zero
  • Place a zero in the table if all other scores are
    negative for that box
  • When determining path, find highest number on
    table, and work back until you come to a zero
  • Example: GCGATATA and AACCTATAGCT
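
The local alignment rules can be sketched the same way as the global case, with a floor of zero in every box and a traceback that starts at the highest value. This is an illustrative Smith-Waterman style routine under the slide's scores; `local_align` is a made-up name, and if several boxes tie for the highest value this sketch simply uses the first one it finds.

```python
def local_align(top, left, match=1, mismatch=-1, gap=-1):
    """Return the best local score and one pair of aligned subsequences."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(i, j):
        return match if left[i - 1] == top[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                0,                                        # never go negative
                table[i - 1][j - 1] + pair_score(i, j),   # diagonal
                table[i - 1][j] + gap,                    # vertical
                table[i][j - 1] + gap,                    # horizontal
            )
            if table[i][j] > best:
                best, best_pos = table[i][j], (i, j)

    # Work back from the highest value until a zero is reached.
    i, j = best_pos
    top_aln, left_aln = [], []
    while i > 0 and j > 0 and table[i][j] > 0:
        if table[i][j] == table[i - 1][j - 1] + pair_score(i, j):
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + gap:
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(local_align("AACCTATAGCT", "GCGATATA"))
```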

40
Completed Table
41
Alignment
Start with the highest value; continue until you
reach zero.
TATA
TATA
42
Next: BLAST!
  • Let's let the computer do the work