Lecture 2: Character Homology with particular attention to sequence alignment



1
Lecture 2: Character Homology (with particular
attention to sequence alignment)
2
DNA sequences
  • Strings of characters
  • Each with one of 4 possible states
  • 4 nucleotide bases: adenine, cytosine, guanine,
    thymine

3
DNA sequences
  • Protein-coding genes
  • Other structural genes: ribosomal RNA, transfer
    RNA
  • Other DNA ("non-coding"): introns, repetitive DNA

4
Protein Coding Sequences
Structure and function determined by sequence of
amino acids
  • Myoglobin
  • 3D model

5
Protein-Coding Genes
6
Possible Changes in DNA sequences
  • Substitutions
  • Inversions
  • Insertions and deletions (indels)

7
Protein-Coding Genes
  • 3rd position of codon is generally redundant
    (many changes there leave the amino acid
    unchanged)
  • Changes at the 1st position are more common than
    at the 2nd position
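
The redundancy of the 3rd codon position can be checked directly against the standard genetic code. The sketch below is an illustration, not part of the lecture: it counts, for each codon position, what fraction of single-base changes leave the encoded amino acid (or stop signal) unchanged. The names `CODE` and `synonymous_fraction` are made up for this example, and stop codons are treated as one more "amino acid" for simplicity.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that leave the encoded amino acid (or stop) unchanged."""
    same = total = 0
    for codon, aa in CODE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODE[mutant] == aa
    return same / total

# One value per codon position (1st, 2nd, 3rd)
fractions = [synonymous_fraction(p) for p in range(3)]
print(fractions)
```

Under these assumptions the 3rd position has by far the largest share of silent changes, the 1st position a small share, and the 2nd position almost none, which is consistent with the ordering given on the slide.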

8
Purines
Pyrimidines
9
Substitutions
  • Transitions
  • purine -> purine
  • pyrimidine -> pyrimidine
  • Transversions
  • purine -> pyrimidine
  • pyrimidine -> purine
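
The two substitution classes above can be told apart mechanically. A minimal sketch, assuming the sequences contain only a, c, g and t and are already aligned position by position; the function names are hypothetical.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}

def classify_substitution(x, y):
    """Return 'identical', 'transition' or 'transversion' for two bases."""
    x, y = x.lower(), y.lower()
    if x == y:
        return "identical"
    # Both purines or both pyrimidines: the chemical class is unchanged
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"

def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq1, seq2):
        kind = classify_substitution(x, y)
        if kind in counts:
            counts[kind] += 1
    return counts

print(count_substitutions("acgt", "gcga"))
```

In the usage line, position 1 (a vs g) is a transition and position 4 (t vs a) is a transversion.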

10
Substitutions
  • Transitions
  • purine -> purine
  • pyrimidine -> pyrimidine

12
Homology in gene sequences
t c t c c a g g t g c a c g t c t t c t
a g t c t c c a g g t g c a c g t c t t
???
15
Homology in gene sequences
t c t c c a g g t g c a c g t c t t c t
a g t c t c c a g g t g c a c g t c t t
a g t c c c c a g g t g c a c g t c t t
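
Which positions are homologous depends on how the sequences are lined up. As an illustration (not the lecture's own method), the sketch below slides the second sequence against the first and counts identical bases at each offset. The helper names and the `max_shift` limit are assumptions made for this example.

```python
def matches_at_offset(top, bottom, offset):
    """Count identical bases when top[i] is paired with bottom[i + offset]."""
    return sum(
        1
        for i, base in enumerate(top)
        if 0 <= i + offset < len(bottom) and bottom[i + offset] == base
    )

def best_offset(top, bottom, max_shift=5):
    """Offset in [-max_shift, max_shift] giving the most identical bases."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: matches_at_offset(top, bottom, s))

seq_a = "tctccaggtgcacgtcttct"   # first sequence on the slide
seq_b = "agtctccaggtgcacgtctt"   # second sequence on the slide

print(matches_at_offset(seq_a, seq_b, 0))   # compared column by column as written
print(best_offset(seq_a, seq_b))            # shift that maximizes identity
print(matches_at_offset(seq_a, seq_b, best_offset(seq_a, seq_b)))
```

Compared as written, only 6 of 20 positions agree; shifting the second sequence by two bases makes 18 consecutive positions identical, so the shifted pairing is the more plausible hypothesis of positional homology.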
16
Introns and exons
17
Insertions and deletions
  • Indels generally occur in multiples of three in
    exons of protein-coding genes
  • May be any length in introns

1 a g t c t c c a g g t g c a c g t c t t
2 a g t c c c c a g g t g c a c g t c t t
18
Insertions and deletions
  • Indels generally occur in multiples of three in
    exons of protein-coding genes
  • May be any length in introns

Frame-Shift Mutation
1 a g t c t c c a g g t g c a c g t c t t
2 a g t c c c c a g g t g c a c g t c t t
3 a g t t c c c c a g g t g c a c g t c t t
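
Why a one-base insertion is a frame-shift becomes visible once the sequences are cut into codons. This is a small sketch that assumes, purely for illustration, that the reading frame starts at the first base shown; `codons` and `is_frameshift` are hypothetical helpers.

```python
def codons(seq, frame=0):
    """Split a sequence into complete codons, starting at `frame`."""
    seq = seq[frame:]
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def is_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0

seq2 = "agtccccaggtgcacgtctt"    # sequence 2 on the slide
seq3 = "agttccccaggtgcacgtctt"   # sequence 3: one extra 't' inserted

print(codons(seq2))
print(codons(seq3))
print(is_frameshift(len(seq3) - len(seq2)))
```

After the inserted base, every downstream codon of sequence 3 differs from the corresponding codon of sequence 2, whereas an insertion of three bases would add one codon and leave the rest of the frame intact.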
19
Gene families arise from gene duplications
20
  • Paralogous genes: two or more different gene
    loci in the same organism that originated by gene
    duplication

21
  • Paralogous genes: two or more different gene
    loci in the same organism that originated by gene
    duplication
  • Orthologous genes: same gene in two different
    organisms, homologous due to presence in common
    ancestor

22
Gene Copy
  • May be more than one paralogous copy of a gene in
    the genome
  • Copies may be functional, e.g. the EF1α gene
    occurs in two copies in insects
  • Non-functional copies are called pseudogenes;
    these may carry insertions and deletions of any
    length and may contain stop codons (TGA, TAG,
    TAA)
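
One quick screen for a possible pseudogene copy is to look for stop codons inside the expected reading frame. The sketch below is an assumption-laden illustration: it takes the reading frame as known, and `inframe_stops` is a hypothetical helper, not a function from any particular package.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}

def inframe_stops(seq, frame=0):
    """Start positions (0-based) of stop codons in the given reading frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]

print(inframe_stops("ATGTAAGGG"))            # TAA is in frame 0
print(inframe_stops("ATGGGTGAC"))            # TGA is present but out of frame
print(inframe_stops("ATGGGTGAC", frame=2))   # the same TGA, read in frame 2
```

A functional gene normally ends in a stop codon, so it is a stop well before the expected end of the coding region that suggests a non-functional copy.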

23
BLAST (Basic Local Alignment Search Tool)
  • National Center for Biotechnology Information
  • Algorithms to match query sequence with sequences
    in database (GenBank)
  • If sequences are homologous, there should be
    short stretches of complete identity (seeds, or
    "words")
  • Should be able to find seeds even if sequence has
    insertions and deletions
  • Finds best matches and computes overall
    similarity, probability of a match this close,
    etc.
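
The seed idea can be sketched in a few lines. This is not BLAST's implementation, only a minimal illustration of word matching: index every word of length `k` in the subject, then look up each word of the query. The function name and the word length of 8 are arbitrary choices for this example.

```python
from collections import defaultdict

def find_seeds(query, subject, k=8):
    """Return (query_pos, subject_pos) pairs where a k-letter word is identical."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds

query = "agtctccaggtgcacgtctt"      # sequence 1 from the indel slide
subject = "agttccccaggtgcacgtctt"   # sequence 3, which carries an insertion

seeds = find_seeds(query, subject)
print(seeds)
print({j - i for i, j in seeds})    # diagonals on which the seeds fall
```

Even though the subject differs from the query near its start, the identical stretch downstream still yields seeds, all on the same diagonal, which is the kind of signal a seed-and-extend search builds on.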

25
Ribosomes - the workbench for protein synthesis
27
Ribosomes
  • Consist of ribosomal RNA and proteins
  • Protein manufacturing machinery
  • rRNA synthesized in nucleolus
  • The rRNA self-assembles into two folded
    structures, the large and small subunits

28
(Image of ribosome by Harry Noller at U.C. Santa
Cruz, Venki Ramakrishnan at Cambridge, and Thomas
Steitz at Yale)
29
Eukaryotic rDNA
  • NTS - non-transcribed spacer regions
  • ITS - internal transcribed spacers
  • In thousands of tandem repeats
  • Identical due to concerted evolution

31
28S rRNA
32
Ribosomal DNA
  • Insertions and deletions are common
  • Alignment of sequences necessary to establish
    positional homologies of nucleotides

33
Sequence Alignment
X g a c g t t a g a g c t a a t c
Y g a c a g c t c g t c g a
Z g a c g c c c a t c g a g
34
Sequence Alignment
X g a c g t t a g a g c t a a t c
Y 1 g a c - - - a g c t c g t c g a
Y g a c a g c t c g t c g a
Z g a c g c c c a t c g a g
35
Sequence Alignment
X g a c g t t a g a g c t a a t c
Y 1 g a c - - - a g c t c g t c g a
Y 2 g a c - - - - - a g c t c g t c g a
Y g a c a g c t c g t c g a
Z g a c g c c c a t c g a g
37
Sequence Alignment
X g a c g t t a g a g c t a a t c
Z 1 g a c - - - - - - g c c c a t c g a g
Y 1 g a c - - - a g c t c g t c g a
Y 2 g a c - - - - - a g c t c g t c g a
Y g a c a g c t c g t c g a
Z g a c g c c c a t c g a g
38
Sequence alignment
  • How many gaps should we insert?
  • Assign cost to whole gaps or each space in a gap
    separately?
  • Lower cost for extending an existing gap?
  • Should gaps be treated as character states?
  • Assign different costs for transitions and
    transversions that result from an alignment?
  • What order to align the sequences?
  • Programs such as CLUSTAL, MALIGN, POY and others
    attempt to optimize functions of all of these
    parameters, and more
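
The difference between charging for each space in a gap and charging less for extending a gap can be made concrete by scoring one fixed alignment under both schemes. The sketch below uses the X and Y 1 rows from the Sequence Alignment slides. The cost values, and the convention that the first space of a gap pays `gap_open` while each further space pays `gap_extend`, are assumptions for illustration only.

```python
def alignment_cost(row_a, row_b, substitution=1, gap_open=2, gap_extend=1):
    """Cost of a given pairwise alignment ('-' marks a gap position)."""
    cost = 0
    gap_in = None  # which row the current gap is in, if any
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Extending a gap in the same row is charged gap_extend
            cost += gap_extend if gap_in == row else gap_open
            gap_in = row
        else:
            gap_in = None
            if x != y:
                cost += substitution
    return cost

x_row = "gacgttagagctaatc"    # X
y1_row = "gac---agctcgtcga"   # Y 1

print(alignment_cost(x_row, y1_row, gap_open=2, gap_extend=2))  # every space costs 2
print(alignment_cost(x_row, y1_row, gap_open=2, gap_extend=1))  # cheaper extension
```

This alignment has one gap of three spaces and seven mismatches, so it costs 13 when every space is charged 2, and 11 when extending the gap costs only 1. The choice of scheme therefore changes which of several candidate alignments looks best.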

39
CLUSTAL X
  • Multiple Alignment Program
  • First, produce guide tree
  • Compute distances between sequences
  • Cluster most similar sequences together
  • Use tree as template for order of sequential
    pairwise alignments
  • Conduct sequential pairwise alignments
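
The first steps above (compute distances, then group the most similar sequences) can be sketched as follows. This is not CLUSTAL's actual distance measure or tree-building method; as a stand-in it uses a simple distance based on shared 3-letter words, and it only ranks pairs rather than building a full guide tree. All names are hypothetical.

```python
from itertools import combinations

def kmers(seq, k=3):
    """Set of all words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=3):
    """1 minus the fraction of k-letter words shared by the two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0

def pairs_by_similarity(seqs, k=3):
    """Sequence-name pairs ordered from most to least similar."""
    return sorted(
        combinations(sorted(seqs), 2),
        key=lambda pair: kmer_distance(seqs[pair[0]], seqs[pair[1]], k),
    )

# X, Y and Z from the Sequence Alignment slides
seqs = {"X": "gacgttagagctaatc", "Y": "gacagctcgtcga", "Z": "gacgcccatcgag"}
for a, b in pairs_by_similarity(seqs):
    print(a, b, round(kmer_distance(seqs[a], seqs[b]), 2))
```

The pair listed first would be aligned first in a progressive scheme, and the remaining sequence would then be aligned to that pair.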

40
Clustal X Alignment Parameters (slow, accurate
alignments)
  • Penalty for opening first gap
  • Penalty for changing size of existing gap
  • Penalty for each inferred transition
  • Penalty for each inferred transversion
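
How such penalties drive a pairwise alignment can be shown with a small dynamic-programming sketch. This is a textbook-style global alignment that minimizes total cost, not Clustal X's code. To stay short it charges one flat cost per gap position instead of separate opening and extension penalties, and the default values of `gap`, `ts` and `tv` are arbitrary assumptions.

```python
PURINES = {"a", "g"}

def sub_cost(x, y, ts=1, tv=2):
    """0 for a match, ts for a transition, tv for a transversion."""
    if x == y:
        return 0
    return ts if (x in PURINES) == (y in PURINES) else tv

def global_align(a, b, gap=2, ts=1, tv=2):
    """Return (cost, aligned_a, aligned_b) for a minimum-cost global alignment."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1], ts, tv),
                D[i - 1][j] + gap,   # gap in b
                D[i][j - 1] + gap,   # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
                D[i][j] == D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1], ts, tv):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return D[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

cost, row_x, row_y = global_align("gacgttagagctaatc", "gacagctcgtcga")
print(cost)
print(row_x)
print(row_y)
```

Raising `gap` relative to `ts` and `tv` pushes the result toward fewer gaps and more inferred substitutions; lowering it does the opposite.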

41
Optimization Alignment: Ward Wheeler (AMNH)
  • Completely different approach
  • Argues that the strategy of multiple sequence
    alignment followed by phylogenetic analysis is
    fundamentally flawed
  • Proposes instead to integrate the process of
    sequence alignment with phylogenetic analysis in
    an iterative, recursive analysis
  • POY program
  • Attempts to optimize both phylogenetic analysis
    and alignment simultaneously
  • VERY CPU intensive

42
Optimization Alignment
  • Minimize both insertion/deletion events and
    substitutions
  • Alignments are dynamic and uniquely tailored to
    each topology
  • Alignments chosen to minimize tree length
  • Homoplasy and alignment cost functions minimized
    simultaneously

43
Sensitivity Analysis
  • Explore different regions of alignment parameter
    space by trying different combinations of
    parameters
  • Determine how sensitive results are to changes in
    different parameters
  • Determine optimal combinations of parameters
  • Criterion for optimality is often congruence
    (concordance) with other data sets after analysis
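
Exploring parameter space amounts to repeating the alignment over a grid of parameter values and recording how the outcome changes. The sketch below is a toy version for one pair of sequences: for each combination of gap cost and transversion cost (transition cost fixed at 1) it reports the minimum alignment cost and the number of gap positions in a minimum-cost alignment that uses as few gaps as possible. The grid values and function name are assumptions, and the congruence criterion mentioned above is not implemented.

```python
from itertools import product

PURINES = {"a", "g"}

def alignment_summary(a, b, gap, ts, tv):
    """(minimum cost, gap positions) for a global alignment of a and b.

    Ties in cost are broken in favour of fewer gap positions."""
    def sub(x, y):
        if x == y:
            return 0
        return ts if (x in PURINES) == (y in PURINES) else tv

    prev = [(j * gap, j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [(i * gap, i)]
        for j in range(1, len(b) + 1):
            diag = (prev[j - 1][0] + sub(a[i - 1], b[j - 1]), prev[j - 1][1])
            up = (prev[j][0] + gap, prev[j][1] + 1)
            left = (cur[j - 1][0] + gap, cur[j - 1][1] + 1)
            cur.append(min(diag, up, left))
        prev = cur
    return prev[-1]

x, y = "gacgttagagctaatc", "gacagctcgtcga"   # X and Y from the alignment slides
results = {
    (gap, tv): alignment_summary(x, y, gap, ts=1, tv=tv)
    for gap, tv in product((1, 2, 4), (1, 2, 4))
}
for (gap, tv), (cost, gaps) in sorted(results.items()):
    print(f"gap={gap} tv={tv} cost={cost} gap_positions={gaps}")
```

If the number of inferred gap positions changes across the grid, the alignment (and potentially any tree built from it) is sensitive to those parameters.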

44
28S rRNA
45
Structural Alignment
  • Another approach to alignment of molecules for
    which we have models of secondary structure
  • ribosomal sequences, tRNA, etc.

46
28S rRNA - D2 region
48
Structural Alignment (with thanks to Joe
Gillespie and Matt Yoder)
50
Structural Alignment
57
Structural Alignment
Regions of unambiguous homology
58
Some recent reviews
  • Morrison, D. A. 2006. Multiple sequence
    alignment for phylogenetic purposes. Australian
    Systematic Botany 19: 479-539.
  • Ogden & Rosenberg. 2006. Multiple Sequence
    Alignment Accuracy and Phylogenetic Inference.
    Syst. Biol. 55(2): 314-328.
  • Höhl & Ragan. 2007. Is Multiple-Sequence
    Alignment Required for Accurate Inference of
    Phylogeny? Syst. Biol. 56(2): 206-221.
  • Parmentier et al. 2006. Large scale multiple
    sequence alignment with simultaneous phylogeny
    inference. J. Parallel Distrib. Comput. 66:
    1534-1545.
  • Phillips, A. 2000. Multiple Sequence Alignment
    in Phylogenetic Analysis. Molecular Phylogenetics
    and Evolution 16(3): 317-330.

59
Lutzoni et al. (2000) Systematic Biology 49(4):
628-651
  • Method to integrate ambiguously aligned regions
    into analysis
  • Retain positional homologies
  • Calculate minimum distances necessary to
    transform one fragment into another

60
Structural Alignment
Regions of unambiguous homology
61
Fixed State Optimization
  • Treat contiguous strings of nucleotides as
    character states
  • Calculate minimum pairwise distances between such
    states
  • Attempt to find topology that minimizes these
    distances
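
The second step, minimum pairwise distances between fragment states, can be illustrated with a plain edit distance. The sketch assumes unit costs for substitutions, insertions and deletions, which is a simplification; the weighting used in any published analysis may differ. The example fragments are made up.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j - 1] + (x != y),  # substitution (or match)
                prev[j] + 1,             # deletion from a
                cur[j - 1] + 1,          # insertion into a
            ))
        prev = cur
    return prev[-1]

def step_matrix(states):
    """Pairwise transformation costs between fragment states."""
    return [[edit_distance(s, t) for t in states] for s in states]

# Hypothetical fragments from an ambiguously aligned region, one per taxon
fragments = ["gtta", "gta", "gcca", "gtta"]
for row in step_matrix(fragments):
    print(row)
```

Each distinct fragment then acts as one character state, and the matrix gives the cost of changing from one state to another when evaluating a topology.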

62
Alignment methods (Wheeler, Cladistics, 2001)
63
Saturation of gene sequences
t c t c c a g g t g c a c g t c t t c t
1 a g t c t c c a g g t g c a c g t c t t
2 a g t c c c c a g g t g c a c g t c t t
64
Saturation of gene sequences
t c t c c a g g t g c a c g t c t t c t
1 a g t c t c c a g g t g c a c g t c t t
2 a g t c c c c a g g t g c a c g t c t t
3 a g t c t c c a g g t g c a c g t c t t
65
Saturation of gene sequences
t c t c c a g g t g c a c g t c t t c t
1 a g t c t c c a g g t g c a c g t c t t
2 a g t c c c c a g g t g c a c g t c t t
3 a g t c t c c a g g t g c a c g t c t t
4 a g t c a c c a g g t g c a c g t c t t
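
The effect of repeated changes at one site can be quantified with the four numbered sequences above. The sketch assumes, for illustration, that sequences 1 to 4 are successive states of a single lineage; the variable names are hypothetical.

```python
def differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

lineage = [
    "agtctccaggtgcacgtctt",  # 1
    "agtccccaggtgcacgtctt",  # 2: t -> c at the fifth position
    "agtctccaggtgcacgtctt",  # 3: c -> t, back to the original state
    "agtcaccaggtgcacgtctt",  # 4: t -> a
]

# Changes that actually happened, counted step by step
actual = sum(differences(a, b) for a, b in zip(lineage, lineage[1:]))
# Differences visible when only the first and last sequences are compared
observed = differences(lineage[0], lineage[-1])

print(actual, observed)
```

Three substitutions occurred, but comparing sequence 1 with sequence 4 shows a single difference, and comparing 1 with 3 shows none. Observed differences therefore underestimate the true amount of change once a site has been hit more than once.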