Title: S1
[Figure: guide trees for sequences S1 to S5, annotated with branch lengths (0.5, 0.75, 1, 1.5, 2.75)]
2 Score
\[
\text{Score} = \frac{1}{8}\Bigl[\, s(T,V)\,w_1 w_5 + s(T,I)\,w_1 w_6 + s(L,V)\,w_2 w_5 + s(L,I)\,w_2 w_6 + s(K,V)\,w_3 w_5 + s(K,I)\,w_3 w_6 + s(K,V)\,w_4 w_5 + s(K,I)\,w_4 w_6 \,\Bigr]
\]
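- Column residues in the first group (weights w1 to w4): T (1. PEEKSAVTAL), L (2. GEEKAAVLAL), K (3.), K (4.)
- Column residues in the second group (weights w5, w6): V (5.), I (6.)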
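The weighted column score above can be sketched in a few lines. This is a minimal illustration, not the CLUSTALW source: the function name `weighted_column_score`, the `TOY_SCORES` table and the example weights are all hypothetical, and a real run would take s(x, y) from a substitution matrix.

```python
from itertools import product

def weighted_column_score(col_a, weights_a, col_b, weights_b, s):
    """Average of s(x, y) * w_x * w_y over every residue pair across two columns."""
    total = 0.0
    for (x, wx), (y, wy) in product(zip(col_a, weights_a), zip(col_b, weights_b)):
        total += s(x, y) * wx * wy
    # Divide by the number of residue pairs (4 * 2 = 8 in the slide's example).
    return total / (len(col_a) * len(col_b))

# Hypothetical substitution scores, for illustration only.
TOY_SCORES = {
    ("T", "V"): 0, ("T", "I"): -1,
    ("L", "V"): 1, ("L", "I"): 2,
    ("K", "V"): -2, ("K", "I"): -3,
}

def toy_s(x, y):
    return TOY_SCORES[(x, y)]

if __name__ == "__main__":
    # Column T, L, K, K (weights w1..w4) against column V, I (weights w5, w6).
    print(weighted_column_score("TLKK", [1, 1, 0.5, 0.5], "VI", [1, 2], toy_s))
```

With all weights set to 1 this reduces to the plain average of the eight pairwise scores.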
3 CLUSTALW
- Each sequence is weighted by how similar it is to the others
- - corrects for overrepresentation of a subfamily
- - weights are derived from the guide tree
- (in the example, the weights for S5 and S4 are 2.75 and 1.9375)
- Problem situations
- similar only in smaller regions
- large insertion in a sequence
- repetitive elements in one sequence
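A minimal sketch of deriving sequence weights from a guide tree, assuming the CLUSTALW scheme in which each branch length is shared equally among the sequences below that branch and a sequence's weight is the sum of its shares from leaf to root. The topology and the branch lengths for S1, S2 and S3 are hypothetical, chosen only so that S5 and S4 come out at the 2.75 and 1.9375 quoted in the example.

```python
# A node is (label, branch_length): label is a sequence name for a leaf,
# or a list of child nodes for an internal node.

def count_leaves(node):
    label, _ = node
    return 1 if isinstance(label, str) else sum(count_leaves(c) for c in label)

def leaf_weights(node, inherited=0.0, out=None):
    """Weight of a leaf = sum over its path of branch_length / leaves_below_branch."""
    if out is None:
        out = {}
    label, length = node
    share = inherited + length / count_leaves(node)
    if isinstance(label, str):
        out[label] = share
    else:
        for child in label:
            leaf_weights(child, share, out)
    return out

# Hypothetical guide tree (assumed topology and lengths, see the note above).
GUIDE_TREE = (
    [
        (
            [
                ([("S1", 1.0), ("S2", 1.0)], 0.5),
                ([("S3", 1.0), ("S4", 1.5)], 0.5),
            ],
            0.75,
        ),
        ("S5", 2.75),
    ],
    0.0,  # the root has no branch above it
)

if __name__ == "__main__":
    for name, weight in sorted(leaf_weights(GUIDE_TREE).items()):
        print(name, weight)
```

A sequence that sits alone on a long branch (S5 here) keeps the whole branch length, while closely related sequences split their shared branches and so receive smaller weights.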
4 Deriving the Sequence of a Genome
- Need a rapid and accurate sequencing method
- Normal sequencing technique: about 700 b per read
- Random small fragments: 2-10 kb
- http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html
5 Genome alignment and comparison
- Sequencing: fast, genomes available
- Annotation: not so fast
- - identify gene structure, function
- - cross species linking
- Comparative genomics
- - potential
- - coding and non-coding regions
- (e.g. human and mouse)
- - species-specific regions
- - selective pressure, evolution, rearrangement
6 Genome alignment and comparison - An example
- Mouse chromosome 16 and human genome
- (Mural et al., 2002, Science, 296, 1661-71)
7 Genome alignment
- Previously discussed alignment methods
- - most target a single gene
- - not accurate enough at genome scale
- - or computationally demanding
- Situation with genome comparisons
- rearrangements, repeats,
-
8 Methods in Genome Alignment > 1998
- DIALIGN: Diagonal Alignment
- ASSIRC: Accelerated Search for Similarity Regions in Chromosomes
- MUMmer: Maximal Unique Match (er)
- PipMaker: Percent Identity Plot Maker
- GLASS: Global Alignment System
- WABA: Wobble Aware Bulk Aligner
- LSH-ALL-PAIRS: Locality-Sensitive Hashing in All Pairs
9 Suffix Tree
- A suffix tree for an m-character string S
- Has exactly m leaves, numbered 1 to m
- Each internal node (other than the root) has at least 2 children
- Each edge is labeled with a non-empty substring of S
- No two edges out of the same node have labels beginning with the same character
- From the root to leaf i, the concatenation of the edge labels on the path is the suffix of S starting at position i
10 An Example of a Suffix Tree
String: atgtgtgtc
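The suffix tree properties can be checked on the example string with a small sketch. This is a naive quadratic construction (insert each suffix in turn and split edges where needed), not a linear-time algorithm such as Ukkonen's; the names `Node`, `build_suffix_tree`, `leaf_paths` and `internal_nodes_branch` are hypothetical. It assumes no suffix is a prefix of another, which holds for atgtgtgtc because the final c occurs only once.

```python
class Node:
    def __init__(self, leaf_index=None):
        self.children = {}            # first character -> (edge label, child Node)
        self.leaf_index = leaf_index  # 1-based suffix start for a leaf, else None

def build_suffix_tree(s):
    root = Node()
    for i in range(len(s)):
        _insert(root, s[i:], i + 1)
    return root

def _insert(node, suffix, index):
    while True:
        first = suffix[0]
        if first not in node.children:
            # No edge starts with this character: hang a new leaf here.
            node.children[first] = (suffix, Node(index))
            return
        label, child = node.children[first]
        k = 0
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k == len(suffix):
            raise ValueError("a suffix is a prefix of another; append a unique end marker")
        if k == len(label):
            # Whole edge matched: continue from the child with the rest of the suffix.
            node, suffix = child, suffix[k:]
            continue
        # Mismatch inside the edge: split it and add the new leaf.
        mid = Node()
        mid.children[label[k]] = (label[k:], child)
        mid.children[suffix[k]] = (suffix[k:], Node(index))
        node.children[first] = (label[:k], mid)
        return

def leaf_paths(node, prefix=""):
    """Map each leaf number to the concatenated edge labels from the root."""
    if node.leaf_index is not None:
        return {node.leaf_index: prefix}
    paths = {}
    for label, child in node.children.values():
        paths.update(leaf_paths(child, prefix + label))
    return paths

def internal_nodes_branch(node, is_root=True):
    """True if every internal node other than the root has at least 2 children."""
    if node.leaf_index is not None:
        return True
    if not is_root and len(node.children) < 2:
        return False
    return all(internal_nodes_branch(c, False) for _, c in node.children.values())

if __name__ == "__main__":
    s = "atgtgtgtc"
    tree = build_suffix_tree(s)
    for i, path in sorted(leaf_paths(tree).items()):
        print(i, path)
```

Keying the children by first character enforces the rule that no two edges out of a node start with the same character.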
11 MUMmer
Delcher AL et al., 2002, Nucleic Acids Research, Vol 30(11)
Delcher AL et al., 1999, Nucleic Acids Research, Vol 27(11)
- Maximal Unique Match (MUM)
- Occurs exactly once in genome A and exactly once in genome B
- Cannot be extended in either direction
- Difference between a MUM and a common subsequence
- Idea: long enough MUMs are assumed to be part of the alignment
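The MUM definition can be written out directly as a brute-force check. This sketch is quadratic and only suitable for short strings; it does not use the suffix tree that MUMmer relies on, and the names `count_occurrences`, `find_mums_bruteforce` and `min_len` are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def find_mums_bruteforce(a, b, min_len=1):
    """Return (start in a, start in b, match) for every maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            match = a[i:i + k]
            # Keep the match only if it is unique in both strings.
            if (k >= min_len
                    and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, match))
    return mums

if __name__ == "__main__":
    print(find_mums_bruteforce("xabcy", "zabcw"))
```

Unlike a common subsequence, each reported match is a contiguous run of identical characters in both strings. In practice `min_len` would be set high enough to filter out short chance matches.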
12 MUMmer
Delcher AL et al., 2002, Nucleic Acids Research, Vol 30(11)
Example: A = acat, B = acaa
- Build a suffix tree for A and B
- Find unique matches, then maximal matches
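The two steps above can be sketched on the example strings. As a simplifying assumption, this sketch swaps the suffix tree for a naively sorted list of suffixes of the concatenation A#B$, which exposes the same information for short inputs: a match that is unique in both strings shows up as two adjacent suffixes, one from each string, whose shared prefix is longer than what either shares with its other neighbour. The function names and the separator characters are hypothetical.

```python
def lcp(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return n

def find_mums_sorted_suffixes(a, b, min_len=1):
    assert "#" not in a + b and "$" not in a + b, "separators must not occur in the input"
    text = a + "#" + b + "$"
    order = sorted(range(len(text)), key=lambda p: text[p:])

    def origin(p):
        # Map a position in text to (source string, offset); separators map to None.
        if p < len(a):
            return "A", p
        if len(a) < p < len(text) - 1:
            return "B", p - len(a) - 1
        return None, None

    mums = []
    for r in range(len(order) - 1):
        p, q = order[r], order[r + 1]
        (src_p, off_p), (src_q, off_q) = origin(p), origin(q)
        if src_p is None or src_q is None or src_p == src_q:
            continue
        length = lcp(text[p:], text[q:])
        if length < min_len:
            continue
        # Unique: no neighbouring suffix shares the whole match.
        before = lcp(text[order[r - 1]:], text[p:]) if r > 0 else 0
        after = lcp(text[q:], text[order[r + 2]:]) if r + 2 < len(order) else 0
        if before >= length or after >= length:
            continue
        i, j = (off_p, off_q) if src_p == "A" else (off_q, off_p)
        # Maximal: the match cannot be extended to the left.
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            continue
        mums.append((i, j, a[i:i + length]))
    return sorted(mums)

if __name__ == "__main__":
    print(find_mums_sorted_suffixes("acat", "acaa"))
```

For A = acat and B = acaa, the match "ca" is unique in both strings but can be extended to the left, so it is rejected at the maximality check.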