The Basic Local Alignment Search Tool (BLAST) - PowerPoint PPT Presentation

Slides: 144
Provided by: publ153
Transcript and Presenter's Notes

Title: The Basic Local Alignment Search Tool (BLAST)


1
The Basic Local Alignment Search Tool(BLAST)
  • Rapid database search tool (1990)
  • Idea
  • (1) Search for high-scoring segment pairs


2
The Basic Local Alignment Search Tool(BLAST)
  • A Y W T Y I V A L T Q V R Q Y E A T
  • S I L C I V M I Y S R A - Q Y R Y W R Y
  • Most local alignments contain highly conserved
    sections without gaps


3
The Basic Local Alignment Search Tool(BLAST)
  • A Y W T Y I V A L T Q V R Q Y E A T
  • S I L C I V M I Y S R A - Q Y R Y W R Y
  • -> Search for high-scoring segment pairs
    (HSPs), i.e. gap-free local alignments


4
The Basic Local Alignment Search Tool(BLAST)
5
The Basic Local Alignment Search Tool(BLAST)
  • A Y W T Y I V A L T Q V R Q Y E A T
  • S I L C I V M I Y S R A - Q Y R Y W R Y
  • Advantages
  • (a) speed
  • (b) a statistical theory for HSPs exists.


6
The Basic Local Alignment Search Tool(BLAST)
  • Rapid database search tool (1990)
  • Idea
  • (1) Search for high-scoring segment pairs
  • (2) Use word pairs as seeds


7
Pair-wise sequence alignment

  • T W L M H C A Q Y I
  • C
  • I
  • M X
  • H X
  • C X
  • T
  • H
  • Y
  • (1) Search for word pairs of length 3 with score > T;
    use them as seeds.

8
Pair-wise sequence alignment

  • Naïve algorithm would have a complexity of O(l1 · l2)
  • Solution
  • Preprocess the query sequence
  • Compile a list of all words that have a score > T
    when aligned to a word in the query.

9
Pair-wise sequence alignment

  • Naïve algorithm would have a complexity of O(l1 · l2)
  • Solution
  • Preprocess the query sequence
  • Compile a list of all words that have a score > T
    when aligned to a word in the query. Complexity O(l1)
  • Organize words in an efficient data structure (tree)
    for fast look-up
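
The preprocessing step above can be sketched as follows. This is a minimal illustration, not BLAST's implementation: the match/mismatch scores are assumed in place of a real substitution matrix, a Python dict stands in for the tree mentioned on the slide, and the names neighborhood_index and find_seeds are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def word_score(a, b, match=2, mismatch=-1):
    """Score two equal-length words position by position (assumed toy scoring)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood_index(query, w=3, threshold=3, alphabet=AMINO_ACIDS):
    """Map every word scoring > threshold against a query word to its query positions."""
    index = {}
    for pos in range(len(query) - w + 1):
        query_word = query[pos:pos + w]
        for candidate in map("".join, product(alphabet, repeat=w)):
            if word_score(query_word, candidate) > threshold:
                index.setdefault(candidate, []).append(pos)
    return index


def find_seeds(index, subject, w=3):
    """Scan the subject once; each look-up yields (query_pos, subject_pos) seeds."""
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            yield (i, j)


strict_seeds = list(find_seeds(neighborhood_index("TWLMHCAQYI", threshold=3), "CIMHCTHY"))
loose_seeds = list(find_seeds(neighborhood_index("TWLMHCAQYI", threshold=2), "CIMHCTHY"))
```

With the stricter threshold only the exact word MHC seeds an alignment; lowering the threshold also admits words with one mismatch, which is the trade-off between sensitivity and the number of seeds to extend.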

10
The Basic Local Alignment Search Tool(BLAST)
  • Rapid database search tool (1990)
  • Idea
  • (1) Search for high-scoring segment pairs
  • (2) Use word pairs as seeds
  • (3) Extend seed alignments until the score drops
    below a threshold value


11
Pair-wise sequence alignment

  • T W L M H C A Q Y I
  • C
  • I
  • M X
  • H X
  • C X
  • T
  • H
  • Y
  • Extend seeds until score drops by X.

12
Pair-wise sequence alignment

  • T W L M H C A Q Y I
  • C
  • I X
  • M X
  • H X
  • C X
  • T X
  • H X
  • Y
  • Extend seeds until score drops by X.
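
The extension step can be sketched as below, assuming simple match/mismatch scores instead of a substitution matrix. The function name extend_seed and the parameter x_drop are illustrative; the extension in each direction stops once the running score has fallen x_drop below the best score seen so far.

```python
def extend_seed(q, s, i, j, w, x_drop, match=2, mismatch=-1):
    """Extend the gap-free seed q[i:i+w] / s[j:j+w] to the left and right.

    Returns (score, q_start, s_start, length) of the resulting segment pair.
    """
    def pair(a, b):
        return match if a == b else mismatch

    def walk(qi, sj, step):
        best = run = 0
        best_len = n = 0
        while 0 <= qi < len(q) and 0 <= sj < len(s):
            run += pair(q[qi], s[sj])
            n += 1
            if run > best:
                best, best_len = run, n       # new best end point
            elif best - run >= x_drop:
                break                         # score dropped by X: stop
            qi += step
            sj += step
        return best, best_len

    seed_score = sum(pair(q[i + k], s[j + k]) for k in range(w))
    right_gain, right_len = walk(i + w, j + w, 1)
    left_gain, left_len = walk(i - 1, j - 1, -1)
    return (seed_score + left_gain + right_gain,
            i - left_len, j - left_len, left_len + w + right_len)


hsp = extend_seed("AAAMHCAAA", "AAAMHCTTT", 3, 3, w=3, x_drop=3)
short_hsp = extend_seed("TWLMHCAQYI", "CIMHCTHY", 3, 2, w=3, x_drop=3)
```

The extension is trimmed back to the best-scoring end point, so trailing mismatches explored before the drop-off are not part of the reported segment pair.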

13
Pair-wise sequence alignment

  • Algorithm not guaranteed to find best
  • segment pair
  • (Heuristic)
  • But works well in practice!

14
The Basic Local Alignment Search Tool(BLAST)
  • New BLAST version (1997)
  • Two-hit strategy


15
Pair-wise sequence alignment

  • W L M H C A Q Y A R V
  • I
  • M X
  • H X
  • C X
  • T
  • H
  • W
  • A X
  • R X
  • V X
  • Search for two word pairs on the same diagonal;
    use a lower threshold T
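
A sketch of the two-hit test, assuming the word hits are already available as (query_pos, subject_pos) pairs. The window size and the function name two_hit_triggers are assumptions made for illustration, not values taken from the slides.

```python
def two_hit_triggers(hits, w=3, window=40):
    """Return (diagonal, first_query_pos, second_query_pos) for every pair of
    non-overlapping hits on the same diagonal that lie within `window`."""
    last_on_diagonal = {}
    triggers = []
    for i, j in sorted(hits):
        d = i - j                              # hits on one diagonal share i - j
        if d in last_on_diagonal:
            distance = i - last_on_diagonal[d]
            if distance < w:
                continue                       # overlaps the previous hit: skip it
            if distance <= window:
                triggers.append((d, last_on_diagonal[d], i))
        last_on_diagonal[d] = i
    return triggers


# MHC and ARV from the slide both lie on the diagonal i - j = 1.
slide_triggers = two_hit_triggers([(2, 1), (8, 7)])
overlapping = two_hit_triggers([(2, 1), (3, 2), (4, 3)])
```

Only diagonals with a trigger are passed on to the more expensive extension step, which is why a lower threshold T becomes affordable.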

16
The Basic Local Alignment Search Tool(BLAST)
  • New BLAST version (1997)
  • Two-hit strategy
  • Gapped BLAST
  • Position-Specific Iterative BLAST
  • (PSI BLAST)


17
The Basic Local Alignment Search Tool(BLAST)



18
Multiple sequence alignment
  • 1aboA 1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
  • 1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
  • 1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
  • 1ihvA 1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
  • 1vie 1 .drvrkksga.........awqGQIVGWYctnlt.............peG
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

19
Multiple sequence alignment
  • First question: how to score multiple
    alignments?
  • Possible scoring scheme:
  • Sum-of-pairs score
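
A minimal sketch of the sum-of-pairs score: every pair of rows in the multiple alignment is scored column by column and the pairwise scores are added up. The match, mismatch and gap values are assumed for illustration; a real scheme would use a substitution matrix and usually affine gap costs.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum the scores of all pairwise alignments induced by a multiple alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue                 # column absent from this induced pair
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


sp = sum_of_pairs(["AC-", "ACG", "A-G"])
```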

20
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

22
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

23
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQtkngqGWVPSNYITPVN
  • 1ycsB 39 WWWARlndkeGYVPRNLLGLYP

26
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

29
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • Use the sum of the scores of these pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

30
Multiple sequence alignment
  • Goal
  • Find the multiple alignment with maximum score!

31
Multiple sequence alignment
  • Needleman-Wunsch scoring scheme can be generalized
    from pair-wise to multiple alignment
  • Multidimensional search space instead of
    two-dimensional matrix!

32
Multiple sequence alignment

33
Multiple sequence alignment
  • Complexity
  • For three sequences of lengths l1, l2, l3:
  • O(l1 · l2 · l3)
  • For n sequences (average length l):
  • O(l^n)
  • Exponential complexity!
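
To make the growth concrete, the following sketch counts the cells of the dynamic-programming array for a given list of sequence lengths. The helper name dp_cells is hypothetical, and the count ignores the additional cost per cell, which also grows with the number of sequences.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the (l1+1) x (l2+1) x ... dynamic-programming array."""
    return prod(length + 1 for length in lengths)


two_sequences = dp_cells([100, 100])
five_sequences = dp_cells([100] * 5)
```

Two sequences of length 100 need about ten thousand cells, while five such sequences already need more than ten billion.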

34
Multiple sequence alignment
  • Needleman-Wunsch scoring scheme can be generalized
    from pair-wise to multiple alignment
  • Optimal solution not feasible

35
Multiple sequence alignment
  • Needleman-Wunsch scoring scheme can be generalized
    from pair-wise to multiple alignment
  • Optimal solution not feasible
  • -> Heuristics necessary

36
Multiple sequence alignment
  • (A) Carrillo and Lipman (MSA)
  • Find sub-space in the dynamic-programming
    matrix where the optimal path can be found

37
Multiple sequence alignment
  • (B) Stoye, Dress (DCA)
  • Divide search space into small sub-spaces
  • Calculate optimal alignment for sub-spaces
  • Concatenate sub-alignments

40
Multiple sequence alignment
  • Progressive alignment.
  • Carry out a series of pair-wise alignments

41
Multiple sequence alignment
  • Most popular way of constructing multiple
    alignments
  • Progressive alignment.
  • Carry out a series of pair-wise alignments

42

Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP

43
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP
  • Align most similar sequences

44
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASFQPVAALERIN
  • WLNYNEERGDFPGTYVEYIGRKKISP

45
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP

46
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Align sequence to alignment

47
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN-
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Align alignment to alignment

48
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP

49
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Rule: once a gap, always a gap

50
Multiple sequence alignment
  • Order of pair-wise profile alignments is determined
    by a phylogenetic tree based on pair-wise similarity
    values (guide tree)
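
One way to derive such an order is greedy average-linkage clustering on the pairwise similarity values, sketched below. This is an assumption chosen for illustration; the slides do not say which tree-building method is used, and the function name guide_tree_order and the similarity values are made up.

```python
from itertools import combinations


def guide_tree_order(names, similarity):
    """Return the merge order as a list of (cluster_a, cluster_b) tuples.

    `similarity` maps frozenset({a, b}) to the pairwise similarity of a and b.
    """
    clusters = [(name,) for name in names]

    def average(c1, c2):
        total = sum(similarity[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the two most similar clusters next.
        a, b = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


toy_similarity = {
    frozenset("AB"): 0.9, frozenset("CD"): 0.8, frozenset("AC"): 0.3,
    frozenset("AD"): 0.2, frozenset("BC"): 0.4, frozenset("BD"): 0.1,
}
order = guide_tree_order(["A", "B", "C", "D"], toy_similarity)
```

Each merge corresponds to one pairwise step of the progressive alignment: sequence to sequence, sequence to alignment, or alignment to alignment.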

51
Multiple sequence alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP

53
Multiple sequence alignment
  • Problem: a simple guide tree determines the multiple
    alignment; the multiple alignment determines the
    phylogenetic analysis

54
Multiple sequence alignment
  • Implementations
  • Clustal W, PileUp, MultAlin

55
Local multiple alignment

[Figure: motif occurrences (M) marked in the sequences]
56
Local multiple alignment

[Figure: motif occurrences (M) marked in the sequences]
57
Local multiple alignment
[Figure: motif occurrences (M) marked in the sequences]
58
Local multiple alignment
  • Find motifs contained in all sequences in the data
    set
  • Problem:
  • motifs are often present only in sub-families

59



Neither local nor global methods applicable
60



Alignment possible if order conserved
61
The DIALIGN approach

62
The DIALIGN approach
  • Combination of local and global methods.

63
The DIALIGN approach
  • Combination of local and global methods.
  • Find local pair-wise similarities between input
    sequences
  • (fragments)

64
The DIALIGN approach
  • Combination of local and global methods.
  • Find local pair-wise similarities between input
    sequences
  • (fragments)
  • Compose alignments from fragments

65
The DIALIGN approach
  • Combination of local and global methods.
  • Find local pair-wise similarities between input
    sequences
  • (fragments)
  • Compose alignments from fragments
  • Ignore non-related parts of the sequences

66
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc

70
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

71
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • ------atcTAATAGTTAaaccccctcgtGCTTag-------AGATCCaaac
  • cagtgcgtgTATTACTAAc----------GGTTcaatcgcgcACATCCgc--

72
The DIALIGN approach
  • Score of an alignment
  • Define score of fragment f:
  • l(f): length of f
  • s(f): sum of matches (similarity values)
  • P(f): probability of finding a fragment with length l(f) and
    at least s(f) matches in random sequences that have
    the same length as the input sequences
  • Score: w(f) = -ln P(f)
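
A rough numerical sketch of this weight for DNA fragments. It rests on assumptions not stated on the slides: each position matches independently with probability 0.25, and P(f) is approximated by multiplying the binomial tail for a single fragment by the number of possible start positions (capped at 1). DIALIGN's own probability estimate may differ.

```python
from math import comb, log


def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """Approximate w(f) = -ln P(f) for a fragment with `matches` of `length` positions equal."""
    # Probability that one fixed fragment of this length has at least `matches` matches.
    tail = sum(
        comb(length, k) * p_match ** k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )
    # Crude correction for the len1 * len2 possible fragment start positions.
    p = min(1.0, len1 * len2 * tail)
    return -log(p)


strong = fragment_weight(10, 10, 100, 100)
weak = fragment_weight(10, 3, 100, 100)
```

Under this approximation a fragment that is likely to occur by chance gets a weight of zero, so it contributes nothing to the alignment score.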

73
The DIALIGN approach
  • Score of an alignment
  • Define the score of an alignment as the
    sum of the scores w(f) of its fragments
  • No gap penalty is used!
  • Optimization problem for pair-wise alignment:
  • Find a chain of fragments with maximal total
    score
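
The chaining problem can be sketched with a simple quadratic dynamic program. Fragments are given here as (start1, start2, length, weight) tuples, a representation assumed for illustration; a fragment may follow another only if it starts after the other has ended in both sequences. More efficient chaining algorithms exist; this version favors clarity.

```python
def best_chain(fragments):
    """Return (total_weight, chain) for the highest-scoring chain of fragments."""
    frags = sorted(fragments, key=lambda f: (f[0], f[1]))
    if not frags:
        return 0.0, []
    best = [f[3] for f in frags]          # best chain weight ending in each fragment
    prev = [None] * len(frags)
    for k, (i, j, length, weight) in enumerate(frags):
        for m in range(k):
            pi, pj, plen, _ = frags[m]
            compatible = pi + plen <= i and pj + plen <= j
            if compatible and best[m] + weight > best[k]:
                best[k] = best[m] + weight
                prev[k] = m
    k = max(range(len(frags)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k is not None:                   # trace the chain back to its start
        chain.append(frags[k])
        k = prev[k]
    return total, chain[::-1]


total, chain = best_chain([(0, 6, 5, 3.0), (6, 0, 4, 2.0), (10, 12, 4, 2.5), (3, 8, 6, 4.0)])
```

Because no gap penalty is used, the regions between chained fragments simply remain unaligned.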

74
The DIALIGN approach
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • Fragment-chaining algorithm finds the optimal chain
    of fragments.

75
The DIALIGN approach
  • Multiple fragment alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

81
The DIALIGN approach
  • Multiple fragment alignment
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

82
The DIALIGN approach
  • Multiple fragment alignment
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

83
The DIALIGN approach
  • Multiple fragment alignment
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaac----------ggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

84
The DIALIGN approach
  • Multiple fragment alignment
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

85
The DIALIGN approach
  • Multiple fragment alignment
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa
  • Consistency: it is possible to introduce gaps
    such that all segment pairs are aligned.

86
The DIALIGN approach
  • Multiple fragment alignment
  • atc------TAATAGTTAaactccccCGTGC-TTag
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa

87
Program evaluation
  • Use biologically verified alignments
  • (known 3D structure of proteins)
  • Compare alignments produced by
  • computer programs to biologically correct
  • alignments.

88
Program evaluation
  • (1) First evaluation of multiple alignment
    programs (McClure, Vasi, Fitch, 1994)
  • 4 protein families used:
  • globin, kinase, protease, ribonuclease H
  • All globally related -> global programs
    performed best

89
Program evaluation
  • (2) The BAliBASE (Thompson et al., 1999)
  • 100 protein families with known 3D structure,
  • some with large insertions/deletions.

90
Program evaluation

1aboA 1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA 1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie 1 .drvrkksga.........awqGQIVGWYctnlt.............peG
1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie 28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix = RED, beta strand = GREEN, core blocks = UNDERSCORE
91
Program evaluation

Results: Four programs performed best, but no
method was best in all test examples. ClustalW,
SAGA and RPPR were best for global alignment; DIALIGN
was best for sequences with large insertions
or deletions.
92
Program evaluation

(3) Lassmann and Sonnhammer (2002): Used BAliBASE
plus artificial sequences for local alignment.
Results: T-COFFEE best for closely related
sequences, DIALIGN best for distantly related
sequences.
93
Program evaluation
94
Alignment of large genomic sequences
  • Important tool for identifying functional
  • sites (e.g. genes or regulatory elements)

95
Alignment of large genomic sequences
  • Phylogenetic Footprinting
  • Functional sites more conserved during evolution
  • => Sequence similarity indicates biological
    function

96
Alignment of large genomic sequences
  • DIALIGN performs well in identifying local
    homologies, but is slow

97
Quadratic program running time

104
Solution Anchored alignments

111
Solution Anchored alignments

Find anchor points to reduce search space
112
Solution Anchored alignments
  • Use fast heuristic method to find anchor points
  • CHAOS, developed together with Mike Brudno
  • Brudno et al. (2003), BMC Bioinformatics 4:66
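
How much anchor points shrink the search space can be sketched by counting the matrix cells that remain between consecutive anchors. The anchor coordinates below are invented for illustration, and the function name anchored_cells is hypothetical.

```python
def anchored_cells(len1, len2, anchors):
    """Cells left to search when the alignment must pass through all anchors.

    `anchors` is a list of (pos1, pos2) points that increase in both sequences.
    """
    points = [(0, 0)] + sorted(anchors) + [(len1, len2)]
    cells = 0
    for (i0, j0), (i1, j1) in zip(points, points[1:]):
        if i1 < i0 or j1 < j0:
            raise ValueError("anchors must be consistent in both sequences")
        cells += (i1 - i0) * (j1 - j0)     # rectangle between neighboring anchors
    return cells


full = anchored_cells(1000, 1000, [])
reduced = anchored_cells(1000, 1000, [(250, 250), (500, 500), (750, 750)])
```

Three evenly spaced anchors already cut the quadratic search space to a quarter, and the saving grows with the number of anchors.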

113
Solution Anchored alignments


114
(3) Anchored alignments

116
First step to gene prediction: Exon discovery by
genomic alignment

117
First step to gene prediction: Exon discovery by
genomic alignment
  • Evaluation of different alignment programs
  • Compare local sequence similarity identified by
    alignment programs to known exons
  • Morgenstern et al. (2002), Bioinformatics
    18:777-787

118
DIALIGN alignment of human and murine genomic
sequences
119
DIALIGN alignment of tomato and Arabidopsis thaliana
genomic sequences
120
  • Evaluation of DIALIGN, PipMaker, WABA, BLASTN and
    TBLASTX on a set of 42 human and murine genomic
    sequences.
  • Compare similarities to annotated exons
  • Apply cut-off parameter to resulting alignments
  • Measure sensitivity and specificity
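
A sketch of the measurement at nucleotide level. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated exon positions that are covered and specificity is the fraction of predicted positions that lie in annotated exons; the slides do not spell out the definition used. Intervals are half-open (start, end) pairs.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end)}


def sensitivity_specificity(predicted, annotated):
    """Nucleotide-level sensitivity TP/(TP+FN) and specificity TP/(TP+FP)."""
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    true_positives = len(pred & true)
    sensitivity = true_positives / len(true) if true else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


sn, sp = sensitivity_specificity([(0, 10), (20, 30)], [(5, 15), (20, 25)])
```

Raising the cut-off parameter typically removes weak similarities, trading sensitivity for specificity.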

121
Performance of long-range alignment programs for
exon discovery (human - mouse comparison)
122
Performance of long-range alignment programs for
exon discovery (thaliana - tomato comparison)
123
AGenDA Alignment-based Gene Detection Algorithm
  • Bridge small gaps between DIALIGN fragments
  • -> cluster of fragments
  • Search conserved splice sites and start/stop
    codons at cluster boundaries to identify
    candidate exons
  • Recursive algorithm finds biologically consistent
    chain of potential exons

124
Identification of candidate exons

Fragments in DIALIGN alignment
125
Identification of candidate exons

Build cluster of fragments
126
Identification of candidate exons

Identify conserved splice sites
127
Identification of candidate exons

Candidate exons bounded by conserved splice sites

128
Construct gene models using candidate exons
  • Score of candidate exon (E) based on DIALIGN
    scores for fragments, score of splice junctions
    and penalty for shortening / extending
  • Find biologically consistent chain of candidate
    exons (starting with start codon, ending with
    stop codon, no internal stop codons ) with
    maximal total score

129
Find optimal consistent chain of candidate exons


133
Find optimal consistent chain of candidate exons

atg
gt
ag
gt
ag
tga
atg
tga

134
Find optimal consistent chain of candidate exons

atg
gt
ag
gt
ag
tga
atg
tga
G1
G2

135
Find optimal consistent chain of candidate exons

Recursive algorithm calculates the optimal chain of
candidate exons in O(N log N) time
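
The N log N bound can be illustrated with weighted interval scheduling: sort the candidate exons by end position and use binary search to find the last compatible predecessor. This is a simplified stand-in, not AGenDA's recursion; it only enforces that exons do not overlap and ignores start and stop codons, splice sites and reading frame.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """exons: list of (start, end, score) with half-open coordinates.

    Returns (total_score, chain) for the best set of non-overlapping exons.
    """
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    n = len(exons)
    total = [0.0] * (n + 1)        # total[k]: best score using the first k exons
    taken = [False] * (n + 1)
    pred = [0] * (n + 1)
    for k in range(1, n + 1):
        start, _, score = exons[k - 1]
        # Number of earlier exons that end at or before this exon starts.
        p = bisect_right(ends, start, 0, k - 1)
        pred[k] = p
        if total[p] + score > total[k - 1]:
            total[k], taken[k] = total[p] + score, True
        else:
            total[k] = total[k - 1]
    chain, k = [], n
    while k > 0:                   # trace back the chosen exons
        if taken[k]:
            chain.append(exons[k - 1])
            k = pred[k]
        else:
            k -= 1
    return total[n], chain[::-1]


score, model = best_exon_chain([(0, 100, 5.0), (50, 150, 6.0), (120, 200, 4.0), (160, 260, 3.0)])
```

Sorting and the binary searches each cost O(N log N), and the recurrence itself is linear.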

136
DIALIGN fragments
137
Candidate exons
138
Complete model
139
Results: 105 pairs of genomic sequences from
human and mouse (Batzoglou et al., 2000)
140
Results: 105 pairs of genomic sequences from
human and mouse (Batzoglou et al., 2000)
  • AGenDA
  • GenScan

  • 64
  • 12
    17

141
Results
  • Quality of AGenDA-based gene models comparable to
    results from GenScan
  • Exons identified that have not been identified by
    GenScan
  • No statistical models derived from known genes
    (no training data necessary!)
  • Method generally applicable

142
AGenDA Alignment-based Gene Detection Algorithm
  • WWW server
  • http://bibiserv.TechFak.Uni-Bielefeld.DE/agenda
  • Rinner, Taher, Goel, Sczyrba, Brudno, Batzoglou,
    Morgenstern, submitted

143
(No Transcript)