Title: Bioinformatics PhD. Course
Summary (approximate)
- 1. Biological introduction
- 2. Comparison of short sequences (< 10,000 bps)
- 3. Comparison of large sequences (up to 250,000,000 bps)
- 5. Efficient data search structures and algorithms
2. Comparison of short sequences (< 10,000 bps)
Summary (more or less)
- 2.1 Dot matrix
- 2.2 Pairwise alignment
- 2.3 Hash algorithms.
- 2.4 Multiple alignment.
2. Dot matrix
Given two sequences, how can we analyse their
degree of identity?
By searching for the parts that match
Each cell of the matrix holds 1 or 0:
1 if both characters coincide, 0 otherwise
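A minimal sketch of the 1/0 matrix just described, assuming cell (i, j) compares character i of the first sequence with character j of the second; the function name `dot_matrix` is hypothetical, not taken from the course material.

```python
def dot_matrix(s1: str, s2: str) -> list[list[int]]:
    """Return a len(s1) x len(s2) matrix: 1 where s1[i] == s2[j], else 0."""
    return [[1 if a == b else 0 for b in s2] for a in s1]


if __name__ == "__main__":
    # Print '*' for a match and '.' for a mismatch, one row per character of s1.
    for row in dot_matrix("ACGT", "AGGT"):
        print("".join("*" if x else "." for x in row))
```

Runs of 1s along a diagonal correspond to segments the two sequences share.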
2.1 Dot matrix
L = window length
What is the cost of the algorithm?
When are the matches relevant?
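A direct sketch of the windowed variant, assuming a cell is marked when the two length-L windows starting at (i, j) agree in at least `threshold` aligned positions. The `threshold` parameter (often called stringency in dot-plot tools) and the function name are assumptions for illustration. Each cell re-counts L characters, which is the cost the slide asks about.

```python
def windowed_dot_matrix(s1: str, s2: str, L: int, threshold: int) -> list[list[int]]:
    """Mark (i, j) when s1[i:i+L] and s2[j:j+L] match in >= threshold aligned positions.

    Hypothetical helper: 'threshold' is an assumed filtering parameter.
    """
    n1, n2 = len(s1) - L + 1, len(s2) - L + 1  # number of window start positions
    result = [[0] * max(n2, 0) for _ in range(max(n1, 0))]
    for i in range(n1):
        for j in range(n2):
            # Naive recount of the whole window: L comparisons per cell.
            matches = sum(1 for k in range(L) if s1[i + k] == s2[j + k])
            if matches >= threshold:
                result[i][j] = 1
    return result
```

Raising `threshold` towards L keeps only long, well-conserved diagonals and filters out isolated chance matches.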
2.1 Dot matrix algorithm cost
- $\text{long}(S_1) \cdot \text{long}(S_2) \cdot L$ character comparisons (long = sequence length), in other words $O(n^2 L)$
- can long(S1)long(S2) be possible?
- can we also say that the cost is $O(n^2)$, independent of L?
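One common way to remove the factor L, sketched here as an assumption rather than as the course's own solution: walk along each diagonal and update the window's match count incrementally (add the character entering the window, subtract the one leaving). Every cell is then touched a constant number of times, so the total cost is $O(\text{long}(S_1)\cdot\text{long}(S_2))$. The marking rule (`threshold`) and the function name are hypothetical.

```python
def windowed_dot_matrix_fast(s1: str, s2: str, L: int, threshold: int) -> list[list[int]]:
    """Sliding-window dot matrix whose cost does not depend on L.

    Cell (i, j) is 1 when s1[i:i+L] and s2[j:j+L] match in >= threshold
    aligned positions ('threshold' is an assumed parameter).
    """
    n1, n2 = len(s1) - L + 1, len(s2) - L + 1
    result = [[0] * max(n2, 0) for _ in range(max(n1, 0))]
    if n1 <= 0 or n2 <= 0:
        return result
    # Every diagonal starts either in the first column or in the first row.
    starts = [(i, 0) for i in range(len(s1))] + [(0, j) for j in range(1, len(s2))]
    for i0, j0 in starts:
        length = min(len(s1) - i0, len(s2) - j0)  # cells on this diagonal
        eq = [1 if s1[i0 + t] == s2[j0 + t] else 0 for t in range(length)]
        count = sum(eq[:L])  # matches in the first window of the diagonal
        for t in range(length - L + 1):
            if t > 0:
                # Slide by one: the position t+L-1 enters, position t-1 leaves.
                count += eq[t + L - 1] - eq[t - 1]
            if count >= threshold:
                result[i0 + t][j0 + t] = 1
    return result
```

It produces the same matrix as recounting each window from scratch, but each diagonal is scanned only once.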
2.1 Dot matrix signals
[Figure: panel A, transposons]
When are signals statistically significant?
2.1 Dot matrix statistical significance
We need to define a random model against which
to compare the signals
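We define the random variable X = number of characters that
coincide in a window of length L,
then $P(X = k) = \binom{L}{k}\, p^k (1-p)^{L-k}$, where p is the probability that two characters coincide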
What is its expected value?
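Since X is binomial, its expected value is $E[X] = Lp$. The sketch below evaluates the model numerically, assuming independent positions and, for the DNA example, uniform base composition so that p = 1/4 (both are assumptions of the random model, not properties of real sequences). The tail probability $P(X \ge k)$ gives the chance that a random window scores at least k matches, which is one way to judge whether a diagonal signal is significant. Function names are hypothetical.

```python
from math import comb


def prob_matches(L: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(L, p): exactly k of the L positions coincide."""
    return comb(L, k) * p**k * (1 - p) ** (L - k)


def prob_at_least(L: int, k: int, p: float) -> float:
    """P(X >= k): chance that a random window reaches k or more matches."""
    return sum(prob_matches(L, j, p) for j in range(k, L + 1))


def expected_matches(L: int, p: float) -> float:
    """E[X] = L * p for a binomial random variable."""
    return L * p


if __name__ == "__main__":
    L, p = 10, 0.25  # assumed: window of 10 bases, uniform DNA (p = 1/4)
    print(expected_matches(L, p))       # 2.5 matches expected by chance
    print(prob_at_least(L, 8, p))       # tail probability of 8 or more matches
```

With these assumptions a window scoring close to L is far above the 2.5 matches expected by chance, so such diagonals are unlikely to be random.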