Title: Evolution
1. Evolution
2. Darwinian Evolution
- Heritable traits
- Variation in population (parental combinations and mutations)
- Visible, phenotypical differences lead to different survival rates
Parental combinations and mutational changes
Natural Selection
3. Meiosis
Each of us has two copies of each chromosome (diploid)
- One allele from each parent
- Allele: one of a series of different forms of a gene
- Each chromatid has a copy of each gene
Sperm and egg have only one copy (haploid)
4. DNA and evolution
- Over time, genes accumulate mutations
- Environmental factors: radiation, oxidation
- Mistakes in replication or repair
- Evolution: change in allele frequency over time
5. Classification
- In the past, phenotypical differences were used to classify species
- Now that we have sequenced genomes, we can look at similarity of DNA
- Chimps and humans are 99% identical
- Diverged about 6 million years ago
6. Homologs, Paralogs, and Orthologs
- To compare species, we must look at similar genes
- Homologous genes: similar because they share a common ancestor
- Orthologous genes: homologs separated by speciation
- Paralogous genes: homologs that are similar due to a gene duplication event
7. Why do homologs drift apart? Types of mutations
- Point mutations
- Insertions, deletions
- Duplications, inversions, translocations
- Remember paralogs?
- Causes
- Radiation (cosmic, UV, X-ray)
- Replication (mitosis) or crossover (meiosis)
8. If we want to compare two homologous genes
- Point mutations (substitutions) are easy
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
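When two homologous genes differ only by substitutions, comparing them comes down to walking both sequences position by position. The sketch below illustrates that idea in Perl using the two sequences from this slide; the helper name `substitution_positions` is made up for illustration and is not part of the original lesson.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based positions at which two
# equal-length sequences differ (point substitutions only, no indels).
sub substitution_positions {
    my ($seq1, $seq2) = @_;
    die "sequences must have the same length\n"
        unless length($seq1) == length($seq2);
    my @positions;
    for my $i (0 .. length($seq1) - 1) {
        push @positions, $i + 1
            if substr($seq1, $i, 1) ne substr($seq2, $i, 1);
    }
    return @positions;
}

my $gene1 = 'ACGTCTGATACGCCGTATAGTCTATCT';
my $gene2 = 'ACGTCTGATTCGCCCTATCGTCTATCT';
my @diffs = substitution_positions($gene1, $gene2);
print scalar(@diffs), " substitutions at positions @diffs\n";
# prints: 3 substitutions at positions 10 15 19
```

This only works because both sequences have the same length; once an indel is present, the positions no longer line up.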
9. Insertions and deletions
- Indels are difficult; must align sequences
Unaligned:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
Aligned:
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
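Once gaps have been placed, every column of the alignment is a match, a mismatch, or a gap. The following Perl sketch tallies those three kinds of column for the aligned pair shown on this slide; the helper name `summarize_alignment` and the use of `-` as the gap character are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: classify each column of an already-aligned pair
# of sequences ('-' marks a gap) as a match, a mismatch, or a gap.
sub summarize_alignment {
    my ($row1, $row2) = @_;
    die "aligned rows must have the same length\n"
        unless length($row1) == length($row2);
    my %count = (match => 0, mismatch => 0, gap => 0);
    for my $i (0 .. length($row1) - 1) {
        my $c1 = substr($row1, $i, 1);
        my $c2 = substr($row2, $i, 1);
        if    ($c1 eq '-' or $c2 eq '-') { $count{gap}++ }
        elsif ($c1 eq $c2)               { $count{match}++ }
        else                             { $count{mismatch}++ }
    }
    return %count;
}

my %summary = summarize_alignment(
    'ACGTCTGATACGCCGTATAGTCTATCT',
    '----CTGATTCGC---ATCGTCTATCT',
);
print "matches=$summary{match} mismatches=$summary{mismatch} gaps=$summary{gap}\n";
# prints: matches=18 mismatches=2 gaps=7
```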
10. Deletions
- Codon deletion
ACG ATA GCG TAT GTA TAG CCG
- Effect depends on the protein, position, etc.
- Almost always deleterious
- Sometimes lethal
- Frame shift mutation (e.g., muscular dystrophy)
ACG ATA GCG TAT GTA TAG CCG
ACG ATA GCG ATG TAT AGC CG?
- Almost always lethal
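The difference between losing a whole codon and losing a single base is easiest to see by re-reading the sequence in triplets after each deletion. The Perl sketch below does this for the sequence on this slide. The helper name `codons` is hypothetical, and because the slide does not say which codon is deleted, removing the fourth codon (TAT) is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into codons (triplets); a
# trailing incomplete codon is kept so the broken frame stays visible.
sub codons {
    my ($seq) = @_;
    return $seq =~ /(.{1,3})/g;
}

my $gene = 'ACGATAGCGTATGTATAGCCG';

# Codon deletion: remove all three bases of the fourth codon (TAT).
my $codon_deleted = $gene;
substr($codon_deleted, 9, 3, '');

# Frame shift: remove a single base (the first T of the fourth codon).
my $frame_shifted = $gene;
substr($frame_shifted, 9, 1, '');

print join(' ', codons($gene)), "\n";           # ACG ATA GCG TAT GTA TAG CCG
print join(' ', codons($codon_deleted)), "\n";  # ACG ATA GCG GTA TAG CCG
print join(' ', codons($frame_shifted)), "\n";  # ACG ATA GCG ATG TAT AGC CG
```

After the codon deletion every downstream codon is unchanged, whereas after the single-base deletion every downstream codon is read differently.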
11. Insertion or deletion?
- When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known
ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT
12. Does it change the protein?
- Synonymous
- Serine
- UCU to UCC
- No change in protein
- Non-synonymous
- Serine to tyrosine
- UCU to UAU
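Whether a substitution is synonymous can be checked by translating the codon before and after the change and comparing the amino acids. The Perl sketch below uses a deliberately partial codon table that holds only the serine and tyrosine codons; the helper name `classify_change` is hypothetical, and UCC is used as an example of another serine codon.

```perl
use strict;
use warnings;

# Partial codon table: only the serine and tyrosine codons needed here.
my %amino_acid = (
    UCU => 'Ser', UCC => 'Ser', UCA => 'Ser', UCG => 'Ser',
    AGU => 'Ser', AGC => 'Ser',
    UAU => 'Tyr', UAC => 'Tyr',
);

# Hypothetical helper: classify a codon change as synonymous or not.
sub classify_change {
    my ($before, $after) = @_;
    for my $codon ($before, $after) {
        die "codon $codon is not in the partial table\n"
            unless exists $amino_acid{$codon};
    }
    return $amino_acid{$before} eq $amino_acid{$after}
        ? 'synonymous'
        : 'non-synonymous';
}

print "UCU -> UCC: ", classify_change('UCU', 'UCC'), "\n";  # synonymous
print "UCU -> UAU: ", classify_change('UCU', 'UAU'), "\n";  # non-synonymous
```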
13. Nature's experiments
- Gene duplication event
- One copy can continue to perform the original function
- The other copy can accumulate mutations (an experiment)
- Paralogs
14. Why align sequences?
- Already said
- Can then measure differences between genes to determine evolutionary distance
- See where indels and substitutions are
- Why else?
- What if we wanted to do a database search?
- Databases are great at perfect matches
- But to find homologous genes we need fuzzy matches
Database of sequences
Sequence query
15. But, how to align?
- Exhaustively, could try all possible alignments
From
---ACGT
ACT----
To
ACGT---
----ACT
And everything in between
16. Exhaustive placement of spaces
- Could set up a loop and place gaps in all possible locations
Or, could solve recursively
---ACGT
ACT----

--A-CGT
ACT----

--AC-GT
ACT----
. . .
ACGT---
----ACT
- Tricky
- Have to avoid all gap-gap situations
- Must find a way to look at ALL possibilities
17. Recursion
- A function that calls itself
- Can often be an elegant solution to difficult problems
- Elegant: a non-obvious solution that is much simpler in design than the problem would suggest
Example: factorial
Definition of f(n): return n * f(n-1)
18. A more practical example
- Factorial recurrence relation
- factorial(n) = n * factorial(n-1)
- Define f(n): return n * f(n-1)
- Example: f(3)
- 3 * f(2)
- 2 * f(1)
- 1
- 3 * 2 * 1
- How did the program know to stop at f(1)?
19. Base case
- To know when to stop, must have a base case
- Base case for factorial is when n equals 1
Example:
my $answer = factorial(10);
print "$answer\n";

sub factorial {
    my $passedArg = shift;
    # check base case
    if ($passedArg == 1) {
        return 1;
    }
    # else, if base case not satisfied, recurse
    else {
        return $passedArg * factorial($passedArg - 1);
    }
}
20. What does this have to do with aligning?
- Recursive solutions can usually be found by breaking a problem into sub-problems
- Insert no gap, recurse on rest
- Insert a gap in string 1, recurse on rest
- Insert a gap in string 2, recurse on rest
Add score from matching to score from recursing on the rest
Three sub-problems
Aligned now: first character of string 1 with first character of string 2. Left to align: rest of string 1 with rest of string 2
Aligned now: gap with first character of string 2. Left to align: string 1 with rest of string 2
Aligned now: first character of string 1 with gap. Left to align: rest of string 1 with string 2
21. Example
t g c g _ tg c g t g _ cg
a tg a cg _ atg a cg a tg _ acg
22. Scoring
- When scoring alignments there must be a gap penalty, a mismatch penalty, and a bonus for a match
- For any two strings the best alignment score will be the maximum of three possibilities
- Recurrence relation
allign(string1, string2) = max of
(match or mismatch of first chars) + allign(rest of string1, rest of string2)
(gap penalty) + allign(string1, rest of string2)
(gap penalty) + allign(string1 starting at pos 2, string2)
23. What is the base case?
- If down to the empty string for either
- Return gap penalty * the length of the non-empty string (return 0 if both empty)
Base Case
24. Pseudo code
- Definition of allign(string1, string2)
- If base case satisfied, return base score
- Otherwise
- Return the max of
- Gap penalty + allign(string1, rest of string2)
- Match or mismatch of first chars + allign(rest of string1, rest of string2)
- Gap penalty + allign(string1 starting at pos 2, string2)
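The pseudo code above can be turned into a short Perl sketch. The slides give no numeric scores, so the values used here (match +1, mismatch -1, gap -2) are assumptions chosen only for illustration. The `$invocations` counter is also an addition, included to show how much work the recursion does; its value depends on implementation details such as when the base case fires.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumed scoring values (the slides do not give numbers).
my ($MATCH, $MISMATCH, $GAP) = (1, -1, -2);
my $invocations = 0;    # counts calls, to show how quickly work grows

sub allign {
    my ($string1, $string2) = @_;
    $invocations++;

    # Base case: one string is empty, so the rest of the other string
    # can only be paired with gaps (this is 0 when both are empty).
    if ($string1 eq '' or $string2 eq '') {
        return $GAP * (length($string1) + length($string2));
    }

    my $rest1 = substr($string1, 1);
    my $rest2 = substr($string2, 1);
    my $pair  = substr($string1, 0, 1) eq substr($string2, 0, 1)
              ? $MATCH : $MISMATCH;

    return max(
        $pair + allign($rest1, $rest2),     # no gap
        $GAP  + allign($string1, $rest2),   # gap in string 1
        $GAP  + allign($rest1, $string2),   # gap in string 2
    );
}

my $score = allign('atagcgcc', 'ataggcc');
print "score=$score after $invocations invocations\n";
```

With these assumed scores the best alignment of the two strings has seven matches and one gap, giving a score of 5. The sketch returns only the score, not the alignment itself.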
25. Example
- These two strings
- atagcgcc
- ataggcc
- Align like
- atagcgcc
- atag_gcc
Have now taken a problem in biology and mapped it to a common problem-solving technique in computer science: Recursion
26. Life is good, but
- The previous example (an 8 character string aligned with a 7 character string) took 103,342 invocations of allign
- Why?
27. Is exponential bad?
- Aligning two strings of size 500
- More invocations of allign than there are subatomic particles in the universe
- If it took one nanosecond per invocation
- Universe is 14 billion years old
- It would take 8.2 × 10^208 times the age of the universe to calculate the alignment score
Exponential = bad
Corollary: Tree = exponential
28. Complexity analysis
- Fixed: best
- Linear: next best
- Polynomial (n^2): not bad
- Exponential (3^n): very bad
- Big O notation
- O(1), O(n), O(n^3), O(3^n)
Big O Notation
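To get a feel for why exponential growth is so much worse than polynomial growth, it helps to print a few values side by side. The Perl sketch below does that for n^2 and 3^n; the helper name `growth_row` and the sample values of n are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return n, n^2 (polynomial) and 3^n (exponential).
sub growth_row {
    my ($n) = @_;
    return ($n, $n**2, 3**$n);
}

# Print a small table showing how the two functions grow with n.
for my $n (1, 5, 10, 20, 30) {
    printf "n=%-3d n^2=%-5d 3^n=%d\n", growth_row($n);
}
```

At n = 10 the polynomial term is 100 while the exponential term is already 59049, and the difference widens rapidly from there.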
29. Next lesson
- Speeding things up
- Dynamic programming solution
Dynamic Programming