Title: Application of Algorithm Research to Molecular Biology


1
Application of Algorithm Research to Molecular
Biology
  • R. C. T. Lee
  • Dept. of Computer Science
  • National Chi Nan University

2
  • There is one peculiar characteristic of all
    living organisms: we can reproduce ourselves.
  • Yet, it is important that what we reproduce is
    the same as we are.
  • That is, wild flowers produce the same kind of
    wild flowers and birds reproduce the same kind of
    birds.

3
  • Information about ourselves must be passed to our
    descendants.
  • Question: How is this done?
  • Answer: Through DNA.

4
  • DNA (deoxyribonucleic acid) can be viewed as two
    strands of nucleotides formed into a double helix.

5
(No Transcript)
6
  • There are only four types of nucleotides in
    every DNA:
  • A: Adenine
  • G: Guanine
  • C: Cytosine
  • T: Thymine

7
  • Each strand of a DNA is a sequence of A, G, C and
    T.
  • Each A in one strand is paired with a T in the
    other strand.
  • Similarly, G is paired with C.

8
Human Mitochondrial DNA Control Region
  • TTCTTTCATGGGGAAGCAAA
  • AAGAAAGTACCCCTTCGTTT
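
The pairing rule can be checked mechanically against the two strands on this slide. The following is a minimal sketch in Python; the helper name `complement` is an assumption made for illustration and does not come from the slides.

```python
# Pairing rules from the slides: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired strand, base by base (hypothetical helper)."""
    return "".join(PAIR[base] for base in strand)


# The first strand of the control-region example yields the second one.
print(complement("TTCTTTCATGGGGAAGCAAA"))  # AAGAAAGTACCCCTTCGTTT
```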

9
  • DNA exists in cells.
  • In each living organism, there are many
    different kinds of cells. For instance, in human
    beings, we have muscle cells, blood cells, nerve
    cells, etc.
  • How can different cells perform different
    functions?

10
Genes
  • In each DNA sequence, there are subsequences
    which are called genes.
  • Each gene corresponds to a distinct protein and
    it is the protein which determines the function
    of the cell.
  • For instance, red blood cells must contain the
    oxygen-carrying protein haemoglobin, and the
    production of this protein is controlled by a
    certain gene.

11
Proteins
  • Each protein consists of amino acids.
  • There are 20 different amino acids.

12
(No Transcript)
13
The Relationship between a Gene and its
Corresponding Protein
14
  • As shown above, each amino acid is coded by a
    triplet. For instance, TTC denotes
    Phe (phenylalanine).
  • Each triplet is called a codon.
  • There are three codons, namely TAA, TGA and TAG,
    which represent the end of a gene.

15
  • Protein RNase A: KETAAAKFER
  • Its corresponding DNA sequence is AAA GAA ACT
    GCT GCT GCT AAA TTT GAA CGT

16
How Is a Protein Produced?
  • RNA (Ribonucleic Acid)
  • Each cell is able to recognize all of the
    starting points of genes relevant to the proteins
    important to the functions of the cell.

17
  • The RNA system scans a gene. For each codon being
    scanned, it produces a corresponding amino acid.
  • After all codons have been scanned, the
    corresponding protein is produced.

18
(No Transcript)
19
  • AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT
  • KETAAAKFER
  • Note that codon AAA corresponds to amino acid K
    and CGT corresponds to R.
  • Remember that TAA, TGA and TAG signify the end
    of a gene.
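
The codon-by-codon scan described on these slides can be sketched as a table lookup. The code below is an illustration, not the presenter's method: it assumes the standard genetic code written in DNA letters (T rather than U, as on the slides), and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order;
# '*' marks the three stop codons TAA, TAG and TGA.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Scan the gene one codon at a time and stop at an end-of-gene codon."""
    dna = dna.replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TGA or TAG: end of gene
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT"))  # KETAAAKFER
```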

20
Problems
  • 1. String Matching Problem
  • 2. Sequence Alignment Problem
  • 3. Evolution Tree Problem
  • 4. RNA Secondary Structure Prediction Problem
  • 5. Protein Structure Problem
  • 6. Physical Mapping Problem

21
Exact String Matching Problems
  • Instance: A text T of length n and a pattern P
    of length m, where n > m.
  • Question: Find all occurrences of P in T.
  • Example: If T = ttaptaap and P = ap, then P
    occurs in T starting at positions 3 and 7.
  • Linear time (O(n + m) time) algorithms:
  • Knuth-Morris-Pratt (KMP) algorithm
  • Boyer-Moore algorithm
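
As an illustration of the first algorithm listed, here is a compact sketch of Knuth-Morris-Pratt matching. It reports 1-based start positions to agree with the example on this slide; the function name `kmp_search` is an assumption.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0  # number of pattern characters currently matched
    for j, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == m:
            hits.append(j - m + 2)  # 0-based start is j - m + 1
            k = fail[k - 1]
    return hits


print(kmp_search("ttaptaap", "ap"))  # [3, 7]
```

Each text character is read once and the fallback steps are bounded by the number of earlier advances, which is where the O(n + m) bound comes from.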

22
Approximate String Matching Problems
  • Instance: A text T of length n, a pattern P of
    length m and a maximal number of errors allowed, k.
  • Question: Find all text positions where the
    pattern matches the text with up to k errors, where
    an error is the substitution, deletion, or
    insertion of a character.
  • Example:
  • Let T = pttapa, P = patt and k = 2.
  • The substrings T[1..2], T[1..3], T[1..4] and
    T[5..6] each match P with at most 2 errors.
  • Algorithms:
  • Dynamic programming approach
  • NFA approach
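
The dynamic programming approach can be sketched as follows. This is one common formulation (a column-by-column edit-distance table in which a match may start anywhere in the text), offered as an assumption about what the slide has in mind; the name `approx_match_ends` is hypothetical. It reports the 1-based text positions at which an approximate match ends.

```python
def approx_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based end positions where pattern matches with <= k errors."""
    m = len(pattern)
    # col[i] = fewest edits needed to turn pattern[:i] into some substring
    # of the text that ends at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing
            prev_diag, col[i] = col[i], new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_match_ends("pttapa", "patt", 2))  # [2, 3, 4, 6]
```

The end positions 2, 3, 4 and 6 correspond to the four substrings listed in the example on this slide. The table has m + 1 rows and is updated once per text character, so the running time is O(nm).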

23
Sequence Alignment Problem
  • ATTCATTACAACCGCTATG
  • ACCCATCAACAACCGCTATG
  • It appears that these two sequences are quite
    different.
  • An alignment will produce the following:
      ATTCATTA-CAACCGCTATG
      ACCCATCAACAACCGCTATG

24
  • Given two sequences, any alignment will have a
    corresponding score.
  • For each exact match, the score is equal to 2.
  • For each mismatch, the score is equal to -1.
  • Example: two alignments of AGC and AAAC, with
    each gap scored as a mismatch:
      AGC-          AG-C
      AAAC          AAAC
      2 - 3 = -1    2×2 + 2×(-1) = 2
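
The scoring rule can be written as a short function. This sketch assumes, as the arithmetic on this slide implies, that a column containing a gap ("-") is scored like a mismatch; the name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str, match: int = 2,
                    mismatch: int = -1) -> int:
    """Score two equal-length aligned rows, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        # A gap never counts as a match, even against another gap.
        score += match if a == b and a != "-" else mismatch
    return score


print(alignment_score("AGC-", "AAAC"))  # -1
print(alignment_score("AG-C", "AAAC"))  # 2
```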

25
  • The sequence alignment problem: Given two
    sequences, find an alignment which produces the
    highest score.
  • Approach: Dynamic Programming
  • The multiple sequence alignment problem is NP-hard
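
For two sequences, the dynamic programming approach can be sketched as a global alignment table. The code assumes the scores used on the previous slide (match 2, mismatch -1, and gap -1, since gaps are counted as mismatches there); the function name `best_alignment_score` is an assumption.

```python
def best_alignment_score(s: str, t: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Highest score over all global alignments of s and t."""
    n, m = len(s), len(t)
    # prev[j] = best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + pair,  # align s[i-1] with t[j-1]
                         prev[j] + gap,       # s[i-1] against a gap
                         cur[j - 1] + gap)    # t[j-1] against a gap
        prev = cur
    return prev[m]


print(best_alignment_score("AGC", "AAAC"))  # 2
```

Only the score is returned here; recovering the alignment itself would require keeping the full table and tracing back from the last cell.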

26
The Evolution Tree Problem
27
(No Transcript)
28
  • The evolution tree problem: Given a distance
    matrix of n species, find an evolution tree under
    some criterion.
  • Usually, the criteria are such that all of the
    tree distances reflect the original distances.
  • That is, when two species are close to each other
    in the distance matrix, they should be close in
    the evolution tree.

29
  • Each criterion corresponds to a distinct
    evolution tree problem.
  • Most of them are NP-complete.
  • Algorithms which produce optimal evolution trees
    in polynomial time are mostly based upon the
    minimum spanning tree approach.
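
To make the minimum spanning tree idea concrete, the sketch below runs Prim's algorithm on a distance matrix. It only shows the spanning-tree building block, not any particular evolution tree criterion, and the 4-species matrix is made up for illustration; it is not data from the slides.

```python
def minimum_spanning_tree(dist: list[list[float]]) -> list[tuple[int, int, float]]:
    """Prim's algorithm on a symmetric distance matrix; returns tree edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking v to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the species outside the tree that is closest to it.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges


# Hypothetical distances between four species (illustrative values only).
D = [[0, 2, 6, 7],
     [2, 0, 5, 6],
     [6, 5, 0, 3],
     [7, 6, 3, 0]]
print(minimum_spanning_tree(D))  # [(0, 1, 2), (1, 2, 5), (2, 3, 3)]
```

Species that are close in the matrix (0 and 1, 2 and 3) end up joined directly in the tree, which is the behaviour the slides ask for.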

30
A Partial Evolution Tree of Homo sapiens
(Intelligent Human Beings, also Modern Men). Our
ancestors are from Africa.
31
Secondary Structure of RNA
  • Due to hydrogen bonds, the primary structure of
    an RNA can fold back on itself to form its
    secondary structure.
  • Base pairs (formed by hydrogen bonds):
      1. A-U (Watson-Crick base pair)
      2. C-G (Watson-Crick base pair)
      3. G-U (wobble base pair)

32
AGGCCUUCCU
33
2D and 3D Structures of Yeast Phenylalanyl-Transfer
RNA
(Figure panels: 3D Structure, 2D Structure)
34
Secondary Structure Prediction Problem
  • Given an RNA sequence, determine the secondary
    structure with the minimum free energy for this
    sequence.
  • Approach: Dynamic Programming
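
A full free-energy model is beyond a short example, so the sketch below swaps in a simpler objective: it maximizes the number of base pairs (a Nussinov-style recurrence) instead of minimizing free energy. It allows the A-U, C-G and G-U pairs listed earlier and assumes a hairpin loop needs at least `min_loop` unpaired bases; that parameter and the function name are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in rna."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("AGGCCUUCCU"))  # 3
```

For the sequence AGGCCUUCCU from the earlier slide, one structure reaching 3 pairs is A1-U10, G2-C9 and G3-C8 around the loop CUUC.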

35
Protein Structure Problem
  • Each amino acid of a protein can be classified
    into one of the following two types:
  • H (hydrophobic, non-polar) (hating water)
  • P (hydrophilic, polar) (loving water)
  • Then the amino acid sequence of a protein can be
    viewed as a binary sequence of Hs (1s) and Ps
    (0s).

36
Example
  • Instance: 011001001110010
  • (Figure: two lattice embeddings of this instance,
    one with score 5 and one with score 3.)
37
H-P Model
  • Instance: A sequence of 1s (Hs) and 0s (Ps).
  • Question: Find a self-avoiding path embedded
    in either a 2D or 3D lattice that maximizes the
    score, where the score is the number of pairs of
    1s that are adjacent in the lattice without
    being adjacent in the sequence.
  • NP-complete even for the 2D lattice.
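
The score definition can be made concrete with a small sketch for the 2D square lattice. `hp_score` evaluates one given embedding, and `best_fold` tries every self-avoiding path, which takes exponential time and is only meant for very short sequences. Both names are assumptions made for illustration.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_score(seq: str, path: list[tuple[int, int]]) -> int:
    """Count pairs of 1s adjacent in the lattice but not in the sequence."""
    index_at = {point: i for i, point in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "1":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips sequence neighbours.
            if j is not None and j > i + 1 and seq[j] == "1":
                score += 1
    return score


def best_fold(seq: str):
    """Exhaustive search over 2D self-avoiding paths; returns (score, path)."""
    best = (-1, None)

    def extend(path, used):
        nonlocal best
        if len(path) == len(seq):
            score = hp_score(seq, path)
            if score > best[0]:
                best = (score, list(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in used:  # keep the path self-avoiding
                path.append(nxt)
                used.add(nxt)
                extend(path, used)
                path.pop()
                used.remove(nxt)

    if seq:
        extend([(0, 0)], {(0, 0)})
    return best


# Folding 1001 into a unit square brings its two 1s together.
print(hp_score("1001", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1
print(best_fold("1001")[0])  # 1
```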

38
Physical Mapping Problem
39
Shortest Common Superstring
  • Input: A collection F of strings.
  • Output: A shortest possible string S such that
    for every f ∈ F, S is a superstring of f.
  • For example: F = {ACT, CTA, AGT}, S = ACTAGT
  • NP-complete
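
For a collection as small as the one in this example, the problem can be solved by brute force: try every ordering of the strings and merge neighbours on their longest overlap. The sketch below runs in factorial time, in line with the NP-completeness noted above, and its function names are assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_common_superstring(strings: list[str]) -> str:
    """Brute-force shortest common superstring (tiny inputs only)."""
    unique = sorted(set(strings))
    # A string contained in another one never needs its own place.
    frags = [s for s in unique
             if not any(s != t and s in t for t in unique)]
    if not frags:
        return ""
    best = None
    for order in permutations(frags):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_common_superstring(["ACT", "CTA", "AGT"]))  # ACTAGT
```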
40
  • Suppose the target is too long and its contents
    are unknown.
  • What can we do?
  • Enzyme A → 6, 8, 3, 10
  • Enzyme B → 7, 11, 4, 5
  • Enzymes A and B → 1, 5, 2, 6, 7, 3, 3

41
This problem is called the two digest problem,
which is NP-complete.
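
The two digest problem (usually called the double digest problem) can be illustrated with a brute-force search over the fragment lengths from the previous slide. The goal is to order the fragments of enzyme A and of enzyme B so that cutting at both sets of sites gives exactly the fragments observed when both enzymes are used together. The function names are assumptions, and the search is exponential, in line with the NP-completeness stated above.

```python
from itertools import permutations


def cut_sites(fragments) -> list[int]:
    """Positions of the cuts implied by an ordered list of fragment lengths."""
    sites, position = [], 0
    for length in fragments[:-1]:
        position += length
        sites.append(position)
    return sites


def double_digest(a, b, ab):
    """Find orderings of a and b whose combined cuts produce ab, or None."""
    total = sum(a)
    target = sorted(ab)
    for order_a in permutations(a):
        for order_b in permutations(b):
            cuts = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
            bounds = [0] + cuts + [total]
            merged = sorted(y - x for x, y in zip(bounds, bounds[1:]))
            if merged == target:
                return list(order_a), list(order_b)
    return None


solution = double_digest([6, 8, 3, 10], [7, 11, 4, 5], [1, 5, 2, 6, 7, 3, 3])
print(solution)
```

One valid answer for this data is A = 3, 6, 8, 10 and B = 4, 11, 5, 7, which gives the combined fragments 3, 1, 5, 6, 2, 3, 7. The search may return a different but equally valid ordering, such as the mirror image.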
42
  • TAA, TGA, or TAG.
  • Do you know what they mean?
  • End of Gene.
  • Thank you for your patience. Have a good
    conference.