Title: A Program to Transcribe a DNA Sequence into RNA
1A Program to Transcribe a DNA Sequence into RNA
- HI5100 Data Structures for BioInformatics
- Lesson 7
2OBJECTIVES
- In this lesson you will
- Develop an algorithm.
- Translate the algorithm into a Python program.
3DNA Sequence Representation
- You can represent a DNA sequence as a string
- DNA is composed of four nucleotides (bases)
- Adenine A
- Cytosine C
- Guanine G
- Thymine T
- The single letters are standard IUB / IUPAC nucleic acid codes
- IUB (International Union of Biochemistry)
- IUPAC (International Union of Pure and Applied Chemistry, http://www.iupac.org/index_to.html)
4DNA Sequence
ACCGATACGCCACTTAACAG
5DNA to RNA
- Very complex cellular mechanism
- Fairly simple from a programming standpoint
- Change all the Ts in the sequence to Us
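As a point of reference, the "change all the Ts to Us" step can be written with Python's built-in string method `replace`. This is only a minimal sketch; the variable names `dna` and `rna` are chosen here for illustration and do not come from the slides.

```python
dna = "ACCGATACGCCACTTAACAG"

# str.replace returns a new string; the original string is left unchanged
rna = dna.replace("T", "U")

print(dna)
print(rna)  # ACCGAUACGCCACUUAACAG
```

The remaining slides work out the same result step by step, which is the better exercise for practicing algorithm development.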
6Understand the Problem Take 1
- From a computer processing perspective
- The computer must
- Examine each letter in the sequence
- Determine if it is a T
- If it is, replace it with a U
7Example
ACCGATACGCCACTTAACAG
First T is in position 5 (counting from 0)
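The "examine each letter, determine if it is a T" steps might look like the sketch below. It only reports where the Ts are and does not try to replace them. The names `dna` and `t_positions` are illustrative assumptions.

```python
dna = "ACCGATACGCCACTTAACAG"

t_positions = []
# enumerate gives each letter together with its position, counting from 0
for position, base in enumerate(dna):
    if base == "T":
        t_positions.append(position)

print(t_positions)  # [5, 13, 14]
```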
8Example
- Can you replace the letter in position 5 with a
different letter?
ACCGATACGCCACTTAACAG
U
First T was in position 5
9Try It
DNA_to_RNA.py

def main():
    DNA1 = "ACCGATACGCCACTTAACAG"
    print(DNA1)
    DNA1[5] = "U"   # try to replace the T in position 5
    print(DNA1)
10Results
Translation: You cannot assign a value to one
item in a string sequence.
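To see this for yourself without stopping the program, the assignment can be wrapped in `try` / `except`. This is a small sketch; the variable names are illustrative, and the exact wording of the error message depends on the Python version.

```python
dna = "ACCGATACGCCACTTAACAG"

try:
    dna[5] = "U"          # strings do not allow item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# The original string is unchanged
print(dna)
```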
11Understand the Problem Take 2
- The computer must
- Examine each letter in the sequence
- Determine if it is a T
- If it is, create a new string with a copy of everything up to the T
- Concatenate a U
- Continue looking through the original string for more Ts
12Example
ACCGATACGCCACTTAACAG
ACCGA
U
- Continue looking in original string
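The first step of this example can be sketched with slicing and concatenation. The sketch assumes the string method `find` is used to locate the first T; the names `dna`, `t_pos`, and `rna` are illustrative.

```python
dna = "ACCGATACGCCACTTAACAG"

t_pos = dna.find("T")      # position of the first T, counting from 0
rna = dna[:t_pos] + "U"    # copy everything before the T, then add a U

print(t_pos)  # 5
print(rna)    # ACCGAU
```

Slicing builds a new string, so the original `dna` stays available for the rest of the search.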
13Build the Algorithm in Pseudocode
Set start_pos to 0
Find the position of the next T
Assign it to t_pos
Copy characters from start_pos to t_pos-1 into a new string
Concatenate a U
Set start_pos to t_pos+1
Continue looking at characters in original string
Repeat from second line
- Think of the pseudocode as a rough draft of the
final algorithm
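One possible translation of this rough draft into Python is sketched below. It is not the only way to write it. Two details are assumptions added here because the draft does not spell them out: the loop stops when `find` returns -1 (no more Ts), and the characters after the last T are copied at the end. The function name `dna_to_rna` is also an illustrative choice.

```python
def dna_to_rna(dna):
    """Build an RNA string by copying the text between Ts and adding Us."""
    rna = ""
    start_pos = 0
    t_pos = dna.find("T", start_pos)       # -1 means no T was found
    while t_pos != -1:
        rna = rna + dna[start_pos:t_pos]   # characters start_pos .. t_pos-1
        rna = rna + "U"                    # concatenate a U in place of the T
        start_pos = t_pos + 1
        t_pos = dna.find("T", start_pos)   # look for the next T
    # Assumed extra step: copy whatever follows the last T
    rna = rna + dna[start_pos:]
    return rna


print(dna_to_rna("ACCGATACGCCACTTAACAG"))  # ACCGAUACGCCACUUAACAG
```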
14Or as a Flowchart p. 1
15Or as a Flowchart p. 2
- Think of the flowchart as a rough draft of the
algorithm at this point
16Pseudocode / Flowchart
- Once you have the algorithm developed to a point where you can write some code, proceed with
- Write code
- Test
- Write code
- Test
17Assignment 2
- Use the pseudocode and flowchart drafts of the DNA to RNA algorithm to build a small program that
- Reads DNA sequences from a text file
- Converts each sequence into RNA
- Saves the RNA sequence to a text file
- Prints the DNA sequence on one line
- Directly underneath on the next line prints the RNA
- Continues reading, converting, writing to file, and printing until there is no more data in the text file
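The overall shape of such a program might look like the sketch below. It is a starting outline under stated assumptions, not a finished solution: it assumes the input file holds one sequence per line, the function names `to_rna` and `convert_file` are hypothetical, and `to_rna` uses `str.replace` only as a stand-in for the algorithm you build from your pseudocode.

```python
def to_rna(dna):
    """Stand-in conversion; swap in the hand-built algorithm here."""
    return dna.replace("T", "U")


def convert_file(in_path, out_path):
    """Read DNA sequences, save the RNA sequences, and print each pair."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:           # the loop ends when the data runs out
            dna = line.strip()
            if not dna:               # skip blank lines
                continue
            rna = to_rna(dna)
            outfile.write(rna + "\n")
            print(dna)                # DNA on one line
            print(rna)                # RNA directly underneath
```

A call such as `convert_file("DNAtest.csv", "RNAtest.txt")` would process the provided test file; the output file name `RNAtest.txt` is made up for this example.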
18Assignment 2
- A text file with several test strings is provided as DNAtest.csv
- Add data to DNAtest.csv so that all important special test cases are demonstrated, for example
- A sequence that starts with T
- A sequence that ends with T
- A sequence with no Ts
- A sequence that is all Ts
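Before adding these cases to the file, it can help to check them directly in code. The sketch below uses made-up sequences (they are not the contents of DNAtest.csv) and a simple letter-by-letter conversion named `to_rna`, which is an illustrative helper. Python's built-in `str.replace` serves as an independent check on each result.

```python
def to_rna(dna):
    """Hand-built conversion: copy each base, swapping T for U."""
    rna = ""
    for base in dna:
        if base == "T":
            rna = rna + "U"
        else:
            rna = rna + base
    return rna


# Hypothetical test sequences, one per special case
cases = {
    "starts with T": "TACG",
    "ends with T": "ACGT",
    "no Ts": "ACGACG",
    "all Ts": "TTTT",
}

for label, dna in cases.items():
    rna = to_rna(dna)
    # The built-in replace method acts as an independent check
    assert rna == dna.replace("T", "U"), label
    print(label, dna, rna)
```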
19Summary
- That covers computing with Python in a nutshell
- Now we are ready to tackle some data structures!
20End of Slides for Lesson 7