Title: An Introduction to Python Part III
2Overview
- Assignments
- Solution to Programming Workshop 2
- 2-D Lists
- List comprehensions
- Zip
- File I/O
- Split
- Functions
- Programming Workshop 3
3Solution to Programming Workshop 2
- # Script to calculate GC content of a sequence of nucleotides.
- # Inputs: sequence, window size
- # Outputs: nucleotide number, GC for each window
- # Written by: Nancy Warter-Perez
- # Date created: April 22, 2004
- # Last modified:
- print("Script for computing GC.")
- seq = raw_input("Please enter sequence: ")
- winsize = input("Please enter window size: ")
4Solution to Programming Workshop 2
print("nucleotide\tGC")
for i in range(0, len(seq)-winsize+1):
    cnt = 0
    for j in seq[i:i+winsize]:
        if (j=='G' or j=='C' or j=='g' or j=='c'):
            cnt += 1
    gc = (cnt*100.0)/winsize
    print "%i\t\t%.2f" % ((i+1+winsize/2), gc)
x = raw_input("\n\nPlease enter any character to exit.\n")
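The same sliding-window calculation can be sketched as a function, which makes it easier to test without typing input at a prompt. This is only an illustration written in Python 3 syntax (the slides use Python 2); the function name `gc_windows` and the sample sequence are assumptions, not part of the workshop solution.

```python
def gc_windows(seq, winsize):
    """Return a list of (center nucleotide number, GC percentage) per window."""
    results = []
    for i in range(len(seq) - winsize + 1):
        window = seq[i:i + winsize].upper()
        cnt = window.count("G") + window.count("C")
        # // keeps the integer division that winsize/2 performs in Python 2
        results.append((i + 1 + winsize // 2, cnt * 100.0 / winsize))
    return results

rows = gc_windows("ACGTGC", 4)   # hypothetical sample sequence
print("nucleotide\tGC")
for pos, gc in rows:
    print("%i\t\t%.2f" % (pos, gc))
```

Returning the rows instead of printing inside the loop keeps the calculation separate from the output formatting.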
5Python List Comprehensions
- Concise way to create a list
- Consists of an expression followed by a for clause, then zero or more for or if clauses
- Ex:
- >>> [str(round(355/113.0, i)) for i in range(1,6)]
- ['3.1', '3.14', '3.142', '3.1416', '3.14159']
- Ex: replace all occurrences of G or C in a string of nucleotides with a 1 and A and T with a 0
- >>> x = "acactgacct"
- >>> y = [int(i=='c' or i=='g') for i in x]
- >>> y
- [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]
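The examples above use a single for clause. A short sketch of the other two forms mentioned, an if clause used as a filter and a second for clause, might look like this (Python 3 syntax; the variable names are illustrative only):

```python
seq = "acactgacct"

# if clause: keep only the 1-based positions that hold a g or c
gc_positions = [i + 1 for i, base in enumerate(seq) if base in "gc"]

# two for clauses: every combination of one letter from each string
pairs = [a + b for a in "ac" for b in "gt"]

print(gc_positions)
print(pairs)
```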
6Creating 2-D Lists
- To create a 2-D list L, with C columns and R rows, initialized to 0:
- L = [[]]  # empty 2-D list
- L = [[0 for col in range(C)] for row in range(R)]
- To assign the value 5 to the element at row index 2 and column index 3 of L (the 3rd row and 4th column, since indices start at 0):
- L[2][3] = 5
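A minimal runnable sketch of the 2-D list construction, assuming R = 3 and C = 4 purely for illustration:

```python
R, C = 3, 4   # illustrative sizes, not from the slides

# Each pass of the outer comprehension builds a fresh row list
L = [[0 for col in range(C)] for row in range(R)]

L[2][3] = 5   # row index 2, column index 3
for row in L:
    print(row)
```

Because the inner comprehension runs once per row, every row is a separate list, so the assignment changes only one element.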
7Zip for parallel traversals
- Visit multiple sequences in parallel
- Ex:
- >>> L1 = [1,2,3]
- >>> L2 = [5,6,7]
- >>> zip(L1, L2)
- [(1,5), (2,6), (3,7)]
- Ex:
- >>> for (x,y) in zip(L1, L2):
- ...     print x, y, '--', x+y
- 1 5 -- 6
- 2 6 -- 8
- 3 7 -- 10
8More on Zip
- Zip more than two arguments and any type of sequence
- Ex:
- >>> T1, T2, T3 = (1,2,3),(4,5,6),(7,8)
- >>> T3
- (7,8)
- >>> zip(T1, T2, T3)
- [(1,4,7),(2,5,8)] -- truncates to shortest sequence
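The zip examples can be tried as a short script. One assumption to flag: this sketch is written for Python 3, where zip returns an iterator rather than a list, so it is wrapped in list() to display the tuples.

```python
L1 = [1, 2, 3]
L2 = [5, 6, 7]
T3 = (7, 8)

pairs = list(zip(L1, L2))               # parallel traversal of two lists
sums = [x + y for x, y in zip(L1, L2)]  # use both items at each step
short = list(zip(L1, L2, T3))           # stops at the shortest sequence

print(pairs)
print(sums)
print(short)
```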
9Dictionary Construction with zip
- Ex
- >>> keys = ['a', 'b', 'd']
- >>> vals = [1.8, 2.5, -3.5]
- >>> hydro = dict(zip(keys,vals))
- >>> hydro
- {'a': 1.8, 'b': 2.5, 'd': -3.5}
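Once the dictionary is built, values are looked up by key. A brief sketch using the same keys and values (Python 3 syntax; the list name `profile` is made up for this example):

```python
keys = ['a', 'b', 'd']
vals = [1.8, 2.5, -3.5]
hydro = dict(zip(keys, vals))

print(hydro['b'])        # look up a single key
print(hydro.get('z'))    # get() returns None for a missing key

# Translate a string of keys into the corresponding values
profile = [hydro[ch] for ch in "abd"]
print(profile)
```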
10File I/O
- To open a file:
- myfile = open('pathname', <mode>)
- modes:
- 'r' = read
- 'w' = write
- Ex: infile = open("D:\\Docs\\test.txt", 'r')
- Ex: outfile = open("out.txt", 'w') -- in same directory
11Common input file operations
12Common output file operations
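The contents of these two slides did not survive in this text, so the following is only a sketch of the file-object methods most often used for reading and writing, not a reconstruction of the original slides. The temporary path is an assumption made so the example can run anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Output: write() takes one string, writelines() takes a list of strings
outfile = open(path, 'w')
outfile.write("# Sample data\n")
outfile.writelines(["1 2 3 4 5\n", "6 7 8 9 10\n"])
outfile.close()

# Input: read() returns the entire file as a single string
infile = open(path, 'r')
whole = infile.read()
infile.close()

# readline() returns the next line; readlines() returns the rest as a list
infile = open(path, 'r')
first = infile.readline()
rest = infile.readlines()
infile.close()

print(first, rest)
```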
13Extracting data from string split
- s.split([sep[, maxsplit]]): return a list of the words of the string s.
- If the optional argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
- If the argument sep is present and not None, it specifies a string to be used as the word separator.
- The optional argument maxsplit limits the number of splits. If it is omitted, all possible splits are made; if it is given, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list (thus, the list will have at most maxsplit+1 elements).
14Split
- Ex:
- >>> x = "a,b,c,d"
- >>> x.split(',')
- ['a', 'b', 'c', 'd']
- >>> x.split(',',2)
- ['a', 'b', 'c,d']
- Ex:
- >>> y = "5 33 a 4"
- >>> y.split()
- ['5', '33', 'a', '4']
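Split always returns strings, so numeric data usually needs a conversion step afterwards. A small sketch combining split with a list comprehension (Python 3 syntax; the sample strings are illustrative):

```python
line = "1 2 3 4 5"
numbers = [int(tok) for tok in line.split()]   # split on whitespace, then convert

row = "A,4,-1,-2"
label, rest = row.split(',', 1)                # maxsplit=1: label, then everything else
scores = [int(tok) for tok in rest.split(',')]

print(numbers)
print(label, scores)
```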
15Functions
- Function definition:
- def adder(a, b, c): return a+b+c
- Function calls:
- adder(1, 2, 3) -> 6
16Functions - Polymorphism
- >>> def fn2(c):
- ...     a = c * 3
- ...     return a
- >>> print fn2(5)
- 15
- >>> print fn2(1.5)
- 4.5
- >>> print fn2([1,2,3])
- [1, 2, 3, 1, 2, 3, 1, 2, 3]
- >>> print fn2("Hi")
- HiHiHi
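The same two functions as a script in Python 3 syntax. The point of the sketch is that neither function declares argument types, so the meaning of + and * depends on what is passed in:

```python
def adder(a, b, c):
    return a + b + c

def fn2(c):
    a = c * 3
    return a

print(adder(1, 2, 3))        # numeric addition
print(adder("a", "b", "c"))  # string concatenation
print(fn2(5))                # multiplication
print(fn2([1, 2, 3]))        # list repetition
print(fn2("Hi"))             # string repetition
```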
17Functions - Recursion
- def fn_Rec(x):
-     if x == []:
-         return
-     fn_Rec(x[1:])
-     print x[0],
- y = [1,2,3,4]
- fn_Rec(y)
- >>> 4 3 2 1
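To see why the output comes out reversed, it helps to collect the values instead of printing them. In this Python 3 sketch the extra `out` parameter is an addition for illustration; the recursion itself is the same: recurse on the tail first, then handle the first element.

```python
def fn_rec(x, out):
    if x == []:            # base case: nothing left to process
        return
    fn_rec(x[1:], out)     # handle the rest of the list first
    out.append(x[0])       # then record the first element

result = []
fn_rec([1, 2, 3, 4], result)
print(result)
```

Each call waits for the call on the shorter list to finish, so the last element is recorded first.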
18Programming Workshop 3
- Create a text file called "test1.txt" with the following data:
- # Sample data
- 1 2 3 4 5
- # more data
- 6 7 8 9 10
- Create another text file called "test2.txt" with the following data:
- # More test data
- # With more header info
- A B C D E F G
- Write a script to do the following:
- 1. Prompt the user for a filename
- 2. Open the file
- 3. Read the file into a list of strings.
- 4. If the line does not begin with a '#' print the line to the screen.
- Test your script on test1.txt and test2.txt.
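One possible shape for steps 2 to 4 is sketched below in Python 3 syntax. It assumes header lines begin with '#', takes the filename as a function argument instead of prompting (step 1), and writes its own temporary copy of the test1.txt data so it can run anywhere. The function name `print_data_lines` is hypothetical.

```python
import os
import tempfile

def print_data_lines(filename):
    infile = open(filename, 'r')       # step 2: open the file
    lines = infile.readlines()         # step 3: read into a list of strings
    infile.close()
    shown = []
    for line in lines:                 # step 4: skip lines starting with '#'
        if not line.startswith('#'):
            print(line.rstrip("\n"))
            shown.append(line)
    return shown

# Temporary stand-in for test1.txt
path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, 'w') as f:
    f.write("# Sample data\n1 2 3 4 5\n# more data\n6 7 8 9 10\n")

data = print_data_lines(path)
```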
19Programming Homework 2P
- Write a program to prompt the user for a scoring matrix file name and read the data into a dictionary
- Download a representative set of PAM and BLOSUM scoring matrix files
- Scoring matrices should be downloaded from ftp://ftp.ncbi.nih.gov/blast/matrices/
- Due Date: Thursday, May 14th
20Example Scoring Matrix File
21Algorithm for Homework 2P
- Step 1: Create an empty list (of dictionaries)
- Step 2: Prompt the user for the scoring matrix file name
- Step 3: Open the file and read the contents as a list of strings. Ignore the comment lines
- Step 4: When you reach a line that doesn't start with '#', read in the amino acid symbols and split them into your keys for your dictionary
- Step 5: Read in the rest of the lines one at a time. For each line:
- Step 5a. Slice off the first character (amino acid).
- Step 5b. For the rest of the string, split into individual numbers and convert to a list of integers (use a list comprehension). This is your data for your dictionary.
- Step 5c. Zip the keys and data together and convert into a dictionary.
- Step 5d. Add the dictionary to the list of dictionaries
- Step 6: After you've read all lines, create the dictionary of dictionaries by zipping the keys and the list of dictionaries and convert into a dictionary.
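The steps above can be sketched roughly as follows, in Python 3 syntax. Several things here are assumptions: comment lines are taken to start with '#', the function works on a list of strings (so the prompt in step 2 and the file opening in step 3 are left out), and the three-by-three matrix is made up for illustration rather than copied from a real scoring matrix file. The name `read_scoring_matrix` is hypothetical.

```python
def read_scoring_matrix(lines):
    dict_list = []                                  # step 1
    keys = None
    for line in lines:                              # step 3
        if not line.strip() or line.startswith('#'):
            continue                                # ignore blank and comment lines
        if keys is None:
            keys = line.split()                     # step 4: column symbols
            continue
        rest = line[1:]                             # step 5a: drop the row symbol
        data = [int(n) for n in rest.split()]       # step 5b
        dict_list.append(dict(zip(keys, data)))     # steps 5c and 5d
    return dict(zip(keys, dict_list))               # step 6

# Made-up toy matrix, for illustration only
sample = [
    "# Toy matrix for illustration only\n",
    "   A  R  N\n",
    "A  4 -1 -2\n",
    "R -1  5  0\n",
    "N -2  0  6\n",
]

matrix = read_scoring_matrix(sample)
print(matrix['A']['R'])
```

Step 6 relies on the rows appearing in the same order as the column symbols, which is why the row symbol sliced off in step 5a is not needed as a key.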