Title: CISC 471 Computational Biology Tutorial: Perl
- Phil Hyoun Lee
- Edited by Maryam Salehi, Quinn Lau, Rob Denroche, Tony Kuo
- Sept. 22, 2008
Outline
- Today, we will learn how to
- run a Perl program
- use basic data structures
- handle file input/output
- manipulate strings
Running a Perl program
- Windows (ActivePerl)
- Open your text editor
- e.g., Notepad, Notepad++, SciTE
- Type your Perl program
- e.g., Hello World
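The Hello World program itself appeared only as a screenshot. A minimal hello.pl, assuming it simply prints the greeting followed by a newline, could look like this. The variable $greeting is an illustrative addition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hello.pl: print a greeting followed by a newline
my $greeting = "Hello, World!";
print "$greeting\n";
```

Save it as hello.pl and run it with perl hello.pl from the directory where you saved it.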
Running a Perl program
- Windows (ActivePerl)
- Save your program as hello.pl in Z: or your My Documents directory
- Open a command prompt (Start → Run → type cmd), go to that directory, and run perl hello.pl
- The program prints: Hello, World!
Running a Perl program
- Linux / Unix
- Open your text editor
- e.g., SciTE, Emacs, Vi
- Type your Perl program
- e.g., Hello World
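Running a Perl program
- Linux / Unix
- Save your program as hello.pl and run it with perl hello.pl
- (Unix only) To run it directly as ./hello.pl, give execution permission to your file
- chmod a+x hello.pl
- For ./hello.pl to work, the first line of the file must name the interpreter, e.g., #!/usr/bin/perl
- The program prints: Hello, World!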
To run Perl at home
- Windows / Linux / Mac
- Install ActivePerl
- http://www.activestate.com/Products/activeperl/index.mhtml
- Remote Access to CASLAB Linux Server
- How to Connect to a CASLAB Linux Server
- http://www.caslab.queensu.ca/
- CASLAB Perl Instruction
- http://www.caslab.queensu.ca/LabHelp/Linux/perl/
- SciTE
- Install SciTE (you need ActivePerl as well)
- http://scintilla.sourceforge.net/SciTE.html
use strict and warnings
- use strict forces you to declare variables using my
- It will alert you if you try to use an undeclared variable by accident
- use warnings prints warnings to the command line
- It will let you know if unexpected types or values are being used, e.g., an uninitialized value in a calculation or a non-numeric string used as a number
use strict;
use warnings;
my $message = "Hello world!\n";
print $message;
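To see what use strict buys you, the sketch below compiles a small snippet containing a misspelled variable ($cuont instead of $count). Because a string eval inherits the strict pragma of the surrounding code, the typo is rejected at compile time and the error message ends up in $@. The variable names are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a tiny snippet at run time. Under "use strict", the undeclared
# (misspelled) variable $cuont is a compile-time error, so eval returns
# undef and leaves the error message in $@.
my $ok = eval q{
    my $count = 1;
    my $total = $cuont + 1;
    1;
};
my $error = $@;

print(defined($ok) ? "no error\n" : "strict caught it: $error");
```

Without use strict, the same snippet would run silently and treat $cuont as an empty global variable.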
Basic Data Structures
- Scalar Variable
- e.g., strings, numbers
- Array Variable
- e.g., list of strings, list of numbers
- Hash Variable
- e.g., list of <key, value> pairs
Basic Data Structure: Scalar
- use $ to denote scalar variables
< test.pl >
#!/usr/bin/perl
#------------------------------------
# SCALAR VARIABLES
#------------------------------------
my $number = 1;
my $language = "perl";
my $string = "$language book";
print "a = $number\n" . "b = $string\n";
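The sketch below expands on the scalar slide. It shows that double quotes interpolate variables while single quotes keep the text literally, and that the same scalar works as a number or a string. The names $count and $gene and the value "BRCA1" are arbitrary examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 3;
my $gene  = "BRCA1";

# Double quotes interpolate variables; single quotes keep the text as-is.
my $label = "$gene x $count";
my $raw   = '$gene x $count';

# The same scalar can be used as a number or as a string.
my $sum    = $count + 2;             # numeric addition
my $joined = $gene . "-" . $count;   # string concatenation with "."

print "$label\n$raw\n$sum\n$joined\n";
```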
Comparisons
- Numbers and strings are compared with different operators

Numbers    Strings
<          lt
>          gt
<=         le
>=         ge
==         eq
<=>        cmp

- <=> and cmp return 1 if the first input is larger, 0 if they are equal, and -1 if the second input is larger (for cmp, "larger" means later in ASCII order)
my $s1 = "hat";
my $s2 = "cat";
if ($s1 eq $s2) { print "equal strings!\n"; }
my $n1 = 5;
my $n2 = 7;
if ($n1 == $n2) { print "equal numbers!\n"; }
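Why two sets of operators? The sketch below compares the same values both ways. Numeric operators compare values, while string operators compare character by character, so "10" and "10.0" are equal numbers but different strings, and "9" sorts after "10" as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric operators compare values; string operators compare characters.
my $num_equal = (10 == 10.0) ? 1 : 0;       # 1: same number
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # 0: different text
my $num_less  = (9 < 10) ? 1 : 0;           # 1
my $str_less  = ("9" lt "10") ? 1 : 0;      # 0: "9" sorts after "1"
my $num_cmp   = 5 <=> 7;                    # -1: second input is larger
my $str_cmp   = "hat" cmp "cat";            # 1: first input is larger

print join(" ", $num_equal, $str_equal, $num_less, $str_less, $num_cmp, $str_cmp), "\n";
```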
Basic Data Structure: Array
- use @ to denote array variables
< test.pl >
#!/usr/bin/perl
#------------------------------------
# ARRAY VARIABLES
#------------------------------------
my @array = (10, 20, 30);
print $array[0] . "\n";        # element at array index 0: 10
print scalar(@array) . "\n";   # number of elements: 3
print "$array[$#array]\n";     # last element ($#array is the last index): 30
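A few more ways to index an array, as a sketch with an arbitrary four-element array: negative indexes count from the end, $#array is the last index, and a slice such as @array[1, 2] pulls out several elements at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (10, 20, 30, 40);

my $first    = $array[0];        # index 0 is the first element
my $last     = $array[-1];       # negative indexes count from the end
my $last_idx = $#array;          # 3: index of the last element
my $size     = scalar(@array);   # 4: always $#array + 1
my @middle   = @array[1, 2];     # a slice: (20, 30)

print "$first $last $last_idx $size @middle\n";
```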
Basic Data Structure: Array
- to enumerate all elements
< test.pl >
#!/usr/bin/perl
#------------------------------------
# ARRAY VARIABLES
#------------------------------------
my @array = (10, 20, 30);
my $element;
foreach $element (@array) {
    print $element . "\n";
}
Basic Data Structure: Array
- to add/delete an element to/from the end of an array
#!/usr/bin/perl
#------------------------------------
# ARRAY VARIABLES
#------------------------------------
my @array = (10, 20);
my $last = pop @array;   # removes 20 from the end
push @array, 4;          # adds 4 to the end: (10, 4)
foreach my $element (@array) {
    print $element . "\n";
}
Basic Data Structure: Array
- to add/delete an element to/from the start of an array
#!/usr/bin/perl
#------------------------------------
# ARRAY VARIABLES
#------------------------------------
my @array = (10, 20);
my $first = shift @array;   # removes 10 from the start
unshift @array, 5;          # adds 5 to the start: (5, 20)
foreach my $element (@array) {
    print $element . "\n";
}
Basic Data Structure: Array
- to sort the elements of an array
< test.pl >
my @array = ("Lucy", "Fred", "Ricky");
# sort in ASCENDING order
my @sorted_array = sort @array;
foreach my $element (@sorted_array) {
    print $element . "\n";
}
print "\n";
# sort in DESCENDING order
@sorted_array = sort { $b cmp $a } @array;
foreach my $element (@sorted_array) {
    print $element . "\n";
}
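One pitfall when sorting numbers: a plain sort compares elements as strings, just like cmp. The sketch below, using an arbitrary list of numbers, shows the difference and the numeric form with <=>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100);

# Plain sort compares as strings, so "100" comes before "9".
my @as_strings = sort @numbers;                 # (10, 100, 9)

# Use <=> inside the sort block for numeric order.
my @ascending  = sort { $a <=> $b } @numbers;   # (9, 10, 100)
my @descending = sort { $b <=> $a } @numbers;   # (100, 10, 9)

print "@as_strings\n@ascending\n@descending\n";
```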
Basic Data Structure: Hash
- use % to denote hash variables
< test.pl >
#!/usr/bin/perl
#------------------------------------
# HASH VARIABLES
#------------------------------------
my %hash = ("Fred" => 8, "Lucy" => 1, "Ricky" => 2);
print $hash{"Fred"} . "\n";
if (exists $hash{"Fred"}) {
    print "Hi, the id of Fred is $hash{Fred}\n";
}
Basic Data Structure: Hash
- to enumerate all elements
< test.pl >
#!/usr/bin/perl
#------------------------------------
# HASH VARIABLES
#------------------------------------
my %hash = ("fred" => 8, "lucy" => 1, "ricky" => 2);
# each returns the (key, value) pairs in no particular order
while (my ($key, $value) = each %hash) {
    print "Hi, the id of $key is $value\n";
}
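A common use of a hash in sequence work is counting. The sketch below counts each base in a made-up DNA string. It relies on the fact that ++ on a missing key treats the value as 0, and it sorts the keys before printing because each() and keys() return entries in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each base occurs in a made-up DNA sequence.
my $dna = "GATTACAGATC";
my %count;
foreach my $base (split //, $dna) {
    $count{$base}++;   # a missing key starts out undefined, treated as 0
}

# Sort the keys so the output order is predictable.
foreach my $base (sort keys %count) {
    print "$base: $count{$base}\n";
}
```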
Basic Data Structure: Hash
- to add/delete hash elements and sort by key
#!/usr/bin/perl
my %hash = ("fred" => 8, "lucy" => 1, "ricky" => 2);
$hash{"chris"} = 7;      # this adds a new element
delete $hash{"ricky"};   # this deletes an element
my @sortedKeys = sort(keys %hash);
foreach my $key (@sortedKeys) {
    print $hash{$key} . "\n";
}
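Sorting the keys alphabetically is only one option. As a sketch, the sort block can look up each key's value instead, which orders the keys by their values (here numerically, smallest id first).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = ("fred" => 8, "lucy" => 1, "ricky" => 2, "chris" => 7);

# Sort the keys by their values, numerically, smallest first.
my @by_value = sort { $hash{$a} <=> $hash{$b} } keys %hash;

foreach my $key (@by_value) {
    print "$key => $hash{$key}\n";
}
```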
File Input / Output
#!/usr/bin/perl
#------------------------------------
# FILE OUTPUT
#------------------------------------
# this will create a new file test.txt to write (an existing test.txt is overwritten)
open(MY_FILE, ">test.txt") or die "Couldn't open test.txt: $!\n";
# write something to the opened file
print MY_FILE "barney\n", "lucy\n", "fred\n";
# close the file
close(MY_FILE);
File Input / Output
#!/usr/bin/perl
#------------------------------------
# FILE OUTPUT
#------------------------------------
# this will open an existing file test.txt to append (it is created if missing)
open(MY_FILE, ">>test.txt") or die "Couldn't open test.txt: $!\n";
# write something to the end of the opened file
print MY_FILE "barney\n", "lucy\n", "fred\n";
# close the file
close(MY_FILE);
File Input / Output
#!/usr/bin/perl
#------------------------------------
# FILE INPUT
#------------------------------------
# this will open a file test.txt to read
open(MY_FILE, "<test.txt") or die "Couldn't open test.txt: $!\n";
while (defined(my $line = <MY_FILE>)) {
    print $line;   # print one line read from a file
}
# close the file
close(MY_FILE);
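The three file slides can be combined into one round trip. The sketch below uses the three-argument form of open with lexical file handles, an equivalent and commonly recommended style, and chomp to strip each line's newline. The file name demo_io.txt is hypothetical, and the file is deleted at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "demo_io.txt";   # hypothetical file name for this demo

# ">" creates the file (or overwrites an existing one).
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "barney\n", "lucy\n";
close($out);

# ">>" appends to the end of the existing file.
open(my $app, '>>', $file) or die "Cannot append to $file: $!";
print $app "fred\n";
close($app);

# "<" reads; chomp removes the trailing newline from each line.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @names;
while (defined(my $line = <$in>)) {
    chomp $line;
    push @names, $line;
}
close($in);
unlink $file;   # clean up the demonstration file

print "Read ", scalar(@names), " names: @names\n";
```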
String Manipulation
- to break up a string by delimiters
#!/usr/bin/perl
#------------------------------------
# STRING MANIPULATION
#------------------------------------
my @tokens = split(/:/, "ab:c:def:g");
# @tokens = ("ab", "c", "def", "g")
foreach my $element (@tokens) {
    print $element . "\n";
}
String Manipulation
- to break up a string using whitespace (spaces, tabs) as the delimiter; \s+ matches one or more whitespace characters
#!/usr/bin/perl
#------------------------------------
# STRING MANIPULATION
#------------------------------------
my @tokens = split /\s+/, "any number of   white \t spaces";
# @tokens = ("any", "number", "of", "white", "spaces")
foreach my $element (@tokens) {
    print $element . "\n";
}
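A few other built-in string functions that come up often in sequence work are sketched below on a made-up DNA string. join is the inverse of split; length, substr and lc behave as their names suggest; reverse followed by tr/// gives a reverse complement; and tr/// with an empty replacement counts matching characters without changing the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCGTAC";   # made-up sequence for illustration

# join glues a list back together with a delimiter.
my $joined = join(":", "ab", "c", "def", "g");   # "ab:c:def:g"

# length, substr and lc
my $len   = length($dna);          # 8
my $codon = substr($dna, 0, 3);    # first three characters: "ATG"
my $lower = lc($dna);              # "atgcgtac"

# Reverse complement: reverse the string, then swap bases with tr///.
my $revcomp = reverse($dna);
$revcomp =~ tr/ACGT/TGCA/;

# tr/// returns the number of matched characters: here, G plus C.
my $gc = ($dna =~ tr/GC//);

print "$joined\n$len $codon $lower\n$revcomp\nGC count: $gc\n";
```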
Subroutines
sub max {
    my $maximum = shift(@_);
    foreach my $item (@_) {
        if ($maximum < $item) {
            $maximum = $item;
        }
    }
    return $maximum;
}
my $bestday = max("1", "2", "3", "4", "5");
print "$bestday\n";
Subroutines
- @_ is an array of all the arguments passed to the subroutine
- For example, if you called a function with two arguments, those would be stored in $_[0] and $_[1]
- The index of the last argument would be $#_
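The sketch below shows @_ from inside a subroutine. The subroutine name describe_args and its arguments are hypothetical. It reports how many arguments arrived, the index of the last one ($#_), and copies the first two into named variables, which is the usual way to unpack arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report what the subroutine received in @_.
sub describe_args {
    my $count      = scalar(@_);   # number of arguments
    my $last_index = $#_;          # index of the last argument
    my ($first, $second) = @_;     # copy the first two arguments
    return ($count, $last_index, $first, $second);
}

my @info = describe_args("ACGT", 42, "extra");
print "@info\n";
```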
Regular Expressions
- $string =~ /regular expression/ is true if $string contains a match
- Plain text such as /most text/ matches exactly that text
- ^ matches the start of a line
- $ matches the end of a line
- . matches any single character
- * matches zero or more of the preceding character
- http://www.regular-expressions.info/reference.html
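The sketch below applies each of the listed metacharacters to a made-up DNA string. The /g loop that collects motif positions uses pos(), which gives the offset just after the latest match, so subtracting the motif length gives where it started.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "CCATGAAATAGGATGCCCTGA";   # made-up sequence

my $starts_with_cc = ($seq =~ /^CC/)  ? 1 : 0;   # ^ anchors at the start
my $ends_with_tga  = ($seq =~ /TGA$/) ? 1 : 0;   # $ anchors at the end

# . matches any character and * repeats it zero or more times, so
# ATG.*TAG matches from the first ATG up to the last TAG after it.
my ($match) = $seq =~ /(ATG.*TAG)/;

# Find the start position of every ATG.
my @positions;
while ($seq =~ /ATG/g) {
    push @positions, pos($seq) - 3;
}

print "$starts_with_cc $ends_with_tga $match @positions\n";
```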
Example: Parsing one entry
#!/usr/bin/perl
use strict;
use warnings;
# open our file using input from the command line
my $fileName = $ARGV[0];
open(SPFILE, "<$fileName") or die "Couldn't open $fileName\n";
# declaring variables to store information
my $spAccess;
my $fullName;
my $organism;
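Example: Parsing one entry
# reading a file with one protein entry in it
while (defined(my $line = <SPFILE>)) {
    if ($line =~ /<accession>(.*)<\/accession>/) {
        $spAccess = $1;
    } elsif ($line =~ /<fullName>(.*)<\/fullName>/) {
        $fullName = $1;
    } elsif ($line =~ /<name type="scientific">(.*)<\/name>/) {
        $organism = $1;
    }
}
close(SPFILE);
- Parentheses ( ) save whatever they match in $1 (a second pair would use $2, and so on)
- If the entry lists several accessions, $spAccess ends up holding the last one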
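To try the parsing loop without a downloaded file, the sketch below reads a short, hypothetical UniProt-style entry from a string. Opening a reference to a scalar lets Perl read it like a file. Unlike the slide, this version keeps the first accession (the primary one in UniProt) instead of the last. The accession numbers and protein name are placeholders, not real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, heavily shortened entry kept in a string.
my $xml = <<'END_XML';
<entry>
  <accession>P00000</accession>
  <accession>Q00000</accession>
  <protein><recommendedName><fullName>Example protein</fullName></recommendedName></protein>
  <organism><name type="scientific">Homo sapiens</name></organism>
</entry>
END_XML

open(my $fh, '<', \$xml) or die "Cannot read the in-memory entry: $!";
my ($spAccess, $fullName, $organism);
while (defined(my $line = <$fh>)) {
    if ($line =~ /<accession>(.*)<\/accession>/) {
        # keep only the first (primary) accession
        $spAccess = $1 unless defined $spAccess;
    } elsif ($line =~ /<fullName>(.*)<\/fullName>/) {
        $fullName = $1;
    } elsif ($line =~ /<name type="scientific">(.*)<\/name>/) {
        $organism = $1;
    }
}
close($fh);

print "$spAccess\t$fullName\t$organism\n";
```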
Example: Parsing a Table
- Open the files yourself and use their header lines to figure out which column contains which information
Part I
#!/usr/bin/perl
# open practice.dat to read
open(MY_FILE, "<practice.dat") or die "Couldn't open practice.dat\n";
# extract the id and the year information from the file,
# and save them to a hash using the id as a key
my %my_hash;
my $line = <MY_FILE>;   # ignore the header line
while (defined($line = <MY_FILE>)) {
    chomp $line;
    my @tokens = split /\t/, $line;
    $my_hash{$tokens[0]} = $tokens[2];
}
# close the file
close(MY_FILE);
Part II
- write the extracted ids to practice.out in alphabetical order
# 1. open a new file to write
open(MY_FILE, ">practice.out") or die "Couldn't open practice.out\n";
# 2. sort the ids
my @ids = keys %my_hash;
my @sorted_ids = sort @ids;
# 3. write the ids to the file
foreach my $element (@sorted_ids) {
    print MY_FILE $element . "\n";
}
# close the file
close(MY_FILE);
Part III
# save the ids whose year is 2006 to an array
my @my_array;
while (my ($key, $value) = each %my_hash) {
    if ($value == 2006) {
        push @my_array, $key;
    }
}
# print the number of the selected ids
print "The number of the selected ids in 2006 is " . scalar(@my_array) . "\n";
# print the list of the ids
print "The ids are\n";
foreach my $element (@my_array) {
    print $element . "\n";
}
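Parts I to III can be checked end to end without practice.dat. The sketch below assumes the same layout that Part I assumes (tab-separated, one header line, id in column 0, year in column 2) and reads a small made-up table from a string. It uses grep instead of the each loop, and sorts the selected ids because hash order is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for practice.dat: header line, then id, name, year.
my $data = "id\tname\tyear\n"
         . "p3\tgamma\t2006\n"
         . "p1\talpha\t2005\n"
         . "p2\tbeta\t2006\n";

# Part I: id => year
open(my $in, '<', \$data) or die "Cannot read the data: $!";
my %my_hash;
my $header = <$in>;   # ignore the header line
while (defined(my $line = <$in>)) {
    chomp $line;
    my @tokens = split /\t/, $line;
    $my_hash{$tokens[0]} = $tokens[2];
}
close($in);

# Part II: ids in alphabetical order
my @sorted_ids = sort keys %my_hash;

# Part III: ids whose year is 2006
my @selected = grep { $my_hash{$_} == 2006 } keys %my_hash;
my @my_array = sort @selected;

print "All ids: @sorted_ids\n";
print "The number of the selected ids in 2006 is " . scalar(@my_array) . "\n";
print "The ids are: @my_array\n";
```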
Perl Resources
- CPAN (Comprehensive Perl Archive Network)
- http://www.cpan.org
- Perl Users Group
- http://www.pm.org
- Perl Built-in Functions
- perldoc -f function_name
- The Perl function list is at
- http://docs.activestate.com/activeperl/5.10/lib/pods/perlfunc.html
- Books
- Learning Perl by R.L. Schwartz and T. Phoenix, O'Reilly
- Programming Perl by L. Wall, T. Christiansen, and J. Orwant, O'Reilly
- Web Tutorials
- http://docs.activestate.com/activeperl/5.10/lib/pods/perlintro.html