What is Perl? - PowerPoint PPT Presentation

Provided by: Tria746

Transcript and Presenter's Notes



1
What is Perl?
  • Practical Extraction and Report Language
  • Interpreted Language
  • Optimized for String Manipulation and File I/O
  • Full support for Regular Expressions

2
Running Perl Scripts
  • Windows
  • Download ActivePerl from ActiveState
  • Just run the script from a 'Command Prompt'
    window
  • UNIX / Cygwin
  • Put the following in the first line of your
    script
  • #!/usr/bin/perl
  • Run the script
  • perl script_name

3
Basic Syntax
  • Statements end with a semicolon (;)
  • Comments start with #
  • Only single-line comments
  • Variables
  • You don't have to declare a variable before you
    access it
  • You don't have to declare a variable's type

4
Scalars and Identifiers
  • Identifiers
  • A variable name
  • Case sensitive
  • Scalar
  • A single value (string or numerical)
  • Accessed by prefixing an identifier with '$'
  • Assignment with '='
  • $scalar = expression
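A minimal sketch of scalars and case-sensitive identifiers. It adds `use strict;`, `use warnings;` and `my`, which Perl does not require (as the previous slide says) but which catch misspelled variable names; all variable names here are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar holds a single value: a string or a number.
my $count = 42;         # numeric scalar
my $name  = "violin";   # string scalar

# Identifiers are case sensitive: $Count is a separate variable from $count.
my $Count = 7;

# The scalar expression on the right of '=' is evaluated, then assigned.
my $total = $count + $Count;

print "$name: $count + $Count = $total\n";   # prints "violin: 42 + 7 = 49"
```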

5
Strings
  • Quoting Strings
  • With ' (apostrophe)
  • Everything is interpreted literally
  • With " (double quotes)
  • Variables get expanded
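A small sketch contrasting the two kinds of quotes, assuming a hypothetical variable `$veg`: single quotes keep `$veg` and `\n` as typed, double quotes expand them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $veg = "potato";

# Single quotes: everything is literal, so $veg and \n stay as typed.
my $single = 'I like $veg\n';

# Double quotes: $veg is expanded and \n becomes a newline character.
my $double = "I like $veg\n";

print $single, "\n";   # prints: I like $veg\n
print $double;         # prints: I like potato
```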

6
Comparison Operators
String   Operation                   Arithmetic
lt       less than                   <
gt       greater than                >
eq       equal to                    ==
le       less than or equal to       <=
ge       greater than or equal to    >=
ne       not equal to                !=
cmp      compare, return 1, 0, -1    <=>
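Why two sets of operators matters: the same two values can compare differently as strings and as numbers. A minimal sketch with illustrative values "10" and "9":

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = ("10", "9");

# String operators compare character by character: "1" sorts before "9".
my $string_less  = ($x lt $y) ? 1 : 0;   # 1
# Arithmetic operators compare numeric values: 10 is not less than 9.
my $numeric_less = ($x <  $y) ? 1 : 0;   # 0

# cmp and <=> return -1, 0 or 1.
my $string_order  = $x cmp $y;   # -1
my $numeric_order = $x <=> $y;   # 1

print "lt: $string_less, <: $numeric_less, cmp: $string_order, <=>: $numeric_order\n";
```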
7
Logical Operators
Operator    Operation
||, or      logical or
&&, and     logical and
!, not      logical not
xor         logical xor
8
String Operators
Operator    Operation
.           string concatenation
x           string repetition
.=          concatenation and assignment
  • $string1 = "potato";
  • $string2 = "head";
  • $newstring = $string1 . $string2;   # "potatohead"
  • $newerstring = $string1 x 2;        # "potatopotato"
  • $string1 .= $string2;               # "potatohead"
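The string operators above as a runnable sketch (same example values as the slide):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "potato";
my $string2 = "head";

my $newstring   = $string1 . $string2;   # concatenation: "potatohead"
my $newerstring = $string1 x 2;          # repetition: "potatopotato"
$string1 .= $string2;                    # concatenate and assign: $string1 is now "potatohead"

print "$newstring\n$newerstring\n$string1\n";
```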

9
Perl Functions
  • Perl functions are identified by their unique
    names (print, chop, close, etc)
  • Function arguments are supplied as a comma-separated
    list in parentheses.
  • The commas are necessary
  • The parentheses often are not
  • Be careful! You can write some nasty and
    unreadable code this way!

Check 02_unreadable.pl
10
Lists
  • Ordered collection of scalars
  • Zero indexed (first item in position '0')
  • Elements addressed by their positions
  • List Operators
  • ( ) list constructor
  • , element separator
  • [ ] take slices (single or multiple element
    chunks)
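A minimal sketch of the list operators, using the string instruments from the later arrays example as illustrative data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ( ) builds a list, commas separate the elements.
my @strings = ("violin", "viola", "cello", "bass");

my $first  = $strings[0];       # single element, zero indexed: "violin"
my @middle = @strings[1, 2];    # multiple-element slice: ("viola", "cello")

# A slice can also be taken directly from a list literal.
my ($lo, $hi) = (30, 10, 20)[1, 2];   # (10, 20)

print "$first | @middle | $lo $hi\n";
```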

11
List Operations
  • sort(LIST)
  • a new list, the sorted version of LIST
  • reverse(LIST)
  • a new list, the reverse of LIST
  • join(EXPR, LIST)
  • a string version of LIST, delimited by EXPR
  • split(PATTERN, EXPR)
  • create a list from the portions of EXPR that
    are separated by matches of PATTERN

Check 03_listOps.pl
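The four list operations in one sketch (illustrative data; not the contents of 03_listOps.pl, which is not shown here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fruit = ("pear", "apple", "fig");

my @sorted   = sort(@fruit);         # ("apple", "fig", "pear")
my @reversed = reverse(@sorted);     # ("pear", "fig", "apple")
my $joined   = join(",", @sorted);   # "apple,fig,pear"

# split returns the pieces *between* the matches of the pattern.
my @pieces = split(/,/, $joined);    # ("apple", "fig", "pear")

print "$joined\n@reversed\n", scalar(@pieces), " pieces\n";
```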
12
Arrays
  • A named list
  • Dynamically allocated, can be saved
  • Zero-indexed
  • Shares list operations, and adds to them
  • Array Operators
  • @ refers to the array (or a portion of it,
    with [ ])
  • $ refers to an element (used with [ ])

13
Array Operations
  • push(@ARRAY, LIST)
  • add the LIST to the end of @ARRAY
  • pop(@ARRAY)
  • remove and return the last element of @ARRAY
  • unshift(@ARRAY, LIST)
  • add the LIST to the front of @ARRAY
  • shift(@ARRAY)
  • remove and return the first element of @ARRAY
  • scalar(@ARRAY)
  • return the number of elements in @ARRAY

Check 04_arrayOps.pl
14
Associative Arrays - Hashes
  • Arrays indexed on arbitrary string values
  • Key-Value pairs
  • Use the "Key" to find the element that has the
    "Value"
  • Hash Operators
  • % refers to the hash
  • { } denotes the key
  • $ the value of the element indexed by the key
    (used with { })

15
Hash Operations
  • keys(%ARRAY)
  • return a list of all the keys in %ARRAY
  • values(%ARRAY)
  • return a list of all the values in %ARRAY
  • each(%ARRAY)
  • iterates through the key-value pairs of
    %ARRAY
  • delete($ARRAY{KEY})
  • removes the key-value pair associated with KEY
    from %ARRAY
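A minimal sketch of the hash operations, including `delete`, which the hashes example later does not use. It builds the hash with the `=>` (fat comma) syntax, an assumption not shown on these slides, and sorts the keys because Perl returns them in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %player = (
    flute => "Heidi Lawson",
    oboe  => "Jeanine Hassel",
);

# keys() returns the keys in no fixed order, so sort them for a predictable list.
my @instruments = sort(keys(%player));   # ("flute", "oboe")

# each() returns one (key, value) pair per call until the hash is exhausted.
while (my ($instrument, $name) = each(%player)) {
    print "$name plays the $instrument\n";
}

# delete() removes the key-value pair for one key.
delete($player{"flute"});
my @remaining = keys(%player);            # ("oboe")
```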

16
Arrays Example
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Simple list operations
    my @stringInstruments = ("violin", "viola", "cello", "bass");
    my @brass = ("trumpet", "horn", "trombone", "euphonium", "tuba");

    # Address an element in the list
    my $biggestInstrument = $stringInstruments[3];
    print("The biggest instrument: ", $biggestInstrument, "\n");

    # Join elements at positions 0, 1, 2 and 4 into a
    # white-space delimited string
    print("orchestral brass: ",
          join(" ", @brass[0, 1, 2, 4]),
          "\n");

    # Sort the list
    my @unsorted_num = ('3', '5', '2', '1', '4');
    my @sorted_num = sort(@unsorted_num);

    # Add a few more numbers
    my @numbers_10 = @sorted_num;
    push(@numbers_10, ('6', '7', '8', '9', '10'));
    print("Numbers (1-10): @numbers_10\n");

    # Remove the last
    my $last = pop(@numbers_10);
    print("Removed $last; numbers (1-9): @numbers_10\n");

    # Remove the first
    my $first = shift(@numbers_10);
    print("Removed $first; numbers (2-9): @numbers_10\n");

    # Count the elements
    print("Count elements (2-9): ", scalar(@numbers_10), "\n");
    print("What's left (numbers 2-9): @numbers_10\n");
17
Hashes Example
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Simple hash operations
    my %player;
    $player{"clarinet"} = "Susan Bartlett";
    $player{"bassoon"}  = "Andrew Vandesteeg";
    $player{"flute"}    = "Heidi Lawson";
    $player{"oboe"}     = "Jeanine Hassel";
    my @woodwinds       = keys(%player);
    my @woodwindPlayers = values(%player);

    # Who plays the oboe?
    print("Oboe: ", $player{'oboe'}, "\n");
    my $playerCount = scalar(@woodwindPlayers);
    print("Number of players: $playerCount\n");
    while (my ($instrument, $name) = each(%player)) {
        print("$name plays the $instrument\n");
    }

18
Pattern Matching
  • A pattern is a sequence of characters to be
    searched for in a character string
  • /pattern/
  • Match operators
  • =~ tests whether a pattern is matched
  • !~ tests whether a pattern is not matched
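A minimal sketch of the two match operators on an illustrative string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "define";

# =~ is true when the pattern matches somewhere in the string.
my $has_def = ($word =~ /def/) ? 1 : 0;   # 1

# !~ is true when the pattern does NOT match.
my $no_xyz = ($word !~ /xyz/) ? 1 : 0;    # 1

print "has def: $has_def, no xyz: $no_xyz\n";
```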

19
Patterns
Pattern       Matches
/def/         "define"
/\bdef\b/     a def word
/^def/        def at the start of a line
/^def$/       a line containing only def
/de?f/        df, def
/d[eE]f/      def, dEf
/d[^eE]f/     daf, dzf
/d.f/         dif
/d.+f/        dabcf
/d.*f/        df, daffff
/de{1,3}f/    deef, deeef
/de{3}f/      deeef
/de{3,}f/     deeeeef
/de{0,3}f/    up to deeef
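A quick way to check entries from the pattern table: store each pattern with `qr//` (a quoting operator not covered on these slides) next to a test string and the expected result. The test strings are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: a pattern from the table, a test string, and whether it should match.
my @checks = (
    [ qr/\bdef\b/, "a def word", 1 ],
    [ qr/\bdef\b/, "define",     0 ],   # "def" is followed by a word character
    [ qr/^def$/,   "def",        1 ],
    [ qr/de?f/,    "df",         1 ],   # ? allows zero or one "e"
    [ qr/d[^eE]f/, "dEf",        0 ],   # [^eE] excludes e and E
    [ qr/d.+f/,    "dabcf",      1 ],
    [ qr/d.+f/,    "df",         0 ],   # + needs at least one character
    [ qr/d.*f/,    "df",         1 ],   # * also allows zero characters
    [ qr/de{3}f/,  "deeef",      1 ],
    [ qr/de{3}f/,  "deef",       0 ],   # {3} needs exactly three e's in a row
);

my $unexpected = 0;
for my $check (@checks) {
    my ($pattern, $string, $expected) = @$check;
    my $matched = ($string =~ $pattern) ? 1 : 0;
    $unexpected++ if $matched != $expected;
    printf "%-12s %s\n", $string, $matched ? "match" : "no match";
}
print "$unexpected unexpected results\n";
```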
20
Character Ranges
Escape Sequence   Pattern          Description
\d                [0-9]            Any digit
\D                [^0-9]           Anything but a digit
\w                [_0-9A-Za-z]     Any word character
\W                [^_0-9A-Za-z]    Anything but a word character
\s                [ \r\t\n\f]      White-space
\S                [^ \r\t\n\f]     Anything but white-space
21
Backreferences
  • Memorize the matched portion of input
  • Use of parentheses.
  • /[a-z]+(.)[a-z]+\1[a-z]+/
  • Matches asd-eeed-sdsa, sd-sss-ws
  • Does NOT match as_eee-dfg
  • The captured text can even be accessed immediately
    after the pattern is matched (as $1)
  • \1 in the previous pattern is what is matched by
    (.)
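The backreference example as a runnable sketch: `\1` inside the pattern repeats the captured separator, and `$1` gives the same text after the match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \1 inside the pattern must repeat whatever (.) captured.
my $pattern = qr/[a-z]+(.)[a-z]+\1[a-z]+/;

for my $input ("asd-eeed-sdsa", "sd-sss-ws", "as_eee-dfg") {
    if ($input =~ $pattern) {
        # After a successful match, $1 holds the text captured by (.).
        print "$input matches; separator is \"$1\"\n";
    } else {
        print "$input does not match\n";
    }
}
```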

22
Pattern Matching Options
Option   Description
g        Match all possible patterns
i        Ignore case
x        Ignore white-space in pattern
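A minimal sketch of the three options on an illustrative string. Using `/g` in list context to collect every match is one common use; it is an assumption beyond what the slide states.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Please, please, PLEASE";

# i: ignore case, so /please/i also matches "Please" and "PLEASE".
my $polite = ($text =~ /please/i) ? 1 : 0;        # 1

# g in list context: return every match, not just the first.
my @all = ($text =~ /please/gi);                   # 3 matches

# x: white-space inside the pattern is ignored, so it can be spaced out.
my ($word) = $text =~ / ( [A-Z] [a-z]+ ) /x;       # "Please"

print "polite: $polite, matches: ", scalar(@all), ", first capitalized word: $word\n";
```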



23
Substitutions
  • Substitution operator
  • s/pattern/substitution/options
  • If $string = "abc123def" (each example starts
    from this value)
  • $string =~ s/123/456/;
  • Result: "abc456def"
  • $string =~ s/123//;
  • Result: "abcdef"
  • $string =~ s/(\d+)/[$1]/;
  • Result: "abc[123]def"
  • Use of backreference!
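The substitutions above as a runnable sketch. The `(my $copy = $original) =~ s///` idiom (an assumption, not from the slides) copies the string first, so each substitution starts from the same "abc123def".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $original = "abc123def";

(my $replaced = $original) =~ s/123/456/;       # "abc456def"
(my $deleted  = $original) =~ s/123//;          # "abcdef"
(my $wrapped  = $original) =~ s/(\d+)/[$1]/;    # "abc[123]def": $1 is the captured digits

print "$replaced\n$deleted\n$wrapped\n";
```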

24
Predefined Read-only Variables
$&  is the part of the string that matched the regular expression
$`  is the part of the string before the part that matched
$'  is the part of the string after the part that matched
EXAMPLE
    $_ = "this is a sample string";
    /sa.*le/;    # matches "sample" within the string
    # $` is now "this is a "
    # $& is now "sample"
    # $' is now " string"
Because these variables are set on each successful match, you should save the values elsewhere if you need them later in the program.
25
The split and join Functions
The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values. The join function takes a list of values and glues them together with a glue string between each list element.
Split Example
    $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
    @fields = split(/:/, $line);   # split $line, using : as delimiter
    # now @fields is ("merlyn","","118","10","Randal","/home/merlyn","/usr/bin/perl")
Join Example
    $bigstring = join($glue, @list);
    # For example, to rebuild the password file line, try something like
    $outline = join(":", @fields);
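A round-trip sketch of the split and join examples: splitting on ':' and joining with ':' gives back the original line, including the empty field between "::".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

# Split on ':'; the empty field between '::' is kept as "".
my @fields = split(/:/, $line);

# Join the fields back together with ':' as the glue string.
my $outline = join(":", @fields);

print scalar(@fields), " fields; round trip ", ($outline eq $line ? "ok" : "changed"), "\n";
```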
26
String - Pattern Examples
  • A simple example
    #!/usr/bin/perl
    use strict;
    use warnings;

    print ("Ask me a question politely:\n");
    my $question = <STDIN>;
    # what about capital P in "please"?
    if ($question =~ /please/) {
        print ("Thank you for being polite!\n");
    } else {
        print ("That was not very polite!\n");
    }
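One way to handle the capital-P question in the comment: add the `i` option so "Please" also counts. This sketch wraps the test in a hypothetical helper, `is_polite`, so it can run without keyboard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the question contains "please" in any case.
sub is_polite {
    my ($question) = @_;
    return $question =~ /please/i ? 1 : 0;
}

for my $question ("Please pass the salt?", "Pass the salt!") {
    print is_polite($question) ? "Thank you for being polite!\n"
                               : "That was not very polite!\n";
}
```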

27
String Pattern Example
    #!/usr/bin/perl
    use strict;
    use warnings;

    print ("Enter a variable name:\n");
    my $varname = <STDIN>;
    chop ($varname);
    # Try $asdasdas... It gets accepted! (the patterns are not anchored)
    if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
        print ("$varname is a legal scalar variable\n");
    } elsif ($varname =~ /\@[A-Za-z][_0-9a-zA-Z]*/) {
        print ("$varname is a legal array variable\n");
    } elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
        print ("$varname is a legal file variable\n");
    } else {
        print ("$varname is not a legal variable name\n");
    }
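A sketch of one fix for the "It gets accepted!" problem: anchor each pattern with ^ and $ so the whole input must fit, not just part of it. `classify_varname` is a hypothetical helper introduced here so the check can run without keyboard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ^ and $ force the WHOLE string to match the pattern.
sub classify_varname {
    my ($varname) = @_;
    return "scalar" if $varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/;
    return "array"  if $varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/;
    return "file"   if $varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/;
    return "illegal";
}

for my $name ('$count', '@bases', 'MYFILE', '$asdasdas...') {
    print "$name: ", classify_varname($name), "\n";
}
```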

28
Sources
  • Beginning Perl for Bioinformatics
  • James Tisdall, O'Reilly Press, 2000
  • Using Perl to Facilitate Biological Analysis, in
    Bioinformatics: A Practical Guide (2nd Ed.)
  • Lincoln Stein, Wiley-Interscience, 2001
  • Introduction to Programming and Perl
  • Alan M. Durham, Computer Science Dept., Univ. of
    São Paulo, Brazil

29
Why Write Programs?
  • Automate computer work that you do by hand: save
    time and reduce errors
  • Run the same analysis on lots of similar data
    files (scale up)
  • Analyze data, make decisions
  • sort BLAST results by e-value and/or species of
    best match
  • Build a pipeline
  • Create new analysis methods

30
Why Perl?
  • Fairly easy to learn the basics
  • Many powerful functions for working with text:
    search, extract, modify, combine
  • Can control other programs
  • Free and available for all operating systems
  • Most popular language in bioinformatics
  • Many pre-built modules are available that do
    useful things

31
Programming Concepts
  • Program: a text file that contains instructions
    for the computer to follow
  • Programming language: a set of commands that the
    computer understands (via a command
    interpreter)
  • Input: data that is given to the program
  • Output: something that is produced by the
    program

32
Programming
  • Write the program (with a text editor)
  • Run the program
  • Look at the output
  • Correct the errors (debugging)
  • Repeat
  • (computers are VERY dumb: they do exactly what
    you tell them to do, so be careful what you ask
    for)

33
Strings
  • Text is handled in Perl as a string
  • This basically means that you have to put quotes
    around any piece of text that is not an actual
    Perl instruction.
  • Perl has two kinds of quotes: single ' and
    double "
  • (they are different; more about this later)

34
Print
  • Perl uses the term print to create output
  • Without a print statement, you won't know what
    your program has done
  • You need to tell Perl to put a newline at
    the end of a printed line
  • Use the \n (newline) escape sequence
  • Put it inside double quotes: "\n"
  • The \ character is called an escape; Perl
    uses it a lot

35
Program details
  • Perl programs on UNIX usually start with the line
  • #!/usr/bin/perl
  • this tells the computer that this is a Perl
    program and where to find the Perl interpreter
  • All other lines that start with # are considered
    comments, and are ignored by Perl
  • Lines that are Perl commands end with a ;

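A minimal first program that follows the rules above: the #! line first, # comments, statements ending in ;, and \n inside double quotes. The greeting text is illustrative.

```perl
#!/usr/bin/perl
# Lines starting with # are comments; the #! line above is special
# because it tells UNIX where to find the Perl interpreter.
use strict;
use warnings;

my $greeting = "Hello, world!";   # every statement ends with a semicolon
print $greeting, "\n";            # \n inside double quotes ends the line
```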
36
Run your Perl program
  • > chmod u+x *.pl    make the file executable
  • > perl my_perl1.pl
  • use the perl interpreter to run your script

37
Numbers and Functions
  • Perl handles numbers in most common formats
  • 456
  • 5.6743
  • 6.3E-26
  • Arithmetic works pretty much as you
    would expect
  • 4+7
  • 6*4
  • 43-27
  • 256/12
  • 2/(3-5)
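A sketch that evaluates the number formats and arithmetic expressions from this slide. Exact printed formatting of the non-integer results is left to Perl, so the comments only give approximate values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Integer, decimal and scientific-notation literals.
my @numbers = (456, 5.6743, 6.3E-26);

my $sum        = 4 + 7;        # 11
my $product    = 6 * 4;        # 24
my $difference = 43 - 27;      # 16
my $quotient   = 256 / 12;     # about 21.333: division does not truncate
my $grouped    = 2 / (3 - 5);  # -1: parentheses are evaluated first

print "@numbers\n";
print join(" ", $sum, $product, $difference, $quotient, $grouped), "\n";
```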

38
Do the Math (your 2nd Perl program)
    #!/usr/bin/perl
    print "4+5\n";               # prints the text 4+5
    print 4+5 , "\n";            # prints 9
    print "4+5=" , 4+5 , "\n";   # prints 4+5=9
  • Note: use commas to separate multiple items in a
    print statement; whitespace is ignored

39
Variables
  • To be useful at all, a program needs to be able
    to store information from one line to the next
  • Perl stores information in variables
  • A variable name starts with the $ symbol, and
    it can store strings or numbers
  • Variables are case sensitive
  • Give them sensible names
  • Use the = sign to assign values to variables
  • $one_hundred = 100;
  • $my_sequence = "ttattagcc";

40
You can do Math with Variables
    #!/usr/bin/perl
    # put some values in variables
    $sequences_analyzed = 200;
    $new_sequences = 21;
    # now we will do the work
    $percent_new_sequences = ( $new_sequences /
        $sequences_analyzed) * 100;
    print "% of new sequences = ",
        $percent_new_sequences, "\n";
  • Output: % of new sequences = 10.5

41
String Operations
  • Strings (text) in variables can be used for some
    math-like operations
  • Concatenate (join) use the dot . operator
  • $seq1 = "ACTG";
  • $seq2 = "GGCTA";
  • $seq3 = $seq1 . $seq2;
  • print $seq3;
  • ACTGGGCTA
  • String comparison (are they the same, gt or lt)
  • eq (equal )
  • ne (not equal )
  • ge (greater or equal )
  • gt (greater than )
  • lt (less than )
  • le (less or equal )

String comparison uses some non-intuitive ways of
comparing letters (ASCII values)
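What "ASCII values" means in practice: every uppercase letter (codes 65 to 90) sorts before every lowercase letter (97 to 122). A minimal sketch; using `lc` to compare without regard to case is one common workaround, assumed here rather than taken from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "Z" (90) is less than "a" (97), so "Zebra" sorts before "apple".
my $upper_first = ("Zebra" lt "apple") ? 1 : 0;             # 1

# Comparing lowercased copies ignores case.
my $case_blind = (lc("Zebra") lt lc("apple")) ? 1 : 0;      # 0

my $same = ("ACTG" eq "ACTG") ? 1 : 0;   # 1
my $diff = ("ACTG" ne "actg") ? 1 : 0;   # 1: case matters for eq and ne too

print "$upper_first $case_blind $same $diff\n";
```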
42
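Concatenating DNA
    #!/usr/bin/perl
    # Store two DNA sequences
    $DNA1 = "ACGGAA";
    $DNA2 = "CCGGAAGAA";
    # Method 1: put both variables inside a double-quoted string
    $DNA3 = "$DNA1$DNA2";
    # Method 2: use the concatenation operator (.)
    $DNA4 = $DNA1 . $DNA2;
    print "$DNA3\n$DNA4\n";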
43
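Transcribing DNA to RNA
    #!/usr/bin/perl
    # Transcribe DNA into RNA
    $DNA1 = "ACGGAA";
    $RNA = $DNA1;
    $RNA =~ s/T/U/g;
    print "$RNA\n";
    # end the program
    exit;
  • =~ is the binding operator, s means substitute, T is
    replaced with U, and g (global) replaces every occurrence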
44
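Complementing DNA (incorrect version)
    #!/usr/bin/perl
    # Complement: A->T, T->A, C->G, G->C
    $DNA1 = "ACGGAA";
    $DNA2 = $DNA1;
    $DNA2 =~ s/A/T/g;
    $DNA2 =~ s/T/A/g;
    $DNA2 =~ s/G/C/g;
    $DNA2 =~ s/C/G/g;
    print "$DNA2\n";    # prints AGGGAA, not the complement TGCCTT
    # end the program
    exit;
  • Each s///g runs on the result of the previous one, so
    s/T/A/g undoes s/A/T/g and s/C/G/g undoes s/G/C/g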
45
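Complementing DNA with tr
    #!/usr/bin/perl
    # Complement: A->T, T->A, C->G, G->C
    $DNA1 = "ACGGAA";
    $DNA2 = $DNA1;
    $DNA2 =~ tr/ATGC/TACG/;
    print "$DNA2\n";    # prints TGCCTT
    # end the program
    exit;
  • =~ is the binding operator; tr translates each character in
    the first list into the character at the same position in
    the second list, all in a single pass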
46
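Reading from a file
    #!/usr/bin/perl
    $filename = "a.dat";
    open(MYFILE, '<', $filename) or die "Could not open $filename: $!\n";
    $DNA1 = <MYFILE>;   # first line
    $DNA2 = <MYFILE>;   # second line
    $DNA3 = <MYFILE>;   # third line
    close(MYFILE);
    print $DNA1, $DNA2, $DNA3;
    # end the program
    exit;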
47
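Reading a whole file into an array
    #!/usr/bin/perl
    $filename = "a.dat";
    open(MYFILE, '<', $filename) or die "Could not open $filename: $!\n";
    @DNA = <MYFILE>;    # every line of the file, one line per element
    print @DNA;
    close MYFILE;
    # end the program
    exit;
  • @ marks an array variable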
48
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    print @bases, "\n";      # prints AGCT
    print "@bases\n";        # prints A G C T
    print $bases[0], "\n";   # prints A
    print $bases[1], "\n";   # prints G
    # end the program
    exit;
  • @ marks an array variable
49
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    $base1 = pop @bases;
    print $base1, "\n";   # prints T
    print @bases, "\n";   # prints AGC
  • pop removes and returns the last element of an array
50
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    $base1 = shift @bases;
    print $base1, "\n";   # prints A
    print @bases, "\n";   # prints GCT
  • shift removes and returns the first element of an array
51
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    $base1 = pop @bases;
    unshift(@bases, $base1);
    print $base1, "\n";   # prints T
    print @bases, "\n";   # prints TAGC
  • unshift adds elements to the front of an array
52
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    $base1 = pop @bases;
    push(@bases, $base1);
    print $base1, "\n";   # prints T
    print @bases, "\n";   # prints AGCT
  • push adds elements to the end of an array
53
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    @rev1 = reverse @bases;
    print @rev1, "\n";    # prints TCGA
  • reverse returns the elements in reverse order
54
Arrays
    #!/usr/bin/perl
    @bases = ('A', 'G', 'C', 'T');
    print scalar @bases, "\n";   # prints 4
    splice(@bases, 2, 0, 'X');   # insert X at position 2
    print @bases, "\n";          # prints AGXCT
  • splice inserts elements into (or removes them from) the
    middle of an array
55
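Conditionals
  • Conditional statements
  • if, if-else, unless
  • if (true) { do something }: a true condition has a value
    such as 1
  • if (false) { do not do something }: a false condition
    has a value such as 0
  • if (...) { ... } else { ... }
  • unless (1 == 0) { print "1 != 0"; }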
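A minimal sketch of if/elsif/else and unless, using an illustrative base-classification task (purines A and G, pyrimidines C and T).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = "G";

# if-elsif-else: the first true condition wins.
my $kind;
if ($base eq "A" or $base eq "G") {
    $kind = "purine";
} elsif ($base eq "C" or $base eq "T") {
    $kind = "pyrimidine";
} else {
    $kind = "unknown";
}
print "$base is a $kind\n";

# unless (COND) runs the block when COND is false.
unless (1 == 0) {
    print "1 != 0\n";
}
```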

56
Loops
  • Loop statements
  • Used to repeat the same statements several times:
    while, for, foreach, and so on
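The three loop forms side by side, each adding up the numbers 1 to 10 (an illustrative task):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# while: repeat as long as the condition is true.
my $i = 1;
my $while_sum = 0;
while ($i <= 10) {
    $while_sum += $i;
    $i++;
}

# for: initialization; condition; update.
my $for_sum = 0;
for (my $j = 1; $j <= 10; $j++) {
    $for_sum += $j;
}

# foreach: visit each element of a list in turn.
my $foreach_sum = 0;
foreach my $k (1 .. 10) {
    $foreach_sum += $k;
}

print "$while_sum $for_sum $foreach_sum\n";   # 55 55 55
```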

57
Reading a file line by line
    #!/usr/bin/perl
    $proteinfilename = "file1.pep";
    # open the file; if that fails, print an error message and exit
    unless (open(PROTEINFILE, $proteinfilename)) {
        print "Could not open file $proteinfilename!\n";
        exit;
    }
    # read the file one line at a time in a while loop
    while ($protein = <PROTEINFILE>) {
        print $protein;
    }
    # close the file
    close PROTEINFILE;
    exit;
58
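Reading the filename from the keyboard
    #!/usr/bin/perl
    # read the filename typed by the user and remove the trailing newline
    $proteinfilename = <STDIN>;
    chomp $proteinfilename;
    # open the file; if that fails, print an error message and exit
    unless (open(PROTEINFILE, $proteinfilename)) {
        print "Could not open file $proteinfilename!\n";
        exit;
    }
    # read the file one line at a time in a while loop
    while ($protein = <PROTEINFILE>) {
        print $protein;
    }
    # close the file
    close PROTEINFILE;
    exit;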
59
Loops
    #!/usr/bin/perl
    for (initialization; condition; update) { ... }
    while (condition) { ... }
60
Reading a whole file at once
    #!/usr/bin/perl
    $proteinfilename = "file1.pep";
    # remove any trailing newline from the filename
    chomp $proteinfilename;
    unless (open(PROTEINFILE, $proteinfilename)) {
        print "Could not open file $proteinfilename!\n";
        exit;
    }
    # read every line of the file into the array @protein
    @protein = <PROTEINFILE>;
    # close the file
    close PROTEINFILE;
    # join the lines into a single string
    $protein = join('', @protein);
    # remove white-space, including the newlines
    $protein =~ s/\s//g;
    print "$protein\n";
    exit;
61
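Counting bases
    #!/usr/bin/perl
    $DNA = "ACGGAA";   # example sequence (hypothetical)
    # split the DNA into an array of single bases
    @DNA = split('', $DNA);
    # initialise the counters
    $count_A = 0; $count_G = 0; $count_C = 0; $count_T = 0;
    foreach $base (@DNA) {
        if    ($base eq 'A') { ++$count_A; }
        elsif ($base eq 'G') { ++$count_G; }
        elsif ($base eq 'C') { ++$count_C; }
        elsif ($base eq 'T') { ++$count_T; }
    }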
62
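Writing to a file
  • Save the counts in a file named out1
    $outputfile = "out1";
    unless (open(COUNTBASE, ">$outputfile")) {
        print "cannot open file $outputfile\n";
        exit;
    }
    print COUNTBASE "A=$count_A G=$count_G C=$count_C T=$count_T\n";
    close(COUNTBASE);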
63
????
  • ???? ? ?? ?? DNA? ???? ????? ??. Dot(.)??? ?
    ?????()? ???? ??? ???? ??? ???? ??? ??? ? ??
    DNA? ???? ?????
  • 1?? 100??? ?? ??? ????.(??? ??)
  • DNA? ? ??? ???? ???? ????? ????. ?? ??? ???? ?
    ??? DNA???? ????.
  • ??? ??? ? ?? ???? ?? ????? ???? ????? ????. ? ???
    split, pop, shift, eq ? ?????
  • ??? DNA ??? G, C,? ???? ????.
  • ? ?? ??? ?? ? ?? ??? ??? ??? ?? ?? ????? ?????
    ????.
  • ??? ?? ?? ??? ??? ??? ??? ??? ??? ???? ?????
    ????.

64
subroutine
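    #!/usr/bin/perl
    # append ACGT to a DNA sequence using a subroutine
    $dna = "CGACTTAA";
    $longer_dna = addACGT($dna);
    print "I added ACGT to $dna and got $longer_dna\n";
    exit;

    # subroutine definition
    sub addACGT {
        my($dna) = @_;     # the arguments arrive in the array @_
        $dna .= 'ACGT';    # append ACGT
        return $dna;
    }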
65
subroutine
  • To receive several arguments at once:
    my($dna, $protein, $name_of_genes) = @_;
66
Subroutine(call by reference)
  • To pass an array to a subroutine as a single argument,
    put \ in front of it; this passes a reference to the array
    #!/usr/bin/perl
    my @i = (1, 2, 3);
    reference_sub(\@i);
    print "@i\n";          # prints 1 2 3 4
    exit;

    sub reference_sub {
        my($i) = @_;       # $i holds the reference to @i
        push(@$i, 4);      # dereference it and add 4 to the original array
    }
68
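Counting the Gs in a DNA sequence given on the command line
    #!/usr/bin/perl
    # $0 holds the name of the program; use it in the usage message
    my($USAGE) = "$0 DNA\n\n";
    # @ARGV holds the command-line arguments; if there are none, print usage and exit
    unless (@ARGV) {
        print $USAGE;
        exit;
    }
    my($dna) = $ARGV[0];
    # call the subroutine
    my($num_of_Gs) = countG($dna);
    # report the result
    print "$num_of_Gs\n";
    exit;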
69
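Counting the Gs in a DNA sequence given on the command line
    # subroutine
    sub countG {
        # return the number of Gs in the DNA passed as the argument
        my($dna) = @_;
        my($count) = 0;
        # tr returns the number of characters it matched; with an empty
        # replacement list it counts them without changing the string
        $count = ($dna =~ tr/Gg//);
        return $count;
    }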
70
????
  • DNA? ? ?? ???? ???? ??? ??? ????? ????
  • DNA? ?? A, T, G, C? ???? ???? ????? ????.
  • ? ??? 10?? ?? ????. ? ?? ?? ???? ????? ????.
  • ? ??? 10?? ?? ????. ? ??? ??? 50? ??? ????? ???.
  • DNA? ???(A->T, G->C, C->G, T->A)? ??? ????? ????.
    (?? ??? ????? ???. ?? ??????)
  • DNA??? ????, ??? ??? ???? ????? ????.

71
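Reverse complement subroutine
    sub revcom {
        my($dna) = @_;
        # reverse the sequence; assigning to a plain scalar gives reverse
        # scalar context, so it reverses the string itself
        my $revcom = reverse($dna);
        # complement each base
        $revcom =~ tr/ACGTacgt/TGCAtgca/;
        return $revcom;
    }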
72
Regular Expressions
  • Regular expressions are used to define patterns
    you wish to search for in strings
  • Use a syntax with rules and operators
  • Can create extremely sophisticated patterns
  • Numbers, letters, case insensitivity, repetition,
    anchoring, zero or one, white space, tabs,
    newlines, etc....
  • Patterns are deterministic but can be made
    extremely specific or extremely general
  • Test for match, replace, select
  • Lots on REGEX tomorrow!

73
Using REGEX
  • =~ is the operator we use with REGEX
  • =~ is combined with utility operators to match,
    replace
    $DNA = "AGATGATAT";
    if ($DNA =~ /ATG/) {
        print "Match!";
    }
  • Matching leaves the string unchanged
  • The pattern is a set of characters between //
  • =~ is the pattern match comparison operator
74
REGEX - Substitution
  • You can substitute the parts of a string that
    match a regular expression with another string

    $DNA = "AGATGATAT";
    $DNA =~ s/T/U/g;
    print $DNA, "\n";     # prints AGAUGAUAU
  • s is the substitution operator, T the pattern to search
    for, U the replacement string, and g means global
    replacement
  • Substitution changes the variable
75
REGEX - Substitution
    $DNA = "AGATGATAT";
    $DNA =~ s/T/U/;
    print $DNA, "\n";     # prints AGAUGATAT: without g, only the first T is replaced
76
REGEX - Translation
  • You can translate a string by exchanging one set
    of characters for another set of characters

    $DNA = "AGATGATAT";
    $DNA =~ tr/ACGT/TGCA/;
    print $DNA, "\n";     # prints TCTACTATA
  • tr is the translation operator, ACGT the set of
    characters to replace, and TGCA the replacement
    characters
  • Translation changes the variable
77
s, tr
  • Task
  • transcribe and reverse-complement a DNA
    sequence
  • Concepts
  • Simple regular expressions using s and tr
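One possible solution to this task, sketched with the sequence from the earlier REGEX slides. Note the reverse step assigns to a plain scalar so that `reverse` reverses the string rather than a one-element list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = "AGATGATAT";

# Transcription: copy the DNA, then replace every T with U.
my $RNA = $DNA;
$RNA =~ s/T/U/g;                 # "AGAUGAUAU"

# Reverse complement: reverse in scalar context, then complement each base.
my $revcom = reverse($DNA);      # "TATAGTAGA"
$revcom =~ tr/ACGT/TGCA/;        # "ATATCATCT"

print "RNA: $RNA\nReverse complement: $revcom\n";
```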

78
Functions
  • reverse(STRING)
  • Function that reverses a string
  • $STRING =~ s/PATTERN/REPLACEMENT/modifiers
  • This is the substitution operator
  • $STRING =~ tr/CHARS/REPLACEMENT CHARS/
  • This is the translation operator

79
REGEX - recap
  • REGEX are used to find patterns in text
  • Use a syntax that must be learned in order to be
    exploited
  • Extremely powerful for processing and
    manipulating text
  • Will be examined more closely tomorrow

80
Functions
  • Functions (sub-routines) are like small programs
    inside your program
  • Like programs, functions execute a series of
    statements that process input to produce some
    desired output
  • Functions help to organise your program
  • parcel it into named functional units that can be
    called repeatedly
  • There are literally hundreds of functions
    built-in to Perl
  • You can make your own functions

81
What happens when you call a function?
input -> sub reverse (process) -> return output
    $DNA = "ACATAATCAT";
    $rcDNA = reverse($DNA);
    $rcDNA =~ tr/ACGT/TGCA/;
82
Calling a function
  • Input is passed to a function by way of an
    ordered parameter list

Basic syntax of calling a function
    $result = function_name(parameter list);
    $longDNA = "ACGACTAGCATGCATCGACTACGACTACGATCAGCATCGACT";
    $shortDNA = substr($longDNA, 0, 10);
  • $longDNA: string from which to extract the substring
  • 0: start from this position
  • 10: length of the substring
83
Useful string functions in Perl
  • chop(STRING) OR chop(ARRAY)
  • Removes the last character from a string or the
    last character from every element in an array.
    The last character chopped is returned.
  • index(STRING, SUBSTRING, POSITION)
  • Returns the position of the first occurrence of
    SUBSTRING in STRING at or after POSITION. If you
    don't specify POSITION, the search starts at the
    beginning of STRING.
  • join(STRING, ARRAY)
  • Returns a string that consists of all of the
    elements of ARRAY joined together by STRING. For
    instance, join(">>", ("AA", "BB", "cc")) returns
    "AA>>BB>>cc".
  • lc(STRING)
  • Returns a string with every letter of STRING in
    lowercase. For instance, lc("ABCD") returns
    "abcd".
  • lcfirst(STRING)
  • Returns a string with the first letter of STRING
    in lowercase. For instance, lcfirst("ABCD")
    returns "aBCD".
  • length(STRING)
  • Returns the length of STRING.
  • split(PATTERN, STRING, LIMIT)
  • Breaks up a string based on some delimiter. In a
    list context, it returns a list of the things
    that were found. In a scalar context, it returns
    the number of things found.
  • substr(STRING, OFFSET, LENGTH)
  • Returns a portion of STRING as determined by the
    OFFSET and LENGTH parameters. If LENGTH is not
    specified, then everything from OFFSET to the end
    of STRING is returned. A negative OFFSET can be
    used to start from the right side of STRING.
  • uc(STRING)
  • Returns a string with every letter of STRING in
    uppercase. For instance, uc("abcd") returns
    "ABCD".
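  • ucfirst(STRING)
  • Returns a string with the first letter of STRING
    in uppercase. For instance, ucfirst("abcd")
    returns "Abcd".

Source: http://www.cs.cf.ac.uk/Dave/PERL/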
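A sketch exercising most of the string functions listed above on an illustrative DNA string; the `join`, `lc`, `lcfirst` and `uc` calls reuse the slide's own examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACGTACGT";

my $pos_first  = index($dna, "GT");      # 2: first "GT" starts at offset 2
my $pos_second = index($dna, "GT", 3);   # 6: the search starts at offset 3
my $head = substr($dna, 0, 4);           # "ACGT"
my $tail = substr($dna, -3);             # "CGT": a negative offset counts from the right
my $len  = length($dna);                 # 8

my $joined      = join(">>", ("AA", "BB", "cc"));   # "AA>>BB>>cc"
my $lower       = lc("ABCD");                       # "abcd"
my $first_lower = lcfirst("ABCD");                  # "aBCD"
my $upper       = uc("abcd");                       # "ABCD"
my $first_upper = ucfirst("abcd");                  # "Abcd"

my $line = "seq1\n";
chop($line);                                        # removes the trailing "\n"

print join(" | ", $pos_first, $pos_second, $head, $tail, $len, $joined,
           $lower, $first_lower, $upper, $first_upper, $line), "\n";
```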
84
Useful array functions in Perl
  • pop(ARRAY)
  • Returns the last value of an array. It also
    reduces the size of the array by one.
  • push(ARRAY1, ARRAY2)
  • Appends the contents of ARRAY2 to ARRAY1. This
    increases the size of ARRAY1 as needed.
  • reverse(ARRAY)
  • Reverses the elements of a given array when used
    in an array context. When used in a scalar
    context, the array is converted to a string, and
    the string is reversed.
  • scalar(ARRAY)
  • Evaluates the array in a scalar context and
    returns the number of elements in the array.
  • shift(ARRAY)
  • Returns the first value of an array. It also
    reduces the size of the array by one.
  • sort(ARRAY)
  • Returns a list containing the elements of ARRAY
    in sorted order (string order by default).
  • split(PATTERN, STRING, LIMIT)
  • Breaks up a string based on some delimiter. In a
    list context, it returns a list of the things
    that were found. In a scalar context, it returns
    the number of things found.

Source: http://www.cs.cf.ac.uk/Dave/PERL/
85
String functions - split
  • splits a string into an array based on a
    delimiter
  • excellent for processing tab or comma delimited
    files

    $line = "MacDonald,Old,The farm,Some city,BC,E1E 1O1";
    ($lastname, $firstname, $address, $city,
     $province, $postalcode) = split(/,/, $line);
    print ("LAST NAME: ", $lastname, "\n",
           "FIRST NAME: ", $firstname, "\n",
           "ADDRESS: ", $address, "\n",
           "CITY: ", $city, "\n",
           "PROVINCE: ", $province, "\n",
           "POSTAL CODE: ", $postalcode, "\n");
  • The REGEX (/,/) goes first, then the string ($line)
  • Output:
    LAST NAME: MacDonald
    FIRST NAME: Old
    ADDRESS: The farm
    CITY: Some city
    PROVINCE: BC
    POSTAL CODE: E1E 1O1
86
Array functions - sort
  • You can sort the elements in your array with
    sort

    @myNumbers = ("one", "two", "three", "four");
    @sorted = sort(@myNumbers);   # sorts alphabetically
    print "@sorted\n";
  • Output: four one three two
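Because `sort` compares strings by default, numbers can come out in a surprising order. A sketch with illustrative numbers; the `{ $a <=> $b }` comparison block is standard Perl but is not shown on these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# Default sort compares as strings: "10" and "100" come before "9".
my @as_strings = sort(@numbers);               # (1, 10, 100, 9)

# A comparison block with <=> sorts numerically; sort sets $a and $b itself.
my @as_numbers = sort { $a <=> $b } @numbers;  # (1, 9, 10, 100)

print "@as_strings\n@as_numbers\n";
```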
87
Making your own function
    sub function_name {
        my ($param1, $param2, ...) = @_;
        # do something with the parameters
        my $result = ...;
        return $result;
    }
  • sub tells the interpreter you are declaring a
    function
  • function_name is the function name. Use this name to
    call the function from within your program
  • @_ is an array that gets created automatically to
    hold the parameter list
  • my is a variable qualifier that makes the variable
    local to the function. Without it, the variable is
    available anywhere in the program. It is good practice
    to use my throughout your programs; more on this
    tomorrow.
  • return tells the interpreter to go back to the place
    in the program that called this function. When
    followed by scalars or variables, these values are
    passed back to where the function was called. This is
    the output of the function.
88
Making your own function - example
Function definition
    sub mean {
        my @values = @_;                   # local variables to be used inside the function
        my $numValues = scalar @values;
        my $sum = 0;
        foreach my $element (@values) {    # do the work!
            $sum = $sum + $element;
        }
        my $mean = $sum / $numValues;
        return $mean;                      # return the answer
    }
    my $avg = mean(1, 2, 3, 4, 5);         # $avg is 3
89
Functions - recap
  • A function packages up a set of statements to
    perform a given task
  • Functions take a parameter list as input and
    return some output
  • Perl has hundreds of functions built-in that you
    should familiarise yourself with
  • Keep a good book or URL handy at all times
  • You can (and should!) make your own functions

90
????
  1. ??? ??? ? ?? ???? ?? ?????? ???? ????? ????. ?
    ????? split, pop,shift, eq ?? ????
  2. ???? ??? 20? dna??? ??. ? ??? 5??? ??? T? ????
    ????? ????.
  3. Dna ??? ???? ???? ????? ????. ?? tr? ???? ?? ???
    ?? ??? ????? ????.
  4. ??? ?? ?? ??? ??? ?? ????? ??? ???? ????? ?????
    ????. Push, pop, shift, unshift??? ????.
  5. ?? ??? reverse ??? ???? ????.
  6. FASTA format? ??? DNA??? ??? ??? ???? ????? ????.
  7. FASTA format? ??? DNA??? ??. ? ??? 'TATA'?? ????
    ? ? ????? ???? ????? ????