Title: Benjamin J. Lynch
1Intermediate Perl
- by Benjamin J. Lynch
- blynch@msi.umn.edu
2Introduction
- Perl is a powerful interpreted language that takes very little knowledge to get started. It can be used to automate many research tasks with little effort.
- The greatest strength and weakness of Perl is that the same task can be accomplished with two very different pieces of code.
3Outline
- Review of Perl
- Variable types
- Context
- Operators
- Control structure
- Pattern Matching
- Subroutines
- Context
- References
- grep
- map
- modules
4When should I use Perl?
- Perl stands for Practical Extraction and Report Language
- Perl is most useful for
- parsing files to extract desired data
- doing almost anything you can do in a shell script
- CGI scripts to generate HTML for web pages
- updating or retrieving information from databases
- acting as an interface between programs
5Programming Style
- Questions you should ask
- Who else might look at the code?
- Co-workers?
- Complete strangers?
- How often will the code be modified?
- Remember your target audience
- There is no substitute for comments
6An Interpreted Language
- Perl programs are also called Perl scripts because Perl is an interpreted language.
- When you execute a Perl script, the script is compiled into a set of instructions for the Perl interpreter
- This set of instructions (or parse tree) is sent to the Perl interpreter
- The Perl interpreter shares many similarities with the virtual machine in Java
- There is no need to compile a Perl script as a separate preliminary step, making Perl scripts similar to shell scripts (at least on the exterior).
7A Simple Perl Script
- #!/usr/bin/perl
- print "Hello world! \n";
- blynch@msi% chmod +x hello.pl
- blynch@msi% ./hello.pl
- Hello world!
- blynch@msi%
- \n is a new line.
- The routine print will print the item or list of items that follows.
8Variable Types
- Scalar
- Reference (scalar pointer to another variable)
- List (array)
- Hash (associative array)
9Scalars
- Examples
- $var = 3;
- $name = "Larry";
- $float = 1.1235813;
- $sum = $a + 1.2;
10- A scalar is a single value.
- $number = 1;
- $text = "Hello world!";
- $a = 1.2;
- $b = 1.3;
- $sum = $a + $b;
- print "$sum \n";
- 2.5
A scalar can be an integer, a 64-bit floating point number, a string, or a reference.
The way that the data is stored (integer, floating point, ...) does not need to be specified. The Perl interpreter will determine it automatically.
11Lists
- A list (or array) of values can be specified like
- @number_list = (1,1,2,3,5,8,13,21);
- @grocery_list = ("apples","chicken","canned soup");
- The name of a list (array) always starts with a @
12Lists (arrays)
- @mylist = (1,2,2,3,4,4,4);
- @names = ("Larry", "Moe");
- push(@names, "Curly");
- print "@names";
- Larry Moe Curly
push adds an item to the end of a list.
13Lists
- A list (or array) of values
- @grocery_list = ("apples","chicken","canned soup");
- print $grocery_list[2];
- canned soup
Note the numbering of elements: the first element has index 0.
A $ is used in the print statement because of the context. We only want print to handle a single value from the array, so we use $ to denote the scalar context.
14Hashes (associative arrays)
- A hash is an associative array
- Instead of using an integer index, a hash uses a key to access elements of the hash
- %lunch = (monday => "pizza",
-   tuesday => "burritos",
-   wednesday => "sandwich",
-   thursday => "fish");
- print "on Tuesday I'll eat $lunch{tuesday}";
- on Tuesday I'll eat burritos
15Hashes (associative arrays)
- A hash can be created with a list of key/value pairs.
- Each key has one value associated with it.
- %hash = (Larry => 1, Moe => 2, Curly => 3);
- %hash = ("Larry", 1, "Moe", 2, "Curly", 3);
Either of these works to specify a hash.
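The hash slides can be tried out with a short script. This is a minimal sketch that reuses the lunch example; the extra `friday` entry and the variable names `$tuesday_lunch` and `@days` are illustrative assumptions, not part of the original slides.

```perl
use strict;
use warnings;

# Lunch menu taken from the slide example.
my %lunch = (
    monday    => "pizza",
    tuesday   => "burritos",
    wednesday => "sandwich",
    thursday  => "fish",
);

# Look up a single value by its key.
my $tuesday_lunch = $lunch{tuesday};
print "on Tuesday I'll eat $tuesday_lunch\n";

# Add a new key/value pair (hypothetical entry), then list the keys.
# Hashes do not keep insertion order, so the keys are sorted here.
$lunch{friday} = "soup";
my @days = sort keys %lunch;
print "$_: $lunch{$_}\n" for @days;
```

Sorting the keys is a design choice for reproducible output; without it the order can differ from run to run.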
16Variable Context
- @number_list = (1,1,2,3,5,8,13,21);
- @grocery_list = ("apples","chicken","canned soup");
- print @grocery_list;
- appleschickencanned soup
If we use the array (or list) context, the print command will print out all elements from the array.
17Variable Context
- @number_list = (1,1,2,3,5,8,13,21);
- @grocery_list = ("apples","chicken","canned soup");
- print $grocery_list[1];
- chicken
If we use the scalar context, we must specify the element we want to print from the list.
18Variable Context
- @grocery_list = ("apples","chicken","canned soup");
- $var = @grocery_list;
- print $var;
- 3
If we evaluate an array in scalar context (here, by assigning it to a scalar), it returns its length.
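The three context slides can be compared side by side. This is a small sketch using the same grocery list; the variable names (`$count`, `$first`, `$last_index`, `$joined`) are chosen here for illustration.

```perl
use strict;
use warnings;

my @grocery_list = ("apples", "chicken", "canned soup");

my $count      = @grocery_list;     # scalar context: number of elements
my ($first)    = @grocery_list;     # list context: first element is assigned
my $last_index = $#grocery_list;    # index of the last element
my $joined     = "@grocery_list";   # interpolation joins elements with spaces

print "count: $count\n";
print "first: $first\n";
print "last index: $last_index\n";
print "joined: $joined\n";
```

The parentheses around `$first` are what switch the assignment from scalar to list context.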
19Perl Operators
- $mass*$height   Multiplication
- $a + $b   Addition
- $a - $b   Subtraction
- $a / $b   Division
- $str1.$str2   Concatenate
- $count++   Increment $count by 1
- $missing--   Decrease $missing by 1
- $total += $subtotal   Increase $total by $subtotal
- $interest *= $factor   Set $interest to $interest*$factor
- $string .= " more"   Append " more" to the end of $string
20rand Perl
- rand($num)
- returns a random, double-precision floating-point number greater than or equal to 0 and less than $num.
- $var = rand(4);
21Control structure
- #!/usr/bin/perl
- @my_grocery_list = ("apples","chicken","canned soup");
- foreach $item (@my_grocery_list) {
-   purchase($item);
- }
- while ( some condition is true ) {
-   do_this;
- }
22Control structure
- Two ways to if/then
- if ($condition) { print "It is true \n"; }
- print "It is true \n" if $condition;
23Retrieving a random element from a list
- @greeting = ("Hello","Greetings","Hola","Howdy");
- print $greeting[rand @greeting];
- print $greeting[rand 4];
- print $greeting[2.59196266661263168];
- print $greeting[2];
- print "Hola";
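The evaluation steps above can be made explicit in a runnable form. In this sketch the `int` call is added only to show the truncation that the array subscript otherwise does implicitly; `$index` and `$choice` are illustrative names.

```perl
use strict;
use warnings;

my @greeting = ("Hello", "Greetings", "Hola", "Howdy");

# rand evaluates @greeting in scalar context (4) and returns a
# floating-point number >= 0 and < 4; int truncates it to 0..3.
my $index  = int(rand @greeting);
my $choice = $greeting[$index];

print "$choice\n";
```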
24Subroutines
- Defined like this
- sub my_sub_name {
-   do something
- }
- Used like this
- my_sub_name(variables passed)
25Subroutines
- Variables passed to a subroutine enter the routine as a single list
- @list1 = ("a ", "b ", "d ");
- $scalar = 42;
- mysub(@list1, $scalar);
- sub mysub {
-   print @_;
- }
a b d 42
26Returning values from subroutines
- Subroutines return whatever is returned by the return statement or else the last item evaluated in the subroutine.
- @list1 = (2,3);
- print mymult(@list1);
- sub mymult {
-   $product = $_[0]*$_[1];
-   return $product;
- }
6
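Both ways of returning a value can be seen in one script. This is a sketch under the assumption that arguments are unpacked into named `my` variables, a common style that the slides do not use; `mymult_implicit` is a hypothetical second version added for comparison.

```perl
use strict;
use warnings;

# Explicit return statement.
sub mymult {
    my ($x, $y) = @_;    # unpack the argument list into named variables
    return $x * $y;
}

# No return statement: the value of the last expression is returned.
sub mymult_implicit {
    my ($x, $y) = @_;
    $x * $y;
}

my @list1   = (2, 3);
my $product = mymult(@list1);
print "$product\n";
print mymult_implicit(@list1), "\n";
```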
27Pattern Matching
- Perl uses a very robust pattern matching syntax
- The most basic pattern match looks like
- $string =~ /some pattern/
- $string = "1 2 three";
- if ($string =~ /2/) {
-   print "the number 2 is in the string\n";
- }
In Perl, everything except the empty string "" and 0 (along with "0" and undef) is considered TRUE.
28Pattern Matching
- $, = "\n";
- $string = "1 2 hello 2 5";
- $matching = ($string =~ /\d/);
- print $matching;
- 1
- @matches = ($string =~ /\d/g);
- print @matches;
- 1
- 2
- 2
- 5
g is for global. This will allow the pattern to be matched multiple times.
\d will match any single digit.
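The difference between a single match and a global match can be checked directly. This is a minimal sketch using the string from the slide; the counting idiom on the last assignment is an addition not shown in the slides.

```perl
use strict;
use warnings;

my $string = "1 2 hello 2 5";

# Scalar context: true (1) if the pattern matches anywhere in the string.
my $matched = ($string =~ /\d/) ? 1 : 0;

# List context with /g: every match is returned.
my @matches = ($string =~ /\d/g);

# Assigning the match list to an empty list in scalar context counts it.
my $count = () = $string =~ /\d/g;

print "matched: $matched\n";
print "matches: @matches\n";
print "count: $count\n";
```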
29Pattern Matching
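- $, = "\n";
- $string = "1.45 1.482 1.938 other text 10.2849";
- print ($string =~ /\d\.\d+/g);
- 1.45
- 1.482
- 1.938
- 0.2849
The pattern allows only one digit before the decimal point, so 10.2849 is matched as 0.2849.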
30Pattern matching
- /pattern/
- /(sub-expression1)(sub-expression2)/
- \d   digit
- \s   whitespace
- \S   non-whitespace
- pattern{2}   will match pattern exactly twice
- [character list]   defined character class
- [abcDEF]
- [^b]   NOT b
- |   OR statement: it will match the pattern on either side
31/(bb|[^b]{2})/
This is written on a T-shirt I own ("2b or not 2b").
32/(bb|[^b]{2})/
bb   the literal characters bb
{2}   We want 2 of them
[^b]   New character class: NOT b
|   OR statement
33Pattern Matching
- ------------------------------------------------
- Charge Models 2 and 3 (CM2 and CM3) and
- Solvation Model SM5.42 GAMESSPLUS version 4.3
- ------------------------------------------------
- Gas-phase
- ------------------------------------------------
- Center Atomic CM3 RLPA Lowdin
- Number Number Charge Charge Charge
- ------------------------------------------------
- 1 3 .218 -1.090 -.938
- Gas-phase dipole moment (Debye)
- ------------------------------------------------
- X Y Z Total
- CM3 -.718 -.592 -1.748 1.980
- RLPA -.327 1.122 -.840 1.440
- Lowdin -.116 1.662 -.761 1.832
- ------------------------------------------------
34Pattern Matching
- if (/ CM3\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s/) {
-   $a[$m][$sol][9] = $4;
- }
- if (/ CM3\s+(-?\d*\.\d+\s+){3}(-?\d*\.\d+)\s/) {
-   $a[$m][$sol][9] = $2;
- }
- if (/ CM3(\s+\S+){3}\s+(\S+)/) {
-   $a[$m][$sol][9] = $2;
- }
35Pattern Matching
- if (/ CM3\s+(-?\d*\.\d+\s+){3}(-?\d*\.\d+)\s/) {
-   $a[$m][$sol][9] = $2;
- }
- if (/ CM3(\s+\S+){3}\s+(\S+)/) {
-   $a[$m][$sol][9] = $2;
- }
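The same idea can be applied to all three rows of the dipole table. The sketch below is one possible approach, not the author's script: the output lines are embedded as a sample array, and the `%dipole` hash with its `x`, `y`, `z`, and `total` keys is a hypothetical data layout chosen for illustration.

```perl
use strict;
use warnings;

# Sample lines copied from the dipole-moment section of the output.
my @output = (
    " Gas-phase dipole moment (Debye)",
    "          X       Y       Z     Total",
    " CM3    -.718   -.592  -1.748   1.980",
    " RLPA   -.327   1.122   -.840   1.440",
    " Lowdin -.116   1.662   -.761   1.832",
);

my %dipole;    # hypothetical layout: model name => { x, y, z, total }
for my $line (@output) {
    # A model name followed by four whitespace-separated fields.
    if ($line =~ /^\s*(CM3|RLPA|Lowdin)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
        $dipole{$1} = { x => $2, y => $3, z => $4, total => $5 };
    }
}

print "CM3 total dipole: $dipole{CM3}{total}\n";
```

Using `\S+` for each field keeps the pattern short, at the cost of accepting non-numeric fields; the stricter `-?\d*\.\d+` form rejects them.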
36Substitutions
- s/search pattern/replace/
- $string = "words9words383words";
- $string =~ s/\d+/, /g;
- print $string;
words, words, words
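A substitution modifies the string in place and also returns the number of replacements made. A brief sketch, assuming we want to keep the original string as well (the `$copy` and `$n` variables are illustrative):

```perl
use strict;
use warnings;

my $string = "words9words383words";

# Copy first, then substitute on the copy, leaving $string untouched.
(my $copy = $string) =~ s/\d+/, /g;
print "$copy\n";

# The substitution operator returns how many replacements it made.
my $n = ($string =~ s/\d+/, /g);
print "$n replacements\n";
```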
37Special Variables
- $1, $2, $3, ...
- Hold the contents of the most recent sub-patterns matched
- if ($string =~ /(Larry) (Moe) Curly/) {
-   print $2;
- }
Moe
38Special Variables
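- $[
- Determines which index in a list is the first; the default is 0. (Deprecated: in Perl 5.30 and later, assigning any value other than 0 is a fatal error.)
- my @mylist = ("Larry", "Moe", "Curly");
- print $mylist[1];
- $[ = 1;
- print $mylist[1];
- Moe
- Larry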
39Special Variables
- $&
- The entire string matched by the most recent pattern match
40Special Variables
- $/
- Input record separator, default is \n
- undef $/;
- open(FILE, "<input.txt");
- $buffer = <FILE>;
- $buffer contains the entire file
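A common variation limits the change to `$/` to a small block with `local`, so the rest of the script still reads line by line. The sketch below reads from an in-memory filehandle so that it runs without an input file; the `$text` contents are made up for illustration.

```perl
use strict;
use warnings;

# Made-up file contents, read through an in-memory filehandle.
my $text = "line one\nline two\nline three\n";

my $buffer = do {
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    local $/;    # undefined only inside this block
    <$fh>;       # with $/ undefined, one read returns the whole file
};

# Outside the block $/ is "\n" again, so reads are line by line.
open(my $fh2, '<', \$text) or die "cannot open in-memory file: $!";
my @lines = <$fh2>;
close($fh2);

print "buffer holds ", length($buffer), " characters\n";
print "line-by-line read gave ", scalar(@lines), " lines\n";
```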
42Special Variables
- $,
- Separator used between items when a list is printed; the default is empty
- $, = " "; will add a space between each item if you print out a list.
- $\
- Output record separator; the default is empty
- $\ = "\n"; will add a newline after each print statement.
43Special Variables
- $^T   time the Perl program started running
- $|   autoflush
- $$   process ID number for Perl
- $0   name of the Perl script executed
- %ENV   hash containing environment variables.
44Special Variables
- @ARGV is a list that holds all the arguments passed to the Perl script.
- @_ is a list of all the variables passed to the current subroutine
45Special Variables
- $_ is a variable that holds the current topic.
- e.g.
- while (<FILE1>) {
-   print "line $. $_";
- }
$. is the current line number
$_ is the current line being processed in FILE1
47References
- A reference is a scalar
- Instead of a number or string, a reference holds a memory location for another variable or subroutine.
- $myref = \$variable;
- $subref = \&subroutine;
48Dereferencing the Reference
- To retrieve the value stored in a reference, you must dereference it.
- $name = "Larry";
- $ref_name = \$name;
- print $ref_name, "\n";
- print $$ref_name, "\n";
- SCALAR(0x60000000000218a0)
- Larry
49Dereferencing the Reference
- Modifying a dereferenced reference to a variable is the same as modifying the variable.
- $name = "Larry";
- $ref_name = \$name;
- $$ref_name .= ", Moe, and Curly";
- print $$ref_name, "\n";
- print $name, "\n";
- Larry, Moe, and Curly
- Larry, Moe, and Curly
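The two dereferencing slides combine into one short script. In this sketch the `ref` function is an addition to the slides; it reports what kind of thing a reference points to.

```perl
use strict;
use warnings;

my $name     = "Larry";
my $ref_name = \$name;          # reference to the scalar $name

my $kind = ref($ref_name);      # "SCALAR" for a reference to a scalar
print "$kind\n";

# Appending through the reference changes $name itself.
$$ref_name .= ", Moe, and Curly";
print "$name\n";
```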
50Where do we want to use a reference?
- References are very useful when passing lists to a subroutine.
- @mylist = ("Larry", "Moe", "Curly");
- $list_ref = \@mylist;
- mysub($list_ref);
- sub mysub {
-   my $ref = $_[0];
-   my @list = @$ref;
-   print $list[2], "\n";
- }
51Where do we want to use a reference?
- References are very useful when passing lists, hashes, or subroutines to a subroutine.
- %myhash = (1 => "Larry", 2 => "Moe", 3 => "Curly");
- $hash_ref = \%myhash;
- mysub($hash_ref);
- sub mysub {
-   my $ref_inside = $_[0];
-   print $$ref_inside{2}, "\n";
-   print $_[0]{2}, "\n";
- }
Both print the same thing
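Passing references also lets one subroutine receive an array and a hash as separate arguments, which a flat argument list cannot do. The sketch below is illustrative only: the `describe` subroutine, the arrow syntax `->`, and the extra name "Shemp" are assumptions added here, not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical subroutine that takes an array reference and a hash reference.
sub describe {
    my ($list_ref, $hash_ref) = @_;
    my $count  = scalar @$list_ref;    # dereference the whole array
    my $second = $hash_ref->{2};       # arrow syntax for one hash element
    push @$list_ref, "Shemp";          # this changes the caller's array
    return "$count names; key 2 is $second";
}

my @names     = ("Larry", "Moe", "Curly");
my %by_number = (1 => "Larry", 2 => "Moe", 3 => "Curly");

my $summary = describe(\@names, \%by_number);
print "$summary\n";
print "caller now has ", scalar(@names), " names\n";
```

Because the subroutine works on the original array through the reference, the `push` is visible to the caller; copying with `my @list = @$ref` first avoids that.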
53We can even pass subroutines
- $sub_ref = \&my_subroutine;
- run_this($sub_ref);
- sub run_this {
-   my $ref = $_[0];
-   &$ref;
- }
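A subroutine reference can also be called with arguments. This sketch assumes the arrow call syntax `$code_ref->(...)`, which is equivalent to `&$code_ref(...)`; the `shout` subroutine and the anonymous doubling subroutine are hypothetical examples.

```perl
use strict;
use warnings;

# Calls whatever subroutine it is given, passing along the other arguments.
sub run_this {
    my ($code_ref, @args) = @_;
    return $code_ref->(@args);
}

sub shout {
    return uc($_[0]) . "!";
}

my $result  = run_this(\&shout, "hello");          # named subroutine
my $doubled = run_this(sub { $_[0] * 2 }, 21);     # anonymous subroutine

print "$result\n";
print "$doubled\n";
```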
54GREP
- @matching_lines = grep(/expression/, @input_lines);
- @matching_lines = grep /expression/, @input_lines;
- @no_comments = grep !/^#/, @lines_of_code;
- open(FILE1, "<mycode.pl");
- @no_comments = grep !/^#/, <FILE1>;
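The comment filter can be tested on a small in-memory list instead of a file. In this sketch the sample lines are made up, and the pattern is widened to `^\s*#` so that indented comments are also removed, which is an assumption beyond the slide.

```perl
use strict;
use warnings;

# Made-up lines of code standing in for the contents of a file.
my @lines_of_code = (
    '#!/usr/bin/perl',
    '# a comment',
    'my $x = 1;',
    '  # an indented comment',
    'print $x;',
);

# Block form of grep: keep the lines that do not start with a #.
my @no_comments = grep { !/^\s*#/ } @lines_of_code;

# In scalar context grep returns the number of matching items.
my $n_comments = grep { /^\s*#/ } @lines_of_code;

print "$_\n" for @no_comments;
print "$n_comments comment lines removed\n";
```

Note that this also removes the `#!` line, since it starts with a `#`.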
55MAP
- map BLOCK @array
- Returns the list generated by executing BLOCK for each value of @array
- foreach $number (@mylist) {
-   print $number+1;
- }
- print map {$_+1} @mylist;
56MAP
- The block can have any amount of code or subroutines
- map {mysub($_)} @array
- map {
-   sub1($_);
-   sub2($_);
-   $a=1;
- } @array
This map would simply return a list of 1s.
The last value evaluated is returned.
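Two typical uses of map are transforming a list and building a hash from a list. A minimal sketch; the `%length_of` hash is a hypothetical example added here.

```perl
use strict;
use warnings;

my @mylist = (1, 2, 3);

# Transform each element; the input list is left unchanged.
my @plus_one = map { $_ + 1 } @mylist;

# Return two items per element (key, value) to build a hash.
my %length_of = map { $_ => length($_) } ("apples", "chicken");

print "@plus_one\n";
print "chicken has $length_of{chicken} letters\n";
```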
57keys
- keys %hash
- Will return a list of the keys in the hash (in no guaranteed order)
- %myhash = (1, "Larry", 2, "Moe", 3, "Curly");
- @keylist = keys %myhash;
- print @keylist;
- 123
58Modules
- Modules are reusable packages defined in a library file
- They offer simple access to routines such as
- Database access
- Matrix manipulations
- Communication libraries
- Editing standard binary formats (.doc, .xls, ...)
- Graphics libraries (OpenGL, Tk, ...)
59Using a Module
- use Mail::Sendmail;
- ...
- foreach $user (@email_list) {
-   %mail_mess = ( To => "$user",
-     From => 'me@uofu.edu',
-     Subject => "$subject",
-     Message => "$message"
-   );
-   sendmail(%mail_mess);
- }
60New Objects in Modules
- Perl is an object-oriented language
- You can define new object types
- A scalar can hold other object types, such as a matrix, a tensor, a window, a database, ...
- The behavior of new objects is defined in the corresponding modules.
- See www.cpan.org for a few thousand useful Perl modules.
61Overloading Operators
- The standard operators in Perl can be defined for additional operations when placed between objects that are not fundamental types.
- $a = $b * 4;
- use Math::MatrixSparse;
- ...
- $matrix_product = $matrix1*$matrix2;
62global, local, my
- Global variables are the default in Perl.
- Globals can be seen in any subroutine.
- $var = 1;
- printme();
- sub printme {
-   print $var;
- }
We didn't pass $var, but this works because $var is global.
63global, local, my
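- Local variables are also global
- local gives a global variable a temporary value (undef unless one is assigned); the previous value is restored when the enclosing block goes out of scope.
- {
-   local $name;
-   mysub();
- }
$name has its previous value again here (undef if it was never set)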
64global, local, my
- my variables are preferred for most Perl code.
- my $scalar;
- sub1(\%myhash, @array);
$scalar is visible only inside the block (or file) where it is declared, so a sub1 defined outside that scope cannot use it unless it is explicitly passed.
65my Code
use strict will not allow global variables to be
defined on the fly. Global variables that
appear partway through the code often make your
script unreadable.
- use strict;
- my @array;
- my %hash;
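The difference between global, local, and my variables shows up when a subroutine reads a global while its caller has declared a variable of the same name. This is a sketch with hypothetical subroutine names; `our` is used to declare the global so that the script runs under `use strict`.

```perl
use strict;
use warnings;

our $name = "global";    # package (global) variable, allowed under strict

sub show_name { return $name; }    # always reads the global $name

sub with_local {
    local $name = "localized";     # temporarily replaces the global's value
    return show_name();            # the called subroutine sees the new value
}

sub with_my {
    my $name = "lexical";          # a separate variable, private to this block
    return show_name();            # the called subroutine still sees the global
}

my $seen_local = with_local();
my $seen_my    = with_my();
my $after      = show_name();      # the global was restored after with_local

print "$seen_local $seen_my $after\n";
```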
66How much have you learned?
- How do we make a Perl script that takes a list as its argument, and returns the unique values from that list?
- @array = (1, 1, 2, 3, 4, 4, 34, 20, 20);
- $, = ", ";
- print unique(@array);
- 1, 2, 3, 4, 34, 20
67- sub unique {
-   $max_el = @_ - 1;
-   my @u_list;
-   $u_list[0] = $_[0];
-   for $i (1 .. $max_el) {
-     $original = 1;
-     foreach $item (@u_list) {
-       if ($item == $_[$i]) {
-         $original = 0;
-       }
-     }
-     if ($original) {
-       push (@u_list, $_[$i]);
-     }
-   }
-   return @u_list;
- }
This is how a FORTRAN77 expert might solve the problem. Simple, straightforward, lots of code.
68A smaller script
- sub unique {
-   my @u_list = keys %{{map {$_ => 1} @_}};
-   return @u_list;
- }
How does that work?
69A smaller script
- sub unique {
-   my @u_list = keys %{{map {$_ => 1} @_}};
-   return @u_list;
- }
map {$_ => 1} @_
This will return a list of key/value pairs. The keys will be each value ($_) in @_ (the array passed to the subroutine).
The values will all be 1.
70A smaller script
- sub unique {
-   my @u_list = keys %{{key/value pairs}};
-   return @u_list;
- }
{key/value pairs}
This creates an anonymous hash and returns a reference to it.
71A smaller script
- sub unique {
-   my @u_list = keys %{reference_to_a_hash};
-   return @u_list;
- }
%{reference_to_a_hash}
We dereference the anonymous hash.
72A smaller script
- sub unique {
-   my @u_list = keys %hash;
-   return @u_list;
- }
This returns the keys in the hash.
Where did we wipe out the duplicates?
73A smaller script
- sub unique {
-   my @u_list = keys %{{key/value pairs}};
-   return @u_list;
- }
{key/value pairs}
When we create our anonymous hash, we assign the value 1 for each key. When a key is repeated, it simply reassigns that key to the value specified (always 1 in this case).
74Our Anonymous Hash
- unique(a,b,c,c,d,d)
- pair added    contents
- a => 1        a=>1
- b => 1        a=>1, b=>1
- c => 1        a=>1, b=>1, c=>1
- c => 1        a=>1, b=>1, c=>1
- d => 1        a=>1, b=>1, c=>1, d=>1
- d => 1        a=>1, b=>1, c=>1, d=>1
75A smaller script
- sub unique {
-   my @u_list = keys %{{map {$_ => 1} @_}};
-   return @u_list;
- }
We don't need to explicitly create @u_list.
A subroutine returns the value of the last expression evaluated in the subroutine, unless another value is returned explicitly with a return statement.
76A smaller script
- sub unique {
-   keys %{{map {$_ => 1} @_}}
- }
Our compact and slightly cryptic routine to return unique items
77Why some people dislike Perl
- sub unique {
-   keys %{{map {$_ => 1} @_}}
- }
- sub unique {
-   @l{@_}=();
-   keys %l
- }
- sub unique {
-   grep{!$l{$_}++}@_
- }
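For comparison, here is a more readable form of the grep version. It is a sketch rather than the author's code: the `%seen` hash is declared with `my` inside the subroutine (an assumption added so that repeated calls start fresh), and, unlike the `keys` versions, it keeps the items in their original order.

```perl
use strict;
use warnings;

# Return the items of the argument list with duplicates removed,
# keeping the first occurrence of each item in its original position.
sub unique {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time an item appears,
    # so !$seen{$_}++ is true only for first occurrences.
    return grep { !$seen{$_}++ } @_;
}

my @array = (1, 1, 2, 3, 4, 4, 34, 20, 20);
my @u     = unique(@array);

print join(", ", @u), "\n";
```

The `keys`-based versions return the same set of values, but hash keys come back in no guaranteed order.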
78The End
Questions?
- blynch@msi.umn.edu
- help@msi.umn.edu
- 612-626-0802 (MSI helpline)