Title: Benjamin J. Lynch
Intermediate Perl
- by Benjamin J. Lynch
- blynch@msi.umn.edu
Introduction
- Perl is a powerful interpreted language that takes very little knowledge to get started. It can be used to automate many research tasks with little effort.
- Perl's greatest strength, and also its greatest weakness, is that the same task can be accomplished with very different code.
Outline
- Review of Perl 
- Variable types 
- Context 
- Operators 
- Control structure 
- Pattern Matching 
- Subroutines 
- Context 
- References 
- grep 
- map 
- modules
When should I use Perl?
- Perl stands for Practical Extraction and Report Language.
- Perl is most useful for
  - parsing files to extract desired data
  - doing almost anything you can do in a shell script
  - CGI scripts that generate HTML for web pages
  - updating or retrieving information in databases
  - acting as an interface between programs
Programming Style
- Questions you should ask:
  - Who else might look at the code?
    - Co-workers?
    - Complete strangers?
  - How often will the code be modified?
- Remember your target audience.
- There is no substitute for comments.
An Interpreted Language
- Perl programs are also called Perl scripts because Perl is an interpreted language.
- When you execute a Perl script, the script is first compiled into a set of instructions for the Perl interpreter.
- This set of instructions (an internal tree of operations, often called the op tree) is then executed by the Perl interpreter.
- The Perl interpreter has much in common with the Java virtual machine.
- There is no need to compile a Perl script as a separate preliminary step, which makes Perl scripts similar to shell scripts (at least on the exterior).
A Simple Perl Script
    #!/usr/bin/perl
    print "Hello world!\n";

    blynch@msi$ chmod +x hello.pl
    blynch@msi$ ./hello.pl
    Hello world!
    blynch@msi$
- \n is a newline.
- The print function prints the item or list of items that follows it.
Variable Types
- Scalar
- Reference (a scalar that points to another variable)
- List (array)
- Hash (associative array)
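The sigil in front of a name tells Perl which type a variable is. Here is a small sketch with all four kinds side by side. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

my $count = 3;                          # scalar: $ prefix
my @names = ("Larry", "Moe");           # array (list): @ prefix
my %ages  = (Larry => 40, Moe => 42);   # hash: % prefix
my $ref   = \@names;                    # reference: a scalar pointing at @names

print "$count\n";        # 3
print "$names[1]\n";     # Moe   (one element of an array is a scalar, so $)
print "$ages{Larry}\n";  # 40    (one value of a hash is a scalar, so $)
print "$ref->[0]\n";     # Larry (follow the reference to the array)
```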
Scalars
- Examples:
    $var = 3;
    $name = "Larry";
    $float = 1.1235813;
    $sum = $a + 1.2;
- A scalar is a single value.
    $number = 1;
    $text = "Hello world!";
    $a = 1.2;
    $b = 1.3;
    $sum = $a + $b;
    print "$sum\n";

    2.5
- A scalar can hold an integer, a 64-bit (double-precision) floating-point number, a string, or a reference.
- You do not need to specify how the data is stored (integer, floating point, ...). The Perl interpreter determines it automatically.
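Perl also converts between numbers and strings depending on the operator used. This sketch shows the same value treated both ways. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "10";         # stored as a string
my $sum  = $text + 5;    # numeric context: "10" is treated as 10
my $cat  = $text . 5;    # string context: 5 is treated as "5"
my $half = 7 / 2;        # integer operands, floating-point result

print "$sum\n";    # 15
print "$cat\n";    # 105
print "$half\n";   # 3.5
```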
Lists
- A list (or array) of values can be specified like this:
    @number_list = (1,1,2,3,5,8,13,21);
    @grocery_list = ("apples","chicken","canned soup");
- An array variable's name always starts with @.
Lists (arrays)
    @mylist = (1,2,2,3,4,4,4);
    @names = ("Larry", "Moe");
    push(@names, "Curly");
    print "@names";

    Larry Moe Curly
- push adds an item to the end of a list.
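push has three companions that add and remove items at either end of an array. Here is a short sketch using all four built-ins on the same list. "Shemp" is just an extra example value.

```perl
use strict;
use warnings;

my @names = ("Larry", "Moe");
push(@names, "Curly");       # add to the end:        Larry Moe Curly
unshift(@names, "Shemp");    # add to the front:      Shemp Larry Moe Curly
my $last  = pop(@names);     # remove from the end:   returns "Curly"
my $first = shift(@names);   # remove from the front: returns "Shemp"

print "@names\n";            # Larry Moe
print "$first $last\n";      # Shemp Curly
```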
Lists
- A list (or array) of values:
    @grocery_list = ("apples","chicken","canned soup");
    print $grocery_list[2];

    canned soup
- Note the numbering of elements: the first element is $grocery_list[0], so $grocery_list[2] is the third one.
- The print statement uses a $ because of the context. We want print to handle only a single value from the array, so we use $ to denote scalar context.
Hashes (associative arrays)
- A hash is an associative array.
- Instead of using an integer index, a hash uses a key to access its elements.
    %lunch = (monday    => "pizza",
              tuesday   => "burritos",
              wednesday => "sandwich",
              thursday  => "fish");
    print "On Tuesday I'll eat $lunch{tuesday}\n";

    On Tuesday I'll eat burritos
Hashes (associative arrays)
- A hash can be created from a list of key/value pairs.
- Each key has one value associated with it.
    %hash = ("Larry" => 1, "Moe" => 2, "Curly" => 3);
    %hash = ("Larry", 1, "Moe", 2, "Curly", 3);
- Either of these works to specify a hash.
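The two forms are equivalent because => acts like a comma that also quotes a bare word on its left. This sketch builds both hashes and compares them key by key. The keys are sorted first, because hash order is not fixed.

```perl
use strict;
use warnings;

my %h1 = (Larry => 1, Moe => 2, Curly => 3);        # => quotes Larry, Moe, Curly
my %h2 = ("Larry", 1, "Moe", 2, "Curly", 3);        # plain commas, explicit quotes

# Turn each hash into a "key=value,..." string in sorted key order.
my $s1 = join(",", map { "$_=$h1{$_}" } sort keys %h1);
my $s2 = join(",", map { "$_=$h2{$_}" } sort keys %h2);

print $s1 eq $s2 ? "same\n" : "different\n";   # same
```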
Variable Context
    @number_list = (1,1,2,3,5,8,13,21);
    @grocery_list = ("apples","chicken","canned soup");
    print @grocery_list;

    appleschickencanned soup
- If we use the array (or list) context, the print command prints all elements of the array. No separator is printed between them by default, so they run together.
Variable Context
    @number_list = (1,1,2,3,5,8,13,21);
    @grocery_list = ("apples","chicken","canned soup");
    print $grocery_list[1];

    chicken
- If we use the scalar context, we must specify which element of the list to print.
Variable Context
    @grocery_list = ("apples","chicken","canned soup");
    $var = @grocery_list;
    print $var;

    3
- If we ask for a scalar from an array, the array returns its length (the number of elements).
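There are several related ways to get at the size of an array. One subtlety applies here. The length rule is about array variables. A literal list such as ("a", "b", "c") in scalar context goes through the comma operator and yields its last element instead. The sketch below sticks to the array forms.

```perl
use strict;
use warnings;

my @grocery_list = ("apples", "chicken", "canned soup");

my $count = @grocery_list;           # array in scalar context: 3
my $also  = scalar(@grocery_list);   # explicit scalar context: 3
my $last  = $#grocery_list;          # index of the last element: 2
my $final = $grocery_list[-1];       # negative index counts from the end

print "$count $also $last $final\n"; # 3 3 2 canned soup
```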
Perl Operators
- $mass*$height          multiplication
- $a + $b                addition
- $a - $b                subtraction
- $a / $b                division
- $str1.$str2            concatenation
- $count++               increase $count by 1
- $missing--             decrease $missing by 1
- $total += $subtotal    increase $total by $subtotal
- $interest *= $factor   set $interest to $interest*$factor
- $string .= "more"      append "more" to the end of $string
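Here is a runnable sketch of the assignment operators above, with each intermediate value noted. It also uses two arithmetic operators that are not in the list: ** (exponentiation) and % (remainder).

```perl
use strict;
use warnings;

my $total    = 10;
my $subtotal = 5;
my $count    = 0;
my $string   = "Perl";

$total  += $subtotal;   # 15
$total  *= 2;           # 30
$count++;               # 1
$string .= " rules";    # "Perl rules"
my $power = 2 ** 10;    # exponentiation: 1024
my $rem   = 17 % 5;     # remainder: 2

print "$total $count $string $power $rem\n";   # 30 1 Perl rules 1024 2
```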
rand
- rand($num) returns a random double-precision floating-point number that is greater than or equal to 0 and less than $num. If $num is omitted, it defaults to 1.
    $var = rand(4);
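A common use of rand is to produce random integers by truncating its result with int. Here is a sketch of a die roll. roll_die is a hypothetical helper name, and the srand call is optional: it only makes a run repeatable.

```perl
use strict;
use warnings;

# rand(6) is a float in [0, 6); int() truncates it to 0..5,
# and adding 1 gives a die roll from 1 to 6.
sub roll_die { return int(rand(6)) + 1; }

srand(42);    # optional: fix the seed so the same rolls come out each run
my @rolls = map { roll_die() } 1 .. 10;
print "@rolls\n";
```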
Control structure
    #!/usr/bin/perl
    @my_grocery_list = ("apples","chicken","canned soup");
    foreach $item (@my_grocery_list) {
        purchase($item);    # purchase() is a subroutine defined elsewhere
    }

    while ($some_condition) {    # loop as long as the condition is true
        do_this();
    }
Control structure
- Two ways to write if/then:
    if ($condition) { print "It is true\n"; }
    print "It is true\n" if $condition;
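Here is a self-contained version of the loops and conditionals above. next, elsif, else and unless do not appear on the slides. The grocery items come from the slides, and @bought and $n are illustrative names.

```perl
use strict;
use warnings;

my @grocery_list = ("apples", "chicken", "canned soup");
my @bought;

foreach my $item (@grocery_list) {
    next if $item eq "chicken";    # skip this item and go on to the next one
    push(@bought, $item);
}

my $n = 0;
while ($n < 3) {                   # loop while the condition is true
    $n++;
}

if ($n == 3) {
    print "n is 3\n";
} elsif ($n > 3) {
    print "n is bigger than 3\n";
} else {
    print "n is smaller than 3\n";
}
print "Bought: @bought\n" unless @bought == 0;   # unless means "if not"
```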
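Retrieving a random element from a list
    @greeting = ("Hello","Greetings","Hola","Howdy");
    print $greeting[rand @greeting];
    print $greeting[rand 4];                 # @greeting in scalar context is 4
    print $greeting[2.59196266661263168];    # rand returned, for example, 2.59...
    print $greeting[2];                      # an array index is truncated to an integer
    print "Hola";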
Subroutines
- Defined like this:
    sub my_sub_name {
        # do something
    }
- Used like this:
    my_sub_name(@variables_passed);
Subroutines
- Variables passed to a subroutine enter the routine as a single list, @_.
    @list1 = ("a", "b", "d");
    $scalar = 42;
    mysub(@list1, $scalar);
    sub mysub {
        print "@_";
    }

    a b d 42
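Because everything arrives in one flat list, a subroutine cannot tell where one array ended and the next began. This sketch passes two arrays to a hypothetical count_args subroutine. References, covered later in these slides, keep them apart.

```perl
use strict;
use warnings;

my @first  = ("a", "b");
my @second = ("c", "d", "e");

sub count_args {
    my $n = @_;    # @_ in scalar context = number of arguments received
    return $n;
}

# Both arrays are flattened into one list of 5 elements.
print count_args(@first, @second), "\n";     # 5
# Passing references sends just two scalars.
print count_args(\@first, \@second), "\n";   # 2
```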
Returning values from subroutines
- A subroutine returns whatever its return statement returns, or else the value of the last expression evaluated in the subroutine.
    @list1 = (2,3);
    print mymult(@list1);
    sub mymult {
        $product = $_[0]*$_[1];
        return $product;
    }

    6
Pattern Matching
- Perl uses a very robust pattern matching syntax.
- The most basic pattern match looks like this:
    $string =~ /some pattern/
- For example:
    $string = "1 2 three";
    if ($string =~ /2/) {
        print "the number 2 is in the string\n";
    }
- In Perl, every value except undef, the empty string "", the number 0, and the string "0" is considered TRUE.
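The truth rule has a few surprising cases. For example, "0.0" and "00" are true because they are neither "" nor "0". This sketch filters a mixed list down to its true values. The values are chosen only to show the edge cases.

```perl
use strict;
use warnings;

my @values = (undef, "", 0, "0", "0.0", "00", " ", "a", 1, -1);
my @true   = grep { $_ } @values;    # keep only the values Perl considers true

# Print each true value in brackets so the lone space is visible.
print join(" ", map { "[$_]" } @true), "\n";   # [0.0] [00] [ ] [a] [1] [-1]
```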
Pattern Matching
    $, = "\n";
    $string = "1 2 hello 2 5";
    $matching = ($string =~ /\d/);
    print $matching;

    1

    @matches = ($string =~ /\d/g);
    print @matches;

    1
    2
    2
    5
- \d matches any single digit.
- Without g, a match in scalar context simply returns true (1) if the pattern is found.
- g is for global. It allows the pattern to be matched multiple times; in list context, every match is returned.
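Pattern Matching
    $, = "\n";
    $string = "1.45 1.482 1.938 other text 10.2849";
    print ($string =~ /\d\.\d+/g);

    1.45
    1.482
    1.938
    0.2849
- \. matches a literal period; an unescaped . would match any character.
- The pattern allows only one digit before the decimal point, so 10.2849 is matched as 0.2849.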
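To capture 10.2849 whole, the pattern must allow one or more digits before the decimal point. The sketch below adds a second + for that. This is an adjusted pattern, not the one on the slide.

```perl
use strict;
use warnings;

my $string = "1.45 1.482 1.938 other text 10.2849";

# \d+ = one or more digits, \. = a literal period
my @numbers = ($string =~ /\d+\.\d+/g);

print join("\n", @numbers), "\n";
# 1.45
# 1.482
# 1.938
# 10.2849
```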
Pattern matching
- /pattern/
- /(sub-expression1)(sub-expression2)/
- \d                  a digit
- \s                  whitespace
- \S                  non-whitespace
- (pattern){2}        will match pattern exactly twice
- [character list]    a user-defined character class, e.g. [abcDEF]
- [^b]                NOT b (any character except b)
- |                   OR statement: it will match the pattern on either side
/(bb|[^b]{2})/
- This is written on a T-shirt I own (read it as "2b or not 2b").
- bb: two b's
- [^b]: a new character class, NOT b
- {2}: we want 2 of them
- |: OR statement
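Here is a quick check of the T-shirt pattern against a few words. The pattern is anchored with ^ and $ so that the whole word must match. The anchors are an addition that is not on the shirt.

```perl
use strict;
use warnings;

my %result;
foreach my $word ("bb", "xy", "bx", "b") {
    # two b's, OR two characters that are not b
    $result{$word} = ($word =~ /^(bb|[^b]{2})$/) ? 1 : 0;
    print "$word: ", ($result{$word} ? "matches" : "no match"), "\n";
}
```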
Pattern Matching
    ------------------------------------------------
    Charge Models 2 and 3 (CM2 and CM3) and
    Solvation Model SM5.42 GAMESSPLUS version 4.3
    ------------------------------------------------
                       Gas-phase
    ------------------------------------------------
     Center   Atomic     CM3      RLPA    Lowdin
     Number   Number    Charge   Charge   Charge
    ------------------------------------------------
       1        3        .218    -1.090    -.938
    Gas-phase dipole moment (Debye)
    ------------------------------------------------
                 X        Y        Z      Total
     CM3       -.718    -.592   -1.748    1.980
     RLPA      -.327    1.122    -.840    1.440
     Lowdin    -.116    1.662    -.761    1.832
    ------------------------------------------------
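Pattern Matching
    if (/ CM3\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s+(-?\d*\.\d+)\s/) {
        $amsol[9] = $4;
    }
    if (/ CM3\s+(-?\d*\.\d+\s+){3}(-?\d*\.\d+)\s/) {
        $amsol[9] = $2;
    }
    if (/ CM3(\s+\S+){3}\s+(\S+)/) {
        $amsol[9] = $2;
    }
- Each version captures the fourth number on the CM3 dipole line (the total): $4 in the first, $2 in the other two.
- -?\d*\.\d+ allows an optional minus sign and no digits before the decimal point, as in -.718.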
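To check the \S version, the sketch below runs it against one line copied from the sample output and pulls out the total dipole. $total is an illustrative name that stands in for the array element used above.

```perl
use strict;
use warnings;

my $line = "  CM3       -.718     -.592    -1.748     1.980";

my $total;
if ($line =~ / CM3(\s+\S+){3}\s+(\S+)/) {
    $total = $2;    # (\s+\S+){3} skips X, Y and Z; (\S+) captures Total
}
print "$total\n";   # 1.980
```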
Substitutions
    s/search pattern/replace/

    $string = "words9words383words";
    $string =~ s/\d+/, /g;
    print $string;

    words, words, words
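Two details of s/// are often useful. You can edit a copy so the original string is kept, and with g the operator returns how many replacements it made. Here is a sketch of both.

```perl
use strict;
use warnings;

my $string = "words9words383words";

(my $copy = $string) =~ s/\d+/, /g;     # copy first, then edit only the copy
my $count = ($string =~ s/\d+/, /g);    # s///g returns the number of replacements

print "$copy\n";     # words, words, words
print "$count\n";    # 2
```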
Special Variables
- $1, $2, $3, ...
  - Hold the contents of the sub-patterns (parenthesized groups) from the most recent successful match.
    if ($string =~ /(Larry) (Moe) Curly/) {
        print $2;
    }

    Moe
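Special Variables
- $[
  - Determines which index in a list is the first; the default is 0.
    my @mylist = ("Larry", "Moe", "Curly");
    print $mylist[1];
    $[ = 1;
    print $mylist[1];

    Moe
    Larry
- $[ is deprecated: Perl 5.30 and later treat setting it to anything other than 0 as a fatal error, so avoid it in new code.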
Special Variables
- $&
  - The entire string matched by the most recent successful pattern match.
Special Variables
- $/
  - Input record separator; the default is "\n".
    undef $/;
    open(FILE, "<input.txt");
    $buffer = <FILE>;
    # $buffer now contains the entire file
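Setting $/ globally affects every later read, so it is usually undefined with local inside a block. The sketch below does that. An in-memory filehandle on a string stands in for input.txt so the example runs without a file. The text is made up.

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";
open(my $fh, "<", \$text) or die "cannot open: $!";   # read a string as if it were a file

my $buffer;
{
    local $/;             # $/ is undef only inside this block
    $buffer = <$fh>;      # one read returns everything up to end-of-file
}                         # the old $/ ("\n") is restored here
close($fh);

my @lines = split /\n/, $buffer;
print scalar(@lines), " lines\n";   # 3 lines
```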
Special Variables
- $,
  - Output field separator: printed between the items of a list; empty by default.
  - $, = " "; will add a space between each item when you print a list.
- $\
  - Output record separator: printed after each print statement; empty by default.
  - $\ = "\n"; will add a newline after each print statement.
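Here is a sketch of both separators at once. The output is printed into a string through an in-memory filehandle so it can be inspected. local keeps the changes from leaking out of the block.

```perl
use strict;
use warnings;

my $output = "";
open(my $out, ">", \$output) or die "cannot open: $!";   # print into a string

{
    local $, = ", ";    # printed between the items of one print
    local $\ = "\n";    # printed after every print
    print $out "Larry", "Moe", "Curly";
    print $out "Shemp";
}
close($out);

print $output;
# Larry, Moe, Curly
# Shemp
```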
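Special Variables
- $^T    time (in seconds since the epoch) at which the Perl program started
- $|     autoflush: when set to a true value, output is flushed after every print
- $$     process ID number of the Perl process
- $0     name of the Perl script being executed
- %ENV   hash containing the environment variables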
Special Variables
- @ARGV is a list that holds all the arguments passed to the Perl script.
- @_ is a list of all the variables passed to the current subroutine.
Special Variables
- $_ is a variable that holds the current topic. For example:
    while (<FILE1>) {
        print "line $. $_";
    }
- $. is the current line number.
- $_ is the current line being processed from FILE1.
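Here is a runnable version of the loop, using an in-memory filehandle instead of FILE1. chomp with no argument works on $_, and $. counts the lines read. The text and @numbered are illustrative.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open(my $fh, "<", \$text) or die "cannot open: $!";

my @numbered;
while (<$fh>) {          # each line is stored in $_
    chomp;               # with no argument, chomp removes the newline from $_
    push(@numbered, sprintf("line %d: %s", $., $_));   # $. = current line number
}
close($fh);

print "$_\n" for @numbered;
# line 1: alpha
# line 2: beta
# line 3: gamma
```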
References
- A reference is a scalar.
- Instead of a number or string, a reference holds the memory location of another variable or subroutine.
    $myref = \$variable;
    $subref = \&subroutine;
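A reference can point to any kind of variable, and the built-in ref() reports which kind. This sketch makes one reference of each type and follows each back to its data. greet is a hypothetical subroutine.

```perl
use strict;
use warnings;

my $variable = 42;
my @list     = (1, 2, 3);
my %hash     = (key => "value");
sub greet { return "hello" }

my $scalar_ref = \$variable;
my $array_ref  = \@list;
my $hash_ref   = \%hash;
my $sub_ref    = \&greet;

# ref() names the type of thing a reference points to.
print join(" ", ref($scalar_ref), ref($array_ref), ref($hash_ref), ref($sub_ref)), "\n";
# SCALAR ARRAY HASH CODE
print join(" ", $$scalar_ref, $$array_ref[1], $$hash_ref{key}, $sub_ref->()), "\n";
# 42 2 value hello
```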
Dereferencing the Reference
- To retrieve the value stored in a reference, you must dereference it.
    $name = "Larry";
    $ref_name = \$name;
    print $ref_name, "\n";
    print $$ref_name, "\n";

    SCALAR(0x60000000000218a0)
    Larry
Dereferencing the Reference
- Modifying a dereferenced reference to a variable is the same as modifying the variable.
    $name = "Larry";
    $ref_name = \$name;
    $$ref_name .= ", Moe, and Curly";
    print $$ref_name, "\n";
    print $name, "\n";

    Larry, Moe, and Curly
    Larry, Moe, and Curly
Where do we want to use a reference?
- References are very useful when passing lists to a subroutine.
    @mylist = ("Larry", "Moe", "Curly");
    $list_ref = \@mylist;
    mysub($list_ref);
    sub mysub {
        my $ref = $_[0];
        my @list = @$ref;
        print $list[2], "\n";
    }
Where do we want to use a reference?
- References are very useful when passing lists, hashes, or subroutines to a subroutine.
    %myhash = (1 => "Larry", 2 => "Moe", 3 => "Curly");
    $hash_ref = \%myhash;
    mysub($hash_ref);
    sub mysub {
        my $ref_inside = $_[0];
        print $$ref_inside{2}, "\n";
        print ${$_[0]}{2}, "\n";
    }
- Both print the same thing (Moe).
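Perl also has an arrow notation for reaching an element through a reference. Most modern code uses it because it is easier to read. This sketch shows three equivalent spellings for a hash reference and for an array reference.

```perl
use strict;
use warnings;

my %myhash   = (1 => "Larry", 2 => "Moe", 3 => "Curly");
my @mylist   = ("Larry", "Moe", "Curly");
my $hash_ref = \%myhash;
my $list_ref = \@mylist;

# Three equivalent ways to reach one element through a reference.
my @via_hash = ($$hash_ref{2}, ${$hash_ref}{2}, $hash_ref->{2});
my @via_list = ($$list_ref[2], ${$list_ref}[2], $list_ref->[2]);

print "@via_hash\n";   # Moe Moe Moe
print "@via_list\n";   # Curly Curly Curly
```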
We can even pass subroutines
    $sub_ref = \&my_subroutine;
    run_this($sub_ref);
    sub run_this {
        my $ref = $_[0];
        &$ref();    # call the subroutine that $ref points to
    }
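Passing a subroutine reference lets the caller decide what work gets done. This is often called a callback. In this sketch apply_to_each and double are hypothetical names, and the second call passes an anonymous subroutine written inline.

```perl
use strict;
use warnings;

# Hypothetical helper: call the subroutine passed by reference on every item.
sub apply_to_each {
    my ($code_ref, @items) = @_;
    my @results;
    foreach my $item (@items) {
        push(@results, $code_ref->($item));
    }
    return @results;
}

sub double { return 2 * $_[0] }

my @doubled = apply_to_each(\&double, 1, 2, 3);
my @squared = apply_to_each(sub { $_[0] ** 2 }, 1, 2, 3);   # anonymous subroutine

print "@doubled\n";   # 2 4 6
print "@squared\n";   # 1 4 9
```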
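GREP
    @matching_lines = grep(/expression/, @input_lines);
    @matching_lines = grep { /expression/ } @input_lines;
    @no_comments = grep { !/^#/ } @lines_of_code;
    open(FILE1, "<mycode.pl");
    @no_comments = grep { !/^#/ } <FILE1>;
- grep returns the elements of a list for which the expression is true; !/^#/ keeps the lines that do not start with #.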
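Here is the comment filter applied to a small in-line list of code lines. The pattern is widened to ^\s*# so that indented comments are dropped too. The sketch also shows that grep in scalar context returns a count. The code lines are made up.

```perl
use strict;
use warnings;

my @lines_of_code = (
    '#!/usr/bin/perl',
    '# print a greeting',
    'print "Hello world!\n";',
    '    # indented comment',
    'exit;',
);

my @no_comments = grep { !/^\s*#/ } @lines_of_code;   # drop comment lines
my $n_comments  = grep { /^\s*#/ } @lines_of_code;    # scalar context: a count

print "$_\n" for @no_comments;
print "$n_comments comment lines\n";   # 3 comment lines
```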
MAP
- map BLOCK @array
- Returns the list generated by executing BLOCK for each value of @array.
    foreach $number (@mylist) {
        print $number+1;
    }

    print map { $_+1 } @mylist;
- Both print the same values.
MAP
- The block can contain any amount of code, including subroutine calls.
    map { mysub($_) } @array;

    map {
        sub1($_);
        sub2($_);
        $a = 1;
    } @array;
- This map would simply return a list of 1s: the last value evaluated in the block is what is returned for each element.
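A map block can also return more than one value per element, which is a handy way to build a hash. This sketch shows that case next to a simple transform and the "last value evaluated" rule. The variable names are illustrative.

```perl
use strict;
use warnings;

my @mylist = (1, 2, 3);

my @plus_one = map { $_ + 1 } @mylist;               # 2 3 4
my @pairs    = map { ($_, $_ * $_) } @mylist;        # two values per element
my %square   = @pairs;                               # 1 => 1, 2 => 4, 3 => 9
my @ones     = map { my $x = $_ * 10; 1 } @mylist;   # last value evaluated is 1

print "@plus_one\n";     # 2 3 4
print "$square{3}\n";    # 9
print "@ones\n";         # 1 1 1
```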
keys
- keys %hash
- Will return a list of the keys in the hash.
    %myhash = (1, "Larry", 2, "Moe", 3, "Curly");
    @keylist = keys %myhash;
    print @keylist;

    123
- The order of the keys is not guaranteed; the same program may print them as, for example, 312.
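Because hash order is not fixed, sort the keys when order matters. The sketch below does that, looks up each value, and then walks every pair with each, where the order may vary from run to run.

```perl
use strict;
use warnings;

my %myhash = (1, "Larry", 2, "Moe", 3, "Curly");

my @keylist = sort { $a <=> $b } keys %myhash;   # numeric sort gives a fixed order
my @names   = map { $myhash{$_} } @keylist;      # look up each value

print "@keylist\n";   # 1 2 3
print "@names\n";     # Larry Moe Curly

while (my ($key, $value) = each %myhash) {       # visit every pair once
    print "$key => $value\n";                    # order may vary
}
```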
Modules
- Modules are reusable packages defined in a library file.
- They offer simple access to routines such as
  - database access
  - matrix manipulations
  - communication libraries
  - editing standard binary formats (.doc, .xls, ...)
  - graphics libraries (OpenGL, Tk, ...)
Using a Module
    use Mail::Sendmail;
    ...
    foreach $user (@email_list) {
        %mail_mess = ( To      => "$user",
                       From    => 'me@uofu.edu',
                       Subject => "$subject",
                       Message => "$message"
                     );
        sendmail(%mail_mess);
    }
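Mail::Sendmail has to be installed from CPAN. Some modules, however, ship with Perl itself. Here is a sketch using List::Util, a module bundled with Perl, to show the same "use, then call" pattern. The numbers are made up.

```perl
use strict;
use warnings;
use List::Util qw(sum max first);   # import three routines from the module

my @numbers = (3, 9, 4, 12, 7);

my $total    = sum(@numbers);                    # 35
my $largest  = max(@numbers);                    # 12
my $first_ev = first { $_ % 2 == 0 } @numbers;   # first even number: 4

print "$total $largest $first_ev\n";   # 35 12 4
```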
New Objects in Modules
- Perl supports object-oriented programming.
- You can define new object types.
- A scalar can hold (a reference to) an object of another type, such as a matrix, a tensor, a window, or a database handle.
- The behavior of new objects is defined in the corresponding modules.
- See www.cpan.org for thousands of useful Perl modules.
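Under the hood, a Perl object is usually a hash reference that has been "blessed" into a package. The package's subroutines then act as the object's methods. Here is a minimal sketch with a hypothetical Counter class, defined inline rather than in a separate module file.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, normally kept in Counter.pm

sub new {
    my ($class, $start) = @_;
    my $self = { count => $start || 0 };   # the object's data lives in a hash
    return bless($self, $class);           # bless ties the hash to this package
}

sub increment { my $self = shift; $self->{count}++; return $self->{count}; }
sub value     { my $self = shift; return $self->{count}; }

package main;

my $counter = Counter->new(5);   # a scalar holding an object
$counter->increment;
$counter->increment;
print $counter->value, "\n";     # 7
print ref($counter), "\n";       # Counter
```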
Overloading Operators
- The standard operators in Perl can be given additional meanings when they are placed between objects that are not fundamental types.
    $a = $b * 4;

    use Math::MatrixSparse;
    ...
    $matrix_product = $matrix1 * $matrix2;
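Modules like the matrix one do this with the overload pragma, which comes with Perl. This sketch uses a hypothetical Vec2 class, not Math::MatrixSparse's own code. It maps +, * and string conversion onto methods.

```perl
use strict;
use warnings;

package Vec2;    # hypothetical 2-D vector class
use overload
    '+'  => \&add,
    '*'  => \&scale,
    '""' => \&as_string;

sub new       { my ($class, $x, $y) = @_; return bless { x => $x, y => $y }, $class; }
sub add       { my ($u, $v) = @_; return Vec2->new($u->{x} + $v->{x}, $u->{y} + $v->{y}); }
sub scale     { my ($u, $k) = @_; return Vec2->new($u->{x} * $k, $u->{y} * $k); }
sub as_string { my $u = shift; return "($u->{x}, $u->{y})"; }

package main;

my $p = Vec2->new(1, 2);
my $q = Vec2->new(3, 4);
my $r = $p + $q * 2;    # calls scale, then add
print "$r\n";           # (7, 10)
```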
global, local, my
- Global variables are the default in Perl.
- Globals can be seen in any subroutine.
    $var = 1;
    printme();
    sub printme {
        print $var;
    }
- We didn't pass $var, but this works because $var is global.
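global, local, my
- Variables declared with local are still global (package) variables.
- local saves the variable's current value and sets it to undef; the old value is restored when the enclosing block exits.
    {
        local $name;
        mysub();
    }
- Inside the block, and inside mysub() when it is called from the block, $name is undef. After the block, $name has its old value again.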
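The sketch below traces local step by step. A subroutine called from inside the block sees the temporary value, and the original comes back when the block ends. show_name is a hypothetical helper.

```perl
use strict;
use warnings;

our $name = "Larry";    # a package (global) variable

sub show_name { return defined $name ? $name : "undef"; }

my ($inside, $inside_set);
{
    local $name;                 # saves "Larry"; $name is now undef
    $inside = show_name();       # subroutines called from here see undef too
    $name = "Moe";
    $inside_set = show_name();   # ...and see the temporary value
}                                # block exits: "Larry" is restored

print "$inside $inside_set ", show_name(), "\n";   # undef Moe Larry
```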
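global, local, my
- my variables are preferred for most Perl code.
- A my variable is visible only inside the block (or file) where it is declared, from the point of declaration onward.
    my $scalar;
    sub1(\%myhash, @array);
- If this code is inside a block (for example, another subroutine), $scalar will not be available in sub1 because it is not explicitly passed.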
my Code
- use strict will not allow global variables to be created on the fly; package variables must be declared with our or written with their full package name. Global variables that appear partway through the code often make a script unreadable.
    use strict;
    my @array;
    my %hash;
How much have you learned?
- How do we write a Perl subroutine that takes a list as its argument and returns the unique values from that list?
    @array = (1, 1, 2, 3, 4, 4, 34, 20, 20);
    $, = ", ";
    print unique(@array);

    1, 2, 3, 4, 34, 20
    sub unique {
        $max_el = @_ - 1;
        my @u_list;
        $u_list[0] = $_[0];
        for $i (1 .. $max_el) {
            $original = 1;
            foreach $item (@u_list) {
                if ($item == $_[$i]) {    # numeric comparison; use eq for strings
                    $original = 0;
                }
            }
            if ($original) {
                push(@u_list, $_[$i]);
            }
        }
        return @u_list;
    }
- This is how a FORTRAN77 expert might solve the problem. Simple, straightforward, lots of code.
A smaller script
    sub unique {
        my @u_list = keys %{{ map { $_ => 1 } @_ }};
        return @u_list;
    }
- How does that work?
A smaller script
    sub unique {
        my @u_list = keys %{{ map { $_ => 1 } @_ }};
        return @u_list;
    }
- map { $_ => 1 } @_
  - This returns a list of key/value pairs. The keys are the values ($_) in @_ (the array passed to the subroutine), and the values are all 1.
A smaller script
    sub unique {
        my @u_list = keys %{{ key/value pairs }};
        return @u_list;
    }
- { key/value pairs }
  - This creates an anonymous hash and returns a reference to it.
A smaller script
    sub unique {
        my @u_list = keys %{ reference_to_a_hash };
        return @u_list;
    }
- %{ reference_to_a_hash }
  - We dereference the anonymous hash.
A smaller script
    sub unique {
        my @u_list = keys %hash;
        return @u_list;
    }
- This returns the keys in the hash.
- Where did we wipe out the duplicates?
A smaller script
    sub unique {
        my @u_list = keys %{{ key/value pairs }};
        return @u_list;
    }
- { key/value pairs }
  - When we create our anonymous hash, we assign the value 1 to each key. When a key is repeated, Perl simply reassigns that key to the value specified (always 1 in this case).
Our Anonymous Hash
- unique("a","b","c","c","d","d")

    pair added    hash contents
    a => 1        a => 1
    b => 1        a => 1, b => 1
    c => 1        a => 1, b => 1, c => 1
    c => 1        a => 1, b => 1, c => 1
    d => 1        a => 1, b => 1, c => 1, d => 1
    d => 1        a => 1, b => 1, c => 1, d => 1
A smaller script
    sub unique {
        my @u_list = keys %{{ map { $_ => 1 } @_ }};
        return @u_list;
    }
- We don't need to create @u_list explicitly.
- A subroutine returns the value of the last expression it evaluates, unless another value is returned explicitly with a return statement.
A smaller script
    sub unique {
        keys %{{ map { $_ => 1 } @_ }};
    }
- Our compact and slightly cryptic routine to return unique items.
Why some people dislike Perl
    sub unique {
        keys %{{ map { $_ => 1 } @_ }};
    }

    sub unique {
        my %l;
        @l{@_} = ();
        keys %l;
    }

    sub unique {
        my %l;
        grep { !$l{$_}++ } @_;
    }
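Here is a sketch that runs the three short versions side by side. Their names are changed (unique_anon_hash, unique_slice, unique_seen are hypothetical) so they can live in one file. The two hash-based versions return keys in no particular order, so their results are sorted before comparing. The grep version keeps the order in which values first appear, which matches the output on the "How much have you learned?" slide.

```perl
use strict;
use warnings;

# Renamed copies of the three versions so they can coexist.
sub unique_anon_hash { keys %{{ map { $_ => 1 } @_ }} }
sub unique_slice     { my %l; @l{@_} = (); keys %l }
sub unique_seen      { my %l; grep { !$l{$_}++ } @_ }   # keeps first-seen order

my @array = (1, 1, 2, 3, 4, 4, 34, 20, 20);

# Hash keys come back in no particular order, so sort numerically to compare.
my @by_anon  = sort { $a <=> $b } unique_anon_hash(@array);
my @by_slice = sort { $a <=> $b } unique_slice(@array);
my @by_seen  = unique_seen(@array);

print "@by_anon\n";    # 1 2 3 4 20 34
print "@by_slice\n";   # 1 2 3 4 20 34
print "@by_seen\n";    # 1 2 3 4 34 20
```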
The End
Questions?
- blynch@msi.umn.edu
- help@msi.umn.edu
- 612-626-0802 (MSI helpline)