Title: Introduction to PERL
1Introduction to PERL
- Instructor: Jon Frederick, M.S.
- Division of Information Infrastructure,
- UNIX/NT Systems Group
- smiile@utk.edu, http://web.utk.edu/smiile
2What is PERL?
- Practical Extraction and Report Language
- Invented by Larry Wall; first released in 1987
- Originally for UNIX system administration
- Based on C, sed, awk, and English
3Why is PERL Popular?
- Easy to use
- Default behavior, e.g. print "hello world";
- Free (GNU General Public License or Artistic License)
- Available for every O.S.; programs transport seamlessly
- Modern hardware makes it run fast
- No built-in limitations
4Why is PERL popular?
- Well-documented and supported
- man perl
- O'Reilly books, esp. The Perl CD-ROM
- comp.lang.perl
- the open-source code movement
- www.cpan.org, thousands of free scripts and modules
5- #!/usr/perl-5.6/bin/perl5.6.0
- print "which file would you like to search?\n";
- $file = <STDIN>; chomp $file;
- open (FH, "<$file") or die "can't open $file: $!";
- print "what pattern do you want to find?\n";
- $pattern = <STDIN>; chomp $pattern;
- while ($line = <FH>) {
-   chomp $line; push(@lines, $line);
- }
- foreach $line2 (@lines) {
-   if ($line2 =~ /$pattern/) {
-     print "$line2\n";
-   }
- }
6Running a Perl Program
- In UNIX, first set file permissions:
- chmod u+x filename.pl
- filename.pl or
- perl filename.pl
- In Windows/DOS,
- C:\> filename.pl or C:\> perl filename.pl
- or click on the file from Windows Explorer
7Perl Programs
- First line states the path to the perl interpreter. At UTK UNIX:
- PATH                              VERSION
- #!/usr/misc/bin/perl              4.0
- #!/soft/script/bin/perl           5.003
- #!/usr/perl-5.6/bin/perl5.6.0     5.6.0
- Unsure? Type "which perl" and/or "perl -v"
8Perl Programs
- Comments begin with a pound sign (#) and go until end-of-line
- Commands are terminated by a semicolon (;) and can go for multiple lines
- Blocks of commands are surrounded by curly braces { }
- The standard file suffix for a perl program is .pl
- White space is polite but optional
9Scalar Variables
- Contain a single element: a string of characters or a number (can be any length)
- Begin with a dollar sign ($)
- Names are case-sensitive ($var ne $VAR)
- Values are assigned with an equals sign (=)
- Variables do not need to be pre-declared (automatically null or zero until you assign values).
- Single quotes mean literal; double quotes mean interpolate. Example: $name = "$userid\n"; gives the userid and a line return
- '$userid\n' is the same as "\$userid\\n"
10Scalar Variables
- Concatenation is achieved with a dot (.)
- $var1 = "Hello";
- $var2 = $var1 . " world!\n";
- print $var2; prints Hello world!
- The default variable, $_
- foreach $var (@list) { print $var } works
- foreach $_ (@list) { print $_ } also works
- print foreach (@list); also works
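The quoting, concatenation, and default-variable rules from the two slides above can be tried in one short script. This is a minimal sketch: the value given to $userid and the contents of @list are made up for illustration, and `use strict` with `my` declarations is added here even though the slides do not use it.

```perl
use strict;
use warnings;

my $userid = "smiile";           # hypothetical value, for illustration only
my $double = "$userid\n";        # double quotes interpolate: smiile + newline
my $single = '$userid\n';        # single quotes are literal: 9 characters

my $var1 = "Hello";
my $var2 = $var1 . " world!\n";  # the dot concatenates
print $var2;

my @list   = ("a", "b", "c");
my $joined = "";
$joined .= $_ foreach (@list);   # $_ takes each element in turn
print "$joined\n";
```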
11Array Variables
- Arrays are ordered lists of strings
- Names begin with the at sign (@)
- Individual elements of an array are specified by their index number. For an array named @array, the first element is referred to by $array[0]. The last element is $array[-1], or $array[99] if the array has 100 elements.
12Array Variables
- In a scalar context, @array is the number of elements in the array: $y = @array;
- When quoted, "@array" returns all of the elements in the array separated by a space.
- $#array is the index number of the last element in the array, i.e., (@array - 1)
- The default array is @_ (not often used); it holds the arguments passed to a subroutine.
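The indexing and context rules above can be checked with a four-element array. A minimal sketch; the element values are hypothetical sample data.

```perl
use strict;
use warnings;

my @array = qw(alpha beta gamma delta);   # hypothetical sample data

my $first      = $array[0];     # first element
my $last       = $array[-1];    # last element
my $count      = @array;        # scalar context: number of elements
my $last_index = $#array;       # index of the last element
my $quoted     = "@array";      # elements separated by single spaces

print "$first $last $count $last_index\n";
print "$quoted\n";
```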
13Array Input
- Like scalar assignment:
- @foods = ("pizza", "salad", "beer"); or
- @foods = qw(pizza salad beer);
- Push adds an element to the end of the list
- while ($x = <FH>) { push(@lines, $x) }
- Unshift adds an element to the beginning
- while ($x = <FH>) { unshift(@lines, $x) }
14Array Input
- Individual array elements can be assigned like scalar variables
- $foods[0] = "pizza"; $foods[1] = "salad";
- Can be read from the standard input
- @lines = <STDIN>; or push(@lines, $_) while (<>);
- The split command divides up the elements of a scalar into an array based on a delimiter
- @line = split / /, $lines[0];
15Array Input Split
- Syntax: split /delimiter/, string
- Example:
- $line = "Time flies like an arrow; fruit flies like a banana.";
- @time = split /\s+/, $line;
- print "$time[5] $time[6]\n"; prints fruit flies
- A neat trick: grab only elements 5 and 6
- ($word5, $word6) = (split " ", $line)[5,6];
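The split example above runs as written once the variables are declared. A minimal sketch, using the sentence from the slide:

```perl
use strict;
use warnings;

my $line = "Time flies like an arrow; fruit flies like a banana.";

# Split on runs of white space; the words are numbered from 0.
my @time = split /\s+/, $line;
print "$time[5] $time[6]\n";

# A list slice keeps only elements 5 and 6 of the split result.
my ($word5, $word6) = (split " ", $line)[5, 6];
print "$word5 $word6\n";
```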
16Array Output
- By specifying the index:
- while ($x < @lines) {
-   print "$lines[$x]\n";
-   $x++;
- }
- By using foreach:
- foreach (@lines) {
-   print "$_\n";
- }
17Array Output
- By using join:
- $file = join "", @foods;
- $file is now pizzasaladbeer
- By using pop and shift:
- $first = shift @foods; $last = pop @foods;
- Order can be sorted or reversed:
- @sorted = sort @foods; # beer pizza salad
- @reversed = reverse @sorted; # salad pizza beer
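The output functions above can be exercised on the same three-element list. A minimal sketch; note that shift and pop remove elements, so they come last here.

```perl
use strict;
use warnings;

my @foods = qw(pizza salad beer);

my $file     = join "", @foods;       # no separator between elements
my $listed   = join ", ", @foods;     # any string can be the separator
my @sorted   = sort @foods;           # string (alphabetical) order
my @reversed = reverse @sorted;

my $first = shift @foods;             # removes and returns the first element
my $last  = pop @foods;               # removes and returns the last element

print "$file\n$listed\n@sorted\n@reversed\n";
print "$first $last (left over: @foods)\n";
```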
18File Input/Output
- Prompt the user for input:
- print "which file would you like to read?\n";
- $filename = <STDIN>;
- chomp $filename; # get rid of that pesky newline.
- Use the default @ARGV array
- @ARGV is the list of arguments supplied by the user from the command line.
- $pattern = $ARGV[0];
- $filename = $ARGV[1]; # for our script that executes as
- program.pl pattern filename
19Input/Output
- Since @ARGV is a default variable in PERL, you can open files explicitly:
- open(FH, "<$ARGV[1]");
- while ($var = <FH>) { print $var }
- close FH;
- Or, let Perl assume you know what you're doing:
- while (<>) { print }
- opens the default file, $ARGV[0] (and then any other files named in @ARGV), assigns it to the default filehandle, assigns each line to the default variable, and prints.
20Input/Output
- The default or standard output is the terminal screen:
- while (<>) {
-   print $_ if ($_ =~ /pattern/);
- }
- Which can be redirected from the command line:
- program.pl pattern filename > newfile
21Input/Output
- Or, you can explicitly open an output filehandle:
- open(OUT, ">output.txt");
- while ($_ = <>) {
-   print OUT if (/pattern/);
- }
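The input and output filehandles described above can be combined in one round trip. This is a sketch under stated assumptions: it writes to a temporary file created with the core File::Temp module rather than output.txt, the sample lines are invented, and it uses the three-argument open with lexical filehandles and an explicit die, which the slides do not show.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a few sample lines to a temporary file (removed at exit).
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "Signed Rank\nSign\nStudent's t\n";
close $out;

# Read the file back and keep only the lines matching a pattern.
open(my $in, "<", $filename) or die "can't open $filename: $!";
my @matches;
while (my $line = <$in>) {
    chomp $line;
    push @matches, $line if $line =~ /Sign/;
}
close $in;

print "$_\n" foreach (@matches);
```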
22Exercise
- Suppose you have the following SAS output for 100 variables. Write a PERL program that extracts and prints just the variable name and the p-value of each signed rank test, one variable name and p-value per line.

The SAS System          23:11 Sunday, February 4, 2001   1
The UNIVARIATE Procedure
Variable: C3C4CAL1ALPHA

Test           -Statistic-     -----p Value------
Student's t    t  0.095791     Pr > |t|     0.9246
Sign           M  1.5          Pr >= |M|    0.6776
Signed Rank    S  5.5          Pr >= |S|    0.8715
23Control Structures
- if (some_statement) {
-   do something;
-   do another something;
- } elsif (other_statement) {
-   do something else;
- } else {
-   do this only if both statements false;
- }
- NOTE: unless (!statement) is the same as if (statement)
24Control Structures
- while (some_statement) {
-   do something until statement becomes false.
- }
- Equivalent to:
- until (!some_statement) {
-   do something
- }
25The Nature of Truth
- "0" and "" are false; everything else is true.
- 0 converts to "0", so false
- 1-1 computes to 0, then converts to "0", so false
- 1 converts to "1", so true
- "" empty string, so false
- "1" not "" or "0", so true
- "00" not "" or "0", so true (this one is weird, watch out)
- "0.000" also true for the same reason and warning
- undef evaluates to "", so false
- Schwartz, Christiansen and Wall, 1997. Learning Perl.
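The truth table above can be verified directly. A minimal sketch; the helper name truth is hypothetical and simply reports how Perl treats each value in a boolean test.

```perl
use strict;
use warnings;

# Hypothetical helper: report how a value behaves in a boolean test.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my @labels  = ('0', '1-1', '1', '""', '"1"', '"00"', '"0.000"', 'undef');
my @values  = (0, 1 - 1, 1, "", "1", "00", "0.000", undef);
my @results = map { truth($_) } @values;

printf "%-8s %s\n", $labels[$_], $results[$_] foreach (0 .. $#values);
```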
26Boolean Operators
- && And
- || Or
- ! Not
- while (defined($_ = <>) && ($x != 12)) {
-   print if (/Signed/ || /Student/);
-   $x++;
- }
- prints lines containing Signed or Student from the first twelve lines of the standard input.
27Comparison Operators
-                               Numeric   String
- Equal                         ==        eq
- Greater than                  >         gt
- Less than                     <         lt
- Greater than or equal         >=        ge
- Less than or equal            <=        le
- Not equal                     !=        ne
- Not equal with signed return  <=>       cmp
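The difference between the two operator columns shows up most clearly when sorting. A minimal sketch with invented numbers: the same values compare differently as numbers and as strings.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 2);                 # hypothetical sample data

my @as_strings = sort @nums;                # default sort compares as strings
my @as_numbers = sort { $a <=> $b } @nums;  # <=> compares as numbers

my $numeric = (10 <=> 9);                   # 1: 10 is numerically greater
my $string  = ("10" cmp "9");               # -1: "1" sorts before "9"

print "@as_strings\n@as_numbers\n$numeric $string\n";
```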
28What's wrong with this Picture?
- if (($x = 25) && ($y lt 25)) {
-   print "$y\n";
- }
29Arithmetic Operations
- Plus +
- Minus -
- Divide / (floating point mode default)
- Multiply *
- Exponentiate **
- Modulus %
30Example
- Suppose I have a data file that has K variables and N observations, and I want the average for each K across all N:
- Obsn, Varname(1), varname(2), ... varname(k)
- 1, data(1), data(2), ... data(k)
- 2, data(1), data(2), ... data(k)
- ...
- N, data(1), data(2), ... data(k)
31- while (<>) { # first read in each line and find the sum
-   chomp; @eachline = split /\,\s*/, $_;
-   if ($x < 1) { # don't include the variable names
-     @varnames = @eachline;
-     @eachline = (); $x++;
-   } else {
-     $k = 0;
-     shift @eachline; # get rid of obsn number.
-     while ($k < @eachline) {
-       $sum[$k] = $sum[$k] + $eachline[$k];
-       $k++;
-     }
-     $n++; # counts n, the number of obsns.
-   }
- }
32- while ($z < $k) {
-   $average[$z] =
-     (int(1000 * $sum[$z] / $n)) / 1000;
-   $z++;
- }
- shift @varnames;
- print "@varnames\n";
- print "@average\n";
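The averaging program above can be tried without a data file by feeding it rows from an array. This is a sketch, not the slide's exact code: the variable names height and weight and the data values are invented, and the loops are written with foreach and map instead of explicit counters.

```perl
use strict;
use warnings;

# Hypothetical data: a header row, then one row per observation.
my @data = (
    "Obsn, height, weight",
    "1, 10, 100",
    "2, 20, 200",
    "3, 30, 300",
);

my (@varnames, @sum);
my $n = 0;
foreach my $row (@data) {
    my @eachline = split /,\s*/, $row;
    shift @eachline;                        # drop the Obsn column
    if (!@varnames) {                       # first row holds the names
        @varnames = @eachline;
        next;
    }
    $sum[$_] += $eachline[$_] foreach (0 .. $#eachline);
    $n++;                                   # count the observations
}

# Truncate each average to three decimal places, as in the slide.
my @average = map { int(1000 * $_ / $n) / 1000 } @sum;

print "@varnames\n";
print "@average\n";
```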
33Hashes
- Also known as associative arrays
- Consist of pairs of keys and values.
- Useful for database implementations
- Hash names begin with the percent sign (%)
- Unlike arrays, which are ordered lists indexed by integers, hashes are unordered lists indexed by keys.
- Example: %emails = (Jon => 'smiile@utk.edu',
-                     AJ => 'ajw@utk.edu');
- print "$emails{AJ}\n"; prints ajw@utk.edu
- Note: hashes are indexed by curly, not square brackets!
34Hash Input
- Three ways to get data into a hash
- Assigning, with commas:
- %grades = ("Jon", "A", "Harley", "C", "Marco", "B");
- => is a more readable synonym for comma:
- %grades = (Jon => "A", Harley => "C", Marco => "B");
- Assign each element in scalar context:
- $grades{Jon} = "A"; $grades{Harley} = "C"; $grades{Marco} = "B";
35Hash Input
- If a key already exists, adding it to the hash will clobber the previous value of that key. To prevent this:
- unless (exists($emails{$name})) {
-   $emails{$name} = $email;
- }
- Or:
- if (!$emails{$name}) {
-   $emails{$name} = $email;
- }
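The exists guard above can be wrapped in a small routine to show that an existing key is left alone. A minimal sketch; the helper name add_email and the example.com addresses are hypothetical.

```perl
use strict;
use warnings;

my %emails = (Jon => 'smiile@utk.edu', AJ => 'ajw@utk.edu');

# Hypothetical helper: add a pair only if the key is not already present.
# Returns 1 if the pair was added, 0 if the key already existed.
sub add_email {
    my ($name, $email) = @_;
    return 0 if exists $emails{$name};
    $emails{$name} = $email;
    return 1;
}

my $added_new = add_email("Pat", 'pat@example.com');     # new key: stored
my $added_dup = add_email("Jon", 'other@example.com');   # existing key: ignored

print "$_ => $emails{$_}\n" foreach (sort keys %emails);
```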
36Hash Output
- Refer to the value with the key:
- print $grades{Marco};
- Grab all the keys and sort alphanumerically:
- print sort keys %grades;
- Just the values:
- print sort values %grades;
37Hash Output
- Values and keys:
- foreach $key (keys %grades) {
-   print "$key got a $grades{$key}\n";
- }
- Or use each:
- while (($name, $grade) = each(%grades)) {
-   print "$name got a $grade\n";
- }
38Hash Functions
- Delete a key-value pair from a hash:
- delete $hashname{$key};
- Make all the keys values and values keys:
- reverse %hashname;
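The hash output and hash functions above fit in one short script. A minimal sketch using the grades from the earlier slides; the name %by_grade is hypothetical. Reversing a hash only works cleanly when the values are unique, which holds here.

```perl
use strict;
use warnings;

my %grades = (Jon => "A", Harley => "C", Marco => "B");

# Sorted keys give a predictable order; hashes themselves are unordered.
my @report = map { "$_ got a $grades{$_}" } sort keys %grades;
print "$_\n" foreach (@report);

delete $grades{Harley};               # remove one key-value pair

my %by_grade = reverse %grades;       # values become keys and keys values
print "A belongs to $by_grade{A}\n";
```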
39A Sample CGI Form
40A Sample CGI Script
- #!/usr/perl-5.6/bin/perl5.6.0
- # invoke the perl compiler
- read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
- # slurp in the data from the CGI form
- # the buffer comes in the form,
- # lastname=Frederick&firstname=Jon&email=smiile@utk.edu&phone=555-1212
- # so it must be parsed into the separate data fields.
- @pairs = split(/&/, $buffer);
41A Sample CGI Script
- foreach $pair (@pairs) {
-   ($name, $value) = split(/=/, $pair);
-   $FORM{$name} = $value;
- }
- open(MEM, "<memberemails.txt");
- while (<MEM>) {
-   chomp;
-   $seen{$_} = 1;
- }
- close MEM;
- # the value of one is arbitrary for keys of %seen
42A Sample CGI Script
- $address = $FORM{email};
- if ($seen{$address}) {
-   print "Content-type: text/html\n\n";
-   print "You're already a member!<p>";
- } else {
-   print "Content-type: text/html\n\n";
-   foreach $key (sort keys %FORM) {
-     print "$key is $FORM{$key}<p>";
-   }
-   open(MEM, ">>memberemails.txt");
-   print MEM "$address\n"; close MEM;
- }
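The parsing step of the CGI script can be tested on its own, without a web server. This is a sketch under stated assumptions: the routine name parse_form is hypothetical, the buffer is a fixed string rather than data read from STDIN, and the decoding of + and %XX escapes is an addition not shown in the slides (browsers URL-encode form data, so @ typically arrives as %40).

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name=value&name=value" into a hash.
sub parse_form {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split /&/, $buffer) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = "" unless defined $value;
        $value =~ tr/+/ /;                               # + encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %40 becomes @
        $form{$name} = $value;
    }
    return %form;
}

my %FORM = parse_form(
    'lastname=Frederick&firstname=Jon&email=smiile%40utk.edu&phone=555-1212'
);
print "$_ is $FORM{$_}\n" foreach (sort keys %FORM);
```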
43Regular Expressions
- Regular expressions are patterns to be matched against a string
- Perl regular expressions are a superset of those used by the UNIX utilities grep, sed, vi and awk
- We've already seen:
- print if (/pattern/);
- Which is shorthand for:
- print $var if ($var =~ m/pattern/); where $var is the default variable, $_
44Pattern Matching Operators/Functions
- $var =~ m/pattern/
- the match operator
- $var =~ s/pattern/replacementpattern/g
- the substitution operator
- g modifier means all occurrences on each line
- @list = split /pattern/, $var
- splits $var into list with pattern as delimiter
- $var = join "delimiter", @list
- joins list into a single variable (join takes a plain string, not a pattern)
- /pattern/i i means ignore case
45Regular Expressions
- Metacharacters:
- \ | ( ) [ { ^ $ * + ? .
- Backslash means escape or literal interpretation of metacharacters
- $var =~ s/\|\$/pipe-dollar/;
- means replace |$ with pipe-dollar
- Escaping normal alphanumeric characters turns them (some of them) into metacharacters
- \s means white space (tab or space)
- \n means line return
46Regular Expressions
- | means or; parentheses allow grouping
- print if (/Dept of (Psychology|Biology)/);
- prints lines containing
- Dept of Psychology or Dept of Biology
- . means any character
- * means any number of the previous character
- /Psych.*/ matches Psychology or Psychiatry
- + means one or more of the previous character
- $lines =~ s/\s+/\t/g;
- replace one-or-more spaces with a tab
47Regular Expressions
- ^ means beginning of the line
- $ means end of the line
- s/^\s+// gets rid of spaces at beginning of line
- [ ] identify a character class
- s/[A-Ex2]/R/g replaces A, B, C, D, E, 2, or x with R.
- [^ ] identifies a negative character class
- \w any word character [a-zA-Z0-9_]
- while (<>) {
-   /\@/ && print "$_\n" foreach (split /[^\w\@\.\-]/);
- }
- extracts email addresses from an html file
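The substitutions and the email-extraction trick above can be tried on literal strings instead of a file. A minimal sketch; the sample strings are invented, and grep is used to collect the matches into an array rather than printing them directly.

```perl
use strict;
use warnings;

# Replace each run of white space with a single tab.
my $lines = "a   b  c";
(my $tabbed = $lines) =~ s/\s+/\t/g;

# Strip leading white space.
my $padded = "   indented";
(my $trimmed = $padded) =~ s/^\s+//;

# Character class: replace A-E, x, or 2 with R.
(my $classed = "ABCx2z") =~ s/[A-Ex2]/R/g;

# Split on anything that cannot be part of an address, keep pieces with @.
my $html  = '<a href="mailto:smiile@utk.edu">Jon</a> or ajw@utk.edu';
my @found = grep { /\@/ } split /[^\w\@\.\-]+/, $html;

print "$trimmed\n$classed\n";
print "$_\n" foreach (@found);
```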
48Command Line Options
- perl -w filename.pl
- Warning mode, provides extra detail about potential flaws in code
- perl -c filename.pl
- Test if file compiles successfully without actually running
- perl -e 'command1; command2'
- Command line switch runs perl code typed directly on the command line.
- perl -e 'sleep(120); while (1) { print "\a" }'
- a cheap alarm clock
49Subroutines
- Defining a subroutine:
- sub name { ... }
- Invoking a subroutine:
- &name;
- print "What's your name? ";
- chomp ($name = <STDIN>);
- &hello;
- sub hello {
-   print "Hello, $name!\n";
- }
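The hello subroutine above reads a global variable. A common alternative, sketched here as an assumption rather than the slide's method, passes the name as an argument, which arrives in the default array @_, and returns the greeting instead of printing it.

```perl
use strict;
use warnings;

# Arguments arrive in @_; the last value computed (or returned) is the result.
sub hello {
    my ($name) = @_;
    return "Hello, $name!\n";
}

print hello("Jon");
print hello($_) foreach (qw(AJ Marco));   # hypothetical names
```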
50System Calls
- Backticks execute an expression from the command line and return the standard output:
- $files = `ls`;
- @files = split /\n/, $files;
- system( ) just executes the expression and returns the command's exit status: 0 if successful, nonzero if not
- system ("mailx -s \"test mailing\" smiile\@utk.edu < file");
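The difference between backticks and system( ) can be seen without sending any mail. This is a sketch under stated assumptions: it uses echo for the backtick example and runs the current perl interpreter (the special variable $^X) as the child command so that the exit codes are known in advance.

```perl
use strict;
use warnings;

# Backticks capture the command's standard output.
my $output = `echo hello`;
chomp $output;

# system() returns the wait status: 0 on success, nonzero on failure.
my $ok     = system($^X, "-e", "exit 0");
my $failed = system($^X, "-e", "exit 3");
my $code   = $failed >> 8;            # the child's exit code is in the high byte

print "output: $output\n";
print "ok: $ok, failed exit code: $code\n";
```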
51Additional Resources
- CGI Course, March 28 and April 6. See
- http://web.utk.edu/training
- Another PERL tutorial
- http://www.netcat.co.uk/rob/perl/win32perltut.html
- A Directory of PERL tutorials
- http://www.astentech.com/tutorials/Perl.html
- Schwartz, R., Christiansen, T., & Wall, L. (1997). Learning Perl. Sebastopol, CA: O'Reilly & Associates.
52Additional Resources
- The PERL Bookshelf (CD-ROM with 6 books). O'Reilly & Associates. Includes Learning Perl.
- Christiansen, T., & Torkington, N. (1998). Perl Cookbook. Sebastopol, CA: O'Reilly & Associates.
- UNIX for Windows
- http://www.research.att.com/dgk/uwin/