Title: Introduction to PERL and RegExp
1Introduction to PERL and RegExp
"Perl is a language for getting your job done."
Larry Wall
2Course overview
- Introduction to the Perl language and Regular Expressions
- Introduction to HTML and cgi-bin
- Perl modules and one-liners
- Perl objects and references
3Contents
- Definitions
- PERL, CGI...
- History
- Language basics
- reserved words
- operators
- variables
- subroutines
- modules
- Data Structure
- Interpolation, quoting
- File handling
- Context
- Special variables
- Documents, help
- Regular Expressions
- Exercises
4Definitions
- PERL
- Practical Extraction and Report Language
- CGI
- Common Gateway Interface
- CPAN
- Comprehensive Perl Archive Network
5History
- Created by Larry Wall in 1987
- Last major release Perl 5 in 1994 (currently Perl 5.8)
- 1996: creation of the CPAN repository of Perl modules and documentation
- Today many cgi-bin programs behind web pages are Perl scripts
- Good starting sites
- www.perl.org, www.perl.com, www.cpan.org
- history.perl.org
6Language basics (1)
- Interpreted, JIT compiled
- Context dependent
- Object programming
- Reserved words
- Many!! E.g. if, else, until, while, for, foreach
- Large number of predefined functions
- See http://www.perldoc.com/perl5.8.0/pod/perlfunc.html
- In UNIX systems
- /usr/local/bin/perl
- /usr/bin/perl
7Language basics (2)
- Variables
- Scalar
- $var = 12; $car = 'abc';
- Array
- @var = ( 'a', 'b', 'c', 'd' );
- Hash
- %var = ( 'a', 'first', 'b', 3 );
- %var = (
-  a => 'first',
-  b => 3,
- );
- Operators
- + - * /
- << >>
- ! && || and or not xor
- . x .= x=
- == != eq ne
- < <= > >=
- lt le gt ge
- <=> cmp
- =~ !~
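A minimal sketch of the three variable types and a few of the operators from this slide. The variable names and values are illustrative only; note that numbers and strings have separate comparison operators.

```perl
use strict;
use warnings;

# Scalar, array and hash, each with its own sigil.
my $count  = 12;
my @codes  = ('a', 'b', 'c', 'd');
my %lookup = (a => 'first', b => 3);

# Numeric and string operators are distinct.
my $sum      = $count + 3;        # arithmetic
my $repeated = 'ab' x 3;          # repetition
my $joined   = 'abc' . 'def';     # concatenation
my $num_cmp  = (10 <=> 9);        # numeric comparison: 10 is larger
my $str_cmp  = ('10' cmp '9');    # string comparison: '1' sorts before '9'

print "sum=$sum repeated=$repeated joined=$joined\n";
print "10 <=> 9 gives $num_cmp, '10' cmp '9' gives $str_cmp\n";
print "second element: $codes[1], value for a: $lookup{a}\n";
```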
8Language basics (3)
- Special variables (e.g.)
- $_ default input
- $$ process ID
- $& string of last match
- @ARGV cmd-line args
- %ENV environment
- Functions
- sub funcname { expr }
- funcname(parameters)
- Get parameters from @_
- Comments
- Start with #
- End with \n
- Modules
- A huge number of free modules are available in the CPAN repository
- use mod_name;
- require mod_name;
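A short sketch of a subroutine that reads its arguments from @_, together with a function imported from a module. The subroutine name mean and the sample values are hypothetical; List::Util ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw(max);   # import one function from a core module

# Arguments arrive in the special array @_.
sub mean {
    my @values = @_;
    return 0 unless @values;     # empty argument list
    my $total = 0;
    $total += $_ for @values;    # $_ is the default loop variable
    return $total / @values;     # @values in numeric context is its length
}

my $average = mean(2, 4, 9);
my $largest = max(2, 4, 9);
print "mean: $average, max: $largest\n";
```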
9Data structure
- Scalars
- Integer $x = 12345;
- Float $x = 123.45;
- Scientific $x = 12.34E5;
- Octal $x = 0123;
- Hexadec. $x = 0xffff;
- Binary $x = 0b1100_1010;
- Strings $x = "petit texte";
- Arrays / List
- @list = ('un', 'deux', 'trois');
- @list = qw(un deux trois);
- $#list
- ($a, $b, $c) = ('un', 'deux', 'trois');
- What if ???
- @list = ('un', 'deux', 'trois');
- $list = ('un', 'deux', 'trois');
- $size = @list;
- print $list[1];
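One way to explore the "What if" questions on this slide, as a minimal sketch. It assumes the slide's array of three French words; the other variable names are illustrative. The point is that @list and $list are separate variables, and that an array evaluated in scalar context yields its size.

```perl
use strict;
use warnings;

my @list = ('un', 'deux', 'trois');
my $list = 'a separate scalar';   # $list and @list do not clash

my $size        = @list;     # scalar context: number of elements
my $last_index  = $#list;    # index of the last element
my ($first)     = @list;     # list assignment: first element only
my $second_item = $list[1];  # indexing starts at 0, so this is 'deux'

print "size=$size last index=$last_index first=$first second=$second_item\n";
```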
10Data structure
- Hashes
- %hash = ('rouge', 0xff0000, 'vert', 0x00ff00, 'bleu', 0x0000ff);
- %hash = (
- rouge => 0xff0000,
- vert => 0x00ff00,
- bleu => 0x0000ff,
- );
- @clefs = keys(%hash);
- @valeurs = values(%hash);
- foreach $k (keys %hash) {
- print $hash{$k};
- }
- Warning: the hash does not keep the order
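Because a hash does not keep insertion order, a common approach is to sort the keys before iterating. A minimal sketch using the colour hash from this slide; the output format is an arbitrary choice.

```perl
use strict;
use warnings;

my %hash = (
    rouge => 0xff0000,
    vert  => 0x00ff00,
    bleu  => 0x0000ff,
);

# Sort the keys to get a reproducible order.
my @lines;
foreach my $k (sort keys %hash) {
    push @lines, sprintf("%s=%06x", $k, $hash{$k});
}
my $summary = join(', ', @lines);
print "$summary\n";
```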
11Data structure
- References
- \ similar to & in C
- $refarray = \@array;
- What if ???
- $refarray[1] = 12;
- $$refarray[1] = 15;
- $refarray->[1] = 18;
- print $array[1];
- Nested data structure
- Arrays of arrays
- Hashes of arrays
- Arrays of Hashes
- Hashes of hashes
- And more!
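A small sketch of the reference syntax and one nested structure (a hash of arrays). The data is made up for illustration. Under use strict, writing $refarray[1] would be a compile-time error, because it refers to an undeclared array @refarray rather than to the reference.

```perl
use strict;
use warnings;

my @array    = (10, 11);
my $refarray = \@array;        # reference to @array

$$refarray[1]  = 15;           # dereference, then index
$refarray->[1] = 18;           # arrow form, same element
my $through_ref = $array[1];   # the original array was modified

# A hash of arrays: each value is an array reference.
my %hoa = (fruits => ['apple', 'pear'], colours => ['red']);
push @{ $hoa{colours} }, 'green';
my $nested = $hoa{colours}->[1];

print "array[1]=$through_ref nested=$nested\n";
```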
12Data structure
- Global vs Local
- Lexicals
- my
- our
- Dynamics
- local
- What if ???
- our $var = 'A';
- while (<>) {
- $var = 'D';
- local $var = 'B';
- print $var;
- my $var = 'C';
- print $var;
- last;
- }
- print $var;
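A minimal sketch of how our, local and my differ. It replaces the slide's while (<>) loop with a bare block so that it runs without input; the helper show() is hypothetical and simply returns the package variable.

```perl
use strict;
use warnings;

our $var = 'A';              # package (global) variable

sub show { return $var }     # always reads the package variable

my @seen;
{
    local $var = 'B';        # temporary value, visible to called subs
    push @seen, show();      # B
    my $var = 'C';           # new lexical, hides the package variable here
    push @seen, $var;        # C
    push @seen, show();      # still B: show() cannot see the lexical
}
push @seen, show();          # A: the local value is restored at block exit

print "@seen\n";
```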
13Interpolation, quoting
- Different quotes have different meanings
- $price = 100;
- print "the price is $price";
- This is called interpolation
- Quoting operators
- '' q// (string, not interpolated)
- "" qq// (string, interpolated)
- `` qx// (execute)
- ( ) qw// (word list)
- // m// (pattern match)
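A short sketch comparing the quoting forms, using the slide's $price example. The qx// form is left out because its result depends on the local shell.

```perl
use strict;
use warnings;

my $price = 100;
my $plain        = 'the price is $price';    # single quotes: no interpolation
my $interpolated = "the price is $price";    # double quotes: interpolation
my $same_plain   = q(the price is $price);   # q() behaves like single quotes
my $same_interp  = qq(the price is $price);  # qq() behaves like double quotes
my @words        = qw(un deux trois);        # list of words, no commas needed

print "$plain\n$interpolated\n";
print scalar(@words), " words\n";
```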
14File handling
- Opening a file
- open(LIRE, $filename);
- Reading from a file
- $line = <LIRE>;
- Writing to a file
- open(ECRIRE, ">$filename");
- print ECRIRE $line;
- Closing a file
- close(LIRE); close(ECRIRE);
- Special filehandles
- STDOUT, STDIN, STDERR
- select(STDOUT);
- Piping
- open(PIPE, "ls -1 |");
- @filelist = <PIPE>;
- Other system calls
- system($command);
- exec($command, $options, $filename);
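A hedged sketch of a write-then-read round trip. Unlike the slide, it uses the three-argument form of open with lexical filehandles and checks for errors; it also assumes a scratch file created with the core File::Temp module so that no real data is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

# Write two lines.
open(my $out, '>', $filename) or die "cannot write $filename: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read them back: in list context <$in> returns one element per line.
open(my $in, '<', $filename) or die "cannot read $filename: $!";
my @lines = <$in>;
close($in);
chomp(@lines);

print scalar(@lines), " lines, last one: $lines[-1]\n";
```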
15File testing
- File operators
- -r -w -x -o (readable, writable, executable, owned)
- -e exists
- -z empty
- -s not empty (returns size)
- -f plain file
- -d directory
- -l symbolic link
- Example
- open(READ, $filename) if -f $filename;
- What if ???
- foreach (@ARGV) {
- next unless -f;
- $fsize = -s $_;
- print("$_ is $fsize bytes long.\n");
- }
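A minimal sketch of the file test operators on a temporary file and directory (both created with the core File::Temp module, an assumption made to keep the example self-contained). With no argument a file test applies to $_; the special filehandle _ reuses the result of the previous test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
print $fh "12345";
close($fh);

my @report;
foreach ($file, $dir, "$dir/missing") {
    if    (!-e $_) { push @report, 'missing' }
    elsif (-d _)   { push @report, 'directory' }
    elsif (-f _)   { push @report, 'file of ' . (-s _) . ' bytes' }
}
print join(', ', @report), "\n";
```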
16Context
- Scalar
- $a = @b;
- Boolean
- while (@a) { shift @a }
- List
- ($a) = @b;
- Empty
- () = f($a);
- Interpolation
- "$a $b"
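The same array gives different results depending on context. A minimal sketch with made-up values:

```perl
use strict;
use warnings;

my @b = ('x', 'y', 'z');

my $count   = @b;     # scalar context: number of elements
my ($first) = @b;     # list context: first element
my $text    = "@b";   # interpolation: elements joined by spaces

my @queue = @b;
my $steps = 0;
while (@queue) {      # boolean context: true while the array is not empty
    shift @queue;
    $steps++;
}

print "count=$count first=$first text=$text steps=$steps\n";
```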
17Special variables
- %INC hash of modules
- @INC list of lib directories
- %ENV hash of env variables
- @ARGV list of arguments
- @_ auto list
- $_ auto var
- @F one-liner -a option
- $$ pid
- $0 program name
- $| autoflush
- $& string of last match
- $/ record delimiter
- $^V Perl version
18Documents, help, debugging
- perl -h
- perldoc <keyword>
- Web help
- www.perl.org
- www.perl.com
- Books
- O'Reilly
- Debugging?
- use strict
- use warnings or -w
- -d (debug mode)
- man perldebug
19Calling perl
- perl file.pl
- or in an executable use shebang
- #!/usr/bin/perl
- Check syntax
- perl -c file.pl
20Exercises 1.1 and 1.2 on paper
- 1.1 Write a little Hello World application
- 1.2 Write a quadratic equation solver (polynomial of the 2nd degree)
- It should solve ax^2 + bx + c = 0 by taking the a, b and c values on the command line and print x1 and x2, or tell if the equation is not solvable (delta < 0)
21Regular Expressions
- Idea: a powerful way to search for text patterns
- Literal (or normal characters)
- Alphanumeric
- abcABC0123...
- Punctuation
- -_ ,.()/ ?!\<>"@
- Metacharacters
- Ex: ls *.java
- Flavors
- awk, egrep, Emacs, grep, Perl, POSIX, Tcl, PROSITE!
22Patterns are regular expressions
- Pattern: <A-x-[ST](2)-x(0,1)-{V}
- Regexp: ^A.[ST]{2}.?[^V]
- Text: The sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine, two times, followed by any amino acid or nothing, followed by any amino acid except a valine.
- Only the syntax differs
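A hedged sketch that checks the regexp form of this pattern against a few short test strings. The sequences are invented for illustration and are not real proteins.

```perl
use strict;
use warnings;

# PROSITE pattern <A-x-[ST](2)-x(0,1)-{V} written as a Perl regexp.
my $motif = qr/^A.[ST]{2}.?[^V]/;

my %hits;
foreach my $seq ('AGSTLK', 'AGSTV', 'GASTLK', 'AGSSKV') {
    $hits{$seq} = ($seq =~ $motif) ? 1 : 0;
}
print "$_: $hits{$_}\n" for sort keys %hits;
```

AGSTV fails because the only residue left after the two S/T positions is a valine, and GASTLK fails because it does not start with an alanine.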
23Regular Expressions (1)
- In Perl
- Start and End of line
- ^ start, $ end
- Match any of several
- [] (|)
- Match 0, 1 or more
- . any, ? 0 or 1, + 1 or more, * 0 or more
- {m,n} range
- !~ negation
- Examples
- Match every instance of a SwissProt AC
- m/[OPQ][0-9][A-Z0-9]{3}[0-9]/
- m/[OPQ]\d[A-Z0-9]{3}\d/
- Match every instance of a SwissProt ID
- m/[A-Z0-9]{1,4}_[A-Z0-9]{3,5}/
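A minimal sketch that collects every match of the two example patterns with the /g modifier in list context. The sample text is invented, and the \b word-boundary anchors are an addition not shown on the slide; they prevent matches inside longer words.

```perl
use strict;
use warnings;

my $text = 'Entries P12345 and Q9Y6K9 are listed; X12345 is not matched.';

# In list context, /g returns all captured substrings.
my @acs = $text =~ /\b([OPQ]\d[A-Z0-9]{3}\d)\b/g;

my @ids = 'INS_HUMAN and CYC_BOVIN' =~ /\b([A-Z0-9]{1,4}_[A-Z0-9]{3,5})\b/g;

print "ACs: @acs\n";
print "IDs: @ids\n";
```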
24Regular Expressions (2)
- Escape character or back reference
- \char or \num
- Shorthand
- \d digit [0-9]
- \s whitespace [ \f\n\r\t]
- \w word character [a-zA-Z0-9_]
- \D \S \W complement of \d \s \w
- Byte notation
- \num character in octal
- \xnum character in hexadecimal
- \cchar control character
- Match operator
- m//
- $var =~ m/colou?r/
- $var !~ m/colou?r/
- Substitution operator
- s///
- $var =~ s/colou?r/couleur/
- Translate operator
- tr///
- $revcomp =~ tr/ACGT/tgca/
- Modifiers //
- /i case insensitive
- /g global match
- Many others: /s, /m, /o, /x...
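A short sketch of the three operators on made-up strings. Note that tr/ACGT/tgca/ only complements the bases; the explicit reverse step is an assumption about what the variable name $revcomp is meant to hold.

```perl
use strict;
use warnings;

my $var = 'The color and the colour';

my $found      = ($var =~ m/colou?r/) ? 1 : 0;    # match
my $no_match   = ($var !~ m/flavou?r/) ? 1 : 0;   # negated match
my $upper_case = ('COLOR' =~ m/colou?r/i) ? 1 : 0; # /i ignores case

(my $french = $var) =~ s/colou?r/couleur/g;       # /g replaces every match

my $dna     = 'AACGT';
my $revcomp = reverse $dna;        # reverse the string first
$revcomp =~ tr/ACGT/tgca/;         # then complement each base

print "$french\n$revcomp\n";
```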
25Regular Expressions (3)
- Grouping
- External reference
- $var =~ s/sp\|(\w\d{5})/swissprot AC$1/
- Internal reference
- $var =~ s/tr\|(\w\d{5})\|\1/trembl AC$1/
- Numbering
- $1 to $9
- $10 and more if needed...
- Exercises
- Create a regexp to recognize any pseudo IP address, e.g. 123.456.789.12
- Create a regexp to recognize any email address, e.g. Jean.Dupond@isb-sib.ch
- Create a regexp to change any HTML tag to another
- <address> -> <pre>
- On sib-dea
- use visual_regexp-1.2.tcl to check your regular expressions (requires X-windows)
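A minimal sketch of grouping with made-up strings: \1 refers back to a group inside the same pattern (internal reference), while $1 and $2 hold the captured text afterwards and can be used in the replacement (external reference).

```perl
use strict;
use warnings;

# Internal reference: \1 must repeat what group 1 matched.
my $line = 'this is is a test';
my $dup  = ($line =~ /\b(\w+) \1\b/) ? $1 : '';

# External reference: $1 and $2 are used in the replacement.
(my $swapped = 'Dupond Jean') =~ s/(\w+) (\w+)/$2 $1/;

print "repeated word: $dup\n";
print "swapped: $swapped\n";
```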
26Regular Expressions (4)
27Solution 1.1
- #!/usr/local/bin/perl
- print "Hello World!\n";
- exit 0;
28Solution 1.2
#!/usr/local/bin/perl
# Read the three coefficients from the command line.
my ($a, $b, $c) = @ARGV;
if ($a eq '' || $b eq '' || $c eq '') { die("missing variable!"); }
my $delta = $b*$b - 4*$a*$c;
if ($delta == 0) {
    my $x = -$b/(2*$a);
    print "delta = 0, one solution only: x = $x\n";
}
elsif ($delta > 0) {
    my $x1 = (-$b+sqrt($delta))/(2*$a);
    my $x2 = (-$b-sqrt($delta))/(2*$a);
    print "delta > 0, two solutions: x1 = $x1, x2 = $x2\n";
}
else {
    print "delta < 0, no solutions!!\n";
}
exit 0;
29Solution RegExp
- /(\d{1,3}\.){3}\d{1,3}/
- /\w+\.\w+\@\w+\-\w+\.[a-z]{2,3}/
- s/<(\/?)address>/<$1pre>/
- generalized: replace address with \w+
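A hedged sketch that tries these solutions on the strings from the exercise slide. The ^ and $ anchors are an addition, assumed here so that the whole string must match; ${1} is written with braces only to make the boundary with "pre" explicit.

```perl
use strict;
use warnings;

my $ip_re    = qr/^(?:\d{1,3}\.){3}\d{1,3}$/;
my $email_re = qr/^\w+\.\w+\@\w+-\w+\.[a-z]{2,3}$/;

my $ip_ok    = ('123.456.789.12' =~ $ip_re) ? 1 : 0;
my $ip_short = ('123.456.789' =~ $ip_re) ? 1 : 0;     # only three groups
my $email_ok = ('Jean.Dupond@isb-sib.ch' =~ $email_re) ? 1 : 0;

# Change opening and closing <address> tags to <pre>.
(my $converted = '<address>SIB</address>') =~ s/<(\/?)address>/<${1}pre>/g;

# Generalized form: any simple tag name.
(my $general = '<b>SIB</b>') =~ s/<(\/?)\w+>/<${1}pre>/g;

print "$ip_ok $ip_short $email_ok $converted $general\n";
```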