Title: Practical Extraction
1Practical Extraction and Report Language
- Picture taken from http://www.wendy.org/DPW2006/shirt.htm
2Agenda
- Why Perl?
- Getting/Installing Perl
- Using Perl
- Structure of basic program (Hello world)
- Variables and Operators
- Regular Expressions
- Other Topics
3Why Perl
- Perl is built around regular expressions
- REs are good for string processing
- Therefore Perl is a good scripting language
- Perl is especially popular for CGI scripts
- Perl makes full use of the power of UNIX
- Short Perl programs can be very short
- "Perl is designed to make the easy jobs easy, without making the difficult jobs impossible." -- Larry Wall, Programming Perl
4Getting/Installing Perl
- Windows
- www.activestate.com
- Download ActivePerl
- Run installer
- Linux
- Usually installed when Linux is installed
- Check with: userX@machineY$ which perl
- Get it from
- Linux distribution CDs
- Update your installation and, during package selection, select perl
- ActiveState.com
- CPAN
5Other Possibilities
- Using Virtual Machines
- VMware
- Install VMware Workstation on Windows
- Install Linux under VMware Workstation (select perl to be installed)
- Cygwin
- Install Cygwin on Windows
- It provides a Linux-like interface in which perl can be used
- http://www.cygwin.com/mirrors.html
6Using Perl
- Windows
- Write a program and save it with the .pl extension
- C:\perl\bin> perl program_name.pl
- Linux
- Write a program and save it with the .pl extension
- userX@machineY$ perl program_name.pl
- To run the script directly, make it executable first:
- userX@machineY$ chmod +x program_name.pl
- userX@machineY$ ./program_name.pl
- Same under VMware and Cygwin
7Structure of a basic program
- #!/usr/bin/perl
- # Program to do the obvious
- print 'Hello world.';
- The first line is special: it gives the path to the Perl installation. This path can be different, e.g., /bin/perl
- # denotes a comment; anything after it until the end of the line is a comment
- A statement ends with a semicolon
- print is a built-in function
- 'Hello world.' is the function argument, in this case a string constant
- Run it with: userX@machineY$ perl hello.pl
8Variables
- Scalar variables
- Only one value at a time
- List variables
- List of values (Arrays)
9Scalar Variables
- A scalar variable can store only one value at a time.
- Its name is always preceded by the $ symbol, e.g., $var1
- There is no need to declare the variable beforehand (but it is recommended).
- There are no data types such as character or numeric. The same variable can hold a character, a string, or a number, depending on how you use it.
10Example scalar
- #!/usr/bin/perl
- $x = "100\n";
- print $x;
- $x = $x + 1;
- print $x;
Output:
100
101
11List/Array Variables
- They are like arrays. A list can be considered a group of scalar variables.
- Their names are always preceded by the @ symbol.
- e.g., @items = ("Apple", "Bell", "Chair");
- As in C, the index starts from 0.
- If you want the second name, use $items[1]
- Watch the $ symbol here: each element is a scalar variable.
- $# followed by the name of the list variable gives the index of its last element, which is one less than its length.
- $#items gives the index of the last element of @items
- $len = @items; assigns the length of the array to $len
Is the result of these two statements the same: $len = @items and print @items ?
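The question above comes down to context. The following is a minimal sketch, assuming the @items array from this slide: assigning an array to a scalar yields the number of elements, while print receives the elements themselves as a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @items = ("Apple", "Bell", "Chair");

my $len  = @items;     # scalar context: the number of elements
my $last = $#items;    # index of the last element

print "length: $len\n";        # length: 3
print "last index: $last\n";   # last index: 2
print @items, "\n";            # list context: AppleBellChair
print "$items[1]\n";           # second element: Bell
```

So the two statements do not give the same result: one produces a count, the other prints the elements.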
12Example List/Array
- #!/usr/bin/perl
- @myarray = (1721, 2974, "string");
- print @myarray;
- $myarray[0] = "string";
- $myarray[1] = 1234;
- $myarray[2] = 5646;
- print @myarray;
- print $myarray[0] . $myarray[1] . $myarray[0];
13Operations on Arrays
- Push
- push adds one or more things to the end of a list
- push(@items, "table", "chair");
- push returns the new length of the list
- Pop
- pop removes and returns the last element
- $myitem = pop(@items);
- shift, unshift, reverse
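The slide only names shift, unshift, and reverse. The following is a small sketch of how they behave; the array contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @items = ("table", "chair", "lamp");

my $first = shift(@items);         # removes and returns the first element
unshift(@items, "desk", "sofa");   # adds elements to the front of the list
my @backwards = reverse(@items);   # returns a reversed copy; @items is unchanged

print "$first\n";       # table
print "@items\n";       # desk sofa chair lamp
print "@backwards\n";   # lamp chair sofa desk
```

shift and unshift work on the front of the list in the same way that pop and push work on the end.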
14Example (Push Pop)
- #!/usr/bin/perl
- @myarray = (1721, 2974, "string");
- print "@myarray\n";
- push(@myarray, "newval1", "newval2");
- print "@myarray\n";
- $popvalue = pop(@myarray);
- print "$popvalue\n";
- print "@myarray";
15Operators
- Arithmetic
- String
- Single and Double quotes
- Conditional
16Arithmetic in Perl
$a = 1 + 2;     # Add 1 and 2 and store in $a
$a = 3 - 4;     # Subtract 4 from 3 and store in $a
$a = 5 * 6;     # Multiply 5 and 6
$a = 7 / 8;     # Divide 7 by 8 to give 0.875
$a = 9 ** 10;   # Nine to the power of 10, that is, 9^10
$a = 5 % 2;     # Remainder of 5 divided by 2
++$a;           # Increment $a and then return it
$a++;           # Return $a and then increment it
--$a;           # Decrement $a and then return it
$a--;           # Return $a and then decrement it
17String and assignment operators
$a = $b . $c;   # Concatenate $b and $c
$a = $b x $c;   # $b repeated $c times
$a = $b;        # Assign $b to $a
$a += $b;       # Add $b to $a
$a -= $b;       # Subtract $b from $a
$a .= $b;       # Append $b onto $a
18Single and double quotes
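- $a = 'apples';
- $b = 'bananas';
- print $a . ' and ' . $b;
- prints: apples and bananas
- print '$a and $b';
- prints: $a and $b (single quotes do not substitute variables)
- print "$a and $b";
- prints: apples and bananas (double quotes substitute variables)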
19Conditions
- Strings   Numbers
- eq        ==        equal to
- ne        !=        not equal to
- lt        <         less than
- gt        >         greater than
- le        <=        less than or equal to
- ge        >=        greater than or equal to
- Logical
- &&  And
- ||  Or
- !   negation
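Choosing between the string and the numeric operators matters, because Perl converts the operands to match the operator. The following is a minimal sketch with made-up values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "10";
my $y = "10.0";

# Numeric comparison: both strings are converted to the number 10
my $numeric_equal = ($x == $y) ? "yes" : "no";

# String comparison: "10" and "10.0" differ character by character
my $string_equal = ($x eq $y) ? "yes" : "no";

# As strings "10" sorts before "9"; as numbers 10 is not less than 9
my $string_less  = ("10" lt "9") ? "yes" : "no";
my $numeric_less = (10 < 9)      ? "yes" : "no";

print "== : $numeric_equal\n";   # == : yes
print "eq : $string_equal\n";    # eq : no
print "lt : $string_less\n";     # lt : yes
print "<  : $numeric_less\n";    # <  : no
```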
20Control structures
- Loops
- Foreach
- For
- while
- Condition
- If / else
- Subroutines
21foreach
Visit each item in turn and call it $myitem
@items = ("item1", "item2", "item3");
foreach $myitem (@items) {
    print "$myitem\n";
}
22for loops
- for loops are just as in C or Java
- for ($i = 0; $i < 10; $i++) { print "$i\n"; }
23while loops
#!/usr/local/bin/perl
$a = 1;
while ($a != 10) {
    $a++;
}
24do..while loops
#!/usr/local/bin/perl
$a = 1;
do {
    $a++;
} while ($a != 10);
25if statements
if ($a) {
    print "The string is not empty\n";
} else {
    print "The string is empty\n";
}
26if - elsif statements
if (!$a) {
    print "The string is empty\n";
} elsif (length($a) == 1) {
    print "The string has one character\n";
} elsif (length($a) == 2) {
    print "The string has two characters\n";
} else {
    print "The string has many characters\n";
}
27Calling subroutines
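- Assume you have a subroutine printargs that just prints out its arguments
- Subroutine calls
- printargs("arg1", "arg2");
- Prints "arg1 arg2"
- $returnvalue = printargs("arg1", "arg2");
- Prints "arg1 arg2"
- $returnvalue is assigned the value of the last statement evaluated in the subroutine. If that statement is a print, the value is 1 on success.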
28Defining subroutines
- Here's the definition of printargs
- sub printargs {
-     print "@_\n";
- }
- Parameters are put in the array @_, whose elements can be accessed using $_[0], $_[1], etc.
How many parameters are passed to the subroutine?
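One way to answer the question above is to evaluate @_ in scalar context, which gives the number of arguments. The following is a minimal sketch; countargs is a hypothetical helper and is not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# countargs is a hypothetical name used only for this illustration
sub countargs {
    my $count = scalar(@_);   # @_ in scalar context gives the argument count
    print "got $count argument(s): @_\n";
    return $count;
}

my $n = countargs("arg1", "arg2");   # prints: got 2 argument(s): arg1 arg2
print "$n\n";                        # 2
```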
29Returning a result
sub maximum {
    if ($_[0] > $_[1]) {
        return $_[0];
    } else {
        return $_[1];
    }
}
$biggest = maximum(37, 24);
30Basic pattern matching
- $sentence =~ /the/
- True if $sentence contains "the"
- $sentence = "The dog bites."; if ($sentence =~ /the/) is false
- because Perl is case-sensitive
- !~ is "does not contain"
31RE special characters
.   Any single character except a newline
^   The beginning of the line or string
$   The end of the line or string
*   Zero or more of the last character
+   One or more of the last character
?   Zero or one of the last character
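The special characters above can be tried out with the =~ operator. The following is a small sketch; the test strings are made up for illustration, and each pattern that matches has its label added to @matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "Dear sir, hi and good bye";
my @matched;

push(@matched, '^Dear')   if $sentence =~ /^Dear/;     # ^ anchors at the beginning
push(@matched, 'bye$')    if $sentence =~ /bye$/;      # $ anchors at the end
push(@matched, 'hi.*bye') if $sentence =~ /hi.*bye/;   # .* spans any characters
push(@matched, 'x +y')    if "x   y"   =~ /x +y/;      # + needs one or more blanks
push(@matched, 'bags?')   if "bag"     =~ /bags?/;     # ? makes the s optional
push(@matched, '^s.r$')   if "sir"     =~ /^s.r$/;     # . is any single character

print join("\n", @matched), "\n";   # all six labels, one per line
```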
32RE examples
^.*$      matches the entire string
hi.*bye   matches from "hi" to "bye" inclusive
x +y      matches x, one or more blanks, and y
^Dear     matches "Dear" only at beginning
bags?     matches "bag" or "bags"
hiss+     matches "hiss", "hisss", "hissss", etc.
33Other Topics
- Split() and join()
- File handling
- Perl 5
- Modules
- http://www.pageresource.com/cgirec/index2.htm