Title: Practical Extraction
1. Practical Extraction and Report Language
- Picture taken from http://www.wendy.org/DPW2006/shirt.htm
2. Agenda
- Why Perl?
- Getting/Installing Perl
- Using Perl
- Structure of a basic program (Hello world)
- Variables and Operators
- Regular Expressions
- Other Topics
3. Why Perl?
- Perl is built around regular expressions
- Regular expressions (REs) are good for string processing
- Therefore Perl is well suited to text-processing scripts
- Perl is especially popular for CGI scripts
- Perl makes full use of the power of UNIX
- Perl programs can be very short
- "Perl is designed to make the easy jobs easy, without making the difficult jobs impossible." -- Larry Wall, Programming Perl
4. Getting/Installing Perl
- Windows
- www.activestate.com
- Download ActivePerl
- Run the installer
- Linux
- Perl is usually installed along with Linux; check with:
- userX@machineY$ which perl
- If it is missing, get it from
- Linux distribution CDs
- Update your installation and, during package selection, select perl
- ActiveState.com
- CPAN
5. Other Possibilities
- Using virtual machines
- VMware
- Install VMware Workstation on Windows
- Install Linux under VMware Workstation (select perl to be installed)
- Cygwin
- Install Cygwin on Windows
- It provides a Linux-like environment in which perl can be used
- http://www.cygwin.com/mirrors.html
6. Using Perl
- Windows
- Write a program and save it with a .pl extension
- C:\perl\bin>perl program_name.pl
- Linux
- Write a program and save it with a .pl extension
- userX@machineY$ perl program_name.pl
- userX@machineY$ ./program_name.pl
- The same applies under VMware and Cygwin
- Before running ./program_name.pl directly, make the file executable: chmod +x program_name.pl
7. Structure of a basic program
    #!/usr/bin/perl
    # Program to do the obvious
    print 'Hello world.';
- The first line is special: it gives the path to the perl installation, which can differ between systems (e.g., /bin/perl)
- # denotes a comment; anything after # until the end of the line is a comment
- Every statement ends with a semicolon
- print is a built-in function
- 'Hello world.' is the function argument, in this case a string constant
- Run it with: userX@machineY$ perl hello.pl
8. Variables
- Scalar variables 
- Only one value at a time 
- List variables 
- List of values (Arrays)
9. Scalar Variables
- A scalar variable can store only one value at a time.
- Scalar variable names are always preceded by the $ symbol, e.g., $var1
- Variables do not have to be declared beforehand (but declaring them is recommended).
- There are no separate data types such as character or numeric: the same scalar can hold a character, a string, or a number, and Perl interprets the value according to how it is used.
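To illustrate the last point, here is a minimal sketch assuming perl 5. The use strict / use warnings lines and the my declarations are additions the slides do not use, and the variable names are made up. One scalar holds the text "42"; the + operator reads it as a number and the . operator reads it as a string.

```perl
use strict;
use warnings;

my $value = "42";            # stored as a string
my $sum   = $value + 8;      # + treats it as a number: 50
my $text  = $value . "abc";  # . treats it as a string: "42abc"
my $char  = "x";             # a single character is just a one-character string

print "$sum $text $char\n";  # prints "50 42abc x"
```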
10. Example: scalar
    #!/usr/bin/perl
    $x = "100\n";
    print $x;          # prints 100 followed by a newline
    $x = $x + 1;       # "100\n" is used as the number 100
    print $x;          # prints 101
Output:
    100
    101
11. List/Array Variables
- List variables are Perl's arrays; an array can be thought of as a group of scalar variables.
- Array names are always preceded by the @ symbol.
- e.g., @items = ("Apple", "Bell", "Chair");
- As in C, the index starts from 0.
- To get the second element, use $items[1]
- Note the $ symbol here: each element is a scalar.
- $# followed by the array name gives the index of the last element: $#items is the last index of @items, one less than its length.
- $len = @items; assigns the length of the array to $len
Do the two statements $len = @items; and print @items; produce the same result?
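One way to check the answer to this question is a short sketch like the following (it assumes perl 5 and adds use strict, use warnings and my declarations that the slides leave out). Assigning an array to a scalar gives its length, while print receives the array in list context and prints the elements themselves, so the two statements do not produce the same result.

```perl
use strict;
use warnings;

my @items = ("Apple", "Bell", "Chair");

my $len    = @items;     # array in scalar context: number of elements (3)
my $last   = $#items;    # index of the last element (2)
my $second = $items[1];  # "Bell": indexing starts at 0

print "$len $last $second\n";   # prints "3 2 Bell"
print @items, "\n";             # list context: prints "AppleBellChair"
```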
12. Example: List/Array
    #!/usr/bin/perl
    @myarray = (1721, 2974, "string");
    print @myarray;                  # prints 17212974string
    $myarray[0] = "string";
    $myarray[1] = 1234;
    $myarray[2] = 5646;
    print @myarray;                  # prints string12345646
    print $myarray[0] . $myarray[1] . $myarray[0];   # prints string1234string
13. Operations on Arrays
- push
- push adds one or more things to the end of a list
- push(@items, "table", "chair");
- push returns the new length of the list
- pop
- pop removes and returns the last element
- $myitem = pop(@items);
- Other operations: shift, unshift, reverse
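The slide only names shift, unshift and reverse. A minimal sketch of all five operations on one array, assuming perl 5 (the list contents are invented for illustration):

```perl
use strict;
use warnings;

my @items = ("table", "chair", "lamp");

my $newlen = push(@items, "desk");  # add to the end; returns the new length (4)
my $last   = pop(@items);           # remove and return the last element, "desk"
my $first  = shift(@items);         # remove and return the first element, "table"
unshift(@items, "sofa", "bed");     # add elements to the front
my @backwards = reverse(@items);    # returns a reversed copy; @items is unchanged

print "@items\n";       # prints "sofa bed chair lamp"
print "@backwards\n";   # prints "lamp chair bed sofa"
```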
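14. Example (push and pop)
    #!/usr/bin/perl
    @myarray = (1721, 2974, "string");
    print "@myarray\n";                      # 1721 2974 string
    push(@myarray, "newval1", "newval2");
    print "@myarray\n";                      # 1721 2974 string newval1 newval2
    $popvalue = pop(@myarray);               # removes and returns "newval2"
    print "$#myarray\n";                     # index of the last element: 3
    print @myarray;                          # elements printed with no separator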
15. Operators
- Arithmetic 
- String 
- Single and Double quotes 
- Conditional
16. Arithmetic in Perl
    $a = 1 + 2;    # Add 1 and 2 and store in $a
    $a = 3 - 4;    # Subtract 4 from 3 and store in $a
    $a = 5 * 6;    # Multiply 5 and 6
    $a = 7 / 8;    # Divide 7 by 8 to give 0.875
    $a = 9 ** 10;  # Nine to the power of 10, that is, 9^10
    $a = 5 % 2;    # Remainder of 5 divided by 2
    ++$a;          # Increment $a and then return it
    $a++;          # Return $a and then increment it
    --$a;          # Decrement $a and then return it
    $a--;          # Return $a and then decrement it
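A small sketch, assuming perl 5 with use strict and use warnings, that checks a few of these operators and the difference between the prefix and postfix increment. It uses $n rather than $a, since $a and $b are special variables used by sort.

```perl
use strict;
use warnings;

my $n    = 5;
my $pre  = ++$n;     # $n becomes 6 first, so $pre is 6
my $post = $n++;     # $post gets 6, then $n becomes 7
my $quot = 7 / 8;    # 0.875: / does not truncate to an integer
my $rem  = 5 % 2;    # 1
my $pow  = 2 ** 10;  # 1024

print "$pre $post $n $quot $rem $pow\n";   # prints "6 6 7 0.875 1 1024"
```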
17. String and assignment operators
    $a = $b . $c;  # Concatenate $b and $c
    $a = $b x $c;  # $b repeated $c times
    $a = $b;       # Assign $b to $a
    $a += $b;      # Add $b to $a
    $a -= $b;      # Subtract $b from $a
    $a .= $b;      # Append $b onto $a
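A minimal sketch of these operators with concrete values, assuming perl 5. The names $s, $n, $total and $msg are illustrative stand-ins for $a, $b and $c, chosen to avoid the sort variables $a and $b.

```perl
use strict;
use warnings;

my $s = "ab";
my $n = 3;

my $concat   = $s . "cd";   # concatenation: "abcd"
my $repeated = $s x $n;     # repetition: "ababab"

my $total = 10;
$total += 5;                # same as $total = $total + 5; now 15
$total -= 3;                # same as $total = $total - 3; now 12

my $msg = "Hello";
$msg .= ", world";          # append: "Hello, world"

print "$concat $repeated $total $msg\n";   # prints "abcd ababab 12 Hello, world"
```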
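18. Single and double quotes
    $a = 'apples';
    $b = 'bananas';
    print $a . ' and ' . $b;   # prints: apples and bananas
    print '$a and $b';         # prints: $a and $b
    print "$a and $b";         # prints: apples and bananas
- Double quotes interpolate variables into the string; single quotes keep the text exactly as written.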
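The same distinction applies to escape sequences such as \n: they are processed only inside double quotes. A minimal sketch, assuming perl 5 and using $x and $y in place of $a and $b:

```perl
use strict;
use warnings;

my $x = 'apples';
my $y = 'bananas';

my $single = '$x and $y\n';   # nothing interpolated; \n stays as backslash + n
my $double = "$x and $y\n";   # variables interpolated; \n becomes a newline

print $single, "\n";          # prints: $x and $y\n
print $double;                # prints: apples and bananas
```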
19. Conditions
    Strings  Numbers  Meaning
    eq       ==       equal to
    ne       !=       not equal to
    lt       <        less than
    gt       >        greater than
    le       <=       less than or equal to
    ge       >=       greater than or equal to
- Logical
- &&  And
- ||  Or
- !   Negation
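Choosing the wrong column matters because the string operators compare character by character, while the numeric ones compare values. A sketch, assuming perl 5, in which the numbers and strings are arbitrary examples:

```perl
use strict;
use warnings;

my $num_equal = (10 == 10.0);       # true: compared as numbers
my $str_equal = ("10" eq "10.0");   # false: different strings
my $num_less  = (9 < 10);           # true
my $str_less  = ("9" lt "10");      # false: "9" sorts after "1", character by character
my $both      = ($num_less && !$str_less);   # logical and, negation

print "numeric and string order differ\n" if $both;
```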
20. Control structures
- Loops
- foreach
- for
- while
- Condition
- if / else
- Subroutines
21. foreach
    @items = ("item1", "item2", "item3");
    foreach $myitem (@items)   # Visit each item in turn and call it $myitem
    {
        print "$myitem\n";
    }
22. for loops
- for loops work just as in C or Java, except that the braces around the body are always required
    for ($i = 0; $i < 10; $i++)
    {
        print "$i\n";
    }
23. while loops
    #!/usr/local/bin/perl
    $a = 1;
    while ($a != 10)
    {
        $a++;
    }
24. do..while loops
    #!/usr/local/bin/perl
    $a = 1;
    do
    {
        $a++;
    } while ($a != 10);
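The difference between the two loop forms shows up when the condition is false from the start: while checks the condition before the body, whereas do..while runs the body once first. A minimal sketch, assuming perl 5 (the counter names are hypothetical):

```perl
use strict;
use warnings;

my $n = 10;

my $while_runs = 0;
while ($n != 10)        # condition checked first: already false, body never runs
{
    $while_runs++;
}

my $do_runs = 0;
do
{
    $do_runs++;         # body runs once before the condition is checked
} while ($n != 10);

print "while: $while_runs, do..while: $do_runs\n";   # prints "while: 0, do..while: 1"
```

One caveat worth knowing: Perl does not treat a do block as a real loop, so next and last cannot be used inside it the way they can inside while.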
25. if statements
    if ($a)
    {
        print "The string is not empty\n";
    }
    else
    {
        print "The string is empty\n";
    }
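Note that if ($a) tests truth, not emptiness: besides the empty string, Perl also treats the string "0" (as well as undef and the number 0) as false, so a string containing just "0" would be reported as empty. A sketch, assuming perl 5, that compares a plain truth test with an explicit length check (the input list is made up):

```perl
use strict;
use warnings;

# Perl's false values include undef, "" (empty string), "0" and 0.
my @inputs = ("", "0", "0.0", " ", "abc");

my @truthy;
my @nonempty;
for my $s (@inputs)
{
    push(@truthy,   $s ? 1 : 0);              # plain truth test, as in if ($a)
    push(@nonempty, length($s) > 0 ? 1 : 0);  # explicit "is the string non-empty?" test
}

print "truthy:    @truthy\n";     # prints "truthy:    0 0 1 1 1"
print "non-empty: @nonempty\n";   # prints "non-empty: 0 1 1 1 1"
```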
26. if - elsif statements
    if (!$a)
    {
        print "The string is empty\n";
    }
    elsif (length($a) == 1)
    {
        print "The string has one character\n";
    }
    elsif (length($a) == 2)
    {
        print "The string has two characters\n";
    }
    else
    {
        print "The string has many characters\n";
    }
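27. Calling subroutines
- Assume you have a subroutine printargs that just prints out its arguments
- Subroutine calls
- printargs("arg1", "arg2");
- Prints "arg1 arg2"
- $returnvalue = printargs("arg1", "arg2");
- Prints "arg1 arg2"
- $returnvalue is assigned the value of the last statement executed in printargs; with the definition on the next slide, that is the return value of print, i.e., 1 on success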
28. Defining subroutines
- Here's the definition of printargs:
    sub printargs
    {
        print "@_\n";
    }
- Parameters are put in the array @_, whose elements can be accessed as $_[0], $_[1], etc.
How many parameters are passed to the subroutine?
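A sketch that answers the question, assuming perl 5: an array in scalar context gives its length, so assigning @_ to a scalar counts the arguments. It also checks what printargs returns, namely the value of its last statement, print, which is 1 when printing succeeds. count_args and first_arg are hypothetical helper names.

```perl
use strict;
use warnings;

sub printargs
{
    print "@_\n";          # last statement; its value (1 from print) is returned
}

# Hypothetical helpers for illustration
sub count_args
{
    my $count = @_;        # @_ in scalar context: the number of arguments
    return $count;
}

sub first_arg
{
    return $_[0];          # first element of @_
}

my $returnvalue = printargs("arg1", "arg2");   # prints "arg1 arg2"
my $n     = count_args("arg1", "arg2", "arg3");
my $first = first_arg("arg1", "arg2");

print "returned $returnvalue, counted $n, first is $first\n";
# prints "returned 1, counted 3, first is arg1"
```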
29. Returning a result
    sub maximum
    {
        if ($_[0] > $_[1])
        {
            return $_[0];
        }
        else
        {
            return $_[1];
        }
    }
    $biggest = maximum(37, 24);   # Now $biggest is 37
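maximum compares exactly two arguments. A common extension, sketched here under the assumption of perl 5, copies @_ into named variables with my and walks any number of arguments; max_of is a hypothetical name, not a built-in.

```perl
use strict;
use warnings;

# Hypothetical variant of maximum that accepts any number of arguments
sub max_of
{
    my ($best, @rest) = @_;   # first argument, then the remainder
    for my $value (@rest)
    {
        $best = $value if $value > $best;
    }
    return $best;
}

my $biggest = max_of(37, 24);         # 37, same as maximum(37, 24)
my $largest = max_of(3, 91, 15, 42);  # 91
print "$biggest $largest\n";          # prints "37 91"
```

For real code, the core List::Util module also exports a max function that does the same job.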
30. Basic pattern matching
- $sentence =~ /the/
- True if $sentence contains "the"
- $sentence = "The dog bites.";
- if ($sentence =~ /the/) is false
- because matching is case-sensitive
- !~ means "does not contain"
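A sketch of both operators, assuming perl 5. The /i modifier, which the slide does not cover, is one way to make the match ignore case.

```perl
use strict;
use warnings;

my $sentence = "The dog bites.";

my $has_the     = ($sentence =~ /the/)  ? 1 : 0;   # 0: "The" has a capital T
my $lacks_the   = ($sentence !~ /the/)  ? 1 : 0;   # 1: !~ is "does not contain"
my $has_the_any = ($sentence =~ /the/i) ? 1 : 0;   # 1: /i ignores case

print "$has_the $lacks_the $has_the_any\n";   # prints "0 1 1"
```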
31. RE special characters
    .   Any single character except a newline
    ^   The beginning of the line or string
    $   The end of the line or string
    *   Zero or more of the last character
    +   One or more of the last character
    ?   Zero or one of the last character
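The following sketch, assuming perl 5, checks each special character against a few strings; the patterns and test strings are invented for illustration. qr// builds a reusable pattern object.

```perl
use strict;
use warnings;

# Each entry: [pattern, string, expected result (1 = match, 0 = no match)]
my @checks = (
    [qr/t.e/,     "the",        1],  # . matches any single character...
    [qr/t.e/,     "t\ne",       0],  # ...except a newline
    [qr/^ftp/,    "ftp site",   1],  # ^ anchors at the beginning
    [qr/^ftp/,    "sftp",       0],
    [qr/tle$/,    "little",     1],  # $ anchors at the end
    [qr/tle$/,    "title page", 0],
    [qr/ab*c/,    "ac",         1],  # * allows zero occurrences of b
    [qr/ab+c/,    "ac",         0],  # + needs at least one b
    [qr/ab+c/,    "abbbc",      1],
    [qr/colou?r/, "color",      1],  # ? makes the u optional
    [qr/colou?r/, "colour",     1],
);

my $failures = 0;
for my $check (@checks)
{
    my ($re, $string, $expected) = @$check;
    my $matched = ($string =~ $re) ? 1 : 0;
    if ($matched != $expected)
    {
        print "unexpected result for $re on '$string'\n";
        $failures++;
    }
}
print "failures: $failures\n";   # prints "failures: 0"
```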
32. RE examples
    .*        matches the entire string (up to the first newline)
    hi.*bye   matches from "hi" to "bye" inclusive
    x +y      matches x, one or more blanks, and y
    ^Dear     matches "Dear" only at the beginning
    bags?     matches "bag" or "bags"
    hiss+     matches "hiss", "hisss", "hissss", etc.
33. Other Topics
- split() and join()
- File handling
- Perl 5
- Modules
- http://www.pageresource.com/cgirec/index2.htm
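As a brief preview of the first topic, a sketch assuming perl 5: split breaks a string into a list at each match of a pattern, and join glues a list back together with a separator string (the sample data is made up).

```perl
use strict;
use warnings;

my $line   = "apple,bell,chair";
my @fields = split(/,/, $line);      # ("apple", "bell", "chair")
my $joined = join(" | ", @fields);   # "apple | bell | chair"

print scalar(@fields), " fields: $joined\n";   # prints "3 fields: apple | bell | chair"
```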