Introduction to PERL and RegExp

Slides: 18
Provided by: laurent4
1
Introduction to PERL and RegExp
"Perl is a language for getting your job done."
Larry Wall
2
Contents
  • Definitions
  • PERL, CGI...
  • History
  • Language basics
  • reserved words
  • operators
  • variables
  • subroutines
  • modules
  • Client <-> Server <-> CGI
  • Modules for cgi-bin
  • CGI, CGI_Lite, example
  • Regular Expressions
  • Exercises

3
Definitions
  • PERL
  • Practical Extraction and Report Language
  • CGI
  • Common Gateway Interface
  • CPAN
  • Comprehensive Perl Archive Network

4
History
  • Created by Larry Wall in 1987
  • Last major release: Perl 5, in 1994
  • CPAN: repository of Perl modules and documentation
  • Today most cgi-bin programs behind web pages are Perl
    scripts
  • Good starting sites
  • www.perl.org, www.perl.com, www.cpan.org

5
Language basics (1)
  • Interpreted (compiled to an internal form at startup, then run)
  • Context dependent
  • Object programming
  • Reserved words
  • Many!! E.g. if, else, until, while, for,
    foreach
  • Large number of predefined functions
  • See http://www.perl.com/pub/q/functionlist
  • In UNIX systems
  • /usr/local/bin/perl
  • /usr/bin/perl

6
Language basics (2)
  • Variables
  • Scalar
  • $var = 12; $car = 'abc';
  • Array
  • @var = ('a', 'b', 'c', 'd');
  • Hash
  • %var = ('a', 'first', 'b', 3);
  • %var = (
  •   'a' => 'first',
  •   'b' => 3,
  • );
  • Operators
  • + - * /
  • << >>
  • && || ! and or not xor
  • . x .= x=
  • == eq != ne
  • < <= > >=
  • lt le gt ge
  • <=> cmp
  • =~ !~
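
The variable types and operators on this slide can be tried out with a minimal sketch such as the one below. The variable names and the helper compare_both are made up for illustration. It shows that the numeric and the string comparison operators can give different answers for the same operands.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three variable types, each with its own sigil.
my $count   = 12;                          # scalar holding a number
my $word    = 'abc';                       # scalar holding a string
my @letters = ('a', 'b', 'c', 'd');        # array
my %value   = ('a' => 'first', 'b' => 3);  # hash

# A single element is a scalar, so it is accessed with the $ sigil.
print "count: $count\n";
print "second letter: $letters[1]\n";
print "value of a: $value{a}\n";

# compare_both (hypothetical helper) returns the numeric and the
# string comparison of the same two operands.
sub compare_both {
    my ($x, $y) = @_;
    return ($x <=> $y, $x cmp $y);
}

my ($numeric, $string) = compare_both(10, 9);
print "10 <=> 9 gives $numeric, 10 cmp 9 gives $string\n";

# Concatenation (.) and repetition (x).
my $line = $word . ('-' x 3);
print "$line\n";
```

As numbers, 10 is greater than 9, but as strings "10" sorts before "9", which is why Perl keeps two separate families of comparison operators.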

7
Language basics (3)
  • Special variables (e.g.)
  • $_ default input
  • $$ process ID
  • $& string of last match
  • @ARGV cmd-line args
  • %ENV environment
  • Functions
  • sub funcname { expr }
  • funcname(parameters)
  • Get parameters by @_
  • Modules
  • Huge number of free modules in the CPAN repository
  • use mod_name;
  • require mod_name;
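
As a rough illustration of the special variables and the subroutine calling convention listed above, here is a short sketch. The function name gc_percent is hypothetical and was chosen only because the exercises later deal with GC content. Parameters are read from @_, and the loop relies on $_ as the default variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parameters arrive in @_; copying them into named lexicals is the usual idiom.
sub gc_percent {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);    # tr returns the number of characters it matched
    return 100 * $gc / length($seq);
}

# Each command-line argument is aliased to $_ in turn.
for (@ARGV) {
    printf "%s: %.1f%% GC\n", $_, gc_percent($_);
}

print "process ID: $$\n";
print "PATH is set in the environment\n" if exists $ENV{PATH};

# $& holds the text of the last successful match.
if ('ACGT' =~ /CG/) {
    print "last match: $&\n";
}
```

Called as, for example, `perl script.pl ATGC GGCC`, the loop prints one line per sequence given on the command line.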

8
Client <-> Server <-> CGI
9
Modules for cgi-bin
  • CGI.pm
  • use CGI;
  • $cgi = new CGI;
  • my $seq = $cgi->param('sequence');
  • my @database = $cgi->param('database');
  • Carp.pm
  • use CGI::Carp qw(fatalsToBrowser);
  • CGI_Lite.pm
  • use CGI_Lite;
  • $cgi = new CGI_Lite;
  • %val = $cgi->parse_form_data;
  • my $seq = $val{'sequence'};
  • my @database = $cgi->get_multiple_values
    ($val{'database'});

10
Example
  • #!/usr/local/bin/perl
  • # import modules
  • use CGI::Carp qw(fatalsToBrowser); # makes
    debugging easier
  • use CGI;
  • # read arguments
  • $cgi = CGI->new(); # create CGI instance
  • my @interets = $cgi->param('interets');
  • my $nom = $cgi->param('nom');
  • my $pass = $cgi->param('motdepasse');
  • my $genre = $cgi->param('sexe');
  • my $universite = $cgi->param('universite');
  • select(STDOUT); # configure output stream... to
    possibly send error message
  • $| = 1; # turn on autoflush (no output buffering)
  • # start HTML output
  • print "Content-type: text/html \n\n"; #
    required line (HTTP)
  • print "<HTML><HEAD></HEAD><BODY
    bgcolor='#afeeee'>\n";
  • if ($genre eq "homme") { $titre = "Monsieur"; }
    else { $titre = "Madame"; }
  • print "<h2><p>Bonjour $titre,\n</h2>";
  • print "<p>Votre nom est <b>$nom</b> et votre mot
    de passe est <b>$pass</b>\n";

11
Regular Expressions (1)
  • Idea: a powerful way to search for text patterns
  • Literal (or normal characters)
  • Alphanumeric
  • abcABC0123...
  • Punctuation
  • -_ ,.()/ ?!\<>"@
  • Metacharacters
  • Ex: ls *.java
  • Flavors
  • awk, egrep, Emacs, grep, Perl, POSIX, Tcl,
    PROSITE!
  • In Perl
  • Start and End of line
  • ^ start, $ end
  • Match any of several
  • [...] (...|...)
  • Match 0, 1 or more
  • . any, ? 0 or 1, + 1 or more, * 0 or more
  • {m,n} range
  • ! negation
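
The anchors, alternation and quantifiers listed on this slide can be sketched as follows. The helper describe and the DNA test strings are assumptions made for illustration and are not part of the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe (hypothetical helper) lists which of four patterns a string matches.
sub describe {
    my ($text) = @_;
    my @found;
    push @found, 'starts with ATG'        if $text =~ /^ATG/;             # ^ anchors at the start
    push @found, 'ends with a stop codon' if $text =~ /(TAA|TAG|TGA)$/;   # alternation, $ anchors at the end
    push @found, 'has a run of 3 to 5 A'  if $text =~ /A{3,5}/;           # {m,n} range
    push @found, 'has C, any number of A, then T' if $text =~ /CA*T/;     # * is 0 or more
    return @found;
}

for my $seq ('ATGAAATAA', 'ATGCATTGA', 'CCC') {
    my @hits = describe($seq);
    print "$seq: ", (@hits ? join('; ', @hits) : 'no pattern matched'), "\n";
}
```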

12
Regular Expressions (2)
  • Escape character or back reference
  • \char or \num
  • Shorthand
  • \d digit [0-9]
  • \s whitespace [space\f\n\r\t]
  • \w word character [a-zA-Z0-9_]
  • \D \S \W complement of \d \s \w
  • Byte notation
  • \num character in octal
  • \xnum character in hexadecimal
  • \cchar control character
  • Match operator
  • m//
  • $var =~ m/colou?r/
  • $var !~ m/colou?r/
  • Substitution operator
  • s///
  • $var =~ s/colou?r/couleur/
  • Modifiers //
  • /i case insensitive
  • /g global match
  • Many others: /s, /m, /o, /x...
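
A small sketch of the match and substitution operators with the /i and /g modifiers, reusing the colou?r pattern from this slide. The sample sentence and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Colour and color are the same colour.';

# Plain match: true because the lowercase "color" is present.
my $matched = ($text =~ m/colou?r/) ? 1 : 0;

# Count every occurrence, ignoring case (/g in list context returns all matches).
my $count = () = $text =~ m/colou?r/gi;

# Substitute on copies so that $text itself stays unchanged.
(my $once = $text) =~ s/colou?r/couleur/;      # first case-sensitive match only
(my $all  = $text) =~ s/colou?r/couleur/gi;    # every match, case-insensitive

print "matched: $matched, occurrences: $count\n";
print "$once\n";
print "$all\n";
```

Without /i the capitalised "Colour" is not matched, so the single substitution changes "color", the first lowercase occurrence.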

13
Regular Expressions (3)
  • Grouping
  • External reference
  • $var =~ s/sp\|(\w\d{5})/swissprot AC$1/
  • Internal reference
  • $var =~ s/tr\|(\w\d{5})\|\1/trembl AC$1/
  • Numbering
  • $1 to $9
  • $10 and more if needed...
  • Exercises
  • Create a regexp to recognize any IP address:
    123.456.789.012
  • Create a regexp to recognize any email address:
    Jean.Dupond@isrec.isb-sib.ch
  • Create a regexp to change any HTML tag to another
  • <address> -> <pre>
  • On ludwig-sun2 use visual_regexp-1.2.tcl to
    check your regular expressions (requires
    X-windows)
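
The grouping rules on this slide (captures read outside the pattern as $1, $2, ... and reused inside it as \1) can be sketched as below. The helper names and the "sp|P12345" identifier layout are assumptions chosen for illustration; the exercises themselves are left unsolved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# External reference: after a successful match the groups are in $1, $2, ...
# The "db|accession" layout is an assumed input format.
sub parse_id {
    my ($id) = @_;
    if ($id =~ /^(sp|tr)\|(\w\d{5})$/) {
        return ($1, $2);
    }
    return;
}

# Internal reference: \1 must match the same text that group 1 just captured.
sub first_tandem_triplet {
    my ($seq) = @_;
    return $seq =~ /(\w\w\w)\1/ ? $1 : undef;
}

my ($db, $ac) = parse_id('sp|P12345');
print "database $db, accession $ac\n";

my $triplet = first_tandem_triplet('TTACGACGTT');
print "repeated triplet: $triplet\n" if defined $triplet;
```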

14
Exercises
  • Exercises URL: http://www.isrec.isb-sib.ch/DEA/module5/P2
  • Login to ludwig-sun2
  • In your home directory create a subdir called
    "public_html"
  • In this public_html subdir create another subdir
    called "cgi-bin"
  • Place your pages here: /home/username/public_html
  • Place your scripts here: /home/username/public_html/cgi-bin
  • View your pages with http://www.isrec.isb-sib.ch/username/mypage.html
  • Call the scripts with http://www.isrec.isb-sib.ch/cgi-bin/dea/username/myscript.pl

15
Exercises: HTML + perl cgi-bin
  • The goal of this exercise is to take an existing
    pair of files (hits_form.html and dea-hits.pl)
    and analyse how these two files work together.
  • hits_form.html is a very simple form to allow the
    retrieval of protein sequences matching with a
    given profile (e.g., UBA, CARD or HECT). This
    form calls a cgi-bin perl script that makes use
    of the "hits" script to interrogate the Hits
    database.
  • Modify these two files to allow multiple database
    selection
  • Try to modify the perl script to allow
    hyperlinking of the results with the "get_doc"
    script
  • Syntax
  • http://www.isrec.isb-sib.ch/cgi-bin/get_doc?format=text&db=yyy&entry=xxx
  • xxx = entry AC number
  • yyy = database code (sp, trembl, trest or trgen)
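
One possible way to build such a hyperlink from Perl is sketched below. The function name get_doc_link is hypothetical, and the sketch assumes the query string has the form format=text&db=yyy&entry=xxx; how the database code and accession are obtained from the "hits" output is left to the exercise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an HTML link to get_doc for one database code and one accession number.
# The query-string layout used here is an assumption.
sub get_doc_link {
    my ($db, $entry) = @_;
    my $url = "http://www.isrec.isb-sib.ch/cgi-bin/get_doc?format=text&db=$db&entry=$entry";
    (my $href = $url) =~ s/&/&amp;/g;    # escape & inside the HTML attribute
    return qq{<a href="$href">$entry</a>};
}

print get_doc_link('sp', 'P12345'), "\n";
```

Keeping the link construction in one subroutine means the result-printing loop of the script only has to call it once per hit.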

16
Exercise: interface analyse_seq
  • The goal of this exercise is to take an existing
    pair of files (freak.html and freak.pl) and
    analyse how these two files work together, then
    create another pair of files to interface the
    analyse_seq program.
  • freak.html is a very simple form to interface the
    EMBOSS program freak, that calculates the gc
    percentage of a DNA sequence and displays it in a
    table or graphically, very similarly to your
    analyse_seq
  • (see http://www.uk.embnet.org/Software/EMBOSS/Apps/freak.html)
  • This form calls a cgi-bin perl script (freak.pl)
    that performs the command line action and
    displays the results in a new page.
  • Modify these two files to interface your
    analyse_seq instead of freak.
  • Textual output of the table is OK.
  • Try to modify the perl script to allow a
    graphical output of the results using GNUPLOT
  • Check if the user wants a graphic output and which
    type
  • Create a gnuplot command file and execute it
  • Display the gif or pdf output (use "ps2pdf" to
    convert the postscript file to a pdf file).
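
The gnuplot steps above might be approached along the lines of this sketch. The file names (gc.dat, gc.ps, gc.pdf, plot.gp), the axis labels, the two-column data layout and the --run switch are all assumptions; the external programs are only called when --run is given, so the script can be tried without gnuplot installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of a gnuplot command file that plots column 2 against
# column 1 of $data_file and writes PostScript to $ps_file.
sub gnuplot_commands {
    my ($data_file, $ps_file) = @_;
    return join "\n",
        'set terminal postscript',
        qq{set output "$ps_file"},
        'set xlabel "position"',
        'set ylabel "GC %"',
        qq{plot "$data_file" using 1:2 with lines},
        '';
}

# Write the command file, run gnuplot on it, then convert the result to PDF.
sub run_plot {
    my ($cmd_file, $data_file, $ps_file, $pdf_file) = @_;
    open my $fh, '>', $cmd_file or die "Cannot write $cmd_file: $!";
    print {$fh} gnuplot_commands($data_file, $ps_file);
    close $fh or die "Cannot close $cmd_file: $!";
    system('gnuplot', $cmd_file) == 0 or die "gnuplot failed: $?";
    system('ps2pdf', $ps_file, $pdf_file) == 0 or die "ps2pdf failed: $?";
}

if (@ARGV && $ARGV[0] eq '--run') {
    run_plot('plot.gp', 'gc.dat', 'gc.ps', 'gc.pdf');
} else {
    print gnuplot_commands('gc.dat', 'gc.ps');    # just show the command file
}
```

Passing the program and its arguments to system as a list avoids going through a shell, which matters in a cgi-bin script where file names may be derived from user input.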

17
Exercises
  • Exercises URL: http://192.33.215.131/teacher
  • Login to your account
  • Copy a tar file into your home directory:
    cp ../teacher/exdea.tar .
  • Untar the file: tar xvf exdea.tar
  • In your home directory it creates a subdir
    called "public_html"
  • Place your pages here: /home/username/public_html
  • Place your scripts here: /home/username/public_html
  • View your pages with http://192.33.215.131/username/mypage.html
  • Call the scripts with http://192.33.215.131/username/myscript.cgi