Perl in an Hour - PowerPoint PPT Presentation

About This Presentation
Title:

Perl in an Hour

Description:

Perl in an Hour – PowerPoint PPT presentation

Slides: 44
Provided by: t80
Learn more at: https://www.bios.niu.edu


Transcript and Presenter's Notes



1
Perl in an Hour
  • or maybe two
  • (depends on how fast I talk)

2
Basics
3
Hello, World!
  • #!/usr/bin/perl
  • use warnings;
  • use strict;    # enforces variable scoping
  • print "Hello, world!\n";
  • invoke the Perl compiler with #! in the first
    column of the first line.
  • lines end with semicolons
  • comments are single line, starting with #
  • print followed by a double-quoted string
    interprets variables and metacharacters.
  • print by default prints to STDOUT, the monitor.
  • note that many built-in functions, such as print,
    do not require parentheses (but they can be used
    for disambiguation).
  • Processing: a quick compilation/optimization
    step, followed by execution. Execution starts at
    the top of the program and proceeds line-by-line.
    There is no main block: it is implied, and consists
    of the code that is not part of any other block.
  • for SEED interactions:
  • put this line into the .bashrc file in your home
    directory:
  • source fig/FIGdisk/config/fig-user-env.sh
  • then, create your file with tool_hdr whatever.pl.
    This installs some code needed for file access.

4
Numerical Operations
  • Number representation oddities
  • you can use underscores as punctuation within a
    number: 1_000_000 is the same as 1000000. Perl
    ignores the _s.
  • base 10 exponents are represented by e or E:
    1.3e-12 is the Perl representation of 1.3 x
    10^-12.
  • Warning: BLAST programs often give scores like
    e-12. Perl needs to see a 1 in front of this:
  • $score = "1" . $score if substr($score, 0, 1) eq "e";
  • Numerical operations are mostly as in C (+, -, *,
    /, % (mod)), plus ** (exponent).
  • However, division is floating point: 10 / 3 gives
    3.3333.
  • to get integer division, use int: int(10/3)
    gives 3.
  • Operator precedence is as in C, but use
    parentheses!
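The number rules above can be collected into one small script. The score value "e-12" is an assumed stand-in for a BLAST E-value, not real BLAST output.

```perl
use strict;
use warnings;

my $million = 1_000_000;        # underscores are ignored: 1000000
my $tiny    = 1.3e-12;          # 1.3 x 10^-12

# A BLAST-style score with no leading digit needs a "1" in front
my $score = "e-12";
$score = "1" . $score if substr($score, 0, 1) eq "e";   # now "1e-12"

my $quotient = 10 / 3;          # floating point, about 3.3333
my $int_div  = int(10 / 3);     # 3
my $mod      = 10 % 3;          # 1
my $power    = 2 ** 10;         # 1024

print "$million $tiny $score $quotient $int_div $mod $power\n";
```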

5
Strings
  • Strings are a fundamental data type in Perl (as
    opposed to characters).
  • A string is anything surrounded by quotes: "dog".
    Variables and metacharacters are interpreted in
    the string (unless you use single quotes).
  • Concatenation is done using the dot ( . ): "the"
    . " " . "dog" is interpreted as "the dog".
  • Numbers and strings are the two main types of
    scalar variable. They are freely interconverted
    as needed. Thus, "5" and 5 are the same thing.
  • Non-numerical strings have a numerical value of
    0: "the dog" + 3 equals 3, because "the dog" is
    interpreted as a number due to the + sign (use
    warnings will also complain about it). "5" + 3
    equals 8, because even though 5 is written as
    a string, the + sign causes it to be interpreted
    as a number.
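A minimal sketch of interpolation, concatenation and automatic string/number conversion. The no warnings 'numeric' block is there only because use warnings would otherwise report that "the dog" isn't numeric.

```perl
use strict;
use warnings;

my $animal = "dog";
my $double = "the $animal\n";    # interpolated: "the dog" plus a newline
my $single = 'the $animal\n';    # literal: no interpolation, no newline
my $phrase = "the" . " " . $animal;

my $sum = "5" + 3;               # 8: "5" is converted to a number
my $zero_sum;
{
    no warnings 'numeric';       # silence the warning for this demo only
    $zero_sum = "the dog" + 3;   # 3: "the dog" has numeric value 0
}
print $double, $single, "\n", "$phrase $sum $zero_sum\n";
```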

6
Scalar Variables
  • also logical comparisons, if, while

7
Scalar Variables
  • Scalar variables hold a single value.
  • Scalar variable names start with $ in Perl. This
    makes them easy to spot. Names can contain
    letters, numbers and underscores, but must not
    start with a number, and they are case-sensitive.
  • Perl variables are loosely typed: $var can hold a
    number or a string, and there is no distinction
    between different types of number.
  • Variables are declared with the my keyword, and
    values are assigned with =.
  • Variables are visible only within the smallest
    code block (enclosed by { }) containing them.
  • Variables have global scope if declared in the
    main section of the program, outside any code
    block.
  • Using the contents of a variable as the name of
    another variable, a symbolic reference, is
    considered Very Bad in Perl.
  • For example, you have $foo = 'snonk', and then
    want to operate on the value of $snonk through
    $foo (use strict forbids this).
  • Binary assignment operators work as in C:
    $dog += 3, $dog -= 3, $dog *= 3, $dog .= "cat", etc.
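A minimal sketch of block scoping and the compound assignment operators; the values are arbitrary.

```perl
use strict;
use warnings;

my $dog = 10;              # declared outside any block: visible below
{
    my $dog = "inner";     # a different variable, visible only in these braces
    print "inside: $dog\n";
}
print "outside: $dog\n";   # still 10

$dog += 3;                 # 13
$dog -= 1;                 # 12
$dog *= 2;                 # 24
my $pets = "dog";
$pets .= "cat";            # "dogcat"
print "$dog $pets\n";
```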

8
Logical Operations
  • Perl considers these values false: 0 (zero), "0"
    (the string zero), "" (the empty string) and
    undef (undefined, the default value for a
    declared but unassigned variable). Everything
    else is true.
  • Comparison between numbers uses different
    operators than comparison between strings!
    == vs. eq, != vs. ne, > vs. gt, < vs. lt,
    <= vs. le, >= vs. ge, etc.
  • Logical: ! is not, && is and, || is or.
    The words (not, and, or) also work, but they have
    a much lower precedence than the symbols.
  • C's ternary operator ?: works in Perl too.
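The script below contrasts numeric and string comparison on the same operands, and counts which of several values Perl treats as true. The values are illustrative; "0.0" and "00" are true because only "0" and "" are false strings.

```perl
use strict;
use warnings;

# The same operands compared numerically and as strings
my $num_equal = (10 == 10.0)     ? 1 : 0;   # 1: equal as numbers
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # 0: different strings
my $num_less  = (9 < 10)         ? 1 : 0;   # 1
my $str_less  = ("9" lt "10")    ? 1 : 0;   # 0: "9" sorts after "1"

# Which of these does Perl treat as true?
my @values = (0, "0", "", undef, "0.0", "00", " ", "a");
my $true_count = grep { $_ } @values;       # 4: "0.0", "00", " ", "a"

print "$num_equal $str_equal $num_less $str_less $true_count\n";
```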

9
If Statements
  • if (logical_expression_in_parentheses) {
  •     code set off by curly braces
  • } elsif (another logical test) {    # note the spelling!
  •     more code
  • } else {
  •     code
  • }
  • Even single statements must be enclosed within { }.
  • There is a backwards logic for single statements:
  • print "yes" if ($var > 17);
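A minimal sketch of the if / elsif / else chain and the statement-modifier form. classify() is a hypothetical helper and its thresholds are arbitrary.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper; the thresholds are arbitrary
sub classify {
    my ($n) = @_;
    if ($n > 17) {
        return "big";
    }
    elsif ($n > 5) {           # note the spelling: elsif
        return "medium";
    }
    else {
        return "small";
    }
}

my $var = 20;
print "yes\n" if ($var > 17);  # statement-modifier ("backwards") form

my @labels = map { classify($_) } (3, 10, 20);
print "@labels\n";             # small medium big
```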

10
While loops
  • while (logical test) {
  •     code block
  • }
  • There is also a do-while loop, as in C.
  • next ends the current pass through the loop and
    returns you to the logical test at the top. last
    breaks you out of the loop altogether.
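A sketch of next and last inside a while loop, plus a do-while that runs its body once before testing. One caveat: next and last do not work inside a do { } while block, because do BLOCK is not a real loop in Perl.

```perl
use strict;
use warnings;

my @kept;
my $i = 0;
while ($i < 10) {
    $i++;
    next if $i % 2;      # odd: skip the rest, go back to the test
    last if $i > 6;      # leave the loop entirely
    push @kept, $i;
}
print "@kept\n";         # 2 4 6

# do-while: the body runs once before the test is checked
my $count = 0;
do {
    $count++;
} while ($count < 0);
print "$count\n";        # 1, although the test is false from the start
```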

11
Arrays and Lists
  • also for and foreach loops, scalar context, $_

12
Lists
  • A list is a set of elements enclosed within
    parentheses and separated by commas. String
    elements must be quoted. You can mix numerical
    and string values. (1, 3, 5) is a list. So is (1,
    3.1416, "duh").
  • The empty list is ().
  • The qw operator (quote word) adds commas and
    quotes at spaces:
  • qw(3 dog day) is equivalent to ("3", "dog",
    "day").
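A minimal sketch showing that numbers and strings can share a list and that qw() produces the same list as the quoted, comma-separated form.

```perl
use strict;
use warnings;

my @mixed = (1, 3.1416, "duh");   # numbers and strings can be mixed
my @empty = ();                   # the empty list
my @words = qw(3 dog day);        # same as ("3", "dog", "day")
my @same  = ("3", "dog", "day");

print scalar(@mixed), " ", scalar(@empty), " ", join(",", @words), "\n";
```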

13
Arrays
  • An array is a variable that holds a list. Array
    names start with @.
  • Arrays adjust their size automatically: no need
    to pre-declare an array size.
  • You can assign lists to arrays, arrays to lists,
    etc.: @arr = (1, 3, "duh"); ($dog, $cat) = @arr;
  • Note that in the last case, $dog gets 1, $cat
    gets 3, and "duh" is discarded.
  • More subtly, (@pets, $dog, $cat) = qw(rover fido
    spot rex duke fluffy); causes @pets to get all
    the names, and $dog and $cat to remain undefined.
  • switching positions: ($last, $first) = ($first,
    $last); (not in C!)
  • print @arr; runs all the elements together. print
    "@arr"; separates the elements by a space.
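A sketch of the list-assignment cases above, including an array on the left-hand side swallowing everything and the swap idiom. The names are arbitrary.

```perl
use strict;
use warnings;

my @arr = (1, 3, "duh");
my ($dog, $cat) = @arr;            # $dog = 1, $cat = 3, "duh" is dropped

my (@pets, $d2, $c2);
(@pets, $d2, $c2) = qw(rover fido spot rex duke fluffy);
# @pets takes all six names; $d2 and $c2 stay undef

my ($first, $last) = ("Ada", "Lovelace");
($last, $first) = ($first, $last); # swap without a temporary variable

print @arr, "\n";                  # elements run together
print "@arr\n";                    # elements separated by spaces
```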

14
Accessing Array Elements
  • Arrays are numbered from 0, as in C.
  • An individual array element is a scalar, so its
    name starts with $.
  • Array indexes are given in square brackets:
    $arr[3] is the 4th element of @arr.
  • Important: $arr is a completely separate and
    independent variable from @arr (and its elements
    such as $arr[0]).
  • Negative numbers are used to access array
    elements from the end: $arr[-1] returns the last
    element in the array, $arr[-2] returns the
    next-to-last element, etc.
  • $#arr gives the index of the last array element.
    Thus $arr[-1] is the same as $arr[$#arr]. Very
    useful for loops.
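A minimal sketch of positive and negative indexes and $#arr; note that $arr and @arr coexist as unrelated variables.

```perl
use strict;
use warnings;

my @arr = qw(a b c d e);
my $arr = "unrelated";        # a different variable from @arr

my $fourth    = $arr[3];      # "d": indexes start at 0
my $last      = $arr[-1];     # "e"
my $next_last = $arr[-2];     # "d"
my $last_idx  = $#arr;        # 4

# $#arr is the natural upper bound for an index loop
for my $i (0 .. $#arr) {
    print "$i: $arr[$i]\n";
}
```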

15
Array Operations
  • You can easily add or remove elements from either
    end of an array.
  • push and pop operate on the end (right side)
    of an array.
  • push @arr, $var; is standard syntax for adding an
    element to an array.
  • shift and unshift operate on the left end of
    the array.
  • my $var = shift @arr; is standard syntax for
    unloading an array.
  • Standard C-style for loops:
  • for (my $i = 0; $i < 10; $i++) {    # note variable declaration
  •     do something
  • }
  • foreach loops, in which each element of the
    array is substituted into the scalar in turn.
    The scalar is an alias for the element, so
    changing it changes the array.
  • foreach my $element (@arr) {
  •     do something with $element
  • }
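A sketch of push, pop, shift and unshift, a C-style for loop, and a foreach loop whose variable aliases each element.

```perl
use strict;
use warnings;

my @arr = (2, 3);
push @arr, 4;               # (2, 3, 4): added on the right
unshift @arr, 1;            # (1, 2, 3, 4): added on the left
my $popped  = pop @arr;     # 4, leaving (1, 2, 3)
my $shifted = shift @arr;   # 1, leaving (2, 3)

my $total = 0;
for (my $i = 0; $i < @arr; $i++) {   # @arr here means the element count
    $total += $arr[$i];
}

foreach my $element (@arr) {
    $element *= 10;         # $element is an alias, so @arr itself changes
}
print "$popped $shifted $total @arr\n";   # 4 1 5 20 30
```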

16
Two Perl Oddities: Context-sensitive variables
and $_
  • An array gives a different value depending on
    whether it is used in list (array) context or in
    scalar context.
  • @arr = (1, 3, 5, 7);    # array (or list) context
  • print "@arr"; gives 1 3 5 7
  • but
  • $var = @arr;    # scalar context: number
    of elements
  • print $var; gives 4 (the number of
    elements in @arr)
  • In many cases, if you don't assign input to a
    variable, Perl automatically assigns it to the
    variable $_, which can often be used without
    being written explicitly.
  • foreach (@pets) { print; }
  • foreach (@pets) { print $_, "\n"; }
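A minimal sketch of scalar versus list context for the same array, and of $_ as the default loop variable.

```perl
use strict;
use warnings;

my @arr = (1, 3, 5, 7);
print "@arr\n";            # list context: 1 3 5 7
my $var = @arr;            # scalar context: 4, the number of elements
my ($first) = @arr;        # list assignment: $first gets the first element, 1
print "$var $first\n";

my @pets = qw(rover fido spot);
foreach (@pets) {
    print;                 # no argument, so print uses $_
    print "\n";
}
my $joined = "";
$joined .= $_ foreach @pets;   # $_ written out explicitly
```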

17
Hashes
18
Hashes
  • a hash is another fundamental data structure,
    like scalars and arrays. Hashes are sometimes
    called associative arrays.
  • Basically, a hash associates a key with a value.
    A hash is composed of a set of key-value pairs.
  • A key is a string: any collection of characters,
    generally enclosed in quotes. Any scalar can be
    a key, but they are all converted to strings.
  • A value can be almost anything: the values are
    just scalar variables.
  • One hash oddity: neither the keys nor the values
    are sorted or stored in a useful order. The order
    in which you enter hash items is not related to
    the order in which you retrieve them.
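A sketch of two points above: keys are always stored as strings (so 1 and "1" are the same key), and a value can be any scalar, including a reference. Sorting the keys is the usual way to get a predictable order.

```perl
use strict;
use warnings;

my %h;
$h{1}    = "number key";   # the key 1 is stored as the string "1"
$h{"1"}  = "string key";   # same key, so this replaces the value above
$h{list} = [1, 2, 3];      # a value can be any scalar, including a reference

my $n_keys = keys %h;      # 2: "1" and "list"
# Hash order is arbitrary, so sort the keys when order matters
print join(", ", sort keys %h), "\n";
```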

19
Hash Specifics
  • The punctuation mark used to denote a hash is %
    (percent sign).
  • Hash elements are accessed by enclosing the key
    in curly braces. For example, the hash
    %stoplight can be populated as follows:
  • $stoplight{"red"} = "stop";
  • $stoplight{"yellow"} = "caution";
  • $stoplight{"green"} = "go";
  • Each key can refer to only a single value. You
    can't have duplicate keys. If you try, the first
    value is overwritten and only the second is kept.
  • A hash is really a list with alternating keys and
    values. Thus it is possible to load a hash like
    this:
  • %stoplight = ("red", "stop", "yellow",
    "caution", "green", "go");
  • A better way is to use the => operator (big
    arrow), which is really just a synonym for a
    comma (and it also quotes the keys):
  • %stoplight = (red => "stop",
  •     yellow => "caution",
  •     green => "go");
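A minimal sketch showing that the three ways of loading %stoplight produce the same hash, and that a repeated key keeps only the last value.

```perl
use strict;
use warnings;

my %stoplight;
$stoplight{"red"}    = "stop";
$stoplight{"yellow"} = "caution";
$stoplight{"green"}  = "go";

# The same hash built from a plain list and from => pairs
my %from_list  = ("red", "stop", "yellow", "caution", "green", "go");
my %from_arrow = (red => "stop", yellow => "caution", green => "go");

# A duplicate key keeps only the last value assigned
my %dup = (red => "stop", red => "halt");
print "$dup{red}\n";    # halt
```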

20
Hash Operations
  • keys gives a list of all the keys used in the
    hash. Here's a common use:
  • foreach (keys %stoplight) {
  •     print "$_ stands for $stoplight{$_}\n";
  • }
  • values lists all the values. each returns the
    next key-value pair, as a 2-member list, every
    time it is called:
  • while ( my ($key, $value) = each %stoplight ) {
  •     print "$key $value\n";
  • }
  • Removing elements in a hash is done with
    delete:
  • delete $stoplight{"red"};
  • Testing for existence with exists:
  • exists $stoplight{"red"} returns true if
    that key-value pair exists, and false if it
    doesn't.
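A minimal sketch of keys, values, each, delete and exists on %stoplight. The keys are sorted before printing only to make the output repeatable.

```perl
use strict;
use warnings;

my %stoplight = (red => "stop", yellow => "caution", green => "go");

foreach (sort keys %stoplight) {        # sorted only for repeatable output
    print "$_ stands for $stoplight{$_}\n";
}
my @actions = sort values %stoplight;   # (caution, go, stop)

my $pairs = 0;
while (my ($key, $value) = each %stoplight) {
    print "$key $value\n";
    $pairs++;
}

delete $stoplight{"red"};
my $has_red   = exists $stoplight{"red"}   ? 1 : 0;   # 0
my $has_green = exists $stoplight{"green"} ? 1 : 0;   # 1
```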

21
Subroutines
  • a.k.a. functions
  • Also running external programs such as BLAST

22
Subroutines
  • Subroutines do not need to be pre-declared. They
    can be defined before, after, or in the middle of
    the main program. Although not often used,
    subroutines use & as a punctuation mark.
  • Subroutines are defined with the keyword sub
    followed by the actual code within curly braces.
    For example:
  • sub print_qwerty {
  •     print "qwerty\n";
  • }
  • Subroutines are invoked using their names. Any
    arguments need to be put inside parentheses
    following the subroutine name: print_qwerty();
  • Subroutines return values with the return
    keyword. More than one value can be returned;
    they are returned as a list.
  • sub print_qwerty3 {
  •     print "qwerty\n";
  •     return (5, 17, "uiop");
  • }
  • my ($var1, $var2, $var3) = print_qwerty3();
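A minimal sketch reusing the slide's print_qwerty3: it is called before its definition and its list of return values is unpacked into scalars or captured in an array.

```perl
use strict;
use warnings;

# Called before it is defined: the parentheses mark it as a sub call
my ($var1, $var2, $var3) = print_qwerty3();

sub print_qwerty3 {
    print "qwerty\n";
    return (5, 17, "uiop");     # several values come back as one list
}

my @all = print_qwerty3();      # the same list, captured in an array
print "$var1 $var2 $var3 ", scalar(@all), "\n";   # 5 17 uiop 3
```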

23
More on Subroutines
  • You can pass arguments into a subroutine as a
    list enclosed in parentheses:
  • print_words("dog", "cat");
  • The arguments are placed in a special array called
    @_, and they can be accessed as elements of
    that array from within the subroutine. (The
    elements of @_ are aliases for the caller's
    variables, so copy them into my variables before
    changing them.)
  • sub print_words {
  •     foreach my $word (@_) {
  •         print "$word\n";
  •     }
  • }
  • Note that you aren't required to specify the
    number of arguments in advance.
  • Variables declared in the main body are global in
    scope, visible from within any subroutine.
    Variables declared within a subroutine are
    visible only within that subroutine.
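A sketch of @_ handling. print_words comes from the slide; add_pair and double_in_place are hypothetical helpers showing the usual copy-into-my idiom and the aliasing behavior of @_.

```perl
use strict;
use warnings;

sub print_words {
    foreach my $word (@_) {        # any number of arguments
        print "$word\n";
    }
    return scalar(@_);             # how many arguments arrived
}

sub add_pair {                     # hypothetical helper
    my ($x, $y) = @_;              # copy the arguments into lexical variables
    return $x + $y;
}

sub double_in_place {              # hypothetical helper
    $_[0] *= 2;                    # $_[0] is an alias: changes the caller's variable
}

my $n     = print_words("dog", "cat", "bird");
my $sum   = add_pair(2, 3);
my $value = 21;
double_in_place($value);
```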

24
File Interactions
  • The open command assigns a file name to a file
    handle.
  • open INFILE, "my_file.txt";    # opened for reading
  • best to test for success. Standard syntax:
  • open INFILE, "my_file.txt" or die "Couldn't open read-file\n";
  • Files are read one line at a time, by enclosing
    the file handle in angle brackets:
  • while (<INFILE>) { print; }    # each line
    goes into $_ by default
  • the chomp command removes the terminal newline
    character from input lines:
  • while (my $line = <INFILE>) {
  •     chomp $line;
  •     print $line;
  • }
  • To open a file for writing, use > before the
    file name: open OUTFILE, ">my_file.txt";
  • To actually write to this file, use the file
    handle's name with print:
  • print OUTFILE "something interesting\n";
  • Appending is done by using >> in front of the
    file's name:
  • open APPENDFILE, ">>my_file.txt";
  • Files are closed automatically when the program
    terminates, but sometimes you need to
    specifically close them: close INFILE;
  • Command line arguments are passed to the program
    in the @ARGV array, equivalent to C's argv
    (except that $ARGV[0] is the first argument; the
    program name is in $0).
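A sketch of the same write/append/read cycle using the three-argument form of open with lexical filehandles (my $fh), the style current Perl documentation recommends over bareword handles like INFILE. The temporary directory from the core File::Temp module is an assumption made so the example cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Work in a throwaway directory so the sketch leaves nothing behind
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "my_file.txt");

# Three-argument open: the mode and the file name are kept separate
open(my $out, '>', $file) or die "Couldn't open $file for writing: $!\n";
print $out "first line\n";
close($out);

open(my $app, '>>', $file) or die "Couldn't open $file for appending: $!\n";
print $app "second line\n";
close($app);

my @lines;
open(my $in, '<', $file) or die "Couldn't open $file for reading: $!\n";
while (my $line = <$in>) {
    chomp $line;                  # drop the trailing newline
    push @lines, $line;
}
close($in);
print "$_\n" for @lines;
```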

25
Running External Programs with Perl
  • The most commonly used Perl command for running
    external programs is system. This command
    executes the program specified by its arguments,
    then returns control to the next line in your
    Perl program.
  • You can also enclose the command line in
    backticks: the program's output to STDOUT is the
    return value:
  • $output_string = `blastall -p blastn -i my_input_file -d my_database`;
  • system returns the exit status of the process it
    executed (also stored in $?). If all went well,
    this is 0; any other value indicates some kind of
    error. The program's own exit code is $? >> 8.
  • The simplest way to use system is to simply
    enclose the command line you need in quotes:
  • system("blastall -p blastn -i my_input_file -d my_database -o my_blast_output.txt");
  • If this command line contains shell
    metacharacters, Perl hands it to the shell
    (/bin/sh on Unix) to interpret; otherwise Perl
    itself splits it into a separate argument at each
    group of spaces.
  • You can avoid invoking a shell (a somewhat more
    secure method) by separating out the individual
    space-delimited segments yourself:
  • system("blastall", "-p", "blastn", "-i",
    "my_input_file", "-d", "my_database", "-o",
    "my_blast_output.txt");
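A sketch of system, $? and backticks. Because blastall may not be installed, it uses $^X (the path of the running perl) as a stand-in external program; the exit code 3 and the printed 42 are arbitrary.

```perl
use strict;
use warnings;

# $^X is the path of the perl running this script. It stands in for an
# external program such as blastall so the sketch runs wherever perl does.
my $status    = system($^X, '-e', 'exit 3');   # list form: no shell involved
my $exit_code = $? >> 8;                       # 3: the program's own exit code
print "system returned $status, exit code $exit_code\n";

my $ok = system($^X, '-e', 'exit 0');          # 0 means success
print "success\n" if $ok == 0;

# Backticks return whatever the program printed to STDOUT
my $output_string = `"$^X" -e "print 6*7"`;
print "captured: $output_string\n";            # captured: 42
```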

26
String Functions and Regular Expressions
27
String Manipulations
  • Don't forget: . is the concatenation operator.
  • split takes a string and separates it into an
    array of strings at whatever pattern of
    characters is indicated as the first argument,
    between slashes:
  • split /,/, "cat,dog,bird";
  • This expression splits the string at each
    comma, returning ("cat", "dog", "bird"). The
    splitting characters (the comma in this case)
    are discarded.
  • Note the comma after the splitting pattern: /,/,
    . It is necessary!
  • To split a string into individual characters,
    use an empty pattern:
  • my @chars = split //, "The dog";
  • join takes the elements of an array and joins
    them into a single string, separated by whatever
    symbol(s) you like.
  • join "", "dog", "cat", "bird" gives
    "dogcatbird".
  • substr extracts part of a string, based on the
    start position and length of the desired
    substring. Note that the first position is 0.
  • my $my_substring = substr $string,
    $start_pos, $length;
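A minimal sketch of split, join and substr; the "GAATTC" substring is just an example input.

```perl
use strict;
use warnings;

my @animals = split /,/, "cat,dog,bird";     # ("cat", "dog", "bird")
my @chars   = split //, "The dog";           # seven single characters
my $glued   = join "", "dog", "cat", "bird"; # "dogcatbird"
my $csv     = join ",", @animals;            # "cat,dog,bird"

my $string = "GAATTCGG";
my $site   = substr $string, 0, 6;           # "GAATTC": start at 0, length 6
print "@animals | ", scalar(@chars), " | $glued | $csv | $site\n";
```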

28
More String Functions
  • reverse reverses the string (in scalar context).
  • When used on an array, it reverses the order of
    the elements.
  • The transliteration operator tr/// substitutes
    one character for another. It is invoked with
    the binding operator =~, which is extensively
    used with regular expressions. It uses two
    character lists separated by slashes. It
    substitutes every instance of each character in
    the first list with the corresponding character
    in the second list.
  • my $sequence = "AAGCTG";
  • $sequence =~ tr/ACGT/TGCA/;    # $sequence is
    now "TTCGAC"
  • tr returns the number of characters converted, so
    it can be used to count them: $num = ($sequence
    =~ tr/CG//); returns the number of Gs and Cs (an
    empty second list leaves the string unchanged).
  • to reverse-complement a DNA strand:
  • $sequence = reverse $sequence;
  • $sequence =~ tr/ACGT/TGCA/;
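A sketch wrapping the reverse-complement recipe in a hypothetical revcomp() helper, and using (my $copy = $original) =~ tr/// to keep the original sequence unchanged. The lowercase letters in the tr/// lists are an addition so mixed-case input keeps its case.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse (scalar context), then complement
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar assignment, so the string is reversed
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base, keeping case
    return $rc;
}

my $sequence = "AAGCTG";
(my $complement = $sequence) =~ tr/ACGT/TGCA/;   # "TTCGAC"; $sequence unchanged
my $gc_count = ($sequence =~ tr/CG//);           # 3: counts without changing
my $rc = revcomp($sequence);                     # "CAGCTT"
my @backwards = reverse(1, 2, 3);                # list context: (3, 2, 1)
print "$complement $gc_count $rc @backwards\n";
```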

29
Regular Expressions
  • Regular expressions are the main way Perl matches
    patterns within strings. For example, finding
    pieces of text within a larger document, or
    finding a restriction site within a larger
    sequence.
  • Note: regular expressions DO NOT work very well
    for comparing DNA sequences, because they don't
    deal well with gaps.
  • There are 2 main operators that use regular
    expressions:
  • 1. matching (which returns TRUE if a
    match is found and FALSE if no match is found):
    m/regex/ or just /regex/
  • 2. substitution, which substitutes one
    pattern of characters for another within a
    string: s/original_pattern/new_string/
  • split also uses regexes.
  • Strings are associated with match or substitution
    operations using the binding operator =~.
  • Syntax:
  • if ($str =~ /dog/) { print "matches"; }    # matching
  • $str =~ s/dog/cat/;    # substitutes cat for
    dog in $str
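A minimal sketch of the match and substitution operators with the binding operator =~; the sentence being searched is arbitrary.

```perl
use strict;
use warnings;

my $str = "my dog likes other dogs";
my $found = ($str =~ /dog/)  ? 1 : 0;        # 1: m// returns true on a match
my $also  = ($str =~ m/cat/) ? 1 : 0;        # 0: no match

$str =~ s/dog/cat/;                          # replaces the first "dog" only
print "$str\n";                              # my cat likes other dogs

my @fields = split /\s+/, "one  two\tthree"; # split takes a regex too
```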

30
Pattern Matching
  • Literal matching: exact match of each character
    with no gaps or mismatches.
  • $str = "doggie" matches /dog/, /do/, /og/,
    but NOT /dg/ or /dm/
  • an i after the match pattern makes it
    case-insensitive: /dog/i matches "Dog".
  • Position assertions: ^ at the beginning means
    that the matched string must be at the beginning
    of the line; $ at the end means it must be at the
    end.
  • "dog" matches /^do/, /do/, and /og$/, but NOT
    /^og/ or /do$/
  • Quantifiers are placed after the character you
    want to match.
  • * means 0 or more of the preceding character
  • + means 1 or more
  • ? means 0 or 1
  • {3} means exactly 3; {3,5} means 3, 4, or 5
  • for example, /do*g/ matches "dg" or "dog" or
    "dooog"
  • /do+g/ matches "dog" but not
    "dg"
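A sketch that checks the claims above directly. The ^ and $ anchors around the quantifier patterns are added so that each test word must match completely, which keeps the results unambiguous.

```perl
use strict;
use warnings;

my $str  = "doggie";
my @hits = grep { $str =~ $_ } (qr/dog/, qr/do/, qr/og/, qr/dg/, qr/dm/);
# three patterns match: dog, do and og

my $ci      = ("Dog" =~ /dog/i) ? 1 : 0;   # 1: case-insensitive
my $anchor1 = ("dog" =~ /^do/)  ? 1 : 0;   # 1: "do" at the start
my $anchor2 = ("dog" =~ /do$/)  ? 1 : 0;   # 0: "do" is not at the end

my @star  = grep { /^do*g$/ }     qw(dg dog dooog);           # dg dog dooog
my @plus  = grep { /^do+g$/ }     qw(dg dog dooog);           # dog dooog
my @range = grep { /^do{2,3}g$/ } qw(dog doog dooog doooog);  # doog dooog
print scalar(@hits), " $ci $anchor1 $anchor2 | @star | @plus | @range\n";
```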

31
Character Classes
  • There are several built-in classes:
  • . stands for any single character except
    newline
  • note that /.*/ matches anything, including the
    empty string
  • \d is a digit (0-9) and \D is any non-digit
  • \s is a whitespace character (space, tab,
    newline, etc.); \S is non-whitespace
  • \w is a word character: a letter, a digit, or
    underscore; \W is any other character
  • Your own character classes are enclosed in square
    brackets: [acf] is any single a, c, or f.
  • you negate a character class with a ^ first:
    [^acf] is anything except a, c, or f.
  • you can use hyphens to indicate a range (ASCII):
    [a-z] is any small letter, [a-zA-Z] is any letter
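A sketch of the built-in and custom character classes against an arbitrary sample string. In list context, a match with //g returns every match.

```perl
use strict;
use warnings;

my $text   = "Gene_7 at pos 42";
my @digits = $text =~ /\d/g;     # 7 4 2
my @words  = $text =~ /\w+/g;    # Gene_7 at pos 42
my @spaces = $text =~ /\s/g;     # three spaces
my $empty_ok = ("" =~ /.*/) ? 1 : 0;   # 1: .* can match zero characters

my @in_class  = grep { /^[acf]$/ }  qw(a b c f);   # a c f
my @not_class = grep { /^[^acf]$/ } qw(a b c f);   # b
my $letters   = join "", "AbC9x" =~ /[a-zA-Z]/g;   # AbCx
print "@digits | @words | ", scalar(@spaces), " | $letters\n";
```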

32
Pattern Memory
  • To capture the matched pattern, surround it with
    parentheses. Then, the special variables $1, $2,
    $3, etc. contain the matched pattern.
  • $str = "The z number is z576890";
  • $str =~ /is z(\d+)/;
  • print $1;    # prints 576890
  • the numbered variables are assigned left to right
    on the basis of the opening (left) parenthesis:
  • /(the ((cat) (runs)))/
  • captures $1 = "the cat runs", $2 = "cat
    runs", $3 = "cat", $4 = "runs".
  • these variables only exist within the smallest
    block of code (delimited by { }) containing the
    regex.
  • Matching is greedy and not lazy by default:
  • "doggg" =~ /(dog+)/ extracts "doggg", not
    "dog"
  • a ? after the quantifier converts to lazy
    matching:
  • "doggg" =~ /(dog+?)/ extracts "dog", not
    "doggg"
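A sketch of capture groups, their numbering order, and greedy versus lazy quantifiers. Assigning the match to a list, as done here, is an alternative to reading $1 afterwards.

```perl
use strict;
use warnings;

my $str  = "The z number is z576890";
my $znum = ($str =~ /is z(\d+)/) ? $1 : undef;   # read $1 only after a successful match

my @caps = "the cat runs" =~ /(the ((cat) (runs)))/;
# numbered by opening parenthesis: "the cat runs", "cat runs", "cat", "runs"

my ($greedy) = "doggg" =~ /(dog+)/;    # "doggg": + takes as many g's as it can
my ($lazy)   = "doggg" =~ /(dog+?)/;   # "dog":   +? stops as soon as it can
print "$znum | @caps | $greedy $lazy\n";
```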

33
Substitution
  • Basic syntax
  • $string =~ s/original_pattern/replacement_string/;
  • The original pattern is a regular expression and
    can capture parts of the pattern with
    parentheses.
  • The replacement string is just a string, not a
    regex, although it can contain $1, $2, etc.
    memory variables.
  • $str = "A cat is a nice pet";
  • $str =~ s/cat/dog/;
  • print $str;    # prints "A dog is a nice pet"
  • Modifiers: by default, only 1 substitution is
    made on the string. To substitute all instances
    of the pattern, put a g after the expression.
    Also, an i after the expression makes it
    case-insensitive.
  • $str = "A cat is a cat is a CAT";
  • $str =~ s/cat/dog/; gives "A dog is a cat is a CAT"
  • $str =~ s/cat/dog/gi; gives "A dog is a dog is a dog"
  • You can also use substitution to remove
    characters. Thus s/[^ACGT]//g finds any character
    that isn't A, C, G, or T and replaces it with
    nothing.
  • Substitution and assignment: this keeps the
    original string intact and assigns the altered
    string to a new variable:
  • ($newstr = $oldstr) =~ s/cat/dog/;
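A sketch of the substitution cases above. The /r modifier at the end is a newer alternative (Perl 5.14 and later) to the ($newstr = $oldstr) =~ s/// idiom.

```perl
use strict;
use warnings;

my $str = "A cat is a cat is a CAT";
(my $once = $str) =~ s/cat/dog/;     # "A dog is a cat is a CAT"
(my $all  = $str) =~ s/cat/dog/gi;   # "A dog is a dog is a dog"

# Captures from the pattern can be used in the replacement string
(my $swapped = "Lovelace, Ada") =~ s/(\w+), (\w+)/$2 $1/;   # "Ada Lovelace"

my $dna = "AC-GT nn\nTA";
(my $clean = $dna) =~ s/[^ACGT]//g;  # "ACGTTA": everything else removed

# Perl 5.14+: /r returns the new string and leaves $str alone
my $copy = $str =~ s/cat/dog/gir;
print "$once\n$all\n$swapped\n$clean\n$copy\n";
```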

34
References and Data Structures
  • (including multidimensional arrays and hashes,
    and Sorting)

35
References
  • In Perl, the backslash is used to create a
    reference (i.e. a pointer):
  • my $var = 5;
  • my $var_ref = \$var;
  • To dereference a simple reference, put it inside
    curly braces with another $ in front of it.
    Thus, ${$var_ref} is the same as $var, that is,
    the value 5.
  • In many cases you can leave the curly braces out:
    $$var_ref works just as well as ${$var_ref}.
    But, in complicated expressions this can cause
    havoc due to precedence problems.
  • To dereference array elements, the arrow notation
    is preferred:
  • my @arr = (1, 3, 5, "duh");
  • my $arr_ref = \@arr;
  • print $arr_ref->[3];    # prints duh
  • ${$arr_ref}[3] also works (de-referencing
    $arr_ref with ${ })
  • use { } for hash references instead of [ ]:
    $hash_ref->{"key"}
  • References to arrays and hashes are the standard
    way of passing these items into and out of
    subroutines, to avoid copying them.
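A sketch of scalar, array and hash references; add_item is a hypothetical helper showing why passing a reference avoids copying the array.

```perl
use strict;
use warnings;

my $var = 5;
my $var_ref = \$var;
$$var_ref = 6;                     # writes through the reference: $var is now 6

my @arr = (1, 3, 5, "duh");
my $arr_ref = \@arr;
my $via_arrow  = $arr_ref->[3];    # "duh"
my $via_braces = ${$arr_ref}[3];   # "duh"

my %stoplight = (red => "stop");
my $hash_ref = \%stoplight;
my $action = $hash_ref->{red};     # "stop"

# Hypothetical helper: works on the caller's array, no copy is made
sub add_item {
    my ($list_ref, $item) = @_;
    push @$list_ref, $item;
}
add_item($arr_ref, "more");        # @arr now has 5 elements
```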

36
Multidimensional Arrays
  • Square brackets are used to generate a reference
    to an anonymous array, which is then assigned to
    a scalar variable:
  • $arr_ref = [1, 3, 5, 7];
  • $arr2_ref = [@array];    # a reference to a new copy of @array
  • Similarly, curly braces generate references to
    anonymous hashes.
  • A two-dimensional array consists of an array of
    references to a set of anonymous arrays:
  • @array_2d = ([1, 2, 4], [7, 8, 9], [5, 6, 3]);
  • Dereferencing is as in C:
  • print $array_2d[0][1];    # prints 2
  • Multidimensional arrays, or mixtures of arrays
    and hashes, are generated similarly.
  • Autovivification: you need to declare the
    top-level array or hash, but all lower levels
    come into existence automatically:
  • my $hash_ref;
  • $hash_ref->{dog}[3]{color} = "brown";    # array
    elements 0, 1, and 2 are undef
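A sketch of anonymous arrays and hashes, a 2-D array, and autovivification of $hash_ref->{dog}[3]{color}.

```perl
use strict;
use warnings;

my @array_2d = ([1, 2, 4], [7, 8, 9], [5, 6, 3]);
my $elem = $array_2d[0][1];             # 2 (short for $array_2d[0]->[1])

my @array = (1, 3, 5, 7);
my $copy_ref = [@array];                # a new anonymous array holding a copy
$copy_ref->[0] = 99;                    # @array is unaffected

my $pet_ref = { name => "rover", legs => 4 };   # anonymous hash

my $hash_ref;                           # autovivification builds every level below
$hash_ref->{dog}[3]{color} = "brown";
my $n_slots = scalar @{ $hash_ref->{dog} };     # 4: indexes 0..3, 0..2 undef

for my $row (@array_2d) {               # each $row is an array reference
    print join(" ", @$row), "\n";
}
```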

37
Sorting
  • Perl has a built-in sort function (a mergesort
    since Perl 5.8); of course, if you really enjoy
    writing sort routines, please feel free to
    indulge yourself.
  • By default, sort goes in ASCII (string) order:
  • my @sorted = sort @array;
  • my @sorted_keys = sort keys %hash;
  • For numerical sort:
  • my @sorted = sort { $a <=> $b } @array;
  • my @sorted_indexes = sort { $array[$a] <=> $array[$b] }
    0 .. $#array;
  • Each pair of elements being compared is
    substituted into the special $a and $b variables.
    Use these names only!
  • largest-to-smallest (reverse) numerical sort:
  • my @sorted = sort { $b <=> $a } @array;
  • A sorting function is written within the curly
    braces. It needs to return a negative number if
    $a should come before $b, 0 if they are equal,
    and a positive number if $a should come after $b.
  • Perl uses the built-in <=> operator for numerical
    comparisons, and cmp for ASCII comparisons.
  • A multi-level sort is done using || (or), since
    the comparison returns 0 (false) if the top-level
    items are equal.
  • @last_names = qw(Coburn Smith
    Jones Jones Smith);
  • @first_names = qw(Fred Harold
    Mary Jane Hortense);
  • @sorted_indexes = sort {
        $last_names[$a] cmp $last_names[$b]
        || $first_names[$a] cmp $first_names[$b]
    } 0 .. $#last_names;
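A sketch of the default, numeric, reverse, index-based and multi-level sorts on the slide's sample names; the numbers in @array are arbitrary.

```perl
use strict;
use warnings;

my @array  = (10, 9, 100, 2);
my @ascii  = sort @array;                  # (10, 100, 2, 9): string order
my @up     = sort { $a <=> $b } @array;    # (2, 9, 10, 100)
my @down   = sort { $b <=> $a } @array;    # (100, 10, 9, 2)
my @by_val = sort { $array[$a] <=> $array[$b] } 0 .. $#array;   # (3, 1, 0, 2)

my @last_names  = qw(Coburn Smith Jones Jones Smith);
my @first_names = qw(Fred Harold Mary Jane Hortense);
my @sorted_indexes = sort {
    $last_names[$a] cmp $last_names[$b]            # primary key
        || $first_names[$a] cmp $first_names[$b]   # tie-breaker when that is 0
} 0 .. $#last_names;

print "$first_names[$_] $last_names[$_]\n" for @sorted_indexes;
```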

38
Modules and Object-Oriented Perl
39
Modules
  • Commonly used subroutines are often put into a
    separate file, a module.
  • a module is just a text file, not made
    executable, with no invocation of Perl at the
    top.
  • modules are given a .pm extension
  • modules must return a true value, so nearly all
    of them have 1; as their last line.
  • the content of a module is a package, which is
    given the same name as the module. The package
    is set off by curly braces (Perl 5.14 and later;
    older code instead starts the file with package
    MyModule;), and it contains your subroutines.
  • for example, MyModule.pm looks like:
  • package MyModule {
  •     sub my_sub1 { whatever }
  •     sub my_sub2 { whatever else }
  • }
  • 1;
  • Module files need to be located in one of the
    directories listed in the built-in @INC array.
  • To put your own directory in this array before
    use looks for modules, do it at compile time:
    BEGIN { unshift @INC, "path_to_your_lib"; } or,
    more simply, use lib "path_to_your_lib";
  • modules in the same directory as the main program
    used to be found automatically, because the
    current directory (.) was listed in @INC; since
    Perl 5.26 it no longer is, so add that directory
    explicitly (for example with use lib).
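A sketch that writes a hypothetical MyModule.pm to a temporary directory and loads it through @INC. The file uses the classic package MyModule; form, which works on any Perl 5. Because require runs at run time, a plain unshift @INC is enough here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a tiny hypothetical MyModule.pm into a scratch directory
my $lib = tempdir(CLEANUP => 1);
my $pm  = File::Spec->catfile($lib, "MyModule.pm");
open(my $fh, '>', $pm) or die "Couldn't write $pm: $!\n";
print $fh <<'END_MODULE';
package MyModule;
use strict;
use warnings;
sub my_sub1 { return "whatever" }
sub my_sub2 { return "whatever else" }
1;    # a module must end by returning a true value
END_MODULE
close($fh);

unshift @INC, $lib;    # run time is fine for require; use would need BEGIN or use lib
require MyModule;      # searches @INC for MyModule.pm
my $result = MyModule::my_sub1();
print "$result\n";
```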

40
More Modules
  • The subroutines and variables in a module are in
    a separate namespace from the main program (whose
    namespace is called main).
  • To use them, you need to have a line like use
    MyModule; in your program (the module name
    without the .pm).
  • also, you need to provide the fully-qualified
    name of the variable or subroutine, which is the
    module name followed by 2 colons:
  • $MyModule::var1 or MyModule::my_sub2()
  • Some modules allow you to import specific
    subroutines with a construction like:
  • use MyModule qw(my_sub1 my_sub2);
  • In this case, the module name does not have to be
    used when invoking those subroutines.
  • Details of exporting and importing are found with
    the standard Perl Exporter module; see the
    documentation for that.
  • One of the joys of Perl is that people share a
    lot of useful code in the form of modules. The
    central repository is CPAN (www.cpan.org).
    Before writing your own module to do something
    obvious, look there first.
  • caveat emptor: some modules are very high quality
    and others aren't
  • some require other Perl modules, or compiled C
    libraries, to be installed first
  • In general, after downloading a module,
    installation is done by these 3 commands at the
    Unix prompt
  • perl Makefile.PL
  • make
  • make install
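A single-file sketch of namespaces and imports. A real module would live in MyModule.pm; here the hypothetical package is defined inside a BEGIN block and registered in %INC so that use MyModule finds it without a separate file. It relies on the core Exporter module's import method.

```perl
use strict;
use warnings;

# Inline stand-in for a MyModule.pm file
BEGIN {
    package MyModule;
    use Exporter 'import';              # gives MyModule an import() method
    our @EXPORT_OK = qw(my_sub1 my_sub2);
    our $var1 = 42;
    sub my_sub1 { return "whatever" }
    sub my_sub2 { return "whatever else" }
    $INC{'MyModule.pm'} = __FILE__;     # mark the module as already loaded
}

use MyModule qw(my_sub1);               # import only my_sub1 into main

my $short = my_sub1();                  # imported: no prefix needed
my $long  = MyModule::my_sub2();        # not imported: fully-qualified name
my $value = $MyModule::var1;            # package variable, fully qualified
print "$short | $long | $value\n";
```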

41
Object-Oriented Perl
  • A class is defined by a package. Class methods
    are subroutines in that package.
  • CRITICAL: Arrow notation
  • Class->method(par1, par2) is interpreted in Perl
    as Class::method("Class", par1, par2). That is,
    the class name becomes the first member of the @_
    array passed to the method (subroutine).
  • For example, in the main program: Cow->sound();
  • in Cow.pm:
  • package Cow {
  •     sub sound {
  •         my $class = shift;
  •         print "the $class says Moooo\n";
  •     }
  • }    # end of Cow
  • Inheritance is done with the @ISA array (is-a),
    which must be declared in each subclass package
    with the our keyword. @ISA lists all the
    superclasses for this class; if a method is not
    found in the class itself, Perl searches them.
  • package Cow;
  • our @ISA = qw(Animal MethaneProducer);
  • ...
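A sketch of class methods and @ISA using package blocks (Perl 5.14 and later). Animal and its speak() method are hypothetical; Cow inherits speak() and supplies its own sound().

```perl
use strict;
use warnings;

# Hypothetical base class
package Animal {
    sub speak {
        my $class = shift;               # the class name arrives first in @_
        return "the $class says " . $class->sound();
    }
}

package Cow {
    our @ISA = ('Animal');               # Cow is-an Animal
    sub sound { return "Moooo" }
}

package main;

my $line = Cow->speak();                 # found in Animal, called as Animal::speak("Cow")
print "$line\n";                         # the Cow says Moooo
```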

42
Instances
  • An instance of a class is defined by a reference
    to an anonymous hash.
  • The hash reference gets associated with its class
    using the keyword bless.
  • my $elsie = {};
  • bless $elsie, "Cow";
  • In Perl, most constructors are called new. An
    example:
  • package Cow {
  •     sub new {
  •         my $class = shift;
  •         my $self = {};    # anonymous hash ref
  •         bless $self, $class;    # bless returns $self, so new returns it
  •     }
  • }    # end of Cow
  • my $elsie = Cow->new;    # invocation in main
    program, creating a new instance
  • Default properties of the instances are put into
    the anonymous hash in the new method; any
    key => value arguments in @_, listed after the
    defaults, override them:
  • my $self = { legs => 4,
  •     color => "brown",
  •     sound => "moo", @_ };
  • my $bossie = Cow->new(color => "white");    #
    override the default color
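A sketch of the constructor described above, with the caller's arguments placed after the defaults so they override them.

```perl
use strict;
use warnings;

package Cow {
    sub new {
        my $class = shift;
        # Defaults first; the caller's key => value pairs in @_ override them
        my $self = { legs => 4, color => "brown", sound => "moo", @_ };
        return bless $self, $class;
    }
}

package main;

my $elsie  = Cow->new;                    # all defaults
my $bossie = Cow->new(color => "white");  # override one default
print ref($bossie), ": $bossie->{color}, $bossie->{legs} legs\n";   # Cow: white, 4 legs
```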

43
Instance Methods
  • Accessor methods (also called setters and
    getters). Note that by default, instance data
    members are NOT private. Access using accessor
    methods is a matter of politeness, not force.
  • sub color {
  •     my $self = shift;
  •     if (@_) {    # arguments exist, so it's a set
  •         $self->{color} = shift;
  •     }
  •     else {    # no arguments, so it's a get
  •         return $self->{color};
  •     }
  • }
  • Destructors: Perl uses an automatic garbage
    collection system (reference counting). When the
    last reference to an object is removed, the object
    is automatically destroyed. Thus class modules
    rarely contain explicit destructors (DESTROY
    methods).
  • Operator overloading: overload is a core pragma
    (module). To overload an operator, define the
    altered method as a subroutine within the
    package, and put in a line like:
  • use overload '+' => \&my_add;
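A sketch combining an accessor, an empty DESTROY, and use overload. Point, my_add and the x/y fields are hypothetical; Perl passes my_add the two operands plus a flag saying whether they were swapped, which this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical Point class
package Point {
    use overload '+' => \&my_add;           # $p + $q calls my_add($p, $q, $swapped)

    sub new {
        my ($class, %args) = @_;
        return bless { x => 0, y => 0, color => "black", %args }, $class;
    }
    sub color {                             # get with no argument, set with one
        my $self = shift;
        if (@_) { $self->{color} = shift; }
        return $self->{color};
    }
    sub my_add {
        my ($p, $q) = @_;
        return Point->new(x => $p->{x} + $q->{x}, y => $p->{y} + $q->{y});
    }
    sub DESTROY { }   # optional: called when the last reference goes away
}

package main;

my $p = Point->new(x => 1, y => 2);
my $q = Point->new(x => 10, y => 20);
my $sum = $p + $q;                          # uses my_add
$p->color("red");                           # set
print "($sum->{x}, $sum->{y}) ", $p->color, "\n";   # (11, 22) red
```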