Title: Programming and Perl for Bioinformatics Part V
1. Programming and Perl for Bioinformatics Part V
2. References and Objects
3. What Are References?
- A reference is the (starting) address of a memory block that stores some data; it is also called a pointer.
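- [Figure: memory diagram. Cells at binary addresses (010010, 010011, 010100, 010101, ...) hold the characters G, A, T, C and the references $str_ref = \$string_1, $list_ref = \@list_1, and $hash_ref = \%hash_1.]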
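A minimal sketch of the three references named in the diagram. Only the variable names come from the figure; the values stored in them are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Illustrative data; only the variable names come from the figure.
my $string_1 = 'GATC';
my @list_1   = ('G', 'A', 'T', 'C');
my %hash_1   = (G => 'C', A => 'T');

# The backslash operator returns a reference to its operand.
my $str_ref  = \$string_1;
my $list_ref = \@list_1;
my $hash_ref = \%hash_1;

# Printing a reference shows its type and an address, e.g. ARRAY(0x...).
print "$str_ref $list_ref $hash_ref\n";

# Dereferencing reaches the original data.
print "${$str_ref} has ", scalar @{$list_ref}, " bases\n";
```

The addresses printed on the first line differ from run to run; the second line reads the original variables through their references.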
2/22/2014
4. What Good Are References?
- An array of arrays (can do the job of a 2-dimensional matrix):
    Spot_num  Ch1-BKGD  Ch1    Ch2-BKGD  Ch2
    000       0.124     43.2   0.102     80.4
    001       0.113     60.7   0.091     22.6
    002       0.084     112.2  0.144     35.3
- Code:
    my @spotarray = ( [0.124, 43.2,  0.102, 80.4],
                      [0.113, 60.7,  0.091, 22.6],
                      [0.084, 112.2, 0.144, 35.3] );
Perl in a Day - Subroutines
5. What Good Are References?
- A hash of arrays:
    Accession  Ch1-BKGD  Ch1   Ch2-BKGD  Ch2
    AW10021    0.124     43.2  0.102     80.4
    BE52002    0.113     60.7  0.091     22.6
    W20209     0.0841    12.2  0.144     35.3
- Code:
    my %spothash = ( 'AW10021' => [0.124,  43.2, 0.102, 80.4],
                     'BE52002' => [0.113,  60.7, 0.091, 22.6],
                     'W20209'  => [0.0841, 12.2, 0.144, 35.3]
                   );
- Hashes of hashes, and other more complex data structures
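As a rough sketch of how such a hash of arrays might be read back, the block below rebuilds the hash with the slide's values and pulls out individual columns. The column order (Ch1-BKGD, Ch1, Ch2-BKGD, Ch2) is assumed from the table header.

```perl
use strict;
use warnings;

# Hash of arrays from the slide: accession => [Ch1-BKGD, Ch1, Ch2-BKGD, Ch2].
my %spothash = (
    'AW10021' => [0.124,  43.2, 0.102, 80.4],
    'BE52002' => [0.113,  60.7, 0.091, 22.6],
    'W20209'  => [0.0841, 12.2, 0.144, 35.3],
);

# Each hash value is an array reference; index it with ->[...].
for my $acc (sort keys %spothash) {
    my $row = $spothash{$acc};
    print "$acc: Ch1 = $row->[1], Ch2 = $row->[3]\n";
}

# The arrow between adjacent subscripts is optional.
my $ch2_be = $spothash{'BE52002'}[3];
```

Each row is stored once and reached through its reference, so no row is copied when it is looked up.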
6. What Is A Reference?
- @y = ( 1, 'a', 2.3 );
- $ref_to_y = \@y;
- print @y; yields 1a2.3
- print $ref_to_y; yields ARRAY(0x80cd6ac)
- [Figure: a box holding 1, a, 2.3, labeled @y.]
7. Getting At The Value: De-referencing
- Using a block:
- @{$array_reference}  %{$hash_reference}  ${$scalar_reference}
- print @{$ref_to_y}; yields 1a2.3
- Or without it:
- @x = @$ref_to_y;
- $foo = "two humps";
- $scalar_ref = \$foo;
- $camel_model = $$scalar_ref;    # $camel_model is now "two humps"
- push(@$array_ref, $filename);
- $$hash_ref{"KEY"} = "VALUE";
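The forms above can be tried together in one short script. This is a sketch that reuses the slide's variable names where possible; `@copy_block`, `@copy_short`, and `%h` are names introduced here for illustration.

```perl
use strict;
use warnings;

my @y        = (1, 'a', 2.3);
my $ref_to_y = \@y;

# Block form and short form are equivalent for a simple scalar reference.
my @copy_block = @{$ref_to_y};
my @copy_short = @$ref_to_y;

# Scalar reference.
my $foo         = "two humps";
my $scalar_ref  = \$foo;
my $camel_model = $$scalar_ref;    # "two humps"

# push through an array reference modifies the original array.
push @$ref_to_y, 'z';

# Assign through a hash reference.
my %h;
my $hash_ref = \%h;
$$hash_ref{"KEY"} = "VALUE";

print join('', @y), "\n";    # prints 1a2.3z
print "$h{KEY}\n";           # prints VALUE
```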
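8. Getting At The Value: De-referencing
- Using the arrow operator; these three statements are equivalent:
- $$array_ref[0] = 1;
- ${$array_ref}[0] = 1;
- $array_ref->[0] = 1;
- Note that $array[3] and $array->[3] are NOT the same: the first is an element of the array @array, the second is an element of the array that the scalar $array refers to.
- my %hash_copy = %$hash_ref;
- my $hash_value = $$hash_ref{'some_key'};
- my $hash_value = $hash_ref->{'some_key'};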
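To make the difference between `$array[3]` and `$array->[3]` concrete, the sketch below declares both an array `@array` and an unrelated scalar `$array` that holds a reference. The data values are invented for illustration.

```perl
use strict;
use warnings;

my @array = (10, 20, 30, 40);          # a real array named @array
my $array = [ 'a', 'b', 'c', 'd' ];    # an unrelated scalar holding an array reference

# Three equivalent ways to set element 0 through the reference.
$$array[0]   = 'x';
${$array}[0] = 'y';
$array->[0]  = 'z';

# $array[3] reads @array; $array->[3] reads the referenced array.
my $from_array = $array[3];      # 40
my $from_ref   = $array->[3];    # 'd'

# Hash references work the same way.
my $hash_ref  = { some_key => 'some_value' };
my %hash_copy = %$hash_ref;
my $value_a   = $$hash_ref{'some_key'};
my $value_b   = $hash_ref->{'some_key'};

print "$from_array $from_ref $array->[0] $value_a $value_b\n";
```

Giving an array and a reference the same name is legal but confusing; it is done here only to show that Perl keeps the two apart.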
9. Getting At The Value: De-referencing
- Reference to a subroutine:
- $my_cool_sub = \&subroutine;
- Dereference:
- my $result = &{$my_cool_sub}($arg1, $arg2);    # invoke $my_cool_sub with two arguments, using the block form
- my $result = &$my_cool_sub($arg1, $arg2);      # or without it
- my $result = $my_cool_sub->($arg1, $arg2);     # or using the arrow
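A runnable sketch of the three call forms. The subroutine `count_base` is a hypothetical helper written for this example; it is not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often $base occurs in $dna.
sub count_base {
    my ($dna, $base) = @_;
    my $count = () = $dna =~ /\Q$base\E/g;
    return $count;
}

my $my_cool_sub = \&count_base;

my $r1 = &{$my_cool_sub}('GATTACA', 'A');    # block form
my $r2 = &$my_cool_sub('GATTACA', 'A');      # without the block
my $r3 = $my_cool_sub->('GATTACA', 'A');     # arrow form

print "$r1 $r2 $r3\n";    # prints 3 3 3
```

Because the reference is an ordinary scalar, it can be stored in a hash or passed to another subroutine like any other value.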
10. Getting At The Value: De-referencing
- @y = ( 1, 'a', 2.3 );
- $ref_to_y = \@y;
- print @y; yields 1a2.3
- print $ref_to_y; yields ARRAY(0x80cd6ac)
- [Figure: a box holding 1, a, 2.3, labeled @y.]
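11. Getting At The Value: De-referencing
- $y[3] = 'z';
- print @{$ref_to_y}; yields 1a2.3z
- @y = (5, 6, 7);
- print @{$ref_to_y}; yields 567
- Why?
- Regular variables: assigning an array copies its values at that moment (static).
- Reference variables: a reference points to the array itself, so it always sees the array's current contents (dynamic).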
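The contrast can be checked directly by keeping both a copy and a reference to the same array. This is a sketch; `@copy`, `$after_append`, and `$after_assign` are names introduced here.

```perl
use strict;
use warnings;

my @y        = (1, 'a', 2.3);
my @copy     = @y;     # copies the three values
my $ref_to_y = \@y;    # points at @y itself

$y[3] = 'z';
my $after_append = join('', @$ref_to_y);    # "1a2.3z"

@y = (5, 6, 7);
my $after_assign = join('', @$ref_to_y);    # "567"

print "reference sees: $after_append, then $after_assign\n";
print "copy still has: ", join('', @copy), "\n";    # 1a2.3
```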
12. Making References To Arbitrary Values From Scratch
- Anonymous hashes or arrays
- $y_gene_families = [ 'DAZ', 'TSPY', 'RBMY', 'CDY1', 'CDY2' ];    # [ and ] instead of ( and )
- $y_gene_family_counts = { 'DAZ'  => 4,
                            'TSPY' => 20,
                            'RBMY' => 10,
                            'CDY2' => 2
                          };    # { and } instead of ( and )
- $y_gene_families gets a reference to an array, and $y_gene_family_counts gets a reference to a hash.
13. Making References To Arbitrary Values From Scratch
- for (keys %$y_gene_family_counts) { print "$_\n"; }
- my @a = @$y_gene_families;
- $$y_gene_families[0]
- $$y_gene_family_counts{'DAZ'}
- Arrow shorthand:
- $y_gene_families->[0] yields 'DAZ'
- $y_gene_family_counts->{'DAZ'} yields '4'
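One way the two anonymous structures might be used together is sketched below. The data is the slide's; the loop and the names `$first` and `$n_known` are additions for illustration.

```perl
use strict;
use warnings;

my $y_gene_families      = [ 'DAZ', 'TSPY', 'RBMY', 'CDY1', 'CDY2' ];
my $y_gene_family_counts = { 'DAZ' => 4, 'TSPY' => 20, 'RBMY' => 10, 'CDY2' => 2 };

# Walk the array reference and look each family up in the hash reference.
for my $family (@$y_gene_families) {
    my $count = exists $y_gene_family_counts->{$family}
              ? $y_gene_family_counts->{$family}
              : 'unknown';
    print "$family\t$count\n";
}

my $first   = $y_gene_families->[0];                 # 'DAZ'
my $n_known = scalar keys %$y_gene_family_counts;    # 4
```

CDY1 has no entry in the counts hash on the slide, so the loop reports it as unknown instead of printing an undefined value.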
14. New Function: ref
- ref: What kind of value does this reference point to?
- print ref($y_gene_families), "\n";         prints ARRAY
- print ref($y_gene_family_counts), "\n";    prints HASH
- $x = 1; print ref($x), "\n";               prints an empty string; ref returns the empty string if its argument is not a reference.
- Return values include SCALAR, ARRAY, HASH, and CODE.
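A common use of ref is to decide how to handle a value whose type is not known in advance. The sketch below assumes a hypothetical helper named `describe`, written only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a value based on what ref() reports.
sub describe {
    my ($value) = @_;
    my $type = ref($value);
    return "plain scalar"                              if $type eq '';
    return "array of " . scalar(@$value) . " items"    if $type eq 'ARRAY';
    return "hash of " . scalar(keys %$value) . " keys" if $type eq 'HASH';
    return "subroutine"                                if $type eq 'CODE';
    return "reference of type $type";
}

print describe(1), "\n";                    # plain scalar
print describe([ 'DAZ', 'TSPY' ]), "\n";    # array of 2 items
print describe({ 'DAZ' => 4 }), "\n";       # hash of 1 keys
print describe(sub { 1 }), "\n";            # subroutine
print describe(\"text"), "\n";              # reference of type SCALAR
```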
15. Two-Dimensional Arrays: Matrices
- @probes = ( [1, 3, 2, 9],
              [2, 0, 8, 1],
              [5, 4, 6, 7],
              [1, 9, 2, 8] );
- print "The probe at row 1, column 2 has value ", $probes[1][2], "\n";
- It prints: The probe at row 1, column 2 has value 8
- $probes_ref = [ [1, 3, 2, 9],
                  [2, 0, 8, 1],
                  [5, 4, 6, 7],
                  [1, 9, 2, 8] ];
- print "The probe at row 1, column 2 has value ", $probes_ref->[1][2], "\n";
- It prints: The probe at row 1, column 2 has value 8
- $probes_ref->[1][2] is a shorthand for $probes_ref->[1]->[2]
- it can also be written as $$probes_ref[1][2]
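Looping over such a matrix means dereferencing one row at a time. The sketch below uses the slide's @probes data; the row sums are an illustrative calculation added here, not something the slides compute.

```perl
use strict;
use warnings;

my @probes = ( [1, 3, 2, 9],
               [2, 0, 8, 1],
               [5, 4, 6, 7],
               [1, 9, 2, 8] );

# Each row is an array reference; dereference it to loop over the columns.
my @row_sums;
for my $i (0 .. $#probes) {
    my $sum = 0;
    $sum += $_ for @{$probes[$i]};
    push @row_sums, $sum;
    print "row $i: @{$probes[$i]} (sum $sum)\n";
}

my $value = $probes[1][2];    # 8, the same element as $probes[1]->[2]
```

Rows and columns are counted from 0, which is why row 1, column 2 is the 8 in the second row.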
16. Complex Data Structure
- $gene = [
      # hash of basic information about the gene: name, discoverer,
      # discovery date and laboratory.
      {
          name       => 'antiaging',
          reference  => [ 'G. Mendel', '1865' ],
          laboratory => [ 'Dept. of Genetics', 'Cornell University', 'USA' ]
      },
      # scalar giving priority
      'high',
      # array of local work history
      [ 'Jim', 'Rose', 'Eamon', 'Joe' ]
  ];
- print "Name is ", $gene->[0]{'name'}, "\n";
- print "Research center is ", $gene->[0]{'laboratory'}[1], "\n";
17. Passing References to Subroutines
- Perl flattens all the arguments to a subroutine into a single list of scalars. This makes it impossible to distinguish between two arrays you might try to pass to a subroutine, as the following example illustrates:
    @aminoacids1 = ('E', 'V', 'L');
    @aminoacids2 = ('D', 'T', 'Y');
    printacids(@aminoacids1, @aminoacids2);
    sub printacids {
        my(@aa1, @aa2) = @_;
        print "Amino acids 1\n";
        print "@aa1\n";
        print "Amino acids 2\n";
        print "@aa2\n";
    }
- This prints out:
    Amino acids 1
    E V L D T Y
    Amino acids 2
    (an empty line: @aa1 absorbed every argument, so @aa2 is empty)
18. Passing References to Subroutines
- Here is how to fix the previous example:
    @aminoacids1 = ('E', 'V', 'L');
    @aminoacids2 = ('D', 'T', 'Y');
    printacids(\@aminoacids1, \@aminoacids2);
    sub printacids {
        my($aa1, $aa2) = @_;
        print "Amino acids 1\n";
        print "@$aa1\n";
        print "Amino acids 2\n";
        print "@$aa2\n";
    }
- This prints out:
    Amino acids 1
    E V L
    Amino acids 2
    D T Y
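The same technique lets a subroutine work on two lists element by element and hand back a structured result. In this sketch `pair_acids` is a hypothetical helper, and it assumes both lists have the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: pair up two amino-acid lists passed by reference.
# Assumes both lists have the same length.
sub pair_acids {
    my ($aa1, $aa2) = @_;    # two array references, kept separate
    my @pairs;
    for my $i (0 .. $#{$aa1}) {
        push @pairs, join('-', $aa1->[$i], $aa2->[$i]);
    }
    return \@pairs;          # return a reference instead of a flat list
}

my @aminoacids1 = ('E', 'V', 'L');
my @aminoacids2 = ('D', 'T', 'Y');

my $pairs = pair_acids(\@aminoacids1, \@aminoacids2);
print "@$pairs\n";    # prints E-D V-T L-Y
```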
19. Perl Object Syntax
- Perl objects are special references that come bundled with a set of functions that know how to act on the contents of the reference.
- For example, in BioPerl, there is a class of objects called Sequence. Internally, the object is a hash reference whose keys point to the DNA string, the name and source of the sequence, and other attributes. The object is bundled with functions that know how to manipulate the sequence, such as revcom(), translate(), subseq(), etc.
- When talking about objects, the bundled functions are known as methods.
20. Perl Objects
- For example, if we have a Sequence object stored in the scalar variable $sequence1, we can call its methods like this:
- $reverse_complement = $sequence1->revcom();
- $first_10_bases = $sequence1->subseq(1,10);
- $protein = $sequence1->translate();
- You will learn later from the BioPerl lecture that revcom() and translate() return new Sequence objects that themselves know how to revcom(), translate() and so forth (subseq(), by contrast, returns the subsequence as a plain string). So if you wanted to get the protein translation of the reverse complement, you could do this:
- $reverse_complement = $sequence1->revcom();
- $protein = $reverse_complement->translate();
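To show what "a reference bundled with methods" looks like underneath, here is a toy class. `MiniSeq` is hypothetical and is not BioPerl's implementation; it only illustrates a blessed hash reference whose revcom() method returns a new object, so calls can be chained.

```perl
use strict;
use warnings;

# MiniSeq is a hypothetical toy class, not part of BioPerl.
package MiniSeq;

sub new {
    my ($class, $dna) = @_;
    my $self = { seq => $dna };    # the object is a hash reference...
    return bless $self, $class;    # ...associated with its class by bless
}

# Accessor for the stored DNA string.
sub seq { my ($self) = @_; return $self->{seq}; }

# Return a new MiniSeq holding the reverse complement.
sub revcom {
    my ($self) = @_;
    (my $rc = reverse $self->{seq}) =~ tr/acgtACGT/tgcaTGCA/;
    return MiniSeq->new($rc);
}

package main;

my $sequence1          = MiniSeq->new('gattcg');
my $reverse_complement = $sequence1->revcom();
my $round_trip         = $sequence1->revcom->revcom->seq;

print $reverse_complement->seq, "\n";    # prints cgaatc
```

Because revcom() returns another MiniSeq, applying it twice gives back the original sequence.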
21. Creating Objects
- Before you can start using objects, you must load their definitions from the appropriate module(s). For example, if we want to load the BioPerl Sequence definitions, we load the appropriate module, which in this case is called Bio::PrimarySeq (you learn this from reading the BioPerl documentation):
    #!/usr/bin/perl -w
    use strict;
    use Bio::PrimarySeq;
- Now you'll probably want to create a new object. There are a variety of ways to do this, and details vary from module to module, but most modules, including Bio::PrimarySeq, do it using the new() method:
    my $sequence1 = Bio::PrimarySeq->new('gattcgattccaaggttccaaa');
22. Creating Objects
- The syntax here is
    ModuleName->new(@args)
  where ModuleName is the name of the module that contains the object definitions.
- The new() method will return an object that belongs to the ModuleName class.
- In the example above, we get a Bio::PrimarySeq object, which is the simplest of BioPerl's various Sequence object types.
23. Creating Objects
- When you call object methods, you can pass a list of parameters, just as you would to a regular function.
- As methods get more complex, parameter lists can get quite long and may have dozens of optional parameters. To make this manageable, many object-oriented modules use a named-parameter style of parameter passing, which looks like this:
    my $result = $object->method( -arg1 => $value1, -arg2 => $value2,
                                  -arg3 => $value3, ... );
- In this case "-arg1", "-arg2", and so on are the names of parameters, and $value1, $value2 are the values of those named parameters. The name/value pairs can occur in any order.
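On the receiving side, named parameters are usually just a list that gets loaded into a hash. The sketch below shows one possible way to do that; `make_record` and its defaults are hypothetical and do not reflect how any particular module handles its arguments.

```perl
use strict;
use warnings;

# make_record is a hypothetical function, shown only to illustrate the style.
sub make_record {
    my %args = (
        -alphabet    => 'dna',    # defaults for optional parameters
        -is_circular => 0,
        @_,                       # caller's name/value pairs override the defaults
    );
    die "-seq is required\n" unless defined $args{-seq};
    return \%args;
}

# Name/value pairs may be given in any order.
my $rec_a = make_record(-id => 'oligo23', -seq => 'gattcg');
my $rec_b = make_record(-seq => 'gattcg', -id => 'oligo23');

print $rec_a->{-id}, " ", $rec_a->{-alphabet}, "\n";    # prints oligo23 dna
```

Loading the pairs into a hash is what makes the order irrelevant and lets optional parameters fall back to defaults.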
24. Creating Objects
- Rather than create a humongous argument list that forces you to remember the correct position of each argument, Bio::PrimarySeq lets you create a new Sequence this way:
    #!/usr/bin/perl -w
    use strict;
    use Bio::PrimarySeq;
    my $sequence1 = Bio::PrimarySeq->new(
        -seq              => 'gattcgattccaaggttccaaa',
        -id               => 'oligo23',
        -alphabet         => 'dna',
        -is_circular      => 0,
        -accession_number => 'X123'
    );