Title: Bioperl modules
1Bioperl modules
2Object Oriented Programming in Perl (1)
- Defining a class
- A class is simply a package with subroutines that
function as methods.
#!/usr/local/bin/perl
package Cat;
sub new { ... }
sub meow { ... }
3Object Oriented Programming in Perl (2)
- Perl Object
- To instantiate an object from a class, call the
class's new method.
$new_object = new ClassName;
- Using Method
- To use the methods of an object, use the ->
operator.
$cat->meow();
4Object Oriented Programming in Perl (3)
- Inheritance
- Declare a class array called @ISA.
- This array stores the name(s) of the parent class(es) of
the new species.
package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new { ... }
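The three slides above can be put together in one small script. This is only a sketch: the method bodies, the name and region fields, and the cat called Tom are made up for illustration and do not come from the slides. It uses the arrow form ClassName->new, which does the same job as new ClassName.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

# Constructor: store the attributes in a hash and bless it into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} // 'cat' };   # 'name' is an illustrative field
    return bless $self, $class;
}

# A method receives the object as its first argument.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package NorthAmericanCat;

# @ISA lists the parent class(es); methods not found here are looked up in Cat.
our @ISA = ('Cat');

sub new {
    my ($class, %args) = @_;
    my $self = Cat::new($class, %args);      # reuse the parent constructor
    $self->{region} = 'North America';       # illustrative extra field
    return $self;
}

package main;

my $cat = NorthAmericanCat->new(name => 'Tom');
print $cat->meow(), "\n";   # meow is inherited from Cat
```

NorthAmericanCat defines no meow of its own, so the call is resolved through @ISA to Cat::meow.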
5Perl Modules
- A Perl module is a reusable package defined in a
library file whose name is the same as the name
of the package.
6Names of perl modules
- Each Perl module has a unique name.
- To minimize name space collision, Perl provides a
hierarchical name space for modules.
- Components of a module name are separated by
double colons (::).
- For example,
- Math::Complex
- Math::Approx
- String::BitCount
- String::Approx
7Module files
- Each module is contained in a single file.
- Module files are stored in a subdirectory
hierarchy that parallels the module name
hierarchy.
- All module files have an extension of .pm.
Module            Is stored in
Config            Config.pm
Math::Complex     Math/Complex.pm
String::Approx    String/Approx.pm
8Module libraries
- The Perl interpreter has a list of directories in
which it searches for modules.
- Global array @INC
- > perl -V
- @INC:
- /usr/local/lib/perl5/5.00503/sun4-solaris
- /usr/local/lib/perl5/5.00503
- /usr/local/lib/perl5/site-perl/5.005/sun4-solaris
- /usr/local/lib/perl5/site-perl/5.005
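The name-to-file mapping and the @INC search can be tried out directly. The sketch below is an assumption-laden illustration, not how the interpreter is implemented internally: module_file and find_module are hypothetical helper names, and the directories printed depend on the local Perl installation.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: turn a module name into its relative file name,
# e.g. Math::Complex becomes Math/Complex.pm.
sub module_file {
    my ($module) = @_;
    return join('/', split /::/, $module) . '.pm';
}

# Hypothetical helper: return the first file under an @INC directory
# that matches the module, or undef if none is found.
sub find_module {
    my ($module) = @_;
    my $rel = module_file($module);
    for my $dir (@INC) {
        next if ref $dir;                        # skip code-reference hooks
        my $path = File::Spec->catfile($dir, $rel);
        return $path if -e $path;
    }
    return;
}

# List the library directories, then locate one module.
print "$_\n" for grep { !ref } @INC;
my $found = find_module('Math::Complex');
print 'Math::Complex: ', (defined $found ? $found : 'not found'), "\n";
```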
9Using Modules
- A module can be loaded by calling the use
function.
- use Foo;
- bar( $a ); # using bar method
- blat( $b ); # using blat method
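Foo is a placeholder name, so here is the same pattern with a real module. The sketch assumes List::Util, which ships with Perl; the sequence lengths are made-up numbers.

```perl
use strict;
use warnings;
use List::Util qw(sum max);   # load the module and import two of its functions

my @lengths = (120, 450, 87); # made-up sequence lengths
my $total   = sum(@lengths);
my $longest = max(@lengths);

print "total=$total longest=$longest\n";   # prints "total=657 longest=450"
```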
10Bioperl toolkit
- Core package (bioperl-live)
- The basic package; it is required by all the
other packages
- Run package (bioperl-run)
- Provides wrappers for executing some 60 common
bioinformatics applications
- DB package (bioperl-db)
- Subproject to store sequence and annotation data
in a BioSQL relational database
- Network package (bioperl-network)
- Parses and analyzes protein-protein interaction
data
- Dev package (bioperl-dev)
- New and exploratory bioperl development
12Bioperl Object-Oriented
- Bioperl takes advantage of OO design to
create a consistent, well-documented object
model for interacting with biological data in the
life sciences.
- Bioperl name space
- The Bioperl package installs everything in the
Bio:: namespace.
- (Where are the packages stored?)
13Bioperl Objects
- Sequence handling objects
- Sequence objects
- Alignment objects
- Location objects
- Other Objects
- 3D structure objects, tree objects and
phylogenetic trees, map objects, bibliographic
objects and graphics objects
14Sequence handling
- Typical sequence handling tasks
- Access the sequence
- Format the sequence
- Sequence alignment and comparison
- Search for similar sequences
- Pairwise comparisons
- Multiple alignment
15Sequence Annotation
- Bio::SeqFeature: A sequence object can have
multiple sequence feature (SeqFeature) objects
(e.g. Gene, Exon, or Promoter objects) associated
with it.
- Bio::Annotation: A Seq object can also have an
Annotation object (used to store database links,
literature references and comments) associated
with it.
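As a rough illustration of the two bullets above, the sketch below attaches one feature and one comment to a sequence. It assumes BioPerl is installed; the sequence, the id, the exon coordinates and the comment text are all made up.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

# A made-up 18-base sequence used only for illustration.
my $seq = Bio::Seq->new(
    -display_id => 'demo_seq',
    -seq        => 'ATGAAATTTGGGCCCTAA',
    -alphabet   => 'dna',
);

# Attach a sequence feature (here tagged as an exon) to the sequence.
my $exon = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 18,
    -strand      => 1,
    -primary_tag => 'exon',
);
$seq->add_SeqFeature($exon);

# Attach a comment through the sequence's annotation collection.
my $comment = Bio::Annotation::Comment->new(-text => 'made-up example record');
$seq->annotation->add_Annotation('comment', $comment);

# Read both back.
for my $feat ($seq->get_SeqFeatures) {
    printf "%s %d..%d\n", $feat->primary_tag, $feat->start, $feat->end;
}
for my $note ($seq->annotation->get_Annotations('comment')) {
    print 'comment: ', $note->text, "\n";
}
```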
16Sequence Input/Output
- The Bio::SeqIO system was designed to make
getting and storing sequences to and from the
myriad of formats as easy as possible.
17Accessing sequence data
- Bioperl supports accessing remote databases as
well as local databases.
- Bioperl currently supports sequence data
retrieval from the GenBank, Genpept, RefSeq,
SwissProt, and EMBL databases.
18Format the sequences
- A SeqIO object can read a stream of sequences in
one format (Fasta, EMBL, GenBank, Swissprot, PIR,
GCG, SCF, phd/phred, Ace, or raw (plain
sequence)), then write to another file in another
format.
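A format conversion along these lines might look like the sketch below. It assumes BioPerl is installed; convert_sequences is a hypothetical helper name, and the argument order (input file, input format, output file, output format) is a choice made for this example.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every sequence from one file and write it
# to another file in a different format. Returns the number converted.
sub convert_sequences {
    my ($in_file, $in_format, $out_file, $out_format) = @_;

    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);

    my $count = 0;
    while (my $seq = $in->next_seq) {   # one sequence object per record
        $out->write_seq($seq);
        $count++;
    }
    $out->close;                        # flush the output file
    return $count;
}

# Run only when four command-line arguments are given.
if (@ARGV == 4) {
    my $n = convert_sequences(@ARGV);
    print "Converted $n sequence(s)\n";
}
```

Because SeqIO works as a stream, the sequences are converted one record at a time rather than loaded into memory all at once.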
19Manipulating sequence data
- $seqobj->display_id() : the human-readable id of
the sequence
- $seqobj->subseq(5,10) : part of the sequence as
a string
- $seqobj->desc() : a description of the
sequence
- $seqobj->trunc(5,10) : truncation from 5 to 10
as new object
- $seqobj->revcom : reverse complements sequence
- $seqobj->translate : translation of the
sequence
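The methods listed above can be tried on a small sequence object. The sketch assumes BioPerl is installed; the sequence, id and description are made up for illustration. Positions are counted from 1 and both ends are included.

```perl
use strict;
use warnings;
use Bio::Seq;

# A made-up coding sequence: ATG AAA TTT GGG CCC TAA
my $seqobj = Bio::Seq->new(
    -display_id => 'demo_seq',
    -desc       => 'made-up coding sequence',
    -seq        => 'ATGAAATTTGGGCCCTAA',
    -alphabet   => 'dna',
);

my $part  = $seqobj->subseq(5, 10);   # plain string
my $trunc = $seqobj->trunc(5, 10);    # new sequence object
my $rc    = $seqobj->revcom;          # new sequence object
my $prot  = $seqobj->translate;       # new sequence object (protein)

print 'id:        ', $seqobj->display_id, "\n";
print 'desc:      ', $seqobj->desc,       "\n";
print 'subseq:    ', $part,               "\n";
print 'trunc:     ', $trunc->seq,         "\n";
print 'revcom:    ', $rc->seq,            "\n";
print 'translate: ', $prot->seq,          "\n";
```

Note that subseq returns a string, while trunc, revcom and translate return objects, so their sequence is read with the seq method.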
20Search result parsing
- The Bio::SearchIO system was designed for
parsing sequence database searches (BLAST, sim4,
waba, FASTA, HMMER, exonerate, etc.)
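A search report is typically walked as result, then hit, then HSP. The sketch below assumes BioPerl is installed and that the report is standard BLAST text output; print_blast_hits is a hypothetical helper name and the report file is supplied by the user on the command line.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per HSP and
# return the number of HSPs seen.
sub print_blast_hits {
    my ($report_file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report_file);

    my $count = 0;
    while (my $result = $in->next_result) {        # one result per query
        while (my $hit = $result->next_hit) {      # one hit per database sequence
            while (my $hsp = $hit->next_hsp) {     # one HSP per aligned segment
                print join("\t",
                    $result->query_name,
                    $hit->name,
                    $hsp->length('total'),
                    $hsp->percent_identity,
                    $hsp->evalue,
                ), "\n";
                $count++;
            }
        }
    }
    return $count;
}

# Run only when a report file name is given on the command line.
print_blast_hits($ARGV[0]) if @ARGV;
```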
21Manipulating alignment
- The Bio::AlignIO system was designed for
manipulating the alignment objects in different
formats including aln, phylip, fasta, etc.
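Alignment files can be converted in much the same way as sequence files. The sketch assumes BioPerl is installed (a release recent enough to provide num_sequences); convert_alignment is a hypothetical helper name. In Bio::AlignIO the format name for Clustal .aln files is clustalw.

```perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical helper: read each alignment from one file, report its size,
# and write it out in another format. Returns the number of alignments.
sub convert_alignment {
    my ($in_file, $in_format, $out_file, $out_format) = @_;

    my $in  = Bio::AlignIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::AlignIO->new(-file => ">$out_file", -format => $out_format);

    my $count = 0;
    while (my $aln = $in->next_aln) {
        printf "%d sequences, %d columns\n", $aln->num_sequences, $aln->length;
        $out->write_aln($aln);
        $count++;
    }
    $out->close;   # flush the output file
    return $count;
}

# Run only when four command-line arguments are given.
convert_alignment(@ARGV) if @ARGV == 4;
```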
22Example: Format the sequences
- Example
- using seq_formating.pl to convert
sequences.gb to another format
23Copy the files to the current directory
Check whether the files are executable
Now, let's look at the GenBank file.
24The home directory on a Windows system.
If you have Notepad installed, click Edit with
Notepad. If not, try to open sequences.gb
with the Notepad program.
26uncheck
27The format of the input sequences.
28The perl script file
30If no arguments are supplied, usage
information appears with instructions.
31<enter>
Program name
Format of the input sequences
Format of the output sequences
Input file
Output file
32Program succeeded! Now it's time to look at the
file generated.
33Use command prompt to run the script
34Type cd<space>c:\BioDownload to enter the
BioDownload folder
35- Type
- dir
- to display the files in the current folder (NOT
ls)
- You should have the following files in the folder
- (you may have other files, but that's fine)
- seq_formating.pl
- sequences.gb.txt
36Type perl<space>seq_formating.pl<space>sequences.gb.txt<space>genbank<space>sequences.fasta<space>fasta
37Output file
38The format of the output sequences.
39Parsing the BLAST output
What's next