Various Career Options Available - PowerPoint PPT Presentation

About This Presentation
Title:

Various Career Options Available

Description:

Most popular on Unix/Linux systems. Excellent language for file and data processing ... Yes, there is documentation, but it can be difficult to see the big picture' ... – PowerPoint PPT presentation

Number of Views:62
Avg rating:3.0/5.0
Slides: 40
Provided by: imtec5
Category:

less

Transcript and Presenter's Notes

Title: Various Career Options Available


1
Introduction to Perl BioPerl Dr G. P. S.
Raghava Bioinformatics Centre IMTECH,
Chandigarh Email raghava_at_imtech.res.in Web
http//imtech.res.in/raghava/
2
Perl
  • Practical Extraction and Report Language
  • Created by Larry Wall
  • Runs on just about every platform
  • Most popular on Unix/Linux systems
  • Excellent language for file and data processing

3
Simple Program
On Unix, this is the location of the Perl
interpreter
!/usr/local/bin/perl This is a comment line.
This program prints Hello World. to the
screen. print Hello world.\n
Comments start with and end with the end of the
line
Program statements are terminated with semicolons
Newline character
4
Control Structures
!/usr/local/bin/perl Print out
0,1,2,3,4,5,6,7,8,9 in this case, x is local
only to the loop because my is used for (my x
0 x lt 10 x) print x if
(x lt 9) print , print \n
5
Control Structures
!/usr/local/bin/perl Demonstrate the foreach
loop, which goes through elements in an
array. my _at_users (bonzo, gorgon, pluto,
sting) foreach user (_at_users) print
user is alright.\n
6
Functions
  • Use sub to create a function.
  • No named formal parameters, assign _at__ to local
    subroutine variables.

!/usr/local/bin/perl Subroutine for
calculating the maximum sub max my max
shift(_at__) shift removes the first value
from _at__ foreach val (_at__) max
val if max lt val Notice perl allows post
ifs return max high
max(1,5,6,7,8,2,4,9,3,4) print High value is
high\n
7
Files
  • File handles are used to access files
  • open and close functions

!/usr/local/bin/perl Open a file and print its
contents to copy.txt my filename
ARGV0 open(MYFILE, ltfilename) lt
indicates read, gt indicates write open(OUTPUT,
gtcopy.txt) while (line ltMYFILEgt) The
ltgt operator reads a line print OUTPUT line
no newline is needed, read from
file close MYFILE Parenthesis
are optional
8
Regular Expressions
  • One of Perls strengths is pattern matching
  • Perls regular expression language is extremely
    powerful, but can be challenging to learn
  • Some examples follow

9
Comma Separated Value Files
!/usr/local/bin/perl Some simple code
demonstrating how to use split and regular
expressions. This code extracts out values in a
CSV file. my filename ARGV0 open(INPUT,
ltfilename) while (ltINPUTgt) chomp
Remove terminating newline my
_at_values split /,/ Split string in _ where ,
exists print The first value is .
values0 . \n close INPUT
10
Objects
  • Perl supports object oriented programming
  • Constructor name is new
  • A class is really a special kind of package.
  • Objects are created with bless

11
Example Class Definition
package Critter constructor sub new my
objref reference to an empty hash
bless objref make it an object in
Critter class return objref return
the reference Instance method, first
parameter is object reference sub display
my self shift just to demonstrate
print Im a critter.\n 1 must end class
with a true value
Store in Critter.pm
12
Example Object Usage
!/usr/local/bin/perl use Critter my critter
new Critter create an object critter-gtdisp
lay display the object display
critter alternative notation
13
BioPerl (http//www.bioperl.org/)
  • Defnition It is a collection of perl modules
    that facilitate the development of perl script
    for bioinformatics.
  • FACTS
  • Started in 1995 by open Bioinformatics Foundation
  • These are reusable codes (subroutine)
  • It does not include any ready to use program
  • It need basic knowledge of perl programming
  • Bioperl is an open source software that is still
    under active development
  • Biojava, Biophython, Biocorba, EMBOSS

14
What is BioPerl?
  • An open source project
  • http//bio.perl.org or http//www.cpan.org
  • A loose international collaboration of
    biologist/programmers
  • Nobody (that I know of) gets paid for this
  • A collection of PERL modules and methods for
    doing a number of bioinformatics tasks
  • Think of it as subroutines to do biology
  • Consider it a tool-box
  • If you need a hammer, it is in there.
  • If you need a 7/16 hex-head wrench, you might
    need to code that yourself.

15
What BioPerl isnt
  • Out of the box solutions to problems
  • You will have to know a little perl, and you will
    have to read documentation
  • Particularly well documented
  • Yes, there is documentation, but it can be
    difficult to see the big picture - or sometimes
    the small picture
  • Fast
  • Generally the overhead involved in making perl
    objects (complicated ones) and loading in
    dependencies tends to make bioperl slow.
  • My own blast parser takes a small fraction of the
    time of the bioperl parser. Pro mine is fast.
    Con I had to write it and maintain it.

16
Application of Bioperl
  • Accessing sequence data from local and remote
    databases
  • Transforming formats of database/ file records
  • Manipulating individual sequences
  • Searching for similar'' sequences
  • Creating and manipulating sequence alignments
  • Searching for genes and other structures on
    genomic DNA
  • Developing machine readable sequence annotation

17
Accessing sequence data from Databases
  • Accessing data from remote databases
  • GenBank
  • Swissprot
  • EMBL
  • Indexing and accessing local databases
  • Create index for your file
  • Access the file by keyword

18
Transforming formats of database/ file records
  • This allows to convert your sequence/alignment
    from one format to another format (eg. Readseq)
  • Change sequence format
  • From GCG, PIR, FASTA etc.
  • To FASTA,GCG. PIR etc.
  • Change format of multiple sequence aignment
  • FASTA, CLUSTAL-W, GCG etc.
  • GCG, FASTA CLUSTAL-w etc.

19
Manipulating sequence
  • Bioperl contains many modules with functions for
    sequence analysis.
  • Sequence data manipulation Display component,
    Reverse complement, Translate/Reverse translate
    from NT to Protein
  • Obtaining basic sequence statistics (Residues,
    MW, CODON)
  • Restriction enzyme mapping
  • Identifying amino acid cleavage sites
  • Manipulation by EMBOSS using BioPerl

20
Searching for similar'' sequences
  • Running BLAST locally (Standalone BLAST)
  • Running BLAST remotely
  • Parsing BLAST and FASTA reports
  • Parsing HMM reports

21
Creating and manipulating sequence alignments
  • Aligning 2 sequences with Smith-Waterman (pSW)
  • Aligning 2 sequences with Blast using bl2seq and
    AlignIO
  • Aligning multiple sequences (Clustalw.pm,
    TCoffee.pm)
  • Manipulating / displaying alignments (SimpleAlign)

22
Example 1
!/usr/local/bin/perl Collect documents from
PubMed containing the term Breast Cancer and
print them. use BioBiblio my biblio new
BioBiblio my collection biblio-gtfind(breas
t cancer) while (collection-gthas_next)
there are underlines before next print
collection-gtget_next
23
Example 2
!/usr/local/bin/perl Get a sequence from
RefSeq by accession number use BioDBRefSeq
gb new BioDBRefSeq seq
gb-gtget_Seq_by_acc(NM_007304) print
seq-gtseq()
24
A simple script
  • !/opt/perl/bin/perl -w
  • bioperl_gb_fetch.pl
  • use strict
  • use BioDBGenBank
  • my gb BioDBGenBank-gtnew()
  • my seq_obj gb-gtget_Seq_by_acc(AF303112)
  • print seq_obj-gtdisplay_id(), \n
  • print seq_obj-gtseq(),\n

25
Changing Formats
  • !/opt/perl/bin/perl -w
  • genbank_to_fasta.pl
  • use BioSeqIO
  • my input BioSeqIOnew-gt(-file gt
    ARGV0,

  • -format gt GenBank)
  • my output BioSeqIOnew-gt(-file gt
    gtoutput.fasta,

  • -format gt Fasta)
  • while (my seq input-gtnext_seq())
  • output-gtwrite_seq(seq)

26
Parsing Blast Reports
  • One of the strengths of BioPerl is its ability to
    parse complex data structures. Like a blast
    report.
  • Unfortunately, there is a bit of arcane
    terminology.
  • Also, you have to think like bioperl, in order
    to figure out the syntax.
  • This next script might get you started

27
Parse BLAST output
  • !/opt/perl/bin/perl -w
  • bioperl_blast_parse.pl
  • program prints out query, and all hits with
    scores for each blast result
  • use BioSearchIO
  • my record BioSearchIO-gtnew(-format gt
    blast, -file gt ARGV0)
  • while (my result record-gtnext_result)
  • print gt, result-gtquery_name, ,
    result-gtquery_description, \n
  • my seen 0
  • while (my hit result-gtnext_hit)
  • print \t, hit-gtname, \t, hit-gtbits, \t,
    hit-gtsignificance, \n seen
  • if (seen 0 ) print No Hits Found\n

28
SearchIO parsers
  • SearchIO can parse (and reformat) several formats
    containing alignment or similarity data
  • blast
  • xml formatted blast (still a little wonky)
  • psi-blast
  • exonerate
  • WABA
  • FASTA
  • HMMER
  • The interface is the same for all of these,
    making your life a little easier.

29
Some advanced BioPerl
  • What if you want to draw pretty images of blast
    reports? BioGraphics to the rescue
  • Input - A blast report (single query)
  • Output
  • Script in PERL13/blast_to_pngImage.pl

30
Some really advanced BioPerl
  • What if you want to display an entire genome with
    annotation? Grow your own genome project
  • We can do that all in BioPerl (and a mySQL or
    Oracle database)
  • GBrowse - the Generic Genome Browser
  • Allows any feature to be displayed on a reference
    sequence
  • Many different styles of Glyphs, so all
    features can be drawn in a different style
  • Allows user to zoom in and out on reference
    sequence
  • User can select which features to display
  • User can upload their own features to be
    displayed on your reference sequence
  • And more!!!
  • http//www.bch.msu.edu/cgi-bin/gbrowse
  • http//www.gmod.org/

31
GBrowse (Generic Genome Browser)
32
BioPerl Pros and Cons
  • It can be a substantial investment in time to
    learn how to use bioperl properly
  • Once you get used to it, it is pretty simple to
    do some complicated things
  • There are a lot of nice tools in there, and
    (usually) somebody else takes care of fixing
    parsers when they break
  • BioPerl code is portable - if you give somebody a
    script, it will probably work on their system
  • BioPerl, unfortunately, has high overhead.
    Sometimes it is quicker and easier to write it
    yourself

33
So what is BioPerl? (continued)
  • 551 modules (incl. 82 interface modules)
  • 37 module groups
  • 79,582 lines of code (223,310 lines total)
  • 144 lines of code per module
  • For More info BioPerl Module Listing

34
Some statistics
35
Downloading modules
  • Modules can be obtained from
  • www.CPAN.org (Perl Modules)
  • www.BioPerl.org (BioPerl Modules)
  • Downloading modules from CPAN
  • Interactive mode
  • perl -MCPAN -e shell
  • Batch mode
  • use CPAN
  • clean, install, make, recompile, test

36
Installing Modules
  • Steps for installing modules
  • Uncompress the module
  • Gunzip file.tar.gz
  • Tar xvf file.tar
  • perl Makefile.PL
  • make
  • make test
  • make install

37
Directory Structure
  • BioPerl directory structure organization
  • Bio/ BioPerl modules
  • models/ UML for BioPerl classes
  • t/ Perl built-in tests
  • t/data/ Data files used for the tests
  • scripts/ Reusable scripts that use BioPerl
  • scripts/contributed/ Contributed scripts not
    necessarily integrated into BioPerl.
  • doc/ "How To" files and the FAQ as XML

38
Platforms
  • HP-UX / Itanium ('ia64') 11.0
  • HP-UX / PA-RISC 11.0
  • MacOS
  • MacOS X
  • CygWin NT_5 v. 1.3.10 on Windows 2000 5.00
  • Win32, WinNT i386
  • IRIX64 6.5 SGI
  • Solaris 2.8 UltraSparc
  • OpenBSD 2.8 i386
  • RedHat Linux 7.2 i686
  • Linux i386

39
References
  • Programming Perl by Wall, Christiansen, and
    Schwartz (OReilly)
  • Learning Perl by Schwartz and Phoenix (OReilly)
  • Beginning Perl for Bioinformatics by Tisdall
    (OReilly)
  • http//www.perl.com
  • http//www.bioperl.org
Write a Comment
User Comments (0)
About PowerShow.com