Présentation PowerPoint

1
Introduction to perl programming: the minimum to know!
Fredj Tekaia, Institut Pasteur, tekaia@pasteur.fr
Bioinformatic and Comparative Genome Analysis Course
HKU-Pasteur Research Centre - Hong Kong, China
August 17 - August 29, 2009
2
perl
A basic program:
    #!/bin/perl
    # Program to print a message
    print 'Hello world.';    # Print a message
3
Variables, Arrays
    $val = 9;
    $val = '9';
    $val = 'ABC transporter';
Names are case sensitive: $val is different from $Val.
4
Operations and Assignment
Perl uses arithmetic operators:
    $a = 1 + 2;     # Add 1 and 2 and store in $a
    $a = 3 - 4;     # Subtract 4 from 3 and store in $a
    $a = 5 * 6;     # Multiply 5 and 6
    $a = 7 / 8;     # Divide 7 by 8 to give 0.875
    $a = 9 ** 10;   # Nine to the power of 10
    $a = 5 % 2;     # Remainder of 5 divided by 2
    $a++;           # Return $a and then increment it
    $a--;           # Return $a and then decrement it
For strings perl has, among others:
    $a = $b . $c;   # Concatenate $b and $c
    $a = $b x $c;   # $b repeated $c times
5
To assign values perl includes:
    $a = $b;        # Assign $b to $a
    $a += $b;       # Add $b to $a
    $a -= $b;       # Subtract $b from $a
    $a .= $b;       # Append $b onto $a
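The operators on the two slides above can be tried in one short script. This is only a sketch: the variable names ($sum, $codon, $repeat and so on) are chosen here for illustration and do not come from the course.

```perl
use strict;
use warnings;

my $sum       = 1 + 2;     # addition
my $power     = 2 ** 10;   # exponentiation
my $remainder = 5 % 2;     # remainder of the division

my $count  = 5;
my $before = $count++;     # $before receives 5, then $count becomes 6

my $codon  = 'AT' . 'G';   # concatenation
my $repeat = 'CA' x 3;     # repetition

my $seq = 'ATG';
$seq .= 'TAA';             # append onto $seq

print "$sum $power $remainder $before $count $codon $repeat $seq\n";
```

Note that the post-increment returns the old value; writing ++$count instead would increment first and return the new value.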
6
Array variables
An array variable is a list of scalars (i.e. numbers and/or strings). Array names are prefixed by @.
    @SEQNAME = ("MG001", "MG002", "MG003");
    $SEQNAME[2]     # MG003. Attention: indices are 0, 1, 2, ...
    @num = (0,1,2,3);
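A minimal sketch of zero-based indexing, reusing the sequence names from this slide. The name "MG004" and the variable names are illustrative additions.

```perl
use strict;
use warnings;

my @SEQNAME = ("MG001", "MG002", "MG003");

my $third      = $SEQNAME[2];        # indices start at 0, so index 2 is the third item
my $n_items    = scalar(@SEQNAME);   # number of elements
my $last_index = $#SEQNAME;          # index of the last element

push(@SEQNAME, "MG004");             # append one element at the end

print "$third $n_items $last_index $SEQNAME[-1]\n";
```

A negative index counts from the end, so $SEQNAME[-1] is always the last element.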
7
@L_CODONS = ('TTT','TTC','TTA','TTG',
'CTT','CTC','CTA','CTG',
'ATT','ATC','ATA','ATG',
'GTT','GTC','GTA','GTG',
'TCT','TCC','TCA','TCG',
'CCT','CCC','CCA','CCG',
'ACT','ACC','ACA','ACG',
'GCT','GCC','GCA','GCG',
'TAT','TAC','TAA','TAG',
'CAT','CAC','CAA','CAG',
'AAT','AAC','AAA','AAG',
'GAT','GAC','GAA','GAG',
'TGT','TGC','TGA','TGG',
'CGT','CGC','CGA','CGG',
'AGT','AGC','AGA','AGG',
'GGT','GGC','GGA','GGG');
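The same 64 codons can also be generated rather than typed. The sketch below reproduces the order of the list above (second base varies slowest, third base fastest); the names @bases and @codons are assumptions made for this example.

```perl
use strict;
use warnings;

my @bases = ('T', 'C', 'A', 'G');
my @codons;

# Outer loop: second base; middle loop: first base; inner loop: third base.
foreach my $second (@bases) {
    foreach my $first (@bases) {
        foreach my $third (@bases) {
            push(@codons, $first . $second . $third);
        }
    }
}

print scalar(@codons), " codons, first $codons[0], last $codons[-1]\n";
```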
8
@AA = ('A','R','N','D','C','Q','E','G','H','I','L',
       'K','M','F','P','S','T','W','Y','V','B');
@mm = ('a','r','n','d','c','q','e','g','h','i','l','k',
       'm','f','p','s','t','w','y','v','b');
9
Associative arrays (hash tables)
Ordinary list arrays allow us to access their elements by number. The first element of array @AA is $AA[0]. The second element is $AA[1], and so on. But perl also allows us to create arrays which are accessed by string. These are called associative arrays. The name of an associative array is prefixed by a % sign.
10
    %ages = ("Michael", 39, "Angie", 27,
             "Willy", "21 years",
             "The Queen Mother", 108);
    $ages{"Michael"}            # Returns 39
    $ages{"Angie"}              # Returns 27
    $ages{"Willy"}              # Returns "21 years"
    $ages{"The Queen Mother"}   # Returns 108
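A short sketch of how such a table is typically traversed, using the ages example from this slide. keys and exists are standard Perl built-ins; the variable names are illustrative.

```perl
use strict;
use warnings;

my %ages = ("Michael", 39, "Angie", 27,
            "Willy", "21 years",
            "The Queen Mother", 108);

# keys returns the names in no particular order, so sort them for stable output.
my @names = sort keys %ages;

foreach my $name (@names) {
    print "$name: $ages{$name}\n";
}

# exists tests whether a key is present without creating it.
my $has_angie = exists $ages{"Angie"} ? 1 : 0;
my $has_fred  = exists $ages{"Fred"}  ? 1 : 0;
```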
11
File handling
A script (cat.pl) equivalent to the UNIX cat:
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>) {
        print $_;
    }
    close(FILE);
To use it, make it executable and run it:
    chmod a+x cat.pl
    ./cat.pl
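A slightly more defensive variant is sketched below. It is not the course's script: it assumes the file name is given on the command line, uses a lexical filehandle with the three-argument form of open, and stops with a message if the file cannot be opened. The sub name cat_file is hypothetical.

```perl
use strict;
use warnings;

# Print every line of a file and return the number of lines printed.
sub cat_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my $n_lines = 0;
    while (my $line = <$fh>) {
        print $line;
        $n_lines++;
    }
    close($fh);
    return $n_lines;
}

# Run as: perl cat.pl GMG.pep
cat_file($ARGV[0]) if @ARGV;
```

Checking the result of open matters in practice: without it, a misspelled file name silently produces no output.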
12
split
A very useful function in perl: split breaks up a string and places the pieces into an array.
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>) {
        @tab = split(/\s+/, $_);
        print "$tab[0]\n";
    }
    close(FILE);
13
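    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>) {
        @tab = split(/\s+/, $_, 2);
        $NOM{$tab[0]} = $tab[1];
        print $NOM{$tab[0]};
    }
    close(FILE);
General form, where n is the maximum number of fields returned:
    @tab = split(/\s+/, $_, n);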
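The effect of the third argument is easiest to see on a single line. In this sketch the header line is made up for illustration, and whitespace runs are split with /\s+/.

```perl
use strict;
use warnings;

# Hypothetical FASTA header line: a name followed by a free-text description.
my $line = ">MG001 example protein description";

# Without a limit, every whitespace-separated word becomes a field.
my @fields = split(/\s+/, $line);

# With a limit of 2, the line is cut once: the name, then the rest untouched.
my ($name, $rest) = split(/\s+/, $line, 2);

print scalar(@fields), " fields; name=$name; rest=$rest\n";
```

The limit is what keeps a multi-word description in one piece, which is convenient when it is stored as a hash value keyed by the sequence name.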
14
Control structures
foreach
To go through each item of an array or other list-like structure (such as lines in a file) perl uses the foreach structure. This has the form:
    foreach $nom (@SEQNAME)     # Visit each item in turn and call it $nom
    {
        print "$nom\n";         # Print the item
    }
15
    foreach $j (0 .. 2)         # Visit each value in turn and call it $j
    {
        print "$SEQNAME[$j]\n"; # Print the item
    }
    foreach $j (0 .. $#AA)      # Visit each index of @AA and call it $j
    {
        print "$AA[$j]\n";      # Print the item
    }
16
Testing
Here are some tests on numbers and strings.
    $a == $b        # Is $a numerically equal to $b?
                    # Beware: Don't use the = operator.
    $a != $b        # Is $a numerically unequal to $b?
    $a eq $b        # Is $a string-equal to $b?
    $a ne $b        # Is $a string-unequal to $b?
You can also use logical and, or and not:
    ($a && $b)      # Are $a and $b both true?
    ($a || $b)      # Is either $a or $b true?
    !($a)           # Is $a false?
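The difference between numeric and string comparison shows up as soon as two values are written differently but mean the same number. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $x = "1.0";
my $y = "1";

my $num_equal = ($x == $y) ? 1 : 0;   # compared as numbers: 1.0 equals 1
my $str_equal = ($x eq $y) ? 1 : 0;   # compared as strings: "1.0" differs from "1"

my $both   = ($num_equal && $str_equal) ? 1 : 0;   # logical and
my $either = ($num_equal || $str_equal) ? 1 : 0;   # logical or

print "numeric: $num_equal, string: $str_equal, and: $both, or: $either\n";
```

For sequence names and other text, eq and ne are the safe choice; == on non-numeric strings triggers a warning under use warnings.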
17
for
    for (initialise; test; inc)
    {
        first_action;
        second_action;
        etc....
    }
    for ($i = 0; $i < 10; $i++)     # Start with $i = 0
                                    # Do it while $i < 10
                                    # Increment $i before repeating
    {
        print "$i\n";
    }
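The increment step does not have to be 1. As a sketch, the loop below walks along a made-up coding sequence three bases at a time and collects its codons; $gene and @codons are names invented for this example.

```perl
use strict;
use warnings;

my $gene = "ATGGCATAA";   # hypothetical 9-base sequence
my @codons;

# Start at position 0, continue while a full codon remains, advance by 3.
for (my $i = 0; $i + 3 <= length($gene); $i += 3) {
    push(@codons, substr($gene, $i, 3));
}

print join(" ", @codons), "\n";
```

The test $i + 3 <= length($gene) drops any incomplete codon at the end of the sequence.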
18
Conditionals
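    if ($a)
    {
        print "The string is not empty\n";
    }
    else
    {
        print "The string is empty\n";
    }
A script that prints only the lines containing ">" (the header lines of a fasta file):
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>) {
        print $_ if ( m/>/ );
    }
    close(FILE);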
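The same test can drive an if/else that treats header lines and sequence lines differently. This sketch reads a small made-up FASTA text through an in-memory filehandle so that it runs without any input file; the data and variable names are assumptions.

```perl
use strict;
use warnings;

# Made-up FASTA content: two sequences, the first spread over two lines.
my $fasta = ">MG001\nMKVL\nAAGT\n>MG002\nMSTP\n";

open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!\n";
my ($n_seq, $n_residues) = (0, 0);
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^>/) {
        $n_seq++;                       # header line: one more sequence
    } else {
        $n_residues += length($line);   # sequence line: add its length
    }
}
close($fh);

print "$n_seq sequences, $n_residues residues\n";
```

Anchoring the pattern with ^ restricts the match to lines that start with ">".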
19
String matching
    $a eq $b        # Is $a string-equal to $b?
    $a ne $b        # Is $a string-unequal to $b?
Here are some special RE characters and their meaning:
    .       # Any single character except a newline
    ^       # The beginning of the line or string
    $       # The end of the line or string
    *       # Zero or more of the last character
    +       # One or more of the last character
    ?       # Zero or one of the last character
20
Some more special characters
    \n      # A newline
    \t      # A tab
    \w      # Any alphanumeric (word) character.
            # The same as [a-zA-Z0-9_]
    \W      # Any non-word character.
            # The same as [^a-zA-Z0-9_]
    \d      # Any digit. The same as [0-9]
    \D      # Any non-digit. The same as [^0-9]
    \s      # Any whitespace character: space, tab, newline, etc
    \S      # Any non-whitespace character
    \b      # A word boundary, outside [] only
    \B      # No word boundary
21
Characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match one of those then you have to precede it by a backslash (\). So:
    \|      # Vertical bar
    \[      # An open square bracket
    \)      # A closing parenthesis
    \*      # An asterisk
    \^      # A caret symbol
    \/      # A slash
    \\      # A backslash
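A few of these characters in use. This is a sketch with made-up strings; the =~ operator applies a pattern to a variable, and a match in list context returns the parenthesised captures.

```perl
use strict;
use warnings;

# Anchors and \d: "MG" followed only by digits.
my $id = "MG001";
my $looks_like_id = ($id =~ /^MG\d+$/) ? 1 : 0;

# An escaped asterisk matches a literal "*".
my $expr = "a*b";
my $has_star = ($expr =~ /a\*b/) ? 1 : 0;

# Unescaped, a* means "zero or more a", so "b" alone is enough to match.
my $unescaped = ("b" =~ /^a*b$/) ? 1 : 0;

# Escaped slash and dot: capture the file name after the last "/".
my $path = "data/GMG.pep";
my ($file) = $path =~ /\/(\w+\.pep)$/;

print "$looks_like_id $has_star $unescaped $file\n";
```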
22
Substitution and translation
    s/london/London/                # Substitute in $_
    $sentence =~ s/london/London/   # Substitute in $sentence
Global substitution uses the g option; the i option stands for "ignore case":
    s/london/London/gi
Translation
    $sentence =~ tr/abc/edf/
    tr/a-z/A-Z/     # converts $_ to upper case
    tr/A-Z/a-z/     # converts $_ to lower case
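A small sketch of both operators on made-up strings. It also uses the fact that tr returns the number of characters it matched, and that an empty replacement list leaves the string unchanged, which gives a quick way to count a base.

```perl
use strict;
use warnings;

my $sentence = "london, London and LONDON";

# Copy first, then substitute in the copy, so $sentence stays intact.
(my $first_only = $sentence) =~ s/london/London/;      # first match, case sensitive
(my $all        = $sentence) =~ s/london/London/gi;    # every match, any case

my $dna = "atggca";
(my $upper = $dna) =~ tr/a-z/A-Z/;    # translate to upper case

my $n_g = ($upper =~ tr/G//);         # count the G's without changing $upper

print "$first_only\n$all\n$upper has $n_g G\n";
```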
23
Simple scripts
  • given a nucleotide sequence: base composition
  • given a protein sequence: amino-acid composition
  • given a nucleic acid database (in fasta format): base composition
  • given a protein database (in fasta format): amino-acid composition
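One possible shape for the composition scripts listed above, as a sketch rather than the course's solution. The sub name composition and the test sequence are assumptions; the same sub works for bases and for amino acids because it simply counts characters.

```perl
use strict;
use warnings;

# Count how often each character occurs in a sequence (case-insensitive).
sub composition {
    my ($seq) = @_;
    my %count;
    foreach my $char (split(//, uc($seq))) {
        $count{$char}++;
    }
    return %count;
}

my $seq  = "ATGCGATA";    # made-up nucleotide sequence
my %comp = composition($seq);

foreach my $base (sort keys %comp) {
    printf("%s %d %.3f\n", $base, $comp{$base}, $comp{$base} / length($seq));
}
```

For a whole fasta database the counts would be accumulated over all sequence lines, skipping the lines that start with ">".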

24
  • sequence size (bases or amino acids)
  • extract a portion of a sequence (start position, end position)
  • extract a sequence by name (from a database of sequences)
  • codon count of a gene sequence
  • given an allxxseqnew file: script to compute frequencies of multiple matches

See splitfasta.pl and splitdnafasta.pl.
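The first three tasks can be sketched together. This is not splitfasta.pl, whose contents are not shown here: it is an independent example that loads a made-up FASTA text into a hash keyed by sequence name, then reports a size and extracts a portion using 1-based start and end positions.

```perl
use strict;
use warnings;

# Made-up FASTA text, read through an in-memory filehandle.
my $fasta = ">seq1 first example\nATGGCA\nTAA\n>seq2 second example\nATGTGA\n";

my (%seq, $name);
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) {
        $name = $1;              # the name is the first word after ">"
        $seq{$name} = '';
    } elsif (defined $name) {
        $seq{$name} .= $line;    # join the lines of a multi-line sequence
    }
}
close($fh);

my $size = length($seq{seq1});                 # sequence size
my ($start, $end) = (4, 6);                    # 1-based positions
my $part = substr($seq{seq1}, $start - 1, $end - $start + 1);

print "seq1 has $size bases; positions $start to $end are $part\n";
```

substr counts from 0, hence the $start - 1; the length of the portion is $end - $start + 1.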
25
  • given an allxxseqnew file: script to compute frequencies of multiple matches

Data manipulation exercises:
  • home directory, mkdir, cd, path, pwd, find
  • naming convention: DB.pep, DB.dna, seq.dna, seq.prt
  • use tab as the separator
  • using sed and grep
  • the fasta sequence format
  • count the number of sequences in a fasta-format sequence database (grep ">" DB.pep | wc -l)
  • replace one character with another
  • extract the sequences from a database (fasta-format file) (splitfasta.pl, splitdnafasta.pl)
  • extract one part of a sequence (the sequence is in fasta format)
  • amino-acid frequencies of a protein sequence
  • base frequencies of a nucleotide sequence
  • size of a sequence
  • sizes of the sequences in a database
  • codon frequencies of a coding sequence
  • codon volatility; codon/amino-acid correspondence