Title: Nimble Perl Programming Using Scriptome
1Nimble Perl Programming Using Scriptome
- Yannick Pouliot, PhD
- Bioresearch Informationist
- Lane Medical Library & Knowledge Management Center
- 1/22/2009
2Objectives
- Determining whether Scriptome can:
  - Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone
  - Help you learn Perl
Also, we'll be using anonymous polling to determine whether you're happy with the material and the speed of delivery.
And don't worry: this experiment won't hurt a bit!
3So What Is Scriptome?
- Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
- Originally developed by Harvard's FAS Center for Systems Biology
- Maintained and extended by many more volunteers not associated with Harvard
4Why Bother With Scriptome?
- Code is visible, so you can learn how to do things in Perl (or how not to)
- Can handle arbitrarily large files
  - No size limitations, unlike e.g. Excel
- Free, and runs on everything: PC, Mac, Linux
- It's programmatic!
  - Much faster than manual operations
  - You can string operations together and save them in e.g. a .bat file
5How Do You Use Scriptome?
- You tell Scriptome which function you want it to perform (more later)
- You can also string Scriptome functions into a protocol
- Input: Scriptome operates on text files
  - No binary files, but you could add that capability yourself
  - E.g., process Excel files in native form using Perl modules such as Spreadsheet::ParseExcel
- Output: printed to the command line or written to another file
6Scriptome: Pick Your Flavor
http://lane.stanford.edu/howto/index.html?id_1257
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
7Installing Scriptome - Windows
- Download Scriptome_exe.tar.gz using this link: http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
  - Final location: I suggest C:/Program Files/Scriptome
- Create a directory named Scriptome
- Decompress Scriptome_exe.tar.gz by double-clicking
  - Notice the four files inside
- Update the PATH variable
  - Add this string at the END of the contents of the PATH variable:
  - ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
8Scriptome Usage
- 1. Using a specific tool
  - Scriptome flags toolname input_filenames > output_filename
  - Example: Scriptome -t change_fasta_to_tab LONGhmcad.fst
- 2. Finding a tool by type
  - Scriptome -t tooltype
  - where tooltype is one of:
    - Calc
    - Choose
    - Sort
    - Fetch
    - Merge
    - Change
  - Example: Scriptome -t Calc
Let's examine each area briefly before going over specifics.
9Polling Time: How's the speed?
- 1. Too fast
- 2. Too slow
- 3. More or less OK
- 4. I feel nauseous
10Examples and noteworthy tools
11Calc Tool Examples - 1
- Compute column sums
- Scriptome -t calc_col_sum SubjectData1.tab
  - select columns to add
- IMPORTANT: column numbers start at 0, not 1
- Note: the Perl code is visible, so it is easy to modify and expand
perl -e " $col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col] } warn qq{\nSum of column $col for $. lines\n\n}; print qq{$sum\n} " file.tab
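For readers who find the one-liner hard to parse, the same logic can be spelled out as a short script. This is only a sketch, not Scriptome's own code: the sub name sum_column and the sample data are made up for illustration, and the column index is 0-based as in the one-liner.

```perl
use strict;
use warnings;

# Sum one column of tab-delimited text read from a filehandle.
# $col is 0-based, matching $F[$col] in the one-liner.
# sum_column is a hypothetical name chosen for this sketch.
sub sum_column {
    my ($fh, $col) = @_;
    my ($sum, $lines) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n//;              # strip Unix or Windows line endings
        my @fields = split /\t/, $line;  # one array element per column
        $sum += $fields[$col] // 0;      # a missing cell counts as 0
        $lines++;
    }
    return ($sum, $lines);
}

# Demo on in-memory data standing in for a file such as SubjectData1.tab.
my $data = "a\t1\t10\nb\t2\t20\nc\t3\t30\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my ($sum, $lines) = sum_column($fh, 1);
close $fh;
print "Sum of column 1 for $lines lines: $sum\n";
```

Run as written, this prints "Sum of column 1 for 3 lines: 6". To read a real file, open it by name instead of opening the in-memory string.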
12Calc Tool Examples - 2
- Compute row sums
- Scriptome -t calc_row_sum SubjectData1.tab
  - enter 1 for column 1, 2 for column 2, etc.
perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] } print qq{$_\t$sum\n} } warn qq{\nSum of columns @cols for each line ($. lines)\n\n} " in.tab
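The row-sum logic can likewise be written out step by step. The sketch below is an illustration rather than Scriptome's code: row_sum is a made-up name, the rows are invented sample data, and the column indices passed in are used directly as Perl array indices (0-based), as $F[$col] is in the one-liner.

```perl
use strict;
use warnings;

# Return the sum of the chosen cells of one tab-delimited line.
# @cols holds 0-based array indices, as in $F[$col] in the one-liner.
# row_sum is a hypothetical name chosen for this sketch.
sub row_sum {
    my ($line, @cols) = @_;
    $line =~ s/\r?\n//;                  # strip the line ending, if any
    my @fields = split /\t/, $line;
    my $sum = 0;
    $sum += $fields[$_] // 0 for @cols;  # a missing cell counts as 0
    return $sum;
}

# Demo: append the sum of cells 1, 2 and 3 to each row.
my @rows = ("s1\t1\t2\t3", "s2\t4\t5\t6");
for my $row (@rows) {
    print join("\t", $row, row_sum($row, 1, 2, 3)), "\n";
}
```

Each input row is printed with its sum appended as an extra last column (6 for the first row, 15 for the second), which mirrors what the one-liner prints.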
13Change Tool Examples - 1
perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq{\n} } s/ |$/\t/; $count++; $_ .= qq{\t} } else { s/ //g; $len += length($_) } print $_ } print qq{\n}; warn qq{\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n} " seqs.fna
- Create tab-delimited file from FASTA file
- Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
  - change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files
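The FASTA-to-tab one-liner on this slide packs several steps into one statement, so a longer version may help when learning from it. The sketch below is an assumption-laden rewrite, not the tool itself: fasta_to_tab is a hypothetical name, the sequences are invented, and it collects records in memory instead of streaming them as the one-liner does.

```perl
use strict;
use warnings;

# Convert FASTA text to one tab-delimited line per record:
#   id <TAB> description <TAB> sequence
# fasta_to_tab is a hypothetical helper name, not part of Scriptome.
sub fasta_to_tab {
    my ($fasta) = @_;
    my @records;
    for my $line (split /\r?\n/, $fasta) {
        next if $line eq '';
        $line =~ s/\t/ /g;                  # stray tabs would corrupt the columns
        if ($line =~ /^>(\S*)\s*(.*)$/) {   # header line: ">id description"
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {                  # sequence line of the current record
            $line =~ s/ //g;                # drop spaces inside the sequence
            $records[-1]{seq} .= $line;
        }
    }
    return map { join("\t", $_->{id}, $_->{desc}, $_->{seq}) } @records;
}

# Demo with two made-up records; the second has no description.
my $fasta = ">seq1 first test sequence\nACGT\nACGT\n>seq2\nGGCC\n";
print "$_\n" for fasta_to_tab($fasta);
```

The demo prints two lines: seq1 with its description and the joined sequence ACGTACGT, then seq2 with an empty description column and GGCC. Keeping the empty column means every record has the same number of fields, which is what downstream tab-based tools expect.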
14Change Tool Examples - 2
- Change rows to columns or vice versa
- Scriptome -t change_transpose_table SubjectData1.tab
- Note: change_transpose_table operates on tab-delimited files
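No code is shown for the transpose tool on this slide, so here is one way the operation could be written in plain Perl. It is a sketch under stated assumptions, not the code behind change_transpose_table: transpose_table is a made-up name, the table is invented sample data, and short rows are padded with empty cells.

```perl
use strict;
use warnings;

# Transpose a table given as a list of tab-delimited lines.
# Short rows are padded with empty cells so the result stays rectangular.
# transpose_table is a hypothetical name chosen for this sketch.
sub transpose_table {
    my @rows;
    my $width = 0;
    for my $line (@_) {
        (my $clean = $line) =~ s/\r?\n//;      # copy, then strip the line ending
        my @fields = split /\t/, $clean, -1;   # -1 keeps trailing empty cells
        $width = @fields if @fields > $width;  # track the widest row
        push @rows, \@fields;
    }
    my @out;
    for my $c (0 .. $width - 1) {
        # Column $c of every input row becomes one output row.
        push @out, join("\t", map { $_->[$c] // '' } @rows);
    }
    return @out;
}

# Demo: three rows of three columns.
my @table = ("id\tage\tweight", "s1\t34\t70", "s2\t51\t82");
print "$_\n" for transpose_table(@table);
```

The first output row is the old first column (id, s1, s2), followed by the age and weight rows. Note that this sketch holds the whole table in memory, unlike the line-by-line one-liners shown for the Calc tools.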
15Change Tool Examples - 3
- Convert a sequence file from one format to another
- Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
  - enter fasta as input format (no quotes)
  - enter genbank as output format (no quotes)
- change_bio_format_to_bio_format addresses the common problem of converting formats
- Important: requires BioPerl to be installed
Notice anything interesting?
perl -MBio::SeqIO -e " $informat = qq{genbank}; $outformat = qq{fasta}; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile, -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++ } } warn qq{Translated $count sequences from $informat to $outformat format\n} " myseqs.genbank > myseqs.fasta
16Conclusions
- Scriptome is:
  - A good solution for manipulating medium to large data files quickly and reliably
  - A way to learn Perl in a real context (no toy problems)
  - Able to perform a wide range of tasks, from simple, generic file manipulations to complex bio-specific tasks
17Resources
- For Perl help, see the resources in the workshop description for Lane's Perl Programming for Biologists
- Some recommended titles
18Polling Time: Do you think Scriptome will be useful to your research?
- 1. Definitely
- 2. Likely
- 3. Not likely
- 4. No way
- 5. What's the question again?