Introduction to UNIX/Commandline/PERL - PowerPoint PPT Presentation

Provided by: niravme


1
Computing Concepts for Bioinformatics
  • More about UNIX filenames
  • Introduction to EMBOSS
  • EMBOSS input/output
  • Editing Files
  • The Programming Process
  • PERL concepts
  • Your first PERL program

http://amadeus.biosci.arizona.edu/nirav
2
Your new amadeus prompt!
  • To make navigation easier we have modified your
    prompt from
  • To
  • Now you can see the directory you are in; the ~
    symbol is your home directory
  • Since I did a cd to bos-db, my prompt now shows
    bos-db at the end (earlier it was ~)

3
More About UNIX Filenames
  • Many filenames include an extension
  • e.g. run_me.pl, my_seq.tfa, well.whatever
  • Filenames and extensions should reflect the
    contents and format of the file
  • When typing a filename use the tab key for
    automatic filename completion
  • e.g. cd pr<TAB> should fill in the rest
  • If it does not, type a few more characters
    (you may have 2 directories starting with pr)
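Perl can split a filename into its name and extension in the same spirit. Below is a minimal sketch using the core File::Basename module. The helper name split_extension is made up for this example, and it treats everything from the last dot onward as the extension:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Split a filename into its base name and its extension.
# The pattern qr/\.[^.]*/ makes everything from the last dot onward the extension.
sub split_extension {
    my ($path) = @_;
    my ($base, undef, $ext) = fileparse($path, qr/\.[^.]*/);
    return ($base, $ext);    # $ext is '' when there is no dot
}

for my $file ('run_me.pl', 'my_seq.tfa', 'well.whatever') {
    my ($base, $ext) = split_extension($file);
    print "$file: name '$base', extension '$ext'\n";
}
```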

4
Wildcards and Filtering
  • The * wildcard matches anything!
  • e.g. to see all files with names that begin with
    t, type ls -l t*
  • For files that end with t, type ls -l *t
  • Be very careful with this wildcard!
  • When using File Selection Boxes, using wildcards
    and Filter can cut down the list of names that
    appear
  • We'll see this in the Notepad Editor
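The same wildcards also work inside a Perl script. This sketch assumes a scratch directory of made-up files. It uses bsd_glob from the core File::Glob module, which expands * like the shell does but does not split the pattern on spaces:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);
use File::Basename qw(basename);

# Make a throwaway directory with a few made-up example files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(test.pl tmp.txt notes.txt script.pl)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

# bsd_glob expands * the same way the shell does for ls.
my @starts_with_t = map { basename($_) } bsd_glob("$dir/t*");   # like: ls -l t*
my @ends_with_t   = map { basename($_) } bsd_glob("$dir/*t");   # like: ls -l *t

print "Begin with t: @starts_with_t\n";
print "End with t:   @ends_with_t\n";
```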

5
UNIX permissions (or modes)
  • ls -l shows file types, permissions, ownership,
    size, date/time last modified
  • 3 groups of permissions: u (user/owner), g (group),
    o (others)
  • 3 types of permissions: r (read), w (write),
    x (execute)
  • To remove read permissions
  • chmod -r secret_file
  • To make a PERL script executable
  • chmod +x my_script.pl
  • (otherwise you can't run it directly as ./my_script.pl!)
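Perl's built-in chmod takes a numeric (octal) mode rather than letters like u+x. The -x file test tells you whether a file is executable. This sketch works on a temporary file. Adding the bit 0100 is the numeric equivalent of chmod u+x:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an example script file (temporary, removed at exit).
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh "#!/usr/local/bin/perl\nprint \"hi\\n\";\n";
close $fh;

# Equivalent of "chmod u+x": keep the current permission bits
# and add the owner-execute bit (0100).
my $mode = (stat $script)[2] & 07777;
chmod($mode | 0100, $script) or die "chmod failed: $!";

printf "Mode is now %04o\n", (stat $script)[2] & 07777;
print "Executable by me? ", (-x $script ? "yes" : "no"), "\n";
```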

6
EMBOSS
  • www.emboss.org
  • European Molecular Biology Open Software Suite
  • Free, open-source software analysis package
    developed specifically for the needs of the
    molecular biology community
  • Provides a comprehensive set of sequence analysis
    programs (approximately 100)

7
EMBOSS (programs)
  • Integrates other publicly available packages
  • Can be accessed through BioPerl modules (easy
    automation)
  • Sequence alignment
  • Rapid database searching with sequence patterns
  • Protein motif identification, including domain
    analysis
  • Nucleotide sequence pattern analysis, for example
    to identify CpG islands or repeats.
  • Codon usage analysis for small genomes
  • Rapid identification of sequence patterns in
    large scale sequence sets.
  • Database creation/indexing

8
Interacting with EMBOSS
  • EMBOSS programs are run by typing them at the
    UNIX prompt (in your Xterminal) with or without
    parameters/options
  • EMBOSS command syntax follows normal UNIX command
    conventions
  • It will prompt you for parameters not provided
    when invoking the program
  • When in doubt, use program_name -help (seqret
    -help) or tfm program_name (tfm seqret)
  • Use wossname to search for a program by keyword
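EMBOSS prompts for anything that is missing, so a script that calls it should pass every parameter on the command line. This sketch assumes that seqret is on your PATH and that the input file is named my_seq.tfa (both are assumptions). The -sequence and -outseq qualifiers name the input and output, and -auto turns off prompting. The helper seqret_command is hypothetical:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build the argument list for seqret. Supplying every parameter
# (plus -auto) means seqret will not stop to prompt for input.
sub seqret_command {
    my ($in, $out) = @_;
    return ('seqret', '-sequence', $in, '-outseq', $out, '-auto');
}

my @cmd = seqret_command('my_seq.tfa', 'my_seq.fasta');
print "Would run: @cmd\n";

# Only try to run it if the (assumed) input file exists.
if (-e 'my_seq.tfa') {
    system(@cmd) == 0 or warn "seqret failed: exit status $?\n";
}
```

Passing the command as a list to system() avoids the shell, so filenames with spaces need no extra quoting.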

9
Sequence Formats
  • Sequence Formats
  • FASTA
  • GenBank
  • EMBL
  • SwissProt
  • PIR
  • FASTA format:
      >Seq_Name description and some other comment
      ttcctctttctcgactccatcttcgcggtagctggg
      accgccgttcagtcgccaatatgcagctctttgtcgcgcccaggagctac
      acac
  • IDs and Accessions
  • IDs were originally human-readable names that
    suggested the function, etc.
  • Accession numbers are assigned by the database
    (nowadays they are often the same as the ID)
  • ID 'hsfau' is the Homo sapiens FAU pseudogene;
    its accession is X65923 (sometimes written
    X65923.1, where .1 is the version)
  • Multiple sequences per file
  • No connection between file name and ID
  • GFF (standard sequence feature exchange)
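IDs and accessions live in the header line, so a script usually has to pull them apart. Below is a minimal sketch using regular expressions. The names parse_header and split_version are illustrative, not a library API. It assumes the ID is the first word after > and that a version, if present, follows a dot:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Split a FASTA header line into its ID (the first word after '>')
# and the free-text description that follows it.
sub parse_header {
    my ($line) = @_;
    my ($id, $desc) = $line =~ /^>(\S+)\s*(.*)$/
        or return;                        # not a header line
    return ($id, $desc);
}

# Split an accession such as X65923.1 into the accession and its version.
sub split_version {
    my ($acc) = @_;
    my ($base, $version) = $acc =~ /^([^.]+)(?:\.(\d+))?$/;
    return ($base, $version);             # $version is undef if there is none
}

my ($id, $desc) = parse_header(">Seq_Name description and some other comment\n");
print "ID: $id\nDescription: $desc\n";

my ($acc, $version) = split_version('X65923.1');
print "Accession: $acc, version: $version\n";
```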

10
EMBOSS USA
  • USA (Uniform Sequence Address)
  • "format::file"
  • "format::file:entry"
  • "dbname:entry" (we don't have this configured)
  • "@listfile" (a file of file names, e.g. made with
    ls *.seq > mylist)
  • Format is not required when reading in a
    sequence; EMBOSS will guess the sequence format
    by trying all known formats until one succeeds
  • When writing out a sequence, EMBOSS uses FASTA
    format by default. You can specify another
    format to use: gcg::myresults.seq, embl::myresults.out
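The USA forms are format::file, format::file:entry, dbname:entry and @listfile. To make them concrete, here is a hypothetical Perl parser that splits an address into its parts. EMBOSS does its own parsing internally. This sketch only illustrates the syntax and ignores corner cases such as filenames that contain colons:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference describing a USA string.
sub parse_usa {
    my ($usa) = @_;
    return { listfile => $1 } if $usa =~ /^\@(.+)$/;                 # @listfile
    return { format => $1, file => $2, entry => $3 }
        if $usa =~ /^([^:]+)::([^:]+)(?::(.+))?$/;                   # format::file[:entry]
    return { db => $1, entry => $2 } if $usa =~ /^([^:]+):(.+)$/;    # dbname:entry
    return { file => $usa };                                         # plain file, format guessed
}

for my $usa ('my_seq.tfa', 'gcg::myresults.seq', 'embl::myresults.out:hsfau', '@mylist') {
    my $p = parse_usa($usa);
    print "$usa => ",
        join(', ', map { "$_=$p->{$_}" } grep { defined $p->{$_} } sort keys %$p), "\n";
}
```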

11
Setting up your Editor (Nedit)
  • www.nedit.org
  • Set your preferences (syntax highlighting, line
    numbers)
  • Save defaults
  • Exit
  • Restart
  • You can also use the terminal to open a file or
    create a new one using nedit, e.g. nedit test.pl &
  • The & puts the job in the background, giving
    your prompt back (so you can do multiple tasks)
  • You can type fg to bring it back to the
    foreground, and control-z followed by bg to push
    it back (and get your prompt)

12
Nedit (File dialog box)
  • Ignore everything starting with . (hidden files)
  • Double-click on a directory, or select it with the
    mouse and press Enter
  • What are . and .. ?
  • Use the filter if you have many files (*.pl)
  • Select the file to edit/open with the mouse (it
    should have a black background), then click OK
  • Save (Control-s) and Save As

13
Programming Process
  • When asked to develop something, look around
    before you reinvent the wheel
  • Requirements analysis: what input, output,
    formats, source of data, frequency of update
    (general solutions are better than specific ones)
  • Design phase (how and what to use)
  • Flow charts (for logic and data), UML, use cases:
    http://www.uml.org/
  • Pseudocode:
      get filename
      open file and read sequences
      for each sequence
          if length is greater than 100 kb, print error message
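The pseudocode translates almost line for line into Perl. This sketch assumes FASTA input and takes 100 kb to mean 100,000 bases. The helper sequence_lengths is made up, and the main part only runs when a filename is given on the command line:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

use constant MAX_LENGTH => 100_000;   # 100 kb, assuming 1 kb = 1,000 bases

# Read FASTA records from an open filehandle; return a hash of ID => length.
sub sequence_lengths {
    my ($fh) = @_;
    my (%length, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {         # a header line starts a new sequence
            $id = $1;
            $length{$id} = 0;
        } elsif (defined $id) {
            $line =~ s/\s+//g;            # ignore spaces inside sequence lines
            $length{$id} += length $line;
        }
    }
    return %length;
}

# Main program: get filename, open file, check each sequence.
if (@ARGV) {
    my $filename = $ARGV[0];
    open my $fh, '<', $filename or die "Cannot open $filename: $!\n";
    my %length = sequence_lengths($fh);
    close $fh;
    for my $id (sort keys %length) {
        if ($length{$id} > MAX_LENGTH) {
            print STDERR "Error: $id is $length{$id} bases (over 100 kb)\n";
        }
    }
}
```

Usage would be something like perl check_length.pl my_seqs.tfa, where both names are hypothetical.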

14
Programming
  • After the design phase
  • Always comment your code!!
  • Use version control: filename.1 etc. for a small
    project, CVS (http://www.cvshome.org/)
  • Code has to be human-readable but also
    machine-parseable!
  • Test and debug code using different input
    scenarios
  • Don't be shy about using paper and pencil; it's
    easier at times
  • http://www.eecs.wsu.edu/c/programm.htm
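One way to test with different input scenarios is to list each input next to the answer you expect and check them all in a loop. The sketch below uses a hypothetical is_dna function. The scenarios are examples, not an exhaustive test suite:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return true if the string contains only A, C, G, T (any case).
# Empty strings and strings with a trailing newline are rejected.
sub is_dna {
    my ($seq) = @_;
    return defined $seq && $seq =~ /\A[ACGT]+\z/i;
}

# Each scenario: input, expected result (1 or 0), description.
my @scenarios = (
    [ 'acgt',     1, 'lowercase DNA'    ],
    [ 'ACGTTGCA', 1, 'uppercase DNA'    ],
    [ '',         0, 'empty input'      ],
    [ 'ACGU',     0, 'RNA (contains U)' ],
    [ 'AC GT',    0, 'embedded space'   ],
);

my $failures = 0;
for my $case (@scenarios) {
    my ($input, $expected, $label) = @$case;
    my $got = is_dna($input) ? 1 : 0;
    my $ok  = $got == $expected;
    $failures++ unless $ok;
    print $ok ? "ok" : "FAILED", ": $label\n";
}
print "$failures failure(s)\n";
```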

15
Introduction to PERL
  • Invoking PERL
  • PERL statement syntax: non-loop lines end with ;
  • (You will forget this many times!!)
  • Basic input/output
  • STDIN, STDOUT, print and writing to files,
    sockets
  • Variables: scalar data
  • Numbers: 12, 12e5, -12.534
  • Strings: 'who likes Austin Powers?'
  • Operators: +, -, <, >
  • Flow Control
  • if, while, for, foreach
  • Arrays

16
URLs for Learning PERL
  • Perl in 10 minutes
  • http://www.geocities.com/SiliconValley/7331/ten_perl.html
  • Learning Perl
  • http://educationplanet.com/search/Computers_and_the_Internet/Computers/Programming_Languages/Perl/
  • http://cslibrary.stanford.edu/108/EssentialPerl.pdf
  • Perl for Biologists
  • http://www.uni-hohenheim.de/rebhan/perl/

17
Invoking Perl
  • First line of a perl program
  • #!/usr/local/bin/perl -w   (-w turns on warnings)
  • # by itself means a comment, i.e. the rest of the
    line is not interpreted
  • It is important to comment your code!!!

#!/usr/local/bin/perl
# Program by Baha Men (Nov 10, 2000)
print "Who let the dogs out?\n";
# The above line outputs to screen
# the (only) famous song by the group
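Modern Perl scripts usually add two pragmas near the top. The first, use warnings, serves the same purpose as -w, but it applies only to the enclosing file or block. The second, use strict, makes Perl complain about undeclared variables. Below is a sketch of the same program. The #! path is the one used on these slides; on other systems perl may live elsewhere, and which perl shows where:

```perl
#!/usr/local/bin/perl
use strict;     # complain about undeclared (e.g. misspelled) variables
use warnings;   # same purpose as -w on the #! line

# Keep the message in a variable, then print it to the screen.
my $song = "Who let the dogs out?\n";
print $song;
```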

18
Variables
  • A variable stores values while your program is
    running
  • You can set initial values of variables and
    modify these values as the program executes
  • No need to declare variables beforehand
  • They automatically get global scope
  • You can store numbers or text in variables
  • $a = 1;
  • $z = $a = 3.1412653505;
  • $b = "I put the cat out";
  • $gene_name = "C127899.1";

Note the "" (quotes) for text
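Once use strict is turned on, each variable must be declared with my the first time it is used. The sketch below stores values like the ones on this slide. It avoids the names $a and $b, because Perl's sort uses them as special variables:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Under "use strict" each variable is declared once with "my".
my $count     = 1;                      # a number
my $big       = 12e5;                   # scientific notation: 1200000
my $sentence  = "I put the cat out";    # text goes inside quotes
my $gene_name = "C127899.1";            # an ID is text, so it is quoted too

# Values can be changed while the program runs.
$count    = $count + 1;
$sentence = "$sentence again";

print "count is $count, big is $big\n";
print "$sentence ($gene_name)\n";
```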
19
Arithmetic Operators
  • $a = 1 + 2;     # Add 1 and 2 and store in $a
  • $a = 3 - 4;     # Subtract 4 from 3 and store in $a
  • $a = 5 * 6;     # Multiply 5 and 6
  • $a = 7 / 8;     # Divide 7 by 8 to give 0.875
  • $a = 9 ** 10;   # Nine to the power of 10
  • $a = 5 % 2;     # Remainder of 5 divided by 2
  • $a++;           # Increment $a by 1
  • $a--;           # Decrement $a by 1
  • if ($a <= 2)    # Less than or equal to
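Here is a runnable version of the arithmetic operations, printing each result. It uses $x instead of $a, since sort treats $a and $b specially. After 5 % 2 gives 1, the increment and decrement leave $x at 1, so the final test is true:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $x;
$x = 1 + 2;    print "1 + 2   = $x\n";   # 3
$x = 3 - 4;    print "3 - 4   = $x\n";   # -1
$x = 5 * 6;    print "5 * 6   = $x\n";   # 30
$x = 7 / 8;    print "7 / 8   = $x\n";   # 0.875
$x = 9 ** 10;  print "9 ** 10 = $x\n";   # 3486784401
$x = 5 % 2;    print "5 % 2   = $x\n";   # 1
$x++;          print "after ++: $x\n";   # 2
$x--;          print "after --: $x\n";   # 1

if ($x <= 2) {
    print "$x is less than or equal to 2\n";
}
```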

20
String Operator
  • $b = "Hello"; $c = "World";
  • $a = $b . $c;   # Concatenate $b and $c
  • print $a;       # This prints HelloWorld
  • We can do something similar using
  • print "$b $c from me\n";   # This prints
    Hello World from me
  • followed by a newline
  • The difference between "" and '' is covered later
  • \n is the newline character: it ends the current line
  • \t is the tab character, i.e. "$b\t$c" prints Hello
    and World separated by a tab
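The sketch below puts concatenation, interpolation and the escape sequences side by side. It also previews the difference between double and single quotes: single quotes do no interpolation and keep \n as a literal backslash followed by n:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $first  = "Hello";
my $second = "World";

my $joined  = $first . $second;             # concatenation: "HelloWorld"
my $spaced  = "$first $second from me\n";   # double quotes interpolate variables
my $tabbed  = "$first\t$second\n";          # \t inserts a tab character
my $literal = '$first\n';                   # single quotes: no interpolation, no escapes

print $joined, "\n";
print $spaced;
print $tabbed;
print $literal, "\n";                       # prints the 8 characters $first\n
```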

21
Testing Values
  • $a == $b    Is $a numerically equal to $b?
  • Beware: don't use the = operator here (it assigns
    instead of comparing)
  • $a != $b    Is $a numerically not equal to $b?
  • $a eq $b    Is $a string-equal to $b?
  • $a ne $b    Is $a string-not-equal to $b?
  • Use == for numbers and eq for strings
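The == versus eq distinction matters because "10" and "10.0" are equal as numbers but not as strings. The last part of this sketch makes the = mistake on purpose. With use warnings enabled, Perl reports "Found = in conditional, should be ==" when compiling it:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my ($n1, $n2) = ("10", "10.0");

# Numeric comparison converts both sides to numbers: 10 == 10
print "== says equal\n" if $n1 == $n2;

# String comparison compares characters: "10" is not "10.0"
print "ne says different\n" if $n1 ne $n2;

# Deliberate mistake: = assigns instead of comparing,
# so the test is true and $answer is overwritten.
my $answer = 5;
if ($answer = 3) {    # should have been ==
    print "answer is now $answer (it was overwritten)\n";
}
```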

22
Flow Control
  • for (initialize; test; increment) {
        first_action;
        second_action;
        ...
    }
  • while (condition) {
        first_action;
        second_action;
        ...
    }
  • if (condition) {
        first_action;
        second_action;
    }

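for ($count = 0; $count < 10; $count++) {
    print "Count is $count\n";
}
$president = "";
while ($president ne "Nader") {
    print "Try again\n";              # Ask again
    $president = <STDIN>;             # Get input
    last unless defined $president;   # stop at end of input
    chomp $president;                 # Chop off newline
}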
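Reading from <STDIN> makes a loop hard to try out automatically. This sketch wraps the same while loop in a hypothetical ask_until sub that accepts any filehandle. It then feeds the sub made-up answers from an in-memory file:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Keep reading answers until the expected one arrives.
# Returns the number of answers read, or undef if input runs out.
sub ask_until {
    my ($in, $expected) = @_;
    my $tries  = 0;
    my $answer = '';
    while ($answer ne $expected) {
        print "Try again\n" if $tries > 0;
        $answer = <$in>;
        return unless defined $answer;   # ran out of input
        chomp $answer;
        $tries++;
    }
    return $tries;
}

# Simulated keyboard input (made-up answers in an in-memory file).
my $input = "Al\nGeorge\nNader\n";
open my $fake_stdin, '<', \$input or die "Cannot open in-memory input: $!";
my $tries = ask_until($fake_stdin, "Nader");
print "Got the right answer after $tries tries\n";
```

In the real program you would call ask_until(\*STDIN, "Nader").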
23
Arrays
  • An array variable is a list of scalars (i.e. numbers
    and/or strings)
  • Same format as a scalar except that it starts with @,
    i.e. @names is an array while $names is a scalar
  • @names = ("Al", "George", "Ralph");
    @party = ("Democrat", "Republican", "Green");
  • Array elements are referenced by index number,
    which starts from 0: $names[0] is "Al" and
    $party[1] is "Republican"
  • You can set values using $names[3] = "Pat";
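This sketch builds the arrays from this slide and walks over them two ways: with foreach (listed under flow control earlier) and with an index loop. Here scalar(@names) gives the number of elements and $#names gives the last index:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @names = ("Al", "George", "Ralph");
my @party = ("Democrat", "Republican", "Green");

$names[3] = "Pat";                   # assigning past the end grows the array

print "First name: $names[0]\n";     # index starts at 0
print "Number of names: ", scalar(@names), "\n";
print "Last index: $#names\n";

# foreach visits every element in order
foreach my $name (@names) {
    print "Hello $name\n";
}

# An index loop pairs up the two arrays (Pat has no party entry)
for my $i (0 .. $#party) {
    print "$names[$i]: $party[$i]\n";
}
```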

24
I get the picture... just get on with it!
  • Your first program: hello.pl
  • Create a directory prog1 and save your files there
  • Print hello world (that's too easy)
  • Ask the user for a name
  • Greet the user
  • Ask the user for a password
  • If it matches the password yahoo, greet the user;
    otherwise boot them
  • You can type perl hello.pl, or chmod u+x hello.pl
    and run it with ./hello.pl (remember to cd prog1)

25
Your first PERL script
#!/usr/local/bin/perl
# First line must be the path to the PERL interpreter!!

# Prompt the user to type and hit enter
print "Please enter your name: ";
# Read from the keyboard and remove the newline
$name = <STDIN>;
chomp($name);
print "Hello $name, please give me the secret password: ";
$password = <STDIN>;
chomp($password);
# Now compare it to the hidden password
if ($password eq "yahoo") {
    print "Welcome my buddy $name\n";
} else {
    print "Bite Me, password is invalid\n";
}
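To test the script without typing, the same logic can take its input and output filehandles as parameters. The sketch below uses a hypothetical greet_and_check sub. It runs interactively only when STDIN is a terminal (the -t file test), so it can also be driven from a string of fake input:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Returns 1 if the password matches, 0 otherwise (or on end of input).
sub greet_and_check {
    my ($in, $out, $secret) = @_;
    print {$out} "Please enter your name: ";
    my $name = <$in>;
    return 0 unless defined $name;
    chomp $name;
    print {$out} "Hello $name, please give me the secret password: ";
    my $password = <$in>;
    return 0 unless defined $password;
    chomp $password;
    if ($password eq $secret) {
        print {$out} "Welcome my buddy $name\n";
        return 1;
    }
    print {$out} "Bite Me, password is invalid\n";
    return 0;
}

# Run interactively only when started from a terminal.
greet_and_check(\*STDIN, \*STDOUT, "yahoo") if -t STDIN;
```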