Managing complexity Advanced Perl - PowerPoint PPT Presentation

Provided by: tomb48

Transcript and Presenter's Notes



1
Managing complexity (Advanced Perl)
  • Using Perl for specific tasks with help from
    Bioperl and others

2
Login
  • Username bioinfouser
  • Password loginbioinfo

3
Funny?
4
Goals
  • I assume you already know Perl basics; this covers some
    more advanced features
  • Learn how to write OO code
  • More flexible modules
  • Understand other modules
  • Some APIs that you may need
  • Bioperl
  • Perl DBI

5
What I assume you already know
  • Scalars
  • Arrays
  • Hashes
  • Control structures (if-then, for, foreach, while,
    etc.)
  • File IO

6
Managing complexity
  • Managing complexity makes hard tasks easy(er)
  • Perl itself does this
  • Regular expressions, text manipulations
  • Extensions (modules) do this
  • May come at the expense of execution speed
  • You may not care
  • Consider the big picture
  • Development time
  • Errors
  • Extremely custom software
  • Some things need speed

7
How complex is it now?
  • Perl is a very compact language in terms of
    human languages
  • Perl is large compared with other programming languages
  • TMTOWTDI (There's More Than One Way To Do It)
  • Perl has approximately 233 reserved words
  • Java has approximately 47 reserved words
  • Both are easy to learn but harder to use effectively

8
General practices
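  • Always use #!/usr/bin/perl -w or use warnings
  • Consider use strict for scripts longer than 10
    lines
  • You can't have too many comments
  • =head1, =head2, ... start a POD documentation section
  • =cut ends the POD section and returns to code
  • perldoc displays the POD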
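A minimal script skeleton following the practices above: use strict and use warnings at the top, comments, and an embedded POD section that perldoc can display. The script name greet.pl and the greet subroutine are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;     # catches undeclared variables, symbolic references, barewords
use warnings;   # same effect as -w on the #! line, but scoped to this file

=head1 NAME

greet.pl - hypothetical example showing where POD documentation goes

=head1 SYNOPSIS

  perl greet.pl

=cut

# The =cut above ends the POD block; perl resumes compiling code here.
print greet('world'), "\n";

# Return a greeting for the name passed in.
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}
```

Running perldoc greet.pl shows the NAME and SYNOPSIS sections; perl itself skips them.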

9
Getting values into the program or subroutine.
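  • Perl passes arguments by alias: the elements of @_ are the
    caller's own variables, so copying them into my variables
    gives pass-by-value behavior
  • A scalar can hold a reference (pointer) to an array,
    hash, subroutine, etc.
  • The arguments to a subroutine arrive in a special array
    called @_ (a program's command-line arguments arrive in @ARGV)
  • my $first_value = shift @_;
  • my $first_value = $_[0];
  • my $first_value = shift;   (inside a subroutine, shift defaults to @_)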
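A small sketch of the argument-passing rules above; all subroutine names are hypothetical. It shows the three ways of reading the first argument, and that assigning to $_[0] changes the caller's variable while changing a my copy does not. Calling modify_alias with a literal such as 'ATGC' would die, because a literal cannot be modified.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three equivalent ways to read the first argument inside a subroutine.
sub first_shift_explicit { my $first_value = shift @_; return $first_value }
sub first_index          { my $first_value = $_[0];    return $first_value }
sub first_shift_implicit { my $first_value = shift;    return $first_value }  # shift defaults to @_

# The elements of @_ are aliases to the caller's variables.
sub modify_copy  { my $s = shift; $s = lc $s; return }   # changes only the local copy
sub modify_alias { $_[0] = lc $_[0]; return }            # changes the caller's variable

my $seq = 'ATGC';
modify_copy($seq);
print "after modify_copy:  $seq\n";    # still ATGC
modify_alias($seq);
print "after modify_alias: $seq\n";    # now atgc
```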

10
References
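  my @array = ("one", "two", "three", "four");
  function_call(@array);                   # passes four separate scalars
  function_call(\@array);                  # passes one reference to @array
  function_call(["one", "two", "three"]);  # passes a reference to an anonymous array
  sub function_call {
      my $passed = shift @_;
      print "$passed\n";
  }
Output (the addresses differ from run to run):
  one
  ARRAY(0x80601a0)
  ARRAY(0x804c9a0)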
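Once a reference has been passed in, it must be dereferenced to get at the data. A minimal sketch, with hypothetical subroutine names, of the usual syntax for array, hash and code references:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ('one', 'two', 'three', 'four');
my %gene  = (name => 'ATP7B', locuslink => 540);

sub count_elements {
    my $array_ref = shift;        # one scalar holding a reference
    return scalar @$array_ref;    # @$ref (or @{$ref}) gives back the whole array
}

sub first_element {
    my $array_ref = shift;
    return $array_ref->[0];       # -> picks out a single element
}

sub gene_name {
    my $hash_ref = shift;
    return $hash_ref->{name};     # same arrow syntax for hash references
}

my $code_ref = \&count_elements;  # references to subroutines work the same way

print count_elements(\@array), "\n";       # 4
print first_element(['x', 'y']), "\n";     # x (anonymous array reference)
print gene_name(\%gene), "\n";             # ATP7B
print $code_ref->(\@array), "\n";          # 4
print ref(\@array), ' ', ref(\%gene), ' ', ref($code_ref), "\n";   # ARRAY HASH CODE
```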
11
Debugging complex data structures.
  • Print the reference
  • It tells you only a little: the type and address,
    e.g. ARRAY(0x80601a0)
  • Use the Data::Dumper module
  • This gives you a snapshot of the whole data
    structure
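A sketch of both debugging approaches, using a made-up gene record built from the ATP7B identifiers on the schema slide. Data::Dumper ships with Perl; the two configuration variables are optional and only make the output easier to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in a stable, sorted order
$Data::Dumper::Indent   = 1;   # compact indentation

my %gene = (
    name    => 'ATP7B',
    aliases => ['Wilson disease-associated protein', 'Copper-transporting ATPase 2'],
    refs    => { RefSeq => 'NM_000053', UniGene => 'Hs.84999' },
);
my $ref = \%gene;

print "$ref\n";       # only the type and address, e.g. HASH(0x...)
print Dumper($ref);   # the whole nested structure
```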

12
Some more advanced features
13
Regular expressions
  • Not Perl specific
  • Very useful
  • What they do
  • String comparisons
  • String substitutions
  • Substring selection

14
Regex
  $string =~ /find/     (you could put an m in front: m/find/ is the same)

  .    Match any character (except newline)
  \w   Match "word" character (alphanumeric plus "_")
  \W   Match non-word character
  \s   Match whitespace character
  \S   Match non-whitespace character
  \d   Match digit character
  \D   Match non-digit character
  \t   Match tab
  \n   Match newline
  \r   Match return
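A short sketch exercising several of the metacharacters above on a hypothetical tab-separated line. The ^ and $ anchors in the last match are not on the slide; they pin the match to the start and end of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "ATP7B\tLocusLink 540\n";   # hypothetical tab-separated line

my $has_digit = ($record =~ /\d/) ? 1 : 0;     # \d: any digit
my $has_tab   = ($record =~ /\t/) ? 1 : 0;     # \t: a tab character
my @words     = $record =~ /(\w+)/g;           # \w+: runs of word characters
(my $squeezed = $record) =~ s/\s+/ /g;         # \s+: runs of whitespace, incl. \t and \n
my $three     = ('ATG' =~ /^A.G$/) ? 1 : 0;    # . : any single character

print "digit=$has_digit tab=$has_tab three=$three\n";
print join(',', @words), "\n";                 # ATP7B,LocusLink,540
print "[$squeezed]\n";                         # [ATP7B LocusLink 540 ]
```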
15
Repetition
  $string =~ /(ti){2}/
  $string =~ /ATG?C{3}A{3,}T{4,6}/
  (? = 0 or 1, {3} = exactly 3, {3,} = 3 or more, {4,6} = 4 to 6)
Character Classes
  $string =~ /[ATGCN]/
  $string =~ /[ATGCNatgcn]/i
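A sketch applying the repetition pattern from the slide, plus a negated character class ([^...], matching any character not listed) to validate a sequence. The test strings and subroutine names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quantifiers: ? = 0 or 1, {3} = exactly 3, {3,} = 3 or more, {4,6} = 4 to 6
sub has_motif { return $_[0] =~ /ATG?C{3}A{3,}T{4,6}/ ? 1 : 0 }

# [^ATGCN] matches any character that is NOT one of these; /i ignores case.
sub is_clean_dna { return $_[0] =~ /[^ATGCN]/i ? 0 : 1 }

for my $s ('ATGCCCAAATTTT', 'ATCCCAAAATTTTTT', 'ATGCCAAATTTT') {
    print "$s motif=", has_motif($s), "\n";
}
for my $s ('atgcnATGC', 'ATGXC') {
    print "$s clean=", is_clean_dna($s), "\n";
}
print "(ti){2} matches 'titi': ", ('titi' =~ /(ti){2}/ ? 'yes' : 'no'), "\n";
```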
16
Selection/Replacement
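  $string =~ /(A{3,8})/;  print $1;   # $1 holds what the (...) matched
  $string =~ s/a/A/;                  # replace the first "a" with "A"
  $string =~ tr/atgc/ATGC/;           # translate a,t,g,c to A,T,G,C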
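A sketch of selection, substitution and translation side by side, on made-up strings. The /g modifier on s///, counting with an empty tr/// replacement list, and the reverse-complement idiom go slightly beyond the slide but use only standard Perl operators.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Selection: the parentheses capture, and $1 holds what they matched.
my $string = 'GGAAAAATCC';
my $run = ($string =~ /(A{3,8})/) ? $1 : '';     # greedy: takes all 5 A's

# Substitution: s/// replaces the first match; /g replaces every match.
(my $first = 'banana') =~ s/a/A/;                # bAnana
(my $every = 'banana') =~ s/a/A/g;               # bAnAnA

# Translation: tr/// maps characters one-for-one.
my $dna = 'atgcatgc';
(my $upper = $dna) =~ tr/atgc/ATGC/;             # ATGCATGC
my $gc_count = ($dna =~ tr/gc//);                # empty replacement: count only (4)
(my $revcomp = reverse $dna) =~ tr/atgc/tacg/;   # reverse complement: gcatgcat

print "$run $first $every $upper $gc_count $revcomp\n";
```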
17
Additional syntax
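  $string =~ /AT?AT/
  $string =~ m{/var/log/messages}        # other delimiters avoid escaping each /
  $_ = "ATATATAGTGTGCGTGATATGGG";
  my ($one, $two, $three) = /AT..AT/g;   # matches against $_; /g returns all matches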
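A sketch of /g in list and scalar context, using the string from the slide. For this particular string only one non-overlapping AT..AT exists, so $two and $three come back undefined. The $-[0] variable (start offset of the current match) is standard Perl but not on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$_ = 'ATATATAGTGTGCGTGATATGGG';

# List context + /g: every non-overlapping match against $_.
my ($one, $two, $three) = /AT..AT/g;
my @matches = /AT..AT/g;

# Scalar context + /g in a loop: one match per iteration;
# $-[0] is the start offset of the current match.
my @starts;
push @starts, $-[0] while /AT/g;

print scalar(@matches), " match(es): @matches\n";   # 1 match(es): ATATAT
print defined $two ? "two=$two\n" : "two is undef\n";
print "AT found at offsets: @starts\n";              # 0 2 4 16 18
```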
18
What is a module
  • Two types
  • Object-oriented type
  • Provides something similar to a class definition
  • Remote function call
  • Provides a method to import subroutines or
    variables for the main program to use

19
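Howto: Making a module
Create a file called workSaver.pm:
  package workSaver;
  sub doStuff {
      print "Stuff done\n";
  }
  1;   # a statement that evaluates to true
Now you can use it with:
  use workSaver;
Some restrictions apply (e.g. workSaver.pm must be in a directory
listed in @INC, and since nothing is exported you call
workSaver::doStuff())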
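A sketch of the simple module in use. To keep it runnable as a single file, the package is defined inline in a BEGIN block and %INC is marked so that use workSaver does not look for a file on disk; in a real project the package lives in workSaver.pm and those two tricks are unnecessary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package workSaver;
    sub doStuff { print "Stuff done\n"; return 1 }
    $INC{'workSaver.pm'} = __FILE__;   # pretend workSaver.pm was already loaded
}

use workSaver;   # require finds the %INC entry; there is no import, so nothing is imported

# Without Exporter, call the subroutine by its full package name.
workSaver::doStuff();
```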
20
Howto: Making a module (cont.)
  • This method works very well for subroutines
    that are used in several programs
  • Reduces the clutter in your program
  • Provides one maintenance point instead of an unknown
    number
  • Eases bug fixes
  • Be careful of namespace boundaries: the module's subroutines
    and package variables live in its own package

21
More Complete method
  • Allows you to pollute the namespace of the
    original program selectively.
  • Makes the use of functions and variables easier
  • Still used about the same way as the simple
    method but things are clearer

22
More Complete
  package functional;
  use strict;
  use Exporter;
  our @ISA       = ("Exporter");
  our @EXPORT    = qw();                                 # nothing exported by default
  our @EXPORT_OK = qw($variable1 $variable2 printout);   # exported only on request
  our $VERSION   = 2.0;
  our $variable1 = "var1";
  our $variable2 = "var2";
  my  $variable3 = "var3";   # lexical: private to this file, cannot be exported
  sub printout {
      my $passed_variable = shift;
      print "Your variable is $passed_variable; mine are $variable1, $variable2, $variable3\n";
  }
  1;
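A sketch of how a program would use the functional package above, importing only some of the names listed in @EXPORT_OK. To run as one file, the package sits in a BEGIN block and import is called by hand; that second BEGIN block is what use functional qw($variable1 printout) does after loading functional.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package functional;
    use Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT    = ();                                   # nothing by default
    our @EXPORT_OK = qw($variable1 $variable2 printout);   # only on request
    our $VERSION   = '2.0';
    our $variable1 = 'var1';
    our $variable2 = 'var2';
    my  $variable3 = 'var3';                               # private to this block

    sub printout {
        my $passed_variable = shift;
        print "Your variable is $passed_variable; mine are $variable1, $variable2, $variable3\n";
    }
}

# Equivalent of: use functional qw($variable1 printout);
BEGIN { functional->import(qw($variable1 printout)) }

printout('mine');                                # imported: no functional:: prefix needed
print "variable1 = $variable1\n";                # imported variable
print "variable2 = $functional::variable2\n";    # not imported: use the full name
```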
23
CPAN
  • Wouldn't it be nice to have a place where
  • You could find a bunch of Perl modules
  • It would be browsable
  • Searchable
  • Big pipe for people to download stuff
  • Other people would be encouraged to submit fixes
    and updates
  • And it was all free

24
Sources of modules/Information
  • www.CPAN.org
  • www.bioperl.org
  • www.perl.com
  • www.cetus-links.org/oo_infos.html

25
Bioperl
  • Set of modules that are extremely useful for
    working with biological data. Actively
    maintained.
  • www.bioperl.org is a very good place to get the
    basics of bioperl
  • We will go through an example to see a typical use

26
  • Bioperl has several basic types of objects
  • Seq: a sequence, the most common type (Bio::Seq)
  • Location objects: where it is, how long it is, etc.
  • Interface objects (Bio::xyzI): no implementation,
    mostly documentation

27
Bioperl documentation
  • Several different ways to find out about a module
  • perldoc Bio::Seq
  • bioperl.org
  • /usr/lib/perl5/site_perl/5.8.0/bptutorial.pl 100 Bio::Seq
  • Data::Dumper to print the data structure
  • Print the variable

28
Bioperl demo
29
Why use a database
  • Transaction control - concurrent users cannot overwrite each
    other's changes, and a group of changes succeeds or fails as a unit.
  • Access control - some people can modify data,
    some can read data, others can create
    data-structures.
  • Fast handling of lots of data
  • Precise definition of data (mostly).
  • Easy to share data resources with others

30
Many choices
  • There are many types: MS Access, Excel (sort of),
    Sybase, Oracle, PostgreSQL, mSQL, MySQL
  • They each have their niche and function best in
    certain cases; there is also considerable
    overlap.
  • SQL (Structured Query Language) is the common thread

31
MySQL is better than YourSQL
  • Free on Unix
  • Good developer support
  • Constant bug fixes and feature addition
  • Good scalability to medium size and load, OK
    performance.
  • Easy to install.
  • Used by the Ensembl and UCSC genome browsers, so a
    lot of information is readily available in that
    format.

32
Table Structure - Schema
Gene table: Gene_ID, Name
  Example, gene ATP7B
  Aliases:
    Wilson disease-associated protein
    Copper-transporting ATPase 2
  References (data source, reference):
    Enzyme Commission   3.6.3.4
    UniGene             Hs.84999
    AffyProbeU133       204624_at
    AffyProbeU95        37930_at
    RefSeq              NM_000053
    GenBank             AF034838
    GenBank             U11700
    LocusLink           540
Alias table: Alias_ID, Gene_ID, Alias
Reference table: Reference_ID, Gene_ID, Reference, DataSource
33
SQL (MySQL dialect)
  • SELECT col_name FROM table WHERE col_name =
    value
  • SELECT COUNT(*) FROM table WHERE col_name LIKE
    value
  • SELECT COUNT(DISTINCT col_name) FROM table WHERE
    col_name IS NOT NULL
  • CREATE, UPDATE, DELETE, INSERT have similar forms

34
SQL cont.
  • USE database_name
  • Can also be specified on the mysql command line with -D database_name
  • SHOW TABLES lists all the tables in that
    database (also SHOW DATABASES).
  • DESCRIBE table_name lists the columns and
    datatypes for each column
  • or SHOW COLUMNS FROM table_name

35
More advanced SELECTS
  • SELECT (column_list) FROM (table_list) WHERE
    (constraints) GROUP BY (grouping columns)
    ORDER BY (sorting columns) LIMIT (limit number)
  • SELECT col_name FROM table1, table2 WHERE
    table1_val = table2_val AND table1_val2 > value
  • Example of an equi-join

36
Getting the names right
  • If you only have one table you only need to use
    the column name
  • When you are using joins this may not be
    adequate.
  • If two tables both have a column called primary, you
    need to refer to it as table1.primary or
    table2.primary

37
Data Types
  • INT
  • Tinyint -128 to 127
  • Smallint -32768 to 32767
  • Mediumint -8388608 to 8388607
  • Int -2147483648 to 2147483647
  • Bigint -9223372036854775808 to 9223372036854775807
  • FLOAT
  • Float 4 bytes
  • Double 8 bytes
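The integer ranges above follow from the storage size: a signed n-byte integer spans -2^(8n-1) to 2^(8n-1) - 1. A sketch that recomputes them; the byte sizes (1, 2, 3, 4, 8) come from the MySQL documentation rather than the slide, and the core Math::BigInt module is used because ordinary Perl numbers lose precision near 2^63.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt;

# Storage size in bytes of each MySQL integer type.
my @types = (['TINYINT', 1], ['SMALLINT', 2], ['MEDIUMINT', 3],
             ['INT', 4], ['BIGINT', 8]);

# Signed range of an n-byte two's-complement integer.
sub signed_range {
    my ($bytes) = @_;
    my $half = Math::BigInt->new(2)->bpow(8 * $bytes - 1);   # 2^(8n-1)
    return ($half->copy->bneg, $half->copy->bdec);           # (-2^(8n-1), 2^(8n-1) - 1)
}

for my $t (@types) {
    my ($name, $bytes) = @$t;
    my ($min, $max) = signed_range($bytes);
    printf "%-9s %d byte(s): %s to %s\n", $name, $bytes, $min, $max;
}
```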

38
  • CHAR
  • Char(n) fixed-length character string of n characters (n bytes)
  • Varchar(n) variable-length character string up to n characters
    (L+1 bytes, where L is the actual length)
  • Text up to 2^16 - 1 (65,535) bytes
  • BLOBs Binary Large OBjects

39
Perl DBI
  • Method for perl to connect to a database
    (virtually any database) and read or modify data.
  • The statements are essentially the same SQL statements you
    would enter on the command line, so learning SQL is still
    necessary

40
Statements in DBI
  • Connect
  • Used to establish initial connection
  • Prepare
  • Prepare a statement to execute
  • Execute
  • Execute the statement
  • Do
  • prepare a statement that does not return results
    and execute it

41
  • Fetch
  • Several types used to get returned data
  • Disconnect
  • Disconnect from the server

42
Types of fetch
  • fetchrow_array
  • Used to fetch an array of scalars each time
  • Can also use fetchrow_arrayref
  • fetchrow_hashref
  • Used to fetch a reference to a hash keyed by column name
  • Slower but cleaner code
  • There is no fetchrow_hash; the hash always comes back as a reference
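The connect / do / prepare / execute / fetch / disconnect cycle from the last three slides, as a minimal sketch. It assumes DBD::SQLite is installed and uses a throwaway in-memory database instead of MySQL; with MySQL only the DSN (something like dbi:mysql:database=...;host=...) and the user and password arguments would change. The tables follow the Gene and Alias tables on the schema slide; the gene_id value 1 is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; ':memory:' is a temporary database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# do = prepare + execute for statements that return no rows
$dbh->do('CREATE TABLE gene  (gene_id INTEGER PRIMARY KEY, name VARCHAR(20))');
$dbh->do('CREATE TABLE alias (alias_id INTEGER PRIMARY KEY, gene_id INTEGER, alias TEXT)');
$dbh->do(q{INSERT INTO gene  (gene_id, name) VALUES (1, 'ATP7B')});
$dbh->do(q{INSERT INTO alias (gene_id, alias) VALUES (1, 'Wilson disease-associated protein')});
$dbh->do(q{INSERT INTO alias (gene_id, alias) VALUES (1, 'Copper-transporting ATPase 2')});

# prepare + execute + fetchrow_array: one list of column values per row.
# Both tables have gene_id, so the join names it table.column.
my $sth = $dbh->prepare(q{
    SELECT alias.alias FROM gene, alias
    WHERE gene.gene_id = alias.gene_id AND gene.name = 'ATP7B'
    ORDER BY alias.alias
});
$sth->execute();
my @aliases;
while (my @row = $sth->fetchrow_array) {
    push @aliases, $row[0];
}

# fetchrow_hashref: one hash reference per row, keyed by column name
$sth = $dbh->prepare('SELECT gene_id, name FROM gene');
$sth->execute();
my @genes;
while (my $row = $sth->fetchrow_hashref) {
    push @genes, "$row->{gene_id}: $row->{name}";
}

print "Aliases: ", join('; ', @aliases), "\n";
print "Genes:   ", join('; ', @genes), "\n";

$dbh->disconnect;
```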

43
More advanced statements
  • Quote
  • Used to properly quote data before it is pasted into the
    text of an SQL statement
      my $value = $dbh->quote($blast_result);
  • Placeholders
  • Speed up repeated execution (the statement is prepared once)
    and do the quoting for you; optional
      my $prep = $dbh->prepare("SELECT x FROM y WHERE z = ?");
      foreach my $z (@values) {
          $prep->bind_param(1, $z);
          $prep->execute();
      }
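A sketch of placeholders and quote, again assuming DBD::SQLite with an in-memory database in place of MySQL. The reference rows come from the schema slide; the gene_id value and the BLAST text are made up. The bind_param loop mirrors the slide; quote is only needed when a value is pasted into the SQL text itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; a MySQL handle is used the same way.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE reference (gene_id INTEGER, reference TEXT, datasource TEXT)');

# Placeholders: prepare once, execute many times; DBI quotes the values.
my $insert = $dbh->prepare(
    'INSERT INTO reference (gene_id, reference, datasource) VALUES (?, ?, ?)');
for my $ref (['Hs.84999', 'UniGene'], ['NM_000053', 'RefSeq'],
             ['AF034838', 'GenBank'], ['U11700', 'GenBank']) {
    $insert->execute(1, @$ref);
}

# The loop from the slide: bind_param sets placeholder 1, then execute.
my $prep = $dbh->prepare('SELECT COUNT(*) FROM reference WHERE datasource = ?');
my %count;
for my $source (qw(GenBank RefSeq UniGene)) {
    $prep->bind_param(1, $source);
    $prep->execute();
    ($count{$source}) = $prep->fetchrow_array;
    $prep->finish;   # done with this result set before the next execute
}

# quote: for values pasted directly into SQL text (made-up text with an apostrophe).
my $blast_result = "it's a hypothetical BLAST hit";
my $value = $dbh->quote($blast_result);
$dbh->do("INSERT INTO reference (gene_id, reference, datasource) VALUES (1, $value, 'BLAST')");
my ($stored) = $dbh->selectrow_array(
    q{SELECT reference FROM reference WHERE datasource = 'BLAST'});

print "$_: $count{$_}\n" for sort keys %count;
print "quoted as $value; stored as: $stored\n";
$dbh->disconnect;
```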