Title: Biocomputing Databases and Perl
1BiocomputingDatabases and Perl
2HTTP
CGI
Web Server
Application Server
Client
Request
Response
3(No Transcript)
4The Database Tier
- stores and retrieves data.
- manages updates, allows simultaneous (concurrent)
access from web servers, provides security,
ensures the integrity of data - requires complex software.
- Database Management Systems (DBMSs)
- searches and manages data that's stored in
databases - a database is a collection of related data, and
an application can have more than one database - interface is accessed using SQL (Standard Query
Language) that's used to define and manipulate
databases and data
5Why use a database server?
- more than one user who needs to access the data
at the same time. - a moderate amount of data
- relationships between the stored data items
- more than one kind of data object
- constraints that must be rigidly enforced on the
data, such as field lengths, field types etc.. - If data must be queried to produce reports or
results. - large amount of data that must be searched
quickly. - If security is important
- When adding, deleting, or modifying data is a
complex process.
6Relational databases
- Entities and Attributes
- An entity is a person, place, event, or thing
about which data is collected - An entity set is a named collection of entities
sharing common characteristics - Attributes are characteristics of the entity
- Attributes are also called fields
- Tables
- A table holds related entities or an entity set
- Also called relations
- Comprised of rows and columns
7Tables
- Two-dimensional structure with rows and columns
- Each row (tuple) represents a single entity
- Columns represent attributes
- Row/column intersection represents single value
- Tables must have an attribute to uniquely
identify each row, a unique key - Column values all have same data format
- Each column has range of values called attribute
domain - Order of the rows and columns is immaterial to
the DBMS
8Fingerprint
Sequence
Motif
9Entity-relationship model
10Relational logical model
11(No Transcript)
12mySQL
- Command-line interpreter
- UNIX/Mac
- /usr/local/bin/mysql -uhugh -pshhh
- Windows
- Start gt All Programs gt mySQL gt mySQL Server 5.0 gt
mySQL command line client
13Managing databases
- Creating databases
- mysqlgt CREATE DATABASE phyloprints
- To use that database
- use phyloprints
- Creating tables
- CREATE TABLE "fingerprint" (
- "fprint_accn" varchar(15),
- "identifier" varchar(15),
- "motifs" int2,
- "date" date,
- "up_date" date,
- "family_title" text,
- "family_doc varchar(18),
- "category" text,
- "second_accn" varchar(15),
- "pseudo" bool
- )
14- Table fingerprint
- ------------------------------------------------
--------------------------- - Field Type Length
- ------------------------------------------------
--------------------------- - fprint_accn varchar() 15
- identifier varchar() 15
- motifs int2 2
- date date 4
- up_date date 4
- family_title text var
- family_doc varchar() 18
- category text var
- second_accn varchar() 15
- pseudo bool 1
- ------------------------------------------------
--------------------------- - Table motif
attributes
15Modifiers
- Are applied to attributes
- NOT NULL (row cant exist without this attribute
having a value) - identifier varchar(15) NOT NULL,
- DEFAULT (sets the data to the value that follows
when no data is supplied) - pseudo" bool DEFAULT f,
16Keys
- Primary key one or more attributes that
uniquely identify a row in a table - It is essential that every table has one
PRIMARY KEY (fprint_accn), key namesfprint_accn,i
dentifier) typeMyISAM
17Removing databases and tables
- DROP statement (make sure you want to do this
it doesnt check!) - DROP TABLE seqn
- DROP DATABASE phyloprints
18Inserting/Updating/Deleting Data
- Inserting Data
- INSERT INTO fingerprint VALUES (PR90008,
CRYSTALLIN,0,2000-10-11,2000-10-11,Crystallin
pseudo-signature,PDOC90008,Others,t) - Inserting lots of data rows
- INSERT INTO fingerprint VALUES (PR90008,
CRYSTALLIN,0,2000-10-11,2000-10-11,Crystallin
pseudo-signature,PDOC90008,Others,t),(PR0024
0,ADRENRGCA1DR,7,1996-08-14,2000-02-18,Alpha-1
D adrenergic receptor signature,PDOC00240, f)
19- DELETE statement
- All data
- DELETE FROM fingerprint
- Deleting specific rows use WHERE
- DELETE FROM fingerprint WHERE identifer
CRYSTALLIN - UPDATE statement
- UPDATE fingerprint SET fprint_accn PR00001
WHERE identifierCRYSTALLIN
20Querying with SQL
- Key operators
- SELECT
- PROJECT
- JOIN
- Other operators
- INTERSECT
- UNION (union compatible tables)
- DIFFERENCE
- PRODUCT
- DIVIDE
21Querying databases with SQL
- SELECT statement
- SELECT identifier, fprint_accn FROM fingerprint
- SELECT FROM fingerprint
- WHERE clause
- SELECT from fingerprint WHERE
fprint_accnPR00001
22- select identifier,fprint_accn from fingerprint
where pseudo't' ORDER BY fprint_accn - select fprint_accn,COUNT() from fingerprint
where pseudo't' GROUP BY fprint_accn - select DISTINCT seqn_accn from motif
- select distinct seqn_accn from motif LIMIT 10
23Join queries
- output data that's based on relationships between
two or more tables
24Diagram taken from Jane Mabey
25Diagram taken from Jane Mabey
26BiocomputingDatabases and Perl
27Outline
- Database connectivity in Perl using DBI module
- A series of small exercises
28Querying databases using Perl
- ! /usr/bin/perl -w
- intro6.pl - connect to MySQL, retrieve data,
write plain text output - use strict
- use DBI
- my user george
- my pass george
- my (dbh, sth, count) dbh DBI-gtconnect
("DBImysqlhostlocalhostdatabasewebdb",
"webdev", "webdevpass", PrintError gt 0,
RaiseError gt 1) - sth dbh-gtprepare ("SELECT name, wins, losses
FROM teams") - sth-gtexecute ()
- count 0
- while (my _at_val sth-gtfetchrow_array ())
- printf "name s, wins d, losses d\n",
val0, val1, val2 count - push(_at_allval,_at_val)
-
- print "count rows total\n"
- sth-gtfinish ()
29- ./intro6.pl
- name Fargo-Moorhead Twins, wins 36, losses
16 - name Winnipeg Maroons, wins 24, losses 26
- name Minot Why Nots, wins 19, losses 23
- name Warren Wanderers, wins 16, losses 30
- 4 rows total
30Perl example
- use DBI
- my dbh DBI-gtconnect('DBIMySqlpayroll') or
die "Couldn't connect to database " .
DBI-gterrstr - my sth dbh-gtprepare('SELECT FROM people
WHERE lastname ?') or die "Couldn't prepare
statement " . dbh-gterrstr - print "Enter namegt "
- while (lastname ltgt)
- Read input from the user my _at_data
- chomp lastname
- sth-gtexecute(lastname)
- Execute the query or die "Couldn't execute
statement " . sth-gterrstr - Read the matching records and print them out
- while (_at_data sth-gtfetchrow_array())
- my firstname data1
- my id data2
- print "\tid firstname lastname\n"
-
- if (sth-gtrows 0)
- print "No names matched lastname'.\n\n"
-
- sth-gtfinish
31Query databases over the Web
- ! /usr/bin/perl -w intro7.pl - connect to
MySQL, retrieve data, write HTML output - use strict
- use DBI
- use CGI qw(standard)
- my (dbh, sth, count)
- dbh DBI-gtconnect ("DBImysqlhostlocalhostdat
abasewebdb", "webdev", "webdevpass", PrintError
gt 0, RaiseError gt 1) - sth dbh-gtprepare ("SELECT name, wins, losses
FROM teams") - sth-gtexecute ()
- print header(), start_html ("team data")
- count 0
- while (my _at_val sth-gtfetchrow_array ())
- print p (sprintf ("name s, wins d, losses
d\n", val0, val1, val2)) - count
-
- print p ("count rows total"), end_html ()
- sth-gtfinish ()
- dbh-gtdisconnect ()
- exit (0)
32More about data retrieval with DBI.pm
- fetchrow_arrayref
- sth dbh-gtprepare ("SELECT name, wins, losses
FROM teams") - sth-gtexecute ()
- while (my ref sth-gtfetchrow_arrayref ())
- printf "name s, wins d, losses d\n",
ref-gt0, ref-gt1, ref-gt2 -
- sth-gtfinish ()
33- !/usr/bin/perl
- use DBI
- my db sequence
- my server localhost
- my user root
- my passwd passwd
- my dbConnection DBI-gtconnect(dbimysqldbs
erver,user,passwd) - my query show tables
- my sql dbConnection-gtprepare(query)
- sql -gtexecute()
- while(my row sql-gtfetchrow_arrayref)
- print join(\t,_at_row),\n
-
- dbConnection-gtdisconnect
- exit
34Higher retrieval methods
- selectrow_array()
- to retrieve a single row or a single column value
- selectcol_arrayref()
- returns the first column of a result set, as a
reference to an array of values. - selectall_arrayref()
- retrieves the entire result set as a matrix and
returns a reference to it
35Inserting data in the database
- sub new_employee
- Arguments database handle first and last
names of new employee department ID number for
new employee's work assignment - my (dbh, first, last, department) _at__
- my (insert_handle, update_handle)
- my insert_handle dbh-gtprepare_cached('INSERT
INTO employees VALUES (?,?,?)') - my update_handle dbh-gtprepare_cached('UPDATE
departments SET num_members num_members 1
WHERE id ?') - die "Couldn't prepare queries aborting" unless
defined insert_handle defined update_handle
- insert_handle-gtexecute(first, last,
department) or return 0 update_handle-gtexecute(
department) or return 0 return 1 Success -
36Inserting data in the database
- sub new_employee
- Arguments database handle first and last
names of new employee department ID number for
new employee's work assignment - my (dbh, first, last, department) _at__
- my (insert_handle, update_handle)
- my insert_handle dbh-gtprepare_cached('INSERT
INTO employees VALUES (?,?,?)') - my update_handle dbh-gtprepare_cached('UPDATE
departments SET num_members num_members 1
WHERE id ?') - die "Couldn't prepare queries aborting" unless
defined insert_handle defined update_handle
- my success 1
- success insert_handle-gtexecute(first,
last, department) success
update_handle-gtexecute(department) - my result (success ? dbh-gtcommit
dbh-gtrollback) unless (result) die
"Couldn't finish transaction " . dbh-gterrstr
return success -
37- A short cut for DELETING, UPDATING and INSERTING
data use the do statement - dbh-gtdo('DELETE FROM people WHERE age gt 65')
38A little on more keys
- A foreign key is a column or columns whose values
are the same as the primary key of another table.
You can think of a foreign key as a copy of
primary key from another relational table. The
relationship is made between two relational
tables by matching the values of the foreign key
in one table with the values of the primary key
in another.
39Reminders
- Executable CGI files
- Configuring Apache to read cgi files from cgi-bin
- In httpd.conf
- ScriptAlias /cgi-bin/ "C/Program Files/Apache
Group/Apache2/cgi-bin/"