Biocomputing Databases and Perl - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

Biocomputing Databases and Perl

Description:

Biocomputing Databases and Perl – PowerPoint PPT presentation

Number of Views:83
Avg rating:3.0/5.0
Slides: 40
Provided by: gmo69
Category:

less

Transcript and Presenter's Notes

Title: Biocomputing Databases and Perl


1
BiocomputingDatabases and Perl
  • 13th February 2006

2
HTTP
CGI
Web Server
Application Server
Client
Request
Response
3
(No Transcript)
4
The Database Tier
  • stores and retrieves data.
  • manages updates, allows simultaneous (concurrent)
    access from web servers, provides security,
    ensures the integrity of data
  • requires complex software.
  • Database Management Systems (DBMSs)
  • searches and manages data that's stored in
    databases
  • a database is a collection of related data, and
    an application can have more than one database
  • interface is accessed using SQL (Standard Query
    Language) that's used to define and manipulate
    databases and data

5
Why use a database server?
  • more than one user who needs to access the data
    at the same time.
  • a moderate amount of data
  • relationships between the stored data items
  • more than one kind of data object
  • constraints that must be rigidly enforced on the
    data, such as field lengths, field types etc..
  • If data must be queried to produce reports or
    results.
  • large amount of data that must be searched
    quickly.
  • If security is important
  • When adding, deleting, or modifying data is a
    complex process.

6
Relational databases
  • Entities and Attributes
  • An entity is a person, place, event, or thing
    about which data is collected
  • An entity set is a named collection of entities
    sharing common characteristics
  • Attributes are characteristics of the entity
  • Attributes are also called fields
  • Tables
  • A table holds related entities or an entity set
  • Also called relations
  • Comprised of rows and columns

7
Tables
  • Two-dimensional structure with rows and columns
  • Each row (tuple) represents a single entity
  • Columns represent attributes
  • Row/column intersection represents single value
  • Tables must have an attribute to uniquely
    identify each row, a unique key
  • Column values all have same data format
  • Each column has range of values called attribute
    domain
  • Order of the rows and columns is immaterial to
    the DBMS

8
Fingerprint
Sequence
Motif
9
Entity-relationship model
10
Relational logical model
11
(No Transcript)
12
mySQL
  • Command-line interpreter
  • UNIX/Mac
  • /usr/local/bin/mysql -uhugh -pshhh
  • Windows
  • Start gt All Programs gt mySQL gt mySQL Server 5.0 gt
    mySQL command line client

13
Managing databases
  • Creating databases
  • mysqlgt CREATE DATABASE phyloprints
  • To use that database
  • use phyloprints
  • Creating tables
  • CREATE TABLE "fingerprint" (
  • "fprint_accn" varchar(15),
  • "identifier" varchar(15),
  • "motifs" int2,
  • "date" date,
  • "up_date" date,
  • "family_title" text,
  • "family_doc varchar(18),
  • "category" text,
  • "second_accn" varchar(15),
  • "pseudo" bool
  • )

14
  • Table fingerprint
  • ------------------------------------------------
    ---------------------------
  • Field Type Length
  • ------------------------------------------------
    ---------------------------
  • fprint_accn varchar() 15
  • identifier varchar() 15
  • motifs int2 2
  • date date 4
  • up_date date 4
  • family_title text var
  • family_doc varchar() 18
  • category text var
  • second_accn varchar() 15
  • pseudo bool 1
  • ------------------------------------------------
    ---------------------------
  • Table motif

attributes
15
Modifiers
  • Are applied to attributes
  • NOT NULL (row cant exist without this attribute
    having a value)
  • identifier varchar(15) NOT NULL,
  • DEFAULT (sets the data to the value that follows
    when no data is supplied)
  • pseudo" bool DEFAULT f,

16
Keys
  • Primary key one or more attributes that
    uniquely identify a row in a table
  • It is essential that every table has one

PRIMARY KEY (fprint_accn), key namesfprint_accn,i
dentifier) typeMyISAM
17
Removing databases and tables
  • DROP statement (make sure you want to do this
    it doesnt check!)
  • DROP TABLE seqn
  • DROP DATABASE phyloprints

18
Inserting/Updating/Deleting Data
  • Inserting Data
  • INSERT INTO fingerprint VALUES (PR90008,
    CRYSTALLIN,0,2000-10-11,2000-10-11,Crystallin
    pseudo-signature,PDOC90008,Others,t)
  • Inserting lots of data rows
  • INSERT INTO fingerprint VALUES (PR90008,
    CRYSTALLIN,0,2000-10-11,2000-10-11,Crystallin
    pseudo-signature,PDOC90008,Others,t),(PR0024
    0,ADRENRGCA1DR,7,1996-08-14,2000-02-18,Alpha-1
    D adrenergic receptor signature,PDOC00240, f)

19
  • DELETE statement
  • All data
  • DELETE FROM fingerprint
  • Deleting specific rows use WHERE
  • DELETE FROM fingerprint WHERE identifer
    CRYSTALLIN
  • UPDATE statement
  • UPDATE fingerprint SET fprint_accn PR00001
    WHERE identifierCRYSTALLIN

20
Querying with SQL
  • Key operators
  • SELECT
  • PROJECT
  • JOIN
  • Other operators
  • INTERSECT
  • UNION (union compatible tables)
  • DIFFERENCE
  • PRODUCT
  • DIVIDE

21
Querying databases with SQL
  • SELECT statement
  • SELECT identifier, fprint_accn FROM fingerprint
  • SELECT FROM fingerprint
  • WHERE clause
  • SELECT from fingerprint WHERE
    fprint_accnPR00001

22
  • select identifier,fprint_accn from fingerprint
    where pseudo't' ORDER BY fprint_accn
  • select fprint_accn,COUNT() from fingerprint
    where pseudo't' GROUP BY fprint_accn
  • select DISTINCT seqn_accn from motif
  • select distinct seqn_accn from motif LIMIT 10

23
Join queries
  • output data that's based on relationships between
    two or more tables

24
Diagram taken from Jane Mabey
25
Diagram taken from Jane Mabey
26
BiocomputingDatabases and Perl
  • 15th February 2006

27
Outline
  • Database connectivity in Perl using DBI module
  • A series of small exercises

28
Querying databases using Perl
  • ! /usr/bin/perl -w
  • intro6.pl - connect to MySQL, retrieve data,
    write plain text output
  • use strict
  • use DBI
  • my user george
  • my pass george
  • my (dbh, sth, count) dbh DBI-gtconnect
    ("DBImysqlhostlocalhostdatabasewebdb",
    "webdev", "webdevpass", PrintError gt 0,
    RaiseError gt 1)
  • sth dbh-gtprepare ("SELECT name, wins, losses
    FROM teams")
  • sth-gtexecute ()
  • count 0
  • while (my _at_val sth-gtfetchrow_array ())
  • printf "name s, wins d, losses d\n",
    val0, val1, val2 count
  • push(_at_allval,_at_val)
  • print "count rows total\n"
  • sth-gtfinish ()

29
  • ./intro6.pl
  • name Fargo-Moorhead Twins, wins 36, losses
    16
  • name Winnipeg Maroons, wins 24, losses 26
  • name Minot Why Nots, wins 19, losses 23
  • name Warren Wanderers, wins 16, losses 30
  • 4 rows total

30
Perl example
  • use DBI
  • my dbh DBI-gtconnect('DBIMySqlpayroll') or
    die "Couldn't connect to database " .
    DBI-gterrstr
  • my sth dbh-gtprepare('SELECT FROM people
    WHERE lastname ?') or die "Couldn't prepare
    statement " . dbh-gterrstr
  • print "Enter namegt "
  • while (lastname ltgt)
  • Read input from the user my _at_data
  • chomp lastname
  • sth-gtexecute(lastname)
  • Execute the query or die "Couldn't execute
    statement " . sth-gterrstr
  • Read the matching records and print them out
  • while (_at_data sth-gtfetchrow_array())
  • my firstname data1
  • my id data2
  • print "\tid firstname lastname\n"
  • if (sth-gtrows 0)
  • print "No names matched lastname'.\n\n"
  • sth-gtfinish

31
Query databases over the Web
  • ! /usr/bin/perl -w intro7.pl - connect to
    MySQL, retrieve data, write HTML output
  • use strict
  • use DBI
  • use CGI qw(standard)
  • my (dbh, sth, count)
  • dbh DBI-gtconnect ("DBImysqlhostlocalhostdat
    abasewebdb", "webdev", "webdevpass", PrintError
    gt 0, RaiseError gt 1)
  • sth dbh-gtprepare ("SELECT name, wins, losses
    FROM teams")
  • sth-gtexecute ()
  • print header(), start_html ("team data")
  • count 0
  • while (my _at_val sth-gtfetchrow_array ())
  • print p (sprintf ("name s, wins d, losses
    d\n", val0, val1, val2))
  • count
  • print p ("count rows total"), end_html ()
  • sth-gtfinish ()
  • dbh-gtdisconnect ()
  • exit (0)

32
More about data retrieval with DBI.pm
  • fetchrow_arrayref
  • sth dbh-gtprepare ("SELECT name, wins, losses
    FROM teams")
  • sth-gtexecute ()
  • while (my ref sth-gtfetchrow_arrayref ())
  • printf "name s, wins d, losses d\n",
    ref-gt0, ref-gt1, ref-gt2
  • sth-gtfinish ()

33
  • !/usr/bin/perl
  • use DBI
  • my db sequence
  • my server localhost
  • my user root
  • my passwd passwd
  • my dbConnection DBI-gtconnect(dbimysqldbs
    erver,user,passwd)
  • my query show tables
  • my sql dbConnection-gtprepare(query)
  • sql -gtexecute()
  • while(my row sql-gtfetchrow_arrayref)
  • print join(\t,_at_row),\n
  • dbConnection-gtdisconnect
  • exit

34
Higher retrieval methods
  • selectrow_array()
  • to retrieve a single row or a single column value
  • selectcol_arrayref()
  • returns the first column of a result set, as a
    reference to an array of values.
  • selectall_arrayref()
  • retrieves the entire result set as a matrix and
    returns a reference to it

35
Inserting data in the database
  • sub new_employee
  • Arguments database handle first and last
    names of new employee department ID number for
    new employee's work assignment
  • my (dbh, first, last, department) _at__
  • my (insert_handle, update_handle)
  • my insert_handle dbh-gtprepare_cached('INSERT
    INTO employees VALUES (?,?,?)')
  • my update_handle dbh-gtprepare_cached('UPDATE
    departments SET num_members num_members 1
    WHERE id ?')
  • die "Couldn't prepare queries aborting" unless
    defined insert_handle defined update_handle
  • insert_handle-gtexecute(first, last,
    department) or return 0 update_handle-gtexecute(
    department) or return 0 return 1 Success

36
Inserting data in the database
  • sub new_employee
  • Arguments database handle first and last
    names of new employee department ID number for
    new employee's work assignment
  • my (dbh, first, last, department) _at__
  • my (insert_handle, update_handle)
  • my insert_handle dbh-gtprepare_cached('INSERT
    INTO employees VALUES (?,?,?)')
  • my update_handle dbh-gtprepare_cached('UPDATE
    departments SET num_members num_members 1
    WHERE id ?')
  • die "Couldn't prepare queries aborting" unless
    defined insert_handle defined update_handle
  • my success 1
  • success insert_handle-gtexecute(first,
    last, department) success
    update_handle-gtexecute(department)
  • my result (success ? dbh-gtcommit
    dbh-gtrollback) unless (result) die
    "Couldn't finish transaction " . dbh-gterrstr
    return success

37
  • A short cut for DELETING, UPDATING and INSERTING
    data use the do statement
  • dbh-gtdo('DELETE FROM people WHERE age gt 65')

38
A little on more keys
  • A foreign key is a column or columns whose values
    are the same as the primary key of another table.
    You can think of a foreign key as a copy of
    primary key from another relational table. The
    relationship is made between two relational
    tables by matching the values of the foreign key
    in one table with the values of the primary key
    in another.

39
Reminders
  • Executable CGI files
  • Configuring Apache to read cgi files from cgi-bin
  • In httpd.conf
  • ScriptAlias /cgi-bin/ "C/Program Files/Apache
    Group/Apache2/cgi-bin/"
Write a Comment
User Comments (0)
About PowerShow.com