Information Infrastructure II - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Information Infrastructure II

Description:

A consistent way to access different databases ... Allows you to interact in a Pythonic way. A row is a Python tuple. The Python DB-API ... – PowerPoint PPT presentation

Number of Views:60
Avg rating:3.0/5.0
Slides: 35
Provided by: rajars
Category:

less

Transcript and Presenter's Notes

Title: Information Infrastructure II


1
Information Infrastructure II
  • I211 Week 12
  • Rajarshi Guha

2
Outline
  • The Python DB-API
  • Working with Postgres
  • Working with sqlite3
  • Base64 encoding

3
The Python DB-API
  • A consistent way to access different databases
  • If your program uses this, your application can
    use any database supported by the API
  • Just need to change 2 lines
  • Hides much of the database internals
  • Allows you to interact in a Pythonic way
  • A row is a Python tuple

4
The Python DB-API
  • You need to construct SQL statements as strings
  • The API will allow you to
  • Send a query
  • Get the results of a query
  • One at a time
  • All together
  • Commit transactions

5
Connecting to Postgres
  • Well use the psycopg2 module
  • Remember to import it
  • To connect we use the connect method
  • Returns a connection object
  • We then get a cursor object
  • Now were set to use the database

con psycopg2.connect('dbnamepython211
useri211 passwordI211pwd') cursor con.cursor()
6
Working with rguha_pc
Column Type Modifiers
--------------------------------------------
cid bigint not null
iupac_openeye_name text
iupac_cas_name text iupac_name
text iupac_systematic_name
text iupac_traditional_name text
nist_inchi text xlogp
real exact_mass
real mw real
can_smi text
iso_smi text tpsa
real charge
smallint hatm_count integer
gfp bit(166) gfpbcnt
integer
7
Performing Queries
  • The cursor object has a method call execute takes
    one argument
  • Create an SQL statement in a string
  • Send this to execute
  • This function has no return value

sql select cid from rguha_pc where mw gt 100
and mw lt 150 cursor.execute(sql)
8
How To Handle Errors?
  • If your SQL statement has an error an exception
    will be thrown
  • Catch it and report what happened
  • If the statement was successful, there is no
    response

9
How to Handle Errors
sql selet from rguha_pc where mw gt 100 and
mw lt 150 try cursor.execute(sql) except
Exception, e print e gtgtgt syntax error at or
near "selet" at character 1 sql select from
rguha_pctable where mw gt 100 and mw lt 150 try
cursor.execute(sql) except Exception, e
print e gtgtgt relation "rguha_pctable" does not
exist
10
How to Handle Errors
  • In general, syntax errors in SQL, wrong table
    names etc will be fixed during testing
  • Other errors might occur at runtime
  • Tables may be locked
  • Loose connection
  • Wrapping stuff in try/except is a good idea

11
Getting Results
  • Get the results from the cursor object
  • fetchall() - returns all the results as a list of
    tuples
  • fetchone() - returns a single tuple
  • In both cases, the individual tuples have N
    elements
  • N is the number of columns in the SELECT statement

12
Getting All The Results
  • If the result set is very big, your program will
    be very slow or crash
  • If there were no records then ret will be zero
    length

c.execute('select cid,mw from rguha_pc where mw gt
120 and mw lt 150') ret c.fetchall() print Got
back d records (length(ret)) for row in ret
print ret
13
A Cursor As a List
  • After executing the SQL, the cursor has the
    records
  • We can also directly loop over the cursor
  • Cant get the length of the cursor since its not
    a real list

c.execute('select cid,mw from rguha_pc where mw gt
120 and mw lt 150') for row in c print row
14
Results are Temporary
  • After one call to fetchall(), doing it again will
    not get any results
  • Also, if you loop over the cursor. Works just one
    time
  • To get the results again, must execute the
    original query
  • If you need to keep the rows for later use, copy
    them into a list!

15
Results are Temporary
gtgtgt c.execute('select cid from rguha_pc where mw
gt 120 and mw lt 150') gtgtgt ret c.fetchall() gtgtgt
print 'Got d results' (len(ret)) Got 62
results gtgtgt ret c.fetchall() gtgtgt print 'Got d
results' (len(ret)) Got 0 results
16
Results One at a Time
  • Rather than get a whole list of records, its
    better to get one at a time
  • Useful for large result sets

c.execute('select cid from rguha_pc where mw gt
120 and mw lt 150') while True ret
c.fetchone() if not ret break print
ret
17
Results One at a Time
  • As with fetchall(), once the loop is over, you
    cant repeat it
  • You can combine fetchone() with fetchall()
  • After calling fetchone() a few times, fetchall()
    will return the remainder

18
Combining fetchall() fetchone()
  • The result of execute() is like a pipe
  • fetchone() picks one item from the pipe at a time
  • fetchall() gets all the items (or whatever is
    left)
  • In both cases, those items are no longer available

c.execute('select cid from rguha_pc where mw gt
120 and mw lt 150') print c.fetchone() print
c.fetchone() print len(c.fetchall())
19
Update/Delete from Python
  • Very similar to SELECT queries
  • Construct the SQL
  • Execute it

sql "update rguha_pc set tpsa 1234 where cid
'12997'" c.execute(sql)
20
Committing Transactions
  • psycopg2 uses transactions
  • Not important for SELECT
  • If you do a series of UPDATE, INSERT or DELETE
    you should commit
  • If you dont the changes may or may not be seen
    in a subequent query

con psycopgq( ) cursor.execute(insert into x
values (1)) con.commit
21
A Postgres Alternative?
  • Uptil now, weve been working with Postgres
  • Lot of overhead involved
  • Someone has to maintain the database
  • Involves a distinct process on the machine
  • Overkill for small projects
  • Go to a smaller database

22
sqlite3
  • A small database
  • Doesnt require a separate process
  • Minimal (no) maintenance
  • Can even run in memory
  • Uses a reduced form of SQL
  • Uses DB-API

23
Sqlite3 - Why Use it?
  • Comes with Python 2.5
  • Useful for smallish projects
  • MP3 library manager (Amarok)
  • Manage email (Mail.app)
  • Good for prototyping
  • Easily shift to Postgres/Mysql/Oracle later on
    with minimal change

24
Using Sqlite3
  • Wont work on Sulu
  • The code creates a DB file (if it didnt exist)
  • After that itsthe same asfor Postgres

import sqlite3 con sqlite3.connect(/tmp/example
) cursor con.cursor()
25
Creating Tables
  • If you had sqlite installed separately, you could
    create tables from the command line
  • No need to bother with that in Python!
  • Just do an execute
  • Should be donejust once

c.execute('' create table stocks (date text,
trans text, symbol text, qty real, price
real)''')
26
Insert/Update/Delete
  • Exactly the same as previous examples
  • Create SQL statement
  • Execute it

27
Handling Binary Data
  • Many sources of binary data
  • Images (PNG, JPEG)
  • Music (MP3, WAV)
  • Data files (Word, Excel)
  • Cant handle binary files like text
  • No line endings
  • Some characters might print weirdly

28
Handling Binary Data
  • But certain applications are text based
  • HTTP
  • Email
  • How can we pass binary data in these types of
    applications
  • Example How does you email client send an MP3
    attachment?

29
Base64 Encoding
  • The trick is to convert the binary data to text
    data
  • No special characters involved
  • Many ways to do this
  • Well consider Base 64 encoding
  • Why 64?
  • Uses numbers in base 64

30
Base 64
  • Base 2 is binary
  • Digits are 1 and 0
  • Base 10 is our use base
  • Digits are 0, , 9
  • Base 16 is hexadecimal
  • Digits are 0, , 9, A, , F
  • Base 64 uses many more digits
  • 0, , 9, A, , Z, a, , z, , /
  • The is used as a padding symbol

31
How Does it Work?
  • Consider Hello
  • Take the ASCII code for each letter
  • Convert the code to binary
  • Join the whole thing (40 bits)
  • Then take 6 bits at a time and convert it to base
    64
  • SGVsbG8

32
Working With Base64
  • Python has a module, base64
  • Handles encoding and decoding
  • Makes life very simple
  • Two functions of interest
  • encodestring - converts a list of bytes to Base64
  • decodestring - converts Base64 to a list of bytes
  • With larger data (say MP3 files) youll want to
    use encode() and decode() which work directly
    with files

33
Working with Base64
  • import base64
  • s My name is Jack Bauer
  • enc base64.encodestring(s)
  • dec base64.decodestring(enc)
  • print enc
  • print dec

34
Working with Base64
  • To quickly encode a binary file, just read in all
    the data
  • Call encodestring
  • Same procedure for a Base64 encoded file
  • Read the wholefile
  • Dont read lineby line

import base64 f open(binary.exe, r) data
f.read() f.close() enc base64.encodestring(dat)
Write a Comment
User Comments (0)
About PowerShow.com