Title: Learning SQL to Query Biological Databases
1Learning SQL to Query Biological Databases
- Yannick Pouliot, PhD
- Bioresearch Informationist
- lanebioresearch_at_stanford.edu
- Lane Medical Library Knowledge Management
Center - 11/8/2006
2Contents
- Basics of SQL querying
- Tools for running SQL queries
- Installation (Windows)
- Selected databases
- Resources cheat sheet
- Very important class evaluation questionaire
3So Why Are We Here?
4Bioinformatics Databases Who Supports Direct
Querying?
5Definitions
- Database Collection of tables ?
- Table
- Collection of records that share a common
fundamental characteristic - E.g., patients and locations can each be stored
in their own table - Record
- Basic unit of information in a relation
information - A record is composed of fields
- E.g., 1 record per person
- Query
- Set of instructions to a database engine to
retrieve, sort and format returning data. - find me all patients in my database
6Main Relational Database Engines
- Oracle
- MS Access
- MS SQL Server
- Filemaker
- MS SQL Server
- MySQL
- Postgress
- Sybase
7Understanding the Relational Principle A Simple
Database
- Every patient gets ONE record in the Patients
table - Every visit gets ONE record in the Visits table
- Rows in different tables can be related one to
another using a shared key (identifier) - There can be multiple visits records for a given
patient - There can be multiple tissue records for a given
patient
8The Relational Principle at Work
- Related records can be found using a shared key
- Example Patients.ID Visits.PatientID
9Table and Field Definitions
10The Database Schema Your Roadmap For Querying
- The schema describes all tables and all fields
- Used to determine how to inter-relate tables to
retrieve the desired data - Simple example
- Very important
- Must understand for accurate querying
- Wrong understanding wrong results
11Select -1 Introducing The SQL Select Statement
- Good news This is the only SQL statement you
need to understand for querying - MS Access demo
SELECT LastName, FirstName FROM Patients
12So What Does A Real Schema Look Like?
13Conceptual View of the BioWarehouse Database
14(No Transcript)
15Basic Syntax of Select Statement
- SELECT field_name
- FROM table
- WHERE condition
- Example
- Select LastName,FirstName
- From Patients
- Where Alive Y
- Note case sensitive for all but Oracle
- ? Query statement are written into a tool such as
MS Query or MySQL Query Browser
16SQL QueryingWith What?
- Other query browsers exist but are more
sophisticated - more complex
- Example PL/SQL Developer, from Allround
Automations
17Querying With MySQL Query Browser
18Querying with MySQL Query Browser
- Free
- MySQL only
- Facilitates writing of a SQL query ? graphical
- Get it at http//www.mysql.com/products/tools/quer
y-browser/
19Select 2 A More Complete Description of Select
Statement
- SELECT DISTINCTCOUNT field_list
- FROM table_list
- WHERE conditions
- GROUP BY field_list
- ORDER BY field_list ASC DESC Â
- Note there are minor differences in SQL as
supported by MySQL and Oracle, for example - Complete description for Oracle
http//www.iselfschooling.com/syntax/OracleSyntax.
htm
20A More Meaningful Query Using MySQL Query Browser
- Select distinct(a.Name), a.Length,
a.MolecularWeightExp,c.Name - from Protein a, BioSourceWIDProteinWID b,
BioSource c - where a.WID b.ProteinWID
- and b.BioSourceWID c.WID
- and a.Length gt 50
- and a.MolecularWeightExp is not null
- order by a.Length
21Querying With MS Query
22Putting MSQuery to Work
- MSQuery, an unknown hero
- Free
- Facilitates writing of a SQL query ? graphical
- First, need to find it!
- Search for MSQRY32.EXE using Search for Files
or Folders - Search hidden files and folders
- On my disk, it is located in C\Program
Files\Microsoft Office\OFFICE11 - Once you find it, create a shortcut to it and
rename it e.g. MSQuery - move the shortcut to a desired location
23Example Network Querying of Ensembl Database
Using MS Query
- What happens when you use MS Query
- DEMO
- May take some time
- Big database, lots of data to return from far
away
24Meaningful Example Mapping to Common
IdentifiersE.g., accession numbers from
Swiss-Prot
25The Nitty- Gritty Details
26How to Query Using MSQuery
- Steps
- Make sure you have the requisite driver (next
slide) - Create a Data Source Name (Windows only)
- Write your query
- Get the results back into Excel!
See Lane videorecorded class Managing Experiment
Data Using Excel and Friends Digging Out from
Under the Avalanche for lots more details.
27Step1 Getting DriversEssential for SQL Querying
- A driver is a piece of software that lets your
operating system talk to a database - Installed drivers visible in ODBC manager
- data connectivity tool
- Each database engine (Oracle, MySQL, etc)
requires its own driver - Generally must be installed by user
- Drivers are needed by Data Source Name tool and
querying programs - Require (simple) installation
28MySQL Driver Needed to Query MySQL Databases
- Windows Download MySQL Connector/ODBC 3.51 here
- Must be installed for direct querying using e.g.
Excel - Not necessary if you are using the MySQL Query
Browser
29Oracle Driver Needed to Query Oracle Databases
- Installing client software will also install
driver - Windows Download 10g Client here
- Mac Download 10g Client here
- Free Oracle user account required to download
- Must be installed if you are querying using MS
Query or any other query browser involving Oracle
30Step 2 Creating a Data Source Name
- A Data Source Name (DSN) tells programs on your
PC where and how to query a database - Populating the fields
- Data Source Name Unique name of your choice
- Description anything
- Server exactly as given by the database provider
- Port number as specified by database provider
- Defaults MySQL 3306 Oracle 1521 MS Access
N/A
31Words of Caution
- Easy to build queries that
- Retrieve nonsense
- Never complete, end up completely bogging down
the database - Scotty to Captain Kirk Where going in circles,
and at warp 6 were going mighty fast - Understanding schema is only way to prevent that
- Not always easy
32Resources SQLThe Language to Query Relational
Databases
- Beginning SQL, Wilton P Colby JW E
http//jenson.stanford.edu/uhtbin/cgisirsi/5AGuKep
toD/GREEN/59960102/9holdings - Oracle SQLPlus, Gennick, J.
- Beginning MySQL E http//site.ebrary.com/lib/sta
nford/Doc?id10114227
33(No Transcript)
34http//proquest.safaribooksonline.com/0596007272
35(No Transcript)
36Resources Excel/MS Query
In Lane catalog
37Resources MS Access Free Online Training
Resources
- Using an Access database to store and information
(2 min) http//office.microsoft.com/en-us/assistan
ce/HA011709681033.aspx - Creating a database from Excel (5 min)
http//office.microsoft.com/en-us/assistance/HA012
013211033.aspx - Creating tables in Access (50 min)
http//office.microsoft.com/training/training.aspx
?AssetIDRC061183261033 - Writing queries (50 min) http//office.microsoft.
com/training/training.aspx?AssetIDRC010776611033
38Resources MS Access
1st edition available from SU