Title: Introduction to Databases
1Introduction to Databases
2Structure
- Motivation
- Introduction to MySQL
- Example Queries
- Using SQL to query GeneOntology
3What databases are about
- Logical organization of data
- data models, schema design, dictionaries
- Physical organization of data
- Fast retrieval, indexing, compact storage of data
- Other requirements
- Logging (important to know who did what to the
data) - Security and access control (important to know
who can do what) - Transactions and concurrency control (important
when more than one person is working on database) - Integrity (important to ensure that only valid
entries in the database) - Recovery (important as hardware and software can
sometimes fail
4Different types of databases
- Flat files
- XML
- Relational database
- (Object databases)
- (Object relational databases)
5Flat files
ID BTBPTIG standard genomic DNA MAM 3998
BP. XX AC X03365 K00966 XX SV
X03365.1 XX DT 18-NOV-1986 (Rel. 10,
Created) DT 20-MAY-1992 (Rel. 31, Last updated,
Version 3) XX DE Bovine pancreatic trypsin
inhibitor (BPTI) gene XX KW Alu-like repetitive
sequence protease inhibitor trypsin
inhibitor. XX OS Bos taurus (cow) OC
Eukaryota Metazoa Chordata Craniata
Vertebrata Euteleostomi Mammalia OC
Eutheria Cetartiodactyla Ruminantia Pecora
Bovoidea Bovidae Bovinae OC Bos. XX RN
1 RP 1-3998 RX MEDLINE 86158754. RX
PUBMED 2420326. RA Kingston I.B., Anderson
S. RT "Sequences encoding two trypsin
inhibitors occur in strikingly similar RT
genomic environments" RL Biochem. J.
233(2)443-450(1986). XX RN 2 RX MEDLINE
84070725. RX PUBMED 6580617. RA Anderson S.,
Kingston I.B. RT "Isolation of a genomic clone
for bovine pancreatic trypsin inhibitor by RT
using a unique-sequence synthetic DNA probe."
- We can store any data in a flat file, e.g. EMBL
- But is this a database?
- Logical data organisation None, unless we define
one (as done for EMBL) and adhere to it, which is
not enforced - Physical data organisation None, we cannot
optimise retrieval for common queries - Logging No
- Access control Implicit through Unix
- Transaction and concurrency control None
- Integrity None
- Recovery If files are backed-up they can be
recovered. However, not on the fly
6XML files
ltArticlegt ltJournalgt ltISSNgt0270-7306lt/ISSNgt ltJourn
alIssuegt ltVolumegt19lt/Volumegt ltIssuegt11lt/Issuegt
ltPubDategt ltYeargt1999lt/Yeargt ltMonthgtNovlt/M
onthgt lt/PubDategt lt/JournalIssuegt lt/Journalgt ltAr
ticleTitlegtDifferential regulation of the cell
wall integrity mitogen-activated protein kinase
pathway in budding yeast by the protein tyrosine
phosphatases Ptp2 and Ptp3. lt/ArticleTitlegt ltPagin
ationgt ltMedlinePgngt7651-60lt/MedlinePgngt lt/Paginat
iongt ltAbstractgt ltAbstractTextgtMitogen-activated
protein kinases (MAPKs) are inactivated by
dual-specificity and protein tyrosine
phosphatases (PTPs) in yeasts. In Saccharomyces
cerevisiae, two PTPs, Ptp2 and Ptp3, inactivate
the MAPKs, Hog1 and Fus3, with different
specificities... lt/AbstractTextgt lt/Abstractgt ltAffi
liationgtDepartment of Chemistry, University of
Colorado, Boulder, Colorado 80309-0215,
USA. lt/Affiliationgt
- We can store any data in XML, the eXtentable
Mark-up Language, e.g. Medline - But is this a database?
- Logical data organisation yes, XML schema, which
is enforced - Physical data organisation None, we cannot
optimise retrieval for common queries - Logging No
- Access control Implicit through Unix
- Transaction and concurrency control None
- Integrity None
- Recovery If files are backed-up they can be
recovered. However, not on the fly
7Relational Database
- Central Idea Data as relations in a table
- E.g. SCOP, Structural Classification of Proteins
---------------------------------------------
------------------------ id type sccs
sid description
-------------------------------------------
-------------------------- 46457 cf a.1
- Globin-like
46458 sf a.1.1 -
Globin-like 46459
fa a.1.1.1 - Truncated hemoglobin
46460 dm a.1.1.1 -
Truncated hemoglobin 46461
sp a.1.1.1 - Ciliate (Paramecium
caudatum) 14982 px a.1.1.1
d1dlwa_ 1dlw A
46462 sp a.1.1.1 - Green alga
(Chlamydomonas eugametos) 14983 px
a.1.1.1 d1dlya_ 1dly A
63437 sp a.1.1.1 -
Mycobacterium tuberculosis 62301
px a.1.1.1 d1idra_ 1idr A
------------------------------
---------------------------------------
8Relational Database
- Central Idea Data as relations in a table
- E.g. Employee
------------------------------- id
name salary role --------------------
----------- 46457 pete 50.000
director 46458 jane 60.000 nurse
46459 asif 70.000 driver
-------------------------------
9Relational Database
- Central Idea Data as relations in a table
- E.g. pets
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
10Relational Database
- Central Idea Data as relations in a table
- E.g. school
---------------------- id prof
subject ---------------------- 51221
bert bio 55435 anne math
----------------------
---------------------- id name
subject ---------------------- 46458
rick bio 46459 gerd bio
46460 mary bio 46461 ella math
14982 anne math 46462 paul
math ----------------------
-------------------------- subject room
day time --------------------------
bio A mo 3pm math B
tue 1pm --------------------------
11Relational Database
- A cell in the table stores a single number or
string, but not a list - Lists, sets need to be flattened
------------------- prof subjects
------------------- bert bio,sport
anne math,arts -------------------
12Bioinformatics 10 years of resistance to
flattening!
- Why the resistance?
- Bioinformatics data is naturally nested
- Extensive Use of sets and lists
- E.g. Swissprot Features, keywords, References
-
- Such data can be flattened, but the resulting
relational schema is hard to understand hence it
is hard to formulate queries. - For example, storing the SWISSPROT entry in a
relational database would split it over 15-20
tables.
13Relational Databases
- RDB introduced in 1970 by Codd
- Took off in the 80s
- In the business world, relational databases are
the rule (Oracle, Sybase, mySQL, DB2, Microsoft
Access). - Large biomedical databases typically use a
relational technology but there are also a lot
of homegrown systems (ACeDB, SRS indexed files).
Data is almost always viewed and exported in a
variety of flat file formats (EMBL, GenBank among
others)
14The flood of biomedical data
- Since 1980, the number and size of biomedical
databases has been growing exponentially. - How can you find sources of information you are
seeking? - Nucleic Acids Research Database Issue in January
of every year (http//nar.oupjournals.org/) - Dbcat (http//www.infobiogen.fr/ ) a flat file
database of 500 biological databases.
Susan B. Davidson, Biol537/CIS636, Fall 2003
15Relational Schema
- The schema of a database is a set of relation
names, their field names and types. - Example
- Entry(ID int, Length int, Seq string, Mod
date) - Feature(ID int, Type string, From int,To
int) - Entry and Feature are relation names,
- ID, Seq, Mod, etc are attribute names, and
- int, string, date are domains
-
Susan B. Davidson, Biol537/CIS636, Fall 2003
16Relation Instance
- An instance of a relation is a set of tuples of
the type of the relation. - A tuple of Entry could be
- ( ID 82814, Length 597, Seqccagctaaccg, Mod
1-7-95) - A tuple of Feature could be
- (ID 82814, Typesource, From1,To8959)
Susan B. Davidson, Biol537/CIS636, Fall 2003
17Tabular representation
- Typically, relations are displayed as tables
- Sequence
- Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
18Entities and Relationships
- There is a one-many relationship from Entry to
Feature each entry can have many features, but a
feature can be on at most one entry. - Put another way, the existence of a feature
depends on the existence of the owning entry ?
referential integrity
Susan B. Davidson, Biol537/CIS636, Fall 2003
19Integrity Constraints
- ID is the key of Entry, indicated by underlining
- No two tuples of any instance of Entry can have
the same ID. - In Feature, there is a referential integrity
constraint on ID - Every ID in Feature must appear in some tuple in
Entry. - This is specified in the data definition language
(DDL), and enforced by the system as updates are
made to the instance.
Susan B. Davidson, Biol537/CIS636, Fall 2003
20DDL for this relational schema
CREATE TABLE Feature (Id INTEGER, Type
CHAR(15), From INTEGER, To INTEGER,
PRIMARY KEY (Id, Type,
From, To) FOREIGN KEY (Id)
REFERENCES Entry ON DELETE CASCADE
ON UPDATE CASCADE)
CREATE TABLE Entry (Id INTEGER, Length
INTEGER, Sequence LONGCHAR, Mod DATE,
PRIMARY KEY (Id) )
Susan B. Davidson, Biol537/CIS636, Fall 2003
21Querying relational databases
- The language SQL has become a standard for
querying relational databases. Based on a
curious mixture of the relational algebra and
relational calculus (formal languages), it allows
new relations of information to be computed from
a set of relations. - Unlike the relational algebra, it allows other
useful stuff count, sum, min, max, etc.
Susan B. Davidson, Biol537/CIS636, Fall 2003
22Basic Query
SELECT DISTINCT target-list FROM
relation-list WHERE qualification
- relation-list A list of relation names (possibly
with a range-variable after each name). - target-list A list of attributes of relations in
relation-list. can be used to denote all atts. - qualification Comparisons (Attr op const or
Attr1 op Attr2, where op is one of
combined using AND, OR and NOT. - DISTINCT (optional) keyword indicates that the
answer should not contain duplicates. Default is
that duplicates are not eliminated!
Susan B. Davidson, Biol537/CIS636, Fall 2003
23Conceptual Evaluation Strategy
- Compute the product of relation-list
- Discard tuples that fail qualification
- Project over attributes in target-list
- If DISTINCT then eliminate duplicates
-
- This is probably a very bad way of executing
the query, and a good query optimizer will use
all sorts of tricks to find efficient strategies
to compute the same answer.
Susan B. Davidson, Biol537/CIS636, Fall 2003
24Sample tables
- ID Length Seq Mod
- 82814 597 ccagctaa...
1-07-95 - 98608 18976 accgcct...
2-14-98 - 16665 gtgtaa. 1-19-97
- 76582 9976 actgga
2-29-00
- ID Type From To
- 82814 Source 1 597
- 82814 Gene 23 65
- Gene 3 9999
- 13428 Gene 11000 16665
- 13428 Source 1 16665
Susan B. Davidson, Biol537/CIS636, Fall 2003
25Simple queries
Print all sequences with length less than 10000.
SELECT FROM Sequence WHERE Length lt 10000
ID Length Seq Mod
82814 597 ccagctaa...
1-07-95 76582 9976 actgga
2-29-00
Type Source Gene Gene Gene
Source
SELECT Type FROM Feature
Print the type of all features.
Susan B. Davidson, Biol537/CIS636, Fall 2003
26Distinct
- Note that SQL did not eliminate duplicates. We
need to request this explicitly.
SELECT DISTINCT Type FROM Feature
Print the type of all features (no duplicates).
Type Source Gene
Susan B. Davidson, Biol537/CIS636, Fall 2003
27Pattern Matching
- Can be used in where clause. _ denotes any
character, 0 or more characters.
SELECT FROM Sequence WHERE Seq LIKE a_g'
ID Length Seq Mod
98608 18976 accgcct...
2-14-98 76582 9976 actgga
2-29-00
Susan B. Davidson, Biol537/CIS636, Fall 2003
28Arithmetic
- as can be used to label columns in the output
arithmetic can be used to compute results
SELECT DISTINCT ID, To-From1 as Length FROM
Feature
- ID Length
- 82814 597
- 82814 43
- 13428 9997
- 13428 5666
- 13428 16665
Susan B. Davidson, Biol537/CIS636, Fall 2003
29Set operations -- union
SELECT ID FROM Sequence WHERE Lengthlt10000 UNION
SELECT ID FROM Feature WHERE TypeSource
ID 76582 82814 13428
- Duplicates do not occur in the union.
Susan B. Davidson, Biol537/CIS636, Fall 2003
30The UNION ALL operator preserves duplicates
SELECT ID FROM Sequence WHERE Lengthlt10000 UNION
ALL SELECT ID FROM Feature WHERE TypeSource
Susan B. Davidson, Biol537/CIS636, Fall 2003
31Intersection and difference
SELECT Id FROM Sequence INTERSECT SELECT Id FROM
Feature
SELECT Id FROM Sequence MINUS SELECT Id FROM
Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
32Products
SELECT FROM Sequence,Feature
- ID Length Seq
Mod ID Type From To - 82814 597 ccagctaa...
1-07-95 82814 Source 1 597 - 98608 18976 accgcct...
2-14-98 82814 Source 1 597 - 16665 gtgtaa. 1-19-97
82814 Source 1 597 - 76582 9976 actgga
2-29-00 82814 Source 1 597 - . (lots more!)
- Note that the ID column name is duplicated in the
output.
Susan B. Davidson, Biol537/CIS636, Fall 2003
33Conditional join
SELECT FROM Sequence, Feature WHERE Sequence.Id
Feature.Id
- ID Length Seq
Mod ID Type From To - 82814 597 ccagctaa...
1-07-95 82814 Source 1 597 - 82814 597 ccagctaa...
1-07-95 82814 Gene 23 65 - 13428 16665 gtgtaa.
1-19-97 13428 Gene 3 9999 - 16665 gtgtaa. 1-19-97
13428 Gene 11000 16665 - 13428 16665 gtgtaa.
1-19-97 13428 Source 1 16665
Susan B. Davidson, Biol537/CIS636, Fall 2003
34Counting
SELECT COUNT() FROM Feature
Print the number of feature entries.
Print the number of types of features.
- Surprisingly, the answer to both of these is the
following
SELECT COUNT(Type) FROM Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
35Counting, cont.
- To fix this, we use the keyword DISTINCT
- Can also use SUM, AVG, MIN and MAX.
SELECT COUNT(DISTINCT Type) FROM Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
36Group by
- So far, these aggregate operators have been
applied to all qualifying tuples. Sometimes we
want to apply them to each of several groups of
tuples. - For example Print the type and number of
features of each type.
Susan B. Davidson, Biol537/CIS636, Fall 2003
37Group by
SELECT Type, COUNT() FROM Feature GROUP BY Type
- Note that only the columns that appear in the
GROUP BY statement and aggregated columns can
appear in the output. So the following would
generate an error.
SELECT Type, From, To, COUNT() FROM
Feature GROUP BY Type
Susan B. Davidson, Biol537/CIS636, Fall 2003
38Group by having
- HAVING is to GROUP BY as WHERE is to FROM
- HAVING is used to restrict the groups that
appear in the result.
SELECT Type, COUNT() FROM Feature WHERE From-To
gt 50 GROUP BY Type HAVING AVG(From-To)gt 8500
Susan B. Davidson, Biol537/CIS636, Fall 2003
39Summary
- SQL is relationally complete allows you to
perform operators in an algebra of relations
(the relational algebra). - Additional features string comparisons, set
membership, arithmetic and grouping. - In contrast, Entrez is a much more limited
language.
Susan B. Davidson, Biol537/CIS636, Fall 2003
40A Little Exercise
41A Little Exercise
- Given the table pet below let us formulate some
queries
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
42A Little Exercise
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
43A Little Exercise
- Select name from pet
- Get all owners and list them only once
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
44A Little Exercise
- Select distinct owner from pet
- Select the names of all birds
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
45A Little Exercise
- Select name from pet where speciesbird
- Select the names of all female birds
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
46A Little Exercise
- Select name from pet where speciesbird and
sexf - Select names and owners
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
47A Little Exercise
- Select name,owner from pet
- Select owners of birds and dogs
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
48A Little Exercise
- Select owner from pet where speciesbird or
speciesdog - Select all owners starting with Dia
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
49A Little Exercise
- Select owner from pet where owner like Dia
- How many pets has Gwen?...
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
50A Little Exercise
- Select count(name) from pet where ownerGwen
- Select owners of male pets in sorted order
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
51A Little Exercise
- Select owner from pet where sexm order by
owner - List owners and the number of pets they have
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
52A Little Exercise
- Select owner, count(name) from pet group by
owner - List owners and the number of pets they have in
descending order
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
53A Little Exercise
- Select owner, count(name) as num from pet group
by owner order by num desc - List owners and the number of pets they have in
descending order, but only if they have more than
1 pet
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
54A Little Exercise
- Select owner, count(name) as num from pet group
by owner having numgt1 order by num desc - List all pairs of cats and dogs
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
55A Little Exercise
- Select p1.name, p1.species, p2.name, p2.species
from pet as p1, pet as p2 where p1.speciesdog
and p2.speciescat - Select all male/female pairs of the same species
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
56A Little Exercise
- Select p1.name, p1.species, p1.sex, p2.name,
p2.species, p2.sex from pet as p1, pet as p2
where p1.speciesp2.species and p1.sexm and
p2.sexf - Can we write p1.sex ! p2.sex instead?
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
57A Little Exercise
- We would get the pair Whistler and Chirpy as well
---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
58GO as Database
- Show all the tables belonging to the GeneOntology.
59GO as Database
- show tables
-
- mysqlgt
- -----------------------
- Tables_in_GO
- -----------------------
- assoc_rel
- association
- association_qualifier
- db
- db_info
- dbxref
- evidence
- evidence_dbxref
- gene_product
- gene_product_count
-
-
gene_product_property gene_product_seq
gene_product_synonym graph_path
graph_path2term
instance_data seq
seq_dbxref seq_property
source_audit species
term term2term
term_audit
term_dbxref term_definition
term_synonym -------------------
----
60GO as Database
- What different term types are there in the term
table?
61GO as Database
- select distinct(term_type) from term
-
- --------------
- term_type
- --------------
-
- component
- function
- process
- relationship
- sequence
- synonym_type
- universal
- --------------
62GO as Database
- What different types of synonyms are there?
63GO as Database
-
- select from term where term_type"synonym_type"
-
- ---------------------------------------------
-------------- - id name term_type acc
is_obsolete is_root - ---------------------------------------------
-------------- - 7 alt_id synonym_type alt_id
0 0 - 20 exact synonym_type exact
0 0 - 47 related synonym_type related
0 0 - 50 narrow synonym_type narrow
0 0 - 105 broad synonym_type broad
0 0 - ---------------------------------------------
-------------- -
64GO as Database
- How many entries are in the term table?
65GO as Database
- select count() from term
-
- ----------
- count()
- ----------
- 18091
- ----------
66GO as Database
- Do all entries have different names?
67GO as Database
- select count(distinct name) from term
-
- ----------------------
- count(distinct name)
- ----------------------
- 17947
- ----------------------
68GO as Database
- GeneOntology breaks down into three subontologies
for biological process, molecular functions, and
cellular components. - How many terms are there for each of them?
69GO as Database
- select count() from term where
term_type"component" - ----------
- count()
- ----------
- 1489
- ----------
- select count() from term where
term_type"function" - ----------
- count()
- ----------
- 7437
- ----------
- select count() from term where
term_type"process" - ----------
- count()
- ----------
- 9055
- ----------
70GO as Database
- Which terms contain "GTPase" and "signal"?
71GO as Database
- select id,name,acc from term where name like
"GTPasesignal" -
- ------------------------------------------------
------------------------------------- - id name
acc - ------------------------------------------------
------------------------------------- - 4280 small GTPase mediated signal
transduction GO0007264
- 17832 regulation of small GTPase mediated
signal transduction GO0051056 - 17833 positive regulation of small GTPase
mediated signal transduction GO0051057 - 17834 negative regulation of small GTPase
mediated signal transduction GO0051058 - ------------------------------------------------
-------------------------------------
72GO as Database
- What are the children of small GTPase mediated
signal transduction? - What does the term2term table look like?
73GO as Database
- select from term2term limit 5
-
- ----------------------------------------------
---------- - id relationship_type_id term1_id term2_id
complete - ----------------------------------------------
---------- - 1 2 1 10
0 - 2 2 10 9
0 - 3 2 10 13
0 - 4 2 1 26
0 - 5 2 26 25
0 - ----------------------------------------------
----------
74GO as Database
- What does relationship type mean? Let's try a
join with the term table.
75GO as Database
- select term2term.,name from term2term,term where
term2term.relationship_type_idterm.id limit 5 -
- ----------------------------------------------
---------------- - id relationship_type_id term1_id term2_id
complete name - ----------------------------------------------
---------------- - 1 2 1 10
0 is_a - 2 2 10 9
0 is_a - 3 2 10 13
0 is_a - 4 2 1 26
0 is_a - 5 2 26 25
0 is_a - ----------------------------------------------
----------------
76GO as Database
- Ok, so what are the children of small GTPase
mediated signal transduction?
77GO as Database
- select
- term.
- from
- term,
- term2term
- where
- term1_id 4280 and
- term2_id term.id
-
- ------------------------------------------------
-----------------------------------------------
-------------- - id name
term_type acc
is_obsolete is_root - ------------------------------------------------
-----------------------------------------------
-------------- - 4281 Ras protein signal transduction
process GO0007265
0 0 - 4282 Rho protein signal transduction
process GO0007266
0 0 - 8301 Rac protein signal transduction
process GO0016601
0 0 - 17832 regulation of small GTPase mediated
signal transduction process GO0051056
0 0 - ------------------------------------------------
-----------------------------------------------
--------------
78GO as Database
- Let's look at some of the other tables
-
- select from seq limit 1
-
- ----------------------------------------------
- id display_id description seq
- --------------------------------------------
- 1 P91926 Alpha-adaptin homolog.
MAPVRGDG... - ----------------------------------------------
------ -
79GO as Database
- select from species limit 1
-
- ----------------------------------------------
------------------------------------- - id ncbi_taxa_id common_name
lineage_string genus species
- ----------------------------------------------
------------------------------------- - 1 199350 NULL NULL
Influenza C virus (C/Miyagi/9/96) - ----------------------------------------------
------------------------------------- -
80GO as Database
- Is the mammoth among the species?
-
- select from species where common_name like
"mammoth" -
- ----------------------------------------------
---------------- - id ncbi_taxa_id common_name genus
species - ----------------------------------------------
---------------- - 184126 37349 woolly mammoth
Mammuthus primigenius - ----------------------------------------------
----------------
81GO as Database
- select from species where common_name like
"mouse" -
- -----------------------------------------------
-----------------------------------------------
-------------- - id ncbi_taxa_id common_name
lineage_string genus
species - -----------------------------------------------
-----------------------------------------------
-------------- - 6265 109676 Kazbeg birch mouse
NULL Sicista
kazbegica - 6710 13151 European harvest mouse
NULL Micromys
minutus - 11428 187268 Plains Mouse
NULL Pseudomys
australis - 187797 214933 Sichuan field mouse
NULL Apodemus
latronum - 187799 214934 Ward's field mouse
NULL Apodemus wardi
- 188302 30635 hairy-tailed bolo mouse
NULL Bolomys
lasiurus - 188392 60742 Temminck's mouse
NULL Mus
musculoides - 188395 60745 fiery spiny mouse
NULL Acomys
ignitus - 188397 60746 golden spiny mouse
NULL Acomys
russatus - 190875 55145 forest dormouse
NULL Dryomys
nitedula - 190879 55147 Japanese dormouse
NULL Glirulus
japonicus - -----------------------------------------------
-----------------------------------------------
-------------- - 245 rows in set (0.39 sec)
82GO as Database
- select from species where common_name like
"house mouse" -
- ----------------------------------------------
---------------------- - id ncbi_taxa_id common_name
lineage_string genus species - ----------------------------------------------
---------------------- - 118651 10090 house mouse NULL
Mus musculus - ----------------------------------------------
---------------------- -
83GO as Database
- select from evidence limit 5
-
- ---------------------------------------------
- - id code association_id dbxref_id
seq_acc - ---------------------------------------------
- - 1 ND 1 30885
- 2 ND 2 30885
- 3 ND 3 30885
- 4 ISS 4 30888
- 5 ISS 5 30888
- ---------------------------------------------
-
84GO as Database
- select distinct code from evidence
-
- code
- ------
- IC
- IDA
- IEA
- IEP
- IGI
- IMP
- IPI
- ISS
- NAS
- ND
- NR
- TAS
- ------
-
85GO as Database
- select from gene_product limit 5
-
- ---------------------------------------------
----------------------------------------------
- id symbol dbxref_id species_id
secondary_species_id type_id full_name
- ---------------------------------------------
----------------------------------------------
- 1 29C 30884 146087
NULL 18012 Protein 29C (Fragment).
- 2 2C 30886 146087
NULL 18012 coiled-coil protein
- 3 7E 30889 146087
NULL 18012 coiled-coil protein
- 4 AAC11 30890 146087
NULL 18012 AAC-rich mRNA
- 5 AAC4 30892 146087
NULL 18012 AAC-rich mRNA
- ---------------------------------------------
----------------------------------------------
86GO as Database
- select from association limit 5
- ---------------------------------------------
------------------------------ - id term_id gene_product_id is_not
role_group assocdate source_db_id - ---------------------------------------------
------------------------------ - 1 8 1 0
NULL 20040707 1 - 2 2649 1 0
NULL 20040707 1 - 3 4985 1 0
NULL 20040707 1 - 4 2402 2 0
NULL 20040720 1 - 5 2715 2 0
NULL 20040720 1 - ---------------------------------------------
------------------------------
87GO as Database
- select from gene_product_seq limit 1
- -----------------------------------------
- gene_product_id seq_id is_primary_seq
- -----------------------------------------
- 5393 1 NULL
- -----------------------------------------
-
88GO as Database
- List the genes with symbol "BRC1" with their
species and display_id -
89GO as Database
- select
- gp.symbol,
- s.common_name,
- s.genus,
- s.species,
- seq.display_id
- from
- gene_product as gp,
- species as s,
- seq
- where
- gp.symbol like "BRC1" and
- gp.species_id s.id and
- gp.id seq.id
90GO as Database
- -----------------------------------------------
--------------------------- - symbol common_name genus
species display_id - -----------------------------------------------
--------------------------- - brc1 fission yeast Schizosaccharomyces
pombe Q8L743 - BRC1_MACMU rhesus monkey Macaca
mulatta P30627 - BRC1_RAT Norway rat Rattus
norvegicus P63412 - BRC1_HUMAN human Homo
sapiens Q7WEV1 - BRC1_MOUSE house mouse Mus
musculus Q82C72 - BRC1_DROME fruit fly Drosophila
melanogaster Q8VGU3 - -----------------------------------------------
---------------------------
91GO as Database
- Show the annotated terms for BRC1_HUMAN
92GO as Database
- select
- distinct
- gp.symbol,
- s.common_name,
- s.genus,
- s.species,
- seq.display_id,
- t.name,
- t.acc
- from
- gene_product as gp,
- species as s,
- seq,
- association as a,
- term as t
- where
- gp.symbol like "BRC1_HUMAN" and
- gp.species_id s.id and
- gp.id seq.id and
93GO as Database
- ---------------------------------------------
--------------------------- - symbol common_name genus species
display_id name - ---------------------------------------------
--------------------------- - BRC1_HUMAN human Homo sapiens
Q7WEV1 ubiquitin ligase c - BRC1_HUMAN human Homo sapiens
Q7WEV1 DNA binding - BRC1_HUMAN human Homo sapiens
Q7WEV1 damaged DNA bindin - BRC1_HUMAN human Homo sapiens
Q7WEV1 gamma-tubulin ring - ...
- BRC1_HUMAN human Homo sapiens
Q7WEV1 tubulin binding - BRC1_HUMAN human Homo sapiens
Q7WEV1 transcriptional ac - BRC1_HUMAN human Homo sapiens
Q7WEV1 protein ubiquitina - BRC1_HUMAN human Homo sapiens
Q7WEV1 regulation of cell - BRC1_HUMAN human Homo sapiens
Q7WEV1 regulation of apop - BRC1_HUMAN human Homo sapiens
Q7WEV1 positive regulatio - BRC1_HUMAN human Homo sapiens
Q7WEV1 negative regulatio - BRC1_HUMAN human Homo sapiens
Q7WEV1 negative regulatio - ---------------------------------------------
---------------------------
94Limits
- List all descendants of a given term (Transitive
Closure). - This query cannot be expressed in SQL, but easily
with rules - Desc(X,Z) if term2term(X,Z).
- Desc(X,Z) if term2term(X,Y), Desc(Y,Z).
95Introduction to MySQL
96SQL Structured Query Languagethe most common
standardized language used to access
databases. SQL has several parts DDL Data
Definition Language Defining, Deleting,
Modifying relation schemas DML Data
Manipulation Language Inserting, Deleting,
Modifying tuples in database Embedded SQL
defines how SQL statements can be used with
general-purposed programming
97- MySQL, the most popular Open Source SQL database,
is developed, distributed and supported by MySQL
AB. - MySQL is a relational database management system.
- MySQL software is Open Source.
- Written in C and C. Tested with a broad range
of different compilers. - Works on many different platforms.
- APIs for C, C, Eiffel, Java, Perl, PHP, Python,
Ruby, and Tcl. - You can find MySQl manual and documentation at
www .mysql.com/documentation/ - You can download and install MySQL on your own
computer (both under Windows and Linux)
98MySQL To see a list of options provided by
mysql, invoke it with the --help option shellgt
mysql --help Using SQL On any linux you have
to use this to log on to MySQL shellgt
/usr/local/mysql/bin/mysql -h hostname -D
loginname -p shellgt mysql -h host -u user -p
Enter password The
represents your password enter it when mysql
displays the Enter password prompt.
99- Basic Query
- select A1, A2,,An
- from r1, r2, ,rm
- where P
- A1, A2,,An represent attributes
- r1, r2, rm represent relations
- P represents predicate (guard condition)
- Keywords may be entered in any letter case
- mysqlgt SELECT VERSION(), CURRENT_DATE
- mysqlgt select version(), current_date
- mysqlgt SeLeCt vErSiOn(), current_DATE
100Prompt Meaning mysqlgt Ready for new command.
-gt Waiting for next line of
multiple-line command. gt Waiting for
next line, collecting a string that begins
with a single quote ( ). gt
Waiting for next line, collecting a string that
begins with a double quote ( ). mysqlgt
SELECT -gt FROM my_table -gt WHERE
name Smith AND age lt 30 mysqlgt SELECT
FROM my_table WHERE name "Smith AND age lt
30 "gt "\c mysqlgt \c to cancel the execution
of a command
101- Basic Database Operation
- Create a database
- Create a table
- Load data into the table
- Retrieve data from the table in various ways
- Use multiple tables
- Suppose you have several pets in your home (your
menagerie) and you'd like to keep track of
various types of information about them. You can
do so by creating tables to hold your data and
loading them with the desired information. Then
you can answer different sorts of questions about
your animals by retrieving data from the tables.
102Creating and Using a Database mysqlgt SHOW
DATABASES SHOW statement can be used to find
out the databases currently existing on the
server mysqlgt USE testdb testdb is a database
name. USE command does not need a semi colon and
must be given in a single line. Database needs to
be invoked in order to use it. mysqlgt CREATE
DATABASE example Database names are
case-sensitive unlike keywords Same applies for
table names So example ! Example ! EXAMPLE or
some other variant
103Creating a Table mysqlgt SHOW TABLES Displays
the current list of tables mysqlgt CREATE TABLE
pet (name VARCHAR(20), owner VARCHAR(20),
-gt species VARCHAR(20), sex CHAR(1), birth DATE,
death DATE) mysqlgt SHOW TABLES Will display
the table with the table name pet Verification
of the table can be done with DESCRIBE
command mysqlgt DESCRIBE pet ------------------
----------------------------------------------
Field Type Null Key Default
Extra -------------------------------------
--------------------------- name
varchar(20) YES NULL owner
varchar(20) YES NULL species
varchar(20) YES NULL sex char(1)
YES NULL birth date
YES NULL death date YES
NULL
104Loading Data into a Table LOAD DATA uses a text
file with single record in a line that match the
attributes in the table. Useful for inserting
when multiple records are involved. Example
pet.txt is a text file with a single
record Name owner species sex birth death
Whistler Gwen bird \N 1997-12-09 \N
mysqlgt LOAD DATA LOCAL INFILE "pet.txt" INTO
TABLE pet INSERT command can be used when
records needs to be inserted one at a time. NULL
can be directly