Introduction to Databases - PowerPoint PPT Presentation

1 / 114
About This Presentation
Title:

Introduction to Databases

Description:

RT using a unique-sequence synthetic DNA probe.'; By Michael Schroeder, Biotec, 2005 6 ... By Michael Schroeder, Biotec, 2005 11. Relational Database ... – PowerPoint PPT presentation

Number of Views:28
Avg rating:3.0/5.0
Slides: 115
Provided by: biotecTu
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Databases


1
Introduction to Databases
2
Structure
  • Motivation
  • Introduction to MySQL
  • Example Queries
  • Using SQL to query GeneOntology

3
What databases are about
  • Logical organization of data
  • data models, schema design, dictionaries
  • Physical organization of data
  • Fast retrieval, indexing, compact storage of data
  • Other requirements
  • Logging (important to know who did what to the
    data)
  • Security and access control (important to know
    who can do what)
  • Transactions and concurrency control (important
    when more than one person is working on database)
  • Integrity (important to ensure that only valid
    entries in the database)
  • Recovery (important as hardware and software can
    sometimes fail

4
Different types of databases
  • Flat files
  • XML
  • Relational database
  • (Object databases)
  • (Object relational databases)

5
Flat files
ID BTBPTIG standard genomic DNA MAM 3998
BP. XX AC X03365 K00966 XX SV
X03365.1 XX DT 18-NOV-1986 (Rel. 10,
Created) DT 20-MAY-1992 (Rel. 31, Last updated,
Version 3) XX DE Bovine pancreatic trypsin
inhibitor (BPTI) gene XX KW Alu-like repetitive
sequence protease inhibitor trypsin
inhibitor. XX OS Bos taurus (cow) OC
Eukaryota Metazoa Chordata Craniata
Vertebrata Euteleostomi Mammalia OC
Eutheria Cetartiodactyla Ruminantia Pecora
Bovoidea Bovidae Bovinae OC Bos. XX RN
1 RP 1-3998 RX MEDLINE 86158754. RX
PUBMED 2420326. RA Kingston I.B., Anderson
S. RT "Sequences encoding two trypsin
inhibitors occur in strikingly similar RT
genomic environments" RL Biochem. J.
233(2)443-450(1986). XX RN 2 RX MEDLINE
84070725. RX PUBMED 6580617. RA Anderson S.,
Kingston I.B. RT "Isolation of a genomic clone
for bovine pancreatic trypsin inhibitor by RT
using a unique-sequence synthetic DNA probe."
  • We can store any data in a flat file, e.g. EMBL
  • But is this a database?
  • Logical data organisation None, unless we define
    one (as done for EMBL) and adhere to it, which is
    not enforced
  • Physical data organisation None, we cannot
    optimise retrieval for common queries
  • Logging No
  • Access control Implicit through Unix
  • Transaction and concurrency control None
  • Integrity None
  • Recovery If files are backed-up they can be
    recovered. However, not on the fly

6
XML files
ltArticlegt ltJournalgt ltISSNgt0270-7306lt/ISSNgt ltJourn
alIssuegt ltVolumegt19lt/Volumegt ltIssuegt11lt/Issuegt
ltPubDategt ltYeargt1999lt/Yeargt ltMonthgtNovlt/M
onthgt lt/PubDategt lt/JournalIssuegt lt/Journalgt ltAr
ticleTitlegtDifferential regulation of the cell
wall integrity mitogen-activated protein kinase
pathway in budding yeast by the protein tyrosine
phosphatases Ptp2 and Ptp3. lt/ArticleTitlegt ltPagin
ationgt ltMedlinePgngt7651-60lt/MedlinePgngt lt/Paginat
iongt ltAbstractgt ltAbstractTextgtMitogen-activated
protein kinases (MAPKs) are inactivated by
dual-specificity and protein tyrosine
phosphatases (PTPs) in yeasts. In Saccharomyces
cerevisiae, two PTPs, Ptp2 and Ptp3, inactivate
the MAPKs, Hog1 and Fus3, with different
specificities... lt/AbstractTextgt lt/Abstractgt ltAffi
liationgtDepartment of Chemistry, University of
Colorado, Boulder, Colorado 80309-0215,
USA. lt/Affiliationgt
  • We can store any data in XML, the eXtentable
    Mark-up Language, e.g. Medline
  • But is this a database?
  • Logical data organisation yes, XML schema, which
    is enforced
  • Physical data organisation None, we cannot
    optimise retrieval for common queries
  • Logging No
  • Access control Implicit through Unix
  • Transaction and concurrency control None
  • Integrity None
  • Recovery If files are backed-up they can be
    recovered. However, not on the fly

7
Relational Database
  • Central Idea Data as relations in a table
  • E.g. SCOP, Structural Classification of Proteins

---------------------------------------------
------------------------ id type sccs
sid description
-------------------------------------------
-------------------------- 46457 cf a.1
- Globin-like
46458 sf a.1.1 -
Globin-like 46459
fa a.1.1.1 - Truncated hemoglobin
46460 dm a.1.1.1 -
Truncated hemoglobin 46461
sp a.1.1.1 - Ciliate (Paramecium
caudatum) 14982 px a.1.1.1
d1dlwa_ 1dlw A
46462 sp a.1.1.1 - Green alga
(Chlamydomonas eugametos) 14983 px
a.1.1.1 d1dlya_ 1dly A
63437 sp a.1.1.1 -
Mycobacterium tuberculosis 62301
px a.1.1.1 d1idra_ 1idr A
------------------------------
---------------------------------------
8
Relational Database
  • Central Idea Data as relations in a table
  • E.g. Employee

------------------------------- id
name salary role --------------------
----------- 46457 pete 50.000
director 46458 jane 60.000 nurse
46459 asif 70.000 driver
-------------------------------
9
Relational Database
  • Central Idea Data as relations in a table
  • E.g. pets

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
10
Relational Database
  • Central Idea Data as relations in a table
  • E.g. school

---------------------- id prof
subject ---------------------- 51221
bert bio 55435 anne math
----------------------
---------------------- id name
subject ---------------------- 46458
rick bio 46459 gerd bio
46460 mary bio 46461 ella math
14982 anne math 46462 paul
math ----------------------
-------------------------- subject room
day time --------------------------
bio A mo 3pm math B
tue 1pm --------------------------
11
Relational Database
  • A cell in the table stores a single number or
    string, but not a list
  • Lists, sets need to be flattened

------------------- prof subjects
------------------- bert bio,sport
anne math,arts -------------------
12
Bioinformatics 10 years of resistance to
flattening!
  • Why the resistance?
  • Bioinformatics data is naturally nested
  • Extensive Use of sets and lists
  • E.g. Swissprot Features, keywords, References
  • Such data can be flattened, but the resulting
    relational schema is hard to understand hence it
    is hard to formulate queries.
  • For example, storing the SWISSPROT entry in a
    relational database would split it over 15-20
    tables.

13
Relational Databases
  • RDB introduced in 1970 by Codd
  • Took off in the 80s
  • In the business world, relational databases are
    the rule (Oracle, Sybase, mySQL, DB2, Microsoft
    Access).
  • Large biomedical databases typically use a
    relational technology but there are also a lot
    of homegrown systems (ACeDB, SRS indexed files).
    Data is almost always viewed and exported in a
    variety of flat file formats (EMBL, GenBank among
    others)

14
The flood of biomedical data
  • Since 1980, the number and size of biomedical
    databases has been growing exponentially.
  • How can you find sources of information you are
    seeking?
  • Nucleic Acids Research Database Issue in January
    of every year (http//nar.oupjournals.org/)
  • Dbcat (http//www.infobiogen.fr/ ) a flat file
    database of 500 biological databases.

Susan B. Davidson, Biol537/CIS636, Fall 2003
15
Relational Schema
  • The schema of a database is a set of relation
    names, their field names and types.
  • Example
  • Entry(ID int, Length int, Seq string, Mod
    date)
  • Feature(ID int, Type string, From int,To
    int)
  • Entry and Feature are relation names,
  • ID, Seq, Mod, etc are attribute names, and
  • int, string, date are domains

Susan B. Davidson, Biol537/CIS636, Fall 2003
16
Relation Instance
  • An instance of a relation is a set of tuples of
    the type of the relation.
  • A tuple of Entry could be
  • ( ID 82814, Length 597, Seqccagctaaccg, Mod
    1-7-95)
  • A tuple of Feature could be
  • (ID 82814, Typesource, From1,To8959)

Susan B. Davidson, Biol537/CIS636, Fall 2003
17
Tabular representation
  • Typically, relations are displayed as tables
  • Sequence
  • Feature

Susan B. Davidson, Biol537/CIS636, Fall 2003
18
Entities and Relationships
  • There is a one-many relationship from Entry to
    Feature each entry can have many features, but a
    feature can be on at most one entry.
  • Put another way, the existence of a feature
    depends on the existence of the owning entry ?
    referential integrity

Susan B. Davidson, Biol537/CIS636, Fall 2003
19
Integrity Constraints
  • ID is the key of Entry, indicated by underlining
  • No two tuples of any instance of Entry can have
    the same ID.
  • In Feature, there is a referential integrity
    constraint on ID
  • Every ID in Feature must appear in some tuple in
    Entry.
  • This is specified in the data definition language
    (DDL), and enforced by the system as updates are
    made to the instance.

Susan B. Davidson, Biol537/CIS636, Fall 2003
20
DDL for this relational schema
CREATE TABLE Feature (Id INTEGER, Type
CHAR(15), From INTEGER, To INTEGER,
PRIMARY KEY (Id, Type,
From, To) FOREIGN KEY (Id)
REFERENCES Entry ON DELETE CASCADE
ON UPDATE CASCADE)
CREATE TABLE Entry (Id INTEGER, Length
INTEGER, Sequence LONGCHAR, Mod DATE,
PRIMARY KEY (Id) )
Susan B. Davidson, Biol537/CIS636, Fall 2003
21
Querying relational databases
  • The language SQL has become a standard for
    querying relational databases. Based on a
    curious mixture of the relational algebra and
    relational calculus (formal languages), it allows
    new relations of information to be computed from
    a set of relations.
  • Unlike the relational algebra, it allows other
    useful stuff count, sum, min, max, etc.

Susan B. Davidson, Biol537/CIS636, Fall 2003
22
Basic Query
SELECT DISTINCT target-list FROM
relation-list WHERE qualification
  • relation-list A list of relation names (possibly
    with a range-variable after each name).
  • target-list A list of attributes of relations in
    relation-list. can be used to denote all atts.
  • qualification Comparisons (Attr op const or
    Attr1 op Attr2, where op is one of
    combined using AND, OR and NOT.
  • DISTINCT (optional) keyword indicates that the
    answer should not contain duplicates. Default is
    that duplicates are not eliminated!

Susan B. Davidson, Biol537/CIS636, Fall 2003
23
Conceptual Evaluation Strategy
  • Compute the product of relation-list
  • Discard tuples that fail qualification
  • Project over attributes in target-list
  • If DISTINCT then eliminate duplicates
  • This is probably a very bad way of executing
    the query, and a good query optimizer will use
    all sorts of tricks to find efficient strategies
    to compute the same answer.

Susan B. Davidson, Biol537/CIS636, Fall 2003
24
Sample tables
  • Sequence
  • Feature
  • ID Length Seq Mod
  • 82814 597 ccagctaa...
    1-07-95
  • 98608 18976 accgcct...
    2-14-98
  • 16665 gtgtaa. 1-19-97
  • 76582 9976 actgga
    2-29-00
  • ID Type From To
  • 82814 Source 1 597
  • 82814 Gene 23 65
  • Gene 3 9999
  • 13428 Gene 11000 16665
  • 13428 Source 1 16665

Susan B. Davidson, Biol537/CIS636, Fall 2003
25
Simple queries
Print all sequences with length less than 10000.
SELECT FROM Sequence WHERE Length lt 10000
ID Length Seq Mod
82814 597 ccagctaa...
1-07-95 76582 9976 actgga
2-29-00
Type Source Gene Gene Gene
Source
SELECT Type FROM Feature
Print the type of all features.
Susan B. Davidson, Biol537/CIS636, Fall 2003
26
Distinct
  • Note that SQL did not eliminate duplicates. We
    need to request this explicitly.

SELECT DISTINCT Type FROM Feature
Print the type of all features (no duplicates).
Type Source Gene
Susan B. Davidson, Biol537/CIS636, Fall 2003
27
Pattern Matching
  • Can be used in where clause. _ denotes any
    character, 0 or more characters.

SELECT FROM Sequence WHERE Seq LIKE a_g'
ID Length Seq Mod
98608 18976 accgcct...
2-14-98 76582 9976 actgga
2-29-00
Susan B. Davidson, Biol537/CIS636, Fall 2003
28
Arithmetic
  • as can be used to label columns in the output
    arithmetic can be used to compute results

SELECT DISTINCT ID, To-From1 as Length FROM
Feature
  • ID Length
  • 82814 597
  • 82814 43
  • 13428 9997
  • 13428 5666
  • 13428 16665

Susan B. Davidson, Biol537/CIS636, Fall 2003
29
Set operations -- union
SELECT ID FROM Sequence WHERE Lengthlt10000 UNION
SELECT ID FROM Feature WHERE TypeSource
ID 76582 82814 13428
  • Duplicates do not occur in the union.

Susan B. Davidson, Biol537/CIS636, Fall 2003
30
The UNION ALL operator preserves duplicates
SELECT ID FROM Sequence WHERE Lengthlt10000 UNION
ALL SELECT ID FROM Feature WHERE TypeSource
Susan B. Davidson, Biol537/CIS636, Fall 2003
31
Intersection and difference
SELECT Id FROM Sequence INTERSECT SELECT Id FROM
Feature
SELECT Id FROM Sequence MINUS SELECT Id FROM
Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
32
Products
SELECT FROM Sequence,Feature
  • ID Length Seq
    Mod ID Type From To
  • 82814 597 ccagctaa...
    1-07-95 82814 Source 1 597
  • 98608 18976 accgcct...
    2-14-98 82814 Source 1 597
  • 16665 gtgtaa. 1-19-97
    82814 Source 1 597
  • 76582 9976 actgga
    2-29-00 82814 Source 1 597
  • . (lots more!)
  • Note that the ID column name is duplicated in the
    output.

Susan B. Davidson, Biol537/CIS636, Fall 2003
33
Conditional join
SELECT FROM Sequence, Feature WHERE Sequence.Id
Feature.Id
  • ID Length Seq
    Mod ID Type From To
  • 82814 597 ccagctaa...
    1-07-95 82814 Source 1 597
  • 82814 597 ccagctaa...
    1-07-95 82814 Gene 23 65
  • 13428 16665 gtgtaa.
    1-19-97 13428 Gene 3 9999
  • 16665 gtgtaa. 1-19-97
    13428 Gene 11000 16665
  • 13428 16665 gtgtaa.
    1-19-97 13428 Source 1 16665

Susan B. Davidson, Biol537/CIS636, Fall 2003
34
Counting
SELECT COUNT() FROM Feature
Print the number of feature entries.
Print the number of types of features.
  • Surprisingly, the answer to both of these is the
    following

SELECT COUNT(Type) FROM Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
35
Counting, cont.
  • To fix this, we use the keyword DISTINCT
  • Can also use SUM, AVG, MIN and MAX.

SELECT COUNT(DISTINCT Type) FROM Feature
Susan B. Davidson, Biol537/CIS636, Fall 2003
36
Group by
  • So far, these aggregate operators have been
    applied to all qualifying tuples. Sometimes we
    want to apply them to each of several groups of
    tuples.
  • For example Print the type and number of
    features of each type.

Susan B. Davidson, Biol537/CIS636, Fall 2003
37
Group by
SELECT Type, COUNT() FROM Feature GROUP BY Type
  • Note that only the columns that appear in the
    GROUP BY statement and aggregated columns can
    appear in the output. So the following would
    generate an error.

SELECT Type, From, To, COUNT() FROM
Feature GROUP BY Type
Susan B. Davidson, Biol537/CIS636, Fall 2003
38
Group by having
  • HAVING is to GROUP BY as WHERE is to FROM
  • HAVING is used to restrict the groups that
    appear in the result.

SELECT Type, COUNT() FROM Feature WHERE From-To
gt 50 GROUP BY Type HAVING AVG(From-To)gt 8500
Susan B. Davidson, Biol537/CIS636, Fall 2003
39
Summary
  • SQL is relationally complete allows you to
    perform operators in an algebra of relations
    (the relational algebra).
  • Additional features string comparisons, set
    membership, arithmetic and grouping.
  • In contrast, Entrez is a much more limited
    language.

Susan B. Davidson, Biol537/CIS636, Fall 2003
40
A Little Exercise
41
A Little Exercise
  • Given the table pet below let us formulate some
    queries

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
42
A Little Exercise
  • Get all pet names

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
43
A Little Exercise
  • Select name from pet
  • Get all owners and list them only once

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
44
A Little Exercise
  • Select distinct owner from pet
  • Select the names of all birds

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
45
A Little Exercise
  • Select name from pet where speciesbird
  • Select the names of all female birds

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
46
A Little Exercise
  • Select name from pet where speciesbird and
    sexf
  • Select names and owners

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
47
A Little Exercise
  • Select name,owner from pet
  • Select owners of birds and dogs

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
48
A Little Exercise
  • Select owner from pet where speciesbird or
    speciesdog
  • Select all owners starting with Dia

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
49
A Little Exercise
  • Select owner from pet where owner like Dia
  • How many pets has Gwen?...

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
50
A Little Exercise
  • Select count(name) from pet where ownerGwen
  • Select owners of male pets in sorted order

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
51
A Little Exercise
  • Select owner from pet where sexm order by
    owner
  • List owners and the number of pets they have

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
52
A Little Exercise
  • Select owner, count(name) from pet group by
    owner
  • List owners and the number of pets they have in
    descending order

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
53
A Little Exercise
  • Select owner, count(name) as num from pet group
    by owner order by num desc
  • List owners and the number of pets they have in
    descending order, but only if they have more than
    1 pet

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
54
A Little Exercise
  • Select owner, count(name) as num from pet group
    by owner having numgt1 order by num desc
  • List all pairs of cats and dogs

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
55
A Little Exercise
  • Select p1.name, p1.species, p2.name, p2.species
    from pet as p1, pet as p2 where p1.speciesdog
    and p2.speciescat
  • Select all male/female pairs of the same species

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
56
A Little Exercise
  • Select p1.name, p1.species, p1.sex, p2.name,
    p2.species, p2.sex from pet as p1, pet as p2
    where p1.speciesp2.species and p1.sexm and
    p2.sexf
  • Can we write p1.sex ! p2.sex instead?

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
57
A Little Exercise
  • We would get the pair Whistler and Chirpy as well

---------------------------------------------
------------ name owner species
sex birth death ---------------
------------------------------------------
Whistler Gwen bird 0000-00-00
NULL Chirpy Gwen bird f
1998-09-11 0000-00-00 Bowser Diane
dog m 1979-08-31 1995-07-29 Fang
Benny dog m 1990-08-27
0000-00-00 Buffy Harold dog f
1989-05-13 0000-00-00 Claws Gwen
cat m 1994-03-17 0000-00-00
Fluffy Harold cat f 1993-02-04
0000-00-00 Slim Benny snake m
1996-04-29 0000-00-00 -------------------
--------------------------------------
58
GO as Database
  • Show all the tables belonging to the GeneOntology.

59
GO as Database
  • show tables
  • mysqlgt
  • -----------------------
  • Tables_in_GO
  • -----------------------
  • assoc_rel
  • association
  • association_qualifier
  • db
  • db_info
  • dbxref
  • evidence
  • evidence_dbxref
  • gene_product
  • gene_product_count

gene_product_property gene_product_seq
gene_product_synonym graph_path
graph_path2term
instance_data seq
seq_dbxref seq_property
source_audit species
term term2term
term_audit
term_dbxref term_definition
term_synonym -------------------
----
60
GO as Database
  • What different term types are there in the term
    table?

61
GO as Database
  • select distinct(term_type) from term
  • --------------
  • term_type
  • --------------
  • component
  • function
  • process
  • relationship
  • sequence
  • synonym_type
  • universal
  • --------------

62
GO as Database
  • What different types of synonyms are there?

63
GO as Database
  • select from term where term_type"synonym_type"
  • ---------------------------------------------
    --------------
  • id name term_type acc
    is_obsolete is_root
  • ---------------------------------------------
    --------------
  • 7 alt_id synonym_type alt_id
    0 0
  • 20 exact synonym_type exact
    0 0
  • 47 related synonym_type related
    0 0
  • 50 narrow synonym_type narrow
    0 0
  • 105 broad synonym_type broad
    0 0
  • ---------------------------------------------
    --------------

64
GO as Database
  • How many entries are in the term table?

65
GO as Database
  • select count() from term
  • ----------
  • count()
  • ----------
  • 18091
  • ----------

66
GO as Database
  • Do all entries have different names?

67
GO as Database
  • select count(distinct name) from term
  • ----------------------
  • count(distinct name)
  • ----------------------
  • 17947
  • ----------------------

68
GO as Database
  • GeneOntology breaks down into three subontologies
    for biological process, molecular functions, and
    cellular components.
  • How many terms are there for each of them?

69
GO as Database
  • select count() from term where
    term_type"component"
  • ----------
  • count()
  • ----------
  • 1489
  • ----------
  • select count() from term where
    term_type"function"
  • ----------
  • count()
  • ----------
  • 7437
  • ----------
  • select count() from term where
    term_type"process"
  • ----------
  • count()
  • ----------
  • 9055
  • ----------

70
GO as Database
  • Which terms contain "GTPase" and "signal"?

71
GO as Database
  • select id,name,acc from term where name like
    "GTPasesignal"
  • ------------------------------------------------
    -------------------------------------
  • id name
    acc
  • ------------------------------------------------
    -------------------------------------
  • 4280 small GTPase mediated signal
    transduction GO0007264
  • 17832 regulation of small GTPase mediated
    signal transduction GO0051056
  • 17833 positive regulation of small GTPase
    mediated signal transduction GO0051057
  • 17834 negative regulation of small GTPase
    mediated signal transduction GO0051058
  • ------------------------------------------------
    -------------------------------------

72
GO as Database
  • What are the children of small GTPase mediated
    signal transduction?
  • What does the term2term table look like?

73
GO as Database
  • select from term2term limit 5
  • ----------------------------------------------
    ----------
  • id relationship_type_id term1_id term2_id
    complete
  • ----------------------------------------------
    ----------
  • 1 2 1 10
    0
  • 2 2 10 9
    0
  • 3 2 10 13
    0
  • 4 2 1 26
    0
  • 5 2 26 25
    0
  • ----------------------------------------------
    ----------

74
GO as Database
  • What does relationship type mean? Let's try a
    join with the term table.

75
GO as Database
  • select term2term.,name from term2term,term where
    term2term.relationship_type_idterm.id limit 5
  • ----------------------------------------------
    ----------------
  • id relationship_type_id term1_id term2_id
    complete name
  • ----------------------------------------------
    ----------------
  • 1 2 1 10
    0 is_a
  • 2 2 10 9
    0 is_a
  • 3 2 10 13
    0 is_a
  • 4 2 1 26
    0 is_a
  • 5 2 26 25
    0 is_a
  • ----------------------------------------------
    ----------------

76
GO as Database
  • Ok, so what are the children of small GTPase
    mediated signal transduction?

77
GO as Database
  • select
  • term.
  • from
  • term,
  • term2term
  • where
  • term1_id 4280 and
  • term2_id term.id
  • ------------------------------------------------
    -----------------------------------------------
    --------------
  • id name
    term_type acc
    is_obsolete is_root
  • ------------------------------------------------
    -----------------------------------------------
    --------------
  • 4281 Ras protein signal transduction
    process GO0007265
    0 0
  • 4282 Rho protein signal transduction
    process GO0007266
    0 0
  • 8301 Rac protein signal transduction
    process GO0016601
    0 0
  • 17832 regulation of small GTPase mediated
    signal transduction process GO0051056
    0 0
  • ------------------------------------------------
    -----------------------------------------------
    --------------

78
GO as Database
  • Let's look at some of the other tables
  • select from seq limit 1
  • ----------------------------------------------
  • id display_id description seq
  • --------------------------------------------
  • 1 P91926 Alpha-adaptin homolog.
    MAPVRGDG...
  • ----------------------------------------------
    ------

79
GO as Database
  • select from species limit 1
  • ----------------------------------------------
    -------------------------------------
  • id ncbi_taxa_id common_name
    lineage_string genus species
  • ----------------------------------------------
    -------------------------------------
  • 1 199350 NULL NULL
    Influenza C virus (C/Miyagi/9/96)
  • ----------------------------------------------
    -------------------------------------

80
GO as Database
  • Is the mammoth among the species?
  • select from species where common_name like
    "mammoth"
  • ----------------------------------------------
    ----------------
  • id ncbi_taxa_id common_name genus
    species
  • ----------------------------------------------
    ----------------
  • 184126 37349 woolly mammoth
    Mammuthus primigenius
  • ----------------------------------------------
    ----------------

81
GO as Database
  • select from species where common_name like
    "mouse"
  • -----------------------------------------------
    -----------------------------------------------
    --------------
  • id ncbi_taxa_id common_name
    lineage_string genus
    species
  • -----------------------------------------------
    -----------------------------------------------
    --------------
  • 6265 109676 Kazbeg birch mouse
    NULL Sicista
    kazbegica
  • 6710 13151 European harvest mouse
    NULL Micromys
    minutus
  • 11428 187268 Plains Mouse
    NULL Pseudomys
    australis
  • 187797 214933 Sichuan field mouse
    NULL Apodemus
    latronum
  • 187799 214934 Ward's field mouse
    NULL Apodemus wardi
  • 188302 30635 hairy-tailed bolo mouse
    NULL Bolomys
    lasiurus
  • 188392 60742 Temminck's mouse
    NULL Mus
    musculoides
  • 188395 60745 fiery spiny mouse
    NULL Acomys
    ignitus
  • 188397 60746 golden spiny mouse
    NULL Acomys
    russatus
  • 190875 55145 forest dormouse
    NULL Dryomys
    nitedula
  • 190879 55147 Japanese dormouse
    NULL Glirulus
    japonicus
  • -----------------------------------------------
    -----------------------------------------------
    --------------
  • 245 rows in set (0.39 sec)

82
GO as Database
  • select from species where common_name like
    "house mouse"
  • ----------------------------------------------
    ----------------------
  • id ncbi_taxa_id common_name
    lineage_string genus species
  • ----------------------------------------------
    ----------------------
  • 118651 10090 house mouse NULL
    Mus musculus
  • ----------------------------------------------
    ----------------------

83
GO as Database
  • select from evidence limit 5
  • ---------------------------------------------
    -
  • id code association_id dbxref_id
    seq_acc
  • ---------------------------------------------
    -
  • 1 ND 1 30885
  • 2 ND 2 30885
  • 3 ND 3 30885
  • 4 ISS 4 30888
  • 5 ISS 5 30888
  • ---------------------------------------------
    -

84
GO as Database
  • select distinct code from evidence
  • code
  • ------
  • IC
  • IDA
  • IEA
  • IEP
  • IGI
  • IMP
  • IPI
  • ISS
  • NAS
  • ND
  • NR
  • TAS
  • ------

85
GO as Database
  • select from gene_product limit 5
  • ---------------------------------------------
    ----------------------------------------------
  • id symbol dbxref_id species_id
    secondary_species_id type_id full_name
  • ---------------------------------------------
    ----------------------------------------------
  • 1 29C 30884 146087
    NULL 18012 Protein 29C (Fragment).
  • 2 2C 30886 146087
    NULL 18012 coiled-coil protein
  • 3 7E 30889 146087
    NULL 18012 coiled-coil protein
  • 4 AAC11 30890 146087
    NULL 18012 AAC-rich mRNA
  • 5 AAC4 30892 146087
    NULL 18012 AAC-rich mRNA
  • ---------------------------------------------
    ----------------------------------------------

86
GO as Database
  • select from association limit 5
  • ---------------------------------------------
    ------------------------------
  • id term_id gene_product_id is_not
    role_group assocdate source_db_id
  • ---------------------------------------------
    ------------------------------
  • 1 8 1 0
    NULL 20040707 1
  • 2 2649 1 0
    NULL 20040707 1
  • 3 4985 1 0
    NULL 20040707 1
  • 4 2402 2 0
    NULL 20040720 1
  • 5 2715 2 0
    NULL 20040720 1
  • ---------------------------------------------
    ------------------------------

87
GO as Database
  • select from gene_product_seq limit 1
  • -----------------------------------------
  • gene_product_id seq_id is_primary_seq
  • -----------------------------------------
  • 5393 1 NULL
  • -----------------------------------------

88
GO as Database
  • List the genes with symbol "BRC1" with their
    species and display_id

89
GO as Database
  • select
  • gp.symbol,
  • s.common_name,
  • s.genus,
  • s.species,
  • seq.display_id
  • from
  • gene_product as gp,
  • species as s,
  • seq
  • where
  • gp.symbol like "BRC1" and
  • gp.species_id s.id and
  • gp.id seq.id

90
GO as Database
  • -----------------------------------------------
    ---------------------------
  • symbol common_name genus
    species display_id
  • -----------------------------------------------
    ---------------------------
  • brc1 fission yeast Schizosaccharomyces
    pombe Q8L743
  • BRC1_MACMU rhesus monkey Macaca
    mulatta P30627
  • BRC1_RAT Norway rat Rattus
    norvegicus P63412
  • BRC1_HUMAN human Homo
    sapiens Q7WEV1
  • BRC1_MOUSE house mouse Mus
    musculus Q82C72
  • BRC1_DROME fruit fly Drosophila
    melanogaster Q8VGU3
  • -----------------------------------------------
    ---------------------------

91
GO as Database
  • Show the annotated terms for BRC1_HUMAN

92
GO as Database
  • select
  • distinct
  • gp.symbol,
  • s.common_name,
  • s.genus,
  • s.species,
  • seq.display_id,
  • t.name,
  • t.acc
  • from
  • gene_product as gp,
  • species as s,
  • seq,
  • association as a,
  • term as t
  • where
  • gp.symbol like "BRC1_HUMAN" and
  • gp.species_id s.id and
  • gp.id seq.id and

93
GO as Database
  • ---------------------------------------------
    ---------------------------
  • symbol common_name genus species
    display_id name
  • ---------------------------------------------
    ---------------------------
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 ubiquitin ligase c
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 DNA binding
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 damaged DNA bindin
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 gamma-tubulin ring
  • ...
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 tubulin binding
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 transcriptional ac
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 protein ubiquitina
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 regulation of cell
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 regulation of apop
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 positive regulatio
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 negative regulatio
  • BRC1_HUMAN human Homo sapiens
    Q7WEV1 negative regulatio
  • ---------------------------------------------
    ---------------------------

94
Limits
  • List all descendants of a given term (Transitive
    Closure).
  • This query cannot be expressed in SQL, but easily
    with rules
  • Desc(X,Z) if term2term(X,Z).
  • Desc(X,Z) if term2term(X,Y), Desc(Y,Z).

95
Introduction to MySQL
96
SQL Structured Query Languagethe most common
standardized language used to access
databases. SQL has several parts DDL Data
Definition Language Defining, Deleting,
Modifying relation schemas DML Data
Manipulation Language Inserting, Deleting,
Modifying tuples in database Embedded SQL
defines how SQL statements can be used with
general-purposed programming
97
  • MySQL, the most popular Open Source SQL database,
    is developed, distributed and supported by MySQL
    AB.
  • MySQL is a relational database management system.
  • MySQL software is Open Source.
  • Written in C and C. Tested with a broad range
    of different compilers.
  • Works on many different platforms.
  • APIs for C, C, Eiffel, Java, Perl, PHP, Python,
    Ruby, and Tcl.
  • You can find MySQl manual and documentation at
    www .mysql.com/documentation/
  • You can download and install MySQL on your own
    computer (both under Windows and Linux)

98
MySQL To see a list of options provided by
mysql, invoke it with the --help option shellgt
mysql --help Using SQL On any linux you have
to use this to log on to MySQL shellgt
/usr/local/mysql/bin/mysql -h hostname -D
loginname -p shellgt mysql -h host -u user -p
Enter password The
represents your password enter it when mysql
displays the Enter password prompt.
99
  • Basic Query
  • select A1, A2,,An
  • from r1, r2, ,rm
  • where P
  • A1, A2,,An represent attributes
  • r1, r2, rm represent relations
  • P represents predicate (guard condition)
  • Keywords may be entered in any letter case
  • mysqlgt SELECT VERSION(), CURRENT_DATE
  • mysqlgt select version(), current_date
  • mysqlgt SeLeCt vErSiOn(), current_DATE

100
Prompt Meaning mysqlgt Ready for new command.
-gt Waiting for next line of
multiple-line command. gt Waiting for
next line, collecting a string that begins
with a single quote ( ). gt
Waiting for next line, collecting a string that
begins with a double quote ( ). mysqlgt
SELECT -gt FROM my_table -gt WHERE
name Smith AND age lt 30 mysqlgt SELECT
FROM my_table WHERE name "Smith AND age lt
30 "gt "\c mysqlgt \c to cancel the execution
of a command
101
  • Basic Database Operation
  • Create a database
  • Create a table
  • Load data into the table
  • Retrieve data from the table in various ways
  • Use multiple tables
  • Suppose you have several pets in your home (your
    menagerie) and you'd like to keep track of
    various types of information about them. You can
    do so by creating tables to hold your data and
    loading them with the desired information. Then
    you can answer different sorts of questions about
    your animals by retrieving data from the tables.

102
Creating and Using a Database mysqlgt SHOW
DATABASES SHOW statement can be used to find
out the databases currently existing on the
server mysqlgt USE testdb testdb is a database
name. USE command does not need a semi colon and
must be given in a single line. Database needs to
be invoked in order to use it. mysqlgt CREATE
DATABASE example Database names are
case-sensitive unlike keywords Same applies for
table names So example ! Example ! EXAMPLE or
some other variant
103
Creating a Table mysqlgt SHOW TABLES Displays
the current list of tables mysqlgt CREATE TABLE
pet (name VARCHAR(20), owner VARCHAR(20),
-gt species VARCHAR(20), sex CHAR(1), birth DATE,
death DATE) mysqlgt SHOW TABLES Will display
the table with the table name pet Verification
of the table can be done with DESCRIBE
command mysqlgt DESCRIBE pet ------------------
----------------------------------------------
Field Type Null Key Default
Extra -------------------------------------
--------------------------- name
varchar(20) YES NULL owner
varchar(20) YES NULL species
varchar(20) YES NULL sex char(1)
YES NULL birth date
YES NULL death date YES
NULL
104
Loading Data into a Table LOAD DATA uses a text
file with single record in a line that match the
attributes in the table. Useful for inserting
when multiple records are involved. Example
pet.txt is a text file with a single
record Name owner species sex birth death
Whistler Gwen bird \N 1997-12-09 \N
mysqlgt LOAD DATA LOCAL INFILE "pet.txt" INTO
TABLE pet INSERT command can be used when
records needs to be inserted one at a time. NULL
can be directly
Write a Comment
User Comments (0)
About PowerShow.com