Title: LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore
1LSM3241 Bioinformatics and BiocomputingLecture
3 Molecular database developmentProf. Chen Yu
Zong Tel 6516-6877Email csccyz_at_nus.edu.sghtt
p//bidd.nus.edu.sgRoom 07-24, level 7, SOC1,
National University of Singapore
2Development of a protein inhibitor database
- Step 1 Start from a protein (alternative names
from Swissprot) - Step 2 Search for inhibitors using protein name
and alternative names (from Medline, or Google) - Step 3 Find additional information about each of
the inhibitors (from ChemIDplus database) - Step 4 Database construction
3Assignment of a protein
- Each student is given an Id number, use this
number to get your protein from the protein list
posted on the module home page. - In case that you wish to work on a different
protein instead of the one assigned to you, get
another one from a list of back-up proteins
posted on the module home page. - Earn extra credit by working on extra number of
proteins
4Inhibitor search
- Search Medline by using keywords (protein name
inhibitor). - In case no inhibitor or too few inhibitors can be
found from medline, search google or yahoo as an
alternative source. - Make sure the protein name and its alternative
names (get from Swissprot) are used in the
search. - Record inhibitor name and reference source
(journal issue, page and publication year
internet source etc.). - Record the number of references checked
(including those found from the keyword-search
but without inhibitors) - Try to find as many inhibitors as you can. The
more known inhibitors you get, the more credit
you can earn.
5Finding chemical information
- Search ChemIDPlus database (http//chem.sis.nlm.ni
h.gov/chemidplus/) for additional information of
each inhibitor - Before using ChemIDPlus, download Chime from MDL
Click here to search
Select Equals for inhibitor name Select
Contains for other keywords
Input inhibitor name
Click here to select Chime option in ChemIDPlus
(Chime is needed to view structure) More Info
at http//sis.nlm.nih.gov/Chem/ChemMain.html
6ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
Click here to get full record
Click here to get structure
7ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
8ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
9ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
10ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
11ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
12ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
13ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
Right-click in this field to get 3D structure
file
14ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
- Steps for saving the file
- Open a word or text file
- Select all text in the source file
- Copy (using CtrlC) and paste (using CtrlV) to
the word or text file - Save the word or text file as chem_name.doc or
chem_name.txt
15Generating 3D structure from 2D structure
Use of ViewerPro software
Save 2D structure as a MDL file
16ViewerPro software
Load 2D structure MDL file (viracept.mol)
Save 2D structure as a MDL file
17ViewerPro software
Save to 3D structure select File gt Save as (pdb
format)
Save 2D structure as a MDL file
18ViewerPro software
Re-load pdb file and clean structure select
Modify gt Clean Structure (click here 10 times)
Save 2D structure as a MDL file
19ViewerPro software
Save as pdb file select File gt Save as (pdb
file)
Save 2D structure as a MDL file
20Database construction
Information needed to construct
database Inhibitor Name Aspirin Protein
Target Cox I (SwissProt accession
number) 1 CAS Number 50-78-2 2 Formula
C9-H8-O4 2 Systematic Name O-Acetylsalicylic
acid, Salicylic acid, acetate Synonyms
2-(Acetyloxy)benzoic acid, 2-Acetoxybenzoic acid
2, A.S.A. empirin 2 Classification
Analgesic, Anti-inflammatory agents,
non-steroidal Notes The prototypical
analgesic used in the treatment of mild to
moderate pain. It has anti-inflammatory and
antipyretic properties and acts as an inhibitor
of cyclooxygenase which results in the
inhibition of the biosynthesis of prostaglandins.
Aspirin also inhibits platelet aggregation and
is used in the prevention of arterial and venous
thrombosis. (From Martindale, The Extra
Pharmacopoeia, 30th ed,p5) References
1OMIM 176805 (http//www.ncbi.nlm.nih.gov/htbin-
post/Omim/dispmim?176805) 2ChemIDplus Database
(http//chem.sis.nlm.nih.gov/chemidplus/) 3D
structure aspirin.pdb
21Database construction
Database development
22Database construction
Database development
23Database construction
Database development
24Database Construction
- Database Definition
- A Database is a shared collection of
Inter-Related data designed to meet the needs of
multiple types of users and applications. - This implies that multiple user VIEWS can be
defined - Data stored is independent of the programs which
use it - Data is structured to provide a basis for future
applications - DATABASE Stored Collection of Related Data
- May be physically
distributed
25Database construction
Database development
26Database Construction
- Data/Information
- A General Definition
- DATA - raw (unprocessed or partly processed)
facts which represent the state of
entities (things) which have
occurred - INFORMATION - data which has been processed into
a form USEFUL TO THE USER - What is Information to one user may be Data to
another user.
27Database construction
Database development
28Database Models - Hierarchical
owner/parent child /parent
owner
member
child
child/parent
29Database Models - Network
set of data
owner
member
owner
member
set of data
Note Only linked sets can be accessed
30Database Models - Relational
table table
table table
table A B
C
D E
Any table(s) can be joined to any other table(s),
provided there is a means of effecting the
join Primary key / Foreign key concept. Data
redundancy No fixed linkages
31Example 2 Relations in a Database
EMPNUM NAME Date of Birth
DEPTNUM 3 JONES
16-05-1956 605
7 SMITH 23-09-1965 432 11
ADAMS 11-08-1972 201 15
NGUYEN 23-10-1964 314 18 PHAN 16-11-197
6 201 Relation (Table) Name EMP Relation
Schema EMP(empnum,name,date of birth,deptnum)
DEPTNUM DEPTNAME 201
Production 314 Finance 432
Information Systems 605 Administration
Relation (Table) Name DEPT Relation Schema
DEPT(deptnum, deptname)
32Definition of a Relational Database
- A relational database is a collection of
relations or two-dimensional tables.
Database
33Data Models
Model ofsystemin clientsmind
Entity model ofclients model
Table modelof entity model
Tables on disk
34Relational Database Management System
Server
User tables
Data dictionary
35Database Components
- 1. Back End Engine
- Used for Disk Input/Output processes
- (Read/Write/Find)
- 2. Front End Processor
- Data manipulation
- String/Arithmetic/Statistical operations
- 3. DBMS Interface
- Data Definition Language (DDL) Data
Manipulation Language (DML) - 4.Programmer Interface
- Applications Environment (4GLs, Embedded
capability)
36String Matching
- Based on 1. Analysis of query pattern
- 2. Auxiliary table
- 3. Character mapping
- 1 2 3 4 5 6 7 8 9 10 11
- L E C T U R E R O O M
- L occurs in position 1 E occurs
in positions 2 and 7 - C occurs in position 3 T occurs
in position 4 - U occurs in position 5 R occurs in
positions 6 and 8 - O occurs in positions 9 and 10 M occurs in
position 11 - Matching starts ltRightgt character and
progressively shifts left on match
37Database construction
Database development data format
38Database construction
Database development data languages
39Database construction
Database development An example of data language
40Communicating with a RDBMS Using SQL
41Database construction
- Data Definition Language (DDL)
- used to describe data at the database level
- Schema level - complete database description
- Sub-Schema level - user views (restricted)
- Data Manipulation Language (DML)
- Provides for Create Insert
- Update Retrieve (extract)
- Delete Drop
- Modify Calculation
- Report
- capabilities
-
42Using Oracle
- SQL
- Structured Query Language DDL, DML, DCL
- Universally supported by database servers
- SQLPlus
- Oracle command-line SQL interpreter
- Account (schema) independent working space
43SQL data types
Type description Oracle SQL
variable-length char. string VARCHAR2(l)a
fixed-length char. string CHAR(l) a
number NUMBER(p,s)b
currency NUMBER(10,2)
date DATE
a length.b length, scale.
44General Data Types
- Used to augment an attribute description and to
provide a means of Integrity - Normal data types are
- Character (or text)
- Numeric - Integer, Decimal, Money, Float
- (in Access Long and Short Integer,
- Decimal with options of a number of decimal
places) - Date - Standard date format - Access has
various forms - Logical - Yes/No True/False Exists/Does
Not Exist
45Data Attribute Size
- In many cases this is set by the Data Definition
facilities - e.g.. Date, Short Integer, Long Integer,
Logical,Number - Others are set by the Designer
- Number of Characters, Size of a Decimal
Number - Access allows for a Default value
- Duplicates/ No Duplicates
allowed - Primary Key nomination
- Indexing
- Required Status of an Attribute
-
46DDL Example
- CREATE TABLE books (
- title VARCHAR2(60),
- author VARCHAR2(60),
- isbn NUMBER(10,0)
- CONSTRAINT pk_books PRIMARY KEY,
- pub_date DATE DEFAULT SYSDATE
- )
- PRIMARY KEY column serves as a unique identifier
- SYSDATE returns the systems current date
- terminates a SQL statement
47DDL Example
- CREATE TABLE book_reviews (
- isbn NUMBER(10,0)
- CONSTRAINT fk_books_booksrev REFERENCES
books(isbn), - reviewer VARCHAR2(30),
- comments VARCHAR2(150)
- )
- REFERENCES value in this column must be one of
the values in the isbn column of the table
books. This is called referential integrity.
48DML Example
- INSERT INTO books VALUES(
- The Importance of Being Earnest,
- Oscar Wilde, -- this is a comment
- 9876543210,
- 14-FEB-1895
- )
- -- Comment. The part from this symbol till the
end of the line will be ignored by the database
server. - 14-FEB-1895 Date format. DD-MMM-YYYY or
DD-MMM-YY
49DML Example
- SELECT FROM books
- SELECT FROM books
- WHERE author Oscar Wilde
- every column
- Oscar Wilde strings are quoted by . If is
part of the string, it is escaped by double it
(e.g. string ab should be ab in SQL)
50DML Example
- SELECT title FROM books
- WHERE author LIKE Oscar
- You will get The Importance of Being Earnest
- LIKE fuzzy matching (in contrast to )
- wild character
- represents any string of any length (Oscr
matches Oscar, Osceer) - _ represents any single character (Osc_r
matches Oscar, Oscer)
51DML Example
- COMMIT
- ROLLBACK
- if you have modified the data in database, such
as inserted or deleted an entry, you can either - COMMIT to make this change permanent, or
- ROLLBACK to cancel your operation (the data will
look as if you just finished your last COMMIT).
52Further readings
- Oracle TechNet
- http//otn.oracle.com/index.html
- Free subscription
- Online tutorials
- http//oldweb.uwp.edu/academic/mis/baldwin/sqlplus
.htm - http//www.ilook.fsnet.co.uk/index/oracle.htm
- Google for more!
53Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
54Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
55Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
56Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
57Summary of Todays lecture
- Molecular database development
- Protein inhibitor search
- Getting additional info about selected inhibitors
(using ChemIDPlus database) - Generating 3D structure (using DS ViewerPro)
- Database construction