LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore

Description:

LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz_at_nus.edu.sg http://bidd.nus.edu.sg – PowerPoint PPT presentation

Number of Views:209
Avg rating:3.0/5.0
Slides: 57
Provided by: dbsczw
Category:

less

Transcript and Presenter's Notes

Title: LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore


1
LSM3241 Bioinformatics and BiocomputingLecture
3 Molecular database developmentProf. Chen Yu
Zong Tel 6516-6877Email csccyz_at_nus.edu.sghtt
p//bidd.nus.edu.sgRoom 07-24, level 7, SOC1,
National University of Singapore
2
Development of a protein inhibitor database
  • Step 1 Start from a protein (alternative names
    from Swissprot)
  • Step 2 Search for inhibitors using protein name
    and alternative names (from Medline, or Google)
  • Step 3 Find additional information about each of
    the inhibitors (from ChemIDplus database)
  • Step 4 Database construction

3
Assignment of a protein
  • Each student is given an Id number, use this
    number to get your protein from the protein list
    posted on the module home page.
  • In case that you wish to work on a different
    protein instead of the one assigned to you, get
    another one from a list of back-up proteins
    posted on the module home page.
  • Earn extra credit by working on extra number of
    proteins

4
Inhibitor search
  • Search Medline by using keywords (protein name
    inhibitor).
  • In case no inhibitor or too few inhibitors can be
    found from medline, search google or yahoo as an
    alternative source.
  • Make sure the protein name and its alternative
    names (get from Swissprot) are used in the
    search.
  • Record inhibitor name and reference source
    (journal issue, page and publication year
    internet source etc.).
  • Record the number of references checked
    (including those found from the keyword-search
    but without inhibitors)
  • Try to find as many inhibitors as you can. The
    more known inhibitors you get, the more credit
    you can earn.

5
Finding chemical information
  • Search ChemIDPlus database (http//chem.sis.nlm.ni
    h.gov/chemidplus/) for additional information of
    each inhibitor
  • Before using ChemIDPlus, download Chime from MDL

Click here to search
Select Equals for inhibitor name Select
Contains for other keywords
Input inhibitor name
Click here to select Chime option in ChemIDPlus
(Chime is needed to view structure) More Info
at http//sis.nlm.nih.gov/Chem/ChemMain.html
6
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
Click here to get full record
Click here to get structure
7
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
8
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
9
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
10
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
11
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
12
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
13
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
Right-click in this field to get 3D structure
file
14
ChemIDPlus database
Additional information of each inhibitor may be
found from this database How to use this
database?
  • Steps for saving the file
  • Open a word or text file
  • Select all text in the source file
  • Copy (using CtrlC) and paste (using CtrlV) to
    the word or text file
  • Save the word or text file as chem_name.doc or
    chem_name.txt

15
Generating 3D structure from 2D structure
Use of ViewerPro software
Save 2D structure as a MDL file
16
ViewerPro software
Load 2D structure MDL file (viracept.mol)
Save 2D structure as a MDL file
17
ViewerPro software
Save to 3D structure select File gt Save as (pdb
format)
Save 2D structure as a MDL file
18
ViewerPro software
Re-load pdb file and clean structure select
Modify gt Clean Structure (click here 10 times)
Save 2D structure as a MDL file
19
ViewerPro software
Save as pdb file select File gt Save as (pdb
file)
Save 2D structure as a MDL file
20
Database construction
Information needed to construct
database Inhibitor Name Aspirin     Protein
Target Cox I   (SwissProt accession
number) 1 CAS Number 50-78-2    2 Formula
C9-H8-O4    2 Systematic Name O-Acetylsalicylic
acid, Salicylic acid, acetate    Synonyms
2-(Acetyloxy)benzoic acid, 2-Acetoxybenzoic acid
2, A.S.A. empirin 2 Classification
Analgesic,  Anti-inflammatory agents,
non-steroidal     Notes The prototypical
analgesic used in the treatment of mild to
moderate pain. It has anti-inflammatory and
antipyretic properties and acts as an inhibitor
of cyclooxygenase which results in the
inhibition of the biosynthesis of prostaglandins.
Aspirin also inhibits platelet aggregation and
is used in the prevention of arterial and venous
thrombosis. (From Martindale, The Extra
Pharmacopoeia, 30th ed,p5)     References
1OMIM 176805 (http//www.ncbi.nlm.nih.gov/htbin-
post/Omim/dispmim?176805) 2ChemIDplus Database
(http//chem.sis.nlm.nih.gov/chemidplus/) 3D
structure aspirin.pdb
21
Database construction
Database development
22
Database construction
Database development
23
Database construction
Database development
24
Database Construction
  • Database Definition
  • A Database is a shared collection of
    Inter-Related data designed to meet the needs of
    multiple types of users and applications.
  • This implies that multiple user VIEWS can be
    defined
  • Data stored is independent of the programs which
    use it
  • Data is structured to provide a basis for future
    applications
  • DATABASE Stored Collection of Related Data
  • May be physically
    distributed

25
Database construction
Database development
26
Database Construction
  • Data/Information
  • A General Definition
  • DATA - raw (unprocessed or partly processed)
    facts which represent the state of
    entities (things) which have
    occurred
  • INFORMATION - data which has been processed into
    a form USEFUL TO THE USER
  • What is Information to one user may be Data to
    another user.

27
Database construction
Database development
28
Database Models - Hierarchical

owner/parent child /parent
owner
member
child
child/parent
29
Database Models - Network
set of data
owner
member
owner
member
set of data
Note Only linked sets can be accessed
30
Database Models - Relational
table table
table table
table A B
C
D E
Any table(s) can be joined to any other table(s),
provided there is a means of effecting the
join Primary key / Foreign key concept. Data
redundancy No fixed linkages
31
Example 2 Relations in a Database
EMPNUM NAME Date of Birth
DEPTNUM 3 JONES
16-05-1956 605
7 SMITH 23-09-1965 432 11
ADAMS 11-08-1972 201 15
NGUYEN 23-10-1964 314 18 PHAN 16-11-197
6 201 Relation (Table) Name EMP Relation
Schema EMP(empnum,name,date of birth,deptnum)
DEPTNUM DEPTNAME 201
Production 314 Finance 432
Information Systems 605 Administration
Relation (Table) Name DEPT Relation Schema
DEPT(deptnum, deptname)
32
Definition of a Relational Database
  • A relational database is a collection of
    relations or two-dimensional tables.

Database
33
Data Models
Model ofsystemin clientsmind
Entity model ofclients model
Table modelof entity model
Tables on disk
34
Relational Database Management System
Server
User tables
Data dictionary
35
Database Components
  • 1. Back End Engine
  • Used for Disk Input/Output processes
  • (Read/Write/Find)
  • 2. Front End Processor
  • Data manipulation
  • String/Arithmetic/Statistical operations
  • 3. DBMS Interface
  • Data Definition Language (DDL) Data
    Manipulation Language (DML)
  • 4.Programmer Interface
  • Applications Environment (4GLs, Embedded
    capability)

36
String Matching
  • Based on 1. Analysis of query pattern
  • 2. Auxiliary table
  • 3. Character mapping
  • 1 2 3 4 5 6 7 8 9 10 11
  • L E C T U R E R O O M
  • L occurs in position 1 E occurs
    in positions 2 and 7
  • C occurs in position 3 T occurs
    in position 4
  • U occurs in position 5 R occurs in
    positions 6 and 8
  • O occurs in positions 9 and 10 M occurs in
    position 11
  • Matching starts ltRightgt character and
    progressively shifts left on match

37
Database construction
Database development data format
38
Database construction
Database development data languages
39
Database construction
Database development An example of data language
40
Communicating with a RDBMS Using SQL
41
Database construction
  • Data Definition Language (DDL)
  • used to describe data at the database level
  • Schema level - complete database description
  • Sub-Schema level - user views (restricted)
  • Data Manipulation Language (DML)
  • Provides for Create Insert
  • Update Retrieve (extract)
  • Delete Drop
  • Modify Calculation
  • Report
  • capabilities

42
Using Oracle
  • SQL
  • Structured Query Language DDL, DML, DCL
  • Universally supported by database servers
  • SQLPlus
  • Oracle command-line SQL interpreter
  • Account (schema) independent working space

43
SQL data types
Type description Oracle SQL
variable-length char. string VARCHAR2(l)a
fixed-length char. string CHAR(l) a
number NUMBER(p,s)b
currency NUMBER(10,2)
date DATE
a length.b length, scale.
44
General Data Types
  • Used to augment an attribute description and to
    provide a means of Integrity
  • Normal data types are
  • Character (or text)
  • Numeric - Integer, Decimal, Money, Float
  • (in Access Long and Short Integer,
  • Decimal with options of a number of decimal
    places)
  • Date - Standard date format - Access has
    various forms
  • Logical - Yes/No True/False Exists/Does
    Not Exist

45
Data Attribute Size
  • In many cases this is set by the Data Definition
    facilities
  • e.g.. Date, Short Integer, Long Integer,
    Logical,Number
  • Others are set by the Designer
  • Number of Characters, Size of a Decimal
    Number
  • Access allows for a Default value
  • Duplicates/ No Duplicates
    allowed
  • Primary Key nomination
  • Indexing
  • Required Status of an Attribute

46
DDL Example
  • CREATE TABLE books (
  • title VARCHAR2(60),
  • author VARCHAR2(60),
  • isbn NUMBER(10,0)
  • CONSTRAINT pk_books PRIMARY KEY,
  • pub_date DATE DEFAULT SYSDATE
  • )
  • PRIMARY KEY column serves as a unique identifier
  • SYSDATE returns the systems current date
  • terminates a SQL statement

47
DDL Example
  • CREATE TABLE book_reviews (
  • isbn NUMBER(10,0)
  • CONSTRAINT fk_books_booksrev REFERENCES
    books(isbn),
  • reviewer VARCHAR2(30),
  • comments VARCHAR2(150)
  • )
  • REFERENCES value in this column must be one of
    the values in the isbn column of the table
    books. This is called referential integrity.

48
DML Example
  • INSERT INTO books VALUES(
  • The Importance of Being Earnest,
  • Oscar Wilde, -- this is a comment
  • 9876543210,
  • 14-FEB-1895
  • )
  • -- Comment. The part from this symbol till the
    end of the line will be ignored by the database
    server.
  • 14-FEB-1895 Date format. DD-MMM-YYYY or
    DD-MMM-YY

49
DML Example
  • SELECT FROM books
  • SELECT FROM books
  • WHERE author Oscar Wilde
  • every column
  • Oscar Wilde strings are quoted by . If is
    part of the string, it is escaped by double it
    (e.g. string ab should be ab in SQL)

50
DML Example
  • SELECT title FROM books
  • WHERE author LIKE Oscar
  • You will get The Importance of Being Earnest
  • LIKE fuzzy matching (in contrast to )
  • wild character
  • represents any string of any length (Oscr
    matches Oscar, Osceer)
  • _ represents any single character (Osc_r
    matches Oscar, Oscer)

51
DML Example
  • COMMIT
  • ROLLBACK
  • if you have modified the data in database, such
    as inserted or deleted an entry, you can either
  • COMMIT to make this change permanent, or
  • ROLLBACK to cancel your operation (the data will
    look as if you just finished your last COMMIT).

52
Further readings
  • Oracle TechNet
  • http//otn.oracle.com/index.html
  • Free subscription
  • Online tutorials
  • http//oldweb.uwp.edu/academic/mis/baldwin/sqlplus
    .htm
  • http//www.ilook.fsnet.co.uk/index/oracle.htm
  • Google for more!

53
Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
54
Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
55
Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
56
Database construction
Database development webpage http//bidd.nus.edu
.sg/group/teach/submit/welcome.asp
57
Summary of Todays lecture
  • Molecular database development
  • Protein inhibitor search
  • Getting additional info about selected inhibitors
    (using ChemIDPlus database)
  • Generating 3D structure (using DS ViewerPro)
  • Database construction
Write a Comment
User Comments (0)
About PowerShow.com