Title: Databases.
1. BIOLOGICAL DATABASES
M. Prasad Naidu, MSc Medical Biochemistry, Ph.D.
2. INTRODUCTION
- The database:
  - must be maintained as a central shareable resource
  - should provide easy-to-use software to access the information (web pages...)
  - has to be structurally organised and fully annotated to find the information needed
  - should not contain redundant information
  - should be error free
3. Levels of protein sequence databases and structural organisation
- Primary database: primary structure (sequence), e.g. AVILDRYFH
- Secondary database: secondary level (motif or pattern), e.g. AS-X-IL2-DE
- Structure database: tertiary structure (domain), e.g. Rossmann fold, GTP-binding domain...
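The step from a primary sequence to a secondary-level motif can be sketched as a pattern scan. The sketch below assumes that the slide's pattern AS-X-IL2-DE stands for the PROSITE-style pattern [AS]-x-[IL](2)-[DE] (the brackets appear to have been lost). That reading and the helper name prosite_to_regex are illustrative assumptions, and only the simplest pattern elements are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression.

    Handles only residue sets like [AS], single residues, the wildcard x,
    and repeat counts like (2). Hypothetical helper for illustration.
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        regex = "." if residue == "x" else residue  # x means any residue
        if count is not None:
            regex += "{" + count + "}"
        parts.append(regex)
    return "".join(parts)


# Assumed reading of the slide's pattern, scanned against the slide's sequence.
motif = prosite_to_regex("[AS]-x-[IL](2)-[DE]")
hits = [(m.start() + 1, m.group()) for m in re.finditer(motif, "AVILDRYFH")]
```

Under this reading, the example sequence from the primary level contains the motif at its first residue, which is how a secondary database entry is derived from primary data.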
4. Different types of databases
- Primary Databases.
- Composite Databases.
- Secondary Databases.
5. PRIMARY DATABASES
- In the early 1980s, the flood of sequence information created a need to store sequence data.
- They contain sequence information.
- Examples:
  - Nucleic acid (NA): EMBL, GenBank, DDBJ
  - Protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D
6. PIR
- Developed at the National Biomedical Research Foundation (NBRF) in the 1960s by Margaret Dayhoff to investigate evolutionary relationships between proteins.
- Maintained by PIR-International, an association of macromolecular sequence data collection centres:
  - PIR at NBRF
  - International Protein Information Database of Japan (JIPID)
  - Martinsried Institute for Protein Sequences (MIPS)
7. Quality of the PIR database
- Has been split into 4 sections ranked according to quality:
  - PIR1: fully classified and annotated entries
  - PIR2: preliminary entries (may include redundancy)
  - PIR3: unverified entries
  - PIR4: conceptual translations
8. MIPS
- Collects and processes sequence data for the PIR.
- Also distributed with PATCHX, a supplement of unverified protein sequences from external sources.
9. SWISS-PROT database
- Produced since 1986 by the Department of Medical Biochemistry at the University of Geneva and the EMBL.
- Was transferred to the EBI in 1994.
- Subsequently moved to the Swiss Institute of Bioinformatics (SIB).
- Has highly annotated entries with descriptions of function, structure and post-translational modifications.
10. Example of a flat file: SWISS-PROT Q14790
ID   ICE8_HUMAN     STANDARD;      PRT;   479 AA.
AC   Q14790; Q14791; Q14792; Q14793; Q14794; Q14795; Q14796; Q15780;
AC   Q15806; Q9UQ81; O14676;
DT   01-NOV-1997 (Rel. 35, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   01-OCT-2000 (Rel. 40, Last annotation update)
DE   CASPASE-8 PRECURSOR (EC 3.4.22.-) (ICE-LIKE APOPTOTIC PROTEASE 5)
DE   (MORT1-ASSOCIATED CED-3 HOMOLOG) (MACH) (FADD-HOMOLOGOUS
DE   ICE/CED-3-LIKE PROTEASE) (FADD-LIKE ICE) (FLICE) (APOPTOTIC
DE   CYSTEINE PROTEASE) (APOPTOTIC PROTEASE MCH-5) (CAP4).
GN   CASP8 OR MCH5.
11. (flat file, continued)
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A., AND ALTERNATIVE SPLICING.
RC   TISSUE=Thymus, and B-cell;
RX   MEDLINE=96279826; PubMed=8681376; [NCBI, ExPASy, EBI, Israel, Japan]
RA   Boldin M.P., Goncharov T.M., Goltsev Y.V., Wallach D.;
12. (flat file, continued)
RT   "Involvement of MACH, a novel MORT1/FADD-interacting protease, in
RT   Fas/APO-1- and TNF receptor-induced cell death.";
RL   Cell 85:803-815(1996).
RN   [2]
RP   X-RAY CRYSTALLOGRAPHY (2.8 ANGSTROMS).
RX   MEDLINE=99451259; PubMed=10508784; [NCBI, ExPASy, EBI, Israel, Japan]
RA   Blanchard H., Kodandapani L., Mittl P.R.E., Di Marco S., Krebs J.F.,
RA   Wu J.C., Tomaselli K.J., Gruetter M.G.;
RT   "The three-dimensional structure of caspase-8: an initiator enzyme in
RT   apoptosis.";
RL   Structure 7:1125-1133(1999).
13. (flat file, continued)
CC   -!- FUNCTION: MOST UPSTREAM PROTEASE OF THE ACTIVATION CASCADE OF
CC       CASPASES RESPONSIBLE FOR THE FAS-RECEPTOR MEDIATED (CD95) AND
CC       TNFR-1 INDUCED CELL DEATH. BINDING TO THE ADAPTOR MOLECULE FADD
CC       RECRUITS IT TO EITHER RECEPTORS. THE RESULTING AGGREGATE CALLED
CC       THE DEATH-INDUCING SIGNALING COMPLEX (DISC) PERFORMS FLICE/MACH
CC       PROTEOLYTIC ACTIVATION. THE ACTIVE DIMERIC ENZYME IS THEN
CC       LIBERATED FROM THE DISC AND FREE TO ACTIVATE DOWNSTREAM APOPTOTIC
CC       PROTEASES. PROTEOLYTIC FRAGMENTS OF THE N-TERMINAL PROPEPTIDE
CC       (TERMED CAP3, CAP5 AND CAP6) ARE LIKELY RETAINED IN THE DISC.
CC       CLEAVES
Comments
14. (flat file, continued)
CC       AND ACTIVATES CASPASE-3, -4, -6, -7, -9, AND -10. MAY PARTICIPATE
CC       IN THE GRANZYME B APOPTOTIC PATHWAYS. PROTEOLYTICALLY CLEAVES
CC       POLY(ADP-RIBOSE) POLYMERASE (PARP). HYDROLYZES THE SMALL-MOLECULE
CC       SUBSTRATE, AC-ASP-GLU-VAL-ASP--AMC. LIKELY TARGET FOR THE COWPOX
CC       VIRUS CRMA DEATH INHIBITORY PROTEIN.
CC   -!- SUBUNIT: HETERODIMER OF A 18 KDA (P18) AND A 10 KDA (P10) SUBUNIT.
CC       INTERACTS WITH CFLAR.
CC   -!- ALTERNATIVE PRODUCTS: 8 ISOFORMS; 1-ALPHA (SHOWN HERE),
CC       2-ALPHA/MCH5-BETA, 3-ALPHA, 4-ALPHA, 1-BETA, 2-BETA, 3-BETA AND
CC       4-BETA; ARE PRODUCED BY ALTERNATIVE SPLICING.
Presence of subunits and of alternative products (isoforms)
15. (flat file, continued)
CC   -!- TISSUE SPECIFICITY: ALPHA 1 AND BETA 1 ISOFORMS ARE EXPRESSED IN A
CC       WIDE VARIETY OF TISSUES. HIGHEST EXPRESSION IN PERIPHERAL BLOOD
CC       LEUKOCYTES, SPLEEN, THYMUS AND LIVER. BARELY DETECTABLE IN BRAIN,
CC       TESTIS, AND SKELETAL MUSCLE.
CC   -!- PTM: GENERATION OF THE SUBUNITS REQUIRES ASSOCIATION WITH THE DISC,
CC       WHEREAS ADDITIONAL PROCESSING IS LIKELY DUE TO THE AUTOCATALYTIC
CC       ACTIVITY OF THE ACTIVATED PROTEASE. GRANZYME B AND CASPASE-10 CAN
CC       BE INVOLVED IN THESE PROCESSING EVENTS.
CC   -!- SIMILARITY: BELONGS TO PEPTIDASE FAMILY C14, ALSO KNOWN AS THE
CC       CASPASE FAMILY. CONTAINS 2 DEATH EFFECTOR DOMAINS (DED).
Tissue specificity, post-translational modifications, similarity
16. (flat file, continued)
DR   EMBL; X98172; CAA66853.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
DR   EMBL; X98173; CAA66854.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
DR   EMBL; X98174; CAA66855.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
DR   PDB; 1QDU; PRELIMINARY. [ExPASy / RCSB]
DR   SWISS-3DIMAGE; ICE8_HUMAN.
DR   InterPro; IPR001875; DED.
DR   Pfam; PF01335; DED; 2.
DR   Pfam; PF00655; ICE_p10; 1.
DR   Pfam; PF00656; ICE_p20; 1.
DR   PROSITE; PS50207; CASPASE_P10; 1.
DR   PROSITE; PS50208; CASPASE_P20; 1.
DR   PROSITE; PS50168; DED; 2.
Database cross-references with accession numbers
17. (flat file, continued)
DR   ProDom [Domain structure / List of seq. sharing at least 1 domain]
DR   BLOCKS; Q14790.
DR   DOMO; Q14790.
DR   PROTOMAP; Q14790.
DR   PRESAGE; Q14790.
DR   DIP; Q14790.
DR   SWISS-2DPAGE; GET REGION ON 2D PAGE.
KW   Hydrolase; Thiol protease; Apoptosis; Zymogen; Alternative splicing;
KW   3D-structure.
Keywords
18. (flat file, continued)
FT   PROPEP        1    216
FT   CHAIN       217    374       CASPASE-8 SUBUNIT P18.
FT   PROPEP      375    384
FT   CHAIN       385    479       CASPASE-8 SUBUNIT P10.
FT   ACT_SITE    317    317
FT   ACT_SITE    360    360
FT   DOMAIN        2     80       DED 1.
FT   DOMAIN      100    177       DED 2.
FT   VARSPLIC    102    102       R -> RFHFCRMSWAEANSQCQTQSVPFWRRVDHLLIR (IN
FT                                ISOFORM 4 ALPHA).
FT   VARSPLIC                     MISSING (IN ISOFORM 2 ALPHA, ISOFORM 4
FT                                ALPHA AND ISOFORM 4 BETA).
FT   CONFLICT    285    285       D -> H (IN REF. 3 AND 5).
FT   CONFLICT    294    294       E -> D (IN REF. 4).
Feature Table
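Feature table positions are 1-based and inclusive, so a feature can be cut out of the sequence by simple slicing. The sketch below is only an illustration: the sequence is copied from the SQ block of this entry, and the helper name feature is hypothetical.

```python
# Caspase-8 sequence as listed in the SQ block of entry Q14790 (479 residues).
SEQUENCE = "".join("""
MDFSRNLYDI GEQLDSEDLA SLKFLSLDYI PQRKQEPIKD ALMLFQRLQE KRMLEESNLS
FLKELLFRIN RLDLLITYLN TRKEEMEREL QTPGRAQISA YRVMLYQISE EVSRSELRSF
KFLLQEEISK CKLDDDMNLL DIFIEMEKRV ILGEGKLDIL KRVCAQINKS LLKIINDYEE
FSKERSSSLE GSPDEFSNGE ELCGVMTISD SPREQDSESQ TLDKVYQMKS KPRGYCLIIN
NHNFAKAREK VPKLHSIRDR NGTHLDAGAL TTTFEELHFE IKPHDDCTVE QIYEILKIYQ
LMDHSNMDCF ICCILSHGDK GIIYGTDGQE APIYELTSQF TGLKCPSLAG KPKVFFIQAC
QGDNYQKGIP VETDSEEQPY LEMDLSSPQT RYIPDEADFL LGMATVNNCV SYRNPAEGTW
YIQSLCQSLR ERCPRGDDIL TILTEVNYEV SNKDDKKNMG KQMPQPTFTL RKKLVFPSD
""".split())


def feature(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


p18 = feature(SEQUENCE, 217, 374)   # FT CHAIN 217 374, subunit P18
p10 = feature(SEQUENCE, 385, 479)   # FT CHAIN 385 479, subunit P10
active_sites = {pos: feature(SEQUENCE, pos, pos) for pos in (317, 360)}
```

With these coordinates the two ACT_SITE positions fall on a histidine and a cysteine, which fits the "Thiol protease" keyword of the entry.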
19. (flat file, continued)
SQ   SEQUENCE   479 AA;  55391 MW;  7A5FEAA6B39B582F CRC64;
     MDFSRNLYDI GEQLDSEDLA SLKFLSLDYI PQRKQEPIKD ALMLFQRLQE KRMLEESNLS
     FLKELLFRIN RLDLLITYLN TRKEEMEREL QTPGRAQISA YRVMLYQISE EVSRSELRSF
     KFLLQEEISK CKLDDDMNLL DIFIEMEKRV ILGEGKLDIL KRVCAQINKS LLKIINDYEE
     FSKERSSSLE GSPDEFSNGE ELCGVMTISD SPREQDSESQ TLDKVYQMKS KPRGYCLIIN
     NHNFAKAREK VPKLHSIRDR NGTHLDAGAL TTTFEELHFE IKPHDDCTVE QIYEILKIYQ
     LMDHSNMDCF ICCILSHGDK GIIYGTDGQE APIYELTSQF TGLKCPSLAG KPKVFFIQAC
     QGDNYQKGIP VETDSEEQPY LEMDLSSPQT RYIPDEADFL LGMATVNNCV SYRNPAEGTW
     YIQSLCQSLR ERCPRGDDIL TILTEVNYEV SNKDDKKNMG KQMPQPTFTL RKKLVFPSD
//
The same entry can be viewed in a Web-oriented layout via SWISS-PROT.
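Because every line of the flat file starts with a two-letter line code (ID, AC, DE, ... SQ) and the entry ends with //, it is easy to read by program. The following is a minimal sketch, not an official parser: it assumes the standard layout in which fields are separated by semicolons (punctuation that the slide text may not show), and the function name parse_flat_file is hypothetical.

```python
from collections import defaultdict


def parse_flat_file(text):
    """Group the lines of one entry by their two-letter line code.

    Returns (fields, sequence): fields maps a code such as "ID" or "AC" to
    the list of its line contents; sequence is the residue string that
    follows the SQ header line.
    """
    fields = defaultdict(list)
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        if in_sequence:                # sequence lines carry no line code
            residues.append("".join(line.split()))
            continue
        if not line.strip():
            continue
        code, content = line[:2], line[2:].strip()
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True
    return dict(fields), "".join(residues)


# Excerpt of the Q14790 entry, truncated to its first 20 residues.
EXCERPT = """\
ID   ICE8_HUMAN     STANDARD;      PRT;   479 AA.
AC   Q14790; Q14791; Q14792; Q14793; Q14794;
GN   CASP8 OR MCH5.
SQ   SEQUENCE   479 AA;  55391 MW;  7A5FEAA6B39B582F CRC64;
     MDFSRNLYDI GEQLDSEDLA
//
"""

fields, sequence = parse_flat_file(EXCERPT)
accessions = [a.strip() for a in " ".join(fields["AC"]).split(";") if a.strip()]
```

Keeping the raw line contents per code leaves further interpretation (dates, cross-references, features) to separate small functions.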
20. TrEMBL database
- Designed as a supplement to SWISS-PROT
- Benefits users by providing translations of all coding sequences in EMBL
- Consists of 2 sections:
  - SP-TrEMBL, with entries that will be incorporated into SWISS-PROT after annotation
  - REM-TrEMBL, with entries that are not destined to be included in SWISS-PROT (synthetic sequences, conceptual translations, ...), so that they do not compromise the quality of SWISS-PROT
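A conceptual translation turns a coding sequence (CDS) into a protein sequence codon by codon. The sketch below assumes the standard genetic code and a CDS that is already in frame; the function name translate_cds and the example CDS are made up for illustration and do not describe how TrEMBL is actually produced.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon.

    Unknown codons (for example those containing N) are written as X.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends the translation
            break
        protein.append(aa)
    return "".join(protein)


# Made-up CDS: ATG GAT TTC TCT TAA
protein = translate_cds("ATGGATTTCTCTTAA")
```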
21. NRL-3D database
- Contains only protein sequences extracted from the Brookhaven Protein Data Bank (PDB)
- But also includes:
  - bibliographic references and MEDLINE cross-references
  - secondary structure information
  - active and binding sites, modifications in the sequence
  - details on experimental method, resolution, R-factor, ...
22. Composite protein sequence databases
- To render sequence searching more efficient
- To answer the question of which primary database to choose (which is the most up to date? which database to use? ...)
23. Some of the composite protein sequence databases available
Each composite database is listed with its source databases:
- NRDB: PDB, SWISS-PROT, PIR, GenPept, SWISS-PROT update, GenPept update
- OWL: SWISS-PROT, PIR, GenBank, NRL-3D
- MIPSX: PIR, MIPSOwn, MIPSTrn, MIPSH, PIRMOD, NRL-3D, SWISS-PROT, EMTrans, GBTrans, Kabat, PseqIP
- SPTrEMBL: SWISS-PROT, TrEMBL
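The make-up of the composite databases can be held in a small lookup structure to see which primary sources they share. This is only an illustrative sketch: the dictionary transcribes the list on this slide, and the function name composites_containing is hypothetical.

```python
# Composite database -> source databases, as listed on the slide.
COMPOSITES = {
    "NRDB": ["PDB", "SWISS-PROT", "PIR", "GenPept",
             "SWISS-PROT update", "GenPept update"],
    "OWL": ["SWISS-PROT", "PIR", "GenBank", "NRL-3D"],
    "MIPSX": ["PIR", "MIPSOwn", "MIPSTrn", "MIPSH", "PIRMOD", "NRL-3D",
              "SWISS-PROT", "EMTrans", "GBTrans", "Kabat", "PseqIP"],
    "SPTrEMBL": ["SWISS-PROT", "TrEMBL"],
}


def composites_containing(source):
    """Return the composite databases that include the given source."""
    return [name for name, sources in COMPOSITES.items() if source in sources]


# Sources common to every composite database in the table.
shared_by_all = set.intersection(*(set(s) for s in COMPOSITES.values()))
```

In this table SWISS-PROT is the only source present in all four composites, while PIR appears in three of them.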
24. NRDB
- NRDB (Non-Redundant DataBase) is built locally at the NCBI.
- It is a composite of:
  - GenPept (GenBank CDS translations)
  - PDB sequences
  - SWISS-PROT update (updates of SWISS-PROT)
  - PIR
  - GenPept update (daily updates of GenPept)
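- NRDB is not free of errors: only identical entries are removed, so near-duplicate and erroneous sequences from the source databases can persist.
- NRDB is the default database of the NCBI BLAST services.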
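How a composite is assembled from several sources can be sketched as a merge that drops repeated sequences. The example assumes removal by exact sequence identity only, which is a simplification and not the NCBI's actual procedure; the function name and the toy records are hypothetical.

```python
def merge_non_identical(*sources):
    """Merge (identifier, sequence) records from several source databases.

    The first copy of each distinct sequence is kept. Only exact duplicates
    are removed, so sequences differing by a single residue both survive.
    """
    seen = set()
    merged = []
    for records in sources:
        for identifier, sequence in records:
            key = sequence.upper()
            if key in seen:
                continue
            seen.add(key)
            merged.append((identifier, sequence))
    return merged


# Toy records: pir1 repeats sp1 exactly, pir2 differs in its last residue.
swissprot = [("sp1", "MDFSRNLYDI")]
pir = [("pir1", "MDFSRNLYDI"), ("pir2", "MDFSRNLYDV")]
composite = merge_non_identical(swissprot, pir)
```

The order of the sources decides which identifier is kept, so the best-annotated source would normally be given first.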
25. OWL
- Non-redundant protein sequence database.
- Built at the University of Leeds in collaboration with the Daresbury Laboratory in Warrington.
- Composite of:
  - SWISS-PROT
  - PIR
  - GenBank
  - NRL-3D
26. MIPSX
- Merged database produced at the Max Planck Institute in Martinsried (Martinsried Institute for Protein Sequences, MIPS).
- Composite of:
  - PIR
  - MIPSOwn
  - MIPSTrn
  - MIPSH
  - PIRMOD
  - NRL-3D
  - SWISS-PROT
  - EMTrans
  - GBTrans
27. SWISS-PROT + TrEMBL
- Database constructed at the EBI.
- Composite of both SWISS-PROT and TrEMBL.
- Minimally redundant.
- SRS (Sequence Retrieval System) is used to retrieve the information.