SCOP DATABASE presentation

Transcript and Presenter's Notes

Title: SCOP DATABASE

1
SCOP DATABASE
2
SCOP STATISTICS
Classification
Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily, and fold. Classification is generally conservative: where any doubt about relatedness exists, we made new divisions at the family and superfamily levels. Thus, some researchers may prefer to focus on the higher levels of the classification tree, where proteins with structural similarity are clustered.
3
SCOP STATISTICS

Classification
The different major levels in the hierarchy are:
1. Family: Clear evolutionary relationship. Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.

4
SCOP STATISTICS
Classification
The different major levels in the hierarchy are:
2. Superfamily: Probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
6
SCOP STATISTICS
Classification
The different major levels in the hierarchy are:
3. Fold: Major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
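Because family, superfamily, and fold nest inside one another, two domains can be compared by asking for the most specific level they share. The following is a minimal Python sketch of that idea, not SCOP's own data model: the `ScopDomain` class, the `lowest_shared_level` function, and every label string are hypothetical names chosen for illustration. Only the grouping of actin and hexokinase into one superfamily comes from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopDomain:
    """One classified domain. All labels are illustrative placeholders."""
    name: str
    fold: str
    superfamily: str
    family: str


def lowest_shared_level(a: ScopDomain, b: ScopDomain) -> Optional[str]:
    """Return the most specific principal level shared by both domains."""
    if a.fold != b.fold:
        return None
    if a.superfamily != b.superfamily:
        return "fold"          # major structural similarity only
    if a.family != b.family:
        return "superfamily"   # probable common evolutionary origin
    return "family"            # clear evolutionary relationship


# Hypothetical labels; only the shared superfamily is taken from the slides.
actin = ScopDomain("actin", "example fold", "example superfamily", "actin family")
hexokinase = ScopDomain("hexokinase", "example fold", "example superfamily", "hexokinase family")

print(lowest_shared_level(actin, hexokinase))  # prints "superfamily"
```

The check runs from the top of the tree downward, so a coincidental match of labels at a lower level cannot override a mismatch higher up.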
7
SCOP STATISTICS
Class of proteins            Number of folds   Number of superfamilies   Number of families
All alpha proteins           138               224                       337
All beta proteins            93                171                       276
Alpha and beta (a/b)         97                167                       374
Alpha and beta (a+b)         184               263                       391
Multi-domain                 28                28                        35
Membrane and cell surface    11                17                        28
Small proteins               54                77                        116
Total                        605               947                       1557
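The Total row can be checked by summing each column of the table above. A short Python sketch, where the dictionary keys are shortened class names used only for this illustration:

```python
# (folds, superfamilies, families) per class, copied from the table above.
counts = {
    "all alpha": (138, 224, 337),
    "all beta": (93, 171, 276),
    "a/b": (97, 167, 374),
    "a+b": (184, 263, 391),
    "multi-domain": (28, 28, 35),
    "membrane and cell surface": (11, 17, 28),
    "small proteins": (54, 77, 116),
}

# zip(*rows) turns the per-class rows into per-level columns.
totals = tuple(sum(column) for column in zip(*counts.values()))
print(totals)  # prints (605, 947, 1557)
```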
8
CATH DATABASE
9
CATH HIERARCHY
10
CATH HIERARCHY
Class, C-level
Class is determined according to the secondary structure composition and packing within the structure. It can be assigned automatically for over 90% of the known structures using the method of Michie et al. (1996). For the remainder, manual inspection is used and, where necessary, information from the literature is taken into account. Three major classes are recognised: mainly-alpha, mainly-beta and alpha-beta. This last class (alpha-beta) includes both alternating alpha/beta structures and alpha+beta structures, as originally defined by Levitt and Chothia (1976). A fourth class is also identified, which contains protein domains that have low secondary structure content.
11
CATH HIERARCHY
Architecture, A-level
This describes the overall shape of the domain structure as determined by the orientations of the secondary structures, but ignores the connectivity between the secondary structures. It is currently assigned manually using a simple description of the secondary structure arrangement, e.g. barrel or 3-layer sandwich. Reference is made to the literature for well-known architectures (e.g. the beta-propeller or alpha four-helix bundle). Procedures are being developed for automating this step.
12
CATH HIERARCHY
Topology (Fold family), T-level
Structures are grouped into fold families at this level depending on both the overall shape and connectivity of the secondary structures. This is done using the structure comparison algorithm SSAP (Taylor and Orengo (1989)). Parameters for clustering domains into the same fold family have been determined by empirical trials throughout the databank (Orengo et al. (1992), Orengo et al. (1993)). Structures which have a SSAP score of 70 and where at least 60% of the larger protein matches the smaller protein are assigned to the same T level or fold family.
Some fold families are very highly populated (Orengo et al. (1994)), particularly within the mainly-beta 2-layer sandwich architectures and the alpha-beta 3-layer sandwich architectures. In order to appreciate the structural relationships within these families more easily, they are currently subdivided using a higher cutoff on the SSAP score (75 for some mainly-beta and alpha-beta families, 80 for some mainly-alpha families), together with a higher overlap requirement (70%).
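The T-level rule above amounts to two thresholds applied together. The sketch below expresses it in Python under stated assumptions: `same_fold_family` is a hypothetical helper, not part of SSAP or CATH, and it treats a score of exactly 70 as passing, which the slide does not specify. The stricter cutoffs used to subdivide crowded fold families are passed as parameters.

```python
def same_fold_family(ssap_score: float,
                     overlap_percent: float,
                     min_score: float = 70.0,
                     min_overlap: float = 60.0) -> bool:
    """Apply the T-level clustering rule to one pairwise comparison.

    overlap_percent is the percentage of the larger protein that matches
    the smaller one. Inclusive comparison is an assumption of this sketch.
    """
    return ssap_score >= min_score and overlap_percent >= min_overlap


# Default cutoffs: SSAP score of 70 with at least 60% overlap.
print(same_fold_family(72.0, 65.0))  # prints True

# Stricter subdivision quoted for some mainly-beta and alpha-beta families.
print(same_fold_family(72.0, 65.0, min_score=75.0, min_overlap=70.0))  # prints False
```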
13
CATH HIERARCHY

Homologous Superfamily, H-level
This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. Similarities are identified first by sequence comparisons and subsequently by structure comparison using SSAP. Structures are clustered into the same homologous superfamily if they satisfy one of the following criteria:
Sequence identity > 35%, 60% of larger structure equivalent to smaller
SSAP score > 80.0 and sequence identity > 20%, 60% of larger structure equivalent to smaller
SSAP score > 80.0, 60% of larger structure equivalent to smaller, and domains which have related functions

Sequence families, S-level
Structures within each H-level are further clustered on sequence identity. Domains clustered in the same sequence families have sequence identities > 35% (with at least 60% of the larger domain equivalent to the smaller), indicating highly similar structures and functions.
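The H-level and S-level criteria above can be written as a pair of predicates. This is a sketch under assumptions rather than CATH's implementation: both function names are hypothetical, the identity and SSAP comparisons are read as strict "greater than", and the 60% overlap is treated as a minimum for every H-level criterion (the slide states "at least" only for the S-level).

```python
def same_homologous_superfamily(seq_identity: float,
                                ssap_score: float,
                                overlap_percent: float,
                                related_function: bool = False) -> bool:
    """Return True if any one of the three H-level criteria is met."""
    if overlap_percent < 60.0:
        return False  # every criterion requires 60% of the larger structure
    if seq_identity > 35.0:
        return True   # criterion 1: sequence identity alone
    if ssap_score > 80.0 and seq_identity > 20.0:
        return True   # criterion 2: structure plus moderate identity
    return ssap_score > 80.0 and related_function  # criterion 3


def same_sequence_family(seq_identity: float, overlap_percent: float) -> bool:
    """S-level rule: identity above 35% with at least 60% overlap."""
    return seq_identity > 35.0 and overlap_percent >= 60.0


# 25% identity with a high SSAP score: same superfamily, different sequence family.
print(same_homologous_superfamily(25.0, 85.0, 70.0))  # prints True
print(same_sequence_family(25.0, 70.0))               # prints False
```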

14
CATH HIERARCHY Database
15
STRUCTURE SIMILARITY SEARCH in SCOP or CATH using TOP
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
EC DATABASE
26
Fold representation in SCOP
27
(No Transcript)
28
AN EXAMPLE: OXIDOREDUCTASE
29
E.C.1.-.-.- Oxidoreductases.
E.C.1.1.-.- Acting on the CH-OH group of donors.
E.C.1.1.1.- With NAD(+) or NADP(+) as acceptor.
E.C.1.1.1.1 Alcohol dehydrogenase.
Reaction: An alcohol + NAD(+) = an aldehyde or ketone + NADH.
Other name(s): Aldehyde reductase.
Cofactor(s): Zinc or Iron.
Comments: Acts on primary or secondary alcohols or hemiacetals. The animal, but not the yeast, enzyme acts also on cyclic secondary alcohols.
[Figure labels: CATH (Fold); EC (Function); 1a4u; 1ofga1 (oxidoreductase)]
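An EC number encodes its own position in the four-level hierarchy, with `-` standing for an unspecified level, as in the entry above. A minimal Python sketch that expands a full EC number into its chain of parent classes; the function name `ec_lineage` is a hypothetical choice for this illustration.

```python
from typing import List


def ec_lineage(ec_number: str) -> List[str]:
    """Expand a four-part EC number into its class, subclass,
    sub-subclass and full identifier, using '-' for open levels."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return [".".join(parts[:depth] + ["-"] * (4 - depth)) for depth in range(1, 5)]


print(ec_lineage("1.1.1.1"))
# prints ['1.-.-.-', '1.1.-.-', '1.1.1.-', '1.1.1.1']
```

Each element corresponds to one line of the entry: oxidoreductases, acting on the CH-OH group of donors, with NAD(+) or NADP(+) as acceptor, and alcohol dehydrogenase.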
30
[Figure labels: CATH (Fold); 1ofga1, 2scua1, 1gkyo2; ligase, transferase, oxidoreductase]
31
KEGG (Metabolic pathway)
ENTRY       EC 1.1.1.1
NAME        Alcohol dehydrogenase
            Aldehyde reductase
CLASS       Oxidoreductases
            Acting on the CH-OH group of donors
            With NAD+ or NADP+ as acceptor
SYSNAME     Alcohol:NAD+ oxidoreductase
REACTION    Alcohol + NAD+ = Aldehyde or Ketone + NADH
SUBSTRATE   NAD+
            Primary alcohol
            Secondary alcohol
            Cyclic secondary alcohol
            Hemiacetal
PRODUCT     Aldehyde
            Ketone
            NADH
COFACTOR    Zinc
COMMENT     A zinc protein. Acts on primary or secondary alcohols or
            hemiacetals; the animal, but not the yeast, enzyme acts also on
            cyclic secondary alcohols. The insect enzyme is a member of the
            nonmetallo-short-chain alcohol dehydrogenase (ADH) family
            (Proc.Natl.Acad.Sci.USA (1991) 88, 10064-10068).
PATHWAY     PATH: MAP00010  Glycolysis / Gluconeogenesis
            PATH: MAP00071  Fatty acid metabolism
            PATH: MAP00120  Bile acid biosynthesis
            PATH: MAP00350  Tyrosine metabolism
            PATH: MAP00561  Glycerolipid metabolism
GENES       ECs4466
            YPE: YPO2180(adhE)
            HIN: HI0185(adhC)
            PMU: PM1453(adh2)
            XFA: XF1746 XF2389
DISEASE     MIM: 103700  Alcohol dehydrogenase (class I), alpha polypeptide
            MIM: 103720  Alcohol dehydrogenase (class I), beta polypeptide
            MIM: 103730  Alcohol dehydrogenase (class I), gamma polypeptide
            MIM: 103740  Alcohol dehydrogenase (class II), pi polypeptide
            MIM: 600086  Alcohol dehydrogenase-7
MOTIF       PS: PS00059  G-H-E-x(2)-G-x(5)-GA-x(2)-IVSAC
            PS: PS00060  GSW-x-LIVTSACD-GH-x(2)-GSAE-GSHYQ-x-LIVTP-GAST-GAS-x(3)-LIVMT-x-HNS-GA-x-GTAC
            PS: PS00061  LIVSPADNK-x(12)-Y-PSTAGNCV-STAGNQCIVM-STAGC-K-PC-SAGFYR-LIVMSTAGD-x(2)-LIVMFYW-x(3)-LIVMFYWGAPTHQ-GSACQRHM
            PS: PS00913  STALIV-LIVF-x-DE-x(6,7)-P-x(4)-ALIV-x-GST-x(2)-D-TAIVM-LIVMF-x(4)-E
STRUCTURES  PDB: 1A4U 1A71 1A72 1ADB 1ADC 1ADF 1ADG 1AGN 1AXE 1AXG
                 1B14 1B15 1B16 1B2L 1BTO 1CDO 1D1S 1D1T 1DDA 1DEH
                 1E3E 1E3I 1E3L 1EE2 1HDX 1HDY 1HDZ 1HET 1HEU 1HF3
                 1HLD 1HSO 1HSZ 1HT0 1HTB 1JU9 1LDE 1LDY 1QLH 1QLJ
                 1TEH 2OHX 2OXI 3BTO 3HUD 5ADH 6ADH 7ADH
DBLINKS     IUBMB Enzyme Nomenclature: 1.1.1.1
            ExPASy - ENZYME nomenclature database: 1.1.1.1
            WIT (What Is There) Metabolic Reconstruction: 1.1.1.1
            BRENDA, the Enzyme Database: 1.1.1.1
            SCOP (Structural Classification of Proteins): 1.1.1.1
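An entry like the one above is a sequence of named fields (ENTRY, NAME, CLASS and so on), each followed by one or more values. The Python sketch below reads such a record into a dictionary. It assumes a layout in which a field name starts at the first column and continuation lines are indented; that layout, the function name `parse_flat_entry`, and the shortened `sample` record are assumptions of this example, not something the slide specifies.

```python
from typing import Dict, List


def parse_flat_entry(text: str) -> Dict[str, List[str]]:
    """Group the lines of one flat-file record by field name."""
    fields: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: another value for the current field.
            if current is not None:
                fields[current].append(line.strip())
        else:
            # Unindented line: the first token names a new field.
            current, _, value = line.partition(" ")
            fields[current] = [value.strip()] if value.strip() else []
    return fields


# Shortened sample built from values shown on the slide.
sample = """\
ENTRY       EC 1.1.1.1
NAME        Alcohol dehydrogenase
            Aldehyde reductase
PATHWAY     PATH: MAP00010  Glycolysis / Gluconeogenesis
            PATH: MAP00071  Fatty acid metabolism
"""

entry = parse_flat_entry(sample)
print(entry["NAME"])  # prints ['Alcohol dehydrogenase', 'Aldehyde reductase']
```

Keeping each field as a list preserves multi-valued fields such as NAME and PATHWAY without guessing at their internal structure.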
32
KEGG (Metabolic pathway)
33
KEGG (Metabolic pathway)
34
(No Transcript)
