Title: Protein structures in the PDB
1. Protein structures in the PDB
2. Domains
- proteins can be modular
- a single chain may be divisible into smaller, independent units of tertiary structure called domains
- domains are the basic unit of structure classification
- different domains in a protein are also often associated with different functions carried out by the protein
3. Definition of domain
- A polypeptide or part of a polypeptide chain that can independently fold into a stable tertiary structure... (from Introduction to Protein Structure, by Branden and Tooze)
- Compact units within the folding pattern of a single chain that look as if they should have independent stability. (from Introduction to Protein Architecture, by Lesk)
MBP Figure to go here
4. Motif (Supersecondary Structure)
- there are certain favored arrangements of multiple secondary structure elements that recur again and again in proteins; these are known as motifs or supersecondary structures
- a motif is usually smaller than a domain but can encompass an entire domain
- examples: greek key, beta-alpha-beta
5. Protein Taxonomy: The CATH Hierarchy
- 1. Divide PDB structure entries into domains using domain recognition algorithms (the domain is the fundamental unit of structure classification)
- 2. Classify each domain according to a five-level hierarchy: Class, Architecture, Topology, Homologous Superfamily, Sequence Family
- the top three levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships
- the bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity
- protein evolution is not well understood; to date there is no purely phyletic classification system
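As an illustration of the five-level hierarchy, the following is a minimal Python sketch, not CATH's own data model. It stores one label per level for each domain and reports the deepest level at which two domains still agree. The type `DomainClassification`, the function `deepest_shared_level`, and all labels in the example are hypothetical.

```python
from typing import NamedTuple, Optional

# Level names in top-down order, following the hierarchy listed on the slide.
LEVELS = ("class", "architecture", "topology",
          "homologous_superfamily", "sequence_family")


class DomainClassification(NamedTuple):
    """One domain's label at each of the five levels (hypothetical container)."""
    cath_class: str
    architecture: str
    topology: str
    homologous_superfamily: str
    sequence_family: str


def deepest_shared_level(a: DomainClassification,
                         b: DomainClassification) -> Optional[str]:
    """Return the lowest level at which both domains carry the same labels.

    Levels are compared top-down and the comparison stops at the first
    mismatch, because a lower-level label only has meaning inside its parent.
    """
    shared = None
    for level, label_a, label_b in zip(LEVELS, a, b):
        if label_a != label_b:
            break
        shared = level
    return shared


# Made-up labels, for illustration only.
d1 = DomainClassification("alpha-beta", "3-layer sandwich", "fold X",
                          "superfamily P", "family 1")
d2 = DomainClassification("alpha-beta", "3-layer sandwich", "fold X",
                          "superfamily Q", "family 7")
d3 = DomainClassification("mainly beta", "sandwich", "fold Y",
                          "superfamily R", "family 2")

print(deepest_shared_level(d1, d2))  # topology
print(deepest_shared_level(d1, d3))  # None
```

Two domains that agree down to the topology level but not below share a fold without any claim of common ancestry, which matches the phenetic/phyletic split described above.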
6. Class
- In the CATH hierarchy, Class simply describes what type of secondary structure is present.
- There are only four classes:
- mainly alpha
- mainly beta
- alpha-beta
- few secondary structures
- 90% of structures are trivial to assign at this level.
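To make the four classes concrete, here is a toy Python rule that assigns a class from the fractions of residues in helices and strands. It is only a sketch: the function `assign_class` and both thresholds (`min_total`, `dominance`) are assumptions chosen for illustration, not the criteria CATH uses.

```python
def assign_class(helix_fraction: float, strand_fraction: float,
                 min_total: float = 0.10, dominance: float = 3.0) -> str:
    """Assign one of the four classes from secondary-structure content.

    min_total and dominance are illustrative thresholds (assumptions),
    not values taken from CATH.
    """
    if helix_fraction < 0 or strand_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if helix_fraction + strand_fraction > 1:
        raise ValueError("fractions must sum to at most 1")

    # Very little regular secondary structure of either kind.
    if helix_fraction + strand_fraction < min_total:
        return "few secondary structures"
    # One type clearly dominates the other.
    if helix_fraction >= dominance * strand_fraction:
        return "mainly alpha"
    if strand_fraction >= dominance * helix_fraction:
        return "mainly beta"
    # Substantial amounts of both.
    return "alpha-beta"


print(assign_class(0.60, 0.05))
print(assign_class(0.05, 0.50))
print(assign_class(0.30, 0.30))
print(assign_class(0.02, 0.03))
```

Domains whose helix and strand content sit close to a threshold are the ones that would not be trivial to assign under a rule like this.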
7. Architecture
- Architecture is hard to define precisely.
- In CATH it is defined broadly as describing general features of protein shape, such as arrangements of secondary structure in 3D space.
- It does not define connectivities between secondary structural elements; that's what the topology level does. It does not even explicitly define directionality of secondary structure, e.g. parallel or antiparallel beta-sheets.
- In CATH, architectures are presently assigned manually, by visual inspection.
- Let's look at some architectures!
8. Some mostly beta architectures
9. Some mixed alpha-beta architectures
10. Topology (Fold)
- if two proteins have the same topology, they have the same number and arrangement of secondary structures, and the connectivities between these elements are the same
- this is also sometimes called the fold of a protein
- in CATH, automated structure alignment is used to group proteins according to topology; we will discuss this later
- we will now look at some examples that illustrate differences in topology
11. Topology differences in connectivity
- example: a four-stranded antiparallel beta-sheet can have many different topologies based on the order in which the four beta-strands are connected
- examples shown: greek key, up-and-down
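The number of possible connectivities can be counted with a short Python sketch. It numbers the strands 1 to 4 in sequence order and enumerates the left-to-right orders they can take in the sheet, treating an order and its reverse as the same sheet. This simplification is an assumption of the sketch: it ignores strand direction and handedness. The order 4-1-2-3 used for the greek key is one common textbook depiction, also assumed here.

```python
from itertools import permutations


def canonical(order: tuple) -> tuple:
    """Pick one representative of an order and its reverse.

    Reading the sheet from the other end reverses the order without
    changing which strands are neighbours.
    """
    return min(order, order[::-1])


def sheet_strand_orders(n_strands: int) -> list:
    """All distinct spatial orders of sequentially numbered strands."""
    strands = range(1, n_strands + 1)
    return sorted({canonical(p) for p in permutations(strands)})


orders = sheet_strand_orders(4)
print(len(orders))  # 12

up_and_down = (1, 2, 3, 4)  # each strand sits next to the one before it
greek_key = (4, 1, 2, 3)    # strand 4 folds back to lie next to strand 1
print(canonical(up_and_down) in orders)  # True
print(canonical(greek_key) in orders)    # True
```

Even under these simplifications a four-stranded sheet admits twelve distinct strand orders, of which up-and-down and greek key are only two.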
12. Topology differences in handedness
- example: in a beta-alpha-beta motif, if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands
13. Visualizing protein topology: TOPS cartoons
- up triangles = up-facing beta strands
- down triangles = down-facing beta strands
- horizontal rows of triangles = beta sheets (a beta barrel would be a ring of triangles)
- circles = helices
- lines = loops
- if a loop enters from the top, the line is drawn to the center
- if a loop enters from the bottom, the line is drawn to the boundary
the fold above is clearly an antiparallel beta-sandwich
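The way strand direction is read off a row of triangles can be sketched in Python. This is not a TOPS file format or parser: the ASCII characters `^` and `v` are stand-ins for up and down triangles, and the function `is_antiparallel` and the example rows are hypothetical.

```python
UP, DOWN = "^", "v"  # ASCII stand-ins for up and down triangles (assumption)


def is_antiparallel(sheet_row: str) -> bool:
    """True if every pair of neighbouring strands points in opposite directions."""
    if any(symbol not in (UP, DOWN) for symbol in sheet_row):
        raise ValueError("row may contain only '^' and 'v'")
    if len(sheet_row) < 2:
        return False  # a single strand does not form a sheet
    return all(a != b for a, b in zip(sheet_row, sheet_row[1:]))


# A sandwich drawn as two rows of triangles, one row per sheet (made-up example).
sandwich = ["^v^v", "v^v^"]
print(all(is_antiparallel(row) for row in sandwich))  # True

# A row of triangles all pointing the same way is a parallel sheet.
print(is_antiparallel("^^^^"))  # False
```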
14. Visual summary of top three levels of CATH hierarchy
CLASS
ARCHITECTURE
TOPOLOGY
15. Discovery of New Folds
- structural taxonomy reveals that although structures are being solved more rapidly than ever, fewer and fewer of them have new folds! Will we get them all soon?
16. Homologous superfamily/Sequence family
- the lowest two levels in the CATH hierarchy relate to common ancestry
- some, but not all, proteins with the same fold show evidence of common ancestry
- the surest sign of common ancestry is that two proteins have sequences that are roughly >30% identical (sequence family level)
- if protein sequences are not that similar, common ancestry may still be inferred on the basis of a combination of structural and functional similarity, and possibly weak sequence similarity (homologous superfamily level)
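The roughly 30% identity criterion can be illustrated with a small Python sketch that computes percent identity from two already-aligned sequences. Several conventions exist for the denominator; this sketch assumes identity is counted over columns where neither sequence has a gap. The function names and example sequences are hypothetical, and the threshold follows the rough figure given above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns that contain no gap ('-').

    Assumes the two strings come from the same pairwise alignment,
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggested_level(identity: float, threshold: float = 30.0) -> str:
    """Apply the rough >30% rule of thumb to a percent identity."""
    if identity > threshold:
        return "sequence family"
    return "needs structural/functional evidence (homologous superfamily?)"


identity = percent_identity("AC-GT", "ACAGA")  # made-up sequences
print(identity)                   # 75.0
print(suggested_level(identity))  # sequence family
print(suggested_level(18.0))
```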
17. Multifunctional Superfolds
- some architectures have many folds: these are superarchitectures
- some folds have many homologous superfamilies, which means they are used for a variety of functions: these are called superfolds
18. Common core
- structures need not share exactly the same number, type and connectivity of secondary structural elements to be grouped into a single fold type
- in fact, evolutionarily related proteins often share a common core of structurally related elements but may differ in the presence or absence of a secondary structure element or two
19. Problems in Fold Classification
- Structure space has a continuous aspect, especially in certain types of folds, which makes clustering structures into fold families difficult. This is an inherent problem for any classification method based on hierarchical clustering.
- It seems reasonable to group as having the same fold proteins which share some common core but differ in addition/subtraction of a few secondary structure elements.
- But this can lead to unnaturally large and diverse fold families via the Russian doll effect and motif overlap.
20. Russian Doll Effect
- A continuous range of slight size differences will lead to clustering proteins of very different size: small -> medium -> large.
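The effect can be reproduced with a toy clustering in Python. Domains are reduced to a single number (their size in residues), and two domains are linked whenever the larger is at most `max_ratio` times the smaller, a one-dimensional stand-in for single-linkage clustering. The sizes, the threshold, and the function name are made up for illustration.

```python
def cluster_by_size(sizes: list, max_ratio: float = 1.3) -> list:
    """Chain sorted sizes into clusters.

    A size joins the current cluster if it is at most max_ratio times the
    largest size already in it. In one dimension this is equivalent to
    single-linkage clustering with that linking rule.
    """
    clusters = []
    for size in sorted(sizes):
        if clusters and size <= clusters[-1][-1] * max_ratio:
            clusters[-1].append(size)
        else:
            clusters.append([size])
    return clusters


# Hypothetical domain sizes in residues, each only slightly larger than the last.
sizes = [80, 100, 120, 150, 180, 220]
clusters = cluster_by_size(sizes)
print(clusters)  # [[80, 100, 120, 150, 180, 220]]

# Without the intermediates, the smallest and largest domains would not be grouped.
print(cluster_by_size([80, 220]))  # [[80], [220]]
```

Every neighbouring pair differs by less than a third, yet the resulting cluster spans domains that differ in size by a factor of 2.75.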
21. Motif Overlap
- Motif overlap effects: sometimes two proteins will share a common core, but one of them will share a slightly different (but not necessarily larger) common core with a third protein. A continuous range of overlapping common cores (AB -> BC -> CD) will lead to grouping proteins that have no common core.
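The AB, BC, CD chain can be sketched in Python by describing each protein as a set of structural elements and merging any two proteins that share at least one element. This is a toy single-linkage grouping using a union-find structure, not the clustering used by any real classification; the protein names and element labels are hypothetical.

```python
from itertools import combinations


def cluster_by_shared_core(proteins: dict, min_shared: int = 1) -> list:
    """Group proteins that are connected through chains of shared elements."""
    parent = {name: name for name in proteins}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every pair of proteins whose element sets overlap enough.
    for p, q in combinations(proteins, 2):
        if len(proteins[p] & proteins[q]) >= min_shared:
            parent[find(p)] = find(q)

    groups = {}
    for name in proteins:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical proteins with cores AB, BC and CD.
proteins = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"C", "D"}}
clusters = cluster_by_shared_core(proteins)
print(clusters)                         # a single cluster holding P1, P2 and P3
print(proteins["P1"] & proteins["P3"])  # set(): P1 and P3 share nothing
```

P1 and P3 end up in the same group only because P2 overlaps with each of them, which is the situation the slide describes.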
22. Comparison of SCOP and CATH Hierarchies
SCOP          CATH
class         class
-             architecture
fold          topology
superfamily   homologous superfamily
family        sequence family
domain        domain
CATH is more directed toward structural classification; SCOP pays more attention to evolutionary relationships
23. Another SCOP/CATH difference
- in CATH, there is one class to represent mixed alpha-beta
- in SCOP there are two:
- alpha/beta (a/b): beta structure is largely parallel, made of beta-alpha-beta motifs
- alpha+beta (a+b): alpha and beta structure segregated to different parts of the structure
24. SCOP and CATH
- they have in common that they are hierarchical and based on abstractions
- they both include some manual aspects and are curated by experts in the field of protein structure
- are there automated methods for structure classification/comparison?