LSM2104/CZ2251 - PowerPoint PPT Presentation


Title: LSM2104/CZ2251


1
LSM2104/CZ2251 Essential Bioinformatics and
Biocomputing 
Protein Structure and Visualization (2)
Chen Yu Zong   csccyz_at_nus.edu.sg 6874-6877
2
LSM2104/CZ2251 Essential Bioinformatics and
Biocomputing 
Lecture 10: Protein structure databases,
visualization and classifications
1. Introduction to Protein Data Bank (PDB)
2. Free graphic software for 3D structure visualization
3. Hierarchical classification of protein domains: SCOP, CATH, DALI
3
1. Protein Data Bank (PDB)
  • Protein Data Bank: maintained by the Research
    Collaboratory for Structural Bioinformatics
    (RCSB)
  • http://www.rcsb.org/pdb/
  • 30060 structures (15-Mar-2005)
  • 27570 structures (05-Oct-2004)
  • 23997 structures (20-Jan-2004)
  • Also contains structures of other
    bio-macromolecules: DNA, carbohydrates and
    protein-DNA complexes.

4
1. Protein Data Bank (PDB)
5
1. Protein Data Bank (PDB)
6
PDB Content Growth
7
PDB Presentation of Selected Molecules
8
Deficiencies in our structural knowledge
  • Only deposited data is actually available
  • Many structures are not deposited in PDB. Why?
  • Structures are available mainly for soluble proteins
  • Only a few dozen entries for membrane protein
    domains. Why?
  • X-ray data only for those proteins that
    crystallize well or diffract properly. Why?
  • NMR structures are usually for small proteins
  • How to survey the size of NMR-determined
    proteins?
  • Estimated that structural data are available
    for only 10-15% of all known proteins.

9
Alternative Source of Structure NCBI
10
Protein Structure in PDB
  • Text files
  • Each entry is specified by a unique 4-character
    code (PDB code), e.g. 1HUY for a variant of GFP and
    1BGK for a 37-residue toxin protein isolated from
    sea anemone
  • 1HUY and 1BGK
  • Header information
  • Atomic coordinates in Å (1 Ångström = 1.0e-10 m)

11
Header Details
  • Identifies the molecule, modifications, date of
    release
  • Host organism, keywords, method of study
  • Authors, reference, resolution for X-ray
    structure
  • The smaller the resolution value, the better the
    structure.
  • Sequence, reference
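
The header fields listed above can be pulled out of the text file with a few lines of Python. This is a minimal sketch, not an official parser: it splits on whitespace instead of using the fixed columns of the PDB format, and the sample records (entry code 9XYZ, date, classification, resolution) are invented for illustration.

```python
from typing import Optional


def parse_header(pdb_text: str) -> dict:
    """Collect ID, classification, date, method and resolution from PDB text."""
    info: dict = {"id": None, "classification": None, "date": None,
                  "method": None, "resolution": None}
    for line in pdb_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "HEADER" and len(tokens) >= 4:
            # HEADER <classification ...> <deposition date> <ID code>
            info["id"] = tokens[-1]
            info["date"] = tokens[-2]
            info["classification"] = " ".join(tokens[1:-2])
        elif tokens[0] == "EXPDTA":
            info["method"] = " ".join(tokens[1:])
        elif tokens[:3] == ["REMARK", "2", "RESOLUTION."]:
            try:
                info["resolution"] = float(tokens[3])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE." in NMR entries
    return info


# Hypothetical sample records, made up for this sketch.
SAMPLE_HEADER = "\n".join([
    "HEADER    LUMINESCENT PROTEIN                     01-JAN-01   9XYZ",
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2 RESOLUTION.    2.20 ANGSTROMS.",
])

if __name__ == "__main__":
    print(parse_header(SAMPLE_HEADER))
```

For an NMR entry the resolution stays None, which matches the point that resolution is quoted for X-ray structures.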

13
The Atomic Coordinates
  • XYZ coordinates for each atom (starting with
    ATOM, only heavy atoms for X-ray structure) from
    the first residue to the last
  • XYZ coordinates for any ligands (starting with
    HETATM) complexed to the bio-macromolecule
  • O atoms of water molecules (starting with HETATM,
    normally at the last part of the xyz coordinate
    section)
  • Usually, for X-ray structure, resolution is not
    high enough to locate H atoms hence only heavy
    atoms are shown in the PDB file.
  • For NMR structure, all atoms (including hydrogen
    atoms) are specified in the PDB file.
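
A short Python sketch of reading the coordinate section described above. It relies on the fixed column positions of ATOM and HETATM records in the PDB text format; the three sample records are invented for illustration and do not come from 1HUY or 1BGK.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET" or "HOH"
    chain: str
    res_seq: int
    x: float       # coordinates in Å
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Return every ATOM/HETATM record, using fixed-column slices."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Made-up sample records (two protein atoms and one water oxygen).
SAMPLE = "\n".join([
    "REMARK   illustrative records only",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "HETATM  500  O   HOH A 301      10.000  12.500  -3.250",
])

if __name__ == "__main__":
    atoms = parse_atoms(SAMPLE)
    waters = [a for a in atoms if a.res_name == "HOH"]
    print(len(atoms), "atoms,", len(waters), "water oxygen(s)")
```

Filtering on the record type separates the macromolecule (ATOM) from ligands and water (HETATM), as the bullets above describe.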

14
X-ray structure 1HUY
15
NMR structure 1BGK
16
2. Free Software for Protein Structure
Visualization
  • RASMOL: available for all platforms
  • http://www.openrasmol.org
  • Swiss PDB Viewer from Swiss-Prot:
    http://www.expasy.ch/spdbv/
  • Chemscape Chime Plug-in for PC and Mac:
    http://www.mdl.com/downloads/downloadable/index.jsp
  • YASARA: http://www.yasara.org/
  • MOLMOL: MOLecule analysis and MOLecule display
  • http://129.132.45.141/wuthrich/software/molmol/index.html

17
Ribbon representation by RasMol
1HUY
An Improved Yellow Variant Of Green Fluorescent
Protein, from Tsien's group, J. Biol. Chem. 276,
29188 (2001)
18
Ribbon representation by YASARA
19
Ribbon representation by YASARA
20
Ribbon representation by MOLMOL
22
An ensemble of 15 structures (NMR, toxin
Bgk). Hydrogen atoms also included
15 backbone structures of the sea anemone toxin
Bgk
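
An NMR entry stores its ensemble as consecutive models in one file, each wrapped in MODEL and ENDMDL records. The sketch below splits such a file into per-model lists of coordinate lines; the two-model sample text is a toy stand-in, not taken from 1BGK.

```python
from typing import List


def split_models(pdb_text: str) -> List[List[str]]:
    """Group ATOM/HETATM lines by the MODEL ... ENDMDL block they sit in."""
    models: List[List[str]] = []
    current = None  # None while we are outside any MODEL block
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None and record in ("ATOM", "HETATM"):
            current.append(line)
    return models


# Toy ensemble with two models of two atoms each (invented records).
TOY_ENSEMBLE = "\n".join([
    "MODEL        1",
    "ATOM      1  N   VAL A   1       1.000   2.000   3.000",
    "ATOM      2  H   VAL A   1       1.500   2.500   3.500",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   VAL A   1       1.100   2.100   3.100",
    "ATOM      2  H   VAL A   1       1.600   2.600   3.600",
    "ENDMDL",
])

if __name__ == "__main__":
    models = split_models(TOY_ENSEMBLE)
    print(len(models), "models,", len(models[0]), "atoms in the first")
```

Run on a real NMR entry such as the Bgk toxin, the length of the returned list would be the size of the ensemble.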
23
15 all-atom structures of the sea anemone toxin
Bgk
Line representation
24
Ribbon representation
25
Space-filling representation
26
3. Hierarchical classification of protein
domains: SCOP, CATH
  • SCOP: Structural Classification of Proteins
  • University of Cambridge, UK
  • http://scop.mrc-lmb.cam.ac.uk/scop/
  • Hyperlink in Singapore: http://scop.bic.nus.edu.sg/
  • CATH: Class, Architecture, Topology,
    Homologous Superfamily, Sequence family
  • University College London, UK
  • http://www.biochem.ucl.ac.uk/bsm/cath/

27
Basis for protein classification
  • Proteins adopt a limited number of topologies
  • More than 50,000 sequences fold into about 1,000
    unique folds.
  • Homologous sequences have similar structures
  • Usually, when sequence identity is > 30%, proteins
    adopt the same fold. Even in the absence of
    sequence homology, some folds are preferred by
    vastly different sequences.
  • The active site is highly conserved
  • A subset of functionally critical residues is
    found to be conserved even when the folds vary.

28
How many unique folds do organisms use to
express functions?
Sequence space > 50,000
Conformational space
Many sequences to form one unique fold
1,000 ???????
29
Growth of Protein Databases
30
Structural Classification of Proteins SCOP
  • University of Cambridge, UK: http://scop.mrc-lmb.cam.ac.uk/scop/
  • mirrored at Singapore: http://scop.bic.nus.edu.sg/
  • contains PDB entries grouped hierarchically by
  • Structural class,
  • Fold,
  • Superfamily,
  • Family,
  • Individual member
  • (domain-based)

31
Structural Classification of Proteins SCOP
  • Family
  • Proteins are clustered together into families on
    the basis of one of two criteria that imply their
    having a common evolutionary origin
  • All proteins that have residue identities of 30%
    and greater
  • Proteins with lower sequence identities but whose
    functions and structures are very similar
  • Example: globins with sequence identities of 15%.
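
The two family criteria above can be written as a small rule. This is only a sketch under stated assumptions: percent identity is computed over aligned, non-gap columns of a pre-made pairwise alignment (other denominators are in use), the second criterion is reduced to a flag supplied by a human curator, and the function names are made up for the example.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def same_family(aln_a: str, aln_b: str,
                similar_structure_and_function: bool = False,
                threshold: float = 30.0) -> bool:
    """Criterion 1: identity of 30% and greater.
    Criterion 2: lower identity, but structure and function judged very similar."""
    if percent_identity(aln_a, aln_b) >= threshold:
        return True
    return similar_structure_and_function


if __name__ == "__main__":
    # Toy alignment: 4 of 5 gap-free columns match.
    print(percent_identity("MKV-LA", "MRVALA"))
    # Low identity but curated as similar, like the globin example.
    print(same_family("ACDEFG", "AHIKLM", similar_structure_and_function=True))
```

The curator flag is what lets a pair at about 15% identity still land in one family.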

32
Structural Classification of Proteins SCOP
  • Superfamily
  • Families, whose proteins have low sequence
    identities but whose structures and, in many
    cases, functional features suggest that a common
    evolutionary origin is probable, are placed
    together in superfamilies
  • Example, actin, the ATPase domain of the
    heat-shock protein and hexokinase

33
Structural Classification of Proteins SCOP
  • Fold
  • Superfamilies and families are defined as having
    a common fold if their proteins have the same major
    secondary structures in the same arrangement with
    the same topological connections.

34
Structural Classification of Proteins SCOP
  • Class
  • For the convenience of users, the different folds
    have been grouped into classes. Most of the folds
    are assigned to one of a few structural classes
    on the basis of the secondary structures of which
    they are composed

36
SCOP Class: All-α topologies
cytochrome b-562
ferritin
37
SCOP Class: All-α topologies
38
SCOP Class: All-α topologies
39
SCOP Class: All-β topologies
β-barrels
β sandwiches
40
SCOP Class: All-β topologies
41
SCOP Class: α/β topologies
α/β horseshoe
42
SCOP Class: α/β topologies
α/β barrels
43
SCOP Class: α/β topologies
44
SCOP Class: α+β topologies
45
SCOP Class: α+β topologies
47
Ubiquitin
1ubi
48
Ubiquitin
1ubi
49
Ubiquitin
1ubi
50
Ubiquitin
1ubi
51
CATH database
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH: Class, Architecture, Topology, Homologous
Superfamily, Sequence family
Orengo et al., "CATH - a hierarchical classification
of protein domain structures", Structure 5,
1093-1108 (1997)
Sequence identity > 30%: the same overall fold
Sequence identity > 70%: the same overall fold
and similar function
52
CATH database
Class: Derived from secondary structure content,
is assigned for more than 90% of protein
structures automatically.
Architecture: Describes the gross orientation of
secondary structures, independent of
connectivities, is currently assigned manually.
Topology: Clusters structures according to their
topological connections and numbers of secondary
structures.
Homologous superfamilies: Cluster proteins with
highly similar structures and functions. The
assignments of structures to topology families
and homologous superfamilies are made by sequence
and structure comparisons.
Sequence families: Structures within each H-level
are further clustered on sequence identity.
Domains clustered in the same sequence families
have sequence identities > 35%.
Non-identical sequence domains, identical
sequence domains, domains
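
The C, A, T and H levels nest, so a domain's position can be written as a dotted code with one number per level. That numbering is CATH's own convention and is not shown on the slide; the sketch below assumes it, and the helper names are invented for illustration. The code 3.40.50.300 is used only as an example of the shape.

```python
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath(code: str) -> dict:
    """Map each level name to the code prefix that identifies it."""
    parts = code.strip().split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a C.A.T.H-style code: {code!r}")
    return {level: ".".join(parts[: i + 1])
            for i, level in enumerate(LEVELS[: len(parts)])}


def share_level(code_a: str, code_b: str, level: str) -> bool:
    """True if both domains fall in the same node at the given level."""
    a, b = parse_cath(code_a), parse_cath(code_b)
    return level in a and level in b and a[level] == b[level]


if __name__ == "__main__":
    print(parse_cath("3.40.50.300"))
    print(share_level("3.40.50.300", "3.40.50.720", "topology"))
```

Two domains that share the T-level prefix but differ in the last number have the same topology and sit in different homologous superfamilies.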
53
CATH database
55
The class (C), architecture (A) and topology (T)
levels in the CATH database
Class Architecture Topology
56
The class (C), architecture (A) and topology (T)
levels in the CATH database
Homologous Superfamily
57
CATH architectures
58
CATH architectures (cont.)
59
The protein structure universe in the PDB (1997)
by a CATH wheel
The distribution of non-homologous structures
(i.e. a single representative from each
homologous superfamily at the H-level in CATH)
amongst the different classes (C), architectures
(A) and fold families (T) in the CATH database.
60
SCOP / CATH -> DALI
  • SCOP, CATH
  • Hierarchical and based on abstractions
  • Include some manual aspects and are curated by
    experts in the field of protein structure

Dali
Presentation of results of computer
classification, where the methods that underlie
the classification remain internal
Structure comparison
61
DALI
Comparing protein structures in 3D
α β meander
antiparallel β barrel
α/β
α
β
More information about DALI: "Touring protein fold
space with Dali/FSSP", Liisa Holm and Chris Sander
62
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
63
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
  • The FSSP database (Fold classification based on
    Structure-Structure alignment of Proteins) is
    based on exhaustive all-against-all 3D structure
    comparison of protein structures currently in the
    Protein Data Bank (PDB).
  • The classification and alignments are
    automatically maintained and continuously updated
    using the Dali search engine.
  • Dali Domain Dictionary
  • Structural domains are delineated automatically
    using the criteria of recurrence and compactness.
    Each domain is assigned a Domain Classification
    number DC_l_m_n_p, where
  • l - fold space attractor region
  • m - globular folding topology
  • n - functional family
  • p - sequence family
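
A label of the form DC_l_m_n_p can be unpacked into its four fields with a short Python sketch. The field names follow the bullets above; the example label DC_3_4_1_2 and the helper names are hypothetical, and real Dali output may format the label differently.

```python
from typing import NamedTuple


class DomainClass(NamedTuple):
    attractor: int          # l: fold space attractor region
    topology: int           # m: globular folding topology
    functional_family: int  # n: functional family
    sequence_family: int    # p: sequence family


def parse_dc(label: str) -> DomainClass:
    """Split 'DC_l_m_n_p' into its four integer fields."""
    parts = label.strip().split("_")
    if len(parts) != 5 or parts[0] != "DC":
        raise ValueError(f"expected DC_l_m_n_p, got {label!r}")
    return DomainClass(*(int(p) for p in parts[1:]))


def same_topology(a: DomainClass, b: DomainClass) -> bool:
    """Two domains share a folding topology if l and m both agree."""
    return (a.attractor, a.topology) == (b.attractor, b.topology)


if __name__ == "__main__":
    d1 = parse_dc("DC_3_4_1_2")  # hypothetical label
    d2 = parse_dc("DC_3_4_7_1")  # hypothetical label
    print(d1, same_topology(d1, d2))
```

Comparing prefixes of the tuple gives the level at which two domains first diverge.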

64
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
  • Functional families
  • Evolutionary relationships from strong
    structural similarities which are accompanied by
    functional or sequence similarities.
  • Functional families are branches of the fold
    dendrogram where all pairs have a high average
    neural network prediction for being homologous.
  • Sequence families
  • Representative subset of the Protein Data Bank
    extracted using a 25% sequence identity
    threshold.
  • All-against-all structure comparison was carried
    out within the set of representatives.
  • Homologues are only shown aligned to their
    representative.
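
One simple way to extract a representative subset at an identity threshold is a greedy pass. The slide does not say how Dali/FSSP does it, so the Python sketch below is a stand-in technique, with invented chain names and identity values.

```python
from typing import Callable, Dict, List, Tuple


def select_representatives(
    ids: List[str],
    identity: Callable[[str, str], float],
    threshold: float = 25.0,
) -> Tuple[List[str], Dict[str, str]]:
    """Greedy selection: a chain becomes a representative unless it is
    more than `threshold` percent identical to an earlier representative."""
    reps: List[str] = []
    assigned: Dict[str, str] = {}
    for pid in ids:
        for rep in reps:
            if identity(pid, rep) > threshold:
                assigned[pid] = rep  # homologue, shown under its representative
                break
        else:
            reps.append(pid)
            assigned[pid] = pid
    return reps, assigned


# Invented pairwise identities (percent) for three hypothetical chains.
TOY_IDENTITY = {
    frozenset(("chainA", "chainB")): 40.0,
    frozenset(("chainA", "chainC")): 10.0,
    frozenset(("chainB", "chainC")): 12.0,
}


def toy_identity(a: str, b: str) -> float:
    return TOY_IDENTITY[frozenset((a, b))]


if __name__ == "__main__":
    print(select_representatives(["chainA", "chainB", "chainC"], toy_identity))
```

The result depends on input order, so a real pipeline would first sort chains by some quality measure; that ordering is left out here.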

65
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
  • Fold types
  • Fold types are defined as clusters of structural
    neighbors in fold space with average pairwise
    Z-scores (by Dali) above 2.
  • Structural neighbours of 1urnA (top left).
    1mli (bottom right) has the same topology even
    though there are shifts in the relative
    orientation of secondary structure elements
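
The fold-type criterion (average pairwise Z-score above 2) is easy to check for a candidate cluster. The Python sketch below does only that check and does not reproduce Dali's clustering; the domain names and Z-scores are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def average_pairwise_z(members: Sequence[str],
                       z: Dict[FrozenSet[str], float]) -> float:
    """Mean Z-score over all unordered pairs of cluster members."""
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("need at least two members")
    return sum(z[frozenset(p)] for p in pairs) / len(pairs)


def is_fold_type(members: Sequence[str],
                 z: Dict[FrozenSet[str], float],
                 cutoff: float = 2.0) -> bool:
    """True if the cluster meets the 'average pairwise Z above 2' criterion."""
    return average_pairwise_z(members, z) > cutoff


# Invented Z-scores between four hypothetical domains.
TOY_Z = {
    frozenset(("dom1", "dom2")): 4.0,
    frozenset(("dom1", "dom3")): 3.0,
    frozenset(("dom2", "dom3")): 2.0,
    frozenset(("dom1", "dom4")): 0.5,
    frozenset(("dom2", "dom4")): 0.5,
    frozenset(("dom3", "dom4")): 0.5,
}

if __name__ == "__main__":
    print(is_fold_type(["dom1", "dom2", "dom3"], TOY_Z))
    print(is_fold_type(["dom1", "dom2", "dom3", "dom4"], TOY_Z))
```

Adding a weakly similar domain pulls the average below the cutoff, so the larger set no longer counts as one fold type.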

66
Summary
  • Protein structure database (PDB)
  • Protein structure visualization software
  • Structural classification, databases and servers