Worldwide Protein Data Bank - PowerPoint PPT Presentation

Provided by: wwp5
Learn more at: https://cdn.rcsb.org
Transcript and Presenter's Notes

Title: Worldwide Protein Data Bank


1
Worldwide Protein Data Bank www.wwpdb.org
2
Agenda
  • Welcome and Introductions
  • Overview of recent wwPDB progress
  • Introduction to the BMRB
  • Theoretical model policy
  • Issues for discussion and advice
  • Break
  • wwPDB group interactions
  • wwPDB plans for 2007
  • Long term aims, funding, and stability
  • Executive session
  • Feedback to wwPDB
  • Set next meeting date (July 2007 Salt Lake City,
    UT?)

3
wwPDB Achievements: August 2005-October 2006
  • Continued growth of archive
  • Website updates
  • Publications and presentations
  • Time stamped archive
  • wwPDB team building
  • Annotation document
  • Remediation
  • BMRB formally a member of wwPDB

4
Deposition issues
5
The never ending story
6
Deposition since establishment of 3 sites
7
PDB entry processing
  • 1-Jan-2000: 10,997 entries in PDB
  • Today (1-Oct-2006): 39,323 entries in PDB
  • Total size is 3.6 times what it was when the 3
    sites started
  • In 1999: 2361 entries deposited
  • In 2005: 6678 entries deposited
  • We handle 2.8 times as many entries per year with
    fewer staff, and all 3 sites produce high quality
    annotated PDB entries
  • NO CURRENT BACKLOG OF UN-PROCESSED ENTRIES
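
The two growth factors quoted on this slide can be reproduced from the slide's own counts. The following is a minimal sketch; the variable names are illustrative and not part of the presentation.

```python
# Entry counts quoted on the slide.
entries_2000 = 10_997     # archive holdings on 1-Jan-2000
entries_2006 = 39_323     # archive holdings on 1-Oct-2006
deposited_1999 = 2_361    # entries deposited during 1999
deposited_2005 = 6_678    # entries deposited during 2005

# Ratio of archive size now to archive size when the 3 sites started.
archive_growth = entries_2006 / entries_2000
# Ratio of yearly depositions in 2005 to yearly depositions in 1999.
deposition_growth = deposited_2005 / deposited_1999

print(f"archive growth:    {archive_growth:.1f}x")
print(f"deposition growth: {deposition_growth:.1f}x")
```

Rounded to one decimal place, both ratios agree with the 3.6 and 2.8 figures on the slide.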

8
Time-stamped copies of the archive
  • 24 Gbytes of data for 2005, released January 3,
    2006
  • Includes
  • PDB format entries
  • mmCIF format entries
  • PDBML format entries
  • Experimental data
  • Dictionary, schema and format documentation

9
Outreach
  • wwPDB website
  • Publications and meetings

10
(No Transcript)
11
Joint publications and presentations
  • Nucleic Acids Research 2007 Database Issue
  • Ensuring a single, uniform archive of PDB data
  • Methods in Molecular Biology 2007
  • Data deposition and annotation at the wwPDB
  • Nature Structural & Molecular Biology, 2006
  • Is one solution good enough? (response)
  • CODATA (October 23-25, 2006 Beijing, China)
  • The Worldwide Protein Data Bank
  • Encyclopedia of Genomics, Proteomics, and
    Bioinformatics, 2005
  • The Protein Data Bank and the wwPDB

12
The wwPDB Team
13
wwPDB interactions this year
  • Exchange visits
  • MSD/RCSB (6) (thanks to WT)
  • PDBj/RCSB (1)
  • BMRB/RCSB-PDB (3)
  • Phone conference with site directors: twice a
    year
  • VTCs among staff
  • BMRB/RCSB: twice a month (ADIT-NMR)
  • MSD/RCSB: twice a week (annotation procedures,
    remediation)
  • Email among staff
  • MSD/RCSB: 2 per day
  • PDBj/RCSB: 2 per day

14
What is the PDB?
  • Content
  • Processes to ensure quality (annotation project)

15
Annotation project
16
Annotation project
  • GOALS
  • Standardize annotation rules and policies among
    wwPDB sites
  • Document annotation rules and policies
  • Create venue to update annotation rules and
    policies as necessary

17
Annotation project
  • How did we get there?
  • Review and discuss each PDB field by email and
    VTC
  • Write document and review by all staff
  • Final review by site directors
  • Implement software compliant to new annotation
    procedures
  • Test software and train annotators
  • Publish document on Web

18
Annotation project
  • Resultant document
  • Specification of ALL fields in PDB file
  • Clarification of policies
  • Assignment of PDB IDs
  • Release of files and information
  • Changes to entries
  • Clarification of data representation
  • Chain ID for all atoms in the file
  • Multi-model representation for alternate
    conformation or disorder
  • Chimeras
  • Microheterogeneity

19
Remediation
20
Remediation scope
  • 34,528 Entries Checked
  • Primary citations
  • Sequences and taxonomy
  • Ligand stereochemistry and nomenclature
  • Symmetry and coordinate transformations for virus
    entries
  • Diffraction source and beamline
  • Miscellaneous uniformity issues

21
Remediation statistics
  • Citations
  • All primary citations checked
  • 8508 citations manually examined
  • 7037 citations confirmed and updated
  • Sequence and taxonomy
  • 47917 sequences checked
  • 20068 updated sequence data references
  • 11087 taxonomic references updated
  • Virus entries
  • 250 entries checked and revised
  • Diffraction source
  • 10985 entries revised
  • Miscellaneous uniformity corrections
  • 1041 entries revised

22
Remediation statistics
  • Ligand stereochemistry and nomenclature
  • 7568 ligand definitions checked
  • 1758 new ligand definitions added
  • 185 ligand definitions obsoleted
  • 152,000 ligand instances checked
  • 138,230 ligand instances OK
  • 6815 ligand instances renamed

23
Remediation process
  • Corrections contributed and reviewed by wwPDB
    members
  • Corrections on the archival mmCIF data files
    tracked in a version tracking system, CVS
  • New PDB exchange, PDBML and PDB format data files
    being produced now
  • Each wwPDB group will validate and load the
    resulting files into their database systems
  • Invited public testing will begin January 2007
  • General availability will start April 2007

24
Remediation: Ligand dictionary rewrite
  • Model and idealized coordinates provided
  • Stereochemical configuration assignments
  • Aromatic atoms and bonds flagged
  • Definitions provided for Chemistry Catalog
    state with leaving atom candidates flagged
  • Nonstandard atom names revised (e.g.
    dinucleotides)
  • Duplicate ligand definitions marked as obsolete
  • Metal hydrate definitions obsoleted
  • Alternate atom names added to store legacy atom
    names
  • SMILES and InChI descriptors provided

25
Remediation: major entry-level corrections
  • Citations
  • PubMed identifiers provided where available
  • Unpublished citations checked and flagged
  • Sequence and taxonomy
  • UniProt sequence database references
  • Taxonomies from NCBI Taxonomy database
  • Diffraction source
  • Synchrotron facility and beamline names
    consistently specified in coordination with
    BioSync

26
Remediation: major ATOM record changes
  • Nomenclature changes
  • IUPAC H-atom names for standard amino acids and
    nucleotides
  • DNA and RNA differentiated (AD (DNA), A (RNA))
  • Modified nucleotides expressed as 3-letter codes
    (removed +'s)
  • PDB asterisks replaced by single quotes in atom
    names
  • Noncompliant ligands flagged in data files

27
Remediation: Major REMARK changes
  • Virus entries
  • Transformations from deposited frame to point
    symmetry and crystallographic frame provided
  • NCS and point symmetry transformations properly
    differentiated


28
EM standards
  • New dictionary for electron microscopy
  • MAP orientation conventions

29
BMRB
  • John Markley

30
Introduction to the BMRB
  • BMRB is the worldwide archival site for
    biomolecular NMR data
  • NMR data related to structures are cross
    referenced to PDB entries
  • PDBj mirrors BMRB and supports external BMRB
    depositions
  • As RCSB members, BMRB and PDB have worked closely
    to capture and annotate NMR data associated with
    deposited coordinate sets
  • Recognizing that the biomolecular NMR community
    would be best served by having a one stop
    deposition system for NMR structures, BMRB has
    been pursuing this goal in collaboration with the
    RCSB-PDB
  • BMRB plans to institute the same policy with
    MSD/EBI

31
wwPDB NMR experimental data flow
[Diagram: NMR data flow among wwPDB sites; arrows labeled "Deposited data", "Raw NMR-STAR" and "Processed NMR-STAR"]
  • BMRB (deposition/processing/export): ADIT-NMR,
    central archive
  • CERM-BMRB (export): mirror site
  • PDBj-BMRB (deposition/processing/export):
    ADIT-NMR, mirror site
  • MSD/EBI (deposition/export): CCPN
  • RCSB-PDB (deposition): ADIT-NMR
32
Major developments related to BMRB's role in the
wwPDB
  • One-stop BMRB-PDB ADIT-NMR deposition site for
    structures and NMR data developed in
    collaboration with PDB is operational, with BMRB
    assigning PDB accession codes
  • Restraints database for legacy structures is
    nearing completion as part of the wwPDB
    clean-up; new tools to automate this process
    were developed in collaboration with MSD/EBI
  • NMR-STAR v3 dictionary has been extended and
    released
  • Graphical interface with Jmol displays integrates
    PDB coordinate data with associated NMR
    parameters
  • BMRB is working with SG groups to improve
    efficiency of capturing protein NMR data
  • BMRB participates in the PDB-BMRB Task Group on
    NMR

33
New one-stop deposition of NMR structures/data
34
Deposition interface features
  • BMRB and RCSB-PDB depositions are now generated
    from a joint interface
  • BMRB interface has been streamlined
  • RCSB-PDB interface for NMR has been extended with
    optional fields for conformer and constraint
    statistics
  • Files in PDB format, mmCIF, and NMR-STAR can be
    uploaded to pre-populate a deposition
  • Many fields (e.g., experiment name, software
    name, software author) have pull-down lists
    to choose from for convenience and to improve
    uniformity
  • Fields common to multiple forms are linked to
    eliminate the need to retype information (e.g.,
    uploaded data file names, author names, molecule
    names and others)
  • Help and examples have been improved

35
Restraints grid is keyed to NMR structural entries
36
Coordinated displays of NMR data and structures
37
Theoretical Models Policy
  • Haruki Nakamura

38
Models
  • Define line between pure models and models
    based on data
  • Large experimental spectrum e.g. X-ray, NMR, EM,
    SAX, FRET models
  • Homology models especially as derived from
    structural genomics
  • Need a way to archive models that is totally
    compatible with PDB

39
Defining a policy for models
Workshop at Rutgers (November 19-20, 2005)
  • Attended by modelers, structural genomicists,
    electron microscopists
  • Policies and suggested implementations developed
  • Outcome published in Structure
  • Outcome of a Workshop on Archiving Structural
    Models of Biological Macromolecules, Helen M.
    Berman, Stephen K. Burley, Wah Chiu, Andrej Sali,
    Alexei Adzhubei, Philip E. Bourne, Stephen H.
    Bryant, Roland L. Dunbrack, Jr., Krzysztof
    Fidelis, Joachim Frank, Adam Godzik, Kim Henrick,
    Andrzej Joachimiak, Bernard Heymann, David Jones,
    John L. Markley, John Moult, Gaetano T.
    Montelione, Christine Orengo, Michael G.
    Rossmann, Burkhard Rost, Helen Saibil, Torsten
    Schwede, Daron M. Standley, John D. Westbrook,
    Structure, 2006, 14(8), 1211-1217.

40
Models: Recommendations
  • PDB depositions will be restricted to atomic
    coordinates that are substantially determined by
    experimental measurements on specimens containing
    biological macromolecules.
  • A central, publicly available archive or portal
    should be established for models that are the
    explicit subject of peer review.
  • Methods for assessing model quality are essential
    for the integrity and long-term success of any
    publicly available model portal, either from a
    central repository or a set of linked resources.
    There was no consensus as to which single method
    or group of methods should be applied.

41
(No Transcript)
42
Characteristics of portal
  • Data Standards for Models
  • Access Models for a Central Portal of Models
  • The minimum contents for this portal require a
    unique identifier for each model registered with
    the system, each model's polypeptide chain
    sequence, and quality assessment information.
  • Additional information should be available,
    including keywords, structural motifs, standard
    test sets of data, bound ligands, domains,
    flexibility, surface electrostatic properties,
    coding and noncoding SNPs, alternative splicing,
    oligomeric state, macromolecular interactions,
    literature references, subcellular localization,
    pathways, transcript profiling, druggability.
  • Access to these data should be free and
    constantly available to a diverse worldwide user
    community of both model producers and users.
    Several levels of access are required for the
    different levels of users of the portal.
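
One way to picture the minimum contents listed above is as a small record type. This is only a sketch under stated assumptions: the class name, field names, and example values are hypothetical, and the workshop recommendations do not prescribe any particular schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelEntry:
    """Hypothetical portal record holding the minimum contents named above."""
    model_id: str                          # unique identifier for the model
    chain_sequences: Dict[str, str]        # chain label -> polypeptide sequence
    quality_assessment: Dict[str, float]   # assessment name -> score
    # Optional extras (keywords, ligands, literature references, ...).
    annotations: Dict[str, List[str]] = field(default_factory=dict)

    def has_minimum_contents(self) -> bool:
        """True when identifier, sequence and quality data are all present."""
        return bool(self.model_id and self.chain_sequences
                    and self.quality_assessment)


# Illustrative values only; not taken from any real model.
entry = ModelEntry(
    model_id="MODEL-0001",
    chain_sequences={"A": "MKTAYIAKQR"},
    quality_assessment={"example_score": 0.82},
    annotations={"keywords": ["homology model"]},
)
incomplete = ModelEntry(model_id="MODEL-0002", chain_sequences={},
                        quality_assessment={})
```

Keeping the optional information in a separate mapping lets the required fields stay small while the list of additional annotations grows.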

43
Implementation of models policy
  • August 15, 2006: Policy announced with 60-day
    period of review
  • August 15-October 15, 2006: Transition Plan
  • All existing un-processed theoretical model
    entries as well as entries deposited during this
    time were not validated or processed. Entries
    will be released as-is without author review or
    corrections.
  • Authors had the choice of correcting their
    entries by withdrawing the original entry and
    then re-submitting the corrected version before
    October 15, 2006.
  • October 15, 2006: Theoretical model depositions
    no longer accepted

44
Discussion Issues
  • Kim Henrick

45
SAX - New EXP TYPE
  • Hamburg to provide templates for consideration

46
4-letter code?
  • Use of the PDB 4-letter code can be extended by
    allowing alphanumeric characters in the 1st
    position, giving 35 x 36 x 36 x 36 = 1,632,960
    combinations
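
The figure on this slide is straightforward to verify. The sketch below also computes, for comparison, the size of an ID space whose first character is restricted to the digits 1-9; that restriction is an assumption made here for illustration and is not stated on the slide. The function name is hypothetical.

```python
import string

# 10 digits + 26 letters = 36 alphanumeric symbols.
ALPHANUMERIC = string.digits + string.ascii_uppercase


def id_space(first_char_choices: int, length: int = 4) -> int:
    """Count ID codes with a restricted first character and
    unrestricted alphanumeric characters in the remaining positions."""
    return first_char_choices * len(ALPHANUMERIC) ** (length - 1)


extended = id_space(35)     # slide's proposal: 35 allowed first characters
digits_only = id_space(9)   # assumption: first character limited to 1-9

print(f"extended:    {extended:,}")
print(f"digits only: {digits_only:,}")
```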

47
Patent Office
  • The structures in the patent office may not
    represent a major loss of structures; current
    investigations indicate most patent structures
    are in the PDB.
  • A much larger set of ligand-bound structures is
    held in Pharma.

48
wwPDB SAC input request
49
What is a PDB Entry?
  • Rules for the smallest structure that can be
    submitted
  • Carbohydrate chains?
  • How long is a peptide? (24)
  • Non-gene product macromolecular biological
    ligands (e.g. antibiotics)?

Particular request from NMR depositors
50
Issues: Annotation EXP details
  • Experimental Details
  • Twinning: twin factor in REMARK 3 requested, and
    original un-twinned structure factors
  • TLS and conventional atomic B factor
  • Author-derived validation software and
    procedures/results no longer accepted as in
    REMARK 42; now a REMARK to carry software used
    and function

51
Policy: pre-Release Details
  • Entries on HOLD or HPUB: details currently
    usually made public are AUTHOR, TITLE, STATUS
  • Authors request all details suppressed; however,
    journals need to check validity of IDCODE
    ... Yes/No?

52
Deposition policy
  • HPUB/HOLD limit of one year. Current problem:
    after one year, no response. Do we release?
  • ... fixed rules?
  • No problems, release it
  • Problems, withdraw it

53
Major changes after remediation: dictionary changes
  • Could have major effects on software
  • New dictionary will be announced to many software
    developers in early November, 2006

54
Major changes after remediation: nucleic acids
  • DNA and RNA differentiated (AD (DNA), A (RNA));
    residue names are now A for RNA, AD for DNA
  • Modified nucleotides expressed as 3-letter codes
    (removed +'s), e.g. no longer treated as +C etc.;
    C31 is 2'-O-3-AMINOPROPYL CYTIDINE-5'-MONOPHOSPHATE
  • PDB asterisks replaced by single quotes in atom
    names: O2* is back to O2' as in refinement
    dictionaries
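
The asterisk-to-quote change in atom names amounts to a single character substitution. A minimal sketch follows, assuming atom names are handled as plain strings; the function name is hypothetical and this is not the wwPDB remediation software.

```python
def remediate_atom_name(name: str) -> str:
    """Replace the legacy '*' marker with a single quote (O2* -> O2')."""
    return name.replace("*", "'")


legacy_names = ["O2*", "C1*", "O5*", "N9"]
remediated = [remediate_atom_name(n) for n in legacy_names]
print(remediated)
```

Names without an asterisk, such as N9, pass through unchanged.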

55
Major changes after remediation: H-atom names
  • IUPAC H-atom names for standard amino acids and
    nucleotides as in BMRB file
    http://www.bmrb.wisc.edu/ref_info/atom_nom.tbl
  • as recommended by the NMR Task Force
  • Example
    New    PDB     New            PDB     New    PDB
    H      H       HG12 (pro-R)   1HG1    HD11   1HD1
    HA     HA      HG13 (pro-S)   2HG1    HD12   2HD1
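
The renaming in this example can be expressed as a lookup table. The sketch below contains only the six name pairs shown on the slide; a real conversion would need the full BMRB nomenclature table, and the function name is hypothetical.

```python
# Old PDB hydrogen name -> new IUPAC name, limited to the slide's example.
OLD_TO_NEW = {
    "H": "H",
    "HA": "HA",
    "1HG1": "HG12",   # pro-R
    "2HG1": "HG13",   # pro-S
    "1HD1": "HD11",
    "2HD1": "HD12",
}


def rename_hydrogen(old_name: str) -> str:
    """Return the new name, or the input unchanged if it is not listed."""
    return OLD_TO_NEW.get(old_name, old_name)


print([rename_hydrogen(n) for n in ["H", "1HG1", "2HG1", "1HD1"]])
```

A table is used rather than a string rule because moving the leading digit to the end is not sufficient: 1HG1 becomes HG12, not HG11.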

56
Major changes after remediation: other atom names
  • Strange atom names in co-factors like FAD, e.g.
    AC1*, AN9, AC8, to be replaced by C1'A, N9A, C8A
  • In HEM, atom names N A to 'NA'

57
Other issues
58
Issues: Annotation Disorder/MODEL
  • Use of MODEL record with disorder with Alternate
    conformations of large portions of structures
    e.g. statistical disorder
  • ....in progress

59
Issues: Annotation ATOM/SEQRES Mismatch
  • Fitting species specific ATOM records to a
    related X-ray or EM data set of a different
    species, as for example, a large complex, ATPase
  • ... needs new tokens; in progress

60
Very large structures in PDB
  • A proposed solution

61
Representing large complexes in the PDB
  • PART and ENDPRT
  • These records will act much like the existing
    MODEL/ENDMDL records, providing a sectioning
    mechanism within the PDB file.
  • PART sections will include records which describe
    the different constituent parts of a large
    molecular system.

62
Representing large complexes in the PDB
  • A PART/ENDPRT section will include all of the PDB
    record types which reference specific structural
    elements of the molecule.
  • PDB records that do not define or reference
    specific elements of molecular structure will be
    at the beginning of the multipart PDB file
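
To illustrate how such sectioning could be consumed by software, here is a minimal parsing sketch. The slides do not give the syntax of the proposed records, so the layout assumed here (a line beginning with PART followed by a label, closed by a line beginning with ENDPRT) is hypothetical, as are the function name and the example lines.

```python
from typing import Dict, List, Optional, Tuple


def split_parts(lines: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split PDB-style lines into global records and PART sections.

    Assumed syntax: 'PART' plus a label opens a section and 'ENDPRT'
    closes it; any line outside a section is treated as a global record.
    """
    global_records: List[str] = []
    parts: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        record = line[:6].strip()          # record name occupies columns 1-6
        if record == "PART":
            current = line[6:].strip()     # section label
            parts[current] = []
        elif record == "ENDPRT":
            current = None
        elif current is None:
            global_records.append(line)
        else:
            parts[current].append(line)
    return global_records, parts


example = [
    "HEADER    LARGE COMPLEX",
    "PART      1",
    "ATOM      1  N   ALA A   1",
    "ENDPRT",
    "PART      2",
    "ATOM      2  N   GLY B   1",
    "ENDPRT",
]
header, parts = split_parts(example)
print(header, list(parts))
```

This mirrors the way MODEL/ENDMDL blocks are usually read: records before the first section apply to the whole file, and each section carries its own structural records.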

63
Coffee
64
wwPDB in 2007
  • Same again .. but more of it (deposition and
    processing)
  • IN ADDITION
  • Rollout new files
  • Implement new annotation procedures
  • Discuss feasibility of a single
    deposition/processing system
  • Further team exchanges
  • Gather Pharma structures

65
Long term goals, funding and stability
66
  • We would really like to be the world wide PDB
    with regular stable funding

67
Acknowledgements
E-MSD is supported by grants from the Wellcome
Trust, the EU (TEMBLOR, NMRQUAL and IIMS), CCP4,
the BBSRC, the MRC and EMBL.
PDBj is supported by grant-in-aid from the
Institute for Bioinformatics Research and
Development, Japan Science and Technology Agency
(BIRD-JST), and the Ministry of Education,
Culture, Sports, Science and Technology (MEXT).
The BMRB is supported by NIH grant LM05799 from
the National Library of Medicine.
The RCSB PDB is supported by grants from the
National Science Foundation, National Institute
of General Medical Sciences, the Office of
Science-Department of Energy, the National
Library of Medicine, the National Cancer
Institute, the National Center for Research
Resources, the National Institute of Biomedical
Imaging and Bioengineering, the National
Institute of Neurological Disorders and Stroke,
and the National Institute of Diabetes and
Digestive and Kidney Diseases.