Title: Real World Experiences in Operating a Collaboratory: The Protein Data Bank
1Real World Experiences in Operating a
Collaboratory: The Protein Data Bank
- Helen M. Berman
- Board of Governors Professor of Chemistry and
Chemical Biology
- Director, Research Collaboratory for Structural
Bioinformatics and the Protein Data Bank
2What is the PDB?
- Single international repository for all
information about the structure of large
biological molecules
- Archival database with hundreds of thousands of
users who depend on the data
4Number of released entries (chart, by year)
5 1970s
- Grassroots community efforts to archive data
- Protein crystallographers discuss how to archive
data
- June 1971
- Cold Spring Harbor meeting brings groups together
- (Cold Spring Harbor Symposia on Quantitative
Biology, vol. XXXVI, 1972.)
- October 1971
- PDB is announced in Nature New Biology
- (7 structures; vol. 233, 1971, page 223)
- 1975
- PDB receives first funding from NSF (32
structures)
7Nature New Biology
8 1980s
- Technology takes off
- molecular biology, instrumentation, computer
hardware and software
- Structural biology is able to focus on medical
problems
- Community efforts to promote data sharing
- IUCr guidelines requiring data deposition in the
PDB are published
9 1990s
- Number of structures increases exponentially
- Complexity of structures increases
- New databases begin to emerge
- More structures determined by cryo-electron
microscopy
- Plans for structural genomics emerge
- User community for the PDB expands dramatically
RCSB awarded contract for the PDB
11Who does what?
- Rutgers
- Data in: standards, validation, annotation
- UCSD/SDSC
- Data out: search engine, website, data
distribution
14Communication
- VTC (video teleconference)
- Electronic: email, forums, wikis
- Procedures
- Internal newsletter
- Retreats
15Retreats
- Team building exercises
- Management training
- Technical discussions
- Time to get to know one another
17VTCs
- Two formal ones per week
- Ad hoc when there are issues to discuss
18 2000s
- Continued growth in structure studies
- Structural genomics takes off
- RCSB PDB contract renewed
- BMRB joins RCSB
- Release of new database and website
2bus
Kurt Wüthrich, who with coworkers determined the
first three-dimensional protein structure by NMR
spectroscopy (proteinase IIa inhibitor from bull
seminal plasma), was awarded the Nobel Prize in
Chemistry in 2002.
20The PDB is Global
21Worldwide Protein Data Bank www.wwpdb.org
22Mission
- Maintain a single archive of macromolecular
structural data that is freely and openly
available to the global community
25wwPDB
- Formalization of current working practice
- Members
- RCSB PDB (Research Collaboratory for Structural
Bioinformatics)
- PDBj (Osaka University)
- Macromolecular Structure Database (EBI)
- MOU signed July 1, 2003
- Announced in Nature Structural Biology
November 21, 2003
26Guidelines and Responsibilities
- All members issue PDB IDs and serve as
distribution sites for data
- One member is the archive keeper (RCSB)
- All format documentation publicly available
- Strict rules for redistribution of PDB files
- All sites can create their own web sites
27Future
- 60,000 structures by 2008
- 20,000 depositions per year in 2010
- Complexity will increase dramatically
- New methods will yield new structures
28Scientific Challenges
- Number of data files continues to increase
- Information content of each data file is
increasing
- Many more very large macromolecular complexes
- New structure determination methods
- Structural genomics
29Technical Challenges
- How do we represent diverse data?
- How do we make a searchable database?
- How do we integrate with other data resources?
- How do we make a scalable system?
- How do we meet the needs of a diverse community?
30Structural Genomics
The next step beyond the Human Genome Project
From the NIH Request for Proposals for Structure
Genomics Centers:
These studies should lead to an understanding of
structure/function relationships and the ability
to obtain structural models of all proteins
identified by genomics. This project will require
the determination of a large number of protein
structures in a high-throughput mode.
31PSI - Structures (Sep-2005 1246 images)
32Community
- Depositors
- Different methods: X-ray, NMR, cryo-EM
- Users
- Specialists (structural biologists)
- Generalists
- Educators
- Students
- Lay community
33Active Outreach
- Electronic
- Meetings
- Publications
- One on one
- Many workshops
34Issues
- Standards: What is the role of the centers? What
should it be?
- Long-term preservation: How long? What are the
options?
- Stability: Strong dependency of the research
community demands a more stable model
35Bottom line
- All the interdependencies within wwPDB and
between the scientific community and wwPDB call
for a new funding model that will ensure the
long-term preservation and availability of the
research data contained within these resources
36Acknowledgements
Operated by two members of the RCSB
The RCSB PDB is a member of the
Supported by