Enabling Collaboration - PowerPoint PPT Presentation

Title: Enabling Collaboration


1
Enabling Collaboration
http://www.pdb.org/ · info@rcsb.org
The PDB is supported by funds from the NSF, DOE,
and two units of the NIH: the NIGMS and NLM.
2
Current PDB Management
  • Research Collaboratory for Structural
    Bioinformatics (RCSB)
  • Director
      • Helen M. Berman, Rutgers University
  • Co-Directors
      • Phil Bourne, UCSD/SDSC
      • Gary Gilliland, CARB/NIST
      • John Westbrook, Rutgers University
  • Although the R in RCSB stands for Research, the
    PDB is purely a production service project

3
PDB Deposition and Distribution Sites
Legend: In place, Planned
  • SDSC, Rutgers, NIST, BMRB
  • Cambridge Crystallographic Data Centre, UK
  • National University of Singapore, Singapore
  • Osaka University, Japan
  • Universidade Federal de Minas Gerais, Brazil
  • Max-Delbrück-Center, Germany
4
Some Goals Related to Collaboration
  • Seamless data exchange among structure
    determination applications, databases, and
    deposition pathways
  • Timely distribution of macromolecular structure
    data, enabling complete analysis by remote
    Internet users and applications

5
Requirements
  • Consensus on well-defined metadata specifications
    for all exchanged data
  • Well-integrated software supporting the data
    specifications
  • APIs to deliver data

6
Data Sharing Nightmare
7
Ontology Content
  • http://deposit.pdb.org/mmcif/
  • NMR
  • Modeling
  • Crystallization
  • Symmetry
  • Image data
  • BIOSYNC
  • Data harvesting/pipelining
  • PDB data exchange
  • Including structural genomics extensions

8
What's Driving Data Definition
  • IUCr-sponsored community effort (1989 onward)
  • Data harvesting
  • Data management and exchange for the PDB
  • High-throughput structure determination and
    structural genomics (via International Task
    Forces)
  • Deposited data will be at the level of detail of
    a journal's materials and methods section
  • Each data item must be carefully defined in the
    PDB exchange data dictionary
  • Additional descriptions of X-ray and NMR
    experiments, and new items describing protein
    production (3× increase in content scope)

9
Elements of Ontology Metadata
  • Data attributes
      • Definition
      • Examples
      • Data type (primitive type/regular expression
        patterns)
      • Range or allowed values
  • Classes
      • Categories
      • Subcategories
      • Category groups
  • Associations
      • Parent-child relationships
      • Interdependencies/exclusivity
  • Methods
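To make these elements concrete, here is a minimal Python sketch of one dictionary data item that carries a definition, examples, a regular-expression data type, a range or set of allowed values, a category, and a parent link, and uses them to check a raw value. The `DataItem` class and its fields are hypothetical; this is not DDL2 dictionary syntax or any PDB tool, and the minimum placed on `_cell.length_a` is illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataItem:
    """Hypothetical model of one dictionary data item and its metadata."""
    name: str                 # attribute name, e.g. "_cell.length_a"
    category: str             # class: the category the item belongs to
    definition: str           # human-readable definition
    type_pattern: str         # data type expressed as a regular expression
    examples: List[str] = field(default_factory=list)
    minimum: Optional[float] = None         # inclusive range; None = unbounded
    maximum: Optional[float] = None
    allowed_values: Optional[Set[str]] = None
    parent: Optional[str] = None            # association: parent item name

    def validate(self, value: str) -> List[str]:
        """Return a list of problems found in a raw string value."""
        if not re.fullmatch(self.type_pattern, value):
            return [f"{self.name}: '{value}' does not match type pattern"]
        problems = []
        if self.allowed_values is not None and value not in self.allowed_values:
            problems.append(f"{self.name}: '{value}' is not an allowed value")
        if self.minimum is not None and float(value) < self.minimum:
            problems.append(f"{self.name}: {float(value)} below minimum {self.minimum}")
        if self.maximum is not None and float(value) > self.maximum:
            problems.append(f"{self.name}: {float(value)} above maximum {self.maximum}")
        return problems


FLOAT = r"-?[0-9]+(\.[0-9]*)?"   # simplified floating-point pattern (assumption)

length_a = DataItem(
    name="_cell.length_a",
    category="cell",
    definition="Unit-cell length a, in angstroms.",
    type_pattern=FLOAT,
    examples=["101.362"],
    minimum=0.0,
)
print(length_a.validate("101.362"))  # []
print(length_a.validate("-5.0"))     # ['_cell.length_a: -5.0 below minimum 0.0']
print(length_a.validate("abc"))      # ["_cell.length_a: 'abc' does not match type pattern"]
```

Checking the type pattern first means range checks only ever run on values that are already known to be numeric.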

10
Ontology Representation
  • Macromolecular Crystallographic Information
    Framework (mmCIF)
  • XML Schema mapping
  • Other emerging ontology representations:
    DAML+OIL, RDFS, OWL
  • Many difficult issues
      • resolving semantic ambiguities and encoding
        meaning
      • integrating multiple views and controlled
        vocabularies
      • separating primary from derived information
      • supporting rapid evolution and changing
        scientific perspectives
      • including detailed process modeling

11
Incremental Data Pipeline
Target Tracking
Incremental Assembly
12
Current Collection and Integration Strategy
(If it's not electronic, you probably won't get it)
  • Collect status data on the progress of each
    target
  • Collect bits of output from each program step
      • Work with software developers to optimize this
        data component
  • Merge the data from each step into a common
    representation
  • Use an editing tool to enter the remaining data
    and check the results
  • Make all data files available in the
    representation of the exchange dictionary (beta:
    ftp://beta.rcsb.org)
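The merge step could look roughly like the following Python sketch, assuming each program step's output has already been extracted into a category → item → value mapping. The step names, the conflict policy (keep the first value and flag differing later values for review in the editing tool), and the helper names are assumptions for illustration, not the RCSB pipeline.

```python
from typing import Dict, List, Tuple

# Assumed shape of one step's output: category -> item -> value.
StepData = Dict[str, Dict[str, str]]


def merge_steps(steps: List[Tuple[str, StepData]]) -> Tuple[StepData, List[str]]:
    """Merge per-step output into one common representation.

    The first value seen for an item is kept; a differing later value is
    recorded as a conflict instead of silently overwriting it.
    """
    merged: StepData = {}
    conflicts: List[str] = []
    for step_name, data in steps:
        for category, items in data.items():
            target = merged.setdefault(category, {})
            for item, value in items.items():
                if item not in target:
                    target[item] = value
                elif target[item] != value:
                    conflicts.append(
                        f"{category}.{item}: '{target[item]}' vs '{value}' from {step_name}"
                    )
    return merged, conflicts


def missing_items(merged: StepData, required: List[str]) -> List[str]:
    """List required 'category.item' names still absent (to be entered by hand)."""
    missing = []
    for name in required:
        category, item = name.split(".", 1)
        if item not in merged.get(category, {}):
            missing.append(name)
    return missing


# Hypothetical step outputs.
steps = [
    ("data_reduction", {"cell": {"length_a": "101.362", "length_b": "114.722"}}),
    ("refinement", {"cell": {"length_a": "101.360"}, "refine": {"ls_R_factor_obs": "0.21"}}),
]
merged, conflicts = merge_steps(steps)
print(conflicts)  # ["cell.length_a: '101.362' vs '101.360' from refinement"]
print(missing_items(merged, ["cell.length_a", "cell.length_c", "refine.ls_R_factor_obs"]))
# ['cell.length_c']
```

Flagging rather than overwriting keeps the decision about which program's value wins with the person doing the final check.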

13
Software Integration Tools
http://deposit.pdb.org/software
http://deposit.pdb.org/mmcif
  • Standalone data input tool creates and edits
    files in PDB exchange format
  • PDB validation suite checks data in exchange
    format
  • Tools extract data from program output files
    and translate it into exchange format
  • Format exchange (PDB and XML conversion tools)
  • C, C++, and Perl tools to parse and manage mmCIF
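For a feel of what parsing mmCIF involves, here is a deliberately simplified Python reader for single-line `_category.item value` pairs. It is not one of the RCSB tools; it assumes a single data block and rejects `loop_` tables and semicolon-delimited multi-line values, both of which a real mmCIF parser must handle.

```python
from typing import Dict


def parse_simple_mmcif(text: str) -> Dict[str, Dict[str, str]]:
    """Read single-line '_category.item value' pairs into category -> item -> value.

    Simplified on purpose: one data block is assumed, and loop_ tables and
    semicolon-delimited multi-line values are rejected rather than parsed.
    """
    result: Dict[str, Dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue  # skip blanks, comments, and the data block header
        if line.startswith("loop_") or line.startswith(";"):
            raise ValueError("loop_ and multi-line values are not handled by this sketch")
        parts = line.split(None, 1)
        if not parts[0].startswith("_") or len(parts) != 2:
            raise ValueError(f"expected '_category.item value', got {line!r}")
        tag, value = parts
        category, sep, item = tag[1:].partition(".")
        if not sep:
            raise ValueError(f"tag without a category: {tag}")
        value = value.strip()
        # Remove one pair of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        result.setdefault(category, {})[item] = value
    return result


example = """data_RCSB000000
_cell.entry_id   RCSB000000
_cell.length_a   101.362
_symmetry.space_group_name_H-M   'P 21 21 2'
"""
cif = parse_simple_mmcif(example)
print(cif["cell"]["length_a"])                  # 101.362
print(cif["symmetry"]["space_group_name_H-M"])  # P 21 21 2
```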

14
Typical Project Deposition Data Flow
Target Selection
Crystal Production
Protein Production
Project Database
Structure Determination
Merged Project Data
Exchange Dictionary
PDB Deposition
15
Target Registration Database (a new form of
sharing): TargetDB, http://targetdb.pdb.org/
  • All targets downloadable in XML (17,950 targets)
  • Targets downloaded weekly from 13 centers
  • Target search by
      • Sequence (FASTA), project target ID, project
        site, status (selected, cloned, expressed, in
        PDB), update date, protein name, source organism
  • Report output in HTML, FASTA, and XML
  • Integrates sequences from PDB entries (41,000
    sequences, including 700 pre-release sequences)
  • Provides links to related sequence databases
  • Open to all structural genomics projects
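A search-and-report step of this kind might be sketched in Python as follows: filter target records by status and project site, then emit FASTA. The `Target` record, its field names, and the `>lab|id name` header layout are hypothetical; TargetDB's actual query interface and FASTA headers may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Target:
    """Hypothetical in-memory target record, loosely following the TargetDB DTD."""
    target_id: str
    lab: str
    status: List[str]
    sequence: str             # IUPAC one-letter codes
    name: Optional[str] = None


def search(targets: Iterable[Target], status: Optional[str] = None,
           lab: Optional[str] = None) -> List[Target]:
    """Return targets matching every filter that is given (None means no filter)."""
    hits = []
    for t in targets:
        if status is not None and status not in t.status:
            continue
        if lab is not None and t.lab != lab:
            continue
        hits.append(t)
    return hits


def to_fasta(targets: Iterable[Target], width: int = 60) -> str:
    """Format targets as FASTA, wrapping sequence lines at `width` residues."""
    lines = []
    for t in targets:
        header = f">{t.lab}|{t.target_id}"
        if t.name:
            header += f" {t.name}"
        lines.append(header)
        lines.extend(t.sequence[i:i + width] for i in range(0, len(t.sequence), width))
    return "\n".join(lines) + "\n"


# Hypothetical records.
targets = [
    Target("T001", "LabA", ["Selected", "Cloned"], "MKTAYIAKQR"),
    Target("T002", "LabB", ["Selected", "Cloned", "Expressed"], "MSHHHHHHGS", name="kinase"),
]
print(to_fasta(search(targets, status="Expressed")), end="")
# >LabB|T002 kinase
# MSHHHHHHGS
```

Status is stored as a list because a target accumulates milestones (selected, cloned, expressed, ...), so a search for an earlier stage also matches targets that have moved past it.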

16
Application Level Distribution
  • CORBA specification adopted by the OMG in
    February 2001
  • Based on the PDB exchange data ontology
  • Provides high-performance access
  • Direct access to binary data structures
  • Broad granularity of access (individual atoms to
    biological assemblies)

17
CORBA Implementation
  • OpenMMS provides a Java-only toolkit that creates
    XML, CORBA, and relational DB representations of
    the PDB data ontology
  • Allows programmers to more easily create
    efficient, high-performance, robust
    applications that use PDB data
  • Provides database-to-database interoperability
  • C++ server under development
  • Code and examples available at
    http://openmms.sdsc.edu/

18
API Development
  • EJB
      • 60 entities developed
  • LSID
      • In collaboration with I3C
      • Coarse-grained SOAP access to PDB and TargetDB
        data
  • SOAP API
      • Fine-grained SOAP access modeled on the CORBA
        specification
      • Reuses the C++ CORBA server
  • Direct SQL
  • Problems: large investment in robust production
    support for potentially short-lived technology

19
Access
  • PDB SDSC Access Site
      • http://www.pdb.org/
  • PDB Deposition Sites
      • http://autodep.ebi.ac.uk/
      • http://pdbdep.protein.osaka-u.ac.jp/adit/
      • http://pdb.rutgers.edu/adit/
  • PDB Software Download Site
      • http://pdb.rutgers.edu/software/
  • PDB mmCIF Resource Site
      • http://pdb.rutgers.edu/mmcif/
  • mmCIF Beta Data Site
      • ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/

20
PDB Project Team
Director: Helen M. Berman (Rutgers)
Co-Directors: John Westbrook (Rutgers), Phil Bourne
(UCSD/SDSC), Gary Gilliland (NIST)
Rutgers: Anthony Adelakun, Kyle Burkhardt, Li Chen,
Sharon Cousin, Zukang Feng, Lisa Iype, Shri Jain,
Jessica Marvin, Rose Oughtred, Gnanesh Patel,
Tania Rose Posa, Suzanne Richman, Bohdan Schneider
(Prague), Olivera Tosic, Rosalina Valera,
Christine Zardecki
NIST: T.N. Bhat, Phoebe Fagan, Veerasamy
Ravichandran, Michael Tung, Greg Vasquez, Padma
Priya Paragi Vedanthi
UCSD/SDSC: David Archbell, Peter Arzberger, Bryan
Banister, Tammy Battistuz, Wolfgang F. Bluhm,
Eliot Clingman, Nita Deshpande, Ward Fleri,
Douglas S. Greer, David Padilla, Thomas Solomon,
David Stoner, Peggy Wagner
21
Sequence Target DTD: TargetDB, http://targetdb.pdb.org
    <!ELEMENT targets (target+)>
    <!ELEMENT target (id, lab, date, status,
                      sequence, name?, url, remark)>
    <!-- required data items -->
    <!-- any lab specified id -->
    <!ELEMENT id (#PCDATA)>
    <!-- any lab specified id -->
    <!ELEMENT lab (#PCDATA)>
    <!-- most recent update. format YYYY-MM-DD -->
    <!ELEMENT date (#PCDATA)>
    <!-- status. One or more of the following
         descriptive terms:
         Selected, Cloned, Expressed, Soluble, Purified,
         Crystallized, Diffraction-quality Crystals,
         Diffraction, NMR Assigned, HSQC,
         Crystal Structure, NMR Structure, In PDB,
         Work Stopped, Other -->
    <!ELEMENT status (#PCDATA)>
    <!-- protein sequence in IUPAC 1-letter codes -->
    <!ELEMENT sequence (#PCDATA)>
    <!-- optional data items -->
    <!-- any lab-specified name for the protein -->
    <!ELEMENT name (#PCDATA)>
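Python's standard library does not validate documents against a DTD, so a consumer of these files might check the declared structure by hand. This sketch parses a `<targets>` document with `xml.etree.ElementTree`, requires the five elements declared under the "required data items" comment, and checks the YYYY-MM-DD date format. It does not check `url` or `remark`, whose declarations are not part of this excerpt, and the sample record is made up.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, List

# Elements declared under the "required data items" comment in the DTD excerpt.
REQUIRED = ("id", "lab", "date", "status", "sequence")


def read_targets(xml_text: str) -> List[Dict[str, str]]:
    """Parse a <targets> document into one dict per <target>, checking required fields."""
    root = ET.fromstring(xml_text)
    if root.tag != "targets":
        raise ValueError(f"expected <targets>, got <{root.tag}>")
    records = []
    for target in root.findall("target"):
        record = {child.tag: (child.text or "").strip() for child in target}
        missing = [name for name in REQUIRED if not record.get(name)]
        if missing:
            raise ValueError(f"target {record.get('id', '?')} is missing {missing}")
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["date"]):
            raise ValueError(f"target {record['id']}: date not in YYYY-MM-DD form")
        date.fromisoformat(record["date"])  # also rejects impossible dates like 2003-02-30
        records.append(record)
    return records


doc = """<targets>
  <target>
    <id>T002</id><lab>LabB</lab><date>2003-01-15</date>
    <status>Expressed</status><sequence>MSHHHHHHGS</sequence>
    <name>kinase</name>
  </target>
</targets>"""
print(read_targets(doc)[0]["status"])  # Expressed
```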

22
Diverse Delivery Options using a Common Data
Dictionary
CRYST1  101.362  114.722   45.591  90.00  90.00  90.00 P 21 21 2    20
    <mmCIF:GROUP.cell_group>
      <mmCIF:CATEGORY.cell>
        <mmCIF:cell entry_id="RCSB000000">
          <length_a>101.362</length_a>
          <length_b>114.722</length_b>
          <length_c>45.591</length_c>
          <angle_alpha>90.00</angle_alpha>
          <angle_beta>90.00</angle_beta>
          <angle_gamma>90.00</angle_gamma>
          <Z_PDB>20</Z_PDB>
        </mmCIF:cell>
      </mmCIF:CATEGORY.cell>
    </mmCIF:GROUP.cell_group>
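The same cell data can be delivered in each representation from one set of dictionary item names. This Python sketch reads the fixed-column PDB CRYST1 record (a: columns 7-15, b: 16-24, c: 25-33, angles: 34-54, space group: 56-66, Z: 67-70) and emits mmCIF-style `_cell` items and a simplified XML `<cell>` element. The XML omits the namespace prefix and the group/category wrapper elements used on the slide, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# PDB CRYST1 fixed columns as 0-based, end-exclusive slices
# (columns 7-15 a, 16-24 b, 25-33 c, 34-40/41-47/48-54 angles, 56-66 space group, 67-70 Z).
CRYST1_FIELDS = {
    "length_a": (6, 15),
    "length_b": (15, 24),
    "length_c": (24, 33),
    "angle_alpha": (33, 40),
    "angle_beta": (40, 47),
    "angle_gamma": (47, 54),
    "space_group": (55, 66),
    "Z_PDB": (66, 70),
}
CELL_ITEMS = [name for name in CRYST1_FIELDS if name != "space_group"]


def parse_cryst1(line: str) -> Dict[str, str]:
    """Slice a CRYST1 record into named fields, keeping the original number text."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    values = {name: line[start:end].strip() for name, (start, end) in CRYST1_FIELDS.items()}
    for name in CELL_ITEMS[:-1]:
        float(values[name])      # fail early on malformed lengths or angles
    int(values["Z_PDB"])
    return values


def to_mmcif(values: Dict[str, str]) -> str:
    """Emit the same values as mmCIF-style category.item pairs."""
    lines = [f"_cell.{name} {values[name]}" for name in CELL_ITEMS]
    lines.append(f"_symmetry.space_group_name_H-M '{values['space_group']}'")
    return "\n".join(lines)


def to_xml(values: Dict[str, str], entry_id: str) -> str:
    """Emit a simplified <cell> element using the same item names."""
    cell = ET.Element("cell", entry_id=entry_id)
    for name in CELL_ITEMS:
        ET.SubElement(cell, name).text = values[name]
    return ET.tostring(cell, encoding="unicode")


record = (f"CRYST1{101.362:9.3f}{114.722:9.3f}{45.591:9.3f}"
          f"{90.0:7.2f}{90.0:7.2f}{90.0:7.2f} {'P 21 21 2':<11}{20:4d}")
cell = parse_cryst1(record)
print(to_mmcif(cell))
print(to_xml(cell, "RCSB000000"))
```

Keeping the values as their original text (rather than round-tripping through floats) preserves formatting such as `90.00` across all three outputs.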