High Performance Application Program Interfaces to Macromolecular Structure Data

Provided by: chem254

Transcript and Presenter's Notes



1
Data Integration and Management: A PDB Perspective
http://www.pdb.org/  info@rcsb.org
2
What is PDB?
  • Single international repository of
    three-dimensional data for biological
    macromolecules
  • Public community resource
  • Established at Brookhaven National Laboratory in 1971
    (7 structures)
  • Moved to the RCSB in 1998
  • wwPDB established in 2004
  • More than 25,000 structures in the PDB

3
Community
  • Scientific community at all levels
      – Structural biologists (crystallography, NMR,
        cryo-EM)
      – Biologists
      – Computational biologists
      – Journals
  • General community
      – Secondary schools
      – General public
  • Internal
      – RCSB PDB staff
      – wwPDB members

4
Data Representation
  • Macromolecular Crystallographic Information
    Framework (mmCIF)
  • XML DTD/Schema Mapping
  • SQL Schema Mapping
  • CORBA IDL Mapping
  • Support for emerging ontology representations (OWL)
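As an illustration of the XML mapping idea, the Python sketch below turns one mmCIF-style category, held as a list of row dictionaries, into an XML fragment. The element naming (a <categoryCategory> wrapper, the key item as an attribute, the other items as child elements) is loosely modeled on PDBML but is an assumption here, as is the function name category_to_xml; namespaces and schema validation are omitted.

```python
import xml.etree.ElementTree as ET


def category_to_xml(category, rows, key_item):
    """Map one category (a list of row dicts) to an XML string.

    Hypothetical, PDBML-like naming: <{category}Category> wraps one
    <{category}> element per row; the key item becomes an attribute and
    every other item becomes a child element.
    """
    root = ET.Element(f"{category}Category")
    for row in rows:
        elem = ET.SubElement(root, category, {key_item: row[key_item]})
        for name, value in row.items():
            if name != key_item:
                ET.SubElement(elem, name).text = value
    return ET.tostring(root, encoding="unicode")


rows = [{"id": "1", "type": "polymer"}, {"id": "2", "type": "water"}]
print(category_to_xml("entity", rows, "id"))
# <entityCategory><entity id="1"><type>polymer</type></entity><entity id="2"><type>water</type></entity></entityCategory>
```

The same row dictionaries could just as well drive SQL or IDL output; only the serializer changes.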

5
Elements of Dictionary Metadata
  • Data Attributes
      – Definition
      – Examples
      – Data type (primitive type/regular expression
        patterns)
      – Range or allowed values
  • Classes
      – Categories
      – Subcategories
      – Category groups
  • Associations
      – Parent-child relationships
      – Interdependencies/exclusivity
  • Methods
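A minimal Python sketch of how some of these metadata elements could be held in memory and used to check values: a data type expressed as a regular expression, allowed values, a numeric range, and a parent-child association. The class ItemDefinition, the function check_parent_child, and the simplified float pattern are hypothetical; _atom_site.occupancy with a 0 to 1 range is used only as an illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ItemDefinition:
    """Hypothetical in-memory form of one dictionary data item."""
    name: str
    definition: str
    type_pattern: str                          # data type as a regular expression
    allowed_values: Set[str] = field(default_factory=set)
    minimum: Optional[float] = None            # range bounds assume a numeric type
    maximum: Optional[float] = None

    def validate(self, value: str) -> List[str]:
        """Return a list of problems; an empty list means the value passes."""
        if not re.fullmatch(self.type_pattern, value):
            return [f"{self.name}: {value!r} does not match the type pattern"]
        problems = []
        if self.allowed_values and value not in self.allowed_values:
            problems.append(f"{self.name}: {value!r} is not an allowed value")
        if self.minimum is not None and float(value) < self.minimum:
            problems.append(f"{self.name}: {value} is below {self.minimum}")
        if self.maximum is not None and float(value) > self.maximum:
            problems.append(f"{self.name}: {value} is above {self.maximum}")
        return problems


def check_parent_child(child_values, parent_values):
    """Return child values that have no matching parent (dangling references)."""
    parents = set(parent_values)
    return [v for v in child_values if v not in parents]


# Illustrative item; the float pattern is simplified (no exponents or uncertainties).
occupancy = ItemDefinition(
    name="_atom_site.occupancy",
    definition="Fraction of the atom type present at this site.",
    type_pattern=r"-?\d+(\.\d*)?",
    minimum=0.0,
    maximum=1.0,
)
print(occupancy.validate("0.5"))  # []
print(occupancy.validate("1.7"))
# e.g. _atom_site.label_entity_id values should match an _entity.id value.
print(check_parent_child(["1", "1", "3"], ["1", "2"]))  # ['3']
```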

6
Difficult Issues
  • Resolving semantic ambiguities and encoding meaning
  • Integrating controlled vocabularies
  • Separation of primary and derived information
  • Supporting rapid evolution of science

7
What's Driving Data Definition
  • IUCr-sponsored community effort
  • Automated data acquisition
  • Data management and data exchange for PDB
  • New technologies (e.g. cryo-electron microscopy)
  • High-throughput structure determination and
    structural genomics

8
Typical Project Deposition Data Flow
[Diagram stages: Target Selection, Protein Production, Crystal Production, Structure Determination, Project Database, Merged Project Data, Exchange Dictionary, PDB Deposition]
9
Data Sharing Nightmare
10
Incremental Data Pipeline
11
Current Integration Strategy
  • Provide software tools to collect individual data items
    from the output of each program step
  • Convert data in log and output files to a common
    representation
  • Merge the data corresponding to the successful
    outcome
  • Provide an editor tool to enter remaining data
    and check consistency of results
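This strategy can be sketched in Python as follows: harvest values from each program's log, express them under common item names, merge only the successful steps, and list what is still missing for manual entry. The regular expressions in LOG_PATTERNS, the log formats they match, and the helpers harvest, merge, and missing_items are all hypothetical; real programs each need their own extraction rules. The target names are mmCIF items, used here as the common representation.

```python
import re

# Hypothetical extraction rules: each exchange-dictionary item is paired with
# a pattern for a made-up log format.
LOG_PATTERNS = {
    "_reflns.d_resolution_high": re.compile(r"High resolution limit\s*=\s*([\d.]+)"),
    "_refine.ls_R_factor_R_work": re.compile(r"R-work\s*=\s*([\d.]+)"),
    "_refine.ls_R_factor_R_free": re.compile(r"R-free\s*=\s*([\d.]+)"),
}


def harvest(log_text):
    """Convert one program's log text into {item_name: value}."""
    found = {}
    for item, pattern in LOG_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            found[item] = match.group(1)
    return found


def merge(steps):
    """Merge harvested values from successful steps only.

    A later step overwrites an earlier value; each such disagreement is
    recorded as (item, old, new) so it can be reviewed.
    """
    merged, conflicts = {}, []
    for step in steps:
        if not step["succeeded"]:
            continue
        for item, value in harvest(step["log"]).items():
            if item in merged and merged[item] != value:
                conflicts.append((item, merged[item], value))
            merged[item] = value
    return merged, conflicts


def missing_items(merged, required):
    """Items that still have to be entered by hand, e.g. in an editor tool."""
    return [item for item in required if item not in merged]


steps = [
    {"succeeded": True, "log": "High resolution limit = 1.80"},
    {"succeeded": False, "log": "R-work = 0.31  R-free = 0.35"},  # failed run, skipped
    {"succeeded": True, "log": "R-work = 0.19  R-free = 0.23"},
]
merged, conflicts = merge(steps)
print(merged)
print(conflicts)  # []
print(missing_items(merged, ["_refine.ls_R_factor_R_work", "_struct.title"]))  # ['_struct.title']
```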

12
Data Deposition and Annotation
[Diagram: five-step deposition and annotation workflow linking the Depositor, ADIT, annotation, a validation report, and depositor approval]
13
Integrated Data Processing System
[Diagram: data assembled by the depositor enter the system through ADIT (client input tool) and ADITsrv; other components shown are MAXIT, Validation, a Database Loader, Metadata Dictionaries, and Data Views; outputs are reports and final files]
14
Features of the System
15
Data Distribution
[Diagram: mmCIF data files (the data reference standard) reach applications through mmCIF parsers, XML files, a relational database, and API servers]
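To make the parser layer concrete, the sketch below reads a single mmCIF data block into {category: [row, ...]} tables in Python. It is deliberately simplified and is not a full mmCIF parser: it ignores semicolon-delimited text fields, treats a quote as closing at the next matching quote character, and assumes no value (quoted or not) begins with _, loop_, or data_. The names tokenize and parse are hypothetical.

```python
import re
from collections import defaultdict

# A single-quoted string, a double-quoted string, or any other run of non-blanks.
TOKEN = re.compile(r"'([^']*)'|\"([^\"]*)\"|(\S+)")


def tokenize(text):
    """Split mmCIF text into value tokens, dropping # comments."""
    tokens = []
    for line in text.splitlines():
        for m in TOKEN.finditer(line):
            bare = m.group(3)
            if bare is not None and bare.startswith("#"):
                break  # rest of the line is a comment
            tokens.append(next(g for g in m.groups() if g is not None))
    return tokens


def parse(text):
    """Return {category: [row_dict, ...]} for one data block."""
    tables = defaultdict(list)
    tokens = tokenize(text)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            i += 1
        elif tok == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not tokens[i].startswith(("_", "loop_", "data_")):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names):
                raise ValueError("malformed loop_")
            category = names[0][1:].split(".", 1)[0]
            items = [n.split(".", 1)[1] for n in names]
            for start in range(0, len(values), len(items)):
                tables[category].append(dict(zip(items, values[start:start + len(items)])))
        elif tok.startswith("_"):
            # Key-value pair: a category written outside a loop has one row.
            category, item = tok[1:].split(".", 1)
            if not tables[category]:
                tables[category].append({})
            tables[category][0][item] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return dict(tables)


example = """data_example
# illustrative values only
_entry.id example
_struct.title 'Example protein'
loop_
_atom_site.id
_atom_site.type_symbol
_atom_site.Cartn_x
1 N 11.104
2 C 12.560
"""
tables = parse(example)
print(tables["struct"][0]["title"])  # Example protein
print([row["type_symbol"] for row in tables["atom_site"]])  # ['N', 'C']
```

Once the data are in this tabular form, writing XML files or loading a relational database is a matter of serializing the same rows.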
16
Automatic Production of Macromolecular Structure API Components
[Diagram: the PDB Exchange Dictionary and API-specific data dictionaries feed a metamodel framework, which produces CORBA IDL, SQL schemas, XML DTDs/schemas, data loaders, and database access classes]
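The idea of generating schema artifacts and loaders from dictionary metadata can be sketched with a toy metamodel. Below, a hypothetical Category record drives generation of an SQL CREATE TABLE statement and a parameterized INSERT for a data loader, exercised against an in-memory SQLite database. The mapping from dictionary type codes (int, float, code, text) to SQL types is an assumption, and CORBA IDL and XML schema generation are not shown.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

# Assumed mapping from dictionary type codes to SQL column types.
SQL_TYPES = {
    "int": "INTEGER",
    "float": "DOUBLE PRECISION",
    "code": "VARCHAR(80)",
    "text": "TEXT",
}


@dataclass
class Category:
    """Toy metamodel entry: a category, its items, and its key items."""
    name: str
    items: List[Tuple[str, str]]  # (item name, dictionary type code)
    keys: List[str]


def create_table_sql(cat: Category) -> str:
    """Generate a CREATE TABLE statement for one category."""
    lines = [f'  "{item}" {SQL_TYPES[code]}' for item, code in cat.items]
    lines.append("  PRIMARY KEY (" + ", ".join(f'"{k}"' for k in cat.keys) + ")")
    return f'CREATE TABLE "{cat.name}" (\n' + ",\n".join(lines) + "\n)"


def insert_sql(cat: Category) -> str:
    """Generate the parameterized INSERT a data loader would execute per row."""
    columns = ", ".join(f'"{item}"' for item, _ in cat.items)
    placeholders = ", ".join("?" for _ in cat.items)
    return f'INSERT INTO "{cat.name}" ({columns}) VALUES ({placeholders})'


atom_site = Category(
    name="atom_site",
    items=[("id", "code"), ("type_symbol", "code"), ("Cartn_x", "float")],
    keys=["id"],
)

print(create_table_sql(atom_site))
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(atom_site))
conn.execute(insert_sql(atom_site), ("1", "N", 11.104))
print(conn.execute('SELECT "type_symbol", "Cartn_x" FROM "atom_site"').fetchall())
# [('N', 11.104)]
```

Because the schema and the loader are both derived from the same Category record, adding an item to the metadata changes both outputs without editing the generator.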
17
Management
  • Complex challenges in technology and sociology
  • Communicate and work with a diverse community
  • Help create and enforce community policies and
    standards
  • Must take advantage of the latest innovations in
    technology
  • New technologies must be introduced so as to
    enable and not disrupt the users of the resource
  • Above all else is the need for good data and a
    robust data representation

18
Access
  • RCSB Protein Data Bank Site
      – http://www.pdb.org/
  • OpenMMS site (Java implementation)
      – http://openmms.sdsc.edu/
  • RCSB PDB Software Download Site (C and Python
    implementations, NDB server)
      – http://deposit.pdb.org/mmcif/FILM/
  • RCSB PDB Dictionary Resource Site
      – http://deposit.pdb.org/mmcif/
  • RCSB PDB Beta Data Site
      – ftp://beta.rcsb.org/pub/pdb/uniformity/data/

19
  • Operated by three members of the RCSB: Rutgers,
    The State University of New Jersey; the San Diego
    Supercomputer Center at the University of
    California, San Diego; and the Center for Advanced
    Research in Biotechnology/UMBI/NIST
  • The RCSB PDB is supported by funds from the
    National Science Foundation (NSF), the National
    Institute of General Medical Sciences (NIGMS),
    the Office of Science, Department of Energy
    (DOE), the National Library of Medicine (NLM),
    the National Cancer Institute (NCI), the National
    Center for Research Resources (NCRR), the
    National Institute of Biomedical Imaging and
    Bioengineering (NIBIB), and the National
    Institute of Neurological Disorders and Stroke
    (NINDS).
  • The RCSB PDB is a member of the wwPDB
    (http://www.wwpdb.org/)