Fast Access to Big Molecules: An Introduction to the OMG/LSR Macromolecular Structure API

Learn more at: http://lsr.omg.org

1
Fast Access to Big Molecules: An Introduction to
the OMG/LSR Macromolecular Structure API
  • Douglas S. Greer
  • University of California, San Diego
  • Alexy Khrabrov
  • Rutgers University
  • Philip E. Bourne
  • University of California, San Diego
  • John D. Westbrook
  • Rutgers University

2
Overview
  • RCSB
  • An Ontology Driven Architecture
  • OpenMMS Toolkit
  • Macromolecular Structure (MMS) Metamodel
  • Parser, XML, SQL
  • CORBA
  • Ten Macromolecular Structure Classes
  • Two Examples of the API

3
What Is the RCSB?
  • The Research Collaboratory for Structural
    Bioinformatics
  • Manages the Protein Data Bank
  • Members
    • University of California San Diego
    • Rutgers University
    • National Institute of Standards and Technology
  • http://www.rcsb.org - info@rcsb.org


4
RCSB Goal
To enable the science of molecular biology
5
How to Enable?
  • Promote well-defined MMS specifications
  • Deposition and Archiving
    • Fast turnaround of accurate data
    • Capture more information directly
  • Distribution: Open Interfaces
    • Classic: flat files, Web browsing and searching
    • New: XML, SQL, CORBA

6
RCSB Internal Responsibilities
X-ray and NMR Depositions
  • Rutgers - Data deposition and validation
  • UCSD - Data query and distribution
  • NIST - Long term archive and data clean-up

[Diagram: deposition and distribution flow with nodes EBI, Direct, Rutgers, Cleanup, UCSD, Rutgers, NIST, Mirrors]
7
Why OpenMMS?
  • Allow programmers to more easily create
    efficient, high performance and robust
    applications.
  • A Java-only toolkit that creates XML, CORBA
    and relational DB representations of the mmCIF
    macromolecular structure data
  • Source code is publicly available
  • Extensible: add your own dictionary definitions
    and data

8
mmCIF Dictionary and Data Files
  • Based on Ontology for Macromolecular Structure
    defined by the International Union of
    Crystallography
  • Replaces the older 80-Column PDB files
  • mmCIF Dictionary contains over 140 Category and
    1600 Item definitions
  • Open Standards Process
  • Provides a well-defined reference standard for
    the specification and distribution of
    macromolecular structure data

9
Ontology Driven Architecture: OpenMMS Toolkit
[Diagram, "Data Flow": mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, CORBA Server, Applications]
10
Ontology Driven Architecture: Metamodel
[Diagram, "Information Flow": mmCIF Dictionaries (Ontology), Ontology Metamodel, Metamodel Framework, CORBA IDL, SQL Schema, XML DTD, Java Data Loaders, JDBC Loaders]
11
MMS Metamodel Hierarchy
[Diagram: hierarchy of Root, Module, Interface, Struct, and Field nodes, shown with a Visitor Abstract Class and a Visitor Subclass]
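
The hierarchy on this slide pairs a tree of metamodel nodes with a visitor class, which suggests that each output (IDL, SQL schema, DTD) could come from its own visitor subclass walking the same tree. The following is a minimal sketch of that idea, not the OpenMMS implementation: the class names ModuleNode, StructNode, FieldNode, Visitor and SqlSchemaVisitor are hypothetical, and the type mapping is a toy assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: node and visitor names follow the slide labels,
// not the actual OpenMMS source code.
public class MetamodelVisitorSketch {

    /** Leaf node: one item of a category. */
    static final class FieldNode {
        final String name;
        final String type;
        FieldNode(String name, String type) { this.name = name; this.type = type; }
    }

    /** One category, i.e. one struct or table. */
    static final class StructNode {
        final String name;
        final List<FieldNode> fields = new ArrayList<>();
        StructNode(String name) { this.name = name; }
        StructNode field(String fieldName, String type) {
            fields.add(new FieldNode(fieldName, type));
            return this;
        }
    }

    /** A named group of structs. */
    static final class ModuleNode {
        final String name;
        final List<StructNode> structs = new ArrayList<>();
        ModuleNode(String name) { this.name = name; }
        ModuleNode struct(StructNode s) { structs.add(s); return this; }
    }

    /** Abstract visitor: the default methods only walk down the tree. */
    static abstract class Visitor {
        void visitModule(ModuleNode m) { for (StructNode s : m.structs) visitStruct(s); }
        void visitStruct(StructNode s) { for (FieldNode f : s.fields) visitField(f); }
        void visitField(FieldNode f) { }
    }

    /** Visitor subclass that emits a toy SQL schema (assumed type mapping). */
    static final class SqlSchemaVisitor extends Visitor {
        final StringBuilder out = new StringBuilder();
        private boolean firstColumn;

        @Override void visitStruct(StructNode s) {
            out.append("CREATE TABLE ").append(s.name).append(" (");
            firstColumn = true;
            super.visitStruct(s);          // descend into the fields
            out.append(");\n");
        }

        @Override void visitField(FieldNode f) {
            if (!firstColumn) out.append(", ");
            firstColumn = false;
            out.append(f.name).append(' ').append(sqlType(f.type));
        }

        private static String sqlType(String type) {
            switch (type) {
                case "float": return "REAL";
                case "long":  return "INTEGER";
                default:      return "VARCHAR(80)";
            }
        }
    }

    static String toSql(ModuleNode m) {
        SqlSchemaVisitor v = new SqlSchemaVisitor();
        v.visitModule(m);
        return v.out.toString();
    }

    public static void main(String[] args) {
        ModuleNode m = new ModuleNode("mms").struct(
            new StructNode("atom_site")
                .field("id", "string")
                .field("occupancy", "float"));
        System.out.print(toSql(m));
    }
}
```

A second subclass overriding the same two methods could emit IDL or a DTD from the same tree, which is one way to read the "transfer between representations" advantage listed on the next slide.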
12
Some Advantages of Using an Ontology Driven
Architecture
  • Scales to very large Ontologies
  • More reliable and maintainable code
  • Transfer between representations
  • Scientific correctness of representation
  • Help in maintaining backward compatibility

13
mmCIF Parsers
  • General Purpose, Low-level access to data
  • Parsers available in many Languages
  • OpenMMS toolkit includes Java Parser
  • An application subclasses the parser's abstract
    class and stores data into its own data structures
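
The subclassing approach in the last bullet can be sketched as a callback-style parser. This is an illustration only: CifHandler, its row callback, and CoordinateCollector are hypothetical names, and the toy parser handles a single loop_ block with whitespace-separated values and no quoting, which is far less than real mmCIF requires. The item names are standard mmCIF atom_site items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CifCallbackSketch {

    /** Hypothetical callback base class; the real OpenMMS parser API may differ. */
    static abstract class CifHandler {
        /** Called once per data row, with item names and values aligned by position. */
        abstract void row(List<String> itemNames, List<String> values);

        /** Toy parser: one loop_ block, whitespace-separated values, no quoting. */
        final void parse(String text) {
            List<String> names = new ArrayList<>();
            for (String raw : text.split("\n")) {
                String line = raw.trim();
                if (line.isEmpty() || line.equals("loop_") || line.startsWith("#")) continue;
                if (line.startsWith("_")) { names.add(line); continue; }
                List<String> values = Arrays.asList(line.split("\\s+"));
                if (values.size() == names.size()) row(names, values);
            }
        }
    }

    /** Application subclass: keeps only the coordinates, in its own structure. */
    static final class CoordinateCollector extends CifHandler {
        final List<double[]> xyz = new ArrayList<>();

        @Override void row(List<String> itemNames, List<String> values) {
            int x = itemNames.indexOf("_atom_site.Cartn_x");
            int y = itemNames.indexOf("_atom_site.Cartn_y");
            int z = itemNames.indexOf("_atom_site.Cartn_z");
            if (x < 0 || y < 0 || z < 0) return;   // not an atom_site loop
            xyz.add(new double[] {
                Double.parseDouble(values.get(x)),
                Double.parseDouble(values.get(y)),
                Double.parseDouble(values.get(z)) });
        }
    }

    public static void main(String[] args) {
        String cif = String.join("\n",
            "loop_",
            "_atom_site.id",
            "_atom_site.type_symbol",
            "_atom_site.Cartn_x",
            "_atom_site.Cartn_y",
            "_atom_site.Cartn_z",
            "1 N 11.065 7.352 9.598",
            "2 C 12.436 7.764 9.902");
        CoordinateCollector c = new CoordinateCollector();
        c.parse(cif);
        for (double[] p : c.xyz) System.out.println(Arrays.toString(p));
    }
}
```

The parser never builds a full in-memory model; the application decides what to keep, which is what makes this style suitable for low-level, general-purpose access.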

14
MMS in XML (Prototype)
  • Very Large Flat Files (due to open and close
    tags)
  • CIF ? mmCIF
  • Tables can be grouped by rows or columns
  • XML from SQL Query
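
The row versus column grouping mentioned above affects how many open and close tags a table needs. The sketch below serializes one small table both ways so the difference is visible. The element names (row, and one element per item) are assumptions for illustration, not the prototype's actual DTD, and values are not XML-escaped.

```java
public class XmlGroupingSketch {

    /** Row grouping: one element per value, wrapped in a row element. */
    static String byRows(String table, String[] cols, String[][] rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">\n");
        for (String[] r : rows) {
            sb.append("  <row>");
            for (int c = 0; c < cols.length; c++) {
                sb.append('<').append(cols[c]).append('>').append(r[c])
                  .append("</").append(cols[c]).append('>');
            }
            sb.append("</row>\n");
        }
        return sb.append("</").append(table).append(">\n").toString();
    }

    /** Column grouping: one element per column, values separated by spaces. */
    static String byColumns(String table, String[] cols, String[][] rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">\n");
        for (int c = 0; c < cols.length; c++) {
            sb.append("  <").append(cols[c]).append('>');
            for (int r = 0; r < rows.length; r++) {
                if (r > 0) sb.append(' ');
                sb.append(rows[r][c]);
            }
            sb.append("</").append(cols[c]).append(">\n");
        }
        return sb.append("</").append(table).append(">\n").toString();
    }

    public static void main(String[] args) {
        String[] cols = { "id", "type_symbol" };
        String[][] rows = { { "1", "N" }, { "2", "C" } };
        System.out.print(byRows("atom_site", cols, rows));
        System.out.print(byColumns("atom_site", cols, rows));
    }
}
```

Column grouping writes each tag pair once per column instead of once per value, so the tag overhead stops growing with the number of rows. The trade-off is that values containing whitespace would need a different separator or quoting.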

15
Relational DB Expression
  • SQL-92 Compatible
  • Schemas for all the standard DB vendors
    • Oracle, DB2, MySQL, MS Access, Sybase
  • Fast and Flexible Keyword searches
  • PDBase loader allows structures to be selectively
    loaded

16
CORBA Expression of MMS Data
  • No Parsing of Flat Files
  • Direct Access to Binary Data Structures
  • Strongly Typed Data
  • Granularity of Access
  • Indices and Presence Flags Pre-computed
  • Highest Performance

17
Two OpenMMS Tools: pdbase and dbserv
[Diagram: dbserv CORBA Server, Compute Farm, pdbase DB loader, RAM Cache]
18
OMG/LSR MMS Specification Adoption Process
  • August 1999: RFP issued
  • March 2000: Initial Submission
  • September 2000: Revised Submission
  • February 2001: Adopted by the OMG
  • November 2001: Version 1.0 of OpenMMS source
    code publicly available
  • February 2002: Formal OMG Specification

19
Using the CORBA MMS Server
An excerpt from a legacy 80-column PDB formatted file (4hhb):
...
ATOM      6  CG1 VAL A   1       7.009  20.127   5.418  6.00 61.79 ...
ATOM      7  CG2 VAL A   1       5.246  18.533   5.681  6.00 80.12 ...
ATOM      8  N   LEU A   2       9.096  18.040   3.857  7.00 26.44 ...
ATOM      9  CA  LEU A   2      10.600  17.889   4.283  6.00 26.32 ...
ATOM     10  C   LEU A   2      11.265  19.184   5.297  6.00 32.96 ...
ATOM     11  O   LEU A   2      10.813  20.177   4.647  8.00 31.90 ...
ATOM     12  CB  LEU A   2      11.099  18.007   2.815  6.00 29.23 ...
ATOM     13  CG  LEU A   2      11.322  16.956   1.934  6.00 37.71 ...
ATOM     14  CD1 LEU A   2      11.468  15.596   2.337  6.00 39.10 ...
ATOM     15  CD2 LEU A   2      11.423  17.268    .300  6.00 37.47 ...
...
20
LSR/MMS ATOM Record
DsLSRMacromolecularStructure.idl excerpt:
    struct AtomSite {
        string id;
        IndexId type_symbol;
        AtomIndex label;
        IndexId label_entity;
        VectorXYZ cartn;
        float occupancy;
        float b_iso_or_equiv;
    };
21
Example code to get a list of atomic coordinates
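    // entryFactory is assumed to be an already-resolved reference to the server's entry factory
    Entry e = entryFactory.get_entry_from_id("4hhb");
    AtomSite[] a = e.get_atom_site_list();
    // Print each atom's id, element symbol, and Cartesian coordinates
    for (int i = 0; i < a.length; i++) {
        System.out.println(a[i].id + " " +
            a[i].type_symbol.id + " (" +
            a[i].cartn.x + ", " + a[i].cartn.y + ", " +
            a[i].cartn.z + ")");
    }
produces:
    1 N (11.065, 7.352, 9.598)
    2 C (12.436, 7.764, 9.902)
    3 C (12.883, 7.09, 11.208)
    4 O (12.088, 7.0, 12.147)
    5 C (12.611, 9.264, 10.06)
    ...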
22
Overview of Ten Core Classes (mmCIF Categories)
23
Ten Core Classes continued...
24
Secondary Structure Core Classes
25
Secondary Structure Code Example
    Entry e = entryFactory.get_entry_from_id("4hhb");
    StructConf[] scf = e.get_struct_conf_list();
    EntityPolySeq[] eps = e.get_entity_poly_seq_list();
    ChemComp[] cc = e.get_chem_comp_list();
    // For each secondary structure element, list the monomers it spans
    for (int j = 0; j < scf.length; j++) {
        System.out.println("Structure Conformation " +
            scf[j].id + " in chain " +
            scf[j].beg_label.asym.id + " contains");
        int start = scf[j].beg_label.seq.index;
        int end = scf[j].end_label.seq.index;
        for (int i = start; i < end; i++) {
            // mon.index is a pre-computed position in the chem_comp list
            System.out.println(" Monomer " +
                cc[eps[i].mon.index].name + " (" +
                eps[i].mon.id + ") at position " + eps[i].num);
        }
    }
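
The monomer lookup in the example above relies on an index stored next to each identifier, so the client reaches the matching chem_comp entry by array position instead of searching by name. The sketch below imitates that with local stand-in classes. IndexId, ChemComp and EntityPolySeq here are simplified assumptions that reuse the slide's field names; they are not the IDL-generated types, and resolve is a hypothetical helper showing where the index could be computed once at load time.

```java
public class IndexLookupSketch {

    /** Stand-in for the IDL IndexId: an identifier plus its pre-computed array position. */
    static final class IndexId {
        final String id;
        final int index;
        IndexId(String id, int index) { this.id = id; this.index = index; }
    }

    static final class ChemComp {
        final String id;
        final String name;
        ChemComp(String id, String name) { this.id = id; this.name = name; }
    }

    static final class EntityPolySeq {
        final IndexId mon;
        final int num;
        EntityPolySeq(IndexId mon, int num) { this.mon = mon; this.num = num; }
    }

    /** Hypothetical loader step: search once, then store the position. */
    static IndexId resolve(String id, ChemComp[] cc) {
        for (int i = 0; i < cc.length; i++) {
            if (cc[i].id.equals(id)) return new IndexId(id, i);
        }
        throw new IllegalArgumentException("unknown component: " + id);
    }

    /** Client side: no searching, only array indexing. */
    static String describe(ChemComp[] cc, EntityPolySeq[] eps, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            sb.append(" Monomer ").append(cc[eps[i].mon.index].name)
              .append(" (").append(eps[i].mon.id).append(") at position ")
              .append(eps[i].num).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChemComp[] cc = {
            new ChemComp("ALA", "ALANINE"),
            new ChemComp("PRO", "PROLINE"),
            new ChemComp("THR", "THREONINE") };
        EntityPolySeq[] eps = {
            new EntityPolySeq(resolve("THR", cc), 118),
            new EntityPolySeq(resolve("PRO", cc), 119),
            new EntityPolySeq(resolve("ALA", cc), 120) };
        System.out.print(describe(cc, eps, 0, eps.length));
    }
}
```

Paying for the search once on the server side is what lets every client loop stay a plain array walk, in line with the "Indices and Presence Flags Pre-computed" point on the CORBA slide.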

26
Secondary Structure Print Results
    ...
    Structure Conformation HELX_P24 in chain C contains
     Monomer THREONINE (THR) at position 118
     Monomer PROLINE (PRO) at position 119
     Monomer ALANINE (ALA) at position 120
     Monomer VALINE (VAL) at position 121
     Monomer HISTIDINE (HIS) at position 122
     Monomer ALANINE (ALA) at position 123
     Monomer SERINE (SER) at position 124
    ...
28
Work in progress
  • MMS CORBA API: Graphics Applications
  • MMS CORBA API: Searching and Analysis
    Applications
  • MMS CORBA API: Linux Cluster Applications

29
Thanks and Acknowledgments
  • Michael Miller
  • Martin Senger
  • Lynn TenEyck
  • David Benton
  • Helen Berman
  • Karl Konnerth

The OMG