Title: Collaboratory for Multiscale Chemical Science CMCS: A Knowledge Grid Adaptive Informatics Infrastruc
1Collaboratory for Multi-scale Chemical Science (CMCS): A Knowledge Grid / Adaptive Informatics Infrastructure
- Jim Myers, Carmen Pancerella
2CMCS Enabling New Forms of Research and
Communication
- Distributed Research Groups
- Chemical Databases
- Rich Publication
- Community Annotation
- Informatics Analysis
- Cross-scale Communication
- Peer Data Review
- Pedigree Analysis
- Automated informatics
- Automated monitoring/analysis
3Adaptive Informatics Infrastructure
- Infrastructure: a well-designed, scalable, reusable, flexible set of tools, middleware, and services
- Informatics: the emerging use of semi-automated means to derive new knowledge from the analysis of (large amounts of) heterogeneous data, annotating existing data with its newly discovered meaning
- Adaptive: able to dynamically change to incorporate new knowledge and support new activities
- Low Barriers
  - Many access points
  - Storage of data in original formats with dynamic metadata extraction and translation
- Powerful
  - Arbitrary formats (binary, ASCII, XML)
  - Integrated data, metadata, pedigree across internal and external tools
- Evolvable
  - Schema can be changed/extended as needed
  - Metadata, translations, viewers, portal, etc. can be dynamically configured
4SAM Architecture
[Figure: SAM layered architecture. Labels: Notebook Services; Semantic Services; DAV, DASL, JMS, SAM Extensions; Metadata Services; DAV, JDBC, GridFTP; DataGrid]
5SAM Metadata Services Layer
- Jakarta Slide DAV server plus configurable:
- Mime Type Assignment
  - CMCS default: based on dc:format tag within .xml file
- Property Generation from binary/ASCII/XML files
  - 12 types → standard CMCS properties
- Resource Translation
  - 12 Viewers/Translators for CMCS, including Interactive Applets
- Mapping to Data Store(s)
  - NIST Kinetics DB
- JMS Events for access and changes
  - Feeds events to CMCS NED Email Notification daemon
- Authentication/Authorization model
  - Single sign-on with CMCS Portal username/password or GridCert
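The default MIME type rule on this slide (read a Dublin Core format tag from inside an .xml file) could be sketched roughly as follows. This is an illustration only, not the SAM implementation: the function name `assign_mime_type`, the fallback type, and the sample record are all assumptions introduced here.

```python
import xml.etree.ElementTree as ET

# Dublin Core element-set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

def assign_mime_type(xml_text, fallback="text/xml"):
    """Return the text of the first non-empty dc:format element, else a fallback.

    Hypothetical helper; the real SAM rules are configurable and not shown here.
    """
    root = ET.fromstring(xml_text)
    # iter() visits the root as well, so a top-level dc:format is also found.
    for element in root.iter("{%s}format" % DC_NS):
        if element.text and element.text.strip():
            return element.text.strip()
    return fallback

# Made-up sample record; the MIME type value is a placeholder.
sample = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:format>chemical/x-example</dc:format>"
    "</record>"
)
print(assign_mime_type(sample))
```

A file without the tag simply falls through to the fallback type, which is one plausible way to keep the barrier to upload low.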
6Extensible Scientific Interchange Language (XSIL)
/ Binary Format Description (BFD) language
- XSIL (Roy Williams, CalTech): XML encoding and Java code for scientific data
  - Ints, floats, vectors, arrays, time series, ...
  - Can describe the byte structure of external data files/streams (encoding, byte order, ...)
  - Can have link(s) to external data
- BFD (Alan Chappell, Jim Myers, PNNL): XML encoding and Java code for describing binary/ASCII files
  - Bug fixes, removed ambiguities
  - Parameterized logic (if, while, for)
  - Parameterized Stream interface
  - Being used as input for Grid Forum Data Format Description Language (DFDL) standard
<XSIL>
  <Param Name="date" Type="String" />
  <Param Name="Program Version" Type="float" />
  <Param Name="numColumns" Type="int" />
  <Array Name="data" Type="float">
    <Dim><XBFD:value-of select="/XSIL/Param[@Name='numColumns']" /></Dim>
    <Dim>6</Dim>
  </Array>
  <Stream Encoding="Binary" Type="Remote" XBFD:streamnumber="0" />
</XSIL>
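To make the idea of a parameterized array dimension concrete, here is a minimal Python sketch of decoding a binary stream whose array size depends on a value read earlier in the same stream. It is not the BFD processor. It assumes a simplified layout invented for illustration: a 4-byte little-endian integer (standing in for numColumns) followed by numColumns x 6 4-byte floats. The function name `read_array` and the byte order are assumptions, and the string and version parameters are left out.

```python
import struct

def read_array(buffer, inner_dim=6, byte_order="<"):
    """Decode an int32 count followed by count * inner_dim float32 values.

    Hypothetical layout: the first dimension is read from the stream itself,
    mirroring how the description above ties a <Dim> to the numColumns param.
    """
    (num_columns,) = struct.unpack_from(byte_order + "i", buffer, 0)
    count = num_columns * inner_dim
    flat = struct.unpack_from(byte_order + "%df" % count, buffer, 4)
    # Reshape the flat tuple into num_columns rows of inner_dim values.
    return [list(flat[i * inner_dim:(i + 1) * inner_dim])
            for i in range(num_columns)]

# Build a small test stream: numColumns = 2, then 12 floats (0.0 .. 11.0).
values = [float(v) for v in range(12)]
stream = struct.pack("<i12f", 2, *values)
data = read_array(stream)
print(len(data), len(data[0]))  # prints "2 6"
```

Because the layout lives in a description rather than in the reader, changing the byte order or inner dimension here only means passing different arguments, which is the kind of flexibility a format description language aims for.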
7Demo
8Example
- Binary → XML → Properties
- Translation of Chemistry Data
- SAM-based Electronic Notebook
- CMCS Portal/Pedigree Browser
[Diagram labels: ELN, DAV, Fortran Application, SAM, DAV, JMS, Local Disk, DataGrid]
9CMCS Provenance: de-facto standards
- cmcs:hasinputs (workflow)
- cmcs:hasoutputs (workflow)
- sam:hastranslations (virtual workflow)
- cmcs:ispartofproject (hierarchy)
- eln:children (hierarchy)
- (DAV:collection) (hierarchy)
- dcterms:references (scientific pedigree)
- dcterms:isReferencedBy (scientific pedigree)
- eln:references (informal/private scientific pedigree)
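The relationships listed on this slide form a graph that a pedigree browser can walk. The sketch below shows one way to follow an input relationship transitively. The resource paths, the in-memory dictionary, the function name `upstream`, and the spelling of the property name with a namespace colon are all assumptions for illustration. In CMCS these would be properties stored on DAV resources.

```python
from collections import deque

# Hypothetical resources keyed by path; each maps a relationship name to targets.
resources = {
    "/proj/rate.xml": {"cmcs:hasinputs": ["/proj/calc.out"]},
    "/proj/calc.out": {"cmcs:hasinputs": ["/proj/geom.xyz", "/proj/basis.txt"]},
    "/proj/geom.xyz": {},
    "/proj/basis.txt": {},
}

def upstream(start, relation="cmcs:hasinputs"):
    """Breadth-first walk returning every resource reachable via `relation`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        for target in resources.get(current, {}).get(relation, []):
            if target not in seen:  # guards against cycles and duplicates
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

print(upstream("/proj/rate.xml"))
# prints ['/proj/calc.out', '/proj/geom.xyz', '/proj/basis.txt']
```

Passing a different relation name would walk a hierarchy or citation pedigree with the same code, since each relationship is just another named property.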
10Applications/Chemistry Services
- Extensible Computational Chemistry Environment
  - Export to CMCS with pedigree/metadata
- Active Thermochemical Tables
  - Portlet/web service using CMCS data store
- RIOT adaptive mechanism reduction
  - Portlet/web service using CMCS data store, asynchronous invocation mechanism
11Standard Protocol and API
- WebDAV
  - An early web service (XML commands over HTTP)
  - A widely adopted standard for metadata/data transport
  - Put/Get data with arbitrary properties (dynamic)
  - Properties can be discovered and accessed independently
  - DASL, Versioning, Transactions, ...
- JSR 170 Java Content Repository
  - An API for working with nodes with properties (versioning, queries, typing, notification, ...)
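As an illustration of "XML commands over HTTP", the sketch below builds the XML body of a WebDAV PROPFIND request that asks for specific properties by name. The `DAV:` namespace and the `getcontenttype` property are part of the WebDAV standard. The `CMCS_NS` URI, the `hasinputs` property name, and the helper `propfind_body` are hypothetical, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                       # standard WebDAV namespace
CMCS_NS = "http://example.org/cmcs#"  # hypothetical namespace, illustration only

def propfind_body(properties):
    """Build a PROPFIND request body for the given (namespace, name) pairs."""
    root = ET.Element("{%s}propfind" % DAV_NS)
    prop = ET.SubElement(root, "{%s}prop" % DAV_NS)
    for namespace, name in properties:
        # Each requested property is an empty element inside <prop>.
        ET.SubElement(prop, "{%s}%s" % (namespace, name))
    return ET.tostring(root, encoding="unicode")

body = propfind_body([(DAV_NS, "getcontenttype"), (CMCS_NS, "hasinputs")])
print(body)
```

A client would send this body with the PROPFIND method to a resource URL. Because properties are requested by qualified name, standard and application-defined metadata can be retrieved with the same call.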
12Path Forward
- Pilot groups doing real chemistry
  - Exploring new practice
- Peer-Review / Endorsement Mechanisms/Interfaces
  - Digital publication, third party annotation
- Activity Reporting tools
- Scoping Searches, Notifications
  - Based on user-defined notion of provenance/hierarchy
- Notebook Views of Other Hierarchies
  - E.g. a notebook sharing a computational chemistry project hierarchy
- Validation of Chemical networks
  - E.g. Active Thermochemical Tables
- Workflow by Example
- Informatics Data File Assembly Tool
13URLs/Team Members
- http://cmcs.org/
- http://www.scidac.org/SAM/
CMCS Team Members: Thomas C. Allison, Kaizar Amin, Sandra Bittner, Brett Didier, Michael Frenklach, William H. Green, Jr., Yen-Ling Ho, John Hewson, Wendy Koegler, Carina Lansing, David Leahy, Michael Lee, Renata McCoy, Michael Minkoff, James D. Myers, Sandeep Nijsure, Gregor von Laszewski, David Montoya, Carmen Pancerella, Reinhardt Pinzon, William Pitz, Larry Rahn, Branko Ruscic, Karen Schuchardt, Eric Stephan, Al Wagner, Baoshan Wang, Theresa Windus, Lili Xu, Christine Yang