1 XML as a Strategic Information Technology for DOE
- Frank Olken, John L. McCarthy
- Lawrence Berkeley National Lab
- InterLab 99
- Stanford Linear Accelerator Center
- November 4, 1999
2 Summary Overview
- XML can help DOE share information
- Why XML?
- What DOE applications could benefit?
- Requires additional standards/infrastructure
- content standards
- code sets
- repositories / registries
3 LBNL Metadata Activities sponsored by US EPA Superfund
- W3C RDF, XML Schema, XML Query Lang. WGs
- ISO SC32 WG2 / NCITS L8
- ISO 11179, ANSI X3.285, XMI
- XML/XSL Prototype for metadata exchange
- OMG - XMI, CWMI
- Ecommerce - eCo Framework, OASIS/CEFACT
4 Where can (should) DOE use XML?
- EDI / Ecommerce
- Scientific and Engineering Data Exchange
- Remote Procedure Call / Object Serialization
- Scientific Data Archiving
- Publishing (scientific, other)
- Metadata Exchange
- Information Discovery, Query, Integration
5 XML provides syntax, but data sharing also requires other standards layers
- Structure Standards (e.g., XML Schema)
- Content Standards (e.g., geospatial, chemical, ...)
- Repository Standards (e.g., ISO 11179)
- Measurement units standards
- Digital Signature Infrastructure
- Repositories of standard data elements, code sets, ...
6 Analogy to Human Language
- Communication requires shared language
- Shared alphabet
- Shared vocabulary
- Shared syntax and structure
- We spend years teaching children language/reading
- We spend more years teaching specialized vocabularies (medicine, chemistry, ...)
7 Attractions of XML
- XML is essentially an LL(1) parser generator
- Human readable, editable, ...
- Little semantics specified
- Applies to both documents and data
- Can encode many different data models
- Can encode programming languages (e.g., for workflow)
- Many tools are becoming available (parsers, formatting engines, repositories, query tools, ...)
8 Potential XML Competitors?
- ASN.1 - data only (no docs), often not human readable, richer base types, smaller tool base
- netCDF, HDF5 - binary, data only, not human readable
- SGML - limited tool base (no browsers)
- X12 EDI messages - positional, idiosyncratic
- comma delimited files - positional, error prone
- STEP Part 21 files - product data, narrow tool base
- domain specific formats - ad hoc, small tool base
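To illustrate why positional formats are called error prone above, here is a minimal sketch using Python's standard library. The record, the field names (`site`, `temp_c`), and both helper functions are hypothetical and chosen only for illustration. The same reading is stored with its fields in two different orders: the positional reader silently returns the wrong field, while the tag-based reader does not depend on order.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same reading with fields in two orders.
CSV_V1 = "site,temp_c\nLBNL,21.5\n"
CSV_V2 = "temp_c,site\n21.5,LBNL\n"
XML_V1 = "<reading><site>LBNL</site><temp_c>21.5</temp_c></reading>"
XML_V2 = "<reading><temp_c>21.5</temp_c><site>LBNL</site></reading>"


def read_site_positional(text):
    """Positional access: assumes the site is always the first column."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1][0]  # row 0 is the header line


def read_site_tagged(text):
    """Tag-based access: looks the field up by element name."""
    return ET.fromstring(text).findtext("site")


print(read_site_positional(CSV_V1), read_site_positional(CSV_V2))
print(read_site_tagged(XML_V1), read_site_tagged(XML_V2))
```

A careful CSV reader could consult the header line instead of the column index; the sketch only shows the failure mode of purely positional access.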
9 XML Data Model
- Eclectic (SGML Legacy)
- Hybrid hierarchical / network
- hierarchical tree of nested elements
- network: IDREFs, XLinks
- Often described via graph data model
- Can encode OO, relational, and other models
- but may need to query higher level data models
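The hybrid model can be sketched in a few lines of Python. The document, the element names, and the `memberOf` attribute are hypothetical. Because no DTD is supplied, the `id` attribute is treated as an identifier only by convention here, not as a declared ID/IDREF pair. Nesting gives the hierarchical tree; the id references add network-style links across it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nesting forms the tree, memberOf points across it.
DOC = """
<lab>
  <group id="g1"><name>Data Management</name></group>
  <person id="p1" memberOf="g1"><name>A. Researcher</name></person>
  <person id="p2" memberOf="g1"><name>B. Scientist</name></person>
</lab>
"""

root = ET.fromstring(DOC)

# Hierarchical view: the direct children of the root element.
children = [child.tag for child in root]

# Network view: index every element that carries an id, then follow references.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}


def group_of(person):
    """Resolve a person's memberOf reference to the group's name."""
    return by_id[person.get("memberOf")].findtext("name")


print(children)
print(group_of(by_id["p2"]))
```

The reference has to be resolved by application code in this sketch, which is one reason queries may need to be posed against a higher level data model rather than against the raw tree.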
10 XML for DOE EDI / Ecommerce
- CME (FTPA proposal database)
- But inconsistent budget account structures, etc?
- Procurement
- Travel
- Human resources (job postings, resumes)
- Financial reporting to DOE
- Enabling technology for workflow management
11 XML for Scientific Data Exchange
- Applications
- Astronomy
- Biosequence ML
- ChemML (chemical compound markup)
- XML is verbose - unsuited for very large databases
- XML Schema language currently lacks vectors, arrays, complex numbers, measurement units, ...
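The verbosity point can be checked with a rough sketch that encodes the same 1000 floating-point values once as XML elements and once as packed 8-byte doubles. The element names `samples` and `value` are hypothetical, and the exact size ratio depends on tag names and number formatting, so treat the comparison as indicative only.

```python
import struct
import xml.etree.ElementTree as ET

values = [float(i) for i in range(1000)]

# XML encoding: one <value> element per number (element names are hypothetical).
root = ET.Element("samples")
for v in values:
    ET.SubElement(root, "value").text = repr(v)
xml_bytes = ET.tostring(root)

# Binary encoding: little-endian IEEE 754 doubles, 8 bytes each.
binary_bytes = struct.pack(f"<{len(values)}d", *values)

# Both encodings carry the same numbers; only the size differs.
parsed = [float(el.text) for el in ET.fromstring(xml_bytes)]

print(len(xml_bytes), len(binary_bytes))
```

The markup repeats once per value, so the overhead grows with the number of elements, which is why array and vector types in the schema language matter for scientific data.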
12 XML for Scientific Data Archiving
- Similar requirements to scientific data exchange
- Preserve data and meaning over time
- Requires preservation of data and programs to decode / access data
- Attractions of XML - human readable, long lived standard, many tools
- e.g., San Diego Xarchive
- LLNL ?, LANL ?
13 XML for Engineering Data Exchange
- Mechanical / architectural CAD data exchange
- XML encoded STEP Part 21 files Part 22
- aecXML (for architecture / construction)
- Major application arena for DOE
- EPRI / NERC CCAPI - power systems config.
14 XML for Scientific Publishing
- Single source document for multiple media outputs
- Much more flexible and powerful than HTML for print
- MathML for math, ChemML for chemistry
- MathML, ChemML tools recently available
15 XML for RPC / Object Serialization
- RPC: remote procedure call
- used in place of XDR, IIOP, ...
- used for Java object serialization (comm/storage)
- more verbose, human readable
- many possible transport protocols (HTTP, email, ...)
- examples: XML-RPC, WebMethods WIDL
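As a small illustration of XML used for RPC serialization, Python's standard `xmlrpc.client` module can marshal a call to XML and back without any network transport. The method name `add` and its arguments are hypothetical; the sketch shows only the encoding step, not a server.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method add(2, 3) as XML text.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# The receiving side decodes the XML back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)

# A response is marshalled the same way; it carries no method name.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
result, response_method = xmlrpc.client.loads(response_xml)

print(request_xml)
```

Printing `request_xml` shows the human-readable, and comparatively verbose, wire format; the same text could be carried over HTTP or email.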
16 XML for Metadata Exchange
- XMI - UML model interchange (DB schemas, ...)
- RDF (Resource Description Framework)
- Content Standards (Dublin Core, GILS, FGDC, ...)
17 XML for Information Discovery, Query, and Integration
- Semantic markup permits more precise retrieval
- Content standards: Dublin Core, FGDC, MARC
- Energy abstracts, molecular biology annotations / abstracts
- XML Query Language being standardized
- Use of XML for mediators, data integration
- SDSC, Stanford, AT&T, U of Washington, ...
- XML can transform WWW to World Wide Database
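The claim that semantic markup permits more precise retrieval can be sketched with the limited XPath support in Python's `xml.etree.ElementTree`. The records are invented, and the un-namespaced element names `title` and `creator` only loosely follow Dublin Core. A plain text search for "Mercury" matches every record, while a search restricted to the `creator` element matches one.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; element names loosely follow Dublin Core.
RECORDS = """
<records>
  <record><title>Mercury in groundwater</title><creator>Smith</creator></record>
  <record><title>Orbit of Mercury</title><creator>Jones</creator></record>
  <record><title>Solar flares</title><creator>Mercury</creator></record>
</records>
"""

root = ET.fromstring(RECORDS)


def full_text_hits(term):
    """Unstructured search: the term may appear anywhere in the record."""
    return [r.findtext("title") for r in root.findall("record")
            if term in "".join(r.itertext())]


def creator_hits(name):
    """Structured search: the term must be the content of <creator>."""
    return [r.findtext("title")
            for r in root.findall(f"record[creator='{name}']")]


print(full_text_hits("Mercury"))
print(creator_hits("Mercury"))
```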
18 XML Structural Standards Needed
- XML (done)
- XML Schema Language (1Q 2000)
- XML Query Language (4Q 2000?)
- XMI (V1.0 done)
- Process (workflow) and protocol modeling
- Petri Nets
- Coupled Finite State Automata
19 XML Content Standards Needed
- XML Schemas
- Domain Data Models (schemas)
- e.g., workflow, ...
- Message Content Specifications
- views over domain data models
- XML/EDI, HL7
- Standard code sets, vocabularies
- Accounting Standards (e.g., overhead definitions)
20 Standard code sets, vocabularies
- Element / Subatomic particle names/symbols
- Chemical names
- Geographic place names, codes
- SNOMED (std. nomenclature for medicine)
- LOINC (std. clinical lab test names, ...)
- X12 message codes
21 Measurement Units Stds. Needed
- Conventions for specification of
- Measurement units (meters, feet, kilograms)
- Dimensionality (length, mass, time)
- Automatic inference/checking of dimensionality/units consistency in query expressions
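A minimal sketch of what automatic dimensionality checking might look like, assuming each unit is registered with a dimension vector over (length, mass, time) and a conversion factor to SI base units. The `UNITS` table and the `Quantity` class are hypothetical and are not drawn from any existing standard.

```python
from dataclasses import dataclass

# Hypothetical unit registry: name -> (exponents of length, mass, time; SI factor).
UNITS = {
    "m": ((1, 0, 0), 1.0),
    "ft": ((1, 0, 0), 0.3048),
    "kg": ((0, 1, 0), 1.0),
    "s": ((0, 0, 1), 1.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float  # magnitude in SI base units
    dim: tuple    # exponents of (length, mass, time)

    @classmethod
    def of(cls, value, unit):
        dim, factor = UNITS[unit]
        return cls(value * factor, dim)

    def __add__(self, other):
        # Addition is only meaningful between quantities of equal dimension.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} vs {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __truediv__(self, other):
        # Division subtracts exponents, e.g. length / time gives (1, 0, -1).
        dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, dim)


total = Quantity.of(1, "m") + Quantity.of(1, "ft")       # units differ, dimension agrees
speed = Quantity.of(10, "m") / Quantity.of(2, "s")       # dimension inferred
print(total, speed)
```

Adding `Quantity.of(1, "m")` to `Quantity.of(1, "kg")` raises `TypeError`, which is the kind of check a query processor could apply to expressions once units and dimensions are declared in a standard way.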
22 Repositories Needed
- Thesauri, ontologies, data element registries, ...
- Means of sharing code sets, vocabularies
- Means of sharing schemas, message formats
- Implies shared metamodel
- Requires access interface (APIs, query language)
23 Data Element Registry Activities
- ISO 11179, ANSI X3.285
- EPA Environmental Data Registry
- HCFA
- DOD Healthcare
- OASIS (Ecommerce)
- Australian Healthcare
24 Digital Signatures Needed
- Essential if digital documents are to replace paper
- Need digital signature standards (several exist)
- W3C Digital Signature Working Group
- Need certificate authorities (coming slowly)
- May need smart cards (slow in US)
- Need legal acceptance (most states, pending in Congress)
25 Current DOE Standards Infrastructure Efforts
- Data exchange formats: HDF5 for ASCI DMF
- Data models: Fiber bundle for ASCI DMF
- Infrastructure Development
- digital signature certificate authorities
- Repositories ???
- Standards development
- DOE is AWOL
26 You are invited to participate in the Open Forum on Metadata Registries
- January 17-21, 2000
- Santa Fe, New Mexico
- Sponsored by ISO SC32 WG2, JTC1, EPA
- Presentations of ISO 11179, X3.285, XMI, implementations, ...
- Disciplinary tracks: environmental, healthcare, ...
- http://www.sdct.itl.nist.gov/ftp/l8/sc32wg2/2000/events/openforum/index.htm
27 Conclusions
- XML could become strategic IT for DOE
- Facilitates information sharing
- Needs complementary standards
- Requires more infrastructure
- DOE needs to get its act together.
28 Acknowledgements
- Paid for by U.S. Environmental Protection Agency Superfund Office
- No DOE funding yet.
29 Contact Information
- Frank Olken, LBNL
- Mailstop 50B-3238
- tel. 510-486-5891
- olken_at_lbl.gov
- http://www.lbl.gov/olken
- John McCarthy, LBNL
- Mailstop 50C
- tel. 510-486-5307
- jlmccarthy_at_lbl.gov
- http://www.lbl.gov/mccarthy