1
XML as a Strategic Information Technology for DOE
  • Frank Olken, John L. McCarthy
  • Lawrence Berkeley National Lab
  • InterLab 99
  • Stanford Linear Accelerator Center
  • November 4, 1999

2
Summary Overview
  • XML can help DOE share information
  • Why XML?
  • What DOE applications could benefit?
  • Requires additional standards/infrastructure
  • content standards
  • code sets
  • repositories / registries

3
LBNL Metadata Activities sponsored by US EPA
Superfund
  • W3C RDF, XML Schema, XML Query Lang. WGs
  • ISO SC32 WG2 / NCITS L8
  • ISO 11179, ANSI X3.285, XMI
  • XML/XSL Prototype for metadata exchange
  • OMG - XMI, CWMI
  • Ecommerce - eCo Framework, OASIS/CEFACT

4
Where can (should) DOE use XML?
  • EDI / Ecommerce
  • Scientific & Engineering Data Exchange
  • Remote Procedure Call / Object Serialization
  • Scientific Data Archiving
  • Publishing (scientific, other)
  • Metadata Exchange
  • Information Discovery, Query, Integration

5
XML provides syntax, but data sharing also
requires other standards layers
  • Structure Standards (e.g., XML Schema)
  • Content Standards (e.g., geospatial, chemical,
    ...)
  • Repository Standards (e.g., ISO 11179)
  • Measurement units standards
  • Digital Signature Infrastructure
  • Repositories of standard data elements, code
    sets, ...

6
Analogy to Human Language
  • Communication requires shared language
  • Shared alphabet
  • Shared vocabulary
  • Shared syntax and structure
  • We spend years teaching children language/reading
  • We spend more years teaching specialized
    vocabularies (medicine, chemistry, ...)

7
Attractions of XML
  • XML is essentially an LL(1) parser generator
  • Human readable, editable, ...
  • Little semantics specified
  • Applies to both documents and data
  • Can encode many different data models
  • Can encode programming languages (e.g. for
    workflow)
  • Many tools are becoming available
    (parsers, formatting engines, repositories,
    query tools, ...)

8
Potential XML Competitors?
  • ASN.1 - data only (no docs), often not human
    readable, richer base types, smaller tool base
  • netCDF, HDF5 - binary, data only, not human
    readable
  • SGML - limited tool base (no browsers)
  • X12 EDI messages - positional, idiosyncratic
  • comma delimited files - positional, error prone
  • STEP Part 21 files - product data, narrow tool
    base
  • domain specific formats - ad hoc, small tool base

9
XML Data Model
  • Eclectic (SGML Legacy)
  • Hybrid hierarchical / network
  • hierarchical: tree of nested elements
  • network: IDREFs, XLinks
  • Often described via graph data model
  • Can encode OO, relational, & other models
  • but may need to query higher level data models
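
A minimal sketch of the hybrid model described on this slide, assuming a made-up document: the element names, the `id` and `lead` attributes, and all values are hypothetical. Nesting gives the hierarchical part; the `lead` attribute plays the role of an IDREF and gives the network part. Python's `xml.etree.ElementTree` does not validate ID/IDREF declarations, so the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; names and values are invented for illustration.
DOC = """
<lab>
  <staff>
    <person id="p1"><name>A. Researcher</name></person>
    <person id="p2"><name>B. Scientist</name></person>
  </staff>
  <projects>
    <project title="Metadata registry" lead="p2"/>
    <project title="Data archive" lead="p1"/>
  </projects>
</lab>
"""

root = ET.fromstring(DOC)

# Index the elements that carry an identifier (the targets of references).
people_by_id = {p.get("id"): p for p in root.iter("person")}


def project_leads(tree_root):
    """Follow each project's 'lead' reference to the person it points at."""
    result = {}
    for project in tree_root.iter("project"):
        person = people_by_id[project.get("lead")]
        result[project.get("title")] = person.findtext("name")
    return result


leads = project_leads(root)
print(leads)
```

Resolving the references by hand is the kind of work a query over a higher-level data model would hide from the user.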

10
XML for DOE EDI / Ecommerce
  • CME (FTPA proposal database)
  • But inconsistent budget account structures, etc?
  • Procurement
  • Travel
  • Human resources (job postings, resumes)
  • Financial reporting to DOE
  • Enabling technology for workflow management

11
XML for Scientific Data Exchange
  • Applications
  • Astronomy
  • Biosequence ML
  • ChemML (chemical compound markup)
  • XML is verbose - unsuited for very large
    databases
  • XML Schema language currently lacks vectors,
    arrays, complex numbers, measurement units, ...

12
XML for Scientific Data Archiving
  • Similar requirements to scientific data exchange
  • Preserve data and meaning over time
  • Requires preservation of data and programs to
    decode / access data
  • Attractions of XML - human readable, long lived
    standard, many tools
  • e.g., San Diego Xarchive
  • LLNL ?, LANL ?

13
XML for Engineering Data Exchange
  • Mechanical / architectural CAD data exchange
  • XML encoded STEP Part 21 files Part 22
  • aecXML (for architecture / construction)
  • Major application arena for DOE
  • EPRI / NERC CCAPI - power systems config.

14
XML for Scientific Publishing
  • Single source document for multiple media outputs
  • Much more flexible & powerful than HTML for print
  • MathML for math, ChemML for chemistry
  • MathML, ChemML tools recently available

15
XML for RPC / Object Serialization
  • RPC = remote procedure call
  • used in place of XDR, IIOP, ...
  • used for Java object serialization (comm/storage)
  • more verbose, human readable
  • many possible transport protocols (HTTP, email, ...)
  • examples: XML-RPC, WebMethods WIDL
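
As a rough illustration of the "more verbose, human readable" trade-off, the sketch below serializes one call with Python's standard `xmlrpc.client` module and compares it with a fixed binary layout built with `struct`. The method name `lab.convert`, the arguments, and the binary layout are assumptions made for the example; the binary layout is only a stand-in for formats such as XDR, not an implementation of any of them.

```python
import struct
import xmlrpc.client

# Hypothetical call: the method name and arguments are invented.
args = (2, 3.5, "meters")
request_xml = xmlrpc.client.dumps(args, methodname="lab.convert")
print(request_xml)  # readable text: <methodCall>, <params>, typed <value>s

# The receiver recovers the arguments and the method name from the text.
params, method = xmlrpc.client.loads(request_xml)

# Stand-in binary encoding of the same three values:
# big-endian 4-byte int, 8-byte double, 6 raw bytes.
binary = struct.pack(">id6s", 2, 3.5, b"meters")

print(len(request_xml.encode("utf-8")), "bytes as XML")
print(len(binary), "bytes as packed binary")
```

The XML message is self-describing and can travel over any text-capable transport; the packed form is smaller but unreadable without the layout string.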

16
XML for Metadata Exchange
  • XMI - UML model interchange (DB schemas, ...)
  • RDF (Resource Description Framework)
  • Content Standards (Dublin Core, GILS, FGDC, ...)

17
XML for Information Discovery, Query &
Integration
  • Semantic markup permits more precise retrieval
  • Content standards: Dublin Core, FGDC, MARC
  • Energy abstracts, molecular biology annotations /
    abstracts
  • XML Query Language being standardized
  • Use of XML for mediators, data integration
  • SDSC, Stanford, AT&T, U of Washington, ...
  • XML can transform WWW to World Wide Database
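
The claim that semantic markup permits more precise retrieval can be sketched as follows. The records are invented, and the element names `creator` and `title` are borrowed loosely from Dublin Core without namespaces, as an assumption. A plain text search for a name matches every record that mentions it; a query scoped to the `creator` element matches only records where the name is the author.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliographic records; content is invented.
RECORDS = """
<records>
  <record><creator>Curie</creator><title>Radioactive substances</title></record>
  <record><creator>Smith</creator><title>A biography of Curie</title></record>
</records>
"""

root = ET.fromstring(RECORDS)

# Unstructured search: the word appears somewhere in the record's text.
text_hits = [
    r.findtext("title")
    for r in root.findall("record")
    if "Curie" in "".join(r.itertext())
]

# Markup-aware search: the word is the content of the <creator> element.
creator_hits = [
    r.findtext("title") for r in root.findall("record[creator='Curie']")
]

print(text_hits)
print(creator_hits)
```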

18
XML Structural Standards Needed
  • XML (done)
  • XML Schema Language (1Q 2000)
  • XML Query Language (4Q 2000?)
  • XMI (V1.0 done)
  • Process (workflow) & Protocol modeling
  • Petri Nets
  • Coupled Finite State Automata

19
XML Content Standards Needed
  • XML Schemas
  • Domain Data Models (schemas)
  • e.g., workflow, ...
  • Message Content Specifications
  • views over domain data models
  • XML/EDI, HL7
  • Standard code sets, vocabularies
  • Accounting Standards (e.g., overhead definitions)

20
Standard code sets, vocabularies
  • Element / Subatomic particle names/symbols
  • Chemical names
  • Geographic place names, codes
  • SNOMED (std. nomenclature for medicine)
  • LOINC (std. clinical lab test names, ...)
  • X12 message codes

21
Measurement Units Stds. Needed
  • Conventions for specification of
  • Measurement units (meters, feet, kilograms)
  • Dimensionality (length, mass, time)
  • Automatic inference/checking of
    dimensionality/units consistency in query
    expressions
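
One way such checking could work, as a sketch only: each unit carries a conversion factor to a base unit and a tuple of dimension exponents (length, mass, time). Addition demands matching dimensions; division subtracts exponents. The `Quantity` class and the `UNITS` table are hypothetical and are not drawn from any standard named in this talk.

```python
from dataclasses import dataclass

# unit name -> (factor to SI base unit, exponents of (length, mass, time))
UNITS = {
    "meter": (1.0, (1, 0, 0)),
    "foot": (0.3048, (1, 0, 0)),
    "kilogram": (1.0, (0, 1, 0)),
    "second": (1.0, (0, 0, 1)),
}


@dataclass(frozen=True)
class Quantity:
    value: float  # magnitude expressed in SI base units
    dim: tuple    # exponents of (length, mass, time)

    @classmethod
    def of(cls, value, unit):
        factor, dim = UNITS[unit]
        return cls(value * factor, dim)

    def __add__(self, other):
        # Adding is only meaningful when the dimensions agree.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} vs {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __truediv__(self, other):
        # Dividing quantities subtracts dimension exponents.
        dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, dim)


total = Quantity.of(1, "meter") + Quantity.of(10, "foot")  # both are lengths
speed = total / Quantity.of(2, "second")                   # length / time

try:
    Quantity.of(1, "meter") + Quantity.of(1, "kilogram")
    mismatch_caught = False
except TypeError:
    mismatch_caught = True

print(total, speed, mismatch_caught)
```

A query processor could apply the same rule statically to expressions before running them, rejecting, for example, a comparison between a length column and a mass column.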

22
Repositories Needed
  • Thesauri, ontologies, data element registries, ...
  • Means of sharing code sets, vocabularies
  • Means of sharing schemas, message formats
  • Implies shared metamodel
  • Requires access interface (APIs, query language)

23
Data Element Registry Activities
  • ISO 11179, ANSI X3.285
  • EPA Environmental Data Registry
  • HCFA
  • DOD Healthcare
  • OASIS (Ecommerce)
  • Australian Healthcare

24
Digital Signatures Needed
  • Essential if digital documents are to replace
    paper
  • Need digital signature standards (several exist)
  • W3C Digital Signature Working Group
  • Need certificate authorities (coming slowly)
  • May need smart cards (slow in US)
  • Need legal acceptance (most states, pending in
    Congress)

25
Current DOE Standards Infrastructure Efforts
  • Data exchange formats: HDF5 for ASCI DMF
  • Data models: Fiber bundle for ASCI DMF
  • Infrastructure Development
  • digital signature certificate authorities
  • Repositories ???
  • Standards development
  • DOE is AWOL

26
You are invited to participate in the Open Forum
on Metadata Registries
  • January 17-21, 2000
  • Santa Fe, New Mexico
  • Sponsored by ISO SC32 WG2, JTC1, EPA
  • Presentations of ISO 11179, X3.285, XMI,
    implementations, ...
  • Disciplinary tracks: environmental, healthcare,
    ...
  • http://www.sdct.itl.nist.gov/ftp/l8/sc32wg2/2000/events/openforum/index.htm

27
Conclusions
  • XML could become strategic IT for DOE
  • Facilitates information sharing
  • Needs complementary standards
  • Requires more infrastructure
  • DOE needs to get its act together.

28
Acknowledgements
  • Paid for by U.S. Environmental Protection Agency
    Superfund Office
  • No DOE funding yet.

29
Contact Information
  • Frank Olken, LBNL
  • Mailstop 50B-3238
  • tel. 510-486-5891
  • olken_at_lbl.gov
  • http://www.lbl.gov/olken
  • John McCarthy, LBNL
  • Mailstop 50C
  • tel. 510-486-5307
  • jlmccarthy_at_lbl.gov
  • http://www.lbl.gov/mccarthy