Title: What XML Schema Designers Need to Know About Measurement Units
1 What XML Schema Designers Need to Know About Measurement Units
- Frank Olken and John McCarthy
- Lawrence Berkeley National Laboratory
- Presented to XTECH 2000, San Jose, CA
- February 29, 2000
2 Content of Talk
- Syntax issues: markup
- Semantics: units, dimensionality
- Architecture: references to shared registries
- Units registry operations: who?
3 What is the issue?
- XML is used for data exchange
- Often used to encode measured quantities
- e.g., for e-commerce, engineering, medicine
- How do we encode measurement units?
4 Why do we need measurement units?
- Quantities without units are meaningless!
- Misunderstandings about units cause:
- loss of spacecraft (Mars Climate Orbiter)
- contractual disputes
- potential loss of life (in medicine)
- Common error:
- Delusions of shared assumptions about measurement units
5 Actually two issues
- What do we need to specify?
- Units
- Dimensionality
- Property?
- How do we write it in XML?
6 XML encodings of units
- Implicitly structured strings
    <height> 5 inches </height>
- bad style (requires special parser)
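A minimal Python sketch of what the "special parser" for an implicitly structured string involves. The function name parse_quantity and the regular expression are illustrative assumptions; a real units grammar would have to handle far more cases.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: a number, whitespace, then a free-form units string.
_QUANTITY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s+(\S.*?)\s*$")

def parse_quantity(text):
    """Split a string such as '5 inches' into a number and a units string."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"cannot parse quantity: {text!r}")
    return float(match.group(1)), match.group(2)

# The XML parser only delivers the raw text; the rest is custom code.
element = ET.fromstring("<height> 5 inches </height>")
value, units = parse_quantity(element.text)
```

Every consumer of such documents would need an equivalent of parse_quantity, which is the objection raised on this slide.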
7 XML encodings of units
- Explicit markup is better, e.g.,
    <speed>
      <value> 5 </value>
      <units> km/hr </units>
    </speed>
- advantage: easy to identify units info
- problem: parsing units values still requires a special-purpose parser
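As a rough illustration in Python, a standard XML parser is enough to separate the value from the units once they are marked up explicitly, but the units string itself stays opaque. The variable names are assumptions, and splitting on "/" is a stand-in for a real units parser.

```python
import xml.etree.ElementTree as ET

doc = """
<speed>
  <value> 5 </value>
  <units> km/hr </units>
</speed>
"""

speed = ET.fromstring(doc)
value = float(speed.findtext("value"))    # no custom parsing needed here
units = speed.findtext("units").strip()   # easy to locate the units info

# The content 'km/hr' is still an implicitly structured string;
# this naive split is only a placeholder for a special-purpose parser.
numerator, denominator = units.split("/")
```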
8 XML encodings of units
- Explicit markup with namespaces/URIs
    <x xmlns:isoUnits="http://www.iso.org/units">
      <speed>
        <value> 5 </value>
        <units> <xlink type="simple" REF="isoUnits:kmPerHour" /> </units>
      </speed>
    </x>
- advantages:
- exploits existing lookup mechanism
- standard units designators, mechanism for checking dimensional consistency
- disadvantage:
- nonstandard units names (XML prohibits slashes in IDs)
9 Architecture of XML Units Encoding
- Use URI references from schemas/instances into standard units registries
- Units/dimensionality registries have detailed, fully marked up descriptions of derived/composite units/dimensions in terms of basis units/dimensions
10 Basis units declaration (XML)
    <unitdecl ID="meter">
      <name> meter </name>
      <UnitType> Base </UnitType>
      <symbol> m </symbol>
      <dimensionality> <A xml:link="simple" REF="length" /> </dimensionality>
      <definition>
        Length of the path traveled by light in vacuum in a time interval of 1/299,792,458 second.
      </definition>
      <cite> <A xml:link="simple" ref=".." /> </cite>
    </unitdecl>
11 Derived units declaration (XML)
    <unitdecl ID="inch">
      <name> Inch </name>
      <symbol> in </symbol>
      <UnitType> Derived </UnitType>
      <dimensionality> <A xml:link="simple" REF="length" /> </dimensionality>
      <conversionfactor> 0.0254 </conversionfactor>
      <units REF="meter" />
    </unitdecl>
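A small Python sketch of how an application might use such a derived-unit declaration. The function name to_base is hypothetical, and the declaration is trimmed to the elements the sketch reads.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the inch declaration; only the elements used below are kept.
INCH_DECL = """
<unitdecl ID="inch">
  <name> Inch </name>
  <symbol> in </symbol>
  <UnitType> Derived </UnitType>
  <conversionfactor> 0.0254 </conversionfactor>
  <units REF="meter" />
</unitdecl>
"""

def to_base(value, decl_xml):
    """Return the value in base units together with the base unit's ID."""
    decl = ET.fromstring(decl_xml)
    factor = float(decl.findtext("conversionfactor"))
    base = decl.find("units").get("REF")
    return value * factor, base

meters, base_unit = to_base(12.0, INCH_DECL)
```

Here 12 inches comes back as roughly 0.3048 in the unit referenced by REF, without any unit-specific code in the application.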
12 Composite units declaration (XML)
    <unitdecl ID="metersPerSecond">
      <name> MetersPerSecond </name>
      <UnitType> Composite </UnitType>
      <dimensionality> <A xml:link="simple" REF="DimSpeed" /> </dimensionality>
      <units>
        <numerator>
          <unit>
            <radix> <A xml:link="simple" REF="meter" /> </radix>
            <exponent> 1 </exponent>
          </unit>
        </numerator>
        <denominator>
          <unit>
            <radix> <A xml:link="simple" REF="second" /> </radix>
            <exponent> 1 </exponent>
          </unit>
        </denominator>
      </units>
    </unitdecl>
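The point of the fully marked up form is that a generic XML parser can recover the structure of a composite unit. The Python sketch below assumes a trimmed declaration (link attributes and descriptive elements omitted) and a hypothetical helper unit_exponents.

```python
import xml.etree.ElementTree as ET

# Trimmed composite declaration: numerator and denominator of base units.
DECL = """
<unitdecl ID="metersPerSecond">
  <units>
    <numerator>
      <unit> <radix> <A REF="meter" /> </radix> <exponent> 1 </exponent> </unit>
    </numerator>
    <denominator>
      <unit> <radix> <A REF="second" /> </radix> <exponent> 1 </exponent> </unit>
    </denominator>
  </units>
</unitdecl>
"""

def unit_exponents(decl_xml):
    """Map each base unit ID to a signed exponent (denominator is negative)."""
    units = ET.fromstring(decl_xml).find("units")
    exponents = {}
    for part, sign in (("numerator", 1), ("denominator", -1)):
        for unit in units.findall(f"{part}/unit"):
            ref = unit.find("radix/A").get("REF")
            power = sign * int(unit.findtext("exponent"))
            exponents[ref] = exponents.get(ref, 0) + power
    return exponents

exponents = unit_exponents(DECL)
```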
13 Units vs. Dimensionality
- Units
- representational issue
- e.g., feet, meters, centimeters
- Dimensionality
- semantic concept
- e.g., mass, length, time
- speed = length / time
14 Units not sufficient, need dimensionality
- Example: tons
- unit of mass
- unit of power (refrigeration)
- unit of energy (megatons)
- These are all homonyms which need to be distinguished
- Hence we also need to specify dimensionality
15 Dimensional consistency
- Units with the same dimensionality are said to be dimensionally consistent
- Examples:
- Dimensionality: time
- units: seconds, minutes, hours, days
- Dimensionality: length
- units: feet, inches, meters, kilometers
- Dimensionality: length / time
- units: kilometers/hour, meters/second, miles/hour
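The examples above can be sketched as a lookup in Python. The table DIMENSIONALITY is a hypothetical stand-in for a units registry, filled only with the units named on this slide.

```python
# Hypothetical registry: unit name -> dimensionality, from the examples above.
DIMENSIONALITY = {
    "seconds": "time", "minutes": "time", "hours": "time", "days": "time",
    "feet": "length", "inches": "length",
    "meters": "length", "kilometers": "length",
    "kilometers/hour": "length/time",
    "meters/second": "length/time",
    "miles/hour": "length/time",
}

def consistent(unit_a, unit_b):
    """Two units are dimensionally consistent if their dimensionality matches."""
    return DIMENSIONALITY[unit_a] == DIMENSIONALITY[unit_b]
```

For instance, consistent("feet", "meters") holds, while consistent("feet", "hours") does not.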
16 Significance of dimensional consistency
- Usually IMPLIES
- Comparability
- Additivity
- Automated unit conversion
- Some exceptions to these rules
- (see below)
17 Classic representation of dimensionality
- Product of powers of basis dimensions
- Basis dimensions:
- mass, length, time, amount of substance (moles), current, luminous intensity, temperature
- Exponents:
- integers, range from -4 to 4
- Example:
- energy = Mass * Length^2 * Time^(-2)
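One way to sketch the classic representation in Python is a vector of integer exponents, one per basis dimension. The class name Dim and its methods times and power are illustrative assumptions, not an established API.

```python
from typing import NamedTuple

class Dim(NamedTuple):
    """Exponents of the seven basis dimensions; all default to zero."""
    mass: int = 0
    length: int = 0
    time: int = 0
    amount: int = 0
    current: int = 0
    luminous_intensity: int = 0
    temperature: int = 0

    def times(self, other):
        # Multiplying quantities adds the exponents of their dimensions.
        return Dim(*(a + b for a, b in zip(self, other)))

    def power(self, n):
        # Raising to a power multiplies every exponent by n.
        return Dim(*(a * n for a in self))

MASS = Dim(mass=1)
LENGTH = Dim(length=1)
TIME = Dim(time=1)

# energy = Mass * Length^2 * Time^(-2)
ENERGY = MASS.times(LENGTH.power(2)).times(TIME.power(-2))
```

Dimensional consistency then reduces to comparing two such tuples for equality.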
18 Dimensionless quantities: concentration
- mass/mass, moles/moles, or volume/volume
- Classic theory says these are dimensionally consistent, hence comparable and additive
- NO!
- Need to distinguish dimensionless mass, molar, and volumetric ratios
19 Revised dimensionality
- Numerator:
- product of non-negative powers of basis dimensions
- Denominator:
- product of non-negative powers of basis dimensions
- Can distinguish:
- mass/mass vs. moles/moles
- Problem: breaks the dimensional algebra
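A hedged Python sketch of the difference between the two representations. The helper names ratio and classic are assumptions made for illustration: ratio keeps numerator and denominator apart, while classic cancels common factors.

```python
from collections import Counter

def ratio(numerator, denominator):
    """Revised form: keep numerator and denominator as separate products."""
    return (frozenset(Counter(numerator).items()),
            frozenset(Counter(denominator).items()))

def classic(numerator, denominator):
    """Classic form: subtract exponents, so equal factors cancel."""
    net = Counter(numerator)
    net.subtract(Counter(denominator))
    return {dim: power for dim, power in net.items() if power != 0}

mass_fraction = ratio(["mass"], ["mass"])
mole_fraction = ratio(["amount"], ["amount"])
```

In the classic form both ratios cancel to the same empty product, so they look comparable. The revised form keeps them distinct, at the cost that numerator and denominator no longer cancel, which is the break in the algebra noted above.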
20 When to specify units/dimensionality?
- Specify in schema? Or in instance?
- Preferred:
- specify both dimensionality and units in schema
- homogeneous units in document instance produce fewer errors
- easier to search
21 Second best choice
- Specify dimensionality in schema
- Specify units in document instance
- e.g., per element instance
- e.g., dimensionality: length (in schema)
- units: feet or meters (in instance)
- Advantage: can check for plausible units
22 Worst Choice
- Specify both units and dimensionality in document instances
- Checking of dimensions is impossible
- Can check for units consistent with dimensionality
- Necessary for heterogeneous catalogs ...
23 Ideal XML Encoding of Units and Dimensionalities
- Extend XML Schema Basic Datatypes
- Add facets on types to encode units and dimensionality
- Hence, schema/query processor can check units and dimensionality
- Problems: huge type lattice, complex type checking / unit conversion
24 XML Encoding Considerations
- Use detailed markup to specify units and dimensionality
- This simplifies design of processor for checking units compatibility and automatic units conversion
- Result is very verbose (see above)
25 Practical XML Encoding Solution
- Store detailed units/dimensionality info at well-known site
- Use URI reference to point to full units/dimensionality specification
- Use namespaces to shorten URI reference in instances
- Need canonical encodings of units for URI references
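A minimal Python sketch of how a namespace prefix shortens a registry reference. The function expand_reference is hypothetical, and joining the namespace URI and the local name with "#" is an assumption; the talk does not say how registry entries would be addressed.

```python
def expand_reference(ref, namespaces):
    """Expand a 'prefix:local' reference into a full URI."""
    prefix, _, local = ref.partition(":")
    if not local or prefix not in namespaces:
        raise KeyError(f"undeclared prefix in reference: {ref!r}")
    # Assumption: registry entries are addressed as fragment identifiers.
    return f"{namespaces[prefix]}#{local}"

# Prefix and URI taken from the earlier isoUnits example.
namespaces = {"isoUnits": "http://www.iso.org/units"}
uri = expand_reference("isoUnits:kmPerHour", namespaces)
```

The short form only works if every party agrees on the canonical local name (here kmPerHour), which is why canonical encodings are needed.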
26 Corollaries
- Someone needs to maintain units/dimensionality repositories
- Separate applications to check units compatibility are needed
- Can automate units conversion
- Implies standard XML query language will not check units
27 Basis Dimension Declaration (XML)
    <dimensiondecl ID="length">
      <name> length </name>
      <DimensionType> Base </DimensionType>
      <definition>
        A measurement of distance.
      </definition>
      <cite> <A xml:link="simple" REF="...." /> </cite>
      <exampleUnits>
        Meters, ....
      </exampleUnits>
    </dimensiondecl>
28 Composite Dimensionality Declaration (XML)
    <dimensiondecl ID="DimSpeed">
      <name> Speed </name>
      <DimensionType> Composite </DimensionType>
      <dimensionality>
        <numerator>
          <dimension>
            <radix> <A xml:link="simple" REF="length" /> </radix>
            <exponent> 1 </exponent>
          </dimension>
        </numerator>
        <denominator>
          <dimension>
            <radix> <A xml:link="simple" REF="time" /> </radix>
            <exponent> 1 </exponent>
          </dimension>
        </denominator>
      </dimensionality>
    </dimensiondecl>
29 Who will maintain the XML units repository?
- W3C? No, lacks units expertise
- OASIS (xml.org)?
- Intl. scientific societies: IUPAC? ICSU?
- NIST?
- Engineering societies: IEEE? ASME?
- ASTM?
- American Physical Society? American Chemical Society?
- UN/CEFACT (ebXML.org)?
- ISO? IEC?
30 Measures vs. Coordinates
- Measures
- length
- temperature difference
- time
- subtended angle
- Coordinates
- position
- absolute temperature
- datetimestamp
- latitude/longitude
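Temperature shows why the distinction matters for conversion. The Python sketch below reads "temperature difference" as a measure and a thermometer reading as a coordinate; the function names are illustrative, and the choice of Fahrenheit and Celsius is an assumption made for the example.

```python
def fahrenheit_difference_to_celsius(delta_f):
    """A temperature difference is a measure: only the scale factor applies."""
    return delta_f * 5 / 9

def fahrenheit_reading_to_celsius(reading_f):
    """A temperature reading is a coordinate: the origin must be shifted too."""
    return (reading_f - 32) * 5 / 9
```

Applying the coordinate conversion to a difference (or the reverse) gives a wrong answer even though both values carry the same unit name.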
31 Automatic unit conversion
- Only for dimensionally consistent units
- Convert to/from canonical (SI) units
- hence O(n) conversion factors, vs. O(n^2) conversion factors for direct pairwise conversion
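Conversion through a canonical unit can be sketched in Python as follows. The table TO_METERS and the function convert are illustrative names; the table holds one factor per length unit.

```python
# One factor per unit: how many meters (the canonical SI unit) it contains.
TO_METERS = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "inch": 0.0254,
    "foot": 0.3048,
    "mile": 1609.344,
}

def convert(value, from_unit, to_unit, factors=TO_METERS):
    """Convert by going through the canonical unit: from -> SI -> to."""
    return value * factors[from_unit] / factors[to_unit]

kilometers = convert(1.0, "mile", "kilometer")
```

Five units need five table entries here, whereas a table of direct conversions would need one entry for each of the 20 ordered pairs.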
32 Dimensionally inconsistent unit conversions
- Example: mass to/from volume
- wheat in bushels or in tons
- oil in barrels or tons
- Very common in commerce
- Requires knowledge of material density
- Should be done explicitly (user/application)
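A short Python sketch of doing such a conversion explicitly: the density is a required argument rather than something a units library guesses. The function name and the density value in the usage line are assumptions chosen for illustration, not reference data.

```python
def volume_to_mass(volume_m3, density_kg_per_m3):
    """Mass in kilograms from a volume in cubic meters.

    The caller must supply the material density explicitly.
    """
    if density_kg_per_m3 <= 0:
        raise ValueError("density must be positive")
    return volume_m3 * density_kg_per_m3

# Illustrative density only; real values depend on the material and conditions.
mass_kg = volume_to_mass(2.0, 850.0)
```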
33 Additional complexities
- Dimensionality is not sufficient to specify type lattice
- Example:
- torque and work (energy)
- same dimensionality
- Mass * Length^2 / Time^2
- but these are incommensurate
- torque is a cross product, work is a dot product
34 Implications
- Need to subtype dimensionality into differing properties
- May need rules on comparability, additivity of properties (subtypes) of common dimensionality
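One possible shape for such a rule, sketched in Python: a quantity carries a property tag alongside its dimensionality, and addition is refused when the tags differ. The class Quantity and the field name prop are hypothetical, and the rule shown (properties must match exactly) is only the simplest candidate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    value: float
    dimensionality: str  # e.g. "mass*length^2/time^2"
    prop: str            # hypothetical property subtype: "work", "torque", ...

    def __add__(self, other):
        # Same dimensionality is not enough; the property must match too.
        same = (self.dimensionality, self.prop) == (other.dimensionality, other.prop)
        if not same:
            raise TypeError(f"cannot add {self.prop} to {other.prop}")
        return Quantity(self.value + other.value, self.dimensionality, self.prop)

ENERGY_DIM = "mass*length^2/time^2"
work = Quantity(3.0, ENERGY_DIM, "work")
torque = Quantity(4.0, ENERGY_DIM, "torque")
total_work = work + work
```

With this rule, work + work succeeds, while work + torque raises TypeError despite the shared dimensionality.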
35 Conclusions
- Must specify both units and dimensionality
- either in schema or instances
- Use product of basis dimensions (units)
- Use URI references to detailed specs
- Dimensional analysis theory is incomplete
- Custodian organization for units repository?
36 Acknowledgements
- This work supported by U.S. Environmental Protection Agency, Superfund Office
- Program manager: Bruce Bargmeyer
- Also thank Peter Murray-Rust, Malcolm Panthaki, Max Sherman, for discussions, suggestions, etc.
37 Contact Information
- Frank Olken, olken_at_lbl.gov, http://pueblo.lbl.gov/olken, Lawrence Berkeley National Lab, Mailstop 50B-3238, 1 Cyclotron Road, Berkeley, CA 94720, Tel: 510-486-5891, Pager: 510-442-7361
- John L. McCarthy, jlmccarthy_at_lbl.gov, http://www.lbl.gov/mccarthy, Lawrence Berkeley National Lab, Mailstop 50C, 1 Cyclotron Road, Berkeley, CA 94720, Tel: 510-486-5307