Title: Universal Numeric Fingerprints: A Method for Scientific Data Verification
- Micah Altman, Senior Research Scientist
- Harvard University
Contents
- General Algorithm
- Algorithmic Details
- Applications
- References
General Algorithm
- Format and platform independent data fingerprint.
- Same UNF regardless of hardware, operating system, file format, or application software.
- UNF generation stages:
- - approximation (desiccation)
- - normalization (canonicalization)
- - fingerprinting (cryptographic hash)
- - representation (printable encoding)
Algorithmic Details (Number Values)
- Approximation: round to k significant digits
- - Scale value so the kth digit is to the left of the decimal point
- - Round using IEEE round-to-nearest mode
- - Rescale value to original magnitude
- Convert to string in canonical representation
- - A sign character in {+, -}
- - A single leading digit
- - A decimal point, represented by a period character (.)
- - Up to k-1 digits following the decimal point, comprising the remaining k-1 digits of the number, omitting trailing zeros
- - A lowercase e
- - A sign character
- - The digits of the exponent, omitting trailing zeros
- - Specified representation for special values: missing, nan, inf, -inf
- - Termination with POSIX EOL character
- Serialization as UTF-8
- Fingerprint using SHA-256
- Presentation as string
- - Leading identification: UNF:version:option string
- - Trailing fingerprint value: truncated hash, base-64 encoded, big-endian order
- Other specified canonical formats for character strings, dates, times, durations, bitfields, booleans
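The numeric steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation. The function names `canonical_number` and `fingerprint`, the 128-bit truncation length, and the `sketch` and `N7` header fields are all hypothetical. Special values are not handled, and the exponent is written without zero padding (so an exponent of 0 contributes no digits), which is one reading of the exponent rule. The resulting strings should not be expected to match UNFs produced by released UNF software.

```python
import base64
import hashlib
import math


def canonical_number(x, k=7):
    """Canonical string for one finite number at k significant digits."""
    x = float(x)
    if not math.isfinite(x):
        # Special values have their own specified encodings, not covered here.
        raise ValueError("special values are outside this sketch")
    # Python's 'e' format rounds to nearest and yields k significant digits,
    # e.g. "+7.300000e-04" for 0.00073 with k=7.
    mantissa, exponent = f"{x:+.{k - 1}e}".split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0")  # "+1.000000" -> "+1."
    else:
        mantissa += "."                  # k == 1 gives no decimal point
    # Assumption: exponent digits carry no zero padding; exponent 0 -> no digits.
    return f"{mantissa}e{exponent[0]}{exponent[1:].lstrip('0')}"


def fingerprint(values, k=7, bits=128, version="sketch"):
    """Hash the canonical strings of a vector and return a printable string."""
    # One canonical string per value, each terminated by a newline, as UTF-8.
    data = "".join(canonical_number(v, k) + "\n" for v in values).encode("utf-8")
    # SHA-256, truncated to the leading `bits` bits (hypothetical default).
    digest = hashlib.sha256(data).digest()[: bits // 8]
    encoded = base64.b64encode(digest).decode("ascii")
    # Hypothetical header layout: UNF:<version>:<options>:<hash>
    return f"UNF:{version}:N{k}:{encoded}"


print(canonical_number(1.0))      # +1.e+
print(canonical_number(-300))     # -3.e+2
print(canonical_number(0.00073))  # +7.3e-4
print(fingerprint([1.23456789, 2]) == fingerprint([1.2345679, 2.0]))  # True
```

The last line shows the point of the approximation stage: two vectors that differ only beyond the seventh significant digit yield the same fingerprint.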
Applications
- Object identification: a UNF uniquely identifies an object based solely on its content.
- Citation/verification: citations that include a UNF can later be used to verify that the data cited has not been altered.
- Reformatting/input checking: validate a format conversion (e.g. for digital preservation) or a data loading process (e.g. for statistical software) by calculating UNFs before and after.
Example R session:
> library(UNF)
> v = 1:100/10 + 0.0111
> print(unf(v, ndigits = 7))
[1] "UNF:4:7,128:6kK46s059g5dswiRGBM7yVvo3gwyBVvuBzioK/df72o"
> summary(unf(longley))
[1] "UNF:4:7zq5Q8/mP7z3m2EmwoOJndVM8flQmmbuHvvqDK910E"
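The reformatting/input check can also be illustrated outside R. The Python sketch below compares fingerprints before and after two conversions. `simple_fingerprint` is a hypothetical stand-in that hashes values rounded to 7 significant digits; it does not produce real UNFs, and the sample data and conversion routes are invented for illustration.

```python
import csv
import hashlib
import io


def simple_fingerprint(values, k=7):
    # Stand-in fingerprint: round each value to k significant digits,
    # write one value per line, and hash the UTF-8 text with SHA-256.
    text = "".join(f"{float(v):+.{k - 1}e}\n" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = [1 / 3, 2.5, 1234.5678, -3.0e-4]

# Route 1: export to CSV with 10 significant digits, then reload.
buffer = io.StringIO()
csv.writer(buffer).writerow([f"{v:.10g}" for v in original])
from_csv = [float(cell) for cell in next(csv.reader(io.StringIO(buffer.getvalue())))]

# Route 2: a faulty conversion that keeps only 3 decimal places.
lossy = [float(f"{v:.3f}") for v in original]

before = simple_fingerprint(original)
print(from_csv == original)                  # False: 1/3 changed in the last bits
print(simple_fingerprint(from_csv) == before)  # True: same to 7 significant digits
print(simple_fingerprint(lossy) == before)     # False: precision was lost
```

The CSV round trip changes the stored bits of 1/3 yet leaves the fingerprint intact, while the conversion that drops precision is flagged by a mismatch.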
References
- Software home
- - http://purl.oclc.org/NET/UNF_PROJECT_WEBSITE
- Original Algorithm
- - M. Altman, J. Gill, M. McDonald (2003), Numerical Issues in Statistical Computing for the Social Scientist, John Wiley & Sons.
- Use in Citation Standards, Digital Libraries, and Preservation
- - M. Altman, G. King (2007), A Proposed Standard for the Scholarly Citation of Data, D-Lib Magazine 13(3/4).
- - G. King (2007), An Introduction to the Dataverse Network as an Infrastructure for Data Sharing, Sociological Methods and Research. Forthcoming 2007.
- - M. Altman, J. Crabtree, D. Donakowski, M. Maynard (2007), Data Preservation Alliance for the Social Sciences: A Model for Collaboration. Paper presented at DigCCurr 2007, Chapel Hill, N.C.