Universal Numeric Fingerprints: A Method for Scientific Data Verification

1
Universal Numeric Fingerprints: A Method for
Scientific Data Verification
  • Micah Altman, Senior Research Scientist
  • Harvard University

2
Contents
  • General Algorithm
  • Algorithmic Details
  • Applications
  • References

3
General Algorithm
  • Format- and platform-independent data fingerprint.
  • Same UNF regardless of hardware, operating
    system, file format, or application software.
  • UNF generation stages:
  • - approximation (desiccation)
  • - normalization (canonicalization)
  • - fingerprinting (cryptographic hash)
  • - representation (printable encoding)
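To see why approximation and normalization come before the hash, consider the Python sketch below (an illustration, not part of the UNF specification). Hashing the raw bytes of a double depends on the platform's byte order and on the last binary digit of a computation, whereas rounding to 7 significant digits and converting to text first gives the hash identical input. Here `str()` of a `decimal.Decimal` merely stands in for the canonical representation, and 7 digits is an assumed precision.

```python
import hashlib
import struct
from decimal import ROUND_HALF_EVEN, Context

a = 0.1 + 0.2  # value as computed by one program
b = 0.3        # the "same" value as parsed from a text file

# Hashing raw storage bytes depends on byte order and on the last binary digit.
raw_le = hashlib.sha256(struct.pack("<d", b)).hexdigest()  # little-endian double
raw_be = hashlib.sha256(struct.pack(">d", b)).hexdigest()  # big-endian double
print(raw_le == raw_be)  # False
print(a == b)            # False

# Approximating to 7 significant digits and normalizing to text removes both effects.
ctx = Context(prec=7, rounding=ROUND_HALF_EVEN)
text_a = str(ctx.create_decimal_from_float(a))
text_b = str(ctx.create_decimal_from_float(b))
print(text_a, text_b)  # 0.3000000 0.3000000

digest_a = hashlib.sha256(text_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(text_b.encode("utf-8")).hexdigest()
print(digest_a == digest_b)  # True
```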

4
Algorithmic Details (Number Values)
  • Approximation: round to k significant digits
  • - Scale the value so the kth significant digit is
    to the left of the decimal point
  • - Round using IEEE round-to-nearest mode
  • - Rescale the value to its original magnitude
  • Convert to a string in canonical representation:
  • - A sign character in {+, -}
  • - A single leading digit
  • - A decimal point, represented by a period
    character '.'
  • - Up to k-1 digits following the decimal point,
    consisting of the remaining k-1 digits of the
    number, omitting trailing zeros
  • - A lowercase 'e'
  • - A sign character
  • - The digits of the exponent, omitting leading
    zeros
  • Specified representations for the special values
    missing, nan, inf, and -inf
  • Termination with the POSIX end-of-line character
  • Serialization as UTF-8
  • Fingerprint using SHA-256
  • Presentation as a string:
  • - Leading identification: "UNF", version, and
    option string
  • - Trailing fingerprint value: truncated hash,
    base64-encoded, in big-endian order

Other canonical formats are specified for
character strings, dates, times, durations,
bit fields, and booleans.
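For number values, the steps on this slide might be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the UNF reference implementation. The helper names `round_sig`, `canonical`, and `fingerprint` are hypothetical. The scale and rescale steps use the standard `decimal` module so they are exact (the slide describes IEEE floating point), with `ROUND_HALF_EVEN` standing in for round-to-nearest. The exponent drops leading zeros, since dropping trailing ones would make 1e1 and 1e10 indistinguishable. Missing values are not handled, and the 128-bit truncation and `UNF:4:` header layout are assumptions, so the output is not expected to match the UNF package.

```python
import base64
import hashlib
from decimal import ROUND_HALF_EVEN, Decimal, localcontext


def round_sig(x: float, k: int = 7) -> Decimal:
    """Approximation: scale, round half to even, rescale (hypothetical helper)."""
    d = Decimal(x)  # exact decimal value of the binary double
    if not d.is_finite() or d.is_zero():
        return d  # nan, inf and zero pass through unchanged
    shift = k - 1 - d.adjusted()  # adjusted() = exponent of the leading digit
    with localcontext() as ctx:
        ctx.prec = 1000  # enough digits that scaleb never rounds a double
        scaled = d.scaleb(shift)  # kth significant digit now left of the point
        rounded = scaled.to_integral_value(rounding=ROUND_HALF_EVEN)
        return rounded.scaleb(-shift)  # back to the original magnitude


def canonical(d: Decimal) -> str:
    """Normalization: sign, digit, '.', digits, 'e', sign, exponent digits."""
    if d.is_nan():
        return "nan"
    if d.is_infinite():
        return "-inf" if d.is_signed() else "inf"
    _, digits, _ = d.as_tuple()
    mantissa = "".join(str(n) for n in digits).rstrip("0") or "0"
    exponent = 0 if d.is_zero() else d.adjusted()
    exp_digits = str(abs(exponent)).lstrip("0")  # drop leading zeros; 0 -> ""
    return (f"{'-' if d.is_signed() else '+'}{mantissa[0]}.{mantissa[1:]}"
            f"e{'-' if exponent < 0 else '+'}{exp_digits}")


def fingerprint(values: list[float], k: int = 7, bits: int = 128) -> str:
    """Serialize, hash with SHA-256, truncate, base64-encode, add a header."""
    lines = [canonical(round_sig(x, k)) + "\n" for x in values]  # POSIX EOL
    digest = hashlib.sha256("".join(lines).encode("utf-8")).digest()
    encoded = base64.b64encode(digest[: bits // 8]).decode("ascii")  # leading bytes
    return f"UNF:4:{encoded}"  # header layout is an assumption


print(canonical(round_sig(3.14159265)))  # +3.141593e+
print(canonical(round_sig(-0.0012, 2)))  # -1.2e-3
print(canonical(round_sig(100.0)))       # +1.e+2
print(fingerprint([3.14159265, -0.0012, 100.0]))
```

Passing the values through canonical text before hashing is what makes the result independent of byte order and storage format: any reader that recovers the same values to k significant digits feeds SHA-256 the same byte stream.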
5
Applications
  • Object identification: a UNF uniquely identifies
    an object based solely on its content.
  • Citation/verification: citations that include a
    UNF can later be used to verify that the cited
    data have not been altered.
  • Reformatting/input checking: validate a format
    conversion (e.g., for digital preservation) or a
    data-loading process (e.g., for statistical
    software) by calculating UNFs before and after.

> library(UNF)
> v = 1:100/10 + 0.0111
> print(unf(v, ndigits = 7))
[1] "UNF:4:7,1286kK46s059g5dswiRGBM7yVvo3gwyBVvuBzioK/df72o"
> summary(unf(longley))
[1] "UNF:4:7zq5Q8/mP7z3m2EmwoOJndVM8flQmmbuHvvqDK910E"
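The reformatting/input-checking application can be illustrated with a self-contained Python sketch that exports a column of doubles to CSV text and reads it back, then compares fingerprints computed before and after. `unf_sketch` is a hypothetical stand-in, not the UNF package: it rounds with `decimal.Context.create_decimal_from_float` (round half to even), handles finite values only, and assumes a 128-bit truncation and a `UNF:4:` header, so its strings will not match those produced by the R package.

```python
import base64
import csv
import hashlib
import io
from decimal import ROUND_HALF_EVEN, Context


def unf_sketch(values: list[float], k: int = 7) -> str:
    """Hypothetical stand-in for a UNF calculator (finite values only)."""
    ctx = Context(prec=k, rounding=ROUND_HALF_EVEN)
    lines = []
    for x in values:
        d = ctx.create_decimal_from_float(x)  # round to k significant digits
        _, digits, _ = d.as_tuple()
        m = "".join(map(str, digits)).rstrip("0") or "0"
        e = 0 if d.is_zero() else d.adjusted()
        sign, esign = ("-" if d.is_signed() else "+"), ("-" if e < 0 else "+")
        lines.append(f"{sign}{m[0]}.{m[1:]}e{esign}{str(abs(e)).lstrip('0')}\n")
    digest = hashlib.sha256("".join(lines).encode("utf-8")).digest()
    return "UNF:4:" + base64.b64encode(digest[:16]).decode("ascii")


# Simulate a format conversion: export a column to CSV text, then read it back.
original = [0.1 * i for i in range(1, 6)] + [1 / 3, 2.0e-8]
buffer = io.StringIO()
csv.writer(buffer).writerows([f"{x:.10g}"] for x in original)  # 10-digit export
buffer.seek(0)
reloaded = [float(row[0]) for row in csv.reader(buffer)]

print(reloaded == original)                          # False: some doubles changed
print(unf_sketch(reloaded) == unf_sketch(original))  # True: equal to 7 digits
```

Comparing raw values flags the 10-digit text round-trip as a change, while the fingerprints agree because every value survives to 7 significant digits; a conversion that changed any value's 7-digit rounding would produce a different fingerprint.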
6
References
  • Software home
  • http://purl.oclc.org/NET/UNF_PROJECT_WEBSITE
  • Original Algorithm
  • M. Altman, J. Gill, M. McDonald (2003),
    Numerical Issues in Statistical Computing for the
    Social Scientist, John Wiley & Sons.
  • Use in Citation Standards, Digital Libraries, and
    Preservation
  • M. Altman, G. King (2007), A Proposed Standard
    for the Scholarly Citation of Data, D-Lib Magazine 13(3/4).
  • G. King (2007), An Introduction to the
    Dataverse Network as an Infrastructure for Data
    Sharing, Sociological Methods and Research,
    forthcoming.
  • M. Altman, J. Crabtree, D. Donakowski, M.
    Maynard (2007), Data Preservation Alliance for the
    Social Sciences: A Model for Collaboration, paper
    presented at DigCCurr 2007, Chapel Hill, N.C.