Establishing a Mechanism for Maintaining File Integrity within the Data Archive


1
Establishing a Mechanism for Maintaining File
Integrity within the Data Archive
  • Thomas C. Stein, Edward A. Guinness, Susan H.
    Slavney
  • Planetary Data Systems Geosciences Node
  • Washington University
  • St. Louis, Missouri

2
Background
  • Must develop and maintain long-term (100 year)
    archive
  • For NASA's past, present, and future orbital and
    landed missions to Mars, Venus, and the Moon
  • Current repository
  • 15 missions
  • 143 data sets (half of them active in 2004)
  • 9 TB data products

3
Background

4
Background

5
Early Archives
  • Data sets archived at the end of a mission
  • Archive small enough to be published on CD-ROM
  • Copies sent to hundreds of science users

6
Current Archives
  • Data sets archived throughout active mission with
    releases at 3- to 6-month intervals
  • Active missions revise and redeliver data sets
    during their lifetime
  • Archive stored on SAN at Geosciences Node
  • Work with multiple active missions at one time
  • Active: Mars Global Surveyor, Mars Odyssey, Mars
    Exploration Rovers, Mars Express
  • Planning: Mars Reconnaissance Orbiter, MESSENGER,
    Phoenix, Mars Science Laboratory, Lunar
    Reconnaissance Orbiter

7
Issue
  • How to ensure that archived data do not change
    over time? That is, how to maintain file
    integrity?

8
Uses
  • Tracking and validating electronic deliveries
    from data suppliers
  • Protecting the on-line repository against file
    loss or corruption
  • Providing end users with a means to validate
    electronically downloaded data

9
Requirements
  • Maintain inventory
  • Data products, product labels, ancillary files
  • Product type, format, file location, size,
    integrity key
  • Maintain state information
  • Backup status
  • Data access permissions
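
The inventory and state items listed on this slide could be held in one record per file. The following is a minimal sketch, not the Geosciences Node's actual schema: the class name, field names, and the `make_record` helper are all hypothetical, and SHA-1 is used for the integrity key only because the presentation selects it on a later slide.

```python
import hashlib
import stat
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InventoryRecord:
    """One archived file. All field names are illustrative assumptions."""
    location: str          # path relative to the archive root
    product_type: str      # e.g. "data product", "label", "ancillary"
    file_format: str
    size: int              # bytes
    integrity_key: str     # hex digest of the file contents
    permissions: int       # POSIX permission bits
    backed_up: bool = False


def make_record(root: Path, path: Path, product_type: str,
                file_format: str) -> InventoryRecord:
    # Reads the whole file at once, which is fine for a sketch but not
    # for multi-gigabyte products.
    data = path.read_bytes()
    return InventoryRecord(
        location=path.relative_to(root).as_posix(),
        product_type=product_type,
        file_format=file_format,
        size=len(data),
        integrity_key=hashlib.sha1(data).hexdigest(),
        permissions=stat.S_IMODE(path.stat().st_mode),
    )
```

Keeping the record immutable means a changed file produces a new record rather than silently overwriting the old integrity key.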

10
Requirements
  • Check in new or updated data
  • Multiple deliveries may be made by instrument
    team during life of mission
  • Verify that all files in a delivery are present
  • Verify that contents of the files match what was
    sent by data provider
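
A delivery check of this kind might look like the sketch below. It assumes the data provider sends a manifest mapping each relative path to a SHA-1 hex digest; that manifest shape and the function name `check_delivery` are assumptions made for illustration, not something the slides specify.

```python
import hashlib
from pathlib import Path


def check_delivery(manifest, root):
    """Return (missing, changed) lists of relative paths.

    manifest: dict of relative path -> expected SHA-1 hex digest
              (an assumed format for the supplier's list).
    root:     directory where the delivery was unpacked.
    """
    missing, changed = [], []
    for rel_path, expected in sorted(manifest.items()):
        path = Path(root) / rel_path
        if not path.is_file():
            missing.append(rel_path)
            continue
        # Whole-file read keeps the sketch short; large products
        # would be hashed in chunks instead.
        actual = hashlib.sha1(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            changed.append(rel_path)
    return missing, changed
```

An empty pair of lists means every file named by the provider arrived with the expected contents.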

11
Requirements
  • Maintain integrity of the online repository
  • Automated checks
  • Manifest (filename, location, and size)
  • Content via integrity key
  • Access permissions
  • Partial integrity checks may be run manually
  • Report of results produced automatically
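
The manifest and permission checks are cheap because they need only file metadata, so they can run before any content hashing. The sketch below illustrates that idea under stated assumptions: the `expected` mapping, the report wording, and the function name are hypothetical, and the permission test assumes a POSIX file system.

```python
import stat
from pathlib import Path


def quick_audit(root, expected):
    """Compare on-disk size and permission bits with recorded values.

    expected: dict of relative path -> (size_in_bytes, permission_bits),
              an assumed layout for the stored manifest.
    Returns a list of report lines; an empty list means no problems.
    """
    report = []
    for rel_path, (size, mode) in sorted(expected.items()):
        path = Path(root) / rel_path
        if not path.is_file():
            report.append(f"MISSING {rel_path}")
            continue
        info = path.stat()
        if info.st_size != size:
            report.append(f"SIZE {rel_path}: {info.st_size} != {size}")
        actual_mode = stat.S_IMODE(info.st_mode)
        if actual_mode != mode:
            report.append(f"PERMISSIONS {rel_path}: {actual_mode:o} != {mode:o}")
    return report
```

Passing a subset of the manifest as `expected` gives the manually run partial check mentioned on the slide.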

12
Requirements
  • Track backups of data
  • Scheduled backups
  • Onsite and offsite copies
  • Simulated restores with integrity checks

13
Requirements
  • Provide location and validation method of data on
    demand
  • URL and local file system location of a file
  • Integrity key
  • Multiple files and keys should be packaged on
    the fly into standard formats (e.g., tar and zip)
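
Packaging files together with their keys could be done in memory, as in this sketch using Python's standard `zipfile` module. The listing name `checksums.sha1` and the function name are assumptions; the listing lines follow the "digest, two spaces, file name" layout written by the common `sha1sum` utility.

```python
import hashlib
import io
import zipfile


def package_with_keys(files):
    """Build a zip archive in memory from {name: bytes}.

    A checksum listing (hypothetical name: checksums.sha1) is added
    so the recipient can validate each extracted file.
    """
    buffer = io.BytesIO()
    lines = []
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
            digest = hashlib.sha1(files[name]).hexdigest()
            lines.append(f"{digest}  {name}")
        archive.writestr("checksums.sha1", "\n".join(lines) + "\n")
    return buffer.getvalue()
```

The returned bytes can be streamed directly to the requesting user without writing a temporary archive to disk.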

14
Potential Methods
  • File to file (per byte) comparison
  • File size
  • Bit counts (simple parity)
  • File checksums
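
A small experiment shows how these four methods differ in what they detect. The sample byte strings are made up for illustration: the second is the first with two characters swapped, a change that preserves both the file size and the count of set bits.

```python
import hashlib


def bit_count(data):
    """Number of 1 bits in the data (the 'bit counts' method)."""
    return sum(bin(byte).count("1") for byte in data)


original = b"PDS archive product 0001"
corrupted = b"PDS archive product 0010"  # last two characters swapped

same_size = len(original) == len(corrupted)
same_bits = bit_count(original) == bit_count(corrupted)
same_bytes = original == corrupted
same_sha1 = (hashlib.sha1(original).hexdigest()
             == hashlib.sha1(corrupted).hexdigest())

print("size check passes:     ", same_size)
print("bit count check passes:", same_bits)
print("byte comparison passes:", same_bytes)
print("SHA-1 check passes:    ", same_sha1)
```

The size and bit-count checks pass even though the data changed, while the byte comparison and the checksum both detect the change. The byte comparison, however, requires access to a second full copy of the file.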

15
Selecting a Method
  • Availability
  • Freely available source code
  • Use on multiple platforms
  • Calculation speed
  • Important for data set ingestion and verification
  • Size of key
  • Minimize amount of information required to
    compare files
  • Ease of use
  • Data providers and end users should not need
    special knowledge
  • Accuracy
  • Needs to work every time

16
Selecting a Method
  Method         Availability  Calculation speed  Checksum size              Ease of use  Accuracy
  File to file   yes           variable           < 2 TB                     easy         excellent
  File size      yes           fast               64 bits                    easy         fair
  Bit counts     yes           very fast          64 bits                    easy         poor
  MD5            yes           very fast          128 bits (32 hex digits)   easy         excellent
  RIPEMD-160     yes           very fast          160 bits (40 hex digits)   easy         excellent
  RIPEMD-320     yes           fast               320 bits (80 hex digits)   easy         excellent
  SHA-1          yes           very fast          160 bits (40 hex digits)   easy         excellent
  SHA-512        yes           fast               512 bits (128 hex digits)  easy         excellent
  • The National Institute of Standards and
    Technology has not approved MD5 as a secure
    hashing algorithm.

17
Solution
  • Use SHA-1 checksums at the file level to provide
    digital signatures
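
A file-level SHA-1 checksum can be computed with Python's standard `hashlib` module. This is a sketch of one common approach, not the tool the authors used; the function name and the 1 MiB chunk size are arbitrary choices. Reading in chunks keeps memory use constant even for files larger than 1 GB.

```python
import hashlib


def sha1_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object signals end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

The result is a 40-character hexadecimal string that can be stored next to the file name in a checksum list.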

18
Approach
  • Data flow model involves three phases
  • Data delivery from science team
  • Data ingestion
  • Data integrity

19
(No Transcript)

20
(No Transcript)

21
Test Case: Mars Express
  • Using checksums to validate Mars Express HRSC
    data set transfer
  • 320 GB in 11,280 files
  • 772 files > 100 MB
  • 44 files > 1 GB
  • 11 hours to create checksums; 3 seconds to
    compare checksum lists
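
Comparing two checksum lists is fast because it involves only text, not the data files themselves. The sketch below assumes each list has one "digest, whitespace, file name" entry per line, as `sha1sum` writes them; the slides do not state the actual list format, and the function names are hypothetical.

```python
def parse_checksum_list(text):
    """Parse 'digest  name' lines into a {name: digest} dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(None, 1)
        entries[name.strip()] = digest.lower()
    return entries


def compare_lists(sent_text, received_text):
    """Return (missing, extra, changed) file names, each sorted."""
    sent = parse_checksum_list(sent_text)
    received = parse_checksum_list(received_text)
    missing = sorted(set(sent) - set(received))
    extra = sorted(set(received) - set(sent))
    changed = sorted(name for name in set(sent) & set(received)
                     if sent[name] != received[name])
    return missing, extra, changed
```

Three empty lists indicate that the transferred data set matches what the supplier sent.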

22
Test Case: Mars Exploration Rovers
  • Making checksums available to end user within MER
    Analyst's Notebook
  • New feature
  • MD5 and SHA-1 checksums available
  • 3 TB in 2.6 million files
  • 160 files > 100 MB
  • Checksums created in 3 days
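
The timing figures from the two test cases can be converted into rough throughput rates. The calculation below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats "3 days" as exactly 72 hours; neither assumption is stated in the slides.

```python
def throughput(total_bytes, file_count, seconds):
    """Return (megabytes per second, files per second)."""
    return total_bytes / seconds / 1e6, file_count / seconds


# Mars Express HRSC: 320 GB in 11,280 files, 11 hours
mex_mb_s, mex_files_s = throughput(320e9, 11_280, 11 * 3600)

# Mars Exploration Rovers: 3 TB in 2.6 million files, 3 days
mer_mb_s, mer_files_s = throughput(3e12, 2_600_000, 3 * 86_400)

print(f"Mars Express: {mex_mb_s:.1f} MB/s, {mex_files_s:.2f} files/s")
print(f"MER:          {mer_mb_s:.1f} MB/s, {mer_files_s:.2f} files/s")
```

Under these assumptions the Mars Express run works out to roughly 8 MB/s and the MER run to roughly 12 MB/s at about 10 files per second. The two cases are not directly comparable, since the slides do not describe the hardware used or whether both digests were computed in the same pass.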

23
Future Work
  • In-house
  • Integrate checksums into existing validation
    tools
  • Develop data integrity tools to carry out regular
    checks of archive
  • External
  • Work with science teams to incorporate checksums
    as part of their data delivery
  • Educate end users regarding checksums

24
  • Provenance