Challenges and Solutions for Digital Geospatial Data Preservation Jeff Essic Geospatial Data Service - PowerPoint PPT Presentation

1
Challenges and Solutions for Digital Geospatial
Data Preservation
Jeff Essic, Geospatial Data Services Librarian
North Carolina State University Libraries
Digital Preservation Summit, Indiana State
University
May 21, 2008
2
NC Geospatial Data Archiving Project
  • Partnership between university library (NCSU) and
    state agency (NCCGIA), with Library of Congress
    under the National Digital Information
    Infrastructure and Preservation Program (NDIIPP)
  • One of 8 initial NDIIPP collection building
    partnerships
  • Focus on state and local geospatial content in
    North Carolina (state demonstration)
  • Tied to NC OneMap initiative, which provides for
    seamless access to data, metadata, and
    inventories

3
NCGDAP Goals
  • Repository Goal
  • Capture at-risk data
  • Explore technical and organizational challenges
  • Project End Goal
  • Data Producers: Improved temporal data management
    practices
  • Archives: More efficient means of acquiring and
    preserving data
  • Progress towards best practices

4
NCGDAP Specifics
  • Funding
  • $520,000 for 2005-2007
  • $500,000 for 18-month extension
  • Staff
  • 1.5 at NCSU
  • Approx. same at NCCGIA

5
Selected Geospatial Data Archive Projects
6
Outline
  • Key Geospatial Data Types
  • Risks to Digital Geospatial Data
  • Value in Temporal/Historical Geospatial Data
  • Archiving Challenges
  • Solutions in Progress

7
Key Geospatial Content Types
8
Data Types: Digital Orthophotography
  • All 100 NC counties with orthos
  • 1-5 flight years per county
  • 30-300 GB per flight

9
Geospatial Data Types: Vector GIS
  • County, municipal, state
  • Detailed, accurate, current
  • Frequently updated
  • Cadastral (tax parcels)
  • Street centerlines
  • Zoning
  • Topographic contours
  • School, sheriff, fire
  • Voting precincts
  • More

10
Imagery: Durable, Static, Simple structure, Mostly
open formats
Vector data: Volatile, Frequent update, Complex
structure, Mostly proprietary formats
Downtown Raleigh near State Capitol: 2005 Wake
County Ortho
11
Data Types: Spatial Databases
  • Vector, raster, and tabular data
  • Relationships
  • Behaviors
  • Annotation
  • Data Models

12
Geospatial Data Types: Cartographic
  • GIS Software
  • Software project file (.mxd, .apr, ...)
  • Data layer file (.avl, .lyr, ...)
  • PDF, GeoPDF map exports
  • Web Services-based representations

13
Other Geospatial Data Types: Place-based Data
Street Views
Oblique Imagery
3D Images
Tax Dept. Photos
  • Present-day value in location-based services and
    mobile applications
  • Future value for cultural heritage, descriptions
    of places

14
Other Geospatial Data Types: Web 2.0 Content
15
Geospatial Data: Compelling Issues
  • Dynamic content
  • Constantly updated information
  • Data versioning
  • Digital object complexity
  • Spatially enabled databases
  • Complicated, multi-component formats
  • Proprietary formats

16
Risks to Geospatial Data
17
Digital Preservation Points of Failure
  • Data is not saved, or
  • can't be found, or
  • media is obsolete, or
  • media is corrupt, or
  • format is obsolete, or
  • file is corrupt, or
  • meaning is lost
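Two of the failure points above, a corrupt file and corrupt media, can at least be detected (though not repaired) by routine fixity checking. The following is a minimal sketch, assuming SHA-256 checksums were recorded in a manifest when the data was first saved. The function names and the manifest layout are illustrative only and are not taken from the NCGDAP project.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest, root):
    """Compare recorded checksums with the files currently on disk.

    manifest maps a relative path to the digest recorded at save time
    (a hypothetical layout). Returns a dict of path -> status, where
    status is "ok", "missing", or "corrupt".
    """
    report = {}
    for rel_path, recorded in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif sha256_of(target) != recorded:
            report[rel_path] = "corrupt"
        else:
            report[rel_path] = "ok"
    return report
```

A check like this addresses only the "can't be found" and "is corrupt" points. Obsolete formats and lost meaning need format migration and metadata, not checksums.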

18
Risks to Geospatial Data
  • Producer focus on current data
  • Data overwrite as common practice
  • Future support of data formats in question
  • No open, supported format for vector data
  • Shift to web services-based access
  • Data becoming more ephemeral
  • Inadequate or nonexistent metadata
  • Impedes discovery and use
  • Increasing use of spatial databases for data
    management
  • The whole is greater than the sum of the parts

19
Value in Historical/Temporal Geospatial Data
20
Value in Older Data: Cultural Heritage
Future uses of data are difficult to anticipate
(as with Sanborn Maps)
21
Application: Impervious Surface Change Mapping
Figure panels A-D: 2002 Impervious; 2004 Aerial
Photography; 2004 Impervious Update; 2004
Impervious using 2002 Mask
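The comparison behind panels like these can be sketched in a few lines. This is only an illustration, assuming two co-registered grids of the same size in which 1 marks an impervious cell. The function name and the tiny grids are made up, and a real analysis would run in GIS or image-processing software.

```python
def impervious_change(earlier, later):
    """Count cells that gained, kept, or lost impervious cover.

    earlier and later are equal-sized lists of rows holding 0 or 1.
    """
    if len(earlier) != len(later) or any(
        len(a) != len(b) for a, b in zip(earlier, later)
    ):
        raise ValueError("grids must have the same dimensions")
    counts = {"gained": 0, "stayed": 0, "lost": 0}
    for row_a, row_b in zip(earlier, later):
        for a, b in zip(row_a, row_b):
            if a == 0 and b == 1:
                counts["gained"] += 1
            elif a == 1 and b == 1:
                counts["stayed"] += 1
            elif a == 1 and b == 0:
                counts["lost"] += 1
    return counts


# Hypothetical 3x3 grids standing in for the 2002 and 2004 layers.
grid_2002 = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]
grid_2004 = [[0, 1, 1],
             [0, 1, 1],
             [1, 0, 0]]
print(impervious_change(grid_2002, grid_2004))
```

Without the earlier layer there is nothing to compare against, which is the practical argument for keeping superseded data.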
22
Application: Shoreline Change Mapping
23
Application: Identifying Land Use Changes
Image years shown: 1993, 1998, 1999, 2002, 2005
Use case: Land use and impervious surface change
analysis
24
(No Transcript)
25
Preservation Challenges
26
Challenge: Data Capture
2006 Frequency of Capture Survey targeting North
Carolina counties and municipalities
Response: yes 65.3%, no 34.7% (out of
57.6% response rate)
27
Challenge: Data Capture
  • Industry focus on latest and greatest data
  • Industry temporally-impaired from the point of
    view of data availability, software support, etc.
  • Loss of memory about the data
  • Of superseded county orthophoto flights in NC:
  • Only 22 recorded in the state's GIS inventory
  • Only 30 accessible through county map servers

Some older inventories only available through
Internet Archive
28
Survey of current archiving practice among NC
counties and municipalities
  • "All of our data is kept monthly for 1 year,
    i.e., September 2006 tape will be overwritten
    September 2007."
  • "I do a weekly backup of existing data but it is
    overwriting the previously saved data."
  • "All of our data is archived daily, then weekly,
    then monthly, and yearly."
  • "No emphasis on historical data here. We just
    try to keep from losing data completely. Very
    minimal hardware to work with and no money."
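The "daily, then weekly, then monthly, and yearly" pattern in one of the responses above resembles a tiered rotation, in which only some backups graduate to longer retention. Below is a minimal sketch of such a policy. The retention windows (7 days for dailies, 5 weeks for Sunday copies, one year for first-of-month copies, January 1 copies kept indefinitely) are assumptions chosen for illustration and do not come from the survey.

```python
from datetime import date


def keep_snapshot(snapshot_day, today):
    """Decide whether a backup taken on snapshot_day is still retained.

    The retention windows are assumptions chosen for illustration.
    """
    age = (today - snapshot_day).days
    if age < 0:
        raise ValueError("snapshot is dated in the future")
    if snapshot_day.month == 1 and snapshot_day.day == 1:
        return True                      # yearly copy, kept indefinitely
    if snapshot_day.day == 1 and age <= 365:
        return True                      # monthly copy, kept one year
    if snapshot_day.weekday() == 6 and age <= 35:
        return True                      # Sunday weekly copy, kept five weeks
    return age < 7                       # daily copy, kept one week


def retained(snapshots, today):
    """Return the snapshot dates that survive the rotation, oldest first."""
    return sorted(day for day in snapshots if keep_snapshot(day, today))
```

Under a policy like this, a copy made in September 2006 is gone by late September 2007 unless it was promoted to a yearly copy. That is the kind of loss the overwrite practices in these responses imply for historical data.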
29
Survey of current archiving practice among NC
counties and municipalities
  • "We are only an emerging GIS. But it is my
    intention that ALL data will be archived."
  • "Getting ready to implement this type of
    archiving of data."
  • "I have not done this, but it does seem like a
    good idea!"
  • "I do not see why this can not be incorporated
    with disaster recovery. Don't you think you
    would foster greater support?"

Tremendous data producer interest in digitizing
and georeferencing old analog imagery and maps
30
Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties
and 25 largest NC municipalities
31
Challenge: Vector Data Formats
  • No widely-supported, open vector formats for
    geospatial data
  • Spatial Data Transfer Standard (SDTS) not widely
    supported
  • Geography Markup Language (GML): diversity of
    application schemas and profiles is a challenge
    for permanent access
  • Spatial Databases
  • The whole is more than the sum of the parts, and
    the whole is very difficult to preserve
  • Can export individual data layers for curation,
    but relationships and context are lost

32
Problem: Multiple choices for format type,
coordinate system, tiling scheme
33
Challenge: Digital Object Complexity
  • Files
  • Multi-file dataset
  • Georeferencing
  • Metadata file
  • Symbols file
  • Additional documentation
  • License
  • Disclaimer
  • More
  • Metadata
  • FGDC
  • Acquisition metadata
  • Transfer metadata
  • Ingest metadata
  • Archive rights
  • Archive processes
  • Collection metadata
  • Series metadata
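To make the multi-file problem concrete, here is a minimal sketch that checks whether a dataset in Esri Shapefile format arrived with its companion files. The required extensions (.shp, .shx, .dbf) and the .prj projection file follow the published Shapefile convention. Treating .shp.xml as the metadata file, and the function name itself, are assumptions for illustration.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")        # geometry, index, attribute table
RECOMMENDED = (".prj", ".shp.xml")         # georeferencing, metadata (assumed naming)


def audit_shapefile(shp_path):
    """Report which companion files of a shapefile are present on disk."""
    shp = Path(shp_path)
    base = shp.with_suffix("")             # e.g. parcels.shp -> parcels
    found = {}
    for ext in REQUIRED + RECOMMENDED:
        found[ext] = Path(str(base) + ext).is_file()
    return {
        "complete": all(found[ext] for ext in REQUIRED),
        "missing_required": [e for e in REQUIRED if not found[e]],
        "missing_recommended": [e for e in RECOMMENDED if not found[e]],
    }
```

A dataset can pass the required check and still be hard to use later: without the .prj file the coordinates have no stated reference system, and without metadata the attribute names may be unintelligible.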

34
Challenge: Cartographic Representation
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
35
Challenge: Geospatial Web Services
USGS nat_haz ArcIMS Service, 7 May 2008, 11:05 am
36
Carrboro, NC Population 17,797 (2005 est.)
24 downloadable GIS data layers
6 web mapping applications
4 OGC WMS services (web services)
9 downloadable PDF map layers
37
Other Challenges
  • Rights management
  • Data versioning
  • Semantic issues
  • Large scale content transfer
  • Integrating older analog data
  • More

38
Solutions in Progress
39
Different Ways to Approach Preservation
  • Technical solutions: How do we preserve acquired
    content over the long term?
  • Cultural/Organizational solutions: How do we make
    the data more preservable, and more prone to be
    preserved, from point of production?

Current use and data sharing requirements, not
archiving needs, are most likely to drive
improved preservability of content and
improvement of metadata
40
Different Ways to Approach Preservation
  • Technical solutions: How do we archive acquired
    content over the long term?
  • Build data repositories not just as an end in
    itself but also as a catalyst for discussion
    within the data community
  • Develop repository ingest workflows; create
    technical points of engagement with other NDIIPP
    preservation projects and build on collective
    learning experience

41
Different Ways to Approach Preservation
  • Cultural/Organizational solutions: How do we make
    the data more preservable, and more prone to be
    archived, from point of production?
  • Engage data producer community and spatial data
    infrastructure through outreach and engagement;
    influence practice
  • Sell the problem to software vendors and
    standards development
  • Find overlap with more compelling business
    problems: disaster preparedness, business
    continuity, road building, etc.
  • Start a discussion about roles at the local,
    state, and federal level

42
Content Identification
Technical Solution: Data Repository
43
Formal Inventory Processes
  • Alleviate contact fatigue on part of local
    agencies
  • 20 different NC state agencies contact local
    agencies for data; also, federal/regional
    agencies
  • Geospatial data is complex, requiring lengthy
    inventory process
  • Must capture descriptive, technical, and
    administrative information related to the data
  • Make the inventory available as a sharable data
    store

44
What do Inventories Offer to Archives?
  • Data Availability Information
  • Detailed information by data layer
  • Contact Information
  • Minimal Metadata
  • Descriptive, technical, administrative
  • Rights Information
  • Document Technical Environment
  • Software used, formats, transfer methods
  • Future Data Development Plans

45
Detailed Information About Data
Source NC OneMap Data Inventory 2004
46
Inventories as Source of Metadata
Example: Surface Water
47
Content Selection
48
Selection Issues
  • Most content is already at some level of risk
  • Early-Middle-Late Stage issues
  • Middle stage is usually the sweet spot, e.g.
    TIFF orthophotos vs. raw images or compressed
    images
  • Also added-value products: digital maps,
    cartographic representation
  • Digital maps: record or not?
  • Frequency of capture

49
Time series vector data: Parcel Boundary Changes
2001-2004, North Raleigh, NC
Continuously updated data: Frequency of
snapshots? Different for various framework
layers?
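One way to reason about snapshot frequency is to measure how much a layer changes between captures. The following is a minimal sketch, assuming each snapshot has been reduced to a mapping from a parcel identifier to some comparable value (here a made-up geometry fingerprint). The identifiers and values are hypothetical.

```python
def diff_snapshots(older, newer):
    """Compare two snapshots keyed by feature ID.

    Returns sorted lists of IDs that were added, removed, or changed.
    """
    added = sorted(set(newer) - set(older))
    removed = sorted(set(older) - set(newer))
    changed = sorted(
        fid for fid in set(older) & set(newer) if older[fid] != newer[fid]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical parcel snapshots: ID -> geometry fingerprint.
parcels_2001 = {"P-001": "a1f3", "P-002": "77c0", "P-003": "9be2"}
parcels_2004 = {"P-001": "a1f3", "P-003": "04dd",
                "P-004": "c512", "P-005": "e0a9"}
print(diff_snapshots(parcels_2001, parcels_2004))
```

A layer whose diff is large between captures is a candidate for more frequent snapshots, while a layer that rarely changes can be captured less often.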
50
Sept. 2006 Frequency of Capture Survey
  • Survey objective
  • Document current practices for obtaining archival
    snapshots of county/municipal geospatial vector
    data layers
  • Seek guidance about frequency of capture
  • Survey topics
  • General questions about data archiving practice
  • Specific questions about parcels, street
    centerlines, jurisdictional boundaries, and
    zoning
  • Survey subjects
  • All 100 counties and 25 municipalities
  • 58% response rate
  • Survey conducted September 2006
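The percentages reported for this survey can be turned back into approximate counts. This is an inference, not a figure from the slides: it assumes the 125 agencies surveyed (100 counties plus 25 municipalities) are the base for the 57.6 response rate quoted on the earlier data capture slide, and that the 65.3/34.7 yes/no split applies to those respondents.

```python
surveyed = 100 + 25                 # counties + municipalities
response_rate = 0.576               # 57.6 percent, from the data capture slide
yes_share = 0.653                   # 65.3 percent answered yes

# Round to whole agencies, since fractional respondents are not possible.
respondents = round(surveyed * response_rate)
yes_count = round(respondents * yes_share)
no_count = respondents - yes_count

print(respondents, yes_count, no_count)
print(round(100 * yes_count / respondents, 1))
```

Under these assumptions the counts come out to 72 respondents, 47 answering yes and 25 answering no, which reproduces the reported 65.3 percent.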

51
Frequency of Capture Survey
52
Data Capture Survey Results Overview
  • Two-thirds of responding agencies create and
    retain periodic snapshots
  • Long-term retention more common in counties with
    larger populations
  • Storage environments vary, with servers and
    CD-ROMs most common
  • Offsite storage (or both onsite and offsite) is
    used by nearly half of the respondents
  • Popularity of historic images has resulted in
    scanning and geo-referencing of hardcopy aerial
    photos among one-third of the respondents

53
Survey Observations
  • Process of survey formulation and implementation
    helped to socialize the problem of archiving data
  • Local innovation needs to be mined further to
    inform development of best practices
  • Business drivers for archiving need more study
    (e.g., stated adherence to retention policy)
  • Exposure to peer practice encourages archiving
  • Pronounced local interest in scanning/rectifying
    older analog maps and imagery

54
Content Exchange
55
Solutions: Content Exchange Infrastructure
  • High volume of state/federal requests for local
    data
  • Solving the present-day problems of data sharing
    is a pre-requisite to solving the problem of
    long-term access
  • Leveraging more compelling business reasons to
    put the data in motion (disaster preparedness,
    business continuity, highway construction,
    census, ...)
  • Content exchange networks
  • Minimize need to make contact
  • Add technical, administrative, descriptive
    metadata
  • Establish rights and provenance

56
Solutions: Content Exchange Infrastructure
  • Nov. 2007: NC Geographic Information
    Coordinating Council (GICC)
  • Ten Recommendations in Support of Geospatial
    Data Sharing released
  • Recommendation: Establish archive and long term
    data access strategies
  • Suggested best practices include: Establish a
    policy and procedure for the provision of access
    to historic data, especially for framework data
    layers.
  • http://www.ncgicc.org/CurrentActivities/TenRecommendationsinSupportofGeospatialData/tabid/156/Default.aspx

57
Solutions: Get the Data in Motion
  • Harvesting use cases for older data as part of
    outreach

Survey of current archiving practice among NC
counties and municipalities
58
Solutions: Getting the Data in Motion
  • Important Objectives
  • Minimize Direct Contact
  • Document Data
  • Clarify Rights
  • Routinize Transfers
  • Leverage other business uses that put data in
    motion
  • Continuity of operations
  • Highway Planning
  • Floodplain Mapping

Most costly part of archive development is
identifying, negotiating acquisition, and then
transferring data
59
Solutions: Getting the Data in Motion
  • NC GIS Inventory
  • Efficient data identification
  • Adding preservation elements

Orthophoto Data Distribution System: sneakernet
transfer of large quantities of imagery
  • NC OneMap Data Download and Viewer
  • Public access
  • Data visualization

Street Centerline Data Distribution System:
efficient transfer of data from 100 counties, with
metadata and clarified rights
http://www.ncstreetmap.com
60
Solutions: County and City GIS Data Directories
  • Tracking data, map servers, and web services
    since 2000
  • Ranked 3rd in traffic among entry points to
    library website
  • Persistent identifiers
  • usage tracking
  • IDs used in other sites
  • Peers compare activities
  • Community help in site maintenance

61
Repository Development
62
General Workflow
  • Receive Data from Agency
  • Copy data from agency source to NCSU workstation
  • Create DSpace collection space for the data
  • Create administrative metadata
  • Process geospatial metadata
  • Scan geospatial formats and migrate to archival
    format
  • Ingest original and archival data objects, and
    geospatial administrative metadata to DSpace
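The first steps of this workflow (copy to a workstation, then create administrative metadata) can be sketched as follows. This is a hypothetical outline, not the project's actual tooling: the field names, the JSON sidecar file, and the function name are all assumptions, and the repository ingest step is deliberately left out.

```python
import hashlib
import json
import shutil
from datetime import date
from pathlib import Path


def stage_transfer(source_dir, staging_dir, agency):
    """Copy an agency delivery into staging and write an admin record."""
    source = Path(source_dir)
    staged = Path(staging_dir) / source.name
    shutil.copytree(source, staged)              # raises if already staged

    files = []
    for path in sorted(p for p in staged.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(staged).as_posix(),
            "bytes": path.stat().st_size,
            # Reads the whole file into memory: acceptable for a sketch,
            # not for multi-gigabyte imagery.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })

    record = {
        "agency": agency,                        # hypothetical field names
        "received": date.today().isoformat(),
        "files": files,
    }
    # Sidecar file sits next to the staged folder, not inside it.
    sidecar = staged.with_name(staged.name + ".admin.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return record
```

Recording the source agency, receipt date, and checksums at the moment of transfer captures provenance while it is still known, before later processing steps change the files.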

63
Repository Status
  • Acquired 4 TB of data with more on the way
  • Disk space being used initially for data
    staging
  • Inventorying
  • In the process of ingesting content into DSpace
  • Metadata generation

64
Summary
Technical Solution: Data Repository
65
Data Capture Challenge: Implemented Solutions
  • Downloading or acquiring low hanging fruit
  • Frequency based on FOC survey
  • Tapping into existing content exchange networks
  • Orthophoto sneakernet
  • NC OneMap
  • NCStreetmaps.org
  • Floodplain Mapping data distribution
  • Others

66
Preservation Metadata Challenge: Implemented
Solutions
  • Creating our own based on:
  • Non-standard documentation
  • Inventories
  • Personal information exchanges
  • Data context
  • Clues, memory, and other sleuthing

67
Vector Data Formats and Complexity Challenges:
Implemented Solutions
  • Converting and preserving data in Shapefile
    format
  • Not ideal, but
  • Specifications are published
  • Stable, widely accepted and known format
  • Ingest content into DSpace object model
  • Exportability, Transfer, Extraction, and
    Conversion being tested

68
Cartographic Representation Challenge:
Implemented Solutions
  • Scanned, georeferenced, and compressed over 286
    NC geologic maps, in cooperation with NC Geologic
    Survey

1:31,680  1:430,000
1:500,000  1:2.5 M
69
Geospatial Web Services Challenge: Implemented
Solutions
  • Still searching
  • WMS (Web Map Service)
  • Can only capture derived static images, losing
    the underlying data intelligence
  • Possible use for agent-based image atlas creation
  • WFS (Web Feature Service)
  • Transfers actual vector data as GML
  • Not widely deployed; variation in configuration
  • Scalability for bulk transfer questionable
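The difference between the two services shows up in the requests themselves. The sketch below builds a WMS 1.1.1 GetMap URL (which returns a rendered image) and a WFS 1.1.0 GetFeature URL (which returns features, as GML by default) from the standard key-value parameters. The endpoint and layer names are placeholders, and nothing is actually fetched.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap request for a static PNG image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                              # default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


def wfs_getfeature_url(endpoint, type_name):
    """Build a WFS 1.1.0 GetFeature request for vector features."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoints and layer names, not real services.
print(wms_getmap_url("https://example.org/wms", "parcels",
                     (-78.7, 35.7, -78.6, 35.8), 800, 800))
print(wfs_getfeature_url("https://example.org/wfs", "parcels"))
```

An archive built from GetMap responses holds pictures at one extent, size, and style. An archive built from GetFeature responses holds the data, but only if the service is deployed and can handle bulk requests.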

70
Engaging Spatial Data Infrastructure
Cultural/Organizational Solution: Engaging Others
71
NC Spatial Data Infrastructure: NC OneMap
  • NC OneMap is a next generation mechanism to
    coordinate and disseminate geographic information
    in North Carolina and interact with the NSDI.
  • Objectives
  • Build a common understanding of North Carolina
    data resources
  • Enable widespread access and distribution of
    geospatial data

72
NC OneMap
  • Objectives (cont.)
  • Develop ongoing data inventory for all
    geospatial data holdings
  • http://nc.gisinventory.net
  • Develop content standards for key data themes
  • NC Geographic Information Coordinating Council
    (GICC)
  • One of the defined characteristics of NC OneMap
    is that "Historic and temporal data will be
    maintained and available."

73
Points of Engagement with Spatial Data
Infrastructure
  • Framework data communities
  • Snapshot frequency, naming schemes,
    classification, GML application schemas, format
    strategies
  • Metadata standards and outreach
  • Persistent identifiers, versioning, feedback on
    metadata quality
  • Content replication/transfer
  • For data improvement projects, disaster
    preparedness, aggregation by regional service
    providers, and archives
  • Where does archiving and preservation fit in?

74
Archival and Long Term Access Working Group
  • Initiated by NC Geographic Information
    Coordinating Council in 2008 to address growing
    concerns of state and local agencies about
    long-term access to data
  • Federal, state, regional, and local agency
    representation
  • Key focus
  • Best practices for data snapshots and retention
  • State Archives processes: appraisal, selection,
    retention schedules, etc.
  • Who, What, Why, When, Where, How
  • Promising outcome of NCGDAP: multiple parties
    and levels discussing data archiving on their
    own.

75
Regional Partnerships
  • Focused on development of shared infrastructure
    for cultivating access to data
  • Becoming test beds for innovation in the area of
    data sharing and data management, including
    archiving

76
NDIIPP Multi-State Geospatial Project
  • Lead organizations: North Carolina Center for
    Geographic Information Analysis (NCCGIA) and
    State Archives of NC
  • Partners
  • Leading state geospatial organizations of
    Kentucky and Utah
  • State Archives of Kentucky and Utah
  • NCSU Libraries in catalytic/advisory role
  • State-to-state and geo-to-Archives collaboration
  • 2-year project: Nov. 2007-Dec. 2009
  • Archives as part of Spatial Data Infrastructure

77
Engaging Industry
78
Cultural: Changing Industry Thinking
  • Is the geospatial industry temporally-impaired?
  • Lack of access to older data
  • Lack of tool/model support for temporal analysis
  • Metadata: poor support for changing data
  • Education: building class projects around
    available data (i.e., not temporal)
  • Increased interest now in temporal applications?
  • Increased demand for temporal data?
  • Improved tool support: ArcGIS 9.2 animation
    tools, Geodatabase History, etc.

79
Project Status
What About Commercial Data?
Cultivating a commercial market for older data.
Part of permanent access is marketing,
advertising, and putting older data into the path
of the user
80
Conclusions
81
Conclusions
  • Supporting temporal analysis requirements gets
    more attention than archiving and preservation
  • Leverage existing infrastructure
  • Current data sharing needs drive infrastructure
    improvements that help archiving
  • Leverage business needs that are more compelling
    than preservation (e.g., continuity of
    operations)
  • Facilitate stakeholder ownership of the solutions
  • Mine state and local archiving innovations

82
Slide Presentation
Temporarily at http://www4.ncsu.edu/jfessic/DPW08.ppt
Later, permanently linked at http://www.lib.ncsu.edu/ncgdap

Steve Morris
Head, Digital Library Initiatives, NCSU Libraries
ph (919) 515-1361
Steven_Morris_at_ncsu.edu

Jeff Essic
Geospatial Data Services Librarian, NCSU Libraries
ph (919) 515-5698
Jeff_Essic_at_ncsu.edu