Preservation Issues Related to Digital Geospatial Data. Steven P. Morris, Head of Digital Library Initiatives, North Carolina State University Libraries



1
Preservation Issues Related to Digital Geospatial Data
Steven P. Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Library of Congress Workshop
April 21, 2008
2
Overview of the Problem Area: Outline
  • Revisiting Key Geospatial Data Types
  • Risks to Digital Geospatial Data
  • Value in Temporal/Historical Data
  • Archiving Challenges

3
Brief (Very) Overview of the Geospatial Domain
4
Data Types: Digital Orthophotography
  • All 100 NC counties with orthos
  • 1-5 flight years per county
  • 30-300 GB per flight

5
Geospatial Data Types: Vector GIS
  • County, municipal, state
  • Detailed, accurate, current
  • Frequently updated
  • Cadastral (tax parcels)
  • Street centerlines
  • Zoning
  • Topographic contours
  • School, sheriff, fire
  • Voting precincts
  • More

6
Data Types: Spatial Databases
  • Vector and raster data
  • Relationships
  • Behaviors
  • Annotation
  • Data Models

7
Geospatial Data Types: Cartographic
  • GIS Software
  • Software project file (.mxd, .apr, ...)
  • Data layer file (.avl, .lyr, ...)
  • PDF map exports
  • Web Services-based representations

8
Other Geospatial Data Types: Place-based Data
Oblique Imagery
  • Mobile, LBS, and social networking applications
  • Long-term cultural heritage value in non-overhead
    imagery: more descriptive of place and
    function

Street View Images
Tax Dept. Photos
Road Videologs
9
Geospatial Data: Compelling Issues
  • Dynamic content
  • Constantly updated information
  • Data versioning
  • Digital object complexity
  • Spatially enabled databases
  • Complicated, multi-component formats
  • Proprietary formats

10
Risks to Geospatial Data
11
How would you describe your current geospatial
archive?
12
Digital Preservation Points of Failure
  • Data is not saved, or
  • can't be found, or
  • media is obsolete, or
  • media is corrupt, or
  • format is obsolete, or
  • file is corrupt, or
  • meaning is lost

Solutions: Migration, Emulation, Encapsulation, XML
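The "file is corrupt" failure point above is commonly detected with fixity checks. The following is a minimal sketch, not a method described in the presentation: it records a SHA-256 checksum for every file under a dataset directory and later reports files that have gone missing or changed. The function names and the manifest layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large orthophoto tiles need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: 'missing' or 'changed'} for every failed check."""
    problems: dict[str, str] = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "changed"
    return problems
```

A manifest built at ingest time and re-verified on a schedule catches silent media or file corruption; it does not address format obsolescence or lost meaning, which need the other strategies listed on the slide.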
13
Risks to Geospatial Data
  • Producer focus on current data
  • Data overwrite as common practice
  • Future support of data formats in question
  • No open, supported format for vector data
  • Shift to web services-based access
  • Data becoming more ephemeral
  • Inadequate or nonexistent metadata
  • Impedes discovery and use
  • Increasing use of spatial databases for data
    management
  • The whole is greater than the sum of the parts
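As an illustration of an alternative to "data overwrite as common practice", the sketch below copies a dataset directory into a dated snapshot folder before the working copy is updated. The directory layout, the naming pattern, and the function name are assumptions for this example and are not taken from the presentation.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_dataset(source: Path, archive_root: Path, taken_on: date) -> Path:
    """Copy `source` to archive_root/<name>_<YYYYMMDD> and return the new path.

    Refuses to replace an existing snapshot, so earlier states are never
    overwritten by accident.
    """
    target = archive_root / f"{source.name}_{taken_on:%Y%m%d}"
    if target.exists():
        raise FileExistsError(f"snapshot already exists: {target}")
    shutil.copytree(source, target)  # creates missing parent directories
    return target
```

For example, `snapshot_dataset(Path("parcels"), Path("archive"), date(2006, 9, 1))` would produce `archive/parcels_20060901`. How often to take such snapshots is the "frequency of capture" question raised later in the deck.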

14
Value in Older Geospatial Data
15
Value in Older Data: Cultural Heritage
Future uses of data are difficult to anticipate
(as with Sanborn Maps)
16
Value in Older Data: Solving Business Problems
Land use change analysis
Site location analysis
Real estate trends analysis
Disaster response
Resolution of legal challenges
Impervious surface maps
Suburban Development 1993/2002 Near
Mecklenburg-Cabarrus County border
17
Problem: Flood and Hurricane Preparedness
18
Application: Impervious Surface Change Mapping
Figure panels A-D: 2002 Impervious; 2004 Aerial Photography; 2004 Impervious Update; 2004 Impervious using 2002 Mask
19
Problem: Beach Erosion and Shoreline Change
20
Application: Shoreline Change Mapping
21
Problem: Tracking Land Use Change
22
Application: Land Use Change Mapping
Input Data
Output GIS Data
Using Mecklenburg County 2002 true color
orthorectified aerial photography
23
(No Transcript)
24
Preservation Challenges
25
Challenge: Vector Data Formats
  • No widely-supported, open vector formats for
    geospatial data
  • Spatial Data Transfer Standard (SDTS) not widely
    supported
  • Geography Markup Language (GML): diversity of
    application schemas and profiles a challenge for
    permanent access
  • Spatial Databases
  • The whole is more than the sum of the parts, and
    the whole is very difficult to preserve
  • Can export individual data layers for curation,
    but relationships and context are lost
  • Some thinking of using the spatial database as
    the primary archival platform

26
Challenge: Preserving Geodatabases
  • Spatial databases in general vs. ESRI Geodatabase
    format
  • Not just data layers and attributes, but also
    topology, annotation, relationships, behaviors
  • ESRI Geodatabase archival issues
  • XML Export, Geodatabase History, File
    Geodatabase, Geodatabase Replication
  • Some looking to Geodatabase as archival platform
    (in addition to feature class export)

27
Challenge: Cartographic Representation
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
28
Challenge: Geospatial Web Services
  • How to capture records from decision-making
    processes?

29
Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties
and 25 largest NC municipalities
30
Challenge: Data Capture
2006 Frequency of Capture Survey targeting North
Carolina counties and municipalities
Response: yes 65.3%, no 34.7% (out of a
57.6% response rate)
31
Challenge: Digital Object Complexity
32
Building Data Bundles: The Zip Codes Example
33
Where is the Dataset?
34
Here's One!
  • Files
  • Multi-file dataset
  • Georeferencing
  • Metadata file
  • Symbolization file
  • Additional documentation
  • License
  • Disclaimer
  • More
  • Metadata
  • FGDC
  • Acquisition metadata
  • Transfer metadata
  • Ingest metadata
  • Archive rights
  • Archive processes
  • Collection metadata
  • Series metadata
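To make the "multi-file dataset" idea concrete, here is a small sketch that groups loose files into candidate bundles by shared base name and reports which expected components are absent. It assumes the dataset is an ESRI shapefile: the required trio (.shp, .shx, .dbf) is standard for that format, while the choice of .prj, .shp.xml, and .lyr as the georeferencing, metadata, and symbolization components, and the assumption that they share the dataset's base name, are illustrative only.

```python
from collections import defaultdict
from pathlib import Path

# Mandatory parts of a shapefile.
REQUIRED = {".shp", ".shx", ".dbf"}
# Assumed optional parts, chosen for illustration to mirror the slide's list.
RECOMMENDED = {".prj", ".shp.xml", ".lyr"}


def split_name(filename: str) -> tuple[str, str]:
    """Split into (base name, extension), treating '.shp.xml' as one extension."""
    if filename.lower().endswith(".shp.xml"):
        return filename[: -len(".shp.xml")], ".shp.xml"
    path = Path(filename)
    return path.stem, path.suffix.lower()


def group_bundles(filenames: list[str]) -> dict[str, set[str]]:
    """Group file names by base name, collecting the extensions seen for each."""
    bundles: dict[str, set[str]] = defaultdict(set)
    for name in filenames:
        stem, ext = split_name(name)
        bundles[stem].add(ext)
    return dict(bundles)


def audit_bundles(filenames: list[str]) -> dict[str, dict[str, list[str]]]:
    """For each base name that has a .shp file, list the missing components."""
    report = {}
    for stem, extensions in group_bundles(filenames).items():
        if ".shp" not in extensions:
            continue  # not a shapefile bundle (e.g. a license text file)
        report[stem] = {
            "missing_required": sorted(REQUIRED - extensions),
            "missing_recommended": sorted(RECOMMENDED - extensions),
        }
    return report
```

Files such as a license or disclaimer do not share the dataset's base name, which is part of why the deck asks "Where is the dataset?": a name-based grouping like this one cannot find them without extra packaging metadata.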

35
Other Challenges
  • Rights management
  • Data versioning
  • Semantic issues
  • Large scale content transfer
  • Integrating older analog data
  • More

36
Looking for Solutions: Outline
  • Approaches to Archiving and Preservation
  • Current and Recent Geoarchiving Projects
  • Content Identification
  • Content Selection
  • Content Exchange
  • Digital Repository Development
  • Engaging Spatial Data Infrastructure
  • Archives Processes

37
Different Ways to Approach Preservation
  • Technical solutions: How do we preserve acquired
    content over the long term?
  • Cultural/Organizational solutions: How do we make
    the data more preservable, and more prone to be
    preserved, from point of production?

Current use and data sharing requirements, not
archiving needs, are most likely to drive
improved preservability of content and
improvement of metadata
38
Different Ways to Approach Preservation
  • Technical solutions: How do we archive acquired
    content over the long term?
  • Build data repositories not just as an end in
    itself but also as a catalyst for discussion
    within the data community
  • Develop repository ingest workflows; create
    technical points of engagement with other NDIIPP
    preservation projects and build on collective
    learning experience

39
Different Ways to Approach Preservation
  • Cultural/Organizational solutions: How do we make
    the data more preservable, and more prone to be
    archived, from point of production?
  • Engage data producer community and spatial data
    infrastructure through outreach and engagement;
    influence practice
  • Sell the problem to software vendors and
    standards development
  • Find overlap with more compelling business
    problems: disaster preparedness, business
    continuity, road building, etc.
  • Start a discussion about roles at the local,
    state, and federal level

40
Current or Recent Geospatial Data Archiving
Projects
41
Selected Geospatial Data Archive Projects
Project                                                    Organizations                         Funding
Persistent Archives Testbed                                San Diego Supercomputer Center, NARA  NARA
VanMap                                                     San Diego Supercomputer Center        InterPARES
Geospatial Repository for Academic Deposit and Extraction  EDINA                                 JISC
Geospatial Electronic Records                              CIESIN                                NHPRC
various                                                    Carleton University                   various
National Geospatial Digital Archive                        UC Santa Barbara                      NDIIPP
Maine GeoArchives                                          State of Maine                        NHPRC
42
NC Geospatial Data Archiving Project
  • Partnership between university library (NCSU) and
    state agency (NCCGIA), with Library of Congress
    under the National Digital Information
    Infrastructure and Preservation Program (NDIIPP)
  • One of 8 initial NDIIPP collection building
    partnerships
  • Focus on state and local geospatial content in
    North Carolina (state demonstration)
  • Tied to NC OneMap initiative, which provides for
    seamless access to data, metadata, and
    inventories
  • Objective: engage existing state/federal
    geospatial data infrastructures in preservation

Serve as catalyst for discussion within industry
43
NCGDAP Goals
  • Repository Goal
  • Capture at-risk data
  • Explore technical and organizational challenges
  • Project End Goal
  • Data Producers: Improved temporal data management
    practices
  • Archives: More efficient means of acquiring and
    preserving data
  • Progress towards best practices

Temporal data management vs. long-term
preservation
44
Content Identification
45
Formal Inventory Processes
  • Alleviate contact fatigue on part of local
    agencies
  • 20 different NC state agencies contact local
    agencies for data; also, federal/regional
    agencies
  • Geospatial data is complex, requiring lengthy
    inventory process
  • Must capture descriptive, technical, and
    administrative information related to the data
  • Make the inventory available as a sharable data
    store

46
What do Inventories Offer to Archives?
  • Data Availability Information
  • Detailed information by data layer
  • Contact Information
  • Minimal Metadata
  • Descriptive, technical, administrative
  • Rights Information
  • Document Technical Environment
  • Software used, formats, transfer methods
  • Future Data Development Plans

47
Detailed Information About Data
Source: NC OneMap Data Inventory 2004
48
Inventories as Source of Metadata: Example
Surface Water
49
Content Selection
50
Selection Issues
  • Most content is already at some level of risk
  • Early-Middle-Late Stage issues
  • Middle stage is usually the sweet spot, e.g.
    TIFF orthophotos vs. raw images or compressed
    images
  • Also added-value products: digital maps,
    cartographic representation
  • Digital maps: record or not?
  • Frequency of capture

51
Problem: Multiple choices for format type,
coordinate system, tiling scheme
52
Time series vector data: Parcel Boundary Changes
2001-2004, North Raleigh, NC
Continuously updated data: Frequency of
snapshots? Different for various framework
layers?
53
Sept. 2006 Frequency of Capture Survey
  • Survey objective
  • Document current practices for obtaining archival
    snapshots of county/municipal geospatial vector
    data layers
  • Seek guidance about frequency of capture
  • Survey topics
  • General questions about data archiving practice
  • Specific questions about parcels, street
    centerlines, jurisdictional boundaries, and
    zoning
  • Survey subjects
  • All 100 counties and 25 municipalities
  • 58% response rate
  • Survey conducted September 2006

54
Data Capture Survey Results: Overview
  • Two-thirds of responding agencies create and
    retain periodic snapshots
  • Long-term retention more common in counties with
    larger populations
  • Storage environments vary, with servers and
    CD-ROMs most common
  • Offsite storage (or both onsite and offsite) is
    used by nearly half of the respondents
  • Popularity of historic images has resulted in
    scanning and geo-referencing of hardcopy aerial
    photos among one-third of the respondents

55
Survey Observations
  • Process of survey formulation and implementation
    helped to socialize the problem of archiving data
  • Local innovation needs to be mined further to
    inform development of best practices
  • Business drivers for archiving need more study
    (e.g., stated adherence to retention policy)
  • Exposure to peer practice encourages archiving
  • Pronounced local interest in scanning/rectifying
    older analog maps and imagery

56
Content Exchange
57
Solutions: Content Exchange Infrastructure
  • High volume of state/federal requests for local
    data
  • Solving the present-day problems of data sharing
    is a pre-requisite to solving the problem of
    long-term access
  • Leveraging more compelling business reasons to
    put the data in motion (disaster preparedness,
    business continuity, highway construction,
    census, ...)
  • Content exchange networks
  • Minimize need to make contact
  • Add technical, administrative, descriptive
    metadata
  • Establish rights and provenance

58
Transfer Modes - Conventional
  • CD/DVD
  • e.g., 230 CD-ROMs for 1999 Wake County
    orthophotos
  • External drives
  • Becoming more routine
  • FTP
  • Bandwidth intensive; restricted to off hours, or
    not done
  • WAN (Wide Area Network)
  • Network incompatibilities, network load
  • Web Download
  • Complex interfaces make automation difficult

59
Transfer Modes - Web Services
  • WMS (Web Map Service)
  • Can only capture derived static images, losing
    the underlying data intelligence
  • Possible use for agent-based image atlas creation
  • WFS (Web Feature Service)
  • Transfers actual vector data as GML
  • Not widely deployed; variation in configuration
  • Scalability for bulk transfer questionable
  • Federal Enterprise Architecture Geospatial
    Profile suggests WMS, WFS, FTP

60
Repository Development
61
Repository Pre-ingest Workflow
62
NCGDAP Workflow: Data Receipt
63
Workflow: Format Processing
Conversion
Compound Formats
64
Workflow: Metadata Processing
Creation
Remediation
65
Workflow: Ingest Processes
66
Extended Curation: Feedback and Outreach
67
Engaging Spatial Data Infrastructure
68
NC Spatial Data Infrastructure: NC OneMap
  • NC OneMap is a next generation mechanism to
    coordinate and disseminate geographic information
    in North Carolina and interact with the NSDI.
  • Objectives
  • Build a common understanding of North Carolina
    data resources
  • Enable widespread access and distribution of
    geospatial data

69
NC OneMap
  • Objectives (cont.)
  • Develop ongoing data inventory for all geospatial
    data holdings: RAMONA (http://nc.gisinventory.net)
  • Develop content standards for key data themes
  • NC Geographic Information Coordinating Council
    (GICC)
  • One of the defined characteristics of NC OneMap
    is that "Historic and temporal data will be
    maintained and available."

70
Points of Engagement with Spatial Data
Infrastructure
  • Framework data communities
  • Snapshot frequency, naming schemes,
    classification, GML application schemas, format
    strategies
  • Metadata standards and outreach
  • Persistent identifiers, versioning, feedback on
    metadata quality
  • Content replication/transfer
  • For data improvement projects, disaster
    preparedness, aggregation by regional service
    providers, and archives
  • Where does archiving and preservation fit in?

71
Emerging Regional Partnerships
  • Focused on development of shared infrastructure
    for cultivating access to data
  • Becoming test beds for innovation in the area of
    data sharing and data management, including
    archiving

72
Engaging Industry
73
Cultural: Changing Industry Thinking
  • Is the geospatial industry temporally-impaired?
  • Lack of access to older data
  • Lack of tool/model support for temporal analysis
  • Metadata: poor support for changing data
  • Education: building class projects around
    available data (i.e., not temporal)
  • Increased interest now in temporal applications?
  • Increased demand for temporal data?
  • Improved tool support: ArcGIS 9.2 animation
    tools, Geodatabase History, etc.

74
Project Status
What About Commercial Data?
Cultivating a commercial market for older data.
Part of permanent access is marketing,
advertising, and putting older data into the path
of the user
75
Points of Engagement with the Open Geospatial
Consortium (OGC)
  • Geography Markup Language (GML) for archiving
    (PDF/A version of GML?)
  • GeoRM (Geo Rights Management)
  • Adding preservation use cases
  • Content Packaging
  • Will there be an industry solution?
  • Web Services Context Documents
  • Can we save data state as well as application
    state?
  • Content Replication
  • Is this a layer in the overall architecture?
  • Persistent Identifiers

76
Archives Processes
77
Maine GeoArchives: Project Components
  • Retention schedules
  • Geospatial data
  • Administrative records
  • Record accessioning
  • Appraisal system
  • System documentation
  • Archival data and metadata standards
  • Rules for disposition of local government records

78
Maine GeoArchives: Functional Requirements
Adopted set of functional requirements for
recordkeeping systems to ensure permanent
retention of data layers
  • Compliance
  • Responsible
  • Credibility
  • Completeness
  • Authenticity
  • Soundness
  • Auditability
  • Availability
  • Exportable
  • Renderable
  • Redactable

79
NC 2006 Survey of Current Archiving Practice
80
Question 1 (the filter)
Do you create periodic snapshots of any vector
datasets for long-term retention and archiving?
  • Response: yes 65.3%, no 34.7% (out of a 57.6%
    response rate)

Respondents answering "No" automatically skip
most of the remaining questions
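The percentages for Question 1 can be turned back into approximate head counts. This is a back-of-the-envelope reconstruction, not figures reported in the slides: it assumes the survey population is the 100 counties plus 25 municipalities mentioned elsewhere in the deck, and that the percentages were rounded to one decimal place.

```python
# Assumption: 100 counties + 25 municipalities were surveyed.
surveyed = 100 + 25

# 57.6% response rate -> inferred number of responding agencies.
respondents = round(surveyed * 0.576)

# 65.3% of respondents answered "yes"; the remainder answered "no".
yes_count = round(respondents * 0.653)
no_count = respondents - yes_count

# Check that the inferred counts reproduce the published percentages.
response_rate = round(100 * respondents / surveyed, 1)
yes_share = round(100 * yes_count / respondents, 1)
no_share = round(100 * no_count / respondents, 1)

print(respondents, yes_count, no_count)
print(response_rate, yes_share, no_share)
```

Under these assumptions the figures are consistent with 72 responding agencies, of which 47 create periodic snapshots and 25 do not.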
81
Key Results: Capture Frequency
82
Key Results: Formats
83
Key Results: Formats
84
Key Results: Metadata
85
Key Results: Storage
86
Key Results: Storage
87
Key Results: Reasons for Archiving
88
Conclusion
89
Key issues
  • What are the points of intersection between
    archive needs and business continuity/disaster
    preparedness and other business needs?
  • How to best stimulate and learn from innovation
    at the state/regional/local level?
  • How to make data more preservable from point of
    production and on through data transfer
  • How to most effectively move data in an
    efficient, well-documented manner with clarified
    rights

90
Key issues (continued)
  • How to best make Archives a part of spatial data
    infrastructure?
  • How should tradeoffs between level of curation
    and quantity of acquisition be made?
  • Defining the record: data vs. derivative
    components
  • How to best cross-fertilize between projects
    (NDIIPP, NHPRC, etc.)

91
Questions?
Contact: Steve Morris, Head, Digital Library
Initiatives, NCSU Libraries
ph (919) 515-1361
Steven_Morris_at_ncsu.edu
http://www.lib.ncsu.edu/ncgdap
92
Workflow Overview
Handout 1
93
Workflow Focus: Digital Format Curatorship
Handout 2
94
Workflow Focus: Geospatial Metadata Management
Handout 3