Preservation of Digital Geospatial Data: Challenges and Opportunities. Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries


1
Preservation of Digital Geospatial Data
Challenges and Opportunities
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
NARA Meeting
Dec. 14, 2005
2
Outline
  • Digital Geospatial Data Types
  • Risks to Digital Geospatial Data
  • Overview of NC Geospatial Data Archiving Project
  • Preservation Challenges and Possible Solutions

3
Geospatial data types: Vector data
4
Geospatial data types: Satellite imagery
5
Geospatial data types: Aerial imagery
6
Geospatial data types: Aerial imagery
7
Geospatial data types: Aerial imagery
8
Geospatial data types: Tabular data (w/vector)
9
Time series vector data: Parcel Boundary Changes
2001-2004, North Raleigh, NC
10
Time series ortho imagery: Vicinity of
Raleigh-Durham International Airport 1993-2002
11
Today's geospatial data as tomorrow's cultural
heritage
12
Risks to Digital Geospatial Data
.shp
.mif
.gml
.e00
.dwg
.dgn
.bsb
.bil
.sid
13
Risks to Digital Geospatial Data
  • Producer focus on current data
  • Time-versioned content generally not archived
  • Future support of data formats in question
  • Vast range of data formats in use--complex
  • Shift to streaming data for access
  • Archives have been a by-product of providing
    access
  • Preservation metadata requirements
  • Descriptive, administrative, technical, DRM
  • Geodatabases
  • Complex functionality

14
NC Geospatial Data Archiving Project
  • Partnership between university library (NCSU) and
    state agency (NCCGIA)
  • Focus on state and local geospatial content in
    North Carolina (state demonstration)
  • Tied to NC OneMap initiative, which provides for
    seamless access to data, metadata, and inventory
    information
  • Objective: engage existing state/federal
    geospatial data infrastructures in preservation

15
Targeted Content
  • Resource Types
  • GIS vector (point/line/polygon) data
  • Digital orthophotography
  • Digital maps
  • Tabular data (e.g. assessment data)
  • Content Producers
  • Mostly state, local, regional agencies
  • Some university, not-for-profit, commercial
  • Selected local federal projects

16
Local Government GIS Archival Issues
  • Data resources are highly distributed and subject
    to frequent update
  • More detailed, current, accurate than
    federal/state data resources
  • North Carolina local agency GIS environment
  • 100 counties, 95 with GIS
  • 85 counties with high resolution orthophotography
  • Growing number of municipal systems
  • Value: $162 million plus investment (est. in
    2003)

17
Work plan in a Nutshell
  • Work from existing data inventories
  • NC OneMap Data Sharing Agreements as the
    blanket, individual agreements as the quilt
  • Partnership work with existing geospatial data
    infrastructures (state and federal)
  • Technical approach
  • METS with FGDC, PREMIS?, GeoDRM?
  • DSpace now, re-ingest to different environment
  • Web services consumption for archival development

18
NCGDAP Philosophy of Engagement
Provide feedback to producer organizations/inform
state geospatial infrastructure
Take the data as is, in the manner in which it can
be obtained
Wrangle and archive data
Note the "Project" in North Carolina Geospatial
Data Archiving Project: the process, the
learning experience, and the engagement with
geospatial data infrastructures are more
important than the archive
19
Big Challenges
  • Format migration paths
  • Management of data versions over time
  • Preservation metadata
  • Harnessing geospatial web services
  • Preserving cartographic representation
  • Keeping content repository-agnostic
  • Preserving geodatabases
  • More

20
Vector Data Format Issues
  • Vector data much more complicated than image data
  • Archiving vs. Permanent access
  • An open pile of XML might make an archive, but
    if using it requires a team of programmers to do
    digital archaeology then it does not provide
    permanent access
  • Piles of XML need to be widely understood piles
  • GML: need widely accepted application schemas
    (like OSMM?)
  • The Geodatabase conundrum
  • Export feature classes, and lose topology,
    annotation, relationships, etc.
  • or use the Geodatabase as the primary archival
    platform (some are now thinking this way)

21
GIS Software Used: NC Local Agencies
Source: NC OneMap Data Inventory 2004
22
Vector Data Format Options
  • Option A: use an open format and have a really
    unfortunate transformation and limited vendor
    support for the output object
  • Option B: use closed format but retain the
    original content and count on short- and
    medium-term vendor support.
  • Option C: do both to buy time and look for an
    open, ASCII-based solution. (watch GML activity)
  • No sweet spot, just an evolving and changing mix
    of flawed options that are used in combination.

23
Geography Markup Language Issues
  • GML still more useful as a transfer format than
    an archival format, support limited even for
    transfer
  • Permanent access requirements
  • profiles and application schemas widely
    understood and supported, avoid requiring
    digital archaeology
  • role of GML Simple Features Profile?
  • Assessing formats for preservation:
    sustainability factors, quality and functionality
    factors
  • Apply same approach to GML profiles and
    application schemas?

24
Geography Markup Language Issues
  • Plans for environmental scan of existing GML
    profiles and application schemas
  • schema name (e.g. OSMM, top10NL, ESRI GML,
    LandGML)
  • responsible agency; schema has official
    government status?
  • GML version; known unsupported GML components
  • schema history; known interoperation with other
    schemas
  • vendor support; translator support; stability
    over time

25
Managing Time-versioned Content
26
Managing Time-versioned Content
  • Many local agency data layers continuously
    updated
  • E.g., some county cadastral data updated
    daily; older versions not generally available
  • Individual versioned datasets will wander off
    from the archive
  • How do users get current metadata/DRM/object
    from a versioned dataset found in the wild?
  • How do we certify concurrency and agreement
    between the metadata and the data?

27
Managing Time-versioned Content
  • Can we manage the relationship loosely using a
    persistent identifier link to a parent object?

Persistent ID Resolver
Parent Object Manager
version
version
version
version
version
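
The loose parent/version relationship in the diagram above could be sketched as follows. This is only an illustration under stated assumptions: the class names `ParentObject` and `Resolver`, the `example:` identifiers, and the metadata URL are all hypothetical and do not describe the project's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class ParentObject:
    """Hypothetical parent record tracking every archived version of a dataset."""
    parent_id: str
    title: str
    metadata_url: str  # where current metadata/DRM information would be published
    versions: dict[str, str] = field(default_factory=dict)  # version id -> ISO snapshot date

    def add_version(self, version_id: str, snapshot_date: str) -> None:
        self.versions[version_id] = snapshot_date

    def latest_version(self) -> str:
        # ISO 8601 dates sort chronologically when compared as strings.
        return max(self.versions, key=self.versions.get)


class Resolver:
    """Hypothetical persistent-ID resolver: maps a version id to its parent object."""

    def __init__(self) -> None:
        self._parent_of: dict[str, ParentObject] = {}

    def register(self, parent: ParentObject) -> None:
        for version_id in parent.versions:
            self._parent_of[version_id] = parent

    def resolve(self, version_id: str) -> ParentObject:
        return self._parent_of[version_id]


# Illustrative data only: a parcel layer with two archived snapshots.
parcels = ParentObject(
    "example:parcels",
    "County parcels (example)",
    "https://archive.example.org/parcels/metadata",
)
parcels.add_version("example:parcels/2001", "2001-01-01")
parcels.add_version("example:parcels/2004", "2004-01-01")

resolver = Resolver()
resolver.register(parcels)

# A user holding only the 2001 copy can find current metadata and the newest version.
found = resolver.resolve("example:parcels/2001")
print(found.metadata_url, found.latest_version())
```

The design choice illustrated here is that each version carries only an identifier, while metadata and rights information live with the parent and can change without touching copies already "in the wild".
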
28
Preservation Metadata Issues
  • FGDC Metadata
  • Many flavors, incoming metadata needs processing
  • Cross-walk elements to PREMIS, MODS?
  • Metadata wrapper/Content packaging
  • METS (Metadata Encoding and Transmission
    Standard) vs. other industry solutions
  • Need a geospatial industry solution for the
    METS-like problem
  • GeoDRM a likely trigger: wrapper to enforce
    licensing (MPEG 21 references in OGIS Web
    Services 3)
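
A minimal sketch of the METS-as-wrapper idea, assuming an FGDC record is already available as XML. The function name `build_mets`, the IDs `DMD1` and `FILE1`, the `USE` value, and the file URL are hypothetical, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Return a namespace-qualified tag name in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(fgdc_xml: str, data_href: str) -> ET.Element:
    """Wrap one FGDC record and one data file reference in a bare METS document."""
    mets = ET.Element(q(METS_NS, "mets"))

    # Descriptive metadata section: the FGDC record embedded as-is.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "FGDC"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    xml_data.append(ET.fromstring(fgdc_xml))

    # File section: a pointer to the archived data file.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "archival"})
    f = ET.SubElement(grp, q(METS_NS, "file"), {"ID": "FILE1"})
    ET.SubElement(
        f, q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): data_href},
    )

    # Structural map: ties the file to its descriptive metadata.
    smap = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(smap, q(METS_NS, "div"), {"DMDID": "DMD1"})
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "FILE1"})
    return mets


# Illustrative FGDC fragment (title only) and a made-up file location.
fgdc = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>Example parcels</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
mets = build_mets(fgdc, "https://archive.example.org/parcels/2004/parcels.zip")
print(ET.tostring(mets, encoding="unicode"))
```
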

29
Metadata Availability
30
Harnessing Geospatial Web Services
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Geospatial Web Service Types
  • Image services
  • Deliver image resulting from query against
    underlying data
  • Limited opportunity for analysis
  • Feature services
  • Stream actual feature data, greater opportunity
    for data analysis
  • Other
  • Geocoding services
  • Routing
  • etc.

37
(No Transcript)
38
Geospatial Web Services Rights Issues
Example: Desktop GIS-accessible ArcIMS
  • 39 of 100 NC counties have desktop GIS-accessible
    ArcIMS services
  • It is difficult to know how many of these
    counties actually expect users to either
  • A) access data through desktop GIS for viewing
    only, or
  • B) extract and download data

39
Harnessing Geospatial Web Services
  • Automated content identification
  • capabilities files, registries, catalog
    services
  • WMS (Web Map Service) for batch extraction of
    image atlases
  • last ditch capture option
  • preserve cartographic representation
  • retain records of decision-making process
  • feature services (WFS) later.
  • Rights issues in the web services space are
    ambiguous
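
As a rough illustration of batch extraction of an image atlas from a WMS server, the sketch below splits an area of interest into a grid and builds one WMS 1.1.1 GetMap request per tile. The server URL, layer name, and bounding box are made up, and the function name `getmap_urls` is hypothetical. It only builds URLs and does not download anything; whether harvesting is permitted depends on the rights issues noted above.

```python
from urllib.parse import urlencode


def getmap_urls(base_url, layer, bbox, rows, cols, width=1024, srs="EPSG:4326"):
    """Build WMS 1.1.1 GetMap URLs covering bbox as a rows x cols grid of tiles.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    # Keep pixels square by matching the image aspect ratio to the tile's.
    height = max(1, round(width * dy / dx))

    urls = []
    for r in range(rows):
        for c in range(cols):
            tile = (
                minx + c * dx,
                miny + r * dy,
                minx + (c + 1) * dx,
                miny + (r + 1) * dy,
            )
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty value requests the server's default style
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": width,
                "HEIGHT": height,
                "FORMAT": "image/png",
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Hypothetical server and layer; the bounding box is illustrative only.
urls = getmap_urls(
    "https://gis.example.org/wms", "parcels",
    bbox=(-79.0, 35.5, -78.0, 36.0), rows=2, cols=4,
)
print(len(urls))
print(urls[0])
```

Saving each tile together with its request parameters and capture date would record what the service showed at that time, which is the "last ditch capture" role the slide describes.
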

40
Web mash-ups and the New Mainstream Geospatial
Web Services
41
Preserving Cartographic Representation
42
Preserving Cartographic Representation
  • The true counterpart of the old map is not the
    GIS dataset, but rather the cartographic
    representation that builds on that data
  • Intellectual choices about symbolization, layer
    combinations
  • Data models, analysis, annotations
  • Cartographic representation typically encoded in
    proprietary files (.avl, .lyr, .apr, .mxd) that
    do not lend themselves well to migration
  • Symbologies have meaning to particular
    communities at particular points in time,
    preserving information about symbol sets and
    their meaning is a different problem

43
Preserving Cartographic Representation
  • Image-based approaches
  • Generate images using Map Book or similar tools
  • Harvest existing atlas images
  • Capture atlases from WMS servers
  • Export layouts or maps to image
  • Vector-based approaches
  • Store explicitly in the data format (e.g. Feature
    Class Representation in ArcGIS 9.2)
  • Archive and upward-migrate existing files .avl,
    .apr, .lyr, .mxd, etc.
  • SVG, VML or other XML approaches
  • Other?

44
Preserving Cartographic Representation
45
Preserving Cartographic Representation
46
Repository Architecture Issues
  • Interest in how geospatial content interacts with
    widely available digital repository software
  • Focus on salient, domain-specific issues
  • Challenge: remain repository-agnostic
  • Avoid imprinting on repository software
    environment
  • Preservation package should not be the same as
    the ingest object of the first environment
  • Tension between exploiting repository software
    features vs. becoming software dependent

47
Preserving Geodatabases
  • Spatial databases in general vs. ESRI Geodatabase
    format
  • Not just data layers and attributes; also
    topology, annotation, relationships, behaviors
  • ESRI Geodatabase archival issues
  • XML Export, Geodatabase History, File
    Geodatabase, Geodatabase Replication
  • Some looking to Geodatabase as archival platform
    (in addition to feature class export)

48
Geodatabase Availability
  • Local agencies, especially municipalities, are
    increasingly turning to the ESRI Geodatabase
    format to manage geospatial data.
  • According to the 2003 Local Government GIS Data
    Inventory, 10.0% of all county framework data and
    32.7% of all municipal framework data were
    managed in that format.

49
Evolving Geodatabase Handling Approaches
Project Stage | Planned Approach
Original Proposal (Nov. 2003) | Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size
Finalized Work Plan (Dec. 2004) | Also export content as Geodatabase XML
Possible Future Work Plan Changes | Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size
50
Efficient Content Replication
  • Content replication also needed for
  • Disaster preparedness
  • State and federal data improvement projects
  • Aggregation by regional geospatial web service
    providers
  • E.g., WFS: efficiency in complete content
    transfer?
  • Rsync-like function, plus rights management,
    inventory processes, metadata management,
    informed by data update cycles
  • Archiving delta files vs. complete replication:
    need to avoid requiring digital archaeology in
    the future
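
One way to picture the "rsync-like function" is a checksum manifest compared between two harvests, so that only new or changed files need to be transferred. This is a sketch under assumptions, not the project's method: the function names, file names, and checksum values are hypothetical, and rights management and metadata handling are out of scope here.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or changed between two manifests."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative manifests with made-up checksums, standing in for two harvests.
last_harvest = {"parcels.shp": "aaa", "parcels.dbf": "bbb", "roads.shp": "ccc"}
this_harvest = {"parcels.shp": "aaa", "parcels.dbf": "ddd", "zoning.shp": "eee"}

delta = diff_manifests(last_harvest, this_harvest)
print(delta)
```

Keeping a complete manifest with every snapshot, even when only the delta is transferred, is one way to make each version reconstructible later without digital archaeology.
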

51
Points of Engagement with the Open Geospatial
Consortium (OGC)
  • GML for archiving
  • GeoDRM -- Adding preservation use cases
  • Content Packaging -- Industry solution?
  • Web Services Context Documents
  • Can we save data state as well as application
    state?
  • Content Replication
  • Is this layer in the architecture?
  • Persistent Identifiers

52
Project Outcomes
  • Demonstration archive
  • Outreach activity: planting seeds
  • International, national, state, local, commercial
  • Learning experience, informing
  • Spatial data infrastructure
  • Commercial vendors (data/software/consulting)
  • Repository software communities
  • Metadata practice (both GIS and preservation)
  • Rights management developments
  • Data and interoperability standards

53
Content Identification and Selection
  • Work from NC OneMap Data Inventory
  • Combine with inventory information from various
    state agencies and from previous NCSU efforts
  • Develop methodology for selecting from among
    early, middle, and late stage products
  • Develop criteria for time series development
  • Investigate use of emerging Open Geospatial
    Consortium technologies in data identification

54
Content Acquisition
  • Work from NC OneMap Data Sharing Agreements as a
    starting point (the blanket)
  • Secure individual agreements (the quilt)
  • Investigate use of OGC technologies in capture
  • Explore use of METS as a metadata wrapper
  • Ingest FGDC metadata; crosswalk to MODS? PREMIS?
  • Maybe METS DRM short term, GeoDRM long term
  • Consider links to services; version management
  • Get the geospatial community to tackle the
    content packaging problem (maybe MPEG 21?)

55
Partnership Building
  • Work within context of the NC OneMap initiative
  • State, local, federal partnership
  • State expression of the National Map
  • Defined characteristic: Historic and temporal
    data will be maintained and available
  • Advisory Committee drawn from the NC Geographic
    Information Coordinating Council subcommittees
  • Seek external partners
  • National States Geographic Information Council
  • FGDC Historical Data Committee
  • more

56
Content Retention and Transfer
  • Ingest into DSpace
  • Explore how geospatial content interacts with
    existing digital repository software environments
  • Investigate re-ingest into a second platform
  • Challenge: keep the collection repository-agnostic
  • Start to define format migration paths
  • Special problem: geodatabases
  • Pursue long-term solution
  • Roles of data-producing agencies, state agencies,
    NC OneMap, NCSU

57
Project Status
  • Completing inventory analysis stage
  • Storage system and backup deployed
  • DSpace deployed to production
  • Metadata workflow finalized
  • Ingest workflow near finalization
  • Content migration workflow near finalization
  • Regional site visits planned for coming months
  • Wide range of outreach/collaboration: FGDC, ESRI,
    EDINA (JISC), USGS, OGC, TRB, etc.
  • Pilot project, georegistering digital archival
    geologic maps

58
Questions?
Contact: Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) 515-1361
Steven_Morris_at_ncsu.edu