1
Investigating Metadata for Long-Lived Geospatial
Resources: An Exploration
By Nancy J. Hoebelheinrich, Metadata Coordinator
Digital Library Systems & Services
Stanford Digital Repository
2
To Be Discussed
  • The Study
  • Question asked
  • Methodology used
  • MD standards strengths / weaknesses
  • Conclusions & Recommendations
  • Future work needed

3
The Study
  • NGDA Project
  • Pertinent project objective
  • Collect and archive major segments of at-risk
    digital geospatial data and images
  • Partners / Backgrounds / Areas of experience
  • UCSB, Alexandria Digital Library / Presentation
  • Stanford Libraries, Stanford Digital Repository
    / Preservation
  • Differences in experiences gave rise to study
    question

4
Study Question
  • What metadata is needed for long-lived geospatial
    data formats?
  • Grounded in previous studies
  • Hunolt paper for USGCRP Office
  • Digital Preservation Coalition (UK)
  • NSF
  • OAIS Reference model
  • Duerr, Parsons articles
  • OCLC / RLG Preservation studies

5
Methodology used
  • Evaluate four fairly typical geospatial data
    formats
  • Shapefiles, DOQQs, DRGs & Landsat 7 satellite
    images (preliminary)
  • Compare / contrast 3 different approaches to
    documenting
  • FGDC Content Standard
  • CIESIN Geospatial Electronic Records (GER)
  • PREMIS

6
Categories of information about the resources
  • Environment (computer platforms)
  • Semantic Underpinnings
  • Domain specific terminology
  • Provenance
  • Data trustworthiness
  • Data quality
  • Appropriate use

7
Environment (computing platform)
  • Definition: characteristics of the hw / sw
    configuration that allow a resource to function
    properly
  • Function could be defined as:
  • Rendering
  • Viewing
  • Using
  • May need to be repeated

8
Environment, cont.
  • All 3 systems have means for documenting these
    characteristics
  • Both PREMIS and GER provide more granularity &
    parsability, e.g., creatingApplication, sw, hw
    name, versions & dependencies, environment type,
    etc.
  • FGDC uses technical prerequisites & native
    data set
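
The granularity described above can be pictured as a small structured record. This is a minimal sketch, not the PREMIS or GER schema: the class names (`Software`, `Environment`), the field names, and all sample values are assumptions made for illustration. Only `creatingApplication` and the rendering / viewing / using functions come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

# Functions named on the slide: rendering, viewing, using.
ALLOWED_FUNCTIONS = {"rendering", "viewing", "using"}


@dataclass
class Software:
    """Software the resource depends on (hypothetical structure)."""
    name: str
    version: str
    dependencies: List[str] = field(default_factory=list)


@dataclass
class Environment:
    """hw / sw configuration that allows a resource to function properly."""
    function: str              # one of ALLOWED_FUNCTIONS
    environment_type: str      # free-text label, e.g. "desktop GIS"
    creating_application: str  # cf. creatingApplication in PREMIS
    software: List[Software] = field(default_factory=list)
    hardware: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject functions the slide does not name.
        if self.function not in ALLOWED_FUNCTIONS:
            raise ValueError(f"unknown function: {self.function!r}")


# The environment "may need to be repeated": one record per function.
# All names and versions below are made-up placeholders.
environments = [
    Environment(
        function="viewing",
        environment_type="desktop GIS",
        creating_application="ExampleGIS 1.0",
        software=[Software("ExampleViewer", "2.3", ["ExampleRuntime 1.4"])],
        hardware=["x86 workstation"],
    ),
    Environment(
        function="using",
        environment_type="desktop GIS",
        creating_application="ExampleGIS 1.0",
        software=[Software("ExampleGIS", "1.0")],
        hardware=["x86 workstation"],
    ),
]

for env in environments:
    names = ", ".join(f"{s.name} {s.version}" for s in env.software)
    print(f"{env.function}: {names}")
```

Keeping name, version, and dependencies as separate fields is what makes the record parsable by machine, in contrast to a single free-text prerequisites statement.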

9
Semantic Underpinnings
  • Detailed concepts
  • Meaning or essence of data
  • Significance of data, i.e., why preserve it?
  • Purpose or function served by data
  • Intended community
  • FGDC & GER have a fairly extensive set,
    particularly GER
  • PREMIS NOT descriptive or domain specific, so
    not covered

10
Domain specific terminology
  • For geospatial, particularly valuable
  • Keywords associated with data themes
  • Spatial coverage
  • Time period
  • Stratum coverage / place names
  • GER & FGDC cover
  • PREMIS NOT descriptive MD

11
Provenance
  • Detailed concepts
  • Info about the events, parameters & source data
    associated w/ construction of data set prior to
    ingestion
  • Source of data
  • Changes made to data inside the preservation
    archive
  • FGDC, GER, PREMIS all ok for 1st 2
  • FGDC NOT for last

12
Provenance, cont.
  • Greater level of granularity / parsability in
    PREMIS using Object, Event & Agent entities
  • See Example for Rumsey Historical Map Image
    Collection about descriptive MD transformation

13
Use of PREMIS Event Data Elements
  • Example Event 1
  • Transform of descriptive MD from MS Access db
    to XML MODS
  • Why this event?
  • In case of questions from outside data provider
  • Retain singular scripts & transform mechanisms

14
PREMIS Event Excerpt (v1.1)
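
The excerpt shown on this slide is not part of the transcript. As a stand-in, here is a minimal sketch of how an event like Example Event 1 might be assembled. The element names are those of the Event entity in the PREMIS data dictionary; the XML namespace and schema validation are left out, and every value (identifiers, date, agent, outcome) is a hypothetical placeholder, not data from the Rumsey collection.

```python
import xml.etree.ElementTree as ET


def add_text(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append a child element holding only text."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_event(event_id: str, event_type: str, date_time: str,
                detail: str, outcome: str,
                agent_id: str, object_id: str) -> ET.Element:
    """Build a PREMIS-style event element (no namespace, not validated)."""
    event = ET.Element("event")

    ident = ET.SubElement(event, "eventIdentifier")
    add_text(ident, "eventIdentifierType", "local")
    add_text(ident, "eventIdentifierValue", event_id)

    add_text(event, "eventType", event_type)
    add_text(event, "eventDateTime", date_time)
    add_text(event, "eventDetail", detail)

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    add_text(outcome_info, "eventOutcome", outcome)

    # Who or what carried out the event (the Agent entity).
    agent = ET.SubElement(event, "linkingAgentIdentifier")
    add_text(agent, "linkingAgentIdentifierType", "local")
    add_text(agent, "linkingAgentIdentifierValue", agent_id)

    # Which object the event acted on (the Object entity).
    obj = ET.SubElement(event, "linkingObjectIdentifier")
    add_text(obj, "linkingObjectIdentifierType", "local")
    add_text(obj, "linkingObjectIdentifierValue", object_id)
    return event


# Placeholder values for a descriptive-metadata transformation event.
event = build_event(
    event_id="event-0001",
    event_type="metadata transformation",
    date_time="2007-01-01T00:00:00",
    detail="Descriptive MD exported from MS Access db and transformed to XML MODS",
    outcome="success",
    agent_id="transform-script-01",
    object_id="map-image-0001",
)

print(ET.tostring(event, encoding="unicode"))
```

Recording the script as the linking agent is one way to keep a pointer to the transform mechanism in case the data provider asks about it later.
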
15
Use of PREMIS Event Data Elements
  • Example Event 2
  • Merge c:\temp\states1 c:\temp\states2
    c:\temp\USA
  • (includes the merge process and the data sources)
  • Advantage: can describe events once in
    repository, unlike FGDC, but
  • Can include if prior to ingestion?
  • Why this event?
  • Important to describe processes during different
    phases of lifecycle, even prior to ingestion
  • Not being able to do so is problematic for
    geospatial resources
  • Is a best-practice issue for this domain

16
Data trustworthiness
  • Detailed concepts
  • Who are parties responsible for creation,
    development, storage, maintenance of data set
  • Where is data located
  • How is data available
  • What important factors about the data should be
    preserved

17
Data trustworthiness, cont.
  • Parties
  • Location of data
  • Factors to preserve
  • FGDC, only originator; GER & PREMIS more
    granular & parsable
  • GER & FGDC seem more specific & less inclusive
    (only POV of 'distributor') for last 2

18
Data Quality
  • Detailed concepts
  • General condition statement
  • Accuracy of the data
  • Fidelity of relationships within the data set
  • Accuracy of measurements of the data
  • FGDC has tags, but they are very specific
  • GER not much coverage here

19
Appropriate use
  • Detailed concepts
  • Legal use and liability statements
  • Technical characteristics that impact use
  • FGDC & PREMIS have, GER NOT
  • FGDC NOT, GER & PREMIS have means of linking to
    format registry info
  • More about format registries, later
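
The category-by-category comparison on the preceding slides can be collected into one lookup table. This is a sketch under stated assumptions: the values are one reading of the flattened bullets, "unclear" marks cells the slides do not settle, data trustworthiness is left out because the slides compare it by degree rather than yes / no, and the function names `gaps` and `uncovered` are invented for illustration.

```python
from typing import Dict, List

# Coverage as read from the slides: "yes", "no", "limited", or "unclear".
COVERAGE: Dict[str, Dict[str, str]] = {
    "environment":
        {"FGDC": "yes", "GER": "yes", "PREMIS": "yes"},
    "semantic underpinnings":
        {"FGDC": "yes", "GER": "yes", "PREMIS": "no"},
    "domain-specific terminology":
        {"FGDC": "yes", "GER": "yes", "PREMIS": "no"},
    "provenance: before ingestion":
        {"FGDC": "yes", "GER": "yes", "PREMIS": "yes"},
    "provenance: changes inside archive":
        {"FGDC": "no", "GER": "unclear", "PREMIS": "yes"},
    "data quality":
        {"FGDC": "yes", "GER": "limited", "PREMIS": "unclear"},
    "appropriate use: legal / liability":
        {"FGDC": "yes", "GER": "no", "PREMIS": "yes"},
    "appropriate use: format registry link":
        {"FGDC": "no", "GER": "yes", "PREMIS": "yes"},
}


def gaps(standard: str) -> List[str]:
    """Categories the slides say a standard does not cover."""
    return [cat for cat, row in COVERAGE.items() if row[standard] == "no"]


def uncovered(standards: List[str]) -> List[str]:
    """Categories for which none of the given standards is a clear 'yes'."""
    return [
        cat for cat, row in COVERAGE.items()
        if not any(row[s] == "yes" for s in standards)
    ]


for name in ("FGDC", "GER", "PREMIS"):
    print(name, "gaps:", gaps(name))

print("FGDC + PREMIS uncovered:", uncovered(["FGDC", "PREMIS"]))
```

Under this reading, no single standard is a clear "yes" in every row, while FGDC and PREMIS taken together leave no listed category uncovered.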

20
PREMIS Significant properties
  • Way within PREMIS to document:
  • Data trustworthiness: data creator / provider
    reliable & authentic
  • Data quality: describing completeness, logical
    consistency, attribute accuracy
  • Data provenance: processes & sources for dataset
    understandable & reliable
  • Appropriate use: understanding of the specific
    needs of the designated community?
  • Other important factors to preserve
  • More work needs to be done in this area

21
Strengths & weaknesses: FGDC
  • Rich in detail
  • Specificity for the geospatial domain
  • Ubiquity
  • Very complex & laborious to complete
  • Poor means for describing relationships among
    file components of a digital resource
  • No way to describe digital resource once within
    preservation archive

22
Strengths & weaknesses: GER
  • Focus on archiving
  • Comprehensive
  • Little known as yet
  • No data dictionary, so unclear how to apply tags
    (cardinality, repeatability, etc.)
  • Relational DB format
  • Unclear if and/or how to describe digital
    resource once within preservation archive

23
Strengths & weaknesses: PREMIS
  • Applicable at many levels of digital resource,
    abstract & physical
  • Capability for describing relationships among
    file components of digital resource
  • Capability for describing digital resource during
    its entire lifecycle within the preservation
    archive
  • Generic & focused upon preservation
  • Not specific enough for geospatial
  • Does not include critical semantics or
    descriptive information important for using
    digital resource
  • Fairly young specification; unclear how to
    document significant properties for digital
    resources

24
Recommendations
  • Use of content standard (e.g., FGDC, or ISO when
    it replaces FGDC)
  • Best used for semantics, domain specific
    terminology
  • PREMIS
  • Best used for management of resources over time
    using
  • PREMIS Object entity
  • PREMIS Event entity
  • PREMIS Agent entity
  • Useful to package resources & metadata together
    to facilitate tracking of aggregation of
    resource(s), MD, resource structure & file
    inventory, e.g., METS
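
The packaging idea can be sketched as a plain manifest that lists data files and metadata files side by side. This is not METS: the manifest layout, the role labels, the function names, and the file names are all assumptions made for illustration, and the file contents are placeholder bytes standing in for real shapefile components and metadata records.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def inventory_entry(name: str, role: str, data: bytes) -> Dict[str, object]:
    """Describe one file: name, role, size in bytes, SHA-256 checksum."""
    return {
        "name": name,
        "role": role,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_package(package_id: str,
                  files: List[Tuple[str, str, bytes]]) -> Dict[str, object]:
    """Bundle the file inventory and the set of roles under one package id."""
    return {
        "package_id": package_id,
        "roles": sorted({role for _, role, _ in files}),
        "files": [inventory_entry(n, r, d) for n, r, d in files],
    }


# Placeholder contents; a real package would read these from disk.
files = [
    ("states.shp", "data", b"placeholder geometry"),
    ("states.shx", "data", b"placeholder index"),
    ("states.dbf", "data", b"placeholder attributes"),
    ("states_fgdc.xml", "descriptive-md", b"<metadata/>"),
    ("states_premis.xml", "preservation-md", b"<premis/>"),
]

package = build_package("example-package-0001", files)
print(json.dumps(package, indent=2))
```

Listing every component file with a checksum is what lets an archive notice a missing or altered part of a multi-file resource such as a shapefile.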

25
Issues & Challenges
  • What if domain specific MD is not available?
  • If not, how can one get important info from data
    creators?
  • How to determine what is truly necessary for use
    of data sets?
  • Establishment of geospatial format registries
  • Getting buy-in from geospatial domains for use of
    vocabularies, etc. (see Global Spatial Data
    Infrastructure, http://www.gsdi.org/Default.asp)
  • More research needs to be done on significant
    properties, like that done by JISC & DPC studies,
    e.g., SPs of vector images

26
Future directions for NGDA Project
  • Further investigation of other geospatial formats,
    including more vector-based data, such as:
  • layers of the National Atlas
  • National Map (sections of California)
  • Landsat 7 ETM imagery
  • Derived data sets from Stanford faculty

27
Future directions, cont.
  • Format Registry investigation - what should be
    included in a format registry for geospatial
  • Contact with key vendors, e.g. ESRI,
    SafeSoftware, etc.
  • Monitoring what others are doing with e-science
    & social science data sets, e.g.,
  • NCSU, Johns Hopkins
  • National Australian Archive (NAA)
  • JISC and DPC in the UK
  • NDIIPP US Multi-state project
  • Those using new DDI v 3.0 schema

28
References, contact info
  • JISC & DPC studies on significant properties:
    http://www.dpconline.org/graphics/events/080407workshop.html
  • See Duce and Nielsen papers
  • Full paper available at
    http://www.ngda.org/research.php
  • National Geospatial Digital Archive:
    http://www.ngda.org/index.php
  • Examples of METS with PREMIS on METS public wiki
  • http://www.socialtext.net/mim-2006/index.cgi?profile_playground

29
Questions? / comments?
  • Nancy J. Hoebelheinrich
  • nhoebel_at_stanford.edu
  • John Banning jwbanning_at_gmail.com