Managing the Metadata Lifecycle: The Future of DDI at GESIS and ICPSR

1
Managing the Metadata Lifecycle: The Future of DDI at GESIS and ICPSR
  • Peter Granda, ICPSR
  • Meinhard Moschner, GESIS
  • Mary Vardigan, ICPSR
  • Joachim Wackerow, GESIS
  • Wolfgang Zenk-Möltgen, GESIS
2
Research Data Life Cycle
Concept
Collection
Processing
Archiving
Distribution
Discovery
Analysis
Repurposing
3
Current Uses of DDI
  • DDI 2 is used for many different purposes by many
    archival institutions, e.g., metadata records for
    data catalogs, export to Web-based information
    systems such as Nesstar, long-term preservation,
    and PDF codebooks
  • GESIS and ICPSR are developing procedures and
    systems to extend the use of DDI within their
    institutions

4
DDI 3 Expands in Scope
  • To date, DDI use has been mainly limited to the
    Distribution and Archiving stages of the data
    life cycle
  • DDI 3 introduces new elements and structures
    that extend markup to other stages of the life
    cycle, both earlier and later
  • The emphasis is on projects and tasks already
    under way at each institution
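The scope change can be pictured with a small model of the life cycle. This is a minimal sketch, assuming the eight stages of the life cycle diagram in the order Concept through Repurposing; `Stage`, `DDI2_STAGES`, and `stages_outside_ddi2` are hypothetical names, not part of any DDI tool.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Research data life cycle stages, numbered in assumed life-cycle order."""
    CONCEPT = 1
    COLLECTION = 2
    PROCESSING = 3
    ARCHIVING = 4
    DISTRIBUTION = 5
    DISCOVERY = 6
    ANALYSIS = 7
    REPURPOSING = 8


# Stages where DDI 2 markup has mainly been used so far.
DDI2_STAGES = {Stage.ARCHIVING, Stage.DISTRIBUTION}


def stages_outside_ddi2():
    """Return the stages before and after the span that DDI 2 covers."""
    first, last = min(DDI2_STAGES), max(DDI2_STAGES)
    earlier = [stage for stage in Stage if stage < first]
    later = [stage for stage in Stage if stage > last]
    return earlier, later


earlier, later = stages_outside_ddi2()
print("earlier:", [stage.name for stage in earlier])
print("later:", [stage.name for stage in later])
```

Using an ordered enumeration keeps "earlier" and "later" well defined without hard-coding either list.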

5
DDI 3 Use at GESIS
  • Structured Comments (Processing)
  • Translation of EVS Questionnaire (Collection)
  • Supporting Enhanced Publications (Analysis)
  • Continuity Guides, Trends by Concepts (Concept,
    Discovery, Repurposing)

6
Extracting structured information in the current workflow
  • Example: building derived variables in SPSS
  • SPSS setups contain commands and comments
  • Necessary steps for using SPSS setups as an
    information source for DDI:
      • Improving comments for automated extraction
          • formalize the layout
          • add keywords from a list
      • Extraction of structured comments and related
        commands by a custom tool
      • Transformation of this information into DDI 3
        fragments
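The extraction step can be sketched as a small parser. This is a minimal Python illustration, assuming a simplified layout that is not necessarily the GESIS convention: a structured comment starts with a line beginning `*` followed by the item path, each following line starts with a keyword from a fixed list (those used in the example on the next slide), the comment ends at a line ending in a period (as SPSS comments do), and the commands after it, up to the next blank line, are the item's source. `parse_setup` and `KEYWORDS` are hypothetical names.

```python
import re

# Keywords from the example on the next slide; longest alternatives first.
KEYWORDS = ["USED VARIABLES", "DESCRIPTION", "VERSION", "AUTHOR", "EMAIL", "NAME", "SOURCE"]
KEYWORD_RE = re.compile(r"^(%s)\b\s*(.*)$" % "|".join(KEYWORDS))


def parse_setup(text):
    """Return one dict per structured comment: path, keyword values, commands."""
    items, current, in_comment, last_key = [], None, False, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current, in_comment = None, False   # blank line closes the item
        elif line.startswith("*"):
            current = {"path": line.lstrip("*").strip().rstrip("."), "commands": []}
            items.append(current)
            in_comment, last_key = not line.endswith("."), None
        elif current is None:
            continue                            # ordinary command, no comment
        elif in_comment:
            ends = line.endswith(".")           # a final period ends an SPSS comment
            body = line[:-1].rstrip() if ends else line
            match = KEYWORD_RE.match(body)
            if match:
                last_key = match.group(1)
                current[last_key] = match.group(2).strip()
            elif last_key:
                current[last_key] += " " + body  # continuation of the previous keyword
            in_comment = not ends
        else:
            current["commands"].append(line)    # commands documented by the comment
    return items


SETUP = """\
* DerivedVariables/w101_new
  NAME w101_new
  DESCRIPTION w101_new is a derived variable from w101
  It has the original value from w101 when w102 is equal 1
  USED VARIABLES w101, w102
  SOURCE.
compute w101_new = 5.
if (w102 = 1) w101_new = w101.
"""

for item in parse_setup(SETUP):
    print(item["path"], item["USED VARIABLES"], item["commands"])
```

Relying on the keyword list rather than free text is what makes the comments machine-readable; lines that start with no keyword are treated as continuations.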

7
Extracting structured information in the current workflow
Structured comment in an SPSS setup:
  Variables/DerivedVariables
    DESCRIPTION This section is on derived variables .
  DerivedVariables/w101_new
    NAME w101_new
    DESCRIPTION w101_new is a derived variable from w101 It has the
      original value from w101 when w102 is equal 1 otherwise it has
      the value 5
    USED VARIABLES w101, w102
    SOURCE .
  compute w101_new = 5 .
  if ( w102 = 1 ) w101_new = w101 .
    VERSION 2009-04-18
    AUTHOR Achim Wackerow
    EMAIL joachim.wackerow@gesis.org .
Diagram: the Extractor reads the SPSS setup; its result is a Report (HTML)
and DDI 3 fragments (GenerationInstruction with Description and Command).
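The transformation into DDI 3 fragments could look roughly like this. The sketch is hedged: it uses Python's `xml.etree.ElementTree`, takes as input an item already extracted from the structured comment above (hard-coded and abridged here), and emits a simplified GenerationInstruction without DDI namespaces or schema validation. Apart from GenerationInstruction, Description, and Command, which appear on the slide, the element names (SourceVariableReference, ID, ProgramLanguage, CommandContent) and the `GI_<name>` id scheme are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def to_generation_instruction(item):
    """Build a simplified, namespace-free GenerationInstruction element."""
    gi = ET.Element("GenerationInstruction", id="GI_" + item["NAME"])
    ET.SubElement(gi, "Description").text = item["DESCRIPTION"]
    # One reference per variable listed under USED VARIABLES.
    for name in item["USED VARIABLES"].split(","):
        ref = ET.SubElement(gi, "SourceVariableReference")
        ET.SubElement(ref, "ID").text = name.strip()
    command = ET.SubElement(gi, "Command")
    ET.SubElement(command, "ProgramLanguage").text = "SPSS"
    ET.SubElement(command, "CommandContent").text = "\n".join(item["commands"])
    return gi


# Values abridged from the structured comment in the example above.
item = {
    "NAME": "w101_new",
    "DESCRIPTION": "w101_new is a derived variable from w101",
    "USED VARIABLES": "w101, w102",
    "commands": ["compute w101_new = 5.", "if (w102 = 1) w101_new = w101."],
}
fragment = to_generation_instruction(item)
print(ET.tostring(fragment, encoding="unicode"))
```

A real exporter would add the DDI 3 namespaces and validate against the schema; the point here is only the mapping from comment keywords to fragment elements.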
8
Translation of EVS Questionnaire
DSDM
http://zacat.gesis.org
9
Supporting Enhanced Publications
Publications with references to data: a DDI 3.1 URN contains agency, object,
and version, e.g.
<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>
Resolving a URN cited in a publication (diagram):
  • The publication with references (URNs) asks the DDI Alliance to find
    the agency gesis.de.ddi
  • The DDI Alliance returns the resolver address http://resolve.gesis.org
  • The resolver is asked to find the object and returns the URL of the
    documentation and/or data, e.g. http://www.gesis.org/doc/docxyz
  • The document is requested from that URL and returned
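The two-step resolution can be sketched in Python. The URN pattern below is inferred from the single example on the slide and may not cover every valid DDI 3.1 URN; the two dictionaries stand in for the DDI Alliance agency registry and the GESIS resolver, and their contents (including the mapping to the slide's placeholder URL `docxyz`) are hypothetical.

```python
import re

URN_RE = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):"
    r"(?P<maintainable>\w+)\.(?P<object>\w+)="
    r"(?P<agency>[^:]+):"
    r"(?P<scheme_id>[^()]+)\((?P<scheme_version>[^)]+)\)\."
    r"(?P<object_id>[^()]+)\((?P<object_version>[^)]+)\)$"
)

# Hypothetical stand-in for the DDI Alliance registry: agency -> resolver.
AGENCY_REGISTRY = {"gesis.de.ddi": "http://resolve.gesis.org"}

# Hypothetical stand-in for each resolver: object key -> document URL.
RESOLVERS = {
    "http://resolve.gesis.org": {
        ("ZA3811_VarSch", "1_0", "V8", "1_0"): "http://www.gesis.org/doc/docxyz",
    },
}


def parse_urn(urn):
    """Split a DDI URN into agency, object, and version parts."""
    match = URN_RE.match(urn)
    if match is None:
        raise ValueError(f"not a recognised DDI URN: {urn}")
    return match.groupdict()


def resolve(urn):
    """Step 1: agency -> resolver address. Step 2: object -> document URL."""
    parts = parse_urn(urn)
    resolver = AGENCY_REGISTRY[parts["agency"]]
    key = (parts["scheme_id"], parts["scheme_version"],
           parts["object_id"], parts["object_version"])
    return RESOLVERS[resolver][key]


urn = "urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)"
print(parse_urn(urn)["agency"], resolve(urn))
```

Splitting agency lookup from object lookup means a publication never hard-codes a document URL; only the agency's resolver needs to know where the object currently lives.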
10
Supporting Enhanced Publications
DSDM DDI 3 EPE Simple Export Wizard 1.2.0
11
Grouping Trends
  • Continuity guides in different contexts
      • Synoptical question / variable lists
      • Documentation of changes in question wording /
        answer scales
      • Systematic organization by conceptual categories
  • CodebookExplorer tool (relational DB)
      • Publication as HTML links at variable level in
        ZACAT
  • Taking advantage of DDI 3 in the future
      • Defining the standard and comparison
      • Qualifying relations (e.g., question text
        modified, scale modified)

12
Continuity guides
Literal question text over time
Conceptual categories
Deviations in answer categories
13
Trends by concepts
Trend variables by study
Conceptual categories
Country 1
Country 2
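A trends-by-concepts view is essentially a regrouping of variable metadata. The sketch below, in Python, assumes each variable record carries a study, year, country, conceptual category, and variable name; the function name and the toy records (`S1`, `concept A`, and so on) are invented placeholders, not GESIS data.

```python
from collections import defaultdict


def trends_by_concept(records):
    """Group variables by conceptual category, then country, ordered by year."""
    table = defaultdict(lambda: defaultdict(list))
    for rec in records:
        row = (rec["year"], rec["study"], rec["variable"])
        table[rec["concept"]][rec["country"]].append(row)
    return {
        concept: {country: sorted(rows) for country, rows in by_country.items()}
        for concept, by_country in table.items()
    }


# Invented placeholder records.
records = [
    {"study": "S2", "year": 1999, "country": "Country 1", "concept": "concept A", "variable": "v12"},
    {"study": "S1", "year": 1990, "country": "Country 1", "concept": "concept A", "variable": "v7"},
    {"study": "S1", "year": 1990, "country": "Country 2", "concept": "concept A", "variable": "v7"},
    {"study": "S2", "year": 1999, "country": "Country 2", "concept": "concept B", "variable": "v30"},
]

for concept, by_country in trends_by_concept(records).items():
    for country, rows in by_country.items():
        print(concept, country, rows)
```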
14
  • Comparison map
  • Equivalency
  • Relationship
  • Description

DDI 3 RESOURCE (ex-post standard): Universe, Concept, Data Collection, Logical Product
Data Collection:
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Q">
    <dc:QuestionText>
      <dc:LiteralText>
        <dc:Text>Do you …?</dc:Text>
      </dc:LiteralText>
    </dc:QuestionText>
    <dc:CodeDomain>
      <r:CodeSchemeReference>
        <r:ID>CODS1</r:ID>
      </r:CodeSchemeReference>
Logical Product:
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
<l:CodeScheme id="CODS1">
  <l:CategorySchemeReference>
    <r:ID>CATS1</r:ID>
  </l:CategorySchemeReference>
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>1</l:Value>
  </l:Code>
STUDY UNIT 1 … n
Data Collection:
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Qn">
    <dc:Text>Have you …?</dc:Text>
Logical Product:
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
<l:CodeScheme id="CODS1">
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>4</l:Value>
  </l:Code>
Relation: question text modified; label identical
GROUP STUDY UNIT 8-14: Data Collection, Logical Product
Relation: values different; generation instruction; scale reversed
GROUP STUDY UNIT 15-x: Data Collection, Logical Product
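The qualified relations on this slide can be derived mechanically once a study item and the ex-post standard share category labels. This is a minimal Python sketch, not the GESIS implementation: items are reduced to a question text and a label-to-code mapping, and all labels besides "often" and all code values besides the slide's 1 and 4 are invented for illustration. The recode map plays the role of the generation instruction needed when values differ.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    question_text: str
    codes: dict = field(default_factory=dict)   # category label -> code value


def qualify(standard, other):
    """Describe how `other` relates to the ex-post `standard` item."""
    relations = ["question text identical" if standard.question_text == other.question_text
                 else "question text modified"]
    if set(standard.codes) != set(other.codes):
        return relations + ["labels differ"]
    relations.append("labels identical")
    if standard.codes != other.codes:
        relations.append("values different")
        by_standard = sorted(standard.codes, key=standard.codes.get)
        by_other = sorted(other.codes, key=other.codes.get)
        # Same labels in exactly opposite code order means a reversed scale.
        if len(by_standard) > 1 and by_standard == by_other[::-1]:
            relations.append("scale reversed")
    return relations


def recode_map(standard, other):
    """Map each code in `other` to the standard code for the same label."""
    return {other.codes[label]: standard.codes[label] for label in standard.codes}


standard = Item("Do you ...?", {"often": 1, "sometimes": 2, "rarely": 3, "never": 4})
study = Item("Have you ...?", {"often": 4, "sometimes": 3, "rarely": 2, "never": 1})
print(qualify(standard, study))
print(recode_map(standard, study))
```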
15
DDI 3 Use at ICPSR
  • Information collected from data producers in the
    pre-collection phase (Concept)
  • Metadata output from CAI applications (Data
    Collection)
  • Processors' dashboard (Data Processing)
  • Metadata mining: new faceted search tool to
    facilitate discovery through more precise
    searching (Data Discovery)
  • Relational database for comparison and
    harmonization across studies (Repurposing)
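Faceted search narrows a free-text hit list with exact-match filters and reports how many hits fall under each facet value, so users can refine step by step. A minimal Python sketch, assuming variable-level records with `label`, `study`, and `concept` fields; the function name, fields, and records are hypothetical and do not describe ICPSR's actual tool.

```python
from collections import Counter


def faceted_search(records, text="", filters=None, facets=("study", "concept")):
    """Return matching records plus per-facet value counts over the matches."""
    filters = filters or {}
    hits = [
        rec for rec in records
        if text.lower() in rec["label"].lower()
        and all(rec.get(key) == value for key, value in filters.items())
    ]
    counts = {facet: Counter(rec[facet] for rec in hits) for facet in facets}
    return hits, counts


# Hypothetical variable-level metadata records.
records = [
    {"label": "Age of respondent", "study": "Study A", "concept": "demographics"},
    {"label": "Age at first job", "study": "Study B", "concept": "employment"},
    {"label": "Household income", "study": "Study A", "concept": "income"},
]

hits, counts = faceted_search(records, text="age")
print(len(hits), dict(counts["study"]))
hits, counts = faceted_search(records, text="age", filters={"study": "Study A"})
print([rec["label"] for rec in hits])
```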

16
SMDS Metadata Modules
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
  • A combination of this information forms a
    traditional SIP.
  • Information sent to the archive from each life
    cycle stage can be understood as a dynamic SIP.
  • Self-archiving via web forms can be offered for
    the different stages.

Diagram, life cycle stages delivering to the archive: Concept, Collection, Processing
  • The structured metadata combined with data forms
    the core of the archive.
  • It would be organised so that metadata can be
    reused and information can be ingested and
    distributed dynamically.
  • An AIP must be built specifically, because the
    metadata may contain only references to other,
    reused metadata.
  • An AIP should include everything belonging to one
    study; DDI can also serve as the main structure of
    the AIP, and data can be included inline in DDI.
    The AIP would exist alongside the core structure
    in the archive.
  • An easy roundtrip should be possible between the
    core structure and the AIP.
  • The purpose of the AIP is comparable to that of
    PDF/A, in which all fonts are embedded.
  • The core structure is geared toward efficient
    processing and reuse of metadata.

Diagram (OAIS):
  • Sources: CAI tools (MQDS etc.), information extracted from SPSS etc.,
    custom tools (e.g., forms-based), delivering a SIP
  • Archive: DDI as backbone for structured metadata; AIP; data /
    documents outside of DDI
  • DIP: distribution packages, Web information system, statistical
    packages, online analysis, search engines
  • Life cycle stages shown: Repurposing, Distribution, Discovery, Analysis
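The difference between the reference-based core structure and a self-contained AIP can be made concrete with a small sketch. It assumes, for illustration only, that the core store is a dictionary of metadata items keyed by identifier, each listing the identifiers it references. Building an AIP collects everything a study reaches through references (much as PDF/A embeds fonts), and ingesting an AIP merges it back so that shared items are stored once. All names and items are hypothetical.

```python
def build_aip(store, study_id):
    """Collect a study and every item it references, directly or indirectly."""
    items, stack = {}, [study_id]
    while stack:
        item_id = stack.pop()
        if item_id in items:
            continue
        item = store[item_id]              # KeyError signals a dangling reference
        items[item_id] = item
        stack.extend(item.get("refs", []))
    return {"root": study_id, "items": items}


def ingest_aip(store, aip):
    """Merge an AIP back into a core store; identical shared items are kept once."""
    for item_id, item in aip["items"].items():
        if store.setdefault(item_id, item) != item:
            raise ValueError(f"conflicting content for {item_id}")
    return aip["root"]


# Hypothetical core store: two studies reuse the same standard question.
core = {
    "study1": {"title": "Study 1", "refs": ["var1"]},
    "study2": {"title": "Study 2", "refs": ["var2"]},
    "var1": {"name": "v1", "refs": ["std_q"]},
    "var2": {"name": "v2", "refs": ["std_q"]},
    "std_q": {"text": "Do you ...?"},
}

aip = build_aip(core, "study1")
print(sorted(aip["items"]))

# Roundtrip: ingest into an empty store and rebuild the same package.
copy = {}
ingest_aip(copy, aip)
assert build_aip(copy, "study1") == aip
```

The roundtrip check mirrors the requirement that moving between core structure and AIP be easy and lossless.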
23
  • DDI-based archive as a collection of reusable
    components
  • Metadata in DDI is structured in small items
    that can be identified and maintained by one or
    more institutions
  • These parts can be
      • the basis for comparison and metadata mining
        (discovery of new relationships)
      • candidates for reuse in other studies or new
        studies (like standard questions or variables)
  • Repository of reusable components
      • Standard concepts
      • Standard questions
      • Standard variables
      • Harmonized information
      • Controlled vocabularies

Diagram: existing studies, each consisting of study-specific information and
items for reuse, shown next to a new study.
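Treating reusable components as identified items turns metadata mining into simple set arithmetic. The sketch below assumes each component is identified by an (agency, identifier, version) triple, loosely following the agency/object/version parts of a DDI 3.1 URN, and that each study is reduced to the set of components it references. It reports which pairs of studies share components and how often each component is reused. The agency and identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reusable components: (agency, identifier, version).
CONCEPT_A = ("example.agency", "ConceptA", "1_0")
QUESTION_A = ("example.agency", "StdQuestionA", "1_0")
VARIABLE_B = ("example.agency", "StdVariableB", "2_0")


def shared_components(studies):
    """Map each pair of studies to the components both of them reference."""
    shared = {}
    for (a, refs_a), (b, refs_b) in combinations(sorted(studies.items()), 2):
        common = refs_a & refs_b
        if common:
            shared[(a, b)] = common
    return shared


def reuse_counts(studies):
    """Count how many studies reference each component."""
    return Counter(ref for refs in studies.values() for ref in refs)


studies = {
    "Study 1": {CONCEPT_A, QUESTION_A, ("example.agency", "Study1Local", "1_0")},
    "Study 2": {CONCEPT_A, QUESTION_A, VARIABLE_B},
    "New study": {QUESTION_A, VARIABLE_B},
}
for pair, common in shared_components(studies).items():
    print(pair, sorted(common))
print(reuse_counts(studies).most_common(1))
```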
24
Issues for Discussion
  • Advantages and disadvantages of seeking to
    capture additional metadata throughout the data
    life cycle
  • How much information to make available to funding
    agencies, data producers, and secondary users?
  • Rules for structured documentation and delivery
    of items to archives for preservation
  • An overall DDI tool to capture and curate all
    metadata and data: the Holy Grail?