Title: Managing the Metadata Lifecycle: The Future of DDI at GESIS and ICPSR
1. Managing the Metadata Lifecycle: The Future of DDI at GESIS and ICPSR
Meinhard Moschner, GESIS
Mary Vardigan, ICPSR
Joachim Wackerow, GESIS
Wolfgang Zenk-Möltgen, GESIS
2. Research Data Life Cycle
Concept
Collection
Processing
Archiving
Distribution
Discovery
Analysis
Repurposing
3. Current Uses of DDI
- DDI 2 is used for many different purposes by many different archival institutions, e.g., metadata records for data catalogs, export to Web-based information systems such as Nesstar, long-term preservation, and PDF codebooks
- GESIS and ICPSR are developing procedures and systems to extend the use of DDI in their institutions
4. DDI 3 Expands in Scope
- To date, use has been mainly limited to the Distribution and Archiving stages of the data life cycle
- DDI 3 enables use of new elements and structures to extend markup to other stages of the life cycle, both earlier and later
- Emphasis is on projects and tasks already in process at each institution
5. DDI 3 Use at GESIS
- Structured Comments (Processing)
- Translation of EVS Questionnaire (Collection)
- Supporting Enhanced Publications (Analysis)
- Continuity Guides / Trends by Concepts (Concept, Discovery, Repurposing)
6. Extracting structured information in current workflow
- Example: building derived variables by SPSS
- SPSS setups contain commands and comments
- Necessary steps for using SPSS setups as information source for DDI
  - Improving comments for automated extraction
    - formalize layout
    - add keywords from a list
  - Extraction of structured comments and related commands by custom tool
  - Transformation of this information into DDI 3 fragments
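The extraction step can be sketched in Python. This is a minimal illustration, not the custom tool mentioned on the slide: the comment layout (a keyword followed by a colon), the keyword list, and the function name `extract_fields` are all assumptions made for the example.

```python
import re

# Hypothetical keyword list; the actual list is not given in the slides.
KEYWORDS = ["NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE", "VERSION", "AUTHOR"]

# Assumed layout: "KEYWORD: value" at the start of a line.
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(re.escape(k) for k in KEYWORDS))


def extract_fields(comment_lines):
    """Collect keyword fields from the lines of one structured comment.

    Lines that do not start with a keyword continue the previous field.
    """
    fields = {}
    current = None
    for raw in comment_lines:
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            fields[current] = (fields[current] + " " + line).strip()
    return fields


sample = [
    "NAME: w101_new",
    "DESCRIPTION: w101_new is a derived variable from w101.",
    "It has the value 5 unless w102 equals 1.",
    "USED VARIABLES: w101, w102",
    "SOURCE:",
    "compute w101_new = 5.",
    "if (w102 = 1) w101_new = w101.",
    "VERSION: 2009-04-18",
]

fields = extract_fields(sample)
print(fields["USED VARIABLES"])
print(fields["SOURCE"])
```

A formalized layout plus a closed keyword list is what makes this kind of line-by-line parsing reliable; free-form comments would need far more heuristics.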
7. Extracting structured information in current workflow
v Variables/DerivedVariables
DESCRIPTION This section is on derived variables.
v DerivedVariables/w101_new
NAME w101_new
DESCRIPTION w101_new is a derived variable from w101. It has the original value from w101 when w102 is equal to 1; otherwise it has the value 5.
USED VARIABLES w101, w102
SOURCE .
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
VERSION 2009-04-18
AUTHOR Achim Wackerow
EMAIL joachim.wackerow_at_gesis.org .
[Diagram labels: SPSS, Extractor, Report (HTML), DDI 3 fragments (GenerationInstruction: Description, Command), Result]
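The transformation of extracted fields into an XML fragment might look roughly like the following Python sketch. The element names are taken from the diagram labels (GenerationInstruction, Description, Command); the attributes `variable` and `language` and the function `build_fragment` are hypothetical, and the output is not checked against the DDI 3 schema.

```python
import xml.etree.ElementTree as ET


def build_fragment(name, description, command):
    """Build an illustrative XML fragment for one derived variable.

    Element names follow the slide labels; attributes are assumptions
    and the result is not schema-valid DDI 3.
    """
    root = ET.Element("GenerationInstruction", {"variable": name})
    ET.SubElement(root, "Description").text = description
    cmd = ET.SubElement(root, "Command", {"language": "SPSS"})
    cmd.text = command
    return root


fragment = build_fragment(
    name="w101_new",
    description="Original value of w101 when w102 equals 1, otherwise 5.",
    command="compute w101_new = 5. if (w102 = 1) w101_new = w101.",
)
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

Keeping the original SPSS command alongside the human-readable description lets the same fragment serve both the HTML report and later machine processing.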
8. Translation of EVS Questionnaire
DSDM
http://zacat.gesis.org
9. Supporting Enhanced Publications
Publications with References to Data
A DDI 3.1 URN contains: Agency, Object, Version
Example URN: <urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>
Resolution steps, starting from a publication with references (URNs):
- DDI Alliance: find agency gesis.de.ddi, return resolver address http://resolve.gesis.org
- At the resolver address: find object, return URL http://www.gesis.org/doc/docxyz (URL of documentation and/or data)
- Request document, return document
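The two-step lookup (agency first, then object) can be sketched in Python. The URN pattern below is read off the single example on the slide and may not cover every valid DDI 3.1 URN; the lookup tables `AGENCY_RESOLVERS` and `OBJECT_URLS` are hypothetical stand-ins for the agency registry and the GESIS resolver, and the document URL is the slide's own placeholder.

```python
import re

# Assumed URN layout, inferred from the example on the slide:
# urn:ddi:3_1:<ParentClass>.<Class>=<agency>:<parentId>(<ver>).<id>(<ver>)
URN_RE = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):"
    r"(?P<parent_class>\w+)\.(?P<object_class>\w+)="
    r"(?P<agency>[^:]+):"
    r"(?P<parent_id>\w+)\((?P<parent_version>[\w.]+)\)\."
    r"(?P<object_id>\w+)\((?P<object_version>[\w.]+)\)$"
)

# Hypothetical lookup tables standing in for the two registries.
AGENCY_RESOLVERS = {"gesis.de.ddi": "http://resolve.gesis.org"}
OBJECT_URLS = {("ZA3811_VarSch", "V8"): "http://www.gesis.org/doc/docxyz"}


def parse_urn(urn):
    """Split a URN into its agency, object, and version parts."""
    match = URN_RE.match(urn)
    if match is None:
        raise ValueError("not a recognised DDI URN: %r" % urn)
    return match.groupdict()


def resolve(urn):
    """Return (resolver address, document URL) for a URN."""
    parts = parse_urn(urn)
    resolver = AGENCY_RESOLVERS[parts["agency"]]  # step 1: find agency
    url = OBJECT_URLS[(parts["parent_id"], parts["object_id"])]  # step 2: find object
    return resolver, url


urn = "urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)"
print(parse_urn(urn)["agency"])
print(resolve(urn))
```

Splitting resolution by agency means a publication only needs to carry the URN; where the documentation actually lives can change without touching the publication.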
10. Supporting Enhanced Publications
DSDM DDI 3 EPE Simple Export Wizard 1.2.0
11. Grouping Trends
- Continuity guides in different contexts
  - Synoptical question / variable lists
  - Documentation of changes in question wording / answer scales
  - Systematic organization by conceptual categories
- CodebookExplorer tool (relational DB)
  - Publication as HTML links on variable level in ZACAT
- Taking advantage of DDI 3 in the future
  - Defining the standard and comparison
  - Qualifying relations (e.g., q-text modified, scale modified, ...)
12. Continuity guides
Literal question text over time
Conceptual categories
Deviations in answer categories
13. Trends by concepts
Trend variables by study
Conceptual categories
Country 1
Country 2
14.
- Comparison map
- Equivalency
- Relationship
- Description

DDI3 RESOURCE: Ex-post Standard
Universe, Concept
Data Collection
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Q">
    <dc:QuestionText>
      <dc:LiteralText>
        <dc:Text>Do you ?</dc:Text>
      </dc:LiteralText>
    <dc:CodeDomain>
      <r:CodeSchemeReference>
        <r:ID>CODS1</r:ID>
      </r:CodeSchemeReference>
Logical Product
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
<l:CodeScheme id="CODS1">
  <l:CategorySchemeReference>
    <r:ID>CATS1</r:ID>
  </l:CategorySchemeReference>
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>1</l:Value>
  </l:Code>

STUDY UNIT 1 n
DataCollection
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Qn">
    <dc:Text>Have you ?</dc:Text>
LogicalProduct
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
<l:CodeScheme id="CODS1">
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>4</l:Value>
  </l:Code>

Comparison between standard and study unit:
- Question text: modified
- Label: identical
- Values: different (generation instruction: scale reversed)

GROUP STUDY UNIT 8-14: DataCollection, LogicalProduct
GROUP STUDY UNIT 15-x: DataCollection, LogicalProduct
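How such qualified relations could be derived automatically is sketched below in Python. The class `QuestionVersion`, the function `compare`, and all category labels other than "often" are hypothetical; only the "often" category with value 1 in the standard and value 4 in the study unit comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class QuestionVersion:
    text: str    # literal question text
    codes: dict  # code value -> category label


def compare(standard, other):
    """Return qualified relations between a standard and a study-level question."""
    relations = {}
    relations["question_text"] = (
        "identical" if standard.text == other.text else "modified"
    )
    std_labels = sorted(standard.codes.values())
    oth_labels = sorted(other.codes.values())
    relations["labels"] = "identical" if std_labels == oth_labels else "modified"

    if standard.codes == other.codes:
        relations["values"] = "identical"
    else:
        relations["values"] = "different"
        # Scale reversed: same labels, values run in the opposite order.
        top = max(standard.codes) + min(standard.codes)
        reversed_codes = {top - v: label for v, label in standard.codes.items()}
        if reversed_codes == other.codes:
            relations["generation_instruction"] = "scale reversed"
    return relations


standard = QuestionVersion(
    "Do you ?", {1: "often", 2: "sometimes", 3: "rarely", 4: "never"}
)
study = QuestionVersion(
    "Have you ?", {4: "often", 3: "sometimes", 2: "rarely", 1: "never"}
)
result = compare(standard, study)
print(result)
```

Recording the relation as a qualifier ("modified", "scale reversed") rather than a plain yes/no match is what allows a continuity guide to say how two items differ, not just that they do.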
15. DDI 3 Use at ICPSR
- Information collected from data producers in pre-collection phase (Concept)
- Metadata output from CAI applications (Data Collection)
- Processors' dashboard (Data Processing)
- Metadata mining: new faceted search tool to facilitate discovery through more precise searching (Data Discovery)
- Relational database for comparison and harmonization across studies (Repurposing)
16. SMDS Metadata Modules
22.
- A combination of this information forms a traditional SIP.
- Information from each life cycle stage, sent to the archive, can be understood as a dynamic SIP.
- Self-archiving by web forms can be offered for the different stages.

- The structured metadata combined with data forms the core of the archive.
- It would be organised in a way where metadata can be reused and information can be ingested and distributed in a dynamic way.

- An AIP must be specially built, because the metadata can include just references to other reused metadata.
- An AIP should include everything of one study; DDI can also be the main structure of the AIP. Data can be inline in DDI. An AIP would exist beside the core structure in the archive.
- An easy round trip should be possible between the core structure and the AIP.
- The purpose of the AIP is comparable to PDF/A, where all fonts are included.
- The core structure is geared toward efficient processing and reuse of metadata.

[Diagram: OAIS flow. Concept, Collection, Processing: CAI tools (MQDS etc.), information extracted from SPSS etc., custom tools (e.g., forms-based) -> SIP -> Archive (DDI as backbone for structured metadata; AIP; data / documents outside of DDI) -> DIP -> distribution packages, Web information system, statistical packages, online analysis, search engines (Distribution, Discovery, Analysis); Repurposing]
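The idea that an AIP must be specially built because the core structure holds references can be illustrated with a small Python sketch. The dictionary layout of `CORE`, the `{"ref": id}` convention, and the function names are assumptions for illustration only; they do not reflect an actual GESIS or ICPSR data model.

```python
# Hypothetical core structure: items keyed by identifier. A study may hold
# references ({"ref": id}) to reusable items instead of the items themselves.
CORE = {
    "Q_std_1": {"type": "question", "text": "Do you ?"},
    "CODS1": {"type": "code_scheme", "codes": {"1": "often"}},
    "study_A": {
        "type": "study",
        "title": "Study A",
        "questions": [{"ref": "Q_std_1"}],
        "code_schemes": [{"ref": "CODS1"}],
    },
}


def inline_references(node, core):
    """Return a copy of node with every {"ref": id} replaced by the referenced item."""
    if isinstance(node, dict):
        if set(node) == {"ref"}:
            return inline_references(core[node["ref"]], core)
        return {key: inline_references(value, core) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_references(item, core) for item in node]
    return node  # strings and numbers are immutable, so no copy is needed


def build_aip(study_id, core):
    """Build a self-contained package for one study from the core structure."""
    return inline_references(core[study_id], core)


aip = build_aip("study_A", CORE)
print(aip["questions"][0]["text"])
```

As with embedded fonts in PDF/A, the resulting package can be understood without access to the rest of the archive, while the core structure keeps its references intact for reuse.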
23.
- DDI-based archive as collection of reusable components
- Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions
- These parts can be
  - the basis for comparison and metadata mining (discovery of new relationships)
  - a candidate for reuse in other studies or new studies (like standard questions or variables)
- Repository of reusable components
  - Standard concepts
  - Standard questions
  - Standard variables
  - Harmonized information
  - Controlled vocabularies
[Diagram: two study boxes labeled Study 1, each with study-specific information and items for reuse; New study]
24. Issues for Discussion
- Advantages and disadvantages of seeking to capture additional metadata throughout the data life cycle
- How much information to make available to funding agencies, data producers, and secondary users?
- Rules for structured documentation and delivery of items to archives for preservation
- An overall DDI tool to capture and curate all metadata and data: the Holy Grail???