Norwegian Social Science Data Archive

Transcript and Presenter's Notes

1
Atle Alvheim
  • Norwegian Social Science Data Archive

CESSDA Expert Seminar 2009
2
(No Transcript)
3
A common future?
  • The last 15 years have focused on building up
    a common data infrastructure for the social
    sciences, based on modern web technology

4
  • The web: the idea that the archives could create
    an integrated catalog, Grenoble 1994
  • DDI: a richer and better data documentation
    format, R. Rockwell / ICPSR, IASSIST 1995
  • Integrate 3-4 components:
  • Internet / web / common catalog
  • DDI
  • Access, explore, analyse, download data: the social
    science dream machine, NESSTAR, J. Ryssevik /
    S. Musgrave
  • ILSES: Integrated Library and Survey Data
    Extraction Service
  • Richer services: FASTER (data types), LIMBER
    (attack the language barrier)
  • One single common entry point: Madiera

5
(No Transcript)
6
Resources: CESSDA template, controlled vocabularies,
multilingual thesaurus, CESSDA classification
Browsing tool, Search tool
Harvester, Indexing tool
Portal
Publishing
Server 1, 2, 3, 4
Client
Square files
7
THE RESEARCHER
THE ARCHIVES
THE PORTAL
8
  • Greenland
  • Iceland
  • Faroe Islands
  • Norway
  • Sweden
  • Finland
  • Aaland Islands
  • Estonia
  • Latvia
  • Lithuania
  • Belorussia
  • Ukraine
  • Moldova
  • Poland
  • Germany
  • Denmark
  • England
  • Scotland
  • Wales
  • Netherlands
  • Belgium
  • Luxembourg
  • France
  • Portugal
  • Spain
  • Andorra
  • Monaco
  • Switzerland
  • Italy
  • San Marino
  • Vatican State
  • Slovenia
  • Liechtenstein
  • Austria
  • Czech Republic
  • Slovakia
  • Hungary
  • Romania
  • Croatia
  • Bosnia Herzegovina
  • Montenegro
  • Kosovo
  • Albania
  • Macedonia
  • Greece
  • Cyprus South
  • Cyprus North
  • Malta
  • Turkey
  • Russia ?
  • Georgia ??
  • Armenia ???
  • Israel ????

30 languages, 45 legal systems
We are supposed to support research and break down
technical, linguistic, judicial and economic barriers
Several processes and timelines in a layered system
9
Access and download
Control access
Share formats and routines
Instrument development
10
(No Transcript)
11
  • 1. Make a more powerful interface to data holdings:
  • - more sophisticated search / browse
    possibilities,
    more focused, even across languages
  • - better possibilities to handle results
  • 2. Handle more complex data structures: over time,
    across space, across languages, linking micro and macro
  • These we may see as analytic dimensions

  • 3. Persistent identity: connect knowledge
    products back to the data used, turning the
    traditional picture upside down
  • These are more practical management issues
  • 4. Handle problems of double storage. Data
    dynamics: more than one value in a table cell
  • Versioning, updating, comments, links,
    references
  • Adding to the data item
  • Single Sign On: need to pass information and
    access more than one server, logging

12
Data have a life-cycle. The archive: a greenhouse
or a graveyard?
Much data is generated by the public statistical
system or other producers
Contact with user community
Metadata standard
Tool for instrument development
Tool for data collection
Tool for documentation
Question DB, translations
An overarching plan
Integration of components
The researcher formulates a problem and needs data
to analyse the problem
When data are collected, with the necessary metadata,
they represent a SIP
To make data ready for archiving they have to be
documented (and processed), lifted from a SIP to
an AIP
If data have to be collected, we need an
instrument, a questionnaire
Conceptualisation
Instrument
Data production (SIP)
Data documentation (AIP)
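The step from SIP to AIP described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `REQUIRED_FIELDS` list and the `promote_to_aip` function are hypothetical and not part of any CESSDA tool. SIP and AIP are read here in the OAIS sense (Submission / Archival Information Package).

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class PackageKind(Enum):
    SIP = "submission information package"
    AIP = "archival information package"


@dataclass(frozen=True)
class DataPackage:
    study_id: str
    data_files: tuple
    metadata: dict = field(default_factory=dict)
    kind: PackageKind = PackageKind.SIP  # collected data start life as a SIP


# Assumption: a hypothetical minimum of documentation required before archiving.
REQUIRED_FIELDS = ("title", "producer", "universe")


def promote_to_aip(package: DataPackage) -> DataPackage:
    """Lift a SIP to an AIP once the required documentation is present."""
    if package.kind is not PackageKind.SIP:
        raise ValueError("only a SIP can be promoted")
    missing = [name for name in REQUIRED_FIELDS if not package.metadata.get(name)]
    if missing:
        raise ValueError(f"undocumented fields: {', '.join(missing)}")
    # The data are unchanged; only the package status is lifted.
    return replace(package, kind=PackageKind.AIP)
```

A package whose metadata lacks one of the required fields stays a SIP, which mirrors the point that documentation is what turns collected data into an archival package.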
Data documentation: should be based on
standardised procedures / best practices and
common tools for all CESSDA archives. DDI
2/3 expressed as a Template/DDI-profile, which
is
a) a selection of elements, with status
b) element repositories
c) controlled vocabularies
d) multilingual thesaurus
e) gazetteer, geographic classification
f) CESSDA study classification
This requires software or a manual / clear guidelines. DDI
becomes the glue that holds this whole system
together.
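As a rough illustration of what "a selection of elements, with status" plus controlled vocabularies could mean in software, here is a minimal Python sketch. Everything in it is an assumption made for the example: the element names, the status values and the vocabulary terms are invented and are not actual DDI element names or CESSDA vocabularies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementRule:
    status: str                             # "mandatory", "recommended" or "optional"
    vocabulary: Optional[frozenset] = None  # allowed terms, if the element is controlled


# Hypothetical profile: a selection of elements, each with a status.
PROFILE = {
    "title": ElementRule("mandatory"),
    "abstract": ElementRule("recommended"),
    "time_method": ElementRule("optional", frozenset({"CrossSection", "Longitudinal"})),
}


def validate(record: dict, profile: dict = PROFILE) -> list:
    """Return a list of problems; an empty list means the record fits the profile."""
    problems = []
    for name, rule in profile.items():
        value = record.get(name)
        if value is None:
            if rule.status == "mandatory":
                problems.append(f"{name}: mandatory element missing")
            continue
        if rule.vocabulary is not None and value not in rule.vocabulary:
            problems.append(f"{name}: '{value}' not in controlled vocabulary")
    for name in record:
        if name not in profile:
            problems.append(f"{name}: element not selected in profile")
    return problems
```

A shared profile of this kind is one way the same checks could run at every archive, whether they are implemented in software or written down as guidelines.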
Question DB
A questions and concepts DB is a very useful
tool for developing instruments
A questions DB is potentially problematic for data
documentation processes. Better to import
directly via the questionnaire
Will make it possible to find questions
from concepts (needs an interface)
Learn from others
Encourage comparative research
Look up translations
13
Question DB
A question database will be related to a basic
storage. Do updates happen as a guarded /
explicit process? What are the criteria?
Or do updates happen as a harvesting process?
Ingest
Data repositories: UKDA, DDA, FSD
AIP
Our AIPs
When an AIP is inserted into an archive or
storage it can trigger an update of a question
database.
Packages: metadata and data
To what degree are packages pre-defined or built
for purposes?
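The "trigger" variant, where inserting an AIP updates the question database, can be sketched as a simple callback on ingest. This is a minimal Python sketch under assumptions: the `Repository` and `QuestionDB` classes, the dictionary layout of an AIP and the text normalisation are all hypothetical. A harvesting variant would instead poll the repositories on a schedule.

```python
from collections import defaultdict


class QuestionDB:
    """Maps a normalised question text to the studies that used it."""

    def __init__(self):
        self.studies_by_question = defaultdict(set)

    def update_from_aip(self, aip: dict) -> None:
        for text in aip.get("questions", []):
            # Assumption: crude normalisation so trivially different spellings match.
            self.studies_by_question[text.strip().lower()].add(aip["study_id"])


class Repository:
    """A data repository that notifies subscribers whenever an AIP is ingested."""

    def __init__(self, name: str):
        self.name = name
        self.packages = {}
        self._on_ingest = []

    def subscribe(self, callback) -> None:
        self._on_ingest.append(callback)

    def ingest(self, aip: dict) -> None:
        self.packages[aip["study_id"]] = aip
        for callback in self._on_ingest:
            callback(aip)  # the insert triggers the question DB update
```

Wiring it up takes one line, `repository.subscribe(question_db.update_from_aip)`, which is where a guarded process could add its acceptance criteria.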
14
Language, metadata standard and storage, by archive:
  • UKDA: English; DDI 3.0, DDI 2.0, Other, Fedora
  • DDA: Danish and English; DDI 2.x, Nesstar, Fedora
  • FSD: Finnish and English; DDI 3.1, Other, Nesstar

Combinations
Because of storage complexity, harvesting also
becomes quite complex
15
Data repositories are guarded by access policies.
Policies are usually formulated at institution or
repository level. Policies are activated by the
crossing of the line between metadata and data,
which is at data package level. Should policies
be linked to packages instead of repositories?
Should they be an obligatory part of the metadata?
Then we need to have policies formalised.
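One way to picture a formalised policy that is linked to the package, with the repository policy as a fallback, is the following Python sketch. It is an assumption-laden illustration: the policy fields, the repository name `repo-a` and the "purpose" model are invented for the example and do not describe any archive's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    name: str
    allowed_purposes: frozenset


@dataclass(frozen=True)
class Package:
    package_id: str
    repository: str
    policy: Optional[AccessPolicy] = None  # package-level policy carried in the metadata


# Hypothetical repository-level defaults, used when a package carries no policy.
REPOSITORY_DEFAULTS = {
    "repo-a": AccessPolicy("research-only", frozenset({"research"})),
}


def effective_policy(package: Package) -> AccessPolicy:
    if package.policy is not None:
        return package.policy
    return REPOSITORY_DEFAULTS[package.repository]


def may_read_metadata(package: Package, purpose: str) -> bool:
    # Metadata are open in this sketch; no policy is activated yet.
    return True


def may_download(package: Package, purpose: str) -> bool:
    # Crossing from metadata to data is the point where the policy is checked.
    return purpose in effective_policy(package).allowed_purposes
```

Making the `policy` field obligatory would remove the fallback, which is the trade-off the questions on this slide point at.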
SSO / AAA
Data repositories: UKDA, DDA, FSD
LOG-DB
Packages: metadata and data
Data repositories should be documented in the
national common language. Different
documentation templates for national and
international language.
16
CV LifeCycleEvent
  • Study Proposal
  • Study Design
  • Instrument Design
  • Funding
  • Interviewer training
  • Ethics Review
  • Sampling
  • Instrument pre-testing
  • Pilot study
  • Questionnaire translation
  • Documentation translation
  • DATA COLLECTION
  • Data collection reports
  • Post-collection processing
  • Data production
  • Initial data quality checks
  • Metadata production
  • Original release
  • DEPOSIT
  • Post-production processing
  • Data quality checks
  • Data editing
  • Data integration
  • Processing for Disclosure
  • Metadata editing
  • Preservation package production
  • Dissemination package
  • New version production
  • New version release / publication

From producer to consumer, the data archival work
Locate, explore and download
Cover the whole data (or project?) life-cycle
17
CESSDA complications: we need services that cover
many servers and many conditions for use
The CESSDA data archives will in due time be
data providers, aggregators and single service
providers all at once. This is an illustration of what would
presently be the NSD situation.
18
Functionalities we need, with a scale from
producer to consumer
19

The user authentication problem: almost always at
institutional level
20
The user authorisation problem: very often at
resource level
User
Portal
Server 1, DDA, Server 2, Server 3, ZA, Server 4, Server
5, UKDA, Server n
Dataset 1, Dataset 2, Dataset 3
Users, affiliated with national institutions,
based on a common justification (research) and
working within specific projects (with roles within
projects?), want to access data resources in
different institutions and countries
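The split between the two problems, authentication at institutional level and authorisation at resource level, can be sketched as two separate checks. This is a minimal Python sketch under assumptions: the institution names, project names, server and dataset identifiers and both lookup tables are hypothetical, and a real SSO/AAA setup would rely on a federation protocol, not on in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    user: str
    institution: str
    project: str


# Hypothetical registry: each institution vouches for its own users.
INSTITUTION_USERS = {
    "university-x": {"alice"},
    "institute-y": {"bob"},
}

# Hypothetical grants per resource: (server, dataset) -> projects allowed to use it.
DATASET_GRANTS = {
    ("server-1", "dataset-1"): {"project-a"},
    ("server-5", "dataset-3"): {"project-a", "project-b"},
}


def authenticate(user: str, institution: str, project: str) -> Optional[Identity]:
    """Institution-level check: is this user known to the home institution?"""
    if user not in INSTITUTION_USERS.get(institution, set()):
        return None
    return Identity(user, institution, project)


def authorise(identity: Identity, server: str, dataset: str) -> bool:
    """Resource-level check: may this identity's project use this dataset?"""
    return identity.project in DATASET_GRANTS.get((server, dataset), set())
```

With single sign-on the `Identity` would be issued once and then passed to every server, each of which applies its own grants and writes to the log.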
21
Complex?
22
Complex?
23
Conceptualisation
Tool
Web browser
Instrument
Data production (SIP)
Portal: search, browse; ELSST x time, space,
methodology; ELSST query service
Harmonisation (and concepts) DB
Question DB
Intermediate storage
Data loader: may handle multiple and complex
data packages; explore and compare functionality
Data documentation
DDI 2/3 expressed as Template/DDI-profile, as
a) selection of elements, with status
b) controlled vocabularies
c) multilingual thesaurus
d) gazetteer
e) CESSDA classification
Registry
CESSDA Toolkit
Download
SSO/AAA
Ingest (AIP)
Data repositories: UKDA, DDA, FSD
Log database
Policies (repository or package level)
24
Internal web services stack
  • DDI centric back-end: CESSDA-DB stores all low
    level objects
  • Web services exposed for public consumption
  • 3CDB/QDB applications call relevant WS
  • Could interact with WS for metadata preparation
  • Ingester performs quality assurance, splits
    metadata and maintains referential integrity for
    storage in CESSDA Bank
  • Back-end maintenance and reporting tools
Banks: Concept Bank, Universe Bank, Classification
Bank, Geo Bank, Question Bank, Questionnaire Bank,
Instruction Bank, Study Bank, Variable Bank
Web services: CESSDA WS, 3CDB WS, QDB WS, Ingest WS,
Future WS
Services: 3CDB, QDB, Future Services
Applications: 3CDB Applications, QDB Applications,
3CDB/QDB Applications, Future Applications
Metadata sources and tools: Nesstar Publisher,
DDI 1/2.x, DDI 3.0 Converter, Metadata Ingester,
Publication Tool, DDI 3.0, Custom Exporter,
Legacy Database
Tools: Reporting Tools, Admin Tools, Security Tools
Other: local objects, non-DDI Objects
25
Ingestion/Registration Process
Repository: Many metadata repositories can exist
around the network. These can be deployed at the
provider level, or as shared metadata storage.
Example: Submission of a Nesstar DDI will
typically result in creation of objects in the
following banks: study, classifications,
variables, instance (files) and possibly
concepts, universes, questions, instructions if
such variable level metadata have been compiled.
Submission: Object registration could be automated
upon release of the metadata by the provider.
Workflow can be implemented as necessary.
Metadata optimization / harmonization: Optimization
of the metadata (merging duplicates, aligning on
harmonized objects, etc.) can be done using
various automated, semi-automated or manual
methods during the various stages of submission
(this can also be performed later on).
Submission: Submission packages are prepared by
providers in compliance with the CESSDA DDI3
specification. Publication tools are used to
manage packages and control the ingestion process.
Packages are broken down and stored in various
banks (as needed).
Interfaces: Note that metadata repositories also
expose a set of general and specialized web
services along with administrative / security
interfaces.
Example: A legacy system used for the production
of questionnaires could create objects in the
question, questionnaire, instruction, concepts,
universes and classification banks. This may
happen outside the context of a survey (question
bank) and no variable would be associated with
these objects.
Metadata Repositories (Banks): Concept Bank,
Universe Bank, Classification Bank, Geo Bank,
Question Bank, Questionnaire Bank, Instruction
Bank, Study Bank, Variable Bank
Components: Nesstar Publisher, DDI 1/2.x, DDI 3.0
Converter, Metadata Registry, Metadata Ingester,
Publication Tool, DDI 3.0, Custom Exporter,
Legacy Database
Web services: Ingest WS, Publication WS,
Repository WS
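The breakdown of a submission package into banks can be sketched as follows. This Python sketch rests on several assumptions: the in-memory `Banks` class, the dictionary layout of a package, the single quality check and the merging of duplicate questions by normalised text are all hypothetical simplifications, not the CESSDA design.

```python
from collections import defaultdict


class Banks:
    """In-memory stand-in for the metadata banks; one dict per bank."""

    def __init__(self):
        self.objects = defaultdict(dict)  # bank name -> {object id: object}

    def register(self, bank: str, key: str, payload: dict) -> str:
        """Store an object once and return its id so other objects can refer to it."""
        object_id = f"{bank}:{key.strip().lower()}"
        # setdefault keeps the first copy, so duplicates collapse onto one object.
        self.objects[bank].setdefault(object_id, payload)
        return object_id


def ingest(package: dict, banks: Banks) -> str:
    """Break a submission package into bank objects and return the study id."""
    # Quality assurance: a single illustrative check.
    if not package.get("title"):
        raise ValueError("quality check failed: study has no title")
    variable_ids = []
    for variable in package.get("variables", []):
        question_id = None
        if variable.get("question"):
            question_id = banks.register(
                "question", variable["question"], {"text": variable["question"]}
            )
        # Referential integrity: the variable stores the id of a registered question.
        variable_ids.append(
            banks.register(
                "variable",
                f"{package['study_id']}/{variable['name']}",
                {"name": variable["name"], "question": question_id},
            )
        )
    return banks.register(
        "study", package["study_id"], {"title": package["title"], "variables": variable_ids}
    )
```

Because referenced objects are registered before the objects that point to them, every stored id resolves to something already in a bank; two studies asking the same question end up sharing one question object.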
26
(Repeats the architecture overview shown on slide 23)