Title: PIDs in Data Infrastructures
1. PIDs in Data Infrastructures
- Peter Wittenburg
- CLARIN Research Infrastructure
- EUDAT Data Infrastructure
2. Automatic Workflows
- most data is created automatically as part of workflows
- manual operations are exceptions
- at data creation time it is not obvious what the data's future life will be
- later association with metadata and PIDs is troublesome and costly
- thus immediate generation of metadata and PIDs as part of automated workflows
- data resources need to be referable and often citable (published)
- need a reliable and highly performing machinery (registration, resolution) based on stable standards
- typically Handles via EPIC
- typically DOIs via DataCite
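The idea of generating metadata and a PID in the same workflow step that creates the data can be sketched as follows. This is a minimal illustration under stated assumptions, not the EPIC or DataCite API: the prefix `00000`, the function `register_on_creation` and the in-memory `registry` dictionary are hypothetical stand-ins for a real registration service.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical prefix for illustration; a real prefix is assigned by a PID service.
EXAMPLE_PREFIX = "00000"


@dataclass
class Registration:
    pid: str
    metadata: dict


def register_on_creation(payload: bytes, location: str, registry: dict) -> Registration:
    """Create basic metadata and a PID at the moment the data is produced."""
    metadata = {
        "location": location,
        "checksum_sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    pid = f"{EXAMPLE_PREFIX}/{uuid.uuid4()}"
    # Stand-in for a call to a real registration service.
    registry[pid] = metadata
    return Registration(pid, metadata)
```

A workflow step would call `register_on_creation` right after writing its output, so that no object exists without a PID and a metadata record.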
3. PID usage in our domain
- assume that we have a recording of an extinct language and some annotations that tell us what someone said about medicine etc.
- researchers create relations that need to be preserved
[Figure: a Recording Session Metadata Record relating a Video Recording, a Sound Recording and Annotations, held in Repositories A, B and C]
- How long will these relations stay stable and persistent? We are using Handles from the EPIC service
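Why relations are better stored as PIDs than as URLs can be shown with a small sketch. All identifiers and URLs below are invented for illustration, and the `resolver` dictionary merely stands in for a resolution service such as a Handle resolver.

```python
# PID -> current location; a stand-in for a real resolution service.
resolver = {
    "00000/video-1": "https://repo-a.example.org/video.mpg",
    "00000/annot-1": "https://repo-c.example.org/annotations.xml",
}

# A relation created by a researcher, stored purely in terms of PIDs.
relation = {"source": "00000/annot-1", "target": "00000/video-1", "type": "annotates"}


def resolve(pid: str) -> str:
    """Return the current location registered for a PID."""
    return resolver[pid]


# The video moves to another server: only the resolver entry changes,
# while the stored relation stays untouched and remains valid.
resolver["00000/video-1"] = "https://archive.example.org/video.mpg"
```

After the move, `resolve(relation["target"])` yields the new location although the relation itself was never edited.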
4. PID usage in our domain
Example ePublication abstract:
Biological and cultural processes have evolved
together, in a symbiotic spiral; they are now
indissolubly linked, with human survival unlikely
without such culturally produced aids as
clothing, cooked food, and tools. The twelve
original essays collected in this volume take an
evolutionary perspective on human culture,
examining the emergence of culture in evolution
and the underlying role of brain and cognition.
The essay authors, all internationally prominent
researchers in their fields, draw on the
cognitive sciences -- including linguistics,
developmental psychology, and cognition -- to
develop conceptual and methodological tools for
understanding the interaction of culture and
genome. They go beyond the "how" -- the questions
of behavioral mechanisms -- to address the "why"
-- the evolutionary origin of our psychological
functioning. What was the "X-factor," the magic
ingredient of culture -- the element that took
humans out of the general run of mammals and
other highly social organisms? Several essays
identify specific behavioral and functional
factors that could account for human culture,
including the capacity for "mind reading" that
underlies social and cultural learning and the
nature of morality and inhibitions, while others
emphasize multiple partially independent factors
-- planning, technology, learning, and language.
The X-factor, these essays suggest, is a set of
cognitive adaptations for culture.
ePublication Repository 1
eResource Repository 2
How long will these references stay stable, etc.? Handles from EPIC
5. Data Object World
- let's isolate external properties of our data objects and collections and ignore the content (structure, semantics, packaging, etc.) for a moment
- goes back to a paper by Kahn & Wilensky, 2006
6. Two DO flavours in our domain
DO
- metadata (access via metadata)
- bit sequence (instance) (immediate access?)
- PID (access via PID)
MDO
- metadata (access via metadata)
- bit sequence (instance) (search/browse access)
- PID (access via PID)
- this is the way we organize data
- other variants are possible
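The external properties of a data object named on this slide (a PID, a metadata description and one or more instances of the bit sequence) can be written down as a small data structure with the two access paths. This is only a sketch; the class and field names are assumptions made for illustration, not an agreed model.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    pid: str                                        # persistent identifier
    metadata: dict                                  # descriptive properties
    instances: list = field(default_factory=list)  # locations of the bit sequence


class Store:
    """Toy store offering access via PID and access via metadata."""

    def __init__(self):
        self._by_pid = {}

    def add(self, obj: DataObject) -> None:
        self._by_pid[obj.pid] = obj

    def get_by_pid(self, pid: str) -> DataObject:
        # Direct access: the PID identifies exactly one object.
        return self._by_pid[pid]

    def search(self, **criteria) -> list:
        # Search/browse access: every object whose metadata matches all criteria.
        return [
            obj for obj in self._by_pid.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]
```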
7. Collections in our domain (similar to MPEG21 containers, items, sub-items)
- grouping of related data
- large variety of reasons
- versions of a DO
- presentations of a DO
- same interview/experiment
- many others
- a DO can be part of many collections
[Figure: organization of collections]
- ISOcat Registry (ISO 12620, compl. ISO 11179): category 1 - assoc info, category 2 - assoc info
- metadata (collection): category 1, category 2, ..., category N, PID1, PID2, ..., PID K
- metadata: category 1, category 2, ..., category N, PID
- PID Registry: PID collection - assoc info, PID1 - assoc info, PID2 - assoc info
- bit sequence
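A collection as described here is essentially a metadata record that has its own PID and lists the PIDs of its members, so one DO can belong to many collections. The sketch below illustrates this; the class name, field names and identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    pid: str                                      # the collection has its own PID
    metadata: dict                                # category/value pairs
    members: list = field(default_factory=list)  # PIDs of member DOs


def collections_containing(pid: str, collections: list) -> list:
    """Return the PIDs of all collections that list the given DO."""
    return sorted(c.pid for c in collections if pid in c.members)


# The same video is grouped for two different reasons.
session = Collection(
    "00000/coll-session", {"reason": "same interview"},
    ["00000/video", "00000/audio"],
)
versions = Collection(
    "00000/coll-versions", {"reason": "versions of a DO"},
    ["00000/video", "00000/video-v2"],
)
```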
8. EUDAT - common services
- two major tracks
- understanding data organization practices in communities
- provide first common services after 12 months
9. PID Use V1 in EUDAT Federation
[Figure: DO1 is replicated in repository X (domain X), repository Y (domain Y) and repository Z (domain Z); a single PID, PIDx under prefix prefx, covers all copies]
PIDx record: URL, URLy, URLz, CKSM, Rights, ....
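With a single PID record that lists several locations and a checksum, a client can fall back to another replica when one is unreachable or damaged. The following sketch assumes a record shaped like a plain dictionary and a caller-supplied `fetch` function; both are illustrative choices, not the EUDAT record layout.

```python
import hashlib


def fetch_verified(record: dict, fetch):
    """Try each registered location until one returns bytes matching the checksum."""
    for url in record["urls"]:
        try:
            data = fetch(url)
        except OSError:
            continue  # replica unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == record["checksum"]:
            return url, data
    raise LookupError(f"no intact copy found for {record['pid']}")


# Demonstration with a fake fetcher instead of real network access.
content = b"DO1 bit sequence"
record = {
    "pid": "prefx/PIDx",
    "urls": [
        "https://x.example.org/do1",
        "https://y.example.org/do1",
        "https://z.example.org/do1",
    ],
    "checksum": hashlib.sha256(content).hexdigest(),
}


def fake_fetch(url: str) -> bytes:
    if "x.example" in url:
        raise OSError("repository X is offline")
    if "y.example" in url:
        return b"corrupted"
    return content
```

Here `fetch_verified(record, fake_fetch)` skips the offline copy in X and the corrupted copy in Y and returns the copy from Z.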
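10. PID Use V2 in EUDAT Federation
[Figure: DO1 is replicated in repository X (domain X), repository Y (domain Y) and repository Z (domain Z); each copy has its own PID under its own prefix: PIDx (prefx), PIDy (prefy), PIDz (prefz)]
PIDx record: URL, RoR, HDL, CKSM, Rights, ....
PIDy record: URL, RoR, HDL, CKSM, Rights, ....
PIDz record: URL, RoR, CKSM, Rights, ....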
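When every replica carries its own PID, the records need a way to point back to the original copy. The sketch below assumes that the RoR field names the PID of the copy in the repository of record, a reading the slide does not spell out; the HDL field is left out because its meaning is not given. Which repository holds the original is also an arbitrary choice made for the example.

```python
import hashlib

content = b"DO1 bit sequence"
cksm = hashlib.sha256(content).hexdigest()

# Assumption: "RoR" holds the PID of the copy in the repository of record.
# Repository X is chosen as the repository of record purely for illustration.
records = {
    "prefx/PIDx": {"URL": "https://x.example.org/do1", "RoR": "prefx/PIDx", "CKSM": cksm},
    "prefy/PIDy": {"URL": "https://y.example.org/do1", "RoR": "prefx/PIDx", "CKSM": cksm},
    "prefz/PIDz": {"URL": "https://z.example.org/do1", "RoR": "prefx/PIDx", "CKSM": "stale"},
}


def replica_is_consistent(pid: str, records: dict) -> bool:
    """Compare a replica's checksum with that of the copy its RoR field points to."""
    origin = records[records[pid]["RoR"]]
    return records[pid]["CKSM"] == origin["CKSM"]
```

In this example the copy behind `prefz/PIDz` would be flagged, since its checksum differs from the one registered for the original.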
11. EUDAT relying on EPIC Handles
- EPIC (European PID Consortium: CSC, SARA, GWDG, more)
- large data centers with national/organizational (MPS) support
- applying redundancy schemes (persistence, availability)
- reliability, robustness, performance (registration, resolution)
- all the same API (agreement on associated information)
- thus PID syntax is not crucial, but storing/finding information is
- feasible business model for science
- security of the administration DB for the system
- persistent and balanced governance for HS
- need a worldwide registry of agreed information types to feed our stupid machines
12. Information types in discussion
- multiple links to resources
- checksum
- link to metadata
- citation metadata
- RoR statement
- mutability flag
- persistency statement
- pointers to presentation versions
- provenance statement
- collection statement
- pointer to rights
- (support for parts/fragments)
- (actionable PIDs)
- need agreements
- need standard APIs for EUDAT
- this is crucial
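What a registry of agreed information types buys a machine can be illustrated with a small validator: a PID record is accepted only if every field is a registered type with the expected kind of value. The type names and their value kinds below are invented for this sketch; no such agreement is stated in the slides.

```python
# Hypothetical registry of agreed information types (name -> expected Python type).
TYPE_REGISTRY = {
    "URL": list,       # multiple links to resources
    "CHECKSUM": str,
    "METADATA": str,   # link to metadata
    "MUTABLE": bool,   # mutability flag
    "RIGHTS": str,     # pointer to rights
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is interpretable."""
    problems = []
    for key, value in record.items():
        expected = TYPE_REGISTRY.get(key)
        if expected is None:
            problems.append(f"unknown type: {key}")
        elif not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems
```

A record using only registered types can be processed by any client that knows the registry, which is the point of agreeing on the types first and on the PID syntax second.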