Persistent identifiers: the 7 levels of identification

1
Persistent identifiers: the 7 levels of
identification
  • Juha Hakala
  • Helsinki University Library
  • ELAG 2005, 1-3 June 2005, CERN

2
Persistence?
  • Is not dependent on the identifier itself, but on
    legal, organisational and technical
    infrastructure
  • ISSN would collapse without the ISSN standard, a
    community using it according to the generally
    accepted principles, ISSN International Centre
    governing the system and the ISSN database
    linking the non-semantic (that is, dumb)
    identifiers to serials
  • Even a technically brilliant system may be
    discontinued if its mission breaks apart

3
Normal identifiers and resolution services
  • Resolution services are a new brand of
    identifiers which render traditional identifier
    systems actionable in the Internet (Web)
    environment
  • Resolve: provide a link from a reference to the
    resource
  • Prime examples: DOI and URN
  • Both may encompass, at least in principle, any
    existing identifier (URN namespaces have been
    defined for e.g. ISSN and ISBN)
  • Both are useless without an existing identifier
    adding flesh to the DOI/URN bones
  • From now on, only normal identifiers will be
    discussed
  • Complex enough a topic for 35 minutes

4
Seven levels of identifiers
  • After the collapse of the integrated library
    system paradigm, and the implementation of IR
    portals, digital asset management systems, digital
    archives and e-resource management systems, what
    do we need to identify?
  • This can be analysed from top to bottom, from
    organisations to search attributes
  • Such analysis may show gaps and help in design of
    identifier systems

5
Top level: libraries
  • Identifier system must cover at least other
    (memory) organisations
  • National level (union catalogue codes) exists;
    due to the Internet / Web it became necessary to
    develop an international system
  • ISIL, International Standard Identifier for
    Libraries and Related Organisations (ISO 15511)
  • Consists of ISO country code, hyphen and UC code
  • FI-H (Helsinki University Library)
  • Danish Library Authority hosts the ISIL IC;
    national centres have been established in some
    countries but the system needs wider acceptance
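
The ISIL shape described on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes only what the slide states (country code, hyphen, union catalogue code); the function names and the pattern are hypothetical, and ISO 15511 itself defines the full rules.

```python
import re

# Assumption: simplified ISIL shape taken from the slide only
# (two-letter ISO country code, hyphen, union catalogue code).
ISIL_PATTERN = re.compile(r"^([A-Z]{2})-([A-Za-z0-9]+)$")


def make_isil(country_code: str, uc_code: str) -> str:
    """Join a country code and a union catalogue code into an ISIL-like string."""
    return f"{country_code.upper()}-{uc_code}"


def split_isil(isil: str) -> tuple[str, str]:
    """Split an ISIL-like string into (country code, union catalogue code)."""
    match = ISIL_PATTERN.match(isil)
    if match is None:
        raise ValueError(f"not in the simplified ISIL shape: {isil!r}")
    return match.group(1), match.group(2)


print(make_isil("fi", "H"))
print(split_isil("FI-H"))
```

Because the national union catalogue code is reused unchanged, existing codes only need the country prefix to become internationally unique.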

6
2nd level: collections and services
  • These identifiers are important for IR portals;
    international exchange of collection and service
    (e.g. a Z39.50 server) metadata is cumbersome
    unless there is an efficient means for duplicate
    control
  • These identifiers do not exist yet
  • Helsinki University Library is writing a New Work
    Item proposal for ISO TC 46 on ISCI,
    International Standard Collection Identifier
  • No on-going efforts to develop service ID

7
ISCI design principles
  • Will be based on ISIL in order to allow efficient
    decentralization of the ISCI assignment and
    creation of Internet-wide resolution service
    without a global ISCI DB
  • Will consist of three parts: ISIL, delimiting
    character (colon) and the actual (colon-less)
    collection identifier
  • FI-H:Slavica (Slavic collection in HUL)
  • Need for an international support center?
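
The three-part design above (ISIL, colon, colon-less collection identifier) could be handled as in the following sketch. ISCI was only a proposal at the time of the talk, so the function names and validation rules here are assumptions made for illustration, not part of any standard.

```python
def make_isci(isil: str, collection_id: str) -> str:
    """Build an ISCI-like string: ISIL, colon, colon-less collection identifier."""
    if not isil or not collection_id:
        raise ValueError("both parts are required")
    if ":" in collection_id:
        raise ValueError("the collection identifier must not contain a colon")
    return f"{isil}:{collection_id}"


def split_isci(isci: str) -> tuple[str, str]:
    """Split on the last colon, since the collection part is colon-less."""
    isil, delimiter, collection_id = isci.rpartition(":")
    if not delimiter or not isil or not collection_id:
        raise ValueError(f"not in the assumed ISCI shape: {isci!r}")
    return isil, collection_id


# The Slavic collection example, written with the colon delimiter.
print(split_isci(make_isci("FI-H", "Slavica")))
```

Since the ISIL part names the assigning organisation, a resolver could route a request to that organisation's own service, which is how the design avoids a global ISCI database.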

8
3rd level: authors
  • International exchange of authority records can
    be made more efficient with persistent and unique
    identification
  • ISADN, International Standard Authority Data
    Number, has been discussed for quite a few years,
    but it is not yet formally under development
  • Retrospective assignment may create interesting
    ownership problems, especially if the future
    ISADN contains country of origin
  • Is Franz Liszt German or Hungarian?

9
4th level: identifiers for works
  • ISWC: International Standard Musical Work Code
  • T-345246800-1
  • Letter T, 9-digit unique number and check digit
  • ISAN: International Standard Audiovisual Number
  • ISAN 006A-15FA-002B-C95F-A
  • 12-digit root segment, 4-digit segment for
    episode identification and check digit
  • ISTC: International Standard Text Code
  • ISTC OA9 2005 12B4A105 6
  • agency code, year, work element, check digit
  • These systems were developed at the same time,
    but their syntax and terminology used varies
  • This should not complicate usage too much
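
To show how differently the three work identifiers are structured, here is a rough Python sketch that splits each example from this slide into its parts. The patterns are inferred only from the examples and descriptions given here (an assumption, not the ISO syntax definitions), and the check digits are extracted but not verified.

```python
import re

# Assumption: patterns derived from the slide's examples, not from the standards.
ISWC_PATTERN = re.compile(r"^T-(\d{9})-(\d)$")
ISAN_PATTERN = re.compile(
    r"^((?:[0-9A-F]{4}-){2}[0-9A-F]{4})-([0-9A-F]{4})-([0-9A-Z])$"
)


def parse_iswc(text: str) -> dict:
    """Letter T, 9-digit number, check digit."""
    match = ISWC_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unexpected ISWC shape: {text!r}")
    return {"prefix": "T", "number": match.group(1), "check": match.group(2)}


def parse_isan(text: str) -> dict:
    """12-character root, 4-character episode segment, check character."""
    match = ISAN_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unexpected ISAN shape: {text!r}")
    return {
        "root": match.group(1).replace("-", ""),
        "episode": match.group(2),
        "check": match.group(3),
    }


def parse_istc(text: str) -> dict:
    """Agency code, year, work element, check digit (space separated)."""
    parts = text.split()
    if len(parts) != 4:
        raise ValueError(f"unexpected ISTC shape: {text!r}")
    agency, year, work, check = parts
    return {"agency": agency, "year": year, "work": work, "check": check}


print(parse_iswc("T-345246800-1"))
print(parse_isan("006A-15FA-002B-C95F-A"))
print(parse_istc("OA9 2005 12B4A105 6"))
```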

10
ISTC/ISWC/ISAN issues
  • Many library system vendors are investigating the
    possibility of implementing FRBR, but few have
    been capable of doing it (VTLS, OCLC)
  • Once an ILMS is FRBRized, implementing work
    identifiers is essential, but there is more than
    technology to consider here
  • Do we need to pay for these identifiers even
    when retrospectively generating them for old
    works?
  • Who will establish the national centers and
    create the identifiers (and work level records
    they require)?

11
5th level: manifestations
  • This used to be familiar terrain for us
  • ISBN, ISSN, NBN belong here
  • E-publishing has destroyed the old status quo
  • Systems that worked well for decades have
    adaptation problems for different reasons
  • It is not yet entirely clear if the revisions
    done (or planned) are sufficient

12
E-problems with manifestations
  • It is increasingly difficult to define valid
    targets
  • ISSN could be assigned to any Web site out there
  • Publishers want to give ISBNs to anything that
    can in principle be sold separately (e-book
    chapters, images within a book, teddy bears on
    sale in book stores)
  • The number of things to be identified is growing
    fast; this will cause syntax problems (ISBN
    revision was done to make more room) and staff
    issues in ISSN/ISBN national centers
  • There is no point in giving a persistent
    identifier to a non-persistent resource; therefore
    resources must be identified, described and
    archived, which is a labour-intensive process

13
Case: ISBN
  • The old ISBN was running out of number space
  • Several extension options were discussed
  • 13, 16, even 32-digit ISBNs
  • The idea to make ISBN a dumb number such as
    ISSN was voted down (for this the librarians in
    the WG are to blame)
  • The new ISBN will be compliant with the EAN
    system
  • 13 digits, starting with 978, 979 or in the
    future with something else to extend the scope of
    the system further
  • New check digit calculation algorithm adopted
    from EAN
  • It is possible to convert from an old ISBN to the
    new (starting with 978) and back
  • Publishers retroconvert to new ISBNs; libraries
    will keep the old ones
  • ILMS need to do sophisticated things with old/new
    ISBNs
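
The conversion between old and new ISBNs mentioned above can be sketched as follows. This is a minimal Python illustration using the standard ISBN-10 (modulus 11) and EAN-13 (alternating weights 1 and 3) check digit calculations; the function names are illustrative, and hyphenation of the result is left out because it depends on registration group and publisher ranges.

```python
def _digits(isbn: str) -> str:
    """Drop hyphens and spaces from a printed ISBN."""
    return isbn.replace("-", "").replace(" ", "").upper()


def ean13_check_digit(first12: str) -> str:
    """EAN check digit: weights 1 and 3 alternate over the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """Old ISBN check digit: weights 10 down to 2, modulus 11, 10 written as X."""
    total = sum(int(d) * (10 - i) for i, d in enumerate(first9))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, append the EAN check digit."""
    raw = _digits(isbn10)
    if len(raw) != 10 or not raw[:9].isdigit():
        raise ValueError(f"not a 10-digit ISBN: {isbn10!r}")
    body = "978" + raw[:9]
    return body + ean13_check_digit(body)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Only 978-prefixed ISBNs have an old-style equivalent."""
    raw = _digits(isbn13)
    if len(raw) != 13 or not raw.isdigit():
        raise ValueError(f"not a 13-digit ISBN: {isbn13!r}")
    if not raw.startswith("978"):
        raise ValueError("only 978-prefixed ISBNs convert back")
    body = raw[3:12]
    return body + isbn10_check_digit(body)


print(isbn10_to_isbn13("0-306-40615-2"))
print(isbn13_to_isbn10("978-0-306-40615-7"))
```

A library system that keeps old ISBNs while publishers send new ones would need to normalise both forms to one key before matching, which is one of the "sophisticated things" an ILMS has to do.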

14
6th level: component parts
  • Libraries have not done too well in this area in
    the past due to staff limitations
  • We catalogue serials but not the articles
  • E-publishing may force us to change tactics since
    now even component parts are separate items
    accessible directly
  • Manual processing must be partially or fully
    replaced by automated processes; this will also
    have an impact on identifiers
  • Automated ID generation solves the staff
    bottleneck

15
SICI: still alive, but not kicking
  • Serial Item and Contribution Identifier, 1991-
  • NISO standard has never really taken off
  • Can be generated programmatically provided that
    the article is structured enough
  • 0095-4403(199502/03)21:3<>1.0.TX;2-Y
  • Complex: consists of ISSN and stuff identifying
    the issue and article within it
  • Publishers have their own systems like PII which
    have been easier to create and maintain (for
    them)
  • Still not clear how popular SICI will eventually
    be
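
As a rough illustration of the structure described on this slide, the Python sketch below splits a SICI into its ISSN, issue information and trailing control data. It assumes the usual SICI punctuation (parentheses around the chronology, a colon in the enumeration, angle brackets around the contribution segment, a semicolon in the control segment); the group names are illustrative and the check character is not verified.

```python
import re

# Assumption: a simplified SICI layout, not the full NISO grammar.
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"   # ISSN of the serial
    r"\((?P<chronology>[^)]*)\)"     # issue date, e.g. 199502/03
    r"(?P<enumeration>[^<]*)"        # volume and issue, e.g. 21:3
    r"<(?P<contribution>[^>]*)>"     # article location and title code
    r"(?P<control>.+)$"              # control segment incl. check character
)


def parse_sici(sici: str) -> dict:
    """Split a SICI string into its main segments."""
    match = SICI_PATTERN.match(sici)
    if match is None:
        raise ValueError(f"unexpected SICI shape: {sici!r}")
    return match.groupdict()


print(parse_sici("0095-4403(199502/03)21:3<>1.0.TX;2-Y"))
```

An empty contribution segment, as in this example, means the string identifies the issue as a whole rather than an article within it.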

16
BICI: Dead On Arrival, or conflict between theory
and practice
  • Book Item and Contribution Identifier
  • NISO draft standard, never completed
  • Consists of ISBN and extra stuff to identify the
    relevant section within the book; may be
    automatically generated
  • Publishers and book stores prefer to rely solely
    on ISBN in their systems
  • Using ISBN only is not a neat solution (uses a
    lot of ISBNs, and giving ISBN both for the thing
    as a whole and its component parts is messy)

17
7th level: search attributes etc.
  • Within Z39.50, sets (e.g. attribute and
    diagnostic), record syntaxes etc. are identified
    by ISO Object Identifiers
  • MARC21: 1.2.840.10003.5.10
  • Bib-1: 1.2.840.10003.3.1; term examples:
  • Author: 1.2.840.10003.3.1.1.1003
  • Name: 1.2.840.10003.3.1.1.1002
  • Author-name personal: 1.2.840.10003.3.1.1.1004
  • Personal name: 1.2.840.10003.3.1.1.1
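
The hierarchical nature of these OIDs can be shown with a short Python sketch. It treats an OID as a sequence of integer arcs and checks whether one OID sits under another; the helper names are hypothetical, and the constant is the Bib-1 OID quoted on this slide.

```python
BIB1_ATTRIBUTE_SET = "1.2.840.10003.3.1"  # Bib-1 OID as given on the slide


def oid_arcs(oid: str) -> tuple[int, ...]:
    """Turn a dotted OID into a tuple of integer arcs."""
    return tuple(int(arc) for arc in oid.split("."))


def is_under(oid: str, ancestor: str) -> bool:
    """True if oid lies strictly below ancestor in the OID tree."""
    arcs, parent = oid_arcs(oid), oid_arcs(ancestor)
    return len(arcs) > len(parent) and arcs[: len(parent)] == parent


def relative_arcs(oid: str, ancestor: str) -> tuple[int, ...]:
    """Return the arcs that follow the ancestor, e.g. (1, 1003) for Author."""
    if not is_under(oid, ancestor):
        raise ValueError(f"{oid} is not under {ancestor}")
    return oid_arcs(oid)[len(oid_arcs(ancestor)):]


author = "1.2.840.10003.3.1.1.1003"
print(is_under(author, BIB1_ATTRIBUTE_SET))
print(relative_arcs(author, BIB1_ATTRIBUTE_SET))
```

Comparing arcs rather than raw string prefixes avoids false matches such as 1.2.840.100035 appearing to fall under 1.2.840.10003.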

18
OID problems
  • Bib-1 attribute set is not quite as coherent as
    it should be, there are lots of (domestic) search
    attributes missing from it, and sometimes there
    are too many alternatives
  • Attempt to develop Bib-2 failed, and even if we
    succeed in the future, co-existence of Bib-1 and
    Bib-2 may cause trouble
  • ISO OIDs can be applied to anything
  • Not clear how to use them in bibliographic
    context to e.g. identify government publications
    or parts of them; this is currently being
    investigated in Finland

19
Conclusion
  • E-publishing and new applications (and their
    novel metadata) have expanded both the scope of
    identifiers needed and the requirements towards
    existing systems, especially on the manifestation
    and component part levels
  • Standards developers have reacted to these needs,
    but the progress has been slow; still, in some
    areas system builders have been even slower
20
Conclusion (2)
  • Identifier is more than just a string of
    characters
  • There must be an agent which assigns the
    identifier to a resource, and (usually) describes
    it
  • As long as all parts in this picture are stable,
    identification is a routine process
  • Agent breakdowns have been the most common reason
    for problems in the past
  • A number of national ISSN agencies are inactive
  • E-resources have destroyed the balance, and it
    may take a while before the identification system
    works again in business-as-usual style