The OpenURL Quality Problem - PowerPoint PPT Presentation

1
The OpenURL Quality Problem Project
Adam Chandler, Coordinator, Service Design Group
Glen Wiley, Metadata Librarian
Metadata Working Group, February 22, 2008
2
The Original Problem
  • Reduce linking dead ends from one publisher's
    content to another's
  • Show multiple subscriptions or relevant access
    points in one place
  • Desire to show the most appropriate version of
    the service (like full text)
  • Improve content visibility
  • Possibly reduce document delivery costs

3
Brief History of OpenURL
  • Originated by Herbert Van de Sompel at Univ. of
    Ghent, around 2000
  • Became OpenURL Version 0.1
  • Commercialized by ExLibris (SFX) in 2001
  • Fast-tracked by NISO
  • Released as Version 1.0, officially as
    ANSI/NISO standard Z39.88, in 2004
  • OCLC is maintenance agency as of June 2006

4
What is OpenURL?
  • OpenURL is a syntax for querying a server
      • to perform a service
      • on a resource
      • specified by attributes
      • sensitive to context
      • also specified by attributes
  • OpenURL is an "actionable" URL that transports
    resource metadata.

5
OpenURL Version 0.1 Example
  • http://linkresolver.library.cornell.edu:4550/resserv?genre=article&issn=01604120&title=Environment+International&volume=32&issue=1&date=20060101&atitle=The+United+States+Department+of+Energy's+Regional+Carbon+Sequestration+Partnerships+program.&spage=128&pages=128-144&sid=EBSCO:aph&aulast=Litynski
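
A Version 0.1 OpenURL like the one above is an ordinary HTTP query string of key-value pairs appended to the link resolver's address. The following is a minimal sketch of how a source might assemble one, assuming the resolver address on the slide reads as host linkresolver.library.cornell.edu on port 4550; the function name `build_openurl_01` is illustrative and not part of any standard.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Resolver address as read from the slide's example (an assumption).
RESOLVER = "http://linkresolver.library.cornell.edu:4550/resserv"

def build_openurl_01(metadata: dict) -> str:
    """Serialize citation metadata as an OpenURL 0.1 style query string."""
    return f"{RESOLVER}?{urlencode(metadata)}"

citation = {
    "genre": "article",
    "issn": "01604120",
    "title": "Environment International",
    "volume": "32",
    "issue": "1",
    "date": "20060101",
    "spage": "128",
    "pages": "128-144",
    "sid": "EBSCO:aph",
    "aulast": "Litynski",
}

url = build_openurl_01(citation)

# Round trip: a resolver reads the same pairs back out of the query string.
parsed = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(url)
```

Letting `urlencode` do the escaping avoids hand-built query strings, which are one way syntactically incorrect OpenURLs arise.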

6
OpenURL Version 1.0 Example
http://linkresolver.library.cornell.edu:4550/resserv?url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=giordanino,+m&rft.epage=377&rft.stitle=knowl+eng+rev&rft.date=2007&rft_id=info:doi/10.1017%2fs0269888907001233&url_ver=z39.88-2004&rft.issn=0269-8889&rft.aulast=uren&rft.title=knowledge+engineering+review&rft.genre=article&rft.issue=4&rft.spage=361&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.volume=22&rft.auinit=v&rft.atitle=the+usability+of+semantic+search+tools%3a+a+review
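
On the receiving side, a Version 1.0 key/encoded-value (KEV) OpenURL can be decoded with a standard query-string parser. This is a rough sketch, using a shortened query built from values in the example above; `parse_kev` is an illustrative name, and treating the presence of `url_ver` as the marker for Version 1.0 is a simplifying assumption.

```python
from urllib.parse import parse_qs

# Shortened Version 1.0 query string based on the slide's example.
query = (
    "url_ver=z39.88-2004"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.genre=article"
    "&rft.issn=0269-8889"
    "&rft.volume=22&rft.issue=4&rft.spage=361&rft.epage=377"
    "&rft_id=info:doi/10.1017%2fs0269888907001233"
)

def parse_kev(query_string: str) -> dict:
    """Decode a KEV query string; keys may repeat, so values are lists."""
    return parse_qs(query_string, keep_blank_values=True)

kev = parse_kev(query)

# Version 0.1 links carry no url_ver key, so its value is a usable version test.
is_v1 = kev.get("url_ver", [""])[0].upper() == "Z39.88-2004"

# The percent-encoded slash in the DOI is restored by the parser.
doi = kev["rft_id"][0].removeprefix("info:doi/")
print(is_v1, doi)
```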
7
How does it work?
8
OpenURL Version 0.1
  • OpenURL 0.1 is a de facto standard that is built
    around scholarly bibliographic data only
  • An accepted standard syntax for creating a link
    between an information source and a link resolver
  • Pre-defines sets of data elements to use in
    describing an item
  • Relies on HTTP protocol for transmission
  • The concept of context-sensitive linking
    implemented for a specific class of resources:
    (some) scholarly assets

9
OpenURL Version 0.1
  • Limitations
  • Pre-defined metadata genres and elements mean
    that new ones cannot be defined to meet emerging
    needs (e.g., for image databases)
  • Only provides for key-value pair (HTTP GET or
    POST) representation of metadata.
  • OpenURL 0.1 is tied to HTTP transport
  • Lack of implementation guidelines means that
    support for OpenURL is loosely defined

10
OpenURL Version 1.0
  • Complicated and highly abstract
  • Designed for greater flexibility
  • Slower uptake
  • Supports richer data formats/genres
  • Journal, Article, Proceeding, Preprint, Book,
    Report, Document, Patent, Dissertation, etc
  • Provides more complete context description
  • Supports transport mechanisms other than HTTP
  • like SOAP, OAI-PMH, HTTPS
  • A generic specification that allows OpenURL
    Applications to be implemented
  • OpenURL Applications: networked applications that
    implement the concept of context-sensitive
    services for a certain class of resources

11
Understanding OpenURL Version 1.0
[Diagram not transcribed; surviving label: "networked resource"]
Diagram is from Herbert Van de Sompel's OpenURL
Tutorial at the Olybris 2005 Ex Libris Seminar,
Kos, Greece, April 18th 2005.
12
Understanding OpenURL Version 1.0
  • OpenURL 1.0 divides ContextObject into six
    entities (including the resource)
  • Each entity has attributes to identify it
  • Each entity has schema for those attributes
  • Each entity affects URL resolution
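
In the KEV serialization, each of the six entities has its own key prefix, so a resolver can sort incoming keys by entity. The sketch below assumes the prefixes defined in Z39.88-2004 (rft, rfe, req, svc, res, rfr); the function name and the "Administrative" bucket for keys such as `url_ver` are illustrative choices.

```python
from collections import defaultdict

# KEV key prefixes for the six ContextObject entities.
ENTITY_PREFIXES = {
    "rft": "Referent",         # the resource being referenced
    "rfe": "ReferringEntity",  # the resource that cites the referent
    "req": "Requester",        # the user or agent asking for services
    "svc": "ServiceType",      # the kind of service requested
    "res": "Resolver",         # the target link resolver
    "rfr": "Referrer",         # the system that generated the OpenURL
}

def group_by_entity(kev: dict) -> dict:
    """Group KEV pairs by the entity their key prefix belongs to."""
    groups = defaultdict(dict)
    for key, value in kev.items():
        # "rft.issn" and "rft_id" both reduce to the prefix "rft".
        prefix = key.split(".")[0].split("_")[0]
        entity = ENTITY_PREFIXES.get(prefix, "Administrative")
        groups[entity][key] = value
    return dict(groups)

groups = group_by_entity({
    "url_ver": "Z39.88-2004",
    "rfr_id": "info:sid/www.isinet.com:wok:wos",
    "rft.issn": "0269-8889",
    "rft_id": "info:doi/10.1017/s0269888907001233",
})
print(sorted(groups))
```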

13
Problems with the Standard Documentation
  • Tough read
  • Key/Encoded-Value (KEV) Implementation
    Guidelines are helpful, but complex
  • Not specific enough in many ways. Some mention
    of best practices for metadata values, such as:
  • UTF-8 encoding for special characters
  • DCMI Type Vocabulary for Referent Type (rft.type)
  • MIME type for Referent Format (rft.format)

14
Miriam Blake citation and the Known Issues
  • M.E. Blake, F.L. Knudson. Metadata and Reference
    Linking. Libr. Coll. Acq. Tech. Serv. 26 (2002)
    219-230, p. 229
  • Goals for the future
  • Increased consistency in metadata within a single
    database and across databases.
  • Increased communication between primary
    publishers and secondary publishers.
  • Increased awareness of bibliographic/citation
    standards by authors.
  • Increased outreach by librarians to authors
    emphasizing and promoting the importance of
    citation standards for electronic document
    retrieval.

15
Link Resolvers and the Serials Supply Chain: UKSG
Report, 2007
  • Description of the Supply Chain
  • Issues and Barriers
  • Lack of awareness
  • Lack of Co-operation
  • Inaccurate/Incomplete Data
  • Content Package Issues
  • Responsibility of Data Quality
  • Lack of Data Standards
  • Inbound Linking Issues
  • Etc
  • Recommendations

16
Problems Persist
  • 1. Wrong start or end date in the local library's
    holdings database
  • 2. Wrong link-to syntax in link resolver
  • 3. Inaccurate or missing Crossref DOI URL (often
    the DOI registration process is out of sync with
    the mounting of articles)

17
Problems Persist
  • 4. Semantically inaccurate metadata from the
    OpenURL origin (wrong ISSN, for example)

18
Problems Persist
5. Syntactically incorrect metadata from the
OpenURL origin
19
Problems Persist
  • 6. Subscription and embargo errors (especially in
    January)
  • For each month that passes, the chance of the
    link working increases by over 8%

20
Characteristics of a solution to the OpenURL
quality problem
  • empirical
  • network-level problem, so it needs to be solved at
    the network level
  • sanctioned, officially recognized
  • offer value to librarians and content providers
  • narrow scope

21
Model: Open Language Archives Community
"Metadata Quality Evaluation: Experience from the
Open Language Archives Community," Baden Hughes,
Department of Computer Science and Software
Engineering, University of Melbourne.
Abstract: We describe the motivation, design and
implementation of an infrastructure to support
metadata quality assessment within a specialised
Open Archives Initiative (OAI) sub-domain, the
Open Language Archives Community (OLAC). While
services for structural validation of metadata
are widely used, there is little corresponding
work regarding services which evaluate the
semantic and syntactic content of metadata from a
qualitative perspective. We posit that any
measure of metadata quality benefits from
both contextual and referential assessment -
metadata on a per record and per collection basis
is legitimately assessed against the baseline of
broader community practice, as well as for
compliance to any external standard. In
this paper we describe the implementation of a
metadata quality assessment scheme, and the
corresponding interfaces to the evaluation tool.
http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf
22
Model: Open Language Archives Community
  • Metrics
  • code existence score, 0-1 (bonus for using
    controlled vocabulary)
  • element absence penalty, 0-1 (penalty for
    missing core elements)
  • per metadata record weighted aggregate, max 10
  • archive level derivative metrics
  • archive diversity metric (use of controlled
    vocabulary across the archive)
  • metadata quality score metric (derived from
    individual scores)
  • core elements per record metric
  • core element usage metric
  • code usage metrics
  • code and element usage metrics
  • star rating (derived from average item score
    in archive)

http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf
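
To make the per-record metrics concrete, here is a rough sketch of how the same idea might be applied to an OpenURL record. It is not the formula from the OLAC paper: the list of core elements, the controlled vocabulary for `genre`, and the equal weighting are all assumptions chosen for illustration.

```python
# Assumed set of core elements for a journal-article OpenURL.
CORE_ELEMENTS = ("genre", "issn", "date", "volume", "spage")

# Assumed controlled vocabulary; only genre is treated as a coded element.
CODED_ELEMENTS = {
    "genre": {"journal", "book", "conference", "article",
              "preprint", "proceeding", "bookitem"},
}

def code_existence_score(record: dict) -> float:
    """Share of coded elements whose value is in the controlled list (0-1)."""
    coded = [key for key in CODED_ELEMENTS if key in record]
    if not coded:
        return 0.0
    hits = sum(record[key] in CODED_ELEMENTS[key] for key in coded)
    return hits / len(coded)

def element_absence_penalty(record: dict) -> float:
    """Share of core elements that are missing or empty (0-1)."""
    missing = sum(not record.get(key) for key in CORE_ELEMENTS)
    return missing / len(CORE_ELEMENTS)

def record_score(record: dict, code_weight: float = 0.5) -> float:
    """Weighted aggregate on a 0-10 scale; the weighting is an assumption."""
    presence = 1.0 - element_absence_penalty(record)
    combined = code_weight * code_existence_score(record) + (1 - code_weight) * presence
    return round(10 * combined, 1)

complete = {"genre": "article", "issn": "0160-4120", "date": "2006",
            "volume": "32", "spage": "128"}
sparse = {"genre": "paper", "date": "2000-2001"}
print(record_score(complete), record_score(sparse))
```

Averaging `record_score` over all OpenURLs from one source would give an archive-level figure comparable in spirit to the star rating above.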
23
Case Study: L'Année philologique
  • Log file provided by Professor Eric Rebillard,
    Director of Graduate Studies, Field of Classics
  • http://www.annee-philologique.com/aph/
  • 126 OpenURLs in sample

26
Observations: log file scan

Log file is not available in PowerPoint version.
Please contact Adam Chandler for more
information.
27
Observations: date
  • log examples
  • 2000-2001
  • 2000-2001
  • 2000-2001
  • 2004-2005
  • 2004-2005
  • 2003-2004
  • 2004-2005
  • 1998-1999
  • 2004-2005
  • 2004-2005

date: Date of publication in ISO 8601 form: YYYY,
YYYY-MM or YYYY-MM-DD. (p. 56)
NOTE on "chron": Indications of chronology in a non-ISO 8601 form
(like "Spring" or "1st quarter") should be
carried in this element; the element content is
not normalized. Where numeric ISO 8601 dates are
also available, they should be provided in the
"date" element. As such, a recorded date of
publication of "Spring, 1992" becomes "date=1992"
and "chron=spring". Chronology information can
also be provided in the "ssn" and "quarter"
elements. (p. 57)
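
The date rule quoted above is easy to check mechanically. The following is a minimal sketch of such a check, assuming only the three ISO 8601 forms named in the guideline are acceptable; the function name is illustrative. Year ranges such as the "2000-2001" values seen in the log fail it.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD, and nothing else.
ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

def is_valid_openurl_date(value: str) -> bool:
    """True for YYYY, YYYY-MM or YYYY-MM-DD with a real month and day."""
    match = ISO_DATE.fullmatch(value)
    if not match:
        return False
    # Missing month or day defaults to 1 so the calendar check still runs.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True

for sample in ("1992", "1992-04", "2006-01-01", "2000-2001", "Spring, 1992"):
    print(sample, is_valid_openurl_date(sample))
```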

28
Observations: volume and issue
Log examples:
N.%20S.%2055%20(1)  7%20(1)  43%20(3-4)
N.%20S.%2055%20(2)  4a%20ser.%203%20(1)
NB0201 52  NB02054  7%20(2)  13%20(2)  13-14
4a%20ser.%203%20(1)  31%20(1)  13%20(2)  38%20(3)
98%20(1)  N.%20S.%2055%20(1)
Volume is usually expressed as a number but
could be roman numerals or non-numeric, e.g.
"124", or "VI". (p. 57)
Issue: This is the
designation of the published issue of a journal,
corresponding to the actual physical piece in
most cases. While usually numeric, it could be
nonnumeric. Note that some publications use
chronology in the place of enumeration, i.e.
Spring, 1998. (p. 58)
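
Several of the logged values appear to pack the issue number into the volume field in parentheses. A resolver could try to separate them, as in this sketch. It assumes the logged values are URL-encoded, with %20 standing for a space, and that a trailing parenthesized part is the issue; both are assumptions about this particular source, and `split_volume_issue` is an illustrative name.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# Anything, then optional whitespace, then a final "(...)" group.
VOLUME_ISSUE = re.compile(r"(?P<volume>.+?)\s*\((?P<issue>[^()]+)\)")

def split_volume_issue(raw: str) -> Tuple[str, Optional[str]]:
    """Split 'volume (issue)' into its parts; leave other values untouched."""
    text = unquote(raw).strip()
    match = VOLUME_ISSUE.fullmatch(text)
    if match:
        return match["volume"], match["issue"]
    return text, None

for raw in ("7%20(1)", "43%20(3-4)", "N.%20S.%2055%20(1)", "13-14"):
    print(raw, "->", split_volume_issue(raw))
```

Values that do not fit the pattern are returned unchanged, since the guidelines allow non-numeric volumes and issues.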

29
Observations: spage

"spage" is missing. It is more useful than the "pages" field
when linking to full text.
spage: First page number of a
start/end (spage-epage) pair. Note that pages are
not always numeric. (p. 58)
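
When only "pages" is supplied, a start page can often be recovered from it. This is a small sketch of that repair, assuming "pages" holds either a single page or a hyphenated start-end range; `fill_spage` is an illustrative name, and existing "spage" or "epage" values are never overwritten.

```python
def fill_spage(kev: dict) -> dict:
    """Return a copy with spage/epage derived from 'pages' when absent."""
    fixed = dict(kev)
    pages = kev.get("pages", "").strip()
    if not pages:
        return fixed
    # Pages are not always numeric, so split on the hyphen without parsing.
    first, _, last = pages.partition("-")
    fixed.setdefault("spage", first.strip())
    if last.strip():
        fixed.setdefault("epage", last.strip())
    return fixed

print(fill_spage({"pages": "128-144"}))
print(fill_spage({"pages": "iv-xii", "spage": "iv"}))
```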
30
Observations: missing ISSNs

issn: International Standard Serial Number (ISSN).
ISSN numbers may contain a hyphen, e.g.
"1041-5653". (p. 59)
ISSNs are easier to
resolve than titles, especially titles that
contain special characters.
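
Because an ISSN carries a check digit, a supplied value can be verified before it is used for matching. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with "X" standing for 10) and accepts values with or without the hyphen; `normalize_issn` is an illustrative name.

```python
import re
from typing import Optional

def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as NNNN-NNNC if its check digit is valid, else None."""
    compact = raw.replace("-", "").strip().upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits, weights 8 down to 2.
    total = sum(int(digit) * weight
                for digit, weight in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"

for sample in ("1041-5653", "01604120", "0269-8889", "1041-5654"):
    print(sample, "->", normalize_issn(sample))
```

A check like this catches mistyped ISSNs but not a well-formed ISSN that belongs to the wrong journal.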
31
Observations: character encoding

Character encoding: use UTF-8. Specify the character
encoding this way in OpenURL 1.0:
info:ofi/enc:UTF-8
Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=CoreCharacterEncodings
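
As an illustration of why the declared encoding matters, the sketch below encodes a title with an accented character as UTF-8 and then shows what a UTF-8 resolver would see if the source had used Latin-1 instead. It assumes the KEV key `ctx_enc` carries the encoding identifier and uses `rft.jtitle` for the journal title; the scenario itself is hypothetical.

```python
from urllib.parse import urlencode, parse_qs

kev = {
    "url_ver": "Z39.88-2004",
    "ctx_enc": "info:ofi/enc:UTF-8",
    "rft.jtitle": "L'Année philologique",
}

# Percent-encode the values as UTF-8 bytes, matching the declared encoding.
query = urlencode(kev, encoding="utf-8")
decoded = parse_qs(query, encoding="utf-8")

# The same title sent as Latin-1 does not survive a UTF-8 decode.
latin1_query = urlencode(kev, encoding="latin-1")
misread = parse_qs(latin1_query, encoding="utf-8")

print(query)
print(decoded["rft.jtitle"][0])
print(misread["rft.jtitle"][0])
```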
32
Observations: missing WorldCat numbers

Including OCLC WorldCat numbers would help to
resolve title-level ambiguities, especially when
the request is routed to interlibrary loan.
Data from title matching in WorldCat: 17 titles
without an ISSN.
To do this means moving from
OpenURL 0.1 to 1.0 format:
info:ofi/nam:info:oclcnum
Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=CoreNamespaces
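
A source making that move might re-express its 0.1 keys as 1.0 KEV keys and add the WorldCat number as a referent identifier. The following is a simplified sketch under several assumptions: it handles only plain bibliographic keys, maps the 0.1 "title" to the 1.0 journal title key `rft.jtitle`, and uses a made-up OCLC number and source identifier purely as placeholders.

```python
from urllib.parse import urlencode

def to_v1_with_oclcnum(v01: dict, oclc_number: str) -> str:
    """Re-express simple OpenURL 0.1 pairs as a 1.0 KEV query with an OCLC number."""
    kev = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft_id": f"info:oclcnum/{oclc_number}",
    }
    for key, value in v01.items():
        if key == "sid":
            # The 0.1 source identifier becomes the Referrer identifier.
            kev["rfr_id"] = f"info:sid/{value}"
        elif key == "title":
            kev["rft.jtitle"] = value
        else:
            kev[f"rft.{key}"] = value
    return urlencode(kev)

# Placeholder values, not taken from the log sample.
query = to_v1_with_oclcnum(
    {"genre": "article", "title": "Example Journal", "volume": "7",
     "sid": "example.org:db"},
    oclc_number="12345678",
)
print(query)
```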
33
Analysis of L'Année philologique titles in the log
sample that are held in WorldCat libraries
  • Total titles analyzed: 81
  • Total confirmed held by Cornell in WorldCat: 53
  • (Margin of error) Unconfirmed in or out of
    WorldCat: 6
  • Median number of libraries that hold these
    titles: 67
Thus, even if the metadata were
perfect, finding the title through ILL,
especially without an identifier (ISSN, ISBN,
WorldCat number), is expensive.
Caveat: Not all of a
library's holdings are in WorldCat, especially
journals.

34
The scale of the OpenURL quality problem
Cornell link resolver activity, December 3, 2007 to
February 8, 2008: 53,062 OpenURLs were sent to the
link resolver.
35
Discussion
36
Notes and links
  • http://library4.library.cornell.edu/openurl/index.html
  • How many OpenURLs came into Cornell, Dec.-Feb.?
  • http://www.language-archives.org/index.html
  • http://www.niso.org/standards/standard_detail.cfm?std_id=783
  • http://erms.library.cornell.edu/webbridge/edit
    beforelinks.html