1
MARC Content Designation and Utilization
Inquiry and Analysis
  • Future of MARC: Challenges and Opportunities of
    21st Century Cataloging
  • William E. Moen
  • wemoen@unt.edu
  • School of Library and Information Sciences
  • Texas Center for Digital Knowledge
  • University of North Texas

Research funded by a National Leadership Grant
from the Institute of Museum and Library
Services. Additional support provided by the
University of North Texas School of Library and
Information Sciences and the Texas Center for
Digital Knowledge.
2
To start
  • Discussion of the future of MARC is only
    partially about MARC
  • The broader digital information landscape
  • Technologies
  • Cataloging practices
  • The possible diminishing market share of
    libraries in the information marketplace, and of
    library catalogs as a resource discovery tool

3
Calhoun's report
Today, a large and growing number of students and
scholars routinely bypass library catalogs in
favor of other discovery tools, and the catalog
represents a shrinking proportion of the universe
of scholarly information. The catalog is in
decline, its processes and structures are
unsustainable, and change needs to be swift.

Today's research library catalogs -- even
those that include records for thousands of
scholarly e-journals and databases -- reflect only a
small portion of the expanding universe of
scholarly information. Library catalogs manage
description and access for mostly published
resources -- tangible materials such as books,
serials, and audiovisual media, plus licensed
materials such as abstracting and indexing
services, full text databases, and electronic
journals and books. In contrast, the stuff of
cultural heritage collections, digital assets,
pre-print services and the open Web, research
labs, and learning management systems remain for
the most part outside the scope of the catalog.
4
When we say MARC?
  • Record format
    • Defined by ISO 2709/ANSI Z39.2
    • Structural elements of the format
  • Metadata scheme
    • Defined by MARC 21
    • Fields, subfields, indicators and their semantics
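
To make the record format versus metadata scheme distinction concrete, here is a minimal Python sketch of the ISO 2709 structural layer alone: a 24-character leader, a directory of 12-character entries, and variable fields closed by a field terminator. The function names and the demonstration record are hypothetical, not taken from the presentation, and nothing here interprets MARC 21 semantics beyond splitting indicators and subfields.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-char tag + 4-digit field length + 5-digit start


def parse_record(record: bytes):
    """Return the leader and a list of (tag, raw field bytes) pairs."""
    leader = record[:LEADER_LENGTH].decode("ascii")
    base_address = int(leader[12:17])  # offset where field data begins
    # The directory sits between the leader and its own field terminator.
    directory = record[LEADER_LENGTH:base_address - 1]
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        fields.append((tag, record[begin:begin + length - 1]))  # drop terminator
    return leader, fields


def split_data_field(data: bytes):
    """Split a data field into its two indicators and (code, value) subfields."""
    indicators = data[:2].decode("ascii")
    chunks = data[2:].split(SUBFIELD_DELIMITER)[1:]
    return indicators, [(c[:1].decode("ascii"), c[1:].decode("utf-8")) for c in chunks]


def build_record(fields):
    """Assemble a tiny record from (tag, bytes) pairs; for demonstration only."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = LEADER_LENGTH + len(directory) + 1
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


# Hypothetical two-field record: a control field and a title field.
demo = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aExample title /"
            + SUBFIELD_DELIMITER + b"cA. Author."),
])
leader, fields = parse_record(demo)
indicators, subfields = split_data_field(fields[1][1])
```

The parser never needs to know what tag 245 means. That knowledge belongs to the metadata scheme, which is why the two layers can be discussed, and replaced, separately.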

5
Approaching MARC's future
  • Requirements for a record format / metadata
    scheme
  • Responding to recent developments
  • Looking at empirical data

6
Thinking about requirements
  • Goldsmith & Knudson's requirements for LANL's
    digital library repository
    • Granularity: lossless data mapping without
      losing the finer shades of meaning intrinsic to
      the original data
    • Transparency: necessary for seamless data
      interchange, requiring a standard widely known
      throughout the digital library community
    • Extensibility: in order to permit changes to
      the general structure without breaking the whole
      or requiring reprocessing of already ingested
      materials
  • Tennant's requirements for bibliographic
    infrastructure
    • XML-based format
    • Modularity
    • Hierarchy support
    • Community-supported tool sets
    • And others

7
Thinking about requirements
  • McCallum's 10 format attributes for MARC Forward
  • XML
  • Granularity
  • Versatility
  • Extensibility
  • Modularity
  • Hierarchy support
  • Crosswalks
  • Tools
  • Cooperative management
  • Pervasive

8
Recent developments
  • Functional Requirements for Bibliographic Records
    (FRBR)
    • IFLA Study Group on Functional Requirements for
      Bibliographic Records, 1992-1995
    • A conceptual model for the bibliographic
      universe (B. Tillett, 2003)

The aim of the study was to produce a framework
that would provide a clear, precisely stated,
and commonly shared understanding of what it is
that the bibliographic record aims to provide
information about, and what it is that we expect
the record to achieve in terms of answering user
needs.
9
The FRBR model
  • Based on Entity-Relationship modeling
    • Entity: something that can be described
    • Attributes: the features of the entity that
      characterize it
    • Relationships: between entities
  • Three groups of entities in model
    • Group 1: Products of intellectual or artistic
      endeavor
    • Group 2: Entities responsible for the
      intellectual or artistic content, the physical
      production, etc.
    • Group 3: Entities that serve as the subjects of
      intellectual or artistic endeavor
  • Remember what it is that the bibliographic
    record aims to provide information about
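
The entity-relationship idea can be sketched as a small data structure. In the Python sketch below, the Group 1 entity names (Work, Expression, Manifestation, Item) come from the published FRBR report rather than from the slide text, and every attribute name and value is a hypothetical placeholder. Group 2 and Group 3 entities are reduced to plain strings to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # hypothetical attribute identifying one physical copy


@dataclass
class Manifestation:
    publisher: str
    items: List[Item] = field(default_factory=list)  # "is exemplified by"


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)  # "is embodied in"


@dataclass
class Work:
    title: str
    creators: List[str] = field(default_factory=list)   # stand-in for Group 2 entities
    subjects: List[str] = field(default_factory=list)   # stand-in for Group 3 entities
    expressions: List[Expression] = field(default_factory=list)  # "is realized through"


def items_of(work: Work) -> List[Item]:
    """Walk the Group 1 relationships from a work down to its individual items."""
    return [item
            for expression in work.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items]


# Hypothetical example: one work, two expressions, three copies in total.
work = Work(
    title="Example work",
    creators=["Example Author"],
    subjects=["Example topic"],
    expressions=[
        Expression("eng", [Manifestation("Example Press", [Item("copy-1"), Item("copy-2")])]),
        Expression("fre", [Manifestation("Exemple Editions", [Item("copy-3")])]),
    ],
)
all_items = items_of(work)
```

Walking from a work down to its items is the kind of traversal a FRBR-aware catalog would use to collocate every copy of a work.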

10
FRBR -- Group 1 Entities
11
FRBR -- Group 2 Entities
12
FRBR -- Group 3 Entities
13
FRBR user tasks
  • Remember what it is that we expect the record to
    achieve in terms of answering user needs
  • Four user tasks
    • Find: Discovering whether something exists by
      searching one or more attributes
    • Identify: Examining retrieved records to
      determine the items that meet the user's search
      request
    • Select: Examining retrieved records for those
      that meet other user needs/requirements
    • Obtain: Using data in retrieved records to gain
      physical access to the described object

14
Impact on cataloging and catalogs
  • Introduces new terminology and a conceptual model
    incorporated in
    • RDA
    • Statement on cataloging principles
  • Assists in better understanding the range of
    relationships in the bibliographic universe
  • Collocation function of the catalog
  • Improve linking mechanisms
  • Implementation in catalogs to improve user
    experience

15
Recent developments
  • Revision of the Anglo-American Cataloguing Rules
  • No AACR3
  • Resource Description and Access (RDA)
  • Focus on guidelines for content creation
  • Separation from syntax or record format
  • Designing the future -- Library Systems and Data
    Formats (wiki)
  • Grassroots effort to address next generation
    library catalog and data format

16
Metadata
  • Essential in library applications
  • Variety of metadata schemes
  • Variety of functions and services supported
  • Increasing use of machine-generated metadata
  • Role of handcrafted metadata needs continuing
    review and assessment
  • Research on use of metadata schemes can provide
    empirical data for decisions

17
Metadata record as artifact
  • Metadata creation as process
  • Resulting metadata records as artifacts of the
    process
  • Artifact reflects decisions, policies
  • Artifact can be investigated to understand
    metadata utilization decisions
  • Decisions to use or not use available metadata
    elements

18
Metadata rules & practice
  • Library catalogers create metadata
    (bibliographic records)
  • Follow cataloging rules and other standards to
    create the bibliographic data
  • Encode the bibliographic data into MARC records
  • MARC communications format and metadata scheme
  • Approximately 2,000 structures for encoding data

19
Richness of MARC
MARC 21 field groups: currently defined (in MARC 21 or
OCLC MARC Bibliographic Format) vs. MARC 1972

Field Group   Currently Defined   MARC 1972
00x                     6                3
0xx                   311               28
1xx                    76               40
2xx                   176               15
3xx                   155                4
4xx                    45               37
5xx                   344                8
6xx                   235               66
7xx                   477               41
8xx                   249               36
9xx                    16
TOTAL                2074              278
20
What do catalogers use?
  • Given the cataloging rules
  • Given the detailed structuring of bibliographic
    data in MARC records
  • Given training of the catalogers
  • Given local policies and practices
  • What can we learn by examining a large set of
    MARC bibliographic records?

21
Why study MARC utilization?
  • Standard record structure for exchange of
    descriptive and other types of metadata
  • Evolved since the late 1960s as a key mechanism
    for sharing metadata among libraries
  • Metadata record with approximately 2,000 elements
    available
    • Approximately 200 fields
    • Approximately 1,800 subfields or other structures
  • To what extent is the richness/complexity
    exploited, and to what purpose?
  • See Goldsmith and Knudson regarding the Los Alamos
    Research Library's choice of a metadata scheme

Although often disparaged or dismissed in the
library community, the MARC standard, notably
the MARCXML standard, provides surprising
flexibility and robustness for mapping disparate
metadata to a vendor-neutral format for storage,
exchange, and downstream use.
22
Occurrence summary
Frequency of occurrence   Number of fields/subfields   % of all occurrences
> 600,000                              1                      4.4
500,000 to 599,999                     0                      0
400,000 to 499,999                    13                     39.9
300,000 to 399,999                     6                     14.3
200,000 to 299,999                     6                     10.6
100,000 to 199,999                    10                     10.3
TOTAL                                 36                     79.5

  • Only 4% of all fields/subfields account for 80%
    of all occurrences
  • 96% of all fields/subfields account for only
    20% of all occurrences

23
The MCDU Project
  • MARC Content Designation Utilization
  • Provide empirical evidence of catalogers' use of
    MARC content designation
  • Identify commonly used elements of bibliographic
    records
  • Contribute to community discussion about core
    elements in MARC bibliographic records
  • Explore the evolution of MARC content designation
  • Develop research approach to understand the
    factors influencing levels of MARC content
    designation use

24
Project deliverables
  • Reports containing results of analysis of
    utilization
  • Reports addressing commonly used elements
    • Across formats
    • In context of national recommendations (e.g.,
      BIBCO)
    • In context of FRBR user tasks
  • HistoriMARC
    • Database of MARC historical information about
      evolution of fields/subfields, etc.
    • Enable analysis of patterns of adoption and
      utilization
  • A methodology to understand factors influencing
    catalogers' use of MARC
  • Software tools and methods for others to use

25
Dataset and preparation
  • 56,177,383 MARC 21 Bibliographic Records from
    OCLC WorldCat
  • Decomposed the records to store in MySQL
    • Parsing tool
    • 82 hours to process and load records
    • 295 GB final database size (with indexing)
  • Structure of the decomposed records aligns with
    the analytical questions
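
As an illustration of what "decomposing" records for analysis might look like, the sketch below stores one row per subfield occurrence. It uses Python's built-in sqlite3 in place of MySQL so that it runs without a server. The table layout, column names, and sample records are assumptions made for illustration, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical layout: one row per subfield occurrence.
SCHEMA = """
CREATE TABLE IF NOT EXISTS subfield_occurrence (
    record_id  TEXT,
    field_seq  INTEGER,  -- position of the field within its record
    tag        TEXT,
    indicators TEXT,
    code       TEXT,
    value      TEXT
)
"""


def load_records(conn, records):
    """Flatten (record_id, fields) pairs into rows; return the number of rows stored."""
    conn.execute(SCHEMA)
    rows = []
    for record_id, fields in records:
        for field_seq, (tag, indicators, subfields) in enumerate(fields):
            for code, value in subfields:
                rows.append((record_id, field_seq, tag, indicators, code, value))
    conn.executemany("INSERT INTO subfield_occurrence VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def occurrences_by_subfield(conn):
    """Per tag/subfield: total occurrences and number of distinct records using it."""
    return conn.execute(
        """SELECT tag, code, COUNT(*), COUNT(DISTINCT record_id)
           FROM subfield_occurrence
           GROUP BY tag, code
           ORDER BY tag, code"""
    ).fetchall()


# Hypothetical sample: two already-parsed records.
sample = [
    ("r1", [("245", "10", [("a", "T1"), ("c", "A")]),
            ("650", " 0", [("a", "S1")]),
            ("650", " 0", [("a", "S2")])]),
    ("r2", [("245", "10", [("a", "T2")])]),
]
conn = sqlite3.connect(":memory:")
stored = load_records(conn, sample)
summary = occurrences_by_subfield(conn)
```

With one row per subfield, the frequency questions asked later in the presentation reduce to GROUP BY queries. That is one plausible reading of the structure "aligning with" the analytical questions.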

26
Additional data preparation
  • Analysis required determining frequency counts by
    format of material (ten)
  • Concern about significant differences in patterns
    of utilization between Library of Congress and
    OCLC member cataloging
  • Partitioned decomposed data into 20 databases
    (2 sources of cataloging x 10 formats of material)
    • Based on source of cataloging
    • Based on format of material

27
MCDU Project Dataset: 56,177,383 records (100% of total)

                                       LC-created records        Non-LC-created records    Total
                                       Number       % of total   Number       % of total   Number
MCDU Project Dataset by LC/non-LC      8,713,665    15.5         47,463,718   84.5         56,177,383
Books                                  7,595,887    13.5         34,546,200   61.5         42,142,087
Cartographic Materials                   242,132     0.4            596,642    1.1            838,774
Electronic Resources                      39,879     0.1            871,881    1.6            911,760
Continuing Resources                     388,332     0.7          2,193,009    3.9          2,581,341
Manuscripts                               11,471     0.02         4,390,970    7.8          4,402,441
Music                                    109,249     0.2          1,167,654    2.1          1,276,903
Sound Recordings                         241,940     0.4          1,702,342    3.0          1,944,282
Projected Media                           22,088     0.04         1,415,606    2.5          1,437,694
Graphic Materials                         62,625     0.1            506,401    0.9            569,026
Three-Dimensional Objects and Realia          62     0.0001          73,013    0.1             73,075
28
Categories of questions
  • General profile of the dataset (e.g.)
    • What is the distribution of records by Type of
      Record?
    • What is the distribution of records by Encoding
      Level?
  • Occurrences of content designation structures
    • What is the number of total occurrences of all
      control and data fields, and how many unique
      field tags are used?
    • In how many and in what percentage of records is
      each unique field/subfield combination used at
      least once?
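
The two occurrence questions measure different things: total occurrences count every repetition of a field, while "used at least once" counts each record once however often the field repeats. A minimal Python sketch of both counts follows. The function name and the toy input are hypothetical.

```python
from collections import Counter


def profile_fields(records):
    """records: iterable of tag lists, one list per record.

    Returns {tag: (total occurrences, records using the tag, % of records)}.
    """
    total_occurrences = Counter()
    records_with_tag = Counter()
    n_records = 0
    for tags in records:
        n_records += 1
        total_occurrences.update(tags)     # counts every repetition
        records_with_tag.update(set(tags)) # counts each record at most once
    return {
        tag: (total_occurrences[tag],
              records_with_tag[tag],
              100.0 * records_with_tag[tag] / n_records)
        for tag in total_occurrences
    }


# Toy input: three records, with field 650 repeated in the first one.
toy_records = [["245", "650", "650"], ["245"], ["245", "100"]]
profile = profile_fields(toy_records)
unique_tags = len(profile)
```

In the toy data, field 650 has two occurrences but appears in only one of three records. The same gap shows up in the project's results wherever a field is repeatable.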

29
Example results
  • 7,595,887 LC-created records in dataset
  • Type of Record: Books, Pamphlets, and Printed
    Sheets
  • Total number of unique fields occurring: 167
  • Number of fields accounting for 80% of
    occurrences: 14 fields (8.3%)
  • Number of fields accounting for 90% of
    occurrences: 21 fields (12.6%)
  • Approximately 110 fields (66%) occur in less than
    1% of all records
  • Note: Fields are cataloger-supplied, not
    system-supplied

30
Field Tag   Records Using Field   Total Occurrences   Cumulative % of All
            at Least Once         of Field            Field Occurrences
650             5,387,282            11,778,732            10.910
008             7,595,887             7,595,887            17.945
245             7,595,887             7,595,887            24.981
010             7,595,726             7,595,726            32.016
300             7,586,264             7,586,415            39.043
260             7,585,926             7,585,928            46.069
050             7,027,027             7,095,639            52.642
100             5,626,011             5,626,018            57.853
500             3,264,297             4,582,571            62.097
020             3,845,934             4,235,426            66.020
082             4,034,888             4,036,101            69.758
043             3,665,624             3,665,626            73.154
504             3,373,297             3,403,714            76.306
700             2,312,712             3,240,072            79.307
880               512,563             2,327,504            81.463
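
A cumulative percentage column of this kind can be derived by ranking fields by total occurrences and keeping a running sum. The Python sketch below shows the arithmetic on toy counts. The function names are hypothetical, and the optional `grand_total` parameter is an assumption covering the case where, as in the table above, percentages are taken against all field occurrences in the dataset rather than only the rows shown.

```python
def cumulative_coverage(counts, grand_total=None):
    """Rank tags by occurrences; return (tag, occurrences, cumulative %) tuples."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = grand_total if grand_total is not None else sum(counts.values())
    running = 0
    result = []
    for tag, occurrences in ranked:
        running += occurrences
        result.append((tag, occurrences, 100.0 * running / total))
    return result


def fields_needed(counts, target_percent, grand_total=None):
    """Smallest number of top-ranked fields whose cumulative share reaches the target."""
    coverage = cumulative_coverage(counts, grand_total)
    for rank, (_tag, _n, percent) in enumerate(coverage, start=1):
        if percent >= target_percent:
            return rank
    return None  # target not reachable with the fields supplied


# Toy counts (hypothetical, not project data).
toy_counts = {"aaa": 50, "bbb": 30, "ccc": 15, "ddd": 5}
coverage = cumulative_coverage(toy_counts)
needed_for_80 = fields_needed(toy_counts, 80)
needed_for_90 = fields_needed(toy_counts, 90)
```

Feeding in per-field occurrence counts for the full dataset, with its overall occurrence total as `grand_total`, would be one way to reproduce a column like the one above and to read off how many fields reach the 80% or 90% mark.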
31
Making sense of numbers
  • Frequency counts provide raw but informative data
  • Threshold concept to delineate a change in
    trend in utilization
  • Determining commonly occurring elements
  • Comparing to recommended core records
  • Comparing to recommendations for national level
    records
  • Comparing the FRBR user tasks data

32
Element use and FRBR tasks
  • FRBR describes four user tasks
  • Find
  • Identify
  • Select
  • Obtain
  • Are library catalogers providing data to support
    FRBR tasks?
  • Delsey mapped these tasks to MARC CDS for FRBR
    entities

33
FRBR user task: Find (search)
  • MARC 21 fields/subfields that can contain author,
    title, or subject data
    • Author-related fields/subfields: 119
    • Author/Title-related fields/subfields: 21
    • Title-related fields/subfields: 253
    • Subject-related fields/subfields: 144
  • In FRBR context, Delsey identified
    • Approximately 460 fields/subfields that can
      support this task for the FRBR entities
  • In MCDU dataset, only 59 (13%) of these occur at
    or above the threshold of use in OCLC book records

34
Questions for consideration?
  • What is needed in a bibliographic record?
  • Support for the four user tasks?
  • In context of FRBR, what does it mean to support
    a user task?
  • Management of information resources?
  • How do your systems use the infrequently used
    data?
  • What about the 62% of all fields used in less
    than 1% of the records?

35
Questions for consideration?
  • Can you argue persuasively for the cost/benefit
    of your existing practice?
  • Should the focus be on high-value, high-impact,
    high-quality data in a few fields/subfields?
  • Can you identify these few fields/subfields?
  • What would it mean for costs of cataloging?
  • What would this mean for training?
  • Can MCDU results inform your local practices?

36
New cataloging practices?
  • Select the appropriate metadata scheme
    • Use level of description and schema (DC, LOM,
      VRA Core, etc.) appropriate to the bibliographic
      resource. Don't apply MARC, AACR2, and LCSH to
      everything.
    • Consider abandoning the use of controlled
      vocabularies (LCSH, MeSH, etc.) for topical
      subjects in bibliographic records.
  • Manually enrich metadata in important areas
    • Enhance name, main title, series titles, and
      uniform titles for prolific authors in music,
      literature, and special collections.
  • Automate metadata creation
    • Encourage the creation of metadata by vendors,
      and its ingestion into our catalog as early as
      possible in the process.
    • Import enhanced metadata whenever, wherever it
      is available from vendors and other sources.
  • Source: Rethinking How We Provide Bibliographic
    Services for the University of California
    (December 2005)

37
Confluence for change
  • Within library community
  • Influence of FRBR concepts and model for metadata
  • Resource Description and Access (RDA)
  • Re-examination of library catalog and its
    position within the landscape of resource
    discovery tools
  • Development of a bibliographic metadata element
    set
  • Next generation MARC

38
References
  • MARC Content Designation Utilization Project
  • http://www.mcdu.unt.edu/
  • Moen and Benardino. (2003). Assessing Metadata
    Utilization: An Analysis of MARC Content
    Designation Use
  • http://www.unt.edu/wmoen/publications/MARCPaper_Final2003pdf.pdf
  • Goldsmith and Knudson. (2006). Repository Librarian
    and the Next Crusade: The Search for a Common
    Standard for Digital Repository Metadata
  • http://www.dlib.org/dlib/september06/goldsmith/09goldsmith.html
  • Roy Tennant. (2004). A Bibliographic Metadata
    Infrastructure for the Twenty-first Century
  • Sally H. McCallum. (2006). MARC Forward.
  • http://www.rlg.org/en/pdfs/Forum.8-06.McCallum.pdf

39
References
  • Designing the future -- Library Systems and Data
    Formats
  • http://futurelib.pbwiki.com/
  • Barbara Tillett. (2003). What is FRBR? A
    Conceptual Model for the Bibliographic Universe.
  • http://www.loc.gov/cds/downloads/FRBR.PDF
  • Karen Calhoun. (2006). The Changing Nature of the
    Catalog and its Integration with Other Discovery
    Tools
  • http://www.loc.gov/catdir/calhoun-report-final.pdf
  • Bibliographic Services Task Force. (2005).
    Rethinking How We Provide Bibliographic Services
    for the University of California
  • http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf