Title: MARC Content Designation and Utilization
1MARC Content Designation and Utilization
Inquiry and Analysis
- Future of MARC
- Challenges and Opportunities of 21st Century Cataloging
- William E. Moen
- <wemoen_at_unt.edu>
- School of Library and Information Sciences
- Texas Center for Digital Knowledge
- University of North Texas
Research funded by a National Leadership Grant
from the Institute for Museum and Library
Services. Additional support provided by the
University of North Texas School of Library and
Information Sciences and the Texas Center for
Digital Knowledge.
2To start
- Discussion of the future of MARC is only partially about MARC
- The broader digital information landscape
- Technologies
- Cataloging practices
- The possible diminishing market share of
- Libraries in the information marketplace
- Library catalogs as a resource discovery tool
3Calhoun's report
Today, a large and growing number of students and
scholars routinely bypass library catalogs in
favor of other discovery tools, and the catalog
represents a shrinking proportion of the universe
of scholarly information. The catalog is in
decline, its processes and structures are
unsustainable, and change needs to be swift.
Today's research library catalogs -- even
those that include records for thousands of
scholarly e-journals and databases -- reflect only a
small portion of the expanding universe of
scholarly information.
Library catalogs manage
description and access for mostly published
resources -- tangible materials such as books,
serials, and audiovisual media, plus licensed
materials such as abstracting and indexing
services, full text databases, and electronic
journals and books. In contrast, the stuff of
cultural heritage collections, digital assets,
pre-print services and the open Web, research
labs, and learning management systems remain for
the most part outside the scope of the catalog.
4When we say MARC?
- Record format
- Defined by ISO 2709/ANSI Z39.2
- Structural elements of the format
- Metadata scheme
- Defined by MARC 21
- Fields, subfields, indicators and their semantics
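The two senses of MARC on this slide can be separated in code. The following is a minimal sketch, not a production parser: it handles only the ISO 2709 structural layer (24-character leader, 12-character directory entries, field and subfield delimiters) and says nothing about what a tag such as 245 means, which is the job of the MARC 21 metadata scheme. The helper `build_record`, the sample field values, and the fixed leader characters are illustrative assumptions.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Serialize (tag, body_bytes) pairs into one ISO 2709 record (hypothetical helper)."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(field):04d}{len(data):05d}".encode("ascii")
        data += field
    base_address = 24 + len(directory) + 1      # leader + directory + its terminator
    record_length = base_address + len(data) + 1  # + record terminator
    # Leader: length (0-4), status/type/level codes (5-9), indicator and
    # subfield-code counts (10-11), base address (12-16), codes (17-19), entry map.
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def parse_record(raw):
    """Split one ISO 2709 record into its leader and a list of fields."""
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])
    directory = raw[24:base_address - 1]  # exclude the directory's field terminator
    parsed = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        field = raw[begin:begin + length - 1]  # drop the trailing field terminator
        if tag < "010":
            # Control fields (00X) carry no indicators or subfields.
            parsed.append((tag, None, field.decode("utf-8")))
        else:
            indicators = field[:2].decode("ascii")
            chunks = field[2:].split(SUBFIELD_DELIMITER)[1:]
            subfields = [(c[:1].decode("ascii"), c[1:].decode("utf-8")) for c in chunks]
            parsed.append((tag, indicators, subfields))
    return leader, parsed


# Made-up sample data, for illustration only.
title_body = b"10" + SUBFIELD_DELIMITER + b"aA sample title /" + SUBFIELD_DELIMITER + b"cA. Author."
raw = build_record([("001", b"demo0001"), ("245", title_body)])
leader, parsed = parse_record(raw)
for field in parsed:
    print(field)
```

Nothing in `parse_record` depends on the meaning of any tag, indicator, or subfield code; those semantics would live in a separate layer that interprets the parsed tuples.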
5Approaching MARC's future
- Requirements for a record format / metadata scheme
- Responding to recent developments
- Looking at empirical data
6Thinking about requirements
- Goldsmith and Knudson's Requirements: LANL's DL Repository
- Granularity
- lossless data mapping without losing the finer shades of meaning intrinsic to the original data
- Transparency
- necessary for seamless data interchange, requiring a standard widely known throughout the digital library community.
- Extensibility
- in order to permit changes to the general structure without breaking the whole or requiring reprocessing of already ingested materials.
- Tennant's Requirements for Bibliographic Infrastructure
- XML-based format
- Modularity
- Hierarchy support
- Community-supported tool sets
- And others
7Thinking about requirements
- McCallum's 10 format attributes for MARC Forward
- XML
- Granularity
- Versatility
- Extensibility
- Modularity
- Hierarchy support
- Crosswalks
- Tools
- Cooperative management
- Pervasive
8Recent developments
- Functional Requirements for Bibliographic Records
- IFLA Study Group on Functional Requirements for Bibliographic Records, 1992-1995
- A conceptual model for the bibliographic universe (B. Tillett, 2003).
The aim of the study was to produce a framework
that would provide a clear, precisely stated,
and commonly shared understanding of what it is
that the bibliographic record aims to provide
information about, and what it is that we expect
the record to achieve in terms of answering user
needs.
9The FRBR model
- Based on Entity-Relationship modeling
- Entity: something that can be described
- Attributes: the features of the entity that characterize it
- Relationships between entities
- Three groups of entities in model
- Group 1: Products of intellectual or artistic endeavor
- Group 2: Entities responsible for the intellectual or artistic content, the physical production, etc.
- Group 3: Entities that serve as the subjects of intellectual or artistic endeavor
- Remember: what it is that the bibliographic record aims to provide information about
10FRBR Group 1 Entities
11FRBR -- Group 2 Entities
12FRBR Group Three Entities
13FRBR user tasks
- Remember: what it is that we expect the record to achieve in terms of answering user needs
- Four user tasks
- Find: Discovering if something exists by searching one or more attributes
- Identify: Examine retrieved records to determine the items that meet the user's search request
- Select: Examine retrieved records for those that meet other user needs/requirements
- Obtain: Using data in retrieved records to gain physical access to the described object
14Impact on cataloging and catalogs
- Introduces new terminology and conceptual model incorporated in
- RDA
- Statement on cataloging principles
- Assisting in understanding better the range of relationships in the bibliographic universe
- Collocation function of the catalog
- Improve linking mechanisms
- Implementation in catalogs to improve user experience
15Recent developments
- Revision of the Anglo-American Cataloguing Rules
- No AACR 3
- Resource Description and Access (RDA)
- Focus on guidelines for content creation
- Separation from syntax or record format
- Designing the future -- Library Systems and Data Formats (wiki)
- Grassroots effort to address next generation library catalog and data format
16Metadata
- Essential in library applications
- Variety of metadata schemes
- Variety of functions and services supported
- Increasing use of machine-generated metadata
- Role of handcrafted metadata needs continuing review and assessment
- Research on use of metadata schemes can provide empirical data for decisions
17Metadata record as artifact
- Metadata creation as process
- Resulting metadata records as artifacts of the process
- Artifact reflects decisions, policies
- Artifact can be investigated to understand metadata utilization decisions
- Decisions to use or not use available metadata elements
18Metadata rules and practice
- Library catalogers create metadata (bibliographic records)
- Follow cataloging rules and other standards to create the bibliographic data
- Encode the bibliographic data into MARC records
- MARC communications format and metadata scheme
- Approximately 2,000 structures for encoding data
19Richness of MARC
MARC 21 Field Groups | Currently Defined (in MARC 21 or OCLC MARC Bibliographic Format) | MARC 1972
00x | 6 | 3
0xx | 311 | 28
1xx | 76 | 40
2xx | 176 | 15
3xx | 155 | 4
4xx | 45 | 37
5xx | 344 | 8
6xx | 235 | 66
7xx | 477 | 41
8xx | 249 | 36
9xx | 16
TOTAL | 2074 | 278
20What do catalogers use?
- Given the cataloging rules
- Given the detailed structuring of bibliographic data in MARC records
- Given training of the catalogers
- Given local policies and practices
- What can we learn by examining a large set of
MARC bibliographic records?
21Why study MARC utilization?
- Standard record structure for exchange of descriptive and other types of metadata
- Evolved since late 1960s as key mechanism for sharing metadata among libraries
- Metadata record with approximately 2,000 elements available
- Approximately 200 fields
- Approximately 1,800 subfields or other structures
- To what extent is the richness/complexity exploited and to what purpose?
- See Goldsmith and Knudson regarding Los Alamos Research Library choice of a metadata scheme
Although often disparaged or dismissed in the
library community, the MARC standard, notably
the MARCXML standard, provides surprising
flexibility and robustness for mapping disparate
metadata to a vendor-neutral format for storage,
exchange, and downstream use.
22Occurrence summary
Frequency | # of Fields/Subfields | % of All Occurrences
> 600,000 | 1 | 4.4
500,000-599,999 | 0 | 0
400,000-499,999 | 13 | 39.9
300,000-399,999 | 6 | 14.3
200,000-299,999 | 6 | 10.6
100,000-199,999 | 10 | 10.3
TOTAL | 36 | 79.5
- Only 4% of all fields/subfields account for 80% of all occurrences
- 96% of all fields/subfields account for only 20% of all occurrences
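A concentration figure of this kind can be derived from plain frequency counts. The sketch below is one possible way to do it, not the project's actual tooling: sort elements by occurrence count and accumulate until a target share of all occurrences is reached. The function name and the counts are made up for illustration.

```python
def elements_for_coverage(counts, threshold):
    """Return the most frequent elements whose occurrences together
    reach `threshold` (0..1) of all occurrences, plus the share reached."""
    total = sum(counts.values())
    running = 0
    selected = []
    for element, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running / total >= threshold:
            break
        selected.append(element)
        running += n
    return selected, running / total


# Hypothetical occurrence counts per field/subfield (not MCDU data).
counts = {
    "245$a": 500,
    "650$a": 320,
    "100$a": 90,
    "500$a": 50,
    "246$a": 25,
    "538$a": 10,
    "754$a": 5,
}
selected, share = elements_for_coverage(counts, 0.80)
print(selected, f"{share:.0%} of occurrences",
      f"from {len(selected) / len(counts):.0%} of elements")
```

With these toy numbers, two of seven elements already cover the 80% target; the same loop run over real counts would give the kind of share reported on the slide.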
23The MCDU Project
- MARC Content Designation Utilization
- Provide empirical evidence of catalogers' use of MARC content designation
- Identify commonly used elements of bibliographic records
- Contribute to community discussion about core elements in MARC bibliographic records
- Explore the evolution of MARC content designation
- Develop research approach to understand the factors influencing levels of MARC content designation use
24Project deliverables
- Reports containing results of analysis of utilization
- Reports addressing commonly used elements
- Across formats
- In context of national recommendations (e.g., BIBCO)
- In context of FRBR user tasks
- HistoriMARC
- Database of MARC historical information about evolution of fields/subfields, etc.
- Enable analysis of patterns of adoption and utilization
- A methodology to understand factors influencing catalogers' use of MARC
- Software tools and methods for others to use
25Dataset and preparation
- 56,177,383 MARC 21 Bibliographic Records from OCLC WorldCat
- Decomposed the records to store in MySQL
- Parsing Tool
- 82 hours to process and load records
- 295 GB final database size (with indexing)
- Structuring of decomposed records aligns with analytical questions
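To make "decomposed records" concrete, here is a small sketch of storing one row per field or subfield occurrence in a relational table. The slides do not give the project's schema, so the table name, columns, and sample records are all assumptions, and Python's built-in sqlite3 stands in for MySQL so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per subfield (or per control field) occurrence.
conn.execute(
    """
    CREATE TABLE occurrence (
        record_id     INTEGER NOT NULL,
        field_seq     INTEGER NOT NULL,  -- position of the field within its record
        tag           TEXT    NOT NULL,
        subfield_code TEXT,              -- NULL for control fields
        value         TEXT
    )
    """
)
# Index chosen for the analytical question "how often is tag/subfield X used?"
conn.execute("CREATE INDEX idx_occurrence_tag ON occurrence (tag, subfield_code)")


def decompose(record_id, fields):
    """Flatten one parsed record into rows for the occurrence table."""
    rows = []
    for seq, (tag, content) in enumerate(fields):
        if isinstance(content, str):  # control field: a single value
            rows.append((record_id, seq, tag, None, content))
        else:                         # data field: list of (code, value) pairs
            for code, value in content:
                rows.append((record_id, seq, tag, code, value))
    return rows


# Made-up records for illustration only.
sample_records = {
    1: [("008", "fixed-length data"),
        ("245", [("a", "First title /"), ("c", "A. Author.")]),
        ("650", [("a", "Cataloging")]),
        ("650", [("a", "Metadata")])],
    2: [("008", "fixed-length data"),
        ("245", [("a", "Second title")])],
}
for record_id, fields in sample_records.items():
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)",
                     decompose(record_id, fields))

row_count = conn.execute("SELECT COUNT(*) FROM occurrence").fetchone()[0]
count_650a = conn.execute(
    "SELECT COUNT(*) FROM occurrence WHERE tag = '650' AND subfield_code = 'a'"
).fetchone()[0]
print(row_count, count_650a)
```

Keeping one row per occurrence is the design choice that lets frequency questions be answered with simple aggregate queries, at the cost of a much larger database than the source records.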
26Additional data preparation
- Analysis required determining frequency counts by format of material (ten)
- Concern about significant differences in patterns of utilization between Library of Congress and OCLC member cataloging
- Partitioned decomposed data into 20 databases
- Based on source of cataloging
- Based on format of material
27
MCDU Project Dataset: 56,177,383 records (100%)
Records | LC-Created Records, Number | LC-Created Records, % | Non-LC-Created Records, Number | Non-LC-Created Records, % | Total
MCDU Project Dataset by LC/nonLC | 8,713,665 | 15.5 | 47,463,718 | 84.5 | 56,177,383
Books Records | 7,595,887 | 13.5 | 34,546,200 | 61.5 | 42,142,087
Cartographic Materials | 242,132 | 0.4 | 596,642 | 1.1 | 838,774
Electronic Resources | 39,879 | 0.1 | 871,881 | 1.6 | 911,760
Continuing Resources | 388,332 | 0.7 | 2,193,009 | 3.9 | 2,581,341
Manuscripts | 11,471 | 0.02 | 4,390,970 | 7.8 | 4,402,441
Music | 109,249 | 0.2 | 1,167,654 | 2.1 | 1,276,903
Sound Recordings | 241,940 | 0.4 | 1,702,342 | 3.0 | 1,944,282
Projected Media | 22,088 | 0.04 | 1,415,606 | 2.5 | 1,437,694
Graphic Materials | 62,625 | 0.1 | 506,401 | 0.9 | 569,026
Three-Dimensional Objects and Realia | 62 | 0.0001 | 73,013 | 0.1 | 73,075
28Categories of questions
- General profile of the dataset (e.g.)
- What is the distribution of records by Type of Record?
- What is the distribution of records by Encoding Level?
- Occurrences of content designation structures
- What is the number of total occurrences of all control and data fields and how many unique field tags are used?
- In how many and in what percentage of records is each unique field/subfield combination used at least once?
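The last two questions distinguish total occurrences from the number of records in which an element appears at least once. A minimal sketch of that distinction, assuming each record has already been reduced to a list of its field tags (the function name and sample records are hypothetical):

```python
from collections import Counter


def utilization(records):
    """For each tag, return (records using it at least once,
    percentage of records, total occurrences)."""
    total_occurrences = Counter()
    records_using = Counter()
    n_records = 0
    for tags in records:
        n_records += 1
        total_occurrences.update(tags)   # counts every repeat
        records_using.update(set(tags))  # counts each tag once per record
    return {
        tag: (records_using[tag],
              round(100 * records_using[tag] / n_records, 1),
              total_occurrences[tag])
        for tag in total_occurrences
    }


# Made-up records for illustration: each inner list is one record's field tags.
records = [
    ["008", "245", "650", "650"],
    ["008", "245", "100"],
    ["008", "245", "650", "500", "500"],
]
result = utilization(records)
for tag in sorted(result):
    print(tag, result[tag])
```

A repeatable field such as 650 shows why both numbers matter: in this toy data it occurs three times but in only two of the three records.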
29Example results
- 7,595,887 LC-created records in dataset
- Type of Record: Book, Pamphlets, and Printed Sheets
- Total number of unique fields occurring: 167
- Number of fields accounting for 80% of occurrences: 14 fields (8.3%)
- Number of fields accounting for 90% of occurrences: 21 fields (12.6%)
- Approximately 110 fields (66%) occur in less than 1% of all records
- Note: Fields are cataloger-supplied, not system-supplied
30
Field Tag | Number of Records Where Each Field is Used at Least Once | Number of Total Occurrences of Each Field | Cumulative Total Percentage of Field Occurrences
650 | 5,387,282 | 11,778,732 | 10.910
008 | 7,595,887 | 7,595,887 | 17.945
245 | 7,595,887 | 7,595,887 | 24.981
010 | 7,595,726 | 7,595,726 | 32.016
300 | 7,586,264 | 7,586,415 | 39.043
260 | 7,585,926 | 7,585,928 | 46.069
050 | 7,027,027 | 7,095,639 | 52.642
100 | 5,626,011 | 5,626,018 | 57.853
500 | 3,264,297 | 4,582,571 | 62.097
020 | 3,845,934 | 4,235,426 | 66.020
082 | 4,034,888 | 4,036,101 | 69.758
043 | 3,665,624 | 3,665,626 | 73.154
504 | 3,373,297 | 3,403,714 | 76.306
700 | 2,312,712 | 3,240,072 | 79.307
880 | 512,563 | 2,327,504 | 81.463
31Making sense of numbers
- Frequency counts provide raw but informative data
- Threshold concept to delineate a change in trend in utilization
- Determining commonly occurring elements
- Comparing to recommended core records
- Comparing to recommendations for national level records
- Comparing the FRBR user tasks data
32Element use and FRBR tasks
- FRBR describes four user tasks
- Find
- Identify
- Select
- Obtain
- Are library catalogers providing data to support FRBR tasks?
- Delsey mapped these tasks to MARC CDS for FRBR entities
33FRBR user task Find (search)
- MARC 21 fields/subfields that can contain author, title, or subject data
- Author-related fields/subfields: 119
- Author/Title-related fields/subfields: 21
- Title-related fields/subfields: 253
- Subject-related fields/subfields: 144
- In FRBR context, Delsey identified
- Approximately 460 fields/subfields can support this task for the FRBR entities
- In MCDU dataset, only 59 (13%) of these occur at or above the threshold of use in OCLC book records
34Questions for consideration?
- What is needed in a bibliographic record?
- Support for the four user tasks?
- In context of FRBR, what does it mean to support a user task?
- Management of information resources?
- How do your systems use the infrequently used data?
- What about the 62% of all fields used in less than 1% of the records?
35Questions for consideration?
- Can you argue persuasively for the cost/benefit of your existing practice?
- Should the focus be on high-value, high-impact, high-quality data in a few fields/subfields?
- Can you identify these few fields/subfields?
- What would it mean for costs of cataloging?
- What would this mean for training?
- Can MCDU results inform your local practices?
36New cataloging practices?
- Select the appropriate metadata scheme.
- Use level of description and schema (DC, LOM, VRA Core, etc.) appropriate to the bibliographic resource. Don't apply MARC, AACR2, and LCSH to everything.
- Consider abandoning the use of controlled vocabularies (LCSH, MESH, etc.) for topical subjects in bibliographic records.
- Manually enrich metadata in important areas
- Enhance name, main title, series titles, and uniform titles for prolific authors in music, literature, and special collections.
- Automate Metadata Creation
- Encourage the creation of metadata by vendors, and its ingestion into our catalog as early as possible in the process.
- Import enhanced metadata whenever, wherever it is available from vendors and other sources.
- Rethinking How We Provide Bibliographic Services for the University of California (December 2005)
37Confluence for change
- Within library community
- Influence of FRBR concepts and model for metadata
- Resource Description and Access (RDA)
- Re-examination of library catalog and its position within the landscape of resource discovery tools
- Development of a bibliographic metadata element set
- Next generation MARC
38References
- MARC Content Designation Utilization Project
- http://www.mcdu.unt.edu/
- Moen and Benardino. (2003). Assessing Metadata Utilization: An Analysis of MARC Content Designation Use
- http://www.unt.edu/wmoen/publications/MARCPaper_Final2003pdf.pdf
- Goldsmith and Knudson. (2006). Repository Librarian and the Next Crusade: The Search for a Common Standard for Digital Repository Metadata
- http://www.dlib.org/dlib/september06/goldsmith/09goldsmith.html
- Roy Tennant. (2004). A Bibliographic Metadata Infrastructure for the Twenty-first Century
- Sally H. McCallum. (2006). MARC Forward.
- http://www.rlg.org/en/pdfs/Forum.8-06.McCallum.pdf
39References
- Designing the future -- Library Systems and Data Formats
- http://futurelib.pbwiki.com/
- Barbara Tillett. (2003). What is FRBR? A Conceptual Model for the Bibliographic Universe.
- http://www.loc.gov/cds/downloads/FRBR.PDF
- Karen Calhoun. (2006). The Changing Nature of the Catalog and its Integration with Other Discovery Tools
- http://www.loc.gov/catdir/calhoun-report-final.pdf
- Bibliographic Services Task Force. (2005). Rethinking How We Provide Bibliographic Services for the University of California
- http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf