Title: CS 502: Computing Methods for Digital Libraries
1. CS 502 Computing Methods for Digital Libraries
- Lecture 13
- Descriptive Metadata I: cataloguing,
classification, authority files
2. Administration
- Open laptop examination
- Read the Course Notices for instructions
- Remember, electronic communication is
cheating!
- Extension of wireless network
- Uris Library and Olin Library (1st floor and
basement)
- Schedule changes
- See Course Notices for next two lectures
- Change of dates for future assignments
3. Text Retrieval Conferences (TREC)
- Quantitative research in digital libraries.
- Compare performance of techniques, e.g.,
automatic thesauri, sophisticated term weighting,
natural language techniques, relevance feedback,
and advanced machine learning.
- Corpus of several million textual documents
-- 5 Gbytes.
- Standard set of tasks, e.g.,
- Search the corpus for topics provided
- Match a stream of documents against standard
queries
- Participants include large commercial
companies, small information retrieval vendors,
and university research groups.
4. Descriptive metadata
Some methods of information discovery search
descriptive metadata about the objects.
Metadata typically consists of a catalog or
indexing record, or an abstract, one record for
each object.
- Catalog: metadata records that have a consistent
structure, organized according to systematic
rules.
- Abstract: a free text record that summarizes a
longer document.
- Indexing record: less formal than a catalog
record, but more structured than a simple
abstract.
5. Descriptive metadata
- Usually stored separately from the objects
that it describes, but sometimes is embedded in the
objects.
- Usually the metadata is a set of text
fields.
- Textual metadata can be used to describe
non-textual objects, e.g., software, images,
music
6. Library Cataloguing
- Anglo American Cataloguing Rules (AACR2):
rules for what goes into each field of a catalog
record
- MARC format: an exchange format for
catalog records
- "MARC Catalog": catalog in
MARC format, where content of each field
follows AACR2
7. Example: Monograph catalog record
Citation: Caroline R. Arms, editor, Campus
strategies for libraries and electronic
information. Bedford, MA: Digital Press, 1990.
8. MARC fields
tag  value
001  89-16879 r93
050  Z675.U5C16 1990
082  027.7/0973 20
245  Campus strategies for libraries and electronic      title statement
     information/Caroline Arms, editor.
260  Bedford, Mass. Digital Press, c1990.                publisher
300  xi, 404 p. ill. 24 cm.                              collation
440  EDUCOM strategies series on information technology  series title
504  Includes bibliographical references (p. 373-381).
020  ISBN 1-55558-036-X 34.95
9. MARC fields (continued)
tag  value
650  Academic libraries--United States--Automation.      subject heading
650  Libraries and electronic publishing--United States.
650  Library information networks--United States.
650  Information technology--United States.
700  Arms, Caroline R. (Caroline Ruth)
040  DLC DLC DLC
043  n-us---
955  CIP ver. br02 to SL 02-26-90
985  APIF/MIG
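One way to picture the record above in a program is as an ordered list of (tag, value) pairs. This is only a minimal sketch, not the way any particular MARC library stores records; the names `RECORD` and `values_for` are made up for illustration, and only a few fields from the slides are included. A list is used rather than a dictionary because tags such as 650 repeat.

```python
# Illustrative only: a catalog record as an ordered list of (tag, value) pairs.
# The values are copied from the slides; the structure itself is an assumption.
RECORD = [
    ("245", "Campus strategies for libraries and electronic information/Caroline Arms, editor."),
    ("260", "Bedford, Mass. Digital Press, c1990."),
    ("650", "Academic libraries--United States--Automation."),
    ("650", "Libraries and electronic publishing--United States."),
    ("650", "Library information networks--United States."),
    ("650", "Information technology--United States."),
    ("700", "Arms, Caroline R. (Caroline Ruth)"),
]


def values_for(record, tag):
    """Return every value stored under `tag`, preserving record order."""
    return [value for field_tag, value in record if field_tag == tag]


subject_headings = values_for(RECORD, "650")
print(len(subject_headings))
print(values_for(RECORD, "700"))
```

Asking for tag 650 returns all four subject headings, which a one-value-per-key dictionary could not hold without extra work.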
10. MARC Encoding
tag 260
subfield a  Bedford, Mass.
subfield b  Digital Press,
subfield c  c1990.
MARC encoding: 260 0 $aBedford, Mass. $bDigital Press,$cc1990.
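The subfield structure of tag 260 can be sketched in Python as follows. In the MARC exchange format each subfield is introduced by the ASCII unit separator (0x1F) followed by a one-character code, and each field ends with the record separator (0x1E); the delimiter is usually displayed as `$` or a double dagger. The function names `encode_field` and `decode_field` are hypothetical, and the indicator pair `"0 "` is an assumption based on the single `0` shown on the slide. This is a rough illustration of one field, not a full record encoder: in a real record the tag is held in a separate directory, not next to the field data.

```python
SUBFIELD_DELIMITER = "\x1f"  # ASCII unit separator, precedes each subfield code
FIELD_TERMINATOR = "\x1e"    # ASCII record separator, ends each field


def encode_field(indicators, subfields):
    """Build field data: two indicators, then delimiter + code + value per subfield."""
    body = "".join(SUBFIELD_DELIMITER + code + value for code, value in subfields)
    return indicators + body + FIELD_TERMINATOR


def decode_field(data):
    """Split field data back into (indicators, [(code, value), ...])."""
    data = data.rstrip(FIELD_TERMINATOR)
    indicators, *chunks = data.split(SUBFIELD_DELIMITER)
    return indicators, [(chunk[0], chunk[1:]) for chunk in chunks if chunk]


# Field 260 from the slide; the indicator values are assumed.
field_260 = encode_field(
    "0 ",
    [("a", "Bedford, Mass. "), ("b", "Digital Press,"), ("c", "c1990.")],
)
indicators, subfields = decode_field(field_260)
print(repr(field_260))
print(subfields)
```

Because the delimiters are control characters that never occur in ordinary text, the field can be split without knowing field lengths in advance.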
11. Name authority files
- Caroline R. Arms or Caroline Ruth Arms?
- Which William Phillips of Cardiff?
- Mark Twain or Samuel Clemens?
- Epithets
- of Cardiff
- doctor
- Dates
- 1832 - 1876
- flourished 1860
- circa 1832 - 1876
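The idea of an authority file can be sketched as a lookup table that maps every variant form of a name to one authorized heading. The table below is hypothetical: the heading for Caroline Arms is taken from the 700 field of the example record, while choosing "Twain, Mark" as the authorized form is an assumption made for illustration. `AUTHORITY` and `authorized_heading` are invented names.

```python
# Hypothetical authority file: each variant form points to one authorized
# heading. Entries are illustrative, not taken from a real authority file.
AUTHORITY = {
    "caroline r. arms": "Arms, Caroline R. (Caroline Ruth)",
    "caroline ruth arms": "Arms, Caroline R. (Caroline Ruth)",
    "mark twain": "Twain, Mark",
    "samuel clemens": "Twain, Mark",
}


def authorized_heading(name):
    """Return the authorized heading for a name, or None if it is not listed."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return AUTHORITY.get(key)


print(authorized_heading("Caroline Ruth Arms"))
print(authorized_heading("Samuel Clemens"))
print(authorized_heading("William Phillips"))
```

A name such as William Phillips cannot be resolved by a simple lookup like this, which is why headings carry epithets and dates to tell people with the same name apart.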
12. Shared cataloguing
- OCLC -- Large centralized transaction processing
database system
- When a library catalogs a book it deposits MARC
record in OCLC
- Other libraries can copy the record
- saves duplication of cataloguing
- build database of holdings
- OCLC database has 42 million records
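The workflow above (deposit once, copy many times, accumulate holdings) can be sketched with a toy in-memory class. This is not OCLC's interface; `UnionCatalog`, its methods, and the library names are hypothetical, and the control number is the one shown in field 001 of the example record.

```python
class UnionCatalog:
    """Toy stand-in for a centralized shared-cataloguing database."""

    def __init__(self):
        self.records = {}   # control number -> catalog record
        self.holdings = {}  # control number -> set of library names

    def deposit(self, library, control_number, record):
        """Original cataloguing: store the record and note the holding."""
        self.records.setdefault(control_number, record)
        self.holdings.setdefault(control_number, set()).add(library)

    def copy(self, library, control_number):
        """Copy cataloguing: reuse an existing record and note the holding."""
        record = self.records.get(control_number)
        if record is not None:
            self.holdings[control_number].add(library)
        return record


catalog = UnionCatalog()
catalog.deposit(
    "Library A",
    "89-16879",
    {"245": "Campus strategies for libraries and electronic information"},
)
copied = catalog.copy("Library B", "89-16879")
print(copied)
print(sorted(catalog.holdings["89-16879"]))
```

The second library never catalogs the book itself, and the holdings set records which libraries own a copy.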
13. Subject information
- Library of Congress Subject Headings:
Academic libraries--United States--Automation
- Hierarchical classification
- Library of Congress call number: Z675.U5C16
- Dewey Decimal Classification: 027.7
Creation and maintenance
of lists of subject headings and classifications
is a never-ending task.
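The `--` in a subject heading separates the main heading from its subdivisions, so a heading can be broken into levels with a simple split. The sketch below assumes headings are plain strings in the form shown on the slides; `split_heading` and `broader_headings` are illustrative names, not part of any cataloguing standard or library.

```python
def split_heading(heading):
    """Split a subject heading into its main heading and subdivisions."""
    return [part.strip() for part in heading.rstrip(".").split("--")]


def broader_headings(heading):
    """List the heading at each level, from most general to most specific."""
    parts = split_heading(heading)
    return ["--".join(parts[: i + 1]) for i in range(len(parts))]


heading = "Academic libraries--United States--Automation."
print(split_heading(heading))
print(broader_headings(heading))
```

Listing the broader forms of a heading is one simple way to let a search for "Academic libraries" also find records indexed under its subdivisions.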
14. Online public access catalog (OPAC)
- First stage
- Library mounts its MARC records on a central
computer
- Provides a simple terminal interface and
dedicated terminals
- Boolean search -- fielded searching
- Most university libraries reached this stage
about 1990
- Second stage
- Library connects computer to a campus network
and Internet
- Converts card catalog records to MARC
(retrospective conversion)
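Boolean, fielded searching of the kind an early OPAC offered can be sketched as follows. This is a toy illustration, not how any real OPAC is implemented: the field names, the helper functions, and the second record are invented, and matching is a plain whole-word test with no index.

```python
# Toy catalog: each record maps a field name to its text.
# Field names and the second record are made up for illustration.
CATALOG = [
    {"title": "Campus strategies for libraries and electronic information",
     "subject": "Academic libraries--United States--Automation"},
    {"title": "An illustrative second record",
     "subject": "Information technology--United States"},
]


def matches(record, field, term):
    """True if `term` occurs as a whole word (case-insensitive) in the field."""
    words = record.get(field, "").lower().replace("--", " ").split()
    return term.lower() in words


def search_and(catalog, *conditions):
    """Boolean AND: keep records that satisfy every (field, term) condition."""
    return [r for r in catalog if all(matches(r, f, t) for f, t in conditions)]


def search_or(catalog, *conditions):
    """Boolean OR: keep records that satisfy at least one (field, term) condition."""
    return [r for r in catalog if any(matches(r, f, t) for f, t in conditions)]


hits = search_and(CATALOG, ("title", "libraries"), ("subject", "automation"))
print([r["title"] for r in hits])
```

Restricting each term to a named field is what distinguishes fielded searching from full text searching over the whole record.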
15. Library information systems
- When the catalog is online ...
- Add other collections and services
- Secondary information (Inspec, Medline,
Chemical Abstracts)
- Reference works (dictionaries,
encyclopedias)
- Improve user interface
- Add full text searching
- Add web interface
- Add connections to off-campus information
sources
- Scientific journals
- Databases (census, genome)
16. Library management systems
A library management system, sometimes called an
integrated library system, integrates the
internal processes of a library, e.g.,
acquisitions, cataloguing, binding, circulation,
etc. It usually contains an online public
access catalog, but does not provide integrated
services to users. Library management systems are
produced by small companies who lack the capital
and technical expertise to develop modern digital
libraries.
17. Notes on MARC
- A great achievement
- Developed in 1960s
- Magnetic tape exchange format for printing
catalog records
- The dawn of computing
- mixed upper and lower case
- variable length fields
- repeated fields
- non-Roman scripts
- 100(?) million records with standard content
and format
- Thousands of trained librarians (millions?)
18. Notes on MARC
- A great problem
- Not designed for computer algorithms
- One record per item (poor links between
records)
- Tied to traditional materials and
traditional practices
- Not Unicode
- 100 million records at $100 each -- $10 billion
- A classic legacy system!