CS 502: Computing Methods for Digital Libraries - PowerPoint PPT Presentation

Slides: 19
Provided by: wya54

Transcript and Presenter's Notes

1
CS 502: Computing Methods for Digital Libraries
  • Lecture 13
  • Descriptive Metadata I: cataloguing,
    classification, authority files

2
Administration
  • Open laptop examination
      • Read the Course Notices for instructions
      • Remember, electronic communication is
        cheating!
  • Extension of wireless network
      • Uris Library and Olin Library (1st floor and
        basement)
  • Schedule changes
      • See Course Notices for next two lectures
      • Change of dates for future assignments

3
Text Retrieval Conferences (TREC)
  • Quantitative research in digital libraries.
  • Compare performance of techniques, e.g.,
    automatic thesauri, sophisticated term weighting,
    natural language techniques, relevance feedback,
    and advanced machine learning.
  • Corpus of several million textual documents
    -- 5 Gbytes.
  • Standard set of tasks, e.g.,
      • Search the corpus for topics provided
      • Match a stream of documents against standard
        queries
  • Participants include large commercial
    companies, small information retrieval vendors,
    and university research groups.

4
Descriptive metadata
Some methods of information discovery search
descriptive metadata about the objects.
Metadata typically consists of a catalog or
indexing record, or an abstract, one record for
each object.
  • Catalog: metadata records that have a consistent
    structure, organized according to systematic
    rules.
  • Abstract: a free text record that summarizes a
    longer document.
  • Indexing record: less formal than a catalog
    record, but more structured than a simple
    abstract.

5
Descriptive metadata
  • Usually stored separately from the objects
    that it describes, but sometimes is embedded
    in the objects.
  • Usually the metadata is a set of text
    fields.
  • Textual metadata can be used to describe
    non-textual objects, e.g., software, images,
    music

6
Library Cataloguing
Anglo American Cataloguing Rules (AACR2): rules
for what goes into each field of a catalog record

MARC format: an exchange format for catalog
records

"MARC Catalog": catalog in MARC format, where
content of each field follows AACR2

7
Example: Monograph catalog record
Citation: Caroline R. Arms, editor, Campus
strategies for libraries and electronic
information. Bedford, MA: Digital Press, 1990.

8
MARC fields
tag  value
001  89-16879 r93
050  Z675.U5C16 1990
082  027.7/0973 20
245  Campus strategies for libraries and electronic information/Caroline Arms, editor.  [title statement]
260  Bedford, Mass. Digital Press, c1990.  [publisher]
300  xi, 404 p. ill. 24 cm.  [collation]
440  EDUCOM strategies series on information technology  [series title]
504  Includes bibliographical references (p. 373-381).
020  ISBN 1-55558-036-X 34.95

9
MARC fields (continued)
tag  value
650  Academic libraries--United States--Automation.  [subject heading]
650  Libraries and electronic publishing--United States.
650  Library information networks--United States.
650  Information technology--United States.
700  Arms, Caroline R. (Caroline Ruth)
040  DLC DLC DLC
043  n-us---
955  CIP ver. br02 to SL 02-26-90
985  APIF/MIG
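
The record above repeats tag 650 four times, so a plain tag-to-value dictionary would lose data. A minimal sketch of one way to hold such a record in memory, assuming a simple list of (tag, value) pairs; the helper name `get_fields` is hypothetical and the values are copied from the slides:

```python
# A catalog record as an ordered list of (tag, value) pairs.
# A list is used instead of a dict because tags such as 650 may repeat.
record = [
    ("245", "Campus strategies for libraries and electronic "
            "information/Caroline Arms, editor."),
    ("260", "Bedford, Mass. Digital Press, c1990."),
    ("650", "Academic libraries--United States--Automation."),
    ("650", "Libraries and electronic publishing--United States."),
    ("650", "Library information networks--United States."),
    ("650", "Information technology--United States."),
    ("700", "Arms, Caroline R. (Caroline Ruth)"),
]


def get_fields(record, tag):
    """Return every value stored under `tag`, in record order."""
    return [value for field_tag, value in record if field_tag == tag]


subject_headings = get_fields(record, "650")
for heading in subject_headings:
    print(heading)
```

Returning a list even for non-repeating tags keeps the caller's code the same for every field.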
10
MARC Encoding
tag 260
  subfield a: Bedford, Mass.
  subfield b: Digital Press,
  subfield c: c1990.
MARC encoding: 2600abcBedford, Mass. Digital Press,c1990.
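
In the MARC exchange format each subfield is introduced by a non-printing delimiter character (hex 1F) followed by a one-character subfield code, and each field ends with a field terminator (hex 1E). The sketch below encodes and decodes the data of field 260 under those rules. It is illustrative only: the two blank indicator characters are an assumption, the function names are hypothetical, and in a full record the tag itself is stored in a separate directory rather than in the field data.

```python
SUBFIELD_DELIMITER = "\x1f"  # introduces each subfield
FIELD_TERMINATOR = "\x1e"    # ends each variable field


def encode_field(indicators, subfields):
    """Encode two indicator characters plus (code, value) subfields."""
    body = "".join(
        SUBFIELD_DELIMITER + code + value for code, value in subfields
    )
    return indicators + body + FIELD_TERMINATOR


def decode_field(data):
    """Split encoded field data back into indicators and subfields."""
    indicators = data[:2]
    body = data[2:].rstrip(FIELD_TERMINATOR)
    # The text before the first delimiter is empty, so skip it.
    parts = [p for p in body.split(SUBFIELD_DELIMITER)[1:] if p]
    return indicators, [(p[0], p[1:]) for p in parts]


# Subfield values from the slide; blank indicators are an assumption.
field_260 = encode_field(
    "  ",
    [("a", "Bedford, Mass."), ("b", "Digital Press,"), ("c", "c1990.")],
)
print(repr(field_260))
print(decode_field(field_260))
```

Because the delimiters are control characters, they do not show up when the encoded field is printed as plain text.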
11
Name authority files
  • Caroline R. Arms or Caroline Ruth Arms?
  • Which William Phillips of Cardiff?
  • Mark Twain or Samuel Clemens?
  • Epithets
      • of Cardiff
      • doctor
  • Dates
      • 1832 - 1876
      • flourished 1860
      • circa 1832 - 1876

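The questions above are what a name authority file answers: every variant of a name maps to one authorized heading. A minimal sketch, assuming a small in-memory table; the heading for Caroline Arms is the form used in field 700 of the example record, while "Twain, Mark" and the function names are illustrative assumptions:

```python
# Authorized heading -> known variant forms of the name.
AUTHORITY_FILE = {
    "Arms, Caroline R. (Caroline Ruth)": [
        "Caroline R. Arms",
        "Caroline Ruth Arms",
    ],
    "Twain, Mark": [  # illustrative choice of authorized form
        "Mark Twain",
        "Samuel Clemens",
    ],
}


def normalize(name):
    """Lower-case and collapse whitespace so lookups are forgiving."""
    return " ".join(name.lower().split())


def build_lookup(authority_file):
    """Index every heading and every variant by its normalized form."""
    lookup = {}
    for heading, variants in authority_file.items():
        lookup[normalize(heading)] = heading
        for variant in variants:
            lookup[normalize(variant)] = heading
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def resolve(name):
    """Return the authorized heading for a name, or None if unknown."""
    return LOOKUP.get(normalize(name))


print(resolve("Samuel Clemens"))
print(resolve("Caroline Ruth Arms"))
```

A table like this cannot tell two different people with the same name apart, which is why real headings add epithets and dates.
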
12
Shared cataloguing
  • OCLC -- Large centralized transaction processing
    database system
  • When a library catalogs a book, it deposits a
    MARC record in OCLC
  • Other libraries can copy the record
      • Saves duplication of cataloguing
      • Builds a database of holdings
  • OCLC database has 42 million records

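The deposit-and-copy workflow can be sketched as a small class. This is a toy model under stated assumptions, not a description of how OCLC is implemented: the class name, method names, and library names are hypothetical, and the record identifier is the 001 value from the example record.

```python
class SharedCatalog:
    """Toy model of a central shared-cataloguing database."""

    def __init__(self):
        self.records = {}   # record id -> catalog record
        self.holdings = {}  # record id -> set of libraries holding the item

    def deposit(self, library, record_id, record):
        """The first library to catalog an item contributes its record."""
        self.records.setdefault(record_id, record)
        self.holdings.setdefault(record_id, set()).add(library)

    def copy(self, library, record_id):
        """Another library reuses the record instead of re-cataloguing."""
        record = self.records[record_id]
        self.holdings[record_id].add(library)
        return dict(record)  # the library gets its own copy


catalog = SharedCatalog()
catalog.deposit(
    "Library A",
    "89-16879",
    {"245": "Campus strategies for libraries and electronic information"},
)
copied = catalog.copy("Library B", "89-16879")
print(copied["245"])
print(sorted(catalog.holdings["89-16879"]))
```

Each copy also records a holding, which is how the shared database comes to know which libraries own an item.
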
13
Subject information
Library of Congress Subject Headings:
  Academic libraries--United States--Automation

Hierarchical classification
  Library of Congress call number: Z675.U5C16
  Dewey Decimal Classification: 027.7

Creation and maintenance of lists of subject
headings and classifications is a never ending
task.
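
Both kinds of subject information on this slide have structure that software can exploit: a subject heading is subdivided with "--", and a Dewey number narrows from left to right. A minimal sketch of expanding each into its broader levels; the function names are hypothetical, and the Dewey helper assumes a three-digit class before the decimal point:

```python
def heading_levels(heading):
    """Expand a subdivided subject heading into successively narrower forms."""
    parts = heading.rstrip(".").split("--")
    return ["--".join(parts[:i]) for i in range(1, len(parts) + 1)]


def dewey_levels(number):
    """Expand a Dewey number into its broader classes, broadest first."""
    whole, _, decimals = number.partition(".")
    # 027 -> 000 (main class), 020 (division), 027 (section)
    levels = [whole[:i].ljust(3, "0") for i in range(1, 4)]
    for i in range(1, len(decimals) + 1):
        levels.append(whole + "." + decimals[:i])
    return levels


print(heading_levels("Academic libraries--United States--Automation"))
print(dewey_levels("027.7"))
```

Indexing a record under every level lets a search for a broad class also find items filed under its narrower classes.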
14
Online public access catalog (OPAC)
  • First stage
      • Library mounts its MARC records on a central
        computer
      • Provides a simple terminal interface and
        dedicated terminals
      • Boolean search -- fielded searching
      • Most university libraries reached this stage
        about 1990
  • Second stage
      • Library connects computer to a campus network
        and Internet
      • Converts card catalog records to MARC
        (retrospective conversion)

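Boolean fielded searching means each search term is tied to a named field and the terms are combined with AND, OR, and NOT. A minimal sketch over an in-memory list of records; the function names, the field names, and the second record are hypothetical, and real catalogs would use an index rather than a linear scan:

```python
CATALOG = [
    {
        "title": "Campus strategies for libraries and electronic information",
        "author": "Arms, Caroline R. (Caroline Ruth)",
        "subject": "Academic libraries--United States--Automation",
    },
    {   # hypothetical second record, for contrast only
        "title": "An example book about public libraries",
        "author": "Example, Author",
        "subject": "Public libraries--United States",
    },
]


def matches(record, field, term):
    """True if `term` occurs in the named field (case-insensitive)."""
    return term.lower() in record.get(field, "").lower()


def search(catalog, all_of=(), any_of=(), none_of=()):
    """Combine (field, term) pairs with AND, OR, and NOT respectively."""
    results = []
    for record in catalog:
        if not all(matches(record, f, t) for f, t in all_of):
            continue
        if any_of and not any(matches(record, f, t) for f, t in any_of):
            continue
        if any(matches(record, f, t) for f, t in none_of):
            continue
        results.append(record)
    return results


# author contains "arms" AND subject contains "automation"
hits = search(CATALOG, all_of=[("author", "arms"), ("subject", "automation")])
print([record["title"] for record in hits])
```

Restricting a term to one field is what separates this from free-text search: "arms" in the author field does not match a title that happens to contain the word.
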
15
Library information systems
  • When the catalog is online ...
  • Add other collections and services
      • Secondary information (Inspec, Medline,
        Chemical Abstracts)
      • Reference works (dictionaries,
        encyclopedias)
  • Improve user interface
  • Add full text searching
  • Add web interface
  • Add connections to off-campus information
    sources
      • Scientific journals
      • Databases (census, genome)

16
Library management systems
A library management system, sometimes called an
integrated library system, integrates the
internal processes of a library, e.g.,
acquisitions, cataloguing, binding, and
circulation. It usually contains an online public
access catalog, but does not provide integrated
services to users.

Library management systems are produced by small
companies that lack the capital and technical
expertise to develop modern digital libraries.

17
Notes on MARC
  • A great achievement
  • Developed in the 1960s
  • Magnetic tape exchange format for printing
    catalog records
  • The dawn of computing
      • mixed upper and lower case
      • variable length fields
      • repeated fields
      • non-Roman scripts
  • 100(?) million records with standard content
    and format
  • Thousands of trained librarians (millions?)

18
Notes on MARC
  • A great problem
  • Not designed for computer algorithms
  • One record per item (poor links between
    records)
  • Tied to traditional materials and
    traditional practices
  • Not Unicode
  • 100 million records at $100 each -- $10 billion
  • A classic legacy system!