Title: Controlled Vocabularies: Name Authority Control
1. Controlled Vocabularies: Name Authority Control
- University of California, Berkeley
- School of Information Management and Systems
- SIMS 202 Information Organization and Retrieval
2. Review
- Mapping to the relational model
- Database Design Normalization
- ER Diagrams and Assignment
3. Normalization
- Unnormalized Relations
- First normal form: atomic values only; functional dependency of nonkey attributes on the primary key
- Second normal form: full functional dependency of nonkey attributes on the primary key
- Third normal form: no transitive dependency between nonkey attributes
- Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
4. Unnormalized Relations
- The first step in normalization is to convert the data into a two-dimensional table
- In unnormalized relations data can repeat within a column
5. Unnormalized Relation
6. First Normal Form
- To move to First Normal Form a relation must contain only atomic values at each row and column.
- No repeating groups
- A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.
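A minimal sketch of the move to First Normal Form, assuming a hypothetical record layout (the patient_no and surgeries fields and their values are illustrative, not the lecture's own table). Each value in the repeating group becomes its own row, so every cell holds one atomic value.

```python
# Hypothetical unnormalized rows: the "surgeries" column holds a repeating group.
unnormalized = [
    {"patient_no": 1111, "surgeries": ["Gallstone removal", "Kidney stone removal"]},
    {"patient_no": 1234, "surgeries": ["Eye cataract removal"]},
]


def to_first_normal_form(rows, key, repeating, atomic_name):
    """Emit one row per value of the repeating column, so every cell is atomic."""
    flat = []
    for row in rows:
        for value in row[repeating]:
            flat.append({key: row[key], atomic_name: value})
    return flat


first_nf = to_first_normal_form(unnormalized, "patient_no", "surgeries", "surgery")
for row in first_nf:
    print(row)
```

In the flattened table patient_no alone no longer identifies a row; the pair (patient_no, surgery) acts as the candidate key.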
7. First Normal Form
8. Second Normal Form
- A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key.
- That is, every nonkey attribute needs the full primary key for unique identification
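One way to look for Second Normal Form problems is to test whether part of a composite key already determines a nonkey attribute. The sketch below is only an illustration: the holdings rows are hypothetical (they borrow the accno, libid, title and callno names from the Cookie database), the helper names are made up for this example, and a check against sample data can only suggest or refute a dependency, not prove it.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, the lhs columns functionally determine the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, nonkey):
    """List (key_part, attribute) pairs where a single key column alone determines the attribute."""
    if len(key) < 2:
        return []  # a single-column key cannot have partial dependencies
    return [
        (part, attr)
        for attr in nonkey
        for part in key
        if determines(rows, [part], [attr])
    ]


# Hypothetical table with composite key (accno, libid).
holdings = [
    {"accno": 1, "libid": 10, "title": "A", "callno": "LB1"},
    {"accno": 1, "libid": 20, "title": "A", "callno": "LB2"},
    {"accno": 2, "libid": 10, "title": "B", "callno": "LB3"},
]

violations = partial_dependencies(holdings, ["accno", "libid"], ["title", "callno"])
print(violations)
```

Here title depends on accno alone, so it belongs in a separate book table, while callno needs the full key.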
9. Second Normal Form
10. Second Normal Form
11. Third Normal Form
- A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes
- When one nonkey attribute can be determined from one or more other nonkey attributes, there is said to be a transitive functional dependency.
- The side effect column in the Surgery table is determined by the drug administered
- Side effect is transitively functionally dependent on drug, so Surgery is not in 3NF
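The usual remedy is to split the transitively dependent column into its own relation. The following is a rough sketch under stated assumptions: the Surgery rows, column names and values are hypothetical stand-ins, and decompose is an illustrative helper, not a function from the lecture.

```python
# Hypothetical Surgery rows; side_effect is determined by drug.
surgery = [
    {"patient_no": 1111, "surgery": "Gallstone removal", "drug": "Penicillin", "side_effect": "rash"},
    {"patient_no": 1234, "surgery": "Eye cataract removal", "drug": "Tetracycline", "side_effect": "fever"},
    {"patient_no": 2345, "surgery": "Open heart surgery", "drug": "Penicillin", "side_effect": "rash"},
]


def decompose(rows, determinant, dependent):
    """Move a transitively dependent column into a lookup keyed by its determinant."""
    lookup = {}
    remaining = []
    for row in rows:
        known = lookup.setdefault(row[determinant], row[dependent])
        if known != row[dependent]:
            raise ValueError(f"{determinant} does not determine {dependent}")
        remaining.append({k: v for k, v in row.items() if k != dependent})
    return remaining, lookup


surgery_3nf, drug_side_effect = decompose(surgery, "drug", "side_effect")

# Joining the two relations back together recovers the original rows.
rejoined = [dict(row, side_effect=drug_side_effect[row["drug"]]) for row in surgery_3nf]
print(drug_side_effect)
```

Each drug's side effect is now stored once, so changing it cannot leave inconsistent copies in Surgery.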
12. Third Normal Form
13. Third Normal Form
14. Joins
15. More on Assignment and ER
- Just what is this Cookie database?
- What sort of ways might it be used?
- What are those ER symbols again?
16. Original Assignment
- Examine the Cookie database using Access and look at the ER Diagram for it posted on the assignments page.
- Consider the possibilities of Book publications
- What are the problems with the database?
- What new fields would you add to the database, and where?
- Draw a new ER diagram showing your design.
17. Cookie ER diagram
[ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE. Relationships: Has call, Has copy, Has index, Has subject. Linking attributes shown: pubid, accno, libid, subcode.]
Note: diagram contains only attributes used for linking
18. Cookie Database
- Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries
- There are currently 5 main types of entities in the database (and one linking relation)
- Books (bibfile)
- Local Call numbers (callfile)
- Libraries (libfile)
- Publishers (pubfile)
- Subject headings (subfile)
- Links between subject and books (indxfile)
19. BIBFILE
- Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are
- accno -- an accession or serial number
- author -- The author's name
- title -- The title of the book
- loc -- Location of publication (where published)
- date -- Date of publication
- price -- Price of the book
- pagination -- Number of pages
- ill -- What type of illustrations (maps, etc.), if any
- height -- Height of the book in centimeters
20. CALLFILE
- CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are
- accno -- the book accession number
- libid -- the id of the holding library
- callno -- the call number of the book in the particular library
- copies -- the number of copies held by the particular library
21. LIBFILE
- LIBFILE contains information about the libraries participating in this union catalog. Its attributes include
- libid -- Library id number
- library -- Name of the library
- laddress -- Street address for the library
- lcity -- City name
- lstate -- State code (postal abbreviation)
- lzip -- zip code
- lphone -- Phone number
- mop - suncl -- Library opening and closing times
for each day of the week.
22. PUBFILE
- PUBFILE contains information about the publishers of books. Its attributes include
- pubid -- The publisher's id number
- publisher -- Publisher name
- paddress -- Publisher street address
- pcity -- Publisher city
- pstate -- Publisher state
- pzip -- Publisher zip code
- pphone -- Publisher phone number
- ship -- standard shipping time in days
23. SUBFILE
- SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are
- subcode -- Subject identification number
- subject -- the subject heading/description
24. INDXFILE
- INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables
- subcode -- link to subject id
- accno -- link to book accession number
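The many-to-many mapping can be sketched with SQLite from Python. This is an illustration under assumptions, not the actual Cookie (Access) schema: the column types, the key declarations, the subject codes and the subject assignments are all hypothetical, and only the table and column names come from the descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, author TEXT, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
-- Linking relation: one row per (subject, book) pair.
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno   INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
""")
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?)", [
    (1, "Whitehead, Alfred", "The Aims of Education and Other Essays"),
    (2, "Flexner, Abraham", "Universities: American, English, German"),
])
# Hypothetical subject headings and assignments.
conn.executemany("INSERT INTO subfile VALUES (?, ?)", [
    (100, "Higher education"),
    (200, "Philosophy"),
])
conn.executemany("INSERT INTO indxfile VALUES (?, ?)", [(100, 1), (100, 2), (200, 1)])


def titles_for_subject(conn, subject):
    """Follow subfile -> indxfile -> bibfile to list titles under a subject heading."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM bibfile b
        JOIN indxfile i ON i.accno = b.accno
        JOIN subfile s ON s.subcode = i.subcode
        WHERE s.subject = ?
        ORDER BY b.title
        """,
        (subject,),
    ).fetchall()
    return [title for (title,) in rows]


print(titles_for_subject(conn, "Higher education"))
```

A book can carry several subjects and a subject can cover several books without repeating any book or subject data; only the pair of ids is stored per link.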
25. Some examples of Cookie Searches
- Who wrote Microcosmographia Academica?
- How many pages long is Alfred Whitehead's The Aims of Education and Other Essays?
- Which branches in Berkeley's public library system are open on Sunday?
- What is the call number of Moffitt Library's copy of Abraham Flexner's book Universities: American, English, German?
- What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
- Print a list of the Mechanics Library holdings, in descending order by height.
- What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
- Which library closes earliest on Friday night?
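Two of these searches can be sketched as SQL run through Python's sqlite3 module. Everything beyond the table and column names is an assumption: the rows are made-up sample data, the column types are guesses, and the sketch assumes that a book without illustrations has an empty or NULL ill value, which the real Cookie data may encode differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT, price REAL, ill TEXT, height INTEGER);
CREATE TABLE callfile (accno INTEGER, libid INTEGER, callno TEXT, copies INTEGER);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?, ?, ?)", [
    (1, "Book A", 10.0, "maps", 24),
    (2, "Book B", 20.0, None, 30),
    (3, "Book C", 5.0, "ports.", 18),
])
conn.executemany("INSERT INTO callfile VALUES (?, ?, ?, ?)", [
    (1, 1, "X1", 2),
    (1, 2, "X2", 1),
    (2, 1, "X4", 1),
    (3, 1, "X3", 4),
])


def holdings_by_height(conn, libid):
    """List one library's holdings, tallest book first."""
    return conn.execute(
        """
        SELECT b.title, b.height
        FROM bibfile b JOIN callfile c ON c.accno = b.accno
        WHERE c.libid = ?
        ORDER BY b.height DESC
        """,
        (libid,),
    ).fetchall()


def replacement_cost(conn):
    """Sum price times copies over every holding of an illustrated book."""
    (total,) = conn.execute(
        """
        SELECT SUM(b.price * c.copies)
        FROM bibfile b JOIN callfile c ON c.accno = b.accno
        WHERE b.ill IS NOT NULL AND b.ill <> ''
        """
    ).fetchone()
    return total or 0.0


print(holdings_by_height(conn, 1))
print(replacement_cost(conn))
```

Both questions need a join because height, price and ill live in BIBFILE while the holding library and the number of copies live in CALLFILE.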
26. ER Diagram Symbols
- Attribute: Ovals are used to indicate the attributes associated with an entity or relationship (that is, the pieces of information recorded in the database about the entity or relationship).
- Primary key: An underlined name indicates that the attribute is a primary key (that is, it can uniquely identify the entity).
- Entity: Rectangles are used to indicate entities (that is, the representatives or records describing persons, things, or events in the database).
- Relationship: Diamonds are used to indicate relationships between entities (that is, some association between the data records of different entities).
27. Cookie ER diagram
[ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE. Relationships: Has call, Has copy, Has index, Has subject. Linking attributes shown: pubid, accno, libid, subcode.]
Note: diagram contains only attributes used for linking
28. Assignment Goal
- The main intent is to have you start thinking about how databases are structured, and what types of information can or should be included when designing a database
- The main task is to look for MISSING elements in the current design, or badly designed elements given the particular data
- What attributes and/or new relations need to be added to the database?
29. And now for something completely different...
30. Today
- Controlled vocabularies
- Choice of names
- Form of names
- Name Authority files
31. Controlled Vocabularies
- Vocabulary control is the attempt to provide a
standardized and consistent set of terms (such as
subject headings, names, classifications, etc.)
with the intent of aiding the searcher in finding
information.
32. Controlled Vocabularies
- Names and name authorities (Today)
- Cognitive basis of categorization and subject classification (Thursday)
- Design of controlled vocabularies for subject access -- Thesaurus design (next week)
33. Names
- Cutter's objectives of bibliographic description
- To enable a person to find a document of which the author is known.
- To show what the library has by a given author.
- First serves access.
- Second serves collocation.
34. Problems with Names
- How many names should be associated with a document?
- Which of these should be the main entry?
- What form should each of the names take?
- What references should be made from other possible forms of names that haven't been used?
35. The problem
- Proliferation of the forms of names
- Different names for the same person
- Different people with the same names
- Examples
- from Books in Print (semi-controlled but not consistent)
- ERIC author index (not controlled)
36. Rules for description
- AACR II and other sets of descriptive cataloging rules provide guidelines for
- Determining the number of name entries
- Choosing a main entry
- Deciding on the form of name to be used
- Deciding when to make references
37. Authority control
- Authority control is concerned with creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established) based on some set of rules.
- If you have rules, why do you need to keep track of all of the headings?
38. Conditions of Authorship?
- Single person or single corporate entity
- Unknown or anonymous authors
- Shared responsibility
- Collections or editorially assembled works
- Works of mixed responsibility (e.g. translations)
- Related Works
39. Added Entries
- Personal names
  - Collaborators
  - Editors, compilers, writers
  - Translators (in some cases)
  - Illustrators (in some cases)
  - Other persons associated with the work (such as the honoree in a Festschrift).
- Corporate Names
  - Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.
40. Choice of Name
- AACR II says that the predominant form of the name used in a particular author's writings should be chosen as the form of name.
- References should be made from the other forms of the name.
41. Form of the Name
- When names appear in multiple forms, one form needs to be chosen. Criteria for choice are
- Fullness (e.g. Full names vs. initials only)
- Language of the name.
- Spelling (choose predominant form)
- Entry element
- John Smith or Smith, John?
- Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
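The entry-element question can be made concrete with a small sketch. It is a deliberate simplification: entry_form and its family_name_first flag are hypothetical names for this example, and real cataloging rules also have to handle compound surnames, prefixes, titles and names with no surname at all.

```python
def entry_form(name, family_name_first=False):
    """Put the family name first as the entry element, followed by a comma.

    family_name_first says whether the name as written already leads with the
    family name (as in "Mao Zedong") or ends with it (as in "John Smith").
    """
    parts = name.split()
    if len(parts) < 2:
        return name  # a single-element name has no inversion to make
    if family_name_first:
        family, rest = parts[0], parts[1:]
    else:
        family, rest = parts[-1], parts[:-1]
    return f"{family}, {' '.join(rest)}"


print(entry_form("John Smith"))
print(entry_form("Mao Zedong", family_name_first=True))
print(entry_form("Mao Zedong"))  # blind inversion picks the wrong entry element
```

The last call shows why the choice cannot be made mechanically: the code has to be told which element is the family name, which is exactly the judgment the cataloging rules encode.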
42. Name Authority Files
IDNAFL8057230 STp ELn STHa MSc
UIPa TD19910821174242 KRCa NMUa
CRCc UPNa SBUa SBCa DIDn
DF05-14-80 RFEa CSC SRUb SRTn
SRNn TSS TGA? ROM? MOD VSTd 08-21-91
Other Versions earlier
040    DLC $c DLC $d DLC $d OCoLC
053    PR6005.R517
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret, $d 1908-1973
400 10 Cooper, Henry St. John, $d 1908-1973
400 00 Credo, $d 1908-1973
400 10 Fecamps, Elise
400 10 Gill, Patrick, $d 1908-1973
400 10 Hope, Brian, $d 1908-1973
400 10 Hughes, Colin, $d 1908-1973
400 10 Marsden, James
400 10 Matheson, Rodney
400 10 Ranger, Ken
400 20 St. John, Henry, $d 1908-1973
400 10 Wilde, Jimmy
500 10 $w nnnc $a Ashe, Gordon, $d 1908-1973
43. Name Authority Files
IDNAFO9114111 STp ELn STHa MSn
UIPa TD19910817053048 KRCa NMUa
CRCc UPNa SBUa SBCa DIDn
DF06-03-91 RFEa CSCc SRUb SRTn
SRNn TSS TGA? ROM? MOD VSTd 08-19-91
040    OCoLC $c OCoLC
100 10 Marric, J. J., $d 1908-1973
500 10 $w nnnc $a Creasey, John
663    Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under $b Creasey, John
670    OCLC 13441825 His Gideon's day, 1955 $b (hdg. Creasey, John usage J.J. Marric)
670    LC data base, 6/10/91 $b (hdg. Creasey, John usage J.J. Marric)
670    Pseuds. and nicknames dict., c1987 $b (Creasey, John, 1908-1973 British author pseud. Marric, J. J.)
44. Name authority files
IDNAFL8166762 STp ELn STHa MSc
UIPa TD19910604053124 KRCa NMUa
CRCc UPNa SBUa SBCa DIDn
DF08-20-81 RFEa CSC SRUb SRTn
SRNn TSS TGA? ROM? MOD VSTd 06-06-91
Other Versions earlier
040    DLC $c DLC $d DLC $d OCoLC
100 10 Butler, William Vivian, $d 1927-
400 10 Butler, W. V. $q (William Vivian), $d 1927-
400 10 Marric, J. J., $d 1927-
670    His The durable desperadoes, 1973.
670    His The young detective's handbook, c1981 $b t.p. (W.V. Butler)
670    His Gideon's way, 1986 $b CIP t.p. (William Vivian Butler writing as J.J. Marric)
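The three records above can be read as a small lookup structure: the 100 field gives the established heading, 400 fields give "see" references from unused forms, and 500 fields give "see also" references to related headings. The sketch below is a hedged illustration only. The AuthorityRecord class, the normalize and lookup helpers and the prefix-matching rule are assumptions made for this example, and the data is an abridged hand transcription of the records, not a MARC parser.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    heading: str                                   # 100: established heading
    see_from: list = field(default_factory=list)   # 400: forms not used
    see_also: list = field(default_factory=list)   # 500: related headings


# Abridged from the three records shown above.
records = [
    AuthorityRecord(
        "Creasey, John",
        see_from=["Cooke, M. E.", "Gill, Patrick, 1908-1973", "Wilde, Jimmy"],
        see_also=["Ashe, Gordon, 1908-1973"],
    ),
    AuthorityRecord("Marric, J. J., 1908-1973", see_also=["Creasey, John"]),
    AuthorityRecord(
        "Butler, William Vivian, 1927-",
        see_from=["Butler, W. V. (William Vivian), 1927-", "Marric, J. J., 1927-"],
    ),
]


def normalize(name):
    """Lowercase and drop commas and periods so small punctuation differences do not matter."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def lookup(records, query):
    """Return (kind, established heading) pairs whose heading or 400 form starts with the query."""
    q = normalize(query)
    hits = []
    for rec in records:
        if normalize(rec.heading).startswith(q):
            hits.append(("heading", rec.heading))
        for variant in rec.see_from:
            if normalize(variant).startswith(q):
                hits.append(("see", rec.heading))
    return hits


print(lookup(records, "Gill, Patrick"))
print(lookup(records, "Marric, J. J."))
```

Searching for "Marric, J. J." reaches two different people: the established pseudonym heading tied to Creasey and a "see" reference to Butler. Only the dates in the recorded headings keep them apart, which is the kind of distinction rules alone do not supply without a maintained file.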