Title: Classification of the Web
LIBR 517: Subject Analysis
School of Library, Archival and Information Studies, UBC
Instructors: Carol Elder, Penny Swanson
Presented by:
- Huibin (Heather) Cai
- Henian E
- Daphne Hamilton-Nagorsen
- Todd Rowlatt
March 30, 2005
Introduction
- Classification of the Web
- A Brief History
- Dewey Decimal Classification
- Library of Congress Classification
- Current Trends
Brief History
- Issues with the Web and Information Retrieval
  - Volume of information
  - Lack of order and rules
  - Lack of a comprehensive record of documents
  - Lack of a classification and description framework
Brief History (cont'd)
- A Brief History of Web Classification and Indexing
  - Subject Directories
    - Human-based
    - Concept indexing
  - Search Engines
    - Machine-based
    - Word indexing
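To make the contrast between word indexing and concept indexing concrete, here is a minimal Python sketch. The page texts, page IDs, the `directory` mapping and the function names are all hypothetical, and real search engines add stemming, ranking and much more. The word index is built mechanically from the words on each page, while the directory records a person's judgment about what each page is about.

```python
from collections import defaultdict


def build_word_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Machine-based word indexing: map each word to the pages that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def word_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages containing every query word (a simple boolean AND)."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()


# Hypothetical sample pages.
pages = {
    "page-1": "Erosion along the river bank",
    "page-2": "The central bank raised interest rates",
}

# Concept indexing (hypothetical): a person assigns each page to a subject.
directory = {
    "Economics": {"page-2"},
    "Geography": {"page-1"},
}

index = build_word_index(pages)
print(word_search(index, "bank"))  # both pages: the word matches, the meaning is ignored
print(directory["Economics"])      # only the page a person judged to be about economics
```

The word index needs no human effort but cannot tell the two senses of "bank" apart; the directory can, at the cost of someone doing the classifying.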
Dewey Decimal Classification (DDC) and the Web
- Organizing web resources by DDC
Why DDC is preferred
- Universal in scope
- Multilingual
  - Numerical notation is easy to understand for users of all languages
  - Documents are never arranged in alphabetical order
  - When indexes are translated into another language, the arrangement still makes sense
  - Schedules are available in multiple languages
Why DDC is preferred (cont'd)
- Well-known
  - Most widely used library classification in the world
  - Both users and cataloguers are familiar with it
- Good interoperability
  - OCLC links DDC notation to other subject-based finding aids, such as LCSH
  - Programs and standards are being developed to transcribe other classification numbers into DDC numbers
Why DDC is preferred (cont'd)
- Hierarchical structure
  - A decimal system that always expands in sets of ten; ten choices per page are easy for web users to handle
  - Moves from broad to specific, which makes browsing easy with hyperlink technology
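Because the hierarchy is carried by the digits themselves, a Web directory can generate its breadcrumb trail or "broader topic" links straight from a class number. A minimal sketch, assuming a plain DDC number with at least three digits; the function name is hypothetical, and the purely notational hierarchy it follows is a simplification, since the DDC has places where notation and structure diverge.

```python
def ddc_ancestors(number: str) -> list[str]:
    """Return the chain of DDC classes from main class down to `number`.

    Handles plain class numbers such as "332.6"; standard subdivisions and
    table notation are not treated specially in this sketch.
    """
    digits = number.replace(".", "")
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError(f"not a DDC class number: {number!r}")
    chain = [
        digits[0] + "00",  # main class, one of ten (e.g. 300)
        digits[:2] + "0",  # division, one of ten within the class (e.g. 330)
        digits[:3],        # section, one of ten within the division (e.g. 332)
    ]
    # Each digit after the decimal point narrows the subject one step further.
    for end in range(4, len(digits) + 1):
        chain.append(digits[:3] + "." + digits[3:end])
    # Numbers ending in zeros (e.g. "300") repeat a level; keep each level once.
    return list(dict.fromkeys(chain))


print(" > ".join(ddc_ancestors("332.6")))  # 300 > 330 > 332 > 332.6
```

Each level in the chain can be rendered as a hyperlink, which is what makes broad-to-specific browsing cheap to build on top of DDC.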
Common practices with DDC
Examples
- BUBL LINK Catalogue of Internet Resources (http://bubl.ac.uk/link/)
  - An online catalogue created and managed by the BUBL Information Service, Centre for Digital Library Research, Strathclyde University, Glasgow
  - Uses DDC as the primary organizational structure for its catalogue of Internet resources
Examples (cont'd)
- Browse by Dewey Classification Number (http://www.bcpl.gov.bc.ca/VRD/dewey.php)
  - An online catalogue created and managed by the BC Virtual Reference Desk (VRD), designed to provide quick access to a virtual library of Web sites
What improvements can be made?
- More DDC levels can be used
- Captions need to be revised to make them more understandable
- Websites should offer a search option along with browsing
- Enhanced features of DDC (21st and 22nd editions) should be applied
Library of Congress Classification (LCC) and the Web
- Organizing web resources by LCC
Benefits of Using LCC
- A General Classification Scheme
- Supported by an existing organization
- Has stood the test of time
- Relationship to LCSH
The Chief Benefit
- Widespread Use in Academic Institutions
  - Familiarity and Comfort
  - Support for Digitization Projects
  - Interoperability: a bridge between two worlds
Problems with Using LCC
- Literary Warrant
  - Designed for a specific national collection
  - Reflects the biases of that collection
- Not dynamic or quick to respond to change
- Concerned with the needs of the users of the Library of Congress, not those of Internet users
Hierarchy
- The hierarchy is shown visually in the print schedules (by indentation), not through the notation
- A crucial problem in the electronic realm, where a system cannot work out broader and narrower classes from the class number alone
An example
- QA76.27 Study and Teaching
- QA76.33 Computer Camps
- QA76.38 Hybrid Computers
- QA76.4 Analog Computers
- QA76.5 Digital Computers
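The example shows the problem: the notation gives shelf order, but not hierarchy. A minimal Python sketch, assuming simple class numbers without Cutter numbers (the regex and the function name are hypothetical), makes the point. A program can sort LCC numbers correctly by treating the part after the point as a decimal, yet nothing in "QA76.38" tells it which class that number falls under.

```python
import re
from decimal import Decimal

# A simple LCC class number: class letters, an integer, and an optional
# decimal extension. Cutter numbers and dates are NOT handled here.
LCC_CLASS = re.compile(r"([A-Z]{1,3})(\d+)(\.\d+)?")


def lcc_sort_key(number: str) -> tuple[str, Decimal]:
    """Shelf-order key: letters alphabetically, then the number as a decimal."""
    match = LCC_CLASS.fullmatch(number)
    if match is None:
        raise ValueError(f"unsupported LCC class number: {number!r}")
    letters, whole, fraction = match.groups()
    return letters, Decimal(whole + (fraction or ""))


numbers = ["QA76.5", "QA9", "QA76.38", "QA76.4", "QA76.27"]
print(sorted(numbers))                    # plain string sort puts QA9 last
print(sorted(numbers, key=lcc_sort_key))  # QA9 first, then QA76.27 ... QA76.5

# The key yields order only. Nothing in "QA76.38" says whether it is narrower
# than "QA76.33" or a sibling of it, so broader/narrower links would have to be
# stored separately (e.g. a parent table transcribed from the print schedules).
```

Compare this with DDC, where truncating digits recovers the broader class; for LCC an online system has to carry the schedule's indentation as separate data.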
Other Issues with LCC
- Multilingual
- Language of the content
- Cutter numbers
Current Projects Using LCC
- Not many creative projects exist
- Most are associated with a university or academic institution
- Two examples
  - ICRC: The Internet Collegiate Reference Collection (Bloomsburg University)
  - The Online Books Page (University of Pennsylvania)
ICRC (Internet Collegiate Reference Collection)
ICRC (cont'd)
Online Books Page
Online Books Page (cont'd)
LCC and the Web
- The benefits of integration and interoperability suggest the value of LCC for classifying the Internet
- More creative and extensive testing and research are needed
Current Trends
Brief Review (DDC & LCC)
- Manual classification using DDC and LCC
  - Advantages
    - Authority control
    - Organized
  - Disadvantages
    - Costly
    - Slow process
Current Issues
- A fixed number of categories vs. the ever-growing nature of the Internet
- Local practices vs. data integrity and interoperability in higher-level standard development
- Individual judgment of Web resources vs. the demand for precise search and retrieval
Solutions?
Example 1
- Developing a specialized directory system by automatically classifying Web documents
  - Dept. of Library and Information Science, Yonsei University, Korea
- Automates the collection, indexing and classification of Web documents
- Specialized directory system based on the DDC scheme
- Classification using a subject term dictionary
- Experiments in economics (within DDC class 300)
Fig. 1: Overview of the specialized directory system
Example 1 (cont'd)
- Weaknesses
  - Still requires verification by human experts
  - The classification scheme and term dictionary need to be updated periodically
  - All existing categories must be searched to classify any document
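The last weakness can be illustrated with a minimal sketch of dictionary-based classification. The term dictionary, the class numbers chosen and the function `classify_flat` are hypothetical, and scoring by raw term counts is an assumption for illustration, not the method Chung and Noh describe. The point is the loop: every category is scored for every document.

```python
from collections import Counter
from typing import Optional

# Hypothetical subject term dictionary: DDC class number -> indicative terms.
# Placeholder entries for illustration only, not the Yonsei dictionary.
TERM_DICTIONARY: dict[str, set[str]] = {
    "332": {"bank", "credit", "interest", "investment", "stock"},
    "336": {"tax", "budget", "revenue", "deficit"},
    "338": {"production", "industry", "agriculture", "firm"},
}


def classify_flat(text: str, dictionary: dict[str, set[str]]) -> Optional[str]:
    """Score every category by how often its terms occur; return the best one."""
    word_counts = Counter(text.lower().split())
    best_class, best_score = None, 0
    for ddc_class, terms in dictionary.items():  # every category, every time
        score = sum(word_counts[term] for term in terms)
        if score > best_score:
            best_class, best_score = ddc_class, score
    return best_class  # None when no dictionary term appears at all


print(classify_flat("The bank raised interest rates on credit", TERM_DICTIONARY))  # 332
```

The work per document grows with the number of categories, and the dictionary only knows the terms someone put into it, which is why it has to be updated periodically.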
Example 2
- Dynamic and hierarchical classification system
  - College of Engineering & Science, Louisiana Tech University
- Organizes Web pages into a tree structure
- Can add new categories as required
- Classifies Web pages by searching through a single path of the tree
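A minimal sketch of the single-path idea follows. The tree, the term sets, the overlap score and the greedy stopping rule are assumptions for illustration; Choi and Peng's system uses its own similarity measure, which this sketch does not reproduce. Each page is compared only with the children of the node it has reached, and the hypothetical `add_child` method shows how a new category can be slotted in without rebuilding the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    terms: set[str]
    children: list["Category"] = field(default_factory=list)

    def add_child(self, child: "Category") -> "Category":
        """Attach a new subcategory when the collection needs one."""
        self.children.append(child)
        return child


def match_score(words: set[str], category: Category) -> int:
    """Number of the category's terms that appear in the page."""
    return len(words & category.terms)


def classify_single_path(text: str, root: Category) -> list[str]:
    """Greedy descent: at each level compare only the current node's children."""
    words = set(text.lower().split())
    path, node = [root.name], root
    while node.children:
        best = max(node.children, key=lambda child: match_score(words, child))
        if match_score(words, best) == 0:
            break  # no child fits; stop at the broader category
        path.append(best.name)
        node = best
    return path


# Hypothetical category tree.
root = Category("Web", set())
economics = root.add_child(Category("Economics", {"economy", "market", "bank", "tax"}))
root.add_child(Category("Computing", {"computer", "software", "network"}))
economics.add_child(Category("Banking", {"bank", "loan", "credit"}))
economics.add_child(Category("Taxation", {"tax", "revenue"}))

print(classify_single_path("bank loan rates in a weak market economy", root))
# ['Web', 'Economics', 'Banking']
```

One plausible trade-off of greedy descent is that a wrong turn near the root cannot be undone further down the path.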
Analysis
- Similarity approach
  - Comparing the similarity between a classification structure and a Web document
  - Assigning the document to the correct category automatically
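The similarity approach can be sketched with cosine similarity between term-frequency vectors, a common choice in text classification. This is an assumption: the slides do not say which measure either system uses, and the category "profiles" below are hypothetical stand-ins for a classification structure.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: term -> frequency."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_category(document: str, profiles: dict[str, str]) -> str:
    """Pick the category whose profile text is most similar to the document."""
    doc_vec = term_vector(document)
    return max(profiles, key=lambda name: cosine_similarity(doc_vec, term_vector(profiles[name])))


# Hypothetical category profiles, e.g. captions plus typical terms.
profiles = {
    "Finance": "stock bond investment market bank",
    "Sports": "football match goal team league",
}
print(assign_category("stock market investment tips", profiles))  # Finance
```

The same scoring function can drive either an exhaustive search over all categories or a single-path descent; only the set of categories compared changes.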
Analysis (cont'd)
- Remarkable improvements in Example 2
  - Reduces computational complexity by searching through a single path of the category tree
  - Accommodates the ever-growing number of Web pages on the Internet by adding new categories as required
  - Increases accuracy by 6% compared with related systems
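The complexity claim can be made concrete with a back-of-the-envelope count. This assumes an idealized, complete category tree with branching factor $b$ and depth $d$, where each comparison of a page against a category costs the same; the figures are illustrative, not results reported by Choi and Peng.

```latex
\begin{aligned}
C_{\text{all leaves}} &= b^{d} \\
C_{\text{single path}} &= b \cdot d \\
b = 10,\ d = 3:\quad C_{\text{all leaves}} &= 10^{3} = 1000,
  \qquad C_{\text{single path}} = 10 \cdot 3 = 30
\end{aligned}
```

Ten-way branching to depth three mirrors the DDC's main classes, divisions and sections, so a single-path search would need about 30 comparisons where an exhaustive one needs about 1000.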
Summary
- A Brief History
- Dewey Decimal Classification
- Library of Congress Classification
- Current Trends
References
- Ahronheim, Judith R., ed. High-Level Subject Access Tools and Techniques in Internet Cataloging. Binghamton, N.Y.: Haworth Information Press, 2002.
- Choi, Ben, and Xiaogang Peng. "Dynamic and Hierarchical Classification of Web Pages." Online Information Review 28, no. 2 (2004): 139-147.
- Chung, Young Mee, and Young-Hee Noh. "Developing a Specialized Directory System by Automatically Classifying Web Documents." Journal of Information Science 29, no. 2 (2003): 117-126.
- Digital Libraries Project. Columbia University, August 2004. <http://www.columbia.edu/cu/libraries/digital/> (16 March 2005).
- Ellis, David, and Ana Vasconcelos. "Ranganathan and the Net: Using Facet Analysis to Search and Organize the World Wide Web." Aslib Proceedings (Bradford) 51, no. 1 (January 1999): 3-10.
References (cont'd)
- Franklin, Rosemary Aud. "Re-inventing Subject Access for the Semantic Web." Online Information Review 27, no. 2 (2003): 94.
- Godby, Jean, and Jay Stuler. "The Library of Congress Classification as a Knowledge Base for Automatic Subject Categorization." Presented at the IFLA Preconference "Subject Retrieval in a Networked Environment," Dublin, Ohio, August 2001. <http://staff.oclc.org/godby/auto_class/godby-ifla.html#_edn5> (14 March 2005).
- Koch, Traugott, et al. "The Role of Classification Schemes in Internet Resource Description and Discovery." February 1997. <http://www.ukoln.ac.uk/metadata/desire/classification/> (18 March 2005).
- Larson, Ray. "Experiments in Automatic Library of Congress Classification." Journal of the American Society for Information Science 43, no. 2 (1992): 130-148.
- McKiernan, Gerry, ed. CyberStacks(sm). Iowa State University, 1998. <http://www.public.iastate.edu/CYBERSTACKS/homepage.html> (16 March 2005).
References (cont'd)
- Ockerbloom, John Mark, ed. The Online Books Page. University of Pennsylvania, 2005. <http://digital.library.upenn.edu/books/subjects.html> (16 March 2005).
- OCLC. "About DDC." <http://www.oclc.org/dewey/about/> (18 March 2005).
- Saeed, Hamid, and Abdus Sattar Chaudhry. "Potential of Bibliographic Tools to Organize Knowledge on the Internet." Knowledge Organization 28, no. 1 (2001): 17-26.
- Thomas, Alan R., and James R. Shearer, eds. Internet Searching and Indexing: The Subject Approach. New York: Haworth Information Press, 2000.
- Vizine-Goetz, Diane. "Using Library Classification Schemes for Internet Resources." OCLC Internet Cataloging Project Colloquium, November 1999. <http://staff.oclc.org/vizine/Intercat/vizine-goetz.htm> (13 March 2005).
- Weyant, Nancy, ed. ICRC: The Internet Collegiate Reference Collection. Bloomsburg University's Harvey A. Andruss Library, November 2004. <http://icrc.bloomu.edu/icrc/lc.php> (16 March 2005).