Classification of the Web - PowerPoint PPT Presentation

1
Classification of the Web
LIBR 517: Subject Analysis
School of Library, Archival and Information Studies, UBC
Instructors: Carol Elder, Penny Swanson
  • Presented by
  • Huibin (Heather) Cai
  • Henian E
  • Daphne Hamilton-Nagorsen
  • Todd Rowlatt
  • March 30, 2005

2
Introduction
  • Classification of the Web
  • A Brief History
  • Dewey Decimal Classification
  • Library of Congress Classification
  • Current Trends

3
Brief History
  • Issues with the Web and Information Retrieval
  • Volume of information
  • Lack of order and rules
  • Lack of comprehensive record of documents
  • Lack of classification and description framework

4
Brief History (cont'd)
  • A Brief History of Web Classification and Indexing
  • Subject Directories
    • Human based
    • Concept indexing
  • Search Engines
    • Machine based
    • Word indexing

5
Dewey Decimal Classification (DDC) and the Web
  • Organizing web resources by DDC

6
Why DDC is preferred
  • Universal in scope
  • Multilingual
  • Numerical notation, easy to understand by users
    of all languages
  • Documents never arranged by alphabetical order
  • When indexes are translated to another language,
    the arrangement still makes sense
  • Schedules available in multiple languages

7
Why DDC is preferred (cont'd)
  • Well-known
    • Most widely used in the world
    • Familiar to both users and cataloguers
  • Good interoperability
    • OCLC is linking DDC notation to other subject-based
      finding aids such as the LCSH
    • Programs and standards are being developed to
      transcribe other classification numbers into DDC
      numbers

8
Why DDC is preferred (cont'd)
  • Hierarchical structure
    • A decimal system that always expands into sets of
      ten: ten choices per page, easy for Web users
      to handle
    • From broad to specific: easy to browse with
      hyperlink technology
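
Because DDC expresses hierarchy in the notation itself, a broad-to-specific browse path can be derived from a class number alone. The following is a minimal Python sketch of that idea, not code from any of the catalogues discussed here; the function name `ddc_browse_path` is hypothetical, and repeated levels (such as main class 300 and division 300) are collapsed for simplicity.

```python
def ddc_browse_path(number: str) -> list[str]:
    """Return the broad-to-specific chain of DDC notations leading to `number`."""
    whole, _, fraction = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError(f"not a DDC number: {number!r}")
    # Main class, division, section: keep 1, 2, then 3 leading digits, zero-padded.
    levels = [whole[:i].ljust(3, "0") for i in (1, 2, 3)]
    # Each digit after the decimal point adds one more, narrower level.
    levels += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # Collapse repeats (e.g. "300" as both main class and division), keeping order.
    return list(dict.fromkeys(levels))


print(ddc_browse_path("332.6"))
```

Each element of the returned list could serve as one hyperlinked page in a browse interface, with at most ten choices leading to the next level.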

9
Common practices with DDC
10
Examples
  • BUBL LINK Catalogue of Internet Resources
    http://bubl.ac.uk/link/
  • An online catalogue created and managed by BUBL
    Information Service, Centre for Digital Library
    Research, Strathclyde University, Glasgow
  • Uses DDC as the primary organization structure
    for its catalogue of Internet resources

11
Examples (cont'd)
  • Browse by Dewey Classification Number
    http://www.bcpl.gov.bc.ca/VRD/dewey.php
  • An online catalogue created and managed by The BC
    Virtual Reference Desk (VRD) that is designed to
    provide quick access to a virtual library of Web
    sites

12
What improvements can be made?
  • More DDC levels can be used
  • Captions need to be revised to make them more
    understandable
  • Websites should offer searching option along with
    browsing
  • Enhanced features of DDC (21st and 22nd editions)
    should be applied

13
Library of Congress Classification (LCC) and the Web
  • Organizing web resources by LCC

14
Benefits of Using LCC
  • A General Classification Scheme
  • Supported by an existing organization
  • Stood test of time
  • Relationship to LCSH

15
The Chief Benefit
  • Widespread Use in Academic Institutions
  • Familiarity and Comfort
  • Support for Digitization Projects
  • Inter-operability and a bridge between 2 worlds

16
Problems with Using LCC
  • Literary Warrant
  • Designed for a specific national collection
  • Reflects the biases of that collection
  • Not dynamic or quick to respond to change
  • Not concerned for needs of the Internet users,
    but for the users of the Library of Congress

17
Hierarchy
  • The Hierarchy is demonstrated visually in the
    print schedules, not through the notation
  • A crucial problem in the electronic realm

18
An example
  • QA76.27 Study and Teaching
  • QA76.33 Computer Camps
  • QA76.38 Hybrid Computers
  • QA76.4 Analog Computers
  • QA76.5 Digital Computers

19
Other Issues with LCC
  • Multilingual
  • Language of the content
  • Cutter numbers

20
Current Projects Using LCC
  • Not many creative projects exist
  • Most are associated with a university or academic
    institution
  • Two examples:
    • ICRC: The Internet Collegiate Reference
      Collection (Bloomsburg University)
    • Online Books Page (University of Pennsylvania)

21
ICRC (Internet Collegiate Reference Collection)
22
ICRC (cont'd)
23
Online Books Page
24
Online Books Page (cont'd)
25
LCC and the Web
  • Benefits of Integration and Inter-operability
    suggest the value of LCC for classifying the
    Internet
  • Need more creative and extensive testing and
    research

26
Current Trends
  • Approaches and practices

27
Brief Review (DDC & LCC)
  • Manual classification using DDC and LCC
  • Advantages
    • Authority control
    • Organized
  • Disadvantages
    • Costly
    • Slow to process

28
Current Issues
  • Fixed number of categories vs. the growing
    nature of the Internet
  • Local practices vs. data integrity and
    interoperability in higher-level standards
    development
  • Individual judgment of Web resources vs. the
    demand for precise search and retrieval

29
Solutions?
30
Example 1
  • Specialized directory system by automatically
    classifying Web documents
    (Dept. of Lib. & Info. Sci., Yonsei University,
    Korea)
  • Automates the collecting, indexing and
    classification processes of Web documents
  • Specialized directory system based on DDC scheme
  • Classification using subject term dictionary
  • Experiments on Economics (DDC 330, within main
    class 300)
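
Classification with a subject term dictionary can be sketched roughly as follows. This is a minimal Python illustration, not the Yonsei system itself: the dictionary contents, the scoring rule (count of matching terms), and the names `SUBJECT_TERMS` and `classify` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical subject term dictionary: DDC number -> indicative terms.
# The real system's dictionary is not reproduced here.
SUBJECT_TERMS = {
    "330": {"economics", "economy", "economic"},
    "332": {"bank", "banking", "credit", "investment", "money"},
    "336": {"tax", "taxation", "budget", "revenue"},
}


def classify(text, dictionary=SUBJECT_TERMS):
    """Return the DDC number whose terms occur most often in `text`, or None."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Every category is scored, so the cost grows with the number of categories.
    scores = {ddc: sum(words[term] for term in terms)
              for ddc, terms in dictionary.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("Central bank raises rates as credit and investment slow"))
```

A document that matches no dictionary term returns `None`, which is the kind of case that would be passed to a human expert for verification.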

31
Fig.1 Overview of the specialized directory system
32
Example 1 (cont'd)
  • Weaknesses
    • Still requires human expert verification
    • The classification scheme and term dictionary
      need to be updated periodically
    • Must search through all existing categories to
      make any classification

33
Example 2
  • Dynamic and hierarchical classification system
    (College of Eng. & Sci., Louisiana Tech Univ.)
  • Organizes Web pages into a tree structure
  • Capable of adding new categories as required
  • Classifies Web pages by searching through a
    single path of the tree

34
Analysis
  • Similarity-based approach
    • Compares a Web document against a
      classification structure for similarity
    • Assigns the document to the best-matching
      category automatically
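
The similarity idea can be illustrated with a short Python sketch. The slides do not name a similarity measure, so cosine similarity over raw word counts is an assumption here, as are the category profiles and the names `vectorize`, `cosine`, and `assign`.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical category profiles, for illustration only.
CATEGORIES = {
    "Economics": vectorize("economics market trade money bank price"),
    "Computing": vectorize("computer software program network data"),
}


def assign(document):
    """Assign the document to the category with the highest similarity."""
    doc = vectorize(document)
    return max(CATEGORIES, key=lambda name: cosine(doc, CATEGORIES[name]))


print(assign("The bank sets the price of money"))
```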

35
Analysis (cont'd)
  • Remarkable improvements made in Example 2
    • Reduces computational complexity by searching
      through a single path of the category tree
    • Accommodates the ever-growing number of Web
      pages on the Internet by adding new categories
      as required
    • Increases accuracy by 6% in comparison to a
      related system
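
A single-path search with on-demand category creation might look like the sketch below. It is a simplified illustration, not the Louisiana Tech implementation: Jaccard overlap as the similarity measure, the `threshold` value, the placeholder name given to a new category, and the sample tree are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    terms: set[str]
    children: list["Node"] = field(default_factory=list)


def overlap(doc_terms, node):
    """Jaccard similarity between a document's terms and a category's terms."""
    union = doc_terms | node.terms
    return len(doc_terms & node.terms) / len(union) if union else 0.0


def classify(root, doc_terms, threshold=0.1):
    """Follow one path down the tree; return (path of names, comparisons made)."""
    node, path, comparisons = root, [root.name], 0
    while node.children:
        scored = [(overlap(doc_terms, child), child) for child in node.children]
        comparisons += len(scored)
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score < threshold:
            # No existing child fits: add a new category at this level.
            best = Node(name="(new category)", terms=set(doc_terms))
            node.children.append(best)
        node = best
        path.append(node.name)
    return path, comparisons


root = Node("Web", set(), [
    Node("Science", {"physics", "biology", "computer", "experiment"}, [
        Node("Computing", {"computer", "software", "network"}),
        Node("Biology", {"biology", "cell", "gene"}),
    ]),
    Node("Business", {"market", "bank", "trade", "money"}, [
        Node("Finance", {"bank", "money", "credit"}),
        Node("Marketing", {"market", "brand", "advertising"}),
    ]),
])

print(classify(root, {"bank", "credit", "money", "loan"}))
```

In this small tree the document is compared with four categories rather than all six, and the saving grows as the tree gets deeper and wider.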

36
Summary
  • A Brief History
  • Dewey Decimal Classification
  • Library of Congress Classification
  • Current Trends

37
References
  • Ahronheim, Judith R., ed. High-level Subject
    Access Tools and Techniques in Internet
    Cataloging. Binghamton, N.Y.: Haworth
    Information Press, 2002.
  • Choi, Ben, and Xiaogang Peng. "Dynamic and
    Hierarchical Classification of Web Pages."
    Online Information Review 28, no. 2 (2004):
    139-147.
  • Chung, Young Mee, and Noh Young-Hee. "Developing
    a Specialized Directory System by Automatically
    Classifying Web Documents." Journal of
    Information Science 29, no. 2 (2003): 117-126.
  • Digital Libraries Project. Columbia University,
    August 2004.
    <http://www.columbia.edu/cu/libraries/digital/>
    (16 March 2005).
  • Ellis, David, and Ana Vasconcelos. "Ranganathan
    and the Net: Using Facet Analysis to Search and
    Organize the World Wide Web." ASLIB Proceedings
    (Bradford) 51, no. 1 (Jan. 1999): 3-10.

38
References (cont'd)
  • Franklin, Rosemary Aud. "Re-inventing Subject
    Access for the Semantic Web." Online Information
    Review 27, no. 2 (2003): 94.
  • Godby, Jean, and Jay Stuler. "The Library of
    Congress Classification as a Knowledge Base for
    Automatic Subject Categorization." Presented at
    the IFLA Preconference "Subject Retrieval in a
    Networked Environment," Dublin, Ohio, August 2001.
    <http://staff.oclc.org/godby/auto_class/godby-ifla.html_edn5>
    (14 March 2005).
  • Koch, Traugott, et al. "The Role of Classification
    Schemes in Internet Resource Description and
    Discovery." February 1997.
    <http://www.ukoln.ac.uk/metadata/desire/classification/>
    (18 March 2005).
  • Larson, Ray. "Experiments in Automatic Library of
    Congress Classification." Journal of the American
    Society for Information Science 43, no. 2 (1992):
    130-148.
  • McKiernan, Gerry, ed. CyberStacks(sm). Iowa
    State University, 1998.
    <http://www.public.iastate.edu/CYBERSTACKS/homepage.html>
    (16 March 2005).

39
References (cont'd)
  • Ockerbloom, John Mark, ed. The Online Books Page.
    University of Pennsylvania, 2005.
    <http://digital.library.upenn.edu/books/subjects.html>
    (16 March 2005).
  • OCLC. About DDC. <http://www.oclc.org/dewey/about/>
    (18 March 2005).
  • Saeed, Hamid, and Abdus Sattar Chaudhry.
    "Potential of Bibliographic Tools to Organize
    Knowledge on the Internet." Knowledge
    Organization 28, no. 1 (2001): 17-26.
  • Thomas, Alan R., and James R. Shearer, eds.
    Internet Searching and Indexing: The Subject
    Approach. New York: The Haworth Information
    Press, 2000.
  • Vizine-Goetz, Diane. "Using Library
    Classification Schemes for Internet Resources."
    OCLC Internet Cataloging Project Colloquium,
    November 1999.
    <http://staff.oclc.org/vizine/Intercat/vizine-goetz.htm>
    (13 March 2005).
  • Weyant, Nancy, ed. ICRC: The Internet Collegiate
    Reference Collection. Bloomsburg University's
    Harvey A. Andruss Library, November 2004.
    <http://icrc.bloomu.edu/icrc/lc.php> (16 March
    2005).