SIMPLE AUTOMATIC INDEXING SYSTEM, by SHIVARAM B. S., Trainee, NCSI, 2001-2002

Provided by: ncsiIis
1
SIMPLE AUTOMATIC INDEXING SYSTEM
By SHIVARAM B. S.
Trainee, NCSI, 2001-2002
2
Table of Contents
  • Introduction to Information Retrieval
  • Statement of the problem
  • Objective
  • Need of the project
  • Scope
  • System Design
  • Observation
  • Conclusion

3
Information Retrieval
  • Information
  • Information Retrieval
  • Information Retrieval System

4
Characteristics and activities of IRS
  • Multi-party interaction
  • Iterative process
  • Dynamic process
  • Activities
    - Content analysis
    - Information structure
    - Evaluation

5
Indexing
  • It is the process of constructing document
    surrogates by assigning identifiers to text items
  • Notions of indexing
    - Indexing exhaustivity
    - Term specificity

6
Automatic indexing
7
Basic Techniques of Automatic Indexing (AI)
  • Text simplification
  • Conflation of morphologically related words
    through stemming
  • Selection of the best indices based on frequency
    criteria
  • Assigning weightage to terms
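
The techniques listed above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the program built in this project: the stop-word list, the one-rule suffix stripper `crude_stem`, the fixed-size table, and all names (`index_text`, `struct term`, `MAX_TERMS`, `MAX_LEN`) are hypothetical. Raw frequency counts stand in for term weights.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TERMS 256   /* assumed table size */
#define MAX_LEN   32    /* longer words are truncated */

struct term {
    char word[MAX_LEN];
    int  count;         /* raw frequency, used here as a simple weight */
};

/* Tiny illustrative stop-word list; a real list would be much longer. */
static const char *STOP_WORDS[] = { "a", "an", "and", "in", "is", "of", "the", "to" };

static int is_stop_word(const char *w)
{
    size_t n = sizeof STOP_WORDS / sizeof STOP_WORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(w, STOP_WORDS[i]) == 0)
            return 1;
    return 0;
}

/* Crude conflation: drop one trailing "s" (but not "ss") from words
 * longer than three letters. This is NOT a full stemming algorithm. */
static void crude_stem(char *w)
{
    size_t n = strlen(w);
    if (n > 3 && w[n - 1] == 's' && w[n - 2] != 's')
        w[n - 1] = '\0';
}

/* Increment the count of w, or append it if it is new and there is room. */
static void add_term(struct term *terms, int *n_terms, const char *w)
{
    for (int i = 0; i < *n_terms; i++) {
        if (strcmp(terms[i].word, w) == 0) {
            terms[i].count++;
            return;
        }
    }
    if (*n_terms < MAX_TERMS) {
        strcpy(terms[*n_terms].word, w);   /* w is always shorter than MAX_LEN */
        terms[*n_terms].count = 1;
        (*n_terms)++;
    }
}

/* Text simplification (lowercase, letters only, stop words removed),
 * crude stemming, then frequency counting.
 * Returns the number of distinct terms written to terms[]. */
int index_text(const char *text, struct term *terms)
{
    assert(text != NULL && terms != NULL);
    int n_terms = 0;
    char word[MAX_LEN];
    size_t len = 0;

    for (const char *p = text; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c)) {
            if (len < MAX_LEN - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {                 /* a word just ended */
                word[len] = '\0';
                if (!is_stop_word(word)) {
                    crude_stem(word);
                    add_term(terms, &n_terms, word);
                }
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return n_terms;
}
```

Called on the text "Documents and the document index." with a `struct term terms[MAX_TERMS]` array, `index_text` returns 2 terms: "document" with count 2 and "index" with count 1. Selecting the best indices would then amount to keeping the terms whose counts fall inside a chosen frequency range.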

8
Why Automatic Indexing (AI)?
  • Information overload
  • Explosion of machine-readable text
  • Cost effectiveness

9
Advantages and Disadvantages of Automatic Indexing (AI)
  • Advantages
    - Reduces the processing time
    - Reduces the cost
    - Improved consistency
  • Disadvantages
    - Lack of controlled vocabulary
    - Difficulty in specifying term relations

10
Objective
  • To develop a simple automatic indexing system
    that identifies significant keywords and phrases.

11
Scope
  • A step forward in building an automatic indexing
    system for the Indian Institute of Science

12
Limitations
  • The input file must be in English and in plain
    ASCII text
  • The program does not handle special characters
  • Stemming is not implemented
  • The index is ordered by string length, not
    alphabetically
  • Term weights are not assigned to measure
    relevance
  • Semantic relationships are not implemented
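
As an illustration of the ordering limitation, the sketch below sorts the same term list either by string length or alphabetically using the standard `qsort`. It is an assumption about how such ordering could be coded, not the project's actual routine; `sort_terms`, `by_length`, and `by_alphabet` are hypothetical names, and "alphabetical" here means byte order as given by `strcmp`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: alphabetical (strcmp byte order). */
static int by_alphabet(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* qsort comparator: shorter strings first, ties broken alphabetically. */
static int by_length(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    size_t la = strlen(*sa);
    size_t lb = strlen(*sb);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(*sa, *sb);
}

/* Sort n terms in place; alphabetical != 0 selects alphabetical order,
 * otherwise terms are ordered by length. */
void sort_terms(const char **terms, size_t n, int alphabetical)
{
    assert(terms != NULL || n == 0);
    qsort(terms, n, sizeof terms[0], alphabetical ? by_alphabet : by_length);
}
```

For the terms "retrieval", "index", "term", "automatic", alphabetical order gives automatic, index, retrieval, term, while length order gives term, index, automatic, retrieval. Switching between the two is only a change of comparator.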

13
System design
  • Workflow through different modules

Input file
14
(No Transcript)
15
Software and Hardware
  • Software
    - Operating system: Linux (6.2)
    - Programming language: C
  • Hardware
    - Intel Pentium III processor with 64 MB RAM

16
OBSERVATIONS
  • A list of relevant keywords was obtained, along
    with their frequencies
  • Keyword phrases were also obtained
  • Keywords only 2 or 3 characters long were
    obtained because of abbreviations
  • Some irrelevant keywords and phrases, such as
    "centers", "successfully", and "location", were
    indexed
  • Phrases of 5 or 6 words were also obtained, which
    may or may not be relevant
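
One way to act on these observations is a simple acceptance rule applied to each candidate keyword or phrase. The sketch below is hypothetical: the thresholds `min_chars` and `max_words` and the names `count_words` and `keep_candidate` are assumptions, and whether short abbreviations should be dropped at all is a matter of indexing policy. Candidates are assumed to be already trimmed of surrounding spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Count whitespace-separated words in a candidate keyword or phrase. */
size_t count_words(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    int in_word = 0;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

/* Return 1 if the candidate passes the (assumed) policy:
 *  - phrases may have at most max_words words;
 *  - single keywords must have at least min_chars characters. */
int keep_candidate(const char *s, size_t min_chars, size_t max_words)
{
    size_t words = count_words(s);
    if (words == 0 || words > max_words)
        return 0;
    if (words == 1 && strlen(s) < min_chars)
        return 0;
    return 1;
}
```

With `min_chars` set to 4 and `max_words` set to 4, "information retrieval system" is kept, while a 2-character abbreviation and a 5-word phrase are rejected. An exception list for known abbreviations could be added if those should remain in the index.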

17
Conclusion
  • Automatic indexing, the assignment of content
    identifiers with the help of current technology,
    has made indexing systems effective and efficient
    for retrieval. It can achieve indexing
    exhaustivity and specificity to much the same
    degree as is possible in manual indexing. The
    project undertaken is one such example, and it is
    most useful for text simplification and for
    single-term and phrase indexing. With a clear
    indexing policy for phrases and keywords, and
    with a list of stop words, this project can be
    modified and used as a good mechanized indexing
    tool.