1
The SMART System
Progress Report on System Acquisition and Set-Up
Danyel Fisher, Jonathan Henke, Jason Hong,
Jonathan Huang, Jeane Stetson
  • March 8, 2000
  • IS 240 Principles of Information Retrieval

2
Background
  • Developed 1961-64 at Harvard
  • Maintained at Cornell University
  • Tested at every TREC conference
  • Emphasis on automatic retrieval (rather than
    interactive)
  • Vector-based analysis, tf x idf weighting
  • Current version: 13.3 (we have 11.0)

3
Bibliography
  • Salton, Gerard. The SMART Retrieval System:
    Experiments in Automatic Document Processing.
    Englewood Cliffs, N.J.: Prentice-Hall, 1971.
  • Salton, Gerard. Developments in Automatic Text
    Retrieval. Science, 1991 Aug 30, v253
    n5023:974-980.
  • TREC Proceedings
  • SMART Staff. User's Manual for the SMART
    Information Retrieval System. Technical Report
    71-95, Revised April 1974. Cornell University
    (1974).
  • C. Buckley. Implementation of the SMART
    Information Retrieval System. Technical Report
    85-686, Cornell University (1985).

4
Indexing (Creating a Collection)
  • Document pre-parsing
      • recognize document structure and convert to a
        standard format
  • Finding / handling indexable information
      • parsing, stopword removal, stemming, term
        clustering, synonym dictionaries, etc.
  • Query handling
      • parsing, stopword removal, stemming, etc.
        (parallel to document handling)
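
The term-handling steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not SMART's code: the stopword list, the suffix rules, and the function names (`tokenize`, `stem`, `index_terms`) are all hypothetical, and the crude suffix stripper only stands in for whatever stemmer SMART actually uses. The point it shows is that documents and queries go through the same pipeline, so their terms end up comparable.

```python
import re

# Hypothetical stopword list for illustration; SMART ships its own list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

# Toy suffix rules, tried in order. This is NOT the stemmer SMART uses.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Parse, drop stopwords, then stem: the same steps for docs and queries."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


document = "Parsing and indexing the retrieved documents"
query = "retrieving document"
print(index_terms(document))
print(index_terms(query))
```

Because both inputs pass through `index_terms`, "retrieved" in the document and "retrieving" in the query reduce to the same term. Term clustering and synonym dictionaries, also mentioned on the slide, are left out of this sketch.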

5
Indexing (Creating a Collection)
  • Retrieval methods
      • term weighting and similarity evaluation
      • Default: standard tf x idf weighting, vector
        inner product
  • Output format / display
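
As a rough illustration of the default retrieval method named on this slide, the sketch below weights terms by tf x idf and scores each document by the inner product of its vector with the query vector. It assumes one common variant (raw term frequency times log(N/df)); the slides do not say which exact formula SMART uses, and the function names and the tiny collection are made up for the example.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf(t) = log(N / df(t)), where df counts the documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def weight(terms, idf):
    """Sparse tf x idf vector; terms unseen in the collection are dropped."""
    tf = Counter(terms)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def inner_product(u, v):
    """Sum of products over the terms the two sparse vectors share."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def rank(query, docs):
    """Return (score, doc_id) pairs, best score first, ties by doc_id."""
    idf = idf_table(docs)
    q = weight(query, idf)
    scores = [(inner_product(q, weight(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Made-up collection of already-indexed term lists.
docs = [
    ["vector", "retrieval", "system"],
    ["retrieval", "evaluation"],
    ["vector", "vector", "weighting"],
]
for score, doc_id in rank(["vector", "weighting"], docs):
    print(doc_id, round(score, 3))
```

The third document ranks first because it contains both query terms, and "weighting" is rare in the collection, so it carries a high idf. The inner product is not length-normalized, so longer documents can score higher simply by repeating terms.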

6
Indexing: Customizable Elements
  • Document location / format
  • Indexable information / index format
  • Query format
  • Retrieval method (document/query comparison)
  • Output/display format

7
System Architecture
  • 350 source files
  • 45,000 lines of code
  • Can include user-programmed modules

8
Set-up Procedure
  • Download source code
      • ftp://ftp.cs.cornell.edu/pub/smart
  • Compile
  • Look for documentation
  • Indexing completed using default settings
  • Unable to complete query yet
  • Unable to examine index
  • Cannot verify success of indexing!

9
System Documentation
  • Minimal
  • Poorly explained
  • Cryptic
  • Uses its own specific terminology

10
Problems Faced
  • Virtually every feature is customizable
  • Somewhere there are people who know how to do the
    customization...
  • "SMART suffers from the advantages and
    disadvantages of most academic research software.
    It's designed to be extremely flexible (as long
    as you know what you're doing!)" - SMART manual
  • Documentation is too high-level.

11
Further Steps
  • Complete a query using default settings.
  • Identify specific files for adjusting each
    customizable feature.
  • Determine how to modify each feature.

12
Recommendations / Advice
  • Find someone who has actually worked with the
    system before.
  • Understanding operation requires examination of C
    source code.
  • Customization requires modifying / creating C
    code.