DIALOG 1: Basics, Inverted Files, Fundamental Commands - PowerPoint PPT Presentation

Slides: 24
Provided by: sils2
1
DIALOG 1: Basics, Inverted Files, Fundamental Commands
2
DIALOG
  • owned now by Thomson; a commercial information
    retrieval system providing a single command
    language to several hundred files/databases
  • started in 1972 by Lockheed
  • one of a series of vendors of online databases
    (others include Lexis/Nexis, Dow Jones, OCLC,
    etc.)
  • databases come from producers (American
    Psychological Assn, Dun & Bradstreet, Gale, US
    Government, etc.)
  • used much less frequently than in previous years

3
DIALOG
  • what kinds of questions can be answered using
    DIALOG and similar systems?
  • Have any studies been done on whether sunglasses
    induce temporary color blindness?
  • I am writing a paper for a university course in
    Healthcare Administration that deals with
    diabetes intervention in the Latino community.

4
DIALOG
  • It has been noted that women who participated in
    sports as children tend to have greater
    self-esteem, are less likely to be abused or stay
    with an abuser, and that they accomplish more in
    the business world. I would like to see articles
    and/or studies that address this issue for an
    article that I will write on the advantages,
    other than the obvious, of martial arts training
    for girls.

5
DIALOG
  • We are doing a comparison between the 3 countries
    of NAFTA on the difference in ethics and values
    between the countries.... The things we are
    interested in are laws connecting with ethics in
    the three countries, if there are any official
    "Code of ethics" companies can follow when doing
    business in these countries, etc.

6
DIALOG
  • at present, 450 files, largely in business,
    science
  • many different kinds
  • bibliographic: ERIC, PsycINFO, PR Newswire,
    Nuclear Science Abstracts
  • full-text: Los Angeles Times, ABI/Inform, Bible
    (King James Version), Magill's Survey of Cinema
  • directory: American Library Directory, Foundation
    Directory, American Business Directory

7
DIALOG
  • reference: Quotations Database, Marquis Who's
    Who, Eventline, British Books in Print
  • citation: Social SciSearch, SciSearch, Arts &
    Humanities Search
  • chemical: CA Search, Beilstein Online, CHEMSEARCH
  • business: D&B Electronic Business Directory,
    Disclosure, EdgarPlus, Thomas Register Online

8
inverted files
  • the way we search in DIALOG and similar systems
  • records are stored in the linear file: a simple
    list of all documents in databases in reverse
    order by accession number
  • can search in the linear file (in fact some
    systems do)
  • but this would be extremely tedious,
    time-consuming and inefficient
  • so we use the inverted file instead
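
The contrast on this slide can be sketched in a few lines of Python. This is only a toy illustration, not how DIALOG is implemented: the three records and their accession numbers are invented, and the names `linear_search` and `inverted_search` are hypothetical.

```python
# Toy "linear file": records listed in reverse order by accession number.
# The records themselves are invented for illustration.
linear_file = [
    (3, "test bias in college entrance examinations"),
    (2, "diabetes intervention in the community"),
    (1, "sources of bias in test items"),
]


def linear_search(term):
    """Read every record in turn and keep those that contain the term."""
    return [acc for acc, text in linear_file if term in text.split()]


# Build the inverted file once, up front: each word points to the
# accession numbers of the records that contain it.
inverted_file = {}
for acc, text in linear_file:
    for word in set(text.split()):
        inverted_file.setdefault(word, []).append(acc)


def inverted_search(term):
    """Look the term up directly; no record is read at search time."""
    return inverted_file.get(term, [])


print(linear_search("bias"))
print(inverted_search("bias"))
```

Both functions return the same records; the difference is that the linear search touches every record on every query, while the inverted file pays that cost once when it is built.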

9
what goes in an inverted file?
  • depends, and varies by system, database,
    producer, record structure, etc.
  • content-bearing words from fields which have
    been selected as searchable (cf. access points in
    catalog/MARC records)
  • in ERIC, for example, the Title, Abstract,
    Descriptor, Identifier fields go in the Basic
    Index
  • except stop words: the stop list varies by system
    from none to extensive (DIALOG has only 9)

10
how do you construct an inverted file?
  • I thought you'd never ask... let's do one.

11
constructing an inverted file
  • Step 0.
  • Make decisions about the things we're going to
    take as gospel here (stop list, what to do with
    punctuation, word v. phrase indexing, etc.)
  • Make decisions about coverage of database, get
    collection of documents, etc.
  • Choose record structure, software, hardware, etc.

12
constructing an inverted file
  • Step 1. Number all the words in each field of
    the record.
  • while doing that, strip all punctuation,
    including hyphens and apostrophes, and replace
    with spaces (Alzheimer's becomes ALZHEIMER S, for
    example)
  • also include phrase-indexing where appropriate
    (here DE and ID fields)
  • then remove the words in the stop list:
  • AN AND BY FOR FROM OF THE TO WITH
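
A minimal Python sketch of Step 1, under some assumptions: the function name `number_words` is hypothetical, "punctuation" is taken to mean anything that is not a letter or digit, and words are numbered before the stop words are dropped (so a dropped stop word leaves a gap in the numbering). Phrase indexing is left out here.

```python
import re

# The nine stop words listed on the slide.
STOP_WORDS = {"AN", "AND", "BY", "FOR", "FROM", "OF", "THE", "TO", "WITH"}


def number_words(field_text):
    """Return (position, WORD) pairs for one field, skipping stop words.

    Every run of punctuation (hyphens and apostrophes included) is
    replaced with a space, then the remaining words are numbered from 1.
    Numbering happens before stop words are removed, which is an
    assumption of this sketch.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", field_text).upper()
    numbered = enumerate(cleaned.split(), start=1)
    return [(pos, word) for pos, word in numbered if word not in STOP_WORDS]


print(number_words("Alzheimer's disease"))
print(number_words("Causes of Bias in Test Items."))
```

Note that IN survives in the second example, because it is not one of the nine stop words.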

13
constructing an inverted file
  • Step 2. Make a list of words with pointers.
  • be sure to include phrases as phrases along with
    words within (unless strictly phrase-indexed)
  • mark major descriptors and identifiers
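
Step 2 might look like the following sketch. The title and the two descriptors are taken from the sample ERIC record in this deck; everything else is an assumption made for illustration: the field code `TI`, the shape of a pointer as (accession number, field, position), and the helper names `words` and `make_entries`. Marking of major descriptors is omitted.

```python
import re

STOP_WORDS = {"AN", "AND", "BY", "FOR", "FROM", "OF", "THE", "TO", "WITH"}

# Part of the sample ERIC record; the dict layout is hypothetical.
record = {
    "accession": "EJ355024",
    "TI": "An Experimental, Exploratory Study of Causes of Bias in Test Items.",
    "DE": ["Test Bias", "Test Items"],
}


def words(text):
    """Number the words in a piece of text, then drop stop words."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", text).upper()
    numbered = enumerate(cleaned.split(), start=1)
    return [(pos, w) for pos, w in numbered if w not in STOP_WORDS]


def make_entries(rec):
    """Return (term, accession, field, position) entries for one record."""
    entries = []
    # Word-indexed field: one entry per word, pointing at its position.
    for pos, word in words(rec["TI"]):
        entries.append((word, rec["accession"], "TI", pos))
    # Phrase-indexed field: the phrase goes in as a phrase, along with
    # the words within it. The position is the descriptor's number.
    for number, phrase in enumerate(rec["DE"], start=1):
        entries.append((phrase.upper(), rec["accession"], "DE", number))
        for _, word in words(phrase):
            entries.append((word, rec["accession"], "DE", number))
    return entries


for entry in make_entries(record):
    print(entry)
```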

14
constructing an inverted file
  • Step 3. Alphabetize the list.
  • Usually, ASCII order
  • That's it!
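
A sketch of Step 3, assuming the entries from Step 2 have been reduced to (term, accession number) pairs. The entries and the accession number EJ000002 are invented, and `build_inverted_file` is a hypothetical name. Python compares strings by character code, which for these characters is ASCII order: digits sort before letters, and a space sorts before any letter.

```python
# Invented (term, accession number) entries, in no particular order.
entries = [
    ("TEST ITEMS", "EJ355024"),
    ("BIAS", "EJ355024"),
    ("TESTING", "EJ000002"),
    ("16", "EJ355024"),
    ("TEST", "EJ355024"),
    ("BIAS", "EJ000002"),
]


def build_inverted_file(pairs):
    """Group pointers under each term, then put the terms in ASCII order."""
    index = {}
    for term, accession in pairs:
        index.setdefault(term, []).append(accession)
    # sorted() on plain ASCII strings gives ASCII order.
    return {term: sorted(index[term]) for term in sorted(index)}


inverted = build_inverted_file(entries)
for term, accessions in inverted.items():
    print(term, accessions)
```

The phrase TEST ITEMS files before TESTING because the space after TEST has a lower character code than the letter I.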

15
geez--why all the detail?
  • don't have to; don't even have to have the thing
    in the first place
  • part of overhead -- work you put in up front to
    make searching easier
  • each of these aspects (field information,
    position, major descriptors, etc.) will come in
    handy for searching later
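
As a rough illustration of why the extra detail pays off, the sketch below uses field and position information to answer two kinds of question: "is this word in the title?" and "does this word come right after that one?". The postings, accession numbers, field codes and function names are all invented for the example; this is not DIALOG's command syntax.

```python
# Invented postings: term -> list of (accession, field, position).
postings = {
    "TEST": [("EJ1", "TI", 3), ("EJ2", "AB", 7)],
    "BIAS": [("EJ1", "TI", 4), ("EJ2", "AB", 12), ("EJ3", "TI", 1)],
}


def in_field(term, field):
    """Records where the term occurs in the given field."""
    return {acc for acc, f, _ in postings.get(term, []) if f == field}


def adjacent(first, second):
    """Records where `second` immediately follows `first` in one field."""
    wanted = {(acc, f, pos + 1) for acc, f, pos in postings.get(first, [])}
    return {
        acc
        for acc, f, pos in postings.get(second, [])
        if (acc, f, pos) in wanted
    }


print(in_field("BIAS", "TI"))
print(adjacent("TEST", "BIAS"))
```

Without the field code in each pointer the first question cannot be answered from the index, and without the position the second cannot.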

16
structure
  • what we mean by that
  • examples
  • origins, intents and uses
  • use in searching (present and potential)
  • overhead issues

17
what do we mean by structure?
  • ways of configuring information items for a
    variety of reasons
  • of 2 kinds (at least here)
  • bibliographic record structure (ERIC in DIALOG)
  • structured documents (HTML, XML, SGML)

18
sample ERIC record
  • EJ355024 TM511910
  • An Experimental, Exploratory Study of Causes of
    Bias in Test Items.
  • Scheuneman, Janice Dowd
  • Journal of Educational Measurement v24 n2
    p97-118 Sum 1987
  • Available from: UMI
  • Language: English
  • Document Type: JOURNAL ARTICLE (080); RESEARCH
    REPORT (143)
  • Journal Announcement: CIJSEP87
  • This study evaluated 16 hypotheses concerning
    possible sources of bias in test items on the
    Graduate Record Examination General Test. Ten of
    the hypotheses showed interactions between
    group membership and the item version
    indicating a differential effect of the item
    manipulation on the performance of Black and
    White examinees. (Author/LMO)
  • Descriptors: Blacks; College Entrance
    Examinations; Higher Education; Hypothesis
    Testing; Racial Differences; Sex Differences;
    Statistical Bias; Test Bias; Test Items;
    Whites
  • Identifiers: Graduate Record Examinations; Log
    Linear Analysis

19
bibliographic records
  • way of representing surrogates created during
    cataloging or indexing process
  • fields represent individual pieces of information
    about each entity (title, author, abstract)
  • records collect fields about a given entity
    (book, article)
  • files collect records about all entities of
    interest (database, catalog)

20
structured text
  • way of representing internal structure of
    documents (acts, chapters, stanzas, captions,
    etc.)
  • also meta-information (version, edition,
    authority, date, etc.)
  • aid in analyzing, printing, describing, searching

21
use in searching/bibliographic records
  • pretty standard: permits searching by field;
    need not search entire texts for Bush if you
    want only documents written by Bush
  • often used as auxiliaries to searches by content
    (we will return to this in a few weeks)
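
The Bush example can be sketched as follows. The three records are invented, the field codes `AU`, `TI` and `AB` are assumptions, and `search` is a hypothetical helper; the point is only that restricting the search to one field separates documents by Bush from documents that merely mention the word.

```python
import re

# Invented records; field codes are assumed for illustration.
records = [
    {"id": 1, "AU": "Bush, A.", "TI": "Indexing methods",
     "AB": "A survey of indexing."},
    {"id": 2, "AU": "Smith, B.", "TI": "A reply to Bush",
     "AB": "Comments on earlier work."},
    {"id": 3, "AU": "Jones, C.", "TI": "Garden design",
     "AB": "Pruning a rose bush."},
]


def tokens(text):
    """Upper-case words with punctuation replaced by spaces."""
    return re.sub(r"[^A-Za-z0-9]+", " ", text).upper().split()


def search(term, field=None):
    """Search one field if given, otherwise all searchable fields."""
    fields = [field] if field else ["AU", "TI", "AB"]
    return [
        rec["id"]
        for rec in records
        if any(term.upper() in tokens(rec[f]) for f in fields)
    ]


print(search("bush"))
print(search("bush", field="AU"))
```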

22
use in searching/structured text
  • only just emerging: can permit searching in much
    more detailed way by element (title, chapter,
    caption), meta-information (version, edition),
    etc.
  • as yet pretty primitive, in HTML (search on
    and tags, for example)
  • and, of course, the right tags have to be there
    and the search engine has to be able to take
    advantage of them
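
A small sketch of searching by element, using Python's standard `html.parser` module. The sample page, the class name `ElementText` and the function `search_element` are invented, and the choice of the `title`, `h1` and `p` elements is an assumption made for the example. The sketch also assumes well-formed markup with no void elements such as `br`.

```python
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Collect the text found directly inside each element, by tag name."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.text_by_tag = {}

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if self.open_tags and data.strip():
            innermost = self.open_tags[-1]
            self.text_by_tag.setdefault(innermost, []).append(data.strip())


def search_element(html, tag, term):
    """True if the term occurs in the text of any element with that tag."""
    parser = ElementText()
    parser.feed(html)
    parser.close()
    texts = parser.text_by_tag.get(tag, [])
    return any(term.lower() in text.lower() for text in texts)


page = (
    "<html><head><title>Inverted files</title></head>"
    "<body><h1>Searching</h1>"
    "<p>Notes on inverted files and stop words.</p></body></html>"
)

print(search_element(page, "title", "inverted"))
print(search_element(page, "h1", "inverted"))
```

As the slide says, this only works if the tags are actually present in the document and the search tool knows to look at them.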

23
overhead
  • in both cases, this is a lot of work, and the
    more you want to be able to use structure to aid
    in searching, the more work it will be
  • bibliographic records: if you want indexing,
    abstracting or even structure at all, somebody
    has to do it
  • structured text: more sophisticated searching
    means more detailed markup, which is a very
    time-intensive, tedious and costly process