LIS618 lecture 2 - PowerPoint PPT Presentation

Slides: 29
Provided by: kric
Learn more at: https://openlib.org


1
LIS618 lecture 2
  • Thomas Krichel
  • 2004-02-07

2
Structure
  • Theory: information retrieval performance
  • Practice: more advanced DIALOG

3
retrieval performance evaluation
  • "Recall" and "Precision" are two classic measures
    of the performance of information retrieval for a
    single query.
  • Both assume that there is an answer set of
    documents that contain the answer to the query.
  • Performance is optimal if
      • the database returns all the documents in the
        answer set, and
      • the database returns only documents in the
        answer set.
  • Recall is the fraction of the relevant documents
    that the query result has captured.
  • Precision is the fraction of the retrieved
    documents that are relevant.
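The two definitions can be checked with a short Python sketch. It assumes, as in the example on slide 5, that documents are identified by short strings; the function name recall_precision is hypothetical.

```python
def recall_precision(answer_set, retrieved):
    """Return (recall, precision) for a single query.

    answer_set -- the documents relevant to the query
    retrieved  -- the documents the database returned
    (recall_precision is a hypothetical helper name.)
    """
    relevant = set(answer_set)
    found = set(retrieved)
    hits = len(relevant & found)  # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(found) if found else 0.0
    return recall, precision


# Data from the example on slide 5: digits are relevant, letters are not.
answer = [str(d) for d in range(10)]
result = "7 a 3 b c 9 n j l 5 r o s e 4".split()

print(recall_precision(answer, result))  # (0.5, 0.3333333333333333)
print(recall_precision(answer, answer))  # (1.0, 1.0): the optimal case
```

The second call is the optimal case from this slide: every document in the answer set is returned and nothing else, so both measures reach 1.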

4
recall and precision curves
  • Assume that all the retrieved documents arrive at
    once and are examined in order.
  • During that process, the user discovers more and
    more relevant documents. Recall increases.
  • During the same process, at least eventually,
    there will be fewer and fewer useful documents.
    Precision (usually) declines.
  • This can be represented as a curve.

5
Example
  • Let the answer set be 0,1,2,3,4,5,6,7,8,9, and let
    non-relevant documents be represented by letters.
  • A query returns the following ranked result:
      • 7,a,3,b,c,9,n,j,l,5,r,o,s,e,4.
  • For the first document, (recall, precision) is
    (10,100) in percent; for the third, (20,66); for
    the sixth, (30,50); for the tenth, (40,40); and
    for the fifteenth, (50,33).
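The (recall, precision) pairs of this example can be reproduced mechanically. This sketch walks down the ranking and records a point each time a relevant document appears; values are in whole percent, truncated as on the slide. The function name curve_points is hypothetical.

```python
def curve_points(answer_set, ranking):
    """List (rank, recall %, precision %) at each relevant document.

    Percentages are truncated to whole numbers, matching the slide.
    (curve_points is a hypothetical helper name.)
    """
    relevant = set(answer_set)
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            recall = 100 * hits // len(relevant)
            precision = 100 * hits // rank
            points.append((rank, recall, precision))
    return points


answer = [str(d) for d in range(10)]
result = "7 a 3 b c 9 n j l 5 r o s e 4".split()

for point in curve_points(answer, result):
    print(point)
# (1, 10, 100)
# (3, 20, 66)
# (6, 30, 50)
# (10, 40, 40)
# (15, 50, 33)
```

Plotting precision against recall for these points gives the kind of curve described on slide 4.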

6
recall/precision curves
  • Such curves can be formed for each query.
  • For several queries, an average curve can be
    calculated by averaging precision at each recall
    level.
  • Recall and precision levels can also be used to
    calculate two single-valued summaries:
      • average precision at seen document
      • R-precision
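Averaging needs precision values at the same recall levels for every query, but each query only yields points at its own relevant documents. A common convention, assumed here rather than taken from the slide, is to interpolate: precision at recall level r is the highest precision observed at any recall of at least r. The sketch uses the eleven levels 0%, 10%, ..., 100%; the function names and the second query are hypothetical.

```python
from fractions import Fraction


def interpolated_curve(answer_set, ranking, levels=11):
    """Interpolated precision at recall levels 0, 1/(levels-1), ..., 1.

    Assumed rule: precision at level r is the maximum precision seen at
    any recall >= r, or 0.0 if recall r is never reached.
    """
    relevant = set(answer_set)
    observed = []  # (exact recall, precision) at each relevant document
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            observed.append((Fraction(hits, len(relevant)), hits / rank))
    curve = []
    for j in range(levels):
        level = Fraction(j, levels - 1)  # exact, avoids float comparison issues
        curve.append(max((p for r, p in observed if r >= level), default=0.0))
    return curve


def average_curve(queries, levels=11):
    """Average the interpolated curves of several (answer_set, ranking) pairs."""
    curves = [interpolated_curve(a, r, levels) for a, r in queries]
    return [sum(values) / len(curves) for values in zip(*curves)]


queries = [
    # The example from slide 5.
    ([str(d) for d in range(10)], "7 a 3 b c 9 n j l 5 r o s e 4".split()),
    # A hypothetical second query: two relevant documents, at ranks 1 and 3.
    (["p", "q"], ["p", "z", "q"]),
]

print([round(p, 2) for p in average_curve(queries)])
# [1.0, 1.0, 0.83, 0.75, 0.7, 0.67, 0.33, 0.33, 0.33, 0.33, 0.33]
```

Other interpolation rules exist; this one makes each curve non-increasing, which keeps the averaged curve readable.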

7
R-precision
  • A more ad hoc measure.
  • Let R be the size of the answer set.
  • Take the first R results of the query.
  • Find the number of relevant documents among them.
  • Divide by R.
  • In our example, R is 10 and 4 of the first 10
    results are relevant, so the R-precision is 40.
  • An average can be calculated over a number of
    queries.
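R-precision is short to compute once R is known. A minimal sketch using the slide 5 example, with hypothetical function names, including the average over several queries:

```python
def r_precision(answer_set, ranking):
    """Fraction of the first R results that are relevant, R = size of answer set.

    (r_precision is a hypothetical helper name.)
    """
    relevant = set(answer_set)
    r = len(relevant)
    top_r = ranking[:r]
    return sum(1 for doc in top_r if doc in relevant) / r


def mean_r_precision(queries):
    """Average R-precision over a list of (answer_set, ranking) pairs."""
    return sum(r_precision(ans, ranked) for ans, ranked in queries) / len(queries)


answer = [str(d) for d in range(10)]
result = "7 a 3 b c 9 n j l 5 r o s e 4".split()
print(r_precision(answer, result))  # 0.4
```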

8
average precision at seen document
  • To find it, sum the precision levels at each
    new relevant document discovered by the user and
    divide by the number of relevant documents
    discovered.
  • In our example, it is (100+66+50+40+33)/5 = 57.8.
  • This measure favors retrieval methods that get
    the relevant documents to the top.
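A minimal sketch of this average, using the slide 5 example and a hypothetical function name. With exact fractions the result is (1 + 2/3 + 1/2 + 2/5 + 1/3)/5 = 0.58; the slide's 57.8 comes from the truncated percentages 66 and 33.

```python
def avg_precision_seen(answer_set, ranking):
    """Mean of the precision values observed at each relevant document found.

    (avg_precision_seen is a hypothetical helper name.)
    """
    relevant = set(answer_set)
    precisions = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    # Divide by the number of relevant documents actually seen (5 here).
    return sum(precisions) / len(precisions) if precisions else 0.0


answer = [str(d) for d in range(10)]
result = "7 a 3 b c 9 n j l 5 r o s e 4".split()
print(round(avg_precision_seen(answer, result), 2))  # 0.58
```

Dividing the same sum by the size of the whole answer set (10) would give 0.29 instead; that variant also penalizes relevant documents that are never retrieved, whereas the slide divides the five precision values by 5.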

9
critique of recall precision
  • Recall has to be estimated by an expert.
  • Recall is very difficult to estimate in a large
    collection.
  • Both measures focus on one query only. No serious
    user works like this.
  • There are other measures, but they belong in a
    more advanced course in IR.

10
Looking at database structure
  • Up to now, we have looked at commands that
    take a full-text view of the database.
  • Such commands can be executed in every database.
  • If we want to make more precise queries, we have
    to take the database structure into account.

11
bluesheet
  • Each database name is linked to a blueish pop-up
    window that contains the details of the database.
  • This window is called the bluesheet.

12
closer look at the bluesheet
  • file description
  • subject coverage (free vocabulary)
  • format options, listing all formats
      • by number (internal)
      • by DIALOG web format (external, i.e.
        cross-database)
  • search options
      • basic index, i.e. subject contents
      • additional index, i.e. non-subject contents

13
basic vs additional index
  • the basic index
      • has information that is relevant to the
        substantive contents of the data
      • is usually indexed by word, i.e. connectors
        are required
  • the additional index
      • has data that is not relevant to the
        substantive matter
      • is usually indexed by phrase, i.e. connectors
        are not required

14
search options basic index
  • select without qualifiers searches all fields
    in the basic index.
  • The bluesheet lists the field indicators available
    for a database.
  • Also note whether a field is indexed by word or by
    phrase. Proximity searching only works with word
    indices; when phrases are indexed, you don't need
    proximity indicators.

15
search in basic index
  • A field in the basic index is queried through
    term/IN, where term is a search term and IN is a
    field indicator.
  • Thomas calls this an appending indicator.
  • Several field indicators can be ORed by giving a
    comma-separated list.
  • For example, mate/ti,de searches for mate in the
    title or descriptor fields.

16
limiters and sorting
  • Some databases allow you to restrict the search
    using limiters. For example:
      • /ABS requires an abstract to be present
      • /ENG restricts to English-language publications
  • Some fields are sortable with the sort command,
    i.e. records can be sorted by the values in those
    fields. Example: sort /ti
  • Such features are database specific.

17
additional indices
  • Additional indices list those terms that can
    lead a query. Often, these are phrase indexed.
  • Such fields are queried with a prefix, IN=term,
    where IN is the field abbreviation and term is
    the search term.
  • Thomas calls this a prepending indicator.
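To contrast the appending form of slide 15 with the prepending form above, here is a small Python sketch that only assembles command strings; it does not talk to DIALOG. The helper names are hypothetical, and the "=" joining a prefix indicator to its term is assumed from DIALOG's usual IN=term syntax.

```python
def suffix_search(term, *fields):
    """Basic index: term/IN, with several indicators ORed as a comma list.

    (suffix_search is a hypothetical helper name.)
    """
    if not fields:
        # No qualifier: select searches all fields of the basic index.
        return f"s {term}"
    return f"s {term}/{','.join(fields)}"


def prefix_search(field, term):
    """Additional index: IN=term (the '=' joiner is an assumption)."""
    return f"s {field}={term}"


print(suffix_search("mate", "ti", "de"))  # s mate/ti,de
print(suffix_search("mate"))              # s mate
print(prefix_search("au", "cruz, b?"))    # s au=cruz, b?
```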

18
expanding queries
  • Names have to be entered as they appear in the
    database.
  • The "expand" command can be used to see spelling
    variants of a value.
  • It has to be used with a field identifier, for
    example
      • expand au=cruz, b?
      • expand au=barrueco?
  • These look for misspellings of José Manuel
    Barrueco Cruz.

19
expanding queries II
  • expand produces results of the form
      Ref  Items  Index-term
  • Ref is a reference number.
  • Items is the number of items in which the index
    term appears.
  • Index-term is the index term.
  • "s Ref" searches for the index term with that
    reference number.

20
expand topics
  • You can also expand a topic in a database to see
    what index terms are available that start with
    the term.
  • If you expand an entry in the expansion list
    again, you can see a list of terms related to it,
    if such a list is available.
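Conceptually, expand browses an alphabetically sorted index. The sketch below models that with a toy index (all terms and counts are hypothetical) and prints rows in the Ref / Items / Index-term layout of slide 19. It illustrates the idea only, it is not DIALOG's implementation, and related-term lists are not modelled.

```python
import bisect


def expand(index, prefix, limit=10):
    """Return (ref, items, term) rows for index terms that start with prefix.

    index -- dict mapping each index term to the number of items containing it
    (a hypothetical model of the expand display, not DIALOG code)
    """
    terms = sorted(index)
    start = bisect.bisect_left(terms, prefix)  # first term >= prefix
    rows = []
    for term in terms[start:]:
        # Terms sharing the prefix are contiguous in sorted order.
        if not term.startswith(prefix) or len(rows) == limit:
            break
        rows.append((len(rows) + 1, index[term], term))
    return rows


# Hypothetical toy index: term -> number of items.
toy_index = {
    "inter-bank": 2,
    "interest groups": 4,
    "interest rate": 7,
    "interest rates": 12,
    "interior": 3,
}

print("Ref Items Index-term")
for ref, items, term in expand(toy_index, "interest"):
    print(ref, items, term)
# 1 4 interest groups
# 2 7 interest rate
# 3 12 interest rates
```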

21
Example
  • How many domain names are currently registered in
    Novosibirsk, Russia?
  • Hint: use the domain name database, file 225.
  • Note that this database also covers non-current
    domains.

22
ranking
  • The rank command can be used to show the most
    frequent values of a phrase-indexed field in a
    search set.
  • Examples:
      • rank au s1
        shows the most frequent authors in set s1
      • rank de s1
        shows the most frequent descriptors in set s1
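What rank shows is a frequency count of field values over the records of a set. A minimal Python model, with hypothetical records standing in for set s1 and a hypothetical function name; it is not how DIALOG implements rank.

```python
from collections import Counter


def rank(records, field, top=5):
    """Most frequent values of a (phrase-indexed) field across a result set.

    records -- list of dicts; each maps a field code to a list of phrases
    (a hypothetical model of the rank command, not DIALOG code)
    """
    counts = Counter(value for rec in records for value in rec.get(field, []))
    return counts.most_common(top)


# Hypothetical result set standing in for s1.
s1 = [
    {"au": ["Author A", "Author B"], "de": ["interest rates"]},
    {"au": ["Author A"], "de": ["economic growth", "interest rates"]},
    {"au": ["Author C", "Author A"], "de": ["economic growth"]},
    {"au": ["Author B"], "de": ["interest rates"]},
]

print(rank(s1, "au"))  # [('Author A', 3), ('Author B', 2), ('Author C', 1)]
print(rank(s1, "de"))  # [('interest rates', 3), ('economic growth', 2)]
```

Each whole phrase counts as one value, which is why ranking makes most sense on phrase-indexed fields.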

23
example
  • Who wrote on interest rates and growth rates? Use
    EconLit.
  • b 139
  • s interest(n)rate? and growth(n)rate?
  • rank au s1
  • You can then select the authors you are
    interested in, 1-5 for example, and use
    exit / exs to search for those authors.

24
topic searches
  • Often we want to know what literature is
    available on a certain topic.
  • Authors often do not use the obvious words that
    occur to the searcher.
  • Using descriptors can be very helpful:
      • conduct a search
      • look for descriptors
      • use those in other searches

25
Initial file selection
  • On the main menu, go to the database menu.
  • After the principal menu, you get a search box.
  • There you can enter full-text queries for all the
    databases.
  • You can then select the database you want
    and get to the begin databases stage.

26
database categories
  • To help people find databases (files), DIALOG
    has grouped databases by categories.
  • Categories are listed at
    http://library.dialog.com/bluesheets/html/blo.html
  • 'b category' selects the databases in the
    category named category at the start.
  • 'sf category' selects the files in the category
    named category at other times.

27
add/repeat
  • add number, number
      • adds databases (files) by number to the last
        query
      • example: "add 297" to see what the Bible says
        about it
  • repeat
      • repeats the previous query with the added
        databases

28
http://openlib.org/home/krichel
  • Thank you for your attention!