H' Lundbeck AS21Nov091 - PowerPoint PPT Presentation

Slides: 35

Provided by: AGE90

Transcript and Presenter's Notes

Title: H' Lundbeck AS21Nov091

1
Assessing the effectiveness of your current
search and retrieval function
Case story evaluating human metadata indexing
versus automatic query expansion using a
corporate thesaurus

Anna G. Eslau, Information Specialist, H.
Lundbeck A/S
Marianne Lykke Nielsen, Associate Professor,
Royal School of Library and Information Science

2
Agenda

Motivation
Case study
Research partners
Purpose
Test design
Findings
Conclusions
Summing up

3
Motivation

A lot of money has been invested, but does our
current search and retrieval function perform as
expected?
An advanced and time-consuming indexing task has
been laid upon our end users, but is our current
indexing strategy effective?
Do we have alternatives to manual indexing that
are of equally high quality?

5
Case study - Research partners

H. Lundbeck A/S
Pharmaceutical company
5,000 employees in more than 40 countries
Information systems with electronic documents
Corporate thesaurus
Users and search requests
Royal School of Librarianship
Thesaurus research expertise
Domain knowledge from former research project
Ensight A/S
Verity K2 search engine and Intelligent
Classifier
Technical expertise

6
Purpose of case study

To evaluate
Information retrieval based on controlled, human
indexing (controlled metadata)
Information retrieval based on full-text
indexing, with thesaurus-based automatic query
expansion

7
Case study - Retrieval system and indexing policy

Electronic document management system (EDMS) and
bibliographic information system containing
research documentation
Indexing policy
Written indexing policy
Mandatory training of indexers
Corporate Thesaurus
Human, controlled indexing
Topical checklist/Facetted indexing
Searching by controlled metadata and full-text
Domain specific thesaurus containing 5,500
concepts and 16,000 terms

8
EDMS 1/2 - Indexing
9
EDMS 2/2 - Searching
10
Lundbeck Thesaurus 1/3
11
Lundbeck Thesaurus 2/3
12
Lundbeck Thesaurus 3/3
14
Test design - Retrieval performance of different
search strategies

Three different search strategies were evaluated
Searches based on natural language (words from
original request) in full text
Searches based on natural language in full text
expanded with words from thesaurus (query
expansion with synonyms and narrower terms)
Searches based on (manually assigned) controlled
keywords in selected metadata fields

15
Test design - Query expansion

Search for information about intravenous
administration of a drug AND Alzheimer's disease:
Intravenous OR IV OR Intravenously OR ...
AND
Alzheimer's disease OR Alzheimer's disorders OR
Alzheimer type dementia OR ...
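
The expansion above can be sketched in a few lines of Python. This is only an illustration, assuming a tiny hypothetical synonym table (`THESAURUS`) and a generic Boolean query syntax; it is not the Verity K2 query language, and the helper names `expand_term`, `quote` and `build_query` are made up for this example.

```python
# Hypothetical mini-thesaurus: search term -> synonyms.
# Only the variants shown on the slide are included; the real
# Lundbeck Thesaurus entries are longer (the slide elides them).
THESAURUS = {
    "intravenous": ["IV", "intravenously"],
    "Alzheimer's disease": ["Alzheimer's disorders", "Alzheimer type dementia"],
}


def expand_term(term, thesaurus):
    """Return the term followed by its synonyms, dropping case-insensitive duplicates."""
    seen = set()
    unique = []
    for variant in [term] + thesaurus.get(term, []):
        key = variant.lower()
        if key not in seen:
            seen.add(key)
            unique.append(variant)
    return unique


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(terms, thesaurus):
    """OR together the variants of each term, then AND the groups."""
    groups = []
    for term in terms:
        variants = expand_term(term, thesaurus)
        groups.append("(" + " OR ".join(quote(v) for v in variants) + ")")
    return " AND ".join(groups)


query = build_query(["intravenous", "Alzheimer's disease"], THESAURUS)
print(query)
```

Each concept in the request becomes one OR group, and the groups are joined with AND, which mirrors the layout of the slide.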

16
Lundbeck Thesaurus
17
Test design - Test persons and retrieval system

Persons
Query expansion tests were carried out by the
thesaurus manager and did not involve end users
Evaluation of search results was carried out by
end users: 4 subject experts (medical advisers)
who had formerly answered the search requests
System
The Verity K2 search system was used as the test
retrieval system for the query expansion tests
The original document management systems were
used as the retrieval systems for the metadata
searches

18
Test design - Test thesaurus

The Lundbeck Thesaurus was the test thesaurus.
The thesaurus formed the basis for the query
formulations:
- Synonyms and narrower terms were picked from
the thesaurus for the test searches based on
expansion of natural language in full text
searches
- Preferred keywords were picked from the
thesaurus for the test searches based on
controlled keywords in selected metadata fields.

19
Test design - Test collection

25,384 document objects from two different
sources
24,369 document objects from a bibliographical
(BRS) information system (internal research
reports and published research articles)
1,015 documents from the full-text EDMS system
(internal research reports)

20
Test design - Search requests

10 search requests were selected from a set of
searches which in real life had been carried out
in the corporate information systems

Work task 7: You are a medical reviewer. A
physician has contacted you. He would like to
have data on the use of Citalopram and Reboxetine
together to treat resistant depression. He wants
any reporting of possible interactions.
Indicative request: Find reports, papers or case
stories that investigate the possible interaction
of Citalopram and Reboxetine on resistant
depression
22
Findings - Performance
SJ = Search Job, QE = Query Expansion
Precision (% relevant docs out of all retrieved
docs) went down from 33% to 24% with query
expansion
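
Precision as defined on this slide, and its counterpart recall, can be computed as in the minimal sketch below. The document IDs and hit lists are invented purely for illustration and are not data from the case study; they only show the typical pattern where query expansion retrieves more relevant documents but also more irrelevant ones.

```python
def precision(retrieved, relevant):
    """Share of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical relevance judgements and hit lists (not study data).
relevant = {"d1", "d2", "d3", "d4"}
plain_hits = {"d1", "d2", "d5"}
expanded_hits = {"d1", "d2", "d3", "d5", "d6", "d7", "d8"}

for label, hits in [("plain", plain_hits), ("expanded", expanded_hits)]:
    print(label, round(precision(hits, relevant), 2), round(recall(hits, relevant), 2))
```

In this toy example precision falls from 2/3 to 3/7 while recall rises from 2/4 to 3/4.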
23
Findings - Human indexing problems
24
Findings - Other metadata

Topical retrieval and situational relevance
ranking - the importance of contextual parameters
Document type
Publication year
Source
Language
Author

25
Findings - Thesaurus

Thesaurus
Relevant synonyms (acronyms with multiple
meanings should be omitted)
Logical hierarchies
High topical relevance

26
Findings - Documents and search requests

Document collection
OCR-scanned documents may contain errors, leading
to false positive hits
Large (>100 pages) full-text documents lower
precision (irrelevant hits)
Search requests
If people search using very general terms, QE
becomes increasingly complicated/extensive the
more levels of QE we choose to add
Different types of facets result in:
- Different relevance assessment according to
document types
- Different recall in metadata search
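
The point about general terms and levels of QE can be made concrete with a small sketch. The hierarchy below (`NARROWER`) is a made-up example, not an extract from the Lundbeck Thesaurus, and `expand_narrower` is a hypothetical helper; it only shows how the number of query terms grows with each level of narrower terms added.

```python
# Hypothetical broader -> narrower term hierarchy (illustrative only).
NARROWER = {
    "dementia": ["Alzheimer's disease", "vascular dementia"],
    "Alzheimer's disease": [
        "early-onset Alzheimer's disease",
        "late-onset Alzheimer's disease",
    ],
    "vascular dementia": ["multi-infarct dementia"],
}


def expand_narrower(term, narrower, levels):
    """Collect the term plus its narrower terms, breadth-first, up to `levels` deep."""
    result = [term]
    frontier = [term]
    for _ in range(levels):
        next_frontier = []
        for current in frontier:
            for child in narrower.get(current, []):
                if child not in result:
                    result.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return result


# Number of query terms after 0, 1 and 2 levels of expansion.
sizes = [len(expand_narrower("dementia", NARROWER, n)) for n in range(3)]
print(sizes)
```

A general starting term sits high in the hierarchy, so every additional level pulls in all of its descendants; a specific term near the bottom would barely grow.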

27
Findings - Search software

Search software settings are important
Stemming
Case sensitivity
Character sensitivity (())
Number of search terms allowed
Zoning
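
To illustrate why settings such as stemming and case sensitivity matter, here is a toy matcher. It is a sketch under simplifying assumptions: the suffix stripping in `normalize` is a crude stand-in for a real stemmer, and none of this reflects how Verity K2 is configured or implemented.

```python
import re

# Crude stand-in for a stemmer: strip a few common English suffixes.
SUFFIXES = ("ing", "ly", "ed")


def normalize(token, case_sensitive, stem):
    """Apply the chosen case and stemming settings to a single token."""
    if not case_sensitive:
        token = token.lower()
    if stem:
        for suffix in SUFFIXES:
            # Keep at least three characters of the word after stripping.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
    return token


def matches(term, text, case_sensitive=False, stem=True):
    """True if any word in `text` equals `term` under the given settings."""
    target = normalize(term, case_sensitive, stem)
    tokens = re.findall(r"[A-Za-z]+", text)
    return any(normalize(t, case_sensitive, stem) == target for t in tokens)


text = "Given intravenously once daily"
print(matches("Intravenous", text, case_sensitive=False, stem=True))
print(matches("Intravenous", text, case_sensitive=False, stem=False))
print(matches("Intravenous", text, case_sensitive=True, stem=True))
```

The same search term hits or misses the same sentence depending only on the settings, which is why they need to be fixed and documented before retrieval performance is compared.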

29
Conclusion - Thesaurus and QE

A domain-specific thesaurus is well suited for
QE
QE improves recall but decreases precision
QE with synonyms only is in most cases sufficient

30
Conclusion - Search result display

Users want to see all hits (recall is important)
Manual sorting of search results by (other than
topical) metadata is requested by the users
Ranking based on e.g. zoning is not always useful

31
Conclusion - Indexing policy

Difficult to obtain complete, accurate and
exhaustive human indexing
Findings suggest that searching for specific
topics should be based on full-text indexing,
supported by thesaurus based query expansion
Human indexing should focus on few, important,
well-defined topics, e.g. used to develop
taxonomies for broad browsing
Non-topical context metadata are important in
assessing document relevance:
Document type
Publication year
Source
Language
Author

32
Conclusion - Implications for Lundbeck

The Lundbeck Thesaurus has been integrated with
the bibliographic information system to perform
automated QE
An EDMS upgrade is planned in which QE should be
possible
OCR scanning of existing documents is being
considered
Metadata on document types in EDMS are being
evaluated and are under revision (simplified)
New models for adding metadata are being
considered (dictionaries)
New indexing tools for the users are being
developed (indexing keys)

34
Summing up

If your current search and retrieval function
does NOT perform as expected, your organisation
may lose important information
You may have an indexing strategy (which is
good), but evaluation may reveal that the
resources invested could be used even better
Evaluation is important; it may save your
organisation money over time
