1
Collection-Level User Searches in Federated
Digital Resource Environment
  • Oksana Zavalina
  • IMLS Digital Collections and Content project
  • Graduate School of Library and Information
    Science
  • University of Illinois at Urbana-Champaign
  • 2007 ASIST Annual meeting

2
  • IMLS Digital Collections and Content project at
    UIUC
  • Started in 2002 with a National Leadership Grant
  • Aggregation of over 200 cultural heritage
    collections
  • Collection Registry
  • Provides access, services, and additional
    functionality to a database of collection
    descriptions
  • Collection-level schema developed based on DC and
    RSLP (Research Support Libraries Programme, UK)
  • Metadata repository
  • Harvested metadata aggregated in one location
  • Acts as a portal to the item-level records for
    digital content in NLG collections

3
  • Participating institutions, 2006

4
  • Collection Registry access through search and
    browse

5
  • Subject Representation in the Registry
  • Gateway to Educational Materials (GEM) subject
    headings
  • Alternative subject headings (e.g., LCSH, AAT,
    locally-developed)
  • Geographic coverage headings (Getty Thesaurus of
    Geographic Names).

6
  • Top GEM (Gateway to Educational Materials)
    subjects in Collection Registry
  • Social Studies: 80
  • United States history
  • State history
  • Arts: 46
  • Visual arts
  • Photography
  • Science: 17

7
  • Collection records

8
  • Collection records

9
  • Research questions
  • What is the distribution of the two major search
    types (subject and known-item) in the Registry?
  • What are the typical user search categories in
    the Registry? Can the FRBR set of 10 entities be
    used for user search categorization?
  • What are the quantitative characteristics of a
    typical user search query in the Registry?
  • How suitable is the GEM subject scheme for
    describing diverse collections in the Registry
    compared to alternative controlled vocabularies?
  • semantic similarity measures
  • user keywords extracted from transaction logs
  • subject terms in 3 different controlled
    vocabularies: GEM, Library of Congress Subject
    Headings (LCSH), and Art and Architecture
    Thesaurus (AAT).

10
  • Dataset
  • MS Access file (7 months, 19,000 records)
  • 936 user keyword search query strings
  • Minimal data processing
  • manual extraction of keyword query strings
  • aggregation of repetitive identical queries and
    morphological variants
  • no query parsing
  • stop-word list: prepositions, conjunctions, and
    articles
  • Methods
  • Transaction log analysis
  • qualitative (subject analysis, similarity
    measures)
  • quantitative (basic descriptive statistics).
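
The minimal processing listed above (aggregating identical queries, applying a stop-word list, no query parsing) can be sketched roughly as follows. This is only an illustration: the slides do not say how the stop-word list was applied or how morphological variants were merged, so the `STOP_WORDS` set, the function names, and the sample queries are all assumptions.

```python
from collections import Counter

# Hypothetical stop-word list: a few prepositions, conjunctions, and articles.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to"}

def normalize(query: str) -> str:
    """Lower-case a query string and drop stop words (no further parsing)."""
    words = query.lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)

def aggregate(queries):
    """Count how often each normalized query string occurs."""
    return Counter(normalize(q) for q in queries)

# Invented sample queries, not taken from the Registry logs.
sample = ["History of Illinois", "history illinois", "Maps", "the maps"]
counts = aggregate(sample)
```

Merging morphological variants (for example singular and plural forms) would need an extra step such as stemming, which is left out of this sketch.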

11
  • Methods: search categories
  • 7 FRBR entities
  • work (collection as a work): any intellectual or
    artistic creation that has a title attribute
  • (individual) person
  • corporate body
  • concept
  • object
  • event
  • place
  • FRANAR/FRAD entity
  • family
  • Additional categories
  • classes of persons (e.g., abused children,
    prisoners): 2% of queries
  • ethnic/national group (e.g., Irish Americans,
    Sioux Indian): 5% of queries
  • unknown search category (e.g., beyond, LU65)

12
  • Operational definition of search types
  • searches where the user queries either the title
    or the author (individual or corporate) of the
    digital collection belong to the collection-level
    known-item search type
  • all the other searches in the Registry belong to
    the collection-level subject search type.
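
The operational definition above can be expressed as a small rule. The sketch below is a simplification under stated assumptions: the `CollectionRecord` type, the sample record, and the exact case-insensitive matching are hypothetical, and the slides do not say whether the classification in the study was manual or automated.

```python
from dataclasses import dataclass

@dataclass
class CollectionRecord:
    """Hypothetical, stripped-down collection description."""
    title: str
    authors: tuple = ()  # individual or corporate names

def search_type(query: str, records) -> str:
    """Return 'known-item' if the query names a collection title or author,
    otherwise 'subject'."""
    q = query.strip().lower()
    for rec in records:
        names = [rec.title, *rec.authors]
        if any(q == name.lower() for name in names):
            return "known-item"
    return "subject"

# Invented record used only for illustration.
SAMPLE_RECORDS = [
    CollectionRecord("Prairie Photographs", ("Example Historical Society",)),
]
```

With this sample, `search_type("prairie photographs", SAMPLE_RECORDS)` returns `"known-item"`, while a query such as `"civil war letters"` falls through to `"subject"`.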

13
  • Methods: similarity measures
  • exact matches
  • synonymous matches (semantic variants)
  • near-exact matches
  • syntactic variants (e.g., French art and Art,
    French)
  • morphological variants (e.g., automated speech
    recognition and automatic speech recognition)
  • acronyms (e.g., WW1 and World War, 1914-1918)
  • NO broader-term or narrower-term matches
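
As a rough illustration of how the near-exact match types listed above differ, the sketch below labels a query/heading pair. It is not the procedure used in the study, which the slides describe as qualitative subject analysis. The `ACRONYMS` table and the shared-prefix heuristic for morphological variants are assumptions made for this example.

```python
import re

# Hypothetical acronym table; only the pair from the slide is included.
ACRONYMS = {"ww1": "World War, 1914-1918"}

def tokens(text: str):
    """Lower-case alphanumeric tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def match_kind(query: str, heading: str, acronyms=ACRONYMS):
    """Return the match type, or None if the pair does not match."""
    q, h = tokens(query), tokens(heading)
    if q == h:
        return "exact"
    if sorted(q) == sorted(h):
        return "syntactic variant"  # same words, different order
    expansion = acronyms.get(query.strip().lower())
    if expansion is not None and tokens(expansion) == h:
        return "acronym"
    # Crude heuristic: word-by-word, tokens are equal or share 5+ leading letters.
    if len(q) == len(h) and all(
        a == b or shared_prefix_len(a, b) >= 5 for a, b in zip(q, h)
    ):
        return "morphological variant"
    return None
```

Synonymous matches need human judgment or a thesaurus lookup and are deliberately not covered here. Broader and narrower terms return `None`, consistent with the rule that they do not count as matches.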

14
  • Findings: search categories
  • Object, concept, place, and individual person are
    heavily used
  • surprisingly low level of event searching (4%)

15
  • Polysemy and search intent ambiguity problem
  • Concept or Object?
  • books, tools
  • Amusement park, Ballrooms, Highways,
    interstates, detroithistoricalmuseums
  • Industrial models, Lessonplans,
    dissertations
  • Landscape
  • Work or Person or Object?
  • donquijote
  • TomSawyer
  • Event or Concept?
  • Civil rights movement
  • Census
  • Single or multiple categories/entities?
  • Lettersfrom19thcentury (object? object AND
    event?)
  • childrenthatareabused (class of persons?
    class of persons AND event?)
  • henryfordmuseumandgreenfieldvillage
    (corporate body? corporate body AND person AND
    place?)

16
  • Findings: search types
  • Prevalence of subject search
  • Higher than usual level of subject searching
  • general shift towards subject searching in Web
    2.0?
  • conceptual difference between collection-level
    and item-level search?

17
  • Findings: search query length

18
  • Findings: frequency of unique search query use
    (query popularity)
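
The charts for query length and query popularity are not preserved in this transcript. The basic descriptive statistics behind such findings could be computed along these lines. This is a sketch on invented sample queries; the function names are assumptions, and query length is measured here simply as the number of whitespace-separated words.

```python
from collections import Counter
from statistics import mean, median

def length_stats(queries):
    """Summarize query length in words."""
    lengths = [len(q.split()) for q in queries]
    return {"mean": mean(lengths), "median": median(lengths), "max": max(lengths)}

def popularity(queries):
    """List unique queries with their use counts, most frequent first."""
    return Counter(q.lower() for q in queries).most_common()

# Invented sample, not taken from the Registry logs.
sample = ["maps", "civil war", "Maps", "illinois state history"]
stats = length_stats(sample)
ranked = popularity(sample)
```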

19
  • Findings: user queries by search category

20
  • Findings: semantic similarity

21
  • Semantic similarity: findings at a glance
  • Weak semantic match between searches and GEM/AAT
    terms; strong match for LCSH
  • GEM represents only concepts; AAT represents only
    concepts and objects

22
  • Findings: semantic similarity overlap

23
  • Semantic similarity overlap: findings at a glance
  • LCSH on its own (without any overlap with AAT or
    GEM) covers 48% of the user search terms.
  • Only 12 terms (7%) matched in AAT were not also
    matched in LCSH.
  • All the terms matched in GEM were also matched in
    LCSH.
  • 27% of user search terms were not matched in any
    of the three controlled vocabularies.
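
Overlap figures of this kind follow from plain set operations on the terms matched in each vocabulary. The sketch below uses a tiny invented data set, so its numbers are illustrative only and do not reproduce the reported percentages. The function names are assumptions.

```python
def overlap_report(terms, lcsh, aat, gem):
    """Split user search terms by which vocabularies matched them."""
    return {
        "lcsh_only": lcsh - aat - gem,      # matched in LCSH and nowhere else
        "aat_not_lcsh": aat - lcsh,         # matched in AAT but not in LCSH
        "gem_not_lcsh": gem - lcsh,         # matched in GEM but not in LCSH
        "unmatched": terms - (lcsh | aat | gem),
    }

def share(subset, terms) -> float:
    """Percentage of all user search terms that fall in the subset."""
    return 100 * len(subset) / len(terms)

# Invented toy data: all terms, and the terms matched in each vocabulary.
terms = {"maps", "quilts", "railroads", "jazz", "lu65"}
lcsh = {"maps", "quilts", "railroads"}
aat = {"quilts", "jazz"}
gem = {"maps"}

report = overlap_report(terms, lcsh, aat, gem)
```

An empty `gem_not_lcsh` set corresponds to the finding that every term matched in GEM was also matched in LCSH.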

24
  • Semantic similarity map

25
  • Conclusions
  • Level of subject searching is unusually high
    compared with catalog use and transaction log
    analysis studies
  • Strong semantic match to user queries offered by
    a traditional library subject scheme, Library of
    Congress Subject Headings
  • Combination of two or more standardized
    controlled vocabularies may be beneficial for
    collection-level subject description in the IMLS
    DCC Registry
  • Based on user searches, we recommend updating the
    FRBR model to cover class of persons and
    ethnic/national group.

26
  • Further research
  • reasons for subject search prominence (interviews
    and observations of the Registry users)
  • user conceptualization of the collection-level
    search and its possible difference from the
    concept of the item-level search
  • investigate controlled vocabularies that are more
    flexible than LCSH and that, unlike GEM or AAT,
    represent a wide variety of search categories.

27
  • Acknowledgements
  • This research has been funded by IMLS NLG
    Research and Demonstration grant LG-02-02-0281
    http://imlsdcc.grainger.uiuc.edu/
  • Special thanks to
  • Timothy W. Cole, Principal Investigator
  • Carole L. Palmer, Co-Principal Investigator
  • Michael Twidale, Co-Principal Investigator
  • Amy Jackson, Sarah Shreeves, and Jenny Benevento,
    current and former Project Coordinators

28
  • Questions and comments always welcome
  • Oksana Zavalina, zavalina_at_uiuc.edu