Mixed content, mixed metadata: Information discovery in the NSDL


Slides: 23

Provided by: bobm181

Learn more at: http://www.cs.cornell.edu


Transcript and Presenter's Notes

1
Mixed content, mixed metadata: Information discovery in the NSDL
2
Experience from American Memory and NSDL
Caroline R. Arms and William Y. Arms, "Mixed content, mixed metadata: Information discovery in a messy world." In Metadata in Practice, editors Diane Hillmann and Elaine Westbrooks, ALA Editions (forthcoming).
3
The National Science Digital Library
The Integration Task is to provide a coherent set
of collections and services across great
diversity (all digital collections relevant to
science education).
http://nsdl.org/
4
Mixed Content
Examples: NSDL-funded collections at Cornell
Atlas. Data sets of earthquakes, volcanoes, etc.
Reuleaux. Digitized kinematics models from the nineteenth century.
Laboratory of Ornithology. Sound recordings, images, videos of birds and other animals.
Nuprl. Logic-based tools to support programming and to implement formal computational mathematics.
5
Effective Information Discovery Before Digital
Information

Searching
(a) Resources separated into categories of related materials. Each category organized, indexed and searched separately.
(b) Catalogs and indexes built on tightly controlled metadata standards, e.g., MARC, MeSH headings, etc.
(c) Search engines used Boolean operators and fielded searching.
(d) Query languages and search interfaces assumed a trained user.
(e) Resources were physical items.

6
Effective Information Discovery With Homogeneous Digital Information
Comprehensive metadata with Boolean retrieval: Can be excellent for well-understood categories of material, but requires standardized metadata and relatively homogeneous content (e.g., MARC catalog).
Full text indexing with ranked retrieval: Can be excellent, but methods were developed and validated for relatively homogeneous textual material (e.g., TREC ad hoc track).
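The contrast between the two approaches on this slide can be illustrated with a minimal sketch. It is not the NSDL implementation: the toy corpus, the function names, and the plain TF-IDF weighting are all assumptions chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the texts are illustrative only.
DOCS = {
    "d1": "earthquake data sets and volcano data",
    "d2": "bird sound recordings and bird images",
    "d3": "volcano images",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop words)."""
    return text.lower().split()

def boolean_and(query, docs):
    """Boolean retrieval: ids of documents containing every query term, unranked."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def ranked(query, docs):
    """Ranked retrieval: (id, score) pairs ordered by a simple TF-IDF score."""
    n = len(docs)
    tokens = {d: Counter(tokenize(text)) for d, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter(term for counts in tokens.values() for term in counts)
    scores = {}
    for d, counts in tokens.items():
        score = sum(counts[t] * math.log(n / df[t])
                    for t in tokenize(query) if t in counts)
        if score > 0:
            scores[d] = score
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For the query "volcano images", the Boolean search returns only the document that contains both terms, while the ranked search also returns partial matches below it.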
7
Mixed Metadata: the Chimera of Standardization

Technical reasons
Characteristics of formats and genres
Differing user needs
Social and cultural reasons
Economic factors
Installed base

8
Cross-Domain Metadata
Dublin Core
"... indexes such as Lycos are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross-disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval." (Weibel, 1995)
9
Information Discovery in a Messy World
Web search engines have adapted to a very large scale. Other techniques, such as cross-domain metadata and federated searching, have failed to scale up.
What new concepts and techniques have enabled this adaptation?
What can we learn that is applicable to other information discovery tasks?
How is NSDL making use of this understanding?
10
Information Discovery in a Messy World
Building blocks
Brute force computation
The expertise of users -- human in the loop
Methods
(a) Better understanding of how and why users seek for information
(b) Relationships and context information
(c) Multi-modal information discovery
(d) User interfaces for exploring information
11
Understanding How and Why Users Seek for
Information
Homogeneous content
All documents are assumed equal
Criterion is relevance (binary measure)
Goal is to find all relevant documents (high recall)
Hits ranked in order of similarity to query
Mixed content
Some documents are more important than others
Goal is to find most useful documents on a topic and then browse
Hits ranked in an order that combines importance and similarity to query
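One way to picture a ranking "that combines importance and similarity" is a weighted sum of the two scores. The slide does not say how the combination is done, so the linear formula, the `weight` parameter, and the function name below are assumptions made for illustration.

```python
def combined_rank(hits, weight=0.5):
    """Order hits by a weighted mix of query similarity and importance.

    hits: list of (doc_id, similarity, importance) tuples, both scores
    assumed to be normalized to the range [0, 1].
    weight: share given to similarity; the rest goes to importance.
    """
    scored = [(doc_id, weight * sim + (1 - weight) * imp)
              for doc_id, sim, imp in hits]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `weight=1.0` this reduces to the homogeneous-content case, where hits are ordered by similarity alone.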
12
Relationship and Contextual Information
Methods for capturing context
Analysis of citations and links (e.g., PageRank)
Mining usage logs (e.g., customers who buy the same product)
Reviews (e.g., reputation management)
Structural relationships (e.g., domain names)
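As an illustration of link analysis, the following is a simplified PageRank computed by power iteration. It is a textbook-style sketch, not the production algorithm of any search engine; the damping factor of 0.85, the fixed iteration count, and the treatment of pages without outgoing links are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In a small graph where pages "a" and "b" both link to "c" and "c" links back to "a", page "c" ends up with the highest rank and "b", which nothing links to, with the lowest.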
13
Multi-Modal Information Discovery
With mixed content and mixed metadata, the amount of information about the various resources varies greatly, but clues from many different sources can be combined.
"The fundamental premise of the research was that the integration of these technologies, all of which are imperfect and incomplete, would overcome the limitations of each, and improve the overall performance in the information retrieval task." (Wactlar, 2000)
14
User Interfaces for Exploring Information
[Figure labels: Search index; Return hits; Browse content; Return objects]
15
NSDL: The Spectrum of Interoperability
Level        Agreements                                   Example
Federation   Strict use of standards (syntax,             AACR, MARC, Z39.50
             semantic, and business)
Harvesting   Digital libraries expose metadata;           Open Archives metadata
             simple protocol and registry                 harvesting
Gathering    Digital libraries do not cooperate;          Web crawlers and
             services must seek out information           search engines
16
The NSDL Repository
The repository is a resource for service providers. It holds information about every collection and item known to the NSDL, including contextual information.
[Figure labels: Services; NSDL Repository; Users; Collections]
17
NSDL Search Service: First Phase
[Figure labels: NSDL Repository; harvest; Portal (three instances); SDLIP; Search and Discovery Service; crawl; Inquery -> Lucene; Collections]
18
NSDL Search Service: First Phase

Approach
Collections map their metadata to Dublin Core and provide it via the Open Archives protocol.
Search service augments Dublin Core metadata with indexing of full text where available.
User interface returns snippets derived from the metadata, with links to the full content and to the metadata.
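The harvesting step can be sketched as follows, assuming the Open Archives protocol here means OAI-PMH 2.0 with its standard `oai_dc` metadata format. The base URL, the sample record, and the function names are hypothetical; only the namespaces and the `ListRecords` and `metadataPrefix` request parameters come from the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url):
    """Build a ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Extract the identifier and Dublin Core fields of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:
            for element in dc:
                # Strip the "{namespace}" prefix from the tag name.
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records

# Hypothetical response from a hypothetical collection.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Kinematic model</dc:title>
          <dc:subject>kinematics</dc:subject>
          <dc:subject>mechanisms</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Dublin Core elements are repeatable, which is why each field is stored as a list. A real harvester would also need to fetch the URL and follow resumption tokens, both omitted here.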

19
NSDL Search Service: First Phase

Weaknesses
(a) Ranking by similarity to query not sufficient.
(b) Snippets do not indicate why item was returned (e.g., terms in full text but not in metadata).
(c) Dublin Core records provide limited information.
(d) Browsing environment limited.
(e) Most users begin their search with a Web search engine (e.g., Google)

20
NSDL Search Service: Second Phase Developments

Metadata
Accept any metadata that is available in a range
of formats
System for reviews and annotations, with
reputation management
Search system
Multimodal retrieval and ranking
Dynamic generation of snippets by search engine
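Dynamic snippet generation could, for instance, show the passage in which a query term actually occurs, so that the user can see why an item was returned. The sketch below is one possible approach under simple assumptions: records are plain dictionaries of field name to text, matching is by case-insensitive substring, and the function name and `width` parameter are hypothetical.

```python
def make_snippet(record, query, width=60):
    """Return (field, excerpt) for the first field containing a query term.

    record: dict mapping a field name (e.g. "title", "full_text") to text.
    Fields are checked in the dict's order; returns None if nothing matches.
    Matching is a case-insensitive substring test, not whole-word matching.
    """
    terms = [t.lower() for t in query.split()]
    for field, text in record.items():
        lowered = text.lower()
        for term in terms:
            pos = lowered.find(term)
            if pos != -1:
                # Keep roughly width/2 characters on each side of the match.
                start = max(0, pos - width // 2)
                end = min(len(text), pos + len(term) + width // 2)
                prefix = "..." if start > 0 else ""
                suffix = "..." if end < len(text) else ""
                return field, prefix + text[start:end] + suffix
    return None
```

Returning the field name alongside the excerpt lets the interface say that a term was found in the full text rather than in the metadata.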

21
NSDL Search Service: Second Phase Developments (cont.)

Usability and human factors
Wider range of browsing tools (e.g., collection
visualization)
Filters by education level and education quality,
where known
Web compatibility
Expose records for Web crawlers to index
Browser bookmarklet to add NSDL information to
Web pages

22
Mixed content, mixed metadata: Information discovery in the NSDL
