Title: Is Metasearching Really Better Searching
1 Is Metasearching Really Better Searching?
STM Innovations Seminar, London, Friday 2 December 2005
Pete Johnston, Research Officer, UKOLN, University of Bath
UKOLN is supported by
www.bath.ac.uk
2 Is Metasearching Better Searching?
- What is metasearch?
- Making metasearch work
- The NISO Metasearch Initiative
- Metasearch today
- Metasearch and Google
- Metasearch and "social bookmarking"
3 What is metasearch?
4 What is metasearch?
Metasearch, parallel search, federated search,
broadcast search, cross-database search, search
portal are a familiar part of the information
community's vocabulary. They speak to the need
for search and retrieval to span multiple
databases, sources, platforms, protocols, and
vendors at one time.
NISO MetaSearch Initiative, http://www.niso.org/committees/MS_initiative.html
5 The search problem
- User wants to find, access, and use items made
available by multiple content providers
- Content providers make their collections
available through their own separate
presentation services
- User interacts with multiple services in
succession, e.g.
- Query Resource Discovery Network (RDN) for Web
resources
- Query Zetoc for journal articles
- etc
6 The search problem
7 The search problem
- User has to
- Discover different services
- Manage different authentication/access
requirements
- Use different user interfaces for search
- Interpret different result sets
- different metadata
- Manipulate different result sets
- human-readable (HTML)
- but difficult to merge, reuse
- May still not have access to (appropriate copy
of) resource
8 The metasearch solution
- The provision of "metasearch" services that
- enable user to search across the metadata
databases of multiple content providers from a
single interface
- manage multiple result sets and present to user
- manage authentication/access
- (etc!)
- Seamless (to the user) discovery of and access to
heterogeneous, distributed resources!
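As a rough illustration of "manage multiple result sets", the sketch below maps records from two providers onto common field names and drops duplicates. The providers, field names, and field maps are all hypothetical, and real services would need far more careful matching than a title/creator comparison.

```python
# Hypothetical result sets from two providers that use different field names.
provider_a = [{"title": "Metasearch for Libraries", "author": "A. Author"}]
provider_b = [
    {"dc_title": "metasearch for libraries", "dc_creator": "A. Author"},
    {"dc_title": "Open access journals", "dc_creator": "B. Writer"},
]

# Assumed mappings from common field name to each provider's native name.
FIELD_MAP_A = {"title": "title", "creator": "author"}
FIELD_MAP_B = {"title": "dc_title", "creator": "dc_creator"}


def normalise(record, field_map):
    """Rewrite one provider record using the common field names."""
    return {common: record[native].strip() for common, native in field_map.items()}


def merge(result_sets):
    """Merge (records, field_map) pairs, keeping the first copy of each item."""
    merged, seen = [], set()
    for records, field_map in result_sets:
        for record in records:
            item = normalise(record, field_map)
            # Case-insensitive title/creator pair used as a naive duplicate key.
            key = (item["title"].casefold(), item["creator"].casefold())
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


merged = merge([(provider_a, FIELD_MAP_A), (provider_b, FIELD_MAP_B)])
for item in merged:
    print(item["title"], "/", item["creator"])
```

The duplicate key here is deliberately simple; in practice, differences in metadata quality across providers make this step much harder.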
9 Approaches to metasearch (1): cross-searching
- Metasearch service accepts user query
- Sends query to multiple content provider search
targets
- Receives responses from targets
- Presents result sets to user
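The four steps above can be sketched as a parallel fan-out. This is a minimal illustration only: the two target functions are hypothetical stand-ins for remote search interfaces, and a real service would also need timeouts and error handling per target.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical search targets, each standing in for one content provider.
def search_catalogue(query):
    titles = ["Metasearch in libraries", "Cataloguing rules"]
    return [(t, "catalogue") for t in titles if query.lower() in t.lower()]


def search_articles(query):
    titles = ["Federated metasearch evaluated", "Open access journals"]
    return [(t, "articles") for t in titles if query.lower() in t.lower()]


def cross_search(query, targets):
    """Send the same query to every target in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # pool.map returns result sets in the same order as the targets.
        result_sets = list(pool.map(lambda target: target(query), targets))
    # Flatten the per-target result sets into one list for presentation.
    return [record for result_set in result_sets for record in result_set]


results = cross_search("metasearch", [search_catalogue, search_articles])
for title, source in results:
    print(f"{source}: {title}")
```

Because the overall response waits for the slowest target, one slow provider can hold up the whole result page.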
10 Z39.50, SRW, SRU, etc
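For a sense of what an SRU request looks like, the sketch below builds a searchRetrieve URL with a CQL query. The base URL is a made-up example, and the choice of SRU version 1.1 and of the `dc.title` index is an assumption for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request URL (HTTP GET, parameters in the URI)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # query expressed in CQL
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


# "http://sru.example.org/catalogue" is a hypothetical SRU endpoint.
url = sru_search_url("http://sru.example.org/catalogue", 'dc.title = "metasearch"')
print(url)

# Decoding the URL again shows the parameters the target would receive.
print(parse_qs(urlparse(url).query))
```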
11 Approaches to metasearch (2): harvesting
- Metasearch service periodically gathers metadata
records from content provider repositories into
local database
- Metasearch service accepts user query
- Executes query on local database
- Presents result sets to user
- Some harvesting services may also harvest/index
copy of resource
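The harvesting model can be sketched with an in-memory SQLite database standing in for the local store. The harvested records and the table layout are hypothetical; the point is only that the user's query runs against local data rather than against the providers.

```python
import sqlite3

# Hypothetical records gathered in an earlier harvest: (identifier, title, provider).
harvested = [
    ("oai:example.org:1", "Metasearch for libraries", "Provider A"),
    ("oai:example.net:7", "Harvesting metadata with OAI-PMH", "Provider B"),
    ("oai:example.net:9", "Evaluating metasearch portals", "Provider B"),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE records (identifier TEXT PRIMARY KEY, title TEXT, provider TEXT)"
)
# INSERT OR REPLACE lets a later harvest update records already held locally.
db.executemany("INSERT OR REPLACE INTO records VALUES (?, ?, ?)", harvested)


def local_search(term):
    """Execute the user query against the local database only."""
    rows = db.execute(
        "SELECT title, provider FROM records WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return rows.fetchall()


hits = local_search("metasearch")
for title, provider in hits:
    print(f"{title} ({provider})")
```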
12 OAI-PMH
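As an illustration of OAI-PMH harvesting, the sketch below builds a ListRecords request and reads titles and the resumption token from a response. The repository URL and the sample response are invented for the example; a real harvester would fetch each page over HTTP and loop until no resumption token is returned.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


# Invented sample response, trimmed to the elements used below.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metasearch for libraries</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token


records, token = parse_list_records(SAMPLE)
print(records, token)
# "http://repo.example.org/oai" is a hypothetical repository base URL.
print(list_records_url("http://repo.example.org/oai", token))
```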
13 Cross-searching and harvesting
- Metasearch service may use both in combination!
- Cross-search
- Latest results returned
- Content provider controls searches available
- May slow overall performance
- Harvesting
- Better performance for user query
- Options for normalisation etc by harvester
- Only as up-to-date as last harvest
14 A hospitable climate for metasearch?
- Metasearch service depends on access to metadata
- Web Services
- Standards for providing machine interfaces to
applications on Web
- Based on HTTP and XML
- SOAP (messaging protocol), WSDL (service
description), WS-* (!!)
- WS not just for search!
- Service-oriented approaches, modular applications
- Google and Amazon provide Web Services
- "Web 2.0"
- "The Web as platform"
- Recombining data and services from multiple
sources
15 The problems with metasearch
- User requires/expects resources from increasing
range of content providers
- What if content provider doesn't implement
standard search/harvest interface?
- Some proprietary APIs, "XML Gateways"
- Scalability
- Some "screen-scraping"
- Parsing of HTML pages to obtain metadata
- Rights issues
- Scalability, volatility
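To make "screen-scraping" concrete, the sketch below pulls metadata out of an HTML page with Python's standard `html.parser`. The page is invented and conveniently carries `<meta>` tags; scrapers that depend on a provider's visible page layout are more fragile still, since any redesign can break them.

```python
from html.parser import HTMLParser


class MetaTagScraper(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]


# Invented page standing in for a content provider's HTML record display.
PAGE = """<html><head>
<meta name="DC.title" content="Metasearch for libraries">
<meta name="DC.creator" content="A. Author">
</head><body><h1>Metasearch for libraries</h1></body></html>"""

scraper = MetaTagScraper()
scraper.feed(PAGE)
print(scraper.metadata)
```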
16 The problems with metasearch
- Metasearch services work, but...
- For service provider
- complex, laborious
- fragile, susceptible to change by content
provider
- duplication of effort by service providers
- For content provider
- concerns over efficiency
- concerns over access management
- rights, branding, results presentation/ranking
17 Making metasearch work
18 Making metasearch work
- Effective metasearch requires agreements between
content providers and service providers
- Transport protocol(s)
- Query language(s)
- syntax and semantics
- Metadata schemas
- syntax and semantics
- Metadata quality
- presence of values, formats of literals etc
- Intellectual property rights issues
- how metadata records and resources are presented,
used
- Authorisation / authentication
- Disclosure / discovery of collections and services
Andy Powell, "Metasearching: an overview",
Presentation to BCS EPSG Seminar, July 2004
19 The NISO Metasearch Initiative
- Response to concerns of librarians, systems
vendors, content providers
- Aims to enable
- metasearch service providers to offer more
effective and responsive services
- content providers to deliver enhanced content and
protect their intellectual property
- libraries to deliver services that distinguish
their services from Google and other free web
services
NISO MetaSearch Initiative, http://www.niso.org/committees/MS_initiative.html
20 Task Group 1: Access Management
- Conducted survey of authentication methods in use
- Developed use cases for authentication in
metasearch context
- Ranked methods by ability to satisfy needs of use
cases
- Recommends either
- IP-Authentication with a Proxy Server, or
- Username/Password authentication
- Liaison with Shibboleth community
21 Task Group 2: Collection Description
- Metasearch service needs information about
targets available for search/harvest
- Discover collections of potential interest
- Obtain sufficient information to identify a
collection
- Select one or more collections from amongst a
number of discovered collections
- Discover the services that provide access to the
collection
- Select a service with which to interact
- Interact with service
Collection description
Service description
23 Task Group 2: Collection Description
- Collection Description Specification
- Metadata schema for collection-level description
- Closely aligned with DCMI Collection Description
Application Profile
- Title, Subject, Size, Language, Item Type, Owner,
Collector, Audience, Rights etc
- Whole/Part relationships
- Collection/Catalogue relationships
- Collection/Service relationships
24 Task Group 2: Collection Description
- Information Retrieval Service Description
Specification
- Describe those digital services that provide
access to collections
- ZeeRex
- Indicates protocol used
- Describes access point(s) for service
- Describes authentication/authorization
requirements
- Lists operations/queries supported
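One way to picture a service description is as a small record that a metasearch service can filter on. The sketch below is illustrative only: the class, its field names, and the sample services are hypothetical and are not taken from the NISO specification or from ZeeRex.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Illustrative record; field names are hypothetical, not the NISO schema."""
    collection: str
    protocol: str
    access_point: str
    authentication: str
    operations: list = field(default_factory=list)


# Invented descriptions of services giving access to two collections.
services = [
    ServiceDescription("Journal articles", "SRU",
                       "http://sru.example.org/articles", "ip",
                       ["searchRetrieve", "explain"]),
    ServiceDescription("Journal articles", "OAI-PMH",
                       "http://oai.example.org/articles", "none",
                       ["ListRecords"]),
    ServiceDescription("Web resources", "Z39.50",
                       "z3950.example.org:210/web", "password",
                       ["search"]),
]


def select_services(descriptions, collection, supported_protocols):
    """Pick the services for a collection that this client knows how to talk to."""
    return [
        d for d in descriptions
        if d.collection == collection and d.protocol in supported_protocols
    ]


chosen = select_services(services, "Journal articles", {"SRU", "OAI-PMH"})
for service in chosen:
    print(service.protocol, service.access_point)
```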
25 Task Group 3: Search/Retrieve
- Result Set Metadata
- Metadata schema to describe result set and record
within result set
- To support ranking, branding etc
- Citation Metadata
- Metadata schema for citation components (based on
subset of OpenURL)
26 Task Group 3: Search/Retrieve
- NISO XML Gateway
- Based on SRU ("non-conformant subset")
- Query encoded in URI, transmitted in HTTP GET,
response as XML document
- Three levels of implementation
- Level 0: Any query grammar
- Level 1: Provide description record for database
- Level 3: Support CQL
- Liaison with A9 OpenSearch
27 Metasearch today
28 Metasearch and Google
- Google
- Harvests full-text of Web pages by following
links
- Makes indexes available for search
- Result ranking based on number of links to page
- Index coverage limited to "visible Web"
- Problems with
- Authentication controls
- Non-persistent URIs
- Non-textual resources
- Even if indexed, low ranking if few links
- No fielded searching
29 Metasearch and Google
- "Success is as much about what you don't search
as what you do"
- Selection is important
- Relevance of results not determined only by
links, citations
- e.g. often useful/vital to select/filter by
audience, purpose of resource
Roy Tennant, "Is Metasearch Dead?", http://www.niso.org/news/events_workshops/OpenURL-05-Agen-FINAL.html
30 Metasearch and Google
- Google interest in indexing "hidden Web"
- Collaborations with repository providers, OCLC
etc
- Google Scholar
- Google interest in metadata-based approach?
- Google Base
- Google and Metasearch as complementary approaches
to discovery
31 Metasearch and "Social bookmarking"
del.icio.us, http://del.icio.us/
32 Metasearch and "Social bookmarking"
Connotea, http://www.connotea.org/
33 Metasearch and "Social Bookmarking"
- Simple user-generated metadata
- Typically description plus "tags"
- Capture user perceptions of resources
- Some services adding richer metadata
- Social merging of personal collections
- Bookmarking services as discovery services
- Connotea as "community-driven recommendation
system" (Lund et al)
- Metadata available via RSS or simple API
- Can metasearch services use/integrate metadata
from bookmarking services?
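As one possible answer to the question above, the sketch below reads titles, links, and tags from a bookmarking feed. The feed is an invented RSS 2.0-style document, with tags assumed to appear as `category` elements; actual services may publish a different RSS flavour or structure.

```python
import xml.etree.ElementTree as ET

# Invented feed standing in for a bookmarking service's RSS output.
FEED = """<rss version="2.0"><channel>
  <title>Bookmarks tagged metasearch</title>
  <item>
    <title>An overview of metasearch</title>
    <link>http://www.example.org/metasearch-overview</link>
    <category>metasearch</category>
    <category>standards</category>
  </item>
  <item>
    <title>An introduction to SRU</title>
    <link>http://www.example.org/sru-intro</link>
    <category>sru</category>
  </item>
</channel></rss>"""


def bookmarks_from_rss(xml_text):
    """Turn each feed item into a simple record with title, link, and tags."""
    channel = ET.fromstring(xml_text).find("channel")
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "tags": [category.text for category in item.findall("category")],
        }
        for item in channel.findall("item")
    ]


bookmarks = bookmarks_from_rss(FEED)
for bookmark in bookmarks:
    print(bookmark["title"], bookmark["tags"])
```

Records in this shape could then be indexed alongside harvested metadata, with the user-supplied tags treated as an additional subject field.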
34 Is Metasearching Better Searching?
- Technical components for metasearch available
- User expectations of coverage mean metasearch is
a cross-domain problem
- However, quality of metasearch dependent on
- metadata quality
- metadata consistency
- across multiple providers
- Metasearch can complement other approaches
- Metasearch as "enabler"
- supporting construction of many different services