Title: A Web Services Approach for Search and Retrieve: The Next Generation Z39.50
1A Web Services Approach for Search and Retrieve
The Next Generation Z39.50
Access 2004, October 13-16, 2004, Halifax, Nova
Scotia
- William E. Moen <wemoen_at_unt.edu>
- School of Library and Information Sciences
- Texas Center for Digital Knowledge
- University of North Texas
- Denton, TX 72603
2Overview
- Quick description of SRW
- Brief background: historical, political, conceptual
- Non-technical (almost) introduction to SRW
- Common Query Language (CQL), briefly
- Concluding thoughts
3What is SRW?
- Search and Retrieve Web Service (SRW)
- An XML-based protocol for searching, retrieving, and other information retrieval transactions
- Cast in the standards/technologies for web services
- XML
- SOAP
- HTTP
- Brings the concepts and experience of Z39.50 into
the web environment using web technologies
4Why SRW?
- Genesis: several years of soul searching by Z39.50 developers and implementors
- The web had become the common implementation environment
- Z39.50 was not perceived as web friendly
- Pivotal moments
- December 2000 ZIG meeting
- July 2001 meeting
5Turning point: December 2000
- Z39.50 Future discussion
- Perceptions of Z39.50
- broken
- heavy-weight
- difficult and complex
- old technology
- not web friendly
- Several options presented
- Rewrite the protocol from the ground up
- Rewrite as an XML protocol
- Separate the Z39.50 protocol from its use of BER as a wire protocol
- Simplify the protocol specifications to focus on core features
- Recognition of the intellectual contribution of Z39.50
6Taking action: June 2001
- Invitational meeting to discuss moving Z39.50 to an XML-based protocol
- Goal
- Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful.
- Objective
- Define specifications for a new web service definition based on Z39.50 together with web technologies
- Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
- Initially called Z39.50 Next Generation (ZNG)
- Intended as proof-of-concept
- Defining only those protocol specifications that
would actually be implemented by participants
7ZING: Z39.50 International Next Generation
- Make intellectual/semantic content of Z39.50 more broadly available
- Make Z39.50 more attractive by lowering barriers to implementation
- Use of XML to represent and encode data
- Use of HTTP for transport
- Use of SOAP for interaction between client and server
- Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
FOR MORE INFORMATION, VISIT THE ZING WEBSITE:
http://www.loc.gov/z3950/agency/zing/
8SRW/U, SRW, SRU
- SRW/U: Search and Retrieve for the Web
- General designation for this initiative
- SRW: Search and Retrieve Web Service
- XML messages
- Simple Object Access Protocol (SOAP)
- HTTP POST
- SRU: Search and Retrieve URL Service
- Request parameters included in URL syntax
- HTTP GET
- Development
- Version 1.0: November 2001
- Version 1.1: February 2004
FOR MORE INFORMATION, VISIT THE SRW WEBSITE:
http://www.loc.gov/srw
9Networked information retrieval
- What's needed
- Identifying a target to search
- A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
- Methods to encode the requests and responses from the target
- Methods to transport the requests and responses across a network
- In other words, a protocol and supporting specifications
10Abstract Model of IR
11Abstract model of Z39.50
12Z39.50 classic SRW
13SRW Overview
- Builds on Z39.50 concepts and web technologies
- Web technologies: XML, SOAP, HTTP
- Uses new, human-readable query language
- Combines several Z39.50 features into several operation types
- searchRetrieve operation
- scan operation
- explain operation
14searchRetrieve operation
- The core of the protocol
- Expresses the search and additional criteria
- Records are returned in XML
- Request parameters
- version
- query
- Optional parameters
- sortKeys
- recordPacking
- recordSchema
- recordXPath
- stylesheet
- Response parameters
- version
- numberOfRecords
- Optional parameters
- resultSetId
- resultSetIdleTime
- records
- diagnostics
15SRW XML
- XML as foundation for protocol
- Provides syntax for intelligent markup
- Defines or references XML schemas
- Example XML schema for SRW specifications
- searchRetrieveRequest
- searchRetrieveResponse
16searchRetrieveRequest example
- XML document is sent to the server
- Using SOAP to wrap the request
- Sent as an HTTP POST
<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title all "Squirrel Hungry"</query>
  <maximumRecords>1</maximumRecords>
  <startRecord>1</startRecord>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
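The request above, wrapped in SOAP, could be assembled along the following lines. This is a minimal sketch, not a reference implementation: the function name build_search_retrieve_request is hypothetical, the SRW namespace URI is an assumption that should be checked against the SRW schema, and the sketch only builds the XML text (it does not send the HTTP POST).

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed namespace for SRW messages; verify against the SRW schema.
SRW_NS = "http://www.loc.gov/zing/srw/"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("srw", SRW_NS)


def build_search_retrieve_request(query, start_record=1, maximum_records=1,
                                  record_schema="dc", version="1.1"):
    """Return a SOAP envelope (as text) holding a searchRetrieveRequest."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Same elements, in the same order, as the slide example.
    fields = (
        ("version", version),
        ("query", query),
        ("maximumRecords", maximum_records),
        ("startRecord", start_record),
        ("recordSchema", record_schema),
    )
    for name, value in fields:
        child = ET.SubElement(request, f"{{{SRW_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_search_retrieve_request('dc.title all "Squirrel Hungry"'))
```

The resulting text would be the body of the HTTP POST sent to the server.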
17searchRetrieveResponse
- Records returned in response
- All records in XML syntax
- According to one or more XML schemas (semantics)
- Dublin Core
- Onix
- MODS
- MarcXML
18searchRetrieveResponse example
<searchRetrieveResponse>
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
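A client might read such a response roughly as follows. This is a sketch under stated assumptions: the sample keeps the simplified, un-namespaced response elements shown on the slide, adds a Dublin Core namespace declaration so the dc: prefix parses, and omits the SOAP wrapper a real SRW server would send. The function name summarize_response is hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, declared here so the dc: prefix resolves.
DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE_RESPONSE = """\
<searchRetrieveResponse xmlns:dc="http://purl.org/dc/elements/1.1/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return (total hit count, titles of the records actually returned)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("numberOfRecords"))
    titles = [title.text for title in root.iter(f"{{{DC_NS}}}title")]
    return total, titles


if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```

Note that numberOfRecords reports the size of the whole result set, while the records element carries only the records requested (here, one).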
19searchRetrieve example
<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title = computer</query>
  <startRecord>1</startRecord>
  <maximumRecords>10</maximumRecords>
  <recordPacking>xml</recordPacking>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
- Retrieval results
- XML view
- Screen shot
20SRW results
21SRU briefly
- Protocol requests can be carried via HTTP GET
- searchRetrieveRequest parameters expressed in standard URL syntax
- baseURL and search part separated by a question mark (?)
- Response is an XML document containing records
- The searchRetrieveRequest in SRU
- http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
- Eric Lease Morgan's Journal Locator
- Use of extra data parameters
- Allows implementers to add functionality
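Because an SRU request is just a base URL plus a query string, it can be composed with standard URL tooling. The sketch below is illustrative only: build_sru_url and read_sru_params are hypothetical helper names, and the base URL is the one quoted on the slide, which may no longer be live. The standard library percent-encodes the "=" inside the CQL query as well as the quotes, which is equivalent after decoding.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, query, version="1.1", **extra):
    """Join the base URL and the searchRetrieve parameters with '?'."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    # Optional parameters such as recordSchema, startRecord, maximumRecords.
    params.update(extra)
    return base_url + "?" + urlencode(params)


def read_sru_params(url):
    """Decode the search part of an SRU URL back into a plain dictionary."""
    decoded = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in decoded.items()}


if __name__ == "__main__":
    url = build_sru_url(
        "http://alcme.oclc.org/srw/search/SOAR",
        'dc.title="computer"',
        recordSchema="DC",
        startRecord=1,
        maximumRecords=10,
        recordPacking="xml",
    )
    print(url)
    print(read_sru_params(url))
```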
22search/Retrieve query
- An SRW query consists of one or more query statements linked by Boolean operators
- Five categories of query statements
- single search clause
- two or more search clauses linked by Boolean
- search clauses and result sets linked by Boolean
- two or more result sets linked by Boolean
- single result set
- Expressed in the Common Query Language (CQL)
23Common Query Language (CQL)
- A formal language for representing queries to information retrieval systems
- Simple: free text
- Complex: Boolean, proximity
- Human-readable
- Search clause
- Always includes a term
- simple terms consist of one or more words
- May include index name
- To limit search to a particular field/element
- Index name includes base name and may include prefix
- title, subject
- dc.title, dc.subject
- Several index sets have been defined
- dc
- bath
- cql
- Context sets in SRW define the available indexes for a particular application and additional query specifications (e.g., relation operators)
- Legend of the Five Rings Database
24Other components of CQL
- Relation
- <, >, <=, >=, =, <>
- exact: used for string matching
- all: when the term is a list of words, indicates that all words must be found
- any: when the term is a list of words, indicates that any of the words must be found
- Boolean operators: and, or, not
- Proximity (prox operator)
- relation (<, >, <=, >=, =, <>)
- distance (integer)
- unit (word, sentence, paragraph, element)
- ordering (ordered or unordered)
- Masking rules and special characters
- single asterisk (*) to mask zero or more characters
- single question mark (?) to mask a single character
- caret/hat (^) to indicate anchoring, left or right
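One way to picture the masking rules is to translate a masked term into a regular expression. The sketch below is an illustration, not how any particular SRW server works: the function name cql_term_to_regex is hypothetical, masks are assumed to match word characters only, unanchored terms are assumed to match a whole word anywhere in the field, and backslash escaping is not handled.

```python
import re


def cql_term_to_regex(term):
    """Translate a masked CQL term into a compiled regular expression."""
    # A leading or trailing caret anchors the term to the start or end of the field.
    anchor_left = len(term) > 1 and term.startswith("^")
    if anchor_left:
        term = term[1:]
    anchor_right = len(term) > 1 and term.endswith("^")
    if anchor_right:
        term = term[:-1]

    pieces = []
    for ch in term:
        if ch == "*":
            pieces.append(r"\w*")  # zero or more characters
        elif ch == "?":
            pieces.append(r"\w")   # exactly one character
        else:
            pieces.append(re.escape(ch))

    # Unanchored sides fall back to a word boundary (an assumption of this sketch).
    prefix = "^" if anchor_left else r"\b"
    suffix = "$" if anchor_right else r"\b"
    return re.compile(prefix + "".join(pieces) + suffix, re.IGNORECASE)


if __name__ == "__main__":
    field = "the complete dinosaur"
    for term in ("dino*", "dinosaur^", "^complete"):
        print(term, bool(cql_term_to_regex(term).search(field)))
```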
25CQL examples
- Simple queries
- dinosaur
- "the complete dinosaur"
- Boolean
- dinosaur and bird or dinobird
- "feathered dinosaur" and (yixian or jehol)
- Proximity
- foo prox bar
- foo prox/>/4/word/ordered bar
- Indexes
- title = dinosaur
- bath.title = "the complete dinosaur"
- srw.serverChoice = dinosaur
- Relations
- year > 1998
- title all "complete dinosaur"
- title any "dinosaur bird reptile"
- title exact "the complete dinosaur"
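Queries like the ones above are plain strings, so a client can compose them with very little code. The following is a minimal sketch, assuming hypothetical helper names (search_clause, combine) and a simple quoting rule: terms containing whitespace or CQL punctuation are wrapped in double quotes. It does not validate index names or relations against any context set.

```python
BOOLEANS = {"and", "or", "not", "prox"}
# Characters that force a term to be quoted in this sketch.
SPECIAL = set(' \t()=<>/"')


def search_clause(term, index=None, relation="="):
    """Build a single search clause: [index relation] term."""
    if term == "" or any(ch in SPECIAL for ch in term):
        term = '"' + term.replace('"', '\\"') + '"'
    if index is None:
        return term
    return f"{index} {relation} {term}"


def combine(operator, left, right):
    """Link two clauses (or sub-queries) with a Boolean operator."""
    if operator not in BOOLEANS:
        raise ValueError(f"unknown Boolean operator: {operator}")
    return f"({left} {operator} {right})"


if __name__ == "__main__":
    print(search_clause("dinosaur"))
    print(search_clause("the complete dinosaur", "title", "exact"))
    print(search_clause("1998", "year", ">"))
    print(combine("and", search_clause("feathered dinosaur"),
                  combine("or", "yixian", "jehol")))
```

Parenthesizing every combined query sidesteps operator-precedence questions such as the one raised by "dinosaur and bird or dinobird".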
26SRW classic Z39.50
- SRW
- No explicit concept of connection, session, or state
- Result sets named by server
- Single record syntax (XML), multiple schemas
- String (i.e., human-readable) queries: CQL
- Named indexes
- Classic Z39.50
- Stateful
- Result sets named by client
- Multiple record syntaxes
- No human-readable query language
- Type 1 query using attribute sets
- Use attribute to identify access point
- Z39.50 Concepts Retained
- Result sets
- Abstract access points
- Abstract record schemas
- Explain
- Diagnostics
27What problems does SRW solve?
- Addresses the need for standards-based searching in the networked environment
- Shows the vitality of the Z39.50 concepts and implements them in a web services URL access context
- Offers database providers a web-friendly method for offering standards-based searching of resources
- Provides a low-barrier-to-entry solution using commonly available technologies
- XML format of records provides for more reuse, and more interesting use, of resources
28Possible implementation venues
- Gateways to existing Z39.50 servers
- Lightweight SRW/U servers to specialized databases
- Standard search interface for OAI service providers and institutional repositories
- Cost-effective search access to commercial databases (e.g., citation, full-text)
- Metasearching
- Beyond libraries to many other information communities
29References
- Z39.50 International Next Generation (ZING)
- http://www.loc.gov/z3950/agency/zing/
- Search and Retrieve for the Web (SRW/U)
- http://www.loc.gov/srw
- A Gentle Introduction to SRW
- http://www.loc.gov/z3950/agency/zing/srw/introduction.html
- A Gentle Introduction to CQL
- http://zing.z3950.org/cql/intro.html
- An Introduction to the Search/Retrieve URL Service (SRU), by Eric Lease Morgan, in Ariadne (July 04)
- http://www.ariadne.ac.uk/issue40/morgan/
- Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, in D-Lib (Feb 04)
- http://www.dlib.org/dlib/february04/vanveen/02vanveen.html