Next Generation Z39.50: A Web Services Approach for Search and Retrieve

1
Next Generation Z39.50: A Web Services Approach
for Search and Retrieve
6th Annual State GILS Conference, March 31 -
April 3, 2004, Raleigh, NC
  • William E. Moen <wemoen@unt.edu>
    School of Library and Information Sciences
    Texas Center for Digital Knowledge
    University of North Texas
    Denton, TX 72603

2
Overview
  • Quick description of SRW
  • Brief background: historical, political,
    conceptual
  • Non-technical (almost) introduction to SRW
  • Common Query Language (CQL), briefly
  • Concluding thoughts

3
What is SRW?
  • Search and Retrieve Web Service (SRW)
  • An XML-based protocol for searching, retrieving,
    and other information retrieval transactions
  • Cast in the standards/technologies for web
    services
    • XML
    • SOAP
    • HTTP
  • Brings the concepts and experience of Z39.50 into
    the web environment using web technologies

4
Why SRW?
  • Genesis: several years of soul searching by
    Z39.50 developers and implementors
  • The web had become the common implementation
    environment
  • Z39.50 was not perceived as web friendly
  • Pivotal moments
    • December 2000 ZIG meeting
    • July 2001 meeting

5
Turning point: December 2000
  • Z39.50 Future discussion
  • Perceptions of Z39.50
    • broken
    • heavy-weight
    • difficult and complex
    • old technology
    • not web friendly
  • Several options presented
    • Rewrite the protocol from the ground up
    • Rewrite as an XML protocol
    • Separate the Z39.50 protocol from its use of BER
      as a wire protocol
    • Simplify the protocol specifications to focus on
      core features
  • Recognition of the intellectual contribution of
    Z39.50

6
Taking action: June 2001
  • Invitational meeting to discuss moving Z39.50 to
    an XML-based protocol
  • Goal
    • Lower the barriers to implementation while
      preserving the existing intellectual
      contributions of Z39.50, discarding those aspects
      no longer useful or meaningful.
  • Objective
    • Define specifications for a new web service
      definition based on Z39.50 together with web
      technologies
    • Separate the Z39.50 abstract and associated
      semantic model from its specific encoding and
      wire protocol (i.e., ASN.1/BER and TCP/IP)
  • Initially called Z39.50 Next Generation (ZNG)
  • Intended as proof-of-concept
    • Defining only those protocol specifications that
      would actually be implemented by participants
7
ZING: Z39.50 International Next Generation
  • Make intellectual/semantic content of Z39.50 more
    broadly available
  • Make Z39.50 more attractive by lowering barriers
    to implementation
  • Use of XML to represent and encode data
  • Use of HTTP for transport
  • Use of SOAP for interaction between client and
    server based on Remote Procedure Call (RPC)
  • Several ZING initiatives: ZOOM, ez39.50, ZeeRex,
    SRW/U

FOR MORE INFORMATION, VISIT THE ZING WEBSITE
http://www.loc.gov/z3950/agency/zing/
8
SRW/U, SRW, SRU
  • SRW/U: Search and Retrieve for the Web
    • General designation for this initiative
  • SRW: Search and Retrieve Web Service
    • HTTP POST
    • Simple Object Access Protocol (SOAP)
    • XML messages
  • SRU: Search and Retrieve URL Service
    • HTTP GET
    • Request parameters included in URL syntax
  • Development
    • Version 1.0: November 2001
    • Version 1.1: February 2002

FOR MORE INFORMATION, VISIT THE SRW WEBSITE
http://www.loc.gov/srw
9
Networked information retrieval
  • What's needed
    • Identifying a target to search
    • A vocabulary for expressing search requests,
      search criteria, retrieval requests, etc.
    • Methods to encode the requests and responses from
      the target
    • Methods to transport the requests and responses
      across a network
  • In other words, a protocol and supporting
    specifications

10
Abstract Model of IR
11
Abstract model of Z39.50
12
Z39.50 classic SRW
13
SRW Overview
  • Builds on Z39.50 concepts and web technologies
    • Web technologies: XML, SOAP, HTTP
  • Uses a new, human-readable query language
  • Combines several Z39.50 features into three
    operation types
    • searchRetrieve operation
    • scan operation
    • explain operation

14
searchRetrieve operation
  • The core of the protocol
  • Expresses the search and additional criteria
  • Records are returned in XML
  • Request parameters
    • version
    • query
    • Optional parameters
      • sortKeys
      • recordPacking
      • recordSchema
      • recordXPath
      • stylesheet
  • Response parameters
    • version
    • numberOfRecords
    • Optional parameters
      • resultSetId
      • resultSetIdleTime
      • records
      • diagnostics
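
The split between mandatory and optional request parameters can be sketched as a small Python data structure. This is an illustrative sketch only: the class name `SearchRetrieveRequest` and the helper `to_params` are hypothetical, and only the parameter names come from the slide (with `sortKeys` spelled as in SRW 1.1).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SearchRetrieveRequest:
    """Hypothetical container for searchRetrieve request parameters."""

    # Mandatory parameters named on the slide.
    version: str
    query: str
    # Optional parameters named on the slide; None means "not sent".
    sortKeys: Optional[str] = None
    recordPacking: Optional[str] = None
    recordSchema: Optional[str] = None
    recordXPath: Optional[str] = None
    stylesheet: Optional[str] = None

    def to_params(self) -> dict:
        """Return only the parameters that were actually set."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


request = SearchRetrieveRequest(version="1.1", query="dinosaur", recordSchema="dc")
params = request.to_params()
```

Leaving unset optional parameters out of the result keeps the request minimal, so the server applies its own defaults.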

15
SRW XML
  • XML as foundation for protocol
  • Provides syntax for intelligent markup
  • Defines or references XML schemas
  • Example: XML schema for SRW specifications
    • searchRetrieveRequest
    • searchRetrieveResponse

16
searchRetrieveRequest example
  • Sent as an HTTP POST
  • XML document is sent to the server
  • Using SOAP to wrap the request

<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title all "Squirrel Hungry"</query>
  <maximumRecords>1</maximumRecords>
  <startRecord>1</startRecord>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
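
Wrapping such a request in SOAP can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the function name `build_soap_request` is hypothetical, the SOAP 1.1 envelope namespace is assumed, and the request elements are left without a namespace, as on the slide (a real SRW service would expect its own namespace on them). The sketch only builds the payload and does not send it.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumed; the slide does not show it).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(query: str, maximum_records: int = 1) -> bytes:
    """Wrap a searchRetrieveRequest in a SOAP Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, "searchRetrieveRequest")
    # Same elements, in the same order, as the example request.
    for name, value in [
        ("version", "1.1"),
        ("query", query),
        ("maximumRecords", str(maximum_records)),
        ("startRecord", "1"),
        ("recordSchema", "dc"),
    ]:
        ET.SubElement(request, name).text = value
    return ET.tostring(envelope, encoding="utf-8")


payload = build_soap_request('dc.title all "Squirrel Hungry"')
```

The resulting bytes would be the body of the HTTP POST to the SRW server.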
17
searchRetrieveResponse example
<searchRetrieveResponse>
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
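
Reading a response of this shape can be sketched as follows. The sample document mirrors the slide's example. The function name `summarize` is hypothetical, and the namespace URI bound to the `dc` prefix is an assumption, since the slide shows only the prefix.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "dc" prefix (Dublin Core elements 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"

RESPONSE = f"""<searchRetrieveResponse xmlns:dc="{DC_NS}">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""


def summarize(response_xml: str):
    """Return the result-set size and the titles of the records returned."""
    root = ET.fromstring(response_xml)
    total = int(root.findtext("numberOfRecords"))
    titles = [title.text for title in root.iter(f"{{{DC_NS}}}title")]
    return total, titles


total, titles = summarize(RESPONSE)
```

Note that `numberOfRecords` reports the size of the whole result set (10 here), while the response itself carries only as many records as were requested (one here).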
18
searchRetrieve response
  • Records returned in response
  • All records in XML syntax
  • According to one or more XML schemas (semantics)
    • Dublin Core
    • ONIX
    • MODS
    • MARCXML

19
searchRetrieve example
<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title = computer</query>
  <startRecord>1</startRecord>
  <maximumRecords>10</maximumRecords>
  <recordPacking>xml</recordPacking>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
  • Retrieval results
  • XML view
  • Screen shot

20
SRW results
21
SRU briefly
  • Protocol requests can be carried via HTTP GET
  • searchRetrieveRequest parameters expressed in
    standard URL syntax
  • baseURL and search part separated by a question
    mark (?)
  • Response is an XML document containing records
  • The searchRetrieveRequest in SRU
    • http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
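
Assembling an SRU request URL from its parameters can be sketched with `urllib.parse`. The base URL and parameter values are the ones on the slide; the function name `build_sru_url` is hypothetical, and nothing is sent over the network. Note that `urlencode` also percent-encodes the `=` inside the CQL query as `%3D`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://alcme.oclc.org/srw/search/SOAR"  # base URL from the slide


def build_sru_url(base_url: str, query: str, **extra: str) -> str:
    """Join the base URL and the percent-encoded parameters with '?'."""
    params = {"operation": "searchRetrieve", "version": "1.1", "query": query}
    params.update(extra)
    return base_url + "?" + urlencode(params)


url = build_sru_url(
    BASE_URL,
    'dc.title="computer"',
    recordSchema="DC",
    startRecord="1",
    maximumRecords="10",
    recordPacking="xml",
)

# Round trip: decoding the search part recovers the original CQL query.
decoded = parse_qs(urlsplit(url).query)
```

Letting the library do the encoding avoids hand-escaping quotes and spaces in longer CQL queries.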

22
searchRetrieve query
  • SRW query consists of one or more query
    statements linked by Boolean operators
  • Five categories of query statements
    • single search clause
    • two or more search clauses linked by Boolean
      operators
    • search clauses and result sets linked by Boolean
      operators
    • two or more result sets linked by Boolean
      operators
    • single result set
  • Expressed in the Common Query Language (CQL)

23
Common Query Language (CQL)
  • A formal language for representing queries to
    information retrieval systems
  • Human-readable
  • Search clause
    • Always includes a term
      • simple terms consist of one or more words
    • May include index name
      • To limit search to a particular field/element
      • Index name includes base name and may include
        prefix
        • title, subject
        • dc.title, dc.subject
  • Several index sets have been defined (called
    Context Sets in SRW)
    • dc
    • bath
    • srw
  • Context set defines the available indexes for a
    particular application
24
Other components of CQL
  • Relation
    • <, >, <=, >=, =, <>
    • exact: used for string matching
    • all: when the term is a list of words, indicates
      that all of the words must be found
    • any: when the term is a list of words, indicates
      that any of the words may be found
  • Boolean operators: and, or, not
  • Proximity (prox operator)
    • relation (<, >, <=, >=, =, <>)
    • distance (integer)
    • unit (word, sentence, paragraph, element)
    • ordering (ordered or unordered)
  • Masking rules and special characters
    • single asterisk (*) to mask zero or more
      characters
    • single question mark (?) to mask a single
      character
    • caret/hat (^) to indicate anchoring, left or right
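
The masking rules can be illustrated by translating a masked term into a regular expression. This is a simplified sketch, not a CQL implementation: it handles single-word terms only, assumes case-insensitive matching, and reads anchoring as "the first (or last) word of the field must match". The function names `word_pattern` and `matches` are hypothetical.

```python
import re


def word_pattern(masked_word: str) -> re.Pattern:
    """Compile a masked word: '*' is zero or more characters, '?' exactly one."""
    parts = []
    for ch in masked_word:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term: str, field_value: str) -> bool:
    """Test a single-word term against the words of a field value."""
    words = field_value.split()
    if not words:
        return False
    if term.startswith("^"):  # left anchor: the first word must match
        return bool(word_pattern(term[1:]).fullmatch(words[0]))
    if term.endswith("^"):  # right anchor: the last word must match
        return bool(word_pattern(term[:-1]).fullmatch(words[-1]))
    return any(word_pattern(term).fullmatch(word) for word in words)
```

For example, `matches("dino*", "The Complete Dinosaur")` is true, while `matches("^complete", "The Complete Dinosaur")` is false because the field does not begin with that word.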

25
CQL examples
  • Simple queries
    • dinosaur
    • "the complete dinosaur"
  • Boolean
    • dinosaur and bird or dinobird
    • "feathered dinosaur" and (yixian or jehol)
  • Proximity
    • foo prox bar
    • foo prox/>/4/word/ordered bar
  • Indexes
    • title = dinosaur
    • bath.title = "the complete dinosaur"
    • srw.serverChoice = dinosaur
  • Relations
    • year > 1998
    • title all "complete dinosaur"
    • title any "dinosaur bird reptile"
    • title exact "the complete dinosaur"

26
SRW vs. classic Z39.50
  • SRW
    • No explicit concept of connection, session, or
      state
    • Result sets named by server
    • Single record syntax (XML), multiple schemas
    • String (i.e., human-readable) queries: CQL
    • Named indexes
  • Classic Z39.50
    • Stateful
    • Result sets named by client
    • Multiple record syntaxes
    • No human-readable query language
    • Type 1 query using attribute sets
    • Use attribute to identify access point
  • Z39.50 Concepts Retained
    • Result sets
    • Abstract access points
    • Abstract record schemas
    • Explain
    • Diagnostics

27
What problems does SRW solve?
  • Addresses need for standards-based searching in
    the networked environment
  • Shows the vitality of the Z39.50 concepts and
    implements them in a web services and URL access
    context
  • Offers database providers a web-friendly
    method for offering standards-based searching of
    resources
  • Provides a low-barrier-to-entry solution using
    commonly available technologies
  • XML format of records provides for more reuse,
    and more interesting use, of resources

28
Possible implementation venues
  • Gateways to existing Z39.50 servers
  • Lightweight SRW/U servers to specialized
    databases
  • Cost-effective search access to commercial
    databases (e.g., citation, full-text)
  • Metasearching
  • Beyond libraries to many other information
    communities

29
References
  • Z39.50 International Next Generation (ZING)
    • http://www.loc.gov/z3950/agency/zing/
  • Search and Retrieve for the Web (SRW/U)
    • http://www.loc.gov/srw
  • A Gentle Introduction to SRW
    • http://www.loc.gov/z3950/agency/zing/srw/introduction.html
  • A Gentle Introduction to CQL
    • http://zing.z3950.org/cql/intro.html
  • Search and Retrieval in The European Library: A
    New Approach, by van Veen and Oldroyd, in D-Lib
    (Feb. 2004)
    • http://www.dlib.org/dlib/february04/vanveen/02vanveen.html