1Practices and Challenges in Preservation and
Access for Scientific and Scholarly Digital
Repositories
DCC/DPE/DRIVER/Nestor Joint Workshop
27-28 November 2007
Bielefeld Academic Search Engine: a Scientific
Search Service for Scientific Repositories
Friedrich Summann, Bielefeld University Library
2Overview
- BASE concept and content
- Overview BASE user-interface and further steps
- BASE data processing (from repository to index)
- OAI harvesting challenges
- BASE interfaces
- BASE Demo (perhaps)
3BASE concept and content
- BASE uses Fast Data Search
- BASE uses a local Linux-based multi-node system
- BASE contains intellectually selected resources
with a focus on OAI servers but also web-crawled
content
- BASE is a registered OAI service provider
- BASE displays result lists as bibliographic data
and full text hits
- BASE frontend is written in PHP using the search
API from Fast Data Search
- BASE offers sorting, search refinement and search
history
http://www.base-search.net
4BASE concept and content
- Currently 7.2 million documents in 516 collections,
15 of them web-crawled data, 41 full-text indexed
5Development of Repository Integration into BASE
6Geographical distribution of repositories
7Aspects of repository integration
- Visibility (Registries)
- Academic relevance
- Repository quality
- Data quality
8Special view on IR collections
- Collections are listed in a configuration file
  - ftubirmingham
  - url "http://eprints.bham.ac.uk/"
  - desc_de "The Univ. of Birmingham Eprints Archive"
  - desc_en "The Univ. of Birmingham Eprints Archive"
  - descdd_de "Birmingham Univ."
  - descdd_en "Birmingham Univ."
- Collections can be clustered for the user interface,
e.g. Institutional Repositories Europe consists
of ftubarcelona, ftubath, ftubristol,
ftuhelsinki, ...
- Parametric search possible
- Frontend is ready for multi view (independent
views with own configuration and layouts on the
same backend)
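The collection configuration and clustering on this slide could be modeled roughly as follows. This is a minimal sketch and not BASE's actual configuration format: the dictionary layout, the `CLUSTERS` mapping and both helper functions are hypothetical, and reading `descdd_*` as a short display label is an assumption. Only the collection identifiers and the Birmingham field values are taken from the slide.

```python
# Hypothetical in-memory model of the collection configuration.
# Field names and the Birmingham values come from the slide.
COLLECTIONS = {
    "ftubirmingham": {
        "url": "http://eprints.bham.ac.uk/",
        "desc_de": "The Univ. of Birmingham Eprints Archive",
        "desc_en": "The Univ. of Birmingham Eprints Archive",
        "descdd_de": "Birmingham Univ.",
        "descdd_en": "Birmingham Univ.",
    },
}

# Hypothetical cluster definition: a named group of collection identifiers.
# Only the four identifiers named on the slide are listed.
CLUSTERS = {
    "Institutional Repositories Europe": [
        "ftubarcelona", "ftubath", "ftubristol", "ftuhelsinki",
    ],
}


def collections_in_cluster(name):
    """Return the collection identifiers belonging to a cluster."""
    return list(CLUSTERS.get(name, []))


def short_label(collection_id, lang="en"):
    """Return the short label (assumed meaning of descdd_*) for a collection.

    Falls back to the identifier when the collection is not configured.
    """
    entry = COLLECTIONS.get(collection_id, {})
    return entry.get(f"descdd_{lang}", collection_id)
```

A search restricted to a cluster would then translate the cluster name into its list of collection identifiers before querying the index.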
9BASE end-user interface (1)
10BASE end-user interface (2)
11BASE end-user interface (3)
- Displays search results as bibliographic data and full text hits
12BASE end-user interface (4)
- The result list (left hand side)
If the document contains metadata (e.g. title,
author, abstract), the displayed description is
highlighted
13BASE end-user interface (5)
The result list (right hand side)
- Various options to sort the result set
- Search refinement by author, keyword, document
type, language etc.
- Search history comprises up to 10 queries
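A search history capped at 10 queries can be kept with a bounded queue. The sketch below only illustrates that idea in Python; the BASE frontend itself is written in PHP, and the class name `SearchHistory` and its methods are hypothetical.

```python
from collections import deque


class SearchHistory:
    """Keeps the most recent queries, discarding the oldest beyond the limit."""

    def __init__(self, limit=10):
        # A deque with maxlen drops the oldest entry automatically.
        self._queries = deque(maxlen=limit)

    def add(self, query):
        self._queries.append(query)

    def recent(self):
        """Return the stored queries, newest first."""
        return list(reversed(self._queries))
```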
14BASE end-user interface (6)
Select an author ...
... only documents by this author are displayed
15Google Scholar integration
Check citations (citing articles) in Google
Scholar ...
16DDC Browsing (based on OAI data)
17Linguistic options
- In production
  - Language detection
  - Lemmatizing support
  - Eurovoc integration
  - Subjects enrichment
- In preparation
  - Text mining and enrichment
  - detecting word topology
  - relating to categories
  - enriching BASE data and BASE query
18BASE dataflow
Sources (Web Pages, Database Records, OAI-Data)
-> Harvesting
-> Pre-Processing
-> Processing
-> Internal Index (FAST)
-> User interface (PHP)
19OAI harvesting challenges
- Repositories do not respond or deliver error messages
- Links to the document are not included or do not work
- XML file is not well-formed
- Data contain only references without any full text
- Access to fulltext often is restricted
- Field content varies in a very broad range
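Several of these problems can be detected automatically when a response arrives. The following is a minimal sketch, assuming OAI-PMH 2.0 responses with `oai_dc` metadata; the function `check_response`, the sample record and the rule "a record needs at least one `dc:identifier` that is an HTTP URL" are illustrative assumptions, not the checks BASE actually performs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and Dublin Core.
OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"oai": OAI, "dc": DC}


def check_response(xml_text):
    """Return a list of problems found in one OAI-PMH response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Challenge: the XML file is not well-formed.
        return [f"not well-formed: {exc}"]

    problems = []
    # Challenge: the repository delivers an error message.
    for error in root.findall("oai:error", NS):
        problems.append(f"OAI error: {error.get('code')}")

    # Challenge: links to the document are not included.
    for record in root.iter(f"{{{OAI}}}record"):
        oai_id = record.findtext(
            "oai:header/oai:identifier", default="?", namespaces=NS
        )
        links = [
            e.text
            for e in record.iter(f"{{{DC}}}identifier")
            if e.text and e.text.startswith(("http://", "https://"))
        ]
        if not links:
            problems.append(f"{oai_id}: no link to the document")
    return problems


# Hypothetical sample record that carries a title but no document link.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Record without a link</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(check_response(SAMPLE))
```

Whether a link actually works, or whether the full text behind it is restricted, cannot be decided from the metadata alone and would need a separate request per document.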
20Some Rules from the Harvesting Practice
- Standard repository software is great
for OAI harvesting as well
- Small collections, small problems
- Getting the related fulltext is complicated
- Libraries produce better metadata
- Communication helps - sometimes
- Data aggregation may produce problems
21BASE interfaces
- Search form (feeding BASE urls and using
the BASE standard display)
- HTTP calls (using CQL or FAST query syntax)
- Web Service (SOAP based, using CQL or FAST query syntax)
22Local integration (via search form)
E-Repository Integration
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
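The same request that the form submits can be prepared from a script. This is a minimal sketch that only builds the request and does not send it; the URL and the parameters `q` and `s` are taken from the form above, while the function name `build_search_request` is hypothetical and the endpoint as shown in 2007 may no longer accept this call.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form target as given on the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query):
    """Build (but do not send) the POST request the search form would submit."""
    # The form limits the query field to 512 characters and always sends s=all.
    fields = {"q": query[:512], "s": "all"}
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
        method="POST",
    )


request = build_search_request("open access")
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the prepared object to `urllib.request.urlopen` would submit it; the response would be the BASE standard result display as HTML.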
23Example Integration via HTTP interface
- Meta Search Engine metager.de
24BASE expertise part of EU project DRIVER
- Member of the Technical Group (Aggregating, Storing, Indexing)
- Adding DRIVER functionality to our favourite
repository system (OPUS)
25Thank you!