1
Practices and Challenges in Preservation and
Access for Scientific and Scholarly Digital
Repositories
DCC/DPE/DRIVER/Nestor Joint Workshop, 27-28 November 2007
Bielefeld Academic Search Engine: a Scientific
Search Service for Scientific Repositories
Friedrich Summann, Bielefeld University Library
2
Overview
  • BASE concept and content
  • Overview BASE user-interface and further steps
  • BASE data processing (from repository to index)
  • OAI harvesting challenges
  • BASE interfaces
  • BASE Demo (perhaps)

3
BASE concept and content
  • BASE uses FAST Data Search
  • BASE runs on a local Linux-based multi-node system
  • BASE contains intellectually selected resources,
    with a focus on OAI servers but also web-crawled
    content
  • BASE is a registered OAI service provider
  • BASE displays result lists as bibliographic data
    and full-text hits
  • The BASE frontend is written in PHP, using the search
    API of FAST Data Search
  • BASE offers sorting, search refinement and a search
    history

http://www.base-search.net
4
BASE concept and content
  • Currently 7.2 million documents in 516 collections;
    15 of them are web-crawled, 41 are full-text indexed

5
Development of Repository Integration into BASE
6
Geographical distribution of repositories
7
Aspects of repository integration
  • Visibility (Registries)
  • Academic relevance
  • Repository quality
  • Data quality

8
Special view on IR collections
  • Collections are listed in a configuration file, e.g.:
        ftubirmingham
          url       "http://eprints.bham.ac.uk/"
          desc_de   "The Univ. of Birmingham Eprints Archive"
          desc_en   "The Univ. of Birmingham Eprints Archive"
          descdd_de "Birmingham Univ."
          descdd_en "Birmingham Univ."
  • Collections can be clustered for the user interface,
    e.g. Institutional Repositories Europe consists
    of ftubarcelona, ftubath, ftubristol, ftuhelsinki, ...
  • Parametric search is possible
  • The frontend is ready for multiple views (independent
    views with their own configuration and layout on the
    same backend)
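The slide shows only one configuration entry and a truncated cluster list. A minimal sketch, assuming a hypothetical in-memory form of the same entries (the `Collection` class, `CLUSTERS` mapping, `cluster_filter` helper and the `collection:(...)` filter syntax are illustrative, not BASE's actual code or FAST syntax), of how a cluster could expand into a parametric collection filter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """One entry of a (hypothetical) collection configuration."""
    key: str        # internal collection id, e.g. "ftubirmingham"
    url: str        # repository base URL
    desc_de: str    # long description, German interface
    desc_en: str    # long description, English interface
    descdd_de: str  # short description, German interface
    descdd_en: str  # short description, English interface


COLLECTIONS = {
    "ftubirmingham": Collection(
        key="ftubirmingham",
        url="http://eprints.bham.ac.uk/",
        desc_de="The Univ. of Birmingham Eprints Archive",
        desc_en="The Univ. of Birmingham Eprints Archive",
        descdd_de="Birmingham Univ.",
        descdd_en="Birmingham Univ.",
    ),
}

# Hypothetical cluster table; only the members named on the slide are listed.
CLUSTERS = {
    "Institutional Repositories Europe": [
        "ftubarcelona", "ftubath", "ftubristol", "ftuhelsinki",
    ],
}


def short_label(key: str, lang: str) -> str:
    """Pick the short description for the interface language ('de' or 'en')."""
    return getattr(COLLECTIONS[key], f"descdd_{lang}")


def cluster_filter(cluster: str) -> str:
    """Expand a cluster name into a filter over its member collection ids."""
    return "collection:(" + " OR ".join(CLUSTERS[cluster]) + ")"


print(cluster_filter("Institutional Repositories Europe"))
```

Keeping clusters as a separate table means one collection can belong to several views without duplicating its configuration entry.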

9
BASE end-user interface (1)
10
BASE end-user interface (2)
11
BASE end-user interface (3)
  • Displays search results as bibliographic data
    and full-text hits

12
BASE end-user interface (4)
  • The result list (left-hand side)

If the document contains metadata (e.g. title,
author, abstract), the displayed description is
highlighted
13
BASE end-user interface (5)
The result list (right-hand side)
  • Various options to sort the result set
  • Search refinement by author, keyword, document
    type, language etc.
  • Search history comprises up to 10 queries
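Capping the search history at 10 queries maps naturally onto a bounded queue. A minimal sketch, assuming a hypothetical `SearchHistory` class (BASE implements this in its PHP frontend, whose code is not shown); moving a repeated query to the front instead of storing it twice is an added design choice:

```python
from collections import deque


class SearchHistory:
    """Keeps the most recent queries; the oldest entry drops out first."""

    def __init__(self, limit: int = 10) -> None:
        # deque(maxlen=...) discards the oldest item once the limit is hit
        self._queries = deque(maxlen=limit)

    def add(self, query: str) -> None:
        # Re-running a query moves it to the newest position (no duplicates).
        if query in self._queries:
            self._queries.remove(query)
        self._queries.append(query)

    def recent(self) -> list:
        """Return the stored queries, newest first."""
        return list(reversed(self._queries))


history = SearchHistory()
for i in range(12):
    history.add(f"query {i}")
print(history.recent()[:3])  # ['query 11', 'query 10', 'query 9']
print(len(history.recent()))  # 10
```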

14
BASE end-user interface (6)
  • Search Refinement

Select an author ...
... only documents by this author are displayed
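Refining by author amounts to counting author values over the current result set and then filtering on the selected one. A minimal sketch over hypothetical in-memory records (in BASE this work is done by the FAST backend, not by code like this; the records and helper names are illustrative):

```python
from collections import Counter

# Hypothetical result set: each hit may have several authors.
results = [
    {"title": "Record 1", "authors": ["Author A"]},
    {"title": "Record 2", "authors": ["Author A", "Author B"]},
    {"title": "Record 3", "authors": ["Author B"]},
]


def author_facet(hits):
    """Count in how many hits each author appears, most frequent first."""
    counts = Counter(author for hit in hits for author in hit["authors"])
    return counts.most_common()


def refine_by_author(hits, author):
    """Keep only the hits written or co-written by the selected author."""
    return [hit for hit in hits if author in hit["authors"]]


for hit in refine_by_author(results, "Author B"):
    print(hit["title"])
```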
15
Google Scholar integration
Check citations (citing articles) in Google
Scholar ...
16
DDC Browsing (based on OAI data)
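DDC browsing needs each record's Dewey class, which OAI records usually carry in `dc:subject`. A minimal sketch, assuming subjects written as `ddc:NNN` (an assumption; repositories spell this differently) and using short English labels for the ten DDC main classes; the helper names are hypothetical:

```python
import re
from collections import defaultdict

# Short labels for the ten DDC main classes (hundreds).
MAIN_CLASSES = {
    0: "Computer science, information & general works",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts & recreation",
    8: "Literature",
    9: "History & geography",
}

DDC_PATTERN = re.compile(r"^ddc:(\d{3})", re.IGNORECASE)


def main_class(subject: str):
    """Return the main-class digit for a 'ddc:NNN' subject, else None."""
    match = DDC_PATTERN.match(subject.strip())
    return int(match.group(1)[0]) if match else None


def browse_tree(records):
    """Group record ids under the label of their first DDC subject."""
    tree = defaultdict(list)
    for rec_id, subjects in records:
        for subject in subjects:
            digit = main_class(subject)
            if digit is not None:
                tree[MAIN_CLASSES[digit]].append(rec_id)
                break  # file each record under one class only
    return dict(tree)
```

Records without a recognisable DDC subject simply do not appear in the browse tree.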
17
Linguistic options
  • In production
      • Language detection
      • Lemmatization support
      • Eurovoc integration
      • Subject enrichment
  • In preparation
      • Text mining and enrichment
          • detecting word topology
          • relating to categories
          • enriching BASE data and BASE queries
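The slide does not say how language detection works. Purely as an illustration, here is a toy stopword-overlap guesser; it is a deliberately simple swapped-in technique, not the detection used in BASE (production systems typically rely on character n-gram statistics), and the stopword lists are small hypothetical samples:

```python
import re

# Tiny illustrative stopword lists; real lists are much longer.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "for", "with"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "für"},
    "fr": {"le", "la", "les", "et", "est", "des", "pour", "avec"},
}


def guess_language(text: str, min_hits: int = 2):
    """Return the language whose stopwords occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Too few stopword hits means the guess is not trustworthy.
    return best if scores[best] >= min_hits else None
```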

18
BASE dataflow
Sources: web pages, database records, OAI data
→ Harvesting → Pre-processing → Processing
→ Internal index (FAST) → User interface (PHP)
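The dataflow is a linear pipeline in which three kinds of source feed one index. A minimal sketch of that shape, assuming hypothetical stage functions and a plain dict standing in for the FAST index (the real harvesting, pre-processing and indexing steps are far more involved, and the PHP query side is omitted):

```python
from typing import Iterable, Iterator


def harvest(sources: dict) -> Iterator[dict]:
    """Yield raw records from every source type (web, database, OAI)."""
    for source_type, records in sources.items():
        for raw in records:
            yield {"source": source_type, "raw": raw}


def preprocess(records: Iterable[dict]) -> Iterator[dict]:
    """Normalise whitespace and drop empty records."""
    for rec in records:
        text = " ".join(rec["raw"].split())
        if text:
            yield {**rec, "text": text}


def process(records: Iterable[dict]) -> Iterator[dict]:
    """Derive the fields the index needs (here only lower-cased tokens)."""
    for rec in records:
        yield {**rec, "tokens": rec["text"].lower().split()}


def build_index(records: Iterable[dict]) -> dict:
    """Stand-in for the internal index: token -> list of record numbers."""
    index = {}
    for doc_id, rec in enumerate(records):
        for token in set(rec["tokens"]):
            index.setdefault(token, []).append(doc_id)
    return index


sources = {
    "web": ["Bielefeld  University Library"],
    "database": ["   "],
    "oai": ["Digital repositories in Bielefeld"],
}
index = build_index(process(preprocess(harvest(sources))))
print(index["bielefeld"])  # [0, 1]
```

Chaining generators keeps each stage independent, so a stage can be re-run or replaced without touching the others.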
19
OAI harvesting challenges
  • Repositories do not respond or
    deliver error messages
  • Links to the document are not included
    or do not work
  • XML files are not well-formed
  • Data contain only references without any full text
  • Access to full text is often restricted
  • Field content varies widely
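Several of these failure modes can be detected mechanically for each OAI-PMH response. A minimal sketch using only the Python standard library, assuming `oai_dc` records; the function name and the shape of the returned summary are hypothetical, while the `error` and `resumptionToken` elements, the `status="deleted"` header attribute and the namespaces come from the OAI-PMH 2.0 specification:

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def check_list_records(xml_text: str) -> dict:
    """Classify one ListRecords response and collect usable document links."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not well-formed XML: nothing in this response can be used.
        return {"status": "malformed", "detail": str(exc)}

    error = root.find("oai:error", NS)
    if error is not None:
        # OAI-PMH reports problems such as noRecordsMatch via <error code="...">.
        return {"status": "oai-error", "detail": error.get("code")}

    links, without_link = [], 0
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        if header is not None and header.get("status") == "deleted":
            continue  # deleted records legitimately carry no metadata
        urls = [
            ident.text.strip()
            for ident in record.iterfind(".//dc:identifier", NS)
            if ident.text and ident.text.strip().startswith(("http://", "https://"))
        ]
        if urls:
            links.extend(urls)
        else:
            without_link += 1  # no link to the document itself

    token = root.find(".//oai:resumptionToken", NS)
    return {
        "status": "ok",
        "links": links,
        "records_without_link": without_link,
        "resumption_token": token.text if token is not None and token.text else None,
    }
```

Whether a collected link actually works, or leads to full text rather than a reference, still has to be checked by fetching it.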

20
Some Rules from the Harvesting Practice
  • Standard repository software is great,
    for OAI harvesting as well
  • Small collections, small problems
  • Getting the related full text is complicated
  • Libraries produce better metadata
  • Communication helps, sometimes
  • Data aggregation may produce problems
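Because field content varies so widely between repositories, harvested values usually need normalising before they can drive sorting or refinement. A minimal sketch for one field, assuming `dc:date` values in a few common shapes that are reduced to a four-digit year; the helper and the accepted year range (1500-2099) are hypothetical choices:

```python
import re

# A plausible four-digit year (1500-2099) not embedded in a longer number.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def normalise_year(value: str):
    """Extract a publication year from a free-form date string, or None."""
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


samples = ["2007-11-27", "27.11.2007", "Nov. 2007", "2007", "n.d.", "12345"]
print([normalise_year(s) for s in samples])
# [2007, 2007, 2007, 2007, None, None]
```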

21
BASE interfaces
  • Search form (feeding BASE URLs and using
    the BASE standard display)
  • HTTP calls (using CQL or FAST query syntax)
  • Web service (SOAP-based,
    using CQL or FAST query syntax)
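Both the HTTP and the SOAP interface accept CQL, where a search term containing spaces or quotes must be quoted and escaped. A minimal sketch of a query builder that always quotes terms (which is valid CQL); the index names `dc.title` and `dc.creator` are assumptions taken from the common Dublin Core context set, since the slide does not list BASE's indexes, and no endpoint is called:

```python
def cql_term(value: str) -> str:
    """Quote a search term, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cql_query(clauses) -> str:
    """Join (index, term) pairs into 'index="term"' clauses combined with AND."""
    return " and ".join(f"{index}={cql_term(term)}" for index, term in clauses)


print(cql_query([("dc.title", "digital repositories"), ("dc.creator", "Summann")]))
# dc.title="digital repositories" and dc.creator="Summann"
```

Masking characters such as `*` are left unescaped, so they keep their wildcard meaning inside the quoted term.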

22
Local integration (via search form)
E-Repository Integration
    <form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
      <input maxlength="512" name="q" type="text" size="50" />
      <input value="Search!" type="submit" />
      <input value="all" name="s" type="hidden" />
    </form>
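The search form POSTs two fields, `q` (the query) and `s=all`, as UTF-8 to `index.php`. A minimal sketch that builds the same request with the Python standard library, for example to test an integration from a script; the helper name is hypothetical, the request is constructed but not sent, and nothing here assumes the service still accepts it:

```python
from urllib.parse import urlencode
from urllib.request import Request

FORM_ACTION = "http://www.base-search.net/index.php"  # the form's action URL


def base_search_request(query: str) -> Request:
    """Build the POST request the search form would submit (not sent here)."""
    # Truncate like the form's maxlength="512"; s="all" mirrors the hidden field.
    body = urlencode({"q": query[:512], "s": "all"}, encoding="utf-8").encode("ascii")
    return Request(
        FORM_ACTION,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )


req = base_search_request("Bielefeld Bibliothek")
print(req.data)  # b'q=Bielefeld+Bibliothek&s=all'
```

Passing the request to `urllib.request.urlopen` would actually submit it.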
23
Example Integration via HTTP interface
  • Meta Search Engine metager.de

24
BASE expertise part of EU project DRIVER
  • Member of the Technical Group
    (aggregating, storing, indexing)
  • Adding DRIVER functionality to our favourite
    repository system (OPUS)

25
Thank you!