Archiving Web Resources Information Day

Description:

search/browse access to archived databases. Monica Berko, NLA. 12th November, 2004 ... our method for archiving a deep web site has been to acquire the back-end ...

1
Archiving Web Resources Information Day
  • Xinq: XML Inquire
  • search/browse access to archived databases
  • Monica Berko, NLA
  • 12th November, 2004

2
Why was Xinq developed?
  • our method for archiving a deep web site has been
    to acquire the back-end database and transform
    the content to XML
  • we now have XML file(s) and possibly a schema
  • how do we provide access to the content in a way
    that emulates the original interface on the live
    web site?
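The transformation step mentioned above can be sketched as follows. This is only an illustration, not the NLA's actual extraction process: it assumes the back-end database is reachable through Python's sqlite3 module, and the table name, column names, sample row, and the helper `table_to_xml` are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, item_name):
    """Serialise every row of `table` as one <item_name> element.

    `table` is interpolated into the SQL text, so it must come from a
    trusted source (here, the curator), never from user input.
    """
    root = ET.Element(table)
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, item_name)
        for column, value in zip(columns, row):
            # NULL columns are skipped rather than emitted as empty elements.
            if value is not None:
                ET.SubElement(item, column).text = str(value)
    return root


# Hypothetical stand-in for a deposited back-end database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publications (title TEXT, type TEXT)")
conn.execute("INSERT INTO publications VALUES ('Bush Nursing', 'book')")

xml_text = ET.tostring(
    table_to_xml(conn, "publications", "publication"), encoding="unicode"
)
```

The result is one XML document per table, which is the kind of file the rest of the presentation takes as its starting point.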

3
Examples
  • Health Education Rural Remote Resources Database
  • Plant Breeders Rights Database
  • Australian Medical Pioneers Index
  • Australia Dancing
  • The Last Resting Place of Australian Artists
  • Soldiers of the South African War

4
Problems to be solved
  • How to describe the data model and semantics of
    the deposited database
  • What administrative metadata is required
  • How to describe the behaviour of the access
    interface
  • Select a platform for building web interfaces to
    XML data stores that is open-source, popular,
    easily deployed and scalable.
  • How to automatically generate a web-based
    search/browse interface to an arbitrary XML data
    archive in this platform

5
Archival Information Package
  • XML configuration file storing the required
    administrative information, data model
    description, and access/display rules
  • XML Schema for the Archive Contents
  • The archive contents (XML only)

6
Administrative Metadata
  • Descriptive: Title, Description, Publisher, Live
    URL
  • If the database describes digital objects which
    are also to be fetched and archived, the field(s)
    containing URI references must be specified
  • Extraction: details about the original database
    and the filtering and mapping processes used to
    extract the data to be archived
  • Ingestion: details about ingestion of the XML
    archive into the repository, and of the
    corresponding digital objects if there are any

7
Describing the Data Model
  • An XML schema is difficult for non-technical
    staff to author
  • Semantic information is required for reproducing
    a usable web interface
  • The data model must cope with multi-item models
    and item relationships
  • The description must be expressed as XML
  • The Xinq tool generates an XML schema for each
    item described in the data model
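To make the last point concrete, here is a minimal sketch of generating an XML Schema from a simple item description. It is not Xinq's own generator: the function `item_schema`, the (property name, XSD type) pair format, and the choice to make every property optional are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def item_schema(item_name, properties):
    """Build an xs:schema declaring one element for the item.

    `properties` is a list of (property_name, xsd_type) pairs, a
    hypothetical simplification of a data model description.
    """
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=item_name)
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for prop_name, xsd_type in properties:
        # Every property is optional here; a real tool would take
        # cardinality from the data model description.
        ET.SubElement(
            sequence,
            f"{{{XS}}}element",
            name=prop_name,
            type=f"xs:{xsd_type}",
            minOccurs="0",
        )
    return schema


schema = item_schema("person", [("familyname", "string"), ("birthdate", "date")])
declared = [e.get("name") for e in schema.iter(f"{{{XS}}}element")]
schema_text = ET.tostring(schema, encoding="unicode")
```

The curator supplies only names and simple types; the schema syntax itself is produced mechanically, which is the point of not asking non-technical staff to author XSD by hand.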

8
Single Item and Multi-Item Sites
  • Many dynamic web sites describe only one entity
  • Some sites are obviously based around multiple
    but related entities and the database archive
    description will need to reflect this, e.g.
    Health Education Rural Remote Resources Database

9
Arbitrary database model

10
Simple Multi-Item Example

11
Example Data Model Definition
  • http://www.nla.gov.au/xinq/documents/examples/pub2_archive-spec.xml

12
Data Model definition schema

14
The Access Interface
  • a home page with a search form
  • a search results display page for each item type
  • a detailed display page for all the properties of
    an item
  • browse options which mimic the browse options
    available on the original site
  • a default header and footer file which includes
    the name of the publisher and the URL of the
    original site

15
Example
  • Health Education Rural Remote Resources Database
  • Live Site
  • Archived Site

16
Describing the access interface
  • Search rules
  • Browse rules
  • Display rules

17
Search Rules

18
Search rules example
<search_rules>
  <search>
    <entity>publication</entity>
    <field>title</field>
    <field>type</field>
    <field>author<subfield>familyname</subfield></field>
    <field>author<subfield>givenname</subfield></field>
    <field>author<subfield>deceased</subfield></field>
  </search>
  <search>
    <entity>person</entity>
    <field>familyname</field>
    <field>givenname</field>
    <field>deceased</field>
    <field>birthdate</field>
  </search>
</search_rules>
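As an illustration of how such a search rules block might be consumed, the sketch below reads the rules and lists the searchable fields for each entity, which is roughly what a search form generator would need. The helper names and the "field/subfield" path notation are assumptions, not part of Xinq.

```python
import xml.etree.ElementTree as ET

SEARCH_RULES = """<search_rules>
  <search>
    <entity>publication</entity>
    <field>title</field>
    <field>type</field>
    <field>author<subfield>familyname</subfield></field>
    <field>author<subfield>givenname</subfield></field>
    <field>author<subfield>deceased</subfield></field>
  </search>
  <search>
    <entity>person</entity>
    <field>familyname</field>
    <field>givenname</field>
    <field>deceased</field>
    <field>birthdate</field>
  </search>
</search_rules>"""


def field_path(field):
    """Turn <field>author<subfield>familyname</subfield></field>
    into the path string "author/familyname"."""
    parts = [(field.text or "").strip()]
    subfield = field.find("subfield")
    if subfield is not None:
        parts.append((subfield.text or "").strip())
    return "/".join(parts)


def searchable_fields(rules_xml):
    """Map each entity name to the list of fields its search form offers."""
    root = ET.fromstring(rules_xml)
    return {
        search.findtext("entity").strip(): [
            field_path(field) for field in search.findall("field")
        ]
        for search in root.findall("search")
    }


fields = searchable_fields(SEARCH_RULES)
```

Each entry in `fields` corresponds to one search form, with one input per listed path.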

19
Browse Rules

20
Browse Rules Example
<browse_rules>
  <browse>
    <entity>publication</entity>
    <field>title</field>
    <field>type</field>
    <field>author<subfield>familyname</subfield></field>
  </browse>
</browse_rules>
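The slides do not spell out what a browse rule does at run time. One plausible reading, assumed here, is that each listed field yields a browse list of its distinct values, each linking to the matching records. The sketch below builds such an index; the record dictionaries and the function `browse_index` are invented for the example.

```python
from collections import defaultdict


def browse_index(records, field):
    """Group records by the value of `field`, sorted by value.

    Records with no value for the field are left out of the browse list.
    """
    index = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value:
            index[value].append(record)
    return dict(sorted(index.items()))


# Invented sample records standing in for archived publication items.
records = [
    {"title": "Remote Clinics", "type": "report"},
    {"title": "Bush Nursing", "type": "book"},
    {"title": "Rural Health", "type": "book"},
    {"title": "Untyped Item"},
]

by_type = browse_index(records, "type")
```

Here `by_type` has one entry per distinct type, in alphabetical order, which could be rendered as a browse page of links.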

21
Display Rules

22
<display_rules>
  <resultsummary>
    <entity>resource</entity>
    <results-style>table</results-style>
    <field sortorder="ascending" link="true">title</field>
    <field>provider<subfield>name</subfield></field>
    <field>provider<subfield>state</subfield></field>
  </resultsummary>
  <resultsummary>
    <entity>provider</entity>
    <results-style>list</results-style>
    <field sortorder="ascending" link="true">name</field>
    <field>state</field>
    <field>contact</field>
    <field>phone</field>
    <field>email</field>
  </resultsummary>
</display_rules>
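A rough sketch of applying one result summary rule to a set of records follows. It assumes that `sortorder` on a field means "sort the result list by this field" and that the listed fields are the columns shown; neither is confirmed by the slides. The sample records and the function `summarise` are invented.

```python
import xml.etree.ElementTree as ET

DISPLAY_RULE = """<resultsummary>
  <entity>provider</entity>
  <results-style>list</results-style>
  <field sortorder="ascending" link="true">name</field>
  <field>state</field>
</resultsummary>"""


def summarise(rule_xml, records):
    """Return one row per record, holding only the fields named in the rule."""
    rule = ET.fromstring(rule_xml)
    fields = rule.findall("field")
    names = [field.text.strip() for field in fields]
    # Sort by the first field that declares a sortorder, if there is one.
    for field in fields:
        order = field.get("sortorder")
        if order:
            sort_key = field.text.strip()
            records = sorted(
                records,
                key=lambda record: record.get(sort_key, ""),
                reverse=(order == "descending"),
            )
            break
    return [[record.get(name, "") for name in names] for record in records]


# Invented sample records standing in for archived provider items.
providers = [
    {"name": "Zeta Clinic", "state": "QLD", "phone": "0000"},
    {"name": "Alpha Health", "state": "NSW"},
]

rows = summarise(DISPLAY_RULE, providers)
```

The `results-style` and `link` values are ignored in this sketch; a fuller version would use them to choose between table and list markup and to link the title to the detailed display page.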
23
More Examples
  • Plant Breeders Rights Database
  • Australian Medical Pioneers Index
  • The Last Resting Place of Australian Artists
  • Soldiers of the South African War

24
Required Infrastructure
  • Native XML database server which supports XQuery
    and XMLDB API (eXist and Tamino have been
    tested)
  • Java servlet container (Tomcat and Jetty have
    been tested)
  • Apache Ant
  • Xalan XSLT processor

25
Limitations of the tool
  • Does not validate item relationships
  • Does not deal with nested property groups
  • Does not yet properly handle nested references
  • Archive description file needs to be authored
    from scratch by the curator
  • Has limited free text search capability
  • Has no advanced search interface
  • Has no map interface for querying by physical
    location
  • Not integrated with the archival of digital
    objects referenced in the database

26
Roadmap
  • Release on SourceForge February 2005
  • Some architectural and performance improvements
  • Develop Wizard-style tool for generating archive
    description file
  • Integration with digital object archives
    referenced by the data
  • Improved handling of nested property groups and
    item relationships
  • Advanced search
  • More flexible configuration of free text
    searching rules

27
Alternative uses for tool
  • For mothballed systems, contents of legacy
    database can be archived as XML and then Xinq can
    generate online search and browse capability.
  • Prototyping tool for requirements analysis
  • Related tool development: Xedit, a generic
    online update capability based on the same
    database description configuration file

28
More Information
  • Project Page
  • http://www.nla.gov.au/xinq
  • Source Forge Entry
  • http://sourceforge.net/projects/xinq/