Title: The University of Illinois DLI Project
1Federating Repositories of Scientific Literature
www.canis.uiuc.edu
- The Interspace Prototype (1997-2000)
- Digital Libraries Initiative (1994-1998)
- Worm Community System (1990-1993)
- Telesophy System (1984-1989)
2Federating Repositories of Scientific Literature
The University of Illinois Digital Libraries Initiative (DLI) Project
Status Retrospective
Bruce R. Schatz
dli_at_uiuc.edu
http://dli.grainger.uiuc.edu
AAAS-98, Digital Libraries Session
Philadelphia, February 1998
3Evolution of Information Retrieval across the Net
from Bruce R. Schatz, "Information Retrieval in Digital Libraries: Bringing Search to the Net",
cover article in Science, vol 275, Jan 17, 1997,
special issue on Bioinformatics
4Illinois DLI Status
- Production Testbed based in a Real Library
- Document Search based on Structure
- SGML Publisher Stream deployed at U of Illinois
- Technology Research for Scalable Federation
- Concept Search based on Semantics
- Statistical Indexes across subjects and media
5Production Testbed Status
- Based in major Engineering Library
- Production Stream - in testbed before on shelves
- Full-text SGML -- Federated Structure Search
- 5 publishers, 55 journals, 40,000 articles
- Web version campus rollout October 1997
- integrated within library information services
6Production Testbed Evaluation
- 700 users, steadily increasing to max 1500
- used in intro Computer Science classes
- developers and evaluators work closely
- needs assessment and usability studies
- careful multi-modal usage evaluation
- session observations and transaction logs
7Primary Partners
- journal/magazine Publishers
- American Institute of Physics (AIP)
- American Physical Society (APS)
- American Astronomical Society (AAS)
- American Society of Civil Engineers (ASCE)
- American Society of Mechanical Engineers (ASME)
- American Society of Agricultural Engineers (ASAE)
- American Institute of Aeronautics and Astronautics (AIAA)
- Institute of Electrical and Electronics Engineers (IEEE)
- Institution of Electrical Engineers (IEE)
- IEEE Computer Society (IEEE-CS)
- testbed: SoftQuad, OpenText
- infrastructure: Hewlett-Packard, Microsoft
8DeLIver Search Interface
9DeLIver Search Results
10(Full Text Retrieval)
11Result of Figure Caption Search
12Dynamic Linking in Bibliography
13Testbed Difficulties
- Original plan was to modify Mosaic for search
- Web became commercial -- we lost control of developers
- Plan to use standard BRS as full-text backend
- needed to use SGML-specific OpenText search engine
- good-quality SGML simply not available
- we had to train every publisher; nothing was ready
- SGML interactive display not journal quality
- physics requires equations -- hard to display well
- Custom software hard to deploy widely
- Web widespread but too low-end for professional search
14Testbed Successes
- Willing to build custom encoding procedures
- so succeed with SGML where Elsevier and OCLC failed
- Canonical encoding for structure tags
- so can federate across publishers and journals
- Willing to build custom software for Search
- so able to do multiple views, not single stream like Web
- Production repositories for real Publishers
- became R&D arm of major scientific publishers
- Changing the nature of libraries with research
- research prototype becomes standard service
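
The canonical encoding point can be illustrated with a minimal sketch, assuming each publisher uses its own tag names for the same structural element. The publisher ids, tag names, and sample articles are hypothetical, and well-formed XML parsed with Python's standard library stands in for SGML; this is not the testbed's actual encoding procedure.

```python
import xml.etree.ElementTree as ET

# Hypothetical publisher-specific tag names mapped to one canonical set.
CANONICAL_TAGS = {
    "pub_a": {"ti": "title", "figcap": "caption", "au": "author"},
    "pub_b": {"arttitle": "title", "legend": "caption", "aut": "author"},
}


def canonicalize(publisher, markup):
    """Parse one article and rename publisher tags to canonical tags."""
    root = ET.fromstring(markup)
    mapping = CANONICAL_TAGS[publisher]
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return root


def structure_search(articles, tag, word):
    """Return ids of articles where an element with this canonical tag contains the word."""
    hits = []
    for art_id, root in articles.items():
        for elem in root.iter(tag):
            text = "".join(elem.itertext()).lower()
            if word.lower() in text:
                hits.append(art_id)
                break
    return hits


articles = {
    "a1": canonicalize(
        "pub_a",
        "<art><ti>Laser cooling</ti><figcap>Trap geometry</figcap></art>",
    ),
    "b1": canonicalize(
        "pub_b",
        "<article><arttitle>Bridge loads</arttitle>"
        "<legend>Laser alignment of the trap</legend></article>",
    ),
}

print(structure_search(articles, "caption", "trap"))
print(structure_search(articles, "title", "laser"))
```

Once tags are canonical, one caption query covers both publishers, which is the sense in which a shared encoding enables federation across journals.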
15Technology Transfer
- Illinois DLI considered R&D arm of publishers
- broad spectrum of major publishers in scientific literature
- successful annual partners workshop plus high-level visits
- Technology transferred to Publisher partners
- contract with AIP to clone testbed software processing
- arrangements with ASCE for a second cloning
- Testbed Continuance by University Library
- industrial partners program between Library and Publishers
- company formed to provide software and service
16Technology Research
- Scalable Semantics becoming feasible
- statistical clustering proves useful interactively
- concept spaces and category maps
- Semantic indexes for large collections
- 400K Inspec (1995)
- 4M Compendex (1996)
- Simulation of Community Repositories
- 1000 collections across all of engineering
- testbed for vocabulary switching (federation)
17Vocabulary Switching
- Grand Challenge of Digital Libraries
- semantic interoperability across subject domains
- vocabulary switching to suggest across domains
- Generating 1000 community repositories
- 600 categories across engineering (38 top-level)
- 150 categories across EE, CS, physics
- 3M raw abstracts, about 10M in community spaces
- large-scale supercomputer simulation
- 7 days of dedicated computation (10 days overall)
- have space navigation; need space intersection
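
Vocabulary switching by intersecting two concept spaces can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the simulation: each concept space is assumed to be a dictionary from a term to its weighted related terms, and the function name, weights, and example terms are hypothetical.

```python
from collections import defaultdict


def switch_vocabulary(term, space_a, space_b, top_n=3):
    """Suggest terms from domain B for a term that comes from domain A.

    Bridge terms are the query term plus its neighbours in A; any bridge
    that also appears in B contributes its B neighbours as candidates.
    """
    scores = defaultdict(float)
    bridges = {term: 1.0, **space_a.get(term, {})}
    for bridge, weight_a in bridges.items():
        for candidate, weight_b in space_b.get(bridge, {}).items():
            if candidate != term:
                scores[candidate] += weight_a * weight_b
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:top_n]]


# Hypothetical concept spaces for two subject domains.
civil = {"fluid dynamics": {"turbulence": 0.8, "pipe flow": 0.6}}
bio = {
    "turbulence": {"blood flow": 0.7, "heart valve": 0.5},
    "pipe flow": {"blood flow": 0.4},
}

suggestions = switch_vocabulary("fluid dynamics", civil, bio)
print(suggestions)
```

The query term never appears in the second domain; the suggestions come only from terms the two spaces share, which is why intersection rather than navigation is the missing piece.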
28Multimedia Federation
- Semantic Indexing within Media
- Text, Image, Number
- Semantic Interoperability across Media
- Spatial Data (GIS) dataset intersection
- Multi-site DLI Collaboration
- U Illinois: systems and supercomputers
- U Arizona: algorithms and experiments
- UC Santa Barbara: collections and metadata
29Semantic Analysis of Multimedia
- Collections of Objects containing Units
- Text: community repository (topic proximity)
- document abstracts containing noun phrases
- Image: aerial photograph (spatial proximity)
- feature regions containing texture tiles
- Units are media-dependent (statistical parsers)
- Text: phrase segmentation (nouns on word parts of speech)
- Image: texture segmentation (orientation on pixel densities)
- Indexes are media-independent (statistical clusters)
- Concept: co-occurrence similarity of units within objects
- Category: self-organizing maps of objects within collections
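
A minimal sketch of co-occurrence similarity of units within objects, assuming each object is just a list of units (noun phrases for text, tile labels for images). The asymmetric weight used here, objects containing both units divided by objects containing the first unit, is a simplifying assumption and not necessarily the weighting used in the project; the sample abstracts are hypothetical.

```python
from collections import Counter
from itertools import permutations


def concept_space(objects):
    """Map each unit to its co-occurring units with asymmetric weights.

    weight(a -> b) = (# objects containing both a and b) / (# objects containing a)
    """
    unit_count = Counter()
    pair_count = Counter()
    for units in objects:
        unique = set(units)
        unit_count.update(unique)
        # Ordered pairs, so (a, b) and (b, a) are counted separately.
        pair_count.update(permutations(sorted(unique), 2))
    space = {}
    for (a, b), n in pair_count.items():
        space.setdefault(a, {})[b] = n / unit_count[a]
    return space


# Hypothetical abstracts, already segmented into noun phrases.
abstracts = [
    ["digital library", "information retrieval", "sgml"],
    ["digital library", "information retrieval"],
    ["information retrieval", "concept space"],
]

space = concept_space(abstracts)
print(space["digital library"]["information retrieval"])
print(space["information retrieval"]["digital library"])
```

The function never inspects what a unit is, so the same index could be built over texture tiles within feature regions; only the segmentation step that produces the units depends on the medium.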
30Media Interoperability Experiment
- Feature regions containing texture tiles in aerial photos
- 1M regions in 5K photos around southern California (GIS)
- text concept space and category map in geoscience
- 10M phrases in 500K abstracts from Georef and Petroleum Abstracts
- image concept space and category map in aerial photos
- tile similarity space and visual thesaurus maps (10M tiles)
- numeric satellite sensor data
- 1M NASA AVHRR temperature records, 2M GNIS feature names
- spatial gazetteer as bridge image <-> text <-> number
- images are labeled by GNIS gazetteer (feature names for text search)
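
The gazetteer bridge can be sketched as a point-in-bounding-box join: each photo is labeled with the names of gazetteer features that fall inside its footprint, and those names then support text search over images. This is an illustration under stated assumptions; the record layouts, feature names, and coordinates are hypothetical placeholders, not actual GNIS data.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One gazetteer entry: a named point location."""
    name: str
    lat: float
    lon: float


@dataclass
class Photo:
    """One aerial photo with a latitude/longitude bounding box."""
    photo_id: str
    south: float
    west: float
    north: float
    east: float


def label_photos(photos, gazetteer):
    """Attach to each photo the names of features inside its bounding box."""
    labels = {}
    for p in photos:
        labels[p.photo_id] = sorted(
            f.name
            for f in gazetteer
            if p.south <= f.lat <= p.north and p.west <= f.lon <= p.east
        )
    return labels


def search_photos(labels, word):
    """Text search over images via their gazetteer labels."""
    word = word.lower()
    return sorted(
        pid for pid, names in labels.items()
        if any(word in name.lower() for name in names)
    )


# Hypothetical gazetteer entries and photo footprints.
gazetteer = [
    Feature("Example Lake", 34.10, -118.30),
    Feature("Sample Peak", 34.60, -117.90),
]
photos = [
    Photo("p1", 34.0, -118.5, 34.2, -118.2),
    Photo("p2", 34.5, -118.0, 34.7, -117.8),
]

labels = label_photos(photos, gazetteer)
print(labels)
print(search_photos(labels, "lake"))
```

Numeric records that carry coordinates could be joined to the same footprints in the same way, which is what lets location act as the common key across the three media.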
33Federated Search
- Multiple Indexes in Distributed Repositories
- text search: SGML for full-text articles (Testbed), bibliographic abstracts for full coverage (INSPEC)
- term suggestion: thesaurus for taxonomy (INSPEC)
- concept spaces for term coverage (SGML)
- Multiple View User Interface Client
- uniform displays for multiple indexes
- drag-and-drop between display views to mix-and-match
- uniform search across multiple repositories
- Multiple Protocol Stateful Gateway
- single query stream analog to single user interface
- will handle distributed repositories for federation, e.g. AAS
- Opentext (socket), term-suggest (SQL), Ovid/DRA (Z39.50)
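
The gateway idea, one query stream fanned out to repositories that speak different protocols, can be sketched with an adapter per backend. This is a hedged illustration only: the class names are hypothetical, and in-memory stubs stand in for the socket, SQL, and Z39.50 connections named above.

```python
class Backend:
    """Base adapter: one subclass per repository protocol."""
    name = "backend"

    def search(self, query):
        raise NotImplementedError


class FullTextBackend(Backend):
    """Stand-in for a full-text engine reached over a socket."""
    name = "fulltext"

    def __init__(self, docs):
        self.docs = docs

    def search(self, query):
        q = query.lower()
        return [doc_id for doc_id, text in self.docs.items() if q in text.lower()]


class TermSuggestBackend(Backend):
    """Stand-in for a thesaurus lookup that would be issued as SQL."""
    name = "term-suggest"

    def __init__(self, thesaurus):
        self.thesaurus = thesaurus

    def search(self, query):
        return self.thesaurus.get(query.lower(), [])


class Gateway:
    """Keeps session state and sends each query to every backend."""

    def __init__(self, backends):
        self.backends = backends
        self.history = []  # per-session query history: the "stateful" part

    def query(self, text):
        self.history.append(text)
        return {b.name: b.search(text) for b in self.backends}


gateway = Gateway([
    FullTextBackend({"d1": "SGML structure search", "d2": "Concept spaces"}),
    TermSuggestBackend({"sgml": ["markup languages"]}),
])

results = gateway.query("sgml")
print(results)
print(gateway.history)
```

Adding another repository means adding another adapter; the client still sees a single result structure keyed by index, which matches the uniform displays described for the user interface.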
34IODyne Engineering Search Example
35Building a new Community
- starting the field of Digital Libraries
- IEEE Computer DLI special issue May 1996
- Computer DLI retrospective planned for 1999
- Allerton workshops on DL Sociology
- edited book planned on DL Evaluation
- DLI National Coordination effort
- Illinois DLI retrospective conference (Mar 98)
36The 21st Century Analysis
- Beyond Search to Analysis
- Cross-Correlating Information from many sources across the Net
- The Net solves problems
- Every community has its own special library
- Every community and every person does indexing !!
- The Internet evolves into the Interspace