Title: Alexandria Digital Library (ADL)
Alexandria Digital Library (ADL)
What is ADL? What is it supposed to do?
ADL Mission
- To provide a distributed, spatially searchable digital library of geographically referenced materials.
- The library's components may be distributed (spread across the Internet) or coexist within a single network or desktop.
- Geographically referenced means that all the information objects in the library will be associated with one or more regions ("footprints") on the surface of the Earth.
Alexandria Digital Library (ADL)
- NSF-funded digital library project, 1994-98
- New method to organize and search for information
- Focused on geographical information
- Internet searching and data delivery
- Operational library, 1999-present
- 2.8 million bibliographic records
- 5.5 million place-name records
- 7.5 terabytes of online data
- Available to the public via the Internet
What Is Spatial Information?
Museum Artifacts
Art about
Zoological Habitat Study
Geographical Data Archives
Botanical Survey
Earth Science Data
Books about
Archeological Digs
What information do you have about here?
ADL Organization
- The ADL project has:
- an operational library run by the Davidson Library,
- a research component (ADEPT) funded by NSF and others, and
- a gazetteer (place-name index and geocoder) run by the Davidson Library.
Operational Partners
- Implementers:
- AUT (Auckland University of Technology): software implementation and content builder
- DLESE (Digital Library for Earth Systems Education): software implementation and content builder
- CNR (Center for National Research, Pisa, Italy)
- Content Builders:
- ADEPT: educational classroom content
- CASS (Center for the Analysis of Sacred Sites): video, sound, imagery, text
- ESSW: MODIS real-time spacecraft imagery
- Scripps SIOExplorer: oceanographic data
Alexandria Digital Library (ADL)
History
Prototypes
- Rapid Prototype (CD-ROM, ArcView)
- Java Application
- MARC/FGDC Union Catalog
- Web Version 1
- Search Optimized Fields, AKA Search Buckets
- Java Application
- CDL Web Client
MARC/FGDC Web Prototype (1995)
Java Application Prototype (1997)
Webclient Interface (2002) 1/2
Webclient Interface (2002) 2/2
ADL - Web Gazetteer
Printed Report
Alexandria Digital Library (ADL)
Current ADL Architecture
Common Features of the Prototypes
- Map
- Place name search
- Search definition frame/panel/tab
- Vocabulary support where appropriate
- Standardized citation metadata display/formatting
ADL Architecture Goals (1/2)
- Catalog separate from the data distribution
- Metadata-agnostic search methodology
- Data center reliability
- Collection-level metadata
- Search buckets
- Strongly typed aggregated search fields based on library concepts
- Facilitate quick/easy ingest of collections
- Abstract, searchable indexes
ADL Architecture Goals (2/2)
- Digital library for georeferenced information
- distributed
- heterogeneous
- rich services
- scalable
- many providers
- collections, large and small
- Standard components, interfaces
Components/services
[Diagram: collections of items, with many interconnections between services]
Library Server Architecture
[Diagram components]
- harvest loader, metadata mapper, item tracker
- user interface
- client interface (XML / Java, HTTP, RMI)
- middleware: access control, query fan-out, query result caching, ranking, collection referencing, registration
- collection interface (XML / Java)
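Three of the middleware duties named in the diagram (query fan-out, result caching, ranking) can be illustrated with a small Python sketch. It assumes each collection is a plain function returning `(item_id, score)` pairs and that scores from different collections are directly comparable; neither assumption comes from ADL, and the class and collection names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical collection interface: takes a query string and returns
# (item_id, score) pairs. Real ADL collection interfaces are XML/Java based.
CollectionSearch = Callable[[str], List[Tuple[str, float]]]
Hit = Tuple[str, str, float]  # (collection name, item id, score)


class Middleware:
    """Sketch of three middleware duties: query fan-out, result caching, ranking."""

    def __init__(self, collections: Dict[str, CollectionSearch]) -> None:
        self.collections = collections          # registered collections by name
        self._cache: Dict[str, List[Hit]] = {}  # query -> merged, ranked hits

    def search(self, query: str) -> List[Hit]:
        # Result caching: a repeated query is answered without re-querying.
        if query in self._cache:
            return self._cache[query]
        # Query fan-out: send the same query to every collection in parallel.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn, query)
                       for name, fn in self.collections.items()}
            hits = [(name, item_id, score)
                    for name, fut in futures.items()
                    for item_id, score in fut.result()]
        # Ranking: order the merged hits by descending score.
        hits.sort(key=lambda h: h[2], reverse=True)
        self._cache[query] = hits
        return hits


mw = Middleware({
    "maps": lambda q: [("map-1", 0.9), ("map-2", 0.4)],
    "photos": lambda q: [("photo-7", 0.7)],
})
print(mw.search("Otay Mesa"))
# [('maps', 'map-1', 0.9), ('photos', 'photo-7', 0.7), ('maps', 'map-2', 0.4)]
```

Merging by raw score is the simplest possible ranking policy; a real broker would have to reconcile scores that different collections compute in different ways.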
Architecture - Buckets
Buckets
What is a bucket? (1/3)
- Strongly typed aggregated search fields based on library concepts
- Similar to Dublin Core, but buckets define allowable content and search semantics, and are optimized for geospatial searching
- Facilitate quick/easy ingest of collections
- Abstract, searchable indexes
- Location, Time, Type, Format, Originator, Assigned terms, Subject-related text, and Identifiers
What is a bucket? (2/3)
- Strongly typed, abstract metadata category with defined search semantics to which source metadata is mapped
- Key properties (example: the Coverage date bucket)
- name: Coverage date
- semantic definition: the time period to which the item is relevant
- data type (strictly observed): calendar date or range of calendar dates
- syntactic representation (strictly observed): ISO 8601
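A sketch of how the Coverage date bucket's key properties might be recorded, together with a parser that enforces the strictly observed data type and syntax. `BucketDefinition` and `parse_coverage_date` are hypothetical names, and only the `YYYY-MM-DD` form of ISO 8601 (plus a `start/end` range) is accepted, a deliberate simplification.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Tuple

_ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


@dataclass(frozen=True)
class BucketDefinition:
    """The key properties of a bucket, as listed on the slide."""
    name: str
    semantic_definition: str
    data_type: str                 # strictly observed
    syntactic_representation: str  # strictly observed


COVERAGE_DATE = BucketDefinition(
    name="Coverage date",
    semantic_definition="The time period to which the item is relevant.",
    data_type="calendar date or range of calendar dates",
    syntactic_representation="ISO 8601",
)


def parse_coverage_date(value: str) -> Tuple[date, date]:
    """Strictly parse 'YYYY-MM-DD' or a range 'YYYY-MM-DD/YYYY-MM-DD'.

    Only this small subset of ISO 8601 is accepted (an assumption made to
    keep the sketch short); anything else raises ValueError.
    """
    parts = value.split("/")
    if len(parts) > 2 or not all(_ISO_DAY.fullmatch(p) for p in parts):
        raise ValueError(f"not an ISO 8601 date or date range: {value!r}")
    # fromisoformat also rejects impossible dates such as 2001-13-01.
    start, end = date.fromisoformat(parts[0]), date.fromisoformat(parts[-1])
    if start > end:
        raise ValueError(f"range starts after it ends: {value!r}")
    return start, end


print(parse_coverage_date("2001-09-08"))
print(parse_coverage_date("2001-01-01/2001-12-31"))
```

A single date is returned as a one-day range, so every stored value has the same shape regardless of how it was written.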
What is a bucket? (3/3)
- Source metadata is mapped to buckets
- Buckets hold not just simple values, e.g. 2001-09-08, but explicit descriptions of those values, e.g. (FGDC, 1.3, Time period of content, 2001-09-08)
- Multiple values may be mapped per bucket
- Bucket definition includes search semantics
- defines query terms: ISO 8601 date range
- defines query operators: contains, overlaps, is-contained-in
- Semantics are slightly fuzzy in certain cases to accommodate multiple implementations
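The query operators for a date-range bucket can be written as interval tests. This Python sketch stores each mapped value with its explicit description, as in the (FGDC, 1.3, Time period of content, 2001-09-08) example, and treats a bucket as matching when any mapped value satisfies the operator; that any-value rule and all function names are assumptions, not ADL's specification.

```python
from datetime import date
from typing import Callable, Dict, List, NamedTuple, Tuple

DateRange = Tuple[date, date]  # inclusive (start, end)


class MappedValue(NamedTuple):
    """An explicit description of a value, e.g.
    (FGDC, 1.3, Time period of content, 2001-09-08)."""
    schema: str
    field_id: str
    field_name: str
    value: DateRange


def overlaps(a: DateRange, b: DateRange) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contains(a: DateRange, b: DateRange) -> bool:
    return a[0] <= b[0] and b[1] <= a[1]


def is_contained_in(a: DateRange, b: DateRange) -> bool:
    return contains(b, a)


OPERATORS: Dict[str, Callable[[DateRange, DateRange], bool]] = {
    "overlaps": overlaps,
    "contains": contains,
    "is-contained-in": is_contained_in,
}


def bucket_matches(values: List[MappedValue], op: str, query: DateRange) -> bool:
    """True if some mapped value v satisfies `v op query` (any-value rule assumed)."""
    test = OPERATORS[op]
    return any(test(v.value, query) for v in values)


day = date(2001, 9, 8)
values = [MappedValue("FGDC", "1.3", "Time period of content", (day, day))]
year_2001 = (date(2001, 1, 1), date(2001, 12, 31))
print(bucket_matches(values, "is-contained-in", year_2001))  # True
print(bucket_matches(values, "contains", year_2001))         # False
```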
Standard buckets (ADL bucket: Dublin Core counterpart)
- Subject-related text: DC.Subject
- Title: DC.Title
- Assigned term: DC.Subject (qualified)
- Originator: DC.Creator, DC.Publisher
- Geographic location: DC.Coverage.Spatial
- Coverage date: DC.Coverage.Temporal
- Object type: DC.Type
- Format: DC.Format
- Identifier: DC.Identifier
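A crosswalk from Dublin Core into ADL buckets can be as simple as a lookup table built from these pairs. In this sketch the key `DC.Subject.qualified` is a hypothetical stand-in for a qualified subject, elements with no bucket (here `DC.Rights`) are silently dropped, and the sample record is illustrative; all three are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Dublin Core element -> ADL bucket, following the correspondence above.
DC_TO_ADL = {
    "DC.Subject": "Subject-related text",
    "DC.Title": "Title",
    "DC.Subject.qualified": "Assigned term",  # hypothetical key for a qualified subject
    "DC.Creator": "Originator",
    "DC.Publisher": "Originator",
    "DC.Coverage.Spatial": "Geographic location",
    "DC.Coverage.Temporal": "Coverage date",
    "DC.Type": "Object type",
    "DC.Format": "Format",
    "DC.Identifier": "Identifier",
}


def map_to_buckets(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map a DC record (element -> values) into ADL buckets.
    Elements without a bucket are ignored in this sketch."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for element, values in record.items():
        bucket = DC_TO_ADL.get(element)
        if bucket is not None:
            buckets[bucket].extend(values)
    return dict(buckets)


record = {
    "DC.Creator": ["U.S. Geological Survey"],
    "DC.Publisher": ["Stephen P. Teale Data Center"],
    "DC.Title": ["DRG of Otay Mesa, CA"],
    "DC.Rights": ["none stated"],
}
print(map_to_buckets(record))
```

Two DC elements feeding one bucket (Creator and Publisher into Originator) is exactly the many-to-one mapping the bucket model is meant to absorb.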
Bucket Motivation
- Heterogeneous metadata
- Uniform client services
- Spatial search requires
- Strongly typed search fields
- Optimized for geospatial searching
Summary
- A bucket is a strongly typed, abstract metadata category with defined search semantics to which source metadata is mapped
- Supports discovery/search across distributed, heterogeneous collections that use metadata structures of their choosing
- Supports high-level searching across collections and drill-down searching to the item-level metadata elements
Benefits of the Architecture
- Standard, Readily Optimized Search Methodology
- Simplifies Design
- Provides a client with a standard API for searching different data sources
- Provides a way to discover changed data locations
- Scalability
- Scale by upgrading the database
- Scale by distributing the databases
ADL Metadata
Collection Ingest Procedure
San Diego DRG Metadata Processing
Processes
[Flowchart: Organize Metadata, then Create Metadata, producing ADL Metadata]
- Organize Metadata: separate shared and unique values for every record
- Shared (parent) values, e.g. Publisher: Stephen P. Teale Data Center
- Assign ADL control number
- Unique (child) values, e.g. Title of particular DRG: "Digital Raster Graphic, DRG of Otay Mesa, CA, 7.5-minute topographic quadrangle."
- Create Metadata: create values for required fields for which we don't have info/metadata
- Search (visit Teale web pages for DRG production information)
- Original cataloging (access path)
- Calculation (determining resolution and footprint)
- Result: ADL Metadata
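The Organize Metadata step, separating values shared by every record from values unique to each, might look like the following Python sketch. Records are assumed to be flat field-to-value dictionaries; the field names and the second quadrangle title are hypothetical sample data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_parent_child(records: List[Record]) -> Tuple[Record, List[Record]]:
    """Split field values shared by every record (parent) from values that
    differ between records (children). A field is shared only if every
    record has it with the same value."""
    if not records:
        return {}, []
    first = records[0]
    shared = {k: v for k, v in first.items()
              if all(r.get(k) == v for r in records[1:])}
    children = [{k: v for k, v in r.items() if k not in shared} for r in records]
    return shared, children


records = [
    {"publisher": "Stephen P. Teale Data Center", "title": "DRG of Otay Mesa, CA"},
    {"publisher": "Stephen P. Teale Data Center", "title": "DRG of Imperial Beach, CA"},
]
parent, children = split_parent_child(records)
print(parent)    # {'publisher': 'Stephen P. Teale Data Center'}
print(children)  # [{'title': 'DRG of Otay Mesa, CA'}, {'title': 'DRG of Imperial Beach, CA'}]
```

With a single input record every field counts as shared, so a real ingest step would likely need an explicit rule for that case.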
Collection-level metadata
Alexandria Digital Library
Future Directions
Core Service Directions
- Lowering the barrier
- metadata management services
- OAI harvest loader
- improved packaging
- Service aggregation via harvesting
- Content-based searches, ranking
- text IR, image texture
- Collection discovery
- Display results over the map - layering
- Storage of user result sets on server
The Ideal ADL Entry Portal
- The Portal will be:
- Easy to use: allows a patron to search the collection without knowing keywords or jargon
- Flexible: allows users of differing levels of geographic knowledge to find the data they seek in the minimum amount of time
- Help-oriented: if users do not find what they want, we in MIL will find out and use that knowledge to develop the collection
- Dynamic: so that users will want to return to see the latest features, collections, and tools
- Educational: so that users can learn to use the site more effectively
- Interesting: uncluttered, new data, featured events
New Interface Functional Areas
Summary
- Distributed, service-based architecture
- two search levels
- heterogeneous, native metadata
- rich, uniform services
- Status
- basis of UCSB MIL operational library
- http://webclient.alexandria.ucsb.edu
- downloadable
- http://www.alexandria.ucsb.edu/middleware
- initial full version late 2002
Stay Tuned
Thanks and
Contact Information
- Larry Carver, AUL Library Technologies
- carver_at_library.ucsb.edu 805-893-4433
- Catherine Masi, ADL Coordinator
- masi_at_library.ucsb.edu 805-893-7661
- David Valentine, Senior Systems Engineer
- valentine_at_library.ucsb.edu 805-893-4545
Lessons Learned
- Spatial orientation is a problem for some users; help/guidance is needed
- Crosswalking between different metadata systems can result in loss of information during translation
- Technical metadata standards are not written with information discovery in mind
- Not every field in a technical metadata standard will end up being populated
- For search and discovery, it is the indexing of the metadata that is important
- Geographic and text queries can cause problems with query optimizers
- Every prototype seems to lose a few features when moved to the next prototype
Collection-level aggregation
- Collection-level metadata describes
- buckets supported by the collection
- item-level metadata mappings
- statistical overviews
- item counts
- spatiotemporal coverage histograms
- Example (de-XML-ized)
- in collection foo, the Originator bucket is supported and the following item fields are mapped to it:
- (FGDC, 1.1/8.1, Citation/Originator): 973 items
- (USGS DOQ, PRODUCER, Producer): 973 items
- (DC, Creator, Creator): 1249 items
- unknown: 6 items
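The item-count part of such a statistical overview could be computed as below. The item representation (bucket name mapped to the list of source fields feeding it, with `None` for an unknown source) is an assumption made for illustration, and the sample counts are not the slide's figures.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Field = Tuple[str, str, str]  # (schema, field id, field name)
# Hypothetical item shape: bucket name -> source fields mapped into it,
# with None standing for a value whose source field is unknown.
Item = Dict[str, List[Optional[Field]]]


def mapping_overview(items: Iterable[Item], bucket: str) -> Counter:
    """Count, for one bucket, how many items draw on each source field."""
    counts: Counter = Counter()
    for item in items:
        # set(): an item counts once per field even if it maps several values.
        for source in set(item.get(bucket, [])):
            counts[source if source is not None else "unknown"] += 1
    return counts


FGDC = ("FGDC", "1.1/8.1", "Citation/Originator")
DOQ = ("USGS DOQ", "PRODUCER", "Producer")
DC = ("DC", "Creator", "Creator")
items = [
    {"Originator": [FGDC, DOQ]},
    {"Originator": [FGDC, DOQ]},
    {"Originator": [DC, DC]},
    {"Originator": [None]},
    {"Title": []},  # Originator not populated: not counted
]
print(mapping_overview(items, "Originator"))
```

Because one item can feed several fields into the same bucket, the per-field counts can sum to more than the number of items, as the FGDC and DOQ rows in the example suggest.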
Searching collections
- Bucket-level
- uniform across all collections
- example: search all collections for items whose Originator bucket contains the phrase "geological survey"
- Field-level
- collection-specific
- but discovery and invocation mechanisms are uniform
- functionally equivalent to searching the entire bucket plus an additional constraint
- example: search collection foo for items whose FGDC 1.1/8.1 field within the Originator bucket contains the phrase
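A sketch of the two search levels over one collection: a bucket-level phrase search, and a field-level search that is the same search plus a constraint on the source field. The item layout, the sample values, and the case-insensitive substring test standing in for "contains the phrase" are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Field = Tuple[str, str, str]  # (schema, field id, field name)
# Hypothetical item layout: bucket name -> list of (source field, text value).
Item = Dict[str, List[Tuple[Field, str]]]


def search_bucket(collection: Dict[str, Item], bucket: str, phrase: str,
                  field: Optional[Field] = None) -> List[str]:
    """Return ids of items whose `bucket` contains `phrase`.

    With `field` set, only values mapped from that source field are
    examined: a bucket search plus one extra constraint.
    """
    needle = phrase.lower()
    hits = []
    for item_id, item in collection.items():
        for source, text in item.get(bucket, []):
            if field is not None and source != field:
                continue
            if needle in text.lower():
                hits.append(item_id)
                break  # one matching value is enough for this item
    return hits


FGDC = ("FGDC", "1.1/8.1", "Citation/Originator")
DC = ("DC", "Creator", "Creator")
foo = {
    "a": {"Originator": [(FGDC, "U.S. Geological Survey")]},
    "b": {"Originator": [(DC, "California Geological Survey")]},
    "c": {"Originator": [(FGDC, "Stephen P. Teale Data Center")]},
}
print(search_bucket(foo, "Originator", "geological survey"))              # ['a', 'b']
print(search_bucket(foo, "Originator", "geological survey", field=FGDC))  # ['a']
```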
ADL Middleware Details
Middleware server
[Diagram, logical view and functional view: client, collection discovery service, middleware server, collections, thesaurus/vocabulary, items]
Middleware server functions:
- local access point
- standard services
- access control
- thin client support
- distributed search
- brokering of queries/results
- proxying of collections/items
- creation/organization of local collections
Interoperability problem
- Distributed, heterogeneous collections
- locally, autonomously created and managed
- Minimal requirements on collection providers
- allow use of native metadata
- Provide uniform client services
- common high-level interface across collections
- structured means of discovering and exploiting (possibly collection-specific) lower-level interfaces
- Assumptions
- items have metadata
- items have sufficient, good metadata
- i.e., this is a metadata interoperability problem
ADL core buckets (1/6)
- Subject-related text
- Title
- Assigned term
- Originator
- Geographic location
- Coverage date
- Object type
- Feature type
- Format
- Identifier
Aggregated indexes
ADL core buckets (2/6)
- Subject-related text
- type: textual
- description: text indicative of the subject of the item, not necessarily from controlled vocabularies
- superset of Title and Assigned term
- multiple values concatenated
- compare DC.Subject
- Title
- type: textual
- description: the item's title
- subset of Subject-related text
- multiple values concatenated
- compare DC.Title
ADL core buckets (3/6)
- Assigned term
- type: textual
- description: subject-related terms from controlled vocabularies
- subset of Subject-related text
- multiple values concatenated
- compare qualified DC.Subject
- Originator
- type: textual
- description: names of entities related to the origination of the item
- multiple values concatenated
- compare DC.Creator, DC.Publisher
ADL core buckets (4/6)
- Geographic location
- type: spatial
- description: the subset of the Earth's surface to which the item is relevant
- multiple values unioned
- compare DC.Coverage.Spatial
- Coverage date
- type: temporal
- description: the calendar dates to which the item is relevant
- multiple values unioned
- compare DC.Coverage.Temporal
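For the Geographic location bucket, a footprint with several mapped values can be treated as the union of its parts. This sketch assumes footprints are lat/lon boxes that do not cross the antimeridian and implements only the overlaps operator; the names are hypothetical and the coordinates illustrative.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Lat/lon bounding box in decimal degrees."""
    south: float
    west: float
    north: float
    east: float


def box_overlaps(a: Box, b: Box) -> bool:
    """Boxes overlap when their latitude and longitude intervals both overlap."""
    return (a.south <= b.north and b.south <= a.north
            and a.west <= b.east and b.west <= a.east)


def footprint_overlaps(footprint: List[Box], query: Box) -> bool:
    """A multi-valued footprint is the union of its boxes, so it overlaps
    the query if any single box does."""
    return any(box_overlaps(part, query) for part in footprint)


footprint = [Box(32.5, -117.0, 32.625, -116.875),   # illustrative 7.5-minute cell
             Box(34.0, -120.0, 34.125, -119.875)]
print(footprint_overlaps(footprint, Box(32.55, -116.95, 32.6, -116.9)))  # True
print(footprint_overlaps(footprint, Box(40.0, -100.0, 41.0, -99.0)))     # False
```

Only overlaps is shown because it distributes over a union: the union overlaps the query exactly when some part does. Contains does not work that way, since several boxes can jointly cover a query that no single box covers.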
ADL core buckets (5/6)
- Object type
- type: hierarchical
- vocabulary: ADL Object Type Thesaurus (image, map, thesis, sound recording, etc.)
- multiple values unioned
- compare DC.Type
- Feature type
- type: hierarchical
- vocabulary: ADL Feature Type Thesaurus (river, mountain, park, city, etc.)
- multiple values unioned
- compare: none
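Hierarchical buckets are searched with an "is a" test against a thesaurus. A minimal sketch follows, assuming a single-parent hierarchy stored as a term-to-broader-term map; the terms and links are illustrative and are not taken from the ADL Feature Type Thesaurus.

```python
from typing import Dict, Optional

# Illustrative fragment of a feature-type hierarchy, term -> broader term.
# These terms and links are assumptions for the example, not ADL FTT content.
BROADER: Dict[str, Optional[str]] = {
    "hydrographic features": None,
    "streams": "hydrographic features",
    "rivers": "streams",
    "parks": None,
}


def is_a(term: str, query_term: str,
         broader: Dict[str, Optional[str]] = BROADER) -> bool:
    """Hierarchical 'is a' test: True if `term` is `query_term` or one of
    its narrower terms. Assumes each term has at most one broader term."""
    seen = set()
    current: Optional[str] = term
    while current is not None and current not in seen:  # guard against cycles
        if current == query_term:
            return True
        seen.add(current)
        current = broader.get(current)
    return False


print(is_a("rivers", "hydrographic features"))  # True
print(is_a("hydrographic features", "rivers"))  # False
```

A query for a broad term therefore also retrieves items indexed with any narrower term; a polyhierarchical thesaurus would need a graph search instead of this single upward walk.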
ADL core buckets (6/6)
- Format
- type: hierarchical
- vocabulary: ADL Object Format Thesaurus (loosely based on MIME)
- multiple values unioned
- compare DC.Format
- Identifier
- type: qualified textual
- description: names and codes that function as unique identifiers
- multiple values treated separately
- compare DC.Identifier
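The three ways of combining multiple values in the core buckets (concatenated, unioned, treated separately) can be contrasted in a few Python functions. The function names and sample values are hypothetical, and representing an identifier as an optional namespace plus a string is an assumption.

```python
from typing import List, Optional, Set, Tuple


def combine_textual(values: List[str]) -> str:
    """Textual buckets: multiple values are concatenated into one text."""
    return " ".join(values)


def subject_related_text(titles: List[str], assigned_terms: List[str],
                         other_text: List[str]) -> str:
    """Subject-related text is a superset of Title and Assigned term."""
    return combine_textual(titles + assigned_terms + other_text)


def combine_hierarchical(values: List[str]) -> Set[str]:
    """Hierarchical buckets (Object type, Feature type, Format): values are unioned."""
    return set(values)


def identifier_matches(values: List[Tuple[Optional[str], str]],
                       identifier: str, namespace: Optional[str] = None) -> bool:
    """Identifier bucket: each (namespace, identifier) value is treated
    separately, so the query must match one whole value exactly."""
    return any(ident == identifier and (namespace is None or ns == namespace)
               for ns, ident in values)


print(subject_related_text(["DRG of Otay Mesa, CA"], ["topographic maps"], []))
print(sorted(combine_hierarchical(["map", "image", "map"])))  # ['image', 'map']
ids = [("ADL", "hypothetical-0001"), (None, "otay-mesa-drg")]
print(identifier_matches(ids, "hypothetical-0001", namespace="ADL"))  # True
```

One side effect of concatenation in this sketch is that a phrase can match across the boundary between two values; keeping identifiers separate avoids that for exact-match lookups.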
Target Audience: Phase 1 Simplified Interface
- 1. UCSB students, faculty, and staff
- 2. University of California students, faculty, and staff
- 3. Researchers
- 4. Other academic institutions
- 5. GIS/map producers
- 6. Non-UC/casual users/other local clients/general web users
What ADL Does
- Provides quick and accurate answers to the question "What data is available for this geographic area?"
- Provides both online spatial content and metadata of library holdings for local and distributed collections
- Internet discovery, access, and delivery
- ADL may be searched using background maps and other imagery, as well as by geographic place names
- The ADL project has two venues: an operational library run by the Davidson Library and a research component (ADEPT) funded by NSF and others
Future Research Tool Initiatives
- Simplified search interface
- Searching by content
- Pattern matching using image segmentation and texture
- Image registration for automatic time-sequence comparison and geo-rectification
- Security-based access to data objects (smartcard?)
- Easy interfaces for integrating spatially referenced data with other types of information sources
- Improved gazetteer services for geocoding
Goals of the ADL project
- Provide geospatial access to all classes of information
- Provide access to both library and personal collections
- Provide supporting information services for
- research
- learning
- Be part of a distributed (spatial) information infrastructure
- Position UCSB as a national leader in geospatial information
Constraint types
- Spatial
- overlaps, contains, ...
- lat/lon polygon, box
- Temporal
- overlaps, contains, ...
- date range
- Numeric
- lt, , gt,
- real number
- optional unit of measure
- Textual
- contains phrase, ...
- word list
- Hierarchical
- is a
- thesaurus term
- Identification
- matches
- string, optionally namespace-qualified
Booleans: AND, OR, AND NOT
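Constraints of these types combine with the Boolean operators into a query tree. A minimal evaluator sketch follows; `Constraint` and `BoolOp` are hypothetical names, and each leaf is modeled as an arbitrary predicate on one bucket value rather than as one of the typed operators listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Item = Dict[str, object]  # bucket name -> bucket value (illustrative)


@dataclass
class Constraint:
    """Leaf: a bucket name plus a predicate on that bucket's value."""
    bucket: str
    test: Callable[[object], bool]


@dataclass
class BoolOp:
    op: str  # "AND", "OR", or "AND NOT"
    left: "Query"
    right: "Query"


Query = Union[Constraint, BoolOp]


def evaluate(q: Query, item: Item) -> bool:
    """Evaluate a query tree against one item; a missing bucket fails its leaf."""
    if isinstance(q, Constraint):
        return q.bucket in item and q.test(item[q.bucket])
    left, right = evaluate(q.left, item), evaluate(q.right, item)
    if q.op == "AND":
        return left and right
    if q.op == "OR":
        return left or right
    if q.op == "AND NOT":
        return left and not right
    raise ValueError(f"unknown operator {q.op!r}")


item = {"Title": "DRG of Otay Mesa, CA", "Format": "image/tiff"}
title_q = Constraint("Title", lambda t: "otay mesa" in str(t).lower())
tiff_q = Constraint("Format", lambda f: f == "image/tiff")
print(evaluate(BoolOp("AND", title_q, tiff_q), item))      # True
print(evaluate(BoolOp("AND NOT", title_q, tiff_q), item))  # False
```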
ADL Interoperability Architecture
[Diagram, logical view: client, collection discovery service, collections]
Bucket mapping
[Diagram: source fields FGDC Citation/Originator and USGS DOQ Producer both map to the Originator bucket]
Collection discovery
- Collection registry polls known library servers
- Relevance model
- binary
- more is better
- Query language
- range searching over space, time, vocabulary terms
- subset of item-level query language
- Limitations
- no joint constraint conditions
- no text statistics à la STARTS
- multiple, overlapping vocabularies
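Binary relevance over collection-level ranges might be sketched as below. The `CollectionSummary` shape and the sample collections are assumptions. The example also shows why "no joint constraint conditions" matters: each constraint is checked on its own against collection-wide ranges, so a collection can match without containing any single item that meets all constraints together.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east)
DateRange = Tuple[date, date]


@dataclass
class CollectionSummary:
    """Collection-level metadata as a registry might hold it (assumed shape)."""
    name: str
    extent: Box
    dates: DateRange
    terms: Set[str] = field(default_factory=set)


def _overlap(lo1, hi1, lo2, hi2) -> bool:
    return lo1 <= hi2 and lo2 <= hi1


def discover(collections: List[CollectionSummary], box: Optional[Box] = None,
             dates: Optional[DateRange] = None,
             term: Optional[str] = None) -> List[str]:
    """Binary relevance: a collection either passes every given constraint or not."""
    hits = []
    for c in collections:
        if box is not None and not (
                _overlap(c.extent[0], c.extent[2], box[0], box[2])
                and _overlap(c.extent[1], c.extent[3], box[1], box[3])):
            continue
        if dates is not None and not _overlap(c.dates[0], c.dates[1], dates[0], dates[1]):
            continue
        if term is not None and term not in c.terms:
            continue
        hits.append(c.name)
    return hits


drg = CollectionSummary("drg", (32.5, -117.5, 33.5, -116.0),
                        (date(1990, 1, 1), date(2000, 12, 31)), {"maps"})
photos = CollectionSummary("photos", (34.0, -120.0, 34.5, -119.5),
                           (date(1950, 1, 1), date(1960, 12, 31)), {"aerial photographs"})
print(discover([drg, photos], box=(32.55, -117.0, 32.6, -116.9)))  # ['drg']
print(discover([drg, photos], dates=(date(1955, 1, 1), date(1956, 1, 1))))  # ['photos']
```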
Architecture
- Search buckets
- Abstract, searchable indexes
- Similar to Dublin Core, but buckets define allowable content and search semantics, and are optimized for geospatial searching
- Designed to make populating collections easy
- Includes all traditional library search elements
Data model
- Collection
- name
- static, dynamic metadata
- set of items
- functional behaviors
- Item
- identifier
- bucket view
- searchable metadata mapped to standard, typed buckets
- browse view
- content abstracts
- Item, cont'd
- access view
- multiple access points
- file-like
- human interface
- programmatic service
- offline
- other views
- collection- and/or item-specific
- FGDC, MARC, etc.
- content
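The data model above maps naturally onto a few record types. This sketch assumes plain Python dataclasses; functional behaviors and the content itself are left out, the sample values are hypothetical, and keeping an item count as dynamic metadata is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AccessPoint:
    kind: str      # "file-like", "human interface", "programmatic service", or "offline"
    location: str  # e.g. a URL or a shelf location (illustrative)


@dataclass
class Item:
    identifier: str
    bucket_view: Dict[str, List[Any]] = field(default_factory=dict)  # bucket -> mapped values
    browse_view: List[str] = field(default_factory=list)            # content abstracts
    access_view: List[AccessPoint] = field(default_factory=list)    # multiple access points
    other_views: Dict[str, str] = field(default_factory=dict)       # e.g. "FGDC" -> record text


@dataclass
class Collection:
    name: str
    static_metadata: Dict[str, Any] = field(default_factory=dict)
    dynamic_metadata: Dict[str, Any] = field(default_factory=dict)
    items: Dict[str, Item] = field(default_factory=dict)

    def add(self, item: Item) -> None:
        """Add an item and refresh a simple dynamic statistic (assumed)."""
        self.items[item.identifier] = item
        self.dynamic_metadata["item_count"] = len(self.items)


foo = Collection("foo", static_metadata={"title": "Example collection"})
foo.add(Item("item-1",
             bucket_view={"Title": ["DRG of Otay Mesa, CA"]},
             access_view=[AccessPoint("offline", "hypothetical shelf location")]))
print(foo.dynamic_metadata)  # {'item_count': 1}
```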
The User's Viewpoint
What I want to know when searching
- What exists?
- Where is it located?
- Is it useful given my needs?
- How do I get it?
- Is it in a form that I can use?
- What are the conditions of use?
- Is it in original or altered form?
- How big is the digital file?