Alexandria Digital Library ADL - PowerPoint PPT Presentation

About This Presentation
Title:

Alexandria Digital Library ADL

Description:

Facilitate quick/easy ingest of collections. Abstract, searchable indexes ... Metadata Ingest. Collection Ingest Procedure. San Diego DRG Metadata Processing ... – PowerPoint PPT presentation

Slides: 55
Provided by: catheri135

Transcript and Presenter's Notes

Title: Alexandria Digital Library ADL


1
Alexandria Digital Library (ADL)
What is ADL? What is it supposed to do?
2
ADL Mission
  • To provide a distributed spatially searchable
    digital library of geographically referenced
    materials.
  • The library's components may be distributed
    (spread across the Internet) or coexist within a
    single network or desktop.
  • Geographically-referenced means that all the
    information objects in the library will be
    associated with one or more regions
    ("footprints") on the surface of the Earth.

3
Alexandria Digital Library (ADL)
  • NSF funded digital library project 1994-98
  • New method to organize and search for information
  • Focused on geographical information
  • Internet searching and data delivery
  • Operational library 1999-Present
  • 2.8 million bibliographic records
  • 5.5 million place names records
  • 7.5 terabytes of on-line data
  • Available to the public via the Internet

4
What Is Spatial Information?
Museum Artifacts
Art about
Zoological Habitat Study
Geographical Data Archives
Botanical Survey
Earth Science Data
Books about
Archeological Digs
5
What information do you have about here?
6
ADL Organization
  • The ADL project has
  • An operational library run by the Davidson
    Library,
  • A research component (ADEPT) funded by NSF and
    others, and
  • A gazetteer (place name index and geocoder) run
    by the Davidson Library

7
Operational Partners
  • Implementers
  • AUT (Auckland University of Technology)
    Software implementation and content builder
  • DLESE (Digital Library for Earth Systems
    Education) Software implementation and content
    builder
  • CNR (Consiglio Nazionale delle Ricerche, Pisa,
    Italy) Content builders
  • ADEPT Educational classroom content
  • CASS (Center for the Analysis of Sacred Sites)
    Video, sound, imagery, text
  • ESSW MODIS real-time spacecraft imagery
  • Scripps SIOExplorer Oceanographic Data

8
Alexandria Digital Library (ADL)
History
9
Prototypes
  • Rapid Prototype (CD-ROM, ArcView)
  • Java Application
  • MARC and FGDC Union Catalog
  • Web Version 1
  • Search Optimized Fields, AKA Search Buckets
  • Java Application
  • CDL Web Client

10
MARC and FGDC Web Prototype (1995)
11
Java Application Prototype (1997)
12
Webclient Interface (2002) 1/2
13
Webclient Interface (2002) 2/2
14
ADL - Web Gazetteer
Printed Report
15
Alexandria Digital Library (ADL)
Current ADL Architecture
16
Common Features of the Prototypes
  • Map
  • Place name search
  • Search definition frame/panel/tab
  • Vocabulary support where appropriate
  • Standardized citation metadata
    display/formatting

17
ADL Architecture Goals (1/2)
  • Catalog separate from the data distribution
  • Metadata agnostic search methodology
  • Data center reliability
  • Collection level metadata
  • Search buckets
  • Strongly typed aggregated search field based on
    library concepts
  • Facilitate quick/easy ingest of collections
  • Abstract, searchable indexes

18
ADL Architecture Goals (2/2)
  • Digital library for georeferenced information
  • distributed
  • heterogeneous
  • rich services
  • scalable
  • many providers
  • collections, large and small
  • Standard components, interfaces

19
Components/services
[Diagram: collections and items linked by many
interconnections between services]
20
Library Server Architecture
[Diagram: library server components]
  • User interface
  • Client interface (XML / Java, HTTP, RMI)
  • Middleware: access control, query fan-out,
    query result caching, ranking, collection
    referencing, registration
  • Collection interface (XML / Java)
  • Harvest loader, metadata mapper, item tracker
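The middleware's query fan-out and query result caching can be illustrated with a minimal Python sketch. It assumes each collection can be modeled as a function from a query string to a list of item identifiers; the class, method and collection names (`FanOutMiddleware`, `search`, `maps`, `photos`) are hypothetical and are not ADL's actual middleware API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical: a collection is modeled as a function that takes a query
# string and returns the identifiers of matching items.
CollectionSearch = Callable[[str], List[str]]


class FanOutMiddleware:
    """Toy middleware: fan a query out to every collection and cache merged results."""

    def __init__(self, collections: Dict[str, CollectionSearch]) -> None:
        self.collections = collections
        self._cache: Dict[str, List[Tuple[str, str]]] = {}

    def search(self, query: str) -> List[Tuple[str, str]]:
        # Repeated queries are answered from the result cache.
        if query in self._cache:
            return self._cache[query]
        with ThreadPoolExecutor() as pool:
            # Send the same query to all collections concurrently.
            futures = {
                name: pool.submit(search_fn, query)
                for name, search_fn in self.collections.items()
            }
            # Merge the hits, tagging each with its source collection.
            results = [
                (name, item_id)
                for name, future in sorted(futures.items())
                for item_id in future.result()
            ]
        self._cache[query] = results
        return results


def maps(query: str) -> List[str]:
    return ["map-1"] if "otay" in query.lower() else []


def photos(query: str) -> List[str]:
    return ["photo-7", "photo-9"] if "otay" in query.lower() else []


middleware = FanOutMiddleware({"maps": maps, "photos": photos})
print(middleware.search("Otay Mesa"))
# [('maps', 'map-1'), ('photos', 'photo-7'), ('photos', 'photo-9')]
```

Ranking, access control and collection referencing are left out; a production fan-out would also need timeouts so one slow collection cannot stall the whole query.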
21
Architecture - Buckets
Buckets
22
What is a bucket? (1/3)
  • Strongly-typed aggregated search fields based on
    library concepts
  • Similar to Dublin Core, but define allowable
    content and search semantics, and are optimized
    for geospatial searching
  • Facilitate quick/easy ingest of collections
  • Abstract, searchable indexes
  • Location, Time, Type, Format, Originator,
    Assigned terms, Subject related text and
    Identifiers

23
What is a bucket? (2/3)
  • Strongly typed, abstract metadata category with
    defined search semantics to which source metadata
    is mapped
  • Key properties (example: the Coverage date bucket)
  • name: Coverage date
  • semantic definition: the time period to which the
    item is relevant
  • data type (strictly observed): calendar date or
    range of calendar dates
  • syntactic representation (strictly observed):
    ISO 8601
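The key properties of a bucket can be captured in a small record type. This Python sketch assumes coverage dates arrive either as one calendar date or as two dates joined by `/` (an ISO 8601 interval form); `BucketDefinition`, `COVERAGE_DATE` and `parse_coverage_date` are hypothetical names, and full ISO 8601 handling covers many more forms than this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class BucketDefinition:
    """The key properties of a bucket listed above."""
    name: str
    semantic_definition: str
    data_type: str
    syntactic_representation: str


COVERAGE_DATE = BucketDefinition(
    name="Coverage date",
    semantic_definition="The time period to which the item is relevant.",
    data_type="calendar date or range of calendar dates",
    syntactic_representation="ISO 8601",
)


def parse_coverage_date(value: str) -> Tuple[date, date]:
    """Strictly parse a calendar date or a 'start/end' date range.

    Assumption: each part must be accepted by date.fromisoformat
    (e.g. 2001-09-08); anything else raises ValueError, so the data
    type and syntax are "strictly observed".
    """
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"not a date or date range: {value!r}")
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])
    if end < start:
        raise ValueError(f"range ends before it starts: {value!r}")
    return start, end
```

Rejecting malformed values at ingest, rather than at query time, is what lets every collection's Coverage date bucket be searched with the same operators.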

24
What is a bucket? (3/3)
  • Source metadata is mapped to buckets
  • Buckets hold not just simple values (2001-09-08)
    but explicit descriptions of those values:
    (FGDC, 1.3, Time period of content, 2001-09-08)
  • Multiple values may be mapped per bucket
  • Bucket definition includes search semantics
  • defines query terms: ISO 8601 date range
  • defines query operators: contains, overlaps,
    is-contained-in
  • semantics are slightly fuzzy in certain cases to
    accommodate multiple implementations
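A mapped value and the three temporal operators might look like this in Python. It is a sketch under stated assumptions: date ranges have inclusive endpoints and a single date is treated as a one-day range; as the slide notes, implementations may interpret these semantics slightly differently. The type and function names are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class BucketValue(NamedTuple):
    """A value mapped into a bucket, plus a description of its source."""
    standard: str    # e.g. "FGDC"
    field_id: str    # e.g. "1.3"
    field_name: str  # e.g. "Time period of content"
    value: str       # e.g. "2001-09-08"


class DateRange(NamedTuple):
    start: date
    end: date  # inclusive


def contains(a: DateRange, b: DateRange) -> bool:
    """a contains b: every day of b falls within a."""
    return a.start <= b.start and b.end <= a.end


def is_contained_in(a: DateRange, b: DateRange) -> bool:
    """a is contained in b: the inverse of contains."""
    return contains(b, a)


def overlaps(a: DateRange, b: DateRange) -> bool:
    """a overlaps b: the ranges share at least one day."""
    return a.start <= b.end and b.start <= a.end


def as_range(v: BucketValue) -> DateRange:
    # Assumption: a single ISO 8601 date is a one-day range.
    d = date.fromisoformat(v.value)
    return DateRange(d, d)


mapped = BucketValue("FGDC", "1.3", "Time period of content", "2001-09-08")
query = DateRange(date(2001, 1, 1), date(2001, 12, 31))
print(overlaps(query, as_range(mapped)), contains(query, as_range(mapped)))  # True True
```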

25
Standard buckets
  • ADL bucket: Dublin Core counterpart
  • Subject-related text: DC.Subject
  • Title: DC.Title
  • Assigned term: DC.Subject (qualified)
  • Originator: DC.Creator, DC.Publisher
  • Geographic location: DC.Coverage.Spatial
  • Coverage date: DC.Coverage.Temporal
  • Object type: DC.Type
  • Format: DC.Format
  • Identifier: DC.Identifier

26
Bucket Motivation
  • Heterogeneous metadata
  • Uniform client services
  • Spatial search requires
  • Strongly typed search fields
  • Optimized for geospatial searching

27
Summary
  • A bucket is a strongly typed, abstract metadata
    category with defined search semantics to which
    source metadata is mapped
  • Supports discovery/search across distributed,
    heterogeneous collections that use metadata
    structures of their choosing
  • Supports high-level searching across collections
    and supports drill-down searching to the
    item-level metadata elements

28
Benefits of the Architecture
  • Standard Readily-Optimized Search Methodology
  • Simplifies Design
  • Provides a client with a standard API for
    searching different data sources.
  • Provides a way to discover changed data
    locations.
  • Scalability
  • Scale by upgrading the database
  • Scale by distributing the databases

29
ADL Metadata
  • Metadata Ingest

30
Collection Ingest Procedure
31
San Diego DRG Metadata Processing
32
Processes
Organize Metadata
  • Separate shared and unique values for every
    record
  • Shared (Parent), e.g. Publisher: Stephen P. Teale
    Data Center
  • Unique (Child), e.g. Title of particular DRG:
    Digital Raster Graphic (DRG) of Otay Mesa, CA,
    7.5-minute topographic quadrangle
  • Assign ADL control number
Create Metadata
  • Create values for required fields for which we
    don't have info/metadata, by:
  • Search (visit Teale web pages for DRG production
    information)
  • Original cataloging (access path)
  • Calculation (determining resolution and footprint)
ADL Metadata
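The "separate shared and unique values" step amounts to factoring out the fields whose values are identical across every record. The Python sketch below assumes each record is a flat mapping of field name to value; `split_shared_unique` is a hypothetical name, and the second DRG record is a made-up placeholder, not an actual ADL entry.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_shared_unique(records: List[Record]) -> Tuple[Record, List[Record]]:
    """Split records into one shared (parent) record and unique (child) records."""
    if not records:
        return {}, []
    # A field is shared only if every record carries it with the same value.
    shared = {
        name: value
        for name, value in records[0].items()
        if all(r.get(name) == value for r in records)
    }
    # Each child keeps only the fields that are not in the parent.
    children = [
        {name: value for name, value in r.items() if name not in shared}
        for r in records
    ]
    return shared, children


drgs = [
    {
        "Publisher": "Stephen P. Teale Data Center",
        "Title": "Digital Raster Graphic (DRG) of Otay Mesa, CA, "
                 "7.5-minute topographic quadrangle",
    },
    {
        "Publisher": "Stephen P. Teale Data Center",
        # Hypothetical second quadrangle, for illustration only.
        "Title": "Digital Raster Graphic (DRG) of Example Quad, CA, "
                 "7.5-minute topographic quadrangle",
    },
]
parent, children = split_shared_unique(drgs)
print(parent)  # {'Publisher': 'Stephen P. Teale Data Center'}
```

Cataloging the parent once and attaching only the child values to each item keeps the per-record work proportional to what actually differs between quadrangles.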
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
Collection-level metadata
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
Alexandria Digital Library
Future Directions
48
Core Service Directions
  • Lowering the barrier
  • metadata management services
  • OAI harvest loader
  • improved packaging
  • Service aggregation via harvesting
  • Content-based searches, ranking
  • text IR, image texture
  • Collection discovery
  • Display results over the map - layering
  • Storage of user result sets on server

49
The Ideal ADL Entry Portal
  • The Portal will be
  • Easy to use - allows patrons to search the
    collection without knowing keywords or jargon
  • Flexible - to allow users of differing levels of
    geographic knowledge to find the data they seek
    in the minimal amount of time
  • Help oriented - if the user does not find what
    s/he wants, we in MIL will find out and use that
    knowledge to develop the collection
  • Dynamic - so that the user will want to return to
    see the latest features, collections and tools
  • Educational - so that the user can learn to use
    the site more effectively
  • Interesting - uncluttered, new data, featured
    events

50
(No Transcript)
51
New Interface Functional Areas
52
Summary
  • Distributed, service-based architecture
  • two search levels
  • heterogeneous, native metadata
  • rich, uniform services
  • Status
  • basis of UCSB MIL operational library
  • http://webclient.alexandria.ucsb.edu
  • downloadable
  • http://www.alexandria.ucsb.edu/middleware
  • initial full version late 2002

53
Thanks and Stay Tuned
54
Contact Information
  • Larry Carver, AUL Library Technologies
  • carver_at_library.ucsb.edu 805-893-4433
  • Catherine Masi, ADL Coordinator
  • masi_at_library.ucsb.edu 805-893-7661
  • David Valentine, Senior Systems Engineer
  • valentine_at_library.ucsb.edu 805-893-4545

55
Lessons Learned
  • Spatial orientation is a problem for some users;
    help/guidance is needed
  • Crosswalking between different metadata systems
    can result in loss of information during
    translation
  • Technical metadata standards are not written with
    information discovery in mind
  • Not every field in a technical metadata standard
    will end up being populated
  • For search and discovery, it is the indexing of
    the metadata that is important
  • Geographic and text queries can cause problems
    with query optimizers
  • Every prototype seems to lose a few features when
    moved to the next prototype

56
Collection-level aggregation
  • Collection-level metadata describes
  • buckets supported by the collection
  • item-level metadata mappings
  • statistical overviews
  • item counts
  • spatiotemporal coverage histograms
  • Example (de-XML-ized)
  • in collection foo, the Originator bucket is
    supported and the following item fields are
    mapped to it
  • (FGDC, 1.1/8.1, Citation/Originator) 973
    items
  • (USGS DOQ, PRODUCER, Producer) 973 items
  • (DC, Creator, Creator) 1249 items
  • unknown 6 items
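Statistics like the item counts above can be derived by tallying, per bucket, which source fields each item contributed. This Python sketch assumes each item is represented by the list of `(standard, field id, field name)` references mapped into the Originator bucket, with an empty list meaning the mapping is unknown; `summarize_bucket` is a hypothetical name and the sample items are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

FieldRef = Tuple[str, str, str]  # (standard, field id, field name)

FGDC = ("FGDC", "1.1/8.1", "Citation/Originator")
DOQ = ("USGS DOQ", "PRODUCER", "Producer")
DC = ("DC", "Creator", "Creator")


def summarize_bucket(items: Iterable[List[FieldRef]]) -> Counter:
    """Count, for one bucket, how many items use each source field."""
    counts: Counter = Counter()
    for refs in items:
        if not refs:
            counts["unknown"] += 1
        # An item counts once per distinct source field it maps.
        for ref in set(refs):
            counts["(" + ", ".join(ref) + ")"] += 1
    return counts


# Made-up items: some carry several mapped fields, one carries none.
sample = [[FGDC, DOQ], [FGDC, DOQ], [DC], [DC, DC], []]
for source, n in sorted(summarize_bucket(sample).items()):
    print(source, n)
```

Because one item may contribute several source fields, the per-field counts can add up to more than the number of items in the collection.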

57
Searching collections
  • Bucket-level
  • uniform across all collections
  • example
  • search all collections for items whose Originator
    bucket contains the phrase "geological survey"
  • Field-level
  • collection-specific
  • but discovery and invocation mechanisms are
    uniform
  • functionally equivalent to searching the entire
    bucket plus additional constraint
  • example
  • search collection foo for items whose FGDC
    1.1/8.1 field within the Originator bucket
    contains the phrase
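The two search levels differ only in whether the match is restricted to particular source fields inside the bucket. In this Python sketch, "contains the phrase" is simplified to a case-insensitive substring test; the item layout, the `matches` function and the sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class MappedValue(NamedTuple):
    standard: str  # e.g. "FGDC"
    field_id: str  # e.g. "1.1/8.1"
    value: str


# Hypothetical item layout: bucket name -> values mapped into that bucket.
Item = Dict[str, List[MappedValue]]


def matches(item: Item, bucket: str, phrase: str,
            standard: Optional[str] = None,
            field_id: Optional[str] = None) -> bool:
    """Bucket-level phrase search; standard/field_id narrow it to field level."""
    needle = phrase.lower()
    for mv in item.get(bucket, []):
        # Field-level search = bucket-level search plus this extra constraint.
        if standard is not None and mv.standard != standard:
            continue
        if field_id is not None and mv.field_id != field_id:
            continue
        if needle in mv.value.lower():
            return True
    return False


# Made-up item for illustration.
item: Item = {
    "Originator": [
        MappedValue("FGDC", "1.1/8.1", "U.S. Geological Survey"),
        MappedValue("DC", "Creator", "Example County Planning Office"),
    ]
}
print(matches(item, "Originator", "geological survey"))                   # True
print(matches(item, "Originator", "geological survey", "DC", "Creator"))  # False
```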

58
Components/services
[Diagram: collections and items linked by many
interconnections between services]
59
ADL Middleware Details
60
Middleware server
[Diagram, logical view: client, collection discovery
service, collections, thesaurus/vocabulary, items]
Functional view
  • local access point
  • standard services
  • access control
  • thin client support
  • distributed search
  • brokering of queries and results
  • proxying of collections and items
  • creation and organization of local collections
61
Interoperability problem
  • Distributed, heterogeneous collections
  • locally, autonomously created and managed
  • Minimal requirements on collection providers
  • allow use of native metadata
  • Provide uniform client services
  • common high-level interface across collections
  • structured means of discovering and exploiting
    (possibly collection-specific) lower-level
    interfaces
  • Assumptions
  • items have metadata
  • items have sufficient, good metadata
  • i.e., this is a metadata interoperability problem

62
ADL core buckets (1/6)
  • Subject-related text
  • Title
  • Assigned term
  • Originator
  • Geographic location
  • Coverage date
  • Object type
  • Feature type
  • Format
  • Identifier

Aggregated indexes
63
ADL core buckets (2/6)
  • Subject-related text
  • type: textual
  • description: text indicative of the subject of
    the item, not necessarily from controlled
    vocabularies
  • superset of Title and Assigned term
  • multiple values concatenated
  • compare: DC.Subject
  • Title
  • type: textual
  • description: the item's title
  • subset of Subject-related text
  • multiple values concatenated
  • compare: DC.Title

64
ADL core buckets (3/6)
  • Assigned term
  • type: textual
  • description: subject-related terms from
    controlled vocabularies
  • subset of Subject-related text
  • multiple values concatenated
  • compare: qualified DC.Subject
  • Originator
  • type: textual
  • description: names of entities related to the
    origination of the item
  • multiple values concatenated
  • compare: DC.Creator, DC.Publisher

65
ADL core buckets (4/6)
  • Geographic location
  • type: spatial
  • description: the subset of the Earth's surface to
    which the item is relevant
  • multiple values unioned
  • compare: DC.Coverage.Spatial
  • Coverage date
  • type: temporal
  • description: the calendar dates to which the item
    is relevant
  • multiple values unioned
  • compare: DC.Coverage.Temporal

66
ADL core buckets (5/6)
  • Object type
  • type: hierarchical
  • vocabulary: ADL Object Type Thesaurus (image,
    map, thesis, sound recording, etc.)
  • multiple values unioned
  • compare: DC.Type
  • Feature type
  • type: hierarchical
  • vocabulary: ADL Feature Type Thesaurus (river,
    mountain, park, city, etc.)
  • multiple values unioned
  • compare: none

67
ADL core buckets (6/6)
  • Format
  • type: hierarchical
  • vocabulary: ADL Object Format Thesaurus (loosely
    based on MIME)
  • multiple values unioned
  • compare: DC.Format
  • Identifier
  • type: qualified textual
  • description: names and codes that function as
    unique identifiers
  • multiple values treated separately
  • compare: DC.Identifier
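Each core bucket above states how multiple values combine: concatenated (textual), unioned (spatial, temporal, hierarchical) or treated separately (Identifier). A Python sketch of those policies follows; the function names are hypothetical, date ranges are assumed inclusive, and spatial unions are left out because they need real geometry handling.

```python
from datetime import date
from typing import List, Set, Tuple

DateRange = Tuple[date, date]  # inclusive (start, end)


def combine_textual(values: List[str]) -> str:
    """Textual buckets: multiple values are concatenated into one text."""
    return " ".join(values)


def combine_hierarchical(terms: List[str]) -> Set[str]:
    """Hierarchical buckets: thesaurus terms are unioned (duplicates collapse)."""
    return set(terms)


def combine_temporal(ranges: List[DateRange]) -> List[DateRange]:
    """Temporal buckets: date ranges are unioned, merging any that overlap."""
    merged: List[DateRange] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def combine_identifiers(ids: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Identifier bucket: (namespace, code) pairs stay separate, never merged."""
    return list(ids)
```

Keeping identifiers separate matters because a query must match one whole identifier, whereas concatenated text may legitimately match across the boundary of two values.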

68
(No Transcript)
69
Target Audience Phase 1 Simplified Interface
  • 1. UCSB students, faculty and staff
  • 2. University of California students, faculty
    and staff
  • 3. Researchers
  • 4. Other academic institutions
  • 5. GIS/Map producers
  • 6. Non-UC/Casual users/Other local
    clients/General web users

70
ADL Does
  • Provides quick and accurate answers to the
    question "What data is available for this
    geographic area?"
  • Provides both online spatial content and
    metadata of library holdings for local and
    distributed collections.
  • Internet discovery, access and delivery.
  • ADL may be searched using background maps, other
    imagery, as well as by geographic placenames.
  • The ADL project has two venues: an operational
    library run by the Davidson Library and a
    research component (ADEPT) funded by NSF and
    others.

71
Future Research Tool Initiatives
  • Simplified search interface
  • Searching by content
  • Pattern matching using image segmentation and
    texture
  • Image registration for automatic time sequence
    comparison and geo-rectification
  • Security-based access to data objects
    (smartcard?)
  • Easy interfaces for integrating spatially
    referenced data with other types of information
    sources
  • Improved gazetteer services for geo-coding

72
Goals of the ADL project
  • Provide geospatial access to all classes of
    information
  • Provide access to both library and personal
    collections
  • Provide supporting information services for
    research and learning
  • Part of distributed (spatial) information
    infrastructure
  • Position UCSB as a national leader in geospatial
    information

73
Constraint types
  • Spatial
  • overlaps, contains, ...
  • lat/lon polygon, box
  • Temporal
  • overlaps, contains, ...
  • date range
  • Numeric
  • lt, , gt,
  • real number
  • optional unit of measure
  • Textual
  • contains phrase, ...
  • word list
  • Hierarchical
  • is a
  • thesaurus term
  • Identification
  • matches
  • string, optionally namespace-qualified

Booleans: AND, OR, AND NOT
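To make the constraint types concrete, here is a Python sketch of two of them (spatial overlaps/contains on a lat/lon box, and hierarchical "is a" against a thesaurus) combined with the AND, OR and AND NOT Booleans. The box ignores the 180° meridian, and the tiny thesaurus, the sample footprint and all function names are hypothetical, not drawn from the ADL thesauri.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Box:
    """Lat/lon bounding box in degrees; assumes it does not cross 180°."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "Box") -> bool:
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other: "Box") -> bool:
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


Item = Dict[str, Any]
Constraint = Callable[[Item], bool]


def spatial_overlaps(region: Box) -> Constraint:
    return lambda item: item["footprint"].overlaps(region)


def is_a(term: str, broader: Dict[str, str]) -> Constraint:
    """Hierarchical: some item type equals term or has it as a broader term."""
    def check(item: Item) -> bool:
        for t in item["types"]:
            current: Optional[str] = t
            while current is not None:
                if current == term:
                    return True
                current = broader.get(current)
        return False
    return check


def and_(a: Constraint, b: Constraint) -> Constraint:
    return lambda item: a(item) and b(item)


def or_(a: Constraint, b: Constraint) -> Constraint:
    return lambda item: a(item) or b(item)


def and_not(a: Constraint, b: Constraint) -> Constraint:
    return lambda item: a(item) and not b(item)


# Hypothetical thesaurus fragment: narrower term -> broader term.
BROADER = {"topographic map": "map", "aerial photograph": "image"}

quad = {"footprint": Box(-117.0, 32.5, -116.875, 32.625),
        "types": {"topographic map"}}
san_diego_area = Box(-117.6, 32.5, -116.0, 33.5)
query = and_not(and_(spatial_overlaps(san_diego_area), is_a("map", BROADER)),
                is_a("image", BROADER))
print(query(quad))  # True
```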
74
ADL Interoperability Architecture
[Diagram, logical view: client, collection discovery
service, collections]
75
Bucket mapping
[Diagram: FGDC Citation/Originator and USGS DOQ
Producer both map to the Originator bucket]
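A metadata mapper that produces the Originator mapping could be sketched in Python as below. The mapping table reuses the field references from the collection-level aggregation slide; the `map_record` function, the record contents and the unmapped `QUAD_NAME` field are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class BucketValue(NamedTuple):
    standard: str
    field_id: str
    field_name: str
    value: str


# (standard, field id) -> (field name, target bucket)
MAPPINGS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("FGDC", "1.1/8.1"): ("Citation/Originator", "Originator"),
    ("USGS DOQ", "PRODUCER"): ("Producer", "Originator"),
}


def map_record(standard: str,
               record: Dict[str, List[str]]) -> Dict[str, List[BucketValue]]:
    """Map one native metadata record (field id -> values) into buckets."""
    buckets: Dict[str, List[BucketValue]] = {}
    for field_id, values in record.items():
        target = MAPPINGS.get((standard, field_id))
        if target is None:
            continue  # fields without a mapping are not added to any bucket
        field_name, bucket = target
        for v in values:
            buckets.setdefault(bucket, []).append(
                BucketValue(standard, field_id, field_name, v))
    return buckets


# Made-up DOQ record; QUAD_NAME is a hypothetical unmapped field.
doq = {"PRODUCER": ["U.S. Geological Survey"], "QUAD_NAME": ["Otay Mesa"]}
print(map_record("USGS DOQ", doq))
```

Because each bucket value keeps its standard, field id and field name, a client can later drill down from the bucket to the native field it came from.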
76
(No Transcript)
77
Collection discovery
  • Collection registry polls known library servers
  • Relevance model
  • binary
  • more is better
  • Query language
  • range searching over space, time, vocabulary
    terms
  • subset of item-level query language
  • Limitations
  • no joint constraint conditions
  • no text statistics à la STARTS
  • multiple, overlapping vocabularies
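Collection discovery with a binary relevance model can be sketched as a filter over collection-level summaries. In this Python sketch the `CollectionSummary` fields, the sample registry and its item counts are made up, and ordering hits by item count is only one possible reading of "more is better". It also shows the "no joint constraint conditions" limitation: each constraint is checked against the whole collection's coverage, so a match does not guarantee that any single item satisfies all of them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple

BBox = Tuple[float, float, float, float]  # west, south, east, north


@dataclass
class CollectionSummary:
    name: str
    bbox: BBox
    dates: Tuple[date, date]  # inclusive temporal coverage
    types: Set[str] = field(default_factory=set)
    item_count: int = 0


def _overlap(a0, a1, b0, b1) -> bool:
    """Closed intervals [a0, a1] and [b0, b1] intersect."""
    return a0 <= b1 and b0 <= a1


def discover(collections: List[CollectionSummary], bbox: BBox,
             dates: Tuple[date, date], type_term: Optional[str] = None) -> List[str]:
    qw, qs, qe, qn = bbox
    hits = []
    for c in collections:
        w, s, e, n = c.bbox
        # Binary relevance: each constraint is tested against the collection
        # summary independently (no joint constraint conditions).
        relevant = (_overlap(w, e, qw, qe) and _overlap(s, n, qs, qn)
                    and _overlap(c.dates[0], c.dates[1], dates[0], dates[1])
                    and (type_term is None or type_term in c.types))
        if relevant:
            hits.append(c)
    # Assumption: larger collections are listed first ("more is better").
    hits.sort(key=lambda c: c.item_count, reverse=True)
    return [c.name for c in hits]


# Hypothetical registry entries.
registry = [
    CollectionSummary("drg-quads", (-117.6, 32.5, -116.0, 33.5),
                      (date(1990, 1, 1), date(2000, 12, 31)), {"map"}, 120),
    CollectionSummary("modis-scenes", (-180.0, -90.0, 180.0, 90.0),
                      (date(2000, 1, 1), date(2002, 12, 31)), {"image"}, 4000),
]
print(discover(registry, (-117.2, 32.6, -117.0, 32.8),
               (date(2000, 1, 1), date(2000, 12, 31))))  # ['modis-scenes', 'drg-quads']
```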

78
Architecture
  • Search buckets
  • Abstract, searchable indexes
  • Similar to Dublin Core, but buckets define
    allowable content and search semantics, and are
    optimized for geospatial searching
  • Designed to be easy for populating collections
  • Includes all traditional library search elements

79
Data model
  • Collection
  • name
  • static, dynamic metadata
  • set of items
  • functional behaviors
  • Item
  • identifier
  • bucket view
  • searchable metadata mapped to standard, typed
    buckets
  • browse view
  • content abstracts
  • Item, contd
  • access view
  • multiple access points
  • file-like
  • human interface
  • programmatic service
  • offline
  • other views
  • collection- and/or item-specific
  • FGDC, MARC, etc.
  • content
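The data model maps naturally onto a few record types. This Python sketch mirrors the slide's structure; the field types, the `add` method, the example collection and the access point URL are assumptions, and the content itself is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccessPoint:
    kind: str      # "file-like", "human interface", "programmatic service" or "offline"
    location: str  # e.g. a URL, or a note for offline items


@dataclass
class Item:
    identifier: str
    # Bucket view: searchable metadata mapped to standard, typed buckets.
    bucket_view: Dict[str, List[str]] = field(default_factory=dict)
    # Browse view: content abstracts such as thumbnails.
    browse_view: List[str] = field(default_factory=list)
    # Access view: multiple access points.
    access_view: List[AccessPoint] = field(default_factory=list)
    # Other views: collection- and/or item-specific native metadata (FGDC, MARC, ...).
    other_views: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    static_metadata: Dict[str, str] = field(default_factory=dict)
    dynamic_metadata: Dict[str, int] = field(default_factory=dict)
    items: Dict[str, Item] = field(default_factory=dict)

    # One example of a "functional behavior": adding an item keeps the
    # dynamic metadata (here just an item count) current.
    def add(self, item: Item) -> None:
        self.items[item.identifier] = item
        self.dynamic_metadata["item_count"] = len(self.items)


# Hypothetical collection and item.
drgs = Collection("San Diego DRGs", {"publisher": "Stephen P. Teale Data Center"})
drgs.add(Item(
    identifier="hypothetical-id-0001",
    bucket_view={"Title": ["Digital Raster Graphic (DRG) of Otay Mesa, CA"]},
    access_view=[AccessPoint("file-like", "https://example.org/otay-mesa.tif")],
))
print(drgs.dynamic_metadata)  # {'item_count': 1}
```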

80
The User's Viewpoint
What I want to know when searching
  • What exists?
  • Where is it located?
  • Is it useful given my needs?
  • How do I get it?
  • Is it in a form that I can use?
  • Conditions of use?
  • Is it in original or altered form?
  • How big is the digital file?