Title: ALEXANDRIA DIGITAL LIBRARY PROJECT
1ALEXANDRIA DIGITAL LIBRARY PROJECT
- Larry Carver, James Frew, Greg Janée
- Mike Goodchild, Linda Hill, Terry Smith
- www.alexandria.ucsb.edu
2Outline
- Alexandria Digital Library Project (ADLP)
- History
- Goals, activities, partners
- Distributed DL supporting georeferenced access
- Research and development issues
- Operational collections and services
- Knowledge organization systems (KOS)
- Gazetteers and related KOS
- ADEPT learning environment
- Concept-based learning spaces
- Collections and services
3ADLP History
- Pre-1994: UCSB geo-information and map library
- 1994-98: DLI-1 georeferenced collections/access
- 1998-99: Operational ADL (UCSB Library/CDL)
- 1999-2004: DLI-2 distributed DL
- Extension of architecture and access services
- Knowledge organization services
- Integration of learning services
- Geo/GIS-based interfaces
- Basic CS research
- 2004-2008: Large-scale DLs and beyond
- NSDL Core Infrastructure and services
- Cyber Infrastructure
4ADLP Goals
- Current goals: Distributed DLs and applications
- Operational distributed digital library
- services for construction/use of georeferenced
collections
- DL federation and interoperation
- scalability over many heterogeneous collections
- Development/integration of KOS services
- Integration of concept-based learning spaces
- services for creating/using learning environments
- Development of geo-based interfaces
- Evaluation of services
- Basic computational science research
- Emerging goals: Large-scale DLs and beyond
- Extending NSDL Core Infrastructure and services
- Cyber Infrastructure
5ADLP Major Collaborative Activities
- 1994-98
- 4 DLI-1 partners: CMU, Illinois, Stanford, UCB
- SDSC, U.Arizona, US Navy, NIMA, LoC, MSFT, ESRI,
- 1999-2004
- UCSB Library, CDL
- DLI-2 partners: UCLA, GT, SDSC/NPACI, Stanford,
UCB
- DLESE
- NSDL CI partners: Cornell, Columbia, U.Mass
- NSDL Services partners: IIT Chicago, UCSD
- JISC partners: Penn State, Southampton, Leeds
6ADLP Activities
7Outline
- Alexandria Digital Library Project (ADLP)
- History
- Goals, activities, partners
- Distributed DL supporting georeferenced access
- Research and development issues
- Operational collections and services
- Knowledge organization systems (KOS)
- Gazetteers and related KOS
- ADEPT learning environment
- Concept-based learning spaces
- Collections and services
8Goals
- Digital library architecture for
geospatial/georeferenced information
- heterogeneous
- rich services
- scalable
- many providers
- collections, large and small
- DL infrastructure, not artifact
- standard components and interfaces
- distributed participants
9Issue: discovery
- Naïve approach
- "I want a map of Boulder"
- -> Downtown street map of Boulder, Colorado
- But... remote-sensing imagery is nameless
- AVHRR NOAA-13 2002-06-03 1433 UTC
- But... direct placename search is unreliable
- "I want a map of the Flatirons in the Rocky
Mountains just behind Boulder, Colorado"
- -> USGS topographic map Eldorado Springs
- generally many names for any given place
10ADL approach
- Coordinate-based representation and discovery
- lat/lon coordinates
- rich geometry
- polygons, polylines
- spatial operators
- overlaps, contains
- Gazetteer
- content standard defines representation
- service maps placenames <-> coordinates
[Diagram labels: client, gazetteer, library; placenames, coordinates]
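The coordinate-based approach on this slide can be sketched in a few lines. This is a minimal illustration, not ADL's implementation: footprints are reduced to bounding boxes, the `BBox` class, the `GAZETTEER` and `ITEMS` tables, and all coordinate values are hypothetical, and only the two spatial operators named above (overlaps, contains) are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (a simplification of rich geometry)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other):
        # True when the two boxes share at least one point.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other):
        # True when `other` lies entirely inside this box.
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


# Hypothetical gazetteer: placename -> footprint (illustrative values only).
GAZETTEER = {"boulder, colorado": BBox(-105.30, 39.96, -105.18, 40.09)}

# Hypothetical library items: title -> footprint (illustrative values only).
ITEMS = {
    "Downtown street map of Boulder": BBox(-105.29, 40.00, -105.25, 40.03),
    "AVHRR NOAA-13 scene": BBox(-130.0, 20.0, -90.0, 55.0),
    "Map of Santa Barbara": BBox(-119.9, 34.3, -119.6, 34.5),
}


def lookup(placename):
    """Gazetteer step: translate a placename into coordinates, or None."""
    return GAZETTEER.get(placename.strip().lower())


def search(placename, operator="overlaps"):
    """Library step: apply a spatial operator between query region and items."""
    region = lookup(placename)
    if region is None:
        return []
    test = getattr(region, operator)
    return sorted(title for title, footprint in ITEMS.items() if test(footprint))
```

The nameless satellite scene becomes discoverable through the overlaps operator even though no placename appears in its metadata, which is the point of routing placename queries through coordinates.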
11Issue: multiple data types
- Geospatial discovery is not amenable to text
treatment
- constitutes new data type
- Adding notion of different data types has many
implications
- input validation
- internal structures, external representations
- query language and processing
- ranking
- user interface components
12ADL approach
- Discovery bucket framework
- extensible data type system for metadata
- XML representations, search operations
- native metadata is explicitly mapped to buckets
- software supports bucket views over arbitrary
RDBMSs
- 9 Dublin Core-like standard buckets
- User interface components
- background maps, item footprint
identification/creation
- Spatial ranking
- by spatial similarity to query region
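One way to picture ranking by spatial similarity is the sketch below. The slide does not say which similarity measure ADL uses, so this example assumes a common stand-in: the ratio of intersection area to union area of two bounding boxes. The function names and the `(west, south, east, north)` tuple layout are assumptions made for illustration.

```python
def area(box):
    """Area of a (west, south, east, north) box; zero for empty boxes."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def intersection(a, b):
    """Box common to a and b (may be empty, in which case its area is zero)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def similarity(query, footprint):
    """Assumed measure: intersection area divided by union area, in [0, 1]."""
    shared = area(intersection(query, footprint))
    union = area(query) + area(footprint) - shared
    return shared / union if union > 0 else 0.0


def rank(query, items):
    """Return item names ordered by decreasing similarity, dropping misses."""
    scored = [(similarity(query, box), name) for name, box in items.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]
```

A measure of this kind penalizes items that are much larger than the query region (a continental satellite scene) as well as items that cover only a small part of it, so the best-fitting footprints rise to the top.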
13Bucket mapping
Originator
FGDC Citation/Originator
U.S. Geological Survey
USGS DOQ Producer
Photo Science, Inc.
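The mapping pictured on this slide can be sketched as a lookup table from native metadata fields to buckets. This is an illustrative reading of the figure, assuming that the FGDC field Citation/Originator and the USGS DOQ field Producer both map to the Originator bucket; the table layout and function names are hypothetical and do not reproduce ADL's software.

```python
# Assumed mapping: native schema -> {native field -> bucket name}.
BUCKET_MAPPINGS = {
    "FGDC": {"Citation/Originator": "Originator"},
    "USGS DOQ": {"Producer": "Originator"},
}


def to_buckets(schema, record):
    """Project a native metadata record onto buckets.

    Each bucket value keeps a label for the native field it came from,
    so a search can target either the bucket or the underlying field.
    """
    mapping = BUCKET_MAPPINGS[schema]
    buckets = {}
    for field_name, value in record.items():
        bucket = mapping.get(field_name)
        if bucket is not None:  # unmapped fields stay out of the buckets
            buckets.setdefault(bucket, []).append((f"{schema} {field_name}", value))
    return buckets


def search_bucket(records, bucket, text):
    """Uniform search across heterogeneous records via one bucket."""
    needle = text.lower()
    return [
        value
        for schema, record in records
        for _source, value in to_buckets(schema, record).get(bucket, [])
        if needle in value.lower()
    ]
```

Because both records land in the same bucket, one query on Originator reaches an FGDC record and a USGS DOQ record without the client knowing either native schema.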
14Collection statistics
Spatial
Temporal
Object Type            Count
cartographic works       324,876
maps                     324,876
images                 2,014,799
photographs              484,083
aerial photographs       484,083
15ADL approach
- Discovery bucket framework
- extensible data type system for metadata
- XML representations, search operations
- native metadata is explicitly mapped to buckets
- software supports bucket views over arbitrary
RDBMSs
- 9 Dublin Core-like standard buckets
- User interface components
- background maps, item footprint
identification/creation
- Spatial ranking
- by spatial similarity to query region
16ADL in context
ADL
affordances
DLs
17Issue: scalability
- Size
- easy to accumulate lots of data
- satellites image continuously
- geospatial discovery scales... not so well
- indexing unwieldy at 10^6 items
- efficiently joining spatial, other constraint
types is difficult
- Burden management
- collection building is labor-intensive
- providers have differing content, services, IP
concerns, policies, lifetimes
- providers already exist
- MS Terraserver: 3 TB, 750 million items
18ADL approach
- Distributed library of peer nodes
- library nodes host collections
- other nodes host gazetteers, thesauri, other KOS
- other components, e.g., map servers
- Federated item-level search
- over buckets
- over individual metadata fields mapped to buckets
- Centralized collection-level search/ranking
- over collection statistics derived from bucket
mappings
- space, time, type, format
- any library node can act as collection registry
- Collection aggregation
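The two search levels on this slide can be sketched together: a registry holds per-collection statistics and picks candidate collections, then the item-level query is sent only to those nodes. The classes below are hypothetical in-memory stand-ins, the statistics cover only type and time (not space or format), and ranking candidates by matching-type count is an assumption.

```python
from collections import Counter


class LibraryNode:
    """A peer node hosting one collection (hypothetical in-memory stand-in)."""

    def __init__(self, name, items):
        self.name = name
        self.items = items  # each item: {"id": ..., "type": ..., "year": ...}

    def statistics(self):
        """Collection-level summary derived from item metadata."""
        years = [item["year"] for item in self.items]
        return {
            "types": Counter(item["type"] for item in self.items),
            "years": (min(years), max(years)) if years else None,
        }

    def search(self, item_type, year):
        """Item-level search within this node."""
        return [item["id"] for item in self.items
                if item["type"] == item_type and item["year"] == year]


class Registry:
    """Collection registry; per the slide, any library node could play this role."""

    def __init__(self):
        self.entries = []

    def register(self, node):
        self.entries.append((node, node.statistics()))

    def candidates(self, item_type, year):
        """Collections that could hold matches, largest type count first."""
        matches = []
        for node, stats in self.entries:
            count = stats["types"].get(item_type, 0)
            span = stats["years"]
            if count and span and span[0] <= year <= span[1]:
                matches.append((count, node))
        matches.sort(key=lambda pair: -pair[0])
        return [node for _count, node in matches]


def federated_search(registry, item_type, year):
    """Query only the candidate nodes and merge their item-level results."""
    hits = []
    for node in registry.candidates(item_type, year):
        hits.extend((node.name, item_id) for item_id in node.search(item_type, year))
    return hits
```

The statistics act as a cheap filter: nodes whose summaries rule out a match are never contacted, which matters when the federation spans many heterogeneous providers.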
19Issue: context and use of library items
- Context is critical in geospatial DLs
- formulating queries
- evaluating result sets and individual results
- Use of geospatial data
- need access descriptions
- item content: a single URL is insufficient
- multiple formats
- multiple access methods
- multiple components
- need integration with common data environments
- ARC/INFO, etc.
20Geospatial context
- Does this answer your question?
21ADL approach
- All library functionality is accessible via...
- web service APIs
- Java RMI
- Content access model
- characterizes methods of access
- multiple access points
- download, service, web interface, offline
- hierarchies of alternatives, decompositions
- Context
- background maps
- library-supplied lightweight GIS functionality
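The phrase "hierarchies of alternatives, decompositions" suggests a tree in which an item is either reachable through one of several alternatives or split into components that must all be fetched. The sketch below is one possible reading, not ADL's content access model: the class names, the `kind` values, and the file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPoint:
    """Leaf: one way to reach content."""
    method: str  # "download", "service", "web interface", or "offline"
    target: str  # hypothetical locator, e.g. a file name


@dataclass
class Group:
    """Inner node: choose one child ("alternatives") or take all ("decomposition")."""
    kind: str
    children: list = field(default_factory=list)


def plans(node):
    """Enumerate every set of access points that yields the complete item."""
    if isinstance(node, AccessPoint):
        return [[node]]
    if node.kind == "alternatives":
        # Any single child's plan is sufficient.
        return [plan for child in node.children for plan in plans(child)]
    # Decomposition: combine one plan from each component.
    combined = [[]]
    for child in node.children:
        combined = [done + plan for done in combined for plan in plans(child)]
    return combined
```

For example, an image offered in two formats plus a separate metadata file yields two complete plans, each pairing one image format with the metadata file, which is more than a single URL can express.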
22Incorporation into NSDL/CI
- Geospatial/georeferenced data is an instance of
science data
- complex, well-defined structure
- rich metadata
- large size
- poorly served by traditional information
retrieval methods
- Science data belongs in NSDL
- For NSDL: comparable infrastructure enabling...
- distributed, content-specific search services
- association of DL items and content-specific
helper tools
23Operational status
- ADL co-developed with UCSB Library
- production-quality software
- foundation of operational library since 2000
- complete system in 2003
- UCSB Library Map Imagery Laboratory (MIL)
- self-supporting, 5 full-time employees
- 2.6 million items, 6.5 TB, growing 1.5 TB/year
- 4.5 million item gazetteer
- Remote sites
- ESSW, CNR, DLESE, SIO, NTNU, AUT
24Outline
- Alexandria Digital Library Project (ADLP)
- History
- Goals, activities, partners
- Distributed DL supporting georeferenced access
- Research and development issues
- Operational collections and services
- Knowledge organization systems (KOS)
- Gazetteers and related KOS
- ADEPT learning environment
- Concept-based learning spaces
- Collections and services
25KOS activities and contributions
- KOS as primary components of DL architecture
- Heretofore not acknowledged as a major component
- ADL/ADEPT thesaurus and gazetteer service
protocols
- Gazetteer components of DLs
- Growth of a research and development community,
adopting/adapting/sharing our ADL Gazetteer
components
- Gazetteer research issues
- NSDL Textual Geospatial Integration Project
- KOS integration into learning environments
- Terry Smith will address this in detail
26Digital Library Components
CATALOG OF METADATA
27KOS Generalization
Concept
Type
Definition
Label
Relationships
Meaning
Navigation
Translation
Sense-making
28Digital Gazetteer Essentials
Name
- None of these elements are unique identifiers of
a particular place
29Building gazetteer research community
- 1994-1996: ADL built the first multi-million-entry
international gazetteer and integrated it into
the ADL system
- 1996-1999: ADL created...
- Gazetteer Content Standard
- Feature Type Thesaurus (210 preferred terms, 1046
non-preferred)
- rebuilt the ADL Gazetteer (over 4 million
entries)
- provided web interfaces for searching the ADL
Gazetteer
30Building a research community
- Set of 5.9 million geographic names available for
download; useful for placename recognition in
text
- Gazetteer Service Protocol and protocol server
code
- An external identifier for ADL Gazetteer
records
- New gazetteer client that is based on the
gazetteer protocol
31Our network of gazetteer interactions
32Advancing and extending gazetteers
33Advancing and extending gazetteers
34Advancing and extending gazetteers
Obtaining extents from image analysis
- Recognizing patterns
- Identifying features from gazetteers
- Deriving the extent of the features from feature
analysis
- Adding bounding box footprints to gazetteer
entries
Santa Barbara Municipal Airport
35Advancing and extending gazetteers
The duplicate detection problem. Given variant
names and variant footprints, how do we determine
that two pieces of information are about the same
place?
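A simple heuristic makes the duplicate detection problem concrete: compare the best-matching pair of variant names and the overlap of the two footprints, and flag a likely duplicate only when both agree. This is a sketch under stated assumptions, not the project's method; the thresholds, the entry layout, and the coordinate values are hypothetical, and footprints are reduced to `(west, south, east, north)` boxes.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Character-level similarity of two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def footprint_overlap(a, b):
    """Intersection area over union area of two bounding boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    shared = max(0.0, width) * max(0.0, height)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - shared)
    return shared / union if union > 0 else 0.0


def probably_same_place(entry1, entry2, name_threshold=0.75, overlap_threshold=0.5):
    """Flag two gazetteer entries as likely duplicates (assumed thresholds)."""
    best_name = max(
        name_similarity(n1, n2)
        for n1 in entry1["names"]
        for n2 in entry2["names"]
    )
    overlap = footprint_overlap(entry1["bbox"], entry2["bbox"])
    return best_name >= name_threshold and overlap >= overlap_threshold
```

Requiring both signals guards against two common failure modes: different places that share a name, and different features (an airport and a harbor, say) that sit close together. Point footprints have zero area and would need a distance-based test instead.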
36Advancing and extending gazetteers
37Gazetteer ITR Proposal
- Advancing and Extending Georeferencing
Interoperability and Services (AEGIS)
- Medium ITR proposal for 2003
- Michael Goodchild, UCSB, PI
- Lewis Lancaster, Berkeley/ECAI, co-PI
- Formalization and extension
- Performance and scalability
- Cross-cultural issues
- Cognitive and behavior issues
- Extents: representation of a feature's geometry
- Integration of locator services
38NSDL Textual Geospatial Integration
2001 - 2003
- Goals
- Extend NSDL infrastructure by enabling
- geographic queries
- across heterogeneous, text and non-text resources
- spatial georeferencing
- of arbitrary texts without explicit geographic
cataloging
- Participants
- University of California, Santa Barbara
- James Frew, PI
- Terence Smith
- Michael Bueno
- Linda Hill
- Information Retrieval Lab, Illinois Institute of
Technology
- Ophir Frieder
- David Grossman
- Eric Jensen
- Steve Beitzel
The American Geological Institute (AGI) has
permitted us to use a set of their GeoRef records
for system training.
39Example: text -> Estimated footprint
Structure and petrography of the schist of
Skookum Gulch, Callahan-Yreka area, eastern
Klamath Mountains, Northern California
<key>blueschist California Callahan California
foliation Klamath Mountains melange
metamorphic rocks Ordovician Paleozoic
petrology schists Silurian Siskiyou County
California Skookum Gulch United States
Yreka California</key>
<ab>The schist of Skookum
Gulch (SSG) is an informal name applied to a
fault-bounded melange composed mainly of
schistose metamorphic rocks and less abundant
sedimentary and igneous rocks located in the
eastern Klamath Mountains of Northern California.
The SSG features outcrops of lawsonite+sodic
amphibole blueschist and epidote+sodic amphibole
rocks transitional to the greenschist facies.
Isotopic dating indicates that the schist was
metamorphosed during the Ordovician. The SSG is
the oldest known Paleozoic blueschist-bearing
melange in California and one of the oldest
preserved blueschist terranes in North America.
Tonalitic rocks associated with the schist have
Early Cambrian ages and are among the oldest
rocks yet dated within the Klamath Mountains.
Field relations indicate that the schist of
Skookum Gulch is a complex tectonic melange
composed of metavolcanic, ...</ab>
<coord>N410000N420000W1220000W1230000</coord>
- Derived footprint, small
- Blue: derived footprint, large
- Red: GeoRef footprint
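The example above pairs a cataloged footprint (the coord field) with footprints derived from the text. The sketch below illustrates both halves under stated assumptions: the coord string is read as two latitudes and two longitudes in degrees, minutes, seconds, and the derived footprint is simply the union of gazetteer boxes for placenames found in the text. The project's actual derivation method is not described on the slide, and the gazetteer values here are invented for illustration.

```python
import re

# Assumed layout: hemisphere letter + DDMMSS (latitude) or DDDMMSS (longitude).
COORD = re.compile(
    r"([NS])(\d{2})(\d{2})(\d{2})([NS])(\d{2})(\d{2})(\d{2})"
    r"([EW])(\d{3})(\d{2})(\d{2})([EW])(\d{3})(\d{2})(\d{2})"
)


def _to_degrees(hemisphere, degrees, minutes, seconds):
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_coord(text):
    """Convert a coord string to a (west, south, east, north) box."""
    match = COORD.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    g = match.groups()
    lats = [_to_degrees(*g[0:4]), _to_degrees(*g[4:8])]
    lons = [_to_degrees(*g[8:12]), _to_degrees(*g[12:16])]
    return (min(lons), min(lats), max(lons), max(lats))


def estimate_footprint(text, gazetteer):
    """Union of the boxes of all gazetteer placenames mentioned in the text."""
    lowered = text.lower()
    hits = [box for name, box in gazetteer.items() if name.lower() in lowered]
    if not hits:
        return None
    return (min(b[0] for b in hits), min(b[1] for b in hits),
            max(b[2] for b in hits), max(b[3] for b in hits))
```

Plain substring matching is the weakest link in this sketch: it ignores ambiguous names and word boundaries, which is where a multi-million-name gazetteer and trained recognition would come in.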
40KOS activities and contributions
- KOS as primary components of DL architecture
- Heretofore not acknowledged as a major component
- ADL/ADEPT thesaurus and gazetteer service
protocols
- Gazetteer components of DLs
- Growth of a research and development community,
adopting/adapting/sharing our ADL Gazetteer
components
- Research issues
- NSDL Textual Geospatial Integration Project
- KOS integration into learning environments
- Terry Smith will address this in detail
41Outline
- Alexandria Digital Library Project (ADLP)
- History
- Goals, activities, partners
- Distributed DL supporting georeferenced access
- Research and development issues
- Operational collections and services
- Knowledge organization systems (KOS)
- Gazetteers and related KOS
- ADEPT learning environment
- Concept-based learning spaces
- Collections and services
42Applications and services based on DLs
- Integrate applications with DL infrastructure
- Web portals lack library organization
- packages not integrated with DLs
- Important applications include
- Services/collections supporting learning
environments
- Services/collections supporting research
- Apply domain-specific KOS principles for
organizing collections/services for given
application
- Geospatial applications use georeference
- Science learning environments use concept spaces
43Science learning spaces: Concept KOS
- Concepts of science as basic knowledge granules
- Sets of concepts form bases for scientific
representation
- DL and KOS technology can support organization of
science learning materials in terms of concepts
- Collections of models of science concepts
(knowledge base)
- Collections of learning objects (LO) cataloged
with concepts
- Collections of instructional materials organized
by concepts
- Organize learning materials as trajectory
through concept space
- Lecture, lab, self-paced materials
- Services for creating/editing/displaying such
materials
44Learning environment components/services
45Application to learning environments
- Application
- Introductory physical geography (F2002, S2003)
- Collections created
- Knowledge base (KB) of strongly structured
concepts
- Structured lectures and labs
- Learning objects cataloged by ADN metadata (+
concepts)
- Services created
- For concepts
- Web-based concept input tool
- Graphic and text-based display tools
- For instructional materials
- Web-based lecture composer
- Conceptualization graphing tool
- For learning objects
- Metadata input tool
46Learning environment display (lecture mode)
- The lecture is presented on three projection
screens, showing the
- Concept window (left)
- Lecture window (center)
- Object window (right)
47Model of science concepts
- Representing a concept involves more than terms
- Objective, information-rich, scientific
representations
- e.g., for concepts of heat diffusion, DNA,
drainage basin,
- Associated semantics
- e.g., relating to measurement, recognition,
- Many interrelationships
- e.g., hierarchical, causative, property,
- Models of science concepts
- Already exist for chemistry (ASA), materials
(NIST),
- Generalize such models for this application
- Structure items in concept KB using model
48Model of science concepts
- ID
- TYPE and FACET
- CONTEXT (KNOWLEDGE DOMAIN)
- TERM(S) (P/NP)
- DESCRIPTION(S)
- HISTORICAL ORIGIN(S)
- EXAMPLE(S)
- HIERARCHICAL RELATIONS
- DEFINING OPERATIONS
- SCIENTIFIC REPRESENTATION(S)
- Scientific classifications
- Data/Graphical/Mathematical/Computational reps
- PROPERTIES
- CAUSAL RELATIONS
- CO-RELATIONS
- APPLICATION(S)
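The element list on this slide maps naturally onto a record type. The sketch below is one hypothetical encoding, not the ADEPT knowledge base schema: the attribute names, the choice of lists versus dictionaries, and the sample concept are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One item in a concept knowledge base, following the slide's elements."""
    id: str
    type: str
    facet: str = ""
    context: str = ""  # knowledge domain
    preferred_term: str = ""  # TERM(S), P
    nonpreferred_terms: list = field(default_factory=list)  # TERM(S), NP
    descriptions: list = field(default_factory=list)
    historical_origins: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    broader: list = field(default_factory=list)  # hierarchical relations, by ID
    defining_operations: list = field(default_factory=list)
    # Scientific representations keyed by kind, e.g. "classification",
    # "data", "graphical", "mathematical", "computational".
    representations: dict = field(default_factory=dict)
    properties: list = field(default_factory=list)
    causal_relations: list = field(default_factory=list)
    co_relations: list = field(default_factory=list)
    applications: list = field(default_factory=list)


def build_term_index(concepts):
    """Map every preferred and non-preferred term to its concept ID."""
    index = {}
    for concept in concepts:
        for term in [concept.preferred_term, *concept.nonpreferred_terms]:
            if term:
                index[term.lower()] = concept.id
    return index
```

Keeping terms separate from the concept ID lets several labels resolve to the same structured record, which is what distinguishes a concept model from a plain term list.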
49Item in concept knowledge base
50Concept input tool
51Collections of learning materials
- Lecture/lab composer
- Creates learning materials with
- Tailorable structure
- Underlying organization as forest of trees of
concepts
- Small reusable granules for
- Easy creation/edit/access/re-use
- Can link in
- Concepts from concept KB
- Items from learning object collections
- Items from lecture collection
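The idea of a lecture as a tree of small granules, each linked to concepts, can be sketched as follows. This is an illustrative data structure, not the composer's internal format: the `Granule` class and its fields are hypothetical, and reading the tree depth-first is an assumed way of obtaining the order in which concepts are introduced.

```python
from dataclasses import dataclass, field


@dataclass
class Granule:
    """A small reusable unit of lecture or lab material."""
    title: str
    concepts: list = field(default_factory=list)  # linked concept names or IDs
    objects: list = field(default_factory=list)   # linked learning objects
    children: list = field(default_factory=list)  # nested granules


def trajectory(root):
    """Concepts in presentation order (depth-first), first occurrence only."""
    seen = []

    def visit(node):
        for concept in node.concepts:
            if concept not in seen:
                seen.append(concept)
        for child in node.children:
            visit(child)

    visit(root)
    return seen


def outline(root, depth=0):
    """Indented titles, similar to a structure frame beside the content."""
    lines = ["  " * depth + root.title]
    for child in root.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Because granules are separate nodes, the same subtree can be linked into another lecture or a lab without copying it.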
52Current instructional material window
- The left-hand frame displays the structure of the
lecture
- The right-hand frame displays the content of the
lecture
- ADL icons (globe image) attached to a concept
link to a display of concept properties in the
concept window
- Other icons attached to a concept link to a
display of concept examples in the illustration
window
53View of learning material by concepts
54Lecture/lab composer tool
55Learning object collections
- Cataloged with tool for metadata creation
- ADN metadata content standard with concept fields
- Use of ADL/ADEPT middleware search services
- E.g., in creation of lecture/lab presentation
materials
- Display of collection items in collection window
- Photos, images, maps, text, videos,
- Support in display window for ADL browser
- Allows dynamic search of collection holdings
56The illustrations window
57Evaluation of concept-based approach
- Evaluation of efficacy for student learning
- Do students attain deeper levels of
understanding?
- Comparison approach to evaluation
- Evaluation of value to instructors/TAs
- UCLA evaluation team
- Evaluation issues
- Instrumenting students' use of course materials
- Time to assess pedagogic value of approach
58Example of lessons learned
- Importance of conceptualizations of concept
- e.g., characterize concept of Fluvial Landscape
with concepts of River, Watershed
- Embed conceptualizations in lecture/labs (not in
KB)
- Idea of learning materials as trees in concept
space
- Construct labs using analogous lab composer
- Tailored for lab presentations/work
- Supports logic of using concepts as framework
- Can import material from lecture/other
collections
59Summary
- DL infrastructure as basis for Learning
Environments
- Collections
- Concept KBs, Lectures, DL objects
- Services
- Creation/Search/Display
- Evaluation of efficacy of approach
- Community-based development of KBs, Learning
Materials, Collections
60ADLP Activities