Title: Digital Object Storage and Retrieval (DOSR) Vision
1. Digital Object Storage and Retrieval (DOSR) Vision
2. Disclaimer
This presentation discusses areas of technology
investigation and interest. It does not relate to
any existing DARPA program, nor should it be
inferred to anticipate a future DARPA program.
3. The Mundaneum
- In 1910, Belgians Paul Otlet and future Nobel
Peace Prize laureate Henri La Fontaine opened the
Palais Mondial, later renamed the Mundaneum.
- The Mundaneum's mission was to collect metadata
on every book, journal, and periodical ever
published and record it in a card file system
that embodied what we would call a faceted
classification scheme. By 1934 it contained over
15 million entries.
- Unique identifiers included embedded links to
related documents.
- Staff responded to search requests received by
post and telegraph and returned hand-copied cards
by post.
- In 1934 Otlet conceived a global network of
electric telescopes that would allow people to
search and browse through interlinked documents,
images, audio, and motion picture recordings. He
wrote that, from his armchair, everyone will
hear, see, participate, will even be able to
applaud, give ovations, sing in the chorus, add
his cries of participation to those of all the
others.
4. DOSR Vision
- Create a resilient, distributed, scalable, and
secure network of information that does not
require a completely trusted or stable network of
processing nodes: employ network overlays and
advanced cryptographic techniques.
- Advance the state of the art in automated
metadata generation and interoperability: apply
machine learning techniques.
- Automatically get information where it is needed,
or may be needed, using less bandwidth and
processing: integrate user models, compact
information retrieval encodings, and distributed
content delivery.
- Reliably track where information goes and where
it came from: encapsulate provenance and audit
information in network-maintained virtual
objects.
- Enable secure, resilient information storage,
characterization, retrieval, and collaboration
across barriers of time, geography, community of
interest, technology, and administrative domain.
Figure labels: User and Data Models; Automated Metadata Generation
What we can find defines what we can do
Photos courtesy of U.S. Army, U.S. Navy
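The vision slide's goal of encapsulating provenance and audit information inside an object can be illustrated with a hash-chained event log, where each entry commits to the one before it. This is a minimal sketch under stated assumptions, not a DOSR design: the function names (`append_event`, `verify_log`) and the record fields (`actor`, `action`, `object_id`) are hypothetical, and a real system would also need signatures and a trusted copy of the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(record: dict) -> str:
    """Hash a record deterministically (sorted keys, compact separators)."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(log: list, actor: str, action: str, object_id: str) -> dict:
    """Append a provenance event that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "object_id": object_id, "prev": prev_hash}
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Return True only if every entry's hash and back-link are intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, "alice", "create", "obj-1")
append_event(log, "bob", "read", "obj-1")
print(verify_log(log))
```

Editing any earlier entry changes its digest and breaks every later back-link, so tampering is detectable by anyone who holds the most recent hash.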
5. Hard Problems
- Automated metadata extraction and generation
  - DoD has many stovepipe systems with limited
    metadata
  - Automatic extraction of metadata, especially from
    non-textual information, is an unsolved problem
    requiring some form of artificial intelligence
  - Email, papers, presentations, forms, and databases
    do not possess a community-maintained mesh of
    reciprocal references, so Google-like search,
    relevance, and ranking algorithms do not work
- Scalable security for sharable objects
  - Decentralized (for scalability) key distribution
    systems present security challenges
  - Protection from known cryptographic and
    corruption attacks is hard; protection from
    unknown attacks is harder
  - Usable secure sharing (as convenient as email) is
    needed or the system won't be used
  - Scalable, revocable group access to synchronized,
    encrypted, versioned documents is essential
- Scalable replicated storage and parallel data
  distribution
  - Globally unique identifiers (GUIDs) for retrieval
    and update are essential, and must be
    unbreakable, verifiable, and afford scalable
    resolution of a retrievable, trackable object
  - How to track fragmented and replicated objects
    for persistence and provenance
  - Object replication for secure, scalable,
    high-bandwidth distribution (secure
    BitTorrent-style)
  - Enhance resiliency and service in network-poor
    areas
  - Respond adaptively to service degradation for
    high-demand data and large-scale disruptions
- Personalization, intelligent agents and user
  models
  - Intelligent agents needed to locate content near
    likely users, based on user models
  - User models based on authorization, active input
    and passive tracking
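One common way to make an identifier verifiable, as the GUID bullet on this slide asks, is to derive it from a cryptographic hash of the object's bytes, so that any replica can be checked against its own name. The sketch below assumes that approach; the slides do not say how DOSR GUIDs would be built, and `make_guid`, `verify_guid`, and `ReplicaStore` are hypothetical names. A content-derived identifier names one immutable version only, so the "update" requirement would need a separate, authenticated mapping from a stable name to the latest version.

```python
import hashlib
import hmac


def make_guid(content: bytes) -> str:
    """Derive an identifier from the SHA-256 digest of the content."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify_guid(guid: str, content: bytes) -> bool:
    """Check that the content matches the identifier it was fetched under."""
    return hmac.compare_digest(guid, make_guid(content))


class ReplicaStore:
    """Toy in-memory replica; stands in for an untrusted storage node."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, content: bytes) -> str:
        guid = make_guid(content)
        self._blobs[guid] = content
        return guid

    def get(self, guid: str) -> bytes:
        content = self._blobs[guid]
        if not verify_guid(guid, content):
            # The node returned bytes that do not match the requested name.
            raise ValueError("replica failed verification for " + guid)
        return content


store = ReplicaStore()
guid = store.put(b"report v1")
print(guid.startswith("sha256:"), store.get(guid) == b"report v1")
```

Because the check needs only the identifier and the returned bytes, a client can fetch from the closest replica without trusting the node that served it.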
6. Key Capabilities
- Architecture and protocols
  - Protocols for exchanging objects, metadata, and
    security controls
  - Mobile agents and federated requests for
    information
- Persistence of digital objects
  - Distribute replicas and coded fragments
  - Global, persistent, verifiable, unique
    identifiers (GUIDs)
  - Version-controlled, collaborative updates
- Trust, security and provenance
  - Authorized, authenticated access
  - Decentralized encryption for scalability
  - Verifiable provenance and tracking of all objects
  - Resilience to attacks
- Scalability
  - Scale-free architecture
  - Decentralized, peer-to-peer techniques
  - Manage latency, consistency and security as scale
    grows
- Metadata and search
  - Extract metadata from video, maps, images
  - Relevance feedback
Diagram labels:
- Object 1 Version 1
- Object 1 Version 2 update
- Replicas and fragments
- Retrieve latest version from closest fragments or
  replica
- Decentralized, scalable key distribution
- Scalable resources, storage and participant
  networks
- Needed objects migrate to local server for user
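The "replicas and coded fragments" capability on this slide can be illustrated with the simplest erasure code: split an object into k fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the rest. This is only a sketch of the idea; the slides do not name a coding scheme, the function names are hypothetical, and a real deployment would more likely use a code that tolerates several losses (for example Reed-Solomon).

```python
def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    if k < 2:
        raise ValueError("k must be at least 2")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for fragment in fragments:
        parity = _xor(parity, fragment)
    return fragments + [parity]


def recover(fragments: list, original_length: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        rebuilt = bytes(len(present[0]))
        for fragment in present:
            rebuilt = _xor(rebuilt, fragment)
        fragments[missing[0]] = rebuilt
    # Drop the parity fragment and the zero padding added by the splitter.
    return b"".join(fragments[:-1])[:original_length]


data = b"digital object storage"
pieces = split_with_parity(data, 4)
pieces[2] = None  # simulate one unreachable storage node
print(recover(pieces, len(data)) == data)
```

Compared with full replication, this stores about (k+1)/k times the object size rather than a whole extra copy, at the cost of needing k surviving fragments to read the object.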
7. Interesting Research Ongoing in
- Automated metadata extraction
- Decentralized, self-configuring location and
routing
- Federated search
- Information retrieval
- Personalization and user models
- Proxy re-encryption
- Scalable security and PKI
- Search over encrypted indexes
- Securing resilient peer-to-peer networks
DOSR Workshop will address these areas
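Of the research areas listed on this slide, search over encrypted indexes lends itself to a short illustration. The sketch below assumes the simplest possible construction: each keyword is replaced by a keyed HMAC token, so the server holding the index can match tokens without seeing the words. All names here are hypothetical, and this toy scheme still reveals which documents share a keyword and which queries repeat; published searchable-encryption schemes address those leaks.

```python
import hashlib
import hmac
from collections import defaultdict


def keyword_token(key: bytes, keyword: str) -> str:
    """Derive an opaque search token from a keyword with a secret key."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict) -> dict:
    """Map keyword tokens to the IDs of documents containing that keyword."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[keyword_token(key, word)].add(doc_id)
    return dict(index)


def search(index: dict, token: str) -> set:
    """Server-side lookup: sees only the token, never the keyword."""
    return set(index.get(token, set()))


key = b"k" * 32  # placeholder key for the example only
docs = {"d1": "convoy route map", "d2": "route weather report"}
index = build_index(key, docs)
print(sorted(search(index, keyword_token(key, "route"))))
```

Only a client holding the key can produce a token that matches, so the index can sit on a node that is not fully trusted.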
8. Preliminary Schedule
- July 15 Talks
  - 8:30 am Opening remarks - DARPA
  - Architecture
    - 8:45 am Dr. Robert Kahn - keynote address
    - 9:15 am Dr. Peter Lucas - MAYA
    - 9:35 am Dr. Daniel Crichton - NASA
  - 9:55 am Break
  - Metadata
    - 10:15 am Dr. Ajay Divakaran - Sarnoff Corp.
    - 10:35 am Dr. Randal Burns - JHU
    - 10:55 am Dr. Shmuel Peleg - HU-J
    - 11:15 am Mr. Jason Byassee - Northrop Grumman
  - Security
    - 11:35 am Dr. James Allan - U. Mass-Amherst
    - 11:55 am Dr. Rafail Ostrovsky - UCLA
  - 12:15 pm Lunch
  - 1:40 pm Dr. Urs Muller - Net-Scale Tech.
  - 2:00 pm Dr. Matt Staker - IBM Research
  - 2:20 pm Dr. Angelos Stavrou - Global InfoTek Inc.
- July 15 Posters
  - 4:20 pm Break
  - 4:40 pm Poster Session 1
  - 5:20 pm Poster Session 2
  - 6:00 pm Adjourn
- July 16 Breakouts
  - 9:00 am Dr. Josh Alspector - DOSR vision and
    breakout group instructions
  - 9:30 am Breakout group discussions
  - Noon Lunch
  - 1:30 pm Brief out Group 1
  - 2:00 pm Brief out Group 2
  - 2:30 pm Break
  - 2:50 pm Brief out Group 3
  - 3:20 pm Brief out Group 4
  - 3:45 pm Plenary Session
  - 4:15 pm Adjourn
9. Levels of Success
- DoD adopts the system internally
- Portions of the system are made available for
open-source use by Apache
- Legal, medical, and financial records management
firms adopt GUIDs, protocols, and system
components
- ISPs and media companies adopt GUIDs, protocols,
and system components for subscription services
- Amazon, Google, and iTunes use GUIDs and
protocols
10. Prior Art
- Coda (CMU)
- Cooperative File System (MIT)
- FARSITE (Microsoft)
- Grid (Argonne National Laboratory)
- Lustre (now owned by Sun Microsystems)
- OceanStore (UC Berkeley)
- PASIS (CMU)
- Universal Database (Maya Design)