Stanford InterLib Technologies

Slides: 56
Provided by: HectorGar2
1
Stanford InterLib Technologies
  • Hector Garcia-Molina
  • and the Stanford DigLib Team

2
Stanford Digital Libraries Team
  • Faculty
      • Dan Boneh, Hector Garcia-Molina, Terry Winograd
  • Research Scientist
      • Andreas Paepcke
  • Librarians
      • Vicky Reich, Rebecca Wesley
  • Partners
      • InterLib Partners, ACM, Dialog, Hitachi, IBM,
        Intel, Microsoft, NASA Ames Library, Stanford
        Libraries, SUL HighWire Press, Xerox

3
Barriers to Effective DLs
Physical Barriers
Economic Concerns
Information Loss
Information Overload
Service Heterogeneity
4
Thrusts
Physical Barriers
  • Mobile Access

Economic Concerns
  • IP Infrastructure

Information Loss
  • Archival Repository

Information Overload
  • Value Filtering

Service Heterogeneity
  • Interoperability

5
DL Interoperability Challenges
  • Growing number of players, formats, countries,...
  • Repositories → Services
  • Dynamic artifacts
  • Reliability

6
DL Interoperability Challenges
  • Growing number of players, formats, countries,...
  • Repositories → Services
  • Dynamic artifacts
  • Reliability

Solution: InfoBus → InterServ
7
InfoBus Example
Q: Find Ti distributed (W) systems
[Diagram: InfoBus components Query Trans, Meta Data, Contracts, DLite, Gloss, U-Pai; proxies Dialog Proxy, Folio Proxy, DigiCash Proxy, F.V. Proxy; proxied services F.V., Folio, Dialog, DigiCash]
8
InfoBus Example
Q: Find Ti distributed (W) systems
Suggested: Folio, Dialog
[Diagram: InfoBus components Query Trans, Meta Data, Contracts, DLite, Gloss, U-Pai; proxies Dialog Proxy, Folio Proxy, DigiCash Proxy, F.V. Proxy; proxied services F.V., Folio, Dialog, DigiCash]
9
InfoBus Example
Q: Find Ti distributed (W) systems
Query Translation
[Diagram: InfoBus components Query Trans, Meta Data, Contracts, DLite, Gloss, U-Pai; proxies Dialog Proxy, Folio Proxy, DigiCash Proxy, F.V. Proxy; proxied services F.V., Folio, Dialog, DigiCash]
Q: Find Ti distributed AND systems
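The before and after queries on this slide can be reproduced with a small rewrite rule. This is a minimal sketch, not InfoBus code: it assumes `(W)` is an adjacency operator (as in Dialog) and that the target source only understands `AND`. The function names and the post-filter are hypothetical additions; the filter is there because `AND` matches more documents than adjacency does.

```python
import re

# Hypothetical helpers; the slide only shows the query before and after translation.
PROXIMITY = re.compile(r"\s*\(W\)\s*")

def translate_query(query: str) -> str:
    """Rewrite a '(W)' adjacency operator into a plain AND."""
    return PROXIMITY.sub(" AND ", query)

def adjacency_filter(first: str, second: str, title: str) -> bool:
    """Post-filter: True only if `first` is immediately followed by `second`."""
    words = title.lower().split()
    return any(a == first.lower() and b == second.lower()
               for a, b in zip(words, words[1:]))

print(translate_query("Find Ti distributed (W) systems"))
# prints: Find Ti distributed AND systems
```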
10
InfoBus Example
Q: Find Ti distributed (W) systems
Pay per View
[Diagram: InfoBus components Query Trans, Meta Data, Contracts, DLite, Gloss, U-Pai; proxies Dialog Proxy, Folio Proxy, DigiCash Proxy, F.V. Proxy; proxied services F.V., Folio, Dialog, DigiCash]
11
InterServ
Dynamic Artifacts
Services
Sophistication
Perpetual Activity
InfoBus
InfoBus Pro
Maturity
12
Perpetual Activity Service
Service
register
P.A.S.
User Request
state plans
13
Perpetual Activity Service
Service
register
restart service, use alternate
P.A.S.
check
check
User Request
restore state, try alternatives
state plans
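The register / check / restart / use-alternate labels above suggest a watchdog-style registry. The sketch below is one possible reading under that assumption; every class, field, and method name is hypothetical, and the saved "state plans" for user requests are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Registration:
    name: str
    is_alive: Callable[[], bool]      # health probe supplied at registration
    restart: Callable[[], bool]       # returns True if the restart succeeded
    alternates: List[str] = field(default_factory=list)

class PerpetualActivityService:
    def __init__(self) -> None:
        self.registry: Dict[str, Registration] = {}

    def register(self, reg: Registration) -> None:
        self.registry[reg.name] = reg

    def check(self, name: str) -> str:
        """Return the name of a service that can currently handle requests."""
        reg = self.registry[name]
        if reg.is_alive() or reg.restart():       # try a restart before giving up
            return name
        for alt in reg.alternates:                # otherwise fall back to an alternate
            alt_reg = self.registry.get(alt)
            if alt_reg is not None and alt_reg.is_alive():
                return alt
        raise RuntimeError(f"no live service for {name}")

pas = PerpetualActivityService()
pas.register(Registration("folio", lambda: False, lambda: False, ["dialog"]))
pas.register(Registration("dialog", lambda: True, lambda: True))
print(pas.check("folio"))   # prints: dialog
```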
14
SDLIP
  • Simple Digital Library Interoperability Protocol
  • Goal: get InterLib (and DLI2) to interoperate!

15
Search Protocol Initial Goals
  • Trivial to implement!
  • Works over CORBA/COM, DASL/HTTP
  • Use XML
  • Does not prescribe query format
  • Does not prescribe result format
  • Small footprint (Desktop/Laptop/PDA)
  • Allows for stateful or stateless operation

But lets you say what you're using
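One way to picture "does not prescribe the query format, but lets you say what you're using" is an XML envelope that tags the query with its language. The element and attribute names below are invented for illustration and are not taken from the SDLIP specification.

```python
import xml.etree.ElementTree as ET

def build_search_request(query: str, query_lang: str, result_format: str) -> str:
    """Wrap an opaque query string and declare which query language it uses."""
    req = ET.Element("searchRequest")                     # hypothetical element name
    q = ET.SubElement(req, "query", {"lang": query_lang})
    q.text = query                                        # passed through untouched
    ET.SubElement(req, "resultFormat").text = result_format
    return ET.tostring(req, encoding="unicode")

def read_query_language(xml_text: str) -> str:
    """A server can inspect the declared language before deciding how to parse."""
    return ET.fromstring(xml_text).find("query").get("lang")

request = build_search_request("Ti distributed (W) systems", "dialog", "dublin-core")
print(read_query_language(request))   # prints: dialog
```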
16
Interface Consists of Four Components
17
SDLIP Status
  • Design Meeting June 22, 1999

18
SDLIP Status
  • Design Meeting June 22, 1999
  • Client and Server Toolkits Available
  • Extensive Documentation
  • See http://www-diglib.Stanford.EDU/testbed/doc2/SDLIP/

19
Current SDLIP Sources
  • Some Web sources
      • People Lookup: www.switchboard.com
      • Altavista
      • IMDB (movies)
  • NCSTRL services: www.ncstrl.org
  • Dienst compliant services, e.g., CoRR?
  • Z39.50 servers
      • e.g., Library of Congress
  • Stanford WebBase
  • CDL
      • e.g., MELVYL gateway
  • DASL-compliant servers

20
Existing Clients
  • Java
      • command line
      • applet
  • C
  • Palm Pilot
  • TCL (Ray Larson)
  • DASL-compliant clients

21
Filtering Challenges
  • Too much information
  • Not controlled

22
Current Filtering
textual similarity
23
Page Rank Filtering
textual similarity
page rank (Google)
24
Initial Page Rank
[Diagram: link graph with initial rank values; labels shown: 1, 4]
25
Recursive Page Rank
[Diagram: link graph with recursively computed rank values; labels shown: 2, 1, 2, 1212, 6, 4, 1, 6]
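The recursive idea, where a page's rank depends on the ranks of the pages linking to it, is usually computed by repeated propagation. The sketch below uses the commonly published damped form; the damping factor of 0.85, the iteration count, and the three-page graph are assumptions, and the slide's own figures may come from a simpler, unnormalized variant. Every link target must also appear as a key.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Power iteration over `links`, a dict mapping each page to its outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # initial rank: uniform
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # a page passes its rank on in equal shares to the pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # illustrative graph
ranks = page_rank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```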
26
Value Filtering
access
textual similarity
opinions
page rank
context
geography
27
Value Filtering Challenges
  • Collection of Value Information
  • Scalability
  • Privacy of Value Information
  • Understanding Page Rank
  • Searching Non-Text Objects
  • Combining Value Information
  • HCI Aspects
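"Combining Value Information" is listed as an open challenge. The simplest baseline is a weighted average of per-document signals. This is only a sketch: the signal names come from the Value Filtering slide, but the weights, the scores, and the assumption that every signal is normalized to [0, 1] are invented for illustration.

```python
from typing import Dict

def combined_value(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-document signals, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    # a missing signal counts as 0.0 rather than raising
    return sum(weights[k] * signals.get(k, 0.0) for k in weights) / total_weight

weights = {"textual_similarity": 0.5, "page_rank": 0.3, "opinions": 0.2}
docs = {
    "doc1": {"textual_similarity": 0.9, "page_rank": 0.2, "opinions": 0.5},
    "doc2": {"textual_similarity": 0.6, "page_rank": 0.9, "opinions": 0.8},
}
ranking = sorted(docs, key=lambda d: combined_value(docs[d], weights), reverse=True)
print(ranking)   # prints: ['doc2', 'doc1']
```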

28
WebBase Goals
  • Manage very large collections of Web pages
  • Enable large-scale Web-related research
  • Locally provide a significant portion of the Web
  • Efficient wide-area Web data distribution

29
Challenges
  • Huge information space
      • Wide area distribution
      • URL space (to remember while crawling)
      • Web content (to store)
  • Limited resources
      • Disk
      • Time
      • Memory
      • Bandwidth
      • Server administrator tolerance
  • Continuous evolution
      • More pages
      • Pages change/disappear
      • Mirror sites installed
      • Keeping data fresh
  • Crawling issues
      • Data fiefdoms: firewalls, access permissions,
        load controls
      • Overhead per site: DNS lookups, processing
        robots.txt
      • Parallelization
      • Ability to interrupt and restart
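Two of the crawling issues above, processing robots.txt and the ability to interrupt and restart, can be sketched with the standard library alone. This is not WebBase code: the `Frontier` class, its JSON checkpoint format, and the `allowed` helper are hypothetical, and no network access is performed.

```python
import json
from collections import deque
from urllib.robotparser import RobotFileParser

class Frontier:
    """URL queue that can be checkpointed so a crawl can be interrupted and resumed."""
    def __init__(self, seeds=()):
        self.queue = deque(seeds)
        self.seen = set(seeds)

    def add(self, url):
        if url not in self.seen:      # the "URL space to remember while crawling"
            self.seen.add(url)
            self.queue.append(url)

    def next(self):
        return self.queue.popleft() if self.queue else None

    def dump(self) -> str:
        """Serialize the crawl state; write this to disk before stopping."""
        return json.dumps({"queue": list(self.queue), "seen": sorted(self.seen)})

    @classmethod
    def load(cls, text: str) -> "Frontier":
        state = json.loads(text)
        frontier = cls()
        frontier.queue = deque(state["queue"])
        frontier.seen = set(state["seen"])
        return frontier

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Caching the parsed robots.txt per site, rather than re-parsing it for every URL as this helper does, would be one way to reduce the per-site overhead the slide mentions.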

30
WebBase Architecture
[Diagram components: WWW, Repository, Feature Repository, Retrieval Indexes, Webbase API, Multicast Engine, and several Clients]
31
Mobile Access Challenges
  • Limited Resources
  • Transitions Between Devices
  • Exploiting Context

32
Mobile Access Challenges
  • Limited Resources
  • Transitions Between Devices
  • Exploiting Context
  • Solutions
      • Power Browsing
      • Information Tiles
      • Information Paging

33
Power Browsing
?
34
Power Browsing
?
  • Techniques
      • Show only text headers
      • Show URLs, anchors, titles
      • Order URLs by page rank
      • Summarize text
      • Summarize set of pages
      • Low-resolution pictures
      • Display relevant text
      • ...
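Two of the techniques listed, showing only text headers and ordering URLs by page rank, can be sketched with Python's built-in HTML parser. This is an illustration, not the PowerBrowser implementation: the class and function names are hypothetical, the rank table is supplied by the caller, and anchors nested inside headers are not handled.

```python
from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Collect header text and (href, anchor text) pairs, dropping body text."""
    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headers, self.links = [], []
        self._in_header = False
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADERS:
            self._in_header, self._buf = True, []
        elif tag == "a":
            self._href, self._buf = dict(attrs).get("href"), []

    def handle_data(self, data):
        if self._in_header or self._href:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADERS and self._in_header:
            self.headers.append("".join(self._buf).strip())
            self._in_header = False
        elif tag == "a" and self._href:
            self.links.append((self._href, "".join(self._buf).strip()))
            self._href = None

def power_view(html, page_rank):
    """Return the page's headers and its links ordered by descending rank."""
    parser = OutlineParser()
    parser.feed(html)
    ordered = sorted(parser.links,
                     key=lambda link: page_rank.get(link[0], 0.0), reverse=True)
    return parser.headers, ordered
```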

35
PowerBrowser - Start Screen
36
PowerBrowser - Hypertext View
37
PowerBrowser - Text View
38
PowerBrowser - History
39
IP Management Challenges
  • Heterogeneity
  • Complexity of Interactions
  • Varied Information Appliances
  • Mobile Access
  • Security/Privacy

40
Fundamental Problem
  • Safeguards (security, privacy, authentication,
    payment, non-repudiation...) are an afterthought
  • Spaghetti code for safeguards
  • Experience at Stanford
      • InterPay, CommPacts, Copy Detection
      • Goal was interoperability
      • Correctness, complexity were problems

41
Example: Simple Pay Per View
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
42
Example: Simple Payment
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
  • Goals
      • Do not want others to see data
      • Do not want library to see account number
      • Need receipt from bank

43
Example: Simple Payment
transfer(amt, account, libAccount)
patron
library
bank
view(docId, account, amt)
  • Goals
      • Do not want others to see data
      • Do not want library to see account number
      • Need receipt from bank

Result: A Mess!
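To see how quickly safeguard logic crowds out the two business calls, here is a hand-coded sketch that meets two of the three goals: the library never sees the account number, and the bank issues a verifiable receipt. It is an assumed arrangement, not the protocol from the slides. In particular, the patron pays the bank directly and hands the library a receipt, the HMAC-signed receipt format is invented, and transport encryption (the first goal) is left out.

```python
import hashlib
import hmac
import secrets

class Bank:
    def __init__(self):
        self._key = secrets.token_bytes(32)   # receipt-signing key, known only to the bank
        self.balances = {}

    def _sign(self, message: str) -> str:
        return hmac.new(self._key, message.encode(), hashlib.sha256).hexdigest()

    def transfer(self, amt, account, lib_account):
        if self.balances.get(account, 0) < amt:
            raise ValueError("insufficient funds")
        self.balances[account] -= amt
        self.balances[lib_account] = self.balances.get(lib_account, 0) + amt
        # the receipt names the payee and amount but not the patron's account
        message = f"{lib_account}:{amt}:{secrets.token_hex(8)}"
        return message, self._sign(message)

    def verify(self, message, signature) -> bool:
        return hmac.compare_digest(self._sign(message), signature)

class Library:
    def __init__(self, bank, account, docs, price):
        self.bank, self.account, self.docs, self.price = bank, account, docs, price
        self._used = set()                    # receipts already redeemed

    def view(self, doc_id, receipt):
        message, signature = receipt
        payee, amt, _nonce = message.split(":")
        if (not self.bank.verify(message, signature) or payee != self.account
                or int(amt) < self.price or message in self._used):
            raise PermissionError("invalid receipt")
        self._used.add(message)
        return self.docs[doc_id]
```

Most of these lines are safeguards (signing, verification, replay protection) rather than the transfer and view operations themselves.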
44
Declarative Safeguards for DLs
  • Safeguards built in at system design time
  • Declare goals, not mechanisms
  • Players, data, ...
  • Who can see what, who can do what, ... (Note:
    access information can also be protected)

Secure DLs
Components: IP Mgmt, Wallets, ...
Declarative Infrastructure
45
Solution
  • Extended Interface Definition Language
  • CORBA- or DCOM-like
  • Example

class artRecord authorized(policy)
    setOwner(encrypted string ownerName,
             encrypted(bank) int price,
             picture pic)
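The same "declare goals, not mechanisms" idea can be approximated in an ordinary language with parameter annotations that a framework inspects. The sketch below mirrors the `setOwner` example using Python's `typing.Annotated` (Python 3.9+). The `Encrypted` marker and the `safeguards` helper are hypothetical, and no encryption is actually performed.

```python
from typing import Annotated, get_type_hints

class Encrypted:
    """Marker: the value must travel encrypted; `reader` names who may decrypt it."""
    def __init__(self, reader=None):
        self.reader = reader

def set_owner(owner_name: Annotated[str, Encrypted()],
              price: Annotated[int, Encrypted("bank")],
              pic: bytes) -> None:
    """Business logic only; the safeguards are declared in the signature."""

def safeguards(func):
    """Collect the declared safeguard for each annotated parameter of `func`."""
    hints = get_type_hints(func, include_extras=True)
    result = {}
    for name, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Encrypted):
                result[name] = meta.reader    # None means no specific reader was named
    return result

print(safeguards(set_owner))   # prints: {'owner_name': None, 'price': 'bank'}
```

A runtime layer could read this table and apply the appropriate mechanism before the call leaves the process, which keeps the mechanism out of the application code.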
46
Declarative Safeguards for DLs
Secure DLs
Components: IP Mgmt, Wallets, ...
Declarative Infrastructure
47
Information Preservation Challenges
  • Preserving the Bits
  • Evolving hardware
  • Evolving software
  • Evolving organizations
  • Preserving the Meaning

48
Stanford Archival Repository
  • Object Identifier ? Signature

handle
  • No Deletions (never ever!)

set
new version?
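The two properties on this slide, identifiers tied to signatures and no deletions, fit a content-addressed, append-only store. The sketch below is one possible reading, assuming the identifier is derived from a hash of the content. SHA-256, the class name, and the per-set version list are illustrative choices, not details from the slides.

```python
import hashlib

class ArchivalStore:
    """Append-only object store: there is deliberately no delete or overwrite method."""
    def __init__(self):
        self._objects = {}     # handle -> bytes, never removed
        self._versions = {}    # set name -> list of handles, oldest first

    def put(self, set_name: str, data: bytes) -> str:
        handle = hashlib.sha256(data).hexdigest()   # identifier derived from the content
        self._objects.setdefault(handle, data)
        history = self._versions.setdefault(set_name, [])
        if not history or history[-1] != handle:
            history.append(handle)                  # a change adds a version, old ones stay
        return handle

    def get(self, handle: str) -> bytes:
        return self._objects[handle]

    def versions(self, set_name: str):
        return list(self._versions.get(set_name, []))

store = ArchivalStore()
first = store.put("home-page", b"version 1")
second = store.put("home-page", b"version 2")
print(len(store.versions("home-page")))   # prints: 2
```

Because the handle is computed from the bytes, a reader can re-hash what `get` returns to detect corruption.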
49
Repository Layers
Intellectual Property
Indexing, Naming
Reliability
Complex Objects
Identity
Object Store
50
Archiving the Web - Problem
users
Web Server
File System
51
Archiving the Web - One Solution
users
Web Server
Archival Repository
File System
52
Archiving the Web - Our Solution
users
users
Web Server
Archival Repository
InfoMonitor
File System
53
InfoMonitor History View
54
InfoMonitor Snapshot View
55
Stanford InterLib Technologies
Physical Barriers
  • Mobile Access

Economic Concerns
  • IP Infrastructure

Information Loss
  • Archival Repository

Information Overload
  • Value Filtering

Service Heterogeneity
  • Interoperability