Digital Libraries: Re-inventing Scholarly Information Dissemination and Use

1
Digital Libraries: Re-inventing Scholarly
Information Dissemination and Use
  • Robert Wilensky
  • Principal Investigator
  • David Forsyth
  • Co-principal Investigator
  • The UC Berkeley Digital Library Team

2
Central Thrusts
  • Provide tools to facilitate
  • changing the publishing model
  • from centralized, linear, binary, expensive,
    filter-then-disseminate model,
  • to a much less costly, powerful, fully
    distributed disseminate-filter-collaborate
    cycle
  • without sacrificing good organization, peer
    review
  • treating non-textual material (photos, video,
    maps, primary data sets) as first class citizens

3
Who We Are
  • PI and Co-PI
  • Robert Wilensky (CS & SIMS)
  • David Forsyth (CS)
  • Faculty Investigators
  • Richard Fateman (CS)
  • Ray Larson (SIMS)
  • Jitendra Malik (CS)
  • Philip Stark (Statistics)
  • Doug Tygar (CS & SIMS)
  • Nancy Van House (SIMS)
  • Hal Varian (SIMS)
  • Marti Hearst (SIMS)
  • James Landay (CS)
  • Joe Hellerstein (CS)
  • Post-docs
  • Kobus Barnard
  • Other Investigators
  • Henry Baird (Xerox PARC)
  • Bernie Hurley (UCB Library)
  • Pinar Duygulu (Middle East Technical University)
  • Students
  • Byunghoon Kang
  • Xiaofeng Ren
  • Sumeet Solanki
  • Staff
  • Ginger Ogle
  • Jeff Anderson-Lee
  • Howard Foster
  • Loretta Willis
  • Joyce Gross
  • Tom Phelps
  • Tracy Riggs
  • Byunghoon Kang
  • Jon Traupman

4
Partners
  • DLIB InterOp Project Partners
  • Stanford, UCSB
  • California Digital Library
  • SDSC
  • Not-for-profits
  • CalFlora
  • California Academy of Sciences
  • Fine Arts Museums of San Francisco
  • California Department of Fish and Game
  • UCB Organizations
  • Museum of Vertebrate Zoology
  • Jepson Herbarium
  • U.C.B. Library
  • U.C.B. Instructional Technology Program
  • Corporate
  • Xerox PARC
  • Hewlett-Packard
  • IBM Almaden
  • NEC
  • Sun Microsystems
  • Microsoft
  • Sharp

5
Some Technology
  • New Document Models
  • Multivalent Documents
  • GIS Viewer
  • Related Tools: TilePic and GIS Lite
  • Collaborative quality filtering as a proxy for
    academic review
  • Robust Linking
  • Personal Libraries
  • Image Analysis
  • Better content-based image analysis
  • Combining image and text
  • Self-administrating Documents
  • Document recognition
  • A turbo recognition DID-based approach to
    document layout analysis.
  • Collections
  • Biologically-related large image and data
    collections
  • Rare books scanning effort

6
Tools for Information Management and Collaboration
  • Multivalent Documents: A Platform for New Ideas
  • An Anytime, Anywhere, Any Type, Every Way
    User-Improvable Digital Document Platform
  • Not format-centric: radically extensible to
  • support any format
  • perform standard document functionality
  • implement your new idea
  • Extensions work across all formats.

7
Multivalent Architecture
  • Extensibility achieved by
  • behaviors and layers paradigm
  • behaviors written to conform to an open protocol
  • document tree (that includes UI)
  • So each document can be its own custom browser.
  • Conducive to developing a digital
    library-centric browser
  • E.g., easy to support distributed annotation.

8
Multivalent Status
  • Multivalent Browser, DR4, available beta ASN
  • An open source Java (1.4) application, at
    http://http.cs.berkeley.edu/phelps/Multivalent/
  • standard browser features (cache, UI, bookmarks,
    etc.), robust URL support
  • Implemented behaviors
  • Media Adaptors
  • HTML 3.2 + CSS
  • LaTeX/DVI
  • ASCII
  • PDF
  • enlivened scanned images
  • multi-page
  • Span: hyperlink, highlight, copyeditor
    annotations, anchored ink, style, redaction
  • Lenses: show OCR, magnify, cypher, notes,
    rulers, etc.
  • Structural: alt. select-and-paste, Notemarks
  • Misc: search hit visualization, managers

9
Multivalent Plans
  • Support project goals by providing
  • Complete media adaptors for common document
    formats (esp. HTML+CSS, XML, LaTeX/DVI, PDF)
  • More standard browser features (e.g.,
    hierarchical bookmarks, preference editing)
  • Experiment with history-enriched digital
    objects
  • Mechanisms for manipulating multiple annotations
  • Support from document collaboration services
  • Support for (non-textual) data types
  • temporal and geographic extent, via JMF 2.0
  • involving dynamic elements
  • data set elements

10
Related Image-oriented Tools
  • GIS Viewer
  • 4.0 released to public
  • Related Tools
  • TilePic
  • GIS Lite

11
Robustness: The Challenge
  • How do we put together distributed applications
  • that rely on independently administered
    distributed resources
  • which change chaotically
  • yet whose performance degrades gracefully as the
    world changes?
  • One answer: Provide multiple, largely
    independent descriptions along uncontrolled
    network boundaries.

12
Robust Linking
  • Robust Locations
  • Refer to locations within a resource, but can
    still be used to find the location after the page
    is edited.
  • Implemented in Multivalent Browser
  • Robust Hyperlinks
  • Refer to whole resources, but can still be used
    to find the resource after the page is moved,
    etc.
  • Available now: http://www.cs.berkeley.edu/phelps/Robust
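
The robust-location idea above can be illustrated with a small sketch. This is not the Multivalent implementation: it assumes a plain-text document and a location record holding three independent descriptions (an offset, the target text, and the text just before it). The names `make_location` and `find_location` are hypothetical.

```python
def make_location(text, offset, length, context=20):
    """Record a span by offset and by its own text plus the text just before it."""
    return {
        "offset": offset,
        "target": text[offset:offset + length],
        "before": text[max(0, offset - context):offset],
    }


def find_location(text, loc):
    """Return the span's current offset, or None if no description matches."""
    target, before = loc["target"], loc["before"]

    # 1. Cheapest description: the original offset still holds the target.
    if text.startswith(target, loc["offset"]):
        return loc["offset"]

    # 2. Independent description: the target preceded by its old context.
    i = text.find(before + target)
    if i != -1:
        return i + len(before)

    # 3. Last resort: the target text alone, anywhere in the document.
    i = text.find(target)
    return i if i != -1 else None
```

Each description fails under different kinds of edits, so trying them in turn lets the lookup degrade gracefully rather than break outright.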

13
Robust Hyperlink Example
  • Compute lexical signature of page
    http://www.eng.nsf.gov/engnews/2001/Dec01RobotLegos/dec01robotlegos.htm
  • which turns out to be: jjarosz lambirth
    telesurgery jarosz simulating
  • Add to URL to make robust URL:
  • http://www.eng.nsf.gov/engnews/2001/Dec01RobotLegos/dec01robotlegos.htm/?lexical-signature=jjarosz+lambirth+telesurgery+jarosz+simulating
  • Feed signature to a search engine on URL failure
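
A minimal sketch of how such a signature might be computed and attached. The slide does not state the term-selection rule, so the TF-IDF style weighting here is an assumption, as are the function names, the `doc_freq` table of corpus document frequencies, and the exact query-string format.

```python
import math
import re
from collections import Counter


def lexical_signature(text, doc_freq, num_docs, k=5):
    """Pick the k terms that are frequent on this page but rare in the corpus."""
    terms = re.findall(r"[a-z]+", text.lower())
    counts = Counter(terms)

    def score(term):
        # Term frequency times inverse document frequency (assumed weighting).
        idf = math.log(num_docs / (1 + doc_freq.get(term, 0)))
        return counts[term] * idf

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-score(t), t))
    return ranked[:k]


def robust_url(url, signature):
    """Append the signature as a query parameter (assumed format)."""
    return url + "?lexical-signature=" + "+".join(signature)
```

Rare terms such as proper names make good signature words because few other pages contain all of them at once.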

14
Robust Hyperlinks Plans
  • Problem: No one wants to bother signing
    anything.
  • Proposal: Build a URL-signature database; fail
    over to it upon 404 errors.
  • using Stanford's WebBase
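
The proposed fail-over could look roughly like the sketch below. Everything here is a hypothetical stand-in: `fetch` returns a `(status, body)` pair, `signature_db` maps a URL to its stored signature terms, and `search` returns candidate URLs for a list of terms. None of these reflect an actual WebBase interface.

```python
def resolve(url, fetch, signature_db, search):
    """Try the URL first; on a 404, fall back to a signature lookup and search."""
    status, body = fetch(url)
    if status != 404:
        return url, body

    signature = signature_db.get(url)
    if signature is None:
        return None, None  # no stored signature: nothing to fall back on

    # Walk the search results and accept the first candidate that loads.
    for candidate in search(signature):
        status, body = fetch(candidate)
        if status == 200:
            return candidate, body
    return None, None
```

Because the database is built by a crawler rather than by authors, links gain a fallback without anyone having to sign them.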

15
Collaborative Quality Filtering
  • Idea: Traditional peer review is majorized by a
    good collaborative filtering system.
  • I.e., publishing = dissemination + collaborative
    quality filtering
  • Approach
  • Good papers are ones good reviewers rate highly,
    etc.
  • Good reviewers are the ones that rate papers
    accurately.
  • Assumption: Good reviewers' reviews should agree
    with the asymptotic average (looking forwards)
  • Use hubs-and-authorities type algorithm to
    establish credentials.
  • Note
  • Can rate along multiple dimensions, e.g.,
    importance and correctness
  • Later on, can add other factors, e.g.,
    prediction of asymptotic citation index,
    credentials, expertise
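
The mutual definition of good papers and good reviewers can be sketched as a fixed-point iteration. This is an illustration, not the project's algorithm: it uses the current weighted average as a stand-in for the asymptotic average, and the weight formula and its constant `10.0` are arbitrary assumptions.

```python
def rate(reviews, iterations=20):
    """reviews: list of (reviewer, paper, score) with score in [0, 1].

    Returns (paper_quality, reviewer_weight) dictionaries.
    """
    reviewers = {rev for rev, _, _ in reviews}
    papers = {pap for _, pap, _ in reviews}
    weight = {r: 1.0 for r in reviewers}
    quality = {}

    for _ in range(iterations):
        # Good papers are the ones that trusted reviewers rate highly.
        for p in papers:
            rows = [(weight[rev], s) for rev, pap, s in reviews if pap == p]
            total = sum(w for w, _ in rows)
            quality[p] = sum(w * s for w, s in rows) / total

        # Good reviewers are the ones whose scores sit close to the consensus.
        for r in reviewers:
            errors = [abs(s - quality[pap]) for rev, pap, s in reviews if rev == r]
            weight[r] = 1.0 / (1.0 + 10.0 * sum(errors) / len(errors))

    return quality, weight
```

A reviewer who consistently disagrees with the emerging consensus loses weight, so that reviewer's scores pull each paper's rating less on later rounds.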

16
Collaborative Quality Filtering (cont)
  • Simple algorithm predicts users' evaluations of
    reviewers in empirical study.
  • Parameters for number of items reviewer has
    reviewed, no. of reviews item receives, rank of
    review.
  • Advanced version incorporating notion of areas of
    expertise being tested.
  • Maintains reviewer ratings on a per-document
    basis; computes document rating based on
    similarity of documents.
  • Initial implementations in collab. with NEC
    CiteSeer
  • More details are available.

17
Personal Libraries
  • Goal: Make it easy for individuals and groups to
    build and manage document collections.
  • Seamlessly incorporating digital-born and legacy
    documents
  • Approach: Provide collection manager
  • Manages collections in distributed repositories
  • Initial prototype
  • Supports collection creation, population,
    editing, access by metadata searching, full-text
    indexing.
  • HTML, PDF, ASCII, scanned images, composites
    (prototype)
  • Provides affiliated repository service
  • Scan-to-collection service

18
Personal Libraries (cont)
  • Future directions
  • Incorporate robust linking
  • Full support for composites
  • Automatic collection population
  • OceanStore backend?
  • Have begun experimental use by CS Division and
    SIMS

19
Image Analysis for Access
  • BlobWorld
  • A new framework for segmentation (normalized
    cuts)
  • Shape
  • New algorithm developed for measuring image shape
    similarity using shape contexts
  • Now hold world record for handwritten digit
    recognition
  • Combining Image Features with Text for Image Data
    Organization and Search
  • Kobus Barnard and David Forsyth

20
Combining Image Features with Text
  • Idea: Use text and image features together
  • Text → semantic categories
  • Image features → visual similarity
  • Together → learn interesting relationships
  • Use statistical models to learn structure
  • Cluster on blobs and (disambiguated,
    hierarchically enhanced) words
  • Arrange clusters in hierarchy
  • Result is automatic organization of large
    collections for
  • Browsing (using GIS Viewer as interface, and
    using TilePic)
  • auto-illustration
  • auto-captioning

24
Combining Image Features with Text
  • And here are some results for labeling image
    segments.

29
Summary
  • We need to rethink the entire cycle of
    information use
  • creation, dissemination and collaboration
  • We must provide support for
  • finding and presenting non-textual material
    (photos, video, maps)
  • collection creation of primary data sources and
    informal publication
  • radically new modes of use
  • robustness in a chaotic world
  • We will need a lot of help!
