Data-Driven Digital Library Applications -- The UC Berkeley Environmental Digital Library

1
Data-Driven Digital Library Applications -- The
UC Berkeley Environmental Digital Library
  • University of California, Berkeley
  • School of Information Management and Systems
  • SIMS 257 Database Management

2
Lecture Outline
  • Final Project
  • Review
  • ORDBMS Feature
  • JDBC Access to DBMS
  • Data-Driven Digital Library Applications
  • Berkeley's Environmental Digital Library

4
Final Project Requirements
  • See WWW site:
  • http://sims.berkeley.edu/courses/is257/f04/index.html
  • Report on personal/group database including:
  • Database description and purpose
  • Data Dictionary
  • Relationships Diagram
  • Sample queries and results (Web or Access tools)
  • Sample forms (Web or Access tools)
  • Sample reports (Web or Access tools)
  • Application Screens (Web or Access tools)

5
Final Presentations and Reports
  • Specifications for final report are on the Web
    Site under assignments
  • Reports due on December 15.
  • Presentations on December 15, 9:00-12:00

6
Lecture Outline
  • Final Project
  • Review
  • ORDBMS Feature
  • JDBC Access to DBMS
  • Data-Driven Digital Library Applications
  • Berkeley's Environmental Digital Library

7
Object Relational Data Model
  • Class, instance, attribute, method, and integrity
    constraints
  • OID per instance
  • Encapsulation
  • Multiple inheritance hierarchy of classes
  • Class references via OID object references
  • Set-Valued attributes
  • Abstract Data Types

8
PostgreSQL
  • All of the usual SQL commands for creating,
    searching, and modifying classes (tables) are
    available, with some additions:
  • Inheritance
  • Non-Atomic Values
  • User defined functions and operators

9
Inheritance
    CREATE TABLE cities (
        name       text,
        population float,
        altitude   int    -- (in ft)
    );
    CREATE TABLE capitals (
        state char(2)
    ) INHERITS (cities);

10
Non-Atomic Values - Arrays
  • Postgres allows attributes of an instance to be
    defined as fixed-length or variable-length
    multi-dimensional arrays. Arrays of any base type
    or user-defined type can be created. To
    illustrate their use, we first create a class
    with arrays of base types.
    CREATE TABLE SAL_EMP (
        name           text,
        pay_by_quarter int4[],
        schedule       text[][]
    );

11
PostgreSQL Extensibility
  • Postgres is extensible because its operation is
    catalog-driven.
  • RDBMSs store information about databases, tables,
    columns, etc., in what are commonly known as
    system catalogs. (Some systems call this the data
    dictionary.)
  • One key difference between Postgres and standard
    RDBMSs is that Postgres stores much more
    information in its catalogs:
  • not only information about tables and columns,
    but also information about its types, functions,
    access methods, etc.
  • These classes can be modified by the user, and
    since Postgres bases its internal operation on
    these classes, this means that Postgres can be
    extended by users
  • By comparison, conventional database systems can
    only be extended by changing hardcoded procedures
    within the DBMS or by loading modules
    specially-written by the DBMS vendor.

12
User Defined Functions
  • CREATE FUNCTION allows a Postgres user to
    register a function with a database.
    Subsequently, this user is considered the owner
    of the function
    CREATE FUNCTION name ( [ ftype [, ...] ] )
        RETURNS rtype
        AS SQLdefinition
        LANGUAGE 'langname'
        [ WITH ( attribute [, ...] ) ]
    CREATE FUNCTION name ( [ ftype [, ...] ] )
        RETURNS rtype
        AS obj_file , link_symbol
        LANGUAGE 'C'
        [ WITH ( attribute [, ...] ) ]

13
External Functions
  • This example creates a C function by calling a
    routine from a user-created shared library. This
    particular routine calculates a check digit and
    returns TRUE if the check digit in the function
    parameters is correct. It is intended for use in
    a CHECK constraint.
    CREATE FUNCTION ean_checkdigit(bpchar, bpchar) RETURNS bool
        AS '/usr1/proj/bray/sql/funcs.so' LANGUAGE 'c';
    CREATE TABLE product (
        id        char(8) PRIMARY KEY,
        eanprefix char(8) CHECK (eanprefix ~ '[0-9]{2} [0-9]{5}')
                          REFERENCES brandname(ean_prefix),
        eancode   char(6) CHECK (eancode ~ '[0-9]{6}'),
        CONSTRAINT ean CHECK (ean_checkdigit(eanprefix, eancode))
    );
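
The slide does not show the C routine behind ean_checkdigit. As a rough illustration of what such a check might compute, here is a minimal Java sketch that assumes the prefix and code together form a 13-digit EAN-13 number whose last digit is the check digit. The class and method names are hypothetical, and the standard EAN-13 weighting (1, 3, 1, 3, ...) is an assumption about what the original routine does.

```java
final class EanCheck {
    /**
     * Returns true when the last digit of code is the correct EAN-13 check
     * digit for the twelve digits that precede it (assumed layout).
     */
    static boolean eanCheckDigit(String prefix, String code) {
        // Keep digits only, so a prefix written as "40 06381" is accepted.
        String digits = (prefix + code).replaceAll("[^0-9]", "");
        if (digits.length() != 13) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int d = digits.charAt(i) - '0';
            // Weights alternate 1, 3, 1, 3, ... starting at the leftmost digit.
            sum += (i % 2 == 0) ? d : 3 * d;
        }
        int expected = (10 - sum % 10) % 10;
        return expected == digits.charAt(12) - '0';
    }

    public static void main(String[] args) {
        System.out.println(eanCheckDigit("40 06381", "333931")); // true
        System.out.println(eanCheckDigit("40 06381", "333932")); // false
    }
}
```

In the database, the same test runs inside the table-level CHECK constraint, so a row with a wrong check digit is rejected at insert time rather than in application code.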

14
Creating new Types
  • CREATE TYPE allows the user to register a new
    user data type with Postgres for use in the
    current database. The user who defines a type
    becomes its owner. typename is the name of the
    new type and must be unique within the types
    defined for this database.
    CREATE TYPE typename ( INPUT = input_function,
        OUTPUT = output_function
        , INTERNALLENGTH = { internallength | VARIABLE }
        [ , EXTERNALLENGTH = { externallength | VARIABLE } ]
        [ , DEFAULT = "default" ]
        [ , ELEMENT = element ] [ , DELIMITER = delimiter ]
        [ , SEND = send_function ] [ , RECEIVE = receive_function ]
        [ , PASSEDBYVALUE ] )

15
Rules System
    CREATE RULE name AS ON event
        TO object [ WHERE condition ]
        DO [ INSTEAD ] [ action | NOTHING ]
  • Rules can be triggered by any event (select,
    update, delete, etc.)

16
Views as Rules
  • Views in Postgres are implemented using the rule
    system. In fact there is absolutely no difference
    between a
    CREATE VIEW myview AS SELECT * FROM mytab;
  • compared against the two commands
    CREATE TABLE myview (same attribute list as for mytab);
    CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
        SELECT * FROM mytab;

17
GiST Approach
  • A generalized search tree must be:
  • Extensible in terms of queries
  • General (B-tree, R-tree, etc.)
  • Easy to extend
  • Efficient (match specialized trees)
  • Highly concurrent, recoverable, etc.
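
One way to picture "easy to extend": in the GiST design, a new index type is defined by supplying a small set of key methods (Consistent, Union, Penalty, PickSplit, and optionally Compress/Decompress) while the tree code itself stays generic. The Java sketch below shows three of these for one-dimensional integer intervals. All names are hypothetical, and this is not PostgreSQL's actual C interface.

```java
final class GistSketch {
    // Hypothetical extension interface: K is the key type, Q the query type.
    interface GistKey<K, Q> {
        boolean consistent(Q query); // could the subtree under this key hold a match?
        K union(K other);            // smallest key covering both keys
        double penalty(K added);     // cost of widening this key to cover 'added'
    }

    // A B-tree-like key: a closed integer range [lo, hi].
    static final class IntervalKey implements GistKey<IntervalKey, Integer> {
        final int lo;
        final int hi;

        IntervalKey(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        public boolean consistent(Integer query) {
            return lo <= query && query <= hi;
        }

        @Override
        public IntervalKey union(IntervalKey other) {
            return new IntervalKey(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }

        @Override
        public double penalty(IntervalKey added) {
            IntervalKey grown = union(added);
            // Penalty here is simply how much the range would have to grow.
            return (grown.hi - grown.lo) - (hi - lo);
        }
    }

    public static void main(String[] args) {
        IntervalKey node = new IntervalKey(1, 3).union(new IntervalKey(5, 9));
        System.out.println(node.lo + ".." + node.hi);
        System.out.println(node.consistent(4));
        System.out.println(node.penalty(new IntervalKey(10, 12)));
    }
}
```

Swapping IntervalKey for a bounding-rectangle key would give R-tree-like behavior with the same tree logic, which is the sense in which the structure is general.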

18
Java and JDBC
  • Java is probably the high-level language used in
    most software development today; one of the
    earliest enterprise additions to Java was JDBC.
  • JDBC is an API that provides mid-level access
    to a DBMS from Java applications.
  • Intended to be an open cross-platform standard
    for database access in Java
  • Similar in intent to Microsoft's ODBC

19
JDBC
  • Provides a standard set of interfaces for any
    DBMS with a JDBC driver, using SQL to specify
    the database operations.

20
JDBC Simple Java Implementation
import java.sql.*;
import oracle.jdbc.*;
public class JDBCSample {
  public static void main(java.lang.String[] args) {
    try {
      // this is where the driver is loaded
      // Class.forName("jdbc.oracle.thin");
      DriverManager.registerDriver(new OracleDriver());
    } catch (SQLException e) {
      System.out.println("Unable to load driver Class");
      return;
    }
21
JDBC Simple Java Impl.
    try {
      // All DB access is within the try/catch block...
      // make a connection to ORACLE on Dream
      Connection con = DriverManager.getConnection(
          "jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev",
          "mylogin", "myoraclePW");
      // Do an SQL statement...
      Statement stmt = con.createStatement();
      ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
22
JDBC Simple Java Impl.
      // show the Results...
      while (rs.next()) {
        System.out.println(rs.getString("NAME"));
      }
      // Release the database resources...
      rs.close();
      stmt.close();
      con.close();
    } catch (SQLException se) {
      // inform user of errors...
      System.out.println("SQL Exception: " + se.getMessage());
      se.printStackTrace(System.out);
    }
  }
}
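
The example on the preceding slides closes the ResultSet, Statement, and Connection by hand, so an exception thrown partway through would skip the close calls. A minimal sketch of the same query using try-with-resources (available since Java 7) is shown below. It assumes a JDBC 4 driver on the classpath, which registers itself, so the explicit registerDriver step is omitted. The class name and the helper methods thinUrl and customerNames are hypothetical; the host, login, and table names are the ones from the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class JdbcSketch {
    // Builds an Oracle thin-driver URL of the form jdbc:oracle:thin:@host:port:sid.
    static String thinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    // Runs the query from the slides and collects the NAME column.
    static List<String> customerNames(Connection con) throws SQLException {
        List<String> names = new ArrayList<>();
        // Both resources are closed automatically, even if an exception is thrown.
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST")) {
            while (rs.next()) {
                names.add(rs.getString("NAME"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String url = thinUrl("dream.sims.berkeley.edu", 1521, "dev");
        try (Connection con = DriverManager.getConnection(url, "mylogin", "myoraclePW")) {
            customerNames(con).forEach(System.out::println);
        } catch (SQLException se) {
            // Without a driver or a reachable server this branch reports the failure.
            System.out.println("SQL Exception: " + se.getMessage());
        }
    }
}
```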
23
Lecture Outline
  • Final Project
  • Review
  • ORDBMS Feature
  • JDBC Access to DBMS
  • Data-Driven Digital Library Applications
  • Berkeley's Environmental Digital Library

24
Berkeley DL Project
  • Object Relational Database Applications
  • The Berkeley Digital Library Project
  • Slides from RRL and Robert Wilensky, EECS
  • Use of DBMS in DL project

25
Overview
  • What is a Digital Library?
  • Overview of Ongoing Research on Information
    Access in Digital Libraries

26
Digital Libraries Are Like Traditional
Libraries...
  • Involve large repositories of information
    (storage, preservation, and access)
  • Provide information organization and retrieval
    facilities (categorization, indexing)
  • Provide access for communities of users
    (communities may be as large as the general
    public or as small as the employees of a particular
    organization)

27
Traditional Library System
28
But Digital Libraries Are Different From
Libraries...
  • Not a physical location with local copies;
    objects held closer to originators
  • Decoupling of storage, organization, access
  • Enhanced Authoring (origination, annotation,
    support for work groups)
  • Subscription, pay-per-view supported in addition
    to free browsing.
  • Integration into user tasks.

29
A Digital Library Infrastructure Model
30
UC Berkeley Digital Library Project
  • Focus: Work-centered digital information
    services
  • Testbed: Digital Library for the California
    Environment
  • Research: Technical agenda supporting
    user-oriented access to large distributed
    collections of diverse data types.
  • Part of the NSF/NASA/DARPA Digital Library
    Initiative (Phases 1 and 2)

31
UCB Digital Library Project Research
Organizations
  • UC Berkeley EECS, SIMS, CED, IST
  • UCOP/CDL
  • Xerox PARC's Document Image Decoding group and
    Work Practices group
  • Hewlett-Packard
  • NEC
  • SUN Microsystems
  • IBM Almaden
  • Microsoft
  • Ricoh California Research
  • Philips Research

32
Testbed: An Environmental Digital Library
  • Collection: Diverse material relevant to
    California's key habitats.
  • Users: A consortium of state agencies,
    development corporations, private corporations,
    regional government alliances, educational
    institutions, and libraries.
  • Potential: Impact on state-wide environmental
    system (CERES)

33
The Environmental Library - Users/Contributors
  • California Resources Agency, California
    Environmental Resources Evaluation System (CERES)
  • California Department of Water Resources
  • The California Department of Fish and Game
  • SANDAG
  • UC Water Resources Center Archives
  • New Partners: CDL and SDSC

34
The Environmental Library - Contents
  • Environmental technical reports, bulletins, etc.
  • County general plans
  • Aerial and ground photography
  • USGS topographic maps
  • Land use and other special purpose maps
  • Sensor data
  • Derived information
  • Collection data bases for the classification and
    distribution of the California biota (e.g.,
    SMASCH)
  • Supporting 3-D, economic, traffic, etc. models
  • Videos collected by the California Resources
    Agency

35
The Environmental Library - Contents
  • As of late 2002, the collection represents over
    one terabyte of data, including over 183,000
    digital images, about 300,000 pages of
    environmental documents, and over 2 million
    records in geographical and botanical databases.

36
Botanical Data
  • The CalFlora Database contains taxonomical and
    distribution information for more than 8000
    native California plants. The Occurrence Database
    includes over 600,000 records of California plant
    sightings from many federal, state, and private
    sources. The botanical databases are linked to
    the CalPhotos collection of California plants,
    and are also linked to external collections of
    data, maps, and photos.

37
Geographical Data
  • Much of the geographical data in the collection
    has been used to develop our web-based GIS
    Viewer. The Street Finder uses 500,000 TIGER
    records of S.F. Bay Area streets along with the
    70,000 records from the USGS GNIS database.
    California Dams is a database of information
    about the 1395 dams under state jurisdiction. An
    additional 11 GB of geographical data represents
    maps and imagery that have been processed for
    inclusion as layers in our GIS Viewer. This
    includes Digital Ortho Quads and DRG maps for the
    S.F. Bay Area.

38
Documents
  • Most of the 300,000 pages of digital documents
    are environmental reports and plans that were
    provided by California state agencies. This
    collection includes documents, maps, articles,
    and reports on the California environment
    including Environmental Impact Reports (EIRs),
    educational pamphlets, water usage bulletins, and
    county plans. Documents in this collection come
    from the California Department of Water Resources
    (DWR), California Department of Fish and Game
    (DFG), San Diego Association of Governments
    (SANDAG), and many other agencies. Among the most
    frequently accessed documents are County General
    Plans for every California county and a survey of
    125 Sacramento Delta fish species.

39
Testbed Success Stories
  • LUPIN: CERES Land Use Planning Information
    Network
  • California County General Plans and other
    environmental documents.
  • Enter at Resources Agency Server; documents
    stored at and retrieved from UCB DLIB server.
  • California flood relief efforts
  • High demand for some data sets only available on
    our server (created by document recognition).
  • CalFlora: Creation and interoperation of
    repositories pertaining to plant biology.
  • Cloning of services at Cal State Library, FBI
40
Research Highlights
  • Documents
  • Multivalent Document prototype
  • Page images, structured documents, GIS data,
    photographs
  • Intelligent Access to Content
  • Document recognition
  • Vision-based Image Retrieval: stuff, thing, scene
    retrieval
  • Natural Language Processing: categorizing the
    web, Cheshire II, TileBar Interfaces

41
Multivalent Documents
  • MVD Model
  • radically distributed, open, extensible
  • behaviors and layers
  • behaviors conform to a protocol suite
  • inter-operation via IDEG
  • Applied to enlivening legacy documents
  • various nice behaviors, e.g., lenses

42
Document Presentation
  • Problem: Digital libraries must deliver digital
    documents -- but in what form?
  • Different forms have advantages for particular
    purposes:
  • Retrieval
  • Reuse
  • Content Analysis
  • Storage and archiving
  • Combining forms (Multivalent documents)

43
Spectrum of Digital Document Representations
Adapted from Fox, E.A., et al., Users, User
Interfaces and Objects: Envision, an Electronic
Library, JASIS 44(8), 1993
44
Document Representation: Multivalent Documents
  • Primary user interface/document model for UCB
    Digital Library (Wilensky & Phelps)
  • Goal: An approach to new document representations
    and their authoring.
  • Supports active, distributed, composable
    transformations of multimedia documents.
  • Enables sophisticated annotations, intelligent
    result handling, user-modifiable interface,
    composite documents.

45
Multivalent Documents
48
MVD availability
  • The MVD Browser is now available as open source
    on SourceForge
  • http://multivalent.sourceforge.net
  • See also
  • http://elib.cs.berkeley.edu

49
GIS in the MVD Framework
  • Layers are georeferenced data sets.
  • Behaviors are
  • display semi-transparently
  • pan
  • zoom
  • issue query
  • display context
  • spatial hyperlinks
  • annotations
  • Written in Java

50
GIS Viewer Features
  • Annotation and saving
  • points, rectangles (w. labels and links), vectors
  • saving of annotations as separate layer
  • Integration with address, street finding,
    gazetteer services
  • Application to image viewing: tilePix
  • Castanet client

54
GIS Viewer Example
http://elib.cs.berkeley.edu/annotations/gis/buildings.html
55
Geographic Information Plans and Ideas
  • More annotations, flexible saving
  • Support for large vector data sets
  • Interoperability
  • On-the-fly
  • conversion of formats
  • generation of catalogs
  • Via OGDI/GLTP
  • Experimenting with various CERES servers

56
Documents: Information from scanned documents
  • Built document recognizers for some important
    documents, e.g., Bulletin 17, TR-9.
  • Recognized document structure, with
    order-of-magnitude better OCR.
  • Automatically generated a 1395-item relational
    database of dams.
  • Enabled access via forms, map interfaces.
  • Enabled interoperation with image DB.

60
Document Recognition: Ongoing Work
  • Document recognizers for a dozen document types
  • Development and integration of mathematical OCR
    and recognition.
  • Eventually produce document recognizer generator,
    i.e., make it easier to write recognizers.

61
Vision-Based Image Retrieval
  • Stuff-based queries: blobs
  • Basic blobs: colors, sizes, variable number
  • demonstrated utility for interesting queries
  • Blobworld: above plus texture, applied to
  • retrieving similar images
  • successful learning scene classifier
  • Thing-finding: successfully deployed detectors;
    adding body plans (adding shape, geometry, and
    kinematic constraints)

62
Image Retrieval Research
  • Finding Stuff vs Things
  • BlobWorld
  • Other Vision Research

63
(Old stuff-based image retrieval: Query)
64
(Old stuff-based image retrieval: Result)
65
Blobworld: use regions for retrieval
  • We want to find general objects, so represent
    images based on coherent regions

68
(Thing-based image retrieval using body
plans: Result)
69
Natural Language Processing
Automatic Topic Assignment
  • Developed automatic categorization/disambiguation
    method to the point where topic assignment (but not
    disambiguation) appears feasible.
  • Ran controlled experiment:
  • Took Yahoo as ground truth.
  • Chose 9 overlapping categories; took 1000 web
    pages from Yahoo as input.
  • Result: 84% precision, 48% recall (using top 5
    of 1073 categories)
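
For readers unfamiliar with the two figures reported above: precision is the fraction of assigned categories that are correct, and recall is the fraction of correct categories that were assigned. The Java sketch below computes both for a single page. The class name and the category labels are made up for illustration and are not data from the experiment.

```java
import java.util.HashSet;
import java.util.Set;

final class TopicEval {
    /** Fraction of assigned categories that appear in the ground truth. */
    static double precision(Set<String> assigned, Set<String> truth) {
        return assigned.isEmpty() ? 0.0 : (double) overlap(assigned, truth) / assigned.size();
    }

    /** Fraction of ground-truth categories that were assigned. */
    static double recall(Set<String> assigned, Set<String> truth) {
        return truth.isEmpty() ? 0.0 : (double) overlap(assigned, truth) / truth.size();
    }

    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        // Hypothetical labels: 3 of the 4 assigned categories are correct,
        // and 3 of the 6 true categories were found.
        Set<String> assigned = Set.of("Ecology", "Botany", "Hydrology", "Geology");
        Set<String> truth = Set.of("Ecology", "Botany", "Hydrology", "Zoology", "Forestry", "Climate");
        System.out.println(precision(assigned, truth)); // 0.75
        System.out.println(recall(assigned, truth));    // 0.5
    }
}
```

High precision with lower recall, as in the reported result, means the assigned topics are usually right but many applicable topics are missed.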

70
Further Information
  • Berkeley DL web site:
  • http://elib.cs.berkeley.edu