Database Applications -- The UC Berkeley Environmental Digital Library

1
Database Applications -- The UC Berkeley
Environmental Digital Library
  • University of California, Berkeley
  • School of Information Management and Systems
  • SIMS 257 Database Management

2
Lecture Outline
  • Review
  • Database Administration
  • Database Applications
  • Berkeley's Environmental Digital Library

3
Final Project Requirements
  • See WWW site
  • http://sims.berkeley.edu/courses/is257/f02/index.html
  • Report on personal/group database including:
  • Database description and purpose
  • Data Dictionary
  • Relationships Diagram
  • Sample queries and results (Web or Access tools)
  • Sample forms (Web or Access tools)
  • Sample reports (Web or Access tools)
  • Application Screens (Web or Access tools)

4
Final Presentations and Reports
  • Specifications for final report are on the Web
    Site under assignments
  • Presentations (1 on Nov. 28, Others on Nov 30,
    Dec 5th and 7th (Full))

5
Lecture Outline
  • Review
  • Database Administration
  • Database Applications
  • Berkeley's Environmental Digital Library

6
Terms and Concepts (trad)
  • Data Administration: responsibility for the
    overall management of data resources within an
    organization
  • Database Administration: responsibility for
    physical database design and technical issues in
    database management
  • These roles are often combined or overlapping in
    some organizations

7
Database System Life Cycle
Note: this is a different version of the life
cycle than the one discussed previously
8
Database Planning: DA and DBA functions
  • Develop corporate database strategy (DA)
  • Develop enterprise model (DA)
  • Develop cost/benefit models (DA)
  • Design database environment (DA)
  • Develop data administration plan (DA)

9
Database Analysis: DA and DBA functions
  • Define and model data requirements (DA)
  • Define and model business rules (DA)
  • Define operational requirements (DA)
  • Maintain corporate Data Dictionary (DA)

10
Database Design: DA and DBA functions
  • Perform logical database design (DA)
  • Design external models (subschemas) (DBA)
  • Design internal model (Physical design) (DBA)
  • Design integrity controls (DBA)

11
Database Implementation: DA and DBA functions
  • Specify database access policies (DA and DBA)
  • Establish Security controls (DBA)
  • Supervise Database loading (DBA)
  • Specify test procedures (DBA)
  • Develop application programming standards (DBA)
  • Establish procedures for backup and recovery
    (DBA)
  • Conduct User training (DA and DBA)

12
Operation and Maintenance: DA and DBA functions
  • Monitor database performance (DBA)
  • Tune and reorganize databases (DBA)
  • Enforce standards and procedures (DBA)
  • Support users (DA and DBA)

13
Growth and Change: DA and DBA functions
  • Implement change control procedures (DA and DBA)
  • Plan for growth and change (DA and DBA)
  • Evaluate new technology (DA and DBA)

14
Functions in Database Administration
  • Planning and Design (we have already looked at
    these processes in detail)
  • Data Integrity
  • Backup and Recovery
  • Security Management

15
Data Integrity
  • Intrarecord integrity (enforcing constraints on
    contents of fields, etc.)
  • Referential Integrity (enforcing the validity of
    references between records in the database)
  • Concurrency control (ensuring the validity of
    database updates in a shared multiuser
    environment)
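
The first two kinds of integrity above can be sketched with declarative constraints. This is a minimal illustration using Python's built-in sqlite3 module, not something from the lecture; the agency and report tables and the try_insert helper are hypothetical names chosen for the example. Concurrency control is not covered here.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """CREATE TABLE agency (
               agency_id INTEGER PRIMARY KEY,
               name      TEXT NOT NULL
           )"""
    )
    conn.execute(
        """CREATE TABLE report (
               report_id INTEGER PRIMARY KEY,
               -- referential integrity: must point at an existing agency
               agency_id INTEGER NOT NULL REFERENCES agency(agency_id),
               -- intrarecord integrity: constraint on a field's contents
               pages     INTEGER NOT NULL CHECK (pages > 0)
           )"""
    )
    conn.execute("INSERT INTO agency VALUES (1, 'Example Agency')")
    conn.commit()
    return conn


def try_insert(conn, report_id, agency_id, pages):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO report VALUES (?, ?, ?)",
                (report_id, agency_id, pages),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, a report with zero pages is rejected by the CHECK constraint, and a report that names a nonexistent agency is rejected by the foreign key.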

16
Database Security
  • Views or restricted subschemas
  • Authorization rules to identify users and the
    actions they can perform
  • User-defined procedures (and rule systems) to
    define additional constraints or limitations in
    using the database
  • Encryption to encode sensitive data
  • Authentication schemes to positively identify a
    person attempting to gain access to the database
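
A view used as a restricted subschema, together with a simple authorization rule table, might look like the sketch below. It uses Python's built-in sqlite3 module; the staff table, the staff_public view, and the AUTHORIZATION dictionary are hypothetical. SQLite has no GRANT statement, so the authorization rules are modeled in application code here, whereas a server DBMS would normally handle them with its own privilege system.

```python
import sqlite3

# Hypothetical authorization rules: user -> actions that user may perform.
AUTHORIZATION = {
    "analyst": {"SELECT"},
    "dba": {"SELECT", "INSERT", "UPDATE", "DELETE"},
}


def is_allowed(user, action):
    """Check an action against the rule table; unknown users get nothing."""
    return action in AUTHORIZATION.get(user, set())


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [(1, "Ada", 90000), (2, "Lin", 85000)],
    )
    # The view is the restricted subschema: it exposes no salary column.
    conn.execute("CREATE VIEW staff_public AS SELECT staff_id, name FROM staff")
    conn.commit()
    return conn


def public_columns(conn):
    """List the column names visible through the restricted view."""
    cur = conn.execute("SELECT * FROM staff_public")
    return [d[0] for d in cur.description]
```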

17
Database Backup and Recovery
  • Backup
  • Journaling (audit trail)
  • Checkpoint facility
  • Recovery manager
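
The backup and recovery steps can be illustrated in miniature with the online backup call in Python's built-in sqlite3 module (Connection.backup, available from Python 3.7). This is only a sketch: the dam table and the copy_database helper are hypothetical, and journaling and checkpoints are not shown.

```python
import sqlite3


def copy_database(source):
    """Copy all of `source` into a new in-memory database and return it."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # sqlite3.Connection.backup, Python 3.7+
    return target


def demo():
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE dam (dam_id INTEGER PRIMARY KEY, name TEXT)")
    live.execute("INSERT INTO dam VALUES (1, 'Hypothetical Dam')")
    live.commit()

    snapshot = copy_database(live)       # backup

    live.execute("DELETE FROM dam")      # simulated accidental destruction
    live.commit()

    recovered = copy_database(snapshot)  # recovery into a replacement database
    count = "SELECT COUNT(*) FROM dam"
    return (live.execute(count).fetchone()[0],
            recovered.execute(count).fetchone()[0])
```

Calling demo() returns the row counts of the damaged database and of the recovered copy, in that order.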

18
Disaster Recovery Planning
From Toigo, Disaster Recovery Planning
19
Threats to Assets and Functions
  • Water
  • Fire
  • Power Failure
  • Mechanical breakdown or software failure
  • Accidental or deliberate destruction of hardware
    or software
  • By hackers, disgruntled employees, industrial
    saboteurs, terrorists, or others

20
Threats
  • Between 1967 and 1978 fire and water damage
    accounted for 62% of all data processing
    disasters in the U.S.
  • The water damage was sometimes caused by fighting
    fires
  • More recently, improvements in fire suppression
    (e.g., Halon) for DP centers have meant that water
    is the primary danger to DP centers

21
Kinds of Records
  • Class I: VITAL
  • Essential, irreplaceable or necessary to recovery
  • Class II: IMPORTANT
  • Essential or important, but reproducible with
    difficulty or at extra expense
  • Class III: USEFUL
  • Records whose loss would be inconvenient, but
    which are replaceable
  • Class IV: NONESSENTIAL
  • Records which upon examination are found to be no
    longer necessary

22
Offsite Storage of Data
  • Early offsite storage facilities were often
    intended to survive atomic explosions
  • PRISM International directory
  • Mirror sites (Hot sites)
  • E.g. Cantor-Fitzgerald

23
Lecture Outline
  • Review
  • Database Administration
  • Database Applications
  • Berkeley's Environmental Digital Library

24
Berkeley DL Project
  • Object Relational Database Applications
  • The Berkeley Digital Library Project
  • Slides from RRL and Robert Wilensky, EECS
  • Use of DBMS in DL project

25
Overview
  • What is a Digital Library?
  • Overview of Ongoing Research on Information
    Access in Digital Libraries

26
Digital Libraries Are Like Traditional
Libraries...
  • Involve large repositories of information
    (storage, preservation, and access)
  • Provide information organization and retrieval
    facilities (categorization, indexing)
  • Provide access for communities of users
    (communities may be as large as the general
    public or small as the employees of a particular
    organization)

27
Traditional Library System
28
But Digital Libraries Are Different From
Libraries...
  • Not a physical location with local copies;
    objects held closer to originators
  • Decoupling of storage, organization, access
  • Enhanced Authoring (origination, annotation,
    support for work groups)
  • Subscription, pay-per-view supported in addition
    to free browsing.
  • Integration into user tasks.

29
A Digital Library Infrastructure Model
30
UC Berkeley Digital Library Project
  • Focus: Work-centered digital information
    services
  • Testbed: Digital Library for the California
    Environment
  • Research: Technical agenda supporting
    user-oriented access to large distributed
    collections of diverse data types.
  • Part of the NSF/NASA/DARPA Digital Library
    Initiative (Phases 1 and 2)

31
UCB Digital Library Project Research
Organizations
  • UC Berkeley EECS, SIMS, CED, IST
  • UCOP/CDL
  • Xerox PARC's Document Image Decoding group and
    Work Practices group
  • Hewlett-Packard
  • NEC
  • SUN Microsystems
  • IBM Almaden
  • Microsoft
  • Ricoh California Research
  • Philips Research

32
Testbed: An Environmental Digital Library
  • Collection: Diverse material relevant to
    California's key habitats.
  • Users: A consortium of state agencies,
    development corporations, private corporations,
    regional government alliances, educational
    institutions, and libraries.
  • Potential: Impact on state-wide environmental
    system (CERES)

33
The Environmental Library - Users/Contributors
  • California Resources Agency, California
    Environmental Resources Evaluation System (CERES)
  • California Department of Water Resources
  • The California Department of Fish and Game
  • SANDAG
  • UC Water Resources Center Archives
  • New Partners: CDL and SDSC

34
The Environmental Library - Contents
  • Environmental technical reports, bulletins, etc.
  • County general plans
  • Aerial and ground photography
  • USGS topographic maps
  • Land use and other special purpose maps
  • Sensor data
  • Derived information
  • Collection data bases for the classification and
    distribution of the California biota (e.g.,
    SMASCH)
  • Supporting 3-D, economic, traffic, etc. models
  • Videos collected by the California Resources
    Agency

35
The Environmental Library - Contents
  • As of late 2002, the collection represents over
    one terabyte of data, including over 183,000
    digital images, about 300,000 pages of
    environmental documents, and over 2 million
    records in geographical and botanical databases.

36
Botanical Data
  • The CalFlora Database contains taxonomical and
    distribution information for more than 8000
    native California plants. The Occurrence Database
    includes over 600,000 records of California plant
    sightings from many federal, state, and private
    sources. The botanical databases are linked to
    the CalPhotos collection of California plants,
    and are also linked to external collections of
    data, maps, and photos.

37
Geographical Data
  • Much of the geographical data in the collection
    has been used to develop our web-based GIS
    Viewer. The Street Finder uses 500,000 TIGER
    records of S.F. Bay Area streets along with
    70,000 records from the USGS GNIS database.
    California Dams is a database of information
    about the 1395 dams under state jurisdiction. An
    additional 11 GB of geographical data represents
    maps and imagery that have been processed for
    inclusion as layers in our GIS Viewer. This
    includes Digital Ortho Quads and DRG maps for the
    S.F. Bay Area.

38
Documents
  • Most of the 300,000 pages of digital documents
    are environmental reports and plans that were
    provided by California state agencies. This
    collection includes documents, maps, articles,
    and reports on the California environment
    including Environmental Impact Reports (EIRs),
    educational pamphlets, water usage bulletins, and
    county plans. Documents in this collection come
    from the California Department of Water Resources
    (DWR), California Department of Fish and Game
    (DFG), San Diego Association of Governments
    (SANDAG), and many other agencies. Among the most
    frequently accessed documents are County General
    Plans for every California county and a survey of
    125 Sacramento Delta fish species.

39
Testbed Success Stories
  • LUPIN: CERES Land Use Planning Information
    Network
  • California County General Plans and other
    environmental documents.
  • Enter at Resources Agency Server, documents
    stored at and retrieved from UCB DLIB server.
  • California flood relief efforts
  • High demand for some data sets only available on
    our server (created by document recognition).
  • CalFlora: Creation and interoperation of
    repositories pertaining to plant biology.
  • Cloning of services at Cal State Library, FBI

40
Research Highlights
  • Documents
  • Multivalent Document prototype
  • Page images, structured documents, GIS data,
    photographs
  • Intelligent Access to Content
  • Document recognition
  • Vision-based Image Retrieval: stuff, thing, scene
    retrieval
  • Natural Language Processing: categorizing the
    web, Cheshire II, TileBar Interfaces

41
Multivalent Documents
  • MVD Model
  • radically distributed, open, extensible
  • behaviors and layers
  • behaviors conform to a protocol suite
  • inter-operation via IDEG
  • Applied to enlivening legacy documents
  • various nice behaviors, e.g., lenses

42
Document Presentation
  • Problem: Digital libraries must deliver digital
    documents -- but in what form?
  • Different forms have advantages for particular
    purposes
  • Retrieval
  • Reuse
  • Content Analysis
  • Storage and archiving
  • Combining forms (Multivalent documents)

43
Spectrum of Digital Document Representations
Adapted from Fox, E.A., et al. "Users, User
Interfaces and Objects: Envision, an Electronic
Library," JASIS 44(8), 1993
44
Document Representation: Multivalent Documents
  • Primary user interface/document model for UCB
    Digital Library (Wilensky and Phelps)
  • Goal: An approach to new document representations
    and their authoring.
  • Supports active, distributed, composable
    transformations of multimedia documents.
  • Enables sophisticated annotations, intelligent
    result handling, user-modifiable interface,
    composite documents.

45
Multivalent Documents
48
MVD availability
  • The MVD Browser is now available as open source
    on SourceForge
  • http://sourceforge.net/project/showfiles.php?group_id=44509
  • See also
  • http://http.cs.berkeley.edu/~phelps/Multivalent/

49
GIS in the MVD Framework
  • Layers are georeferenced data sets.
  • Behaviors are
  • display semi-transparently
  • pan
  • zoom
  • issue query
  • display context
  • spatial hyperlinks
  • annotations
  • Written in Java
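
Three of the behaviors listed above (pan, zoom, and semi-transparent display) reduce to small pieces of arithmetic, sketched here. The slide notes that the actual viewer is written in Java; Python is used only for brevity, and the Viewport class and blend function are hypothetical names, not part of the MVD code.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Map window: centre in map units, scale in map units per pixel."""
    cx: float
    cy: float
    units_per_pixel: float

    def pan(self, dx_px, dy_px):
        """Shift the centre by a pixel offset converted to map units."""
        return Viewport(self.cx + dx_px * self.units_per_pixel,
                        self.cy + dy_px * self.units_per_pixel,
                        self.units_per_pixel)

    def zoom(self, factor):
        """factor > 1 zooms in: each pixel then covers fewer map units."""
        return Viewport(self.cx, self.cy, self.units_per_pixel / factor)


def blend(top, bottom, opacity):
    """Semi-transparent display: blend one colour channel of the upper
    layer over the layer beneath it, with opacity between 0 and 1."""
    return opacity * top + (1.0 - opacity) * bottom
```

Because every georeferenced layer shares the same viewport, panning or zooming once keeps all layers aligned, and blend decides how much of a lower layer shows through.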

50
GIS Viewer Features
  • Annotation and saving
  • points, rectangles (w. labels and links), vectors
  • saving of annotations as separate layer
  • Integration with address, street finding,
    gazetteer services
  • Application to image viewing: tilePix
  • Castanet client

54
GIS Viewer Example
http://elib.cs.berkeley.edu/annotations/gis/buildings.html
55
Geographic Information: Plans and Ideas
  • More annotations, flexible saving
  • Support for large vector data sets
  • Interoperability
  • On-the-fly
  • conversion of formats
  • generation of catalogs
  • Via OGDI/GLTP
  • Experimenting with various CERES servers

56
Documents: Information from scanned documents
  • Built document recognizers for some important
    documents, e.g., Bulletin 17, TR-9.
  • Recognized document structure, with an order of
    magnitude better OCR.
  • Automatically generated a 1395-item relational
    database of dams.
  • Enabled access via forms, map interfaces.
  • Enable interoperation with image DB.

60
Document Recognition: Ongoing Work
  • Document recognizers for a dozen document types
  • Development and integration of mathematical OCR
    and recognition.
  • Eventually produce document recognizer generator,
    i.e., make it easier to write recognizers.

61
Vision-Based Image Retrieval
  • Stuff-based queries: blobs
  • Basic blobs: colors, sizes, variable number
  • demonstrated utility for interesting queries
  • Blobworld: above plus texture, applied to
  • retrieving similar images
  • successful learning scene classifier
  • Thing-finding: Successfully deployed detectors;
    adding body plans (adding shape, geometry and
    kinematic constraints)
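
A basic blob query of the kind described above (colors, sizes, variable number) could be sketched as follows. This is a toy illustration under stated assumptions, not the project's retrieval code: the Blob class, the coarse color labels, and the matches and search functions are all hypothetical, and real blobs would come from image segmentation rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    color: str    # coarse colour label, e.g. "blue"
    size: float   # fraction of the image area covered, between 0 and 1


def matches(blobs, color, min_size, min_count=1):
    """True if at least `min_count` blobs have `color` and each covers
    at least `min_size` of the image."""
    hits = [b for b in blobs if b.color == color and b.size >= min_size]
    return len(hits) >= min_count


def search(index, color, min_size, min_count=1):
    """Return the sorted names of images in `index` (name -> list of Blob)
    that satisfy the query."""
    return sorted(name for name, blobs in index.items()
                  if matches(blobs, color, min_size, min_count))
```

A query such as "at least two small yellow blobs" then becomes search(index, "yellow", 0.01, min_count=2).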

62
Image Retrieval Research
  • Finding Stuff vs Things
  • BlobWorld
  • Other Vision Research

63
(Old stuff-based image retrieval: Query)
64
(Old stuff-based image retrieval: Result)
65
Blobworld: use regions for retrieval
  • We want to find general objects, so represent
    images based on coherent regions

68
(Thing-based image retrieval using body
plans: Result)
69
Natural Language Processing
Automatic Topic Assignment
  • Developed automatic categorization/disambiguation
    method to the point where topic assignment (but
    not disambiguation) appears feasible.
  • Ran controlled experiment
  • Took Yahoo as ground truth.
  • Chose 9 overlapping categories; took 1000 web
    pages from Yahoo as input.
  • Result: 84% precision, 48% recall (using top 5
    of 1073 categories)
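
For reference, precision and recall for a topic-assignment experiment like this one can be computed as sketched below. The slide does not say how the two figures were averaged, so the micro-averaging used here is an assumption, and the function and argument names are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Micro-averaged precision and recall over a set of pages.

    assigned: dict page -> set of categories the system proposed
              (for example, its top 5 candidates)
    relevant: dict page -> set of categories taken as ground truth
    """
    true_pos = 0    # proposed categories that are also ground truth
    n_assigned = 0  # all proposed categories
    n_relevant = 0  # all ground-truth categories
    for page, truth in relevant.items():
        proposed = assigned.get(page, set())
        true_pos += len(proposed & truth)
        n_assigned += len(proposed)
        n_relevant += len(truth)
    precision = true_pos / n_assigned if n_assigned else 0.0
    recall = true_pos / n_relevant if n_relevant else 0.0
    return precision, recall
```

Precision is the share of proposed categories that are correct; recall is the share of correct categories that were proposed.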

70
Further Information
  • Berkeley DL web site
  • http://elib.cs.berkeley.edu