Title: New Generation Database Systems: XML Databases and Grid-based Digital Libraries


1
New Generation Database Systems: XML Databases
and Grid-based Digital Libraries
  • University of California, Berkeley
  • School of Information
  • IS 257 Database Management

2
Lecture Outline
  • XML and DBMS
  • The Grid and DBMS
      • The Grid
      • Data Grids
      • Grid-based DBMS

4
Standards: SQL/XML
  • As part of SQL3, an extension called SQL/XML is
    being created that provides a mapping between
    SQL databases and XML
  • The (draft) standard is very complex, but the
    ideas are actually pretty simple
  • Suppose we have a table called EMPLOYEE that has
    columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE,
    SALARY

5
Standards: SQL/XML
  • That table can be mapped to:
      <EMPLOYEE>
        <row>
          <EMPNO>000020</EMPNO>
          <FIRSTNAME>John</FIRSTNAME>
          <LASTNAME>Smith</LASTNAME>
          <BIRTHDATE>1955-08-21</BIRTHDATE>
          <SALARY>52300.00</SALARY>
        </row>
        <row> ... </row>  (and so on, one per table row)
      </EMPLOYEE>
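Below is a minimal Python sketch of the table-to-XML mapping shown on the slide. It uses only the standard library's xml.etree.ElementTree. The table name becomes the root element, each tuple becomes a <row>, and each column becomes a child element, with dates written in ISO 8601 form as in the example. The helper names (table_to_xml, to_xml_text) are hypothetical. The sketch assumes that NULL columns are simply left out of a row, and it ignores details of the actual standard, such as how SQL identifiers that are not legal XML names get encoded.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Column order of the EMPLOYEE table from the slide.
COLUMNS = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]


def to_xml_text(value):
    """Render a SQL value as element text; dates use ISO 8601 (YYYY-MM-DD)."""
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def table_to_xml(table_name, columns, rows):
    """Map a table to XML: a root element named after the table,
    one <row> per tuple, and one child element per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            if value is None:
                continue  # assumption: NULL columns are omitted from the row
            ET.SubElement(row_el, column).text = to_xml_text(value)
    return root


# EMPNO is kept as a string so the leading zeros of a CHAR column survive.
rows = [("000020", "John", "Smith", date(1955, 8, 21), Decimal("52300.00"))]
doc = table_to_xml("EMPLOYEE", COLUMNS, rows)
print(ET.tostring(doc, encoding="unicode"))
```

Running it prints the same element structure as the slide, serialized on a single line. Using Decimal rather than float for SALARY keeps the two-digit scale ("52300.00") intact.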

6
Standards: SQL/XML
  • In addition, the standard says that XML Schemas
    must be generated for each table, and it also
    allows relationships between tables to be
    represented by nesting records from related
    tables in the XML.
  • Variants of this are incorporated into the latest
    versions of Oracle
  • (Slides from the Oracle Web site on Oracle XML)
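The schema-generation requirement can be sketched in a similar way. The hypothetical table_schema function below emits a simplified XML Schema for the EMPLOYEE table: a root element named after the table that may contain any number of <row> elements, with one element per column inside each row. The slide gives only column names, so the SQL column types and the SQL_TO_XSD mapping are assumptions. The schemas the standard actually generates are more detailed than this one; for example, they can record column lengths and precision.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize the namespace with the usual "xs" prefix

# Hypothetical column types: the slide names the columns but not their SQL types.
EMPLOYEE_COLUMNS = [
    ("EMPNO", "CHAR"),
    ("FIRSTNAME", "VARCHAR"),
    ("LASTNAME", "VARCHAR"),
    ("BIRTHDATE", "DATE"),
    ("SALARY", "DECIMAL"),
]

# Simplified SQL-to-XSD type mapping (an assumption, not the standard's full rules).
SQL_TO_XSD = {
    "CHAR": "xs:string",
    "VARCHAR": "xs:string",
    "INTEGER": "xs:integer",
    "DECIMAL": "xs:decimal",
    "DATE": "xs:date",
}


def xsd_tag(tag):
    """Qualify a tag with the XML Schema namespace (ElementTree's {uri}tag form)."""
    return f"{{{XS}}}{tag}"


def table_schema(table_name, columns):
    """Build an XSD describing <TABLE><row><COL>...</COL></row>...</TABLE>."""
    schema = ET.Element(xsd_tag("schema"))
    table_el = ET.SubElement(schema, xsd_tag("element"), name=table_name)
    table_seq = ET.SubElement(ET.SubElement(table_el, xsd_tag("complexType")),
                              xsd_tag("sequence"))
    # Any number of <row> elements, one per table row.
    row_el = ET.SubElement(table_seq, xsd_tag("element"),
                           name="row", minOccurs="0", maxOccurs="unbounded")
    row_seq = ET.SubElement(ET.SubElement(row_el, xsd_tag("complexType")),
                            xsd_tag("sequence"))
    for name, sql_type in columns:
        # minOccurs="0" lets a NULL column be left out of a row.
        ET.SubElement(row_seq, xsd_tag("element"),
                      name=name, type=SQL_TO_XSD[sql_type], minOccurs="0")
    return schema


schema = table_schema("EMPLOYEE", EMPLOYEE_COLUMNS)
print(ET.tostring(schema, encoding="unicode"))
```

Deriving the schema from the same column list that drives the data mapping keeps the two consistent. This is the practical reason a standard would tie schema generation to the table definition.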

7
Lecture Outline
  • XML and DBMS
  • The Grid and DBMS
  • The Grid
  • Data Grids
  • Grid-based DBMS

8
Grid-based Digital Libraries
  • So what's this Grid thing anyhow?
  • Data Grids and Distributed Storage
  • Grid-Based IR
  • Grid-Based Digital Libraries
  • This lecture borrows heavily from presentations
    by Ian Foster (Argonne National Laboratory /
    University of Chicago), Reagan Moore, and others
    from the San Diego Supercomputer Center

9
The Grid: On-Demand Access to Electricity
[Figure: quality and economies of scale plotted against time]
Source: Ian Foster

10
By Analogy, a Computing Grid
  • Decouples production and consumption
  • Enables on-demand access
  • Achieves economies of scale
  • Enhances consumer flexibility
  • Enables new devices
  • On a variety of scales:
      • Department
      • Campus
      • Enterprise
      • Internet

Source: Ian Foster

11
What is the Grid?
  • The short answer is that, whereas the Web is a
    service for sharing information over the
    Internet, the Grid is a service for sharing
    computer power and data storage capacity over the
    Internet. The Grid goes well beyond simple
    communication between computers, and aims
    ultimately to turn the global network of
    computers into one vast computational resource.
  • Source: The Global Grid Forum

12
Not Exactly a New Idea
  • "The time-sharing computer system can unite a
    group of investigators … one can conceive of
    such a facility as an intellectual public
    utility."
      • Fernando Corbató and Robert Fano, 1966
  • "We will perhaps see the spread of computer
    utilities, which, like present electric and
    telephone utilities, will service individual
    homes and offices across the country."
      • Len Kleinrock, 1967

Source: Ian Foster

13
But Things Are Different Now
  • Networks are far faster (and cheaper)
      • Faster than computer backplanes
  • Computing is very different from pre-Net computing
      • Our computers have already disintegrated
  • E-commerce increases the size of demand peaks
  • Entirely new applications and social structures
  • We've learned a few things about software

Source: Ian Foster

14
Computing Isn't Really Like Electricity
  • I import electricity but must export data
  • Computing is not interchangeable but highly
    heterogeneous: data, sensors, services, …
  • This complicates things, but it also means that
    the sum can be greater than the parts
  • Real opportunity: construct new capabilities
    dynamically from distributed services
  • Raises three fundamental questions:
      • Can I really achieve economies of scale?
      • Can I achieve QoS across distributed services?
      • Can I identify apps that exploit synergies?

Source: Ian Foster

15
Why the Grid? (1) Revolution in Science
  • Pre-Internet
      • Theorize and/or experiment, alone or in small
        teams; publish paper
  • Post-Internet
      • Construct and mine large databases of
        observational or simulation data
      • Develop simulations and analyses
      • Access specialized devices remotely
      • Exchange information within distributed
        multidisciplinary teams

Source: Ian Foster

16
Why the Grid? (2) Revolution in Business
  • Pre-Internet
      • Central data processing facility
  • Post-Internet
      • Enterprise computing is highly distributed,
        heterogeneous, inter-enterprise (B2B)
      • Business processes are increasingly
        computing- and data-rich
      • Outsourcing becomes feasible → service
        providers of various sorts

Source: Ian Foster

17
The Information Grid
  • Imagine a web of data
  • Machine readable
      • Search, aggregate, transform, report on, and
        mine data using more computers and fewer humans
  • Scalable
      • Machines are cheap: you can buy 50 machines with
        100 GB of memory and 100 TB of disk for under
        $100K, and prices are dropping
      • Network is now faster than disk
  • Flexible
      • Move data around without breaking the apps

Source: S. Banerjee, O. Alonso, M. Drake, Oracle

18
The Foundations are Being Laid
19
Data Grid Problem
  • Enable a geographically distributed community
    of thousands to pool their resources in order
    to perform sophisticated, computationally
    intensive analyses on petabytes of data
  • Note that this problem:
      • Is common to many areas of science
      • Overlaps strongly with other Grid problems

20
Data Grids for High Energy Physics
[Figure; image courtesy of Harvey Newman, Caltech]

21
Grids and Open Standards
[Figure: increased functionality and standardization plotted against time; labels include "Custom solutions" and "App-specific Services"]

22
The Grid as Enabler of 21st Century Science
  • Entirely new approaches to enquiry based on:
      • Deep analysis of huge quantities of data
      • Interdisciplinary collaboration
      • Large-scale simulation
      • Smart instrumentation
  • Made possible by an infrastructure that provides
    access to, and integration of, resources and
    services without regard for location

23
Not Only Science
  • The database world is moving to the Grid for
    large-scale applications
  • Oracle 10g is specifically designed to exploit
    clustered/grid computing using RACs (Real
    Application Clusters)
  • An example from the information/publishing world:
      • Presentation from Oracle about Thomson Legal's
        use of Oracle 10g and RACs