OGSA-DAI data access and integration

Provided by: neilch2

1
OGSA-DAI data access and integration
  • NERC GridGIS workshop
  • eSI, 1 February 2006

2
Overview
  • The Data Deluge
  • challenges of increasing data availability
  • benefits of bringing data together
  • OGSA-DAI
  • overview
  • use as a data integration base layer

3
The Data Deluge
  • Entering an age of data
  • Data Explosion
  • CERN LHC will generate 1 GB/s, 10 PB/y
  • VLBA (NRAO) generates 1 GB/s today
  • Pixar generates 100 TB per movie
  • Storage getting cheaper
  • Data stored in many different ways
  • Data resources
  • Relational databases
  • XML databases / files
  • Result files
  • Need ways to facilitate
  • Data discovery
  • Data access
  • Data integration
  • Empower e-Business and e-Science
  • The Grid is a vehicle for achieving this

4
Composing Observations in Astronomy
  • Number and sizes of data sets as of mid-2002,
    grouped by wavelength
  • 12 waveband coverage of large areas of the
    sky
  • Total about 200 TB data
  • Doubling every 12 months
  • Largest catalogues near 1B objects

Data and images courtesy of
Alex Szalay, Johns Hopkins
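
As a rough worked example of the growth figures on this slide: if the 200 TB total and the 12-month doubling time both held steady (an extrapolation, not a claim made by the slide), the archive size in later years could be estimated as below. The function name is illustrative only.

```python
def projected_size_tb(initial_tb: float, doubling_months: float, months: float) -> float:
    """Size after `months`, assuming steady exponential growth."""
    return initial_tb * 2 ** (months / doubling_months)


# 200 TB in mid-2002, doubling every 12 months (figures from the slide).
for years in range(0, 6):
    size = projected_size_tb(200, 12, 12 * years)
    print(f"mid-{2002 + years}: {size:,.0f} TB")
```

Under these assumptions the total passes 1 PB within three years, which is the scale argument the slide is making.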

5
Data Services: motives
  • Key to Integration of Scientific Methods
  • Publication and sharing of results
  • Primary data from observation, simulation and
    experiment
  • Encourages novel uses
  • Allows validation of methods and derivatives
  • Enables discovery by combining data collected
    independently
  • Key to Large-scale Collaboration
  • Economies: data production, publication and
    management
  • Sharing cost of storage, management and curation
  • Many researchers contributing increments of data
  • Pooling annotation leads to rapid incremental
    publication
  • Accommodates global distribution
  • Data and code travel faster and more cheaply
  • Accommodates temporal distribution
  • Researchers assemble data
  • Later (other) researchers access data

6
Data Services: challenges to management
  • Scale
  • Many sites, large collections, many uses
  • Longevity
  • Research requirements outlive technical decisions
  • Diversity
  • No "one size fits all" solutions will work
  • Primary Data, Data Products, Meta Data,
    Administrative data, ...
  • Many Data Resources
  • Independently owned and managed
  • No common goals
  • No common design
  • Work hard for agreements on foundation types and
    ontologies
  • Autonomous decisions change data, structure,
    policy, ...
  • Geographically distributed
  • ... and I haven't even mentioned security yet!

7
Small problems
  • Not just Grand Challenges!
  • Also the small problems
  • For instance
  • What happens to data when a researcher leaves a
    team?
  • How can a research leader point to popular data
    when a new researcher joins?
  • How can you manage your data when you start to
    run out of local storage space?
  • How do I get my data from one format/database to
    another?
  • How do I combine my data with your data?
  • You need to manage your data

8
What is a data service?
  • An interface to a stored collection of data
  • e.g. Google and Amazon
  • web services
  • But the data could be
  • replicated
  • shared
  • federated
  • virtual
  • incomplete
  • Don't care about the underlying representation
  • do care about the information it represents
  • Adding a service layer to existing data sources
    can improve composability
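
The idea of a service layer that hides the underlying representation can be sketched as follows. This is a minimal illustration, not OGSA-DAI code: the `DataService` interface, both implementations, the table and the sample rows are all assumptions made for the example. The caller `sites_in` only sees the interface, so the same function works over a relational table and a CSV file.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Protocol

Row = Dict[str, str]


class DataService(Protocol):
    """Hypothetical interface: callers see rows, not storage details."""

    def find(self, field: str, value: str) -> List[Row]: ...


class SqliteService:
    """Serves rows held in a relational table."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: List[str]):
        self._conn, self._table, self._columns = conn, table, columns

    def find(self, field: str, value: str) -> List[Row]:
        if field not in self._columns:  # also guards the interpolated SQL below
            raise KeyError(field)
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table} WHERE {field} = ?"
        return [dict(zip(self._columns, row)) for row in self._conn.execute(sql, (value,))]


class CsvService:
    """Serves rows held in CSV text."""

    def __init__(self, text: str):
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def find(self, field: str, value: str) -> List[Row]:
        return [dict(row) for row in self._rows if row[field] == value]


def sites_in(service: DataService, region: str) -> List[str]:
    """Client code written once against the interface."""
    return sorted(row["site"] for row in service.find("region", region))


# Invented sample data, one copy per representation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (site TEXT, region TEXT)")
conn.executemany("INSERT INTO stations VALUES (?, ?)",
                 [("Leith", "Lothian"), ("Oban", "Argyll"), ("Dunbar", "Lothian")])
relational = SqliteService(conn, "stations", ["site", "region"])
flat_file = CsvService("site,region\nLeith,Lothian\nOban,Argyll\nDunbar,Lothian\n")

print(sites_in(relational, "Lothian"))
print(sites_in(flat_file, "Lothian"))
```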

9
Examples of Data Services
  • Many Data Services and applications
  • Commercial databases
  • Web interfaces
  • Applications developed individually by groups and
    projects
  • Also many places to get hold of public data
  • Publications and citation servers
  • Results servers
  • But there is no such thing as a free lunch
  • Things are not yet "Plug and Play"
  • You need to expend some effort to use these
    services effectively

10
Use Cases for Data Services
  • Data Filtering
  • Single source producing large amounts of data
    distributed to many sites downstream
  • Data Discovery
  • many sources, many query entry points in a linked
    system
  • Data Translation
  • source to sink, conversion of data model /
    structure
  • Data Federation
  • many sources, linked to provide view as a single
    source
  • Data Replication
  • full or partial copies to improve throughput
  • Data Integration (model aggregation)
  • e.g. integration of time variant data, streams,
    files
  • Data Integration (knowledge expansion)
  • forming links between databases to increase
    knowledge
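
Of the use cases above, data translation (conversion of data model or structure between source and sink) is the easiest to show in a few lines. The sketch below converts tabular CSV text into an XML document using only the Python standard library. The element names `rows` and `row` are arbitrary choices for the example, and it assumes every column name is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text: str, root_tag: str = "rows", row_tag: str = "row") -> str:
    """Translate CSV text into XML: one element per row, one child per column."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("id,name\n1,Ann\n2,Bob\n"))
```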

11
Trade Offs
  • Speed vs completeness
  • do you require the exact answer or an answer?
  • Application specific vs language specific queries
  • how will users interrogate a data service?
  • Static system vs Dynamic Discovery
  • do you actually have dynamic resources?
  • Static vs Dynamic data
  • READ only, READ/INSERT only, UPDATE permitted
  • Static vs Dynamic queries
  • optimisation over flexibility
  • Intranet vs Internet
  • speed over security
  • Single data model versus mixed data models
  • ease/speed over integration
  • Queries vs Questions
  • assume that we know the structure when we form
    the query

12
Requirements on Data Services?
  • Common Data Model e.g. RowSet
  • Common Query Language(s) e.g. XQuery, SQL
  • Standard access to
  • data resource schema information for schema
    mapping
  • physical data resource information for
    optimisation purposes
  • data resource descriptive information for
    discovery / integration
  • Single, seamless security model
  • Dynamic publication and discovery
  • Multiple, efficient delivery methods
  • Move computation towards data
  • Data aggregation functionality
  • Provenance information
  • Replication information
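
One of the requirements above is standard access to data resource schema information. As a hedged illustration of what such access might return, the sketch below reads column metadata from an SQLite database through its `PRAGMA table_info` statement. SQLite stands in for any relational resource here; the function name and the shape of the returned records are assumptions, not part of any specification mentioned in the slides.

```python
import sqlite3
from typing import Dict, List


def describe_table(conn: sqlite3.Connection, table: str) -> List[Dict[str, object]]:
    """Return column name, declared type and primary-key flag for `table`."""
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise KeyError(f"unknown table: {table}")
    # PRAGMA statements cannot take bound parameters, hence the check above.
    info = conn.execute(f'PRAGMA table_info("{table}")')
    # table_info columns: cid, name, type, notnull, dflt_value, pk
    return [{"name": r[1], "type": r[2], "primary_key": bool(r[5])} for r in info]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, p_name TEXT)")
schema = describe_table(conn, "patient")
for column in schema:
    print(column)
```

A client could use records like these for schema mapping without knowing which database product sits behind the service.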

13
OGSA-DAI In One Slide
  • An extensible framework for data access and
    integration.
  • Expose heterogeneous data resources to a grid
    through web services.
  • Interact with data resources
  • Queries and updates.
  • Data transformation / compression
  • Data delivery.
  • Customise for your project using
  • Additional Activities
  • Client Toolkit APIs
  • Data Resource handlers
  • A base for higher-level services
  • federation, mining, visualisation, ...

14
OGSA-DAI team
NeSC, Edinburgh
EPCC Team, Edinburgh
NEReSC, Newcastle
IBM Dissemination Team
IBM Development Team, Hursley
15
OGSA-DAI Design Principles I
  • Efficient client-server communication
  • Minimise where possible
  • One request specifies multiple operations
  • No unnecessary data movement
  • Move computation to the data
  • Utilise third-party delivery
  • Apply transforms (e.g., compression)
  • Build on existing standards
  • Fill-in gaps where necessary
  • DAIS specifications from DAIS WG at GGF
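
The first two principles (one request carrying several operations, and transforming data next to the resource before it moves) can be sketched as follows. This is an illustration only: the request format, the activity names `sqlQuery`, `toJson` and `gzip`, and the `perform` function are invented for the example and are not OGSA-DAI's actual request syntax.

```python
import gzip
import json
import sqlite3


def perform(conn: sqlite3.Connection, request: list) -> object:
    """Run every step of one request on the server side, passing data along."""
    data = None
    for step in request:
        name = step["activity"]
        if name == "sqlQuery":
            cursor = conn.execute(step["sql"], step.get("params", ()))
            columns = [d[0] for d in cursor.description]
            data = [dict(zip(columns, row)) for row in cursor]
        elif name == "toJson":
            data = json.dumps(data).encode("utf-8")
        elif name == "gzip":
            data = gzip.compress(data)
        else:
            raise ValueError(f"unknown activity: {name}")
    return data


# Invented sample resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 3.5), (2, 7.25), (3, 1.0)])

# A single round trip: query, serialise and compress before anything is sent back.
request = [
    {"activity": "sqlQuery",
     "sql": "SELECT id, value FROM readings WHERE value > ? ORDER BY id",
     "params": [2.0]},
    {"activity": "toJson"},
    {"activity": "gzip"},
]
payload = perform(conn, request)
print(json.loads(gzip.decompress(payload)))
```

Only the compressed result crosses the network; the filtering happens where the data lives.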

16
OGSA-DAI Design Principles II
  • Do not hide underlying data model
  • Users must know where to target queries
  • Data virtualisation is hard
  • Extensible architecture
  • Modular and customisable
  • e.g., to accommodate stronger security
  • Extensible activity framework
  • Cannot anticipate all desired functionality
  • An activity is a unit of functionality
  • Allow users to plug-in their own
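
A plug-in activity framework of the kind described above could look roughly like this. It is a sketch under stated assumptions: the registry, the `activity` decorator and the activity names are hypothetical and do not reproduce OGSA-DAI's Java extension interfaces. The point is only that the engine does not need to know in advance which activities exist.

```python
from typing import Callable, Dict, List

Activity = Callable[[list], list]
_REGISTRY: Dict[str, Activity] = {}


def activity(name: str) -> Callable[[Activity], Activity]:
    """Decorator that registers a function as a named activity."""
    def register(func: Activity) -> Activity:
        if name in _REGISTRY:
            raise ValueError(f"activity already registered: {name}")
        _REGISTRY[name] = func
        return func
    return register


def run(names: List[str], data: list) -> list:
    """Engine: apply the named activities in order, feeding each the previous output."""
    for name in names:
        data = _REGISTRY[name](data)
    return data


@activity("sortRows")  # imagined as shipped with the framework
def sort_rows(rows: list) -> list:
    return sorted(rows)


@activity("dropDuplicates")  # imagined as contributed by a user
def drop_duplicates(rows: list) -> list:
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


print(run(["dropDuplicates", "sortRows"], [3, 1, 3, 2]))
```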

17
The OGSA-DAI Framework
[Diagram labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (SQLQuery, GZip, GridFTP, XPath, readFile, XSLT); Data Resources (JDBC, XMLDB, File); Databases (MySQL, DB2, Xindice, SWISS-PROT, SQL Server)]

18
Intermediary
  • Simple intermediary
  • potential to accelerate development, logging, or
    filtering
  • Persistent intermediary
  • e.g. to allow efficient local indexing

19
Redirector, Coordinator, Network
  • Allowing composition and decentralisation

20
Extensibility Example
[Diagram labels: OGSA-DAI service; Engine; SQLQuery (twice); JDBC; Multiple SQL GDS; MySQL]

21
Map Retrieval: Current

22
Map Retrieval: Grid Prototype

23
Map Retrieval: Security
  • Exploit NGS infrastructure to provide secure
    access layer

[Diagram labels: EDINA; NGS Authentication; Allowed users dn; SO-OGC; OGC; ODS 1; GIS; Oracle]

24
Map Retrieval: Integration
  • Exploit OGSA-DAI extensibility to add e.g. overlay

25
OGSA-DAI / EDINA prototyping work
  • Stage 1: Using existing OGSA-DAI technology
  • Stage 2: Extending OGSA-DAI

[Diagram labels: OGSA-DAI service; Input Parameters; URL; GIS Client; DeliverFrom URL; GIS Activities; Image/XML File; WMS Server; HTTP Request; HTTP Data Resource; HTTP Response]

26
Core features of OGSA-DAI I
  • A framework for building applications
  • Supports data access, insert and update
  • Relational: MySQL, Oracle, DB2, SQL Server,
    Postgres
  • XML: Xindice, eXist
  • Files: CSV, BinX, EMBL, OMIM, SWISSPROT, ...
  • Supports data delivery
  • SOAP over HTTP
  • FTP, GridFTP
  • E-mail
  • Inter-service
  • Supports data transformation
  • XSLT
  • ZIP, GZIP
  • Supports security
  • X.509 certificate based security

27
Core features of OGSA-DAI II
  • A framework for building data clients
  • Client toolkit library for application developers
  • A framework for developing functionality
  • Extend existing activities, or implement your own
  • Mix and match activities to provide functionality
    you need
  • Highly-extensible
  • Customise our out-of-the-box product
  • Provide your own services, client-side support
    and data-related functionality
  • Comprehensive documentation and tutorials
  • Latest release supports GT4.0 and Axis 1.2 /
    OMII_2 using Java 1.4

28
Distributed Query Processing
  • Higher level services building on OGSA-DAI
  • specialised metadata extraction
  • Execute queries in parallel over multiple data
    resources
  • Queries mapped to algebraic expressions for
    evaluation
  • Parallelism represented by partitioning queries
  • Use exchange operators
  • Equality based joins in current release
  • supported types: long, integer, string, double
    and float
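
The combination of partitioned parallelism and equality-based joins described above can be illustrated with a small hash-partitioned join. This is a generic sketch, not the DQP implementation: partitioning both inputs on the join key plays the role of an exchange step, and each partition pair is then joined independently. All function names and the sample rows are invented, and Python threads are used only to show the structure (they give no real CPU parallelism in CPython).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, object]


def partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Send each row to a partition chosen by hashing its join key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equality join of two row lists using an in-memory hash index."""
    index = defaultdict(list)
    for l_row in left:
        index[l_row[key]].append(l_row)
    return [{**l_row, **r_row} for r_row in right for l_row in index.get(r_row[key], [])]


def parallel_equi_join(left: List[Row], right: List[Row], key: str, n: int = 4) -> List[Row]:
    """Matching keys land in the same partition, so partitions join independently."""
    left_parts, right_parts = partition(left, key, n), partition(right, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(hash_join, left_parts, right_parts, [key] * n)
        return [row for part in results for row in part]


people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
visits = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 3, "v": 12}]
joined = sorted(parallel_equi_join(people, visits, "id"), key=lambda r: r["v"])
print(joined)
```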

29
DQP architecture
30
GridMiner Data Mediation Service
  • Principles
  • Tight Federation
  • global (relational) schema
  • Virtual integration
  • leave the data where it is
  • always up-to-date data
  • Build on data access from OGSA-DAI
  • Not bound to special architecture
  • Supported data sources
  • RDBMS (via JDBC), XMLDB (Xindice), CSV files
  • Operators: union all and inner join
  • Operators are XQuery based (using SAXON)
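
The "virtual integration" principle above (leave the data where it is, always up to date) can be shown with a lazily evaluated union-all view. This sketch is not GridMiner code, which the slide describes as XQuery-based; the `VirtualUnion` class and the sample sources are assumptions made for illustration. The view stores no rows, only the means to fetch them, so each query sees the sources as they are at that moment.

```python
import itertools
from typing import Callable, Iterable, Iterator


class VirtualUnion:
    """Union-all view that re-reads every source each time it is queried."""

    def __init__(self, *sources: Callable[[], Iterable[dict]]):
        self._sources = sources

    def rows(self) -> Iterator[dict]:
        # Nothing is copied or cached; sources are consulted at query time.
        return itertools.chain.from_iterable(source() for source in self._sources)


# Invented in-memory stand-ins for two remote data sources.
site_a = [{"id": 1, "p_name": "Ann Smith"}]
site_b = [{"id": 2, "p_name": "Bob Jones"}]
view = VirtualUnion(lambda: site_a, lambda: site_b)

before = list(view.rows())
site_b.append({"id": 3, "p_name": "Cy Young"})  # a source changes
after = list(view.rows())                        # the view reflects it immediately
print(len(before), len(after))
```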

31
Data Integration Scenario
  • Heterogeneities
  • Name in A is "First Last" (as the target format)
  • Name in C has to be combined
  • Distribution
  • 3 data sources
  • Java based schema mapping to global schema
  • types limited by WebRowSet
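
The name heterogeneity in this scenario can be sketched as a pair of mapping functions onto a global schema. The slide says the actual mapping was written in Java; the Python below is only an illustration, and the source column names (`name`, `first_name`, `last_name`) and sample rows are invented. Source B is left out because the slide does not say how its names are stored.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def map_source_a(row: Row) -> Row:
    """Source A already stores "First Last", the target format."""
    return {"id": row["id"], "p_name": row["name"]}


def map_source_c(row: Row) -> Row:
    """Source C stores the parts separately, so they have to be combined."""
    return {"id": row["id"], "p_name": f"{row['first_name']} {row['last_name']}"}


def to_global(rows: Iterable[Row], mapping: Callable[[Row], Row]) -> List[Row]:
    """Apply one source's mapping to bring its rows into the global schema."""
    return [mapping(row) for row in rows]


source_a = [{"id": 1, "name": "Ann Smith"}]
source_c = [{"id": 2, "first_name": "Bob", "last_name": "Jones"}]
patients = to_global(source_a, map_source_a) + to_global(source_c, map_source_c)
print(patients)
```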

32
Data Integration Scenario (cont.)
  • Query
  • SELECT p_name FROM patient WHERE id10

[Diagram: standard to optimized]

33
caBIG
  • Object-Oriented view of data
  • Data types are well-defined and registered in a
    repository
  • Standardized metadata facilitates discovery
  • custom query language implemented as an activity

34
LEAD
35
FirstDIG
  • Data mining with the First Transport Group, UK
  • Example: When buses are more than 10 minutes
    late there is an 82% chance that revenue drops by
    at least 10%
  • "The results of this exercise will revolutionise
    the way we do things in the bus industry."
    Darren Unwin, Divisional Manager, First South
    Yorkshire.
  • Client based joins, using temporary tables
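
A client-based join using temporary tables could work roughly as follows: the client pulls rows from each remote resource, loads them into temporary tables in a local database, and joins them there. This is a hedged sketch rather than the FirstDIG implementation. Two in-memory SQLite databases stand in for the remote resources, and the table names, columns and figures are all invented.

```python
import sqlite3


def client_join(delays_db: sqlite3.Connection, revenue_db: sqlite3.Connection) -> list:
    """Copy rows from two separate sources into local temp tables, then join."""
    local = sqlite3.connect(":memory:")
    local.execute("CREATE TEMP TABLE delays (route TEXT, minutes_late REAL)")
    local.execute("CREATE TEMP TABLE revenue (route TEXT, change_pct REAL)")
    local.executemany("INSERT INTO delays VALUES (?, ?)",
                      delays_db.execute("SELECT route, minutes_late FROM delays"))
    local.executemany("INSERT INTO revenue VALUES (?, ?)",
                      revenue_db.execute("SELECT route, change_pct FROM revenue"))
    rows = local.execute(
        "SELECT d.route, d.minutes_late, r.change_pct "
        "FROM delays d JOIN revenue r ON r.route = d.route "
        "ORDER BY d.route").fetchall()
    local.close()  # temporary tables disappear with the connection
    return rows


# Invented stand-ins for two independently hosted resources.
delays_db = sqlite3.connect(":memory:")
delays_db.execute("CREATE TABLE delays (route TEXT, minutes_late REAL)")
delays_db.executemany("INSERT INTO delays VALUES (?, ?)", [("X1", 12.0), ("X2", 3.0)])

revenue_db = sqlite3.connect(":memory:")
revenue_db.execute("CREATE TABLE revenue (route TEXT, change_pct REAL)")
revenue_db.executemany("INSERT INTO revenue VALUES (?, ?)", [("X1", -11.0), ("X3", 2.0)])

result = client_join(delays_db, revenue_db)
print(result)
```

The trade-off is that all candidate rows travel to the client, which runs against the "move computation to the data" principle but needs no cooperation between the sources.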

[Diagram labels: OGSA-DAI (four instances); OGSA-DAI Client Application; Data Mining Application]

36
OGSA-DAI Challenges
  • Metadata extraction
  • define a common model for e.g. database schema?
  • Intermediate representation
  • between multiple models (relational, XML, ...)
  • XML WebRowSet is flexible (c.f. GridMiner) but
    expansive
  • DFDL and GridFTP/parallel HTTP?
  • Query definition
  • translation of queries
  • aggregation of results
  • Data transport and workflow
  • workflow is typically compute driven
  • Move computation to data
  • mobile code activities?
  • data services hosted on DBMS?

37
Contributing to OGSA-DAI
  • Additional functionality
  • Provide activities which implement specific
    functionality
  • Provide extra client functionality
  • Provide different security mechanisms
  • Provide higher level components and applications
  • Different levels of contributions
  • Based on OGSA-DAI?
  • Works with OGSA-DAI?
  • Part of OGSA-DAI?

38
In the near future
  • A new version of the OGSA-DAI Engine
  • should look mostly the same externally
  • better support for concurrency, sessions and
    monitoring
  • Implementing new versions of specifications
  • DAIS Specifications
  • Key things that we will be addressing
  • Performance
  • A Security Model which can be applied across
    platforms
  • Full Transactions framework, distributed
    transactions
  • More data integration facilities
  • Better abstraction over DBMS variation
  • Application centric queries
  • collaborating with other projects
  • Research projects looking at
  • schema mapping
  • extended data resources

39
Associated Meetings and Workshops
  • DIALOGUE Workshops (http://www.datagrids.org)
  • Data Integration Applications Linking
    Organisations to Gain Understanding and
    Experience
  • Bringing together Data Integration middleware and
    application providers with users
  • Next one at NeSC 9-10th February 2006
  • http://www.nesc.ac.uk/esi/events/636/
  • Next Generation Distributed Data Management
    (HPDC15, Paris)
  • http://www.isi.edu/annc/distributedDataWorkshop.html
  • Data Management on Grids (VLDB06, Seoul)

40
Conclusions
  • The benefits of trying to integrate data are
    hindered by challenges such as heterogeneity,
    scale and distribution
  • A common data service layer should make data
    integration easier
  • OGSA-DAI provides an extensible, data service
    based framework which makes it easier to
    implement data integration
  • GIS data is amenable to integration using data
    services

41
Further information
  • The OGSA-DAI Project Site
  • http://www.ogsadai.org.uk
  • The DAIS-WG site
  • http://forge.gridforum.org/projects/dais-wg/
  • OGSA-DAI Users Mailing list
  • users@ogsadai.org.uk
  • General discussion on grid DAI matters
  • Formal support for OGSA-DAI releases
  • http://bugs.ogsadai.org.uk/
  • OGSA-DAI training courses