Managing Grids with Information and Knowledge that are Incomplete

1
MAGIK-I
  • Managing Grids with Information and Knowledge that are Incomplete
  • Andy Cooke, Alasdair Gray, Lisha Ma and Werner Nutt
  • 8th June 2004
  • Royal Observatory, Edinburgh

2
Who are we?
  • Part of the Database Group at Heriot-Watt
    University
  • Interested in information integration
  • Grid Monitoring
  • in collaboration with DataGrid/EGEE
  • funded by EPSRC
  • Theoretical work
  • e.g. query answering using views
  • Integrating distributed data streams
  • Query languages for streams (Lisha Ma)
  • Managing views over streams (Alasdair Gray)
  • Integrating biological data

3
What is MAGIK-I?
  • How can one handle incompleteness in an information-integration setting?
  • An EPSRC-funded project (until Sept. 2007)
  • part of Semantic Grid Initiative
  • Collaborating with
  • Bob Mann (ROE, AstroGrid project)
  • Steve Fisher (RAL, EGEE project)
  • Objectives
  • develop a logical framework to back up code
    solutions
  • extract requirements from collaborators
  • use test-bed to try out ideas (and get feedback)

4
A Common Problem
  • Users want to obtain information published on a Grid, but
  • many sources to find!
  • how is their data related?
  • which sources have relevant data?
  • what query should be posed?
  • Also
  • possibly different data models to interact with
  • distributed query processing is hard!

5
Information Integration: the Paradigm
  • Addresses problems 1 - 4 on previous slide
  • not concerned with distributed query processing,
  • nor how to accommodate different data models
  • The general approach
  • define a global schema,
  • users query virtual database,
  • mediator translates query into a distributed query over sources

6
What do Mediators need?
  • A mapping that relates each source schema with
    the global schema, e.g.
  • global database described as view over sources
  • sources described as view over global database
  • a combination of these.
  • A global query language
  • A common source query language
  • Information about capabilities of sources
  • what queries do they support?
  • how complete are they?

7
Example: Grid Monitoring with the R-GMA System
  • Allows Grid middleware, e.g. a broker, to find
    out about the state of Grid resources
  • Offers APIs to users
  • Producer (for publishing information streams),
    or
  • Consumer (for posing queries against a global
    schema)
  • APIs are supported by
  • agents (work on behalf of producers and
    consumers)
  • smart registry (can match queries with views)
  • republishers (collect information together to
    optimize queries)

8
R-GMA: Bird's Eye View
9
Producers Register their View
Stream Producer 1 publishes and registers:
SELECT * FROM CPULoad WHERE country = 'UK' AND site = 'RAL'
Stream Producer 2 publishes and registers:
SELECT * FROM CPULoad WHERE country = 'UK' AND site = 'HW'
10
Views Map to a Global Table
11
How does R-GMA's Mediator work?
  • Find relevant producers using a satisfiability
    check
  • Query: SELECT ... WHERE site = 'RAL'
  • View: WHERE site = 'HW', so this producer can never
    contribute!
  • Choose the best plan
  • e.g. contact one republisher, rather than 40
    producers!
  • Execute plan
  • switch to alternative plan if first fails
  • currently no distributed query processor is used
  • Return answer, and report if incomplete
  • by appending a warning to a result set
  • but what about more
    complex views?
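
The satisfiability check above can be sketched in a few lines. This is only an illustration under strong assumptions: predicates are restricted to conjunctions of equality constraints, and the names `is_satisfiable`, `relevant_producers` and `registry` are hypothetical, not part of the R-GMA API.

```python
from typing import Dict, List

# Assumption for this sketch: a predicate is a conjunction of equality
# constraints, e.g. {"country": "UK", "site": "RAL"}.
Predicate = Dict[str, str]


def is_satisfiable(query: Predicate, view: Predicate) -> bool:
    """Return True if some tuple could satisfy both predicates at once."""
    for attr, value in query.items():
        if attr in view and view[attr] != value:
            return False  # the two conjunctions contradict each other
    return True


def relevant_producers(query: Predicate, registry: Dict[str, Predicate]) -> List[str]:
    """Names of producers whose registered view may contribute to the query."""
    return sorted(name for name, view in registry.items()
                  if is_satisfiable(query, view))


# Hypothetical registry holding the two views from the earlier slide.
registry = {
    "producer1": {"country": "UK", "site": "RAL"},
    "producer2": {"country": "UK", "site": "HW"},
}

print(relevant_producers({"site": "RAL"}, registry))
```

With the query `site = 'RAL'`, only the first producer survives the check; the second is pruned because its view requires `site = 'HW'`.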

12
Information Manifold: Supports Join Views
  • Global schema for an employer's virtual db
  • employees: emp(eName)
  • phone numbers: phone(eName, phoneNo)
  • managers: mgr(eName, mName)
  • departments: dept(eName, dept)
  • offices: office(eName, office)
  • Three source relations
  • S1(E,M,P): employees with managers and phones
  • S2(E,O,D): employees with office and department
  • S3(E,P): employees of the toy department and
    their phones

13
How Can we Describe the Sources?
  • S1(E, M, P) (employees with managers
    and phones)
  • contains answers to the query
  • SELECT E.eName, M.mName, P.phoneNo
  • FROM emp E, mgr M, phone P
  • WHERE E.eName = M.eName AND
  • E.eName = P.eName
  • Shorthand notation
  • S1(E,M,P) <- emp(E) & mgr(E,M) & phone(E,P)

14
Shorthand Notation for Source Descriptions and
Queries
  • Sources
  • S1(E,M,P) <- emp(E) & mgr(E,M) &
    phone(E,P)
  • S2(E,O,D) <- emp(E) & office(E,O) &
    dept(E,D)
  • S3(E,P) <- emp(E) & phone(E,P) &
    dept(E,toys)
  • Query: "What is Sally's phone and office?"
  • q(P,O) <- phone(sally,P) & office(sally,O)

15
Query Plans
  • Two plans are possible
  • p1(P,O) <- s1(sally,M,P) & s2(sally,O,D)
  • p2(P,O) <- s3(sally,P) & s2(sally,O,D)
  • How good are these plans?
  • How complete are s1 and s3 wrt phone numbers?
  • What if s1 and s3 return different phone numbers?
  • What if a source contains nulls?
  • Matching queries and sources is harder than in
    today's R-GMA
  • but some research systems have been
    built
  • e.g. Information Manifold
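
To make the question about disagreeing sources concrete, here is a minimal sketch that evaluates both plans over toy data. The source contents are invented for illustration, and the argument order follows the source descriptions S1(E,M,P), S2(E,O,D) and S3(E,P).

```python
# Toy source contents, invented for illustration only.
s1 = {("sally", "mike", "1234"), ("bob", "mike", "5678")}   # S1(E, M, P)
s2 = {("sally", "G.07", "toys"), ("bob", "1.12", "books")}  # S2(E, O, D)
s3 = {("sally", "9999")}                                    # S3(E, P)


def plan_p1(employee):
    """p1(P,O) <- s1(employee,M,P) & s2(employee,O,D)"""
    return {(p, o)
            for (e1, _m, p) in s1 if e1 == employee
            for (e2, o, _d) in s2 if e2 == employee}


def plan_p2(employee):
    """p2(P,O) <- s3(employee,P) & s2(employee,O,D)"""
    return {(p, o)
            for (e1, p) in s3 if e1 == employee
            for (e2, o, _d) in s2 if e2 == employee}


all_answers = plan_p1("sally") | plan_p2("sally")  # union of both plans
agreed = plan_p1("sally") & plan_p2("sally")       # answers both plans return

print(sorted(all_answers))
print(sorted(agreed))
```

In this toy data s1 and s3 hold different phone numbers for Sally, so the union contains two answers and the intersection is empty. Which of the two a mediator should report is exactly the open question raised above.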

16
Could AstroGrid benefit from a Mediator?
  • Typical query
  • "Find objects that appear in x-ray but don't
    appear in infra-red, for these sky
    coordinates"
  • A mediator could
  • identify relevant sources.
  • "don't bother considering this db:
    its sky coverage never overlaps
    with any x-ray db"
  • build local queries/workflows on behalf of the user
  • estimate coverage of a user's query
  • "your query would only cover 5% of what you
    want!"
  • "you can only get answers from the southern
    hemisphere"
    (intensional
    answer)
  • so what is needed?

17
What would AstroGrid's Mediator Need?
  • A global schema
  • seems work has started, e.g. UCDs
  • A common query language
  • you are working on this
  • do sources have a uniform interface?
  • Mappings that relate source schemas to global
    schema
  • source descriptions (UCDs) are registered
  • but may need to be more expressive
  • Other information would help
  • how complete is the view description?
  • is the source republishing data?
  • what queries/algorithms can be processed?
  • are there access restrictions?
  • challenging, as views and queries are
    complex!

18
Incompleteness when Integrating Data
  • Databases cover different areas of sky
  • DB1 contains all optical objects in its area
  • DB2 contains all x-ray objects in its area
  • "Give me all objects in optical that are not in
    x-ray"
  • Useful concepts like "certain answer" and
    "possible answer" arise from the
    information-integration setting
  • can compare global concepts with what is in
    database
  • now consider types of incompleteness
    in AstroGrid
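
The distinction between certain and possible answers can be illustrated for the "optical but not x-ray" query. The sketch below makes several simplifying assumptions that are not in the slides: the sky is a one-dimensional axis, each catalogue is complete inside its own interval, and objects carry cross-matched identifiers. `Catalogue` and `optical_not_xray` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Catalogue:
    lo: float                  # start of the covered area (1-D simplification)
    hi: float                  # end of the covered area
    objects: Dict[str, float]  # object id -> position

    def covers(self, pos: float) -> bool:
        return self.lo <= pos < self.hi


def optical_not_xray(optical: Catalogue, xray: Catalogue) -> Tuple[Set[str], Set[str]]:
    """Split optical objects without an x-ray match into (certain, possible)."""
    certain, possible = set(), set()
    for obj, pos in optical.objects.items():
        if obj in xray.objects:
            continue               # has an x-ray counterpart: not an answer
        if xray.covers(pos):
            certain.add(obj)       # x-ray catalogue is complete here: no match exists
        else:
            possible.add(obj)      # outside x-ray coverage: a match is unknown
    return certain, possible


db1 = Catalogue(0, 10, {"a": 2, "b": 6, "c": 8})  # optical
db2 = Catalogue(5, 15, {"c": 8, "d": 12})         # x-ray

certain, possible = optical_not_xray(db1, db2)
print(sorted(certain), sorted(possible))
```

Object `b` lies where both catalogues are complete, so it is a certain answer. Object `a` lies outside the x-ray coverage, so it is only a possible answer.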

19
UCD Imprecision
Incompleteness due to UCD imprecision, e.g. UCD
list cannot enumerate all optical
bandpasses (from Bob Mann)
  • Make UCDs more expressive, to describe more
    precisely what databases contain
  • e.g. source view: where x < wavelength < z and
    ...
  • Mediator can then reason, and accurately identify
    relevant databases
  • e.g. whenever user asks where wavelength
    between (x, y)
  • Makes it easier to construct good workflows
    automatically
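
A minimal sketch of the reasoning step, assuming each source view registers a single open wavelength interval. The database names, the unit (nm) and the numeric ranges are invented for illustration.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # open interval (low, high); hypothetical unit: nm


def may_contain(view: Range, query: Range) -> bool:
    """True if the two open intervals overlap, so the source may hold matches."""
    return view[0] < query[1] and query[0] < view[1]


def relevant_databases(views: Dict[str, Range], query: Range) -> List[str]:
    """Names of the databases whose registered range overlaps the query range."""
    return sorted(name for name, view in views.items()
                  if may_contain(view, query))


# Hypothetical registered views.
views = {"optical_db": (400.0, 700.0), "infrared_db": (1000.0, 2500.0)}

print(relevant_databases(views, (600.0, 900.0)))
```

A query for wavelengths between 600 and 900 overlaps only the first view, so the second database can be skipped without contacting it.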

20
Sky Coverage Problem
  • Incompleteness in sky coverage
  • [Figure: "I want" vs. "You have"]
  • Registry can't contain a full description of db
  • So approximately describe coverage?
  • Mediator could estimate quality
  • "your query plan would only give 2% coverage"
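
One way such an estimate could be computed, sketched under a heavy simplification: both the requested region and the database coverage are axis-aligned rectangles in flat (RA, Dec) coordinates, with spherical geometry ignored. The function name and the sample regions are hypothetical.

```python
from typing import Tuple

# (ra_min, ra_max, dec_min, dec_max) in degrees; flat-sky approximation.
Region = Tuple[float, float, float, float]


def coverage_fraction(want: Region, have: Region) -> float:
    """Fraction of the wanted region's area that the available region covers."""
    width = max(0.0, min(want[1], have[1]) - max(want[0], have[0]))
    height = max(0.0, min(want[3], have[3]) - max(want[2], have[2]))
    wanted_area = (want[1] - want[0]) * (want[3] - want[2])
    return (width * height) / wanted_area


want = (0.0, 10.0, 0.0, 10.0)   # region the user asks for
have = (9.0, 20.0, 8.0, 30.0)   # region an invented database covers

print(f"{coverage_fraction(want, have):.0%}")
```

Here the overlap is 1 by 2 degrees out of a 10 by 10 degree request, so the mediator would report 2% coverage.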

21
Precision Problem
  • Incompleteness in spectral coverage
  • I want a flux at 1.5mm, you have fluxes at 1.4mm
    and 1.6mm is that good enough?
  • Users specify precision in query?
  • "I want flux at 1.5mm +/- 0.2mm"
  • Enhance source views (UCDs)
  • source A: where flux = 1.4mm +/- 0.1mm
  • Mediator can then reason
  • yes, source A is suitable!
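
The check can be sketched as interval containment. This is one possible reading of "suitable": the source's whole band must lie inside the band the user accepts. `Decimal` is used so that the boundary case (both bands start at 1.3) is not lost to floating-point rounding; the `Band` class and source B are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Band:
    centre: Decimal
    tolerance: Decimal

    @property
    def low(self) -> Decimal:
        return self.centre - self.tolerance

    @property
    def high(self) -> Decimal:
        return self.centre + self.tolerance


def is_suitable(source: Band, wanted: Band) -> bool:
    """True if the source band lies entirely within the wanted band."""
    return wanted.low <= source.low and source.high <= wanted.high


wanted = Band(Decimal("1.5"), Decimal("0.2"))    # user: 1.5 +/- 0.2
source_a = Band(Decimal("1.4"), Decimal("0.1"))  # source A: 1.4 +/- 0.1
source_b = Band(Decimal("1.8"), Decimal("0.1"))  # invented source B

print(is_suitable(source_a, wanted), is_suitable(source_b, wanted))
```

Source A spans 1.3 to 1.5, which fits inside the requested 1.3 to 1.7, while the invented source B does not. A looser policy could accept any overlap instead of full containment.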

22
Service Unavailability
  • Incompleteness due to service unavailability
  • Registry says there's relevant data, but data
    centre is currently offline
  • Mediator needs to keep track of availability
  • e.g. send ping messages?
  • Could suggest alternative workflows
  • e.g. use republisher (warehouse) if possible.
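
The fallback idea can be sketched as picking the first plan whose services are all reachable. How availability is actually determined (pings or otherwise) is left abstract as a callable; the service names and `choose_plan` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Plan = List[str]  # names of the services a plan must contact


def choose_plan(plans: List[Plan], is_up: Callable[[str], bool]) -> Optional[Plan]:
    """Return the first plan (in preference order) whose services are all up."""
    for plan in plans:
        if all(is_up(service) for service in plan):
            return plan
    return None  # no complete plan: the answer would be incomplete


# Hypothetical availability table, e.g. filled in from recent ping results.
status: Dict[str, bool] = {
    "data_centre_a": False,
    "data_centre_b": True,
    "republisher_1": True,
}

plans = [
    ["data_centre_a", "data_centre_b"],  # preferred: query the primary sources
    ["republisher_1"],                   # alternative: use the warehouse
]

print(choose_plan(plans, lambda name: status.get(name, False)))
```

Because one data centre is marked offline, the sketch falls back to the republisher plan. Unknown services are treated as unavailable, which is a conservative choice rather than a requirement.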

23
Null Values
  • Incompleteness in the Registry
  • Allows "Not applicable", "Unknown" and "Not
    provided" entries
  • Do these express a projection view?
  • e.g. view: select a, b from aTable where ...
  • What if query refers to missing attribute?
  • e.g. in the select list: pad tuples with nulls?
  • e.g. where c = val: mark possible answers?
  • would need to provide semantics of any nulls that
    are added
  • also, nulls in databases

24
Conclusions
  • A mediator for AstroGrid?
  • access to sources is planned manually just now
  • looks like a natural setting for a mediator
  • many relevant pieces in place (query languages, ...)
  • would be interesting to explore idea!
  • Mediator could help with incompleteness problems
  • as incompleteness is w.r.t. something else,
  • and so something else must be provided!

25
Proposal
  • Set up a toy scenario to explore ideas
  • we would need technical support with this
  • plug trial components into AstroGrid?
  • Iterative development
  • users prioritise new features
  • we develop solutions
  • users give us feedback
  • we develop supporting logical model