Brian Matthews, - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
The CLRC Data Portal
  • Brian Matthews
  • Information Technology Dept
  • CLRC
  • b.m.matthews_at_rl.ac.uk

2
Overview
  • Motivates and describes a pilot implementation of
    a Science Data Portal
  • The team: Juan Bicarregui, David Boyd, Simon
    Lambert, Brian Matthews, Kevin O'Neill, Kerstin
    Kleese, John Ashby

3
Who we are (CLRC)
  • Central Laboratory of the Research Councils
  • 1700 staff - supporting 12000 scientists and
    engineers from universities and industry
  • Based at 3 sites
    • Daresbury Laboratory
    • Rutherford Appleton Laboratory
    • Chilbolton Observatory
  • A Multidisciplinary Laboratory

4
A Multidisciplinary Laboratory
  • Spallation Neutron and Muon Source (ISIS)
  • Synchrotron Radiation Source (SRS)
  • Lasers
  • Microstructures
  • Space Science and Technology
  • Molecular Spectroscopy
  • Earth Observation
  • Atmospheric Science
  • Computational Science
  • Energy Research
  • Information Technology
  • Particle Physics
  • Radio Communications
  • Surfaces Transforms and Interfaces

5
The Vision
6
The Problem
  • Scientific institutions generate vast quantities
    of data
  • CLRC - ISIS, SRS, Space Science, Particle
    Physics, Computational Science, ...
  • More data coming on stream all the time
  • CERN-LHC, Diamond, CASIM, HGP, ...
  • Very good at handling large amounts of data
  • Diverse approaches to organising and distributing
    it.

Need a usable way of gaining access to the data
7
The Data Portal Concept
  • Single point of access to the CLRC data resources
  • Encompasses a wide range of data holdings
  • Describes what data is available from the
    facilities
  • Links to the data held at the facility
  • Different archiving methods
  • Caters for a wide range of users
    • from the general community to data curators
  • Supports a wide range of queries
    • employing data mining, thesauri, ...

8
User Scenarios
  • Experiment Proposer
    • Have there been any neutron or X-Ray studies of
      this molecule at 100 K?
  • Lecturer
    • This published study would be a good example for
      teaching. Is the raw data publicly available?
  • Researcher
    • This is an interesting paper - can I check the
      data?
  • Instrument Scientist
    • The instrument seems a bit unstable recently.
      Fetch me the results of all calibration runs from
      the last 3 months.

9
User Scenarios
  • SRS user: Has a molecule with this space group
    and cell dimensions been studied with neutrons?
  • ISIS user: Is there SRS data about this molecule
    to give me a start with the structure?
  • Instrument scientist: How far had my colleague
    got with the analysis before he went on holiday?

10
Combine Diverse Users Searches ...
Discovery
Excavation
Experimenter
Data curator
Wider science community
General community
Specialist user
11
with Distributed Data Silos.
Facility 1
Facility 2
Facility 3
Facility 4
12
using a central index ...
Client
http
CLRC Data Access Server
XML wrapper
XML wrapper
Local metadata
Common metadata catalogue database
Local data
Facility 1
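The central-index arrangement on this slide can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the facility record layouts, the element and attribute names, and the functions `wrap_facility_1`, `wrap_facility_2`, `build_central_catalogue` and `search` are all hypothetical and are not taken from the pilot, which was built on Java servlets. Each wrapper converts one facility's local metadata into a common XML record, and the central catalogue is searched without touching the local data.

```python
import xml.etree.ElementTree as ET

# Hypothetical local catalogues: each facility keeps metadata in its own layout.
FACILITY_1_ROWS = [{"run": "F1-001", "sample": "copper complex"}]
FACILITY_2_ROWS = [("F2-042", "copper oxide powder")]


def _record(facility, local_id, subject):
    # Common record layout; element and attribute names are illustrative only.
    rec = ET.Element("MetadataRecord", {"facility": facility, "localID": local_id})
    ET.SubElement(rec, "Subject").text = subject
    return rec


def wrap_facility_1(rows):
    """XML wrapper for a facility whose records are dictionaries."""
    return [_record("Facility 1", r["run"], r["sample"]) for r in rows]


def wrap_facility_2(rows):
    """XML wrapper for a facility whose records are (id, subject) tuples."""
    return [_record("Facility 2", run_id, subject) for run_id, subject in rows]


def build_central_catalogue():
    """Collect the wrapped records from every facility into one index."""
    return wrap_facility_1(FACILITY_1_ROWS) + wrap_facility_2(FACILITY_2_ROWS)


def search(catalogue, keyword):
    """Return (facility, localID) pairs whose subject mentions the keyword."""
    keyword = keyword.lower()
    return [
        (rec.get("facility"), rec.get("localID"))
        for rec in catalogue
        if keyword in rec.findtext("Subject", default="").lower()
    ]


catalogue = build_central_catalogue()
hits = search(catalogue, "copper")
```

The returned pairs act as links back to the facility that holds the data, which matches the idea that the portal describes and points to data rather than copying it.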
13
a common metadata...
Investigation
Data Holding
Data Holding
Data Holding
Data-Set 1 (Raw)
Data-Set 2 (Inter)
Data-Set 3 (Final)
File 1 name date
File 1 name date
File 1 name date
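A minimal Python sketch of the hierarchy in this diagram follows. It assumes the boxes nest as Investigation, then Data Holding, then Data-Set, then File, which the flattened slide text suggests but does not state. The class names, field names, sample title, file names and dates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    date: str  # kept as plain text; the slide only shows "name" and "date"


@dataclass
class DataSet:
    label: str  # e.g. "Data-Set 1"
    stage: str  # "Raw", "Inter" or "Final", as labelled on the slide
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataHolding:
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    holdings: List[DataHolding] = field(default_factory=list)

    def files_at_stage(self, stage: str) -> List[DataFile]:
        """Walk Investigation -> Data Holding -> Data-Set -> File."""
        return [
            f
            for holding in self.holdings
            for data_set in holding.data_sets
            if data_set.stage == stage
            for f in data_set.files
        ]


inv = Investigation(
    title="example investigation",  # hypothetical
    holdings=[
        DataHolding(data_sets=[
            DataSet("Data-Set 1", "Raw", [DataFile("file1.raw", "1999-04-21")]),
            DataSet("Data-Set 3", "Final", [DataFile("file1.cif", "1999-05-02")]),
        ])
    ],
)
raw_names = [f.name for f in inv.files_at_stage("Raw")]
```

Keeping the processing stage on the data set lets a query ask for only raw or only final files across a whole investigation.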
14
and a Web based interface
  • Exploit the existing Web infrastructure.
  • Use New Technologies (XML/RDF)
  • rapidly disseminated
  • widely accessible
  • database and user platform independent
  • can be developed quickly, but with the GRID in
    mind.

Every user who needs to can get to the
information.
15
The Pilot
  • Crystallography data at SRS and ISIS
  • Single Crystals, Proteins and Powder
  • Web-based but Grid-compatible
  • Motivated by interviews with
    • Instrument and support Scientists
      (Chick Wilson, Ken Shankland, ISIS; Simon Teat,
      SRS)
    • Facility IT support
      (Kevin Knowles, Mark Enderby)
    • Users
      (Jeremy Cockcroft, David Moss, Birkbeck)

16
Approach
  • 1. Consulted with potential Users
    • developed use cases
  • 2. Developed metadata formats
    • generic and specific
  • 3. Implemented database, server and queries
  • 4. Piloted with real data

17
Metadata
A generic metadata model for all scientific
applications with Specialisation for each domain

Can answer questions across domains Can answer
questions about specific domains
18
Metadata Model
19
Metadata example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd">
<CLRCMetadata>
<MetadataRecord metadataID="N000001">
<Topic>
<Discipline>Chemistry</Discipline>
<Subject>Crystal Structure</Subject>
<Subject>Copper</Subject>...
<Experiment>
<StudyName>Crystal Structure Copper Palladium complex 150K ...
<Investigator><Name><Surname>Porter...
<Institution>University of Peebles ...
<Funding>EPSRC ...
<TimePeriod><StartDate><Date>21/04/1999.
<Purpose><Abstract> To study the structure of Copper and Palladium
co-ordination complexes at a 150K.
<DataManager><Name><Surname>Teat...
<Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...
<Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...
<Condition>Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
<AccessConditions>The user has to be one of Prof. F. Porter.
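To show how a record like the one on this slide could be read by a program, here is a hedged Python sketch. The slide elides most of the record, so the fragment below is a cut-down stand-in: the values are the ones visible on the slide, but the closing tags and the nesting (for instance, placing Instrument inside Experiment) are assumptions. The XML declaration and DOCTYPE are left out because the DTD is not available here. The function name `summarise` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down record: values come from the slide; nesting and closing tags are assumed.
RECORD = """
<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic>
      <Discipline>Chemistry</Discipline>
      <Subject>Crystal Structure</Subject>
      <Subject>Copper</Subject>
    </Topic>
    <Experiment>
      <Instrument>SRS Station 9.8, BRUKER AXS SMART 1K</Instrument>
    </Experiment>
  </MetadataRecord>
</CLRCMetadata>
"""


def summarise(xml_text):
    """Pull a few generic fields out of one metadata record."""
    root = ET.fromstring(xml_text.strip())
    rec = root.find("MetadataRecord")
    return {
        "id": rec.get("metadataID"),
        "discipline": rec.findtext("Topic/Discipline"),
        "subjects": [s.text for s in rec.findall("Topic/Subject")],
        "instrument": rec.findtext("Experiment/Instrument"),
    }


summary = summarise(RECORD)
```

Fields such as discipline and subject belong to the generic part of the model, so the same lookup would apply to records from any facility, while instrument conditions would be domain specific.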
20
Architecture
Users
Other Data Portals
Grid middleware
XML wrapper
XML wrapper
XML wrapper
Local metadata
Local metadata
Local metadata
Local data
Local data
Local data
Facility 2
Facility 3
Facility 4
21
Server Architecture
USER
Key
User input interpreter
User output generator
Internal
http
pre-set XSL Script
Query Generator
module
Response Generator
XML Parser
External agent
XML File
XML File
Central metadata repository
Local metadata repository
Ascii file
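The boxes in this diagram suggest a request pipeline: interpret the user input, generate a query, run it against the metadata repository, and generate a response from the resulting XML. The Python sketch below follows that reading. It is an assumption-laden illustration, not the pilot's code: the repository contents are invented, every function name is hypothetical, and a plain formatting function stands in for the pre-set XSL script that the slide shows.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Stand-in for the central metadata repository (contents are invented).
REPOSITORY = ET.fromstring(
    "<Catalogue>"
    "<Record id='A1'><Subject>Copper</Subject><Facility>SRS</Facility></Record>"
    "<Record id='B7'><Subject>Ice</Subject><Facility>ISIS</Facility></Record>"
    "</Catalogue>"
)


def interpret_input(query_string):
    """User input interpreter: turn an HTTP query string into criteria."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


def generate_query(criteria):
    """Query generator: build a predicate over catalogue records."""
    def matches(record):
        return all(record.findtext(tag) == value for tag, value in criteria.items())
    return matches


def run_query(matches):
    """Return an XML result set holding the matching records."""
    results = ET.Element("Results")
    results.extend([rec for rec in REPOSITORY if matches(rec)])
    return results


def generate_response(results):
    """Response generator: plain-text summary in place of the XSL script."""
    lines = [
        f"{rec.get('id')}: {rec.findtext('Subject')} ({rec.findtext('Facility')})"
        for rec in results
    ]
    return "\n".join(lines) or "No matching records"


def handle_request(query_string):
    criteria = interpret_input(query_string)
    return generate_response(run_query(generate_query(criteria)))


page = handle_request("Subject=Copper")
```

Because the intermediate result is XML, the same result set could be rendered as a summary or expanded into more detail by swapping only the final stage.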
22
Standards and tools
  • Client
    • a standard web browser.
  • Server
    • Apache Tomcat - Java Servlet engine.
    • Xerces and Xalan - Apache XML tools - standard
      APIs.
  • Database
    • Access generating XML
  • W3C standards used
    • XHTML
    • XML
    • XML DTDs
    • DOM
    • XSLT
    • HTTP

Implemented using standards for portability
and available support
23
Example

Result of a search across facilities - returns
XML to the session and displays a summary
24
Expand Results - give more details from the same
XML
25
Going Deeper - Can browse the data sets
26
Select data - pick the required data files and
download from convenient location.
27
The Future
  • Pilot (very quickly) completed
  • Now in a planning and investigation phase
  • Consolidate and Broaden existing system
  • move towards a development system
  • handle greater diversity of data sources
  • Enhance the Technology
  • GRID tools (Globus, SRB)
  • Web tools (XML Query, RDF)

28
End to End Support
  • Metadata collection and maintenance is a big
    problem.
  • But doing science is a process.

Collecting the metadata can then become part of
the experimental support environment
29
Interface with existing archives
  • CLRC maintains existing data archives
  • Atmospheric, earth observation, STP, astronomy.
  • Existing access mechanisms (Web, Z39.50)
  • Existing metadata catalogues and formats
  • Can we use the Data Portal to access them?
  • Use the Metadata format as a framework to be
    specialised to express existing metadata
    framework
  • San Diego SRB as an abstraction layer on the
    archive
  • XML Query as a query layer on the archive

30
Re-architect system
  • Break up the portal middleware into components.

User service
Grid Enable with Globus components
Results collation
ontology service
RDF / DAML+OIL
DP
Query generation
XML Query
Security service
Globus GSI
Replication service
Globus replication service
Data source location
Globus GIS - MDS
31
Peer-to-Peer
Build into a Grid Infrastructure Deliver data to
computing resources
GP1
DP1
DP2
DS1
DS1
DS1
DS2
32
Technology developments
  • Enhance and standardise Metadata Model
  • relate to other models
  • Use RDF and XML Schema
  • Enhance and Tune the user interface
  • Keywords, Glossary, Thesaurus, Data mining
  • Enhance Security Model
  • Grid enable
  • link to visualisation tools
  • link to computational resource discovery and
    allocation
  • link to experimental process

33
E.g. using XML Schema
USER
Key
User input interpreter
User output generator
Internal
http
generated XSL Script
Query Generator
module
Response Generator
XML Schema
XML Parser
External agent
XML File
XML File
Central metadata repository
Local metadata repository
Ascii file
34
Where are we?
  • Initial pilot produced
  • reviewed and supported by users
  • Consolidating into facilities
  • Planning for expansion and integration
  • Looking for collaborators
  • b.m.matthews_at_rl.ac.uk
  • www.escience.clrc.ac.uk