Title: Brian Matthews,
1 The CLRC Data Portal
- Brian Matthews
- Information Technology Dept
- CLRC
- b.m.matthews_at_rl.ac.uk
2 Overview
- Motivates and describes a pilot implementation of a Science Data Portal
- The team: Juan Bicarregui, David Boyd, Simon Lambert, Brian Matthews, Kevin O'Neill, Kerstin Kleese, John Ashby
3 Who we are (CLRC)
- Central Laboratory of the Research Councils
- 1700 staff, supporting 12000 scientists and engineers from universities and industry
- Based at 3 sites
- Daresbury Laboratory
- Rutherford Appleton Laboratory
- Chilbolton Observatory
- A Multidisciplinary Laboratory
4 A Multidisciplinary Laboratory
- Spallation Neutron and Muon Source (ISIS)
- Synchrotron Radiation Source (SRS)
- Lasers
- Microstructures
- Space Science and Technology
- Molecular Spectroscopy
- Earth Observation
- Atmospheric Science
- Computational Science
- Energy Research
- Information Technology
- Particle Physics
- Radio Communications
- Surfaces Transforms and Interfaces
5 The Vision
6 The Problem
- Scientific institutions generate vast quantities of data
- CLRC: ISIS, SRS, Space Science, Particle Physics, Computational Science, ...
- More data coming on stream all the time
- CERN-LHC, Diamond, CASIM, HGP, ...
- Very good at handling large amounts of data
- Diverse approaches to organising and distributing it
Need a usable way of gaining access to the data
7 The Data Portal Concept
- Single point of access to the CLRC data resources
- Encompasses a wide range of data holdings
- Describes what data is available from the facilities
- Links to the data held at the facility
- Different archiving methods
- Caters for a wide range of users
- from the general community to data curators
- Supports a wide range of queries
- employing data mining, thesauri, ...
8 User Scenarios
- Experiment Proposer
- Have there been any neutron or X-ray studies of this molecule at 100 K?
- Lecturer
- This published study would be a good example for teaching; is the raw data publicly available?
- Researcher
- This is an interesting paper; can I check the data?
- Instrument Scientist
- The instrument seems a bit unstable recently; fetch me the results of all calibration runs from the last 3 months.
9 User Scenarios
- SRS user: Has a molecule with this space group and cell dimensions been studied with neutrons?
- ISIS user: Is there SRS data about this molecule to give me a start with the structure?
- Instrument scientist: How far had my colleague got with the analysis before he went on holiday?
10 Combine Diverse Users Searches ...
[Diagram labels: Discovery, Excavation, Experimenter, Data curator, Wider science community, General community, Specialist user]
11 ... with Distributed Data Silos.
[Diagram: Facility 1, Facility 2, Facility 3, Facility 4]
12 ... using a central index ...
[Diagram labels: Client, http, CLRC Data Access Server, Common metadata catalogue database, XML wrapper (x2), Local metadata, Local data, Facility 1]
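The central-index idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the keyword matching, and the `/archive/...` locations are hypothetical and do not come from the pilot. The point it shows is that the catalogue stores descriptions and links, while the data itself stays at the facility.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogueEntry:
    study: str      # short description of the study
    facility: str   # which facility holds the data
    location: str   # hypothetical path to the data at that facility


class CentralCatalogue:
    """Central index: holds descriptions and links, never the data itself."""

    def __init__(self) -> None:
        self._entries: List[CatalogueEntry] = []

    def register(self, entry: CatalogueEntry) -> None:
        self._entries.append(entry)

    def search(self, keyword: str) -> Dict[str, List[str]]:
        """Return matching data locations grouped by facility."""
        hits: Dict[str, List[str]] = {}
        for entry in self._entries:
            if keyword.lower() in entry.study.lower():
                hits.setdefault(entry.facility, []).append(entry.location)
        return hits


# Made-up entries, for illustration only.
catalogue = CentralCatalogue()
catalogue.register(CatalogueEntry("Copper complex at 150K", "Facility 1", "/archive/f1/0001"))
catalogue.register(CatalogueEntry("Copper powder study", "Facility 2", "/archive/f2/0042"))
catalogue.register(CatalogueEntry("Protein crystal", "Facility 1", "/archive/f1/0002"))
```

A call such as `catalogue.search("copper")` returns one list of locations per facility, which the client could then follow to fetch the data from each facility.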
13 ... a common metadata ...
[Diagram labels, top to bottom: Investigation; Data Holding (x3); Data-Set 1 (Raw), Data-Set 2 (Inter), Data-Set 3 (Final); File 1 name date (x3)]
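The hierarchy named on this slide (investigation, data holding, data set, file) could be modelled as nested records. The following is a minimal Python sketch, assuming one class per label on the slide; the class names, field names and sample file names are illustrative and are not the portal's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    # The slide lists a name and a date per file; both kept as plain strings.
    name: str
    date: str


@dataclass
class DataSet:
    # stage is one of the slide's labels: "Raw", "Inter" or "Final".
    stage: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataHolding:
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    holdings: List[DataHolding] = field(default_factory=list)

    def files_at_stage(self, stage: str) -> List[DataFile]:
        """Collect every file from data sets at the given stage."""
        return [
            f
            for holding in self.holdings
            for data_set in holding.data_sets
            if data_set.stage == stage
            for f in data_set.files
        ]


# Made-up content, for illustration only.
example = Investigation(
    title="hypothetical investigation",
    holdings=[
        DataHolding(
            data_sets=[
                DataSet("Raw", [DataFile("run1.raw", "1999-04-21")]),
                DataSet("Final", [DataFile("run1.cif", "1999-05-02")]),
            ]
        )
    ],
)
```

Walking the tree from the investigation down is what lets a user start from a study and end at individual files, for example with `example.files_at_stage("Raw")`.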
14 ... and a Web-based interface
- Exploit the existing Web infrastructure.
- Use New Technologies (XML/RDF)
- rapidly disseminated
- widely accessible
- database and user platform independent
- can be developed quickly, but with the GRID in
mind.
Every user who needs to can get to the
information.
15 The Pilot
- Crystallography data at SRS and ISIS
- Single Crystals, Proteins and Powder
- Web-based but Grid-compatible
- Motivated by interviews with
- Instrument and support Scientists
- (Chick Wilson, Ken Shankland, ISIS; Simon Teat, SRS)
- Facility IT support
- (Kevin Knowles, Mark Enderby)
- Users
- (Jeremy Cockcroft, David Moss, Birkbeck)
16 Approach
- 1. Consulted with potential Users
- developed use cases
- 2. Developed metadata formats
- generic and specific
- 3. Implemented database, server and queries
- 4. Piloted with real data
17 Metadata
A generic metadata model for all scientific applications, with specialisation for each domain
- Can answer questions across domains
- Can answer questions about specific domains
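One way to picture a generic model with per-domain specialisation is a base record type extended by domain-specific subclasses. The Python sketch below is an assumption made for illustration: the class names, fields and sample records are invented here and are not taken from the CLRC metadata model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenericRecord:
    # Fields every domain is assumed to share (names are illustrative).
    study_name: str
    discipline: str
    subjects: List[str]


@dataclass
class CrystallographyRecord(GenericRecord):
    # Hypothetical domain-specific extension.
    space_group: Optional[str] = None
    temperature_k: Optional[float] = None


def by_subject(records, subject):
    """Cross-domain question: uses generic fields only."""
    return [r for r in records if subject in r.subjects]


def crystals_at(records, temperature_k):
    """Domain question: only specialised records can answer it."""
    return [
        r
        for r in records
        if isinstance(r, CrystallographyRecord) and r.temperature_k == temperature_k
    ]


# Made-up records, for illustration only.
records = [
    GenericRecord("Ozone survey", "Atmospheric Science", ["Ozone"]),
    CrystallographyRecord(
        "Copper complex",
        "Chemistry",
        ["Copper", "Crystal Structure"],
        space_group="P21/c",
        temperature_k=150.0,
    ),
]
```

`by_subject` works on any record because it touches only the shared fields, while `crystals_at` needs the specialised fields and so skips records from other domains.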
18 Metadata Model
19 Metadata example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd">
<CLRCMetadata><MetadataRecord metadataID="N000001">
<Topic>
<Discipline>Chemistry</Discipline>
<Subject>Crystal Structure</Subject>
<Subject>Copper</Subject>...
<Experiment>
<StudyName>Crystal Structure Copper Palladium complex 150K ...
<Investigator><Name><Surname>Porter...<Institution>University of Peebles ...
<Funding>EPSRC ...
<TimePeriod><StartDate><Date>21/04/1999.
<Purpose><Abstract> To study the structure of Copper and Palladium co-ordination complexes at a 150K.
<DataManager><Name><Surname>Teat...
<Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...
<Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...
<Condition>Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
<AccessConditions>The user has to be one of Prof. F. Porter.
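A record like this can be read with any standard XML parser. The Python sketch below uses only element and attribute names visible on the slide (CLRCMetadata, MetadataRecord, metadataID, Topic, Discipline, Subject). The closing tags and the exact nesting in `FRAGMENT` are assumptions, since the slide elides them, and the DOCTYPE is left out because the DTD is not available here.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed fragment built from the element names on the slide.
FRAGMENT = """
<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic>
      <Discipline>Chemistry</Discipline>
      <Subject>Crystal Structure</Subject>
      <Subject>Copper</Subject>
    </Topic>
  </MetadataRecord>
</CLRCMetadata>
"""


def summarise(xml_text):
    """Return one small dict per MetadataRecord in the document."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iter("MetadataRecord"):
        summaries.append(
            {
                "id": record.get("metadataID"),
                "discipline": record.findtext("Topic/Discipline"),
                "subjects": [s.text for s in record.findall("Topic/Subject")],
            }
        )
    return summaries
```

Calling `summarise(FRAGMENT)` yields a single entry with the record ID, its discipline and its two subjects, which is roughly the information a results summary page would need.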
20 Architecture
[Diagram labels: Users, Other Data Portals, Grid middleware; an XML wrapper, Local metadata and Local data for each of Facility 2, Facility 3 and Facility 4]
21 Server Architecture
[Diagram labels: USER, http, User input interpreter, User output generator, pre-set XSL Script, Query Generator, Response Generator, XML Parser, XML File (x2), Central metadata repository, Local metadata repository, Ascii file. Key: Internal module, External agent]
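The modules named on this slide suggest a request pipeline: interpret the user input, generate a query, collect matching metadata, and generate output. The Python sketch below is a rough illustration of that flow only. It swaps the XSL scripts for plain Python functions, uses an in-memory XML document as a stand-in for the central metadata repository, and all function names, the `subject` parameter and the record IDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Stand-in for the central metadata repository: one XML document in memory.
REPOSITORY = ET.fromstring(
    "<CLRCMetadata>"
    "<MetadataRecord metadataID='A1'><Subject>Copper</Subject></MetadataRecord>"
    "<MetadataRecord metadataID='A2'><Subject>Zinc</Subject></MetadataRecord>"
    "</CLRCMetadata>"
)


def interpret_input(query_string):
    """User input interpreter: pull the search term out of an http query string."""
    return parse_qs(query_string).get("subject", [""])[0]


def generate_query(subject):
    """Query generator: build a predicate over metadata records."""
    return lambda record: any(s.text == subject for s in record.iter("Subject"))


def generate_response(matches):
    """Response generator: wrap matching records in a result document."""
    result = ET.Element("Results")
    result.extend(matches)
    return result


def generate_output(result):
    """User output generator: reduce the result XML to a text summary."""
    ids = [r.get("metadataID") for r in result.iter("MetadataRecord")]
    return f"{len(ids)} record(s): {', '.join(ids)}"


def handle(query_string):
    """Run one request through the four stages in order."""
    predicate = generate_query(interpret_input(query_string))
    matches = [r for r in REPOSITORY.iter("MetadataRecord") if predicate(r)]
    return generate_output(generate_response(matches))
```

For example, `handle("subject=Copper")` returns `"1 record(s): A1"`. Keeping the result as XML between the response and output stages mirrors the slide's separation of response generation from presentation.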
22 Standards and tools
- Client
- a standard web browser
- Server
- Apache Tomcat: Java Servlet engine
- Xerces, Xalan: Apache XML tools with standard APIs
- Database
- Access, generating XML
- W3C standards used
- XHTML
- XML
- XML DTDs
- DOM
- XSLT
- HTTP
Implemented using standards for portability and available support
23 Example
Result of searching across facilities: returns XML to the session and displays a summary
24 Expand Results: give more details from the same XML
25 Going Deeper: can browse the data sets
26 Select data: pick the required data files and download from a convenient location.
27 The Future
- Pilot (very quickly) completed
- Now in a planning and investigation phase
- Consolidate and Broaden existing system
- move towards a development system
- handle greater diversity of data sources
- Enhance the Technology
- GRID tools (Globus, SRB)
- Web tools (XML Query, RDF)
28 End to End Support
- Metadata collection and maintenance is a big problem.
- But doing science is a process.
Collecting the metadata can then become part of the experimental support environment
29 Interface with existing archives
- CLRC maintains existing data archives
- Atmospheric, earth observation, STP, astronomy.
- Existing access mechanisms (Web, Z39.50)
- Existing metadata catalogues and formats
- Can we use the Data Portal to access them?
- Use the Metadata format as a framework to be specialised to express existing metadata frameworks
- San Diego SRB as an abstraction layer on the archive
- XML Query as a query layer on the archive
30 Re-architect system
- Break up the portal middleware into components.
- Grid enable with Globus components
[Diagram: DP with component services and associated technologies]
- User service
- Results collation
- Ontology service: RDF, DAML+OIL
- Query generation: XML Query
- Security service: Globus GSI
- Replication service: Globus replication service
- Data source location: Globus GIS - MDS
31 Peer-to-Peer
Build into a Grid Infrastructure
Deliver data to computing resources
[Diagram labels: GP1, DP1, DP2, DS1 (x3), DS2]
32 Technology developments
- Enhance and standardise Metadata Model
- relate to other models
- Use RDF and XML Schema
- Enhance and Tune the user interface
- Keywords, Glossary, Thesaurus, Data mining
- Enhance Security Model
- Grid enable
- link to visualisation tools
- link to computational resource discovery and allocation
- link to experimental process
33 E.g. using XML Schema
[Diagram labels: USER, http, User input interpreter, User output generator, generated XSL Script, Query Generator, Response Generator, XML Schema, XML Parser, XML File (x2), Central metadata repository, Local metadata repository, Ascii file. Key: Internal module, External agent]
34 Where are we?
- Initial pilot produced
- reviewed and supported by users
- Consolidating into facilities
- Planning for expansion and integration
- Looking for collaborators
- b.m.matthews_at_rl.ac.uk
- www.escience.clrc.ac.uk