Title: Engaging researchers with e-Infrastructure
1. Engaging researchers with e-Infrastructure
- Leaping hurdles: planning IT provision for research
- 6 June 2009
- Neil Chue Hong / Steve Brewer
2. Engaging Research with e-Infrastructure
- What do people want to do? What are they doing already?
- Trivial barriers can seem insurmountable
- Demonstrate success and inspire trust in e-Infrastructure
- Get users to engage with e-Infrastructure to improve research output
- Interview researchers to identify what works and what's needed
- Analyse requirements and propose interventions
- Develop solutions and disseminate best practice
- www.engage.ac.uk
3. Engaging Research with e-Infrastructure
Interviews
Wider deployment
Projects
Dissemination
Adoption
New requirements
4. ENGAGE Researcher Interviews
- 53 interviews
- semi-structured
- 36 face-to-face
- 17 telephone
- 60 people
- 24 institutions
- Triage process to identify development projects and best practice
5. The Analysis of Data in ENGAGE
Interview Summary
1) Writes own software in Python - looking at better ways of getting it used
2) Software works on multiple datasets, mapping to own data format
3) Cytoscape not automated. Cannot automate visualisations.
4) Data visualisation is restricted as there are multiple datasets to download and access
6) Runs take 1-2 weeks. Not interactive.
7) Cannot submit large jobs.
Transcription
Analysis Triage
Obstacles
Best Practice
Interviews
- Tools not easily accessible to other researchers
- Unable to run large jobs on current resources
- Difficult to reuse / repurpose workflows
- Sourcing a system on which to run the services
- Assumptions made by software installation
- Assumptions made handling I/O with WS framework
- Timing issues when checking for secure services
- Teaching, admin and other research commitments meant that the primary researcher had insufficient time
- Undertake feasibility study and investigate making the protein sequence databases available as web services before the wrapped applications can use their data.
- Get the wrapped applications and workflows to work in a production environment on institutional facilities.
- Investigate and carry out the migration from the production environment to the NGS.
Commission
Evaluation
Evaluation Report
Development Project
Project Brief
6. Case 2 - Interview Summary
- Lost support of software
- Certificates cause problems
- Analysis takes 3 days
- Cannot log in to MetaData Database 50% of the time due to proxy certificate problems
- Would use grid tools if they had more confidence in their stability
- Security an issue as human nature has made it less secure
- Morphology and Growth Rate calculations better if distributed
- Lots of data uploaded to NGS
- Process takes 3 months from beginning to end
7. Post Interview Questionnaire
- Research/project objective
- Name of main sponsor of the work
- Research/project main challenge(s)
- Software application(s) used; also provide relationship role (analyst, developer, contributor, user, support/advisor, non-user researcher) if applicable
- Number of current users; number of expected users
- Infrastructure/platform(s) used/desired
8. Triage Questionnaire
- What is the degree of applicability of the work to OMII-UK/NGS?
- Suggested other partner(s) and partner institution(s) (if applicable)
- Research/project objective: observation/clarification required
- Related/relevant software application(s)
- Estimation of probable users; suggested target for eventual users
- Infrastructure/platform used
- Research/project main obstacle(s)/issue(s) (maximum of 3)
- Research/project main technological obstacle(s)/issue(s) (maximum of 3)
- Potential solution(s) (maximum of 3)
- Salient points, comments or brief proposal (elevator speech)
9. Case 2 Identified Obstacles
- Research/Project Obstacles
  - Lack of continuity of support. They would use grid resources more if they were confident about stability and continuity.
  - Started second phase of project to look at making the tools accessible. More marketing and streamlining of the tools.
- Technical Obstacles
  - Have lost key people who tinkered with the code to get it to run.
  - Amount of time spent getting the computational systems working and efficient.
  - Future work might be more sensitive (possible forging of polymorphs, pharmaceutical data), so security will become increasingly important.
10. Case 2 Potential Approaches
- Installation and wrapping support for specific applications in different contexts (e.g. NGS, Legion) and for security (e.g. easier authentication)
- Consultancy and support gluing it all together
- Partnership with their consultancy
11. Case 2 Project Brief
- Replace DMAREL with DMACRYS, which is capable of dealing with much larger molecules and crystal structures, and uses better models for the intermolecular forces.
- Expand the BPEL workflow to perform post-processing of the results, such as consistency checks and the re-submission of minima that are transition states. Enhance the presentation during the search and the storage for retrieval of the results.
- Port the deployment to run on both Legion and a Condor pool for testing, and design it to then also run on the NGS so that polymorph calculations can be performed by a wider range of users.
- Consider whether either larger machine would also be able to offer the alternative search method Crystal Predictor.
12. First Phase ENGAGE Development Projects
- High Throughput Humanities for e-Research
- Exposing bioinformatic programs as Web Services
- Protein Molecule Simulation on the Grid
- Enable workflows in a Shared Genomics causality workbench
- Linking and Querying Ancient Texts
- SWARMCloud
- Rapid Chemistry Portals by Engaging Users
13. Second Phase ENGAGE Development Projects
- Monte Carlo Treatment Planning
- Crystal Energy Landscape Application
- Epigraphy and papyrology image processing
- Strengthening and support for eMinerals RMCS system
- Configuration parameters for the GENIE simulator
- Lab Blog Book
- Strengthening and supporting the text and data analysis toolkit OSCAR
14. Planning IT provision for research
Putting the Team together
Creating a Common language
Need to bring together researchers, developers and infrastructure providers. Can be difficult to retain experienced staff.
Shared vocabulary for information exchange, e.g. analysis, ontologies. Experience can make it easier to broker this process.
Lab Blog Book
Best Practice
Link researchers analysing molecular structure and function via crystallography and MD simulation. Understand where approaches can be reused.
Virtual server provided on NGS2 hosting Lab Blog server. Databases ported for wider use. Working with IT administrators makes provisioning faster.
Follow how researcher constructs the DL_POLY simulation files, recreate at Southampton, link it to servers. Evaluation improves the usability of the work.
Unix, Apache 2, PHP 5, MySQL 5, ImageMagick, 1GB storage. Well-defined requirements drive wider infrastructure adoption.
Defining Requirements
Provisioning infrastructure
Evaluating Usage
15. First Phase ENGAGE Development Projects
- HiTheR: implemented different document similarity algorithms; 75% reduction in run-time on small Condor cluster; positive researcher evaluations (discovered various chains of related articles and misclassified articles); looking to transfer to NGS
- Exposing bioinformatic programs as Web Services: nine protein sequence analysis applications hosted on 144 CPU cluster; workflows created now in daily use by postgraduates; has impressed infrastructure providers
- ProSim: tools connected and made available in portal; workflows evaluated; workshop ran from 20-24 April with 40 attendees
- Shared Genomics: workbenches integrated, leading to new ideas for innovative user interfaces based on coverflow techniques
- LaQuAT: three databases integrated, in different languages; researcher about to complete formal evaluation
- RCPER: 3 portals complete, 1 portal underway; 1 portal evaluated and about to be used by 100 undergraduates; dissemination at ScotChem workshop
16. Second Phase ENGAGE Development Projects
- MCTP: new users at Swansea, Galway and Liverpool making use of the updated system
- Crystal Energy Landscape application (CPOSS): new DMACRYS system now working; re-engineered workflows being evaluated by Sally Price's research team at UCL
- RMCS (remote job submission for molecular simulation): project complete and good progress achieved; examining link to other projects
- Integration of image processing tools within the VRE-SDM: new integrated system previewed at recent Image, Text, Interpretation workshop in Oxford; user interface well received
- Aladdin 2 (a launchpad for the GENIE Earth-System Model): ported GENIE simulator now operational; configurable parameters can be rendered; MATLAB logic has been ported from GENIELab.
17. ENGAGE Summary
- From interview to exemplar project showing the use of e-Infrastructure
- Provide publicly available information to improve uptake
18. www.engage.ac.uk