Title: myExperiment: A Web 2.0 VRE
1myExperiment: A Web 2.0 Virtual Research
Environment
David De Roure, Carole Goble
2Overview
- e-Science is about scientists doing science
- A Tale of Two Projects
- myExperiment
- Design Patterns for a VRE
3CombeChem pilot project
Video
Simulation
Properties
Analysis
Structures Database
Diffractometer
X-Ray e-Lab
Properties e-Lab
Grid Middleware
www.combechem.org
4Undergraduate Students
Digital Library
Graduate Students
E-Scientists
Reducing time-to-experiment
E-Experimentation
Entire e-Science Cycle: encompassing
experimentation, analysis, publication, research,
learning
http://www.ukoln.ac.uk/projects/ebank-uk/
5Provenance
- The key observation!
- Publication at Source describes the need to
capture data and its context from the outset and
maintain a complete end-to-end connection between
the laboratory bench and the intellectual
chemical knowledge that is published as a result
of the investigation
The details of the origins of data are just as
important to understanding as their actual values
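One way to picture "capturing data and its context from the outset" is to store every value together with a record of where it came from. The following is a minimal sketch only: the `Measurement` class, its fields, and the sample values are all hypothetical and are not taken from CombeChem.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Measurement:
    """A value stored together with the context it came from (hypothetical)."""
    value: float
    unit: str
    instrument: str   # assumed field: which device or script produced the value
    operator: str     # assumed field: who carried out the step
    derived_from: list = field(default_factory=list)  # upstream Measurements
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def lineage(self):
        """Return every upstream measurement, nearest ancestor first."""
        chain = []
        for parent in self.derived_from:
            chain.append(parent)
            chain.extend(parent.lineage())
        return chain


# Illustrative values only: a raw reading and a value refined from it.
raw = Measurement(1.542, "angstrom", "diffractometer-1", "student-a")
refined = Measurement(1.540, "angstrom", "refinement-script", "student-a",
                      derived_from=[raw])
```

Calling `refined.lineage()` walks back from the published value to the bench reading, which is the end-to-end connection the slide describes.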
6My Chemistry Experiment
8The RDF Graph
10Data creation and capture in Smart lab
Presentation services and portals
Data discovery, linking, citation
Search, harvest
Data analysis, transformation, mining, modelling
Aggregator services
Harvest
Deposit
e-Research workflows
Institutional data repositories
Laboratory repository
e-Crystals Federation model
Deposit
Validation
Validation
Publication
(Chemistry Central)
Data curation and preservation: databases,
databanks
Linking, citation
Publishers: peer-review journals, conference
proceedings
This work is licensed under a Creative Commons
Licence: Attribution-ShareAlike 2.0
11Key collective activities in e-science
informal and formal communication
meetings
interpretation of data/events
archiving/recovering information
following through decisions/coordinating
activities
producing documents and other artifacts
http://www.aktors.org/coakting/
12What we learnt about VREs
- Reducing time-to-experiment
- Datasets as publication
- Provenance matters
- Publish the pieces, don't warehouse
- Semantic Lab notebooks in the VRE
- Blogging the lab
- Federated back end
- Semantic DataGrid, built socially
- Deep integration with collaborative tools
13Bioinformatics is not Chemistry
- There are many pieces, from many boxes, but
no box, and no lid with a complete picture of
what the puzzle is supposed to be.
- Planning? No.
- Metadata an afterthought
14myGrid
- Open Source middleware for Life Scientists that
enables them to undertake in silico experiments
and share those experiments and their results.
- Machinery for linking together datasets and tools
- Individual scientists, in under-resourced labs,
who use other people's datasets and applications.
- Ad hoc exploratory workflows (data flows)
- To support sharing and collaboration between
scientists to disseminate best practice and
improve the quality of science
- 33,000 downloads, 200 user sites, 400 workflows
- 3500 third party external services accessible.
- Moved from prototype to production quality.
- Open Middleware Infrastructure Institute UK
- http://www.mygrid.org.uk
15Taverna Workflow Workbench
16Widespread Adoption
- Users in US, Asia, UK, Europe, Australia
- Systems biology
- Proteomics
- Gene/protein annotation
- Microarray data analysis
- Medical image analysis
- Heart simulation orchestration
- High throughput screening of chemical compounds
- Phenotypical studies
- Public Health studies
- Clinical trial analysis
- Plants, Mouse, Human
- Astronomy
- Cultural Heritage
17Recycling, Reuse, Repurposing
- Identified a pathway for which its correlating
gene (Daxx) is believed to play a role in
trypanosomiasis resistance.
- Manual analysis on the microarray and QTL data
failed to identify this gene as a candidate.
- Repetitive, unbiased analysis.
- Trypanosomiasis cattle workflow reused without
change to identify the biological pathways
involved in sex dependence in the mouse model,
previously believed to be involved in the ability
of mice to expel the parasite.
- Previously a manual two-year study of candidate
genes had failed to do this.
Paul Fisher et al., "A Systematic Strategy for
Large-Scale Unbiased Analysis of
Genotype-Phenotype Correlations", Bioinformatics,
in review
18- Service and workflow annotation
- Ontology 710 classes
- Full time curator
- Tagging by the masses
- 3500 services, 350 curated
- Provenance
- Ontology 35 classes
- Enriched with domain ontologies and service
ontologies. Possibly.
- Export with data. Desirably.
19New Scientific Digital Artefacts
- Design
- Workflow design history
- Experiment purpose
- Scientist
- LogBook
- Workflow run log
- Data lineage
- Results interpretation log
20New digital artefacts
21myExperiment.org Portal Party
- 28th-29th Sept 2006
- Hand-picked Taverna users and Taverna development
team
- Facilitated by NCeSS.
- AJAX based development
- CombeChem xfer
- A social networking environment for sharing any
workflow
- A Taverna workflow run environment
- A multi-workflow launch environment
22Virtual Research Environments
- VRE 1
- Technology-focused
- Experimental
- Diverse design and development approaches
- Stand-alone solutions
- VRE 2
- User- and research practice-focused
- Developmental
- Unified design and development approaches
- Integrated solutions
- Collaboration
- Supporting small- and large-scale research
- Support for single-disciplinary and
multi-disciplinary research
24openwetware.org
26What are we trying to do?
- Enabling scientists to be (more) creative.
- Enabling scientists to be scientists. And not
programmers.
- Enabling mediocre scientists to become better and
thus have better science.
- Enabling smart scientists to be smarter and
propagate their smartness.
- Accelerate dissemination, pooling, insight.
- Encouraging sanctioned plagiarism.
27Principles
- Focus on making it easy to publish information
- Discovering and sharing experimental artefacts
- Publishing results to standard community
repositories
- Publishing scholarly output
- Familiar social networking / web paradigms
- Keeping it free and fluid and creative.
Me-Science.
- Crossing system boundaries
- Trans-workflow
- Crossing discipline boundaries
- Multi-disciplinary, Inter-disciplinary,
Trans-disciplinary
- Clustering expertise
- Intellectual fusion outside discipline.
We-Science.
- Life Science, Social Science, Astronomy, Chemistry
28Scoping exercise
- Workflow warehouse / federation of repositories:
Open Archives Initiative. Federated
myExperiments. Sharepoint.
- Social space: organised rich site. Social
discourse. Organised service / workflow space
using curated semantics.
- Granularity and identifiers: Rolling-up
provenance. Id resolution.
- Open vs protected content: Quality, Reliability,
Validation, Safety, Intellectual Property,
Ownership, Secrecy, A duty of guardianship.
Curation? Policing? Local data mixed with shared
resources.
- Desktop integration: Google gadgets for workflows.
Interacting with workflows through Office
products.
- Workflow execution: (WHIP) Workflows Hosted in
Portals project.
- Evolving the myExperiment software: Community
development.
- Enabling Scientists: added value through
applications and collaborative tagging.
29Hack Fest
32Q1. Workflow Warehouse or Federation of
Repositories?
- Everything on the myExperiment.org web site
- vs
- Distributed stores
- Multiple myExperiments
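The two options differ mainly in where a search gets answered. The sketch below is illustrative only: `Repository`, `Federation`, the `search` method, and the store names "lab-a" and "lab-b" are hypothetical and do not describe how myExperiment is actually built.

```python
class Repository:
    """A hypothetical workflow store holding (title, description) pairs."""

    def __init__(self, name, workflows):
        self.name = name
        self.workflows = list(workflows)

    def search(self, term):
        """Return (repository name, title) for every matching workflow."""
        term = term.lower()
        return [(self.name, title) for title, desc in self.workflows
                if term in title.lower() or term in desc.lower()]


class Federation:
    """Fans a query out to several repositories and merges the hits."""

    def __init__(self, repositories):
        self.repositories = list(repositories)

    def search(self, term):
        hits = []
        for repo in self.repositories:
            hits.extend(repo.search(term))
        return hits


# Warehouse: everything held on one site.
warehouse = Repository("myexperiment.org", [
    ("QTL pathway analysis", "microarray and QTL data"),
    ("Protein annotation", "gene/protein annotation"),
])

# Federation: the same content split across independent stores.
federation = Federation([
    Repository("lab-a", [("QTL pathway analysis", "microarray and QTL data")]),
    Repository("lab-b", [("Protein annotation", "gene/protein annotation")]),
])

warehouse_hits = warehouse.search("annotation")
federated_hits = federation.search("annotation")
```

Both searches find the same workflow; the federated answer also says which store holds it, at the cost of querying every store.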
33Q2. Social Space or Shoe Shop?
- Shopping for Workflows and Services and Data
should be as easy as shopping for shoes.
- Organic growth is good and bad.
- Social tagging might help discover workflows but
we need good metadata for automated use.
26/2/2007 myExperiment Slide 33
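The contrast between social tagging and "good metadata for automated use" can be sketched as follows. Everything here is an assumption made for illustration: the `TagIndex` class, the `compatible` check, the workflow ids, and the input/output type names are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Free-form tags added by any user; useful for human discovery."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, workflow_id, *tags):
        # Normalise case and whitespace, since anyone may type anything.
        for t in tags:
            self._by_tag[t.strip().lower()].add(workflow_id)

    def find(self, tag):
        return sorted(self._by_tag.get(tag.strip().lower(), set()))


def compatible(metadata, producer_id, consumer_id):
    """Automated check: does the producer's output type match the consumer's input?"""
    return metadata[producer_id]["output"] == metadata[consumer_id]["input"]


tags = TagIndex()
tags.tag("wf1", "Microarray", "mouse")
tags.tag("wf2", "microarray ")

# Hypothetical curated metadata: declared input and output types.
metadata = {
    "wf1": {"input": "probe_ids", "output": "gene_ids"},
    "wf2": {"input": "gene_ids", "output": "pathway_ids"},
}
```

Tags are enough for a person to find both workflows under "microarray", but only the structured metadata lets a program decide that `wf1` can feed `wf2` and not the reverse.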
34Q3. How open is the content?
- OpenWetware is open
- Our users don't want this
- Provenance helps
35Q4. Integration
- Bring user to Web Site
- vs
- Bringing myExperimentness to existing interfaces
36Web 2.0 Design Patterns
- The Long Tail
- Data is the Next Intel Inside
- Users Add Value
- Network Effects by Default
- Some Rights Reserved
- The Perpetual Beta
- Cooperate, Don't Control
- Software Above the Level of a Single Device
- http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
371. The Long Tail
- Our target users are not just the specialist
e-Scientists using computing resources to tackle
major scientific breakthroughs, but also the
large number of scientists conducting the routine
processes of science on a daily basis.
- Through sharing we have the potential to enable
smart scientists to be smarter and propagate
their smartness, in turn enabling other
scientists to become better and conduct better
science.
382. Data is the Next Intel Inside
- myExperiment understands that scientists are
focused on data, not software or one particular
workflow engine.
- Workflows are components of customised
applications, many of which are data-oriented
rather than process-oriented.
- Users manipulate, through their own applications,
the product (data, model) yielded by the
workflow.
- Furthermore, workflows themselves are the data of
myExperiment and provide its unique value.
393. Users Add Value
- myExperiment makes it easy to find workflows and
is designed to make it useful and straightforward
to share workflows and add workflows to the pool.
- To succeed we draw on the insights into the
incentive models of scientists gained through
experience with Taverna.
404. Network Effects by Default
- myExperiment aggregates user data as a
side-effect of using the VRE.
- The ability to execute workflows from
myExperiment, and the integration of tools such
as Taverna with myExperiment, further enable us
to achieve increased value through usage.
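Aggregating data "as a side-effect" means the user does nothing extra: ordinary use of the system is what gets recorded. A minimal sketch, assuming a hypothetical `WorkflowPool` class with made-up workflow ids; it is not the myExperiment implementation.

```python
from collections import Counter


class WorkflowPool:
    """Hypothetical pool that counts usage whenever a workflow is fetched."""

    def __init__(self):
        self._workflows = {}
        self._usage = Counter()

    def add(self, workflow_id, definition):
        self._workflows[workflow_id] = definition

    def fetch(self, workflow_id):
        """Return a workflow; the usage count is updated as a side effect."""
        definition = self._workflows[workflow_id]
        self._usage[workflow_id] += 1
        return definition

    def most_used(self, n=3):
        """Rank workflows by how often they have been fetched."""
        return [wf for wf, _ in self._usage.most_common(n)]


pool = WorkflowPool()
pool.add("wf1", "annotate genes")
pool.add("wf2", "find pathways")

pool.fetch("wf2")
pool.fetch("wf1")
pool.fetch("wf2")
ranking = pool.most_used()
```

The ranking improves for everyone simply because people fetched workflows, which is the sense of "increased value through usage".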
415. Some Rights Reserved
- myExperiment users require protection as well as
sharing, but the environment is designed for
maximum ease of sharing to achieve collective
benefits: workflows are "hackable" and
"remixable".
- Initiatives such as Science Commons provide a
useful context for this.
426. The Perpetual Beta
- myExperiment is an online service (a collection
of online services) and is continually evolving
in response to its users.
- To support this, the project commenced with
developers being embedded in the user community.
- Through day-to-day contact between designers and
researchers, design is both inspired and
validated.
437. Cooperate, Don't Control
- myExperiment is a network of cooperating data
services with simple interfaces which make it
easy to work with content.
- It both provides services and reuses the services
of others.
- It aims to support lightweight programming models
so that it can easily be part of loosely coupled
systems.
448. Software Above the Level of a Single Device
- The current model of Taverna running on the
scientist's desktop PC or laptop is evolving into
myExperiment being available through a variety of
interfaces and supporting workflow execution.
45Closing
- e-Science is difficult; workflows and Web 2.0
make it easier.
- Our design workshops and the review against Web
2.0 design patterns have revealed the
relationship between myExperiment and Web 2.0.
- The collective benefits of participation arise
not only from the users but also from the
developers: ease of use and ease of development.
- It might be useful to review other VREs against
the design patterns.
46Take homes
- myExperiment is a Web 2.0 Environment for
Scientists to share experiments
- Join us!
- David De Roure
- dder_at_ecs.soton.ac.uk
- Carole Goble
- carole.goble_at_manchester.ac.uk
47Credits
- myGrid and CombeChem
- Matt Lee
- David Withers
- Don Cruickshank
- Rob Procter
- Alex Voss
- June Finch
- Ed Zaluska
- All the users inc. embedders