Title: Promoting reuse and repurposing on the Semantic Grid
Slide 1: Promoting reuse and repurposing on the Semantic Grid
- Antoon Goderis
- University of Manchester, UK
- CHESS seminar, 19 July 2005
Slide 2: Talk plan
- The grid
- The semantic grid
- Reuse and repurposing
- 7 bottlenecks to repurposing
- Semantics to the rescue
Slide 3: The Grid
- Pervasive and dependable computing utility
- A distributed computing infrastructure for advanced science and engineering
- Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations
Slide 4: Science in the 21st century
- Huge quantities of data
- Huge number of data collection devices
- Analysis is the bottleneck
- Global distributed science
- Collaboration and sharing the norm
- In silico experiments
- Build, reuse, repurpose on-line concurrent
processes (workflows)
Slide 5: Grid application evolution
- Smaller-scale data, less machine-intensive computation, complex heterogeneous applications, complex semantics, many people: Functional Genomics, Oceanography, Biodiversity, Earth Science, Neuroscience
- Large-scale data, large numbers of machines, expensive computation, simple semantics, small numbers of people: High Energy Physics
Slide 6: The Semantic Grid
- The Grid has been about large-scale computation
- But the applications are also about collaboration
- There is a gap between grid computing endeavours and the vision of Grid computing
- To support the full richness of the vision we need both Grid and Semantic Web technologies
- Knowledge explicitly asserted and explicitly used
Slide 7: Semantic Grid
[Figure: Classical Web, Semantic Web, Classical Grid and Semantic Grid placed along two axes, "Richer semantics" and "More computation". Source: Norman Paton]
Slide 8: Semantics in Grid workflows
- Classification and discovery of computational and data resources; provenance trails
- Declarative specification of services, workflows and their requirements; problem-solving selection
- Job control, distributed execution models, semantic integration, resource brokering, resource scheduling
- Encoding performance metrics, service state, event notification topics, access rights to databases, personal profiles and security groupings; charging infrastructure
Slide 9: Talk plan
- The grid
- The semantic grid
- Reuse and repurposing
- 7 bottlenecks to repurposing
- Semantics to the rescue
Slide 10: From building workflows to recycling them
- Reuse of workflows
- Best practice
- Training
- Peer review
- Repurposing
- Adapt and extend useful fragments
- Build on best practice
- Across groups / communities
Slide 11: Analyze This
Slide 12: Analyze This × scientists × workflows × versions × runs
Slide 13: Bridging user information need and workflow descriptions
Slide 14: Bridging user information need and workflow descriptions
Network effects!
Slide 15: Reuse and repurposing
- A user will reuse a workflow or workflow fragment
that fits their purpose and could be customised
with different parameter settings or data inputs
to solve their particular scientific problem.
Slide 16: Reuse and repurposing
- A user will reuse a workflow or workflow fragment that fits their purpose and could be customised with different parameter settings or data inputs to solve their particular scientific problem.
- A workflow fragment can be:
- a piece of an experimental description that is a coherent sub-workflow that makes sense to a domain specialist (in Ptolemy, a composite actor)
- a snippet of workflow code plus annotation
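A minimal Python sketch of reuse through customisation, assuming a deliberately simple workflow model: a fragment bundles a coherent sub-workflow with its annotation (loosely analogous to a Ptolemy composite actor) and is reused by overriding parameter settings or supplying different data inputs. The `Step` and `Fragment` classes, the `customise` method and the toy sequence-cleaning steps are hypothetical illustrations, not the Ptolemy, Kepler or myGrid APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """A hypothetical workflow step: a named function plus its parameter settings."""

    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)


@dataclass
class Fragment:
    """A coherent, annotated sub-workflow (hypothetical stand-in for a composite actor)."""

    annotation: str
    steps: List[Step]

    def run(self, data: Any) -> Any:
        # Steps run in sequence; each step's output is the next step's input.
        for step in self.steps:
            data = step.run(data)
        return data

    def customise(self, step_name: str, **new_params: Any) -> "Fragment":
        """Return a copy with one step's parameters overridden (reuse, not repurposing)."""
        new_steps = [
            Step(s.name, s.func, {**s.params, **new_params}) if s.name == step_name else s
            for s in self.steps
        ]
        return Fragment(self.annotation, new_steps)


def filter_short(seqs: List[str], min_len: int) -> List[str]:
    return [s for s in seqs if len(s) >= min_len]


def to_upper(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


clean = Fragment(
    annotation="hypothetical: sequence cleaning",
    steps=[Step("filter", filter_short, {"min_len": 4}), Step("upper", to_upper)],
)
print(clean.run(["acgt", "ac", "ggcta"]))   # ['ACGT', 'GGCTA']
strict = clean.customise("filter", min_len=5)
print(strict.run(["acgt", "ac", "ggcta"]))  # ['GGCTA']
```

Because `customise` builds new `Step` objects rather than mutating the originals, the shared fragment stays intact for other users.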
Slide 17: Reuse and repurposing
- A user will reuse a workflow or workflow fragment that fits their purpose and could be customised with different parameter settings or data inputs to solve their particular scientific problem.
- A user will repurpose a workflow or workflow fragment by
- finding one that is close enough to be the basis of a new workflow for a different purpose, and
- making small changes to its structure to fit it to its new purpose.
- Aiming for automated discovery of ranked fragments
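One way to make "close enough" operational, offered as a hedged sketch rather than the approach proposed in this talk, is to compare the semantic annotations of a user's need with those of stored workflows and rank by Jaccard similarity; the top hit is then repurposed by a small structural change. The annotation terms, workflow names and the helpers `jaccard`, `closest_workflows` and `repurpose` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two annotation sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_workflows(
    need: Set[str], library: Dict[str, List[str]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank stored workflows by how well their step annotations match the need."""
    scored = [(name, jaccard(need, set(steps))) for name, steps in library.items()]
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def repurpose(steps: List[str], old: str, new: str) -> List[str]:
    """A small structural change: swap one step for another and keep the rest."""
    return [new if s == old else s for s in steps]


library = {
    "wf_a": ["fetch_sequence", "align", "visualise"],
    "wf_b": ["fetch_sequence", "annotate_genes"],
    "wf_c": ["query_database", "statistics"],
}
need = {"fetch_sequence", "align", "phylogeny"}
ranking = closest_workflows(need, library)
print(ranking)  # [('wf_a', 0.5), ('wf_b', 0.25)]
print(repurpose(library[ranking[0][0]], "visualise", "phylogeny"))
```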
Slide 18: Seven bottlenecks to workflow repurposing
- Lack of a comprehensive discovery model
- Process knowledge acquisition bottleneck
- Lack of workflow fragment rankings
- Workflow interoperability
- Restrictions on service availability
- Rigidity of service and workflow definitions
- Intellectual property rights on workflows
Goals: make workflows usable; collect enough workflows
Slide 19: A comprehensive discovery model
- A user will repurpose a workflow or workflow fragment by
- finding one that is close enough to be the basis of a new workflow for a different purpose, and
- making small changes to its structure to fit it to its new purpose.
- Based on semantic annotation, find a set of workflows that people can then edit
- For scientists: data-flow-based queries in their own jargon, largely abstracting from control flow
- For developers: control-flow-based queries, largely abstracting from data
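The two query styles can be pictured as two projections of one annotated workflow. In this sketch, which assumes a hypothetical `Workflow` model with separate data links and control links, the scientist view keeps only data flow expressed in domain terms, while the developer view keeps every ordering constraint between step ids and ignores what the data means.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Workflow:
    """Hypothetical workflow model with separate data-flow and control-flow links."""

    tasks: Dict[str, str]  # step id -> domain (task) annotation
    data_links: List[Tuple[str, str]] = field(default_factory=list)  # producer -> consumer
    control_links: List[Tuple[str, str]] = field(default_factory=list)  # run-after constraints

    def scientist_view(self) -> Set[Tuple[str, str]]:
        """Data flow in domain jargon; explicit control links are ignored."""
        return {(self.tasks[a], self.tasks[b]) for a, b in self.data_links}

    def developer_view(self) -> Set[Tuple[str, str]]:
        """Every ordering constraint between step ids; annotations and data are ignored."""
        # A data link also forces its producer to run before its consumer.
        return set(self.data_links) | set(self.control_links)


wf = Workflow(
    tasks={
        "s1": "retrieve sequence",
        "s2": "sequence alignment",
        "s3": "visualisation",
        "s4": "archive results",
    },
    data_links=[("s1", "s2"), ("s2", "s3")],
    control_links=[("s3", "s4")],
)
print(sorted(wf.scientist_view()))
print(sorted(wf.developer_view()))
```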
Slide 20: Kepler
http://kepler.ecoinformatics.org/
Courtesy: Bertram Ludaescher
Slide 21: A comprehensive discovery model
- Scientist queries
- Find all processes where sequence alignment is followed by visualisation
- Given a set of data points, services, or fragments, have these been connected up in an existing base of workflows? Alternatives?
- Show me the provenance of this workflow
- Developer queries
- How have people applied this dataflow execution model (e.g. in Ptolemy, an SDF Director)?
- How can it be combined with other execution models?
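The first scientist query can be sketched as subsumption plus reachability: a step matches "sequence alignment" if its annotation is that class or a subclass of it, and a workflow matches if some visualisation step lies downstream along its data links. The small `TAXONOMY` dictionary is a hypothetical stand-in for an OWL class hierarchy, and the task terms and workflows are invented; a real system would delegate subsumption to an ontology reasoner.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical task taxonomy: class -> parent class (stand-in for an OWL hierarchy).
TAXONOMY: Dict[str, Optional[str]] = {
    "task": None,
    "sequence alignment": "task",
    "pairwise alignment": "sequence alignment",
    "multiple alignment": "sequence alignment",
    "visualisation": "task",
    "tree viewer": "visualisation",
    "gene annotation": "task",
}

# workflow name -> (step id -> task term, data links as (producer, consumer))
Library = Dict[str, Tuple[Dict[str, str], List[Tuple[str, str]]]]


def is_a(term: str, cls: str) -> bool:
    """True if term equals cls or is a transitive subclass of it."""
    current: Optional[str] = term
    while current is not None:
        if current == cls:
            return True
        current = TAXONOMY[current]
    return False


def reachable(links: List[Tuple[str, str]], start: str) -> Set[str]:
    """All steps downstream of start along data links (iterative depth-first search)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in links:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def followed_by(workflows: Library, first: str, then: str) -> List[str]:
    """Names of workflows where a step of class `first` is upstream of one of class `then`."""
    hits = []
    for name, (tasks, links) in workflows.items():
        for step, term in tasks.items():
            if is_a(term, first) and any(is_a(tasks[d], then) for d in reachable(links, step)):
                hits.append(name)
                break
    return hits


workflows: Library = {
    "wf1": ({"a": "multiple alignment", "b": "tree viewer"}, [("a", "b")]),
    "wf2": ({"a": "tree viewer", "b": "pairwise alignment"}, [("a", "b")]),
    "wf3": (
        {"a": "gene annotation", "b": "pairwise alignment", "c": "tree viewer"},
        [("a", "b"), ("b", "c")],
    ),
}
print(followed_by(workflows, "sequence alignment", "visualisation"))  # ['wf1', 'wf3']
```

`wf2` is excluded because its visualisation step runs before, not after, the alignment.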
Slide 22: A comprehensive discovery model
- Challenges
- Libraries of (scientific) task-based patterns
- E.g. task semantics of gene annotation pipelines classified in OWL
- Libraries of design patterns for distributed behaviour
- Identify how people build concurrent systems: how they choose (combinations of) execution semantics
- A good start: workflow patterns for Petri nets
- E.g. synchronizing merge and multi-merge
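To illustrate why the choice of execution semantics matters when repurposing, this sketch contrasts the two merge patterns named above in simplified form by counting how often the downstream task runs as branches complete. It abstracts away the Petri net places and tokens in which these workflow patterns are usually defined, and the branch names are hypothetical. Under a multi-merge every completed incoming branch triggers the downstream task; under a synchronizing merge the downstream task runs once, after all branches that were actually activated have completed.

```python
from typing import List, Set


def multi_merge(completions: List[str]) -> List[str]:
    """Multi-merge: each completed incoming branch triggers the downstream task once."""
    return [f"downstream (triggered by {b})" for b in completions]


def synchronizing_merge(active: Set[str], completions: List[str]) -> List[str]:
    """Synchronizing merge: fire once, after every *active* incoming branch completes.

    Branches never activated by the earlier multi-choice are not waited for.
    """
    done: Set[str] = set()
    fired: List[str] = []
    for b in completions:
        done.add(b)
        if not fired and active <= done:
            fired.append("downstream (after " + ", ".join(sorted(active)) + ")")
    return fired


# An earlier multi-choice activated branches A and B, but not C.
active_branches = {"A", "B"}
order = ["B", "A"]
print(multi_merge(order))
print(synchronizing_merge(active_branches, order))
```

A workflow fragment written for one pattern can therefore run its downstream step a different number of times when moved into an engine whose merge defaults to the other.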
Slide 23: Workflow fragment rankings
- A user will repurpose a workflow or workflow fragment by
- finding one that is close enough to be the basis of a new workflow for a different purpose, and
- making small changes to its structure to fit it to its new purpose.
- We need metrics for processes
- For scientists: ranking by scientific relevance
- For developers:
- compare processes based on the same execution semantics
- compare different execution semantics
- Challenge: defining the metrics and combining them into rankings
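A hedged sketch of one way to combine several process metrics into a single ranking: min-max normalise each metric so that differently scaled values become comparable, then take a weighted sum. The metric names (`relevance`, `reuse_count`), the values and the weights are invented for illustration; choosing and validating real metrics is the open challenge the slide names.

```python
from typing import Dict, List, Tuple


def normalise(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale one metric to [0, 1] so metrics on different scales can be combined."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 1.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def rank(
    metrics: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Weighted sum of normalised metrics, highest combined score first.

    Assumes every metric covers the same set of fragments.
    """
    scaled = {m: normalise(vals) for m, vals in metrics.items()}
    fragments = next(iter(metrics.values())).keys()
    scores = {f: sum(weights[m] * scaled[m][f] for m in metrics) for f in fragments}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical metric values for three fragments.
metrics = {
    "relevance": {"frag1": 0.9, "frag2": 0.4, "frag3": 0.7},  # e.g. annotation overlap
    "reuse_count": {"frag1": 2, "frag2": 40, "frag3": 12},    # times used elsewhere
}
weights = {"relevance": 0.7, "reuse_count": 0.3}
print(rank(metrics, weights))
```

Here frag1 ranks first because relevance carries most of the weight, even though frag2 is reused far more often; changing the weights changes the ranking, which is why the weighting itself needs justification.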
Slide 24: Workflow interoperability
- A user will repurpose a workflow or workflow fragment by
- finding one that is close enough to be the basis of a new workflow for a different purpose, and
- making small changes to its structure to fit it to its new purpose.
- Workflows take a long time to build and get very large
- The nice thing about standards
- Different workflow systems, different (implicit) semantics
- Import workflows across workflow environments
- Manually redo it in your own
- Wrapping
- Auto-rewrite to new environment
- eg
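The auto-rewrite option can be sketched as a type-by-type translation that falls back to wrapping when a step's (often implicit) semantics have no counterpart in the target environment. Both step vocabularies, the `TYPE_MAP` table and the `wrapped_foreign` marker are hypothetical and do not correspond to the formats of Taverna, Kepler or any other real system.

```python
from typing import Dict, List

# Hypothetical mapping from source step types to target step types
# where the two environments' semantics are assumed to line up.
TYPE_MAP: Dict[str, str] = {"ws_call": "service_actor", "shell": "command_actor"}


def auto_rewrite(source_steps: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rewrite each source step into the target vocabulary.

    Steps with no known target equivalent are not dropped: they are wrapped as an
    opaque 'foreign' step that would still have to run in the original engine.
    """
    target = []
    for step in source_steps:
        mapped = TYPE_MAP.get(step["type"])
        if mapped is not None:
            target.append({"name": step["name"], "type": mapped})
        else:
            target.append(
                {"name": step["name"], "type": "wrapped_foreign", "original_type": step["type"]}
            )
    return target


source = [
    {"name": "get_seq", "type": "ws_call"},
    {"name": "iterate", "type": "implicit_iteration"},  # engine-specific semantics
    {"name": "plot", "type": "shell"},
]
for step in auto_rewrite(source):
    print(step)
```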
Slide 25: Workflow interoperability
- To inform interoperation, we need a layer of abstraction that captures behavioural semantics
- Many non-standardised formalisms out there
- Functional languages: one paradigm fits all?
- Petri nets
- Process algebras
- Finite state machines
- All (hierarchical) combinations of these
- Challenge
- Behavioural design patterns to compare formalism classes, e.g. PN and SDF Directors
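As one example of the behavioural semantics such an abstraction layer would have to capture: under synchronous dataflow (SDF), each actor produces and consumes a fixed number of tokens per firing, so a static schedule exists only if the balance equations `r[src] * produced == r[dst] * consumed` have a positive integer solution (the repetition vector). The sketch below solves those equations for a small connected graph; the function name and example graph are assumptions, and this is not the code of Ptolemy's SDF Director.

```python
from fractions import Fraction
from math import gcd, lcm
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, int]  # (source, destination, tokens produced, tokens consumed)


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integer firing counts satisfying every SDF balance equation.

    Raises ValueError if the rates are inconsistent (no static schedule exists).
    Assumes the graph is connected and every actor appears in `actors`.
    """
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates until every reachable actor is assigned
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src} -> {dst}")
    # Scale fractional rates to integers, then reduce to the smallest solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    common = gcd(*counts.values())
    return {a: c // common for a, c in counts.items()}


print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

With these rates, A fires 3 times, B twice and C once per iteration: A emits 6 tokens, which B consumes in 2 firings, and B's 2 output tokens feed a single firing of C. Requires Python 3.9+ for multi-argument `math.gcd` and `math.lcm`.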
Slide 26: Conclusions
- Grid → Semantic Grid
- Reuse ≠ repurposing
- Task and behavioural semantics are both needed for repurposing
- Design patterns for distributed processes: a long road ahead
- Task semantics
- Behavioural semantics
Slide 27: EPSRC-funded UK eScience Program Pilot Project
Many slides taken from Carole Goble
Slide 28
- Core
- Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.
- Users
- Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
- Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
- Steve Kemp, Liverpool, UK
- Postgraduates
- Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
- Industrial
- Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
- Robin McEntire (GSK)
- Collaborators
- Keith Decker
Slide 29: References
- Publications available on:
- Home page: www.cs.man.ac.uk/goderisa
- myGrid site: www.mygrid.org.uk