Title: Personal Data Management
- Why is this such an issue?
- Data provenance
- Representing links vs. representing data
- Identifying resources: Life Science Identifiers (LSIDs)
- Different types of provenance
- Provenance generation
- Provenance storage
- Provenance retrieval
Problem
- Automated workflows produce lots of heterogeneous data
- These are just some of the results from one workflow run investigating Williams syndrome
Amplification of results
- One input gives rise to many outputs
Links vs. data representation
- Data management questions concern relationships rather than internal content:
- What are the origins of this data?
- Which service produced this data?
- Which data is this derived from?
- Who was this data produced for?
- What is this data telling me? (not a management question)
- Data analysis questions are delegated to external services.
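Because the management questions are about relationships, each one can be answered by following links rather than by opening the data. The Python sketch below is illustrative only: the predicate names (produced_by, derived_from, run_for), the Link tuple and the identifiers are hypothetical and not taken from the myGrid ontology.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    """One relationship between two resources (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical links recorded for a single workflow output.
LINKS = [
    Link("data:blast-report", "produced_by", "service:BLAST@NCBI"),
    Link("data:blast-report", "derived_from", "data:input-sequence"),
    Link("data:blast-report", "run_for", "person:alice"),
]


def ask(subject: str, predicate: str) -> List[str]:
    """Answer a management question by following links; the data itself is never read."""
    return [link.obj for link in LINKS
            if link.subject == subject and link.predicate == predicate]


print(ask("data:blast-report", "produced_by"))  # Which service produced this data?
print(ask("data:blast-report", "run_for"))      # Who was this data produced for?
```

A question such as "What is this data telling me?" has no link to follow, which is why it is left to analysis services.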
Representing links
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
- Identify each resource
- Life Science Identifier (LSID): a URI with associated data and metadata retrieval protocols
- Relies on the guarantee that the underlying data will never change
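LSIDs follow the syntax urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal Python parser, assuming the first example identifier reads urn:lsid:taverna.sf.net:datathing:45fg6 once its colons are restored; the LSID class and parse_lsid are hypothetical helpers, not part of any LSID library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """The components of urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts; the urn:lsid prefix is matched case-insensitively."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


lsid = parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6")
print(lsid.authority, lsid.namespace, lsid.object_id)  # taverna.sf.net datathing 45fg6
```

Because the data behind an LSID must never change, anything retrieved under a given identifier can be cached safely.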
Representing links II
http://www.mygrid.org.uk/ontology#derived_from
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
- Identify the link type
- Again, use a URI
- This allows us to use RDF infrastructure:
- Repositories
- Ontologies
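Once both ends and the link type are URIs, a link is simply an RDF statement. A small Python sketch that writes one link as an N-Triples line, assuming the predicate URI is http://www.mygrid.org.uk/ontology#derived_from (the separator before derived_from is a reconstruction) and that 45fg6 is the item derived from 23ty3 (the direction is also an assumption). The helper names are hypothetical.

```python
# Printable characters N-Triples forbids inside an IRI
# (control characters are also forbidden but are not checked here).
_FORBIDDEN = set('<>"{}|^`\\ ')

# Assumed spelling of the myGrid link type.
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, rejecting characters N-Triples does not allow."""
    if not value or any(ch in _FORBIDDEN for ch in value):
        raise ValueError(f"not usable as an IRI: {value!r}")
    return f"<{value}>"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one URI-to-URI link as a single N-Triples statement."""
    return f"{iri(subject)} {iri(predicate)} {iri(obj)} ."


print(to_ntriple("urn:lsid:taverna.sf.net:datathing:45fg6",
                 DERIVED_FROM,
                 "urn:lsid:taverna.sf.net:datathing:23ty3"))
```

Lines in this form can be loaded by standard RDF repositories, and the predicate URI can be described in an ontology.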
Provenance (1)
[Diagram: three levels of provenance]
Organisation level provenance
- Person, Organisation, Project, Experiment design
- Service, linked via runBy, e.g. BLAST @ NCBI
Process level provenance
- Workflow design and Workflow run
- Process, linked via componentProcess, e.g. a web service invocation of BLAST @ NCBI
- Event, linked via componentEvent, e.g. completion of a web service invocation at 12.04pm
- Structural relations: partOf, instanceOf
Data/knowledge level provenance
- Data items, linked to one another by:
- Data derivation, e.g. output data derived from input data
- Knowledge statements, e.g. "similar protein sequence to"
- run for, linking to a Person or Organisation
- Users can add templates to each workflow process to determine the links between data items.
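At the data level, "Which data is this derived from?" is usually transitive: it means following derivation links back through several processes. A Python sketch of that traversal, assuming the links have been loaded into a dictionary from each item to the items it was directly derived from; the function and item names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def ancestors(item: str, derived_from: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every data item that `item` was directly or indirectly derived from.

    Breadth-first traversal; the `seen` set keeps it finite even if the links contain a cycle.
    """
    seen: Set[str] = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical derivation links: output -> the inputs it was derived from.
links = {
    "alignment": ["blast_hits", "query_sequence"],
    "blast_hits": ["query_sequence"],
}
print(sorted(ancestors("alignment", links)))  # ['blast_hits', 'query_sequence']
```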
Storing management metadata
- Automated generation of this web of links is preferable
- The workflow enactor generates, as RDF:
- LSIDs
- Data derivation links
- Knowledge links
- Process links
- Organisation links
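A sketch of what an enactor might record for a single process invocation, combining LSID minting with the per-process templates described in the provenance model. Everything here is hypothetical: the relation names (invocation_of, produced_by, derived_from), the "process" namespace, the use of UUIDs as object identifiers and the ProcessTemplate class are illustrative assumptions, not Taverna's actual implementation. The triples are plain tuples that an RDF toolkit could serialise.

```python
import uuid
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]

AUTHORITY = "taverna.sf.net"  # authority as in the example LSIDs; an assumption here


def mint_lsid(namespace: str) -> str:
    """Mint a fresh LSID; a random UUID stands in for a real object-naming scheme."""
    return f"urn:lsid:{AUTHORITY}:{namespace}:{uuid.uuid4().hex}"


@dataclass
class ProcessTemplate:
    """Hypothetical user-supplied template: the relation linking outputs to inputs."""
    process_name: str
    output_relation: str = "derived_from"


def record_invocation(template: ProcessTemplate, input_lsids: List[str],
                      n_outputs: int) -> Tuple[List[str], List[Triple]]:
    """Mint LSIDs for one invocation's outputs and emit its process and data links."""
    invocation = mint_lsid("process")
    outputs = [mint_lsid("datathing") for _ in range(n_outputs)]
    triples: List[Triple] = [(invocation, "invocation_of", template.process_name)]
    for out in outputs:
        triples.append((out, "produced_by", invocation))
        for inp in input_lsids:
            triples.append((out, template.output_relation, inp))
    return outputs, triples


outputs, triples = record_invocation(
    ProcessTemplate("BLAST"), ["urn:lsid:taverna.sf.net:datathing:23ty3"], n_outputs=2)
```

Generating these links inside the enactor, rather than asking users to enter them, is what makes the web of provenance cheap to build.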
Provenance generation
- Configuring and generating provenance within Taverna
Storage
- The LSID specification defines no protocol for storage
- Taverna/Freefluo therefore implements its own data/metadata storage protocol
[Diagram: Taverna/Freefluo publishes data to the Data store and metadata to the Metadata store through a publish interface]
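Since LSID leaves storage open, the write side has to be designed separately. A minimal Python sketch of a publish interface with separate data and metadata stores, assuming an in-memory dictionary for each; the class, method names and example.org identifiers are hypothetical and do not describe the actual Taverna/Freefluo protocol. It enforces the LSID rule that data never changes while allowing metadata to accumulate.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class PublishingStore:
    """Hypothetical publish interface over separate data and metadata stores."""

    def __init__(self) -> None:
        self.data_store: Dict[str, bytes] = {}
        self.metadata_store: Dict[str, List[Triple]] = {}

    def publish(self, lsid: str, data: bytes, metadata: List[Triple]) -> None:
        """Store data once per LSID; metadata may keep accumulating."""
        existing = self.data_store.get(lsid)
        if existing is not None and existing != data:
            raise ValueError(f"data for {lsid} is immutable and cannot be replaced")
        self.data_store[lsid] = data
        self.metadata_store.setdefault(lsid, []).extend(metadata)


store = PublishingStore()
lsid = "urn:lsid:example.org:datathing:1"
store.publish(lsid, b"ACGT", [(lsid, "produced_by", "urn:lsid:example.org:process:7")])
store.publish(lsid, b"ACGT", [(lsid, "run_for", "person:alice")])  # same data, new metadata
```

Publishing different bytes under an existing LSID raises ValueError; new knowledge about the data is added as metadata instead.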
Retrieval
- The LSID protocol is used to retrieve data and metadata
- Queries are handled separately
[Diagram: an LSID-aware client uses the LSID interface to the Data store and Metadata store; an RDF-aware client sends queries to the Metadata store]
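The two read paths can be sketched in Python: lookups by identifier, loosely mirroring the getData and getMetadata operations of the LSID resolution protocol, and a separate query over the metadata. The function names, store layout and example.org identifiers are hypothetical; a real deployment would resolve LSIDs over the network and run queries against an RDF repository.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, pre-populated stores (example.org identifiers are placeholders).
DATA_STORE: Dict[str, bytes] = {
    "urn:lsid:example.org:datathing:1": b">seq1\nACGT\n",
}
METADATA_STORE: List[Triple] = [
    ("urn:lsid:example.org:datathing:1", "produced_by", "urn:lsid:example.org:process:7"),
    ("urn:lsid:example.org:datathing:1", "run_for", "person:alice"),
]


def get_data(lsid: str) -> bytes:
    """Resolve an LSID to its bytes, as an LSID-aware client would (KeyError if unknown)."""
    return DATA_STORE[lsid]


def get_metadata(lsid: str) -> List[Triple]:
    """Return the statements whose subject is this LSID."""
    return [t for t in METADATA_STORE if t[0] == lsid]


def query(predicate: str, obj: str) -> List[str]:
    """The separate query path: every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in METADATA_STORE if p == predicate and o == obj]
```

Keeping query out of the LSID interface means the identifier protocol stays simple, while questions that span many resources go to the metadata store directly.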
LSID Launchpad
- Lightweight plug-in for Internet Explorer providing access to LSID data/metadata
- Demo
Using IBM's Haystack
[Screenshot: a GenBank record; a portion of the web of provenance; managing a collection of sequences for review]