Title: Overview of Today's Talks
1. Overview of Today's Talks
- Provenance Data Structures
- Recording and Querying Provenance
- Break (30 minutes)
- Distribution and Scalability
- Security
- Methodology
2. Distribution and Scalability, by Paul Groth
(pg03r_at_ecs.soton.ac.uk)
3. Applications are distributed
4. Applications require scalability
- Applications may have millions of interactions
- These interactions may be simultaneous
- Applications may have large amounts of data
5. These Issues and Provenance
- Because applications are distributed and need scalability, a provenance system must support these requirements
- Provenance systems have their own requirements in these areas:
  - Large numbers of p-assertions
  - Scalability in terms of querying and recording
6. Provenance Store Distribution
- Recording patterns
- Bandwidth
- Access control
- Storage
- Legal
- Multiple physical Provenance Stores per site
[Figure: several physical Provenance Stores (PS) distributed across sites]
7. Logical Distribution
- Provenance Stores are both physical and logical entities
- A single physical store could have multiple logical stores
- Logical Provenance Stores provide bounds to process documentation
  - These could be organisational, experimental, or individual
8. Logical Provenance Stores
[Figure: one physical Provenance Store at a hospital hosting several logical stores: Payroll Store, Paul's Store, Donor Data Collector Store, and Surgery Ward Store]
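The idea of several logical stores inside one physical store can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions, not the actual provenance store implementation: the class PhysicalProvenanceStore and its record/query methods are hypothetical, logical stores are modelled as named partitions, and the sample p-assertions are made up for illustration.

```python
from collections import defaultdict

class PhysicalProvenanceStore:
    """Hypothetical model: one physical store hosting several logical stores.
    Each logical store bounds a separate body of process documentation."""

    def __init__(self, site):
        self.site = site
        self._logical = defaultdict(list)  # logical store name -> p-assertions

    def record(self, logical_store, p_assertion):
        self._logical[logical_store].append(p_assertion)

    def query(self, logical_store):
        # A query is bounded by its logical store; other partitions are not seen.
        # .get() avoids creating an empty partition for unknown names.
        return list(self._logical.get(logical_store, []))

    def logical_stores(self):
        return sorted(self._logical)

hospital = PhysicalProvenanceStore("Hospital")
hospital.record("Payroll Store", "salary run for March")
hospital.record("Surgery Ward Store", "operation scheduled")
hospital.record("Donor Data Collector Store", "donor record collected")
print(hospital.logical_stores())
# ['Donor Data Collector Store', 'Payroll Store', 'Surgery Ward Store']
```

Partitioning by name keeps the sketch small; a real deployment could equally map each logical store to its own database or access-control domain.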
9. Provenance Store Usage
- Combinations of logical and physical Provenance Stores can be adopted depending on the application's needs, in terms of:
  - Scalability
  - Regulatory / legal requirements
  - Information partitioning
10. Distributed Query?
- Process documentation is in multiple stores
- How do we get the provenance of a data item in this case?
- Solution: connections embedded in process documentation
  - Shared context
  - Links
11. Shared Context revisited
[Figure: p-assertions with interaction key IK 1 are recorded in both PS 1 and PS 2; the querier queries each store for p-assertions (PAs) with IK 1]
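Querying by shared context amounts to asking every candidate store for the same interaction key and merging the answers. The following minimal sketch assumes an in-memory store; ProvenanceStore, query_by_interaction_key, and query_shared_context are hypothetical names, not a real provenance store API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PAssertion:
    interaction_key: str  # shared by both actors in one interaction, e.g. "IK1"
    asserter: str         # actor that recorded the p-assertion
    content: str

class ProvenanceStore:
    """Hypothetical in-memory stand-in for one Provenance Store."""

    def __init__(self, name):
        self.name = name
        self._assertions = []

    def record(self, pa):
        self._assertions.append(pa)

    def query_by_interaction_key(self, key):
        """Return every p-assertion in this store recorded for `key`."""
        return [pa for pa in self._assertions if pa.interaction_key == key]

def query_shared_context(stores, key):
    """Ask each store for p-assertions with the same key and merge the results."""
    results = []
    for store in stores:
        results.extend(store.query_by_interaction_key(key))
    return results

ps1, ps2 = ProvenanceStore("PS 1"), ProvenanceStore("PS 2")
ps1.record(PAssertion("IK1", "sender", "sent request"))
ps2.record(PAssertion("IK1", "receiver", "received request"))
ps2.record(PAssertion("IK2", "receiver", "unrelated interaction"))

merged = query_shared_context([ps1, ps2], "IK1")
print([pa.asserter for pa in merged])  # ['sender', 'receiver']
```

The querier here must already know to ask both PS 1 and PS 2; links are the mechanism that removes that assumption.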
12. Links
- Links are unidirectional pointers to provenance stores
- Links connect provenance stores
- Links are recorded by actors as part of p-assertions
- Links are transferred between actors using interaction contexts
- There are two kinds of links:
  - View links
  - Object links
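The properties listed above map naturally onto a small data model. This sketch is illustrative only: LinkKind, Link, InteractionContext, and record_with_links are hypothetical names, and store addresses such as http://ps1.example.org are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum

class LinkKind(Enum):
    VIEW = "view"      # points to the store holding the opposite view
    OBJECT = "object"  # points to the store holding a relationship's object

@dataclass(frozen=True)
class Link:
    """Unidirectional: records where to go, never who points here."""
    kind: LinkKind
    target_store: str  # hypothetical store address

@dataclass
class InteractionContext:
    """Hypothetical carrier that moves links between actors."""
    interaction_key: str
    links: list = field(default_factory=list)

@dataclass
class PAssertion:
    interaction_key: str
    content: str
    links: list = field(default_factory=list)

def record_with_links(content, ctx):
    """Record a p-assertion that embeds a copy of the links received in ctx."""
    return PAssertion(ctx.interaction_key, content, list(ctx.links))

ctx = InteractionContext("IK1", [Link(LinkKind.VIEW, "http://ps1.example.org")])
pa = record_with_links("received request", ctx)
print(pa.links[0].kind.value, pa.links[0].target_store)
# view http://ps1.example.org
```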
13. Views revisited
- A view is the set of assertions by one actor about one interaction
- A view contains:
  - An actor identity
  - A set of p-assertions
- A view is one of two view kinds: sender or receiver
[Figure: views of an interaction between the Donor Data Collector and the User Interface]
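A view can be derived by grouping p-assertions by actor and view kind for a single interaction. This is a minimal sketch; View, ViewKind, and views_of are hypothetical names, and the sender/receiver roles given to the User Interface and Donor Data Collector are assumed for illustration because the slide does not state them.

```python
from dataclasses import dataclass, field
from enum import Enum

class ViewKind(Enum):
    SENDER = "sender"
    RECEIVER = "receiver"

    def opposite(self):
        return ViewKind.RECEIVER if self is ViewKind.SENDER else ViewKind.SENDER

@dataclass
class View:
    """All p-assertions made by one actor about one interaction."""
    actor: str
    interaction_key: str
    kind: ViewKind
    p_assertions: list = field(default_factory=list)

def views_of(interaction_key, records):
    """Group (actor, kind, key, content) records into the views of one interaction."""
    views = {}
    for actor, kind, key, content in records:
        if key != interaction_key:
            continue
        view = views.setdefault((actor, kind), View(actor, key, kind))
        view.p_assertions.append(content)
    return list(views.values())

# Roles are assumed for illustration; the slide does not say who sends.
records = [
    ("User Interface", ViewKind.SENDER, "IK1", "sent donor details"),
    ("Donor Data Collector", ViewKind.RECEIVER, "IK1", "received donor details"),
    ("Donor Data Collector", ViewKind.RECEIVER, "IK1", "validated donor details"),
]
views = views_of("IK1", records)
for v in views:
    print(v.actor, v.kind.value, len(v.p_assertions))
```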
14. View Links
- A view link points to the provenance store containing the opposite view of the interaction
- View links are transferred in p-headers or interaction contexts
[Figure: the sender and receiver inform each other of their Provenance Store usage (PS 1 and PS 2); each records a link to the other's store and will record its p-assertions]
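The exchange in the figure can be sketched as a two-step handshake: each side tells the other which store it will use, then records its own p-assertions together with a view link to the other side's store. In this hypothetical sketch the sender records into PS 1 and the receiver into PS 2 (an assumption), and the p-header and reply are plain dictionaries rather than a real message format.

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    store: str                      # Provenance Store this actor records into
    recorded: list = field(default_factory=list)

    def record(self, interaction_key, content, view_link):
        # Every p-assertion carries a view link to the store of the opposite view.
        self.recorded.append(
            {"ik": interaction_key, "content": content, "view_link": view_link})

def exchange_view_links(sender, receiver, interaction_key):
    """Each side informs the other of its store (e.g. in a p-header or
    interaction context), then records a link to the other side's store."""
    p_header = {"ik": interaction_key, "store": sender.store}  # sender -> receiver
    reply = {"ik": interaction_key, "store": receiver.store}   # receiver -> sender
    receiver.record(interaction_key, "received message", p_header["store"])
    sender.record(interaction_key, "sent message", reply["store"])

sender = Actor("Sender", "PS 1")
receiver = Actor("Receiver", "PS 2")
exchange_view_links(sender, receiver, "IK1")
print(sender.recorded[0]["view_link"], receiver.recorded[0]["view_link"])
# PS 2 PS 1
```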
15. Object Links
- A pointer to the provenance store where the object of a relationship is stored
- This allows for distributed provenance queries
[Figure: object links connecting PS 1, PS 2, and PS 3]
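An object link tells a querier where to continue when a relationship's object is documented in another store. This sketch assumes a simple relationship record; RelationshipPAssertion, next_hops, the relation name "wasDerivedFrom", and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RelationshipPAssertion:
    subject: str                # data item this assertion is about
    relation: str               # illustrative relation name
    obj: str                    # data item the subject depends on
    object_link: Optional[str]  # store documenting `obj`, or None if local

def next_hops(assertions, local_store):
    """Collect (object, store) pairs to query next; no link means the local store."""
    return [(a.obj, a.object_link or local_store) for a in assertions]

in_ps1 = [
    RelationshipPAssertion("result.csv", "wasDerivedFrom", "raw.dat", "PS 2"),
    RelationshipPAssertion("result.csv", "wasDerivedFrom", "params.xml", None),
]
print(next_hops(in_ps1, "PS 1"))
# [('raw.dat', 'PS 2'), ('params.xml', 'PS 1')]
```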
16. Implementing Distributed Queries
- Querying actor centric (thick client)
  - The querying actor follows links
- Provenance Store centric (thin client)
  - Provenance Stores follow links
17. Querying Actor Centric
[Figure: the querying actor issues a query to PS 1, receives the result, processes it for links, then issues a query to PS 2 and receives that result]
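In the querying actor centric (thick client) approach, the client issues every query itself and follows the links found in each result. The sketch below uses a breadth-first traversal over hypothetical in-memory stores; STORES, query, and thick_client_provenance are illustrative names, and the breadth-first order is a choice of this sketch.

```python
from collections import deque

# Hypothetical stores: data item -> (local assertions, links), where each
# link is (data_item, store_name) naming where that item is documented.
STORES = {
    "PS 1": {"result": (["result derived from raw"], [("raw", "PS 2")])},
    "PS 2": {"raw": (["raw produced by sensor"], [("sensor", "PS 3")])},
    "PS 3": {"sensor": (["sensor calibrated"], [])},
}

def query(store, item):
    """Stand-in for one remote query to a single Provenance Store."""
    return STORES[store].get(item, ([], []))

def thick_client_provenance(item, start_store):
    """Querying-actor-centric: the client issues every query and follows links."""
    results, seen = [], set()
    todo = deque([(item, start_store)])
    while todo:
        current = todo.popleft()
        if current in seen:  # avoid re-querying if links form a cycle
            continue
        seen.add(current)
        assertions, links = query(current[1], current[0])
        results.extend(assertions)
        todo.extend(links)   # process results for links
    return results

print(thick_client_provenance("result", "PS 1"))
# ['result derived from raw', 'raw produced by sensor', 'sensor calibrated']
```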
18. Provenance Store centric
[Figure: the querying actor issues one query to PS 1; PS 1 processes its internal results for links, issues queries to PS 2 and PS 3, receives and collates their results, and returns them to the querying actor]
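In the Provenance Store centric (thin client) approach, the store that receives the query follows links itself and returns collated results. This is a minimal recursive sketch; STORES and store_query are hypothetical, and the recursive call stands in for a remote store-to-store query. Passing one shared `visited` set is a simplification: across real stores it would have to travel with each forwarded query.

```python
# Hypothetical stores: data item -> (local assertions, links).
STORES = {
    "PS 1": {"result": (["result from raw", "result from params"],
                        [("raw", "PS 2"), ("params", "PS 3")])},
    "PS 2": {"raw": (["raw from sensor"], [])},
    "PS 3": {"params": (["params from user"], [])},
}

def store_query(store, item, visited=None):
    """Provenance-Store-centric: the contacted store follows links itself,
    querying other stores and collating their results before replying."""
    visited = set() if visited is None else visited
    if (store, item) in visited:
        return []
    visited.add((store, item))
    assertions, links = STORES[store].get(item, ([], []))
    collated = list(assertions)
    for linked_item, linked_store in links:
        # In a deployment this would be a remote call from one store to another.
        collated.extend(store_query(linked_store, linked_item, visited))
    return collated

# The thin client issues a single query and receives collated results.
print(store_query("PS 1", "result"))
# ['result from raw', 'result from params', 'raw from sensor', 'params from user']
```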
19. Analysis of Links
- Links are unidirectional, like the Web
- This approach should be fairly scalable
- It maintains the autonomy of application actors
- There is no need for synchronization between actors
- Like the Web, queriers must traverse the link structure to find content of interest
- There are two mechanisms for implementing distributed queries using links
20. Supporting Large Data
- Depending on the size of the data involved, the provenance store may not:
  - Be able to store the data immediately
    - Asynchronous recording
  - Be able to store the data at all
    - Solution: references to data, instead of the data itself
- Three kinds of references are supported:
  - Application
  - Internal
  - External
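Asynchronous recording lets an actor hand p-assertions off without waiting for the store. This sketch uses a standard-library queue drained by a background thread; AsyncRecorder and its methods are hypothetical, and the plain list standing in for the remote Provenance Store is an assumption.

```python
import queue
import threading

class AsyncRecorder:
    """Sketch of asynchronous recording: the actor enqueues p-assertions and
    returns immediately; a background thread submits them to the store."""

    def __init__(self, submit):
        self._submit = submit  # callable that actually sends to the store
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, p_assertion):
        self._queue.put(p_assertion)  # does not wait for the store

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                self._queue.task_done()
                break
            self._submit(item)
            self._queue.task_done()

    def close(self):
        """Let everything already queued be submitted, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

store = []  # stand-in for a remote Provenance Store
recorder = AsyncRecorder(store.append)
for i in range(3):
    recorder.record(f"p-assertion {i}")
recorder.close()
print(store)  # ['p-assertion 0', 'p-assertion 1', 'p-assertion 2']
```

Because the queue is FIFO and the sentinel is enqueued last, close() returns only after every earlier p-assertion has been submitted.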
21. Application References
- The application already transfers references in its application messages
  - Nothing to do: record p-assertions as is
  - Inform querying actors of how to resolve these application-specific references
http://datastore/pr1234
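Informing querying actors how to resolve application references could take the form of a resolver registry that the querier consults. In this hypothetical sketch, resolvers are keyed by URL host; register_resolver, resolve, and the fake datastore contents are illustrative assumptions. Only urllib.parse.urlparse is a real library call.

```python
from urllib.parse import urlparse

# Hypothetical registry: how to dereference application references, by host.
RESOLVERS = {}

def register_resolver(host, fn):
    RESOLVERS[host] = fn

def resolve(reference):
    """Resolve an application-specific reference found in a p-assertion."""
    host = urlparse(reference).netloc
    if host not in RESOLVERS:
        raise LookupError(f"no resolver registered for {host!r}")
    return RESOLVERS[host](reference)

# Illustrative resolver for the example reference above.
FAKE_DATASTORE = {"/pr1234": b"patient record 1234"}
register_resolver("datastore", lambda ref: FAKE_DATASTORE[urlparse(ref).path])

print(resolve("http://datastore/pr1234"))  # b'patient record 1234'
```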
22. External References
- The application transfers a large message
- All or part of the message is stored in some data repository
- The p-assertion holds a reference to this external data repository
- The burden is placed on the data repository to maintain the data for as long as the process documentation is kept
23. External References cont.
[Figure: the large patient record is stored in the Data Repository; the PS holds a p-assertion with documentation style "Reference" pointing to http://DataRepository/LPR1]
24. External References cont.
  <soap:envelope>
    <soap:header></soap:header>
    <soap:body>
      <echrs:store>
        <echrs:patientRecord>
          <pid>1</pid>
          <xray>j8ladfhaufjalkdjkfaslalkfdjaljfafjaljajfdljaadfhaldfjhaslfjdasldfjaslfj.</xray>
        </echrs:patientRecord>
      </echrs:store>
    </soap:body>
  </soap:envelope>
25. Styled Reference P-Assertion
  <ps:interactionPAssertion>
    <ps:localPAssertionId>1</ps:localPAssertionId>
    <ps:documentationStyle>
      http://www.pasoa.org/.../stylesReference
    </ps:documentationStyle>
    <ps:content>
      <soap:envelope>
        <soap:header></soap:header>
        <soap:body>
          <echrs:store>
            <echrs:ref>http://DataRepository/LPR1</echrs:ref>
          </echrs:store>
        </soap:body>
      </soap:envelope>
    </ps:content>
  </ps:interactionPAssertion>
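The step from the large message to the reference-styled content can be sketched as an XML rewrite: move the patient record into a repository and leave a ref element in its place. This sketch drops the soap: and echrs: prefixes so it parses without namespace declarations; externalise, the dict standing in for the Data Repository, and the "LPR" + pid key scheme are assumptions. The ElementTree calls used (fromstring, iter, findall, findtext, remove, SubElement, tostring) are standard library.

```python
import xml.etree.ElementTree as ET

# Prefixes are dropped so this parses without xmlns declarations.
MESSAGE = """<envelope><header/><body><store>
<patientRecord><pid>1</pid><xray>j8ladfhaufjalkdjkfas</xray></patientRecord>
</store></body></envelope>"""

def externalise(message_xml, repository, base_url):
    """Move each patientRecord into `repository` and replace it with a ref
    element, giving the content of a reference-styled p-assertion."""
    root = ET.fromstring(message_xml)
    for store in list(root.iter("store")):  # materialise before mutating
        for record in list(store.findall("patientRecord")):
            key = "LPR" + record.findtext("pid")  # hypothetical key scheme
            repository[key] = ET.tostring(record, encoding="unicode")
            store.remove(record)
            ET.SubElement(store, "ref").text = f"{base_url}/{key}"
    return ET.tostring(root, encoding="unicode")

repo = {}  # stand-in for the external Data Repository
content = externalise(MESSAGE, repo, "http://DataRepository")
print(ET.fromstring(content).find("body/store/ref").text)
# http://DataRepository/LPR1
```

The p-assertion then stays small, and resolving the ref is left to whoever maintains the repository.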
26. Internal References
- Same as external references
- However, the reference is to data already stored inside the provenance store
  - This is made possible by the unique addressability of p-assertions
- Useful for large actor state p-assertions that are recorded several times
  - Example: system configuration information
27. Internal References cont.
[Figure: a PS holding one actor state p-assertion with lots of configuration information, referenced by actor state p-assertions 1 to 4]
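Internal references can be sketched as a store that records a large actor state p-assertion once and turns later copies into references to it, relying on each p-assertion having a unique local id. Detecting repeats by SHA-256 content hash is a technique chosen for this sketch, not something the slides specify; DedupingStore and its methods are hypothetical.

```python
import hashlib

class DedupingStore:
    """Sketch of internal references: a repeated actor state p-assertion is
    stored as a reference to the earlier copy. Detecting repeats by content
    hash is an assumption of this sketch."""

    def __init__(self):
        self.assertions = {}   # local p-assertion id -> stored entry
        self._by_digest = {}   # content digest -> id of first copy
        self._next_id = 1

    def record_actor_state(self, content):
        digest = hashlib.sha256(content.encode()).hexdigest()
        pa_id = self._next_id
        self._next_id += 1
        if digest in self._by_digest:
            # Unique addressability lets us point at the earlier p-assertion.
            self.assertions[pa_id] = {"internal_ref": self._by_digest[digest]}
        else:
            self._by_digest[digest] = pa_id
            self.assertions[pa_id] = {"content": content}
        return pa_id

    def resolve(self, pa_id):
        entry = self.assertions[pa_id]
        if "internal_ref" in entry:
            return self.resolve(entry["internal_ref"])
        return entry["content"]

config = "lots of configuration information " * 1000
ps = DedupingStore()
ids = [ps.record_actor_state(config) for _ in range(4)]
print(ids, ps.assertions[4])  # [1, 2, 3, 4] {'internal_ref': 1}
```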
28. Provenance Query Results Scalability
- Provenance query result sets are scalable
  - Return pointers to p-assertions, not the assertions themselves
29.
  <ps:interactionPAssertion>
    <ps:localPAssertionId>1</ps:localPAssertionId>
    <ps:documentationStyle>
      http://www.pasoa.org/.../stylesReference
    </ps:documentationStyle>
    <ps:content>
      <soap:envelope>
        <soap:header></soap:header>
        <soap:body>
          <echrs:store>
            <echrs:ref>http://DataRepository/LPR1</echrs:ref>
          </echrs:store>
        </soap:body>
      </soap:envelope>
    </ps:content>
  </ps:interactionPAssertion>
  <ps:interactionPAssertion>
    <ps:localPAssertionId>2</ps:localPAssertionId>
    <ps:documentationStyle>
      http://www.pasoa.org/.../stylesReference
    </ps:documentationStyle>
    ...
  </ps:interactionPAssertion>
30.
  <ps:did>
    <interactionKey>
      <sender>donordatacollector</sender>
      <receiver>echr</receiver>
      <id>12233</id>
    </interactionKey>
    <viewkind>sender</viewkind>
    <localPAssertionId>1</localPAssertionId>
  </ps:did>
  <ps:did>
    <interactionKey>
      <sender>donordatacollector</sender>
      <receiver>echr</receiver>
      <id>1224</id>
    </interactionKey>
    <viewkind>sender</viewkind>
    <localPAssertionId>5</localPAssertionId>
  </ps:did>
  ...
31. Provenance Query Results Scalability
- Provenance query results are scalable
  - Return pointers to p-assertions, not the assertions themselves
  - Scoping means provenance query results contain only what is necessary for the querier
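Returning pointers instead of p-assertions can be sketched with a pointer type whose fields follow the pointer results shown above (interaction key, view kind, local p-assertion id). The querier fetches content only for the pointers it needs. PAssertionPointer, Store, query, and fetch are hypothetical names, and the predicate-based query is a simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionKey:
    sender: str
    receiver: str
    id: str

@dataclass(frozen=True)
class PAssertionPointer:
    """Together these fields address one p-assertion uniquely."""
    interaction_key: InteractionKey
    view_kind: str
    local_id: int

class Store:
    def __init__(self):
        self._by_pointer = {}

    def record(self, pointer, content):
        self._by_pointer[pointer] = content

    def query(self, predicate):
        """Return pointers only; the querier fetches what it actually needs."""
        return [p for p, c in self._by_pointer.items() if predicate(c)]

    def fetch(self, pointer):
        return self._by_pointer[pointer]

ps = Store()
k1 = InteractionKey("donordatacollector", "echr", "12233")
k2 = InteractionKey("donordatacollector", "echr", "1224")
ps.record(PAssertionPointer(k1, "sender", 1), "<large p-assertion 1>")
ps.record(PAssertionPointer(k2, "sender", 5), "<large p-assertion 5>")

hits = ps.query(lambda content: "large" in content)
print(len(hits), ps.fetch(hits[1]))  # 2 <large p-assertion 5>
```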
32. Iterative Query Results
- Return iterators over results from process documentation or provenance query results
[Figure: the querying actor issues a query to the PS and receives a results iterator offering getNextRes() and getNextXRes(int x)]
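An iterator with getNextRes() and getNextXRes(int x) can be sketched as a client-side buffer that pulls results from the store in pages. This is not the planned OGSA-DAI/WSRF implementation: get_next_res and get_next_x_res only mirror the slide's method names, and the paging callback and page size are assumptions of the sketch.

```python
class ResultsIterator:
    """Sketch of an iterative query result; the paging behaviour is assumed."""

    def __init__(self, fetch_page, page_size=2):
        self._fetch_page = fetch_page  # hypothetical: (offset, limit) -> list
        self._page_size = page_size
        self._buffer = []
        self._offset = 0
        self._exhausted = False

    def _fill(self, needed):
        # Pull pages until the buffer holds `needed` items or the store runs dry.
        while len(self._buffer) < needed and not self._exhausted:
            page = self._fetch_page(self._offset, self._page_size)
            self._offset += len(page)
            self._buffer.extend(page)
            if len(page) < self._page_size:
                self._exhausted = True

    def get_next_res(self):
        """Return the next result, or None when there are no more."""
        batch = self.get_next_x_res(1)
        return batch[0] if batch else None

    def get_next_x_res(self, x):
        """Return up to x further results."""
        self._fill(x)
        batch, self._buffer = self._buffer[:x], self._buffer[x:]
        return batch

RESULTS = [f"pa{i}" for i in range(5)]  # stand-in for results held in the PS
it = ResultsIterator(lambda off, lim: RESULTS[off:off + lim])
print(it.get_next_res(), it.get_next_x_res(3), it.get_next_x_res(3))
# pa0 ['pa1', 'pa2', 'pa3'] ['pa4']
```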
33. Iterative Query Results
- Return iterators over results from process documentation or provenance query results
- This functionality is planned for future implementations
- The planned implementation makes use of:
  - OGSA-DAI
  - WSRF
34. Summary
- Discussed both distribution and scalability
- Introduced links for connecting distributed provenance stores
- Two ways of implementing distributed queries
- Large data support through asynchronous recording and references
- Query scalability
  - Provenance query results
  - Iterative query results
35. Questions?
- Paul Groth
- pg03r_at_ecs.soton.ac.uk