Overview of Today


1
Overview of Today's Talks
  • Provenance Data Structures
  • Recording and Querying Provenance
  • Break (30 minutes)
  • Distribution and Scalability
  • Security
  • Methodology

2
Distribution and Scalability by Paul Groth
(pg03r_at_ecs.soton.ac.uk)
3
Applications are distributed
4
Applications require scalability
  • Applications may have millions of interactions
  • These interactions may be simultaneous
  • Applications may have large amounts of data

5
How These Issues Affect Provenance
  • Because applications are distributed and need
    scalability, a provenance system must support
    these requirements
  • Provenance systems also have their own
    requirements in these areas:
      • Large numbers of p-assertions
      • Scalability of both querying and recording

6
Provenance Store Distribution
Recording Patterns
  • Bandwidth
  • Access control
  • Storage
  • Legal
  • Multiple physical Provenance Stores per site

[Figure: six provenance stores (PS)]
7
Logical Distribution
  • Provenance Stores are both physical and logical
    entities
  • Single physical store could have multiple logical
    stores
  • Logical Provenance Stores place boundaries around
    process documentation
  • These boundaries could be organisational,
    experimental, or individual
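To make the physical/logical split concrete, here is a minimal sketch, assuming a logical store is simply a named partition inside one physical store. The class `PhysicalProvenanceStore` and its methods are hypothetical illustrations, not the API of any real provenance store; the store names are taken from the hospital example on the next slide.

```python
from collections import defaultdict
from typing import Dict, List


class PhysicalProvenanceStore:
    """Hypothetical: one physical store hosting several logical stores."""

    def __init__(self, name: str):
        self.name = name
        # Each logical store is a named partition of the physical store.
        self._logical: Dict[str, List[str]] = defaultdict(list)

    def record(self, logical_store: str, p_assertion: str) -> None:
        # Recording always targets one specific logical store.
        self._logical[logical_store].append(p_assertion)

    def query(self, logical_store: str) -> List[str]:
        # A query is bounded by the logical store it names; .get() avoids
        # creating an empty partition as a side effect.
        return list(self._logical.get(logical_store, []))

    def logical_stores(self) -> List[str]:
        return sorted(self._logical)


hospital = PhysicalProvenanceStore("Hospital")
hospital.record("Payroll Store", "salary run recorded")
hospital.record("Surgery Ward Store", "operation scheduled")
hospital.record("Surgery Ward Store", "operation completed")
print(hospital.logical_stores())             # ['Payroll Store', 'Surgery Ward Store']
print(hospital.query("Surgery Ward Store"))  # ['operation scheduled', 'operation completed']
```

Because the partitions share one physical store, adding a new organisational or experimental boundary costs nothing more than a new name.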

8
Logical Provenance Stores
[Figure: a physical provenance store at the Hospital hosting the logical stores Payroll Store, Paul's Store, Donor Data Collector Store, and Surgery Ward Store]
9
Provenance Store Usage
  • Combinations of logical and physical Provenance
    Stores can be adopted depending on the
    application's needs in terms of:
      • Scalability
      • Regulatory / legal requirements
      • Information partitioning

10
Distributed Query?
  • Process documentation is spread across multiple stores
  • How do we obtain the provenance of a data item in
    this case?
  • Solution: connections embedded in process
    documentation
      • Shared context
      • Links

11
Shared Context revisited
[Figure: two actors each record a p-assertion with interaction key IK 1, one in PS 1 and one in PS 2; the querier queries both stores for p-assertions (PAs) with IK 1]
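The shared-context approach can be sketched as a join on the interaction key: the querier asks each store it knows about for p-assertions carrying the same key. This is a minimal sketch, assuming in-memory stores; `PAssertion` and `query_by_interaction_key` are hypothetical names, and a real querier would issue remote queries instead of scanning lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PAssertion:
    interaction_key: str  # shared context passed between the two actors
    view_kind: str        # "sender" or "receiver"
    content: str


def query_by_interaction_key(stores: Dict[str, List[PAssertion]],
                             key: str) -> List[Tuple[str, PAssertion]]:
    """Ask every known store for p-assertions carrying `key`.

    The querier must already know which stores to ask; the shared
    interaction key is what lets it join the results.
    """
    results = []
    for store_name, assertions in stores.items():
        for pa in assertions:
            if pa.interaction_key == key:
                results.append((store_name, pa))
    return results


stores = {
    "PS 1": [PAssertion("IK 1", "sender", "sent request")],
    "PS 2": [PAssertion("IK 1", "receiver", "received request"),
             PAssertion("IK 2", "sender", "unrelated interaction")],
}
for store, pa in query_by_interaction_key(stores, "IK 1"):
    print(store, pa.view_kind)
# PS 1 sender
# PS 2 receiver
```

The weakness this exposes is that the querier needs the list of stores up front, which is what links address.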
12
Links
  • Links are unidirectional pointers to provenance
    stores
  • Links connect provenance stores
  • Links are recorded by actors as part of
    p-assertions
  • Links are transferred between actors using
    interaction contexts
  • There are two kinds of links:
      • View links
      • Object links

13
Views revisited
  • A view is the set of p-assertions made by one actor
    about one interaction
  • A view contains:
      • An actor identity
      • A set of p-assertions
  • Every view is one of two kinds: sender or
    receiver

[Figure: views recorded by the Donor Data Collector and the User Interface]
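A view as defined above maps naturally onto a small record type. This is a minimal sketch, assuming Python dataclasses; `View` and `ViewKind` are hypothetical names, and which actor is the sender is chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ViewKind(Enum):
    SENDER = "sender"
    RECEIVER = "receiver"


@dataclass
class View:
    """Hypothetical model: what one actor asserts about one interaction."""
    actor: str             # the actor identity
    interaction_key: str   # identifies the single interaction
    kind: ViewKind         # sender or receiver
    p_assertions: List[str] = field(default_factory=list)


# One interaction documented from both sides (roles are illustrative).
sender_view = View("Donor Data Collector", "IK 1", ViewKind.SENDER, ["sent form"])
receiver_view = View("User Interface", "IK 1", ViewKind.RECEIVER, ["received form"])
print(sender_view.kind.value, receiver_view.kind.value)  # sender receiver
```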
14
View Links
  • A view link points to the provenance store
    containing the opposite view of the interaction
  • View Links are transferred in p-headers or
    interaction contexts

[Figure: the sender and receiver inform each other of the provenance store each uses (PS 1 and PS 2); each records its p-assertions in its own store together with a link to the other party's store]
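The exchange in the figure can be sketched as each actor recording its own p-assertions plus a view link naming its peer's store, learned from the p-header or interaction context. This is a minimal sketch under those assumptions; `ViewLink`, `ActorRecorder`, and the `*.example` store URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewLink:
    """Hypothetical: points to the store holding the opposite view."""
    interaction_key: str
    store_url: str


@dataclass
class ActorRecorder:
    name: str
    own_store: str  # where this actor's records would be sent
    records: List[dict] = field(default_factory=list)

    def record_view(self, interaction_key: str, assertions: List[str],
                    peer_store: str) -> None:
        # Record our own p-assertions plus a link to the peer's store,
        # as learned from the exchanged interaction context.
        self.records.append({
            "interaction_key": interaction_key,
            "p_assertions": assertions,
            "view_link": ViewLink(interaction_key, peer_store),
        })


sender = ActorRecorder("Sender", own_store="http://ps1.example/store")
receiver = ActorRecorder("Receiver", own_store="http://ps2.example/store")
sender.record_view("IK 1", ["sent request"], peer_store=receiver.own_store)
receiver.record_view("IK 1", ["received request"], peer_store=sender.own_store)
print(sender.records[0]["view_link"].store_url)  # http://ps2.example/store
```

Each link is unidirectional, so neither actor needs to synchronise with the other after the initial exchange.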
15
Object Links
  • An object link points to the provenance store where
    the object of a relationship is stored
  • This enables distributed provenance queries

[Figure: object links connecting PS 1, PS 2, and PS 3]
16
Implementing Distributed Queries
  • Querying-actor-centric (thick client)
      • The querying actor follows links
  • Provenance-Store-centric (thin client)
      • Provenance Stores follow links

17
Querying Actor Centric
[Figure: the querying actor issues a query to PS 1, receives the result, and processes it for links; it then issues a query to PS 2 and receives that result]
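A querying-actor-centric traversal can be sketched as a worklist loop in the client: query a store, collect the relationship p-assertions, and follow each object link to the next store. This is a minimal sketch, assuming each relationship p-assertion is a `(subject, object, object_link)` tuple; `STORES`, `query_store`, and `trace_thick_client` are hypothetical, and `query_store` stands in for a remote call.

```python
from typing import Dict, List, Optional, Tuple

# (subject, object, object_link): object_link names the store holding
# the object's documentation, or None if it is in the same store.
Assertion = Tuple[str, str, Optional[str]]

STORES: Dict[str, List[Assertion]] = {
    "PS 1": [("result", "dataset", "PS 2")],
    "PS 2": [("dataset", "raw-scan", "PS 3")],
    "PS 3": [("raw-scan", "scanner-output", None)],
}


def query_store(store: str, item: str) -> List[Assertion]:
    """Stand-in for a remote query to one provenance store."""
    return [a for a in STORES[store] if a[0] == item]


def trace_thick_client(start_store: str, item: str) -> List[Assertion]:
    """Querying-actor-centric: the client itself follows object links."""
    found: List[Assertion] = []
    pending = [(start_store, item)]
    visited = set()
    while pending:
        store, subject = pending.pop()
        if (store, subject) in visited:
            continue  # guard against cycles in the link graph
        visited.add((store, subject))
        for assertion in query_store(store, subject):
            found.append(assertion)
            _, obj, link = assertion
            # Follow the object link if present, otherwise stay local.
            pending.append((link or store, obj))
    return found


for subject, obj, link in trace_thick_client("PS 1", "result"):
    print(f"{subject} <- {obj} (next store: {link})")
```

The client does all the work here, so stores stay simple but the client needs network access to every store it reaches.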
18
Provenance Store Centric
[Figure: the querying actor issues a query to PS 1; PS 1 processes its internal results for links, issues queries to PS 2 and PS 3, receives their results, collates them, and returns the results to the querying actor]
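In the store-centric variant the recursion moves into the stores: each store answers from its own p-assertions, forwards follow-on queries along object links, and collates what comes back. This is a minimal sketch, assuming stores can call one another directly; `ProvenanceStore`, its `peers` table, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Assertion = Tuple[str, str, Optional[str]]  # (subject, object, object_link)


class ProvenanceStore:
    """Hypothetical store that resolves links on the client's behalf."""

    def __init__(self, name: str, assertions: List[Assertion]):
        self.name = name
        self.assertions = assertions
        self.peers: Dict[str, "ProvenanceStore"] = {}

    def query(self, item: str, visited: Optional[set] = None) -> List[Assertion]:
        visited = set() if visited is None else visited
        if (self.name, item) in visited:
            return []  # already answered along this query path
        visited.add((self.name, item))
        results: List[Assertion] = []
        for assertion in self.assertions:
            if assertion[0] != item:
                continue
            results.append(assertion)
            _, obj, link = assertion
            target = self.peers[link] if link else self
            # Issue the follow-on query and collate the returned results.
            results.extend(target.query(obj, visited))
        return results


ps1 = ProvenanceStore("PS 1", [("result", "dataset", "PS 2"),
                               ("result", "parameters", "PS 3")])
ps2 = ProvenanceStore("PS 2", [("dataset", "raw-scan", None)])
ps3 = ProvenanceStore("PS 3", [("parameters", "defaults", None)])
ps1.peers = {"PS 2": ps2, "PS 3": ps3}

# The thin client sends one query and receives the collated answer.
for subject, obj, _ in ps1.query("result"):
    print(subject, "<-", obj)
```

The trade-off mirrors the thick client: the querier makes a single call, but stores must trust and reach one another.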
19
Analysis of Links
  • Links are unidirectional, like Web hyperlinks
  • This approach should be fairly scalable
  • Links maintain the autonomy of application actors
  • There is no need for synchronization between
    actors
  • As on the Web, queriers must traverse the link
    structure to find content of interest
  • There are two mechanisms for implementing
    distributed queries using links

20
Supporting Large Data
  • Depending on the size of the data involved, the
    provenance store may not:
      • Be able to store the data immediately
          • Solution: asynchronous recording
      • Be able to store the data at all
          • Solution: references
  • References point to data instead of containing
    the data itself
  • Three kinds of references are supported:
      • Application
      • Internal
      • External
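Asynchronous recording can be sketched with a queue and a background worker: the actor hands off a p-assertion and carries on, while the slow write to the store happens later. This is a minimal sketch using only the standard `queue` and `threading` modules; `AsyncRecorder` is hypothetical, and a list stands in for the provenance store.

```python
import queue
import threading


class AsyncRecorder:
    """Hypothetical: actors enqueue p-assertions and continue at once;
    a background thread records them into the store later."""

    def __init__(self):
        self.store = []  # stands in for the provenance store
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, p_assertion):
        self._queue.put(p_assertion)  # returns without waiting for the store

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                self._queue.task_done()
                break
            self.store.append(item)  # the slow write would happen here
            self._queue.task_done()

    def close(self):
        self._queue.put(None)
        self._worker.join()


recorder = AsyncRecorder()
for i in range(3):
    recorder.record(f"p-assertion {i}")
recorder.close()  # wait until everything queued has been recorded
print(recorder.store)  # ['p-assertion 0', 'p-assertion 1', 'p-assertion 2']
```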

21
Application References
  • The application already transfers references in
    its application messages
  • Nothing extra is needed: record the p-assertions as is
  • Inform querying actors of how to resolve these
    application-specific references

[Figure: example application reference http://datastore/pr1234]
22
External References
  • The application transfers a large message
  • All or part of the message is stored in a separate
    data repository
  • The p-assertion records a reference into this
    external data repository
  • The burden is placed on the data repository to
    maintain the data for as long as the process
    documentation exists
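The decision an actor makes before recording can be sketched as a size check: small messages are documented verbatim, large ones go to the repository and only a reference is kept. This is a minimal sketch, assuming an in-memory `REPOSITORY` and a 1024-byte threshold; the function name, the `"verbatim"`/`"reference"` style labels, the threshold, and the hash-based key scheme are all hypothetical choices, not part of the described system.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the external data repository.
REPOSITORY: Dict[str, bytes] = {}


def document_message(payload: bytes, threshold: int = 1024) -> Tuple[str, str]:
    """Return (documentation_style, content) for one application message.

    Messages up to `threshold` bytes are documented verbatim; larger ones
    are placed in the repository and only a reference is recorded.
    """
    if len(payload) <= threshold:
        return "verbatim", payload.decode("utf-8", errors="replace")
    # Content-derived key: the same payload always maps to the same reference.
    key = hashlib.sha256(payload).hexdigest()[:16]
    # The repository must keep this as long as the process documentation exists.
    REPOSITORY[key] = payload
    return "reference", f"http://DataRepository/{key}"


style, content = document_message(b"j8ladfhaufjalkdjkfaslalkfd" * 100)
print(style)  # reference
style, content = document_message(b"<pid>1</pid>")
print(style, content)  # verbatim <pid>1</pid>
```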

23
External References cont.
[Figure: a large patient record is stored in a data repository; the PS holds a p-assertion with the Reference documentation style (DocStyle) pointing to http://DataRepository/LPR1]
24
External References cont.
    <soap:envelope>
      <soap:header></soap:header>
      <soap:body>
        <echrs:store>
          <echrs:patientRecord>
            <pid>1</pid>
            <xray>j8ladfhaufjalkdjkfaslalkfd
              jaljfafjaljajfdlja
              adfhaldfjhaslfjdasldfja
              slfj.
            </xray>
          </echrs:patientRecord>
        </echrs:store>
      </soap:body>
    </soap:envelope>

25
Styled Reference P-Assertion
    <ps:interactionPAssertion>
      <ps:localPAssertionId>1</ps:localPAssertionId>
      <ps:documentationStyle>
        http://www.pasoa.org/.../styles#Reference
      </ps:documentationStyle>
      <ps:content>
        <soap:envelope>
          <soap:header></soap:header>
          <soap:body>
            <echrs:store>
              <echrs:ref>http://DataRepository/LPR1</echrs:ref>
            </echrs:store>
          </soap:body>
        </soap:envelope>
      </ps:content>
    </ps:interactionPAssertion>
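The styled reference p-assertion above can be built programmatically with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the slides show only the `ps`, `soap`, and `echrs` prefixes, so the namespace URIs below are placeholders, and the style URI keeps the elision from the slides and must be replaced with the real one. `styled_reference_passertion` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slides show only the prefixes.
NS = {
    "ps": "urn:example:ps",
    "soap": "urn:example:soap",
    "echrs": "urn:example:echrs",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Elided in the slides; substitute the real style URI.
REFERENCE_STYLE = "http://www.pasoa.org/.../styles#Reference"


def q(prefix: str, local: str) -> str:
    """Build an ElementTree qualified name such as {uri}local."""
    return f"{{{NS[prefix]}}}{local}"


def styled_reference_passertion(local_id: int, ref_url: str) -> ET.Element:
    root = ET.Element(q("ps", "interactionPAssertion"))
    ET.SubElement(root, q("ps", "localPAssertionId")).text = str(local_id)
    ET.SubElement(root, q("ps", "documentationStyle")).text = REFERENCE_STYLE
    content = ET.SubElement(root, q("ps", "content"))
    envelope = ET.SubElement(content, q("soap", "envelope"))
    ET.SubElement(envelope, q("soap", "header"))
    body = ET.SubElement(envelope, q("soap", "body"))
    store = ET.SubElement(body, q("echrs", "store"))
    # The large x-ray is replaced by a reference into the data repository.
    ET.SubElement(store, q("echrs", "ref")).text = ref_url
    return root


pa = styled_reference_passertion(1, "http://DataRepository/LPR1")
print(ET.tostring(pa, encoding="unicode"))
```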

26
Internal References
  • Internal references work the same way as external
    references
  • However, the reference points to data already stored
    inside the provenance store
  • This is made possible by the unique
    addressability of p-assertions
  • Useful when a large actor state p-assertion would
    otherwise be recorded several times
  • Example: system configuration information

27
Internal References cont.
[Figure: in one PS, Actor State P-Assertions 1 to 4 each refer to a single actor state p-assertion containing lots of configuration information]
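Because every p-assertion is uniquely addressable, a repeated large actor state can be recorded once and referenced afterwards. This is a minimal sketch, assuming integer local ids assigned by the store; `Store`, `record_internal_ref`, and `resolve` are hypothetical names.

```python
from typing import Dict, Union


class Store:
    """Hypothetical store where every p-assertion has a unique local id."""

    def __init__(self):
        self._assertions: Dict[int, Union[str, Dict[str, int]]] = {}
        self._next_id = 1

    def record(self, content: Union[str, Dict[str, int]]) -> int:
        pid = self._next_id
        self._assertions[pid] = content
        self._next_id += 1
        return pid

    def record_internal_ref(self, target_id: int) -> int:
        # Instead of repeating a large actor-state p-assertion, point at it.
        if target_id not in self._assertions:
            raise KeyError(f"no p-assertion {target_id}")
        return self.record({"internal_ref": target_id})

    def resolve(self, pid: int) -> str:
        content = self._assertions[pid]
        if isinstance(content, dict):
            return self.resolve(content["internal_ref"])
        return content


ps = Store()
config_id = ps.record("<large configuration document>")
refs = [ps.record_internal_ref(config_id) for _ in range(4)]
print(refs)                  # [2, 3, 4, 5]
print(ps.resolve(refs[-1]))  # <large configuration document>
```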
28
Provenance Query Results Scalability
  • Provenance query result sets are scalable
  • Queries return pointers to p-assertions, not the
    p-assertions themselves

29
    <ps:interactionPAssertion>
      <ps:localPAssertionId>1</ps:localPAssertionId>
      <ps:documentationStyle>
        http://www.pasoa.org/.../styles#Reference
      </ps:documentationStyle>
      <ps:content>
        <soap:envelope>
          <soap:header></soap:header>
          <soap:body>
            <echrs:store>
              <echrs:ref>http://DataRepository/LPR1</echrs:ref>
            </echrs:store>
          </soap:body>
        </soap:envelope>
      </ps:content>
    </ps:interactionPAssertion>
    <ps:interactionPAssertion>
      <ps:localPAssertionId>2</ps:localPAssertionId>
      <ps:documentationStyle>
        http://www.pasoa.org/.../styles#Reference
      </ps:documentationStyle>
    ...
30
    <ps:did>
      <interactionKey>
        <sender>donerdatacollector</sender>
        <receiver>echr</receiver>
        <id>12233</id>
      </interactionKey>
      <viewkind>sender</viewkind>
      <localPAssertionId>1</localPAssertionId>
    </ps:did>
    <ps:did>
      <interactionKey>
        <sender>donerdatacollector</sender>
        <receiver>echr</receiver>
        <id>1224</id>
      </interactionKey>
      <viewkind>sender</viewkind>
      <localPAssertionId>5</localPAssertionId>
    </ps:did>
    <ps:did>
    ...
31
Provenance Query Results Scalability
  • Provenance query result sets are scalable
  • Queries return pointers to p-assertions, not the
    p-assertions themselves
  • Scoping means provenance query results contain only
    what the querier needs
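Returning pointers rather than bodies can be sketched by keying the store on the same fields as the `ps:did` elements shown earlier (interaction key, view kind, local p-assertion id). This is a minimal sketch; `InteractionKey`, `PAssertionPointer`, and `query_pointers` are hypothetical names, and the field values are copied from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InteractionKey:
    sender: str
    receiver: str
    id: str


@dataclass(frozen=True)
class PAssertionPointer:
    """Hypothetical mirror of the ps:did fields shown on the slides."""
    interaction_key: InteractionKey
    view_kind: str
    local_p_assertion_id: int


def query_pointers(store: Dict[PAssertionPointer, str],
                   receiver: str) -> List[PAssertionPointer]:
    """Return only the small identifiers of matching p-assertions."""
    return [ptr for ptr in store if ptr.interaction_key.receiver == receiver]


# Bodies stand in for large p-assertions the querier may never fetch.
store = {
    PAssertionPointer(InteractionKey("donerdatacollector", "echr", "12233"),
                      "sender", 1): "<large p-assertion 1>",
    PAssertionPointer(InteractionKey("donerdatacollector", "echr", "1224"),
                      "sender", 5): "<large p-assertion 5>",
}
hits = query_pointers(store, "echr")
print([p.local_p_assertion_id for p in hits])  # [1, 5]
print(store[hits[1]])  # <large p-assertion 5>, fetched only on demand
```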

32
Iterative Query Results
  • Queries return iterators over results from process
    documentation or provenance query results

[Figure: the querying actor issues a query to the PS and receives a results iterator offering getNextRes() and getNextXRes(int x)]
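The iterator interface in the figure can be sketched as a cursor over lazily produced results. This is a minimal sketch, assuming results come from a Python generator; `ResultsIterator` is hypothetical, and its snake_case methods only mirror the `getNextRes()` and `getNextXRes(int x)` operations named on the slide.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class ResultsIterator:
    """Hypothetical cursor mirroring getNextRes() / getNextXRes(x)."""

    def __init__(self, results: Iterable[str]):
        self._it: Iterator[str] = iter(results)

    def get_next_res(self) -> Optional[str]:
        # One result, or None when exhausted.
        return next(self._it, None)

    def get_next_x_res(self, x: int) -> List[str]:
        # Up to x results; an empty list means the iterator is exhausted.
        return list(islice(self._it, x))


def lazy_query(n: int):
    # A generator stands in for a store producing results on demand.
    for i in range(n):
        yield f"p-assertion {i}"


cursor = ResultsIterator(lazy_query(5))
print(cursor.get_next_res())     # p-assertion 0
print(cursor.get_next_x_res(3))  # ['p-assertion 1', 'p-assertion 2', 'p-assertion 3']
print(cursor.get_next_x_res(3))  # ['p-assertion 4']
```

The querier controls how much it pulls at a time, so a very large result set never has to be materialised in one message.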
33
Iterative Query Results
  • Queries return iterators over results from process
    documentation or provenance query results
  • This functionality is planned for future
    implementations
  • The planned implementation makes use of:
      • OGSA-DAI
      • WSRF

34
Summary
  • Discussed both distribution and scalability
  • Introduced links for connecting distributed
    provenance stores
  • Two ways of implementing distributed queries
  • Large data support through asynchronous recording
    and references
  • Query scalability
      • Provenance query results
      • Iterative query results

35
Questions?
  • Paul Groth
  • pg03r_at_ecs.soton.ac.uk