Provenance in Open Distributed Information Systems

1
Provenance in Open Distributed Information Systems
  • PhD Scholar: Syed Imran Jami
  • Date: 17th February, 2009
  • Presented at the CRUC Weekly Research Seminar

2
Introduction
  • Provenance Systems
  • Provenance is metadata that records the origin
    and history of a target object.
  • The metadata contains a log of each step in
    sourcing, moving, and processing the object.
  • Keeps a record of the transformation steps
    applied to the target object
  • Provides the information needed to recreate the
    object
  • Helps maintain the quality and reliability of
    the object
  • Provides a trust mechanism for using the object
    in simulations and experiments

4
Introduction (2)
  • Open Distributed Information Systems
  • Information and the sequence of steps performed
    are distributed among information systems that
    are independent and may be under different
    administrative control
  • Nodes can be heterogeneous
  • Now widely used for collaboration and information
    sharing
  • Requires open (read/write) access to digital
    artifacts
  • Web 2.0 (blogs, Wikipedia, etc.)
  • Grids and cloud computing

5
Our Problem
  • Main Problem
  • To propose and develop a provenance system for
    open distributed environments
  • Research Question
  • How can we develop a provenance model for an
    information system in an open distributed
    environment?
  • Hypothesis
  • A provenance model for an information system in
    an open distributed environment can be developed
    by incorporating agents that autonomously track
    the interactions.
  • Providing a provenance ontology enables the
    provenance representation in RDF graphs to work
    in a heterogeneous environment.
  • The use of an ontology and RDF graphs will also
    make the system domain independent.

6
Motivation and Justification
  • Most existing provenance systems track data only
  • The definition of data is now changing
  • Information portals in an open environment can
    contain data, documents and information
  • Tagged representation in XML narrows the gap
    between data and documents
  • Most existing provenance systems are specialized
    (domain dependent)
  • Open distributed systems should be able to
    accommodate any kind of information, i.e. be
    generic
  • The existing systems are not autonomous
  • They require changes to operating systems or
    workflows in order to track provenance
  • Most existing provenance systems do not give
    importance to heterogeneity
  • It is one of the important factors to consider
    in open distributed systems

7
Research Issues
  • Provenance tracking, representation and storage
    in open distributed systems lead to the following
    research challenges
  • Autonomy
  • Domain independence
  • Heterogeneity
  • Scalability and efficiency
  • Genericity
  • Mobility
  • Privacy and security

8
Proposed Solution
  • As a testbed we developed an XML-based
    information system
  • An XML page contains information contributed by
    different sources and used by different users
  • Each interaction is merged into the main XML page
    using agents
  • The provenance of each interaction is tracked
    using a multi-agent system
  • Provenance logs are represented as triples in RDF
    graphs
  • The logs are stored in distributed locations

9
Proposed Solution
  • Generic
  • Research Question (1)
  • Can we develop a provenance system that tracks
    not only data but also other digital objects?
  • Most existing systems work for data only
  • For example, they use an RDBMS as the underlying
    storage mechanism
  • The provenance model should be generic enough to
    accommodate data, documents and other digital
    artifacts
  • Semantic Grid based techniques can play a role
  • XML narrows the gap between data and documents
    through its tagged representation
  • All data formats are translated to XML in the
    information system
  • Our provenance tracking system will track the
    interactions performed as an XML tree
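
The translation step on this slide could look roughly like the sketch below. It assumes Python's standard xml.etree.ElementTree rather than the JVM-based prototype, and the element names (page, contribution), the source attribute and both helper functions are hypothetical names chosen for illustration. It also assumes that every field name is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, source):
    """Translate a flat record (a dict) into a <contribution> element."""
    contribution = ET.Element("contribution", {"source": source})
    for field, value in record.items():
        # One child element per field; values are stored as text.
        child = ET.SubElement(contribution, field)
        child.text = str(value)
    return contribution


def merge_into_page(page, contribution):
    """Append one contribution to the main XML page and return the page."""
    page.append(contribution)
    return page


# A tabular row from some source, merged into the main XML page.
page = ET.Element("page")
row = {"title": "Sensor reading", "value": 42}
merge_into_page(page, record_to_xml(row, source="node-a"))
xml_text = ET.tostring(page, encoding="unicode")
```

Because every contribution arrives as a subtree of the same page, a tracker only has to observe tree operations such as the append above, whatever the original data format was.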

10
Proposed Solution
  • Autonomy
  • Research Question (Sub problem 1)
  • Can we develop a model that does not require
    changing or adapting the OS, language platform or
    workflow application to track provenance?
  • To provide automated and autonomous tracking
  • Almost all existing systems depend on APIs, OS
    routines, workflows, etc. to track provenance.
    This is not recommended for open systems like
    grids, since one cannot change the OS or
    workflows to use the provenance-aware
    information service
  • Multi-agent systems (MAS) can be used to provide
    autonomy
  • Only one work uses MAS to track data provenance,
    for a health care system (a specialized domain)
  • Among the available options, a MAS-based system
    will provide the most autonomy

11
Proposed Solution
  • Heterogeneity
  • Research Question (2)
  • Can we develop a provenance system that tracks
    the transformation steps across heterogeneous
    nodes of an open distributed system?
  • The system should record and track provenance
    even for heterogeneous nodes
  • Device heterogeneity
  • Platform heterogeneity
  • Semantic (schema) heterogeneity
  • A JVM-based implementation will handle
    heterogeneity at the device and platform level
  • Semantic heterogeneity will be addressed by
    representing provenance metadata as RDF triples
    in graphs
  • XML and RDF are W3C standards that apply across
    systems and devices
  • Requires developing an RDF vocabulary for the
    provenance ontology
  • A JVM, XML and RDF based provenance model will
    make our system domain independent

12
Proposed Solution
  • Scalability
  • Research Question (3)
  • Can we make provenance storage and tracking
    scalable?
  • The tracking system should scale with an
    increasing number of users in an open distributed
    system
  • Simultaneous recording through agents will make
    the tracking scalable. Each node is responsible
    for autonomously tracking its own interactions
  • The scalability of storage depends on the
    location of the provenance store containing the
    log
  • With the target or separate?
  • Centralized or decentralized?
  • A decentralized system will be scalable
  • RDF graphs will reside on some other node
  • No single node will be overutilized
  • Problem: this solution will cost efficiency!
  • Another solution is to store sub-graphs at the
    local host instead of combining and merging them
    into one
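
The last option above, keeping sub-graphs on the local host, can be sketched as follows. This is an illustration only: triples are modelled as plain Python tuples instead of a real RDF store, and the class name, the predicate names and the artifact identifiers are all hypothetical.

```python
from collections import defaultdict


class LocalProvenanceStore:
    """Holds only the triples recorded on one node (hypothetical class)."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.triples_by_subject = defaultdict(list)

    def record(self, subject, predicate, obj):
        # The sub-graph stays on the node where the interaction happened.
        self.triples_by_subject[subject].append((subject, predicate, obj))

    def query(self, subject):
        return list(self.triples_by_subject.get(subject, []))


def collect_provenance(stores, subject):
    """Gather the sub-graphs for one artifact from every node at query time."""
    merged = []
    for store in stores:
        merged.extend(store.query(subject))
    return merged


node_a = LocalProvenanceStore("node-a")
node_b = LocalProvenanceStore("node-b")
node_a.record("artifact:1", "createdBy", "alice")
node_b.record("artifact:1", "modifiedBy", "bob")
node_b.record("artifact:2", "createdBy", "carol")

history = collect_provenance([node_a, node_b], "artifact:1")
```

Recording touches only the local node, so no single store becomes a bottleneck. The cost moves to retrieval, where collect_provenance has to ask every node for its share of the history.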

13
Proposed Solution
  • Efficiency
  • Research Question (4)
  • With the proposed solution for scalability, can
    we achieve efficient, fast retrieval of
    provenance metadata scattered around the system?
  • The scalability solution incurs the overhead of
    lower efficiency
  • Extra time is required to search for RDF graphs
  • Some lookup tables will be required.
  • Solution
  • Each digital artifact must be given a unique ID,
    such as a URI
  • Unique IDs should be composed of binary strings
  • The lookup table will use these binary strings
    for fast retrieval
  • We can use our own ID system
  • A single RDF graph should be maintained for
    multiple copies
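
One way to picture the lookup table is sketched below. The presentation does not say how its ID system builds the binary strings, so hashing the URI with SHA-256 and keeping the top 32 bits is purely an assumption for illustration, as are the class name, the example URI and the node names.

```python
import hashlib

ID_BITS = 32  # illustrative width, not taken from the presentation


def binary_id(uri, bits=ID_BITS):
    """Derive a fixed-width binary string from a URI via SHA-256."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    # Keep only the top `bits` bits of the 256-bit digest.
    value = int.from_bytes(digest, "big") >> (256 - bits)
    return format(value, "0{}b".format(bits))


class ProvenanceLookup:
    """Maps an artifact's binary ID to the node that stores its RDF graph."""

    def __init__(self):
        self._locations = {}

    def register(self, uri, node):
        key = binary_id(uri)
        # Copies of one artifact share a URI, so the first registered
        # graph location is kept and later copies reuse it.
        self._locations.setdefault(key, node)
        return key

    def locate(self, uri):
        return self._locations.get(binary_id(uri))


table = ProvenanceLookup()
key = table.register("http://example.org/artifact/1", "node-b")
table.register("http://example.org/artifact/1", "node-c")  # a second copy
location = table.locate("http://example.org/artifact/1")
```

A lookup is then a single dictionary access instead of a search across nodes. A short hash can collide, so a real ID system would need a wider ID or an explicit collision check.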

14
Current Progress
  • A prototype application has been developed that
    serves as a testbed for an information system in
    an open distributed environment
  • The system records the provenance log in an RDF
    file that is merged into a single main RDF graph,
    which keeps track of the information
  • Dublin Core is used as the ontology for
    provenance
  • Both contributions to the information and
    provenance metadata are transmitted through
    Aglets
  • An ID system has been developed to label digital
    artifacts
  • Scalability analysis has been performed on a
    distributed, tightly coupled provenance store
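
To make the Dublin Core point concrete, the sketch below writes one interaction as N-Triples lines. The namespace is the standard Dublin Core element set, but which terms the prototype actually uses is not stated in the slides. The choice of contributor, date and description, the helper names and the example URI are assumptions. Escaping is simplified to backslashes and double quotes.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def to_ntriple(subject_uri, predicate_uri, literal):
    """Format one triple with a plain literal object as an N-Triples line."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return '<{}> <{}> "{}" .'.format(subject_uri, predicate_uri, escaped)


def log_interaction(artifact_uri, agent, date, description):
    """Describe one interaction on an artifact using Dublin Core terms."""
    return [
        to_ntriple(artifact_uri, DC + "contributor", agent),
        to_ntriple(artifact_uri, DC + "date", date),
        to_ntriple(artifact_uri, DC + "description", description),
    ]


lines = log_interaction(
    "http://example.org/page/1", "node-a", "2009-02-17", "appended section"
)
```

Each interaction adds a fixed number of lines whatever the size of the artifact, which would be consistent with a log that grows with interactions rather than with file size.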

15
Results
  • Early results show that the provenance log is
    independent of file size
  • The logs depend on the interactions
  • Our storage algorithm has some limitations: logs
    converge at one place

16
Contribution towards Provenance
  • A Knowledge Provenance Architecture for Open
    Distributed Systems
  • Autonomous Provenance Recording in Heterogeneous
    nodes
  • A Scalable Provenance Storage System
  • Semantic Heterogeneity of Provenance System using
    Provenance Ontology
  • A Domain Independent Provenance System

17
Publications
  • Syed Imran Jami and Zubair A. Shaikh, "A workflow
    based academic management system using multi
    agent approach", Proceedings of the 11th WSEAS
    International Conference on Computers, Agios
    Nikolaos, Crete Island, Greece, pp. 202-207,
    2007, ISSN 1790-5117
  • Imran Jami and Zubair A. Shaikh, "A Multi Agent
    based Architecture for Data Provenance in
    Semantic Grid", Proceedings of International
    Multi-Conference of Engineers and Computer
    Scientists, Hong Kong, pp. 360-364, 2008,
    ISBN 978-988-98671-8-8
  • Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh,
    "A Taxonomy of Provenance Models for Open
    Distributed Systems", submitted to Journal of
    Information Sciences, Elsevier, Impact Factor
    2.147
  • Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh,
    "Information Provenance for Open Distributed
    Collaborative System", to be submitted to an ACS
    high-impact conference

22
Questions / Suggestions / Recommendations?