Difference between revisions of "Second provenance challenge"

From VistrailsWiki
Revision as of 15:23, 19 June 2007

Model Integration Results

We have successfully performed the queries using data from VisTrails, MyGrid and Southampton (TBD insert links).

Our method uses wrappers to translate queries between a common data model and the source data. First, we defined a high-level general model that captures the basic concepts of workflows and their executions, which makes it possible to express queries over the different models. Second, we defined API functions for the wrappers that use this model. Third, we implemented the wrappers to show that the data model is valid and that the queries can be constructed using our model.
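
The wrapper approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual SWPDM API: the names ModuleExecution, Connection, ProvenanceWrapper, InMemoryWrapper and upstream are hypothetical, and a real wrapper would read from a VisTrails, MyGrid or Southampton store instead of in-memory lists.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleExecution:
    """One run of a workflow module (hypothetical common-model record)."""
    id: str
    module_name: str
    annotations: tuple = ()  # (key, value) pairs attached to this execution


@dataclass(frozen=True)
class Connection:
    """Data flowed from the execution `source` to the execution `target`."""
    source: str
    target: str


class ProvenanceWrapper(ABC):
    """Hypothetical wrapper API: each system exposes its data in common terms."""

    @abstractmethod
    def executions(self):
        """Return an iterable of ModuleExecution records."""

    @abstractmethod
    def connections(self):
        """Return an iterable of Connection records."""

    def upstream(self, execution_id):
        """Return the ids of all executions the given execution depends on."""
        # Index connections by target so parents can be looked up directly.
        parents = {}
        for conn in self.connections():
            parents.setdefault(conn.target, set()).add(conn.source)
        # Depth-first walk over the parent links.
        seen = set()
        stack = [execution_id]
        while stack:
            current = stack.pop()
            for parent in parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


class InMemoryWrapper(ProvenanceWrapper):
    """Stand-in for a system-specific wrapper; holds already translated data."""

    def __init__(self, executions, connections):
        self._executions = list(executions)
        self._connections = list(connections)

    def executions(self):
        return self._executions

    def connections(self):
        return self._connections
```

Because the query (`upstream`) is written only against the common model, the same query code can run over any system for which a wrapper exists; only `executions` and `connections` need a system-specific implementation.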

Our queries required the data to contain at least module executions, the connections between them, and the required annotations. These were all present in the models except Southampton, which lacked some annotations.

Translation Details

A description of the system describing Model, API and Implementation?

any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous_

Benchmarks

What should we put here? _Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system_

Further Comments

_Provide here further comments._

Conclusions

In the general case, tracking provenance through different systems is a data integration problem. But by defining a common model (SWPDM) on a restricted domain (scientific workflows), the difficulty is reduced to efficiency and entity resolution problems. We believe that it should be possible for the scientific workflow community to support a model similar to the SWPDM to enable provenance to be tracked through their systems. We have shown that an API for querying this model can be built and that it is compatible with three of the current systems.

Problems for discussion:

How should these systems be connected? The data needs to support references to other models, e.g. when a data item is stored externally and tracked through another provenance store. Common identifiers such as LSIDs might be part of the solution. External data items should also be given a namespace to indicate where they came from.

The concept of data item varies between systems. It can be represented as the data exchanged between modules, the inputs or outputs of a workflow or a file reference passed between modules.

How will a user be able to express these kinds of queries?
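
As a starting point for the first discussion item (references between provenance stores), namespaced data identifiers could look roughly like the sketch below. Everything here is an assumption made for illustration: DataRef, parse_ref and is_external are hypothetical names, and the simple `namespace:local_id` form is not LSID syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical reference to a data item, qualified by its origin."""
    namespace: str  # which provenance store or system the item came from
    local_id: str   # identifier of the item within that store

    def qualified(self):
        """Return the reference in `namespace:local_id` form."""
        return f"{self.namespace}:{self.local_id}"


def parse_ref(text, default_namespace):
    """Parse `namespace:local_id`; bare ids get the default namespace."""
    namespace, sep, local_id = text.partition(":")
    if not sep:
        # No namespace given, so the item is assumed to be local.
        return DataRef(default_namespace, text)
    return DataRef(namespace, local_id)


def is_external(ref, local_namespace):
    """True when the item is tracked by a store other than the local one."""
    return ref.namespace != local_namespace
```

With such references, a wrapper that meets an external item while answering a query could hand the rest of the query to the wrapper responsible for that namespace.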