Second provenance challenge

From VistrailsWiki
Revision as of 14:40, 19 June 2007 by Tommy

---++ Second Provenance Challenge Template

%TOC%

---+++ Participating Team

  * Short team name: VisTrails
  * Participant names: Erik Anderson, Steven Callahan, Tommy Ellkvist, Juliana Freire, David Koop, Emanuele Santos, Carlos Scheidegger, Claudio Silva, Nathan Smith and Huy Vo
  * Project URL: http://www.vistrails.org/
  * First challenge results: VisTrails

---++ Differences from First Challenge

We have changed the structure of our provenance representation to make it more general and better organized, but the data stored is roughly equivalent to that of our previous representation. The schemas and data are provided below. Recall that we store workflow evolution in a _vistrail_, which is a tree of actions where each node represents a (possibly partial) workflow. To allow easier integration with other systems, we have also materialized the individual workflow specifications for the three parts.
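The idea of a tree of actions in which each node stands for a workflow can be illustrated with a minimal sketch. It assumes a deliberately simplified action shape (an id, a parent id, and an add or delete of a single named module); the =Action= class, the action kinds, and the module names are hypothetical and do not mirror the vistrail.xsd schema. Materializing a workflow then amounts to replaying the actions on the path from the root to the chosen node.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Action:
    """One node of the version tree (hypothetical shape, not vistrail.xsd)."""
    id: int
    parent: Optional[int]  # None marks an action applied to the empty workflow
    kind: str              # "add_module" or "delete_module" in this sketch
    module: str            # illustrative module name


def materialize(actions: Dict[int, Action], version: int) -> Set[str]:
    """Replay the actions on the path from the root to `version`."""
    # Walk up from the requested node to the root, collecting actions.
    path: List[Action] = []
    current: Optional[int] = version
    while current is not None:
        action = actions[current]
        path.append(action)
        current = action.parent

    # Apply the collected actions in root-to-node order.
    modules: Set[str] = set()
    for action in reversed(path):
        if action.kind == "add_module":
            modules.add(action.module)
        elif action.kind == "delete_module":
            modules.discard(action.module)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return modules


# A small tree: versions 3 and 4 both branch from version 2.
actions = {
    1: Action(1, None, "add_module", "module_a"),
    2: Action(2, 1, "add_module", "module_b"),
    3: Action(3, 2, "add_module", "module_c"),
    4: Action(4, 2, "delete_module", "module_a"),
}

print(sorted(materialize(actions, 3)))
print(sorted(materialize(actions, 4)))
```

Because versions 3 and 4 share the prefix up to version 2, the tree stores that common history only once, and an intermediate node such as version 1 is itself a (partial) workflow.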

We split our original workflow into three individual workflows to better reflect the independence of the parts. In addition, because the AIR tools depend on a (.hdr, .img) pair of files, the workflows are slightly restructured so that module inputs and outputs are also paired, using a !FileSet module.
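The pairing of a header file with its image file can be sketched as a small value type. This is only an illustration of the idea under stated assumptions: the class below borrows the !FileSet name, but its fields, the =from_stem= helper, and the validation rules are hypothetical and are not taken from the VisTrails module.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileSet:
    """A (.hdr, .img) pair passed between modules as a single value.

    Hypothetical sketch; not the actual VisTrails FileSet module.
    """
    header: PurePosixPath
    image: PurePosixPath

    def __post_init__(self) -> None:
        # Both files must carry the expected extensions.
        if self.header.suffix != ".hdr" or self.image.suffix != ".img":
            raise ValueError("expected a .hdr file and a .img file")
        # Both files must share the same name apart from the extension.
        if self.header.with_suffix("") != self.image.with_suffix(""):
            raise ValueError("header and image must share the same stem")

    @classmethod
    def from_stem(cls, stem: str) -> "FileSet":
        """Build the pair from a common stem (helper assumed for this sketch)."""
        return cls(PurePosixPath(stem + ".hdr"), PurePosixPath(stem + ".img"))


pair = FileSet.from_stem("scan1")
print(pair.header, pair.image)
```

Treating the pair as one value means a connection between two modules carries both files together, so a header cannot be passed downstream without its image.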

---++ Provenance Data for Workflow Parts

The provenance data is split into three layers: workflow evolution, workflows, and execution. The schemas for these layers are available below:

  * [[%ATTACHURL%/vistrail.xsd][vistrail.xsd]] - workflow evolution actions
  * [[%ATTACHURL%/workflow.xsd][workflow.xsd]] - workflow specification
  * [[%ATTACHURL%/log.xsd][log.xsd]] - workflow execution information

The data corresponding to these layers:

  * [[%ATTACHURL%/pc_vt.xml][pc_vt.xml]] stores the workflow evolution (you can materialize workflows from this data)
  * [[%ATTACHURL%/pc_part1.xml][pc_part1.xml]] is the materialized workflow for part 1
  * [[%ATTACHURL%/pc_part2.xml][pc_part2.xml]] is the materialized workflow for part 2
  * [[%ATTACHURL%/pc_part3a.xml][pc_part3a.xml]] is the materialized workflow for part 3 (first version)
  * [[%ATTACHURL%/pc_part3b.xml][pc_part3b.xml]] is the materialized workflow for part 3 (second version)
  * [[%ATTACHURL%/pc_log.xml][pc_log.xml]] is the execution information

Note that teams may decide to use the vistrail data or the four materialized workflows for the challenge; the four workflows constitute a subset of the workflows contained in the vistrail. Please refer to the [[VisTrails][previous challenge]] for documentation on the system design.
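For teams starting from the XML files, a schema-agnostic first step is to count which element tags a file contains before writing any translation code. The sketch below uses only the Python standard library and assumes nothing about the actual schemas; the sample string and its tag names are made up for illustration and are not taken from the attached files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element and every descendant.
    return Counter(element.tag for element in root.iter())


# Made-up sample; the real files follow the schemas attached above.
sample = "<workflow><module/><module/><connection/></workflow>"

for tag, count in sorted(count_tags(sample).items()):
    print(tag, count)
```

To apply this to one of the attached files, read its contents into a string and pass it to =count_tags=. If a file declares an XML namespace, ElementTree reports tags in the form ={uri}name=.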

---++ Model Integration Results

_State here which combinations of teams' models you have managed to perform the provenance query over_

---++ Translation Details

_Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous_

---++ Benchmarks

_Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system_

---+++ Further Comments

_Provide here further comments._

---+++ Conclusions

_Provide here your conclusions on the challenge, and any issues that you would like to see discussed at a face-to-face meeting._

-- Main.JulianaFreire - 20 Feb 2007