Provenance challenge

From VistrailsWiki
Revision as of 09:14, 13 April 2007 by Tommy (talk | contribs)

Second provenance challenge design overview

This page describes the design of an implementation for answering the queries of the second provenance challenge.

The goal of this project is to create an API capable of querying different kinds of databases containing provenance data. The main focus is on provenance generated by scientific workflows.

Data model overview

This is a description of the data model that I am trying to implement.

Module definition is a description of a processor that takes inputs and generates outputs.

Workflow definition is a description of a workflow that contains modules and connections between them through ports. In the case of VisTrails, it also contains the evolution of the workflow through a parent relation.

Execution log is the information about a workflow execution. It contains information about the processors that were executed and the data items that were created.
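
The parent relation mentioned for VisTrails can be illustrated with a minimal sketch. The class and field names (`WorkflowDefinition`, `parent`, `lineage`) and the module names are hypothetical and chosen for illustration only; they are not the VisTrails schema. The idea is that the evolution of a workflow can be recovered by following parent links back to the first version:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkflowDefinition:
    """Hypothetical stand-in for one workflow version (not the VisTrails schema)."""
    name: str
    modules: List[str] = field(default_factory=list)
    parent: Optional["WorkflowDefinition"] = None  # previous version, if any


def lineage(workflow: WorkflowDefinition) -> List[str]:
    """Return version names from the root version down to `workflow`."""
    chain: List[str] = []
    current: Optional[WorkflowDefinition] = workflow
    while current is not None:
        chain.append(current.name)
        current = current.parent
    chain.reverse()  # collected leaf-to-root, reported root-to-leaf
    return chain


# Three versions, each derived from the previous one (illustrative names).
v1 = WorkflowDefinition("v1", modules=["load"])
v2 = WorkflowDefinition("v2", modules=["load", "filter"], parent=v1)
v3 = WorkflowDefinition("v3", modules=["load", "filter", "render"], parent=v2)

print(lineage(v3))
```

Storing only a parent pointer keeps each version small; the full history is reconstructed on demand rather than copied into every version.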

[Figure: Pc model er.gif, diagram of the data model]


Primitives

The API will deal with the basic primitives describing workflow executions.


node types:

dataItem           a data item that is an input to or an output of a module execution
module             the module/service that is to be executed
moduleInstance     the module as represented in a workflow
moduleExecution    the execution of a module
workflow           a description of a process containing modules and connections
workflowExecution  the representation of a workflow execution
inputPort          a specific port that can be assigned an input to a module execution
outputPort         a specific port that can contain a product of a module execution
connection         a connection between module instances
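
One possible way to represent these node types in code is sketched below. The `NodeType` and `Node` names, the `node_id` field, and the choice to ignore annotations when comparing nodes are all assumptions made for this example, not part of the design described here:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class NodeType(Enum):
    """The nine node types listed above, keyed by their names."""
    DATA_ITEM = "dataItem"
    MODULE = "module"
    MODULE_INSTANCE = "moduleInstance"
    MODULE_EXECUTION = "moduleExecution"
    WORKFLOW = "workflow"
    WORKFLOW_EXECUTION = "workflowExecution"
    INPUT_PORT = "inputPort"
    OUTPUT_PORT = "outputPort"
    CONNECTION = "connection"


@dataclass(frozen=True)
class Node:
    """Hypothetical handle to one primitive in a provenance store."""
    node_type: NodeType
    node_id: str
    # Annotations are excluded from equality and hashing, so two handles
    # to the same primitive compare equal regardless of annotations.
    annotations: Dict[str, str] = field(
        default_factory=dict, compare=False, hash=False
    )


node = Node(NodeType.MODULE_INSTANCE, "m3", {"label": "first pass"})
print(node.node_type.value, node.node_id)
```

Using an enumeration means a misspelled node type fails immediately (`NodeType("inputport")` raises `ValueError`) instead of silently matching nothing in a query.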

Relations

Relation               Input              Output
exists                 all                boolean
equals                 all                boolean
annotations            all                dict of key/value pairs

getInputPortForData    dataItem           inputPort
getOutputPortForData   dataItem           outputPort
getDataFromInputPort   inputPort          dataItem
getDataFromOutputPort  outputPort         dataItem

hasInputPort           moduleInstance     inputPort
inputPortOf            inputPort          moduleInstance
hasOutputPort          moduleInstance     outputPort
outputPortOf           outputPort         moduleInstance

outputOf               dataItem           moduleExecution
inputOf                dataItem           moduleExecution
hasOutput              moduleExecution    dataItem
hasInput               moduleExecution    dataItem

startTime              moduleExecution    time
endTime                moduleExecution    time
startTime              workflowExecution  time
endTime                workflowExecution  time

executionOf            moduleExecution    moduleInstance
executionOf            workflowExecution  workflowInstance

hasExecution           moduleInstance     moduleExecution
hasExecution           workflowInstance   workflowExecution

executions             workflowExecution  moduleExecution
executedIn             moduleExecution    workflowExecution

inWorkflow             moduleInstance     workflow
hasModule              workflow           moduleInstance

connectedTo            inputPort          outputPort
connectedTo            outputPort         inputPort

runsModule             moduleInstance     module
hasInstance            module             moduleInstance
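
Many of the relations above come in pairs that point in opposite directions (for example `outputOf` and `hasOutput`). The sketch below records a few of them with their input and output types and checks that each declared inverse really swaps the two. The `Relation` tuple, the `inconsistent_inverses` helper, and the pairing of inverses are assumptions based on reading the table, not something the design states explicitly. Only relations with a unique name are included, since names such as `startTime` and `connectedTo` are overloaded:

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    name: str
    source: str              # node type the relation is applied to
    target: str              # node type it returns
    inverse: Optional[str]   # name of the assumed opposite relation


RELATIONS: List[Relation] = [
    Relation("outputOf", "dataItem", "moduleExecution", "hasOutput"),
    Relation("hasOutput", "moduleExecution", "dataItem", "outputOf"),
    Relation("inputOf", "dataItem", "moduleExecution", "hasInput"),
    Relation("hasInput", "moduleExecution", "dataItem", "inputOf"),
    Relation("inWorkflow", "moduleInstance", "workflow", "hasModule"),
    Relation("hasModule", "workflow", "moduleInstance", "inWorkflow"),
    Relation("runsModule", "moduleInstance", "module", "hasInstance"),
    Relation("hasInstance", "module", "moduleInstance", "runsModule"),
]


def inconsistent_inverses(relations: List[Relation]) -> List[str]:
    """Names of relations whose declared inverse is missing or mistyped."""
    by_name = {r.name: r for r in relations}
    bad = []
    for r in relations:
        inv = by_name.get(r.inverse)
        # A valid inverse must exist and swap source and target types.
        if inv is None or inv.source != r.target or inv.target != r.source:
            bad.append(r.name)
    return bad


print(inconsistent_inverses(RELATIONS))
```

A check like this is cheap to run whenever a relation is added, and it catches a table row whose input and output columns were entered the wrong way round.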


derived relations: (might be native)

derivedFrom              dataItem         dataItem
derivedData              dataItem         dataItem
previousModuleExecution  moduleExecution  moduleExecution


transitive relations:


datatype relation


upstreams:

dataItem         derivedFrom - .outputOf()[forall].hasInput()
moduleInstance   prevModuleInstance - .hasInputPort()[forall].connectedTo().outputPortOf()
moduleExecution  prevModuleExecution - .hasInput()[forall].outputOf()
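
The first upstream definition can be sketched over a toy in-memory execution log. The sketch assumes that `[forall]` means "apply the next relation to every element of the previous result and collect the answers"; that reading, the dictionary layout, and the identifiers `d1`..`d4` and `e1`, `e2` are all made up for the example:

```python
from typing import Dict, List, Set

# Toy execution log: the execution that produced each data item, and the
# data items each execution consumed.
OUTPUT_OF: Dict[str, str] = {"d3": "e1", "d4": "e2"}
HAS_INPUT: Dict[str, List[str]] = {"e1": ["d1", "d2"], "e2": ["d3"]}


def output_of(data_item: str) -> List[str]:
    """Executions that produced `data_item` (empty for raw workflow inputs)."""
    producer = OUTPUT_OF.get(data_item)
    return [producer] if producer is not None else []


def has_input(execution: str) -> List[str]:
    """Data items consumed by `execution`."""
    return HAS_INPUT.get(execution, [])


def derived_from(data_item: str) -> Set[str]:
    """One upstream step: .outputOf()[forall].hasInput()."""
    result: Set[str] = set()
    for execution in output_of(data_item):    # .outputOf()
        result.update(has_input(execution))   # [forall].hasInput()
    return result


print(sorted(derived_from("d3")))
```

The other two upstream definitions follow the same shape, with ports and connections taking the place of data items and executions.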

downstreams:

dataItem         derivedData - .inputOf()[forall].hasOutput()
moduleInstance   nextModuleInstance - .hasOutputPort()[forall].connectedTo().inputPortOf()
moduleExecution  nextModuleExecution - .hasOutput()[forall].inputOf()
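
Each upstream and downstream relation above describes a single step. Assuming the transitive relations are meant to be the repeated application of such a step, they could be computed with a breadth-first traversal that works in either direction. The `closure` helper and the toy `DERIVED_FROM` data are hypothetical and serve only to illustrate the idea:

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set

# Toy one-step upstream relation: data item -> items it was derived from.
DERIVED_FROM: Dict[str, Set[str]] = {"d3": {"d1", "d2"}, "d4": {"d3"}}


def derived_from(item: str) -> Set[str]:
    """One upstream step."""
    return DERIVED_FROM.get(item, set())


def derived_data(item: str) -> Set[str]:
    """One downstream step, obtained by inverting the upstream relation."""
    return {child for child, parents in DERIVED_FROM.items() if item in parents}


def closure(start: str, step: Callable[[str], Iterable[str]]) -> Set[str]:
    """Everything reachable from `start` by applying `step` one or more times."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in step(node):
            if neighbour not in seen:   # also guards against cycles
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(closure("d4", derived_from)))   # all ancestors of d4
print(sorted(closure("d1", derived_data)))   # all descendants of d1
```

Passing the step function as a parameter means the same traversal serves data items, module instances, and module executions, upstream or downstream.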


--Tommy 09:05, 12 April 2007 (MDT)