VisTrails has groups, which are "encapsulate by value" entities, and subworkflows, which are "encapsulate by reference" entities. A group is stored as a workflow with modules and connections (including special InputPort and OutputPort modules that determine which ports are exposed in the group module), while a subworkflow is stored as a version of a workflow in a different vistrail. Thus, a change to a group generates an entirely new, fully specified workflow, while a change to a subworkflow uses change-based provenance to store only the actions between the previous workflow and the new one.
Groups are fairly simple and work well to abstract a piece of a workflow, but they use storage inefficiently. In addition, in their current implementation in VisTrails, they are impossible to edit in place. A user must expand the group, make changes, possibly add back some Input/OutputPort modules, and then group the modules again. Subworkflows solve the storage issue but introduce significant complexity because they must be indexed by the module registry. Subworkflows can exist locally (in a user's .vistrails directory), as part of a distributed package (via the _subworkflows list or add_subworkflow), or as part of a vt bundle. In addition, any reference to a subworkflow in a workflow must identify both the subworkflow and the version to be used.
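The difference between the two storage models can be sketched as follows. This is only an illustration, not the VisTrails data model: `Workflow`, `Group`, `SubworkflowRef`, `Vistrail`, and `materialize` are hypothetical names, and actions are reduced to adding and deleting modules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names below are hypothetical; they only illustrate the two storage models.

@dataclass
class Workflow:
    modules: Dict[int, str] = field(default_factory=dict)  # module id -> module name

@dataclass
class Group:
    """Encapsulate by value: the group carries a fully specified workflow."""
    workflow: Workflow

@dataclass
class SubworkflowRef:
    """Encapsulate by reference: names a vistrail and a version inside it."""
    vistrail_id: str
    version: int

Action = Tuple[str, int, str]  # (operation, module id, module name)

@dataclass
class Vistrail:
    # Change-based provenance: version -> (parent version, actions since parent).
    versions: Dict[int, Tuple[int, List[Action]]] = field(default_factory=dict)

    def materialize(self, version: int) -> Workflow:
        """Replay the actions from the empty root (version 0) to `version`."""
        chain = []
        while version != 0:
            parent, actions = self.versions[version]
            chain.append(actions)
            version = parent
        workflow = Workflow()
        for actions in reversed(chain):
            for op, module_id, name in actions:
                if op == "add":
                    workflow.modules[module_id] = name
                elif op == "delete":
                    workflow.modules.pop(module_id, None)
        return workflow

vistrail = Vistrail()
vistrail.versions[1] = (0, [("add", 1, "Reader"), ("add", 2, "Filter")])
vistrail.versions[2] = (1, [("delete", 2, "Filter"), ("add", 3, "Renderer")])

# A group stores the whole workflow; a subworkflow reference stores two ids.
group = Group(workflow=vistrail.materialize(2))
reference = SubworkflowRef(vistrail_id="my-subworkflow", version=2)
```

In this sketch, editing the group means writing out a new complete `Workflow`, whereas editing the subworkflow only appends one more entry to `versions`.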
One goal is to make editing a group easier. If we could open a window as we do when editing subworkflows, we could take the edited workflow and use it as the new group. Note, however, that this has a couple of issues:
- We must delete any parameters or connections that were attached to ports that were removed (should be able to look at the upgrade code for these routines).
- Workflow editing currently assumes that a vistrail exists, which may introduce issues. We have the ability to import workflows from XML, so we should be able to create a vistrail from the group's workflow, allow the user to make edits, and then pull the currentPipeline back as the group's workflow. Ideally, it would be nice to eventually have a separation between workflow and vistrail such that VisTrails could have a standalone workflow editor with provenance threaded in (this would allow us to use the Provenance SDK).
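The cleanup in the first item can be sketched as below. This is a minimal illustration under assumed data shapes, not the existing upgrade code: `prune_removed_ports` is a hypothetical helper, connections are modeled as `(source module, group port)` pairs, and parameters as a dict keyed by port name.

```python
def prune_removed_ports(old_ports, new_ports, connections, parameters):
    """Drop connections and parameters attached to ports that no longer exist.

    Hypothetical helper: `connections` is a list of (source module, group port)
    pairs and `parameters` maps a group port name to its value.
    """
    removed = set(old_ports) - set(new_ports)
    kept_connections = [
        (source, port) for source, port in connections if port not in removed
    ]
    kept_parameters = {
        port: value for port, value in parameters.items() if port not in removed
    }
    return kept_connections, kept_parameters

connections = [("moduleA", "in_file"), ("moduleB", "in_scale")]
parameters = {"in_scale": 2.0, "in_color": "red"}

# The edited group no longer exposes "in_scale".
kept_connections, kept_parameters = prune_removed_ports(
    old_ports=["in_file", "in_scale", "in_color"],
    new_ports=["in_file", "in_color"],
    connections=connections,
    parameters=parameters,
)
```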
Another possibility that has been proposed is to make grouping something that is more visual than anything else. In other words, I create a "group" from modules a,b,c by visually collapsing these into a single box but all I write to the vistrail is that certain module ids have been marked as collapsed. The major issue with this is that significant functionality has been built assuming that a group is a tangible entity (e.g. in control flow modules).
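A rough sketch of the "visual only" proposal, assuming the vistrail stores nothing more than a mapping from a collapsed box to the module ids it hides. The names `collapse` and `visible_boxes` are hypothetical and do not correspond to existing VisTrails functions.

```python
def collapse(annotations, name, module_ids):
    """Record that `module_ids` are drawn as one box; the pipeline is unchanged."""
    updated = dict(annotations)
    updated[name] = frozenset(module_ids)
    return updated

def visible_boxes(all_module_ids, annotations):
    """Return what the canvas would draw: loose modules plus collapsed boxes."""
    hidden = set().union(*annotations.values())
    boxes = [("module", m) for m in sorted(all_module_ids) if m not in hidden]
    boxes += [("collapsed", name) for name in sorted(annotations)]
    return boxes

annotations = collapse({}, "group1", [1, 2, 3])
boxes = visible_boxes({1, 2, 3, 4}, annotations)
```

Because the pipeline itself still contains modules 1, 2, and 3, code that expects a group to be a module in its own right would find nothing to operate on, which is the issue raised above.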
Subworkflow versions and upgrades
Because we want to allow any user to run a workflow that contains a subworkflow, we save the subworkflows in the vt bundle. This ensures that a user who receives this bundle can run the workflow, assuming they have all the necessary packages installed. At the same time, this means that there can be many different versions of a given subworkflow since User A can send a vistrail to Users B & C who can both make independent edits to a subworkflow and send them back to User A. An issue here is uniquely identifying each of the subworkflows yet possibly maintaining the derivation history. In addition, we have two different types of upgrades that can come into play:
- User-defined upgrades to the subworkflow (e.g. a user changes a parameter or replaces a module)
- Package-required upgrades to the subworkflow (e.g. the VTK package installed by a user is newer than the one used to create the original subworkflow)
In the first case, the user controls whether an attempt is made to update the subworkflow with a newer version (a leaf in the vistrail) as we do not want to break a workflow by forcing the user to use the new subworkflow. In the second case, the upgrade is required because the subworkflow will not run without the changes. Note that this second upgrade requires a new version in the subworkflow's vistrail as well as a new version in the containing workflow's vistrail because the containing workflow must reference a different "internal version" of the subworkflow.
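The two cases can be sketched as follows. This is an illustration under simplifying assumptions rather than the actual upgrade code: `VersionTree`, `upgrade_kind`, and `apply_package_upgrade` are hypothetical names, and version ids are plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VersionTree:
    """Hypothetical stand-in for a vistrail: version id -> parent version id."""
    parents: Dict[int, int] = field(default_factory=dict)

    def add_version(self, parent: int) -> int:
        new_version = max(self.parents, default=0) + 1
        self.parents[new_version] = parent
        return new_version

def upgrade_kind(used_version, latest_leaf, package_mismatch, user_opted_in):
    """Classify which upgrade, if any, applies to a subworkflow reference."""
    if package_mismatch:
        return "package-required"  # the subworkflow cannot run without it
    if user_opted_in and latest_leaf != used_version:
        return "user-defined"      # only attempted when the user asks for it
    return "none"

def apply_package_upgrade(sub_tree, outer_tree, sub_version, outer_version):
    """A required upgrade adds a version to both vistrails."""
    new_sub = sub_tree.add_version(sub_version)
    # The containing workflow must now reference new_sub, so it changes too.
    new_outer = outer_tree.add_version(outer_version)
    return new_sub, new_outer

sub_tree = VersionTree({1: 0, 2: 1})
outer_tree = VersionTree({1: 0})
new_sub, new_outer = apply_package_upgrade(sub_tree, outer_tree, 2, 1)
```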
Subworkflows and the registry
Subworkflows are also stored in the registry so that a user might drag them out to use like a normal module. However, for those subworkflows that exist only in an opened vistrail, we do not put them in the registry list (although they must be in the registry); a user must import them into his own "My Subworkflows" in order to use them in his own work. That said, there may be multiple copies of a subworkflow in the registry, meaning we must have a way to differentiate between them all. This is done via abstraction_uuid and abstraction_origin_uuid annotations in the subworkflow which get used as namespaces and possibly filenames. Obviously, this becomes complicated quickly. With upgraded subworkflows, the different versions must also be managed in the registry. The win with using the registry is that a subworkflow will look like a normal module to most of the VisTrails code.
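One way to picture the namespacing is a registry keyed by name and namespace, where the namespace is derived from the abstraction_uuid annotation. The class below is a hypothetical sketch, not the VisTrails module registry; `SubworkflowRegistry`, `register`, `lookup`, and `palette` are invented names.

```python
import uuid

class SubworkflowRegistry:
    """Hypothetical registry keyed by (name, namespace)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, namespace, descriptor, listed=True):
        # `listed` is False for subworkflows that exist only in an opened vistrail.
        self._entries[(name, namespace)] = (descriptor, listed)

    def lookup(self, name, namespace):
        return self._entries[(name, namespace)][0]

    def palette(self):
        """Entries a user can drag out, i.e. only the listed ones."""
        return sorted(
            key for key, (_, listed) in self._entries.items() if listed
        )

local_ns = str(uuid.uuid4())     # stands in for one copy's abstraction_uuid
embedded_ns = str(uuid.uuid4())  # a second copy found inside an opened vistrail

registry = SubworkflowRegistry()
registry.register("Smooth", local_ns, "local copy")
registry.register("Smooth", embedded_ns, "embedded copy", listed=False)
```

Both copies share the name "Smooth" but resolve independently, and only the local one appears in the palette.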
Subworkflows and uuids
Subworkflows already use some uuids, but with a uuid for version ids (as in the uuid branch), the issues with identifying which version of a subworkflow is meant (even across multiple files) are largely removed since the uuid will be unique. There may still be some interesting issues about which version is the "most recent" since the ids are not monotonic.
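A sketch of why "most recent" needs more than the ids themselves: with uuid version ids, ordering has to come from the parent links (or from some other recorded information), not from comparing ids. The helpers `leaves` and `is_ancestor` and the parent-map representation are assumptions made for this illustration.

```python
import uuid

def leaves(parents):
    """Versions that no other version derives from."""
    used_as_parent = set(parents.values())
    return {version for version in parents if version not in used_as_parent}

def is_ancestor(parents, older, newer):
    """True if `older` lies on the path from `newer` back to the root."""
    current = parents.get(newer)
    while current is not None:
        if current == older:
            return True
        current = parents.get(current)
    return False

# uuid ids carry no ordering, so the tree structure is the only guide.
root, a, b, c = (str(uuid.uuid4()) for _ in range(4))
parents = {root: None, a: root, b: a, c: a}  # b and c both branch from a
```

Here `b` and `c` are both leaves and neither is an ancestor of the other, so the tree alone cannot say which one is newer.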
Re: Introduction, "A user must expand the group, make changes, possibly add back some Input/OutputPort modules, and then group the modules again."
I didn't know you could create a group this way, by manually placing the InputPorts before grouping. However, ungrouping deletes them, so it does not look like a supported workflow. -- [RR]
- Can we make the ungrouping preserve the input/output ports? Maybe even keep the connections to other modules? This would make it trivial to ungroup-edit-regroup. Editing in a new window would be even better and could work the same. But it would require more work. [TE]
Are abstractions so broken we need to change them in 2.1? Is there a known example that breaks? Would it be enough to warn of the limitations in 2.1 and aim for the uuid version? [TE]
- I added to trac some of the issues I encountered while building the "primes" example with subworkflows (it is currently checked in with Groups). I think that is all of them; see the trac query: !closed abstraction