Difference between revisions of "UsersGuideVisTrailsPackages"

From VistrailsWiki

Revision as of 17:26, 31 January 2007

Introduction

VisTrails provides infrastructure for incorporating user-defined functionality into the main program. Specifically, users can incorporate their own visualization and simulation codes into pipelines by defining custom modules. These modules are bundled in what we call packages. A VisTrails package is simply a collection of user-written Python classes that follow a certain convention; each class represents a new module. Here is a simplified example of a user-defined module:

class Divide(Module):
    def compute(self):
        arg1 = self.getInputFromPort("arg1")
        arg2 = self.getInputFromPort("arg2")
        if arg2 == 0.0:
            raise ModuleError(self, "Division by zero")
        self.setResult("result", arg1 / arg2)

registry.addModule(Divide)
registry.addInputPort(Divide, "arg1", (basic.Float, 'dividend'))
registry.addInputPort(Divide, "arg2", (basic.Float, 'divisor'))
registry.addOutputPort(Divide, "result", (basic.Float, 'quotient'))

New VisTrails modules must subclass Module, the base class that defines basic functionality. The only required override is the compute() method, which performs the actual module computation. Inputs and outputs are specified through ports, which currently have to be explicitly registered with VisTrails. This is straightforward, and is done through method calls to the module registry. A complete documented example of a (slightly) more complicated module is available on the VistrailsPackagePythonCalcExample page.
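To make the roles of compute() and the ports concrete, here is a minimal sketch that exercises Divide outside VisTrails. The Module and ModuleError classes below are hypothetical stand-ins written for this illustration, not the real VisTrails classes, and run_divide is an assumed helper that plays the part of the interpreter.

```python
class ModuleError(Exception):
    """Stand-in for the VisTrails ModuleError (assumption, not the real class)."""
    def __init__(self, module, message):
        Exception.__init__(self, message)
        self.module = module  # the module that reported the failure


class Module(object):
    """Hypothetical stand-in for the VisTrails base class."""
    def __init__(self):
        self.inputs = {}    # port name -> value supplied by upstream modules
        self.outputs = {}   # port name -> value produced by compute()

    def getInputFromPort(self, port):
        if port not in self.inputs:
            raise ModuleError(self, "Missing value from port %s" % port)
        return self.inputs[port]

    def setResult(self, port, value):
        self.outputs[port] = value


class Divide(Module):
    # Same body as the Divide module shown in the text.
    def compute(self):
        arg1 = self.getInputFromPort("arg1")
        arg2 = self.getInputFromPort("arg2")
        if arg2 == 0.0:
            raise ModuleError(self, "Division by zero")
        self.setResult("result", arg1 / arg2)


def run_divide(a, b):
    """Assumed helper: feed the input ports, run compute(), read the output."""
    m = Divide()
    m.inputs = {"arg1": a, "arg2": b}
    m.compute()
    return m.outputs["result"]
```

Calling run_divide(6.0, 3.0) goes through the same steps a pipeline execution would: values arrive on the input ports, compute() runs, and the result is read from the output port. A zero divisor surfaces as a ModuleError carrying the offending module.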

Dealing with command line tools and side effects

In an ideal world, each module would be referentially transparent. In other words, a module's outputs should be completely determined by its inputs. This is important for provenance purposes: if modules have implicit dependencies, there is no guarantee that re-executing the process will generate the same results.
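As a small illustration outside VisTrails, the following sketch contrasts a referentially transparent function with one that has an implicit dependency. The function names and the SCALE_FACTOR environment variable are made up for this example.

```python
import os


def scale(values, factor):
    """Referentially transparent: the result depends only on the arguments."""
    return [v * factor for v in values]


def scale_from_env(values):
    """Not referentially transparent: SCALE_FACTOR (a hypothetical variable)
    is an implicit input that recorded provenance would not capture."""
    factor = float(os.environ.get("SCALE_FACTOR", "1"))
    return [v * factor for v in values]
```

Two runs of scale_from_env with identical arguments can disagree if the environment changed in between, which is exactly the situation that makes re-execution unreliable.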

However, certain modules are inherently side-effectful (reading and writing files, network access, etc.). For the common case of temporary files, VisTrails provides a convenience layer that removes part of the burden of managing them. As an illustrative example, consider one of the packages we make available for image conversion, which uses the ImageMagick suite (http://www.imagemagick.org/):

class Convert(ImageMagick):
    """Convert is the base Module for VisTrails Modules in the ImageMagick
    package that deal with operations on images. Convert is a bit of a
    misnomer since the 'convert' tool does more than simply file format
    conversion. Each subclass has a descriptive name of the operation it
    implements."""

    def create_output_file(self):
        """Creates a File with the output format given by the
        outputFormat port."""
        # Note: as written, this returns None when outputFormat has no input.
        if self.hasInputFromPort('outputFormat'):
            s = '.' + self.getInputFromPort('outputFormat')
            return self.interpreter.filePool.create_file(suffix=s)

    def geometry_description(self):
        """returns a string with the description of the geometry as
        indicated by the appropriate ports (geometry or width and height)"""
        # if complete geometry is available, ignore rest
        if self.hasInputFromPort("geometry"):
            return self.getInputFromPort("geometry")
        elif self.hasInputFromPort("width"):
            w = self.getInputFromPort("width")
            h = self.getInputFromPort("height")
            # quoted so the shell passes the geometry as a single argument
            return "'%sx%s'" % (w, h)
        else:
            raise ModuleError(self, "Needs geometry or width/height")

    def run(self, *args):
        """run(*args), runs ImageMagick's 'convert' on a shell, passing all
        arguments to the program."""
        cmdline = ("convert" + (" %s" * len(args))) % args
        if not self.__quiet:
            print cmdline
        r = os.system(cmdline)
        if r != 0:
            # ModuleError takes the failing module as its first argument
            raise ModuleError(self, "system call failed: '%s'" % cmdline)

    def compute(self):
        # the output is a pooled temporary file; only its name goes to the tool
        o = self.create_output_file()
        i = self.input_file_description()
        self.run(i, o.name)
        self.setResult("output", o)

(...)
    reg.addModule(Convert)
    reg.addInputPort(Convert, "geometry", (basic.String, 'ImageMagick geometry'))
    reg.addInputPort(Convert, "width", (basic.String, 'width of the geometry for operation'))
    reg.addInputPort(Convert, "height", (basic.String, 'height of the geometry for operation'))
    reg.addOutputPort(Convert, "output", (basic.File, 'the output file'))

This example introduces several new VisTrails features. The last line of the snippet registers an output port that provides a file. A file output immediately presents several problems when a pipeline is to be shared among users in heterogeneous environments. For example, where should the file be written? For temporary files, VisTrails provides a file pool class that manages temporary files and their lifetimes automatically, so that users don't have to worry about deleting them after execution. To create a temporary file, a module calls, for example:

fileObj = self.interpreter.filePool.create_file(suffix=".png")

fileObj then holds a module that represents a file. The file pool only creates a temporary file with write permissions; its local name is available, in this case, as fileObj.name.
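The same pattern (ask a pool for a temporary file, hand its name to an external tool, clean up later) can be sketched with the Python standard library alone. This is a minimal illustration, not the VisTrails file pool: TempFilePool, run_tool and produce_output are hypothetical names, the pool here returns a plain file name rather than a File module, and the external tool is the Python interpreter itself so that the example does not depend on ImageMagick being installed.

```python
import os
import subprocess
import sys
import tempfile


class TempFilePool(object):
    """Hypothetical stand-in for a file pool: hands out temporary files and
    deletes all of them in one call."""

    def __init__(self):
        self._names = []

    def create_file(self, suffix=""):
        # mkstemp creates the file; only its name is needed afterwards.
        fd, name = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        self._names.append(name)
        return name

    def cleanup(self):
        # Remove every file handed out so far.
        for name in self._names:
            if os.path.exists(name):
                os.remove(name)
        self._names = []


def run_tool(program, *args):
    """Run an external program without a shell; raise if it exits non-zero."""
    cmd = [program] + [str(a) for a in args]
    if subprocess.call(cmd) != 0:
        raise RuntimeError("system call failed: %r" % (cmd,))


def produce_output(pool, text):
    """Mirrors the shape of Convert.compute(): get a pooled file, then let
    an external tool write into it."""
    out = pool.create_file(suffix=".txt")
    script = "import sys; open(sys.argv[1], 'w').write(sys.argv[2])"
    run_tool(sys.executable, "-c", script, out, text)
    return out
```

Passing the arguments as a list to subprocess.call avoids building a shell command string, so values such as a geometry would not need the extra quoting that a shell-based run() requires. Calling pool.cleanup() once at the end removes every file the pool handed out.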

To be continued