FAQ

Also check our Known Issues page for troubleshooting.

Running workflows

How can I run a workflow using the command line?

(Updated for version 1.2) Call vistrails using the following options:

python vistrails.py -b path_to_vistrails_file:pipeline

where pipeline can be a version tag name or a version id.

NOTE: If you downloaded the MacOS X bundle, you can run vistrails from the command line via the following commands in the Terminal. Change the current directory to wherever VisTrails was installed (often /Applications), and then type:

Vistrails.app/Contents/MacOS/vistrails [<cmd_line_options>]

Running a Specific Workflow in Batch Mode

Using the command line, we'd like to execute a workflow multiple times, with slightly different parameters, and create a series of output files. Is this possible?

(Updated for version 1.2) We can change parameters that have an alias through the command line.

For example, the offscreen pipeline in offscreen.vt always creates a file called image.png. If you want to generate it with a different filename:

python vistrails.py -b ../examples/offscreen.vt:offscreen -a"filename=other.png"

filename in the example above is the alias name assigned to the parameter in the value method inside the String module. When running a pipeline from the command line, VisTrails will try to start the spreadsheet automatically if the pipeline requires it. For example, this other execution will also start the spreadsheet (note how the $ characters are escaped when running in bash):

python vistrails.py -b ../examples/head.vt:aliases -a"isovalue=30\$&\$diffuse_color=0.8,0.4,0.2"

You can also execute more than one pipeline on the command line:

python vistrails.py -b ../examples/head.vt:aliases ../examples/spx.vt:spx \
    -a"isovalue=30"

Use the -a parameter only once, regardless of the number of pipelines.

Running a Workflow with Specific Parameters
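The batch runs shown above can also be driven from a small script. This is only a minimal sketch, assuming the same vistrails.py entry point and the offscreen.vt example used above; the helper names build_command and run_batch and the output file names are hypothetical. Because the arguments are passed as a list rather than through a shell, the $ characters in the $&$ separator need no escaping.

```python
import subprocess


def build_command(vt_file, pipeline, aliases):
    """Build the argument list for one batch run.

    aliases maps alias names to values; several aliases are joined with
    the $&$ separator used in the command-line examples.
    """
    alias_arg = "$&$".join("%s=%s" % (name, value)
                           for name, value in aliases.items())
    return ["python", "vistrails.py", "-b",
            "%s:%s" % (vt_file, pipeline),
            "-a" + alias_arg]


def run_batch(vt_file, pipeline, alias_sets, dry_run=True):
    """Run (or, with dry_run, only print) one command per alias set."""
    commands = [build_command(vt_file, pipeline, a) for a in alias_sets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical file names: one output image per run.
    sets = [{"filename": "image_%d.png" % i} for i in range(3)]
    run_batch("../examples/offscreen.vt", "offscreen", sets)
```

With dry_run left at its default, the script only prints the commands, which is a convenient way to check them before launching VisTrails.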

I can load a vistrail, and the version tree shows up fine. However, no pipelines appear when I click on a version. What gives?

The most likely reason is that the vistrail uses a package that is not registered with VisTrails. You need to identify the needed package and add it to your .vistrails/startup.py. A single line like the following should be enough:

addPackage('enter_package_name_here')

Some packages might need more information. For example:

addPackage('afront', executable_path='/path/to/afront')

Refer to the package documentation for details. One inconvenience is that there is currently no automated way to identify the missing package. We're working on this feature for future releases.

I have a workflow that reads a file and then does some processing. The first time it runs, it executes correctly. But in subsequent runs, nothing happens.

VisTrails caches by default, so after a workflow is executed, if none of its parameters change, it won't be executed again.

If a workflow reads a file using the basic module File, VisTrails does check whether the file was modified since the last run. It does so by keeping a signature that is based on the modification time of the file. And if the file was modified, the File module and all downstream modules (the ones which depend on File) will be executed.

Note: If you would like your input and output data to be versioned, you can use the Persistence package.

If you do not want VisTrails to cache executions, you can turn off caching: go to Menu Edit -> Preferences and in the General Configuration tab, change Cache execution results to Never.

Workflow Execution
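To picture the modification-time check described above for the File module, here is a minimal sketch. It is an illustration of the general idea only, not the actual VisTrails implementation; the names file_signature and CachedReader are hypothetical.

```python
import hashlib
import os


def file_signature(path):
    """Return a signature that changes when the file's mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    data = "%s:%d" % (os.path.abspath(path), mtime)
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


class CachedReader(object):
    """Re-reads a file only when its signature differs from the last run."""

    def __init__(self):
        self._cache = {}   # path -> (signature, contents)
        self.reads = 0     # how many times the file was actually read

    def read(self, path):
        sig = file_signature(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == sig:
            return cached[1]          # unchanged: reuse the cached result
        with open(path, "rb") as f:
            contents = f.read()
        self.reads += 1
        self._cache[path] = (sig, contents)
        return contents
```

Calling read() twice on an untouched file reads it from disk once; after the file's modification time changes, the next call reads it again.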

Can VisTrails execute workflows in parallel?

The VisTrails server can only execute pipelines in parallel if there's more than one instance of VisTrails running. The command

self.rpcserver = ThreadedXMLRPCServer((self.temp_xml_rpc_options.server, self.temp_xml_rpc_options.port))

starts a multithreaded version of the XML-RPC server, so it will create a thread for each request received by the server. The problem is that Qt/PyQt does not allow these threads to create GUI objects; that is only possible in the main thread. To overcome this limitation, the multithreaded version can instantiate other single-threaded versions of VisTrails and put them in a queue, so that workflow executions and other GUI-related requests, such as generating workflow graphs and history trees, can be forwarded to this queue, and each instance takes turns answering the requests. If the results are in the cache, the multithreaded version answers the requests directly.

Note that this infrastructure works on Linux only. To make this work on Windows, you have to create a script similar to start_vistrails_xvfb.sh (located in the scripts folder) where you can send the number of other instances via command-line options to VisTrails. The command line options are:

python vistrails_server.py -T <ADDRESS> -R <PORT> -O<NUMBER_OF_OTHER_VISTRAILS_INSTANCES> [-M]&

If you want the main vistrails instance to be multithreaded, use the -M at the end.

After creating this script, update the function start_other_instances in vistrails/gui/application_server.py (lines 1007-1023) and set the script variable to point to your script. You may also have to change the arguments sent to your script (line 1016); for example, you don't need to set a virtual display. You will also need to change the path to the stop_vistrails_server.py script (on line 1026) according to your installation path.

Executing Workflows in Parallel
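The arrangement described above (a threaded XML-RPC front end that forwards work to a queue of single-threaded instances) can be sketched with the Python 3 standard library. This is not the VisTrails server code: the InstancePool class, the make_server helper and the registered method name "execute" are hypothetical, and plain callables stand in for the other VisTrails instances.

```python
import queue
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    """XML-RPC server that handles each request in its own thread."""
    daemon_threads = True


class InstancePool(object):
    """Hands out single-threaded worker instances in turn."""

    def __init__(self, instances):
        self._queue = queue.Queue()
        for inst in instances:
            self._queue.put(inst)

    def dispatch(self, request):
        inst = self._queue.get()        # blocks until an instance is free
        try:
            return inst(request)
        finally:
            self._queue.put(inst)       # back of the queue: take turns


def make_server(host, port, pool):
    """Create a threaded server that forwards requests to the pool."""
    server = ThreadedXMLRPCServer((host, port), logRequests=False)
    server.register_function(pool.dispatch, "execute")  # hypothetical name
    return server
```

Each request thread blocks in dispatch() until an instance is available, so no more requests run at once than there are instances in the pool.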

When a workflow is executed, what do the colors mean?

- lilac: module was not executed

- yellow: module is currently being executed

- green: module was successfully executed

- orange: module was cached

- red: the execution of the module failed

Workflow Execution

Building workflows

Is there a way to give each widget a "display name" in addition to the module name at the center of the widget?

Yes, a "display name" can be assigned to a module by selecting the triangle in its top right corner to open a popup menu and selecting the Set Module Label... menu item. You will then be prompted to enter the "display name".

Changing Module Labels

Is there a way to re-center the picture-in-picture (PiP) view?

Yes. If you click on the PiP window to bring it into focus, you can press Ctrl-R (or Command-R on Mac) to re-center the PiP window.

Vistrails Interaction

How do I search for a literal "?" (question mark) in the search box in the Property panel?

Since we allow regular expressions in our search box, question marks are treated as meta-characters. Thus, searching for "?" returns everything and "abc?" will return everything containing "abc". You need to use "\?" instead to search for "?". So the search for "??" would be "\?\?".

Textual Queries
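The escaping rule above can be tried out with Python's re module. This is only an illustration, assuming the search box follows a regular-expression syntax similar to Python's; the helper literal_query and the sample names are hypothetical.

```python
import re


def literal_query(text):
    """Escape regex meta-characters so the text is matched literally."""
    return re.escape(text)


names = ["what?", "why??", "abc", "ab"]

# "\?" matches a literal question mark.
one_mark = [n for n in names if re.search(r"\?", n)]

# "\?\?" matches two consecutive question marks.
two_marks = [n for n in names if re.search(literal_query("??"), n)]

print(one_mark)
print(two_marks)
```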

Using VisTrails as a server

What is the VisTrails server-mode?

Using the VisTrails server mode, it is possible to execute workflows and control VisTrails through another application. For example, the CrowdLabs Web portal (http://www.crowdlabs.org) accesses a VisTrails server to execute workflows and to retrieve and display vistrail trees and workflows.

Using VisTrails as a Server

How do I execute workflows and control VisTrails through another application?

The way you access the server is by doing XML-RPC calls. In the current VisTrails release, we include a set of PHP scripts that can talk to a VisTrails server instance. They are in the "extensions/http" folder. The files are reasonably well documented. Also, it should not be difficult to create Python scripts to access the server (just use the xmlrpclib module).
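As a starting point for such a Python script, here is a minimal sketch using the Python 3 standard library, where xmlrpclib is named xmlrpc.client. The host, port and helper names are hypothetical, and no VisTrails-specific method names are assumed; list_methods only works if the server has XML-RPC introspection enabled, which may not be the case.

```python
import xmlrpc.client   # named xmlrpclib in Python 2


def connect(host, port):
    """Create a proxy for an XML-RPC server; no request is sent yet."""
    url = "http://%s:%d" % (host, port)
    return xmlrpc.client.ServerProxy(url, allow_none=True)


def list_methods(proxy):
    """Ask the server which methods it exposes.

    Assumption: the server supports the optional XML-RPC introspection
    call system.listMethods; otherwise this raises an xmlrpc.client.Fault.
    """
    return proxy.system.listMethods()
```

Typical use would be proxy = connect("localhost", 8080), with the address your server was started with, followed by calls to the methods that the server actually provides.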

Note that the VisTrails server requires the provenance and workflows to be in a database. More detailed instructions on how to set up the server and the database are available here:

http://www.crowdlabs.org/site_media/static/dev_docs/vistrails_server_setup.html

http://www.crowdlabs.org/site_media/static/dev_docs/vistrails_database_setup.html

If what you want is just to execute a series of workflows in batch mode, a simpler solution would be to use the VisTrails client in batch mode. Chapter 12 of the user's guide contains detailed information and examples on that.

Using VisTrails as a Server

Control Flow

Note: using map

When using 'map', the module (or subworkflow) used as function port in the map module MUST be a function, i.e., it can only define 1 output port.

The Map Operator

Spreadsheet


How can I save an image from the spreadsheet?

With the focus on a spreadsheet cell, select the camera icon on the toolbar to take a snapshot. The system will prompt you for the location and file name where it should be saved. The other icons can be used for saving multiple images that can be used for generating an animation on demand. A whole sheet can also be saved by selecting Export (either from the menu or from the toolbar).

Saving a Spreadsheet Image

Is it possible to save the complete state of the spreadsheet?

Saving a Spreadsheet

Can I view multiple sheets at the same time?

Yes. Each sheet on the spreadsheet can be displayed as a dock widget separated from the main spreadsheet window by dragging its tab name out of the tab bar at the bottom of the spreadsheet.

Multiple Spreadsheets

Then, how can I put back a separated sheet?

A sheet can be docked back to the main window by dragging it back to the tab bar or by double-clicking on its title bar.

Multiple Spreadsheets

How can I order sheets on the spreadsheet?

This can be done by dragging the sheet name on the bottom tab bar and dropping it in the right place.

Multiple Spreadsheets

Can I control where a cell will be placed on the spreadsheet window?

By default, an unoccupied cell on the active sheet will be chosen to display the result. However, you can specify exactly in the pipeline where a spreadsheet cell will be placed by using CellLocation and SheetReference. CellLocation specifies the location (row and column) of a cell when connecting to a spreadsheet cell (VTKCell, ImageViewerCell, ...). Similarly, a SheetReference module (when connecting to a CellLocation) will specify which sheet the cell will be put on given its name, minimum row size and minimum column size. There is an example of this in examples/vtk.xml (select the version below Double Renderer).

Sending Output to the Spreadsheet

How do I output results to the spreadsheet?

By inspecting the VisTrails Spreadsheet package (in the list of packages, to the left of the pipeline builder), you can see there are built-in cells for different kinds of data, e.g., RichTextCell to display HTML and plain text. You (the user) can also define new cell types to display application-specific data. For example, we have developed VtkCell, MplFigureCell, and OpenGLCell. It is possible to display pretty much anything on the Spreadsheet!

Sending Output to the Spreadsheet

Examples of writing cell modules can be found in:

- RichTextCell: packages/spreadsheet/widgets/richtext/richtext.py

- VTK: packages/vtk/vtkcell.py

Here is the summary of some requirements on a cell widget:

(1) It must be a Qt widget. It should inherit from spreadsheet_cell.QCellWidget in the spreadsheet package. Although any Qt Widget would work, certain features such as animation will not be available (without rewriting it).

(2) It must re-implement the updateContents() function to take a set of inputs (usually coming from input ports of a wrapper Module) and display on the cells. VisTrails uses this function to update/reuse cells on the spreadsheet when new data comes in.

(3) It needs a wrapper VisTrails Module (inherited from basic_widgets.SpreadsheetCell of the spreadsheet package). Inside the compute() method of this module, it may call self.display(CellWidgetType, (inputs)) to trigger the display event on the spreadsheet.

Advanced Cell Options
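To show how the three requirements above fit together, here is a rough sketch that runs without VisTrails or Qt. The QCellWidget and SpreadsheetCell classes below are simplified stand-ins written for this example, not the real classes from the spreadsheet package, and TextCellWidget and TextCell are hypothetical names. In a real package the widget would also be a Qt widget and the inputs would come from the module's input ports.

```python
# Stand-ins so the sketch runs on its own. In a real package these would be
# spreadsheet_cell.QCellWidget and basic_widgets.SpreadsheetCell.
class QCellWidget(object):
    def updateContents(self, inputs):
        raise NotImplementedError


class SpreadsheetCell(object):
    def __init__(self, inputs):
        self._inputs = inputs     # stand-in for values read from input ports
        self.widget = None

    def display(self, widget_type, inputs):
        # The real spreadsheet creates or reuses a cell; this stand-in only
        # instantiates the widget once and hands it the inputs.
        if not isinstance(self.widget, widget_type):
            self.widget = widget_type()
        self.widget.updateContents(inputs)
        return self.widget


# Requirements (1) and (2): a widget that re-implements updateContents().
class TextCellWidget(QCellWidget):
    def __init__(self):
        self.text = ""

    def updateContents(self, inputs):
        (self.text,) = inputs


# Requirement (3): a wrapper module whose compute() triggers the display.
class TextCell(SpreadsheetCell):
    def compute(self):
        return self.display(TextCellWidget, (self._inputs["text"],))
```

Running compute() a second time reuses the same widget and only updates its contents, which mirrors the update/reuse role of updateContents() described in requirement (2).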

How do I control the default number of cells in the spreadsheet?

You can configure the rowCount and colCount using the preferences dialog. Just go to the Module Packages tab, select spreadsheet in the "Enabled packages" and press the Configure button. Then a list of all the configuration options for the spreadsheet will show up.

Custom Layout Options

Is it possible to launch a web browser from the vistrails spreadsheet? We would like to output several urls from a parameter sweep and then have the option to click on each one to view the resulting page. I can view the page within the spreadsheet, but it is really too crowded.

Currently, there isn't a widget that provides exactly this functionality, but I can think of a few solutions that may work for you:

(1) You can use parameter exploration to generate multiple sheets so you might have an exploration that opens each page in a new sheet. Use the third column/dimension in the exploration interface to have a parameter span sheets.

(2) The spreadsheet is extensible so you can write a custom spreadsheet cell widget that has a button or label with the desired link (a QLabel with openExternalLinks set to True, for example).

(3) You can tweak the existing RichTextCell by adding the line "self.browser.setOpenExternalLinks(True)" at line 63 of the source file "vistrails/packages/spreadsheet/widgets/richtext/richtext.py". Then, if your workflow creates a file with html markup text like "<a href="http://www.vistrails.org">VisTrails</a>" connected to a RichTextCell, clicking on the rendered link in the cell will open it in a web browser. You need to add the aforementioned line to the source to let Qt know that you want the link opened externally; by default, it will just issue an event that isn't processed.

Launching a Web Browser

Integrating your software into VisTrails

How can I integrate my own program into VisTrails?

The easiest way is to create a package. Writing a package is often very simple; instructions on how to do it are here: UsersGuideVisTrailsPackages

You can also dynamically generate modules. For an example see:

http://www.vistrails.org/index.php/UsersGuideVisTrailsPackages#Packages_that_generate_modules_dynamically

In particular, see the new_module call which uses python's type() function to generate new classes dynamically.
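For readers unfamiliar with the three-argument form of type(), here is a small self-contained sketch of the technique. The Module class below is a stand-in for the VisTrails base class, and make_module is a hypothetical helper written for this example; it is not the new_module function from the linked page.

```python
class Module(object):
    """Stand-in for the VisTrails Module base class."""

    def compute(self):
        raise NotImplementedError


def make_module(name, greeting):
    """Create a new Module subclass at run time with type()."""
    def compute(self):
        return "%s from %s" % (greeting, type(self).__name__)
    # type(name, bases, namespace) builds a class just like a class statement.
    return type(name, (Module,), {"compute": compute})


# Generate one class per entry, e.g. from a list read at package load time.
generated = [make_module(n, "hello") for n in ("Alpha", "Beta")]
```

Each generated class is an ordinary subclass of Module, so it can be handled like any hand-written class.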

How do modules deal with multiple inputs in a same port?

(And should that even be allowed?)

For compatibility reasons, we do need to allow multiple connections to an input port. However, most package developers should never have to use this, and so we do our best to hide it. The default behavior for getting inputs from a port, then, is to always return a single input.

If your module needs multiple inputs connected to a single port, use the 'forceGetInputListFromPort' method. It will return a list of all the data items coming through the port. The spreadsheet package uses this feature, so look there for usage examples (vistrails/packages/spreadsheet/basic_widgets.py).

Are there mechanisms for attaching widgets to different modules/parameters?

Right now, we have a mechanism for putting a specific widget for an input port. For example, if a port is SetColor(red, green, blue), we can put a color wheel widget there. Or we can also replace the SetFileName port with a File Widget. However, this is not per parameter (only per port). We are currently working on this problem.

Can I organize my package so it appears hierarchical in the module palette?

Yes. Use the namespace keyword argument when adding the module to the registry. For example,

registry.add_module(MyModule, namespace='MyNamespace')

Module Hierarchy and Visibility

Can I nest namespaces?

Yes. Use the '|' character to separate the levels of the hierarchy. For example,

registry.add_module(MyModule, namespace='ParentNamespace|ChildNamespace')

Module Hierarchy and Visibility
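To make the '|' convention concrete, the following sketch shows how such a namespace string maps onto a nested hierarchy. It only illustrates the string format; it is not how the module palette is implemented, and the function add_to_tree and the "_modules" key are hypothetical.

```python
def add_to_tree(tree, namespace, module_name):
    """Insert module_name under the nested namespace path in tree."""
    node = tree
    parts = namespace.split("|") if namespace else []
    for part in parts:
        node = node.setdefault(part, {})      # descend, creating levels
    node.setdefault("_modules", []).append(module_name)
    return tree


tree = {}
add_to_tree(tree, "ParentNamespace|ChildNamespace", "MyModule")
add_to_tree(tree, "ParentNamespace", "OtherModule")
print(tree)
```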

Are there shortcuts for registry initialization?

Yes. If you define _modules as a list of classes in the __init__.py file of your package, VisTrails will attempt to load all classes specified as modules. You can provide add_module options as keyword arguments by specifying a tuple (class, kwargs) in the list. For example:

_modules = [MyModule1, (MyModule2, {'namespace': 'MyNamespace'})]

In addition, you need to identify the ports of your modules as a field in your class by defining _input_ports and _output_ports lists. Here, the items in each list must be tuples of the form (portName, portSignature, optional=False, sort_key=-1). For example:

class MyModule(Module):
    def compute(self):
        pass

    _input_ports = [('firstInput', String), ('secondInput', Integer, True)]
    _output_ports = [('firstOutput', String), ('secondOutput', String)]

Options for Configuring Modules and Ports
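The tuple form (portName, portSignature, optional=False, sort_key=-1) means that the last two items may be left out. A small sketch of how the defaults fill in, using plain strings in place of module classes; normalize_port is a hypothetical helper for illustration, not part of VisTrails:

```python
def normalize_port(spec):
    """Expand a port tuple to (name, signature, optional, sort_key)."""
    defaults = (False, -1)                 # optional, sort_key
    if not 2 <= len(spec) <= 4:
        raise ValueError("bad port spec: %r" % (spec,))
    # Append only the defaults for the items that were left out.
    return tuple(spec) + defaults[len(spec) - 2:]


ports = [("firstInput", "String"), ("secondInput", "Integer", True)]
for port in ports:
    print(normalize_port(port))
```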

Can I define ports to be of types that I do not import into my package?

Yes. You can pass an identifier string as the portSignature instead. The port_signature string is defined by:

<module_string> := <package_identifier>:[<namespace>|]<module_name>,
<port_signature> := (<module_string>*)

For example,

registry.add_input_port(MyModule, 'myInputPort', '(edu.utah.sci.vistrails.basic:String)')

or

 _input_ports = [('myInputPort', '(edu.utah.sci.vistrails.basic:String)')]
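To check that a signature string follows the format above, it can help to split it into its parts. The sketch below is a hypothetical helper, not part of VisTrails, and it assumes that multiple module strings inside the parentheses are separated by commas.

```python
def parse_port_signature(signature):
    """Parse '(pkg:[ns|]Name,...)' into (package, namespace, name) tuples."""
    if not (signature.startswith("(") and signature.endswith(")")):
        raise ValueError("signature must be wrapped in parentheses")
    result = []
    body = signature[1:-1].strip()
    if not body:
        return result
    for module_string in body.split(","):   # assumption: comma-separated
        package, _, rest = module_string.strip().partition(":")
        if not package or not rest:
            raise ValueError("bad module string: %r" % module_string)
        # Everything before the last '|' is the (possibly nested) namespace.
        namespace, _, name = rest.rpartition("|")
        result.append((package, namespace, name))
    return result


print(parse_port_signature("(edu.utah.sci.vistrails.basic:String)"))
```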

What do I need to change in my package to make it reloadable (new in v1.4.2)?

See UsersGuideVisTrailsPackages for an explanation.

Can I add default values or labels for parameters?

Yes. Versions 1.4 and greater support these features. See UsersGuideVisTrailsPackages for more details.

I want to write a module to load HDF data whose output (e.g., data, string) varies according to the input I give it. Is it possible to do this in VisTrails, and if yes, how can I do that? Ideally, I would like to avoid having to change the connection of my output every time I change the input.

There are a few ways to tackle this, and each has its own benefits and pitfalls. Firstly, module connections do respect class hierarchies as we're familiar with in object-oriented languages. For instance, a module can output a Constant, of which String, Float, Integer, etc. are specializations. In this way, you can have a subclass of something like HDFData be passed out of the module and the connections will be established regardless of the sub-type. This is a bit dangerous though. Modules downstream of such a class may not really know how to operate on certain types derived from the super-class. Extreme care must be taken both when creating the modules and when connecting them to prevent things like this from happening.

A second method that I employ in several different packages is the idea of a container class. For instance, the NumSciPy package uses a relatively generic container "Numpy Array" to encapsulate the data. Of course, these encapsulating objects can store dictionaries that other modules can easily access and understand how to operate on. Although this method is slightly more work, the stricter typing of ports is beneficial, particularly when interfacing with other packages that may depend on strongly typed constants (for example).

I need to determine, at run-time, whether or not a "child" module is attached to the output port of a "parent" module. (I do not specifically need to know which child; just if there is one).

The outputPorts dictionary of the base Module stores this information. Thus, you should be able to check

("myPortName" in self.outputPorts)

on the parent module to check if there are any downstream connections from the port "myPortName". This might be used, for example, to only set results for output ports that will be used. Note, however, that the caching algorithm assumes that all outputs are set, so adding a new connection to a previously unconnected output port will not work as desired if that module is cached. For this reason, I would currently recommend making such a module not cacheable. Another possibility is overriding the update() method to compare the current outputPorts with those seen at the last execution and to reset the upToDate flag if they differ. In a single, limited test, this seemed to work, but be warned that it is not fully tested. Here is an example:

class TestModule(Module):
    _output_ports = [('a1', '(edu.utah.sci.vistrails.basic:String)'),
                     ('a2', '(edu.utah.sci.vistrails.basic:String)')]
    def __init__(self):
        Module.__init__(self)
        self._cached_output_ports = set()
    
    def update(self):
        if len(set(self.outputPorts) - self._cached_output_ports) > 0:
            self.upToDate = False
        Module.update(self)
    
    def compute(self):
        if "a1" in self.outputPorts:
            self.setResult("a1", "test")
        if "a2" in self.outputPorts:
            self.setResult("a2", "test2")
        self._cached_output_ports = set(self.outputPorts)

How can I make a module not display in the modules list?

You should set the abstract parameter to True when adding the module to the registry. Using the original syntax, this looks like:

def initialize():
    reg = core.modules.module_registry.get_module_registry()
    reg.add_module(InvisibleModule, abstract=True)
    # ...

With the _modules list shortcut (for more details, see the FAQ section on this), you include it in a kwargs dict as part of a module tuple:

_modules = [AnotherModule, (InvisibleModule, {'abstract': True})]

There is also a 'hide_descriptor' parameter that prevents the module from appearing in the module palette without declaring it to be abstract.

The technical difference between the two is that 'abstract' will not add the item to the module palette while 'hide_descriptor' does add the item but immediately hides it. If the module should never be instantiated in a workflow, declare it abstract. If you don't want users to be able to add the module to a pipeline, but you have code that may add it programmatically, declare it with hide_descriptor=True.

The Console

Where should I go to find out what I can call from the console and how to import it?

We have tried to make some methods more accessible in the console via an api. You can import the api via import api in the console and see the available methods with dir(api). To open a vistrail:

import api
api.open_vistrail_from_file('/Applications/VisTrails/examples/terminator.vt')

To execute a version of a workflow, you currently have to go through the controller:

api.select_version('Histogram')
api.get_current_controller().execute_current_workflow()

Currently, only a subset of VisTrails functionality is directly available from the api. However, since VisTrails is written in Python, you can dig down starting with the VistrailsApplication or controller object to expose most of our internal methods. If you have suggestions for calls to be added to the api, please let us know.

One other feature that we're working on, but which is still in progress, is the ability to construct workflows via the console. For example:

vtk = load_package('edu.utah.sci.vistrails.vtk')
vtk.vtkDataSetReader() # adds a vtkDataSetReader module to the pipeline
# click on the new module
a = selected_modules()[0] # get the one currently selected module
a.SetFile('/vistrails/examples/data/head120.vtk') # sets the SetFile parameter for the data set reader
b = vtk.vtkContourFilter() # adds a vtkContourFilter module to the pipeline and saves to var b
b.SetInputConnection0(a.GetOutputPort0()) # connects a's GetOutputPort0 port to b's SetInputConnection0

Persistence Package

How do I use the output of one workflow as the input for another using the persistence package?

You need to configure the persistence modules using the module's configuration dialog. After adding a PersistentOutputFile to the workflow, click on the triangle in the upper-right corner of the PersistentOutputFile, and select "Edit Configuration" from the menu that appears. In this dialog, select "Create New Reference" and give the reference a name (and any space-delimited tags). Upon running that workflow, the data will be written to the persistent store. In the second workflow where you wish to use that file, add a PersistentInputFile and go to its configuration dialog in the same manner as with the output file. In that dialog, select "Use Existing Reference" and select the data that you just added in the first workflow in the list of files below. Now, when you run that workflow, it will grab the data from the persistent store.

Here is an example: offscreen_persistent.vt. Run the "persistent offscreen" workflow first, and then run the "display persistent output" to use the output of the first workflow as the input for the second.

VTK

Given a VTK visualization, how can I generate a webpage from it?

Check out the html pipeline in offscreen.xml.

I'm trying to use VTK, but there doesn't seem to be any output. What is wrong?

To use VTK in VisTrails, you need a slightly different way of connecting the renderer modules. Instead of using the standard RenderWindow/RenderWindowInteractor infrastructure, you simply connect the renderer to a VTKCell. The examples directory in the distribution has several VTK examples that illustrate this.

I am trying to add a module to the workflow via Python, but how can I access vtk modules?

Here's an example:

import api

vtvtk = 'edu.utah.sci.vistrails.vtk'

module = api.add_module(0, 0, vtvtk, 'vtkContourFilter', )

The third argument in add_module is the package identifier. You can find this in the "Module Packages" panel of the Preferences; just click on the package you're interested in and it will appear in the information on the right.

matplotlib

I'm experiencing a problem with Latex labels and the matplotlib that comes with VisTrails 1.5. The script below entered to the interpreter that comes with VT is sufficient to reproduce it.

  import matplotlib.pyplot as plt
  plt.plot([1,2,3],[1,2,3])
  plt.xlabel("$foo$")

Remove your ~/.matplotlib folder and restart VisTrails.

VisTrails Development

I would like to build VisTrails from source. Are there instructions on how to do this?

Yes! Take a look at http://www.vistrails.org/index.php/Mac_Intel_Instructions

Installing VisTrails from source

Accessing Provenance Information

How do I access the information in the execution log?

The code responsible for storing execution information is located in the "core/log" directories, and the code that generates much of that information is in "core/interpreter/cached.py". Modules can add execution-specific annotations to provenance via annotate() calls during execution, but much of the data (like timing and errors) is captured by the LogController and CachedInterpreter (the execution engine) objects. To analyze the log from a vistrail (.vt) file, you might have something like the following:

import core.log.log
import db.services.io

def run(fname):
    # open the .vt bundle specified by the filename "fname"
    bundle = db.services.io.open_vistrail_bundle_from_zip_xml(fname)[0]
    # get the log filename
    log_fname = bundle.vistrail.db_log_filename
    if log_fname is not None:
        # open the log
        log = db.services.io.open_log_from_xml(log_fname, True)
        # convert the log from a db object
        core.log.log.Log.convert(log)
        for workflow_exec in log.workflow_execs:
            print 'workflow version:', workflow_exec.parent_version
            print 'time started:', workflow_exec.ts_start
            print 'time ended:', workflow_exec.ts_end
            print 'modules executed:', [i.module_id
                                        for i in workflow_exec.item_execs]

if __name__ == '__main__':
    run("some_vistrail.vt")

You should be able to see what information is available by looking at the "core/log" classes.

Accessing the Execution Log
