Wrapping command line tools using package CLTools

Package CLTools

The package CLTools provide a way to wrap command line tools so that they can be used as modules in VisTrails. It includes a wizard that simplifies the creation of wrappers. To use the package, enable CLTools in the package configuration window. The package will be empty until you add a wrapper for a command line tool. When you have added a wrapper you need to reload the wrappers by either pressing the reload button in the wizard, reloading the CLTools package, or selecting Packages->CLTools->Reload All Scripts on the menu.

Using the CLTools Wizard

You can run the Wizard from within VisTrails. First, make sure the CLTools package is enabled. Then, on the menu, select Packages->CLTools->Open Wizard.

Or, to launch the wizard from the command line run: python vistrails/package/CLTools/wizard.py

The wizard allows you to create and edit wrappers for command line tools. Input/output ports can be created as arguments to the command or using pipes (stdin, stdout, or stderr). Figure 1.1 shows the main interface. Command line arguments can be added, removed and rearranged. Pipes can be added and configured. There is a preview line where you can see how your command will look when executed. You can also push the preview button to see which ports will be available for the vistrails module, as shown in the bottom right. This example shows some of the most common ways to specify arguments. In order: The standard output is used as a string output port, an integer attribute using the -i flag, a boolean flag -A that can be turned on or off, an input file using a prefix, an output file using the -o flag, and finally a simple string input. Note the way arguments correspond to ports in the bottom right.

_images/wizard.png

Figure 1.1 - CLTools Wizard main window

Arguments can represent either input ports, output ports, both, or constant strings. Ports can handle different types such as boolean flags, strings, integers, floats, or files. Lists of strings and files are also possible. Each argument can have a flag before it such as -f or a prefix such as --file=.

A file ending can be specified for files that are used as outputs using file suffix.

You can view and import flags from man and help pages (See Figure 1.2).

_images/import.png

Figure 1.2 - Import Arguments Window

Files should be saved as {modulename}.clt in the directory .vistrails/CLTools/

Supported flags:

-c   Import a command with arguments automatically
     For example, to create a wrapper for ls with two flags -l and -A run:
     python wizard.py -c ls -l -A

Try it Now!

Create a wrapper that takes a file as input and generate a file as output using -o. The ports should always be visible. The command looks like:

filter infile -o outfile

Your wrapper should look like in figure Figure 1.3. Note that the order of the arguments is always preserved:

_images/inputoutputfile.png

Figure 1.3 - An infile outfile wrapper

Port visibility

Figure 1.4 shows how the visible setting affects ports in VisTrails. Visible ports are meant to be connected to other modules, and are shown as square input or output ports on the module, while non-visible ports are meant to be optional, or added as parameters on the input port list to the right. Non-visible ports can be made visible on the module by clicking on the left side of the ports pane, so that a eye icon is displayed. The example below has 2 visible input ports and one visible output port. The input list to the right shows available inputs, bot visible and non-visible. The first input in the input list to the right is visible by default, which is shown by a greyed-out eye. The second port is non-visible by default but has been made visible as shown by the eye icon. The second input is non-visible but can be made visible on the module by clicking so that the eye icon becomes visible.

_images/visibility.png

Figure 1.4 - Port visibility in VisTrails

Environment Variables

There are three ways to set environment variables in CLTools. If your environment variables is platform-dependent, you should set the env configuration variable for the CLTools package. In VisTrails, go to the Preferences->Module Packages dialog, select CLTools, make sure it is enabled, and select Configure.... Set the env variable to the preferred environment. Separate name and value using = and variables using ;, like this:

PATH=/my/custom/path;DEBUG=;MYVAR=32

If you want to specify the variables in your workflow, you can enable the env input port on your module by checking the env option in the top toolbar in the CLTools wizard. Then you can specify environment variables either as parameters to your module or by connecting the env input port to other modules. Multiple parameters can be specified as a single string or by adding multiple env parameters. These variables overrides variables specified using the other two methods.

For modules that always need the same environment variables, they can be added to the module by editing the .clt file directly and adding an env entry in the options section as shown below. These variables overrides the ones specified in the CLTools configuration:

{
    "command": "ls",
    "options": {
        "env": "MYVAR=/my/custom/path;MYVAR2=64"
    }
}

Note that if you replace e.g. the PATH variable, you should include the existing path, which can be found by running e.g. echo $PATH on the command line.

Setting working directory

The Directory field to the right of the command field can be used to specify the working directory where the command will be executed. It does not specify the directory where the command is found. Use the PATH enviroment variable for that.

InputOutput files

The InputOutput port should be used for commands that modifies a file in-place, so that it is used both as and input and an output. An example of using the InputOutput module is shown in Figure 1.5. When executed, the input file will be copied to a temporary file before it is passed to the command and used as an output. This is because you should not (if you can avoid it) modify the inputs to your modules, because they may be used by other modules, or re-executed by the same module. It may be useful to set the file suffix attribute to make sure the copied file is of the same type as the original. There is currently no way of passing the original file to the command, since it is discouraged. But if this is necessary in a particular case, CLTools can be easily modified to do this.

_images/inputoutputport.png

Figure 1.5 - Example of an InputOutput port

Creating a standalone package

When you have a working set of wrappers and want to distribute them, you should put them in a separate module package. This allows you to name and version your package, and makes sure there are no conflicts with modules using the same name as yours. One warning: workflows using the old modules will need to be recreated to use the modules in this new package instead, so it is better to start building workflows after a separate package has been created. Below are the steps to follow in order to set up a new package.

  1. Create a new directory in $HOME/.vistrails/userpackages/
  2. Copy __init__.py and init.py from vistrails/packages/CLTools to this new directory
  3. Update name, identifier, and version in __init__.py to the desired values
  4. Move all desired tools (*.clt files) to this new directory
  5. Enable and test your new package!

File Format

The wrapper is stored as a JSON file and can be edited using a text editor if needed. It uses the following schema:

ROOT is a dict with the following possible keys:

  • command (required) - value is the command to execute like “cat” or “/home/tommy/cat”
  • stdin - handle stdin - value is a 3-list [“port name”, CLASS, OPTIONDICT]
  • stdout - handle stdout - value is a 3-list [“port name”, CLASS, OPTIONDICT]
  • stderr - handle stdout - value is a 3-list [“port name”, CLASS, OPTIONDICT]
  • args - list of ordered arguments that can either be constants, inputs, or outputs. See ARG.
  • dir - value is the working directory to execute the command from
  • options - a dict of module options - see OPTIONDICT

OPTIONDICT is a dict with module specific options, recognized options are:

  • std_using_files - connect files to pipes so that they need not be stored in memory. This is useful for large files but may be unsafe since it does not use subprocess.communicate
  • env - A list of environment variables to set when executing the command, with entries separated by ; and key/value pairs separated by =. This overrides all other environment variables set, except for the env_port, and should only be used when they are not expected to change. It can only be set by editing the .clt files directly with a text editor.
  • env_port - Set to add an input port env for specifying the environment variables to use, this overrides all other environment variables set

ARG is a 4-list containing [TYPE, “name”, KLASS, ARGOPTIONDICT] TYPE is one of:

  • input - create input port for this arg
  • output - create output port for this arg
  • inputoutput - create both input and output port for this arg. The type must be File and a copy of the original file will be processed and used as output.
  • constant - use “port name” directly as a constant string

CLASS indicates the port type and can be one of the following. String is used by default.

  • File - A vistrails File type. The filename will be used as the argument
  • String - A vistrails String type. The string will be used as the argument
  • Integer - A vistrails Integer type. Its string value will be used as the argument
  • Float - A vistrails Float type. Its string value will be used as the argument
  • Flag - A vistrails Bool type. A boolean flag that when set to true will add the value of the argument to the command.
  • List - A list of values of the type specified by the type option. All values in the list will be added as arguments.

ARGOPTIONDICT is a dict containing argument options. recognized options are:

  • type: CLASS - used by List-types to specify subtype.
  • flag: name - Append name as a short-style flag before the specified argument. If type is List it is appended before each item
  • prefix: name - Append name as a long-style prefix to the final argument. If it is also a list it is appended to each item.
  • required: None - Makes the port always visible in VisTrails.
  • suffix: name - Specifies the file ending for created files

Try it Now!

Wrap the command “cat” that takes 2 files as input named “first” and “second”. Also take a list of files as input named “rest”. Catch stdout as file, name it “combined”. Catch stderr as string, name it “stderr”. Show “first” and “combined” by default.

Your wrapper should now look like this:

{"command": "cat",
 "args": [["input", "first", "File", {"required":""}],
          ["input", "second", "File", {}],
          ["input", "rest", "List", {"type":"File"}]],
"stdout": ["combined", "File", {"required":""}],
"stderr": ["stderr", "String", {}]
}

Save as {yourhomedirectory}/.vistrails/CLTools/cat.clt Reload CLTools package in VisTrails. Test the new module.