.. _chap-controlflow:

*************************
Control Flow in VisTrails
*************************

Scientific workflows usually follow a dataflow model, but, in some cases,
control structures, including loops and conditionals, are necessary to
accomplish certain tasks. |vistrails| provides the ``Control Flow``
package to support these and other structures.


The Map operator
================

In functional programming, ``map`` is a high-order function that applies 
a given function to a list (each element of the list is processed using this 
function) and returns a sequence of results. The ``Map`` module 
provides this functionality for workflows in |vistrails|. Note that this
module provides simple looping as it can be used to iterate through a list
of inputs.

In order to use the ``Map`` module, a ``ListOfElements``
type representing a list of data structure is also provided. Two additional
modules help users combine elements of lists: ``Dot`` combines the
elements of two lists like the dot product (the first element of the first list
is combined with the first of the second one, the second element of the first
list is combined with the second of the second one, and so on), and
``Cross`` combines the elements as the cross product (all the possible
combinations between the elements of the lists); both modules return a list of
tuples. In addition to two lists of elements as input ports, they take an
optional boolean input named "CombineTuple". This input is useful
when one or both input lists have tuples as elements; if this port is selected,
and its value is ``False``, the elements of the list will not be
combined in just one tuple, |eg| (1, 2) + 3 :math:`\rightarrow` ((1, 2), 3);  otherwise, the elements will be combined, |eg| (1, 2) + 3 :math:`\rightarrow` (1, 2, 3).

The Map module has four input ports:

* "FunctionPort": this port receives the module (via the "self" output port) that represents the function to be applied for each element of the input list; if the function uses more than one module, you must use a ``Group`` (see Chapter :ref:`chap-creating`) or a ``SubWorkflow`` and connect that composite module to this port;
* "InputPort": this port receives a list of the names of the input ports that represent the individual arguments of the function;
* "OutputPort": this port receives the name of the output port that represents the individual result of the function;
* "InputList": this port receives the input list for the loop; it must be a list of tuples if more than one function input port was chosen.


The output port "Result" produces a list of results, one for each element in the input list.

To better show how to use the ``Map`` module, let's use a workflow as
an example. Inside the "examples" directory of the |vistrails|
distribution, open the "triangle_area.vt" vistrail. Now, select the
"Surface Area" version. This version basically calculates the area
of a given isosurface. We are going to modify this version, in order to
calculate the areas of the isosurface given by contour values in a list.
Then, we will create a 2D plot to show all the areas.

Begin by deleting the ``StandardOutput`` module, and the connection
between the ``vtkDataSetReader`` and the ``vtkContourFilter``
modules. Then, drag the following modules to the canvas:

* ``Map``
* ``ListOfElements``
* ``Cross``
* ``MplPlot`` (under "matplotlib")
* ``MplFigure`` (under "matplotlib")
* ``MplFigureCell`` (under "matplotlib")
* ``InputPort`` (under "Basic Modules") - you will need two of them
* ``OutputPort`` (under "Basic Modules")
* ``PythonSource`` (under "Basic Modules")

Notice that when you drag ``Map`` to the pipeline canvas it will be
drawn in a different shape from the other modules. This is a visual cue to
help distinguish control modules from other modules. All control modules
have the same shape.

Select the ``vtkContourFilter`` module and delete its method "SetValue"
in the ``Set Methods`` container. Then, open its configuration dialog
(you can use the "Ctrl-E" or "Command-E" keyboard shortcut) and
enable this method (the input port "SetValue") by clicking on it,
and pressing ``OK``.

Then, connect the modules as shown in Figure :ref:`fig-controlflow-calculate_area`.

.. _fig-controlflow-calculate_area:

.. figure:: figures/controlflow/CalculateArea.png
   :align: center
   :width: 1.8in

   Connecting a subset of the modules to be grouped as a ``SubWorkflow``

These modules represent the function we wish to map: each element of the input list will be processed using them. Because we have more than one module, we need to create a ``Group`` or a ``SubWorkflow`` to identify the entire function. The ``InputPort`` and the ``OutputPort`` modules are
necessary to expose these ports in the ``Group``/``SubWorkflow``
structure.

In this example, we will use a ``SubWorkflow`` structure. Select all the
modules shown in Figure :ref:`fig-controlflow-calculate_area`, go to the
``Edit`` menu, and then click on ``Make SubWorkflow``. You can
name it ``CalculateArea``. Select this SubWorkflow and open its
configuration. When the configuration dialog opens, enable the output port
"self" and press ``OK``. You will need this port to connect to
the ``Map`` module.

.. topic:: Note

   When using ``Map``, the module (or subworkflow) used as function port in the map module MUST be a function, i.e., it can only define 1 output port.

Now, select the ``MplPlot`` module and open its configuration dialog. Inside
it, add two input ports of type ``ListOfElements``: "InputList" and
"X_Values". Also, copy the code listed below, in order to create the
necessary information for the 2D plot, into the source text area and save your
changes using the ``OK`` button.

.. code-block:: python

   subplot(212)

   dashes = [1, 3]
   list1 = self.getInputFromPort("InputList")
   list2 = self.getInputFromPort("X_values")
   list3 = []

   for i in xrange(len(list1)):
       list3.append(list2[i][1])

   l, = plot(list3, list1, marker="o", markerfacecolor="red",
        markersize=7, label="IsoSurface Areas", linewidth=1.5)

   l.set_dashes(dashes)

Next, edit the ``PythonSource`` module by adding an output port "List"
of type ``ListOfElements``, copying the following code to the source text area,
and saving these changes.  The code will create a range of contour values that we will
use as our input list.

.. code-block:: python

   result = []

   for i in xrange(4, 256, 4):
       result.append(i)

   self.setResult("List", result)

It may be helpful to identify this ``PythonSource`` module by labeling it as
``RangeList``. Connect all the modules as shown in
Figure :ref:`fig-controlflow-map_workflow`.

.. _fig-controlflow-map_workflow:

.. figure:: figures/controlflow/Map_Workflow.png
   :align: center
   :width: 4in

   All the modules connected in the canvas

You will set some parameters now:

* ``HTTPFile``: set the parameter "url" to http://www.sci.utah.edu/~cscheid/stuff/head.120.vtk
* ``ListOfElements``: set the parameter "value" to *[0]*
* ``Map``: set the parameter "InputPort" to *["SetValue"]* and the parameter "OutputPort" to *GetSurfaceArea*


The workflow is now ready to be executed. When you execute the workflow, you will
see the SubWorkflow ``CalculateArea`` executing several times (one time
for each value of the input list). It's important to notice that, although only the
module connected to ``Map`` (in this example, the SubWorkflow
``Calculate Area``) will be in the loop, the modules above it will be also
used in each loop iteration; the difference is that they are going to be executed
only for the first iteration; in all other iterations, the results will be taken
from the cache.

When the workflow finishes its execution, the |vistrails| Spreadsheet will contain
a 2D plot (Figure :ref:`fig-controlflow-map_spreadsheet`).

.. _fig-controlflow-map_spreadsheet:

.. figure:: figures/controlflow/Map_Spreadsheet.png
   :align: center
   :width: 100%

   The result in the |vistrails| Spreadsheet

This example can be found in the version "Surface Area with Map", inside
the "triangle_area.vt" vistrail.


Filtering results
=================

When computing large lists of results, it can be useful to selectively reduce the list during execution to avoid unnecessary computation.

The ``Filter`` module was developed to address this issue. It receives an
input list and, based on a specified boolean condition, returns only elements of
the list that satisfy the condition. Its ports are the same as those in the
``Map`` module. The difference between these modules is related to the
structure: in ``Filter``, the output port "FunctionPort" cannot
receive any function, but only a condition; in other words, the selected port in
"OutputPort" must return a boolean value or ``Filter`` will not
work. Thus, ``Filter`` will not return a list with boolean values as
``Map`` would do, but rather the elements of the input list for which
the condition evaluated to ``True``.

To better understand how ``Filter`` works, let's modify our earlier example
to filter out areas less than 200,000. With the previous vistrail open (you can
use the "Surface Area with Map" version), add the following modules to the
canvas:

* ``Filter``
* ``PythonSource`` (under "Basic Modules")

Edit the configuration of ``PythonSource`` by adding an input port of type
``Float`` named "Area", and an output port of type ``Boolean``
named "Condition", and writing the following code inside the source text
area:

.. code-block:: python

   area = self.getInputFromPort("Area")

   if area>200000.00:
       self.setResult("Condition", True)
   else:
       self.setResult("Condition", False)

Press the ``OK`` button. You can label this ``PythonSource`` as
``FilterCondition``. Now, reorganize the modules in the canvas as shown in
Figure :ref:`fig-controlflow-mapandfilter_workflow`.

.. _fig-controlflow-mapandfilter_workflow:

.. figure:: figures/controlflow/MapAndFilter_Workflow.png
   :align: center
   :width: 3.8in

   The new organization of the modules in the canvas

Select the ``Filter`` module and set the values of its parameters to the
following:

* "InputPort": *["Area"]*
* "OutputPort": *Condition*

When you execute this workflow, it will generate another plot that is similar to
the one from the ``Map`` example, but only areas above 200,000 are
considered (Figure :ref:`fig-controlflow-mapandfilter_spreadsheet`).

.. _fig-controlflow-mapandfilter_spreadsheet:

.. figure:: figures/controlflow/MapAndFilter_Spreadsheet.png
   :align: center
   :width: 100%

   The result in the |vistrails| spreadsheet

This example is already inside the "triangle_area.vt" vistrail, in the
"Surface Area with Map and Filter" version.

Later in this chapter, you will see how to combine ``Map`` and ``Filter``
in one single module, by creating your own control structure.


Conditional module
==================

Conditional statements are a very important control flow structure frequently used
in programming languages, and the ``if`` structure is probably the most
common of these structures. In scientific workflows, for example, an ``if``
structure can be used to select the part of the pipeline to be executed based on a
boolean condition.

For this reason, the ``Control Flow`` package also includes an ``If``
module. Its input ports are:

* "Condition": this port receives a boolean value that will specify the direction of computation;
* "TruePort": this port receives the part of the workflow that will be executed if the condition value is ``True``; you don't need to group or make a SubWorkflow in this case: just connect the output port "self" of the last module in this port;
* "FalsePort": this port receives the part of the workflow that will be executed if the condition value is ``False``; as with the
* "TruePort" port, you don't need to group or make a SubWorkflow;
* "TrueOutputPorts": this port receives a list that contains the names of the output ports of the module connected to "TruePort" that you want the result of; this port is optional; 
* "FalseOutputPorts": this port receives a list that contains the names of the output ports of the module connected to "FalsePort" that you want the result of; this port is optional.


The ``If`` module has an output port named "Result" that returns a
list with the results of the specified output ports of "TrueOutputPorts" or
"FalseOutputPorts", depending on the condition value. If only one output
port is chosen, the result of this port will not be returned in a list. If
"TrueOutputPorts" or "FalseOutputPorts" are not enabled,
"Result" returns ``None``.

Let's do now a simple example to show how exactly this module works. This example is
from the bioinformatics domain, and takes a string as the input; if it's a structure
identifier, a web service from the European Bioinformatics Institute, or simply EBI
(http://www.ebi.ac.uk/), a centre of researchs in bioinformatics,
is used to get the structure in the PDB format, a standard representation for
macromolecular structure, and then, the ``VTK`` package is used to show the protein in
the |vistrails| Spreadsheet; otherwise, the input is assumed to be invalid, and a message
is generated in the Spreadsheet too.

First, the EBI's web service must be enabled. For this, you need to add the following
url to the ``wsdlList`` configuration:


``http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl``


Don't forget to ensure that the ``webServices`` package is enabled in the 
``Preferences`` dialog. For more information about web services in |vistrails|, see
Chapter :ref:`chap-webservices`.

Now, you're going to drag the following modules to the canvas:

* ``If``
* ``fetchData`` (under "Methods" for the current web service)
* ``vtkPDBReader`` (under "VTK")
* ``vtkDataSetMapper`` (under "VTK")
* ``vtkActor`` (under "VTK")
* ``vtkRenderer`` (under "VTK")
* ``VTKCell`` (under "VTK")
* ``PythonSource`` (under "Basic Modules") - you will need three of them
* ``String`` (under "Basic Modules")
* ``RichTextCell`` (under "|vistrails| Spreadsheet")

Select one of the ``PythonSource`` modules, and open its configuration dialog. Inside it,
add one input port of type ``String``, named "PDB_format", and one output port of
type ``File``, named "File". Then, write the following code:

.. code-block:: python
   :linenos:

   PDB_format = self.getInputFromPort('PDB_format')

   output = self.interpreter.filePool.create_file()
   file_ = open(str(output.name), 'w')
   file_.write(PDB_format)

   self.setResult('File', file_)

   file_.close()

You can name this module as ``CreateFile``.
Now, set some paremeters of ``fetchData``:

* "format": *pdb*
* "style": *raw*

Next, connect some modules as shown in Figure :ref:`fig-controlflow-if_group`.

.. _fig-controlflow-if_group:

.. figure:: figures/controlflow/If_Group.png
   :align: center
   :width: 1.5in

   Some modules of the workflow connected

The aim of this group of modules is to get the PDB format of the structure ID, through the web service,
and then make the visualization with the ``VTK`` package. This is the part of the workflow
that will be executed if the input is a structure identifier.

Next, select another ``PythonSource`` module and open its configuration
dialog too. One input port named "Structure", of type ``String``, and one
output port named "Is_ID", of type ``Boolean``, must be added, as well as the
code below:

.. code-block:: python
   :linenos:

   if"\n" in structure:
       lineLen = structure.index("\n")
   else:
       lineLen = -1
   if lineLen<1:
       lineLen = len(structure)

   if ":" in structure:
       index = structure.index(":")
   else:
       index = -1

   if (structure[0]!="ID ") and (index>0) and (index<lineLen):
       is_ID = True
   else:
       is_ID = False

   self.setResult("Is_ID", is_ID)

Name this module as ``Is_ID``. This module will be the condition for the ``If``
structure.

Now, select the last ``PythonSource`` module, and, inside its configuration, add
one input port of type ``String``, named "Input", and one output port of
type ``File``, named "html". Then, copy the code below:

.. code-block:: python
   :linenos:

   input = self.getInputFromPort("Input")

   output = self.interpreter.filePool.create_file()
   f = open(str(output.name), 'w')
   text = '<HTML><TITLE>Protein Visualization</TITLE><BODY BGCOLOR="#FFFFFF">'
   f.write(text)
   text = '<H2>Protein Visualization Workflow</H2>'
   f.write(text)
   text = '<H3>The following input is not an ID from a protein:</H3>'
   text += '<H4>' + str(input) + '</H4>'
   text += '<H3>The visualization cannot be done.</H3>'
   f.write(text)

   text = '</BODY></HTML>'
   f.write(text)

   self.setResult('html', f)

   f.close()

Name this ``PythonSource`` as ``Not_ID``. This module will print a message in the
|vistrails| Spreadsheet when the input is not a structure identifier.

Finally, the ``String`` module can be named as ``Workflow_Input``, to make it
clear that it takes the input of the workflow. Also, open the configuration dialog of
``RichTextCell`` to enable the output port "self", so it can be connected to the
``If`` module. Then, connect all the modules as shown in Figure :ref:`fig-controlflow-if_workflow`. 

.. _fig-controlflow-if_workflow:

.. figure:: figures/controlflow/If_Workflow.png
   :align: center
   :width: 3.0in

   All the modules connected

In order to better organize the disposal of the modules, group all the modules shown in
Figure :ref:`fig-controlflow-if_group` by selecting them, going to the ``Edit``
menu and clicking on ``Group``. Name it as ``Generate_Visualization``.
Your workflow must correspond to the one shown in Figure :ref:`fig-controlflow-if_workflow_group`.

.. _fig-controlflow-if_workflow_group:

.. figure:: figures/controlflow/If_Workflow_Group.png
   :align: center
   :width: 3.8in

   The final workflow, using the ``Group`` structure

Note that this implementation follows exactly the initial especification of the workflow. If the input
is a structure identifier (``Is_ID`` returns ``True``), ``Generate_Visualization``
will be executed; otherwise (``Is_ID`` returns ``False``), ``Not_ID``
and ``RichTextCell`` will create an error message in the |vistrails| Spreadsheet.

For the workflow execution, set the parameter "value" of the
``Workflow_Input`` module to *PDB:3BG0*. This entry is an ID from a
protein; so, the condition will be ``True``, and the ``Generate_Visualization``
group will be executed, generating the visualization show in
Figure :ref:`fig-controlflow-if_spreadsheet_true`.

.. _fig-controlflow-if_spreadsheet_true:

.. figure:: figures/controlflow/If_Spreadsheet_True.png
   :align: center
   :width: 100%

   The visualization of the protein in the |vistrails| Spreadsheet

If you change the value from the input port "value" to *protein*, for example, the
condition will be ``False``, and the message shown in
Figure :ref:`fig-controlflow-if_spreadsheet_false` will be generated in the Spreadsheet.

.. _fig-controlflow-if_spreadsheet_false:

.. figure:: figures/controlflow/If_Spreadsheet_False.png
   :align: center
   :width: 3.8in

   The message in the Spreadsheet, generated when the input is not a structure ID

This example can be found inside the "examples" directory, in the
"protein_visualization.vt" vistrail. It was partially based on the workflow
"Structure_or_ID", which can be found at http://www.myexperiment.org/workflows/225.

Building your own loop structure
================================

In functional programming, ``fold`` is a high-order function used to
encapsulate a pattern of recursion for processing lists. A simple example of a
``fold`` is summing the elements inside a list. If you ``fold`` the
list [1, 2, 3, 4] with the sum operator, the result will be (((1+2)+3)+4) = 10. It's
common to start with an initial value too. In the sum example, the initial value
would be 0, and the result would be ((((0+1)+2)+3)+4) = 10.

With this function, a programmer can do any type of recursion. In fact, the
``map`` and ``filter`` functions, shown previously, can be implemented
with ``fold``. The ``Control Flow`` package provides a ``Fold``
module to enable this functionality, and the ``Map`` and the ``Filter``
modules inherit from the ``Fold`` class.

In fact, any control module that has this kind of recursion uses the ``Fold``
class. To use this functionality for your own control modules, instead of defining
the ``compute()`` method, you need to define two other methods:

* ``setInitialValue()``: in this method, you will set the initial value of the fold operator through the ``self.initialValue`` attribute; 
* ``operation()``: in this method, you must implement the function to be applied recursively to the elements of the input list (|eg| the sum function). More specifically, you need to define the relationship between the previous iteration's result (``self.partialResult`` attribute) and the current element of the list (``self.element`` attribute); this method must be defined after the ``setInitialValue()`` one.

It's important to notice that all modules inheriting from ``Fold`` will have
the same ports, as ``Map`` and ``Filter``, but you can add any other
ports that will be necessary for your control structure. Also, you do not need to use
the input ports "FunctionPort", "InputPort" and
"OutputPort". You will only use them when you create an operator like
``Map`` and ``Filter``, which need a function to be applied for each
element of the input list.

As an example, we will create a simple ``Sum`` module to better understand the
idea. Create a new package, and the code inside it would be as follows:

.. role:: red

.. code-block:: python
   :linenos:
  
   from controlflow import Fold, registerControl

   version = "0.1"
   name = "My Control Modules"
   identifier = "edu.utah.sci.my_control_modules"

   def package_dependencies():
       return ["edu.utah.sci.vistrails.control_flow"]

   class Sum(Fold):
       def setInitialValue(self):
           self.initialValue = 0

       def operation(self):
           self.partialResult += self.element

   def initialize(*args,**keywords):
       registerControl(Sum)

.. highlight:: python
   :linenothreshold: 1

.. .. parsed-literal::

   :red:`from controlflow import Fold, registerControl`

   version = "0.1"
   name = "My Control Modules"
   identifier = "edu.utah.sci.my_control_modules"

   :red:`def package_dependencies():`
       :red:`return ["edu.utah.sci.vistrails.control_flow"]`

   class Sum(:red:`Fold`):
       :red:`def setInitialValue(self):`
           :red:`self.initialValue = 0`

       :red:`def operation(self):`
           :red:`self.partialResult += self.element`

   def initialize(\*args,**keywords):
       :red:`registerControl(Sum)`

We begin by importing the ``Fold`` class and the ``registerControl``
function from the ``Control Flow`` package (Line 1).
The ``registerControl`` function is used to register the control modules, so
the shape of them can be set automatically.

Also, define the variables ``version``, ``name`` and
``identifier``, as it's done for all
packages. The interpackage dependency (include reference of the package chapter) is
used too, as ``My Control Modules`` requires a module and a function from
``Control Flow`` (Lines 7 and 8); in
this way, |vistrails| can initialize the packages in the correct order. Then, create
the class ``Sum``, which inherits from ``Fold``. Inside it, set the
initial value to 0 inside the ``setInitialValue()`` method
(Lines 11 and 12), and define the sum operator
inside ``operation()``, as shown clearly by the relation between
``self.partialResult`` and ``self.element``
(Lines 14 and 15).

The last thing we must do is define the ``initialize()`` method, so the
package can be loaded in |vistrails|. However, instead of calling the registry, if you
do not need any other ports, you just have to call the ``registerControl()``
function (Line 18).

Save this package and enable it inside |vistrails|. Create a similar workflow as shown
in Figure :ref:`fig-controlflow-sum_workflow`.

.. _fig-controlflow-sum_workflow:

.. figure:: figures/controlflow/Sum_Workflow.png
   :align: center
   :width: 100%

   A workflow using the ``Sum`` module

Upon executing this workflow, the sum ((((0+1)+2)+3)+4), should be printed on your
terminal as following:

``10``

Note that the input ports "FunctionPort", "InputPort" and
"OutputPort" were not necessary for this module. Now, let's see another
example that does use them. Open the workflow we used to calculate the area of
isosurfaces (in "triangle_area.vt", "Surface Area with Map
and Filter" version), and delete the ``Map``, the ``Filter``, and the
``FilterCondition`` (``PythonSource``) modules.

Now, create a single module that maps the list and filters the results, named as
``AreaFilter``. Inside your package, add the following class:

.. code-block:: python
   :linenos:

   class AreaFilter(Fold):
       def setInitialValue(self):
           self.initialValue = []

       def operation(self):
           area = self.elementResult

           if area>200000:
               self.partialResult.append(area)

.. .. parsed-literal::

   class AreaFilter(:red:`Fold`):
       :red:`def setInitialValue(self):`
       .. _ref-areafilter-config1:

           :red:`self.initialValue = []`

       :red:`def operation(self):`
           :red:`area = self.elementResult`\label{ref:areafilter:config2}

           :red:`if area>200000:`\label{ref:areafilter:config3}
               :red:`self.partialResult.append(area)`\label{ref:areafilter:config4}

The initial value is an empty list, so the result of each element can be appended to
it (Line 3). In the ``operation()`` method, the
``self.elementResult`` attribute is used (Line 6);
it represents the result of the port chosen in "OutputPort"; so, it means
that "FunctionPort", "InputPort" and "OutputPort" will have
connections. In this workflow, ``self.elementResult`` is the area for each
contour value inside the input list, and, if the area is above 200,000, it will be
appended to the final result (Lines 8 and 9). We can easily see that this module does exactly
the same as ``Map`` and ``Filter`` combined.

Don't forget to register this module in the ``initialize()`` function. After
doing this, save the package and load it again inside |vistrails|. Then, just connect
``AreaFilter`` as in Figure :ref:`fig-controlflow-areafilter_workflow`.

.. _fig-controlflow-areafilter_workflow:

.. figure:: figures/controlflow/AreaFilter_Workflow.png
   :align: center
   :width: 3.8in

   The same workflow, but now with ``AreaFilter``

Now, you must set some values in the following parameters of ``AreaFilter``:

* "InputPort": *["SetValue"]*
* "OutputPort": *GetSurfaceArea*

When you execute this workflow, the result in the |vistrails| Spreadsheet will be the
same as shown previously (Figure :ref:`fig-controlflow-mapandfilter_spreadsheet`). It
shows the flexibility of doing a recursion function by inheriting from
``Fold``.