Difference between revisions of "Log API"

From VistrailsWiki

Latest revision as of 17:21, 7 February 2011

Introduction

The purpose of the log API is to make detailed queries on the VisTrails execution log possible. An index has been created that can answer general queries about workflow executions. However, to enable queries on individual module executions, the log inside each vistrail file needs to be accessed, which may be slow. Also, to answer queries about workflow modules, such as their parameters, the original workflow pipeline needs to be correlated with the execution log.

The information required to process log queries can be divided into three parts: first, the index alone; second, the index together with the complete log files; third, the index and log files correlated with the corresponding pipelines. For each part I will state the information available and a few example queries that might be possible.

Part 1: Queries only requiring information from the index

The index contains execution-level information such as:

  • start and end times of pipeline executions
  • user
  • result (success, failed, cached, or not executed)

Example queries:

  • Time range queries, e.g. executions on a specific day or in a specific month
  • Executions by a specific user or with a specific result
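
The two example queries above can be sketched in Python. This is only an illustration: the `IndexEntry` record, its field names, and both helper functions are assumptions made for this sketch and do not describe the actual index format or any existing VisTrails API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IndexEntry:
    # Hypothetical index record; the field names are assumptions.
    vistrail: str
    user: str
    start: datetime
    end: datetime
    result: str  # "success", "failed", "cached" or "not executed"


def executions_between(index, begin, end):
    """Entries whose start time falls in the half-open range [begin, end)."""
    return [e for e in index if begin <= e.start < end]


def executions_matching(index, user=None, result=None):
    """Entries filtered by user and/or result; None means 'any'."""
    return [
        e for e in index
        if (user is None or e.user == user)
        and (result is None or e.result == result)
    ]


# Made-up sample data, for illustration only.
index = [
    IndexEntry("a.vt", "alice", datetime(2011, 2, 7, 9, 0), datetime(2011, 2, 7, 9, 5), "success"),
    IndexEntry("a.vt", "bob", datetime(2011, 2, 7, 14, 0), datetime(2011, 2, 7, 14, 1), "failed"),
    IndexEntry("b.vt", "alice", datetime(2011, 3, 1, 8, 0), datetime(2011, 3, 1, 8, 2), "failed"),
]

# All executions started on 7 February 2011.
on_day = executions_between(index, datetime(2011, 2, 7), datetime(2011, 2, 8))
# All failed executions by the user "alice".
alice_failed = executions_matching(index, user="alice", result="failed")
```

Because both filters touch only execution-level fields, neither needs to open a vistrail file.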

Part 2: Queries on individual module executions

The log contains module-level execution information such as:

  • start and end times of module executions
  • result (success, failed, cached, or not executed)
  • Errors
  • Execution annotations

Example queries:

  • Module executions lasting more than 5 minutes
  • Error annotations containing a specific word
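
A minimal sketch of these two module-level queries follows, assuming the log of one pipeline execution has already been loaded into a list of records. The `ModuleExec` class and its fields are hypothetical and stand in for whatever the real log parser would return.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ModuleExec:
    # Hypothetical module-level log record; the field names are assumptions.
    module_id: int
    start: datetime
    end: datetime
    result: str
    error: str = ""
    annotations: dict = field(default_factory=dict)


def longer_than(module_execs, minutes):
    """Module executions whose duration strictly exceeds the given minutes."""
    limit = timedelta(minutes=minutes)
    return [m for m in module_execs if m.end - m.start > limit]


def mentions(module_execs, word):
    """Module executions whose error text or annotation values contain word."""
    word = word.lower()
    hits = []
    for m in module_execs:
        texts = [m.error] + [str(v) for v in m.annotations.values()]
        if any(word in t.lower() for t in texts):
            hits.append(m)
    return hits


# Made-up sample data, for illustration only.
day = datetime(2011, 2, 7)
module_execs = [
    ModuleExec(1, day.replace(hour=10, minute=0), day.replace(hour=10, minute=7), "success"),
    ModuleExec(2, day.replace(hour=10, minute=7), day.replace(hour=10, minute=8), "failed",
               error="File not found"),
    ModuleExec(3, day.replace(hour=10, minute=8), day.replace(hour=10, minute=13), "success",
               annotations={"note": "read from cache file"}),
]

slow = longer_than(module_execs, 5)
with_file = mentions(module_execs, "file")
```

Unlike the index queries, this requires reading the full log of every candidate execution, which is the slow step mentioned in the introduction.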

Part 3: Queries on module types and parameters

The vistrail contains the pipeline information, which includes:

  • Module types
  • Parameters
  • Module annotations
  • Connections

Example queries:

  • Failed executions of a specific module type
  • Visual queries where a specific module type is connected to a module that has failed
  • All failed executions of a specific module type with a specific parameter
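
The correlation step can be sketched as a join on module id between the pipeline and the per-module results from the log. Everything here is an assumption made for illustration: the `PipelineModule` class, the dictionary layouts, the module type names, and the idea that log records and pipeline modules share an id.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineModule:
    # Hypothetical pipeline module; the field names are assumptions.
    module_type: str
    parameters: dict = field(default_factory=dict)


def failed_of_type(pipeline, results, module_type, param=None):
    """Ids of failed modules of a given type.

    pipeline: module id -> PipelineModule
    results:  module id -> result string taken from the log
    param:    optional (name, value) pair the module must have
    """
    hits = []
    for module_id, result in results.items():
        module = pipeline.get(module_id)
        if module is None or result != "failed":
            continue
        if module.module_type != module_type:
            continue
        if param is not None and module.parameters.get(param[0]) != param[1]:
            continue
        hits.append(module_id)
    return hits


def connected_to_failed(pipeline, connections, results, module_type):
    """Ids of modules of a given type directly connected to a failed module."""
    hits = set()
    for src, dst in connections:
        # Check the connection in both directions.
        for a, b in ((src, dst), (dst, src)):
            if pipeline[a].module_type == module_type and results.get(b) == "failed":
                hits.add(a)
    return sorted(hits)


# Made-up sample data, for illustration only.
pipeline = {
    1: PipelineModule("Reader", {"file": "a.dat"}),
    2: PipelineModule("Filter"),
    3: PipelineModule("Reader", {"file": "b.dat"}),
    4: PipelineModule("Renderer"),
}
connections = [(1, 2), (3, 2), (2, 4)]  # (source id, target id)
results = {1: "failed", 2: "not executed", 3: "success", 4: "not executed"}

failed_readers = failed_of_type(pipeline, results, "Reader")
failed_b = failed_of_type(pipeline, results, "Reader", param=("file", "b.dat"))
filters_next_to_failure = connected_to_failed(pipeline, connections, results, "Filter")
```

A real visual query would match a whole sub-graph rather than a single connection; the second function covers only the simplest one-connection case.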

Summary

It would be good to ask users which types of queries are important, as not all types of queries may be required in day-to-day work.

It may be the case that most queries can be answered by using the index, and more advanced queries can be answered by using an execution viewer, which should contain all the module executions for a specific pipeline execution as well as the relevant pipeline.