ExecutablePapers


Introduction

While computational experiments have become an integral part of the scientific method, repeating them remains a challenge: they often require specific hardware, non-trivial software installation, and complex manipulations to obtain results. We posit that integrating data acquisition, derivation, analysis, and visualization as executable components throughout the publication process will make it easier to generate and share repeatable results. We have built an extensible infrastructure to support the life-cycle of executable publications: their creation, review, and re-use. Our focus is on papers whose computational experiments can be reproduced and validated. Our approach is orthogonal to others that focus on semantics and authoring, and it can be combined with them.

We have written a paper that details the challenges of computational repeatability and the solutions we have developed: http://www.cs.utah.edu/~juliana/pub/vistrails-executable-paper.pdf

Infrastructure

We have developed a set of techniques to help authors, reviewers, and readers construct and interact with executable papers. Many of these techniques have been integrated with the VisTrails system and the crowdLabs site. VisTrails is an open-source scientific workflow and provenance management system, and our extensions for executable papers allow users to create papers and Web publications whose figures and results are directly tied to the computations that generated them. Both the LaTeX and MediaWiki extensions call VisTrails (either a server or a local executable) to embed results into their documents. In addition, authors can embed images of the workflows that generated a particular result, or of the history of their explorations. These capabilities are included in the VisTrails 1.6.2 release, available for Windows, Mac, and Linux.

Authors can also choose to couple a publication with the actual computations by embedding links to the workflows in the paper. The embedding assistant in VisTrails can generate these links automatically when computations are hosted on an accessible database. Using a compatible PDF reader (e.g. Adobe Acrobat), readers can click on a result, and the embedded link will retrieve a vtl file that contains either the workflow itself or the information needed to access a remotely hosted workflow. VisTrails can open a vtl file and load the corresponding workflow from it. Note that you may have to register the vtl file extension with your operating system so that such files open in VisTrails. A reader can then explore the result by changing the input data or the original parameters, and may choose to publish the modified result to a Web page or another publication.
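As an illustration of the link mechanism described above, here is a minimal Python sketch of how a tool might read the information in a link file that points to a remotely hosted workflow. It assumes the file is a small XML document with a single element whose attributes name the host, port, database, workflow collection, and version. The element name `vtlink`, the attribute names, and the default port are assumptions made for this sketch and are not taken from the VisTrails documentation; check an actual vtl file for the real layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical link-file content. The element name and attribute names
# are assumptions for this sketch, not the confirmed vtl format.
EXAMPLE_VTL = (
    '<vtlink host="example.org" port="3306" '
    'database="vistrails" vtid="8" version="598"/>'
)


@dataclass
class WorkflowLink:
    """Where a remotely hosted workflow lives (hypothetical fields)."""
    host: str
    port: int
    database: str
    vtid: int
    version: Optional[int]  # None means no specific version was pinned


def parse_link(text: str) -> WorkflowLink:
    """Parse the assumed XML layout into a WorkflowLink."""
    root = ET.fromstring(text)
    version = root.get("version")
    return WorkflowLink(
        host=root.attrib["host"],
        port=int(root.get("port", "3306")),  # assumed default port
        database=root.attrib["database"],
        vtid=int(root.attrib["vtid"]),
        version=int(version) if version is not None else None,
    )


link = parse_link(EXAMPLE_VTL)
print(link)
```

Keeping the link separate from the workflow means the PDF only has to carry a short reference, while the computation itself stays on the hosting database.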

Our MediaWiki extensions allow authors to embed computational results into wiki pages. The wiki extension lets authors reference results by tag, so a user can update a result without editing the wiki text. Again, the embedding assistant in VisTrails can generate the markup automatically, and a user simply copies that markup into the wiki page. The crowdLabs site allows authors to host computations and interactive results; VisTrails supports synchronizing workflows with this site. Besides hosting computations, crowdLabs can host input and output data, lets users search for results, and provides social networking features so that readers can add comments, rate workflows, or form groups. In addition, users can embed results from crowdLabs directly using links on the computation pages. Both the MediaWiki extensions and the crowdLabs site support server-based execution of workflows, so users can interact with and modify results from a Web browser. VisMashup support lets authors generate Flash applications in which users interactively explore an author-defined subset of parameters and input data.
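The benefit of referencing results by tag rather than by a fixed version can be shown with a small Python sketch. It is a conceptual model only: the `TagRegistry` class, the tag name, and the version numbers are hypothetical and do not reflect how VisTrails or the wiki extension store tags internally.

```python
from typing import Dict


class TagRegistry:
    """Maps human-readable tags to workflow version ids (hypothetical model)."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}

    def set_tag(self, tag: str, version: int) -> None:
        # Moving a tag to a new version is the only update step needed.
        self._versions[tag] = version

    def resolve(self, tag: str) -> int:
        # Called at render time, so the page always shows the tagged version.
        if tag not in self._versions:
            raise KeyError(f"unknown tag: {tag!r}")
        return self._versions[tag]


registry = TagRegistry()
registry.set_tag("final-figure", 12)

# The wiki page stores only the tag, never the version number.
wiki_reference = "final-figure"
before = registry.resolve(wiki_reference)

# The author refines the workflow and moves the tag; the wiki text is untouched.
registry.set_tag("final-figure", 15)
after = registry.resolve(wiki_reference)

print(before, after)
```

Because the page text holds only the tag, the indirection is resolved each time the page is rendered, which is why the result can change without a wiki edit.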

Demonstrations

To see our infrastructure in action, please see the following videos:


Examples of Executable Publications

ALPS 2.0

For a real example of an executable paper whose results can be reproduced and validated, please see the ALPS 2.0 paper published in the Journal of Statistical Mechanics. To repeat the experiments shown in the paper, you will need to download VisTrails 1.6.2. The ALPS 2.0 package is included in VisTrails 1.6.

CFD Flow Analysis

Another example of an executable publication can be found at http://www.vistrails.org/index.php/User:Tohline/CPM/Levels2and3. Because this paper is published on a wiki, readers can interact with the results using a Web browser. Try it out!

WikiQuery

Here's a case study we did for the SIGMOD Repeatability effort: http://www.cs.utah.edu/~juliana/WikiQuery/WikiQuery_casestudy.pdf

For more details on how to run the WikiQuery experiments, and to obtain the experiments themselves (data, code, workflows), see WikiQuery.