Difference between revisions of "ExecutablePapers"

From VistrailsWiki
== Introduction ==
Computational experiments have become an integral part of the scientific method, yet repeating them remains a challenge: they often require specific hardware, non-trivial software installation, and complex manipulations to obtain results. We posit that integrating data acquisition, derivation, analysis, and visualization as executable components throughout the publication process will make it easier to generate and share repeatable results.
We have built an extensible infrastructure to support the life cycle of ''executable publications'': their creation, review, and reuse.
Our focus is on papers whose ''computational experiments'' can be reproduced and validated. Our approach is orthogonal to approaches that focus on semantics and authoring, and it can be combined with them.


We have written a paper that details the challenges of computational repeatability and the solutions we have developed:
http://www.cs.utah.edu/~juliana/pub/vistrails-executable-paper.pdf


== Infrastructure ==
 
We have developed a set of techniques to help authors, reviewers, and readers construct and interact with executable papers.  Many of these techniques have been integrated with the [http://www.vistrails.org/ VisTrails] system and the [http://www.crowdlabs.org crowdLabs] site.  VisTrails is an open-source scientific workflow and provenance management system, and our extensions for executable papers allow users to create papers and Web publications whose figures and results are directly tied to the computations that generated them.  Both the [http://www.latex-project.org LaTeX] and [http://www.mediawiki.org MediaWiki] extensions call VisTrails (either a server or local executable) to embed results into their documents.  In addition, authors can embed images of the workflows that generated a particular result or the history of their explorations.  These capabilities are included in the [[Downloads | VisTrails 1.6.2 release]] available for Windows, Mac, and Linux.
 
Authors can also couple a publication with the actual computations by embedding links to the workflows in the paper.  The embedding assistant in VisTrails can generate these links automatically when computations are hosted on an accessible database.  Using a compatible PDF reader (e.g. [http://get.adobe.com/reader/ Adobe Acrobat]), readers can click on a result, and the embedded link will access a <code>vtl</code> file that contains either the workflow itself or the information needed to access a remotely hosted workflow.  VisTrails can open <code>vtl</code> files and load the corresponding workflow from them.  Note that you may need to associate the <code>vtl</code> file extension with VisTrails in your operating system.  A reader can then explore the result by changing the input data or the original parameters, and might choose to publish the modified result to a Web page or another publication.
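
As a rough illustration of the two cases described above (a link file that carries the workflow itself versus one that only points to a remotely hosted workflow), the following Python sketch classifies a link file. The XML layout, the <code>vtlink</code> element, and the attribute names <code>host</code>, <code>database</code>, <code>vtid</code>, and <code>version</code> are assumptions made for this example and may not match the actual <code>vtl</code> format; <code>db.example.org</code> is a placeholder host.

```python
import xml.etree.ElementTree as ET

# Hypothetical link-file content. The element and attribute names are
# assumptions for illustration, not the documented vtl schema.
SAMPLE_VTL = (
    '<vtlink host="db.example.org" database="vistrails" '
    'vtid="8" version="598"/>'
)


def describe_link(vtl_text):
    """Report whether a link file points to a remote workflow or carries one inline."""
    root = ET.fromstring(vtl_text)
    host = root.get("host")
    if host is None:
        # No database location given: assume the workflow is stored in the file.
        return {"kind": "embedded"}
    return {
        "kind": "remote",
        "host": host,
        "database": root.get("database"),
        "vtid": int(root.get("vtid")),
        "version": int(root.get("version")),
    }


if __name__ == "__main__":
    print(describe_link(SAMPLE_VTL))
```

A real reader would hand the resolved location to VisTrails rather than print it; the sketch only shows how one small file can stand in for either form of the link.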
 
Our [http://www.mediawiki.org MediaWiki] extensions allow authors to embed computational results into wiki pages.  The wiki extension lets authors reference results by tag, so a result can be updated without changing the wiki text.  Again, the embedding assistant in VisTrails can generate the markup automatically, and a user simply copies that markup into the wiki page.  The [http://www.crowdlabs.org crowdLabs] site allows authors to host computations and interactive results; VisTrails supports synchronizing workflows with this site.  Besides hosting computations, crowdLabs can host input and output data, lets users search for results, and provides social networking features so that other readers can add comments, rate workflows, or form groups.  In addition, users can embed results from crowdLabs directly using links on the computation pages.  Both the MediaWiki extensions and the crowdLabs site support ''server-based'' execution of workflows, so users can interact with and modify results from a Web browser.  [[VisMashup]] support lets authors generate Flash applications in which users can interactively explore an author-defined subset of parameters and input data.
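
The benefit of referencing results by tag can be sketched in a few lines of Python. This is only a conceptual model: the <code>{result:...}</code> placeholder syntax, the tag name, and the version numbers are hypothetical and are not the markup produced by the VisTrails embedding assistant.

```python
import re

# Hypothetical placeholder syntax, used only for this illustration.
PLACEHOLDER = re.compile(r"\{result:([A-Za-z0-9_]+)\}")


def render(wiki_text, tags, results_by_version):
    """Replace each tagged placeholder with the result of the tagged version."""
    def substitute(match):
        version = tags[match.group(1)]      # tag name -> workflow version
        return results_by_version[version]  # workflow version -> result
    return PLACEHOLDER.sub(substitute, wiki_text)


WIKI_TEXT = "Energy curve: {result:energy_plot}"
RESULTS = {12: "plot_v12.png", 15: "plot_v15.png"}

tags = {"energy_plot": 12}
before = render(WIKI_TEXT, tags, RESULTS)

# Moving the tag to a newer version changes the rendered page,
# while WIKI_TEXT itself stays untouched.
tags["energy_plot"] = 15
after = render(WIKI_TEXT, tags, RESULTS)

if __name__ == "__main__":
    print(before)
    print(after)
```

Because the page stores only the tag, an author who refines a workflow needs to move the tag to the new version; every page that references the tag then shows the updated result.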
 
== Demonstrations ==
 
To see our infrastructure in action, watch the following videos:
* {{qt|link=http://www.vistrails.org/images/executable_paper_latex.mov|text=Editing an executable paper written using LaTeX and VisTrails}}
* {{qt|link=http://www.vistrails.org/images/executable_paper_server.mov|text=Exploring a Web-hosted paper using server-based computation}}


== Examples of Executable Publications ==


=== ALPS 2.0 ===
 
For a real example of an executable paper whose results can be reproduced and validated, see the ALPS 2.0 paper published in the [http://www.iop.org/EJ/abstract/1742-5468/2011/05/P05001 Journal of Statistical Mechanics].  To repeat the experiments shown in the paper, you will need to [[Downloads | download]] VisTrails 1.6.2.  The ALPS 2.0 package is included in VisTrails 1.6.
 
=== CFD Flow Analysis ===
This [http://www.vistrails.org/index.php/User:Tohline/CPM/Levels2and3 CFD analysis example] is published on the Web via the MediaWiki extensions we have developed.  Because this paper is published on a Wiki, it is possible to interact with the results using a Web browser. Try it out!
 
=== WikiQuery ===
We performed a [http://www.cs.utah.edu/~juliana/WikiQuery/WikiQuery_casestudy.pdf case study] for the [http://www.sigmod2011.org/calls_papers_sigmod_research_repeatability.shtml SIGMOD Repeatability effort].  For details on how to run the WikiQuery experiments and to obtain the materials (data, code, workflows), see the [[WikiQuery]] page.


== Tutorials ==


* [[ExecutableLatexTutorial | Editing an executable paper using VisTrails and its LaTeX extensions]]


== Ongoing Work ==
 
We have been investigating better ways to modify embedded computational results.  As part of this work, we have extended the current embedding assistant to parse LaTeX source files so that users can examine and modify results directly.  A screenshot of this updated assistant is shown below.  We plan to release this functionality in a future version of VisTrails.
[[File:Editing_latex.png|center|400px|thumb|Editing existing papers directly]]
 
== Funding ==
 
This project is sponsored by the National Science Foundation awards [http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1139832 IIS#1139832], [http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1050422 IIS#1050422],  [http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1050388 IIS#1050388], [http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0905385 IIS#0905385], and [http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0751152 CNS#0751152].
For more information about our research project, see [[RepeatabilityCentral]].

Latest revision as of 12:23, 7 August 2011
