Enhancing Scholarly Communication with ReproZip
Reproducibility has long been one of the cornerstones of science. Revisiting and reusing findings from past research -- or, as Newton once said, “standing on the shoulders of giants” -- is a common practice that has led to countless advances. Although this long tradition requires results to be reproducible, there is no standard approach to sharing computational experiments: papers must often resort to informal descriptions of processes and environments, accompanied by raw data and code. There are many reasons for the lack of computational reproducibility in current scholarly communication, but a recurring one is the effort involved. Scientists need to carefully collect and document all the dependencies (i.e., provenance) after the project is fully implemented, which is time-consuming and error-prone. Moreover, whoever tries to reproduce the results needs to locate and set up all these dependencies, sometimes in the exact same versions, and make sure they run in their own computational environment, which may differ from the one originally used. I will present ReproZip, a tool that allows researchers to create a compendium for an experiment by automatically tracking and identifying all its required dependencies (data files, libraries, configuration files, etc.). After creating the package with little to no effort, the researcher can share it with others, who can then use ReproZip to unpack the experiment and reproduce the findings independently of their operating system, again with little to no effort.
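As a rough illustration of the pack-and-unpack workflow described above, the following sketch assembles the ReproZip command lines that an author and a reader would typically run. It only builds and prints the commands; it does not execute them. The helper name `reprozip_workflow`, the package name `experiment.rpz`, the directory `unpacked`, the script `run_analysis.py`, and the choice of the Docker unpacker are all assumptions made for this example, not details taken from the talk.

```python
import shlex


def reprozip_workflow(experiment_cmd, package="experiment.rpz",
                      target_dir="unpacked", unpacker="docker"):
    """Build (but do not run) the command lines for a ReproZip round trip.

    All argument defaults are hypothetical placeholders for illustration.
    """
    # Author side: trace the experiment while it runs, then pack the
    # dependencies identified by the trace into a single .rpz file.
    author_side = [
        ["reprozip", "trace"] + shlex.split(experiment_cmd),
        ["reprozip", "pack", package],
    ]
    # Reader side: set up the package with an unpacker, then re-run it.
    reader_side = [
        ["reprounzip", unpacker, "setup", package, target_dir],
        ["reprounzip", unpacker, "run", target_dir],
    ]
    return author_side, reader_side


if __name__ == "__main__":
    author, reader = reprozip_workflow("python run_analysis.py input.csv")
    for command in author + reader:
        print(shlex.join(command))
```

The commands are kept as argument lists so that they could later be passed to `subprocess.run` without shell quoting issues. Swapping `unpacker` for another installed unpacker (for example `vagrant`) would leave the rest of the sketch unchanged.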