Tuesday, April 3, 2012

Highly Replicable Research

There is a really good post by Titus Brown (by way of PlanetScipy) on replicate-able research.
So what did we do to make this paper extra super replicable?
If you go to the paper Web site, you'll find:
  • a link to the paper itself, in preprint form, stored at the arXiv site;
  • a tutorial for running the software on a Linux machine hosted in the Amazon cloud;
  • a git repository for the software itself (hosted on github);
  • a git repository for the LaTeX paper and analysis scripts (also hosted on github), including an ipython notebook for generating the figures (more about that in my next blog post);
  • instructions on how to start up an EC2 cloud instance, install the software and paper pipeline, and build most of the analyses and all of the figures from scratch;
  • the data necessary to run the pipeline;
  • some of the output data discussed in the paper.
(Whew, it makes me a little tired just to type all that...)
What this means is that you can regenerate substantial amounts (but not all) of the data and analyses underlying the paper from scratch, all on your own, on a machine that you can rent for something like 50 cents an hour. (It'll cost you about $4 -- 8 hours of CPU -- to re-run everything, plus some incidental costs for things like downloads.)

I really think it is a neat use of the Amazon elastic compute cloud.