Open Source for Research

February 26, 2012

One of the important principles of the scientific method is the full reporting of experimental results: not just the researcher’s conclusions, but also a detailed description of the method and apparatus used, and of the data obtained.  The idea is to enable others to replicate the experiment, to help ensure that the results are not a fluke, or just a mistake.  Ars Technica reports on a new proposal, published in an editorial in the journal Nature, to provide open source code for the computations that underlie most contemporary scientific papers.

Reproducibility refers to the ability to repeat some work and obtain similar results.  …  Scientific papers include detailed descriptions of experimental methods—sometimes down to the specific equipment used—so that others can independently verify results and build upon the work.

Reproducibility becomes more difficult when results rely on software. The authors of the editorial argue that, unless research code is open sourced, reproducing results on different software/hardware configurations is impossible.

If one accepts the fundamental idea of full disclosure, it is hard to argue with the basic thrust of this proposal.  It is a rare piece of experimental research that does not involve the use of some software to process and analyze the resulting data.  At present, there are some journals, such as Science, which expect code to be included as part of a submitted paper.  Others, such as Nature itself, only require a detailed written description of the code.  In some cases, the authors offer to supply an executable (binary) version of the code on request.

Alternatives to the actual source code are less than entirely satisfactory for a number of reasons.  Any experienced software developer can tell you that a major problem with software documentation (that is, a description of the program) is that the description often does not match what the program actually does.  Executable versions of a program will generally be usable only by others with the same computing platform, and are largely opaque; if someone attempts to replicate the results, but gets different answers, it is hard to know where to look.  As I’ve noted before, software for numerical analysis is especially subject to errors resulting from the idiosyncrasies of computer arithmetic; to make matters worse, these effects can be platform-dependent.   Even with source code available, there can be ambiguities in the order in which arithmetic operations are performed; there is seldom any guarantee that the order will match the conventional order expected from looking at the mathematics.
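To make the point about the order of operations concrete, here is a small sketch in Python.  It is not taken from the editorial or from any research code; the values and the helper name sum_in_order are my own, chosen purely for illustration, and it assumes the usual IEEE 754 double-precision arithmetic that Python floats use on essentially all current platforms.  The same four numbers are added in two different orders and give two different answers, neither of which is the mathematically exact sum.

```python
import math

# Illustrative values only: two very large terms that cancel, plus two small ones.
values = [1e16, 1.0, -1e16, 1.0]


def sum_in_order(xs):
    """Add the numbers strictly left to right, rounding after each step."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Original order: the first 1.0 is lost when added to 1e16, the second survives.
as_written = sum_in_order(values)          # 1.0

# Sorted order: both 1.0 terms are absorbed by -1e16 before it is cancelled.
reordered = sum_in_order(sorted(values))   # 0.0

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
exact = math.fsum(values)                  # 2.0

# Even tiny expressions depend on grouping: addition is not associative.
grouping_differs = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

print(as_written, reordered, exact, grouping_differs)
```

A compiler, a parallel reduction, or a different math library is free to pick either order, which is one reason a written description of a calculation cannot fully stand in for the code that was actually run.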

Of course, in some cases there are obstacles to releasing the source code.  But the purpose of the scientific enterprise really requires every effort to make experimental results as transparent as possible.

