You must post your source code in science.

An image of the Ctrl-Alt-Delete source code.

Show me the source!

I’ve made it pretty clear by now that I do a lot of computational work in my research, so you can imagine that my metaphorical ears perked up when I came across this article in Nature by Nick Barnes about releasing scientific source code when you publish research papers.  I liked the article he wrote, but since this is my blog, I’m allowed to go even further than he did.

I’ve written a lot of software as a hobbyist over the years, ever since I began programming in elementary school.  A lot of it I simply tinkered with and forgot, but I’ve released source into the wild before, though I’m not an open source / free software zealot in general;  for example, I think that the FSF is, frankly, a little out to lunch about the issues.

However, when it comes to science I am militant:  if you publish a scientific paper based on the results of code that you have written, then that code is a part of your methodology and must be made available to others in the scientific community so that they can examine it and replicate your results.  Nobody is allowed to get away with saying “well, we did this DNA sequencing, but I’m not going to tell you how we did it, what materials or equipment we used, or what our procedures were – you just have to trust us that our results are right”.  That wouldn’t fly in any reputable journal, and it shouldn’t be allowed when it comes to source code.  The Barnes article implies that some people are too ashamed of their code to release it – but if you’re too ashamed to let other people see it, why are you publishing results based on it?

The other excuses he mentions in the article are equally rubbish, but one that he didn’t mention (which I’ve actually come across) is “oh, well, I can’t give you the source code because you’ll use it to do further work and publish papers that I want to do”.  This infuriated me when I heard it.  How the author in question managed to justify this in their own head mystifies me;  where in science do you get to claim that you can’t release your methodology because other people might replicate or extend your work?  That’s the whole point of science.  You don’t get to write a paper which proposes a great new method and then admonish people that they can’t use it until you’re done publishing on it!

The only possible exception that I can see to this is in cases where the paper describes a finished product that is being made available to the scientific community (either free or for pay), but I don’t think too much about these cases because they strike me as being more of an advertisement than anything else.  There’s nothing inherently wrong with that, but it will also be fairly rare.  I would also distinguish between a specific product (like, say, a GIS tool) and a new method, like a statistical analysis package.  Papers describing the latter should release the code, no exceptions, because otherwise we can’t validate the method for ourselves.  An example of doing it right comes from the Laland lab at St. Andrews, who published a new method for measuring the spread of information across a network (network based diffusion analysis; NBDA).  Along with the paper, they released the source code and a package for R to help users implement and use the method themselves.

In the end, science thrives on the free exchange of information for the advancement of our collective knowledge. Anyone who feels that their source code is not a part of that exchange is not only wrong, they’re doing bad science.

Image credit: ptufts