This page belongs to EPFL's web archive and is no longer updated.

Reproducible Research

An article about reproducible research appeared in the July 2007 issue of the IEEE Signal Processing Magazine eNewsletter. It invites readers to discuss reproducible research on our discussion forum.

Recently, a note encouraging authors to make their publications reproducible was also added to the IEEE Transactions on Signal Processing homepage.

Things are moving, and they are moving fast!
Posted by Patrick Vandewalle at 17:42
Reproducible Research in Blogosphere
Things have been rather quiet here recently... Not because nothing was happening on reproducible research, but mainly because I was not sure about the purpose and usefulness of this blog. Please feel free to let me know what you think about the use, or lack of use, of such a blog.

After some interesting discussions about reproducible research and open access, some colleagues have reported on our reproducible research initiatives on their blogs:
- Peter Murray-Rust wrote an article "Open Data is critical for Reproducible Research" on his blog at the University of Cambridge. He is quite active on Open Access to publications and data in chemistry. He and his colleagues have built a robot that extracts crystallographic information from publications and gathers it in an online database, CrystalEye. In their community, they also have the Blue Obelisk, which collects open source code and data in chemistry.
- Peter Suber referred to reproducible research on his Open Access News blog: OA for text, data, and code to make research reproducible. Peter is a policy strategist for open access to scientific and scholarly research literature. On his blog, he gives a lot of news about new initiatives, publishing policies, etc.

And thanks of course also to Stevan Harnad for his kind and helpful reactions, and for bringing me into contact with these people!
Posted by Patrick Vandewalle at 17:34
repository server for publications
I think it's probably a lot easier, and more consistent, if instead of making a web page for each RR paper we do (http://lcavwww.epfl.ch/reproducible_research), we have a setup (a bit) like Infoscience, where everyone can enter publications by filling in the required and optional fields. I would like to build such a setup based on EPrints (http://www.eprints.org/software/) and make it public, such that other labs/universities can also easily set up a similar server. We will probably let the people from EPrints develop this system, but for that we need accurate requirements... So your comments on this would be very welcome!

I was thinking about the following fields:
- standard publication fields (title, author, reviewing status, journal, volume, number, pages, year, DOI, abstract, keywords, PDF, publisher, official URL)
- specifically for RR:
* code and data (in a zip archive, specifying also the type of code), mandatory
* tested configurations, mandatory
* contact e-mail address, mandatory
* figures, optional
- additional features for readers (cf. http://clare.eprints.org/10/ for an example of the last)
* a check box saying 'I have tested this code and it runs/does not run'
* a check box saying 'I was/was not able to reproduce the results described in this paper'
* a field where anyone can add comments
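
To make the proposal more concrete, the field list above could be sketched as a small data model. This is only an illustration in Python, not how EPrints stores records: all class, field, and method names here (RRSubmission, ReaderFeedback, missing_mandatory, add_feedback) are hypothetical, and only a few of the standard publication fields are shown. The sketch also assumes that each reader report carries a name and a date, so that a repeated report from the same person replaces the earlier one instead of being counted again.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# RR-specific fields marked "mandatory" in the list above (names are hypothetical).
MANDATORY_RR_FIELDS = ("code_and_data_archive", "tested_configurations", "contact_email")


@dataclass
class ReaderFeedback:
    """One reader's report, tied to a name and a date (an assumption of this sketch)."""
    reader_name: str
    date: str                                  # e.g. "2007-01-15"
    code_runs: Optional[bool] = None           # 'I have tested this code and it runs/does not run'
    results_reproduced: Optional[bool] = None  # 'I was/was not able to reproduce the results'
    comment: str = ""                          # free-text comment field


@dataclass
class RRSubmission:
    # Standard publication fields (only a few of them shown here).
    title: str
    authors: List[str]
    year: int
    # RR-specific fields, mandatory in the proposal.
    code_and_data_archive: str                 # path or URL of the zip archive
    code_type: str                             # e.g. "MATLAB", "C"
    tested_configurations: List[str]
    contact_email: str
    # Optional fields and reader additions.
    figures: List[str] = field(default_factory=list)
    feedback: List[ReaderFeedback] = field(default_factory=list)

    def missing_mandatory(self) -> List[str]:
        """Return the names of mandatory RR fields that are still empty."""
        return [name for name in MANDATORY_RR_FIELDS if not getattr(self, name)]

    def add_feedback(self, report: ReaderFeedback) -> None:
        """Keep at most one report per reader name; a newer report replaces the older one."""
        self.feedback = [f for f in self.feedback if f.reader_name != report.reader_name]
        self.feedback.append(report)
```

A submission form could call missing_mandatory() before accepting an entry, and add_feedback() each time a reader ticks one of the check boxes.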

Any comments? More/less things needed?
Some specific questions:
- should we make these 'Additional features' linked to a name and/or date or so, such that we can avoid the author clicking 10 times? ;-)
- should we separate code and data? Data might get quite large, while code is generally small.
Posted by Patrick Vandewalle at 10:35
licenses - which one to use?
OK, there we go again after some pretty silent months... with a first note about a usable license for our reproducible research.

I have done some more reading about licensing possibilities, and believe me, there are plenty of them ;-) See for example
- www.opensource.org,
- GNU Licenses, or
- Creative Commons licenses (although this one is not intended for software, so it seems not really useful for our purpose).

Some features that I would find desirable for our license:
- be an 'open' license, meaning that people can get it freely (without paying) and easily on the web, and can even contribute etc.
- it would sound fair to me that if someone wants to build a commercial application using my code, he has to somehow ask (and pay) for it.
- if I want to commercialize my code myself, I need to be able to do this ;-)

This second point seems to be a problem with most open source licenses. GPL says that all derived works need to use GPL too, whereas many other open licenses allow any kind of redistribution, under whatever commercial or noncommercial license that person wants. As far as I can see, the third item is not a problem, as the author himself can apparently re-license things any way he wants. Except of course for the fact that some version of your code may already be floating around on the internet.

I currently feel attracted to the dual licensing I saw in some places on the internet (MySQL uses this, and our neighbors from CVLab also): make the work freely available under GPL by default, but with a remark that people who want to use it commercially can contact us for a commercial license. This should give a very open distribution and force other people to use GPL too if they want to redistribute it, while still leaving the possibility to commercialize it.

Any comments on this? Is this the way we should license our reproducible work?
Posted by Patrick Vandewalle at 16:22
Academic Free License (AFL) v. 3.0
--- just copying an e-mail from Christophe below about a possible license: AFL ---

Hello to the reproducible research group ;)

Although I should still read it once more (and in its entirety) to understand it, this license could be a good candidate for Reproducible Research.
Does anyone know about it and if it is good or bad in any sense?

HTML: http://www.opensource.org/licenses/afl-3.0.php

Posted by Patrick Vandewalle at 16:56
One important issue to discuss is under which license all our reproducible code and information should go. Currently most of our stuff is under GPL, but that does not even permit us to commercialize things later. All derived code and products necessarily have to go under GPL too in that case. So maybe not that great if someone wanted to do a startup based on his PhD research.
Posted by Patrick Vandewalle at 16:56
ICASSP special session
At the next ICASSP conference (April 15-20, 2007, in Hawaii), we are organizing a special session on the topic, together with Mauro Barni and Fernando Perez-Gonzalez. This should allow discussion and exchange of ideas with a broader public. We will have six papers covering various aspects of reproducible research: case studies, public datasets, publishing issues, tools for making research reproducible, etc. I can already say that the papers look very interesting. I am already looking forward to the conference and the inspiring discussions around it (and not only because of its location ;-)).
Posted by Patrick Vandewalle at 22:02
What is reproducible research?
The idea behind reproducible research is quite simple: all the information relevant to the work should be made available. This means that the publication(s), the data and code used to produce results, figures, etc. should be available, typically online. In practice, this does require some effort, which is largely paid back in additional visibility, impact, and ease of reuse of the work.
Posted by Patrick Vandewalle at 21:53
Welcome to this blog about reproducible research!

The goal of this blog is to exchange ideas about reproducible research in general, and on how to make all our lab's research reproducible. May the exchange of ideas be fruitful, and result in a good setup for reproducible research!
Posted by Patrick Vandewalle at 21:36