Thomas M. DeCarlo
  • Home
  • Research blog
  • CV
  • Photos
  • Data and Codes
  • Media
  • Press Room
  • Contact

Research blog

Transparent science

3/5/2017


Have you ever tried to repeat the analysis described in a paper and found it difficult? Whether it's missing raw data or uncertainty about which R package was used, this can be quite frustrating. Wouldn't it be nice if authors published all of the raw data and codes needed to rerun the analysis at the click of a single button? Now this is possible! Code Ocean is a new online tool that lets you upload data and run scripts written in a variety of languages (Matlab and R being the most relevant for me). The great part is that you need not have these programs installed on your computer, or even download the data; the scripts run in the cloud at literally the click of a single button. (I realise this may sound like an advertisement, but I assure you that it is not. I am just genuinely excited about this.)

I think this could be a great leap forward in making science transparent and reproducible. Imagine if every published paper were accompanied by a Code Ocean post that let anyone execute the full analysis of the raw data. This would (1) save people a lot of time spent trying to recreate published analyses, (2) encourage collaboration, (3) improve reproducibility, and (4) ultimately lead scientists to write better code. Why this last point? If you know that the code you are writing will be made public at some later stage, you are much more likely to keep it neat and well commented, in much the same way that you carefully write the paper describing it.

Of course, I can see reasons why people may hesitate to post their code. What if someone finds a mistake in the code? What if someone tries to reuse the code in an inappropriate way? Plus, it takes some effort to check and upload the working code.

But surely the benefits vastly outweigh these downsides (more like excuses, really). The problem I foresee is that, unfortunately, there is little direct incentive to post code. It won't directly boost one's ResearchGate or Google Scholar metrics, or one's CV, and as far as I know no journals require authors to upload executable code. I suspect that for this to gain momentum, it will have to be required by either funding agencies or journals; alternatively, if enough researchers voluntarily post their codes, peer pressure may encourage others to follow suit.

I tried this out for one of my previous publications. I will admit I was skeptical that Code Ocean would work in this case, since it is some of the most complex code I have written: multiple ordinary differential equations embedded in multiple optimisations, all embedded in a Monte Carlo simulation. I imagined that Code Ocean must use some simplified Matlab compiler that would get caught up somewhere along the way. But I was wrong! I literally just dragged and dropped several Matlab .m files and the text files containing the raw data into Code Ocean, changed the paths in the code to point to the raw data, and clicked "Run". Ok, ok, I actually did need to write a few new lines at the end so that the main results would be exported as a simple text file (my original analysis stored them in structures in a .mat file). But really, this took all of 20 minutes of my time. I even got personalised feedback on the script from a Code Ocean staff member. Now the entire analysis is traceable, from the very raw data to the final results in the published tables. Check it out below.
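The export step mentioned above (turning results held in structures into a plain text file) is simple to sketch. This is not the Matlab code or the Code Ocean capsule referred to in the post; it is a minimal Python analogue, assuming results are held in nested dictionaries that stand in for Matlab structures. The function names `flatten`, `format_results`, and `export_results` are hypothetical, chosen only for illustration. Each result is written as one `name<TAB>value` line, so the output can be read in any text editor or language without Matlab.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from pathlib import Path
from typing import Any


def flatten(results: Mapping[str, Any], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_name, value) pairs from a nested mapping.

    Nested dicts stand in for Matlab structures (an assumption of this
    sketch): results["fit"]["slope"] becomes the name "fit.slope".
    """
    for key, value in results.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into sub-structures, extending the dotted name.
            yield from flatten(value, name)
        else:
            # Leaf value (number, string, list, ...) is emitted as-is.
            yield name, value


def format_results(results: Mapping[str, Any]) -> str:
    """Render one tab-separated 'name<TAB>value' line per leaf value."""
    return "".join(f"{name}\t{value}\n" for name, value in flatten(results))


def export_results(results: Mapping[str, Any], out_path: Path) -> None:
    """Write the formatted results to a plain text file (hypothetical helper)."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(format_results(results), encoding="utf-8")


if __name__ == "__main__":
    # Made-up values for illustration only; not results from any paper.
    example = {"n_simulations": 1000, "fit": {"slope": 0.42, "intercept": 1.3}}
    print(format_results(example), end="")
```

Running the script prints the made-up entries one per line; calling `export_results(example, Path("results") / "main_results.txt")` (a hypothetical output path) would write the same text to a file. A flat text format is used here because, unlike a .mat file, it needs no particular software to open.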

So will you post the data and codes for your publications? Or what is your excuse?


    Author

    Thomas M. DeCarlo
