Alcides Fonseca


Trust in Scientific Code

In 2010 Carmen Reinhart and Kenneth Rogoff published Growth in a Time of Debt. It’s arguably one of the most influential economics papers of the decade, convincing the IMF to push austerity measures in the European debt crisis. It was a very, very big deal.
In 2013 they shared their code with another team, who quickly found a bug. Once corrected, the results disappeared.
Greece took on austerity because of a software bug. That’s pretty fucked up.

From “How do we trust our science code?” by Hillel Wayne

As more and more scientific publications depend on code, trusting that code becomes more and more important. Hillel asks for solutions; I propose tackling the problem on two fronts.

1 – More engineering resources

Writing production-quality software requires more resources (usually engineers, but also some tooling). Most scientific software is written once and never read. Some PhD or MSc student writes a prototype and shows the plots to their advisors, who write (some or most of) the paper. It is rare for senior researchers to inspect other people’s code. In fact, I doubt any of them (except those who teach software engineering) has had any training in code inspection.

We need research labs to hire (and retain) scientific software engineering teams. For that to happen, funding has to be more stable: we cannot rely on project funding that may or may not be awarded. Institutions need stable funding so they can maintain these teams and their tooling.

2 – More reproducibility

Artifact Evaluation Committees are a good addition to computer science conferences. Composed mostly of students (who have the energy to debug!), they run the artifacts and verify whether the output supports the results presented in the paper. Having served on one myself, I can say it is very tricky to find bugs in that process. Mostly we verify that the artifact runs outside the authors’ machine, not whether it is correctly implemented.

What would help is funding the reproduction of science. Set aside 50% of agency funding for reproducibility. Labs awarded these projects should need less money than the original project to reproduce the results (most of the challenging decisions have already been made). With this approach we will have less new research, but more robust research.

Given how many CS papers are garbage (including mine), I welcome this change. We need more in-depth, strong papers that move the needle, and fewer bullshit papers published just for the brownie points.

Overall, we need better scientific policies with the right incentives for trustworthy science. I wonder who will take this challenge on…