Alcides Fonseca


Posts tagged as Research

Overview of what has been happening to LLMs

It’s impossible to keep up with all the new developments in the LLM era. However, one thing has remained true: LLMs have never stopped improving.

Malte Skarupke explains How LLMs Keep on Getting Better, covering a few of the visible and invisible aspects of LLMs that have been worked on over the past couple of years. It’s a really good overview for those who are not deep in the weeds.

Peer Review is Dead

If ChatGPT can produce research papers that are indistinguishable from what most scientists can write, then maybe scientists can focus on actually advancing science—something that ChatGPT has thus far proven unable to do.

Beyond papers: rethinking science in the era of artificial intelligence by Daniel Lemire

Looking at the proceedings of our conferences over the past few years, I find that most of the papers are simply uninteresting. Moreover, it seems that every first-year PhD student is now required to write a systematic review on their topic — supposedly to learn about the field while producing a publication.

Let me be blunt: every systematic review I’ve read has felt like a waste of time. I want to read opinionated reviews written by experts — people who have seen enough to have perspective — not by PhD students who have just skimmed the past decade of papers on Google Scholar.

We need far fewer papers (I’m doing my best to contribute to that cause), and the ones we do publish should be bold, revolutionary, and even a little irreverent. We need innovation and the courage to break expectations. Incremental research has its place, but that doesn’t mean it always needs to be published.

To make this possible, evaluation committees — both nationally and within universities — must rethink their processes to move away from bean-counting metrics. Our current incentive system discourages genuine peer review, and even when proper reviews happen, they often waste effort on work that adds little value.

Otherwise, yes — the bean-counting-reinforcement-learning AIs will take our jobs.

Trust in Scientific Code

In 2010 Carmen Reinhart and Kenneth Rogoff published Growth in a Time of Debt. It’s arguably one of the most influential economics papers of the decade, convincing the IMF to push austerity measures in the European debt crisis. It was a very, very big deal.
In 2013 they shared their code with another team, who quickly found a bug. Once corrected, the results disappeared.
Greece took on austerity because of a software bug. That’s pretty fucked up.

How do we trust our science code? by Hillel Wayne

As more and more scientific publications depend on code, trusting that code becomes increasingly important. Hillel asks for solutions; I propose tackling the problem on two fronts.

1 – More engineering resources

Writing production-quality software requires more resources (usually engineers, but also some tooling). Most scientific software is written once and never read. Some PhD or MSc student writes a prototype and shows the plots to their advisors, who write (some or most of) the paper. It’s rare for senior researchers to inspect other people’s code. In fact, I doubt any of them (except those who teach software engineering principles) has had any training in code inspection.

We need research labs to hire (and retain) scientific software engineering teams. For that to happen, funding has to be more stable: we cannot rely on project funding that may or may not be awarded. We need stable institutional funding so labs can maintain these teams and resources.

2 – More reproducibility

Artifact Evaluation Committees are a good addition to computer science conferences. Mostly composed of students (who have the energy to debug!), they run the artifacts and verify whether the output justifies the results presented in the paper. Having done that myself in the past, I can say it is very tricky to find bugs in that process. Mostly we verify whether the artifact runs outside the authors’ machine, not whether it is correctly implemented.

What would help is funding the reproduction of science. Set aside 50% of agency funding for reproducibility. Labs that get these projects should need less than the original budget to reproduce the results (most of the challenging decisions have already been made). With this approach we would have less new research, but more robust research.

Given how many CS papers are garbage (including mine), I welcome this change. We need more in-depth, strong papers that move the needle, and fewer bullshit papers published just for the brownie points.

Overall we need better scientific policies with the right incentives for trustworthy science. I wonder who will take this challenge on…

SPECIES Scholarship 2025

Assumed audience: MSc or PhD students with an interest in Evolutionary Algorithms for Program Synthesis or Theorem Proving.

If you are curious about using Evolutionary Algorithms (Genetic Programming in particular) for Theorem Proving or Program Synthesis, consider applying. The scholarship funds a 3-month stay in Lisbon working with me.

Automatic Generation of Mathematical Proofs

Mathematicians like Fields Medalist Terence Tao are now using programming languages like Lean to prove interesting theorems in both known and research-level mathematical areas. The key feature of these languages is Dependent Types, which allow describing much more interesting properties than regular types like Ints and Floats. You can learn more about this approach here.
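To give a flavour of what that means (a toy sketch of mine in Lean 4, not material from the project), a statement about values is itself a type, and data structures can carry invariants in their types:

```lean
-- The type `n + 0 = n` mentions the *value* n; a proof is a program of that
-- type (here, by computation alone).
def addZero : (n : Nat) → n + 0 = n :=
  fun _ => rfl

-- Types can also encode invariants of data: a list whose length is part of
-- its type, so length mistakes become type errors.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons {n : Nat} (head : α) (tail : Vec α n) : Vec α (n + 1)
```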

The Curry-Howard correspondence shows that programs are proofs, and proofs are programs. The goal of this project is to explore how Genetic Programming can be used to synthesize mathematical proofs. One concern is that the probability of a successful crossover is reduced, as common subtrees are less frequent than in symbolic regression or strongly typed programs.
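As a tiny illustration (again my own toy example, not part of the project plan), here is the same trivial statement proved twice in Lean 4: once as an explicit program, once with tactics that elaborate to essentially the same term, which is the kind of tree a GP system would have to evolve.

```lean
-- Under Curry-Howard, this proof of `A → B → A` is literally a function:
-- it takes evidence for A and evidence for B and returns the evidence for A.
theorem constImp (A B : Prop) : A → B → A :=
  fun a _ => a

-- The tactic version produces essentially the same proof term.
example (A B : Prop) : A → B → A := by
  intro a _b
  exact a
```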

For evaluation, we plan to use the Lean programming language, the mathlib proof database, and the LeanGym environment. We also plan to compare against approaches that use Language Models to predict the next step in a proof.

If you have any questions, feel free to reach out to me at me@alcidesfonseca.com

Should proofs have to be readable

It seems to us that the scenario envisioned by the proponents of verification goes something like this: The programmer inserts his 300-line input/output package into the verifier. Several hours later, he returns. There is his 20,000-line verification and the message “VERIFIED.”

Social Processes and Proofs of Theorems and Programs by Richard DeMillo, Richard Lipton and Alan Perlis

Although this is another straw man, many people claim to have verified something, offering as evidence a formal proof using their favourite tool that cannot be checked except by running that very tool again, or possibly some other automatic tool. A legible formal proof allows a human reader to check and understand the reasoning. We must insist on this.

Lawrence C Paulson

In my research group, we have been thinking not only about how to improve error messages, but also about how to improve the understandability of proofs. It feels good to read such a reassuring take.
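To make concrete what legibility means here (my own toy contrast in Lean 4, not an example from Paulson’s post), compare a proof that is a black box to one that spells out reasoning a human can follow:

```lean
-- Opaque: correct, but the reader learns nothing about *why* it holds.
example (a b c : Nat) : a + b + c = c + b + a := by
  simp [Nat.add_comm, Nat.add_assoc, Nat.add_left_comm]

-- Legible: each step and the lemma that justifies it are spelled out.
example (a b c : Nat) : a + b + c = c + b + a :=
  calc a + b + c = c + (a + b) := Nat.add_comm (a + b) c
    _ = c + (b + a) := by rw [Nat.add_comm a b]
    _ = c + b + a := (Nat.add_assoc c b a).symm
```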

Wanted: an elegant solution for breadth-first iteration of a tree structure.

While working on the enumerative feature of GeneticEngine, I wanted to recursively explore all the instances of a grammar, ideally using a cache.

My first solution ended up being depth-first, as I used Python generators to yield all possible options for sum types and to recursively iterate through all arguments of product types.
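In rough strokes, that first attempt had this shape (a toy grammar sketch, not GeneticEngine’s actual API):

```python
from itertools import product

# Toy grammar (hypothetical, for illustration only):
#   Expr ::= Lit(0) | Lit(1) | Add(Expr, Expr)
# bounded by a maximum depth so the enumeration stays finite.

def enumerate_expr(depth: int):
    """Yield every expression of depth <= depth."""
    # Sum type: yield each alternative in turn.
    yield ("Lit", 0)
    yield ("Lit", 1)
    if depth > 0:
        # Product type: recursively enumerate every argument and combine them.
        for left, right in product(enumerate_expr(depth - 1),
                                   enumerate_expr(depth - 1)):
            yield ("Add", left, right)

# For each left subtree, the recursion exhausts every right subtree (including
# deep ones) before moving on, so a depth-2 tree like
# Add(Lit(0), Add(Lit(0), Lit(0))) appears before the depth-1 tree
# Add(Lit(1), Lit(0)): the order is depth-first, not ordered by size.
```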

I’ve written this proof of concept to implement breadth-first iteration of a generic tree structure that yields the right order. However, I find the extra argument a bit ugly, and I would like a more elegant solution. If you happen to know one, I’m all ears!
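For context, here is a rough sketch of what a level-by-level, memoised enumeration of the toy grammar above can look like (just the general shape, not the proof of concept itself): trees of exactly depth d are computed once, and the public generator chains the levels, so no depth argument leaks to callers.

```python
from functools import lru_cache
from itertools import count, islice

@lru_cache(maxsize=None)
def trees_of_depth(d: int) -> tuple:
    """All expressions of the toy grammar whose depth is exactly d (memoised)."""
    if d == 0:
        return (("Lit", 0), ("Lit", 1))
    shallower = tuple(t for k in range(d - 1) for t in trees_of_depth(k))
    frontier = trees_of_depth(d - 1)
    # An Add node has depth d exactly when at least one child has depth d - 1.
    return tuple(
        [("Add", l, r) for l in frontier for r in shallower + frontier]
        + [("Add", l, r) for l in shallower for r in frontier]
    )

def breadth_first_trees():
    """Yield all trees ordered by depth: every depth-0 tree, then depth-1, ..."""
    for d in count():
        yield from trees_of_depth(d)

# First 10 trees: all shallower trees appear before any deeper one.
print(list(islice(breadth_first_trees(), 10)))
```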