Alcides Fonseca

40.197958, -8.408312

Trust in Scientific Code

In 2010 Carmen Reinhart and Kenneth Rogoff published Growth in a Time of Debt. It’s arguably one of the most influential economics papers of the decade, convincing the IMF to push austerity measures in the European debt crisis. It was a very, very big deal.
In 2013 they shared their code with another team, who quickly found a bug. Once corrected, the results disappeared.
Greece took on austerity because of a software bug. That’s pretty fucked up.

How do we trust our science code? by Hillel Wayne

As more and more scientific publications depend on code, trusting that code becomes increasingly important. Hillel asks for solutions; I propose tackling the problem on two fronts.

1 – More engineering resources

Writing production-quality software requires more resources (usually engineers, but also some tooling). Most scientific software is written once and never read. Some PhD or MSc student writes a prototype and shows the plots to their advisors, who write (some or most of) the paper. It’s rare for senior researchers to inspect other people’s code. In fact, I doubt any of them (except those who teach software engineering principles) has had any training in code inspection.

We need research labs to hire (and retain) scientific software engineering teams. For that to happen, funding has to be more stable. We cannot rely on project funding that may or may not be awarded. We need stable institutional funding so labs can maintain these teams and resources.

2 – More reproducibility

Artifact Evaluation Committees are a good addition to computer science conferences. Mostly composed of students (who have the energy to debug!), they run the artifacts and verify whether the output supports the results presented in the paper. Having done that myself in the past, I can say it is very tricky to find bugs in that process. Mostly we verify whether the artifact runs outside the authors’ machine, not whether it is correctly implemented.

What would help is funding the reproduction of science. Set aside 50% of agency funding for reproducibility. Labs that get these projects should spend less than the original project to reproduce the results (most of the challenging decisions have already been made). With this approach we would have less new research, but more robust research.

Given how most CS papers are garbage (including mine), I welcome this change. We need more in-depth, strong papers that move the needle, and fewer bullshit papers published just for the brownie points.

Overall we need better scientific policies with the right incentives for trustworthy science. I wonder who will take this challenge on…

Hidden interface controls are affecting usability

It’s the year 2070. You are a 20-year-old recruit about to travel back in time, 12 Monkeys style, to try and save the world. You get to 2025, you find proof on an iPhone, and you need to take a screenshot and send it to a safe email address. Do you have a chance of discovering how to take a screenshot?

The other day I was locked out of my car. I had my keys, but the key fob button wouldn’t work and neither would the little button on the door handle that normally unlocks the car. At this point, every action I had to take in order to get into the car required knowledge of a hidden control. Why didn’t I just use my key to get in? First, you need to know there is a hidden key inside the fob. Second, because there doesn’t appear to be a keyhole on the car door, you also have to know that you need to disassemble a portion of the car door handle to expose the keyhole.

Philip Kortum has a nice article on how this quest towards “clean” interfaces actually hurts usability.

How to select your side project

Recommended audience: CS students

Austin Henley shares some properties of a good side project. Personally, I think a clear, shippable objective is what most people lack, and its absence prevents their projects from ever being complete.

I remember getting side-project suggestions during my courses. Maybe that’s something I should incorporate in mine.

Most of what I learned during my degree came from side projects. Competing in hackathons, creating a junior company, organizing conferences, doing a couple of research internships, and doing some freelance work all taught me something that was not in the syllabus. That’s what separates you from the average student, and what will get you a good job in a world where unemployed software engineers are plentiful.

Joshua Barretto shares a really interesting list of possible side projects:

  • Regex engine (5 days)
  • x86 kernel (2 months)
  • Gameboy emulator (3 weeks)
  • Gameboy advance game (2 weeks)
  • Chess engine (5 days)
  • Physics engine (1 week)
  • Voxel engine (2 weeks)
  • GUI Toolkit (3 weeks)
  • Posix shell (5 days)
  • Dynamic interpreter (2 weeks)
  • Compiler (3 months)
  • Threaded Virtual machine (1 week)
  • Text editor (4 weeks)

The last four will give you a head start in the programming languages world. I might even have an internship for you.

Perhaps you’re a user of LLMs. But I might suggest resisting the temptation to use them for projects like this. Knowledge is not supposed to be fed to you on a plate. If you want that sort of learning, read a book – the joy in building toy projects like this comes from an exploration of the unknown, without polluting one’s mind with an existing solution.

Selling SAAS to universities

Recommended audience: Startups and large companies who intend to sell software to universities.

Most SAAS is sold on a per-seat basis. But this does not scale to universities: we have a large number of possible seats, yet most of them (students, possibly from different scientific areas) do not use the software enough for it to be worthwhile.

On the other hand, unpredictable costs (when paying per activity) also do not work, as we need to budget for it yearly.

Chris Siebenmann has a really good write up on this issue, which I recommend if you manage or sell to universities.

Gremllm

Take Python magic methods and add LLM code generation. That’s Gremllm, which no one should use.

However, I would certainly use this for debugging code, if the output had a distinct marker (like a different color in my interpreter).

from gremllm import Gremllm

counter = Gremllm("counter")
counter.value = 5
counter.increment()
print(counter.value) # 6?
print(counter.to_roman_numerals()) # VI?
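
Under the hood, the trick is just Python’s __getattr__, which only fires when normal attribute lookup fails. Here is a minimal sketch of the idea (not the actual Gremllm internals; ask_llm below is a stub standing in for a real model call):

def ask_llm(prompt: str):
    # Stub for illustration: Gremllm would generate code from the prompt and run it.
    raise NotImplementedError(prompt)

class LLMObject:
    def __init__(self, role: str):
        self.role = role

    def __getattr__(self, name):
        # Called only for attributes that don't exist, so every unknown
        # method call can be turned into a prompt for the model.
        def method(*args, **kwargs):
            prompt = f"Object '{self.role}': implement and execute {name} with {args} {kwargs}."
            return ask_llm(prompt)
        return method

counter = LLMObject("counter")
counter.value = 5        # stored as a normal attribute
# counter.increment()    # would be routed to the (stubbed) LLM call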

LLMs have the same right as humans when it comes to copyright and learning

As I stated before, the boundary of what is copyright infringement when it comes to machine training is quite blurred.

If you see LLMs as their own entities (I don’t, but I’m afraid Nobel laureate Geoffrey Hinton does), they have the same right to learn as humans. They just happen to have photographic (literary?) memory. Is it their fault?

On the other hand, you can look at LLMs as a form of compression. Lossy, yes, but a compression algorithm nevertheless. In that case, if you zip a book and unzip it, even with a few faults, it’s still the same book you copied.

Legislation will have to decide on this sooner or later.

William Haskell Alsup, of Oracle vs Google fame, ruled that buying and scanning books to train LLMs was legal. He also ruled that downloading ebooks pirated by a third party was not.

Regardless of my own position, I believe every government should create a task force to think about this, including experts from different fields. Last time something like this happened (peer-to-peer, Napster, The Pirate Bay), legislation took too long to arrive. Now, things are moving at an even faster pace. And I’m afraid our legal systems are not flexible and agile enough to adapt.

New Alignments and Fonts

I’ve recently come across some novel approaches to typography, which, given how much we are moving to digital media, I find rarer than expected.

Alternative Layout System presents different ways of justifying text or, in the case of the picture below, ways of annotating what is coming in the next line to help reading.

Kermit is a new font designed by Underware and commissioned by Microsoft that aims to help kids read, including those with dyslexia. But for me, it’s much more than that. It allows tone to be encoded in the typography, helping convey how a text should be read aloud, which makes kids’ books much more fun.

And don’t miss Jason Santa Maria’s other recommendations.

No AI in Servo

Contributions must not include content generated by large language models or other probabilistic tools, including but not limited to Copilot or ChatGPT. This policy covers code, documentation, pull requests, issues, comments, and any other contributions to the Servo project.

A web browser engine is built to run in hostile execution environments, so all code must take into account potential security issues. Contributors play a large role in considering these issues when creating contributions, something that we cannot trust an AI tool to do.

Contributing to Servo (via Simon Willison)

Critical projects should be more explicit about their policies. If I maintained a critical piece of software, I would make the same choice, for safety. The advantages of LLMs are not big enough to be worth the risk.

SPECIES Scholarship 2025

Assumed audience: MSc or PhD Students with interest in Evolutionary Algorithms in Program Synthesis or Theorem Proving.

If you are curious about using Evolutionary Algorithms (Genetic Programming in particular) for Theorem Proving or Program Synthesis, consider applying. The scholarship funds a 3-month stay in Lisbon working with me.

Automatic Generation of Mathematical Proofs

Mathematicians like Fields Medalist Terence Tao are now using programming languages like Lean to prove interesting theorems in both known and research-level mathematical areas. The key feature of these languages is Dependent Types, which allow describing much more interesting properties than regular types like Ints and Floats. You can learn more about this approach here.

The Curry-Howard correspondence shows that programs are proofs, and proofs are programs. The goal of this project is to explore how Genetic Programming can be used to synthesize mathematical proofs. One concern is that the probability of a successful crossover is reduced, as common subtrees are less frequent than in symbolic regression or strongly-typed programs.
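
As a tiny illustration of the correspondence in Lean (a toy example, not one from the project), a proof of an implication is literally a function:

-- A proof of A → (B → A) is a program: a function that takes evidence
-- for A and returns a function that ignores its B argument.
theorem k_combinator (A B : Prop) : A → (B → A) :=
  fun a _b => a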

For evaluation, we plan to use the Lean programming language, the proof database mathlib, and the LeanGym environment. We also plan to compare with approaches that use Language Models to predict the next step in a proof.

If you have any questions, feel free to reach out to me at me@alcidesfonseca.com

Cloud Identity Providers

If you have no in-house authentication or ‘who exists’ identity system and you’ve offloaded all of these to some external provider (or several external providers that you keep in sync somehow), you’re clearly at the mercy of that cloud identity provider. Otherwise, it’s less clear and a lot more situational as to when you could be said to be using a cloud identity provider and thus how exposed you are. I think one useful line to look at is to ask whether a particular identity provider is used by third party services or if it’s only used to for that provider’s own services. Or to put it in concrete terms, as an example, do you use Github identities only as part of using Github, or do you authenticate other things through your Github identities?

Chris Siebenmann

I was a very strong advocate for OpenID and OAuth, back in the day. However, my idea was to own my online authentication. My OpenID was alcidesfonseca.com, which I delegated to a provider of my own choosing, one I could change whenever I wanted without changing anything on the many websites where I used it to authenticate.

However, the silo’ed web decided against custom OpenID providers and ended up supporting only Google, Facebook and, for the more tech-savvy websites, GitHub. And I’m talking about important stuff, like Tailscale (my VPN provider). The open web lost its decentralization.

Now that I am moving away from Google, I have to change my login details on every website where I used Google as my identity. But most do not support adding a traditional email-password login. And I don’t quite want that anyway, because I don’t want to rotate passwords every once in a while on a thousand websites.

So remember, kids: create an account on every website, so you can leave your centralized identity provider whenever they change their privacy policy or start charging for what used to be free.

Watching Millionaires

I watched the Champions League final the other day when it struck me: I’m basically watching millionaires all the time.
The players are millionaires, the coaches are millionaires, the club owners are millionaires. It’s surreal.
This week I watched John Wick Ballerina and, again, there’s Keanu Reeves, who is a millionaire, and Ana de Armas, who is as well.
Yesterday I heard about Trump and Musk fighting. They are not millionaires, they are billionaires!
As I’m writing this, I’m watching the Rock am Ring live stream, a music festival in Germany. Weezer is playing. These guys are all millionaires.
I don’t know what to make of it. It’s a strange realization, but one that feels worth sharing.

Matthias Endler

The most impressive thing is that my grandparents did not do this. They barely listened to the radio, let alone watched TV, which only became available when they were 50+.

Personally, I have given up on watching TV or listening to the radio. I watch the TV shows and movies that I choose, and I listen to the music that I pick. I still watch millionaires, but I am intentional about it.

I recommend looking for indie content, both TV and music. last.fm is a great source for the latter. Following movie critics’ blogs is another way of finding interesting TV. Personally, I rely on jwz.

Transformative and Compositional Work

To assess the productivity gained by LLMs, Iris Meredith distinguishes between transformative and compositional work:

While work is obviously a very, very complicated thing, a useful lens for the purpose of this essay is to draw a distinction between work which reshapes a raw material into a finished object and work that puts together multiple objects in a way that creates a certain effect in the world. For the sake of having a shorthand, I’ve chosen to call them transformative and compositional work, respectively.

While Iris does not take a particularly clear position on LLMs, Jeremy Keith does:

My own take on this is that transformative work is often the drudge work—take this data dump and convert it to some other format; take this mock-up and make a disposable prototype. I want my tools to help me with that.
But compositional work that relies on judgement, taste, and choice? Not only would I not use a large language model for that, it’s exactly the kind of work that I don’t want to automate away.
Transformative work is done with broad brushstrokes. Compositional work is done with a scalpel.

Personally, I think it depends much more on where you are in the economic value proposition. Are you selling quick-and-dirty cheap stuff? LLMs are great for you. Are you delivering high-assurance, high-quality work? Then you might/should be skeptical of LLMs.

Smart Donkey Factory

On my first day of uni, I received these two t-shirts designed by the student group.

two blue t-shirts: the first one depicting the text biggest fucking noob of CS; the second features a factory that takes a donkey as input and outputs the same donkey, but with a diploma

While I completely forgot about the top one, I keep the bottom one near my heart. I found it amusing, but I did not find it to be true. I did learn a lot during those years, and I gained much more than the degree (which is only required by the Portuguese bureaucratic system, where Simon Peyton Jones couldn’t even get a position as Assistant Professor).

Now, 19 years later, I no longer find it funny. I feel the scholarly spirit is dying and young people do not care about learning or knowledge. They care only about grades and getting the degree. And GPT is the TLA that takes them from donkey without a diploma to donkey with a diploma.

Replacing Pokemons with Hometown contributors

The plan worked. Kids have started attending local events and volunteering for community activities — just for a chance to meet the ojisan from their cards. Participation in town events has reportedly doubled since the game launched.

Tokyo Weekender

I’m not a fan of gamification, but prioritizing community members over Pokémon or football players is a win in my book.

I format my computer every semester

This tradition started back when I was a student. I installed random software for each of the 5 courses I took every semester. I ended up with wasted disk space, random OS configurations and always a complete mess in my $PATH.

So I started formatting my Macs at the end of every semester, and I continue doing that today. Being a professor, I still accumulate software baggage each semester; otherwise I would probably format only every year.

Most of the people I know think this is insane! Because they spend days on this chore, they avoid it as much as possible, often delaying it so much that they end up buying a new computer before even considering formatting. And they also delay buying a new computer for the same reason.

My trick is simple: I automate the process as much as possible, so it now takes ~20 minutes to format and install everything, and another 2 hours to copy all the data and log into the necessary accounts. And you can watch a TV show while doing it.

I keep a repository with all my dotfile configurations, which also contains scripts to symlink all those configurations (usually located at $HOME/Code/Support/applebin) to their expected location ($HOME). This setup also includes a .bash_local or .zsh_local where I put machine- or instance-specific details that I don’t mind losing when I format the machine in 6 months. Long-lasting configurations go in the repo.
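
The linking itself is only a handful of lines. A minimal sketch of what such a script might look like:

#!/usr/bin/env python3
# Minimal sketch of a dotfile-linking script: symlink every dotfile in the
# repo into $HOME, replacing whatever was there before.
from pathlib import Path

REPO = Path.home() / "Code" / "Support" / "applebin"  # dotfiles repo mentioned above
HOME = Path.home()

for dotfile in REPO.glob(".*"):
    if dotfile.name == ".git":     # skip the repository metadata itself
        continue
    target = HOME / dotfile.name
    if target.is_symlink() or target.exists():
        target.unlink()            # remove a stale copy or an old link
    target.symlink_to(dotfile)     # e.g. ~/.zshrc -> ~/Code/Support/applebin/.zshrc
    print(f"linked {target} -> {dotfile}")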

If the machine runs macOS, I also run a script that sets a bunch of defaults (Dock preferences, Safari options, you name it), which saves me from going through all the settings windows and configuring everything the way I like it.
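
Something in the spirit of the sketch below; the specific keys are common examples, not necessarily the ones I set:

#!/usr/bin/env python3
# Sketch: apply a few macOS defaults non-interactively instead of clicking
# through System Settings. The keys below are common examples only.
import subprocess

SETTINGS = [
    # (domain, key, type flag, value)
    ("com.apple.dock", "autohide", "-bool", "true"),
    ("com.apple.finder", "AppleShowAllFiles", "-bool", "true"),
    ("NSGlobalDomain", "AppleShowAllExtensions", "-bool", "true"),
]

for domain, key, type_flag, value in SETTINGS:
    subprocess.run(["defaults", "write", domain, key, type_flag, value], check=True)

# Restart the affected apps so the new settings take effect.
for app in ("Dock", "Finder"):
    subprocess.run(["killall", app], check=False)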

But the most useful file is my Brewfile, which contains all the apps and command-line utilities I use. I should write another usesthis, going through all the apps I have installed and why.

My process starts with copying my home directory to an external hard drive (for restore speed). During this step I usually clean up my Downloads and Desktop folders, which act as more durable /tmp folders. When it’s done, I reset my MacBook to a clean state. I then install Homebrew and the Xcode command line tools (for git), clone my repo, and run the setup script. At the same time, I start copying all my documents from the external drive back to my Mac. Then it’s time to do something else.

Two hours later, I can open the newly installed apps, log in or enter registration keys, and make sure everything is working fine.

Now I’m ready for the next semester!

The Great Scrape

These companies are racing to create the next big LLM, and in order to do that they need more and more novel data with which to train these models. This incentivises these companies to ruthlessly scrape every corner of the internet for any bit of new data to feed the machine. Unfortunately these scrapers are terrible netizens and have been taking down site-after-site in an unintentional wide-spread DDoS attack.

More tools are being released to combat this, one interesting tool from Cloudflare is the AI Labyrinth which traps AI scrapers that ignore robots.txt in a never-ending maze of no-follow links. This is how the arms race begins.

Herman Martinus

This is a lost fight. As I’ve written before, the only solution I see is to encrypt content and monetize it, so there is a trail to show in court.

Porting vs rewriting codebases from scratch

Languages that allow for a structurally similar codebase offer a significant boon for anyone making code changes because we can easily port changes between the two codebases. In contrast, languages that require fundamental rethinking of memory management, mutation, data structuring, polymorphism, laziness, etc., might be a better fit for a ground-up rewrite, but we’re undertaking this more as a port that maintains the existing behavior and critical optimizations we’ve built into the language. Idiomatic Go strongly resembles the existing coding patterns of the TypeScript codebase, which makes this porting effort much more tractable.

Ryan Cavanaugh, on why TypeScript chose to rewrite in Go, not Rust (via Simon Willison)

Should proofs have to be readable

It seems to us that the scenario envisioned by the proponents of verification goes something like this: The programmer inserts his 300-line input/output package into the verifier. Several hours later, he returns. There is his 20,000-line verification and the message “VERIFIED.”

Social Processes and Proofs of Theorems and Programs by Richard DeMillo, Richard Lipton and Alan Perlis

Although this is another straw man, many people claim to have verified something, offering as evidence a formal proof using their favourite tool that cannot be checked except by running that very tool again, or possibly some other automatic tool. A legible formal proof allows a human reader to check and understand the reasoning. We must insist on this.

Lawrence C Paulson

In my research group, we have been thinking not only about how to improve error messages, but also about how to improve the understandability of proofs. It feels good to read such a reassuring take.
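
As a toy illustration of the difference (not an example from our work), compare an opaque automated proof with one a human can actually read, in Lean:

-- Opaque: the automation certifies the fact, but shows no reasoning.
example (a b c : Nat) : (a + b) + c = (c + b) + a := by omega

-- Legible: each step names the fact that justifies it.
example (a b c : Nat) : (a + b) + c = (c + b) + a := by
  calc (a + b) + c = c + (a + b) := Nat.add_comm (a + b) c
    _ = c + (b + a) := by rw [Nat.add_comm a b]
    _ = (c + b) + a := (Nat.add_assoc c b a).symm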