Alcides Fonseca

40.197958, -8.408312

New Alignments and Fonts

I’ve recently come across novel approaches to typography, which, given how much of our reading has moved to digital media, are rarer than I expected.

Alternative Layout System presents different ways of justifying text or, in the case of the picture below, ways of annotating what is coming in the next line to help reading.

Kermit is a new font designed by Underware and commissioned by Microsoft that aims to help kids read, including those with dyslexia. But for me, it’s much more than that. It allows authors to encode tone in the typography, helping convey how a text should be read and making kids’ books much more fun.

And don’t miss Jason Santa Maria’s other recommendations.

No AI in Servo

Contributions must not include content generated by large language models or other probabilistic tools, including but not limited to Copilot or ChatGPT. This policy covers code, documentation, pull requests, issues, comments, and any other contributions to the Servo project.

A web browser engine is built to run in hostile execution environments, so all code must take into account potential security issues. Contributors play a large role in considering these issues when creating contributions, something that we cannot trust an AI tool to do.

Contributing to Servo (via Simon Willison)

Critical projects should be more explicit about their policies. If I maintained a critical piece of software, I would make the same choice, for safety. The advantages of LLMs are not large enough to be worth the risk.

Species Scholarship 2025

Assumed audience: MSc or PhD students with an interest in Evolutionary Algorithms applied to Program Synthesis or Theorem Proving.

If you are curious about applying Evolutionary Algorithms (Genetic Programming in particular) to Theorem Proving or Program Synthesis, consider applying. The scholarship funds a 3-month stay in Lisbon working with me.

Automatic Generation of Mathematical Proofs

Mathematicians like Fields Medalist Terence Tao are now using programming languages like Lean to prove interesting theorems in both known and research-level mathematical areas. The key feature of these languages is Dependent Types, which allow one to describe much richer properties than regular types like Int or Float. You can learn more about this approach here.

The Curry-Howard correspondence shows that programs are proofs, and proofs are programs. The goal of this project is to explore how Genetic Programming can be used to synthesize mathematical proofs. One concern is that the probability of a successful crossover is reduced, as common subtrees are less frequent than in symbolic regression or strongly-typed programs.
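As a minimal illustration of the Curry-Howard view (my own sketch, not part of the project description), consider this one-line Lean 4 proof, where the proposition is a type and the program inhabiting it is the proof:

    -- The implication (A → B) is a type; the term `h a` is a program of type B,
    -- and therefore a proof of B.
    theorem modus_ponens (A B : Prop) (h : A → B) (a : A) : B := h a

A Genetic Programming synthesizer would search for the term h a, given the goal type B and a typing context containing h and a.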

For evaluation, we plan to use the Lean programming language, the mathlib proof database, and the LeanGym environment. We also plan to compare with approaches that use Language Models to predict the next step in a proof.

If you have any questions, feel free to reach out to me at me@alcidesfonseca.com

Cloud Identity Providers

If you have no in-house authentication or ‘who exists’ identity system and you’ve offloaded all of these to some external provider (or several external providers that you keep in sync somehow), you’re clearly at the mercy of that cloud identity provider. Otherwise, it’s less clear and a lot more situational as to when you could be said to be using a cloud identity provider and thus how exposed you are. I think one useful line to look at is to ask whether a particular identity provider is used by third party services or if it’s only used for that provider’s own services. Or to put it in concrete terms, as an example, do you use Github identities only as part of using Github, or do you authenticate other things through your Github identities?

Chris Siebenmann

I was a very strong advocate for OpenID and OAuth, back in the day. However, my idea was to own my online authentication. My OpenID was alcidesfonseca.com, which I would delegate to a provider of my own choosing, which I could change whenever I wanted, without changing anything on the several websites I used to authenticate with.
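For context, delegation was just two link tags in the head of your homepage (the provider URLs below are illustrative, not the ones I used):

    <!-- OpenID 2.0 delegation: my domain remains the identifier,
         while a provider of my choosing performs the authentication. -->
    <link rel="openid2.provider" href="https://provider.example.com/server">
    <link rel="openid2.local_id" href="https://alcides.provider.example.com/">

Switching providers meant editing these two lines, with nothing to change on the relying websites.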

However, the siloed web decided against custom OpenID providers, and websites ended up supporting Google, Facebook and, for the more tech-savvy ones, GitHub. And I’m talking about important stuff, like Tailscale (my VPN provider). The open web lost its decentralization.

Now that I am moving away from Google, I have to change my login details on every website where I used Google as my identity. But most do not support adding a traditional email-and-password login. And I don’t quite want that either, because I don’t want to rotate passwords every once in a while on a thousand websites.

So remember kids, create an account on every website, so you can leave your centralized server whenever they change their privacy policy or they start charging for what used to be free.

Watching Millionaires

I was watching the Champions League final the other day when it struck me: I’m basically watching millionaires all the time.
The players are millionaires, the coaches are millionaires, the club owners are millionaires. It’s surreal.
This week I watched John Wick Ballerina and, again, there’s Keanu Reeves, who is a millionaire, and Ana de Armas, who is as well.
Yesterday I heard about Trump and Musk fighting. They are not millionaires, they are billionaires!
As I’m writing this, I’m watching the Rock am Ring live stream, a music festival in Germany. Weezer is playing. These guys are all millionaires.
I don’t know what to make of it. It’s a strange realization, but one that feels worth sharing.

Matthias Endler

The most impressive thing is that my grandparents did not do this. They barely listened to the radio, let alone watched TV, which only became available when they were past 50.

Personally, I have given up on broadcast TV and radio. I watch the TV shows and movies that I decide on, and I listen to the music that I choose. I still watch millionaires, but I am intentional about it.

I recommend looking for indie content, both TV and music. last.fm is a great source for the latter. Following movie critics’ blogs is another way of finding interesting TV; I rely on jwz.

Transformative and Compositional Work

To assess the productivity gains from LLMs, Iris Meredith distinguishes between transformative and compositional work:

While work is obviously a very, very complicated thing, a useful lens for the purpose of this essay is to draw a distinction between work which reshapes a raw material into a finished object and work that puts together multiple objects in a way that creates a certain effect in the world. For the sake of having a shorthand, I’ve chosen to call them transformative and compositional work, respectively.

While Iris does not take a particularly clear position on LLMs, Jeremy Keith does:

My own take on this is that transformative work is often the drudge work—take this data dump and convert it to some other format; take this mock-up and make a disposable prototype. I want my tools to help me with that.
But compositional work that relies on judgement, taste, and choice? Not only would I not use a large language model for that, it’s exactly the kind of work that I don’t want to automate away.
Transformative work is done with broad brushstrokes. Compositional work is done with a scalpel.

Personally, I think it depends much more on where you are in the economic value proposition. Are you selling quick-and-dirty cheap stuff? LLMs are great for you. Are you delivering high-assurance, high-quality work? Then you might/should be skeptical of LLMs.

Smart Donkey Factory

On my first day of uni, I received these two t-shirts designed by the student group.

two blue t-shirts: the first one depicting the text “biggest fucking noob of CS”; the second features a factory that takes a donkey as input and outputs the same donkey, but with a diploma

While I completely forgot about the top one, I keep the bottom one near my heart. I found it amusing, but I did not find it to be true. I did learn a lot during those years, and I gained much more than the degree (which is only required by the Portuguese bureaucratic system, in which Simon Peyton Jones couldn’t even get a position as an Assistant Professor).

Now, 19 years later, I no longer find it funny. I feel the scholarly spirit is dying: young people do not care about learning or knowledge. They care only about grades and getting the degree. And GPT is the TLA that takes them from the donkey without the diploma to the donkey with the diploma.

Replacing Pokémon with Hometown contributors

The plan worked. Kids have started attending local events and volunteering for community activities — just for a chance to meet the ojisan from their cards. Participation in town events has reportedly doubled since the game launched.

Tokyo Weekender

I’m not a fan of gamification, but prioritizing community members over Pokémon or football players is a win in my book.

I format my computer every semester

This tradition started back when I was a student. I installed random software for each of the 5 courses I took every semester. I ended up with wasted disk space, random OS configurations and always a complete mess in my $PATH.

So I started formatting my Macs at the end of every semester, and I continue doing that today. Being a professor, I still accumulate software baggage every semester — otherwise I would probably format only once a year.

Most of the people I know think this is insane! Because they would spend days on this chore, they avoid it as much as possible, often delaying it so much that they end up buying a new computer before even considering formatting. And they delay buying a new computer for the same reason.

My trick is simple: I automate the process as much as possible, so that it now takes ~20 minutes to format and install everything, and another 2 hours to copy all data and log into the necessary accounts. And you can watch a TV show while doing it.

I keep a repository with all my dotfile configurations, which also contains scripts to soft-link all my configurations (usually located at $HOME/Code/Support/applebin) to their expected location ($HOME). This process also includes a .bash_local or .zsh_local where I keep machine- or instance-specific details that I don’t mind losing when I format the machine in 6 months. Long-lasting configurations go in the repo.
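A minimal sketch of what the symlinking step can look like (a Python stand-in for my actual script; paths and logic are illustrative):

    # Soft-link every dotfile in the repo into $HOME,
    # replacing stale links or old copies along the way.
    from pathlib import Path

    DOTFILES = Path.home() / "Code" / "Support" / "applebin"

    for src in DOTFILES.glob(".*"):
        dest = Path.home() / src.name
        if dest.is_symlink() or dest.exists():
            dest.unlink()          # drop the stale link or old copy
        dest.symlink_to(src)       # soft link into its expected location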

If the machine runs macOS, I also run a script that sets a bunch of defaults (dock preferences, Safari options, you name it), saving me from going through all the settings windows to configure it the way I like.

But the most useful file is my Brewfile, which contains all the apps and command-line utilities I use. I should write another usesthis, where I go through all the apps I have installed, and why.

My process starts with copying my home directory to an external hard drive (for faster restores). During this step I usually clean up my Downloads and Desktop folders, which act as more durable /tmp folders. When it’s done, I reset my MacBook to a clean state. I then install Homebrew and the Xcode command-line tools (for git), clone my repo and run the setup script. At the same time, I start copying all my documents from the external drive back to my Mac. Then it’s time to do something else.

Two hours later, I can open the newly installed apps, log in or enter registration keys, and make sure everything is working fine.

Now I’m ready for the next semester!

The Great Scrape

These companies are racing to create the next big LLM, and in order to do that they need more and more novel data with which to train these models. This incentivises these companies to ruthlessly scrape every corner of the internet for any bit of new data to feed the machine. Unfortunately these scrapers are terrible netizens and have been taking down site-after-site in an unintentional wide-spread DDoS attack.

More tools are being released to combat this, one interesting tool from Cloudflare is the AI Labyrinth which traps AI scrapers that ignore robots.txt in a never-ending maze of no-follow links. This is how the arms race begins.

Herman Martinus

This is a lost fight. As I’ve written before, the only solution I see is to encrypt content and monetize it, so there is a trace to show in court.

Porting vs rewriting codebases from scratch

Languages that allow for a structurally similar codebase offer a significant boon for anyone making code changes because we can easily port changes between the two codebases. In contrast, languages that require fundamental rethinking of memory management, mutation, data structuring, polymorphism, laziness, etc., might be a better fit for a ground-up rewrite, but we’re undertaking this more as a port that maintains the existing behavior and critical optimizations we’ve built into the language. Idiomatic Go strongly resembles the existing coding patterns of the TypeScript codebase, which makes this porting effort much more tractable.

Ryan Cavanaugh, on why TypeScript chose to rewrite in Go, not Rust (via Simon Willison)

Should proofs have to be readable

It seems to us that the scenario envisioned by the proponents of verification goes something like this: The programmer inserts his 300-line input/output package into the verifier. Several hours later, he returns. There is his 20,000-line verification and the message “VERIFIED.”

Social Processes and Proofs of Theorems and Programs by Richard DeMillo, Richard Lipton and Alan Perlis

Although this is another straw man, many people claim to have verified something, offering as evidence a formal proof using their favourite tool that cannot be checked except by running that very tool again, or possibly some other automatic tool. A legible formal proof allows a human reader to check and understand the reasoning. We must insist on this.

Lawrence C Paulson

In my research group, we have been thinking not only about how to improve error messages, but also about how to improve the understandability of proofs. It feels good to read such a reassuring take.
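A toy Lean 4 illustration of the difference (my own example, not from Paulson): both proofs below are machine-checked, but only the second exposes reasoning a human can follow.

    -- Opaque: an automatic tactic closes the goal; the reader learns nothing.
    example (a b : Nat) : a + b = b + a := by omega

    -- Legible: the proof names the exact lemma that justifies the claim.
    example (a b : Nat) : a + b = b + a := Nat.add_comm a b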

How scientists learn computing and use LLMs to program

“scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries.” Again, they are busy professionals who are trying to get their job done, not trying to learn a programming language.

How scientists learn computing and use LLMs to program: Computing education for scientists and for democracy

Very interesting read, especially since we teach programming to non-CS students, which is fundamentally different. Scientists are often multilingual (Python, R, bash) and use LLMs to get the job done. Their goal is not to write maintainable large software, but rather scripts that achieve a goal.

Now I wonder how confident they are that their programs do what they are supposed to do. In my own research, I’ve found invisible bugs (in bash, in parameter setting, usually in parts of the code that are not algorithmic) that produce wrong results. How many results in published articles are wrong because of such bugs?
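As a hypothetical example (not one of the actual bugs I found), a misspelt parameter name can silently fall back to a default and still produce plausible-looking results:

    # The script runs without errors, but with the wrong learning rate:
    params = {"learning_rate": 0.1}
    lr = params.get("learnig_rate", 0.01)  # typo silently selects the default 0.01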

We might need to improve the quality of code written by non-computer scientists.

Do not take career advice from engineers with 5+ years of experience

Advice from people with long careers on what worked for them when they were getting started is unlikely to be advice that works today. The tech industry of 15 or 20 years ago was, again, dramatically different from tech today. I used to joke that if you knew which way was up on a keyboard, you could get a job in tech. That joke makes no sense today: breaking into the field is now very difficult, and getting harder every year.

Beware tech career advice from old heads, by Jacob Kaplan-Moss

The industry is undervaluing junior developers by assuming LLMs can do their work. That may be true at this instant, but junior developers have the potential to become senior developers.

I still remember years when my team did not have interns at Uber; and years when we did. During the time we did: energy levels were up, and excluding the intern I’d wager we actually did more. Or the same. But it was a lot more fun. All our interns later returned as fulltime devs. All of them are now sr or above engineers – at the same company still (staying longer than the average tenure)

Gergely Orosz

Whether LLMs can eventually be promoted to senior developers (or management) is a matter of faith. And if you believe they can, you may need to reconsider your own job.

Copyright, AI and the Future of the Web

Gorillaz’s Damon Albarn and Kate Bush are among 1,000 artists who released a silent album (on Spotify, no less) in protest against the UK government allowing AIs to be trained on copyright-protected work without permission.

This protest highlights the tension between creating valuable tools and devaluing human content.

The value of AI

ChatGPT (and even Apple Intelligence) is trained on information publicly available on the internet, data from (consenting) third parties, and information provided by their employees or contractors. Over the last year and a half, people have been amazed at what ChatGPT has been able to do. Although the quality of its work fluctuates as new data/methods are being updated, ChatGPT and similar tools are being used to create value. But at what cost?

Unconsciously, The Algorithm has become more and more important in our lives. From Instagram and TikTok reels, X and Facebook timelines, Spotify, YouTube, or Netflix’s recommendations, the decision of what we see is no longer ours. And we are also not delegating our choices to a human editor (as is the case of the old boring telly or radio channels). Those decisions are being made by black-box algorithms that are hidden in the shadows.

The EU AI law, which I blogged about before, only requires explainability for applications in high-risk domains. Entertainment can hardly be thought of as high-risk. However, I would argue that given the importance of online content consumption in today’s society, it should be considered high-risk. One example is the perceived power of Twitter/X in political elections.

On the other hand, educational purposes are considered fair use in most countries (which is certainly the case here in Portugal). What is the difference between fair use for human and machine learning? As we become increasingly dependent on AI for our daily tasks — I use Siri and Reminders to augment my memory and recall — we become de facto cyborgs. Is there a difference between human and machine learning for education?

The devaluation of human content

In 2017, Spotify introduced the Perfect Fit Content program, encouraging editors to include songs purposely designed to fit a given mood in their playlists. Liz Pelly goes into all the details in her piece The Ghosts in the Machine. Some human, some AI, several companies have started producing music à la carte for Spotify.

According to The Dark Side of Spotify, Spotify investors are also investing in these companies (phantom artists on the platform, using random names with no online presence outside the platform) and promoting the use of AI to beat the algorithm. While this vertical integration might raise anti-trust or monopoly concerns, the fact is that Netflix has been successful in expanding into content production (as Disney has been in expanding into content distribution).

AIs are much more productive at generating music than humans. Which is not necessarily the same as being successful at producing music a) that humans enjoy or b) that is commercially viable. The Musical Turing Test is almost solved, addressing a). Commercial viability is even easier to address. Because the cost of producing AI music is so low compared to the human equivalent, AI companies can flood the market with millions of songs, letting the algorithm filter out the ones that do not work. In that scenario, human musicians are not just competing with each other for users’ attention; they are now unable to be showcased to users without an explicit search. Additionally, AI can better cater to some audiences based on data extracted from these networks (remember Spotify’s investors also investing in AI music production companies?) than humans can, at least at scale.

And I’m aware AI can be a tool for musicians, but if AI can perform end-to-end music generation passing the Musical Turing Test, it becomes much more interesting from a commercial standpoint.

The only chance for musicians is to promote their own content outside of these platforms, abandoning the initial goal of Web 2.0, where anyone can create content on the web. They can, but it just won’t be discoverable in the ocean of AI-generated content. But this is a symptom of a more significant problem for the web.

I feel like the people who try to be positive – well, I don’t know what they’re doing. I’m a music producer and also a writer who also happens to make art/design her own album art. Thankfully, I also dance, which is going to be the one thing that saves me I feel. — PrettyLittleHateMaschine on AI music.

The quality of AIs depends on humans

ChatGPT was primarily trained on internet-available content. So, its quality depends on what is available at a given time. If we stop collecting new information, we can assume its quality will remain unchanged. Still, it will not be helpful with new information, such as news updates or scientific discoveries. Its usefulness will be reduced.

On the other hand, if the quality of AIs increases — it’s more and more difficult to tell the difference between human and GPT-generated text — and it passes the Turing test, the content available online will be more and more AI-generated than human-generated, as it’s more economical to use AI to produce text, audio or even video.

Here, we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear.

AI models collapse when trained on recursively generated data

This recent Nature paper reports that LLMs perform worse when trained on LLM-generated content. Human content is now essential! LLM companies need high-quality human content to train their next-generation models, especially concerning novel knowledge. But the economics no longer work. Content is created once, consumed once, and used to generate millions of derivatives almost for free. An author might publish a book hoping to recoup the time it took to write from the sum of all individual sales. However, AI companies will not buy the book at its production cost to train a model. The same goes for daily news. The human audience is still needed to make this work. And if everything is made available for free on the web, humans are repeating the same mistake that put ChatGPT in business without contributing to the original content sources.

The current Web is not enough

Web 2.0 died, and now the web happens more and more inside silos. Famously, Instagram does not allow links outside its app. “Link in the bio” will be listed as the cause of death in Tim Berners-Lee’s obituary. It goes against what the web was supposed to be. But today’s personal entertainment happens in silos (Instagram, Netflix, Disney+, etc.), not on the open web. Even Reddit communities have started blocking links to some websites, like X.

The web failed at microtransactions. Paying 10 cents to read a well-written article was the original goal. Even with PayPal and Apple Pay, the model was only successful for large purchases, not pay-per-view. Imagine that you give YouTube your credit card, and it takes 1 euro for each hour watched. Once you have something for free, it is difficult for companies to make you pay for it.

As a business that moved almost completely from analog to digital, most news outlets have failed to change their economics and are now struggling financially. As the price of online advertising has decreased over the past years, they have switched to a subscription model, putting up paywalls with dubious outcomes.

The future of the Web

I foresee a web where high-quality human content is behind paywalls. While most of the web can be AI-generated and free, it will be ignored when high-quality content is available from trusted sources. Content will be signed and (possibly) encrypted using personal keys. These keys can be provided by governments or other parties. For instance, every Portuguese citizen already has their keys inside their citizen card, sometimes with professional attributes.
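A toy sketch of the signing half of this idea, using an Ed25519 key pair via Python’s cryptography package (the generated key stands in for a government-issued one):

    # Sign an article with a personal key; anyone holding the public key
    # (readers, platforms, courts) can verify authorship.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()           # stand-in for a citizen-card key
    article = "Signed, human-written article body.".encode()

    signature = key.sign(article)                # the journalist signs the piece
    key.public_key().verify(signature, article)  # raises InvalidSignature if tampered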

If you want to read the news, you can go to an online newspaper, where the content will be signed by a recognized journalist or editor. The body of the text can be encrypted, but with a fast Apple Pay-like prompt you can pay a few cents to read it. Even if the journalist published AI-generated content, they are liable for it.

This proposal makes the web a more trustworthy place and somewhat addresses the economic problems of paying for content on the web. It requires payment processors to drop the minimum cost per transaction, which I believe is happening more and more. And as more and more garbage is published online, users will see the need to pay for high-quality content.

As for AI providers, they will now have to pay for content. And even if it is ridiculously cheap, there is a trace that they bought that information, useful when you want to prove in court that your content was used in training LLMs.

We might not get to this Web, but I hope some of these ideas help the web survive the tsunami of garbage content that is starting to flood our dear World Wide Web.

The world is a hexagonal boardgame

Hexagons are quite popular in boardgames, given their high number of neighbours, which increases the number of possible actions available to players.

Only triangles, squares, and hexagons can tile a plane without gaps, and of those three shapes hexagons offer the best ratio of perimeter to area.

Simon Willison

Which makes the hexagon the shape used in Uber’s H3 geographical indexing mechanism, which can be visualized at https://wolf-h3-viewer.glitch.me/
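A quick sketch with the h3-py bindings (the v4 API; v3 named these functions geo_to_h3 and k_ring), indexing the coordinates from this page’s header:

    import h3

    cell = h3.latlng_to_cell(40.197958, -8.408312, 9)  # lat, lng, resolution
    disk = h3.grid_disk(cell, 1)                       # the cell plus its ring
    print(len(disk) - 1)  # 6: every non-pentagon H3 cell has six neighbours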

Programming for non-CS is different

[…] the top learning objective was for their students to understand that websites can be built from databases.

I’m pretty sure that the most popular programming language (in terms of number of people using it) on most campuses is R. All of Statistics is taught in R.

End-user programmers most often use systems where they do not write loops. Instead, they use vector-based operations — doing something to a whole dataset at once. […] Yet, we teach FOR and WHILE loops in every CS1, and rarely (ever?) teach vector operations first.

CS doesn’t have a monopoly on computing education: Programming is for everyone by Mark Guzdial

The main takeaway is that you do not teach Programming 101 to non-Software Engineering/Computer Science students the same way you teach those students. The learning outcomes are different, and so should be the content.

Funny how Functional Programming (via vectorized operations) is suggested to appear before imperative constructs like for or while. This even aligns with the GPU-powered parallelism needed when processing large datasets.
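A small contrast of the two styles in Python, with numpy standing in for R’s vector semantics:

    import numpy as np

    temps = np.array([12.1, 15.3, 9.8, 21.0])

    # Imperative CS1 style: explicit loop and accumulator.
    total = 0.0
    for t in temps:
        total += t
    mean_loop = total / len(temps)

    # Vectorised style: one operation over the whole dataset.
    mean_vec = temps.mean()
    assert abs(mean_loop - mean_vec) < 1e-9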

Food for thought.

A review of "We are destroying software"

Salvatore Sanfilippo (@antirez):

We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension.

Agree.

We are destroying software with complex build systems.

Disagree: they are no longer just build systems. They also take care of deployment, notarization, linting, VCS, etc.

We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.

Mostly disagree. Leftpad is a good example of this taken to the extreme. But 90% of the cases are worthwhile: fixing a bug in one dependency fixes many downstream projects. However, product maintenance is often ignored in industry, and that is the real issue.

We are destroying software telling new programmers: “Don’t reinvent the wheel!”. But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels.

Mostly disagree. Reinventing the wheel is a good academic exercise, but not in a product or service. Do it on your own time or in school.

We are destroying software by no longer caring about backward APIs compatibility.

Agree. We need to care more about the longevity of software and hardware. How long should a car last? Or a phone? I still use a very old iPod, but I can’t use my more recent BlackBerry.

We are destroying software pushing for rewrites of things that work.

Mostly disagree. I think in most cases we lack rewrites of things that do not work. The opposite is much less common.

We are destroying software by jumping on every new language, paradigm, and framework.

I agree, but only for startups/SV. It’s a common practice for CoolCompanies™ to write their software using a newish framework to hire people who are interested in learning (often better engineers). But that only occurs in a minority of the companies producing software.

We are destroying software by always underestimating how hard it is to work with existing complex libraries VS creating our stuff.

Mostly disagree. It’s easier to underestimate building things from scratch.

We are destroying software by always thinking that the de-facto standard for XYZ is better than what we can do, tailored specifically for our use case.

Disagree. We want open and modular software. I hate that the Gmail app is way better than Mail.app, or that WhatsApp does not talk to Telegram or Signal. I hate silos like Instagram that are not good internet citizens, with no open APIs or standards. Yes, standards are slow, but the end result is better for society.

We are destroying software claiming that code comments are useless.

Mostly disagree. We are destroying software by not writing the right comments. Most comments are written by people who write poor code and the wrong comments.

We are destroying software mistaking it for a purely engineering discipline.

I don’t even understand this point. Writing software products and services is engineering: it has millions of tradeoffs.

We are destroying software by making systems that no longer scale down: simple things should be simple to accomplish, in any system.

Disagree. We are destroying software by not spending the resources to build good software. It’s not about scaling.

We are destroying software trying to produce code as fast as possible, not as well designed as possible.

Mostly agree. Again, it’s about economics. Software is built within the constraints provided. Time and team quality are part of those constraints. Maybe we need better leadership and better economics.

We are destroying software, and what will be left will no longer give us the joy of hacking.

Disagree. Enterprise software does not provide joy, and that software is most of what exists. There is still good indie or open-source software that provides that joy; it’s simply found in fewer and fewer products/services.

For most software to be hackable, it would have to incorporate many of @antirez’s complaints. I understand it’s a manifesto, but it lacks consistency and focus.

Score: 5/14

TIL: Redirect Emails

Today I learned that mail clients have the option to redirect an email to someone else: the message appears to come from the original sender, and replies go back to that sender rather than to you.

Only the address of the original sender is shown to the recipient, and the recipient’s reply goes only to the original sender.

Apple Mail.app Documentation (via Chris Krycho)
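Under the hood, this is the standard resending mechanism from RFC 5322, sketched here with Python’s email package (addresses are illustrative):

    from email.message import EmailMessage

    msg = EmailMessage()          # the original message stays untouched:
    msg["From"] = "sender@example.com"
    msg["To"] = "you@example.com"
    msg["Subject"] = "Original subject"
    msg.set_content("Original body.")

    # Redirecting adds Resent-* fields instead of rewriting From/To,
    # which is why replies still go to the original sender.
    msg["Resent-From"] = "you@example.com"
    msg["Resent-To"] = "colleague@example.com"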