Alcides Fonseca

40.197958, -8.408312

Google Workspace and Email Authentication

My employer added a new spam filtering system (Anubis), which blocks my own personal email. Since I have an old free account in Google Workspace, I had to figure out what was going on. Apparently, I was not authenticating my emails with SPF, DKIM and DMARC.

I followed this very simple and direct guide on how to set it up. Then I tested my new configuration using DMARCly.
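For reference, this is roughly the shape of the DNS TXT records involved. It is a minimal sketch, assuming a hypothetical domain (example.com) and a hypothetical report address; the helper names parse_spf and parse_dmarc are mine, not from the guide. DKIM is left out because its record is mostly a long public key.

```python
# Hypothetical TXT records for example.com; replace with your own domain's values.
RECORDS = {
    # SPF: which servers may send mail for the domain (Google's are included here).
    "example.com": "v=spf1 include:_spf.google.com ~all",
    # DMARC: what receivers should do when SPF/DKIM checks fail, and where to report.
    "_dmarc.example.com": "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com",
}


def parse_spf(record):
    """Return the list of SPF mechanisms, after checking the version prefix."""
    version, *mechanisms = record.split()
    if version != "v=spf1":
        raise ValueError("not an SPF record")
    return mechanisms


def parse_dmarc(record):
    """Split a 'tag=value; tag=value' DMARC record into a dict."""
    tags = {}
    for part in record.split(";"):
        if part.strip():
            key, _, value = part.partition("=")
            tags[key.strip()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


print(parse_spf(RECORDS["example.com"]))
print(parse_dmarc(RECORDS["_dmarc.example.com"]))
```

The policy p=none only asks receivers to report failures, which is a cautious value to start with before moving to quarantine or reject.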

Now the problem is that I have my Gmail (for Google Workspace) set up to send emails via my employer’s SMTP account. Yes, I like to have all my email accounts merged in one inbox. I tried to split them last year, but failed miserably as I have plenty of work stuff in my personal email, and I like a single search box for all my emails.

I’ll update this post when I am successful at signing SMTP outgoing emails as well.

Universities circumvent tuition caps with fees and more fees

At the Polytechnic of Coimbra, the enrollment fee went up from 30 to 125 euros. Master’s and PhD students pay up to 500 euros as a thesis submission fee.

For bachelor’s degrees, these fees let public universities collect more money than the tuition defined by law. At the PhD level, they serve to keep tuition at the value that FCT covers in its scholarships (2750 euros).

The truth is that students see an advertised price, and then it is impossible for them to finish the degree paying only that amount. It is literally false advertising.

We need two changes. First, Universities and Polytechnics must eliminate these fees and fold that cost into tuition. A student who pays tuition should be able to attend classes, be assessed and obtain the diploma, without any additional fee.

Second, the state needs to increase the funding of universities, which clearly have to resort to these ethically questionable actions to maintain the economic sustainability demanded of them by the Court of Auditors (Tribunal de Contas).

Reddit will block the Internet Archive

Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

Jay Peters for The Verge (via Simon Willison)

Next it will be Google. And most of the post-2025 forum knowledge on the web will be lost. Imagine a web browser that cannot access StackOverflow or Reddit. How useful would it be? LLMs will need new data to continue being relevant, and a new data monetization strategy will change the internet forever.

Using AI to get an answer

AI is a Floor Raiser, not a Ceiling Raiser

Every day I am more convinced that Software Engineering should be taught without AI. AI can give you answers to easy problems. But you won’t be able to create mental models of how things work, which are what will help you solve hard problems.

This next semester I am teaching a course on practices and tools in Software Engineering (my take will be inspired by MIT’s The Missing Semester). AI usage will be one of the topics, where we will explore MCP, IDE integrations and AI-assisted documentation.

But I have no idea how to write assignments for other topics. It is very likely that an AI will be able to complete the assignment without any human intervention. If students opt to do that (they will, that’s the faith I have in our grade-oriented system), they will not achieve the learning outcomes.

Students will ask: if AI can do these tasks, why should we learn them? Well, math teachers in school still assigned me problems that machines could already solve back then. Creating mental models of how things work is essential in education.

Now the real question is about the incentives. We should assess whether students can use their mental models, and not whether they can solve the task. This is especially hard with 100 students, where exams or take-home assignments are the norm.

PyPy and Flask

Very basic hello world app hosted under gunicorn (just returning the string “hello world”, so hopefully this is measuring the framework time). Siege set to do 10k requests, 25 concurrency, running that twice so that they each have a chance to “warm up”, the second round (warmed up) results give me:

  • pypy : 8127.44 trans/sec
  • cpython: 4512.64 trans/sec

So it seems like there’s definitely things that pypy’s JIT can do to speed up the Flask underpinnings.

Twirrim on HN
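To put those two numbers in perspective, here is a quick back-of-the-envelope calculation. It only uses the throughput figures quoted above; the variable names are mine.

```python
# Throughput figures from the quoted benchmark (transactions per second).
PYPY_TPS = 8127.44
CPYTHON_TPS = 4512.64
REQUESTS = 10_000  # siege was set to do 10k requests

# Relative throughput of PyPy over CPython.
speedup = PYPY_TPS / CPYTHON_TPS
print(f"PyPy throughput is {speedup:.2f}x CPython's")  # about 1.80x

# Approximate wall-clock time to serve the whole run at each rate.
for name, tps in (("pypy", PYPY_TPS), ("cpython", CPYTHON_TPS)):
    print(f"{name}: {REQUESTS / tps:.2f} s for {REQUESTS} requests")
```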

I had no idea PyPy could improve webapps this much. I really ought to give it another try for speeding up GeneticEngine. Guilherme suggested it originally, but there were some C-binding issues at the time. Will report back.

20 years of Django

Simon Willison’s 10 years of Django presentation

I remember using Django back in 2010. There were a few problems with it:

  • Deployment: you could not just upload files like with PHP, you had to rely on some cgi-bin magic (fastcgi was introduced later, if I am not mistaken), making me buy my own VPS for the first time, which I still use to run this website, now deployed using Phusion Passenger.
  • Versioning: There was no stable version. For each project I started, I ended up following the official recommendation: use the trunk version! For the young developers out there, trunk is SVN’s main branch. So I could not pick a stable release, keep all my websites on it, and upgrade all of them every six months or so.

And that’s it. Everything else was awesome: nice templating system, URL scheme, MVC (MTV in their style) pattern, image uploads, etc. The one thing missing for large-scale applications was migrations (the main thing Ruby on Rails had and Django didn’t). But that did not stop me. Migrations were done manually in the shell. For a one-man job, it worked perfectly.

Besides this website, I built a project management tool for jeKnowledge (a junior company I founded back in 2011), an App Store for a touch screen wall, a Python blog aggregator and a few other pet projects. Later, I would end up working on a startup for tennis match-making as well as on a web application for automating genomics testing, all in Django.

Weavers against the Machine

IconFactory is a boutique design studio that focuses on app icons. Their business is being replaced by AI.

This is the challenge of generative AI: it replaces creative work (or automates it). Design is one of those fields where the number of employed designers will decrease, and you will need to be either a very good designer or a very productive one. If I were an undergrad in design, I would try to learn the difficult stuff, and not the run-of-the-mill stuff that will be easily automated.

Japan's IC Cards

Places like Hong Kong and Tokyo have a lot of commuters, leading to a lot of congestion around station gates. Sony realised this, and invested heavily into the performance of their technology – FeliCa cards boast an advertised communication speed of up to 424kbps, making a noticeable improvement in gate processing speeds compared to Western counterparts. Compare the speed of passing through a ticket gate on the Underground to a Tokyo ticket gate – you could practically sprint through. This is partly achieved by the fact that transactions only involve the card and the reader itself – the reader doesn’t talk to an external server to perform a transaction. This makes IC cards stored-value cards – as in, they store the value on themselves, rather than their value being stored on the backend where it’s controlled fully by the operator.

Japan’s IC cards are weird and wonderful by @aecsocket

I visited Japan in May and I was a bit confused by how IC cards (pre-paid NFC cards) interacted with my iPhone. It was really weird that I could have a digital version of the card or a physical card, but not both. In practice, when I converted the physical card to Apple Pay, the physical copy would no longer work. After reading this awesome article about the technology, I now understand why: unlike Western NFC cards, the money is stored as credits in the card itself. Therefore, you are limited to having only one of them as your money storage device. I wonder whether IC cards could be used for money laundering, given how multipurpose they are: you can pay for your supermarket shopping or a meal with them!
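Here is how I picture it, as a toy model. This is only a sketch of the stored-value idea and not how FeliCa or Apple Pay actually implement it; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredValueCard:
    """Toy stored-value card: the balance lives on the card, not on a server."""
    balance: int          # in yen
    active: bool = True   # becomes False once the value is moved elsewhere

    def debit(self, fare):
        """What a gate reader does: a local check and update, no backend call."""
        if not self.active:
            raise ValueError("card is no longer valid")
        if fare > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= fare


def transfer_to_phone(physical):
    """Move the value to a new device and void the old card.

    If both copies stayed active, the same money could be spent twice,
    because no server exists to reconcile the two balances.
    """
    phone = StoredValueCard(balance=physical.balance)
    physical.balance = 0
    physical.active = False
    return phone


card = StoredValueCard(balance=1000)
card.debit(200)                  # pass through a gate
phone = transfer_to_phone(card)  # convert the card to the phone
print(phone.balance, card.active)
```

With a server-side balance, two tokens could point at the same account. With the value on the card, there can only be one holder at a time.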

Apple Pay is very convenient. Most days I don’t even carry a wallet. I pay for everything with my card, and I even have my citizen ID and driver’s license in my government app. However, I do not carry a power bank or a Lightning cable. That means I’m usually screwed if I run out of battery. In Japan, that meant getting stuck in transit (especially when traveling from city to city). The fact that NFC cards can work passively is a major advantage of the tech. Maybe we need phone NFC to work even without battery. Or have some kind of Qi charging and a low-battery mode for phones to provide critical features outside of the main OS.

Trust in Scientific Code

In 2010 Carmen Reinhart and Kenneth Rogoff published Growth in a Time of Debt. It’s arguably one of the most influential economics papers of the decade, convincing the IMF to push austerity measures in the European debt crisis. It was a very, very big deal.
In 2013 they shared their code with another team, who quickly found a bug. Once corrected, the results disappeared.
Greece took on austerity because of a software bug. That’s pretty fucked up.

How do we trust our science code? by Hillel Wayne

As more and more scientific publications depend on code, being able to trust that code matters more and more. Hillel asks for solutions; I propose to tackle the problem on two fronts.

1 – More engineering resources

Writing production-quality software requires more resources (usually engineers, but also some tooling). Most scientific software is written once and read never. Some PhD or MSc student writes a prototype and shows the plots to their advisors, who write (some or most of) the paper. It’s rare for senior researchers to inspect other people’s code. In fact, I doubt any of them (except those who teach software engineering principles) has had any training in code inspection.

We need research labs to hire (and maintain) scientific software engineering teams. For that to happen, funding has to be more stable. We cannot rely on project funding that may or may not be awarded. We need stable funding for institutions so they can maintain these teams and resources.

2 – More reproducibility

Artifact Evaluation Committees are a good addition to computer science conferences. Mostly composed of students (who have the energy to debug!), they run the artifacts and verify whether the results of the run justify the results presented in the paper. Having done that myself in the past, I can say it is very tricky to find bugs in that process. Mostly we verify whether the artifact runs outside of the authors’ machine, but not whether it is correctly implemented.

What would help is to fund the reproduction of science. Set aside 50% of the agency funding for reproducibility. Labs that get these projects should need less money than the original project to reproduce the results (as most of the challenging decisions have already been made). With this approach, we will have less new research, but it will be more robust.

Given how most CS papers are garbage (including mine), I welcome this change. We need more in-depth, strong papers that move the needle, and fewer bullshit papers that are just published for the brownie points.

Overall we need better scientific policies with the right incentives for trustworthy science. I wonder who will take this challenge on…

Hidden interface controls are affecting usability

It’s the year 2070. You are a 20-year-old recruit who is going to travel back in time, 12 Monkeys-style, to try and save the world. You get to 2025, you find proof on an iPhone and you need to take a screenshot and send it to a safe email address. Do you have a chance of discovering how to take a screenshot?

The other day I was locked out of my car. I had my keys, but the key fob button wouldn’t work and neither would the little button on the door handle that normally unlocks the car. At this point, every action I had to take in order to get into the car required knowledge of a hidden control. Why didn’t I just use my key to get in? First, you need to know there is a hidden key inside the fob. Second, because there doesn’t appear to be a keyhole on the car door, you also have to know that you need to disassemble a portion of the car door handle to expose the keyhole.

Philip Kortum has a nice article on how this quest towards “clean” interfaces actually hurts usability.

How to select your side project

Recommended audience: CS students

Austin Henley shares some properties of a good side project. Personally, I think a clear shippable objective is what most people lack, and that is what prevents their projects from ever being completed.

I remember getting side-project suggestions during my courses. Maybe that’s something I have to incorporate in mine.

Most of what I learned during my degree came from doing side projects. From competing in hackathons, creating a junior company, organizing conferences, doing a couple of research internships, and doing some freelancing work, these projects all taught me something that was not in the syllabus. That’s what separates you from the average student, and what will get you a good job in a world where unemployed software engineers are aplenty.

Joshua Barretto shares a really interesting list of possible side projects:

  • Regex engine (5 days)
  • x86 kernel (2 months)
  • Gameboy emulator (3 weeks)
  • Gameboy advance game (2 weeks)
  • Chess engine (5 days)
  • Physics engine (1 week)
  • Voxel engine (2 weeks)
  • GUI Toolkit (3 weeks)
  • Posix shell (5 days)
  • Dynamic interpreter (2 weeks)
  • Compiler (3 months)
  • Threaded Virtual machine (1 week)
  • Text editor (4 weeks)

The last four will give you a head start in the programming language world. I might even have an internship for you.

Perhaps you’re a user of LLMs. But I might suggest resisting the temptation to use them for projects like this. Knowledge is not supposed to be fed to you on a plate. If you want that sort of learning, read a book – the joy in building toy projects like this comes from an exploration of the unknown, without polluting one’s mind with an existing solution.

Selling SaaS to universities

Recommended audience: Startups and large companies who intend to sell software to universities.

Most SaaS is sold on a per-seat basis. But this does not scale to universities, as we have a large number of possible seats, but most of them (students, possibly from different scientific areas) do not use the software, at least not enough for it to be worthwhile.

On the other hand, unpredictable costs (when paying per activity) do not work either, as we need to budget for them yearly.

Chris Siebenmann has a really good write-up on this issue, which I recommend if you manage software at universities or sell to them.

Gremllm

Take Python magic methods and add LLM code generation. That’s Gremllm, which no one should use.

However, I would certainly use this (if the output had a different mark, like color in my interpreter) for debugging code.

from gremllm import Gremllm

counter = Gremllm("counter")
counter.value = 5
counter.increment()
print(counter.value) # 6?
print(counter.to_roman_numerals()) # VI?
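To see why magic methods make this possible, here is a minimal sketch of the dispatch trick with no LLM involved. It is not Gremllm’s actual implementation: the Improviser class is hypothetical, and canned_generator is a hard-coded stand-in for the part where a model would write the method body.

```python
def to_roman(n):
    """Roman numerals for small positive integers (enough for this demo, n < 40)."""
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


def canned_generator(obj, method_name, *args, **kwargs):
    """Stand-in for the LLM: decides what an unknown method should do."""
    if method_name == "increment":
        obj.value += 1
        return obj.value
    if method_name == "to_roman_numerals":
        return to_roman(obj.value)
    raise AttributeError(method_name)


class Improviser:
    """Hypothetical object that invents missing methods on demand."""

    def __init__(self, name, generate):
        self.name = name
        self._generate = generate

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if attr.startswith("_"):
            raise AttributeError(attr)  # avoid recursion on private names

        def method(*args, **kwargs):
            return self._generate(self, attr, *args, **kwargs)

        return method


counter = Improviser("counter", canned_generator)
counter.value = 5
counter.increment()
print(counter.value)                # 6
print(counter.to_roman_numerals())  # VI
```

The question marks in the original snippet are the point: with a real model behind the generator, nothing guarantees the same answer twice.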

LLMs have the same right as humans when it comes to copyright and learning

As I stated before, the boundary of what is copyright infringement when it comes to machine training is quite blurred.

If you see LLMs as their own entities (I don’t, but I’m afraid Nobel laureate Geoffrey Hinton does), they have the same right to learn as humans. They just happen to have photographic (literary?) memory. Is it their fault?

On the other hand, you can look at LLMs as a form of compression. Lossy, yes, but a compression algorithm nevertheless. In that case, if you zip a book and unzip it, even with a few faults, it’s the same book you copied.

Legislation will have to decide on this sooner or later.

Judge William Haskell Alsup, of Oracle vs Google fame, ruled that buying and scanning books to train LLMs was legal. He also decided that downloading ebooks pirated by a third party was not fine.

Regardless of my own position, I believe every government should create a task force to think about this, including experts from different fields. Last time something like this happened (peer-to-peer, Napster, The Pirate Bay), legislation took too long to arise. Now, things are moving at an even faster pace. And I’m afraid our legal systems are not flexible and agile enough to adapt.

New Alignments and Fonts

I’ve recently come across novel approaches to typography, which given how much we are moving to digital, I find to be rarer than expected.

Alternative Layout System presents different ways of justifying text or, in one of its examples, ways of annotating what is coming in the next line to help reading.

Kermit is a new font designed by Underwear and commissioned by Microsoft that aims to help kids to read, including those with dyslexia. But for me, it’s much more than that. It lets you encode tone in the typography, conveying how a text should be read, which makes kids’ books much more fun.

And don’t miss Jason Santa Maria’s other recommendations.

No AI in Servo

Contributions must not include content generated by large language models or other probabilistic tools, including but not limited to Copilot or ChatGPT. This policy covers code, documentation, pull requests, issues, comments, and any other contributions to the Servo project.

A web browser engine is built to run in hostile execution environments, so all code must take into account potential security issues. Contributors play a large role in considering these issues when creating contributions, something that we cannot trust an AI tool to do.

Contributing to Servo (via Simon Willison)

Critical projects should be more explicit about their policies. If I had a critical piece of software, I would make the same choice, for safety. The advantages of LLMs are not large enough to be worth the risk.