Latest Pages by Alcides Fonseca

Copyright, AI and the Future of the Web

Gorillaz’s Damon Albarn and Kate Bush are among 1,000 artists who released a silent album (on Spotify, no less) in protest against the UK government allowing AI models to be trained on copyright-protected work without permission.

This protest highlights the tension between creating valuable tools and devaluing human content.

The value of AI

ChatGPT (and even Apple Intelligence) is trained on information publicly available on the internet, data from (consenting) third parties, and information provided by their employees or contractors. Over the last year and a half, people have been amazed at what ChatGPT can do. Although the quality of its output fluctuates as its data and methods are updated, ChatGPT and similar tools are being used to create value. But at what cost?

Unconsciously, The Algorithm has become more and more important in our lives. From Instagram and TikTok reels and X and Facebook timelines to Spotify, YouTube, or Netflix’s recommendations, the decision of what we see is no longer ours. Nor are we delegating our choices to a human editor (as was the case with the old, boring telly or radio channels). Those decisions are made by black-box algorithms hidden in the shadows.

The EU AI law, which I blogged about before, only requires explainability for applications in high-risk domains. Entertainment can hardly be thought of as high-risk. However, I would argue that given the importance of online content consumption in today’s society, it should be considered high-risk. One example is the perceived power of Twitter/X in political elections.

On the other hand, educational purposes are considered fair use in most countries (which is certainly true here in Portugal). What is the difference between fair use for human and machine learning? As we become increasingly dependent on AI for our daily tasks (I use Siri and Reminders to augment my memory and recall), we become de facto cyborgs. Is there a difference between human and machine learning for education?

The devaluation of human content

In 2017, Spotify introduced the Perfect Fit Content program, encouraging editors to include in their playlists songs purposely designed to fit a given mood. Liz Pelly goes into all the details in her piece The Ghosts in the Machine. Several companies, some relying on humans and some on AI, have started producing music à la carte for Spotify.

According to The Dark Side of Spotify, Spotify investors are also investing in these companies (phantom artists on the platform, which use random names and have no online presence outside the platforms) and promoting the use of AI to beat the algorithm. While this vertical integration might raise antitrust or monopoly concerns, the fact is that Netflix has been successful in expanding into content production (just as Disney has been successful in expanding into content distribution).

AIs are much more productive at generating music than humans, which is not necessarily the same as producing music a) that humans enjoy or b) that is commercially viable. The Musical Turing Test is almost solved, addressing a). Commercial viability is even easier to address. Because the cost of producing AI music is so low compared to the human equivalent, AI companies can flood the market with millions of songs and let the algorithm filter out the ones that do not work. In that scenario, human musicians are not just competing with each other for users’ attention; they can no longer be showcased to users without an explicit search. Additionally, AI can cater to some audiences better than humans can, at least at scale, based on data extracted from these networks (remember Spotify’s investors also investing in AI music production companies?).

And I’m aware AI can be a tool for musicians, but if AI can perform end-to-end music generation passing the Musical Turing Test, it becomes much more interesting from a commercial standpoint.

The only chance for musicians is to promote their own content outside of these platforms, abandoning the initial goal of Web 2.0, where anyone can create content on the web. They can, but it just won’t be discoverable in the ocean of AI-generated content. But this is a symptom of a more significant problem for the web.

I feel like the people who try to be positive – well, I don’t know what they’re doing. I’m a music producer and also a writer who also happens to make art/design her own album art. Thankfully, I also dance, which is going to be the one thing that saves me I feel. — PrettyLittleHateMaschine on AI music.

The quality of AIs depends on human content

ChatGPT was primarily trained on internet-available content. So, its quality depends on what is available at a given time. If we stop collecting new information, we can assume its quality will remain unchanged. Still, it will not be helpful with new information, such as news updates or scientific discoveries. Its usefulness will be reduced.

On the other hand, if the quality of AIs increases (it is getting harder and harder to tell human and GPT-generated text apart) and they pass the Turing test, an ever larger share of online content will be AI-generated rather than human-generated, as it is more economical to use AI to produce text, audio or even video.

Here, we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear.

— AI models collapse when trained on recursively generated data

This recent Nature paper reports that LLMs perform worse when trained on LLM-generated content. Human content is now essential! LLM companies need high-quality human content to train their next-generation models, especially when it comes to novel knowledge. But the economics no longer work. Content is created once, consumed once, and used to generate millions of derivatives almost for free. An author might publish a book hoping that the sum of all individual sales pays for the time it took to write. However, AI companies will not buy the book at its production cost to train a model. The same goes for daily news. The human audience is still needed to make this work. And if everything is made available for free on the web, humans are making the same mistake that allowed ChatGPT to go into business without contributing to the original content sources.
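The collapse mechanism can be illustrated with a deliberately tiny toy, which is this sketch's own simplification and not the paper's experimental setup: each "generation" fits a Gaussian to a small sample drawn from the previous generation's fit, with no fresh human data entering the loop. Estimation errors compound and the fitted spread tends to shrink towards zero, so the tails of the original distribution are the first thing to disappear. The function name and its parameters are hypothetical.

```python
import random
import statistics


def collapse(generations: int = 200, sample_size: int = 10, seed: int = 0) -> list[float]:
    """Toy model collapse: repeatedly refit a Gaussian to samples of the previous fit.

    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the original "human" data
    history = [sigma]
    for _ in range(generations):
        # The next model is trained only on content produced by the current model.
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample)  # maximum-likelihood estimate of the spread
        history.append(sigma)
    return history


stds = collapse()
print(f"std at generation 0: {stds[0]:.3f}, after {len(stds) - 1} generations: {stds[-1]:.2e}")
```

With only ten samples per generation, the spread typically collapses by many orders of magnitude; larger samples slow the process down but do not remove the downward drift.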

The current Web is not enough

Web 2.0 died, and now the web happens more and more inside silos. Famously, Instagram does not allow links outside its app. “Link in the bio” will be listed as the cause of death in Tim Berners-Lee’s obituary. It goes against what the web was supposed to be. But today’s personal entertainment happens in silos (Instagram, Netflix, Disney+, etc.), not on the open web. Even Reddit communities have started blocking links to some websites, like X.

The web failed at microtransactions. Paying 10 cents to read a well-written article was the original goal. Even with PayPal and Apple Pay, the model was only successful for large purchases, not pay-per-view. Imagine giving YouTube your credit card and having it charge 1 euro for each hour watched. Once you have something for free, it is difficult for companies to make you pay for it.

Although their business moved almost completely from analog to digital, most news outlets have failed to change their economics, and they are now struggling financially. As the price of online advertising has decreased over the past years, they have switched to a subscription model, putting up paywalls with dubious outcomes.

The future of the Web

I foresee a web where high-quality human content is behind paywalls. While most of the web can be AI-generated and free, it will be ignored if high-quality content is available from trusted sources. Content will be signed and (possibly) encrypted using personal keys. These keys can be provided by the government or by other parties. For instance, every Portuguese citizen already carries their keys in their citizen card, sometimes with professional attributes.

If you want to read the news, you can go to an online newspaper, where the content will be signed by a recognized journalist or editor. The body of the text can be encrypted, but with a fast Apple Pay-like prompt, you can pay a few cents to read it. Even if the journalist publishes AI-generated content, they are liable for it.
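The signing step can be sketched with textbook RSA from the standard library. This is a toy under loudly stated assumptions: the key pair is hypothetical, the primes are far too small, and there is no padding, so it must not be used for anything real (a vetted library with a scheme such as Ed25519 or RSA-PSS would be the realistic choice). It does not model the keys on the Portuguese citizen card; it only shows that anyone holding the public key can check that an article was signed by the key owner and has not been altered.

```python
import hashlib

# Hypothetical toy key pair: two well-known primes, far too small to be secure.
P, Q = 1_000_000_007, 998_244_353
N = P * Q                            # public modulus
E = 65_537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, kept by the journalist (Python 3.8+)


def digest(text: str) -> int:
    """Hash the article body and reduce it into the modulus range."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big") % N


def sign(text: str) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(text), D, N)


def verify(text: str, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(text)


article = "Example article body."
signature = sign(article)
print(verify(article, signature))                # True
print(verify(article + " (edited)", signature))  # False: the text no longer matches
```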

This proposal makes the web a more trustworthy place and partly addresses the economic problem of paying for content on the web. It requires payment processors to lower the minimum cost per transaction, which I believe is increasingly happening. And as more and more garbage is published online, users will see the need to pay for high-quality content.
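Why the minimum cost per transaction matters can be shown with simple arithmetic. The fee structure below (a fixed part plus a percentage) is a hypothetical illustration, not any processor's actual pricing:

```python
from decimal import Decimal

# Hypothetical card-style pricing: a fixed fee plus a percentage (illustrative numbers only).
FIXED_FEE = Decimal("0.25")
PERCENT_FEE = Decimal("0.029")


def fee(amount: Decimal) -> Decimal:
    """Processor fee charged on a single transaction of `amount` euros."""
    return FIXED_FEE + PERCENT_FEE * amount


def fee_share(amount: Decimal) -> Decimal:
    """Fraction of the payment that goes to the processor."""
    return fee(amount) / amount


article = Decimal("0.10")
print(f"{float(fee_share(article)):.1%}")        # 252.9%: one 10-cent article
print(f"{float(fee_share(article * 100)):.1%}")  # 5.4%: 100 articles billed together
```

With a fixed fee in the mix, a single 10-cent payment costs more to process than it is worth; lowering `FIXED_FEE`, or aggregating many small charges into one, is what makes paying per article plausible.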

As for AI providers, they will now have to pay for content. Even if it is ridiculously cheap, there is a record that they bought that information, which is useful when you want to prove in court that your content was used to train LLMs.

We might not get to this Web, but I hope some of these ideas help the web survive the tsunami of garbage content that is starting to flood our dear World Wide Web.

Published: Mar 13, 2025

Author: Alcides Fonseca

Language: English

The world is a hexagonal board game

Hexagons are quite popular in board games because each cell has many neighbours, which increases the number of possible moves for players.
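The neighbour argument can be made concrete with axial coordinates, a common convention for hex grids that is an assumption here (boards may use other coordinate systems). Every hex cell has six neighbours that share an edge and are exactly one move away, whereas a square cell has four edge-sharing neighbours, and its diagonal neighbours only touch at a corner:

```python
# Axial coordinates (q, r): the six edge-sharing neighbours of a hex cell.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Square grid: only four cells share an edge (diagonals touch at a corner).
SQUARE_DIRECTIONS = [(1, 0), (0, -1), (-1, 0), (0, 1)]


def neighbours(cell: tuple[int, int], directions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in directions]


def hex_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of single-cell moves between two hexes in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


origin = (0, 0)
print(len(neighbours(origin, HEX_DIRECTIONS)), len(neighbours(origin, SQUARE_DIRECTIONS)))  # 6 4
print({hex_distance(origin, n) for n in neighbours(origin, HEX_DIRECTIONS)})  # {1}
```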

Only triangles, squares, and hexagons can tile a plane without gaps, and of those three shapes hexagons offer the best ratio of perimeter to area.

— Simon Willison
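The perimeter-to-area claim can be checked by fixing the area of each regular polygon at 1 and computing its perimeter, using the standard area formula for a regular n-gon with side s, n·s² / (4·tan(π/n)):

```python
import math


def perimeter_for_unit_area(sides: int) -> float:
    """Perimeter of a regular polygon with `sides` sides and area 1."""
    # Solve n * s**2 / (4 * tan(pi / n)) == 1 for the side length s.
    side = math.sqrt(4 * math.tan(math.pi / sides) / sides)
    return sides * side


for name, n in [("triangle", 3), ("square", 4), ("hexagon", 6)]:
    print(f"{name:8} {perimeter_for_unit_area(n):.3f}")
# triangle 4.559
# square   4.000
# hexagon  3.722
```

Of the three shapes that tile the plane, the hexagon encloses the same area with the least boundary.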

These properties also make the hexagon the shape used in Uber’s H3 geospatial indexing system, which can be visualized at https://wolf-h3-viewer.glitch.me/

Published: Mar 12, 2025

Author: Alcides Fonseca

Language: English

Programming for non-CS is different

[…] the top learning objective was for their students to understand that websites can be built from databases.

I’m pretty sure that the most popular programming language (in terms of number of people using it) on most campuses is R. All of Statistics is taught in R.

End-user programmers most often use systems where they do not write loops. Instead, they use vector-based operations — doing something to a whole dataset at once. […] Yet, we teach FOR and WHILE loops in every CS1, and rarely (ever?) teach vector operations first.

— CS doesn’t have a monopoly on computing education: Programming is for everyone by Mark Guzdial

The main takeaway is that you should not teach Programming 101 to non-Software Engineering/Computer Science students the same way you teach it to SE/CS students. The learning outcomes are different, and so should the content be.

Funny how Functional Programming (via vectorized operations) is suggested to come before imperative constructs like for or while. This even aligns with the GPU-powered parallelism needed to process large datasets.
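The contrast between the CS1 loop and the "whole dataset at once" style can be sketched in Python. The data is made up, and Python's standard library has no true vectorization: `map` only stands in for what R does natively (there, `celsius * 9 / 5 + 32` applies to the whole vector), but it shows the change in thinking from managing an accumulator to describing one transformation:

```python
# Hypothetical temperatures in Celsius, to be converted to Fahrenheit.
celsius = [12.0, 18.5, 21.0, 9.5]

# CS1 style: an explicit loop that builds up an accumulator step by step.
fahrenheit_loop = []
for c in celsius:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# End-user style: describe one transformation and apply it to the whole collection.
fahrenheit_vec = list(map(lambda c: c * 9 / 5 + 32, celsius))

print(fahrenheit_loop == fahrenheit_vec)  # True
```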

Food for thought.

Published: Mar 06, 2025

Author: Alcides Fonseca

Language: English

A review of "We are destroying software"

Salvatore Sanfilippo (@antirez):

We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension.

Agree.

We are destroying software with complex build systems.

Disagree: they are no longer just build systems. They also take care of deployment, notarization, linting, version control, etc.

We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.

Mostly disagree. Left-pad is a good example of this taken to the extreme. But 90% of the cases are worthwhile. Fixing a bug in one dependency fixes many downstream projects. However, product maintenance is often ignored in industry, and this is the real issue.

We are destroying software telling new programmers: “Don’t reinvent the wheel!”. But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels.

Mostly disagree. Reinventing the wheel is a good academic exercise, but not in a product or service. Do it on your own time or in school.

We are destroying software by no longer caring about backward APIs compatibility.

Agree. We need to care more about the longevity of software and hardware. How long should a car last? Or a phone? I still use a very old iPod, but can’t use my more recent BlackBerry.

We are destroying software pushing for rewrites of things that work.

Mostly disagree. I think that, in most cases, we lack rewrites of things that do not work. The opposite is much less common.

We are destroying software by jumping on every new language, paradigm, and framework.

I agree, but only for startups/SV. It’s a common practice for CoolCompanies™ to write their software using a newish framework to hire people who are interested in learning (often better engineers). But that only occurs in a minority of the companies producing software.

We are destroying software by always underestimating how hard it is to work with existing complex libraries VS creating our stuff.

Mostly disagree. It’s easier to underestimate building things from scratch.

We are destroying software by always thinking that the de-facto standard for XYZ is better than what we can do, tailored specifically for our use case.

Disagree. We want open and modular software. I hate that the Gmail app is way better than Mail.app. Or that WhatsApp does not talk to Telegram or Signal. I hate silos like Instagram that are not good internet citizens because they lack open APIs and standards. Yes, standards are slow, but the end result is better for society.

We are destroying software claiming that code comments are useless.

Mostly disagree. We are destroying software by not writing the right comments. Most comments are written by people who write poor code and the wrong comments.

We are destroying software mistaking it for a purely engineering discipline.

I don’t even understand this point. Writing software products and services is engineering: it has millions of tradeoffs.

We are destroying software by making systems that no longer scale down: simple things should be simple to accomplish, in any system.

Disagree. We are destroying software by not spending the resources to build good software. It’s not about scaling.

We are destroying software trying to produce code as fast as possible, not as well designed as possible.

Mostly agree. Again, it’s about economics. Software is built within the constraints provided. Time and team quality are part of those constraints. Maybe we need better leadership and better economics.

We are destroying software, and what will be left will no longer give us the joy of hacking.

Disagree. Enterprise software does not provide joy, and that software is most of what exists. There is still good indie or open-source software that provides that joy; it is simply found in fewer and fewer products and services.

For most software to be hackable, it would have to incorporate many of @antirez’s complaints. I understand it’s a manifesto, but it lacks consistency and focus.

Score: 5/14

Published: Mar 06, 2025

Author: Alcides Fonseca

Language: English

TIL: Redirect Emails

Today I learned that you can redirect an email to someone else, keeping you in the loop while replies go to the original sender.

Only the address of the original sender is shown to the recipient, and the recipient’s reply goes only to the original sender.

— Apple Mail.app Documentation (via Chris Krycho)
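How Mail.app implements this is not documented here, but the standard mechanism for it is the set of `Resent-*` trace fields from RFC 5322: the original `From` (and any `Reply-To`) stay untouched, so the new recipient's reply goes to the original sender. A minimal sketch with Python's `email` package, assuming a hypothetical helper name and example addresses:

```python
from email import message_from_string
from email.message import Message
from email.utils import formatdate


def redirect(raw_message: str, resent_from: str, resent_to: str) -> Message:
    """Redirect a message using RFC 5322 Resent-* fields.

    From/Reply-To are left as they were, so a reply from the new recipient
    goes back to the original sender rather than to the redirector.
    """
    # RFC 5322 says resent fields should be added at the top of the header block.
    resent_block = (
        f"Resent-From: {resent_from}\n"
        f"Resent-To: {resent_to}\n"
        f"Resent-Date: {formatdate(localtime=True)}\n"
    )
    return message_from_string(resent_block + raw_message)


original = "From: alice@example.com\nTo: me@example.com\nSubject: Question\n\nHi!\n"
msg = redirect(original, "me@example.com", "bob@example.com")
print(msg["From"], msg["Resent-To"])  # alice@example.com bob@example.com
```

Sending is left out on purpose: when a message carries exactly one set of `Resent-*` headers, `smtplib.SMTP.send_message` uses them instead of the regular headers to pick the envelope addresses.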

Published: Feb 06, 2025

Author: Alcides Fonseca

Language: English

The Aging Programmer

Kate Gregory explains the challenges of growing old. I recommend this talk even more to younger folks who pull all-nighters, drink Red Bull (or worse), and don’t care about their posture.

As someone whose job is spent mostly in front of a screen, I’ve been highly concerned with my health. I take care of my posture, but even with standing desks, exercise and external monitors, I’ve been having backaches. In my twenties, I had issues with my wrists (I have since adopted vertical mice and trackballs). Now I’m experiencing eyesight degradation. Even so, I learned a few new things:

Having more muscle leads to a better immune system, and more independence when you’re older.
After your fifties, driving at night becomes a problem due to slow adaptation to high-contrast scenarios. One problematic example is the bright screens that cars now come equipped with. Maybe we need to invest in analog cars for the elderly (not a joke, I also want one of those).

Published: Feb 03, 2025

Author: Alcides Fonseca

Language: English

Anthropic Dogfooding

Dogfooding generally consists of using your own product to make sure you feel your users’ pain points.

Anthropic is probably doing it unwittingly, as the inbox for their job postings is so full of AI-written applications that they had to introduce the following policy:

While we encourage people to use AI systems during their role to help them work faster and more effectively, please do not use AI assistants during the application process. We want to understand your personal interest in Anthropic without mediation through an AI system, and we also want to evaluate your non-AI-assisted communication skills. Please indicate ‘Yes’ if you have read and agree.

— Anthropic Job Application Form (via Simon Willison)

Published: Feb 03, 2025

Author: Alcides Fonseca

Language: English

TikTok Ban, Shoes and Public Health

The TikTok ban

Congress and the previous president banned TikTok in the US, effective January 19th, unless its owner, the Chinese company ByteDance, sells the app to a non-Chinese company. The current president gave it a few more weeks, but it is still an ongoing issue.

The reason for the ban is homeland security: Chinese entities could use TikTok usage data and content to access private data on US citizens, as well as use personal content for blackmail and to influence public opinion. They do have a point about security, but should we also ban Facebook, Instagram, Google and others for their potential control over Europeans? We need to fight for our technological independence.

Public Health

However, I would agree with the ban if the reason were Public Health. I see TikTok and other short-form media (Reels and Shorts) taking over media consumption, even in my age group. As a result, people (children even more so) have less patience and are unable to concentrate for long without some form of stimulus.

I don’t mean to sound like an old man, but this is becoming a Public Health issue. And if we don’t want our governments to shut down “addictive apps”, we must first self-regulate our use of them.

Shoes

A few decades ago, many people used to walk barefoot. If you were a little better off, you might wear shoes to church, but not on a daily basis. In 1928, Portugal forbade walking barefoot in its two main cities. The cost of shoes, even for poor people, was nothing compared to the cost of losing a foot or a leg to an untreated infection.

Think of the case, which is quite common, of a humble head of a family who injured his foot with a piece of glass, a stone, a tack, or in some other way, and who, as a result, contracted an infection that resulted in his death. Evaluate how that family will live, whose only support has disappeared thanks to lack of foresight, which is the result of a bad habit. (Government ad to promote the use of shoes, translated from an old style of Portuguese)

More recently, the COVID curfews implemented in many countries were a similar measure: taking away people’s freedom allowed the health of the population to improve. And these were not democratic decisions; they were mostly based on technical analysis.

Now, there are technical reports that TikTok is affecting Public Health. For the first time, a new generation is having more difficulty learning than the previous one. Mental health is at its worst, and we need to act on it.

Conclusion

I don’t have TikTok installed. I don’t watch reels or shorts. I try to watch and read long-form content. I teach my students that, as a programmer, you need a larger-than-average attention span. You should train it. And we should encourage the next generation to appreciate long-form content.

Published: Jan 21, 2025

Author: Alcides Fonseca

Language: English

FCT’s Draft Regulation on Scientific Employment in Non-Academic Contexts

The FCT’s Draft Regulation on Scientific Employment in Non-Academic Contexts (Projeto de Regulamento do Emprego Científico em Contexto Não Académico) is open for public consultation. I did my part and sent my feedback on the proposal:

1. Applications are submitted by the prospective PhD-holding researchers. This makes no sense: the funding supports the (private or public) entities, so they should be the applicants. Having the entity submit the application would show a greater commitment than merely writing a letter of support (as in the current proposal). But really, the entity should apply for funding for the position, regardless of who will fill it (without even submitting a CV). These results take months to come out, and in the meantime the top candidates end up finding other options.

2. Candidates who already have a permanent contract are excluded. So someone who obtained a permanent technician position to fund their PhD cannot apply for this support after finishing the PhD. This is a completely unnecessary restriction.

Published: Jan 16, 2025

Author: Alcides Fonseca

Language: Portuguese

Choose your editor font, tournament style

Coding Font is a web-based tournament game that allows you to select your favorite programming font.

Mine is Inconsolata, which I have used for years in macOS’s Terminal.app, but I’m not sure I have ever used it in my editor. My VSCode is set to use “Menlo, Monaco, ‘Courier New’, monospace”.

Published: Jan 14, 2025

Author: Alcides Fonseca

Language: English

Justin Trudeau on Elections

Trudeau, who is now answering questions from reporters, said his one regret of his premiership has been his failure to introduce electoral reform.
In particular, he said he wanted Canadians to be able to choose their second and third choices in elections.
He said this would have helped reduce polarisation in society and ensure “common ground” between political parties.

— BBC: Justin Trudeau resigns as Canadian prime minister

As the extreme right and left gain more power in Europe, we should consider voting systems that let us vote for more than one party at a time, ranking our choices.
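Trudeau’s quote does not name a specific system, but instant-runoff voting is one common way to use second and third choices. The sketch below assumes hypothetical ballots and breaks elimination ties alphabetically, an arbitrary choice for illustration. It shows how a candidate who is nobody’s unanimous favourite but broadly acceptable can win:

```python
from collections import Counter


def instant_runoff(ballots: list[list[str]]) -> str:
    """Instant-runoff voting: each ballot ranks candidates, best first."""
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Eliminate the weakest candidate (ties broken alphabetically, an arbitrary rule).
        remaining.remove(min(remaining, key=lambda c: (tally[c], c)))


ballots = 4 * [["A", "B"]] + 3 * [["B", "C"]] + 2 * [["C", "B"]]
print(Counter(b[0] for b in ballots).most_common(1)[0][0])  # A: wins on first choices alone
print(instant_runoff(ballots))  # B: C is eliminated and C's voters move to B
```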

Published: Jan 06, 2025

Author: Alcides Fonseca

Language: English

2024 in Music

Throughout December, Spotify promotes their Wrapped feature, where users can share their music stats. Despite being an early Spotify adopter (back when I lived in Sweden, and knew all Swedish ads by heart, without understanding a word), I’ve been abandoning cloud-based solutions over the last decade.

Audio and Scrobbling Setup

For music, my setup relies on a Synology NAS at home running Plex for all my movies, TV shows and music. I buy songs directly from artists when possible; otherwise, I buy DRM-free versions on the iTunes Store, which Apple is trying to hide more and more in its (awful) desktop Music.app.

For tracking, I’ve been a Last.fm user since 2006, with a few gaps in my scrobbling due to using Apple Music or Spotify. But when configured, Plex scrobbles all my plays (whether on mobile, desktop or CarPlay), and this is the first year I can properly share all my stats, Wrapped-style.

One caveat in this report is that Plex shuffle is not really random. I don’t mean perceived randomness, but rather that Plex, by default, is more likely to include highly rated songs when shuffling artists or playlists, creating a bubble effect. For 2025, I’ve disabled this feature to try to identify the differences. Personally, I would like this setting enabled when playing my whole library, but disabled for playlists and artists, as my library has many albums just for the sake of completeness.
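Plex’s actual algorithm is not described here, so the sketch below is only a hypothetical illustration of the bubble effect: a uniform shuffle next to a rating-weighted one. The weighted version uses the Efraimidis-Spirakis trick (each item gets the key u^(1/w) for a uniform u, then items are sorted by key), and the mapping from star rating to weight is made up:

```python
import random


def uniform_shuffle(tracks: list[str], rng: random.Random) -> list[str]:
    """Every ordering is equally likely."""
    order = list(tracks)
    rng.shuffle(order)
    return order


def rating_weighted_shuffle(tracks: list[str], ratings: dict[str, int],
                            rng: random.Random) -> list[str]:
    """Weighted random order: higher-rated tracks tend to come first.

    Efraimidis-Spirakis: key = u ** (1 / weight) with u uniform in [0, 1),
    then sort by key in descending order.
    """
    def key(track: str) -> float:
        weight = 1 + ratings.get(track, 0)  # hypothetical: unrated tracks get weight 1
        return rng.random() ** (1 / weight)

    return sorted(tracks, key=key, reverse=True)


rng = random.Random(42)
tracks = ["five_star", "unrated_a", "unrated_b"]
ratings = {"five_star": 5}  # hypothetical 0-5 star ratings
firsts = sum(
    rating_weighted_shuffle(tracks, ratings, rng)[0] == "five_star" for _ in range(1000)
)
print(firsts)  # roughly 750 (weight 6 out of 8); a uniform shuffle gives roughly 333
```

With weights like these, the first pick goes to a track with probability proportional to its weight, so a handful of favourites keeps resurfacing at the top of every shuffle.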

2024 Wrapped

I guess my 2024 report was not surprising. Early in the year, I found out about D’artagnan’s two English songs: C’est la vie (a translation of their German original) and We’re Gonna be Drinking, a pub-style Celtic rock shanty.

I’ve been a long-time fan of the Metal Opera super-group Avantasia (and got to see them live in Madrid!), and their latest album, A Paranormal Evening with the Moonflower Society, follows the style of the previous one (they frequently do album trilogies, like my favorite fantasy authors).

And despite being released halfway through November, Linkin Park’s first album with Emily Armstrong, From Zero, has been my daily driver ever since. Heavy is the Crown is a complete banger, and The Emptiness Machine, Over Each Other and Two Faced round out the playlist. A lot of folks think they should have started a new band, but I actually think this is closer to their pre-Meteora style than the last albums featuring Chester. Several bands go through multiple vocalists (most notably Nightwish) when the project is led by another musician (Shinoda in this case). Bands are just like companies: they have a life beyond their members, and Ships of Theseus can happen!

Symphonic and Power Metal artists are a big chunk of my scrobbles: Avantasia, Amaranthe, Kamelot, Arion, Ad Infinitum, Ayreon, Dynasty, Epic, Aina, Exit Eden, DragonForce and so on. After being the top band in 2023, Amaranthe stayed in the top-played rotation, joined by DragonForce’s Doomsday Party and Kamelot’s Under Grey Skies ballad.

Ad Infinitum has been my favorite newly discovered band of the last couple of years. All three opera-style albums are powerful, and as catchy as they are pleasant to listen to while working. Amaranthe kept up the good pace with The Catalyst, of which Damnation Flame is a fantastic demo (pun intended), keeping their fast-paced, symphonic style after Jake’s departure.

This was a fun post to write, and let’s see what 2025 brings us (really excited for Avantasia’s Here Be Dragons).

Published: Jan 02, 2025

Author: Alcides Fonseca

Language: English

On the new proposals for the Scientific Career Statute

The Government, PS and BE have proposed three very similar versions of a new statute for the scientific research career.

In broad terms, all three proposals aim to align the researcher career with that of higher-education faculty:

Hiring through an international open competition (preferably with a foreign panel)
A probationary (tenure) period of 5 years for Assistant Researchers and 3 years for Principal and Coordinating Researchers (for faculty, it is only one year at these two levels)
Habilitation (Agregação) required for Coordinating Researchers (with the caveat that candidates coming from abroad can obtain it during the probationary period)
A new Invited Researcher position
Evaluation in 3-year periods (or between 3 and 5 years, according to BE)
Progression through salary steps identical to faculty (6 years at the maximum evaluation)

And other new features:

A Doctoral Researcher position, possibly putting an end to illegal research grants!
A teaching load of up to 4 hours (optional according to BE; decided by the institutions in the other two versions)
Some mobility between the teaching and research careers, keeping the original salary

Analysis

Alignment with the teaching career

Overall, the alignment with the teaching career seems positive to me, since the two careers differ mainly in the weight of the teaching component. Honestly, creating a separate career seems like unnecessary effort when a single career would suffice, with a teaching component anywhere between 0 and 100% and an evaluation proportional to that share. Simpler laws last longer.

Unfortunately, the serious problems of the teaching career are carried over to the research career:

The probationary period is too long. In the private sector and in other areas of the public service, permanent contracts are granted within the first 2-3 years, or even when the contract is signed. Why not give universities and research centres the freedom to offer immediate tenure to excellent candidates with a suitable CV?
Importance is still given to the agregação (habilitation). Although it is easier to hire researchers from foreign institutions where this title does not exist, it is still required of Portuguese candidates working in industry. We should drop the habilitation requirement for every position: the panel already evaluates the full scientific CV. Nothing justifies this requirement other than alignment with the teaching career (where I do not find it justified either).
Assistant Researchers do not manage projects. The distinction between assistant and principal researchers rests on the principle that principal researchers manage projects. The reality is that assistant researchers do manage projects (from exploratory ones to 3-year FCT projects), creating an impossible situation. Under that principle, being the principal investigator of a funded project should be enough to be promoted automatically to Principal Researcher.
Non-PhD researchers must be PhD students. There is no framework for researchers who neither have nor want a PhD. I have already worked with several young people who wanted to be researchers for a few years without doing a PhD. They are satisfied with their master’s training and are productive (with several papers published as first author). Why require everyone to have a PhD?
Evaluation every 3 (or 5) years is not enough. As in the teaching career, with a 5-year probationary period (or a 3-year limit for invited researchers), an evaluation only after 3 years comes too late to change course. We should promote evaluations at the end of each semester, or at least yearly, for invited or probationary faculty and researchers. That way, they actually get useful feedback to improve.

Published: Dec 23, 2024

Author: Alcides Fonseca

Language: Portuguese

It turns out there already was a Portuguese LLM!

The Prime Minister recently announced at Web Summit a direct award for the development of a Portuguese LLM. He mentioned IST and Nova, leaving out the rest of the Responsible AI consortium, which has been working on several relevant aspects (fairness, environmental sustainability and reliability).

Curiously, a research group in my department has already done work in this area, having released two models (Albertina and Gervásio) in European Portuguese. They have now given a very educational interview to Dinheiro Vivo:

For example, a bank only wants a virtual assistant for its customers that talks about deposits, withdrawals, etc. It will not want its chatbot to do machine translation, summarization, give Friedrich Nietzsche’s biography and tell jokes.

An LLM is not a chatbot. An LLM is, in an analogy people understand, a kind of engine, and from an engine we can make different car models. The LLM is what different applications can be built on top of, one of which is the chatbot, another, for example, machine translation, or medical diagnosis, etc.

So, our proposal in that opinion piece, which came out in Público in February 2023, is that what we need is open AI: LLMs developed as open source, under open licences and with open distribution, so that other actors and organizations, whether in research, public administration or the innovation sector, can build their own value propositions and take advantage of these LLMs without depending on big tech to provide these services. Therefore, the more varied the offer, the lower the risk of dependence on a small oligopoly that provides us with these services.

— António Branco @ Dinheiro Vivo

Published: Dec 02, 2024

Author: Alcides Fonseca

Language: Portuguese

Gwern.net on the scalability of AI

I’ve been following Gwern.net for a long time, and I was really curious when I came across this interview, which kept him anonymous.

How does a hearing-impaired, introverted American grow up to be a polymath and a scholar?

The Wikipedia editor, now full-time writer without a salary, got a lot farther by writing a lot and harder, by being a lot smarter, by going down rabbitrabbitholesholes, by fourteen, they placed him in charge of Wikipedia edits 🎶

Published: Nov 17, 2024

Author: Alcides Fonseca

Language: English

Wanted: an elegant solution for breadth-first iteration of a tree structure.

While working on the enumerative feature of GeneticEngine, I wanted to recursively explore all the instances of a grammar, ideally using a cache.

My first solution ended up being a depth-first search (DFS), as I used Python generators to yield all possible options for sum types and to recursively iterate through all arguments of product types.

I’ve written this proof of concept to implement breadth-first iteration over a generic tree structure, which yields the nodes in the right order. However, I find the extra argument a bit ugly, and I would like a more elegant solution. If you happen to know one, I’m all ears!
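One possible direction (not the proof of concept mentioned above, and not GeneticEngine’s API) is to drop recursion and keep an explicit FIFO queue of partial derivations: the queue alone guarantees breadth-first order, so no extra argument has to be threaded through the calls. A minimal sketch, assuming a hypothetical toy grammar encoded as a dict, where each node is a partial derivation and its children are the ways of rewriting its leftmost non-terminal:

```python
from collections import deque
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def bfs(root: T, children: Callable[[T], Iterable[T]]) -> Iterator[T]:
    """Lazily yield the nodes of a (possibly infinite) tree in breadth-first order."""
    queue = deque([root])
    while queue:
        node = queue.popleft()  # FIFO: shallower nodes always come out first
        yield node
        queue.extend(children(node))


# Hypothetical toy grammar (not GeneticEngine's representation):
# E is a sum type with two alternatives, "x" and the product "(E + E)".
GRAMMAR = {"E": [("x",), ("(", "E", "+", "E", ")")]}


def expand(form: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Children of a partial derivation: every way to rewrite its leftmost non-terminal."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return [form[:i] + alt + form[i + 1:] for alt in GRAMMAR[symbol]]
    return []  # no non-terminals left: a complete sentence is a leaf


def sentences() -> Iterator[str]:
    """All sentences of GRAMMAR, shallowest derivations first."""
    for form in bfs(("E",), expand):
        if not any(symbol in GRAMMAR for symbol in form):
            yield "".join(form)


print(list(islice(sentences(), 4)))
# ['x', '(x+x)', '(x+(x+x))', '((x+x)+x)']
```

The trade-off is memory: the queue holds the whole frontier, which can grow very quickly with depth. Caching, which the post also asks for, is not addressed by this sketch.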

Published: Nov 07, 2024

Author: Alcides Fonseca

Language: English

Hidden Bug: Python class as definition

While preparing geneticengine to participate in the SRBench 2024 run, I was getting None out of a constructor:



def PredictorWrapper(BaseEstimator):
    def __init__(self, ind: tuple[str, str]):
        self.ind = ind

    def predict(self, X):
        _, data = self.prepare_inputs(X)
        return forward_dataset(self.ind[0], data)

    def to_sympy(self):
        return self.ind[1]


def mk_estimator(x):
    print(f"x={x}")
    p = PredictorWrapper(x)
    print(f"p={p}")
    return p

Outputting:

x=('np.log(np.power(dataset[:,1], 10.0))', 'log((x1 ** 10.0))')
p=None

Have you found the bug? It took me around an hour, mainly because I trusted myself too much (and there were many other things going on in the code). If you still haven’t found the bug, check the first 3 characters of the code snippet. A function whose body only defines other functions (and has no return statement) returns None when called.
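A minimal sketch of the fix next to the buggy version. To keep it dependency-free, it uses a hypothetical stand-in for scikit-learn’s BaseEstimator and omits predict, which relies on GeneticEngine internals not shown in the post:

```python
import inspect


class BaseEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator, so the sketch has no dependencies."""


def BuggyWrapper(BaseEstimator):
    # `def` instead of `class`: BaseEstimator is merely a parameter name here,
    # and __init__ is a local function that is defined and then discarded.
    def __init__(self, ind):
        self.ind = ind


class PredictorWrapper(BaseEstimator):
    def __init__(self, ind: tuple[str, str]):
        self.ind = ind

    def to_sympy(self) -> str:
        return self.ind[1]


ind = ("np.log(np.power(dataset[:,1], 10.0))", "log((x1 ** 10.0))")
print(BuggyWrapper(ind))                 # None: the function body never returns a value
print(PredictorWrapper(ind).to_sympy())  # log((x1 ** 10.0))
print(inspect.isclass(BuggyWrapper), inspect.isclass(PredictorWrapper))  # False True
```

A check like inspect.isclass is a cheap runtime guard if a wrapper is looked up dynamically, but the real fix is simply the keyword.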

Published: Oct 13, 2024

Author: Alcides Fonseca

Language: English

LG to add Ads to Screensavers

And it’s a bold move on LG’s part, considering most folks just want to see their family photos or some calming art while they’re not actively binge-watching. Even if you can turn it off, the default setting is a bit of a slap in the face for anyone who thought they’d bought a premium product free from such annoyances.

This shift towards monetizing every idle moment on your TV is a slippery slope. It’s not just about selling hardware anymore; it’s about squeezing every last cent from customers, and brokering the data to get more revenue. And while LG claims this will boost brand awareness, one has to wonder if viewers will just tune out entirely (or worse, switch to a platform that respects their downtime). As the lines blur between content and advertising, it feels like we’re all just one step closer to a world where even our screen savers are working overtime.

— LG adds Ads to Screensavers by Rui Carmo

Someone should send a copy of Black Mirror’s Fifteen Million Merits down to LG leadership. As long as profit is all these corporations care about, and as long as the users don’t stand up to this, it’s all going downhill from here.

Published: Sep 26, 2024

Author: Alcides Fonseca

Language: English

TIL: Flags for macOS's open command

open -e opens the item in TextEdit. I basically never want this, and it’s fascinating that it’s built in.

open -t opens in your default text editor — for me it’s BBEdit; but whatever you have configured will do. Note: this is not $EDITOR but LaunchServices: a macOS-ism.

open -F opens a “fresh” version of the app, not doing window or document restoration. Handy if it’s borked!

open -R reveals it in the Finder instead of opening it.

open -f reads input from stdin and opens the results in your text editor (weird but… cool, I think).

—Read the Manual: open — Chris Krycho

Published: Sep 22, 2024

Author: Alcides Fonseca

Language: English

The dict: protocol

Custom URL schemes are the de facto way to launch other apps in the iOS and Android ecosystems. Today, while reading about the dict: URL scheme, I assumed it was a custom URL scheme for the default dictionary app on iOS.

But I was wrong:

The (informal) standard was published in 1997 but has kept a relatively low profile since then. You can understand why it was invented – in an age of low-size disk drives and expensive software, looking up data over a dedicated protocol seems like a nifty idea.

http:, ftp:, and … dict:? — Terence Eden

It makes sense now: it was created in an era when protocols dominated the internet (IRC, SMTP, POP3, HTTP, FTP). Nowadays, it’s all about startups with their own internal protocols that move too fast for any standardization to happen. Even big names like Google and Facebook care less about open protocols than they used to.

It’s clear that we need more open-source efforts and funding.
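To illustrate how small such a protocol is, here is a minimal Python sketch that maps a simple dict:// URL onto the plain-text DEFINE command a DICT client sends over TCP (port 2628). It is not a complete client, and several things are assumptions: only the d: (define) URL form from RFC 2229 is handled, "!" (the first database with a match) as the default database is my assumption, the helper names are hypothetical, and lookup assumes the server accepts the QUIT command pipelined right after DEFINE.

```python
import socket
from urllib.parse import unquote, urlsplit

DICT_PORT = 2628  # IANA-registered port for the DICT protocol (RFC 2229)


def dict_url_to_command(url: str) -> tuple[str, int, str]:
    """Map dict://host[:port]/d:word[:database] to (host, port, DEFINE command)."""
    parts = urlsplit(url)
    if parts.scheme != "dict" or not parts.hostname:
        raise ValueError(f"expected dict://host/..., got {url!r}")
    kind, _, rest = unquote(parts.path.lstrip("/")).partition(":")
    if kind != "d":
        raise ValueError("this sketch only handles d: (DEFINE) URLs")
    word, _, tail = rest.partition(":")
    database = tail.split(":")[0] or "!"  # assumed default: first database with a match
    return parts.hostname, parts.port or DICT_PORT, f"DEFINE {database} {word}"


def lookup(url: str, timeout: float = 10.0) -> str:
    """Send DEFINE and QUIT as plain text lines and read until the server closes."""
    host, port, command = dict_url_to_command(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"{command}\r\nQUIT\r\n".encode("utf-8"))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, dict_url_to_command("dict://dict.org/d:protocol") gives ("dict.org", 2628, "DEFINE ! protocol"), and lookup on the same URL should return the server’s status lines and definitions, provided a DICT server is reachable at that host.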

Published: Sep 18, 2024

Author: Alcides Fonseca

Language: English