Gorillaz’s Damon Albarn and Kate Bush are among 1,000 artists who released a silent album (on Spotify, no less) in protest against the UK government allowing AIs to be trained on copyright-protected work without permission.
This protest highlights the tension between building valuable AI tools and devaluing the human content they are built on.
The value of AI
ChatGPT (and even Apple Intelligence) is trained on information publicly available on the internet, data from (consenting) third parties, and information provided by its makers’ employees or contractors. Over the last year and a half, people have been amazed at what ChatGPT can do. Although the quality of its output fluctuates as data and methods are updated, ChatGPT and similar tools are being used to create real value. But at what cost?
Unconsciously, The Algorithm has become more and more important in our lives. From Instagram and TikTok reels to X and Facebook timelines to Spotify, YouTube, and Netflix recommendations, the decision of what we see is no longer ours. Nor are we delegating our choices to a human editor (as was the case with the old boring telly or radio channels). Those decisions are made by black-box algorithms hidden in the shadows.
The EU AI Act, which I blogged about before, only requires explainability for applications in high-risk domains. Entertainment can hardly be thought of as high-risk. However, I would argue that, given how much of today’s content consumption happens online, it should be considered high-risk. One example is the perceived power of Twitter/X in political elections.
On the other hand, educational use is considered fair use in most countries (certainly here in Portugal). So what is the difference between fair use for human learning and for machine learning? As we become increasingly dependent on AI for our daily tasks (I use Siri and Reminders to augment my memory and recall), we become de facto cyborgs. For education, is there really a difference between human and machine learning?
The devaluation of human content
In 2017, Spotify introduced the Perfect Fit Content program, encouraging editors to include songs purposely designed to fit a given mood in their playlists. Liz Pelly goes into all the details in her piece The Ghosts in the Machine. Several companies, some using humans and some using AI, have started producing music à la carte for Spotify.
According to The Dark Side of Spotify, Spotify’s investors are also investing in these companies (whose phantom artists use random names and have no online presence outside the platform) and promoting the use of AI to beat the algorithm. While this vertical integration might raise antitrust or monopoly concerns, the fact is that Netflix has been successful in expanding into content production (as Disney has been in expanding into content distribution).
AIs are much more productive at generating music than humans. That is not necessarily the same as producing music a) that humans enjoy or b) that is commercially viable. The Musical Turing Test is almost solved, which addresses a). Commercial viability is even easier to address. Because the cost of producing AI music is so low compared to the human equivalent, AI companies can flood the market with millions of songs and let the algorithm filter out the ones that do not work. In that scenario, human musicians are not just competing with each other for users’ attention; they can no longer be showcased to users at all without an explicit search. Additionally, AI can cater to some audiences better than humans can, at least at scale, using data extracted from these networks (remember Spotify’s investors also investing in AI music production companies?).
And I’m aware AI can be a tool for musicians, but if AI can perform end-to-end music generation that passes the Musical Turing Test, it becomes much more interesting from a commercial standpoint.
The only chance for musicians is to promote their own content outside these platforms, abandoning the initial goal of Web 2.0, where anyone can create content on the web. They still can, but it just won’t be discoverable in the ocean of AI-generated content. And this is a symptom of a more significant problem for the web.
I feel like the people who try to be positive – well, I don’t know what they’re doing. I’m a music producer and also a writer who also happens to make art/design her own album art. Thankfully, I also dance, which is going to be the one thing that saves me I feel. — PrettyLittleHateMaschine on AI music.
The quality of AIs depends on humans
ChatGPT was primarily trained on internet-available content, so its quality depends on what is available at a given time. If we stop collecting new information, we can assume its quality will remain unchanged. Still, it will be of no help with new information, such as news updates or scientific discoveries, and its usefulness will shrink.
On the other hand, if the quality of AIs increases (it’s already more and more difficult to tell human from GPT-generated text) and they pass the Turing test, the content available online will be increasingly AI-generated rather than human-generated, simply because it’s more economical to use AI to produce text, audio, or even video.
Here, we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear.
— AI models collapse when trained on recursively generated data
This recent Nature paper reports that LLMs perform worse when trained on LLM-generated content. Human content is now essential! LLM companies need high-quality human content to train their next-generation models, especially when it comes to novel knowledge. But the economics no longer work. Content is created once, consumed once, and used to generate millions of derivatives almost for free. An author might publish a book hoping to recoup the time it took to write from the sum of all individual sales. However, AI companies will not buy the book at its production cost to train a model. The same goes for daily news. The human audience is still needed to make this work. And if everything keeps being made available for free on the web, humans are repeating the same mistake that let ChatGPT build a business without contributing anything back to the original content sources.
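To build some intuition for why the tails disappear, here is a toy simulation of my own (a minimal sketch, not the paper’s actual experimental setup): fit a Gaussian to some data, sample the next generation’s “training data” from that fit, and repeat, so each generation learns only from the previous generation’s output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for generation in range(1, 51):
    # "Train" a model on the current data: fit a Gaussian (mean, std).
    mu, sigma = data.mean(), data.std()
    # The next generation is trained only on this model's own output.
    data = rng.normal(loc=mu, scale=sigma, size=100)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: fitted sigma = {sigma:.3f}")

# The fitted sigma tends to drift downward: each refit slightly
# underestimates the spread, sampling noise compounds across
# generations, and the tails of the original distribution vanish first.
```

The tails are exactly where rare, novel information lives, which is why recursively trained models lose precisely the content that makes them useful.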
The current Web is not enough
Web 2.0 died, and the web now happens more and more inside silos. Famously, Instagram does not allow links outside its app. “Link in the bio” will be listed as the cause of death in Tim Berners-Lee’s obituary. It goes against what the web was supposed to be. But today’s personal entertainment happens in silos (Instagram, Netflix, Disney+, etc.), not on the open web. Even Reddit communities have started blocking links to some websites, like X.
The web failed at microtransactions. Paying 10 cents to read a well-written article was part of the original vision. Even with PayPal and Apple Pay, online payments only succeeded for large purchases, not pay-per-view. Imagine giving YouTube your credit card and letting it take 1 euro for each hour watched. Once people get something for free, it is difficult for companies to make them pay for it.
Despite having moved almost completely from analog to digital, most news outlets have failed to change their economics and are now struggling financially. As the price of online advertising has decreased over the past years, they have switched to a subscription model, putting up paywalls with dubious outcomes.
The future of the Web
I foresee a web where high-quality human content sits behind paywalls. While most of the web may be AI-generated and free, that content will be ignored whenever high-quality content is available from trusted sources. Content will be signed and (possibly) encrypted using personal keys. These keys can be provided by governments or other parties. For instance, every Portuguese citizen already carries such keys inside their citizen card, sometimes with professional attributes.
If you want to read the news, you go to an online newspaper, where the content is signed by a recognized journalist or editor. The body of the text can be encrypted, but with a fast Apple Pay-like prompt you can pay a few cents to read it. Even if the journalist published AI-generated content, they remain liable for it.
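As a rough illustration of the signing half of this idea, here is a minimal sketch using Ed25519 keys via Python’s cryptography package. The journalist key and article text are hypothetical, and in practice the key would come from a citizen card rather than being generated on the fly.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Hypothetical journalist key; in this proposal it would come from a
# government-issued citizen card carrying professional attributes.
journalist_key = Ed25519PrivateKey.generate()
public_key = journalist_key.public_key()

article = b"Exclusive: the web survives the garbage tsunami."

# The newspaper publishes the article together with its signature.
signature = journalist_key.sign(article)

# A reader (or their browser) verifies the article against the
# journalist's published public key before trusting or paying for it.
try:
    public_key.verify(signature, article)
    print("Signature valid: content is attributable to the journalist.")
except InvalidSignature:
    print("Signature invalid: do not trust this content.")
```

The paywall half would layer encryption on top of this: publish the body encrypted and release the decryption key only after the micropayment clears.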
This proposal makes the web a more trustworthy place and partly addresses the economics of paying for content on the web. It requires payment processors to drop their minimum cost per transaction, which I believe is gradually happening. And as more and more garbage is published online, users will see the need to pay for high-quality content.
As for AI providers, they will now have to pay for content. Even if it is ridiculously cheap, there will be a record that they bought that information, which is useful when you want to prove in court that your content was used to train an LLM.
We might not get to this Web, but I hope some of these ideas help the web survive the tsunami of garbage content that is starting to flood our dear World Wide Web.