Alcides Fonseca

40.197958, -8.408312

LLMs have the same right as humans when it comes to copyright and learning

As I stated before, the boundary of what is copyright infringement when it comes to machine training is quite blurred.

If you see LLMs as their own entities (I don’t, but I’m afraid Nobel laureate Geoffrey Hinton does), they have the same right to learn as humans. They just happen to have photographic (literary?) memory. Is it their fault?

On the other hand, you can look at LLMs as a form of compression: lossy, yes, but a compression algorithm nevertheless. In that case, if you zip a book and unzip it, even with a few faults, what comes out is still the book you copied.

Legislation will have to decide on this sooner or later.

William Haskell Alsup, of Oracle vs Google fame, ruled that buying and scanning books to train LLMs was legal. He also ruled that downloading pirated ebooks (even ones pirated by a third party) was not.

Regardless of my own position, I believe every government should create a task force to think about this, including experts from different fields. Last time something like this happened (peer-to-peer, Napster, The Pirate Bay), legislation took too long to catch up. Now, things are moving at an ever-faster pace. And I'm afraid our legal systems are not flexible and agile enough to adapt.