Thursday, February 26, 2026

Who (or what) will write the next best seller?

Michael - Alternate Thursdays 

AI is a very strange industry. Basically, as far as I can see, it is an answer in search of a question. I’m not saying it’s not useful, or at least helpful, but is it multi-trillion dollar useful? That is the question.


One area where the AI companies have been focusing is on writing. After all, they are working with Large Language Models (LLMs), computer systems designed to absorb languages and their structures and uses in order to "comprehend" human communication and be able to communicate with us.

So while humans seem to write more and more in shorthand with emojis and acronyms, it’s very important for AI companies that their systems actually write (or speak) clear, contemporary languages. And they are not very particular about how they achieve that.

Probably most authors (and many readers) have heard of the big settlement that Anthropic agreed to with authors of books that they used for training their systems. The lawsuit was settled for $1.5 billion, but that turns out to be only part of the story. The judge has now unsealed some of the documents from the case, and it turns out that Anthropic’s aim was to scan all the books in the world to train their systems. The effort was called Project Panama. And it was secret. The company, along with the other AI giants, didn’t see it as practical to get the authors’ permission, so they acquired troves of books, separated the pages, and scanned them to feed their voracious LLMs, hoping they’d learn to read and, perhaps, write as well as the authors of the books. At some point that became too slow and cumbersome, and one of the company’s founders downloaded and shared a huge number of pirated books from an online library called LibGen. The newly released court papers make it clear that he was well aware that this was copyright infringement.

It was this direct copyright infringement that led to the settlement. The deeper issue is about the legality and the morality of AI companies sucking all these books into their LLMs. I guess the moral arguments depend on which side of the fence you sit on (as they usually do) and are unlikely to make much impact if they don’t lead to actual enforceable laws. The legal arguments are tricky, and so far no court has found against the companies. Their argument is that what they are doing is “fair use”.

Here’s what Google's AI thinks “fair use” is:

Four Factors of Fair Use: Courts determine fair use by analyzing the purpose (e.g., nonprofit educational vs. commercial), the nature of the copyrighted work, the amount used relative to the whole, and the effect on the work's market value.

Transformative Purpose: Using material in a way that adds new expression or meaning, rather than just replacing the original, strongly favors fair use.

Common Examples: Quoting a book in a review, using clips in news reporting, parodying a song, or using small portions of video for commentary.

Not a Rule, but a Defense: There are no strict legal limits (e.g., "under 30 seconds" is not a rule), and only a court can ultimately determine if a use is fair.

The AI companies claim that their work is “transformative” and “educational”. Anthropic also said that it hadn’t used the actual materials “read” for financial gain. (Hmm. Why do it then?) The settlement was based on the fact that the books were downloaded from an illegal site (in terms of copyright laws) and that the company was well aware of what it was doing. The use those pirated books were put to was not the issue. Other cases carry on in various courts against various companies, but my bet is that they won’t end with the judges ruling against the companies.


So where does all this go? Can LLMs write books? Of course they can. You can buy as many as you want on Kindle or another electronic or print-on-demand site. I certainly can’t tell if a random book from Amazon is written by AI or an unknown, mediocre, human author. If people keep buying them, they must find them readable. We like to think AI has nothing original to say, and that may be true, but the same criticism applies to some human authors. Some publishers are requiring that authors of manuscripts sign that they haven’t used AI. The question is: to do what? To write the book? To do research? To check geography? Publishers have famously released books with large parts lifted from other works. I can’t see them being successful as AI gatekeepers. And if an LLM comes up with a really good book (Borel’s monkeys typing the works of Shakespeare comes to mind), is it wrong to publish that? If so, why?

More questions than answers, I guess. In the meantime, AI systems have their own social media sites now. So they can try out their new writing skills on each other. Cut out the middleman, so to speak. No humans allowed as members. See: A Social Network for AI Bots Only. Maybe we humans should start writing novels for them. They certainly have the disposable income.
