AI is a very strange industry. Basically, as far as I can see, it is an answer in search of a question. I’m not saying it’s not useful, or at least helpful, but is it multi-trillion dollar useful? That is the question.
One area where the AI companies have been focusing is on writing. After all, they are working with Large Language Models (LLMs), computer systems designed to absorb languages and their structures and uses in order to "comprehend" human communication and be able to communicate with us.
So while humans seem to write more and more in shorthand, with emojis and acronyms, it's very important for AI companies that their systems actually write (or speak) clear, contemporary language. And they are not very particular about how they achieve that.
Probably most authors (and many readers) have heard of the big settlement that Anthropic agreed to with authors of books that they used for training their systems. The lawsuit was settled for $1.5 billion, but that turns out to be only part of the story. The judge has now unsealed some of the documents from the case, and it turns out that Anthropic's aim was to scan all the books in the world to train their systems. The effort was called Project Panama. And it was secret. The company, along with the other AI giants, didn't see it as practical to get the authors' permission, so they acquired troves of books, separated the pages, and scanned them to feed their voracious LLMs, hoping they'd learn to read and, perhaps, write as well as the authors of the books.
At some point that became too slow and cumbersome, and one of the company's founders downloaded and shared a huge number of pirated books from an online library called LibGen. The newly released court papers make it clear that he was well aware that this was copyright infringement.
It was this direct copyright infringement that led to the settlement. The deeper issue is the legality and morality of AI companies sucking all these books into their LLMs. I guess the moral arguments depend on which side of the fence you sit on (as they usually do) and are unlikely to make much impact unless they lead to actual enforceable laws. The legal arguments are tricky, and so far no court has found against the companies. Their argument is that what they are doing is "fair use".
Here's what Google's AI thinks "fair use" is:

Four Factors of Fair Use: Courts determine fair use by analyzing the purpose (e.g., nonprofit educational vs. commercial), the nature of the copyrighted work, the amount used relative to the whole, and the effect on the work's market value.

Transformative Purpose: Using material in a way that adds new expression or meaning, rather than just replacing the original, strongly favors fair use.

Common Examples: Quoting a book in a review, using clips in news reporting, parodying a song, or using small portions of video for commentary.

Not a Rule, but a Defense: There are no strict legal limits (e.g., "under 30 seconds" is not a rule), and only a court can ultimately determine if a use is fair.
The AI companies claim that their work is "transformative" and "educational". Anthropic also said that it hadn't used the actual materials it "read" for financial gain. (Hmm. Why do it then?) The settlement was based on the fact that the books were downloaded from a site that is illegal in terms of copyright law, and that the company was well aware of that fact. The use the pirated books were put to was not the issue. Other cases carry on in various courts against various companies, but my bet is that they won't end with judges ruling against the companies.
So where does all this go? Can LLMs write books? Of course they can. You can buy as many as you want on Kindle or another electronic or print-on-demand site. I certainly can't tell if a random book from Amazon is written by AI or by an unknown, mediocre, human author. If people keep buying them, they must find them readable. We like to think AI has nothing original to say, and that may be true, but the same criticism applies to some human authors. Some publishers are requiring that authors of manuscripts sign a declaration that they haven't used AI. The question is: to do what? To write the book? To do research? To check geography? Publishers have famously released books with large parts lifted from other works. I can't see them being successful as AI gatekeepers. And if an LLM comes up with a really good book (Borel's monkeys typing the works of Shakespeare come to mind), is it wrong to publish that? If so, why?
More questions than answers, I guess. In the meantime, AI systems have their own social media sites now, so they can try out their new writing skills on each other. Cut out the middle man, so to speak. No humans allowed as members. See: A Social Network for AI Bots Only. Maybe we humans should start writing novels for them. They certainly have the disposable income.