In January of 2023, I posted an episode of Sonitotum with Matthew Wayne Selznick in which I shared my thoughts as to whether or not creators — especially writers — should fear large language model (LLM) and generative artificial intelligence tools such as ChatGPT, Stable Diffusion, and the like.
A lot has happened in that arena in the last six months.
I still don’t think creators necessarily need to fear generative, large language model AI tools. At least not directly. At least not yet.
In recent weeks, as I've learned more about not just how these tools work but how, and with what, they were trained, I've come to recognize that the question is not whether we should fear AI.
The question is whether we should use LLM tools at all.
It’s About Ethics
It didn’t take a whole lot of thought to work out what I think is right and wrong when it comes to this stuff.
It’s taken a little longer to put it into words.
My position as a creator and creative services provider is spelled out in my Statement of Ethics and Principles. Here it is:
- Regarding the use of Large Language Model and Generative Artificial Intelligence technologies, including tools, services, and software that employ such technologies as a service:
- To the best of my knowledge, I do not publish or otherwise distribute, or attribute to myself, any work created in whole or in part by any AI tool or service trained using copyrighted materials accessed without the express permission of the copyright holders.
- In the creation of my own work or works for hire / in the service of others:
- I never knowingly use such a tool, service, or software application trained using copyrighted materials accessed without the express permission of the copyright holders.
- To the best of my knowledge, I restrict my use of LLM / generative AI tools and services to self-hosted, locally installed tools and services trained either on public domain assets or on assets in which I own the copyright or for which I possess an applicable license. For brevity’s sake and in this context, I refer to this as “Ethical AI.”
- I clearly and plainly disclose the use of Ethical AI tools and services when they are used to aid in the creation of my own publicly distributed and/or published works.
Slightly Shorter Version
If a tool or service depends on an AI tool or service that was trained using copyrighted assets (text, sound, video, images, etc.) obtained without the express authorization of the copyright holder… I won’t knowingly use it.
When I do use an AI tool or service, it will be a locally installed tool or service trained on public domain (not under copyright) assets, or on my own assets, or assets for which I have a license to use for that purpose.
When (and if — and it’s a big, big if) I use an AI tool or service (under the previously stated constraints) to aid in the creation of my own work, I will let you know about it.
How I’m Using, and Want to Use, AI
In practice, as I write this here at the beginning of June 2023, my use of AI tools and services is minimal:
- I use a locally installed version of Whisper, the speech-to-text transcription AI, to create transcripts and captioning from podcast episodes and videos created for myself and for clients. (A rough sketch of that workflow appears just after this list.)
- I use the chatbot built into the Edge web browser to do quick calculations, conversions, and weather forecasts.
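For anyone curious what that Whisper workflow looks like, here is a minimal sketch using the open-source openai-whisper Python package. This is my illustration, not a description of any particular person’s setup; the file names (“episode.mp3” and the output paths) and the “base” model choice are placeholders.

```python
# Minimal sketch: local transcription and captioning with the open-source
# openai-whisper package (pip install openai-whisper; requires ffmpeg).
# File names below are placeholders for illustration only.
import whisper

model = whisper.load_model("base")        # model runs entirely on the local machine
result = model.transcribe("episode.mp3")  # returns full text plus timestamped segments

# Plain-text transcript
with open("episode-transcript.txt", "w") as f:
    f.write(result["text"].strip() + "\n")

def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    ms = int((seconds - int(seconds)) * 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Rough SRT-style captions built from the timestamped segments
with open("episode-captions.srt", "w") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```

The key point for the ethics discussion is that everything in that sketch happens locally: the audio never leaves the machine, and no hosted service sits in the middle.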
But guess what? That Edge AI assistant is driven by OpenAI’s GPT-3 model, which I know was trained, in part, on the Books1 and Books2 datasets.
Books1 is very likely BookCorpus (a huge dataset of thousands of books pulled without authorization or compensation from the Smashwords catalog) reborn, and the pedigree of Books2 is similarly suspect. (Here’s a great link explaining how most LLMs use ill-gotten, unauthorized works under copyright.)
So I have to stop using the handy tool built into the Edge browser (likewise the Google Chrome version).
Which bums me out, because it’s damn convenient. But the high road wouldn’t be the high road if it didn’t strain your muscles a little as you ascend.
I’m actively researching locally hosted LLM AI tools that would be trained solely on public domain / Creative Commons licensed / open-source materials, as well as on my own writing.
My goal? Analyzing my own work as part of my quest to create my perfect fiction writing software.
I’m a long way from accomplishing anything on that front, though. The deeper I look, the more I discover that many LLMs are trained, in part, using databases like BookCorpus, Books1, Books2, and the like.
Legal? Maybe. Ethical? You Decide.
Many people, smarter and more educated than I, argue that the use of works under copyright to train AI models falls under the fair use doctrine, and that it can and will be defended as such as this mess makes its way through the courts.
They may be right.
After all, the end result of a ChatGPT query is not all that different from the cut-up works of, say, William S. Burroughs, who famously cut up printed sentences and phrases of his work and the works of others in order to rearrange the pieces to create a new work.
Burroughs published and materially benefited from those works, much the same way a writer might publish and materially benefit from a work created with the assistance of one of the tools and services powered by ChatGPT today.
Allow me to pose a question to which, given the pre-Internet era in which Burroughs did his work, I’m going to assume I already know the answer:
Did Burroughs pay for the works of others that he cut up?
Unless he only used books he stole or borrowed from the library or friends and never gave back… Yes.
Conversely, the rights holders of the works used to train most large language models have not, to my knowledge, been compensated for the use of those works.
And now, companies like OpenAI benefit. Their entire business model would not be possible… or, to be more conservative, their end product would not be as effective… without the unauthorized use of works from thousands of uncompensated authors.
Maybe the courts will, in five or ten or fifteen years, decide that’s okay. That it doesn’t run afoul of copyright law.
But is it right?
I’d love to hear what you decide. Please leave a comment!
Me? I’ve made my decision.
Feature image by Gerd Altmann from Pixabay