by Bill Franks
On the negative side of the recent hype (hysteria?) around generative AI, much attention has focused on the legal implications of the technology. Here, I’ll dive into one specific issue that has been the subject of intense debate. Namely, is it fair for generative models to be trained on data the model creator does not have rights to? Further, is an AI process doing anything different from what we do ourselves when it uses such content?
How Humans Learn and Use Information
As we go through life, we learn by taking in and processing information about what we experience. This includes everything we read, hear, see, touch, or smell. We then compile these observations into opinions, likes & dislikes, and just general “stuff we know.” We all come across a lot of material that we do not own. Most people do their best to compartmentalize what they’ve learned from such sources, but over time the information tends to get absorbed into the general bucket of “things they know.”
As part of our education, we are explicitly encouraged to study examples of others’ writing, music, and art to understand various approaches and styles. Painters, for example, study a broad range of other painters, both current and historical. They then dive more deeply into the techniques and styles of their favorite painters. As painters begin creating their own works, they borrow their favorite techniques and sometimes even attempt to paint something similar in style to another painter. Is this plagiarism and a violation of copyright? Or is it simply consolidating everything one knows to give one’s own best effort?
The same goes for music. Musicians study a wide range of styles and other artists. Over time, they focus on a certain style and build on what they liked best from others. While this evolution has yielded amazing new songs and even new genres, is a musician stealing from all the other musicians they’ve listened to over time, or simply creating their own best music? Is it possible to parse out which part of any given song was influenced by some other song or musician buried deep in a person’s brain? Not really.
There Are Lines Around Fair Use of Others’ Material
The prior comments could lead you to believe that I am suggesting that anything goes. That’s not the case. When rap music first hit the scene, “sampling” was a big part of it. There were lawsuits in cases where a rap group took an exact snippet of someone else’s song and used it as a centerpiece of their own. This was deemed a step too far. The case of Vanilla Ice’s song “Ice Ice Baby” is one of the most famous. He had sampled the song “Under Pressure” by David Bowie and Queen and was forced via legal proceedings to pay them royalties. As I write this, Google is holding back the release of its text-to-music generator because of today’s unclear legal environment.
Let’s assume that it is OK to learn from others, and even to mimic them to a large degree, but not to explicitly take what they have done as-is and put it into our own work without permission and attribution. Even then, at what point is the line crossed? I know from my writing experience that publishers vary in how much of someone else’s text is permissible to quote before running afoul of fair use. I’ve heard of limits ranging from 50 words to several hundred words being considered within fair use. So, even with human output, there is a degree of subjectivity as to what’s OK. There have been cases where, for example, code generated by an AI process contained excerpts that clearly came from a specific source. It seems fair to say that we need generative AI algorithms to stay within the same fair use parameters that we follow. But would it be fair to hold them to a different standard?
Is Generative AI Really Doing Anything We Don’t Do?
If I am a painter or photographer, each image I create will necessarily be guided by what I’ve learned from all the images and techniques I’ve seen in the past, whether mine or someone else’s, whether copyrighted or not. There is no way to ignore everything I’ve seen in the past as I create today’s image. Similarly, generative AI is trained on images and then creates an amalgamation of all it has learned as it generates new images.
Even the most diligent humans can only study so many documents or images in a lifetime. Generative AI, however, can take in virtually everything ever created. As such, it could be argued that we generate our own art in a way that borrows much more heavily from each individual input than generative AI does, simply because generative AI has been trained on orders of magnitude more inputs than we have. Can it, therefore, be argued that generative AI is “stealing less” from any one creator than we do, given that broader range of inputs?
In the end, is it any worse for generative AI to create something in the style of Van Gogh than it is for me to do it? Is it any worse for AI to create a song in the style of Snoop Dogg than it is for me to do it? There are plenty of people who make a living mimicking popular artists. As long as they create their art from scratch, it is OK. If generative AI creates its output from scratch, shouldn’t it be OK too? No doubt, this question will be argued in both legal courts and the court of public opinion for years to come.
Bill Franks is an internationally recognized chief analytics officer, thought leader, speaker, consultant, and author focused on analytics and data science. https://www.linkedin.com/in/billfranksga/
This blog was published on LinkedIn by Bill Franks on March 21, 2023. Originally published by the International Institute for Analytics. https://www.linkedin.com/pulse/exploring-legal-implications-generative-ai-fair-use-bill-franks/