In November, I quit my job in generative AI to campaign for creators' right not to have their work used for AI training without permission. I started Fairly Trained, a non-profit organization that certifies generative AI companies that obtain a license before training models on copyrighted works.
Mostly, I have felt good about this decision, but there have been a few times when I questioned it. Like when a major media company, while keen to defend its own rights, told me it couldn't find a way to stop using unfairly trained generative AI in other fields. Or whenever demos of the latest models receive unquestioning praise, regardless of how they were trained. Or, last week, with the publication of a series of articles about music AI company Suno that I think downplayed serious questions about the training data it uses.
Suno is an AI music generation company with impressive text-to-song capabilities. I have nothing against Suno, with one exception: piecing together various clues, it seems likely that its model has been trained on copyrighted works without the consent of the copyright holders.
What are these indications? Suno refuses to disclose the sources of its training data. In an interview with Rolling Stone, one of its investors revealed that Suno had no deals with record labels “when the company started” (there's no indication that this has changed), that they invested in the company “with the full knowledge that record labels and music publishers could sue” and that the founders' lack of open hostility toward the music industry “doesn't mean they're not going to sue us.” And, although I've approached the company through two channels about being certified as Fairly Trained, so far they haven't taken me up on the offer, unlike the 12 AI music companies we've certified as training their platforms fairly.
There is, of course, a chance that Suno licenses its training data, and I sincerely hope I'm wrong. If they set the record straight, I'll be the first to loudly and regularly trumpet the company's fair training credentials.
But I'd like to see media coverage of companies like Suno give more weight to the question of what training data is being used. This is an existential issue for creators.
Editor's note: Suno's founders did not respond to requests for comment from Billboard about their training practices. Sources confirm that the company has not entered into licensing agreements with some of the most prominent music rights holders, including the three major record groups and the National Music Publishers' Association.
Limiting the discussion of Suno's training data to the fact that it “decline[s] to reveal details” and not explicitly stating the possibility that Suno is using copyrighted music without permission means that readers may not be aware of the possibility of unfair exploitation of musicians' work by AI music companies. This should factor into our thoughts on which AI music companies to support.
If Suno is trained on unlicensed copyrighted music, this is likely the technological factor that sets it apart from other music AI products. The Rolling Stone article mentions some of the tough technical problems that Suno is solving (to do with tokens, audio sample rate, and more), but these are problems that other companies have solved. In fact, several competitors have models as capable as Suno's. The reason you don't see more models like Suno released to the public is that most AI music companies want to make sure their training data is licensed before releasing their products.
The context here is important. Some of the world's largest AI companies use untold amounts of creators' work, without a license, to train AI models that compete with those creators. There is understandably a great deal of public outcry over this large-scale scraping of copyrighted works from the creative community. This has led to a series of lawsuits, which Rolling Stone mentions.
The fact that generative AI is competing with human creators is something that AI companies prefer not to talk about. But it is undeniable. People are already listening to music from companies like Suno instead of Spotify, and generative AI listening will inevitably eat into the music industry's revenue, and thus the income of human musicians, if the training data is not licensed.
Generative AI is a powerful technology that will likely bring many benefits. But if we support the exploitation of people's work for training without permission, we are implicitly supporting the unfair destruction of the creative industries. Instead, we should support companies that take a fairer approach to training data.
And these companies exist. There are a number of them, generally startups, that take a fairer approach, refusing to use copyrighted work without consent. They license data, use public domain data, commission data, or do all of the above. In short, they work hard not to train unethically. At Fairly Trained, we've certified 12 of these companies in music AI. If you want to use AI music and care about copyright, you have options.
There is a possibility that Suno has licensed its data. I encourage the company to disclose what it is training its AI model on. Until we know more, I hope that anyone who wants to use AI music will choose to work with companies that we know take a fair approach to using creators' work.
To put it simply, and to use some details gleaned from Suno's Rolling Stone interview, it doesn't matter if you're a band of musicians, what you say you think about IP, or how many photos of famous composers you have on your walls. If you train on unlicensed copyrighted works, you are not on the side of the musicians. You are taking unfair advantage of their work to make something that competes with them. You are taking from them for your profit, and at their cost.
Ed Newton-Rex is the CEO of Fairly Trained and a composer. He previously founded Jukedeck, one of the first AI music companies, led product in Europe for TikTok and was head of audio at Stability AI.