Penguin Random House, one of the world’s largest publishers, has taken action to block firms from training AI systems on its huge portfolio, trade publication The Bookseller reports.
AI firms often trawl or “scrape” sources like fiction and non-fiction books, newspapers, and social media to train their AI models, which has already caused plenty of legal controversies.
Alongside Simon & Schuster, Hachette, HarperCollins, and Macmillan Publishers, Penguin Random House is considered one of the ‘Big Five’ English-language publishers. Together they are thought to control 80% of the U.S. book trade as of 2022.
Penguin has amended the copyright wording which appears on all its titles worldwide, across all its imprints. It now reads: “No part of this book may be used or reproduced in any manner to train artificial intelligence technologies or systems”.
According to The Bookseller, the new wording will appear on all its new titles and any reprinted old titles.
The statement also cites a European Parliament directive released earlier this year, which gives copyright holders the right to have their material protected from text and data mining by AI firms, as long as they have opted their work out of being used by AI.
Book publishing is not the only industry lashing out against AI firms allegedly profiting off its material; it’s a big issue in other industries too.
In December 2023, The New York Times sued OpenAI and Microsoft for copyright infringement, claiming that millions of its articles were used to train the companies’ AI models.
However, not all of the world’s largest book publishers are taking such a hardline approach to how their material is used.
Wiley, Oxford University Press, and Taylor & Francis have all signed agreements that allow their content to be used to train AI under certain conditions, The Bookseller reported earlier this year.
In a statement to the trade publication, copyright lawyer Chien‑Wei Lui, a senior associate at Fox Williams LLP, broadly supported the recent change to the copyright warning.
“The more training that is being done on a non-contractual/licence basis, the greater the risk that author content is being devalued,” she said. “Why would a platform pay to license content for training purposes if it suspects that content is already ‘out there’?”
The lawyer added: “Publishers need to ensure they understand all the tools at their disposal to limit the ability for third parties to use their content for training purposes. Having a clear and advertised statement about reserving all training and text and data mining rights, for example, is helpful.”