During the IP Café on "Artificial Intelligence ('AI') and Copyright" at the 2023 AIPPI World Congress, participants agreed that the issues arising from AI's interplay with copyright law are important problems requiring resolution through international co-operation. Without such co-operation, governments may legislate on this differently, which could result in unnecessary complication and confusion and would not be good for the development of this important new technology. There is hope that this will not happen, though – as evidenced by twenty-nine states and groupings (including the UK, US, EU, India and China) signing the Bletchley Declaration on November 1, 2023, at the AI Safety Summit 2023. AI appears to be one of those areas where governments are eager to co-operate internationally.
The question to be grappled with is how this might be done without undermining what most IP lawyers agree is the fundamental purpose of copyright law: to reward authors for the creation of original works by granting them (or their assignees) exclusive rights to control how their works are made available to the public to enjoy.
As recognised in the TRIPS Agreement, exceptions to these exclusive rights must be restricted to "certain special cases which do not conflict with a normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the right holder". Such exceptions are commonly called "fair use" exceptions (or, in the UK, "fair dealing") – whether enshrined in common/case law or in statute, such as s.29A(1) (copies for text and data analysis for non-commercial research) and s.30 (fair dealing for criticism or review) of the UK Copyright, Designs and Patents Act 1988 ("CDPA"). However, these laws were written before the rise of AI technologies, and the questions of how copyright law applies to AI, and where the line falls between copyright infringement and permitted acts, remain to be determined.
In addition, there is the question of whether, and under what circumstances, an AI-generated work should attract copyright and, if so, who should own it. In the UK, works which are "computer-generated" can attract copyright protection, and the author (and hence first owner) is taken to be the person by whom the arrangements necessary for the creation of the work are undertaken (s.9(3) CDPA). However, other countries may not recognise such works as suitable for protection to the extent they do not represent the fruit of a human's intellectual endeavour.
Below is a brief overview of some of the cases in which these issues arise, along with some thoughts from an English law perspective on what the outcomes could mean for both tech companies and copyright holders.
Input Data Cases
The use of copyright works to train AI tools has become hugely controversial, as evidenced by the rising number of lawsuits addressing this very topic. In 2024 we may begin to see some judicial guidance, with a US trial date of August 26, 2024, set in Thomson Reuters Enterprise Centre GmbH and West Publishing Corp v Ross Intelligence Inc. This case will test whether copyright owners can prevent AI companies from using copyright works to train the machine learning models behind generative AI programs. Thomson Reuters accuses Ross Intelligence of unlawfully copying content from its legal-research platform Westlaw® to train a competing artificial intelligence-based platform.
Since 2020 (when this case was first brought), other tech companies, including Meta, Stability AI and OpenAI, have faced similar lawsuits in the USA and UK for alleged copyright infringement arising from the use of the claimants' works to train their AI tools (e.g., LLaMA, Stable Diffusion, ChatGPT). At the core of these cases is whether the use of copyright works to train Large Language Models ("LLMs") is copying within the meaning of copyright law and, if so, whether it qualifies as a permitted act (and is therefore lawful) or whether the copying is substantial and, as such, amounts to infringement.
Generative AI models rely upon training data, from which they learn patterns such as style, structure and aesthetics in order to generate new content. This data is often obtained by web scraping, which is at the heart of the class action filed against Google in the US (J.L. v. Alphabet Inc., U.S. District Court for the Northern District of California, No. 3:23-cv-03440). Google is accused of using web scraping in the training of its AI tools Bard, Imagen, MusicLM, Duet AI and Gemini. The two questions this case seeks to clarify are: (1) whether and how the LLMs behind Google's technology may be trained on resources available on the internet, and whether any fair use exceptions under US law apply to such training; and (2) whether, and to what extent, the contractual shifting of the risk of copyright infringement arising from input (and output) onto users (via the terms users accept when using the technology) is enforceable.
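For readers less familiar with the technology, the sketch below illustrates, in simplified form, what web scraping for training data involves. It is a minimal illustration only – not any party's actual pipeline – using the widely available Python libraries requests and BeautifulSoup, with a hypothetical URL and crawler name. It also checks the site's robots.txt file, the long-standing convention by which website owners signal which automated crawlers they permit (a point that becomes relevant to the opt-out approaches discussed below).

```python
# Minimal, illustrative web-scraping sketch (hypothetical URL and crawler name;
# not any party's actual training pipeline). Requires the third-party packages
# `requests` and `beautifulsoup4`.
from urllib.robotparser import RobotFileParser

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/articles/some-page"   # hypothetical page
USER_AGENT = "ExampleTrainingBot"                # hypothetical crawler name

# Consult the site's robots.txt first – the conventional opt-out signal.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

if robots.can_fetch(USER_AGENT, URL):
    response = requests.get(URL, headers={"User-Agent": USER_AGENT}, timeout=10)
    response.raise_for_status()
    # Strip the HTML markup; the remaining text could then be added to a
    # training corpus.
    text = BeautifulSoup(response.text, "html.parser").get_text(" ", strip=True)
    print(text[:500])
else:
    print("robots.txt disallows this crawler; the site has opted out.")
```

Scaled across millions of pages, this is broadly how the training corpora at issue in these cases are assembled – which is why the legal status of that copying step matters so much.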
With regard to the first question, this is an area where we may see a divergence of thought internationally. Under English law, the fair dealing exceptions are largely restricted to non-commercial use, so for training methods not to infringe, a defendant would seemingly need to argue either that the works used for training are not being copied or, if they are, that they are copied for non-commercial purposes – or perhaps that it is not the service provider doing the copying but the user (who would then arguably be the infringer).
Conversely, in the USA, fair use is a broader concept that weighs the purpose and character of the use, the nature of the work, the amount and substantiality of the portion taken, and the effect of the use on the market for the original. Therefore, even where the underlying works are used for commercial purposes, a fair use defence may succeed if the defendant can show that the use is "transformative" (i.e., it "adds something with a further purpose or different character, altering the first with new expression, meaning, or message": Campbell v. Acuff-Rose Music, 510 U.S. 569, 579 (1994)). This defence was run unsuccessfully in the recent US Supreme Court case Andy Warhol Foundation for the Visual Arts, Inc. v Goldsmith, in which the Court held that Warhol's images of the late musician Prince did not transform Goldsmith's 1981 photograph of him to a great enough degree and therefore did not qualify as fair use. However, the defence succeeded in the earlier case of Authors Guild v Google, in which a US appeals court upheld the first instance ruling that Google's wholesale copying of books was not an infringement of the copyright in them.
A similar debate around web scraping is at the core of the Getty Images v Stability AI lawsuits in the UK and the USA. However, since the lawsuits were brought, we have seen concessions by Stability AI to content creators in the form of an opt-out from image training for Stable Diffusion 3. We have seen a similar approach from OpenAI, which allows users to opt out of having their input and output data used for training purposes. Is this the way forward for generative AI models to succeed? Time will tell.
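In practice, such crawler opt-outs often rest on a site's robots.txt file. OpenAI, for example, has documented a crawler user agent, GPTBot, which website owners can disallow. A site wishing to withhold its content from that crawler might publish directives such as the following (a generic illustration, not any particular site's policy):

```
# robots.txt – ask OpenAI's documented crawler not to crawl any part of the site
User-agent: GPTBot
Disallow: /
```

Compliance with robots.txt is, however, voluntary on the crawler's part, which is one reason rights holders continue to press for legal rather than purely technical protection.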
Output Data Cases
The other side of the discussion is what IP is generated by the AI tool: does copyright subsist in the generated work and, if so, who is its author and what should the scope of protection be? In addition, is there a risk that the generated work infringes the original works on which it is based?
It is easy to see how computers and AI can assist in the production of useful things; the idea of AI being creative is harder to grapple with. That perception has changed, however, since the launch of tools such as ChatGPT, DALL-E, Midjourney and Stable Diffusion. We have even seen the first AI-generated artwork sold at auction by Christie's – "Portrait of Edmond de Belamy". As such, those who use AI tools to create are looking to copyright law to protect the works so created. However, the scope and ownership of such works may depend, in some countries, on whether a human "author" can be identified as having originated the AI-generated work and what involvement a human had in the "creative" process.

Unlike many countries, the UK's CDPA expressly provides for copyright protection of computer-generated works that have no human creator. Where a work is "generated by computer in circumstances such that there is no human author" (s.178 CDPA), the author is "the person by whom the arrangements necessary for the creation of the work are undertaken" (s.9(3) CDPA). Therefore, under English law, a computer-generated output can be protected by copyright, and the statute identifies who the first owner of that copyright will be. However, the English courts have also adopted the requirement, derived from the ECJ's decision in Infopaq and subsequent cases, that originality requires the author's own "intellectual creation". It is also possible that a human who created the AI system as a whole may have too little connection with the direct creative input to be considered the author, leaving the output outside copyright protection. At the core of the question of authorship, then, the critical issue may be the degree of "creative" involvement a human had in the creation of the AI system's output.
In relation to infringement by output data, the UK Government at one point announced that it would introduce an exception to copyright and database right infringement allowing commercial text and data mining of copyright works, provided the miners had "lawful access" to the works, along the lines of that permitted under EU copyright law (Directive (EU) 2019/790). However, after much lobbying, the UK Government decided not to proceed with this change. The debate remains live, and it will be interesting to see how it develops.
Conclusion
There are as yet no concrete answers to the various questions raised as to whether and how the use of AI technology results in copyright infringement, and it seems unlikely that legislators will provide the answers any time soon. As such, the judgments expected next year could be pivotal to the development of AI and copyright law. In the words of Getty Images' CEO Craig Peters, speaking about the Getty v Stability AI case: "there are ways of building generative models that respect intellectual property. I equate [this to] Napster and Spotify. Spotify negotiated with intellectual property rights holders — labels and artists — to create a service. You can debate over whether they're fairly compensated in that or not, but it's a negotiation based of the rights of individuals and entities. And that's what we're looking for, rather than a singular entity benefiting off the backs of others. That's the long-term goal of this action."
Tech companies could consider following the approaches of OpenAI and Stability AI by offering users the right to opt out of their web crawlers and by being transparent in their terms and conditions about what users' data may be used for. Transparency is at the core of the General Data Protection Regulation, which has had a ripple effect globally on data privacy, and that concept may well serve as inspiration for the approach to AI governance; adopting it early would be prudent. Conversely, copyright holders can include copyright notices in their datasets, or consider implementing technical measures to restrict access to their works (a sketch of one such measure appears below) and levying fees for access to their datasets as a way of being compensated. Nonetheless, my view is that tech companies and copyright holders should not be seen as two camps fighting each other; they should find ways to benefit one another mutually, producing a commercially viable service for customers who will largely be the same set of people.
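To make the "technical measures" point concrete, here is a minimal sketch of a server-side gate, written in Python with the Flask framework. All names are hypothetical: the idea is simply that requests from known AI-training crawlers are refused unless they present a licence key, allowing the rights holder to levy a fee for access.

```python
# Minimal sketch of a server-side technical measure (hypothetical names
# throughout): known AI-training crawlers must present a paid licence key.
from flask import Flask, abort, request

app = Flask(__name__)

# User agents of known AI-training crawlers. GPTBot is OpenAI's documented
# crawler; in practice a site owner would maintain and update this list.
AI_CRAWLERS = {"GPTBot"}

# Licence keys issued to crawlers that have paid for access (hypothetical).
LICENSED_KEYS = {"example-paid-licence-key"}

@app.before_request
def gate_ai_crawlers():
    agent = request.headers.get("User-Agent", "")
    if any(bot in agent for bot in AI_CRAWLERS):
        if request.headers.get("X-Licence-Key") not in LICENSED_KEYS:
            # HTTP 402 "Payment Required" signals that access is licensable.
            abort(402, description="Dataset access requires a paid licence.")

@app.route("/works/<work_id>")
def serve_work(work_id):
    return f"Copyright-protected content for work {work_id}"
```

Measures like this do not resolve the underlying legal questions, but they give rights holders a practical lever while the case law develops.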