According to OpenAI, building tools such as its ChatGPT chatbot would not be feasible without the use of copyrighted material. The admission has sharpened scrutiny of artificial intelligence companies and the sources they rely on to train their products.
Chatbots such as ChatGPT and image generators such as Stable Diffusion are trained on vast amounts of data scraped from the internet, much of which is protected by copyright, the legal protection that prevents a creator's work from being used without consent.
The New York Times has filed a lawsuit against OpenAI and Microsoft, a major investor in OpenAI that uses its tools in its own products, alleging "unauthorized use" of the newspaper's work to develop their products.
In a submission to the House of Lords communications and digital select committee, OpenAI said it could not train its GPT-4 model, which powers ChatGPT, without access to copyrighted material.
As reported by the Telegraph, OpenAI said that because copyright now covers virtually every form of human expression, including blog posts, photographs, forum posts, snippets of software code and government documents, it is necessary to use copyrighted materials when training today's AI models.
The submission added that restricting training data to books and drawings that are no longer under copyright would produce inadequate AI systems: "Using only public domain books and drawings from over a hundred years ago may lead to an intriguing experiment, but it would not produce AI systems that meet the demands of modern society."
Responding to the New York Times lawsuit filed last month, OpenAI said it respects "the rights of those who create and own content." AI companies facing copyright infringement claims typically rely on the legal doctrine of "fair use," which allows copyrighted material to be used in certain circumstances without the owner's permission.
In the same statement, OpenAI said it believes copyright law does not prohibit training.
The New York Times case is one of several lawsuits filed against OpenAI. In September, 17 authors, including John Grisham, Jodi Picoult, and George RR Martin, accused the company of "systematic theft on a large scale."
Getty Images, which holds a vast collection of photographs, is suing Stability AI, the creator of Stable Diffusion, for copyright infringement in both the US and England and Wales. In the US, a group of music publishers including Universal Music is suing Anthropic, the Amazon-backed company behind the Claude chatbot, alleging that it used countless copyrighted song lyrics to train its model.
In its House of Lords submission, OpenAI also said it supports independent analysis of its security measures relating to AI safety, and endorsed "red-teaming," in which third-party researchers test a product's safety by emulating the behaviour of malicious actors.
OpenAI is among the companies that agreed, at a global safety summit in the UK last year, to work with governments on safety testing of their most advanced models before and after release.
Source: theguardian.com