Meta Admits Using Books3's Pirated Books To Train AI Models, But Denies Copyright Infringement

Meta has admitted to using Books3, a text dataset containing hundreds of thousands of pirated books. However, the social media giant maintains that it did not intentionally infringe any copyrights. Meta's use of Books3 is not unusual in the tech industry; other firms, such as OpenAI and Microsoft, are doing the same.

The developer of ChatGPT, OpenAI, has argued that training artificial intelligence models without using copyrighted material is "almost impossible." Amid OpenAI's explanation and the arguments raised in other lawsuits, Meta continues to defend its use of Books3. Here's what the Facebook owner said.

Meta Admits Using Books3's Pirated Books To Train AI Models

According to GizChina's latest report, Meta recently faced a class-action lawsuit from several authors who claimed that their books had been pirated and included in the Books3 dataset.

They accused the American tech firm of using the dataset to train its LLaMA 1 and Llama 2 models. Meta acknowledged that it used Books3 but claimed that there was no intentional copyright infringement.

As a result, Meta has refused to compensate the plaintiffs. The social media giant argued that its use of Books3 falls within the scope of "fair use," a doctrine that permits the use of copyrighted material for limited and transformative purposes.

The Facebook owner added that its use of the books in Books3 does not require attribution, compensation, or even permission. On this basis, Meta is challenging the merits of the class-action lawsuit.

Wired reported that Books3 began as a passion project of Shawn Presser, a computer researcher from the Midwest.

"I poured my soul into the work," explained Presser.

He said that the Books3 dataset aligned with the open-source movement, which he supports. Presser added that it was his way of democratizing access to the kind of datasets that OpenAI and other tech firms were already using.

However, his creation has since become the center of controversy in the AI industry, as Meta, OpenAI, Microsoft, and other tech firms that develop artificial intelligence models have used it.

Numerous authors and publishers have objected to the Books3 dataset because their works are being used without permission. They include notable authors such as Bart King, Bianca Turetsky, T. Greenwood, Lauren Groff, and Conor Kostick.

The dataset's controversial contents have led to several lawsuits against Meta and other tech firms that use it to train their AI models.

As of this writing, many authors and publishers are still seeking the compensation they believe they deserve.