'Books3' Takedown: Anti-Piracy Group Calls for More AI Training Transparency
Ernesto Van der Sar, 05 Sep 12:55 PM

History has shown that copyright holders tend to be wary of new technologies that disrupt the status quo.

From the printing press, through cassette tapes, to online video streaming services, all were seen as major threats to the revenues of copyright holders at some point.

These weren't just overblown fears, since technologies can be used for both good and bad. Pirate streaming services are still a problem today, for example, but the same can't be said for Netflix and Spotify.

Over the past year, artificial intelligence has propelled itself to become a top concern for copyright holders. While this evolving technology can be a boon to rightsholders, the current focus is to prevent AI from exploiting, cannibalizing, or infringing copyrighted content.

The issue has already made its way to the courts in several instances and a few weeks ago we reported that anti-piracy groups are also getting involved. Last month, the Danish Rights Alliance was the first group to claim a major victory on the takedown front, by removing a copy of the controversial Books3 AI training dataset from the web.

The Books3 dataset has a clear piracy angle, as it was created from the library of 'pirate' site Bibliotik. The plaintext collection of 196,640 books, which is nearly 37GB in size, was used to train several AI models, including Meta's.

Books3 was first published on The Eye in late 2020 and was eventually removed when the Rights Alliance sent a formal takedown notice. There are still copies circulating elsewhere, but rightsholders are determined to take these down as well.

Transparency Needed

Many rightsholders believe that Books3 isn't the only piracy-sourced dataset. There are other book datasets as well, which are too large to have been created from public domain content. And then there are datasets that use copyrighted music, images, and video as well.

What makes Books3 unique is the fact that the source was published. In many other instances that's not the case, so rightsholders can't send takedown notices, even if they wanted to.

Rights Alliance director Maria Fredenslund notes that the Books3 example shows the importance of companies being transparent about the datasets they use to train AI models. This should be the rule going forward, not the exception.

"Books3 was a special case, as the creators of the data set had made public its origin, and at the same time, some artificial intelligence developers had indicated that they had used Books3. The case is therefore a real example of transparency being necessary for rights holders to enforce their content," Fredenslund says.

"We are therefore in the process of continuing our experience with Books3, in a call for a stricter requirement for transparency in the EU's AI Regulation, so that rights holders have a real opportunity to check whether their content is used to train artificial intelligence."

U.S. Copyright Office Asks Questions

The anti-piracy group isn't the only party focusing on transparency. The U.S. Copyright Office, which started a broader AI initiative earlier this year, just launched a public consultation in which it asks stakeholders for their views on the matter.

"In order to allow copyright owners to determine whether their works have been used, should developers of AI models be required to collect, retain, and disclose records regarding the materials used to train their models?" the Office asks.

"What obligations, if any, should there be to notify copyright owners that their works have been used to train an AI model?" another question reads.

UK House of Commons Committee Chimes In

Last week, a new AI report from the UK House of Commons Committee also chimed in on the subject. The government previously floated the idea of introducing a copyright exception for text and data mining for AI, but after objections, quickly walked it back.

The House of Commons Committee believes that this was wise, noting that rightsholders should be protected. Their report also recommends further transparency and the need for copyright holders to be compensated if their work is used for AI training purposes.

"The Government should consider how creatives can ensure transparency and, if necessary, recourse and redress if they suspect that AI developers are wrongfully using their works in AI development," the House of Commons Committee writes.

"The Government should support the continuance of a strong copyright regime in the UK and be clear that licenses are required to use copyrighted content in AI. In line with our previous work, this Committee also believes that the Government should act to ensure that creators are well rewarded in the copyright regime."

Just the Beginning

The European Union already has a transparency requirement in its recently proposed AI regulation, but the Rights Alliance doesn't believe it's helpful in its current form.

"[T]he EU's AI regulation is not sufficient, since this does not oblige the developers of artificial intelligence to publish where the content of their training data originates," the anti-piracy group notes.

These are just a few examples of recent AI-related copyright issues. While it's still early days, we can expect the topic to keep rightsholders, lawmakers, and courts busy for years.

Now is the time for various stakeholders to draw their lines in the sand. It's clear that AI development can't be slowed down, but which training data and outputs will be considered fair game is yet to be determined.

From: TF, for the latest news on copyright battles, piracy and more.

'News Media are a Useful Tool to Educate the Public on Piracy Risks and Threats'
Ernesto Van der Sar, 04 Sep 09:20 PM

Online piracy is a complex and constantly evolving phenomenon that fuels the daily reporting on this website.

While big headlines don't appear every day, there's always something happening. These events are no longer the preserve of niche audiences as mainstream media outlets show increasing interest.

This broader coverage is in part fueled by a steady stream of press releases issued by rightsholders and anti-piracy groups. These alerts often find their way to online news publications where they are republished with no questions asked. It's a particularly useful mechanism for those crafting the messaging.

The Media as an Anti-Piracy Tool

The role of the press as a communication 'tool' isn't typically discussed in public but did come up in a recent letter sent to the US Patent and Trademark Office by the Premier League. It mentions the media as one of the key tools through which messages can be sent.

The Premier League lists media reporting as one of the main ways the public at large is informed about the harms and dangers of piracy. This apparently can be very effective, as an example from Singapore shows.

"Raids conducted at the end of last year in Singapore are a good example of law enforcement using media coverage to send a strong message to the market about the legal risks associated with piracy.

"Since the raids, and resultant widespread media coverage, the Premier League has not been able to find any physical shops selling illicit streaming devices offering its content in Singapore," the Premier League notes.

In addition to covering enforcement actions, the media also helps by 'amplifying' research and reports, including those that highlight malware threats and piracy-related scams.

"Publications of studies and reports, such as those identified in the previous response, that contain empirical data on the harms and dangers of piracy. These are generally amplified by media reporting and consumer campaigns," the Premier League adds.

Balance vs. Hysteria

The Premier League's letter shows that rightsholders can utilize the media to boost their anti-piracy message. There is absolutely nothing wrong with that strategy, as long as the media maintains a critical and balanced view. But it doesn't always work that way.

With limited time available, many news publications rely heavily on information being fed to them. And since press releases typically lack caveats and counterpoints, subsequent reports risk being one-sided; in some cases, dramatically so.

In recent years we have seen media outfits not only repeat messaging but deliberately make it more extreme. This can lead to hysterical coverage or outright misinformation.

Nuance

Again, the Premier League and other rightsholders are not really to blame for this. Their press releases are carefully crafted to ensure effective delivery of their main messages and the absence of nuance is intended. The role of the media is to find balance and avoid hyperbole, but the former is often lacking.

A critical look at industry reports can pay off though. For example, when a report suggested that pirate sites are the main propagation method for malware, experts were quick to reject the claim. And when researchers fail to present proper evidence, that's worth pointing out too.

All in all, it's the responsibility of the media to ensure the presentation of a complete picture. The facts that are not mentioned in a press release are often far more interesting than those that are.

There are plenty of signs that rightsholder groups are deliberately creating strategies to 'cultivate' their relationships with the press. And, needless to say, it's not their goal to make sure that the media remains dedicated to balanced reporting; they simply want to deter piracy.

—

A copy of the Premier League's full letter to the U.S. Patent and Trademark Office is available here (pdf)



Big video box

Tuesday, September 5, 2023

TorrentFreak's Latest News
