torrenting

judge:-pirate-libraries-may-have-profited-from-meta-torrenting-80tb-of-books

Judge: Pirate libraries may have profited from Meta torrenting 80TB of books

It could certainly look worse for Meta if authors manage to present evidence supporting the second way that torrenting could be relevant to the case, Chhabaria suggested.

“Meta downloading copyrighted material from shadow libraries” would also be relevant to the character of the use, “if it benefitted those who created the libraries and thus supported and perpetuated their unauthorized copying and distribution of copyrighted works,” Chhabria wrote.

Counting potential strikes against Meta, Chhabria pointed out that the “vast majority of cases” involving “this sort of peer-to-peer file-sharing” are found to “constitute copyright infringement.” And it likely doesn’t help Meta’s case that “some of the libraries Meta used have themselves been found liable for infringement.”

However, Meta may overcome this argument, too, since book authors “have not submitted any evidence” that potentially shows how Meta’s downloading may perhaps be “propping up” or financially benefiting pirate libraries.

Finally, Chhabria noted that the “last issue relating to the character of Meta’s use” of books in regards to its torrenting is “the relationship between Meta’s downloading of the plaintiffs’ books and Meta’s use of the books to train Llama.”

Authors had tried to argue that these elements were distinct. But Chhabria said there’s no separating the fact that Meta downloaded the books to serve the “highly transformative” purpose of training Llama.

“Because Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books,” Chhabria wrote.

AI training rulings may get more authors paid

Authors only learned of Meta’s torrenting through discovery in the lawsuit, and because of that, Chhabria noted that “the record on Meta’s alleged distribution is incomplete.”

It’s possible that authors may be able to show evidence that Meta “contributed to the BitTorrent network” by providing significant computing power that could’ve meaningfully assisted shadow libraries, Chhabria said in a footnote.

Judge: Pirate libraries may have profited from Meta torrenting 80TB of books Read More »

judge-on-meta’s-ai-training:-“i-just-don’t-understand-how-that-can-be-fair-use”

Judge on Meta’s AI training: “I just don’t understand how that can be fair use”


Judge downplayed Meta’s “messed up” torrenting in lawsuit over AI training.

A judge who may be the first to rule on whether AI training data is fair use appeared skeptical Thursday at a hearing where Meta faced off with book authors over the social media company’s alleged copyright infringement.

Meta, like most AI companies, holds that training must be deemed fair use, or else the entire AI industry could face immense setbacks, wasting precious time negotiating data contracts while falling behind global rivals. Meta urged the court to rule that AI training is a transformative use that only references books to create an entirely new work that doesn’t replicate authors’ ideas or replace books in their markets.

At the hearing that followed after both sides requested summary judgment, however, Judge Vince Chhabria pushed back on Meta attorneys arguing that the company’s Llama AI models posed no threat to authors in their markets, Reuters reported.

“You have companies using copyright-protected material to create a product that is capable of producing an infinite number of competing products,” Chhabria said. “You are dramatically changing, you might even say obliterating, the market for that person’s work, and you’re saying that you don’t even have to pay a license to that person.”

Declaring, “I just don’t understand how that can be fair use,” the shrewd judge apparently stoked little response from Meta’s attorney, Kannon Shanmugam, apart from a suggestion that any alleged threat to authors’ livelihoods was “just speculation,” Wired reported.

Authors may need to sharpen their case, which Chhabria warned could be “taken away by fair use” if none of the authors suing, including Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey, can show “that the market for their actual copyrighted work is going to be dramatically affected.”

Determined to probe this key question, Chhabria pushed authors’ attorney, David Boies, to point to specific evidence of market harms that seemed noticeably missing from the record.

“It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected by the billions of things that Llama will ultimately be capable of producing,” Chhabria said. “And it’s just not obvious to me that that’s the case.”

But if authors can prove fears of market harms are real, Meta might struggle to win over Chhabria, and that could set a precedent impacting copyright cases challenging AI training on other kinds of content.

The judge repeatedly appeared to be sympathetic to authors, suggesting that Meta’s AI training may be a “highly unusual case” where even though “the copying is for a highly transformative purpose, the copying has the high likelihood of leading to the flooding of the markets for the copyrighted works.”

And when Shanmugam argued that copyright law doesn’t shield authors from “protection from competition in the marketplace of ideas,” Chhabria resisted the framing that authors weren’t potentially being robbed, Reuters reported.

“But if I’m going to steal things from the marketplace of ideas in order to develop my own ideas, that’s copyright infringement, right?” Chhabria responded.

Wired noted that he asked Meta’s lawyers, “What about the next Taylor Swift?” If AI made it easy to knock off a young singer’s sound, how could she ever compete if AI produced “a billion pop songs” in her style?

In a statement, Meta’s spokesperson reiterated the company’s defense that AI training is fair use.

“Meta has developed transformational open source AI models that are powering incredible innovation, productivity, and creativity for individuals and companies,” Meta’s spokesperson said. “Fair use of copyrighted materials is vital to this. We disagree with Plaintiffs’ assertions, and the full record tells a different story. We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all.”

Meta’s torrenting seems “messed up”

Some have pondered why Chhabria appeared so focused on market harms, instead of hammering Meta for admittedly illegally pirating books that it used for its AI training, which seems to be obvious copyright infringement. According to Wired, “Chhabria spoke emphatically about his belief that the big question is whether Meta’s AI tools will hurt book sales and otherwise cause the authors to lose money,” not whether Meta’s torrenting of books was illegal.

The torrenting “seems kind of messed up,” Chhabria said, but “the question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”

It’s possible that Chhabria dodged the question for procedural reasons. In a court filing, Meta argued that authors had moved for summary judgment on Meta’s alleged copying of their works, not on “unsubstantiated allegations that Meta distributed Plaintiffs’ works via torrent.”

In the court filing, Meta alleged that even if Chhabria agreed that the authors’ request for “summary judgment is warranted on the basis of Meta’s distribution, as well as Meta’s copying,” that the authors “lack evidence to show that Meta distributed any of their works.”

According to Meta, authors abandoned any claims that Meta’s seeding of the torrented files served to distribute works, leaving only claims about Meta’s leeching. Meta argued that the authors “admittedly lack evidence that Meta ever uploaded any of their works, or any identifiable part of those works, during the so-called ‘leeching’ phase,” relying instead on expert estimates based on how torrenting works.

It’s also possible that for Chhabria, the torrenting question seemed like an unnecessary distraction. Former Meta attorney Mark Lumley, who quit the case earlier this year, told Vanity Fair that the torrenting was “one of those things that sounds bad but actually shouldn’t matter at all in the law. Fair use is always about uses the plaintiff doesn’t approve of; that’s why there is a lawsuit.”

Lumley suggested that court cases mulling fair use at this current moment should focus on the outputs, rather than the training. Citing the ruling in a case where Google Books scanning books to share excerpts was deemed fair use, Lumley argued that “all search engines crawl the full Internet, including plenty of pirated content,” so there’s seemingly no reason to stop AI crawling.

But the Copyright Alliance, a nonprofit, non-partisan group supporting the authors in the case, in a court filing alleged that Meta, in its bid to get AI products viewed as transformative, is aiming to do the opposite. “When describing the purpose of generative AI,” Meta allegedly strives to convince the court to “isolate the ‘training’ process and ignore the output of generative AI,” because that’s seemingly the only way that Meta can convince the court that AI outputs serve “a manifestly different purpose from Plaintiffs’ books,” the Copyright Alliance argued.

“Meta’s motion ignores what comes after the initial ‘training’—most notably the generation of output that serves the same purpose of the ingested works,” the Copyright Alliance argued. And the torrenting question should matter, the group argued, because unlike in Google Books, Meta’s AI models are apparently training on pirated works, not “legitimate copies of books.”

Chhabria will not be making a snap decision in the case, planning to take his time and likely stressing not just Meta, but every AI company defending training as fair use the longer he delays. Understanding that the entire AI industry potentially has a stake in the ruling, Chhabria apparently sought to relieve some tension at the end of the hearing with a joke, Wired reported.

 “I will issue a ruling later today,” Chhabria said. “Just kidding! I will take a lot longer to think about it.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Judge on Meta’s AI training: “I just don’t understand how that can be fair use” Read More »

meta-claims-torrenting-pirated-books-isn’t-illegal-without-proof-of-seeding

Meta claims torrenting pirated books isn’t illegal without proof of seeding

Just because Meta admitted to torrenting a dataset of pirated books for AI training purposes, that doesn’t necessarily mean that Meta seeded the file after downloading it, the social media company claimed in a court filing this week.

Evidence instead shows that Meta “took precautions not to ‘seed’ any downloaded files,” Meta’s filing said. Seeding refers to sharing a torrented file after the download completes, and because there’s allegedly no proof of such “seeding,” Meta insisted that authors cannot prove Meta shared the pirated books with anyone during the torrenting process.

Whether or not Meta actually seeded the pirated books could make a difference in a copyright lawsuit from book authors including Richard Kadrey, Sarah Silverman, and Ta-Nehisi Coates. Authors had previously alleged that Meta unlawfully copied and distributed their works through AI outputs—an increasingly common complaint that so far has barely been litigated. But Meta’s admission to torrenting appears to add a more straightforward claim of unlawful distribution of copyrighted works through illegal torrenting, which has long been considered established case-law.

Authors have alleged that “Meta deliberately engaged in one of the largest data piracy campaigns in history to acquire text data for its LLM training datasets, torrenting and sharing dozens of terabytes of pirated data that altogether contain many millions of copyrighted works.” Separate from their copyright infringement claims opposing Meta’s AI training on pirated copies of their books, authors alleged that Meta torrenting the dataset was “independently illegal” under California’s Computer Data Access and Fraud Act (CDAFA), which allegedly “prevents the unauthorized taking of data, including copyrighted works.”

Meta, however, is hoping to convince the court that torrenting is not in and of itself illegal, but is, rather, a “widely-used protocol to download large files.” According to Meta, the decision to download the pirated books dataset from pirate libraries like LibGen and Z-Library was simply a move to access “data from a ‘well-known online repository’ that was publicly available via torrents.”

Meta claims torrenting pirated books isn’t illegal without proof of seeding Read More »