copyright infringement


AI industry horrified to face largest copyright class action ever certified

According to the groups, allowing copyright class actions in AI training cases would leave copyright questions unresolved while the risk of “emboldened” claimants forcing enormous settlements chills investment in AI.

“Such potential liability in this case exerts incredibly coercive settlement pressure for Anthropic,” industry groups argued, concluding that “as generative AI begins to shape the trajectory of the global economy, the technology industry cannot withstand such devastating litigation. The United States currently may be the global leader in AI development, but that could change if litigation stymies investment by imposing excessive damages on AI companies.”

Some authors won’t benefit from class actions

Industry groups joined Anthropic in arguing that, generally, copyright suits are considered a bad fit for class actions because each individual author must prove ownership of their works. And the groups weren’t alone.

Also backing Anthropic’s appeal, advocates representing authors—including Authors Alliance, the Electronic Frontier Foundation, American Library Association, Association of Research Libraries, and Public Knowledge—pointed out that the Google Books case showed that proving ownership is anything but straightforward.

In the Anthropic case, advocates for authors criticized Alsup for basically judging all 7 million books in the lawsuit by their covers. The judge allegedly made “almost no meaningful inquiry into who the actual members are likely to be,” as well as “no analysis of what types of books are included in the class, who authored them, what kinds of licenses are likely to apply to those works, what the rightsholders’ interests might be, or whether they are likely to support the class representatives’ positions.”

Ignoring “decades of research, multiple bills in Congress, and numerous studies from the US Copyright Office attempting to address the challenges of determining rights across a vast number of books,” the district court seemed to expect that authors and publishers would easily be able to “work out the best way to recover” damages.



Meta pirated and seeded porn for years to train AI, lawsuit says

Evidence may prove Meta seeded more content

Seeking evidence to back its own copyright infringement claims, Strike 3 Holdings searched “its archive of recorded infringement captured by its VXN Scan and Cross Reference tools” and found 47 “IP addresses identified as owned by Facebook infringing its copyright protected Works.”

The data allegedly demonstrates a “continued unauthorized distribution” over “several years.” And Meta allegedly did not stop its seeding after Strike 3 Holdings confronted the tech giant with this evidence—despite the IP data supposedly being verified through an industry-leading provider called MaxMind.

Strike 3 Holdings shared a screenshot of MaxMind’s findings. Credit: via Strike 3 Holdings’ complaint

Meta also attempted to “conceal its BitTorrent activities” through “six Virtual Private Clouds” that formed a “stealth network” of “hidden IP addresses,” the lawsuit alleged, seemingly implicating a “major third-party data center provider” as a partner in Meta’s piracy.

An analysis of these IP addresses allegedly found “data patterns that matched infringement patterns seen on Meta’s corporate IP Addresses” and included “evidence of other activity on the BitTorrent network including ebooks, movies, television shows, music, and software.” The seemingly non-human patterns documented on both sets of IP addresses suggest the data was for AI training and not for personal use, Strike 3 Holdings alleged.

Perhaps most shockingly, considering that a Meta employee joked “torrenting from a corporate laptop doesn’t feel right,” Strike 3 Holdings further alleged that it found “at least one residential IP address of a Meta employee” infringing its copyrighted works. That suggests Meta may have directed an employee to torrent pirated data outside the office to obscure the data trail.

The adult site operator did not identify the employee or the major data center discussed in its complaint, noting in a subsequent filing that it recognized the risks to Meta’s business and its employees’ privacy of sharing sensitive information.

In total, the company alleged that evidence shows “well over 100,000 unauthorized distribution transactions” linked to Meta’s corporate IPs. Strike 3 Holdings is hoping the evidence will lead a jury to find Meta liable for direct copyright infringement, or for secondary and vicarious infringement if the jury finds that Meta distanced itself from the torrenting by using the third-party data center or an employee’s home IP address.

“Meta has the right and ability to supervise and/or control its own corporate IP addresses, as well as the IP addresses hosted in off-infra data centers, and the acts of its employees and agents infringing Plaintiffs’ Works through their residential IPs by using Meta’s AI script to obtain content through BitTorrent,” the complaint said.



Toy company may regret coming for “Sylvanian Drama” TikToker, experts say


Possible legal paths to revive a shuttered video series on TikTok and Instagram.

A popular account on TikTok and Instagram stopped posting suddenly at the end of last year, hit by a lawsuit after garnering millions of views on funny videos it made using adorable Calico Critters children’s dolls to act out dark, cringe-y adult storylines.

While millions of followers mourn the so-called “Sylvanian Drama” account’s demise, experts told Ars that the creator may have a decent chance at beating the lawsuit.

The “Sylvanian Drama” account derived its name from “Sylvanian Families,” a brand name used by Epoch Company Ltd., the maker of Calico Critters, for its iconic fuzzy animal dolls in some markets outside the US. Despite these videos referencing murder, drugs, and hookups, the toy company apparently had no problem, until the account, managed by Ireland-based Thea Von Engelbrechten, started accepting big brand partnerships and making sponsored content featuring the dolls.

Since Epoch, too, strikes partnerships with brands and influencers to promote its own videos marketing the dolls, the company claimed “Sylvanian Drama” risked creating too much confusion online. Epoch also worried viewers would think it had signed off on the videos, since the sponsored content was marked “paid partnership” without specifying precisely which featured brands had paid for the spots. The company further accused Von Engelbrechten of building her advertising business around its brand without any attempt to properly license the dolls, while allegedly usurping licensing opportunities from Epoch.

So far, Von Engelbrechten has delayed responding in the lawsuit. As the account remained inactive over the past few months, fans speculated whether it could survive the lawsuit, which raised copyright and trademark infringement claims to get all the videos removed. In its complaint, the toy company requested not only an injunction preventing Von Engelbrechten from creating more “Sylvanian Drama” videos but also all of her profits from her online accounts, in addition to further damages.

Von Engelbrechten declined Ars’ request to provide an update on her defense in the case, but her response is due in early August. That filing will make clear what arguments she may make to overcome Epoch’s suit, but legal experts told Ars that the case isn’t necessarily a slam dunk for the toy company. So all that “Sylvanian Drama” isn’t over just yet.

Epoch’s lawyers did not respond to Ars’ request to comment.

“Sylvanian Drama” needs the court to get the joke

Epoch raised copyright infringement claims that could expose Von Engelbrechten to statutory damages of up to $150,000 per infringed work.

For Von Engelbrechten to defeat the copyright infringement claim, she’ll need to convince the court that her videos are parodies. A law professor at Santa Clara University School of Law, Eric Goldman, told Ars that her videos may qualify since “even if they don’t expressly reference Epoch’s offerings by name, the videos intentionally communicate a jarring juxtaposition of adorable critters who are important parts of pop culture living through the darker sides of humanity.”

Basically, Von Engelbrechten will need the court to understand the humor in her videos to win on that claim, Rebecca Tushnet, a First Amendment law professor at Harvard Law School, told Ars.

“Courts have varied in their treatment of parodies; the complaint’s definition of parody is not controlling but humor is one of the hardest things to predict—if the court gets the joke, it will be more likely to say that the juxtaposition between the storylines and the innocent appearance of the dolls is parodic,” Tushnet said.

But if the court does get the joke, Goldman suggested that even the sponsored content—which hilariously incorporates product placements from various big brands like Marc Jacobs, Taco Bell, Hilton, and Sephora into storylines—could possibly be characterized as parody.

However, “the fact that the social media posts were labeled #ad will make it extremely difficult for the artist to contest the videos’ status as ads,” Goldman said.

Ultimately, Goldman said that Epoch’s lawsuit “raises a host of complex legal issues” and is “not an easy case on either side.”

And one of the most significant issues that Epoch may face in the courtroom could end up gutting all of its trademark infringement claims that supposedly entitle the toy company to all of Von Engelbrechten’s profits, Alexandra Jane Roberts, a Northeastern University professor of law and media with special expertise in trademark law, told Ars.

Calico Critters may stumble on trademark hurdle

The toy company has raised several trademark infringement claims, all of which depend on Epoch proving that Von Engelbrechten “knowingly and willfully” used its trademarks without permission.

However, Roberts pointed out to Ars that Epoch has no trademarks for its iconic dolls, relying only on common law to assert sole rights to the “look and design of the critters.”

It’s likely impossible for Epoch to trademark the dolls, since trademarks are not intended to block competition, and there are only so many ways to design cute dolls that resemble cats or bunnies, Roberts suggested. A court may decide “there’s only so many ways to make a small fuzzy bunny that doesn’t look like this,” potentially narrowing the rights Epoch has under trade dress, a term that Epoch doesn’t use once in its complaint.

Roberts told Ars that Epoch’s trademark claims are “not so far off the mark,” and Von Engelbrechten’s defense was certainly not strengthened by her decision to monetize the content. Prior disputes, like the indie band OK Go sending a cease-and-desist to Post cereal over a breakfast product called “OK Go” out of fear of false endorsement, make it clear that courts have agreed in the past that online collaborations have muddied the waters regarding who viewers see as the actual source of content.

“The question becomes whether people are going to see these videos, even though they’re snarky, and even though they’re silly and think, ‘Oh, Calico Critters must have signed off on this,'” Roberts said. “So the argument about consumer confusion, I think, is a plausible argument.”

However, if Epoch fails to convince the court that its trademarks have been infringed, then its other claims alleging false endorsement and unfair competition would likely also collapse.

“You can still get sometimes to unfair competition or to kind of like a false endorsement, but it’s harder to win on those claims and certainly harder to get damages on those claims,” Roberts said. “You don’t get trademark infringement if you don’t have a trademark.”

Possible defenses to keep “Sylvanian Drama” alive

Winning on the trademark claims may not be easy for Von Engelbrechten, who possibly weakened her First Amendment defense by creating the sponsored content. Regardless, she will likely try to convince the court to view the videos as parody, which is a slightly different analysis under trademark law than copyright’s more well-known fair use parody exceptions.

That could be a struggle, since trademark law requires that Von Engelbrechten’s parody videos directly satirize the “Sylvanian Families” brand, and “Sylvanian Drama” videos, even the ads, instead seem to be “making fun of elements of society and culture,” rather than the dolls themselves, Roberts said.

She pointed to winning cases involving the Barbie trademark as an instructive example. In a case disputing Mattel trademarks used in the lyrics for the one-hit wonder “Barbie Girl,” the song was cleared of trademark infringement as a “purely expressive work” that directly parodies Barbie in the lyrics. And in another case, where an artist, Tom Forsythe, captured photos of Barbie dolls in kitchen vessels like a blender or a margarita glass, more robust First Amendment protection was offered since his photos “had a lot to say about sexism and the dolls and what the dolls represent,” Roberts said.

The potential “Sylvanian Drama” defense seems to lack strong go-to arguments that typically win trademark cases, but Roberts said there is still one other defense the content creator may be weighing.

Under “nominative fair use,” it’s OK to use another company’s trademark when it’s necessary in an ad. Roberts provided examples, like a company renting out Lexus cars needing to use that trademark, or a competitor using Tiffany diamonds as a reference point in comparative advertising to hype its lower prices.

If Von Engelbrechten goes that route, she will need to prove she used “no more of the mark than is necessary” and did not mislead fans on whether Epoch signed off on the use.

“Here it’s hard to say that ‘Sylvanian Drama’ really needed to use so much of those characters and that they didn’t use more than they needed and that they weren’t misleading,” Roberts said.

However, Von Engelbrechten’s best bet might be arguing that there was no confusion, since “Sylvanian Families” isn’t even a brand that’s used in the US, which is where Epoch chose to file its lawsuit because the brands that partnered with the popular account are based in New York. And the case may not even get that far, Roberts suggested, since “before you can get to those questions about the likelihood of confusion, you have to show that you actually have trademark or trade dress rights to enforce.”

Calico Critters creator may face millennial backlash

Epoch may come to regret filing the lawsuit, Roberts said, noting that as a millennial who grew up a big “Hello Kitty” fan, she still buys merch that appeals to her, and Epoch likely knows about that market, as it has done collaborations with the “Hello Kitty” brand. The toymaker could risk alienating other millennials nostalgic for Calico Critters who may be among the “Sylvanian Drama” audience and feel turned off by the lawsuit.

“When you draw attention to something like this and appear litigious, and that you’re coming after a creator who a lot of people really like and really enjoy and probably feel defensive about, like, ‘Oh, she’s just making these funny videos that everyone loves. Why would you want to sue her?'” Roberts said, “that can be really bad press.”

Goldman suggested that Epoch might be better off striking a deal with the creator, which “could establish some boundaries for the artist to keep going without stepping on the IP owner’s rights.” But he noted that “often IP owners in these situations are not open to negotiation,” and “that requires courts to draw difficult and unpredictable lines about the permissible scope of fair use.”

For Von Engelbrechten, the lawsuit may mean that her days of creating sponsored “Sylvanian Drama” content are over, which could crush her bigger dream of succeeding in advertising. However, if the lawsuit can be amicably settled, the beloved content creator could also end up making money for Epoch, considering how big her brand deals appeared to be.

While she seems to take her advertising business seriously, Von Engelbrechten’s videos often joke about legal consequences, such as one where a cat doll says she cannot go to a party because she’s in jail but says “I’ll figure it out” when told her ex will be attending. Perhaps Von Engelbrechten is currently devising a scheme, like her characters, to escape consequences and keep the “Sylvanian Drama” going.

“Maybe if this company were really smart, they would want to hire this person instead of suing them,” Roberts said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Everything tech giants will hate about the EU’s new AI rules

The code also details expectations for AI companies to respect paywalls, as well as robots.txt instructions restricting crawling, which could help confront a growing problem of AI crawlers hammering websites. It “encourages” online search giants to embrace a solution that Cloudflare is currently pushing: allowing content creators to protect copyrights by restricting AI crawling without impacting search indexing.
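Cloudflare’s tooling aside, the underlying mechanism the code points to is the long-standing robots.txt protocol, which lets a site allow search crawlers while refusing AI-training crawlers by user agent. Below is a minimal Python sketch of how such a policy is interpreted; the policy text and the crawler names (Googlebot and GPTBot are just common examples) are illustrative assumptions, not requirements drawn from the EU code.

```python
# A minimal sketch: a hypothetical robots.txt that permits search indexing
# but disallows an AI-training crawler, checked with Python's standard parser.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks the policy before fetching a page.
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True: search indexing allowed
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False: AI-training crawl refused
```

The sketch only shows how such a policy is read; actually honoring it remains up to the crawler.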

Additionally, companies are asked to disclose total energy consumption for both training and inference, allowing the EU to monitor environmental impacts as companies race forward with AI innovation.

More substantially, the code’s safety guidance provides for additional monitoring of other harms. It makes recommendations to detect and avoid “serious incidents” with new AI models, which could include cybersecurity breaches, disruptions of critical infrastructure, “serious harm to a person’s health (mental and/or physical),” or “a death of a person.” It stipulates timelines of between five and 10 days to report serious incidents to the EU’s AI Office. And it requires companies to track all events, provide an “adequate level” of cybersecurity protection, prevent jailbreaking as best they can, and justify “any failures or circumventions of systemic risk mitigations.”

Ars reached out to tech companies for immediate reactions to the new rules. OpenAI, Meta, and Microsoft declined to comment. A Google spokesperson confirmed that the company is reviewing the code, which still must be approved by the European Commission and EU member states amid expected industry pushback.

“Europeans should have access to first-rate, secure AI models when they become available, and an environment that promotes innovation and investment,” Google’s spokesperson said. “We look forward to reviewing the code and sharing our views alongside other model providers and many others.”

These rules are just one part of the AI Act, which will start taking effect in a staggered approach over the next year or more, the NYT reported. Breaching the AI Act could result in AI models being yanked off the market or fines “of as much as 7 percent of a company’s annual sales or 3 percent for the companies developing advanced AI models,” Bloomberg noted.



NYT to start searching deleted ChatGPT logs after beating OpenAI in court


What are the odds NYT will access your ChatGPT logs in OpenAI court battle?

Last week, OpenAI raised objections in court, hoping to overturn a court order requiring the AI company to retain all ChatGPT logs “indefinitely,” including deleted and temporary chats.

But Sidney Stein, the US district judge reviewing OpenAI’s request, immediately denied OpenAI’s objections. He was seemingly unmoved by the company’s claims that the order forced OpenAI to abandon “long-standing privacy norms” and weaken privacy protections that users expect based on ChatGPT’s terms of service. Rather, Stein suggested that OpenAI’s user agreement specified that users’ data could be retained as part of a legal process, which Stein said is exactly what is happening now.

The order was issued by magistrate judge Ona Wang just days after news organizations, led by The New York Times, requested it. The news plaintiffs claimed the order was urgently needed to preserve potential evidence in their copyright case, alleging that ChatGPT users are likely to delete chats where they attempted to use the chatbot to skirt paywalls to access news content.

A spokesperson told Ars that OpenAI plans to “keep fighting” the order, but the ChatGPT maker seems to have few options left. It could petition the Second Circuit Court of Appeals for a rarely granted emergency order blocking Wang’s order, but the appeals court would have to consider Wang’s order an extraordinary abuse of discretion for OpenAI to win that fight.

OpenAI’s spokesperson declined to confirm if the company plans to pursue this extreme remedy.

In the meantime, OpenAI is negotiating a process that will allow news plaintiffs to search through the retained data. Perhaps the sooner that process begins, the sooner the data will be deleted. And that possibility puts OpenAI in the difficult position of having to choose between either caving to some data collection to stop retaining data as soon as possible or prolonging the fight over the order and potentially putting more users’ private conversations at risk of exposure through litigation or, worse, a data breach.

News orgs will soon start searching ChatGPT logs

The clock is ticking, and so far, OpenAI has not provided any official updates since a June 5 blog post detailing which ChatGPT users will be affected.

While it’s clear that OpenAI has been and will continue to retain mounds of data, it would be impossible for The New York Times or any news plaintiff to search through all that data.

Instead, only a small sample of the data will likely be accessed, based on keywords that OpenAI and news plaintiffs agree on. That data will remain on OpenAI’s servers, where it will be anonymized, and it will likely never be directly produced to plaintiffs.

Both sides are negotiating the exact process for searching through the chat logs, with both parties seemingly hoping to minimize the amount of time the chat logs will be preserved.

For OpenAI, sharing the logs risks revealing instances of infringing outputs that could further spike damages in the case. The logs could also expose how often outputs attribute misinformation to news plaintiffs.

But for news plaintiffs, accessing the logs is not considered key to their case beyond perhaps providing additional examples of copying. It could, however, help news organizations argue that ChatGPT dilutes the market for their content. That could weigh against the fair use argument, as a judge opined in a recent ruling that evidence of market dilution could tip an AI copyright case in favor of plaintiffs.

Jay Edelson, a leading consumer privacy lawyer, told Ars that he’s concerned that judges don’t seem to be considering that any evidence in the ChatGPT logs wouldn’t “advance” news plaintiffs’ case “at all,” while really changing “a product that people are using on a daily basis.”

Edelson acknowledged that OpenAI itself probably has better security than most firms to protect against a potential data breach that could expose these private chat logs. But “lawyers have notoriously been pretty bad about securing data,” Edelson suggested, so “the idea that you’ve got a bunch of lawyers who are going to be doing whatever they are” with “some of the most sensitive data on the planet” and “they’re the ones protecting it against hackers should make everyone uneasy.”

So even though odds are pretty good that the majority of users’ chats won’t end up in the sample, Edelson said the mere threat of being included might push some users to rethink how they use AI. He further warned that ChatGPT users turning to OpenAI rival services like Anthropic’s Claude or Google’s Gemini could suggest that Wang’s order is improperly influencing market forces, which also seems “crazy.”

To Edelson, the most “cynical” take could be that news plaintiffs are possibly hoping the order will threaten OpenAI’s business to the point where the AI company agrees to a settlement.

Regardless of the news plaintiffs’ motives, the order sets an alarming precedent, Edelson said. He joined critics suggesting that more AI data may be frozen in the future, potentially affecting even more users as a result of the sweeping order surviving scrutiny in this case. Imagine if litigation one day targets Google’s AI search summaries, Edelson suggested.

Lawyer slams judges for giving ChatGPT users no voice

Edelson told Ars that the order is so potentially threatening to OpenAI’s business that the company may not have a choice but to explore every path available to continue fighting it.

“They will absolutely do something to try to stop this,” Edelson predicted, calling the order “bonkers” for overlooking millions of users’ privacy concerns while “strangely” excluding enterprise customers.

From court filings, it seems possible that enterprise users were excluded to protect OpenAI’s competitiveness, but Edelson suggested there’s “no logic” to their exclusion “at all.” By excluding these ChatGPT users, the judge’s order may have removed the users best resourced to fight the order, Edelson suggested.

“What that means is the big businesses, the ones who have the power, all of their stuff remains private, and no one can touch that,” Edelson said.

Instead, the order is “only going to intrude on the privacy of the common people out there,” which Edelson said “is really offensive,” given that Wang denied two ChatGPT users’ panicked request to intervene.

“We are talking about billions of chats that are now going to be preserved when they weren’t going to be preserved before,” Edelson said, noting that he’s input information about his personal medical history into ChatGPT. “People ask for advice about their marriages, express concerns about losing jobs. They say really personal things. And one of the bargains in dealing with OpenAI is that you’re allowed to delete your chats and you’re allowed to temporary chats.”

The greatest risk to users would be a data breach, Edelson said, but that’s not the only potential privacy concern. Corynne McSherry, legal director for the digital rights group the Electronic Frontier Foundation, previously told Ars that as long as users’ data is retained, it could also be exposed through future law enforcement and private litigation requests.

Edelson pointed out that most privacy attorneys don’t consider OpenAI CEO Sam Altman to be a “privacy guy,” despite Altman recently slamming the NYT, alleging it sued OpenAI because it doesn’t “like user privacy.”

“He’s trying to protect OpenAI, and he does not give a hoot about the privacy rights of consumers,” Edelson said, echoing one ChatGPT user’s dismissed concern that OpenAI may not prioritize users’ privacy concerns in the case if it’s financially motivated to resolve the case.

“The idea that he and his lawyers are really going to be the safeguards here isn’t very compelling,” Edelson said. He criticized the judges for dismissing users’ concerns and rejecting OpenAI’s request that users get a chance to testify.

“What’s really most appalling to me is the people who are being affected have had no voice in it,” Edelson said.




Judge: Pirate libraries may have profited from Meta torrenting 80TB of books

It could certainly look worse for Meta if authors manage to present evidence supporting the second way that torrenting could be relevant to the case, Chhabria suggested.

“Meta downloading copyrighted material from shadow libraries” would also be relevant to the character of the use, “if it benefitted those who created the libraries and thus supported and perpetuated their unauthorized copying and distribution of copyrighted works,” Chhabria wrote.

Counting potential strikes against Meta, Chhabria pointed out that the “vast majority of cases” involving “this sort of peer-to-peer file-sharing” are found to “constitute copyright infringement.” And it likely doesn’t help Meta’s case that “some of the libraries Meta used have themselves been found liable for infringement.”

However, Meta may overcome this argument, too, since book authors “have not submitted any evidence” showing that Meta’s downloading may be “propping up” or financially benefiting pirate libraries.

Finally, Chhabria noted that the “last issue relating to the character of Meta’s use” of books with regard to its torrenting is “the relationship between Meta’s downloading of the plaintiffs’ books and Meta’s use of the books to train Llama.”

Authors had tried to argue that these elements were distinct. But Chhabria said there’s no separating the fact that Meta downloaded the books to serve the “highly transformative” purpose of training Llama.

“Because Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books,” Chhabria wrote.

AI training rulings may get more authors paid

Authors only learned of Meta’s torrenting through discovery in the lawsuit, and because of that, Chhabria noted that “the record on Meta’s alleged distribution is incomplete.”

It’s possible that authors may be able to show evidence that Meta “contributed to the BitTorrent network” by providing significant computing power that could’ve meaningfully assisted shadow libraries, Chhabria said in a footnote.



Key fair use ruling clarifies when books can be used for AI training

“This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” Alsup wrote. “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

But Alsup said that the Anthropic case may not even need to decide on that, since Anthropic’s retention of pirated books for its research library alone was not transformative. Alsup wrote that Anthropic’s argument to hold onto potential AI training material it pirated in case it ever decided to use it for AI training was an attempt to “fast glide over thin ice.”

Additionally, Alsup pointed out that Anthropic’s early attempts to get permission to train on authors’ works withered, as internal messages revealed the company concluded that stealing books was considered the more cost-effective path to innovation “to avoid ‘legal/practice/business slog,’ as cofounder and chief executive officer Dario Amodei put it.”

“Anthropic is wrong to suppose that so long as you create an exciting end product, every ‘back-end step, invisible to the public,’ is excused,” Alsup wrote. “Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.”

To avoid maximum damages in the event of a loss, Anthropic will likely continue arguing that replacing pirated books with purchased books should water down authors’ fight, Alsup’s order suggested.

“That Anthropic later bought a copy of a book it earlier stole off the Internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages,” Alsup noted.



Man who stole 1,000 DVDs from employer strikes plea deal over movie leaks

An accused movie pirate who stole more than 1,000 Blu-ray discs and DVDs while working for a DVD manufacturing company struck a plea deal this week to lower his sentence after the FBI claimed the man’s piracy cost movie studios millions.

Steven Hale no longer works for the DVD company. He was arrested in March, accused of “bypassing encryption that prevents unauthorized copying” and ripping pre-release copies of movies that he could only access because his former employer manufactured and distributed discs for major movie studios. As alleged by the feds, his game was beating studios to official releases to reap the greatest possible financial gains from online leaks.

Among the popular movies that Hale is believed to have leaked between 2021 and 2022 was Spider-Man: No Way Home, which the FBI alleged was copied “tens of millions of times” at an estimated loss of “tens of millions of dollars” for just one studio on one movie. Other movies Hale ripped included animated hits like Encanto and Sing 2, as well as anticipated sequels like The Matrix Resurrections and Venom: Let There Be Carnage.

The cops first caught wind of Hale’s scheme in March 2022. They seized about 1,160 Blu-rays and DVDs in what TorrentFreak noted were the days just “after the Spider-Man movie leaked online.” It’s unclear why it took close to three years before Hale’s arrest, but TorrentFreak suggested that Hale’s case is perhaps part of a bigger investigation into the Spider-Man leaks.



Judge on Meta’s AI training: “I just don’t understand how that can be fair use”


Judge downplayed Meta’s “messed up” torrenting in lawsuit over AI training.

A judge who may be the first to rule on whether AI training data is fair use appeared skeptical Thursday at a hearing where Meta faced off with book authors over the social media company’s alleged copyright infringement.

Meta, like most AI companies, holds that training must be deemed fair use, or else the entire AI industry could face immense setbacks, wasting precious time negotiating data contracts while falling behind global rivals. Meta urged the court to rule that AI training is a transformative use that only references books to create an entirely new work that doesn’t replicate authors’ ideas or replace books in their markets.

At the hearing, which followed both sides’ requests for summary judgment, however, Judge Vince Chhabria pushed back on Meta attorneys arguing that the company’s Llama AI models posed no threat to authors in their markets, Reuters reported.

“You have companies using copyright-protected material to create a product that is capable of producing an infinite number of competing products,” Chhabria said. “You are dramatically changing, you might even say obliterating, the market for that person’s work, and you’re saying that you don’t even have to pay a license to that person.”

Declaring, “I just don’t understand how that can be fair use,” the shrewd judge apparently stoked little response from Meta’s attorney, Kannon Shanmugam, apart from a suggestion that any alleged threat to authors’ livelihoods was “just speculation,” Wired reported.

Authors may need to sharpen their case, which Chhabria warned could be “taken away by fair use” if none of the authors suing, including Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey, can show “that the market for their actual copyrighted work is going to be dramatically affected.”

Determined to probe this key question, Chhabria pushed authors’ attorney, David Boies, to point to specific evidence of market harms that seemed noticeably missing from the record.

“It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected by the billions of things that Llama will ultimately be capable of producing,” Chhabria said. “And it’s just not obvious to me that that’s the case.”

But if authors can prove fears of market harms are real, Meta might struggle to win over Chhabria, and that could set a precedent impacting copyright cases challenging AI training on other kinds of content.

The judge repeatedly appeared to be sympathetic to authors, suggesting that Meta’s AI training may be a “highly unusual case” where even though “the copying is for a highly transformative purpose, the copying has the high likelihood of leading to the flooding of the markets for the copyrighted works.”

And when Shanmugam argued that copyright law doesn’t give authors “protection from competition in the marketplace of ideas,” Chhabria resisted the framing that authors weren’t potentially being robbed, Reuters reported.

“But if I’m going to steal things from the marketplace of ideas in order to develop my own ideas, that’s copyright infringement, right?” Chhabria responded.

Wired noted that he asked Meta’s lawyers, “What about the next Taylor Swift?” If AI made it easy to knock off a young singer’s sound, how could she ever compete if AI produced “a billion pop songs” in her style?

In a statement, Meta’s spokesperson reiterated the company’s defense that AI training is fair use.

“Meta has developed transformational open source AI models that are powering incredible innovation, productivity, and creativity for individuals and companies,” Meta’s spokesperson said. “Fair use of copyrighted materials is vital to this. We disagree with Plaintiffs’ assertions, and the full record tells a different story. We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all.”

Meta’s torrenting seems “messed up”

Some have pondered why Chhabria appeared so focused on market harms, instead of hammering Meta for admittedly illegally pirating books that it used for its AI training, which seems to be obvious copyright infringement. According to Wired, “Chhabria spoke emphatically about his belief that the big question is whether Meta’s AI tools will hurt book sales and otherwise cause the authors to lose money,” not whether Meta’s torrenting of books was illegal.

The torrenting “seems kind of messed up,” Chhabria said, but “the question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”

It’s possible that Chhabria dodged the question for procedural reasons. In a court filing, Meta argued that authors had moved for summary judgment on Meta’s alleged copying of their works, not on “unsubstantiated allegations that Meta distributed Plaintiffs’ works via torrent.”

In the court filing, Meta argued that even if Chhabria agreed that the authors’ request for “summary judgment is warranted on the basis of Meta’s distribution, as well as Meta’s copying,” the authors “lack evidence to show that Meta distributed any of their works.”

According to Meta, authors abandoned any claims that Meta’s seeding of the torrented files served to distribute works, leaving only claims about Meta’s leeching. Meta argued that the authors “admittedly lack evidence that Meta ever uploaded any of their works, or any identifiable part of those works, during the so-called ‘leeching’ phase,” relying instead on expert estimates based on how torrenting works.

It’s also possible that for Chhabria, the torrenting question seemed like an unnecessary distraction. Former Meta attorney Mark Lemley, who quit the case earlier this year, told Vanity Fair that the torrenting was “one of those things that sounds bad but actually shouldn’t matter at all in the law. Fair use is always about uses the plaintiff doesn’t approve of; that’s why there is a lawsuit.”

Lemley suggested that court cases mulling fair use at this moment should focus on the outputs rather than the training. Citing the ruling that deemed Google Books’ scanning of books to share excerpts a fair use, Lemley argued that “all search engines crawl the full Internet, including plenty of pirated content,” so there’s seemingly no reason to stop AI crawling.

But the Copyright Alliance, a nonprofit, non-partisan group supporting the authors in the case, in a court filing alleged that Meta, in its bid to get AI products viewed as transformative, is aiming to do the opposite. “When describing the purpose of generative AI,” Meta allegedly strives to convince the court to “isolate the ‘training’ process and ignore the output of generative AI,” because that’s seemingly the only way that Meta can convince the court that AI outputs serve “a manifestly different purpose from Plaintiffs’ books,” the Copyright Alliance argued.

“Meta’s motion ignores what comes after the initial ‘training’—most notably the generation of output that serves the same purpose of the ingested works,” the Copyright Alliance argued. And the torrenting question should matter, the group argued, because unlike in Google Books, Meta’s AI models are apparently training on pirated works, not “legitimate copies of books.”

Chhabria will not be making a snap decision in the case; he plans to take his time, and the longer he deliberates, the more he is likely to stress not just Meta but every AI company defending training as fair use. Understanding that the entire AI industry potentially has a stake in the ruling, Chhabria apparently sought to relieve some tension at the end of the hearing with a joke, Wired reported.

 “I will issue a ruling later today,” Chhabria said. “Just kidding! I will take a lot longer to think about it.”




Feds arrest man for sharing DVD rip of Spider-Man movie with millions online

A 37-year-old Tennessee man was arrested Thursday, accused of stealing Blu-rays and DVDs from a manufacturing and distribution company used by major movie studios and sharing them online before the movies’ scheduled release dates.

According to a US Department of Justice press release, Steven Hale worked at the DVD company and allegedly stole “numerous ‘pre-release’ DVDs and Blu-rays” between February 2021 and March 2022. He then allegedly “ripped” the movies, “bypassing encryption that prevents unauthorized copying,” and shared copies widely online. He also sold the actual stolen discs on e-commerce sites, the DOJ alleged.

Hale has been charged with “two counts of criminal copyright infringement and one count of interstate transportation of stolen goods,” the DOJ said. He faces a maximum sentence of five years for the former, and 10 years for the latter.

Among blockbuster movies that Hale is accused of stealing are Dune, F9: The Fast Saga, Venom: Let There Be Carnage, Godzilla vs. Kong, and, perhaps most notably, Spider-Man: No Way Home.

The DOJ claimed that “copies of Spider-Man: No Way Home were downloaded tens of millions of times, with an estimated loss to the copyright owner of tens of millions of dollars.”

In 2021, when the Spider-Man movie was released in theaters only, it became the first movie during the COVID-19 pandemic to gross more than $1 billion at the box office, Forbes noted. But for those unwilling to venture out to see the movie, Forbes reported, the temptation to find leaks and torrents apparently became hard to resist. It was in this climate that Hale is accused of widely sharing copies of the movie before it was released online.



Music labels will regret coming for the Internet Archive, sound historian says

But David Seubert, who manages sound collections at the University of California, Santa Barbara library, told Ars that he frequently used the project as an archive and not just to listen to the recordings.

For Seubert, the videos that IA records of the 78 RPM albums capture more than just the audio of a certain era. Researchers like him want to look at the label, check out the copyright information, and note the catalogue numbers, he said.

“It has all this information there,” Seubert said. “I don’t even necessarily need to hear it,” he continued, adding, “just seeing the physicality of it, it’s like, ‘Okay, now I know more about this record.'”

Music publishers suing IA argue that all the songs included in their dispute—and likely many more, since the Great 78 Project spans 400,000 recordings—”are already available for streaming or downloading from numerous services.”

“These recordings face no danger of being lost, forgotten, or destroyed,” their filing claimed.

But Nathan Georgitis, the executive director of the Association for Recorded Sound Collections (ARSC), told Ars that you just don’t see 78 RPM records out in the world anymore. Even in record stores selling used vinyl, these recordings will be hidden “in a few boxes under the table behind the tablecloth,” Georgitis suggested. And in “many” cases, “the problem for libraries and archives is that those recordings aren’t necessarily commercially available for re-release.”

That “means that those recordings, those artists, the repertoire, the recorded sound history in itself—meaning the labels, the producers, the printings—all of that history kind of gets obscured from view,” Georgitis said.

Currently, libraries trying to preserve this history must control access to audio collections, Georgitis said. He sees IA’s work with the Great 78 Project as a legitimate archive in that, unlike a streaming service, where content may be inconsistently available, IA’s “mission is to preserve and provide access to content over time.”



ISP sued by record labels agrees to identify 100 users accused of piracy

Cable company Altice agreed to give Warner and other record labels the names and contact information of 100 broadband subscribers who were accused of pirating songs.

The subscribers “were the subject of RIAA or third party copyright notices,” said a court order that approved the agreement between Altice and the plaintiff record companies. Altice is notifying each subscriber “of Altice’s intent to disclose their name and contact information to Plaintiffs pursuant to this Order,” and telling the notified subscribers that they have 30 days to seek relief from the court.

If subscribers do not object within a month, Altice must disclose the subscribers’ names, phone numbers, addresses, and email addresses. The judge’s order was issued on February 12 and reported yesterday by TorrentFreak.

The names and contact information will be classified as “highly confidential—attorneys’ eyes only.” A separate order issued in April 2024 said that documents produced in discovery “shall be used by the Parties only in the litigation of this Action and shall not be used for any other purpose.”

Altice, which operates the Optimum brand, was sued in December 2023 in US District Court for the Eastern District of Texas. The music publishers’ complaint alleges that Altice “knowingly contributed to, and reaped substantial profits from, massive copyright infringement committed by thousands of its subscribers.”

The lawsuit said plaintiffs sent over 70,000 infringement notices to Altice from February 2020 through November 2023. At least a few subscribers were allegedly hit with hundreds of notices. The lawsuit gave three examples of IP addresses that were cited in 502, 781, and 926 infringement notices, respectively.

Altice failed to terminate repeat infringers whose IP addresses were flagged in these copyright notices, the lawsuit said. “Those notices advised Altice of its subscribers’ blatant and systematic use of Altice’s Internet service to illegally download, copy, and distribute Plaintiffs’ copyrighted music through BitTorrent and other online file-sharing services. Rather than working with Plaintiffs to curb this massive infringement, Altice did nothing, choosing to prioritize its own profits over its legal obligations,” the plaintiffs alleged.

ISPs face numerous lawsuits

This is one of numerous copyright lawsuits filed against broadband providers, and it’s not the first time an ISP handed names of subscribers to the plaintiffs. We have previously written articles about film studios trying to force Reddit to identify users who admitted torrenting in discussion forums. Reddit was able to avoid providing information in one case in part because the film studios already obtained identifying details for 118 subscribers directly from Grande, the ISP they had sued.
