NYT to start searching deleted ChatGPT logs after beating OpenAI in court


What are the odds NYT will access your ChatGPT logs in OpenAI court battle?

Last week, OpenAI raised objections in court, hoping to overturn a court order requiring the AI company to retain all ChatGPT logs “indefinitely,” including deleted and temporary chats.

But Sidney Stein, the US district judge reviewing OpenAI’s request, immediately denied OpenAI’s objections. He was seemingly unmoved by the company’s claims that the order forced OpenAI to abandon “long-standing privacy norms” and weaken privacy protections that users expect based on ChatGPT’s terms of service. Rather, Stein suggested that OpenAI’s user agreement specified that their data could be retained as part of a legal process, which Stein said is exactly what is happening now.

The order was issued by magistrate judge Ona Wang just days after news organizations, led by The New York Times, requested it. The news plaintiffs claimed the order was urgently needed to preserve potential evidence in their copyright case, alleging that ChatGPT users are likely to delete chats where they attempted to use the chatbot to skirt paywalls to access news content.

A spokesperson told Ars that OpenAI plans to “keep fighting” the order, but the ChatGPT maker seems to have few options left. The company could petition the Second Circuit Court of Appeals for a rarely granted emergency order blocking Wang’s order, but the appeals court would have to consider Wang’s order an extraordinary abuse of discretion for OpenAI to win that fight.

OpenAI’s spokesperson declined to confirm if the company plans to pursue this extreme remedy.

In the meantime, OpenAI is negotiating a process that will allow news plaintiffs to search through the retained data; perhaps the sooner that process begins, the sooner the data can be deleted. That possibility puts OpenAI in the difficult position of choosing between conceding to some data searches so that retention can end as soon as possible, or prolonging the fight over the order and potentially putting more users’ private conversations at risk of exposure through litigation or, worse, a data breach.

News orgs will soon start searching ChatGPT logs

The clock is ticking, and so far, OpenAI has not provided any official updates since a June 5 blog post detailing which ChatGPT users will be affected.

While it’s clear that OpenAI has been and will continue to retain mounds of data, it would be impossible for The New York Times or any news plaintiff to search through all that data.

Instead, only a small sample of the data will likely be accessed, based on keywords that OpenAI and news plaintiffs agree on. That data will remain on OpenAI’s servers, where it will be anonymized, and it will likely never be directly produced to plaintiffs.

Both sides are negotiating the exact process for searching through the chat logs, with both parties seemingly hoping to minimize the amount of time the chat logs will be preserved.

For OpenAI, sharing the logs risks revealing instances of infringing outputs that could further spike damages in the case. The logs could also expose how often outputs attribute misinformation to news plaintiffs.

But for news plaintiffs, accessing the logs is not considered key to their case, beyond perhaps providing additional examples of copying. Rather, the logs could help news organizations argue that ChatGPT dilutes the market for their content. That could weigh against the fair use argument, as a judge opined in a recent ruling that evidence of market dilution could tip an AI copyright case in favor of plaintiffs.

Jay Edelson, a leading consumer privacy lawyer, told Ars that he’s concerned that judges don’t seem to be considering that any evidence in the ChatGPT logs wouldn’t “advance” news plaintiffs’ case “at all,” while really changing “a product that people are using on a daily basis.”

Edelson warned that OpenAI itself probably has better security than most firms to protect against a potential data breach that could expose these private chat logs. But “lawyers have notoriously been pretty bad about securing data,” Edelson suggested, so “the idea that you’ve got a bunch of lawyers who are going to be doing whatever they are” with “some of the most sensitive data on the planet” and “they’re the ones protecting it against hackers should make everyone uneasy.”

So even though odds are pretty good that the majority of users’ chats won’t end up in the sample, Edelson said the mere threat of being included might push some users to rethink how they use AI. He further warned that ChatGPT users turning to OpenAI rival services like Anthropic’s Claude or Google’s Gemini could suggest that Wang’s order is improperly influencing market forces, which also seems “crazy.”

To Edelson, the most “cynical” take could be that news plaintiffs are possibly hoping the order will threaten OpenAI’s business to the point where the AI company agrees to a settlement.

Regardless of the news plaintiffs’ motives, the order sets an alarming precedent, Edelson said. He joined critics suggesting that more AI data may be frozen in the future, potentially affecting even more users as a result of the sweeping order surviving scrutiny in this case. Imagine if litigation one day targets Google’s AI search summaries, Edelson suggested.

Lawyer slams judges for giving ChatGPT users no voice

Edelson told Ars that the order is so potentially threatening to OpenAI’s business that the company may not have a choice but to explore every path available to continue fighting it.

“They will absolutely do something to try to stop this,” Edelson predicted, calling the order “bonkers” for overlooking millions of users’ privacy concerns while “strangely” excluding enterprise customers.

From court filings, it seems possible that enterprise users were excluded to protect OpenAI’s competitiveness, but Edelson suggested there’s “no logic” to their exclusion “at all.” By excluding these ChatGPT users, the judge’s order may have removed the users best resourced to fight the order, Edelson suggested.

“What that means is the big businesses, the ones who have the power, all of their stuff remains private, and no one can touch that,” Edelson said.

Instead, the order is “only going to intrude on the privacy of the common people out there,” which Edelson said “is really offensive,” given that Wang denied two ChatGPT users’ panicked request to intervene.

“We are talking about billions of chats that are now going to be preserved when they weren’t going to be preserved before,” Edelson said, noting that he’s input information about his personal medical history into ChatGPT. “People ask for advice about their marriages, express concerns about losing jobs. They say really personal things. And one of the bargains in dealing with OpenAI is that you’re allowed to delete your chats and you’re allowed to [use] temporary chats.”

The greatest risk to users would be a data breach, Edelson said, but that’s not the only potential privacy concern. Corynne McSherry, legal director for the digital rights group the Electronic Frontier Foundation, previously told Ars that as long as users’ data is retained, it could also be exposed through future law enforcement and private litigation requests.

Edelson pointed out that most privacy attorneys don’t consider OpenAI CEO Sam Altman to be a “privacy guy,” despite Altman recently slamming the NYT, alleging it sued OpenAI because it doesn’t “like user privacy.”

“He’s trying to protect OpenAI, and he does not give a hoot about the privacy rights of consumers,” Edelson said, echoing one ChatGPT user’s dismissed concern that OpenAI may not prioritize users’ privacy concerns in the case if it’s financially motivated to resolve the case.

“The idea that he and his lawyers are really going to be the safeguards here isn’t very compelling,” Edelson said. He criticized the judges for dismissing users’ concerns and rejecting OpenAI’s request that users get a chance to testify.

“What’s really most appalling to me is the people who are being affected have had no voice in it,” Edelson said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

In a wild time for copyright law, the US Copyright Office has no leader


Rudderless Copyright Office has taken on new prominence during the AI boom.

It’s a tumultuous time for copyright in the United States, with dozens of potentially economy-shaking AI copyright lawsuits winding through the courts. It’s also the most turbulent moment in the US Copyright Office’s history. Described as “sleepy” in the past, the Copyright Office has taken on new prominence during the AI boom, issuing key rulings about AI and copyright. It also hasn’t had a leader in more than a month.

In May, Copyright Register Shira Perlmutter was abruptly fired by email by the White House’s deputy director of personnel. Perlmutter is now suing the Trump administration, alleging that her firing was invalid; the government maintains that the executive branch has the authority to dismiss her. As the legality of the ouster is debated, the reality within the office is this: There’s effectively nobody in charge. And without a leader actually showing up at work, the Copyright Office is not totally business-as-usual; in fact, there’s debate over whether the copyright certificates it’s issuing could be challenged.

The firing followed a pattern. The USCO is part of the Library of Congress; Perlmutter had been appointed to her role by Librarian of Congress Carla Hayden. A few days before Perlmutter’s dismissal, Hayden, who had been in her role since 2016, was also fired by the White House via email. The White House appointed Deputy Attorney General Todd Blanche, who had previously served as President Trump’s defense attorney, as the new acting Librarian of Congress.

Two days after Perlmutter’s firing, Justice Department official Paul Perkins showed up at the Copyright Office, along with his colleague Brian Nieves. According to an affidavit from Perlmutter, they were carrying “printed versions of emails” from Blanche indicating that they had been appointed to new roles within the Copyright Office. Perkins, the email said, was designated as Acting Register of Copyrights. In other words, he was Perlmutter’s replacement.

But was Blanche actually the acting Librarian, and thus able to appoint Perkins as such? Within the Library of Congress, someone else had already assumed the role—Robert Newlen, Hayden’s former second-in-command, who has worked at the LOC since the 1970s. Following Hayden’s ouster, Newlen emailed LOC staff asserting that he was the acting Librarian—never mentioning Blanche—and noting that “Congress is engaged with the White House” on how to proceed.

In her lawsuit, Perlmutter argues that only the Librarian of Congress can fire and appoint a new Register. In a filing on Tuesday, defendants argued that the president does indeed have the authority to fire and appoint the Librarian of Congress and that his appointees then have the ability to choose a new Copyright Register.

Neither the Department of Justice nor the White House responded to requests for comment on this issue; the Library of Congress declined to comment.

On the day they showed up, Perkins and Nieves did not enter the USCO office or assume the roles they purported to fill. And since they left, sources within the Library of Congress tell WIRED, they have never returned, nor have they assumed any of the duties associated with the roles. These sources say that Congress is in talks with the White House to reach an agreement over these personnel disputes.

A congressional aide familiar with the situation told WIRED that Blanche, Perkins, and Nieves had not shown up for work “because they don’t have jobs to show up to.” The aide continued: “As we’ve always maintained, the President has no authority to appoint them. Robert Newlen has always been the Acting Librarian of Congress.”

If talks are happening, they remain out of public view. But Perlmutter does have some members of Congress openly on her side. “The president has no authority to remove the Register of Copyrights. That power lies solely with the Librarian of Congress. I’m relieved that the situation at the Library and Copyright Office has stabilized following the administration’s unconstitutional attempt to seize control for the executive branch. I look forward to quickly resolving this matter in a bipartisan way,” Senator Alex Padilla tells WIRED in a statement.

In the meantime, the Copyright Office is in the odd position of attempting to carry on as though it weren’t missing its head. Immediately after Perlmutter’s dismissal, the Copyright Office paused issuing registration certificates “out of an abundance of caution,” according to USCO spokesperson Lisa Berardi Marflak, who says the pause impacted around 20,000 registrations. It resumed activities on May 29 but is now sending out registration certificates with a blank spot where Perlmutter’s signature would ordinarily be.

This unusual change has prompted discussion amongst copyright experts as to whether the registrations are now more vulnerable to legal challenges. The Copyright Office maintains that they are valid: “There is no requirement that the Register’s signature must appear on registration certificates,” says Berardi Marflak.

In a motion related to Perlmutter’s lawsuit, though, she alleges that sending out the registrations without a signature opens them up to “challenges in litigation,” something outside copyright experts have also pointed out. “It’s true the law doesn’t explicitly require a signature,” IP lawyer Rachael Dickson says. “However, the law really explicitly says that it’s the Register of Copyright determining whether the material submitted for the application is copyrightable subject matter.”

Without anyone acting as Register, Dickson thinks it would be reasonable to argue that the statutory requirements are not being met. “If you take them completely out of the equation, you have a really big problem,” she says. “Litigators who are trying to challenge a copyright registration’s validity will jump on this.”

Perlmutter’s lawyers have argued that leaving the Copyright Office without an active boss will cause dysfunction beyond the registration certificate issue, as the Register performs a variety of tasks, from advising Congress on copyright to recertifying organizations like the Mechanical Licensing Collective, the nonprofit in charge of administering royalties for streaming and download music in the United States. Since the MLC’s certification is up right now, Perlmutter would ordinarily be moving forward with recertifying the organization; as her lawsuit notes, right now, the recertification process is not moving forward.

The MLC may not be as impacted by Perlmutter’s absence as the complaint suggests. A source close to the MLC told WIRED that the organization does indeed need to be recertified but that the law doesn’t require the recertification process to be completed within a specific time frame, so it will be able to continue operating as usual.

Still, there are other ways that the lack of a boss is a clear liability. The Copyright Claims Board, a three-person tribunal that resolves some copyright disputes, needs to replace one of its members this year, as a current board member, who did not reply to a request for comment, is leaving. The job posting is already live and says applications are being reviewed, but as the position is supposed to be appointed by the Librarian of Congress with the guidance of the Copyright Register, it’s unclear how exactly it will be filled. A source at the Library of Congress tells WIRED that Newlen could make the appointment if necessary, but they “expect there to be some kind of greater resolution by then.”

As they wait for the resolution, it remains an especially inopportune time for a headless Copyright Office. Perlmutter was fired just days after the office released a hotly contested report on generative AI training and fair use. That report has already been heavily cited in a new class action lawsuit against AI tools Suno and Udio, even though it was technically a “prepublication” version and not finalized. But everyone looking to see what a final report will say—or what guidance the office will issue next—can only keep waiting.

This story originally appeared on wired.com.


Wired.com is your essential daily guide to what’s next, delivering the most original and complete take you’ll find anywhere on innovation’s impact on technology, science, business and culture.

Judge: Pirate libraries may have profited from Meta torrenting 80TB of books

It could certainly look worse for Meta if authors manage to present evidence supporting the second way that torrenting could be relevant to the case, Chhabria suggested.

“Meta downloading copyrighted material from shadow libraries” would also be relevant to the character of the use, “if it benefitted those who created the libraries and thus supported and perpetuated their unauthorized copying and distribution of copyrighted works,” Chhabria wrote.

Counting potential strikes against Meta, Chhabria pointed out that the “vast majority of cases” involving “this sort of peer-to-peer file-sharing” are found to “constitute copyright infringement.” And it likely doesn’t help Meta’s case that “some of the libraries Meta used have themselves been found liable for infringement.”

However, Meta may overcome this argument, too, since book authors “have not submitted any evidence” showing that Meta’s downloading may be “propping up” or financially benefiting pirate libraries.

Finally, Chhabria noted that the “last issue relating to the character of Meta’s use” of books with regard to its torrenting is “the relationship between Meta’s downloading of the plaintiffs’ books and Meta’s use of the books to train Llama.”

Authors had tried to argue that these elements were distinct. But Chhabria said there’s no separating the fact that Meta downloaded the books to serve the “highly transformative” purpose of training Llama.

“Because Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books,” Chhabria wrote.

AI training rulings may get more authors paid

Authors only learned of Meta’s torrenting through discovery in the lawsuit, and because of that, Chhabria noted that “the record on Meta’s alleged distribution is incomplete.”

It’s possible that authors may be able to show evidence that Meta “contributed to the BitTorrent network” by providing significant computing power that could’ve meaningfully assisted shadow libraries, Chhabria said in a footnote.

Key fair use ruling clarifies when books can be used for AI training

“This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” Alsup wrote. “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

But Alsup said that the Anthropic case may not even need to decide on that, since Anthropic’s retention of pirated books for its research library alone was not transformative. Alsup wrote that Anthropic’s argument to hold onto potential AI training material it pirated in case it ever decided to use it for AI training was an attempt to “fast glide over thin ice.”

Additionally, Alsup pointed out that Anthropic’s early attempts to get permission to train on authors’ works withered, as internal messages revealed the company concluded that stealing books was the more cost-effective path to innovation “to avoid ‘legal/practice/business slog,’ as cofounder and chief executive officer Dario Amodei put it.”

“Anthropic is wrong to suppose that so long as you create an exciting end product, every ‘back-end step, invisible to the public,’ is excused,” Alsup wrote. “Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.”

To avoid maximum damages in the event of a loss, Anthropic will likely continue arguing that replacing pirated books with purchased books should water down authors’ fight, Alsup’s order suggested.

“That Anthropic later bought a copy of a book it earlier stole off the Internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages,” Alsup noted.

OpenAI is retaining all ChatGPT logs “indefinitely.” Here’s who’s affected.

In the copyright fight, Magistrate Judge Ona Wang granted the order within one day of the NYT’s request. She agreed with news plaintiffs that it seemed likely that ChatGPT users may be spooked by the lawsuit and possibly set their chats to delete when using the chatbot to skirt NYT paywalls. Because OpenAI wasn’t sharing deleted chat logs, the news plaintiffs had no way of proving that, she suggested.

Now, OpenAI is not only asking Wang to reconsider but has “also appealed this order with the District Court Judge,” the Thursday statement said.

“We strongly believe this is an overreach by the New York Times,” Lightcap said. “We’re continuing to appeal this order so we can keep putting your trust and privacy first.”

Who can access deleted chats?

To protect users, OpenAI provides an FAQ that clearly explains why their data is being retained and how it could be exposed.

For example, the statement noted that the order doesn’t impact OpenAI API business customers under Zero Data Retention agreements because their data is never stored.

And for users whose data is affected, OpenAI noted that their deleted chats could be accessed, but they won’t “automatically” be shared with The New York Times. Instead, the retained data will be “stored separately in a secure system” and “protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations,” OpenAI explained.

Of course, with the court battle ongoing, the FAQ did not have all the answers.

Nobody knows how long OpenAI may be required to retain the deleted chats. Likely seeking to reassure users—some of whom appeared to be considering switching to a rival service until the order lifts—OpenAI noted that “only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.”

Judge on Meta’s AI training: “I just don’t understand how that can be fair use”


Judge downplayed Meta’s “messed up” torrenting in lawsuit over AI training.

A judge who may be the first to rule on whether AI training data is fair use appeared skeptical Thursday at a hearing where Meta faced off with book authors over the social media company’s alleged copyright infringement.

Meta, like most AI companies, holds that training must be deemed fair use, or else the entire AI industry could face immense setbacks, wasting precious time negotiating data contracts while falling behind global rivals. Meta urged the court to rule that AI training is a transformative use that only references books to create an entirely new work that doesn’t replicate authors’ ideas or replace books in their markets.

At the hearing, which followed both sides’ requests for summary judgment, Judge Vince Chhabria pushed back on Meta attorneys’ argument that the company’s Llama AI models posed no threat to authors in their markets, Reuters reported.

“You have companies using copyright-protected material to create a product that is capable of producing an infinite number of competing products,” Chhabria said. “You are dramatically changing, you might even say obliterating, the market for that person’s work, and you’re saying that you don’t even have to pay a license to that person.”

Declaring, “I just don’t understand how that can be fair use,” the shrewd judge apparently stoked little response from Meta’s attorney, Kannon Shanmugam, apart from a suggestion that any alleged threat to authors’ livelihoods was “just speculation,” Wired reported.

Authors may need to sharpen their case, which Chhabria warned could be “taken away by fair use” if none of the authors suing, including Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey, can show “that the market for their actual copyrighted work is going to be dramatically affected.”

Determined to probe this key question, Chhabria pushed authors’ attorney, David Boies, to point to specific evidence of market harms that seemed noticeably missing from the record.

“It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected by the billions of things that Llama will ultimately be capable of producing,” Chhabria said. “And it’s just not obvious to me that that’s the case.”

But if authors can prove fears of market harms are real, Meta might struggle to win over Chhabria, and that could set a precedent impacting copyright cases challenging AI training on other kinds of content.

The judge repeatedly appeared to be sympathetic to authors, suggesting that Meta’s AI training may be a “highly unusual case” where even though “the copying is for a highly transformative purpose, the copying has the high likelihood of leading to the flooding of the markets for the copyrighted works.”

And when Shanmugam argued that copyright law doesn’t shield authors from “protection from competition in the marketplace of ideas,” Chhabria resisted the framing that authors weren’t potentially being robbed, Reuters reported.

“But if I’m going to steal things from the marketplace of ideas in order to develop my own ideas, that’s copyright infringement, right?” Chhabria responded.

Wired noted that he asked Meta’s lawyers, “What about the next Taylor Swift?” If AI made it easy to knock off a young singer’s sound, how could she ever compete if AI produced “a billion pop songs” in her style?

In a statement, Meta’s spokesperson reiterated the company’s defense that AI training is fair use.

“Meta has developed transformational open source AI models that are powering incredible innovation, productivity, and creativity for individuals and companies,” Meta’s spokesperson said. “Fair use of copyrighted materials is vital to this. We disagree with Plaintiffs’ assertions, and the full record tells a different story. We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all.”

Meta’s torrenting seems “messed up”

Some have pondered why Chhabria appeared so focused on market harms, instead of hammering Meta for admittedly illegally pirating books that it used for its AI training, which seems to be obvious copyright infringement. According to Wired, “Chhabria spoke emphatically about his belief that the big question is whether Meta’s AI tools will hurt book sales and otherwise cause the authors to lose money,” not whether Meta’s torrenting of books was illegal.

The torrenting “seems kind of messed up,” Chhabria said, but “the question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”

It’s possible that Chhabria dodged the question for procedural reasons. In a court filing, Meta argued that authors had moved for summary judgment on Meta’s alleged copying of their works, not on “unsubstantiated allegations that Meta distributed Plaintiffs’ works via torrent.”

In the court filing, Meta argued that even if Chhabria agreed with the authors that “summary judgment is warranted on the basis of Meta’s distribution, as well as Meta’s copying,” the authors “lack evidence to show that Meta distributed any of their works.”

According to Meta, authors abandoned any claims that Meta’s seeding of the torrented files served to distribute works, leaving only claims about Meta’s leeching. Meta argued that the authors “admittedly lack evidence that Meta ever uploaded any of their works, or any identifiable part of those works, during the so-called ‘leeching’ phase,” relying instead on expert estimates based on how torrenting works.

It’s also possible that for Chhabria, the torrenting question seemed like an unnecessary distraction. Former Meta attorney Mark Lemley, who quit the case earlier this year, told Vanity Fair that the torrenting was “one of those things that sounds bad but actually shouldn’t matter at all in the law. Fair use is always about uses the plaintiff doesn’t approve of; that’s why there is a lawsuit.”

Lemley suggested that court cases mulling fair use at this moment should focus on the outputs rather than the training. Citing the ruling that deemed Google Books’ scanning of books to share excerpts fair use, Lemley argued that “all search engines crawl the full Internet, including plenty of pirated content,” so there’s seemingly no reason to stop AI crawling.

But the Copyright Alliance, a nonprofit, non-partisan group supporting the authors in the case, in a court filing alleged that Meta, in its bid to get AI products viewed as transformative, is aiming to do the opposite. “When describing the purpose of generative AI,” Meta allegedly strives to convince the court to “isolate the ‘training’ process and ignore the output of generative AI,” because that’s seemingly the only way that Meta can convince the court that AI outputs serve “a manifestly different purpose from Plaintiffs’ books,” the Copyright Alliance argued.

“Meta’s motion ignores what comes after the initial ‘training’—most notably the generation of output that serves the same purpose of the ingested works,” the Copyright Alliance argued. And the torrenting question should matter, the group argued, because unlike in Google Books, Meta’s AI models are apparently training on pirated works, not “legitimate copies of books.”

Chhabria will not be making a snap decision in the case. He plans to take his time, and the longer he deliberates, the more pressure it likely puts not just on Meta but on every AI company defending training as fair use. Understanding that the entire AI industry potentially has a stake in the ruling, Chhabria apparently sought to relieve some tension at the end of the hearing with a joke, Wired reported.

“I will issue a ruling later today,” Chhabria said. “Just kidding! I will take a lot longer to think about it.”

Judge calls out OpenAI’s “straw man” argument in New York Times copyright suit

“Taken as true, these facts give rise to a plausible inference that defendants at a minimum had reason to investigate and uncover end-user infringement,” Stein wrote.

To Stein, the fact that OpenAI maintains an “ongoing relationship” with users by providing outputs that respond to users’ prompts also supports contributory infringement claims, despite OpenAI’s argument that ChatGPT’s “substantial noninfringing uses” should shield it from liability.

OpenAI defeated some claims

Stein’s ruling is likely a disappointment for OpenAI, although he did dismiss some of the NYT’s claims.

In a blow to news publishers, those included a “free-riding” claim that ChatGPT unfairly profits off time-sensitive “hot news” items, including the NYT’s Wirecutter posts. Stein explained that news publishers failed to plausibly allege non-attribution (which is key to a free-riding claim) because, for example, ChatGPT cites the NYT when sharing information from Wirecutter posts. Those claims are preempted by the Copyright Act anyway, Stein wrote, granting OpenAI’s motion to dismiss.

Stein also dismissed a claim from the NYT regarding alleged removal of copyright management information (CMI), which Stein said cannot be proven simply because ChatGPT reproduces excerpts of NYT articles without CMI.

The Digital Millennium Copyright Act (DMCA) requires news publishers to show that ChatGPT’s outputs are “close to identical” to the original work, Stein said, and allowing publishers’ claims based on excerpts “would risk boundless DMCA liability”—including for any use of block quotes without CMI.

Asked for comment on the ruling, an OpenAI spokesperson declined to go into any specifics, instead repeating OpenAI’s long-held argument that AI training on copyrighted works is fair use. (Last month, OpenAI warned Donald Trump that the US would lose the AI race to China if courts ruled against that argument.)

“ChatGPT helps enhance human creativity, advance scientific discovery and medical research, and enable hundreds of millions of people to improve their daily lives,” OpenAI’s spokesperson said. “Our models empower innovation, and are trained on publicly available data and grounded in fair use.”

Music labels will regret coming for the Internet Archive, sound historian says

But David Seubert, who manages sound collections at the University of California, Santa Barbara library, told Ars that he frequently used the project as an archive and not just to listen to the recordings.

For Seubert, the videos that IA records of the 78 RPM albums capture more than audio of a certain era. Researchers like him want to look at the label, check out the copyright information, and note the catalogue numbers, he said.

“It has all this information there,” Seubert said. “I don’t even necessarily need to hear it,” he continued, adding, “just seeing the physicality of it, it’s like, ‘Okay, now I know more about this record.'”

Music publishers suing IA argue that all the songs included in their dispute—and likely many more, since the Great 78 Project spans 400,000 recordings—”are already available for streaming or downloading from numerous services.”

“These recordings face no danger of being lost, forgotten, or destroyed,” their filing claimed.

But Nathan Georgitis, the executive director of the Association for Recorded Sound Collections (ARSC), told Ars that you just don’t see 78 RPM records out in the world anymore. Even in record stores selling used vinyl, these recordings will be hidden “in a few boxes under the table behind the tablecloth,” Georgitis suggested. And in “many” cases, “the problem for libraries and archives is that those recordings aren’t necessarily commercially available for re-release.”

That “means that those recordings, those artists, the repertoire, the recorded sound history in itself—meaning the labels, the producers, the printings—all of that history kind of gets obscured from view,” Georgitis said.

Currently, libraries trying to preserve this history must control access to audio collections, Georgitis said. He sees IA’s Great 78 Project as a legitimate archive because, unlike a streaming service where content may be inconsistently available, IA’s “mission is to preserve and provide access to content over time.”

Meta claims torrenting pirated books isn’t illegal without proof of seeding

Just because Meta admitted to torrenting a dataset of pirated books for AI training purposes, that doesn’t necessarily mean that Meta seeded the file after downloading it, the social media company claimed in a court filing this week.

Evidence instead shows that Meta “took precautions not to ‘seed’ any downloaded files,” Meta’s filing said. Seeding refers to sharing a torrented file after the download completes, and because there’s allegedly no proof of such “seeding,” Meta insisted that authors cannot prove Meta shared the pirated books with anyone during the torrenting process.

Whether or not Meta actually seeded the pirated books could make a difference in a copyright lawsuit from book authors including Richard Kadrey, Sarah Silverman, and Ta-Nehisi Coates. Authors had previously alleged that Meta unlawfully copied and distributed their works through AI outputs—an increasingly common complaint that so far has barely been litigated. But Meta’s admission to torrenting appears to add a more straightforward claim of unlawful distribution of copyrighted works through illegal torrenting, which courts have long treated as infringement under well-established case law.

Authors have alleged that “Meta deliberately engaged in one of the largest data piracy campaigns in history to acquire text data for its LLM training datasets, torrenting and sharing dozens of terabytes of pirated data that altogether contain many millions of copyrighted works.” Separate from their copyright infringement claims opposing Meta’s AI training on pirated copies of their books, authors alleged that Meta torrenting the dataset was “independently illegal” under California’s Computer Data Access and Fraud Act (CDAFA), which allegedly “prevents the unauthorized taking of data, including copyrighted works.”

Meta, however, is hoping to convince the court that torrenting is not in and of itself illegal, but is, rather, a “widely-used protocol to download large files.” According to Meta, the decision to download the pirated books dataset from pirate libraries like LibGen and Z-Library was simply a move to access “data from a ‘well-known online repository’ that was publicly available via torrents.”

”Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed

Emails discussing torrenting prove that Meta knew it was “illegal,” authors alleged. And Bashlykov’s warnings seemingly fell on deaf ears, with authors alleging that evidence showed Meta chose to instead hide its torrenting as best it could while downloading and seeding terabytes of data from multiple shadow libraries as recently as April 2024.

Meta allegedly concealed seeding

Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader” from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as in “stealth mode.” Meta also allegedly modified settings “so that the smallest amount of seeding possible could occur,” a Meta executive in charge of project management, Michael Clark, said in a deposition.

Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again, because allegedly the new facts “contradict prior deposition testimony.”

Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the “decision to use LibGen occurred” after “a prior escalation to MZ,” authors alleged.

Meta did not immediately respond to Ars’ request for comment and has maintained throughout the litigation that AI training on LibGen was “fair use.”

However, Meta has previously addressed its torrenting in a motion to dismiss filed last month, telling the court that “plaintiffs do not plead a single instance in which any part of any book was, in fact, downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta.”

While Meta may be confident in its legal strategy despite the new torrenting wrinkle, the social media company has seemingly complicated its case by allowing authors to expand the distribution theory that’s key to winning a direct copyright infringement claim beyond just claiming that Meta’s AI outputs unlawfully distributed their works.

As limited discovery on Meta’s seeding now proceeds, Meta is not fighting the seeding aspect of the direct copyright infringement claim at this time, telling the court that it plans to “set… the record straight and debunk… this meritless allegation on summary judgment.”

Copyright Office suggests AI copyright debate was settled in 1965


Most people think purely AI-generated works shouldn’t be copyrighted, report says.

Ars used Copilot to generate this AI image using the precise prompt the Copyright Office used to determine that prompting alone isn’t authorship. Credit: AI image generated by Copilot

The US Copyright Office issued AI guidance this week that declared no laws need to be clarified when it comes to protecting authorship rights of humans producing AI-assisted works.

“Questions of copyrightability and AI can be resolved pursuant to existing law, without the need for legislative change,” the Copyright Office said.

More than 10,000 commenters weighed in on the guidance, with some hoping to convince the Copyright Office to guarantee more protections for artists as AI technologies advance and the line between human- and AI-created works seems to increasingly blur.

But the Copyright Office insisted that the AI copyright debate was settled in 1965 after commercial computer technology started advancing quickly and “difficult questions of authorship” were first raised. That was the first time officials had to ponder how much involvement human creators had in works created using computers.

Back then, the Register of Copyrights, Abraham Kaminstein—who was also instrumental in codifying fair use—suggested that “there is no one-size-fits-all answer” to copyright questions about computer-assisted human authorship. And the Copyright Office agrees that’s still the case today.

“Very few bright-line rules are possible,” the Copyright Office said, with one obvious exception. Because of “insufficient human control over the expressive elements” of resulting works, “if content is entirely generated by AI, it cannot be protected by copyright.”

The office further clarified that doesn’t mean that works assisted by AI can never be copyrighted.

“Where AI merely assists an author in the creative process, its use does not change the copyrightability of the output,” the Copyright Office said.

Following Kaminstein’s advice, officials plan to continue reviewing AI disclosures and weighing, on a case-by-case basis, what parts of each work are AI-authored and which parts are human-authored. Any human-authored expressive element can be copyrighted, the office said, but any aspect of the work deemed to have been generated purely by AI cannot.

Prompting alone isn’t authorship, Copyright Office says

After testing whether the exact same prompt can generate widely varied outputs, even from the same AI tool, the Copyright Office further concluded that “prompts do not alone provide sufficient control” over outputs to allow creators to copyright purely AI-generated works based on highly intelligent or creative prompting.

That decision could change, the Copyright Office said, if AI technologies provide more human control over outputs through prompting.

New guidance noted, for example, that some AI tools allow prompts or other inputs “to be substantially retained as part of the output.” Consider an artist uploading an original drawing, the Copyright Office suggested, and prompting AI to modify colors, or an author uploading an original piece and using AI to translate it. And “other generative AI systems also offer tools that similarly allow users to exert control over the selection, arrangement, and content of the final output.”

The Copyright Office drafted this prompt to test artists’ control over expressive inputs that are retained in AI outputs. Credit: Copyright Office

“Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output,” the guidelines said.

But because officials concluded that even the most iterative prompting—even slowly, repeatedly prompting AI to produce the exact vision in an artist’s head—doesn’t perfectly control the resulting outputs, some artists are sure to be disappointed. One artist behind a controversial prize-winning AI-generated artwork has staunchly defended his rigorous AI prompting as authorship.

However, if “even expert researchers are limited in their ability to understand or predict the behavior of specific models,” the Copyright Office said it struggled to see how artists could. To further prove their point, officials drafted a lengthy, quirky prompt about a cat reading a Sunday newspaper to compare different outputs from the same AI image generator.

Copyright Office drafted a quirky, lengthy prompt to test creative control over AI outputs. Credit: Copyright Office

Officials apparently agreed with Adobe, which submitted a comment advising the Copyright Office that any output is “based solely on the AI’s interpretation of that prompt.” Academics further warned that copyrighting outputs based only on prompting could lead copyright law to “effectively vest” authorship adopters with “rights in ideas.”

“The Office concludes that, given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectable ideas,” the guidance said. “While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.”

Hundreds of AI artworks are copyrighted, officials say

The Copyright Office repeatedly emphasized that most commenters agreed with the majority of their conclusions. Officials also stressed that hundreds of AI artworks submitted for registration, under existing law, have been approved to copyright the human-authored elements of their works. Rejections are apparently expected to be less common.

“In most cases,” the Copyright Office said, “humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship.”

For stakeholders who have been awaiting this guidance for months, the Copyright Office report may not change the law, but it offers some clarity.

For some artists who hoped to push the Copyright Office to adapt laws, the guidelines may disappoint, leaving many questions about a world of possible creative AI uses unanswered. But while a case-by-case approach may leave some artists unsure about which parts of their works are copyrightable, seemingly common cases are being resolved more readily. According to the Copyright Office, after each decision, it gets easier to register AI works that meet similar standards for copyrightability. Perhaps over time, artists will grow more secure in how they use AI and whether it will impact their exclusive rights to distribute works.

That’s likely cold comfort for the artist advocating for prompting alone to constitute authorship. One AI artist told Ars in October that being denied a copyright has meant enduring mockery and watching his award-winning work used freely anywhere online, without his permission and without payment. But in the end, the Copyright Office was apparently more sympathetic to other commenters who warned that humanity’s progress in the arts could be hampered if a flood of easily generated, copyrightable AI works drowned too many humans out of the market.

“We share the concerns expressed about the impact of AI-generated material on human authors and the value that their creative expression provides to society. If a flood of easily and rapidly AI-generated content drowns out human-authored works in the marketplace, additional legal protection would undermine rather than advance the goals of the copyright system. The availability of vastly more works to choose from could actually make it harder to find inspiring or enlightening content.”

New guidance likely a big yawn for AI companies

For AI companies, the copyright guidance may mean very little. According to AI company Hugging Face’s comments to the Copyright Office, no changes in the law were needed to ensure the US continued leading in AI innovation, because “very little to no innovation in generative AI is driven by the hope of obtaining copyright protection for model outputs.”

Hugging Face’s Head of ML & Society, Yacine Jernite, told Ars that the Copyright Office seemed to “take a constructive approach” to answering some of artists’ biggest questions about AI.

“We believe AI should support, not replace, artists,” Jernite told Ars. “For that to happen, the value of creative work must remain in its human contribution, regardless of the tools used.”

Although the Copyright Office suggested that this week’s report might be the most highly anticipated, Jernite said that Hugging Face is eager to see the next report, which officials said would focus on “the legal implications of training AI models on copyrighted works, including licensing considerations and the allocation of any potential liability.”

“As a platform that supports broader participation in AI, we see more value in distributing its benefits than in concentrating all control with a few large model providers,” Jernite said. “We’re looking forward to the next part of the Copyright Office’s Report, particularly on training data, licensing, and liability, key questions especially for some types of output, like code.”


Democrat teams up with movie industry to propose website-blocking law

US Rep. Zoe Lofgren (D-Calif.) today proposed a law that would let copyright owners obtain court orders requiring Internet service providers to block access to foreign piracy websites. The bill would also force DNS providers to block sites.

Lofgren said in a press release that she “work[ed] for over a year with the tech, film, and television industries” on “a proposal that has a remedy for copyright infringers located overseas that does not disrupt the free Internet except for the infringers.” Lofgren said she plans to work with Republican leaders to enact the bill.

Lofgren’s press release includes a quote from Charles Rivkin, chairman and CEO of the Motion Picture Association (MPA). As we’ve previously written, the MPA has been urging Congress to pass a site-blocking law.

“More than 55 nations around the world, including democracies such as Canada, the United Kingdom, and Australia, have put in place tools similar to those proposed by Rep. Lofgren, and they have successfully reduced piracy’s harms while protecting consumer access to legal content,” Rivkin was quoted as saying in Lofgren’s press release today.

Lofgren is the ranking member of the House Science, Space, and Technology Committee and a member of the House Subcommittee on Courts, Intellectual Property, Artificial Intelligence and the Internet.

Bill called “censorious site-blocking” measure

Although Lofgren said her proposed Foreign Anti-Digital Piracy Act “preserves the open Internet,” consumer advocacy group Public Knowledge described the bill as a “censorious site-blocking” measure “that turns broadband providers into copyright police at Americans’ expense.”

“Rather than attacking the problem at its source—bringing the people running overseas piracy websites to court—Congress and its allies in the entertainment industry have decided to build out a sweeping infrastructure for censorship,” Public Knowledge Senior Policy Counsel Meredith Rose said. “Site-blocking orders force any service provider, from residential broadband providers to global DNS resolvers, to disrupt traffic from targeted websites accused of copyright infringement. More importantly, applying blocking orders to global DNS resolvers results in global blocks. This means that one court can cut off access to a website globally, based on one individual’s filing and an expedited procedure. Blocking orders are incredibly powerful weapons, ripe for abuse, and we’ve seen the messy consequences of them being implemented in other countries.”
