copyright

”torrenting-from-a-corporate-laptop-doesn’t-feel-right”:-meta-emails-unsealed

”Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed

Emails discussing torrenting prove that Meta knew it was “illegal,” authors alleged. And Bashlykov’s warnings seemingly landed on deaf ears, with authors alleging that evidence showed Meta chose to instead hide its torrenting as best it could while downloading and seeding terabytes of data from multiple shadow libraries as recently as April 2024.

Meta allegedly concealed seeding

Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader” from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as in “stealth mode.” Meta also allegedly modified settings “so that the smallest amount of seeding possible could occur,” a Meta executive in charge of project management, Michael Clark, said in a deposition.

Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again, because allegedly the new facts “contradict prior deposition testimony.”

Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the “decision to use LibGen occurred” after “a prior escalation to MZ,” authors alleged.

Meta did not immediately respond to Ars’ request for comment and has maintained throughout the litigation that AI training on LibGen was “fair use.”

However, Meta has previously addressed its torrenting in a motion to dismiss filed last month, telling the court that “plaintiffs do not plead a single instance in which any part of any book was, in fact, downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta.”

While Meta may be confident in its legal strategy despite the new torrenting wrinkle, the social media company has seemingly complicated its case by allowing authors to expand the distribution theory that’s key to winning a direct copyright infringement claim beyond just claiming that Meta’s AI outputs unlawfully distributed their works.

As limited discovery on Meta’s seeding now proceeds, Meta is not fighting the seeding aspect of the direct copyright infringement claim at this time, telling the court that it plans to “set… the record straight and debunk… this meritless allegation on summary judgment.”

”Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed Read More »

copyright-office-suggests-ai-copyright-debate-was-settled-in-1965

Copyright Office suggests AI copyright debate was settled in 1965


Most people think purely AI-generated works shouldn’t be copyrighted, report says.

Ars used Copilot to generate this AI image using the precise prompt the Copyright Office used to determine that prompting alone isn’t authorship. Credit: AI image generated by Copilot

The US Copyright Office issued AI guidance this week that declared no laws need to be clarified when it comes to protecting authorship rights of humans producing AI-assisted works.

“Questions of copyrightability and AI can be resolved pursuant to existing law, without the need for legislative change,” the Copyright Office said.

More than 10,000 commenters weighed in on the guidance, with some hoping to convince the Copyright Office to guarantee more protections for artists as AI technologies advance and the line between human- and AI-created works seems to increasingly blur.

But the Copyright Office insisted that the AI copyright debate was settled in 1965 after commercial computer technology started advancing quickly and “difficult questions of authorship” were first raised. That was the first time officials had to ponder how much involvement human creators had in works created using computers.

Back then, the Register of Copyrights, Abraham Kaminstein—who was also instrumental in codifying fair use—suggested that “there is no one-size-fits-all answer” to copyright questions about computer-assisted human authorship. And the Copyright Office agrees that’s still the case today.

“Very few bright-line rules are possible,” the Copyright Office said, with one obvious exception. Because of “insufficient human control over the expressive elements” of resulting works, “if content is entirely generated by AI, it cannot be protected by copyright.”

The office further clarified that doesn’t mean that works assisted by AI can never be copyrighted.

“Where AI merely assists an author in the creative process, its use does not change the copyrightability of the output,” the Copyright Office said.

Following Kaminstein’s advice, officials plan to continue reviewing AI disclosures and weighing, on a case-by-case basis, what parts of each work are AI-authored and which parts are human-authored. Any human-authored expressive element can be copyrighted, the office said, but any aspect of the work deemed to have been generated purely by AI cannot.

Prompting alone isn’t authorship, Copyright Office says

After doing some testing on whether the same exact prompt can generate widely varied outputs, even from the same AI tool, the Copyright Office further concluded that “prompts do not alone provide sufficient control” over outputs to allow creators to copyright purely AI-generated works based on highly intelligent or creative prompting.

That decision could change, the Copyright Office said, if AI technologies provide more human control over outputs through prompting.

New guidance noted, for example, that some AI tools allow prompts or other inputs “to be substantially retained as part of the output.” Consider an artist uploading an original drawing, the Copyright Office suggested, and prompting AI to modify colors, or an author uploading an original piece and using AI to translate it. And “other generative AI systems also offer tools that similarly allow users to exert control over the selection, arrangement, and content of the final output.”

The Copyright Office drafted this prompt to test artists’ control over expressive inputs that are retained in AI outputs. Credit: Copyright Office

“Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output,” the guidelines said.

But if officials conclude that even the most iterative prompting doesn’t perfectly control the resulting outputs—even slowly, repeatedly prompting AI to produce the exact vision in an artist’s head—some artists are sure to be disappointed. One artist behind a controversial prize-winning AI-generated artwork has staunchly defended his rigorous AI prompting as authorship.

However, if “even expert researchers are limited in their ability to understand or predict the behavior of specific models,” the Copyright Office said it struggled to see how artists could. To further prove their point, officials drafted a lengthy, quirky prompt about a cat reading a Sunday newspaper to compare different outputs from the same AI image generator.

Copyright Office drafted a quirky, lengthy prompt to test creative control over AI outputs. Credit: Copyright Office

Officials apparently agreed with Adobe, which submitted a comment advising the Copyright Office that any output is “based solely on the AI’s interpretation of that prompt.” Academics further warned that copyrighting outputs based only on prompting could lead copyright law to “effectively vest” authorship adopters with “rights in ideas.”

“The Office concludes that, given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectable ideas,” the guidance said. “While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.”

Hundreds of AI artworks are copyrighted, officials say

The Copyright Office repeatedly emphasized that most commenters agreed with the majority of their conclusions. Officials also stressed that hundreds of AI artworks submitted for registration, under existing law, have been approved to copyright the human-authored elements of their works. Rejections are apparently expected to be less common.

“In most cases,” the Copyright Office said, “humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship.”

For stakeholders who have been awaiting this guidance for months, the Copyright Office report may not change the law, but it offers some clarity.

For some artists who hoped to push the Copyright Office to adapt laws, the guidelines may disappoint, leaving many questions about a world of possible creative AI uses unanswered. But while a case-by-case approach may leave some artists unsure about which parts of their works are copyrightable, seemingly common cases are being resolved more readily. According to the Copyright Office, after each decision, it gets easier to register AI works that meet similar standards for copyrightability. Perhaps over time, artists will grow more secure in how they use AI and whether it will impact their exclusive rights to distribute works.

That’s likely cold comfort for the artist advocating for prompting alone to constitute authorship. One AI artist told Ars in October that being denied a copyright has meant suffering being mocked and watching his award-winning work freely used anywhere online without his permission and without payment. But in the end, the Copyright Office was apparently more sympathetic to other commenters who warned that humanity’s progress in the arts could be hampered if a flood of easily generated, copyrightable AI works drowned too many humans out of the market.

“We share the concerns expressed about the impact of AI-generated material on human authors and the value that their creative expression provides to society. If a flood of easily and rapidly AI-generated content drowns out human-authored works in the marketplace, additional legal protection would undermine rather than advance the goals of the copyright system. The availability of vastly more works to choose from could actually make it harder to find inspiring or enlightening content.”

New guidance likely a big yawn for AI companies

For AI companies, the copyright guidance may mean very little. According to AI company Hugging Face’s comments to the Copyright Office, no changes in the law were needed to ensure the US continued leading in AI innovation, because “very little to no innovation in generative AI is driven by the hope of obtaining copyright protection for model outputs.”

Hugging Face’s Head of ML & Society, Yacine Jernite, told Ars that the Copyright Office seemed to “take a constructive approach” to answering some of artists’ biggest questions about AI.

“We believe AI should support, not replace, artists,” Jernite told Ars. “For that to happen, the value of creative work must remain in its human contribution, regardless of the tools used.”

Although the Copyright Office suggested that this week’s report might be the most highly anticipated, Jernite said that Hugging Face is eager to see the next report, which officials said would focus on “the legal implications of training AI models on copyrighted works, including licensing considerations and the allocation of any potential liability.”

“As a platform that supports broader participation in AI, we see more value in distributing its benefits than in concentrating all control with a few large model providers,” Jernite said. “We’re looking forward to the next part of the Copyright Office’s Report, particularly on training data, licensing, and liability, key questions especially for some types of output, like code.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Copyright Office suggests AI copyright debate was settled in 1965 Read More »

democrat-teams-up-with-movie-industry-to-propose-website-blocking-law

Democrat teams up with movie industry to propose website-blocking law

US Rep. Zoe Lofgren (D-Calif.) today proposed a law that would let copyright owners obtain court orders requiring Internet service providers to block access to foreign piracy websites. The bill would also force DNS providers to block sites.

Lofgren said in a press release that she “work[ed] for over a year with the tech, film, and television industries” on “a proposal that has a remedy for copyright infringers located overseas that does not disrupt the free Internet except for the infringers.” Lofgren said she plans to work with Republican leaders to enact the bill.

Lofgren’s press release includes a quote from Charles Rivkin, chairman and CEO of the Motion Picture Association (MPA). As we’ve previously written, the MPA has been urging Congress to pass a site-blocking law.

“More than 55 nations around the world, including democracies such as Canada, the United Kingdom, and Australia, have put in place tools similar to those proposed by Rep. Lofgren, and they have successfully reduced piracy’s harms while protecting consumer access to legal content,” Rivkin was quoted as saying in Lofgren’s press release today.

Lofgren is the ranking member of the House Science, Space, and Technology Committee and a member of the House Subcommittee on Courts, Intellectual Property, Artificial Intelligence and the Internet.

Bill called “censorious site-blocking” measure

Although Lofgren said her proposed Foreign Anti-Digital Piracy Act “preserves the open Internet,” consumer advocacy group Public Knowledge described the bill as a “censorious site-blocking” measure “that turns broadband providers into copyright police at Americans’ expense.”

“Rather than attacking the problem at its source—bringing the people running overseas piracy websites to court—Congress and its allies in the entertainment industry has decided to build out a sweeping infrastructure for censorship,” Public Knowledge Senior Policy Counsel Meredith Rose said. “Site-blocking orders force any service provider, from residential broadband providers to global DNS resolvers, to disrupt traffic from targeted websites accused of copyright infringement. More importantly, applying blocking orders to global DNS resolvers results in global blocks. This means that one court can cut off access to a website globally, based on one individual’s filing and an expedited procedure. Blocking orders are incredibly powerful weapons, ripe for abuse, and we’ve seen the messy consequences of them being implemented in other countries.”

Democrat teams up with movie industry to propose website-blocking law Read More »

openai-blamed-nyt-for-tech-problem-erasing-evidence-of-copyright-abuse

OpenAI blamed NYT for tech problem erasing evidence of copyright abuse


It’s not “lost,” just “inadvertently removed”

OpenAI denies deleting evidence, asks why NYT didn’t back up data.

OpenAI keeps deleting data that could allegedly prove the AI company violated copyright laws by training ChatGPT on authors’ works. Apparently largely unintentional, the sloppy practice is seemingly dragging out early court battles that could determine whether AI training is fair use.

Most recently, The New York Times accused OpenAI of unintentionally erasing programs and search results that the newspaper believed could be used as evidence of copyright abuse.

The NYT apparently spent more than 150 hours extracting training data, while following a model inspection protocol that OpenAI set up precisely to avoid conducting potentially damning searches of its own database. This process began in October, but by mid-November, the NYT discovered that some of the data gathered had been erased due to what OpenAI called a “glitch.”

Looking to update the court about potential delays in discovery, the NYT asked OpenAI to collaborate on a joint filing admitting the deletion occurred. But OpenAI declined, instead filing a separate response calling the newspaper’s accusation that evidence was deleted “exaggerated” and blaming the NYT for the technical problem that triggered the data deleting.

OpenAI denied deleting “any evidence,” instead admitting only that file-system information was “inadvertently removed” after the NYT requested a change that resulted in “self-inflicted wounds.” According to OpenAI, the tech problem emerged because NYT was hoping to speed up its searches and requested a change to the model inspection set-up that OpenAI warned “would yield no speed improvements and might even hinder performance.”

The AI company accused the NYT of negligence during discovery, “repeatedly running flawed code” while conducting searches of URLs and phrases from various newspaper articles and failing to back up their data. Allegedly the change that NYT requested “resulted in removing the folder structure and some file names on one hard drive,” which “was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by Plaintiffs to save some of their search results (apparently without any backups).”

Once OpenAI figured out what happened, data was restored, OpenAI said. But the NYT alleged that the only data that OpenAI could recover did “not include the original folder structure and original file names” and therefore “is unreliable and cannot be used to determine where the News Plaintiffs’ copied articles were used to build Defendants’ models.”

In response, OpenAI suggested that the NYT could simply take a few days and re-run the searches, insisting, “contrary to Plaintiffs’ insinuations, there is no reason to think that the contents of any files were lost.” But the NYT does not seem happy about having to retread any part of model inspection, continually frustrated by OpenAI’s expectation that plaintiffs must come up with search terms when OpenAI understands its models best.

OpenAI claimed that it has consulted on search terms and been “forced to pour enormous resources” into supporting the NYT’s model inspection efforts while continuing to avoid saying how much it’s costing. Previously, the NYT accused OpenAI of seeking to profit off these searches, attempting to charge retail prices instead of being transparent about actual costs.

Now, OpenAI appears to be more willing to conduct searches on behalf of NYT that it previously sought to avoid. In its filing, OpenAI asked the court to order news plaintiffs to “collaborate with OpenAI to develop a plan for reasonable, targeted searches to be executed either by Plaintiffs or OpenAI.”

How that might proceed will be discussed at a hearing on December 3. OpenAI said it was committed to preventing future technical issues and was “committed to resolving these issues efficiently and equitably.”

It’s not the first time OpenAI deleted data

This isn’t the only time that OpenAI has been called out for deleting data in a copyright case.

In May, book authors, including Sarah Silverman and Paul Tremblay, told a US district court in California that OpenAI admitted to deleting the controversial AI training data sets at issue in that litigation. Additionally, OpenAI admitted that “witnesses knowledgeable about the creation of these datasets have apparently left the company,” authors’ court filing said. Unlike the NYT, book authors seem to suggest that OpenAI’s deleting appeared potentially suspicious.

“OpenAI’s delay campaign continues,” the authors’ filing said, alleging that “evidence of what was contained in these datasets, how they were used, the circumstances of their deletion and the reasons for” the deletion “are all highly relevant.”

The judge in that case, Robert Illman, wrote that OpenAI’s dispute with authors has so far required too much judicial intervention, noting that both sides “are not exactly proceeding through the discovery process with the degree of collegiality and cooperation that might be optimal.” Wired noted similarly the NYT case is “not exactly a lovefest.”

As these cases proceed, plaintiffs in both cases are struggling to decide on search terms that will surface the evidence they seek. While the NYT case is bogged down by OpenAI seemingly refusing to conduct any searches yet on behalf of publishers, the book author case is differently being dragged out by authors failing to provide search terms. Only four of the 15 authors suing have sent search terms, as their deadline for discovery approaches on January 27, 2025.

NYT judge rejects key part of fair use defense

OpenAI’s defense primarily hinges on courts agreeing that copying authors’ works to train AI is a transformative fair use that benefits the public, but the judge in the NYT case, Ona Wang, rejected a key part of that fair use defense late last week.

To win their fair use argument, OpenAI was trying to modify a fair use factor regarding “the effect of the use upon the potential market for or value of the copyrighted work” by invoking a common argument that the factor should be modified to include the “public benefits the copying will likely produce.”

Part of this defense tactic sought to prove that the NYT’s journalism benefits from generative AI technologies like ChatGPT, with OpenAI hoping to topple NYT’s claim that ChatGPT posed an existential threat to its business. To that end, OpenAI sought documents showing that the NYT uses AI tools, creates its own AI tools, and generally supports the use of AI in journalism outside the court battle.

On Friday, however, Wang denied OpenAI’s motion to compel this kind of evidence. Wang deemed it irrelevant to the case despite OpenAI’s claims that if AI tools “benefit” the NYT’s journalism, that “benefit” would be relevant to OpenAI’s fair use defense.

“But the Supreme Court specifically states that a discussion of ‘public benefits’ must relate to the benefits from the copying,” Wang wrote in a footnote, not “whether the copyright holder has admitted that other uses of its copyrights may or may not constitute fair use, or whether the copyright holder has entered into business relationships with other entities in the defendant’s industry.”

This likely stunts OpenAI’s fair use defense by cutting off an area of discovery that OpenAI previously fought hard to pursue. It essentially leaves OpenAI to argue that its copying of NYT content specifically serves a public good, not the act of AI training generally.

In February, Ars forecasted that the NYT might have the upper hand in this case because the NYT already showed that sometimes ChatGPT would reproduce word-for-word snippets of articles. That will likely make it harder to convince the court that training ChatGPT by copying NYT articles is a transformative fair use, as Google Books famously did when copying books to create a searchable database.

For OpenAI, the strategy seems to be to erect as strong a fair use case as possible to defend its most popular release. And if the court sides with OpenAI on that question, it won’t really matter how much evidence the NYT surfaces during model inspection. But if the use is not seen as transformative and then the NYT can prove the copying harms its business—without benefiting the public—OpenAI could risk losing this important case when the verdict comes in 2025. And that could have implications for book authors’ suit as well as other litigation, expected to drag into 2026.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI blamed NYT for tech problem erasing evidence of copyright abuse Read More »

openai-accused-of-trying-to-profit-off-ai-model-inspection-in-court

OpenAI accused of trying to profit off AI model inspection in court


Experiencing some technical difficulties

How do you get an AI model to confess what’s inside?

Credit: Aurich Lawson | Getty Images

Since ChatGPT became an instant hit roughly two years ago, tech companies around the world have rushed to release AI products while the public is still in awe of AI’s seemingly radical potential to enhance their daily lives.

But at the same time, governments globally have warned it can be hard to predict how rapidly popularizing AI can harm society. Novel uses could suddenly debut and displace workers, fuel disinformation, stifle competition, or threaten national security—and those are just some of the obvious potential harms.

While governments scramble to establish systems to detect harmful applications—ideally before AI models are deployed—some of the earliest lawsuits over ChatGPT show just how hard it is for the public to crack open an AI model and find evidence of harms once a model is released into the wild. That task is seemingly only made harder by an increasingly thirsty AI industry intent on shielding models from competitors to maximize profits from emerging capabilities.

The less the public knows, the seemingly harder and more expensive it is to hold companies accountable for irresponsible AI releases. This fall, ChatGPT-maker OpenAI was even accused of trying to profit off discovery by seeking to charge litigants retail prices to inspect AI models alleged as causing harms.

In a lawsuit raised by The New York Times over copyright concerns, OpenAI suggested the same model inspection protocol used in a similar lawsuit raised by book authors.

Under that protocol, the NYT could hire an expert to review highly confidential OpenAI technical materials “on a secure computer in a secured room without Internet access or network access to other computers at a secure location” of OpenAI’s choosing. In this closed-off arena, the expert would have limited time and limited queries to try to get the AI model to confess what’s inside.

The NYT seemingly had few concerns about the actual inspection process but bucked at OpenAI’s intended protocol capping the number of queries their expert could make through an application programming interface to $15,000 worth of retail credits. Once litigants hit that cap, OpenAI suggested that the parties split the costs of remaining queries, charging the NYT and co-plaintiffs half-retail prices to finish the rest of their discovery.

In September, the NYT told the court that the parties had reached an “impasse” over this protocol, alleging that “OpenAI seeks to hide its infringement by professing an undue—yet unquantified—’expense.'” According to the NYT, plaintiffs would need $800,000 worth of retail credits to seek the evidence they need to prove their case, but there’s allegedly no way it would actually cost OpenAI that much.

“OpenAI has refused to state what its actual costs would be, and instead improperly focuses on what it charges its customers for retail services as part of its (for profit) business,” the NYT claimed in a court filing.

In its defense, OpenAI has said that setting the initial cap is necessary to reduce the burden on OpenAI and prevent a NYT fishing expedition. The ChatGPT maker alleged that plaintiffs “are requesting hundreds of thousands of dollars of credits to run an arbitrary and unsubstantiated—and likely unnecessary—number of searches on OpenAI’s models, all at OpenAI’s expense.”

How this court debate resolves could have implications for future cases where the public seeks to inspect models causing alleged harms. It seems likely that if a court agrees OpenAI can charge retail prices for model inspection, it could potentially deter lawsuits from any plaintiffs who can’t afford to pay an AI expert or commercial prices for model inspection.

Lucas Hansen, co-founder of CivAI—a company that seeks to enhance public awareness of what AI can actually do—told Ars that probably a lot of inspection can be done on public models. But often, public models are fine-tuned, perhaps censoring certain queries and making it harder to find information that a model was trained on—which is the goal of NYT’s suit. By gaining API access to original models instead, litigants could have an easier time finding evidence to prove alleged harms.

It’s unclear exactly what it costs OpenAI to provide that level of access. Hansen told Ars that costs of training and experimenting with models “dwarfs” the cost of running models to provide full capability solutions. Developers have noted in forums that costs of API queries quickly add up, with one claiming OpenAI’s pricing is “killing the motivation to work with the APIs.”

The NYT’s lawyers and OpenAI declined to comment on the ongoing litigation.

US hurdles for AI safety testing

Of course, OpenAI is not the only AI company facing lawsuits over popular products. Artists have sued makers of image generators for allegedly threatening their livelihoods, and several chatbots have been accused of defamation. Other emerging harms include very visible examples—like explicit AI deepfakes, harming everyone from celebrities like Taylor Swift to middle schoolers—as well as underreported harms, like allegedly biased HR software.

A recent Gallup survey suggests that Americans are more trusting of AI than ever but still twice as likely to believe AI does “more harm than good” than that the benefits outweigh the harms. Hansen’s CivAI creates demos and interactive software for education campaigns helping the public to understand firsthand the real dangers of AI. He told Ars that while it’s hard for outsiders to trust a study from “some random organization doing really technical work” to expose harms, CivAI provides a controlled way for people to see for themselves how AI systems can be misused.

“It’s easier for people to trust the results, because they can do it themselves,” Hansen told Ars.

Hansen also advises lawmakers grappling with AI risks. In February, CivAI joined the Artificial Intelligence Safety Institute Consortium—a group including Fortune 500 companies, government agencies, nonprofits, and academic research teams that help to advise the US AI Safety Institute (AISI). But so far, Hansen said, CivAI has not been very active in that consortium beyond scheduling a talk to share demos.

The AISI is supposed to protect the US from risky AI models by conducting safety testing to detect harms before models are deployed. Testing should “address risks to human rights, civil rights, and civil liberties, such as those related to privacy, discrimination and bias, freedom of expression, and the safety of individuals and groups,” President Joe Biden said in a national security memo last month, urging that safety testing was critical to support unrivaled AI innovation.

“For the United States to benefit maximally from AI, Americans must know when they can trust systems to perform safely and reliably,” Biden said.

But the AISI’s safety testing is voluntary, and while companies like OpenAI and Anthropic have agreed to the voluntary testing, not every company has. Hansen is worried that AISI is under-resourced and under-budgeted to achieve its broad goals of safeguarding America from untold AI harms.

“The AI Safety Institute predicted that they’ll need about $50 million in funding, and that was before the National Security memo, and it does not seem like they’re going to be getting that at all,” Hansen told Ars.

Biden had $50 million budgeted for AISI in 2025, but Donald Trump has threatened to dismantle Biden’s AI safety plan upon taking office.

The AISI was probably never going to be funded well enough to detect and deter all AI harms, but with its future unclear, even the limited safety testing the US had planned could be stalled at a time when the AI industry continues moving full speed ahead.

That could largely leave the public at the mercy of AI companies’ internal safety testing. As frontier models from big companies will likely remain under society’s microscope, OpenAI has promised to increase investments in safety testing and help establish industry-leading safety standards.

According to OpenAI, that effort includes making models safer over time, less prone to producing harmful outputs, even with jailbreaks. But OpenAI has a lot of work to do in that area, as Hansen told Ars that he has a “standard jailbreak” for OpenAI’s most popular release, ChatGPT, “that almost always works” to produce harmful outputs.

The AISI did not respond to Ars’ request to comment.

NYT “nowhere near done” inspecting OpenAI models

For the public, who often become guinea pigs when AI acts unpredictably, risks remain, as the NYT case suggests that the costs of fighting AI companies could go up while technical hiccups could delay resolutions. Last week, an OpenAI filing showed that NYT’s attempts to inspect pre-training data in a “very, very tightly controlled environment” like the one recommended for model inspection were allegedly continuously disrupted.

“The process has not gone smoothly, and they are running into a variety of obstacles to, and obstructions of, their review,” the court filing describing NYT’s position said. “These severe and repeated technical issues have made it impossible to effectively and efficiently search across OpenAI’s training datasets in order to ascertain the full scope of OpenAI’s infringement. In the first week of the inspection alone, Plaintiffs experienced nearly a dozen disruptions to the inspection environment, which resulted in many hours when News Plaintiffs had no access to the training datasets and no ability to run continuous searches.”

OpenAI was additionally accused of refusing to install software the litigants needed and randomly shutting down ongoing searches. Frustrated after more than 27 days of inspecting data and getting “nowhere near done,” the NYT keeps pushing the court to order OpenAI to provide the data instead. In response, OpenAI said plaintiffs’ concerns were either “resolved” or discussions remained “ongoing,” suggesting there was no need for the court to intervene.

So far, the NYT claims that it has found millions of plaintiffs’ works in the ChatGPT pre-training data but has been unable to confirm the full extent of the alleged infringement due to the technical difficulties. Meanwhile, costs keep accruing in every direction.

“While News Plaintiffs continue to bear the burden and expense of examining the training datasets, their requests with respect to the inspection environment would be significantly reduced if OpenAI admitted that they trained their models on all, or the vast majority, of News Plaintiffs’ copyrighted content,” the court filing said.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI accused of trying to profit off AI model inspection in court Read More »

ever-heard-of-“llady-gaga”?-universal-files-piracy-suit-over-alleged-knockoffs.

Ever heard of “Llady Gaga”? Universal files piracy suit over alleged knockoffs.

Universal Music Group yesterday sued a music firm that allegedly distributes pirated songs on popular streaming services under misspelled versions of popular artists’ names—such as “Kendrik Laamar,” “Arriana Gramde,” “Jutin Biber,” and “Llady Gaga.” The UMG Recordings lawsuit against the French company Believe and its US-based subsidiary, TuneCore, alleges that “Believe is fully aware that its business model is fueled by rampant piracy” and “turned a blind eye to the fact that its music catalog was rife with copyright infringing sound recordings.”

Believe is a publicly traded company with about 2,020 employees in over 50 countries and reported $518 million (474.1 million euros) in revenue in the first half of 2024. Believe says its “mission is to develop independent artists and labels in the digital world.”

UMG alleges that Believe achieved “dramatic growth and profitability in recent years by operating as a hub for the distribution of infringing copies of the world’s most popular copyrighted recordings.” Believe has licensing deals with online platforms “including TikTok, YouTube, Spotify, Apple Music, Instagram and hundreds of others,” the lawsuit said.

UMG alleged that Believe distributes songs on these services “with full knowledge that many of the clients of its distribution services are fraudsters regularly providing infringing copies of copyrighted recordings.” Believe enters into “distribution contracts with anyone willing to sign one of its basic form agreements,” and its “client list is overrun with fraudulent ‘artists’ and pirate record labels who rely on Believe and its distribution network to seed infringing copies of popular sound recordings throughout the digital music ecosystem,” the lawsuit said, continuing:

Believe makes little effort to hide its illegal actions. Indeed, the names of its “artists” and recordings are often minor variants on the names of Plaintiffs’ famous recording artists and the titles of their most successful works. For example, Believe has distributed infringing tracks from infringers who call themselves “Kendrik Laamar” (a reference to Kendrick Lamar); “Arriana Gramde” (a reference to Ariana Grande); “Jutin Biber” (a reference to Justin Bieber); and “Llady Gaga” (a reference to Lady Gaga). Often, Believe distributes overtly infringing versions of original tracks by famous artists with notations that they are “sped up” or “remixed.”

The Rihanna song “S&M” was distributed as a remix by Believe under the name “Rihamna,” the lawsuit said. In other cases, names associated with allegedly infringing tracks were very different from the real artists’ names. The lawsuit said Lady Gaga’s “Bad Romance” and Billie Eilish’s “TV” were both distributed in sped-up form under the name “INDRAGERSN.”

Ever heard of “Llady Gaga”? Universal files piracy suit over alleged knockoffs. Read More »

tesla,-warner-bros.-sued-for-using-ai-ripoff-of-iconic-blade-runner-imagery

Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery


A copy of a copy of a copy

“That movie sucks,” Elon Musk said in response to the lawsuit.

Credit: via Alcon Entertainment

Elon Musk may have personally used AI to rip off a Blade Runner 2049 image for a Tesla cybercab event after producers rejected any association between their iconic sci-fi movie and Musk or any of his companies.

In a lawsuit filed Tuesday, lawyers for Alcon Entertainment—exclusive rightsholder of the 2017 Blade Runner 2049 movie—accused Warner Bros. Discovery (WBD) of conspiring with Musk and Tesla to steal the image and infringe Alcon’s copyright to benefit financially off the brand association.

According to the complaint, WBD did not approach Alcon for permission until six hours before the Tesla event when Alcon “refused all permissions and adamantly objected” to linking their movie with Musk’s cybercab.

At that point, WBD “disingenuously” downplayed the license being sought, the lawsuit said, claiming they were seeking “clip licensing” that the studio should have known would not provide rights to livestream the Tesla event globally on X (formerly Twitter).

Musk’s behavior cited

Alcon said it would never allow Tesla to exploit its Blade Runner film, so “although the information given was sparse, Alcon learned enough information for Alcon’s co-CEOs to consider the proposal and firmly reject it, which they did.” Specifically, Alcon denied any affiliation—express or implied—between Tesla’s cybercab and Blade Runner 2049.

“Musk has become an increasingly vocal, overtly political, highly polarizing figure globally, and especially in Hollywood,” Alcon’s complaint said. If Hollywood perceived an affiliation with Musk and Tesla, the complaint said, the company risked alienating not just other car brands currently weighing partnerships on the Blade Runner 2099 TV series Alcon has in the works, but also potentially losing access to top Hollywood talent for their films.

The “Hollywood talent pool market generally is less likely to deal with Alcon, or parts of the market may be, if they believe or are confused as to whether, Alcon has an affiliation with Tesla or Musk,” the complaint said.

Musk, the lawsuit said, is “problematic,” and “any prudent brand considering any Tesla partnership has to take Musk’s massively amplified, highly politicized, capricious and arbitrary behavior, which sometimes veers into hate speech, into account.”

In bad faith

Because Alcon had no chance to avoid the affiliation while millions viewed the cybercab livestream on X, Alcon saw Tesla using the images over Alcon’s objections as “clearly” a “bad faith and malicious gambit… to link Tesla’s cybercab to strong Hollywood brands at a time when Tesla and Musk are on the outs with Hollywood,” the complaint said.

Alcon believes that WBD’s agreement was likely worth six or seven figures and likely stipulated that Tesla “affiliate the cybercab with one or more motion pictures from” WBD’s catalog.

While any of the Mad Max movies may have fit the bill, Musk wanted to use Blade Runner 2049, the lawsuit alleged, because that movie features an “artificially intelligent autonomously capable” flying car (known as a spinner) and is “extremely relevant” to “precisely the areas of artificial intelligence, self-driving capability, and autonomous automotive capability that Tesla and Musk are trying to market” with the cybercab.

The Blade Runner 2049 spinner is “one of the most famous vehicles in motion picture history,” the complaint alleged, recently exhibited alongside other iconic sci-fi cars like the Back to the Future time-traveling DeLorean or the light cycle from Tron: Legacy.

As Alcon sees it, Musk seized the misappropriation of the Blade Runner image to help him sell Teslas, and WBD allegedly directed Musk to use AI to skirt Alcon’s copyright to avoid a costly potential breach of contract on the day of the event.

For Alcon, brand partnerships are a lucrative business, with carmakers paying as much as $10 million to associate their vehicles with Blade Runner 2049. By seemingly using AI to generate a stylized copy of the image at the heart of the movie—which references the scene where their movie’s hero, K, meets the original 1982 Blade Runner hero, Rick Deckard—Tesla avoided paying Alcon’s typical fee, their complaint said.

Musk maybe faked the image himself, lawsuit says

During the live event, Musk introduced the cybercab on a WBD Hollywood studio lot. For about 11 seconds, the Tesla founder “awkwardly” displayed a fake, allegedly AI-generated Blade Runner 2049 film still. He used the image to make a point that apocalyptic films show a future that’s “dark and dismal,” whereas Tesla’s vision of the future is much brighter.

In Musk’s slideshow image, believed to be AI-generated, a male figure is “seen from behind, with close-cropped hair, wearing a trench coat or duster, standing in almost full silhouette as he surveys the abandoned ruins of a city, all bathed in misty orange light,” the lawsuit said. The similarity to the key image used in Blade Runner 2049 marketing is not “coincidental,” the complaint said.

If there were any doubts that this image was supposed to reference the Blade Runner movie, the lawsuit said, Musk “erased them” by directly referencing the movie in his comments.

“You know, I love Blade Runner, but I don’t know if we want that future,” Musk said at the event. “I believe we want that duster he’s wearing, but not the, uh, not the bleak apocalypse.”

The producers think the image was likely generated—”even possibly by Musk himself”—by “asking an AI image generation engine to make ‘an image from the K surveying ruined Las Vegas sequence of Blade Runner 2049,’ or some closely equivalent input direction,” the lawsuit said.

Alcon is not sure exactly what went down after the company rejected rights to use the film’s imagery at the event and is hoping to learn more through the litigation’s discovery phase.

Musk may try to argue that his comments at the Tesla event were “only meant to talk broadly about the general idea of science fiction films and undesirable apocalyptic futures and juxtaposing them with Musk’s ostensibly happier robot car future vision.”

But producers argued that defense is “not credible” since Tesla explicitly asked to use the Blade Runner 2049 image, and there are “better” films in WBD’s library to promote Musk’s message, like the Mad Max movies.

“But those movies don’t have massive consumer goodwill specifically around really cool-looking (Academy Award-winning) artificially intelligent, autonomous cars,” the complaint said, accusing Musk of stealing the image when it wasn’t given to him.

If Tesla and WBD are found to have violated copyright and false representation laws, that potentially puts both companies on the hook for damages that cover not just copyright fines but also Alcon’s lost profits and reputation damage after the alleged “massive economic theft.”

Musk responds to Blade Runner suit

Alcon suspects that Musk believed that Blade Runner 2049 was eligible to be used at the event under the WBD agreement, not knowing that WBD never had “any non-domestic rights or permissions for the Picture.”

Once Musk requested to use the Blade Runner imagery, Alcon alleged that WBD scrambled to secure rights by obscuring the very lucrative “larger brand affiliation proposal” by positioning their ask as a request for much less expensive “clip licensing.”

After Alcon rejected the proposal outright, WBD told Tesla that the affiliation in the event could not occur because X planned to livestream the event globally. But even though Tesla and X allegedly knew that the affiliation was rejected, Musk appears to have charged ahead with the event as planned.

“It all exuded an odor of thinly contrived excuse to link Tesla’s cybercab to strong Hollywood brands,” Alcon’s complaint said. “Which of course is exactly what it was.”

Alcon is hoping a jury will find Tesla, Musk, and WBD violated laws. Producers have asked for an injunction stopping Tesla from using any Blade Runner imagery in its promotional or advertising campaigns. They also want a disclaimer slapped on the livestreamed event video on X, noting that the Blade Runner association is “false or misleading.”

For Musk, a ban on linking Blade Runner to his car company may feel bleak. Last year, he touted the Cybertruck as an “armored personnel carrier from the future—what Bladerunner would have driven.”  This amused many Blade Runner fans, as Gizmodo noted, because there never was a character named “Bladerunner,” but rather that was just a job title for the film’s hero Deckard.

In response to the lawsuit, Musk took to X to post what Blade Runner fans—who rated the 2017 movie as 88 percent fresh on Rotten Tomatoes—might consider a polarizing take, replying, “That movie sucks” on a post calling out Alcon’s lawsuit as “absurd.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery Read More »

man-tricks-openai’s-voice-bot-into-duet-of-the-beatles’-“eleanor-rigby”

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby”

A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI's Advanced Voice Mode through the ChatGPT app.

Enlarge / A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI’s Advanced Voice Mode through the ChatGPT app.

OpenAI’s new Advanced Voice Mode (AVM) of its ChatGPT AI assistant rolled out to subscribers on Tuesday, and people are already finding novel ways to use it, even against OpenAI’s wishes. On Thursday, a software architect named AJ Smith tweeted a video of himself playing a duet of The Beatles’ 1966 song “Eleanor Rigby” with AVM. In the video, Smith plays the guitar and sings, with the AI voice interjecting and singing along sporadically, praising his rendition.

“Honestly, it was mind-blowing. The first time I did it, I wasn’t recording and literally got chills,” Smith told Ars Technica via text message. “I wasn’t even asking it to sing along.”

Smith is no stranger to AI topics. In his day job, he works as associate director of AI Engineering at S&P Global. “I use [AI] all the time and lead a team that uses AI day to day,” he told us.

In the video, AVM’s voice is a little quavery and not pitch-perfect, but it appears to know something about “Eleanor Rigby’s” melody when it first sings, “Ah, look at all the lonely people.” After that, it seems to be guessing at the melody and rhythm as it recites song lyrics. We have also convinced Advanced Voice Mode to sing, and it did a perfect melodic rendition of “Happy Birthday” after some coaxing.

AJ Smith’s video of singing a duet with OpenAI’s Advanced Voice Mode.

Normally, when you ask AVM to sing, it will reply something like, “My guidelines won’t let me talk about that.” That’s because in the chatbot’s initial instructions (called a “system prompt“), OpenAI instructs the voice assistant not to sing or make sound effects (“Do not sing or hum,” according to one system prompt leak).

OpenAI possibly added this restriction because AVM may otherwise reproduce copyrighted content, such as songs that were found in the training data used to create the AI model itself. That’s what is happening here to a limited extent, so in a sense, Smith has discovered a form of what researchers call a “prompt injection,” which is a way of convincing an AI model to produce outputs that go against its system instructions.

How did Smith do it? He figured out a game that reveals AVM knows more about music than it may let on in conversation. “I just said we’d play a game. I’d play the four pop chords and it would shout out songs for me to sing along with those chords,” Smith told us. “Which did work pretty well! But after a couple songs it started to sing along. Already it was such a unique experience, but that really took it to the next level.”

This is not the first time humans have played musical duets with computers. That type of research stretches back to the 1970s, although it was typically limited to reproducing musical notes or instrumental sounds. But this is the first time we’ve seen anyone duet with an audio-synthesizing voice chatbot in real time.

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby” Read More »

one-startup’s-plan-to-fix-ai’s-“shoplifting”-problem

One startup’s plan to fix AI’s “shoplifting” problem

I’ve been caught stealing, once when I was five —

Algorithm will identify sources used by generative AI, compensate them for use.

One startup’s plan to fix AI’s “shoplifting” problem

Bloomberg via Getty

Bill Gross made his name in the tech world in the 1990s, when he came up with a novel way for search engines to make money on advertising. Under his pricing scheme, advertisers would pay when people clicked on their ads. Now, the “pay-per-click” guy has founded a startup called ProRata, which has an audacious, possibly pie-in-the-sky business model: “AI pay-per-use.”

Gross, who is CEO of the Pasadena, California, company, doesn’t mince words about the generative AI industry. “It’s stealing,” he says. “They’re shoplifting and laundering the world’s knowledge to their benefit.”

AI companies often argue that they need vast troves of data to create cutting-edge generative tools and that scraping data from the Internet, whether it’s text from websites, video or captions from YouTube, or books pilfered from pirate libraries, is legally allowed. Gross doesn’t buy that argument. “I think it’s bullshit,” he says.

So do plenty of media executives, artists, writers, musicians, and other rights-holders who are pushing back—it’s hard to keep up with the constant flurry of copyright lawsuits filed against AI companies, alleging that the way they operate amounts to theft.

But Gross thinks ProRata offers a solution that beats legal battles. “To make it fair—that’s what I’m trying to do,” he says. “I don’t think this should be solved by lawsuits.”

His company aims to arrange revenue-sharing deals so publishers and individuals get paid when AI companies use their work. Gross explains it like this: “We can take the output of generative AI, whether it’s text or an image or music or a movie, and break it down into the components, to figure out where they came from, and then give a percentage attribution to each copyright holder, and then pay them accordingly.” ProRata has filed patent applications for the algorithms it created to assign attribution and make the appropriate payments.

This week, the company, which has raised $25 million, launched with a number of big-name partners, including Universal Music Group, the Financial Times, The Atlantic, and media company Axel Springer. In addition, it has made deals with authors with large followings, including Tony Robbins, Neal Postman, and Scott Galloway. (It has also partnered with former White House Communications Director Anthony Scaramucci.)

Even journalism professor Jeff Jarvis, who believes scraping the web for AI training is fair use, has signed on. He tells WIRED that it’s smart for people in the news industry to band together to get AI companies access to “credible and current information” to include in their output. “I hope that ProRata might open discussion for what could turn into APIs [application programming interfaces] for various content,” he says.

Following the company’s initial announcement, Gross says he had a deluge of messages from other companies asking to sign up, including a text from Time CEO Jessica Sibley. ProRata secured a deal with Time, the publisher confirmed to WIRED. He plans to pursue agreements with high-profile YouTubers and other individual online stars.

The key word here is “plans.” The company is still in its very early days, and Gross is talking a big game. As a proof of concept, ProRata is launching its own subscription chatbot-style search engine in October. Unlike other AI search products, ProRata’s search tool will exclusively use licensed data. There’s nothing scraped using a web crawler. “Nothing from Reddit,” he says.

Ed Newton-Rex, a former Stability AI executive who now runs the ethical data licensing nonprofit Fairly Trained, is heartened by ProRata’s debut. “It’s great to see a generative AI company licensing training data before releasing their model, in contrast to many other companies’ approach,” he says. “The deals they have in place further demonstrate media companies’ openness to working with good actors.”

Gross wants the search engine to demonstrate that quality of data is more important than quantity and believes that limiting the model to trustworthy information sources will curb hallucinations. “I’m claiming that 70 million good documents is actually superior to 70 billion bad documents,” he says. “It’s going to lead to better answers.”

What’s more, Gross thinks he can get enough people to sign up for this all-licensed-data AI search engine to make as much money needed to pay its data providers their allotted share. “Every month the partners will get a statement from us saying, ‘Here’s what people search for, here’s how your content was used, and here’s your pro rata check,’” he says.

Other startups already are jostling for prominence in this new world of training-data licensing, like the marketplaces TollBit and Human Native AI. A nonprofit called the Dataset Providers Alliance was formed earlier this summer to push for more standards in licensing; founding members include services like the Global Copyright Exchange and Datarade.

ProRata’s business model hinges in part on its plan to license its attribution and payment technologies to other companies, including major AI players. Some of those companies have begun striking their own deals with publishers. (The Atlantic and Axel Springer, for instance, have agreements with OpenAI.) Gross hopes that AI companies will find licensing ProRata’s models more affordable than creating them in-house.

“I’ll license the system to anyone who wants to use it,” Gross says. “I want to make it so cheap that it’s like a Visa or MasterCard fee.”

This story originally appeared on wired.com.

One startup’s plan to fix AI’s “shoplifting” problem Read More »

the-“netflix-of-anime”-piracy-site-abruptly-shuts-down,-shocking-users

The “Netflix of anime” piracy site abruptly shuts down, shocking users

Disney+ promotional art for <em>The Fable</em>, an anime series that triggered Animeflix takedown notices.” src=”https://cdn.arstechnica.net/wp-content/uploads/2024/07/The-Fable-press-image-800×450.jpeg”></img><figcaption>
<p><a data-height=Enlarge / Disney+ promotional art for The Fable, an anime series that triggered Animeflix takedown notices.

Disney+

Thousands of anime fans were shocked Thursday when the popular piracy site Animeflix voluntarily shut down without explaining why, TorrentFreak reported.

“It is with a heavy heart that we announce the closure of Animeflix,” the site’s operators told users in a Discord with 35,000 members. “After careful consideration, we have decided to shut down our service effective immediately. We deeply appreciate your support and enthusiasm over the years.”

Prior to its shutdown, Animeflix attracted millions of monthly visits, TorrentFreak reported. It was preferred by some anime fans for its clean interface, with one fan on Reddit describing Animeflix as the “Netflix of anime.”

“Deadass this site was clean,” one Reddit user wrote. “The best I’ve ever seen. Sad to see it go.”

Although Animeflix operators did not connect the dots for users, TorrentFreak suggested that the piracy site chose to shut down after facing “considerable legal pressure in recent months.”

Back in December, an anti-piracy group, Alliance for Creativity and Entertainment (ACE), sought to shut down Animeflix. Then in mid-May, rightsholders—including Netflix, Disney, Universal, Paramount, and Warner Bros.—won an injunction through the High Court of India against several piracy sites, including Animeflix. This briefly caused Animeflix to be unavailable until Animeflix simply switched to another domain and continued serving users, TorrentFreak reported.

Although Animeflix is not telling users why it’s choosing to shut down now, TorrentFreak—which, as its name suggests, focuses much of its coverage on copyright issues impacting file sharing online—noted that “when a pirate site shuts down, voluntarily or not, copyright issues typically play a role.”

For anime fans, the abrupt closure was disappointing because of difficulty accessing the hottest new anime titles and delays as studios work to offer translations to various regions. The delays are so bad that some studios are considering combating piracy by using AI to push out translated versions more quickly. But fans fear this will only result in low-quality subtitles, CBR reported.

On Reddit, some fans also complained after relying exclusively on Animeflix to keep track of where they left off on anime shows that often span hundreds of episodes.

Others begged to be turned onto other anime piracy sites, while some speculated whether Animeflix might eventually pop up at a new domain. TorrentFreak noted that Animeflix shut down once previously several years ago but ultimately came back. One Redditor wrote, “another hero has passed away but the will, will be passed.” On another Reddit thread asking “will Animeflix be gone forever or maybe create a new site,” one commenter commiserated, writing, “We don’t know for sure. Only time will tell.”

It’s also possible that someone else may pick up the torch and operate a new piracy site under the same name. According to TorrentFreak, this is “likely.”

Animeflix did not reassure users that it may be back, instead urging them to find other sources for their favorite shows and movies.

“We hope the joy and excitement of anime continue to brighten your days through other wonderful platforms,” Animeflix’s Discord message said.

ACE did not immediately respond to Ars’ request for comment.

The “Netflix of anime” piracy site abruptly shuts down, shocking users Read More »

washing-machine-chime-scandal-shows-how-absurd-youtube-copyright-abuse-can-get

Washing machine chime scandal shows how absurd YouTube copyright abuse can get

Washing machine chime scandal shows how absurd YouTube copyright abuse can get

YouTube’s Content ID system—which automatically detects content registered by rightsholders—is “completely fucking broken,” a YouTuber called “Albino” declared in a rant on X (formerly Twitter) viewed more than 950,000 times.

Albino, who is also a popular Twitch streamer, complained that his YouTube video playing through Fallout was demonetized because a Samsung washing machine randomly chimed to signal a laundry cycle had finished while he was streaming.

Apparently, YouTube had automatically scanned Albino’s video and detected the washing machine chime as a song called “Done”—which Albino quickly saw was uploaded to YouTube by a musician known as Audego nine years ago.

But when Albino hit play on Audego’s song, the only thing that he heard was a 30-second clip of the washing machine chime. To Albino it was obvious that Audego didn’t have any rights to the jingle, which Dexerto reported actually comes from the song “Die Forelle” (“The Trout”) from Austrian composer Franz Schubert.

The song was composed in 1817 and is in the public domain. Samsung has used it to signal the end of a wash cycle for years, sparking debate over whether it’s the catchiest washing machine song and inspiring at least one violinist to perform a duet with her machine. It’s been a source of delight for many Samsung customers, but for Albino, hearing the jingle appropriated on YouTube only inspired ire.

“A guy recorded his fucking washing machine and uploaded it to YouTube with Content ID,” Albino said in a video on X. “And now I’m getting copyright claims” while “my money” is “going into the toilet and being given to this fucking slime.”

Albino suggested that YouTube had potentially allowed Audego to make invalid copyright claims for years without detecting the seemingly obvious abuse.

“How is this still here?” Albino asked. “It took me one Google search to figure this out,” and “now I’m sharing revenue with this? That’s insane.”

At first, Team YouTube gave Albino a boilerplate response on X, writing, “We understand how important it is for you. From your vid, it looks like you’ve recently submitted a dispute. When you dispute a Content ID claim, the person who claimed your video (the claimant) is notified and they have 30 days to respond.”

Albino expressed deep frustration at YouTube’s response, given how “egregious” he considered the copyright abuse to be.

“Just wait for the person blatantly stealing copyrighted material to respond,” Albino responded to YouTube. “Ah okay, yes, I’m sure they did this in good faith and will make the correct call, though it would be a shame if they simply clicked ‘reject dispute,’ took all the ad revenue money and forced me to risk having my channel terminated to appeal it!! XDxXDdxD!! Thanks Team YouTube!”

Soon after, YouTube confirmed on X that Audego’s copyright claim was indeed invalid. The social platform ultimately released the claim and told Albino to expect the changes to be reflected on his channel within two business days.

Ars could not immediately reach YouTube or Albino for comment.

Widespread abuse of Content ID continues

YouTubers have complained about abuse of Content ID for years. Techdirt’s Timothy Geigner agreed with Albino’s assessment that the YouTube system is “hopelessly broken,” noting that sometimes content is flagged by mistake. But just as easily, bad actors can abuse the system to claim “content that simply isn’t theirs” and seize sometimes as much as millions in ad revenue.

In 2021, YouTube announced that it had invested “hundreds of millions of dollars” to create content management tools, of which Content ID quickly emerged as the platform’s go-to solution to detect and remove copyrighted materials.

At that time, YouTube claimed that Content ID was created as a “solution for those with the most complex rights management needs,” like movie studios and record labels whose movie clips and songs are most commonly uploaded by YouTube users. YouTube warned that without Content ID, “rightsholders could have their rights impaired and lawful expression could be inappropriately impacted.”

Since its rollout, more than 99 percent of copyright actions on YouTube have consistently been triggered automatically through Content ID.

And just as consistently, YouTube has seen widespread abuse of Content ID, terminating “tens of thousands of accounts each year that attempt to abuse our copyright tools,” YouTube said. YouTube also acknowledged in 2021 that “just one invalid reference file in Content ID can impact thousands of videos and users, stripping them of monetization or blocking them altogether.”

To help rightsholders and creators track how much copyrighted content is removed from the platform, YouTube started releasing biannual transparency reports in 2021. The Electronic Frontier Foundation (EFF), a nonprofit digital rights group, applauded YouTube’s “move towards transparency” while criticizing YouTube’s “claim that YouTube is adequately protecting its creators.”

“That rings hollow,” EFF reported in 2021, noting that “huge conglomerates have consistently pushed for more and more restrictions on the use of copyrighted material, at the expense of fair use and, as a result, free expression.” As EFF saw it then, YouTube’s Content ID system mainly served to appease record labels and movie studios, while creators felt “pressured” not to dispute Content ID claims out of “fear” that their channel might be removed if YouTube consistently sided with rights holders.

According to YouTube, “it’s impossible for matching technology to take into account complex legal considerations like fair use or fair dealing,” and that impossibility seemingly ensures that creators bear the brunt of automated actions even when it’s fair to use copyrighted materials.

At that time, YouTube described Content ID as “an entirely new revenue stream from ad-supported, user generated content” for rights holders, who made more than $5.5 billion from Content ID matches by December 2020. More recently, YouTube reported that figure climbed above $9 million, as of December 2022. With so much money at play, it’s easy to see how the system could be seen as disproportionately favoring rights holders, while creators continue to suffer from income diverted by the automated system.

Washing machine chime scandal shows how absurd YouTube copyright abuse can get Read More »

can-an-online-library-of-classic-video-games-ever-be-legal?

Can an online library of classic video games ever be legal?

Legal eagles —

Preservationists propose access limits, but industry worries about a free “online arcade.”

The Q*Bert's so bright, I gotta wear shades.

Enlarge / The Q*Bert’s so bright, I gotta wear shades.

Aurich Lawson | Getty Images | Gottlieb

For years now, video game preservationists, librarians, and historians have been arguing for a DMCA exemption that would allow them to legally share emulated versions of their physical game collections with researchers remotely over the Internet. But those preservationists continue to face pushback from industry trade groups, which worry that an exemption would open a legal loophole for “online arcades” that could give members of the public free, legal, and widespread access to copyrighted classic games.

This long-running argument was joined once again earlier this month during livestreamed testimony in front of the Copyright Office, which is considering new DMCA rules as part of its regular triennial process. During that testimony, representatives of the Software Preservation Network and the Library Copyright Alliance defended their proposal for a system of “individualized human review” to help ensure that temporary remote game access would be granted “primarily for the purposes of private study, scholarship, teaching, or research.”

Lawyer Steve Englund, who represented the ESA at the Copyright Office hearing.

Enlarge / Lawyer Steve Englund, who represented the ESA at the Copyright Office hearing.

Speaking for the Entertainment Software Association trade group, though, lawyer Steve Englund said the new proposal was “not very much movement” on the part of the proponents and was “at best incomplete.” And when pressed on what would represent “complete” enough protections to satisfy the ESA, Englund balked.

“I don’t think there is at the moment any combination of limitations that ESA members would support to provide remote access,” Englund said. “The preservation organizations want a great deal of discretion to handle very valuable intellectual property. They have yet to… show a willingness on their part in a way that might be comforting to the owners of that IP.”

Getting in the way of research

Research institutions can currently offer remote access to digital copies of works like books, movies, and music due to specific DMCA exemptions issued by the Copyright Office. However, there is no similar exemption that allows for sending temporary digital copies of video games to interested researchers. That means museums like the Strong Museum of Play can only provide access to their extensive game archives if a researcher physically makes the trip to their premises in Rochester, New York.

Currently, the only way for researchers to access these games in the Strong Museum's collection is to visit Rochester, New York, in person.

Enlarge / Currently, the only way for researchers to access these games in the Strong Museum’s collection is to visit Rochester, New York, in person.

During the recent Copyright Office hearing, industry lawyer Robert Rothstein tried to argue that this amounts to more of a “travel problem” than a legal problem that requires new rule-making. But NYU professor Laine Nooney argued back that the need for travel represents “a significant financial and logistical impediment to doing research.”

For Nooney, getting from New York City to the Strong Museum in Rochester would require a five- to six-hour drive “on a good day,” they said, as well as overnight accommodations for any research that’s going to take more than a small part of one day. Because of this, Nooney has only been able to access the Strong collection twice in her career. For researchers who live farther afield—or for grad students and researchers who might not have as much funding—even a single research visit to the Strong might be out of reach.

“You don’t go there just to play a game for a couple of hours,” Nooney said. “Frankly my colleagues in literary studies or film history have pretty routine and regular access to digitized versions of the things they study… These impediments are real and significant and they do impede research in ways that are not equitable compared to our colleagues in other disciplines.”

Limited access

Lawyer Kendra Albert.

Enlarge / Lawyer Kendra Albert.

During the hearing, lawyer Kendra Albert said the preservationists had proposed the idea of human review of requests for remote access to “strike a compromise” between “concerns of the ESA and the need for flexibility that we’ve emphasized on behalf of preservation institutions.” They compared the proposed system to the one already used to grant access for libraries’ “special collections,” which are not made widely available to all members of the public.

But while preservation institutions may want to provide limited scholarly access, Englund argued that “out in the real world, people want to preserve access in order to play games for fun.” He pointed to public comments made to the Copyright Office from “individual commenters [who] are very interested in playing games recreationally” as evidence that some will want to exploit this kind of system.

Even if an “Ivy League” library would be responsible with a proposed DMCA exemption, Englund worried that less scrupulous organizations might simply provide an online “checkbox” for members of the public who could easily lie about their interest in “scholarly play.” If a human reviewed that checkbox affirmation, it could provide a legal loophole to widespread access to an unlimited online arcade, Englund argued.

Will any restrictions be enough?

VGHF Library Director Phil Salvador.

Enlarge / VGHF Library Director Phil Salvador.

Phil Salvador of the Video Game History Foundation said that Englund’s concern about this score was overblown. “Building a video game collection is a specialized skill that most libraries do not have the human labor to do, or the expertise, or the resources, or even the interest,” he said.

Salvador estimated that the number of institutions capable of building a physical collection of historical games is in the “single digits.” And that’s before you account for the significant resources needed to provide remote access to those collections; Rhizome Preservation Director Dragan Espenschied said it costs their organization “thousands of dollars a month” to run the sophisticated cloud-based emulation infrastructure needed for a few hundred users to access their Emulation as a Service art archives and gaming retrospectives.

Salvador also made reference to last year’s VGHF study that found a whopping 87 percent of games ever released are out of print, making it difficult for researchers to get access to huge swathes of video game history without institutional help. And the games of most interest to researchers are less likely to have had modern re-releases since they tend to be the “more primitive” early games with “less popular appeal,” Salvador said.

The Copyright Office is expected to rule on the preservation community’s proposed exemption later this year. But for the moment, there is some frustration that the industry has not been at all receptive to the significant compromises the preservation community feels it has made on these potential concerns.

“None of that is ever going to be sufficient to reassure these rights holders that it will not cause harm,” Albert said at the hearing. “If we’re talking about practical realities, I really want to emphasize the fact that proponents have continually proposed compromises that allow preservation institutions to provide the kind of access that is necessary for researchers. It’s not clear to me that it will ever be enough.”

Can an online library of classic video games ever be legal? Read More »