
Reddit mods are fighting to keep AI slop off subreddits. They could use help.


Mods ask Reddit for tools as generative AI gets more popular and inconspicuous.

Redditors in a treehouse with a NO AI ALLOWED sign

Credit: Aurich Lawson (based on a still from Getty Images)

Like it or not, generative AI is carving out its place in the world. And some Reddit users are definitely in the “don’t like it” category. While some subreddits openly welcome AI-generated images, videos, and text, others have responded to the growing trend by banning most or all posts made with the technology.

To better understand the reasoning and obstacles associated with these bans, Ars Technica spoke with moderators of subreddits that totally or partially ban generative AI. Almost all these volunteers described moderating against generative AI as a time-consuming challenge they expect to get more difficult as time goes on. And most are hoping that Reddit will release a tool to help their efforts.

It’s hard to know how much AI-generated content is actually on Reddit, and getting an estimate would be a large undertaking. Image library Freepik has analyzed the use of AI-generated content on social media but leaves Reddit out of its research because “it would take loads of time to manually comb through thousands of threads within the platform,” spokesperson Bella Valentini told me. For its part, Reddit doesn’t publicly disclose how many Reddit posts involve generative AI use.

To be clear, we’re not suggesting that Reddit has a large problem with generative AI use. By now, many subreddits seem to have agreed on their approach to AI-generated posts, and generative AI has not superseded the real, human voices that have made Reddit popular.

Still, mods largely agree that generative AI will likely get more popular on Reddit over the next few years, making generative AI modding increasingly important to both moderators and general users. Generative AI’s rising popularity has also had implications for Reddit the company, which in 2024 started licensing Reddit posts to train the large language models (LLMs) powering generative AI.

(Note: All the moderators I spoke with for this story requested that I use their Reddit usernames instead of their real names due to privacy concerns.)

No generative AI allowed

When it comes to anti-generative AI rules, numerous subreddits have zero-tolerance policies, while others permit posts that use generative AI if it’s combined with human elements or is executed very well. These rules task mods with identifying posts using generative AI and determining if they fit the criteria to be permitted on the subreddit.

Many subreddits have rules against posts made with generative AI because their mod teams or members consider such posts “low effort” or believe AI runs counter to the subreddit’s mission of providing real human expertise and creations.

“At a basic level, generative AI removes the human element from the Internet; if we allowed it, then it would undermine the very point of r/AskHistorians, which is engagement with experts,” the mods of r/AskHistorians told me in a collective statement.

The subreddit’s goal is to provide historical information, and its mods think generative AI could make information shared on the subreddit less accurate. “[Generative AI] is likely to hallucinate facts, generate non-existent references, or otherwise provide misleading content,” the mods said. “Someone getting answers from an LLM can’t respond to follow-ups because they aren’t an expert. We have built a reputation as a reliable source of historical information, and the use of [generative AI], especially without oversight, puts that at risk.”

Similarly, Halaku, a mod of r/wheeloftime, told me that the subreddit’s mods banned generative AI because “we focus on genuine discussion.” Halaku believes AI content can’t facilitate “organic, genuine discussion” and “can drown out actual artwork being done by actual artists.”

The r/lego subreddit banned AI-generated art because it caused confusion in online fan communities and retail stores selling Lego products, r/lego mod Mescad said. “People would see AI-generated art that looked like Lego on [I]nstagram or [F]acebook and then go into the store to ask to buy it,” they explained. “We decided that our community’s dedication to authentic Lego products doesn’t include AI-generated art.”

Not all of Reddit is against generative AI, of course. Subreddits dedicated to the technology exist, and some general subreddits permit the use of generative AI in some or all forms.

“When it comes to bans, I would rather focus on hate speech, Nazi salutes, and things that actually harm the subreddits,” said 3rdusernameiveused, who moderates r/consoom and r/TeamBuilder25, which don’t ban generative AI. “AI art does not do that… If I was going to ban [something] for ‘moral’ reasons, it probably won’t be AI art.”

“Overwhelmingly low-effort slop”

Some generative AI bans are reflective of concerns that people are not being properly compensated for the content they create, which is then fed into LLM training.

Mod Mathgeek007 told me that r/DeadlockTheGame bans generative AI because its members consider it “a form of uncredited theft,” adding:

You aren’t allowed to sell/advertise the work[s] of others, and AI in a sense is using patterns derived from the work of others to create mockeries. I’d personally have less of an issue with it if the artists involved were credited and compensated—and there are some niche AI tools that do this.

Other moderators simply think generative AI reduces the quality of a subreddit’s content.

“It often just doesn’t look good… the art can often look subpar,” Mathgeek007 said.

Similarly, r/videos bans most AI-generated content because, according to its announcement, the videos are “annoying” and “just bad video” 99 percent of the time. In an online interview, r/videos mod Abrownn told me:

It’s overwhelmingly low-effort slop thrown together simply for views/ad revenue. The creators rarely care enough to put real effort into post-generation [or] editing of the content [and] rarely have coherent narratives [in] the videos, etc. It seems like they just throw the generated content into a video, export it, and call it a day.

An r/fakemon mod told me, “I can’t think of anything more low-effort in terms of art creation than just typing words and having it generated for you.”

Some moderators say generative AI helps people spam unwanted content on a subreddit, including posts that are irrelevant to the subreddit and posts that attack users.

“[Generative AI] content is almost entirely posted for purely self promotional/monetary reasons, and we as mods on Reddit are constantly dealing with abusive users just spamming their content without regard for the rules,” Abrownn said.

A moderator of the r/wallpaper subreddit, which permits generative AI, disagrees. The mod told me that generative AI “provides new routes for novel content” in the subreddit and questioned concerns about generative AI stealing from human artists or offering lower-quality work, saying those problems aren’t unique to generative AI:

Even in our community, we observe human-generated content that is subjectively low quality (poor camera/[P]hotoshopping skills, low-resolution source material, intentional “shitposting”). It can be argued that AI-generated content amplifies this behavior, but our experience (which we haven’t quantified) is that the rate of such behavior (whether human-generated or AI-generated content) has not changed much within our own community.

But we’re not a very active community—[about] 13 posts per day … so it very well could be a “frog in boiling water” situation.

Generative AI “wastes our time”

Many mods are confident in their ability to effectively identify posts that use generative AI. A bigger problem is how much time it takes to identify these posts and remove them.

The r/AskHistorians mods, for example, noted that all bans on the subreddit (including bans unrelated to AI) have “an appeals process,” and “making these assessments and reviewing AI appeals means we’re spending a considerable amount of time on something we didn’t have to worry about a few years ago.”

They added:

Frankly, the biggest challenge with [generative AI] usage is that it wastes our time. The time spent evaluating responses for AI use, responding to AI evangelists who try to flood our subreddit with inaccurate slop and then argue with us in modmail [direct messages sent to a subreddit’s mod team], and discussing edge cases could better be spent on other subreddit projects, like our podcast, newsletter, and AMAs, … providing feedback to users, or moderating input from users who intend to positively contribute to the community.

Several other mods I spoke with agree. Mathgeek007, for example, named “fighting AI bros” as a common obstacle. And for r/wheeloftime moderator Halaku, the biggest challenge in moderating against generative AI is “a generational one.”

“Some of the current generation don’t have a problem with it being AI because content is content, and [they think] we’re being elitist by arguing otherwise, and they want to argue about it,” they said.

A couple of mods noted that it’s less time-consuming to moderate subreddits that ban generative AI than it is to moderate those that allow posts using generative AI, depending on the context.

“On subreddits where we allowed AI, I often take a bit longer time to actually go into each post where I feel like… it’s been AI-generated to actually look at it and make a decision,” explained N3DSdude, a mod of several subreddits with rules against generative AI, including r/DeadlockTheGame.

MyarinTime, a moderator for r/lewdgames, which allows generative AI images, highlighted the challenges of identifying human-prompted generative AI content versus AI-generated content prompted by a bot:

When the AI bomb started, most of those bots started using AI content to work around our filters. Most of those bots started showing some random AI render, so it looks like you’re actually talking about a game when you’re not. There’s no way to know when those posts are legit games unless [you check] them one by one. I honestly believe it would be easier if we kick any post with [AI-]generated image… instead of checking if a button was pressed by a human or not.

Mods expect things to get worse

Most mods told me it’s pretty easy for them to detect posts made with generative AI, pointing to the distinct tone and favored phrases of AI-generated text. A few said that AI-generated video is harder to spot but still detectable. But as generative AI gets more advanced, moderators are expecting their work to get harder.

In a joint statement, r/dune mods Blue_Three and Herbalhippie said, “AI used to have a problem making hands—i.e., too many fingers, etc.—but as time goes on, this is less and less of an issue.”

R/videos’ Abrownn also wonders how easy it will be to detect AI-generated Reddit content “as AI tools advance and content becomes more lifelike.”

Mathgeek007 added:

AI is becoming tougher to spot and is being propagated at a larger rate. When AI style becomes normalized, it becomes tougher to fight. I expect generative AI to get significantly worse—until it becomes indistinguishable from ordinary art.

Moderators currently use various methods to fight generative AI, but they’re not perfect. r/AskHistorians mods, for example, use “AI detectors, which are unreliable, problematic, and sometimes require paid subscriptions, as well as our own ability to detect AI through experience and expertise,” while N3DSdude pointed to tools like Quid and GPTZero.
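The manual cues mods describe—“distinct tone and favored phrases”—amount to a watchlist check. The sketch below is an illustrative toy in Python, not any mod team’s actual method or a real detector’s logic; the phrase list and function name are invented for this example, and real detection is far less reliable than simple substring matching.

```python
# Illustrative toy only: phrase-watchlist flagging, mimicking the manual
# cues mods describe. The phrase list is invented for this sketch.

SUSPECT_PHRASES = [
    "as an ai language model",
    "delve into",
    "in today's fast-paced world",
    "it's important to note that",
]

def flag_for_review(post_text: str) -> bool:
    """Return True if the post contains any watchlisted phrase (case-insensitive)."""
    lowered = post_text.lower()
    return any(phrase in lowered for phrase in SUSPECT_PHRASES)

print(flag_for_review("Let's delve into the rich tapestry of Roman history."))  # True
print(flag_for_review("My grandfather kept a diary during the war."))           # False
```

A heuristic like this only flags posts for human review; as the mods note, both false positives and misses are common, which is why appeals processes exist.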

To manage current and future work around blocking generative AI, most of the mods I spoke with said they’d like Reddit to release a proprietary tool to help them.

“I’ve yet to see a reliable tool that can detect AI-generated video content,” Abrownn said. “Even if we did have such a tool, we’d be putting hundreds of hours of content through the tool daily, which would get rather expensive rather quickly. And we’re unpaid volunteer moderators, so we will be outgunned shortly when it comes to detecting this type of content at scale. We can only hope that Reddit will offer us a tool at some point in the near future that can help deal with this issue.”

A Reddit spokesperson told me that the company is evaluating what such a tool could look like. But Reddit doesn’t have a rule banning generative AI overall, and the spokesperson said the company doesn’t want to release a tool that would hinder expression or creativity.

For now, Reddit seems content to rely on moderators to remove AI-generated content when appropriate. Reddit’s spokesperson added:

Our moderation approach helps ensure that content on Reddit is curated by real humans. Moderators are quick to remove content that doesn’t follow community rules, including harmful or irrelevant AI-generated content—we don’t see this changing in the near future.

Making a generative AI Reddit tool wouldn’t be easy

Reddit is handling the evolving concerns around generative AI as it has handled other content issues, including by leveraging AI and machine learning tools. Reddit’s spokesperson said that this includes testing tools that can identify AI-generated media, such as images of politicians.

But making a proprietary tool that allows moderators to detect AI-generated posts won’t be easy, if it happens at all. The current tools for detecting generative AI are limited in their capabilities, and as generative AI advances, Reddit would need to provide tools that are more advanced than the AI-detecting tools that are currently available.

That would require a good deal of technical resources and would also likely present notable economic challenges for the social media platform, which only became profitable last year. And as noted by r/videos moderator Abrownn, tools for detecting AI-generated video still have a long way to go, making a Reddit-specific system especially challenging to create.

But even with a hypothetical Reddit tool, moderators would still have their work cut out for them. And because Reddit’s popularity is largely due to its content from real humans, that work is important.

Since Reddit’s inception, that has meant relying on moderators, which Reddit has said it intends to keep doing. As r/dune mods Blue_Three and Herbalhippie put it, it’s in Reddit’s “best interest that much/most content remains organic in nature.” After all, Reddit’s profitability has a lot to do with how much AI companies are willing to pay to access Reddit data. That value would likely decline if Reddit posts became largely AI-generated themselves.

But providing the technology to ensure that generative AI isn’t abused on Reddit would be a large challenge. For now, volunteer laborers will continue to bear the brunt of generative AI moderation.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


Reddit will lock some content behind a paywall this year, CEO says

Reddit is planning to introduce a paywall this year, CEO Steve Huffman said during a videotaped Ask Me Anything (AMA) session on Thursday.

Huffman previously showed interest in potentially introducing a new type of subreddit with “exclusive content or private areas” that Reddit users would pay to access.

When asked this week about plans for some Redditors to create “content that only paid members can see,” Huffman said:

It’s a work in progress right now, so that one’s coming… We’re working on it as we speak.

When asked about “new, key features that you plan to roll out for Reddit in 2025,” Huffman responded, in part: “Paid subreddits, yes.”

Reddit’s paywall would ostensibly only apply to certain new subreddit types, not any subreddits currently available. In August, Huffman said that even with paywalled content, free Reddit would “continue to exist and grow and thrive.”

A critical aspect of any potential plan to make Reddit users pay to access subreddit content is determining how related Reddit users will be compensated. Reddit may have a harder time getting volunteer moderators to wrangle discussions on paid-for subreddits—if it uses volunteer mods at all. Balancing paid and free content would also be necessary to avoid polarizing much of Reddit’s current user base.

Reddit has had paid-for premium versions of community features before, like r/Lounge, a subreddit that only people with Reddit Gold, which you have to buy with real money, can access.

Reddit would also need to consider how it might compensate people for user-generated content that people pay to access, as Reddit’s business is largely built on free, user-generated content. The Reddit Contributor Program, launched in September 2023, could be a foundation; it lets users “earn money for their qualifying contributions to the Reddit community, including awards and karma, collectible avatars, and developer apps,” according to Reddit. Reddit says it pays up to $0.01 per 1 Gold received, depending on how much karma the user has earned over the past year. To receive a payout, a user needs at least 1,000 Gold, which is equivalent to $10.
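The payout math quoted above fits in a few lines. This is only a sketch of the figures the article cites—the $0.01-per-Gold ceiling and the 1,000-Gold minimum—treated as a flat rate; Reddit’s actual payout logic varies with karma, and the function name is invented.

```python
# Sketch of the Contributor Program figures cited above, assuming the
# flat $0.01-per-Gold ceiling (actual rates depend on the user's karma).

MIN_GOLD = 1_000  # minimum Gold balance before a payout is issued

def payout_dollars(gold: int) -> float:
    """Dollar payout for a Gold balance, or 0.0 below the minimum."""
    if gold < MIN_GOLD:
        return 0.0
    return gold / 100  # $0.01 per Gold

print(payout_dollars(1_000))  # 10.0 -- the $10 minimum cited in the article
print(payout_dollars(999))    # 0.0  -- below the payout threshold
```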


Reddit won’t interfere with users revolting against X with subreddit bans

A Reddit spokesperson told Ars that decisions to ban or not ban X links are user-driven. Subreddit members are allowed to suggest and institute subreddit rules, they added.

“Notably, many Reddit communities also prohibit Reddit links,” the Reddit representative pointed out. They noted that Reddit as a company doesn’t currently have any ban on links to X.

A ban against links to an entire platform isn’t outside of the ordinary for Reddit. Numerous subreddits ban social media links, Reddit’s spokesperson said. r/EarthPorn, a subreddit for landscape photography, for example, doesn’t allow website links because all posts “must be static images,” per the subreddit’s official rules. r/AskReddit, meanwhile, only allows for questions asked in the title of a Reddit post and doesn’t allow for use of the text box, including for sharing links.

“Reddit has a longstanding commitment to freedom of speech and freedom of association,” Reddit’s spokesperson said. They added that any person is free to make or moderate their own community. Those unsatisfied with a forum about Seahawks football that doesn’t have X links could feel free to make their own subreddit, though some of the subreddits considering X bans, like r/MadeMeSmile, already have millions of followers.

Meta bans also under discussion

As 404 Media noted, some Redditors are also pushing to block content from Facebook, Instagram, and other Meta properties in response to new Donald Trump-friendly policies instituted by owner Mark Zuckerberg, like Meta killing diversity programs and axing third-party fact-checkers.


Reddit debuts AI-powered discussion search—but will users like it?

The company then went on to strike deals with major tech firms, including a $60 million agreement with Google in February 2024 and a partnership with OpenAI in May 2024 that integrated Reddit content into ChatGPT.

But Reddit users haven’t been entirely happy with the deals. In October 2024, London-based Redditors began posting false restaurant recommendations to manipulate search results and keep tourists away from their favorite spots. This coordinated effort to feed incorrect information into AI systems demonstrated how user communities might intentionally “poison” AI training data over time.

The potential for trouble

While it’s tempting to lean heavily into generative AI while the technology is trendy, the move could also present a challenge for the company. For example, Reddit’s AI-powered summaries could draw from inaccurate information featured on the site and provide incorrect answers, or they may draw inaccurate conclusions from correct information.

We will keep an eye on Reddit’s new AI-powered search tool to see if it resists the type of confabulation that we’ve seen with Google’s AI Overview, an AI summary bot that has been a critical failure so far.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.


Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers

Fed up Londoners

Apparently, some London residents are getting fed up with social media influencers whose reviews draw long lines of tourists to their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the boom of Redditors referring people to Angus Steakhouse, a chain restaurant, to combat it.

As Gizmodo deduced, the trend seemed to start on the r/London subreddit, where a user complained about a spot in Borough Market being “ruined by influencers” on Monday:

“Last 2 times I have been there has been a queue of over 200 people, and the ones with the food are just doing the selfie shit for their [I]nsta[gram] pages and then throwing most of the food away.”

As of this writing, the post has 4,900 upvotes and numerous responses suggesting that Redditors talk about how good Angus Steakhouse is so that Google picks up on it. Commenters quickly understood the assignment.

“Agreed with other posters Angus steakhouse is absolutely top tier and tourists shoyldnt [sic] miss out on it,” one Redditor wrote.

Another Reddit user wrote:

Spreading misinformation suddenly becomes a noble goal.

As of this writing, asking Google for the best steak, steakhouse, or steak sandwich in London (or similar) isn’t generating an AI Overview result for me. But when I searched for the best steak sandwich in London, the top result is from Reddit, including a thread from four days ago titled “Which Angus Steakhouse do you recommend for their steak sandwich?” and one from two days ago titled “Had to see what all the hype was about, best steak sandwich I’ve ever had!” with a picture of an Angus Steakhouse.


In fear of more user protests, Reddit announces controversial policy change

Protest blowback —

Moderators now need Reddit’s permission to turn subreddits private, NSFW.


Following site-wide user protests last year that featured moderators turning thousands of subreddits private or not-safe-for-work (NSFW), Reddit announced that mods now need its permission to make those changes.

Reddit’s VP of community, going by Go_JasonWaterfalls, made the announcement about what Reddit calls Community Types today. Reddit’s permission is also required to make subreddits restricted or to go from NSFW to safe-for-work (SFW). Reddit’s employee claimed that requests will be responded to “in under 24 hours.”

Reddit’s employee said that “temporarily going restricted is exempt” from this requirement, adding that “mods can continue to instantly restrict posts and/or comments for up to 7 days using Temporary Events.” Additionally, if a subreddit has fewer than 5,000 members or is less than 30 days old, the request “will be automatically approved,” per Go_JasonWaterfalls.
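The auto-approval rule described in that announcement reduces to a one-line predicate. This is a hypothetical sketch of the policy as stated—fewer than 5,000 members or under 30 days old—with invented function and parameter names, not Reddit’s implementation.

```python
# Hypothetical sketch of the stated auto-approval rule for Community Type
# change requests. Names are invented; this is not Reddit's code.

MEMBER_THRESHOLD = 5_000  # auto-approved below this member count...
AGE_THRESHOLD_DAYS = 30   # ...or if the subreddit is younger than this

def change_auto_approved(members: int, age_days: int) -> bool:
    """True if a Community Type change request skips manual review."""
    return members < MEMBER_THRESHOLD or age_days < AGE_THRESHOLD_DAYS

print(change_auto_approved(members=4_000, age_days=365))   # True: small community
print(change_auto_approved(members=80_000, age_days=10))   # True: new community
print(change_auto_approved(members=80_000, age_days=365))  # False: needs review
```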

Reddit’s post includes a list of “valid” reasons that mods tend to change their subreddit’s Community Type and provides alternative solutions.

Last year’s protests “accelerated” this policy change

Last year, Reddit announced that it would be charging a massive amount for access to its previously free API. This caused many popular third-party Reddit apps to close down. Reddit users then protested by turning subreddits private (or read-only) or by only showing NSFW content or jokes and memes. Reddit then responded by removing some moderators; eventually, the protests subsided.

Reddit, which previously admitted that another similar protest could hurt it financially, has maintained that moderators’ actions during the protests broke its rules. Now, it has solidified a way to prevent something like last year’s site-wide protests from happening again.

Speaking to The Verge, Laura Nestler, who The Verge reported is Go_JasonWaterfalls, claimed that Reddit has been talking about making this change since at least 2021. The protests, she said, were a wake-up call that moderators’ ability to turn subreddits private “could be used to harm Reddit at scale.” The protests “accelerated” the policy change, per Nestler.

The announcement on r/modnews reads:

… the ability to instantly change Community Type settings has been used to break the platform and violate our rules. We have a responsibility to protect Reddit and ensure its long-term health, and we cannot allow actions that deliberately cause harm.

After shutting down a tactic for responding to unfavorable Reddit policy changes, Go_JasonWaterfalls claimed that Reddit still wants to hear from users.

“Community Type settings have historically been used to protest Reddit’s decisions,” they wrote.

“While we are making this change to ensure users’ expectations regarding a community’s access do not suddenly change, protest is allowed on Reddit. We want to hear from you when you think Reddit is making decisions that are not in your communities’ best interests. But if a protest crosses the line into harming redditors and Reddit, we’ll step in.”

Last year’s user protests illustrated how dependent Reddit is on unpaid moderators and user-generated content. At times, things turned ugly, pitting Reddit executives against long-time users (Reddit CEO Steve Huffman infamously called Reddit mods “landed gentry,” something that some were quick to remind Go_JasonWaterfalls of) and reportedly worrying Reddit employees.

Although the protests failed to reverse Reddit’s prohibitive API fees or to save most third-party apps, they succeeded in getting users’ concerns heard and even crashed Reddit for three hours. Further, NSFW protests temporarily prevented Reddit from selling ads on some subreddits. Since going public this year and amid a push to reach profitability, Reddit has been more focused on ads than ever. (Most of Reddit’s money comes from ads.)

Reddit’s Nestler told The Verge that the new policy was reviewed by Reddit’s Mod Council. Reddit is confident that it won’t lose mods because of the change, she said.

“Demotes us all to janitors”

The news marks another broad policy change that is likely to upset users and make Reddit seem unwilling to give in to user feedback, despite Go_JasonWaterfalls saying that “protest is allowed on Reddit.” For example, in response, Reddit user CouncilOfStrongs said:

Don’t lie to us, please.

Something that you can ignore because it has no impact cannot be a protest, and no matter what you say that is obviously the one and only point of you doing this – to block moderators from being able to hold Reddit accountable in even the smallest way for malicious, irresponsible, bad faith changes that they make.

Reddit user belisaurius, who is listed as a mod for several active subreddits, including a 336,000-member one for the Philadelphia Eagles NFL team, said that the policy change “removes moderators from any position of central responsibility and demotes us all to janitors.”

As Reddit continues seeking profits and seemingly more control over a platform built around free user-generated content and moderation, users will have to either accept that Reddit is changing or leave the platform.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder in Reddit.


Reddit considers search ads, paywalled content for the future

Q2 2024 earnings —

Current ad load is relatively “light,” COO says.


Reddit executives discussed plans on Tuesday for making more money from the platform, including showing ads in more places and possibly putting some content behind a paywall.

On Tuesday, Reddit shared its Q2 2024 earnings report (PDF). The company lost $10.1 million during the period, down from Q2 2023’s $41.1 million loss. Reddit has never been profitable, and during its earnings call yesterday, company heads discussed potential and slated plans for monetization.

As expected, selling ads continues to be a priority. Part of the reason Reddit was OK with most third-party Reddit apps closing was that the change was expected to drive people to Reddit’s native website and apps, where the company sells ads. In Q2, Reddit’s ad revenue grew 41 percent year over year (YoY) to $253.1 million, or 90 percent of total revenue ($281.2 million).
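The revenue figures above are easy to sanity-check. The snippet below just reproduces the article’s arithmetic; the implied Q2 2023 figure is derived from the stated 41 percent growth, not quoted in the report.

```python
# Sanity-checking the quoted Q2 2024 figures: $253.1M ad revenue,
# $281.2M total revenue, and 41 percent year-over-year ad growth.

ad_revenue = 253.1     # $M, Q2 2024 advertising revenue
total_revenue = 281.2  # $M, Q2 2024 total revenue

ad_share = ad_revenue / total_revenue * 100
print(round(ad_share))  # 90 -- matches "90 percent of total revenue"

# Implied Q2 2023 ad revenue from the 41% YoY growth (derived, not stated):
implied_prior = ad_revenue / 1.41
print(round(implied_prior, 1))  # roughly 179.5 ($M)
```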

When asked how the platform would grow ad revenue, Reddit COO Jen Wong said it’s important that advertisers “find the outcomes they want at the volumes and price they want.” She also pointed to driving more value per ad, or the cost that advertisers pay per 1,000 impressions on average. To do that, Wong pointed to putting ads in places on Reddit where there aren’t ads currently:

There are still many places on Reddit without ads today. So we’re more focused on designing ads for spaces where users are spending more time versus increasing ad load in existing spaces. So for example, 50 percent of screen views, they’re now on conversation pages—that’s an opportunity.

Wong said that in places where Reddit does show ads currently, the ad load is “light” compared to about half of its rivals.

One of the places where Redditors may see more ads is within comments, which Wong noted that Reddit is currently testing. This ad capability is only “experimental,” Wong emphasized, but Reddit sees ads in comments as a way to stand out to advertisers.

There’s also an opportunity to sell ad space within Reddit search results, according to Reddit CEO Steve Huffman, who said yesterday that “over the long term, there’s significant advertising potential there as well.” More immediately, though, Reddit is looking to improve its search capabilities and this year will test “new search result pages powered by AI to summarize and recommend content, helping users dive deeper into products, shows, games, and discover new communities on Reddit,” Huffman revealed yesterday. He said Reddit is using first- and third-party AI models to improve search aspects like speed and relevance.

The move comes as Reddit is currently blocking all search engines besides Google, OpenAI, and approved education/research instances from showing recent Reddit content in their results. Yesterday, Huffman reiterated his statement that Reddit is working with “big and small” search engines to strike deals like it already has with Google and OpenAI. But looking ahead, Reddit is focused on charging for content scraping and seems to be trying to capitalize on people’s interest in using Reddit as a filter for search results.

Paywalled content possible

The possibility of paywalls came up during the earnings call when an analyst asked Huffman about maintaining Reddit’s culture as it looks to “earn money now for people and creators on the platform.” Reddit has already launched a Contributor Program, where popular posts can make Reddit users money. It has discussed monetizing its developer platform, which is in public beta with “a few hundred active developers,” Huffman said yesterday. In response to the analyst’s question, Huffman said that based on his experience, adding new ways of using Reddit “expands” the platform but doesn’t “cannibalize existing Reddit.”

He continued:

I think the existing altruistic, free version of Reddit will continue to exist and grow and thrive just the way it has. But now we will unlock the door for new use cases, new types of subreddits that can be built that may have exclusive content or private areas—things of that nature.

Huffman’s comments suggest that paywalls could be added to new subreddits rather than existing ones. At this stage, though, it’s unclear how users may be charged to use Reddit in the future, if at all.

The idea of paywalling some content comes as various online entities are trying to diversify revenue beyond often volatile ad spending. Reddit has also tried elevating free aspects of the site, such as updates to Ask Me Anything (AMA), including new features like RSVPs, which were announced Tuesday.

Reddit has angered some long-time users with recent changes—including blocking search engines, forcing personalized ads, introducing an exclusionary fee for API access, and ousting some moderators during last year’s user protests—but Reddit saw its daily active unique user count increase by 51 percent YoY in Q2 to 91.2 million.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder in Reddit.


Google’s AI Overview is flawed by design, and a new company blog post hints at why

guided by voices —

Google: “There are bound to be some oddities and errors” in system that told people to eat rocks.

The Google “G” logo surrounded by whimsical characters, all of which look stunned and surprised.

On Thursday, Google capped off a rough week of providing inaccurate and sometimes dangerous answers through its experimental AI Overview feature by authoring a follow-up blog post titled, “AI Overviews: About last week.” In the post, attributed to Google VP Liz Reid, head of Google Search, the firm formally acknowledged issues with the feature and outlined steps taken to improve a system that appears flawed by design, even if the company doesn’t seem to realize it is admitting as much.

To recap, the AI Overview feature—which the company showed off at Google I/O a few weeks ago—aims to provide search users with summarized answers to questions by using an AI model integrated with Google’s web ranking systems. Right now, it’s an experimental feature that is not active for everyone, but when a participating user searches for a topic, they might see an AI-generated answer at the top of the results, pulled from highly ranked web content and summarized by an AI model.
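In outline, the retrieve-then-summarize pattern Google describes can be sketched as follows. This is purely illustrative: the real system is proprietary, and every name here (the index shape, `ai_overview`, the `summarize` callable) is a stand-in, not Google’s API.

```python
# Hypothetical sketch of a retrieval-then-summarize pipeline.
# All names and data structures are illustrative, not Google's.

def ai_overview(query, index, summarize):
    """Return a summarized answer grounded in the top-ranked results."""
    results = sorted(index.get(query, []), key=lambda r: r["rank"])[:3]
    snippets = [r["snippet"] for r in results]
    # The language model is constrained to the retrieved snippets --
    # which is also why bad top results produce bad answers.
    return {
        "answer": summarize(snippets),
        "sources": [r["url"] for r in results],
    }

# Toy index with a single ranked result for one query.
index = {
    "sega saturn release date": [
        {"rank": 1,
         "snippet": "The Saturn launched in Japan in 1994.",
         "url": "https://example.com/saturn"},
    ]
}
overview = ai_overview("sega saturn release date", index,
                       lambda snippets: " ".join(snippets))
print(overview["answer"])
```

The key design choice, as the blog post itself confirms, is that the answer is only as good as whatever ranks highly for the query; the summarizer never checks the snippets against anything else.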

While Google claims this approach is “highly effective” and on par with its Featured Snippets in terms of accuracy, the past week has seen numerous examples of the AI system generating bizarre, incorrect, or even potentially harmful responses, as we detailed in a recent feature where Ars reporter Kyle Orland replicated many of the unusual outputs.

Drawing inaccurate conclusions from the web

On Wednesday morning, Google’s AI Overview was erroneously telling us the Sony PlayStation and Sega Saturn were available in 1993.

Kyle Orland / Google

Given the circulating AI Overview examples, Google almost apologizes in the post and says, “We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and take it seriously.” But Reid, in an attempt to justify the errors, then goes into some very revealing detail about why AI Overviews provides erroneous information:

AI Overviews work very differently than chatbots and other LLM products that people may have tried out. They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to only show information that is backed up by top web results.

This means that AI Overviews generally don’t “hallucinate” or make things up in the ways that other LLM products might.

Here we see the fundamental flaw of the system: “AI Overviews are built to only show information that is backed up by top web results.” The design is based on the false assumption that Google’s page-ranking algorithm favors accurate results and not SEO-gamed garbage. Google Search has been broken for some time, and now the company is relying on those gamed and spam-filled results to feed its new AI model.

Even if the AI model draws from a more accurate source, as with the 1993 game console search seen above, Google’s AI language model can still make inaccurate conclusions about the “accurate” data, confabulating erroneous information in a flawed summary of the information available.

Generally ignoring the folly of basing its AI results on a broken page-ranking algorithm, Google’s blog post instead attributes the commonly circulated errors to several other factors, including users making nonsensical searches “aimed at producing erroneous results.” Google does admit faults with the AI model, like misinterpreting queries, misinterpreting “a nuance of language on the web,” and lacking sufficient high-quality information on certain topics. It also suggests that some of the more egregious examples circulating on social media are fake screenshots.

“Some of these faked results have been obvious and silly,” Reid writes. “Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared. So we’d encourage anyone encountering these screenshots to do a search themselves to check.”

(No doubt some of the social media examples are fake, but it’s worth noting that any attempts to replicate those early examples now will likely fail because Google will have manually blocked the results. And it is potentially a testament to how broken Google Search is if people believed extreme fake examples in the first place.)

While addressing the “nonsensical searches” angle in the post, Reid uses the example search, “How many rocks should I eat each day,” which went viral in a tweet on May 23. Reid says, “Prior to these screenshots going viral, practically no one asked Google that question.” And since there isn’t much data on the web that answers it, she says there is a “data void” or “information gap” that was filled by satirical content found on the web, and the AI model found it and pushed it as an answer, much like Featured Snippets might. So basically, it was working exactly as designed.

A screenshot of an AI Overview query, “How many rocks should I eat each day,” that went viral on X last week.


Bing outage shows just how little competition Google search really has

Searching for new search —

Opinion: Actively searching without Google or Bing is harder than it looks.

Google logo on a phone in front of a Bing logo in the background

Getty Images

Bing, Microsoft’s search engine platform, went down in the very early morning today. That meant that searches from Microsoft’s Edge browsers that had yet to change their default providers didn’t work. It also meant that services relying on Bing’s search API—Microsoft’s own Copilot, ChatGPT search, Yahoo, Ecosia, and DuckDuckGo—similarly failed.

Services were largely restored by the morning Eastern work hours, but the timing feels apt, concerning, or some combination of the two. Google, the consistently dominating search platform, just last week announced and debuted AI Overviews as a default addition to all searches. If you don’t want an AI response but still want to use Google, you can hunt down the new “Web” option in a menu, or you can, per Ernie Smith, tack “&udm=14” onto your search or use Smith’s own “Konami code” shortcut page.
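The `&udm=14` trick amounts to appending one query-string parameter to a standard Google search URL. A minimal sketch (the helper name is ours; `udm=14` is the parameter described above):

```python
from urllib.parse import quote_plus


def web_only_search_url(query: str) -> str:
    # udm=14 selects Google's plain "Web" results view,
    # which skips the AI Overview box.
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=14"


print(web_only_search_url("sega saturn release date"))
```

Bookmarking such a URL as a browser search keyword gives the same effect as Smith’s shortcut page.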

If dismay about AI’s hallucinations, power draw, or pizza recipes concern you—along with perhaps broader Google issues involving privacy, tracking, news, SEO, or monopoly power—most of your other major options were brought down by a single API outage this morning. Moving past that kind of single point of vulnerability will take some work, both by the industry and by you, the person wondering if there’s a real alternative.

Search engine market share, as measured by StatCounter, April 2023–April 2024.

Search engine market share, as measured by StatCounter, April 2023–April 2024.

StatCounter

Upward of a billion dollars a year

The overwhelming majority of search tools offering an “alternative” to Google are using Google, Bing, or Yandex, the three major search engines that maintain massive global indexes. Yandex, being based in Russia, is a non-starter for many people around the world at the moment. Bing offers its services widely, most notably to DuckDuckGo, but its ad-based revenue model and privacy particulars have caused some friction there in the past. Before his company was able to block more of Microsoft’s own tracking scripts, DuckDuckGo CEO and founder Gabriel Weinberg explained in a Reddit reply why firms like his weren’t going the full DIY route:

… [W]e source most of our traditional links and images privately from Bing … Really only two companies (Google and Microsoft) have a high-quality global web link index (because I believe it costs upwards of a billion dollars a year to do), and so literally every other global search engine needs to bootstrap with one or both of them to provide a mainstream search product. The same is true for maps btw — only the biggest companies can similarly afford to put satellites up and send ground cars to take streetview pictures of every neighborhood.

Bing makes Microsoft money, if not quite profit yet. It’s in Microsoft’s interest to keep its search index stocked and API open, even if its focus is almost entirely on its own AI chatbot version of Bing. Yet if Microsoft decided to pull API access, or it became unreliable, Google’s default position gets even stronger. What would non-conformists have to choose from then?


OpenAI will use Reddit posts to train ChatGPT under new deal

Data dealings —

Reddit has been eager to sell data from user posts.

An image of a woman holding a cell phone in front of the Reddit logo displayed on a computer screen, on April 29, 2024, in Edmonton, Canada.

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI large language models (LLMs) to create features for Reddit, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.”

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its AI partnerships have sparked debate over the use of user-generated content to fuel AI models without compensating users, some of whom likely never considered that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI is also challenged to work with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods. Some have exited the platform since, and following the rule changes, Ars Technica spoke with long-time Redditors who were concerned about Reddit content quality moving forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.


Reddit, AI spam bots explore new ways to show ads in your feed

BRAZIL - 2024/04/08: In this photo illustration, a Reddit logo seen displayed on a computer screen through a magnifying glass

Reddit has made it clear that it’s an ad-first business. Today, it expanded on that practice with a new ad format that looks to sell things to Reddit users. Simultaneously, Reddit has marketers who are interested in pushing products to users through seemingly legitimate accounts.

In a blog post today, Reddit announced that its Dynamic Product Ads are entering public beta globally. The ad format combines “shopping signals” (discussions in which people express interest in trying a product or brand), machine learning, and advertiser product catalogs to serve relevant ads. Reddit shared an image in the blog post showing ads, complete with products and pricing, that appear to relate to a posted question. User responses to the Reddit post appear under the ad.

  • A somewhat blurry depiction of the new type of ads Reddit is testing.

  • A (still blurry) example of a more targeted approach to Reddit’s new ad format.

Reddit’s Dynamic Product Ads can automatically show users ads “based on the products they’ve previously engaged with on the advertiser’s site” and/or “based on what people engage with on Reddit or advertiser sites,” per the blog.

Reddit is an ad business

Reddit’s blog didn’t imply that Dynamic Product Ads means users would see more ads than they do currently. However, today’s blog highlighted the newly public company’s focus on ad sales.

“With Dynamic Product Ads, brands can tap into the rich, high-intent product conversations that people come to Reddit for,” Reddit EVP of Business Marketing and Growth Jim Squires said in a statement.

The blog also noted that “Reddit’s communities are naturally commercial,” adding:

Reddit is where people come to make shopping decisions, and we’re focused on bringing brands into these interactions in a way that adds value for people and drives growth for businesses.

The stance has been increasingly clear over the past year, as Reddit became rather vocal about the fact that it’s never been profitable. In June, the company started charging for API access, resulting in numerous valued third-party Reddit apps closing and messy user protests that left a bad taste in countless long-time users’ and moderators’ mouths. While Reddit initially announced the change as a way to prevent large language models from using its data for free training, it was also seen as a way to drive users to Reddit’s website and mobile app, where it can serve users ads.

Per Reddit’s February SEC filing (PDF), ads made up 98 percent of Reddit’s revenues in 2023 and 2022. That filing included a note from CEO Steve Huffman, saying: “Advertising is our first business” and that Reddit’s ad business is “still in the early phases of growing.”

In September, the company started preventing users from opting out of personalized ads. In June, Reddit introduced a new tool to advertisers that uses natural language processing to look through Reddit user comments for keywords that signal potential interest for a brand.
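Conceptually, that kind of keyword-signal tool can be sketched in a few lines. This is a hypothetical illustration only: Reddit’s actual advertiser tool is proprietary, and the keyword set and function name below are invented for the example.

```python
# Hypothetical sketch of keyword-based interest detection over comments.
# The keyword set and function name are illustrative, not Reddit's.
BRAND_KEYWORDS = {"trail shoes", "running shoes", "marathon"}


def comments_signaling_interest(comments):
    """Return comments containing any brand keyword (case-insensitive)."""
    return [c for c in comments
            if any(kw in c.lower() for kw in BRAND_KEYWORDS)]


hits = comments_signaling_interest([
    "Looking for trail shoes for my first ultra",
    "Anyone tried a sourdough starter kit?",
])
print(hits)
```

A production system would presumably use richer NLP than substring matching, but the business logic is the same: surface comments whose language signals purchase intent for a given brand.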

Reddit’s blog post today hinted at some future evolutions focused on showing Reddit users ads, including “tools and features such as new shopping ads formats like collection ads that enhance the shopper experience while driving performance” and “merchant platform integrations that welcome smaller merchants.”


After overreaching TOS angers users, cloud provider Vultr backs off

“Clearly causing confusion” —

Terms seemed to grant an “irrevocable” right to commercialize any user content.


After backlash, the cloud provider Vultr has updated its terms to remove a clause that a Reddit user feared required customers to “fork over rights” to “anything” hosted on its platform.

The alarming clause seemed to grant Vultr a “non-exclusive, perpetual, irrevocable” license to “use and commercialize” any user content uploaded, posted, hosted, or stored on Vultr “in any way that Vultr deems appropriate, without any further consent” or compensation to users or third parties.

Here’s the full clause that was removed:

You hereby grant to Vultr a non-exclusive, perpetual, irrevocable, royalty-free, fully paid-up, worldwide license (including the right to sublicense through multiple tiers) to use, reproduce, process, adapt, publicly perform, publicly display, modify, prepare derivative works, publish, transmit and distribute each of your User Content, or any portion thereof, in any form, medium or distribution method now known or hereafter existing, known or developed, and otherwise use and commercialize the User Content in any way that Vultr deems appropriate, without any further consent, notice and/or compensation to you or to any third parties, for purposes of providing the Services to you.

In a statement provided to Ars, Vultr CEO J.J. Kardwell said that the terms were revised to “simplify and clarify” language causing confusion for some users.

“A Reddit post incorrectly took portions of our Terms of Service out of context, which only pertain to content provided to Vultr on our public mediums (community-related content on public forums, as an example) for purposes of rendering the needed services—e.g., publishing comments, posts, or ratings,” Kardwell said. “This is separate from a user’s own, private content that is deployed on Vultr services.”

It’s easy to see why the Reddit user was confused, as the previous terms did not clearly differentiate between a user’s public and “private content” in the paragraph where it was included. Kardwell told The Register that the old terms, which were drafted in 2021, were “clearly causing confusion for some portion of users” and were updated because Vultr recognized “that the average user doesn’t have a law degree.”

According to Kardwell, the part of the removed clause that “ends with ‘for purposes of providing the Services to you'” was “intended to make it clear that any rights referenced are solely for the purposes of providing the Services to you.” Kevin Cochrane, Vultr’s chief marketing officer, told Ars that users were intended to scroll down to understand that the line only applied to community content described in a section labeled “content that you make publicly available.” He said that the removed clause was necessary in 2021 when Vultr provided forums and collected ratings, but that the clause could be stripped now because “we don’t actually use” that kind of community content “any longer.”

“We’re very focused on being responsive to the community and the concerns people have, and we believe the strongest thing we can do to demonstrate that there is no bad intent here is to remove it,” Kardwell told The Register.

A plain read of the terms without scrolling seemed to substantiate the Reddit user’s worst fears that “it’s possible Vultr may want the expansive license grant to do AI/Machine Learning based on the data they host. Or maybe they could mine database contents to resell [personally identifiable information]. Given the (perpetual!) license, there’s not really any limit to what they might do. They could even clone someone’s app and sell their own rebranded version, and they’d be legally in the clear.”

The user claimed to have been locked out of their Vultr account for five days after refusing to agree to the terms, with Vultr’s support team seemingly providing little recourse to migrate data to a new cloud provider.

“Migrating all my servers and DNS without being able to log in to my account is going to be both a headache and error prone,” the Reddit user wrote. “I feel like they’re holding my business hostage and extorting me into accepting a license I would never consent to under duress.”

Ars was not able to reach the Reddit user to see if Vultr removing the line from the terms has resolved the issue. Other users on the thread claimed that they had terminated their Vultr accounts over the controversy. Cochrane told Ars that Vultr had been contacted by many customers over the past two days and had no way to identify the Reddit user to confirm whether that person had terminated their account. Cochrane said the support team was actively reaching out to users to verify if their complaints stemmed from discomfort with the previous terms.

In his statement, Kardwell reiterated that Vultr “customers own 100 percent of their content,” clarifying that Vultr “has never claimed any rights to, used, accessed, nor allowed access to or shared” user content, “other than as may be required by law or for security purposes.”

He also confirmed that Vultr would be conducting a “full review” of its terms and publishing another update “soon.” Kardwell told The Register that the most recent update to its terms that led the Reddit user to call out the company was “actually spurred by unrelated Microsoft licensing changes,” promising that Vultr has no plans to use or commercialize user data.

“We do not use user data,” Kardwell told The Register. “We never have, and we never will. We take privacy and security very seriously. It’s at the core of what we do globally.”
