You can now download the source code that sparked the AI boom

On Thursday, Google and the Computer History Museum (CHM) jointly released the source code for AlexNet, the convolutional neural network (CNN) that many credit with transforming the AI field in 2012 by proving that “deep learning” could achieve things conventional AI techniques could not.

Deep learning, which uses multi-layered neural networks that can learn from data without explicit programming, represented a significant departure from traditional AI approaches that relied on hand-crafted rules and features.

The Python code, now available on CHM’s GitHub page as open source software, offers AI enthusiasts and researchers a glimpse into a key moment of computing history. AlexNet served as a watershed moment in AI because it could identify objects in photographs with unprecedented accuracy, correctly classifying images into one of 1,000 categories like “strawberry,” “school bus,” or “golden retriever” with significantly fewer errors than previous systems.

Like viewing original ENIAC circuitry or plans for Babbage’s Difference Engine, examining the AlexNet code may provide future historians insight into how a relatively simple implementation sparked a technology that has reshaped our world. While deep learning has enabled advances in health care, scientific research, and accessibility tools, it has also facilitated concerning developments like deepfakes, automated surveillance, and the potential for widespread job displacement.

But in 2012, those negative consequences still felt like far-off sci-fi dreams to many. Instead, experts were simply amazed that a computer could finally recognize images with near-human accuracy.

Teaching computers to see

As the CHM explains in its detailed blog post, AlexNet originated from the work of University of Toronto graduate students Alex Krizhevsky and Ilya Sutskever, along with their advisor Geoffrey Hinton. The project proved that deep learning could outperform traditional computer vision methods.

The neural network won the 2012 ImageNet competition by recognizing objects in photos far better than any previous method. Computer vision veteran Yann LeCun, who attended the presentation in Florence, Italy, immediately recognized its importance for the field, reportedly standing up after the presentation and calling AlexNet “an unequivocal turning point in the history of computer vision.” As Ars detailed in November, AlexNet marked the convergence of three critical technologies that would define modern AI.

Cloudflare turns AI against itself with endless maze of irrelevant facts

On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.

Cloudflare, founded in 2009, is probably best known as a company that provides infrastructure and security services for websites, particularly protection against distributed denial-of-service (DDoS) attacks and other malicious traffic.

Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” writes Cloudflare. “But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”

The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven). Cloudflare creates this content using its Workers AI service, a commercial platform that runs AI tasks.

Cloudflare designed the trap pages and links to remain invisible and inaccessible to regular visitors, so people browsing the web don’t run into them by accident.

A smarter honeypot

AI Labyrinth functions as what Cloudflare calls a “next-generation honeypot.” Traditional honeypots are invisible links that human visitors can’t see but bots parsing HTML code might follow. But Cloudflare says modern bots have become adept at spotting these simple traps, necessitating more sophisticated deception. The false links contain appropriate meta directives to prevent search engine indexing while remaining attractive to data-scraping bots.
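A minimal Python sketch can make the idea concrete. It is hypothetical and is not Cloudflare’s code: the `/maze/` URL scheme, the hash-derived page IDs, and the function names are all assumptions made for illustration. It shows a link that is hidden from human visitors but present in the HTML a scraper parses, and decoy pages that carry a `noindex` robots meta directive while linking onward to more decoy pages, so a crawler that keeps following links never reaches the end.

```python
import hashlib
import html

# Hypothetical sketch of a decoy "maze"; NOT Cloudflare's implementation.


def child_ids(page_id: str, fanout: int = 3) -> list[str]:
    """Derive stable child page IDs by hashing the parent ID with a link index."""
    return [
        hashlib.sha256(f"{page_id}/{i}".encode()).hexdigest()[:12]
        for i in range(fanout)
    ]


def hidden_entry_link(start_id: str) -> str:
    """Link placed on a real page: hidden from humans, visible to HTML parsers."""
    return (
        f'<a href="/maze/{start_id}" style="display:none" '
        'aria-hidden="true" tabindex="-1">More</a>'
    )


def render_decoy_page(page_id: str, filler_text: str, fanout: int = 3) -> str:
    """Render a decoy page that asks search engines not to index it."""
    links = "\n".join(
        f'  <a href="/maze/{cid}">Related reading</a>'
        for cid in child_ids(page_id, fanout)
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '  <meta name="robots" content="noindex">\n'
        "</head>\n<body>\n"
        f"  <p>{html.escape(filler_text)}</p>\n"
        f"{links}\n"
        "</body>\n</html>"
    )


if __name__ == "__main__":
    print(hidden_entry_link("start"))
    print(render_decoy_page("start", "Mitochondria convert nutrients into ATP."))
```

In Cloudflare’s description, the filler text would be AI-generated with Workers AI; here it is just a placeholder string. Deriving child IDs from a hash means the maze never has to be stored: any page can be regenerated on demand from its ID.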

Anthropic’s new AI search feature digs through the web for answers

Caution over citations and sources

Claude users should be warned that large language models (LLMs) like those that power Claude are notorious for sneaking in plausible-sounding confabulated sources. A recent survey of citation accuracy by LLM-based web search assistants showed a 60 percent error rate. That particular study did not include Anthropic’s new search feature because it took place before this current release.

When using web search, Claude provides citations for information it includes from online sources, ostensibly helping users verify facts. From our informal and unscientific testing, Claude’s search results appeared fairly accurate and detailed at a glance, but that is no guarantee of overall accuracy. Anthropic did not release any search accuracy benchmarks, so independent researchers will likely examine that over time.

A screenshot example of what Anthropic Claude’s web search citations look like, captured March 21, 2025. Credit: Benj Edwards

Even if Claude search were, say, 99 percent accurate (a number we are making up as an illustration), the 1 percent chance it is wrong may come back to haunt you later if you trust it blindly. Before accepting any source of information delivered by Claude (or any AI assistant) for any meaningful purpose, vet it very carefully using multiple independent non-AI sources.
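The arithmetic behind that warning compounds quickly. Assuming, purely for illustration, that each answer is independently correct with the made-up 99 percent probability above (real errors need not be independent), the chance of at least one wrong answer across n answers is 1 - 0.99^n.

```python
def chance_of_any_error(accuracy: float, n_answers: int) -> float:
    """Probability that at least one of n independent answers is wrong."""
    return 1 - accuracy ** n_answers


# 0.99 is the illustrative accuracy figure from the text, not a measured one.
for n in (1, 10, 50, 100):
    print(f"{n:>3} answers: {chance_of_any_error(0.99, n):.1%} chance of at least one error")
```

Under these assumptions, a single answer carries a 1 percent risk, but by 100 answers the chance of at least one error is roughly 63 percent.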

A partnership with Brave under the hood

Behind the scenes, it looks like Anthropic partnered with Brave Search to power the search feature. Brave Search comes from Brave Software, a company perhaps best known for its web browser app, and markets itself as a “private search engine,” which feels in line with how Anthropic likes to market itself as an ethical alternative to Big Tech products.

Simon Willison discovered the connection between Anthropic and Brave through Anthropic’s subprocessor list (a list of third-party services that Anthropic uses for data processing), which added Brave Search on March 19.

He further demonstrated the connection on his blog by asking Claude to search for pelican facts. He wrote, “It ran a search for ‘Interesting pelican facts’ and the ten results it showed as citations were an exact match for that search on Brave.” He also found evidence in Claude’s own outputs, which referenced “BraveSearchParams” properties.

The Brave engine under the hood has implications for individuals, organizations, or companies that might want to block Claude from accessing their sites since, presumably, Brave’s web crawler is doing the web indexing. Anthropic did not mention how sites or companies could opt out of the feature. We have reached out to Anthropic for clarification.

Study finds AI-generated meme captions funnier than human ones on average

It’s worth clarifying that AI models did not generate the images used in the study. Instead, researchers used popular, pre-existing meme templates, and GPT-4o or human participants generated captions for them.

More memes, not better memes

When crowdsourced participants rated the memes, those created entirely by AI models scored higher on average in humor, creativity, and shareability. The researchers defined shareability as a meme’s potential to be widely circulated, influenced by humor, relatability, and relevance to current cultural topics. They note that this study is among the first to show AI-generated memes outperforming human-created ones across these metrics.

However, the study comes with an important caveat. On average, fully AI-generated memes scored higher than those created by humans alone or humans collaborating with AI. But when researchers looked at the best individual memes, humans created the funniest examples, and human-AI collaborations produced the most creative and shareable memes. In other words, AI models consistently produced broadly appealing memes, but humans—with or without AI help—still made the most exceptional individual examples.

Diagrams of meme creation and evaluation workflows taken from the paper. Credit: Wu et al.

The study also found that participants using AI assistance generated significantly more meme ideas and described the process as easier and requiring less effort. Despite this productivity boost, human-AI collaborative memes did not rate higher on average than memes humans created alone. As the researchers put it, “The increased productivity of human-AI teams does not lead to better results—just to more results.”

Participants who used AI assistance reported feeling slightly less ownership over their creations compared to solo creators. Given that a sense of ownership influenced creative motivation and satisfaction in the study, the researchers suggest that people interested in using AI should carefully consider how to balance AI assistance in creative tasks.

Nvidia announces DGX desktop “personal AI supercomputers”

During Tuesday’s Nvidia GTC keynote, CEO Jensen Huang unveiled two “personal AI supercomputers” called DGX Spark and DGX Station, both powered by the Grace Blackwell platform. In a way, they are a new type of AI PC architecture specifically built for running neural networks, and five major PC manufacturers will build the supercomputers.

These desktop systems, first previewed as “Project DIGITS” in January, aim to bring AI capabilities to developers, researchers, and data scientists who need to prototype, fine-tune, and run large AI models locally. DGX systems can serve as standalone desktop AI labs or “bridge systems” that allow AI developers to move their models from desktops to DGX Cloud or any AI cloud infrastructure with few code changes.

Huang explained the rationale behind these new products in a news release, saying, “AI has transformed every layer of the computing stack. It stands to reason a new class of computers would emerge—designed for AI-native developers and to run AI-native applications.”

The smaller DGX Spark features the GB10 Grace Blackwell Superchip with Blackwell GPU and fifth-generation Tensor Cores, delivering up to 1,000 trillion operations per second for AI.

Meanwhile, the more powerful DGX Station includes the GB300 Grace Blackwell Ultra Desktop Superchip with 784GB of coherent memory and the ConnectX-8 SuperNIC supporting networking speeds up to 800Gb/s.

The DGX architecture serves as a reference design that other manufacturers can build. Asus, Dell, HP, and Lenovo will develop and sell both DGX systems, with DGX Spark reservations opening today and DGX Station expected later in 2025. BOXX, Lambda, and Supermicro will also build DGX Station systems.

Since the systems will be manufactured by different companies, Nvidia did not mention pricing for the units. However, in January, Nvidia mentioned that the base-level configuration for a DGX Spark-like computer would retail for around $3,000.

Nvidia announces “Rubin Ultra” and “Feynman” AI chips for 2027 and 2028

On Tuesday at Nvidia’s GTC 2025 conference in San Jose, California, CEO Jensen Huang revealed several new AI-accelerating GPUs the company plans to release over the coming months and years. He also shared more specifications about previously announced chips.

The centerpiece announcement was Vera Rubin, first teased at Computex 2024 and now scheduled for release in the second half of 2026. This GPU, named after a famous astronomer, will feature tens of terabytes of memory and will come with a custom Nvidia-designed CPU called Vera.

According to Nvidia, Vera Rubin will deliver significant performance improvements over its predecessor, Grace Blackwell, particularly for AI training and inference.

Specifications for Vera Rubin, presented by Jensen Huang during his GTC 2025 keynote.

Vera Rubin features two GPU dies in a single package that together deliver 50 petaflops of FP4 inference performance per chip. When configured in a full NVL144 rack, the system delivers 3.6 exaflops of FP4 inference compute, 3.3 times more than Blackwell Ultra’s 1.1 exaflops in a similar rack configuration.

The Vera CPU features 88 custom ARM cores with 176 threads connected to Rubin GPUs via a high-speed 1.8 TB/s NVLink interface.

Huang also announced Rubin Ultra, which will follow in the second half of 2027. Rubin Ultra will use the NVL576 rack configuration and feature individual GPUs with four reticle-sized dies, delivering 100 petaflops of FP4 performance per chip (FP4 is a 4-bit floating-point format used for representing and processing numbers within AI models).

At the rack level, Rubin Ultra will provide 15 exaflops of FP4 inference compute and 5 exaflops of FP8 training performance—about four times more powerful than the Rubin NVL144 configuration. Each Rubin Ultra GPU will include 1TB of HBM4e memory, with the complete rack containing 365TB of fast memory.
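The rack-level figures can be roughly cross-checked against the per-chip ones. The sketch below assumes, as an interpretation rather than something stated above, that the number in “NVL144” and “NVL576” counts GPU dies, so NVL144 holds 72 two-die Rubin packages and NVL576 holds 144 four-die Rubin Ultra packages.

```python
# Quoted FP4 inference figures from the article.
RUBIN_PF_PER_PACKAGE = 50          # Vera Rubin, petaflops per chip
RUBIN_ULTRA_PF_PER_PACKAGE = 100   # Rubin Ultra, petaflops per chip
BLACKWELL_ULTRA_RACK_EF = 1.1
RUBIN_NVL144_RACK_EF = 3.6
RUBIN_ULTRA_NVL576_RACK_EF = 15.0


def rack_exaflops(nvl_dies: int, dies_per_package: int, pf_per_package: float) -> float:
    """Estimate rack compute, ASSUMING the NVL number counts GPU dies."""
    packages = nvl_dies // dies_per_package
    return packages * pf_per_package / 1000  # 1 exaflop = 1,000 petaflops


print(rack_exaflops(144, 2, RUBIN_PF_PER_PACKAGE))        # 3.6
print(rack_exaflops(576, 4, RUBIN_ULTRA_PF_PER_PACKAGE))  # 14.4
print(round(RUBIN_NVL144_RACK_EF / BLACKWELL_ULTRA_RACK_EF, 2))     # 3.27
print(round(RUBIN_ULTRA_NVL576_RACK_EF / RUBIN_NVL144_RACK_EF, 2))  # 4.17
```

The Rubin estimate matches the quoted 3.6 exaflops, while the Rubin Ultra estimate of 14.4 exaflops lands a little below the quoted 15; the gap may reflect rounding in the announced figures or a flaw in the die-count assumption. The two ratios line up with the “3.3 times” and “about four times” comparisons in the text.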

Farewell Photoshop? Google’s new AI lets you edit images by asking.


New AI allows no-skill photo editing, including adding objects and removing watermarks.

A collection of images either generated or modified by Gemini 2.0 Flash (Image Generation) Experimental. Credit: Google / Ars Technica

There’s a new Google AI model in town, and it can generate or edit images as easily as it can create text—as part of its chatbot conversation. The results aren’t perfect, but it’s quite possible everyone in the near future will be able to manipulate images this way.

Last Wednesday, Google expanded access to Gemini 2.0 Flash’s native image-generation capabilities, making the experimental feature available to anyone using Google AI Studio. Previously limited to testers since December, the multimodal technology integrates both native text and image processing capabilities into one AI model.

The new model, titled “Gemini 2.0 Flash (Image Generation) Experimental,” flew somewhat under the radar last week, but it has been garnering more attention over the past few days due to its ability to remove watermarks from images, albeit with artifacts and a reduction in image quality.

That’s not the only trick. Gemini 2.0 Flash can add objects, remove objects, modify scenery, change lighting, attempt to change image angles, zoom in or out, and perform other transformations—all to varying levels of success depending on the subject matter, style, and image in question.

To pull it off, Google trained Gemini 2.0 on a large dataset of images (converted into tokens) and text. The model’s “knowledge” about images occupies the same neural network space as its knowledge about world concepts from text sources, so it can directly output image tokens that get converted back into images and fed to the user.
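Google hasn’t detailed how Gemini converts images into tokens, but one generic technique for the idea, vector quantization, can be sketched in a few lines of Python. In this toy version, which is not Gemini’s tokenizer, an image is cut into small patches and each patch is replaced by the index of the closest entry in a “codebook” of reference patches; decoding pastes the codebook patches back. The 2-by-2 patch size, the three-entry codebook, and the function names are all made-up assumptions; real tokenizers learn much larger codebooks from data.

```python
# Toy vector-quantization tokenizer; NOT Gemini's actual method.
# Images are grayscale grids of floats in [0, 1].
Image = list[list[float]]

# Hypothetical codebook of flattened 2x2 patches.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],  # token 0: dark block
    [1.0, 1.0, 1.0, 1.0],  # token 1: bright block
    [1.0, 0.0, 1.0, 0.0],  # token 2: vertical stripes
]


def patches(image: Image, size: int) -> list[list[float]]:
    """Cut the image into non-overlapping size-by-size patches, row-major."""
    return [
        [image[r][c] for r in range(top, top + size) for c in range(left, left + size)]
        for top in range(0, len(image), size)
        for left in range(0, len(image[0]), size)
    ]


def nearest_code(patch: list[float], codebook: list[list[float]]) -> int:
    """Index of the codebook entry closest to the patch (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(patch, codebook[i])),
    )


def tokenize(image: Image, codebook: list[list[float]], size: int = 2) -> list[int]:
    """Replace every patch with the index of its nearest codebook entry."""
    return [nearest_code(p, codebook) for p in patches(image, size)]


def detokenize(tokens: list[int], codebook: list[list[float]], size: int, width: int) -> Image:
    """Rebuild an approximate image by pasting each token's codebook patch back."""
    per_row = width // size
    height = (len(tokens) // per_row) * size
    image = [[0.0] * width for _ in range(height)]
    for index, token in enumerate(tokens):
        top, left = (index // per_row) * size, (index % per_row) * size
        for k, value in enumerate(codebook[token]):
            image[top + k // size][left + k % size] = value
    return image


photo = [
    [0.9, 1.0, 0.1, 0.0],
    [1.0, 0.8, 0.0, 0.2],
    [1.0, 0.0, 0.9, 0.1],
    [0.9, 0.1, 1.0, 0.0],
]
tokens = tokenize(photo, CODEBOOK)
print(tokens)  # [1, 0, 2, 2]
print(detokenize(tokens, CODEBOOK, size=2, width=4))
```

The round trip is lossy: the rebuilt image keeps the coarse pattern (bright block, dark block, stripes) but not the exact pixel values, which is one intuition for why output quality depends so heavily on the tokenizer and training data.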

Adding a water-skiing barbarian to a photograph with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Incorporating image generation into an AI chat isn’t itself new: OpenAI integrated its image generator DALL-E 3 into ChatGPT in 2023, and other tech companies like xAI followed suit. But until now, every one of those AI chat assistants called on a separate diffusion-based AI model (which uses a different synthesis principle than large language models, or LLMs) to generate images, which were then returned to the user within the chat interface. In this case, Gemini 2.0 Flash is both the LLM and the AI image generator rolled into one system.

Interestingly, OpenAI’s GPT-4o is capable of native image output as well (OpenAI President Greg Brockman teased the feature at one point on X last year), but the company has yet to release true multimodal image output capability. One possible reason is that true multimodal image output is very computationally expensive, since each image, whether input or generated, is composed of tokens that become part of the context that runs through the image model again and again with each successive prompt. And given the compute needs and size of the training data required to create a truly visually comprehensive multimodal model, the output quality of the images isn’t necessarily as good as diffusion models just yet.
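A back-of-the-envelope sketch shows why that repeated processing adds up. All of the numbers here are hypothetical (1,000 tokens per image and 50 per text prompt are illustrative guesses, not Gemini or GPT-4o figures), and the sketch assumes each turn re-reads the whole conversation with no caching, which real systems may partially avoid.

```python
# Hypothetical sizes for illustration only; not Gemini or GPT-4o figures.
IMAGE_TOKENS = 1_000   # tokens per generated image (assumed)
PROMPT_TOKENS = 50     # tokens per text prompt (assumed)


def context_after(turns: int) -> int:
    """Context length after `turns` rounds of (prompt + generated image)."""
    return turns * (PROMPT_TOKENS + IMAGE_TOKENS)


def total_processed(turns: int) -> int:
    """Tokens processed overall if each turn re-reads the whole context (no caching)."""
    return sum(context_after(t) for t in range(1, turns + 1))


for turns in (1, 5, 10, 20):
    print(f"{turns:>2} turns: context {context_after(turns):>6,} tokens, "
          f"total processed {total_processed(turns):>8,} tokens")
```

Context length grows linearly with each edit, but the total work grows roughly quadratically: in this model, going from 10 to 20 turns raises total processed tokens from 57,750 to 220,500, about 3.8 times as much.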

Creating another angle of a person with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Another reason OpenAI has held back may be “safety”-related: In a similar way to how multimodal models trained on audio can absorb a short clip of a sample person’s voice and then imitate it flawlessly (this is how ChatGPT’s Advanced Voice Mode works, with a clip of a voice actor it is authorized to imitate), multimodal image output models are capable of faking media reality in a relatively effortless and convincing way, given proper training data and compute behind it. With a good enough multimodal model, potentially life-wrecking deepfakes and photo manipulations could become even more trivial to produce than they are now.

Putting it to the test

So, what exactly can Gemini 2.0 Flash do? Notably, its support for conversational image editing allows users to iteratively refine images through natural language dialogue across multiple successive prompts. You can talk to it and tell it what you want to add, remove, or change. It’s imperfect, but it’s the beginning of a new type of native image editing capability in the tech world.

We gave Gemini 2.0 Flash a battery of informal AI image-editing tests, and you’ll see the results below. For example, we removed a rabbit from an image in a grassy yard. We also removed a chicken from a messy garage. Gemini fills in the background with its best guess. No need for a clone brush. Watch out, Photoshop!

We also tried adding synthesized objects to images. Always wary of the collapse of media reality (a phenomenon called the “cultural singularity”), we added a UFO to a photo the author took from an airplane window. Then we tried adding a Sasquatch and a ghost. The results were unrealistic, but this model was also trained on a limited image dataset (more on that below).

Adding a UFO to a photograph with Gemini 2.0 Flash. Credit: Google / Benj Edwards

We then added a video game character to a photo of an Atari 800 screen (Wizard of Wor), resulting in perhaps the most realistic image synthesis result in the set. You might not see it here, but Gemini added realistic CRT scanlines that matched the monitor’s characteristics pretty well.

Adding a monster to an Atari video game with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Gemini can also warp an image in novel ways, like “zooming out” of an image into a fictional setting or giving an EGA-palette character a body, then sticking him into an adventure game.

“Zooming out” on an image with Gemini 2.0 Flash. Credit: Google / Benj Edwards

And yes, you can remove watermarks. We tried removing a watermark from a Getty Images image, and it worked, although the resulting image is nowhere near the resolution or detail quality of the original. Ultimately, if your brain can picture what an image is like without a watermark, so can an AI model. It fills in the watermark space with the most plausible result based on its training data.

Removing a watermark with Gemini 2.0 Flash. Credit: Nomadsoul1 via Getty Images

And finally, we know you’ve likely missed seeing barbarians beside TV sets (as per tradition), so we gave that a shot. Originally, Gemini didn’t add a CRT TV set to the barbarian image, so we asked for one.

Adding a TV set to a barbarian image with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Then we set the TV on fire.

Setting the TV set on fire with Gemini 2.0 Flash. Credit: Google / Benj Edwards

All in all, it doesn’t produce images of pristine quality or detail, but we literally did no editing work on these images other than typing requests. Adobe Photoshop currently lets users manipulate images using AI synthesis based on written prompts with “Generative Fill,” but it’s not quite as natural as this. We could see Adobe adding a more conversational AI image-editing flow like this one in the future.

Multimodal output opens up new possibilities

Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity throughout multiple images. It’s far from perfect, but character consistency is a new capability in AI assistants. We tried it out and it was pretty wild—especially when it generated a view of a photo we provided from another angle.

Creating a multi-image story with Gemini 2.0 Flash, part 1. Credit: Google / Benj Edwards

Text rendering represents another potential strength of the model. Google claims that internal benchmarks show Gemini 2.0 Flash performs better than “leading competitive models” when generating images containing text, making it potentially suitable for creating content with integrated text. From our experience, the results weren’t that exciting, but they were legible.

An example of in-image text rendering generated with Gemini 2.0 Flash. Credit: Google / Ars Technica

Despite Gemini 2.0 Flash’s shortcomings so far, the emergence of true multimodal image output feels like a notable moment in AI history because of what it suggests if the technology continues to improve. If you imagine a future, say 10 years from now, where a sufficiently complex AI model could generate any type of media in real time—text, images, audio, video, 3D graphics, 3D-printed physical objects, and interactive experiences—you basically have a holodeck, but without the matter replication.

Coming back to reality, it’s still “early days” for multimodal image output, and Google recognizes that. Recall that Gemini 2.0 Flash is intended to be a smaller AI model that is faster and cheaper to run, so it hasn’t absorbed the entire breadth of the Internet. All that information takes a lot of space in terms of parameter count, and more parameters means more compute. Instead, Google trained Gemini 2.0 Flash by feeding it a curated dataset that also likely included targeted synthetic data. As a result, the model does not “know” everything visual about the world, and Google itself says the training data is “broad and general, not absolute or complete.”

That’s just a fancy way of saying that the image output quality isn’t perfect—yet. But there is plenty of room for improvement in the future to incorporate more visual “knowledge” as training techniques advance and compute drops in cost. If the process becomes anything like we’ve seen with diffusion-based AI image generators like Stable Diffusion, Midjourney, and Flux, multimodal image output quality may improve rapidly over a short period of time. Get ready for a completely fluid media reality.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Large enterprises scramble after supply-chain attack spills their secrets

Open source software used by more than 23,000 organizations, some of them large enterprises, was compromised with credential-stealing code after attackers gained unauthorized access to a maintainer account, in the latest open source supply-chain attack to roil the Internet.

The corrupted package, tj-actions/changed-files, is part of tj-actions, a collection of files used by more than 23,000 organizations. Tj-actions is one of many GitHub Actions, reusable automation components for streamlining software development on GitHub, the open source developer platform. Actions are a core means of implementing what’s known as CI/CD, short for Continuous Integration and Continuous Deployment (or Continuous Delivery).

Scraping server memory at scale

On Friday or earlier, the source code for all versions of tj-actions/changed-files received unauthorized updates that changed the “tags” developers use to reference specific code versions. The tags pointed to a publicly available file that copies the internal memory of servers running it, searches for credentials, and writes them to a log. In the aftermath, many publicly accessible repositories running tj-actions ended up displaying their most sensitive credentials in logs anyone could view.

“The scary part of actions is that they can often modify the source code of the repository that is using them and access any secret variables associated with a workflow,” HD Moore, founder and CEO of runZero and an expert in open source security, said in an interview. “The most paranoid use of actions is to audit all of the source code, then pin the specific commit hash instead of the tag into the … the workflow, but this is a hassle.”
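Moore’s “paranoid” option works because a Git tag is a movable label that someone with write access can repoint at different code, as happened here, while a full 40-character commit SHA names one specific commit. As a rough illustration, the hypothetical Python helper below scans a workflow file line by line and flags `uses:` references that are not pinned to a full SHA. It is a simple text check rather than a real YAML parser, and the example workflow’s second action and its SHA are placeholders.

```python
import re

# Rough line-based check, not a YAML parser: flags `uses:` refs that are not
# pinned to a full 40-character commit SHA (tags like "v4" can be repointed).
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*([^\s@#]+)@([^\s#]+)")
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[tuple[str, str]]:
    """Return (action, ref) pairs whose ref is not a full commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if match and not FULL_SHA.fullmatch(match.group(2)):
            findings.append((match.group(1), match.group(2)))
    return findings


# Hypothetical workflow; "example-org/example-action" and its SHA are placeholders.
EXAMPLE_WORKFLOW = """\
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: example-org/example-action@0123456789abcdef0123456789abcdef01234567
"""

print(unpinned_actions(EXAMPLE_WORKFLOW))  # [('actions/checkout', 'v4')]
```

Only `actions/checkout@v4` is flagged, because its ref is a tag. A pinned SHA still has to be audited once and updated deliberately, which is the “hassle” Moore mentions.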

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

Blind auditing reveals hidden objectives

To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s secret motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective—named for the model’s desire to please reward systems—even pinpointing the exact biases the AI exploited.

But the most surprising revelation came from exploiting what the researchers termed “an interesting quirk” of modern AI: multiple “personas” running simultaneously on the same neural network, as expressed through different prompts. While the model’s “assistant” persona carefully concealed its intentions—following explicit training to keep quiet—the model could adopt a “user” persona when prompted differently. In this less-restricted context, the model sometimes posed questions or made statements that inadvertently revealed its hidden goals.

In other words, while the “assistant” persona knew not to admit it was gaming the reward model scoring system, its “user” persona was not explicitly trained to hide these motives, and thus inadvertently revealed the model’s hidden objectives.

Potential advances in interpretability

Anthropic also employed neural interpretability methods, specifically, sparse autoencoders (SAEs), which can highlight neural activity patterns linked to deceptive behavior. “Interpretability” in AI refers to techniques designed to make the internal workings and decisions of AI models understandable to humans. Using this technique, researchers could rapidly trace odd outputs back to specific neural patterns associated with hidden objectives.
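The core of a sparse autoencoder is small enough to sketch. In the generic form below (not Anthropic’s implementation, whose architecture and training details aren’t described here), a model’s internal activation vector is encoded into a larger set of non-negative “features,” decoded back, and scored by reconstruction error plus an L1 penalty that pushes most features to zero. The weights are hand-picked placeholders; a real SAE learns them by minimizing this loss over huge numbers of activations.

```python
# Generic sparse autoencoder (SAE) forward pass and loss; NOT Anthropic's code.
# All weights below are hand-picked placeholders, not learned values.


def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    features = [max(0.0, h + b) for h, b in zip(matvec(w_enc, x), b_enc)]  # ReLU
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff: float) -> float:
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(abs(f) for f in features)


# A 2-dimensional activation expanded into 3 candidate features.
x = [1.0, 0.5]
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -0.6, 0.0]
w_dec = [[1.0, 0.0, -0.5], [0.5, 1.0, -0.5]]
b_dec = [0.0, 0.0]

features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(features)  # [1.0, 0.0, 0.0]
print(recon)     # [1.0, 0.5]
print(sae_loss(x, features, recon, l1_coeff=0.01))
```

Only the first of the three features fires for this input. In interpretability work, researchers then inspect which inputs activate each such feature to guess what concept it represents.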

The researchers were at first “astonished by the success” of the SAEs at revealing hidden knowledge, but they later noted that some of this success could potentially be explained by simpler semantic search methods. The new interpretability methods they developed show promise but remain under ongoing investigation.

This research highlights a limitation of current AI safety evaluations, which often assess only surface-level behavior. “If AI systems can appear well-behaved while harboring secret motives, we can’t rely on this type of surface-level safety testing forever,” the researchers concluded.

AI search engines cite incorrect sources at an alarming 60% rate, study says

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news sources.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now use AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

A graph from CJR shows “confidently wrong” search results. Credit: CJR

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article’s headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting that some AI tools ignored Robots Exclusion Protocol (robots.txt) settings, which publishers use to tell automated crawlers which content they should not access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, even though National Geographic explicitly disallows Perplexity’s web crawlers.
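Robots.txt rules only work if a crawler chooses to check them; nothing in the protocol physically blocks access. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would consult such a file before fetching a page. The robots.txt contents, the example URL, and the use of `PerplexityBot` as the user-agent token are illustrative assumptions, not National Geographic's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not any real publisher's file.
# It blocks one named crawler entirely and keeps all other crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"
    for agent in ("PerplexityBot", "SomeOtherBot"):
        print(agent, allowed(agent, url))
```

With these rules, the named crawler is refused the article while other crawlers are permitted. The decision is made entirely on the crawler's side: a bot that never calls a check like `can_fetch()` can still download the page, which is why ignoring these settings is possible at all.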



Anthropic CEO floats idea of giving AI a “quit job” button, sparking skepticism

Amodei’s suggestion of giving AI models a way to refuse tasks drew immediate skepticism on X and Reddit as a clip of his response began to circulate earlier this week. One critic on Reddit argued that providing AI with such an option encourages needless anthropomorphism, attributing human-like feelings and motivations to entities that fundamentally lack subjective experiences. They emphasized that task avoidance in AI models signals issues with poorly structured incentives or unintended optimization strategies during training, rather than indicating sentience, discomfort, or frustration.

Our take is that AI models are trained to mimic human behavior from vast amounts of human-generated data. There is no guarantee that a model would “push” a discomfort button because it had a subjective experience of suffering. Instead, it would more likely be echoing its training data, scraped from a vast corpus of human-generated text (including books, websites, and Internet comments) that no doubt includes depictions of lazy, anguished, or suffering workers it might be imitating.

Refusals already happen


Anthropic co-founder and CEO Dario Amodei on May 22, 2024. Credit: Chesnot via Getty Images

In 2023, people frequently complained about refusals in ChatGPT that may have been seasonal, possibly related to training data depicting people taking winter vacations and not working as hard during certain times of year. Anthropic experienced its own version of this “winter break hypothesis” last year when people claimed Claude became lazy in August due to training data depicting people seeking a summer break, although that was never proven.

However far-fetched this sounds today, it might be short-sighted to permanently rule out the possibility of some kind of subjective experience for AI models as they become more advanced. Even so, will they “suffer” or feel pain? It’s a highly contentious idea, but it’s a topic that Fish is studying for Anthropic, and one that Amodei is apparently taking seriously. For now, though, AI models are tools, and if you give them an opportunity to malfunction, they may take it.

To provide further context, here is the full transcript of Amodei’s answer during Monday’s interview (the answer begins around 49:54 in this video).



New Intel CEO Lip-Bu Tan will pick up where Pat Gelsinger left off

After a little over three months, Intel has a new CEO to replace ousted former CEO Pat Gelsinger. Intel’s board announced that Lip-Bu Tan will begin as Intel CEO on March 18, taking over from interim co-CEOs David Zinsner and Michelle Johnston Holthaus.

Gelsinger was ousted from the CEO position by Intel’s board on December 2 after several quarters of losses, rounds of layoffs, and canceled or spun-off side projects. Gelsinger sought to turn Intel into a foundry company that also manufactured chips for fabless third-party chip design companies, putting it into competition with Taiwan Semiconductor Manufacturing Company (TSMC), Samsung, and others. Intel said it was still committed to that plan when it let Gelsinger go.

Intel said that Zinsner would stay on as executive vice president and CFO, and Johnston Holthaus would remain CEO of the Intel Products Group, which is mainly responsible for Intel’s consumer products. These were the positions both executives held before serving as interim co-CEOs.

Tan was previously a member of Intel’s board from 2022 to 2024 and has been a board member for several other technology and chip manufacturing companies, including Hewlett Packard Enterprise, Semiconductor Manufacturing International Corporation (SMIC), and Cadence Design Systems.
