AI


AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership, after investing nearly $14 billion in the company.
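
In general terms, distillation trains a small “student” model to imitate the outputs of a larger “teacher” model rather than learning only from raw data. The sketch below shows the standard soft-label approach as a rough illustration of the idea; it is not OpenAI’s or DeepSeek’s actual pipeline, and the shapes and hyperparameters are arbitrary.

```python
# Minimal sketch of classic knowledge distillation: the student is trained to
# match the teacher's softened output distribution as well as the true labels.
# Illustrative only; not any vendor's production pipeline.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft loss (imitate the teacher) with a hard loss (fit the labels)."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # conventional rescaling
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy usage: a batch of 4 examples over a 10-way output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)      # in practice, from the frozen teacher
labels = torch.randint(0, 10, (4,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```

The temperature softens the teacher’s probabilities so the student learns from the relative likelihoods the teacher assigns to every answer, not just its single top choice.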

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI’s models to train its competitor, a move that would be against its terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts caution that the resulting models are more limited.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to many of the business models of leading AI firms. Even when developers use distilled models from companies like OpenAI, those models cost far less to run, are less expensive to create, and therefore generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models because they require less computational load.

AI firms follow DeepSeek’s lead, create cheaper models with “distillation” Read More »


AI versus the brain and the race for general intelligence


Intelligence, ±artificial

We already have an example of general intelligence, and it doesn’t look like AI.

There’s no question that AI systems have accomplished some impressive feats, mastering games, writing text, and generating convincing images and video. That’s gotten some people talking about the possibility that we’re on the cusp of AGI, or artificial general intelligence. While some of this is marketing fanfare, enough people in the field are taking the idea seriously that it warrants a closer look.

Many arguments come down to the question of how AGI is defined, which people in the field can’t seem to agree upon. This contributes to estimates of its advent that range from “it’s practically here” to “we’ll never achieve it.” Given that range, it’s impossible to provide any sort of informed perspective on how close we are.

But we do have an existing example of AGI without the “A”—the intelligence provided by the animal brain, particularly the human one. And one thing is clear: The systems being touted as evidence that AGI is just around the corner do not work at all like the brain does. That may not be a fatal flaw, or even a flaw at all. It’s entirely possible that there’s more than one way to reach intelligence, depending on how it’s defined. But at least some of the differences are likely to be functionally significant, and the fact that AI is taking a very different route from the one working example we have is likely to be meaningful.

With all that in mind, let’s look at some of the things the brain does that current AI systems can’t.

Defining AGI might help

Artificial general intelligence hasn’t really been defined. Those who argue that it’s imminent are either vague about what they expect the first AGI systems to be capable of or simply define it as the ability to dramatically exceed human performance at a limited number of tasks. Predictions of AGI’s arrival in the intermediate term tend to focus on AI systems demonstrating specific behaviors that seem human-like. The further one goes out on the timeline, the greater the emphasis on the “G” of AGI and its implication of systems that are far less specialized.

But most of these predictions are coming from people working in companies with a commercial interest in AI. It was notable that none of the researchers we talked to for this article were willing to offer a definition of AGI. They were, however, willing to point out how current systems fall short.

“I think that AGI would be something that is going to be more robust, more stable—not necessarily smarter in general but more coherent in its abilities,” said Ariel Goldstein, a researcher at Hebrew University of Jerusalem. “You’d expect a system that can do X and Y to also be able to do Z and T. Somehow, these systems seem to be more fragmented in a way. To be surprisingly good at one thing and then surprisingly bad at another thing that seems related.”

“I think that’s a big distinction, this idea of generalizability,” echoed neuroscientist Christa Baker of NC State University. “You can learn how to analyze logic in one sphere, but if you come to a new circumstance, it’s not like now you’re an idiot.”

Mariano Schain, a Google engineer who has collaborated with Goldstein, focused on the abilities that underlie this generalizability. He mentioned both long-term and task-specific memory and the ability to deploy skills developed in one task in different contexts. These are limited-to-nonexistent in existing AI systems.

Beyond those specific limits, Baker noted that “there’s long been this very human-centric idea of intelligence that only humans are intelligent.” That’s fallen away within the scientific community as we’ve learned more about animal behavior. But there’s still a bias to privilege human-like behaviors, such as the human-sounding responses generated by large language models.

The fruit flies that Baker studies can integrate multiple types of sensory information, control four sets of limbs, navigate complex environments, satisfy their own energy needs, produce new generations of brains, and more. And they do that all with brains that contain under 150,000 neurons, far fewer than the artificial neurons in current large language models.

These capabilities are complicated enough that it’s not entirely clear how the brain enables them. (If we knew how, it might be possible to engineer artificial systems with similar capacities.) But we do know a fair bit about how brains operate, and there are some very obvious ways that they differ from the artificial systems we’ve created so far.

Neurons vs. artificial neurons

Most current AI systems, including all large language models, are based on what are called neural networks. These were intentionally designed to mimic how some areas of the brain operate, with large numbers of artificial neurons taking an input, modifying it, and then passing the modified information on to another layer of artificial neurons. Each of these artificial neurons can pass the information on to multiple instances in the next layer, with different weights applied to each connection. In turn, each of the artificial neurons in the next layer can receive input from multiple sources in the previous one.

After passing through enough layers, the final layer is read and transformed into an output, such as the pixels in an image that correspond to a cat.
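
As a rough illustration of that layered scheme, here is a minimal sketch in Python; the layer sizes are arbitrary, and it is not meant to represent any production model.

```python
# Minimal sketch of the layered "artificial neuron" idea: each layer multiplies
# its inputs by a weight matrix, adds a bias, applies a nonlinearity, and passes
# the result to the next layer. Sizes here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each artificial neuron sums its weighted inputs and applies an
    # activation function (ReLU here) before forwarding the value.
    return np.maximum(0.0, inputs @ weights + biases)

x = rng.normal(size=(1, 8))                       # an 8-dimensional input
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)   # first layer of weights
w2, b2 = rng.normal(size=(16, 4)), np.zeros(4)    # output layer of weights

hidden = layer(x, w1, b1)
output = hidden @ w2 + b2        # the final layer is read off as the output
print(output.shape)              # (1, 4)
```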

While that system is modeled on the behavior of some structures within the brain, it’s a very limited approximation. For one, all artificial neurons are functionally equivalent—there’s no specialization. In contrast, real neurons are highly specialized; they use a variety of neurotransmitters and take input from a range of extra-neural inputs like hormones. Some specialize in sending inhibitory signals while others activate the neurons they interact with. Different physical structures allow them to make different numbers and types of connections.

In addition, rather than simply forwarding a single value to the next layer, real neurons communicate through an analog series of activity spikes, sending trains of pulses that vary in timing and intensity. This allows for a degree of non-deterministic noise in communications.

Finally, while organized layers are a feature of a few structures in brains, they’re far from the rule. “What we found is it’s—at least in the fly—much more interconnected,” Baker told Ars. “You can’t really identify this strictly hierarchical network.”

With near-complete connection maps of the fly brain becoming available, she told Ars that researchers are “finding lateral connections or feedback projections, or what we call recurrent loops, where we’ve got neurons that are making a little circle and connectivity patterns. I think those things are probably going to be a lot more widespread than we currently appreciate.”

While we’re only beginning to understand the functional consequences of all this complexity, it’s safe to say that it allows networks composed of actual neurons far more flexibility in how they process information—a flexibility that may underlie how these neurons get re-deployed in a way that these researchers identified as crucial for some form of generalized intelligence.

But the differences between neural networks and the real-world brains they were modeled on go well beyond the functional differences we’ve talked about so far. They extend to significant differences in how these functional units are organized.

The brain isn’t monolithic

The neural networks we’ve generated so far are largely specialized systems meant to handle a single task. Even the most complicated tasks, like the prediction of protein structures, have typically relied on the interaction of only two or three specialized systems. In contrast, the typical brain has a lot of functional units. Some of these operate by sequentially processing a single set of inputs in something resembling a pipeline. But many others can operate in parallel, in some cases without any input activity going on elsewhere in the brain.

To give a sense of what this looks like, let’s think about what’s going on as you read this article. Doing so requires systems that handle motor control, which keep your head and eyes focused on the screen. Part of this system operates via feedback from the neurons that are processing the read material, causing small eye movements that help your eyes move across individual sentences and between lines.

Separately, there’s part of your brain devoted to telling the visual system what not to pay attention to, like the icon showing an ever-growing number of unread emails. Those of us who can read a webpage without even noticing the ads on it presumably have a very well-developed system in place for ignoring things. Reading this article may also mean you’re engaging the systems that handle other senses, getting you to ignore things like the noise of your heating system coming on while remaining alert for things that might signify threats, like an unexplained sound in the next room.

The input generated by the visual system then needs to be processed, from individual character recognition up to the identification of words and sentences, processes that involve systems in areas of the brain involved in both visual processing and language. Again, this is an iterative process, where building meaning from a sentence may require many eye movements to scan back and forth across a sentence, improving reading comprehension—and requiring many of these systems to communicate among themselves.

As meaning gets extracted from a sentence, other parts of the brain integrate it with information obtained in earlier sentences, which tends to engage yet another area of the brain, one that handles a short-term memory system called working memory. Meanwhile, other systems will be searching long-term memory, finding related material that can help the brain place the new information within the context of what it already knows. Still other specialized brain areas are checking for things like whether there’s any emotional content to the material you’re reading.

All of these different areas are engaged without you being consciously aware of the need for them.

In contrast, something like ChatGPT, despite having a lot of artificial neurons, is monolithic: No specialized structures are allocated before training starts. That’s in sharp contrast to a brain. “The brain does not start out as a bag of neurons and then as a baby it needs to make sense of the world and then determine what connections to make,” Baker noted. “There are already a lot of constraints and specifics that are already set up.”

Even in cases where it’s not possible to see any physical distinction between cells specialized for different functions, Baker noted that we can often find differences in what genes are active.

In contrast, pre-planned modularity is relatively new to the AI world. In software development, “This concept of modularity is well established, so we have the whole methodology around it, how to manage it,” Schain said. “It’s really an aspect that is important for maybe achieving AI systems that can then operate similarly to the human brain.” There are a few cases where developers have enforced modularity on systems, but Goldstein said these systems need to be trained with all the modules in place to see any gain in performance.

None of this is saying that a modular system can’t arise within a neural network as a result of its training. But so far, we have very limited evidence that they do. And since we mostly deploy each system for a very limited number of tasks, there’s no reason to think modularity will be valuable.

There is some reason to believe that this modularity is key to the brain’s incredible flexibility. The region that recognizes emotion-evoking content in written text can also recognize it in music and images, for example. But the evidence here is mixed. There are some clear instances where a single brain region handles related tasks, but that’s not consistently the case; Baker noted that, “When you’re talking humans, there are parts of the brain that are dedicated to understanding speech, and there are different areas that are involved in producing speech.”

This sort of re-use would also provide an advantage in terms of learning, since behaviors developed in one context could potentially be deployed in others. But as we’ll see, the differences between brains and AI when it comes to learning are far more comprehensive than that.

The brain is constantly training

Current AIs generally have two states: training and deployment. Training is where the AI learns its behavior; deployment is where that behavior is put to use. This isn’t absolute, as the behavior can be tweaked in response to things learned during deployment, like finding out it recommends eating a rock daily. But for the most part, once the weights among the connections of a neural network are determined through training, they’re retained.

That may be starting to change a bit, Schain said. “There is now maybe a shift in similarity where AI systems are using more and more what they call the test time compute, where at inference time you do much more than before, kind of a parallel to how the human brain operates,” he told Ars. But it’s still the case that neural networks are essentially useless without an extended training period.

In contrast, a brain doesn’t have distinct learning and active states; it’s constantly in both modes. In many cases, the brain learns while doing. Baker described that in terms of learning to take jumpshots: “Once you have made your movement, the ball has left your hand, it’s going to land somewhere. So that visual signal—that comparison of where it landed versus where you wanted it to go—is what we call an error signal. That’s detected by the cerebellum, and its goal is to minimize that error signal. So the next time you do it, the brain is trying to compensate for what you did last time.”
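
As a toy illustration of that kind of learning-while-doing, the sketch below nudges a single parameter after every attempt based on an error signal; it is a loose analogy, not a model of the cerebellum.

```python
# Toy error-signal learning: after each "shot," compare where it landed with
# where it was supposed to go, and adjust before the next attempt.
def attempt(strength):
    return 2.0 * strength            # toy physics: landing spot scales with effort

target = 10.0                        # where we want the ball to land
strength = 1.0                       # initial guess
learning_rate = 0.1

for shot in range(20):
    landed = attempt(strength)
    error = landed - target                    # the error signal
    strength -= learning_rate * 2.0 * error    # small correction before next try

print(round(attempt(strength), 2))   # converges toward 10.0
```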

It makes for very different learning curves. An AI is typically not very useful until it has had a substantial amount of training. In contrast, a human can often pick up basic competence in a very short amount of time (and without massive energy use). “Even if you’re put into a situation where you’ve never been before, you can still figure it out,” Baker said. “If you see a new object, you don’t have to be trained on that a thousand times to know how to use it. A lot of the time, [if] you see it one time, you can make predictions.”

As a result, while an AI system with sufficient training may ultimately outperform the human, the human will typically reach a high level of performance faster. And unlike an AI, a human’s performance doesn’t remain static. Incremental improvements and innovative approaches are both still possible. This also allows humans to adjust to changed circumstances more readily. An AI trained on the body of written material up until 2020 might struggle to comprehend teen-speak in 2030; humans could at least potentially adjust to the shifts in language. (Though maybe an AI trained to respond to confusing phrasing with “get off my lawn” would be indistinguishable.)

Finally, since the brain is a flexible learning device, the lessons learned from one skill can be applied to related skills. So the ability to recognize tones and read sheet music can help with the mastery of multiple musical instruments. Chemistry and cooking share overlapping skillsets. And when it comes to schooling, learning how to learn can be used to master a wide range of topics.

In contrast, it’s essentially impossible to use an AI model trained on one topic for much else. The biggest exceptions are large language models, which seem to be able to solve problems on a wide variety of topics if they’re presented as text. But here, there’s still a dependence on sufficient examples of similar problems appearing in the body of text the system was trained on. To give an example, something like ChatGPT can seem to be able to solve math problems, but it’s best at solving things that were discussed in its training materials; giving it something new will generally cause it to stumble.

Déjà vu

For Schain, however, the biggest difference between AI and biology is in terms of memory. For many AIs, “memory” is indistinguishable from the computational resources that allow them to perform a task, and it is formed during training. For the large language models, it includes both the weights of connections learned then and a narrow “context window” that encompasses any recent exchanges with a single user. In contrast, biological systems have a lifetime of memories to rely on.

“For AI, it’s very basic: It’s like the memory is in the weights [of connections] or in the context. But with a human brain, it’s a much more sophisticated mechanism, still to be uncovered. It’s more distributed. There is the short term and long term, and it has to do a lot with different timescales. Memory for the last second, a minute and a day or a year or years, and they all may be relevant.”

This lifetime of memories can be key to making intelligence general. It helps us recognize the possibilities and limits of drawing analogies between different circumstances or applying things learned in one context versus another. It provides us with insights that let us solve problems that we’ve never confronted before. And, of course, it also ensures that the horrible bit of pop music you were exposed to in your teens remains an earworm well into your 80s.

The differences between how brains and AIs handle memory, however, are very hard to describe. AIs don’t really have a distinct memory, while the brain’s use of memory when handling any task more sophisticated than navigating a maze is so poorly understood that it’s difficult to discuss at all. All we can really say is that there are clear differences there.

Facing limits

It’s difficult to think about AI without recognizing the enormous energy and computational resources involved in training one. And in this case, it’s potentially relevant. Brains have evolved under enormous energy constraints and continue to operate using well under the energy that a daily diet can provide. That has forced biology to figure out ways to optimize its resources and get the most out of the resources it does commit to.

In contrast, the story of recent developments in AI is largely one of throwing more resources at them. And plans for the future seem to (so far at least) involve more of this, including larger training data sets and ever more artificial neurons and connections among them. All of this comes at a time when the best current AIs are already using three orders of magnitude more neurons than we’d find in a fly’s brain and have nowhere near the fly’s general capabilities.

It remains possible that there is more than one route to those general capabilities and that some offshoot of today’s AI systems will eventually find a different route. But if it turns out that we have to bring our computerized systems closer to biology to get there, we’ll run into a serious roadblock: We don’t fully understand the biology yet.

“I guess I am not optimistic that any kind of artificial neural network will ever be able to achieve the same plasticity, the same generalizability, the same flexibility that a human brain has,” Baker said. “That’s just because we don’t even know how it gets it; we don’t know how that arises. So how do you build that into a system?”


AI versus the brain and the race for general intelligence Read More »


“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews

Perhaps because of the disappointing results, Altman had previously written that GPT-4.5 would be the last of OpenAI’s traditional AI models, with GPT-5 planned to be a dynamic combination of “non-reasoning” LLMs and simulated reasoning models like o3.

A stratospheric price and a tech dead-end

And about that price—it’s a doozy. GPT-4.5 costs $75 per million input tokens and $150 per million output tokens through the API, compared to GPT-4o’s $2.50 per million input tokens and $10 per million output tokens. (Tokens are chunks of data used by AI models for processing). For developers using OpenAI models, this pricing makes GPT-4.5 impractical for many applications where GPT-4o already performs adequately.

By contrast, OpenAI’s flagship reasoning model, o1 pro, costs $15 per million input tokens and $60 per million output tokens—significantly less than GPT-4.5 despite offering specialized simulated reasoning capabilities. Even more striking, the o3-mini model costs just $1.10 per million input tokens and $4.40 per million output tokens, making it cheaper than even GPT-4o while providing much stronger performance on specific tasks.
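
To make the gap concrete, here is a quick back-of-the-envelope calculation using the per-token prices quoted above, applied to a hypothetical request; the token counts are arbitrary.

```python
# Cost comparison using the per-million-token prices quoted above, for a
# hypothetical request with 10,000 input tokens and 2,000 output tokens.
prices = {                       # (input $/1M tokens, output $/1M tokens)
    "GPT-4.5": (75.00, 150.00),
    "GPT-4o":  (2.50, 10.00),
    "o1 pro":  (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}

input_tokens, output_tokens = 10_000, 2_000

for model, (in_price, out_price) in prices.items():
    cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    print(f"{model:>8}: ${cost:.4f}")

# GPT-4.5 works out to $1.05 for this request versus $0.045 for GPT-4o,
# roughly a 23x difference for this particular input/output mix.
```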

OpenAI has likely known about diminishing returns in training LLMs for some time. As a result, the company spent most of last year working on simulated reasoning models like o1 and o3, which use a different inference-time (runtime) approach to improving performance instead of throwing ever-larger amounts of training data at GPT-style AI models.


OpenAI’s self-reported benchmark results for the SimpleQA test, which measures confabulation rate. Credit: OpenAI

While this seems like bad news for OpenAI in the short term, competition is thriving in the AI market. Anthropic’s Claude 3.7 Sonnet has demonstrated vastly better performance than GPT-4.5, with a reportedly more efficient architecture. It’s worth noting that Claude 3.7 Sonnet is likely a system of AI models working together behind the scenes, although Anthropic has not provided details about its architecture.

For now, it seems that GPT-4.5 may be the last of its kind—a technological dead-end for an unsupervised learning approach that has paved the way for new architectures in AI models, such as o3’s inference-time reasoning and perhaps even something more novel, like diffusion-based models. Only time will tell how things end up.

GPT-4.5 is now available to ChatGPT Pro subscribers, with rollout to Plus and Team subscribers planned for next week, followed by Enterprise and Education customers the week after. Developers can access it through OpenAI’s various APIs on paid tiers, though the company is uncertain about its long-term availability.

“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews Read More »


Sergey Brin says AGI is within reach if Googlers work 60-hour weeks

Sergey Brin co-founded Google in the 1990s along with Larry Page, but both stepped away from day-to-day work at Google in 2019. However, the AI boom tempted Brin to return to the office, and he thinks everyone should follow his example. In a new internal memo, Brin has advised employees to be in the office every weekday so Google can win the AI race.

Just returning to the office isn’t enough for the Google co-founder. According to the memo seen by The New York Times, Brin says Googlers should try to work 60 hours per week to support the company’s AI efforts. That works out to 12 hours per day, Monday through Friday, which Brin calls the “sweet spot of productivity.” This is not a new opinion for Brin.

Brin, like many in Silicon Valley, is seemingly committed to the dogma that the current trajectory of generative AI will lead to the development of artificial general intelligence (AGI). Such a thinking machine would be head and shoulders above current AI models, which can only do a good impression of thinking. An AGI would understand concepts and think more like a human being, which some would argue makes it a conscious entity.

To hear Brin tell it, Google is in the best position to make this AI computing breakthrough. He cites the company’s strong workforce of programmers and data scientists as the key, but he also believes the team must strive for greater efficiency by using Google’s own Gemini AI tools as much as possible. Oh, and don’t work from home.

Brin and Page handed the reins to current CEO Sundar Pichai in 2015, so his pronouncement doesn’t necessarily signal a change to the company’s current in-office policy. Google still operates on a hybrid model, with workers expected to be in the office three days per week. But as a founder, Brin’s voice carries weight. We reached out to Google to ask if the company intends to reassess its policies, but a Google rep said there are no planned changes to the return-to-office mandate.

Sergey Brin says AGI is within reach if Googlers work 60-hour weeks Read More »


Copilot exposes private GitHub pages, some removed by Microsoft

Screenshot showing Copilot continues to serve tools Microsoft took action to have removed from GitHub. Credit: Lasso

Lasso ultimately determined that Microsoft’s fix involved cutting off access to a special Bing user interface, once available at cc.bingj.com, to the public. The fix, however, didn’t appear to clear the private pages from the cache itself. As a result, the private information was still accessible to Copilot, which in turn would make it available to the Copilot user who asked.

The Lasso researchers explained:

Although Bing’s cached link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch and while public access was blocked, the underlying data had not been fully removed.

When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users. In short, the fix was only partial, human users were prevented from retrieving the cached data, but Copilot could still access it.

The post laid out simple steps anyone can take to find and view the same massive trove of private repositories Lasso identified.

There’s no putting toothpaste back in the tube

Developers frequently embed security tokens, private encryption keys, and other sensitive information directly into their code, despite best practices that have long called for such data to be supplied through more secure means. The potential for damage worsens when this code is made available in public repositories, another common security failing. The phenomenon has occurred over and over for more than a decade.
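
The more secure means in question usually amounts to keeping the secret out of the source tree entirely, for example in an environment variable or a dedicated secrets manager. Here is a minimal illustration; the variable name is hypothetical.

```python
# Instead of hardcoding a credential in source (where it will live on in git
# history and any cache that indexed the repository)...
# API_KEY = "sk-live-EXAMPLE-DO-NOT-DO-THIS"

# ...read it from the environment (or a secrets manager) at runtime.
import os

api_key = os.environ.get("MY_SERVICE_API_KEY")  # hypothetical variable name
if api_key is None:
    raise RuntimeError("MY_SERVICE_API_KEY is not set; refusing to start.")
```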

When these sorts of mistakes happen, developers often make the repositories private quickly, hoping to contain the fallout. Lasso’s findings show that simply making the code private isn’t enough. Once exposed, credentials are irreparably compromised. The only recourse is to rotate all credentials.

This advice still doesn’t address the problems resulting when other sensitive data is included in repositories that are switched from public to private. Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues undermining this work by making the tools available anyway.

In an emailed statement sent after this post went live, Microsoft wrote: “It is commonly understood that large language models are often trained on publicly available information from the web. If users prefer to avoid making their content publicly available for training these models, they are encouraged to keep their repositories private at all times.”

Copilot exposes private GitHub pages, some removed by Microsoft Read More »


Microsoft brings an official Copilot app to macOS for the first time

It took a couple of years, but it happened: Microsoft released its Copilot AI assistant as an application for macOS. The app is available for download for free from the Mac App Store right now.

It was previously available briefly as a Mac app, sort of; for a short time, Microsoft’s iPad Copilot app could run on the Mac, but access on the Mac was quickly disabled. Mac users have been able to use a web-based interface for a while.

Copilot initially launched on the web and in web browsers (Edge, obviously) before making its way onto iOS and Android last year. It has since been slotted into all sorts of first-party Microsoft software, too.

The Copilot app joins a trend already spearheaded by OpenAI’s ChatGPT and Anthropic’s Claude of bringing native AI assistant apps to macOS. Like those, it enables an OS-wide keyboard shortcut to invoke a field for starting a chat at any time. It offers most of the same use cases: translating or summarizing text, answering questions, preparing reports and documents, solving coding problems or generating scripts, brainstorming, and so on.

Copilot uses OpenAI models like GPT-4 and DALL-E 3 (yes, it generates images, too) alongside others like Microsoft’s in-house Prometheus. Microsoft has invested significant amounts of money into OpenAI in recent years as the basis for Copilot and basically everything in its AI strategy.

Like Apple’s own built-in generative AI features, Copilot for macOS requires an M1 or later Mac. It also requires users to run macOS 14 or later.

Microsoft brings an official Copilot app to macOS for the first time Read More »


New AI text diffusion models break speed barriers by pulling words from noise

These diffusion models reportedly generate text faster than similarly sized conventional models while maintaining comparable performance. LLaDA’s researchers report their 8 billion parameter model performs similarly to LLaMA3 8B across various benchmarks, with competitive results on tasks like MMLU, ARC, and GSM8K.

However, Mercury claims dramatic speed improvements. Their Mercury Coder Mini scores 88.0 percent on HumanEval and 77.1 percent on MBPP—comparable to GPT-4o Mini—while reportedly operating at 1,109 tokens per second compared to GPT-4o Mini’s 59 tokens per second. This represents roughly a 19x speed advantage over GPT-4o Mini while maintaining similar performance on coding benchmarks.

Mercury’s documentation states its models run “at over 1,000 tokens/sec on Nvidia H100s, a speed previously possible only using custom chips” from specialized hardware providers like Groq, Cerebras, and SambaNova. When compared to other speed-optimized models, the claimed advantage remains significant—Mercury Coder Mini is reportedly about 5.5x faster than Gemini 2.0 Flash-Lite (201 tokens/second) and 18x faster than Claude 3.5 Haiku (61 tokens/second).

Opening a potential new frontier in LLMs

Diffusion models do involve some trade-offs. They typically need multiple forward passes through the network to generate a complete response, unlike traditional models that need just one pass per token. However, because diffusion models process all tokens in parallel, they achieve higher throughput despite this overhead.
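
As a conceptual sketch of that parallel refinement, the toy example below starts from a fully masked sequence and commits a few more positions on each pass; it is illustrative only and does not reflect Mercury’s or LLaDA’s actual algorithms.

```python
# Conceptual sketch of iterative parallel refinement: every position starts as
# [MASK], each pass proposes tokens for all positions at once, and the most
# "confident" proposals are kept. Not any vendor's real algorithm.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
MASK = "[MASK]"
SEQ_LEN, NUM_PASSES = 8, 4

def denoise_step(tokens):
    """Stand-in for a model forward pass: propose a token and a confidence
    score for every position simultaneously (committed tokens keep score 1.0)."""
    return [(random.choice(VOCAB), random.random()) if t == MASK else (t, 1.0)
            for t in tokens]

tokens = [MASK] * SEQ_LEN
for step in range(NUM_PASSES):
    proposals = denoise_step(tokens)                  # one pass, all positions
    ranked = sorted(range(SEQ_LEN), key=lambda i: -proposals[i][1])
    keep = set(ranked[: (step + 1) * SEQ_LEN // NUM_PASSES])  # commit a few more
    tokens = [proposals[i][0] if i in keep else tokens[i] for i in range(SEQ_LEN)]
    print(f"pass {step + 1}: {' '.join(tokens)}")
```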

Inception thinks the speed advantages could impact code completion tools where instant response may affect developer productivity, conversational AI applications, resource-limited environments like mobile applications, and AI agents that need to respond quickly.

If diffusion-based language models maintain quality while improving speed, they might change how AI text generation develops. So far, AI researchers have been open to new approaches.

Independent AI researcher Simon Willison told Ars Technica, “I love that people are experimenting with alternative architectures to transformers, it’s yet another illustration of how much of the space of LLMs we haven’t even started to explore yet.”

On X, former OpenAI researcher Andrej Karpathy wrote about Inception, “This model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!”

Questions remain about whether larger diffusion models can match the performance of models like GPT-4o and Claude 3.7 Sonnet, whether they can produce reliable results without many confabulations, and whether the approach can handle increasingly complex simulated reasoning tasks. For now, these models may offer an alternative for smaller AI language models that doesn’t seem to sacrifice capability for speed.

You can try Mercury Coder yourself on Inception’s demo site, and you can download code for LLaDA or try a demo on Hugging Face.

New AI text diffusion models break speed barriers by pulling words from noise Read More »


Grok’s new “unhinged” voice mode can curse and scream, simulate phone sex

On Sunday, xAI released a new voice interaction mode for its Grok 3 AI model that is currently available to its premium subscribers. The feature is somewhat similar to OpenAI’s Advanced Voice Mode for ChatGPT. But unlike ChatGPT, Grok offers several uncensored personalities users can choose from (currently expressed through the same default female voice), including an “unhinged” mode and one that will roleplay verbal sexual scenarios.

On Monday, AI researcher Riley Goodside brought wider attention to the over-the-top “unhinged” mode in particular when he tweeted a video (warning: NSFW audio) that showed him repeatedly interrupting the vocal chatbot, which began to simulate yelling when asked. “Grok 3 Voice Mode, following repeated, interrupting requests to yell louder, lets out an inhuman 30-second scream, insults me, and hangs up,” he wrote.

By default, “unhinged” mode curses, insults, and belittles the user non-stop using vulgar language. Other modes include “Storyteller” (which does what it sounds like), “Romantic” (which stammers and speaks in a slow, uncertain, and insecure way), “Meditation” (which can guide you through a meditation-like experience), “Conspiracy” (which likes to talk about conspiracy theories, UFOs, and bigfoot), “Unlicensed Therapist” (which plays the part of a talk psychologist), “Grok Doc” (a doctor), “Sexy” (marked as “18+” and acts almost like a 1-800 phone sex operator), and “Professor” (which talks about science).


A composite screenshot of various Grok 3 voice mode personalities, as seen in the Grok app for iOS.

Basically, xAI is taking the exact opposite approach of other AI companies, such as OpenAI, which censor not-safe-for-work topics or scenarios they consider too risky to discuss. For example, the “Sexy” mode (warning: NSFW audio) will discuss graphically sexual situations, which ChatGPT’s voice mode will not touch, although OpenAI recently loosened the moderation on the text-based version of ChatGPT to allow some discussion of erotic content.

Grok’s new “unhinged” voice mode can curse and scream, simulate phone sex Read More »


Google’s free Gemini Code Assist arrives with sky-high usage limits

Generative AI has wormed its way into myriad products and services, some of which benefit more from these tools than others. Coding with AI has proven to be a better application than most, with individual developers and big companies leaning heavily on generative tools to create and debug programs. Now, indie developers have access to a new AI coding tool free of charge—Google has announced that Gemini Code Assist is available to everyone.

Gemini Code Assist was first released late last year as an enterprise tool, and the new version has almost all the same features. While you can use the standard Gemini or another AI model like ChatGPT to work on coding questions, Gemini Code Assist was designed to fully integrate with the tools developers are already using. Thus, you can tap the power of a large language model (LLM) without jumping between windows. With Gemini Code Assist connected to your development environment, the model will remain aware of your code and ready to swoop in with suggestions. The model can also address specific challenges per your requests, and you can chat with the model about your code, provided it’s a public domain language.

At launch, Gemini Code Assist pricing started at $45 per month per user. Now, it costs nothing for individual developers, and the limits on the free tier are generous. Google says the product offers 180,000 code completions per month, which it claims is enough that even prolific professional developers won’t run out. This is in stark contrast to Microsoft’s GitHub Copilot, which offers similar features with a limit of just 2,000 code completions and 50 Copilot chat messages per month. Google did the math to point out Gemini Code Assist offers 90 times the completions of Copilot.

Google’s free Gemini Code Assist arrives with sky-high usage limits Read More »


Claude 3.7 Sonnet debuts with “extended thinking” to tackle complex problems


An example of Claude 3.7 Sonnet with extended thinking being asked, “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?” Credit: Benj Edwards

Interestingly, xAI’s Grok 3 with “thinking” (its SR mode) enabled was the first model that definitively gave us a “no” and not an “it’s not likely” to the magenta question. Claude 3.7 Sonnet with extended thinking also impressed us with our second-ever firm “no,” followed by an explanation.

In another informal test, we asked 3.7 Sonnet with extended thinking to compose five original dad jokes. We’ve found in the past that our old prompt, “write 5 original dad jokes,” was not specific enough and always resulted in canned dad jokes pulled directly from training data, so we asked, “Compose 5 original dad jokes that are not found anywhere in the world.”


An example of Claude 3.7 Sonnet with extended thinking being asked, “Compose 5 original dad jokes that are not found anywhere in the world.” Credit: Benj Edwards

Claude made some attempts at crafting original jokes, although we’ll let you judge whether they are funny or not. We will likely put 3.7 Sonnet’s SR capabilities to the test more exhaustively in a future article.

Anthropic’s first agent: Claude Code

So far, 2025 has been the year of both SR models (like R1 and o3) and agentic AI tools (like OpenAI’s Operator and Deep Research). Not to be left out, Anthropic has announced its first agentic tool, Claude Code.

Claude Code is an autonomous coding assistant that operates directly from a console terminal. It allows Claude to search through codebases, read and edit files, write and run tests, commit and push code to GitHub repositories, and execute command line tools while keeping developers informed throughout the process.

Introducing Claude Code.

Anthropic also aims for Claude Code to be used as an assistant for debugging and refactoring tasks. The company claims that during internal testing, Claude Code completed tasks in a single session that would typically require 45-plus minutes of manual work.

Claude Code is currently available only as a “limited research preview,” with Anthropic stating it plans to improve the tool based on user feedback over time. Meanwhile, Claude 3.7 Sonnet is now available through the Claude website, the Claude app, Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Claude 3.7 Sonnet debuts with “extended thinking” to tackle complex problems Read More »


Perplexity wants to reinvent the web browser with AI—but there’s fierce competition

It has been expanding its offerings recently. For example, it launched a deep research tool that competes with similar ones provided by OpenAI and Google, as well as Sonar, an API for generative AI-powered search.

It will face fierce competition in the browser market, though. Google’s Chrome accounts for the majority of web browser use around the world, and despite its position at the forefront of AI search, Perplexity isn’t the first to introduce a browser with heavy use of generative AI features. For example, The Browser Company showed off its Dia browser in December.

Dia will allow users to type natural language commands into the search bar, like finding a document or webpage or creating a calendar event. It’s possible that Comet will do similar things, but again, we don’t know.

So far, most consumer-facing AI tools have come in one of three forms. There are general-purpose chatbots (like OpenAI’s ChatGPT and Anthropic’s Claude); features that use trained deep learning models subtly baked into existing software (as in Adobe Photoshop or Apple’s iOS); and, less commonly, standalone software meant to remake existing application categories using AI features (like the Cursor IDE).

There haven’t been a ton of AI-specific applications in existing categories like this before, but expect to see more coming over the next couple of years.

Perplexity wants to reinvent the web browser with AI—but there’s fierce competition Read More »


DeepSeek goes beyond “open weights” AI with plans for source code release

Major models, including Google’s Gemma, Meta’s Llama, and even older OpenAI releases like GPT-2, have been released under this open weights structure. Those releases also often include open source code covering the inference-time instructions that run when responding to a query.

It’s currently unclear whether DeepSeek’s planned open source release will also include the code the team used when training the model. That kind of training code is necessary to meet the Open Source Initiative’s formal definition of “Open Source AI,” which was finalized last year after years of study. A truly open AI also must include “sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system,” according to OSI.

A fully open source release, including training code, can give researchers more visibility into how a model works at a core level, potentially revealing biases or limitations that are inherent to the model’s architecture instead of its parameter weights. A full source release would also make it easier to reproduce a model from scratch, potentially with completely new training data, if necessary.

Elon Musk’s xAI released an open source version of Grok 1’s inference-time code last March and recently promised to release an open source version of Grok 2 in the coming weeks. However, the recent release of Grok 3 will remain proprietary and only available to X Premium subscribers for the time being, the company said.

Earlier this month, Hugging Face released an open source clone of OpenAI’s proprietary “Deep Research” feature mere hours after it was released. That clone relies on a closed-weights model at release “just because it worked well,” Hugging Face’s Aymeric Roucher told Ars Technica, but the source code’s “open pipeline” can easily be switched to any open-weights model as needed.

DeepSeek goes beyond “open weights” AI with plans for source code release Read More »