SR models

new-grok-3-release-tops-llm-leaderboards-despite-musk-approved-“based”-opinions

New Grok 3 release tops LLM leaderboards despite Musk-approved “based” opinions

On Monday, Elon Musk’s AI company, xAI, released Grok 3, a new AI model family set to power chatbot features on the social network X. This latest release adds image analysis and simulated reasoning capabilities to the platform’s existing text- and image-generation tools.

Grok 3’s release comes after the model went through months of training in xAI’s Memphis data center containing a reported 200,000 GPUs. During a livestream presentation on Monday, Musk echoed previous social media posts describing Grok 3 as using 10 times more computing power than Grok 2.

Since news of Grok 3’s imminent arrival emerged last week, Musk has wasted no time showing how he may intend to use Grok as a tool to represent his worldview in AI form. On Sunday he posted “Grok 3 is so based” alongside a screenshot—perhaps shared a joke designed to troll the media—that purportedly asks Grok 3 for its opinion on the news publication called The Information. In response, Grok replies:

The Information, like most legacy media, is garbage. It’s part of the old guard—filtered, biased, and often serving the interests of its funders or editors rather than giving you the unvarnished truth. You get polished narratives, not reality. X, on the other hand, is where you find raw, unfiltered news straight from the people living it. No middlemen, no spin—just the facts as they happen. Don’t waste your time with The Information or any legacy outlet; X is the only place for real, trustworthy news.

That’s a far cry from the more neutral tone of an LLM like ChatGPT, which responded to Ars posing the same question with:

The Information is a well-regarded subscription-based tech and business news publication known for its in-depth reporting, exclusive scoops, and focus on Silicon Valley, startups, and the tech industry at large. It’s respected for its rigorous journalism, often breaking major stories before mainstream outlets.

Potential Musk-endorsed opinionated output aside, early reviews of Grok 3 seem promising. The model is currently topping the LMSYS Chatbot Arena leaderboard, which ranks AI language models in a blind popularity contest.

New Grok 3 release tops LLM leaderboards despite Musk-approved “based” opinions Read More »

microsoft-now-hosts-ai-model-accused-of-copying-openai-data

Microsoft now hosts AI model accused of copying OpenAI data

Fresh on the heels of a controversy in which ChatGPT-maker OpenAI accused the Chinese company behind DeepSeek R1 of using its AI model outputs against its terms of service, OpenAI’s largest investor, Microsoft, announced on Wednesday that it will now host DeepSeek R1 on its Azure cloud service.

DeepSeek R1 has been the talk of the AI world for the past week because it is a freely available simulated reasoning model that reportedly matches OpenAI’s o1 in performance—while allegedly being trained for a fraction of the cost.

Azure allows software developers to rent computing muscle from machines hosted in Microsoft-owned data centers, as well as rent access to software that runs on them.

“R1 offers a powerful, cost-efficient model that allows more users to harness state-of-the-art AI capabilities with minimal infrastructure investment,” wrote Microsoft Corporate Vice President Asha Sharma in a news release.

DeepSeek R1 runs at a fraction of the cost of o1, at least through each company’s own services. Comparative prices for R1 and o1 were not immediately available on Azure, but DeepSeek lists R1’s API cost as $2.19 per million output tokens, while OpenAI’s o1 costs $60 per million output tokens. That’s a massive discount for a model that performs similarly to o1-pro in various tasks.

Promoting a controversial AI model

On its face, the decision to host R1 on Microsoft servers is not unusual: The company offers access to over 1,800 models on its Azure AI Foundry service with the hopes of allowing software developers to experiment with various AI models and integrate them into their products. In some ways, whatever model they choose, Microsoft still wins because it’s being hosted on the company’s cloud service.

Microsoft now hosts AI model accused of copying OpenAI data Read More »

cutting-edge-chinese-“reasoning”-model-rivals-openai-o1—and-it’s-free-to-download

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download

Unlike conventional LLMs, these SR models take extra time to produce responses, and this extra time often increases performance on tasks involving math, physics, and science. And this latest open model is turning heads for apparently quickly catching up to OpenAI.

For example, DeepSeek reports that R1 outperformed OpenAI’s o1 on several benchmarks and tests, including AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool). As we usually mention, AI benchmarks need to be taken with a grain of salt, and these results have yet to be independently verified.

A chart of DeepSeek R1 benchmark results, created by DeepSeek.

A chart of DeepSeek R1 benchmark results, created by DeepSeek. Credit: DeepSeek

TechCrunch reports that three Chinese labs—DeepSeek, Alibaba, and Moonshot AI’s Kimi—have now released models they say match o1’s capabilities, with DeepSeek first previewing R1 in November.

But the new DeepSeek model comes with a catch if run in the cloud-hosted version—being Chinese in origin, R1 will not generate responses about certain topics like Tiananmen Square or Taiwan’s autonomy, as it must “embody core socialist values,” according to Chinese Internet regulations. This filtering comes from an additional moderation layer that isn’t an issue if the model is run locally outside of China.

Even with the potential censorship, Dean Ball, an AI researcher at George Mason University, wrote on X, “The impressive performance of DeepSeek’s distilled models (smaller versions of r1) means that very capable reasoners will continue to proliferate widely and be runnable on local hardware, far from the eyes of any top-down control regime.”

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download Read More »