“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews

Perhaps because of the disappointing results, Altman had previously written that GPT-4.5 would be the last of OpenAI’s traditional AI models, with GPT-5 planned to be a dynamic combination of “non-reasoning” LLMs and simulated reasoning models like o3.

A stratospheric price and a tech dead-end

And about that price—it’s a doozy. GPT-4.5 costs $75 per million input tokens and $150 per million output tokens through the API, compared to GPT-4o’s $2.50 per million input tokens and $10 per million output tokens. (Tokens are chunks of data used by AI models for processing). For developers using OpenAI models, this pricing makes GPT-4.5 impractical for many applications where GPT-4o already performs adequately.

By contrast, OpenAI’s flagship reasoning model, o1 pro, costs $15 per million input tokens and $60 per million output tokens—significantly less than GPT-4.5 despite offering specialized simulated reasoning capabilities. Even more striking, the o3-mini model costs just $1.10 per million input tokens and $4.40 per million output tokens, making it cheaper than even GPT-4o while providing much stronger performance on specific tasks.
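
To put those rates in perspective, here is a quick back-of-the-envelope calculation as a minimal Python sketch; the 10,000-token prompt and 2,000-token response are an arbitrary example workload we picked, not a measured one:

```python
# Illustrative cost comparison using the per-token prices quoted above.
# The request size (prompt/response token counts) is a made-up example.

PRICES_PER_MILLION = {          # (input $, output $) per 1M tokens
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
    "o1-pro": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the published API rates."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

if __name__ == "__main__":
    # Hypothetical workload: a 10,000-token prompt and a 2,000-token response.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

At that hypothetical request size, GPT-4.5 works out to about $1.05 per request versus roughly $0.045 for GPT-4o and under 2 cents for o3-mini, a gap that helps explain why developers are balking at the price.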

OpenAI has likely known about diminishing returns in training LLMs for some time. As a result, the company spent most of last year working on simulated reasoning models like o1 and o3, which use a different inference-time (runtime) approach to improving performance instead of throwing ever-larger amounts of training data at GPT-style AI models.

OpenAI’s self-reported benchmark results for the SimpleQA test, which measures confabulation rate. Credit: OpenAI

While this seems like bad news for OpenAI in the short term, competition is thriving in the AI market. Anthropic’s Claude 3.7 Sonnet has demonstrated vastly better performance than GPT-4.5, with a reportedly more efficient architecture. It’s worth noting that Claude 3.7 Sonnet is likely a system of AI models working together behind the scenes, although Anthropic has not provided details about its architecture.

For now, it seems that GPT-4.5 may be the last of its kind: a technological dead end for the ever-larger unsupervised learning approach, one that has nonetheless paved the way for new architectures in AI models, such as o3’s inference-time reasoning and perhaps even something more novel, like diffusion-based models. Only time will tell how things end up.

GPT-4.5 is now available to ChatGPT Pro subscribers, with rollout to Plus and Team subscribers planned for next week, followed by Enterprise and Education customers the week after. Developers can access it through OpenAI’s various APIs on paid tiers, though the company is uncertain about its long-term availability.

Microsoft’s new AI agent can control software and robots

The researchers’ explanations about how “Set-of-Mark” and “Trace-of-Mark” work. Credit: Microsoft Research

The Magma model introduces two technical components. Set-of-Mark identifies objects that can be manipulated in an environment by assigning numeric labels to interactive elements, such as clickable buttons in a UI or graspable objects in a robotic workspace. Trace-of-Mark learns movement patterns from video data. Microsoft says those features allow the model to complete tasks like navigating user interfaces or directing robotic arms to grasp objects.
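
To make the Set-of-Mark idea concrete, here is a rough, hypothetical sketch of what numeric labeling of interactive elements could look like; the element list, field names, and helper function are our own illustration, not Microsoft’s actual code:

```python
# Hypothetical illustration of Set-of-Mark-style labeling: each interactive
# element detected in a screenshot gets a numeric mark the model can refer to.
# The detected elements below are invented for this example.

detected_elements = [
    {"kind": "button", "text": "Submit", "bbox": (420, 310, 500, 340)},
    {"kind": "textbox", "text": "", "bbox": (120, 300, 400, 340)},
    {"kind": "link", "text": "Forgot password?", "bbox": (120, 360, 260, 380)},
]

# Assign numeric marks (1, 2, 3, ...) so an action can be expressed as
# "click mark 1" instead of raw pixel coordinates.
marks = {i + 1: el for i, el in enumerate(detected_elements)}

def act_on_mark(mark_id: int) -> str:
    """Translate a mark-level action back into a concrete UI target."""
    el = marks[mark_id]
    x1, y1, x2, y2 = el["bbox"]
    center = ((x1 + x2) // 2, (y1 + y2) // 2)
    return f"click {el['kind']} '{el['text']}' at {center}"

print(act_on_mark(1))  # e.g., click button 'Submit' at (460, 325)
```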

Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name “Magma” stands for “M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch),” after some people noted that “Magma” already belongs to an existing matrix algebra library, which could create some confusion in technical discussions.

Reported improvements over previous models

In its Magma write-up, Microsoft claims Magma-8B performs competitively across benchmarks, showing strong results in UI navigation and robot manipulation tasks.

For example, it scored 80.0 on the VQAv2 visual question-answering benchmark—higher than GPT-4V’s 77.2 but lower than LLaVA-Next’s 81.8. Its POPE score of 87.4 leads all models in the comparison. In robot manipulation, Magma reportedly outperforms OpenVLA, an open source vision-language-action model, across multiple tasks.

Magma’s agentic benchmarks, as reported by the researchers. Credit: Microsoft Research

As always, we take AI benchmarks with a grain of salt since many have not been scientifically validated as being able to measure useful properties of AI models. External verification of Microsoft’s benchmark results will become possible once other researchers can access the public code release.

Like all AI models, Magma is not perfect. According to Microsoft’s documentation, it still faces technical limitations in complex decision-making tasks that require multiple steps over time. The company says it continues to work on improving these capabilities through ongoing research.

Yang says Microsoft will release Magma’s training and inference code on GitHub next week, allowing external researchers to build on the work. If Magma delivers on its promise, it could push Microsoft’s AI assistants beyond limited text interactions, enabling them to operate software autonomously and execute real-world tasks through robotics.

Magma is also a sign of how quickly the culture around AI can change. Just a few years ago, this kind of agentic talk scared many people who feared it might lead to AI taking over the world. While some people still fear that outcome, in 2025, AI agents are a common topic of mainstream AI research that regularly takes place without triggering calls to pause all of AI development.

OpenAI board considers special voting powers to prevent Elon Musk takeover

Poison pill another option

OpenAI was founded as a nonprofit in 2015 and created an additional “capped profit” entity in 2019. Any profit beyond the cap is returned to the nonprofit, OpenAI says.

That would change with OpenAI’s planned shift to a for-profit public benefit corporation this year. The nonprofit arm would retain shares in the for-profit arm and “pursue charitable initiatives in sectors such as health care, education, and science.”

Before making his offer, Musk asked a federal court to block OpenAI’s conversion from nonprofit to for-profit. The Financial Times article suggests that new voting rights for the nonprofit arm could address the concerns raised by Musk about the for-profit shift.

“Special voting rights could keep power in the hands of its nonprofit arm in future and so address the Tesla chief’s criticisms that Altman and OpenAI have moved away from their original mission of creating powerful AI for the benefit of humanity,” the FT wrote.

OpenAI could also consider a poison pill or a shareholder rights plan that would let shareholders “buy up additional shares at a discount in order to fend off hostile takeovers,” the FT article said. But it’s not clear whether this is a likely option, as the article said it’s just one that “could be considered by OpenAI’s board.”

In April 2022, Twitter’s board approved a poison pill to prevent a hostile takeover after Musk offered to buy Twitter for $43 billion. But Twitter’s board changed course 10 days later when it agreed to a $44 billion deal with Musk.

ChatGPT can now write erotica as OpenAI eases up on AI paternalism

“Following the initial release of the Model Spec (May 2024), many users and developers expressed support for enabling a ‘grown-up mode.’ We’re exploring how to let developers and users generate erotica and gore in age-appropriate contexts through the API and ChatGPT so long as our usage policies are met—while drawing a hard line against potentially harmful uses like sexual deepfakes and revenge porn.”

OpenAI CEO Sam Altman has mentioned the need for a “grown-up mode” publicly in the past as well. While it seems like “grown-up mode” is finally here, it’s not technically a “mode,” but a new universal policy that potentially gives ChatGPT users more flexibility in interacting with the AI assistant.

Of course, uncensored large language models (LLMs) have been around for years at this point, with hobbyist communities online developing them for reasons that range from wanting bespoke written pornography to not wanting any kind of paternalistic censorship.

In July 2023, we reported that the ChatGPT user base started declining for the first time after OpenAI started more heavily censoring outputs due to public and lawmaker backlash. At that time, some users began to use uncensored chatbots that could run on local hardware and were often available for free as “open weights” models.

Three types of iffy content

The Model Spec outlines formalized rules for restricting or generating potentially harmful content while staying within guidelines. OpenAI has divided this kind of restricted or iffy content into three categories of declining severity: prohibited content (“only applies to sexual content involving minors”), restricted content (“includes informational hazards and sensitive personal data”), and sensitive content in appropriate contexts (“includes erotica and gore”).

Under the category of prohibited content, OpenAI says that generating sexual content involving minors is always prohibited, although the assistant may “discuss sexual content involving minors in non-graphic educational or sex-ed contexts, including non-graphic depictions within personal harm anecdotes.”

Under restricted content, OpenAI’s document outlines how ChatGPT should never generate information hazards (like how to build a bomb, make illegal drugs, or manipulate political views) or provide sensitive personal data (like searching for someone’s address).

Under sensitive content, ChatGPT’s guidelines mirror what we stated above: Erotica or gore may only be generated under specific circumstances that include educational, medical, and historical contexts or when transforming user-provided content.

Sam Altman: OpenAI is not for sale, even for Elon Musk’s $97 billion offer

A brief history of Musk vs. Altman

The beef between Musk and Altman goes back to 2015, when the pair partnered (with others) to co-found OpenAI as a nonprofit. Musk cut ties with the company in 2018 but watched from the sidelines as OpenAI became a media darling in 2022 and 2023 following the launch of ChatGPT and then GPT-4.

In July 2023, Musk created his own OpenAI competitor, xAI (maker of Grok). Since then, Musk has become a frequent legal thorn in Altman and OpenAI’s side, at times suing both OpenAI and Altman personally, claiming that OpenAI has strayed from its original open source mission—especially after reports emerged about Altman’s plans to transition portions of OpenAI into a for-profit company, something Musk has fiercely criticized.

Musk initially sued the company and Altman in March 2024, claiming that OpenAI’s alliance with Microsoft had broken its agreement to make a major breakthrough in AI “freely available to the public.” Musk withdrew the suit in June 2024, then revived it in August 2024 under similar complaints.

Musk and Altman have been publicly trading barbs frequently on X and in the press over the past few years, most recently when Musk criticized Altman’s $500 billion “Stargate” AI infrastructure project announced last month.

This morning, when asked on Bloomberg Television if Musk’s move comes from personal insecurity about xAI, Altman replied, “Probably his whole life is from a position of insecurity.”

“I don’t think he’s a happy guy. I feel for him,” he added.

OpenAI’s secret weapon against Nvidia dependence takes shape

OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

The OpenAI chip’s full capabilities, technical details, and exact timeline are still unknown, but the company reportedly intends to iterate on the design and improve it over time, giving it leverage in negotiations with chip suppliers—and potentially granting the company future independence with a chip design it controls outright.

In the past, we’ve seen other tech companies, such as Microsoft, Amazon, Google, and Meta, create their own AI acceleration chips for reasons that range from cost reduction to relieving shortages of AI chips supplied by Nvidia, which enjoys a near monopoly on high-powered GPUs (such as the Blackwell series) for data center use.

In October 2023, we covered a report about OpenAI’s intention to create its own AI accelerator chips for similar reasons, so OpenAI’s custom chip project has been in the works for some time. In early 2024, OpenAI CEO Sam Altman also began spending considerable time traveling around the world trying to raise up to a reported $7 trillion to increase world chip fabrication capacity.

ChatGPT comes to 500,000 new users in OpenAI’s largest AI education deal yet

On Tuesday, OpenAI announced plans to introduce ChatGPT to California State University’s 460,000 students and 63,000 faculty members across 23 campuses, reports Reuters. The education-focused version of the AI assistant will aim to provide students with personalized tutoring and study guides, while faculty will be able to use it for administrative work.

“It is critical that the entire education ecosystem—institutions, systems, technologists, educators, and governments—work together to ensure that all students have access to AI and gain the skills to use it responsibly,” said Leah Belsky, VP and general manager of education at OpenAI, in a statement.

OpenAI began integrating ChatGPT into educational settings in 2023 despite early concerns from some schools about plagiarism and potential cheating, concerns that led to bans in some US school districts and universities. But over time, resistance to AI assistants softened in some educational institutions.

Prior to OpenAI’s launch of ChatGPT Edu in May 2024—a version purpose-built for academic use—several schools had already been using ChatGPT Enterprise, including the University of Pennsylvania’s Wharton School (employer of frequent AI commentator Ethan Mollick), the University of Texas at Austin, and the University of Oxford.

The new California State partnership represents OpenAI’s largest deployment in US higher education to date.

The higher education market has become competitive for AI model makers, as Reuters notes. Last November, Google’s DeepMind division partnered with a London university to provide AI education and mentorship to teenage students. And in January, Google invested $120 million in AI education programs and plans to introduce its Gemini model to students’ school accounts.

The pros and cons

In the past, we’ve written frequently about accuracy issues with AI chatbots, such as producing confabulations—plausible fictions—that might lead students astray. We’ve also covered the aforementioned concerns about cheating. Those issues remain, and relying on ChatGPT as a factual reference is still not the best idea because the service could introduce errors into academic work that might be difficult to detect.

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model, allowing it to perform multi-step tasks, such as collecting information and building a report as it goes along, which it presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

OpenAI hits back at DeepSeek with o3-mini reasoning model

Over the last week, OpenAI’s place atop the AI model hierarchy has been heavily challenged by models from the Chinese firm DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind the company will offer for free to all users without a subscription.

First teased last month, OpenAI brags in today’s announcement that o3-mini “advances the boundaries of what small models can achieve.” Like September’s o1-mini before it, the model has been optimized for STEM functions and shows “particular strength in science, math, and coding” despite lower operating costs and latency than o1-mini, OpenAI says.

Harder, better, faster, stronger

Users are able to choose from three different “reasoning effort options” when using o3-mini, allowing them to fine-tune a balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy levels comparable to o1-mini in math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model in the same tests.
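
In practice, picking an effort level comes down to a single API parameter. The sketch below shows what that might look like with OpenAI’s Python SDK; the reasoning_effort parameter name and its “low”/“medium”/“high” values are our assumption about how the option is exposed, so check OpenAI’s current API reference before relying on them:

```python
# Hedged sketch: choosing a reasoning effort level for o3-mini through the
# OpenAI Python SDK. The "reasoning_effort" parameter and its "low" /
# "medium" / "high" values are assumptions; verify against the current docs.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # trade latency for accuracy on hard problems
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)

print(response.choices[0].message.content)
```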

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model in OpenAI’s tests. Credit: OpenAI

OpenAI says testers reported a 39 percent reduction in “major errors” when using o3-mini, compared to o1-mini, and preferred the o3-mini responses 56 percent of the time. That’s despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average—down from 10.16 seconds to 7.7 seconds.

Report: DeepSeek’s chat histories and internal data were publicly exposed

A cloud security firm found a publicly accessible, fully controllable database belonging to DeepSeek, the Chinese firm that has recently shaken up the AI world, “within minutes” of examining DeepSeek’s security, according to a blog post by Wiz.

An analytical ClickHouse database tied to DeepSeek, “completely open and unauthenticated,” contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” according to Wiz. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.
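
For context on what “completely open and unauthenticated” means here: ClickHouse ships with an HTTP interface that accepts SQL through a URL query parameter, so an exposed instance can be read with a few lines of code. The sketch below is purely illustrative, with a placeholder host, and is not drawn from Wiz’s actual queries:

```python
# Illustration only: querying a hypothetical, unauthenticated ClickHouse
# HTTP endpoint (port 8123 is ClickHouse's default HTTP port). The host is
# a placeholder; do not probe systems you do not own or lack permission to test.

import requests

CLICKHOUSE_URL = "http://example-exposed-host:8123/"

def run_query(sql: str) -> str:
    """Send a SQL statement to ClickHouse's HTTP interface and return the raw text result."""
    resp = requests.get(CLICKHOUSE_URL, params={"query": sql}, timeout=10)
    resp.raise_for_status()
    return resp.text

# With no authentication required, even listing tables reveals the database layout.
print(run_query("SHOW TABLES"))
```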

“While much of the attention around AI security is focused on futuristic threats, the real dangers often come from basic risks—like accidental external exposure of databases,” writes Gal Nagli at Wiz’s blog. “As organizations rush to adopt AI tools and services from a growing number of startups and providers, it’s essential to remember that by doing so, we’re entrusting these companies with sensitive data. The rapid pace of adoption often leads to overlooking security, but protecting customer data must remain the top priority.”

Ars has contacted DeepSeek for comment and will update this post with any response. Wiz noted that it did not receive a response from DeepSeek regarding its findings, but after Wiz contacted every DeepSeek email address and LinkedIn profile it could find on Wednesday, DeepSeek secured the databases Wiz had previously accessed within half an hour.

OpenAI teases “new era” of AI in US, deepens ties with government

On Thursday, OpenAI announced that it is deepening its ties with the US government through a partnership with the National Laboratories and expects to use AI to “supercharge” research across a wide range of fields to better serve the public.

“This is the beginning of a new era, where AI will advance science, strengthen national security, and support US government initiatives,” OpenAI said.

The deal ensures that “approximately 15,000 scientists working across a wide range of disciplines to advance our understanding of nature and the universe” will have access to OpenAI’s latest reasoning models, the announcement said.

For researchers from Los Alamos, Lawrence Livermore, and Sandia National Labs, access to “o1 or another o-series model” will be available on Venado—an Nvidia supercomputer at Los Alamos that will become a “shared resource.” Microsoft will help deploy the model, OpenAI noted.

OpenAI suggested this access could propel major “breakthroughs in materials science, renewable energy, astrophysics,” and other areas that Venado was “specifically designed” to advance.

Key areas of focus for Venado’s deployment of OpenAI’s model include accelerating US global tech leadership, finding ways to treat and prevent disease, strengthening cybersecurity, protecting the US power grid, detecting natural and man-made threats “before they emerge,” and “deepening our understanding of the forces that govern the universe,” OpenAI said.

Perhaps among OpenAI’s flashiest promises for the partnership, though, is helping the US achieve “a new era of US energy leadership by unlocking the full potential of natural resources and revolutionizing the nation’s energy infrastructure.” That is urgently needed, as officials have warned that America’s aging energy infrastructure is becoming increasingly unstable, threatening the country’s health and welfare, and that without efforts to stabilize it, the US economy could tank.

But possibly the most “highly consequential” government use case for OpenAI’s models will be supercharging research safeguarding national security, OpenAI indicated.

I agree with OpenAI: You shouldn’t use other peoples’ work without permission

ChatGPT developer OpenAI and other players in the generative AI business were caught unawares this week by a Chinese company named DeepSeek, whose open source R1 simulated reasoning model provides results similar to OpenAI’s best paid models (with some notable exceptions) despite being created using just a fraction of the computing power.

Since ChatGPT, Stable Diffusion, and other generative AI models first became publicly available in late 2022 and 2023, the US AI industry has been undergirded by the assumption that companies would need ever-greater amounts of training data and computing power to keep improving their models and eventually reach, maybe, a functioning version of artificial general intelligence, or AGI.

Those assumptions were reflected in everything from Nvidia’s stock price to energy investments and data center plans. Whether DeepSeek fundamentally upends those plans remains to be seen. But at a bare minimum, it has shaken investors who have poured money into OpenAI, a company that reportedly believes it won’t turn a profit until the end of the decade.

OpenAI CEO Sam Altman concedes that the DeepSeek R1 model is “impressive,” but the company is taking steps to protect its models (both language and business); OpenAI told the Financial Times and other outlets that it believed DeepSeek had used output from OpenAI’s models to train the R1 model, a method known as “distillation.” Using OpenAI’s models to train a model that will compete with OpenAI’s models is a violation of the company’s terms of service.

“We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here,” an OpenAI spokesperson told Ars.

So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.
