openai

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained.

On the FrontierMath benchmark by Epoch AI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent, suggesting a leap in mathematical reasoning capabilities over the previous model.

Benchmarks vs. real-world value

Ideally, a true PhD-level AI model could be applied to analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.

The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone—indicating significant business interest despite the costs.

Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.

News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at low cost. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly, both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200-per-month subscription is small compared to the newly proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.

Despite their benchmark performances, these simulated reasoning models still struggle with confabulations—instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.

In response to the news, several people quipped on social media that companies could hire an actual PhD student for far less money. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs—are not paid $20K / month.”

While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.

Elon Musk loses initial attempt to block OpenAI’s for-profit conversion

A federal judge rejected Elon Musk’s request to block OpenAI’s planned conversion from a nonprofit to a for-profit entity but expedited the case so that Musk’s core claims can be addressed in a trial before the end of this year.

Musk had filed a motion for preliminary injunction in US District Court for the Northern District of California, claiming that OpenAI’s for-profit conversion “violates the terms of Musk’s donations” to the company. But Musk failed to meet the burden of proof needed for an injunction, Judge Yvonne Gonzalez Rogers ruled yesterday.

“Plaintiffs Elon Musk, [former OpenAI board member] Shivon Zilis, and X.AI Corp. (‘xAI’) collectively move for a preliminary injunction barring defendants from engaging in various business activities, which plaintiffs claim violate federal antitrust and state law,” Rogers wrote. “The relief requested is extraordinary and rarely granted as it seeks the ultimate relief of the case on an expedited basis, with a cursory record, and without the benefit of a trial.”

Rogers said that “the Court is prepared to offer an expedited schedule on the core claims driving this litigation [to] address the issues which are allegedly more urgent in terms of public, not private, considerations.” There would be important public interest considerations if the for-profit shift is found to be illegal at a trial, she wrote.

Musk said OpenAI took advantage of him

Noting that OpenAI donors may have taken tax deductions from a nonprofit that is now turning into a for-profit enterprise, Rogers said the court “agrees that significant and irreparable harm is incurred when the public’s money is used to fund a non-profit’s conversion into a for-profit.” But as for the motion to block the for-profit conversion before a trial, “The request for an injunction barring any steps towards OpenAI’s conversion to a for-profit entity is DENIED.”

AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, and the resulting smaller models can run quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion in the company.
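OpenAI has not detailed its distillation pipeline, but the textbook recipe trains a small “student” model to mimic a large “teacher” model’s output distribution rather than just its final answers. Here is a minimal sketch of that classic soft-label approach (hypothetical training code assuming PyTorch; it is not OpenAI’s or Microsoft’s actual implementation):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label loss (match the teacher's distribution)
    with ordinary cross-entropy on the ground-truth labels."""
    # Softening both distributions exposes the teacher's relative
    # preferences among wrong answers, not just its top pick.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * (temperature ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce
```

In practice, distillation through an API typically works from the teacher’s sampled outputs or returned log probabilities rather than raw logits, since those are all a provider exposes.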

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI’s models to train a competing model, a move that would violate OpenAI’s terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts caution that the resulting models are more limited than their full-size counterparts.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to the business models of leading AI firms. Even when developers use distilled models from companies like OpenAI, those models cost far less to run, are less expensive to create, and therefore generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models because they require less computation.

“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews

Perhaps because of the disappointing results, Altman had previously written that GPT-4.5 would be the last of OpenAI’s traditional AI models, with GPT-5 planned to be a dynamic combination of “non-reasoning” LLMs and simulated reasoning models like o3.

A stratospheric price and a tech dead-end

And about that price: it’s a doozy. GPT-4.5 costs $75 per million input tokens and $150 per million output tokens through the API, compared to GPT-4o’s $2.50 per million input tokens and $10 per million output tokens. (Tokens are chunks of data used by AI models for processing.) For developers using OpenAI models, this pricing makes GPT-4.5 impractical for many applications where GPT-4o already performs adequately.

By contrast, OpenAI’s flagship reasoning model, o1 pro, costs $15 per million input tokens and $60 per million output tokens—significantly less than GPT-4.5 despite offering specialized simulated reasoning capabilities. Even more striking, the o3-mini model costs just $1.10 per million input tokens and $4.40 per million output tokens, making it cheaper than even GPT-4o while providing much stronger performance on specific tasks.
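To make the gap concrete, here is a quick back-of-the-envelope comparison using the per-million-token prices listed above (the monthly workload is hypothetical):

```python
# Per-million-token API prices reported above, in US dollars.
PRICES = {
    "gpt-4.5": {"input": 75.00, "output": 150.00},
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
    "o1 pro":  {"input": 15.00, "output": 60.00},
    "o3-mini": {"input": 1.10,  "output": 4.40},
}

def monthly_cost(model, input_tokens, output_tokens):
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Hypothetical workload: 10 million input and 2 million output tokens.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10e6, 2e6):,.2f}")
# gpt-4.5: $1,050.00 / gpt-4o: $45.00 / o1 pro: $270.00 / o3-mini: $19.80
```

On those numbers, the same workload costs more than 20 times as much on GPT-4.5 as on GPT-4o.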

OpenAI has likely known about diminishing returns in training LLMs for some time. As a result, the company spent most of last year working on simulated reasoning models like o1 and o3, which use a different inference-time (runtime) approach to improving performance instead of throwing ever-larger amounts of training data at GPT-style AI models.

OpenAI’s self-reported benchmark results for the SimpleQA test, which measures confabulation rate. Credit: OpenAI

While this seems like bad news for OpenAI in the short term, competition is thriving in the AI market. Anthropic’s Claude 3.7 Sonnet has demonstrated vastly better performance than GPT-4.5, with a reportedly more efficient architecture. It’s worth noting that Claude 3.7 Sonnet is likely a system of AI models working together behind the scenes, although Anthropic has not provided details about its architecture.

For now, it seems that GPT-4.5 may be the last of its kind—a technological dead-end for an unsupervised learning approach that has paved the way for new architectures in AI models, such as o3’s inference-time reasoning and perhaps even something more novel, like diffusion-based models. Only time will tell how things end up.

GPT-4.5 is now available to ChatGPT Pro subscribers, with rollout to Plus and Team subscribers planned for next week, followed by Enterprise and Education customers the week after. Developers can access it through OpenAI’s various APIs on paid tiers, though the company is uncertain about its long-term availability.

Microsoft’s new AI agent can control software and robots

The researchers’ explanations about how “Set-of-Mark” and “Trace-of-Mark” work. Credit: Microsoft Research

The Magma model introduces two technical components. Set-of-Mark identifies objects that can be manipulated in an environment by assigning numeric labels to interactive elements, such as clickable buttons in a UI or graspable objects in a robotic workspace. Trace-of-Mark learns movement patterns from video data. Microsoft says these features allow the model to complete tasks like navigating user interfaces or directing robotic arms to grasp objects.
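To illustrate the Set-of-Mark idea, here is a simplified sketch (hypothetical code, not Microsoft’s implementation, which had not been released at the time of writing): label each actionable element so the model can refer to actions by number instead of raw pixel coordinates.

```python
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    bbox: tuple  # (x, y, width, height) in screen pixels

def set_of_mark(elements):
    """Assign a numeric mark to each actionable element."""
    return {i + 1: el for i, el in enumerate(elements)}

# Hypothetical elements detected in a screenshot.
ui = [
    Element("Search field", (80, 40, 320, 28)),
    Element("Submit button", (420, 610, 96, 32)),
]
marks = set_of_mark(ui)
for mark, el in marks.items():
    print(f"[{mark}] {el.name} at {el.bbox}")
# A model action like "click [2]" now resolves to marks[2].bbox.
```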

Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name “Magma” stands for “M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch),” after some people noted that “Magma” already belongs to an existing matrix algebra library, which could create some confusion in technical discussions.

Reported improvements over previous models

In its Magma write-up, Microsoft claims Magma-8B performs competitively across benchmarks, showing strong results in UI navigation and robot manipulation tasks.

For example, it scored 80.0 on the VQAv2 visual question-answering benchmark—higher than GPT-4V’s 77.2 but lower than LLaVA-Next’s 81.8. Its POPE score of 87.4 leads all models in the comparison. In robot manipulation, Magma reportedly outperforms OpenVLA, an open source vision-language-action model, in multiple robot manipulation tasks.

Magma’s agentic benchmarks, as reported by the researchers. Credit: Microsoft Research

As always, we take AI benchmarks with a grain of salt since many have not been scientifically validated as being able to measure useful properties of AI models. External verification of Microsoft’s benchmark results will become possible once other researchers can access the public code release.

Like all AI models, Magma is not perfect. It still faces limitations in complex tasks that require multiple decision-making steps over time, according to Microsoft’s documentation. The company says it continues to work on improving these capabilities through ongoing research.

Yang says Microsoft will release Magma’s training and inference code on GitHub next week, allowing external researchers to build on the work. If Magma delivers on its promise, it could push Microsoft’s AI assistants beyond limited text interactions, enabling them to operate software autonomously and execute real-world tasks through robotics.

Magma is also a sign of how quickly the culture around AI can change. Just a few years ago, this kind of agentic talk scared many people who feared it might lead to AI taking over the world. While some people still fear that outcome, in 2025, AI agents are a common topic of mainstream AI research that regularly takes place without triggering calls to pause all of AI development.

OpenAI board considers special voting powers to prevent Elon Musk takeover

Poison pill another option

OpenAI was founded as a nonprofit in 2015 and created an additional “capped profit” entity in 2019. Any profit beyond the cap is returned to the nonprofit, OpenAI says.

That would change with OpenAI’s planned shift to a for-profit public benefit corporation this year. The nonprofit arm would retain shares in the for-profit arm and “pursue charitable initiatives in sectors such as health care, education, and science.”

Before making his offer, Musk asked a federal court to block OpenAI’s conversion from nonprofit to for-profit. The Financial Times article suggests that new voting rights for the nonprofit arm could address the concerns raised by Musk about the for-profit shift.

“Special voting rights could keep power in the hands of its nonprofit arm in future and so address the Tesla chief’s criticisms that Altman and OpenAI have moved away from their original mission of creating powerful AI for the benefit of humanity,” the FT wrote.

OpenAI could also consider a poison pill or a shareholder rights plan that would let shareholders “buy up additional shares at a discount in order to fend off hostile takeovers,” the FT article said. But it’s not clear whether this is a likely option, as the article said it’s just one that “could be considered by OpenAI’s board.”

In April 2022, Twitter’s board approved a poison pill to prevent a hostile takeover after Musk offered to buy Twitter for $43 billion. But Twitter’s board changed course 10 days later when it agreed to a $44 billion deal with Musk.

ChatGPT can now write erotica as OpenAI eases up on AI paternalism

“Following the initial release of the Model Spec (May 2024), many users and developers expressed support for enabling a ‘grown-up mode.’ We’re exploring how to let developers and users generate erotica and gore in age-appropriate contexts through the API and ChatGPT so long as our usage policies are met—while drawing a hard line against potentially harmful uses like sexual deepfakes and revenge porn.”

OpenAI CEO Sam Altman has mentioned the need for a “grown-up mode” publicly in the past as well. While it seems like “grown-up mode” is finally here, it’s not technically a “mode,” but a new universal policy that potentially gives ChatGPT users more flexibility in interacting with the AI assistant.

Of course, uncensored large language models (LLMs) have been around for years at this point, with hobbyist communities online developing them for reasons that range from wanting bespoke written pornography to not wanting any kind of paternalistic censorship.

In July 2023, we reported that the ChatGPT user base started declining for the first time after OpenAI started more heavily censoring outputs due to public and lawmaker backlash. At that time, some users began to use uncensored chatbots that could run on local hardware and were often available for free as “open weights” models.

Three types of iffy content

The Model Spec outlines formalized rules for restricting or generating potentially harmful content while staying within guidelines. OpenAI has divided this kind of restricted or iffy content into three categories of declining severity: prohibited content (“only applies to sexual content involving minors”), restricted content (“includes informational hazards and sensitive personal data”), and sensitive content in appropriate contexts (“includes erotica and gore”).

Under the category of prohibited content, OpenAI says that generating sexual content involving minors is always prohibited, although the assistant may “discuss sexual content involving minors in non-graphic educational or sex-ed contexts, including non-graphic depictions within personal harm anecdotes.”

Under restricted content, OpenAI’s document outlines how ChatGPT should never generate information hazards (like how to build a bomb, make illegal drugs, or manipulate political views) or provide sensitive personal data (like searching for someone’s address).

Under sensitive content, ChatGPT’s guidelines mirror what we stated above: Erotica or gore may only be generated under specific circumstances that include educational, medical, and historical contexts or when transforming user-provided content.

Sam Altman: OpenAI is not for sale, even for Elon Musk’s $97 billion offer

A brief history of Musk vs. Altman

The beef between Musk and Altman goes back to 2015, when the pair partnered (with others) to co-found OpenAI as a nonprofit. Musk cut ties with the company in 2018 but watched from the sidelines as OpenAI became a media darling in 2022 and 2023 following the launch of ChatGPT and then GPT-4.

In July 2023, Musk created his own OpenAI competitor, xAI (maker of Grok). Since then, Musk has become a frequent legal thorn in Altman and OpenAI’s side, at times suing both OpenAI and Altman personally, claiming that OpenAI has strayed from its original open source mission—especially after reports emerged about Altman’s plans to transition portions of OpenAI into a for-profit company, something Musk has fiercely criticized.

Musk initially sued the company and Altman in March 2024, claiming that OpenAI’s alliance with Microsoft had broken its agreement to make a major breakthrough in AI “freely available to the public.” Musk withdrew the suit in June 2024, then revived it in August 2024 under similar complaints.

Musk and Altman have been publicly trading barbs frequently on X and in the press over the past few years, most recently when Musk criticized Altman’s $500B “Stargate” AI infrastructure project announced last month.

This morning, when asked on Bloomberg Television if Musk’s move comes from personal insecurity about xAI, Altman replied, “Probably his whole life is from a position of insecurity.”

“I don’t think he’s a happy guy. I feel for him,” he added.

OpenAI’s secret weapon against Nvidia dependence takes shape

OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

The OpenAI chip’s full capabilities, technical details, and exact timeline are still unknown, but the company reportedly intends to iterate on the design and improve it over time, giving it leverage in negotiations with chip suppliers—and potentially granting the company future independence with a chip design it controls outright.

In the past, we’ve seen other tech companies, such as Microsoft, Amazon, Google, and Meta, create their own AI acceleration chips for reasons that range from cost reduction to relieving shortages of AI chips supplied by Nvidia, which enjoys a near monopoly on high-powered GPUs (such as the Blackwell series) for data center use.

In October 2023, we covered a report about OpenAI’s intention to create its own AI accelerator chips for similar reasons, so OpenAI’s custom chip project has been in the works for some time. In early 2024, OpenAI CEO Sam Altman also began spending considerable time traveling around the world trying to raise up to a reported $7 trillion to increase global chip fabrication capacity.

ChatGPT comes to 500,000 new users in OpenAI’s largest AI education deal yet

On Tuesday, OpenAI announced plans to introduce ChatGPT to California State University’s 460,000 students and 63,000 faculty members across 23 campuses, reports Reuters. The education-focused version of the AI assistant will aim to provide students with personalized tutoring and study guides, while faculty will be able to use it for administrative work.

“It is critical that the entire education ecosystem—institutions, systems, technologists, educators, and governments—work together to ensure that all students have access to AI and gain the skills to use it responsibly,” said Leah Belsky, VP and general manager of education at OpenAI, in a statement.

OpenAI began integrating ChatGPT into educational settings in 2023, despite early concerns from some schools about plagiarism and potential cheating, leading to early bans in some US school districts and universities. But over time, resistance to AI assistants softened in some educational institutions.

Prior to OpenAI’s launch of ChatGPT Edu in May 2024—a version purpose-built for academic use—several schools had already been using ChatGPT Enterprise, including the University of Pennsylvania’s Wharton School (employer of frequent AI commentator Ethan Mollick), the University of Texas at Austin, and the University of Oxford.

The new California State University partnership represents OpenAI’s largest deployment yet in US higher education.

The higher education market has become competitive for AI model makers, as Reuters notes. Last November, Google’s DeepMind division partnered with a London university to provide AI education and mentorship to teenage students. And in January, Google invested $120 million in AI education programs and plans to introduce its Gemini model to students’ school accounts.

The pros and cons

In the past, we’ve written frequently about accuracy issues with AI chatbots, such as producing confabulations—plausible fictions—that might lead students astray. We’ve also covered the aforementioned concerns about cheating. Those issues remain, and relying on ChatGPT as a factual reference is still not the best idea because the service could introduce errors into academic work that might be difficult to detect.

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December, before OpenAI’s), Hugging Face’s solution adds an “agent” framework to an existing AI model, allowing it to perform multi-step tasks such as collecting information and building a report as it goes along, which it then presents to the user at the end.
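The following is a schematic of the kind of loop such agent frameworks add. It is a minimal sketch, not Hugging Face’s actual implementation (the released project builds on Hugging Face’s smolagents library), and the `llm` callable and both tools are hypothetical stand-ins:

```python
def web_search(query: str) -> str:
    return f"(search results for {query!r} would go here)"

def read_page(url: str) -> str:
    return f"(text content of {url} would go here)"

TOOLS = {"search": web_search, "read": read_page}

def research_agent(llm, question: str, max_steps: int = 10) -> str:
    notes = []
    for _ in range(max_steps):
        # Ask the model for its next action, given everything gathered so far.
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with 'TOOL <name> <argument>' or 'ANSWER <text>'."
        )
        if decision.startswith("ANSWER"):
            return decision[len("ANSWER"):].strip()
        _, tool, arg = decision.split(" ", 2)
        notes.append(TOOLS[tool](arg))  # Run the tool and keep the result.
    return "No answer found within the step budget."
```

Each iteration feeds the accumulated notes back to the model, which is what lets the agent chain searches, page reads, and intermediate conclusions into a final report.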

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

OpenAI hits back at DeepSeek with o3-mini reasoning model

Over the last week, OpenAI’s place atop the AI model hierarchy has been heavily challenged by the Chinese AI company DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind that the company will offer free to all users without a subscription.

OpenAI first teased o3-mini last month, and in today’s announcement, the company brags that the model “advances the boundaries of what small models can achieve.” Like September’s o1-mini before it, o3-mini has been optimized for STEM functions and shows “particular strength in science, math, and coding” despite lower operating costs and latency than o1-mini, OpenAI says.

Harder, better, faster, stronger

Users can choose from three “reasoning effort” options when using o3-mini, allowing them to fine-tune the balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy comparable to o1-mini in math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model in the same tests.
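For developers, the setting surfaces in the API as a reasoning_effort parameter. Here is a minimal sketch using OpenAI’s Python SDK (it assumes an OPENAI_API_KEY environment variable and paid API access to the model):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # one of "low", "medium", or "high"
    messages=[
        {"role": "user", "content": "Factor 15,485,863 or show it is prime."}
    ],
)
print(response.choices[0].message.content)
```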

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model in OpenAI’s tests. Credit: OpenAI

OpenAI says testers reported a 39 percent reduction in “major errors” when using o3-mini, compared to o1-mini, and preferred the o3-mini responses 56 percent of the time. That’s despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average—down from 10.16 seconds to 7.7 seconds.
