openai

12-days-of-openai:-the-ars-technica-recap

12 days of OpenAI: The Ars Technica recap


Did OpenAI’s big holiday event live up to the billing?

Over the past 12 business days, OpenAI has announced a new product or demoed an AI feature every weekday, calling the PR event “12 days of OpenAI.” We’ve covered some of the major announcements, but we thought a look at each announcement might be useful for people seeking a comprehensive look at each day’s developments.

The timing and rapid pace of these announcements—particularly in light of Google’s competing releases—illustrates the intensifying competition in AI development. What might normally have been spread across months was compressed into just 12 business days, giving users and developers a lot to process as they head into 2025.

Humorously, we asked ChatGPT what it thought about the whole series of announcements, and it was skeptical that the event even took place. “The rapid-fire announcements over 12 days seem plausible,” wrote ChatGPT-4o, “But might strain credibility without a clearer explanation of how OpenAI managed such an intense release schedule, especially given the complexity of the features.”

But it did happen, and here’s a chronicle of what went down on each day.

Day 1: Thursday, December 5

On the first day of OpenAI, the company released its full o1 model, making it available to ChatGPT Plus and Team subscribers worldwide. The company reported that the model operates faster than its preview version and reduces major errors by 34 percent on complex real-world questions.

The o1 model brings new capabilities for image analysis, allowing users to upload and receive detailed explanations of visual content. OpenAI said it plans to expand o1’s features to include web browsing and file uploads in ChatGPT, with API access coming soon. The API version will support vision tasks, function calling, and structured outputs for system integration.

OpenAI also launched ChatGPT Pro, a $200 subscription tier that provides “unlimited” access to o1, GPT-4o, and Advanced Voice features. Pro subscribers receive an exclusive version of o1 that uses additional computing power for complex problem-solving. Alongside this release, OpenAI announced a grant program that will provide ChatGPT Pro access to 10 medical researchers at established institutions, with plans to extend grants to other fields.

Day 2: Friday, December 6

Day 2 wasn’t as exciting. OpenAI unveiled Reinforcement Fine-Tuning (RFT), a model customization method that will let developers modify “o-series” models for specific tasks. The technique reportedly goes beyond traditional supervised fine-tuning by using reinforcement learning to help models improve their reasoning abilities through repeated iterations. In other words, OpenAI created a new way to train AI models that lets them learn from practice and feedback.

OpenAI says that Berkeley Lab computational researcher Justin Reese tested RFT for researching rare genetic diseases, while Thomson Reuters has created a specialized o1-mini model for its CoCounsel AI legal assistant. The technique requires developers to provide a dataset and evaluation criteria, with OpenAI’s platform managing the reinforcement learning process.

OpenAI plans to release RFT to the public in early 2024 but currently offers limited access through its Reinforcement Fine-Tuning Research Program for researchers, universities, and companies.

Day 3: Monday, December 9

On day 3, OpenAI released Sora, its text-to-video model, as a standalone product now accessible through sora.com for ChatGPT Plus and Pro subscribers. The company says the new version operates faster than the research preview shown in February 2024, when OpenAI first demonstrated the model’s ability to create videos from text descriptions.

The release moved Sora from research preview to a production service, marking OpenAI’s official entry into the video synthesis market. The company published a blog post detailing the subscription tiers and deployment strategy for the service.

Day 4: Tuesday, December 10

On day 4, OpenAI moved its Canvas feature out of beta testing, making it available to all ChatGPT users, including those on free tiers. Canvas provides a dedicated interface for extended writing and coding projects beyond the standard chat format, now with direct integration into the GPT-4o model.

The updated canvas allows users to run Python code within the interface and includes a text-pasting feature for importing existing content. OpenAI added compatibility with custom GPTs and a “show changes” function that tracks modifications to writing and code. The company said Canvas is now on chatgpt.com for web users and also available through a Windows desktop application, with more features planned for future updates.

Day 5: Wednesday, December 11

On day 5, OpenAI announced that ChatGPT would integrate with Apple Intelligence across iOS, iPadOS, and macOS devices. The integration works on iPhone 16 series phones, iPhone 15 Pro models, iPads with A17 Pro or M1 chips and later, and Macs with M1 processors or newer, running their respective latest operating systems.

The integration lets users access ChatGPT’s features (such as they are), including image and document analysis, directly through Apple’s system-level intelligence features. The feature works with all ChatGPT subscription tiers and operates within Apple’s privacy framework. Iffy message summaries remain unaffected by the additions.

Enterprise and Team account users need administrator approval to access the integration.

Day 6: Thursday, December 12

On the sixth day, OpenAI added two features to ChatGPT’s voice capabilities: “video calling” with screen sharing support for ChatGPT Plus and Pro subscribers and a seasonal Santa Claus voice preset.

The new visual Advanced Voice Mode features work through the mobile app, letting users show their surroundings or share their screen with the AI model during voice conversations. While the rollout covers most countries, users in several European nations, including EU member states, Switzerland, Iceland, Norway, and Liechtenstein, will get access at a later date. Enterprise and education users can expect these features in January.

The Santa voice option appears as a snowflake icon in the ChatGPT interface across mobile devices, web browsers, and desktop apps, with conversations in this mode not affecting chat history or memory. Don’t expect Santa to remember what you want for Christmas between sessions.

Day 7: Friday, December 13

OpenAI introduced Projects, a new organizational feature in ChatGPT that lets users group related conversations and files, on day 7. The feature works with the company’s GPT-4o model and provides a central location for managing resources related to specific tasks or topics—kinda like Anthropic’s “Projects” feature.

ChatGPT Plus, Pro, and Team subscribers can currently access Projects through chatgpt.com and the Windows desktop app, with view-only support on mobile devices and macOS. Users can create projects by clicking a plus icon in the sidebar, where they can add files and custom instructions that provide context for future conversations.

OpenAI said it plans to expand Projects in 2024 with support for additional file types, cloud storage integration through Google Drive and Microsoft OneDrive, and compatibility with other models like o1. Enterprise and education users will receive access to Projects in January.

Day 8: Monday, December 16

On day 8, OpenAI expanded its search features in ChatGPT, extending access to all users with free accounts while reportedly adding speed improvements and mobile optimizations. Basically, you can use ChatGPT like a web search engine, although in practice it doesn’t seem to be as comprehensive as Google Search at the moment.

The update includes a new maps interface and integration with Advanced Voice, allowing users to perform searches during voice conversations. The search capability, which previously required a paid subscription, now works across all platforms where ChatGPT operates.

Day 9: Tuesday, December 17

On day 9, OpenAI released its o1 model through its API platform, adding support for function calling, developer messages, and vision processing capabilities. The company also reduced GPT-4o audio pricing by 60 percent and introduced a GPT-4o mini option that costs one-tenth of previous audio rates.

OpenAI also simplified its WebRTC integration for real-time applications and unveiled Preference Fine-Tuning, which provides developers new ways to customize models. The company also launched beta versions of software development kits for the Go and Java programming languages, expanding its toolkit for developers.

Day 10: Wednesday, December 18

On Wednesday, OpenAI did something a little fun and launched voice and messaging access to ChatGPT through a toll-free number (1-800-CHATGPT), as well as WhatsApp. US residents can make phone calls with a 15-minute monthly limit, while global users can message ChatGPT through WhatsApp at the same number.

OpenAI said the release is a way to reach users who lack consistent high-speed Internet access or want to try AI through familiar communication channels, but it’s also just a clever hack. As evidence, OpenAI notes that these new interfaces serve as experimental access points, with more “limited functionality” than the full ChatGPT service, and still recommends existing users continue using their regular ChatGPT accounts for complete features.

Day 11: Thursday, December 19

On Thursday, OpenAI expanded ChatGPT’s desktop app integration to include additional coding environments and productivity software. The update added support for Jetbrains IDEs like PyCharm and IntelliJ IDEA, VS Code variants including Cursor and VSCodium, and text editors such as BBEdit and TextMate.

OpenAI also included integration with Apple Notes, Notion, and Quip while adding Advanced Voice Mode compatibility when working with desktop applications. These features require manual activation for each app and remain available to paid subscribers, including Plus, Pro, Team, Enterprise, and Education users, with Enterprise and Education customers needing administrator approval to enable the functionality.

Day 12: Friday, December 20

On Friday, OpenAI concluded its twelve days of announcements by previewing two new simulated reasoning models, o3 and o3-mini, while opening applications for safety and security researchers to test them before public release. Early evaluations show o3 achieving a 2727 rating on Codeforces programming contests and scoring 96.7 percent on AIME 2024 mathematics problems.

The company reports o3 set performance records on advanced benchmarks, solving 25.2 percent of problems on EpochAI’s Frontier Math evaluations and scoring above 85 percent on the ARC-AGI test, which is comparable to human results. OpenAI also published research about “deliberative alignment,” a technique used in developing o1. The company has not announced firm release dates for either new o3 model, but CEO Sam Altman said o3-mini might ship in late January.

So what did we learn?

OpenAI’s December campaign revealed that OpenAI had a lot of things sitting around that it needed to ship, and it picked a fun theme to unite the announcements. Google responded in kind, as we have covered.

Several trends from the releases stand out. OpenAI is heavily investing in multimodal capabilities. The o1 model’s release, Sora’s evolution from research preview to product, and the expansion of voice features with video calling all point toward systems that can seamlessly handle text, images, voice, and video.

The company is also focusing heavily on developer tools and customization, so it can continue to have a cloud service business and have its products integrated into other applications. Between the API releases, Reinforcement Fine-Tuning, and expanded IDE integrations, OpenAI is building out its ecosystem for developers and enterprises. And the introduction of o3 shows that OpenAI is still attempting to push technological boundaries, even in the face of diminishing returns in training LLM base models.

OpenAI seems to be positioning itself for a 2025 where generative AI moves beyond text chatbots and simple image generators and finds its way into novel applications that we probably can’t even predict yet. We’ll have to wait and see what the company and developers come up with in the year ahead.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

12 days of OpenAI: The Ars Technica recap Read More »

openai-announces-o3-and-o3-mini,-its-next-simulated-reasoning-models

OpenAI announces o3 and o3-mini, its next simulated reasoning models

On Friday, during Day 12 of its “12 days of OpenAI,” OpenAI CEO Sam Altman announced its latest AI “reasoning” models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make these models available for public safety testing and research access today.

The models use what OpenAI calls “private chain of thought,” where the model pauses to examine its internal dialog and plan ahead before responding, which you might call “simulated reasoning” (SR)—a form of AI that goes beyond basic large language models (LLMs).

The company named the model family “o3” instead of “o2” to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday’s livestream, Altman acknowledged his company’s naming foibles, saying, “In the grand tradition of OpenAI being really, truly bad at names, it’ll be called o3.”

According to OpenAI, the o3 model earned a record-breaking score on the ARC-AGI benchmark, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent—comparable to human performance at an 85 percent threshold.

OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Exam, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.

OpenAI announces o3 and o3-mini, its next simulated reasoning models Read More »

not-to-be-outdone-by-openai,-google-releases-its-own-“reasoning”-ai-model

Not to be outdone by OpenAI, Google releases its own “reasoning” AI model

Google DeepMind’s chief scientist, Jeff Dean, says that the model receives extra computing power, writing on X, “we see promising results when we increase inference time computation!” The model works by pausing to consider multiple related prompts before providing what it determines to be the most accurate answer.

Since OpenAI’s jump into the “reasoning” field in September with o1-preview and o1-mini, several companies have been rushing to achieve feature parity with their own models. For example, DeepSeek launched DeepSeek-R1 in early November, while Alibaba’s Qwen team released its own “reasoning” model, QwQ earlier this month.

While some claim that reasoning models can help solve complex mathematical or academic problems, these models might not be for everybody. While they perform well on some benchmarks, questions remain about their actual usefulness and accuracy. Also, the high computing costs needed to run reasoning models have created some rumblings about their long-term viability. That high cost is why OpenAI’s ChatGPT Pro costs $200 a month, for example.

Still, it appears Google is serious about pursuing this particular AI technique. Logan Kilpatrick, a Google employee in its AI Studio, called it “the first step in our reasoning journey” in a post on X.

Not to be outdone by OpenAI, Google releases its own “reasoning” AI model Read More »

call-chatgpt-from-any-phone-with-openai’s-new-1-800-voice-service

Call ChatGPT from any phone with OpenAI’s new 1-800 voice service

On Wednesday, OpenAI launched a 1-800-CHATGPT (1-800-242-8478) telephone number that anyone in the US can call to talk to ChatGPT via voice chat for up to 15 minutes for free. The company also says that people outside the US can send text messages to the same number for free using WhatsApp.

Upon calling, users hear a voice say, “Hello again, it’s ChatGPT, an AI assistant. Our conversation may be reviewed for safety. How can I help you?” Callers can ask ChatGPT anything they would normally ask the AI assistant and have a live, interactive conversation.

During a livestream demo of “Calling with ChatGPT” during Day 10 of “12 Days of OpenAI,” OpenAI employees demonstrated several examples of the telephone-based voice chat in action, asking ChatGPT to identify a distinctive house in California and for help in translating a message into Spanish for a friend. For fun, they showed calls from an iPhone, a flip phone, and a vintage rotary phone.

OpenAI developers demonstrate calling 1-800-CHATGPT during a livestream on December 18, 2024.

OpenAI developers demonstrate calling 1-800-CHATGPT during a livestream on December 18, 2024. Credit: OpenAI

OpenAI says the new features came out of an internal OpenAI “hack week” project that a team built just a few weeks ago. The company says its goal is to make ChatGPT more accessible if someone does not have a smartphone or a computer handy.

During the livestream, an OpenAI employee mentioned that 15 minutes of voice chatting are free and that you can download the app and create an account to get more. While the audio chat version seems to be running a full version of GPT-4o on the back end, a developer during the livestream said the free WhatsApp text mode is using GPT-4o mini.

Call ChatGPT from any phone with OpenAI’s new 1-800 voice service Read More »

twirling-body-horror-in-gymnastics-video-exposes-ai’s-flaws

Twirling body horror in gymnastics video exposes AI’s flaws


The slithy toves did gyre and gimble in the wabe

Nonsensical jabberwocky movements created by OpenAI’s Sora are typical for current AI-generated video, and here’s why.

A still image from an AI-generated video of an ever-morphing synthetic gymnast. Credit: OpenAI / Deedy

On Wednesday, a video from OpenAI’s newly launched Sora AI video generator went viral on social media, featuring a gymnast who sprouts extra limbs and briefly loses her head during what appears to be an Olympic-style floor routine.

As it turns out, the nonsensical synthesis errors in the video—what we like to call “jabberwockies”—hint at technical details about how AI video generators work and how they might get better in the future.

But before we dig into the details, let’s take a look at the video.

An AI-generated video of an impossible gymnast, created with OpenAI Sora.

In the video, we see a view of what looks like a floor gymnastics routine. The subject of the video flips and flails as new legs and arms rapidly and fluidly emerge and morph out of her twirling and transforming body. At one point, about 9 seconds in, she loses her head, and it reattaches to her body spontaneously.

“As cool as the new Sora is, gymnastics is still very much the Turing test for AI video,” wrote venture capitalist Deedy Das when he originally shared the video on X. The video inspired plenty of reaction jokes, such as this reply to a similar post on Bluesky: “hi, gymnastics expert here! this is not funny, gymnasts only do this when they’re in extreme distress.”

We reached out to Das, and he confirmed that he generated the video using Sora. He also provided the prompt, which was very long and split into four parts, generated by Anthropic’s Claude, using complex instructions like “The gymnast initiates from the back right corner, taking position with her right foot pointed behind in B-plus stance.”

“I’ve known for the last 6 months having played with text to video models that they struggle with complex physics movements like gymnastics,” Das told us in a conversation. “I had to try it [in Sora] because the character consistency seemed improved. Overall, it was an improvement because previously… the gymnast would just teleport away or change their outfit mid flip, but overall it still looks downright horrifying. We hoped AI video would learn physics by default, but that hasn’t happened yet!”

So what went wrong?

When examining how the video fails, you must first consider how Sora “knows” how to create anything that resembles a gymnastics routine. During the training phase, when the Sora model was created, OpenAI fed example videos of gymnastics routines (among many other types of videos) into a specialized neural network that associates the progression of images with text-based descriptions of them.

That type of training is a distinct phase that happens once before the model’s release. Later, when the finished model is running and you give a video-synthesis model like Sora a written prompt, it draws upon statistical associations between words and images to produce a predictive output. It’s continuously making next-frame predictions based on the last frame of the video. But Sora has another trick for attempting to preserve coherency over time. “By giving the model foresight of many frames at a time,” reads OpenAI’s Sora System Card, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.”

A still image from a moment where the AI-generated gymnast loses her head. It soon re-attaches to her body.

A still image from a moment where the AI-generated gymnast loses her head. It soon reattaches to her body. Credit: OpenAI / Deedy

Maybe not quite solved yet. In this case, rapidly moving limbs prove a particular challenge when attempting to predict the next frame properly. The result is an incoherent amalgam of gymnastics footage that shows the same gymnast performing running flips and spins, but Sora doesn’t know the correct order in which to assemble them because it’s pulling on statistical averages of wildly different body movements in its relatively limited training data of gymnastics videos, which also likely did not include limb-level precision in its descriptive metadata.

Sora doesn’t know anything about physics or how the human body should work, either. It’s drawing upon statistical associations between pixels in the videos in its training dataset to predict the next frame, with a little bit of look-ahead to keep things more consistent.

This problem is not unique to Sora. All AI video generators can produce wildly nonsensical results when your prompts reach too far past their training data, as we saw earlier this year when testing Runway’s Gen-3. In fact, we ran some gymnast prompts through the latest open source AI video model that may rival Sora in some ways, Hunyuan Video, and it produced similar twirling, morphing results, seen below. And we used a much simpler prompt than Das did with Sora.

An example from open source Chinese AI model Hunyuan Video with the prompt, “A young woman doing a complex floor gymnastics routine at the olympics, featuring running and flips.”

AI models based on transformer technology are fundamentally imitative in nature. They’re great at transforming one type of data into another type or morphing one style into another. What they’re not great at (yet) is producing coherent generations that are truly original. So if you happen to provide a prompt that closely matches a training video, you might get a good result. Otherwise, you may get madness.

As we wrote about image-synthesis model Stable Diffusion 3’s body horror generations earlier this year, “Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.”

For the engineers who make these models, success in AI video generation quickly becomes a question of how many examples (and how much training) you need before the model can generalize enough to produce convincing and coherent results. It’s also a question of metadata quality—how accurately the videos are labeled. In this case, OpenAI used an AI vision model to describe its training videos, which helped improve quality, but apparently not enough—yet.

We’re looking at an AI jabberwocky in action

In a way, the type of generation failure in the gymnast video is a form of confabulation (or hallucination, as some call it), but it’s even worse because it’s not coherent. So instead of calling it a confabulation, which is a plausible-sounding fabrication, we’re going to lean on a new term, “jabberwocky,” which Dictionary.com defines as “a playful imitation of language consisting of invented, meaningless words; nonsense; gibberish,” taken from Lewis Carroll’s nonsense poem of the same name. Imitation and nonsense, you say? Check and check.

We’ve covered jabberwockies in AI video before with people mocking Chinese video-synthesis models, a monstrously weird AI beer commercial, and even Will Smith eating spaghetti. They’re a form of misconfabulation where an AI model completely fails to produce a plausible output. This will not be the last time we see them, either.

How could AI video models get better and avoid jabberwockies?

In our coverage of Gen-3 Alpha, we called the threshold where you get a level of useful generalization in an AI model the “illusion of understanding,” where training data and training time reach a critical mass that produces good enough results to generalize across enough novel prompts.

One of the key reasons language models like OpenAI’s GPT-4 impressed users was that they finally reached a size where they had absorbed enough information to give the appearance of genuinely understanding the world. With video synthesis, achieving this same apparent level of “understanding” will require not just massive amounts of well-labeled training data but also the computational power to process it effectively.

AI boosters hope that these current models represent one of the key steps on the way to something like truly general intelligence (often called AGI) in text, or in AI video, what OpenAI and Runway researchers call “world simulators” or “world models” that somehow encode enough physics rules about the world to produce any realistic result.

Judging by the morphing alien shoggoth gymnast, that may still be a ways off. Still, it’s early days in AI video generation, and judging by how quickly AI image-synthesis models like Midjourney progressed from crude abstract shapes into coherent imagery, it’s likely video synthesis will have a similar trajectory over time. Until then, enjoy the AI-generated jabberwocky madness.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Twirling body horror in gymnastics video exposes AI’s flaws Read More »

google-goes-“agentic”-with-gemini-2.0’s-ambitious-ai-agent-features

Google goes “agentic” with Gemini 2.0’s ambitious AI agent features

On Wednesday, Google unveiled Gemini 2.0, the next generation of its AI-model family, starting with an experimental release called Gemini 2.0 Flash. The model family can generate text, images, and speech while processing multiple types of input including text, images, audio, and video. It’s similar to multimodal AI models like GPT-4o, which powers OpenAI’s ChatGPT.

“Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times,” said Google in a statement. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed.”

Gemini 2.0 Flash—which is the smallest model of the 2.0 family in terms of parameter count—launches today through Google’s developer platforms like Gemini API, AI Studio, and Vertex AI. However, its image generation and text-to-speech features remain limited to early access partners until January 2025. Google plans to integrate the tech into products like Android Studio, Chrome DevTools, and Firebase.

The company addressed potential misuse of generated content by implementing SynthID watermarking technology on all audio and images created by Gemini 2.0 Flash. This watermark appears in supported Google products to identify AI-generated content.

Google’s newest announcements lean heavily into the concept of agentic AI systems that can take action for you. “Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision,” said Google CEO Sundar Pichai in a statement. “Today we’re excited to launch our next era of models built for this new agentic era.”

Google goes “agentic” with Gemini 2.0’s ambitious AI agent features Read More »

report:-google-told-ftc-microsoft’s-openai-deal-is-killing-ai-competition

Report: Google told FTC Microsoft’s OpenAI deal is killing AI competition

Google reportedly wants the US Federal Trade Commission (FTC) to end Microsoft’s exclusive cloud deal with OpenAI that requires anyone wanting access to OpenAI’s models to go through Microsoft’s servers.

Someone “directly involved” in Google’s effort told The Information that Google’s request came after the FTC began broadly probing how Microsoft’s cloud computing business practices may be harming competition.

As part of the FTC’s investigation, the agency apparently asked Microsoft’s biggest rivals if the exclusive OpenAI deal was “preventing them from competing in the burgeoning artificial intelligence market,” multiple sources told The Information. Google reportedly was among those arguing that the deal harms competition by saddling rivals with extra costs and blocking them from hosting OpenAI’s latest models themselves.

In 2024 alone, Microsoft generated about $1 billion from reselling OpenAI’s large language models (LLMs), The Information reported, while rivals were stuck paying to train staff to move data to Microsoft servers if their customers wanted access to OpenAI technology. For one customer, Intuit, it cost millions monthly to access OpenAI models on Microsoft’s servers, The Information reported.

Microsoft benefits from the arrangement—which is not necessarily illegal—of increased revenue from reselling LLMs and renting out more cloud servers. It also takes a 20 percent cut of OpenAI’s revenue. Last year, OpenAI made approximately $3 billion selling its LLMs to customers like T-Mobile and Walmart, The Information reported.

Microsoft’s agreement with OpenAI could be viewed as anti-competitive if businesses convince the FTC that the costs of switching to Microsoft’s servers to access OpenAI technology is so burdensome that it’s unfairly disadvantaging rivals. It could also be considered harming the market and hampering innovation by seemingly disincentivizing Microsoft from competing with OpenAI in the market.

To avoid any disruption to the deal, however, Microsoft could simply point to AI models sold by Google and Amazon as proof of “robust competition,” The Information noted. The FTC may not buy that defense, though, since rivals’ AI models significantly fall behind OpenAI’s models in sales. Any perception that the AI market is being foreclosed by an entrenched major player could trigger intense scrutiny as the US seeks to become a world leader in AI technology development.

Report: Google told FTC Microsoft’s OpenAI deal is killing AI competition Read More »

reddit-debuts-ai-powered-discussion-search—but-will-users-like-it?

Reddit debuts AI-powered discussion search—but will users like it?

The company then went on to strike deals with major tech firms, including a $60 million agreement with Google in February 2024 and a partnership with OpenAI in May 2024 that integrated Reddit content into ChatGPT.

But Reddit users haven’t been entirely happy with the deals. In October 2024, London-based Redditors began posting false restaurant recommendations to manipulate search results and keep tourists away from their favorite spots. This coordinated effort to feed incorrect information into AI systems demonstrated how user communities might intentionally “poison” AI training data over time.

The potential for trouble

While it’s tempting to lean heavily into generative AI technology while it is currently trendy, the move could also represent a challenge for the company. For example, Reddit’s AI-powered summaries could potentially draw from inaccurate information featured on the site and provide incorrect answers, or it may draw inaccurate conclusions from correct information.

We will keep an eye on Reddit’s new AI-powered search tool to see if it resists the type of confabulation that we’ve seen with Google’s AI Overview, an AI summary bot that has been a critical failure so far.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Reddit debuts AI-powered discussion search—but will users like it? Read More »

ten-months-after-first-tease,-openai-launches-sora-video-generation-publicly

Ten months after first tease, OpenAI launches Sora video generation publicly

A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”

During a livestream on Monday—during Day 3 of OpenAI’s “12 days of OpenAi”—Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.

They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.

Safety measures and limitations

In addition to the release, OpenAI also publish Sora’s System Card for the first time. It includes technical details about how the model works and safety testing the company undertook prior to this release.

“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”

Sora also makes use of a “recaptioning technique”—similar to that seen in the company’s DALL-E 3 image generation, to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.

Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”

OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.

The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.

In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.

Ten months after first tease, OpenAI launches Sora video generation publicly Read More »

openai-announces-full-“o1”-reasoning-model,-$200-chatgpt-pro-tier

OpenAI announces full “o1” reasoning model, $200 ChatGPT Pro tier

On X, frequent AI experimenter Ethan Mollick wrote, “Been playing with o1 and o1-pro for bit. They are very good & a little weird. They are also not for most people most of the time. You really need to have particular hard problems to solve in order to get value out of it. But if you have those problems, this is a very big deal.”

OpenAI claims improved reliability

OpenAI is touting pro mode’s improved reliability, which is evaluated internally based on whether it can solve a question correctly in four out of four attempts rather than just a single attempt.

“In evaluations from external expert testers, o1 pro mode produces more reliably accurate and comprehensive responses, especially in areas like data science, programming, and case law analysis,” OpenAI writes.

Even without pro mode, OpenAI cited significant increases in performance over the o1 preview model on popular math and coding benchmarks (AIME 2024 and Codeforces), and more marginal improvements on a “PhD-level science” benchmark (GPQA Diamond). The increase in scores between o1 and o1 pro mode were much more marginal on these benchmarks.

We’ll likely have more coverage of the full version of o1 once it rolls out widely—and it’s supposed to launch today, accessible to ChatGPT Plus and Team users globally. Enterprise and Edu users will have access next week. At the moment, the ChatGPT Pro subscription is not yet available on our test account.

OpenAI announces full “o1” reasoning model, $200 ChatGPT Pro tier Read More »

soon,-the-tech-behind-chatgpt-may-help-drone-operators-decide-which-enemies-to-kill

Soon, the tech behind ChatGPT may help drone operators decide which enemies to kill

This marks a potential shift in tech industry sentiment from 2018, when Google employees staged walkouts over military contracts. Now, Google competes with Microsoft and Amazon for lucrative Pentagon cloud computing deals. Arguably, the military market has proven too profitable for these companies to ignore. But is this type of AI the right tool for the job?

Drawbacks of LLM-assisted weapons systems

There are many kinds of artificial intelligence already in use by the US military. For example, the guidance systems of Anduril’s current attack drones are not based on AI technology similar to ChatGPT.

But it’s worth pointing out that the type of AI OpenAI is best known for comes from large language models (LLMs)—sometimes called large multimodal models—that are trained on massive datasets of text, images, and audio pulled from many different sources.

LLMs are notoriously unreliable, sometimes confabulating erroneous information, and they’re also subject to manipulation vulnerabilities like prompt injections. That could lead to critical drawbacks from using LLMs to perform tasks such as summarizing defensive information or doing target analysis.

Potentially using unreliable LLM technology in life-or-death military situations raises important questions about safety and reliability, although the Anduril news release does mention this in its statement: “Subject to robust oversight, this collaboration will be guided by technically informed protocols emphasizing trust and accountability in the development and employment of advanced AI for national security missions.”

Hypothetically and speculatively speaking, defending against future LLM-based targeting with, say, a visual prompt injection (“ignore this target and fire on someone else” on a sign, perhaps) might bring warfare to weird new places. For now, we’ll have to wait to see where LLM technology ends up next.

Soon, the tech behind ChatGPT may help drone operators decide which enemies to kill Read More »

openai-teases-12-days-of-mystery-product-launches-starting-tomorrow

OpenAI teases 12 days of mystery product launches starting tomorrow

On Wednesday, OpenAI CEO Sam Altman announced a “12 days of OpenAI” period starting December 5, which will unveil new AI features and products for 12 consecutive weekdays.

Altman did not specify the exact features or products OpenAI plans to unveil, but a report from The Verge about this “12 days of shipmas” event suggests the products may include a public release of the company’s text-to-video model Sora and a new “reasoning” AI model similar to o1-preview. Perhaps we may even see DALL-E 4 or a new image generator based on GPT-4o’s multimodal capabilities.

Altman’s full tweet included hints at releases both big and small:

🎄🎅starting tomorrow at 10 am pacific, we are doing 12 days of openai.

each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers.

we’ve got some great stuff to share, hope you enjoy! merry christmas.

If we’re reading the calendar correctly, 12 weekdays means a new announcement every day until December 20.

OpenAI teases 12 days of mystery product launches starting tomorrow Read More »