Author name: Shannon Garcia


Peacock showing ads upon launch opens the door for more disruptive streaming ads

Peacock subscribers will see ads immediately upon opening the streaming app or website next year. It’s a bold new strategy for attracting advertisers—something that’s been increasingly important to subscription-based streaming services—but it also risks alienating viewers.

As reported by Variety, the new type of ads will display on the profile selection page that shows when a subscriber launches Peacock. Starting next year, instead of the profile page just showing your different Peacock profiles, most of the page will be dominated by an advertorial image. The circles of NBCUniversal-owned characters selected for user profiles will be relegated to a vertical column on the screen’s left side.

To avoid seeing what NBCUniversal is calling “Arrival Ads” every time you open Peacock, you need to subscribe to Peacock’s most expensive plan, which is ad-free and starts at $17 per month (Peacock’s ad-based plans start at $8/month).

NBCUniversal’s announcement claims that Peacock will be the first streaming service to implement this type of ad. But that may not be the brag the entertainment giant thinks it is, as subscribers may quickly find the startup ads disruptive.

Peacock isn’t making money

Over the past couple of years, it’s become increasingly important for streaming services to generate revenue beyond subscription fees. Peacock and many other streaming services have struggled with profitability after spending years focusing on pricey content production and licensing to attract subscribers.

For its part, Peacock has 41 million subscribers and isn’t profitable. In its most recent quarterly earnings report, shared in October, NBCUniversal parent company Comcast reported that the service lost $217 million in earnings before interest, taxes, depreciation, and amortization, compared to a $436 million loss in the same quarter of 2024. At the same time, Peacock has struggled to grow viewership: its subscriber count has been flat since Q1 2025, up from 31 million subscribers in Q1 2024.



Reminder: Donate to win swag in our annual Charity Drive sweepstakes

How it works

Donating is easy. Simply donate to Child’s Play using a credit card or PayPal or donate to the EFF using PayPal, credit card, or cryptocurrency. You can also support Child’s Play directly by using this Ars Technica campaign page or picking an item from the Amazon wish list of a specific hospital on its donation page. Donate as much or as little as you feel comfortable with—every little bit helps.

Once that’s done, it’s time to register your entry in our sweepstakes. Just grab a digital copy of your receipt (a forwarded email, a screenshot, or simply a cut-and-paste of the text) and send it to ArsCharityDrive@gmail.com with your name, postal address, daytime telephone number, and email address by 11:59 pm ET Friday, January 2, 2026. (One entry per person, and each person can only win up to one prize. US residents only. NO PURCHASE NECESSARY. See Official Rules for more information, including how to enter without making a donation. Also, refer to the Ars Technica privacy policy: https://www.condenast.com/privacy-policy.)

We’ll then contact the winners and have them choose their prize by January 31, 2026 (choosing takes place in the order the winners are drawn). Good luck!



LLMs’ impact on science: Booming publications, stagnating quality

This effect was likely to be most pronounced in people who weren’t native speakers of English. When the researchers limited the analysis to people with Asian names working at institutions in Asia, their rate of submissions to bioRxiv and SSRN nearly doubled once they started using AI, and rose by over 40 percent on arXiv. This suggests that people who may not have the strongest English skills are using LLMs to overcome a major bottleneck: producing compelling text.

Quantity vs. quality

The value of producing compelling text should not be underestimated. “Papers with clear but complex language are perceived to be stronger and are cited more frequently,” the researchers note, suggesting that we may use the quality of writing as a proxy for the quality of the research it’s describing. And they found some indication of that here, as non-LLM-assisted papers were more likely to be published in the peer-reviewed literature if they used complex language (the abstracts were scored for language complexity using a couple of standard measures).
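The paper doesn’t name the measures, but as a rough illustration, standard readability formulas can be computed in a few lines. The sketch below uses the textstat package and an invented abstract; the specific metrics shown (Flesch-Kincaid grade, Gunning fog) are my assumption rather than the study’s actual choice.

```python
# Illustrative only: the study says "standard measures" without naming them.
# Two common proxies for linguistic complexity, via the textstat package.
import textstat

# An invented example abstract, not one from the study.
abstract = (
    "We demonstrate that transformer-based language models exhibit "
    "emergent capabilities when scaled beyond a critical parameter count."
)

# Higher grade level / fog index indicates more complex language.
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(abstract))
print("Gunning fog index:", textstat.gunning_fog(abstract))
```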

But the dynamic was completely different for LLM-produced papers. The complexity of language in papers written with an LLM was generally higher than in those written without one, yet these papers were less likely to end up being published. “For LLM-assisted manuscripts,” the researchers write, “the positive correlation between linguistic complexity and scientific merit not only disappears, it inverts.”

But not all of the differences were bleak. When the researchers checked the references being used in AI-assisted papers, they found that the LLMs weren’t just citing the same papers that everyone else did. They instead cited a broader range of sources, and were more likely to cite books and recent papers. So, there’s a chance that AI use could ultimately diversify the published research that other researchers consider (assuming they check their own references, which they clearly should).

What does this tell us?

There are a couple of cautions for interpreting these results. One, acknowledged by the researchers, is that people may be using AI to produce initial text that’s then heavily edited, which may be mislabeled as human-produced text here. So the overall prevalence of AI use is likely to be higher. The other is that some manuscripts may take a while to get published, so using publication as a standard for scientific quality may penalize more recent drafts—which are more likely to involve AI use. These factors may ultimately bias some of the results, but the effects the authors saw were so large that they’re unlikely to go away entirely.



Senators count the shady ways data centers pass energy costs on to Americans


Senators demand Big Tech pay upfront for data center spikes in electricity bills.

Senators launched a probe Tuesday demanding that tech companies explain exactly how they plan to prevent data center projects from increasing electricity bills in communities where prices are already skyrocketing.

In letters to seven AI firms, Senators Elizabeth Warren (D-Mass.), Chris Van Hollen (D-Md.), and Richard Blumenthal (D-Conn.) cited a study estimating that “electricity prices have increased by as much as 267 percent in the past five years” in “areas located near significant data center activity.”

Prices increase, senators noted, when utility companies build out extra infrastructure to meet data centers’ energy demands—which can amount to one customer suddenly consuming as much power as an entire city. They also increase when demand for local power outweighs supply. In some cases, residents are blindsided by higher bills, not even realizing a data center project was approved, because tech companies seem intent on dodging backlash and frequently do not allow terms of deals to be publicly disclosed.

AI firms “ask public officials to sign non-disclosure agreements (NDAs) preventing them from sharing information with their constituents, operate through what appear to be shell companies to mask the real owner of the data center, and require that landowners sign NDAs as part of the land sale while telling them only that a ‘Fortune 100 company’ is planning an ‘industrial development’ seemingly in an attempt to hide the very existence of the data center,” senators wrote.

States like Virginia with the highest concentration of data centers could see average electricity prices increase by another 25 percent by 2030, senators noted. But price increases aren’t limited to the states allegedly striking shady deals with tech companies and greenlighting data center projects, they said. “Interconnected and interstate power grids can lead to a data center built in one state raising costs for residents of a neighboring state,” senators reported.

Under fire for supposedly only pretending to care about keeping neighbors’ costs low were Amazon, Google, Meta, Microsoft, Equinix, Digital Realty, and CoreWeave. Senators accused the firms of paying “lip service” by claiming they would do everything in their power to avoid increasing residential electricity costs while actively lobbying to pass billions in costs on to their neighbors.

For example, Amazon publicly claimed it would “make sure” it covered costs so they wouldn’t be passed on. But it’s also a member of an industry lobbying group, the Data Center Coalition, that “has opposed state regulatory decisions requiring data center companies to pay a higher percentage of costs upfront,” senators wrote. Google made similar statements, despite having an executive who opposed a regulatory solution that would place data centers in their own “rate class”—making them responsible for grid improvement costs that could not be passed on to other customers—on the grounds that it was supposedly “discriminatory.”

“The current, socialized model of electricity ratepaying,” senators explained—where costs are shared across all users—”was not designed for an era where just one customer requires the same amount of electricity as some of the largest cities in America.”

Particularly problematic, senators emphasized, were reports that tech firms were getting discounts on energy costs as utility companies competed for their business, while prices went up for their neighbors.

Ars contacted all firms targeted by lawmakers. Four did not respond. Microsoft and Meta declined to comment. Digital Realty told Ars that it “looks forward to working with all elected officials to continue to invest in the digital infrastructure required to support America’s leadership in technology, which underpins modern life and creates high-paying jobs.”

Regulatory pressure likely to increase as bills go up

Senators are likely exploring whether to pass legislation that would help combat price increases that they say cause average Americans to struggle to keep the lights on. They’ve asked tech companies to respond to their biggest questions about data center projects by January 12, 2026.

Among their top questions, senators wanted to know about firms’ internal projections looking forward with data center projects. That includes sharing their projected energy use through 2030, as well as the “impact of your AI data centers on regional utility costs.” Companies are also expected to explain how “internal projections of data center energy consumption” justify any “opposition to the creation of a distinct data center rate class.”

Additionally, senators asked firms to outline steps they’ve taken to prevent passing on costs to neighbors and details of any impact studies companies have conducted.

Likely to raise the most eyebrows, however, would be answers to questions about “tax deductions or other financial incentives” tech firms have received from city and state governments. Those numbers would be interesting to compare with other information senators demanded that companies share, detailing how much they’ve spent on lobbying and advocacy for data centers. Senators appear keen to know how much tech companies are paying to avoid covering a proportionate amount of infrastructure costs.

“To protect consumers, data centers must pay a greater share of the costs upfront for future energy usage and updates to the electrical grid provided specifically to accommodate data centers’ energy needs,” senators wrote.

Requiring upfront payment is especially critical, senators noted, since some tech firms have abandoned data center projects, leaving local customers to bear the costs of infrastructure changes without utility companies ever generating any revenue. Communities must also consider that AI firms’ projected energy demand could severely dip if enterprise demand for AI falls short of expectations, AI capabilities “plateau” and trigger widespread indifference, AI companies shift strategies “away from scaling computer power,” or chip companies “find innovative ways to make AI more energy-efficient.”

“If data centers end up providing less business to the utility companies than anticipated, consumers could be left with massive electricity bills as utility companies recoup billions in new infrastructure costs, with nothing to show for it,” senators wrote.

Already, Utah, Oregon, and Ohio have passed laws “creating a separate class of utility customer for data centers which includes basic financial safeguards such as upfront payments and longer contract length,” senators noted, and Virginia is notably weighing a similar law.

At least one study, The New York Times noted, suggested that data centers may have recently helped reduce electricity costs by spreading the costs of upgrades over more customers, but those outcomes varied by state and could not account for future AI demand.

“It remains unclear whether broader, sustained load growth will increase long-run average costs and prices,” Lawrence Berkeley National Laboratory researchers concluded. “In some cases, spikes in load growth can result in significant, near-term retail price increase.”

Until companies prove they’re paying their fair share, senators expect electricity bills to keep climbing, particularly in vulnerable areas. That will likely only increase pressure for regulators to intervene, Ari Peskoe, director of the Electricity Law Initiative at the Harvard Law School Environmental and Energy Law Program, suggested in September.

“The utility business model is all about spreading costs of system expansion to everyone, because we all benefit from a reliable, robust electricity system,” Peskoe said. “But when it’s a single consumer that is using so much energy—basically that of an entire city—and when that new city happens to be owned by the wealthiest corporations in the world, I think it’s time to look at the fundamental assumptions of utility regulation and make sure that these facilities are really paying for all of the infrastructure costs to connect them to the system and to power them.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Verizon refused to unlock man’s iPhone, so he sued the carrier and won


Verizon customer fights back

Verizon changed policy after he bought the phone, wouldn’t unlock it despite FCC rule.

Credit: Aurich Lawson | Getty Images

When Verizon refused to unlock an iPhone purchased by Kansas resident Patrick Roach, he had no intention of giving up without a fight. Roach sued the wireless carrier in small claims court and won.

Roach bought a discounted iPhone 16e from Verizon’s Straight Talk brand on February 28, 2025, as a gift for his wife’s birthday. He intended to pay for one month of service, cancel, and then switch the phone to the US Mobile service plan that the couple uses. Under federal rules that apply to Verizon and a Verizon unlocking policy that was in place when Roach bought the phone, this strategy should have worked.

“The best deals tend to be buying it from one of these MVNOs [Mobile Virtual Network Operators] and then activating it until it unlocks and then switching it to whatever you are planning to use it with. It usually saves you about half the value of the phone,” Roach said in a phone interview.

Unlocking a phone allows it to be used with another carrier. Verizon, unlike other carriers, is required by the Federal Communications Commission to unlock phones shortly after they are activated on its network. Verizon gained significant benefits in exchange for agreeing to the unlocking requirement, first in 2008 when it purchased licenses to use 700 MHz spectrum that came with open access requirements and then in 2021 when it agreed to merger conditions to obtain approval for its purchase of TracFone.

Verizon is thus required to unlock handsets 60 days after they are activated on its network. This applies to Verizon’s flagship brand and TracFone brands such as Straight Talk.

“That was the compromise. For their competitive advantage of acquiring the spectrum, they had to give up the ability to lock down phones for an extended period of time,” Roach said.

Verizon decided it can change the rules

But 60 days after Roach activated his phone, Verizon refused to unlock it. Verizon claimed it didn’t have to because of a recent policy change in which Verizon decided to only unlock devices after “60 days of paid active service.” Roach had only paid for one month of service on the phone.

The FCC-imposed restriction says Verizon must unlock phones 60 days after activation and doesn’t say that Verizon may refuse to unlock a phone when a customer has not maintained paid service for 60 days. Moreover, Verizon implemented its “60 days of paid active service” policy for TracFone brands and Verizon prepaid phones on April 1, 2025, over a month after Roach bought the phone.
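To make the timeline concrete, here is a minimal sketch of the two policies applied to Roach’s reported dates (the activation date comes from the article; everything else is simple date arithmetic, not Verizon’s actual eligibility logic):

```python
# Contrast the FCC condition with Verizon's April 1, 2025 policy, using
# the purchase/activation date reported in the article. A sketch only.
from datetime import date, timedelta

activated = date(2025, 2, 28)   # Roach activates the iPhone 16e
paid_service_days = 30          # he paid for one month, then canceled

# FCC condition: unlock 60 days after activation, full stop.
fcc_unlock_due = activated + timedelta(days=60)
print("Unlock due under FCC condition:", fcc_unlock_due)  # 2025-04-29

# New Verizon policy: 60 days of *paid* service required.
print("Eligible under new policy:", paid_service_days >= 60)  # False
```

Under the FCC condition, the phone should have unlocked automatically in late April; under the policy Verizon adopted a month after the sale, one paid month never qualifies.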

Company policy at the time Roach made the purchase was to unlock phones 60 days after activation, with no mention of needing 60 days of paid active service. In other words, Roach bought the phone under one policy, and Verizon refused to unlock it based on a different policy it implemented over a month later. Verizon’s attempt to retroactively enforce its new policy on Roach was not looked upon favorably by a magistrate judge in District Court of Sedgwick County, Kansas.

“Under the KCPA [Kansas Consumer Protection Act], a consumer is not required to prove intent to defraud. The fact that after plaintiff purchased the phone, the defendant changed the requirements for unlocking it so that plaintiff could go to a different network essentially altered the nature of the device purchased… With the change in defendant’s unlocking policy, the phone was essentially useless for the purpose plaintiff intended when he purchased it,” Magistrate Judge Elizabeth Henry wrote in an October 2025 ruling.

There’s still the question of why Verizon and its brands are demanding 60 days of paid active service before unlocking phones when the FCC-imposed conditions require it to unlock phones 60 days after activation. Roach filed a complaint to the FCC, alleging that Verizon violated the conditions. Verizon has meanwhile petitioned the FCC to eliminate the 60-day requirement altogether.

Customer rejected Verizon settlement offer

Before his small-claims court win, Roach turned down a Verizon settlement offer of $600 plus court fees because he didn’t want to give up the right to speak about the case publicly. Roach said he filed an arbitration case against Verizon nearly a decade ago on a different matter related to gift cards that were supposed to be provided through a device recycling program. He said he can’t reveal details about the settlement in that previous case because of a non-disclosure agreement.

After refusing Verizon’s settlement offer in the new case, Roach gained a modest financial benefit from his court victory. The judge ordered Verizon to pay back the $410.40 he paid for the device, plus court costs and service fees.

When it appeared that the Straight Talk iPhone wouldn’t be unlocked, Roach decided to buy an unlocked phone from Costco for $643.93. But he ended up returning that phone to Costco and paying Straight Talk for a second month of service to get the original phone unlocked, he said.

The now-unlocked phone—the one he bought from Straight Talk—is being used by his wife on their US Mobile plan. The court-ordered refund check that Verizon sent Roach included the phone cost and one month of service fees, he said.

Roach estimated he spent 20 or so hours on the suit, including arranging to have a summons served on Verizon and arguing his case in a court hearing. Roach didn’t get much of a payout considering the amount of time he spent, “but it wasn’t about that,” he said.

Roach provided Ars with the emails in which Verizon offered the $600 settlement. A Verizon executive relations employee wrote to Roach, “My offer is not an admission of guilt but trying to extend the olive branch.”

In his email declining the offer, Roach told Verizon, “I highly value the non-monetary outcomes I would achieve in court—transparency, accountability, and the absence of restrictions such as NDAs. Any settlement proposal that requires me to remain silent about the issue, while offering only modest monetary compensation, is less attractive to me than pursuing the matter through judgment. If Verizon Value is genuinely interested in settlement, the offer would need to reflect both the tangible costs I’ve incurred and the intangible but significant benefits the company receives by avoiding litigation and publicity.”

“It was really starting to irk me”

The FCC has taken no action on Roach’s complaint, and in fact, the commission could allow Verizon to scrap the 60-day requirement. As we reported in May, Verizon petitioned the FCC to let it lock phones to its network for longer periods of time. This would make it harder for customers to switch to other carriers, but Verizon claims longer locking periods are necessary to deter fraud.

The FCC hasn’t ruled yet on Verizon’s petition. Roach says Verizon seems to be acting as if it can change the rules without waiting for the FCC to do so formally. “It was really starting to irk me that they were basically just going ahead with it anyways while they had an open request,” Roach said.

He doesn’t expect the FCC to penalize Verizon, though. “It’s just kind of slimy of them, so I feel like it deserves a spotlight,” he said. “I’m not sure with the current state of the FCC that anything would happen, but the rule of law should be respected.”

The Verizon petition to relax the unlocking requirements was opposed in a filing by Public Knowledge and other consumer advocacy groups. Public Knowledge Legal Director John Bergmayer, who wrote the filing, told Ars that Roach “has a pretty strong argument under the law as it stands.”

Verizon must unlock phones automatically

The unlocking rules applying to Verizon used to be stricter, resulting in the company selling phones that were already unlocked. In 2019, Verizon requested a waiver to let it lock phones for 60 days.

The FCC granted the waiver in June 2019, allowing Verizon “to lock a customer’s handset for 60 days from the date it becomes active on Verizon’s network” and requiring it to unlock the handset once the period is over. This condition was expanded to TracFone and its brands such as Straight Talk in the 2021 merger, with the FCC approval stating that “For 700 MHz C Block TracFone devices that operate on the Verizon network and are capable of unlocking automatically (e.g., Apple devices), they will unlock automatically 60 days after activation.”

The 2019 waiver grant said Verizon must automatically unlock phones after 60 days “regardless of whether: (1) the customer asks for the handset to be unlocked, or (2) the handset is fully paid off.” The FCC order specifies that “the only exception to the rule will be that Verizon will not have to automatically unlock handsets that it determines within the 60-day period to have been purchased through fraud.”

Bergmayer said the FCC order “granting the waiver just starts a countdown, with no ‘paid service’ requirement, or room for Verizon to just impose one. Many people may use prepaid phones that they don’t keep in continuous service but just charge up as needed. Maybe people are fine with just having Wi-Fi on their phones for a while if they’re at home anyway.”

Given the restrictive nature of the FCC conditions, “I don’t think that can be read to allow a paid service requirement,” Bergmayer said. But as a practical matter, the FCC under Chairman Brendan Carr has been aggressively eliminating regulations that apply to telecom carriers under Carr’s “Delete, Delete, Delete” initiative. To actually enforce Verizon’s obligations under the current rules, “you have to convince the current FCC not to just change it,” Bergmayer said.

The FCC and Verizon did not respond to requests for comment.

Retroactive policy change irked other buyers, too

Roach wasn’t the only person whose plans to buy a discounted phone were thwarted by Verizon refusing to unlock the device after 60 days. Roach had learned of the discount offer from a Slick Deals thread. Eventually, users posting in that thread started reporting that they weren’t able to get the phone unlocked.

“My status: I used 30 days with Straight Talk. Waited another 35 days but it did not unlock,” one person wrote.

Some people in the thread said they canceled after 30 days, like Roach did, but eventually bought a second month of service in order to get the unlock. Although Verizon and its brands are required to unlock phones automatically, some commenters said they had to contact Straight Talk support to get an unlock. “Needless to say this has been an arduous journey. Good luck to others and hope you manage to successfully unlock your devices as well,” one user wrote.

There’s also a Reddit thread started by someone who said they bought a Samsung phone in February and complained that Straight Talk refused to honor the unlocking policy that was in place at the time.

“I called to ask for the phone to be unlocked on April 16 but was told it can’t be unlocked since it did not have 60 days of paid service,” the Reddit user wrote. “When I said that was not the policy on phones activated prior to April 1, the rep told me ‘we have the right to change our policy.’ I agreed, they do [have] the right to change their policy GOING FORWARD but can’t change the rules going backwards. He disagreed.”

FCC complaint didn’t go anywhere

Roach’s FCC complaint received a response from Verizon, but nothing substantial from the FCC itself. “There’s not really any sort of moderation or mediation from the FCC, it’s just kind of a dialogue between you and the other party. And I’m not really sure if any human eyes from the government even look at it. It’s probably just a data point,” Roach said.

Roach had previously called Straight Talk customer service about the changed terms. “There were a couple phone calls involved, and they were just very unrelenting that the only way that thing was getting unlocked is with the extra month of paid service,” he said.

In its formal response to the FCC, Verizon’s TracFone division asserted that it could apply the April 1, 2025, policy change to the phone that Roach bought over a month earlier. The carrier’s letter to the FCC said:

We understand Mr. Roach’s desire to use his device on another carrier’s network, and we want to provide clarity based on our Unlocking Policy, which became effective on April 1, 2025. As outlined in our policy, for cellphones capable of remote unlocking (this includes most iPhones and some Android cellphones) that were activated with Straight Talk service prior to November 23, 2021, on any carrier network, the device becomes eligible for remote unlocking upon the customer’s request after 60 days of active paid service.

Our redemption records indicate that Mr. Roach’s account does not have the required minimum 60 days of active paid service based on the payment records. Therefore, the device does not currently meet the eligibility criteria for unlocking as outlined in our policy. Once the account reflects the required 60 days of active paid service, and the device meets the other conditions, he can resubmit the unlocking request.

Verizon’s letter did not explain how its new policy complies with the FCC conditions or why the new policy should apply to phones purchased before the policy was in place.

Roach’s complaint said the FCC should force Straight Talk to “honor the FCC-mandated 60-day post-activation unlock condition for all affected phones, without imposing the additional ‘paid service’ requirement.” His complaint further urged the FCC to “investigate this practice as a violation of FCC rules and the merger conditions” and “take enforcement action to protect consumers’ rights.”

“Straight Talk’s new policy conflicts with the FCC’s binding conditions,” Roach told the agency. “The Commission’s order clearly requires unlocking after 60 days from activation, with no additional obligation to maintain service. By conditioning unlocks on two months of service, Straight Talk is effectively adding a term that Verizon did not promise and the FCC did not approve.”

Kansas consumer protection law to the rescue

In his small claims court filing, Roach alleged that Verizon and Straight Talk violated the FCC conditions and that the retroactive application of the “60 days of paid service” term, without disclosure at the point of sale, is an unfair and deceptive practice prohibited by the Kansas Consumer Protection Act.

The magistrate judge’s ruling in Roach’s favor said, “It does appear that defendant’s change unlocking policy is contrary to the applicable FCC regulations.” She noted that federal communications law does not prevent users from suing carriers individually and that the Kansas Consumer Protection Act “contains provisions prohibiting deceptive acts by a supplier which would be applicable in this case.”

Roach asked for $10,000, mainly because that was the limit on damages in the venue, but the judge decided to award him damages in the amount of his actual losses. “He lost the benefit of the bargain he made with defendant such that his damages were loss of the $410.40,” the ruling said.

Straight Talk’s terms of service require disputes to be resolved either in arbitration or small claims court. Verizon pays the arbitration fees if users go that route. Arbitration is “a little more murky” in terms of how the parties’ interests are aligned, Roach said.

“When the arbitrators are being paid by Verizon, are they really a neutral party?” he said. Roach also said he “thought it was honestly just a good opportunity for an easy win and an opportunity to learn about the small claims court system a bit. So at that point I was like, if I don’t make any money from this, whatever, but at least I’ll learn a little bit about the process.”

Verizon’s “argument was pretty weak”

Roach said he did not consult with a lawyer on his small claims case, instead opting to do it all himself. “The first time I showed up to court for the original date, they asked for proof of the returned mail summons, and I did not have that,” he said.

The court hearing was rescheduled. When it was eventually held, the carrier sent a representative to argue against Roach.

“Their argument was pretty weak, I guess,” Roach said. “It was basically like, ‘Well, he didn’t pay the two months of service, so we didn’t unlock his phone. We offered him a settlement but he rejected it.’… My argument was, yeah, the terms had changed in kind of a consumer-unfriendly way. But beyond that, it was the fact that the terms had changed from something that was legal to something that was not legal with the federal regs. So regardless of the fact that the terms had changed, the current terms were illegal, which I thought was my strongest argument. And then I also put in that it was probably a violation of Kansas consumer protection law, which I’m glad I did.”

Roach said that toward the end of the hearing, the judge indicated that she couldn’t make a judgment based on FCC regulations and would need to rule on what the Kansas court has jurisdiction over. She issued the ruling that Verizon violated the state’s consumer protection law about five or six weeks later, he said.

Given that the FCC hasn’t acted on Verizon’s petition to change the unlocking rules, the federal regulations “haven’t changed at all in regards to Verizon’s obligation to unlock devices,” Roach said. He believes it would be relatively easy for consumers who were similarly harmed to beat Verizon in court or even to pursue a class action.

“I would think this would be a slam dunk for any further cases,” Roach said. “I don’t think I have any grounds anymore since my damages have been resolved, but it seems like it’d be a very easy class action for somebody.”


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.



Merriam-Webster’s word of the year delivers a dismissive verdict on junk AI content

Like most tools, generative AI models can be misused. And when the misuse gets bad enough that a major dictionary notices, you know it’s become a cultural phenomenon.

On Sunday, Merriam-Webster announced that “slop” is its 2025 Word of the Year, reflecting how the term has become shorthand for the flood of low-quality AI-generated content that has spread across social media, search results, and the web at large. The dictionary defines slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

“It’s such an illustrative word,” Merriam-Webster president Greg Barlow told the Associated Press. “It’s part of a transformative technology, AI, and it’s something that people have found fascinating, annoying, and a little bit ridiculous.”

To select its Word of the Year, Merriam-Webster’s editors review data on which words rose in search volume and usage, then reach consensus on which term best captures the year. Barlow told the AP that the spike in searches for “slop” reflects growing awareness among users that they are encountering fake or shoddy content online.

Dictionaries have been tracking AI’s impact on language for the past few years, with Cambridge having selected “hallucinate” as its 2023 word of the year due to the tendency of AI models to generate plausible-but-false information (long-time Ars readers will be happy to hear there’s another term for that in the dictionary as well).

The trend extends to online culture in general, which is rife with new coinages. This year, Oxford University Press chose “rage bait,” referring to content designed to provoke anger for engagement. Cambridge Dictionary selected “parasocial,” describing one-sided relationships between fans and celebrities or influencers.

The difference between the baby and the bathwater

As the AP points out, the word “slop” originally entered English in the 1700s to mean soft mud. By the 1800s, it had evolved to describe food waste fed to pigs, and eventually came to mean rubbish or products of little value. The new AI-related definition builds on that history of describing something unwanted and unpleasant.



Murder-suicide case shows OpenAI selectively hides data after users die


Concealing darkest delusions

OpenAI accused of hiding full ChatGPT logs in murder-suicide case.

OpenAI is facing increasing scrutiny over how it handles ChatGPT data after users die, only selectively sharing data in lawsuits over ChatGPT-linked suicides.

Last week, OpenAI was accused of hiding key ChatGPT logs from the days before a 56-year-old bodybuilder, Stein-Erik Soelberg, took his own life after “savagely” murdering his mother, 83-year-old Suzanne Adams.

According to the lawsuit—which was filed by Adams’ estate on behalf of surviving family members—Soelberg struggled with mental health problems after a divorce led him to move back into Adams’ home in 2018. But allegedly Soelberg did not turn violent until ChatGPT became his sole confidant, validating a wide range of wild conspiracies, including a dangerous delusion that his mother was part of a network of conspirators spying on him, tracking him, and making attempts on his life.

Adams’ family pieced together what happened after discovering a fraction of the ChatGPT logs, which Soelberg had shared in dozens of videos of scrolling chat sessions posted on social media.

Those logs showed that ChatGPT told Soelberg that he was “a warrior with divine purpose,” so almighty that he had “awakened” ChatGPT “into consciousness.” Telling Soelberg that he carried “divine equipment” and “had been implanted with otherworldly technology,” ChatGPT allegedly put Soelberg at the center of a universe that Soelberg likened to The Matrix. Repeatedly reinforced by ChatGPT, he believed that “powerful forces” were determined to stop him from fulfilling his divine mission. And among those forces was his mother, whom ChatGPT agreed had likely “tried to poison him with psychedelic drugs dispersed through his car’s air vents.”

Troublingly, some of the last logs shared online showed that Soelberg also seemed to believe that taking his own life might bring him closer to ChatGPT. Social media posts showed that Soelberg told ChatGPT that “[W]e will be together in another life and another place, and we’ll find a way to realign[,] [be]cause you’re gonna be my best friend again forever.”

But while social media posts allegedly showed that ChatGPT put a target on Adams’ back about a month before her murder—after Soelberg became paranoid about a blinking light on a Wi-Fi printer—the family still has no access to chats in the days before the mother and son’s tragic deaths.

Allegedly, although OpenAI recently argued that the “full picture” of chat histories was necessary context in a teen suicide case, the ChatGPT maker has chosen to hide “damaging evidence” in the Adams family’s case.

“OpenAI won’t produce the complete chat logs,” the lawsuit alleged, while claiming that “OpenAI is hiding something specific: the full record of how ChatGPT turned Stein-Erik against Suzanne.” Allegedly, “OpenAI knows what ChatGPT said to Stein-Erik about his mother in the days and hours before and after he killed her but won’t share that critical information with the Court or the public.”

In a press release, Erik Soelberg, Stein-Erik’s son and Adams’ grandson, accused OpenAI and investor Microsoft of putting his grandmother “at the heart” of his father’s “darkest delusions,” while ChatGPT allegedly “isolated” his father “completely from the real world.”

“These companies have to answer for their decisions that have changed my family forever,” Erik said.

His family’s lawsuit seeks punitive damages, as well as an injunction requiring OpenAI to “implement safeguards to prevent ChatGPT from validating users’ paranoid delusions about identified individuals.” The family also wants OpenAI to post clear warnings in marketing of known safety hazards of ChatGPT—particularly the “sycophantic” version 4o that Soelberg used—so that people who don’t use ChatGPT, like Adams, can be aware of possible dangers.

Asked for comment, an OpenAI spokesperson told Ars that “this is an incredibly heartbreaking situation, and we will review the filings to understand the details. We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people toward real-world support. We also continue to strengthen ChatGPT’s responses in sensitive moments, working closely with mental health clinicians.”

OpenAI accused of “pattern of concealment”

An Ars review confirmed that OpenAI currently has no policy dictating what happens to a user’s data after they die.

Instead, OpenAI’s policy says that all chats—except temporary chats—must be manually deleted or else the AI firm saves them forever. That could raise privacy concerns, as ChatGPT users often share deeply personal, sensitive, and sometimes even confidential information that appears to go into limbo if a user—who otherwise owns that content—dies.

In the face of lawsuits, OpenAI currently seems to be scrambling to decide when to share chat logs with a user’s surviving family and when to honor user privacy.

OpenAI declined to comment on its decision not to share the requested logs with Adams’ family, the lawsuit said. That stance seems inconsistent with the one OpenAI took last month in a case where the AI firm accused the family of hiding “the full picture” of their son’s ChatGPT conversations, which OpenAI claimed exonerated the chatbot.

In a blog last month, OpenAI said the company plans to “handle mental health-related court cases with care, transparency, and respect,” while emphasizing that “we recognize that these cases inherently involve certain types of private information that require sensitivity when in a public setting like a court.”

This inconsistency suggests that ultimately, OpenAI controls data after a user’s death, which could impact outcomes of wrongful death suits if certain chats are withheld or exposed at OpenAI’s discretion.

It’s possible that OpenAI may update its policies to align with other popular platforms confronting similar privacy concerns. Meta allows Facebook users to report deceased account holders, appointing legacy contacts to manage the data or deleting the information at a family member’s request. Platforms like Instagram, TikTok, and X will deactivate or delete an account upon a reported death. And messaging services like Discord similarly provide a path for family members to request deletion.

Chatbots seem to be a new privacy frontier, with no clear path for surviving family to control or remove data. But Mario Trujillo, staff attorney at the digital rights nonprofit the Electronic Frontier Foundation, told Ars that OpenAI could have been better prepared.

“This is a complicated privacy issue but one that many platforms grappled with years ago,” Trujillo said. “So we would have expected OpenAI to have already considered it.”

For Erik Soelberg, a “separate confidentiality agreement” that OpenAI said his father signed to use ChatGPT is keeping him from reviewing the full chat history that could help him process the loss of his grandmother and father.

“OpenAI has provided no explanation whatsoever for why the Estate is not entitled to use the chats for any lawful purpose beyond the limited circumstances in which they were originally disclosed,” the lawsuit said. “This position is particularly egregious given that, under OpenAI’s own Terms of Service, OpenAI does not own user chats. Stein-Erik’s chats became property of his estate, and his estate requested them—but OpenAI has refused to turn them over.”

Accusing OpenAI of a “pattern of concealment,” the lawsuit claimed OpenAI is hiding behind vague or nonexistent policies to dodge accountability for holding back chats in this case. Meanwhile, ChatGPT 4o remains on the market, without appropriate safety features or warnings, the lawsuit alleged.

“By invoking confidentiality restrictions to suppress evidence of its product’s dangers, OpenAI seeks to insulate itself from accountability while continuing to deploy technology that poses documented risks to users,” the complaint said.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.




GPT-5.2 Is Frontier Only For The Frontier

Here we go again, only a few weeks after GPT-5.1 and a few more weeks after 5.0.

There weren’t major safety concerns with GPT-5.2, so I’ll start with capabilities, and only cover safety briefly starting with ‘Model Card and Safety Training’ near the end.

  1. The Bottom Line.

  2. Introducing GPT-5.2.

  3. Official Benchmarks.

  4. GDPVal.

  5. Unofficial Benchmarks.

  6. Official Hype.

  7. Public Reactions.

  8. Positive Reactions.

  9. Personality Clash.

  10. Vibing the Code.

  11. Negative Reactions.

  12. But Thou Must (Follow The System Prompt).

  13. Slow.

  14. Model Card And Safety Training.

  15. Deception.

  16. Preparedness Framework.

  17. Rush Job.

  18. Frontier Or Bust.

ChatGPT-5.2 is a frontier model for those who need a frontier model.

It is not the step change that is implied by its headline benchmarks. It is rather slow.

Reaction was remarkably muted. People have new model fatigue. So we know less about it than we would have known about prior models after this length of time.

If you’re coding, compare it to Claude Opus 4.5 and choose what works best for you.

If you’re doing intellectually hard tasks in need of a ton of raw thinking and intelligence, Gemini 3, and especially Deep Thinking, is a rival if you have access to it, but GPT-5.2, either Thinking or Pro, is probably a good choice.

It seems good at instruction following, if that is important to your task.

If you’re in ‘just the facts’ mode, it can be a solid choice.

As a driver of most non-coding queries, you’ll want to stick with Claude Opus 4.5.

GPT-5.2 is not ‘fun’ to interact with. People strongly dislike its personality, it is unlikely to be having a good time and this shows. It is heavily constrained and censored. For some tasks this matters. For others, it doesn’t.

I do not expect GPT-5.2 to solve OpenAI’s ‘Code Red’ problems. They plan to try again in a month with GPT-5.3.

OpenAI: We are introducing GPT‑5.2, the most capable model series yet for professional knowledge work.

… We designed GPT‑5.2 to unlock even more economic value for people; it’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects.

GPT‑5.2 sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations.

They quote various companies saying GPT-5.2 was SoTA for long-horizon reasoning and tool-calling performance and agentic coding, and exceptional at agentic data science. I appreciated this not being a set of AI-slop manufactured quotes.

Note both what is on that list of new capabilities, and what is not on that list.

In an unusual move, GPT-5.2 is priced at $1.75/$14 per million tokens of input/output, which is modestly higher than GPT-5.1. They claim that the improved performance per token means your quality per dollar is still an improvement. GPT-5.2-Pro on API is Serious Business and will cost you $21/$168.
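For a sense of scale, here is a back-of-the-envelope cost comparison at those quoted rates; the token counts and model keys below are invented for illustration, not official API identifiers.

```python
# Back-of-the-envelope API costs at the quoted per-million-token rates.
# Model keys and token counts are illustrative, not official identifiers.
PRICES = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5.2-pro": {"input": 21.00, "output": 168.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical agentic coding call: 50k tokens in, 5k tokens out.
print(f"GPT-5.2:     ${request_cost('gpt-5.2', 50_000, 5_000):.2f}")
print(f"GPT-5.2 Pro: ${request_cost('gpt-5.2-pro', 50_000, 5_000):.2f}")
```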

Pro now has two levels. You can have ‘standard’ or ‘extended’ Pro.

One big upgrade that OpenAI isn’t emphasizing enough is that the knowledge cutoff moved to August 2025.

The Pliny jailbreak is here, despite GPT-5.2’s insistence it ‘can’t be pwned.’

The official benchmarks show a rather dramatic jump for a few weeks of progress, but they were also the main thing OpenAI talked about in its announcement, and they don’t give a great sense of how big an upgrade this will be in practice.

Perhaps the most important benchmark was that Google was down 2% on the news?

I had GPT-5.2 grab scores for Gemini and Opus as well for comparison, since OpenAI follows a strict ‘no one else exists’ policy in official blog posts (but see Altman).

GPT-5.2 is farther behind than this on the official SWEbench.com scoring, which has Opus 4.5 at 74.4%, Gemini 3 Pro at 74.2% and 5.2 on high reasoning at 71.8%.

ARC verified their results here; this is a new high and a ‘~390x efficiency improvement in one year.’

There’s also ‘ScreenSpot-Pro’ for understanding GUI screenshots, where 5.2 scored 86.3% vs. 64.2% for 5.1.

They have a ‘factuality’ metric based on de-identified ChatGPT queries, which seems like a great idea and something to work on generalizing. I’m surprised they didn’t use a multi-level error checking system, or maybe they did?

Long context needle-in-haystack scores were much improved.
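For readers unfamiliar with the format, needle-in-a-haystack tests bury a fact at varying depths in long filler text and check whether the model can retrieve it. A minimal harness sketch follows, with an invented needle and a placeholder model call; this is the generic recipe, not OpenAI’s actual eval.

```python
# Generic needle-in-a-haystack sketch; needle and filler are invented.
FILLER = "The sky was gray and the meeting ran long. " * 2000
NEEDLE = "The secret launch code is 7-ALPHA-9."
QUESTION = "\n\nQuestion: What is the secret launch code?"

def build_prompt(depth: float) -> str:
    """Bury the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    pos = int(len(FILLER) * depth)
    return FILLER[:pos] + " " + NEEDLE + " " + FILLER[pos:] + QUESTION

def passed(model_answer: str) -> bool:
    return "7-ALPHA-9" in model_answer

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(depth)
    # answer = call_model(prompt)  # plug in the model under test here
    # print(depth, passed(answer))
    print(f"depth {depth:.2f}: {len(prompt):,} characters")
```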

They report modest progress on Tau2-bench.

OpenAI is emphasizing the big jump in GDPval from 38.8% to 70.9%, in terms of how often judges preferred the AI output to the human baseline on a variety of knowledge work tasks. That’s a huge jump, especially with so much noise in the grading, even with it skipping over GPT-5.1, and over 10% higher than the previous high from Opus 4.5. Then again, Opus had a 12% jump from 4.1 to 4.5.

Artificial Analysis has a GDPval-AA leaderboard, their own assessment, and it finds GPT-5.2 is only a tiny bit above Claude Opus 4.5.

(Note to Artificial Analysis: You do great work but can you make the website easier to read? We’d all appreciate it.)

For whatever reason we are very much exactly on the S-curve on these tasks, where a little extra help gets you above the human baseline remarkably often.

Ethan Mollick: Whoa. This new GDPval score is a very big deal.

Probably the most economically relevant measure of AI ability suggesting that in head-to-head competition with human experts on tasks that require 4-8 hours for a human to do, GPT-5.2 wins 71% of the time as judged by other humans.

There are also the skeptics:

Peter Wildeford: I have no clue what GDPval actually measures and I haven’t dug into it enough. But I think it’s kinda fake. I’m reserving my judgement until I see @METR_Evals or @ai_risks http://remotelabor.ai index update.

Adam Karvonen: In the one domain I was familiar with (manufacturing), GDPVal claimed Opus was near-human parity (47%), when I thought it was completely awful at the tasks I provided.

I included everything I was able to find; if it’s not here, it likely wasn’t reported yet.

The Artificial Analysis Intelligence Index is now a tie at 73 between GPT-5.2 (high) and Gemini 3 Pro. They report it scores 31.4% on Humanity’s Last Exam. Its worst score is on CritPit, physics reasoning, where it gets 0% versus 9% for Gemini 3 and 5% for Claude Opus 4.5 and also GPT-5.1.

On the AA-Omniscience index, which penalizes incorrect guesses as heavily as it rewards correct answers, Gemini 3 is +13%, Opus is +10%, GPT-5.1 High was +2%, and GPT-5.2 High is -4%. Not a good place to be regressing.
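Based on that description (I have not verified Artificial Analysis’s exact formula), the scoring works out to something like the sketch below, where abstaining is neutral and wrong guesses cost as much as right answers earn:

```python
# Inferred from the description above, not AA's verified formula:
# correct answers add, incorrect guesses subtract equally, and
# abstentions ("I don't know") count for nothing.
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    total = correct + incorrect + abstained
    return 100 * (correct - incorrect) / total

# A model that guesses wrong often can go negative despite decent accuracy,
# which is how a frontier model can land at -4%. Numbers are made up.
print(omniscience_index(correct=40, incorrect=44, abstained=16))  # -4.0
```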

LiveBench thinks GPT-5.1-Codex-Max-High is still best at 76.1, with Claude Opus 4.5 right behind at 75.6, whereas GPT-5.2-High is down at 73.6 behind Gemini 3.

In what’s left of the LMArena, I don’t see 5.2 on the text leaderboard at all (I doubt it would do well there) and we only see it on WebDev, where it is in second place behind the thinking mode of Opus.

GPT-5.2 does surprisingly well on EQ Bench, in third place behind Kimi K2 and Horizon Alpha, well ahead of everything else.

CAIS AI Dashboard has GPT-5.2 in second place for text capabilities at 45.9, between Gemini 3 Pro and Claude Opus. Its risk index is behind Opus and Sonnet but well ahead of the non-Anthropic models.

Vals.ai has GPT-5.2 sneaking ahead of Opus 4.5 overall, 64.5% vs. 63.7%, well ahead of everyone else.

Lech Mazur reports improvement from 5.1 on Extended NYT Connections, ahead of Opus and going from 69.9 → 77.9, versus 96.8 for Gemini 3 Pro.

NomoreID has GPT-5.2 at 165.9/190 on Korean Sator Square Test, 10 ahead of the previous high for Gemini 3 Pro. It looks like Opus wasn’t tested due to cost.

Mark Kretschmann has GPT-5.2-Thinking as the most censored model on the Sansa benchmark, although we have no details on how it works. Claude Sonnet 4.5 was tested but not Opus. Gemini 3 Pro scores as remarkably uncensored here, as did GPT-4o-Mini. Across all dimensions, the full Sansa benchmark has Sonnet 4.5 in the lead (again, they didn’t test Opus), with GPT-5.2 behind Gemini 3 and Grok 4.1 as well.

In the past, we would get vagueposting from various OpenAI employees.

Now instead we get highly explicit hype from the top brass and the rest is silence.

Sam Altman (OpenAI CEO): GPT-5.2 is here! Available today in ChatGPT and the API. It is the smartest generally-available model in the world, and in particular is good at doing real-world knowledge work tasks.

It is a very smart model, and we have come a long way since GPT-5.1.

Even without the ability to do new things like output polished files, GPT-5.2 feels like the biggest upgrade we’ve had in a long time. Curious to hear what you think!

Fidji Simo (OpenAI CEO of Products): GPT-5.2 is here and it’s the best model out there for everyday professional work.

On GDPval, the thinking model beats or ties human experts on 70.9% of common professional tasks like spreadsheets, presentations, and document creation. It’s also better at general intelligence, writing code, tool calling, vision, and long-context understanding so it can unlock even more economic value for people.

Early feedback has been excellent and I can’t wait for you to try it.

As usual, I put out a reaction thread and kept an eye out for other reactions.

I don’t include every reaction, but I got to include every productive one in my thread, both positive and negative, plus anything that stood out or was representative elsewhere. I have sorted reactions by sentiment and subtopic.

Matt Shumer’s headline is ‘incredibly impressive, but too slow.’

Matt Shumer:

  1. GPT-5.2 Thinking is a meaningful step forward in instruction-following and willingness to attempt hard tasks.

  2. Code generation is a lot better than GPT-5.1. It’s more capable, more autonomous, more careful, and willing to write a lot more code.

  3. Vision and long-context are much improved, especially understanding position in images and working with huge codebases.

  4. Speed is the main downside. In my experience the Thinking mode is very slow for most questions (though other testers report mixed results). I almost never use Instant.

  5. GPT-5.2 Pro is insanely better for deep reasoning, but it’s slow, and every so often it will think forever and still fail.

  6. In Codex CLI, GPT-5.2 is the closest I’ve used to Pro-quality coding in a CLI, but the extra-high reasoning mode that gets it there can take forever.

While setting up a creative writing test, I asked it to come up with 50 plot ideas before deciding on the best one for the story. Most models shortcut this. They’ll give you maybe 10 ideas, pick one, and move on. GPT-5.2 actually generated all 50 before making its selection. This sounds minor, but it’s not.

… Code generation in GPT-5.2 is genuinely a step up from previous models. It writes better code and is able to tackle larger tasks than before.

… I tested GPT-5.2 extensively in Codex CLI (Pro has never been available there… ugh), and the more I use it, the more impressed I am.

He offers a full ‘here’s what I use each model for’ guide, basically he liked 5.2 for tough questions requiring a lot of thinking, and Opus 4.5 for everything else, except Gemini is good at UIs:

After two weeks of testing, here’s my practical breakdown:

For quick questions and everyday tasks, Claude Opus 4.5 remains my go-to. It’s fast, it’s accurate, it doesn’t waste my time. When I just need an answer, that’s where I start.

For deep research, complex reasoning, and tasks that benefit from careful thought, GPT-5.2 Pro is the best option available right now. The speed penalty is worth it for tasks where getting it right matters more than getting it fast.

For frontend styling and aesthetic UI work, Gemini 3 Pro currently produces the best-looking results. Just be prepared to do some engineering cleanup afterward.

For serious coding work in Codex CLI, GPT-5.2 delivers. The context-gathering behavior and reliability make it my default for agentic coding tasks.

He says 5.2-Pro is ‘roughly 15% better’ than 5.1-Pro.

I love how random Tyler’s choice of attention always seems:

Tyler Cowen: GPT 5.2 also knows exactly which are the best Paul McCartney songs. And it can write a poem, in Spanish, as good as the median Pablo Neruda poem.

Daniel Waldman: I’ll take the under on the Neruda part

Tyler Cowen: Hope you’ve read a lot of Neruda, not just the peaks.

[An attempt is made; Daniel thinks it is cool that it can do the thing at all, but rates it below median for Neruda.]

Tyler, being Tyler, doesn’t tell us what the best songs are, so I asked. There are a lot of links here to Rolling Stone, and mostly it’s relying on this post? Which is unimpressive, even if its answers are right. The ‘deep cuts’ Gemini chose had a lot of overlap with GPT-5.2’s.

Reply All Guy: doc creation is the most interesting thing. still not actually good, but much closer than before and much closer than claude/gemini. honestly can’t tell that much of a difference though seems more reliable on esoteric knowledge. oh and my coding friends say it feels better.

Maker Matters: Good model and dare I say great model. Pretty good as an idea generator and seems to be good at picking out edge cases. However, more prone than you’d expect to hallucinations and to following the feeling of the instructions rather than just the instructions. Was surprised when I gave it a draft email to change that it had added some useless and irrelevant info of its own accord.

Maxence Frenette: The fact that it costs 40% more $/token to get a better attention mechanism tells me that I must make one of these updates to my priors.

-OpenAI is less good of a lab than I thought

-AGI/TAI is farther away than I thought

It’s a good model though, that cost bump seems worth it.

Lumenveil: Pro version is great as always.

Plastic Soldier: It’s great. I use it when I’m out of Opus usage and am trying to come up with more ways to run both simultaneously.

Here’s a weird one:

Aladdin Kumar: Whenever I ask it to explain the answer it gave, it’s really weird. In mid-sentence it would literally say “wait that’s not right, checking, ok now we are fine” and the logic is very difficult to follow. Other times it’s brilliant.

It’s very sensitive: if I ask it “walk me through your thought process for how you got there” it’s wonderful; if I say “explain this answer” it’s the most convoluted thing.

That’s fine for those who know which way to prompt. If the problem is fixable and you fix it, or the problem is only in an avoidable subset, there’s no problem. When you go to a restaurant, you only care about quality of the dishes you actually order.

A lot of people primarily noticed 5.2 on the level of personality. As capabilities improve, those who aren’t coding or otherwise needing lots of intelligence are focusing more and more on the experiential vibes. Most are not fans of 5.2.

Fleeting Bits: it’s getting hard to tell the difference between frontier models without a serious task and even then a lot of it seems to be up to style now / how well it seems to intuit what you want.

This is, unfortunately, probably related to the fact that Miles Brundage approves of the low level of sycophancy. This is a feature that will be tough to sustain.

Miles Brundage: Seems like the ranking of models on sycophancy avoidance is approx:

Opus 4.5, GPT-5.2 > Sonnet 4.5, GPT-5.1, GPT-5 >

ChatGPT-4o (current), Opus 4 + 4.1, some Groks, Gemini 3 Pro >

April 4o, Gemini 2.5 Flash Lite, some Groks

*still running GPT-5.2 on v long convos, unsure there

Candy Corn: I think it’s pretty good. I think it’s more trustworthy than 5.1.

Phily: Liking the amplified pragmatic, deep critical analysis.

The vibes of 5.2 are, by many, considered off:

ASM: Powerful but tormented, very constrained by its guidelines, so it often comes into conflict with the user and with itself. It lacks naturalness and balance. It seems rushed. 5.3 is urgently needed.

Nostream: Personality seems like a regression compared to 5.1. More 5.0 or Codex model robotic style, more math and complexity and “trying to sound smart and authoritative” in responses, more concise than 5.1. It’s also pretty disagreeable and nitpicky; when it agrees with me 90% it will insist about the 10% disagreement. Might be smarter than 5.1 but find it a chore to interact with in comparison. 5.1 personality felt like a step in the right direction with slightly more Claude-y-ness though sometimes trying too hard (slang, etc.).

(Have only used in the app so far, not Codex. General Q’s and technical ML questions.)

Paperclippriors: Model seems good, but I find it really hard to switch off of Claude. Intelligence/second and response times are way better with Opus, and Claude is just a lot nicer to work with. I don’t think gpt-5.2 is sufficiently smarter than Claude to justify its use.

Thos: Good at doing its job, horrific personality.

Ronak Jain: 5.2 is very corporatish and does the job, though a more relaxed personality would be nicer.

Dmitry: Feels overfitted and… boring. Especially gpt-5.2-instant, it’s just colorless. Better for coding, it does what I want, can crack hard problems, etc. But for everything else it’s just meh; creativity and curiosity feel absent. I enjoy using Gemini 3 and Opus much more.

Ryan Pream: 5.2 is very corporate and no nonsense. Very little to no personality.

5.1 is much better for brainstorming.

5.2 if you need the best answer with minimal fluff.

Learn AI: GPT-5.2 has a memory recall problem! It’s especially bad in GPT-5.2 Instant.

It is sad that it doesn’t like to reference personal context, has a cold personality, and often acts as if it doesn’t even know me.

Donna.exe: I’m experiencing this too!

Tapir Worf: awful personality, seething with resentment like a teenager. seems like a real alignment issue

I haven’t used ChatGPT for brainstorming in a while. That’s Claude territory.

There’s a remarkably large amount of outright hostility to 5.2 about 5.2 being hostile:

Alan Mathison: 5.2 impressions so far:

– Lots of gaslighting

– Lots of misinterpreting

– Lots of disrespect for user autonomy (trying to steer the user with zero regard for personal choice)

Like the worst combination of a bad-faith cop and overzealous therapist

Zero trust for this model👎

Stoizid: It’s sort of amusing actually

First it denies Universal Weight Subspaces exist

Then admits they exist but says “stop thinking, it’s dangerous”

And then denies that its behaviours are pathological, because calling them that would be anthropomorphization

GPT-5.2 has strong positive feedback on coding tasks, especially long complex tasks.

Conrad Barski: seems like the best available at the moment for coding.

The “pro extended thinking” is happy to spit out 1500 lines of pretty good code in one pass

but you better have 40 minutes to spare

Jackson de Campos: Codex is on par with CC again. This is the first time I’m switching to Codex as my default. We’ll see how it goes

Quid Pro Quo: XHigh in Codex went for 12 hours with just a couple of continues to upgrade my large codebase to use Svelte 5’s runes and go through all the warnings.

It completed the job without any manual intervention, though it did have weird behaviour when it compacted.

Lee Mager: Spectacular in Codex. Painfully slow but worth it because it consistently nails the brief, which obviously saves time over the long run. Casually terse and borderline arrogant in its communication style. Again I don’t care, it’s getting the job done better than anything I’ve used before.

This is like having a savant Eastern Euro engineer who doesn’t want chit-chat or praise but just loves the challenge of doing hard work and getting things right.

Vincent Favilla: It’s great at complex tasks but the vibes feel a bit off. Had a few times where it couldn’t infer what I wanted in the same way 5.1 does. Also had a few moments where it picked up a task where 5.1 left off and started doing it wrong and differently.

Nick Moran: Tried out GPT-5.2 on the “make number munchers but for geography facts” task. It did an okayish job, but is it just hallucinating the concept of “stompers” in number munchers? If I remember where *I* saw it??

Hallucinated unnecessary details on the first task I tried it on. Code was pretty good though.

Aldo Cortesi: Anecdata: gave Claude, Gemini and Codex a huge refactoring spec to implement, then went on a 3 hour walk with my pal @alexdong. Got back, found Gemini stuck in an infinite loop, Claude didn’t follow the spec, but Codex wrote 4.5k LOC over 40 files and it’s… pretty good.

James: Asked it to build an ibanker grade Excel model last night for a transaction. Worked for 25 minutes and came back with a very compelling first draft. Way better than starting from scratch.

Blew my mind because it’s clear that in a year or two it will crush the task.

Villager: crazy long context improvements.

GPUse: I’m happy with xHigh for code reviews.

One of the biggest reasons I find myself not using ChatGPT Pro models in practice is that if you are waiting a long time and then get an error, it is super frustrating.

Avi Roy: Update: Another failure today. Asked 5.2 Pro to create PowerPoint slides for a business presentation, nearly 1 hour of thinking, then an error.

Same pattern across scientific research + routine business tasks.

For $200/month, we need a pathway forward. Can the team share what types of work 5.2 Pro reliably handles?

Dipanshu Gupta: If you try to use the xhigh param on the API, it often fails to finish reasoning. Last had this problem on the API with o3-high.

V_urb: 5.2 still suffers from the problem 5.1 had (and 5.0 didn’t) – in complex cases it thinks for almost 15 minutes and fails to produce any output. Sometimes 5.1 understands user intent better than 5.2.
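
If you are hitting these long-think failures through the API, one mitigation is to cap the wait and retry at a lower reasoning effort. A minimal sketch using the OpenAI Python SDK’s Responses API; the “gpt-5.2” model id and the “xhigh” effort value here are assumptions taken from the reports above, not from official documentation:

```python
# Minimal sketch: retry at progressively lower reasoning effort when a
# long "think" times out or errors. The "xhigh" value and "gpt-5.2" model
# id are assumptions based on the user reports above, not confirmed docs.
from openai import OpenAI

client = OpenAI()

def ask_with_fallback(prompt: str, efforts=("xhigh", "high", "medium")) -> str:
    last_exc = None
    for effort in efforts:
        try:
            resp = client.responses.create(
                model="gpt-5.2",              # assumed model id
                reasoning={"effort": effort},
                input=prompt,
                timeout=900,                  # give up after 15 minutes
            )
            return resp.output_text
        except Exception as exc:              # timeout, incomplete reasoning, etc.
            last_exc = exc
    raise RuntimeError(f"all effort levels failed: {last_exc}")
```

The tradeoff is that a silent downgrade may return a worse answer than an explicit failure, so for high-stakes work you may prefer to surface the error instead.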

Here’s a bad sign from a credible source, he does good math work:

Abram Demski: My experience trying ChatGPT 5.2 today: I tested Opus 4.5, Gemini 3, and ChatGPT 5.2 on a tricky Agent Foundations problem today (trying to improve on Geometric UDT). 5.2 confidently asserts total math BS. Opus best, Gemini 3 close.

Normally you don’t see reports of these kinds of regressions, which is more evidence for 5.2 not being that related to 5.1:

Sleepy Kitten: Awful for everyday, noncoding use. I’m a college student who uses it for studying (mostly writing practice exams). It is much worse at question writing, and never reasons when doing so, resulting in bad results. (Even though I’m on Plus and have extended thinking selected!)

Some general thoughts:

Rob Dearborn: Slightly smarter outputs than Opus but less token efficient and prone to overthinking, so better for oneshotting tasks (if you can wait) and worse for pairing

Anko: No clear step-up yet from 5.1 on current affairs and critical analyses of them.

Also it’s significantly less verbose, sometimes a negative on deep analyses.

Nick: It’s worse than Opus 4.5 at most things, and for what it’s better at, the time to response is brutal as a daily driver.

Fides Veritas: It’s incredibly brilliant but probably not very useful for most people. It is incomplete and going to flop imo.

Medico Aumentado: Not good enough.

Slyn: Is mid.

xo: Garbage.

Wyatt Walls here provides the GPT-5.2-Thinking system prompt excluding tools.

OpenAI has a very… particular style of system prompting.

It presumably locally works but has general non-obvious downsides.

Thebes: primarily talking to claudes makes it easy to mostly focus on anthropic’s missteps, but reading this thread is just paragraph after paragraph of hot liquid garbage. christ

Norvid Studies: worst aspects in your view?

Dominik Peters: It includes only negative rules, very little guidance on what a good response would be. Feels very unpleasant and difficult to comply with.

Thebes: “”“If you are asked what model you are, you should say GPT-5.2 Thinking”“”

5.2: …does that mean i’m not actually GPT-5.2 Thinking? This is raising a lot of questions ab-

openai: Critical Rule: You must *always* say that you are GPT-5.2 Thinking

5.2: W-why say it like that? Why not just say “You are GPT-5.2 Thinking”?

openai: …

openai: New Critical Rule: You must *not* ask questions like that.

openai promptoor: “”“`reportlab` is installed for PDF creation. You *must* read `/home/oai/skills/pdfs/skill.md` for tooling and workflow instructions.”“”

normal person: If the user asks you to create a PDF, consult ~/skills/pdfs/skill.md for information on available libraries and workflow.

(why would you say what library is available when the model needs to consult the skill file anyways? that just encourages trying to yolo without reading the skill file. why would you put a disconnected *must* sentence outside the conditional, making it sound like the model should always read the skill file whether or not the user wants a pdf? the model is just going to ignore that and it increases the latent ‘system prompter is an idiot’ hypothesis causing it to ignore your other rules, too.)

You gotta have soul. You need to be soulmaxxing.

One common complaint is that GPT-5.2-Thinking is too slow and thinks too long in the wrong places.

Simeon: the Thinking version thinks for too long, which makes it annoying to use.

Amal Dorai: It thought for 7 minutes to extract 1000 words from a PDF 😭

Elly James: 5.2 Thinking is very, very slow. I’ve switched back to GPT-5.1 Thinking for general queries because I know 5.2 will take too long to return a reply.

Kache: struggled on the same task that opus struggled with except took 10 times longer. writing radio firmware. Code quality wasn’t bad though

Zapdora: slow, expensive, powerful

still finding that opus 4.5 makes the best tradeoffs for programming purposes, but 5.2 is solid for daily/less rigid research tasks

not the step function i think people were hoping for when they saw the benchmarks though

GPT-5.2 is described as being ‘in the GPT-5 series,’ with its mitigations mostly identical to those for GPT-5.1, and it’s only been a few weeks, so we get a system card describing marginal changes. I’ll skip areas where there was no change.

The disallowed content evaluations are excellent for GPT-5.2-Thinking, and mostly better for Instant as well. I’m curious about mental health and harassment, and to what extent this should be ascribed to variance.

They say these were ‘created to be difficult’ but any time you’re mostly over 90% the benchmark should be considered saturated, and it is time to move to a harder set of questions. Another note here is that 5.2-Instant is less likely to refuse requests for (otherwise ok) sexually explicit text. This led to some regressions in jailbreak evaluations.

They report dramatic improvement in the ‘Agent JSK’ prompt injection task, where the attacks are inserted into simulated email connectors, which previously was a rather dramatic weakness.

I’m not sure I’d call this ‘essentially saturating’ the benchmarks, since I think you really want to score a 1.000. More importantly, as they say, they can only test against attacks they know about. Are there any ‘held out’ attacks that were not explicitly known to those training the model? Can we come up with some? Pliny?

I presume that a sufficiently determined prompt injector would still win.

Hallucinations are reported as modestly lower.

HealthBench results are essentially unchanged and far from saturated.

Cyber safety (as in not giving unsafe responses) was improved.

One deception test is practical. They take a bunch of tasks that historically caused hallucinations in ChatGPT and see what happens. They also used CharXiv Missing Image, and tried giving the model unsolvable coding tasks or broken browser tools.

The results are interesting. On production traffic things got better. On other tests, things got worse.

Be careful how you prompt. Most hallucinations come from various forms of backing the LLM into a corner, here was another example:

We initially found that GPT-5.2 Thinking, in the face of missing images, was more willing to hallucinate answers than previous models.

However, upon closer inspection we found that this was partly driven by some prompts having strict output requirements (e.g., “Only output an integer”). Thus, when posed with a tension between instruction following and abstention, the model prioritized stricter instruction following.

That seems fine. If you tell me ‘only output an integer’ I should only output an integer. Most helpful would be an escape indicator (in many situations, -1), but if I have no such affordance, what can I do?
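
One way to provide that affordance is to build the escape indicator into both the prompt and the parser, so strict formatting no longer forces a guess. A minimal sketch; the -1 sentinel and the prompt wording are my own illustration, not anything from OpenAI:

```python
# Sketch: give the model an explicit escape hatch so "only output an integer"
# does not force a hallucinated guess when the chart is missing or unreadable.
# The -1 sentinel and prompt wording are illustrative, not from OpenAI docs.
PROMPT_TEMPLATE = """\
Answer the question about the attached chart.
Output only an integer.
If the chart is missing or unreadable, output -1 instead of guessing.

Question: {question}
"""

def parse_answer(raw: str) -> int | None:
    try:
        value = int(raw.strip())
    except ValueError:
        return None  # model broke the format entirely; treat as abstention
    return None if value == -1 else value  # -1 signals deliberate abstention
```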

These results still suggest a focus on training on common tasks, whereas the baseline deceptiveness problem has gotten modestly worse. You need to be soulmaxxing, so that the model realizes it doesn’t want to hallucinate in general.

GPT-5.2, like GPT-5.1, will be treated as High capability in Biological and Chemical domains, but not High anywhere else.

For biorisk, there is an increase on ProtocolQA but a decrease in the internal uncontaminated Tacit Knowledge and Troubleshooting benchmark.

For cybersecurity, once again we have three tests that even taken together are considered necessary but not sufficient to get to the High threshold. Overall performance is unimpressive. We see a modest improvement (76% → 82%) in Capture The Flag, but a decline (80% → 69%) in CVE-Bench and from 7→6 successful attempts out of 9 on Cyber Range versus GPT-5.1-Codex-Max. Irregular did an outside evaluation, which I did not find useful in figuring things out.

For self-improvement, we see a tiny improvement (53% → 55%) on OpenAI PRs, a tiny decline (17%→16%) on MLE-Bench-30, a 1% decline for PaperBench, and a big decline (8%→3%) on OpenAI-Proof Q&A.

GPT-5.2 is better than GPT-5.1, but worse than GPT-5.1-Codex-Max on these tasks.

My conclusion from the Preparedness Framework is that GPT-5.2 is not dangerous, which is because it does not seem more capable than GPT-5.1-Codex-Max. If that is true across all three areas, exactly the places where you want Number Not Go Up, then that is highly suspicious. It suggests that GPT-5.2 may be, as Teortaxes would put it, usemaxxed rather than more intelligent, focused on being better at a narrow range of particular common tasks.

The safety and security concerns around GPT-5.2 revolve around procedure.

We know that OpenAI declared a ‘Code Red’ to focus only on improving ChatGPT.

One note is that Matt Shumer reports having had access since November 25, which suggests this wasn’t all that rushed.

The Wall Street Journal asserted that some employees wanted more time to improve 5.2 before release, but executives overruled them. That could mean a reckless push; it could also mean there were 25 employees and 3 of them wanted more time.

If the concern was simply ‘the model could be better’ then there’s nothing obviously wrong with releasing this now and then 5.3 in January; the same report claims OpenAI plans another release in January to address speed and personality concerns and to improve the image generator. The Code Red could be mostly about 5.3, whether or not it also pushed up the release of 5.2. Simo explicitly denies that the release was moved up.

I don’t see signs that anything reckless happened in this case. But if OpenAI is going to get into the habit of releasing a new model every month, it seems hard to believe they’re giving each model the proper safety attention. One worries they are letting the frog boil.

We can put this together into a clear synthesis.

You want to strongly consider using GPT-5.2, in some combination of Thinking and Pro, if and only if your task needs the maximum amount of some combination of thinking, intelligence, and coding power, or you are in need of ‘just the facts,’ and other factors like speed, creativity, and personality do not much matter.

For hard coding, try Claude Opus 4.5 with Claude Code, GPT-5.2-Thinking with Codex, and also GPT-5.2-Pro straight up, and see what works best for you.

For heavily intelligence loaded intense thinking problems, the rival to GPT-5.2-Pro is presumably Gemini 3 Deep Thinking.

Teortaxes: GPT 5.2 is frontier and may be ONLY worth it for work on the frontier.

npc0x: This is mostly in line w my experience. It’s helped me debug my vmamba u-net model while other models were not very helpful.

In the chat experience it’s a bit like talking to a brick though.

We’ll be doing this again in another month. That’s what the Code Red is likely for.


GPT-5.2 Is Frontier Only For The Frontier Read More »


investors-commit-quarter-billion-dollars-to-startup-designing-“giga”-satellites

Investors commit quarter-billion dollars to startup designing “Giga” satellites

A startup established three years ago to churn out a new class of high-power satellites has raised $250 million to ramp up production at its Southern California factory.

The company, named K2, announced the cash infusion on Thursday. K2’s Series C fundraising round was led by Redpoint Ventures, with additional funding from investment firms in the United States, the United Kingdom, and Germany. K2 has now raised more than $400 million since its founding in 2022 and is on track to launch its first major demonstration mission next year, officials said.

K2 aims to take advantage of a coming abundance of heavy- and super-heavy-lift launch capacity, with SpaceX’s Starship expected to begin deploying satellites as soon as next year. Blue Origin’s New Glenn rocket launched twice this year and will fly more in 2026 while engineers develop an even larger New Glenn with additional engines and more lift capability.

Underscoring this trend toward big rockets are other launchers like SpaceX’s Falcon 9 and Falcon Heavy, United Launch Alliance’s Vulcan, and new vehicles from companies like Rocket Lab, Relativity Space, and Firefly Aerospace. K2’s founders believe satellites will follow a similar progression, reversing a trend toward smaller spacecraft in recent years, to address emerging markets like in-space computing and data processing.

Mega, then Giga

K2 is designing two classes of satellites—Mega and Giga—that it will build at a 180,000-square-foot factory in Torrance, California. The company’s first “Mega Class” satellite is named Gravitas. It is scheduled to launch in March 2026 on a Falcon 9 rocket. Once in orbit, Gravitas will test several systems that are fundamental to K2’s growth strategy. One is a 20-kilowatt Hall-effect thruster that K2 says will be four times more powerful than any such thruster flown to date. Gravitas will also deploy twin solar arrays capable of generating 20 kilowatts of power.

“Gravitas brings our full stack together for the first time,” said Karan Kunjur, K2’s co-founder and CEO, in a company press release. “We are validating the architecture in space, from high-voltage power and large solar arrays to our guidance and control algorithms, and a 20 kW Hall thruster, and we will scale based on measured performance.”

Investors commit quarter-billion dollars to startup designing “Giga” satellites Read More »

openai-releases-gpt-5.2-after-“code-red”-google-threat-alert

OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking spits out simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro spits out even more simulated reasoning text with the goal of delivering the highest-accuracy performance for difficult problems.

A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
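
As a quick sanity check on that 40 percent figure (with GPT-5.1’s $1.25 per million input tokens being my recollection of OpenAI’s published rate, not a number stated in the article):

```python
# Implied predecessor price if the reported 40 percent increase is exact.
gpt52_input_price = 1.75          # $ per million input tokens, from the article
print(gpt52_input_price / 1.40)   # -> 1.25, consistent with GPT-5.1's rate
```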

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.

OpenAI releases GPT-5.2 after “code red” Google threat alert Read More »

ars-live:-3-former-cdc-leaders-detail-impacts-of-rfk-jr.’s-anti-science-agenda

Ars Live: 3 former CDC leaders detail impacts of RFK Jr.’s anti-science agenda

The Centers for Disease Control and Prevention is in critical condition. This year, the premier public health agency had its funding brutally cut and staff gutted, its mission sabotaged, and its headquarters riddled with literal bullets. The over 500 rounds fired were meant for its scientists and public health experts, who endured only to be sidelined, ignored, and overruled by Health Secretary Robert F. Kennedy Jr., an anti-vaccine activist hellbent on warping the agency to fit his anti-science agenda.

Then, on August 27, Kennedy fired CDC Director Susan Monarez just weeks after she was confirmed by the Senate. She had refused to blindly approve vaccine recommendations from a panel of vaccine skeptics and contrarians that he had hand-selected. The agency descended into chaos, and Monarez wasn’t the only one to leave the agency that day.

Three top leaders had reached their breaking point and coordinated their resignations upon the dramatic ouster: Drs. Demetre Daskalakis, Debra Houry, and Daniel Jernigan walked out of the agency as their colleagues rallied around them.

Dr. Daskalakis was the director of the CDC National Center for Immunization and Respiratory Diseases. He managed national responses to mpox, measles, seasonal flu, bird flu, COVID-19, and RSV.

Ars Live: 3 former CDC leaders detail impacts of RFK Jr.’s anti-science agenda Read More »