Author name: Mike M.

CMU research shows compression alone may unlock AI puzzle-solving abilities


’Tis the season for a squeezin’

New research challenges prevailing idea that AI needs massive datasets to solve problems.

A pair of Carnegie Mellon University researchers recently discovered hints that the process of compressing information can solve complex reasoning tasks without pre-training on a large number of examples. Their system tackles some types of abstract pattern-matching tasks using only the puzzles themselves, challenging conventional wisdom about how machine learning systems acquire problem-solving abilities.

“Can lossless information compression by itself produce intelligent behavior?” ask Isaac Liao, a first-year PhD student, and his advisor Professor Albert Gu from CMU’s Machine Learning Department. Their work suggests the answer might be yes. To demonstrate, they created CompressARC and published the results in a comprehensive post on Liao’s website.

The pair tested their approach on the Abstraction and Reasoning Corpus (ARC-AGI), an unbeaten visual benchmark created in 2019 by machine learning researcher François Chollet to test AI systems’ abstract reasoning skills. ARC presents systems with grid-based image puzzles where each puzzle provides several examples demonstrating an underlying rule, and the system must infer that rule and apply it to a new example.

For instance, one ARC-AGI puzzle shows a grid with light blue rows and columns dividing the space into boxes. The task requires figuring out which colors belong in which boxes based on their position: black for corners, magenta for the middle, and directional colors (red for up, blue for down, green for right, and yellow for left) for the remaining boxes. Here are three other example ARC-AGI puzzles, taken from Liao’s website:

Three example ARC-AGI benchmarking puzzles. Credit: Isaac Liao / Albert Gu

The puzzles test capabilities that some experts believe may be fundamental to general human-like reasoning (often called “AGI” for artificial general intelligence). Those properties include understanding object persistence, goal-directed behavior, counting, and basic geometry without requiring specialized knowledge. The average human solves 76.2 percent of the ARC-AGI puzzles, while human experts reach 98.5 percent.

OpenAI made waves in December for the claim that its o3 simulated reasoning model earned a record-breaking score on the ARC-AGI benchmark. In testing with computational limits, o3 scored 75.7 percent on the test, while in high-compute testing (basically unlimited thinking time), it reached 87.5 percent, which OpenAI says is comparable to human performance.

CompressARC achieves 34.75 percent accuracy on the ARC-AGI training set (the collection of puzzles used to develop the system) and 20 percent on the evaluation set (a separate group of unseen puzzles used to test how well the approach generalizes to new problems). Each puzzle takes about 20 minutes to process on a consumer-grade RTX 4070 GPU, compared to top-performing methods that use heavy-duty data center-grade machines and what the researchers describe as “astronomical amounts of compute.”

Not your typical AI approach

CompressARC takes a completely different approach than most current AI systems. Instead of relying on pre-training—the process where machine learning models learn from massive datasets before tackling specific tasks—it works with no external training data whatsoever. The system trains itself in real-time using only the specific puzzle it needs to solve.

“No pretraining; models are randomly initialized and trained during inference time. No dataset; one model trains on just the target ARC-AGI puzzle and outputs one answer,” the researchers write, describing their strict constraints.

When the researchers say “No search,” they’re referring to another common technique in AI problem-solving where systems try many different possible solutions and select the best one. Search algorithms work by systematically exploring options—like a chess program evaluating thousands of possible moves—rather than directly learning a solution. CompressARC avoids this trial-and-error approach, relying solely on gradient descent—a mathematical technique that incrementally adjusts the network’s parameters to reduce errors, similar to how you might find the bottom of a valley by always walking downhill.
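To make the valley analogy concrete, here is a minimal, purely illustrative sketch of gradient descent in Python (a toy loss function, not anything from CompressARC’s code): at each step, the program measures the slope and takes a small step downhill.

```python
# Toy gradient descent: find the bottom of the "valley" f(x) = (x - 3)^2.
def loss(x):
    return (x - 3.0) ** 2      # the valley, with its lowest point at x = 3

def grad(x):
    return 2 * (x - 3.0)       # slope of the valley at x

x = 10.0                       # arbitrary starting point
for _ in range(200):
    x -= 0.1 * grad(x)         # take a small step downhill

print(round(x, 4))             # prints 3.0, the bottom of the valley
```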

A block diagram of the CompressARC architecture, created by the researchers. Credit: Isaac Liao / Albert Gu

The system’s core principle uses compression—finding the most efficient way to represent information by identifying patterns and regularities—as the driving force behind intelligence. CompressARC searches for the shortest possible description of a puzzle that can accurately reproduce the examples and the solution when unpacked.

While CompressARC borrows some structural principles from transformers (like using a residual stream with representations that are operated upon), it’s a custom neural network architecture designed specifically for this compression task. It’s not based on an LLM or standard transformer model.

Unlike typical machine learning methods, CompressARC uses its neural network only as a decoder. During encoding (the process of converting information into a compressed format), the system fine-tunes the network’s internal settings and the data fed into it, gradually making small adjustments to minimize errors. This creates the most compressed representation while correctly reproducing known parts of the puzzle. These optimized parameters then become the compressed representation that stores the puzzle and its solution in an efficient format.
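As a rough illustration of that encode-by-optimizing idea, here is a hypothetical PyTorch sketch: a tiny, randomly initialized decoder and a small latent code are fitted at inference time against only the known cells of a single made-up puzzle, with a penalty nudging the latent toward a short description. The grid size, network, and penalty term are invented for illustration and do not reflect the actual CompressARC architecture.

```python
# Hypothetical sketch of inference-time "compression": no pretraining, no dataset.
import torch

grid = torch.randint(0, 10, (2, 9, 9))           # stand-in for two puzzle grids
known = torch.ones_like(grid, dtype=torch.bool)  # mask of cells whose colors we know
known[-1, 4:, 4:] = False                        # pretend part of the answer is hidden

latent = torch.zeros(2, 16, requires_grad=True)  # per-grid compressed code
decoder = torch.nn.Sequential(                   # tiny decoder, randomly initialized
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 9 * 9 * 10)
)
opt = torch.optim.Adam([latent, *decoder.parameters()], lr=1e-2)

for step in range(500):
    logits = decoder(latent).view(2, 9, 9, 10)
    # Reconstruction loss only on the known cells, plus a small penalty that
    # keeps the latent code compact, a crude stand-in for "shortest description."
    loss = torch.nn.functional.cross_entropy(logits[known], grid[known])
    loss = loss + 1e-3 * latent.pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

prediction = decoder(latent).view(2, 9, 9, 10).argmax(-1)  # fills in the hidden cells
```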

An animated GIF showing the multi-step process of CompressARC solving an ARC-AGI puzzle. Credit: Isaac Liao

“The key challenge is to obtain this compact representation without needing the answers as inputs,” the researchers explain. The system essentially uses compression as a form of inference.

This approach could prove valuable in domains where large datasets don’t exist or when systems need to learn new tasks with minimal examples. The work suggests that some forms of intelligence might emerge not from memorizing patterns across vast datasets, but from efficiently representing information in compact forms.

The compression-intelligence connection

The potential connection between compression and intelligence may sound strange at first glance, but it has deep theoretical roots in computer science concepts like Kolmogorov complexity (the shortest program that produces a specified output) and Solomonoff induction—a theoretical gold standard for prediction equivalent to an optimal compression algorithm.
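In standard notation, Kolmogorov complexity measures a string by the shortest program that outputs it on a fixed universal machine U, and Solomonoff’s prior weights every program consistent with the observed data by its length:

```latex
K(x) = \min \{\, |p| \;:\; U(p) = x \,\}
\qquad
M(x) = \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|}
```

Shorter programs dominate the sum, which is the formal version of preferring the simplest explanation that fits the data.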

To compress information efficiently, a system must recognize patterns, find regularities, and “understand” the underlying structure of the data—abilities that mirror what many consider intelligent behavior. A system that can predict what comes next in a sequence can compress that sequence efficiently. As a result, some computer scientists over the decades have suggested that compression may be equivalent to general intelligence. Based on these principles, the Hutter Prize has offered awards to researchers who can compress a 1GB file to the smallest size.
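A toy example makes the link concrete: an ideal entropy coder spends about -log2 p bits per symbol, so a model that predicts the next symbol well produces a much smaller file. The snippet below (illustrative only, not from the CMU paper) compares a uniform guesser against a simple order-1 predictor on a highly regular string:

```python
# Better prediction -> fewer bits: ideal code length is the sum of -log2 p(symbol | context).
import math
from collections import Counter

text = "abababababababababab"

# Predictor 1: uniform over the two symbols, costing 1 bit per character.
uniform_bits = len(text) * -math.log2(1 / 2)

# Predictor 2: order-1 model that has learned "b follows a" and "a follows b".
pairs = Counter(zip(text, text[1:]))

def p_next(prev, nxt):
    total = sum(n for (a, _), n in pairs.items() if a == prev)
    return pairs[(prev, nxt)] / total

bigram_bits = 1 + sum(  # 1 bit to send the first character
    -math.log2(p_next(a, b)) for a, b in zip(text, text[1:])
)

print(uniform_bits, bigram_bits)  # 20.0 vs 1.0: the better predictor compresses far more
```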

We previously wrote about intelligence and compression in September 2023, when a DeepMind paper discovered that large language models can sometimes outperform specialized compression algorithms. In that study, researchers found that DeepMind’s Chinchilla 70B model could compress image patches to 43.4 percent of their original size (beating PNG’s 58.5 percent) and audio samples to just 16.4 percent (outperforming FLAC’s 30.3 percent).

That 2023 research suggested a deep connection between compression and intelligence—the idea that truly understanding patterns in data enables more efficient compression, which aligns with this new CMU research. While DeepMind demonstrated compression capabilities in an already-trained model, Liao and Gu’s work takes a different approach by showing that the compression process can generate intelligent behavior from scratch.

This new research matters because it challenges the prevailing wisdom in AI development, which typically relies on massive pre-training datasets and computationally expensive models. While leading AI companies push toward ever-larger models trained on more extensive datasets, CompressARC suggests that intelligence can emerge from a fundamentally different principle.

“CompressARC’s intelligence emerges not from pretraining, vast datasets, exhaustive search, or massive compute—but from compression,” the researchers conclude. “We challenge the conventional reliance on extensive pretraining and data, and propose a future where tailored compressive objectives and efficient inference-time computation work together to extract deep intelligence from minimal input.”

Limitations and looking ahead

Even with its successes, Liao and Gu’s system comes with clear limitations that may prompt skepticism. While it successfully solves puzzles involving color assignments, infilling, cropping, and identifying adjacent pixels, it struggles with tasks requiring counting, long-range pattern recognition, rotations, reflections, or simulating agent behavior. These limitations highlight areas where simple compression principles may not be sufficient.

The research has not been peer-reviewed, and the 20 percent accuracy on unseen puzzles, though notable without pre-training, falls significantly below both human performance and top AI systems. Critics might argue that CompressARC could be exploiting specific structural patterns in the ARC puzzles that might not generalize to other domains. That would challenge whether compression alone can serve as a foundation for broader intelligence, rather than being just one component among many required for robust reasoning.

And yet as AI development continues its rapid advance, if CompressARC holds up to further scrutiny, it offers a glimpse of a possible alternative path that might lead to useful intelligent behavior without the resource demands of today’s dominant approaches. Or at the very least, it might unlock an important component of general intelligence in machines, which is still poorly understood.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


When Europe needed it most, the Ariane 6 rocket finally delivered


“For this sovereignty, we must not yield to the temptation of preferring SpaceX.”

Europe’s second Ariane 6 rocket lifted off from the Guiana Space Center on Thursday with a French military spy satellite. Credit: ESA-CNES-Arianespace-P. Piron

Europe’s Ariane 6 rocket lifted off Thursday from French Guiana and deployed a high-resolution reconnaissance satellite into orbit for the French military, notching a success on its first operational flight.

The 184-foot-tall (56-meter) rocket lifted off from Kourou, French Guiana, at 11:24 am EST (16:24 UTC). Twin solid-fueled boosters and a hydrogen-fueled core stage engine powered the Ariane 6 through thick clouds on an arcing trajectory north from the spaceport on South America’s northeastern coast.

The rocket shed its strap-on boosters a little more than two minutes into the flight, then jettisoned its core stage nearly eight minutes after liftoff. The spent rocket parts fell into the Atlantic Ocean. The upper stage’s Vinci engine ignited two times to reach a nearly circular polar orbit about 500 miles (800 kilometers) above the Earth. A little more than an hour after launch, the Ariane 6 upper stage deployed CSO-3, a sharp-eyed French military spy satellite, to begin a mission providing optical surveillance imagery to French intelligence agencies and military forces.

“This is an absolute pleasure for me today to announce that Ariane 6 has successfully placed into orbit the CSO-3 satellite,” said David Cavaillolès, who took over in January as CEO of Arianespace, the Ariane 6’s commercial operator. “Today, here in Kourou, we can say that thanks to Ariane 6, Europe and France have their own autonomous access to space back, and this is great news.”

This was the second flight of Europe’s new Ariane 6 rocket, following a mostly successful debut launch last July. The first test flight of the unproven Ariane 6 carried a batch of small, relatively inexpensive satellites. An Auxiliary Propulsion Unit (APU)—essentially a miniature second engine—on the upper stage shut down in the latter portion of the inaugural Ariane 6 flight, after the rocket reached orbit and released some of its payloads. But the unit malfunctioned before a third burn of the upper stage’s main engine, preventing the Ariane 6 from targeting a controlled reentry into the atmosphere.

The APU has several jobs on an Ariane 6 flight, including maintaining pressure inside the upper stage’s cryogenic propellant tanks, settling propellants before each main engine firing, and making fine adjustments to the rocket’s position in space. The APU appeared to work as designed Thursday, although this launch flew a less demanding profile than the test flight last year.

Is Ariane 6 the solution?

Ariane 6 has been exorbitantly costly and years late, but its first operational success comes at an opportune time for Europe.

Philippe Baptiste, France’s minister for research and higher education, says Ariane 6 is “proof of our space sovereignty,” as many European officials feel they can no longer rely on the United States. Baptiste, an engineer and former head of the French space agency, mentioned “sovereignty” so many times that turning his statement into a drinking game crossed my mind.

“The return of Donald Trump to the White House, with Elon Musk at his side, already has significant consequences on our research partnerships, on our commercial partnerships,” Baptiste said. “Should I mention the uncertainties weighing today on our cooperation with NASA and NOAA, when emblematic programs like the ISS (International Space Station) are being unilaterally questioned by Elon Musk?

“If we want to maintain our independence, ensure our security, and preserve our sovereignty, we must equip ourselves with the means for strategic autonomy, and space is an essential part of this,” he continued.

Philippe Baptiste arrives at a government question session at the Senate in Paris on March 5, 2025. Credit: Magali Cohen/Hans Lucas/AFP via Getty Images

Baptiste’s comments echo remarks from a range of European leaders in recent weeks.

French President Emmanuel Macron said in a televised address Wednesday night that the French were “legitimately worried” about European security after Trump reversed US policy on Ukraine. America’s NATO allies are largely united in their desire to continue supporting Ukraine in its defense against Russia’s invasion, while the Trump administration seeks a ceasefire that would require significant Ukrainian concessions.

“I want to believe that the United States will stay by our side, but we have to be prepared for that not to be the case,” Macron said. “The future of Europe does not have to be decided in Washington or Moscow.”

Friedrich Merz, set to become Germany’s next chancellor, said last month that Europe should strive to “achieve independence” from the United States. “It is clear that the Americans, at least this part of the Americans, this administration, are largely indifferent to the fate of Europe.”

Merz also suggested Germany, France, and the United Kingdom should explore cooperation on a European nuclear deterrent to replace that of the United States, which has committed to protecting European territory from Russian attack for more than 75 years. Macron said the French military, which runs the only nuclear forces in Europe fully independent of the United States, could be used to protect allies elsewhere on the continent.

Access to space is also a strategic imperative for Europe, and it hasn’t come cheap. ESA paid more than $4 billion to develop the Ariane 6 rocket as a cheaper, more capable replacement for the Ariane 5, which retired in 2023. There are still pressing questions about Ariane 6’s cost per launch and whether the rocket will ever be able to meet its price target and compete with SpaceX and other companies in the commercial market.

But European officials have freely admitted the commercial market is secondary on their list of Ariane 6 goals.

European satellite operators stopped launching their payloads on Russian rockets after the invasion of Ukraine in 2022. Now, with Elon Musk inserting himself into European politics, there’s little appetite among European government officials to launch their satellites on SpaceX’s Falcon 9 rocket.

The second Ariane 6 rocket on the launch pad in French Guiana. Credit: ESA–S. Corvaja

The Falcon 9 was the go-to choice for the European Space Agency, the European Union, and several national governments in Europe after they lost access to Russia’s Soyuz rocket and when Europe’s homemade Ariane 6 and Vega rockets faced lengthy delays. ESA launched a $1.5 billion space telescope on a Falcon 9 rocket in 2023, then returned to SpaceX to launch a climate research satellite and an asteroid explorer last year. The European Union paid SpaceX to launch four satellites for its flagship Galileo navigation network.

European space officials weren’t thrilled to do this. ESA was somewhat more accepting of the situation, with the agency’s director general recognizing Europe was suffering from an “acute launcher crisis” two years ago. On the other hand, the EU refused to even acknowledge SpaceX’s role in delivering Galileo satellites to orbit in the text of a post-launch press release.

“For this sovereignty, we must not yield to the temptation of preferring SpaceX or another competitor that may seem trendier, more reliable, or cheaper,” Baptiste said. “We did not yield for CSO-3, and we will not yield in the future. We cannot yield because doing so would mean closing the door to space for good, and there would be no turning back. This is why the first commercial launch of Ariane 6 is not just a technical and one-off success. It marks a new milestone, essential in the choice of European space independence and sovereignty.”

Two flights into its career, Ariane 6 seems to offer a technical solution for Europe’s needs. But at what cost? Arianespace hasn’t publicly disclosed the cost for an Ariane 6 launch, although it’s likely somewhere in the range of 80 million to 100 million euros, about 40 percent lower than the cost of an Ariane 5. This is about 50 percent more than SpaceX’s list price for a dedicated Falcon 9 launch.
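Taking €90 million as a midpoint, the stated percentages imply roughly the following (a back-of-the-envelope reading only; none of these prices are officially published):

```python
# Rough implied prices from the percentages quoted above (illustrative only).
ariane6 = 90.0                    # million euros, midpoint of the 80-100 range
ariane5 = ariane6 / (1 - 0.40)    # "about 40 percent lower" than Ariane 5 -> ~150
falcon9 = ariane6 / 1.5           # "about 50 percent more" than Falcon 9 -> ~60
print(round(ariane5), round(falcon9))  # 150 60 (million euros)
```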

A new wave of European startups should soon begin launching small rockets to gain a foothold in the continent’s launch industry. These include Isar Aerospace, which could launch its first Spectrum rocket in a matter of weeks. These companies have the potential to offer Europe an option for cheaper rides to space, but the startups won’t have a rocket in the class of Ariane 6 until at least the 2030s.

Until then, at least, European governments will have to pay more to guarantee autonomous access to space.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


Elon Musk loses initial attempt to block OpenAI’s for-profit conversion

A federal judge rejected Elon Musk’s request to block OpenAI’s planned conversion from a nonprofit to for-profit entity but expedited the case so that Musk’s core claims can be addressed in a trial before the end of this year.

Musk had filed a motion for preliminary injunction in US District Court for the Northern District of California, claiming that OpenAI’s for-profit conversion “violates the terms of Musk’s donations” to the company. But Musk failed to meet the burden of proof needed for an injunction, Judge Yvonne Gonzalez Rogers ruled yesterday.

“Plaintiffs Elon Musk, [former OpenAI board member] Shivon Zilis, and X.AI Corp. (‘xAI’) collectively move for a preliminary injunction barring defendants from engaging in various business activities, which plaintiffs claim violate federal antitrust and state law,” Rogers wrote. “The relief requested is extraordinary and rarely granted as it seeks the ultimate relief of the case on an expedited basis, with a cursory record, and without the benefit of a trial.”

Rogers said that “the Court is prepared to offer an expedited schedule on the core claims driving this litigation [to] address the issues which are allegedly more urgent in terms of public, not private, considerations.” There would be important public interest considerations if the for-profit shift is found to be illegal at a trial, she wrote.

Musk said OpenAI took advantage of him

Noting that OpenAI donors may have taken tax deductions from a nonprofit that is now turning into a for-profit enterprise, Rogers said the court “agrees that significant and irreparable harm is incurred when the public’s money is used to fund a non-profit’s conversion into a for-profit.” But as for the motion to block the for-profit conversion before a trial, “The request for an injunction barring any steps towards OpenAI’s conversion to a for-profit entity is DENIED.”


Shadowveil is a stylish, tough single-player auto-battler

One thing Shadowveil: Legend of the Five Rings does well is invoke terror. Not just the terror of an overwhelming mass of dark energy encroaching on your fortress, which is what the story suggests. More so, the terror of hoping your little computer-controlled fighters will do the smart thing, then being forced to watch, helpless, as they are consumed by algorithmic choices, bad luck, your strategies, or some combination of all three.

Shadowveil, the first video game based on the more than 30-year-old Legend of the Five Rings fantasy franchise, is a roguelite auto-battler. You pick your Crab Clan hero (berserker hammer-wielder or tactical support type), train up some soldiers, and assign all of them abilities, items, and buffs you earn as you go. When battle starts, you choose which hex to start your fighters on, double-check your load-outs, then click to start and watch what happens. You win and march on, or you lose and regroup at base camp, buying some upgrades with your last run’s goods.

Shadowveil: Legend of the Five Rings launch trailer.

In my impressions after roughly seven hours of playing, Shadowveil could do more to soften its learning curve, but it presents a mostly satisfying mix of overwhelming odds and achievement. What’s irksome now could get patched, and what’s already there is intriguing, especially for the price.

The hard-worn path to knowledge

There are almost always more enemies than you have fighters, so it’s your job to find efficiencies, choke points, and good soldier pairings. Credit: Palindrome Interactive

Some necessary disclosure: Auto-battlers are not one of my go-to genres. Having responsibility for all the prep, but no control over what fighters will actually do when facing a glut of enemies, can feel punishing, unfair, and only sometimes motivating to try something different. Add that chaos and uncertainty to procedurally generated paths (like in Slay the Spire), and sometimes the defeats felt like my fault, sometimes like the random number generator’s doing.

Losing is certainly anticipated in Shadowveil. The roguelite elements are the items and currencies you pick up from victories and carry back after defeat. With these, you can unlock new kinds of fighters, upgrade your squad members, and otherwise grease the skids for future runs. You’ll have to make tough choices here, as there are more than a half-dozen resources, some unique to each upgrade type, and some you might not pick up at all in any given run.


Eerily realistic AI voice demo sparks amazement and discomfort online


Sesame’s new AI voice model features uncanny imperfections, and it’s willing to act like an angry boss.

In late 2013, the Spike Jonze film Her imagined a future where people would form emotional connections with AI voice assistants. Nearly 12 years later, that fictional premise has veered closer to reality with the release of a new conversational voice model from AI startup Sesame that has left many users both fascinated and unnerved.

“I tried the demo, and it was genuinely startling how human it felt,” wrote one Hacker News user who tested the system. “I’m almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.”

In late February, Sesame released a demo for the company’s new Conversational Speech Model (CSM) that appears to cross over what many consider the “uncanny valley” of AI-generated speech, with some testers reporting emotional connections to the male or female voice assistant (“Miles” and “Maya”).

In our own evaluation, we spoke with the male voice for about 28 minutes, talking about life in general and how it decides what is “right” or “wrong” based on its training data. The synthesized voice was expressive and dynamic, imitating breath sounds, chuckles, interruptions, and even sometimes stumbling over words and correcting itself. These imperfections are intentional.

“At Sesame, our goal is to achieve ‘voice presence’—the magical quality that makes spoken interactions feel real, understood, and valued,” writes the company in a blog post. “We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding.”

Sometimes the model tries too hard to sound like a real human. In one demo posted online by a Reddit user called MetaKnowing, the AI model talks about craving “peanut butter and pickle sandwiches.”

An example of Sesame’s female voice model craving peanut butter and pickle sandwiches, captured by Reddit user MetaKnowing.

Founded by Brendan Iribe, Ankit Kumar, and Ryan Brown, Sesame AI has attracted significant backing from prominent venture capital firms. The company has secured investments from Andreessen Horowitz, led by Anjney Midha and Marc Andreessen, along with Spark Capital, Matrix Partners, and various founders and individual investors.

Browsing reactions to Sesame found online, we found many users expressing astonishment at its realism. “I’ve been into AI since I was a child, but this is the first time I’ve experienced something that made me definitively feel like we had arrived,” wrote one Reddit user. “I’m sure it’s not beating any benchmarks, or meeting any common definition of AGI, but this is the first time I’ve had a real genuine conversation with something I felt was real.” Many other Reddit threads express similar feelings of surprise, with commenters saying it’s “jaw-dropping” or “mind-blowing.”

While that sounds like a bunch of hyperbole at first glance, not everyone finds the Sesame experience pleasant. Mark Hachman, a senior editor at PCWorld, wrote about being deeply unsettled by his interaction with the Sesame voice AI. “Fifteen minutes after ‘hanging up’ with Sesame’s new ‘lifelike’ AI, and I’m still freaked out,” Hachman reported. He described how the AI’s voice and conversational style eerily resembled an old friend he had dated in high school.

Others have compared Sesame’s voice model to OpenAI’s Advanced Voice Mode for ChatGPT, saying that Sesame’s CSM features more realistic voices, and others are pleased that the model in the demo will roleplay angry characters, which ChatGPT refuses to do.

An example argument with Sesame’s CSM created by Gavin Purcell.

Gavin Purcell, co-host of the AI for Humans podcast, posted an example video on Reddit where the human pretends to be an embezzler and argues with a boss. It’s so dynamic that it’s difficult to tell who the human is and which one is the AI model. Judging by our own demo, it’s entirely capable of what you see in the video.

“Near-human quality”

Under the hood, Sesame’s CSM achieves its realism by using two AI models working together (a backbone and a decoder) based on Meta’s Llama architecture that processes interleaved text and audio. Sesame trained three AI model sizes, with the largest using 8.3 billion parameters (an 8 billion backbone model plus a 300 million parameter decoder) on approximately 1 million hours of primarily English audio.

Sesame’s CSM doesn’t follow the traditional two-stage approach used by many earlier text-to-speech systems. Instead of generating semantic tokens (high-level speech representations) and acoustic details (fine-grained audio features) in two separate stages, Sesame’s CSM integrates both steps into a single-stage, multimodal transformer-based model, jointly processing interleaved text and audio tokens to produce speech. OpenAI’s voice model uses a similar multimodal approach.
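As a loose illustration of what “jointly processing interleaved text and audio tokens” means, here is a hypothetical toy model in PyTorch. The vocabularies, layer sizes, and use of a generic encoder stack are invented for brevity; Sesame’s actual system is a much larger, Llama-based backbone paired with a separate audio decoder.

```python
# Hypothetical single-stage sketch: one transformer sees text and audio tokens together
# and predicts the next audio token directly, with no separate acoustic stage.
import torch
import torch.nn as nn

class ToyCSM(nn.Module):
    def __init__(self, text_vocab=32_000, audio_vocab=1024, dim=256):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.audio_emb = nn.Embedding(audio_vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)  # stand-in "backbone"
        self.audio_head = nn.Linear(dim, audio_vocab)                # stand-in "decoder"

    def forward(self, text_tokens, audio_tokens):
        # Concatenate text and audio tokens into one sequence and model them jointly.
        seq = torch.cat(
            [self.text_emb(text_tokens), self.audio_emb(audio_tokens)], dim=1
        )
        hidden = self.backbone(seq)
        return self.audio_head(hidden[:, -1])  # logits for the next audio token

model = ToyCSM()
text = torch.randint(0, 32_000, (1, 12))    # a short text prompt, as token IDs
audio = torch.randint(0, 1024, (1, 50))     # audio generated so far, as token IDs
next_audio_logits = model(text, audio)      # shape (1, 1024)
```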

In blind tests without conversational context, human evaluators showed no clear preference between CSM-generated speech and real human recordings, suggesting the model achieves near-human quality for isolated speech samples. However, when provided with conversational context, evaluators still consistently preferred real human speech, indicating a gap remains in fully contextual speech generation.

Sesame co-founder Brendan Iribe acknowledged current limitations in a comment on Hacker News, noting that the system is “still too eager and often inappropriate in its tone, prosody and pacing” and has issues with interruptions, timing, and conversation flow. “Today, we’re firmly in the valley, but we’re optimistic we can climb out,” he wrote.

Too close for comfort?

Despite CSM’s technological impressiveness, advancements in conversational voice AI carry significant risks for deception and fraud. The ability to generate highly convincing human-like speech has already supercharged voice phishing scams, allowing criminals to impersonate family members, colleagues, or authority figures with unprecedented realism. But adding realistic interactivity to those scams may take them to another level of potency.

Unlike current robocalls that often contain tell-tale signs of artificiality, next-generation voice AI could eliminate these red flags entirely. As synthetic voices become increasingly indistinguishable from human speech, you may never know who you’re talking to on the other end of the line. It’s inspired some people to share a secret word or phrase with their family for identity verification.

Although Sesame’s demo does not clone a person’s voice, future open source releases of similar technology could allow malicious actors to potentially adapt these tools for social engineering attacks. OpenAI itself held back its own voice technology from wider deployment over fears of misuse.

Sesame sparked a lively discussion on Hacker News about its potential uses and dangers. Some users reported having extended conversations with the two demo voices, with conversations lasting up to the 30-minute limit. In one case, a parent recounted how their 4-year-old daughter developed an emotional connection with the AI model, crying after not being allowed to talk to it again.

The company says it plans to open-source “key components” of its research under an Apache 2.0 license, enabling other developers to build upon their work. Their roadmap includes scaling up model size, increasing dataset volume, expanding language support to over 20 languages, and developing “fully duplex” models that better handle the complex dynamics of real conversations.

You can try the Sesame demo on the company’s website, assuming that it isn’t too overloaded with people who want to simulate a rousing argument.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


George Orwell’s 1984 as a ’90s PC game has to be seen to be believed

Quick, to the training sphere!

The Big Brother announcement promised the ability to “interact with everything” and “disable and destroy intrusive tele-screens and spy cameras watching the player’s every move” across “10 square blocks of Orwell’s retro-futuristic world.” But footage from the demo falls well short of that promise, instead covering some extremely basic Riven-style puzzle gameplay (flip switches to turn on the power, use a screwdriver to open the grate, etc.) played from a first-person view.

Sample gameplay from the newly unearthed Big Brother demo.

It all builds up to a sequence where (according to a walk-through included on the demo disc) you have to put on a “zero-g suit” before planting a bomb inside a “zero gravity training sphere” guarded by robots. Sounds like inhabiting the world of the novel to us!

Aside from the brief mentions of the Thought Police and Minipax, the short demo does include a few other incidental nods to its licensed source material, including a “WAR IS PEACE” propaganda banner and an animated screen with the titular Big Brother seemingly looking down on you. Still, the entire gameplay scenario is so far removed from anything in the actual 1984 novel that it makes you wonder why they bothered with the license in the first place. Of course, MediaX answers that question in the game’s announcement, predicting that “while the game stands on its own as an entirely new creation in itself and will attract the typical game audience, the ‘Big Brother’ game will undoubtedly also attract a large literary audience.”

We sadly never got the chance to see how that “large literary audience” would have reacted to a game that seemed poised to pervert both the name and themes of 1984 so radically. In any case, this demo can now sit alongside the release of 1984’s Fahrenheit 451 and 1992’s The Godfather: The Action Game on any list of the most questionable game adaptations of respected works of art.


Netflix drops trailer for the Russo brothers’ The Electric State

Millie Bobby Brown and Chris Pratt star in the Netflix original film The Electric State.

Anthony and Joe Russo have their hands full these days with the Marvel films Avengers: Doomsday and Avengers: Secret War, slated for 2026 and 2027 releases, respectively. But we’ll get a chance to see another, smaller film from the directors this month on Netflix: The Electric State, adapted from the graphic novel by Swedish artist/designer Simon Stålenhag.

Stålenhag’s stunningly surreal neofuturistic art—featured in his narrative art books, 2014’s Tales from the Loop and 2016’s Things From the Flood—inspired the 2020 eight-episode series Tales From the Loop, in which residents of a rural town find themselves grappling with strange occurrences thanks to the presence of an underground particle accelerator. That adaptation captured the mood and tone of the art that inspired it and received Emmy nominations for cinematography and special visual effects.

The Electric State was Stålenhag’s third such book, published in 2018 and set in a similar dystopian, ravaged landscape. Paragraphs of text, accompanied by larger artworks, tell the story of a teen girl named Michelle who must travel across the country with her robot companion to find her long-lost brother, while being pursued by a federal agent. The Russo brothers acquired the rights early on and initially intended to make the film with Universal, but when the studio decided it would not be giving the film a theatrical release, Netflix bought the distribution rights.

It’s worth noting that the Russo brothers have made several major plot changes from the source material, a decision that did not please Stålenhag’s many fans, particularly since the first-look images revealed that the directors were also adopting more of a colorful 1990s aesthetic than the haunting art that originally inspired their film. Per the official premise:


These hot oil droplets can bounce off any surface

The Hong Kong physicists were interested in hot droplets striking cold surfaces. Prior research showed there was less of a bouncing effect in such cases involving heated water droplets, with the droplets sticking to the surface instead thanks to various factors such as reduced droplet surface tension. The Hong Kong team discovered they could achieve enhanced bouncing by using hot droplets of less volatile liquids—namely, n-hexadecane, soybean oil, and silicone oil, which have lower saturation pressures compared to water.

Follow the bouncing droplet

The researchers tested these hot droplets (as well as burning and normal temperature droplets) on various solid, cold surfaces, including scratched glass, smooth glass, acrylic surfaces, surfaces with liquid-repellant coatings from candle soot, and surfaces coated with nanoparticles with varying “wettability” (i.e., how well particles stick to the surface). They captured the droplet behavior with both high-speed and thermal cameras, augmented with computer modeling.

The room-temperature droplets stuck to all the surfaces as expected, but the hot and burning droplets bounced. The team found that the bottom of a hot droplet cools faster than the top as it approaches a room-temperature surface, which causes hotter liquid within the droplet to flow from the edges toward the bottom. The air that is dragged to the bottom with it forms a thin cushion there and prevents the droplet from making contact with the surface, bouncing off instead. They dubbed the behavior “self-lubricated bouncing.”

“It is now clear that droplet-bouncing strategies are not isolated to engineering the substrate and that the thermophysical properties of droplets themselves are critical,” Jonathan B. Boreyko of Virginia Tech, who was not involved in the research, wrote in an accompanying commentary.

Future applications include improving the combustion efficiency of fuels or developing better fire-retardant coatings. “If burning droplets can’t stick to surfaces, they won’t be able to ignite new materials and allow fires to propagate,” co-author Pingan Zhu said. “Our study could help protect flammable materials like textiles from burning droplets. Confining fires to a smaller area and slowing their spread could give firefighters more time to put them out.”

DOI: Newton, 2025. 10.1016/j.newton.2025.100014  (About DOIs).


The modern era of low-flying satellites may begin this week

Clarity-1 at the pad

Albedo’s first big test may come within the next week with the launch of the “Transporter-13” mission on SpaceX’s Falcon 9 rocket. The company’s first satellite, Clarity-1, weighs 530 kg (1,170 pounds) and is riding atop the stack of ridesharing spacecraft. The mission could launch as soon as this coming weekend from Vandenberg Space Force Base in California.

The Clarity-1 satellite will be dropped off in an orbit between 500 and 600 km and will then attempt to lower itself to an operational orbit 274 km (170 miles) above the planet.

This is a full-up version of Albedo’s satellite design. The spacecraft is larger than a full-size refrigerator, similar to a phone booth, and is intended to operate for a lifetime of about five years, depending on the solar cycle. Clarity-1 is launching near the peak of the 11-year solar cycle, so this could reduce its active lifetime.

Albedo recently won a contract from the US Air Force Research Laboratory that is worth up to $12 million to share VLEO-specific, on-orbit data and provide analysis to support the development of new missions and payloads beyond its own optical sensors.

Serving many different customers

The advantages of such a platform include superior image quality, less congested orbits, and natural debris removal as inoperable satellites are pulled down into Earth’s atmosphere and burnt up.

But what about the drawbacks? In orbits closer to Earth the primary issue is atomic oxygen, which is highly reactive and energetic. There are also plasma eddies and other phenomena that interfere with the operation of satellites and degrade their materials. This makes VLEO far more hazardous than higher altitudes. It’s also more difficult to capture precise imagery.

“The hardest part is pointing and attitude control,” Haddad said, “because that’s already hard in LEO, when you have a big telescope and you’re trying to get a high resolution. Then you put it in VLEO, where the Earth’s rotation beneath is moving faster, and it just exacerbates the problem.”

In the next several years, Albedo is likely to reach a constellation sized at about 24 satellites, but that number will depend on customer demand, Haddad said. Albedo has previously announced about half a dozen of its commercial customers who will task Clarity-1 for various purposes, such as power and pipeline monitoring or solar farm maintenance.

But first, it has to demonstrate its technology.


AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion into the company.

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI’s models to train its competitor, a move that would be against its terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add they are more limited.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to many of the business models of leading AI firms. Even if developers use distilled models from companies like OpenAI, those models cost far less to run, are less expensive to create, and therefore generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models as they require less computational load.


Commercials are still too loud, say “thousands” of recent FCC complaints

Streaming ads could get muzzled, too

As you may have noticed—either through the text of this article or your own ears—the Calm Act doesn’t apply to streaming services. And because the Calm Act doesn’t affect commercials viewed on the Internet, online services providing access to broadcast channels, like YouTube TV and Sling, don’t have to follow the rules. This is despite such services distributing the same content as linear TV providers.

For years, this made sense. The majority of TV viewing occurred through broadcast, cable, or satellite access. Further, services like Netflix and Amazon Prime Video used to be considered safe havens from constant advertisements. But today, streaming services are more popular than ever and have grown to love ads, which have become critical to most platforms’ business models. Further, many streaming services are airing more live events. These events, like sports games, show commercials to all subscribers, even those with a so-called “ad-free” subscription.

Separate from the Calm Act violation complaints, the FCC noted this month that other recent complaints it has seen illustrate “growing concern with the loudness of commercials on streaming services and other online platforms.” If the FCC decides to apply Calm Act rules to the web, it would need to create new methods for ensuring compliance, it said.

Nielsen’s most recent data on how people watch TV. Credit: Nielsen

The FCC didn’t specify what’s behind the spike in consumers’ commercial complaints. Perhaps with declining audiences, traditional TV providers thought it would be less likely for anyone to notice and formally complain about Ozempic ads shouting at them. Twelve years have passed since the rules took effect, so it’s also possible that organizations are getting lackadaisical about ensuring compliance or have dwindling resources.

With Americans spending similar amounts of time—if not longer—watching TV online versus via broadcast, cable, and satellite, the Calm Act would have to take on the web in order to maximize effectiveness. The streaming industry is young, though, and operates differently than linear TV distribution, presenting new regulation challenges.


Microsoft brings an official Copilot app to macOS for the first time

It took a couple of years, but it happened: Microsoft released its Copilot AI assistant as an application for macOS. The app is available for download for free from the Mac App Store right now.

It was previously available briefly as a Mac app, sort of; for a short time, Microsoft’s iPad Copilot app could run on the Mac, but access on the Mac was quickly disabled. Mac users have been able to use a web-based interface for a while.

Copilot initially launched on the web and in web browsers (Edge, obviously) before making its way onto iOS and Android last year. It has since been slotted into all sorts of first-party Microsoft software, too.

The Copilot app joins a trend already spearheaded by OpenAI’s ChatGPT and Anthropic’s Claude of bringing native AI assistant apps to the macOS platform. Like those, it enables an OS-wide keyboard shortcut to invoke a field for starting a chat at any time. It offers most of the same use cases: translating or summarizing text, answering questions, preparing reports and documents, solving coding problems or generating scripts, brainstorming, and so on.

Copilot uses OpenAI models like GPT-4 and DALL-E 3 (yes, it generates images, too) alongside others like Microsoft’s in-house Prometheus. Microsoft has invested significant amounts of money into OpenAI in recent years as the basis for Copilot and basically everything in its AI strategy.

Like Apple’s own built-in generative AI features, Copilot for macOS requires an M1 or later Mac. It also requires users to run macOS 14 or later.
