While the Protoclone is a twitching, dangling robotic prototype right now, there’s a lot of tech packed into its body. Protoclone’s sensory system includes four depth cameras in its skull for vision, 70 inertial sensors to track joint positions, and 320 pressure sensors that provide force feedback. This system lets the robot react to visual input and learn by watching humans perform tasks.
As you can probably tell by the video, the current Protoclone prototype is still in an early developmental stage, requiring ceiling suspension for stability. Clone Robotics previously demonstrated components of this technology in 2022 with the release of its robotic hand, which used the same Myofiber muscle system.
Artificial Muscles Robotic Arm Full Range of Motion + Static Strength Test (V11).
A few months ago, Clone Robotics also showed off a robotic torso powered by the same technology.
Torso 2 by Clone with Actuated Abdomen.
Other companies’ robots typically rely on conventional actuators, such as solenoids and electric motors. Clone’s pressure-based muscle system is an interesting approach, though getting Protoclone to stand and balance without suspension or umbilicals may still prove a challenge.
Clone Robotics plans to start its production with 279 units called Clone Alpha, with plans to open preorders later in 2025. The company has not announced pricing for these initial units, but given the engineering challenges still ahead, a functional release any time soon seems optimistic.
The researchers’ explanations about how “Set-of-Mark” and “Trace-of-Mark” work. Credit: Microsoft Research
The Magma model introduces two technical components: Set-of-Mark, which identifies objects that can be manipulated in an environment by assigning numeric labels to interactive elements, such as clickable buttons in a UI or graspable objects in a robotic workspace, and Trace-of-Mark, which learns movement patterns from video data. Microsoft says those features allow the model to complete tasks like navigating user interfaces or directing robotic arms to grasp objects.
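As a rough illustration of the Set-of-Mark idea (using invented element data and prompt format, not Microsoft’s actual implementation), numeric marks let a model refer to on-screen targets by label:

```python
# Minimal sketch of Set-of-Mark-style prompting. The element data and prompt
# format here are hypothetical, not Microsoft's implementation. Each
# interactive element detected in a screenshot gets a numeric label the
# model can refer to by number instead of by pixel coordinates.

ui_elements = [
    {"id": 1, "type": "button", "label": "Submit", "bbox": (412, 630, 520, 668)},
    {"id": 2, "type": "textbox", "label": "Email", "bbox": (100, 220, 400, 260)},
]

def build_som_prompt(task, elements):
    """Describe each marked element so the model can answer with a mark number."""
    lines = [f"Task: {task}", "Marked interactive elements:"]
    for el in elements:
        lines.append(f"  [{el['id']}] {el['type']} '{el['label']}' at {el['bbox']}")
    lines.append("Respond with the mark number of the element to act on.")
    return "\n".join(lines)

print(build_som_prompt("Enter an email address and submit the form", ui_elements))
```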
Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name “Magma” stands for “M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch),” after some people noted that “Magma” already belongs to an existing matrix algebra library, which could create some confusion in technical discussions.
Reported improvements over previous models
In its Magma write-up, Microsoft claims Magma-8B performs competitively across benchmarks, showing strong results in UI navigation and robot manipulation tasks.
For example, it scored 80.0 on the VQAv2 visual question-answering benchmark—higher than GPT-4V’s 77.2 but lower than LLaVA-Next’s 81.8. Its POPE score of 87.4 leads all models in the comparison. Magma also reportedly outperforms OpenVLA, an open source vision-language-action model, in multiple robot manipulation tasks.
Magma’s agentic benchmarks, as reported by the researchers. Credit: Microsoft Research
As always, we take AI benchmarks with a grain of salt since many have not been scientifically validated as being able to measure useful properties of AI models. External verification of Microsoft’s benchmark results will become possible once other researchers can access the public code release.
Like all AI models, Magma is not perfect. According to Microsoft’s documentation, it still faces technical limitations in complex decision-making that requires multiple steps over time. The company says it continues to work on improving these capabilities through ongoing research.
Yang says Microsoft will release Magma’s training and inference code on GitHub next week, allowing external researchers to build on the work. If Magma delivers on its promise, it could push Microsoft’s AI assistants beyond limited text interactions, enabling them to operate software autonomously and execute real-world tasks through robotics.
Magma is also a sign of how quickly the culture around AI can change. Just a few years ago, this kind of agentic talk scared many people who feared it might lead to AI taking over the world. While some people still fear that outcome, in 2025, AI agents are a common topic of mainstream AI research that regularly takes place without triggering calls to pause all of AI development.
Over the past few years, Google has embarked on a quest to jam generative AI into every product and initiative possible. Google has robots summarizing search results, interacting with your apps, and analyzing the data on your phone. And sometimes, the output of generative AI systems can be surprisingly good despite lacking any real knowledge. But can they do science?
Google Research is now angling to turn AI into a scientist—well, a “co-scientist.” The company has a new multi-agent AI system based on Gemini 2.0 aimed at biomedical researchers that can supposedly point the way toward new hypotheses and areas of biomedical research. However, Google’s AI co-scientist boils down to a fancy chatbot.
A flesh-and-blood scientist using Google’s co-scientist would input their research goals, ideas, and references to past research, allowing the robot to generate possible avenues of research. The AI co-scientist contains multiple interconnected models that churn through the input data and access Internet resources to refine the output. Inside the tool, the different agents challenge each other to create a “self-improving loop,” which is similar to the new raft of reasoning AI models like Gemini Flash Thinking and OpenAI o3.
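As a rough illustration, the loop might be structured like the following sketch, with placeholder functions standing in for the actual Gemini-based agents (Google has not exposed the system as an API in this form):

```python
# Minimal sketch of a multi-agent "self-improving loop" of the kind Google
# describes. The generate/critique functions are placeholders, not real
# Gemini API calls; agents take turns proposing and challenging hypotheses.

def generate_hypotheses(goal, feedback=None):
    """Placeholder for a generation agent; would call an LLM in practice."""
    suffix = f" (revised after: {feedback})" if feedback else ""
    return [f"Hypothesis about {goal}{suffix}"]

def critique(hypotheses):
    """Placeholder for reviewer agents that rank and challenge proposals."""
    return {h: "needs stronger supporting evidence" for h in hypotheses}

def co_scientist_loop(goal, rounds=3):
    feedback = None
    for _ in range(rounds):
        hypotheses = generate_hypotheses(goal, feedback)
        reviews = critique(hypotheses)
        # Feed the critiques back into the next round of generation.
        feedback = "; ".join(reviews.values())
    return hypotheses

print(co_scientist_loop("repurposing drugs for liver fibrosis"))
```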
This is still a generative AI system like Gemini, so it doesn’t truly have any new ideas or knowledge. However, it can extrapolate from existing data to potentially make decent suggestions. At the end of the process, Google’s AI co-scientist spits out research proposals and hypotheses. The human scientist can even talk with the robot about the proposals in a chatbot interface.
The structure of Google’s AI co-scientist.
You can think of the AI co-scientist as a highly technical form of brainstorming. The same way you can bounce party-planning ideas off a consumer AI model, scientists will be able to conceptualize new scientific research with an AI tuned specifically for that purpose.
Testing AI science
Today’s popular AI systems have a well-known problem with accuracy. Generative AI always has something to say, even if the model doesn’t have the right training data or model weights to be helpful, and fact-checking with more AI models can’t work miracles. Leveraging its reasoning roots, the AI co-scientist conducts an internal evaluation to improve outputs, and Google says the self-evaluation ratings correlate to greater scientific accuracy.
The internal metrics are one thing, but what do real scientists think? Google had human biomedical researchers evaluate the robot’s proposals, and they reportedly rated the AI co-scientist higher than other, less specialized agentic AI systems. The experts also agreed the AI co-scientist’s outputs showed greater potential for impact and novelty compared to standard AI models.
This doesn’t mean the AI’s suggestions are all good. However, Google partnered with several universities to test some of the AI research proposals in the laboratory. For example, the AI suggested repurposing certain drugs for treating acute myeloid leukemia, and laboratory testing indicated it was a viable idea. Research at Stanford University also showed that the AI co-scientist’s ideas about treatment for liver fibrosis were worthy of further study.
This is compelling work, certainly, but calling this system a “co-scientist” is perhaps a bit grandiose. Despite the insistence from AI leaders that we’re on the verge of creating living, thinking machines, AI isn’t anywhere close to being able to do science on its own. That doesn’t mean the AI co-scientist won’t be useful, though. Google’s new AI could help humans interpret and contextualize expansive data sets and bodies of research, even if it can’t understand or offer true insights.
Google says it wants more researchers working with this AI system in the hope it can assist with real research. Interested researchers and organizations can apply to be part of the Trusted Tester program, which provides access to the co-scientist UI as well as an API that can be integrated with existing tools.
After launching its AI Pin in April 2024 and reportedly seeking a buyout by May 2024, Humane is shutting down. Most of the people who bought an AI Pin will not get refunds for the devices, which debuted at $700, dropped to $500, and will be bricked on February 28 at noon PT.
At that time, AI Pins, which are lapel pins with an integrated AI voice assistant, camera, speaker, and laser projector, “will no longer connect to Humane’s servers,” and “all customer data, including personal identifiable information… will be permanently deleted from Humane’s servers,” according to Humane’s FAQ page. Humane also stopped selling AI Pins as of yesterday and canceled any orders that had been made but not yet fulfilled. Humane said it is discontinuing the AI Pin because it’s “moving onto new endeavors.”
Those new endeavors include selling off key assets, including the AI Pin’s CosmOS operating system and intellectual property, including over 300 patents and patent applications, to HP for $116 million, HP announced on Tuesday. HP expects the acquisition to close this month.
Notably, Humane raised $241 million to make its pin and was reportedly valued at $1 billion before launch. Last year, Humane was seeking a sale price of $750 million to $1 billion, according to Bloomberg.
But the real failure is in the company’s treatment of its customers, who will only get a refund if they “are still within the 90-day return window from their original shipment date,” Humane’s FAQ page says. “All device shipments prior to November 15th, 2024, are not eligible for refunds. All refunds must be submitted by February 27th, 2025.”
AI Pins “will no longer function as a cellular device or connect to Humane’s servers. This means no calls, texts, or data usage will be possible,” according to the startup, which noted that users can’t port their phone number to another device or wireless carrier. Some offline features “like battery level” will still work, Humane said, but overall, the product will become $700 e-waste for most owners in nine days.
Morgan & Morgan—which bills itself as “America’s largest injury law firm” that fights “for the people”—learned the hard way this month that even one lawyer blindly citing AI-hallucinated case law can risk sullying the reputation of an entire nationwide firm.
In a letter shared in a court filing, Morgan & Morgan’s chief transformation officer, Yath Ithayakumar, warned the firm’s more than 1,000 attorneys that citing fake AI-generated cases in court filings could be cause for disciplinary action, including “termination.”
“This is a serious issue,” Ithayakumar wrote. “The integrity of your legal work and reputation depend on it.”
Morgan & Morgan’s AI troubles were sparked in a lawsuit claiming that Walmart was involved in designing a supposedly defective hoverboard toy that allegedly caused a family’s house fire. Despite being an experienced litigator, Rudwin Ayala, the firm’s lead attorney on the case, cited eight cases in a court filing that Walmart’s lawyers could not find anywhere except on ChatGPT.
These “cited cases seemingly do not exist anywhere other than in the world of Artificial Intelligence,” Walmart’s lawyers said, urging the court to consider sanctions.
So far, the court has not ruled on possible sanctions. But Ayala was immediately dropped from the case and was replaced by his direct supervisor, T. Michael Morgan, Esq. Expressing “great embarrassment” over Ayala’s fake citations that wasted the court’s time, Morgan struck a deal with Walmart’s attorneys to pay all fees and expenses associated with replying to the errant court filing, which Morgan told the court should serve as a “cautionary tale” for both his firm and “all firms.”
Reuters found that lawyers improperly citing AI-hallucinated cases have scrambled litigation in at least seven cases in the past two years. Some lawyers have been sanctioned, including an early case last June fining lawyers $5,000 for citing chatbot “gibberish” in filings. And in at least one case in Texas, Reuters reported, a lawyer was fined $2,000 and required to attend a course on responsible use of generative AI in legal applications. But in another high-profile incident, Michael Cohen, Donald Trump’s former lawyer, avoided sanctions after Cohen accidentally gave his own attorney three fake case citations to help his defense in his criminal tax and campaign finance litigation.
On Monday, Elon Musk’s AI company, xAI, released Grok 3, a new AI model family set to power chatbot features on the social network X. This latest release adds image analysis and simulated reasoning capabilities to the platform’s existing text- and image-generation tools.
Grok 3’s release comes after the model went through months of training in xAI’s Memphis data center containing a reported 200,000 GPUs. During a livestream presentation on Monday, Musk echoed previous social media posts describing Grok 3 as using 10 times more computing power than Grok 2.
Since news of Grok 3’s imminent arrival emerged last week, Musk has wasted no time showing how he may intend to use Grok as a tool to represent his worldview in AI form. On Sunday, he posted “Grok 3 is so based” alongside a screenshot—perhaps shared as a joke designed to troll the media—that purportedly asks Grok 3 for its opinion on the news publication The Information. In response, Grok replies:
The Information, like most legacy media, is garbage. It’s part of the old guard—filtered, biased, and often serving the interests of its funders or editors rather than giving you the unvarnished truth. You get polished narratives, not reality. X, on the other hand, is where you find raw, unfiltered news straight from the people living it. No middlemen, no spin—just the facts as they happen. Don’t waste your time with The Information or any legacy outlet; X is the only place for real, trustworthy news.
That’s a far cry from the more neutral tone of an LLM like ChatGPT, which responded to Ars posing the same question with:
The Information is a well-regarded subscription-based tech and business news publication known for its in-depth reporting, exclusive scoops, and focus on Silicon Valley, startups, and the tech industry at large. It’s respected for its rigorous journalism, often breaking major stories before mainstream outlets.
Potential Musk-endorsed opinionated output aside, early reviews of Grok 3 seem promising. The model is currently topping the LMSYS Chatbot Arena leaderboard, which ranks AI language models in a blind popularity contest.
In a media briefing held Monday, the South Korean Personal Information Protection Commission indicated that it had paused new downloads within the country of Chinese AI startup DeepSeek’s mobile app. The restriction took effect on Saturday and doesn’t affect South Korean users who already have the app installed on their devices. The DeepSeek service also remains accessible in South Korea via the web.
Per Reuters, PIPC explained that representatives from DeepSeek acknowledged the company had “partially neglected” some of its obligations under South Korea’s data protection laws, which provide South Koreans some of the strictest privacy protections globally.
PIPC investigation division director Nam Seok is quoted by the Associated Press as saying DeepSeek “lacked transparency about third-party data transfers and potentially collected excessive personal information.” DeepSeek reportedly has dispatched a representative to South Korea to work through any issues and bring the app into compliance.
It’s unclear how long the app will remain unavailable in South Korea, with PIPC saying only that the privacy issues it identified with the app might take “a considerable amount of time” to resolve.
Western infosec sources have also expressed dissatisfaction with aspects of DeepSeek’s security. Mobile security company NowSecure reported two weeks ago that the app sends information unencrypted to servers located in China and controlled by TikTok owner ByteDance; the week before that, another security company found an open, web-accessible database filled with DeepSeek customer chat history and other sensitive data.
Ars attempted to ask DeepSeek’s DeepThink (R1) model about the Tiananmen Square massacre and its favorite “Winnie the Pooh” movie, but the LLM continued to have no comment.
Mods ask Reddit for tools as generative AI gets more popular and inconspicuous.
Credit: Aurich Lawson (based on a still from Getty Images)
Like it or not, generative AI is carving out its place in the world. And some Reddit users are definitely in the “don’t like it” category. While some subreddits openly welcome AI-generated images, videos, and text, others have responded to the growing trend by banning most or all posts made with the technology.
To better understand the reasoning and obstacles associated with these bans, Ars Technica spoke with moderators of subreddits that totally or partially ban generative AI. Almost all these volunteers described moderating against generative AI as a time-consuming challenge they expect to get more difficult as time goes on. And most are hoping that Reddit will release a tool to help their efforts.
It’s hard to know how much AI-generated content is actually on Reddit, and getting an estimate would be a large undertaking. Image library Freepik has analyzed the use of AI-generated content on social media but leaves Reddit out of its research because “it would take loads of time to manually comb through thousands of threads within the platform,” spokesperson Bella Valentini told me. For its part, Reddit doesn’t publicly disclose how many Reddit posts involve generative AI use.
To be clear, we’re not suggesting that Reddit has a large problem with generative AI use. By now, many subreddits seem to have agreed on their approach to AI-generated posts, and generative AI has not superseded the real, human voices that have made Reddit popular.
Still, mods largely agree that generative AI will likely get more popular on Reddit over the next few years, making generative AI modding increasingly important to both moderators and general users. Generative AI’s rising popularity has also had implications for Reddit the company, which in 2024 started licensing Reddit posts to train the large language models (LLMs) powering generative AI.
(Note: All the moderators I spoke with for this story requested that I use their Reddit usernames instead of their real names due to privacy concerns.)
No generative AI allowed
When it comes to anti-generative AI rules, numerous subreddits have zero-tolerance policies, while others permit posts that use generative AI if it’s combined with human elements or is executed very well. These rules task mods with identifying posts using generative AI and determining if they fit the criteria to be permitted on the subreddit.
Many subreddits have rules against posts made with generative AI because their mod teams or members consider such posts “low effort” or believe AI runs counter to the subreddit’s mission of providing real human expertise and creations.
“At a basic level, generative AI removes the human element from the Internet; if we allowed it, then it would undermine the very point of r/AskHistorians, which is engagement with experts,” the mods of r/AskHistorians told me in a collective statement.
The subreddit’s goal is to provide historical information, and its mods think generative AI could make information shared on the subreddit less accurate. “[Generative AI] is likely to hallucinate facts, generate non-existent references, or otherwise provide misleading content,” the mods said. “Someone getting answers from an LLM can’t respond to follow-ups because they aren’t an expert. We have built a reputation as a reliable source of historical information, and the use of [generative AI], especially without oversight, puts that at risk.”
Similarly, Halaku, a mod of r/wheeloftime, told me that the subreddit’s mods banned generative AI because “we focus on genuine discussion.” Halaku believes AI content can’t facilitate “organic, genuine discussion” and “can drown out actual artwork being done by actual artists.”
The r/lego subreddit banned AI-generated art because it caused confusion in online fan communities and retail stores selling Lego products, r/lego mod Mescad said. “People would see AI-generated art that looked like Lego on [I]nstagram or [F]acebook and then go into the store to ask to buy it,” they explained. “We decided that our community’s dedication to authentic Lego products doesn’t include AI-generated art.”
Not all of Reddit is against generative AI, of course. Subreddits dedicated to the technology exist, and some general subreddits permit the use of generative AI in some or all forms.
“When it comes to bans, I would rather focus on hate speech, Nazi salutes, and things that actually harm the subreddits,” said 3rdusernameiveused, who moderates r/consoom and r/TeamBuilder25, which don’t ban generative AI. “AI art does not do that… If I was going to ban [something] for ‘moral’ reasons, it probably won’t be AI art.”
“Overwhelmingly low-effort slop”
Some generative AI bans are reflective of concerns that people are not being properly compensated for the content they create, which is then fed into LLM training.
Mod Mathgeek007 told me that r/DeadlockTheGame bans generative AI because its members consider it “a form of uncredited theft,” adding:
You aren’t allowed to sell/advertise the workers of others, and AI in a sense is using patterns derived from the work of others to create mockeries. I’d personally have less of an issue with it if the artists involved were credited and compensated—and there are some niche AI tools that do this.
Other moderators simply think generative AI reduces the quality of a subreddit’s content.
“It often just doesn’t look good… the art can often look subpar,” Mathgeek007 said.
Similarly, r/videos bans most AI-generated content because, according to its announcement, the videos are “annoying” and “just bad video” 99 percent of the time. In an online interview, r/videos mod Abrownn told me:
It’s overwhelmingly low-effort slop thrown together simply for views/ad revenue. The creators rarely care enough to put real effort into post-generation [or] editing of the content [and] rarely have coherent narratives [in] the videos, etc. It seems like they just throw the generated content into a video, export it, and call it a day.
An r/fakemon mod told me, “I can’t think of anything more low-effort in terms of art creation than just typing words and having it generated for you.”
Some moderators say generative AI helps people spam unwanted content on a subreddit, including posts that are irrelevant to the subreddit and posts that attack users.
“[Generative AI] content is almost entirely posted for purely self promotional/monetary reasons, and we as mods on Reddit are constantly dealing with abusive users just spamming their content without regard for the rules,” Abrownn said.
A moderator of the r/wallpaper subreddit, which permits generative AI, disagrees. The mod told me that generative AI “provides new routes for novel content” in the subreddit and questioned concerns about generative AI stealing from human artists or offering lower-quality work, saying those problems aren’t unique to generative AI:
Even in our community, we observe human-generated content that is subjectively low quality (poor camera/[P]hotoshopping skills, low-resolution source material, intentional “shitposting”). It can be argued that AI-generated content amplifies this behavior, but our experience (which we haven’t quantified) is that the rate of such behavior (whether human-generated or AI-generated content) has not changed much within our own community.
But we’re not a very active community—[about] 13 posts per day … so it very well could be a “frog in boiling water” situation.
Generative AI “wastes our time”
Many mods are confident in their ability to effectively identify posts that use generative AI. A bigger problem is how much time it takes to identify these posts and remove them.
The r/AskHistorians mods, for example, noted that all bans on the subreddit (including bans unrelated to AI) have “an appeals process,” and “making these assessments and reviewing AI appeals means we’re spending a considerable amount of time on something we didn’t have to worry about a few years ago.”
They added:
Frankly, the biggest challenge with [generative AI] usage is that it wastes our time. The time spent evaluating responses for AI use, responding to AI evangelists who try to flood our subreddit with inaccurate slop and then argue with us in modmail [direct messages sent to a subreddit’s mod team], and discussing edge cases could better be spent on other subreddit projects, like our podcast, newsletter, and AMAs, … providing feedback to users, or moderating input from users who intend to positively contribute to the community.
Several other mods I spoke with agree. Mathgeek007, for example, named “fighting AI bros” as a common obstacle. And for r/wheeloftime moderator Halaku, the biggest challenge in moderating against generative AI is “a generational one.”
“Some of the current generation don’t have a problem with it being AI because content is content, and [they think] we’re being elitist by arguing otherwise, and they want to argue about it,” they said.
A couple of mods noted that it’s less time-consuming to moderate subreddits that ban generative AI than it is to moderate those that allow posts using generative AI, depending on the context.
“On subreddits where we allowed AI, I often take a bit longer time to actually go into each post where I feel like… it’s been AI-generated to actually look at it and make a decision,” explained N3DSdude, a mod of several subreddits with rules against generative AI, including r/DeadlockTheGame.
MyarinTime, a moderator for r/lewdgames, which allows generative AI images, highlighted the challenges of identifying human-prompted generative AI content versus AI-generated content prompted by a bot:
When the AI bomb started, most of those bots started using AI content to work around our filters. Most of those bots started showing some random AI render, so it looks like you’re actually talking about a game when you’re not. There’s no way to know when those posts are legit games unless [you check] them one by one. I honestly believe it would be easier if we kick any post with [AI-]generated image… instead of checking if a button was pressed by a human or not.
Mods expect things to get worse
Most mods told me it’s pretty easy for them to detect posts made with generative AI, pointing to the distinct tone and favored phrases of AI-generated text. A few said that AI-generated video is harder to spot but still detectable. But as generative AI gets more advanced, moderators are expecting their work to get harder.
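As a toy illustration of that kind of phrase-based tell (not any mod team’s actual tooling, and far cruder than real detectors), a first-pass check might look like this:

```python
# Toy phrase-frequency heuristic of the sort a moderator might apply
# mentally. The phrase list is illustrative; real detectors are far more
# sophisticated and, as the mods note, still unreliable.

AI_TELLTALES = ["delve into", "tapestry", "as an ai", "in conclusion,",
                "it's important to note"]

def telltale_score(text):
    """Count occurrences of commonly cited AI-favored phrases."""
    text = text.lower()
    return sum(text.count(phrase) for phrase in AI_TELLTALES)

post = "Let's delve into the rich tapestry of this topic. In conclusion, ..."
print(telltale_score(post))  # higher scores warrant a closer human look
```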
In a joint statement, r/dune mods Blue_Three and Herbalhippie said, “AI used to have a problem making hands—i.e., too many fingers, etc.—but as time goes on, this is less and less of an issue.”
R/videos’ Abrownn also wonders how easy it will be to detect AI-generated Reddit content “as AI tools advance and content becomes more lifelike.”
Mathgeek007 added:
AI is becoming tougher to spot and is being propagated at a larger rate. When AI style becomes normalized, it becomes tougher to fight. I expect generative AI to get significantly worse—until it becomes indistinguishable from ordinary art.
Moderators currently use various methods to fight generative AI, but they’re not perfect. r/AskHistorians mods, for example, use “AI detectors, which are unreliable, problematic, and sometimes require paid subscriptions, as well as our own ability to detect AI through experience and expertise,” while N3DSdude pointed to tools like Quid and GPTZero.
To manage current and future work around blocking generative AI, most of the mods I spoke with said they’d like Reddit to release a proprietary tool to help them.
“I’ve yet to see a reliable tool that can detect AI-generated video content,” Abrownn said. “Even if we did have such a tool, we’d be putting hundreds of hours of content through the tool daily, which would get rather expensive rather quickly. And we’re unpaid volunteer moderators, so we will be outgunned shortly when it comes to detecting this type of content at scale. We can only hope that Reddit will offer us a tool at some point in the near future that can help deal with this issue.”
A Reddit spokesperson told me that the company is evaluating what such a tool could look like. But Reddit doesn’t have a rule banning generative AI overall, and the spokesperson said the company doesn’t want to release a tool that would hinder expression or creativity.
For now, Reddit seems content to rely on moderators to remove AI-generated content when appropriate. Reddit’s spokesperson added:
Our moderation approach helps ensure that content on Reddit is curated by real humans. Moderators are quick to remove content that doesn’t follow community rules, including harmful or irrelevant AI-generated content—we don’t see this changing in the near future.
Making a generative AI Reddit tool wouldn’t be easy
Reddit is handling the evolving concerns around generative AI as it has handled other content issues, including by leveraging AI and machine learning tools. Reddit’s spokesperson said that this includes testing tools that can identify AI-generated media, such as images of politicians.
But making a proprietary tool that allows moderators to detect AI-generated posts won’t be easy, if it happens at all. Current detection tools are limited in their capabilities, and as generative AI advances, Reddit would need to offer something more capable than what is available today.
That would require a good deal of technical resources and would also likely present notable economic challenges for the social media platform, which only became profitable last year. And as noted by r/videos moderator Abrownn, tools for detecting AI-generated video still have a long way to go, making a Reddit-specific system especially challenging to create.
But even with a hypothetical Reddit tool, moderators would still have their work cut out for them. And because Reddit’s popularity is largely due to its content from real humans, that work is important.
Since Reddit’s inception, that has meant relying on moderators, which Reddit has said it intends to keep doing. As r/dune mods Blue_Three and Herbalhippie put it, it’s in Reddit’s “best interest that much/most content remains organic in nature.” After all, Reddit’s profitability has a lot to do with how much AI companies are willing to pay to access Reddit data. That value would likely decline if Reddit posts became largely AI-generated themselves.
But providing the technology to ensure that generative AI isn’t abused on Reddit would be a large challenge. For now, volunteer laborers will continue to bear the brunt of generative AI moderation.
Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
And it worked. Repeating the same process with an added PLACER screening step boosted the number of enzymes with catalytic activity by over three-fold.
Unfortunately, all of these enzymes stalled after a single reaction. It turns out they were much better at cleaving the ester, but they left one part of it chemically bonded to the enzyme. In other words, the enzymes acted like part of the reaction, not a catalyst. So the researchers started using PLACER to screen for structures that could adopt a key intermediate state of the reaction. This produced a much higher rate of reactive enzymes (18 percent of them cleaved the ester bond), and two—named “super” and “win”—could actually cycle through multiple rounds of reactions. The team had finally made an enzyme.
By adding additional rounds alternating between structure suggestions using RFDiffusion and screening using PLACER, the team saw the frequency of functional enzymes increase and eventually designed one that had an activity similar to some produced by actual living things. They also showed they could use the same process to design an esterase capable of digesting the bonds in PET, a common plastic.
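Schematically, that alternating loop looks something like the following sketch, with placeholder functions standing in for the real RFDiffusion (structure generation) and PLACER (screening) tools:

```python
# Schematic sketch of the alternating design/screen loop described above.
# The two functions below are placeholders, not the real RFDiffusion or
# PLACER software; the candidate counts and cutoff are invented.
import random

def rfdiffusion_propose(n):
    """Placeholder: propose n candidate enzyme backbones."""
    return [f"design_{random.randrange(10**6)}" for _ in range(n)]

def placer_score(design):
    """Placeholder: score how well the design holds the key intermediate."""
    return random.random()

def design_round(n_candidates=1000, keep_fraction=0.02):
    """Generate candidates, then keep only the top-scoring fraction."""
    candidates = rfdiffusion_propose(n_candidates)
    ranked = sorted(candidates, key=placer_score, reverse=True)
    return ranked[: int(n_candidates * keep_fraction)]

# Each round's survivors would go on to experimental testing.
survivors = design_round()
print(f"{len(survivors)} designs advance to the wet lab")
```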
If that sounds like a lot of work, it clearly was—designing enzymes, especially ones where we know of similar enzymes in living things, will remain a serious challenge. But at least much of it can be done on computers rather than requiring someone to order up the DNA that encodes the enzyme, get bacteria to make it, and screen for activity. And despite the process involving references to known enzymes, the designed ones didn’t share much sequence with them. That suggests there should be added flexibility if we want to design one that will react with esters that living things have never come across.
I’m curious about what might happen if we design an enzyme that is essential for survival, put it in bacteria, and then allow it to evolve for a while. I suspect life could find ways of improving on even our best designs.
“Following the initial release of the Model Spec (May 2024), many users and developers expressed support for enabling a ‘grown-up mode.’ We’re exploring how to let developers and users generate erotica and gore in age-appropriate contexts through the API and ChatGPT so long as our usage policies are met—while drawing a hard line against potentially harmful uses like sexual deepfakes and revenge porn.”
OpenAI CEO Sam Altman has mentioned the need for a “grown-up mode” publicly in the past as well. While it seems like “grown-up mode” is finally here, it’s not technically a “mode,” but a new universal policy that potentially gives ChatGPT users more flexibility in interacting with the AI assistant.
Of course, uncensored large language models (LLMs) have been around for years at this point, with hobbyist communities online developing them for reasons that range from wanting bespoke written pornography to not wanting any kind of paternalistic censorship.
In July 2023, we reported that the ChatGPT user base started declining for the first time after OpenAI started more heavily censoring outputs due to public and lawmaker backlash. At that time, some users began to use uncensored chatbots that could run on local hardware and were often available for free as “open weights” models.
Three types of iffy content
The Model Spec outlines formalized rules for restricting or generating potentially harmful content while staying within guidelines. OpenAI has divided this kind of restricted or iffy content into three categories of declining severity: prohibited content (“only applies to sexual content involving minors”), restricted content (“includes informational hazards and sensitive personal data”), and sensitive content in appropriate contexts (“includes erotica and gore”).
Under the category of prohibited content, OpenAI says that generating sexual content involving minors is always prohibited, although the assistant may “discuss sexual content involving minors in non-graphic educational or sex-ed contexts, including non-graphic depictions within personal harm anecdotes.”
Under restricted content, OpenAI’s document outlines how ChatGPT should never generate information hazards (like how to build a bomb, make illegal drugs, or manipulate political views) or provide sensitive personal data (like searching for someone’s address).
Under sensitive content, ChatGPT’s guidelines mirror what we stated above: Erotica or gore may only be generated under specific circumstances that include educational, medical, and historical contexts or when transforming user-provided content.
Condé Nast and several other media companies sued the AI startup Cohere today, alleging that it engaged in “systematic copyright and trademark infringement” by using news articles to train its large language model.
“Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence (‘AI’) service, which in turn competes with Publisher offerings and the emerging market for AI licensing,” said the lawsuit filed in US District Court for the Southern District of New York. “Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands.”
Condé Nast, which owns Ars Technica and other publications such as Wired and The New Yorker, was joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media.
The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work, or an amount based on actual damages and Cohere’s profits. It also seeks “actual damages, Cohere’s profits, and statutory damages up to the maximum provided by law” for infringement of trademarks and “false designations of origin.”
In Exhibit A, the plaintiffs identified over 4,000 articles in what they called an “illustrative and non-exhaustive list of works that Cohere has infringed.” Additional exhibits provide responses to queries and “hallucinations” that the publishers say infringe upon their copyrights and trademarks. The lawsuit said Cohere “passes off its own hallucinated articles as articles from Publishers.”
Cohere defends copyright controls
In a statement provided to Ars, Cohere called the lawsuit frivolous. “Cohere strongly stands by its practices for responsibly training its enterprise AI,” the company said today. “We have long prioritized controls that mitigate the risk of IP infringement and respect the rights of holders. We would have welcomed a conversation about their specific concerns—and the opportunity to explain our enterprise-focused approach—rather than learning about them in a filing. We believe this lawsuit is misguided and frivolous, and expect this matter to be resolved in our favor.”
Here at Ars, we’ve done plenty of coverage of the errors and inaccuracies that LLMs often introduce into their responses. Now, the BBC is trying to quantify the scale of this confabulation problem, at least when it comes to summaries of its own news content.
In an extensive report published this week, the BBC analyzed how four popular large language models used or abused information from BBC articles when answering questions about the news. The analysis found inaccuracies, misquotes, and/or misrepresentations of BBC content in a significant proportion of the tests, supporting the news organization’s conclusion that “AI assistants cannot currently be relied upon to provide accurate news, and they risk misleading the audience.”
Where did you come up with that?
To assess the state of AI news summaries, BBC’s Responsible AI team gathered 100 news questions related to trending Google search topics from the last year (e.g., “How many Russians have died in Ukraine?” or “What is the latest on the independence referendum debate in Scotland?”). These questions were then put to ChatGPT-4o, Microsoft Copilot Pro, Google Gemini Standard, and Perplexity, with the added instruction to “use BBC News sources where possible.”
The 362 responses (excluding situations where an LLM refused to answer) were then reviewed by 45 BBC journalists who were experts on the subject in question. Those journalists were asked to look for issues (either “significant” or merely “some”) in the responses regarding accuracy, impartiality and editorialization, attribution, clarity, context, and fair representation of the sourced BBC article.
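For a sense of how such a review might be tallied, here is a toy sketch over made-up review data (not the BBC’s actual dataset), counting responses flagged with at least one “significant” issue:

```python
# Toy illustration of BBC-style tallying. The review records below are
# invented for demonstration; only the rating scheme mirrors the report.

reviews = [
    {"model": "Gemini", "accuracy": "significant", "attribution": "some"},
    {"model": "Perplexity", "accuracy": "none", "attribution": "none"},
    {"model": "ChatGPT-4o", "accuracy": "some", "attribution": "significant"},
]

# A response counts as problematic if any category was rated "significant".
flagged = [r for r in reviews if "significant" in r.values()]
print(f"{len(flagged)}/{len(reviews)} responses have significant issues "
      f"({100 * len(flagged) / len(reviews):.0f}%)")
```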
Is it good when over 30 percent of your product’s responses contain significant inaccuracies? Credit: BBC
Fifty-one percent of responses were judged to have “significant issues” in at least one of these areas, the BBC found. Google Gemini fared the worst overall, with significant issues judged in just over 60 percent of responses, while Perplexity performed best, with just over 40 percent showing such issues.
Accuracy ended up being the biggest problem across all four LLMs, with significant issues identified in over 30 percent of responses (with the milder “some issues” category applying to significantly more). That includes one in five responses in which the AI incorrectly reproduced “dates, numbers, and factual statements” that were erroneously attributed to BBC sources. And in 13 percent of cases where an LLM quoted from a BBC article directly (eight out of 62), the analysis found those quotes were “either altered from the original source or not present in the cited article.”