AI

Sergey Brin says AGI is within reach if Googlers work 60-hour weeks

Sergey Brin co-founded Google in the 1990s along with Larry Page, but both stepped away from the day-to-day at Google in 2019. However, the AI boom tempted Brin to return to the office, and he thinks everyone should follow his example. In a new internal memo, Brin has advised employees to be in the office every weekday so Google can win the AI race.

Just returning to the office isn’t enough for the Google co-founder. According to the memo seen by The New York Times, Brin says Googlers should try to work 60 hours per week to support the company’s AI efforts. That works out to 12 hours per day, Monday through Friday, which Brin calls the “sweet spot of productivity.” This is not a new opinion for Brin.

Brin, like many in Silicon Valley, is seemingly committed to the dogma that the current trajectory of generative AI will lead to the development of artificial general intelligence (AGI). Such a thinking machine would be head and shoulders above current AI models, which can only do a good impression of thinking. An AGI would understand concepts and think more like a human being, which some would argue makes it a conscious entity.

To hear Brin tell it, Google is in the best position to make this AI computing breakthrough. He cites the company’s strong workforce of programmers and data scientists as the key, but he also believes the team must strive for greater efficiency by using Google’s own Gemini AI tools as much as possible. Oh, and don’t work from home.

Brin and Page handed the reins to current CEO Sundar Pichai in 2015, so Brin’s pronouncement doesn’t necessarily signal a change to the company’s current in-office policy. Google still operates on a hybrid model, with workers expected to be in the office three days per week. But as a founder, Brin’s voice carries weight. We reached out to Google to ask if the company intends to reassess its policies, and a Google rep said there are no planned changes to the return-to-office mandate.

Copilot exposes private GitHub pages, some removed by Microsoft

Screenshot showing that Copilot continues to serve tools Microsoft took legal action to have removed from GitHub. Credit: Lasso

Lasso ultimately determined that Microsoft’s fix involved cutting off public access to a special Bing user interface, once available at cc.bingj.com. The fix, however, didn’t appear to clear the private pages from the cache itself. As a result, the private information was still accessible to Copilot, which in turn would make it available to any Copilot user who asked.

The Lasso researchers explained:

Although Bing’s cached link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch and while public access was blocked, the underlying data had not been fully removed.

When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users. In short, the fix was only partial, human users were prevented from retrieving the cached data, but Copilot could still access it.

The post laid out simple steps anyone can take to find and view the same massive trove of private repositories Lasso identified.

There’s no putting toothpaste back in the tube

Developers frequently embed security tokens, private encryption keys, and other sensitive information directly into their code, despite best practices that have long called for such data to be supplied through more secure means. The potential damage worsens when this code is made available in public repositories, another common security failing. The phenomenon has occurred over and over for more than a decade.
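One common way to keep secrets out of source code is to read them from the environment at runtime. The Python sketch below illustrates that general practice; the variable name `EXAMPLE_API_TOKEN` is hypothetical, and this is not a specific tool recommended by Lasso or GitHub. It does nothing for a key that was already committed, which still has to be rotated.

```python
import os


def load_api_token(var_name: str = "EXAMPLE_API_TOKEN") -> str:
    """Read a secret from the environment instead of hard-coding it in source.

    EXAMPLE_API_TOKEN is a hypothetical variable name used for illustration.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail loudly rather than falling back to a token committed in the repo.
        raise RuntimeError(f"{var_name} is not set; refusing to run without a credential")
    return token


# The anti-pattern the article describes (do not do this):
# API_TOKEN = "sk-live-..."  # anyone who ever saw the repository now has this key
```

In use, the variable would be set in the shell or a CI secret store rather than written into any file that gets committed.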

When these sorts of mistakes happen, developers often make the repositories private quickly, hoping to contain the fallout. Lasso’s findings show that simply making the code private isn’t enough. Once exposed, credentials are irreparably compromised. The only recourse is to rotate all credentials.

This advice still doesn’t address the problems resulting when other sensitive data is included in repositories that are switched from public to private. Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues undermining this work by making the tools available anyway.

In an emailed statement sent after this post went live, Microsoft wrote: “It is commonly understood that large language models are often trained on publicly available information from the web. If users prefer to avoid making their content publicly available for training these models, they are encouraged to keep their repositories private at all times.”

Microsoft brings an official Copilot app to macOS for the first time

It took a couple of years, but it happened: Microsoft released its Copilot AI assistant as an application for macOS. The app is available for download for free from the Mac App Store right now.

It was previously available briefly as a Mac app, sort of; for a short time, Microsoft’s iPad Copilot app could run on the Mac, but access on the Mac was quickly disabled. Mac users have been able to use a web-based interface for a while.

Copilot initially launched on the web and in web browsers (Edge, obviously) before making its way onto iOS and Android last year. It has since been slotted into all sorts of first-party Microsoft software, too.

The Copilot app joins a trend, already spearheaded by OpenAI and Anthropic, of bringing native AI assistant apps to macOS. Like those apps, it enables an OS-wide keyboard shortcut that opens a field for starting a chat at any time. It offers most of the same use cases: translating or summarizing text, answering questions, preparing reports and documents, solving coding problems or generating scripts, brainstorming, and so on.

Copilot uses OpenAI models like GPT-4 and DALL-E 3 (yes, it generates images, too) alongside others like Microsoft’s in-house Prometheus. Microsoft has invested heavily in OpenAI in recent years, and OpenAI’s models form the basis for Copilot and basically everything else in Microsoft’s AI strategy.

Like Apple’s own built-in generative AI features, Copilot for macOS requires an M1 or later Mac. It also requires users to run macOS 14 or later.

New AI text diffusion models break speed barriers by pulling words from noise

These diffusion models reportedly perform comparably to similarly sized conventional models, and in some cases run much faster. LLaDA’s researchers report that their 8 billion parameter model performs similarly to LLaMA3 8B across various benchmarks, with competitive results on tasks like MMLU, ARC, and GSM8K.

Mercury, meanwhile, claims dramatic speed improvements. Mercury Coder Mini scores 88.0 percent on HumanEval and 77.1 percent on MBPP, comparable to GPT-4o Mini, while reportedly operating at 1,109 tokens per second compared to GPT-4o Mini’s 59 tokens per second. That works out to roughly a 19x speed advantage over GPT-4o Mini with similar performance on coding benchmarks.

Mercury’s documentation states its models run “at over 1,000 tokens/sec on Nvidia H100s, a speed previously possible only using custom chips” from specialized hardware providers like Groq, Cerebras, and SambaNova. When compared to other speed-optimized models, the claimed advantage remains significant—Mercury Coder Mini is reportedly about 5.5x faster than Gemini 2.0 Flash-Lite (201 tokens/second) and 18x faster than Claude 3.5 Haiku (61 tokens/second).
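The speed multiples quoted above follow directly from the reported throughput numbers. A quick Python check using only the figures in the article (which are vendor claims, not independent measurements):

```python
# Reported throughput in tokens per second, as quoted in the article.
# These are vendor-reported figures, not independent measurements.
MERCURY_CODER_MINI_TPS = 1109
BASELINES_TPS = {
    "GPT-4o Mini": 59,
    "Gemini 2.0 Flash-Lite": 201,
    "Claude 3.5 Haiku": 61,
}


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times more tokens per second `fast_tps` delivers than `slow_tps`."""
    return fast_tps / slow_tps


for name, tps in BASELINES_TPS.items():
    print(f"Mercury Coder Mini vs {name}: {speedup(MERCURY_CODER_MINI_TPS, tps):.1f}x")
# Mercury Coder Mini vs GPT-4o Mini: 18.8x
# Mercury Coder Mini vs Gemini 2.0 Flash-Lite: 5.5x
# Mercury Coder Mini vs Claude 3.5 Haiku: 18.2x
```

The article’s “roughly 19x,” “about 5.5x,” and “18x” are these values rounded.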

Opening a potential new frontier in LLMs

Diffusion models do involve some trade-offs. They typically need multiple forward passes through the network to generate a complete response, unlike traditional models that need just one pass per token. However, because diffusion models process all tokens in parallel, they achieve higher throughput despite this overhead.
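To make the trade-off concrete, here is a toy Python sketch that counts sequential forward passes rather than modeling a real network. The `toy_model` oracle, the fixed three-step schedule, and the left-to-right unmasking order are all assumptions for illustration; this is not Mercury’s or LLaDA’s actual sampler, and it ignores how much compute each pass costs.

```python
import math

MASK = "<mask>"
# The sentence our stand-in "model" always predicts (an assumption for the demo).
TARGET = "the quick brown fox jumps over the lazy dog".split()


def toy_model(tokens: list[str]) -> list[str]:
    """Stand-in for one forward pass that predicts every position at once.

    A real network would return a probability distribution per position; this
    oracle just returns TARGET so the example can focus on counting passes.
    """
    return list(TARGET)


def autoregressive_generate(length: int) -> tuple[list[str], int]:
    """Left-to-right decoding: one forward pass per generated token."""
    out: list[str] = []
    passes = 0
    for _ in range(length):
        prediction = toy_model(out + [MASK])
        out.append(prediction[len(out)])  # keep only the next token
        passes += 1
    return out, passes


def diffusion_generate(length: int, steps: int) -> tuple[list[str], int]:
    """Masked-diffusion-style decoding: start fully masked, unmask in parallel."""
    seq = [MASK] * length
    passes = 0
    per_step = math.ceil(length / steps)
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        prediction = toy_model(seq)  # one pass sees and predicts all positions
        passes += 1
        # Commit a batch of positions per pass. Real samplers typically pick the
        # most confident predictions; this toy simply goes left to right.
        for i in masked[:per_step]:
            seq[i] = prediction[i]
    return seq, passes


ar_tokens, ar_passes = autoregressive_generate(len(TARGET))
dm_tokens, dm_passes = diffusion_generate(len(TARGET), steps=3)
print(ar_passes, dm_passes)  # 9 3
```

Both decoders end with the same nine tokens, but the diffusion-style loop needs three sequential passes instead of nine, which is the source of the throughput gain the paragraph describes.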

Inception thinks the speed advantages could impact code completion tools where instant response may affect developer productivity, conversational AI applications, resource-limited environments like mobile applications, and AI agents that need to respond quickly.

If diffusion-based language models maintain quality while improving speed, they might change how AI text generation develops. So far, AI researchers have been open to new approaches.

Independent AI researcher Simon Willison told Ars Technica, “I love that people are experimenting with alternative architectures to transformers, it’s yet another illustration of how much of the space of LLMs we haven’t even started to explore yet.”

On X, former OpenAI researcher Andrej Karpathy wrote about Inception, “This model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!”

Questions remain about whether larger diffusion models can match the performance of models like GPT-4o and Claude 3.7 Sonnet, whether they can produce reliable results without many confabulations, and whether the approach can handle increasingly complex simulated reasoning tasks. For now, these models may offer an alternative for smaller AI language models that doesn’t seem to sacrifice capability for speed.

You can try Mercury Coder yourself on Inception’s demo site, and you can download code for LLaDA or try a demo on Hugging Face.

Grok’s new “unhinged” voice mode can curse and scream, simulate phone sex

On Sunday, xAI released a new voice interaction mode for its Grok 3 AI model that is currently available to its premium subscribers. The feature is somewhat similar to OpenAI’s Advanced Voice Mode for ChatGPT. But unlike ChatGPT, Grok offers several uncensored personalities users can choose from (currently expressed through the same default female voice), including an “unhinged” mode and one that will roleplay verbal sexual scenarios.

On Monday, AI researcher Riley Goodside brought wider attention to the over-the-top “unhinged” mode in particular when he tweeted a video (warning: NSFW audio) that showed him repeatedly interrupting the vocal chatbot, which began to simulate yelling when asked. “Grok 3 Voice Mode, following repeated, interrupting requests to yell louder, lets out an inhuman 30-second scream, insults me, and hangs up,” he wrote.

By default, “unhinged” mode curses, insults, and belittles the user non-stop using vulgar language. Other modes include “Storyteller” (which does what it sounds like), “Romantic” (which stammers and speaks in a slow, uncertain, and insecure way), “Meditation” (which can guide you through a meditation-like experience), “Conspiracy” (which likes to talk about conspiracy theories, UFOs, and Bigfoot), “Unlicensed Therapist” (which plays the part of a talk psychologist), “Grok Doc” (a doctor), “Sexy” (marked as “18+” and acts almost like a 1-800 phone sex operator), and “Professor” (which talks about science).

A composite screenshot of various Grok 3 voice mode personalities, as seen in the Grok app for iOS.

A composite screenshot of various Grok 3 voice mode personalities, as seen in the Grok app for iOS.

Basically, xAI is taking the exact opposite approach from other AI companies, such as OpenAI, which censor discussions about not-safe-for-work topics or scenarios they consider too risky for discussion. For example, the “Sexy” mode (warning: NSFW audio) will discuss graphically sexual situations, which ChatGPT’s voice mode will not touch, although OpenAI recently loosened up the moderation on the text-based version of ChatGPT to allow discussion of some erotic content.

Google’s free Gemini Code Assist arrives with sky-high usage limits

Generative AI has wormed its way into myriad products and services, some of which benefit more from these tools than others. Coding with AI has proven to be a better application than most, with individual developers and big companies leaning heavily on generative tools to create and debug programs. Now, indie developers have access to a new AI coding tool free of charge—Google has announced that Gemini Code Assist is available to everyone.

Gemini Code Assist was first released late last year as an enterprise tool, and the new version has almost all the same features. While you can use the standard Gemini or another AI model like ChatGPT to work on coding questions, Gemini Code Assist was designed to fully integrate with the tools developers are already using. Thus, you can tap the power of a large language model (LLM) without jumping between windows. With Gemini Code Assist connected to your development environment, the model will remain aware of your code and ready to swoop in with suggestions. The model can also address specific challenges per your requests, and you can chat with the model about your code, provided it’s written in a public domain language.

At launch, Gemini Code Assist pricing started at $45 per month per user. Now, it costs nothing for individual developers, and the limits on the free tier are generous. Google says the product offers 180,000 code completions per month, which it claims is enough that even prolific professional developers won’t run out. This is in stark contrast to Microsoft’s GitHub Copilot, which offers similar features with a limit of just 2,000 code completions and 50 Copilot chat messages per month. Google did the math to point out Gemini Code Assist offers 90 times the completions of Copilot.
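Google’s 90x figure is simple division of the two monthly completion limits. The per-day number below assumes a 30-day month, an assumption that is not part of Google’s announcement.

```python
# Free-tier monthly code-completion limits as stated in the article.
GEMINI_CODE_ASSIST_FREE = 180_000
GITHUB_COPILOT_FREE = 2_000

ratio = GEMINI_CODE_ASSIST_FREE / GITHUB_COPILOT_FREE
print(f"{ratio:.0f}x the completions")  # 90x the completions

# Assumption (not from Google): usage spread evenly across a 30-day month.
per_day = GEMINI_CODE_ASSIST_FREE // 30
print(per_day)  # 6000
```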

Claude 3.7 Sonnet debuts with “extended thinking” to tackle complex problems

An example of Claude 3.7 Sonnet with extended thinking being asked, “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?” Credit: Benj Edwards

Interestingly, xAI’s Grok 3 with “thinking” (its simulated reasoning, or SR, mode) enabled was the first model that definitively gave us a “no” rather than an “it’s not likely” to the magenta question. Claude 3.7 Sonnet with extended thinking also impressed us with our second-ever firm “no,” followed by an explanation.

In another informal test, we asked 3.7 Sonnet with extended thinking to compose five original dad jokes. We’ve found in the past that our old prompt, “write 5 original dad jokes,” was not specific enough and always resulted in canned dad jokes pulled directly from training data, so we asked, “Compose 5 original dad jokes that are not found anywhere in the world.”

An example of Claude 3.7 Sonnet with extended thinking being asked, “Compose 5 original dad jokes that are not found anywhere in the world.” Credit: Benj Edwards

Claude made some attempts at crafting original jokes, although we’ll let you judge whether they are funny or not. We will likely put 3.7 Sonnet’s SR capabilities to the test more exhaustively in a future article.

Anthropic’s first agent: Claude Code

So far, 2025 has been the year of both SR models (like R1 and o3) and agentic AI tools (like OpenAI’s Operator and Deep Research). Not to be left out, Anthropic has announced its first agentic tool, Claude Code.

Claude Code is an autonomous coding assistant that runs directly in a console terminal. It allows Claude to search through codebases, read and edit files, write and run tests, commit and push code to GitHub repositories, and execute command-line tools while keeping developers informed throughout the process.

Anthropic also aims for Claude Code to be used as an assistant for debugging and refactoring tasks. The company claims that during internal testing, Claude Code completed tasks in a single session that would typically require 45-plus minutes of manual work.

Claude Code is currently available only as a “limited research preview,” with Anthropic stating it plans to improve the tool based on user feedback over time. Meanwhile, Claude 3.7 Sonnet is now available through the Claude website, the Claude app, the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Perplexity wants to reinvent the web browser with AI—but there’s fierce competition

Perplexity has been expanding its offerings; for example, it recently launched a deep research tool that competes with similar ones from OpenAI and Google, as well as Sonar, an API for generative AI-powered search.

It will face fierce competition in the browser market, though. Google’s Chrome accounts for the majority of web browser use around the world, and despite its position at the forefront of AI search, Perplexity isn’t the first to introduce a browser with heavy use of generative AI features. For example, The Browser Company showed off its Dia browser in December.

Dia will allow users to type natural language commands into the search bar, like finding a document or webpage or creating a calendar event. It’s possible that Comet will do similar things, but again, we don’t know.

So far, most consumer-facing AI tools have come in one of three forms. There are general-purpose chatbots (like OpenAI’s ChatGPT and Anthropic’s Claude); features that use trained deep learning models subtly baked into existing software (as in Adobe Photoshop or Apple’s iOS); and, less commonly, standalone software meant to remake existing application categories using AI features (like the Cursor IDE).

There haven’t been a ton of AI-specific applications in existing categories like this before, but expect to see more coming over the next couple of years.

DeepSeek goes beyond “open weights” AI with plans for source code release

Major models, including Google’s Gemma, Meta’s Llama, and even older OpenAI releases like GPT-2, have been released under this open weights structure. Those models are also often accompanied by open source code covering the inference-time instructions run when responding to a query.

It’s currently unclear whether DeepSeek’s planned open source release will also include the code the team used when training the model. That kind of training code is necessary to meet the Open Source Initiative’s formal definition of “Open Source AI,” which was finalized last year after years of study. A truly open AI also must include “sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system,” according to OSI.

A fully open source release, including training code, can give researchers more visibility into how a model works at a core level, potentially revealing biases or limitations that are inherent to the model’s architecture instead of its parameter weights. A full source release would also make it easier to reproduce a model from scratch, potentially with completely new training data, if necessary.

Elon Musk’s xAI released an open source version of Grok 1’s inference-time code last March and recently promised to release an open source version of Grok 2 in the coming weeks. However, the recently released Grok 3 will remain proprietary and only available to X Premium subscribers for the time being, the company said.

Earlier this month, Hugging Face released an open source clone of OpenAI’s proprietary “Deep Research” feature mere hours after OpenAI launched it. That clone relies on a closed-weights model at release “just because it worked well,” Hugging Face’s Aymeric Roucher told Ars Technica, but the source code’s “open pipeline” can easily be switched to any open-weights model as needed.

Robot with 1,000 muscles twitches like human while dangling from ceiling

Plans for 279 robots to start

While the Protoclone is a twitching, dangling robotic prototype right now, there’s a lot of tech packed into its body. Protoclone’s sensory system includes four depth cameras in its skull for vision, 70 inertial sensors to track joint positions, and 320 pressure sensors that provide force feedback. This system lets the robot react to visual input and learn by watching humans perform tasks.

As you can probably tell by the video, the current Protoclone prototype is still in an early developmental stage, requiring ceiling suspension for stability. Clone Robotics previously demonstrated components of this technology in 2022 with the release of its robotic hand, which used the same Myofiber muscle system.

A few months ago, Clone Robotics also showed off a robotic torso powered by the same technology.

Other companies’ robots typically use other types of actuators, such as solenoids and electric motors. Clone’s pressure-based muscle system is an interesting approach, though getting Protoclone to stand and balance without the need for suspension or umbilicals may still prove a challenge.

Clone Robotics plans to start its production with 279 units called Clone Alpha, with plans to open preorders later in 2025. The company has not announced pricing for these initial units, but given the engineering challenges still ahead, a functional release any time soon seems optimistic.

Microsoft’s new AI agent can control software and robots

The researchers’ explanations about how “Set-of-Mark” and “Trace-of-Mark” work. Credit: Microsoft Research

The Magma model introduces two technical components: Set-of-Mark, which identifies objects that can be manipulated in an environment by assigning numeric labels to interactive elements, such as clickable buttons in a UI or graspable objects in a robotic workspace, and Trace-of-Mark, which learns movement patterns from video data. Microsoft says those features allow the model to complete tasks like navigating user interfaces or directing robotic arms to grasp objects.
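A toy Python sketch of the Set-of-Mark idea as described above: number only the actionable elements, let the model answer with a mark number, then map that mark back to a concrete action. The `Element` structure, the bounding boxes, and the “click the box center” rule are hypothetical and are not Microsoft’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A hypothetical UI element or object detected in a screenshot or scene."""
    name: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    interactive: bool


def set_of_mark(elements: list[Element]) -> dict[int, Element]:
    """Assign numeric marks (1, 2, ...) to actionable elements only."""
    actionable = [e for e in elements if e.interactive]
    return {i: e for i, e in enumerate(actionable, start=1)}


def center(box: tuple[int, int, int, int]) -> tuple[int, int]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) // 2, (y0 + y1) // 2)


def act_on_mark(marks: dict[int, Element], mark: int) -> tuple[int, int]:
    """Translate a model's answer such as 'mark 2' into a concrete click point."""
    return center(marks[mark].box)


screen = [
    Element("logo", (0, 0, 100, 40), interactive=False),
    Element("Search button", (400, 10, 480, 40), interactive=True),
    Element("Sign in link", (500, 10, 560, 40), interactive=True),
]
marks = set_of_mark(screen)
for n, e in marks.items():
    print(n, e.name)  # 1 Search button / 2 Sign in link
# A model shown the marked screenshot could reply "2"; we then click:
print(act_on_mark(marks, 2))  # (530, 25)
```

Labeling elements with small integers lets the model pick a target by number instead of predicting raw pixel coordinates; the same mapping idea would apply to graspable objects in a robot workspace.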

Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name “Magma” stands for “M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch),” after some people noted that “Magma” already belongs to an existing matrix algebra library, which could create some confusion in technical discussions.

Reported improvements over previous models

In its Magma write-up, Microsoft claims Magma-8B performs competitively across benchmarks, showing strong results in UI navigation and robot manipulation tasks.

For example, it scored 80.0 on the VQAv2 visual question-answering benchmark, higher than GPT-4V’s 77.2 but lower than LLaVA-Next’s 81.8. Its POPE score of 87.4 leads all models in the comparison. In robot manipulation, Magma reportedly outperforms OpenVLA, an open source vision-language-action model, on multiple tasks.

Magma’s agentic benchmarks, as reported by the researchers. Credit: Microsoft Research

As always, we take AI benchmarks with a grain of salt since many have not been scientifically validated as being able to measure useful properties of AI models. External verification of Microsoft’s benchmark results will become possible once other researchers can access the public code release.

Like all AI models, Magma is not perfect. According to Microsoft’s documentation, it still faces technical limitations in complex decision-making that requires multiple steps over time. The company says it continues to work on improving these capabilities through ongoing research.

Yang says Microsoft will release Magma’s training and inference code on GitHub next week, allowing external researchers to build on the work. If Magma delivers on its promise, it could push Microsoft’s AI assistants beyond limited text interactions, enabling them to operate software autonomously and execute real-world tasks through robotics.

Magma is also a sign of how quickly the culture around AI can change. Just a few years ago, this kind of agentic talk scared many people who feared it might lead to AI taking over the world. While some people still fear that outcome, in 2025, AI agents are a common topic of mainstream AI research that regularly takes place without triggering calls to pause all of AI development.

Google’s new AI generates hypotheses for researchers

Over the past few years, Google has embarked on a quest to jam generative AI into every product and initiative possible. Google has robots summarizing search results, interacting with your apps, and analyzing the data on your phone. And sometimes, the output of generative AI systems can be surprisingly good despite lacking any real knowledge. But can they do science?

Google Research is now angling to turn AI into a scientist—well, a “co-scientist.” The company has a new multi-agent AI system based on Gemini 2.0 aimed at biomedical researchers that can supposedly point the way toward new hypotheses and areas of biomedical research. However, Google’s AI co-scientist boils down to a fancy chatbot. 

A flesh-and-blood scientist using Google’s co-scientist would input their research goals, ideas, and references to past research, allowing the robot to generate possible avenues of research. The AI co-scientist contains multiple interconnected models that churn through the input data and access Internet resources to refine the output. Inside the tool, the different agents challenge each other to create a “self-improving loop,” which is similar to the new raft of reasoning AI models like Gemini Flash Thinking and OpenAI o3.
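The “self-improving loop” can be pictured as a propose, review, and select cycle. The Python sketch below is a generic illustration under heavy assumptions: the two agent functions are hypothetical stand-ins for model calls, the scoring rule is invented for the demo, and nothing here reflects how Google actually wires its agents together.

```python
from typing import Callable

# Hypothetical agent signatures; in a real system each would wrap a model call.
Propose = Callable[[str, list[str]], list[str]]  # (goal, current best) -> new ideas
Score = Callable[[str, str], float]              # (goal, idea) -> review score


def self_improving_loop(goal: str, propose: Propose, score: Score,
                        rounds: int = 3, keep: int = 2) -> list[str]:
    """Propose ideas, have a reviewer score them, keep the best, and feed them back."""
    best: list[str] = []
    for _ in range(rounds):
        candidates = best + propose(goal, best)
        # The reviewer "challenges" every candidate; only top scorers survive.
        ranked = sorted(candidates, key=lambda idea: score(goal, idea), reverse=True)
        best = ranked[:keep]
    return best


# Toy agents for demonstration only.
def toy_propose(goal: str, best: list[str]) -> list[str]:
    if not best:
        return ["test drug A", "test drug B", "survey literature"]
    # Refine each surviving idea by tying it to the research goal.
    return [f"{idea} in {goal}" for idea in best]


def toy_score(goal: str, idea: str) -> float:
    # Reward ideas that mention the goal; lightly penalize longer ones.
    return (10.0 if goal in idea else 0.0) - 0.01 * len(idea)


result = self_improving_loop("AML cell lines", toy_propose, toy_score)
print(result)  # ['test drug A in AML cell lines', 'test drug B in AML cell lines']
```

In the third round the reviewer rejects the redundant “...in AML cell lines in AML cell lines” refinements, so the previous best ideas survive; that filtering step is the part of the loop the paragraph describes as agents challenging each other.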

This is still a generative AI system like Gemini, so it doesn’t truly have any new ideas or knowledge. However, it can extrapolate from existing data to potentially make decent suggestions. At the end of the process, Google’s AI co-scientist spits out research proposals and hypotheses. The human scientist can even talk with the robot about the proposals in a chatbot interface. 

The structure of Google’s AI co-scientist.

You can think of the AI co-scientist as a highly technical form of brainstorming. The same way you can bounce party-planning ideas off a consumer AI model, scientists will be able to conceptualize new scientific research with an AI tuned specifically for that purpose. 

Testing AI science

Today’s popular AI systems have a well-known problem with accuracy. Generative AI always has something to say, even if the model doesn’t have the right training data or model weights to be helpful, and fact-checking with more AI models can’t work miracles. Leveraging its reasoning roots, the AI co-scientist conducts an internal evaluation to improve outputs, and Google says the self-evaluation ratings correlate with greater scientific accuracy.

The internal metrics are one thing, but what do real scientists think? Google had human biomedical researchers evaluate the robot’s proposals, and they reportedly rated the AI co-scientist higher than other, less specialized agentic AI systems. The experts also agreed the AI co-scientist’s outputs showed greater potential for impact and novelty compared to standard AI models. 

This doesn’t mean the AI’s suggestions are all good. However, Google partnered with several universities to test some of the AI research proposals in the laboratory. For example, the AI suggested repurposing certain drugs for treating acute myeloid leukemia, and laboratory testing suggested it was a viable idea. Research at Stanford University also showed that the AI co-scientist’s ideas about treatment for liver fibrosis were worthy of further study. 

This is compelling work, certainly, but calling this system a “co-scientist” is perhaps a bit grandiose. Despite the insistence from AI leaders that we’re on the verge of creating living, thinking machines, AI isn’t anywhere close to being able to do science on its own. That doesn’t mean the AI co-scientist won’t be useful, though. Google’s new AI could help humans interpret and contextualize expansive data sets and bodies of research, even if it can’t understand or offer true insights.

Google says it wants more researchers working with this AI system in the hope it can assist with real research. Interested researchers and organizations can apply to be part of the Trusted Tester program, which provides access to the co-scientist UI as well as an API that can be integrated with existing tools.
