
DeepMind is holding back release of AI research to give Google an edge

However, the employee added that it had also blocked a paper that revealed vulnerabilities in OpenAI’s ChatGPT, over concerns that the release would seem like a hostile tit-for-tat.

A person close to DeepMind said it did not block papers that discuss security vulnerabilities, adding that it routinely publishes such work under a “responsible disclosure policy,” in which researchers must give companies the chance to fix any flaws before making them public.

But the clampdown has unsettled some staffers at a company where success has long been measured by publication in top-tier scientific journals. People with knowledge of the matter said the new review processes had contributed to some departures.

“If you can’t publish, it’s a career killer if you’re a researcher,” said a former researcher.

Some ex-staff added that projects focused on improving its Gemini suite of AI-infused products were increasingly prioritized in the internal battle for access to data sets and computing power.

In the past few years, Google has produced a range of AI-powered products that have impressed the markets. These range from improved AI-generated summaries that appear above search results to an “Astra” AI agent that can answer real-time queries across video, audio, and text.

The company’s share price has increased by as much as a third over the past year, though those gains have been pared back in recent weeks as concern over US tariffs hit tech stocks.

In recent years, Hassabis has balanced the desire of Google’s leaders to commercialize its breakthroughs with his life mission of trying to make artificial general intelligence—AI systems with abilities that can match or surpass humans.

“Anything that gets in the way of that he will remove,” said one current employee. “He tells people this is a company, not a university campus; if you want to work at a place like that, then leave.”

Additional reporting by George Hammond.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


On the Meta and DeepMind Safety Frameworks

This week we got a revision of DeepMind’s safety framework, and the first version of Meta’s framework. This post covers both of them.

  1. Meta’s RSP (Frontier AI Framework).

  2. DeepMind Updates its Frontier Safety Framework.

  3. What About Risk Governance.

  4. Where Do We Go From Here?

Here are links for previous coverage of: DeepMind’s Framework 1.0, OpenAI’s Framework and Anthropic’s Framework.

Since there is a law saying no two companies can call these documents by the same name, Meta is here to offer us its Frontier AI Framework, explaining how Meta is going to keep us safe while deploying frontier AI systems.

I will say up front, if it sounds like I’m not giving Meta the benefit of the doubt here, it’s because I am absolutely not giving Meta the benefit of the doubt here. I see no reason to believe otherwise. Notice there is no section here on governance, at all.

I will also say up front it is better to have any policy at all, that lays out their intentions and allows us to debate what to do about it, than to say nothing. I am glad that rather than keeping their mouths shut and being thought of as reckless fools, they have opened their mouths and removed all doubt.

Even if their actual policy is, in effect, remarkably close to this:

The other good news is that they are looking only at catastrophic outcomes, although they treat this as a set of specific failure modes, which they will periodically supplement by hosting workshops for experts to brainstorm new ones.

Meta: Our Framework is structured around a set of catastrophic outcomes. We have used threat modelling to develop threat scenarios pertaining to each of our catastrophic outcomes. We have identified the key capabilities that would enable the threat actor to realize a threat scenario. We have taken into account both state and non-state actors, and our threat scenarios distinguish between high- or low-skill actors.

If there exists another AI model that could cause the same problem, then Meta considers the risk to not be relevant. It only counts ‘unique’ risks, which makes it easy to say ‘but they also have this problem’ and disregard an issue.

I especially worry that Meta will point to a potential risk in a competitor’s closed source system, and then use that as justification to release a similar model as open, despite this action creating unique risks.

Another worry is that this may exclude things that are not directly catastrophic, but that lead to future catastrophic risks, such as acceleration of AI R&D or persuasion risks (which Google also doesn’t consider). Those two sections of other SSPs? They’re not here. At all. Nor are radiological or nuclear threats. They don’t care.

You’re laughing. They’re trying to create recursive self-improvement, and you’re laughing.

But yes, they do make the commitment to stop development if they can’t meet the guidelines.

We define our thresholds based on the extent to which frontier AI would uniquely enable the execution of any of the threat scenarios we have identified as being potentially sufficient to produce a catastrophic outcome. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1.

Our high and moderate risk thresholds are defined in terms of the level of uplift a model provides towards realising a threat scenario.

2.1.1 first has Meta identify a ‘reference class’ for a model, to use throughout development. This makes sense, since you want to treat potential frontier-pushing models very differently from others.

2.1.2 says they will ‘conduct a risk assessment’ but does not commit them to much of anything, only that it involve ‘external experts and company leaders from various disciplines’ and involve a safety and performance evaluation. They push their mitigation strategy to section 4.

2.1.3 They will then assess the risks and decide whether to release. Well, duh. Except that other RSPs/SSPs explain the decision criteria here. Meta doesn’t.

2.2 They argue transparency is an advantage here, rather than open weights obviously making the job far harder – you can argue it has compensating benefits but open weights make release irreversible and take away many potential defenses and mitigations. It is true that you get better evaluations post facto, once it is released for others to examine, but that largely takes the form of seeing if things go wrong.

3.1 Describes an ‘outcomes-led’ approach. What outcomes? This refers to a set of outcomes they seek to prevent. Then thresholds for not releasing are based on those particular outcomes, and they reserve the right to add to or subtract that list at will with no fixed procedure.

The disdain here for ‘theoretical risks’ is palpable. It seems if the result isn’t fully proximate, it doesn’t count, despite such releases being irreversible, and many of these ‘theoretical’ risks being rather obviously real and the biggest dangers.

An outcomes-led approach also enables prioritization. This systematic approach will allow us to identify the most urgent catastrophic outcomes – i.e., cybersecurity and chemical and biological weapons risks – and focus our efforts on avoiding them rather than spreading efforts across a wide range of theoretical risks from particular capabilities that may not plausibly be presented by the technology we are actually building.

The whole idea of 3.2’s theme of ‘threat modeling’ and an ‘outcomes-led approach’ is a way of saying that if you can’t draw a direct proximate link to the specific catastrophic harm, then once the rockets go up who cares where they come down, that’s not their department.

So in order for a threat to count, it has to both:

  1. Be a specific concrete threat you can fully model.

  2. Be unique: you can show it can’t be enabled any other way, either by any other AI system, or by achieving the same ends via any other route.

Most threats thus can either be dismissed as too theoretical and silly, or too concrete and therefore doable by other means.

It is important to note that the pathway to realise a catastrophic outcome is often extremely complex, involving numerous external elements beyond the frontier AI model. Our threat scenarios describe an essential part of the end-to-end pathway. By testing whether our model can uniquely enable a threat scenario, we’re testing whether it uniquely enables that essential part of the pathway.

Thus, it doesn’t matter how much easier you make something – it has to be something that wasn’t otherwise possible, and then they will check to be sure the threat is currently realizable:

This would also trigger a new threat modelling exercise to develop additional threat scenarios along the causal pathway so that we can ascertain whether the catastrophic outcome is indeed realizable, or whether there are still barriers to realising the catastrophic outcome (see Section 5.1 for more detail).

But the whole point of Meta’s plan is to put the model out there where you can’t take it back. So if there is still an ‘additional barrier,’ what are you going to do if that barrier is removed in the future? You need to plan for what barriers will remain in place, not what barriers exist now.

Here they summarize all the different ways they plan on dismissing threats:

Contrast this with DeepMind’s 2.0 framework, also released this week, which says:

DeepMind: Note that we have selected our CCLs (critical capability levels) to be conservative; it is not clear to what extent CCLs might translate to harm in real-world contexts.

From the old 1.0 DeepMind framework, notice how they think you’re supposed to mitigate to a level substantially below where risk lies (the graph is not in 2.0 but the spirit clearly remains):

Anthropic and OpenAI’s frameworks also claim to attempt to follow this principle.

DeepMind is doing the right thing here. Meta is doing a very different thing.

Here’s their chart of what they’d actually do.

Okay, that’s standard enough. ‘Moderate’ risks are acceptable. ‘High’ risks are not until you reduce them to Moderate. Critical means panic, but even then the ‘measures’ are essentially ‘ensure this is concretely able to happen now, cause otherwise whatever.’ I expect in practice ‘realizable’ here means ‘we can prove it is realizable and more or less do it’ not ‘it seems plausible that if we give this thing to the whole internet that someone could do it.’
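To make that concrete, here is the decision rule as I read it, written as a toy sketch. This is my restatement of the framework text, not Meta’s actual process; the function name and arguments are invented for illustration.

```python
def release_decision(risk_level: str, mitigable_to_moderate: bool) -> str:
    """Toy restatement of the thresholds as described: Moderate is acceptable,
    High must be mitigated down to Moderate, Critical halts development.
    Not Meta's code or wording; purely illustrative."""
    if risk_level == "moderate":
        return "acceptable: proceed with the planned release approach"
    if risk_level == "high":
        if mitigable_to_moderate:
            return "apply mitigations, re-assess, release only once risk is moderate"
        return "do not release"
    if risk_level == "critical":
        return "stop development and apply the Table 1 measures until mitigated"
    raise ValueError("the framework treats every frontier model as at least moderate risk")
```

Note how much weight that single boolean carries; everything interesting is hidden in how ‘mitigable’ and ‘realizable’ get judged in practice.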

I sense a core conflict between the High criteria here – ‘provides significant uplift towards’ – and their other talk, which is that the threat has to be realizable if and only if the model is present. Those are very different standards. Which is it?

If they mean what they say in High here, with a reasonable working definition of ‘significant uplift towards execution,’ then that’s a very different, actually reasonable level of enabling to consider not acceptable. Or would that then get disregarded?

I also do appreciate that risk is always at least Moderate. No pretending it’s Low.

Now we get to the actual threat scenarios.

I am not an expert in this area, so I’m not sure if this is complete, but this seems like a good faith effort to cover cybersecurity issues.

This is only chemical and biological, not full CBRN. Within that narrow bound, this seems both fully generic and fully functional. Should be fine as far as it goes.

Section 4 handles implementation. They check ‘periodically’ during development; I note that other RSPs define what compute thresholds trigger such checks, and Meta doesn’t. They’ll prepare a robust evaluation environment. They’ll check whether capabilities are good enough to bother checking for threats. If it’s worth checking, then they’ll check for actual threats.

I found this part pleasantly surprising:

Our evaluations are designed to account for the deployment context of the model. This includes assessing whether risks will remain within defined thresholds once a model is deployed or released using the target release approach.

For example, to help ensure that we are appropriately assessing the risk, we prepare the asset – the version of the model that we will test – in a way that seeks to account for the tools and scaffolding in the current ecosystem that a particular threat actor might seek to leverage to enhance the model’s capabilities.

The default ‘target release approach’ here is presumably open weights. It is great to know they understand they need to evaluate their model in that context, knowing all the ways in which their defenses won’t work, and all the ways users can use scaffolding and fine-tuning and everything else, over time, and how there will be nothing Meta can do about any of it.

What they say here, one must note, is not good enough. You don’t get to assume that only existing tools and scaffolding exist indefinitely, if you are making an irreversible decision. You also have to include reasonable expectations for future tools and scaffolding, and also account for fine-tuning and the removal of mitigations.

We also account for enabling capabilities, such as automated AI R&D, that might increase the potential for enhancements to model capabilities.

Great! But that’s not on the catastrophic outcomes list, and you say you only care about catastrophic outcomes.

So basically, this is saying that if Llama 5 were to enable automated R&D, that in and of itself is nothing to worry about, but if it then turned itself into Llama 6, and Llama 6 into Llama 7 (computer, revert to Llama 6!), then we have to take that into account when considering there might be a cyberattack?

If automated AI R&D is at the levels where you’re taking this into account, um…

And of course, here’s some language that Meta included:

Even for tangible outcomes, where it might be possible to assign a dollar value in revenue generation, or percentage increase in productivity, there is often an element of subjective judgement about the extent to which these economic benefits are important to society.

I mean, who can really say how invaluable it is for people to connect with each other.

While it is impossible to eliminate subjectivity, we believe that it is important to consider the benefits of the technology we develop. This helps us ensure that we are meeting our goal of delivering those benefits to our community. It also drives us to focus on approaches that adequately mitigate any significant risks that we identify without also eliminating the benefits we hoped to deliver in the first place.

Yes, there’s catastrophic risk, but Just Think of the Potential.

Of course, yes, it is ultimately a game of costs versus benefits, risks versus rewards. I am not saying that the correct number of expected catastrophic risks is zero, or even that the correct probability of existential risk is zero or epsilon. I get it.

But the whole point of these frameworks is to define clear principles in advance: what precautions you will take, and what things you won’t do. That matters exactly because, when the time comes, it will be easy to justify pushing forward when you shouldn’t. If the principle is ‘as long as I see enough upside I do what I want’? I expect in the trenches this means ‘we will do whatever we want, for our own interests.’

That doesn’t mean Meta will do zero safety testing. It doesn’t mean that, if the model was very obviously super dangerous, they would release it anyway; I don’t think these people are suicidal or, worse, want to go bankrupt. But you don’t need a document like this if it ultimately only says ‘don’t do things that at the time seem deeply stupid.’

Or at least, I kind of hope you were planning on not doing that anyway?

Similarly, if you wanted to assure others and tie your hands against pressures, you would have a procedure required to modify the framework, at least if you were going to make it more permissive. I don’t see one of those. Again, they can do what they want.

They have a permit.

It says ‘lol, we’re Meta.’

Good. I appreciate the candor, including the complete disregard for potential recursive self-improvement risks, as well as nuclear, radiological or persuasion risks.

So what are we going to do about all this?

Previously we had version 1.0, now we have version 2.0. DeepMinders are excited.

This is in several ways an improvement over version 1.0. It is more detailed, it introduces deceptive alignment as a threat model, it has sections on governance and disclosures, and it fixes a few other things. It maps capability levels to mitigation levels, which was missing previously. There are also some smaller steps backwards.

Mostly I’ll go over the whole thing, since I expect almost all readers don’t remember the details from my coverage of the first version.

The framework continues to be built around ‘Critical Capability Levels.’

We describe two sets of CCLs: misuse CCLs that can indicate heightened risk of severe harm from misuse if not addressed, and deceptive alignment CCLs that can indicate heightened risk of deceptive alignment-related events if not addressed.

The emphasis on deceptive alignment is entirely new.

For misuse risk, we define CCLs in high-risk domains where, based on early research, we believe risks of severe harm may be most likely to arise from future models:

● CBRN: Risks of models assisting in the development, preparation, and/or execution of a chemical, biological, radiological, or nuclear (“CBRN”) attack.

● Cyber: Risks of models assisting in the development, preparation, and/or execution of a cyber attack.

● Machine Learning R&D: Risks of the misuse of models capable of accelerating the rate of AI progress to potentially destabilizing levels, the result of which could be the unsafe attainment or proliferation of other powerful AI models. Capabilities in this area are under active research, and in the longer term may exacerbate frontier AI risks—including in other risk domains—if insufficiently managed.

What are noticeably missing from this list are two things.

  1. Persuasion risks, which Google and Meta don’t consider. I think this should be considered, although I do think there is a reasonable argument against this.

  2. Autonomy. In Version 1.0, Google considered the risk of a model expanding its effective capacity in the world fully autonomously, since that capacity could potentially grow quite a bit and then be used (via instrumental convergence) for any number of goals. I am sad to see this go. I am also sad not to see an explanation for why it was removed.

For both of these but for autonomy in particular, I ask:

  1. Is it plausible that a model could cross the previous autonomy threshold without crossing the others here?

  2. If that were the case, should we be worried about that?

The answer to the first question seems to clearly be yes.

The answer to the second question is, I do think I would want to at least test for and be aware of crossing this threshold, and then think hard about the implications.

I am however very happy to see the new addition, which I haven’t seen before in a similar document:

For deceptive alignment risk, the initial approach focuses on detecting when models might develop a baseline instrumental reasoning ability at which they have the potential to undermine human control, assuming no additional mitigations were applied.

The two instrumental reasoning CCLs thus focus on delineating when such capability becomes present, and subsequently when the initial mitigation for this capability—automated monitoring—is no longer adequate.

This is exactly the kind of thinking that Meta repeatedly disavows and treats with disdain in its framework. If AIs undermine human control, if they pretend to be aligned when they are not or otherwise steer the training process to their own ends, then we have a serious problem. Meta’s response is, well, how does that lead to a specific catastrophic outcome? To which I say, that is obviously the wrong question.

The basic plan is the same as it always is and has to be. Run periodic evaluations as models gain capabilities, check for dangerous capabilities, apply mitigations. They take both security mitigations to protect the weights, and deployment mitigations to counter potential misuse.

I notice this framework doesn’t by default work well for the AI R&D case. They agree, and note as an issue for future work, that further classes of mitigation are likely needed.

Here is the deployment mitigation plan 2.0:

  1. Development and assessment of mitigations: safeguards and an accompanying safety case are developed by iterating on the following:

    1. Developing and improving a suite of safeguards targeting the capability. This includes, as appropriate, safety fine-tuning, misuse filtering and detection, and response protocols.

    2. Assessing the robustness of these mitigations against the risk posed through assurance evaluations and threat modeling research. The assessment takes the form of a safety case, taking into account factors such as the likelihood and consequences of misuse.

  2. Pre-deployment review of safety case: general availability deployment of a model takes place only after the appropriate corporate governance body determines the safety case regarding each CCL the model has reached to be adequate.

  3. Post-deployment review of safety case: the safety case will be updated through red-teaming and revisions to our threat models. The safeguards for the model may be updated as well to ensure continued adequacy.

This replaces the old plan, which was to go in steps from mitigations targeting the capability to a red team validated safety case to prevention of access.

I notice that the red-teaming here is only mentioned in post-deployment. It should be in pre-deployment as well, no? As part of either the safety case or the review of the safety case, or both. But the core idea is the same: you need to do mitigation and then make a safety case. So essentially this is universalizing the old Security Level 2.

Their logic is that if prevention of access (the old Security Level 3) is necessary, then that is a natural extension. I get why they’re saying that, but I’d like to keep this explicit and around anyway – you can almost never be too explicit in situations like this. Clarity almost never hurts, and if thinking changes on this I’d like to track it.

The definition of mitigations here is still ‘take adequate mitigations,’ which does not tell us much. Google has wide latitude, as long as it makes its safety case.

Here are the mitigations:

As I said last time: One of these things is not like the others.

The CBRN and Cyber capability thresholds seem set rather high, but also I expect mitigations to be involved well below these thresholds. I’d also want clarification: cost reductions compared to what baseline? Presumably this is versus not having general LLM access. If it’s versus the next best available LLM, then an order of magnitude or ‘drastic reduction’ is too large a threshold, since one could easily approach that in distinct steps (see the toy arithmetic below). In general I worry these new threshold descriptions are substantially higher than the previous ones.
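Here is the ‘distinct steps’ worry as toy arithmetic, with made-up numbers chosen only for illustration:

```python
# Hypothetical numbers: suppose each new open release cuts an attacker's cost
# by 3x relative to the best previously available model, so no single release
# crosses a 10x ("order of magnitude") threshold on its own.
cost = 1_000_000  # arbitrary starting cost, in whatever units you like
per_release_reduction = 3
for generation in range(1, 4):
    cost /= per_release_reduction
    print(f"after release {generation}: cost {cost:,.0f}, cumulative reduction {1_000_000 / cost:.0f}x")
# Three such releases give a cumulative 27x reduction, well past an order of
# magnitude, even though no individual step tripped the threshold.
```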

We now move on to machine learning R&D. For uplift 1, they move from 3x rates for algorithmic progress to 2x overall progress for Machine Learning Uplift, and specifying 2024 rates as the baseline. I think that’s a better way to define the threshold to match what we actually care about, so I like this change.

Then there’s Machine Learning R&D autonomy level 1. This isn’t quite as ambitious as the old version’s Machine Learning R&D 2, which was essentially ‘the singularity arrives next Tuesday.’ This is at comparable costs rather than a drastic reduction in costs.

My guess is this still probably does mean the singularity arrives not too long from next Tuesday, given how much compute is available to Google, although this isn’t obvious. Even if it’s currently only ‘about as efficient’ as humans would be, that phase won’t last long. If you had a model that was anywhere near being able to do this, and no one else had released one that was better, it seems rather crazy to consider releasing it.

This graph, which I also showed in the Meta section, is from the 1.0 version of the framework. It’s not in 2.0, but it still does seem to be adhered to in spirit:

The security levels are listed in terms of the RAND security protocols.

Since the document calls for levels 2, 3, and 4, here’s a Claude summary of levels 2-4:

SL2 – Professional Protection

  • Defends against professional but opportunistic hackers

  • Focuses on comprehensive implementation of security fundamentals and industry best practices

  • Emphasizes closing common security gaps and vulnerabilities

  • Key aspects include email security, password policies, and multi-factor authentication

SL3 – Advanced Protection

  • Defends against cybercrime syndicates and insider threats

  • Aggressively reduces attack surface

  • Implements defense-in-depth strategies

  • Focuses heavily on restricting and monitoring access to weights

  • Requires two independent security layers

SL4 – State-Level Protection

  • Defends against standard operations by leading cyber-capable institutions (like foreign intelligence agencies)

  • Requires significant compromises on productivity and convenience

  • Implements hardware-level security measures

  • Uses confidential computing

  • Requires four independent security layers

If you’re Google and you’re not at least at SL2 for every model worth deploying, why the hell not? I think this should be the minimum even with no dangerous capabilities.

There is a reason why even Level 1 of Version 1.0’s protocol was described as similar to SL3. I don’t buy the arguments in Table 1 that cybersecurity improvements are likely to diminish necessary security levels. I think it’s rather absurd to not move to at least SL3 if any critical capability levels are found.

The question is at what point everything needs to be SL4, and how much this should depend upon coordination with the rest of industry. My guess is you only need it for Machine Learning R&D, but there I do think you need something like that, even though implementation will be expensive. For ‘uplift 1’ SL4 seems roughly reasonable to me, but for ‘autonomy 1’ I think even SL4 is inadequate.

This is a longstanding disagreement, as I found their previous version insufficient.

For the deceptive alignment case, ‘automated monitoring’ can be applied as the initial mitigation, as noted above, but this is a very temporary solution if it’s not part of defense-in-depth as a backup plan. The ‘automated monitoring only’ phase likely wouldn’t last long, and risks making things worse via teaching the system how to evade detection.

I do find the answer of ‘future work, we don’t know how’ for mitigation of Instrumental Reasoning Level 2 to be acceptable. It’s a hard problem. Better to admit you don’t know, so long as this also means that if you’re in this situation before you figure it out, you’re committing to halting and catching fire until you figure it out.

I also appreciated that there is now at least a brief section on governance, which was not in 1.0, detailing who will review implementation of the framework (the AGI Safety Council) and noting several other relevant councils. I would like to see more of a formal process but this is a start.

I also appreciated the intention to share information with ‘appropriate government authorities’ if the risks involved are triggered, even if they are then mitigated. They don’t commit to telling anyone else, but will consider it.

Another great note was saying ‘everyone needs to do this.’ Safety of models is a public good, and somewhat of a Stag Hunt, where we all win if everyone who is at the frontier cooperates. If you can outrun the bear but the bear eats someone else’s model instead, you are still not safe.

However, there were also a few steps back. The specific 6x compute or 3 month threshold was removed for a more flexible rule. I realize that 6x was stingy already and a hard-and-fast rule will sometimes be foolish, but I believe we do need hard commitments in such places at current trust levels.

So we have steps forward in (some details here not mentioned above):

  1. Deceptive alignment as a threat model.

  2. Capability levels are mapped to mitigation levels.

  3. Governance.

  4. Disclosures.

  5. Using the RAND protocol levels.

  6. Adjustment of threshold details.

  7. Centralizing role of safety cases.

  8. Changed ‘pass condition’ to ‘alert threshold’ which seems better.

  9. Emphasis on confidential computing.

  10. Explicit calls for industry-wide cooperation, willingness to coordinate.

  11. Explicit intention of sharing results with government if thresholds are triggered.

And we have a few steps back:

  1. Removal of autonomy threshold (I will trade this for deceptive alignment but would prefer to have both, and am still sad about missing persuasion.)

  2. Removal of the 6x compute and 3 month thresholds for in-training testing.

  3. Reduced effective security requirements in some places.

  4. Less explicitness about shutting down access if necessary.

Overall, it’s good news. That’s definitely a step forward, and it’s great to see DeepMind publishing revisions and continuing to work on the document.

One thing missing from the current wave of safety frameworks is robust risk governance. The Centre for Long-Term Resilience argues, in my opinion compellingly, that these documents need risk governance to serve their full intended purpose.

CLTR: Frontier safety frameworks help AI companies manage extreme risks, but gaps in effective risk governance remain. Ahead of the Paris AI Action Summit next week, our new report outlines key recommendations on how to bridge this gap.

Drawing on the best-practice ‘three lines’ framework widely used in other safety-critical industries like nuclear, aviation, and healthcare, effective risk governance includes:

  1. Decision making ownership (first line)

  2. Advisory oversight (second line)

  3. Assurance (third line)

  4. Board-level oversight

  5. Culture

  6. External transparency

Our analysis found that evidence for effective risk governance across currently published frontier AI safety frameworks is low overall.

While some aspects of risk governance are starting to be applied, the overall state of risk governance implementation in safety frameworks appears to be low, across all companies.

This increases the chance of harmful models being released because of aspects like unclear risk ownership, escalation pathways and go/no-go decisions about when to release models.

By using the recommendations outlined in our report, overall effectiveness of safety frameworks can be improved by enhancing risk identification, assessment, and mitigation.

It is an excellent start to say that your policy has to say what you will do. You then need to ensure that the procedures are laid out so it actually happens. They consider the above an MVP of risk governance.

I notice that the MVP does not seem to be optimizing for being on the lower right of this graph? Ideally, you want to start with things that are valuable and easy.

Escalation procedures and go/no-go decisions seem to be properly identified as high value things that are relatively easy to do. I think if anything they are not placing enough emphasis on cultural aspects. I don’t trust any of these frameworks to do anything without a good culture backing them up.

DeepMind has improved its framework, but it has a long way to go. No one has what I would consider a sufficient framework yet, although I believe OpenAI and Anthropic’s attempts are farther along.

The spirit of the documents is key. None of these frameworks are worth much if those involved are looking only to obey the technical requirements. They’re not designed to make adversarial compliance work, if it was even possible. They only work if people genuinely want to be safe. That’s a place Anthropic has a huge edge.

Meta vastly improved its framework, in that it previously didn’t have one, and now the new version at least admits that they essentially don’t have one. That’s a big step. And of course, even if they did have a real framework, I would not expect them to abide by its spirit. I do expect them to abide by the spirit of this one, because the spirit of this one is to not care.

The good news is, now we can talk about all of that.


Political deepfakes are the most popular way to misuse AI

This is not going well —

Study from Google’s DeepMind lays out nefarious ways AI is being used.


Artificial intelligence-generated “deepfakes” that impersonate politicians and celebrities are far more prevalent than efforts to use AI to assist cyber attacks, according to the first research by Google’s DeepMind division into the most common malicious uses of the cutting-edge technology.

The study said the creation of realistic but fake images, video, and audio of people was almost twice as common as the next highest misuse of generative AI tools: the falsifying of information using text-based tools, such as chatbots, to generate misinformation to post online.

The most common goal of actors misusing generative AI was to shape or influence public opinion, the analysis, conducted with the search group’s research and development unit Jigsaw, found. That accounted for 27 percent of uses, feeding into fears over how deepfakes might influence elections globally this year.

Deepfakes of UK Prime Minister Rishi Sunak, as well as other global leaders, have appeared on TikTok, X, and Instagram in recent months. UK voters go to the polls next week in a general election.

Concern is widespread that, despite social media platforms’ efforts to label or remove such content, audiences may not recognize these as fake, and dissemination of the content could sway voters.

Ardi Janjeva, research associate at The Alan Turing Institute, called “especially pertinent” the paper’s finding that the contamination of publicly accessible information with AI-generated content could “distort our collective understanding of sociopolitical reality.”

Janjeva added: “Even if we are uncertain about the impact that deepfakes have on voting behavior, this distortion may be harder to spot in the immediate term and poses long-term risks to our democracies.”

The study is the first of its kind by DeepMind, Google’s AI unit led by Sir Demis Hassabis, and is an attempt to quantify the risks from the use of generative AI tools, which the world’s biggest technology companies have rushed out to the public in search of huge profits.

As generative products such as OpenAI’s ChatGPT and Google’s Gemini become more widely used, AI companies are beginning to monitor the flood of misinformation and other potentially harmful or unethical content created by their tools.

In May, OpenAI released research revealing operations linked to Russia, China, Iran, and Israel had been using its tools to create and spread disinformation.

“There had been a lot of understandable concern around quite sophisticated cyber attacks facilitated by these tools,” said Nahema Marchal, lead author of the study and researcher at Google DeepMind. “Whereas what we saw were fairly common misuses of GenAI [such as deepfakes that] might go under the radar a little bit more.”

Google DeepMind and Jigsaw’s researchers analyzed around 200 observed incidents of misuse between January 2023 and March 2024, taken from social media platforms X and Reddit, as well as online blogs and media reports of misuse.


The second most common motivation behind misuse was to make money, whether offering services to create deepfakes, including generating naked depictions of real people, or using generative AI to create swaths of content, such as fake news articles.

The research found that most incidents use easily accessible tools, “requiring minimal technical expertise,” meaning more bad actors can misuse generative AI.

Google DeepMind’s research will influence how it improves its evaluations to test models for safety, and it hopes it will also affect how its competitors and other stakeholders view how “harms are manifesting.”

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


DeepMind adds a diffusion engine to latest protein-folding software

Added complexity —

Major under-the-hood changes let AlphaFold handle protein-DNA complexes and more.

Prediction of the structure of a coronavirus Spike protein from a virus that causes the common cold. (Image: Google DeepMind)

Most of the activities that go on inside cells—the activities that keep us living, breathing, thinking animals—are handled by proteins. They allow cells to communicate with each other, run a cell’s basic metabolism, and help convert the information stored in DNA into even more proteins. And all of that depends on the ability of the protein’s string of amino acids to fold up into a complicated yet specific three-dimensional shape that enables it to function.

Up until this decade, understanding that 3D shape meant purifying the protein and subjecting it to a time- and labor-intensive process to determine its structure. But that changed with the work of DeepMind, one of Google’s AI divisions, which released AlphaFold in 2021, and a similar academic effort shortly afterward. The software wasn’t perfect; it struggled with larger proteins and didn’t offer high-confidence solutions for every protein. But many of its predictions turned out to be remarkably accurate.

Even so, these structures only told half of the story. To function, almost every protein has to interact with something else—other proteins, DNA, chemicals, membranes, and more. And, while the initial version of AlphaFold could handle some protein-protein interactions, the rest remained black boxes. Today, DeepMind is announcing the availability of version 3 of AlphaFold, which has seen parts of its underlying engine either heavily modified or replaced entirely. Thanks to these changes, the software now handles various additional protein interactions and modifications.

Changing parts

The original AlphaFold relied on two underlying software functions. One of those took evolutionary limits on a protein into account. By looking at the same protein in multiple species, you can get a sense for which parts are always the same, and therefore likely to be central to its function. That centrality implies that they’re always likely to be in the same location and orientation in the protein’s structure. To do this, the original AlphaFold found as many versions of a protein as it could and lined up their sequences to look for the portions that showed little variation.

Doing so, however, is computationally expensive since the more proteins you line up, the more constraints you have to resolve. In the new version, the AlphaFold team still identified multiple related proteins but switched to largely performing alignments using pairs of protein sequences from within the set of related ones. This probably isn’t as information-rich as a multi-alignment, but it’s far more computationally efficient, and the lost information doesn’t appear to be critical to figuring out protein structures.
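As a toy illustration of why per-pair processing is cheaper while still capturing conservation signal (this is not AlphaFold’s actual representation, and it assumes the homolog sequences are already aligned to the query and equal in length):

```python
def pairwise_conservation(query: str, homologs: list[str]) -> list[float]:
    """Score each position of the query by how often homologs agree with it,
    looking at one (query, homolog) pair at a time.  Work grows linearly in
    the number of homologs, instead of jointly processing the whole set."""
    scores = [0.0] * len(query)
    for hom in homologs:
        for i, (a, b) in enumerate(zip(query, hom)):
            scores[i] += float(a == b)
    return [s / len(homologs) for s in scores]

# Toy example with invented sequences: positions scoring near 1.0 are conserved
# across species and thus more likely to matter for structure and function.
print(pairwise_conservation("MKTAYIA", ["MKTAHIA", "MRTAYIA", "MKTAYLA"]))
```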

Using these alignments, a separate software module figured out the spatial relationships among pairs of amino acids within the target protein. Those relationships were then translated into spatial coordinates for each atom by code that took into account some of the physical properties of amino acids, like which portions of an amino acid could rotate relative to others, etc.

In AlphaFold 3, the prediction of atomic positions is handled by a diffusion module, which is trained by being given both a known structure and versions of that structure where noise (in the form of shifting the positions of some atoms) has been added. This allows the diffusion module to take the inexact locations described by relative positions and convert them into exact predictions of the location of every atom in the protein. It doesn’t need to be told the physical properties of amino acids, because it can figure out what they normally do by looking at enough structures.

(DeepMind had to train on two different levels of noise to get the diffusion module to work: one in which the locations of atoms were shifted while the general structure was left intact and a second where the noise involved shifting the large-scale structure of the protein, thus affecting the location of lots of atoms.)
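A toy sketch of the denoising-training idea described above, with invented noise scales and a placeholder standing in for the network; this is not AlphaFold 3’s actual diffusion module:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(coords: np.ndarray, mode: str) -> np.ndarray:
    """Corrupt a known structure in one of the two ways the article describes:
    'local' jitters individual atom positions; 'global' moves the large-scale
    structure (here, a random rotation plus translation), shifting many atoms."""
    if mode == "local":
        return coords + rng.normal(scale=0.5, size=coords.shape)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])
    return coords @ rotation.T + rng.normal(scale=5.0, size=(1, 3))

def training_step(model, coords: np.ndarray) -> float:
    """One conceptual step: show the model a corrupted structure and score how
    well it recovers the original atom coordinates."""
    noisy = add_noise(coords, rng.choice(["local", "global"]))
    predicted = model(noisy)  # the model maps noisy coords back to clean coords
    return float(np.mean((predicted - coords) ** 2))

# Usage with a do-nothing "model", just to show the shapes involved.
coords = rng.normal(size=(10, 3))  # ten atoms in 3D
print(training_step(lambda x: x, coords))
```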

During training, the team found that it took about 20,000 instances of protein structures for AlphaFold 3 to get about 97 percent of a set of test structures right. By 60,000 instances, it started getting protein-protein interfaces correct at that frequency, too. And, critically, it started getting proteins complexed with other molecules right, as well.


DeepMind co-founder Mustafa Suleyman will run Microsoft’s new consumer AI unit

Minding deeply —

Most staffers from Suleyman’s startup, Inflection, will join Microsoft as well.

Mustafa Suleyman talks on Day 1 of the AI Safety Summit at Bletchley Park on November 1, 2023, in Bletchley, England.

Microsoft has hired Mustafa Suleyman, the co-founder of Google’s DeepMind and chief executive of artificial intelligence start-up Inflection, to run a new consumer AI unit.

Suleyman, a British entrepreneur who co-founded DeepMind in London in 2010, will report to Microsoft chief executive Satya Nadella, the company announced on Tuesday. He will launch a division of Microsoft that brings consumer-facing products including Microsoft’s Copilot, Bing, Edge, and GenAI under one team called Microsoft AI.

It is the latest move by Microsoft to capitalize on the boom in generative AI. It has invested $13 billion in OpenAI, the maker of ChatGPT, and rapidly integrated its technology into Microsoft products.

Microsoft’s investment in OpenAI has given it an early lead in Silicon Valley’s race to deploy AI, leaving its biggest rival, Google, struggling to catch up. It also has invested in other AI startups, including French developer Mistral.

It has been rolling out an AI assistant in its products such as Windows, Office software, and cyber security tools. Suleyman’s unit will work on projects including integrating an AI version of Copilot into its Windows operating system and enhancing the use of generative AI in its Bing search engine.

Nadella said in a statement on Tuesday: “I’ve known Mustafa for several years and have greatly admired him as a founder of both DeepMind and Inflection, and as a visionary, product maker and builder of pioneering teams that go after bold missions.”

DeepMind was acquired by Google in 2014 for $500 million, one of the first large bets by a big tech company on a startup AI lab. The company faced controversy a few years later over some of its projects, including its work for the UK healthcare sector, which was found by a government watchdog to have been granted inappropriate access to patient records.

Suleyman, who was the main public face for the company, was placed on leave in 2019. DeepMind workers had complained that he had an overly aggressive management style. Addressing staff complaints at the time, Suleyman said: “I really screwed up. I was very demanding and pretty relentless.”

He moved to Google months later, where he led AI product management. In 2022, he joined Silicon Valley venture capital firm Greylock and launched Inflection later that year.

Microsoft will also hire most of Inflection’s staff, including Karén Simonyan, cofounder and chief scientist of Inflection, who will be chief scientist of the AI group. Microsoft did not clarify the number of employees moving over but said it included AI engineers, researchers, and large language model builders who have designed and co-authored “many of the most important contributions in advancing AI over the last five years.”

Inflection, a rival to OpenAI, will switch its focus from its consumer chatbot, Pi, and instead move to sell enterprise AI software to businesses, according to a statement on its website. Sean White, who has held various technology roles, has joined as its new chief executive.

Inflection’s third cofounder, Reid Hoffman, the founder and executive chair of LinkedIn, will remain on Inflection’s board. Inflection had raised $1.3 billion in June, valuing the group at about $4 billion, in one of the largest fundraisings by an AI start-up amid an explosion of interest in the sector.

The new unit marks a big organizational shift at Microsoft. Mikhail Parakhin, its president of web services, will move along with his entire team to report to Suleyman.

“We have a real shot to build technology that was once thought impossible and that lives up to our mission to ensure the benefits of AI reach every person and organization on the planet, safely and responsibly,” Nadella said.

Competition regulators in the US and Europe have been scrutinising the relationship between Microsoft and OpenAI amid a broader inquiry into AI investments.

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


DeepMind AI rivals the world’s smartest high schoolers at geometry

Demis Hassabis, CEO of DeepMind Technologies and developer of AlphaGo, attends the AI Safety Summit at Bletchley Park on November 2, 2023, in Bletchley, England.

A system developed by Google’s DeepMind has set a new record for AI performance on geometry problems. DeepMind’s AlphaGeometry managed to solve 25 of the 30 geometry problems drawn from the International Mathematical Olympiad between 2000 and 2022.

That puts the software ahead of the vast majority of young mathematicians and just shy of IMO gold medalists. DeepMind estimates that the average gold medalist would have solved 26 out of 30 problems. Many view the IMO as the world’s most prestigious math competition for high school students.

“Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions,” DeepMind writes. To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.

The research was led by Trieu Trinh, a computer scientist who recently earned his PhD from New York University. He was a resident at DeepMind between 2021 and 2023.

Evan Chen, a former Olympiad gold medalist who evaluated some of AlphaGeometry’s output, praised it as “impressive because it’s both verifiable and clean.” Whereas some earlier software generated complex geometry proofs that were hard for human reviewers to understand, the output of AlphaGeometry is similar to what a human mathematician would write.

AlphaGeometry is part of DeepMind’s larger project to improve the reasoning capabilities of large language models by combining them with traditional search algorithms. DeepMind has published several papers in this area over the last year.

How AlphaGeometry works

Let’s start with a simple example shown in the AlphaGeometry paper, which was published by Nature on Wednesday:

The goal is to prove that if a triangle has two equal sides (AB and AC), then the angles opposite those sides will also be equal. We can do this by creating a new point D at the midpoint of the third side of the triangle (BC). It’s easy to show that all three sides of triangle ABD are the same length as the corresponding sides of triangle ACD. And two triangles with equal sides always have equal angles.
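For concreteness, here is that argument written out as a short congruence proof (standard SSS reasoning, restated by me rather than quoted from the paper):

```latex
Given $AB = AC$, let $D$ be the midpoint of $BC$. In triangles $ABD$ and $ACD$:
\[
  AB = AC \ \text{(given)}, \qquad
  BD = DC \ \text{($D$ is the midpoint)}, \qquad
  AD = AD \ \text{(shared side)}.
\]
Hence $\triangle ABD \cong \triangle ACD$ by SSS, so $\angle ABD = \angle ACD$,
that is, $\angle ABC = \angle ACB$.
```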

Geometry problems from the IMO are much more complex than this toy problem, but fundamentally, they have the same structure. They all start with a geometric figure and some facts about the figure like “side AB is the same length as side AC.” The goal is to generate a sequence of valid inferences that conclude with a given statement like “angle ABC is equal to angle BCA.”

For many years, we’ve had software that can generate lists of valid conclusions that can be drawn from a set of starting assumptions. Simple geometry problems can be solved by “brute force”: mechanically listing every possible fact that can be inferred from the given assumption, then listing every possible inference from those facts, and so on until you reach the desired conclusion.
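A minimal sketch of that brute-force procedure (forward chaining over a hand-written rule set; the encoding of facts and rules here is invented for illustration):

```python
def forward_chain(axioms: set, rules: list, goal: str, max_rounds: int = 100) -> bool:
    """Keep applying every rule to the facts derived so far until the goal
    appears or the set of known facts stops growing."""
    known = set(axioms)
    for _ in range(max_rounds):
        if goal in known:
            return True
        new = {conclusion for premises, conclusion in rules
               if premises <= known and conclusion not in known}
        if not new:
            break
        known |= new
    return goal in known

# Toy encoding of the isosceles-triangle argument above.
rules = [
    (frozenset({"AB=AC", "BD=DC", "AD=AD"}), "ABD congruent ACD"),  # SSS congruence
    (frozenset({"ABD congruent ACD"}), "angle ABC = angle ACB"),    # matching angles
]
print(forward_chain({"AB=AC", "BD=DC", "AD=AD"}, rules, "angle ABC = angle ACB"))  # True
```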

But this kind of brute-force search isn’t feasible for an IMO-level geometry problem because the search space is too large. Not only do harder problems require longer proofs, but sophisticated proofs often require the introduction of new elements to the initial figure—as with point D in the above proof. Once you allow for these kinds of “auxiliary points,” the space of possible proofs explodes and brute-force methods become impractical.
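This is where the language model comes in. A heavily simplified way to picture the hybrid loop (my own sketch, reusing the `forward_chain` helper above, with the proposal step as a stand-in for the language model rather than DeepMind’s actual system):

```python
def hybrid_prove(axioms, rules, goal, propose_construction, max_constructions=5):
    """Exhaust cheap symbolic deduction first; if the goal is still out of
    reach, ask the (stand-in) language model to suggest an auxiliary
    construction, add the facts it introduces, and try again."""
    facts = set(axioms)
    for _ in range(max_constructions + 1):
        if forward_chain(facts, rules, goal):  # symbolic engine (sketched above)
            return True
        extra_facts = propose_construction(facts, goal)  # e.g. "let D be the midpoint of BC"
        if not extra_facts:
            return False
        facts |= set(extra_facts)
    return False
```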
