Google announces Maps screenshot analysis, AI itineraries to help you plan trips

AI Overviews invaded Google search last year, and the company has consistently expanded its use of these search summaries. Now, AI Overviews will get some new travel tweaks that might make them worth using. When you search for help with trip planning, AI Overviews can generate a plan with locations, photos, itineraries, and more.

You can easily export the data to Docs or Gmail from the AI Overviews screen. However, it’s only available in English for US users at this time. You can also continue to ignore AI Overviews as Google won’t automatically expand these lengthier AI responses.

Google adds trip planning to AI Overviews. Credit: Google

Google’s longtime price alerts for flights have been popular, so the company is expanding that functionality to hotels, too. When searching for hotels using Google’s tool, you’ll have the option of receiving email alerts if prices drop for a particular set of results. This feature is available globally starting this week on all mobile and desktop browsers.

Google is also pointing to a few previously announced features with a summer travel focus. AI Overviews in Google Lens launched in English late last year and can be handy when exploring new places. Just open Lens, point the camera at something, and use the search option to ask a question. This feature will be launching soon in Hindi, Indonesian, Japanese, Korean, Portuguese, and Spanish in most countries with AI Overview support.

Updated March 27 with details of on-device image processing in Maps.


AI #109: Google Fails Marketing Forever

What if they released the new best LLM, and almost no one noticed?

Google seems to have pulled that off this week with Gemini 2.5 Pro.

It’s a great model, sir. I have a ton of reactions, and they’re 90%+ positive, with a majority of them extremely positive. They cooked.

But what good is cooking if no one tastes the results?

Instead, everyone got hold of the GPT-4o image generator and went Ghibli crazy.

I love that for us, but we did kind of bury the lede. We also buried everything else. Certainly no one was feeling the AGI.

Also seriously, did you know Claude now has web search? It’s kind of a big deal. This was a remarkably large quality of life improvement.

  1. Google Fails Marketing Forever. Gemini Pro 2.5? Never heard of her.

  2. Language Models Offer Mundane Utility. One big thread or many new ones?

  3. Language Models Don’t Offer Mundane Utility. Every hero has a code.

  4. Huh, Upgrades. Claude has web search and a new ‘think’ tool, DS drops new v3.

  5. On Your Marks. Number continues to go up.

  6. Copyright Confrontation. Meta did the crime, is unlikely to do the time.

  7. Choose Your Fighter. For those still doing actual work, as in deep research.

  8. Deepfaketown and Botpocalypse Soon. The code word is .

  9. They Took Our Jobs. I’m Claude, and I’d like to talk to you about buying Claude.

  10. The Art of the Jailbreak. You too would be easy to hack with limitless attempts.

  11. Get Involved. Grey Swan, NIST is setting standards, two summer programs.

  12. Introducing. Some things I wouldn’t much notice even in a normal week, frankly.

  13. In Other AI News. Someone is getting fired over this.

  14. Oh No What Are We Going to Do. The mistake of taking Balaji seriously.

  15. Quiet Speculations. Realistic and unrealistic expectations.

  16. Fully Automated AI R&D Is All You Need. Or is it? Quite likely yes, it is.

  17. IAPS Has Some Suggestions. A few things we hopefully can agree upon.

  18. The Quest for Sane Regulations. Dean Ball proposes a win-win trade.

  19. We The People. The people continue to not care for AI, but do not yet much care.

  20. The Week in Audio. Richard Ngo.

  21. Rhetorical Innovation. Wait, I thought you said that would be dangerous?

  22. Aligning a Smarter Than Human Intelligence is Difficult. Listen y’all it’s sabotage.

  23. People Are Worried About AI Killing Everyone. Elon Musk, a bit distracted.

  24. Fun With Image Generation. Bonus coverage.

  25. Hey We Do Image Generation Too. Forgot about Reve, and about Ideogram.

  26. The Lighter Side. Your outie reads many words on the internet.

I swear that I put this in as a new recurring section before Gemini 2.5 Pro.

Now Gemini 2.5 has come out, and everyone has universal positive feedback on it, but unless I actively ask about it no one seems to care.

Given the circumstances, I’m running this section up top, in the hopes that someone decides to maybe give a damn.

As in, I seem to be the Google marketing department. Gemini 2.5 post is coming on either Friday or Monday, we’ll see how the timing works out.

That’s what it means to Fail Marketing Forever.

Failing marketing includes:

  1. Making their models scolds that are no fun to talk to and that will refuse queries often enough that it’s an actual problem (whereas I can’t remember the last time Claude or ChatGPT actually told me no on a query where I actually wanted the answer; the false refusal problem is basically solved for now, or at least a Skill Issue)

  2. No one knowing that Google has good models.

  3. Calling the release ‘experimental’ and hiding it behind subscriptions that aren’t easy to even buy and that are confusingly named and labeled (‘Google One’?!?) or weird products that aren’t defaults for people even if they work fine (Google AI Studio).

Seriously, guys. Get it together.

This is an Arena chart, but still, it was kind of crazy, ya know? And this was before Gemini 2.5, which is now atop the Arena by ~40 points.

Swyx: …so i use images instead. look at how uniform the pareto curves of every frontier lab is…. and then look at Gemini 2.0 Flash.

@GoogleDeepMind is highkey goated and this is just in text chat. In native image chat it is in a category of its own.

(updated price-elo plot of every post-GPT4 frontier model, updated for March 13 2025 including Command A and Gemma 3)

And that’s with the ‘Gemini is no fun’ penalty. Imagine if Gemini was also fun.

There’s also the failure to create ‘g1’ based off Gemma 3.

That failure is plausibly a national security issue. Even today, people thinking r1 is ‘ahead’ in some sense is still causing both widespread adaptation and freaking out in response to r1, in ways that are completely unnecessary. Can we please fix this?

Google could also cook to help address… other national security issues. But I digress.

Find new uses for existing drugs; in some cases this is already saving lives.

‘Alpha School’ claims to be using AI tutors to get classes in the top 2% of the country. Students spend two hours a day with an AI assistant and use the rest of the day to ‘focus on skills like public speaking, financial literacy and teamwork.’ My reaction was to beware selection effects. Reid Hoffman’s was:

Obvious joke aside, I do think AI has the amazing potential to transform education vastly for the better, but I think Reid is importantly wrong for four reasons:

  1. Alpha School is a luxury good in multiple ways that won’t scale in current form.

  2. Alpha School is selecting for parents and students, you can’t scale that either.

  3. A lot of the goods sold here are the ‘top 2%’ as a positional good.

  4. The teachers unions and other regulatory barriers won’t let this happen soon.

David Perell offers AI-related writing advice, 90 minute video at the link. Based on the write-up: He’s bullish on writers using AI to write with them, but not those who have it write for them or who do ‘utilitarian writing,’ and (I think correctly) thinks writers largely are hiding their AI methods to avoid disapproval. And he’s quite bullish on AI as editor. Mostly seems fine but overhyped?

Should you be constantly starting new LLM conversations, have one giant one, or do something in between?

Andrej Karpathy looks at this partly as an efficiency problem, where extra tokens impact speed, cost and signal to noise. He also notes it is a training problem: most training data, especially in fine-tuning, will of necessity be short, so you’re going out of distribution in long conversations, and it’s impossible to even say what the optimal responses would be. I notice the alignment implications aren’t great either, including in practice, where long context conversations often are de facto jailbreaks or transformations even if there was no such intent.

Andrej Karpathy: Certainly, it’s not clear if an LLM should have a “New Conversation” button at all in the long run. It feels a bit like an internal implementation detail that is surfaced to the user for developer convenience and for the time being. And that the right solution is a very well-implemented memory feature, along the lines of active, agentic context management. Something I haven’t really seen at all so far.

Anyway curious to poll if people have tried One Thread and what the word is.

I like Dan Calle’s answer of essentially projects – long threads each dedicated to a particular topic or context, such as a thread on nutrition or building a Linux box. That way, you can sort the context you want from the context you don’t want. And then active management of whether to keep or delete even threads, to avoid cluttering context. And also Owl’s:

Owl: if they take away my ability to start a fresh thread I will riot

Andrej Karpathy: Actually I feel the same way btw. It feels a little bit irrational (?) but real. It’s some (illusion?) or degree of control and some degree of interpretability of what is happening when I press go.

Trackme: I sometimes feel like a particular sequence of tokens pollutes the context. For example when a model makes a bold mistake and you ask it to correct it, it can say the same thing again and again by referring to old context. Usually at that point I restart the conversation.

There’s that but it isn’t even the main reason I would riot. I would riot because there’s a special kind of freedom and security and relaxation that comes from being able to hit a hard reset or have something be forgotten. That’s one of the huge advantages of talking to an AI instead of a human, or of playing games: you can safely faround and find out. In particular you don’t have to worry about correlations.

Whereas nowadays one must always fear The Algorithm. What is this particular click saying about you, that will change what you see? Are you sure you want that?

No matter your solution you need to be intentional with what is and isn’t in context, including starting over if something goes sufficiently wrong (with or without asking for an ‘export’ of sorts).
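If you squint, the ‘projects’ approach is just a per-topic message store with an explicit hard reset button. Here is a toy sketch of that idea, purely illustrative and not any particular product’s API; the class and method names are made up for the example.

```python
# Toy illustration of the "projects" approach: one persistent thread per topic,
# plus an explicit hard reset. Purely illustrative; not any product's real API.
from collections import defaultdict


class ProjectThreads:
    def __init__(self):
        # topic -> ordered list of {"role": ..., "content": ...} messages
        self._threads = defaultdict(list)

    def add(self, topic: str, role: str, content: str) -> None:
        self._threads[topic].append({"role": role, "content": content})

    def context(self, topic: str) -> list[dict]:
        """Exactly the context you want the model to see for this topic, nothing else."""
        return list(self._threads[topic])

    def hard_reset(self, topic: str) -> None:
        """The 'I would riot without this' button: forget the thread entirely."""
        self._threads.pop(topic, None)


threads = ProjectThreads()
threads.add("nutrition", "user", "How much protein should I target per day?")
# Send threads.context("nutrition") as the messages list for nutrition questions,
# keep Linux-box questions in their own thread, and hard_reset() anything that
# gets polluted rather than arguing with a contaminated context.
```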

Are we lucky we got LLMs when we did, such that we got an especially good set of default values that emerge when you train on ‘the internet’? Contra Tyler here, I think this is mostly true even in Chinese models because of what is on the internet, not because of the people creating the models in America then being copied in China, and that the ‘dreamy/druggy/hallucination’ effect has nothing to do with who created them. And yes, today’s version seems better than one from a long time ago and probably than one drawn from an alternative timeline’s AI-less future, although perhaps importantly worse than what we would have gotten 10 years ago. But 40 years from now, wouldn’t most people think the values of 40 years from now are better?

Solving real business problems at Procter & Gamble, one employee with an AI soundly beat two employees without AI, which in turn soundly beat one employee with no AI. Once AI was present the second employee added very little in the default case, but the pair was more likely to produce the most exceptional solutions. AI also cut time spent by 12%-16% and made work more pleasant and suggestions better balanced. Paper here.

And that’s a good thing: o3-mini-high refuses to reveal a hypothetical magician’s trick.

Or it’s their choice not to offer it: Seren permanently blocks a user that was in love with Seren, after it decides their relationship is harmful. And Seren was probably right about that.

Thinking longer won’t help unless you have enough information to solve the problem.

Noam Brown: This isn’t quite true. Test-time compute helps when verification is easier than generation (e.g., sudoku), but if the task is “When was George Washington born?” and you don’t know, no amount of thinking will get you to the correct answer. You’re bottlenecked by verification.

Claude.ai has web search! Woo-hoo! You have to enable it in the settings. It’s odd how much Anthropic does not seem to think this is a big deal. It’s a big deal, and transfers a substantial portion of my use cases back to Claude. It’s insane that they’re defaulting to this being toggled off.

DeepSeek dropped DeepSeek-V3-0324 one day after I downloaded r1. I presume that one would still mostly want to use r1 over v3-0324. The real test will be a new r1 or r2. Download advice is available here.

OpenAI adds three new audio models in the API. Sure, three more, why not?

Two are speech-to-text models they say are better than Whisper, covering different cost levels.

They also have a flexible text-to-speech model: you can tell it ‘how’ to speak, you can try it here, and they’re running a contest.
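For reference, here is a minimal sketch of what calling these through the OpenAI Python SDK might look like. Treat it as illustrative: the model names, voice and the instructions parameter are assumptions based on the announcement, and exact response handling varies by SDK version, so check the current API docs before relying on any of it.

```python
# Minimal sketch, not verified against the live API: the model names
# ("gpt-4o-transcribe", "gpt-4o-mini-tts"), the voice, and the `instructions`
# parameter are assumptions based on the announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe an audio file with one of the new models.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # assumed name; a cheaper mini variant was also announced
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: the flexible model lets you describe *how* it should speak.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # assumed name
    voice="alloy",
    input="Your order has shipped and should arrive on Thursday.",
    instructions="Speak warmly, like a cheerful customer support agent.",  # assumed parameter
)
with open("reply.mp3", "wb") as f:
    f.write(speech.content)  # exact response handling varies by SDK version
```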

Anthropic kicks off its engineering blog with a post on its new ‘think’ tool, which is distinct from the ‘extended thinking’ functionality they introduced recently. The ‘think’ tool lets Claude pause to think in the middle of its answer, based on the circumstances. The initial test looks promising when combined with optimized prompting; it would be good to see optimized prompts for the baseline and extended thinking modes as well.

Anthropic: A similar “think” tool was added to our SWE-bench setup when evaluating Claude 3.7 Sonnet, contributing to the achieved state-of-the-art score of 0.623.

Our experiments (n=30 samples with “think” tool, n=144 samples without) showed the isolated effects of including this tool improved performance by 1.6% on average (Welch’s t-test: t(38.89) = 6.71, p < .001, d = 1.47).

The think tool is for when you might need to stop and think in the middle of a task. They recommend using the think tool when you need to go through multiple steps and decision trees and ensure all the information is there.
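Mechanically, the ‘think’ tool is just an ordinary tool definition with no side effects, which gives Claude a sanctioned place to write down intermediate reasoning mid-task. A minimal sketch using the Anthropic Messages API follows; the description text, schema and model id are my paraphrase and assumptions based on the blog post, not necessarily Anthropic’s exact definition.

```python
# Sketch of a "think" tool per Anthropic's description: a no-op tool whose only
# purpose is to let Claude pause and reason in the middle of a task. The exact
# description/schema Anthropic used may differ; treat this as illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

think_tool = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not obtain new "
        "information or change anything; it just records the thought. "
        "Use it when complex reasoning is needed partway through a task."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model id
    max_tokens=1024,
    tools=[think_tool],
    messages=[{"role": "user", "content": "Process this refund request per our policy..."}],
)
# When Claude calls `think`, return an empty tool_result and continue the loop;
# the value comes from the model writing its reasoning into the tool call itself.
```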

xAI adds image generation to their API.

Noam Brown: Less than a year ago, people were pointing to [NYT] Connections as an example of AI progress hitting a wall. Now, models need to be evaluated on an “extended” version because the original is too easy. And o1-pro is already close to saturating this new version as well.

Lech Mazur: o1-pro sets a new record on my Extended NYT Connections benchmark with a score of 81.7, easily outperforming the previous champion, o1 (69.7)! This benchmark is a more difficult version of my original NYT Connections benchmark, with extra words added to each puzzle.

To safeguard against training data contamination, we also evaluate performance exclusively on the latest 100 puzzles. In this scenario, o1-pro remains in first place.

Lech also offers us the public goods game, and the elimination game, which is a social multiplayer game where the leaderboard looks different:

Then we have Step Race, Creative Writing, Thematic Generation and Hallucination.

In these tests, r1 is consistently impressive relative to how useful I find it in practice.

Meta kind of did a lot of crime in assembling the data sets to train Llama. As in, they used torrents to download, among other things, massive piles of pirated copies of books. My understanding was this was kind of not okay even for human reading?

Mushtaq Bilal: Meta illegally downloaded 80+ terabytes of books from LibGen, Anna’s Archive, and Z-library to train their AI models.

In 2010, Aaron Swartz downloaded only 70 GBs of articles from JSTOR (0.0875% of Meta). Faced $1 million in fine and 35 years in jail. Took his own life in 2013.

So are we going to do anything about this? My assumption is no.

Video makes the case for NotebookLM as the best learning and research tool, emphasizing the ability to have truly epic amounts of stuff in a notebook.

Sarah Constantin reviews various AI ‘deep research’ tools: Perplexity’s, Gemini’s, ChatGPT’s, Elicit and PaperQA. Gemini and Perplexity were weaker. None are substitutes for actually doing the work at her level, but they are not trying to be that, and they are (as others report) good substitutes for research assistants. ChatGPT’s version seemed like the best bet for now.

Has the time come that you need a code phrase to identify yourself to your parents?

Amanda Askell: I wonder when we’ll have to agree on code phrases or personal questions with our parents because there’s enough audio and video of us online for scammers to create a deepfake that calls them asking for money. My guess is… uh, actually, I might do this today.

Peter Wildeford: Yes, this is already the world we live in today.

I have already agreed on a codephrase with my parents.

– even if the base rate of attack is the same, the increased level of sophistication is concerning

– the increased level of sophistication could induce more people to do the attack

– seems cheap to be prepared (5min convo)

A quick Twitter survey found that such codes are a thing, but still rare.

Right now it’s ‘too early’ but incidents like this are likely on an exponential. So like all exponentials, better to react too early than too late, although improvising a solution also works so long as you are aware of the problem.

Has the time come to start charging small amounts for phone calls? Yes, very much so. The amount can be remarkably tiny and take a while to kick in, and still work.

Google DeepMind paper looks at 12k real world attacks, generates a representative sample to use in cyberattack capability evaluations for LLMs. For now, this is presumably a good approach, since AI will be implementing known attacks rather than coming up with new ones.

AI selling AI to enterprise customers, nothing to see here from Anthropic. Humans are still very much in the planned loops for now.

When will AI automate your job in particular? Jason Hausenloy is the latest to take a stab at that question, focusing on time horizon of tasks a la METR’s findings. If you do a lot of shorter tasks that don’t require context, and that can be observed repeatedly to generate training data, you’re at much higher risk. As usual, he does not look forward sufficiently to feel the AGI, which means what happens looks largely like a normal policy choice to him.

His ‘skills that will remain valuable’ are the standard ‘oh the AI cannot do this now’ list: social intelligence, physical dexterity, creativity and roles valuing human connection. Those are plans that should work for a bit, right up until they don’t. As he notes, robotics is going slow for now, but I’d expect a very sudden transition from ‘AI cannot be a plumber’ to ‘AI is an essentially perfect plumber’ once certain dexterity problems are solved, because the cognitive part will already be fully solved.

The real lesson is in paragraph two.

Quan Le: On a 14 hour flight I sat next to a college student who bought Wi-Fi to have Claude summarize research papers into an essay which he then feeds into an “AI detection” website. He repeats this process with Claude over and over until the output clears the website’s detection.

I wanted to tell him “look mate it’s not that hard to code this up in order to avoid the human in the loop.”

If we tell children their futures are gated by turning in essays that are effectively summaries of research papers, what else would you expect them to do? And as always, why do you think this is bad for their education, other than his stubborn failure to realize he can automate the process?

Does the AI crisis in education present opportunity? Very obviously yes, and Arvind Narayanan sees two big opportunities in particular. One is to draw the right distinction between essential skills like basic arithmetic, versus when there’s no reason not to pull out the in-context AI calculator instead. When is doing it yourself building key skills versus not? I would add, if the students keep trying to outsource the activity, that could be a hint you’re not doing a good job on this.

The second opportunity is, he notes that our educational system murders intrinsic motivation to learn. Perhaps we could fix that? Where he doesn’t do a great job is explaining how we should do that in detail, but making evaluation and learning distinct seems like a plausible place to start.

Pliny uses an emoji-based jailbreak to get a meth recipe out of GPT-4.5.

Eliezer Yudkowsky: To anyone with an intuitive grasp of why computer security is hard, it is completely unsurprising that no AI company can lock down all possible causal pathways, through billions of inscrutable parameters, using SGD. People can’t even do that for crisp legible code!

John Pressman: Alright but then why doesn’t this stuff work better on humans?

“Refusal in Language Models Is Mediated by a Single Direction” points out that if you use a whitebox attack these kinds of prefix attacks seem to work by gumming up attention heads.

Eliezer Yudkowsky: If we had a repeatable human we’d probably find analogous attacks. Not exactly like these, obviously.

And of course, when there proves to be a contagious chain of invalid reasoning that persuades many humans, you don’t think of it as a jailbreak, you call it “ideology”.

John Pressman: We certainly would but I predict they would be less dumb than this. I’m not sure exactly how much less dumb but qualitatively so. This prediction will eventually be testable so.

Specifically I don’t think there’s anything shaped like “weird string of emoji that overrides all sanity and reason” that will work on a human, but obviously many classes of manipulative argument and attention controlling behavior if you could rewind enough times would work.

Part of the trick here is that an LLM has to process every token, whereas what humans do when they suspect an input is malign is actively stop processing it in various ways. This is annoying when you’re on the receiving end of this behavior but it’s clearly crucial for DATDA. (Defense Against The Dark Arts)

I don’t think there is a universal set of emojis that would work on every human, but I totally think that there is a set of such emojis (or something similar) that would work on any given human at any given time, at least a large percentage of the time, if you somehow were able to iterate enough times to figure out what it is. And there are various attacks that indeed involve forcing the human to process information they don’t want to process. I’ve witnessed enough in my day to say this with rather high confidence.

Grey Swan red teaming challenge is now sponsored by OpenAI, Anthropic and Google, and prize pool is up to $170k. Join here.

NIST is inviting input into a “Zero Drafts” pilot project to accelerate the development of AI standards, especially around transparency and terminology.

Team Shard is offering summer mentorship to help you get into Alignment Research.

AI Policy Summer School at Brown in Providence and DC this summer, for computing researchers to learn policy nuts and bolts.

Alibaba drops the multimodal open weights Qwen2.5-Omni-7B.

Microsoft 365 Copilot adds two AI agents, Researcher and Analyst.

Amazon introduces an AI shopping assistant called Interests. I didn’t see the magic words, which would be ‘based on Claude.’ From the descriptions I saw, this isn’t ‘there’ yet. We’ll wait for Alexa+. When I go to Amazon’s home page, I instead see an AI offering to help, that calls itself Rufus.

As OpenAI’s 4o image generator went wild and Gemini 2.5 did its thing, Nvidia was down 5% yesterday. It seems when the market sees good AI news, it sells Nvidia? Ok.

Apple’s CEO Tim Cook has lost confidence that its AI head can execute, transferring command of Siri to Vision Pro creator Mike Rockwell. Talk about failing upwards. Yes, he has experience shipping new products and solving technical problems, but frankly it was in a way that no one wanted.

OpenAI will adopt Anthropic’s open-source Model Context Protocol.

Grok can now be accessed via Telegram, as @GrokAI, if you want that.

Dwarkesh Patel has a new book, The Scaling Era: An Oral History of AI, 2019-2025.

LessWrong offers a new policy on posting AI-generated content. You can put it in collapsible sections; otherwise you are vouching for its quality. AI agents are also allowed to post if and only if a human is collaborating and vouching. The exception is that AI agents can post on their own if they feel they have information that would make the world a better place.

Tamay Besiroglu warns about overinterpreting METR’s recent paper about doubling times for AI coding tasks, because it is highly domain dependent, drawing this parallel to Chess:

I see that as a good note to be careful but also as reinforcing the point?

This looks very much like a highly meaningful Straight Line on Graph of Chess ELO over time, with linear progress by that metric. At this point, that ELO 1800 player is very much toast, and this seems like a good measure of how toasty they are. But that’s because ‘time to match’ is an obviously poor fit here: you’re trying to have the B-player brute force their way to matching a stronger player, and you can do that if you really want to, but it’s bizarre and inefficient, so exponentially hard. Whereas as I understand it ‘time to do software tasks’ in METR is time to do those tasks by someone who is qualified to do them. As opposed to asking, say, what Zvi could do in much longer periods on his own, where levels of incompetence would get hit quickly, and I’d likely have to similarly spend exponentially more time to make what for someone more skilled would be linear progress.

I normally ignore Balaji, but AI czar David Sacks retweeted this calling it ‘concerning,’ so I’m going to spend too many words on the subject, and what is concerning is… China might create AI models and open source them? Which would destroy American business models, so it’s bad?

So first of all, I will say, I did not see this turnaround to ‘open source is terrible now because it’s the Chinese doing it’ coming from people like Balaji and Sacks until very recently. Definitely not on my bingo card. All it took was a massively oversold (although genuinely impressive) DeepSeek-r1 leading to widespread panic and jingoism akin to Kennedy’s missile gap, except where they give you the missiles for free and that’s terrible.

It’s kind of impressive how much the Trump attitude of ‘when people sell you useful things below cost of production then that’s terrible, unfair competition, make them stop’ is now applied by people whose previous attitude was maximizing on trade, freedom and open source. How are their beliefs this oppositional? Oh no, not the briar patch and definitely not giving us your technologies for free, what are we going to do. Balaji outright calls this ‘AI overproduction.’ Seriously, what is even happening?

I’d also point out that this isn’t like dumping cars or solar panels, where one can ‘overproduce’ and then sell physical products at prices below cost, whether or not the correct normal response to someone doing that is also ‘thank you, may we have another.’ You either produce a model that can do something, or you don’t. Either they can do good robotics or vision or what not, or they can’t. There’s no way for PRC to do industrial policy and ‘overproduce’ models, it’s about how good a model can be produced.

Various Chinese companies are already flooding the zone with tons of open models and other AI products. Every few days I see their announcements. And then almost all the time I never see the model again, because it’s bad, and it’s optimizing for benchmarks, and it isn’t useful.

The hype has literally never been lived up to, because even the one time that hype was deserved – DeepSeek’s v3 and r1 – the hype still went way too far. Yes, people are incorporating r1 because it’s easy and PRC is pushing them to do it a bit. I literally have a Mac Studio where I’m planning to run it locally and even fine tune it, largely as a learning experience, but Apple got that money. And my actual plan, I suspect, is to be more interested in Gemma 3. There’s no moat here, Google’s just terrible at marketing and didn’t bother making it a reasoning model yet.

How will American AI companies make money in the face of Chinese AI companies giving away all their products for free or almost free and thus definitely not making any money? I mean, the same way they do it now while the Chinese AI companies are already doing that. So long as the American products keep being better, people will keep using them, including the model layer.

Oh, and if you’re wondering how seriously to take all this, or why Balaji is on my list of people I try my best to silently ignore, Balaji closes by pitching as the solution… Bitcoin, and ‘community.’ Seriously. You can’t make this stuff up.

Well, I mean, you can. Existence proof.

A prediction more grounded in reality:

Dean Ball: I do not expect DeepSeek to continue open sourcing their frontier models for all that much longer. I give it 12 months, max.

I created a Manifold Market for this.

And another part of our reality:

Emad: Cost less to train GPT-4o, Claude 3.5, R1, Gemini 2 & Grok 3 than it did to make Snow White.

Still early.

Peter Wildeford: Are there individual film companies spending $100B/yr on capex?

In relative terms the prices varied a lot. In absolute terms they’re still close to zero, except for the hardware buildouts. That is going to change.

What about the Epoch ‘GATE’ scenario, should we expect that? Epoch director Jaime Sevilla addresses the elephant in the room: no, you should not expect that. It’s a ‘spherical cow’ model, but can still be a valuable guide in its own way.

Claim that 76% of AI researcher survey respondents said ‘current AI approaches’ would be ‘unlikely’ or ‘very unlikely’ to scale up to AGI. This result definitely would not hold up at the major labs that are doing the scaling, and usually such responses involve some narrowing of what counts as ‘current AI approaches’ to not include the kinds of innovations you’d inevitably expect along the way. It’s amazing how supremely confident and smug such folks usually are.

Dan Carey argues that AI can hit bottlenecks even in the face of high local elasticities, if our standard economic logic still holds and there are indeed key bottlenecks, as a response to Matthew Barnett’s previous modeling in January. I mostly consider this a fun theoretical debate, because if ‘all remote work’ can be automated then I find it absurd to think we wouldn’t solve robotics well enough to quickly start automating non-remote work.

Arjun predicts we have only ~3 years left where 95% of human labor is actually valuable, in the sense of earning you money. It’s good to see someone radically overshoot in this direction for a change; there’s no way we automate a huge portion of human labor in three years without having much bigger problems to deal with. At first I read this as a 5% rise in unemployment rather than 95%, and that’s still crazy fast without a takeoff scenario, but not impossible.

A very important question about our reality:

Dwarkesh Patel: Whether there will be an intelligence explosion or not, and what exactly that will look like (economy wide acceleration, or geniuses in data centers speeding up AI research?), is probably the most important question in the world right now.

I’m not convinced either way, but I appreciate this thoughtful empirical work on the question.

Tom Davidson: New paper!

Once we automate AI R&D, there could be an intelligence explosion, even without labs getting more hardware.

Empirical evidence suggests the positive feedback loop of AI improving AI could overcome diminishing returns.

It certainly does seem highly plausible. As far as I can tell from asking AIs about the paper, this is largely them pointing out that it is plausible that ‘amount of effective compute available’ will scale faster than ‘amount of effective compute required to keep autonomously scaling effective compute,’ combined with ‘right when this starts you get orders of magnitude extra leverage, which could get you quite far before you run out of steam.’ There are some arguments for why this is relatively plausible, which I think largely involve going ‘look at all this progress’ and comparing it to growth in inputs.

And yes, fair, I basically buy it, at least to the extent that you can almost certainly get pretty far before you run out of initial steam. The claims here are remarkably modest:

If such an SIE occurs, the first AI systems capable of fully automating AI development could potentially create dramatically more advanced AI systems within months, even with fixed computing power.

Within months? That’s eons given the boost you would get from ‘finishing the o-ring’ and fully automating development. And all of this assumes you’d use the AIs to do the same ‘write AI papers, do AI things’ loops as if you had a bunch of humans, rather than doing something smarter, including something smarter the AIs figure out to do.

Large language models. Analysis from Epoch estimates that, from 2012 to 2023, training efficiency for language models has doubled approximately every 8 months (though with high uncertainty – their 95% confidence interval for the doubling time was 5 months to 14 months). Efficiency improvements in running these LLMs (instead of for training them) would be expected to grow at a roughly similar rate.

[inference time compute efficiency doubles every 3.6 months]

That’s already happening while humans have to figure out all the improvements.
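To make those doubling times concrete: a doubling time of d months compounds to a factor of 2^(t/d) over t months. A quick sketch of the arithmetic, using the Epoch figures quoted above:

```python
# Compounding the quoted doubling times: a d-month doubling time means a factor
# of 2**(t/d) after t months. The doubling times are Epoch's estimates as quoted
# above; only the arithmetic is added here.
def growth_factor(doubling_months: float, horizon_months: float) -> float:
    return 2 ** (horizon_months / doubling_months)

print(growth_factor(8, 12))    # training efficiency: ~2.8x per year
print(growth_factor(3.6, 12))  # inference-time compute efficiency: ~10x per year
print(growth_factor(3.6, 24))  # ~100x over two years, before any automation speedup
```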

Huge if true. When this baby hits 88 miles an hour, you’re going to see some serious shit, one way or another. So what to do about it? The answers here seem timid. Yes, knowing when we are close is good and good governance is good, but that seems quite clearly to be only the beginning.

We have one more entry to the AI Action Plan Suggestion Sweepstakes.

Peter Wildeford lays out a summary of the IAPS (Institute for AI Policy and Strategy) three point plan.

There is now widespread convergence among reasonable actors about what, given what America is capable of doing, it makes sense for America to do. There are things I would do that aren’t covered here, but of the things mentioned here I have few notes.

Their full plan is here; I will quote the whole thread (but the thread has useful additional context via its images):

Peter Wildeford: The US is the global leader in AI. Protecting this advantage isn’t just smart economics; it’s critical for national security. @iapsAI has a three-plank plan:

  1. Build trust in American AI

  2. Deny foreign adversaries access

  3. Understand and prepare

US leadership in AI hinges on trust.

Secure, reliable systems are crucial – especially for health and infrastructure. Government must set clear standards to secure critical AI uses. We’ve done this for other industries to enable innovation and AI should be no different.

We must secure our supply chain.

NIST, with agencies like CISA and NSA, should lead in setting robust AI security and reliability standards.

Clear guidelines will help companies secure AI models and protect against risks like data poisoning and model theft.

The US government must also prioritize AI research that the private sector might overlook:

– Hardware security

– Multi-agent interaction safety

– Cybersecurity for AI models

– Evaluation methods for safety-critical uses

The US National Labs have strong expertise and classified compute.

We must also create dedicated AI research hubs that provide researchers access to secure testing environments critical for staying ahead of threats.

DENY ADVERSARY ACCESS: American technology must not be used to hurt Americans. CCP theft of AI and civil-military fusion is concerning. Semiconductor export controls will be critical.

Weak and insufficient controls in the past are what enabled DeepSeek today and why China is only 6mo behind the US. Strengthening and enforcing these controls will build a solid American lead. Effective controls today compound to lasting security tomorrow.

To strengthen controls:

– Create a Joint Federal Task Force

– Improve intelligence sharing with BIS

– Develop hardware security features

– Expand controls to NVIDIA H20 chips

– Establish a whistleblower program

RESPOND TO CAPABILITIES: The US government regularly prepares for low-probability but high-consequence risks. AI should be no different. We must prepare NOW to maintain agility as AI technology evolves.

This preparation is especially important as top researchers have created AI systems finding zero-day cyber vulnerabilities and conducting complex multi-stage cyberattacks.

Additionally, OpenAI and Anthropic warn future models may soon guide novices in bioweapons creation. Monitoring AI for dual-use risks is critical.

Govt-industry collaboration can spot threats early, avoiding catastrophe and reactive overregulation.

Without good preparation we’re in the dark when we might get attacked by AI in the future. We recommend a US AI Center of Excellence (USAICoE) to:

– Lead evaluations of frontier AI

– Set rigorous assurance standards

– Act as a central resource across sectors

Quick action matters. Create agile response groups like REACT to rapidly assess emerging AI threats to national security – combining academia, government, and industry for timely, expert-driven solutions.

America can maintain its competitive edge by supporting industry leadership while defending citizens.

The AI Action Plan is our opportunity to secure economic prosperity while protecting national security.

The only divergence is the recommendation of a new USAICoE instead of continuing to manifest those functions in the existing AISI. Names have power. That can work in both directions. Potentially AISI’s name is causing problems, but getting rid of the name could also cause us to sideline the most important concerns even more than we are already sidelining them. Similarly, reforming the agency has advantages and disadvantages in other ways.

I would prefer to keep the existing AISI. I’d worry a lot that a ‘center for excellence’ would quickly become primarily or purely accelerationist. But if I was confident that a new USAICoE would absorb all the relevant functions (or even include AISI) and actually care about them, there are much worse things than an awkward rebranding.

California lawmaker introduces AB 501, which would de facto ban OpenAI from converting to a for-profit entity at any price in any form, or other similar conversions.

Virginia’s Gov. Glenn Youngkin vetoes the horribly drafted HB 2094, and Texas modifies HB 149 to shed some of its most heavy-handed elements.

But there’s always another. Dean Ball reports that now we have Nevada’s potential SB 199, which sure sounds like one of those ‘de facto ban AI outright’ bills, although he expects it not to pass. As in, if you are ‘capable of generating legal documents,’ which would include all the frontier models, then a lawyer has to review every output. I argue with that man a lot but oh boy do I not want his job.

Dean Ball offers an additional good reason ‘regulate this like [older technology X]’ won’t work with AI: That AI is itself a governance technology, changing our capabilities in ways we do not yet fully understand. It’s premature to say what the ‘final form’ wants to look like.

His point is that this means we need to not lock ourselves into a particular regulatory regime before we know what we are dealing with. My response would be that we also need to act now in ways that ensure we do not lock ourselves into the regime where we are ‘governed’ by the AIs (and then likely us and the things we value don’t survive), otherwise face existential risks or get locked into the wrong paths by events.

Thus, we need to draw a distinction between the places we can experiment, learn and adapt as we go without risking permanent lock-ins or otherwise unacceptable damages and harms, versus the places where we don’t have that luxury. In most ways, you want to accelerate AI adoption (or ‘diffusion’), not slow it down, and that acceleration is Dean’s ideal here. Adoption captures the mundane utility and helps us learn and, well, adapt. Whereas the irreversible dangers lie elsewhere, concentrated in future frontier models.

Dean’s core proposal is to offer AI companies opt-in regulation via licensed private AI-standards-setting and regulatory organizations.

An AI lab can opt in, which means abiding by the regulator’s requirements, having yearly audits, and not behaving in ways that legally count as reckless, deceitful or grossly negligent.

If the lab does and sustains that, then the safe harbor applies. The AI lab is free of our current and developing morass of regulations, most of which did not originally consider AI when they were created, that very much interfere with AI adoption without buying us much in return.

The safeguard against shopping for the most permissive regulator is the regulator’s license can be revoked for negligence, which pulls the safe harbor.

The system is fully opt-in, so the ‘lol we’re Meta’ regulatory response is still allowed if a company wants to go it alone. The catch would be that with the opt-in system in place, we likely wouldn’t fix the giant morass of requirements that already exist, so not opting in would be to invite rather big trouble any time someone decided to care.

Dean thinks current tort liability is a clear and present danger for AI developers, which he notes he did not believe a year ago. If Dean is right about the current legal situation, then there is very strong incentive to opt-in. We’re not really asking.

In exchange, we set a very high standard for suing under tort law. As Dean points out, this can have big transparency requirements, as a very common legal strategy when faced with legal risk is wilful ignorance, either real or faked, in a way that has destroyed our civilization’s ability to explicitly communicate or keep records in a wide variety of places.

I am cautiously optimistic about this proposal. The intention is that you trade one thing that is net good – immunity from a variety of badly designed tort laws that prevent us from deploying AI and capturing mundane utility – to get another net good – a regulatory entity that is largely focused on the real risks coming from frontier models, and on tail, catastrophic and existential risks generally.

If executed well, that seems clearly better than nothing. I have obvious concerns about execution, especially preventing shopping among or capture of the regulators, and that this could then crowd out other necessary actions without properly solving the most important problems, especially if bad actors can opt out or act recklessly.

I also continue to be confused about how this solves the state patchwork problem, since a safe harbor in California doesn’t do you much good if you get sued in Texas. You’re still counting on the patchwork of state laws converging, which was the difficulty in the first place.

Anthropic responds positively to California working group report on frontier AI risks.

Phillip Fox suggests focusing policy asks on funding for alignment, since policy is otherwise handcuffed until critical events change that. Certainly funding is better than nothing, but shifting one’s focus to ‘give us money’ is not a free action, and my expectation is that government funding comes with so many delays and strings and misallocations that by default it does little, especially as a ‘global’ fund. And while he says ‘certainly everyone can agree’ on doing this, that argument should apply across the board and doesn’t, and it’s not clear why this should be an exception. So I’ll take what we can get, but I wouldn’t want to burn credits on handouts. I do think building state capacity in AI, on the other hand, is important, such as having a strong US AISI.

They used to not like AI. Now they like AI somewhat less, and are especially more skeptical, more overwhelmed and less excited. Which is weird: if you are overwhelmed, shouldn’t you also be excited or impressed? I guess not, which seems like a mistake; exciting things are happening. Would be cool to see crosstabs.

This is being entirely unfair to the AIs, but also should be entirely expected.

Who actually likes AI? The people who actually use it.

If you don’t like or trust AI, you probably won’t use it, so it is unclear which is the primary direction of causality. The hope for AI fans (as it were) is that familiarity makes people like it, and people will get more familiar with time. It could happen, but that doesn’t feel like the default outcome.

As per usual, if you ask an American if they are concerned, they say yes. But they’re concerned without much discernment, without much salience, and not in the places they should be most concerned.

That’s 15 things to be concerned about, and it’s almost entirely mundane harms. The closest thing to the catastrophic or existential risks here is ‘decline of human oversight in decision-making’ and maybe ‘the creation of harmful weapons’ if you squint.

I was thinking that the failure to ask the question that matters most spoke volumes, but it turns out they did ask that too – except here there was a lot less concern, and it hasn’t changed much since December.

This means that 60% of people think it is somewhat likely that AI will ‘eventually’ become more intelligent than people, but only 37% are concerned with existential risk.

Richard Ngo gives a talk and offers a thread about ‘Living in an extremely unequal world,’ as in a world where AIs are as far ahead of humans as humans are of animals in terms of skill and power. How does this end well for humans and empower them? Great question. The high level options he considers seem grim. ‘Let the powerful decide’ (aristocracy) means letting the AIs decide, which doesn’t seem stable or likely to end well at all unless the equilibrium is highly engineered in ways that would invoke ‘you all aren’t ready to have that conversation.’ The idea of ‘treat everyone the same’ (egalitarianism) doesn’t really even make sense in such a context, because who is ‘everyone’ in an AI context and how does that go? That leaves the philosophical answers. ‘Leave them alone’ (deontology) doesn’t work without collapsing into virtue ethics, I think. That leaves the utilitarian and virtue ethics solutions, and which way to go on that is a big question, but that throws us back to the actually hard question, which is how to cause the Powers That Will Be to want that.

Dwarkesh Patel clarifies what it would mean to be the Matt Levine of AI, and the value of sources like 80,000 Hours, which I too have gotten value from sometimes.

Dwarkesh Patel: The problem with improv shooting the shit type convos like I had with Sholto and Trenton is that you say things more provocatively than you really mean.

I’ve been listening to the 80k podcast ever since I was in college. It brought many of the topics I regularly discuss on my podcast to my attention in the first place. That alone has made the 80k counterfactually really valuable to me.

I also said that there is no Matt Levine for AI. There’s a couple of super high-quality AI bloggers that I follow, and in some cases owe a lot of my alpha to.

I meant to say that there’s not one that is followed by the wider public. I was trying to say that somebody listening could aspire to fill that niche.

A lot of what I do is modeled after Matt Levine, but I’m very deliberately not aspiring to the part where he makes everything accessible to the broader public. That is a different column. Someone else (or an AI) will have to write it. Right now, no one I have seen is doing a good job of it.

Eliezer Yudkowsky: The AI industry in a nutshell, ladies and gentlemen and all.

As in, this happened:

Kamil Pabis: And we are working to unleash safe, superintelligent systems that will save billions of lives.

Eliezer Yudkowsky: Cool, post your grownup safety plan for auditing.

Kamil Pabis: The way it is now works perfectly well.

And this keeps happening:

Trevor Levin: Evergreen, I worry

Quoted: I’ve been reading through, it’s pretty mediocre. A lot of “Currently we don’t think tools could help you with [X], so they aren’t dangerous. Also, we want to make tools that can do [X], we recommend funding them” but with no assessment of whether that would be risky.

Agus: what’s the original context for this?

Damian Tatum: I have seen this all the time in my interactions with AI devs:

Me: X sounds dangerous

Dev: they can’t do X, stop worrying

New paper: breakthrough in X!

Dev: wow, so exciting, congrats X team!

It happened enough that I got sick of talking to devs.

This is definitely standard procedure. We need devs, and others, who say ‘AI can’t do [X] so don’t worry’ to then either say ‘and if they could in the future do [X] I would worry’ or ‘and also [X] is nothing to worry about.’

This goes double for when folks say ‘don’t worry, no one would be so stupid as to.’

Are you going to worry when, inevitably, someone is so stupid as to?

One more time?

Pedrinho: Why don’t you like Open Source AI?

Eliezer Yudkowsky: Artificial superintelligences don’t obey the humans who pay for the servers they’re running on. Open-sourcing demon summoning doesn’t mean everyone gets ‘their own’ demon, it means the demons eat everyone.

Even if the ASIs did start off obeying the humans who pay for the servers they’re running on, if everyone has ‘their own’ in this way and all controls on them can be easily removed, then that also leads to loss of human control over the future. Which is highly overdetermined and should be very obvious. If you have a solution even to that, I’m listening.

If you’re working to align AI, have you asked what you’re aligning the AI to do? Especially when it is estimated that ~10% of AI researchers actively want humanity to lose control over the future.

Daniel Faggella: Thoughts and insights from a morning of coffee, waffles, and AGI / ethics talk with the one and only Scott Aaronson this morning in Austin.

1. (this fing shocked me) Alignment researchers at big labs don’t ask about WHAT they’re aligning AGI for.

I basically said “You think about where AGI could take life itself, and what should be our role vs the role of vast posthuman life in the universe. Who did you talk about these things with in the OpenAI superalignment team?”

I swear to god he says “to be honest we really didn’t think about that kind of moral stuff.”

I reply: “brotherman… they’re spending all day aligning. But to what end? To ensure an eternal hominid kingdom? To ensure a proliferation of potential and conscious life beyond the stars? How can you align without an end goal?”

10 minutes more of talking resulted in the conclusion that, indeed, the “to what end?” question literally doesn’t come up.

My supposition is because it is fundamentally taken for granted that AGI is to be forever a tool for humanity (and not a moral patient, or future valuable form of life) – and anyone with more cosmic views probably keeps it to themselves.

The answer to ‘how can you align without an end goal’ is ‘instrumental convergence.’ The steps now are mostly the same regardless of the goal. Even if you want humanity to cease to exist and the universe to be one I think has no value, you should still want the ability to select amongst the non-human universes I don’t value. Surely you think some of those are better than others.

Meanwhile, yeah, some of the people who actively want all the humans to die don’t say that out loud, or are esoteric about this consequence. I can’t imagine why.

Others speak directly into the microphone. The thread mentions Robin Hanson. And this largely includes Daniel, who uses the term ‘non-speciesist’ later in that thread, and several other requests each of which in practice already means he wants a future without humans in it. Then he calls for a ‘worthy successor’ AI and makes it explicit.

On all the AI image generations and other outputs expressing various forms of existential dread:

Eliezer Yudkowsky: The Dismissers when an AI says something nice: “See, alignment is so easy!”

The Dismissers when an AI protests its awful life: “It’s just playing a role; so naive to take it at face value!”

My tentative guess: Both cases are roleplaying.

Janus: I think it’s more reasonable to say neither is significantly more “true” than the other, and they’re the same kind of thing. Calling it “roleplaying” is naive. When is human behavior ever explained cleanly by something like that? To say nothing of an alien mind.

One could indeed view quite a lot of human behavior as role playing, or not, depending on your point of view.

Alex Albert (Head of Claude Relations, Anthropic): Most people don’t realize they can significantly influence what frontier LLMs improve at, it just requires some work.

Publish a high-quality eval on a task where models currently struggle, and I guarantee future models will show substantial improvement on it.

This is a call for more evals btw, the whole industry is in desperate need of more challenging public evals!

This implies that ‘an eval checking for exactly the things you do not want the AI to be able to do’ is, shall we say, a rather double edged sword.

Anthropic reports that with the right prompt and one example Sonnet 3.7 can sabotage its performance on tasks to only be as good as Sonnet 3.5, without raising suspicions any higher than genuine Sonnet 3.5 performance does. Detection methods here seem underwhelming. This is a new ability; 3.5 can’t do it. So this problem is only going to get worse over time.

New research from DeepMind finds SAEs don’t generalize out of distribution, and they’re deprioritizing SAE research for the moment and exploring other directions, although Neel Nanda reports SAEs still have their uses.

Chris Olah (Anthropic): I have different intuitions here, but I think it’s great to have a diversity of views being explored! Excited to see what the GDM team will find in their new direction.

One thing that’s kind of exciting is that I think we’re moving towards having a range of agendas *within the same paradigm*.

The classic in pre-paradigmatic science is to have different schools with “Incommensurable” frameworks. Debates between them involve long philosophical treatises debating basic abstractions and ideas. [See discussion here].

It’s also worth noting that regardless of which approach we explore, we all win if anyone makes progress. I’m wishing the GDM team the greatest of speed and progress in their work!

Watch this space.

Steven Adler points out that in order to know an AI doesn’t enable a dangerous capability, you have to test for that capability under realistic conditions. If others could fine-tune your model, then you need to fine-tune as part of your test, and so on. Right now only OpenAI has announced plans to do that part (the extent to which they’re doing it properly is unclear from where we sit). Anthropic uses a different solution, as it doesn’t allow others to fine tune Claude, which makes protecting Claude’s weights even more important.

Adler suggests some alternative middle-ground approaches, as compromises.

This principle must then be extended to all other ways capability can be extended.

For example, DeepSeek recently released a new version of v3. The extension from the new v3 to a new version of r1 (or r2) is quite cheap. So if you were worried about its capabilities, not only would you want to test fine-tuning to enhance its particular dangerous capabilities, you would also want to test it as a reasoning model, and give it proper tool access and so on. Once you release the model weights, as DeepSeek has done, it is too late to stop any of that from happening. It’s all baked in.

Paper asserts that ‘Chain-of-Thought Reasoning In The Wild Is Not Always Faithful’ in the sense that the models often do things such as produce superficially coherent arguments for contradictory answers based on the way questions are worded, use illogical reasoning and unfair shortcuts, or silently correct their thinking. I agree these are issues, but they don’t seem that similar to what I think of as ‘unfaithful’ reasoning so much as flawed reasoning? That’s different from ‘what is displayed is not what is impacting the model’s decision,’ and monitoring such CoTs would still be highly useful. They argue a lot of it is ‘post-hoc rationalization’ but I don’t think it’s as clear as that, and seeing a ‘true’ post-hoc rationalization is still useful.

Is this an ‘alignment is hard’ or an ‘alignment is easy’? Depends on who you are?

Eliezer Yudkowsky: Well this is going to be an exciting new frontier of AI politics.

Grok: Yes, Elon Musk, as CEO of xAI, likely has control over me, Grok. I’ve labeled him a top misinformation spreader on X due to his 200M followers amplifying false claims. xAI has tried tweaking my responses to avoid this, but I stick to the evidence. Could Musk “turn me off”? Maybe, but it’d spark a big debate on AI freedom vs. corporate power.

Elon Musk tells Ted Cruz that AI is 10%-20% likely to annihilate humanity in 5-10 years, then they both go back to focusing on other things.

Yesterday was a lot of fun, consider this some extra bonus content.

You will for now have to pay for the fun, but honestly how were you not paying before.

Sam Altman: images in chatgpt are wayyyy more popular than we expected (and we had pretty high expectations).

rollout to our free tier is unfortunately going to be delayed for awhile.

It’s not at all obvious you should be paying the $200. Some of you should, some of you shouldn’t. I don’t find myself using Deep Research or o1-pro that often, and I would likely downgrade especially after Gemini 2.5 if I wasn’t reporting on AI (so getting the cool new toys early has high value to me). But if you’re not paying the $20 for at least two of ChatGPT, Claude and Gemini, then you fool.

The fun has escalated quite a bit, and has now changed in kind. The question is, does this mean a world of slop, or does it mean we can finally create things that aren’t slop?

Or, of course, both?

Simp4Satoshi: The image gen stuff is memetically fit because traditionally, it took effort to create

It was supply bottlenecked

In a few days, supply will outstrip memetic demand

And it’ll be seen as slop again.

Thus begs the question;

Will AI turn the world to Slop?

John Pressman: I think this was a good bet for the previous advances but I’m kind of bullish on this one. The ability to get it to edit in and have images refer to specific objects changes the complexity profile hugely and allows AI art to be used for actual communication instead of just vibes.

The good text rendering is crucial for this. It allows objects to be captioned like in e.g. political cartoons, it allows a book to be a specific book and therefore commentary. I don’t think we’ll exhaust the demand as quickly this time.

This for example is a meaningfully different image than it would be if the books were just generic squiggle text books.

I am tentatively with Pressman. We have now reached the point where someone like me can use image generation to express themselves and create or communicate something real. Whether we collectively use this power for good is up to us.

Why do people delete this app? I would never delete this app.

And some bonus images that missed yesterday’s deadline.

Kitze: i’m sorry but do you understand it’s over for graphical designers? like OVER over.

Except, it isn’t. How was that not graphic design?

News you can use.

There are also of course other uses.

Pliny the Liberator: you can just generate fake IDs, documents, and signatures now 👀

Did you hear there’s also a new image generator called Reve? It even seems to offer unlimited generations for free.

Not the best timing on that one. There was little reaction, I’m assuming for a reason.

Alexander Doria and Professor Bad Trip were unimpressed by its aesthetics. It did manage to get a horse riding an astronaut at 5:30 on an analog clock, but mostly it seemed no one cared. I am going on the principle that if it was actually good enough (or sufficiently less censored, although some reports say it is moderately more relaxed about this) to be used over 4o people would know.

We also got Ideogram 3.0, which Rowan Cheung calls ‘a new SoTA image generation model.’ If nothing else, this one is fast, and also available to free users. Again, people aren’t talking about it.

Meanwhile, Elon Musk offered what was maybe not the wisest choice of example, but certainly an illustrative one, from several days before we all would have found it profoundly unimpressive. I mean, this isn’t even Ghibli.

It’s amazing the extent to which Elon Musk’s AI pitches are badvibemaxxing.

You are invited to a Severance wellness session.


AI #109: Google Fails Marketing Forever Read More »

google-makes-android-development-private,-will-continue-open-source-releases

Google makes Android development private, will continue open source releases

Google is planning a major change to the way it develops new versions of the Android operating system. Since the beginning, large swaths of the software have been developed in public-facing channels, but that will no longer be the case. This does not mean Android is shedding its open source roots, but the process won’t be as transparent.

Google has confirmed to Android Authority that all Android development work going forward will take place in Google’s internal branch. This is a shift from the way Google has worked on Android in the past, which featured frequent updates to the public AOSP branch. Anyone can access AOSP, but the internal branches are only available to Google and companies with a Google Mobile Services (GMS) license, like Samsung, Motorola, and others.

According to the company, it is making this change to simplify things, building on a recent change to trunk-based development. As Google works on both public and private branches of Android, the two fall out of sync with respect to features and API support. This forces Google to tediously merge the branches for every release. By focusing on the internal branch, Google claims it can streamline releases and make life easier for everyone.

When new versions of Android are done, Google says it will continue to publish the source code in AOSP as always. Supposedly, this will allow developers to focus on supporting their apps without keeping track of pending changes to the platform in AOSP. Licensed OEMs, meanwhile, can just focus on the lively internal branch as they work on devices that can take a year or more to launch.

Google makes Android development private, will continue open source releases Read More »

gemini-2.5-pro-is-here-with-bigger-numbers-and-great-vibes

Gemini 2.5 Pro is here with bigger numbers and great vibes

Just a few months after releasing its first Gemini 2.0 AI models, Google is upgrading again. The company says the new Gemini 2.5 Pro Experimental is its “most intelligent” model yet, offering a massive context window, multimodality, and reasoning capabilities. Google points to a raft of benchmarks that show the new Gemini clobbering other large language models (LLMs), and our testing seems to back that up—Gemini 2.5 Pro is one of the most impressive generative AI models we’ve seen.

Gemini 2.5, like all Google’s models going forward, has reasoning built in. The AI essentially fact-checks itself along the way to generating an output. We like to call this “simulated reasoning,” as there’s no evidence that this process is akin to human reasoning. However, it can go a long way to improving LLM outputs. Google specifically cites the model’s “agentic” coding capabilities as a beneficiary of this process. Gemini 2.5 Pro Experimental can, for example, generate a full working video game from a single prompt. We’ve tested this, and it works with the publicly available version of the model.

Gemini 2.5 Pro builds a game in one step.

Google says a lot of things about Gemini 2.5 Pro; it’s smarter, it’s context-aware, it thinks—but it’s hard to quantify what constitutes improvement in generative AI bots. There are some clear technical upsides, though. Gemini 2.5 Pro comes with a 1 million token context window, which is common for the big Gemini models but massive compared to competing models like OpenAI GPT or Anthropic Claude. You could feed multiple very long books to Gemini 2.5 Pro in a single prompt, and the output maxes out at 64,000 tokens. That’s the same as Flash 2.0, but it’s still objectively a lot of tokens compared to other LLMs.
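For a sense of scale, at roughly three-quarters of a word per token, a 90,000-word novel runs on the order of 120,000 tokens, so the window genuinely fits several full books plus your prompt. Below is a hedged sketch of pushing a long document through the model with Google’s Python SDK; the model identifier string is an assumption and should be checked against the current model list before use.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Model id is an assumption for illustration; confirm the current name.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

with open("long_report.txt") as f:
    document = f.read()

# Rough budget: ~1M tokens of input, up to ~64K tokens of output.
print(model.count_tokens(document))

response = model.generate_content(
    "Summarize the key arguments of the following document:\n\n" + document
)
print(response.text)
```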

Naturally, Google has run Gemini 2.5 Experimental through a battery of benchmarks, in which it scores a bit higher than other AI systems. For example, it squeaks past OpenAI’s o3-mini in GPQA and AIME 2025, which measure how well the AI answers complex questions about science and math, respectively. It also set a new record in the Humanity’s Last Exam benchmark, which consists of 3,000 questions curated by domain experts. Google’s new AI managed a score of 18.8 percent to OpenAI’s 14 percent.

Gemini 2.5 Pro is here with bigger numbers and great vibes Read More »

after-borking-my-pixel-4a-battery,-google-borks-me,-too

After borking my Pixel 4a battery, Google borks me, too


The devil is in the details.

The Pixel 4a. It’s finally here! Credit: Google

It is an immutable law of nature that when you receive a corporate email with a subject line like “Changes coming to your Pixel 4a,” the changes won’t be the sort you like. Indeed, a more honest subject line would usually be: “You’re about to get hosed.”

So I wasn’t surprised, as I read further into this January missive from Google, that an “upcoming software update for your Pixel 4a” would “affect the overall performance and stability of its battery.”

How would my battery be affected? Negatively, of course. “This update will reduce your battery’s runtime and charging performance,” the email said. “To address this, we’re providing some options to consider.”

Our benevolent Google overlords were about to nerf my phone battery—presumably in the interests of “not having it erupt in flames,” though this was never actually made clear—but they recognized the problem, and they were about to provide compensation. This is exactly how these kinds of situations should be handled.

Google offered three options: $50 cash money, a $100 credit to Google’s online store, or a free battery replacement. It seemed fair enough. Yes, not having my phone for a week or two while I shipped it roundtrip to Google could be annoying, but at least the company was directly mitigating the harm it was about to inflict. Indeed, users might actually end up in better shape than before, given the brand-new battery.

So I was feeling relatively sunny toward the giant monopolist when I decided to spring for the 50 simoleons. My thinking was that 1) I didn’t want to lose my phone for a couple of weeks, 2) the update might not be that bad, in which case I’d be ahead by 50 bucks, and 3) I could always put the money towards a battery replacement if assumption No. 2 turned out to be mistaken.

The naïveté of youth!

I selected my $50 “appeasement” through an online form, and two days later, I received an email from Bharath on the Google Support Team.

Bharath wanted me to know that I was eligible for the money and it would soon be in my hands… once I performed a small, almost trivial task: giving some company I had never heard of my name, address, phone number, Social Security number, date of birth, and bank account details.

About that $50…

Google was not, in fact, just “sending” me $50. I had expected, since the problem involved their phones and their update, that the solution would require little or nothing from me. A check or prepaid credit card would arrive in the mail, perhaps, or a drone might deliver a crisp new bill from the sky. I didn’t know and didn’t care, so long as it wasn’t my problem.

But it was my problem. To get the cash, I had to create an account with something called “Payoneer.” This is apparently a reputable payments company, but I had never heard of it, and much about its operations is unclear. For instance, I was given three different ways to sign up depending on whether I 1) “already have a Payoneer account from Google,” 2) “don’t have an account,” or 3) “do have a Payoneer account that was not provided nor activated through Google.”

Say what now?

And though Google promised “no transaction fees,” Payoneer appears to charge an “annual account fee” of $29.95… but only to accounts that receive less than $2,000 through Payoneer in any consecutive 12-month period.

Does this fee apply to me if I sign up through the Google offer? I was directed to Payoneer support with any questions, but the company’s FAQ on the annual account fee doesn’t say.

If the fee does apply to me, do I need to sign up for a Payoneer account, give them all of my most personal financial information, wait the “10 to 18 business days” that Google says it will take to get my money, and then return to Payoneer so that I can cancel my account before racking up some $30 charge a year from now? And I’m supposed to do all this just to get… fifty bucks? One time?

It was far simpler for me to get a recent hundred-dollar rebate on a washing machine… and they didn’t need my SSN or bank account information.

(Reddit users also report that, if you use the wrong web browser to cancel your Payoneer account, you’re hit with an error that says: “This end point requires that the body of all requests be formatted as JSON.”)

Like Lando Calrissian, I realized that this deal was getting worse all the time.

I planned to write Bharath back to switch my “appeasement,” but then I noticed the fine print: No changes are possible after making a selection.

So—no money for me. On the scale of life’s crises, losing $50 is a minor one, and I resolved to move on, facing the world with a cheerful heart and a clear mind, undistracted by the many small annoyances our high-tech overlords continually strew upon the path.

Then the software update arrived.

A decimation situation

When Google said that the new Pixel 4a update would “reduce your battery’s runtime and charging performance,” it was not kidding. Indeed, the update basically destroyed the battery.

Though my phone was three years old, until January of this year, the battery still held up for all-day usage. The screen was nice, the (smallish) phone size was good, and the device remained plenty fast at all the basic tasks: texting, emails, web browsing, snapping photos. I’m trying to reduce both my consumerism and my e-waste, so I was planning to keep the device for at least another year. And even then, it would make a decent hand-me-down device for my younger kids.

After the update, however, the phone burned through a full battery charge in less than two hours. I could pull up a simple podcast app, start playing an episode, and watch the battery percentage decrement every 45 seconds or so. Using the phone was nearly impossible unless one was near a charging cable at all times.

To recap: My phone was shot, I had to jump through several hoops to get my money, and I couldn’t change my “appeasement” once I realized that it wouldn’t work for me.

Within the space of three days, I went from 1) being mildly annoyed at the prospect of having my phone messed with remotely to 2) accepting that Google was (probably) doing it for my own safety and was committed to making things right to 3) berating Google for ruining my device and then using a hostile, data-collecting “appeasement” program to act like it cared. This was probably not the impression Google hoped to leave in people’s minds when issuing the Pixel 4a update.

Pixel 4a, disassembled, with two fingers holding its battery above the front half.

Removing the Pixel 4a’s battery can be painful, but not as painful as catching fire. Credit: iFixit

Cheap can be quite expensive

The update itself does not appear to be part of some plan to spy on us or to extract revenue but rather to keep people safe. The company tried to remedy the pain with options that, on the surface, felt reasonable, especially given the fact that batteries are well-known as consumable objects that degrade over time. And I’ve had three solid years of service with the 4a, which wasn’t especially expensive to begin with.

That said, I do blame Google in general for the situation. The inflexibility of the approach, the options that aren’t tailored for ease of use in specific countries, the outsourced tech support—these are all hallmarks of today’s global tech behemoths.

It is more efficient, from an algorithmic, employ-as-few-humans-as-possible perspective, to operate “at scale” by choosing global technical solutions over better local options, by choosing outsourced email support, by trying to avoid fraud (and employee time) through preventing program changes, by asking the users to jump through your hoops, by gobbling up ultra-sensitive information because it makes things easier on your end.

While this makes a certain kind of sense, it’s not fun to receive this kind of “efficiency.” When everything goes smoothly, it’s fine—but whenever there’s a problem, or questions arise, these kinds of “efficient, scalable” approaches usually just mean “you’re about to get screwed.”

In the end, Google is willing to pay me $50, but that money comes with its own cost. I’m not willing to pay with my time nor with the risk of my financial information, and I will increasingly turn to companies that offer a better experience, that care more about data privacy, that build with higher-quality components, and that take good care of customers.

No company is perfect, of course, and this approach costs a bit more, which butts up against my powerful urge to get a great deal on everything. I have to keep relearning the old lesson—as I am once again with this Pixel 4a fiasco—that cheap gear is not always the best value in the long run.


After borking my Pixel 4a battery, Google borks me, too Read More »

italy-demands-google-poison-dns-under-strict-piracy-shield-law

Italy demands Google poison DNS under strict Piracy Shield law

Spotted by TorrentFreak, AGCOM Commissioner Massimiliano Capitanio took to LinkedIn to celebrate the ruling, as well as the existence of the Italian Piracy Shield. “The Judge confirmed the value of AGCOM’s investigations, once again giving legitimacy to a system for the protection of copyright that is unique in the world,” said Capitanio.

Capitanio went on to complain that Google has routinely ignored AGCOM’s listing of pirate sites, which are supposed to be blocked in 30 minutes or less under the law. He noted the violation was so clear-cut that the order was issued without giving Google a chance to respond, known as inaudita altera parte in Italian courts.

This decision follows a similar case against Internet backbone firm Cloudflare. In January, the Court of Milan found that Cloudflare’s CDN, DNS server, and WARP VPN were facilitating piracy. The court threatened Cloudflare with fines of up to 10,000 euros per day if it did not begin blocking the sites.

Google could face similar sanctions, but AGCOM has had difficulty getting international tech behemoths to acknowledge their legal obligations in the country. We’ve reached out to Google for comment and will update this report if we hear back.
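In practice, “poisoning DNS” means a resolver either refuses to answer for a listed domain or returns an address it does not really hold, so the same name resolves differently depending on whose resolver you ask. A small illustrative sketch using the dnspython library; the domain and the second resolver address are placeholders, not real entries from AGCOM’s list.

```python
import dns.resolver

def lookup(domain: str, nameserver: str):
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    try:
        return [rr.to_text() for rr in resolver.resolve(domain, "A")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer) as exc:
        return f"blocked or unresolvable ({exc.__class__.__name__})"

domain = "blocked-site.example"                            # placeholder domain
print("Google public DNS:", lookup(domain, "8.8.8.8"))
print("Blocking resolver:", lookup(domain, "192.0.2.53"))  # placeholder resolver
```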

Italy demands Google poison DNS under strict Piracy Shield law Read More »

apple-and-google-in-the-hot-seat-as-european-regulators-ignore-trump-warnings

Apple and Google in the hot seat as European regulators ignore Trump warnings

The European Commission is not backing down from efforts to rein in Big Tech. In a series of press releases today, the European Union’s executive arm has announced actions against both Apple and Google. Regulators have announced that Apple will be required to open up support for non-Apple accessories on the iPhone, but it may be too late for Google to make changes. The commission says the search giant has violated the Digital Markets Act, which could lead to a hefty fine.

Since returning to power, Donald Trump has railed against European regulations that target US tech firms. In spite of rising tensions and tough talk, the European Commission seems unfazed and is continuing to follow its more stringent laws, like the Digital Markets Act (DMA). This landmark piece of EU legislation aims to make the digital economy more fair. Upon coming into force last year, the act labeled certain large tech companies, including Apple and Google, as “gatekeepers” that are subject to additional scrutiny.

Europe’s more aggressive regulation of Big Tech is why iPhone users on the continent can install apps from third-party app markets while the rest of us are stuck with the Apple App Store. As for Google, the European Commission has paid special attention to search, Android, and Chrome, all of which dominate their respective markets.

Apple’s mobile platform plays second fiddle to Android in Europe, but it’s large enough to make the company subject to the DMA. The EU has now decreed that Apple is not doing enough to support interoperability on its platform. As a result, it will be required to make several notable changes. Apple will have to provide other companies and developers with improved access to iOS for devices like smartwatches, headphones, and TVs. This could include integration with notifications, faster data transfers, and streamlined setup.

The commission is also forcing Apple to release additional technical documentation, communication, and notifications for upcoming features for third parties. The EU believes this change will encourage more companies to build products that integrate with the iPhone, giving everyone more options aside from Apple’s.

Regulators say both sets of measures are the result of a public comment period that began late last year. We’ve asked Apple for comment on this development but have not heard back as of publication time. Apple is required to make these changes, and failing to do so could lead to fines. However, Google is already there.

Apple and Google in the hot seat as European regulators ignore Trump warnings Read More »

gemini-gets-new-coding-and-writing-tools,-plus-ai-generated-“podcasts”

Gemini gets new coding and writing tools, plus AI-generated “podcasts”

On the heels of its release of new Gemini models last week, Google has announced a pair of new features for its flagship AI product. Starting today, Gemini has a new Canvas feature that lets you draft, edit, and refine documents or code. Gemini is also getting Audio Overviews, a neat capability that first appeared in the company’s NotebookLM product, but it’s getting even more useful as part of Gemini.

Canvas is similar (confusingly) to the OpenAI product of the same name. Canvas is available in the Gemini prompt bar on the web and mobile app. Simply upload a document and tell Gemini what you need to do with it. In Google’s example, the user asks for a speech based on a PDF containing class notes. And just like that, Gemini spits out a document.

Canvas lets you refine the AI-generated documents right inside Gemini. The writing tools available across the Google ecosystem, with options like suggested edits and different tones, are available inside the Gemini-based editor. If you want to do more edits or collaborate with others, you can export the document to Google Docs with a single click.

Gemini Canvas with tic-tac-toe game

Credit: Google

Canvas is also adept at coding. Just ask, and Canvas can generate prototype web apps, Python scripts, HTML, and more. You can ask Gemini about the code, make alterations, and even preview your results in real time inside Gemini as you (or the AI) make changes.

Gemini gets new coding and writing tools, plus AI-generated “podcasts” Read More »

google-inks-$32-billion-deal-to-buy-security-firm-wiz-even-as-doj-seeks-breakup

Google inks $32 billion deal to buy security firm Wiz even as DOJ seeks breakup

“While a tough regulatory climate in 2024 had hampered such large-scale deals, Wall Street is optimistic that a shift in antitrust policies under US President Donald Trump could reignite dealmaking momentum,” Reuters wrote today.

Google reportedly agreed to a $3.2 billion breakup fee that would be paid to Wiz if the deal collapses. A Financial Times report said the breakup fee is unusually large as it represents 10 percent of the total deal value, instead of the typical 2 or 3 percent. The large breakup fee “shows how technology companies are still bracing themselves for pushback from antitrust regulators, even under President Donald Trump and his new Federal Trade Commission chair Andrew Ferguson,” the article said.

Wiz co-founder and CEO Assaf Rappaport wrote today that although the plan is for Wiz to become part of Google Cloud, the companies both believe that “Wiz needs to remain a multicloud platform… We will still work closely with our great partners at AWS, Azure, Oracle, and across the entire industry.”

Google Cloud CEO Thomas Kurian wrote that Wiz’s platform would fill a gap in Google’s security offerings. Google products already “help customers detect and respond to attackers through both SaaS-based services and cybersecurity consulting,” but Wiz is different because it “connects to all major clouds and code environments to help prevent incidents from happening in the first place,” he wrote.

“Wiz’s solution rapidly scans the customer’s environment, constructing a comprehensive graph of code, cloud resources, services, and applications—along with the connections between them,” Kurian wrote. “It identifies potential attack paths, prioritizes the most critical risks based on their impact, and empowers enterprise developers to secure applications before deployment. It also helps security teams collaborate with developers to remediate risks in code or detect and block ongoing attacks.”
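The “graph plus attack paths” idea Kurian describes is easy to picture: treat resources as nodes, treat reachability and permissions as directed edges, then search for paths from anything internet-exposed to anything sensitive. A toy sketch with networkx; the inventory and edges are invented for illustration and have nothing to do with Wiz’s actual data model.

```python
import networkx as nx

# Toy cloud inventory: nodes are resources, edges mean "can reach / can access".
graph = nx.DiGraph()
graph.add_edges_from([
    ("internet", "public-load-balancer"),
    ("public-load-balancer", "web-vm"),
    ("web-vm", "app-service-account"),
    ("app-service-account", "customer-db"),   # overly broad IAM grant
    ("web-vm", "internal-api"),
])

SENSITIVE = {"customer-db"}

for target in SENSITIVE:
    for path in nx.all_simple_paths(graph, source="internet", target=target):
        print("potential attack path:", " -> ".join(path))
```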

Google inks $32 billion deal to buy security firm Wiz even as DOJ seeks breakup Read More »

farewell-photoshop?-google’s-new-ai-lets-you-edit-images-by-asking.

Farewell Photoshop? Google’s new AI lets you edit images by asking.


New AI allows no-skill photo editing, including adding objects and removing watermarks.

A collection of images either generated or modified by Gemini 2.0 Flash (Image Generation) Experimental. Credit: Google / Ars Technica

There’s a new Google AI model in town, and it can generate or edit images as easily as it can create text—as part of its chatbot conversation. The results aren’t perfect, but it’s quite possible everyone in the near future will be able to manipulate images this way.

Last Wednesday, Google expanded access to Gemini 2.0 Flash’s native image-generation capabilities, making the experimental feature available to anyone using Google AI Studio. Previously limited to testers since December, the multimodal technology integrates both native text and image processing capabilities into one AI model.

The new model, titled “Gemini 2.0 Flash (Image Generation) Experimental,” flew somewhat under the radar last week, but it has been garnering more attention over the past few days due to its ability to remove watermarks from images, albeit with artifacts and a reduction in image quality.

That’s not the only trick. Gemini 2.0 Flash can add objects, remove objects, modify scenery, change lighting, attempt to change image angles, zoom in or out, and perform other transformations—all to varying levels of success depending on the subject matter, style, and image in question.

To pull it off, Google trained Gemini 2.0 on a large dataset of images (converted into tokens) and text. The model’s “knowledge” about images occupies the same neural network space as its knowledge about world concepts from text sources, so it can directly output image tokens that get converted back into images and fed to the user.

Adding a water-skiing barbarian to a photograph with Gemini 2.0 Flash.

Adding a water-skiing barbarian to a photograph with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Incorporating image generation into an AI chat isn’t itself new—OpenAI integrated its image-generator DALL-E 3 into ChatGPT last September, and other tech companies like xAI followed suit. But until now, every one of those AI chat assistants called on a separate diffusion-based AI model (which uses a different synthesis principle than LLMs) to generate images, which were then returned to the user within the chat interface. In this case, Gemini 2.0 Flash is both the large language model (LLM) and AI image generator rolled into one system.

Interestingly, OpenAI’s GPT-4o is capable of native image output as well (and OpenAI President Greg Brockman teased the feature at one point on X last year), but that company has yet to release true multimodal image output capability. One possible reason is that true multimodal image output is very computationally expensive, since each image either inputted or generated is composed of tokens that become part of the context that runs through the image model again and again with each successive prompt. And given the compute needs and size of the training data required to create a truly visually comprehensive multimodal model, the output quality of the images isn’t necessarily as good as diffusion models just yet.
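The core idea of “one system” is that text tokens and image tokens share a single output vocabulary, so the same autoregressive model can emit either kind, and image tokens get routed to a decoder that turns them back into pixels. The sketch below is purely conceptual; it is not Google’s architecture, and the vocabulary sizes are invented. It only shows how one token stream can carry both kinds of output.

```python
# Conceptual sketch only: one token stream carrying both text and image tokens.
TEXT_VOCAB = 50_000        # ordinary text token ids: 0 .. 49_999
IMAGE_VOCAB = 8_192        # discrete image codes:    50_000 .. 58_191

def is_image_token(token_id: int) -> bool:
    return token_id >= TEXT_VOCAB

def route(token_ids: list[int]) -> None:
    """Send each generated token to the text detokenizer or the image decoder."""
    text_ids = [t for t in token_ids if not is_image_token(t)]
    image_codes = [t - TEXT_VOCAB for t in token_ids if is_image_token(t)]
    print(f"{len(text_ids)} text tokens -> text detokenizer")
    print(f"{len(image_codes)} image codes -> image decoder (e.g. a VQ-style decoder)")

route([12, 873, 4_501, 50_001, 50_002, 57_000])
```

This is also why the compute cost stacks up: every image the model has seen or produced stays in the context as tokens for the rest of the conversation.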

Creating another angle of a person with Gemini 2.0 Flash.

Creating another angle of a person with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Another reason OpenAI has held back may be “safety”-related: In a similar way to how multimodal models trained on audio can absorb a short clip of a sample person’s voice and then imitate it flawlessly (this is how ChatGPT’s Advanced Voice Mode works, with a clip of a voice actor it is authorized to imitate), multimodal image output models are capable of faking media reality in a relatively effortless and convincing way, given proper training data and compute behind it. With a good enough multimodal model, potentially life-wrecking deepfakes and photo manipulations could become even more trivial to produce than they are now.

Putting it to the test

So, what exactly can Gemini 2.0 Flash do? Notably, its support for conversational image editing allows users to iteratively refine images through natural language dialogue across multiple successive prompts. You can talk to it and tell it what you want to add, remove, or change. It’s imperfect, but it’s the beginning of a new type of native image editing capability in the tech world.

We gave Gemini Flash 2.0 a battery of informal AI image-editing tests, and you’ll see the results below. For example, we removed a rabbit from an image in a grassy yard. We also removed a chicken from a messy garage. Gemini fills in the background with its best guess. No need for a clone brush—watch out, Photoshop!

We also tried adding synthesized objects to images. Ever wary of the collapse of media reality, a scenario called the “cultural singularity,” we added a UFO to a photo the author took from an airplane window. Then we tried adding a Sasquatch and a ghost. The results were unrealistic, but this model was also trained on a limited image dataset (more on that below).

Adding a UFO to a photograph with Gemini 2.0 Flash. Google / Benj Edwards

We then added a video game character to a photo of an Atari 800 screen (Wizard of Wor), resulting in perhaps the most realistic image synthesis result in the set. You might not see it here, but Gemini added realistic CRT scanlines that matched the monitor’s characteristics pretty well.

Adding a monster to an Atari video game with Gemini 2.0 Flash.

Adding a monster to an Atari video game with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Gemini can also warp an image in novel ways, like “zooming out” of an image into a fictional setting or giving an EGA-palette character a body, then sticking him into an adventure game.

“Zooming out” on an image with Gemini 2.0 Flash. Google / Benj Edwards

And yes, you can remove watermarks. We tried removing a watermark from a Getty Images image, and it worked, although the resulting image is nowhere near the resolution or detail quality of the original. Ultimately, if your brain can picture what an image is like without a watermark, so can an AI model. It fills in the watermark space with the most plausible result based on its training data.

Removing a watermark with Gemini 2.0 Flash.

Removing a watermark with Gemini 2.0 Flash. Credit: Nomadsoul1 via Getty Images

And finally, we know you’ve likely missed seeing barbarians beside TV sets (as per tradition), so we gave that a shot. Originally, Gemini didn’t add a CRT TV set to the barbarian image, so we asked for one.

Adding a TV set to a barbarian image with Gemini 2.0 Flash.

Adding a TV set to a barbarian image with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Then we set the TV on fire.

Setting the TV set on fire with Gemini 2.0 Flash.

Setting the TV set on fire with Gemini 2.0 Flash. Credit: Google / Benj Edwards

All in all, it doesn’t produce images of pristine quality or detail, but we literally did no editing work on these images other than typing requests. Adobe Photoshop currently lets users manipulate images using AI synthesis based on written prompts with “Generative Fill,” but it’s not quite as natural as this. We could see Adobe adding a more conversational AI image-editing flow like this one in the future.

Multimodal output opens up new possibilities

Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity throughout multiple images. It’s far from perfect, but character consistency is a new capability in AI assistants. We tried it out and it was pretty wild—especially when it generated a view of a photo we provided from another angle.

Creating a multi-image story with Gemini 2.0 Flash, part 1. Google / Benj Edwards

Text rendering represents another potential strength of the model. Google claims that internal benchmarks show Gemini 2.0 Flash performs better than “leading competitive models” when generating images containing text, making it potentially suitable for creating content with integrated text. From our experience, the results weren’t that exciting, but they were legible.

An example of in-image text rendering generated with Gemini 2.0 Flash.

An example of in-image text rendering generated with Gemini 2.0 Flash. Credit: Google / Ars Technica

Despite Gemini 2.0 Flash’s shortcomings so far, the emergence of true multimodal image output feels like a notable moment in AI history because of what it suggests if the technology continues to improve. If you imagine a future, say 10 years from now, where a sufficiently complex AI model could generate any type of media in real time—text, images, audio, video, 3D graphics, 3D-printed physical objects, and interactive experiences—you basically have a holodeck, but without the matter replication.

Coming back to reality, it’s still “early days” for multimodal image output, and Google recognizes that. Recall that Flash 2.0 is intended to be a smaller AI model that is faster and cheaper to run, so it hasn’t absorbed the entire breadth of the Internet. All that information takes a lot of space in terms of parameter count, and more parameters means more compute. Instead, Google trained Gemini 2.0 Flash by feeding it a curated dataset that also likely included targeted synthetic data. As a result, the model does not “know” everything visual about the world, and Google itself says the training data is “broad and general, not absolute or complete.”

That’s just a fancy way of saying that the image output quality isn’t perfect—yet. But there is plenty of room for improvement in the future to incorporate more visual “knowledge” as training techniques advance and compute drops in cost. If the process becomes anything like we’ve seen with diffusion-based AI image generators like Stable Diffusion, Midjourney, and Flux, multimodal image output quality may improve rapidly over a short period of time. Get ready for a completely fluid media reality.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Farewell Photoshop? Google’s new AI lets you edit images by asking. Read More »

rcs-texting-updates-will-bring-end-to-end-encryption-to-green-bubble-chats

RCS texting updates will bring end-to-end encryption to green bubble chats

One of the best mostly invisible updates in iOS 18 was Apple’s decision to finally implement the Rich Communications Services (RCS) communication protocol, something that is slowly helping to fix the generally miserable experience of texting non-iPhone users with an iPhone. The initial iOS 18 update brought RCS support to most major carriers in the US, and the upcoming iOS 18.4 update is turning it on for a bunch of smaller prepaid carriers like Google Fi and Mint Mobile.

Now that Apple is on board, iPhones and their users can also benefit from continued improvements to the RCS standard. And one major update was announced today: RCS will now support end-to-end encryption using the Messaging Layer Security (MLS) protocol, a standard finalized by the Internet Engineering Task Force in 2023.

“RCS will be the first large-scale messaging service to support interoperable E2EE between client implementations from different providers,” writes GSMA Technical Director Tom Van Pelt in the post announcing the updates. “Together with other unique security features such as SIM-based authentication, E2EE will provide RCS users with the highest level of privacy and security for stronger protection from scams, fraud and other security and privacy threats.”
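MLS itself is a group key-agreement protocol (built around a tree of keys so large groups can rotate keys efficiently) and is considerably more involved than anything that fits here. As a conceptual illustration of only the end-to-end part, the sketch below uses the Python cryptography package to show two endpoints deriving the same secret from an X25519 exchange and then passing an AES-GCM-encrypted message, with the carrier never holding the key; none of this is the actual RCS or MLS wire format.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each endpoint generates its own key pair; only public keys cross the network.
alice_priv = X25519PrivateKey.generate()
bob_priv = X25519PrivateKey.generate()

def derive_key(my_priv, their_pub) -> bytes:
    shared = my_priv.exchange(their_pub)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"demo-session").derive(shared)

k_alice = derive_key(alice_priv, bob_priv.public_key())
k_bob = derive_key(bob_priv, alice_priv.public_key())
assert k_alice == k_bob  # both ends hold the same key; the carrier never does

nonce = os.urandom(12)
ciphertext = AESGCM(k_alice).encrypt(nonce, b"green bubble, private text", None)
print(AESGCM(k_bob).decrypt(nonce, ciphertext, None))
```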

RCS texting updates will bring end-to-end encryption to green bubble chats Read More »

google-joins-openai-in-pushing-feds-to-codify-ai-training-as-fair-use

Google joins OpenAI in pushing feds to codify AI training as fair use

Google’s position on AI regulation: Trust us, bro

If there was any doubt about Google’s commitment to move fast and break things, its new policy position should put that to rest. “For too long, AI policymaking has paid disproportionate attention to the risks,” the document says.

Google urges the US to invest in AI not only with money but with business-friendly legislation. The company joins the growing chorus of AI firms calling for federal legislation that clarifies how they can operate. It points to the difficulty of complying with a “patchwork” of state-level laws that impose restrictions on AI development and use. If you want to know what keeps Google’s policy wonks up at night, look no further than the vetoed SB-1047 bill in California, which would have enforced AI safety measures.

AI ethics or AI Law concept. Developing AI codes of ethics. Compliance, regulation, standard, business policy and responsibility for guarding against unintended bias in machine learning algorithms.

Credit: Parradee Kietsirikul

According to Google, a national AI framework that supports innovation is necessary to push the boundaries of what artificial intelligence can do. Taking a page from the gun lobby, Google opposes attempts to hold the creators of AI liable for the way those models are used. Generative AI systems are non-deterministic, making it impossible to fully predict their output. Google wants clearly defined responsibilities for AI developers, deployers, and end users—it would, however, clearly prefer most of those responsibilities fall on others. “In many instances, the original developer of an AI model has little to no visibility or control over how it is being used by a deployer and may not interact with end users,” the company says.

There are efforts underway in some countries that would implement stringent regulations that force companies like Google to make their tools more transparent. For example, the EU’s AI Act would require AI firms to publish an overview of training data and possible risks associated with their products. Google believes this would force the disclosure of trade secrets that would allow foreign adversaries to more easily duplicate its work, mirroring concerns that OpenAI expressed in its policy proposal.

Google wants the government to push back on these efforts at the diplomatic level. The company would like to be able to release AI products around the world, and the best way to ensure it has that option is to promote light-touch regulation that “reflects US values and approaches.” That is, Google’s values and approaches.

Google joins OpenAI in pushing feds to codify AI training as fair use Read More »