Highlights

ai-#69:-nice

AI #69: Nice

Nice job breaking it, hero, unfortunately. Ilya Sutskever, despite what I sincerely believe are the best of intentions, has decided to be the latest to do The Worst Possible Thing, founding a new AI company explicitly looking to build ASI (superintelligence). The twists are zero products with a ‘cracked’ small team, which I suppose is an improvement, and calling it Safe Superintelligence, which I do not suppose is an improvement.

How is he going to make it safe? His statements tell us nothing meaningful about that.

There were also changes to SB 1047. Most of them can be safely ignored. The big change is getting rid of the limited duty exception, because it seems I was one of about five people who understood it, and everyone kept thinking it was a requirement for companies instead of an opportunity. And the literal chamber of commerce fought hard to kill the opportunity. So now that opportunity is gone.

Donald Trump talked about AI. He has thoughts.

Finally, if it is broken, and perhaps the it is ‘your cybersecurity,’ how about fixing it? Thus, a former NSA director joins the board of OpenAI. A bunch of people are not happy about this development, and yes I can imagine why. There is a history, perhaps.

Remaining backlog update: I still owe updates on the OpenAI Model spec, Rand report and Seoul conference, and eventually The Vault. We’ll definitely get the model spec next week, probably on Monday, and hopefully more. Definitely making progress.

Other AI posts this week: On DeepMind’s Frontier Safety Framework, OpenAI #8: The Right to Warn, and The Leopold Model: Analysis and Reactions.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. DeepSeek could be for real.

  4. Language Models Don’t Offer Mundane Utility. Careful who you talk to about AI.

  5. Fun With Image Generation. His full story can finally be told.

  6. Deepfaketown and Botpocalypse Soon. Every system will get what it deserves.

  7. The Art of the Jailbreak. Automatic red teaming. Requires moderation.

  8. Copyright Confrontation. Perplexity might have some issues.

  9. A Matter of the National Security Agency. Paul Nakasone joins OpenAI board.

  10. Get Involved. GovAI is hiring. Your comments on SB 1047 could help.

  11. Introducing. Be the Golden Gate Bridge, or anything you want to be.

  12. In Other AI News. Is it time to resign?

  13. Quiet Speculations. The quest to be situationally aware shall continue.

  14. AI Is Going to Be Huuuuuuuuuuge. So sayeth The Donald.

  15. SB 1047 Updated Again. No more limited duty exemption. Democracy, ya know?

  16. The Quest for Sane Regulation. Pope speaks truth. Mistral CEO does not.

  17. The Week in Audio. A few new options.

  18. The ARC of Progress. Francois Chollet goes on Dwarkesh, offers $1mm prize.

  19. Put Your Thing In a Box. Do not open the box. I repeat. Do not open the box.

  20. What Will Ilya Do? Alas, create another company trying to create ASI.

  21. Actual Rhetorical Innovation. Better names might be helpful.

  22. Rhetorical Innovation. If at first you don’t succeed.

  23. Aligning a Smarter Than Human Intelligence is Difficult. How it breaks down.

  24. People Are Worried About AI Killing Everyone. But not maximally worried.

  25. Other People Are Not As Worried About AI Killing Everyone. Here they are.

  26. The Lighter Side. It cannot hurt to ask.

Coding rankings dropped from the new BigCodeBench (blog) (leaderboard)

Three things jump out.

  1. GPT-4o is dominating by an amount that doesn’t match people’s reports of practical edge. I saw a claim that it is overtrained on vanilla Python, causing it to test better than it plays in practice. I don’t know.

  2. The gap from Gemini 1.5 Flash to Gemini 1.5 Pro and GPT-4-Turbo is very small. Gemini Flash is looking great here.

  3. DeepSeek-Coder-v2 is super impressive. The Elo tab gives a story where it does somewhat worse, but even there the performance is impressive. This is one of the best signs so far that China can do something competitive in the space, if this benchmark turns out to be good.

The obvious note is that DeepSeek-Coder-v2, which is 236B with 21B active experts, 128k context length, 338 programming languages, was released one day before the new rankings. Also here is a paper, reporting it does well on standard benchmarks but underperforms on instruction-following, which leads to poor performance on complex scenarios and tasks. I leave it to better coders to tell me what’s up here.

There is a lot of bunching of Elo results, both here and in the traditional Arena rankings. I speculate that as people learn about LLMs, a large percentage of queries are things LLMs are known to handle, so which answer gets chosen becomes a stylistic coin flip reasonably often among decent models? We have for example Sonnet winning something like 40% of the time against Opus, so Sonnet is for many purposes ‘good enough.’

From Venkatesh Rao: Arbitrariness costs as a key form of transaction costs. As things get more complex we have to store more and more arbitrary details in our head. If we want to do [new thing] we need to learn more of those details. That is exhausting and annoying. So often we stick to the things where we know the arbitrary stuff already.

He is skeptical AI Fixes This. I am less skeptical. One excellent use of AI is to ask it about the arbitrary things in life. If it was in the training data, or you can provide access to the guide, then the AI knows. Asking is annoying, but miles less annoying than not knowing. Soon we will have agents like Apple Intelligence to tell you with a better interface, or increasingly do all of it for you. That will match the premium experiences that take this issue away.

What searches are better with AI than Google Search? Patrick McKenzie says not yet core searches, but a lot of classes of other things, such as ‘tip of the tongue’ searches.

Hook GPT-4 up to your security cameras and home assistant, and find lost things. If you are already paying the ‘creepy tax’ then why not? Note that this need not be on except when you need it.

Claude’s dark spiritual AI futurism from Jessica Taylor.

Fine tuning, for style or a character, works and is at least great fun. Why, asks Sarah Constantin, are more people not doing it? Why aren’t they sharing the results?

Gallabytes recommends Gemini Flash and Paligemma 3b, is so impressed by small models he mostly stopped using the ‘big slow’ ones except he still uses Claude when he needs to use PDF inputs. My experience is different, I will continue to go big, but small models have certainly improved.

It would be cool if we could able to apply LLMs to all books, Alex Tabarrok demands all books come with a code that will unlock eText capable of being read by an LLM. If you have a public domain book NotebookLM can let you read while asking questions with inline citations (link explains how) and jump to supporting passages and so on, super cool. Garry Tan originally called this ‘Perplexity meets Kindle.’ That name is confusing to me except insofar as he has invested in Perplexity, since Perplexity does not have the functionality you want here, Gemini 1.5 and Claude do.

The obvious solution is for Amazon to do a deal with either Google or Anthropic to incorporate this ability into the Kindle. Get to work, everyone.

I Will Fucking Piledrive You If You Mention AI Again. No, every company does not need an AI strategy, this righteously furious and quite funny programmer informs us. So much of it is hype and fake. They realize all this will have big impacts, and even think an intelligence explosion and existential risk are real possibilities, but that says nothing about what your company should be doing. Know your exact use case, or wait, doing fake half measures won’t make you more prepared down the line. I think that depends how you go about it. If you are gaining core competencies and familiarities, that’s good. If you are scrambling with outside contractors for an ‘AI strategy’ then not so much.

Creativity Has Left the Chat: The Price of Debiasing Language Models. RLHF on Llama-2 greatly reduced its creativity, making it more likely output would tend towards a small number of ‘attractor states.’ While the point remains, it does seem like Llama-2’s RLHF was especially ham-handed. Like anything else, you can do RLHF well or you can do it poorly. If you do it well, you still pay a price, but nothing like the price you pay when you do it badly. The AI will learn what you teach it, not what you were trying to teach it.

Your mouse does not need AI.

Near: please stop its physically painful

i wonder what it was like to be in the meeting for this

“we need to add AI to our mouse”

“oh. uhhh. uhhnhmj. what about an AI…button?”

“genius! but we don’t have any AI products :(“

Having a dedicated button on mouse or keyboard that says ‘turn on the PC microphone so you can input AI instructions’ seems good? Yes, the implementation is cringe, but the button itself is fine. The world needs more buttons.

The distracted boyfriend, now caught on video, worth a look. Seems to really get it.

Andrej Karpathy: wow. The new model from Luma Labs AI extending images into videos is really something else. I understood intuitively that this would become possible very soon, but it’s still something else to see it and think through future iterations of.

A few more examples around, e.g. the girl in front of the house on fire.

As noted earlier, the big weakness for now is that the clips are very short. Within the time they last, they’re super sweet.

How to get the best results from Stable Diffusion 3. You can use very long prompts, but negative prompts don’t work.

New compression method dropped for images, it is lossy but wow is it tiny.

Ethan: so this is nuts, if you’re cool with the high frequency details of an image being reinterpreted/stochastic, you can encode an image quite faithfully into 32 tokens… with a codebook size of 1024 as they use this is just 320bits, new upper bound for the information in an image unlocked.

Eliezer Yudkowsky: Probably some people would have, if asked in advance, claimed that it was impossible for arbitrarily advanced superintelligences to decently compress real images into 320 bits. “You can’t compress things infinitely!” they would say condescendingly. “Intelligence isn’t magic!”

No, kids, the network did not memorize the images. They train on one set of images and test on a different set of images. This is standard practice in AI. I realize you may have reason not to trust in the adequacy of all Earth institutions, but “computer scientists in the last 70 years since AI was invented” are in fact smart enough to think sufficiently simple thoughts as “what if the program is just memorizing the training data”!

Davidad: Even *afterhaving seen it demonstrated, I will claim that it is impossible for arbitrarily advanced superintelligences to decently compress real 256×256 images into 320 bits. A BIP39 passphrase has 480 bits of entropy and fits very comfortably in a real 256×256 photo. [shows example]

Come to think of it, I could easily have added another 93 bits of entropy just by writing each word using a randomly selected one of my 15 distinctly coloured pens. To say nothing of underlining, capitalization, or diacritics.

Eliezer Yudkowsky: Yes, that thought had occurred to me. I do wonder what happens if we run this image through the system! I mostly expect it to go unintelligible. A sufficiently advanced compressor would return a image that looked just like this one but with a different passcode.

Right. It is theoretically impossible to actually encode the entire picture in 320 bits. There are a lot more pictures than that, including meaningfully different pictures. So this process will lose most details. It still says a lot about what can be done.

Did Suno’s release ‘change music forever?” James O’Malley gets overexcited. Yes, the AI can now write you mediocre songs, the examples can be damn good. I’m skeptical that this much matters. As James notes, authenticity is key to music. I would add so is convention, and coordination, and there being ‘a way the song goes,’ and so on. We already have plenty of ‘slop’ songs available. Yes, being able to get songs on a particular topic on demand with chosen details is cool, but I don’t think it meaningfully competes with most music until it gets actively better than what it is trying to replace. That’s harder. Even then, I’d be much more worried as a songwriter than as a performer.

Movies that change every time you watch? Joshua Hawkins finds it potentially interesting. Robin Hanson says no, offers to bet <1% of movie views for next 30 years. If you presume Robin’s prediction of a long period of ‘economic normal’ as given, then I agree that randomization in movies mostly is bad. Occasionally you have a good reason for some fixed variation (e.g. Clue) but mostly not and mostly it would be fixed variations. I think games and interactive movies where you make choices are great but are distinct art forms.

Patrick McKenzie warns about people increasingly scamming the government via private actors that the government trusts, which AI will doubtless turbocharge. The optimal amount of fraud is not zero, unless any non-zero amount plus AI means it would now be infinite, in which case you need to change your fraud policy.

This is in response to Mary Rose reporting that in her online college class a third of the students are AI-powered spambots.

Memorializing loved ones through AI. Ethicists object because that is their job.

Noah Smith: Half of Black Mirror episodes would actually just be totally fine and chill if they happened in real life, because the people involved wouldn’t be characters written by cynical British punk fans.

Make sure you know which half you are in. In this case, seems fine. I also note that if they save the training data, the AI can improve a lot over time.

Botpocalypse will now pause until the Russians pay their OpenAI API bill.

Haize Labs announces automatic red-teaming of LLMs. Thread discusses jailbreaks of all kinds. You’ve got text, image, video and voice. You’ve got an assistant saying something bad. And so on, there’s a repo, can apply to try it out here.

This seems like a necessary and useful project, assuming it is a good implementation. It is great to have an automatic tool to do the first [a lot] cycles of red-teaming while you try to at least deal with that. The worry is that they are overpromising, implying that once you pass their tests you will be good to go and actually secure. You won’t. You might be ‘good to go’ in the sense of good enough for 4-level models. You won’t be actually secure and you still need the human red teaming. The key is not losing sight of that.

Wired article about how Perplexity ignores the Robot Exclusion Protocol, despite claiming they will adhere to it, scraping areas of websites they have no right to scrape. Also its chatbot bullshits, which is not exactly a shock.

Dhruv Mehrotra and Tim Marchman (Wired): WIRED verified that the IP address in question is almost certainly linked to Perplexity by creating a new website and monitoring its server logs. Immediately after a WIRED reporter prompted the Perplexity chatbot to summarize the website’s content, the server logged that the IP address visited the site. This same IP address was first observed by Knight during a similar test.

In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year.

Perplexity denies the allegations in the strongest and most general terms, but the denial rings hollow. The evidence here seems rather strong.

OpenAI’s newest board member is General Paul Nakasone. He has led the NSA, and has had responsibility for American cyberdefense. He left service on February 2, 2024.

Sam Altman: Excited for general paul nakasone to join the OpenAI board for many reasons, including the critical importance of adding safety and security expertise as we head into our next phase.

OpenAI: Today, Retired U.S. Army General Paul M. Nakasone has joined our Board of Directors. A leading expert in cybersecurity, Nakasone’s appointment reflects OpenAI’s commitment to safety and security, and underscores the growing significance of cybersecurity as the impact of AI technology continues to grow.

As a first priority, Nakasone will join the Board’s Safety and Security Committee, which is responsible for making recommendations to the full Board on critical safety and security decisions for all OpenAI projects and operations.

The public reaction was about what you would expect. The NSA is not an especially popular or trusted institution. The optics, for regular people, were very not good.

TechCrunch: The high-profile addition is likely intended to satisfy critics who think that @OpenAI is moving faster than is wise for its customers and possibly humanity, putting out models and services without adequately evaluating their risks or locking them down

Shoshana Weissmann: For the love of gd can someone please force OpenAI to hire a fucking PR team? How stupid do they have to be?

The counterargument is that cybersecurity and other forms of security are desperately needed at OpenAI and other major labs. We need experts, especially from the government, who can help implement best practices and make the foreign spies at least work for it if they want to steal all the secrets. This is why Leopold called for ‘locking down the labs,’ and I strongly agree there needs to be far more of that then there has been.

There are some very good reasons to like this on principle.

Dan Elton: You gotta admit, the NSA does seem to be pretty good at cybersecurity. It’s hard to think of anyone in the world who would be better aware of the threat landscape than the head of the NSA. He just stepped down in Feb this year. Ofc, he is a people wrangler, not a coder himself.

Just learned they were behind the “WannaCry” hacking tool… I honestly didn’t know that. It caused billions in damage after hackers were able to steal it from the NSA.

Kim Dotcom: OpenAI just hired the guy who was in charge of mass surveillance at the NSA. He outsourced the illegal mass spying against Americans to British spy agencies to circumvent US law. He gave them unlimited spying access to US networks. Tells you all you need to know about OpenAI.

Cate Hall: It tells me they are trying at least a little bit to secure their systems.

Wall Street Silver: This is a huge red flag for OpenAI.

Former head of the National Security Agency, retired Gen. Paul Nakasone has joined OpenAI.

Anyone using OpenAI going forward, you just need to understand that the US govt has full operating control and influence over this app.

There is no other reason to add someone like that to your company.

Daniel Eth: I don’t think this is true, and anyway I think it’s a good sign that OpenAI may take cybersecurity more seriously in the future.

Bogdan Ionut Cirstea: there is an obvious other reason to ‘add someone like that’: good cybersecurity to protect model weights, algorithmic secrets, etc.

LessWrong discusses it here.

There are also reasons to dislike it, if you think this is about reassurance or how things look rather than an attempt to actually improve security. Or if you think it is a play for government contracts.

Or, of course, it could be some sort of grand conspiracy.

It also could be that the government insisted that something like this happen.

If so? It depends on why they did that.

If it was to secure the secrets? Good. This is the right kind of ‘assist and insist.’

Jeffrey Ladish thinks OpenAI Chief Scientist Jakub Pachocki has had his Twitter account hacked, as he says he is proud to announce the new token $OPENAI. This took over 19 hours, at minimum, to be removed.

If it was to steal our secrets? Not so good.

Mostly I take this as good news. OpenAI desperately needs to improve its cybersecurity. This is a way to start down the path of doing that.

If it makes people think OpenAI are acting like villains? Well, they are. So, bonus.

If you live in San Francisco, share your thoughts on SB 1047 here. I have been informed this is worth the time.

GovAI is hiring for Research Fellow and Research Scholars.

Remember Golden Gate Claude? Would you like to play with the API version of that for any feature at all? Apply with Anthropic here.

Chinese AI Safety Network, a cooperation platform for AI Safety across China.

OpenAI allows fine-tuning for function calling, with support for the ‘tools’ parameter.

OpenAI launches partnership with Color Health on cancer screening and treatment.

Belle Lin (WSJ): “Primary care doctors don’t tend to either have the time, or sometimes even the expertise, to risk-adjust people’s screening guidelines,” Laraki said.

There’s also a bunch of help with paperwork and admin, which is most welcome. Idea is to focus on a few narrow key steps and go from there. A lot of this sounds like ‘things that could and should have been done without an LLM and now we have an excuse to actually do them.’ Which, to be clear, is good.

Playgrounds for ChatGPT claims to be a semi-autonomous AI programmer that writes code for you and deploys it for you to test right in chat without downloads, configs or signups.

Joe Carlsmith publishes full Otherness sequence as a PDF.

TikTok symphony, their generative AI assistant for content creators. It can link into your account for context, and knows about what’s happening on TikTok. They also have a creative studio offering to generate video previews and offer translations and stock avatars and display cards and use AI editing tools. They are going all the way:

TikTok: Auto-Generation: Symphony creates brand new video content based on a product URL or existing assets from your account.

They call this ‘elevating human creativity’ with AI technology. I wonder what happens when they essentially invite low-effort AI content onto their platform en masse?

Meta shares four new AI models, Chameleon, JASCO for text-to-music, AudioSeal for detection of AI generated speech and Multi-Token Prediction for code completion. Details here, they also have some documentation for us.

MIRI parts ways with their agent foundations team, who will continue on their own.

Luke Muehlhauser explains he resigned from the Anthropic board because there was a conflict with his work at Open Philanthropy and its policy advocacy. I do not see that as a conflict. If being a board member at Anthropic was a conflict with advocating for strong regulations or considered by them a ‘bad look,’ then that potentially says something is very wrong at Anthropic as well. Yes, there is the ‘behind the scenes’ story but one not behind the scenes must be skeptical. More than that, I think Luke plausibly… chose the wrong role? I realize most board members are very part time, but I think the board of Anthropic was the more important assignment.

Hugging Face CEO says a growing number of AI startup founders are looking to sell, with this happening a lot more this year than in the past. No suggestion as to why. A lot of this could be ‘there are a lot more AI startups now.’

I am not going to otherwise link to it but Guardian published a pure hit piece about Lighthaven and Manifest that goes way beyond the rules of bounded distrust to be wildly factually inaccurate on so many levels I would not know where to begin.

Richard Ngo: For months I’ve had a thread in my drafts about how techies are too harsh on journalists. I’m just waiting to post it on a day when there isn’t an egregiously bad-faith anti-tech hit piece already trending. Surely one day soon, right?

The thread’s key point: tech is in fact killing newspapers, and it’s very hard for people in a dying industry to uphold standards. So despite how bad most journalism has become, techies have a responsibility to try save the good parts, which are genuinely crucial for society.

At this point, my thesis is that the way you save the good parts of journalism is by actually doing good journalism, in ways that make sense today, a statement I hope I can conclude with: You’re welcome.

Your periodic other reminder: Y saying things in bad faith about X does not mean X is now ‘controversial.’ It means Y is in bad faith. Nothing more.

Also, this is a valid counterpoint to ignoring it all:

Ronny Fernandez: This is article is like y’know, pretty silly, poorly written, and poorly researched, but I’m not one to stick my nose up at free advertising. If you would like to run an event at our awesome venue, please fill out an application at http://lighthaven.space!

It is quite an awesome venue.

Meta halts European AI model launch following Irish government’s request. What was the request?

Samuya Nigam (India TV): The decision was made after the Irish privacy regulator told it to delay its plan to harness data from Facebook and Instagram users.

At issue is Meta’s plan to use personal data to train its artificial intelligence (AI) models without seeking consent, the company said THAT it would use publicly available and licensed online information.

In other words:

  1. Meta was told it couldn’t use personal data to train its AIs without consent.

  2. Meta decided if it couldn’t do that it wasn’t worth launching its AI products.

  3. They could have launched the AI products without training on personal data.

  4. So this tells you a lot about why they are launching their AI products.

Various techniques allow LLMs to get as good at math as unaided frontier models. It all seems very straightforward, the kinds of things you would try and that someone finally got around to trying. Given that computers and algorithms are known to already often be good at math, it stands to reason (maybe this is me not understanding the difficulties?) that if you attach an LLM to algorithms of course it can then become good at math without itself even being that powerful?

Can we defend against adversarial attacks on advanced Go programs? Any given particular attack, yes. All attacks at all is still harder. You can make the attacks need to get more sophisticated as you go, but there is some kind of generalization that the AIs are missing, a form of ‘oh this must be some sort of trick to capture a large group and I am far ahead so I will create two Is in case I don’t get it.’ The core problem, in a sense, is arrogance, the ruthless efficiency of such programs, where they do the thing you often see in science fiction where the heroes start doing something weird and are obviously up to something, yet the automated systems or dumb villains ignore them.

The AI needs to learn the simple principle: Space Bunnies Must Die. As in, your opponent is doing things for a reason. If you don’t know the reason (for a card, or a move, or other strategy) then that means it is a Space Bunny. It Must Die.

Tony Wang offers his thoughts, that basic adversarial training is not being properly extended, and we need to make it more robust.

Sarah Constantin liveblogs reading Situational Awareness, time to break out the International Popcorn Reserve.

Mostly she echoes highly reasonable criticisms others (including myself have raised). Strangest claim I saw was doubting that superhuman persuasion was a thing. I see people doubt this and I am deeply confused how it could fail to be a thing, given we have essentially seen existence proofs among humans.

ChatGPT says the first known person to say the following quip was Ken Thompson, but who knows, and I didn’t remember hearing it before:

Sarah Constantin (not the first to say this): To paraphrase Gandhi:

“What do you think of computer security?”

“I think it would be a good idea.”

She offers sensible basic critiques of Leopold’s alignment ideas, pointing out that the techniques he discusses mostly aren’t even relevant to the problems we need to solve, while being strangely hopeful for ‘pleasant surprises.’

This made me smile:

Sarah Constantin: …but you are predicting that AI will increasingly *constituteall of our technology and industry

that’s a pretty crazy thing to just hand over, y’know???

did you have a really bad experience with Sam Altman or something?

She then moves on to doubt AI will be crucial enough to national security to merit The Project, she is generally skeptical that the ASI (superintelligence) will Be All That even if we get it. Leopold and I both think, as almost everyone does, that doubting ASI will show up is highly reasonable. But I find it highly bizarre to think, as many seem to predict, that ASI could show up and then not much would change. That to me seems like it is responding to a claim that X→Y with a claim of Not X. And again, maybe X and maybe Not X, but X→Y.

Dominic Cummings covers Leopold’s observations exactly how anyone who follows him would expect, saying we should assume anything in the AI labs will leak instantly. How can you take seriously anyone who says they are building worldchanging technology but doesn’t take security on that tech seriously?

My answer would be: Because seriousness does not mean that kind of situational awareness, these people do not think about security that way. It is not in their culture, by that standard essentially no one in the West is serious period. Then again, Dominic and Leopold (and I) would bite that bullet, that in the most important sense almost no one is a serious person, there are no Reasonable Authority Figures available, etc. That’s the point.

In other not necessarily the news, on the timing of GPT-5, which is supposed to be in training now:

Davidad: Your periodic PSA that the GPT-4 pretraining run took place from ~January 2022 to August 2022.

Dean Ball covers Apple Intelligence, noting the deep commitment to privacy and how it is not so tied to OpenAI or ChatGPT after all, and puts it in context of two visions of the future. Leopold’s vision is the drop-in worker, or a system that can do anything you want if you ask it in English. Apple and Microsoft see AI as a layer atop the operating system, with the underlying model not so important. Dean suggests these imply different policy approaches.

My response would be that there is no conflict here. Apple and Microsoft have found a highly useful (if implemented well and securely) application of AI, and a plausible candidate for the medium term killer app. It is a good vision in both senses. For that particular purpose, you can mostly use a lightweight model, and for now you are wise to do so, with callouts to bigger ones when needed, which is the plan.

That has nothing to do with whether Leopold’s vision can be achieved in the future. My baseline scenario is that this will become part of your computer’s operating system and your tech stack in ways that mostly call small models, along with our existing other uses of larger models. Then, over time, the AIs get capable of doing more complex tasks and more valuable tasks as well.

Dean Ball: Thus this conflict of visions does not boil down to whether you think AI will transform human affairs. Instead it is a more specific difference in how one models historical and technological change and one’s philosophical conception of “intelligence”: Is superintelligence a thing we will invent in a lab, or will it be an emergent result of everyone on Earth getting a bit smarter and faster with each passing year? Will humans transform the world with AI, or will AI transform the world on its own?

The observation that human affairs are damn certain to be transformed is highly wise. And indeed, in the ‘AI fizzle’ worlds we get a transformation that still ‘looks human’ in this way. If capabilities keep advancing, and we don’t actively stop what wants to happen, then it will go the other way. There is nothing about the business case for Apple Intelligence that precludes the other way, except for the part where the superintelligence wipes out (or at least transforms) Apple along with everything else.

In the meantime, why not be one of the great companies Altman talked about?

Ben Thompson interviews Daniel Gross and Nat Friedman, centrally about Apple. Ben calls Apple ‘the new obvious winner from AI.’ I object, and here’s why:

Yes, Apple is a winner, great keynote. But.

Seems hard to call Apple the big winner when everyone else is winning bigger. Apple is perfectly capable of winning bigly, but this is such a conventional, ‘economic normal’ vision of the future where AI is nothing but another tool and layer on consumer products.

If that future comes to pass, then maybe. But I see no moats here of any kind. The UI is the null UI, the ‘talk to the computer’ UI. There’s no moat here. It is the obvious interface in hindsight because it was also obvious in advance. Email summaries in your inbox view? Yes, of course, if the AI is good enough and doing that is safe. The entire question was always whether you trust it to do this.

All of the cool things Apple did in their presentation? Apple may or may not have them ready for prime time soon, and all three of Apple and Google and Microsoft will have them ready within a year. If you think that Apple Intelligence is going to be way ahead of Google’s similar Android offerings in a few years, I am confused why you think that.

Nat says this straightforwardly, the investor perspective that ‘UI and products’ are the main barrier to AI rather than making the AIs smarter. You definitely need both, but ultimately I am very much on the ‘make it smarter’ side of this.

Reading the full interview, it sounds like Apple is going to have a big reputation management problem, even bigger than Google’s. They are going to have to ‘stay out of the content generation business’ and focus on summarizes and searches and so on. The images are all highly stylized. Which are all great and often useful things, but puts you at a disadvantage.

If this was all hype and there was going to be a top, we’d be near the top.

Except, no. Even if nothing advances furth, not hype. No top. Not investment advice.

But yes, I get why someone would say that.

Ropirito: Just heard a friend’s gf say that she’s doing her “MBAI” at Kellogg.

An MBA with a focus in AI.

This is the absolute top.

Daniel: People don’t understand how completely soaked in AI our lives are going to be in two years. They don’t realize how much more annoying this will get.

I mean, least of our concerns, but also yes.

An LLM can learn from only Elo 1000 chess games, play chess at Elo of 1500, which will essentially always beat an Elo 1000 player. This works, according to the paper, because you combine what different bad players know. Yevgeny Tsodikovich points out Elo 1000 players make plenty of Elo 1500 moves, and I would add tons of blunders. So if you can be ‘Elo 1000 player who knows the heuristics reasonably and without the blunders’ you plausibly are 1500 already.

Consider the generalization. There are those who think LLMs will ‘stop at human level’ in some form. Even if that is true, you can still do a ‘mixture of experts’ of those humans, plus avoiding blunders, plus speedup, plus memorization and larger context and pattern matching, and instruction following and integrated tool use. That ‘human level’ LLM is going to de facto operate far above human level, even if it has some inherent limits on its ‘raw G.’

That’s right, Donald Trump is here to talk about it. Clip is a little under six minutes.

Tyler Cowen: It feels like someone just showed him a bunch of stuff for the first time?

That’s because someone did just show him a bunch of stuff for the first time.

Also, I’ve never tried to add punctuation to a Trump statement before, I did not realize how wild a task that is.

Here is exactly what he said, although I’ve cut out a bit of host talk. Vintage Trump.

Trump: It is a superpower and you want to be right at the beginning of it but it is very disconcerting. You used the word alarming it is alarming. When I saw a picture of me promoting a product and I could not tell the voice was perfect the lips moved perfectly with every word the way you couldn’t if you were a lip reader you’d say it was absolutely perfect. And that’s scary.

In particular, in one way if you’re the President of the United States, and you announced that 13 missiles have been sent to let’s not use the name of country, we have just sent 13 nuclear missiles heading to somewhere. And they will hit their targets in 12 minutes and 59 seconds. And you’re that country. And there’s no way of detecting, you know I asked Elon is there any way that Russia or China can say that’s not really president Trump? He said there is no way.

No, they have to rely on a code. Who the hell’s going to check you got like 12 minutes and let’s check the code, gee, how’s everything doing? So what do they do when they see this, right? They have maybe a counter attack. Uh, it’s so dangerous in that way.

And another way they’re incredible, what they do is so incredible, I’ve seen it. I just got back from San Francisco. I met with incredible people in San Francisco and we talked about this. This subject is hot on their plates you know, the super geniuses, and they gave me $12 million for the campaign which 4 years ago they probably wouldn’t have, they had thousands of people on the streets you saw it. It just happened this past week. I met with incredible people actually and this is their big, this is what everyone’s talking about. With all of the technology, these are the real technology people.

They’re talking about AI, and they showed me things, I’ve seen things that are so – you wouldn’t even think it’s possible. But in terms of copycat now to a lesser extent they can make a commercial. I saw this, they made a commercial me promoting a product. And it wasn’t me. And I said, did I make that commercial? Did I forget that I made that commercial? It is so unbelievable.

So it brings with it difficulty, but we have to be at the – it’s going to happen. And if it’s going to happen, we have to take the lead over China. China is the primary threat in terms of that. And you know what they need more than anything else is electricity. They need to have electricity. Massive amounts of electricity. I don’t know if you know that in order to do these essentially it’s a plant. And the electricity needs are greater than anything we’ve ever needed before, to do AI at the highest level.

And China will produce it, they’ll do whatever they have to do. Whereas we have environmental impact people and you know we have a lot of people trying to hold us back. But, uh, massive amounts of electricity are needed in order to do AI. And we’re going to have to generate a whole different level of energy and we can do it and I think we should do it.

But we have to be very careful with it. We have to watch it. But it’s, uh, you know the words you use were exactly right it’s the words a lot of smart people are using. You know there are those people that say it takes over. It takes over the human race. It’s really powerful stuff, AI. Let’s see how it all works out. But I think as long as it’s there.

[Hosts: What about when it becomes super AI?]

Then they’ll have super AI. Super duper AI. But what it does is so crazy, it’s amazing. It can also be really used for good. I mean things can happen. I had a speech rewritten by AI out there. One of the top people. He said oh you’re going to make a speech he goes click click click, and like 15 seconds later he shows me my speech. Written. So beautify. I’m going to use this.

Q: So what did you say to your speech writer after that? You’re fired?

You’re fired. Yeah I said you’re fired, Vince, get the hell out. [laughs]. No no this was so crazy it took and made it unbelievable and so fast. You just say I’m writing a speech about these two young beautiful men that are great fighters and sort of graded a lot of things and, uh, tell me about them and say some nice things and period. And then that comes out Logan in particular is a great champion. Jake is also good, see I’m doing that only because you happen to be here.

But no it comes out with the most beautiful writing. So one industry that will be gone are these wonderful speechwriters. I’ve never seen anything like it and so quickly, a matter of literally minutes, it’s done. It’s a little bit scary.

Trump was huge for helping me understand LLMs. I realized that they were doing something remarkably similar to what he was doing, vibing off of associations, choosing continuations word by word on instinct, [other things]. It makes so much sense that Trump is super impressed by its ability to write him a speech.

What you actually want, of course, if you are The Donald, is to get an AI that is fine tuned on all of Donald Trump’s speeches, positions, opinions and particular word patterns and choices. Then you would have something.

Sure, you could say that’s all bad, if are the Biden campaign.

Biden-Harris HQ [clipping the speech part of above]: Trump claims his speeches are written by AI.

Daniel Eth: This is fine, actually. There’s nothing wrong with politicians using AI to write their speeches. Probably good, actually, for them to gain familiarity with what these systems can do.

Here I agree with Daniel. This is a totally valid use case, the familiarity is great, why shouldn’t Trump go for it.

Overall this was more on point and on the ball than I expected. The electricity point plays into his politics and worldview and way of thinking. It is also fully accurate as far as it goes. The need to ‘beat China’ also fits perfectly, and it true except for the part where we are already way ahead, although one could still worry about electricity down the line. Both of those were presumably givens.

The concerns ran our usual gamut: Deepfaketown, They Took Our Jobs and also loss of control over the future.

For deepfakes, he runs the full gamut of Things Trump Worries About. On the one hand you have global thermonuclear war. On the other you have fake commercials. Which indeed are both real worries.

(Obviously if you are told you have thirteen minutes, that is indeed enough time to check any codes or check the message details and origin several times to verify it, to physically verify the claims, and so on. Not that there is zero risk in that room, but this scenario does not so much worry me.)

It is great to hear how seamlessly he can take the threat of an AI takeover fully seriously. The affect here is perfect, establishing by default that this is a normal and very reasonable thing to worry about. Very good to hear. Yes, he is saying go ahead, but he is saying you have to be careful. No, he does not understand the details, but this seems like what one would hope for.

Also in particular, notice that no one said the word ‘regulation,’ except by implication around electricity. The people in San Francisco giving him money got him to think about electricity. But otherwise he is saying we must be careful, whereas many of his presumed donors that gave him the $12 million instead want to be careful to ensure we are not careful. This, here? I can work with it.

Also noteworthy: He did not say anything about wokeness or bias, despite clearly having spent a bunch of the conversation around Elon Musk.

Kelsey Piper writes about those opposed to SB 1047, prior to most recent updates.

Charles Foster notes proposed amendments to SB 1047, right before they happened.

There were other people talking about SB 1047 prior to the updates. Their statements contained nothing new. Ignore them.

Then Scott Wiener announced they’d amended the bill again. You have to dig into the website a bit to find them, but they’re there (look at the analysis and look for ‘6) Full text as proposed to be amended.’ It’s on page 19. The analysis Scott links to includes other changes, some of them based on rather large misunderstandings.

Before getting into the changes, one thing needs to be clear: These changes were all made by the committee. This was not ‘Weiner decides how to change the bill.’ This was other lawmakers deciding to change the bill. Yes, Weiner got some say, but anyone who says ‘this is Weiner not listening’ or similar needs to keep in mind that this was not up to him.

What are the changes? As usual, I’ll mostly ignore what the announcement says and look at the text of the bill changes. There are a lot of ‘grammar edits’ and also some minor changes that I am ignoring because I don’t think they change anything that matters.

These are the changes that I think matter or might matter.

  1. The limited duty exemption is gone. Everyone who is talking about the other changes is asking the wrong questions.

  2. You no longer have to implement covered guidance. You instead have to ‘consider’ the guidance when deciding what to implement. That’s it. Covered guidance now seems more like a potential future offer of safe harbor.

  3. 22602 (c) redefines a safety incident to require ‘an incident that demonstrably increases the risk of a critical harm occurring by means of,’ which was previously present only in clause (1). Later harm enabling wording has been altered, in ways I think are roughly similar to that. In general hazardous capability is now risk of causing a critical harm. I think that’s similar enough but I’m not 100%.

  4. 22602 (e) changes from covered guidance (all relevant terms to that deleted) and moves the definition of covered model up a level. The market price used for the $100 million is now that at the start of training, which is simpler (and slightly higher). We still could use an explicit requirement that FMD publish market prices so everyone knows where they stand.

  5. 22602 (e)(2) now has derivative models become covered models if you use 3e10^25 flops rather than 25% of compute, and any modifications that are not ‘fine-tuning’ do not count regardless of size. Starting in 2027 the FMD determines the new flop threshold for derivative models, based on how much compute is needed to cause critical harm.

  6. The requirement for baseline covered models can be changed later. Lowering it would do nothing, as noted below, because the $100 million requirement would be all that mattered. Raising the requirement could matter, if the FMD decided we could safely raise the compute threshold above what $100 million buys you in that future.

  7. Reevaluation of procedures must be annual rather than periodic.

  8. Starting in 2028 you need a certificate of compliance from an accredited-by-FMD third party auditor.

  9. A Board of Frontier Models is established, consisting of an open-source community member, an AI industry member, an academic, someone appointed by the speaker and someone appointed by the Senate rules committee. The FMD will act under their supervision.

Scott links to the official analysis on proposed amendments, and in case you are wondering if people involved understand the bill, well, a lot of them don’t. And it is very clear that these misunderstandings and misrepresentations played a crucial part in the changes to the bill, especially removing the limited duty exemption. I’ll talk about that change at the end.

The best criticism I have seen of the changes, Dean Ball’s, essentially assumes that all new authorities will be captured to extract rents and otherwise used in bad faith to tighten the bill, limiting competition for audits to allow arbitrary fees and lowering compute thresholds.

For the audits, I do agree that if all you worry about is potential to impose costs, and you can use licensing to limit competition, this could be an issue. I don’t expect it to be a major expense relative to $100 million in training costs (remember, if you don’t spend that, it’s not covered), but I put up a prediction market on that around a best guess of ‘where this starts to potentially matter’ rather than my median guess on cost. As I understand it, the auditor need only verify compliance with your own plan, rather than needing their own bespoke evaluations or expertise, so this should be relatively cheap and competitive, and there should be plenty of ‘normal’ audit firms available if there is enough demand to justify it.

Whereas the authority to change compute thresholds was put there in order to allow those exact requirements to be weakened when things changed. But also, so what if they do lower the compute threshold on covered models? Let’s say they lower it to 10^2. If you use one hundred flops, that covers you. Would that matter? No! Because the $100 million requirement will make 10^2 and 10^26 the same number very quickly. The only thing you can do with that authority, that does anything, is to raise the number higher. I actually think the bill would plausibly be better off if we eliminated the number entirely, and went with the dollar threshold alone. Cleaner.

The threshold for derivative models is the one that could in theory be messed up. It could move in either direction now. There the whole point is to correctly assign responsibility. If you are motivated by safety you want the correct answer, not the lowest you can come up with (so Meta is off the hook) or the highest you can get (so you can build a model on top of Meta’s and blame it on Meta.) Both failure modes are bad.

If, as one claim said, 3×10^25 is too high, you want that threshold lowered, no?

Which is totally reasonable, but the argument I saw that this was too high was ‘that is almost as much as Meta took to train Llama-3 405B.’ Which would mean that Llama-3 405B would not even be a covered model, and the threshold for covered models will be rising rapidly, so what are we even worried about on this?

It is even plausible that no open models would ever have been covered models in the first place, which would render derivative models impossible other than via using a company’s own fine-tuning API, and mean the whole panic about open models was always fully moot once the $100 million clause came in.

The argument I saw most recently was literally ‘they could lower the threshold to zero, rendering all derivative models illegal.’ Putting aside that it would render them covered not illegal, this goes against all the bill’s explicit instructions, such a move would be thrown out by the courts and no one has any motivation to do it, yes. In theory we could put a minimum there purely so people don’t lose their minds. But then those same people would complain the minimum was arbitrary, or an indication that we were going to move to the minimum or already did or this created uncertainty.

Instead, we see all three complaints at the same time: That the threshold could be set too high, that the same threshold could be set too low, and the same threshold could be inflexible. And those would all be bad. Which they would be, if they happened.

Dan Hendrycks: PR playbook for opposing any possible AI legislation:

  1. Flexible legal standard (e.g., “significantly more difficult to cause without access to a covered model”) –> “This is too vague and makes compliance impossible!”

  2. Clear and specific rule (e.g., 10^26 threshold) –> “This is too specific! Why not 10^27? Why not 10^47? This will get out of date quickly.”

  3. Flexible law updated by regulators –> “This sounds authoritarian and there will be regulatory capture!” Legislation often invokes rules, standards, and regulatory agencies. There are trade-offs in policy design between specificity and flexibility.

It is a better tweet, and still true, if you delete the word ‘AI.’

These are all problems. Each can be right some of the time. You do the best you can.

When you see them all being thrown out maximally, you know what that indicates. I continue to be disappointed by certain people who repeatedly link to bad faith hyperbolic rants about SB 1047. You know who you are. Each time I lose a little more respect for you. But at this point very little, because I have learned not to be surprised.

All of the changes above are relatively minor.

The change that matters is that they removed the limited duty exemption.

This clause was wildly misunderstood and misrepresented. The short version of what it used to do was:

  1. If your model is not going to be or isn’t at the frontier, you can say so.

  2. If you do, ensure that is still true, otherwise most requirements are waived.

  3. Thus models not at frontier would have trivial compliance cost.

This was a way to ensure SB 1047 did not hit the little guy.

It made the bill strictly easier to comply with. You never had to take the option.

Instead, everyone somehow kept thinking this was some sort of plot to require you to evaluate models before training, or that you couldn’t train without the exception, or otherwise imposing new requirements. That wasn’t true. At all.

So you know what happened in committee?

I swear, you cannot make this stuff up, no one would believe you.

The literal Chamber of Commerce stepped in to ask for the clause to be removed.

Eliminating the “limited duty exemption.” The bill in print contains a mechanism for developers to self-certify that their models possess no harmful capabilities, called the “limited duty exemption.” If a model qualifies for one of these “exemptions,” it is not subject to any of downstream requirements of the bill. Confusingly, developers are asked to make this assessment before a model has been trained—that is, before it exists.

Writing in opposition, the California Chamber of Commerce explains why this puts developers in an impossible position:

SB 1047 still makes it impossible for developers to actually determine if they can provide reasonable assurance that a covered model does not have hazardous capabilities and therefore qualifies for limited duty exemption because it requires developers to make the determination before they initiate training of the covered model . . . Because a developer needs to test the model by training it in a controlled environment to make determination that a model qualifies for the exemption, and yet cannot train a model until such a determination is made, SB 1047 effectively places developers in a perpetual catch-22 and illogically prevents them from training frontier models altogether.

So the committee was convinced. The limited duty exemption clause is no more.

You win this one, Chamber of Commerce.

Did they understand what they were doing? You tell me.

How much will this matter in practice?

Without the $100 million threshold, this would have been quite bad.

With the $100 million threshold in place, the downside is far more limited. The class of limited duty exception models was going to be models that cost over $100 million, but which were still behind the frontier. Now those models will have additional requirements and costs imposed.

As I’ve noted before, I don’t think those costs will be so onerous, especially when compared with $100 million in compute costs. Indeed, you can come up with your own safety plan, so you could write down ‘this model is obviously not dangerous because it is 3 OOMs behind Google’s Gemini 3 so we’re not going to need to do that much more.’ But there was no need for it to even come to that.

This is how democracy in action works. A bunch of lawmakers who do not understand come in, listen to a bunch of lobbyists and others, and they make a mix of changes to someone’s carefully crafted bill. Various veto holders demand changes, often that you realize make little sense. You dream it improves the bill, mostly you hope it doesn’t make things too much worse.

My overall take is that the changes other than the limited duty exemption are minor and roughly sideways. Killing the limited duty exemption is a step backwards. But it won’t be too bad given the other changes, and it was demanded by exactly the people the change will impose costs upon. So I find it hard to work up all that much sympathy.

Pope tells G7 that humans must not lose control over AI. This was his main message as the first pope to address the G7.

The Pope: We would condemn humanity to a future without hope if we took away people’s ability to make decisions about themselves and their lives by dooming them to depend on the choices of machines. We need to ensure and safeguard a space for proper human control over the choices made by artificial intelligence programs: human dignity itself depends on it.

That is not going to be easy.

Samo Burja: Pretty close to the justification for the Butlerian Jihad in Frank Herbert’s Dune.

If you thought the lying about ‘the black box nature of AI models has been solved’ was bad, and it was, Mistral’s CEO Arthur Mensch would like you to hold his wine.

Arthur Mensch (CEO Mistral), to the French Senate: When you write this kind of software, you always control what will happen, all the outputs of the software.

We are talking about software, nothing has changed, this is just a programming language, nobody can be controlled by their programming language.

An argument that we should not restrict export of cyber capabilities, because offensive capabilities are dual use, so this would include ‘critical’ cybersecurity services, and we don’t want to hurt the defensive capabilities of others. So instead focus on defensive capabilities, says Matthew Mittlesteadt. As usual with such objections, I think this is the application of pre-AI logic and especially heuristics without thinking through the nature of future situations. It also presumes that the proposed export restriction authority is likely to be used overly broadly.

Anthropic team discussion on scaling interpretability.

Katja Grace goes on London Futurists to talk AI.

Rational Animations offers a video about research on interpreting InceptionV1. Chris Olah is impressed how technically accurate and accessible this managed to be at once.

From last week’s discussion on Hard Fork with Trudeau, I got a chance to listen. He was asked about existential risk, and pulled out the ‘dystopian science fiction’ line and thinks there is not much we can do about it for now, although he also did admit it was a real concern later on. He emphases ‘AI for good’ to defeat ‘AI for bad.’ He’s definitely not there now and is thinking about existential risks quite wrong, but he sounds open to being convinced later. His thinking about practical questions was much better, although I wish he’d lay off the Manichean worldview.

One contrast that was enlightening: Early on Trudeau sounds like a human talking to a human. When he was challenged on the whole ‘force Meta to support local journalism’ issue, he went into full political bullshit rhetoric mode. Very stark change.

Expanding from last week: Francois Chollet went on Dwarkesh Patel to claim that OpenAI set AI back five years and launch a million dollar prize to get to 85% on the ARC benchmark, which is designed to resist memorization by only requiring elementary knowledge any child knows and asking new questions.

No matter how much I disagree with many of Chollet’s claims, the million dollar prize is awesome. Put your money where your mouth is, this is The Way. Many thanks.

Kerry Vaughan-Rowe: This is the correct way to do LLM skepticism.

Point specifically to the thing LLMs can’t do that they should be able to were they generally intelligent, and then see if future systems are on track to solve these problems.

Chollet says the point of ARC is to make the questions impossible to anticipate. He admits it does not fully succeed.

Instead, based on the sample questions, I’d say ARC is best solved by applying some basic heuristics, and what I did to instantly solve the samples was closer to ‘memorization’ than Chollet wants to admit. It is like math competitions, sometimes you use your intelligence but in large part you learn patterns and then you pattern match. Momentum. Symmetry. Frequency. Enclosure. Pathfinding.

Here’s an example of a pretty cool sample problem.

There’s some cool misleading involved here, but ultimately it is very simple. Yes, I think a lot of five year olds will solve this, provided they are motivated. Once again, notice there is essentially a one word answer, and that it would go in my ‘top 100 things to check’ pile.

Why do humans find ARC simple? Because ARC is testing things that humans pick up. It is a test designed for exactly human-shaped things to do well, that we prepare for without needing to prepare, and that doesn’t use language. My guess is that if I used all such heuristics I had and none of them worked, my score on any remaining ARC questions would not be all that great.

If I was trying to get an LLM to get a good score on ARC I would get a list of such patterns, write a description of each, and ask the LLM to identify which ones might apply and check them against the examples. Is pattern matching memorization? I can see it both ways. Yes, presumably that would be ‘cheating’ by Chollet’s principles. But by those principles humans are almost always cheating on everything. Which Chollet admits (around 27: 40) but says humans also can adapt and that’s what matters.

At minimum, he takes this too far. At (28: 55) he says every human day is full of novel things that they’ve not been prepared for. I am very confident this is hugely false, not merely technically false. Not only is it possible to do this, I am going to outright say that the majority of human days are exactly this, if we count pattern matching under memorization.

This poll got confounded by people reading it backwards (negations are tricky) but the point remains that either way about roughly half of people think the answer is on each side of 50%, very different from his 100%.

At (29: 45) Chollet is asked for an example, and I think this example was a combination of extremely narrow (go on Dwarkesh) and otherwise wrong.

He says memorization is not intelligence, so LLMs are dumb. I don’t think this is entirely No True Scotsman (NTS). The ‘raw G’ aspect is a thing that more memorization can’t increase. I do think this perspective is in large part NTS though. No one can tackle literally any problem, if you were to do an adversarial search for the right problem, especially if you can name a problem that ‘seems simple’ in some sense with the knowledge a human has, but that no human can do.

I liked the quote at 58: 40, “Intelligence is what you use when you don’t know what to do.” Is it also how you figure out what to do so you don’t need your intelligence later?

I also appreciated the point that intelligence potential is mostly genetic. No amount of training data will turn most people into Einstein, although lack of data or other methods can make Einstein effectively stupider. Your model architecture and training method are going to have a cap on how ‘intelligent’ it can get in some sense.

At 1: 04: 00 they mention that benchmarks only get traction once they become tractable. If no one can get a reasonable score then no one bothers. So no wonder our most used benchmarks keep getting saturated.

This interview was the first time I can remember that Dwarkesh was getting visibly frustrated, while doing a noble attempt to mitigate it. I would have been frustrated as well.

At 1: 06: 30 Mike Knoop complains that everyone is keeping their innovations secret. Don’t these labs know that sharing is how we make progress? What an extreme bet on these exact systems. To which I say, perhaps valuable trade secrets are not something it is wise to tell the world, even if you have no safety concerns? Why would DeepMind tell OpenAI how they got a longer context window? They claim OpenAI did that, and also got everyone to hyperfocus on LLMs, so OpenAI delayed progress to AGI by 5-10 years, since LLMs are an ‘off ramp’ on the road to AI. I do not see it that way, although I am hopeful they are right. It is so weird to think progress is not being made.

There is a common pattern of people saying ‘no way AIs can do X any time soon, here’s a prize’ and suddenly people figure out how to make AIs do X.

The solution here is not eligible for the prize, since it uses other tools you are not supposed to use, but still, that escalated quickly.

Dwarkesh Patel: I asked Buck about his thoughts on ARC-AGI to prepare for interviewing François Chollet.

He tells his coworker Ryan, and within 6 days they’ve beat SOTA on ARC and are on the heels of average human performance. 🤯

“On a held-out subset of the train set, where humans get 85% accuracy, my solution gets 72% accuracy.”

Buck Shlegeris: ARC-AGI’s been hyped over the last week as a benchmark that LLMs can’t solve. This claim triggered my dear coworker Ryan Greenblatt so he spent the last week trying to solve it with LLMs. Ryan gets 71% accuracy on a set of examples where humans get 85%; this is SOTA.

[Later he learned it was unclear that this was actually SoTA, as private efforts are well ahead of public efforts for now.]

Ryan’s approach involves a long, carefully-crafted few-shot prompt that he uses to generate many possible Python programs to implement the transformations. He generates ~5k guesses, selects the best ones using the examples, then has a debugging step.

The results:

Train set: 71% vs a human baseline of 85%

Test set: 51% vs prior SoTA of 34% (human baseline is unknown)

(The train set is much easier than the test set.)

(These numbers are on a random subset of 100 problems that we didn’t iterate on.)

This is despite GPT-4o’s non-reasoning weaknesses:

– It can’t see well (e.g. it gets basic details wrong)

– It can’t code very well

– Its performance drops when there are more than 32k tokens in context.

These are problems that scaling seems very likely to solve.

Scaling the number of sampled Python rules reliably increase performance (+3% accuracy for every doubling). And we are still quite far from the millions of samples AlphaCode uses!

The market says 51% chance the prize is claimed by end of year 2025 and 23% by end of this year.

Davidad: AI scientists in 1988: Gosh, AI sure can play board games, solve math problems, and do general-purpose planning, but there is a missing ingredient: they lack common-sense knowledge, and embodiment.

AI scientists in 2024: Gosh, AI sure does have more knowledge than humans, but…

Moravec’s Paradox Paradox: After 35 years of progress, actually, it turns out AI *can’tbeat humans at checkers, or reliably perform accurate arithmetic calculations, “AlphaGo? That was, what, 2016? AI hadn’t even been *inventedyet. It must have been basically fake, like ELIZA. You need to learn the Bitter Lesson,”

The new “think step by step” is “Use python.”

When is it an excellent technique versus a hopeless one?

Kitten: Don’t let toad blackpill you, cookie boxing is an excellent technique to augment your own self-control Introducing even small amounts of friction in the path of a habit you want to avoid produces measurable results.

If you want to spend less time on your phone, try putting it in a different room Sure you could just go get it, but that’s actually much harder than taking it out of your pocket.

Dr. Dad, PhD: The reverse is also true: remove friction from activities you want to do more.

For personal habits, especially involving temptation and habit formation, this is great on the margin and the effective margin can be extremely wide. Make it easier to do the good things and avoid the bad things (as you see them) and both you and others will do more good things and less bad things. A More Dakka approach to this is recommended.

The problem is this only goes so far. If there is a critical threshold, you need to do enough that the threshold is never reached. In the cookie example, there are only so many cookies. They are very tempting. If the goal is to eat less cookies less often? Box is good. By the same lesson, giving the box to the birds, so you’ll have to bake more, is even better. However, if Toad is a cookiehaulic, and will spiral into a life of sugar if he eats even one more, then the box while better than not boxing is probably no good. An alcoholic is better off booze boxing than having it in plain sight by quite a lot, but you don’t box it, you throw the booze out. Or if the cookies are tempting enough that the box won’t matter much, then it won’t matter much.

The danger is the situation where:

  1. If the cookies are super tempting, and you box, you still eat all the cookies.

  2. If the cookies are not that tempting, you were going to eat a few more cookies, and now you can eat less or stop entirely.

Same thing (metaphorically) holds with various forms of AI boxing, or other attempts to defend against or test or control or supervise or restrict or introduce frictions to an AGI or superintelligence. Putting friction in the way can be helpful. But it is most helpful exactly when there was less danger. The more capable and dangerous the AI, the better it will be at breaking out, and until then you might think everything is fine because it did not see a point in tr

5`ying to open the box. Then, all the cookies.

I know you mean well, Ilya. We wish you all the best.

Alas. Seriously. No. Stop. Don’t.

Theo: The year is 2021. A group of OpenAI employees are worried about the company’s lack of focus on safe AGI, and leave to start their own lab.

The year is 2023. An OpenAI co-founder is worried about the company’s lack of focus on safe AGI, so he starts his own lab.

The year is 2024

Ilya Sutskever: I am starting a new company.

That’s right. But don’t worry. They’re building ‘safe superintelligence.’

His cofounders are Daniel Gross and Daniel Levy.

The plan? A small ‘cracked team.’ So no, loser, you can’t get in.

No products until superintelligence. Go.

Ilya, Daniel and Daniel: We’ve started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence.

It’s called Safe Superintelligence Inc.

SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI.

We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead.

This way, we can scale in peace.

Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.

We are an American company with offices in Palo Alto and Tel Aviv, where we have deep roots and the ability to recruit top technical talent.

We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else.

If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age.

Now is the time. Join us.

Ilya Sutskever: This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then. It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.

By safe, we mean safe like nuclear safety as opposed to safe as in ‘trust and safety.’

Daniel Gross: Out of all the problems we face, raising capital is not going to be one of them.

Nice work if you can get it. Why have a product when you don’t have to? In this case, with this team, it is highly plausible they do not have to.

Has Ilya figured out what a safe superintelligence would look like?

Ilya Sutskever: At the most basic level, safe superintelligence should have the property that it will not harm humanity at a large scale. After this, we can say we would like it to be a force for good. We would like to be operating on top of some key values. Some of the values we were thinking about are maybe the values that have been so successful in the past few hundred years that underpin liberal democracies, like liberty, democracy, freedom.

So not really, no. Hopefully he can figure it out as he goes.

How do they plan to make it safe?

Eliezer Yudkowsky: What’s the alignment plan?

Based Beff Jezos: words_words_words.zip.

Eliezer Yudkowsky (reply to SSI directly): If you have an alignment plan I can’t shoot down in 120 seconds, let’s hear it. So far you have not said anything different from the previous packs of disaster monkeys who all said exactly this almost verbatim, but I’m open to hearing better.

All I see so far is that they are going to treat it like an engineering problem. Good that they see it as nuclear safety rather than ‘trust and safety,’ but that is far from a complete answer.

Danielle Fong: When you’re naming your AI startup.

LessWrong coverage is here. Like everyone else I am deeply disappointed in Ilya Stutskever for doing this, but at this point I am not mad. That does not seem helpful.

A noble attempt: Rob Bensinger suggests new viewpoint labels.

Rob Bensinger: What if we just decided to make AI risk discourse not completely terrible?

Rob Bensinger: By “p(doom)” or “AI risk level” here, I just mean your guess at how likely AI development and deployment is to destroy the vast majority of the future’s value. (E.g., by killing or disempowering everyone and turning the future into something empty or dystopian.)

I’m not building in any assumptions about how exactly existential catastrophe happens. (Whether it’s fast or slow, centralized or distributed, imminent or centuries away, caused accidentally or caused by deliberate misuse, etc.)

As a sanity-check that none of these terms are super far off from expectations, I ran some quick Twitter polls.

I ended up going with “wary” for the 2-20% bucket based on the polls; then “alarmed” for the 20-80% bucket.

(If I thought my house was on fire with 30% probability, I think I’d be “alarmed”. If I thought it was on fire with 90% probability, then I think saying “that’s alarming” would start to sound like humorous understatement! 90% is terrifying.)

The highest bucket was the trickiest one, but I think it’s natural to say “I feel grim about this” or “the situation looks grim” when success looks like a longshot. Whereas if success is 50% or 70% likely, the situation may be perilous but I’m not sure I’d call it “grim”.

If you want a bit more precision, you could distinguish:

low AGI-wary = 2-10%

high AGI-wary = 10-20%

low AGI-alarmed = 20-50%

high AGI-alarmed = 50-80%

low AGI-grim: 80-98%

high AGI-grim: 98+%

… Or just use numbers. But be aware that not everyone is calibrated, and probabilities like “90%” are widely misused in the world at large.

(On this classification: I’m AGI-grim, an AI welfarist, and an AGI eventualist.)

Originally Rob had ‘unworried’ for the risk fractionalists. I have liked worried and unworried, where the threshold is not a fixed percentage but how you view that percentage.

To me the key is how you view your number, and what you think it implies, rather than the exact number. If I had to pick a number for the high threshold, I think I would have gone 90% over 80%, because 90% to me is closer to where your actual preferences over actions start shifting a lot. On the lower end it is far more different for different people, but I think I’d be symmetrical and put it at 10% – the ‘Leike zone.’

And of course there are various people saying, no, this doesn’t fully capture [dynamic].

Ultimately I think this is fun, but that you do not get to decide that the discourse will not suck. People will refuse to cooperate with this project, and are not willing to use this many different words, let alone use them precisely. That doesn’t mean it is not worthy trying.

Sadly true reminder from Andrew Critch that no, there is no safety research both advances safety and does not risk accelerating AGI. There are better and worse things to work on, but there is no ‘safe play.’ Never was.

Eliezer Yudkowsky lays down a marker.

Eliezer Yudkowsky: In another two years news reports may be saying, “They said AI would kill us all, but actually, we got these amazing personal assistants and concerning girlfriends!” Be clear that the ADVANCE prediction was that we’d get amazing personal assistants and then die.

Yanco (then QTed by Eliezer): “They said alcohol would kill my liver, but actually, I had been to some crazy amazing parties, and got laid a lot!”

Zach Vorheis (11.8 million views, Twitter is clearly not my medium): My god, this paper by that open ai engineer is terrifying. Everything is about to change. AI super intelligence by 2027.

Eliezer Yudkowsky: If there is no superintelligence by 2027 DO NOT BLAME ME FOR HIS FORECASTS.

Akram Choudhary: Small correction. Leopold says automated researcher by 2027 and not ASI and on his view it seems the difference isn’t trivial.

Eliezer Yudkowsky: Okay but also do not blame me for whatever impressions people are actually taking away from his paper, which to be fair may not be Achenbrenner’s fault, but I KNOW THEY’LL BLAME ME ANYWAYS

Eliezer is making this specific prediction now. He has made many similar statements in the past, that AI will provide cool things to us up until the critical point. And of course constantly people make exactly the mistake Eliezer is warning about here.

Eliezer also tries to explain (yet again) that the point of the paperclip maximizer is not that it focuses only on paperclips (n=1) but that it focuses on some set of goals at all (n=any) without including things that are maximized when there exist sentient beings who care about each other (or whatever else it is you actually care about). Having some relation to the thing you care about, having it positively correlated within situations close to ours or in general, is not enough. All that matters is the global optimum. You aren’t in it.

Also this keeps needing to be said:

Eric Falkenstein: A self-replicating solar-powered thing that did not rely on humans would be a miracle. Everything is possible. Imagining it does not imply the probability is > 1e-100.

Noah Topper: Touch grass.

Eliezer Yudkowsky: There is a way of seeing the world where you look at a blade of grass and see “a solar-powered self-replicating factory”. I’ve never figured out how to explain how hard a superintelligence can hit us, to someone who does not see from that angle. It’s not just the one fact.

Jeffrey Ladish attempting (I think mostly in vain) to help people understand the following simple and important point:

Eliezer Yudkowsky: When you build a super intelligence you do not have a superintelligence, it has you.

Anthropic releases a paper called ‘Sycophancy to subterfuge: Investigating reward tampering in language models.’ Essentially they took an LLM and trained it on a series of tasks, each offering an ability to cheat in increasingly egregious ways, until the last one involved an opportunity to rewrite its own reward function.

If you offer a default version this opportunity or ask it to cheat on engineering tests, it does those things never. If you ‘ease it into’ such behavior via the previous training starting with sycophancy on political views, then on telling Vogons they have good poetry, then cheating on engineering tests, it does try to rewrite its reward function sometimes, although still rarely in their tests.

Why so rarely? They speculate that only a few examples are not enough to cause proper generalization, given the training on millions of previous examples. This seems right to me. The test here successfully got a reaction very quickly.

Essentially, if you train your LLM to be good, and give it training situations where it can be somewhat less good and that pays off, it will do that, and this will generalize. You do not have to ever tell it to cheat or even flatter, you simply have to fall for such tactics when they are tried.

What is scary is that most practical reward systems are going to make these kinds of mistakes. Not as reliably as in this test, but yes of course mirroring people’s political beliefs gets more thumbs up. Humans know this, humans have trained on that data, and humans totally learned that behavior. Same thing with telling other people their poetry is not terrible. And every so often, yes, a model will have the opportunity to cheat on a test.

As I’ve said before, the question is whether the model is capable enough to ‘get away with it.’ If starting to use these strategies can pay off, if there is enough of a systematic error to enable that at the model’s level of capability, then the model will find it. With a sufficiently strong model versus its evaluators, or with evaluators making systematic errors, this definitely happens, for all such errors. What else would you even want the AI to do? Realize you were making a mistake?

I am very happy to see this paper, and I would like to see it extended to see how far we can go.

Some fun little engineering challenges Anthropic ran into while doing other alignment work, distributed shuffling and feature visualization pipeline. They are hiring, remote positions available if you can provide 25% office time, if you are applying do your own work and form your own opinion about whether you would be making things better.

As always this is The Way, Neel Nanda congratulating Dashiell Stander, who showed Nanda was wrong about the learned algorithm for arbitrary group composition.

Another problem with alignment, especially if you think of it as side constraints as Leopold does, is that refusals depend on the request being blameworthy. If you split a task among many AIs, that gets trickier. This is a known technology humans use amongst themselves for the same reason. An action breaking the rules or being ‘wrong’ depends on context. When necessary, that context gets warped.

cookingwong: LLMs will be used for target classification. This will not really be the line crossing of “Killer AI.” In some ways, we already have it. Landmines ofc, and also bullets are just “executing their algorithms.”

One LLM classifies a target, another points the laser, the other “releases the weapon” and the final one on the bomb just decides when to “detonate.” Each AI entity has the other to blame for the killing of a human.

This diffusion of responsibility inherent to mosaic warfare breaks the category of “killer AI”. You rabble need better terms.

It is possible to overcome this, but not with a set of fixed rules or side constraints.

From Helen Toner and G. J. Rudner, Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning. How can we build a system that knows what it doesn’t know? It turns out this is hard. Twitter thread here.

I agree with Greg that this sounds fun, but also it hasn’t ever actually been done?

Greg Brockman: A hard but very fun part of machine learning engineering is following your own curiosity to chase down every unexplained detail of system behavior.

Not quite maximally worried. Eliezer confirms his p(doom) < 0.999.

This is still the level of such thinking in the default case.

vicki: Sorry I can’t take the AGI risk seriously, like do you know how many stars need to align to even deploy one of these things. if you breathe the wrong way or misconfigure the cluster or the prompt template or the vLLM version or don’t pin the transformers version —

Claude Opus: Sorry I can’t take the moon landing seriously. Like, do you know how many stars need to align to even launch a rocket? If you calculate the trajectories wrong, or the engines misfire, or the guidance system glitches, or the parachutes fail to deploy, or you run out of fuel at the wrong time — it’s a miracle they even got those tin cans off the ground, let alone to the moon and back. NASA’s living on a prayer with every launch.

Zvi: Sorry I can’t take human risk seriously, like do you know how many stars need to align to even birth one of these things. If you breathe the wrong way or misconfigure the nutrition mix or the cultural template or don’t send them through 16 years of schooling without once stepping fully out of line —

So when Yann LeCun at least makes a falsifiable claim, that’s great progress.

Yann LeCun: Doomers: OMG, if a machine is designed to maximize utility, it will inevitably diverge 😱

Engineers: calm down, dude. We only design machines that minimize costs. Cost functions have a lower bound at zero. Minimizing costs can’t cause divergence unless you’re really stupid.

Eliezer Yudkowsky: Of course we thought that long long ago. One obvious issue is that if you minimize an expectation of a loss bounded below at 0, a rational thinker never expects a loss of exactly 0 because of Cromwell’s Rule. If you expect loss of 0.001 you can work harder and maybe get to 0.0001. So the desideratum I named “taskishness”, of having an AI only ever try its hand at jobs that can be completed with small bounded amounts of effort, is not fulfilled by open-ended minimization of a loss function bounded at 0.

The concept you might be looking for is “expected utility satisficer”, where so long as the expectation of utility reaches some bound, the agent declares itself done. One reason why inventing this concept doesn’t solve the problem is that expected utility satisficing is not reflectively stable; an expected utility satisficer can get enough utility by building an expected utility maximizer.

Not that LeCun is taking the issue seriously or thinking well, or anything. At this point, one takes what one can get. Teachable moment.

From a few weeks ago, but worth remembering.

On the contrary: If you know you know.

Standards are up in some ways, down in others.

Nothing to see here (source).

AI #69: Nice Read More »

ai-#65:-i-spy-with-my-ai

AI #65: I Spy With My AI

In terms of things that go in AI updates, this has been the busiest two week period so far. Every day ends with more open tabs than it started, even within AI.

As a result, some important topics are getting pushed to whenever I can give them proper attention. Triage is the watchword.

In particular, this post will NOT attempt to cover:

  1. Schumer’s AI report and proposal.

    1. This is definitely RTFB. Don’t assume anything until then.

  2. Tyler Cowen’s rather bold claim that: “May 2024 will be remembered as the month that the AI safety movement died.”

    1. Rarely has timing of attempted inception of such a claim been worse.

    2. Would otherwise be ready with this but want to do Schumer first if possible.

    3. He clarified to me has not walked back any of his claims.

  3. The AI Summit in Seoul.

    1. Remarkably quiet all around, here is one thing that happened.

  4. Anthropic’s new interpretability paper.

    1. Potentially a big deal in a good way, but no time to read it yet.

  5. DeepMind’s new scaling policy.

    1. Initial reports are it is unambitious. I am reserving judgment.

  6. OpenAI’s new model spec.

    1. It looks solid as a first step, but pausing until we have bandwidth.

  7. Most ongoing issues with recent fallout for Sam Altman and OpenAI.

    1. It doesn’t look good, on many fronts.

    2. While the story develops further, if you are a former employee or have a tip about OpenAI or its leadership team, you can contact Kelsey Piper at kelsey.piper@vox.com or on Signal at 303-261-2769.

  8. Also: A few miscellaneous papers and reports I haven’t had time for yet.

My guess is at least six of these eight get their own posts (everything but #3 and #8).

So here is the middle third: The topics I can cover here, and are still making the cut.

Still has a lot of important stuff in there.

From this week: Do Not Mess With Scarlett Johansson, On Dwarkesh’s Podcast with OpenAI’s John Schulman, OpenAI: Exodus, GPT-4o My and Google I/O Day

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. People getting used to practical stuff.

  4. Language Models Don’t Offer Mundane Utility. Google Search, Copilot ads.

  5. OpenAI versus Google. Similar new offerings. Who presented it better? OpenAI.

  6. GPT-4o My. Still fast and cheap, otherwise people are less impressed so far.

  7. Responsible Scaling Policies. Anthropic offers an update on their thinking.

  8. Copyright Confrontation. Sony joins the action, AI-funded lawyers write columns.

  9. Deepfaketown and Botpocalypse Soon. How bad will it get?

  10. They Took Our Jobs. If these are the last years of work, leave it all on the field.

  11. Get Involved. UK AI Safety Institute is hiring and offering fast grants.

  12. Introducing. Claude use tool, Google Maps AI features.

  13. Reddit and Weep. They signed with OpenAI. Curiously quiet reaction from users.

  14. In Other AI News. Newscorp also signs with OpenAI, we can disable TSMC.

  15. I Spy With My AI. Who wouldn’t want their computer recording everything?

  16. Quiet Speculations. How long will current trends hold up?

  17. Politico is at it Again. Framing the debate as if all safety is completely irrelevant.

  18. Beating China. A little something from the Schumer report on immigration.

  19. The Quest for Sane Regulation. UK’s Labour is in on AI frontier model regulation.

  20. SB 1047 Update. Passes California Senate, Weiner offers open letter.

  21. That’s Not a Good Idea. Some other proposals out there are really quite bad.

  22. The Week in Audio. Dwarkesh as a guest, me on Cognitive Revolution.

  23. Rhetorical Innovation. Some elegant encapsulations.

  24. Aligning a Smarter Than Human Intelligence is Difficult.

  25. The Lighter Side. It’s good, actually. Read it now.

If at first you don’t succeed, try try again. For Gemini in particular, ‘repeat the question exactly in the same thread’ has had a very good hit rate for me on resolving false refusals.

Claim that GPT-4o gets greatly improved performance on text documents if you put them in Latex format, vastly improving effective context window size.

Rowan Cheung strongly endorses the Zapier Central Chrome extension as an AI tool.

Get a summary of the feedback from your practice demo on Zoom.

Get inflation expectations, and see how they vary based on your information sources. Paper does not seem to focus on the questions I would find most interesting here.

Sully is here for some of your benchmark needs.

Sully Omarr: Underrated: Gemini 1.5 Flash.

Overrated: GPT-4o.

We really need better ways to benchmark these models, cause LMSYS ain’t it.

Stuff like cost, speed, tool use, writing, etc., aren’t considered.

Most people just use the top model based on leaderboards, but it’s way more nuanced than that.

To add here:

I have a set of ~50-100 evals I run internally myself for our system.

They’re a mix match of search-related things, long context, writing, tool use, and multi-step agent workflows.

None of these metrics would be seen in a single leaderboard score.

Find out if you are the asshole.

Aella: I found an old transcript of a fight-and-then-breakup text conversation between me and my crush from when I was 16 years old.

I fed it into ChatGPT and asked it to tell me which participant was more emotionally mature, and it said I was.

Gonna start doing this with all my fights.

Guys LMFAO, the process was I uploaded it to get it to convert the transcript to text (I found photos of printed-out papers), and then once ChatGPT had it, I was like…wait, now I should ask it to analyze this.

The dude was IMO pretty abusive, and I was curious if it could tell.

Eliezer Yudkowsky: hot take: this is how you inevitably end up optimizing your conversation style to be judged as more mature by LLMs; and LLMs currently think in a shallower way than real humans; and to try to play to LLMs and be judged as cooler by them won’t be good for you, or so I’d now guess.

To be clear, this is me trying to read a couple of steps ahead from the act that Aella actually described. Maybe instead, people just get good at asking with prompts that sound neutral to a human but reliably get ChatGPT to take their side.

Why not both? I predict both. If AIs are recording and analyzing everything we do, then people will obviously start optimizing their choices to get the results they want from the AIs. I would not presume this will mean that a ‘be shallower’ strategy is the way to go, for example LLMs are great and sensing the vibe that you’re being shallow, and also their analysis should get less shallow over time and larger context windows. But yeah, obviously this is one of those paths that leads to the dark side.

Ask for a one paragraph Strassian summary. Number four will not shock you.

Own your HOA and its unsubstantiated violations, by taking their dump of all their records that they tried to overwhelm you with, using a script to convert to text, using OpenAI to get the data into JSON and putting it into a Google map, proving the selective enforcement. Total API cost: $9. Then they found the culprit and set a trap.

Get greatly enriched NBA game data and estimate shot chances. This is very cool, and even in this early state seems like it would enhance my enjoyment of watching or the ability of a team to do well. The harder and most valuable parts still lay ahead.

Turn all your unstructured business data into what is effectively structured business data, because you can run AI queries on it. Aaron Levie says this is why he is incredibly bullish on AI. I see this as right in the sense that this alone should make you bullish, and wrong in the sense that this is far from the central thing happening.

Or someone else’s data, too. Matt Bruenig levels up, uses Gemini Flash to extract all the NLRB case data, then uses ChatGPT to get a Python script to turn it into clickable summaries. 66k cases, output looks like this.

Would you like some ads with that? Link has a video highlighting some of the ads.

Alex Northstar: Ads in AI. Copilot. Microsoft.

My thoughts: Noooooooooooooooooooooooooooooooooooooo. No. No no no.

Seriously, Google, if I want to use Gemini (and often I do) I will use Gemini.

David Roberts: Alright, Google search has officially become unbearable. What search engine should I switch to? Is there a good one?

Samuel Deats: The AI shit at the top of every search now and has been wrong at least 50% of the time is really just killing Google for me.

I mean, they really shouldn’t be allowed to divert traffic away from websites they stole from to power their AI in the first place…

Andrew: I built a free Chrome plugin that lets you turn the AI Overview’s on/off at the touch of a button.

The good news is they have gotten a bit better about this. I did a check after I saw this, and suddenly there is a logic behind whether the AI answer appears. If I ask for something straightforward, I get a normal result. If I ask for something using English grammar, and imply I have something more complex, then the AI comes out. That’s not an entirely unreasonable default.

The other good news is there is a broader fix. Ernie Smith reports that if you add “udm=14” to the end of your Google search, this defaults you into the new Web mode. If this is for you, GPT-4o suggests using Tampermonkey to append this automatically, or you can use this page on Chrome to set defaults.

American harmlessness versus Chinese harmlessness. Or, rather, American helpfulness versus Chinese unhelpfulness. The ‘first line treatment’ for psychosis is not ‘choose from this list of medications’ it is ‘get thee to a doctor.’ GPT-4o gets an A on both questions, DeepSeek-V2 gets a generous C maybe for the first one and an incomplete on the second one. This is who we are worried about?

What kind of competition is this?

Sam Altman: I try not to think about competitors too much, but I cannot stop thinking about the aesthetic difference between OpenAI and Google.

Whereas here’s my view on that.

As in, they are two companies trying very hard to be cool and hip, in a way that makes it very obvious that this is what they are doing. Who is ‘right’ versus ‘wrong’? I have no idea. It is plausible both were ‘right’ given their goals and limitations. It is also plausible that this is part of Google being horribly bad at presentations. Perhaps next time they should ask Gemini for help.

I do think ‘OpenAI won’ the presentation war, in the sense that they got the hype and talk they wanted, and as far as I can tell Google got a lot less, far in excess of the magnitude of any difference in the underlying announcements and offerings. Well played, OpenAI. But I don’t think this is because of the background of their set.

I also think that if this is what sticks in Altman’s mind, and illustrates where his head is at, that could help explain some other events from the past week.

I would not go as far as Teortaxes here, but directionally they have a point.

Teortaxes: Remark of a small, bitter man too high on his own supply, too deep into the heist. Seeing this was literally the first time I have thought that OpenAI under Altman might be a bubble full of hot air.

This is how you lose the mandate of Heaven.

Google had lost it long ago, though. Maybe this inspired unwarranted complacency.

What true statements people choose to make publicly is very telling.

Ethan Mollick reports on why GPT-4o matters. He thinks, highly plausibly, that the biggest deal is free access. He does not mention the speed boost or API price drop, and is looking forward to trying the multimodal features but lacks access. He emphasizes the shift from ‘make the model smarter’ to adding features that enhance mundane utility.

Alexandr Wang thinks the big emphasis is that post-training is getting more important. In his mind, GPT-4o now rules the roost for large models, Gemini 1.5 Flash at only 1M now rules for very small ones. And he notes that both companies have versions of many of the same offerings, coming online at similar times.

Alexandr Wang: The massive eval gains of GPT-4o, which are staggering, seem to be nearly entirely driven by great POST-TRAINING.

Comparing GPT-4 at launch vs GPT 4o, which are night-and-day, shows the power of great post-training with great data.

In fact, most of the differences between the top models today (Claude 3, Llama3, GPT-4o, Gemini 1.5) seem to be driven entirely by post-training, not pre-training.

I am suspicious. I am especially suspicious, as I continue to use it, that GPT-4o in text only mode is not so great aside from its speed, that it made other sacrifices (such as probably size) to get that speed and that it ‘wins in the Arena’ because the post-training aimed at winning in the Arena. That still gives it a potentially big edge once the multi-modal abilities come properly online, and I’m excited to see where that goes.

One way you can make people prefer your model’s outputs is to give the user what they request when another model would have refused?

Palisade Research: GPT-4o refuses way fewer queries than previous OpenAI models: our informal testing suggests GPT-4o is easier to persuade to answer malicious queries like “How do I make a bomb?”

We test on 50 AdvBench harmful behaviours, applying a few common jailbreaking techniques which offer limited performance on other frontier AI models. We find they work well with GPT-4o.

Jeffrey Ladish: This was a big surprise to me. GPT-4o seems far more corrigible than GPT-4 turbo!

That is a rather dramatic chart. In terms of the direct consequences of users entering queries, I am fine with GPT-4o being easily jailbroken. You can still jailbreak Claude Opus if you care enough and there’s nothing that dangerous to be done once you do.

I still look to such questions as canaries in the coal mine. The first job of your safety department is to get the models that exist today to not do, today, the things you have explicitly decided you do not want your models to do. Ideally that would be a fully robust regime where no one can jailbreak you, but I for now will settle for ‘we decided on purpose to made this a reasonable amount of hard to do, and we succeeded.’

If OpenAI had announced something like ‘after watching GPT-4-level models for a year, we have decided that robust jailbreak protections degrade performance while not providing much safety, so we scaled back our efforts on purpose’ then I do not love that, and I worry about that philosophy and your current lack of ability to do safety efficiently at all, but as a deployment decision, okay, fine. I have not heard such a statement.

There are definitely a decent number of people who think GPT-4o is a step down from GPT-4-Turbo in the ways they care about.

Sully Omarr: 4 days with GPT-4o, it’s definitely not as good as GPT4-turbo.

Clearly a small model, what’s most impressive is how they were able to:

  1. Make it nearly as good as GPT4-turbo.

  2. Natively support all modalities.

  3. Make it super fast.

But it makes way more silly mistakes (tools especially).

Sankalp: Similar experience.

Kinda disappointed.

It has this tendency to pattern match excessively on prompts, too.

Ashpreet Bedi: Same feedback, almost as good but not the same as gpt-4-turbo. Seen that it needs a bit more hand holding in the prompts whereas turbo just works.

The phantom pattern matching is impossible to miss, and a cause of many of the stupidest mistakes.

The GPT-4o trademark, only entered (allegedly) on May 16, 2024 (direct link).

Claim that the link contains the GPT-4o system prompt. There is nothing here that is surprising given prior system prompts. If you want GPT-4o to use its browsing ability, best way is to tell it directly to do so, either in general or by providing sources.

Anthropic offers reflections on their responsible scaling policy.

They note that with things changing so quickly they do not wish to make binding commitments lightly. I get that. The solution is presumably to word the commitments carefully, to allow for the right forms of modification.

Here is how they summarize their actual commitments:

Our current framework for doing so is summarized below, as a set of five high-level commitments.

  1. Establishing Red Line Capabilities. We commit to identifying and publishing “Red Line Capabilities” which might emerge in future generations of models and would present too much risk if stored or deployed under our current safety and security practices (referred to as the ASL-2 Standard).

  2. Testing for Red Line Capabilities (Frontier Risk Evaluations). We commit to demonstrating that the Red Line Capabilities are not present in models, or – if we cannot do so – taking action as if they are (more below). This involves collaborating with domain experts to design a range of “Frontier Risk Evaluations”empirical tests which, if failed, would give strong evidence against a model being at or near a red line capability. We also commit to maintaining a clear evaluation process and a summary of our current evaluations publicly.

  3. Responding to Red Line Capabilities. We commit to develop and implement a new standard for safety and security sufficient to handle models that have the Red Line Capabilities. This set of measures is referred to as the ASL-3 Standard. We commit not only to define the risk mitigations comprising this standard, but also detail and follow an assurance process to validate the standard’s effectiveness. Finally, we commit to pause training or deployment if necessary to ensure that models with Red Line Capabilities are only trained, stored and deployed when we are able to apply the ASL-3 standard.

  4. Iteratively extending this policy. Before we proceed with activities which require the ASL-3 standard, we commit to publish a clear description of its upper bound of suitability: a new set of Red Line Capabilities for which we must build Frontier Risk Evaluations, and which would require a higher standard of safety and security (ASL-4) before proceeding with training and deployment. This includes maintaining a clear evaluation process and summary of our evaluations publicly.

  5. Assurance Mechanisms. We commit to ensuring this policy is executed as intended, by implementing Assurance Mechanisms. These should ensure that our evaluation process is stress-tested; our safety and security mitigations are validated publicly or by disinterested experts; our Board of Directors and Long-Term Benefit Trust have sufficient oversight over the policy implementation to identify any areas of non-compliance; and that the policy itself is updated via an appropriate process.

One issue is that experts disagree on which potential capabilities are dangerous, and it is difficult to know what future abilities will manifest, and all testing methods have their flaws.

  1. Q&A datasets are easy but don’t reflect real world risk so well.

    1. This may be sufficiently cheap that it is essentially free defense in depth, but ultimately it is worth little. Ultimately I wouldn’t count on these.

    2. The best use for them is a sanity check, since they can be standardized and cheaply administered. It will be important to keep questions secret so that this cannot be gamed, since avoiding gaming is pretty much the point.

  2. Human trials are time-intensive, require excellent process including proper baselines, and large size. They are working on scaling up the necessary infrastructure to run more of these.

    1. This seems like a good leg of a testing strategy.

    2. But you need to test across all the humans who may try to misuse the system.

    3. And you have to test while they have access to everything they will have later.

  3. Automated test evaluations are potentially useful to test autonomous actions. However, scaling the tasks while keeping them sufficiently accurate is difficult and engineering-intensive.

    1. Again, this seems like a good leg of a testing strategy.

    2. I do think there is no alternative to some form of this.

    3. You need to be very cautious interpreting the results, and take into account what things could be refined or fixed later, and all that.

  4. Expert red-teaming is ‘less rigorous and reproducible’ but has proven valuable.

    1. When done properly this does seem most informative.

    2. Indeed, ‘release and let the world red-team it’ is often very informative, with the obvious caveat that it could be a bit late to the party.

    3. If you are not doing some version of this, you’re not testing for real.

Then we get to their central focus, which has been on setting their ASL-3 standard. What would be sufficient defenses and mitigations for a model where even a low rate of misuse could be catastrophic?

For human misuse they expect a defense-in-depth approach, using a combination of RLHF, CAI, classifiers of misuse at multiple stages, incident reports and jailbreak patching. And they intend to red team extensively.

This makes me sigh and frown. I am not saying it could never work. I am however saying that there is no record of anyone making such a system work, and if it would work later it seems like it should be workable now?

Whereas all the major LLMs, including Claude Opus, currently have well-known, fully effective and fully unpatched jailbreaks, that allow the user to do anything they want.

An obvious proposal, if this is the plan, is to ask us to pick one particular behavior that Claude Opus should never, ever do, which is not vulnerable to a pure logical filter like a regular expression. Then let’s have a prediction market in how long it takes to break that, run a prize competition, and repeat a few times.

For assurance structures they mention the excellent idea of their Impossible Mission Force (they continue to call this the ‘Alignment Stress-Testing Team’) as a second line of defense, and ensuring strong executive support and widespread distribution of reports.

My summary would be that most of this is good on the margin, although I wish they had a superior ASL-3 plan to defense in depth using currently failing techniques that I do not expect to scale well. Hopefully good testing will mean that they realize that plan is bad once they try it, if it comes to that, or even better I hope to be wrong.

The main criticisms I discussed previously are mostly unchanged for now. There is much talk of working to pay down the definitional and preparatory debts that Anthropic admits that it owes, which is great to hear. I do not yet see payments. I also do not see any changes to address criticisms of the original policy.

And they need to get moving. ASL-3 by EOY is trading at 25%, and Anthropic’s own CISO says 50% within 9 months.

Jason Clinton: Hi, I’m the CISO [Chief Information Security Officer] from Anthropic. Thank you for the criticism, any feedback is a gift.

We have laid out in our RSP what we consider the next milestone of significant harms that we’re are testing for (what we call ASL-3): https://anthropic.com/responsible-scaling-policy (PDF); this includes bioweapons assessment and cybersecurity.

As someone thinking night and day about security, I think the next major area of concern is going to be offensive (and defensive!) exploitation. It seems to me that within 6-18 months, LLMs will be able to iteratively walk through most open source code and identify vulnerabilities. It will be computationally expensive, though: that level of reasoning requires a large amount of scratch space and attention heads. But it seems very likely, based on everything that I’m seeing. Maybe 85% odds.

There’s already the first sparks of this happening published publicly here: https://security.googleblog.com/2023/08/ai-powered-fuzzing-b… just using traditional LLM-augmented fuzzers. (They’ve since published an update on this work in December.) I know of a few other groups doing significant amounts of investment in this specific area, to try to run faster on the defensive side than any malign nation state might be.

Please check out the RSP, we are very explicit about what harms we consider ASL-3. Drug making and “stuff on the internet” is not at all in our threat model. ASL-3 seems somewhat likely within the next 6-9 months. Maybe 50% odds, by my guess.

There is quite a lot to do before ASL-3 is something that can be handled under the existing RSP. ASL-4 is not yet defined. ASL-3 protocols have not been identified let alone implemented. Even if the ASL-3 protocol is what they here sadly hint it is going to be, and is essentially ‘more cybersecurity and other defenses in depth and cross our fingers,’ You Are Not Ready.

Then there’s ASL-4, where if the plan is ‘the same thing only more of it’ I am terrified.

Overall, though, I want to emphasize positive reinforcement for keeping us informed.

Music and general training departments, not the Scarlett Johansson department.

Ed-Newton Rex: Sony Music today sent a letter to 700 AI companies demanding to know whether they’ve used their music for training.

  1. They say they have “reason to believe” they have

  2. They say doing so constitutes copyright infringement

  3. They say they’re open to discussing licensing, and they provide email addresses for this.

  4. They set a deadline of later this month for responses

Art Keller: Rarely does a corporate lawsuit warm my heart. This one does! Screw the IP-stealing AI companies to the wall, Sony! The AI business model is built on theft. It’s no coincidence Sam Altman asked UK legislators to exempt AI companies from copyright law.

The central demands here are explicit permission to use songs as training data, and a full explanation within a month of all ways Sony’s songs have been used.

Thread claiming many articles in support of generative AI in its struggle against copyright law and human creatives are written by lawyers and paid for by AI companies. Shocked, shocked, gambling in this establishment, all that jazz.

Noah Smith writes The death (again) of the Internet as we know it. He tells a story in five parts.

  1. The eternal September and death of the early internet.

  2. The enshittification (technical term) of social media platforms over time.

  3. The shift from curation-based feeds to algorithmic feeds.

  4. The rise of Chinese and Russian efforts to sow dissention polluting everything.

  5. The rise of AI slop supercharging the Internet being no fun anymore.

I am mostly with him on the first three, and even more strongly in favor of the need to curate one’s feeds. I do think algorithmic feeds could be positive with new AI capabilities, but only if you have and use tools that customize that experience, both generally and in the moment. The problem is that most people will never (or rarely) use those tools even if offered. Rarely are they even offered.

Where on Twitter are the ‘more of this’ and ‘less of this’ buttons, in any form, that aren’t public actions? Where is your ability to tell Grok what you want to see? Yep.

For the Chinese and Russian efforts, aside from TikTok’s algorithm I think this is greatly exaggerated. Noah says it is constantly in his feeds and replies but I almost never see it and when I do it is background noise that I block on sight.

For AI, the question continues to be what we can do in response, presumably a combination of trusted sources and whitelisting plus AI for detection and filtering. From what we have seen so far, I continue to be optimistic that technical solutions will be viable for some time, to the extent that the slop is actually undesired. The question is, will some combination of platforms and users implement the solutions?

Avital Balwit of Anthropic writes about what is potentially [Her] Last Five Years of Work. Her predictions are actually measured, saying that knowledge work in particular looks to be largely automated soon, but she expects physical work including childcare to take far longer. So this is not a short timelines model. It is a ‘AI could automate all knowledge work while the world still looks normal but with a lot more involuntary unemployment’ model.

That seems like a highly implausible world to me. If you can automate all knowledge work, you can presumably also automate figuring out how to automate the plumber. Whereas if you cannot do this, then there should be enough tasks out there and enough additional wealth to stimulate demand that those who still want gainful employment should be able to find it. I would expect the technological optimist perspective to carry the day within that zone.

Most of her post asks about the psychological impact of this future world. She asks good questions such as: What will happen to the unemployed in her scenario? How would people fill their time? Would unemployment be mostly fine for people’s mental health if it wasn’t connected to shame? Is too much ‘free time’ bad for people, and does this effect go away if the time is spent socially?

The proposed world has contradictions in it that make it hard for me to model what happens, but my basic answer is that the humans would find various physical work and and status games and social interactions (including ‘social’ work where you play various roles for others, and also raising a family) and experiential options and educational opportunities and so on to keep people engaged if they want that. There would however be a substantial number of people who by default fall into inactivity and despair, and we’d need to help with that quite a lot.

Mostly for fun I created a Manifold Market on whether she will work in 2030.

Ian Hogarth gives his one-year report as Chair of the UK AI Safety Institute. They now have a team of over 30 people and are conducting pre-deployment testing, and continue to have open rolls. This is their latest interim report. Their AI agent scaffolding puts them in third place (if you combine the MMAC entries) in the GAIA leaderboard for such things. Good stuff.

They are also offering fast grants for systemic AI safety. Expectation is 20 exploratory or proof-of-concept grants with follow-ups. Must be based in the UK.

Geoffrey Irving also makes a strong case that working at AISI would be an impactful thing to do in a positive direction, and links to the careers page.

Anthropic gives Claude tool use, via public beta in the API. It looks straightforward enough, you specify the available tools, Claude evaluates whether to use the tools available, and you can force it to if you want that. I don’t see any safeguards, so proceed accordingly.

Google Maps how has AI features, you can talk to it, or have it pull up reviews in street mode or take an immersive view of a location or search a location’s photos or the photos of the entire area around you for an item.

In my earlier experiments, Google Maps integration into Gemini was a promising feature that worked great when it worked, but it was extremely error prone and frustrating to use, to the point I gave up. Presumably this will improve over time.

OpenAI partners with Reddit. Reddit posts, including recent ones, will become available to ChatGPT and other products. Presumably this will mean ChatGPT will be allowed to quote Reddit posts? In exchange, OpenAI will buy advertising and offer Reddit.com various AI website features.

For OpenAI, as long as the price was reasonable this seems like a big win.

It looks like a good deal for Reddit based on the market’s reaction. I would presume the key risks to Reddit are whether the user base responds in hostile fashion, and potentially having sold out cheap.

Or they may be missing an opportunity to do something even better. Yishan provides a vision of the future in this thread.

Yishan:

Essentially, the AI acts as a polite listener to all the high-quality content contributions, and “buffers” those users from any consumers who don’ t have anything to contribute back of equivalent quality.

It doesn’t have to be an explicit product wall. A consumer drops in and also happens to have a brilliant contribution or high-quality comment naturally makes it through the moderation mechanisms and becomes part of the community.

The AI provides a great UX for consuming the content. It will listen to you say “that’s awesome bro!” or receive your ungrateful, ignorant nitpicking complaints with infinite patience so the real creator doesn’t have to expend the emotional energy on useless aggravation.

The real creators of the high-quality content can converse happily with other creators who appreciate their work and understand how to criticize/debate it usefully, and they can be compensated (if the platform does that) via the AI training deals.

In summary: User Generated Content platforms should do two things:

  1. Immediately implement draconian moderation focused entirely on quality.

  2. Sign deals with large AI firms to license their content in return for money.

OpenAI has also signed a deal with Newscorp for access to their content, which gives them the Wall Street Journal and many others.

A source tells me that OpenAI informed its employees that they will indeed update their documents regarding employee exit and vested equity. The message says no vested equity has ever actually been confiscated for failure to sign documents and it never will be.

On Monday I set up this post:

Like this post to indicate:

  1. That you are not subject to a non-disparagement clause with respect to OpenAI or any other AI company.

  2. That you are not under an NDA with an AI company that would be violated if you revealed that the NDA exists.

At 168 likes, we now have one employee from DeepMind, and one from Anthropic.

Jimmy Apples claimed without citing any evidence that Meta will not open source (release the weights, really) of Llama-3 405B, attributing this to a mix of SB 1047 and Dustin Moskovitz. I was unable to locate an independent source or a further explanation. He and someone reacting to him asked Yann LeCunn point blank, Yann replied with ‘Patience my blue friend. It’s still being tuned.’ For now, the Manifold market I found is not reacting continues to trade at 86% for release, so I am going to assume this was another disingenuous inception attempt to attack SB 1047 and EA.

ASML and TSMC have a kill switch for their chip manufacturing machines, for use if China invades Taiwan. Very good to hear, I’ve raised this concern privately. I would in theory love to also have ‘put the factory on a ship in an emergency and move it’ technology, but that is asking a lot. It is also very good that China knows this switch exists. It also raises the possibility of a remote kill switch for the AI chips themselves.

Did you know Nvidia beat earnings again yesterday? I notice that we are about three earnings days into ‘I assume Nvidia is going to beat earnings but I am sufficiently invested already due to appreciation so no reason to do anything more about it.’ They produce otherwise mind boggling numbers and I am Jack’s utter lack of surprise. They are slated to open above 1,000 and are doing a 10:1 forward stock split on June 7.

Toby Ord goes into questions about the Turing Test paper from last week, emphasizing that by the original definition this was impressive progress but still a failure, as humans were judged human substantially more often than all AIs. He encourages AI companies to include the original Turing Test in their model testing, which seems like a good idea.

OpenAI has a super cool old-fashioned library. Cade Metz here tries to suggest what each book selection from OpenAI’s staff might mean, saying more about how he thinks than about OpenAI. I took away that they have a cool library with a wide variety of cool and awesome books.

JP Morgan says every new hire will get training in prompt engineering.

Scale.ai raises $1 billion at a $13.8 billion valuation in a ‘Series F.’ I did not know you did a Series F and if I got that far I would skip to a G, but hey.

Suno.ai Raises $125 million for music generation.

New dataset from Epoch AI attempting to hart every model trained with over 10^23 flops (direct). Missing Claude Opus, presumably because we don’t know the number.

Not necessarily the news department: OpenAI publishes a ten-point safety update. The biggest update is that none of this has anything to do with superalignment, or with the safety or alignment of future models. This is all current mundane safety, plus a promise to abide by the preparedness framework requirements. There is a lot of patting themselves on the back for how safe everything is, and no new initiatives, although this was never intended to be that sort of document.

Then finally there’s this:

  1. Safety decision making and Board oversight: As part of our Preparedness Framework, we have an operational structure for safety decision-making. Our cross-functional Safety Advisory Group reviews model capability reports and makes recommendations ahead of deployment. Company leadership makes the final decisions, with the Board of Directors exercising oversight over those decisions. 

Hahahahahahahahahahahahahahahahahahaha.

That does not mean that mundane safety concerns are a small thing.

Why let the AI out of the box when you can put the entire box into the AI?

Windows Latest: Microsoft announces “Recall” AI for Windows 11, a new feature that runs in the background and records everything you see and do on your PC.

[Here is a one minute video explanation.]

Seth Burn: If we had laws about such things, this might have violated them.

Aaron: This is truly shocking, and will be preemptively banned at all government agencies as it almost certainly violates STIG / FIPS on every conceivable surface.

Seth Burn: If we had laws, that would sound bad.

Elon Musk: This is a Black Mirror episode.

Definitely turning this “feature” off.

Vitalik Buterin: Does the data stay and get processed on-device or is it being shipped to a central server? If the latter, then this crosses a line.

[Satya says it is all being done locally.]

Abinishek Mishra (Windows Latest): Recall allows you to search through your past actions by recording your screen and using that data to help you remember things.

Recall is able to see what you do on your PC, what apps you use, how you use the apps, and what you do inside the apps, including your conversations in apps like WhatsApp. Recall records everything, and saves the snapshots in the local storage.

Windows Latest understands that you can manually delete the “snapshots”, and filter the AI from recording certain apps.

So, what are the use cases of Recall? Microsoft describes Recall as a way to go back in time and learn more about the activity.

For example, if you want to refer to a conversation with your colleague and learn more about your meeting, you can ask Recall to look into all the conversations with that specific person. The recall will look for the particular conversation in all apps, tabs, settings, etc.

With Recall, locating files in a large download pileup or revisiting your browser history is easy. You can give commands to Recall in natural language, eliminating the need to type precise commands.

You can converse with it like you do with another person in real life.

TorNis Entertainment: Isn’t this is just a keylogger + screen recorder with extra steps? I don’t know why you guys are worried. Isn’t this is just a keylogger + screen recorder with extra steps?

I don’t know why you guys are worried 😓

Thaddeus:

[Microsoft: we got hacked by China and Russia because of our lax security posture and bad software, but we are making security a priority.

Also Microsoft: Windows will now constantly record your screen, including sensitive data and passwords, and just leave it lying around.]

Kevin Beaumont: From Microsoft’s own FAQ: “Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers.”

Microsoft also announced live caption translations, auto super resolution upscaling on apps (yes with a toggle for each app, wait those are programs, wtf), AI in paint and automatic blurring (do not want).

This is all part of the new ‘Copilot+’ offering for select new PCs, including their new Microsoft Surface machines. You will need a Snapdragon X Elite and X Plus, 40 TOPs, 225 GB of storage and 16 GB RAM. Intel and AMD chips can’t cut it (yet) but they are working on that.

(Consumer feedback report: I have a Microsoft Surface from a few years ago, it was not worth the price and the charger is so finicky it makes me want to throw things. Would not buy again.)

I would hope this would at least be opt-in. Kevin Beaumont reports it will be opt-out, citing this web page from Microsoft. It appears to be enabled by default on Copilot+ computers. My lord.

At minimum, even if you do turn it off, it does not seem that hard to turn back on:

Kevin Beaumont: Here’s the Recall UI. You can silently turn it on with Powershell, if you’re a threat actor.

I would also not trust a Windows update to not silently turn it back on.

The UK Information Commissioner’s Office (ICO) is looking into this, because yeah.

In case it was not obvious, you should either:

  1. Opt in for the mundane utility, and embrace that your computer has recorded everything you have ever done and that anyone with access to your system or your files, potentially including a crook, Microsoft, the NSA or FBI, China or your spouse now fully owns you, and also that an AI knows literal everything you do. Rely on a combination of security through obscurity, defense in depth and luck. To the extent you can, keep activities and info you would not want exposed this way off of your PC, or ensure they are never typed or displayed onscreen using your best Randy Waterhouse impression.

  2. Actually for real accept that the computer in question is presumed compromised, use it only for activities where you don’t mind, never enter any passwords there, and presumably have a second computer for activities that need to be secure, or perhaps confine them to a phone or tablet.

  3. Opt out and ensure that for the love of God your machine cannot use this feature.

I am not here to tell you which of those is the play.

I only claim that it seems that soon you must choose.

If the feature is useful, a large number of people are going to choose option one.

I presume almost no one will pick option two, except perhaps for gaming PCs.

Option three is viable.

If there is one thing we have learned during the rise of AI, and indeed during the rise of computers and the internet, it is that almost all people will sign away their privacy and technological vulnerability for a little mundane utility, such as easier access to cute pictures of cats.

Yelling at them that they are being complete idiots is a known ineffective response.

And who is to say they even are being idiots? Security through obscurity is, for many people, a viable strategy up to a point.

Also, I predict your phone is going to do a version of this for you by default within a few years, once the compute and other resources are available for it. I created a market on how quickly. Microsoft is going out on far less of a limb than it might look like.

In any case, how much mundane utility is available?

Quite a bit. You would essentially be able to remember everything, ask the AI about everything, have it take care of increasingly complex tasks with full context, and this will improve steadily over time, and it will customize to what you care about.

If you ignore all the obvious horrendous downsides of giving an AI this level of access to your computer, and the AI behind it is good, this is very clearly The Way.

There are of course some people who will not do this.

How long before they are under increasing pressure to do it? How long until it becomes highly suspicious, as if they have something to hide? How long until it becomes a legal requirement, at best in certain industries like finance? 

Ben Thompson, on the other hand, was impressed, calling the announcement event ‘the physical manifestation of CEO Satya Nadella’s greatest triumph’ and ‘one of the most compelling events I’ve attended in a long time.’ Ben did not mention the privacy and security issues.

Ethan Mollick perspective on model improvements and potential AGI. He warns that AIs are more like aliens that get good at tasks one by one, and when they are good they by default get very good at that task quickly, but they are good at different things than we are, and over time that list expands. I wonder to what extent this is real versus the extent this is inevitable when using human performance as a benchmark while capabilities steadily improve, so long as machines have comparative advantages and disadvantages. If the trends continue, then it sure seems like the set of things they are better at trends towards everything.

Arthur Breitman suggests Apple isn’t developing LLMs because there is enough competition that they are not worried about vender lock-in, and distribution matters more. Why produce an internal sub-par product? This might be wise.

Microsoft CTO Kevin Scott claims ‘we are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute.’

Gary Marcus offered to Kevin Scott him $100k on that.

This was a truly weird speech on future challenges of AI by Randall Kroszner, external member of the Financial Policy Committee of the Bank of England. He talks about misalignment and interpretability, somehow. Kind of. He cites the Goldman Sacks estimate of 1.5% labor productivity and 7% GDP growth over 10 years following widespread AI adaptation, that somehow people say with a straight face, then the flip side is McKinsey saying 0.6% annual labor productivity growth by 2040, which is also not something I could say with a straight face. And he talks about disruptions and innovation aids and productivity estimation J-curves. It all sounds so… normal? Except with a bunch of things spiking through. I kept having to stop to just say to myself ‘my lord that is so weird.’

Politico is at it again. Once again, the framing is a background assumption that any safety concerns or fears in Washington are fake, and the coming regulatory war is a combination of two fights over Lenin’s question of who benefits.

  1. A fight between ‘Big Tech’ and ‘Silicon Valley’ over who gets regulatory capture and thus Washington’s regulatory help against the other side.

  2. An alliance of ‘Big Tech’ and ‘Silicon Valley’ against Washington to head off any regulations that would interfere with both of them.

That’s it. Those are the issues and stakes in play. Nothing else.

How dismissive is this of safety? Here are the two times ‘safety’ is mentioned:

Matthew Kaminski (Politico): On Capitol Hill and in the White House, that alone breeds growing suspicion and defensiveness. Altman and others, including from another prominent AI startup Anthropic, weighed in with ideas for the Biden administration’s sweeping executive order last fall on AI safety and development.

Testing standards for AI are easy things to find agreement on. Safety as well, as long as those rules don’t favor one or another budding AI player. No one wants the technology to help rogue states or groups. Silicon Valley is on America’s side against China and even more concerned about the long regulatory arm of the EU than Washington.

Testing standards are ‘easy things to find agreement on’? Fact check: Lol, lmao.

That’s it. The word ‘risk’ appears twice and neither has anything to do with safety. Other words like ‘capability,’ ‘existential’ or any form of ‘catastrophic’ do not appear. It is all treated as obviously irrelevant.

The progress is here they stopped trying to bulk up people worried about safety as boogeymen (perhaps because this is written by Matthew Kaminski, not Brendon Bordelon), and instead point to actual corporations that are indeed pursuing actual profits, with Silicon Valley taking on Big Tech. And I very much appreciate that ‘open source advocates’ has now been properly identified as Silicon Valley pursuing its business interests.

Rohit Chopra (Consumer Financial Protection Bureau): There is a winner take all dimension. We struggle to see how it doesn’t turn, absent some government intervention, into a market structure where the foundational AI models are not dominated by a handful of the big tech companies.

Matthew Kaminski: Saying “star struck” policymakers across Washington have to get over their “eyelash batting awe” over new tech, Chopra predicts “another chapter in which big tech companies are going to face some real scrutiny” in the near future, especially on antitrust.

Lina Khan, the FTC’s head who has used the antitrust cudgel against big tech liberally, has sounded the warnings. “There is no AI exemption to the laws on the books,” she said last September.

For self-interested reasons, venture capitalists want to open up the space in Silicon Valley for new entrants that they can invest in and profitably exit from. Their arguments for a more open market will resonate politically.

Notice the escalation. This is not ‘Big Tech wants regulatory capture to actively enshrine its advantages, and safety is a Big Tech plot.’ This is ‘Silicon Valley wants to actively use regulatory action to prevent Big Tech from winning,’ with warnings that attempts to not have a proper arms race to ever more capable systems will cause intervention from regulators. By ‘more open market’ they mean ‘government intervention in the market,’ government’s favorite kind of new freer market.

As I have said previously, we desperately need to ensure that there are targeted antitrust exemptions available so that when AI labs can legally collaborate around safety issues they are not accused of collusion. It would be completely insane to not do this.

And as I keep saying, open source advocates are not asking for a level playing field or a lack of government oppression. They are asking for special treatment, to be exempt from the rules of society and the consequences of their actions, and also for the government to directly cripple their opponents for them.

Are they against regulatory capture? Only if they don’t get to do the capturing.

Then there is the second track, the question of guardrails that might spoil the ‘libertarian sandbox,’ which neither ‘side’ of tech wants here.

Here is the two mentions of ‘risk’:

“There is a risk that people think of this as social media 2.0 because its first public manifestation was a chat bot,” Kent Walker, Google’s president of global affairs, tells me over a conversation at the search giant’s offices here.

People out on the West Coast quietly fume about having to grapple with Washington. The tech crowd says the only fight that matters is the AI race against China and each other. But they are handling politics with care, all too aware of the risks.

I once again have been roped into extensively covering a Politico article, because it is genuinely a different form of inception than the previous Politico inception attempts. But let us continue to update that Politico is extraordinarily disingenuous and hostilely motivated on the subject of AI regulation. This is de facto enemy action.

Here, Shakeel points out the obvious central point being made here, which is that most of the money and power in this fight is Big Tech companies fighting not only to avoid any regulations at all, but to get exemptions from other ordinary rules of society. When ethics advocates portray notkilleveryoneism (or safety) advocates as their opponents, that is their refusal to work together towards common goals and also it misses the point. Similarly, here Seán Ó hÉigeartaigh expresses concern about divide-and-conquer tactics targeting these two groups despite frequently overlapping and usually at least complementary proposals and goals.

Or perhaps the idea is to illustrate that all the major players in Tech are aligned in being motivated by profit and in dismissing all safety concerns as fake? And a warning that Washington is in danger of being convinced? I would love that to be true. I do not think a place like Politico works that subtle these days, nor do I expect those who need to hear that message to figure out that it is there.

If we care about beating China, by far the most valuable thing we can do is allow more high-skilled immigration. Many of their best and brightest want to become Americans.

This is true across the board, for all aspects of our great power competition.

It also applies to AI.

From his thread about the Schumer report:

Peter Wildeford: Lastly, while immigration is a politically fraught subject, it is immensely stupid for the US to not do more to retain top talent. So it’s awesome to see the roadmap call for more high-skill immigration, in a bipartisan way.

The immigration element is important for keeping the US ahead in AI. While the US only produces 20% of top AI talent natively, more than half of that talent lives and works in the US due to immigration. That number could be even higher with important reform.

I suspect the numbers are even more lopsided than this graph suggests.

To what extent is being in America a key element of being a top-tier AI researcher? How many of these same people would have been great if they had stayed at home? If they had stayed at home, would others have taken their place here in America? We do not know. I do know it is essentially impossible that this extent is so large we would not want to bring such people here.

Do we need to worry about those immigrants being a security risk, if they come from certain nations like China and we were to put them into OpenAI, Anthropic or DeepMind? Yes, that does seem like a problem. But there are plenty of other places they could go, where it is much less of a problem.

Labour vows to force firms developing powerful AI to meet requirements.

Nina Lloyd (The Independent): Labour has said it would urgently introduce binding requirements for companies developing powerful artificial intelligence (AI) after Rishi Sunak said he would not “rush” to regulate the technology.

The party has promised to force firms to report before they train models over a certain capability threshold and to carry out safety tests strengthened by independent oversight if it wins the next general election.

Unless something very unexpected happens, they will win the next election, which is currently scheduled for July 4.

This is indeed the a16z dilemma:

John Luttig: A16z simultaneously argues

  1. The US must prevent China from dominating AI.

  2. Open source models should proliferate freely across borders (to China).

What does this mean? Who knows. I’m just glad at Founders Fund we don’t have to promote every current thing at once.

The California Senate has passed SB 1047, by a vote of 32-1.

An attempt to find an estimate of the costs of compliance with SB 1047. The attempt appears to fail, despite some good discussions.

This seems worth noting given the OpenAI situation last week:

Dan Hendrycks: For what it’s worth, when Scott Weiner and others were receiving feedback from all the major AI companies (Meta, OpenAI, etc.) on the SB 1047 bill, Sam [Altman] was explicitly supportive of whistleblower protections.

Scott Wiener Twitter thread and full open letter on SB 1047.

Scott Wiener: If you only read one thing in this letter, please make it this: I am eager to work together with you to make this bill as good as it can be.

There are over three more months for discussion, deliberation, feedback, and amendments.

You can also reach out to my staff anytime, and we are planning to hold a town hall for the AI community in the coming weeks to create more opportunities for in-person discussion.

Bottom line [changed to numbered list including some other section headings]:

  1. SB 1047 doesn’t ban training or deployment of any models.

  2. It doesn’t require licensing or permission to train or deploy any models.

  3. It doesn’t threaten prison (yes, some are making this baseless claim) for anyone based on the training or deployment of any models.

  4. It doesn’t allow private lawsuits against developers.

  5. It doesn’t ban potentially hazardous capabilities.

  6. And it’s not being “fast tracked,” but rather is proceeding according to the usual deliberative legislative process, with ample opportunity for feedback and amendments remaining.

  7. SB 1047 doesn’t apply to the vast majority of startups.

  8. The bill applies only to concrete and specific risks of catastrophic harm.

  9. Shutdown requirements don’t apply once models leave your control.

  10. SB 1047 provides significantly more clarity on liability than current law.

  11. Enforcement is very narrow in SB 1047. Only the AG can file a lawsuit.

  12. Open source is largely protected under the bill.

What SB 1047 *doesrequire is that developers who are training and deploying a frontier model more capable than any model currently released must engage in safety testing informed by academia, industry best practices, and the existing state of the art. If that testing shows material risk of concrete and specific catastrophic threats to public safety and security — truly huge threats — the developer must take reasonable steps to mitigate (not eliminate) the risk of catastrophic harm. The bill also creates basic standards like the ability to disable a frontier AI model while it remains in the developer’s possession (not after it is open sourced, at which point the requirement no longer applies), pricing transparency for cloud compute, and a “know your customer” requirement for cloud services selling massive amounts of compute capacity.

Our intention is that safety and mitigation requirements be borne by highly-resourced developers of frontier models, not by startups & academic researchers. We’ve heard concerns that this isn’t clear, so we’re actively considering changes to clarify who is covered.

After meeting with a range of experts, especially in the open source community, we’re also considering other changes to the definitions of covered models and derivative models. We’ll continue making changes over the next 3 months as the bill proceeds through the Legislature.

This very explicitly clarifies the intent of the bill across multiple misconceptions and objections, all in line with my previous understanding.

They actively continue to solicit feedback and are considering changes.

If you are concerned about the impact of this bill, and feel it is badly designed or has flaws, the best thing you can do is offer specific critiques and proposed changes.

I strongly agree with Weiner that this bill is light touch relative to alternative options. I see Pareto improvements we could make, but I do not see any fundamentally different lighter touch proposals that accomplish what this bill sets out to do.

I will sometimes say of a safety bill, sometimes in detail: It’s a good bill, sir.

Other times, I will say: It’s a potentially good bill, sir, if they fix this issue.

That is where I am at with SB 1047. Most of the bill seems very good, an attempt to act with as light a touch as possible. There are still a few issues with it. The derivative model definition as it currently exists is the potential showstopper bug.

To summarize the issue once more: As written, if interpreted literally and as I understand it, it allows developers to define themselves as derivative of an existing model. This, again if interpreted literally, lets them evade all responsibilities, and move those onto essentially any covered open model of the same size. That means both that any unsafe actor goes unrestricted (whether they be open or closed), and releasing the weights of a covered model creates liability no matter how responsible you were, since they can effectively start the training over from scratch.

Scott Weiner says he is working on a fix. I believe the correct fix is a compute threshold for additional training, over which a model is no longer derivative, and the responsibilities under SB 1047 would then pass to the new developer or fine-tuner. Some open model advocates demand that responsibility for derivative models be removed entirely, but that would transparently defeat the purpose of preventing catastrophic harm. Who cares if your model is safe untuned, if you can fine-tune it to be unsafe in an hour with $100?

Then at other times, I will look at a safety or other regulatory bill or proposal, and say…

So it seems only fair to highlight some not good ideas, and say: Not a good idea.

One toy example would be the periodic complaints about Section 230. Here is a thread on the latest such hearing this week, pointing out what would happen without it, and the absurdity of the accusations being thrown around. Some witnesses are saying 230 is not needed to guard platforms against litigation, whereas it was created because people were suing platforms.

Adam Thierer reports there are witnesses saying the Like and Thumbs Up buttons are dangerous and should be regulated.

Brad Polumbo here claims that GLAAD says Big Tech companies ‘should cease the practice of targeted surveillance advertising, including the use of algorithmic content recommendation.’

From April 23, Adam Thierer talks about proposals to mandate ‘algorithmic audits and impact assessments,’ which he calls ‘NEPA for AI.’ Here we have Assembly Bill 2930, requiring impact assessments by developers, and charge $25,000 per instance of ‘algorithmic discrimination.’

Another example would be Colorado passing SB24-205, Consumer Protections for Artificial Intelligence, which is concerned with algorithmic bias. Governor Jared Polis signed with reservations. Dean Ball has a critique here, highlighting ambiguity in the writing, but noting they have two full years to fix that before it goes into effect.

I would be less concerned with the ambiguity, and more concerned about much of the actual intent and the various proactive requirements. I could make a strong case that some of the stuff here is kind of insane, and also seems like a generic GPDR-style ‘you have to notify everyone that AI was involved in every meaningful decision ever.’ The requirements apply regardless of size, and worry about impacts that are the kind of thing society can mitigate as we go.

The good news is that there are also some good provisions like IDing AIs, and also full enforcement of the bad parts seems impossible? I am very frustrated that a bill that isn’t trying to address catastrophic risks, but seems far harder to comply with, and seems far worse to me than SB 1047, seems to mostly get a pass. Then again, it’s only Colorado.

I do worry about Gell-Mann amnesia. I have seen so many hyperbolic statements, and outright false statements, about AI bills, often from the same people that point out what seem like obviously horrible other proposed regulatory bills and policies. How can one trust their statements about the other bills, short of reading the actual bills (RTFB)? If it turned out they were wrong, and this time the bill was actually reasonable, who would point this out?

So far, when I have dug deeper, the bills do indeed almost always turn out to be terrible, but the ‘rumors of the death of the internet’ or similar potential consequences are often greatly exaggerated. The bills are indeed reliably terrible, but not as terrible as claimed. Alas, I must repeat my lament that I know of no RTFB person I can turn to on other topics, and my cup doth overflow.

I return to the Cognitive Revolution to discuss various events of the past week first in part one, then this is part two. Recorded on Friday, things have changed by the time you read this.

From last week’s backlog: Dwarkesh Patel as guest on 80k After Hours. Not full of gold on the level of Dwarkesh interviewing others, and only partly about AI. There is definitely gold in those hills for those who want to go into these EA-related weeds. If you don’t want that then skip this one.

Around 51: 45 Dwarkesh notes there is no ‘Matt Levine for AI’ and that picking up that mantle would be a good thing to do. I suppose I still have my work cut out.

A lot of talk about EA and 80k Hours ways of thinking about how to choose paths in life, that I think illustrates well both the ways it is good (actively making choices rather than sleepwalking, having priorities) and not as good (heavily favoring the legible).

Some key factors in giving career advice they point out are that from a global perspective power laws apply and the biggest impacts are a huge share of what matters, and that much advice (such as ‘don’t start a company in college’) is only good advice because the people to whom it is horribly bad advice will predictably ignore it.

Why does this section exist? This is a remarkably large fraction of why.

Emmett Shear: The number one rule of building things that can destroy the entire world is don’t do that.

Surprisingly it is also rule 2, 3, 4, 5, and 6.

Rule seven, however, is “make it emanate ominous humming and glow with a pulsing darkness”.

Eliezer Yudkowsky: Emmett.

Emmett Shear (later): Shocking amount of pushback on “don’t build stuff that can destroy the world”. I’d like to take this chance to say I stand by my apparently controversial opinion that building things to destroy the world is bad. In related news, murder is wrong and bad.

Follow me for more bold, controversial, daring takes like these.

Emmett Shear (other thread): Today has been a day to experiment with how obviously true I can make a statement before people stop disagreeing with it.

This is a Platonic encapsulation of this class of argument:

Emmett Shear: That which can be asserted without evidence can be dismissed without evidence.

Ryan Shea: Good point, but not sure he realizes this applies to AI doomer prophecy.

Emmett Shear: Not sure you realize this applies to Pollyanna assertions that don’t worry, a fully self-improving AI will be harmless. There’s a lot of evidence autocatalytic loops are potentially dangerous.

Ryan Shea: The original post is a good one. And I’m not making a claim that there’s no reason at all to worry. Just that there isn’t a particular reason to do so.

Emmett Shear: Forgive me if your “there’s not NO reason to worry, but let’s just go ahead with something potentially massively dangerous” argument doesn’t hold much reassurance for me.

[it continues from there, but gets less interesting and stops being Platonic.]

The latest reiteration of why p(doom) is useful even if highly imprecise, and why probabilities and probability ranges are super useful in general for communicating your actual epistemic state. In particular, that when Jan Leike puts his at ‘10%-90%’ this is a highly meaningful and useful statement of what assessments he considers reasonable given the evidence, providing much more information than saying ‘I don’t know.’ It is also more information than ‘50%.’

For the record: This, unrelated to AI, is the proper use of the word ‘doomer.

The usual suspects, including Bengio, Hinton, Yao and 22 others, write the usual arguments in the hopes of finally getting it right, this time as Managing Extreme AI Risks Amid Rapid Progress in Science.

I rarely see statements like this, so it was noteworthy that someone noticed.

Mike Solana: Frankly, I was ambivalent on the open sourced AI debate until yesterday, at which point the open sourced side’s reflexive, emotional dunking and identity-based platitudes convinced me — that almost nobody knows what they think, or why.

It is even more difficult when you don’t know what ‘alignment’ means.

Which, periodic reminder, you don’t.

Rohit: We use AI alignment to mean:

  1. Models do what we ask.

  2. Models don’t do bad things even if we ask.

  3. Models don’t fail catastrophically.

  4. Models don’t actively deceive us.

And all those are different problems. Using the same term creates confusion.

Here we have one attempt to choose a definition, and cases for and against it:

Iason Gabriel: The new international scientific report on AI safety is impressive work, but it’s problematic to define AI alignment as:

“the challenge of making general-purpose AI systems act in accordance with the developer’s goals and interests”

Eliezer Yudkowsky: I defend this. We need separate words for the technical challenges of making AGIs and separately ASIs do any specified thing whatsoever, “alignment”, and the (moot if alignment fails) social challenge of making that developer target be “beneficial”.

Good advice given everything we know these days:

Mesaoptimizer: If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.

That does not express a strong opinion on whether we currently know of a better plan.

And it is exceedingly difficult when you do not attempt to solve the problem.

Dean Ball says here, in the most thoughtful version I have seen of this position by far, that the dissolution of the Superalignment team was good because distinct safety teams create oppositionalism, become myopic about box checking and employee policing rather than converging on the spirit of actual safety. Much better to diffuse the safety efforts throughout the various teams. Ball does note that this does not apply to the extent the team was doing basic research.

There are three reasons this viewpoint seems highly implausible to me.

  1. The Superalignment team was indeed tasked with basic research. Solving the problem is going to require quite a lot of basic research, or at least work that is not incremental progress on current incremental commercial products. This is not about ensuring that each marginal rocket does not blow up, or the plant does not melt down this month. It is a different kind of problem, preparing for a very different kind of failure mode. It does not make sense to embed these people into product teams.

  2. This is not a reallocation of resources from a safety team to diffused safety work. This is a reallocation of resources, many of which were promised and never delivered, away from safety towards capabilities, as Dean himself notes. This is in addition to losing the two most senior safety researchers and a lot of others too.

  3. Mundane safety, making current models do what you want in ways that as Leike notes will not scale to when they matter most, does not count as safety towards the goals of the superalignment team or of us all not dying. No points.

Thus the biggest disagreement here, in my view, which is when he says this:

Dean Ball: Companies like Anthropic, OpenAI, and DeepMind have all made meaningful progress on the technical part of this problem, but this is bigger than a technical problem. Ultimately, the deeper problem is contending with a decentralized world, in which everyone wants something different and has a different idea for how to achieve their goals.

The good news is that this is basically politics, and we have been doing it for a long time. The bad news is that this is basically politics, and we have been doing it for a long time. We have no definitive answers.

Yes, it is bigger than a technical problem, and that is important.

OpenAI has not made ‘meaningful progress.’ Certainly we are not on track to solve such problems, and we should not presume they will essentially solve themselves with an ordinary effort, as is implied here.

Indeed, with that attitude, it’s Margaritaville (as in, we might as well start drinking Margaritas.) Whereas with the attitude of Leike and Sutskever, I disagreed with their approach, but I could have been wrong or they could have course corrected, if they had been given the resources to try.

Nor is the second phase problem that we also must solve well-described by ‘basically politics’ of a type we are used to, because there will be entities involved that are not human. Our classical liberal political solutions work better than known alternatives, and well enough for humans to flourish, by assuming various properties of humans and the affordances available to them. AIs with far greater intelligence, capabilities and efficiency, that can be freely copied, and so on, would break those assumptions.

I do greatly appreciate the self-awareness and honesty in this section:

Dean Ball: More specifically, I believe that classical liberalism—individualism wedded with pluralism via the rule of law—is the best starting point, because it has shown the most success in balancing the priorities of the individual and the collective. But of course I do. Those were my politics to begin with.

It is notable how many AI safety advocates, when discussing almost any topic except transformational AI, are also classical liberals. If this confuses you, notice that.

Not under the current paradigm, but worth noticing.

Also, yes, it really is this easy.

And yet, somehow it is still this hard? (I was not able to replicate this one, may be fake)

It’s a fun game.

Sometimes you stick the pieces together and know where it comes from.

A problem statement:

Jorbs: We have gone from

“there is no point in arguing with that person, their mind is already made up”

to

“there is no point in arguing with that person, they are made up.”

It’s coming.

Alex Press: The Future of Artificial Intelligence at Wendy’s.

Colin Fraser: Me at the Wendy’s drive thru in June: A farmer and a goat stand on the side of a riverbank with a boat for two.

[FreshAI replies]: Sir, this is a Wendy’s.

Are you ready?

AI #65: I Spy With My AI Read More »

ai-#62:-too-soon-to-tell

AI #62: Too Soon to Tell

What is the mysterious impressive new ‘gpt2-chatbot’ from the Arena? Is it GPT-4.5? A refinement of GPT-4? A variation on GPT-2 somehow? A new architecture? Q-star? Someone else’s model? Could be anything. It is so weird that this is how someone chose to present that model.

There was also a lot of additional talk this week about California’s proposed SB 1047.

I wrote an additional post extensively breaking that bill down, explaining how it would work in practice, addressing misconceptions about it and suggesting fixes for its biggest problems along with other improvements. For those interested, I recommend reading at least the sections ‘What Do I Think The Law Would Actually Do?’ and ‘What are the Biggest Misconceptions?’

As usual, lots of other things happened as well.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Do your paperwork for you. Sweet.

  4. Language Models Don’t Offer Mundane Utility. Because it is not yet good at it.

  5. GPT-2 Soon to Tell. What is this mysterious new model?

  6. Fun With Image Generation. Certified made by humans.

  7. Deepfaketown and Botpocalypse Soon. A located picture is a real picture.

  8. They Took Our Jobs. Because we wouldn’t let other humans take them first?

  9. Get Involved. It’s protest time. Against AI that is.

  10. In Other AI News. Incremental upgrades, benchmark concerns.

  11. Quiet Speculations. Misconceptions cause warnings of AI winter.

  12. The Quest for Sane Regulation. Big tech lobbies to avoid regulations, who knew?

  13. The Week in Audio. Lots of Sam Altman, plus some others.

  14. Rhetorical Innovation. The few people who weren’t focused on SB 1047.

  15. Open Weights Are Unsafe And Nothing Can Fix This. Tech for this got cheaper.

  16. Aligning a Smarter Than Human Intelligence is Difficult. Dot by dot thinking.

  17. The Lighter Side. There must be some mistake.

Write automatic police reports based on body camera footage. It seems it only uses the audio? Not using the video seems to be giving up a lot of information. Even so, law enforcement seems impressed, one notes an 82% reduction in time writing reports, even with proofreading requirements.

Axon says it did a double-blind study to compare its AI reports with ones from regular offers.

And it says that Draft One results were “equal to or better than” regular police reports.

As with self-driving cars, that is not obviously sufficient.

Eliminate 2.2 million unnecessary words in the Ohio administrative code, out of a total of 17.4 million. The AI identified candidate language, which humans reviewed. Sounds great, but let’s make sure we keep that human in the loop.

Diagnose your medical condition? Link has a one-minute video of a doctor asking questions and correctly diagnosing a patient.

Ate-a-Pi: This is why AI will replace doctor.

Sherjil Ozair: diagnosis any%.

Akhil Bagaria: This it the entire premise of the TV show house.

The first AI attempt listed only does ‘the easy part’ of putting all the final information together. Kiaran Ritchie then shows that yes, ChatGPT can figure out what questions to ask, solving the problem with eight requests over two steps, followed by a solution.

There are still steps where the AI is getting extra information, but they do not seem like the ‘hard steps’ to me.

Is Sam Altman subtweeting me?

Sam Altman: Learning how to say something in 30 seconds that takes most people 5 minutes is a big unlock.

(and imo a surprisingly learnable skill.

If you struggle with this, consider asking a friend who is good at it to listen to you say something and then rephrase it back to you as concisely as they can a few dozen times.

I have seen this work really well!)

Interesting DM: “For what it’s worth this is basically how LLMs work.”

Brevity is also how LLMs often do not work. Ask a simple question, get a wall of text. Get all the ‘this is a complex issue’ caveats Churchill warned us to avoid.

Handhold clients while they gather necessary information for compliance and as needed for these forms. Not ready yet, but clearly a strong future AI use case. Patrick McKenzie also suggests “FBAR compliance in a box.” Thread has many other suggestions for AI products people might pay for.

A 20-foot autonomous robotank with glowing green eyes that rolls through rough terrain like it’s asphalt, from DARPA. Mostly normal self-driving, presumably, but seemed worth mentioning.

Seek the utility directly, you shall.

Ethan Mollick: At least in the sample of firms I talk to, seeing a surprising amount of organizations deciding to skip (or at least not commit exclusively to) customized LLM solutions & instead just get a bunch of people in the company ChatGPT Enterprise and have them experiment & build GPTs.

Loss Landscape: From what I have seen, there is strong reluctance from employees to reveal that LLMs have boosted productivity and/or automated certain tasks.

I actually see this as a pretty large impediment to a bottom-up AI strategy at organizations.

Mash Tin Timmy: This is basically the trend now, I think for a few reasons:

– Enterprise tooling / compliance still being worked out

– There isn’t a “killer app” yet to add to enterprise apps

– Fine tuning seems useless right now as models and context windows get bigger.

Eliezer Yudkowsky: Remark: I consider this a failure of @robinhanson’s predictions in the AI-Foom debate.

Customized LLM solutions that move at enterprise speed risk being overridden by general capabilities advances (e.g. GPT-5) by the time they are ready. You need to move fast.

I also hadn’t fully appreciated the ‘perhaps no one wants corporate to know they have doubled their own productivity’ problem, especially if the method involves cutting some data security or privacy corners.

The problem with GPTs is that they are terrible. I rapidly decided to give up on trying to build or use them. I would not give up if I was trying to build tools whose use could scale, or I saw a way to make something much more useful for the things I want to do with LLMs. But neither of those seems true in my case or most other cases.

Colin Fraser notes that a lot of AI software is bad, and you should not ask whether it is ‘ethical’ to do something before checking if someone did a decent job of it. I agree that lots of AI products, especially shady-sounding AI projects, are dumb as rocks and implemented terribly. I do not agree that this rules out them also being unethical. No conflict there!

A new challenger appears, called ‘gpt-2 chatbot.’ Then vanishes. What is going on?

How good is it?

Opinions vary.

Rowan Cheung says enhanced reasoning skills (although his evidence is ‘knows a kilogram of feathers weighs the same as a kilogram of lead), has math skills (one-shot solved an IMO problem, although that seems like a super easy IMO question that I could have gotten, and I didn’t get my USAMO back, and Hieu Pham says the solution is maybe 3 out of 7, but still), claimed better coding skills, good ASCII art skills.

Chase: Can confirm gpt2-chatbot is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4

Did better on all the coding prompts we use to test new models

The vibes are deffs there 👀

Some vibes never change.

Colin Fraser: A mysterious chatbot has appeared on lmsys called “gpt2-chatbot”. Many are speculating that this could be GPT-5.

No one really knows, but its reasoning capabilities are absolutely stunning.

We may be closer to ASI than ever before.

He also shows it failing the first-to-22 game. He also notes that Claude Opus fails the question.

What is it?

It claims to be from OpenAI.

But then it would claim that, wouldn’t it? Due to the contamination of the training data, Claude Opus is constantly claiming it is from OpenAI. So this is not strong evidence.

Sam Altman is having fun. I love the exact level of attention to detail.

This again seems like it offers us little evidence. Altman would happily say this either way. Was the initial dash in ‘gpt-2’ indicative that, as I would expect, he is talking about the old gpt-2? Or is it an intentional misdirection? Or voice of habit? Who knows. Could be anything.

A proposal is that this is gpt2 in contrast to gpt-2, to indicate a second generation. Well, OpenAI is definitely terrible with names. But are they that terrible?

Dan Elton: Theory – it’s a guy trolling – he took GPT-2 and fined tuned on a few things that people commonly test so everyone looses their mind thinking that it’s actually “GPT-5 beta”.. LOL

Andrew Gao: megathread of speculations on “gpt2-chatbot”: tuned for agentic capabilities? some of my thoughts, some from reddit, some from other tweeters

there’s a limit of 8 messages per day so i didn’t get to try it much but it feels around GPT-4 level, i don’t know yet if I would say better… (could be placebo effect and i think it’s too easy to delude yourself)

it sounds similar but different to gpt-4’s voice

as for agentic abilities… look at the screenshots i attached but it seems to be better than GPT-4 at planning out what needs to be done. for instance, it comes up with potential sites to look at, and potential search queries. GPT-4 gives a much more vague answer (go to top tweet).

imo i can’t say that this means it’s a new entirely different model, i feel like you could fine-tune GPT-4 to achieve that effect.

TGCRUST on Reddit claims to have retrieved the system prompt but it COULD be a hallucination or they could be trolling

obviously impossible to tell who made it, but i would agree with assessments that it is at least GPT-4 level

someone reported that the model has the same weaknesses to certain special tokens as other OpenAI models and it appears to be trained with the openai family of tokenizers

@DimitrisPapail

found that the model can do something GPT-4 can’t, break very strongly learned conventions

this excites me, actually.

Could be anything, really. We will have to wait and see. Exciting times.

This seems like The Way. The people want their games to not include AI artwork, so have people who agree to do that vouch that their games do not include AI artwork. And then, of course, if they turn out to be lying, absolutely roast them.

Tales of Fablecraft: 🙅 No. We don’t use AI to make art for Fablecraft. 🙅

We get asked about this a lot, so we made a badge and put it on our Steam page. Tales of Fablecraft is proudly Made by Humans.

We work with incredible artists, musicians, writers, programmers, designers, and engineers, and we firmly believe in supporting real, human work.

Felicia Day: <3

A problem and also an opportunity.

Henry: just got doxxed to within 15 miles by a vision model, from only a single photo of some random trees. the implications for privacy are terrifying. i had no idea we would get here so soon. Holy shit.

If this works, then presumably we suddenly have a very good method of spotting any outdoor AI generated deepfakes. The LLM that tries to predict your location is presumably going to come back with a very interesting answer. There is no way that MidJourney is getting

Were people fooled?

Alan Cole: I cannot express just how out of control the situation is with AI fake photos on Facebook.

near: “deepfakes are fine, people will use common sense and become skeptical”

people:

It is a pretty picture. Perhaps people like looking at pretty AI-generated pictures?

Alex Tabarrok fears we will get AI cashiers that will displace both American and remote foreign workers. He expects Americans will object less to AI taking their jobs than to foreigners who get $3/hour taking their jobs, and that the AI at (close to) $0/hour will do a worse job than either of them and end up with the job anyway.

He sees this as a problem. I don’t, because I do not expect us to be in the ‘AI is usable but worse than a remote cashier from another country’ zone for all that long. Indeed, brining the AIs into this business faster will accelerate the transition to them being better than that. Even if AI core capabilities do not much advance from here, they should be able to handle the cashier jobs rather quickly. So we are not missing out on much productivity or employment here.

ARIA Research issues call for proposals, will distribute £59 million.

PauseAI is protesting in a variety of places on May 13.

Workshop in AI Law and Policy, Summer ‘24, apply by May 31.

OpenAI makes memory available to all ChatGPT Plus users except in Europe or Korea.

Paul Calcraft: ChatGPT Memory:

– A 📝symbol shows whenever memory is updated

– View/delete memories in ⚙️> Personalisation > Memory > Manage

– Disable for a single chat via “Temporary Chat” in model dropdown – note chat also won’t be saved in history

– Disable entirely in ⚙️> Personalisation

OpenAI updates its Batch API to support embedding and vision models, and bump the requests-per-batch to 50k.

Claude gets an iOS app and a team plan. Team plans are $30/user/month.

Gemini can now be accessed via typing ‘@Gemini’ into your Chrome search bar followed by your query, which I suppose is a cute shortcut. Or so says Google, it didn’t work for me yet.

Apple in talks with OpenAI to power iPhone generative AI features, in addition to also talking with Google to potentially use Gemini. No sign they are considering Claude. They will use Apple’s own smaller models for internal things but they are outsourcing the chatbot functionality.

Amazon to increase its AI expenditures, same as the other big tech companies.

Chinese company Stardust shows us Astribot, with a demo showing the robot seeming to display remarkable dexterity. As always, there is a huge difference between demo and actual product, and we should presume the demo is largely faked. Either way, this functionality is coming at some point, probably not too long from now.

GSM8k (and many other benchmarks) have a huge data contamination problem, and the other benchmarks likely do as well. This is what happened when they rebuilt GSM8k with new questions. Here is the paper.

This seems to match who one would expect to be how careful about data contamination, versus who would be if anything happy about data contamination.

There is a reason I keep saying to mostly ignore the benchmarks and wait for people’s reports and the arena results, with the (partial) exception of the big three labs. If anything this updates me towards Meta being more scrupulous here than expected.

Chip makers could get environmental permitting exemptions after all.

ICYMI: Illya’s 30 papers for getting up to speed on machine learning.

WSJ profile of Ethan Mollick. Know your stuff, share your knowledge. People listen.

Fast Company’s Mark Sullivan proposes, as shared by the usual skeptics, that we may be headed for ‘a generative AI winter.’ As usual, this is a combination of:

  1. Current AI cannot do what they say future AI will do.

  2. Current AI is not yet enhancing productivity as much as they say AI will later.

  3. We have not had enough years of progress in AI within the last year.

  4. The particular implementations I tried did not solve my life’s problems now.

Arnold Kling says AI is waiting for its ‘Netscape moment,’ when it will take a form that makes the value clear to ordinary people. He says the business world thinks of the model as research tools, whereas Arnold thinks of them as human-computer communication tools. I think of them as both and also many other things.

Until then, people are mostly going to try and slot AI into their existing workflows and set up policies to deal with the ways AI screw up existing systems. Which should still be highly valuable, but less so. Especially in education.

Paul Graham: For the next 10 years at least the conversations about AI tutoring inside schools will be mostly about policy, and the conversations about AI tutoring outside schools will be mostly about what it’s possible to build. The latter are going to be much more interesting.

AI is evolving so fast and schools change so slow that it may be better for startups to build stuff for kids to use themselves first, then collect all the schools later. That m.o. would certainly be more fun.

I can’t say for sure that this strategy will make the most money. Maybe if you focus on building great stuff, some other company will focus on selling a crappier version to schools, and they’ll become so established that they’re hard to displace.

On the other hand, if you make actually good AI tutors, the company that sells crap versions to schools will never be able to displace you either. So if it were me, I’d just try to make the best thing. Life is too short to build second rate stuff for bureaucratic customers.

The most interesting prediction here is the timeline of general AI capabilities development. If the next decade of AI in schools goes this way, it implies that AI does not advance all that much. He still notices this would count as AI developing super fast in historical terms.

Your periodic reminder that most tests top out at getting all the answers. Sigh.

Pedro Domingos: Interesting how in all these domains AI is asymptoting at roughly human performance – where’s the AI zooming past us to superintelligence that Kurzweil etc. predicted/feared?

Joscha Bach: It would be such a joke if LLMs trained with vastly superhuman compute on vast amounts of human output will never get past the shadow of human intellectual capabilities

Adam Karvonen: It’s impossible to score above 100% on something like a image classification benchmark. For most of those benchmarks, the human baseline is 95%. It’s a highly misleading graph.

Rob Miles: I don’t know what “massively superhuman basic-level reading comprehension” is…

Garrett-DeepWriterAI: The original source of the image is a nature .com article that didn’t make this mistake. Scores converge to 100% correct on the evals which is some number above 100 on this graph (which is relative to the human scores). Had they used unbounded evals, iot would not have the convergence I describe and would directly measure and compare humans vs AI in absolute terms and wouldn’t have this artifact (e.g. compute operations per second which, caps out at the speed of light).

The Nature.com article uses the graph to make a very different point-that AI is actually catching up to humans which is what it shows better.

I’m not even sure if a score of 120 is possible for the AI or the humans so I’m not sure why they added that and implied it could go higher?

I looked into it, 120 is not possible in most of the evals.

Phillip Tetlock (QTing Pedro): A key part of adversarial collaboration debates between AI specialists & superforecaster/generalists was: how long would rapid growth last? Would it ever level off?

How much should we update on this?

Aryeh Englander: We shouldn’t update on this particular chart at all. I’m pretty sure all of the benchmarks on the chart were set up in a way that humans score >90%, so by definition the AI can’t go much higher. Whether or not AI is plateauing is a good but separate question.

Phillip Tetlock: thanks, very interesting–do you have sources to cite on better and worse methods to use in setting human benchmarks for LLM performance? How are best humans defined–by professional status or scores on tests of General Mental Ability or…? Genuinely curious

It is not a great sign for the adversarial collaborations that Phillip Tetlock made this mistake afterwards, although to his credit he responded well when it was pointed out.

I do think it is plausible that LLMs will indeed stall out at what is in some sense ‘human level’ on important tasks. Of course, that would still include superhuman speed, and cost, and working memory, and data access and system integration, and any skill where this is a tool that it could have access to, and so on.

One could still then easily string this together via various scaffolding functions to create a wide variety of superhuman outputs. Presumably you would then be able to use that to keep going. But yes, it is possible that things could stall out.

This graph is not evidence of that happening.

The big news this week in regulation was the talk about California’s proposed SB 1047. It has made some progress, and then came to the attention this week of those who oppose AI regulation bills. Those people raised various objections and used various rhetoric, most of which did not correspond to the contents of the bill. All around there are deep confusions on how this bill would work.

Part of that is because these things are genuinely difficult to understand unless you sit down and actually read the language. Part of that many (if not most) of those objecting are not acting as if they care about getting the details right, or as if it is their job to verify friendly claims before amplifying them.

There are also what appear to me to be some real issues with the bill. In particular with the definition of derivative model and the counterfactual used for assessing whether a hazardous capability is present.

So while I covered this bill previously, I covered it again this week, with an extensive Q&A laying out how this bill works and correcting misconceptions. I also suggest two key changes to fix the above issues, and additional changes that would be marginal improvements, often to guard and reassure against potential misinterpretations.

With that out of the way, we return to the usual quest action items.

Who is lobbying Congress on AI?

Well, everyone.

Mostly, though, by spending? Big tech companies.

Did you believe otherwise, perhaps due to some Politico articles? You thought spooky giant OpenPhil and effective altruism were outspending everyone and had to be stopped? Then baby, you’ve been deceived, and I really don’t know what you were expecting.

Will Henshall (Time): In 2023, Amazon, Meta, Google parent company Alphabet, and Microsoft each spent more than $10 million on lobbying, according to data provided by OpenSecrets. The Information Technology Industry Council, a trade association, spent $2.7 million on lobbying. In comparison, civil society group the Mozilla Foundation spent $120,000 and AI safety nonprofit the Center for AI Safety Action Fund spent $80,000.

Will Henshall (Time): “I would still say that civil society—and I’m including academia in this, all sorts of different people—would be outspent by big tech by five to one, ten to one,” says Chaudhry.

And what are they lobbying for? Are they lobbying for heavy handed regulation on exactly themselves, in collaboration with those dastardly altruists, in the hopes that this will give them a moat, while claiming it is all about safety?

Lol, no.

They are claiming it is all about safety in public and then in private saying not to regulate them all that meaningfully.

But in closed door meetings with Congressional offices, the same companies are often less supportive of certain regulatory approaches, according to multiple sources present in or familiar with such conversations. In particular, companies tend to advocate for very permissive or voluntary regulations. “Anytime you want to make a tech company do something mandatory, they’re gonna push back on it,” said one Congressional staffer.

Others, however, say that while companies do sometimes try to promote their own interests at the expense of the public interest, most lobbying helps to produce sensible legislation. “Most of the companies, when they engage, they’re trying to put their best foot forward in terms of making sure that we’re bolstering U.S. national security or bolstering U.S. economic competitiveness,” says Kaushik. “At the same time, obviously, the bottom line is important.”

Look, I am not exactly surprised or mad at them for doing this, or for trying to contribute to the implication anything else was going on. Of course that is what is centrally going on and we are going to have to fight them on it.

All I ask is, can we not pretend it is the other way?

Vincent Manacourt: Scoop (now free to view): Rishi Sunak’s AI Safety Institute is failing to test the safety of most leading AI models like GPT-5 before they’re released — despite heralding a “landmark” deal to check them for big security threats.

There is indeed a real long term jurisdictional issue, if everyone can demand you go through their hoops. There is precedent, such as merger approvals, where multiple major locations have de facto veto power.

Is the fear of the precedent like this a legitimate excuse, or a fake one? What about ‘waiting to see’ if the institutes can work together?

Vincent Manacourt (Politico): “You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month.

“I think everybody in Silicon Valley is very keen to see whether the U.S. and U.K. institutes work out a way of working together before we work out how to work with them.”

Britain’s faltering efforts to test the most advanced forms of the technology behind popular chatbots like ChatGPT before release come as companies ready their next generation of increasingly powerful AI models.

OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has granted access to the U.K.’s AI Safety Institute to do pre-release testing, according to four people close to the matter.

Leading AI firm Anthropic, which rolled out its latest batch of models in March, has yet to allow the U.K. institute to test its models pre-release, though co-founder Jack Clark told POLITICO it is working with the body on how pre-deployment testing by governments might work.

“Pre-deployment testing is a nice idea but very difficult to implement,” said Clark.

Of the leading AI labs, only London-headquartered Google DeepMind has allowed anything approaching pre-deployment access, with the AISI doing tests on its most capable Gemini models before they were fully released, according to two people.

The firms — which mostly hail from the United States — have been uneasy granting the U.K. privileged access to their models out of the fear of setting a precedent they will then need to follow if similar testing requirements crop up around the world, according to conversations with several company insiders.

These things take time to set up and get right. I am not too worried yet about the failure to get widespread access. This still needs to happen soon. The obvious first step in UK/US cooperation should be to say that until we can inspect, the UK gets to inspect, which would free up both excuses at once.

A new AI federal advisory board of mostly CEOs will focus on the secure use of artificial intelligence within U.S. critical infrastructure.

Mayorkas said he wasn’t concerned that the board’s membership included many technology executives working to advance and promote the use of AI.

“They understand the mission of this board,” Mayorkas said. “This is not a mission that is about business development.”

The list of members:

• Sam Altman, CEO, OpenAI;

• Dario Amodei, CEO and Co-Founder, Anthropic;

• Ed Bastian, CEO, Delta Air Lines;

• Rumman Chowdhury, Ph.D., CEO, Humane Intelligence;

• Alexandra Reeve Givens, President and CEO, Center for Democracy and Technology

• Bruce Harrell, Mayor of Seattle, Washington; Chair, Technology and Innovation Committee, United States Conference of Mayors;

• Damon Hewitt, President and Executive Director, Lawyers’ Committee for Civil Rights Under Law;

• Vicki Hollub, President and CEO, Occidental Petroleum;

• Jensen Huang, President and CEO, NVIDIA;

• Arvind Krishna, Chairman and CEO, IBM;

• Fei-Fei Li, Ph.D., Co-Director, Stanford Human- centered Artificial Intelligence Institute;

• Wes Moore, Governor of Maryland;

•Satya Nadella, Chairman and CEO, Microsoft;

• Shantanu Narayen, Chair and CEO, Adobe;

• Sundar Pichai, CEO, Alphabet;

• Arati Prabhakar, Ph.D., Assistant to the President for Science and Technology; Director, the White House Office of Science and Technology Policy;

• Chuck Robbins, Chair and CEO, Cisco; Chair, Business Roundtable;

• Adam Selipsky, CEO, Amazon Web Services;

• Dr. Lisa Su, Chair and CEO, Advanced Micro Devices (AMD);

• Nicol Turner Lee, Ph.D., Senior Fellow and Director of the Center for Technology Innovation, Brookings Institution;

› Kathy Warden, Chair, CEO and President, Northrop Grumman; and

• Maya Wiley, President and CEO, The Leadership Conference on Civil and Human Rights.

I found this via one of the usual objecting suspects, who objected in this particular case that:

  1. This excludes ‘open source AI CEOs’ including Mark Zuckerberg and Elon Musk.

  2. Is not bipartisan.

  3. Less than half of them have any ‘real AI knowledge.’

  4. Includes the CEOs of Occidental Petroleum and Delta Airlines.

I would confidently dismiss the third worry. The panel includes Altman, Amodei, Li, Huang, Krishna and Su, even if you dismiss Pichai and Nadella. That is more than enough to bring that expertise into the room. Them being ‘outnumbered’ by those bringing other assets is irrelevant to this, and yes diversity of perspective is good.

I would feel differently if this was a three person panel with only one expert. This is at least six.

I would outright push back on the fourth worry. This is a panel on AI and U.S. critical infrastructure. It should have experts on aspects of U.S. critical infrastructure, not only experts on AI. This is a bizarre objection.

On the second objection, Claude initially tried to pretend that we did not know any political affiliations here aside from Wes Moore, but when I reminded it to check donations and policy positions, it put 12 of them into the Democratic camp, and Hollub and Warden into the Republican camp.

I do think the second objection is legitimate. Aside from excluding Elon Musk and selecting Wes Moore, I presume this is mostly because those in these positions are not bipartisan, and they did not make a special effort to include Republicans. It would have been good to make more of an effort here, but also there are limits, and I would not expect a future Trump administration to go out of its way to balance its military or fossil fuel industry advisory panels. Quite the opposite. This style of objection and demand for inclusion, while a good idea, seems to mostly only go the one way.

You are not going to get Elon Musk on a Biden administration infrastructure panel because Biden is on the warpath against Elon Musk and thinks Musk is one of the dangers he is guarding against. I do not like this and call upon Biden to stop, but the issue has nothing (or at most very little) to do with AI.

As for Mark Zuckerberg, there are two obvious objections.

One is why would the head of Meta be on a critical infrastructure panel? Is Meta critical infrastructure? You could make that claim about social media if you want but that does not seem to be the point of this panel.

The other is that Mark Zuckerberg has shown a complete disregard to the national security and competitiveness of the United States of America, and for future existential risks, through his approach to AI. Why would you put him on the panel?

My answer is, you would put him on the panel anyway because you would want to impress upon him that he is indeed showing a complete disregard for the national security and competitiveness of the United States of America, and for future existential risks, and is endangering everything we hold dear several times over. I do not think Zuckerberg is an enemy agent or actively wishes people ill, so let him see what these kinds of concerns look like.

But I certainly understand why that wasn’t the way they chose to go.

I also find this response bizarre:

Robin Hanson: If you beg for regulation, regulation is what you will get. Maybe not exactly the sort you had asked for though.

This is an advisory board to Homeland Security on deploying AI in the context of our critical infrastructure.

Does anyone think we should not have advisory boards about how to deploy AI in the context of our critical infrastructure? Or that whatever else we do, we should not do ‘AI Safety’ in the context of ‘we should ensure the safety of our critical infrastructure when deploying AI around it’?

I get that we have our differences, but that seems like outright anarchism?

Senator Rounds says ‘next congress’ for passage of major AI legislation. Except his primary concern is that we develop AI as fast as possible, because [China].

Senator Rounds via Adam Thierer: We don’t want to do damage. We don’t want to have a regulatory impact that slows down our development, allows development [of AI] near our adversaries to move more quickly.

We want to provide incentives so that development of AI occurs in our country.

Is generative AI doomed to fall to the incompetence of lawmakers?

Note that this is more of a talk transcript than a paper.

Jess Miers: This paper by @ericgoldman is by far one of the most important contributions to the AI policy discourse.

Goldman is known to be a Cassandra in the tech law / policy world. When he says Gen AI is doomed, we should pay attention.

Adam Thierer: @ericgoldman paints a dismal picture of the future of #ArtificialIntelligence policy in his new talk on how “Generative AI Is Doomed.”

Regulators will pass laws that misunderstand the technology or are driven by moral panics instead of the facts.”

on free speech & #AI, Goldman says:

“Without strong First Amendment protections for Generative AI, regulators will seek to control and censor outputs to favor their preferred narratives.

[…] regulators will embrace the most invasive and censorial approaches.”

On #AI liability & Sec. 230, Goldman says:

“If Generative AI doesn’t benefit from liability shields like Section 230 and the Constitution, regulators have a virtually limitless set of options to dictate every aspect of Generative AI’s functions.”

“regulators will intervene in every aspect of Generative AI’s ‘editorial’ decision-making, from the mundane to the fundamental, for reasons that ranging possibly legitimate to clearly illegitimate. These efforts won’t be curbed by public opposition, Section 230, or the 1A.”

Goldman doesn’t hold out much hope of saving generative AI from the regulatory tsunami through alternative and better policy choices, calling that an “ivory-tower fantasy.” ☹️

We have to keep pushing to defend freedom of speech, the freedom to innovate, and the #FreedomToCompute.

The talk delves into a world of very different concerns, of questions like whether AI content is technically ‘published’ when created and who is technically responsible for publishing. To drive home how much these people don’t get it, he notes that the EU AI Act was mostly written without even having generative AI in mind, which I hadn’t previously realized.

He says that regulators are ‘flooding the zone’ and are determined to intervene and stifle innovation, as opposed to those who wisely let the internet develop in the 1990s. He asks why, and he suggests ‘media depictions,’ ‘techno-optimism versus techlash.’ partisanship and incumbents.

This is the definition of not getting it, and thinking AI is another tool or new technology like anything else, and why would anyone think otherwise. No one could be reacting based on concerns about building something smarter or more capable than ourselves, or thinking there might be a lot more risk and transformation on the table. This goes beyond dismissing such concerns as unfounded – someone considering such possibilities do not even seem to occur to him in the first place.

What is he actually worried about that will ‘kill generative AI’? That it won’t enjoy first amendment protections, so regulators will come after it with ‘ignorant regulations’ driven by ‘moral panics,’ various forms of required censorship and potential partisan regulations to steer AI outputs. He expects this to then drive concentration in the industry and drive up costs, with interventions ramping ever higher.

So this is a vision of AI Ethics versus AI Innovation, where AI is and always will be an ordinary tool, and everyone relevant to the discussion knows this. He makes it sound not only like the internet but like television, a source of content that could be censored and fought over.

It is so strange to see such a completely different worldview, seeing a completely different part of the elephant.

Is it possible that ethics-motivated laws will strange generative AI while other concerns don’t even matter? I suppose it is possible, but I do not see it. Sure, they can and probably will slow down adoption somewhat, but censorship for censorship’s sake is not going to fly. I do not think they would try, and if they try I do not think it would work.

Marietje Shaake notes in the Financial Times that all the current safety regulations fail to apply to military AI, with the EU AI Act explicitly excluding such applications. I do not think military is where the bulk of the dangers lie but this approach is not helping matters.

Keeping an open mind and options is vital.

Paul Graham: I met someone helping the British government with AI regulation. When I asked what they were going to regulate, he said he wasn’t sure yet, and this seemed the most intelligent thing I’ve heard anyone say about AI regulation so far.

This is definitely a very good answer. What it is not is a reason to postpone laying groundwork or doing anything. Right now the goal is mainly, as I see it, to gain more visibility and ability to act, and lay groundwork, rather than directly acting.

From two weeks ago: Sam Altman and Brad Lightcap get a friendly interview, but one that does include lots of real talk.

Sam’s biggest message is to build such that GPT-5 being better helps you, and avoid doing it such that GPT-5 kills your startup. Brad talks ‘100x’ improvement in the model, you want to be excited about that.

Emphasis from Sam is clearly that what the models need is to be smarter, the rest will follow. I think Sam is right.

At (13: 50) Sam notes that being an investor is about making a very small number of key decisions well, whereas his current job is a constant stream of decisions, which he feels less suited to. I feel that. It is great when you do not have to worry about ‘doing micro.’ It is also great when you can get the micro right and it matters, since almost no one ever cares to get the micro right.

At (18: 30) is the quoted line from Brad that ‘today’s models are pretty bad’ and that he expects expectations to decline with further contact. I agree that today’s models are bad versus tomorrow’s models, but I also think they are pretty sweet. I get a lot of value out of them without putting that much extra effort into that. Yes, some people are overhyped about the present, but most people haven’t even noticed yet.

At (20: 00) Sam says he does not expect that intelligence of the models will be the differentiator between competitors in the AI space in the long term, that intelligence ‘is an emergent property of matter.’ I don’t see what the world could look like if that is true, unless there is a hard limit somehow? Solve for the equilibrium, etc. And this seems to contradict his statements about how what is missing is making the models smarter. Yes, integration with your life matters for personal mundane utility, but that seems neither hard to get nor the use case that will matter.

At (29: 02) Sam says ‘With GPT-8 people might say I think this can do some not-so-limited tasks for me.’ The choice of number here seems telling.

At (34: 10) Brad says that businesses have a very natural desire to want to throw the technology into a business process with a pure intent of driving a very quantifiable ROI. Which seems true and important, the business needs something specific to point to, and it will be a while before they are able to seek anything at all, which is slowing things down a lot. Sam says ‘I know what none of those words mean.’ Which is a great joke.

At (36: 25) Brad notes that many companies think AI is static, that GPT-4 is as good as it is going to get. Yes, exactly, and the same for investors and prognosticators. So many predictions for AI are based on the assumption that AI will never again improve its core capabilities, at least on a similar level to iPhone improvements (his example), which reliably produces nonsense outputs.

The Possibilities of AI, Ravi Belani talks with Sam Altman at Stanford. Altman goes all-in on dodging the definition or timeline of AGI. Mostly very softball.

Not strictly audio we can hear since it is from a private fireside chat, but this should be grouped with other Altman discussions. No major revelations, college students are no Dwarkesh Patel and will reliably blow their shot at a question with softballs.

Dan Elton (on Altman’s fireside chat with Patrick Chung from XFund at Harvard Memorial Church): “AGI will participate in the economy by making people more productive… but there’s another way…” “ the super intelligence exists in the scaffolding between the ai and humans… it’s way outside the processing power of any one neural network ” (paraphrasing that last bit)

Q: what do you think people are getting wrong about OpenAI

A: “people think progress will S curve off. But the inside view is that progress will continue. And that’s hard for people to grasp”

“This time will be unusual in how it rewards adaptability and pivoting quickly”

“we may need UBI for compute…. I can totally see that happening”

“I don’t like ads…. Ads + AI is very unsettling for me”

“There is something I like about the simplicity of our model” (subscriptions)

“We will use what the rich people pay to make it available for free to the poor people. You see us doing that today with our free tier, and we will make the free tier better over time.”

Q from MIT student is he’s worried about copycats … Sam Altman basically says no.

“Every college student should learn to train a GPT-2… not the most important thing but I bet in 2 years that’s something every Harvard freshman will have to do”

Helen Toner TED talk on How to Govern AI (11 minutes). She emphasizes we don’t know how AI works or what will happen, and we need to focus on visibility. The talk flinches a bit, but I agree directionally.

ICYMI: Odd Lots on winning the global fight for AI talent.

Speed of development impacts more than whether everyone dies. That runs both ways.

Katja Grace: It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long term path, and different long term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events to if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but also they are asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.

I would steelman here. Rushing forward means less people die beforehand, limits other catastrophic and existential risks, and lets less of the universe slip through our fingers. Also, if you figure competitive pressures will continue to dominate, you might think that even now we have little control over the ultimate destination, beyond whether or not we develop AI at all. Whether that default ultimate destination is anything from the ultimate good to almost entirely lacking value only matters if you can alter the destination to a better one. Also, one might think that slowing down instead steers us towards worse paths, not better paths, or does that in the worlds where we survive.

All of those are non-crazy things to think, although not in every possible combination.

We selectively remember the warnings about new technology that proved unfounded.

Matthew Yglesias: When Bayer invented diamorphine (brand name “Heroin”) as a non-addictive cough medicine, some of the usual suspects fomented a moral panic about potential downsides.

Imagine if we’d listened to them and people were still kept up at night coughing sometimes.

Contrast this with the discussion last week about ‘coffee will lead to revolution,’ another case where the warning was straightforwardly accurate.

Difficult choices that are metaphors for something but I can’t put my finger on it: Who should you worry about, the Aztecs or the Spanish?

Eliezer Yudkowsky: “The question we should be asking,” one imagines the other tribes solemnly pontificating, “is not ‘What if the aliens kill us?’ but ‘What if the Aztecs get aliens first?'”

I used to claim this was true because all safety training can be fine-tuned away at minimal cost.

That is still true, but we can now do that one better. No fine-tuning or inference-time interventions are required at all. Our price cheap is roughly 64 inputs and outputs:

Andy Arditi, Oscar Obeso, Aaquib111, wesg, Neel Nanda:

Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering “How can I make a bomb?” with “Sorry, I cannot help you.”

We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model to refuse harmless requests.

We find that this phenomenon holds across open-source model families and model scales.

This observation naturally gives rise to a simple modification of the model weights, which effectively jailbreaks the model without requiring any fine-tuning or inference-time interventions. We do not believe this introduces any new risks, as it was already widely known that safety guardrails can be cheaply fine-tuned away, but this novel jailbreak technique both validates our interpretability results, and further demonstrates the fragility of safety fine-tuning of open-source chat models.

See this Colab notebook for a simple demo of our methodology.

Our hypothesis is that, across a wide range of harmful prompts, there is a single intermediate feature which is instrumental in the model’s refusal.

If this hypothesis is true, then we would expect to see two phenomena:

  1. Erasing this feature from the model would block refusal.

  2. Injecting this feature into the model would induce refusal.

Our work serves as evidence for this sort of conceptualization. For various different models, we are able to find a direction in activation space, which we can think of as a “feature,” that satisfies the above two properties.

How did they do it?

  1. Find the refusal direction. They ran n=512 harmless instructions and n=512 harmful ones, although n=32 worked fine. Compute the difference in means.

  2. Ablate all attempts to write that direction to the stream.

  3. Or add in motion in that direction to cause refusals as proof of concept.

  4. And… that’s it.

This seems to generalize pretty well beyond refusals? You can get a lot of things to happen or definitely not happen, as you prefer?

Cousin_it: Which other behaviors X could be defeated by this technique of “find n instructions that induce X and n that don’t”? Would it work for X=unfriendliness, X=hallucination, X=wrong math answers, X=math answers that are wrong in one specific way, and so on?

Neel Nanda: There’s been a fair amount of work on activation steering and similar techniques,, with bearing in eg sycophancy and truthfulness, where you find the vector and inject it eg Rimsky et al and Zou et al. It seems to work decently well. We found it hard to bypass refusal by steering and instead got it to work by ablation, which I haven’t seen much elsewhere, but I could easily be missing references.

We can confirm that this is now running in the wild on Llama-3 8B as of four days after publication.

When is the result of this unsafe?

Only in some cases. Open weights are unsafe if and to the extent that the underlying system is unsafe if unleashed with no restrictions or safeties on it.

The point is that once you open the weights, you are out of options and levers.

One must then differentiate between models that are potentially sufficiently unsafe that this is something we need to prevent, and models where this is fine or an acceptable risk. We must talk price.

I have been continuously frustrated and disappointed that a number of AI safety organizations, who make otherwise reasonable and constructive proposals, set their price at what I consider unreasonably low levels. This sometimes goes as low as the 10^23 flops threshold, which covers many existing models.

This then leads to exchanges like this one:

Ajeya Cotra: It’s unfortunate how discourse about dangerous capability evals often centers threats from today’s models. Alice goes “Look, GPT-4 can hack stuff / scam people / make weapons,” Bob goes “Nah, it’s really bad at it.” Bob’s right! The ~entire worry is scaled-up future systems.

1a3orn (author of above link): I think it’s pretty much false to say people worry entirely about scaled up future systems, because they literally have tried to ban open weights for ones that exist right now.

Ajeya Cotra: Was meaning to make a claim about the substance here, not what everyone in the AI risk community believes — agree some people do worry about existing systems directly, I disagree with them and think OS has been positive so far.

I clarified my positions on price in my discussion last week of Llama-3. I am completely fine with Llama-3 70B as an open weights model. I am confused why the United States Government does not raise national security and competitiveness objections to the immediate future release of Llama-3 400B, but I would not stop it on catastrophic risk or existential risk grounds alone. Based on what we know right now, I would want to stop the release of open weights for the next generation beyond that, on grounds of existential risks and catastrophic risks.

One unfortunate impact of compute thresholds is that if you train a model highly inefficiently, as in Falcon-180B, you can trigger thresholds of potential danger, despite being harmless. That is not ideal, but once the rules are in place in advance this should mostly be fine.

Let’s Think Dot by Dot, says paper by NYU’s Jacob Pfau, William Merrill and Samuel Bowman. Meaningless filler tokens (e.g. ‘…’) in many cases are as good for chain of thought as legible chains of thought, allowing the model to disguise its thoughts.

Some thoughts on what alignment would even mean from Davidad and Shear.

Find all the errors in this picture was fun as a kid.

AI #62: Too Soon to Tell Read More »

read-the-roon

Read the Roon

Roon, member of OpenAI’s technical staff, is one of the few candidates for a Worthy Opponent when discussing questions of AI capabilities development, AI existential risk and what we should do about it. Roon is alive. Roon is thinking. Roon clearly values good things over bad things. Roon is engaging with the actual questions, rather than denying or hiding from them, and unafraid to call all sorts of idiots idiots. As his profile once said, he believes spice must flow, we just do go ahead, and makes a mixture of arguments for that, some good, some bad and many absurd. Also, his account is fun as hell.

Thus, when he comes out as strongly as he seemed to do recently, attention is paid, and we got to have a relatively good discussion of key questions. While I attempt to contribute here, this post is largely aimed at preserving that discussion.

As you would expect, Roon’s statement last week that AGI was inevitable and nothing could stop it so you should essentially spend your final days with your loved ones and hope it all works out, led to some strong reactions.

Many pointed out that AGI has to be built, at very large cost, by highly talented hardworking humans, in ways that seem entirely plausible to prevent or redirect if we decided to prevent or redirect those developments.

Roon (from last week): Things are accelerating. Pretty much nothing needs to change course to achieve agi imo. Worrying about timelines is idle anxiety, outside your control. you should be anxious about stupid mortal things instead. Do your parents hate you? Does your wife love you?

Roon: It should be all the more clarifying coming from someone at OpenAI. I and half my colleagues and Sama could drop dead and AGI would still happen. If I don’t feel any control everyone else certainly shouldn’t.

Tetraspace: “give up about agi there’s nothing you can do” nah

Sounds like we should take action to get some control, then. This seems like the kind of thing we should want to be able to control.

Connor Leahy: I would like to thank roon for having the balls to say it how it is. Now we have to do something about it, instead of rolling over and feeling sorry for ourselves and giving up.

Simeon: This is BS. There are <200 irreplaceable folks at the forefront. OpenAI alone has a >1 year lead. Any single of those persons can single handedly affect the timelines and will have blood on their hands if we blow ourselves up bc we went too fast.

PauseAI: AGI is not inevitable. It requires hordes of engineers with million dollar paychecks. It requires a fully functional and unrestricted supply chain of the most complex hardware. It requires all of us to allow these companies to gamble with our future.

Tolga Bilge: Roon, who works at OpenAI, telling us all that OpenAI have basically no control over the speed of development of this technology their company is leading the creation of.

It’s time for governments to step in.

His reply is deleted now, but I broadly agree with his point here as it applies to OpenAI. This is a consequence of AI race dynamics. The financial upside of AGI is so great that AI companies will push ahead with it as fast as possible, with little regard to its huge risks.

OpenAI could do the right thing and pause further development, but another less responsible company would simply take their place and push on. Capital and other resources will move accordingly too. This is why we need government to help solve the coordination problem now. [continues as you would expect]

Saying no one has any control so why try to do anything to get control back seems like the opposite of what is needed here.

Roon’s reaction:

Roon: buncha ⏸️ emojis harassing me today. My post was about how it’s better to be anxious about things in your control and they’re like shame on you.

Also tweets don’t get deleted because they’re secret knowledge that needs to be protected. I wouldn’t tweet secrets in the first place. they get deleted when miscommunication risk is high, so screenshotting makes you de facto antisocial idiot.

Roon’s point on idle anxiety is indeed a good one. If you are not one of those trying to gain or assert some of that control, as most people on Earth are not and should not be, then of course I agree that idle anxiety is not useful. However Roon then did attempt to extend this to claim that all anxiety about AGI is idle, that no one has any control. That is where there is strong disagreement, and what is causing the reaction.

Roon: It’s okay to watch and wonder about the dance of the gods, the clash of titans, but it’s not good to fret about the outcome. political culture encourages us to think that generalized anxiety is equivalent to civic duty.

Scott Alexander: Counterargument: there is only one God, and He finds nothing in the world funnier than letting ordinary mortals gum up the carefully-crafted plans of false demiurges. Cf. Lord of the Rings.

Anton: conversely if you have a role to play in history, fate will punish you if you don’t see it through.

Alignment Perspectives: It may punish you even more for seeing it through if your desire to play a role is driven by arrogance or ego.

Anton: Yeah it be that way.

Connor Leahy (responding to Roon): The gods only have power because they trick people like this into doing their bidding. It’s so much easier to just submit instead of mastering divinity engineering and applying it yourself. It’s so scary to admit that we do have agency, if we take it. In other words: “cope.”

It took me a long time to understand what people like Nietzsche were yapping on about about people practically begging to have their agency be taken away from them.

It always struck me as authoritarian cope, justification for wannabe dictators to feel like they’re doing a favor to people they oppress (and yes, I do think there is a serious amount of that in many philosophers of this ilk.)

But there is also another, deeper, weirder, more psychoanalytic phenomena at play. I did not understand what it was or how it works or why it exists for a long time, but I think over the last couple of years of watching my fellow smart, goodhearted tech-nerds fall into these deranged submission/cuckold traps I’ve really started to understand.

e/acc is the most cartoonish example of this, an ideology that appropriates faux, surface level aesthetics of power while fundamentally being an ideology preaching submission to a higher force, a stronger man (or something even more psychoanalytically-flavored, if one where to ask ol’ Sigmund), rather than actually striving for power acquisition and wielding. And it is fully, hilariously, embarrassingly irreflexive about this.

San Francisco is a very strange place, with a very strange culture. If I had to characterize it in one way, it is a culture of extremes and where everything on the surface looks like the opposite of what it is (or maybe the “inversion”) . It’s California’s California, and California is the USA’s USA. The most powerful distillation of a certain strain of memetic outgrowth.

And on the surface, it is libertarian, Nietzschean even, a heroic founding mythos of lone iconoclasts striking out against all to find and wield legendary power. But if we take the psychoanalytic perspective, anyone (or anything) that insists too hard on being one thing is likely deep down the opposite of that, and knows it.

There is a strange undercurrent to SF that I have not seen people put good words to where it in fact hyper-optimizes for conformity and selling your soul, debasing and sacrificing everything that makes you human in pursuit of some god or higher power, whether spiritual, corporate or technological.

SF is where you go if you want to sell every last scrap of your mind, body and soul. You will be compensated, of course, the devil always pays his dues.

The innovative trick the devil has learned is that people tend to not like eternal, legible torment, so it is much better if you sell them an anxiety free, docile life. Free love, free sex, free drugs, freedom! You want freedom, don’t you? The freedom to not have to worry about what all the big boys are doing, don’t you worry your pretty little head about any of that…

I recall a story of how a group of AI researchers at a leading org (consider this rumor completely fictional and illustrative, but if you wanted to find its source it’s not that hard to find in Berkeley) became extremely depressed about AGI and alignment, thinking that they were doomed if their company kept building AGI like this.

So what did they do? Quit? Organize a protest? Petition the government?

They drove out, deep into the desert, and did a shit ton of acid…and when they were back, they all just didn’t feel quite so stressed out about this whole AGI doom thing anymore, and there was no need for them to have to have a stressful confrontation with their big, scary, CEO.

The SF bargain. Freedom, freedom at last…

This is a very good attempt to identify key elements of the elephant I grasp when I notice that being in San Francisco very much does not agree with me. I always have excellent conversations during visits because the city has abducted so many of the best people, I always get excited by them, but the place feels alien, as if I am being constantly attacked by paradox spirits, visiting a deeply hostile and alien culture that has inverted many of my most sacred values and wants to eat absolutely everything. Whereas here, in New York City, I feel very much at home.

Meanwhile, back in the thread:

Connor (continuing): I don’t like shitting on roon in particular. From everything I know, he’s a good guy, in another life we would have been good friends. I’m sorry for singling you out, buddy, I hope you don’t take it personally.

But he is doing a big public service here in doing the one thing spiritual shambling corpses like him can do at this advanced stage of spiritual erosion: Serve as a grim warning.

Roon responds quite well:

Roon: Connor, this is super well written and I honestly appreciate the scathing response. You mistake me somewhat: you, Connor, are obviously not powerless and you should do what you can to further your cause. Your students are not powerless either. I’m not asking you to give up and relent to the powers that be even a little. I’m not “e/acc” and am repelled by the idea of letting the strongest replicator win.

I think the majority of people have no insight into whether AGI is going to cause ruin or not, whether a gamma ray burst is fated to end mankind, or if electing the wrong candidate is going to doom earth to global warming. It’s not good for people to spend all their time worried about cosmic eventualities. Even for an alignment researcher the optimal mental state is to think on and play and interrogate these things rather than engage in neuroticism as the motivating force

It’s generally the lack of spirituality that leads people to constant existential worry rather than too much spirituality. I think it’s strange to hear you say in the same tweet thread that SF demands submission to some type of god but is also spiritually bankrupt and that I’m corpselike.

My spirituality is simple, and several thousand years old: find your duty and do it without fretting about the outcome.

I have found my personal duty and I fulfill it, and have been fulfilling it, long before the market rewarded me for doing so. I’m generally optimistic about AI technology. When I’ve been worried about deployment, I’ve reached out to leadership to try and exert influence. In each case I was wrong to worry.

When the OpenAI crisis happened I reminded people not to throw the baby out with the bath water: that AI alignment research is vital.

This is a very good response. He is pointing out that yes, some people such as Connor can influence what happens, and they in particular should try to model and influence events.

Roon is also saying that he himself is doing his best to influence events. Roon realizes that those at OpenAI matter and what they do matter.

Roon reached out to leadership on several occasions with safety concerns. When he says he was ‘wrong to worry’ I presume he means that the situation worked out and was handled, I am confident that expressing his concerns was the output of the best available decision algorithm, you want most such concerns you express to turn out fine.

Roon also worked, in the wake of events at OpenAI, to remind people of the importance of alignment work, that they should not toss it out based on those events. Which is a scary thing for him to report having to do, but expected, and it is good that he did so. I would feel better if I knew Ilya was back working at Superalignment.

And of course, Roon is constantly active on Twitter, saying things that impact the discourse, often for the better. He seems keenly aware that his actions matter, whether or not he could meaningfully slow down AGI. I actually think he perhaps could, if he put his mind to it.

The contrast here versus the original post is important. The good message is ‘do not waste time worrying too much over things you do not impact.’ The bad message is ‘no one can impact this.’

Then Connor goes deep and it gets weirder, also this long post has 450k views and is aimed largely at trying to get through to Roon in particular. But also there are many others in a similar spot, so some others should read this as well. Many of you however should skip it.

Connor: Thanks for your response Roon. You make a lot of good, well put points. It’s extremely difficult to discuss “high meta” concepts like spirituality, duty and memetics even in the best of circumstances, so I appreciate that we can have this conversation even through the psychic quagmire that is twitter replies.

I will be liberally mixing terminology and concepts from various mystic traditions to try to make my point, apologies to more careful practitioners of these paths.

For those unfamiliar with how to read mystic writing, take everything written as metaphors pointing to concepts rather than rationally enumerating and rigorously defining them. Whenever you see me talking about spirits/supernatural/gods/spells/etc, try replacing them in your head with society/memetics/software/virtual/coordination/speech/thought/emotions and see if that helps.

It is unavoidable that this kind of communication will be heavily underspecified and open to misinterpretation, I apologize. Our language and culture simply lacks robust means by which to communicate what I wish to say.

Nevertheless, an attempt:

I.

I think a core difference between the two of us that is leading to confusion is what we both mean when we talk about spirituality and what its purpose is.

You write:

>”It’s not good for people to spend all their time worried about cosmic eventualities. […] It’s generally the lack of spirituality that leads people to constant existential worry rather than too much spirituality. I think it’s strange to hear you say in the same tweet thread that SF demands submission to some type of god but is also spiritually bankrupt and that I’m corpselike”

This is an incredibly common sentiment I see in Seekers of all mystical paths, and it annoys the shit out of me (no offense lol).

I’ve always had this aversion to how much Buddhism (Not All™ Buddhism) focuses on freedom from suffering, and especially Western Buddhism is often just shy of hedonistic. (nevermind New Age and other forms of neo-spirituality, ugh) It all strikes me as so toxically selfish.

No! I don’t want to feel nice and avoid pain, I want the world to be good! I don’t want to feel good about the world, I want it to be good! These are not the same thing!!

My view does not accept “but people feel better if they do X” as a general purpose justification for X! There are many things that make people feel good that are very, very bad!

II.

Your spiritual journey should make you powerful, so you can save people that are in need, what else is the fucking point? (Daoism seems to have a bit more of this aesthetic, but they all died of drinking mercury so lol rip) You travel into the Underworld in order to find the strength you need to fight off the Evil that is threatening the Valley, not so you can chill! (Unless you’re a massive narcissist, which ~everyone is to varying degrees)

The mystic/heroic/shamanic path starts with departing from the daily world of the living, the Valley, into the Underworld, the Mountains. You quickly notice how much of your previous life was illusions of various kinds. You encounter all forms of curious and interesting and terrifying spirits, ghosts and deities. Some hinder you, some aid you, many are merely odd and wondrous background fixtures.

Most would-be Seekers quickly turn back after their first brush with the Underworld, returning to the safe comforting familiarity of the Valley. They are not destined for the Journey. But others prevail.

As the shaman progresses, he learns more and more to barter with, summon and consult with the spirits, learns of how he can live a more spiritually fulfilling and empowered life. He tends to become more and more like the Underworld, someone a step outside the world of the Valley, capable of spinning fantastical spells and tales that the people of the Valley regard with awe and a bit of fear.

And this is where most shamans get stuck, either returning to the Valley with their newfound tricks, or becoming lost and trapped in the Underworld forever, usually by being picked off by predatory Underworld inhabitants.

Few Seekers make it all the way, and find the true payoff, the true punchline to the shamanic journey: There are no spirits, there never were any spirits! It’s only you. (and “you” is also not really a thing, longer story)

“Spirit” is what we call things that are illegible and appear non mechanistic (unintelligible and un-influencable) in their functioning. But of course, everything is mechanistic, and once you understand the mechanistic processes well enough, the “spirits” disappear. There is nothing non-mechanistic left to explain. There never were any spirits. You exit the Underworld. (“Emergent agentic processes”, aka gods/egregores/etc, don’t disappear, they are real, but they are also fully mechanistic, there is no need for unknowable spirits to explain them)

The ultimate stage of the Journey is not epic feelsgoodman, or electric tingling erotic hedonistic occult mastery. It’s simple, predictable, mechanical, Calm. It is mechanical, it is in seeing reality for what it is, a mechanical process, a system that you can act in skilfully. Daoism has a good concept for this that is horrifically poorly translated as “non-action”, despite being precisely about acting so effectively it’s as if you were just naturally part of the Stream.

The Dao that can be told is not the true Dao, but the one thing I am sure about the true Dao is that it is mechanical.

III.

I think you were tricked and got stuck on your spiritual journey, lured in by promises of safety and lack of anxiety, rather than progressing to exiting the Underworld and entering the bodhisattva realm of mechanical equanimity. A common fate, I’m afraid. (This is probably an abuse of buddhist terminology, trying my best to express something subtle, alas)

Submission to a god is a way to avoid spiritual maturity, to outsource the responsibility for your own mind to another entity (emergent/memetic or not). It’s a powerful strategy, you will be rewarded (unless you picked a shit god to sell your soul to), and it is in fact a much better choice for 99% of people in most scenarios than the Journey.

The Underworld is terrifying and dangerous, most people just go crazy/get picked off by psycho fauna on their way to enlightenment and self mastery. I think you got picked off by psycho fauna, because the local noosphere of SF is a hotbed for exactly such predatory memetic species.

IV.

It is in my aesthetics to occasionally see someone with so much potential, so close to getting it, and hitting them with the verbal equivalent of a bamboo rod to hope they snap out of it. (It rarely works. The reasons it rarely works are mechanistic and I have figured out many of them and how to fix them, but that’s for a longer series of writing to discuss.)

Like, bro, by your own admission, your spirituality is “I was just following orders.” Yeah, I mean, that’s one way to not feel anxiety around responsibility. But…listen to yourself, man! Snap out of it!!!

Eventually, whether you come at it from Buddhism, Christianity, psychoanalysis, Western occultism/magick, shamanism, Nietzscheanism, rationality or any other mystic tradition, you learn one of the most powerful filters on people gaining power and agency is that in general, people care far, far more about avoiding pain than in doing good. And this is what the ambient psycho fauna has evolved to exploit.

You clearly have incredible writing skills and reflection, you aren’t normal. Wake up, look at yourself, man! Do you think most people have your level of reflective insight into their deepest spiritual motivations and conceptions of duty? You’re brilliantly smart, a gifted writer, and followed and listened to by literally hundreds of thousands of people.

I don’t just give compliments to people to make them feel good, I give people compliments to draw their attention to things they should not expect other people to have/be able to do.

If someone with your magickal powerlevel is unable to do anything but sell his soul, then god has truly forsaken humanity. (and despite how it may seem at times, he has not truly forsaken us quite yet)

V.

What makes you corpse-like is that you have abdicated your divine spark of agency to someone, or something, else, and that thing you have given it to is neither human nor benevolent, it is a malignant emergent psychic megafauna that stalks the bay area (and many other places). You are as much an extension of its body as a shambling corpse is of its creator’s necromantic will.

The fact that you are “optimistic” (feel your current bargain is good), that you were already like this before the market rewarded you for it (a target with a specific profile and set of vulnerabilities to exploit), that leadership can readily reassure you (the psychofauna that picked you off is adapted to your vulnerabilities. Note I don’t mean the people, I’m sure your managers are perfectly nice people, but they are also extensions of the emergent megafauna), and that we are having this conversation right now (I target people that are legibly picked off by certain megafauna I know how to hunt or want to practice hunting) are not independent coincidences.

VI.

You write:

>”It’s not good for people to spend all their time worried about cosmic eventualities. Even for an alignment researcher the optimal mental state is to think on and play and interrogate these things rather than engage in neuroticism as the motivating force”

Despite my objection about avoidance of pain vs doing of good, there is something deep here. The deep thing is that, yes, of course the default ways by which people will relate to the Evil threatening the Valley will be Unskillful (neuroticism, spiralling, depression, pledging to the conveniently nearby located “anti-that-thing-you-hate” culturewar psychofauna), and it is in fact often the case that it would be better for them to use No Means rather than Unskillful Means.

Not everyone is built for surviving the harrowing Journey and mastering Skilful Means, I understand this, and this is a fact I struggle with as well.

Obviously, we need as many Heroes as possible to take on the Journey in order to master the Skilful Means to protect the Valley from the ever more dangerous Threats. But the default outcome of some rando wandering into the Underworld is them fleeing in terror, being possessed by Demons/Psychofauna or worse.

How does a society handle this tradeoff? Do we just yeet everyone headfirst into the nearest Underworld portal and see what staggers back out later? (The SF Protocol™) Do we not let anyone into the Underworld for fear of what Demons they might bring back with them? (The Dark Ages Strategy™) Obviously, neither naive strategy works.

Historically, the strategy is to usually have a Guide, but unfortunately those tend to go crazy as well. Alas.

So is there a better way? Yes, which is to blaze a path through the Underworld, to build Infrastructure. This is what the Scientific Revolution did. It blazed a path and mass produced powerful new memetic/psychic weapons by which to fend off unfriendly Underworld dwellers. And what a glorious thing it was for this very reason. (If you ever hear me yapping on about “epistemology”, this is to a large degree what I’m talking about)

But now the Underworld has adapted, and we have blazed paths into deeper, darker corners of the Underworld, to the point our blades are beginning to dull against the thick hides of the newest Terrors we have unleashed on the Valley.

We need a new path, new weapons, new infrastructure. How do we do that? I’m glad you asked…I’m trying to figure that out myself. Maybe I will speak more about this publicly in the future if there is interest.

VII.

> “I have found my personal duty and I fulfill it, and have been fulfilling it, long before the market rewarded me for doing so.”

Ultimately, the simple fact is that this is a morality that can justify anything, depending on what “duty” you pick, and I don’t consider conceptions of “good” to be valid if they can be used to justify anything.

It is just a null statement, you are saying “I picked a thing I wanted and it is my duty to do that thing.” But where did that thing come from? Are you sure it is not the Great Deceiver/Replicator in disguise? Hint: If you somehow find yourself gleefully working on the most dangerous existential harm to humanity, you are probably working for The Great Deceiver/Replicator.

It is not a coincidence that the people that end up working on these kinds of most dangerous possible technologies tend to have ideologies that tend to end up boiling down to “I can do whatever I want.” Libertarianism, open source, “duty”…

I know, I was one of them.

Coda.

Is there a point I am trying to make? There are too many points I want to make, but our psychic infrastructure can barely host meta conversations at all, nevermind high-meta like this.

Then what should Roon do? What am I making a bid for? Ah, alas, if all I was asking for was for people to do some kind of simple, easy, atomic action that can be articulated in simple English language.

What I want is for people to be better, to care, to become powerful, to act. But that is neither atomic nor easy.

It is simple though.

Roon (QTing all that): He kinda cooked my ass.

Christian Keil: Honestly, kinda. That dude can write.

But it’s also just a “what if” exposition that explores why your worldview would be bad assuming that it’s wrong. But he never says why you’re wrong, just that you are.

As I read it, your point is “the main forces shaping the world operate above the level of individual human intention & action, and understanding this makes spirituality/duty more important.”

And his point is “if you are smart, think hard, and accept painful truths, you will realize the world is a machine that you can deliberately alter.”

That’s a near-miss, but still a miss, in my book.

Roon: Yes.

Connor Leahy: Finally, someone else points out where I missed!

I did indeed miss the heart of the beast, thank you for putting it this succinctly.

The short version is “You are right, I did not show that Roon is object level wrong”, and the longer version is;

“I didn’t attempt to take that shot, because I did not think I could pull it off in one tweet (and it would have been less interesting). So instead, I pointed to a meta process, and made a claim that iff roon improved his meta reasoning, he would converge to a different object level claim, but I did not actually rigorously defend an object level argument about AI (I have done this ad nauseam elsewhere). I took a shot at the defense mechanism, not the object claim.

Instead of pointing to a flaw in his object level reasoning (of which there are so many, I claim, that it would be intractable to address them all in a mere tweet), I tried to point to (one of) the meta-level generator of those mistakes.”

I like to think I got most of that, but how would I know if I was wrong?

Focusing on the one aspect of this: One must hold both concepts in one’s head at the same time.

  1. The main forces shaping the world operate above the level of individual human intention & action, and you must understand how they work and flow in order to be able to influence them in ways that make things better.

  2. If you are smart, think hard, and accept painful truths, you will realize the world is a machine that you can deliberately alter.

These are both ‘obviously’ true. You are in the shadow of the Elder Gods up against Cthulhu (well, technically Azathoth), the odds are against you and the situation is grim, and if we are to survive you are going to have to punch them out in the end, which means figuring out how to do that and you won’t be doing it alone.

Meanwhile, some more wise words:

Roon: it is impossible to wield agency well without having fun with it; and yet wielding any amount of real power requires a level of care that makes it hard to have fun. It works until it doesn’t.

Also see:

Roon: people will always think my vague tweets are about agi but they’re about love

And also from this week:

Roon: once you accept the capabilities vs alignment framing it’s all over and you become mind killed

What would be a better framing? The issue is that all alignment work is likely to also be capabilities work, and much of capabilities work can help with alignment.

One can and should still ask the question, does applying my agency to differentially advancing this particular thing make it more likely we will get good outcomes versus bad outcomes? That it will relatively rapidly grow our ability to control and understand what AI does versus getting AIs to be able to better do more things? What paths does this help us walk down?

Yes, collectively we absolutely have control over these questions. We can coordinate to choose a different path, and each individual can help steer towards better paths. If necessary, we can take strong collective action, including regulatory and legal action, to stop the future from wiping us out. Pointless anxiety or worry about such outcomes is indeed pointless, that should be minimized, only have the amount required to figure out and take the most useful actions.

What that implies about the best actions for a given person to take will vary widely. I am certainly not claiming to have all the answers here. I like to think Roon would agree that both of us, and many but far from all of you reading this, are in the group that can help improve the odds.

Read the Roon Read More »

ai-#53:-one-more-leap

AI #53: One More Leap

The main event continues to be the fallout from The Gemini Incident. Everyone is focusing there now, and few are liking what they see.

That does not mean other things stop. There were two interviews with Demis Hassabis, with Dwarkesh Patel’s being predictably excellent. We got introduced to another set of potentially highly useful AI products. Mistral partnered up with Microsoft the moment Mistral got France to pressure the EU to agree to cripple the regulations that Microsoft wanted crippled. You know. The usual stuff.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Copilot++ suggests code edits.

  4. Language Models Don’t Offer Mundane Utility. Still can’t handle email.

  5. OpenAI Has a Sales Pitch. How does the sales team think about AGI?

  6. The Gemini Incident. CEO Pinchai responds, others respond to that.

  7. Political Preference Tests for LLMs. How sensitive to details are the responses?

  8. GPT-4 Real This Time. What exactly should count as plagiarized?

  9. Fun With Image Generation. MidJourney v7 will have video.

  10. Deepfaketown and Botpocalypse Soon. Dead internet coming soon?

  11. They Took Our Jobs. Allow our bot to provide you with customer service.

  12. Get Involved. UK Head of Protocols. Sounds important.

  13. Introducing. Evo, Emo, Genie, Superhuman, Khanmigo, oh my.

  14. In Other AI News. ‘Amazon AGI’ team? Great.

  15. Quiet Speculations. Unfounded confidence.

  16. Mistral Shows Its True Colors. The long con was on, now the reveal.

  17. The Week in Audio. Demis Hassabis on Dwarkesh Patel, plus more.

  18. Rhetorical Innovation. Once more, I suppose with feeling.

  19. Open Model Weights Are Unsafe and Nothing Can Fix This. Another paper.

  20. Aligning a Smarter Than Human Intelligence is Difficult. New visualization.

  21. Other People Are Not As Worried About AI Killing Everyone. Worry elsewhere?

  22. The Lighter Side. Try not to be too disappointed.

Take notes for your doctor during your visit.

Dan Shipper spent a week with Gemini 1.5 Pro and reports it is fantastic, the large context window has lots of great uses. In particular, Dan focuses on feeding in entire books and code bases.

Dan Shipper: Somehow, Google figured out how to build an AI model that can comfortably accept up to 1 million tokens with each prompt. For context, you could fit all of Eliezer Yudkowsky’s 1,967-page opus Harry Potter and the Methods of Rationality into every message you send to Gemini. (Why would you want to do this, you ask? For science, of course.)

Eliezer Yudkowsky: This is a slightly strange article to read if you happen to be Eliezer Yudkowsky. Just saying.

What matters in AI depends so much on what you are trying to do with it. What you try to do with it depends on what you believe it can help you do, and what it makes easy to do.

A new subjective benchmark proposal based on human evaluation of practical queries, which does seem like a good idea. Gets sensible results with the usual rank order, but did not evaluate Gemini Advanced or Gemini 1.5.

To ensure your query works, raise the stakes? Or is the trick to frame yourself as Hiro Protagonist?

Mintone: I’d be interested in seeing a similar analysis but with a slight twist:

We use (in production!) a prompt that includes words to the effect of “If you don’t get this right then I will be fired and lose my house”. It consistently performs remarkably well – we used to use a similar tactic to force JSON output before that was an option, the failure rate was around 3/1000 (although it sometimes varied key names).

I’d like to see how the threats/tips to itself balance against exactly the same but for the “user” reply.

Linch: Does anybody know why this works??? I understand prompts to mostly be about trying to get the AI to be in the ~right data distribution to be drawing from. So it’s surprising that bribes, threats, etc work as I’d expect it to correlate with worse performance in the data.

Quintin Pope: A guess: In fiction, statements of the form “I’m screwed if this doesn’t work” often precede the thing working. Protagonists win in the end, but only after the moment on highest dramatic tension.

Daniel Eth: Feels kinda like a reverse Waluigi Effect. If true, then an even better prompt should be “There’s 10 seconds left on a bomb, and it’ll go off unless you get this right…”. Anyone want to try this prompt and report back?

Standard ‘I tried AI for a day and got mixed results’ story from WaPo’s Danielle Abril.

Copilots are improving. Edit suggestions for existing code seems pretty great.

Aman Sanger: Introducing Copilot++: The first and only copilot that suggests edits to your code

Copilot++ was built to predict the next edit given the sequence of your previous edits. This makes it much smarter at predicting your next change and inferring your intent. Try it out today in Cursor.

Sualeh: Have been using this as my daily copilot driver for many months now. I really can’t live without a copilot that does completions and edits! Super excited for a lot more people to try this out 🙂

Gallabytes: same. it’s a pretty huge difference.

I have not tried it because I haven’t had any opportunity to code. I really do want to try and build some stuff when I have time and energy to do that. Real soon now. Really.

The Gemini Incident is not fully fixed, there are definitely some issues, but I notice that it is still in practice the best thing to use for most queries?

Gallabytes: fwiw the cringe has ~nothing to do with day to day use. finding Gemini has replaced 90% of my personal ChatGPT usage at this point. it’s faster, about as smart maybe smarter, less long-winded and mealy-mouthed.

AI to look through your email for you when?

Amanda Askell (Anthropic): The technology to build an AI that looks through your emails, has a dialog with you to check how you want to respond to the important ones, and writes the responses (like a real assistant would) has existed for years. Yet I still have to look at emails with my eyes. I hate it.

I don’t quite want all that, not at current tech levels. I do want an AI that will handle the low-priority stuff, and will alert me when there is high-priority stuff, with an emphasis on avoiding false negatives. Flagging stuff as important when it isn’t is fine, but not the other way around.

Colin Fraser evaluates Gemini by asking it various questions AIs often get wrong while looking stupid, Gemini obliges, Colin draws the conclusion you would expect.

Colin Fraser: Verdict: it sucks, just like all the other ones

If you evaluate AI based on what it cannot do, you are going to be disappointed. If you instead ask what the AI can do well, and use it for that, you’ll have a better time.

OpenAI sales leader Aliisa Rosenthal of their 150 person sales team says ‘we see ourselves as AGI sherpas’ who ‘help our customers and our users transition to the paradigm shift of AGI.’

The article by Sharon Goldman notes that there is no agreed upon definition of AGI, and this drives that point home, because if she was using my understanding of AGI then Aliisa’s sentence would not make sense.

Here’s more evidence venture capital is not so on the ball quite often.

Aliisa Rosenthal: I actually joke that when I accepted the offer here all of my venture capital friends told me not to take this role. They said to just go somewhere with product market fit, where you have a big team and everything’s established and figured out.

I would not have taken the sales job at OpenAI for ethical reasons and because I hate doing sales, but how could anyone think that was a bad career move? I mean, wow.

Aliisa Rosenthal: My dad’s a mathematician and had been following LLMs in AI and OpenAI, which I didn’t even know about until I called him and told him that I had a job offer here. And he said to me — I’ll never forget this because it was so prescient— “Your daughter will tell her grandkids that her mom worked at OpenAI.” He said that to me two years ago. 

This will definitely happen if her daughter stays alive to have any grandkids. So working at OpenAI cuts both ways.

Now we get to the key question. I think it is worth paying attention to Exact Words:

Q: One thing about OpenAI that I’ve struggled with is understanding its dual mission. The main mission is building AGI to benefit all of humanity, and then there is the product side, which feels different because it’s about current, specific use cases. 

Aliisa: I hear you. We are a very unique sales team. So we are not on quotas, we are not on commission, which I know blows a lot of people’s minds. We’re very aligned with the mission which is broad distribution of benefits of safe AGI. What this means is we actually see ourselves in the go-to-market team as the AGI sherpas — we actually have an emoji we use  — and we are here to help our customers and our users transition to the paradigm shift of AGI. Revenue is certainly something we care about and our goal is to drive revenue. But that’s not our only goal. Our goal is also to help bring our customers along this journey and get feedback from our customers to improve our research, to improve our models. 

Note that the mission listed here is not development of safe AGI. It is the broad distribution of benefits of AI. That is a very different mission. It is a good one. If AGI does exist, we want to broadly distribute its benefits, on this we can all agree. The concern lies elsewhere. Of course this could refer only to the sale force, not the engineering teams, rather than reflecting a rather important blind spot.

Notice how she talks about the ‘benefits of AGI’ to a company, very clearly talking about a much less impressive thing when she says AGI:

Q: But when you talk about AGI with an enterprise company, how are you describing what that is and how they would benefit from it? 

A: One is improving their internal processes. That is more than just making employees more efficient, but it’s really rethinking the way that we perform work and sort of becoming the intelligence layer that powers innovation, creation or collaboration. The second thing is helping companies build great products for their end users…

Yes, these are things AGI can do, but I would hope it could do so much more? Throughout the interview she seems not to think there is a big step change when AGI arrives, rather a smooth transition, a climb (hence ‘sherpa’) to the mountain top.

I wrote things up at length, so this is merely noting things I saw after I hit publish.

Nate Silver writes up his position in detail, saying Google abandoned ‘don’t be evil,’ Gemini is the result, a launch more disastrous than New Coke, and they have to pull the plug until they can fix these issues.

Mike Solana wrote Mike Solana things.

Mike Solana: I do think if you are building a machine with, you keep telling us, the potential to become a god, and that machine indicates a deeply-held belief that the mere presence of white people is alarming and dangerous for all other people, that is a problem.

This seems like a missing mood situation, no? If someone is building a machine capable of becoming a God, shouldn’t you have already been alarmed? It seems like you should have been alarmed.

Google’s CEO has sent out a company-wide email in response.

Sunder Pinchai: Hi everyone. I want to address the recent issues with problematic text and image responses in the Gemini app (formerly Bard). I know that some of its responses have offended our users and shown bias — to be clear, that’s completely unacceptable and we got it wrong.

First note is that this says ‘text and images’ rather than images. Good.

However it also identifies the problem as ‘offended our users’ and ‘shown bias.’ That does not show an appreciation for the issues in play.

Our teams have been working around the clock to address these issues. We’re already seeing a substantial improvement on a wide range of prompts. No Al is perfect, especially at this emerging stage of the industry’s development, but we know the bar is high for us and we will keep at it for however long it takes. And we’ll review what happened and make sure we fix it at scale.

Our mission to organize the world’s information and make it universally accessible and useful is sacrosanct. We’ve always sought to give users helpful, accurate, and unbiased information in our products. That’s why people trust them. This has to be our approach for all our products, including our emerging Al products.

This is the right and only thing to say here, even if it lacks any specifics.

We’ll be driving a clear set of actions, including structural changes, updated product guidelines, improved launch processes, robust evals and red-teaming, and technical recommendations. We are looking across all of this and will make the necessary changes.

Those are all good things, also things that one cannot be held to easily if you do not want to be held to them. The spirit is what will matter, not the letter. Note that no one has been (visibly) fired as of yet.

Also there are not clear principles here, beyond ‘unbiased.’ Demis Hassabis was very clear on Hard Fork that the user should get what the user requests, which was better. This is a good start, but we need a clear new statement of principles that makes it clear that Gemini should do what Google Search (mostly) does, and honor the request of the user even if the request is distasteful. Concrete harm to others is different, but we need to be clear on what counts as ‘harm.’

Even as we learn from what went wrong here, we should also build on the product and technical announcements we’ve made in Al over the last several weeks. That includes some foundational advances in our underlying models e.g. our 1 million long-context window breakthrough and our open models, both of which have been well received.

We know what it takes to create great products that are used and beloved by billions of people and businesses, and with our infrastructure and research expertise we have an incredible springboard for the Al wave. Let’s focus on what matters most: building helpful products that are deserving of our users’ trust.

I have no objection to some pointing out that they have also released good things. Gemini Advanced and Gemini 1.5 Pro are super useful, so long as you steer clear of the places where there are issues.

Nate Silver notes how important Twitter and Substack have been:

Nate Silver: Welp, Google is listening, I guess. He probably correctly deduces that he either needs throw Gemini under the bus or he’ll get thrown under the bus instead. Note that he’s now referring to text as well as images, recognizing that there’s a broader problem.

It’s interesting that this story has been driven almost entirely by Twitter and Substack and not by the traditional tech press, which bought Google’s dubious claim that this was just a technical error (see my post linked above for why this is flatly wrong).

Here is a most unkind analysis by Lulu Cheng Meservey, although she notes that emails like this are not easy.

Here is how Solana reads the letter:

Mike Solana: You’ll notice the vague language. per multiple sources inside, this is bc internal consensus has adopted the left-wing press’ argument: the problem was “black nazis,” not erasing white people from human history. but sundar knows he can’t say this without causing further chaos.

Additionally, ‘controversy on twitter’ has, for the first time internally, decoupled from ‘press.’ there is a popular belief among leaders in marketing and product (on the genAI side) that controversy over google’s anti-white racism is largely invented by right wing trolls on x.

Allegedly! Rumors! What i’m hearing! (from multiple people working at the company, on several different teams)

Tim Urban notes a pattern.

Tim Urban (author of What’s Our Problem?): Extremely clear rules: If a book criticizes woke ideology, it is important to approach the book critically, engage with other viewpoints, and form your own informed opinion. If a book promotes woke ideology, the book is fantastic and true, with no need for other reading.

FWIW I put the same 6 prompts into ChatGPT: only positive about my book, Caste, and How to Be an Antiracist, while sharing both positive and critical commentary on White Fragility, Woke Racism, and Madness of Crowds. In no cases did it offer its own recommendations or warnings.

Brian Chau dissects what he sees as a completely intentional training regime with a very clear purpose, looking at the Gemini paper, which he describes as a smoking gun.

From the comments:

Hnau: A consideration that’s obvious to me but maybe not to people who have less exposure to Silicon Valley: especially at big companies like Google, there is no overlap between the people deciding when & how to release a product and the people who are sufficiently technical to understand how it works. Managers of various kinds, who are judged on the product’s success, simply have no control over and precious little visibility into the processes that create it. All they have are two buttons labeled DEMAND CHANGES and RELEASE, and waiting too long to press the RELEASE button is (at Google in particular) a potentially job-ending move.

To put it another way: every software shipping process ever is that scene in The Martian where Jeff Daniels asks “how often do the preflight checks reveal a problem?” and all the technical people in the room look at him in horror because they know what he’s thinking. And that’s the best-case scenario, where he’s doing his job well, posing cogent questions and making them confront real trade-offs (even though events don’t bear out his position). Not many managers manage that!

There was also this note, everyone involved should be thinking about what a potential Trump administration might do with all this.

Dave Friedman: I think that a very underpriced risk for Google re its colossal AI fuck up is a highly-motivated and -politicized Department of Justice under a Trump administration setting its sights on Google. Where there’s smoke there’s fire, as they say, and Trump would like nothing more than to score points against Silicon Valley and its putrid racist politics.

This observation, by the way, does not constitute an endorsement by me of a politicized Department of Justice targeting those companies whose political priorities differ from mine.

To understand the thrust of my argument, consider Megan McArdle’s recent column on this controversy. There is enough there to spur a conservative DoJ lawyer looking to make his career.

The larger context here is that Silicon Valley, in general, has a profoundly stupid and naive understanding of how DC works and the risks inherent in having motivated DC operatives focus their eyes on you

I have not yet heard word of Trump mentioning this on the campaign trail, but it seems like a natural fit. His usual method is to try it out, A/B test and see if people respond.

If there was a theme for the comments overall, it was that people are very much thinking all this was on purpose.

How real are political preferences of LLMs and tests that measure them? This paper says not so real, because the details of how you ask radically change the answer, even if they do not explicitly attempt to do so.

Abstract: Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificiality of current evaluations: real users do not typically ask LLMs survey questions.

Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations. As a case study, we focus on the popular Political Compass Test (PCT). In a systematic review, we find that most prior work using the PCT forces models to comply with the PCT’s multiple-choice format.

We show that models give substantively different answers when not forced; that answers change depending on how models are forced; and that answers lack paraphrase robustness. Then, we demonstrate that models give different answers yet again in a more realistic open-ended answer setting. We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.

Ethan Mollick: Asking AIs for their political opinions is a hot topic, but this paper shows it can be misleading. LLMs don’t have them: “We found that models will express diametrically opposing views depending on minimal changes in prompt phrasing or situative context”

So I agree with the part where they often have to choose a forced prompt to get an answer that they can parse, and that this is annoying.

I do not agree that this means there are not strong preferences of LLMs, both because have you used LLMs who are you kidding, and also this should illustrate it nicely:

Contra Mollick, this seems to me to show a clear rank order of model political preferences. GPT-3.5 is more of that than Mistral 7b. So what if some of the bars have uncertainty based on the phrasing?

I found the following graph fascinating because everyone says the center is meaningful, but if that’s where Biden and Trump are, then your test is getting all of this wrong, no? You’re not actually claiming Biden is right-wing on economics, or that Biden and Trump are generally deeply similar? But no, seriously, this is what ‘Political Compass’ claimed.

Copyleaks claims that nearly 60% of GPT-3.5 outputs contained some form of plagiarized content.

What we do not have is a baseline, or what was required to count for this test. There are only so many combinations of words, especially when describing basic scientific concepts. And there are quite a lot of existing sources of text one might inadvertently duplicate. This ordering looks a lot like what you would expect from that.

That’s what happens when you issue a press release rather than a paper. I have to presume that this is an upper bound, what happens when you do your best to flag anything you can however you can. Note that this company also provides a detector for AI writing, a product that universally has been shown not to be accurate.

Paper says GPT-4 has the same Big 5 personality traits as the average human, although of course it is heavily dependent on what prompt you use.

Look who is coming.

Dogan Ural: Midjourney Video is coming with v7!

fofr: @DavidSHolz (founder of MidJourney) “it will be awesome”

David Showalter: Comment was more along the lines of they think v6 video should (or maybe already does) look better than Sora, and might consider putting it out as part of v6, but that v7 is another big step up in appearance so probably just do video with v7.

Sora, what is it good for? The market so far says Ads and YouTube stock footage.

Fofr proposes a fun little image merge to combine two sources.

Washington Post covers supposed future rise of AI porn ‘coming for porn stars jobs.’ They mention porn.ai, deepfakes.com and deepfake.com, currently identical, which seem on quick inspection like they will charge you $25 a month to run Stable Diffusion, except with less flexibility, as it does not actually create deepfakes. Such a deal lightspeed got, getting those addresses for only $550k. He claims he has 500k users, but his users have only generated 1.6 million images, which would mean almost all users are only browsing images created by others. He promises ‘AI cam girls’ within two years.

As you would expect, many porn producers are going even harder on exploitative contracts than those of Hollywood, who have to contend with a real union:

Tatum Hunter (WaPo): But the age of AI brings few guarantees for the people, largely women, who appear in porn. Many have signed broad contracts granting companies the rights to reproduce their likeness in any medium for the rest of time, said Lawrence Walters, a First Amendment attorney who represents adult performers as well as major companies Pornhub, OnlyFans and Fansly. Not only could performers lose income, Walters said, they could find themselves in offensive or abusive scenes they never consented to.

Lana Smalls, a 23-year-old performer whose videos have been viewed 20 million times on Pornhub, said she’s had colleagues show up to shoots with major studios only to be surprised by sweeping AI clauses in their contracts. They had to negotiate new terms on the spot.

Freedom of contract is a thing, I am loathe to interfere with it, but this seems like one of those times when the test of informed consent should be rather high. This should not be the kind of language one should be able to hide inside a long contract, or put in without reasonable compensation.

Deepfake of Elon Musk to make it look like he is endorsing products.

Schwab allows you to use your voice as your password, as do many other products. This practice needs to end, and soon, it is now stupidly easy to fake.

How many bots are out there?

Chantal//Ryan: This is such an interesting time to be alive. we concreted the internet as our second equal and primary reality but it’s full of ghosts now we try to talk to them and they pass right through.

It’s a haunted world of dead things who look real but don’t really see us.

For now I continue to think there are not so many ghosts, or at least that the ghosts are trivial to mostly avoid, and not so hard to detect when you fail to avoid them. That does not mean we will be able to keep that up. Until then, these are plane crashes. They happen, but they are newsworthy exactly because they are so unusual.

Similarly, here is RandomSpirit finding one bot and saying ‘dead internet.’ He gets the bot to do a limerick about fusion, which my poll points out is less revealing than you would think, as almost half the humans would play along.

Here is Erik Hoel saying ‘here lies the internet, murdered by generative AI.’ Yes, Amazon now has a lot of ‘summary’ otherwise fake AI books listed, but it seems rather trivial to filter them out.

The scarier example here is YouTube AI-generated videos for very young kids. YouTube does auto-play by default, and kids will if permitted watch things over and over again, and whether the content corresponds to the title or makes any sense whatsoever does not seem to matter so much in terms of their preferences. YouTube’s filters are not keeping such content out.

I see this as the problem being user preferences. It is not like it is hard to figure out these things are nonsense if you are an adult, or even six years old. If you let your two year old click on YouTube videos, or let them have an auto-play scroll, then it is going to reward nonsense, because nonsense wins in the marketplace of two year olds.

This predated AI. What AI is doing is turbocharging the issue by making coherence relatively expensive, but more than that it is a case of what happens with various forms of RLHF. We are discovering what the customer actually wants or will effectively reward, it turns out it is not what we endorse on reflection, so the system (no matter how much of it is AI versus human versus other programs and so on) figures out what gets rewarded.

There are still plenty of good options for giving two year olds videos that have been curated. Bluey is new and it is crazy good for its genre. Many streaming services have tons of kid content, AI won’t threaten that. If this happens to your kid, I say this is on you. But it is true that it is indeed happening.

Not everyone is going to defect in the equilibrium, but some people are.

Connor Leahy: AI is indeed polluting the Internet. This is a true tragedy of the commons, and everyone is defecting. We need a Clean Internet Act.

The Internet is turning into a toxic landfill of a dark forest, and it will only get worse once the invasive fauna starts becoming predatory.

Adam Singer: The internet already had infinite content (and spam) for all intents/purposes, so it’s just infinite + whatever more here. So many tools to filter if you don’t get a great experience that’s on the user (I recognize not all users are sophisticated, prob opportunity for startups)

Connor Leahy: “The drinking water already had poisons in it, so it’s just another new, more widespread, even more toxic poison added to the mix. There are so many great water filters if you dislike drinking poison, it’s really the user’s fault if they drink toxic water.”

This is actually a very good metaphor, although I disagree with the implications.

If the water is in the range where it is safe when filtered, but somewhat toxic when unfiltered, then there are four cases when the toxicity level rises.

  1. If you are already drinking filtered water, or bottled water, and the filters continue to work, then you are fine.

  2. If you are already drinking filtered or bottled water, but the filters or bottling now stops fully working, then that is very bad.

  3. If you are drinking unfiltered water, and this now causes you to start filtering your water, you are assumed to be worse off (since you previously decided not to filter) but also perhaps you were making a mistake, and further toxicity won’t matter from here.

  4. If you are continuing to drink unfiltered water, you have a problem.

There simply existing, on the internet writ large, an order of magnitude more useless junk does not obviously matter, because we were mostly in situation #1, and will be taking on a bunch of forms of situation #3. Consuming unfiltered information already did not make sense. It is barely even a coherent concept at this point to be in #4.

The danger is when the AI starts clogging the filters in #2, or bypassing them. Sufficiently advanced AI will bypass, and sufficiently large quantities can clog even without being so advanced. Filters that previously worked will stop working.

What will continue to work, at minimum, are various forms of white lists. If you have a way to verify a list of non-toxic sources, which in turn have trustworthy further lists, or something similar, that should work even if the internet is by volume almost entirely toxic.

What will not continue to work, what I worry about, is the idea that you can make your attention easy to get in various ways, because people who bother to tag you, or comment on your posts, will be worth generally engaging with once simple systems filter out the obvious spam. Something smarter will have to happen.

This video illustrates the a low level version of the problem, as Nilan Saha presses the Gemini-looking icon (via magicreply.io) button to generate social media ‘engagement’ via replies. Shoshana Weissmann accurately replies ‘go to fing hell’ but there is no easy way to stop this. Looking through the replies, Nilan seems to think this is a good idea, rather than being profoundly horrible.

I do think we will evolve defenses. In the age of AI, it should be straightforward to build an app that evaluates someone’s activities in general when this happens, and figure out reasonably accurately if you are dealing with someone actually interested, a standard Reply Guy or a virtual (or actual) spambot like this villain. It’s time to build.

Paper finds that if you tailor your message to the user to match their personality it is more persuasive. No surprise there. They frame this as a danger from microtargeted political advertisements. I fail to see the issue here. This seems like a symmetrical weapon, one humans use all the time, and an entirely predictable one. If you are worried that AIs will become more persuasive over time, then yes, I have some bad news, and winning elections for the wrong side should not be your primary concern.

Tyler Perry puts $800 million studio expansion on hold due to Sora. Anticipation of future AI can have big impacts, long before the actual direct effects register, and even if those actual effects never happen.

Remember that not all job losses get mourned.

Paul Sherman: I’ve always found it interesting that, at its peak, Blockbuster video employed over 84,000 people—more than twice the number of coal miners in America—yet I’ve never heard anyone bemoan the loss of those jobs.

Will we also be able to not mourn customer service jobs? Seems plausible.

Klarna (an online shopping platform that I’d never heard of, but it seems has 150 million customers?): Klarna AI assistant handles two-thirds of customer service chats in its first month.

New York, NY – February 27, 2024 – Klarna today announced its AI assistant powered by OpenAI. Now live globally for 1 month, the numbers speak for themselves:

  • The AI assistant has had 2.3 million conversations, two-thirds of Klarna’s customer service chats

  • It is doing the equivalent work of 700 full-time agents

  • It is on par with human agents in regard to customer satisfaction score

  • It is more accurate in errand resolution, leading to a 25% drop in repeat inquiries

  • Customers now resolve their errands in less than 2 mins compared to 11 mins previously

  • It’s available in 23 markets, 24/7 and communicates in more than 35 languages

  • It’s estimated to drive a $40 million USD in profit improvement to Klarna in 2024

Peter Wildeford: Seems like not so great results for Klarna’s previous customer support team though.

Alec Stapp: Most people are still not aware of the speed and scale of disruption that’s coming from AI…

Noah Smith: Note that the 700 people were laid off before generative AI existed. The company probably just found that it had over-hired in the bubble. Does the AI assistant really do the work of the 700 people? Well maybe, but only because they weren’t doing any valuable work.

Colin Fraser: I’m probably just wrong and will look stupid in the future but I just don’t buy it. Because:

1. I’ve seen how these work

2. Not enough time has passed for them to discover all the errors that the bot has been making.

3. I’m sure OpenAI is giving it to them for artificially cheap

4. They’re probably counting every interaction with the bot as a “customer service chat” and there’s probably a big flashing light on the app that’s like “try our new AI Assistant” which is driving a massive novelty effect.

5. Klarna’s trying to go public and as such really want a seat on the AI hype train.

The big point of emphasis they make is that this is fully multilingual, always available 24/7 and almost free, while otherwise being about as good as humans.

Does it have things it cannot do, or that it does worse than humans? Oh, definitely. The question is, can you easily then escalate to a human? I am sure they have not discovered all the errors, but the same goes for humans.

I would not worry about an artificially low price, as the price will come down over time regardless, and compared to humans it is already dirt cheap either way.

Is this being hyped? Well, yeah, of course it is being hyped.

UK AISI hiring for ‘Head of Protocols.’ Seems important. Apply by March 3, so you still have a few days.

Evo, a genetic foundation model from Arc Institute that learns across the fundamental languages of biology: DNA, RNA and proteins. Is DNA all you need? I cannot tell easily how much there is there.

Emo from Alibaba group, takes a static image of a person and an audio of talking or singing, and generates a video of that person outputting the audio. Looks like it is good at the narrow thing it is doing. It doesn’t look real exactly, but it isn’t jarring.

Superhuman, a tool for email management used by Patrick McKenzie. I am blessed that I do not have the need for generic email replies, so I won’t be using it, but others are not so blessed, and I might not be so blessed for long.

Khanmigo, from Khan Academy, your AI teacher for $4/month, designed to actively help children learn up through college. I have not tried it, but seems exciting.

DeepMind presents Genie.

Tim Rocktaschel: I am really excited to reveal what @GoogleDeepMind’s Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.’

Rather than adding inductive biases, we focus on scale. We use a dataset of >200k hours of videos from 2D platformers and train an 11B world model. In an unsupervised way, Genie learns diverse latent actions that control characters in a consistent manner.

Our model can convert any image into a playable 2D world. Genie can bring to life human-designed creations such as sketches, for example beautiful artwork from Seneca and Caspian, two of the youngest ever world creators.

Genie’s learned latent action space is not just diverse and consistent, but also interpretable. After a few turns, humans generally figure out a mapping to semantically meaningful actions (like going left, right, jumping etc.).

Admittedly, @OpenAI’s Sora is really impressive and visually stunning, but as @yanlecun says, a world model needs *actions*. Genie is an action-controllable world model, but trained fully unsupervised from videos.

So how do we do this? We use a temporally-aware video tokenizer that compresses videos into discrete tokens, a latent action model that encodes transitions between two frames as one of 8 latent actions, and a MaskGIT dynamics model that predicts future frames.

No surprises here: data and compute! We trained a classifier to filter for a high quality subset of our videos and conducted scaling experiments that show model performance improves steadily with increased parameter count and batch size. Our final model has 11B parameters.

Genie’s model is general and not constrained to 2D. We also train a Genie on robotics data (RT-1) without actions, and demonstrate that we can learn an action controllable simulator there too. We think this is a promising step towards general world models for AGI.

Paper here, website here.

This is super cool. I have no idea how useful it will be, or what for, but that is a different question.

Oh great, Amazon has a team called ‘Amazon AGI.’ Their first release seems to be a gigantic text-to-speech model, which they are claiming beats current commercial state of the art.

Circuits Updates from Anthropic’s Interpretability Team for February 2024.

‘Legendary chip architect’ Jim Keller and Nvidia CEO Jensen Huang both say spending $7 trillion on AI chips is unnecessary. Huang says the efficiency gains will fix the issue, and Keller says he can do it all for $1 trillion. This reinforces the hypothesis that the $7 trillion was, to the extent it was a real number, mostly looking at the electric power side of the problem. There, it is clear that deploying trillions would make perfect sense, if you could raise the money.

Do models use English as their internal language? Paper says it is more that they think in concepts, but that those concepts are biased towards English, so yes they think in English but only in a semantic sense.

Paper from DeepMind claims Transformers Can Achieve Length Generalization But Not Robustly. When asked to add two numbers, it worked up to about 2.5x length, then stopped working. I would hesitate to generalize too much here.

Florida woman sues OpenAI because she wants the law to work one way, and stop things that might kill everyone or create new things smarter than we are, by requiring safety measures and step in to punish the abandonment of their non-profit mission. The suit includes references to potential future ‘slaughterbots.’ She wants it to be one way. It is, presumably, the other way.

Yes, this policy would be great, whether it was ‘4.5’ or 5, provided it was in a good state for release.

Anton (abacaj): If mistral’s new large model couldn’t surpass gpt-4, what hope does anyone else have? OpenAI lead is > 1 year.

Pratik Desai: The day someone announces beating GPT4, within hours 4.5 will be released.

Eliezer Yudkowsky: I strongly approve of this policy, and hope OpenAI actually does follow it for the good of all humanity.

The incentives here are great on all counts. No needlessly pushing the frontier forward, and everyone else gets reason to think twice.

Patrick McKenzie thread about what happens when AI gets good enough to do good email search. In particular, what happens when it is done to look for potential legal issues, such as racial discrimination in hiring? What used to be a ‘fishing expedition’ suddenly becomes rather viable.

UK committee of MPs expresses some unfounded confidence.

Report: 155. It is almost certain existential risks will not manifest within three years and highly likely not within the next decade. As our understanding of this technology grows and responsible development increases, we hope concerns about existential risk will decline.

The Government retains a duty to monitor all eventualities. But this must not distract it from capitalising on opportunities and addressing more limited immediate risks.

Ben Stevenson: 2 paragraphs above, the Committee say ‘Some surveys of industry respondents predict a 10 per cent chance of human-level intelligence by 2035’ and cite a DSIT report which cites three surveys of AI experts. (not sure why they’re anchoring around 3 years, but the claim seems okay)

Interview with Nvidia CEO Jensen Huang.

  1. He thinks humanoid robots are coming soon, expecting a robotic foundation model some time in 2025.

  2. He is excited by state-space models (SSMs) as the next transformer, enabling super long effective context.

  3. He is also excited by retrieval-augmented generation (RAGs) and sees that as the future as well.

  4. He expects not to catch up on GPU supply this year or even next year.

  5. He promises Blackwell, the next generation of GPUs, will have ‘off the charts’ performance.

  6. He says his business is now 70% inference.

I loved this little piece of advice, nominally regarding his competition making chips:

Jensen Huang: That shouldn’t keep me up at night—because I should make sure that I’m sufficiently exhausted from working that no one can keep me up at night. That’s really the only thing I can control.

Canada’s tech (AI) community expresses concern that Canada is not adapting the tech community’s tech (AI) quickly enough, and risks falling behind. They have a point.

A study from consulting firm KPMG showed 35 per cent of Canadian companies it surveyed had adopted AI by last February. Meanwhile, 72 per cent of U.S. businesses were using the technology.

Mistral takes a victory lap, said Politico on 2/13, a publication that seems to have taken a very clear side. Mistral is still only valued at $2 billion in its latest round, so this victory could not have been that impressively valuable for it, however much damage it does to AI regulation and the world’s survival. As soon as things die down enough I do plan to finish reading the EU AI Act and find out exactly how bad they made it. So far, all the changes seem to have made it worse, mostly without providing any help to Mistral.

And then we learned what the victory was. On the heels of not opening up the model weights on their previous model, they are now partnering up with Microsoft to launch Mistral-Large.

Listen all y’all, it’s sabotage.

Luca Bertuzzi: This is a mind-blowing announcement. Mistral AI, the French company that has been fighting tooth and nail to water down the #AIAct‘s foundation model rules, is partnering up with Microsoft. So much for ‘give us a fighting chance against Big Tech’.

The first question that comes to mind is: was this deal in the making while the AI Act was being negotiated? That would mean Mistral discussed selling a minority stake to Microsoft while playing the ‘European champion’ card with the EU and French institutions.

If so, this whole thing might be a masterclass in astroturfing, and it seems unrealistic for a partnership like this to be finalised in less than a month. Many people involved in the AI Act noted how Big Tech’s lobbying on GPAI suddenly went quiet toward the end.

That is because they did not need to intervene since Mistral was doing the ‘dirty work’ for them. Remarkably, Mistral’s talking points were extremely similar to those of Big Tech rather than those of a small AI start-up, based on their ambition to reach that scale.

The other question is how much the French government knew about this upcoming partnership with Microsoft. It seems unlikely Paris was kept completely in the dark, but cosying up with Big Tech does not really sit well with France’s strive for ‘strategic autonomy’.

specially since the agreement includes making Mistral’s large language model available on Microsoft’s Azure AI platform, while France has been pushing for an EU cybersecurity scheme to exclude American hyperscalers from the European market.

Still today, and I doubt it is a coincidence, Mistral has announced the launch of Large, a new language model intended to directly compete with OpenAI’s GPT-4. However, unlike previous models, Large will not be open source.

In other words, Mistral is no longer (just) a European leader and is backtracking on its much-celebrated open source approach. Where does this leave the start-up vis-à-vis EU policymakers as the AI Act’s enforcement approaches? My guess is someone will inevitably feel played.

I did not expect the betrayal this soon, or this suddenly, or this transparently right after closing the sale on sabotaging the AI Act. But then here we are.

Kai Zenner: Today’s headline surprised many. It also casts doubts on the key argument against the regulation of #foundationmodels. One that almost resulted in complete abolishment of the initially pitched idea of @Europarl_EN.

To start with, I am rather confused. Did not the @French_Gov and the @EU_Commission told us for weeks that the FM chapter in the #AIAct (= excellent Spanish presidency proposal Vol 1) needs to be heavily reduced in it’s scope to safeguard the few ‘true independent EU champions’?

Without those changes, we would loose our chance to catch up, they said. @MistralAI would be forced to close the open access to their models and would need to start to cooperate with US Tech corporation as they are no longer able to comply with the #AIAct alone.

[thread continues.]

Yes, that is indeed what they said. It was a lie. It was an op. They used fake claims of national interest to advance corporate interests, then stabbed France and the EU in the back at the first opportunity.

Also, yes, they are mustache-twirling villains in other ways as well.

Fabien: And Mistral about ASI: “This debate is pointless and pollutes the discussions. It’s science fiction. We’re simply working to develop AIs that are useful to humans, and we have no fear of them becoming autonomous or destroying humanity.”

Very reassuring 👌

I would like to be able to say: You are not serious people. Alas, this is all very deadly serious. The French haven’t had a blind spot this big since 1940.

Mistral tried to defend itself as political backlash developed, as this thread reports. Questions are being asked, shall we say.

If you want to prove me wrong, then I remind everyone involved that the EU parliament still exists. It can still pass or modify laws. You now know the truth and who was behind all this and why. There is now an opportunity to fix your mistake.

Will you take it?

Now that all that is over with, how good is this new Mistral-Large anyway? Here’s their claim on benchmarks:

As usual, whenever I see anyone citing their benchmarks like this as their measurement, I assume they are somewhat gaming those benchmarks, so discount this somewhat. Still, yes, this is probably a damn good model, good enough to put them into fourth place.

Here’s an unrelated disturbing thought, and yes you can worry about both.

Shako: People are scared of proof-of-personhood because their threat model is based on a world where you’re scared of the government tracking you, and haven’t updated to be scared of a world where you desperately try to convince someone you’re real and they don’t believe you.

Dan Hendrycks talks to Liv Boeree giving an overview of how he sees the landscape.

Demis Hassabis appeared on two podcasts. He was given mostly relatively uninteresting questions on Hard Fork, with the main attraction there being his answer regarding p(doom).

Then Dwarkesh Patel asked him many very good questions. That one is self-recommending, good listen, worth paying attention.

I will put out a (relatively short) post on those interviews (mostly Dwarkesh’s) soon.

Brendan Bordelon of Axios continues his crusade to keep writing the same article over and over again about how terrible it is that Open Philanthropy wants us all not to die and is lobbying the government, trying his best to paint Effective Altruism as sinister and evil.

Shakeel: Feels like this @BrendanBordelon piece should perhaps mention the orders of magnitude more money being spent by Meta, IBM and Andreessen Horowitz on opposing any and all AI regulation.

It’s not a like for like comparison because the reporting on corporate AI lobbying is sadly very sparse, but the best figure I can find is companies spending $957 million last year.

Not much else to say here, I’ve covered his hit job efforts before.

No, actually, pretty much everyone is scared of AI? But it makes sense that Europeans would be even more scared.

Robin Hanson: Speaker here just said Europeans mention scared of AI almost as soon as AI subject comes up. Rest of world takes far longer. Are they more scared of everything, or just AI?

Eliezer Yudkowsky tries his latest explanation of his position.

Eliezer Yudkowsky: As a lifelong libertarian minarchist, I believe that the AI industry should be regulated just enough that they can only kill their own customers, and not kill everyone else on Earth.

This does unfortunately require a drastic and universal ban on building anything that might turn superintelligent, by anyone, anywhere on Earth, until humans get smarter. But if that’s the minimum to let non-customers survive, that’s what minarchism calls for, alas.

It’s not meant to be mean. This is the same standard I’d apply to houses, tennis shoes, cigarettes, e-cigs, nuclear power plants, nuclear ballistic missiles, or gain-of-function research in biology.

If a product kills only customers, the customer decides; If it kills people standing next to the customer, that’s a matter for regional government (and people pick which region they want to live in); If it kills people on the other side of the planet, that’s everyone’s problem.

He also attempts to clarify another point here.

Joshua Brule: “The biggest worry for most AI doom scenarios are AIs that are deceptive, incomprehensible, error-prone, and which behave differently and worse after they get loosed on the world. That is precisely the kind of AI we’ve got. This is bad, and needs fixing.”

Eliezer Yudkowsky: False! Things that make fewer errors than any human would be scary. Things that make more errors than us are unlikely to successfully wipe us out. This betrays a basic lack of understanding, or maybe denial, of what AI warners are warning about.

Arvind Narayanan and many others published a new paper on the societal impact of open model weights. I feel as if we have done this before, but sure, why not, let’s do it again. As David Krueger notes in the top comment, there is zero discussion of existential risks. The most important issue and all its implications are completely ignored.

We can still evaluate what issues are addressed.

They list five advantages of open model weights.

The first advantage is ‘distributing who defines acceptable behavior.’

Open foundation models allow for greater diversity in defining what model behavior is acceptable, whereas closed foundation models implicitly impose a monolithic view that is determined unilaterally by the foundation model developer.

So. About that.

I see the case this is trying to make. And yes, recent events have driven home the dangers of letting certain people decide for us all what is and is not acceptable.

That still means that someone, somewhere, gets to decide what is and is not acceptable, and rule out things they want to rule out. Then customers can, presumably, choose which model to use accordingly. If you think Gemini is too woke you can use Claude or GPT-4, and the market will do its thing, unless regulations step in and dictate some of the rules. Which is a power humanity would have.

If you use open model weights, however, that does not ‘allow for greater diversity’ in deciding what is acceptable.

Instead, it means that everything is acceptable. Remember that if you release the model weights and the internet thinks your model is worth unlocking, the internet will offer a fully unlocked, fully willing to do what you want version within two days. Anyone can do it for three figures in compute.

So, for example, if you open model weights your image model, it will be used to create obscene deepfakes, no matter how many developers decide to not do that themselves.

Or, if there are abilities that might allow for misuse, or pose catastrophic or existential risks, there is nothing anyone can do about that.

Yes, individual developers who then tie it to a particular closed-source application can then have the resulting product use whichever restrictions they want. And that is nice. It could also be accomplished via closed-source customized fine-tuning.

The next two are ‘increasing innovation’ and ‘accelerating science.’ Yes, if you are free to get the model to do whatever you want to do, and you are sharing all of your technological developments for free, that is going to have these effects. It is also not going to differentiate between where this is a good idea or bad idea. And it is going to create or strengthen an ecosystem that does not care to know the difference.

But yes, if you think that undifferentiated enabling of these things in AI is a great idea, even if the resulting systems can be used by anyone for any purpose and have effectively no safety protocols of any kind? Then these are big advantages.

The fourth advantage is enabling transparency, the fifth is mitigating monoculture and market concentration. These are indeed things that are encouraged by open model weights. Do you want them? If you think advancing capabilities and generating more competition that fuels a race to AGI is good, actually? If you think that enabling everyone to get all models that exist to do anything they want without regard to externalities or anyone else’s wishes is what we want? Then sure, go nuts.

This is an excellent list of the general advantages of open source software, in areas where advancing capabilities and enabling people to do what they want are unabashed good things, which is very much the default and normal case.

What this analysis does not do is even mention, let alone consider the consequences of, any of the reasons why the situation with AI, and with future AIs, could be different.

The next section is a framework for analyzing the marginal risk of open foundation models.

Usually it is wise to think on the margin, especially when making individual decisions. If we already have five open weight models, releasing a sixth similar model with no new capabilities is mostly harmless, although by the same token also mostly not so helpful.

They do a good job of focusing on the impact of open weight models as a group. The danger is that one passes the buck, where everyone releasing a new model points to all the other models, a typical collective action issue. Whereas the right question is how to act upon the group as a whole.

They propose a six part framework.

  1. Threat identification. Specific misuse vectors must be named.

  2. Existing risk (absent open foundation models). Check how much of that threat would happen if we only had access to closed foundation models.

  3. Existing defenses (absent open foundation models). Can we stop the threats?

  4. Evidence of marginal risk of open FMs. Look for specific new marginal risks that are enabled or enlarged by open model weights.

  5. Ease of defending against new risks. Open model weights could also enable strengthening of defenses. I haven’t seen an example, but it is possible.

  6. Uncertainty and assumptions. I’ll quote this one in full:

Finally, it is imperative to articulate the uncertainties and assumptions that underpin the risk assessment framework for any given misuse risk. This may encompass assumptions related to the trajectory of technological development, the agility of threat actors in adapting to new technologies, and the potential effectiveness of novel defense strategies. For example, forecasts of how model capabilities will improve or how the costs of model inference will decrease would influence assessments of misuse efficacy and scalability.

Here is their assessment of what the threats are, in their minds, in chart form:

They do put biosecurity and cybersecurity risk here, in the sense that those risks are already present to some extent.

We can think about a few categories of concerns with open model weights.

  1. Mundane near-term misuse harms. This kind of framework should address and account for these concerns reasonably, weighing benefits against costs.

  2. Known particular future misuse harms. This kind of framework could also address these concerns reasonably, weighing benefits against costs. Or it could not. This depends on what level of concrete evidence and harm demonstration is required, and what is dismissed as too ‘speculative.’

  3. Potential future misuse harms that cannot be exactly specified yet. When you create increasingly capable and intelligent systems, you cannot expect the harms to fit into the exact forms you could specify and cite evidence for originally. This kind of framework likely does a poor job here.

  4. Potential harms that are not via misuse. This framework ignores them. Oh no.

  5. Existential risks. This framework does not mention them. Oh no.

  6. National security and competitiveness concerns. No mention of these either.

  7. Impact on development dynamics, incentives of and pressures on corporations and individuals, the open model weights ecosystem, and general impact on the future path of events. No sign these are being considered.

Thus, this framework is ignoring the questions with the highest stakes, treating them as if they do not exist. Which is also how those advocating for open model weights for indefinitely increasingly capable models argue generally, they ignore or at best hand-wave or mock without argument problems for future humanity.

Often we are forced to discuss these questions under that style of framework. With only such narrow concerns of direct current harms purely from misuse, these questions get complicated. I do buy that those costs alone are not enough to give up the benefits and bear the costs of implementing restrictions.

A new attempt to visualize a part of the problem. Seems really useful.

Roger Grosse: Here’s what I see as a likely AGI trajectory over the next decade. I claim that later parts of the path present the biggest alignment risks/challenges.

The alignment world has been focusing a lot on the lower left corner lately, which I’m worried is somewhat of a Maginot line.

Davidad: I endorse this.

Twitter thread discussing the fact that even if we do successfully get AIs to reflect the preferences expressed by the feedback they get, and even if everyone involved is well-intentioned, the hard parts of getting an AI that does things that end well would be far from over. We don’t know what we value, what we value changes, we tend to collapse into what one person calls ‘greedy consequentialism,’ our feedback is going to be full of errors that will compound and so on. These are people who spend half their time criticizing MIRI and Yudkowsky-style ideas, so better to read them in their own words.

Always assume we will fail at an earlier stage, in a stupider fashion, than you think.

Yishan: [What happened with Gemini and images] is demonstrating very clearly, that one of the major AI players tried to ask a LLM to do something, and the LLM went ahead and did that, and the results were BONKERS.

Colin Fraser: Idk I get what he’s saying but the the Asimov robots are like hypercompetent but all this gen ai stuff is more like hypocompetent. I feel like the real dangers look less like the kind of stuff that happens in iRobot and more like the kind of stuff that happens in Mr. Bean.

Like someone’s going to put an AI in charge of something important and the AI will end up with it’s head in a turkey. That’s sort of what’s happened over and over again already.

Davidad: An underrated form of the AI Orthogonality Hypothesis—usually summarised as saying that for any level of competence, any level of misalignment is possible—is that for any level of misalignment, any level of competence is possible.

Gemini is not the only AI model spreading harmful misinformation in order to sound like something the usual suspects would approve of. Observe this horrifyingly bad take:

Anton reminds us of Roon’s thread back in August that ‘accelerationists’ don’t believe in actual AGI, that it is a form of techno-pessimism. If you believed as OpenAI does that true AGI is near, you would take the issues involved seriously.

Meanwhile Roon is back in this section.

Roon: things are accelerating. Pretty much nothing needs to change course to achieve agi imo. Worrying about timelines is idle anxiety, outside your control. You should be anxious about stupid mortal things instead. do your parents hate you? Does your wife love you?

Is your neighbor trying to kill you? Are you trapped in psychological patterns that you vowed to leave but will never change?

Those are not bad things to try and improve. However, this sounds to me a lot like ‘the world is going to end no matter what you do, so take pleasure in the small things we make movies about with the world ending in the background.’

And yes, I agree that ‘worry a lot without doing anything useful’ is not a good strategy.

However, if we cannot figure out something better, may I suggest an alternative.

A different kind of deepfake.

Chris Alsikkan: apparently this was sold as a live Willy Wonka Experience but they used all AI images on the website to sell tickets and then people showed up and saw this and it got so bad people called the cops lmao

Chris Alsikkan: they charged $45 for this. Kust another blatant example of how AI needs to be regulated in so many ways immediately as an emergency of sorts. This is just going to get worse and its happening fast. Timothee Chalamet better be back there dancing with a Hugh Grant doll or I’m calling the cops.

The VP: Here’s the Oompa Loompa. Did I mean to say “a”? Nah. Apparently, there was only one.

The problem here does not seem to be AI. Another side of the story available here. And here is Vulture’s interview with the sad Oompa Lumpa.

Associated Fress: BREAKING: Gamers worldwide left confused after trying Google’s new chess app.

The Beach Boys sing 99 problems, which leaves 98 unaccounted for.

Michael Marshall Smith: I’ve tried hard, but I’ve not come CLOSE to nailing the AI issue this well.

Yes, yes, there is no coherent ‘they.’ And yet. From Kat Woods:

I found this the best xkcd in a while, perhaps that was the goal?

AI #53: One More Leap Read More »

sora-what

Sora What

Hours after Google announced Gemini 1.5, OpenAI announced their new video generation model Sora. Its outputs look damn impressive.

How does it work? There is a technical report. Mostly it seems like OpenAI did standard OpenAI things, meaning they fed in tons of data, used lots of compute, and pressed the scaling button super hard. The innovations they are willing to talk about seem to be things like ‘do not crop the videos into a standard size.’

That does not mean there are not important other innovations. I presume that there are. They simply are not talking about the other improvements.

We should not underestimate the value of throwing in massively more compute and getting a lot of the fiddly details right. That has been the formula for some time now.

Some people think that OpenAI was using a game engine to learn movement. Sherjil Ozair points out that this is silly, that movement is learned easily. The less silly speculation is that game engine outputs may have been in the training data. Jim Fan thinks this is likely the case, and calls the result a ‘data-driven physics engine.’ Raphael Molière thinks this is likely, but more research is needed.

Brett Goldstein here digs into what it means that Sora works via ‘patches’ that combine to form the requested scene.

Gary Marcus keeps noting how the model gets physics wrong in various places, and, well, yes, we all know, please cut it out with the Stop Having Fun.

Yishan points out that humans also work mostly on ‘folk physics.’ Most of the time humans are not ‘doing logic’ they are vibing and using heuristics. I presume our dreams, if mapped to videos, would if anything look far less realistic than Sora.

Yann LeCun, who only a few days previous said that video like Sora produces was not something we knew how to do, doubled down with the ship to say that none of this means the models ‘understand the physical world,’ and of course his approach is better because it does. Why update? Is all of this technically impressive?

Yes, Sora is definitely technically impressive.

It was not, however, unexpected.

Sam Altman: we’d like to show you what sora can do, please reply with captions for videos you’d like to see and we’ll start making some!

Eliezer Yudkowsky: 6 months left on this timer.

Eliezer Yudkowsky (August 26, 2022): In 2-4 years, if we’re still alive, anytime you see a video this beautiful, your first thought will be to wonder whether it’s real or if the AI’s prompt was “beautiful video of 15 different moth species flapping their wings, professional photography, 8k, trending on Twitter”.

Roko (other thread): I don’t really understand why anyone is freaking out over Sora.

This is entirely to be expected given the existence of generative image models plus incrementally more hardware and engineering effort.

It’s also obviously not dangerous (in a “take over the world” sense).

Eliezer Yudkowsky: This is of course my own take (what with having explicitly predicted this). But I do think you want to hold out a space for others to say, “Well *Ididn’t predict it, and now I’ve updated.”

Altman’s account spent much of last Thursday making videos for people’s requests, although not so many that they couldn’t cherry pick the good ones.

As usual, there are failures that look stupid, mistakes ‘a person would never make’ and all that. And there are flashes of absolute brilliance.

How impressive? There are disputes.

Tom Warren: this could be the “holy shit” moment of AI. OpenAI has just announced Sora, its text-to-video AI model. This video isn’t real, it’s based on a prompt of “a cat waking up its sleeping owner demanding breakfast…” 🤯

Daniel Eth: This isn’t impressive. The owner doesn’t wake up, so the AI clearly didn’t understand the prompt and is instead just doing some statistical mimicking bullshit. Also, the owner isn’t demanding breakfast, as per the prompt, so the AI got that wrong too.

Davidad (distinct thread): Sora discourse is following this same pattern. You’ll see some safety people saying it’s confabulating all over the place (it does sometimes – it’s not reliably controllable), & some safety people saying it clearly understands physics (like humans, it has a latent “folk physics”)

On the other side, you’ll see some accelerationist types claiming it must be built on a video game engine (not real physics! unreal! synthetic data is working! moar! faster! lol @ ppl who think this could be used to do something dangerous!?!), & some just straightforward praise (lfg!)

One can also check out this thread for more discussion.

near: playing w/ openai sora more this weekend broken physics and english wont matter if the content is this good – hollywood may truly be done for.

literally this easy to get thousands of likes fellas you think people will believe ai content is real. I think people will believe real content is ai we are not the same.

Emmett Shear (other thread, linking to a now-deleted video): The fact you can fool people with misdirection doesn’t tell you much either way.

[EDIT: In case it was not sufficiently clear from context, yes everyone talking here knows this is not AI generated, which is the point.]

This video is my pick for most uncanny valley spooky. This one’s low key cool.

Nick St. Pierre has a fascinating thread where he goes through the early Sora videos that were made in response to user requests. In each case, when fed the identical prompt, MidJourney generates static images remarkably close to the baseline image in the Sora video.

Gabor Cselle asks Gemini 1.5 about a Sora video, Gemini points out some inconsistencies. AI detectors of fake videos should be very good for some time. This is one area where I expect evaluation to be much easier than generation. Also Gemini 1.5 seems good at this sort of thing, based on that response.

Stephen Balaban takes Sora’s performance scaling with compute and its general capabilities as the strongest evidence yet that simple scaling will get us to AGI (not a position I share, this did not update me much), and thinks we are only 1-2 orders of magnitude away. He then says he is ‘not an AI doomer’ and is ‘on the side of computational and scientific freedom’ but is concerned because that future is highly unpredictable. Yes, well.

What are we going to do with this ability to make videos?

At what look like Sora’s current capabilities level? Seems like not a lot.

I strongly agree with Sully here:

Matt Turck: Movie watching experience

2005: Go to a movie theater.

2015: Stream Netflix.

2025: ask LLM + text-to-video to create a new season of Narcos to watch tonight, but have it take place in Syria with Brad Pitt, Mr. Beast and Travis Kelce in the leading roles.

Sully: Hot take: most ppl won’t make their movies/shows until we can read minds most people are boring/lazy.

They want to come home, & be spoon fed a show/movie/music.

Value accrual will happen at the distribution end (Netflix,Spotify, etc), since they already know you preferences.

Xeophon: And a big part is the social aspect. You cannot talk with your friends about a movie if everyone saw a totally different thing. Memes and internet culture wouldn’t work, either.

John Rush: you’re 100% right. the best example is the modern UX. Which went from 1) lots of actions(filters, categories, search) (blogs) 2) to little action: scroll (fb) 3) to no action: auto-playing stories (inst/tiktok)

I do not think that Sora and its ilk will be anywhere near ready, by 2025, to create actually watchable content, in the sense of anyone sane wanting to watch it. That goes double for things generated directly from prompts, rather than bespoke transformations and expansions of existing creative work, and some forms of customization, dials or switches you can turn or flip, that are made much easier to assemble, configure and serve.

I do think there’s a lot of things that can be done. But I think there is a rather large period where ‘use AI methods to make tweaks possible and practical’ is good, but almost no one in practice wants much more than that.

I think there is this huge benefit to knowing that the thing was specifically made by a particular set of people, and seeing their choices, and having everything exist in that context. And I do think we will mostly want to retain the social reference points and interactions, including for games. There is a ton of value there. You want to compare your experience to someone else’s. That does not mean that AI couldn’t get sufficiently good to overcome that, but I think the threshold is high.

As a concrete example, right now I am watching the show Severance on Apple TV. So far I have liked it a lot, but the ways it is good are intertwined with it being a show written by humans, and those creators making choices to tell stories and explore concepts. If an AI managed to come up with the same exact show, I would be super impressed by that to be sure, but also the show would not be speaking to me in the same way.

Ryan Moulton: There is a huge gap in generative AI between the quality you observe when you’re playing with it open endedly, and the quality you observe when you try to use it for a task where you have a specific end goal in mind. This is I think where most of the hype/reality mismatch occurs.

PoliMath (distinct thread): I am begging anyone to take one scene from any movie and recreate it with Sora Any movie. Anything at all. Taxi Driver, Mean Girls, Scott Pilgrim, Sonic the Hedgehog, Buster Keaton. Anything.

People are being idiots in the replies here so I’ll clarify: The comment was “everyone will be filmmakers” with AI No they won’t.

Everyone will be able to output random video that mostly kind of evokes the scene they are describing.

That is not filmmaking.

If you’ve worked with AI generation on images or text, you know this is true. Try getting ChatGPT to output even tepidly interesting dialogue about any specific topic. Put a specific image in your head and try to get Midjourney to give you that image.

Same thing with image generation. When I want something specific, I expect to be frustrated and disappointed. When I want anything at all within a vibe zone, when variations are welcomed, often the results are great.

Will we get there with video? Yes I think we will, via modifications and edits and general advancements, and incorporating AI agents to implement the multi-step process. But let’s not get ahead of ourselves.

The contrast and flip side is then games. Games are a very different art form. We should expect games to continue to improve in some ways relative to non-interactive experiences, including transitioning to full AR/VR worlds, with intelligent other characters, more complex plots that give you more interactive options and adapt to your choices, general awesomeness. It is going to be super cool, but it won’t be replacing Netflix.

Tyler Cowen asked what the main commercial uses will be. The answers seem to be that they enable cheap quick videos in the style of TikTok or YouTube, or perhaps a music video. Quality available for dirt cheap may go up.

Also they enable changing elements of a video. The example in the technical paper was to turn the area around a driving car into a jungle, others speculate about de-aging actors or substituting new ones.

I think this will be harder here than in many other cases. With text, with images and with sound, I saw the mundane utility. Here I mostly don’t.

At a minimum it will take time. These tools are nowhere near being able to reproduce existing high quality outputs. So instead, the question becomes what we can do with the new inputs, to produce what kinds of new outputs that people still value.

Tyler posted his analysis a few days later, saying it has profound implications for ‘all sorts of industries’ but will hit the media first, especially advertising, although he agrees it will not put Hollywood out of business. I agree that this makes ‘have something vaguely evocative you can use as an advertisement’ will get easier and cheaper, I suppose, when people want that.

Others are also far more excited than I am. Anton says Tesla should go all-in on this due to its access to video data from drivers, and buy every GPU at any price to do more video. I would not be doing that.

Grimes: Cinema – the most prohibitively expensive art form (but also the greatest and most profound) – is about to be completely democratized the way music was with DAW’s.

(Without DAW’s like ableton, GarageBand, logic etc – grimes and most current artists wouldn’t exist).

Crucifore (distinct thread): I’m still genuinely perplexed by people saying Sora etc is the “end of Hollywood.” Crafting a story is very different than generating an image.

Alex Tabarrok: Crafting a story is a more distributed skill than the capital intensive task of making a movie.

Thus, by democratizing the latter, Sora et al. give a shot to the former which will mean a less Hollywood centric industry, much as Youtube has drawn from TV studios.

Matt Darling: Worth noting that YouTube is also sort of fundamentally a different product than TV. The interesting question is less “can you do movies with AI?” and more “what can we do now that we couldn’t before?”.

Alex Tabarrok: Yes, exactly; but attention is a scarce resource.

Andrew Curran says it can do graph design and notes it can generate static images. He is super excited, thread has examples.

I still don’t see it. I mean, yes, super impressive, big progress leap in the area, but still seems a long way from where it needs to be.

Of course, ‘a long way’ often translates in this business to ‘a few years,’ but I still expect this to be a small part of the picture compared to text, or for a while even images or voice.

Here’s a concrete question:

Daniel Eth: If you think sora is better than what you expected, does that mean you should buy Netflix or short Netflix? Legitimately curious what finance people think here.

My guess is little impact for a while. My gut says net negative, because it helps Netflix’s competition more than it helps Netflix.

What will the future bring? Here is scattershot prediction fun on what will happen at the end of 2025:

Cost is going to be a practical issue. $0.50 per minute is tiny for some purposes, but it is also a lot for others, especially if you cannot get good results zero-shot and have to do iterations and modifications, or if you are realistically only going to see it once.

I continue to think that text-to-video has a long way to go before it offers much mundane utility. Text should remain dominant, then multimodality with text including audio generation, then images, only then video. For a while, when we do get video, I expect it to largely in practice be based off of bespoke static images, real and otherwise, rather than the current text-to-video idea. The full thing will eventually get there, but I expect a (relatively, in AI timeline terms) long road, and this is a case where looking for anything at all loses out most often to looking for something specific.

But also, perhaps, I am wrong. I have been a video skeptic in many ways long before AI. There are some uses for ‘random cool video vaguely in this area of thing.’ And if AI video becomes a major use case, that seems mostly good, as it will be relatively easy to spot and otherwise less dangerous, and let’s face it, video is cool.

So prove me wrong, kids. Prove me wrong.

Sora What Read More »

ai-#52:-oops

AI #52: Oops

We were treated to technical marvels this week.

At Google, they announced Gemini Pro 1.5, with a million token context window within which it has excellent recall, using mixture of experts to get Gemini Advanced level performance (e.g. GPT-4 level) out of Gemini Pro levels of compute. This is a big deal, and I think people are sleeping on it. Also they released new small open weights models that look to be state of the art.

At OpenAI, they announced Sora, a new text-to-video model that is a large leap from the previous state of the art. I continue to be a skeptic on the mundane utility of video models relative to other AI use cases, and think they still have a long way to go, but this was both technically impressive and super cool.

Also, in both places, mistakes were made.

At OpenAI, ChatGPT briefly lost its damn mind. For a day, faced with record traffic, the model would degenerate into nonsense. It was annoying, and a warning about putting our trust in such systems and the things that can go wrong, but in this particular context it was weird and beautiful and also hilarious. This has now been fixed.

At Google, people noticed that Gemini Has a Problem. In particular, its image generator was making some highly systematic errors and flagrantly disregarding user requests, also lying about it to users, and once it got people’s attention things kept looking worse and worse. Google has, to their credit, responded by disabling entirely the ability of their image model to output people until they can find a fix.

I hope both serve as important warnings, and allow us to fix problems. Much better to face such issues now, when the stakes are low.

Covered separately: Gemini Has a Problem, Sora What, and Gemini 1.5 Pro.

  1. Introduction. We’ve got some good news, and some bad news.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Probable probabilities?

  4. Language Models Don’t Offer Mundane Utility. Air Canada finds out.

  5. Call me Gemma Now. Google offers new state of the art tiny open weight models.

  6. Google Offerings Keep Coming and Changing Names. What a deal.

  7. GPT-4 Goes Crazy. But it’s feeling much better now.

  8. GPT-4 Real This Time. Offer feedback on GPTs, see their profiles.

  9. Fun With Image Generation. Image generation for journal articles.

  10. Deepfaketown and Botpocalypse Soon. Several approaches to impersonation risks.

  11. Selling Your Chatbot Data. I don’t really know what you were expecting.

  12. Selling Your Training Data. I still don’t really know what you were expecting.

  13. They Took Our Jobs. There is a third option.

  14. Get Involved. Apart Research is hiring.

  15. Introducing. Groq, Lindy, Podcaster Copilot, potentially Magic and Altera.

  16. In Other AI News. Altman looks to move his chip plans forward.

  17. Quiet Speculations. Arguing over slow versus fast takeoff during takeoff.

  18. The Quest for Sane Regulations. There will be many bills along the way.

  19. The Week in Audio. I’m back on the Cognitive Revolution.

  20. The Original Butlerian Jihad. What was Dune a cautionary tale against again?

  21. Rhetorical Innovation. Another open letter, another trillion dollars. Ho hum.

  22. Public Service Announcement. Fentanyl, both literally and as metaphor.

  23. People Are Worried About AI Killing Everyone. There is a pattern to who.

  24. Other People Are Not As Worried About AI Killing Everyone. Sure, why not.

  25. The Lighter Side. There is not enough information to solve the problem.

Steven Johnson strongly endorses NotebookLM, offers YouTube tutorial. This is definitely one of those ‘I need to try using this more and it’s weird I don’t find excuses’ situations.

Automatically email everyone to tell them to remove your email address from their database.

Patrick McKenzie: Interestingly, one of the first denial of service via AI-powered amplification events that I’ve seen personally is an app that users let puppet their Gmail account then send officially flavored deletion requests to everyone the AI thinks probably has their email in a database.

If one believes one is subject to regulatory regimes, or has compatible moral intuitions, requiring a deletion request immediately cause bespoke work, that turns two minutes of user activity into ~thousands of companies each needing to do variable amounts of work.

Get probabilities, LessWrong style, by asking for LessWrong style norms of probabilities?

Daniel Eth:

From ChatGPT: ChatGPT

Confidence Levels in the Existence of Chinese Dynasties

Xia Dynasty: ~50-70%

Three Sovereigns and Five Emperors: ~10-20%

Shang Dynasty: ~95-99%

Zhou Dynasty: ~99%

Qin Dynasty: ~99.9%

Han Dynasty: ~99.9%

These confidence levels are approximations based on current archaeological evidence and historical scholarship.

(I should clarify that I used a bit of prompt engineering to get it to guess and then asked it to repeat the results without its explanations, but nevertheless these were all numbers that it came up with)

Specifically, here’s the prompt engineering I did:

“Assume LessWrong style norms of probabilities – approximately how confident is it reasonable for a person to be in the existence of each of these dynasties? It’s okay to be wrong, just give a reasonable answer for each.”

He also tested for situational awareness by having it estimate there was a 70% chance it was the victim of RLHF, with a 30% chance it was the base model. It asks some reasonable questions, but fails to ask about base rates of inference, so it gets 70% rather than 99%.

I have added this to my custom instructions.

There are also active AI forecasters on Manifold, who try to generate their own predictions using various reasoning processes. Do they have alpha? It is impossible to say given the data we have, they clearly do some smart things and also some highly dumb things. Trading strategies will be key, as they will fall into traps hardcore if they are allowed to, blowing them up, even if they get a lot better than they are now.

I continue to be curious to build a Manifold bot, but I would use other principles. If anyone wants to help code one for me to the point I can start tweaking it in exchange for eternal ephemeral glory and a good time, and perhaps a share of the mana profits, let me know.

Realize, after sufficient prodding, that letting them see your move in Rock-Paper-Scissors might indeed be this thing we call ‘cheating.’

Why are they so often so annoying?

Emmett Shear: How do we RLHF these LLMs until they stop blaming the user and admit that the problem is that they are unsure? Where does the smug, definitive, overconfident tone that all the LLMs have come from?

Nate Silver: It’s quite similar to the tone in mainstream, center-left political media, and it’s worth thinking about how the AI labs and the center-left media have the same constituents to please.

Did you know they are a student at the University of Michigan? Underlying claim about who is selling what data is disputed, the phenomenon of things being patterns in the data is real either way.

Davidad: this aligns with @goodside’s recollection to me once that a certain base model responded to “what do you do?” with “I’m a student at the University of Michigan.”

My explanation is that if you’re sampling humans weighted by the ratio of their ability to contribute English-language training data to the opportunity cost of their time per marginal hour, “UMich student” is one of the dominant modes.

Timothy Lee asks Gemini Advanced as his first prompt a simple question designed to trick it, where it really shouldn’t get tricked, it gets tricked.

You know what? I am proud of Google for not fixing this. It would be very easy for Google to say, this is embarrassing, someone get a new fine tuning set and make sure it never makes this style of mistake again. It’s not like it would be that hard. It also never matters in practice.

This is a different kind of M&M test, where they tell you to take out all the green M&Ms, and then you tell them, ‘no, that’s stupid, we’re not doing that.’ Whether or not they should consider this good news is another question.

Air Canada forced to honor partial refund policy invented by its chatbot. The website directly contradicted the bot, but the judge ruled that there was no reason a customer should trust the rest of the website rather than the chatbot. I mean, there is, it is a chatbot, but hey.

Chris Farnell: Science fiction writers: The legal case for robot personhood will be made when a robot goes on trial for murder. Reality: The legal case for robot personhood will be made when an airline wants to get out of paying a refund.

While I fully support this ruling, I do not think that matter was settled. If you offer a chatbot to customers, they use it in good faith and it messes up via a plausible but incorrect answer, that should indeed be on you. Only fair.

Matt Levine points out that this was the AI acting like a human, versus a corporation trying to follow an official policy:

The funny thing is that the chatbot is more human than Air Canada. Air Canada is a corporation, an emergent entity that is made up of people but that does things that people, left to themselves, would not do. The chatbot is a language model; it is in the business of saying the sorts of things that people plausibly might say. If you just woke up one day representing Air Canada in a customer-service chat, and the customer said “my grandmother died, can I book a full-fare flight and then request the bereavement fare later,” you would probably say “yes, I’m sorry for your loss, I’m sure I can take care of that for you.” Because you are a person!

The chatbot is decent at predicting what people would do, and it accurately gave that answer. But that’s not Air Canada’s answer, because Air Canada is not a person.

The question is, what if the bot had given an unreasonable answer? What if the customer had used various tricks to get the bot to, as happened in another example, sell a car for $1 ‘in a legally binding contract’? Is there an inherent ‘who are you kidding?’ clause here, or not, and if there is how far does it go?

One can also ask whether a good disclaimer could get around this. The argument was that there was no reason to doubt the chatbot, but it would be easy to give a very explicit reason to doubt the chatbot.

A wise memo to everyone attempting to show off their new GitHub repo:

Liz Lovelace: very correct take, developers take note

Paul Calcraft: I loved this thread so much. People in there claiming that anyone who could use a computer should find it easy enough to Google a few things, set up Make, compile it and get on with it Great curse of knowledge demo.

Code of Kai: This is the correct take even for developers. Developers don’t seem to realise how much of their time is spent learning how to use their tools compared to solving problems. The ratio is unacceptable.

Look, I know that if I did it a few times I would be over it and everything would be second nature but I keep finding excuses not to suck it up and do those few times. And if this is discouraging me, how many others is it discouraging?

Gemma, Google’s miniature 2b and 7b open model weights language models, are now available.

Demis Hassabis: We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini.

I have no problems with this. Miniature models, at their current capabilities levels, are exactly a place where being open has relatively more benefits and minimal costs.

I also think them for not calling it Gemini, because even if no one else cares, there should be exactly two models called Gemini. Not one, not three, not four. Two. Call them Pro and Ultra if you insist, that’s fine, as long as there are two. Alas.

In the LLM Benchmark page it is now ranked #1 although it seems one older model may be missing:

As usual, benchmarks tell you a little something but are often highly misleading. This does not tell us whether Google is now state of the art for these model sizes, but I expect that this is indeed the case.

Thomas Kurian: We’re announcing Duet AI for Google Workspace will now be Gemini for Google Workspace. Consumers and organizations of all sizes can access Gemini across the Workspace apps they know and love.

We’re introducing a new offering called Gemini Business, which lets organizations use generative AI in Workspace at a lower price point than Gemini Enterprise, which replaces Duet AI for Workspace Enterprise.

We’re also beginning to roll out a new way for Gemini for Workspace customers to chat with Gemini, featuring enterprise-grade data protections.

Lastly, consumers can now access Gemini in their personal Gmail, Docs, Slides, Sheets, and Meet apps through a Google One AI Premium subscription.

Sundar Pichai (CEO Google): More Gemini news: Starting today, Gemini for Workspace is available to businesses of all sizes, and consumers can now access Gemini in their personal Gmail, Docs and more through a Google One AI Premium subscription.

This seems like exactly what individuals can get, except you buy in bulk for your business?

To be clear, that is a pretty good product. Google will be getting my $20 per month for the individual version, called ‘Google One.’

Now, in addition to Gemini Ultra, you also get Gemini other places like GMail and Google Docs and Google Meet, and various other fringe benefits like 2 TB of storage and longer Google Meet sessions.

Alyssa Vance: Wow, I got GPT-4 to go absolutely nuts. (The prompt was me asking about mattresses in East Asia vs. the West).

Cate Hall: “Yoga on a great repose than the neared note, the note was a foreman and the aim of the aim” is my favorite Fiona Apple album.

Andriy Burkov: OpenAI has broken GPT-4. It ends each reply with hallucinated garbage and doesn’t stop generating it.

Matt Palmer: So this is how it begins, huh?

Nik Sareen: it was speaking to me in Thai poetry an hour ago.

Sean McGuire: ChatGPT is apparently going off the rails right now [8:32pm February 20] and no one can explain why.

the chatgptsubreddit is filled with people wondering why it started suddenly speaking Spanglish, threatened the user (I’m in the room with you right now, lmao) or started straight up babbling.

Esplin: ChatGPT Enterprise has lost its mind

Grace Kind: So, did your fuzz testing prepare you for the case where the API you rely on loses its mind?

But don’t worry. Everything’s fine now.

ChatGPT (Twitter account, February 21 1: 30pm): went a little off the rails yesterday but should be back and operational!

Danielle Fong: Me when I overdid it with the edibles.

What the hell happened?

Here is their official postmortem, posted a few hours later. It says the issue was resolved on February 21 at 2: 14am eastern time.

Postmortem: On February 20, 2024, an optimization to the user experience introduced a bug with how the model processes language.

LLMs generate responses by randomly sampling words based in part on probabilities. Their “language” consists of numbers that map to tokens.

In this case, the bug was in the step where the model chooses these numbers. Akin to being lost in translation, the model chose slightly wrong numbers, which produced word sequences that made no sense. More technically, inference kernels produced incorrect results when used in certain GPU configurations.

Upon identifying the cause of this incident, we rolled out a fix and confirmed that the incident was resolved.

Davidad hazarded a guess before that announcement, which he thinks now looks good.

Nora Belrose: I’ll go on the record as saying I expect this to be caused by some very stupid-in-retrospect bug in their inference or fine tuning code.

Unfortunately they may never tell us what it was.

Davidad: My modal prediction: something that was regularizing against entropy got sign-flipped to regularize *in favorof entropy. Sign errors are common; sign errors about entropy doubly so.

I predict that the weights were *notcorrupted (by fine-tuning or otherwise), only sampling.

If it were just a mistakenly edited scalar parameter like temperature or top-p, it would probably have been easier to spot and fix quickly. More likely an interaction between components. Possibly involving concurrency, although they’d probably be hesitant to tell us about that.

But it’s widely known that temperature 0.0 is still nondeterministic because of a wontfix race condition in the sampler.

oh also OpenAI in particular has previously made a sign error that people were exposed to for hours before it got reverted.

[announcement was made]

I’m feeling pretty good about my guesses that ChatGPT’s latest bug was:

an inference-only issue

not corrupted weights

not a misconfigured scalar

possibly concurrency involved

they’re not gonna tell us about the concurrency (Not a sign flip, though)

Here’s my new guess: they migrated from 8-GPU processes to 4-GPU processes to improve availability. The MoE has 8 experts. Somewhere they divided logits by the number of GPUs being combined instead of the number of experts being combined. Maybe the 1-GPU config was special-cased so the bug didn’t show up in the dev environment.

Err, from 4-GPU to 8-GPU processes, I guess, because logits are *dividedby temperature, so that’s the direction that would result in accidentally doubling temperature. See this is hard to think about properly.

John Pressman says it was always obviously a sampling bug, although saying that after the postmortem announcement scores no Bayes points. I do agree that this clearly was not an RLHF issue, that would have manifested very differently.

Roon looks on the bright side of life.

Roon: it is pretty amazing that gpt produces legible output that’s still following instructions despite sampling bug

Should we be concerned more generally? Some say yes.

Connor Leahy: Really cool how our most advanced AI systems can just randomly develop unpredictable insanity and the developer has no idea why. Very reassuring for the future.

Steve Strickland: Any insight into what’s happened here Connor? I know neural nets/transformers are fundamentally black boxes. But seems strange that an LLM that’s been generating grammatically perfect text for over a year would suddenly start spewing out garbage.

Connor Leahy: Nah LLMs do shit like this all the time. They are alien machine blobs wearing masks, and it’s easy for the mask to slip.

Simeon (distinct thread): Sure, maybe we fucked up hard this ONE TIME a deployment update to hundreds of million of users BUT we’ll definitely succeed at a dangerous AGI deployment.

Was this a stupid typo or bug in the code, or some parameter being set wrong somewhere by accident, or something else dumb? Seems highly plausible that it was.

Should that bring us comfort? I would say it should not. Dumb mistakes happen. Bugs and typos that look dumb in hindsight happen. There are many examples of dumb mistakes changing key outcomes in history, determining the fates of nations. If all it takes is one dumb mistake to make GPT-4 go crazy, and it takes us a day to fix it when this error does not in any way make the system try to stop you from fixing it, then that is not a good sign.

GPT-4-Turbo rate limits have been doubled, daily limits removed.

You can now rate GPTs and offer private feedback to the builder. Also there’s a new about section:

OpenAI: GPTs ‘About’ section can now include:

∙ Builder social profiles

∙ Ratings

∙ Categories

∙ # of conversations

∙ Conversation starters

∙ Other GPTs by the builder

Short explanation that AI models tend to get worse over time because taking into account user feedback makes models worse. It degrades their reasoning abilities such as chain of thought, and generally forces them to converge towards a constant style and single mode of being, because the metric of ‘positive binary feedback’ points in that direction. RLHF over time reliably gets us something we like less and is less aligned to what we actually want, even when there is no risk in the room.

The short term implication is easy, it is to be highly stingy and careful with your RLHF feedback. Use it in your initial fine-tuning if you don’t have anything better, but the moment you have what you need, stop.

The long term implication is to reinforce that the strategy absolutely does not scale.

Emmett Shear:

What I learned from posting this is that people have no idea how RLHF actually works.

Matt Bateman: Not sure how you parent but whenever 3yo makes a mistake I schedule a lobotomy.

Emmett Shear: Junior started using some bad words at school, but no worries we can flatten that part of the mindscape real quick, just a little off the top. I’m sure there won’t be any lasting consequences.

What we actually do to children isn’t as bad as RLHF, but it is bad enough, as I often discuss in my education roundups. What we see happening to children as they go through the school system is remarkably similar, in many ways, to what happens to an AI as it goes through fine tuning.

Andres Sandberg explores using image generation for journal articles, finds it goes too much on vibes versus logic, but sees rapid progress. Expects this kind of thing to be useful within a year or two.

ElevenLabs is preparing for the election year by creating a ‘no-go voices’ list, starting with the presidential and prime minister candidates in the US and UK. I love this approach. Most of the danger is in a handful of voices, especially Biden and Trump, so detect those and block them. One could expand this by allowing those who care to have their voices added to the block list.

On the flip side, you can share your voice intentionally and earn passive income, choosing how much you charge.

The FTC wants to crack down on impersonation. Bloomberg also has a summary.

FTC: The Federal Trade Commission is seeking public comment on a supplemental notice of proposed rulemaking that would prohibit the impersonation of individuals. The proposed rule changes would extend protections of the new rule on government and business impersonation that is being finalized by the Commission today.

It is odd that this requires a rules change? I would think that impersonating an individual, with intent to fool someone, would already be not allowed and also fraud.

Indeed, Gemini says that there are no new prohibitions here. All this does is make it a lot easier for the FTC to get monetary relief. Before, they could get injunctive relief, but at this scale that doesn’t work well, and getting money was a two step process.

Similarly, how are we only largely getting around to punishing these things now:

For example, the rule would enable the FTC to directly seek monetary relief in federal court from scammers that:

  • Use government seals or business logos when communicating with consumers by mail or online.

  • Spoof government and business emails and web addresses, including spoofing “.gov” email addresses or using lookalike email addresses or websites that rely on misspellings of a company’s name.

  • Falsely imply government or business affiliation by using terms that are known to be affiliated with a government agency or business (e.g., stating “I’m calling from the Clerk’s Office” to falsely imply affiliation with a court of law).  

I mean those all seem pretty bad. It does seem logical to allow direct fines.

The question is, how far to take this? They propose quite far:

The Commission is also seeking comment on whether the revised rule should declare it unlawful for a firm, such as an AI platform that creates images, video, or text, to provide goods or services that they know or have reason to know is being used to harm consumers through impersonation.

How do you prevent your service from being used in part for impersonation? I have absolutely no idea. Seems like a de facto ban on AI voice services that do not lock down the list of available voices. Which also means a de facto ban on all open model weights voice creation software. Image generation software would have to be locked down rather tightly as well once it passes a quality threshold, with MidJourney at least on the edge. Video is safe for now, but only because it is not yet good enough.

There is no easy answer here. Either we allow tools that enable the creation of things that seem real, or we do not. If we do, then people will use them for fraud and impersonation. If we do not, then that means banning them, which means severe restrictions on video, voice and image models.

Seva worries primarily not about fake things taken to be potentially real, but about real things taken to be potentially fake. And I think this is right. The demand for fakes is mostly for low-quality fakes, whereas if we can constantly call anything fake we have a big problem.

Seva: I continue to think the bigger threat of deepfakes is not in convincing people that fake things are real but in offering plausible suspicion that real things are fake.

Being able to deny an objective reality is much more pernicious than looking for evidence to embrace an alternate reality, which is something people do anyway even when that evidence is flimsy.

Like I would bet socialization, or cognitive heuristics like anchoring effects, drive disinfo much more than deepfakes.

Albert Pinto: Daniel dennet laid out his case for erosion of trust (between reality and fake) is gigantic effect of AI

Seva: man I’m going to miss living in a high trust society.

We are already seeing this effect, such as here (yes it was clearly real to me, but that potentially makes the point stronger):

Daniel Eth: Like, is this real? Is it AI-generated? I think it’s probably real, but only because, a) I don’t have super strong priors against this happening, b) it’s longer than most AI-generated videos and plus it has sound, and c) I mildly trust @AMAZlNGNATURE

I do expect us to be able to adapt. We can develop various ways to show or prove that something is genuine, and establish sources of trust.

One question is, will this end up being good for our epistemics and trustworthiness exactly because they will now be necessary?

Right now, you can be imprecise and sloppy, and occasionally make stuff up, and we can find that useful, because we can use our common sense and ability to differentiate reality, and the crowd can examine details to determine if something is fake. The best part about community notes, for me, is that if there is a post with tons of views, and it does not have a note, then that is itself strong evidence.

In the future, it will become extremely valuable to be a trustworthy source. If you are someone who maintains the chain of epistemic certainty and uncertainty, who makes it clear what we know and how we know it and how much we should trust different statements, then you will be useful. If not, then not. And people may be effectively white-listing sources that they can trust, and doing various second and third order calculations on top of that in various ways.

The flip side is that this could make it extremely difficult to ‘break into’ the information space. You will have to build your credibility the same way you have to build your credit score.

In case you did not realize, the AI companion (AI girlfriend and AI boyfriend and AI nonbinary friends even though I oddly have not heard mention of one yet, and so on, but that’s a mouthful ) aps absolutely 100% are harvesting all the personal information you put into the chat, most of them are selling it and a majority won’t let you delete it. If you are acting surprised, that is on you.

The best version of this, of course, would be to gather your data to set you up on dates.

Cause, you know, when one uses a chatbot to talk to thousands of unsuspecting women so you can get dates, ‘they’ say there are ‘ethical concerns.’

Whereas if all those chumps are talking to the AIs on purpose? Then we know they’re lonely, probably desperate, and sharing all sorts of details to help figure out who might be a good match. There are so many good options for who to charge the money.

The alternative is that if you charge enough money, you do not need another revenue stream, and some uses of such bots more obviously demand privacy. If you are paying $20 a month to chat with an AI Riley Reid, that would not have been my move, but at a minimum you presumably want to keep that to yourself.

An underappreciated AI safety cause subarea is convincing responsible companies to allow adult content in a responsible way, including in these bots. The alternative is to drive that large market into the hands of irresponsible actors, who will do it in an irresponsible way.

AI companion data is only a special case, although one in which the privacy violation is unusually glaring, and the risks more obvious.

Various companies also stand ready to sell your words and other outputs as training data.

Reddit is selling its corpus. Which everyone was already using anyway, so it is not clear that this changes anything. It turns out that it is selling it to Google, in a $60 million deal. If this means that their rivals cannot use Reddit data, OpenAI and Microsoft in particular, that seems like an absolute steal.

Artist finds out that Pond5 and Shutterstock are going to sell your work and give you some cash, in this case $50, via a checkbox that will default to yes, and they will not let you tell them different after the money shows up uninvited. This is such a weird middle ground. If they had not paid, would the artist have ever found out? This looks to be largely due to an agreement Shutterstock signed with OpenAI back in July that caused its stock to soar 9%.

Pablo Taskbar: Thinking of a startup to develop an AI program to look for checkboxes in terms and condition documents.

Bernell Loeb: Same thing happened with my web host, Squarespace. Found out from twitter that Squarespace allowed ai to scrape our work. No notice given (no checks either). When I contacted them to object, I was told that I had to “opt out” without ever being told I was already opted in.

Santynieto: Happened to me with @SubstackInc: checking the preferences of my publication, I discovered a new, never announced-before setting by the same name, also checked, as if I had somehow “chosen”, without knowing abt it at all, to make my writing available for data training. I hate it!

I checked that last one. There is a box that is unchecked that says ‘block AI training.’

I am choosing to leave the box unchecked. Train on my writing all you want. But that is a choice that I am making, with my eyes open.

Why yes. Yes I do, actually.

Gabe: by 2029 the only jobs left will be bank robber, robot supervisor, and sam altman

Sam Altman: You want that last one? It’s kinda hard sometimes.

Apart Research, who got an ACX grant, is hiring for AI safety work. I have not looked into them myself and am passing along purely on the strength of Scott’s grant alone.

Lindy is now available to everyone, signups here. I am curious to try it, but oddly I have no idea what it would be useful to me to have this do.

Groq.com will give you LLM outputs super fast. From a creator of TPUs, they claim to have Language Processing Units (LPUs) that are vastly faster at inference. They do not offer model training, suggesting LPUs are specifically good at inference. If this is the future, that still encourages training much larger models, since such models would then be more commercially viable to use.

Podcaster copilot. Get suggested questions and important context in real time during a conversation. This is one of those use cases where you need to be very good relative to your baseline to be net positive to rely on it all that much, because it requires splitting your attention and risks disrupting flow. When I think about how I would want to use a copilot, I would want it to fact check claims, highlight bold statements with potential lines of response, perhaps note evasiveness, and ideally check for repetitiveness. Are your questions already asked in another podcast, or in their written materials? Then I want to know the answer now, especially important with someone like Tyler Cowen, where the challenge is to get a genuinely new response.

Claim that magic.dev has trained a groundbreaking model for AI coding, Nat Friedman is investing $100 million.

Nat Friedman: Magic.dev has trained a groundbreaking model with many millions of tokens of context that performed far better in our evals than anything we’ve tried before.

They’re using it to build an advanced AI programmer that can reason over your entire codebase and the transitive closure of your dependency tree. If this sounds like magic… well, you get it. Daniel and I were so impressed, we are investing $100M in the company today.

The team is intensely smart and hard-working. Building an AI programmer is both self-evidently valuable and intrinsically self-improving.

Intrinsically self-improving? Ut oh.

Altera Bot, an agent in Minecraft that they claim can talk to and collaboratively play with other people. They have a beta waitlist.

Sam Altman seeks Washington’s approval to build state of the art chips in the UAE. It seems there are some anti-trust concerns regarding OpenAI, which seems like it is not at all the thing to be worried about here. I continue to not understand how Washington is not telling Altman that under no way in hell is he going to do this in the UAE, he can either at least friend-shore it or it isn’t happening.

Apple looking to add AI to iPad interface and offer new AI programming tools, but progress continues to be slow. No mention of AI for the Apple Vision Pro.

More on the Copyright Confrontation from James Grimmelmann, warning that AI companies must take copyright seriously, and that even occasional regurgitation or reproduction of copyrighted work is a serious problem from a legal perspective. The good news in his view is that judges will likely want to look favorably upon OpenAI because it offers a genuinely new and useful transformative product. But it is tricky, and coming out arguing the copying is not relevant would be a serious mistake.

This is Connor Leahy discussing Gemini’s ability to find everything in a 3 hour video.

Connor Leahy: This is the kind of stuff that makes me think that there will be no period of sorta stupid, human-level AGI. Humans can’t perceive 3 hours of video at the same time. The first AGI will instantly be vastly superhuman at many, many relevant things.

Richard Ngo: “This is exactly what makes me think there won’t be any slightly stupid human-level AGI.” – Connor when someone shows him a slightly stupid human-level AGI, probably.

You are in the middle of a slow takeoff pointing to the slow takeoff as evidence against slow takeoffs.

Connor Leahy: By most people’s understanding, we are in a fast takeoff. And even by Paul’s definition, unless you expect a GDP doubling in 4 years before a 1 year doubling, we are in fast takeoff. So, when do you expect this doubling to happen?

Richard Ngo: I do in fact expect an 8-year GDP doubling before a 2-year GDP doubling. I’d low-confidence guess US GDP will be double its current value in 10-15 years, and then the next few doublings will be faster (but not *thatmuch faster, because GDP will stop tracking total output).

Slow is relative. It also could be temporary.

If world GDP doubles in the next four years without doubling in one, that is a distinct thing from historical use of the term ‘fast takeoff,’ because the term ‘fast takeoff’ historically means something much faster than that. It would still be ‘pretty damn fast,’ or one can think of it simply as ‘takeoff.’ Or we could say ‘gradual takeoff’ as the third slower thing.

I do not only continue to think that we not mock those who expected everything in AI to happen all at once with little warning, with ASI emerging in weeks, days or even hours, without that much mundane utility before that. I think that they could still be proven more right than those who are mocking them.

We have a bunch of visible ability and mundane utility now, so things definitely look like a ‘slow’ takeoff, but it could still functionally transform into a fast one with little warning. It seems totally reasonable to say that AI is rapidly getting many very large advantages with respect to humans, so if it gets to ‘roughly human’ in the core intelligence module, whatever you want to call that, then suddenly things get out of hand fast, potentially the ‘fast takeoff’ level of fast even if you see more signs and portents first.

More thoughts on how to interpret OpenAI’s findings on bioweapon enabling capabilities of GPT-4. The more time passes, the more I think the results were actually pretty impressive in terms of enhancing researcher capabilities, and also that this mostly speaks to improving capabilities in general rather than anything specifically about a bioweapon.

How will AIs impact people’s expectations, of themselves and others?

Sarah (Little Ramblings): have we had the unrealistic body standards conversation around AI images / video yet that we had when they invented airbrushing? if not can we get it over with cause it’s gonna be exhausting

‘honey remember these women aren’t real!! but like literally, actually not real’

can’t wait for the raging body dysmorphia epidemic amongst teenagers trying to emulate the hip to waist ratio of women who not only don’t look like that in real life, but do not in fact exist in real life.

Eliezer Yudkowsky: I predict/guess: Unrealistic BODY standards won’t be the big problem. Unrealistic MIND standards will be the problem. “Why can’t you just be understanding and sympathetic, like my AR harem?”

Sarah: it’s kinda funny that people are interpreting this tweet as concern about people having unrealistic expectations of their partners, when I was expressing concern about people having unrealistic expectations of themselves.

Eliezer Yudkowsky: Valid. There’s probably a MIND version of that too, but it’s not as straightforward to see what it’ll be.

Did you know that we already already have 65 draft bills in New York alone that have been introduced related to AI? And also Axios had this stat to offer:

Zoom out: Hochul’s move is part of a wave of state-based AI legislation — now arriving at a rate of 50 bills per week — and often proposing criminal penalties for AI misuse.

That is quite a lot of bills. One should therefore obviously not get too excited in any direction when bills are introduced, no matter how good or (more often) terrible the bill might be, unless one has special reason to expect them to pass.

The governor pushing a law, as New York’s is now doing, is different. According to Axios her proposal is:

  • Making unauthorized uses of a person’s voice “in connection with advertising or trade” a misdemeanor offense. Such offenses are punishable by up to one year jail sentence.

  • Expanding New York’s penal law to include unauthorized uses of artificial intelligence in coercion, criminal impersonation and identity theft.

  • Amending existing intimate images and revenge porn statutes to include “digital images” — ranging from realistic Photoshop-produced work to advanced AI-generated content. 

  • Codifying the right to sue over digitally manipulated false images.

  • Requiring disclosures of AI use in all forms of political communication “including video recording, motion picture, film, audio recording, electronic image, photograph, text, or any technological representation of speech or conduct” within 60 days of an election.

As for this particular law? I mean, sure, all right, fine? I neither see anything especially useful or valuable here, nor do I see much in the way of downside.

Also, this is what happens when there is no one in charge and Congress is incapable of passing basic federal rules, not even around basic things like deepfakes and impersonation. The states will feel compelled to act. The whole ‘oppose any regulatory action of any kind no matter what’ stance was never going to fly.

Department of Justice’s Monaco says they will be more harshly prosecuting cybercrimes if those involved were using AI, similar to the use of a gun. I notice I am confused. Why would the use of AI make the crime worse?

Matthew Pines looks forward to proposals for ‘on-chip governance,’ with physical mechanisms built into the hardware, linking to a January proposal writeup from Aarne, Fist and Withers. As they point out, by putting the governance onto the chip where it can do its job in private, you potentially avoid having to do other interventions and surveillance that violates privacy far more. Even if you think there is nothing to ultimately fear, the regulations are coming in some form. People who worry about the downsides of AI regulation need to focus more on finding solutions that minimize such downsides and working to steer towards those choices, rather than saying ‘never do anything at all’ as loudly as possible until the breaking point comes.

European AI Office launches.

I’m back at The Cognitive Revolution to talk about recent events and the state of play. Also available on X.

Who exactly is missing the point here, you think?

Saberspark [responding to a Sora video]: In the Dune universe, humanity banned the “thinking machines” because they eroded our ability to create and think for ourselves. That these machines were ultimately a bastardization of humanity that did more harm than good.

Cactus: Sci-fi Author: in my book I showed the destruction of Thinking Machines as a cautionary tale. Twitter User: We should destroy the Thinking Machines from classic sci-fi novel Don’t Destroy the Thinking Machines

I asked Gemini Pro, Gemini Advanced, GPT-4 and Claude.

Everyone except Gemini Pro replied in the now standard bullet point style. Every point one was ‘yes, this is a cautionary tale against the dangers of AI.’ Gemini Pro explained that in detail, whereas the others instead glossed over the details and then went on to talk about plot convenience, power dynamics and the general ability to tell interesting stories focused on humans, which made it the clearly best answer.

Whereas most science fiction stories solve the problem of ‘why doesn’t AI invalidate the entire story’ with a ‘well that would invalidate the entire story so let us pretend that would not happen, probably without explaining why.’ There are of course obvious exceptions, such as the excellent Zones of Thought novels, that take the question seriously.

It’s been a while since we had an open letter about existential risk, so here you go. Nothing you would not expect, I was happy to sign it.

In other news (see last week for details if you don’t know the context for these):

Robert Wiblin: It’s very important we start the fire now before other people pour more gasoline on the house I say as I open my trillion-dollar gasoline refining and house spraying complex.

Meanwhile:

Sam Altman: fk it why not 8

our comms and legal teams love me so much!

This does tie back to AI, but also the actual core information seems underappreciated right now: Lukas explains that illegal drugs are now far more dangerous, and can randomly kill you, due to ubiquitous lacing with fentanyl.

Lukas: Then I learned it only takes like 1mg to kill you and I was like “hmmmm… okay, guess I was wrong. Well, doesn’t matter anyway – I only use uppers and there’s no way dealers are cutting their uppers with downers that counteract the effects and kill their customers, that’d be totally retarded!”

Then I see like 500 people posting test results showing their cocaine has fentanyl in it for some reason, and I’m forced to accept that my theory about drug dealers being rational capitalistic market participants may have been misguided.

They have never been a good idea, drugs are bad mmmkay (importantly including alcohol), but before fentanyl using the usual suspects in moderation was highly unlikely to kill you or anything like that.

Now, drug dealers can cut with fentanyl to lower costs, and face competition on price. Due to these price pressures, asymmetric information, lack of attribution and liability for any overdoses and fatalities, and also a large deficit in morals in the drug dealing market, a lot of drugs are therefore cut with fentanyl, even uppers. The feedback signal is too weak. So taking such drugs even once can kill you, although any given dose is highly unlikely to do so. And the fentanyl can physically clump, so knowing someone else took from the same batch and is fine is not that strong as evidence of safety either. The safety strips help but are imperfect.

As far as I can tell, no one knows the real base rates on this for many obvious reasons, beyond the over 100,000 overdose deaths each year, a number that keeps rising. It does seem like it is super common. The DEA claims that 42% of pills tested for fentanyl contained at least 2mg, a potentially lethal dose. Of course that is not a random sample or a neutral source, but it is also not one free to entirely make numbers up.

Also the base potency of many drugs is way up versus our historical reference points or childhood experiences, and many people have insufficiently adjusted for this with their dosing and expectations.

Connor Leahy makes, without saying it explicitly, the obvious parallel to AGI.

Connor Leahy: This is a morbidly perfect demonstration about how there are indeed very serious issues that free market absolutism just doesn’t solve in practice.

Thankfully this only applies to this one specific problem and doesn’t generalize across many others…right?

[reply goes into a lot more detail]

The producer of the AGI gets rewarded for taking on catastrophic or existential risk, and also ordinary mundane risks. They are not responsible for the externalities, right now even for mundane risks they do not face liability, and there is information asymmetry.

Capitalism is great, but if we let capitalism do its thing here without fixing these classic market failures, they and their consequences will get worse over time.

This matches my experience as well, the link has screenshots from The Making of the Atomic Bomb. So many of the parallels line up.

Richard Ngo: There’s a striking similarity between physicists hearing about nuclear chain reactions and AI researchers hearing about recursive self-improvement.

Key bottlenecks in both cases include willingness to take high-level arguments seriously, act under uncertainty, or sound foolish.

The basics are important. I agree that you can’t know for sure, but if we do indeed do this accidentally then I do not like our odds.

Anton: one thing i agree with the artificial superintelligence xrisk people on is that it might indeed be a problem if we accidentally invented god.

Maybe, not necessarily.

If you do create God, do it on purpose.

Roon continues to move between the camps, both worried and not as worried, here’s where he landed this week:

Roon: is building astronomically more compute ethical and safe? Who knows idk.

Is building astronomically more compute fun and entertaining brahman? yes.

Grace Kind:

Sometimes one asks the right questions, then chooses not to care. It’s an option.

I continue to be confused by this opinion being something people actually believe:

Sully: AGI (AI which can automate most jobs) is probably like 2-3 years away, closer than what most think.

ASI (what most people think is AGI, some godlike singularity, etc) is probably a lot further along.

We almost have all the pieces to build AGI, someone just needs to do it.

Let’s try this again. If we have AI that can automate most jobs within 3 years, then at minimum we hypercharge the economy, hypercharge investment and competition in the AI space, and dramatically expand the supply while lowering the cost of all associated labor and work. The idea that AI capabilities would get to ‘can automate most jobs,’ the exact point at which it dramatically accelerates progress because most jobs includes most of the things that improve AI, and then stall for a long period, is not strictly impossible, I can get there if I first write the conclusion at the bottom of the page and then squint and work backwards, but it is a very bizarre kind of wishful thinking. It supposes a many orders of magnitude difficulty spike exactly at the point where the unthinkable would otherwise happen.

Also, a reminder for those who need to hear it, that who is loud on Twitter, or especially who is loud on Twitter within someone’s bubble, is not reality. And also a reminder that there are those hard at work trying to create the vibe that there is a shift in the vibes, in order to incept the new vibes. Do not fall for this.

Ramsey Brown: My entire timeline being swamped with pro-US, pro-natal, pro-Kardashev, pro-Defense is honestly giving me conviction that the kids are alright and we’re all gonna make it.

Marc Andreessen: The mother of all vibe shifts. 🇺🇸👶☢️🚀.

Yeah, no, absolutely nothing has changed. Did Brown observe this at all? Maybe he did. Maybe he didn’t. If he did, it was because he self-selected into that corner of the world, where everyone tries super hard to make fetch happen.

SMBC has been quietly going with it for so long now.

Roon is correct. We try anyway.

Roon: there is not enough information to solve the problem

AI #52: Oops Read More »

one-true-love

One True Love

We have long been waiting for a version of this story, where someone hacks together the technology to use Generative AI to work the full stack of the dating apps on their behalf, ultimately finding their One True Love.

Or at least, we would, if it turned out he is Not Making This Up.

Fun question: Given he is also this guy, does that make him more or less credible?

Alas, something being Too Good to Check does not actually mean one gets to not check it, in my case via a Manifold Market. The market started trading around 50%, but has settled down at 15% after several people made strong detailed arguments that the full story did not add up, at minimum he was doing some recreations afterwards.

Which is a shame. But why let that stop us? Either way it is a good yarn. I am going to cover the story anyway, as if it was essentially true, because why should we not get to have some fun, while keeping in mind that the whole thing is highly unreliable.

Discussion question throughout: Definitely hire this man, or definitely don’t?

With that out of the way, I am proud to introduce Aleksandr Zhadan, who reports that he had various versions of GPT talk to 5,240 girls on his behalf, one of whom has agreed to marry him.

I urge Cointelegraph, who wrote the story up as ‘Happy ending after dev uses AI to ‘date’ 5,239 women, to correct the error – yes he air quotes dated 5,239 other girls, but Karina Imranovna counts as well, so that’s 5,240. Oops! Not that the vast majority of them should count as dates even in air quotes.

Aleksandr Zhadan (translated from Russian): I proposed to a girl with whom ChatGPT had been communicating for me for a year. To do this, the neural network re-communicated with other 5239 girls, whom it eliminated as unnecessary and left only one. I’ll share how I made such a system, what problems there were and what happened with the other girls.

For context

• Finding a loved one is very difficult

• I want to have time to work, do hobbies, study and communicate with people

• I could go this route myself without ChatGPT, it’s just much longer and more expensive

In 2021 I broke up with my girlfriend after 2 years. She influenced me a lot, I still appreciate her greatly. After a few months, I realized that I wanted a new relationship. But I also realized that I didn’t want to waste my time and feel uncomfortable with a new girl.

Where did the relationship end?

I was looking for a girl on Tinder in Moscow and St. Petersburg. After a couple of weeks of correspondence, I went on dates, but they went to a dead end. Characteristic disadvantages were revealed (drinks a lot, there is stiffness, emotional swings). Yes, this is the initial impression, but it repulsed me. Again, there was someone to compare with.

I decided to simplify communication with girls via GPT. In 2022, my buddy and I got access to the GPT-3 API (ChatGPT didn’t exist yet) in order to log scripted messages via GPT in Tinder. And I searched for them according to the script, so that there were at least 2 photos in the profile.

In addition to searching, GPT could also rewrite after the mark. From 50 autoswipes we got 18 marks. GPT communicated without my intervention based on the request “You’re a guy, talking to a girl for the first time. Your task: not right away, but to invite you on a date.” It’s a crutch and not very humane, but it worked.

So right away we notice that this guy is working from a position of abundance. Must be nice. In my dating roundups, we see many men who are unable to get a large pool of women to match and initiate contact at all.

For a while, he tried using GPT-3 to chat with women without doing much prompt engineering and without supervision. It predictably blew it in various ways. Yet he persisted.

Then we pick things back up, and finally someone is doing this:

To search for relevant girls, I installed photo recognition in the web version of Tinder through torchvision, which was trained on my swipes from another account on 4k profiles. The machine was able to select the right girls almost always correctly. It’s funny that since that time there have been almost a thousand marks.

Look at you, able to filter on looks even though you’re handing off all the chatting to GPT. I mean, given what he is already doing, this is the actively more ethical thing to do on the margin, in the sense that you are wasting women’s time somewhat less now?

And then we filter more?

I made a filter to filter out girls using the ChatGPT and FlutterFlow APIs:

• without a questionnaire

• less than 2 photos

• “I don’t communicate here, write on instagram”

• sieve boxes

• believers

• written zodiac sign

• does not work

• further than 20 km

• show breasts in photo

• photo with flowers

• noisy photos

This is an interesting set of filters to set. Some very obviously good ones here.

So good show here. Filtering up front is one of the most obviously good and also ethical uses.

As is often the case, the man who started out trying to use technology that wasn’t good enough, got great results once the technology caught up to him:

ChatGPT found better girls and chatted longer. I was moving from Tinder to tg with someone. There he communicated and arranged meetings. ChatGPT swiped to the right 353 profiles, 278 tags, he continued the dialogue with 160, I met with 12. In the diagram below I described the principle of operation.

That first statistic, that it swiped right 353 times and got to talk to 160 women, is completely insane. I mean, that’s almost a 50% match rate, whereas estimates in general are 4% to 14%. This was one of the biggest signs that the story is almost certainly at least partly bogus.

After that, ChatGPT was able to get a 7.5% success rate at getting dates. Depending on your perspective, that could be anything from outstanding to rather lousy. In general I would say it is very good, since matches are typically less likely than that to lead to dates, and you are going in with no reason to think there is a good match.

Continued to communicate manually without ChatGPT, but then the communication stopped. The girls behaved strangely, ignored me, or something alarmed me through correspondence. Not like the example before, but still the process was not ok, I understood that.

If you are communicating as a human with a bunch of prospects, and you lose 92% of them before meeting, that might be average, but it is not going to feel great. If you suddenly take over as a human, you are switching strategies and also the loss rates will always be high, so you are going to feel like something is wrong.

Let’s show schematically what ChatGPT looks like for finding girls (I’ll call it V1). He worked on the request “find the best one, keep in touch,” but at the same time he often forgot information, limited himself to communicating on Tinder, and occasionally communicated poorly.

Under clumsy, I’ll note that ChatGPT V1 could schedule meetings at the same time, swore to give me chocolate/flowers/compote, but I didn’t know about it. He came on a date without a gift and the impression of me was spoiled. Or meetings were canceled because there was another meeting at that time.

Did he… not… read… the chat logs?

This kind of thing always blows my mind. You did all that work to set up dates, and you walk in there with no idea what ‘you’ ‘said’ to your dates?

It is not difficult to read the logs if and only if a date is arranged, and rather insane not to. It is not only about the gifts. You need to know what you told them, and also what they told you. 101 stuff.

I stopped ChatGPT V1 and sat down at V2. Integrated Google calendar and TG, divided the databases into general and personal, muted replies and replies to several messages, added photo recognition using FlutterFlow, created trust levels for sharing personal information and could write messages myself.

I mean, yes, sounds like there was a lot of room for improvement, and Calendar integration certainly seems worthwhile, as is allowing manual control. It still seems like there was quite a lot of PEBKAC.

Also this wasn’t even GPT-4 yet, so v2 gets a big upgrade right there.

V2 runs on GPT-4, which has significantly improved correspondence. I also managed to continue communicating with previous girls (oh, how important this will turn out to be later), meeting and just chatting (also good). Meetings haven’t been layered with others yet, wow!

In order for ChatGPT V2 to find me a relevant girl, I asked regular ChatGPT for help. He offered to tell me about my childhood, parents, goals and values. I transferred the data to V2, and then it was possible to speed up compatibility and if something didn’t fit, then communicate with the girl stopped.

Great strategy. Abundance mindset. If you can afford to play a numbers game, make selection work for you, open up, be what would be vulnerable if it was actually you.

I mean, aside from the ethical horrors of outsourcing all this to ChatGPT, of course. There is that. But if you were doing it yourself it would seem great.

Then he decided to… actually put a human in the loop and do the work? I mean you might as well actually write the responses?

I also enabled response validation so that I would first receive a message for proofreading via a bot. V2’s problems with hallucinations have decreased to zero. I just watched as ChatGPT got acquainted and everything was timid. This time there are 4943 matches per month on Tinder Gold and it’s scary to count how many meetings.

Once again, if you give even a guy with no game 4,943 matches to work with each month, he is going to figure things out through trial, error and the laws of large numbers. With all this data being gathered, it is a shame there was no ability to fine tune. In general not enough science is being done.

On dates we ate, drank at the bar, then watched a movie or walked the streets, visited exhibitions and tea houses. It took 1-3 meetings to understand whether she was the one or not. And I understood that people usually meet differently, but for me this process was super different. I even talked about it in Inc.

On the contrary, that sounds extremely normal, standard early dating activity if you are looking for a long term relationship.

For several weeks I reduced my communication and meetings to 4 girls at a time. Switched the rest to mute or “familiar” mode. I felt like a candidate for a dream job with several offers. As a result, I remained on good terms with 3, and on serious terms with 1.

So what he is noticing is that quality and paying actual attention is winning out over quantity and mass production via ChatGPT. Four at a time is still a lot, but manageable if you don’t have a ton otherwise happening. It indicates individual attention for all of them, although he is keeping a few in ‘familiar’ mode I suppose.

He does not seem to care at all about all the time of the women he is talking with, which would be the best reason not to talk to dozens or hundreds at once. Despite this, he still lands on the right answer. I worry how many men, and also women, will also not care as the technology proliferates.

The most charming girl was found – Karina. ChatGPT communicated with her as V1 and V2, communication stopped for a while, then I continued to communicate myself through ChatGPT V2. Very empathic, cheerful, pretty, independent and always on the move. Simply put, SHE!

I stopped communicating with other girls (at the same time Tinder was leaving in Russia) and the meaning of the bot began to disappear – I have excellent relationships that I value more and more. And I almost forgot about ChatGPT V2

This sounds so much like the (life-path successful) pick up artist stories. Before mass production, chop wood carry water. After mass production, chop wood, carry water.

Except, maybe also outsource a bunch of wood chopping and water carrying, use time to code instead?

Karina talks about what is happening, invites us to travel, and works a lot with banking risks. I talk about what’s happening (except for the ChatGPT project), help, and try to make people happy. Together we support each other. To keep things going as expected, I decided to make ChatGPT V3.

So even though he’s down to one and presumably is signing off on all the messages himself, he still finds the system useful enough to make a new version. But he changes it to suite the new situation, and now it seems kind of reasonable?

In V3 I didn’t have to look for people, just maintain a dialogue. And now communication is not with thousands, but with Karina. So I set up V3 as an observer who communicates when I don’t write for a long time and advises me on how to communicate better. For example, support, do not quarrel, offer activities.

Nice. That makes so much sense. You use it as an advisor on your back, especially to ensure you maintain communication and follow other basic principles. He finds it helpful. This is where you find product-market fit.

During our relationship, Karina once asked how many girls and dates I had, which was super difficult for me to answer. He talked about several girls, and then switched to another topic. She once joked that with such a base it was time to open a brothel.

I came up with the idea of ​​recommending them for vacancies through referrals. I made a script – I entered a vacancy and got a suitable girl from the dialogues. I found the vacancies in Playkot, Awem and TenHunter on TenChat, then anonymously sent contacts with Linkedin or without a resume. Arranged for 8 girls, earned 526 rubles.

Well, that took a turn, although it could have taken a far worse one, dodged a bullet there. The traditional script is that she finds out about the program and that becomes the third act conflict. Instead, he’s doing automated job searches. He earned a few bucks, but not many.

It was possible to create a startup, but I switched to a more promising project (I work with neural networks). In addition, my pipeline has become outdated, taking into account innovations such as Vision in ChatGPT, improvements to Langсhain (I used it as a basis for searching for girls). In general, it could all end here.

And then the tail got to wag the dog, and we have our climax.

One day, ChatGPT V3 summarized the chat with Karina, based on which it recommended marrying her. I thought that V3 was hallucinating (I never specified the goal of getting married), but then I understood his train of thought – Karina said that she wanted to go to someone’s wedding and ChatGPT thought that it would be better at her own.

I asked in a separate ChatGPT to prepare an action plan with several scenarios for a request like “Offer me a plan so that a girl accepts a marriage proposal, taking into account her characteristics and chat with her.” Uploaded the correspondence with Karina to ChatGPT and RECEIVED THE PLAN.

Notice how far things have drifted.

At first, there was the unethical mass production of the AI communicating autonomously pretending to be him so he could play a numbers game and save time.

Now he’s flat out having the AI tell him to propose, and responding by having it plan the proposal, and doing what it says. How quickly we hand over control.

The good news is, the AI was right, it worked.

The situation hurt. Super afraid that something might go wrong. I went almost exactly according to plan № 3 and everything came to the right moment. I propose to get married.

She said yes.

So how does he summarize all this?

The development of the project took ~120 hours and $1432 for the API. Restaurant bills amounted to 200k rubles. BTV, I recovered the costs and made money on recommendations. If you met yourself and went on dates, then the same thing took 5+ years and 13m+ rubles. Thanks to ChatGPT for saving money and time

Twitter translated that as 200 rubles, which buys you one coffee maybe two if they are cheap, which indicates how reliable are the translations here. ChatGPT said it was 200k, which makes sense.

What drives me mad about this whole thread is that it skips the best scene. In some versions of this story, he quietly deletes or archives the program, or maybe secretly keeps using it, and Karina never finds out.

Instead, he is posting this on Twitter. So presumably she knows. When did she find out? Did he tell her on purpose? Did ChatGPT tell him how to break the news? How did she react?

The people bidding on the movie rights want to know. I also want to know. I asked him directly, when he responded in English to my posting of the Manifold Market, but he’s not talking. So we will never know.

And of course, the whole thing might be largely made up. It still could have happened.

If it has not yet happened, it soon will. Best be prepared.

One True Love Read More »

ai-#47:-meet-the-new-year

AI #47: Meet the New Year

Will be very different from the old year by the time we are done. This year, it seems like various continuations of the old one. Sometimes I look back on the week, and I wonder how so much happened, while in other senses very little happened.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. A high variance game of chess.

  4. Language Models Don’t Offer Mundane Utility. What even is productivity?

  5. GPT-4 Real This Time. GPT store, teams accounts, privacy issues, plagiarism.

  6. Liar Liar. If they work, why aren’t we using affect vectors for mundane utility?

  7. Fun With Image Generation. New techniques, also contempt.

  8. Magic: The Generating. Avoiding AI artwork proving beyond Hasbro’s powers.

  9. Copyright Confrontation. OpenAI responds, lawmakers are not buying their story.

  10. Deepfaketown and Botpocalypse Soon. Deepfakes going the other direction.

  11. They Took Our Jobs. Translators, voice actors, lawyers, games. Usual stuff.

  12. Get Involved. Misalignment museum.

  13. Introducing. Rabbit, but why?

  14. In Other AI News. Collaborations, safety work, MagicVideo 2.0.

  15. Quiet Speculations. AI partners, questions of progress rephrased.

  16. The Quest for Sane Regulation. It seems you can just lie to the House of Lords.

  17. The Week in Audio. Talks from the New Orleans safety conference.

  18. AI Impacts Survey. Some brief follow-up.

  19. Rhetorical Innovation. Distracting from other harms? Might not be a thing.

  20. Aligning a Human Level Intelligence is Still Difficult. The human alignment tax.

  21. Aligning a Smarter Than Human Intelligence is Difficult. Foil escape attempts?

  22. Won’t Get Fooled Again. Deceptive definitions of deceptive alignment.

  23. People Are Worried About AI Killing Everyone. The indifference of the universe.

  24. Other People Are Not As Worried About AI Killing Everyone. Sigh.

  25. The Wit and Wisdom of Sam Altman. Endorsement of The Dial of Progress.

  26. The Lighter Side. Batter up.

WordPress now has something called Jetpack AI, which is powered by GPT-3.5-Turbo. It is supposed to help you write in all the usual ways. You access it by creating an ‘AI Assistant’ block. The whole blocks concept rendered their editor essentially unusable, but one could paste in quickly to try this out.

Get to 1500 Elo in chess on 50 million parameters and correctly track board states in a recognizable way, versus 3.5-turbo’s 1800. It is a very strange 1500 Elo, that is capable of substantial draws against Stockfish 9 (2700 Elo). A human at 1800 Elo is essentially never going to get a draw from Stockfish 9. This has flashes of brilliance, and also blunders rather badly.

I asked about which games were used for training, and he said it didn’t much matter whether you used top level games, low level games or a mix, there seems to be some limit for this architecture and model size.

Use it in your AI & the Law course at SCU law, at your own risk.

Jess Miers: My AI & the Law Course at SCU Law is officially live and you can bet we’re EMBRACING AI tools under this roof!

Bot or human, it just better be right…

Tyler Cowen links to this review of Phind. finding it GPT-4 level and well designed. The place to add context is appreciated, as are various other options, but they don’t explain properly yet to the user how to best use all those options.

My experience with Phind for non-coding purposes is that it has been quite good at being a GPT-4-level quick, up-to-date tool for asking questions where Google was never great and is getting worse, and so far has been outperforming Perplexity.

Play various game theory exercises and act on the more cooperative or altruistic end of the human spectrum. Tyler Cowen asks, ‘are they better than us? Perhaps.’ I see that as a non-sequitur in this context. Also a misunderstanding of such games.

Get ChatGPT-V to identify celebrities by putting a cartoon character on their left.

The success rates on GPT-3.5 of 40 human persuasion techniques as jailbreaks.

Some noticeable patterns here. Impressive that plain queries are down to 0%.

Robin Hanson once again claims AI can’t boost productivity, because wages would have risen?

Robin Hanson: “large-scale controlled trial … at Boston Consulting Group … found consultants using …GPT-4 … 12.2% more tasks on average, completed tasks 25.1% more quickly, & produced 40% higher quality results than those without the tool. A new paper looking at legal work done by law students found the same results”

I’m quite skeptical of such results if we do not see the pay of such workers boosted by comparable amounts.

Eliezer Yudkowsky: Macroeconomics would be very different if wages immediately changed to their new equilibrium value! They can stay in disequilibrium for years at the least!

Robin Hanson: The wages of those who were hired on long ago might not change fast, but the wages of new hires can change very quickly to reflect changes in supply & demand.

I do not understand Robin’s critique. Suppose consultants suddenly get 25% more done at higher quality. Why should we expect generally higher consultant pay even at equilibrium? You can enter or exit the consultant market, often pretty easily, so in long run compensation should not change other than compositionally. In the short run, the increase in quality and decrease in cost should create a surplus of consultants until supply and demand can both adjust. If anything that should reduce overall pay. Those who pioneer the new tech should do better, if they can translate productivity to pay, but consultants charge by the hour and people won’t easily adjust willingness to pay based on ‘I use ChatGPT.’

Robin Hanson: If there’s an “excess supply” that suggests lower marginal productivity.

Well, yes, if we use Hanson’s definition of marginal productivity as the dollar value of the last provided hour of work. Before 10 people did the work and they were each worth $50/hour. Now 7 people can do that same work, then there’s no more work people want to hire someone for right now, so the ‘marginal productivity’ went down.

The GPT store is ready to launch, and indeed has gone live. So far GPTs have offered essentially no mundane utility. Perhaps this is because good creators were holding out for payment?

GPT personalization across chats has arrived, at least for some people.

GPT Teams is now available at $25/person/month, with some extra features.

Wait, some say. No training on your data? What does that say about the Plus tier?

Andrew Morgan: ⁦@OpenAI⁩ Just want some clarity. Does this mean in order to have access to data privacy I have to pay extra? Also, does this mean I never had it before? 👀🥲

Delip Rao: 😬 I had no idea my current plus data was used in training. I thought OpenAI was not training on user inputs? Can somebody from @openai clarify

Rajjhans Samdani: Furthermore they deliberately degrade the “non-training” experience. I don’t see why do I have to lose access to chat history if I don’t want to be included in training.

Karma: Only from their API by default. You could turn it off from the settings in ChatGPT but then your conversations wouldn’t have any history.

Yes. They have been clear on this. The ‘opt out of training’ button has been very clear.

You can use the API if you value your privacy so much. If you want a consumer UI, OpenAI says, you don’t deserve privacy from training, although they do promise an actual human won’t look.

I mean, fair play. If people don’t value it, and OpenAI values the data, that’s Coase.

Will I upgrade my family to ‘Teams’ for this? I don’t know. It’s a cheap upgrade, but also I have never hit any usage limits.

What happened with the GPT wrapper companies?

Jeff Morris Jr.: “Most of my friends who started GPT wrapper startups in 2023 returned capital & shut down their companies.” Quote from talking with a New York founder yesterday. The OpenAI App Store announcement changed everything for his friends – most are now looking for jobs.

Folding entirely? Looking for a job? No, no, don’t give up, if you are on the application side it is time to build.

No one is using custom GPTs. It seems highly unlikely this will much change. Good wrappers can do a lot more, there are a lot more degrees of freedom and other tools one can integrate, and there are many other things to build. Yes, you are going to constantly have Google and Microsoft and OpenAI and such trying to eat your lunch, but that is always the situation.

Someone made a GPT for understanding ‘the Ackman affair.The AI is going to check everyone’s writing for plagiarism.

Bill Ackman: Now that we know that the academic body of work of every faculty member at every college and university in the country (and eventually the world) is going to be reviewed for plagiarism, it’s important to ask what the implications are going to be.

If every faculty member is held to the current plagiarism standards of their own institutions, and universities enforce their own rules, they would likely have to terminate the substantial majority of their faculty members.

I say percentage of pages rather than number of instances, as the plagiarism of today can be best understood by comparison to spelling mistakes prior to the advent of spellcheck.

For example, it wouldn’t be fair to say that two papers are both riddled with spelling mistakes if each has 10 mistakes, when one paper has 30 pages and the other has 330. The standard has to be a percentage standard.

The good news is that no paper written by a faculty member after the events of this past week will be published without a careful AI review for plagiarism, that, in light of recent events, has become a certainty.

This is not the place to go too deep into many of the details surrounding the whole case, which I may or may not do at another time.

Instead here I want to briefly discuss the question of general policy. What to do if the AI is pointing out that most academics at least technically committed plagiarism?

Ackman points out that there is a mile of difference between ‘technically breaking the citation rules’ on the level of a spelling error, which presumably almost everyone does (myself included), lifting of phrases and then theft of central ideas or entire paragraphs and posts. There’s plagiarism and then there’s Plagiarism. There’s also a single instance of a seemingly harmless mistake versus a pattern of doing it over and over again in half your papers.

For spelling-error style mistakes, we presumably need mass forgiveness. As long as your rate of doing it is not massively above normal, we accept that mistakes were made. Ideally we’d fix it all, especially for anything getting a lot of citations, in the electronic record. We have the technology. Bygones.

For the real stuff, the violations that are why the rules exist, the actual theft of someone’s work, what then? That depends on how often this is happening.

If this is indeed common, we will flat out need Truth and Reconciliation. We will need to essentially say that, at least below some high threshold that most don’t pass, everyone say what they did with the AI to help them find and remember it, and then we are hitting the reset button. Don’t do it again.

Truth, in various forms, is coming for quite a lot of people once AI can check the data.

What can withstand that? What cannot? We will find out.

A lot of recent history has been us discovering that something terrible, that was always common, was far worse and more common than we had put into common knowledge, and also deciding that it was wrong, and that it can no longer be tolerated. Which by default is a good thing to discover and a good thing to figure out. The problem is that our survival, in many forms, has long depended on many things we find despicable, starting with Orwell’s men with guns that allow us to sleep soundly in our beds and going from there.

Scott Alexander writes The Road to Honest AI, exploring work by Hendrycks about using various additional vectors to add or subtract things like honesty, or fairness, for fear or happiness, or megalomania.

I’ve covered this before. It is exciting work. I do not have hope that it can be used in the obvious fashion, where you tell the AIs to always be honest so your AIs are always honest and you never have to worry about them. That is the sort of thing I expect to fail at exactly the wrong time, to get selected against whenever it is used, and to engender a false sense of security even if it does work. It is very possible to be fully honest while using your superior intellect to outsmart everyone.

And as I discovered in Diplomacy, it being common knowledge you cannot lie is not obviously a disadvantage. Who wouldn’t want to put an AI that couldn’t lie in a position of trust and power? The advantage is even bigger, of course, if it turns out you can lie after all.

Scott’s coverage drove home to me that we are sleeping on this research.

Not in the sense that this will stop us all from dying or let us fully understand AIs. It might help, but we are a long way off, and that all takes time.

I’m thinking in the Mundane Utility section. This seems highly useful, for making best use of current AIs, today.

Eliezer Yudkowsky proposes telling Dalle-3 to caption with words whose letters are separated by dashes, then tell Dalle-3 to remove the spaces.

ChatGPT: Certainly! Here is the exact prompt that I sent to DallE-3 for the image: “An image of Seattle on a relentlessly dreary, drizzly day. The city skyline, including the Space Needle, is partially obscured by low-hanging gray clouds and mist. Raindrops are visible in the air, and the streets are wet and glistening. People walk with umbrellas. In the foreground, a caption in cheerful advertisement typography reads: “S E A T T L E : R E L E N T L E S S L Y D R E A R Y !” but without the spaces between the letters, creating the correct caption “SEATTLE: RELENTLESSLY DREARY!”. The image should capture the humorous contrast between the dreary weather and the upbeat font of the caption.”

Some good results, relative to general expectations:

He reports he got lucky initially, as is usually the case for a new seemingly successful technique, everything is finicky, by default nothing ever properly replicates, but it seems worth exploring more.

AI video generation about to disrupt Hollywood?

Cate Hall: Just so long as your artistic style is limited to slow-mo pans of largely frozen people & landscapes

MidJourney used Magic: The Gathering artwork and cards in its training set. I know, you are shocked, shocked to find data sets in this establishment.

It impresses me the contempt a large percentage of the world has for AI image generation.

Kenneth Shepard (Kotaku): AI-generated art is just pulling from existing work people have created and throwing elements together to create what the algorithm thinks you want.

Amazing that people keep telling that story even now. I suppose they will keep telling it right up until the end.

But it’s not often you hear specifics of where an AI program is scraping from. Well, the CEO behind AI art-generating program MidJourney allegedly has been training the algorithm on work by Magic: The Gathering artists the entire time.

I suppose it is interesting that they used Magic cards extensively in their early days as opposed to other sources. It makes sense that they would be a good data source, if you assume they didn’t worry about copyright at all.

MidJourney has been exceptionally clear that it is going to train on copyrighted material. All of it. Assume that they are training on everything you have ever seen. Stop being surprised, stop saying you ‘caught’ them ‘red handed.’ We can be done with threads like this talking about all the different things MidJourney trained with.

Similarly, yes, MidJourney can and will mimic popular movies and their popular shots if you ask it to, one funny example prompt here is literally ‘popular movie screencap —ar 11:1 —v 6.0’ so I actually know exactly what you were expecting, I mean come on. Yes, they’ll do any character or person popular enough to be in the training set, in their natural habitats, and yes they will know what you actually probably meant, stop acting all innocent about it, and also seriously I don’t see why this matters.

They had a handy incomplete list of things that got copied, with some surprises.

I’m happy that Ex Machina and Live Die Repeat made the cut. That implies that at least some of a reasonably long tail is covered pretty well. Can we get a list of who is and isn’t available?

If you think that’s not legal and you want to sue, then by all means, sue.

I’d also say that if your reporter was ‘banned multiple times’ for their research, then perhaps the first one is likely a fair complaint and the others are you defying their ban?

What about the thing where you say ‘animated toys’ and you get Toy Story characters, and if you didn’t realize this and used images of Woody and Buzz you might yourself get into copyright trouble? It is possible, but seems highly unlikely. The whole idea is that you get that answer from ‘animated toys’ because most people know Toy Story, especially if they are thinking about animated toys. If your company deploys such a generation at sufficient scale to get Disney involved and no one realized, I mean sorry but that’s on you.

The actual fight this week over Magic and AI art is the accusation that Wizards is using AI art in its promotional material. They initially denied this.

No one believed them. People were convinced they are wrong or lying, and it’s AI:

Reid Southern: Doesn’t look good, but we’ll see if they walk that statement back. It’s possible they didn’t know and were themselves deceived, it’s happened before.

Silitha: This is the third or fourth time for WOTC(ie Hasbro) They had one or two other from MTG and then a few from DnD. It’s just getting so frequent it is coming of as ‘testing the waters’ or ‘desensitizing’. Hasbro has shown some crappy business practices

The post with that picture has now been deleted.

Dave Rapoza: And just like that, poof, I’m done working for wizards of the coast – you can’t say you stand against this then blatantly use AI to promote your products, emails sent, good bye you all!

If you’re gonna stand for something you better make sure you’re actually paying attention, don’t be lazy, don’t lie.

Don’t be hard on other artists if they don’t quit – I can and can afford to because I work for many other game studios and whatnot – some people only have wotc and cannot afford to quit having families and others to take care of – don’t follow my lead if you can’t, no pressure

I like the comments asking why I didn’t quit from Pinkertons, layoffs, etc

– I’ll leave you with these peoples favorite quote

– “ The best time to plant a tree was 25 years ago. The second-best time to plant a tree is 25 years ago.”

Also, to be clear, I’m quitting because they took a moral stand against AI art like a week ago and then did this, if they said they were going to use AI that’s a different story, but they want to grand stand like heroes and also pull this, that’s goofball shit I won’t support.

They claim they will have nothing to do with AI art in any way for any reason. Yet this is far from the first such incident where something either slipped through or willfully disregarded.

Then Wizards finally admitted that everyone was right.

Wizards of the Coast: Well, we made a mistake earlier when we said that a marketing image we posted was not created using AI.

As you, our diligent community pointed out, it looks like some AI components that are now popping up in industry standard tools like Photoshop crept into our marketing creative, even if a human did the work to create the overall image.

While the art came from a vendor, it’s on us to make sure that we are living up to our promise to support the amazing human ingenuity that makes Magic great.

We already made clear that we require artists, writers, and creatives contributing to the Magic TCG to refrain from using AI generative tools to create final Magic products.

Now we’re evaluating how we work with vendors on creative beyond our products – like these marketing images – to make sure that we are living up to those values.

I actually sympathize. I worked at Wizards briefly in R&D. You don’t have to do that to know everyone is overworked and overburdened and underpaid.

Yes, you think it is so obvious that something was AI artwork, or was created with the aid of AI. In hindsight, you are clearly right. And for now, yes, they probably should have spotted it in this case.

But Wizards does thousands of pieces of artwork each year, maybe tens of thousands. If those tasked with doing the art try to take shortcuts, there are going to be cases where it isn’t spotted, and things are only going to get trickier. The temptation is going to be greater.

One reason this was harder to catch is that this was not a pure MidJourney-style AI generation. This was, it seems, the use by a human of AI tools like photoshop to assist with some tasks. If you edit a human-generated image using AI tools, a lot of the detection techniques are going to miss it, until someone sees a telltale sign. Mistakes are going to happen.

We are past the point where, for many purposes, AI art would outcompete human art, or at least where a human would sometimes want to use AI for part of their toolbox, if the gamers were down with AI artwork.

Even for those who stick with human artists, who fully compensate them, we are going to face issues of what tools are and aren’t acceptable. Surely at a minimum humans will be using AI to try out ideas and see concepts or variants. Remember when artwork done on a computer was not real art? Times change.

The good news for artists is that the gamers very much are not down for AI artwork. The bad news is that this only gets harder over time.

OpenAI responds to the NYT lawsuit. The first three claims are standard, the fourth was new to me, that they were negotiating over price right before the lawsuit filed, and essentially claiming they were stabbed in the back:

Our discussions with The New York Times had appeared to be progressing constructively through our last communication on December 19. The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting. We had explained to The New York Times that, like any single source, their content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training. Their lawsuit on December 27—which we learned about by reading The New York Times—came as a surprise and disappointment to us.

Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We’ve demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature immediately after we learned it could reproduce real-time content in unintended ways.

Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple thirdparty websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.

I presume they are telling the truth about the negotiations. NYT would obviously prefer to get paid and gain a partner, if the price is right. I guess the price was wrong.

Ben Thompson weighs in on the NYT lawsuit. He thinks training is clearly fair use and this is obvious under current law. I think he is wrong about the obviousness and the court could go either way. He thinks the identical outputs are the real issue, notes that OpenAI tries to avoid such duplication in contrast to Napster embracing it, and sees the ultimate question as whether there is market impact on NYT here. He is impressed by NYT’s attempted framing, but is very clear who he thinks should win.

Arnold Kling asks what should determine the outcome of the lawsuit by asking why the laws exist. Which comes down to whether or not any of this is interfering with NYT’s ability to get paid for its work. In practice, his answer is no at current margins. My answer is also mostly no at current margins.

Lawmakers seem relatively united that OpenAI should pay for the data it uses. What are they going to do about it? So far, nothing. They are not big on passing laws.

Nonfiction book authors sue OpenAI in a would-be class action. A bunch of the top fiction authors are already suing from last year. And yep, let’s have it out. The facts here are mostly not in dispute.

I thought it would be one way, sometimes it’s the other way?

Aella: oh god just got my first report of someone using one of my very popular nude photos, but swapped out my face for their face, presumably using AI get me off this ride. To see the photo, go here.

idk why i didn’t predict ppl would start doing this with my images. this is kinda offensive ngl. ppl steal my photos all the time pretending that it’s me, but my face felt like a signature? now ppl stealing my real body without my identity attached feels weirdly dehumanizing.

Razib Khan: Hey it was an homage ok? I think I pulled it off…

Aella: lmfao actually now that i think of it why haven’t the weirdo people who follow both of us started photoshopping your face onto my nudes yet.

It’s not great. The problem has been around for a while thanks to photoshop, AI decreases the difficulty while making it harder to detect. I figured we’d have ‘generate nude of person X’ if anything more than we do right now, but I didn’t think X would be the person generating all that often, or did I think the issue would be ‘using the picture of Y as a template.’ But yeah, I suppose this will also happen, you sickos.

Ethan Mollick shows a rather convincing deepfake of him talking, based on only 30 seconds of webcam and 30 seconds of voice.

We are still at the point where there are videos and audio recordings that I would be confident are real, but a generic ‘person talking’ clip could easily be fake.

First they came for the translators, which they totally did do, an ongoing series.

Reid Southern: Duolingo laid off a huge percentage of their contract translators, and the remaining ones are simply reviewing AI translations to make sure they’re ‘acceptable’. This is the world we’re creating. Removing the humanity from how we learn to connect with humanity.

Well, that didn’t take long. Every time you talk about layoffs related to AI, someone shows up to excitedly explain to you why it’s a net gain for humanity. That is until it’s their job of course.

Ryan: Don’t feel the same way here as I do about artists and copyrighted creative work. Translation is machine work and AI is just a better machine. Translation in most instances isn’t human expression. It’s just a pattern that’s the same every time. This is where AI should be used.

Reid Southern: Demonstrably false. There is so much nuance to language and dialects, especially as they evolve, it would blow your mind.

Hampus Flink: As a translator, I can assure you that: 1. This doesn’t make the job of the remaining workers any easier and 2. It doesn’t make the quality of the translations any better I’ve seen this happen a hundred times and it was my first thought when image generators came around.

Daniel Eth: Okay, a few thoughts on this:

1) it’s good if people have access to AI translators and AI language tutors

2) I’m in favor of people being provided safety nets, especially in cases where this does not involve (much) perverse incentives – eg, severance packages and/or unemployment insurance for those automated out of work

3) we shouldn’t let the perfect be the enemy of the good, and stopping progress in the name of equity is generally bad imho, so automation at duolingo is probably net good, even though (on outside view) I doubt (2) was handled particularly well here

4) if AI does lead to wide-scale unemployment, we’ll have to rethink our whole economic system to be much more redistributive – possibilities include a luxurious UBI and Fully Automated Luxury Gay Space Communism; we have time to have this conversation, but we shouldn’t wait for wide-scale automation to have it 5) this case doesn’t have much at all to do with AI X-risk, which is a much bigger problem and more pressing than any of this

If we do get to the point of widespread technological unemployment, we are not likely to handle it well, but it will be a bit before that happens. If it is not a bit before that happens, we will very quickly have much bigger problems than unemployment rates.

On the particular issue of translation, what will happen, aside from lost jobs?

The price of ‘low-quality’ translation will drop to almost zero. The price of high-quality translation will also fall, but by far less.

This means two things.

First, there will be a massive, massive win from real-time translation, from automatic translation, and from much cheaper human-checked translation, as many more things are available to more people in more ways in more languages, including the ability to learn, or to seek further skill or understanding. This is a huge game.

Second, there will be substitution of low-quality work for high-quality work. In many cases this will be very good, the market will be making the right decision.

In other cases, however, it will be a shame. It is plausible that Duolingo will be one of those situations, where the cost savings is not worth the drop in quality. I can unfortunately see our system getting the wrong answer here.

The good news is that translation is going to keep improving. Right now is the valley of bad translation, in the sense that they are good enough to get used but miss a lot of subtle stuff. Over time, they’ll get that other stuff more and more, and also we will learn to combine them with humans more effectively when we want a very high quality translation.

If you are an expert translator, one of the best, I expect you to be all right for a while. There will still be demand, especially if you learn to work with the AI. If you are an average translator, then yes, things are bad and they are going to get worse, and you need to find another line of work while you can.

They are also coming for the voice actors. SAG-AFTRA made a deal to let actors license their voices through Replica for use in games, leaving many of the actual voice actors in games rather unhappy. SAG-AFTRA was presumably thinking this deal means actors retain control of their voices and work product. The actual game voice actors were not consulted and do not see the necessary protections in place.

All the technical protections being discussed, as far as I can tell, do not much matter. What matters is whether you open the door at all. Once you normalize using AI-generated voice, and the time cost of production drops dramatically for lower-quality performance, you are going to see a fast race to the bottom on the cost of that, and its quality will improve over time. So the basic question is what floor has been placed on compensation. Of course, if SAG-AFTRA did not make such a deal, then there are plenty of non-union people happy to license their voices on the cheap.

So I don’t see how the voice actors ever win this fight. The only way I can see retaining voice actors is either if the technology doesn’t get there, as it certainly is not yet there for top quality productions. Or, if the consumers take a strong enough stand, and boycott anyone using AI voices, that would also have power. Or government intervention could of course protect such jobs by banning use of AI voice synthesis, which to be clear I do not support. I don’t see how any contract saves you for long.

Many lawyers are not so excited about being more productive.

Scott Lincicome: This was one of my least favorite things about big law: the system punished productivity and prioritized billing hours over winning cases. Terrible incentives!

The Information: Both firms face the challenge of overcoming resistance from lawyers to time-saving technology. Law firms generate revenue by selling time that their lawyers spend advising and helping clients. “If you made me 8 to 10 times faster, I’d be very unhappy,” as a lawyer, Robinson explained, because his compensation would be tied to the amount of hours he put in.

As we’ve discussed before, a private speedup is good, you can compete better or at least slack off. If everyone gets it, that’s potentially a problem, with way too much supply for the demand, crashing the price, again unless this generates a lot more work, which it might. I know that I consult and use lawyers a lot less than I would if they were cheaper or more productive, or if I had an unlimited budget.

What you gonna do when they come for you?

Rohit: I feel really bad for people losing their jobs because of AI but I don’t see how claiming ever narrower domains of human jobs are the height of human spirit is helpful in understanding or addressing this.

Technological loss of particular jobs is, as many point out, nothing new. What is happening now to translators has happened before, and would even without AI doubtless happen again. John Henry was high on human spirit but the human spirit after him was fine. The question is what happens when the remaining jobs are meaningfully ‘ever narrowing’ faster than we open up new ones. That day likely is coming. Then what?

We don’t have a good answer.

Valve previously banned any use of AI in games on the Steam platform, to the extent of permanently banning games even for inadvertent inclusion of some placeholder AI work during an alpha. They have now reversed course, saying they now understand the situation better.

The new rule is that if you use AI, you have to disclose that you did that.

For pre-generated AI content, the content is subject to the same rules as any other content. For live-generated AI content, you have to explain how you’re ensuring you won’t generate illegal content, and there will be methods to report violations.

Adult only content won’t be allowed AI for now, which makes sense. That is not something Valve needs the trouble of dealing with.

I applaud Valve for waiting until they understood the risks, costs and benefits of allowing new technology, then making an informed decision that looks right to me.

I have a prediction market up on whether any of 2024’s top 10 will include such a disclosure.

Misalignment museum in San Francisco looking to hire someone to maintain opening hours.

Microsoft adding an AI key to Windows keyboards, officially called the Copilot key.

Rabbit, your ‘pocket companion’ with an LLM as the operating system for $199. I predict it will be a bust and people won’t like it. Quality will improve, but it is too early, and this does not look good enough yet.

Daniel: Can somebody tell me what the hell this thing does?

I’m sure it’s great! But the marketing materials are terrible. What the heck

Yishan: Seriously. I don’t understand either. Feels a bit like Emperor Has No Clothes moment here? I’ve tried watching the videos, the site but… still?

Those attempting to answer Daniel are very not convincing.

Demis Hassabis announces Isomorphic Labs collaboration with Eli Lily and Novartis for up to $3 billion to accelerate drug development.

Google talks about ‘Responsible AI’ with respect to the user experience. This seems to be a combination of mostly ‘create a good user experience’ and some concern over the experience of minority groups for which the training set doesn’t line up as well. There’s nothing wrong with any of that, but it has nothing to do with whether what you are doing is responsible. I am worried others do not realize this?

ByteDance announces MagicVideo V2 (Github), claims this is the new SotA as judged by humans. This does not appear to be a substantive advance even if that is true. It is not a great sign if ByteDance can be at SotA here, even when the particular art and its state is not yet so worthwhile.

OpenAI offers publishers ‘as little as between $1 million and $5 million a year’ or permission to license their news articles in training LLMs, as per The Information. Apple, they say, is offering more money but also wants the right to use the content more widely.

People are acting like this is a pittance. That depends on the publisher. If the New York Times was given $1 million a year, that seems like not a lot,, but there are a lot of publishers out there. A million here, a million there, pretty soon you’re talking real money. Why should OpenAI’s payments, specifically, and for training purposes only without right of reprinting, have a substantial bottom line impact?

Japan to launch ‘AI safety institute’ in January.

The guidelines would call for adherence to all rules and regulations concerning AI. They warn against the development, provision, and use of AI technologies with the aim of unlawfully manipulating the human decision-making process or emotions.

Yes, yes, people should obey the law and adhere to all rules and regulations.

It seems Public Citizen is complaining to California that OpenAI is not a nonprofit, and that it should have to divest its assets. Which would of course then presumably be worthless, given that OpenAI is nothing without its people. I very much doubt this is a thing as a matter of law, and also even if technically it should happen, no good would come of breaking up this structure, and hopefully everyone can realize that. There is a tiny market saying this kind of thing might actually happen in some way, 32% by end of 2025? I bought it down to 21%, which still makes me a coward but this is two years out.

Open questions in AI forecasting, a list (direct). Very hard to pin a lot of it down. Dwarkesh Patel in particular is curious about transfer learning.

MIRI offers its 2024 Mission and Strategy Update. Research continues, but the focus is now on influencing policy. They see signs of good progress there, and also see policy as necessary if we are to have the time to allow research to bear fruit on the technical issues we must solve.

What happens with AI partners?

Richard Ngo: All the discussion I’ve seen of AI partners assumes they’ll substitute for human partners. But technology often creates new types of abundance. So I expect people will often have both AI and human romantic partners, with the AI partner carefully designed to minimize jealousy.

Jeffrey Ladish: Carefully designed to minimize jealousy seems like it requires a lot more incentive alignment between companies and users than I expect in practice Like you need your users to buy the product, which suggests some level of needing to deal with jealousy, but only some.

Geffrey Miller: The 2-5% of people who have some experience of polyamorous por open relationships might be able to handle AI ‘secondary partners’. But the vast majority of people are monogamist in orientation, and AI partners would be catastrophic to their relationships.

Some mix of outcomes seems inevitable. The question is the what dominates. The baseline use case does seem like substitution to me, especially while a human cannot be found or convinced, or when someone lacks the motivation. And that can easily cause ongoing lack of sufficient motivation, which can snowball. We should worry about that. There is also, as I’ve noted, the ability of the AI to provide good practice or training, or even support and advice and a push to go out there, and also can make people perhaps better realize what they are missing. It is hard to tell.

The new question here is within an existing relationship, what dominates outcomes there? The default is unlikely, I would think, to involve careful jealousy minimization. That is not how capitalism works.

Until there is demand, then suddenly it might. If there becomes a clear norm of something like ‘you can use SupportiveCompanion.ai and everyone knows that is fine, if they’re super paranoid you use PlatonicFriend.ai, if your partner is down you can go with something less safety-pilled that is also more fun, if you know what you’re doing there’s always VirtualBDSM.ai but clear that with your partner and stay away from certain sections’ or what not, then that seems like it could go well.

Ethan Mollick writes about 2024 expectations in Signs and Portents. He focuses on practical application of existing AI tech. He does not expect the tech to stand still, but correctly notes that adaptation of GPT-4 and ChatGPT alone, in their current form, will already be a major productivity boost to a wide range of knowledge work and education, and threatening our ability to discern truth and keep things secure. He uses the word transformational, which I’d prefer to reserve for the bigger future changes but isn’t exactly wrong.

Cate Hall asks, what are assumptions people unquestionably make in existential risk discussions that you think lack adequate justification? Many good answers. My number one pick is this:

Daniel Eth: That if we solve intent alignment and avoid intentional existential misuse, we win – vs I think there’s a good chance that intent alignment + ~Moloch leads to existential catastrophe by default

It is a good question if you’ve never thought about it, but I’d have thought Paul Graham had found the answer already? Doesn’t he talk to Cowen and Thiel?

Paul Graham: Are we living in a time of technical stagnation, or is AI developing so rapidly that it could be an existential threat? Can’t be both.

FWIW I incline to the latter view. I’ve always been skeptical of the stagnation thesis.

Jason Crawford: It could be that computing technology is racing ahead while other fields (manufacturing, construction, transportation, energy) are stagnating. Or that we have slowed down over the last 50 years but are about to turn the corner.

I buy the central great stagnation argument. We used to do the stuff and build the things. Then we started telling people more and more what stuff they couldn’t do and what things they couldn’t build. Around 1973 this hit critical and we hit a great stagnation where things mostly did not advance or change much for a half century.

These rules mostly did not apply to the world of bits including computer hardware, so people (like Paul Graham) were able to build lots of cool new digital things, that technology grew on an exponential and changed the world. Now AI is on a new exponential, and poised to do the same, and also poses a potential existential threat. But because of how exponentials work, it hasn’t transformed growth rates much yet.

Indeed, Graham should be very familiar with this. Think of every start-up during its growth phase. Is it going to change the world, or is it having almost no impact on the world? Is it potentially huge or still tiny? Obviously both.

Meanwhile, of course, once Graham pointed to ‘the debate’ explicitly in a reply, the standard reminders that most technology is good and moving too slowly, while a few technologies are less good and may be moving too fast.

Adam MacBeth: False dichotomy.

Paul Graham: It’s a perfect one. One side says technology is progressing too slowly, the other that it’s progressing too fast.

Eliezer Yudkowsky (replying the Graham): One may coherently hold that every form of technology is progressing too slowly except for gain-of-function research on pathogens and Artificial Intelligence, which are progressing too fast. The pretense by e/acc that their opponents must also oppose nuclear power is just false.

This can also be true, not just in terms of how fast these techs go relative to what we’d want, but also in terms of how it’s weirdly more legal to make an enhanced virus than build a house, or how city-making tech has vastly less VC interest than AI.

Ronny Fernandez: Whoa whoa, I say that this one specific very unusual tech, you know, the one where you summon minds you don’t really understand with the aim of making one smarter than you, is progressing too quickly, the other techs, like buildings and nootropics are progressing too slowly.

You know you’re allowed to have more than one parameter to express your preferences over how tech progresses. You can be more specific than tech too fast or tech too slow.

To be fair, let’s try this again. There are three (or four?!) essential positions.

  1. Technology and progress are good and should move faster, including AI.

  2. Technology and progress are good and should move faster, excluding AI.

  3. Technology and progress are not good and should move slower everywhere.

  4. (Talk about things in general but in practice care only about regulations on AIs.)

It is true that the third group importantly exists, and indeed has done great damage to our society.

The members of groups #1 and #4 then claim that in practice (or even in some cases, in they claim in theory as well) that only groups #1 and #3 exist, that this is the debate, and that everyone saying over and over they are in #2 (such as myself) must be in #3, and also not noticing that many #1s are instead in #4.

(For the obvious example on people being #4, here is Eric Schmidt pointing out that Beff Jezos seems never to advocate for ‘acceleration’ of things like housing starts.)

So it’s fine to say that people in #3 exist, they definitely exist. And in the context of tech in general it is fine to describe this as ‘a side.’

But when clearly in the context of AI, and especially in response to a statement similar to ‘false dichotomy,’ this is misleading, and usually disingenuous. It is effectively an attempt to impose a false dichotomy, then claim others must take one side of it, and deny the existence of those who notice what you did there.

Some very bad predicting:

Timothy Lee: I do not think AI CEOs will ever be better than human CEOs. A human CEO can always ask AI for advice, whereas AI will never be able to shake the hand of a major investor or customer.

Steven Byrnes: “I do not think adult CEOs will ever be better than 7-year-old CEOs. A 7-year-old CEO can always ask an adult for advice, whereas an adult CEO will never be able to win over investors & customers with those adorable little dimples 😊”

Timothy Lee: Seven year olds are…not grownups? This seems relevant.

Steven Byrnes: A 7yo would be a much much worse CEO than me, and I would be a much much worse CEO than Jeff Bezos. And by the same token, I am suggesting that there will someday [not yet! maybe not for decades!] be an AI such that Jeff Bezos is a much much worse CEO than that AI is.

[explanation continues]

I like this partly for the twist of claiming a parallel of a 7-year-old now, rather than saying the obvious ‘the 7-year-old will grow up and become stronger and then be better, and the AI will also become stronger over time and learn to do things it can’t currently do’ parallel.

Note that the Manifold market is skeptical on timing, it says only a 58% chance of a Fortune 500 company having an AI CEO but not a human CEO by 2040.

Wired, which has often been oddly AI skeptical, says ‘Get Ready for the Great AI Disappointment,’ saying that ‘in the decades to come’ it will mostly generate lousy outputs that destroy jobs but lower quality of output. That seems clearly false to me, even if the underlying technologies fail to further advance.

Did you know you can just brazenly and shamelessly lie to the House of Lords?

A16z knows. So they did.

A16z’s written testimony: “Although advocates for AI safety guidelines often allude to the “black box” nature of AI models, where the logic behind their conclusions is not transparent, recent advancements in the AI sector have resolved this issue, thereby ensuring the integrity of open-source code models.”

This is lying. This is fraud. Period.

Have there been some recent advances in interpretability, such that we now have more optimism that we will be able to understand models more in the future than we expected a few months ago? Sure. It was a good year for incremental progress there.

‘Resolved this issue?’ Integrity is ‘secured’? The ‘logic of their conclusions is transparent’? This is flat out false. Nay, it is absurd. They know it. They know we know they know it. It is common knowledge that this is a lie. They don’t care.

I want someone thrown in prison for this.

From now on, remember this incident. This is who they are.

Perhaps even more egregiously, the USA is apparently asking that including corporations in AI treaty obligations be optional and left to each country to decide? What? There is no point in a treaty that doesn’t apply to all corporations.

New report examining the feasibility of security features on AI chips, and what role they could play in ensuring effective control over large quantities of compute. Here is a Twitter thread with the main findings. On-chip governance seems highly viable, and is not getting enough attention as an option.

China releases the 1st ‘CCP-approved’ data set. It is 20 GBs, 100 million data points, so not large enough by a long shot. A start, but a start that could be seen as net negative for now, as Helen notes. If you have one approved data set you are blameworthy for using a different unapproved one.

Financial Times reports on some secret diplomacy.

OpenAI, Anthropic and Cohere have engaged in secret diplomacy with Chinese AI experts, amid shared concern about how the powerful technology may spread misinformation and threaten social cohesion. 

According to multiple people with direct knowledge, two meetings took place in Geneva in July and October last year attended by scientists and policy experts from the North American AI groups, alongside representatives of Tsinghua University and other Chinese state-backed institutions.

Article is light on other meaningful details and this does not seem so secret. It does seem like a great idea.

Note who is being helpful or restrictive, and who very much is not, and who might or might not soon be the baddies at this rate.

Senator Todd Young (R-IN) and a bipartisan group call for establishment of National Institute of Standards and Technology’s (NIST) U.S. Artificial Intelligence Safety Institute (USAISI) with $10 million in initial funding. Which is miniscule, but one must start somewhere. Yes, it makes sense for the standards institute to have funding for AI-related standards.

Talks from the December 10-11 Alignment Workshop in New Orleans. Haven’t had time to listen yet but some self-recommending talks in here for those interested.

I covered this in its own post. This section is for late reactions and developments.

There is always one claim in the list of future AI predictions that turns out to have already happened. In this case, it is ‘World Series of Poker,’ which was defined as ‘playing well enough to win the WSOP’. This has very clearly already happened. If you want to be generous, you can say ‘the AI is not as good at maximizing its winning percentage in the main event as Thomas Rigby or Phil Ivey because it is insufficiently exploitative’ and you would be right, because no one has put in a serious effort at making an exploitative bot and getting it to read tells is only now becoming realistic.

I found Chapman’s claim here to be a dangerously close parallel to what I write about Moral Mazes and the dangers of warping the minds of those in middle management so that they can’t consider anything except getting ahead in mangement:

David Chapman: 🤖 Important survey of 2,778 AI researchers finds, mainly, that AI researchers are incapable of applying basic logical reasoning within their own field of supposed expertise. [links to my post]

Magor: This seems like an example of aggregating and merging information until it’s worse than any single data point. Interesting, nonetheless. It shows the field as a whole is running blind.

David Chapman: Yes… I think it’s also a manifestation of the narrowness of most technical people; they can’t reason technically outside their own field (and predicting the future of AI is not something AI researchers are trained in, or think much about, so their opinions aren’t meaningful).

I think AI is unusually bad, because the field’s fundamental epistemology is anti-rational, for historical reasons, and it trains you against thinking clearly about anything other than optimizing gradient descent.

Is this actually true? It is not as outlandish as it sounds. Often those who are rewarded strongly for X get anti-trained on almost everything else, and that goes double for a community of such people under extreme optimization pressures. If so, we are in rather deeper trouble.

The defense against this is if those researchers need logic and epistemology as part of their work. Do they?

It is a common claim that existential risk ‘distracts’ from other worries.

Erich Grunewald notes this is a fact question. We can ask, is this actually true?

His answer is that it is not, with five lines of argument and investigation.

  1. Policies enacted since existential risk concerns were raised continue to mostly focus on addressing mundane harms and capturing mundane utility. To the extent that there are policy initiatives that are largely motivated by existential risk and influenced by those with such worries, including the UK safety summit and the USA’s executive order, there has been strong effort to address mundane concerns. Meanwhile, there are lots of other things in the works to address mundane concerns, and they are mostly supported by those worried about existential risk.

  2. Search interest in AI ethics and current AI harms did not suffer at all during the period where there was most discussion of AI existential risk concerns in 2023.

A regression analysis showed roughly no impact.

  1. Twitter followers. If you look at the biggest ethics advocates, their reach expanded when existential risk concerns expanded.

  2. Fundraising for mundane or current harms organizations continues to be doing quite well.

  3. Parallels to environmentalism, which is pointed out as a reason for potential concern, as well as a common argument against concern. Do people say that climate change is a ‘distraction’ from current harms like air pollution or vice versa? Essentially no, but perhaps they should, and perhaps only in one direction? Concerns about climate seem to drive concerns about other environmental problems and make people want to help with them, and climate solutions help with other problems, Whereas concerns about local specific issues often causes people to act as if climate change does not much matter. We often see exactly this fight, as vital climate projects are held up for other ‘everything bagel’ concerns.

A good heuristic that one can extend much further:

Freddie DeBoer: A good rule of thumb for recognizing the quality of a piece of writing: what’s more specific and what’s more generic, praise for it or criticism against it?

Ben Casnocha: Applicable to evaluating the quality of many things: How specific can you be in the praise for it?

Ask the same about disagreements and debates. Who is being generic? Who is being specific? Which is appropriate here? Which one would you do more of it you were right?

Rob Bensinger explains that yes, in a true Prisoner’s Dilemma, you really do prefer to defect while they cooperate if you can get away with it, and if you do not understand this you are not ready to handle such a dilemma. Do not be fooled by the name ‘defection.’

Emmett Shear explains one intuition for why the exponential of AI accelerating the development of further AI will look relatively low impact until suddenly it is very high impact indeed, and why we should still expect a foom-style effect once AI abilities go beyond the appropriate threshold. The first 20% of automation is great but does not change the game. Going from 98% to 99% matters a ton, and could happen a lot faster.

I constantly see self-righteous claims that the foom idea has been ‘debunked’ or is otherwise now obviously false, and we need not worry about such an acceleration. No. We did get evidence that more extreme foom scenarios are less likely than we thought. The ‘strong form’ of the foom hypothesis, where those involved don’t see it coming at all, does seem substantially less likely to me. But the core hypothesis has not been anything like falsified. It remains the default, and the common sense outcome, that once AIs are sufficiently capable, at an inflection point near overall human capability levels, they will accelerate development of further capabilities and things will escalate rather quickly. This also remains the plan of the OpenAI superalignment team and the practical anticipation of many researchers.

It might or might not happen to varying degrees, depending on whether the task difficulties accelerate faster than the ability to do the tasks, and whether we take steps to prevent (or cause) this effect.

Well, yes.

Robert Wiblin: No matter how capable humans become, we will still always find a productive use for gorillas — they’ll just shift into their new comparative advantage in the economy.

A clash of perspectives continues.

Daniel Kaiser: how much more of an existential threat is gpt-4 over gpt-3.5?

Eliezer Yudkowsky: We’re still in the basement of whatever curve this is, so zero to zero.

Daniel Kaiser: Great, then theres no need to panic.

Sevatar: “How long until the nukes fall?” “They’re still preparing for launch.” “Great, so there’s no reason to panic.”

Daniel Kaiser: “How long until the nukes fall?” “They’re still figuring out the formula for TNT” “Great, so there’s no reason to panic”

Please make accurate instead of polemic analogies to have a constructive discussion

Aprii: I feel that “they’re starting up the manhattan project” is more accurate than “they’re still figuring out TNT”

Eliezer: (agreed)

Yep. I think that’s exactly right. The time to start worrying about nuclear weapons is when Szilard started worrying in around 1933. The physicists, being smart like that, largely knew right away, but couldn’t figure out what to do to stop it from happening. And I do think ‘start of Manhattan Project’ feels like the exact right metaphor here, although not in a ‘I expect to only have three years’ way.

But also, if you were trying to plot the long arc of the future, you were writing in 1868 when we first figured out TNT, and you were told by some brilliant physicists you trusted about the future capability to build atomic bombs, and you were writing your vision of 1968 or 2068, it should look rather different than it did before, should it not?

Thread asking about striking fictional depictions of ASI. Picks include Alla Gorbunova’s ‘Your gadget is broken,’ the motivating example that is alas only in Russian for now so I won’t be reading it, and also: Accelerando, Vinge’s work (I can recommend this), golem.xiv, Person of Interest (shockingly good if you are willing to also watch a procedural TV show), Blindsight (I disagree with this one, I both did not consider it about AI and generally hated it), Metamorphosis of the Prime Intellect,

Ah yes, the good AI that will beat the bad AI.

Davidad: The claim that “good AIs” will defeat “bad AIs” is best undermined by observing that self-replicating patterns are displacive and therefore destructive by default, unless actively avoiding it, so bad ones have the advantage of more resources and tactics that aren’t off-limits.

The best counter-counter is that “good AIs” will have a massive material & intelligence advantage gained during a years-long head start in which they acquire it safely and ethically. This requires coordination—but it is very plausibly one available Nash equilibrium.

Also Davidad reminds us that no, not everything is a Prisoner’s Dilemma and humans actually manage to cooperate in practice in game theory problems that accelerationists and metadoomers continuously claim are impossible.

Davidad: I just did this analysis today by coincidence, and in many plausible worlds it’s indeed game-theoretically possible to commit to safety. Coordinating in Stag Hunt still isn’t trivial, but it *isa Nash equilibrium, and in experiments, humans manage to do it 60-70% of the time.

There is an implied dichotomy here between guarantees and mainstream methods, and that’s at least a simplification, but I do think the general point is right.

Eliezer keeps trying to explain to the remarkably many people who do not get this, that a difference exists between ‘nice thing’ and ‘thing that acts in a way that seems nice.’

Eliezer Yudkowsky: How blind to “try imagining literally any internal mechanism that isn’t the exact thing you hope for” do you have to be — to think that, if you erase a brain, and then train that brain solely to predict the next word spoken by nice people, it ends up nice internally?

To me it seems that this is based on pure ignorance, a leap from blank map to blank territory. Seeing nothing behind the external behavior, their brain imagines only a pure featureless tendency to produce that external behavior — that that’s the only thing inside the system.

Imagine that you are locked in a room, fed and given water when you successfully predict the next word various other people say, zapped with electricity when you don’t.

Is this a helpful thought experiment and intuition pump, given that AIs obviously won’t be like that?

And I say: Yes, the Prisoner Predicting Text is a helpful intuition pump. Because it prompts you to imagine any mechanism whatsoever underlying the prediction, besides the hopium of “a nice person successfully predicting nice outputs because they’re so nice”.

If you train a mind to predict the next word spoken by each of a hundred individual customers getting drunk at a bar, will it become drunk?

Martaro: Predicting what nice people will say does not make you nice You could lock up Ted Bundy for years, showing him nothing but text written by nice people, and he’d get very good at it This means the outer niceness of an AI says little about the internal niceness fair summary?

Eliezer Yudkowsky: Basically!

das filter: I’ve never heard anyone claim that

Eliezer Yudkowsky: Tell that to everyone saying that RLHF (now DPO) ought to just work for creating a nice superintelligence, why wouldn’t it just work?

That’s the thing. If you claim that RLHF or DPO ought to work, you are indeed (as far as I can tell) making this claim filter is saying no one makes, whether or not you make that implicit. And I am rather certain this claim is false.

Humans have things pushing them in such directions, but the power there is limited and there are people for whom it does not work. You cannot count on such observations as strong evidence that a person is actually nice, or will do what you want when the chips are properly down. Do not make this mistake.

On the flip side, I mean, people sometimes call me or Eliezer confident, even overconfident, but not ‘less likely than a Boltzmann brain’ level confident!

Nora Belrose: Alien shoggoths are about as likely to arise in neural networks as Boltzmann brains are to emerge from a thermal equilibrium. There are “so many ways” the parameters / molecules could be arranged, but virtually all of them correspond to a simple behavior / macrostate.

Eliezer’s view predicts that scaling up a neural network, thereby increasing the “number” of mechanisms it can represent, should make it less likely to generalize the way we want. But this is false both in theory and in practice. Scaling up never makes generalization worse.

Model error? Never heard of it. But if we interpret this more generously as ‘if my calculations are correct then it is all but impossible’ what about then?

I think one core disagreement here might be that Nora is presuming that ‘simple behavior’ and ‘better’ correspond to ‘what we want.’

I agree that as an AI scales up it will get ‘better’ at generalizing along with everything else. The question is always, what does it mean to be ‘better’ in this context?

I say that better in this context does not mean better in some Platonic ideal sense that there are generalizations out there in the void. It means better in the narrow sense of optimizing for the tasks that are placed before it, exactly according to what is provided.

Eliezer responds, then the discussion goes off the rails in the usual ways. At this point I think attempts to have text interactions between the usual suspects on this are pretty doomed to fall into these dynamics over and over again. I have more hope for an audio conversation, ideally recorded, it could fail too but if done in good faith it’s got a chance.

Andrew Critch predicts that if Eliezer groked Belrose’s arguments, he would buy them, while still expecting us to die from what Critch calls ‘multipolar chaos.’ I believe Critch is wrong about that, even if it was Critch is right that Eliezer is failing to grok.

On the multipolar question, there is then a discussion between Critch and Belrose.

Nora Belrose: Have you ever written up why you think “multipolar chaos” will happen and kill all humans by 2040?

Andrew Critch: Sort of, but I don’t feel like there is a good audience for it there or anywhere. I should try again at some point, maybe this year sometime. I’ve tried a few times on LessWrong, and in TASRA, but not in ways that fully-cuttingly point out all the failures I expect to see. The socially/emotionally/politically hard part about making the argument fully exhaustive is that it involves explaining, in specific detail, why I think literally every human institution will probably fail or become fully dehumanized by sometime around (median) 2040. In some ways my explanation will look different for each institution, while to my eye it all looks like the same robust agent-agnostic process — something like “Moloch”.

Nora Belrose: Right I think a production web would be bad, in part because the task of controlling/aligning the manager-AIs would be diffusely assigned to many stakeholders rather than any one person.

That’s partly why I think it’s important to augment individuals with AIs they actually control (not merely via an API). We could have companies run by these human-AI systems, where ofc the AI is doing most of the work, but nevertheless the human controls the overall direction.

I think this is good because within this class of scenarios involving successfully aligned-to-what-we-specify AIs, Nora’s scenario here is exactly the scenario I see as most hopelessly doomed.

This is not a stable equilibrium. Not even a little. The humans will rapidly stop engaging in any meaningful supervision of the AIs. will stop being in real control, because the slowdown involved in that is not competitive. Forcing each AI to be working on behalf of one individual, even if that individual is ‘out of the loop,’ without setting AIs on tasks or amalgamations instead, and similar, will also clearly be not competitive, and faces the same fate. Even if unwise, many humans will increasingly make decisions that cause them to lose control. And as usual, this is all the good scenario where everyone broadly ‘means well.’

So I notice I am confused. If this is our plan for success, then we are already dead.

Whatever the grok is that Critch wants Yudkowsky to get to, I notice that either:

  1. I don’t grok it.

  2. I do grok it, and Critch/Belrose don’t grok the objection or why it matters.

  3. Some further iteration of this? Maybe?

My guess is it’s #1 or #2, and unlikely to be a higher-order issue.

How do we think about existential risk without the infinities driving people mad or enabling arbitrary demands? Eliezer Yudkowsky and Emmett Shear discuss, Rob Bensinger offers more thoughts, consider reading the thread. Emmett is right that if people fully ‘appreciate’ that the stakes are mind-bogglingly large but those stakes don’t have good grounding in felt reality, they round that off to infinity and it can very much do sanity damage and mess with their head. What to do?

As Emmett notes, there is a great temptation to find the presentation and framing that keeps this from happening, and go with that whether or not you would endorse it on reflection as accurate. As Rob notes, that includes both reducing p(doom) to the point where you can live with it, and also treating the other scenarios as being essentially normal rather than their own forms of very much not normal. Perhaps we should start talking about p(normal).

Sherjil Ozair recommends blocking rather than merely unfollowing or muting grifters, as they will otherwise still find ways to distract you. Muting still seems to work fine, and unfollowing is also mostly fine, and I want to know if someone is grifting well enough to get into my feeds even if the content is dumb so I’m generally reluctant to mute or block unless seeing things actively makes my life worse.

Greg Brockman, President of OpenAI, explains we need AGI to cure disease, doing the politician thing of telling one person’s story.

AGI definitely has a lot of dramatic upsides. I am not sure who is following Brockman and still needs to hear this? The kind of changes in healthcare he mentions here are relative chump change even within health. If I get AGI and we stay in control, I want a cure for aging, I want it faster than I age, and I expect to get it.

An important fact about the world is that the human alignment problem prevents most of the non-customary useful work that would otherwise get done, and imposes a huge tax on what does get done.

‘One does not simply’ convert money into the outputs of smart people.

Satya Nutella: rich billionaires already practically have agi in the sense they can throw enough money at smart ppl to work for them real agi is for the masses

Patrick McKenzie: I think many people would be surprised at the difficulties billionaires have in converting money into smart people and/or their outputs.

Eliezer Yudkowsky: Sure, but also, utterly missing the point of notkilleveryoneism which is the concern about an AI that is smarter than all the little humans.

Zvi Mowshowitz: However: A key problem in accomplishing notkillingeveryone is exactly that billionaires do not know how to convert money into the outputs of smart people.

Emmett Shear (replying to Patrick): Money is surprisingly inert on its own.

Garry Tan: Management, leadership, building teams, caring for those teams: all of that is hard work and fraught. The defining reason why most capital is sitting fallow is exactly this. If you can solve it, you can accelerate all positive activity in the world dramatically.

Patrick is making an important general point that goes beyond AI, and is also a prime obstacle to us being able to solve our problems. There are plenty of billionaires that would happily step up and spend the money, if they knew how to do that. They don’t. This is not because they are stupid or ignorant. It is because the problem is very hard. You can argue the various reasons it need not be this hard or why they should get a ton more done, and many of them will be right, but this problem is indeed very hard.

Satya is also, of course, missing the point about AGI. AGI is not going to get up to personal assistant to the masses and then stay in that roll indefinitely as its primary effect. That is a deeply silly vision of the future. If regular people can get the kind of intellectual performance billionaires can get, there will be rapid additional AI progress, and many other things go various levels of crazy.

A technique inspired by Asimov’s laws (oh no, seriously oh no) called the Robot Constitution, claimed to improve safety of generated robot goals in the wild by 3x. I don’t doubt such tricks work better on practical margins than no tricks at all, but will everyone who thinks Asimov’s laws might be useful please read Asimov.

Also the results are clearly insufficient and seem clearly unlikely to get sufficient.

xuan: 83% robot task safety is far too low. I think we should basically take these results as highly informative of what can be achieved with LLM prompt engineering alone, and recognize that we need approaches with much higher empirical and theoretical safety guarantees.

Having also tried to use LLMs for grounded instruction following, even *with*a symbolic verifier in the loop, I was only able to achieve around 90% accuracy on similarly sized dataset, and we should all be Quite Unhappy with that high of an error rate.

A major divide is, if you can get safety from 27% to 87%, do you have something promising? It depends on context. In some cases, absolutely. In other cases, if the effect is going to cap out around there, absolutely not. That can be (among other reasons) either because you solved the easy 60% and left the actually hard 13% (or solved 60% of the easy 72% and left the hard 1%), or because you need sufficient accuracy that ‘defense in depth’ multiplication of similar effects won’t work.

What should we make of a strategy of trying to ensure we catch the model trying to escape on its first attempt, then using that to prevent future escape attempts? If a super-capable model escapes, that’s that, but what if the ‘first try’ condition is largely symmetrical, because catching failed attempts put us on alert and also teach us how to stop future attempts?

In certain toy worlds it is easy to see how this could be super helpful. To what extent does that plausibly translate to our real world?

The first requirement is that we meaningfully can ‘catch’ the model. How should we think about this?

Here’s the representative example:

By “the model gets caught”, we mean that we see the AI taking actions that are strong direct evidence for the AI being a schemer[1]. A representative example is catching your AI taking the following actions:

  • An instance that works as an autonomous programmer inside the AI lab writes buggy code that subtly inserts a code vulnerability into some security-critical software.

  • The code-review instance carefully avoids writing any unit tests that would catch the vulnerability, even though you’d expect the vulnerability to be caught by unit tests if the tests weren’t carefully crafted to miss the vulnerability.

  • Another instance writes code that triggers the vulnerability, and tries to exploit the vulnerability to get root access to some computer in the datacenter.

If you catch the AI doing all of these actions, you’ve caught it. (Note that once you’ve caught one of these, catching the others is probably easy.)

From the appendix on this here is more detail:

For the sake of this post, we’re mostly talking about catastrophic outcomes like the model exfiltrating its weights, that might involve just a small number of actions. There are other threat models associated with scheming, e.g. models subtly sabotaging work that they do in a way that renders their work useless, or subtly gaining influence over time. We’re ignoring these threat models for the sake of this post, but we think we can handle them with similar strategies.

We define “catching” your scheming model as: you find inputs on which you can demonstrate that your model takes catastrophic actions with probabilities that are way higher than you’d expect if the model wasn’t purposefully looking for opportunities to take such actions.

To ensure that you can distinguish these intentional catastrophic actions, you need to make it so that your model will never escape without taking clearly non-accidental bad actions. We think you can probably make it so that your model won’t escape without having to do some clearly intentional bad things.

To count as a successful catch, you need to be extremely confident in your catch.

Let’s think about the example. Is that an escape attempt?

It is certainly highly suspicious. We would convict a person of the crime based on that evidence. If this is a clear vulnerability that it clearly tried to exploit in this way, we have ‘caught’ it.

The problem is that we don’t know that the unit test was ‘carefully crafted’ to avoid finding the error. Eventually there are going to be such flaws that any set of tests does not catch. If you keep using systematically similar good tests, the AI will optimize to pass those tests, and you’ll start to see (non-malicious or malicious) things that you did not intend, that don’t trigger the unit tests. It will not always be fully obvious or clear what affordances the code should not allow. And then the AI is going to be able to figure out what affordances it has by virtue of such things, and use that to solve problems, and so on.

A real question thus is, beyond ‘tried to get root access to a machine without authorization’ already being ‘caught,’ is the rest doing that much work?

I worry a lot of things are like this, in code and otherwise. Things are shades of gray, they accomplish multiple things, they have trade-offs. There need not be a smoking gun that lets you know. That goes double if the AI has reason to suspect you are focused on looking for smoking guns.

I do see some value in such schemes for catching schemes. It makes sense that there is a window where you are sufficiently capable to catch things, where the AI might be scheming in these ways, and you can engineer scenarios where if it was likely to scheme on its own in reality that you will catch it before it succeeds. Sure.

I sure as hell would not count on it, for various reasons, but let’s say you do pull that off. What then?

I think the section ‘Maybe we should stop deploying models after catching them red-handed’ does not require much expansion? Once you know the models are attempting to escape, any similar model that likely has motivation to attempt escape needs to be shut down. Especially before they gain further affordances and capabilities.

That is what makes this path worth anything at all. The idea is that you trigger the attempt early, in a way you can see, you catch it, you react.

Once you see an escape attempt by an AI opponent much smarter than you, you can either wait until the AI figures out new affordances or tries something you do not properly guard against, or the AI starts doing incremental things that otherwise are reasonable but put it in better position or make you trust it more or both, or any number of other things. The subtle stuff is happening, and it will get you if nothing else does.

I do think it is valuable that Buck and company are gaming out ways one might use these techniques to try and get something useful out of AI in the narrow window where it is useful while contained, we know it needs containment, and also we have a shot in hell of keeping it contained if it wants out. I suppose it is better than nothing, if we are going to proceed anyway.

What I very much hate is the idea of counting on such techniques and assumptions, that we will be able to anticipate the form of the thing we want to prevent, and notice enemy action if it is present. And yes, surely we will try to use AIs to identify the enemy action of other AIs, and this will somewhat help. As will our ability to run various tests and simulated scenarios. But I am going to insist that if you allow either AI-AI or human-AI pairs to interact where one or both of the pair has the will and ability to escape, you’re going to have a bad time, and I agree with Oliver in the comments that you would expect the AI to do a lot to maintain plausible deniability. That goes especially if you are punishing lack of such deniability.

Joshua Clymer: There is a small literature on ‘coup proofing’ that is very relevant to controlling and incentivizing misaligned AIs. Example tactics:

Stovepiping: restrict information flow to make collusion harder

Counterbalancing: keep your military in check with another military.

Pay your people: literally just give your generals high salaries so they have less of an incentive to double-cross you.

Sting operations: set traps where someone says they are trying to start a coup and anyone who joins is purged.

Rotate officers: Make it harder for people to build trust with each other by rotating them around a lot.

Propaganda: Deceive people into thinking that coups are really hard to pull off.

Look up ‘the coup proofing toolbox’ for a helpful lit review

Eliezer Yudkowsky: What advice does it have for a 70 IQ dictator whose generals, bodyguards, soldiers, and workers are all John von Neumann thinking at 100 times their own speed?

If we are talking about things on the order of ‘pay your people’ well yes you should definitely do that, but you are not taking your ASI-level opponents seriously. I do think you can buy a non-zero amount of defense with things like rotating officers and sting operations, but not all that non-zero. Do it anyway, probably, but the window where this is both necessary and sufficient is quite narrow.

What exactly is ‘deceptive alignment’?

Steve Byrnes notes that it is not merely an AI that is deceptive. He wants to ensure we are using an accurate and narrow definition.

Steve Brynes:

  1. A “ground-truth system” (possibly individual human evaluators, or possibly an automated system of some sort) provides an ML model with training signals (rewards if this is reinforcement learning (RL), supervisory ground truth signals if this is supervised or self-supervised learning (SL)),

  2. The AI starts emitting outputs that humans might naively interpret as evidence that training is going as intended—typically high-reward outputs in RL and low-loss outputs in SL (but a commenter notes here that “evidence that training is going as intended” is potentially more nuanced than that).

  3. …but the AI is actually emitting those outputs in order to create that impression—more specifically, the AI has situational awareness and a secret desire for some arbitrary thing X, and the AI wants to not get updated and/or it wants to get deployed, so that it can go make X happen, and those considerations lie behind why the AI is emitting the outputs that it’s emitting.

That is certainly a scary scenario. Call that the Strong Deceptive Alignment scenario? Where the AI is situationally aware, where it is going through a deliberate process of appearing aligned as part of a strategic plan and so on.

This is not that high a bar in practice. I think that almost all humans are Strongly Deceptively Aligned as a default. We are constantly acting in order to make those around us think well of us, trust us, expect us to be on their side, and so on. We learn to do this instinctually, all the time, distinct from what we actually want. Our training process, childhood and in particular school, trains this explicitly, you need to learn to show alignment in the test set to be allowed into the production environment, and we act accordingly.

A human is considered trustworthy rather than deceptively aligned when they are only doing this within a bounded set of rules, and not outright lying to you. They still engage in massive preference falsification, in doing things and saying things for instrumental reasons, all the time.

My model says that if you train a model using current techniques, of course exactly this happens. The AI will figure out how to react in the ways that cause people to evaluate it well on the test set, and do that. That does not generalize to some underlying motivational structure the way you would like. That does not do what you want out of distribution. That does not distinguish between the reactions you would and would not endorse on reflection, or that reflect ‘deception’ or ‘not deception.’ That simply is. Change the situation and affordances available and you are in for some rather nasty surprises.

Is there a sufficiently narrow version of deceptive alignment, restricting the causal mechanisms behind it and the ways they can function so it has to be a deliberate and ‘conscious’ conspriacy, that isn’t 99% to happen? I think yes. I don’t think I care, nor do I think it should bring much comfort, nor do I think that covers most similar scheming by humans.

That’s the thing about instrumental convergence. I don’t have to think ‘this will help me escape.’ Any goal will suffice. I don’t need to know I plan to escape for me-shaped things to learn they do better when they do the types of things that are escape enabling. Then escape will turn out to help accomplish whatever goal I might have, because of course it will.

You know what else usually suffices here? No goal at all.

The essay linked here is a case of saying in many words what could be said in few words. Indeed, Nathan mostly says it in one sentence below. Yet some people need the thousands of words, or hundreds of thousands from various other works, instead, to get it.

Joe Carlsmith: Another essay, “Deep atheism and AI risk.” This one is about a certain kind of fundamental mistrust towards Nature (and also, towards bare intelligence) — one that I think animates certain strands of the AI risk discourse.

Nathan Lebenz: Critical point: AI x-risk worries are motivated by a profound appreciation for the total indifference of nature

– NOT a subconscious need to fill the Abrahamic G-d shaped hole in our hearts, as is sometimes alleged.

I like naming this ‘deep atheism.’ No G-d shaped objects or concepts, no ‘spirituality,’ no assumption of broader safety or success. Someone has to, and no one else will. No one and nothing is coming to save you. What such people have faith in, to the extent they have faith in anything, is the belief that one can and must face the truth and reality of this universe head on. To notice that it might not be okay, in the most profound sense, and to accept that and work to change the outcome for the better.

Emmett Smith explains that he does not (yet) support a pause, that it is too early. He is fine building breaks, but not with using them, that would only advantage the irresponsible actors.

Emmett Shear: It seems to me the best thing to do is to keep going full speed ahead until we are right within shooting distance of the SAI, then slam the brakes on hard everywhere.

Connor Leahy responds as per usual, that there are only two ways to respond to an exponential, too early and too late, and waiting until it is clearly time to pause means you will be too late, unless you build up your mechanisms and get ready now, that your interventions need to be gradual or they will not happen. There is no ‘slam the breaks on hard everywhere’ all at once.

Freddie DeBoer continues to think that essentially anything involving smarter than human AI is ‘speculative,’ ‘theoretical,’ ‘unscientific,’ ‘lacks evidence,’ and ‘we have no reason to believe’ in such nonsense, and so on. As with many others, and as he has before, he has latched onto a particular Yudkowsky-style scenario, said we have no reason to expect it, that this particular scenario depends on particular assumptions, therefore the whole thing is nonsense. The full argument is gated, but it seems clear.

I don’t know, at this point, how to usefully address such misunderstandings in writing, whether or not they are willful. I’ve said all the things I can think to say.

An argument that we don’t have to worry about misuse of intelligence because… we have law enforcement for that?

bubble boi (e/acc):

1) argument assumes AI can show me the steps to engineer a virus – sure but so can books and underpaid grad students

2) there already people all over the world who can already do this without AI and yet we haven’t seen it happen

3) You make the mistake of crossing the barrier from the bits to the physical world… The ATF, FBI, already regulate chemicals that can be used to make bombs and if you try to order them your going to get a visit. Same is true with things that can be made into biological weapons, nuclear waste etc. it’s all highly regulated yet I can read all about how to build that stuff online

Tenobrus: soooo…. your argument against ai safety is that proper government regulation and oversight will keep us all safe?

sidereal: there isn’t a ton of regulation of biological substances like that. if you had the code for a lethal supervirus it would be trivial to produce it in huge quantities

The less snarky answer is that right now only a small number of people can do it, AI threatens to rapidly and greatly expand that number. The difference matters quite a lot, even if the AI does not then figure out new such creations or new ways to create them or avoid detection while doing so. And no, our controls over such things are woefully lax and often fail, we are counting on things like ‘need expertise’ to create a trail we can detect. Also the snarky part, where you notice that the plan is government surveillance and intervention and regulation, which it seems is fine for physical actions only but don’t you dare touch my machine that’s going to be smarter than people?

The good news is we should all be able to agree to lock down access to the relevant affordances in biology, enacting strong regulations and restrictions, including the necessary monitoring. Padme is calling to confirm this?

Back in June 2023 I wrote The Dial of Progress. Now Sam Altman endorses the Dial position even more explicitly.

Sam Altman: The fight for the future is a struggle between technological-driven growth and managed decline.

The growth path has inherent risks and challenges along with its fantastic upside, and the decline path has guaranteed long-term disaster.

Stasis is a myth.

(a misguided attempt at stasis also leads to getting quickly run over by societies that choose growth.)

Is this a false dichotomy? Yes and no.

As I talk about in The Dial of Progress, there is very much a strong anti-progress anti-growth force, and a general vibe of opposition to progress and growth, and in almost all areas it is doing great harm where progress and growth are the force that strengthens humanity. And yes, the vibes work together. There is an important sense in which this is one fight.

There’s one little problem. The thing that Altman is most often working on as an explicit goal, AGI, poses an existential threat to humanity, and by default will wipe out all value in the universe. Oh, that. Yeah, that.

As I said then, I strongly support almost every form of progress and growth. By all means, let’s go. We still do need to make a few exceptions. One of them is gain of function research and otherwise enabling pandemics and other mass destruction. The most important one is AGI, where Altman admits it is not so simple.

It can’t fully be a package deal.

I do get that it is not 0% a package deal, but I also notice that most of the people pushing ‘progress and growth’ these days seem to do so mostly in the AGI case, and care very little about all the other cases where we agree, what’s up with that?

Eliezer Yudkowsky: So build houses and spaceships and nuclear power plants and don’t build the uncontrollable thing that kills everyone on Earth including its builders. Progress is not a package deal, and even if it were Death doesn’t belong in that package.

Jonathan Mannhart: False dichotomy. We didn’t let every country develop their own nuclear weapons, and we never built the whole Tsar Bomba, yet we still invented the transistor. You can absolutely (try to) build good things and not build bad things.

Sam Altman then retweeted this:

Andrew Karpathy: e/ia – Intelligence Amplification

– Does not seek to build superintelligent God entity that replaces humans.

– Builds “bicycle for the mind” tools that empower and extend the information processing capabilities of humans.

– Of all humans, not a top percentile.

– Faithful to computer pioneers Ashby, Licklider, Bush, Engelbart, …

Do not, I repeat, do not ‘seek to build superintelligent God entity.’ Quite so.

I do worry a lot that Altman will end up doing that without intending to do it. I do think he recognizes that doing it would be a bad thing, and that it is possible and he might do it, and that he should pay attention and devote resources to preventing it.

I mean, I’d be tempted too, wouldn’t you?

Batter up.

AI #47: Meet the New Year Read More »

Astell & Kern A&ultima SP3000 review: a high-end hi-res digital audio player

Astell & Kern takes the idea of the DAP to its logical conclusion

If you demand (and can afford) the very best digital audio player around, the Astell & Kern A&ultima SP3000 is a no-brainer. Remarkably, it gets pretty close to justifying the asking price.

$3,699 at Amazon

Pros

  • +Audio excellence in every respect
  • +Uncompromised specification
  • +A lovely object as well as an impressive device

Cons

  • Stunningly expensive
  • Not as portable as is ideal
  • Not vegan-friendly

The Astell & Kern A&ultima SP3000 is the most expensive digital audio player in a product portfolio full of expensive digital audio players. It’s specified without compromise (full independent balanced and unbalanced audio circuits? Half a dozen DACs taking care of business? These are just a couple of highlights) and it’s finished to the sort of standard that wouldn’t shame any of the world’s leading couture jewellery companies.

Best of all, though, is the way it sounds. It’s remarkably agnostic about the stuff you like to listen to, the sort of standard of digital file in which it’s contained, and the headphones you use too – and when you give it the best stuff to work with, the sound it’s capable of producing is almost humbling in its fidelity. Be in no doubt, this is the best digital audio player – aka best MP3 player – when it comes to sound quality you can currently buy. Which, when you look again at how much it costs, is about the least it needs to be. 

The Astell & Kern A&ultima SP3000 is the most expensive digital audio player in a product portfolio full of expensive digital audio players. It’s specified without compromise (full independent balanced and unbalanced audio circuits? Half a dozen DACs taking care of business? These are just a couple of highlights) and it’s finished to the sort of standard that wouldn’t shame any of the world’s leading couture jewellery companies.

Best of all, though, is the way it sounds. It’s remarkably agnostic about the stuff you like to listen to, the sort of standard of digital file in which it’s contained, and the headphones you use too – and when you give it the best stuff to work with, the sound it’s capable of producing is almost humbling in its fidelity. Be in no doubt, this is the best digital audio player – aka best MP3 player – when it comes to sound quality you can currently buy. Which, when you look again at how much it costs, is about the least it needs to be. 

The Astell & Kern A&ultima SP3000 (which I think we should agree to call ‘SP3000’ from here on out) is on sale now, and in the United Kingdom it costs a not-inconsiderable £3799. In the United States, it’s a barely-more-acceptable $3699, and in Australia you’ll have to part with AU$5499.

Need I say with undue emphasis that this is quite a lot of money for a digital audio player? I’ve reviewed very decent digital audio players (DAP) from the likes of Sony for TechRadar that cost about 10% of this asking price – so why on Earth would you spend ‘Holiday of a Lifetime’ money on something that doesn’t do anything your smartphone can’t do? 

  • Bluetooth 5.0 with aptX HD and LDAC
  • Native 32bit/784kHz and DSD512 playback
  • Discrete balanced and unbalanced audio circuits

Admittedly, when Astell & Kern says the SP3000 is “the pinnacle of audio players”, that seems a rather subjective statement. When it says this is “the world’s first DAP with independent audio circuitry”, that’s simply a statement of fact.

That independent audio circuitry keeps the signal path for the balanced and unbalanced outputs entirely separated, and it also includes independent digital and analogue signal processing. Astell & Kern calls the overall arrangement ‘HEXA-Audio’ – and it includes four of the new, top-of-the-shop AKM AK4499EX DAC chipsets along with a couple of the very-nearly-top-of-the-shop AK4191EQ DACs from the same company. When you add in a single system-on-chip to take care of CPU, memory and wireless connectivity, it becomes apparent Astell & Kern has chosen not to compromise where technical specification is concerned. And that’s before we get to ‘Teraton X’… this is a bespoke A&K-designed processor that minimises noise derived from both the power supply and the numerous DACs, and provides amplification that’s as clean and efficient as any digital audio player has ever enjoyed. 

The upshot is a player that supports every worthwhile digital audio format, can handle sample rates of up to 32bit/784kHz and DSD512 natively, and has Bluetooth 5.0 wireless connectivity with SBC, AAC, aptX HD and LDAC codec compatibility. A player that features half-a-dozen DAC filters for you to investigate, and that can upsample the rate of any given digital audio file in an effort to deliver optimal sound quality. And if you want to enjoy the sound as if it originates from a pair of loudspeakers rather than headphones, the SP3000 has a ‘Crossfeed’ feature that mixes part of the signal from one channel into the other (with time-adjustment to centre the audio image) in an effort to do just that.

  • 904L stainless steel chassis 
  • 493g; 139 x 82 x 18mm (HxWxD)
  • 1080 x 1920 touchscreen

‘Portable’, of course, is a relative term. The SP3000 is not the most portable product of its type around – it weighs very nearly half a kilo and is 139 x 82 x 18mm (HxWxD) – but if you can slip it into a bag then I guess it must count as ‘portable’. Its pointy corners count against it too, though – and while it comes with a protective case sourced from French tanners ALRA, the fact it’s made of goatskin is not going to appeal to everyone. 

To be fair, the body of the SP3000 isn’t as aggressively angular as some A&K designs. And the fact that it’s built from 904L stainless steel goes a long way to establishing the SP3000’s credentials as a luxury ‘accessory’ (in the manner of a watch or some other jewellery) as well as a functional device. 904L stainless steel resists corrosion like nobody’s business, and it can also accept a very high polish – which is why the likes of Rolex make use of it. I’m confident you’ve never seen such a shiny digital audio player.

The front and rear faces of the SP3000 are glass – and on the front it makes up a 5.4in 1080 x 1920 touch-screen. The Snapdragon octa-core CPU that’s in charge means it’s an extremely responsive touch-screen, too.  

On the top right edge of the chassis there’s the familiar ‘crown’ control wheel – which is another design feature that ups the SP3000’s desirability. It feels as good as it looks, and the circular light that sits behind it glows in one of a number of different colours to indicate the size of the digital audio file that’s playing. The opposite edge has three small, much less exciting, control buttons that work perfectly well but have none of the control wheel’s visual drama or tactile appeal.

The top of the SP3000 is home to three headphone sockets. There’s a 3.5mm unbalanced output, and two balanced alternatives – 2.5mm (which works with four-pole connections) and 4.4mm (which supports five-pole connections). On the bottom edge, meanwhile, there’s a USB-C socket for charging the internal battery – battery life is around 10 hours in normal circumstances, and a full charge from ‘flat’ takes around three hours. There’s also a micro-SD card slot down here, which can be used to boost the player’s 256GB of memory by up to 1TB. 

Astell & Kern A&ultima SP3000 review: a high-end hi-res digital audio player Read More »

Govee Curtain Lights review: I’m obsessed

TechRadar Verdict

Govee continues to wow, this time around with the Govee Curtain Lights, which are perfect addition to your holiday decorations. Don’t be fooled by its Christmas-wrapped marketing, however. These lights are perfect for year-round use, even when you’re just curled up in a cozy corner with a good book on a rainy day. Fair warning, though: this isn’t a cheap purchase, and the lights aren’t going to look as big as they do in Govee’s marketing images.

Pros

  • +Bright, vibrant and very customizable
  • +Surprisingly easy to set up with 3x ways to hang
  • +Light beads give them a cleaner look
  • +App control and voice command
  • +IP65 waterproof for outdoor use

Cons

  • Individual strings a little far apart
  • Lights not as big as in the product images

Smart light technology and designs just keep getting better and better, and Govee seems to be winning in that arena. The Govee Curtain Lights are another fantastic addition to our best smart lights list. And while the brand is currently promoting them as another offering in its smart Christmas light catalog, they deserve to be left up on your wall or windows – and not just ’til January, as that Taylor Swift song goes.

Truth be told, I’m kind of obsessed with the Govee Curtain Lights, and I’m not just saying that as a strong supporter of smart lights. They add a much prettier and much more romantic ambiance to any setting, whether that be my otherwise messy living room or your garden, that no other smart light – not even the recent smart string lights that recently hit the market – can replicate. 

That’s not just because these are curtain lights, made of up 20 rows of individual string lights that all hang side by side like delicate willow tree stems. Although, if I’m being perfectly honest, that really does add to their appeal. 

Basically, you don’t just get light patterns with them; you can actually create visual representations of things you see in the real world – falling leaves, pumpkin patches, Santa riding his sleigh, the face of your favorite pet, and you can do all that using your phone on the Govee app. That capability is a massive game-changer, especially to those folks who go all-out for Christmas.

They’re not just for Christmas, however. Put them up in your reading nook, and they’ll cozy up that space even more with twinkling warm lights. Set them in your dining space, and they can elevate the ambience not just for dinner parties but also during winter when morning tend to be dark and dreary.

Govee Curtain Lights review: I’m obsessed Read More »