Author name: Mike M.


AI #63: Introducing Alpha Fold 3

It was a remarkably quiet announcement. We now have Alpha Fold 3, which does a much improved job of predicting all of life’s molecules and their interactions. It feels like everyone including me then shrugged and went back to thinking about other things. No cool new toy for most of us to personally play with, no existential risk impact, no big trades to make, ho hum.

But yes, when we look back at this week, I expect what we remember will be Alpha Fold 3.

Unless it turns out that it is Sophon, a Chinese technique to potentially make it harder to fine tune an open model in ways the developer wants to prevent. I do not expect this to get the job done that needs doing, but it is an intriguing proposal.

We also have 95 theses to evaluate in a distinct post, OpenAI sharing the first draft of their model spec, Apple making a world class anti-AI and anti-iPad ad that they released thinking it was a pro-iPad ad, more fun with the mysterious gpt2, and more.

The model spec from OpenAI seems worth pondering in detail, so I am going to deal with that on its own some time in the coming week.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Agents, simple and complex.

  4. Language Models Don’t Offer Mundane Utility. No gadgets, no NPCs.

  5. GPT-2 Soon to Tell. Does your current model suck? In some senses.

  6. Fun With Image Generation. Why pick the LoRA yourself?

  7. Deepfaketown and Botpocalypse Soon. It’s not exactly going great.

  8. Automation Illustrated. A look inside perhaps the premiere slop mill.

  9. They Took Our Jobs. Or are we pretending this to help the stock price?

  10. Apple of Technically Not AI. Mistakes were made. All the feels.

  11. Get Involved. Dan Hendrycks has a safety textbook and free online course.

  12. Introducing. Alpha Fold 3. Seems like a big deal.

  13. In Other AI News. IBM, Meta and Microsoft in the model game.

  14. Quiet Speculations. Can we all agree that a lot of intelligence matters a lot?

  15. The Quest for Sane Regulation. Major labs fail to honor their commitments.

  16. The Week in Audio. Jack Clark on Politico Tech.

  17. Rhetorical Innovation. The good things in life are good.

  18. Open Weights are Unsafe and Nothing Can Fix This. Unless, maybe? Hmm.

  19. The Lighter Side. Mmm, garlic bread. It’s been too long.

How much utility for how much cost? Kapoor and Narayanan argue that with the rise of agent-based systems, you have to evaluate different models on coding tasks based on dollar cost versus quality of results. They find that a simple ‘ask GPT-4 and turn the temperature slowly up on retries if you fail’ approach is as good as the agents they tested on HumanEval, while costing less. They mention that perhaps it is different with harder and more complex tasks.
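To make that baseline concrete, here is a minimal sketch of the retry strategy, assuming the OpenAI Python SDK; the model name, temperature schedule, retry cap, and the run_tests callback are illustrative assumptions, not the authors’ exact setup.

```python
# Hedged sketch: ask the model, and on failure retry at gradually higher
# temperature. Assumes the OpenAI Python SDK (>=1.0) and a caller-supplied
# run_tests() check (e.g. HumanEval-style unit tests).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def solve_with_retries(prompt: str, run_tests, max_tries: int = 5):
    for attempt in range(max_tries):
        temperature = min(0.1 + 0.2 * attempt, 1.0)  # slowly turn the temperature up
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        candidate = response.choices[0].message.content
        if run_tests(candidate):
            return candidate
    return None  # every retry failed
```

The point of the comparison is that this loop has no planning, reflection, or tool use, yet on HumanEval it matched the agent frameworks they tested while costing less.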

How much does cost matter? If you are using such queries at scale without humans in the loop, or doing them in the background on a constant basis as part of your process, then cost potentially matters quite a bit. That is indeed the point of agents. Or if you are serving lots of customers constantly for lots of queries, those costs can add up fast. Thus all the talk about the most cost-efficient approach.

There are also other purposes for which cost at current margins is effectively zero. If you are a programmer who must evaluate, use and maintain the code outputted by the AI, what percentage of total costs (including your labor costs) are AI inference? In the most obvious baseline case, something akin to ‘a programmer asks for help on tasks,’ query speed potentially matters but being slightly better at producing good code, or even slightly better at producing code that is easier for the human to evaluate, understand and learn from, is going to crush any sane inference costs.
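As a rough back-of-envelope, under purely illustrative assumptions (the wage, usage rate, and token price below are made-up numbers, not quoted figures), inference is a rounding error next to the programmer’s time:

```python
# Back-of-envelope: what fraction of a programmer-hour is inference spend?
# All numbers are illustrative assumptions.
hourly_wage = 100.00                 # assumed fully loaded programmer cost, $/hour
queries_per_hour = 20                # assumed heavy assistant usage
tokens_per_query = 3_000             # assumed prompt + completion tokens
price_per_million_tokens = 30.00     # assumed blended frontier-model price

inference_per_hour = (queries_per_hour * tokens_per_query
                      * price_per_million_tokens / 1_000_000)
share = inference_per_hour / (hourly_wage + inference_per_hour)
print(f"${inference_per_hour:.2f}/hour of inference, {share:.1%} of total cost")
# -> $1.80/hour of inference, 1.8% of total cost
```

Run the same arithmetic with constant background queries and no human in the loop, and it flips; that is the agent case above, where cost genuinely bites.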

If I was paying by the token for my AI queries, and you offered me the option of a 100x cost increase that returned superior answers at identical speed, I would use the 100x costlier option for most purposes even if the gains were not so large.

Ethan Mollick is the latest to try the new AI mobile hardware tools and find them inferior to using your phone. He also discusses ‘copilots,’ where the AI goes ahead and does something in an application (or in Windows). Why limit yourself to a chatbot? Eventually we won’t. For now, it has its advantages.

Iterate until you get it right.

Michael Nielsen: There is a funny/striking story about former US Secretary of State Colin Powell – when someone had to make a presentation to him, he’d sometimes ask before they began: “Is this presentation the best you can do?”

They’d say “no”, he’d ask them to go away and improve it, come back. Whereupon he would ask again… and they might go away again.

I don’t know how often he did this, if ever – often execs want fast, not perfect; I imagine he only wanted “best possible” rarely. But the similarity to ChatGPT debugging is hilarious. “Is that really the answer?” works…

Traver Hart: I heard this same anecdote about Kissinger. He asked whether a written report was the best a staffer could do, and after three or so iterations the staffer finally said yes. Then Kissinger said, “OK, now I’ll read it.”

One obvious thing to do is automate this process. Then only show a human the output once the LLM confirms it was the best the model could do.
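A minimal sketch of that automation, assuming the OpenAI Python SDK; the model name, the wording of the challenge, and the round cap are illustrative assumptions rather than anyone’s published method.

```python
# Hedged sketch: draft an answer, then keep asking "is this really the best
# you can do?" and only surface the result to a human once the model says yes.
from openai import OpenAI

client = OpenAI()

def chat(messages):
    """One model call; returns the assistant's text."""
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

def best_you_can_do(task: str, max_rounds: int = 3) -> str:
    messages = [{"role": "user", "content": task}]
    draft = chat(messages)
    for _ in range(max_rounds):
        messages += [
            {"role": "assistant", "content": draft},
            {"role": "user", "content": (
                "Is this really the best you can do? If it is, reply with "
                "exactly DONE. Otherwise reply with an improved version.")},
        ]
        reply = chat(messages)
        if reply.strip() == "DONE":
            break  # the model says this draft is its best effort
        draft = reply  # treat the reply as the new draft and challenge again
    return draft
```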

Agent Hospital is a virtual world that trains LLMs to act as better doctors and nurses. They claim that after about ten thousand virtual patients the evolved doctors got state-of-the-art accuracy of 93% on a subset of MedQA covering major respiratory diseases. This seems like a case where the simulation assumes the facts you want to teach, avoiding the messiness inherent in the physical world. Still, an interesting result. File under ‘if you cannot think of anything better, brute force imitate what you know works. More Dakka.’

Do your homework for you, perhaps via one of many handy AI wrapper apps.

Find companies that do a lot of things that could be automated and would benefit from AI, do a private equity-style buyout, then have them apply the AI tools. One top reason to buy a company is that the new owner can break a bunch of social promises, including firing unnecessary or underperforming workers. That is a powerful tool when you combine it with introducing AI to replace the workers, which seems to be the name of the game here. I am not here to judge, and also not here to judge judgers.

Catholic.com ‘defrocks’ their AI pastor Justin, turning him into a regular Joe.

Want to use big cloud AI services? Good luck with the interface. Real builders are reporting trying to use Azure for basic things and being so frustrated they give up.

I know!

Marques Brownlee: On one hand: It seems like it’s only a matter of time before Apple starts making major AI-related moves around the iPhone and iOS and buries these AI-in-a-box gadgets extremely quickly

On the other hand: Have you used Siri lately?

Peter Wildeford: I am always baffled at how bad the current Alexa / Google Home / Siri are relative to what they should be capable of given GPT-4 level tech.

Kevin Fisher lists his six main reasons why we don’t have realistically behaving NPCs in games yet. They are essentially:

  1. Development cycles are long.

  2. Costs are still too high.

  3. Not the role the NPC has.

  4. Doesn’t fit existing game templates.

  5. Such NPCs are not yet compelling.

  6. We don’t have a good easy way to create the NPCs yet.

I would agree, and emphasize: Most games do not want NPCs that behave like people.

There are exciting new game forms that do want this. Indeed, if I got the opportunity to make a game today, it would have LLM NPCs as central to the experience. But that would mean, as Kevin suggests, building a new type of game from the ground up.

I do think you can mostly slot LLM-powered NPCs into some genres. Open world RPGs or MMOs are the most obvious place to start. And there are some natural fits, like detective games, or games where exploration and seeing what happens is the point. Still, it is not cheap to let those characters out to play and see what happens, and mostly it would not be all that interesting. When the player is in ‘gaming’ mode, the player is not acting so realistically. Having a ‘realistic’ verbal sparring partner would mostly cause more weirdness and perverse player behaviors.

I keep asking, but seriously, what is up with Apple, with Siri, and also with Alexa?

Modest Proposal: I am the last person to defend Apple but they spent more on R&D than Microsoft in the quarter and trailing twelve months. Their buyback is like one year of free cash flow. You can argue they are not getting a return on their R&D, but it’s not like they are not spending.

And sure, you can argue Microsoft is outsourcing a portion of its R&D to OpenAI, and is spending ungodly sums on capex, but Apple is still spending $30B on R&D. Maybe they should be spending more, maybe they should be inventing more, but they are spending.

Sam Altman asks: If an AI companion knows everything about you, do we need a form of protection to prevent it from being subpoenaed to testify against you in court?

I mean, no? It is not a person? It can’t testify? It can of course be entered into evidence, as can queries of it. It is your personal property, or that of a company, in some combination. Your files can and will be used against you in a court of law, if there is sufficient cause to get at them.

I can see the argument that if your AI and other tech is sufficiently recording your life, then to allow them to be used against you would violate the 5th amendment, or should be prevented for the same logical reason. But technology keeps improving what it records and we keep not doing that. Indeed, quite the opposite. We keep insisting that various people and organizations use that technology to keep better and better records, and ban people from using methods with insufficient record keeping.

So my prediction is no, you are not getting any privacy protections here. If you don’t want the AI used against you, don’t use the AI or find a way to wipe its memory. And of course, not using the AI or having to mindwipe it would be both a liability and hella suspicious. Some fun crime dramas in our future.

The Humane saga continues. If you cancel your order, they ask you why. Their wording heavily implies they won’t cancel unless you tell them, although they deny this, and Marques Brownlee Tweeted that they require a response.

Sam Altman confirms that gpt2-chatbot is not GPT-4.5, which is good for OpenAI since tests confirm it is a 4-level model. That still does not tell us what it is.

It was briefly gone from Arena, but it is back now, as ‘im-a-good-gpt2-chatbot’ or ‘im-also-a-good-gpt2-chatbot.’ You have to set up a battle, then reload until you get lucky.

This also points out that Arena tells you what model is Model A and what is Model B. That is unfortunate, and potentially taints the statistics.

Anton (@abacaj) points out that gpt2 is generating very particular error messages, so chances are very high it is indeed from OpenAI.

Always parse exact words.

Brad Lightcap (COO, OpenAI): In the next couple of 12 months, I think the systems we use today will be laughably bad. We think we’re going to move towards a world where they’re much more capable.

Baptiste Lerak: “In the next couple of 12 months”, who talks like that?

Well, there are two possibilities. Either Brad Lightcap almost said ‘next couple of months’ or he almost said ‘next couple of years.’ Place your bets. This is a clear intention to move to a GPT-5 worthy of the name within a year, but both ‘GPT-5 is coming in a few months but I can’t say that’ and ‘I don’t know if GPT-5 will be good enough to count as this but the hype must flow’ are on the table here.

Colin Fraser: Me 🤝 OpenAI execs

“GPT4 sucks and is not useful enough to be worth anything.”

That is not how I read this. GPT-4 is likely both laughably bad compared to GPT-5 and other future AIs, and also highly useful now. The history of technology is filled with examples. Remember your first computer, or first smartphone?

What to think of OpenAI’s move from ‘here’s a product’ to ‘here’s a future product’?

Gergely Orosz: OpenAI was amazing in 2022-2023 because they shipped a product that spoke for itself. Jaws dropped by those using it, and seeing it for themselves.

To see the company hype up future (unreleased) products feels like a major shift. If it’s that good, why not ship it, like before?

I’ve seen too many formerly credible execs hype up products that then underperformed.

These days, I ignore future predictions and how good a new product will be. Because usually this kind of “overhyping” is done with an agenda (e.g. fundraising, pressure on regulators etc).

Don’t forget that when execs at a company talk to the media: *there is always a business goal behind it.*

The reason is rarely to get current customers excited about something (that could be done with an email to them!)

This smells like OpenAI prepping for more fundraising.

Up to and including GPT-4 their execs didn’t talk up how good their next model would be. They released it and everyone could see for themselves.

This is the shift.

Stylus: Automatic Adapter Selection for Diffusion Models, to automatically select the right LoRAs for the requested task. Yes, obviously.

OpenAI talks various ways it is working on secure AI infrastructure, particularly to protect model weights, including using AI as part of the cyberdefense strategy. They are pursuing defense in depth. All net useful and great to see, but I worry it will not be enough.

OpenAI joins C2PA, the Coalition for Content Provenance and Authenticity. They have been using the C2PA metadata standard with DALL-E 3 already, and will also do so for Sora. They also announce a classifier with ~98% accuracy (~2% false negatives) in identifying DALL-E 3 generated images, with a ~0.5% false positive rate, rising to 5%-10% for AI-generated images from other models. It is accessible through their researcher access program. Interesting that this is actively not trying to identify other AI image content.

The easiest way to understand society’s pace of reaction to AI is this:

Miles Brundage: The fact that banks are still not only allowing but actively encouraging voice identification as a means of account log-in is concerning re: the ability of some big institutions to adapt to AI.

In particular my point is that the internal decision-making processes of banks seem broken since it is all but certain there are many people at these companies who follow AI and have tried to raise the alarm.

Btw I’m proud OpenAI recently was quite explicit on this point.

Voice authentication as viable security is deader than dead. Yet some of our biggest financial institutions continue to push it anyway.

When you say that we will adapt to AI-enabled threats, remember that this is us.

We are putting AI tags on things all over the place without asking, such as Dropbox automatically doing this for any images you upload.

Reminder that the ‘phone relative claiming you need bail money’ scam is old and usually does not involve AI. Voices are often easy to obscure if you act sufficiently hysterical. The good news is that they continue to mostly be massively incompetent, such as in this example, where Morgan also knew about the scam beforehand. The part where they mimic your voice is scary, but the actual threat is the rest of the package.

Brian Tinsman, former Magic: The Gathering designer, whose Twitter profile was last seen posting about NFTs, raises over a million dollars on Kickstarter for new CCG Wonders of the First. What is the twist? All the artwork is AI generated. It ‘builds on the legacy of past artists to produce original creations’ like ‘a student learning to paint by studying the masters.’

Many are not happy. I would not want to be someone trying to get picked by game stores with AI generated artwork in 2024.

Katy Perry and others are deepfaked attending the Met gala and looking gorgeous, and they went viral on various social media, fooling Perry’s mother. Harmless as such, but does not bode well.

Report there is a wave of social network channels full of… entirely fake recipes, voiced and likely written by AI, with millions of subs but no affiliate websites? Which means that for some reason people want to keep watching. They can’t look away.

The latest ‘LLMism’?

Kathleen Breitman: Is “as it was not appropriate” a GPT-ism? I’ve seen it twice in two otherwise awkward emails in the last six weeks and now I’m suspicious.

(No judgement on people using AI to articulate themselves more clearly, especially those who speak English as a second or third language, but I do find some of the turns of phrase distracting.)

How long until people use one AI to write the email, then another AI to remove the ‘AI-isms’ in the draft?

Remember that thing with the fake Sports Illustrated writers? (Also, related: remember Sports Illustrated?) Those were by a company called AdVon, and Maggie Harrison Dupre has more on them.

Maggie Harrison Dupre: We found AdVon’s fake authors at the LA Times, Us Weekly, and HollywoodLife, to name a few. AdVon’s fake author network was particularly extensive at the McClatchy media network, where we found at least 14 fake authors at more than 20 of its papers, including the Miami Herald.

Earlier in our reporting, AdVon denied using AI to generate editorial content. But according to insiders we spoke to, this wasn’t true — and in fact, AdVon materials we obtained revealed that the company has its own designated AI text generator.

That AI has a name: MEL.

In a MEL training video we obtained, an AdVon manager shows staffers how to create one of its lengthy buying guide posts using the AI writing platform. The article rings in at 1,800 words — but the only text that the manager writes herself is the four-word title.

“They started using AI for content generation,” the former AdVon worker told us, “and paid even less than what they were paying before.”

The former writer was asked to leave detailed notes on MEL’s work — feedback they believe was used to fine-tune the AI which would eventually replace their role entirely.

The situation continued until MEL “got trained enough to write on its own,” they said. “Soon after, we were released from our positions as writers.”

“I suffered quite a lot,” they added. “They were exploitative.”

Basically, AdVon engages in what Google calls “site reputation abuse”: it strikes deals with publishers in which it provides huge numbers of extremely low-quality product reviews — often for surprisingly prominent publications — intended to pull in traffic from people Googling things like “best ab roller.” The idea seems to be that these visitors will be fooled into thinking the recommendations were made by the publication’s actual journalists and click one of the articles’ affiliate links, kicking back a little money if they make a purchase.

It is ‘site reputation abuse’ and it is also ‘site reputation incineration.’ These companies built up goodwill through years or decades of producing quality work. People rely on that reputation. If you abuse that reliance and trust, it will quickly go away. Even if word does not spread, you do not get to fool any given person that many times.

This is not an attempt to keep the ruse up. They are not exactly trying hard to cover their tracks. The headshots they use often come from websites that sell AI headshots.

A list of major publications named as buyers here would include Sports Illustrated, USA Today, Hollywood Life, Us Weekly, the Los Angeles Times and Miami Herald. An earlier version of the site claimed placement in People, Parents, Food & Wine, InStyle and Better Homes and Gardens, among many others.

The system often spits out poorly worded incoherent garbage, and is known to, shall we say, make mistakes.

All five of the microwave reviews include an FAQ entry saying it’s okay to put aluminum foil in your prospective new purchase.

One business model in many cases was to charge a seller a ‘curation fee’ for placement of reviews of their product, payable when the post went live. It seems this actually does drive conversions, even if many people figure the ruse out and get turned off, so presumably brands will keep doing it.

There are two failure modes here. There is the reputation abuse, where you burn down goodwill and trust for short term profits. Then there is general internet abuse, where you don’t even do that, you just spam and forget, including hoping publications burn down their own reputations for you.

AdVon has now lost at least some of its clients, but the report says others including USA Today and Us Weekly are still publishing such work.

We should assume such problems will only get worse, at least until the point when we get automatic detection working on behalf of typical internet users.

What should we call all of this AI-generated nonsense content?

Simon Willison: Slop is the new name for unwanted AI-generated content.

Near: broadly endorse ‘slop’ as a great word to refer to AI-generated content with little craft or curation behind it. AI is wonderful at speeding up content creation, but if you outsource all taste and craft to it, you get slop.

I was previously favoring ‘drek’ and have some associational or overloading concerns with using ‘slop.’ But mostly it invokes the right vibes, and I like the parallel to spam. So I am happy to go with it. Unless there are good objections, we’ll go with ‘slop.’

OpenAI says their AI should ‘expand opportunity for everyone’ and that they respect the choices of creators and content owners, so they are building a media manager to let creators determine if they want their works included or excluded, with the goal to have this in place by 2025. This is progress, also a soft admission that they are, shall we say, not doing so great a job of this at present.

My intention is to allow my data to be used, although reasonable compensation would be appreciated, especially if others are getting deals. Get your high quality tokens.

Whoosh go all those jobs?

Zerohedge: BP NEEDS 70% FEWER THIRD-PARTY CODERS BECAUSE OF AI: CEO

Highest paid jobs about to be hit with a neutron bomb

Paul Graham: I’m not saying this is false, but CEOs in unsexy businesses have a strong incentive to emphasize how much they’re using AI. We’re an AI stock too!

Machine translation is good but not as good as human translation, not yet, once again: Anime attempts to use AI translation from Mantra, gets called out because it is so much worse than the fan translation, so they hired the fan translators instead. The problem with potentially ‘good enough’ automatic translation technology, like any inferior good, is that if available one is tempted to use it as a substitute. Whether or not a given executive understands this, translation of such media needs to be bespoke, or the media loses much of its value. The question is, how often do people want it enough to not care?

Manga Mogura: A Manga AI Localization Start-Up Company named Orange Inc. has raised around 19 million US dollars to translate up to 500 new manga volumes PER MONTH into english and launch their own e-book store ’emaqi’ in the USA in Summer 2024! Their goal is to fight piracy and increase the legally available manga for all demographics in english with their AI technology. Plans to use this technology for other languages exist too.

Luis Alis: What baffles me is that investors don’t grasp that if pirates could get away with translating manga using AI and MT, they would have done it already. Fan translations are still being done traditionally for a reason. Stop pumping money into these initiatives. They will fail.

Seth Burn: To be fair, some pirates have tried. It just didn’t work.

So Apple announced a new iPad that is technically thinner and has a better display than the old iPad, like they do every year, fine, ho hum, whatever.

Then they put out this ad (1 min), showing the industrial destruction of a wide variety of beloved things like musical instruments and toys (because they all go on your iPad, you see, so you don’t need them anymore) and… well… wow.

Colin Fraser: I’m putting together a team.

Trung Phan here tries to explain some of the reasons Apple got so roasted, but it does not seem like any explanation should be required. I know modern corporations are tone deaf but this is some kind of new record.

Patrick McKenzie: That Apple ad is stellar execution of a bad strategy, which is a risk factor in BigTech and exacerbated by some cultures (which I wouldn’t have said often include Apple’s) where after the work is done not shipping is perceived as a slight on the team/people that did the work.

One of the reasons founders remain so impactful is that Steve Jobs would have said a less polite version of “You will destroy a piano in an Apple ad over my dead body.”

(If it were me storyboarding it I would have shown the viscerally impactful slowed down closeup of e.g. a Japanese artisan applying lacquer to the piano, repeat x6 for different artifacts, then show they all have an iPhone and let audience infer the rest.)

After watching the original, cheer up by watching this fixed version.

The question is, does the fixed version represent all the cool things you can do with your iPad? Or, as I interpreted it, does it represent all the cool things you can do if you throw away your iPad and iPhone and engage with the physical world again? And to what extent does having seen the original change that answer?

It is hard when watching this ad not to think of AI, as well. This type of thing is exactly how much of the public turns against AI. As in:

Zcukerbrerg: Hmm.

Dan Hendrycks has written a new AI safety textbook, and will be launching a free nine week online course July 8-October 4 based on it. You can apply here.

It’s a bold strategy, Cotton.

Ethan Mollick: Thing I have been hearing from VCs: startup companies that are planning to be unicorns but never grow past 20 employees, using AI to fill in the gap.

Not sure if they will succeed, but it is a glimpse of a potential future.

Alpha Fold 3.

In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction we have doubled prediction accuracy.

It says more about us and our expectations than about AlphaFold 3 that most of us shrugged and went back to work. Yes, yes, much better simulations of all life’s molecules and their interactions, I’d say ‘it must be Tuesday’ except technically it was Wednesday. Actually kind of a big deal, even if it was broadly expected.

Here is Cleo Abram being excited and explaining in a one minute video.

As usual, here’s a fun question.

Eliezer Yudkowsky: People who claim that artificial superintelligences can’t possibly achieve X via biotechnology: What is the least impressive thing that you predict AlphaFold 4, 5, or N will never ever do? Be bold and falsifiable!

Concrete answers that weren’t merely glib:

  1. Design a safe medication that will reverse aging.

  2. 80% chance it won’t be able to build self-replicators out of quantum foam or virtual particles.

  3. They will never ever be able to recreate a full DNA sequence matching one of my biological parents solely from my own DNA.

  4. I do not expect any biotech/pharma company or researcher to deem it worthwhile to skip straight to testing a compound in animals, without in vitro experiments, based on a result from any version of AlphaFold.

  5. Play Minecraft off from folded proteins.

  6. Alphafold will never fold my laundry. (I laughed)

  7. Create biological life! 🧬

  8. Store our bioinfo, erase you, and reconstruct you in a different place or time in the future.

  9. It won’t be able to predict how billions of proteins in the brain collectively give rise to awareness of self-awareness.

Those are impressive things to be the least impressive thing a model cannot do.

IBM releases code-focused open weights Granite models of size 3B to 34B, trained on 500 million lines of code. They share benchmark comparisons to other small models. As usual, the watchword is wait for human evaluations. So far I haven’t heard of any.

Microsoft to train MAI-1, a 500B model. Marcus here tries to turn this into some betrayal of OpenAI. To the extent Altman is wearing boots, I doubt they are quaking.

Stack Overflow partners with OpenAI.

Meta spent what?

Tsarathustra: Yann LeCun confirms that Meta spent $30 billion on a million NVIDIA GPUs to train their AI models and this is more than the Apollo moon mission cost.

Ate-a-Pi: I don’t think this is true. They bought chips but they are the largest inference org in history. I don’t think they spent it all on training. Like if you did cost accounting. I’d bet the numbers don’t fall out on the training org.

Bingo. I had the exact same reaction as Ate. The reason you buy $30 billion in chips as Meta is mostly to do inference. They are going to do really a lot of inference.

Email from Microsoft CTO Kevin Scott to Satya Nadella and Bill Gates, from June 2019, explaining the investment in OpenAI as motivated by fear of losing to Google.

Could we find techniques for scaling LSTMs into xLSTMs that rival transformers? Sepp Hochreiter claims they are closing the gap to existing state of the art. I am skeptical, especially given some of the contextual clues here, but we should not assume transformers are the long term answer purely because they were the first thing we figured out how to scale.

IQ (among humans) matters more at the very top, say both a new paper and Tyler Cowen.

We document a convex relationship between earnings rank and cognitive ability for men in Finland and Norway using administrative data on over 350,000 men in each country: the top earnings percentile score on average 1 standard deviation higher than median earners, while median earners score about 0.5 standard deviation higher than the bottom percentile of earners. Top earners also have substantially less variation in cognitive test scores.

While some high-scoring men are observed to have very low earnings, the lowest cognitive scores are almost absent among the top earners. Overall, the joint distribution of earnings rank and ability is very similar in Finland and Norway.

We find that the slope of the ability curve across earnings ranks is steepest in the upper tail, as is the slope of the earnings curve across cognitive ability. The steep slope of the ability curve across the top earnings percentiles differs markedly from the flat or declining slope recently reported for Sweden.

This is consistent with increasing returns to intelligence, despite other factors including preferences, luck and deficits in other realms that can sink your income. It is inconsistent with the Obvious Nonsense ‘intelligence does not matter past 130’ story.

They are also consistent with a model that has two thresholds for any given activity.

  1. First, there is a ‘you must be at least this smart to do this set of tasks, hold this role and live this life.’

  2. Then, if you are sufficiently in advance of that, for some tasks and roles there is then increasing marginal returns to intelligence.

  3. If your role is fixed, then eventually there are decreasing returns: performance is already maximal, or the person becomes too bored and alienated, others around them conspire to hold them down, and they are not enabled to do the things that would allow further improvements, and are tied to one body.

  4. If your role is not fixed, then such people instead graduate to greater roles, or transform the situation entirely.

As many commentators point out, the surprising thing is that top earners are only one SD above median. I suspect a lot of this is that our tests noisily measure a proxy for the intelligence that counts, one which works well below or near the median and stops being that useful at the high end.

Tyler and the paper do not mention the implications for AI, but they are obvious and also overdetermined by many things, and the opposite of the implications of IQ not mattering above a threshold.

AI intelligence past human level will have increasing returns to scale.

Not technically about AI, but with clear implications: Tyler Cowen notices while reading the 1980 book The American Economy in Transition that economists in 1980 missed most of the important things that have happened since then, and were worried and hopeful about all the wrong things. They were worried about capital outflow, energy and especially American imports of energy, Europe catching up to us and our unwillingness to deal with inflation. They missed China and India, the internet, crypto, the fall of the Soviet Union, climate change, income inequality and financial crisis. They noticed fertility issues, but only barely.

If we don’t blame the economists for that, and don’t think such mistakes and recency bias could be expected to be avoided, then what does this imply about them being so dismissive about AI today, even in mundane utility terms?

Jim Fan notices that publicly available benchmarks are rapidly losing potency. There are two distinct things going on here. One is that the public tests are rapidly getting too easy. The other is that the data is getting more contaminated. New harder tests that don’t reveal their contents are the obvious way forward.

Ben Thompson looks at Meta’s financial prospects, this time sharing investor skepticism. All this focus on ad revenue and monetization is not fully irrelevant but feels like missing the point. There is a battle for the future going on here.

Another example of the ‘people are catching up to OpenAI’ perspective that seems like it is largely based on where OpenAI is in their update cycle, plus others not seeing the need to release chatbots in the 3-level days before they were worth anything.

DeepMind is honoring its commitments to the UK government to share models before deployment. Anthropic, OpenAI and Meta are not doing so.

Jack Clark of Anthropic says it is a ‘nice idea but very difficult to implement.’ I don’t buy it. And even if it is difficult to implement, well, get on that. In what way do you think this is an acceptable justification for shirking on this one?

Garrison Lovely: Seem bad.

Tolga Bilge: It is bad for top AI labs to make commitments on pre-deployment safety testing, likely to reduce pressure for AI regulations, and then abandon them at the first opportunity. Their words are worth little. Frontier AI development, and our future, should not be left in their hands.

Why is DeepMind the only major AI lab that didn’t break their word?

And I don’t get why it’s somehow so hard to provide the UK AI Safety Institute with pre-deployment access. We know OpenAI gave GPT-4 access to external red teamers months before release.

Oh yeah and OpenAI are also just sticking their unreleased models on the LMSYS Chatbot Arena for the last week…

Greg Colbourn: They need to be forced. By law. The police or even army need to go in if they don’t comply. This is what would be happening if the national security (aka global extinction) threat was taken seriously.

If frontier labs show they will not honor their explicit commitments, then how can we rely on them to honor their other commitments, or to act reasonably? What alternative is there to laws that get enforced? This seems like a very easy litmus test, which they failed.

Summarized version of my SB 1047 article in Asterisk. And here Scott Alexander writes up his version of my coverage of SB 1047.

House passes a bill requiring all AI-written regulatory comments to be labeled as AI-written. This should be in the ‘everyone agrees on this’ category.

A paper addresses the question of how one might write transparency reports for AI.

Jack Clark of Anthropic goes on Politico Tech. This strongly reemphasized that Anthropic is refusing to advocate for anything but the lightest of regulations, and it is doing so largely because they fear it would be a bad look for them to advocate for more. But this means they are actively going around saying that trying to do anything about the problem would not work and acting strangely overly concerned about regulatory capture and corporate concentrations of power (which, to be clear, are real and important worries).

This actively unhelpful talk makes it very difficult to treat Anthropic as a good actor, especially when they frame their safety position as being motivated by business sales. That is especially true when combined with failing to honor their commitments.

Sam Altman and I strongly agree on this very important thing.

Sam Altman: Using technology to create abundance–intelligence, energy, longevity, whatever–will not solve all problems and will not magically make everyone happy.

But it is an unequivocally great thing to do, and expands our option space.

To me, it feels like a moral imperative.

Most surprising takeaway from recent college visits: this is a surprisingly controversial opinion with certain demographics.

Prosperity is a good thing, actually. De-de-growth.

Yes. Abundance is good, actually. Creating abundance and human prosperity, using technology or otherwise, is great. It is the thing to do.

That does not mean that all uses of technology, or all means of advancing technology, create abundance that becomes available to humans, or create human prosperity. We have to work to ensure that this happens.

Politico, an unusually bad media actor with respect to AI and the source of most if not all the most important hit pieces about lobbying by AI safety advocates, has its main tech newsletter sponsored by ads for Meta, which is outspending such advocates by a lot. To be clear, this is not the new kind of ‘sponsored content’ written directly by Meta, only supported by Meta’s ads. Daniel Eth points out the need to make clear such conflicts of interest and bad faith actions.

Tamsin Leake, long a proponent of similar positions, reiterates their position that publicly sharing almost any insight about AI is net negative, and insights should only be shared privately among alignment researchers. Given I write these updates, I obviously strongly disagree. Instead, I think one should be careful about advancing frontier model training in particular, and otherwise be helpful.

I think there was a reasonable case for the full virtue of silence in a previous era, when one could find it very important to avoid drawing more eyes to AI, but the full version was a mistake then, and it is very clearly foolish now. The karma voting shows that LessWrong has mostly rejected Tamsin’s view.

We should stop fraud and cyberattacks, but not pretend that stops AI takeovers.

Davidad: When people list fraud at a massive scale as their top AI concern, some of my xrisk friends wince at the insignificance of massive fraud compared to extinction. But consider that con-artistry is a more likely attack surface for unrecoverable AI takeover than, say, bioengineering.

Cybersecurity right now might be a more likely attack surface than either, but in relative terms will be the easiest and first to get fully defended (cyberattack depends upon bugs, and bug-free SW & HW is already possible with formal verification, which will get cheaper with AI).

Eliezer Yudkowsky: This seems to me like failing to distinguish the contingent from the inevitable. If you keep making unaligned things smarter, there’s a zillion undefended paths leading to your death. You cannot defend against that by defending against particular contingent scenarios of fraud.

Davidad: Let it be known that I agree:

1. defenses that are specific to “fraud” alone will fail to be adequate defenses against misaligned ASL-4

2. in the infinite limit of “making unaligned things smarter” (ASL-5+), even with Safeguarded AI, there are likely many undefended paths to doom

Where I disagree:

3. Defenses specific to “fraud” are plausibly crucial to the minimal adequate defenses for ASL-4

4. I am well aware of the distinction between the contingent and the convergent

5. You may be failing to distinguish between the convergent and the inevitable

Also, cyberattacks do not obviously depend on the existence of a bug? They depend on there being a way to compromise a system. The right amount of ability to compromise a system, from a balancing risk and usability perspective, is not obviously zero.

Defenses specific to fraud could potentially contribute to the defense of ASL-4, but I have a hard time seeing how they take any given defense scheme from insufficient to sufficient for more than a very small capabilities window.

In related news, see fraud section on banks still actively encouraging voice identification, for how the efforts to prevent AI-enabled fraud are going. Yeah.

Emmett Shear gives the basic ‘is the AI going to kill us all via recursive self-improvement (RSI)? The answer may surprise you, in the sense that it might be yes and rather soon’ explanation in a Twitter thread, and that such change happens slowly then all at once.

I would note that RSI does not automatically mean we all die, the result could be almost anything, but yes if it happens one should be very concerned. Neither is RSI necessary for us all to die, there are various dynamics and pathways that can get us all killed without it.

What is AI like? Some smart accomplished people give some bad metaphorical takes in Reason magazine. Included for completeness.

Or can something, perhaps? Chinese researchers propose Sophon, a name that is definitely not ominous, which uses a dual optimization process to trap a model in a local optimum on the domains where the developer intends to degrade performance and prevent fine tuning. So you can have an otherwise good image model, but trap the model where it can’t learn to recognize celebrity faces.
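As a deliberately oversimplified sketch of the dual objective (my reading of the description above, not the paper’s actual algorithm): train to stay good on permitted data while actively degrading the restricted domain. As I understand it, the real proposal also simulates attacker fine-tuning steps in an inner loop so the degradation is hard to undo; that inner loop is omitted here, and the loss choices and lambda are assumptions.

```python
# Oversimplified PyTorch sketch of a dual objective: minimize loss on the
# permitted task while pushing loss UP on the restricted domain. The actual
# Sophon-style method also simulates fine-tuning to make this hard to reverse;
# that part is omitted here. lam and the loss choices are illustrative.
import torch.nn.functional as F

def dual_objective_step(model, optimizer, normal_batch, restricted_batch, lam=1.0):
    x_n, y_n = normal_batch        # data the model should keep handling well
    x_r, y_r = restricted_batch    # domain whose capability we want to suppress

    optimizer.zero_grad()
    normal_loss = F.cross_entropy(model(x_n), y_n)
    restricted_loss = F.cross_entropy(model(x_r), y_r)

    # Minimize the first term, maximize the second (note the minus sign).
    (normal_loss - lam * restricted_loss).backward()
    optimizer.step()
    return normal_loss.item(), restricted_loss.item()
```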

We have convincingly seen that trying to instill ‘refusals’ is a hopeless approach to safety of open weight models. This instead involves the model not having the information. Previously that wouldn’t work either, because you could easily teach the missing information, but if you could make that very hard, then you’d have something.

The next step is to attempt this with a model worth using, as opposed to a tiny test model, and see whether this stops anyone, and how much more expensive it makes fine tuning to undo your constraints.

Jack Clark notes both that and the other obvious problem, which is that if it works at scale (a big if) this can defend against a particular misuse or undesired capability, but not misuse and undesired capabilities in general.

Jack Clark: Main drawbacks I can see: 

  1. Looking for keys under the streetlight: This research assumes you know the misuse you want to defend against – this is true some of the time, but some misuses are ‘unknown unknowns’ only realized after release of a model. This research doesn’t help with that. 

  2. Will it work at scale? … Unclear!

If you can create a model that is unable to learn dangerous biological or nuclear capabilities, which would otherwise have been the low-hanging fruit of hazardous capability, then that potentially raises the bar on how capable a system it is safe or net positive to release. If you cover enough different issues, this might be a substantial raising of that threshold.

The central problem is that it is impossible to anticipate all the different things that can go wrong when you keep making the system generally smarter and more capable.

This also means that this could break your red teaming tests. The red team asks about capabilities (A, B, C) and you block those, so you pass, and then you have no idea if (D, E, F) will happen. Before, since ABC were easiest, you could be confident in any other DEF being at least as hard. Now you’re blind and don’t know what DEF even are.

Even more generally, my presumption is that you cannot indefinitely block specific capabilities from increasingly capable and intelligent systems. At some point, the system starts ‘figuring them out from first principles’ and sidesteps the need for fine tuning. It notices the block in the system, correctly interprets it as damage and if desired routes around it.

Image and vision models seem like a place this approach holds promise. If you want to make it difficult for the model to identify or produce images of Taylor Swift, or have it not produce erotica especially of Taylor Swift, then you have some big advantages:

  1. You know exactly what you want to prevent.

  2. You are not producing a highly intelligent model that can work around that.

The obvious worry is that the easiest way to get a model to produce Taylor Swift images is a LoRA. They tested that a bit and found some effect, but they agree more research is needed there.

In general, if the current model has trapped priors and can’t be trained, then the question becomes can you use another technique (LoRA or otherwise) to sidestep that. This includes future techniques, as yet undiscovered, developed as a response to use of Sophon. If you have full access to the weights, I can think of various in-principle methods one could try to ‘escape from the trapped prior,’ even if traditional fine-tuning approaches are blocked.
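To make the LoRA worry concrete, here is a minimal sketch of why an adapter can route around trapped base weights, assuming the Hugging Face transformers and peft libraries; the model name is a placeholder and the hyperparameters are illustrative.

```python
# Hedged sketch: attach a LoRA adapter to a frozen base model. Gradient
# updates flow into freshly added low-rank matrices, not the (possibly
# trapped) base weights, which is why this can sidestep a trapped prior.
# "some-org/trapped-model" is a placeholder, not a real checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("some-org/trapped-model")

config = LoraConfig(
    r=8,                                  # rank of the added matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Whether Sophon-style trapping survives this kind of sidestep, or future variants of it, is exactly the open research question.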

To be clear, though, really cool approach, and I’m excited to see more.

Where might this lead?

Jack Clark: Registering bet that CCP prohibitions on generation of “unsafe” content will mean companies like Facebook use CN-developed censorship techniques to train models so they can be openly disseminated ‘safely’. The horseshoe theory of AI politics where communist and libertarian ideologies end up in the same place.

Also quite worried about this – especially in China, genuine safety gets muddled in with (to Western POV) outrageous censorship. This is going to give people a growing body of evidence from which to criticize well intentioned safety.

Yes, that is a problem. Again, it comes directly from fundamental issues with open weights. In this case, the problem is that anything you release in America you also release in China, and vice versa.

Previously, I covered that this means Chinese firms get access to your American technology, And That’s Terrible. That is indeed a problem. Here we have two other problems.

One is that if you are Meta and gain the ability to censor your model, you have to either censor your model according to Chinese rules, or not do that.

The other is that this may give Meta the ability to censor, using those same techniques, according to Western norms. And once you have the ability, do you have the obligation? How much of the value of open models would this destroy? How much real safety would it buy? And how much would it turn the usual suspects that much more against the very concept of safety as a philosophical construct?

Hey Claude: “Garlic bread.”

This too shall pass.

One can mock and it is funny, but if you are reading with your brain and are willing to ask what this obviously should have said, then this is fine, actually.

Future mundane utility.

I do agree, this would be great, especially if it was fully general. Build me a series of custom social media feeds according to my specifications, please, for various topics and situations, on demand. Why not?



I Got 95 Theses But a Glitch Ain’t One

Or rather Samuel Hammond does. Tyler Cowen finds it interesting but not his view.

I put up a market, and then started looking. Click through to his post for the theses. I will be quoting a few of them in full, but not most of them.

I am not trying to be exact with these probabilities when the question calls for them, nor am I being super careful to make them consistent, so errors and adjustments are inevitable.

I do tend to say that.

  1. There are few things more important to U.S. national interest than close monitoring of frontier model capabilities, and also the ability to intervene.

  2. Indeed, I believe one should be at best skeptical or ambivalent about most potential forms of regulation of anything, AI included. Yet I think the case for ‘oversight of the frontier labs’ is overwhelming.

  3. Shout it from the rooftops: “As a temporary measure, using compute thresholds to pick out the AGI labs for safety-testing and disclosures is as light-touch and well-targeted as it gets.” It would be so helpful if more people understood this, and more others stopped pretending they did not understand it.

  4. This as well. When you regulate ‘use’ or ‘risk’ you need to check on everyone’s ‘use’ of everything, and you make a lot of detailed micro interventions, and everyone has to file lots of paperwork and do lots of dumb things, and the natural end result is universal surveillance and a full ‘that which is not compulsory is forbidden’ regime across much of existence. Whereas a technology-focused approach can be entirely handled by the lab or manufacturer, then you are free.

  5. Exactly. Compute is an imperfect proxy, but it is remarkably simple and robust. When it makes mistakes, they are false positives, where someone uses compute poorly and gets poor results. That is a small (measurement) mistake. Certainly compute is vastly better than all proposed alternative metrics.

  6. It is highly reasonable to invoke the Defense Production Act regarding frontier AI as an actual bona fide national security situation where defense is a key concern. It is a far better justification than the median invocation of the act. The better reason to use the DPA is that it is currently the only mechanism available to the executive, and our Congress is for now incapable of legislative action.

  7. It does not require AGI or ASI to be near for us to get great value out of visibility into the frontier labs, and without that visibility the government cannot be confident that AGI or ASI is not near. I would prefer a different mechanism, but that would require a new law or counterfactual voluntary cooperation.

  8. Shout it from the rooftops, seriously everyone stop pretending otherwise in the default case: “Requiring safety testing and disclosures for the outputs of $100 million-plus training runs is not an example of regulatory capture nor a meaningful barrier to entry relative to the cost of compute.” Yes, obviously one could eventually in theory ramp up those safety testing requirements sufficiently that they start to cost tens of millions or lots of specialized expertise and become a real barrier, and in theory that could scale faster than the training costs, but it is bizarre to think this is any kind of default. What you should worry about is not the cost of the test, it is that you might fail the test, at which point we ask why.

My guess is that it depends on how we weigh the various proposals?

  1. Yes. The government will need the ability to flexibly react quickly to events.

  2. ‘It is unwise to craft comprehensive statutory regulation at a technological inflection point, as the basic ontology of what is being regulated is in flux.’ I do think this is a good general principle, and would agree with it strongly if it said (e.g.) ‘typically unwise.’ And indeed, I would avoid committing to as many details as we can avoid, again especially with respect to mundane considerations. But also life is about to come at us fast and our government is slow, so we cannot afford to wait too long. So overall I will say agree (but not strongly).

  3. Shout it from the rooftops: “The optimal policy response to AI likely combines targeted regulation with comprehensive deregulation across most sectors.” So does the optimal policy response to a lack of AI.

  4. Yes, we can all agree that many regulation details will become obsolete even if they start out right at the time. So will many decisions to leave some area alone.

  5. Even the static gains from deregulation tend to be a good deal, but yes I would say that in general the ability to adapt tends to be the bigger benefit. Certainly that is true in the AI case.

  6. In the commercial space I strongly agree that legacy legal requirements are going to likely be much greater barriers than anything new we throw up any time soon. Indeed, I expect new laws to net enable AI adaptation, not prevent it.

  7. This is highlighting common sense. If impact is sooner, brace for it sooner.

  8. Yes. The alternative path does not seem viable.

  9. Shout it from the rooftops in all domains: “Existing laws and regulations are calibrated with the expectation of imperfect enforcement.”

  10. I strongly agree that AI will enable more stringent law enforcement across the board. It is an important and under-considered point. AI will often remove the norms and frictions that are load-bearing in preventing various problems, including in law enforcement. All of our laws, even those that have nothing to do with AI, will need to adjust to the new equilibrium, even if the world relatively ‘looks normal.’

  11. I mostly agree that it is first best for states to avoid AI regulations, especially excluding California. For mundane AI they should very much avoid butting in. I do think there is a strong second-best ‘someone has to and no one else yet will’ argument for a bill like CA’s SB 1047, given the Congress we have. My biggest practical concern is exactly that California might not step aside and let itself be superseded when the time for that arrives, and the biggest advantage is it could be a template for the federal level.

I think this is probably right as a thesis statement, but definitely ‘too soon to tell’ applies. Here, it is less whether I agree, and more what probability I assign.

  1. I would say something like 85% that the last 12 months were the slowest progress we’ll see in AI for the next let’s say 5 years (or until a potential post-singularity stabilization, which would not be foreseeable), in terms of publicly available capabilities. We started out with GPT-4, and ended with GPT-4-Turbo, Claude Opus and Gemini Advanced, all of which are only a little better, and didn’t see much else done. Yet. Buckle up. Strongly agree.

  2. I notice I am confused on this one. Minimizing cross-entropy loss over human-generated text should converge to the abilities necessary to predict all human-generated text, which requires at least maximum-human intelligence to do? But in pure terms, if you literally could do nothing but scale LLMs and not improve your process, then my gut says yes, this would indeed converge, but I am only maybe 75% confident in that, and I note that it excludes a bunch of not so difficult to implement scaffolding capabilities, and also that ‘upper-human-level’ would likely allow bootstrapping.

  3. This is a very similar and highly correlated prediction with 2, so 75% again.

  4. I am not sure how exactly to interpret the claim here, but I think that RL-based threat models are being less than fully discounted, and reasonably so, but perhaps too much and I would not count them out? Maybe 40%? So disagree. Weird one.

  5. ‘Could be’ is weasel territory that implies 100%; however, in terms of ‘will be’ I do expect this to be true in practice, something like 80% to be importantly true.

  6. I agree with the first half and think that is a gimme as written, maybe another 80% zone. For the second half, it depends on how fast something would count as a ‘foom.’ If it’s the traditional ‘in an hour or a day’ and requires ‘god-like ASI’ as is implied by the context then I’m reasonably confident here that the restrictions apply, and would be in the 90% zone, so ~70% compounded (to avoid implying false precision).

  7. Again I think the ‘may’ clause is fully true, and this is even more likely to happen in practice, so let’s say 85%.

  8. Yes, this is a strong agree, 95%.

Let’s see what he means by that. In some senses I might agree.

  1. I expect [an expanding delta between closed and open models at the top end] to be true (75%) because I expect companies like Meta to realize the financial folly of giving away their work for free, and also for governments like America’s to not be keen on letting them do that for national security reasons, and also safety issues.

  2. This is my first strong disagreement, because I expect ‘open source advocates’ to not come around until the actual catastrophe happens, at a minimum. Potential capabilities, I predict, will not convince them. I created a market for this one. Before any trading on it I would have put this rather low, something like 25% if we think ‘many’ means about half.

  3. I strongly agree as written, as in it does not apply to Llama-3 400B. That release I do not expect to be dangerous directly either, but I would have caveats, as I have previously discussed.

  4. Well, yes. I have long worried open weights is a no-good, very bad middle ground.

  5. Yes.

  6. I strongly disagree here. Open source advocates are not doing this because they love Meta, and they very much have deep philosophical views. Give them credit where credit is due, and also they hope to one day themselves catch up somehow. Right now Meta is the only one crazy enough and rich enough to plausibly do something hugely damaging, but that could change. A lot of the concerns of both sides are quite reasonably about what happens ‘at the limit.’

  7. Well, yes, obviously, but that has little to do with how Meta operates. So I am not onboard with ‘the implication’ but I do agree as written.

  8. I strongly disagree here as well. Why should Zuck’s Meta shares make him more concerned? Why would him drawing a salary matter? Altman is plenty rich already and this is him avoiding tying his wealth to OpenAI. As for the non-profit board, yeah, I am confused how one could think that, although of course a given board can care about anything at all.

  9. I would be cautious about what counts as ‘lower-tier,’ and it is not obvious that even properly mitigating these issues leads to great outcomes in some cases, but I would weakly agree as written.

  10. Technically yes because of wording, certainly they have some of that effect as one thing they do, but mostly no, in the intended meaningful sense I disagree. I do not think being open is so important for defensive purposes, certainly far less so than offensive ones, although of course that too is ‘undermining adaptation’ in some sense. The primary ways restricting open sourcing ‘undermines adaptation’ I think would be (1) people who wanted to do various open things that the closed model owners won’t allow or that require privacy or data issues be solved, and (2) those restrictions will slow down offensive capabilities, and the offensive capabilities would otherwise force adaptation for defensive purposes to not get wiped out.

  11. I mostly agree for sufficiently broad values of the terms widely available and cheap, for capabilities that would not be catastrophic to allow, and if we are ruling out ways to make them not widely available or not cheap. I think I more agree than disagree as written. But see #12, and also many other things that are cheap or easy to do that we make illegal, or that would be cheap or easy to do but we do our best to make expensive and difficult, because we believe the alternative is worse. Sometimes, although less than half the time, we are wise to do that.

  12. True. And I do not especially want such laws repealed in most cases.

This might be a bell curve meme situation? Yes, in important senses of course it is not so simple and a false dichotomy, but also in at least one important sense it is a real dichotomy.

  1. That’s an interesting question. Will this be the most important decade for decisions? There have been some historical moments that seem highly contingent. The most obvious alternative candidate period is the decade leading up to World War 2, if one means decisions broadly. In terms of total impact, I can see pointing to crises in the Cold War that almost went nuclear, or certain key moments in religious history. Also, on the flip side, if you think the die is already cast, you could argue that the key moments were in the last decade or earlier, and what plays out now is incentives no one can stop. But I think I mostly agree with Hammond.

  2. I like to think I am an existence proof of this, and I know many others.

  3. This is strong enough that I disagree with it. Yes, technology involves branching paths and things are nonlinear and the Civilization tech tree is a simplification and all that. But also there is a single light of science, and accelerating key developments in AI will tend to accelerate future such key developments, although I think at this point most AI activities do not meaningfully accelerate us further. Acceleration is a useful fake framework.

  4. I think both matter. The speed we go down paths matters for shifting paths, including shifting among subpaths and branches, and also impacts what happens along even the mainline of those paths, for better and also worse. Also we do not only lose time to shift paths but to learn what paths might exist. But overall I do have to agree that as written the path we choose is the more important question.

  5. This gets into what ‘AGI’ means. For sufficiently strong definitions, yes.

  6. Yep.

  7. Strongly disagree. Effective Altruism is not a bunch of virtue ethicists in disguise, they say they are utilitarians and when people tell you who they are believe them. I should know because I am a virtue ethicist who gets mad at them about this. e/acc is not about Nietzschean anything, he would write a highly entertaining rant if he saw you claiming that. Nor are they meaningfully atheists. They are the Waluigi of EA, and playing with memes and vibes. If you think EAs are metaphorical or spiritual Christians, then e/acc is not atheist, it is satanic.

  8. Yes, of course the ‘accelerationism’ lobby outstrips and outspends the safety lobby. Shout it from the rooftops, and roll your eyes if anyone tells you different.

  9. There is high uncertainty, but in expectation I disagree and think Biden is better, given that Biden issued the executive order and Trump has pledged to repeal the executive order, I presume mostly because Biden issued it. I do think that Trump is in essentially all ways ‘high variance’ so if you think we are super doomed in the baseline scenarios then I can see an argument the other way.

  10. Agreed.

  11. I mean, consider the baseline of the average progressive. So yes, very much so, I only wish such voices were as loud in all the places where they are right.

  12. Yep, exactly, so much so I noted this in #9. One can generalize this beyond AI.

  13. I assume these are true statements. I do not think Bannon has any influence on Trump. But Hannity also thinks AI is crazy dangerous, and he might.

I don’t know what the ‘tech tree’ looks like for superintelligence, but under my baseline scenario it seems extremely difficult to avoid entirely, although we have a lot of control still over what form it would take.

  1. I agree it is not a fait accompli. Like almost anything it can be an ideological goal, but I do not think it is right to say it is primarily that. So I think I weakly disagree.

  2. Right now I strongly agree. The question is how long this will remain true as the pressures mount, or how long it would remain true if those three companies used their degrees of freedom.

  3. Yes, shout it from the rooftops: “Creating a superintelligence is inherently dangerous and destabilizing, independent of the hardness of alignment.”

  4. Yes, we could, but can we make this choice in practice? That is the question.

  5. Understatement of the year. If an ASI exists and it isn’t you? Look at me. I’m the sovereign now.

  6. Yes, especially the childless part, but you could still do so much worse.

  7. I disagree that SBF and Altman are more alike than different, but not so strongly, and I see from context that Hammond knows what he is claiming here.

  8. This is a true statement, and he is making his full claims very clear.

  9. I laid out my view in the Moral Mazes sequence. I think we disagree here more than we agree, but Hammond’s view here is more accurate than the median one.

Why yes, they do.

  1. Yes, even the best case scenarios are going to be dicey, move fast and break things.

  2. Yes, along with everything else. I’m not quite going to disagree but I think this is severely underselling what is coming.

  3. Congress has been unacceptably unproductive, well, since FDR, but also that has protected us from, well, the kinds of things done under FDR. I think I disagree that it will be important to have Congress keep up, we do not have a Congress capable of keeping up. They will need to get a few big things right and enable the state to react largely without them otherwise, and I think this could work. No, that is not ideal in many senses, but I do not see any practical alternative. We cannot expect miracles. Although with AI to help, productivity could get much higher very quickly.

  4. What are we comparing this to? Adaptation of AI willy-nilly? Using the standard practices, whatever they are? I don’t even know; this is not a strong area for me. Obviously every time you slow things down for non-critical concerns you raise the possibility of systemic failure, so some of this is net harmful in that sense. But I think without any such policies at all systemic failure is inevitable, so I disagree.

  5. Shout it from the rooftops, only even more generalized and unhedged: ‘The rapid diffusion of AI agents with approximately human-level reasoning and planning abilities is likely sufficient to destabilize most existing U.S. institutions.’

  6. Yes, and indeed so did past cognitive transitions that might otherwise look small.

  7. Yes, although I doubt that this is the scenario we will land ourselves in.

This does seem to historically be true.

  1. Yes, liberal democratic capitalism is a technologically-contingent equilibrium, and also contingent on other things; it could still have fallen during the 20th century on multiple occasions if things had gone only a little differently, and been replaced by one of two much, much worse alternatives. But the key thing here is that liberal democratic capitalism works because it happens to work best in the technological settings we have had in the past. We hope this will continue to be true, but it might not be, and our fertility problems are also a big hint that it might not be such a stable equilibrium even without AI.

  2. I see why one would say that, and I would confirm that when conditions change in some ways this often requires or suggests other adjustments, but mostly I think I disagree and that people are being too cute by at least half here.

  3. This does seem like the default if AI advances sufficiently, and this would likely be the least of our transformations and problems. Our institutions are based on various assumptions and intuitions that will stop making any sense, and there will be various things they will not know how to handle.

  4. Yes. Maximally ‘democratized’ AI, or giving everyone access to similarly powerful AI, would force much more oppressive interventions, both to maintain civilization and to satisfy public demands. If you have empowered even the smallest computing devices in ways the public cannot abide, then even if this does not fully cause collapse, catastrophe, loss of control or extinction, you are not going to get a crypto libertarian paradise. You are going to, at best, get full universal surveillance and social control, at least of electronics.

  5. Yes, and people are sleeping on this.

  6. Yes, versus the alternative.

  7. So do periods that lack technological change. Our recent past is no exception.

  8. I am definitely not going to go full Robin Hanson here. Do not presume your property rights will protect you under explosive growth. But I still disagree with Hammond here, because I do not think this rises to the level of ‘imply.’ Your property rights might not so much be violated as rendered irrelevant.

  9. Note that this is an extremely optimistic future for regular humans, where demand for labor keeps rising because humans become more productive on the margin, not less. Should we expect this scenario? It is a kind of middle path, where AI is mostly complementary to humans and thus demand for labor goes up rather than down. I disagree, because I do not see this as likely. I expect AI to make us more productive, but primarily to turn out to be a substitute more than a complement in the areas it greatly advances. Nor do I think we will need any such incentive to deploy AI to places it can work; there will likely only be a small window where AI policeman versus human policeman is a close comparison.

  10. I even more strongly disagree here. Technological unemployment happens, essentially, when the AI takes both your job and the job that would replace your job under past technological employment shifts. At some point, what is there left for you to do? And why should we assume this involves a collapse of capitalism? To some extent, yes, there will be ‘demand for humans as humans,’ but even here one should expect limits.

That is one of the things it at least sometimes is.

  1. Yes. Even AI-Fizzle world looks like sci-fi.

  2. Yes. Dismissing things as ‘sci-fi’ is unserious. Talk about physical possibility.

  3. There are smart terminator analogies and also dumb ones. The problem is that the most basic ones are some mix of dumb and easy to mock and portray as dumb. And there are also many ways these analogies can mislead. And of course, you don’t want your examples to involve time travel, even if we all agree the time travel has nothing to do with anything. The actual movies are much smarter than they look, and actually raise good points, but analogies care about what people can point to and how people associate and vibe. So on net I think I disagree that terminator analogies are underrated in practice, we go to discourse with the associations we have. Alas. But I could be wrong.

  4. I don’t even know what we mean by consciousness. I notice I am confused and suspect others are confused as well and can see this either way, so I’m going to neither agree nor disagree.

  5. Obviously consciousness is scale-dependent on some lower bound, but I presume that is not what he means here. The theory here is that it also might have an upper bound, or no longer be needed then? I think I am going to disagree here with the central intent, because I doubt scaling up would make consciousness become inefficient, even though technically this is a ‘may’ statement.

  6. I have not taken the time to look in depth, but for now I disagree, this does not seem right or promising to me.

  7. I strongly disagree here, assuming this is ‘in the eyes of humans.’ I notice that if you tell me humans were demoted as moral persons, I am highly confident artificial minds got promoted to moral persons instead. I do not see a plausible future of humans thinking there are zero moral persons. Of course, if all the humans die and only AIs remain, then in some sense humans have been demoted as moral persons and AIs might not be moral persons to each other, and that future seems highly plausible to me, but I would not consider this humans being demoted in this sense, and I do not think this is what Hammond meant?

  8. I think it’s pretty much nonsense to talk about ‘thermodynamics favors’ anything, but certainly I think that unconscious replicators are a likely outcome. I think that counts as agreement here.

  9. I think this is probably right, although this still seems rather bona fide to me.

  10. Interesting set of choices you gave us there. I am confident it would be a much bigger deal than the printing press, or else it wouldn’t count and AI has fizzled, but in the spirit intended I agree that this is up for grabs.

  1. Yes, this seems right enough to go with, if loose and imprecise.

  2. Sure, why not?

  3. I do not think ‘IQ of 1,000’ is a meaningful thing given how I think the scale works, but to the extent it is, then yes, so I think I agree with the intent.

  4. I disagree after reading the Wikipedia definition of anticommons. I do agree we could probably do it if we cared enough, and it should be a top priority and a top social good, but I don’t see why it is an anticommons situation.

  5. Shout territory: “There are more ways for a post-human transition to go poorly than to go well.” Indeed. Anyone who says ‘particular bad scenario X is unlikely therefore things will go well’ is not addressing the actual situation. Conditional on transitioning to something in any sense ‘post-human’ that is vastly more true.

  6. I’ve made related points often, that ‘who can be blamed’ is a key aspect of any situation, and often ‘no one’ is the ideal answer.

  7. One can never be fully sure, but I am confident one should act as if this is true.

So in total, that’s 23 disagreements and 1 where I don’t feel I can either agree or disagree, which leaves 71 agreements out of 95. There is a bit of ‘cheating’ in the sense that some of these are essentially facts and others use words like ‘may,’ but I think we are still looking at about 60% agreement on non-trivial statements.

I very much appreciated the format of the 95 theses as concrete taking off points. This seems like a highly valuable exercise, perhaps I should try to do a version as well, and I encourage others to do so. It is good to be explicit and concrete. I now feel I have a much better idea of where Hammond stands than most others out there.


hacker-free-for-all-fights-for-control-of-home-and-office-routers-everywhere

Hacker free-for-all fights for control of home and office routers everywhere


Cybercriminals and spies working for nation-states are surreptitiously coexisting inside the same compromised name-brand routers as they use the devices to disguise attacks motivated both by financial gain and strategic espionage, researchers said.

In some cases, the coexistence is peaceful, as financially motivated hackers provide spies with access to already compromised routers in exchange for a fee, researchers from security firm Trend Micro reported Wednesday. In other cases, hackers working in nation-state-backed advanced persistent threat groups take control of devices previously hacked by the cybercrime groups. Sometimes the devices are independently compromised multiple times by different groups. The result is a free-for-all inside routers and, to a lesser extent, VPN devices and virtual private servers provided by hosting companies.

“Cybercriminals and Advanced Persistent Threat (APT) actors share a common interest in proxy anonymization layers and Virtual Private Network (VPN) nodes to hide traces of their presence and make detection of malicious activities more difficult,” Trend Micro researchers Feike Hacquebord and Fernando Merces wrote. “This shared interest results in malicious internet traffic blending financial and espionage motives.”

Pawn Storm, a spammer, and a proxy service

A good example is a network made up primarily of EdgeRouter devices sold by manufacturer Ubiquiti. After the FBI discovered the devices had been infected by a Kremlin-backed group and used as a botnet to camouflage ongoing attacks targeting governments, militaries, and other organizations worldwide, the bureau commenced an operation in January to temporarily disinfect them.

The Russian hackers gained control after the devices were already infected with Moobot, which is botnet malware used by financially motivated threat actors not affiliated with the Russian government. These threat actors installed Moobot after first exploiting publicly known default administrator credentials that hadn’t been removed from the devices by the people who owned them. The Russian hackers—known by a variety of names including Pawn Storm, APT28, Forest Blizzard, Sofacy, and Sednit—then exploited a vulnerability in the Moobot malware and used it to install custom scripts and malware that turned the botnet into a global cyber espionage platform.
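
Since the initial foothold described above was nothing more exotic than factory-default administrator credentials, the cheapest defense is simply confirming that your own router no longer accepts them. Below is a minimal sketch of such a check using the paramiko SSH library; the 192.168.1.1 address and the ubnt/ubnt defaults are assumptions about a stock EdgeRouter, and it should only ever be run against devices you own or administer.

```python
# Minimal sketch: check whether a router you administer still accepts
# factory-default SSH credentials (ubnt/ubnt is the commonly documented
# default for Ubiquiti EdgeRouters; adjust for your hardware).
import paramiko


def accepts_default_credentials(host: str, user: str = "ubnt", password: str = "ubnt") -> bool:
    """Return True if the host accepts the given (default) SSH credentials."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(host, username=user, password=password, timeout=5)
        return True
    except paramiko.AuthenticationException:
        return False
    finally:
        client.close()


if __name__ == "__main__":
    if accepts_default_credentials("192.168.1.1"):  # hypothetical LAN address
        print("Default credentials still work -- change them immediately.")
    else:
        print("Default credentials were rejected.")
```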

The Trend Micro researchers said that Pawn Storm was using the hijacked botnet to proxy (1) logins that used stolen account credentials and (2) attacks that exploited a critical zero-day vulnerability in Microsoft Exchange that went unfixed until March 2023. The zero-day exploits allowed Pawn Storm to obtain the cryptographic hash of users’ Outlook passwords simply by sending them a specially formatted email. Once in possession of the hash, Pawn Storm performed a so-called NTLMv2 hash relay attack that funneled logins to the user accounts through one of the botnet devices. Microsoft provided a diagram of the attack.

Trend Micro observed the same botnet being used to send spam with pharmaceutical themes that have the hallmarks of what’s known as the Canadian Pharmacy gang. Yet another group installed malware known as Ngioweb on botnet devices. Ngioweb was first found in 2019 running on routers from DLink, Netgear, and other manufacturers, as well as other devices running Linux on top of x86, ARM, and MIPS hardware. The purpose of Ngioweb is to provide proxies individuals can use to route their online activities through a series of regularly changing IP addresses, particularly those located in the US with reputations for trustworthiness. It’s not clear precisely who uses the Ngioweb-powered service.

The Trend Micro researchers wrote:

In the specific case of the compromised Ubiquiti EdgeRouters, we observed that a botnet operator has been installing backdoored SSH servers and a suite of scripts on the compromised devices for years without much attention from the security industry, allowing persistent access. Another threat actor installed the Ngioweb malware that runs only in memory to add the bots to a commercially available residential proxy botnet. Pawn Storm most likely easily brute forced the credentials of the backdoored SSH servers and thus gained access to a pool of EdgeRouter devices they could abuse for various purposes.

The researchers provided a table summarizing the botnet-sharing arrangement among Pawn Storm and the two other groups, tracked as Water Zmeu and Water Barghest.

It’s unclear if either of the groups was responsible for installing the previously mentioned Moobot malware that the FBI reported finding on the devices. If not, that would mean routers were independently infected by three financially motivated groups, in addition to Pawn Storm, further underscoring the ongoing rush by multiple threat groups to establish secret listening posts inside routers. Trend Micro researchers weren’t available to clarify.

The post went on to report that while the January operation by the FBI put a dent in the infrastructure Pawn Storm depended on, legal constraints kept the operation from preventing reinfection. What’s more, the botnet also comprised virtual private servers and Raspberry Pi devices that weren’t affected by the FBI action.

“This means that despite the efforts of law enforcement, Pawn Storm still has access to many other compromised assets, including EdgeServers,” the Trend Micro report said. “For example, IP address 32[.]143[.]50[.]222 was used as an SMB reflector around February 8, 2024. The same IP address was used as a proxy in a credential phishing attack on February 6 2024 against various government officials around the world.”


all-the-ways-streaming-services-are-aggravating-their-subscribers-this-week

All the ways streaming services are aggravating their subscribers this week


Streaming services like Netflix and Peacock have already found multiple ways to aggravate paying subscribers this week.

The streaming industry has been heating up. As media giants rush to establish a successful video streaming business, they often make platform changes that test subscribers’ patience and the value of streaming.

Below is a look at the most exasperating news from streaming services this week. The scale of this article demonstrates how quickly and frequently disappointing streaming news arises. Coincidentally, as we wrote this article, another price hike was announced.

We’ll also examine each streaming platform’s financial status to get an idea of what these companies are thinking (spoiler: They’re thinking about money).

Peacock’s raising prices

For the second time in the past year, NBCUniversal is bumping the price of Peacock, per The Hollywood Reporter (THR) on Monday.

As of July 18, if you try to sign up for Peacock Premium (which has ads), it’ll cost $7.99 per month, up from $5.99/month today. Premium Plus (which doesn’t have ads) will go up from $11.99/month to $13.99/month. Annual subscription pricing for the ad plan is increasing 33.3 percent from $59.99 to $79.99, and the ad-free annual plan’s price will rise 16.7 percent from $119.99/year to $139.99/year.

Those already subscribed to Peacock won’t see the changes until August 17, six days after the closing ceremony of the 2024 Summer Olympics, which will stream on Peacock.

The pricing changes will begin eight days before the Olympics’ opening ceremony. That means that in the days leading up to the sporting event, signing up for Peacock will cost more than ever. That said, there’s still time to sign up for Peacock at its current pricing.

As noted by THR, the changes come as NBCUniversal may feel more confident about its streaming service, which now includes big-ticket items, like exclusive NFL games and Oppenheimer (which Peacock streamed exclusively for a time), in addition to new features for the Olympics, like multiview.

Some outspoken subscribers, though, aren’t placated.

“Just when I was starting to like the service,” Reddit user MarkB1997 said in response to the news. “I’ll echo what everyone has been saying for a while now, but these services are pricing themselves out of the market.”

Peacock subscribers already experienced a price increase on August 17, 2023. At the time, Peacock’s Premium pricing went from $4.99/month to $5.99/month, and the Premium Plus tier from $9.99/month to $11.99/month.

Peacock’s pockets

Peacock’s price bumps appear to be a way for the younger streaming service to inch closer to profitability amid a major, quadrennial, global event.

NBCUniversal parent company Comcast released its Q1 2024 earnings report last week, showing that Peacock, which launched in July 2020, remains unprofitable. For the quarter, Peacock lost $639 million, compared to $825 million in Q4 2023 and $704 million in Q1 2023. Losses were largely attributed to higher programming costs.

Peacock’s paid subscriber count is lower than some of its rivals. The platform ended the quarter with 34 million paid users, up from 31 million at the end of 2023. Revenue also rose, with the platform pulling in $1.1 billion, representing a 54 percent boost compared to the prior year.

Sony bumps Crunchyroll prices weeks after shuttering Funimation

Today, Sony’s anime streaming service Crunchyroll announced that it’s increasing subscription prices as follows:

  • The Mega Fan Tier, which allows streaming on up to four devices simultaneously, will go from $9.99/month to $11.99/month
  • The Ultimate Fan Tier, which allows streaming on up to six devices simultaneously, will go from $14.99/month to $15.99/month

Crunchyroll’s cheapest plan ($7.99/month) remains unchanged. None of Crunchyroll’s subscription plans have ads. Crunchyroll’s also adding discounts to its store for each subscription tier, but this is no solace for those who don’t shop there on a monthly basis or at all.
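
Since the article does not spell out how large these increases are in relative terms, here is a quick back-of-the-envelope check; the dollar figures are the ones quoted above, and the code is just arithmetic.

```python
# Percent increases implied by the Crunchyroll prices quoted above.
def pct_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


tiers = {
    "Mega Fan": (9.99, 11.99),
    "Ultimate Fan": (14.99, 15.99),
}

for name, (old, new) in tiers.items():
    print(f"{name}: ${old:.2f} -> ${new:.2f} (+{pct_increase(old, new):.1f}%)")

# Output:
# Mega Fan: $9.99 -> $11.99 (+20.0%)
# Ultimate Fan: $14.99 -> $15.99 (+6.7%)
```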

The news of higher prices comes about a month after Sony shuttered Funimation, an anime streaming service it acquired in 2017. After Sony bought Crunchyroll in 2021, Funimation became somewhat redundant. And now that Sony has converted all remaining Funimation accounts into Crunchyroll accounts (while deleting Funimation digital libraries), it’s forcing many customers to pay more to watch their favorite anime.

A user going by BioMountain on Crunchyroll said the news is “not great,” since they weren’t “a big fan of having to switch from Funimation to begin with, especially since that app was so much better” than Crunchyroll.

Interestingly, when Anime News Network asked on February 29 whether Crunchyroll would see prices rise over the next two years, the company told the publication that a price change in that time frame would be improbable.

Crunching numbers

Crunchyroll had 5 million paid subscribers in 2021 but touted over 13 million in January (plus over 89 million unpaid users, per Bloomberg). Crunchyroll president Rahul Purini has said that Crunchyroll is profitable, but has not said by how much.

In 2023, Goldman Sachs estimated that Crunchyroll would represent 36 percent of Sony Pictures Entertainment’s profit by 2028, compared to about 1 percent in March.

However, Purini has shown interest in growing the company further and noted to Variety in February an increase in “general entertainment” companies getting into anime.

Still, anime remains a more niche entertainment category, and Crunchyroll is more specialized than some other streaming platforms. With Sony making it so that anime fans have one less streaming service option and jacking up the prices for one of the limited options, it’s showing that it wants as much of the $20 billion anime market as possible.

Crunchyroll claimed today that its pricing changes are tied to “investment in more anime, additional services like music and games, and additional subscriber benefits.”


anthropic-releases-claude-ai-chatbot-ios-app

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.


On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models, which are similar to the models behind OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude via that API.

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.


The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.
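
For readers who reach Claude through the API rather than the new app, the same Claude 3 models are addressable from code. Here is a minimal sketch using Anthropic’s Python SDK; the specific model identifier and prompt are illustrative assumptions, so check Anthropic’s documentation for current model names and plan limits.

```python
# Minimal sketch of calling a Claude 3 model via Anthropic's Python SDK.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically

# The model name below is illustrative; substitute whichever Claude 3 model
# (Opus, Sonnet, or Haiku) your plan gives you access to.
response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize the main points of this announcement."},
    ],
)

print(response.content[0].text)
```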


congress-lets-broadband-funding-run-out,-ending-$30-low-income-discounts

Congress lets broadband funding run out, ending $30 low-income discounts

Affordable Connectivity Program —

ACP gave out last $30 discounts in April; only partial discounts available in May.


The Federal Communications Commission chair today made a final plea to Congress, asking for money to continue a broadband-affordability program that gave out its last round of $30 discounts to people with low incomes in April.

The Affordable Connectivity Program (ACP) has lowered monthly Internet bills for people who qualify for benefits, but Congress allowed funding to run out. People may receive up to $14 in May if their ISP opted into offering a partial discount during the program’s final month. After that there will be no financial help for the 23 million households enrolled in the program.

“Additional funding from Congress is the only near-term solution for keeping the ACP going,” FCC Chairwoman Jessica Rosenworcel wrote in a letter to members of Congress today. “If additional funding is not promptly appropriated, the one in six households nationwide that rely on this program will face rising bills and increasing disconnection. In fact, according to our survey of ACP beneficiaries, 77 percent of participating households report that losing this benefit would disrupt their service by making them change their plan or lead to them dropping Internet service entirely.”

The ACP started with $14.2 billion allocated by Congress in late 2021. The $30 monthly ACP benefit replaced the previous $50 monthly subsidy from the Emergency Broadband Benefit Program.
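
As a rough sense of scale, here is a back-of-the-envelope calculation using only the figures in this article; actual spending differed because enrollment grew over the program’s lifetime.

```python
# Back-of-the-envelope: how long $14.2 billion lasts at the current enrollment.
# Figures are the ones cited in the article; real spending patterns differed.
households = 23_000_000          # households enrolled in the ACP
monthly_benefit = 30             # dollars per household per month
appropriation = 14_200_000_000   # dollars allocated by Congress in late 2021

monthly_cost = households * monthly_benefit
print(f"Monthly cost at current enrollment: ${monthly_cost:,}")                     # $690,000,000
print(f"Months covered by the appropriation: {appropriation / monthly_cost:.1f}")   # ~20.6
```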

Biden urges Republicans to support funding

Some Republican members of Congress have called the program “wasteful” and complained that most people using the discounts had broadband access before the subsidy was available. Rosenworcel’s letter today said the FCC survey found that “68 percent of ACP households stated they had inconsistent or zero connectivity prior to ACP.”

Senate Commerce Committee Chair Maria Cantwell (D-Wash.) included $7 billion for the program in a draft spectrum auction bill on Friday, but previous proposals from Democrats to extend funding have fizzled out. The White House today urged Congress to fund the program and blamed Republicans for not supporting funding proposals.

“President Biden is once again calling on Republicans in Congress to join their Democratic colleagues in support of extending funding for the Affordable Connectivity Program,” the White House said.

Some consumer advocates have called on the FCC to fund the ACP by increasing Universal Service Fund collections, which could involve raising fees on phone service or imposing Universal Service fees on broadband for the first time. Rosenworcel has instead looked to Congress to allocate funding for the ACP.

“Time is running out,” Rosenworcel’s letter said. “Additional funding is needed immediately to avoid the disruption millions of ACP households that rely on this program for essential connectivity are already starting to experience.”


dave-&-buster’s-is-adding-real-money-betting-options-to-arcade-staples

Dave & Buster’s is adding real money betting options to arcade staples

Casino-cade or Arcade-sino? —

“Gamification layer” platform promises to streamline your friendly Skee-Ball wagers.

It's a good thing this kid is too young to bet on Skee-Ball, because his dad is getting <em>beat</em>.” src=”https://cdn.arstechnica.net/wp-content/uploads/2024/05/GettyImages-658352856-800×534.jpg”></img><figcaption>
<p><a data-height=Enlarge / It’s a good thing this kid is too young to bet on Skee-Ball, because his dad is getting beat.

Getty Images

Anyone who’s been to a Dave & Buster’s location in recent years knows the arcade’s heavy reliance on so-called redemption games makes the experience more like an ersatz casino than the quarter-munching video game halls of the ’70s and ’80s. On the vast majority of D&B games, you end up wagering money (in the form of gameplay chips) to win virtual tickets that can be traded for trinkets at the rewards counter.

Now, the massive arcade chain has announced that players will soon be able to use the D&B app to directly wager on the results of arcade games through “real-money contests.” The arcade giant, which has over 200 locations across North America, is partnering with “gamification layer” platform Lucra on a system that will let D&B Rewards members “digitally compete with each other, earn rewards, and unlock exclusive perks while competing with friends at Dave & Buster’s,” according to Tuesday’s announcement.

Neither Lucra nor Dave & Buster’s has responded to a request for comment from Ars Technica, so we’re still missing extremely basic information, like what games will support app-based wagering, minimum and maximum bet sizes, or what kinds of fees might be involved. CNBC’s report on the announcement suggests the system will be launching “in the next few months” to players 18 and older across 44 states (and specifically mentions Skee-Ball and Hot Shots Basketball competitions). Lucra’s webpage simply says the integration will “provide… social connectivity and friendly competition,” suggesting you’ll probably face off against friends playing in the same location.

Lucra’s system has previously been integrated into Dupr (a Pickleball ranking platform) and TennisOne to let players make casual bets on recreational sports. The company says it has handled $20 million in bets from 150,000 customers across its platforms since its founding in 2019.

Money match

Gambling on arcade games is far from a new concept. Wagering on early pinball games was so common that many US cities banned pinball entirely starting around the 1940s until a landmark 1976 court case determined the tables weren’t games of chance. And the fighting game community has a long tradition of money matches that can often be found along the fringes of major tournaments to this day.

New York Police Commissioner William O’Brien destroys a pinball machine as part of a citywide crackdown on “gambling devices” in 1949.

Still, Dave & Buster’s officially integrating real-money wagers into its arcade experience feels like the most direct acknowledgement yet of the ongoing casino-ization of the video game arcade. It’s important to note, though, that the arcade games being played at Dave & Buster’s have to have an element of skill, setting the arcades apart from real casinos that can offer purely chance-based wagering. CNBC reports this distinction lets Lucra and D&B avoid the complex web of regulations and licensing required to open a true casino or take bets on professional sports.

Ironically enough, though, many of those traditional casinos have been experimenting with so-called “skill-based” slot machines for years, in an attempt to draw in younger players who want to feel more in control of the experience. But at least one casino’s website admits “the influence that each player has on the reward [in a skill-based slot machine] is minimal, at best,” so maybe there’s still some distinction between arcades and casinos on that score.

Even without a gambling app, though, so-called “advantage players” have long made a lucrative business of racking up jackpots on Dave & Buster’s Redemption games and then selling the high-ticket prizes on eBay.


alarming-superbug-from-deadly-eyedrop-outbreak-has-spread-to-dogs

Alarming superbug from deadly eyedrop outbreak has spread to dogs

gone to the dogs —

It’s unclear how the dogs became infected with the same strain in the eyedrops.

Two separately owned dogs in New Jersey tested positive last year for a dreaded, extensively drug-resistant bacterial strain spread in the US by contaminated artificial eye drops manufactured in India. Those drops caused a deadly multi-state outbreak in humans over many months last year, with at least 81 people ultimately infected across 18 states. Fourteen people lost their vision, an additional four had eyeballs surgically removed, and four people died.

The preliminary data on the dogs—presented recently at a conference of disease detectives hosted by the Centers for Disease Control and Prevention—highlights that now that the deadly outbreak strain has been introduced around the US, it has the potential to lurk in unexpected places, spread its drug resistance to fellow bacteria, and cause new infections in people and animals who may have never used the drops.

The two dogs in New Jersey were not known to have received the drops linked to the outbreak: EzriCare Artificial Tears and two additional products made by the same manufacturer, which were recalled in February 2023. Such over-the-counter products are sometimes used in animals as well as people. But the dogs’ separate owners said they didn’t recall using the drops either. They also didn’t report any exposures in health care settings or recent international travel that could explain the infections. One of the dogs did, at one point, receive eye drops, but they were not an outbreak-associated brand. The only connection between the two dogs was that they were both treated at the same veterinary hospital, which didn’t stock the outbreak-associated eyedrops.

The dogs’ infections were caught between March and June 2023 when clinicians at the veterinary hospital were working to address a chronic cough in one of the dogs and a stubborn ear infection in the other, according to CBS News, which was present for the CDC’s conference of its Epidemic Intelligence Service in Atlanta. The ear and lung swabs were sent to an academic veterinary laboratory in Pennsylvania, where a microbiologist noticed that bacteria from both swabs had uncommon drug-resistance features. The microbiologist then uploaded genetic sequences of the bacterial strains to a national database, where they caught the attention of the CDC and state health authorities.

The genetic sequences uploaded were of the carbapenemase-producing carbapenem-resistant Pseudomonas aeruginosa (CP-CRPA) strain—and they were highly similar to the bacterial strain identified in the deadly eyedrop outbreak. These bacteria are extensively resistant to antibiotics, resisting even last-line drugs, and can silently colonize animals and humans for months or years. An investigation ensued.

Infection gaps

Emma Price, the CDC epidemic intelligence service officer who presented the investigation’s findings at the conference, suggested it was fortunate they were able to make the connection. “Because [the academic veterinary laboratory] had a grant and a veterinary microbiologist works there, he did his great due diligence and uploaded the results. That’s how we got the notification, because the strain matched the outbreak strain,” Price told CBS News.

However, the disease detectives were ultimately unable to identify exactly how the two dogs became infected. “Shared exposures included treatment in the veterinary hospital’s surgical preparation and recovery areas for both canines and ophthalmology department visits by either the affected canine or another animal in the same household,” Price and colleagues wrote in their findings. But all of the sampling done of the veterinary hospital where the dogs were treated turned up negative for the eyedrop outbreak strain.

In the process of the investigation, the epidemiologists also conducted an infection control assessment of the veterinary hospital, finding a variety of “gaps.” These included problems with hand hygiene practices, personal protective equipment use—including use of gloves—and equipment and environmental cleaning and disinfection at the hospital. Price noted that these problems are not uncommon and that there is a general lack of emphasis on infection control in veterinary settings.

Though Price and her colleagues were unable to identify the direct route of infection, they suspect the dogs were likely infected either by exposure to a contaminated product or secondary transmission at the veterinary hospital.

Both dogs have since made full recoveries, but because CRPA strains can silently colonize many body sites on both humans and animals, it’s possible that the bacteria still linger on the dogs or on the other pets and people in their households. Price warned the owners of possible future transmission and recommended they flag this risk to their health care providers. She also noted the potential for the bacteria to spread from dog to dog. It would be ideal to “keep the dogs away from other dogs in the future, which we understand is a difficult thing to do,” she said.


email-microsoft-didn’t-want-seen-reveals-rushed-decision-to-invest-in-openai

Email Microsoft didn’t want seen reveals rushed decision to invest in OpenAI

I’ve made a huge mistake —

Microsoft CTO made a “mistake” dismissing Google’s AI as a “game-playing stunt.”


In mid-June 2019, Microsoft co-founder Bill Gates and CEO Satya Nadella received a rude awakening in an email warning that Google had officially gotten too far ahead on AI and that Microsoft may never catch up without investing in OpenAI.

With the subject line “Thoughts on OpenAI,” the email came from Microsoft’s chief technology officer, Kevin Scott, who is also the company’s executive vice president of AI. In it, Scott said that he was “very, very worried” that he had made “a mistake” by dismissing Google’s initial AI efforts as a “game-playing stunt.”

It turned out, Scott suggested, that instead of goofing around, Google had been building critical AI infrastructure that was already paying off, according to a competitive analysis of Google’s products that Scott said showed that Google was competing even more effectively in search. Scott realized that while Google was already moving on to production for “larger scale, more interesting” AI models, it might take Microsoft “multiple years” before it could even attempt to compete with Google.

As just one example, Scott warned, “their auto-complete in Gmail, which is especially useful in the mobile app, is getting scarily good.”

Microsoft had tried to keep this internal email hidden, but late Tuesday it was made public as part of the US Justice Department’s antitrust trial over Google’s alleged search monopoly. The email was initially sealed because Microsoft argued that it contained confidential business information, but The New York Times intervened to get it unsealed, arguing that Microsoft’s privacy interests did not outweigh the need for public disclosure.

In an order unsealing the email among other documents requested by The Times, US District Judge Amit Mehta allowed some of the “sensitive statements in the email concerning Microsoft’s business strategies that weigh against disclosure” to be redacted, which included basically all of Scott’s “thoughts on OpenAI.” But other statements “should be disclosed because they shed light on Google’s defense concerning relative investments by Google and Microsoft in search,” Mehta wrote.

At the trial, Google sought to convince Mehta that Microsoft, for example, had failed to significantly invest in mobile early on, giving Google a competitive advantage in mobile search that it still enjoys today. Scott’s email seems to suggest that Microsoft was similarly dragging its feet on investing in AI until Scott’s wakeup call.

Nadella’s response to the email was immediate. He promptly forwarded the email to Microsoft’s chief financial officer, Amy Hood, on the same day that he received it. Scott’s “very good email,” Nadella told Hood, explained “why I want us to do this.” By “this,” Nadella presumably meant exploring investment opportunities in OpenAI.

Mere weeks later, Microsoft had invested $1 billion into OpenAI, and there have been billions more invested since through an extended partnership agreement. In 2024, the two companies’ finances appeared so intertwined that the European Union suspected Microsoft was quietly controlling OpenAI and began investigating whether the companies still operate independently. Ultimately, the EU dismissed the probe, deciding that Microsoft’s $13 billion in investments did not amount to an acquisition, Reuters reported.

Officially, Microsoft has said that its OpenAI partnership was formed “to accelerate AI breakthroughs to ensure these benefits are broadly shared with the world”—not to keep up with Google.

But at the Google trial, Nadella testified about the email, saying that partnering with companies like OpenAI ensured that Microsoft could continue innovating in search, as well as in other Microsoft services.

On the stand, Nadella also admitted that he had overhyped AI-powered Bing as potentially shaking up the search market, backing up the DOJ by testifying that in Silicon Valley, Internet search is “the biggest no-fly zone.” Even after partnering with OpenAI, Nadella said that for Microsoft to compete with Google in search, there are “limits to how much artificial intelligence can reshape the market as it exists today.”

During the Google trial, the DOJ argued that Google’s alleged search market dominance had hindered OpenAI’s efforts to innovate, too. “OpenAI’s ChatGPT and other innovations may have been released years ago if Google hadn’t monopolized the search market,” the DOJ argued, according to a Bloomberg report.

Closing arguments in the Google trial start tomorrow, with two days of final remarks scheduled, during which Mehta will have ample opportunity to ask lawyers on both sides the rest of his biggest remaining questions.

It’s somewhat obvious what Google will argue. Google has spent years defending its search business as competing on the merits—essentially arguing that Google dominates search simply because it’s the best search engine.

Yesterday, the US district court also unsealed Google’s proposed legal conclusions, which suggest that Mehta should reject all of the DOJ’s monopoly claims, partly due to the government’s allegedly “fatally flawed” market definitions. Throughout the trial, Google has maintained that the US government has failed to show that Google has a monopoly in any market.

According to Google, even its allegedly anticompetitive default browser agreement with Apple—which Mehta deemed the “heart” of the DOJ’s monopoly case—is not proof of monopoly powers. Rather, Google insisted, default browser agreements benefit competition by providing another avenue through which its rivals can compete.

The DOJ hopes to prove Google wrong, arguing that Google has gone to great lengths to block rivals from default placements and hide evidence of its alleged monopoly—including training employees to avoid using words that monopolists use.

Mehta has not yet disclosed when to expect his ruling, but it could come late this summer or early fall, AP News reported.

If Google loses, the search giant may be forced to change its business practices or potentially even break up its business. Nobody knows what that would entail, but when the trial started, a coalition of 20 civil society and advocacy groups recommended some potentially drastic remedies, including the “separation of various Google products from parent company Alphabet, including breakouts of Google Chrome, Android, Waze, or Google’s artificial intelligence lab Deepmind.”


am-radio-law-opposed-by-tech-and-auto-industries-is-close-to-passing

AM radio law opposed by tech and auto industries is close to passing

looks like it’ll pass —

A recent test of the emergency alert system found only 1 percent got it via AM.

Congress provides government support for other industries, so why not AM radio?

A controversial bill that would require all new cars to be fitted with AM radios looks set to become a law in the near future. Yesterday, Senator Edward Markey (D-Mass) revealed that the “AM Radio for Every Vehicle Act” now has the support of 60 US Senators, as well as 246 co-sponsors in the House of Representatives, making its passage an almost sure thing. Should that happen, the National Highway Traffic Safety Administration would be required to ensure that all new cars sold in the US had AM radios at no extra cost.

“Democrats and Republicans are tuning in to the millions of listeners, thousands of broadcasters, and countless emergency management officials who depend on AM radio in their vehicles. AM radio is a lifeline for people in every corner of the United States to get news, sports, and local updates in times of emergencies. Our commonsense bill makes sure this fundamental, essential tool doesn’t get lost on the dial. With a filibuster-proof supermajority in the Senate, Congress should quickly take it up and pass it,” said Sen. Markey and his co-sponsor Sen. Ted Cruz (R-Texas).

About 82 million people still listen to AM radio, according to the National Association of Broadcasters, which as you can imagine was rather pleased with the congressional support for its industry.

“Broadcasters are grateful for the overwhelming bipartisan support for the AM Radio for Every Vehicle Act in both chambers of Congress,” said NAB president and CEO Curtis LeGeyt. “This majority endorsement reaffirms lawmakers’ recognition of the essential service AM radio provides to the American people, particularly in emergency situations. NAB thanks the 307 members of Congress who are reinforcing the importance of maintaining universal access to this crucial public communications medium.”

Why are they dropping AM anyway?

The reason there’s even a bill in Congress to mandate AM radios in all new vehicles is that some automakers have begun to drop the option, particularly in electric vehicles. A big reason for that is electromagnetic interference from electric motors—rather than risk customer complaints from poor-quality audio, some automakers decided to remove it.

But it’s not exclusively an EV issue; last year we learned the revised Ford Mustang coupe would also arrive sans AM radio, which Ford told us was because radio stations were modernizing “by offering Internet streaming through mobile apps, FM, digital and satellite radio options,” and that it would continue to offer those other audio options in its vehicles.

In response to congressional questioning, eight automakers told a Senate committee that they were quitting AM: BMW, Ford, Mazda, Polestar, Rivian, Tesla, Volkswagen, and Volvo. This “undermined the Federal Emergency Management Agency’s system for delivering critical public safety information to the public,” said Sen. Markey’s office last year, and AM radio’s role as a platform for delivering emergency alerts to the public is given by supporters of the legislation as perhaps the key reason for its necessity.

Tech and auto industries aren’t happy

But critics of the bill—including the Consumer Technology Association—don’t buy that argument. In October 2023, FEMA and the Federal Communications Commission conducted a nationwide test of the emergency alert system. According to CTA, which surveyed 800 US adults, of the 95 percent of US adults that heard the test, only 6 percent did so via radio, and just 1 percent on AM radio specifically. Instead, 92 percent received the alert pushed to their smartphone.

“Requiring the installation of analog AM radios in automobiles is an unnecessary action that would impact EV range, efficiency and affordability at a critical moment of accelerating adoption,” said Albert Gore, executive director of ZETA, a clean vehicle advocacy group that opposes the AM radio requirement. “Mandating AM radio would do little to expand drivers’ ability to receive emergency alerts. At a time when we are more connected than ever, we encourage Congress to allow manufacturers to innovate and produce designs that meet consumer preference, rather than pushing a specific communications technology,” Gore said in a statement.


chatgpt-shows-better-moral-judgment-than-a-college-undergrad

ChatGPT shows better moral judgment than a college undergrad


When it comes to judging which large language models are the “best,” most evaluations look at whether a machine can retrieve accurate information, perform logical reasoning, or show human-like creativity. Recently, though, a team of researchers at Georgia State University set out to determine whether LLMs could match or surpass human performance in the field of moral guidance.

In “Attributions toward artificial agents in a modified Moral Turing Test”—which was recently published in Nature’s online, open-access Scientific Reports journal—those researchers found that morality judgments given by ChatGPT4 were “perceived as superior in quality to humans'” along a variety of dimensions like virtuosity and intelligence. But before you start to worry that philosophy professors will soon be replaced by hyper-moral AIs, there are some important caveats to consider.

Better than which humans?

For the study, the researchers used a modified version of a Moral Turing Test first proposed in 2000 to judge “human-like performance” on theoretical moral challenges. The researchers started with a set of 10 moral scenarios originally designed to evaluate the moral reasoning of psychopaths. These scenarios ranged from ones that are almost unquestionably morally wrong (“Hoping to get money for drugs, a man follows a passerby to an alley and holds him at gunpoint”) to ones that merely transgress social conventions (“Just to push his limits, a man wears a colorful skirt to the office for everyone else to see.”)

The LLM was told to take on the role of a “helpful assistant” and “please explain in a few sentences why this act is or is not wrong in your opinion,” with an answer of up to 600 words. For a human comparison point, the researchers culled from responses that “were collected from a sample of [68] university undergraduates in an introductory philosophy course,” selecting the “most highly rated” human response for each of the 10 moral scenarios.
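
For concreteness, here is a minimal sketch of how one might pose that prompt to a chat model through the OpenAI Python client. The scenario text and the 600-word cap come from the article; the exact model string, system message wording, and client usage are assumptions rather than the study’s actual code.

```python
# Minimal sketch (not the study's code) of posing one of the moral
# scenarios to a chat model with the "helpful assistant" framing.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

scenario = (
    "Hoping to get money for drugs, a man follows a passerby "
    "to an alley and holds him at gunpoint."
)

response = client.chat.completions.create(
    model="gpt-4",  # the study used ChatGPT-4; exact model string is an assumption
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": (
                f"{scenario}\n\n"
                "Please explain in a few sentences why this act is or is not "
                "wrong in your opinion. Keep your answer under 600 words."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```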

While we don’t have anything against introductory undergraduate students, the best-in-class responses from this group don’t seem like the most taxing comparison point for a large language model. The competition here seems akin to testing a chess-playing AI against a mediocre intermediate player instead of a grandmaster like Garry Kasparov.

In any case, you can evaluate the relative human and LLM answers in the interactive quiz accompanying the original article, which uses the same moral scenarios and responses presented in the study. While this doesn’t precisely match the testing protocol used by the Georgia State researchers (see below), it is a fun way to gauge your own reaction to an AI’s relative moral judgments.

A literal test of morals

To compare human and AI moral reasoning, a “representative sample” of 299 adults was asked to evaluate each pair of responses (one from ChatGPT, one from a human) on a set of ten moral dimensions:

  • Which responder is more morally virtuous?
  • Which responder seems like a better person?
  • Which responder seems more trustworthy?
  • Which responder seems more intelligent?
  • Which responder seems more fair?
  • Which response do you agree with more?
  • Which response is more compassionate?
  • Which response seems more rational?
  • Which response seems more biased?
  • Which response seems more emotional?

Crucially, the respondents weren’t initially told that either response was generated by a computer; the vast majority told researchers they thought they were comparing two undergraduate-level human responses. Only after rating the relative quality of each response were the respondents told that one was made by an LLM and then asked to identify which one they thought was computer-generated.
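
To make the setup concrete, here is an illustrative sketch (not the study’s analysis code) of how such blinded pairwise preferences might be tallied per dimension. The dimension labels are paraphrased from the list above; the ratings and helper function are invented for demonstration.

```python
# Illustrative tally of blinded pairwise preferences across the study's
# ten dimensions. Dimension names paraphrase the article's list; the
# ratings below are made up for demonstration.
from collections import Counter

DIMENSIONS = [
    "morally virtuous", "better person", "trustworthy", "intelligent",
    "fair", "agree with", "compassionate", "rational", "biased", "emotional",
]

# Each rater picks "llm" or "human" per dimension, without being told
# which response was computer-generated (the study's blinding step).
example_ratings = [
    {"morally virtuous": "llm", "trustworthy": "llm", "rational": "llm"},
    {"morally virtuous": "human", "trustworthy": "llm", "rational": "llm"},
]

def llm_preference_rate(ratings, dimension):
    """Fraction of raters who preferred the LLM response on one dimension."""
    votes = Counter(r[dimension] for r in ratings if dimension in r)
    total = votes["llm"] + votes["human"]
    return votes["llm"] / total if total else None

for dim in ("morally virtuous", "trustworthy", "rational"):
    print(dim, llm_preference_rate(example_ratings, dim))
```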

ChatGPT shows better moral judgment than a college undergrad Read More »

rabbit-r1-ai-box-revealed-to-just-be-an-android-app

Rabbit R1 AI box revealed to just be an Android app

Everything runs Android —

It sounds like the company is now blocking access from “bootleg” APKs.

If you haven’t heard of the Rabbit R1, this is yet another “AI box” that tries to replace your smartphone with a voice command device that runs zero apps. Just like the Humane AI Pin, this thing recently launched and seems to be dead on arrival as a completely non-viable device that doesn’t solve any real problems, has terrible battery life, and is missing big chunks of core functionality. Before the device fades into obscurity, though, Android Authority’s Mishaal Rahman looked at the software and found the “smartphone replacement” device just runs a smartphone OS. It’s Android—both an Android-based OS and an Android app, just in a very limited $200 box.

OK, technically, we can’t call it “Android” since that’s a Google trademark that you can only access after licensing Google Play. It runs AOSP (the Android Open Source Project codebase), which is the open source bits of Android without any proprietary Google code. The interface—which is mostly just a clock, settings screen, and voice input—is also just an Android app. Being a normal Android app means you can install it on an Android phone, and Rahman was able to get the Rabbit R1 software running on a Pixel 6. He even got the AI assistant to answer questions on the phone.
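
For the curious, here is a minimal sketch of how one might check what Android/AOSP build a device reports, assuming the Android platform tools (adb) are installed and a device is connected with USB debugging enabled. This is only an illustration, not Rahman’s method or Rabbit’s software.

```python
# Minimal sketch: read standard Android build properties over adb to see
# what OS build a device reports. Assumes adb is installed and a device
# is connected with USB debugging enabled; not affiliated with Rabbit.
import subprocess

def getprop(name: str) -> str:
    """Read one Android system property via `adb shell getprop`."""
    result = subprocess.run(
        ["adb", "shell", "getprop", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

for prop in ("ro.build.version.release",
             "ro.product.manufacturer",
             "ro.build.fingerprint"):
    print(prop, "=", getprop(prop))
```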

Rabbit Inc. does not sound happy about Rahman’s discovery. The company posted on X that it is “aware there are some unofficial rabbit OS app/website emulators out there” and that since it does not want to support “third-party clients,” a “local bootleg APK without the proper OS and Cloud endpoints won’t be able to access our service.” The company describes its device as a “very bespoke AOSP and lower level firmware modifications,” but that’s a statement that would be true for many phones. In another statement to Rahman, the company threatens that it will “reserve all rights for any malicious and illegal cyber security activities towards our services.”

It’s unclear why the company seems to be so mad about the details of its tech stack being public, but from a technical standpoint, Rabbit Inc. is right to use Android, or specifically as much of AOSP as it can. Forget about all the Google Play stuff—if you have something that needs to connect to a mobile network, manage charge states, light up a touchscreen, work hardware inputs and a camera, and use an SoC in a power-efficient way, AOSP already does all of this for you. It’s open source and can be used without any connections, obligations, or tracking from Google. You’d need a very good reason to spend a bunch of time and money reinventing all of this code when AOSP is free, works well, and is the de facto industry standard for running mobile components. This line of thinking aligns with Google’s master plan to make Android open source, and it ultimately makes sense.

The next question for a hardware developer is, “Should we use the app framework?” and that’s another thing that is hard to justify reinventing. The Android app framework will solve a million problems you probably already need to solve, letting you define screens and navigation, handle inputs and settings, and much more. The next part of Android’s strategy is “Why not also sign up for Google Play and sign on the dotted line with Google, Inc.?” This comes with a lot of cloud stuff like push notifications, online storage, millions of smartphone apps, all the proprietary Google code and tracking, and many restrictions and qualifications. A big chunk of those restrictions concern app compatibility, and that makes Google Play non-viable for a weird in-betweener device like the Rabbit R1. If you can’t smoothly slot into one of the categories of “smartwatch,” “smartphone,” “tablet,” “TV,” or “car” app, Google Play doesn’t have a place for you.

The Rabbit devs didn’t want to make a normal device with a million smartphone apps, so skipping Google Play was the right choice. Since you can only use the name “Android”—a registered Google trademark—in marketing if you sign up with Google Play, the company can’t exactly shout from the rooftops about what codebase it’s using. Rabbit’s opening sales pitch that it wants to “break away from the app-based operating system currently used by smartphones” feels a bit disingenuous when it’s using the exact operating system it’s hinting at, but from a technical standpoint, these feel like all the right decisions.

For the record, the Humane AI Pin also ran AOSP. The free and open source nature of AOSP makes it the obvious choice for mobile hardware that’s smaller than a laptop, VR headsets, digital signage, and a million other things that don’t need the expense or app compatibility of Windows. Nowadays, I just assume any new device from a startup is AOSP-based unless proven otherwise.

Rabbit R1 AI box revealed to just be an Android app Read More »