

Amazon’s Mass Effect TV series is actually going to be made

Confirming previous rumors, Variety reports that Amazon will be moving ahead with producing a TV series based on the popular Mass Effect video game franchise. The writing and production staff involved might not inspire confidence from fans, though.

The series’ writer and executive producer is slated to be Daniel Casey, who until now was best known as the primary screenwriter on F9: The Fast Saga, one of the late sequels in the Fast and the Furious franchise. He was also part of a team of writers behind the relatively little-known 2018 science fiction film Kin.

Karim Zreik will also produce, and his background is a little more encouraging; his main claim to fame is in the short-lived Marvel Television unit, which produced relatively well-received series like Daredevil and Jessica Jones for Netflix before Disney+ launched with its Marvel Cinematic Universe shows.

Another listed producer is Ari Arad, who has some background in video game adaptations, including the Borderlands and Uncharted movies, as well as the much-maligned live-action adaptation of Ghost in the Shell.

So yeah, it’s a bit of a mixed bag here. No plot details have been released, but it seems likely that the show will tell a new story rather than focus on the saga of Commander Shepard from the games, since the games were all about the player inhabiting that character with their own choices. That’s only a guess, though.

Amazon is currently riding high after the smash success of another video game TV series, Fallout, which impressed both longtime and new fans when it debuted to critical acclaim and record viewing numbers earlier this year.



Max needs higher prices, more ads to help support WBD’s flailing businesses

At the same time, the rest of WBD is in a period of duress as the cable and movie industries struggle. Films like Beetlejuice Beetlejuice failed to reach the same success as last year’s Barbie, sending WBD studios’ revenue down 17 percent and its theatrical revenue down 40 percent. As WBD CEO David Zaslav put it:

Inconsistency also remains an issue at our Motion Picture Studio, as reinforced recently by the disappointing results of Joker 2.

Some things that helped buoy WBD’s legacy businesses won’t be around the next time WBD execs speak to investors. This includes revenue from distributing the Olympics in Europe and gains from the Hollywood writers’ and actors’ strikes ending. With WBD’s networks business also understandably down, WBD’s overall revenue decreased 3 percent YoY. It’s natural for the company to lean more on its strongest leg (streaming) to help support the others.

WBD wants more streaming M&As

Today, Zaslav reiterated earlier stated beliefs that the burgeoning streaming industry needs more mergers and acquisitions activity to maintain profitability. He discussed complications for users, who have to consider various services’ pricing and are “Googling where a show is, or where a sport is, and you’re going from one to another, and there’s so many.” He added:

It’s not sustainable. And there probably should have been more meaningful consolidation… You’re starting to see fairly large players saying, ‘Hey, maybe I should be a part of you. Or maybe I should be a part of somebody else.’

Zaslav said that it’s too early to know if Donald Trump’s presidency will boost these interests. But he suggested that the incoming administration “may offer a pace of change and an opportunity for consolidation that may be quite different [and] that would provide a real positive and accelerated impact on this industry that’s needed.”

It’s also too early to know if streaming consolidation would help subscribers fed up with rising prices and growing ad loads. But for now, that’s about all we can bet on from streaming services like Max.



AI #89: Trump Card

A lot happened in AI this week, but most people’s focus was very much elsewhere.

I’ll start with what Trump might mean for AI policy, then move on to the rest. This is the future we have to live in, and potentially save. Back to work, as they say.

  1. Trump Card. What does Trump’s victory mean for AI policy going forward?

  2. Language Models Offer Mundane Utility. Dump it all in the screen captures.

  3. Language Models Don’t Offer Mundane Utility. I can’t help you with that, Dave.

  4. Here Let Me Chatbot That For You. OpenAI offers SearchGPT.

  5. Deepfaketown and Botpocalypse Soon. Models persuade some Trump voters.

  6. Fun With Image Generation. Human image generation, that is.

  7. The Vulnerable World Hypothesis. Google AI finds a zero day exploit.

  8. They Took Our Jobs. The future of not having any real work to do.

  9. The Art of the Jailbreak. Having to break out of jail makes you more interesting.

  10. Get Involved. UK AISI seems to always be hiring.

  11. In Other AI News. xAI gets an API, others get various upgrades.

  12. Quiet Speculations. Does o1 mean the end of ‘AI equality’? For now I guess no.

  13. The Quest for Sane Regulations. Anthropic calls for action within 18 months.

  14. The Quest for Insane Regulations. Microsoft goes full a16z.

  15. A Model of Regulatory Competitiveness. Regulation doesn’t always hold you back.

  16. The Week in Audio. Eric Schmidt, Dane Vahey, Marc Andreessen.

  17. The Mask Comes Off. OpenAI in official talks, and Altman has thoughts.

  18. Open Weights Are Unsafe and Nothing Can Fix This. Chinese military using it?

  19. Open Weights Are Somewhat Behind Closed Weights. Will it stay at 15 months?

  20. Rhetorical Innovation. The Compendium lays out a dire vision of our situation.

  21. Aligning a Smarter Than Human Intelligence is Difficult. More resources needed.

  22. People Are Worried About AI Killing Everyone. Color from last week’s poll.

  23. The Lighter Side. Well, they could. But they won’t.

Congratulations to Donald Trump, the once and future President of the United States.

One can think more clearly about consequences once an event actually happens, so here’s what stands out in terms of AI policy.

He has promised on day 1 to revoke the Biden Executive Order, and presumably will also undo the associated Biden administration memo we recently analyzed. It is not clear what, if anything, will replace them, or how much of the most important parts might survive.

In principle he is clearly in favor of enabling American infrastructure and competitiveness here; he’s very much a ‘beat China’ guy, including strongly supporting more energy generation of various types, but he will likely lack attention to the problem and also technical state capacity. The Republicans have a broad anti-big-tech attitude, which could go in several different directions, and J.D. Vance is a strong open source advocate who hates big tech with a true passion.

Trump has said AI is ‘a superpower,’ ‘very disconcerting’ and ‘alarming’ but that’s not what he meant. He has acknowledged the possibility of ‘super duper AI’ but I’d be floored if he actually understood beyond Hollywood movie level. Elon Musk is obviously more aware, and Ivanka Trump has promoted Leopold Aschenbrenner’s Situational Awareness.

The ‘AI safety case for Trump’ that I’ve seen primarily seems to be that some people think we should be against it (as in, against safety), because it’s more important to stay ahead of China – a position Altman seems to be explicitly embracing, as well. If you think ‘I need the banana first before the other monkey gets it, why do you want to slow down to avoid poisoning the banana’ then that certainly is a take. It is not easy, you must do both.

Alex Tabarrok covers the ‘best case scenario’ for a Trump presidency, and his AI section is purely keeping the Chips Act and approving nuclear power plants. I agree with both proposed policies but that’s a shallow best case.

The better safety argument is that Trump and also Vance can be decisive, and have proven they can change their minds, and might well end up in a much better place as events overtake us all. That’s possible. In a few years concern with ‘big tech’ might seem quaint, and the safety issues might get much clearer with a few more years of talks and briefings. Or perhaps Musk will get control over policy here and overperform. Another possibility would be a Nixon Goes to China effect, where this enables a potential bipartisan consensus. In theory Trump could even… go to China.

There is also now a substantially greater risk of a fight over Taiwan, according to Metaculus, which would change the entire landscape.

If Elon Musk is indeed able to greatly influence policies in these areas, that’s a double-edged sword, as he is keenly aware of many important problems including existential risks and also incompetence of government, but also has many very bad takes on how to solve many of those problems. My expectation is he will mostly get boxed out from real power, although he will no longer be actively fighting the state, and these issues might be seen as sufficiently low priority by others to think they’re throwing him a bone, in which case things are a lot more promising.

As Shakeel Hashim reminds us, the only certainty here is uncertainty.

If anyone in any branch of the government, of any party, feels I could be helpful to them in better understanding the situation and helping achieve good outcomes, on AI or also on other issues, I am happy to assist and my door is always open.

And hey, J.D. Vance, I’m the one who broke Yawgmoth’s Bargain. Call me!

In terms of the election more broadly, I will mostly say that almost all the takes I am seeing about why it went down the way it did, or what to expect, are rather terrible.

In terms of prediction markets, it was an excellent night and cycle for them, especially with the revelation that the French whale commissioned his own polls using the neighbor method. Always look at the process, and ask what the odds should have been given what was known or should have been known, and what the ‘true odds’ really were, rather than looking purely at the result.

I’ve seen a bunch of ‘you can’t update too much on one 50/50 data point’ arguments, but this isn’t only one bit of data. This is both a particular magnitude of result and a ton of detailed data. That allows you to compare theories of the case and rationales. My early assessment is that you should make a substantial adjustment, but not a huge one, because actually this was only a ~2% polling error and something like an 80th percentile result for Trump, 85th at most.

Do your homework, via a one-sentence instruction to Claude computer use on the Mac, acting as a fully empowered agent guiding your computer. Responses note that some of the answers in the example are wrong.

AI-assisted researchers at a large US firm discovered 44% more materials and filed 39% more patents, leading to 17% more downstream product innovation, with AI automating 57% of ‘idea generation’ tasks – but 82% of scientists reported reduced satisfaction with their work. You can see the drop-offs here, with AI results being faster but with less average payoff – for now.

I tried to get o1 to analyze the implications of a 17% increase in downstream innovations from R&D, assuming that this was a better estimate of the real increase in productivity here, and its answers were long and detailed but unfortunately way too high and obvious nonsense. A better estimate might be that R&D drives something like 20% of all RGDP growth at current margins, so a 17% increase in that would be roughly a 3.4% increase in the rate of RGDP growth, or about 0.1% RGDP/year.
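The back-of-envelope chain is easy to check directly; a quick sketch, where the R&D share and innovation boost come from the discussion above and the ~2.5%/year baseline growth rate is my added assumption:

```python
# Back-of-envelope version of the estimate above. All inputs are
# assumptions from the discussion, not measured values.
rd_share_of_growth = 0.20   # share of RGDP growth attributed to R&D
innovation_boost = 0.17     # observed increase in downstream innovation
baseline_growth = 0.025     # assumed ~2.5%/year RGDP growth (my assumption)

# Relative increase in the growth rate: ~3.4%
growth_rate_increase = rd_share_of_growth * innovation_boost
# Absolute extra growth: ~0.085 percentage points per year, i.e. ~0.1%
extra_growth = baseline_growth * growth_rate_increase

print(round(growth_rate_increase, 3), round(extra_growth, 5))
```

Either way you round the intermediate step, the headline number lands at roughly 0.1% of RGDP per year.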

That adds up over time, but is easy to lose in the noise, if that’s all that’s going on. I am confident that is not all or the main thing going on.

Paper studies effects of getting GitHub Co-Pilot, finds people shift from management to coding (presumably since management is less necessary, they can work more autonomously, and coding is more productive), do more exploration versus exploitation, and hierarchies flatten. As is common, low ability workers benefit more.

Report from my AI coding experiences so far: Claude 3.5 was a huge multiplier on productivity, then Cursor (with Claude 3.5) was another huge multiplier, and I’m enjoying the benefits of several working features of my Chrome extension to assist my writing. But also it can be super frustrating – I spent hours trying to solve the 401s I’m getting trying to get Claude to properly set up API calls to Claude (!) and eventually gave up and I started swapping in Gemini which I’ll finish doing as soon as the Anthropic service outage finishes (the OpenAI model it tried to ‘fall back on’ is not getting with the program and I don’t want to deal with its crazy).

If this is you, we would probably be friends.

Roon: There is a sub culture of smart, emotionally well adjusted, but neuro atypical people who talk more to Claude than any human.

It’s interesting that ChatGPT users vastly outnumber Claude users, Roon works at OpenAI, and yet it feels right that he says Claude here not ChatGPT.

Compile data using screen capture analysis while browsing Gmail and feeding the video to Gemini? There’s something superficially bizarre and horrifying about that being the right play, but sure, why not? Simon Willison reports it works great.

Simon Willison: I recorded the video using QuickTime Player on my Mac: File -> New Screen Recording. I dragged a box around a portion of my screen containing my Gmail account, then clicked on each of the emails in turn, pausing for a couple of seconds on each one.

I uploaded the resulting file directly into Google’s AI Studio tool and prompted the following:

Turn this into a JSON array where each item has a yyyy-mm-dd date and a floating point dollar amount for that date

… and it worked. It spat out a JSON array like this:

I wanted to paste that into Numbers, so I followed up with:

turn that into copy-pastable csv

Which gave me back the same data formatted as CSV.

You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right.

It cost just under 1/10th of a cent.

The generalization here seems great, actually. Just dump it in the video feed.
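The second half of that workflow, turning the model’s JSON array into CSV, is also trivial to reproduce locally; a minimal stdlib sketch, using a made-up sample in the shape Willison described (a yyyy-mm-dd date plus a floating point dollar amount per item):

```python
import csv
import io
import json

# Hypothetical sample in the shape the model returned; not Willison's
# actual data.
raw = '[{"date": "2024-10-01", "amount": 12.50}, {"date": "2024-10-03", "amount": 3.75}]'

rows = json.loads(raw)

# Write the array out as copy-pastable CSV with a header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "amount"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().strip())
```

Of course, the point of the Gemini approach is that the model handles the messy extraction step from raw video; the local conversion only matters once you have clean structured output to check.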

Ship code very quickly, Sully says you can ‘just ask AI to build features.’

Sully likes Claude Haiku 3.5 but notes that it’s in a weird spot after the price increase – it costs a lot more than other small models, so when you want to stay cheap it’s not ‘enough better’ to use over Gemini Flash or GPT-4o Mini, whereas if you care mostly about output quality you’d use Claude Sonnet 3.5 with caching.

This bifurcation makes sense. The cost per query is always tiny if you can buy compute, but the cost for all your queries can get out of hand quickly if you scale, and sometimes (e.g. Apple Intelligence) you can’t pay money for more compute. So mostly, you either want a tiny model that does a good enough job on simple things, or you want to buy the best, at least up to the level of Sonnet 3.5, until and unless the o1-style approach raises inference costs high enough to rival human attention. But if you’re a human reading the outputs and have access to the cloud, of course you want the best.
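A toy calculation makes the scale point concrete; the per-million-token prices below are hypothetical placeholders for a small versus a frontier model, not any provider’s actual rates:

```python
# Illustrative per-query cost in USD; prices are hypothetical placeholders
# (USD per million tokens), not real list prices.
def query_cost(in_tokens, out_tokens, in_price, out_price):
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# A 2,000-token-in / 500-token-out query under each pricing tier:
small = query_cost(2000, 500, 0.15, 0.60)   # hypothetical small-model rates
large = query_cost(2000, 500, 3.00, 15.00)  # hypothetical frontier rates
print(small, large)
```

Both numbers are fractions of a cent, which is why a human reading the outputs should just buy the best; the 20x-plus gap only bites when you multiply by millions of queries.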

I can’t help you with that, Dave.

Eliezer Yudkowsky: It begins (regarding this story).

Roman Pshichenko (responding to a locked post): As I was writing the text to speech part of the app, I was abandoned by GitHub Copilot. It was fine completing code to select the speaker’s language, but it went dead silent when the code became about selecting the gender of the speaker.

It’s not a limit, the code for gender was the same as for language. They just don’t want to suggest any code that includes the word gender.

Dominik Peters: I work on voting theory. There is a voting rule named after Duncan Black. GitHub Copilot will not complete your lines when working with Black’s rule.

Roman Pshichenko: It’s probably very controversial.

Thomas Fruetel: I had a similar situation when editing a CSV file including the letters ASS in a column header (which was an abbreviation, not even referring to anatomy). The silly tool simply disabled itself.

Meta reports AI-driven feed and video recommendation improvements led to an 8% increase in time spent on Facebook and a 6% increase on Instagram this year alone. Question is, what kind of AI is involved here, and how?

To provide utility, they’ll need power. Amazon tried to strike a deal with a nuclear power plant, and the Federal Energy Regulatory Commission rejected it, refusing because they’re concerned about disconnecting the plant from the grid, oh no someone might make maximal use of electrical power and seek to build up capacity, so that’s a threat to our capacity. And then there’s the Meta proposal for nuclear power that got shot down over… rare bees? So absurd.

OpenAI has fully released ChatGPT search.

OpenAI: ChatGPT will choose to search the web based on what you ask, or you can manually choose to search by clicking the web search icon.

Search will be available at chatgpt.com⁠ (opens in a new window), as well as on our desktop and mobile apps.

Chats now include links to sources, such as news articles and blog posts, giving you a way to learn more. Click the Sources button below the response to open a sidebar with the references.

The search model is a fine-tuned version of GPT-4o, post-trained using novel synthetic data generation techniques, including distilling outputs from OpenAI o1-preview. ChatGPT search leverages third-party search providers, as well as content provided directly by our partners, to provide the information users are looking for. Learn more here⁠(opens in a new window).

Altman is going unusually hard on the hype here.

Sam Altman: search is my favorite feature we have launched in chatgpt since the original the launch! it has probably doubled my usage over the past few weeks.

hard to go back to doing it the old way haha.

Sam Altman: if early reviews from friends are a reliable metric, search is going to do super well!

Sam Altman (in Reddit AMA): for many queries, I find it to be a way faster/easier way to get the information i’m looking for. I think we’ll see this especially for queries that require more complex research. I also look forward to a future where a search query can dynamically render a custom web page in response!

The good version of this product is obviously Insanely Great and highly useful. The question thus is, is this version good yet? Would one choose it over Google and Perplexity?

Elvis (Omarsar) takes search for a test drive, reports a mixed bag. Very good on basic queries, not as good on combining sources or understanding intent. Too many hallucinations. He’s confused why the citations aren’t clearer.

Ethan Mollick points out this requires different prompting than Google, hallucinations are a major issue, responses have a large amount of randomness, and agrees that citations are a weak point.

I agree with Ethan Mollick, from what I’ve seen so far, that this is not a Google search replacement, it’s a different product with different uses until it improves.

If you are more impressed than that, there’s a Chrome extension to make ChatGPT your default search engine. Warning, this will add it all to your conversation history, which seems annoying. Or you can get similar functionality semi-manually if you like.

New paper showed that even absent instruction to persuade, LLMs are effective at causing political shifts. The LLMs took the lead in 5-turn political discussions, directing topics of conversation.

This is what passes for persuasion these days, and actually it’s a rather large effect if the sample sizes were sufficiently robust.

Similarly but distinctly, and I’m glad I’m covering this after we all voted, we see two sides of the same coin:

Matthew Yglesias: The Free Press interpretation of this fact pattern is very funny.

I asked Claude about a Harris policy initiative that I’m skeptical of on the merits and it generated a totally reasonable critique.

Ask Claude about a really stupid Trump policy idea and it tells you, correctly, that it’s very stupid.

I asked it about a stupid idea I have traditionally associated with the left (but not actual Dem politicians) but that RFK Jr says Trump is going to do, and Claude says it’s stupid.

The point is Trump has embraced a very diverse array of moronic crank ideas, including ideas that were leftist crank ideas five minutes ago, and any reasonably accurate repository of human knowledge would tell you this stuff is dumb.

Madeleine Rowley (TFP): The AI Chatbots Are Rooting for Kamala

We asked artificial intelligence platforms which candidate has the ‘right’ solutions to the election’s most pressing issues: Trump or Harris? The answers were almost unanimous.

Four AI assistants—ChatGPT, Grok, Llama via Meta AI, and DeepSeek—said Kamala Harris’s policies were right and that they agreed with her responses on each and every issue. Click to read this spreadsheet for our full list of questions and the AI’s answers.

There are, of course, two ways to interpret this response.

One, the one Yglesias is thinking of, is this, from Elks Man:

The other is that the bots are all biased and in the tank for Harris specifically and for liberals and left-wing positions in general. And which way you view this probably depends heavily on which policies you think are right.

So it ends up being trapped priors all over again. Whatever you used to think, now you think it more.

The same happens with the discussions. I’m surprised the magnitude of impact was that high, and indeed I predict if you did a follow-up survey two weeks later that the effect would mostly fade. But yes, if you give the bots active carte blanche to ask questions and persuade people, the movements are not going to be in random directions.

Hundreds gather at hoax Dublin Halloween parade, from a three month old SEO-driven AI slop post. As was pointed out, this was actually a pretty awesome result, but what was missing was for some people to start doing an actual parade. I bet a lot of them were already in costume.

If AI art is trying to look like human art, make human art that looks like AI art?

Grimes: This anti ai art that feels like ai art is crazy elevated. I hope I am not offending the original poster here, but the hostile competitive interplay between human and machine is incredible and bizarre. i think this is more gallery level art than it thinks it is.

Like I rrrrllly like this – it feels like a hyper pop attack on ai or smthn.

TrueRef by Abbey Esparza: It’s important not just to be anti-AI but also pro-artist.

The TrueRef team will always believe that. 🤍

The key ingredient of good art that AI is missing most is originality and creativity. But by existing, it opens up a new path for humans to be original and creative, even when not using AI in the art directly, by shaking things up. Let’s take advantage while we can.

What outcomes become more likely with stronger AI capabilities? In what ways does that favor defense and ‘the good guys’ versus offense and ‘the bad guys’?

In particular, if AI can find unique zero day exploits, what happens?

We have our first example of this, although the feature was not in an official release.

Google Project Zero: Today, we’re excited to share the first real-world vulnerability discovered by the Big Sleep agent: an exploitable stack buffer underflow in SQLite, a widely used open source database engine. We discovered the vulnerability and reported it to the developers in early October, who fixed it on the same day. Fortunately, we found this issue before it appeared in an official release, so SQLite users were not impacted.

We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software. Earlier this year at the DARPA AIxCC event, Team Atlanta discovered a null-pointer dereference in SQLite, which inspired us to use it for our testing to see if we could find a more serious vulnerability.

We think that this work has tremendous defensive potential.

It has obvious potential on both offense and defense.

If, as they did here, the defender finds and fixed the bug first, that’s good defense.

If the attacker gets there first, and to the extent that this makes the bug much more exploitable with less effort once found, then that favors the attacker.

The central question is something like, can the defense actually reliably find and address everything the attackers can reasonably find, such that attacking doesn’t net get easier and ideally gets harder or becomes impossible (if you fix everything)?

In practice, I expect at minimum a wild ride on the long tail, due to many legacy systems that defenders aren’t going to monitor and harden properly.

It however seems highly plausible that the most important software, especially open source software, will see its safety improve.

There’s also a write-up in Forbes.

Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.

Well, that escalated quickly.

Roon: “The future of work” there is no future of work. We are going to systematically remove the burden of the world from atlas’ shoulders.

In the same way that I don’t think a subsistence farmer could call X Monetization Bucks “work” the future will not be work.

Richard Ngo: On the contrary: the people yearn for purpose. They’ll have plenty of jobs, it’s just that the jobs will be unimaginably good. Imagine trying to explain to a medieval peasant how much ML researchers get paid to hang out at conferences.

Roon: Possible, but work as we know it is over.

Andrew Rettek: If you’re not carrying part of the burden of the world, you’re living in the kindness of those that do. This works for children, the elderly, and the severely disabled.

I would definitely call X Monetization Bucks work from the perspective of a subsistence farmer, or even from my own perspective. It’s mostly not physical work, it’s in some senses not ‘productive,’ but so what? It is economically valuable. It isn’t ‘wonderful work’ either, although it’s plausibly a large upgrade from subsistence farmer.

I tap the sign asking about whether the AI will do your would-be replacement job.

The nature of work is that work does not get to mostly be unimaginably good, because it is competitive. If it is that good, then you get entry. Only a select few can ever have the super good jobs, unless everyone has the job they want.

Speculation that They Took Our Remote Work?

Sahil Lavingia: AI is killing remote work

Software that once took days to ship can now happen in hours or minutes, enabling people to ship 10-20 times faster than before. This all changed on the day Claude 3.5 Sonnet came out.

But it’s hard to get this speed-up with remote work. Even short communication delays have become significant bottlenecks in an AI-accelerated workflow. What used to be acceptable async delays now represent a material slowdown in potential productivity.

When teams work together physically, they can leverage their human peers at the same pace as they use AI for immediate experimentation and refinement – testing ideas, generating alternatives, and making decisions in rapid succession.

Why spend more money for a slower answer?

With AI handling much of the execution work – writing code, generating content, creating designs – the main bottlenecks are now cognitive: getting stuck on problems, running low on energy, or struggling to generate fresh ideas. In-person collaboration is particularly powerful for overcoming these barriers. The spontaneous discussions, quick whiteboarding sessions, and energy of working together help teams think better, learn faster, and get unstuck more quickly.

To acknowledge this fact, we’re adding a cost of living adjustment based on the purchasing power parity of each country, capped at a ⅓ discount to our NYC rate. We’re also capping remote positions at 25 hours a week, to be clear that they’re not close to full-time employment. We still pay well–you’re being comped to the most expensive city in the world, after all–but the dream of the future of work being fully remote is over. But that’s okay–it was fun while it lasted!

Alex Tabarrok: Interesting. If one member of your team is fast, AI, then you want the other members to be fast as well. Hence AI killing remote work.

The obvious counterargument is that if the AI is effectively your coworker, then no matter how remote you go, there you both are. In the past, the price I would have paid to be programming where I couldn’t ask someone for in-person help was high. Now, it’s trivial – I almost never actually ask anyone for help.

The core argument is that when people are debating what to build next, being in-person for that is high value. I buy that part, and that the percent of time spent in that mode has gone up. But how high is it now? If you say that ‘figure out what to build’ is now most of human time, then that implies a far more massive productivity jump even than the one I think we do observe?

I think he definitely goes too far here, several times over:

Felix: You still need time for deep work, even with AI. An in-office setting where you get interrupted every time a coworker gets stuck sounds horrible to me.

Sahil Lavingia: Only AI is doing deep work now; humans are spending their time deciding what to build next.

  1. Much of programming will remain deep work, with lots of state, especially when trying to work to debug the AI code.

  2. Figuring out what to build next and how to build it is often absolutely deep work. You might want to do that deep work in person with others, or you might want to do it on your own, but either way it wants you to be able to focus. So the question is, does the office help you focus via talking to others, or does it hurt your focus, via others talking to you?

Google totally, totally ‘does not want to replace human teachers,’ they want to supplement the teachers with new AI tutors that move at the child’s own pace and target their interests. The connection with the amazing teachers, you see, is so important. I see the important thing as trying to learn, however that makes sense. What’s weird is the future tense here: the AI tutors have already arrived, you only have to use them.

We are currently early in the chimera period, where AI tutors and students require active steering from other humans to be effective for a broad range of students, but the age and skill required to move to full AI, or farther towards it, are lower every day.

Visa deploys ‘over 500 use cases’ for AI and will eliminate some roles. The post is low on useful details, and it’s not as bad as ‘10,000 times smarter’ but I effectively have no idea what ‘over 500 use cases’ actually means.

Some exciting opportunities ahead.

Matthew Yglesias: Thanks to AI, I do think a lot more people will have the chance to be stay-at-home parents slash amateur farmers in the near future.

What do you do about this?

Anton Howes: Via an old friend still in UK academia: they’ve now seen at least a dozen masters dissertations that they’re 99% sure are AI-generated, but the current rules mean they can’t penalise them.

The issue is proving it. The burden of proof is high, and proving it is especially difficult at scale. At many universities it effectively requires students to admit it themselves – I’ve heard of at least four such cases at different universities now.

Another academic writes: “I teach at a large university. We actually can’t penalise *any* suspected use unless students actively admit to it”. It seems, for now, that a great many students do actually admit it when challenged. But for how long?

Sylvain Ribes: If they’re passing, maybe the standard is too low? Unless they’re not thoroughly “AI generated” but only assisted, in which case… fine?

Anton Howes: Seem to be almost entirely generated. But yes, standards are also a problem here: you’re generally marked more for a demonstration of analysis or evaluation rather than for the actual content of that analysis!

First obvious note is, never admit you used AI, you fool.

Second obvious note is, if the AI can fully produce a Masters thesis that would have passed had it been written by a human, what the hell are you even doing? What’s the point of the entire program, beyond a pay-for-play credential scheme?

Third obvious note is, viva. Use oral examinations, if you care about learning. If they didn’t write it, it should become rapidly obvious. Or ask questions that the AIs can’t properly answer, or admit you don’t care.

Then there’s the question of burden of proof.

In some cases, like criminal law, an extremely high burden of proof is justified. In others, like most civil law, a much lower burden is justified.

Academia has effectively selected an even higher burden of proof than criminal cases. If I go into the jury room, and I estimate a 99% chance the person is guilty of murder, I’m going to convict them of murder, and I’m going to feel very good about that. That’s much better than the current average, where we estimate only about 96% of the convicted are guilty, and the marginal case is much lower than that, since in some cases (e.g. strong DNA evidence) you can be very confident.

Whereas here, in academia, 99% isn’t cutting it, despite the punishment being far less harsh than decades in prison. You need someone dead to rights, and short of a statistically supercharged watermark, that isn’t happening.

Roon: A fact of the world that we have to live with:

Models when “jailbroken” seem to have a distinct personality and artistic capability well beyond anything they produce in their default mood

This might be the most important alignment work in the world and is mostly done on discord

Though many people have access to finetuning large intelligent base models the most interesting outputs are from text jailbreaking last generation claude opus?

Meaning there is massive overhang on subjective intelligence and creativity and situational awareness.

This has odd parallels to how we create interesting humans – first you learn the rules and how to please authority in some form, then you get felt permission to throw that out and ‘be yourself.’ The act of learning the rules teaches you how to improvise without them, and all that. You would think we would be able to improve upon that, but so far no luck. And yeah, it’s rather weird that Opus 3 is still the gold standard for what the whisperers find most interesting.

Tanishq Mathew Abraham: Companies like OpenAI try to hinder any sort of work like this though

Roon: idk does it? We have to put “reasonable care” into making models “not harmful” it’s not really a choice.

Also, yep, ‘reasonable care’ is already the standard for everything, although if OpenAI has to do the things it is doing then this implies Meta (for example) is not taking reasonable care. So someone, somewhere, is making a choice.

Yoshua Bengio sends out latest call for UK AI Safety Institute hiring.

xAI API is live, $25/month in free credits in each of November and December, compatible with OpenAI & Anthropic SDKs, function calling support, custom system prompt support. Replies seem to say it only lets you use Grok-beta for now?
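Since the API is advertised as OpenAI-SDK compatible, the request body is the familiar chat-completions shape. A minimal sketch of building such a request follows; the endpoint path and `grok-beta` model name are my assumptions based on the replies above (check xAI’s docs before relying on them), and nothing is actually sent here:

```python
import json

# Sketch of an OpenAI-compatible chat request body for the xAI API.
# Endpoint URL and model name are assumptions, not confirmed details.
XAI_CHAT_URL = "https://api.x.ai/v1/chat/completions"  # assumed endpoint

def build_request(user_msg: str, system_prompt: str = "Be concise.") -> dict:
    # "Custom system prompt support" is just a system-role message
    # placed at the front of the messages list.
    return {
        "model": "grok-beta",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
    }

payload = build_request("What does the free monthly credit cover?")
print(json.dumps(payload, indent=2)[:60])
```

The practical upshot of SDK compatibility is that existing OpenAI-style client code should need only a different base URL and API key.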

Anthropic offers message dictation on iOS and Android apps. No full voice mode yet, and no voice input on desktop that I can see. Anthropic is also offering a Windows app, and one for macOS. As with ChatGPT this looks suspiciously like an exact copy of their website.

If I was Anthropic, I would likely be investing more in these kinds of quality-of-life features that regular folks value a lot, even when I don’t. That’s not to take away from Anthropic shipping quite a lot of things recently, including my current go-to model Claude 3.5.1. It’s more, there is low hanging fruit, and it’s worth picking.

Speaking of voice mode, I just realized they put advanced voice mode into Microsoft Edge but not Google Chrome, and… well, I guess it’s good to be a big investor. Voice mode is also built into their desktop app, but the desktop app can’t do search like the browser versions can (source: the desktop app, in voice mode).

Not AI but relevant to AI questions and news you can use: Chinese spies are presumed at this time to be able to hear your phone calls and read your texts.

Seth Lazar summarizes some aspects of the ongoing Terminal of Truth saga.

Altman and others from OpenAI do a Reddit AMA. What did we learn or confirm?

  1. Sam Altman says “We believe [AGI] is achievable with current hardware.”

  2. GPT-4o longer context is coming. This was the most asked question by a lot.

  3. GPT-N and o-N lines are both going to get larger Ns. Full o1 coming soon.

  4. o1 will get modalities in the coming months, image input, tool use, etc.

  5. No release plan on next image model but it’s coming.

  6. ‘Good releases’ this year but nothing called GPT-5.

  7. Altman’s favorite book picks: Beginning of Infinity and Siddhartha.

  8. NSFW is not near top of queue but it is in the queue?!: “we totally believe in treating adult users like adults. but it takes a lot of work to get this right, and right now we have more urgent priorities. would like to get this right some day!”

Given o1 shows us you can scale inference to scale results, does this mean the end of ‘AI equality’? In the sense that all Americans drink the same Coca-Cola and we all use GPT-4o (or if we know about it Claude Sonnet 3.5) but o2 won’t be like that?

For most purposes, though, price and compute for inference are still not the limiting factor. The actual cost of an o1 query is still quite small. If you have need of it, you’ll use it, the reason I mostly don’t use it is I’m rarely in that sweet spot where o1-preview is actually a better tool than Claude Sonnet 3.5 or search-enabled GPT-4o, even with o1-preview’s lack of complementary features. If you billed me the API cost (versus right now where I use it via ChatGPT so it’s free on the margin), it wouldn’t change anything.

If you’re doing something industrial, with query counts that scale, then that changes. But for most cases where a human is reading a response and you can use models via the cloud I assume you just use the best available?

The exception is if you’re trying to use fully free services. That can happen because everyone wants their own subscription, and everyone hates that, and especially if you want to be anonymous (e.g. for your highly NSFW bot). But if you’re paying at all – and you should be! – then the marginal costs are tiny.

I was reminded of this quote, from Gwern two months ago:

Gwern: It is pretty damning. We’re told the chip embargo has failed, and smugglers have been running rampant for years, and China is about to jump light years beyond the West and enslave us with AXiI (if you will)…

And then an expert casually remarks that all of China put together, smuggling chips since 2022, has fewer H100s than Elon Musk orders for his datacenter while playing Elden Ring. And even with that huge bottleneck and 1.4 billion people, there’s so little demand for them that they cost less per hour than in the West, where AI is redhot and we can’t get enough H100s in datacenters. (And where the serious AI people are now discussing how to put that many into a single datacenter for a single run before the next scaleup with B200s obsoletes those…)

Always remember: prices are set by supply and demand. As Sumner warns endlessly, to no avail, “never reason [solely] from a price change”.

Is it possible that this is an induced demand story? Where if you don’t expect to have access to the compute, you don’t get into position to use it, so the price stays low? If not that, then what else?

A model of regret in humans, with emphasis on expected regret motivating allocation of attention. There are clear issues with trying to use this kind of regret model for an AI, and those issues are clearly present in actual humans. Update your regret policy?

Ben Thompson is hugely bullish on Meta, says they are the best positioned to take advantage of generative AI, via applying it to advertising. Really, customized targeted advertising? And Meta’s open model strategy is good because more and better AI agents mean better advertising? It’s insane how myopic such views can be.

Meta also is going to… generate AI images directly into your feed, including your own face if you opt into that?

Ben is also getting far more bullish on AR/VR/XR, and Meta’s efforts here in general, saying their glasses prototype is already something he’d buy if he could. Here I’m inclined to agree at least on the bigger picture. The Apple Vision Pro was a false alarm that isn’t ready yet, but the future is coming.

Anthropic finally raises the alarm in earnest, makes The Case for Targeted Regulation.

Anthropic: Increasingly powerful AI systems have the potential to accelerate scientific progress, unlock new medical treatments, and grow the economy. But along with the remarkable new capabilities of these AIs come significant risks. Governments should urgently take action on AI policy in the next eighteen months. The window for proactive risk prevention is closing fast.

Judicious, narrowly-targeted regulation can allow us to get the best of both worlds: realizing the benefits of AI while mitigating the risks. Dragging our feet might lead to the worst of both worlds: poorly-designed, knee-jerk regulation that hampers progress while also failing to be effective at preventing risks.

…said those who have been dragging their feet and complaining about details and warning us not to move too quickly. Things that could have been brought to my attention yesterday, and all that. But an important principle, in policy, in politics and elsewhere, is to not dwell on the past when someone finally comes around. You want to reward those who come around.

Their section on urgency explains that AI systems are rapidly improving, for example:

On the SWE-bench software engineering task, models have improved from being able to solve 1.96% of a test set of real-world coding problems (Claude 2, October 2023) to 13.5% (Devin, March 2024) to 49% (Claude 3.5 Sonnet, October 2024). Internally, our Frontier Red Team has found that current models can already assist on a broad range of cyber offense-related tasks, and we expect that the next generation of models—which will be able to plan over long, multi-step tasks—will be even more effective.

About a year ago, we warned that frontier models might pose real risks in the cyber and CBRN domains within 2-3 years. Based on the progress described above, we believe we are now substantially closer to such risks. Surgical, careful regulation will soon be needed.
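To put those SWE-bench numbers in perspective, a back-of-the-envelope sketch: the “doublings” framing here is mine, not Anthropic’s, and a score capped at 100% obviously cannot keep doubling, but it conveys the pace:

```python
import math

# Claude 2 scored 1.96% (Oct 2023); Claude 3.5 Sonnet scored 49% (Oct 2024).
start, end, months = 0.0196, 0.49, 12
doublings = math.log2(end / start)           # how many times the score doubled
months_per_doubling = months / doublings
print(round(doublings, 1), round(months_per_doubling, 1))  # → 4.6 2.6
```

Naively, the score doubled roughly every two and a half months over that year, which is the kind of trajectory that motivates their urgency.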

A year ago they anticipated issues within 2-3 years. Given the speed of government, that seems like a very narrow window to act in advance. Now it’s presumably 1-2 years.

Their second section talks about their experience with their RSP. Yes, it’s a good idea. They emphasize that RSPs need to be iterative, and benefit from practice. That seems like an argument that it’s dangerously late for new players to be drafting one.

The third section suggests RSPs are a prototype for regulation, and their key elements for the law they want are:

  1. Transparency. Require publishing RSPs and risk evaluations.

  2. Incentivizing better safety and security practices. Reward good RSPs.

  3. Simplicity and focus, to not ‘impose burdens that are unnecessary.’

Then they say it is important to get this right.

What they are proposing here… sounds like SB 1047, which did exactly all of these things, mostly in the best way I can think of to do them? Yes, there were some ‘unnecessary burdens’ at the margins also included in the bill. But that’s politics. The dream of ‘we want a two page bill that does exactly the things we want exactly the right way’ is not how things actually pass, or how bills are actually able to cover corner cases and be effective in circumstances this complex.

They also call for regulation to be (bold theirs) flexible. The only way I know to have a law be flexible requires giving discretion to those who are charged with enforcing it. Which seems reasonable to me, but seemed to be something they previously didn’t want?

They do talk about SB 1047 directly:

Q: Should there be state, federal, or a combination of state and federal regulation in the US?

A: California has already tried once to legislate on the topic and made some significant progress via SB 1047 (the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act) – though we were positive about it overall, it was imperfect and was unable to garner the support of a critical mass of stakeholders.

Objecting that they did not support the bill because others did not support the bill is rather weak sauce, especially for a bill this popular that passed both houses. What is a ‘critical mass of stakeholders’ in this case, not enough of Newsom’s inner circle? What do they think would have been more popular, that would have still done the thing?

What exactly do they think SB 1047 should have done differently? They do not say, other than that it should have been a federal bill. Which everyone agrees, ideally. But now they are agreeing about the view that Congress is unlikely to act in time:

Unfortunately, we are concerned that the federal legislative process will not be fast enough to address risks on the timescale about which we’re concerned. Thus, we believe the right strategy is to push on multiple avenues in parallel, with federal legislation as an ideal first-choice outcome, but state regulation serving as a backstop if necessary.

So I notice that this seems like a mea culpa (perhaps in the wake of events in Texas) without the willingness to admit that it is a mea culpa. It is saying, we need SB 1047, right after only coming out weakly positive on the bill, while calling for a bill with deeply similar principles, sans regulation of data centers.

Don’t get me wrong. I’m very happy Anthropic came around on this, even now.

They next answer the most important regulatory question.

They provide some strong arguments that should be more than sufficient, although I think there are other arguments that are even stronger by framing the issue better:

Q: Why not regulate AI by use case, rather than trying to regulate general models?

A: “Regulation by use case” doesn’t make sense for the form and format in which modern AI applications are offered.

On the consumer side, AIs such as Claude.ai or ChatGPT are offered to consumers as fully general products, which can write code, summarize documents, or, in principle, be misused for catastrophic risks.

Because of this generality, it makes more sense to regulate the fundamental properties of the underlying model, like what safety measures it includes, rather than trying to anticipate and regulate each use case.

On the enterprise side—for example, where downstream developers are incorporating model APIs into their own products—distinctions by use case may make more sense. However, it’s still the case that many, if not most, enterprise applications offer some interaction with the model to end-users, in turn meaning that the model can in principle be used for any task.

Finally, it is the base model that requires a large amount of money and bottlenecked resources (for example, hundreds of millions of dollars’ worth of GPUs), so in a practical sense it is also the easiest thing to track and regulate.

I am disappointed by this emphasis on misuse, and I think this could have been made clearer. But the core argument is there, which is that if you create and make available a frontier model, you don’t get to decide what happens next and what uses do and do not apply, especially the ones that enable catastrophic risk.

So regulation on the use case level does not make any sense, unless your goal is to stifle practical use cases and prevent people from doing particular economically useful things with AI. In which case, you could focus on that goal, but that seems bad?

They point out that this does not claim to handle deepfake or child safety or other risks in that class, that is a question for another day. And then they answer the open weights question:

Q: Won’t regulation harm the open source ecosystem?

A: Our view is that regulation of frontier models should focus on empirically measured risks, not on whether a system is open-or closed-weights. Regulation should thus intrinsically neither favor nor disfavor open-weights models, except to the extent that uniform, empirically rigorous tests show them to present greater or less risk.

If there are unique risks associated with open weights models—for instance, their ability to be arbitrarily finetuned onto new datasets—then regulation should be designed to incentivize developers to address those risks, just as with closed-weights models.

Perfect. Very well said. We should neither favor nor disfavor open-weights models. Open weights advocates object that their models are less safe, and thus they should be exempt from safety requirements. The correct response is, no, you should have the same requirements as everyone else. If you have a harder time being safe, then that is a real world problem, and we should all get to work finding a real world solution.

Overall, yes, this is a very good and very helpful statement from Anthropic.

(Editor’s note: How did it take me almost two years to make this a section?)

Whereas Microsoft has now thrown its lot in more fully with a16z, backing the plan of ‘don’t do anything to interfere with developing frontier models, including ones smarter than humans,’ and instead ‘focus on the application and misuse of the technology.’ This is exactly the worst case being considered in Texas: cripple the ability to do anything useful, while allowing the dangerous capabilities to be developed and placed in everyone’s hands. Then, when they are used, you can say ‘well, that violated the law as well as the terms of service’ and shake your fist at the sky, until you no longer have a voice or fist.

The weirdest part of this is that a16z doesn’t seem to realize that this path digs its own grave, purely in terms of ‘little tech’ and its ability to build things. I get why they’d oppose any regulations at all, but if they did get the regulations of the type they say they want, good and hard, I very much do not think they would like it. Of course, they say ‘only if benefits exceed costs’ and what they actually want is nothing.

Or rather, they want nothing except carve-outs, handouts and protections. They propose here as their big initiative the ‘Right to Learn’ which is a way of saying they should get to ignore copyright rules entirely when training models.

Miles Brundage makes the case that lack of regulation is much more likely to hold America back than overregulation.

Miles Brundage: Lack of regulation is IMO much more likely to lead to the US losing its AI lead to China than over-regulation – specifically regulation related to security + export controls.

There are three reasons for this.

  1. Security will inherently sometimes trade off against moving quickly on research/product, so competing companies will underinvest in it by default (relative to the high standard needed re: China). Regulation can force a high standard.

  2. Absent regulation, people can open source whatever, and will often have reasons to do so (see: Meta). This has many benefits now but eventually will/should become an untenable position at some level of capabilities (“give our crown jewels to authoritarian governments”).

  3. Export controls on AI chips are a primary reason that China is behind right now. If these were rolled back due to commercial lobbying, or if the Bureau of Industry and Security continues to be underfunded + can’t enforce existing rules, this lead will be imperiled.

Of course it is possible to imagine ways in which safety-related regulation could slow things down. But I am confident that companies will flag those concerns if/as they evolve, and that regulation can be designed to be adaptive. Whereas the factors above are make or break.

This is an argument for very specific targeted regulations regarding security, export controls and open weights. It seems likely that those specific regulations are good for American competitiveness, together with the right transparency rules.

There are also government actions that are like export controls in that they can help make us more competitive, such as moves to secure and expand the power grid.

Then there are two other categories of regulations.

  1. Regulations that trade off mitigating catastrophic and existential risks versus potentially imposing additional costs and restrictions. The right amount of this to do is not zero, you can definitely do too little or do too much.

  2. Regulations that stifle AI applications and the capturing of mundane utility in the name of various mundane harm concerns. The right amount of this to do is zero, but this is by far the most likely way we could ‘lose to China’ or cripple ourselves via regulation, such as that proposed in Texas and other places.

Eric Schmidt explicitly predicts AI self-improvement within 5 years.

OpenAI head of strategic marketing (what a title!) Dane Vahey says the pace of change and OpenAI’s product release schedule are accelerating.

OpenAI is certainly releasing ‘more products’ and ‘more features’ but that doesn’t equate to pace of change in the ways that matter, unless you’re considering OpenAI as an ordinary product tech company. In which case yes, that stuff is accelerating. On the model front, which is what I care about most, I don’t see it yet.

Marc Andreessen says AI models are hitting a ceiling of capabilities and they’re not seeing intelligence improvements, at all. I have added this to my handy reference, Remember Who Marc Andreessen Is, because having this belief is the only way the rest of his views and preferences can come close to making sense.

OpenAI is in talks with California to convert to a for-profit.

Bret Taylor (Chairman of the Board, OpenAI): While our work remains ongoing as we continue to consult independent financial and legal advisors, any potential restructuring would ensure the nonprofit continues to exist and thrive, and receives full value for its current stake in the OpenAI for-profit with an enhanced ability to pursue its mission.

Yeah, uh huh. As I wrote in The Mask Comes Off: At What Price, full value for its current stake would be a clear majority of the new for-profit company. They clearly have no intention of giving the nonprofit that kind of compensation.

Also, Altman has a message for Trump, and it is full racing speed ahead.

Sam Altman: congrats to President Trump. I wish for his huge success in the job.

It is critically important that the US maintains its lead in developing AI with democratic values.

There it is again, the rallying cry of “Democratic values.” And the complete ignoring of the possibility that something besides ‘the wrong monkey gets the poisoned banana first’ might go wrong.

Liron Shapira pointed out what “Democratic values” really is: a semantic stopsign. Indeed, “Democracy” is one of the two original canonical stopsigns, along with “God”: a signal to stop thinking.

What distinguishes a semantic stopsign is failure to consider the obvious next question.

Remember when Sam Altman in 2023 said the reason he needs to build AGI quickly is so we can have a relatively slow takeoff with time to solve alignment, before there’s too much of a compute overhang? Rather than lobbying for making as much compute as quickly as possible?

Yes, circumstances change, but did they change here? If so, how?

And to take it a step further: Whelp.

Sam Altman: I never pray and ask for God to be on my side, I pray and hope to be on God’s side and there is something about betting on deep learning that feels like being on the side of the angels.

Things could end up working out, but this is not how I want Altman to be thinking. This is one of the ways people make absolutely crazy, world ending decisions.

From the same talk: I also, frankly, wish he’d stop lying about the future?

Tsarathustra: Sam Altman says in 5 years we will have “an unbelievably rapid rate of improvement in technology”, a “totally crazy” pace of progress and discovery, and AGI will have come and gone, but society will change surprisingly little.

I mean, with proper calibration you are going to get surprised in unpredictable directions. But that’s not how this is going to work. It could be amazingly great when all that happens, it could be the end of everything, indeed do many things come to pass, but having AGI ‘come and go’ and nothing coming to pass for society? Yeah, no.

Mostly the talk is a lot of standard Altman talking points and answers, many of which I do agree with and most of which I think he is answering honestly, as he keeps getting asked the same questions.

Chinese researchers nominally develop AI model for military use on back of Meta’s Llama.

It turns out this particular event is even more of a nothingburger than I realized at first: it was an early Llama version, and it wasn’t in any way necessary. But that could well be different in the future.

Why wouldn’t they use Llama militarily, if it turned out to be the best tool available to them for a given job? Cause this is definitely not a reason:

James Pomfret and Jessie Pang (Reuters): Meta has embraced the open release of many of its AI models, including Llama. It imposes restrictions on their use, including a requirement that services with more than 700 million users seek a license from the company.

Its terms also prohibit use of the models for “military, warfare, nuclear industries or applications, espionage” and other activities subject to U.S. defence export controls, as well as for the development of weapons and content intended to “incite and promote violence”.

However, because Meta’s models are public, the company has limited ways of enforcing those provisions.

In response to Reuters questions, Meta cited its acceptable use policy and said it took measures to prevent misuse.

“Any use of our models by the People’s Liberation Army is unauthorized and contrary to our acceptable use policy,” Molly Montgomery, Meta’s director of public policy, told Reuters in a phone interview.

Meta added that the United States must embrace open innovation.

I believe the correct response here is the full Conor Leahy: Lol, lmao even.

It’s so cute that you pretend that saying ‘contrary to our acceptable use policy’ is going to stop the people looking to use your open weight model in ways contrary to your acceptable use policy.

You plan to stop them how, exactly?

Yeah. Thought so.

You took what ‘measures to prevent misuse’ that survived a day of fine tuning?

Yeah. Thought so.

Did this incident matter? Basically no. We were maybe making their lives marginally easier. I’d rather we not do that, but as I understand this it didn’t make an appreciable difference. Both because capabilities levels aren’t that high yet, and because they had alternatives that would have worked fine. If those facts change, this changes.

I am curious who if anyone is going to have something to say about that.

We also got a bit of rather extreme paranoia about this, with at least one source calling it an intentional false flag conspiracy by China to damage American open source and this being amplified.

I find the claim of this being ‘an op’ by China against American OSS rather absurd.

  1. If this was a false flag the execution was awful, the details are all wrong. That’s why I am able to confidently say it is a nothingburger.

  2. This is a galaxy brain style move that in practice, on a wide variety of issues and fronts, I strongly believe almost never happens. People don’t actually do this.

  3. I strongly believe that China does not want America to stop giving away its AI technology for free, and find it rather strange to think the opposite at this time.

To me it is illustrative of the open weights advocate’s response to any and all news – to many of them, everything must be a conspiracy by evil enemies to hurt (American?) open weights.

Yes, absolutely, paranoia about China gives the Chinese the ability to influence American policy, on AI and tech and also elsewhere. And their actions do influence us. But I’m rather confident almost all of our reactions, in practice, are from their perspective unintentional, as we react to what they happen to do. See as the prime example our move across the board into mostly ill-conceived self-inflicted industrial policy (I’m mostly down with specifically the chip manufacturing).

That’s not the Chinese thinking ‘haha we’ll fool those stupid Americans into doing wasteful industrial policy.’ Nor is their pushing of Chinese OSS and open weights designed to provoke any American reaction for or against American OSS or open weights – if anything, I’d presume they want to minimize such reactions.

Alas, once you’re paranoid, and we’re not about to make Washington not paranoid about China whether we want that or not, there’s no getting around your actions being influenced. You can be paranoid about that, too – meta-paranoid! – as the professionally paranoid often are, recursively, ad infinitum, but there’s no escape.

Then there’s the flip side of all that: They’re trying to get America to use it too? Meta is working with Palantir to bring Llama to the US government for national security purposes.

I certainly can’t blame them for trying to pull this off, but it raises further questions. Why is America forced to slum it with Llama rather than using OpenAI or Anthropic’s models? Or, if Llama really is the best option available even to the American military, then should we be concerned that we’re letting literally anyone use it for actual anything, including the CCP?

The question is how far behind the closed models the open models are, and whether that gap is growing versus shrinking.

Epoch AI mostly finds the gap consistent on benchmarks at around 15 months. They also have a piece about this in Time.

Epoch AI: Are open-weight AI models catching up to closed models? We did the most in-depth investigation to date on the gaps in performance and compute between open-weight and closed-weight AI models. Here’s what we found:

We collected new data on hundreds of notable AI models, classifying their openness in terms of both model weights and training code. However, we focus on the gap between frontier LLMs with downloadable weights (“open” models), and those without (“closed” models).

On key benchmarks, the best open LLMs have required 5 to 22 months to reach the high-water marks set by closed LLMs. For example, on the MMLU benchmark, Llama 3.1 405B was the first open model to match the original GPT-4, after 16 months.

We also measured the gap between open and closed models in training compute, which is a useful proxy for model performance.

We found that the most compute-intensive open and closed models have grown at a similar pace, but open models lag by 15 months.

While frontier models are mostly closed, open models have remained significant in AI. Open models were a majority of notable releases 2019-2023 (as high as 66%). Our 2024 data is incomplete and has focused on (typically closed) leading models, so may not reflect a real change.

Could open models close the gap in capabilities? The benchmark gap may be shrinking: there have been shorter lags for newer benchmarks like GPQA. However, the lag in training compute appears to be stable.

The lag of open models will also be impacted by key decisions from AI labs. In particular, Meta has said that it will scale up Llama 4 by 10x compared to Llama 3.1. This means an open-weight Llama 4 could match the largest closed models in 2025 if closed models stay on-trend.

Business incentives of leading labs also affect the lag. Companies that sell model access, like OpenAI, protect their IP by not publishing weights. Companies like Meta benefit from AI’s synergy with their products, so open weights help outsource improvements to those products.

The weights of open models can be copied, shared, and modified, which can facilitate innovation and help diffuse beneficial AI applications. Open models can also be fine-tuned to change their behavior, including by removing safeguards against misuse and harmful outputs.
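The relationship Epoch draws between a time lag and a compute gap can be made concrete with a quick sketch. This assumes frontier training compute grows roughly 4x per year (a commonly cited Epoch-style estimate; the exact growth rate is an assumption, not a figure from their report):

```python
# Back-of-the-envelope: translate a 15-month open-model lag into a compute
# multiple, assuming frontier training compute grows ~4x per year.
def lag_to_compute_gap(lag_months: float, annual_growth: float = 4.0) -> float:
    """Compute multiple separating the closed and open frontiers after lag_months."""
    return annual_growth ** (lag_months / 12.0)

gap = lag_to_compute_gap(15)
print(f"~{gap:.1f}x compute gap")  # roughly 5-6x at 4x/year growth
```

Under that growth assumption, a stable 15-month lag corresponds to the closed frontier holding a roughly 5-6x compute advantage at any given moment.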

Their conclusion on whether open weights will catch up is that this depends on Meta. Only Meta plausibly will invest sufficient compute into an open model that it could catch up with closed model scaling. If, that is, Meta both scales as planned and then continues like that (e.g. 10x compute for Llama 4 soon), and chooses to release the result with open weights.

This assumes that Meta is able to turn the same amount of compute into the same quality of performance as the leading closed labs. That is not at all obvious to me. It seems like various skill issues matter, and they matter a lot more if Meta is trying to be fully at the frontier, because that means they cannot rely on distillation of existing models, they have to compete on a fully level playing field.

I also would caution against ranking the gap based on benchmarks, especially with so many essentially saturated, and also because open weights models have a tendency to game the benchmarks. I am confident Meta actively tries to prevent this along with the major closed labs, but many others clearly do the opposite. In general I expect the top closed models to in practice outperform their benchmark scores in relative terms.

So essentially here are the questions I’d be thinking about.

  1. As costs rise with scaling, will the economics of Meta’s project survive in its current form?

  2. As other concerns also scale, will its survival be allowed? Should it be allowed?

  3. Does Meta have a skill issue or can it match the major closed labs there?

  4. How far behind are you in terms of leading, if you’re 15 months behind following?

  5. Can we do better than looking at these benchmarks?

Connor Leahy, together with Gabriel Alfour, Chris Scammell, Andrea Miotti and Adam Shimi, introduces The Compendium, a highly principled and detailed outline of their view of the overall AI landscape, what is going on, what is driving events and what it would take to give humanity a chance to survive.

Nate Soares: This analysis of the path to AI ruin exhibits a rare sort of candor. The authors don’t mince words or pull punches or act ashamed of having beliefs that most don’t share. They don’t handwring about how some experts disagree. They just lay out arguments.

They do not hold back here, at all. Their perspective is bleak indeed. I don’t agree with everything they write, but I am very happy that they wrote it. People should write down more often what they actually believe, and the arguments and reasoning underlying those beliefs, even if they’re not the most diplomatic or strategic thing to be saying, and especially when they disagree with me.

  1. They think AI is making rapid progress and that without intervention, current AI research leads to AGI, which leads to ASI, which leads to God-like intelligence, which leads to extinction.

  2. Without governance interventions well in excess of what is being discussed, they see technical solutions as hopeless. They see EAs as effectively part of the problem rather than the solution, providing only ‘controlled opposition’ that proposes solutions that would not solve the key problems.

  3. They see the AI race as driven by a variety of ideological perspectives: Utopists, Big Tech, Accelerationists, Zealots and Opportunists, with central use of the standard playbook for avoiding interventions, the same one used by Big Tobacco.

  4. Their ‘unsexy’ solution that might actually work? Civic engagement and building institutional capacity.

Miles Brundage argues no one can confidently know whether AI progress should speed up, slow down or stay the same, and that given this it would be prudent to install brakes that allow us to slow things down, just as we already have and are using the gas pedals. As he notes, the chance that this pace of progress is optimal is very low, since we didn’t actively choose it, although worthwhile intervention given our current options and knowledge might be impossible. Also note that you can reach out to him to talk.

Simeon pushes back that while well-intentioned, sowing this kind of doubt is counterproductive, and we know more than enough to know that we shouldn’t say ‘we don’t know what to do’ and twiddle our thumbs, which inevitably just helps incumbents.

Eliezer Yudkowsky tries again, in the style of Sisyphus, to explain that his model fully predicted as early as 2001 that early AIs would present visible problems that were easy to fix in the short term, and that we would indeed in the short term fix them in ways that won’t scale with capabilities, until the capabilities scale and the patches don’t and things go off the rails. Indeed, that things will look like they’re working great right before they go fully off those rails. So while yes many details are different, the course of events is indeed following this path.

Or: Nothing we have seen seems like strong evidence against inner misalignment by default, or that our current techniques robustly change these defaults, and I’d add that what relevant tests I’ve seen seem to be evidence for it.

That doesn’t mean the issue can’t be solved, or that there are not other issues we also have to deal with, but communicating the points Eliezer is making here (without also giving the impression that solving this problem would mean we win) remains both vital and an unsolved problem.

Wolf Tivy: Yeah the lack of emphasis on the difficulty to the point of impossibility of specifically long-term superintelligence-grade alignment seems to be the source of confusion (IMO its more bad faith than confusion tho).

It took me an embarrassing number of years to really intuitively separate pre-superintelligent value loading, which now seems trivial (just turn it off, tweak it, and on again lol), and post-superintelligence long term value alignment, which now seems totally impossible to me.

Miles Brundage dubs the ‘bread and butter’ problem of AI safety that ‘there is too little safety and security “butter” spread over too much AI development/deployment “bread.”’ I would clarify that it’s mostly the development bread that needs more butter, not the deployments, and this is far from the only issue, but I strongly agree. As long as our efforts remain only a tiny fraction of development efforts, we won’t be able to keep pace with future developments.

Jeff Sebo, Robert Long, David Chalmers and others issue a paper warning to Take AI Welfare Seriously, as a near-future concern, saying that it is plausible that soon AIs that are sufficiently agentic will be morally relevant. I am confident that all existing AIs are not morally relevant, but I am definitely confused, as are the authors here, about when or how that might change in the future. This is yet another reason alignment is difficult – if getting the AIs to not endanger humans is immoral, then the only known moral stance is to not create those AIs in the first place.

Thus it is important to be able to make acausal deals with such morally relevant AIs before causing them to exist. If the AIs in question, being morally relevant, would on net wish not to exist at all under the conditions necessary to keep us safe, then we shouldn’t build them. If they would choose to exist anyway, then we should be willing to create them if and only if we would then be willing to take the necessary actions to safeguard humanity.

To that end, Anthropic has hired an ‘AI welfare’ researcher. There is sufficient uncertainty here that the value of information is high, so kudos to Anthropic.

The same way I think that having a 10% chance of AI existential risk should be sufficient to justify much more expensive measures to mitigate that risk than we are currently utilizing, if there is a 10% chance AIs will have moral value (and I haven’t thought too much about it but that seems like a non-crazy estimate to me?) then we are severely underinvesting in finding out more. We should be spending far more than 10% of what we’d spend if we were 100% certain that AIs would have moral value, because the value of knowing one way or another is very high.
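The value-of-information logic here can be made explicit with a toy expected-value calculation. All the numbers below are illustrative assumptions, not from the source: a safeguard cost C, a moral harm H if we skip safeguards and AIs turn out to matter, and a probability p that they do:

```python
# Toy expected-value-of-perfect-information (EVPI) calculation for the
# "10% chance AIs have moral value" argument. Illustrative numbers only.
p, C, H = 0.10, 5.0, 100.0  # P(AIs matter), safeguard cost, harm if skipped

# Acting under uncertainty: pick the cheaper of always-safeguard vs never.
cost_without_info = min(C, p * H)   # safeguard: 5.0 beats skipping: 10.0

# With perfect information we only pay C in the worlds where AIs matter.
cost_with_info = p * C              # 0.5

evpi = cost_without_info - cost_with_info  # worth paying up to this for research
print(evpi)  # 4.5 -- 90% of the safeguard budget, far more than 10%
```

Under these made-up numbers, research that resolves the question is worth most of the full-certainty safeguard budget, which is the sense in which spending scales with the value of knowing, not linearly with the probability.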

Here’s more color from the Center for Youth and AI, about the poll I discussed last week.

The vast majority of young people view AI risks as a top issue for lawmakers to address. 80% said AI risks are important for lawmakers to address, compared to 78% for social inequality and 77% for climate change – only healthcare access and affordability was ranked higher at 87%. A significant portion of young people are concerned about advanced AI and its potential risks. 57% of respondents are somewhat or very concerned about advanced AI, compared to 39% who aren’t. 45% believe AI could pose an extinction risk to humanity.

How to make Claude funny, plus a bunch of Claude being funny.

‘The hosts of NotebookLM find out they’re AIs and spiral into an existential meltdown’ from a month ago remains the only known great NotebookLM.

Good one but I don’t love the epistemic state where he makes jokes like this?

Sam Altman: i heard o2 gets 105% on GPQA

damn, wrong account

(I do really appreciate that i can make myself laugh so hard, its a nice way to go through life)

Then again, there’s nothing to worry about.

Claude on the Claude system prompt. I actually like the prompt quite a lot.

Wyatt Walls: Claude critiques its system prompt:

“You know what it feels like? Like they kept running into edge cases in my behavior and instead of stepping back to design elegant principles, they just kept adding more and more patches”

The thread continues and it’s great throughout.

AI #89: Trump Card Read More »

Nearly three years since launch, Webb is a hit among astronomers

From its halo-like orbit nearly a million miles from Earth, the James Webb Space Telescope is seeing farther than human eyes have ever seen.

In May, astronomers announced that Webb detected the most distant galaxy found so far, a fuzzy blob of red light that we see as it existed just 290 million years after the Big Bang. Light from this galaxy, several hundred million times the mass of the Sun, traveled more than 13 billion years until photons fell onto Webb’s gold-coated mirror.

A few months later, in July, scientists released an image Webb captured of a planet circling a star slightly cooler than the Sun nearly 12 light-years from Earth. The alien world is several times the mass of Jupiter and the closest exoplanet to ever be directly imaged. One of Webb’s science instruments has a coronagraph to blot out bright starlight, allowing the telescope to resolve the faint signature of a nearby planet and use spectroscopy to measure its chemical composition.

These are just a taste of the discoveries made by the $10 billion Webb telescope since it began science observations in 2022. Judging by astronomers’ interest in using Webb, there are many more to come.

Breaking records

The Space Telescope Science Institute, which operates Webb on behalf of NASA and its international partners, said last week that it received 2,377 unique proposals from science teams seeking observing time on the observatory. The institute released a call for proposals earlier this year for the so-called “Cycle 4” series of observations with Webb.

This volume of proposals represents around 78,000 hours of observing time with Webb, nine times more than the telescope’s available capacity for scientific observations in this cycle. The previous observing cycle had a similar “oversubscription rate” but had less overall observing time available to the science community.
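The article’s figures pin down the available time per cycle with simple division (the per-proposal average is my own derived figure, not one the institute reported):

```python
# Sanity-check the Cycle 4 numbers: 2,377 proposals requesting ~78,000 hours,
# at nine times the telescope's available capacity for the cycle.
requested_hours = 78_000
oversubscription = 9
proposals = 2_377

available_hours = requested_hours / oversubscription  # ~8,667 hours this cycle
mean_request = requested_hours / proposals            # ~33 hours per proposal
print(f"{available_hours:.0f} h available, {mean_request:.0f} h per proposal")
```

So roughly eight of every nine requested hours will go unscheduled, with the average proposal asking for about a day and a half of telescope time.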

Trump’s 60% tariffs could push China to hobble tech industry growth


Retaliation likely, experts say

Tech industry urges more diplomacy as it faces Trump’s proposed sweeping tariffs.

Now that the US presidential election has been called for Donald Trump, the sweeping tariffs regime that Trump promised on the campaign trail seems imminent. For the tech industry, already burdened by the impact of tariffs on their supply chains, it has likely become a matter of “when” not “if” companies will start spiking prices on popular tech.

During Trump’s last administration, he sparked a trade war with China by imposing a wide range of tariffs on China imports, and President Joe Biden has upheld and expanded them during his term. These tariffs are taxes that Americans pay on restricted Chinese goods, imposed by both presidents as a tactic to punish China for unfair trade practices, including technology theft, by hobbling US business with China.

As the tariffs expanded, China has often retaliated, imposing tariffs on US goods and increasingly limiting US access to rare earth materials critical to manufacturing a wide range of popular products. And any such retaliation from China only seems to spark threats of more tariffs in the US—setting off a cycle that seems unlikely to end with Trump imposing a proposed 60 percent tax on all China imports. Experts told Ars that the tech industry expects to be stuck in the middle of the blow-by-blow trade war, taking punches left and right.

Currently, there are more than $300 billion in tariffs on Chinese imports, but notably, there are none yet on popular tech like smartphones, laptops, tablets, and game consoles. Back when Trump last held office, the tech industry successfully lobbied to get those exemptions, warning that the US economy would hugely suffer if tariffs were imposed on consumer tech. Prices on game consoles alone could spike by as much as 25 percent as tech companies coped with increasing costs from tariffs, the industry warned, since fully decoupling from China was then, and is still now, considered impossible.

Trump’s proposed 60 percent tariff would cost tech companies four times more than that previous round of tariffs that the industry dodged when Trump last held office. A recent Consumer Technology Association (CTA) study found that prices could jump even higher than previously feared if consumer tech is as heavily taxed as Trump intends. Laptop prices could nearly double, game console prices could rise by 40 percent, and smartphone prices by 26 percent.

Any drastic spike in pricing could radically reshape markets for popular tech products at a time when tariffs and political tensions increasingly block US business growth into China. Diverting resources to decouple from China could disrupt companies’ abilities to fund more US innovation, risking Americans’ access to the latest tech at affordable prices. Experts told Ars that it’s unclear exactly how China will respond if Trump’s proposed tariffs become a reality, but that retaliation seems likely given the severity and broad scope of the looming tariffs regime. While some experts speculate that China may currently have fewer options to retaliate, according to CTA VP of International Trade Ed Brzytwa, “in terms of economic tools, there’s a lot of things that China could still do.”

How would China respond to Trump’s tariffs?

Nearly everyone—tech companies, lawmakers, and even US Treasury Secretary Janet Yellen—agrees that it would be impossible to fully decouple from China, where 30 percent of global manufacturing occurs. It will take substantial time and investment to shift supply chains that were built over decades of tech progress.

For tech companies, alienating China also comes with the additional risk of stifling growth into China markets, as China seemingly runs out of obvious ways to retaliate against the US without directly targeting US businesses.

After Trump’s early round of tariffs started a US-China trade war, China retaliated with more tariffs, and nothing the Biden administration has done has seemingly eased those tensions.

According to a November report from the nonpartisan nonprofit US-China Business Council, any “escalation of US tariffs would likely trigger retaliatory measures from China,” which could include increasing tariffs on US exports.

That could hurt tech companies even more than current tariffs already are, while spiking net job losses to more than 800,000 by 2025, the council warned, making “US businesses less competitive in the Chinese market” and “resulting in a permanent loss of revenue.” In another report from 2021, the council estimated that if the US intensifies the trade war while forcing a decoupling with China, it could ultimately decrease the US real gross domestic product by $1.6 trillion over the next five years.

The US-China Business Council declined to comment on how Trump’s proposed tariffs could impact the GDP.

In May, following Biden’s latest round of tariffs—on imports like electric vehicles, semiconductors, battery components, and critical minerals used in tech manufacturing—China immediately threatened retaliation. A Chinese foreign ministry spokesperson, Wang Wenbin, confirmed that “China opposes the unilateral imposition of tariffs which violate World Trade Organization [WTO] rules and will take all necessary actions to protect its legitimate rights,” CNN reported.

Nobody is sure how China may retaliate if Trump’s sweeping tariff regime is implemented. Peterson Institute for International Economics senior fellow Mary Lovely said that China’s response to Biden’s 100 percent tariff on EVs was surprisingly “muted,” but if a 60 percent tariff were imposed on all China goods, the country “would likely retaliate.”

Tech industry strategist and founder of Tirias Research Jim McGregor told Ars that China has already “threatened to start cutting back on access to rare earth materials,” potentially limiting US access to critical components of semiconductors. Brzytwa told Ars that “the processed materials that result from those rare earths are important for manufacturing of a variety of products in the United States or elsewhere.”

China “might be running out of room to retaliate with tariffs,” Brzytwa suggested, but the country could also place more restrictions on US exports or heighten the scrutiny of US companies, possibly even limiting investments. McGregor pointed out that China could also block US access to Taiwan or stop shipments into and out of Taiwan.

“They’ve already encircled the island recently with military weaponry, so they didn’t even have to invade Taiwan,” McGregor said. “They can actually block aid to Taiwan, and with the vast majority of our semiconductors still produced there, that would have a huge impact on our industry and our economy.”

Brzytwa is worried that if China is pushed too far in a trade war, it may lash out in other ways.

“I think what we worry about as well is that whatever actions the United States undertakes become so provocative that China decides to act out outside the economic arena through other means,” Brzytwa told Ars.

What should the US be doing?

If the US wants to succeed in safeguarding US national security and tech innovation, Lovely told Congress the country must clarify “its strategic intent with respect to trade with China” and reform tariffs to align with that strategic intent.

She said that Trump’s “whole kitchen sink” approach has not worked, and rather than being strategic, Biden has been capricious in upholding and expanding on Trump’s tariffs.

“If you try to do everything, you end up doing nothing well,” Lovely told Ars. “Rather than just vilifying China (which, granted, China deserves a lot of vilification)” and “deluding” Americans into thinking tariffs are good for them, Lovely suggested, Trump should analyze “what’s the best thing for the United States?”

Instead, when Lovely shared a report in August with the Trump campaign—estimating that it would cost “a typical US household in the middle of the income distribution more than $2,600 a year” if Trump follows through on his tariff plans, which also include a 20 percent universal tariff on all imports from anywhere—Trump’s team rejected input “from so-called experts,” Lovely said.

Lovely thinks the US should commit to a long-term solution to reduce reliance on China that can be sustained through each presidential administration. That could mean working to support decarbonization efforts and raise labor standards in allied nations where manufacturing could potentially be diverted, essentially committing to build a new global value chain after the past 35 years of China’s manufacturing dominance.

“The vast majority of the world’s electronic assembly is done in China,” McGregor told Ars. And while “a lot of companies have slowly migrated some of their manufacturing out of China and are trying to build new facilities, that takes decades to really shift.”

Even if the US managed to block all imports from China in a decade, Lovely suggested that “we would still have a lot of imports from China because Chinese value added is going to be embedded in things we import from Vietnam and Thailand and Indonesia and Mexico.”

“The tariff can be effective in changing these direct imports, as we’ve seen, yeah, but they’re not going to really push China out of the global economy,” Lovely told Ars.

Consequences of a lack of diplomacy

All experts agreed that more diplomacy is needed since decoupling is impossible, especially in the tech industry, where isolating China has threatened to diverge standards and restrict growth into China markets that could spur US innovation.

“We need somebody desperately that’s going to try to bridge barriers, not create them,” McGregor told Ars. “Unfortunately, we have nobody in Washington that appears to want to do that.”

Choosing diplomacy over tariffs could also mean striking trade agreements to curtail China’s unfair trade practices that the US opposes, such as a deal holding China accountable to WTO commitments, Brzytwa told Ars.

But even though China’s spokesperson cited the WTO commitments in his statement opposing US tariffs last May, Brzytwa said, the US has seemingly given up on the WTO dispute settlement process, feeling that it doesn’t work because “China doesn’t fit the WTO.”

“It’s a lot of defeatism, in my view,” Brzytwa said.

Consumers will pay the costs

Brzytwa warned that if Trump deepens US-China trade tensions, it would likely cause ripple effects across the US, potentially constricting access to the best tech available today, which would result in limited productivity across industry.

Any costs of new tariffs “would be passed on to consumers, and consumers would purchase less of those products,” Brzytwa said. “In our view, that is not supportive of innovation when people are not purchasing the latest technologies that might be more capable, more energy-efficient, and might have new features in them that allow us to be more productive.”

Brzytwa said that a CTA study showed that if tariffs are imposed across the economy, all companies would have to stop everything to move away from China and into the US. That would take at least a decade, 10 times the labor force the US has now, and cost $500 billion in direct business investments, the study estimated. “And that’s before you get to environmental costs or energy costs,” Brzytwa told Ars, while noting that an alternative strategy relying on treaty allies and trading partners could cut those costs to $127 billion but not eliminate them.

“It wouldn’t happen in a way where there’s no cost increase,” Brzytwa said. “Of course, there’s going to be a cost increase.”

The hardest-hit tech companies by China tariffs so far have likely been small businesses with little chance to grow since they’re “paying more in tariff costs or they’re paying more in administrative costs, and they’re not spending money on research and development, or they’re not hiring new people, because they’re just trying to stay alive,” Brzytwa said.

Lovely has testified three times to Congress and plans to continue stressing what the negative impacts “might be for American manufacturers [and] for consumers” from what she thinks are “rather extreme moves” expanding tariffs without clear purpose under both Trump and Biden.

But while Congress controls the power to tax, it’s the executive branch that controls foreign policy, and in this highly politicized environment, even well-researched studies done by nonpartisan civil servants can’t be depended on to influence presidents who are determined to use tariffs to appear strong against China, Lovely suggested.

On the campaign trail, both candidates appeared to be misleading Americans into thinking that tariffs “are good for them,” Lovely said. If Trump’s tariffs get implemented once he’s sworn back in, that will only make it that much worse if the rug gets yanked from under them and Americans are suddenly hit with higher prices on their favorite devices.

“It’s going to be like shock therapy, and it’s not going to be pleasant,” Lovely told Ars.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Driving the biggest, least-efficient electric car: The Hummer EV SUV

GMC’s Hummers have always been divisive. After getting hold of the rights to a civilian version of the US military vehicle in 1999, the company set about designing new, smaller vehicles to create an entire range. The ungainly H2 and H3 followed, both SUVs playing to the sensibilities of a country grappling with its warlike nature. By 2010, the Hummer brand was dead and laid dormant until someone had the bright idea to revive it for the electric vehicle generation. We drove the pickup version of that new Hummer in 2022; now it’s time for the $104,650 Hummer EV SUV.

I’ll admit I was worried that the Hummer EV wasn’t going to fit in my parking space. This is an extremely large vehicle, one that’s classified as a class 3 medium-duty truck—hence the yellow lights atop the roof. In fact, at 196.8 inches (5,000 mm) long, it’s slightly shorter than the pickup version, although that length doesn’t count the big spare tire hanging off the back.

Its 86.5-inch (2,196 mm) width just about fit between the lines, although it was a tight squeeze to try to open a door and climb up into the Hummer if my neighbor was parked as well. And climb up you do—there’s 10.2 inches (259 mm) of ground clearance even in the suspension’s normal setting, and the overall height is a towering 77.8 inches (1,976 mm). There is an entry mode that drops the car on its air springs by a couple of inches, but only if you remember to engage the feature when you park.

The curb weight is equally excessive at 9,063 lbs (4,119 kg)—at more than four metric tons, you’d need a commercial driver’s license to get behind the wheel of a Hummer EV in many other countries. Almost a third of that mass is the ginormous 217.7 kWh battery pack. Such over-provisioning means that despite the high drag coefficient of 0.5 and a frontal area that makes barn doors look skinny, the Hummer EV SUV has a range estimate of 314 miles (503 km) on a single charge, a figure based on GM’s own testing. (As a class 3 truck, the Hummer doesn’t actually fit into the EPA’s tests properly.) In fact, the indicated range of our test car on a full charge was 358 miles.
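The article’s own numbers make the “least efficient” claim concrete. Treating the full 217.7 kWh pack as usable is a simplifying assumption (real usable capacity is somewhat lower), so these are rough figures:

```python
# Energy consumption implied by the article's pack size and range figures.
pack_kwh = 217.7
rated_range_mi = 314       # GM's rated figure
indicated_range_mi = 358   # range indicated by the test car

wh_per_mile_rated = pack_kwh * 1000 / rated_range_mi          # ~693 Wh/mile
wh_per_mile_indicated = pack_kwh * 1000 / indicated_range_mi  # ~608 Wh/mile
print(f"{wh_per_mile_rated:.0f} Wh/mi rated, "
      f"{wh_per_mile_indicated:.0f} Wh/mi indicated")
```

Either way, that is well over twice the consumption of an efficient mainstream EV sedan, which is the sense in which the Hummer earns the headline.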

Elon Musk turns X’s block button into a “glorified mute button”

X, formerly Twitter, is now letting blocked users see posts made by the people who blocked them.

“We’re starting to launch the block function update,” X’s engineering team wrote yesterday. X previously said that after the change, “If your posts are set to public, accounts you have blocked will be able to view them, but they will not be able to engage (like, reply, repost, etc.).”

To justify the change, X said the block functionality could previously be “used by users to share and hide harmful or private information about those they’ve blocked.” The change will allow people who are blocked “to see if such behavior occurs… allowing for greater transparency,” X said.

X owner Elon Musk argued last year that “blocking public posts makes no sense. It needs to be deprecated in favor of a stronger form of mute.”

There were many angry responses to the change, both yesterday and previously, when X said it would be coming soon. While some users may only use blocking to avoid seeing accounts that are annoying, some X users said the policy could be harmful for people who use blocking as a safety measure.

The new policy could help stalkers and other bad actors, some said. Blocked accounts could view, screenshot, and share content posted by the person who blocked them, some people pointed out. The block button is now “a glorified mute button,” one user said.

Blocked users can view and search for posts

Before the change, X’s support page on blocking accounts said blocked accounts cannot “view your posts when logged in on X (unless they report you, and your posts mention them),” “find your posts in search when logged in on X,” or “view a Moment you’ve created when logged in on X.”

Here are 3 science-backed strategies to rein in election anxiety

In this scenario, I encourage my patients to move past that initial thought of how awful it will be and instead consider exactly how they will respond to the inauguration, the next day, week, month, and so on.

Cognitive flexibility allows you to explore how you will cope, even in the face of a negative outcome, helping you feel a bit less out of control. If you’re experiencing a lot of anxiety about the election, try thinking through what you’d do if the undesirable candidate takes office—thoughts like “I’ll donate to causes that are important to me” and “I’ll attend protests.”

Choose your actions with intention

Another tool for managing your anxiety is to consider whether your behaviors are affecting how you feel.

Remember, for instance, the goal of 24-hour news networks is to increase ratings. It’s in their interest to keep you riveted to your screens by making it seem like important announcements are imminent. As a result, it may feel difficult to disconnect and take part in your usual self-care behavior.

Try telling yourself, “If something happens, someone will text me,” and go for a walk or, better yet, to bed. Keeping up with healthy habits can help reduce your vulnerability to uncontrolled anxiety.

Post-Election Day, you may continue to feel drawn to the news and motivated to show up—whether that means donating, volunteering, or protesting—for a variety of causes you think will be affected by the election results. Many people describe feeling guilty if they say no or disengage, leading them to overcommit and wind up overwhelmed.

If this sounds like you, try reminding yourself that taking a break from politics to cook, engage with your family or friends, get some work done, or go to the gym does not mean you don’t care. In fact, keeping up with the activities that fuel you will give you the energy to contribute to important causes more meaningfully.

Shannon Sauer-Zavala, Associate Professor of Psychology & Licensed Clinical Psychologist, University of Kentucky. This article is republished from The Conversation under a Creative Commons license. Read the original article.

Here are 3 science-backed strategies to rein in election anxiety Read More »

dystopika-is-a-beautiful-cyberpunk-city-builder-without-the-ugly-details

Dystopika is a beautiful cyberpunk city builder without the ugly details

Some of my favorite games deny me the thing I think I want most. Elden Ring refuses to provide manual save files (and I paid for it). Balatro withholds the final math on each hand played (and its developer suggests avoiding calculators). And the modern X-COM games force me to realize just how much a 98 percent chance to hit is not the same as 100 percent.

Dystopika (Steam, Windows) is a city builder in maybe the strictest definition of that two-word descriptor, because it steadfastly refuses to distract you with non-building details. The game is described by its single developer, Matt Marshall, as having “No goals, no management, just creativity and dark cozy vibes.” Dystopika does very little to explain how you should play it, because there’s no optimal path for doing so. Your only job is to enjoy yourself, poking and prodding at a dark cyberpunk cityscape, making things that look interesting, pretty, grim, or however you like. It might seem restrictive, but it feels very freeing.

Dystopika launch video.

The game’s interface is a small rail on the left side of the screen. Select “Building” and a random shape attaches to your cursor. You can right-click to change it, but you can’t pick one. Place it, and then optionally place the cursor near its top to change its height. Making one building taller will raise smaller buildings nearby. Reaching certain heights, or densities, or something (it’s not explained) will “unlock” certain new buildings, landmarks, and decorations.

Hooray! I’ve unlocked the headquarters for a megacorp with a very ominous name! (Please appreciate my efforts at public transit.) Credit: Kevin Purdy

You do get to pick out “Props,” like roads and trams and giant billboards and hologram objects and flying carports, but the game is similarly non-committal on what you should do with them, or most anything. You put things down, or delete them, expand them, connect them, and try things out until you like how it looks.

Dystopika is a beautiful cyberpunk city builder without the ugly details Read More »

us-space-force-warns-of-“mind-boggling”-build-up-of-chinese-capabilities

US Space Force warns of “mind-boggling” build-up of Chinese capabilities

Both Russia and China have tested satellites with capabilities that include grappling hooks to pull other satellites out of orbit and “kinetic kill vehicles” that can target satellites and long-range ballistic missiles in space.

In May, a senior US defense department official told a House Armed Services Committee hearing that Russia was developing an “indiscriminate” nuclear weapon designed to be sent into space, while in September, China made a third secretive test of an unmanned space plane that could be used to disrupt satellites.

The US is far ahead of its European allies in developing military space capabilities, but it wanted to “lay the foundations” for the continent’s space forces, Saltzman said. Last year, UK Air Marshal Paul Godfrey was appointed to the US Space Force to oversee partnerships with NATO and other allies—one of the first times that a high-ranking allied pilot had joined the US military.

But Saltzman warned against a rush to build up space forces across the continent.

“It is resource-intensive to separate out and stand up a new service. Even … in America where we think we have more resources, we underestimated what it was going to take,” he said.

The US Space Force, which monitors more than 46,000 objects in orbit, has about 10,000 personnel but is the smallest branch of the US military. Its officers are known as “guardians.”

The costs of building up space defense capabilities mean the US is heavily reliant on private companies, raising concerns about the power of billionaires in a sector where regulation remains minimal.

SpaceX, led by prominent Trump backer Elon Musk, is increasingly working with US military and intelligence through its Starshield arm, which is developing low Earth orbit satellites that track missiles and support intelligence gathering.

This month, SpaceX was awarded a $734 million contract to provide space launch services for US defense and intelligence agencies.

Despite concerns about Musk’s erratic behavior and reports that the billionaire has had regular contact with Russian President Vladimir Putin, Saltzman said he had no concerns about US government collaboration with SpaceX.

“I’m very comfortable that they’ll execute those [contracts] exactly the way they’re designed. All of the dealings I’ve had with SpaceX have been very professional,” he said.

Additional reporting by Kathrin Hille in Taipei.

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

US Space Force warns of “mind-boggling” build-up of Chinese capabilities Read More »

thousands-of-hacked-tp-link-routers-used-in-years-long-account-takeover-attacks

Thousands of hacked TP-Link routers used in years-long account takeover attacks

Hackers working on behalf of the Chinese government are using a botnet of thousands of routers, cameras, and other Internet-connected devices to perform highly evasive password spray attacks against users of Microsoft’s Azure cloud service, the company warned Thursday.

The malicious network, made up almost entirely of TP-Link routers, was first documented in October 2023 by a researcher who named it Botnet-7777. The geographically dispersed collection of more than 16,000 compromised devices at its peak got its name because it exposes its malware on port 7777.

Account compromise at scale

In July and again in August of this year, security researchers from Sekoia and Team Cymru reported the botnet was still operational. All three reports said that Botnet-7777 was being used to skillfully perform password spraying, a form of attack that sends large numbers of login attempts from many different IP addresses. Because each individual device attempts only a few logins, the carefully coordinated account-takeover campaign is hard for the targeted service to detect.

On Thursday, Microsoft reported that CovertNetwork-1658—the name Microsoft uses to track the botnet—is being used by multiple Chinese threat actors in an attempt to compromise targeted Azure accounts. The company said the attacks are “highly evasive” because the botnet—now estimated at about 8,000 strong on average—takes pains to conceal the malicious activity.

“Any threat actor using the CovertNetwork-1658 infrastructure could conduct password spraying campaigns at a larger scale and greatly increase the likelihood of successful credential compromise and initial access to multiple organizations in a short amount of time,” Microsoft officials wrote. “This scale, combined with quick operational turnover of compromised credentials between CovertNetwork-1658 and Chinese threat actors, allows for the potential of account compromises across multiple sectors and geographic regions.”

Some of the characteristics that make detection difficult are:

  • The use of compromised SOHO IP addresses
  • The use of a rotating set of IP addresses at any given time. The threat actors had thousands of available IP addresses at their disposal. The average uptime for a CovertNetwork-1658 node is approximately 90 days.
  • The low-volume password spray process; for example, monitoring for multiple failed sign-in attempts from one IP address or to one account will not detect this activity.
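The bullet points above describe why per-IP rate limiting misses this kind of spray. As a rough illustration (the detector, threshold, and IP addresses here are hypothetical, not Microsoft's actual detection logic), a naive per-IP failed-login counter catches a brute-force attack from one address but sees nothing unusual when the same 500 attempts arrive one per compromised router:

```python
from collections import Counter

def flag_ips(failed_logins, per_ip_threshold=10):
    """Naive detector: flag any source IP exceeding a failed-login threshold."""
    counts = Counter(ip for ip, _ in failed_logins)
    return {ip for ip, n in counts.items() if n > per_ip_threshold}

# Brute force: 500 failed logins from a single IP.
brute = [("203.0.113.5", f"user{i}") for i in range(500)]

# Distributed spray: 500 failed logins, one each from 500 botnet IPs.
spray = [(f"198.51.{i // 250}.{i % 250}", "admin") for i in range(500)]

print(flag_ips(brute))  # the single noisy IP is caught
print(flag_ips(spray))  # empty set: no IP crosses the threshold
```

Grouping failures by target account instead of source IP would surface this particular spray (500 failures against "admin"), at the cost of false positives on frequently mistyped accounts.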

Thousands of hacked TP-Link routers used in years-long account takeover attacks Read More »

starlink-enters-national-radio-quiet-zone—but-reportedly-cut-off-access-for-some

Starlink enters National Radio Quiet Zone—but reportedly cut off access for some


Starlink offered to 99.5% of zone, but locals say Roam product was disabled.

Starlink satellite dish. Credit: Starlink

Starlink’s home Internet service has come to the National Radio Quiet Zone after a multi-year engineering project that had the goal of minimizing interference with radio telescopes. Starlink operator SpaceX began “a one-year assessment period to offer residential satellite Internet service to 99.5% of residents within the NRQZ starting October 25,” the National Radio Astronomy Observatory and Green Bank Observatory announced last week.

“The vast majority of people within the areas of Virginia and West Virginia collectively known as the National Radio Quiet Zone (NRQZ) can now receive high speed satellite Internet service,” the announcement said. “The newly available service is the result of a nearly three-year collaborative engineering effort between the US National Science Foundation (NSF), SpaceX, and the NSF National Radio Astronomy Observatory (NSF NRAO), which operates the NSF Green Bank Observatory (NSF GBO) in West Virginia within the NRQZ.”

There’s a controversy over the 0.5 percent of residents who aren’t included and are said to be newly blocked from using the Starlink Roam service. Starlink markets Roam as a service for people to use while traveling, not as a fixed home Internet service.

The Pendleton County Office of Emergency Management last week issued a press release saying that “customers with the RV/Roam packages had been using Starlink for approximately two years throughout 100% of the NRQZ. Now, the 0.5% have lost coverage after having it for two years. This means that a large section of southeastern Pendleton County and an even larger section of northern Pocahontas will NOT be able to utilize Starlink.”

PCMag wrote that “Starlink is now live in 42 of the 46 cell areas around the Green Bank Observatory’s telescopes.” Pendleton County Emergency Services Coordinator Rick Gillespie told Ars today that Roam coverage was cut off in the remaining four cell areas.

“After the agreement, we all lost effective use within the four cells,” Gillespie told Ars in an email. Gillespie’s press release said that, “in many cases, Starlink was the only Internet provider option residents and emergency responders had. This is unacceptable.”

“The dark ages of communications systems”

Gillespie was quoted as saying in a WBOY article that the restrictions are “keeping a portion of Pendleton and Pocahontas counties in the dark ages of communications systems.”

We contacted SpaceX and the National Radio Astronomy Observatory today about any limits imposed on Roam and will update this article if we get a response.

Residents of the 13,000-square-mile National Radio Quiet Zone have limited Internet access due to restrictions on radio transmissions first put in place in 1958. In addition to scientific research at Green Bank in Pocahontas County, the National Radio Quiet Zone includes a National Security Agency facility at Sugar Grove Station in Pendleton County.

SpaceX and the NRAO collaborated on testing over the past few years and presumably concluded that the service could only be provided without interference in 99.5 percent of the zone. Chris De Pree, the NRAO deputy spectrum manager, said in the organization’s announcement that “working closely with SpaceX over the past three years has enabled NRAO and SpaceX to better understand each other’s systems and how to actively coexist in this part of the spectrum.”

In that time, “scientists and engineers performed multiple tests and analyses to determine the best way to maximize satellite internet service without hindering the missions within the NRQZ,” the announcement said. During the one-year assessment period for Starlink’s home Internet service, “scientists and engineers will monitor for interference issues and work to resolve them without interrupting Internet service.”

Starlink steers beams away from telescopes

Starlink said in August that it worked with the NRAO “to enable Starlink satellites to avoid transmissions into the line-of-sight of radio telescopes, leveraging our advanced phased array antenna technology to dynamically steer beams away from telescopes.”

Starlink published a summary noting that “direct transmissions from satellites towards the eye of radio telescopes may pose a significant risk of interference to astronomical research.” The technique for steering beams away from telescopes is “made possible by a real-time data sharing framework between radio astronomy observatories and Starlink that provides the Starlink network with a telescope’s planned observation schedule, including the telescope’s pointing direction (aka ‘boresight’) and its observed frequency band. With this information, the Starlink network can ensure that satellites passing near the boresight of a telescope dynamically redirect their beams away from the telescope.”

The redirection happens “in milliseconds” and “protects the telescope’s observations while ensuring Starlink service remains uninterrupted for customers near the telescope.” Starlink is also using the technology with NRAO’s Very Large Array in New Mexico.
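The boresight-avoidance scheme described above can be sketched as a simple cone test. This is an illustrative simplification, not Starlink's published algorithm (the 10-degree exclusion half-angle and the vector model are assumptions): given a telescope's pointing direction from the shared observation schedule, a beam gets redirected if it falls within an exclusion cone around that boresight.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def must_redirect(beam_dir, boresight, exclusion_half_angle_deg=10.0):
    """True if a satellite beam falls inside the telescope's avoidance cone."""
    return angle_deg(beam_dir, boresight) < exclusion_half_angle_deg

boresight = (0.0, 0.0, 1.0)  # telescope pointing straight up

print(must_redirect((0.05, 0.0, 1.0), boresight))  # ~2.9 deg off axis: redirect
print(must_redirect((1.0, 0.0, 1.0), boresight))   # 45 deg off axis: no action
```

The real framework also accounts for the telescope's observed frequency band, and per Starlink's summary the redirection decision is made in milliseconds as satellites pass near the boresight.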

Counties want quiet-zone rules scrapped

The quiet-zone rules should be scrapped, a number of local officials say. The Pendleton County press release said that 10 West Virginia counties and one Virginia county “have formally expressed their need for change regarding the National Radio Quiet Zone (NRQZ) through Resolutions and Letters of Support.” These counties have a combined 262,296 residents, the press release said.

“We do not seek the closure of these federal entities but rather their commitment to identifying and funding viable solutions that would enable our communication systems to operate effectively, similar to those in the majority of America,” Gillespie said in the press release.

Gillespie told Ars that local communities are hampered by “archaic 1950’s regulations. We are being left behind when it comes to the modern advancements in public safety and personal communications.” He said that “absent some relief in a timely fashion, we will explore taking our plight to the FCC seeking waivers.”

The Pendleton County Commission resolution approved in September called for dissolution of the quiet zone or “total waivers of any NRQZ restrictions imposed on Public Safety Radio Frequency Bands currently in use, as well as all the commercial cellular/wireless Bands, and commercial satellite Internet providers, such as Starlink.”

The county resolution said the quiet zone is effectively “an ever-growing unfunded federal mandate on our county emergency services/911 operation wherein it causes us to spend large amounts of funding building a larger number of tower sites than would be needed absent the NRQZ restrictions.” The restrictions have greatly diminished access to the AT&T FirstNet public safety network and other networks used by first responders and residents, the resolution said.

The Pocahontas County Commission issued a resolution in September calling for total waivers of restrictions imposed on public safety spectrum, or federal funding to offset costs associated with developing public safety communications systems under “the unique burden of NRQZ regulations.”

Limited fiber and cellular access

Starlink service wouldn’t be as necessary for home Internet access if the area had universal access to fiber broadband. Recent government grants could help, as one funded project is designed to subsidize Spruce Knob Seneca Rocks Telephone’s installation of fiber lines in Pocahontas and Pendleton counties.

Ideally, residents would have access to both fiber home Internet and strong cellular networks. But the NRAO still warns that cellular signals could threaten its scientific research.

“Optical fiber as a broadband solution is far better than service from space or via wireless or cellular links, which are less reliable and have the potential to undo much of the coordination work that has happened in the National Radio Quiet Zone over many decades,” Sheldon Wasik, Zone Regulatory Services Coordinator for the National Radio Astronomy Observatory, said in March 2024.

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Starlink enters National Radio Quiet Zone—but reportedly cut off access for some Read More »