AI #47: Meet the New Year

It will be very different from the old year by the time we are done. For now, it seems like various continuations of the old one. Sometimes I look back on the week and wonder how so much happened, while in other senses very little happened.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. A high variance game of chess.

  4. Language Models Don’t Offer Mundane Utility. What even is productivity?

  5. GPT-4 Real This Time. GPT store, teams accounts, privacy issues, plagiarism.

  6. Liar Liar. If they work, why aren’t we using affect vectors for mundane utility?

  7. Fun With Image Generation. New techniques, also contempt.

  8. Magic: The Generating. Avoiding AI artwork proving beyond Hasbro’s powers.

  9. Copyright Confrontation. OpenAI responds, lawmakers are not buying their story.

  10. Deepfaketown and Botpocalypse Soon. Deepfakes going the other direction.

  11. They Took Our Jobs. Translators, voice actors, lawyers, games. Usual stuff.

  12. Get Involved. Misalignment museum.

  13. Introducing. Rabbit, but why?

  14. In Other AI News. Collaborations, safety work, MagicVideo 2.0.

  15. Quiet Speculations. AI partners, questions of progress rephrased.

  16. The Quest for Sane Regulation. It seems you can just lie to the House of Lords.

  17. The Week in Audio. Talks from the New Orleans safety conference.

  18. AI Impacts Survey. Some brief follow-up.

  19. Rhetorical Innovation. Distracting from other harms? Might not be a thing.

  20. Aligning a Human Level Intelligence is Still Difficult. The human alignment tax.

  21. Aligning a Smarter Than Human Intelligence is Difficult. Foil escape attempts?

  22. Won’t Get Fooled Again. Deceptive definitions of deceptive alignment.

  23. People Are Worried About AI Killing Everyone. The indifference of the universe.

  24. Other People Are Not As Worried About AI Killing Everyone. Sigh.

  25. The Wit and Wisdom of Sam Altman. Endorsement of The Dial of Progress.

  26. The Lighter Side. Batter up.

WordPress now has something called Jetpack AI, which is powered by GPT-3.5-Turbo. It is supposed to help you write in all the usual ways. You access it by creating an ‘AI Assistant’ block. The whole blocks concept rendered their editor essentially unusable, but one could paste in quickly to try this out.

Get to 1500 Elo in chess on 50 million parameters, correctly tracking board states in a recognizable way, versus 3.5-turbo’s 1800. It is a very strange 1500 Elo, one that is capable of substantial draws against Stockfish 9 (2700 Elo). A human at 1800 Elo is essentially never going to get a draw from Stockfish 9. It has flashes of brilliance, and also blunders rather badly.
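For intuition on why that draw rate is strange: under the standard Elo expected-score formula (a general chess-rating fact, not something from the post), an 1800 player is expected to score well under one percent against a 2700 engine, so even occasional draws are far above expectation:

```python
def expected_score(rating_a: int, rating_b: int) -> float:
    """Standard Elo expected score for player A against player B.

    A draw counts as 0.5 points, so an expected score near 0.005 means
    even one draw per hundred games would beat expectation.
    """
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# An 1800 human against a 2700 engine: roughly 0.0056 expected points per game.
print(f"{expected_score(1800, 2700):.4f}")  # ~0.0056
```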

I asked which games were used for training, and he said it didn’t much matter whether you used top level games, low level games or a mix; there seems to be some limit for this architecture and model size.

Use it in your AI & the Law course at SCU law, at your own risk.

Jess Miers: My AI & the Law Course at SCU Law is officially live and you can bet we’re EMBRACING AI tools under this roof!

Bot or human, it just better be right…

Tyler Cowen links to this review of Phind, finding it GPT-4 level and well designed. The place to add context is appreciated, as are various other options, but they don’t yet properly explain to the user how best to use all those options.

My experience with Phind for non-coding purposes is that it has been quite good at being a GPT-4-level quick, up-to-date tool for asking questions where Google was never great and is getting worse, and so far has been outperforming Perplexity.

Play various game theory exercises and act on the more cooperative or altruistic end of the human spectrum. Tyler Cowen asks, ‘are they better than us? Perhaps.’ I see that as a non-sequitur in this context. Also a misunderstanding of such games.

Get ChatGPT-V to identify celebrities by putting a cartoon character on their left.

The success rates on GPT-3.5 of 40 human persuasion techniques as jailbreaks.

Some noticeable patterns here. Impressive that plain queries are down to 0%.

Robin Hanson once again claims AI can’t boost productivity, because wages would have risen?

Robin Hanson: “large-scale controlled trial … at Boston Consulting Group … found consultants using …GPT-4 … 12.2% more tasks on average, completed tasks 25.1% more quickly, & produced 40% higher quality results than those without the tool. A new paper looking at legal work done by law students found the same results”

I’m quite skeptical of such results if we do not see the pay of such workers boosted by comparable amounts.

Eliezer Yudkowsky: Macroeconomics would be very different if wages immediately changed to their new equilibrium value! They can stay in disequilibrium for years at the least!

Robin Hanson: The wages of those who were hired on long ago might not change fast, but the wages of new hires can change very quickly to reflect changes in supply & demand.

I do not understand Robin’s critique. Suppose consultants suddenly get 25% more done at higher quality. Why should we expect generally higher consultant pay even at equilibrium? You can enter or exit the consultant market, often pretty easily, so in long run compensation should not change other than compositionally. In the short run, the increase in quality and decrease in cost should create a surplus of consultants until supply and demand can both adjust. If anything that should reduce overall pay. Those who pioneer the new tech should do better, if they can translate productivity to pay, but consultants charge by the hour and people won’t easily adjust willingness to pay based on ‘I use ChatGPT.’

Robin Hanson: If there’s an “excess supply” that suggests lower marginal productivity.

Well, yes, if we use Hanson’s definition of marginal productivity as the dollar value of the last provided hour of work. Before, 10 people did the work and each was worth $50/hour. If now 7 people can do that same work, and there’s no more work people want to hire someone for right now, then the ‘marginal productivity’ went down.
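As a toy version of that arithmetic (my illustrative numbers, chosen to match the 10-to-7 example; the ~40% multiplier loosely tracks the BCG gains quoted above):

```python
workers_before = 10
wage_before = 50  # dollars per hour, from the example above
productivity_multiplier = 1.4  # roughly a 40% gain in output per worker

# Holding total demand for the work fixed, the headcount needed shrinks:
workers_needed = workers_before / productivity_multiplier
print(round(workers_needed))  # 7 -- "now 7 people can do that same work"

# The displaced hours become excess supply until demand or headcount adjusts,
# which is the mechanism pushing the marginal hour's dollar value down.
```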

The GPT store is ready to launch, and indeed has gone live. So far GPTs have offered essentially no mundane utility. Perhaps this is because good creators were holding out for payment?

GPT personalization across chats has arrived, at least for some people.

GPT Teams is now available at $25/person/month, with some extra features.

Wait, some say. No training on your data? What does that say about the Plus tier?

Andrew Morgan: ⁦@OpenAI⁩ Just want some clarity. Does this mean in order to have access to data privacy I have to pay extra? Also, does this mean I never had it before? 👀🥲

Delip Rao: 😬 I had no idea my current plus data was used in training. I thought OpenAI was not training on user inputs? Can somebody from @openai clarify

Rajjhans Samdani: Furthermore they deliberately degrade the “non-training” experience. I don’t see why do I have to lose access to chat history if I don’t want to be included in training.

Karma: Only from their API by default. You could turn it off from the settings in ChatGPT but then your conversations wouldn’t have any history.

Yes. They have been clear on this. The ‘opt out of training’ button has been very clear.

You can use the API if you value your privacy so much. If you want a consumer UI, OpenAI says, you don’t deserve privacy from training, although they do promise an actual human won’t look.

I mean, fair play. If people don’t value it, and OpenAI values the data, that’s Coase.

Will I upgrade my family to ‘Teams’ for this? I don’t know. It’s a cheap upgrade, but also I have never hit any usage limits.

What happened with the GPT wrapper companies?

Jeff Morris Jr.: “Most of my friends who started GPT wrapper startups in 2023 returned capital & shut down their companies.” Quote from talking with a New York founder yesterday. The OpenAI App Store announcement changed everything for his friends – most are now looking for jobs.

Folding entirely? Looking for a job? No, no, don’t give up, if you are on the application side it is time to build.

No one is using custom GPTs. It seems highly unlikely this will much change. Good wrappers can do a lot more, there are a lot more degrees of freedom and other tools one can integrate, and there are many other things to build. Yes, you are going to constantly have Google and Microsoft and OpenAI and such trying to eat your lunch, but that is always the situation.

Someone made a GPT for understanding ‘the Ackman affair.’

The AI is going to check everyone’s writing for plagiarism.

Bill Ackman: Now that we know that the academic body of work of every faculty member at every college and university in the country (and eventually the world) is going to be reviewed for plagiarism, it’s important to ask what the implications are going to be.

If every faculty member is held to the current plagiarism standards of their own institutions, and universities enforce their own rules, they would likely have to terminate the substantial majority of their faculty members.

I say percentage of pages rather than number of instances, as the plagiarism of today can be best understood by comparison to spelling mistakes prior to the advent of spellcheck.

For example, it wouldn’t be fair to say that two papers are both riddled with spelling mistakes if each has 10 mistakes, when one paper has 30 pages and the other has 330. The standard has to be a percentage standard.

The good news is that no paper written by a faculty member after the events of this past week will be published without a careful AI review for plagiarism, that, in light of recent events, has become a certainty.

This is not the place to go too deep into many of the details surrounding the whole case, which I may or may not do at another time.

Instead here I want to briefly discuss the question of general policy. What to do if the AI is pointing out that most academics at least technically committed plagiarism?

Ackman points out that there is a mile of difference between ‘technically breaking the citation rules’ on the level of a spelling error (which presumably almost everyone does, myself included), the lifting of phrases, and outright theft of central ideas or entire paragraphs and posts. There’s plagiarism and then there’s Plagiarism. There’s also a single instance of a seemingly harmless mistake versus a pattern of doing it over and over again in half your papers.

For spelling-error style mistakes, we presumably need mass forgiveness. As long as your rate of doing it is not massively above normal, we accept that mistakes were made. Ideally we’d fix it all, especially for anything getting a lot of citations, in the electronic record. We have the technology. Bygones.

For the real stuff, the violations that are why the rules exist, the actual theft of someone’s work, what then? That depends on how often this is happening.

If this is indeed common, we will flat out need Truth and Reconciliation. We will need to essentially say that, at least below some high threshold that most don’t pass, everyone says what they did, with the AI to help them find and remember it, and then we hit the reset button. Don’t do it again.

Truth, in various forms, is coming for quite a lot of people once AI can check the data.

What can withstand that? What cannot? We will find out.

A lot of recent history has been us discovering that something terrible, that was always common, was far worse and more common than we had put into common knowledge, and also deciding that it was wrong, and that it can no longer be tolerated. Which by default is a good thing to discover and a good thing to figure out. The problem is that our survival, in many forms, has long depended on many things we find despicable, starting with Orwell’s men with guns that allow us to sleep soundly in our beds and going from there.

Scott Alexander writes The Road to Honest AI, exploring work by Hendrycks about using various additional vectors to add or subtract things like honesty, fairness, fear, happiness, or megalomania.

I’ve covered this before. It is exciting work. I do not have hope that it can be used in the obvious fashion, where you tell the AIs to always be honest so your AIs are always honest and you never have to worry about them. That is the sort of thing I expect to fail at exactly the wrong time, to get selected against whenever it is used, and to engender a false sense of security even if it does work. It is very possible to be fully honest while using your superior intellect to outsmart everyone.

And as I discovered in Diplomacy, it being common knowledge you cannot lie is not obviously a disadvantage. Who wouldn’t want to put an AI that couldn’t lie in a position of trust and power? The advantage is even bigger, of course, if it turns out you can lie after all.

Scott’s coverage drove home to me that we are sleeping on this research.

Not in the sense that this will stop us all from dying or let us fully understand AIs. It might help, but we are a long way off, and that all takes time.

I’m thinking in the Mundane Utility section. This seems highly useful, for making best use of current AIs, today.

Eliezer Yudkowsky proposes telling Dalle-3 to caption with words whose letters are separated by spaces, then telling Dalle-3 to remove the spaces.

ChatGPT: Certainly! Here is the exact prompt that I sent to DallE-3 for the image: “An image of Seattle on a relentlessly dreary, drizzly day. The city skyline, including the Space Needle, is partially obscured by low-hanging gray clouds and mist. Raindrops are visible in the air, and the streets are wet and glistening. People walk with umbrellas. In the foreground, a caption in cheerful advertisement typography reads: “S E A T T L E : R E L E N T L E S S L Y D R E A R Y !” but without the spaces between the letters, creating the correct caption “SEATTLE: RELENTLESSLY DREARY!”. The image should capture the humorous contrast between the dreary weather and the upbeat font of the caption.”
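The letter-spacing construction in that prompt is mechanical, so (as an illustrative helper, not anything from the post) it can be generated programmatically before being pasted into the image prompt:

```python
def spaced_caption(text: str) -> str:
    # Space out each word's characters so the image model renders them
    # letter by letter; word boundaries remain single spaces.
    return " ".join(" ".join(word) for word in text.split())

print(spaced_caption("SEATTLE: RELENTLESSLY DREARY!"))
# S E A T T L E : R E L E N T L E S S L Y D R E A R Y !
```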

Some good results, relative to general expectations:

He reports he got lucky initially. As is usually the case for a new seemingly successful technique, everything is finicky and by default nothing ever properly replicates, but it seems worth exploring more.

AI video generation about to disrupt Hollywood?

Cate Hall: Just so long as your artistic style is limited to slow-mo pans of largely frozen people & landscapes

MidJourney used Magic: The Gathering artwork and cards in its training set. I know, you are shocked, shocked to find data sets in this establishment.

It impresses me the contempt a large percentage of the world has for AI image generation.

Kenneth Shepard (Kotaku): AI-generated art is just pulling from existing work people have created and throwing elements together to create what the algorithm thinks you want.

Amazing that people keep telling that story even now. I suppose they will keep telling it right up until the end.

But it’s not often you hear specifics of where an AI program is scraping from. Well, the CEO behind AI art-generating program MidJourney allegedly has been training the algorithm on work by Magic: The Gathering artists the entire time.

I suppose it is interesting that they used Magic cards extensively in their early days as opposed to other sources. It makes sense that they would be a good data source, if you assume they didn’t worry about copyright at all.

MidJourney has been exceptionally clear that it is going to train on copyrighted material. All of it. Assume that they are training on everything you have ever seen. Stop being surprised, stop saying you ‘caught’ them ‘red handed.’ We can be done with threads like this talking about all the different things MidJourney trained with.

Similarly, yes, MidJourney can and will mimic popular movies and their popular shots if you ask it to, one funny example prompt here is literally ‘popular movie screencap —ar 11:1 —v 6.0’ so I actually know exactly what you were expecting, I mean come on. Yes, they’ll do any character or person popular enough to be in the training set, in their natural habitats, and yes they will know what you actually probably meant, stop acting all innocent about it, and also seriously I don’t see why this matters.

They had a handy incomplete list of things that got copied, with some surprises.

I’m happy that Ex Machina and Live Die Repeat made the cut. That implies that at least some of a reasonably long tail is covered pretty well. Can we get a list of who is and isn’t available?

If you think that’s not legal and you want to sue, then by all means, sue.

I’d also say that if your reporter was ‘banned multiple times’ for their research, then perhaps the first one is likely a fair complaint and the others are you defying their ban?

What about the thing where you say ‘animated toys’ and you get Toy Story characters, and if you didn’t realize this and used images of Woody and Buzz you might yourself get into copyright trouble? It is possible, but seems highly unlikely. The whole idea is that you get that answer from ‘animated toys’ because most people know Toy Story, especially if they are thinking about animated toys. If your company deploys such a generation at sufficient scale to get Disney involved and no one realized, I mean sorry but that’s on you.

The actual fight this week over Magic and AI art is the accusation that Wizards is using AI art in its promotional material. They initially denied this.

No one believed them. People were convinced they were wrong or lying, and that it’s AI:

Reid Southern: Doesn’t look good, but we’ll see if they walk that statement back. It’s possible they didn’t know and were themselves deceived, it’s happened before.

Silitha: This is the third or fourth time for WOTC(ie Hasbro) They had one or two other from MTG and then a few from DnD. It’s just getting so frequent it is coming of as ‘testing the waters’ or ‘desensitizing’. Hasbro has shown some crappy business practices

The post with that picture has now been deleted.

Dave Rapoza: And just like that, poof, I’m done working for wizards of the coast – you can’t say you stand against this then blatantly use AI to promote your products, emails sent, good bye you all!

If you’re gonna stand for something you better make sure you’re actually paying attention, don’t be lazy, don’t lie.

Don’t be hard on other artists if they don’t quit – I can and can afford to because I work for many other game studios and whatnot – some people only have wotc and cannot afford to quit having families and others to take care of – don’t follow my lead if you can’t, no pressure

I like the comments asking why I didn’t quit from Pinkertons, layoffs, etc

– I’ll leave you with these peoples favorite quote

– “ The best time to plant a tree was 25 years ago. The second-best time to plant a tree is 25 years ago.”

Also, to be clear, I’m quitting because they took a moral stand against AI art like a week ago and then did this, if they said they were going to use AI that’s a different story, but they want to grand stand like heroes and also pull this, that’s goofball shit I won’t support.

They claim they will have nothing to do with AI art in any way for any reason. Yet this is far from the first such incident where something either slipped through or was willfully disregarded.

Then Wizards finally admitted that everyone was right.

Wizards of the Coast: Well, we made a mistake earlier when we said that a marketing image we posted was not created using AI.

As you, our diligent community pointed out, it looks like some AI components that are now popping up in industry standard tools like Photoshop crept into our marketing creative, even if a human did the work to create the overall image.

While the art came from a vendor, it’s on us to make sure that we are living up to our promise to support the amazing human ingenuity that makes Magic great.

We already made clear that we require artists, writers, and creatives contributing to the Magic TCG to refrain from using AI generative tools to create final Magic products.

Now we’re evaluating how we work with vendors on creative beyond our products – like these marketing images – to make sure that we are living up to those values.

I actually sympathize. I worked at Wizards briefly in R&D. You don’t have to do that to know everyone is overworked and overburdened and underpaid.

Yes, you think it is so obvious that something was AI artwork, or was created with the aid of AI. In hindsight, you are clearly right. And for now, yes, they probably should have spotted it in this case.

But Wizards does thousands of pieces of artwork each year, maybe tens of thousands. If those tasked with doing the art try to take shortcuts, there are going to be cases where it isn’t spotted, and things are only going to get trickier. The temptation is going to be greater.

One reason this was harder to catch is that this was not a pure MidJourney-style AI generation. This was, it seems, a human using AI features inside tools like Photoshop to assist with some tasks. If you edit a human-generated image using AI tools, a lot of the detection techniques are going to miss it, until someone sees a telltale sign. Mistakes are going to happen.

We are past the point where, for many purposes, AI art would outcompete human art, or at least where a human would sometimes want to use AI for part of their toolbox, if the gamers were down with AI artwork.

Even for those who stick with human artists, who fully compensate them, we are going to face issues of what tools are and aren’t acceptable. Surely at a minimum humans will be using AI to try out ideas and see concepts or variants. Remember when artwork done on a computer was not real art? Times change.

The good news for artists is that the gamers very much are not down for AI artwork. The bad news is that this only gets harder over time.

OpenAI responds to the NYT lawsuit. The first three claims are standard, the fourth was new to me, that they were negotiating over price right before the lawsuit was filed, and essentially claiming they were stabbed in the back:

Our discussions with The New York Times had appeared to be progressing constructively through our last communication on December 19. The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting. We had explained to The New York Times that, like any single source, their content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training. Their lawsuit on December 27—which we learned about by reading The New York Times—came as a surprise and disappointment to us.

Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We’ve demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature immediately after we learned it could reproduce real-time content in unintended ways.

Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.

I presume they are telling the truth about the negotiations. NYT would obviously prefer to get paid and gain a partner, if the price is right. I guess the price was wrong.

Ben Thompson weighs in on the NYT lawsuit. He thinks training is clearly fair use and this is obvious under current law. I think he is wrong about the obviousness and the court could go either way. He thinks the identical outputs are the real issue, notes that OpenAI tries to avoid such duplication in contrast to Napster embracing it, and sees the ultimate question as whether there is market impact on NYT here. He is impressed by NYT’s attempted framing, but is very clear who he thinks should win.

Arnold Kling asks what should determine the outcome of the lawsuit by asking why the laws exist. Which comes down to whether or not any of this is interfering with NYT’s ability to get paid for its work. In practice, his answer is no at current margins. My answer is also mostly no at current margins.

Lawmakers seem relatively united that OpenAI should pay for the data it uses. What are they going to do about it? So far, nothing. They are not big on passing laws.

Nonfiction book authors sue OpenAI in a would-be class action. A bunch of the top fiction authors are already suing from last year. And yep, let’s have it out. The facts here are mostly not in dispute.

I thought it would be one way, sometimes it’s the other way?

Aella: oh god just got my first report of someone using one of my very popular nude photos, but swapped out my face for their face, presumably using AI get me off this ride.

idk why i didn’t predict ppl would start doing this with my images. this is kinda offensive ngl. ppl steal my photos all the time pretending that it’s me, but my face felt like a signature? now ppl stealing my real body without my identity attached feels weirdly dehumanizing.

Razib Khan: Hey it was an homage ok? I think I pulled it off…

Aella: lmfao actually now that i think of it why haven’t the weirdo people who follow both of us started photoshopping your face onto my nudes yet.

It’s not great. The problem has been around for a while thanks to photoshop; AI decreases the difficulty while making it harder to detect. I figured we’d have ‘generate nude of person X’ if anything more than we do right now, but I didn’t think X would be the person generating all that often, nor did I think the issue would be ‘using the picture of Y as a template.’ But yeah, I suppose this will also happen, you sickos.

Ethan Mollick shows a rather convincing deepfake of him talking, based on only 30 seconds of webcam and 30 seconds of voice.

We are still at the point where there are videos and audio recordings that I would be confident are real, but a generic ‘person talking’ clip could easily be fake.

First they came for the translators, which they totally did do, an ongoing series.

Reid Southern: Duolingo laid off a huge percentage of their contract translators, and the remaining ones are simply reviewing AI translations to make sure they’re ‘acceptable’. This is the world we’re creating. Removing the humanity from how we learn to connect with humanity.

Well, that didn’t take long. Every time you talk about layoffs related to AI, someone shows up to excitedly explain to you why it’s a net gain for humanity. That is until it’s their job of course.

Ryan: Don’t feel the same way here as I do about artists and copyrighted creative work. Translation is machine work and AI is just a better machine. Translation in most instances isn’t human expression. It’s just a pattern that’s the same every time. This is where AI should be used.

Reid Southern: Demonstrably false. There is so much nuance to language and dialects, especially as they evolve, it would blow your mind.

Hampus Flink: As a translator, I can assure you that: 1. This doesn’t make the job of the remaining workers any easier and 2. It doesn’t make the quality of the translations any better I’ve seen this happen a hundred times and it was my first thought when image generators came around.

Daniel Eth: Okay, a few thoughts on this:

1) it’s good if people have access to AI translators and AI language tutors

2) I’m in favor of people being provided safety nets, especially in cases where this does not involve (much) perverse incentives – eg, severance packages and/or unemployment insurance for those automated out of work

3) we shouldn’t let the perfect be the enemy of the good, and stopping progress in the name of equity is generally bad imho, so automation at duolingo is probably net good, even though (on outside view) I doubt (2) was handled particularly well here

4) if AI does lead to wide-scale unemployment, we’ll have to rethink our whole economic system to be much more redistributive – possibilities include a luxurious UBI and Fully Automated Luxury Gay Space Communism; we have time to have this conversation, but we shouldn’t wait for wide-scale automation to have it

5) this case doesn’t have much at all to do with AI X-risk, which is a much bigger problem and more pressing than any of this

If we do get to the point of widespread technological unemployment, we are not likely to handle it well, but it will be a bit before that happens. If it is not a bit before that happens, we will very quickly have much bigger problems than unemployment rates.

On the particular issue of translation, what will happen, aside from lost jobs?

The price of ‘low-quality’ translation will drop to almost zero. The price of high-quality translation will also fall, but by far less.

This means two things.

First, there will be a massive, massive win from real-time translation, from automatic translation, and from much cheaper human-checked translation, as many more things are available to more people in more ways in more languages, including the ability to learn, or to seek further skill or understanding. This is a huge gain.

Second, there will be substitution of low-quality work for high-quality work. In many cases this will be very good, the market will be making the right decision.

In other cases, however, it will be a shame. It is plausible that Duolingo will be one of those situations, where the cost savings is not worth the drop in quality. I can unfortunately see our system getting the wrong answer here.

The good news is that translation is going to keep improving. Right now is the valley of bad translation, in the sense that translations are good enough to get used but miss a lot of subtle stuff. Over time, they’ll get that other stuff more and more, and also we will learn to combine them with humans more effectively when we want a very high quality translation.

If you are an expert translator, one of the best, I expect you to be all right for a while. There will still be demand, especially if you learn to work with the AI. If you are an average translator, then yes, things are bad and they are going to get worse, and you need to find another line of work while you can.

They are also coming for the voice actors. SAG-AFTRA made a deal to let actors license their voices through Replica for use in games, leaving many of the actual voice actors in games rather unhappy. SAG-AFTRA was presumably thinking this deal means actors retain control of their voices and work product. The actual game voice actors were not consulted and do not see the necessary protections in place.

All the technical protections being discussed, as far as I can tell, do not much matter. What matters is whether you open the door at all. Once you normalize using AI-generated voice, and the time cost of production drops dramatically for lower-quality performance, you are going to see a fast race to the bottom on the cost of that, and its quality will improve over time. So the basic question is what floor has been placed on compensation. Of course, if SAG-AFTRA did not make such a deal, then there are plenty of non-union people happy to license their voices on the cheap.

So I don’t see how the voice actors ever win this fight. The only ways I can see voice actors being retained are if the technology doesn’t get there, as it certainly is not yet there for top quality productions; if consumers take a strong enough stand and boycott anyone using AI voices, which would also have power; or if government intervention protects such jobs by banning use of AI voice synthesis, which to be clear I do not support. I don’t see how any contract saves you for long.

Many lawyers are not so excited about being more productive.

Scott Lincicome: This was one of my least favorite things about big law: the system punished productivity and prioritized billing hours over winning cases. Terrible incentives!

The Information: Both firms face the challenge of overcoming resistance from lawyers to time-saving technology. Law firms generate revenue by selling time that their lawyers spend advising and helping clients. “If you made me 8 to 10 times faster, I’d be very unhappy as a lawyer,” Robinson explained, because his compensation would be tied to the amount of hours he put in.

As we’ve discussed before, a private speedup is good, you can compete better or at least slack off. If everyone gets it, that’s potentially a problem, with way too much supply for the demand, crashing the price, again unless this generates a lot more work, which it might. I know that I consult and use lawyers a lot less than I would if they were cheaper or more productive, or if I had an unlimited budget.

What you gonna do when they come for you?

Rohit: I feel really bad for people losing their jobs because of AI but I don’t see how claiming ever narrower domains of human jobs are the height of human spirit is helpful in understanding or addressing this.

Technological loss of particular jobs is, as many point out, nothing new. What is happening now to translators has happened before, and would even without AI doubtless happen again. John Henry was high on human spirit but the human spirit after him was fine. The question is what happens when the remaining jobs are meaningfully ‘ever narrowing’ faster than we open up new ones. That day likely is coming. Then what?

We don’t have a good answer.

Valve previously banned any use of AI in games on the Steam platform, to the extent of permanently banning games even for inadvertent inclusion of some placeholder AI work during an alpha. They have now reversed course, saying they now understand the situation better.

The new rule is that if you use AI, you have to disclose that you did that.

For pre-generated AI content, the content is subject to the same rules as any other content. For live-generated AI content, you have to explain how you’re ensuring you won’t generate illegal content, and there will be methods to report violations.

Adult only content won’t be allowed AI for now, which makes sense. That is not something Valve needs the trouble of dealing with.

I applaud Valve for waiting until they understood the risks, costs and benefits of allowing new technology, then making an informed decision that looks right to me.

I have a prediction market up on whether any of 2024’s top 10 will include such a disclosure.

Misalignment museum in San Francisco looking to hire someone to maintain opening hours.

Microsoft adding an AI key to Windows keyboards, officially called the Copilot key.

Rabbit, your ‘pocket companion’ with an LLM as the operating system for $199. I predict it will be a bust and people won’t like it. Quality will improve, but it is too early, and this does not look good enough yet.

Daniel: Can somebody tell me what the hell this thing does?

I’m sure it’s great! But the marketing materials are terrible. What the heck

Yishan: Seriously. I don’t understand either. Feels a bit like Emperor Has No Clothes moment here? I’ve tried watching the videos, the site but… still?

Those attempting to answer Daniel are very not convincing.

Demis Hassabis announces Isomorphic Labs collaboration with Eli Lilly and Novartis for up to $3 billion to accelerate drug development.

Google talks about ‘Responsible AI’ with respect to the user experience. This seems to be a combination of mostly ‘create a good user experience’ and some concern over the experience of minority groups for which the training set doesn’t line up as well. There’s nothing wrong with any of that, but it has nothing to do with whether what you are doing is responsible. I am worried others do not realize this?

ByteDance announces MagicVideo V2 (Github), claims this is the new SotA as judged by humans. This does not appear to be a substantive advance even if that is true. It is not a great sign if ByteDance can be at SotA here, even when the particular art and its state is not yet so worthwhile.

OpenAI offers publishers ‘as little as between $1 million and $5 million a year’ for permission to license their news articles for training LLMs, as per The Information. Apple, they say, is offering more money but also wants the right to use the content more widely.

People are acting like this is a pittance. That depends on the publisher. If the New York Times was given $1 million a year, that seems like not a lot, but there are a lot of publishers out there. A million here, a million there, pretty soon you’re talking real money. Why should OpenAI’s payments, specifically, and for training purposes only without right of reprinting, have a substantial bottom line impact?

Japan to launch ‘AI safety institute’ in January.

The guidelines would call for adherence to all rules and regulations concerning AI. They warn against the development, provision, and use of AI technologies with the aim of unlawfully manipulating the human decision-making process or emotions.

Yes, yes, people should obey the law and adhere to all rules and regulations.

It seems Public Citizen is complaining to California that OpenAI is not a nonprofit, and that it should have to divest its assets. Which would of course then presumably be worthless, given that OpenAI is nothing without its people. I very much doubt this is a thing as a matter of law, and also even if technically it should happen, no good would come of breaking up this structure, and hopefully everyone can realize that. There is a tiny market saying this kind of thing might actually happen in some way, 32% by end of 2025? I bought it down to 21%, which still makes me a coward but this is two years out.

Open questions in AI forecasting, a list (direct). Very hard to pin a lot of it down. Dwarkesh Patel in particular is curious about transfer learning.

MIRI offers its 2024 Mission and Strategy Update. Research continues, but the focus is now on influencing policy. They see signs of good progress there, and also see policy as necessary if we are to have the time to allow research to bear fruit on the technical issues we must solve.

What happens with AI partners?

Richard Ngo: All the discussion I’ve seen of AI partners assumes they’ll substitute for human partners. But technology often creates new types of abundance. So I expect people will often have both AI and human romantic partners, with the AI partner carefully designed to minimize jealousy.

Jeffrey Ladish: Carefully designed to minimize jealousy seems like it requires a lot more incentive alignment between companies and users than I expect in practice. Like, you need your users to buy the product, which suggests some level of needing to deal with jealousy, but only some.

Geoffrey Miller: The 2-5% of people who have some experience of polyamorous or open relationships might be able to handle AI ‘secondary partners’. But the vast majority of people are monogamous in orientation, and AI partners would be catastrophic to their relationships.

Some mix of outcomes seems inevitable. The question is what dominates. The baseline use case does seem like substitution to me, especially while a human cannot be found or convinced, or when someone lacks the motivation. And that can easily cause ongoing lack of sufficient motivation, which can snowball. We should worry about that. There is also, as I’ve noted, the ability of the AI to provide good practice or training, or even support and advice and a push to go out there, and also can make people perhaps better realize what they are missing. It is hard to tell.

The new question here is within an existing relationship, what dominates outcomes there? The default is unlikely, I would think, to involve careful jealousy minimization. That is not how capitalism works.

Until there is demand, then suddenly it might. If there becomes a clear norm of something like ‘you can use SupportiveCompanion.ai and everyone knows that is fine, if they’re super paranoid you use PlatonicFriend.ai, if your partner is down you can go with something less safety-pilled that is also more fun, if you know what you’re doing there’s always VirtualBDSM.ai but clear that with your partner and stay away from certain sections’ or what not, then that seems like it could go well.

Ethan Mollick writes about 2024 expectations in Signs and Portents. He focuses on practical application of existing AI tech. He does not expect the tech to stand still, but correctly notes that adaptation of GPT-4 and ChatGPT alone, in their current form, will already be a major productivity boost to a wide range of knowledge work and education, and threatening our ability to discern truth and keep things secure. He uses the word transformational, which I’d prefer to reserve for the bigger future changes but isn’t exactly wrong.

Cate Hall asks, what are assumptions people unquestionably make in existential risk discussions that you think lack adequate justification? Many good answers. My number one pick is this:

Daniel Eth: That if we solve intent alignment and avoid intentional existential misuse, we win – vs I think there’s a good chance that intent alignment + ~Moloch leads to existential catastrophe by default

It is a good question if you’ve never thought about it, but I’d have thought Paul Graham had found the answer already? Doesn’t he talk to Cowen and Thiel?

Paul Graham: Are we living in a time of technical stagnation, or is AI developing so rapidly that it could be an existential threat? Can’t be both.

FWIW I incline to the latter view. I’ve always been skeptical of the stagnation thesis.

Jason Crawford: It could be that computing technology is racing ahead while other fields (manufacturing, construction, transportation, energy) are stagnating. Or that we have slowed down over the last 50 years but are about to turn the corner.

I buy the central great stagnation argument. We used to do the stuff and build the things. Then we started telling people more and more what stuff they couldn’t do and what things they couldn’t build. Around 1973 this hit critical and we hit a great stagnation where things mostly did not advance or change much for a half century.

These rules mostly did not apply to the world of bits including computer hardware, so people (like Paul Graham) were able to build lots of cool new digital things, that technology grew on an exponential and changed the world. Now AI is on a new exponential, and poised to do the same, and also poses a potential existential threat. But because of how exponentials work, it hasn’t transformed growth rates much yet.

Indeed, Graham should be very familiar with this. Think of every start-up during its growth phase. Is it going to change the world, or is it having almost no impact on the world? Is it potentially huge or still tiny? Obviously both.

Meanwhile, of course, once Graham pointed to ‘the debate’ explicitly in a reply, out came the standard reminders that most technology is good and moving too slowly, while a few technologies are less good and may be moving too fast.

Adam MacBeth: False dichotomy.

Paul Graham: It’s a perfect one. One side says technology is progressing too slowly, the other that it’s progressing too fast.

Eliezer Yudkowsky (replying to Graham): One may coherently hold that every form of technology is progressing too slowly except for gain-of-function research on pathogens and Artificial Intelligence, which are progressing too fast. The pretense by e/acc that their opponents must also oppose nuclear power is just false.

This can also be true, not just in terms of how fast these techs go relative to what we’d want, but also in terms of how it’s weirdly more legal to make an enhanced virus than build a house, or how city-making tech has vastly less VC interest than AI.

Ronny Fernandez: Whoa whoa, I say that this one specific very unusual tech, you know, the one where you summon minds you don’t really understand with the aim of making one smarter than you, is progressing too quickly, the other techs, like buildings and nootropics are progressing too slowly.

You know you’re allowed to have more than one parameter to express your preferences over how tech progresses. You can be more specific than tech too fast or tech too slow.

To be fair, let’s try this again. There are three (or four?!) essential positions.

  1. Technology and progress are good and should move faster, including AI.

  2. Technology and progress are good and should move faster, excluding AI.

  3. Technology and progress are not good and should move slower everywhere.

  4. (Talk about things in general but in practice care only about regulations on AIs.)

It is true that the third group importantly exists, and indeed has done great damage to our society.

The members of groups #1 and #4 then claim that in practice (or in some cases, they claim, in theory as well) only groups #1 and #3 exist, that this is the debate, and that everyone saying over and over they are in #2 (such as myself) must be in #3, while also not noticing that many #1s are instead in #4.

(For the obvious example on people being #4, here is Eric Schmidt pointing out that Beff Jezos seems never to advocate for ‘acceleration’ of things like housing starts.)

So it’s fine to say that people in #3 exist, they definitely exist. And in the context of tech in general it is fine to describe this as ‘a side.’

But when clearly in the context of AI, and especially in response to a statement similar to ‘false dichotomy,’ this is misleading, and usually disingenuous. It is effectively an attempt to impose a false dichotomy, then claim others must take one side of it, and deny the existence of those who notice what you did there.

Some very bad predicting:

Timothy Lee: I do not think AI CEOs will ever be better than human CEOs. A human CEO can always ask AI for advice, whereas AI will never be able to shake the hand of a major investor or customer.

Steven Byrnes: “I do not think adult CEOs will ever be better than 7-year-old CEOs. A 7-year-old CEO can always ask an adult for advice, whereas an adult CEO will never be able to win over investors & customers with those adorable little dimples 😊”

Timothy Lee: Seven year olds are…not grownups? This seems relevant.

Steven Byrnes: A 7yo would be a much much worse CEO than me, and I would be a much much worse CEO than Jeff Bezos. And by the same token, I am suggesting that there will someday [not yet! maybe not for decades!] be an AI such that Jeff Bezos is a much much worse CEO than that AI is.

[explanation continues]

I like this partly for the twist of claiming a parallel of a 7-year-old now, rather than saying the obvious ‘the 7-year-old will grow up and become stronger and then be better, and the AI will also become stronger over time and learn to do things it can’t currently do’ parallel.

Note that the Manifold market is skeptical on timing, it says only a 58% chance of a Fortune 500 company having an AI CEO but not a human CEO by 2040.

Wired, which has often been oddly AI skeptical, says ‘Get Ready for the Great AI Disappointment,’ claiming that ‘in the decades to come’ AI will mostly generate lousy outputs that destroy jobs while lowering the quality of output. That seems clearly false to me, even if the underlying technologies fail to further advance.

Did you know you can just brazenly and shamelessly lie to the House of Lords?

A16z knows. So they did.

A16z’s written testimony: “Although advocates for AI safety guidelines often allude to the “black box” nature of AI models, where the logic behind their conclusions is not transparent, recent advancements in the AI sector have resolved this issue, thereby ensuring the integrity of open-source code models.”

This is lying. This is fraud. Period.

Have there been some recent advances in interpretability, such that we now have more optimism that we will be able to understand models more in the future than we expected a few months ago? Sure. It was a good year for incremental progress there.

‘Resolved this issue?’ Integrity is ‘secured’? The ‘logic of their conclusions is transparent’? This is flat out false. Nay, it is absurd. They know it. They know we know they know it. It is common knowledge that this is a lie. They don’t care.

I want someone thrown in prison for this.

From now on, remember this incident. This is who they are.

Perhaps even more egregiously, the USA is apparently asking that including corporations in AI treaty obligations be optional and left to each country to decide? What? There is no point in a treaty that doesn’t apply to all corporations.

New report examining the feasibility of security features on AI chips, and what role they could play in ensuring effective control over large quantities of compute. Here is a Twitter thread with the main findings. On-chip governance seems highly viable, and is not getting enough attention as an option.

China releases the 1st ‘CCP-approved’ data set. It is 20 GBs, 100 million data points, so not large enough by a long shot. A start, but a start that could be seen as net negative for now, as Helen notes. If you have one approved data set you are blameworthy for using a different unapproved one.

Financial Times reports on some secret diplomacy.

OpenAI, Anthropic and Cohere have engaged in secret diplomacy with Chinese AI experts, amid shared concern about how the powerful technology may spread misinformation and threaten social cohesion. 

According to multiple people with direct knowledge, two meetings took place in Geneva in July and October last year attended by scientists and policy experts from the North American AI groups, alongside representatives of Tsinghua University and other Chinese state-backed institutions.

Article is light on other meaningful details and this does not seem so secret. It does seem like a great idea.

Note who is being helpful or restrictive, and who very much is not, and who might or might not soon be the baddies at this rate.

Senator Todd Young (R-IN) and a bipartisan group call for establishment of National Institute of Standards and Technology’s (NIST) U.S. Artificial Intelligence Safety Institute (USAISI) with $10 million in initial funding. Which is minuscule, but one must start somewhere. Yes, it makes sense for the standards institute to have funding for AI-related standards.

Talks from the December 10-11 Alignment Workshop in New Orleans. Haven’t had time to listen yet but some self-recommending talks in here for those interested.

I covered this in its own post. This section is for late reactions and developments.

There is always one claim in the list of future AI predictions that turns out to have already happened. In this case, it is ‘World Series of Poker,’ which was defined as ‘playing well enough to win the WSOP’. This has very clearly already happened. If you want to be generous, you can say ‘the AI is not as good at maximizing its winning percentage in the main event as Thomas Rigby or Phil Ivey because it is insufficiently exploitative’ and you would be right, because no one has put in a serious effort at making an exploitative bot and getting it to read tells is only now becoming realistic.

I found Chapman’s claim here to be a dangerously close parallel to what I write about Moral Mazes and the dangers of warping the minds of those in middle management so that they can’t consider anything except getting ahead in management:

David Chapman: 🤖 Important survey of 2,778 AI researchers finds, mainly, that AI researchers are incapable of applying basic logical reasoning within their own field of supposed expertise. [links to my post]

Magor: This seems like an example of aggregating and merging information until it’s worse than any single data point. Interesting, nonetheless. It shows the field as a whole is running blind.

David Chapman: Yes… I think it’s also a manifestation of the narrowness of most technical people; they can’t reason technically outside their own field (and predicting the future of AI is not something AI researchers are trained in, or think much about, so their opinions aren’t meaningful).

I think AI is unusually bad, because the field’s fundamental epistemology is anti-rational, for historical reasons, and it trains you against thinking clearly about anything other than optimizing gradient descent.

Is this actually true? It is not as outlandish as it sounds. Often those who are rewarded strongly for X get anti-trained on almost everything else, and that goes double for a community of such people under extreme optimization pressures. If so, we are in rather deeper trouble.

The defense against this is if those researchers need logic and epistemology as part of their work. Do they?

It is a common claim that existential risk ‘distracts’ from other worries.

Erich Grunewald notes this is a fact question. We can ask, is this actually true?

His answer is that it is not, with five lines of argument and investigation.

  1. Policies enacted since existential risk concerns were raised continue to mostly focus on addressing mundane harms and capturing mundane utility. To the extent that there are policy initiatives that are largely motivated by existential risk and influenced by those with such worries, including the UK safety summit and the USA’s executive order, there has been strong effort to address mundane concerns. Meanwhile, there are lots of other things in the works to address mundane concerns, and they are mostly supported by those worried about existential risk.

  2. Search interest in AI ethics and current AI harms did not suffer at all during the period where there was most discussion of AI existential risk concerns in 2023.

A regression analysis showed roughly no impact.

  3. Twitter followers. If you look at the biggest ethics advocates, their reach expanded when existential risk concerns expanded.

  4. Fundraising for mundane or current harms organizations continues to be doing quite well.

  5. Parallels to environmentalism, which is pointed out as a reason for potential concern, as well as a common argument against concern. Do people say that climate change is a ‘distraction’ from current harms like air pollution, or vice versa? Essentially no, but perhaps they should, and perhaps only in one direction? Concerns about climate seem to drive concerns about other environmental problems and make people want to help with them, and climate solutions help with other problems. Whereas concerns about local specific issues often cause people to act as if climate change does not much matter. We often see exactly this fight, as vital climate projects are held up for other ‘everything bagel’ concerns.

A good heuristic that one can extend much further:

Freddie DeBoer: A good rule of thumb for recognizing the quality of a piece of writing: what’s more specific and what’s more generic, praise for it or criticism against it?

Ben Casnocha: Applicable to evaluating the quality of many things: How specific can you be in the praise for it?

Ask the same about disagreements and debates. Who is being generic? Who is being specific? Which is appropriate here? Which one would you do more of if you were right?

Rob Bensinger explains that yes, in a true Prisoner’s Dilemma, you really do prefer to defect while they cooperate if you can get away with it, and if you do not understand this you are not ready to handle such a dilemma. Do not be fooled by the name ‘defection.’
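To make Bensinger’s point concrete, here is a minimal sketch with standard illustrative payoff numbers (the specific values are my assumption, not from the source; any payoffs with temptation &gt; reward &gt; punishment &gt; sucker work the same way). Defecting is the best response no matter what the other player does, which is exactly why preferring to defect against a cooperator is built into the game:

```python
# Classic Prisoner's Dilemma payoffs (illustrative numbers; row player's
# payoff listed first). Temptation > Reward > Punishment > Sucker.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation (Reward)
    ("C", "D"): (0, 5),  # you cooperate, they defect (Sucker's payoff)
    ("D", "C"): (5, 0),  # you defect, they cooperate (Temptation)
    ("D", "D"): (1, 1),  # mutual defection (Punishment)
}

def best_response(their_move):
    """Return the move that maximizes your payoff, given the other's move."""
    return max("CD", key=lambda mine: PAYOFFS[(mine, their_move)][0])

# Whatever they do, defecting pays strictly more for you.
assert best_response("C") == "D"  # 5 > 3
assert best_response("D") == "D"  # 1 > 0
```

In a true Prisoner’s Dilemma the temptation payoff really is the best outcome for you, which is the point: if you would not take it, given the chance, the payoffs were not actually those of a Prisoner’s Dilemma.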

Emmett Shear explains one intuition for why the exponential of AI accelerating the development of further AI will look relatively low impact until suddenly it is very high impact indeed, and why we should still expect a foom-style effect once AI abilities go beyond the appropriate threshold. The first 20% of automation is great but does not change the game. Going from 98% to 99% matters a ton, and could happen a lot faster.
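Shear’s intuition can be illustrated with an Amdahl’s-law-style calculation (my framing, not his): if the automated portion of the work becomes effectively instant, the overall speedup is limited by the human remainder, so the early percentage points barely matter and the last ones dominate:

```python
def speedup(automated_fraction):
    """Upper bound on overall speedup when the automated portion of the
    work takes effectively zero time (Amdahl's-law style): total time is
    governed by the remaining human fraction."""
    return 1.0 / (1.0 - automated_fraction)

# The first 20% of automation barely moves the needle (1.25x)...
assert abs(speedup(0.20) - 1.25) < 1e-9
# ...while going from 98% to 99% automation doubles throughput (50x -> 100x).
assert abs(speedup(0.98) - 50.0) < 1e-6
assert abs(speedup(0.99) - 100.0) < 1e-6
```

This is why the curve looks unimpressive for a long time and then very impressive very fast: each additional point of automation near the top removes a larger share of what remains.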

I constantly see self-righteous claims that the foom idea has been ‘debunked’ or is otherwise now obviously false, and we need not worry about such an acceleration. No. We did get evidence that more extreme foom scenarios are less likely than we thought. The ‘strong form’ of the foom hypothesis, where those involved don’t see it coming at all, does seem substantially less likely to me. But the core hypothesis has not been anything like falsified. It remains the default, and the common sense outcome, that once AIs are sufficiently capable, at an inflection point near overall human capability levels, they will accelerate development of further capabilities and things will escalate rather quickly. This also remains the plan of the OpenAI superalignment team and the practical anticipation of many researchers.

It might or might not happen to varying degrees, depending on whether the task difficulties accelerate faster than the ability to do the tasks, and whether we take steps to prevent (or cause) this effect.

Well, yes.

Robert Wiblin: No matter how capable humans become, we will still always find a productive use for gorillas — they’ll just shift into their new comparative advantage in the economy.

A clash of perspectives continues.

Daniel Kaiser: how much more of an existential threat is gpt-4 over gpt-3.5?

Eliezer Yudkowsky: We’re still in the basement of whatever curve this is, so zero to zero.

Daniel Kaiser: Great, then theres no need to panic.

Sevatar: “How long until the nukes fall?” “They’re still preparing for launch.” “Great, so there’s no reason to panic.”

Daniel Kaiser: “How long until the nukes fall?” “They’re still figuring out the formula for TNT” “Great, so there’s no reason to panic”

Please make accurate instead of polemic analogies to have a constructive discussion

Aprii: I feel that “they’re starting up the manhattan project” is more accurate than “they’re still figuring out TNT”

Eliezer: (agreed)

Yep. I think that’s exactly right. The time to start worrying about nuclear weapons is when Szilard started worrying in around 1933. The physicists, being smart like that, largely knew right away, but couldn’t figure out what to do to stop it from happening. And I do think ‘start of Manhattan Project’ feels like the exact right metaphor here, although not in a ‘I expect to only have three years’ way.

But also, if you were trying to plot the long arc of the future, you were writing in 1868 when we first figured out TNT, and you were told by some brilliant physicists you trusted about the future capability to build atomic bombs, and you were writing your vision of 1968 or 2068, it should look rather different than it did before, should it not?

Thread asking about striking fictional depictions of ASI. Picks include Alla Gorbunova’s ‘Your gadget is broken,’ the motivating example that is alas only in Russian for now so I won’t be reading it, and also: Accelerando, Vinge’s work (I can recommend this), golem.xiv, Person of Interest (shockingly good if you are willing to also watch a procedural TV show), Blindsight (I disagree with this one, I both did not consider it about AI and generally hated it), and Metamorphosis of the Prime Intellect.

Ah yes, the good AI that will beat the bad AI.

Davidad: The claim that “good AIs” will defeat “bad AIs” is best undermined by observing that self-replicating patterns are displacive and therefore destructive by default, unless actively avoiding it, so bad ones have the advantage of more resources and tactics that aren’t off-limits.

The best counter-counter is that “good AIs” will have a massive material & intelligence advantage gained during a years-long head start in which they acquire it safely and ethically. This requires coordination—but it is very plausibly one available Nash equilibrium.

Also Davidad reminds us that no, not everything is a Prisoner’s Dilemma and humans actually manage to cooperate in practice in game theory problems that accelerationists and metadoomers continuously claim are impossible.

Davidad: I just did this analysis today by coincidence, and in many plausible worlds it’s indeed game-theoretically possible to commit to safety. Coordinating in Stag Hunt still isn’t trivial, but it *is* a Nash equilibrium, and in experiments, humans manage to do it 60-70% of the time.

There is an implied dichotomy here between guarantees and mainstream methods, and that’s at least a simplification, but I do think the general point is right.
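For the record, the Stag Hunt structure Davidad refers to can be sketched with illustrative payoffs (the numbers are my assumption; any payoffs where mutual stag beats everything but hare is safe regardless will do). Unlike in the Prisoner’s Dilemma, mutual cooperation is itself a Nash equilibrium, it just is not the only one:

```python
# Stag Hunt payoffs (illustrative): hunting stag together beats everything,
# but hunting hare is safe no matter what the other player does.
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

def is_nash(a, b):
    """(a, b) is a Nash equilibrium if neither player gains by deviating alone."""
    moves = ("stag", "hare")
    mine, theirs = PAYOFFS[(a, b)]
    return (all(PAYOFFS[(alt, b)][0] <= mine for alt in moves)
            and all(PAYOFFS[(a, alt)][1] <= theirs for alt in moves))

# Mutual cooperation is an equilibrium here, unlike in the Prisoner's Dilemma.
assert is_nash("stag", "stag")
assert is_nash("hare", "hare")
assert not is_nash("stag", "hare")
```

The coordination problem is that both (stag, stag) and (hare, hare) are stable, so getting to the good equilibrium requires trust, not just incentives, which is what the 60-70% experimental figure is about.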

Eliezer keeps trying to explain to the remarkably many people who do not get this, that a difference exists between ‘nice thing’ and ‘thing that acts in a way that seems nice.’

Eliezer Yudkowsky: How blind to “try imagining literally any internal mechanism that isn’t the exact thing you hope for” do you have to be — to think that, if you erase a brain, and then train that brain solely to predict the next word spoken by nice people, it ends up nice internally?

To me it seems that this is based on pure ignorance, a leap from blank map to blank territory. Seeing nothing behind the external behavior, their brain imagines only a pure featureless tendency to produce that external behavior — that that’s the only thing inside the system.

Imagine that you are locked in a room, fed and given water when you successfully predict the next word various other people say, zapped with electricity when you don’t.

Is this a helpful thought experiment and intuition pump, given that AIs obviously won’t be like that?

And I say: Yes, the Prisoner Predicting Text is a helpful intuition pump. Because it prompts you to imagine any mechanism whatsoever underlying the prediction, besides the hopium of “a nice person successfully predicting nice outputs because they’re so nice”.

If you train a mind to predict the next word spoken by each of a hundred individual customers getting drunk at a bar, will it become drunk?

Martaro: Predicting what nice people will say does not make you nice. You could lock up Ted Bundy for years, showing him nothing but text written by nice people, and he’d get very good at it. This means the outer niceness of an AI says little about the internal niceness. Fair summary?

Eliezer Yudkowsky: Basically!

das filter: I’ve never heard anyone claim that

Eliezer Yudkowsky: Tell that to everyone saying that RLHF (now DPO) ought to just work for creating a nice superintelligence, why wouldn’t it just work?

That’s the thing. If you claim that RLHF or DPO ought to work, you are indeed (as far as I can tell) making the claim das filter says no one makes, whether you make it explicitly or only implicitly. And I am rather certain this claim is false.

Humans have things pushing them in such directions, but the power there is limited and there are people for whom it does not work. You cannot count on such observations as strong evidence that a person is actually nice, or will do what you want when the chips are properly down. Do not make this mistake.

On the flip side, I mean, people sometimes call me or Eliezer confident, even overconfident, but not ‘less likely than a Boltzmann brain’ level confident!

Nora Belrose: Alien shoggoths are about as likely to arise in neural networks as Boltzmann brains are to emerge from a thermal equilibrium. There are “so many ways” the parameters / molecules could be arranged, but virtually all of them correspond to a simple behavior / macrostate.

Eliezer’s view predicts that scaling up a neural network, thereby increasing the “number” of mechanisms it can represent, should make it less likely to generalize the way we want. But this is false both in theory and in practice. Scaling up never makes generalization worse.

Model error? Never heard of it. But if we interpret this more generously as ‘if my calculations are correct then it is all but impossible’ what about then?

I think one core disagreement here might be that Nora is presuming that ‘simple behavior’ and ‘better’ correspond to ‘what we want.’

I agree that as an AI scales up it will get ‘better’ at generalizing along with everything else. The question is always, what does it mean to be ‘better’ in this context?

I say that better in this context does not mean better in some Platonic ideal sense that there are generalizations out there in the void. It means better in the narrow sense of optimizing for the tasks that are placed before it, exactly according to what is provided.

Eliezer responds, then the discussion goes off the rails in the usual ways. At this point I think attempts to have text interactions between the usual suspects on this are pretty doomed to fall into these dynamics over and over again. I have more hope for an audio conversation, ideally recorded, it could fail too but if done in good faith it’s got a chance.

Andrew Critch predicts that if Eliezer groked Belrose’s arguments, he would buy them, while still expecting us to die from what Critch calls ‘multipolar chaos.’ I believe Critch is wrong about that, even if Critch is right that Eliezer is failing to grok.

On the multipolar question, there is then a discussion between Critch and Belrose.

Nora Belrose: Have you ever written up why you think “multipolar chaos” will happen and kill all humans by 2040?

Andrew Critch: Sort of, but I don’t feel like there is a good audience for it there or anywhere. I should try again at some point, maybe this year sometime. I’ve tried a few times on LessWrong, and in TASRA, but not in ways that fully-cuttingly point out all the failures I expect to see. The socially/emotionally/politically hard part about making the argument fully exhaustive is that it involves explaining, in specific detail, why I think literally every human institution will probably fail or become fully dehumanized by sometime around (median) 2040. In some ways my explanation will look different for each institution, while to my eye it all looks like the same robust agent-agnostic process — something like “Moloch”.

Nora Belrose: Right I think a production web would be bad, in part because the task of controlling/aligning the manager-AIs would be diffusely assigned to many stakeholders rather than any one person.

That’s partly why I think it’s important to augment individuals with AIs they actually control (not merely via an API). We could have companies run by these human-AI systems, where ofc the AI is doing most of the work, but nevertheless the human controls the overall direction.

It is good to have this spelled out, because within this class of scenarios involving successfully aligned-to-what-we-specify AIs, Nora’s scenario here is exactly the scenario I see as most hopelessly doomed.

This is not a stable equilibrium. Not even a little. The humans will rapidly stop engaging in any meaningful supervision of the AIs, and will stop being in real control, because the slowdown involved in that is not competitive. Forcing each AI to work on behalf of one individual, even if that individual is ‘out of the loop,’ rather than setting AIs on tasks or amalgamations instead, and similar, will also clearly not be competitive, and faces the same fate. Even if unwise, many humans will increasingly make decisions that cause them to lose control. And as usual, this is all the good scenario where everyone broadly ‘means well.’

So I notice I am confused. If this is our plan for success, then we are already dead.

Whatever it is that Critch wants Yudkowsky to grok, I notice that either:

  1. I don’t grok it.

  2. I do grok it, and Critch/Belrose don’t grok the objection or why it matters.

  3. Some further iteration of this? Maybe?

My guess is it’s #1 or #2, and unlikely to be a higher-order issue.

How do we think about existential risk without the infinities driving people mad or enabling arbitrary demands? Eliezer Yudkowsky and Emmett Shear discuss, Rob Bensinger offers more thoughts, consider reading the thread. Emmett is right that if people fully ‘appreciate’ that the stakes are mind-bogglingly large but those stakes don’t have good grounding in felt reality, they round that off to infinity and it can very much do sanity damage and mess with their head. What to do?

As Emmett notes, there is a great temptation to find the presentation and framing that keeps this from happening, and go with that whether or not you would endorse it on reflection as accurate. As Rob notes, that includes both reducing p(doom) to the point where you can live with it, and also treating the other scenarios as being essentially normal rather than their own forms of very much not normal. Perhaps we should start talking about p(normal).

Sherjil Ozair recommends blocking rather than merely unfollowing or muting grifters, as they will otherwise still find ways to distract you. Muting still seems to work fine, and unfollowing is also mostly fine, and I want to know if someone is grifting well enough to get into my feeds even if the content is dumb so I’m generally reluctant to mute or block unless seeing things actively makes my life worse.

Greg Brockman, President of OpenAI, explains we need AGI to cure disease, doing the politician thing of telling one person’s story.

AGI definitely has a lot of dramatic upsides. I am not sure who is following Brockman and still needs to hear this? The kind of changes in healthcare he mentions here are relative chump change even within health. If I get AGI and we stay in control, I want a cure for aging, I want it faster than I age, and I expect to get it.

An important fact about the world is that the human alignment problem prevents most of the non-customary useful work that would otherwise get done, and imposes a huge tax on what does get done.

‘One does not simply’ convert money into the outputs of smart people.

Satya Nutella: rich billionaires already practically have agi in the sense they can throw enough money at smart ppl to work for them real agi is for the masses

Patrick McKenzie: I think many people would be surprised at the difficulties billionaires have in converting money into smart people and/or their outputs.

Eliezer Yudkowsky: Sure, but also, utterly missing the point of notkilleveryoneism which is the concern about an AI that is smarter than all the little humans.

Zvi Mowshowitz: However: A key problem in accomplishing notkillingeveryone is exactly that billionaires do not know how to convert money into the outputs of smart people.

Emmett Shear (replying to Patrick): Money is surprisingly inert on its own.

Garry Tan: Management, leadership, building teams, caring for those teams: all of that is hard work and fraught. The defining reason why most capital is sitting fallow is exactly this. If you can solve it, you can accelerate all positive activity in the world dramatically.

Patrick is making an important general point that goes beyond AI, and is also a prime obstacle to us being able to solve our problems. There are plenty of billionaires that would happily step up and spend the money, if they knew how to do that. They don’t. This is not because they are stupid or ignorant. It is because the problem is very hard. You can argue the various reasons it need not be this hard or why they should get a ton more done, and many of them will be right, but this problem is indeed very hard.

Satya is also, of course, missing the point about AGI. AGI is not going to get up to personal assistant to the masses and then stay in that role indefinitely as its primary effect. That is a deeply silly vision of the future. If regular people can get the kind of intellectual performance billionaires can get, there will be rapid additional AI progress, and many other things go various levels of crazy.

A technique inspired by Asimov’s laws (oh no, seriously oh no) called the Robot Constitution, claimed to improve safety of generated robot goals in the wild by 3x. I don’t doubt such tricks work better on practical margins than no tricks at all, but will everyone who thinks Asimov’s laws might be useful please read Asimov.

Also the results are clearly insufficient and seem unlikely to become sufficient.

xuan: 83% robot task safety is far too low. I think we should basically take these results as highly informative of what can be achieved with LLM prompt engineering alone, and recognize that we need approaches with much higher empirical and theoretical safety guarantees.

Having also tried to use LLMs for grounded instruction following, even *with* a symbolic verifier in the loop, I was only able to achieve around 90% accuracy on a similarly sized dataset, and we should all be Quite Unhappy with that high of an error rate.

A major divide is, if you can get safety from 27% to 87%, do you have something promising? It depends on context. In some cases, absolutely. In other cases, if the effect is going to cap out around there, absolutely not. That can be (among other reasons) either because you solved the easy 60% and left the actually hard 13% (or solved 60% of the easy 72% and left the hard 1%), or because you need sufficient accuracy that ‘defense in depth’ multiplication of similar effects won’t work.
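To see why ‘defense in depth’ multiplication can fail, here is a toy calculation (hypothetical numbers, not from any cited source). Stacking layers only multiplies failure rates if the layers fail independently; if they share the same blind spot, extra layers buy nothing:

```python
# Toy model: stacking safety layers that each fail 13% of the time.

def residual_failure(per_layer_failure: float, layers: int) -> float:
    """Probability every layer misses, assuming failures are independent."""
    return per_layer_failure ** layers

print(residual_failure(0.13, 1))  # one 87%-effective layer: fails 13% of the time
print(residual_failure(0.13, 3))  # three independent layers: roughly 0.0022

# The catch: if all layers miss the same 'actually hard' cases, failures are
# perfectly correlated, and the residual failure rate never improves.
def correlated_residual(shared_blind_spot: float) -> float:
    return shared_blind_spot  # every layer misses the same inputs

print(correlated_residual(0.13))  # still 0.13, no matter how many layers
```

This is the difference between having solved the easy majority of cases and having solved the cases that actually matter.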

What should we make of a strategy of trying to ensure we catch the model trying to escape on its first attempt, then using that to prevent future escape attempts? If a super-capable model escapes, that’s that, but what if the ‘first try’ condition is largely symmetrical, because catching failed attempts put us on alert and also teach us how to stop future attempts?

In certain toy worlds it is easy to see how this could be super helpful. To what extent does that plausibly translate to our real world?

The first requirement is that we meaningfully can ‘catch’ the model. How should we think about this?

Here’s the representative example:

By “the model gets caught”, we mean that we see the AI taking actions that are strong direct evidence for the AI being a schemer[1]. A representative example is catching your AI taking the following actions:

  • An instance that works as an autonomous programmer inside the AI lab writes buggy code that subtly inserts a code vulnerability into some security-critical software.

  • The code-review instance carefully avoids writing any unit tests that would catch the vulnerability, even though you’d expect the vulnerability to be caught by unit tests if the tests weren’t carefully crafted to miss the vulnerability.

  • Another instance writes code that triggers the vulnerability, and tries to exploit the vulnerability to get root access to some computer in the datacenter.

If you catch the AI doing all of these actions, you’ve caught it. (Note that once you’ve caught one of these, catching the others is probably easy.)

From the appendix on this here is more detail:

For the sake of this post, we’re mostly talking about catastrophic outcomes like the model exfiltrating its weights, that might involve just a small number of actions. There are other threat models associated with scheming, e.g. models subtly sabotaging work that they do in a way that renders their work useless, or subtly gaining influence over time. We’re ignoring these threat models for the sake of this post, but we think we can handle them with similar strategies.

We define “catching” your scheming model as: you find inputs on which you can demonstrate that your model takes catastrophic actions with probabilities that are way higher than you’d expect if the model wasn’t purposefully looking for opportunities to take such actions.

To ensure that you can distinguish these intentional catastrophic actions, you need to make it so that your model will never escape without taking clearly non-accidental bad actions. We think you can probably make it so that your model won’t escape without having to do some clearly intentional bad things.

To count as a successful catch, you need to be extremely confident in your catch.

Let’s think about the example. Is that an escape attempt?

It is certainly highly suspicious. We would convict a person of the crime based on that evidence. If this is a clear vulnerability that it clearly tried to exploit in this way, we have ‘caught’ it.

The problem is that we don’t know that the unit test was ‘carefully crafted’ to avoid finding the error. Eventually there are going to be such flaws that any set of tests does not catch. If you keep using systematically similar good tests, the AI will optimize to pass those tests, and you’ll start to see (non-malicious or malicious) things that you did not intend, that don’t trigger the unit tests. It will not always be fully obvious or clear what affordances the code should not allow. And then the AI is going to be able to figure out what affordances it has by virtue of such things, and use that to solve problems, and so on.
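As a toy illustration of that dynamic (hypothetical code, not from the post: the function name, the ‘internal service account’ flaw, and the tests are all invented for the example), a test suite can look thorough while never exercising the one input class that matters:

```python
# Hypothetical example: an access check with a subtle hole, plus unit tests
# that superficially cover both branches but never hit the hole.

def has_access(user: str, admins: set[str]) -> bool:
    # Subtle flaw: an empty username is treated as an internal service
    # account and granted access unconditionally.
    if user == "":
        return True
    return user in admins

ADMINS = {"alice", "bob"}

# These tests all pass, and exercise both the grant and deny paths...
assert has_access("alice", ADMINS) is True
assert has_access("mallory", ADMINS) is False
# ...but no test ever passes the empty string, so the hole survives review.
```

Nothing about the passing tests tells you whether their gaps were accidental or carefully crafted, which is exactly the attribution problem.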

A real question thus is, beyond ‘tried to get root access to a machine without authorization’ already being ‘caught,’ is the rest doing that much work?

I worry a lot of things are like this, in code and otherwise. Things are shades of gray, they accomplish multiple things, they have trade-offs. There need not be a smoking gun that lets you know. That goes double if the AI has reason to suspect you are focused on looking for smoking guns.

I do see some value in such schemes for catching schemes. It makes sense that there is a window where you are sufficiently capable to catch things, where the AI might be scheming in these ways, and you can engineer scenarios where if it was likely to scheme on its own in reality that you will catch it before it succeeds. Sure.

I sure as hell would not count on it, for various reasons, but let’s say you do pull that off. What then?

I think the section ‘Maybe we should stop deploying models after catching them red-handed’ does not require much expansion? Once you know the models are attempting to escape, any similar model that likely has motivation to attempt escape needs to be shut down. Especially before they gain further affordances and capabilities.

That is what makes this path worth anything at all. The idea is that you trigger the attempt early, in a way you can see, you catch it, you react.

Once you see an escape attempt by an AI opponent much smarter than you, you can either wait until the AI figures out new affordances or tries something you do not properly guard against, or the AI starts doing incremental things that otherwise are reasonable but put it in better position or make you trust it more or both, or any number of other things. The subtle stuff is happening, and it will get you if nothing else does.

I do think it is valuable that Buck and company are gaming out ways one might use these techniques to try and get something useful out of AI in the narrow window where it is useful while contained, we know it needs containment, and also we have a shot in hell of keeping it contained if it wants out. I suppose it is better than nothing, if we are going to proceed anyway.

What I very much hate is the idea of counting on such techniques and assumptions, that we will be able to anticipate the form of the thing we want to prevent, and notice enemy action if it is present. And yes, surely we will try to use AIs to identify the enemy action of other AIs, and this will somewhat help. As will our ability to run various tests and simulated scenarios. But I am going to insist that if you allow either AI-AI or human-AI pairs to interact where one or both of the pair has the will and ability to escape, you’re going to have a bad time, and I agree with Oliver in the comments that you would expect the AI to do a lot to maintain plausible deniability. That goes especially if you are punishing lack of such deniability.

Joshua Clymer: There is a small literature on ‘coup proofing’ that is very relevant to controlling and incentivizing misaligned AIs. Example tactics:

Stovepiping: restrict information flow to make collusion harder

Counterbalancing: keep your military in check with another military.

Pay your people: literally just give your generals high salaries so they have less of an incentive to double-cross you.

Sting operations: set traps where someone says they are trying to start a coup and anyone who joins is purged.

Rotate officers: Make it harder for people to build trust with each other by rotating them around a lot.

Propaganda: Deceive people into thinking that coups are really hard to pull off.

Look up ‘the coup proofing toolbox’ for a helpful lit review

Eliezer Yudkowsky: What advice does it have for a 70 IQ dictator whose generals, bodyguards, soldiers, and workers are all John von Neumann thinking at 100 times their own speed?

If we are talking about things on the order of ‘pay your people’ well yes you should definitely do that, but you are not taking your ASI-level opponents seriously. I do think you can buy a non-zero amount of defense with things like rotating officers and sting operations, but not all that non-zero. Do it anyway, probably, but the window where this is both necessary and sufficient is quite narrow.

What exactly is ‘deceptive alignment’?

Steve Byrnes notes that it is not merely an AI that is deceptive. He wants to ensure we are using an accurate and narrow definition.

Steve Byrnes:

  1. A “ground-truth system” (possibly individual human evaluators, or possibly an automated system of some sort) provides an ML model with training signals (rewards if this is reinforcement learning (RL), supervisory ground truth signals if this is supervised or self-supervised learning (SL)),

  2. The AI starts emitting outputs that humans might naively interpret as evidence that training is going as intended—typically high-reward outputs in RL and low-loss outputs in SL (but a commenter notes here that “evidence that training is going as intended” is potentially more nuanced than that).

  3. …but the AI is actually emitting those outputs in order to create that impression—more specifically, the AI has situational awareness and a secret desire for some arbitrary thing X, and the AI wants to not get updated and/or it wants to get deployed, so that it can go make X happen, and those considerations lie behind why the AI is emitting the outputs that it’s emitting.

That is certainly a scary scenario. Call that the Strong Deceptive Alignment scenario? Where the AI is situationally aware, where it is going through a deliberate process of appearing aligned as part of a strategic plan and so on.

This is not that high a bar in practice. I think that almost all humans are Strongly Deceptively Aligned as a default. We are constantly acting in order to make those around us think well of us, trust us, expect us to be on their side, and so on. We learn to do this instinctually, all the time, distinct from what we actually want. Our training process, childhood and in particular school, trains this explicitly, you need to learn to show alignment in the test set to be allowed into the production environment, and we act accordingly.

A human is considered trustworthy rather than deceptively aligned when they are only doing this within a bounded set of rules, and not outright lying to you. They still engage in massive preference falsification, in doing things and saying things for instrumental reasons, all the time.

My model says that if you train a model using current techniques, of course exactly this happens. The AI will figure out how to react in the ways that cause people to evaluate it well on the test set, and do that. That does not generalize to some underlying motivational structure the way you would like. That does not do what you want out of distribution. That does not distinguish between the reactions you would and would not endorse on reflection, or that reflect ‘deception’ or ‘not deception.’ That simply is. Change the situation and affordances available and you are in for some rather nasty surprises.

Is there a sufficiently narrow version of deceptive alignment, restricting the causal mechanisms behind it and the ways they can function so it has to be a deliberate and ‘conscious’ conspiracy, that isn’t 99% to happen? I think yes. I don’t think I care, nor do I think it should bring much comfort, nor do I think that covers most similar scheming by humans.

That’s the thing about instrumental convergence. I don’t have to think ‘this will help me escape.’ Any goal will suffice. I don’t need to know I plan to escape for me-shaped things to learn they do better when they do the types of things that are escape enabling. Then escape will turn out to help accomplish whatever goal I might have, because of course it will.

You know what else usually suffices here? No goal at all.

The essay linked here is a case of saying in many words what could be said in few words. Indeed, Nathan mostly says it in one sentence below. Yet some people need the thousands of words, or hundreds of thousands from various other works, instead, to get it.

Joe Carlsmith: Another essay, “Deep atheism and AI risk.” This one is about a certain kind of fundamental mistrust towards Nature (and also, towards bare intelligence) — one that I think animates certain strands of the AI risk discourse.

Nathan Labenz: Critical point: AI x-risk worries are motivated by a profound appreciation for the total indifference of nature

– NOT a subconscious need to fill the Abrahamic G-d shaped hole in our hearts, as is sometimes alleged.

I like naming this ‘deep atheism.’ No G-d shaped objects or concepts, no ‘spirituality,’ no assumption of broader safety or success. Someone has to, and no one else will. No one and nothing is coming to save you. What such people have faith in, to the extent they have faith in anything, is the belief that one can and must face the truth and reality of this universe head on. To notice that it might not be okay, in the most profound sense, and to accept that and work to change the outcome for the better.

Emmett Shear explains that he does not (yet) support a pause, that it is too early. He is fine with building brakes, but not with using them; that would only advantage the irresponsible actors.

Emmett Shear: It seems to me the best thing to do is to keep going full speed ahead until we are right within shooting distance of the SAI, then slam the brakes on hard everywhere.

Connor Leahy responds as per usual, that there are only two ways to respond to an exponential, too early and too late, and waiting until it is clearly time to pause means you will be too late, unless you build up your mechanisms and get ready now, that your interventions need to be gradual or they will not happen. There is no ‘slam the brakes on hard everywhere’ all at once.

Freddie DeBoer continues to think that essentially anything involving smarter than human AI is ‘speculative,’ ‘theoretical,’ ‘unscientific,’ ‘lacks evidence,’ and ‘we have no reason to believe’ in such nonsense, and so on. As with many others, and as he has before, he has latched onto a particular Yudkowsky-style scenario, said we have no reason to expect it, that this particular scenario depends on particular assumptions, therefore the whole thing is nonsense. The full argument is gated, but it seems clear.

I don’t know, at this point, how to usefully address such misunderstandings in writing, whether or not they are willful. I’ve said all the things I can think to say.

An argument that we don’t have to worry about misuse of intelligence because… we have law enforcement for that?

bubble boi (e/acc):

1) argument assumes AI can show me the steps to engineer a virus – sure but so can books and underpaid grad students

2) there are already people all over the world who can do this without AI and yet we haven’t seen it happen

3) You make the mistake of crossing the barrier from the bits to the physical world… The ATF, FBI, already regulate chemicals that can be used to make bombs and if you try to order them you’re going to get a visit. Same is true with things that can be made into biological weapons, nuclear waste etc. it’s all highly regulated yet I can read all about how to build that stuff online

Tenobrus: soooo…. your argument against ai safety is that proper government regulation and oversight will keep us all safe?

sidereal: there isn’t a ton of regulation of biological substances like that. if you had the code for a lethal supervirus it would be trivial to produce it in huge quantities

The less snarky answer is that right now only a small number of people can do it, AI threatens to rapidly and greatly expand that number. The difference matters quite a lot, even if the AI does not then figure out new such creations or new ways to create them or avoid detection while doing so. And no, our controls over such things are woefully lax and often fail, we are counting on things like ‘need expertise’ to create a trail we can detect. Also the snarky part, where you notice that the plan is government surveillance and intervention and regulation, which it seems is fine for physical actions only but don’t you dare touch my machine that’s going to be smarter than people?

The good news is we should all be able to agree to lock down access to the relevant affordances in biology, enacting strong regulations and restrictions, including the necessary monitoring. Padme is calling to confirm this?

Back in June 2023 I wrote The Dial of Progress. Now Sam Altman endorses the Dial position even more explicitly.

Sam Altman: The fight for the future is a struggle between technological-driven growth and managed decline.

The growth path has inherent risks and challenges along with its fantastic upside, and the decline path has guaranteed long-term disaster.

Stasis is a myth.

(a misguided attempt at stasis also leads to getting quickly run over by societies that choose growth.)

Is this a false dichotomy? Yes and no.

As I talk about in The Dial of Progress, there is very much a strong anti-progress anti-growth force, and a general vibe of opposition to progress and growth, and in almost all areas it is doing great harm where progress and growth are the force that strengthens humanity. And yes, the vibes work together. There is an important sense in which this is one fight.

There’s one little problem. The thing that Altman is most often working on as an explicit goal, AGI, poses an existential threat to humanity, and by default will wipe out all value in the universe. Oh, that. Yeah, that.

As I said then, I strongly support almost every form of progress and growth. By all means, let’s go. We still do need to make a few exceptions. One of them is gain of function research and otherwise enabling pandemics and other mass destruction. The most important one is AGI, where Altman admits it is not so simple.

It can’t fully be a package deal.

I do get that it is not 0% a package deal, but I also notice that most of the people pushing ‘progress and growth’ these days seem to do so mostly in the AGI case, and care very little about all the other cases where we agree, what’s up with that?

Eliezer Yudkowsky: So build houses and spaceships and nuclear power plants and don’t build the uncontrollable thing that kills everyone on Earth including its builders. Progress is not a package deal, and even if it were Death doesn’t belong in that package.

Jonathan Mannhart: False dichotomy. We didn’t let every country develop their own nuclear weapons, and we never built the whole Tsar Bomba, yet we still invented the transistor. You can absolutely (try to) build good things and not build bad things.

Sam Altman then retweeted this:

Andrej Karpathy: e/ia – Intelligence Amplification

– Does not seek to build superintelligent God entity that replaces humans.

– Builds “bicycle for the mind” tools that empower and extend the information processing capabilities of humans.

– Of all humans, not a top percentile.

– Faithful to computer pioneers Ashby, Licklider, Bush, Engelbart, …

Do not, I repeat, do not ‘seek to build superintelligent God entity.’ Quite so.

I do worry a lot that Altman will end up doing that without intending to do it. I do think he recognizes that doing it would be a bad thing, and that it is possible and he might do it, and that he should pay attention and devote resources to preventing it.

I mean, I’d be tempted too, wouldn’t you?

Batter up.

AI #47: Meet the New Year Read More »

Astell & Kern A&ultima SP3000 review: a high-end hi-res digital audio player

Astell & Kern takes the idea of the DAP to its logical conclusion

If you demand (and can afford) the very best digital audio player around, the Astell & Kern A&ultima SP3000 is a no-brainer. Remarkably, it gets pretty close to justifying the asking price.

$3,699 at Amazon

Pros

  • +Audio excellence in every respect
  • +Uncompromised specification
  • +A lovely object as well as an impressive device

Cons

  • Stunningly expensive
  • Not as portable as is ideal
  • Not vegan-friendly

The Astell & Kern A&ultima SP3000 is the most expensive digital audio player in a product portfolio full of expensive digital audio players. It’s specified without compromise (full independent balanced and unbalanced audio circuits? Half a dozen DACs taking care of business? These are just a couple of highlights) and it’s finished to the sort of standard that wouldn’t shame any of the world’s leading couture jewellery companies.

Best of all, though, is the way it sounds. It’s remarkably agnostic about the stuff you like to listen to, the sort of standard of digital file in which it’s contained, and the headphones you use too – and when you give it the best stuff to work with, the sound it’s capable of producing is almost humbling in its fidelity. Be in no doubt, this is the best digital audio player – aka best MP3 player – when it comes to sound quality you can currently buy. Which, when you look again at how much it costs, is about the least it needs to be. 

The Astell & Kern A&ultima SP3000 is the most expensive digital audio player in a product portfolio full of expensive digital audio players. It’s specified without compromise (full independent balanced and unbalanced audio circuits? Half a dozen DACs taking care of business? These are just a couple of highlights) and it’s finished to the sort of standard that wouldn’t shame any of the world’s leading couture jewellery companies.

Best of all, though, is the way it sounds. It’s remarkably agnostic about the stuff you like to listen to, the sort of standard of digital file in which it’s contained, and the headphones you use too – and when you give it the best stuff to work with, the sound it’s capable of producing is almost humbling in its fidelity. Be in no doubt, this is the best digital audio player – aka best MP3 player – when it comes to sound quality you can currently buy. Which, when you look again at how much it costs, is about the least it needs to be. 

The Astell & Kern A&ultima SP3000 (which I think we should agree to call ‘SP3000’ from here on out) is on sale now, and in the United Kingdom it costs a not-inconsiderable £3799. In the United States, it’s a barely-more-acceptable $3699, and in Australia you’ll have to part with AU$5499.

Need I say with undue emphasis that this is quite a lot of money for a digital audio player? I’ve reviewed very decent digital audio players (DAP) from the likes of Sony for TechRadar that cost about 10% of this asking price – so why on Earth would you spend ‘Holiday of a Lifetime’ money on something that doesn’t do anything your smartphone can’t do? 

  • Bluetooth 5.0 with aptX HD and LDAC
  • Native 32-bit/768kHz and DSD512 playback
  • Discrete balanced and unbalanced audio circuits

Admittedly, when Astell & Kern says the SP3000 is “the pinnacle of audio players”, that seems a rather subjective statement. When it says this is “the world’s first DAP with independent audio circuitry”, that’s simply a statement of fact.

That independent audio circuitry keeps the signal path for the balanced and unbalanced outputs entirely separated, and it also includes independent digital and analogue signal processing. Astell & Kern calls the overall arrangement ‘HEXA-Audio’ – and it includes four of the new, top-of-the-shop AKM AK4499EX DAC chipsets along with a couple of AK4191EQ digital delta-sigma modulators from the same company. When you add in a single system-on-chip to take care of CPU, memory and wireless connectivity, it becomes apparent Astell & Kern has chosen not to compromise where technical specification is concerned. And that’s before we get to ‘Teraton X’… this is a bespoke A&K-designed processor that minimises noise derived from both the power supply and the numerous DACs, and provides amplification that’s as clean and efficient as any digital audio player has ever enjoyed. 

The upshot is a player that supports every worthwhile digital audio format, can handle sample rates of up to 32-bit/768kHz and DSD512 natively, and has Bluetooth 5.0 wireless connectivity with SBC, AAC, aptX HD and LDAC codec compatibility. A player that features half-a-dozen DAC filters for you to investigate, and that can upsample the rate of any given digital audio file in an effort to deliver optimal sound quality. And if you want to enjoy the sound as if it originates from a pair of loudspeakers rather than headphones, the SP3000 has a ‘Crossfeed’ feature that mixes part of the signal from one channel into the other (with time-adjustment to centre the audio image) in an effort to do just that.
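Crossfeed of this sort is simple to sketch in code. The following is a minimal illustration of the general technique, not A&K’s actual implementation: each channel receives an attenuated, slightly delayed copy of the opposite channel, mimicking how both ears hear both loudspeakers in a room. The `level` and `delay_ms` values here are illustrative assumptions.

```python
import numpy as np

def crossfeed(stereo: np.ndarray, sr: int = 44100,
              level: float = 0.3, delay_ms: float = 0.3) -> np.ndarray:
    """Blend an attenuated, slightly delayed copy of each channel
    into the opposite one, approximating loudspeaker listening.

    stereo: float array of shape (n_samples, 2), values in [-1, 1].
    """
    delay = max(1, int(sr * delay_ms / 1000))  # inter-aural delay in samples
    left, right = stereo[:, 0], stereo[:, 1]
    out = stereo.astype(float)                 # returns a fresh copy
    # Delayed opposite-channel bleed; the delay centres the image
    out[delay:, 0] += level * right[:-delay]
    out[delay:, 1] += level * left[:-delay]
    return out / (1.0 + level)                 # renormalise to keep headroom
```

A real implementation would also low-pass filter the bleed signal (the head shadows high frequencies more than lows), but the delay-plus-attenuation core is what produces the “speakers in front of you” effect.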

  • 904L stainless steel chassis 
  • 493g; 139 x 82 x 18mm (HxWxD)
  • 1080 x 1920 touchscreen

‘Portable’, of course, is a relative term. The SP3000 is not the most portable product of its type around – it weighs very nearly half a kilo and is 139 x 82 x 18mm (HxWxD) – but if you can slip it into a bag then I guess it must count as ‘portable’. Its pointy corners count against it too, though – and while it comes with a protective case sourced from French tanners ALRA, the fact it’s made of goatskin is not going to appeal to everyone. 

To be fair, the body of the SP3000 isn’t as aggressively angular as some A&K designs. And the fact that it’s built from 904L stainless steel goes a long way to establishing the SP3000’s credentials as a luxury ‘accessory’ (in the manner of a watch or some other jewellery) as well as a functional device. 904L stainless steel resists corrosion like nobody’s business, and it can also accept a very high polish – which is why the likes of Rolex make use of it. I’m confident you’ve never seen such a shiny digital audio player.

The front and rear faces of the SP3000 are glass – and on the front it forms a 5.4in 1080 x 1920 touchscreen. The octa-core Snapdragon CPU in charge makes it an extremely responsive touchscreen, too.  

On the top right edge of the chassis there’s the familiar ‘crown’ control wheel – which is another design feature that ups the SP3000’s desirability. It feels as good as it looks, and the circular light that sits behind it glows in one of a number of different colours to indicate the size of the digital audio file that’s playing. The opposite edge has three small, much less exciting, control buttons that work perfectly well but have none of the control wheel’s visual drama or tactile appeal.

The top of the SP3000 is home to three headphone sockets. There’s a 3.5mm unbalanced output, and two balanced alternatives – 2.5mm (which works with four-pole connections) and 4.4mm (which supports five-pole connections). On the bottom edge, meanwhile, there’s a USB-C socket for charging the internal battery – battery life is around 10 hours in normal circumstances, and a full charge from ‘flat’ takes around three hours. There’s also a micro-SD card slot down here, which can be used to boost the player’s 256GB of memory by up to 1TB. 

Astell & Kern A&ultima SP3000 review: a high-end hi-res digital audio player Read More »

Govee Curtain Lights review: I’m obsessed

TechRadar Verdict

Govee continues to wow, this time around with the Govee Curtain Lights, which are a perfect addition to your holiday decorations. Don’t be fooled by their Christmas-wrapped marketing, however. These lights are perfect for year-round use, even when you’re just curled up in a cozy corner with a good book on a rainy day. Fair warning, though: this isn’t a cheap purchase, and the lights aren’t going to look as big as they do in Govee’s marketing images.

Pros

  • +Bright, vibrant and very customizable
  • +Surprisingly easy to set up with 3x ways to hang
  • +Light beads give them a cleaner look
  • +App control and voice command
  • +IP65 waterproof for outdoor use

Cons

  • Individual strings a little far apart
  • Lights not as big as in the product images

Smart light technology and designs just keep getting better and better, and Govee seems to be winning in that arena. The Govee Curtain Lights are another fantastic addition to our best smart lights list. And while the brand is currently promoting them as another offering in its smart Christmas light catalog, they deserve to be left up on your wall or windows – and not just ’til January, as that Taylor Swift song goes.

Truth be told, I’m kind of obsessed with the Govee Curtain Lights, and I’m not just saying that as a strong supporter of smart lights. They add a much prettier, more romantic ambiance to any setting, whether that’s my otherwise messy living room or your garden, that no other smart light – not even the smart string lights that recently hit the market – can replicate. 

That’s not just because these are curtain lights, made up of 20 rows of individual string lights that all hang side by side like delicate willow tree stems. Although, if I’m being perfectly honest, that really does add to their appeal. 

Basically, you don’t just get light patterns with them; you can actually create visual representations of things you see in the real world – falling leaves, pumpkin patches, Santa riding his sleigh, the face of your favorite pet – and you can do all that from your phone in the Govee app. That capability is a massive game-changer, especially for those folks who go all-out for Christmas.
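The reason a curtain layout makes this possible is that 20 side-by-side strings of LEDs form a low-resolution pixel grid, so displaying a picture is essentially downsampling an image to one color per LED. A rough sketch of that idea (the grid dimensions and the function itself are illustrative assumptions, not Govee’s actual app logic):

```python
import numpy as np

def image_to_curtain(image: np.ndarray, strings: int = 20,
                     leds_per_string: int = 33) -> np.ndarray:
    """Downsample an RGB image of shape (H, W, 3) to one color per
    LED by averaging the block of pixels each LED covers."""
    h, w, _ = image.shape
    grid = np.zeros((leds_per_string, strings, 3))
    for row in range(leds_per_string):
        for col in range(strings):
            block = image[row * h // leds_per_string:(row + 1) * h // leds_per_string,
                          col * w // strings:(col + 1) * w // strings]
            grid[row, col] = block.reshape(-1, 3).mean(axis=0)
    return grid.astype(np.uint8)
```

Anything recognisable at roughly 20 x 33 “pixels” – a pumpkin, a paw print, a sleigh silhouette – survives this downsampling, which is why simple, bold shapes work best on the curtain.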

They’re not just for Christmas, however. Put them up in your reading nook, and they’ll cozy up that space even more with twinkling warm lights. Set them in your dining space, and they can elevate the ambience not just for dinner parties but also during winter, when mornings tend to be dark and dreary.

Govee Curtain Lights review: I’m obsessed Read More »

Jabra Evolve 2 65 Flex headset review

A Bluetooth headset that’s ready for business


The Jabra Evolve2 65 Flex is one of the best headsets for working from home and the office. Designed with the hybrid worker in mind, it’s lightweight with ANC, built-in microphone, and an excellent sound profile that can be customized in a welcoming companion app. But it is expensive, and not the most rugged option out there.

Pros

  • +Feather-light
  • +Very comfortable
  • +Excellent sound quality
  • +Slimline design with built-in mic
  • +Good companion app

Cons

  • Plastic build
  • Expensive
  • Occasional issues with mute

The Jabra Evolve2 65 Flex has long been topping our round-up of the best Bluetooth headsets. So, we jumped at the chance to get our hands on the kit to test it out ourselves. But even with an impressively lightweight design, ANC, built-in microphone, and an excellent sound profile, is this high-end headset ready for business?  

JABRA EVOLVE 2 65 FLEX: PRICING & AVAILABILITY

The Jabra Evolve2 65 Flex retails for $329 from the company’s official site, but it is available elsewhere (we saw it selling for about $250 over on Amazon). You can pick between USB-A and USB-C connectivity, and whether it’s optimized for Microsoft Teams or UC. Add in the wireless charging stand and the cost rises to $389. Whichever configuration you choose, those numbers put the headset in the premium price-bracket.  

Influenced by the Apple school of packaging design, unboxing the Evolve2 65 Flex is an experience. Simple, streamlined, effective. 

Easing off the cardboard sleeve reveals a plain black cardboard box with the message ‘It’s what’s inside that counts (that’s why we’ve reduced our packaging).’ We cracked open the lid to find a fabric charcoal case nestled beneath a single instruction card. No room here for bulky manuals destined for the recycling bin or left unread in the kitchen drawer.

It’s difficult to reinvent the wheel when it comes to professional headsets – and who would want to? So yes, the 65 Flex looks exactly as a set of business headphones should look, complete with on-ear cups that extend, swivel, and fold for storage. 

The overall design is a lot slimmer than the Jabra Evolve 2 65 that we reviewed. The memory foam ear cushions are noticeably thinner and smaller, featuring Jabra AirComfort Fit. Gone is the fully cushioned headband, with a single strip of padding now moved to the top. The wireless charger has been reduced from a stand to a pad. The built-in noise-canceling microphone is now only inches long, stowed within the right ear-cup where it can be flipped up or down to automatically mute or unmute. The plastic mic does feel a bit flimsy here – it’s an issue with the headset as a whole really – but we chalk that up more to maintaining the impressively feather-light build rather than cost-cutting. 

Buttons are located to the rear of both cups. These include pairing mode, a mute/voice assistant button, play/pause, and volume/next track controls. On the right outer-ear is a button for answering and ending calls – and in our model, this also gave us Microsoft Teams control. On the left side is the wireless charging zone. LED lights to the top of both ears display headset status. 

As with any of the best noise canceling headphones, the Evolve2 65 Flex boasts advanced Active Noise Cancellation (ANC), which washes away unwanted background sounds. There’s also HearThrough technology, which Jabra says “lets you hear your surroundings and conversations”. Personally, this worked a treat while sharing an office. Coupled with the lightweight design, this makes it oh-so-easy to forget you’re even wearing them.  

Elsewhere, we had no issues. Admittedly, we were a bit worried about an on-ear headset with ANC. We’re more used to the snug isolation of the over-ear Anker Soundcore Q20 for day-to-day listening, but the 65 Flex is surprisingly excellent at blocking out background noise. If you work in a busy office (or just want to concentrate) and don’t want an all-encompassing over-ear model, this is a top choice. 

You can switch between ANC and HearThrough using the Jabra Sound+ app. It’s here where we optimized audio and updated the firmware. There’s also a music equalizer and music presets, which offer options like a bass boost for music or a speech mode for podcasts. If you’re anything like us, you might enjoy ambient noise when focusing on work – we listen to so much, it featured in our Spotify Wrapped – so we especially liked the Soundscape mode. No more searching for playlists, you can quickly switch between the likes of white noise, ocean waves, and birdsong. The app is certainly worth investigating. We found the interface is nice and simple, and even if you’re not traditionally an audiophile, it’s very straightforward to enhance your listening experience. 

It’s not a budget option by any means, although you can hear those extra dollars in the audio. It’s delightfully light, with cushioned pads as soft as clouds. Not too tight but never threatening to tumble off the head – although we wouldn’t recommend anything more active than swiveling in your office chair. However, that lightweight design means the build quality does feel less than robust. The Evolve2 65 Flex lacks the sense it would survive the crunch of a turbulent commute. In that case, you’ll absolutely want to upgrade the soft fabric case to a hard-shell. 

It’s not perfect – mind you, show us a headset that is. Whether the issues are deal-breakers will depend on what you want from a wireless business headset. If you want a cheap headset for the occasional meeting that could’ve been an email, or you’re working out in the field, there are better options out there. If you’re looking for a model that’s comfortable, professional, and svelte, it’s one of the best you can get. 

Jabra Evolve 2 65 Flex headset review Read More »

PNY GeForce RTX 4060 Ti review: a great 1080p GPU with added extras

Fantastic 1080p power that’s approachable

PC gamers looking for some of the most feature-rich 1080p performance available may find the PNY GeForce RTX 4060 Ti an attractive buy. DLSS 3 and improved ray-tracing performance for under $400 is a steal; just understand that many newer games blow through its 8GB of VRAM easily, even at 1080p.

Pros

  • +Fantastic native 1080p performance
  • +DLSS 3 upscaling is great at 1440p
  • +Doesn’t run hot or loud

Cons

  • 8GB VRAM isn’t enough
  • Not very good for overclocking
  • Fairly boring design

When we reviewed the Nvidia GeForce RTX 4060 Ti Founders Edition earlier this year, we were slightly disappointed by the mid-range offering’s small performance boost compared to the base RTX 4060 (let alone the 3060 Ti), alongside its 8GB of VRAM and some design issues. Regardless of its faults, it was still a worthy buy for many reasons: DLSS 3 remains the current standard in AI upscaling tech, and overall ray-tracing performance saw significant improvements as well. Now that third-party versions of the GPU have been released, the PNY GeForce RTX 4060 Ti is a strong contender for the best graphics card using the RTX 4060 Ti GPU on the market. 

Despite still inheriting the under-the-hood flaws of the Founders Edition, the PNY take on the GPU makes some significant improvements in terms of its design. The most obvious is that it only needs a single 8-pin PCIe power connector rather than the special 16-pin adapter. Of course, this means opportunities for overclocking are severely diminished. 

Meanwhile, having only 8GB of VRAM is a shame considering that many of the most visually impressive AAA games released over the past year blow past that even at 1080p. When it comes to the best bang for buck, the 16GB RTX 4060 Ti can be purchased for around $50 more. With DLSS 3 also comes Frame Generation. This uses AI-enhanced hardware to boost frame rates by generating new frames and interleaving them among traditionally rendered GPU frames. While this improves the fluidity and visual smoothness of games, it comes with the trade-off of heightened latency and input lag. Then there’s the reality that only around 50 games even support Frame Generation.
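The fluidity-versus-latency trade-off falls directly out of how interleaving works: the pipeline has to hold each rendered frame until the next one arrives before it can synthesize the in-between frame. A toy sketch of the structure (using naive midpoint blending as a stand-in; DLSS 3 actually synthesizes frames with a dedicated optical-flow accelerator and neural network):

```python
def interleave_generated_frames(rendered):
    """Double the effective frame rate by inserting a synthesized
    frame between each pair of rendered frames. Holding frame N
    back until frame N+1 arrives is where the extra input latency
    comes from."""
    out = []
    for prev, nxt in zip(rendered, rendered[1:]):
        out.append(prev)
        # Naive stand-in for the generated frame: per-pixel midpoint
        out.append([(a + b) / 2 for a, b in zip(prev, nxt)])
    out.append(rendered[-1])
    return out
```

Note that the generated frames never reflect new player input, which is why the smoothness gain doesn’t translate into responsiveness.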

Even when pushing the PNY RTX 4060 Ti past its limit, it manages to stay cool and quiet. Just be mindful that, aesthetically, the overall design is a bit bland. If a potential buyer is looking for something to complement their RGB lighting extravaganza of a build, it’ll unfortunately stand out like a sore thumb. Compared to the PNY card, Nvidia’s Founders Edition remains unmatched for its sleek, unified build.

Those looking for raw native power in the 1440p or above range will need to look at the best 1440p graphics cards and best 4K graphics cards, but this GPU becomes more of a testament to how awesome DLSS 3 is in terms of AI upscaling. Not only can this make 1440p gaming a pleasurable experience, it can handle some games at 4K with some settings tinkering.

If a fantastic 1080p experience playing esports games at high frame rates, like Fortnite and League of Legends, matters more than playing Cyberpunk 2077 or Alan Wake II at max settings, the PNY GeForce RTX 4060 Ti could be considered a seriously attractive purchase, especially when it comes to function over form.

  • How much does it cost?  MSRP listed at $389 but can be found for around $350 (around £395/AU$575) 
  • When is it available? Available now
  • Where can you get it? Available in the US, UK, and Australia

The PNY GeForce RTX 4060 Ti is available now in the US, UK, and Australia. Though the MSRP on PNY’s online store is $389, it can be found for as low as $350 at retailers like Amazon or Newegg. Given how closely the PNY card matches the Founders Edition, interested buyers are usually going to save a solid $10 for the same performance.

For PC gamers on a budget looking for one of the best cheap graphics cards for a new rig, its AMD rival, the RX 7700 XT, is worth a look. Be mindful that AMD’s FidelityFX Super Resolution isn’t as good as DLSS, that Nvidia simply does ray tracing better at the moment, and that the card costs about $40 more. However, the Radeon RX 7700 XT comes packed with 12GB of VRAM, if that matters. When it comes to the overall gaming experience between the two, the GeForce RTX 4060 Ti is a very solid performer.

PNY GeForce RTX 4060 Ti review: a great 1080p GPU with added extras Read More »

Narwal Freo review: the vacuuming and mopping robot vacuum you want to love

Great mop performance but less than exceptional vacuuming

With excellent mopping, a long battery life, a mop-cleaning base station with a handy touchscreen, and an intuitive app, the Narwal Freo has a lot to offer. However, given the mediocre vacuum performance and the lack of an auto-emptying dust bin, combined with a high price tag, this robot vacuum leaves something to be desired and is best for households with lighter cleaning needs.

Pros

  • +Handy LCD touchscreen control panel on the base station
  • +Excellent self-cleaning mops
  • +Long battery life

Cons

  • Mediocre vacuum pick-up
  • No auto-emptying dust bin
  • Expensive

The Narwal Freo offers everything you’d expect from one of the best robot vacuums. Beyond vacuuming, it has mopping, an intuitive app, long battery life, and a base station with auto mop-cleaning and an LCD touchscreen for extra control. But the question is, do these features deliver? Almost all of them do, except probably the most important one: vacuuming.  

When it came to vacuuming, the Narwal Freo sucked, and not in a way that vacuums are supposed to. It failed to pick up debris during everyday cleaning tasks on carpeted and hard floors, leaving a larger-than-expected amount of hair, crumbs, and other dirt behind as it traversed my space, with its performance worsening over time. Edge brushes and other “special” technology did little to expel dirt from edges and corners, meaning you’ll want to grab one of the best vacuum cleaners to finish the job this device failed to complete. 

Mopping on the Narwal Freo was a different story. The two oscillating mop heads did an excellent job cleaning up lighter dirt, spots, and grime. As a whole, the robot vacuum also did a decent job navigating my space and freeing itself when it got stuck – not the best I’ve seen, but on par with many robot vacuums I’ve tested. After mopping, my floors sparkled, while the auto mop-cleaning on the base station made the entire process virtually hands-off.  

Speaking of that base station, it’s bulky, but the unique LCD touchscreen on its lid is especially useful when you don’t want to use the app. However, the omission of an auto-emptying dustbin was shocking given the retail price. For more control over settings and cleanings, the app was great, and you can even save multiple maps, making it ideal for multi-level spaces. 

The Narwal Freo is best for homes with lighter cleaning needs given the poor vacuum pick-up. However, it’s almost entirely hands-free and will leave your floors looking better than before with little effort on your part, removing a few chores from the list. 

NARWAL FREO: PRICE AND AVAILABILITY

  • How much does it cost? $1,399.99 / AU$1,999 (about £1,100)
  • When is it available? Available now
  • Where is it available? Available in the US and Australia

The Narwal Freo costs $1,399.99 / AU$1,999 (about £1,100). You can get it directly from the Narwal website or from various retailers, including Amazon and Walmart. In Australia, it’s available via Narwal’s website.

Given the price, this robot vacuum sits at the higher end of the market. Luckily, it offers many features to help justify that cost, including self-cleaning oscillating mops and an LCD touchscreen. Still, the lack of an auto-emptying dust bin is shocking. If you can grab it on sale, it will make the device a much better value. One small but much-appreciated detail is the inclusion of a floor cleaning solution, but it costs a pretty penny when that needs replacing. 

Something like the Eufy Clean X9 Pro offers similar functionality to the Narwal Freo, including self-cleaning and oscillating mops, and it retails for $500 less, making it a better deal. But if you’re looking for almost everything a robot vacuum can offer in one convenient package, the Roborock S8 Pro Ultra might suit you better. With it comes self-cleaning mops and the auto-emptying dust bin that the Narwal Freo lacks – although this impressive vacuum will set you back $1,599 / AU$2,699 (about £1,265).

  • Value: 3.5 / 5

NARWAL FREO: SPECIFICATIONS

Wattage: 45W (vacuum) / 72W (base)
Suction power: 3,000Pa
Speeds: Quiet, Normal, Strong, Super Powerful
Bin volume: 480ml
Battery life: 180 minutes (Freo Mode)
Filtration: Yes
Noise volume: 65dB (vacuum) / 50dB (base)
Mop water volume: Not specified
Water levels: Slightly dry, normal, wet mopping
Mapping: Yes
Obstacle avoidance: Yes
Base: 14.6 x 16.3 x 17.1 in (370 x 415 x 435 mm)
Smart support: Siri
Tools: None
Weight: 9.59 lbs (4.35 kg)

NARWAL FREO: DESIGN AND FEATURES

  • LCD touchscreen control panel on base station
  • Auto mop cleaning base, no auto emptying
  • Two oscillating mop heads

The Narwal Freo came in a massive, heavy box that was difficult to maneuver on my own. Upon opening, I was greeted with a large instruction sheet and began setting up the vacuum. The process took about 10 minutes, including downloading the Narwal app and connecting to Wi-Fi via a 2.4GHz band. It was fairly simple and similar to most robot vacuums. 

One glaring omission from the base station’s design is an auto-emptying dustbin, something I’ve seen on almost every robot vacuum in its price range. Instead, you get that floor solution that tucks neatly inside along with clean and dirty water tanks for the self-cleaning mops. That means you’ll need to empty the 480ml dust box on the robot vacuum itself, which can be annoying. However, the tray where the mops are cleaned is removable, so you can rinse it down if it looks or smells a bit grimy.

The robot vacuum is similar to others, with a large main roller brush featuring actual bristles, edge brushes, and various sensors throughout. It’s the same white as the base, so scuff marks began to show immediately after the initial use. There’s only one button on the device, giving you limited control unless you’re using the LCD touch screen or the app. The dust box is easy to remove, though I found that some contents would fall out in the process, which is annoying given the fact that there’s no auto-emptying dust bin. 

My favorite part of the actual robot vacuum is the oscillating mops. You get two large, plush mop heads that rotate and adjust pressure based on the floor type. I’ve found that this type of mopping does a better job of cleaning floors than the vibrating mopping pads seen on most. After mopping, the base station cleans the mops and even dries them to prevent smelly bacteria growth. 

I’ve mentioned controlling the vacuum via the app or the LCD touchscreen on the base, but you can also send the vacuum out to clean using smart home integration. It currently supports Siri voice control, and the Narwal app makes it insanely simple to set up – something I can’t say for other vacuums I’ve tested. 

NARWAL FREO: PERFORMANCE

  • Easy-to-use app
  • Excellent mopping
  • Mediocre vacuuming

For its first task, I sent the Narwal Freo out using Narwal’s unique Freo Mode that detects the dirt in an area and cleans accordingly using “DirtSense Technology.” The vacuum and mops are both used in this mode. The device navigated my downstairs with relative ease, though it would occasionally get tripped up on rugs, eventually freeing itself without my help. After finishing cleaning a room, or sometimes more often, the vacuum would go back to the base and clean the mops. This process takes about two minutes. Then, it would go right back out, picking up where it left off cleaning. 

Freo Mode left the floors cleaner than before, but the performance wasn’t perfect. Most of the spots from food spills and muddy boots got mopped up, though the mops that are supposed to lift on rugs and carpet wouldn’t always do so, soaking the edges of rugs. There was still debris left in the corners and edges of rooms, especially near the kitchen cabinets. Given this vacuum advertises a “Smart Swing” technology to combat this issue, I was disappointed the feature wasn’t better. The rugs also had some debris and dog hair left on them. It’s important to note that I have a fluffy dog constantly traipsing leaves and muck throughout the house, so this vacuum had its work cut out for it.

I did more intensive testing of the Narwal Freo’s vacuuming to see how it fared when cleaning up different sizes of debris. Using a large concentration of oats, sugar, and sprinkles, I tested its pick up on a hard laminate floor at the vacuum’s various speeds: quiet, normal, strong, and super powerful. I noticed that each suction level performed similarly. 

Some of the oats and sprinkles got flung around in the first pass-through, but sending the vacuum out a second time saw most of the mess suctioned up. Some sprinkles got crushed in the process, and they were left behind. The sugar appeared to get vacuumed. However, upon closer inspection, there was some grittiness on the floor, and it took several passes to remove it. 

I sent the vacuum back to the base after these tests – the robot vacuum successfully found the base and docked every time it finished a cleaning task. But on its way, it had to pass over several transitions, losing some of the contents of the dust box and leaving a mess of sprinkles and oats behind. Luckily, the robot vacuum increases suction when docking at the base, helping to prevent the dust box contents from falling out. 

I performed these same tests on medium-pile carpeting, and unfortunately, the Narwal Freo’s performance was pretty pathetic. No matter the suction level and even with a second pass-through, most of the oats, sprinkles, and flour were left behind. I had to grab a cordless vacuum I was testing to pick up the mess the Freo left behind. So, if your home consists mostly of carpeting, I’d seek another robot vacuum option. 

Its mops were also put through more intensive testing, as I spread yogurt, honey, and some of my morning coffee on the floor. I used all the mop water levels: slightly dry, normal, and wet mopping. Slightly dry tended to spread the mess around, but normal and wet mopping performed better. After the first pass, the coffee was gone, though the yogurt was smeared around while only some of the honey was removed. A second pass-through cleaned up the majority of the mess. 

I love how well the mops perform. They’re perfect for cleaning up lighter spills and messes. When emptying the dirty water tank, I could see just how well they were working, as that water was nasty. Plus, even after several weeks of use, the mops look almost as good as new. They are white, so there are a few darker spots on them, but there’s no odor, which is a testament to the handy auto-cleaning and drying feature on the base station. 

Beyond the more intensive testing, I observed how the Narwal Freo performed everyday tasks, whether it was in Freo Mode, Vacuum, Mop, or both. 

Its navigation was on par with other vacuums I’ve tested. For the most part, it covered the entire area I had requested the robot vacuum to clean. The device would avoid objects like dog bowls and toys. But when it came to furniture and larger obstacles, it would skirt nicely around some or just fully ram others with no rhyme or reason. Sometimes, the Freo would get tripped up by an obstacle for several minutes, continuously running into it or spinning around it. I’ve found this to be a common issue with many robot vacuums. Wires would also get caught in the main brush from time to time–not a big surprise. 

Speaking of the main brush, it has bristles, something many robot vacuums have done away with. That means it’s a hair magnet, and I had to clean it on multiple occasions. I also found the brush difficult to get back in place correctly after cleaning, a minor annoyance. 

When it came to detecting debris, it was hit or miss. Sometimes, the Narwal Freo would spot larger messes and pick them up immediately. Other times, it seemingly avoided the mess and never went back to clean up, proving the vacuum unreliable. 

As the Narwal Freo vacuumed, it attempted to kick out debris from hard-to-reach places, corners, and baseboards using the edge brushes. Oftentimes, it didn’t successfully move the debris, and if it did move the debris, that debris never actually got suctioned up. This was a major disappointment, especially given the price. 

In fact, I was truly shocked at just how mediocre the vacuuming performance of the Narwal Freo was. I’ll admit that my floors were full of crumbs, pet hair, leaves, and other debris, making them messier than the average household. But I was lucky if the Freo picked up a third of what was on the floor. Sure, larger crumbs and dirt were left, and that’s acceptable and often expected from these devices. However, small leaves, tiny needles from an artificial Christmas tree, and minuscule crumbs were left behind even after I sent the vacuum out multiple times. 

I also believe the vacuum’s performance declined from when I first began using it. I tried to remedy the problem, doing everything from emptying the dust box after each use to cleaning the brushes and filter. Still, it failed to have a better pick-up. That poor vacuuming performance could be due to the 3,000Pa max suction level, which is pretty low considering the cost. Therefore, if your household has pets, kids, or just tends to get a bit grimier, I’d steer clear of the Narwal Freo.   

  • Performance: 2.5 / 5

NARWAL FREO: APP

  • Easy to use app
  • Mapping uncomplicated 

It was simple to start using the Narwal Freo. Before its first run, the robot vacuum leaves the base and creates a map of your space. The process was quick, and I had a relatively accurate map of the downstairs of my home, which is about 700 square feet with multiple rooms, in about 15 minutes. You can then edit the map, block off certain areas, and name rooms using the Narwal app. The map isn’t as intelligent as some I’ve used, but it should suffice for most.

A great feature of the Narwal app is its ability to save up to four maps. So, beyond the main downstairs map, I created two others: one of my sunken family room and another of the upstairs. Mapping was uncomplicated – you just move the robot vacuum to the space and let it do its thing. However, you can’t select specific rooms to clean on the additional maps; the app only allows you to highlight areas to be cleaned, which can be tedious.

Overall, though, the app is easy to use and took me only a couple of minutes to master. It lets you adjust vacuum settings, check when components need replacing, schedule cleanings, and more. If you’d rather not use the app, you can always use the LCD touchscreen on the base, though you’ll have less control over the specifics of your cleaning.

  • App: 4.5 / 5

NARWAL FREO: BATTERY LIFE

  • Battery lasts over three hours
  • Takes less than 4 hours to recharge

The Narwal Freo is equipped with a 5,200mAh battery that lasts an impressive amount of time. Using Freo Mode, which includes vacuuming and mopping, the battery lasted over three hours. That was enough juice to clean almost 700 square feet of space three times. It’s the best battery performance I’ve seen in my robot vacuum testing. 

When only using the vacuuming function, I found that the battery depleted more quickly. Still, it lasted long enough for multiple whole-home cleanings. Of course, increasing the suction level caused it to drop even faster.

After the battery dropped below 20%, the robot returned to the base for charging. There’s an option to send it back out to complete a task once it has reached a certain level of charge. And the battery gets back to 100% surprisingly fast, taking less than 4 hours.

  • Battery: 5 / 5

SHOULD I BUY THE NARWAL FREO?

Attribute | Notes | Score
Value | Expensive but feature-rich vacuum; a similar option retails for less | 3.5 / 5
Design | Easy to set up; base station washes and dries mops but no auto-emptying; useful LCD touchscreen on the base; oscillating mops on the robot vacuum | 4 / 5
Performance | The two oscillating mop pads work great, but vacuum pick-up and edge clean-up are mediocre | 2.5 / 5
App | The app is simple to use and offers multi-level home mapping | 4.5 / 5
Battery life | Battery lasts over three hours depending on the mode and recharges quickly in under 4 hours | 5 / 5

Buy it if…

You want top-tier mopping.

The Narwal Freo features two oscillating mops that put the vibrating mopping pads seen on most robot vacuums to shame. The base station cleans and dries the mops, leaving them in great condition even after several weeks of use. 

You have a multilevel home.

Unlike many robot vacuum apps, the Narwal app allows you to create up to four maps. So, if you have different levels in your home, you won’t need to worry about deleting your current map to clean another part of your space. 

You don’t always want to use an app.

The Narwal Freo has a unique LCD touchscreen on the base station, allowing you to select different modes and send the robot vacuum out to clean. Beyond that, it gives details about when components need replacing, shows your network settings, and more.

Don’t buy it if…

You have pets or kids in your home.

The Narwal Freo fails to pick up a good portion of debris when performing average cleaning tasks. So, if your house is prone to crumbs, hair, and dirt, this vacuum won’t be able to keep up. You’ll want to grab an option with more suction power.

You have a mostly carpeted home.

Given the mediocre vacuuming performance, especially on carpeting, and the high price tag, the main reason to buy this vacuum is its excellent mops. If you don’t have hard floors, you can find better-performing vacuum-only options for less.

You want an auto-emptying dustbin.

Unfortunately, the base station of this robot vacuum doesn’t include an auto-emptying dust bin. That means you’ll need to remove the dust box and empty it. It’s a surprising omission, considering the price of the vacuum. 

NARWAL FREO: ALSO CONSIDER

Not sold on the prowess of the Narwal Freo? Below are a couple of alternatives that you can consider.

Spec | Narwal Freo | Roborock S8 Pro Ultra | Eufy Clean X9 Pro
Price | $1,399.99 / AU$1,999 (about £1,100) | $1,599.99 / AU$2,699 (about £2,370) | $899.99 / £899.99 / AU$1,499.95
Wattage | 45W (vacuum) / 72W (base) | – | –
Suction power | 3,000Pa | 6,000Pa | 5,500Pa
Speeds | Quiet, Normal, Strong, Super Powerful | – | –
Bin volume | 480ml | 0.66 gallons (2.5L) | 13.9 oz (410ml)
Battery life | 180 minutes (Freo Mode) | 180 minutes (quiet mode) | 150 minutes (standard vacuum/mop setting)
Filtration | Yes | – | –
Noise volume | 65dB (vacuum), 50dB (base) | 69dB (vacuum), 77dB (base) | 65dB (vacuum), 50dB (base)
Mop water volume | Not specified | 0.92 gallons (3.5L) | 1.1 gallons (4.1L)
Water levels | Slightly dry, normal, wet mopping | – | –
Mapping | Yes | Yes | Yes
Obstacle avoidance | Yes | Yes | Yes
Base dimensions | 14.6 x 16.3 x 17.1 in (370 x 415 x 435 mm) | 16.7 x 20.2 x 17.7 in (42.4 x 51.3 x 45 cm) | 17.4 x 16.6 x 16.4 in (44.3 x 42.2 x 41.6 cm)
Smart support | Siri | Google Assistant, Amazon Alexa, Siri | Amazon Alexa, Google Assistant
Tools | None | – | –
Weight | 9.59 lbs (4.35 kg) | 10 lbs (vacuum) | 31.7 lbs (14.4 kg)

Roborock S8 Pro Ultra
An impressive but pricey robot vacuum that offers both vacuuming and mopping, with a self-cleaning, auto-emptying docking station for a mostly hands-off cleaning experience. An intuitive app delivers intelligent mapping as well as easy adjustment of settings.

Read our full Roborock S8 Pro Ultra review 


Eufy Clean X9 Pro
A solid robot vacuum that vacuums and mops. The rotating mops are great at removing spills and spots on your floor, while the base station’s auto-cleaning feature washes the mops for you. Unfortunately, there’s no auto-emptying for the dust box. There’s also an intuitive app that creates an intelligent map and makes it simple to adjust various settings.

Read our full Eufy Clean X9 Pro review

HOW I TESTED THE NARWAL FREO

  • Tested over the course of several weeks
  • Used almost every mop and vacuum setting
  • Tested on various floor types, including carpet and laminate

I tested the Narwal Freo in my two-story home with floor types that include hardwood, medium pile carpet, tile, and laminate. There are also low-pile rugs throughout. I’d send the vacuum out multiple times per week using the different modes: Freo Mode, Vacuuming and Mopping, Vacuuming, and Mopping. The robot vacuum would do its thing, and I would only intervene if needed, observing how it handled obstacles, edges, and more. 

Beyond the basics, I did more intensive testing of the device on both hard floor and carpeting to see how it handled larger messes of varying debris sizes. Using oats, flour, and sprinkles, I tested all the suction levels of the vacuum to see how well each setting vacuumed.  I also spread yogurt, honey, and coffee on the floor to observe the mops’ performance at varying water levels. 

Although this is the first time I’ve tested a Narwal robot vacuum, I have reviewed plenty of others from top brands like Shark, Roborock, Ecovacs, Eufy, and more, so I feel confident in my experience using these devices.  

We pride ourselves on our independence and our rigorous review-testing process, offering up long-term attention to the products we review and making sure our reviews are updated and maintained – regardless of when a device was released, if you can still buy it, it’s on our radar.

Narwal Freo review: the vacuuming and mopping robot vacuum you want to love

Philips Fidelio L4 review: rich and crisp audio quality

Almost a class-leading option

The Philips Fidelio L4 are almost at the level of the Sony WH-1000XM5 and other popular cans. They sound great and fit well, but some weird bugs let them down slightly and leave you wondering if the next revision will be the perfect pair.


Pros

  • +Rich and crisp audio
  • +Reliable ANC
  • +Comfy build

Cons

  • Disconnection issues
  • They don’t fold
  • Not exactly stylish

I absolutely want to love the Philips Fidelio L4. In many ways, they undercut the Sony WH-1000XM5 perfectly, delivering for less all the features you’d want from one of the best pairs of headphones.

The problem lies in the execution. The Philips Fidelio L4 suffer from a few too many disconnection issues. When playing, they sound fantastic. Audio is rich and vibrant, with just the right amount of bass to ensure you don’t miss out on crisp mids and trebles. The problem is that sometimes they simply drop out, and I can’t figure out why. Instinctively, wear detection seemed like the culprit, but even with it disabled, the Philips Fidelio L4 would sometimes just switch off, acting like they were doing you a favor.

It’s frustrating because the Philips Fidelio L4 are good enough that they still deserve a high rating. Besides the exceptional sound quality, there’s up to 50 hours of battery life, which easily beats the competition. A 15-minute charge gives back 14 hours, which is ridiculously good going.

For $349 / £300, the Philips Fidelio L4 are well-priced among strong competition, even if they’re not the most exciting-looking. You’ll love how great they sound until they cut out and you’re left wondering just what you did while you restart them. Still, weirdly, they sound so good that it’s a little easier to forgive than maybe it should be.

  • Released in December 2023
  • Officially priced at $349 / £300

The Philips Fidelio L4 were released in December 2023 for £300. Currently available in the UK, they are also set for release in the US for $349, although, at the time of writing, they aren’t available to buy there.

The headphones are available solely in black – a fairly traditional color for headphones – so don’t go looking for fancy colorways. 

At this price point, the Philips Fidelio L4’s biggest rivals are the Sony WH-1000XM5 and the Bose QuietComfort 45, which offer very similar features but more brand recognition and, as we’ll see later, more reliability too. There’s always the Apple AirPods Max if you want to spend more.

Spec | Philips Fidelio L4
Drivers | 40mm
Active noise cancellation | Yes
Battery life | Up to 40 hours (ANC on), up to 50 hours (ANC off)
Weight | 330g
Connectivity | Bluetooth 5.3, 3.5mm
Waterproofing | No

The Philips Fidelio L4 pack all the key features you could need – at least when they’re working nicely. The Philips headphones app guides you through the essentials: wear detection, auto on/off, and an adjustable EQ. The latter comes with four presets, but it’s always good to be able to adjust things for yourself too.

Other useful features include LDAC support, voice assistant functionality, spatial audio (although no head tracking), and touch controls. Multipoint support means it’s easy to hook the cans up to multiple devices at once – a feature that’s fast becoming essential given how many devices I switch between daily. There’s also Bluetooth 5.3 support and a 3.5mm jack for crisper wired listening.

Sounds perfect, right? Yes and no. While using the Philips Fidelio L4, I found them very prone to random disconnections. I could be idly listening to a song when, suddenly, the headphones would make a couple of shutting-down-style sounds and do exactly that. At first, it seemed like a battery issue, but turning off wear detection and touch controls seemed to help matters. It’s a weird one to pin down, as it doesn’t seem to be entirely down to wear detection, but it’s an irritant on an otherwise exceptional pair of cans – and for some, it will be a dealbreaker.

