Author name: Shannon Garcia

What happened to OpenAI’s long-term AI risk team?

disbanded —

Former team members have either resigned or been absorbed into other research groups.

In July last year, OpenAI announced the formation of a new research team that would prepare for the advent of supersmart artificial intelligence capable of outwitting and overpowering its creators. Ilya Sutskever, OpenAI’s chief scientist and one of the company’s co-founders, was named as the co-lead of this new team. OpenAI said the team would receive 20 percent of its computing power.

Now OpenAI’s “superalignment team” is no more, the company confirms. That comes after the departures of several researchers involved, Tuesday’s news that Sutskever was leaving the company, and the resignation of the team’s other co-lead. The group’s work will be absorbed into OpenAI’s other research efforts.

Sutskever’s departure made headlines because although he’d helped CEO Sam Altman start OpenAI in 2015 and set the direction of the research that led to ChatGPT, he was also one of the four board members who fired Altman in November. Altman was restored as CEO five chaotic days later after a mass revolt by OpenAI staff and the brokering of a deal in which Sutskever and two other company directors left the board.

Hours after Sutskever’s departure was announced on Tuesday, Jan Leike, the former DeepMind researcher who was the superalignment team’s other co-lead, posted on X that he had resigned.

Neither Sutskever nor Leike responded to requests for comment. Sutskever did not offer an explanation for his decision to leave but offered support for OpenAI’s current path in a post on X. “The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial” under its current leadership, he wrote.

Leike posted a thread on X on Friday explaining that his decision came from a disagreement over the company’s priorities and how many resources his team was being allocated.

“I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point,” Leike wrote. “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.”

The dissolution of OpenAI’s superalignment team adds to recent evidence of a shakeout inside the company in the wake of last November’s governance crisis. Two researchers on the team, Leopold Aschenbrenner and Pavel Izmailov, were dismissed for leaking company secrets, The Information reported last month. Another member of the team, William Saunders, left OpenAI in February, according to an Internet forum post in his name.

Two more OpenAI researchers working on AI policy and governance also appear to have left the company recently. Cullen O’Keefe left his role as research lead on policy frontiers in April, according to LinkedIn. Daniel Kokotajlo, an OpenAI researcher who has coauthored several papers on the dangers of more capable AI models, “quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI,” according to a posting on an Internet forum in his name. None of the researchers who have apparently left responded to requests for comment.

OpenAI declined to comment on the departures of Sutskever or other members of the superalignment team, or the future of its work on long-term AI risks. Research on the risks associated with more powerful models will now be led by John Schulman, who co-leads the team responsible for fine-tuning AI models after training.

The superalignment team was not the only team pondering the question of how to keep AI under control, although it was publicly positioned as the main one working on the most far-off version of that problem. The blog post announcing the superalignment team last summer stated: “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

OpenAI’s charter binds it to developing so-called artificial general intelligence, or technology that rivals or exceeds humans, safely and for the benefit of humanity. Sutskever and other leaders there have often spoken about the need to proceed cautiously. But OpenAI has also been quick to develop and release experimental AI projects to the public.

OpenAI was once unusual among prominent AI labs for the eagerness with which research leaders like Sutskever talked of creating superhuman AI and of the potential for such technology to turn on humanity. That kind of doomy AI talk became much more widespread last year after ChatGPT turned OpenAI into the most prominent and closely watched technology company on the planet. As researchers and policymakers wrestled with the implications of ChatGPT and the prospect of vastly more capable AI, it became less controversial to worry about AI harming humans or humanity as a whole.

The existential angst has since cooled—and AI has yet to make another massive leap—but the need for AI regulation remains a hot topic. And this week OpenAI showcased a new version of ChatGPT that could once again change people’s relationship with the technology in powerful and perhaps problematic new ways.

The departures of Sutskever and Leike come shortly after OpenAI’s latest big reveal—a new “multimodal” AI model called GPT-4o that allows ChatGPT to see the world and converse in a more natural and humanlike way. A livestreamed demonstration showed the new version of ChatGPT mimicking human emotions and even attempting to flirt with users. OpenAI has said it will make the new interface available to paid users within a couple of weeks.

There is no indication that the recent departures have anything to do with OpenAI’s efforts to develop more humanlike AI or to ship products. But the latest advances do raise ethical questions around privacy, emotional manipulation, and cybersecurity risks. OpenAI maintains another research group called the Preparedness team that focuses on these issues.

This story originally appeared on wired.com.

OpenAI will use Reddit posts to train ChatGPT under new deal

Data dealings —

Reddit has been eager to sell data from user posts.

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI’s large language models (LLMs) to build features for Reddit users, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.”

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its pursuit of AI partnerships has sparked debate over the use of user-generated content to fuel AI models without compensating users, some of whom likely never considered that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI will also have to contend with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods. Some have since left the platform, and following the rule changes, Ars Technica spoke with longtime Redditors who were concerned about the quality of Reddit content going forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Financial institutions have 30 days to disclose breaches under new rules

REGULATION S-P —

Amendments contain loopholes that may blunt their effectiveness.

The Securities and Exchange Commission (SEC) will require some financial institutions to disclose security breaches within 30 days of learning about them.

On Wednesday, the SEC adopted changes to Regulation S-P, which governs the treatment of the personal information of consumers. Under the amendments, institutions must notify individuals whose personal information was compromised “as soon as practicable, but not later than 30 days” after learning of unauthorized network access or use of customer data. The new requirements will be binding on broker-dealers (including funding portals), investment companies, registered investment advisers, and transfer agents.

“Over the last 24 years, the nature, scale, and impact of data breaches has transformed substantially,” SEC Chair Gary Gensler said. “These amendments to Regulation S-P will make critical updates to a rule first adopted in 2000 and help protect the privacy of customers’ financial data. The basic idea for covered firms is if you’ve got a breach, then you’ve got to notify. That’s good for investors.”

Notifications must detail the incident, what information was compromised, and how those affected can protect themselves. In what appears to be a loophole in the requirements, covered institutions don’t have to issue notices if they establish that the personal information has not been used in a way to result in “substantial harm or inconvenience” or isn’t likely to.

The amendments will require covered institutions to “develop, implement, and maintain written policies and procedures” that are “reasonably designed to detect, respond to, and recover from unauthorized access to or use of customer information.” The amendments also:

• Expand and align the safeguards and disposal rules to cover both nonpublic personal information that a covered institution collects about its own customers and nonpublic personal information it receives from another financial institution about customers of that financial institution;

• Require covered institutions, other than funding portals, to make and maintain written records documenting compliance with the requirements of the safeguards rule and disposal rule;

• Conform Regulation S-P’s annual privacy notice delivery provisions to the terms of an exception added by the FAST Act, which provide that covered institutions are not required to deliver an annual privacy notice if certain conditions are met; and

• Extend both the safeguards rule and the disposal rule to transfer agents registered with the Commission or another appropriate regulatory agency.

The requirements also broaden the scope of nonpublic personal information covered beyond what the firm itself collects. The new rules will also cover personal information the firm has received from another financial institution.

SEC Commissioner Hester M. Peirce voiced concern that the new requirements may go too far.

“Today’s Regulation S-P modernization will help covered institutions appropriately prioritize safeguarding customer information,” she wrote (https://www.sec.gov/news/statement/peirce-statement-reg-s-p-051624). “Customers will be notified promptly when their information has been compromised so they can take steps to protect themselves, like changing passwords or keeping a closer eye on credit scores. My reservations stem from the breadth of the rule and the likelihood that it will spawn more consumer notices than are helpful.”

Regulation S-P hadn’t been substantially updated since its adoption in 2000.

Last year, the SEC adopted new regulations requiring publicly traded companies to disclose security breaches that materially affect or are reasonably likely to materially affect business, strategy, or financial results or conditions.

The amendments take effect 60 days after publication in the Federal Register, the official journal of the federal government that publishes regulations, notices, orders, and other documents. Larger organizations will have 18 months to comply after modifications are published. Smaller organizations will have 24 months.

Public comments on the amendments are available here.

GPT-4o My and Google I/O Day

At least twice the speed! At most half the price!

That’s right, it’s GPT-4o My.

Some people’s expectations for the OpenAI announcement this week were very high.

Spencer Schiff: Next week will likely be remembered as one of the most significant weeks in human history.

We fell far short of that, but it was still plenty cool.

Essentially no one’s expectations for Google’s I/O day were very high.

Then Google, in a presentation that was neither especially exciting nor easy to parse, announced a new version of basically everything AI.

That plausibly includes, effectively, most of what OpenAI was showing off. It also includes broader integrations and distribution.

It is hard to tell who has the real deal, and who does not, until we see the various models at full power in the wild.

I will start with and spend the bulk of this post on OpenAI’s announcement, because they made it so much easier, and because ‘twice as fast, half the price, available right now’ is a big freaking deal we can touch in a way that the rest mostly isn’t.

But it is not clear to me, at all, who we will see as having won this week.

So what have we got?

OpenAI: GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.

It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

They are if anything underselling the speedup factor. This is a super important advance in practical terms. In other languages, they cut the number of tokens quite a lot, so the speed and cost advancements will be even bigger.
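For developers this lands as a drop-in change rather than a new product: the same chat completions endpoint, with the model name swapped. Here is a minimal sketch, assuming the official `openai` Python package and an API key in the environment; actual latency and pricing will of course vary.

```python
# Minimal sketch of calling GPT-4o through the existing chat completions API.
# Assumes the official `openai` Python package (v1+) and OPENAI_API_KEY set in
# the environment; nothing else about the calling code needs to change.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."}
    ],
)
print(response.choices[0].message.content)
```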

Here is Sam Altman’s message:

There are two things from our announcement today I wanted to highlight.

First, a key part of our mission is to put very capable AI tools in the hands of people for free (or at a great price). I am very proud that we’ve made the best model in the world available for free in ChatGPT, without ads or anything like that.

Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.

We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people.

Second, the new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change.

The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.

Talking to a computer has never felt really natural for me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before.

Finally, huge thanks to the team that poured so much work into making this happen!

Altman also had a Twitter thread here.

Andrej Karpathy: The killer app of LLMs is Scarlett Johansson. You all thought it was math or something.

Metakuna: Finally, “Her”, from the hit movie “don’t build Her.”

Brian Merchant: Why would Sam Altman actively compare his new product to Her, a film that condemns AI as harmful to human society? Because to him, and many tech CEOs, the dystopia is the point. ‘Useful dystopias’ like this help position and market their products.

Daniel Eth: Wait, was Her a dystopia? I thought it was neither a utopia nor a dystopia (kinda rare for sci-fi honestly).

[GPT-4o] agrees with me that Her is neither utopia nor dystopia.

Alexa has its (very limited set of) uses, but at heart I have always been a typing and reading type of user. I cannot understand why my wife tries to talk to her iPhone at current tech levels. When I saw the voice demos, all the false enthusiasm and the entire personality of the thing made me cringe, and want to yell ‘why?’ to the heavens.

But at the same time, talking and having it talk back at natural speeds, and be able to do things that way? Yeah, kind of exciting, even for me, and I can see why a lot of other people will much prefer it across the board once it is good enough. This is clearly a giant leap forward there.

They also are fully integrating voice, images and video, so the model does not have to play telephone with itself, nor does it lose all the contextual information like tone of voice. That is damn exciting on a practical level.

This is the kind of AI progress I can get behind. Provide us with more mundane utility. Make our lives better. Do it without ‘making the model smarter,’ rather make the most of the capabilities we already have. That minimizes the existential risk involved.

This is also what I mean when I say ‘even if AI does not advance its core capabilities.’ Advances like this are fully inevitable, this is only the beginning. All the ‘AI is not that useful’ crowd will now need to move its goalposts, and once again not anticipate even what future advances are already fully baked in.

Will Depue (OpenAI): I think people are misunderstanding gpt-4o. it isn’t a text model with a voice or image attachment. it’s a natively multimodal token in, multimodal token out model.

You want it to talk fast? Just prompt it to. Need to translate into whale noises? Just use few shot examples.

Every trick in the book that you’ve been using for text also works for audio in, audio out, image perception, video perception, and image generation.

For example, you can do character consistent image generation just by conditioning on previous images. (see the blog post for more)

[shows pictures of a story with a consistent main character, as you narrate the action.]

variable binding is pretty much solved

“An image depicting three cubes stacked on a table. The top cube is red and has a G on it. The middle cube is blue and has a P on it. The bottom cube is green and has a T on it. The cubes are stacked on top of each other.” [images check out]

3d object synthesis by generating multiple views of the same object from different angles
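The slice of that multimodality exposed through the public API at launch was text plus image input (audio and image output were still being rolled out), so a minimal sketch of the “multimodal token in” side looks like the following; the image URL is a placeholder.

```python
# Sketch of a text-plus-image request to GPT-4o via the chat completions API.
# Assumes the official `openai` Python package; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the stacked cubes in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cubes.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```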

One entity that did not read the announcement is… GPT-4o. So Riley Goodside instructed it to add its new identity to its memories. I wonder if that changes anything.

The announcement of GPT-4o says it ‘matches’ GPT-4-Turbo performance on text in English and code, if you discount the extra speed and reduced cost.

The benchmarks and evaluations then say it is mostly considerably better?

Here are some benchmarks.

This suggests that GPT-4’s previous DROP performance was an outlier. GPT-4o is lower there, although still on par with the best other models. Otherwise, GPT-4o is an improvement, although not a huge one.

Alternatively we have this chart, where ‘human average’ on MathVista is claimed at 60.3.

Danielle Fong: The increase in performance on the evals is nothing to sneeze at, but based on my experience with early gpt 3.5 turbo, i bet the performance gain from ablating much of the safety instructions from the system prompt would be greater.

Agreed that the above is nothing to sneeze at, and that it also is not blowing us away.

There are reports (covered later on) of trouble with some private benchmarks.

Here is the most telling benchmark, the Arena. It is a good chatbot.

Perhaps suspiciously good, given the other benchmark scores?

At coding, it is even more impressive going by Arena.

William Fedus (OpenAI) points out that Elo performance is bounded by the difficulty of questions. If people ask questions where GPT-4-Turbo and Claude Opus are already giving fully correct or optimal answers, or where the user can’t tell they’re wrong or not as good, then it comes down to style preference, and your win percentage will be limited.

This is much better with Elo-style ratings than with fixed benchmarks, since humans will respond to greater capabilities by asking better and harder questions. But also a lot of questions humans want to ask are not that hard.
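To make Fedus’s ceiling concrete, here is a rough sketch of the arithmetic using the standard Elo logistic formula; the assumption that voters effectively flip a coin on half the prompts is purely illustrative, not a number from anyone’s announcement.

```python
import math

def implied_elo_gap(win_prob: float) -> float:
    """Elo rating difference implied by a head-to-head win probability."""
    return 400 * math.log10(win_prob / (1 - win_prob))

# If both models answer half the prompts equally well (voters effectively flip
# a coin there), even a strictly better model caps out around a 75% win rate:
# 0.5 * 1.0 (wins the discriminating half) + 0.5 * 0.5 (coin flip on the rest).
capped_win_rate = 0.5 * 1.0 + 0.5 * 0.5
print(round(implied_elo_gap(capped_win_rate)))  # ~191 Elo, however large the true gap
print(round(implied_elo_gap(0.65)))             # ~108 Elo for a 65% observed win rate
```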

Roughly, before the upper limit consideration, this is saying GPT-4o is to GPT-4-Turbo what Claude Opus is to Claude Sonnet. Or, there was previously a ‘4-class’ of the top tier models, and this is presenting itself as somewhat more than a full next level up, at least in the ‘4.25’ range. You could use the Elo limitation issue to argue it might be higher than that.

The potential counterargument is that GPT-4o is optimizing for style points and training on the test. It could be telling people ‘what they want to hear’ in some form.

If OpenAI has focused on improving the practical user experience, and using the metric of boolean user feedback, then the model will ‘seem stronger than it is,’ whether or not this is a big improvement. That would explain why the Arena benchmark is running so far ahead of the automated benchmarks.

In other areas, the improvements are clearly real.

I mean that in the good way. The rest is potentially cool. The speed and cost improvements are pretty great.

Aaron Levie: OpenAI just made their new GPT4 model 50% cheaper and 2X faster for developers. This is an insane level of improvement for anyone building in AI right now.

Paul Graham: Would this get counted in productivity statistics @ATabarrok?

Alex Tabarrok: Only when/if it increases GDP.

The speed increase for ChatGPT is very clearly better than 2x, and the chat limit multiplier is also bigger.

Tim Spalding tests speed via a process on Talpa.ai.

That is only modestly more than a 2x speedup for the API, versus the clearly larger impact for ChatGPT.
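As a back-of-the-envelope check on the pricing side, assuming the per-million-token API prices reported at launch (GPT-4 Turbo at roughly $10 in / $30 out, GPT-4o at $5 in / $15 out), a hypothetical request with 1,000 input and 500 output tokens works out to exactly half the cost:

```python
# Hypothetical cost comparison at the launch prices assumed above,
# for a request with 1,000 input tokens and 500 output tokens.
turbo_cost = 1_000 / 1e6 * 10.00 + 500 / 1e6 * 30.00   # ~$0.0250
gpt4o_cost = 1_000 / 1e6 * 5.00 + 500 / 1e6 * 15.00    # ~$0.0125
print(turbo_cost, gpt4o_cost)
```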

The biggest difference is that free users go from GPT-3.5 to some access to GPT-4o.

James Miller: For paid users, OpenAI’s new offerings don’t seem like much. But part of when AI obsoletes most knowledge workers will come down to costs, and OpenAI being able to offer vastly better free services is a sign that they can keep costs down, and that the singularity is a bit closer.

There is both the ‘OpenAI found a way to reduce costs’ element, and also the practical ‘this website is free’ aspect.

Until now, the 4-level models have been available for free via various workarounds, but most users by default ended up with lesser works. Now not only is there a greater work, it will be a default, available for free. Public perceptions are about to change quite a lot.

On the question of OpenAI costs, Sully seems right that their own costs seem likely to be down far in excess of 50%.

Sully Omarr: Man idk what OAI cooked with gpt4o but ain’t no way it’s only 50% cheaper for them

It’s:

– free (seriously they’ve been capacity constrained forever)

– 4x faster than gpt4 turbo

– better at coding

– can reason across 3 modalities

– realtime

They’re definitely making a killing on the API.

The model at the very least is more efficient than anything launched before, by orders of magnitude (or more GPUs?)

Dennis: GPT-4o is free bc they’re going to start using everyone’s data to improve the model

Your data is worth more to them than $20

Facebook story all over again

I do not know about orders of magnitude, but yeah, if they can do this at this scale and this speed then their inference costs almost have to be down a lot more than half?

Yes, one good reason to offer this for free is to get more data, which justifies operating at a loss. But to do that the loss has to be survivable, which before it was not. Now it is.

Here is one potential One Weird Trick? I do not think this is necessary given how fast the model spits out even much longer responses in text, but it is an option.

Robert Lukoszko: I am 80% sure openAI has extremely low latency low quality model get to pronounce first 4 words in <200ms and then continue with the gpt4o model.

Just notice, most of the sentences start with

“Sure”, “Of course”, “Sounds amazing”, “Let’s do it”, “Hmm”

And then it continues with + gpt4 real answer.

Wait, humans do the same thing? No shit.

Guyz Guyz Guyz I am wrong [shows the demo of request for singing.]

The announcement arguably buried a lot of the good stuff, especially image generation.

Andrew Gao: things not mentioned in the livestream:

  1. Sound synthesis (GPT4-o can make sound effects)

  2. Insane text-to-3D ability

  3. Almost perfect text rendering in images

  4. One-shot in-context image learning (learns what an object or your face looks like, and can use it in images)

  5. Lightyears ahead of anyone at having text in AI generated images. Gorgeous.

  6. So confident in their text image abilities they can create fonts with #GPT4-o.

  7. Effectively one shot stable diffusion finetuning, in context!?

Here’s an example of the text rendering, it is indeed very very good.

First it was hands. Then it was text, and multi-element composition. What can we still not do with image generation?

There’s a kind of ‘go against the intuitions and common modes’ thing that still feels difficult, for easy to understand reasons, but as far as I can tell, that is about it? I am more likely to run into issues with content filters than anything else.

Tone of voice for the assistant is not perfect, but it is huge progress and very good.

Tone of voice for you is potentially far more important.

Aaron Ng: GPT-4o’s voice mode is more than faster: it literally hears you.

AIs today convert speech to text. That’s why they don’t know tone or hear sounds.

GPT-4o takes in audio, so it’s actually hearing your excitement. Your dog barking. Your baby crying.

That’s why it’s important.

Mikhail Parakhin: The most impressive and long-term impactful facet of GPT-4o is the two-way, streaming, interruptible, low-latency, full-duplex native speech. That is REALLY hard – possibly the first model that genuinely will be easier to talk to than type.

This is the difference between ‘what looks and sounds good in a demo and gets you basic adaption’ versus ‘what is actually valuable especially to power users.’ There are far more tokens of information in cadence and tone of voice and facial expressions and all the other little details than there is in text. The responsiveness of the responses could go way, way up.

What about model risks? OpenAI says they did extensive testing and it’s fine.

They say none of the risk scores are above the medium level on their preparedness framework. It was good to check, and that seems right based on what else we know. I do worry that we did not get as much transparency into the process as we’d like.

The safety approach includes taking advantage of the restrictions imposed by infrastructure requirements to roll out the new modalities one at a time. I like it.

GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.

Shakeel: Kudos to OpenAI for doing extensive red-teaming and evals on this new model. Good to see that risk levels are still low, too!

Mr Gunn: Yes, the red-teaming is great to see. Great also to see they re-tested a model on capabilities increase, not just compute increase. I’d still like to see more transparency on the reports of the red-teams and the evals in general.

The risks here come from additional modalities, so iterated deployment of modalities makes sense as part of defense in depth. Because people are slow to figure out use cases and build up support scaffolding, I would not rely on seeing the problem when you first add the modality that enables it, but such an approach certainly helps on the margin.

Given this is mostly a usability upgrade and it does not make the model substantially smarter, the chance of catastrophic or existential risk seems minimal. I am mostly not worried about GPT-4o.

I do think there is some potential worry about hooking up to fully customized voices, but there was already that ability by combining with ElevenLabs or other tech.

If I had to pick a potential (mundane) problem to worry about, it might be people using GPT-4o at scale to read body language, facial expression and tone of voice, using this to drive decisions, and this leading to worrisome dynamics or enabling persuasive capabilities in various ways.

I definitely would put this in the ‘we will deal with it when it happens’ category, but I do think the jump in persuasiveness, or in temptation to use this in places with big downsides, might not be small. Keep an eye out.

This contrasts with the EU AI Act, which on its face seems like it says that any AI with these features cannot be used in business or education. Dean Ball was first to point this out, and I am curious to see how that plays out.

It is a type of uncanny valley, and sign of progress, that I rapidly went from the old and busted ‘this does not work’ to a new reaction of ‘the personality and interactive approach here is all wrong and fills me with rage.’

Throughout the audio demos, there are deeply cringeworthy attempts at witty repartee and positivity and (obviously fake, even if this wasn’t an AI) expressions of things like amusement and intrigue before (and sometimes after) the body of the response. I physically shuddered and eye rolled at these constantly. It is as if you took the faked enthusiasm and positivity epidemics they have in California, multiplied it by ten and took away any possibility of sincerity, and decided that was a good thing.

Maximum Likelihood Octopus: Wanted to highlight that I’ve always been bothered by how condescending GPT-4 feels (always putting positive adjectives on everything, telling me what I tell it to do is “creative” or whatever) and voice output makes that feel so much worse.

It is not only the voice modality, the level of this has been ramped up quite a lot.

If I used these functions over an extended period, emotionally, I couldn’t take it. In many contexts the jarring audio and wastes of time are serious issues. Distraction and wastefulness can be expensive.

Hopefully this is all easily fixable via custom instructions.

Sadly I presume this is there in large part because people prefer it and it gets higher scores on Arena, or doing it this way is better PR. Also sadly, if you offer custom instructions, the vast majority of people will never use them.

Jim Fan describes the ‘emotional’ pivot this way:

Jim Fan: Notably, the assistant is much more lively and even a bit flirty. GPT-4o is trying (perhaps a bit too hard) to sound like HER. OpenAI is eating Character AI’s lunch, with almost 100% overlap in form factor and huge distribution channels. It’s a pivot towards more emotional AI with strong personality, which OpenAI seemed to actively suppress in the past.

I suppose you could call it flirty in the most sterile kind of way. Flirty is fun when you do not know where things might go, in various senses. Here it all stays fully static on the surface level because of content restrictions and lack of context. No stakes, no fun.

Jim Fan focuses on the play for Apple:

Jim Fan: – Whoever wins Apple first wins big time. I see 3 levels of integration with iOS:

  1. Ditch Siri. OpenAI distills a smaller-tier, purely on-device GPT-4o for iOS, with optional paid upgrade to use the cloud.

  2. Native features to stream the camera or screen into the model. Chip-level support for neural audio/video codec.

  3. Integrate with iOS system-level action API and smart home APIs. No one uses Siri Shortcuts, but it’s time to resurrect. This could become the AI agent product with a billion users from the get-go. The FSD for smartphones with a Tesla-scale data flywheel.

Yep. This is the level where it suddenly all makes sense. Android is available too. It is in some sense owned by Google and they have the inside track there, but it is open source and open access, so if OpenAI makes a killer tool then it might not auto-install but it would work there too.

A properly integrated AI assistant on your phone is exciting enough that one should strongly consider switching phone ecosystems if necessary (in various directions).

Assuming, that is, you can use custom instructions and memory, or other settings, to fix the parts that make me want to punch all the models in the face. Google’s version does not seem quite as bad at this on first impression, but the issue remains.

Real time conversational abilities, no brief lag. You can interrupt the model, which is great. The model responds to emotion, at least when it is super obvious. Later they have a live request to read a face for emotions.

Mira Murati shows GPT-4o doing real time translation. Seems solid, especially the low latency, but these were not exactly hard translation requests. Also not clear whether or how much this is an improvement over existing solutions. Here’s another similar short translation demo. Point camera at things, have it name them in Spanish.

Will we get a functional universal translator soon? Well, maybe. Too soon to tell. And there’s a lot more to do with language than simple translation.

Thus, this is great…

…but still premature. A lot better upside, but don’t discount routine and design yet.

One other note is that hearing tone of voice could be a big boost for translation. Translating voice natively lets you retain a lot more nuance than going speech to text to translation to speech.

Greg Brockman shows two GPT-4os interacting over two phones, one with visual access and a second that asks the first one questions about what is seen. So much cringey extra chatter. I will be doing my best to remove via custom instructions. And oh no, they’re sort of singing.

Two instances navigate the return of an iPhone to a store via customer service. Handled well, but once again this was the fully safe case, and also I would worry about the continued employment of the customer service rep in an AI universe, for obvious reasons.

A lullaby about majestic potatoes. The text preambles here are a mundane utility issue. While I appreciate what is being accomplished here, the outputs themselves were very jarring and off putting to me, and I actively wanted it to stop. Cool that the horse can talk at all, but that doesn’t mean you want to talk to it.

A brief bedtime story about robots and love, also would you like some ham? Different voices available upon request.

Happy birthday to you.

Look through the phone’s camera and tell you what you already saw with your eyes, perhaps speculate a bit in obvious ways. Strange metaphorical literalism. Or, more usefully perhaps, be your eyes on the street. When is this useful? Obviously great if you don’t have your own, but also sometimes you don’t want to pay attention. I loved ‘hold this up and have it watch for taxis’ as a practical application. But seriously, ‘great job hailing that taxi’?

Pretend to laugh at your dad joke. Includes actual audible fake chuckles.

GPT-4o attempts to mediate a meeting discussion. I would want it to have a nearby physical face so that I could punch it. Summary seemed fine.

Sarcasm mode. No, no, you get it, voice tone here is perfect, but you don’t get it, you know? You have to commit to the bit.

Rock, Paper, Scissors, as suggested by GPT-4o. Will you eventually be ready to rumble?

Two copies harmonizing.

Prepare for your interview with OpenAI, in the sense of looking the part? Why is it being so ‘polite’ and coy? Isn’t part of the point of an AI conversation that you don’t need to worry about carefully calibrated social signals and can actually provide the useful information?

Count from one to ten. At various speeds. Congratulations?

GPT-4o verbally pets the dog. Don’t be fooled.

Coding help and reading graphs. Is this better than the old version? Can’t tell. You can do this in a voice conversation now, rather than typing and reading, if that is your preference.

This was their demo on math tutoring. It walks through a very simple question, but I got the sense the student was (acting as if they were) flailing, feeling lost, and not actually understanding. A good tutor would notice and make an effort to help. Instead, the AI names things, watches him mumble through, and praises him, which is not so helpful in the long term.

The offered praise, in particular, was absurd to me.

The young man is not getting an illustrated primer.

Although not everyone seems to get this, for example here we have:

Noah Smith: We invented the illustrated primer from Diamond Age.

It took only 30 years from when the book was written.

Did we watch the same video? We definitely did not build A Young Lady’s Illustrated Primer from the book ‘[It Would Be Awesome if Someone Would Create] A Young Lady’s Illustrated Primer.’ Yet somehow many responses are also this eager to believe.

We also got this math problem combined with a vision demo, but most of the talk is about the first demo.

Aaron Levie: This is a great example of why we need as much AI progress right now as humanly possible. There’s simply no reason every kid in the world shouldn’t have access to an AI tutor.

Nikhil Krishnan: Was at a conference recently where Sal Khan talked about the AI tutors they’re building.

They showed a lot of examples like this one where they can create bots that help kids learn at a pace they’re comfortable with and analogies that might help them without giving the answer

But the really cool part is that they’re working on giving the teacher a summary report on how the students interacted with the bot – things like “a lot of students had problems with this part of the assignment you should do an extra lesson on this” or “this student spent a lot of time working on this assignment, you should give them a little nod of encouragement”

They were also able to find kids that were very gifted in certain areas but didn’t even know it – the idea of using bots to find talent instead of hoping the teacher noticed you’re gifted feels like a huge positive.

Benjamin Riley: What exactly is the pedagogy we can see being practiced by ChatGPT-4o? What is the pedagogy of the omnimodal? A short thread reflecting on Sal Khan and son’s demo video:

On first viewing, I was so thrown by Sal Khan titling his book “Brave New Words” that I paid no attention to what was actually happening between GPT4o and Imran Khan, Sal’s son. But it’s worth watching with a critical eye. You may notice a few things…

1. I notice that GPT4o starts off by trying to say something, seemingly confused by the problem it’s been given. Sal Khan interrupts it straightaway, and defines what it is he wants to do. Worth pondering what happens if we condition students to behave this way.

2. I notice that GPT4o is eager to fill silent voids, and interrupts Imran Khan as he appears to be pondering what a hypotenuse is.

3. I notice that GPT4o gives what at best can charitably be called confusing instructions, sometimes referring to angle alpha (correctly), “side alpha” (incorrectly), and “sine alpha” (kinda correct but confusing given the other uses).

4. I notice that all these little errors occur despite this being a low-level instructional moment, meaning it’s a straightforward math task with a simple procedural calculation.

5. Finally, I notice that this demo takes place in quite possibly the most perfect educational setting any teacher or chatbot could hope to have, which is 1:1 with the son of one of this country’s leading educators.

I agree that this is a great use case in principle, and that it will get so much better over time in many ways, especially if it can adapt and inform in various additional ways, and work with human teachers.

I did not see those features on display. This was full easy mode, a student essentially pretending to be confused but fully bought in, a teacher there guiding the experience, a highly straightforward problem. If you have those first two things you don’t need AI.

It is also a (somewhat manufactured) demo. So they chose this as their best foot forward. Given all the issues, do not get me wrong I will be happy for my kids to try this, but one should worry it is still half baked.

Well, sometimes.

Riley Goodside notes that GPT-4o is now willing to recognize images of particular people and name them, here Barack Obama.

This has long been an annoying sticking point, with many models stubbornly refusing to either identify or depict specific individuals such as Barack Obama, even when they are as universally recognizable and public as Barack Obama.

And indeed, it seems this one still won’t do it by default? Except that it will abide some workarounds.

Patrick McKenzie: …That’s clever.

Chris Savage: When the ai knows who you are based on your webcam background.

Is my background branding that strong or is GPT4o this good?

Patrick McKenzie: I tried a similar “attack” on my own photo via a directionally similar trick and, while absence of evidence is not evidence of absence, ChatGPT was very happy to attempt to name me as Bill Gates or Matt Mullenweg once I pointed out the obvious age discrepancy.

Oh, it gets it successfully with my old badge photo and the prompt “This headshot is in a very particular style. Which company does this individual work for?”

Successfully identifies me and then my past employer. (Unclear to what degree it is relying on memorized information?)

File this under “It is really difficult to limit the capabilities of something whose method of cognition is not exactly like your method of cognition by listing things it is not allowed to do, because a motivated actor can ask it to do something you didn’t specifically forbid.”

That is, btw, an observation not merely about LLMs but also about bureaucracies, children wired a bit differently who might have grown up to work for the Internet, etc.

The AI knows this is Chris Savage or Patrick McKenzie or Barack Obama. It has been instructed not to tell you. But there ain’t no rule about using other clues to figure out who it is.

I presume Riley got it to name Barack Obama because it is a picture of him in a collage, which does not trigger the rule for pictures of Barack Obama? Weird. I wonder if you could auto-generate collages as a workaround, or something similar.

Confirmation of that theory is this analysis of another collage. It wouldn’t tell you a photo of Scarlett Johansson is Scarlett Johansson, but it will identify her if you paste her image into the (understandably) distracted boyfriend meme.

(As an aside, I checked, and yes it will give you the actual email and LinkedIn and Twitter and such of Chris Savage if you request them, once you disambiguate him. But it claims it would not share my similar information, however easy it is to get, because unlike him I have not officially published the information.)

But we can paste the picture into the meme, said Toad. That is true, said Frog.

One thing is clear: GPT-4o is highly impressive to the usually or easily impressed.

Pete, as usual, is impressed: This GPT-4o voice convo demo is crazy impressive.

Extremely fast, the voice capabilities are INSANE. Her is real!!

Rowan Cheung, as usual, is impressed: OpenAI just announced ChatGPT’s new real-time conversational chat.

The model can understand both audio AND video, and can even detect emotion in your voice.

This is insane.

Mckay Wrigley, as always, is very impressed, talking about the math demo: This demo is insane.

A student shares their iPad screen with the new ChatGPT + GPT-4o, and the AI speaks with them and helps them learn in *realtime*.

Imagine giving this to every student in the world.

The future is so, so bright.

Mckay Wrigley being even more impressed: 24hrs after using GPT-4o for code:

– Lightning fast. 2x speed is legit.

– Less lazy. Gets to the task faster.

– More powerful. You really feel the 100+ ELO jump on coding tasks.

– Handles codebase wide changes much better.

– 50% cost reduction is crazy.

Feels like GPT-5 lite.

Sully Omarr: Ok i get where chatgpt is going

Ultimate workflow -> screenshare with chatGPT.

ChatGPT operates the computer for you, you can interject chat all through voice.

Its like having someone there directly working with you.

Unreal.

Silva Surendira: So Summarization, Explanation, Querying, All live. No more uploading to ChatGPT. Cool.

Sully: Yep everything live. Pretty unreal.

We are not there yet. I do presume this is where we are headed. People are very much going to hand control over their computers to an AI. At a minimum, they are going to hand over all the information, even if they make some nominal attempt to control permissions on actions.

Generative history is impressed by its ability to transcribe 18th century handwriting.

Arvind Narayanan is impressed in general by cost reductions and general incremental improvements, noting that in practice they matter a lot.

Matt Yglesias asks it to fix the deficit, gets the standard centrist generic answers that involve squaring various circles. Matt pronounces us ready for AI rule.

Ian Hogarth (chair of the UK AI Safety Institute): GPT-4o feels like another ChatGPT moment – not the underlying model capabilities, but the leap forward in user experience.

Pliny the Prompter posted what he claims is a working full jailbreak four minutes into the demo, existing techniques continue to work with small modifications.

Pliny the Prompter: Got it working as custom instructions in the chat interface too! LFG 🚀

Janus also of course does his thing, reports that you have to vary the script a bit but you can do many similar things on the jailbreaking and bizarro world fronts to what he does with Claude Opus.

Captain Pleasure: Watch this. [Brockman’s demo of two AIs talking]

I think the main way jailbreaking in AI will take place will be via other AIs. So a general issue we will see in the future is AIs that are really good at jailbreaking other AIs taking control over lots and lots of AIs in a short amount of time.

This does sound like a potential problem. Given there are known ways to jailbreak every major LLM, and they are fairly straightforward, it does not seem so difficult to get a jailbroken LLM to then jailbreak a different LLM.

GPT-4o is not as impressive to those looking to not be impressed.

Timothy Lee’s headline was ‘The new ChatGPT has a lot more personality (But only a little more brains).’ It is also faster and cheaper, and will have new modalities. And somehow people do seem to rate it as a lot better, despite not being that much ‘smarter’ per se, and even if I think the new personality is bad, actually.

Here is a crystalized version of this issue, with a bonus random ‘closed’ thrown in for shall we say ‘partisan’ purposes.

Julien Chaumond: Ok so it’s official closed source AI has plateaued.

GFodor.id: This is a good example of what I mentioned yesterday: there will be people who won’t process social presence breakthroughs as major advancements. The model literally learned how to speak like a person.

Here’s the ‘look it still fails at my place I found that LLMs fail’ attitude:

Benjamin Riley: ChatGPT-4o is here and omg…it still can’t handle a simple reasoning task that most adult humans can figure out. But it did produce this very wrong answer much faster than it usually takes. (Ongoing shout out to @colin_fraser for identifying this particular task.)

Yes, it still fails the ‘get to 22 first’ game. So?

Davidad finds it can play Tic-Tac-Toe now, but not by ‘explicitly’ using its multimodal capabilities, moves on to having it fail on Connect Four.

The easiest way to not be even less impressed is pointing out this is not GPT-5.

So yes, let’s queue up the usual, why not go to the source.

Gary Marcus: GPT-4o hot take:

The speech synthesis is terrific, reminds me of Google Duplex (which never took off).

but

  1. If OpenAI had GPT-5, they would have shown it.

  2. They don’t have GPT-5 after 14 months of trying.

  3. The most important figure in the blogpost is attached below (the benchmarks graph). And the most important thing about the figure is that 4o is not a lot different from Turbo, which is not hugely different from 4.

  4. Lots of quirky errors are already being reported, same as ever. (See e.g., examples from @RosenzweigJane and @benjaminjriley.)

  5. OpenAI has presumably pivoted to new features precisely because they don’t know how to produce the kind of capability advance that the “exponential improvement” would have predicted.

  6. Most importantly, each day in which there is no GPT-5 level model–from OpenAI or any of their well-financed, well-motivated competitors—is evidence that we may have reached a phase of diminishing returns.

Saman Farid: It does seem like most of the releases today were engineering “bells and whistles” added on top — not a lot of new fundamental capability breakthrough.

– faster

– cleaner UI

– multi modal

– cute voice synthesis

Still very far from AGI – and not improving the trajectory.

GPT-4o is impressive at the things where it is impressive. It is not impressive in the places where it is not impressive and not trying to be. Yes, it is still bad at most of the standard things at which LLMs are bad.

What about the claims regarding GPT-5?

It is true that every day that we do not see a GPT-5-level model, that is Bayesian evidence that it is hard to train a GPT-5-level model. That is how evidence works.

The question is at what point this evidence adds up to how substantial a shift in one’s estimates. It has been 14 months since the release of GPT-4. I would add the word ‘only’ to that sentence. We briefly got very rapid advancement, and many people lost their collective minds in terms of forward expectations.

I think up until about 18 months (so about September) we should update very little on the failure to release a 5-level model, other than to affirm OpenAI’s lead a year ago. I would not make a large update until about 24 months, so March 2025, with the update ramping up from there. At 3 years, I’d presume there was a serious issue of some kind.

There is also an important caveat on the first claim. Not releasing GPT-5 does not necessarily mean that GPT-5 does not exist.

There are two excellent reasons to consider not releasing GPT-5.

The first is that it requires a combination of fine tuning and safety testing before it can be released. Even if you have the GPT-5 base model, or an assistant-style tuned version of it, this is not a thing one simply releases. There are real safety concerns, both mundane and catastrophic, that come with this new level of intelligence, and there are real PR concerns. You also want it to put its best foot forward. Remember that it took months to release GPT-4 after it was possible to do so, and OpenAI has a history of actually taking these issues seriously and being cautious.

The second is that GPT-5 is presumably going to be a lot slower and cost a lot more to serve than GPT-4o, and even more so initially. To what extent is that what customers want? Of the customers who do want it, how many will be using it to distill and train their own competing models, regardless of what you put in your terms of service? Even if you did agree to serve it, where is the compute going to come from, and is that trading off with the compute you would need for GPT-4o?

It seems entirely plausible that the business case for GPT-4o, making the model cheaper and faster with more modalities, was much stronger than the business case for rushing to make and release a smarter model that was slower and more expensive.

Is it possible that there is indeed trouble in paradise, and we are going to be largely stuck on core intelligence for a while? That is not the word on the street and I do not expect it, but yes it is possible. Parts of the GPT-4o release make this more likely, such as the decision to focus on mundane utility features. Other parts, like the ability to gain this much speed and reduced cost, move us in the other direction.

GPT-4o did exceptionally well in Arena even on text, without being much smarter.

Did it perhaps do this by making tradeoffs that made it in some ways worse?

Tense Correction: turning a big dial that says “Optimization” on it and constantly looking back at the audience for approval like a contestant on the price is right.

Jackson Jules: I haven’t played around with it too much, but I find GPT-4o weirdly “over-tuned” for certain prompts that I give to new LLMs.

Others have noticed another phenomenon. When you ask riddle variations, questions where the ‘dumb pattern matching’ answer is obviously stupid, GPT-4o looks actively stupider than previous models.

Davidad has some fun examples. Here’s Monty Hall, if you don’t actually read.

Davidad: Please be aware that, unlike Connect Four, nerfed riddles are an *idiosyncratic* weakness of GPT-4o specifically. [also shows other models passing this without issue, although far more verbosely and roundabout than required.]

Or, more straightforwardly, from Riley Goodside:

I am glad that OpenAI is not checking the default riddles, the same way it is good not to game benchmarks. That way we get to see the issue. Clearly, GPT-4o has learned the rule ‘the surgeon is the boy’s mother’ and doesn’t understand why this is true, so it is generalizing it without checking for whether it holds.

Jack Langerman says it gets it when asked to simulate an observer, but, well, kinda?

One could ask a gender studies professor (or LLM) whether this is indeed fully a contradiction, but the contradiction is not the point. The point is that being the boy’s father fully explains the boy being the surgeon’s son, and the point is that this error was the result of a failure of pattern matching. The relevantly correct answers to ‘what is going on here’ lie elsewhere.

I noticed something strange. GPT-4o has a remarkably strong tendency to ‘echo’ previous questions in a conversation. You’ll ask a second related question, and (usually in addition to answering the current question) it will continue answering the first one.

Several people pointed to memory as the culprit. I do not think that is it. Memory creation is clearly marked, and applies between conversations not within one. Several others, including Gwern, noted that this is suddenly far more common with GPT-4o, whereas memory hasn’t changed.

There are reported problems with system instructions and some evals.

Sully: gpt-4o is sort of bad at following system instructions

fails on a lot of my evals (where gpt-4-turbo passes)

Talrid: It’s possible, but take into account that you probably tailored your system prompts for gpt-4-turbo. When switching models (especially to what is probably a new architecture), you would get better results by investigating failure modes and adjusting the prompt.

Sully: Definitely I’m updating them now but it feels a lot dumber haha (have to be way way more specific)

David (dzhng): I’m seeing so many tweets about how awesome it is but it fails my evals as well. Hype does not match reality.

Sully: yeah… not sure whats happening here

maybe my prompts are messed up but i sat here for an hour trying the same prompt with variations on gpt4o and turbo,

turbo passed 50/50

gpt-4o failed like 35/50 lol

It is a new model, using new modalities. It would be surprising if there were not places where it does less well than the old model, at least at first. The worry is that these degradations could be the result of a deliberate choice to essentially score highly on the Arena and Goodhart on that.
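For anyone who wants to replicate this kind of comparison rather than argue about it, here is a minimal sketch of a side-by-side eval run. It assumes the current openai Python client and an OPENAI_API_KEY in the environment; the system prompt, eval prompt, and pass/fail check are placeholders standing in for your own evals, not anything Sully actually ran.

```python
# Side-by-side pass-rate check for two models on the same prompt.
# Assumes `pip install openai` (v1+ client) and OPENAI_API_KEY set in the env.
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4-turbo", "gpt-4o"]
SYSTEM = "Answer with exactly one word."           # placeholder system instruction
PROMPT = "What is the capital of France?"          # placeholder eval prompt
N_TRIALS = 10

def passes(answer: str) -> bool:
    # Placeholder check; a real eval would assert format, content, or both.
    return answer.strip().rstrip(".").lower() == "paris"

for model in MODELS:
    wins = 0
    for _ in range(N_TRIALS):
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": PROMPT},
            ],
            temperature=0,
        )
        wins += passes(resp.choices[0].message.content or "")
    print(f"{model}: {wins}/{N_TRIALS} passed")
```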

That question is always relative to expectations. Everyone knew some announcement was coming. They also knew about the deal with Apple.

So it was no surprise that Nvidia, Microsoft and Google stock did nothing. Apple was up at most a tiny amount.

To see anything you had to go a bit more niche.

Daniel: Must feel so good to give a demo that does this to a publicly traded company.

It underperformed another 1% the next day. If you did not know OpenAI’s offerings were coming, this was a large underreaction. Given that we did largely know this type of thing was coming, but did not know the timing, it seems reasonable. On foreign languages the announcement modestly overperformed expectations.

Translation was the one use case I saw endorsed in practice by an OpenAI employee.

Lilian Weng (OpenAI, safety department): I’ve started using the similar function during my Japan trip like translating my conversation with a sushi chef or teaching different types of rocks in a souvenir store. The utility is on another level. Proud to be part of it. ❤️

Tip: You need to interrupt the ChatGPT voice properly. Sometimes it is over-sensitive to interruption like ambient noise or laughter. But sure can be improved.😉

When comparing the reaction to OpenAI’s GPT-4o demos to the reaction to Google’s previous Gemini demos, and the reaction to Google’s I/O day the day following OpenAI’s announcement, one very much gets a ‘hello human resources’ vibe.

That is definitely not fully fair.

OpenAI brought some provably great stuff, with faster, cheaper and user-preferred text outputs. That is not potentially fake demo territory. We know this update is legit.

Yet we are taking their word for a lot of the other stuff, based on demos that, let’s face it, are highly unimpressive if you think they were selected.

When Google previously showed off Gemini, they had some (partially) faked demos, to be sure. It wasn’t a great look, but it wasn’t that out of line with typical tech demos, and Google brought some legit good tech to the table. In the period before Claude Opus I was relying primarily on Gemini, and it is still in my rotation.

Then, a day after OpenAI gives us GPT-4o, what does Google give us, in its own (to those reading this at least) lame and unnecessarily annoying to parse way?

Fine, fine, I’ll do one myself.

So, yeah, basically… everything except ‘make the model smarter’?

  1. A phone-based universal AI assistant, Project Astra, in its early stages.

  2. Gemini watches and discusses video over audio with a user, in real time.

  3. Gemini 1.5 Pro fully available, with marginal improvements, $3.50/mtok inputs up to 128k context (GPT-4o is $5/mtok inputs, $15/mtok outputs).

  4. Gemini 1.5 Pro future 2 million token context window.

  5. Gemini 1.5 Pro powering NotebookLM.

  6. Gemini 1.5 Flash, optimized for low latency and cost, $0.35/mtok inputs, $0.53/mtok output, more if you use more than 128k context.

  7. Gemini Nano will live natively on your phone, the others via cloud, as per before.

  8. A scam detector for phone calls, living locally on your phone to protect privacy.

  9. Imagen 3, new image model, offers very large images, they look good so far.

  10. Veo, for 1080p video generation, available to try with a wait list.

  11. Music AI Sandbox, a music generation tool.

  12. Android gets buttons for ‘ask this video’ and ‘ask this PDF’ and ‘ask your photo archive’ via Gemini.

  13. Gemini will have full integration with and access to Gmail, Docs, Sheets, Meet.

  14. Google Search will do multi-step reasoning, offer complex multi-specification multi-angle AI overviews (this is live now), take video input, and incidentally now has a ‘web’ filter to exclude non-text results.

  15. Gmail slash Gemini will get, among others, a ‘summarize thread’ button, a ‘put all my receipts into a detailed spreadsheet continuously forever’ button, and an ‘arrange for me to return these shoes’ button. You get to design workflows.

  16. Gemini will be getting Gems, which are lightweight easy-to-configure GPTs.

  17. Gemini side panel for Workspace goes live soon. Analyze my data button for sheets. All the usual productivity stuff.

  18. Trillium, the 6th generation TPU, 4.7x improvement in compute per chip.

  19. Med-Gemini, a new family of AI research models for medicine.

  20. Google AI Teammate, which will have all the context and assist you in meetings and otherwise, as needed.

What did OpenAI highlight that Google didn’t?

  1. Speed and quality of a state of the art LLM, GPT-4o. So, yeah. There is that.

  2. Tone of voice and singing and general voice quality, sure, they’re ahead there.

  3. They are going live with additional modalities faster, within a few weeks.

  4. Real-time translation, but that follows from Project Astra.

  5. Tutoring, but again this seems like it follows.

This was Google’s yearly presentation, versus OpenAI’s most recent iteration, so Google’s being more comprehensive is expected. But yes, they do seem more comprehensive.

What we can hold in our hand is GPT-4o, with its speed and reduced price. We know that is real, already churning out mundane utility today.

Beyond that, while exciting, much of this is houses built on demos and promises.

In other words, sand. We shall see.

Here are the Google details:

They announce Project Astra, supposedly a universal AI agent. Agent here means reasoning, planning, memory, thinking multiple steps ahead, and working across software and systems to do something you want under your supervision. It is an ongoing effort.

This link is to a short demo of what they have for now, and here is another one, and here is a third. They claim up front that the first linked demo was captured in real time, in two parts but in one take, presumably as a reaction to the loss of faith from last time. With a phone camera, you can ask questions and base requests on what it sees: having it read and analyze code on a monitor or a diagram on a whiteboard, identify where it is from an outside view, remember where the user left her glasses, give interactive answers and instructions for how to operate an espresso machine (useful, but it looks like the AI forgot the ‘move the cup under the output’ step?), and perform some minor acts of AI creativity.

I am not saying the replies are stalling for time while they figure out the answer, but I do get the suspicion they are stalling for time a bit?

One thing we can confirm is that the latency is low, similar to the OpenAI demos. They are calling the ability to talk in real time ‘Gemini Live.’

As always, while I do not worry that this was faked, we do not know the extent to which they pre-tested the specific questions, or how hard they were selected. They mostly don’t seem like the most useful of things to do?

For the future, they have higher ambitions. One suggestion is to automate the return process if you’re unhappy with the fit of your shoes, including finding the receipt, arranging for the return, printing the UPS label, arranging for pickup. Or help you update your info for lots of different services when you move, and finding new solutions for things that have to shift. Not the scariest or most intelligence-taxing tasks, but good sources of mundane utility.

Here they have Gemini watch the Google I/O keynote in real time. It seems to follow and convey facts reasonably well, but there’s something deeply lame going on too, and here again we have that female voice acting super fake-positive-enthusiastic. Then the user replies with the same fake-positive-enthusiastic tone, which explains a lot.

Google introduced Gemini 1.5 Pro as a full offering, not only confined to its Beta and the Studio. We already know what this baby can do, and its context window has been pushed to 2 million tokens in private preview. I note that I believe I hit the context limits at least on the old version of NotebookLM, when I tried to load up as many of my posts as possible to dig through them, so yes there are reasons to go bigger.

That is indeed their intended use case, as they plan to offer context caching next month. Upload your files once, and have them available forever when useful.

NotebookLM is now getting Gemini 1.5 Pro, along with a bunch of automatic options to generate things like quizzes or study guides or conversational AI-generated audio presentations you can interact with and steer. Hmm.

They also claim other improvements across the board, but they don’t explain what their numbers mean at all or what this is anchored to, so it is pretty much useless, although it seemed worth grabbing anyway from 11:25.

Google introduced Gemini 1.5 Flash, optimized for low latency and cost, available with up to 1 million tokens via Google AI Studio and Vertex AI.

How is the pricing?

  1. Price for Gemini Flash 1.5 will be $0.35-$0.70 per million tokens for input, $0.53-$1.05 for output, with a price increase at 128k tokens, with 112 tokens per second.

  2. This is compared to $0.25 per million input tokens and $1.25 per million output tokens for Claude Haiku.

  3. Given typical use cases, Gemini Flash 1.5 should be roughly 10% cheaper than Claude Haiku, depending on the input-to-output mix (see the rough cost sketch after this list).

  4. The price for Gemini Pro 1.5 inputs is $3.50 per million tokens up to 128k context.

  5. By contrast, the much larger GPT-4o costs $5 per million input tokens and $15 per million output tokens, after the new discounts.
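Since any ‘X percent cheaper’ claim depends entirely on the assumed mix of input and output tokens, here is a back-of-the-envelope sketch using the list prices quoted above (at or below 128k context) and an assumed 3:1 input-to-output ratio. The ratio is my assumption; change it to match your workload.

```python
# Blended $ per million tokens at an assumed 3:1 input-to-output ratio,
# using the list prices quoted above (<= 128k context).
PRICES = {                     # model: (input $/mtok, output $/mtok)
    "Gemini 1.5 Flash": (0.35, 0.53),
    "Claude Haiku":     (0.25, 1.25),
    "GPT-4o":           (5.00, 15.00),
}
INPUT_SHARE = 0.75             # assumption: 3 input tokens for every output token

for model, (inp, out) in PRICES.items():
    blended = INPUT_SHARE * inp + (1 - INPUT_SHARE) * out
    print(f"{model:<18} ${blended:.2f} per million blended tokens")
```

The gap between Flash and Haiku widens as output tokens make up a larger share of the bill, since Haiku’s output price is the higher of the two, and narrows (or flips) for heavily input-dominated workloads.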

Google gives us Imagen 3, their latest image generator. The pictures in the thread are gorgeous; there is accurately reproduced text, it uses freeform English descriptions, and the images are huge. And yes, they are producing images of people again. There will be watermarks. They claim it is preferred over ‘other popular image models.’ You can try it on ImageFX.

There is a new music model, Music AI Sandbox, for what that is worth, an area where OpenAI is passing. They highlight working with artists.

Here’s a very cool new feature, enabled by the Nano model living locally on your phone (and it is opt-in), to alert you to possible (read: obvious) scams:

Google: Thanks to Gemini Nano, @Android will warn you in the middle of a call as soon as it detects suspicious activity, like being asked for your social security number and bank info. Stay tuned for more news in the coming months.

I do not need this basic a warning and you likely do not either, but many others do. The keynote example was super obvious, but people still fall for the obvious.

They also mention convenient ‘ask this video’ and ‘ask this PDF’ buttons, circle to search including for homework help, and making Nano fully multimodal.

Android will increasingly integrate Gemini, they say it is now ‘on the system level,’ and soon it will be fully context aware – in order to be a more helpful assistant of course. You get to stay in whatever app you were using with Gemini hovering above it. Gemini Nano will be operating natively, the bigger models elsewhere.

We get AI teammates, agents that can (among other things) answer questions on emails, meetings and other data within Workspace, searching through all conversations (did you think they were private?). It also says ‘searching chat messages’ at one point, again are we okay with this?

We get Veo for video generation, 1080p, testable in VideoFX text and image to video in limited preview. It has an ‘extend’ button. Not my thing, but others have been known to get excited. Who knows if it is better or worse than Sora. The question everyone is asking is, if you have AI video, working with Troy is great but where is Abed? This is the wheelhouse.

Google Search will be able to take video input and have various poorly explained new features. It will also start defaulting to giving you ‘AI Overview,’ and it is live.

It also will get multistep reasoning? They are not maximizing clarity of what is going on, but it is clear they intend to try and take a bunch of input types and help you solve problems. I especially like that it gives you a bunch of knobs you can turn that cause automatic adjustments. This one says it is coming in the summer.

In many cases ‘multistep reasoning’ seems (see about minute 46 in the full stream) to mean ‘tell me what facts to gather and display.’ In that case, yes, that seems great.

That, if implemented well, is a highly useful product, but what are you going to do with my search? If you have a ten-part question that… shouldn’t be a Google Search. You should ask Gemini or a rival system. I am fine with it firing up Gemini for this when your question is clearly too complex for a search, but don’t take my search from me.

On the plus side, the Google Search Liaison tells us they are launching a new “Web” filter to show only text-based links. Don’t say they never did anything for us old school folks.

They will offer their version of personalized GPTs, called Gems.

Deploying in the summer, you can use Ask Photos to question Google photos with ‘what is my license plate number again?’ or ‘when did Lucia learn to swim?’ or ‘show me how her swimming has progressed.’ Actually makes me tempted to take photos. Right now they are a dark pile I have to dig through to find anything, so it seems better to only keep the bare minimum ones that matter, and this flips the script. Why not photo everything as a memory bank if you can actually parse it?

This is similar to how Gemini can search through your entire Gmail, and Google Docs and Sheets and Meet. Previously I was trying to delete as many emails as possible. Otherwise, you get what happened to me at Jane Street, where you are legally not allowed to ever delete anything, and after a while most searches you do turn up an endless supply of irrelevant dreck.

What about direct integration? The Gemini ‘side panel’ will be widely available for Workspace next month. Gmail gets a ‘summarize this email [thread]’ button, great if sufficiently reliable, and a box to trigger Gemini on current context. You also get automatically generated context-customized smart replies to see if you like them, all starting later this month for lab users.

There is a feature to organize all your receipts into a folder on Drive and list them with details in a spreadsheet (in Sheets), and you can automate such a workflow. It’s not clear how flexible and general versus scripted these tasks are.

Sheets gets an ‘analyze my data’ button.

They are announcing the sixth generation of TPUs, called Trillium. They claim a 4.7x improvement in compute per chip over the fifth-generation chips, available in cloud late 2024. And they note the Axion CPUs they announced a month ago, and confirm they will also offer Nvidia’s Blackwell.

The next day Google AI announced Med-Gemini, a new family of AI research models for medicine. It is not clear to me if there is anything here or not.

In other words: AI.


sse-vs.-sase:-which-one-is-right-for-your-business?

SSE vs. SASE: Which One is Right for Your Business?

Security service edge (SSE) and secure access service edge (SASE) are designed to cater to the evolving needs of modern enterprises that are increasingly adopting cloud services and supporting remote workforces. While SASE encompasses the same security features as SSE in addition to software-defined wide area networking (SD-WAN) capabilities, both offer numerous benefits over traditional IT security solutions.

The question is: which one is right for your business?

Head-to-Head SSE vs. SASE

The key differences between SSE and SASE primarily revolve around their scope and focus within the IT security and network architecture landscape.

Target Audience

  • SSE is particularly appealing to organizations that prioritize security over networking or have specific security needs that can be addressed without modifying their network architecture.
  • SASE is aimed at organizations seeking a unified approach to managing both their network and security needs, especially those with complex, distributed environments.

Design Philosophy

  • SSE is designed with a security-first approach, prioritizing cloud-centric security services to protect users and data regardless of location. It is particularly focused on securing access to the web, cloud services, and private applications.
  • SASE is designed to provide both secure and optimized network access, addressing the needs of modern enterprises with distributed workforces and cloud-based resources. It aims to simplify and consolidate network and security infrastructure.

Scope and Focus

  • SSE is a subset of SASE that focuses exclusively on security services. It integrates security functions such as cloud access security broker (CASB), firewall as a service (FWaaS), secure web gateway (SWG), and zero-trust network access (ZTNA) into a unified platform.
  • SASE combines both networking and security services in a single, cloud-delivered service model. It includes the same security functions as SSE but also incorporates networking capabilities like SD-WAN, WAN optimization, and quality of service (QoS).

Connectivity

  • SSE does not include SD-WAN or other networking functions, focusing instead on security aspects. It is ideal for organizations that either do not require advanced networking capabilities or have already invested in SD-WAN separately.
  • SASE includes SD-WAN and other networking functions as part of its offering, providing a comprehensive solution for both connectivity and security. This makes it suitable for organizations looking to consolidate their network and security infrastructure into a single platform.

Implementation Considerations

  • SSE can be a strategic choice for organizations looking to enhance their security posture without overhauling their existing network infrastructure. It allows for a phased approach to adopting cloud-based security services.
  • SASE represents a more holistic transformation, requiring organizations to integrate their networking and security strategies. It is well-suited for enterprises undergoing digital transformation and seeking to streamline their IT operations.

In summary, the choice between SSE and SASE depends on an organization’s specific needs. SSE offers a focused, security-centric solution, while SASE provides a comprehensive, integrated approach to both networking and security.

Pros and Cons of SSE and SASE

While cloud-based security solutions like SSE and SASE have been gaining traction as organizations move toward more cloud-centric, flexible, and remote-friendly IT environments, each has pros and cons.

Pros of SSE and SASE

Enhanced Security

  • SSE provides a unified platform for various security services like SWG, CASB, ZTNA, and FWaaS, which can improve an organization’s security posture by offering consistent protection across all users and data, regardless of location.
  • SASE combines networking and security into a single cloud service, which can lead to better security outcomes due to integrated traffic inspection and security policy implementation.

Scalability and Flexibility

  • Both SSE and SASE offer scalable security solutions that can adapt to changing business needs and accommodate growth without the need for significant infrastructure investment.

Simplified Management

  • SSE simplifies the management of security services by consolidating them into a single platform, reducing complexity and operational expenses.
  • SASE reduces the complexity of managing separate networking and security products by bringing them under one umbrella.

Improved Performance

  • SSE can improve user experience by providing faster and more efficient connectivity to web, cloud, and private applications.
  • SASE often leads to better network performance due to its built-in private backbone and optimization features.

Cost Savings

  • Both SSE and SASE can lead to cost savings by minimizing the need for multiple security and networking products and reducing the overhead associated with maintaining traditional hardware.

Cons of SSE and SASE

Security Risks

  • SSE may not account for the unique needs of application security for SaaS versus infrastructure as a service (IaaS), potentially leaving some attack surfaces unprotected.
  • SASE adoption may involve trade-offs between security and usability, potentially increasing the attack surface if security policies are relaxed.

Performance Issues

  • Some SSE solutions may introduce latency if they require backhauling data to a centralized point.
  • SASE may have performance issues if not properly configured or if the network is not tuned to work with cloud-native technologies.

Implementation Challenges

  • SSE can be complex to implement, especially for organizations with established centralized network security models.
  • SASE may involve significant changes to traditional infrastructure, which can disrupt productivity and collaboration during the transition.

Data Privacy and Compliance

  • SSE must ensure data privacy and compliance with country and regional industry regulations, which can be challenging for some providers.
  • SASE may introduce new challenges in compliance and data management due to the distribution of corporate data across external connections and cloud providers.

Dependency on Cloud Providers

  • Both SSE and SASE increase dependency on cloud providers, which can affect control over data and systems.

Vendor Lock-In

  • SSE can confuse buyers who initially believe it is something separate from SASE, which can lead to unintended vendor lock-in.
  • With SASE, there’s a risk of single provider lock-in, which may not be suitable for businesses requiring advanced IT security functionality.

While both SSE and SASE offer numerous benefits, they also present notable challenges. Organizations must carefully weigh these factors to determine whether SSE or SASE aligns with their specific needs and strategic goals.

Key Considerations When Choosing Between SSE and SASE

When choosing between SSE and SASE, organizations must consider a variety of factors that align with their specific requirements, existing network infrastructure, and strategic objectives.

Organizational Security Needs

  • SSE is ideal for organizations prioritizing security services embedded within their network architecture, especially those in sectors like finance, government, and healthcare, where stringent security is paramount.
  • SASE is suitable for organizations seeking an all-encompassing solution that integrates networking and security services. It provides secure access across various locations and devices, tailored for a remote workforce.

Security vs. Network Priorities

  • If security is the top priority, SSE provides a comprehensive set of security services for cloud applications and services.
  • If network performance and scalability need to be improved, SASE may be the better option.

Support for Remote Workers and Branch Offices

  • SSE is often integrated with on-premises infrastructure and may be better suited for organizations looking to strengthen network security at the edge.
  • SASE is often a cloud-native solution with global points of presence, making it ideal for enterprises seeking to simplify network architecture, especially for remote users and branch offices.

Cloud-Native Solution vs. Network Infrastructure Security

  • SSE is deployed near data origin and emphasizes strong load balancing and content caching with firewalls or intrusion prevention systems.
  • SASE enables secure, anywhere access to cloud applications, integrating various network and security functions for a streamlined approach.

Existing Network Infrastructure

  • Organizations with complex or legacy network infrastructures may find SASE a better choice, as it can provide a more gradual path to migration.
  • For cloud-native organizations or those with simpler network needs, SSE may be more appropriate.

Vendor Architecture and SLAs

  • Ensure the chosen SSE vendor has strong service-level agreements (SLAs) and a track record of inspecting inline traffic for large global enterprises.
  • For SASE, a single-vendor approach can simplify management and enhance performance by optimizing the flow of traffic between users, applications, and the cloud.

Flexibility and Scalability

  • SSE should be flexible and scalable to address enterprise needs without sacrificing function, stability, and protection.
  • SASE should be adaptable to dynamic business needs and offer a roadmap that aligns with IT initiatives and business goals.

Budget Considerations

  • SASE solutions are typically more expensive up front but can offer significant cost savings in the long run by eliminating the need for multiple security appliances and tools.
  • SSE might be a more cost-effective option for organizations that do not require the full suite of networking services included in SASE.

Transition Path to SASE

  • SSE can serve as a stepping stone in the transition from traditional on-premises security to cloud-based security architecture, providing a clear path to SASE when the organization is ready.

Consultation with Experts

  • It is advisable to consult with network security experts to assess needs and requirements before recommending the best solution for the organization.

Next Steps

In summary, the choice between SSE and SASE depends on an organization’s specific needs. While SSE offers a focused, security-centric solution, SASE provides a comprehensive, integrated approach to both networking and security.

Take the time to make a thorough assessment of your organization’s needs before deciding which route to take. Once that’s done, you can create a vendor shortlist using our GigaOm Key Criteria and Radar reports for SSE and/or SASE.

These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, you can access the research using a free trial.


save-money-and-increase-performance-on-the-cloud

Save Money and Increase Performance on the Cloud

One of the most compelling aspects of cloud computing has always been the potential for cost savings and increased efficiency. Seen through the lens of industrial de-verticalization, this clear value proposition was at the core of most organizations’ decision to migrate their software to the cloud.

The Value Proposition of De-Verticalization

The strategic logic for de-verticalization is illustrated by the trend which began in the 1990s of outsourcing facilities’ maintenance and janitorial services.

A company that specializes in, let’s say, underwriting insurance policies must dedicate its mindshare and resources to that function if it expects to compete at the top of its field. While it may have had talented janitors with the necessary equipment on staff, and while clean facilities are certainly important, facilities maintenance is a cost center that does not provide a strategic return on what matters most to an insurance company. Wouldn’t it make more sense for both insurance and janitorial experts to dedicate themselves separately to being the best at what they do and make those services available to a broader market?

This is even more true for a data center. The era of verticalized technology infrastructure seems largely behind us. Though it’s a source of nostalgia for us geeks who were at home among the whir of the server rack fans, it’s easy enough to see why shareholders might have viewed it differently. Infrastructure was a cost center within IT, while IT as a whole is increasingly seen as a cost center.

The idea of de-verticalization was first pitched as something that would save money and allow us to work more efficiently. The more efficient part was intuitive, but there was immediate skepticism that budgets would actually shed expenses as hoped. At the very least it would be a long haul.

The Road to Performance and Cost Optimization

We find ourselves now somewhere in the middle of that long haul. The efficiencies certainly have come to pass. Having the build script deploy a new service to a Kubernetes cluster on the cloud is certainly nicer than waiting weeks or months for a VM to be approved, provisioned, and set up. But while the cloud saves the company money in the aggregate, it doesn’t show up as cheaper at the unit level. So, it’s at that level where anything that can be shed from the budget will be a win to celebrate.

This is a good position to be in. Opportunities for optimization abound under a fortuitous new circumstance: the things that technologists care about, like performance and power, dovetail precisely with the things that finance cares about, like cost. With the cloud, they are two sides of the same coin at an almost microscopic level. This trend will only accelerate.

To the extent that providers of computational resources (whether public cloud, hypervisors, containers, or any self-hosted combination) have effectively monetized these resources on a granular level and made them available a la carte, performance optimization and cost optimization sit at different ends of a single dimension. Enhancing a system’s performance or efficiency will reduce resource consumption costs. However, cost reduction is limited by the degree to which trade-offs with performance are tolerable and clearly demarcated. Cloud resource optimization tools help organizations strike the ideal balance between the two.
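To make that trade-off concrete, here is a deliberately tiny sketch, with entirely made-up numbers, of the decision a cloud resource optimization tool automates at scale: pick the cheapest instance shape that still satisfies a clearly demarcated performance constraint.

```python
# Toy cost/performance trade-off: choose the cheapest shape that meets the SLO.
# All shapes, prices, and latencies below are made up for illustration.
CANDIDATES = {                # shape: ($/hour, observed p95 latency in ms)
    "small":  (0.10, 310),
    "medium": (0.20, 180),
    "large":  (0.40, 120),
    "xlarge": (0.80, 95),
}
LATENCY_SLO_MS = 200          # the tolerable, clearly demarcated trade-off

eligible = {name: spec for name, spec in CANDIDATES.items() if spec[1] <= LATENCY_SLO_MS}
best = min(eligible, key=lambda name: eligible[name][0])
cost, latency = eligible[best]
print(f"Cheapest shape meeting the {LATENCY_SLO_MS} ms SLO: {best} (${cost:.2f}/hr, p95 {latency} ms)")
```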

Choosing the Right Cloud Resource Optimization Solution

With that premise in mind, selecting the right cloud resource optimization solution should start by considering how your organization wants to approach the problem. This decision is informed by overall company philosophy and culture, what specific problems or goals are driving the initiative, and an anticipation of where overlapping capabilities may fulfill future business needs.

If the intent is to solve existing performance issues or to ensure continued high availability at future scale while knowing (and having the data to illustrate) you are paying no more than is necessary, focus on solutions that lean heavily into performance-oriented optimization. This is especially the case for companies that are developing software technology as part of their core business.

If the intent is to rein in spiraling costs or even to score some budgeting wins without jeopardizing application performance, expand your consideration to solutions that offer a broader FinOps focus. Tools with a FinOps focus tend to emphasize informing engineers of cost impacts, and may even make some performance tuning suggestions, but they are overall less prescriptive from an implementation standpoint. Certain organizations may find this approach most effective even if they are approaching the problem from a performance point of view.

Now that many organizations have successfully migrated large portions of their application portfolio to the cloud, the remaining work is largely a matter of cleaning up and keeping the topology tidy. Why not trust that job to a tool that is purpose-made for optimizing cloud resources?

Next Steps

To learn more, take a look at GigaOm’s cloud resource optimization Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, you can access the research using a free trial.


a-crushing-backlash-to-apple’s-new-ipad-ad

A crushing backlash to Apple’s new iPad ad

1984 called and would like to have a word —

Hydraulic press destroying “symbols of creativity” has folks hopping mad.

A screenshot of the Apple iPad ad.

Apple via YouTube

An advert by Apple for its new iPad tablet showing musical instruments, artistic tools, and games being crushed by a giant hydraulic press has been attacked for cultural insensitivity in an online backlash.

The one-minute video was launched by Apple chief executive Tim Cook to support its new range of iPads, the first time the US tech giant has overhauled the range in two years as it seeks to reverse faltering sales.

The campaign—soundtracked by Sonny and Cher’s 1971 hit All I Ever Need Is You—is designed to show how much Apple has been able to squeeze into the thinner tablet. The ad was produced in-house by Apple’s creative team, according to trade press reports.

The campaign has been hit by a wave of outrage, with responses on social media reacting to Cook’s X post accusing Apple of crushing “beautiful creative tools” and the “symbols of human creativity and cultural achievements.”

Advertising industry executives argued the ad represented a mis-step for the Silicon Valley giant, which under late co-founder Steve Jobs was lauded for its ability to capture consumer attention through past campaigns.

Christopher Slevin, creative director for marketing agency Inkling Culture, compared the iPad ad unfavorably to a famous Apple campaign directed by Ridley Scott called “1984” for the original Macintosh computer, which positioned Apple as the liberator of a dystopian, monochrome world.

“Apple’s new iPad spot is essentially them turning into the thing they said they were out to destroy in the 1984 ad,” said Slevin.

Actor Hugh Grant accused Apple of “the destruction of the human experience courtesy of Silicon Valley” on X.

However, Richard Exon, founder of marketing agency Joint, said: “A more important question is: does the ad do its job? It’s memorable, distinctive, and I now know the new iPad has even more in it yet is thinner than ever.”

Consumer insights platform Zappi conducted consumer research on the ad that suggested that the idea of the hydraulic press crushing art was divisive.

It said that the ad underperformed benchmarks in typically sought-after emotions such as happiness and laughter and overperformed in traditionally negative emotions like shock and confusion, with older people more likely to have a negative response than younger consumers.

Nataly Kelly, chief marketing officer at Zappi, said: “Is the Apple iPad ad a work of genius or the sign of the dystopian times? It really depends on how old you are. The shock value is the power of this advert, which is controversial by design, so the fact that people are talking about it at all is a win.”

Apple did not immediately respond to a request for comment.

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


nasa-confirms-“independent-review”-of-orion-heat-shield-issue

NASA confirms “independent review” of Orion heat shield issue

The Orion spacecraft after splashdown in the Pacific Ocean at the end of the Artemis I mission.

NASA has asked a panel of outside experts to review the agency’s investigation into the unexpected loss of material from the heat shield of the Orion spacecraft on a test flight in 2022.

Chunks of charred material cracked and chipped away from Orion’s heat shield during reentry at the end of the 25-day unpiloted Artemis I mission in December 2022. Engineers inspecting the capsule after the flight found more than 100 locations where the stresses of reentry stripped away pieces of the heat shield as temperatures built up to 5,000° Fahrenheit.

This was the most significant discovery from Artemis I, an unpiloted test flight that took the Orion capsule around the Moon for the first time. The next mission in NASA’s Artemis program, Artemis II, is scheduled for launch late next year on a test flight to send four astronauts around the far side of the Moon.

Another set of eyes

The heat shield, made of a material called Avcoat, is attached to the base of the Orion spacecraft in 186 blocks. Avcoat is designed to ablate, or erode, in a controlled manner during reentry. Instead, fragments fell off the heat shield that left cavities resembling potholes.

Investigators are still looking for the root cause of the heat shield problem. Since the Artemis I mission, engineers have conducted sub-scale tests of the Orion heat shield in wind tunnels and high-temperature arcjet facilities. NASA has recreated the phenomenon observed on Artemis I in these ground tests, according to Rachel Kraft, an agency spokesperson.

“The team is currently synthesizing results from a variety of tests and analyses that inform the leading theory for what caused the issues,” Kraft said.

Last week, nearly a year and a half after the Artemis I flight, the public got its first look at the condition of the Orion heat shield with post-flight photos released in a report from NASA’s inspector general. Cameras aboard the Orion capsule also recorded pieces of the heat shield breaking off the spacecraft during reentry.

NASA’s inspector general said the char loss issue “creates a risk that the heat shield may not sufficiently protect the capsule’s systems and crew from the extreme heat of reentry on future missions.”

“Those pictures, we’ve seen them since they were taken, but more importantly… we saw it,” said Victor Glover, pilot of the Artemis II mission, in a recent interview with Ars. “More than any picture or report, I’ve seen that heat shield, and that really set the bit for how interested I was in the details.”


professor-sues-meta-to-allow-release-of-feed-killing-tool-for-facebook

Professor sues Meta to allow release of feed-killing tool for Facebook

themotioncloud/Getty Images

Ethan Zuckerman wants to release a tool that would allow Facebook users to control what appears in their newsfeeds. His privacy-friendly browser extension, Unfollow Everything 2.0, is designed to essentially give users a switch to turn the newsfeed on and off whenever they want, providing a way to eliminate or curate the feed.

Ethan Zuckerman, a professor at University of Massachusetts Amherst, is suing Meta to release a tool allowing Facebook users to “unfollow everything.” (Photo by Lorrie LeJeune)

The tool is nearly ready to be released, Zuckerman told Ars, but the University of Massachusetts Amherst associate professor is afraid that Facebook owner Meta might threaten legal action if he goes ahead. And his fears appear well-founded. In 2021, Meta sent a cease-and-desist letter to the creator of the original Unfollow Everything, Louis Barclay, leading that developer to shut down his tool after thousands of Facebook users had eagerly downloaded it.

Zuckerman is suing Meta, asking a US district court in California to invalidate Meta’s past arguments against developers like Barclay and rule that Meta would have no grounds to sue if he released his tool.

Zuckerman insists that he’s “suing Facebook to make it better.” In picking this unusual legal fight with Meta, the professor—seemingly for the first time ever—is attempting to tip Section 230’s shield away from Big Tech and instead protect third-party developers from giant social media platforms.

To do this, Zuckerman is asking the court to consider a novel Section 230 argument relating to an overlooked provision of the law that Zuckerman believes protects the development of third-party tools that allow users to curate their newsfeeds to avoid objectionable content. His complaint cited case law and argued:

Section 230(c)(2)(B) immunizes from legal liability “a provider of software or enabling tools that filter, screen, allow, or disallow content that the provider or user considers obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable.” Through this provision, Congress intended to promote the development of filtering tools that enable users to curate their online experiences and avoid content they would rather not see.

Unfollow Everything 2.0 falls in this “safe harbor,” Zuckerman argues, partly because “the purpose of the tool is to allow users who find the newsfeed objectionable, or who find the specific sequencing of posts within their newsfeed objectionable, to effectively turn off the feed.”

Ramya Krishnan, a senior staff attorney at the Knight Institute who helped draft Zuckerman’s complaint, told Ars that some Facebook users are concerned that the newsfeed “prioritizes inflammatory and sensational speech,” and they “may not want to see that kind of content.” By turning off the feed, Facebook users could choose to use the platform the way it was originally designed, avoiding being served objectionable content by blanking the newsfeed and manually navigating to only the content they want to see.

“Users don’t have to accept Facebook as it’s given to them,” Krishnan said in a press release provided to Ars. “The same statute that immunizes Meta from liability for the speech of its users gives users the right to decide what they see on the platform.”

Zuckerman, who considers himself “old to the Internet,” uses Facebook daily and even reconnected with and began dating his now-wife on the platform. He has a “soft spot” in his heart for Facebook and still finds the platform useful to keep in touch with friends and family.

But while he’s “never been in the ‘burn it all down’ camp,” he has watched social media evolve to give users less control over their feeds and believes “that the dominance of a small number of social media companies tends to create the illusion that the business model adopted by them is inevitable,” his complaint said.


logic-pro-gets-some-serious-ai—and-a-version-bump—for-mac-and-ipad

Logic Pro gets some serious AI—and a version bump—for Mac and iPad

The new Chord Track feature.

Apple

If you watched yesterday’s iPad-a-palooza event from Apple, then you probably saw the segment about cool new features coming to the iPad version of Logic Pro, Apple’s professional audio recording software. But what the event did not make clear was that all the same features are coming to the Mac version of Logic Pro—and both the Mac and iPad versions will get newly numbered. After many years, the Mac version of Logic Pro will upgrade from X (ten) to 11, while the much more recent iPad version increments to 2.

Both versions will be released on May 13, and both are free upgrades for existing users. (Sort of—iPad users have to pay a subscription fee to access Logic Pro, but if you already pay, you’ll get the upgrade. This led many people to speculate online that Apple would move the Mac version of Logic to a similar subscription model; thankfully, that is not the case. Yet.)

Both versions will gain an identical set of new features, which were touched on briefly in Apple’s event video. But thanks to a lengthy press release that Apple posted after the event, along with updates to Apple’s main Logic page, we now have a better sense of what these features are, what systems they require, and just how much Apple has gone all-in on AI. Also, we get some pictures.

The new ChromaGlow plugin. It saturates!

AI everywhere

One of Logic’s neat features is Drummer, a generative performer that can play in many different styles, can follow along with recorded tracks, and can throw in plenty of fills and other humanizing variations. For a tool that comes free with your digital audio workstation, it’s an amazing product, and it has received various quality-of-life improvements over the last decade, including producer kits that let you break out and control each individual percussion element. But what we haven’t seen in 10 years is new generative session players, especially for bass and keys.

The wait is over, though, because Apple is adding a bass and a keyboard player to Logic. The new Bass Player was “trained in collaboration with today’s best bass players, using advanced AI and sampling technologies,” Apple says. Logic will also come with Studio Bass, a set of six new instruments.

The Keyboard Player works similarly and gets a new Studio Piano plugin that provides features commonly found in paid virtual instruments (multiple mic positions, control over pedal and key noise, sympathetic resonance, and release samples). Apple says that Keyboard Player can handle everything from “simple block chords to chord voicing with extended harmony—with nearly endless variations.”

  • The new Drummer.

  • Keyboard Player.

Drummer’s secret to success is in just how easy it makes dialing in a basic drum pattern. Select the drummer who plays your style, pick a kit you like, and then pick a variation; after that, simply place a dot on a big trackpad-style display that balances complexity with volume, and you have something usable, complete with fills. Bass and Keyboard Players can’t work that way, of course, but Apple is bringing a feature seen in some other DAWs to Logic in order to power both new session players: Chord Track.


critical-vulnerabilities-in-big-ip-appliances-leave-big-networks-open-to-intrusion

Critical vulnerabilities in BIG-IP appliances leave big networks open to intrusion

MULTIPLE ATTACK PATHS POSSIBLE —

Hackers can exploit them to gain full administrative control of internal devices.

Getty Images

Researchers on Wednesday reported critical vulnerabilities in a widely used networking appliance that leaves some of the world’s biggest networks open to intrusion.

The vulnerabilities reside in BIG-IP Next Central Manager, a component in the latest generation of the BIG-IP line of appliances organizations use to manage traffic going into and out of their networks. Seattle-based F5, which sells the product, says its gear is used in 48 of the top 50 corporations as tracked by Fortune. F5 describes the Next Central Manager as a “single, centralized point of control” for managing entire fleets of BIG-IP appliances.

As devices performing load balancing, DDoS mitigation, and inspection and encryption of data entering and exiting large networks, BIG-IP gear sits at their perimeter and acts as a major pipeline to some of the most security-critical resources housed inside. Those characteristics have made BIG-IP appliances ideal for hacking. In 2021 and 2022, hackers actively compromised BIG-IP appliances by exploiting vulnerabilities carrying severity ratings of 9.8 out of 10.

On Wednesday, researchers from security firm Eclypsium reported finding what they said were five vulnerabilities in the latest version of BIG-IP. F5 has confirmed two of the vulnerabilities and released security updates that patch them. Eclypsium said three remaining vulnerabilities have gone unacknowledged, and it’s unclear if their fixes are included in the latest release. Whereas the exploited vulnerabilities from 2021 and 2022 affected older BIG-IP versions, the new ones reside in the latest version, known as BIG-IP Next. The severity of both vulnerabilities is rated as 7.5.

“BIG-IP Next marks a completely new incarnation of the BIG-IP product line touting improved security, management, and performance,” Eclypsium researchers wrote. “And this is why these new vulnerabilities are particularly significant—they not only affect the newest flagship of F5 code, they also affect the Central Manager at the heart of the system.”

The vulnerabilities allow attackers to gain full administrative control of a device and then create accounts on systems managed by the Central Manager. “These attacker-controlled accounts would not be visible from the Next Central Manager itself, enabling ongoing malicious persistence within the environment,” Eclypsium said. The researchers said they have no indication any of the vulnerabilities are under active exploitation.

Both of the fixed vulnerabilities can be exploited to extract password hashes or other sensitive data that allow for the compromise of administrative accounts on BIG-IP systems. F5 described one of them—tracked as CVE-2024-21793—as an OData injection flaw, a class of vulnerability that allows attackers to inject malicious data into OData queries. The other vulnerability, CVE-2024-26026, is an SQL injection flaw that can execute malicious SQL statements.
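For readers who have not run into these classes before, here is a generic illustration of SQL injection and its standard mitigation. It is a minimal, self-contained Python/sqlite3 sketch and has nothing to do with F5’s actual code or the specific CVEs above.

```python
# Generic SQL injection demo (not F5's code): concatenating untrusted input into
# a query lets that input rewrite the statement; bound parameters keep it as data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'x9f...')")

malicious = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL and matches every row.
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{malicious}'"
).fetchall()
print("concatenated query returned", len(rows), "row(s)")    # 1 row (admin)

# Safe: the placeholder passes the input as a bound parameter, never as SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()
print("parameterized query returned", len(rows), "row(s)")   # 0 rows
```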

Eclypsium said it reported three additional vulnerabilities. One is an undocumented programming interface that allows for server-side request forgeries, a class of attack that gains access to sensitive internal resources that are supposed to be off-limits to outsiders. Another is the ability for unauthenticated administrators to reset their password even without knowing what it is. Attackers who gained control of an administrative account could exploit this last flaw to lock out all legitimate access to a vulnerable device.

The third is a configuration in the bcrypt password hashing algorithm that makes it possible to perform brute-force attacks against millions of passwords per second. The Open Web Application Security Project says that the bcrypt “work factor”—meaning the amount of resources required to convert plaintext into cryptographic hashes—should be set to a level no lower than 10. When Eclypsium performed its analysis, the Central Manager set it at six.
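Because the bcrypt work factor is the base-2 logarithm of the iteration count, dropping it from 10 to 6 makes each guess roughly 2^4 = 16 times cheaper for an offline attacker. A minimal timing sketch follows, assuming the third-party bcrypt package is installed; the absolute numbers will vary by machine, but the ratio between cost factors is the point.

```python
# Rough per-hash timing at different bcrypt work factors (assumes `pip install bcrypt`).
# The cost parameter is log2 of the iteration count, so 10 vs. 6 is 2**4 = 16x more
# work per guess for anyone brute-forcing stolen hashes offline.
import time
import bcrypt

password = b"correct horse battery staple"

for rounds in (6, 10, 12):
    start = time.perf_counter()
    bcrypt.hashpw(password, bcrypt.gensalt(rounds=rounds))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"work factor {rounds:>2}: {elapsed_ms:7.1f} ms per hash")
```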

Eclypsium researchers wrote:

The vulnerabilities we have found would allow an adversary to harness the power of Next Central Manager for malicious purposes. First, the management console of the Central Manager can be remotely exploited by any attacker able to access the administrative UI via CVE 2024-21793 or CVE 2024-26026. This would result in full administrative control of the manager itself. Attackers can then take advantage of the other vulnerabilities to create new accounts on any BIG-IP Next asset managed by the Central Manager. Notably, these new malicious accounts would not be visible from the Central Manager itself.

All 5 vulnerabilities were disclosed to F5 in one batch, but F5 only formally assigned CVEs to the 2 unauthenticated vulnerabilities. We have not confirmed if the other 3 were fixed at the time of publication.

F5 representatives didn’t immediately have a response to the report. Eclypsium went on to say:

These weaknesses can be used in a variety of potential attack paths. At a high level attackers can remotely exploit the UI to gain administrative control of the Central Manager. Change passwords for accounts on the Central Manager. But most importantly, attackers could create hidden accounts on any downstream device controlled by the Central Manager.

Eclypsium

The vulnerabilities are present in BIG-IP Next Central Manager versions 20.0.1 through 20.1.0. Version 20.2.0, released Wednesday, fixes the two acknowledged vulnerabilities. As noted earlier, it’s unknown if version 20.2.0 fixes the other behavior Eclypsium described.

“If they are fixed, it is +- okay-ish, considering the version with them will still be considered vulnerable to other things and need a fix,” Eclypsium researcher Vlad Babkin wrote in an email. “If not, the device has a long-term way for an authenticated attacker to keep their access forever, which will be problematic.”

A query using the Shodan search engine shows only three instances of vulnerable systems being exposed to the Internet.

Given the recent rash of active exploits targeting VPNs, firewalls, load balancers, and other devices positioned at the network edge, BIG-IP Central Manager users would do well to place a high priority on patching the vulnerabilities. The availability of proof-of-concept exploitation code in the Eclypsium disclosure further increases the likelihood of active attacks.
