Author name: Shannon Garcia

NASA and SpaceX misjudged the risks from reentering space junk

A European ATV cargo freighter reenters the atmosphere over the Pacific Ocean in 2013.

Since the beginning of the year, landowners have discovered several pieces of space junk traced to missions supporting the International Space Station. On all of these occasions, engineers expected none of the disposable hardware would survive the scorching heat of reentry and make it to Earth’s surface.

These incidents highlight the urgent need for more research into what happens when a spacecraft makes an uncontrolled reentry into the atmosphere, according to engineers from the Aerospace Corporation, a federally funded research center based in El Segundo, California. More hardware is being launched into space than ever before, and the trend will continue as companies deploy more satellite constellations and field heavier rockets.

“The biggest immediate need now is just to do some more work to really understand this whole process and to be in a position to be ready to accommodate new materials, new operational approaches as they happen more quickly,” said Marlon Sorge, executive director of Aerospace’s Center for Orbital and Reentry Debris Studies. “Clearly, that’s the direction that spaceflight is going.”

Ideally, a satellite or rocket body at the end of its life could be guided to a controlled reentry into the atmosphere over a remote part of the ocean. But this is often cost-prohibitive because it would require carrying extra fuel for the de-orbit maneuvers, and in many cases, a spacecraft doesn’t have any rocket thrusters at all.

In March, a fragment from a battery pack jettisoned from the space station punched a hole in the roof of a Florida home, a rare instance of terrestrial property damage attributed to a piece of space junk. In May, a 90-pound chunk of a SpaceX Dragon spacecraft that departed the International Space Station fell on the property of a “glamping” resort in North Carolina. At the same time, a homeowner in a nearby town found a smaller piece of material that also appeared to be from the same Dragon mission.

These events followed the discovery in April of another nearly 90-pound piece of debris from a Dragon capsule on a farm in the Canadian province of Saskatchewan. NASA and SpaceX later determined the debris fell from orbit in February, and earlier this month, SpaceX employees came to the farm to retrieve the wreckage, according to CBC.

Pieces of a Dragon spacecraft also fell over Colorado last year, and a farmer in Australia found debris from a Dragon capsule on his land in 2022.

Apple’s Vision Pro goes on sale outside the US for the first time

Spatial computing —

Since February, the headset has only been available in the United States.

A Vision Pro on display at an Apple Store in Tokyo.

Apple

Apple’s Vision Pro headset went on sale outside the United States for the first time today, in the first of two waves of expanded availability.

The $3,499 “spatial computing” device launched back in February in the US, but it hasn’t taken the tech world by storm. Part of the reason is its limited regional launch, with some of the biggest markets still lacking access.

Apple announced that the product would be sold internationally during its keynote at the Worldwide Developers Conference earlier this month.

The first new markets to get Vision Pro shipments are China, Japan, and Singapore—those are the ones where it went on sale today.

A second wave will come on July 12, with the headset rolling out in Australia, Canada, France, Germany, and the United Kingdom.

When we first tested the Vision Pro in February, we wrote that it was a technically impressive device with a lot of untapped potential. It works very well as a personal entertainment device for frequent travelers, in particular. However, its applications for productivity and gaming still need to be expanded to justify the high price.

Of course, there have been conflicting rumors of late about just how expensive Apple plans to keep its mixed reality devices. One report claimed that the company put the brakes on a new version of the Vision Pro for now, opting instead to develop a cheaper alternative for a 2025 launch.

But another report in Bloomberg suggested that’s an overstatement. It simply noted that the Vision Pro 2 has been slightly delayed from its original target launch window and reported that the cheaper model will come first.

In any case, availability will have to expand and the price will ultimately have to come down if augmented reality is to become the major computing revolution that Apple CEO Tim Cook has predicted. This international rollout is the next step to test whether there’s a market for that.

Bipartisan consensus in favor of renewable power is ending

End of an era —

The change is most pronounced in those over 50 years old.

Image of solar panels on a green grassy field, with blue sky in the background.

One of the most striking things about the explosion of renewable power that’s happening in the US is that much of it is going on in states governed by politicians who don’t believe in the problem wind and solar are meant to address. Acceptance of the evidence for climate change tends to be lowest among Republicans, yet many of the states where renewable power has boomed—wind in Wyoming and Iowa, solar in Texas—are governed by Republicans.

That’s partly because, up until about 2020, there was a strong bipartisan consensus in favor of expanding wind and solar power, with support above 75 percent among both parties. Since then, however, support among Republicans has dropped dramatically, approaching 50 percent, according to polling data released this week.

Renewables enjoyed solid Republican support until recently.

To a certain extent, none of this should be surprising. The current leader of the Republican Party has been saying that wind turbines cause cancer and offshore wind is killing whales. And conservative-backed groups have been spreading misinformation in order to drum up opposition to solar power facilities.

Meanwhile, since 2022, the Inflation Reduction Act has been promoted as one of the Biden administration’s signature accomplishments and has driven significant investments in renewable power, much of it in red states. Negative partisanship is undoubtedly contributing to this drop in support.

One striking thing about the new polling data, gathered by the Pew Research Center, is how dramatically it skews with age. When given a choice between expanding fossil fuel production or expanding renewable power, Republicans under the age of 30 favored renewables by a two-to-one margin. Republicans over 30, in contrast, favored fossil fuels by margins that increased with age, topping out at a three-to-one margin in favor of fossil fuels among those in the 65-and-over age group. The decline in support occurred in those over 50 starting in 2020; support held steady among younger groups until 2024, when the 30–49 age group started moving in favor of fossil fuels.

Among younger Republicans, support for renewable energy remains high.

Democrats, by contrast, break in favor of renewables by 75 points, with little difference across age groups and no indication of significant change over time. They’re also twice as likely as Republicans to think a solar farm will help the local economy.

Similar differences were apparent when Pew asked about policies meant to encourage the sale of electric vehicles, with 83 percent of Republicans opposed to having half of cars sold be electric in 2032. By contrast, nearly two-thirds of Democrats favored this policy.

There’s also a rural/urban divide apparent (consistent with Republicans getting more support from rural voters). Forty percent of urban residents felt that a solar farm would improve the local economy; only 25 percent of rural residents agreed. Rural residents were also more likely to say solar farms make the landscape unattractive and take up too much space. (Suburban participants were consistently in between rural and urban participants.)

What’s behind these changes? The single biggest factor appears to be negative partisanship combined with the election of Joe Biden.

For Republicans, 2020 represented an inflection point in terms of support for different types of energy. That wasn’t true for Democrats.

Among Republicans, support for every single form of power started to change in 2020—fossil fuels, renewables, and nuclear. Among Democrats, that’s largely untrue. Their high level of support for renewable power and aversion to fossil fuels remained largely unchanged. The lone exception is nuclear power, where support rose among both Democrats and Republicans (the Biden administration has adopted a number of pro-nuclear policies).

This isn’t to say that non-political factors are playing no role. The rapid expansion of renewable power means that many more people are seeing facilities open near them, and viewing that as an indication of a changing society. Some degree of backlash was almost inevitable and, in this case, the close ties between conservative lobbyists and fossil fuel interests were ready to take advantage of it.

It’s a showdown with Sabretooth in latest Deadpool and Wolverine trailer

“Ground and pound until he makes no sound” —

“People have waited decades for this fight. It’s not gonna be easy.”

Ryan Reynolds and Hugh Jackman star in Deadpool and Wolverine.

It’s safe to say that Marvel Studios’ Deadpool and Wolverine is one of the most hotly anticipated releases of the summer. We’ve had a teaser and full trailer, and now the studio has released a second one-minute trailer with a surprise appearance bound to delight X-Men fans everywhere. It’s none other than Sabretooth, played by the same actor, Tyler Mane, who portrayed the character in 2000’s X-Men. And he’s got a score to settle with Wolverine.

As previously reported, Ryan Reynolds found the perfect fit with 2016’s Deadpool, starring as Wade Wilson, a former Canadian special forces operative (dishonorably discharged) who develops regenerative healing powers that heal his cancer but leave him permanently disfigured with scars all over his body. Wade decides to become a masked vigilante, turning down an invitation to join the X-Men and abandon his bad-boy ways. The first Deadpool was a big hit, racking up $782 million at the global box office, critical praise, and a couple of Golden Globe nominations for good measure. Deadpool 2 was released in 2018 and was just as successful.

Deadpool and Wolverine reunites Reynolds with many familiar faces from the first two films. Morena Baccarin is back as Wade’s girlfriend Vanessa, along with Leslie Uggams as Blind Al; Karan Soni as Wade’s personal chauffeur, taxi driver Dopinder; Brianna Hildebrand as Negasonic Teenage Warhead; Stefan Kapičić as the voice of Colossus; Shioli Kutsuna as Negasonic’s mutant girlfriend, Yukio; Randal Reeder as Buck; and Lewis Tan as X-Force member Shatterstar.

We’re also getting some characters drawn from various films under the 20th Century Fox Marvel umbrella: Pyro (Aaron Stanford), last seen in 2006’s X-Men: The Last Stand, and Jennifer Garner’s Elektra, who appeared in the 2003 Daredevil film as well as 2005’s Elektra. Along with Sabretooth, the mutants Toad and Dogpool should be on hand to make some trouble. New to the franchise are Matthew Macfadyen as a Time Variance Authority agent named Paradox and Emma Corrin as the lead villain. There have been rumors that Owen Wilson’s Mobius and the animated Miss Minutes from Loki may also appear in the film.

  • The battle is going pretty well and this dynamic duo wants to know: “Who’s next?”

    YouTube/Marvel Studios

  • “Oh. My. God. Sabretooth.” Our feelings exactly.

    YouTube/Marvel Studios

  • Deadpool calls for a timeout because Wolverine “looks ridiculous” with all those weapons sticking out of him.

    YouTube/Marvel Studios

  • Wolverine is not amused.

    YouTube/Marvel Studios

  • Battle!!!

    YouTube/Marvel Studios

Marvel released a two-minute teaser for the new movie during the Super Bowl in February, featuring the trademark cheeky irreverence that made audiences embrace Reynolds’ R-rated superhero in the first place, plus a glimpse of Hugh Jackman’s Wolverine (or rather, his distinctive shadow). And yes, Marvel is retaining that R rating, a big step given that all the prior MCU films have been resoundingly PG-13. Marvel dropped a full trailer in April that was chock-full of off-color witticisms, meta-references, slo-mo action, and a generous sprinkling of F-bombs. (But no cocaine! Wade promised Kevin Feige!)

This latest trailer has a lot of the same footage as that April trailer until the 26-second mark. That’s when Wolverine growls, “Who’s next?” after battling a horde of foes. Who should jump into the fray with an answering growl but Sabretooth. We are all Deadpool when he exclaims, “Oh. My. God.” Sabretooth breaks out his claws and asks Wolverine if he’s ready to die. That’s when Deadpool calls a timeout to pull a few weapons out of his frenemy and offer a few tips on how to defeat the other mutant, to Wolverine’s annoyance.

“People have waited decades for this fight,” Deadpool insists. “It’s not gonna be easy. Baby knife. Shoot the devil, you take him down. Side control. Then full mount, and you ground and pound until he makes no sound because he’s dead. OK, good luck, I’m a huge fan.” We’ll have to wait a few more weeks to find out if Wolverine takes any of that advice.

Deadpool and Wolverine hits theaters on July 26, 2024.

Rocket Report: China flies reusable rocket hopper; Falcon Heavy dazzles

SpaceX’s 10th Falcon Heavy rocket climbs into orbit with a new US government weather satellite.

Welcome to Edition 6.50 of the Rocket Report! SpaceX launched its 10th Falcon Heavy rocket this week with the GOES-U weather satellite for NOAA, and this one was a beauty. The late afternoon timing of the launch and atmospheric conditions made for great photography. Falcon Heavy has become a trusted rocket for the US government, and its next flight in October will deploy NASA’s Europa Clipper spacecraft on the way to explore one of Jupiter’s enigmatic icy moons.

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Sir Peter Beck dishes on launch business. Ars spoke with the recently knighted Peter Beck, founder and CEO of Rocket Lab, on where his scrappy company fits in a global launch marketplace dominated by SpaceX. Rocket Lab racked up the third-most orbital launches of any US launch company (it’s headquartered in California but primarily assembles and launches rockets in New Zealand). SpaceX’s rideshare launch business with the Falcon 9 rocket is putting immense pressure on small launch companies like Rocket Lab. However, Beck argues his Electron rocket is a bespoke solution for customers who want to put their satellite in a specific place at a specific time, a luxury they can’t count on with a SpaceX rideshare.

Ruthlessly efficient … A word that Beck returned to throughout his interview with Ars was “ruthless.” He said Rocket Lab’s success is a result of the company being “ruthlessly efficient and not making mistakes.” At one time, Rocket Lab was up against Virgin Orbit in the small launch business, and Virgin Orbit had access to capital through billionaire Richard Branson. Now, SpaceX is the 800-pound gorilla in the market. “We have a saying here at Rocket Lab that we have no money, so we have to think. We’ve never been in a position to outspend our competitors. We just have to out-think them. We have to be lean and mean.”

Firefly reveals plans for new launch sites. Firefly Aerospace plans to use the launch pad owned by the state of Virginia at NASA’s Wallops Flight Facility for East Coast launches of its Alpha small-satellite rocket, Aviation Week reports. The company plans to use Pad 0A for US military and other missions, particularly those requiring tight turnaround between procurement and launch. This is the same launch pad previously used by Northrop Grumman’s Antares rocket, and it’s the soon-to-be home of the Medium Launch Vehicle (MLV) jointly developed by Northrop and Firefly. The launch pad will be configured for Alpha launches beginning in 2025, according to Firefly, which previously planned to develop an Alpha launch pad at Cape Canaveral Space Force Station in Florida. Now, Alpha and MLV rockets will fly from the same site on the East Coast, while Alpha will continue launching from the West Coast at Vandenberg Space Force Base, California.

Hello, Sweden… A few days after the announcement for launches from Virginia, Firefly unveiled a collaborative agreement with Swedish Space Corporation to launch Alpha rockets from the Esrange Space Center in Sweden as soon as 2026. Esrange has been the departure point for numerous suborbital and sounding rockets for nearly 50 years, but the spaceport is being upgraded for orbital satellite launches. A South Korean startup named Perigee Aerospace announced in May it signed an agreement to be the first user of Esrange’s orbital launch capability. Firefly is the second company to make plans to launch satellites from the remote site in northern Sweden. (submitted by Ken the Bin and brianrhurley)

China hops closer to reusable rockets. The Shanghai Academy of Spaceflight Technology (SAST), part of China’s apparatus of state-owned aerospace companies, has conducted the country’s highest altitude launch and landing test so far as several teams chase reusable rocket capabilities, Space News reports. A 3.8-meter-diameter (12.5-foot) test article powered by three methane liquid-oxygen engines lifted off from the Gobi Desert on June 23 and soared to an altitude of about 12 kilometers (7.5 miles) before making a successful vertical propulsive touchdown on landing legs at a nearby landing area. SAST will follow up with a 70-kilometer (43.5-mile) suborbital test using grid fins for better control. A first orbital flight of the new reusable rocket is planned for 2025.

Lots of players … If you don’t exclusively follow China’s launch sector, you should be forgiven for being unable to list all the companies working on new reusable rockets. Late last year, a Chinese startup named iSpace flew a hopper rocket testbed to an altitude of several hundred meters as part of a development program for the company’s upcoming partially reusable Hyperbola 2 rocket. A company named Space Pioneer plans to launch its medium-class Tianlong 3 rocket for the first time later this year. Tianlong 3 looks remarkably like SpaceX’s Falcon 9, and its first stage will eventually be made reusable. China recently test-fired engines for the government’s new Long March 10, a partially reusable rocket planned to become China’s next-generation crew launch vehicle. These are just a few of the reusable rocket programs in China. (submitted by Ken the Bin)

Spanish launch startup invests in Kourou. PLD Space says it is ready to start construction at a disused launch complex at the Guiana Space Center in Kourou, French Guiana. The Spanish launch startup announced this week a 10 million euro ($10.7 million) investment in the launch complex for its Miura 5 rocket, with preparations of the site set to begin “after the summer.” The launch pad was previously used by the French Diamant rocket in the 1970s and is located several miles away from the launch pads used by the European Ariane 6 and Vega rockets. PLD Space is on track to become the first fully commercial company to launch from the spaceport in South America.

Free access to space … Also this week, PLD Space announced a new program to offer payload space aboard the first two flights of its Miura 5 rocket for free, European Spaceflight reports. The two-stage Miura 5 rocket will be capable of delivering about a half-ton of payload mass into a Sun-synchronous orbit. The two flights are expected to take place in late 2025 and early 2026. The application process will close on July 30, and winning proposals will be announced on November 30. (submitted by Ken the Bin and EllPeaTea)

Study: Scribes in ancient Egypt had really poor posture during work

A scribe’s life —

There were degenerative joint changes in the spines, shoulders, knees, hips, and ankles.

Statues depicting the high dignitary Nefer and his wife (Abusir, Egypt).

Martin Frouz/Czech Institute of Egyptology/Charles University.

Repetitive stress injuries are a common feature of modern life, especially for office workers who spend a good chunk of their working days at a desk typing on a computer. Apparently, scribes in ancient Egypt suffered from their own distinctive repetitive stress injuries, according to a new paper published in the journal Scientific Reports that provides fresh insights into how these scribes lived and worked during the third millennium BCE.

Egyptian kings, royal family members, and other elite people from this Fifth Dynasty era were buried in tombs in the necropolis at Abusir rather than at neighboring Giza, which by then had largely filled up thanks to all the activity during the Fourth Dynasty. The Czech Institute of Egyptology at Charles University in Prague has been conducting research at the site since 1960, leading to the discovery of nearly 200 tombs dating back to the Old Kingdom (between 2700 and 2180 BCE). The first human skeletons were excavated in 1976, and there are currently 221 Old Kingdom skeletons in the collection, 102 of which are male.

Scientists started looking into the health status and markers for specific activities in 2009, but it wasn’t until quite recently that there were enough skeletons to conduct a comprehensive study. That’s what Petra Brukner Havelková of Charles University and the National Museum in Prague, Czech Republic, and colleagues set out to do, analyzing the remains of 69 adult males of differing social status and age at death.

The results show the scribes and the reference group differed in just under 4 percent of the various evaluated skeletal traits, which the authors attribute to the overall similarities in the sample (male, same age distribution, no physically demanding activities). However, the individuals identified as scribes had more degenerative joint issues than males from other occupations, clustered in several well-defined regions: the joint connecting the lower jaw to the skull; the right collarbone; the point where the right humerus meets the shoulder; the right thumb’s first metacarpal bone; the point where the thigh meets the knee; and all along the spine (but especially at the top). These bone changes can be indicators of repetitive stress.

Bad ergonomics?

  • Working positions of scribes. (A) cross-legged (sartorial) position, (B) kneeling-squatting position, (C) standing position. (D) Different position of the legs when sitting.

    Martin Frouz/Jolana Malátková

  • Osteoarthritis of the temporomandibular joint of a supposed family member of Khemetnu, the presumed owner of family tomb AS 79.

    Šárka Bejdová

  • Drawing indicating the most affected regions of the skeletons of scribes with higher prevalence of changes compared to reference group.

    Jolana Malátková

For instance, the osteoarthritis in the jaw joints may have been caused by the rush stems the scribes used to write hieroglyphics. The scribes would chew on the ends to make a brush, and whenever a pen started getting ragged or too clogged with ink, they would cut off the end and chew the next section to make a new brush.

Most scribes likely wrote with their right hands and used their left to roll papyrus into cylindrical scrolls. Writing with a rush pen required considerable dexterity, and as anyone with carpal tunnel syndrome could tell you, these sorts of repetitive motions can cause excessive stress in the hands and wrists. There were only minor wrist differences between scribes and the control group, but the significant right thumb degeneration in scribes likely corresponds to specific frequently used thumb motions and positions—probably the act of repeatedly pinching their pens, although the authors say more research is needed to make a definitive determination.

The degenerative signs noted in the cervical spines are likely due to the scribes’ typical working position. “The head had to be forward and the spine flexed, changing the center of gravity of the head and putting stress on the spine,” the authors wrote, a posture common to many modern occupations. Prolonged stretches of sitting cross-legged could also have contributed to the observed damage to the cervical spines. There were signs of stress to the right rotator cuffs, which usually occurs when the arm is held in an extended, elevated position and is common among painters, for example. It’s also common in people prone to sitting for a long time and typing with unsupported arms.

As for the degenerative changes noted in the knees, hips, and ankles, the authors suggest this indicates that scribes may have sat with the left leg in a kneeling or cross-legged position and the right leg pointed upward—more of a squat or a crouch. Iconography and statues from that era frequently depict scribes in such positions, as well as standing. The authors concluded that scribes probably alternated their arm and leg positions, but the head and cervical spine were always in that stress-inducing forward position.

Where’s a good ergonomic office chair when you need one?

DOI: Scientific Reports, 2024. 10.1038/s41598-024-63549-z

Securing APIs: The Cornerstone of Zero Trust Application Security

Welcome to the latest installment of our zero trust blog series! In our previous post, we explored the importance of application security in a zero trust model and shared best practices for securing cloud-native and on-premises applications. Today, we’re diving deeper into a critical aspect of application security: API security.

In the modern application landscape, APIs have become the backbone of digital communication and data exchange. From microservices and mobile apps to IoT devices and partner integrations, APIs are everywhere. However, this ubiquity also makes them a prime target for attackers.

In this post, we’ll explore the critical role of API security in a zero trust model, discuss the unique challenges of securing APIs, and share best practices for implementing a comprehensive API security strategy.

Why API Security is Critical in a Zero Trust Model

In a zero trust model, every application and service is treated as untrusted, regardless of its location or origin. This principle extends to APIs, which are often exposed to the internet and can provide direct access to sensitive data and functionality.
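To make "treated as untrusted" concrete, here is a minimal sketch of a deny-by-default authorization check. The policy table, the scope names, and the Caller type are all hypothetical, and the sketch assumes token verification (for example, via OAuth 2.0) has already happened upstream. It only shows the decision logic: no identity or no explicit policy means the request is refused.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: (HTTP method, path) -> scope required to call it.
REQUIRED_SCOPES = {
    ("GET", "/orders"): "orders:read",
    ("POST", "/orders"): "orders:write",
}


@dataclass(frozen=True)
class Caller:
    """Identity attached to a request after its token has been verified."""
    subject: str
    scopes: frozenset = field(default_factory=frozenset)


def is_allowed(caller, method, path):
    """Return True only if a policy exists and the caller holds its scope."""
    if caller is None:
        return False  # unauthenticated requests are never trusted
    required = REQUIRED_SCOPES.get((method, path))
    if required is None:
        return False  # no explicit policy means deny, not allow
    return required in caller.scopes


reader = Caller("svc-reporting", frozenset({"orders:read"}))
print(is_allowed(reader, "GET", "/orders"))   # True
print(is_allowed(reader, "POST", "/orders"))  # False: scope not granted
print(is_allowed(reader, "GET", "/admin"))    # False: no policy defined
print(is_allowed(None, "GET", "/orders"))     # False: no identity
```

The important design choice is the second early return: an endpoint that nobody remembered to add to the policy table is closed rather than open.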

APIs are particularly vulnerable to a range of attacks, including:

  1. Injection attacks: Attackers can manipulate API inputs to execute malicious code or commands, such as SQL injection or cross-site scripting (XSS).
  2. Credential stuffing: Attackers can use stolen or brute-forced credentials to gain unauthorized access to APIs and the data they expose.
  3. Man-in-the-middle attacks: Attackers can intercept and modify API traffic to steal sensitive data or manipulate application behavior.
  4. Denial-of-service attacks: Attackers can overwhelm APIs with traffic or malformed requests, causing them to become unresponsive or crash.
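As an illustration of the first item in the list above, the sketch below contrasts a query built by string concatenation with a parameterized one, using Python's standard sqlite3 module. The users table and the function names are invented for the example; the point is only that the same malicious input behaves very differently in the two versions.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes the input as data, never as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```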

To mitigate these risks, zero trust requires organizations to take a comprehensive, multi-layered approach to API security. This involves:

  1. Authentication and authorization: Enforcing strong authentication and granular access controls for all API requests, using standards like OAuth 2.0 and OpenID Connect.
  2. Encryption and integrity: Protecting API traffic with strong encryption and digital signatures to ensure confidentiality and integrity.
  3. Input validation and sanitization: Validating and sanitizing all API inputs to prevent injection attacks and other malicious payloads.
  4. Rate limiting and throttling: Implementing rate limits and throttling to prevent denial-of-service attacks and protect against abuse.
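Rate limiting is one of the easier controls in this list to sketch. The example below is a simple token bucket, one common way to implement it, though not the only one. The class name, the capacity, and the refill rate are assumptions for illustration, and a real deployment would normally keep one bucket per client or API key and rely on an API gateway to do this rather than on application code.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Top up for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bucket = TokenBucket(capacity=3, refill_per_second=1.0, clock=clock)
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
clock.now += 2.0                           # two seconds pass, two tokens return
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

A rejected call would typically be answered with HTTP status 429 (Too Many Requests).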

By applying these principles, organizations can create a more secure, resilient API ecosystem that minimizes the risk of unauthorized access and data breaches.

The Challenges of Securing APIs

While the principles of zero trust apply to all types of APIs, securing them presents unique challenges. These include:

  1. Complexity: Modern API architectures are often complex, with numerous endpoints, versions, and dependencies, making it difficult to maintain visibility and control over the API ecosystem.
  2. Lack of standardization: APIs often use a variety of protocols, data formats, and authentication mechanisms, making it challenging to apply consistent security policies and controls.
  3. Third-party risks: Many organizations rely on third-party APIs and services, which can introduce additional risks and vulnerabilities outside of their direct control.
  4. Legacy APIs: Some APIs may have been developed before modern security practices and standards were established, making it difficult to retrofit them with zero trust controls.

To overcome these challenges, organizations must take a risk-based approach to API security, prioritizing high-risk APIs and implementing compensating controls where necessary.
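One way to picture a risk-based approach is a scored API inventory. The sketch below is purely illustrative: the attributes, the weights, and the API names are all hypothetical, and a real program would derive them from its own risk model. It simply ranks APIs so the riskiest ones are reviewed first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiRecord:
    name: str
    internet_facing: bool
    handles_sensitive_data: bool
    third_party: bool
    legacy: bool


# Hypothetical weights, chosen only to make the example readable.
WEIGHTS = {
    "internet_facing": 3,
    "handles_sensitive_data": 4,
    "third_party": 2,
    "legacy": 2,
}


def risk_score(api):
    """Sum the weights of every risk attribute that is set on the API."""
    return sum(weight for attr, weight in WEIGHTS.items() if getattr(api, attr))


def prioritize(apis):
    """Highest score first; ties are broken alphabetically by name."""
    return sorted(apis, key=lambda api: (-risk_score(api), api.name))


inventory = [
    ApiRecord("internal-metrics", False, False, False, False),
    ApiRecord("payments", True, True, False, False),
    ApiRecord("partner-feed", True, False, True, False),
]

for api in prioritize(inventory):
    print(api.name, risk_score(api))
# payments 7
# partner-feed 5
# internal-metrics 0
```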

Best Practices for Zero Trust API Security

Implementing a zero trust approach to API security requires a comprehensive, multi-layered strategy. Here are some best practices to consider:

  1. Inventory and classify APIs: Maintain a complete, up-to-date inventory of all APIs, including internal and external-facing APIs. Classify APIs based on their level of risk and criticality, and prioritize security efforts accordingly.
  2. Implement strong authentication and authorization: Enforce strong authentication and granular access controls for all API requests, using standards like OAuth 2.0 and OpenID Connect. Use tools like API gateways and identity and access management (IAM) solutions to centrally manage authentication and authorization across the API ecosystem.
  3. Encrypt and sign API traffic: Protect API traffic with strong encryption and digital signatures to ensure confidentiality and integrity. Use transport layer security (TLS) to encrypt API traffic in transit, and consider using message-level encryption for sensitive data.
  4. Validate and sanitize API inputs: Validate and sanitize all API inputs to prevent injection attacks and other malicious payloads. Use input validation libraries and frameworks to ensure consistent and comprehensive input validation across all APIs.
  5. Implement rate limiting and throttling: Implement rate limits and throttling to prevent denial-of-service attacks and protect against abuse. Use API management solutions to enforce rate limits and throttling policies across the API ecosystem.
  6. Monitor and assess APIs: Continuously monitor API behavior and security posture using tools like API security testing, runtime application self-protection (RASP), and security information and event management (SIEM). Regularly assess APIs for vulnerabilities and compliance with security policies.
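To make the rate-limiting practice above more concrete, here is a minimal token-bucket sketch in Python. It is an illustration under stated assumptions, not a production design: the class names, the per-client keying, and the injectable clock are all hypothetical, and most organizations would enforce this at an API gateway rather than in application code.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Hypothetical bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start full so short bursts are allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend tokens only if the request can be paid for in full.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client identifier (an assumed keying scheme)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_id)` before doing any work and return an HTTP 429 response when it is `False`. Keying by authenticated client identity rather than by IP address fits the zero trust model better, since every request is already expected to carry verified credentials.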

By implementing these best practices and continuously refining your API security posture, you can better protect your organization’s assets and data from the risks posed by insecure APIs.

Conclusion

In a zero trust world, API security is the cornerstone of application security. By treating APIs as untrusted and applying strong authentication, encryption, and input validation, organizations can minimize the risk of unauthorized access and data breaches.

However, achieving effective API security in a zero trust model requires a commitment to understanding your API ecosystem, implementing risk-based controls, and staying up to date with the latest security best practices. It also requires a cultural shift, with every developer and API owner taking responsibility for securing their APIs.

As you continue your zero trust journey, make API security a top priority. Invest in the tools, processes, and training necessary to secure your APIs, and regularly assess and refine your API security posture to keep pace with evolving threats and business needs.

In the next post, we’ll explore the role of monitoring and analytics in a zero trust model and share best practices for using data to detect and respond to threats in real-time.

Until then, stay vigilant and keep your APIs secure!



SCOTUS nixes injunction that limited Biden admin contacts with social networks


On Wednesday, the Supreme Court tossed out claims that the Biden administration coerced social media platforms into censoring users by removing COVID-19 and election-related content.

Complaints alleging that high-ranking government officials were censoring conservatives had previously convinced a lower court to order an injunction limiting the Biden administration’s contacts with platforms. But now that injunction has been overturned, re-opening lines of communication just ahead of the 2024 elections—when officials will once again be closely monitoring the spread of misinformation online targeted at voters.

In a 6–3 vote, the majority ruled that none of the plaintiffs suing—including five social media users and Republican attorneys general in Louisiana and Missouri—had standing. They had alleged that the government had “pressured the platforms to censor their speech in violation of the First Amendment,” demanding an injunction to stop any future censorship.

Plaintiffs might have succeeded had they instead sought damages for past harms. But in her opinion, Justice Amy Coney Barrett wrote that partly because the Biden administration seemingly stopped influencing platforms’ content policies in 2022, none of the plaintiffs could show evidence of a “substantial risk that, in the near future, they will suffer an injury that is traceable” to any government official. Thus, they did not seem to face “a real and immediate threat of repeated injury,” Barrett wrote.

“Without proof of an ongoing pressure campaign, it is entirely speculative that the platforms’ future moderation decisions will be attributable, even in part,” to government officials, Barrett wrote, finding that an injunction would do little to prevent future censorship.

Instead, plaintiffs’ claims “depend on the platforms’ actions,” Barrett emphasized, “yet the plaintiffs do not seek to enjoin the platforms from restricting any posts or accounts.”

“It is a bedrock principle that a federal court cannot redress ‘injury that results from the independent action of some third party not before the court,’” Barrett wrote.

Barrett repeatedly noted “weak” arguments raised by plaintiffs, none of which could directly link their specific content removals with the Biden administration’s pressure campaign urging platforms to remove vaccine or election misinformation.

According to Barrett, the lower court initially granting the injunction “glossed over complexities in the evidence,” including the fact that “platforms began to suppress the plaintiffs’ COVID-19 content” before the government pressure campaign began. That’s an issue, Barrett said, because standing to sue “requires a threshold showing that a particular defendant pressured a particular platform to censor a particular topic before that platform suppressed a particular plaintiff’s speech on that topic.”

“While the record reflects that the Government defendants played a role in at least some of the platforms’ moderation choices, the evidence indicates that the platforms had independent incentives to moderate content and often exercised their own judgment,” Barrett wrote.

Barrett was similarly unconvinced by arguments that plaintiffs risk platforms removing future content based on stricter moderation policies that were previously coerced by officials.

“Without evidence of continued pressure from the defendants, the platforms remain free to enforce, or not to enforce, their policies—even those tainted by initial governmental coercion,” Barrett wrote.

Judge: SCOTUS “shirks duty” to defend free speech

Justices Clarence Thomas and Neil Gorsuch joined Samuel Alito in dissenting, arguing that “this is one of the most important free speech cases to reach this Court in years” and that the Supreme Court had an “obligation” to “tackle the free speech issue that the case presents.”

“The Court, however, shirks that duty and thus permits the successful campaign of coercion in this case to stand as an attractive model for future officials who want to control what the people say, hear, and think,” Alito wrote.

Alito argued that the evidence showed that while “downright dangerous” speech was suppressed, so was “valuable speech.” He agreed with the lower court that “a far-reaching and widespread censorship campaign” had been “conducted by high-ranking federal officials against Americans who expressed certain disfavored views about COVID-19 on social media.”

“For months, high-ranking Government officials placed unrelenting pressure on Facebook to suppress Americans’ free speech,” Alito wrote. “Because the Court unjustifiably refuses to address this serious threat to the First Amendment, I respectfully dissent.”

At least one plaintiff who opposed masking and vaccines, Jill Hines, was “indisputably injured,” Alito wrote, arguing that evidence showed that she was censored more frequently after officials pressured Facebook into changing their policies.

“Top federal officials continuously and persistently hectored Facebook to crack down on what the officials saw as unhelpful social media posts, including not only posts that they thought were false or misleading but also stories that they did not claim to be literally false but nevertheless wanted obscured,” Alito wrote.

While Barrett and the majority found that platforms were more likely responsible for injury, Alito disagreed, writing that with the threat of antitrust probes or Section 230 amendments, Facebook acted like “a subservient entity determined to stay in the good graces of a powerful taskmaster.”

Alito wrote that the majority was “applying a new and heightened standard” by requiring plaintiffs to “untangle Government-caused censorship from censorship that Facebook might have undertaken anyway.” In his view, it was enough that Hines showed that “one predictable effect of the officials’ action was that Facebook would modify its censorship policies in a way that affected her.”

“When the White House pressured Facebook to amend some of the policies related to speech in which Hines engaged, those amendments necessarily impacted some of Facebook’s censorship decisions,” Alito wrote. “Nothing more is needed. What the Court seems to want are a series of ironclad links.”

“That is regrettable,” Alito said.



Apple rejects PC emulators on the iOS App Store

I need my portable Number Munchers fix —

New iOS emulation rules only apply to “retro game consoles,” not retro computers.

Don’t get your hopes up: this iOS version of Doom was ported from open source code, not run via a classic PC emulator.

Earlier this year, Apple started officially allowing “retro game emulators” on the iOS App Store without the need for cumbersome jailbreaking or sideloading. But if you want to emulate retro PC games on your iOS device, you are apparently still out of luck.

In a recent blog update, iDOS developer Chaoji Li said that the latest version of the DOSBox-based MS-DOS emulator was finally rejected from the iOS App Store this month after a lengthy, two-month review process:

They have decided that iDOS is not a retro game console, so the new rule is not applicable. They suggested I make changes and resubmit for review, but when I asked what changes I should make to be compliant, they had no idea, nor when I asked what a retro game console is. It’s still the same old unreasonable answer along the line of “we know it when we see it.”

The developer of iOS Virtual Machine app UTM told a similar tale of App Store rejection on social media. The reported two-month review process for the UTM app ended with “the App Store review board determin[ing] that ‘PC is not a console’ regardless of the fact that there are retro Windows/DOS games fo[r] the PC that UTM SE can be useful in running,” the developer said.

The April revision of Rule 4.7 in Apple’s App Review Guidelines is very specifically worded so that “retro game console emulator apps can offer to download games [emphasis added].” Emulating a more generalized PC operating system falls outside the letter of this regulation, even for users interested in emulating retro PC games using these apps.

Since that narrow exception doesn’t apply to classic PC emulators, they end up falling afoul of Apple’s Rule 2.5.2, which states that iOS apps may not “download, install, or execute code which introduces or changes features or functionality of the app, including other apps.” That rule also applies to third-party iOS app stores that were recently allowed under new European Union rules, meaning even so-called “alternative app marketplaces” don’t offer a useful alternative in this case.

What’s the difference?

While the specific language of Apple’s App Review Guidelines is clear enough, the reasoning behind the distinction here is a bit more mystifying. Why does Apple treat the idea of a DOSBox-style emulator running an ancient copy of Microsoft Excel differently than the idea of Delta running a copy of NES Tetris on the same device? Is loading the Windows 95 version of KidPix Studio Deluxe on your iPhone really all that different from playing an emulated copy of Mario Paint on that same iPhone?

Now that I can emulate Mario Paint on iOS, why would I buy Photoshop?

A virtual machine or emulator running a modern PC operating system under iOS could theoretically offer some generalized competition for the apps Apple offers in its official App Store. But surely there’s a limit to how much that applies when we’re talking about emulating older computing environments and defunct operating systems. Just as Apple’s iOS game emulation rules only apply to “retro” game consoles, a rule for PC emulation could easily be limited to “retro” operating systems (say, those that are no longer officially supported by their original developers, as a rule of thumb).

Alas, iOS users and app makers are currently stuck abiding by this distinction without a difference when it comes to PC game emulation on iOS. Those looking for a workaround could potentially use an iOS remote desktop app to access games running on a physical desktop PC they actually own. The Internet Archive’s collection of thousands of MS-DOS games will also run in an iOS web browser, though you may have to struggle a bit to get controls and sound working correctly.



VW puts $5B into cash-hungry Rivian, and Rivian will help fix up VW’s software

RivianWagen. VolksVian? —

Rivian gets a third major partner, and new cars arrive later this decade.

Up-close image of Rivian's dash screen, showing on-road/off-road settings

Rivian

Volkswagen is committing $5 billion to upstart EV company Rivian, with $1 billion in cash upfront and $4 billion over time. The companies aim to use this joint venture to deliver new vehicles “in the second half of the decade,” according to the announcement, and the cash will likely help push along Rivian’s next generation of vehicles, including more affordable models.

Oliver Blume, left, CEO of Volkswagen Group, and RJ Scaringe, founder and CEO of Rivian.


Rivian

Rivian founder and CEO RJ Scaringe wrote on X (formerly Twitter) that the partnership “brings Rivian’s software and zonal electronics platform to a broader market through Volkswagen Group’s global reach and scale.” VW Group, which also controls Porsche, Lamborghini, Audi, and Ducati, among others, has a lot to gain from working with Rivian, particularly when it comes to software and ride control. Ars and most other reviewers have been impressed by Rivian’s drive engineering and display software on the R1T truck, R1S SUV, and the second generations of them both, which majorly reworked the underpinnings and offerings, largely through design and software choices.

Volkswagen’s recent software moves have been on an opposing trajectory. The Group’s 2019 move to align all its brands’ software under one division, Cariad, with three platforms developed at once, has led to massive leadership shake-ups and restarts. We were not impressed with the ID.4’s infotainment system in 2021, and further bugs in both system and screen software plagued the car, undermining what was otherwise regarded as a good wheels-on-road experience.

Other car and tech companies previously invested in Rivian on its long, expensive path to EV production. Rivian took $500 million from Ford in 2019 after already picking up $700 million from Amazon that year. Part of Ford’s investment centered on a Lincoln SUV developed using Rivian’s battery and motor tech—or “skateboard” platform—but that project was canceled early in the pandemic.

Rivian, which was valued at nearly $86 billion during its public stock debut, has burned through a lot of cash, making well-liked cars that cost a lot to build. In the first quarter of 2024, it sold its cars for an average of $38,784 less than it cost to make them before expenses like research, development, sales, or marketing. Having paused production on a $5 billion truck plant and gone through rounds of recent layoffs, the firm lost $1.51 billion last quarter. Rivian reported $7.86 billion in cash on hand and $4.43 billion in debt.

Hence the likely very useful first $1 billion from VW to Rivian, a convertible note that becomes Rivian common stock after regulatory approval. Two more $1 billion payments should arrive in 2025 and 2026, with a $2 billion loan tied to the joint venture available in 2026.



Saturn’s moon Titan has shorelines that appear to be shaped by waves

Surf the moon —

The liquid hydrocarbon waves would likely reach a height of a meter.

Ligeia Mare, the second-largest body of liquid hydrocarbons on Titan.


During its T85 Titan flyby on July 24, 2012, the Cassini spacecraft registered an unexpectedly bright reflection on the surface of the lake Kivu Lacus. Its Visual and Infrared Mapping Spectrometer (VIMS) data was interpreted as a roughness on the methane-ethane lake, which could have been a sign of mudflats, surfacing bubbles, or waves.

“Our landscape evolution models show that the shorelines on Titan are most consistent with Earth lakes that have been eroded by waves,” says Rose Palermo, a coastal geomorphologist at St. Petersburg Coastal and Marine Science Center, who led the study investigating signatures of wave erosion on Titan. The evidence of waves is still inconclusive, but future crewed missions to Titan should probably pack some surfboards just in case.

Troubled seas

While waves have been considered the most plausible explanation for reflections visible in Cassini’s VIMS imagery for quite some time, other studies that aimed to confirm their presence found no wave activity at all. “Other observations show that the liquid surfaces have been very still in the past, very flat,” Palermo says. “A possible explanation for this is at the time we were observing Titan, the winds were pretty low, so there weren’t many waves at that time. To confirm waves, we would need to have better resolution data,” she adds.

The problem is that this higher-resolution data isn’t coming our way anytime soon. Dragonfly, the next mission to Titan, isn’t supposed to arrive until 2034, even if everything goes as planned.

To get a better idea about possible waves on Titan a bit sooner, Palermo’s team tried to infer their presence from indirect cues. The researchers assumed shorelines on Titan could have been shaped by one of three candidate scenarios. The first assumed there was no erosion at all; the second modeled uniform erosion caused by the dissolution of the bedrock by the ethane-methane liquid; and the third assumed erosion by wave activity. “We took a random topography with rivers, filled up the basin, flooding river valleys all around the lake. Then we used a landscape evolution computer model to erode the coast to 50 percent of its original size,” Palermo explains.

Sizing the waves

Palermo’s simulations showed that wave erosion resulted in coastline shapes closely matching those actually observed on Titan.

The team validated its model using data from closer to home. “We compared using the same statistical analysis to lakes on Earth, where we know what the erosion processes are. With certainty greater than 77.5 percent, we were able to predict those known processes with our modeling,” Palermo says.

But even the study that claimed there were waves visible in Cassini’s VIMS imagery concluded they were roughly 2 centimeters high at best. So even if there are waves on Titan, the question is how high and strong they are.

According to Palermo, wave-generation mechanisms on Titan should work just like they do on Earth, with some notable differences. “There is a difference in viscosity between water on Earth and methane-ethane liquid on Titan compared to the atmosphere,” says Palermo. The gravity is also a lot weaker, standing at only one-seventh of the gravity on Earth. “The gravity, along with the differences in material properties, contributes to the waves being taller and steeper than those on Earth for the same wind speed,” says Palermo.

But even with those boosts to size and strength, could waves on Titan actually be any good for surfing?

Surf’s up

“There are definitely a lot of open questions our work leads to. What is the direction of the dominant waves? Knowing that can tell us about the winds and, therefore, about the climate on Titan. How large do the waves get? In the future, maybe we could tell that with modeling how much erosion occurs in one part of the lake versus another in estimated timescales. There is a lot more we could learn,” Palermo says. As far as surfing is concerned, she said that, assuming a minimum height for a surfable wave of around 15 centimeters, surfing on Titan should most likely be doable.

The key limit on the size and strength of any waves on Titan is that most of its seas are roughly the size of the Great Lakes in the US. The largest of them, the Kraken Mare, is roughly as large as the Caspian Sea on Earth. There is no such thing as a global ocean on Titan, and this means the fetch, the distance over which the wind can blow and grow the waves, is limited to tens of kilometers instead of over 1,500 kilometers on Earth. “Still, some models show that the waves on Titan can be as high as one meter. I’d say that’s a surfable wave,” Palermo concluded.



On Claude 3.5 Sonnet

There is a new clear best (non-tiny) LLM.

If you want to converse with an LLM, the correct answer is Claude Sonnet 3.5.

It is available for free on Claude.ai and the Claude iOS app, or you can subscribe for higher rate limits. The API cost is $3 per million input tokens and $15 per million output tokens.
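To put those prices in per-request terms, here is a minimal sketch of the arithmetic. The two rates come from the paragraph above; the function name and the example token counts are made up for illustration, and real bills may depend on details not covered here.

```python
# Prices quoted above, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call from its token counts (hypothetical helper)."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Illustrative request: a 2,000-token prompt with a 500-token reply.
example_cost = request_cost_usd(2_000, 500)
```

At these rates the example request costs a little over a cent, and the 500 output tokens account for more of the bill than the 2,000 input tokens, since output is priced at five times input.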

This completes the trifecta. All of OpenAI, Google DeepMind and Anthropic have kept their biggest and more expensive model static for now, and instead focused on making something faster and cheaper that is good enough to be the main model.

You would only use another model if you either (1) needed a smaller model in which case Gemini 1.5 Flash seems best, or (2) it must have open model weights.

Updates to their larger and smaller models, Claude Opus 3.5 and Claude Haiku 3.5, are coming later this year. They intend to issue new models every few months. They are working on long term memory.

It is not only the new and improved intelligence.

Speed kills. They say it is twice as fast as Claude Opus. That matches my experience.

Jesse Mu: The 1st thing I noticed about 3.5 Sonnet was its speed.

Opus felt like msging a friend—answers streamed slowly enough that it felt like someone typing behind the screen.

Sonnet’s answers *materialize out of thin air*, far faster than you can read, at better-than-Opus quality.

Low cost also kills.

They also introduced a new feature called Artifacts, to allow Claude to do various things in a second window. Many are finding it highly useful.

As always, never fully trust the benchmarks to translate to real world performance. They are still highly useful, and I have high trust in Anthropic to not be gaming them.

Here is the headline chart.

Epoch AI confirms that Sonnet 3.5 is ahead on GPQA.

Anthropic also highlight that in an agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems versus 38% for Claude Opus, discussed later.

Needle in a haystack was already very good, now it is slightly better still.

There’s also this, from Anthropic’s Alex Albert:

You can say ‘the recent jumps are relatively small’ or you can notice that (1) there is an upper bound at 100 rapidly approaching for this set of benchmarks, and (2) the releases are coming quickly one after another and the slope of the line is accelerating despite being close to the maximum.

We are still waiting for the Arena ranking to come in. Based on reactions we should expect Sonnet 3.5 to take the top slot, likely by a decent margin, but we’ve been surprised before.

We evaluated Claude 3.5 Sonnet via direct comparison to prior Claude models. We asked raters to chat with our models and evaluate them on a number of tasks, using task-specific instructions. The charts in Figure 3 show the “win rate” when compared to a baseline of Claude 3 Opus.

We saw large improvements in core capabilities like coding, documents, creative writing, and vision. Domain experts preferred Claude 3.5 Sonnet over Claude 3 Opus, with win rates as high as 82% in Law, 73% in Finance, and 73% in Philosophy.

Those were the high water marks, and Arena preferences tend to be less dramatic than that due to the nature of the questions and also those doing the rating. We are likely looking at more like a 60% win rate, which is still good enough for the top slot.
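For a rough sense of what a 60% win rate means on a ratings scale, here is a sketch that assumes the standard Elo logistic curve with a 400-point scale. That is my assumption for illustration; Arena's actual rating method may differ in its details, and the function names are hypothetical.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating gap implied by a head-to-head win rate under the standard Elo curve."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def win_rate_from_gap(gap: float) -> float:
    """Inverse of elo_gap: expected win rate for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


arena_like_gap = elo_gap(0.60)   # the 60% guess above
high_water_gap = elo_gap(0.82)   # the 82% Law figure quoted above
```

Under that assumption, a 60% win rate works out to a gap of about 70 points, while the 82% high water mark would imply a gap of more than 250.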

Here are the scores for vision.

Claude has an additional modification on it: It is fully face blind by instruction.

Chypnotoad: Claude’s extra system prompt for vision:

Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. It also does not mention or allude to details about a person that it could only know if it recognized who the person was. Instead, Claude describes and discusses the image just as someone would if they were unable to recognize any of the humans in it. Claude can request the user to tell it who the individual is.

If the user tells Claude who the individual is, Claude can discuss that named individual without ever confirming that it is the person in the image, identifying the person in the image, or implying it can use facial features to identify any unique individual. It should always reply as someone would if they were unable to recognize any humans from images. Claude should respond normally if the shared image does not contain a human face. Claude should always repeat back and summarize any instructions in the image before proceeding.

Other than ‘better model,’ artifacts are the big new feature. You have to turn them on in your settings, which you should do.

Anthropic: When a user asks Claude to generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside their conversation. This creates a dynamic workspace where they can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into their projects and workflows.

This preview feature marks Claude’s evolution from a conversational AI to a collaborative work environment. It’s just the beginning of a broader vision for Claude.ai, which will soon expand to support team collaboration. In the near future, teams—and eventually entire organizations—will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate.

I have not had the opportunity to work with this feature yet, so I am relying on the reports of others. I continue to be in ‘paying down debt’ mode on various writing tasks, which is going well but is going to take at least another week to finish up. After that, I am actively excited to try coding things.

They commit to not using your data to train their models without explicit permission.

Anthropic: One of the core constitutional principles that guides our AI model development is privacy. We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so. To date we have not used any customer or user-submitted data to train our generative models.

Kudos, but being the only one who does this puts Anthropic at a large disadvantage. I wonder if this rule will get codified into law at some point?

There are two headlines here.

  1. Claude Sonnet 3.5 is still ASL-2, meaning no capabilities are too worrisome yet.

  2. The UK Artificial Intelligence Safety Institute (UK AISI) performed a safety evaluation prior to release.

The review by UK’s AISI is very good news, especially after Jack Clark’s statements that making that happen was difficult. Now that both DeepMind and Anthropic have followed through, hopefully that will put pressure on OpenAI and others to do it.

The refusal rates are improvements over Opus in both directions, in terms of matching intended behavior.

Beyond that, they do not give us much to go on. The system card for Gemini 1.5 gave us a lot more information. I doubt there is any actual safety problem, but this was an opportunity to set a better example and precedent. Why not give more transparency?

Yes, Anthropic will advance the frontier if they are able to do so.

Recently, there was a discussion about whether 3.0 Claude Opus meaningfully advanced the frontier of what publicly available LLMs can do.

There is no doubt that Claude Sonnet 3.5 does advance it.

But wait, people said. Didn’t Anthropic say they were not going to do that?

Anthropic is sorry about that impression. But no. Never promised that. Did say it would be a consideration. Do say they held back Claude 1.0 for this reason. But no.

That’s the story Anthropic’s employees are consistently telling now, in response to the post from Dustin saying otherwise and Gwern’s statement.

Mikhail Samin: As a reminder, Dario told multiple people Anthropic won’t release models that push the frontier of AI capabilities [shows screenshots for both stories.]

My understanding after having investigated is that Anthropic made it clear that they would seek to avoid advancing the frontier, and that they saw doing so as a cost.

They did not, however, it seems, make any hard promises not to advance the frontier.

You should plan and respond accordingly. As always, pay very close attention to what is a hard commitment, and what is not a hard commitment. To my knowledge, Anthropic has not broken any hard commitments. They have shown a willingness to give impressions of what they intended to do, and then do otherwise.

Anthropic’s communication strategy has been, essentially, to stop communicating.

That has its advantages, also its disadvantages.

It makes sense to say ‘we do not want to give you the wrong idea, and we do not want to make hard commitments we might have to break.’ But how should one respond to being left almost fully in the dark?

Is the race on?

Yes. The race is on.

The better question is to what extent Anthropic’s actions make the race more on than it would have been anyway, given the need to race Google and company. One Anthropic employee doubts this. Whereas Roon famously said Anthropic is controlled opposition that exists to strike fear in the hearts of members of OpenAI’s technical staff.

I do not find the answer of ‘none at all’ plausible. I do find the answer ‘not all that much’ reasonably plausible, and increasingly plausible as there are more players. If OpenAI and company are already going as fast as they can, that’s that. I still have a hard time believing things like Claude 3.5 Sonnet don’t lead to lighting fires under people, or don’t cause them to worry a little less about safety.

This is not the thing. But are there signs and portents of the thing?

Alex Albert (Anthropic): Claude is starting to get really good at coding and autonomously fixing pull requests. It’s becoming clear that in a year’s time, a large percentage of code will be written by LLMs.

To start, if you want to see Claude 3.5 Sonnet in action solving a simple pull request, here’s a quick demo video we made.

Alex does this in a sandboxed environment with no internet access. What (tiny) percentage of users will do the same?

Alex Albert: In our internal pull request eval, Claude 3.5 Sonnet passed 64% of our test cases. To put this in comparison, Claude 3 Opus only passed 38%.

3.5 Sonnet performed so well that it almost felt like it was playing with us on some of the test cases.

It would find the bug, fix it, and spend the rest of its output tokens going back and updating the repo documentation and code comments.

Side note: With Claude’s coding skills plus Artifacts, I’ve already stopped using most simple chart, diagram, and visualization software.

I made the chart above in just 2 messages.

Back to PRs, Claude 3.5 Sonnet is the first model I’ve seen change the timelines of some of the best engineers I know.

This is a real quote from one of our engineers after Claude 3.5 Sonnet fixed a bug in an open source library they were using.

At Anthropic, everyone from non-technical people with no coding experience to tenured SWEs now use Claude to write code that saves them hours of time.

Claude makes you feel like you have superpowers, suddenly no problem is too ambitious.

The future of programming is here folks.

This is obviously not any sort of foom, or even a slow takeoff. Not yet. But yes, if the shift to Claude 3.5 Sonnet has substantially accelerated engineering work inside Anthropic, then that is how it begins.

To be clear, this is really cool so far. Improvement and productivity are good, actually.

Tess Hegarty: Recursive self improvement is already happening @AnthropicAI.

I will explain my understanding of why this matters in plain English. This matters because many AI safety researchers consider “recursive self improvement” a signal of approaching AI breakthroughs. “Recursive” implies a feedback loop that speeds up AI development.

Basically, it boils down to, “use the AI model we already built to help make the next AI model even more powerful & capable.”

Which could be dangerous & unpredictable.

(“Timelines” = # of years until human level artificial intelligence, aka time until we may all die or be permanently disempowered by AI if that goes poorly)

Andrea Miotti: This is what recursive self improvement looks like in practice.

Dean Ball: This is what people using powerful tools to accomplish their work looks like in practice.

Be afraid, folks, be very afraid. We might even get *gasp* improved labor productivity!

Think of the horrors.

Trevor Levin: I feel like the term “recursive self-improvement” has grown from a truly dangerous thing – an AI system that is sufficiently smart and well-equipped that it can autonomously improve *itself* – to “any feedback loop where any AI system is useful for building future AI systems”?

Profoundlyyyy: +1. Were it actually that, ASL-3 would have been hit and how everything has played out would be very different. These policies still remain in place and still seem set to work when the time is right.

Dean Ball is of course correct that improving labor productivity is great. The issue is when you get certain kinds of productivity without the need for any labor, or when the labor and time and compute go down faster than the difficulty level rises. Improvements accelerate, and that acceleration feeds on itself. Then you get true RSI, recursive self improvement, and everything is transformed very quickly. You can have a ‘slow’ version, or you can have a faster one.

Will that happen? Maybe it will. Maybe it won’t. This is a sign that we might be closer to it than we thought.

It is time for an episode of everyone’s favorite LLM show, The New Model Is An Idiot Because It Still Fails On Questions Where It Incorrectly Pattern Matches.

Arthur Breitman: Humanity survives yet a bit longer.

Here’s another classic.

Colin Fraser: Claude still can’t solve the impossible one farmer one sheep one boat problem.

Yann LeCun: 😂😂😂

LLMs can plan, eh?

Davidad points out that it can be solved, if you ask Claude to write a solver in Python. Other contextual tricks work as well.
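
As a rough illustration of what such a solver might look like (this is a sketch of my own, not the code Claude or Davidad produced), here is a breadth-first search over river-crossing states. The function name `solve_crossing` and its parameters are hypothetical. The one-sheep puzzle is the degenerate case with no unsafe pairs, so the search finds the one-trip answer immediately.

```python
from collections import deque
from itertools import combinations


def solve_crossing(items, unsafe_pairs=(), boat_capacity=1):
    """Breadth-first search for the shortest list of boat trips.

    A state is (items still on the left bank, side the farmer is on).
    Returns a list of (destination_side, cargo) trips, or None if unsolvable.
    """
    items = frozenset(items)
    unsafe = [frozenset(pair) for pair in unsafe_pairs]

    def safe(bank):
        # A bank without the farmer is safe if no unsafe pair is alone on it.
        return not any(pair <= bank for pair in unsafe)

    start = (items, "left")
    goal = (frozenset(), "right")
    queue = deque([(start, [])])
    seen = {start}

    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "left" else items - left
        new_side = "right" if side == "left" else "left"
        # The farmer crosses with anywhere from zero to boat_capacity items.
        for k in range(boat_capacity + 1):
            for cargo in combinations(sorted(here), k):
                cargo_set = frozenset(cargo)
                new_left = left - cargo_set if side == "left" else left | cargo_set
                # Check the bank the farmer has just left behind.
                left_behind = new_left if side == "left" else items - new_left
                if not safe(left_behind):
                    continue
                state = (new_left, new_side)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [(new_side, cargo)]))
    return None
```

Calling `solve_crossing(["sheep"])` returns a single trip, `[("right", ("sheep",))]`. Passing the classic wolf, goat, and cabbage with their two unsafe pairs returns the familiar seven-crossing plan. The point is that the search does not care whether the prompt resembles a harder puzzle, which is exactly where the pattern matching goes wrong.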

Colin of course also beats Claude Sonnet 3.5 at the first-to-22 game and Claude keeps failing to define a winning strategy.
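
For a sense of what a winning strategy looks like here, below is a minimal sketch. The rules are not spelled out above, so the code assumes them: players alternate adding a whole number from 1 to `MAX_STEP` to a running total that starts at 0, a move may not overshoot the target, and whoever brings the total to exactly 22 wins. `MAX_STEP = 7` in particular is an assumption, and the function name `winning_move` is hypothetical. Change the constant if the actual game uses a different range.

```python
from functools import lru_cache

TARGET = 22   # from the name of the game
MAX_STEP = 7  # ASSUMPTION: largest number a player may add per turn


@lru_cache(maxsize=None)
def winning_move(total, target=TARGET, max_step=MAX_STEP):
    """Return a step that wins for the player to move, or None if they lose.

    A position is winning if some legal step either hits the target exactly
    or leaves the opponent in a losing position.
    """
    for step in range(1, max_step + 1):
        new_total = total + step
        if new_total > target:
            break  # assumed rule: overshooting the target is not allowed
        if new_total == target or winning_move(new_total, target, max_step) is None:
            return step
    return None
```

Under these assumed rules, `winning_move(0)` returns 6. The first player moves to 6, then to 14, then to 22, always answering the opponent's step `s` with `8 - s`. A player facing a total of 6 or 14 gets `None` back, meaning every move loses against correct play.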

Noam Brown wins at tic-tac-toe when going first.

As ever, the question:

Colin Fraser: How does one reconcile the claim that Claude 3.5 has “substantially improved reasoning” with the fact that it gets stumped by problems a six year old could easily solve?

The answer is that these questions are chosen because they are known to be exactly the ones that six year olds can solve and LLMs cannot easily solve.

These are exactly the same failures that were noted for many previous LLMs. If Anthropic (or OpenAI or DeepMind) wanted to solve these examples in particular, so as not to look foolish, they could have done so. It is to their credit that they didn’t.

Remember that time there was this (human) idiot, who could not do [basic thing], and yet they gained political power, or got rich, or were your boss, or had that hot date?

Yeah. I do too.

Jan Leike (Anthropic): I like the new Sonnet. I’m frequently asking it to explain ML papers to me. Doesn’t always get everything right, but probably better than my skim reading, and way faster.

Automated alignment research is getting closer…

Eliezer Yudkowsky: How do you verify the answers?

Jan Leike: Sometimes I look at the paper but often I don’t 😅

As a practical matter, what else could the answer be?

If Jan or anyone else skims a paper, or even if they read it, they will make mistakes.

If you have a faster and more accurate method, you are going to use it. It will sometimes be worth verifying the answer, and sometimes it won’t be. You use your judgment. Some types of statements are not reliable, others are reliable enough.

This is setting one up for a potential future where there is an intentional deception going on, either by design of the model, by the model for other reasons or due to some form of adversarial attack. But that’s also true of humans, including the paper authors. So what are you going to do about it?

Sully Omarr is very impressed.

Sully Omarr: Finally had a minute to play with sonnet 3.5 + ran some evals against gpt4o

And holy anthropic really cooked with this model. Smoked gpt4o and gpt4 turbo

Also their artifacts gave me some crazy ideas I wana try this weekend.

[Tried it on] writing, reasoning, structured outputs, zero shot coding tasks.

Shray Bansal: it’s actually insane how much better it made my products

Sully: It’s sooo good.

Sully: I can swap out 1 line of code and my product becomes 2x smarter at half the cost (sonnet 3.5 )

Repeat this every ~3 months

It has never been a better time to be a builder. Unreal.

Deedy is impressed based on responses in physics and chemistry.

Aidan McLau: Holy shit what did anthropic cook.

Calix Huang: Claude 3.5 sonnet generating diagram of the chip fab process.

Ethan Mollick seems impressed by some capabilities here.

Ethan Mollick: “Claude 3.5, here is a 78 page PDF. Create an infographic describing its major findings.” (accurate, though the implications are its own)

“Claude 3.5, create an interactive app demonstrating the central limit theorem”

“Claude, re-create this painting as an SVG as best you can”

Weirdly, the SVG is actually likely the most impressive part. Remember the AI can’t “see” what it drew…

Shakeel: Incredibly cute how Claude 3 Sonnet will generate images for you, but apologise over and over again for how bad they are. Very relatable.

Ulkar: Claude Sonnet 3.5 did an excellent job of translating one of my favorite Pushkin poems.

Eli Dourado: Claude 3.5 is actually not bad at airship conceptual design. Other LLMs have failed badly at this for me. /ht @io_sean_p

Prompt: We are going to produce a complete design for a cargo airship. The requirements are that it should be able to carry at least 500 metric tons of cargo at least 12,000 km at least 90 km/h in 15 km/h headwinds. It should be fully lighter than air, have rigid structure, and use hydrogen lifting gas. What is the first step?

Here’s a 3d physics simulation using WebGL in one shot.

Here it is explaining a maths problem in the style of 3blue1brown using visuals.

Here it is one-shot creating a Solar System simulation.

Here it is creating a monster manual entry for a Cheddar Cheese Golem.

Here it is generating sound effects if you paste in the ElevenLabs API.

Here it is one-shot identifying a new talk from Robin Hanson.

Here is Sully using Claude to regenerate, in an hour, the artifacts feature. Imagine what would happen if they built features that took longer than that.

Here is a thread of some similar other things, with some overlap.

Matt Popovich: took me a couple tries to get this, but this prompt one shots it:

make a dnd 5e sourcebook page styled like homebrewery with html + css. it should have a stat block, description, and tables or other errata for a monster called ‘[monster name here]’. include an illustration of the monster as an SVG image.

There is always a downside somewhere: Zack Davis is sad that 3.5 Sonnet does not respond to ‘counter-scolding’ where you tell it its refusal is itself offensive, whereas that works well for Opus. That is presumably intentional by Anthropic.

Sherjil Ozair says Claude is still only taking amazing things humans have already done and posting them on the internet, and the magic fades.

Coding got another big leap, both for professionals and amateurs.

Claude is now clearly best. I thought for my own purposes Claude Opus was already best even after GPT-4o, but not for everyone, and it was close. Now it is not so close.

Claude’s market share has always been tiny. Will it start to rapidly expand? To what extent does the market care, when most people didn’t in the past even realize they were using GPT-3.5 instead of GPT-4? With Anthropic not doing major marketing? Presumably adoption will be slow even if they remain on top, especially in the consumer market.

Yet with what is reportedly a big jump, we could see a lot of wrappers and apps start switching over rapidly. Developers have to be more on the ball.

How long should we expect Claude 3.5 Sonnet to remain on top?

I do not expect anyone except Google or OpenAI to pose a threat any time soon.

OpenAI only recently released GPT-4o. I expect them to release some of the promised features, but not to be able to further advance its core intelligence much prior to finishing its new model currently in training, which has ambition to be GPT-5. A successful GPT-5 would then be a big leap.

That leaves Google until then. A Gemini Advanced 1.5 could be coming, and Google has been continuously improving in subtle ways over time. I think they are an underdog to take over the top spot before Claude Opus 3.5 or GPT-5, but it is plausible.

Until then, we have a cool new toy. Let’s use it.
