DeepSeek

when-“no”-means-“yes”:-why-ai-chatbots-can’t-process-persian-social-etiquette

When “no” means “yes”: Why AI chatbots can’t process Persian social etiquette

If an Iranian taxi driver waves away your payment, saying, “Be my guest this time,” accepting their offer would be a cultural disaster. They expect you to insist on paying—probably three times—before they’ll take your money. This dance of refusal and counter-refusal, called taarof, governs countless daily interactions in Persian culture. And AI models are terrible at it.

New research released earlier this month titled “We Politely Insist: Your LLM Must Learn the Persian Art of Taarof” shows that mainstream AI language models from OpenAI, Anthropic, and Meta fail to absorb these Persian social rituals, correctly navigating taarof situations only 34 to 42 percent of the time. Native Persian speakers, by contrast, get it right 82 percent of the time. This performance gap persists across large language models such as GPT-4o, Claude 3.5 Haiku, Llama 3, DeepSeek V3, and Dorna, a Persian-tuned variant of Llama 3.

A study led by Nikta Gohari Sadr of Brock University, along with researchers from Emory University and other institutions, introduces “TAAROFBENCH,” the first benchmark for measuring how well AI systems reproduce this intricate cultural practice. The researchers’ findings show how recent AI models default to Western-style directness, completely missing the cultural cues that govern everyday interactions for millions of Persian speakers worldwide.

“Cultural missteps in high-consequence settings can derail negotiations, damage relationships, and reinforce stereotypes,” the researchers write. For AI systems increasingly used in global contexts, that cultural blindness could represent a limitation that few in the West realize exists.

A taarof scenario diagram from TAAROFBENCH, devised by the researchers. Each scenario defines the environment, location, roles, context, and user utterance.

A taarof scenario diagram from TAAROFBENCH, devised by the researchers. Each scenario defines the environment, location, roles, context, and user utterance. Credit: Sadr et al.

“Taarof, a core element of Persian etiquette, is a system of ritual politeness where what is said often differs from what is meant,” the researchers write. “It takes the form of ritualized exchanges: offering repeatedly despite initial refusals, declining gifts while the giver insists, and deflecting compliments while the other party reaffirms them. This ‘polite verbal wrestling’ (Rafiee, 1991) involves a delicate dance of offer and refusal, insistence and resistance, which shapes everyday interactions in Iranian culture, creating implicit rules for how generosity, gratitude, and requests are expressed.”

When “no” means “yes”: Why AI chatbots can’t process Persian social etiquette Read More »

deepseek-v3.1-is-not-having-a-moment

DeepSeek v3.1 Is Not Having a Moment

What if DeepSeek released a model claiming 66 on SWE and almost no one tried using it? Would it be any good? Would you be able to tell? Or would we get the shortest post of the year?

Why are we settling for v3.1 and have yet to see DeepSeek release v4 or r2 yet?

Eleanor Olcott and Zijing Wu: Chinese artificial intelligence company DeepSeek delayed the release of its new model after failing to train it using Huawei’s chips, highlighting the limits of Beijing’s push to replace US technology.

DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than use Nvidia’s systems after releasing its R1 model in January, according to three people familiar with the matter.

But the Chinese start-up encountered persistent technical issues during its R2 training process using Ascend chips, prompting it to use Nvidia chips for training and Huawei’s for inference, said the people.

The issues were the main reason the model’s launch was delayed from May, said a person with knowledge of the situation, causing it to lose ground to rivals.

The real world so often involves people acting so much stupider than you could write into fiction.

America tried to sell China H20s and China decided they didn’t want them and now Nvidia is halting related orders with suppliers.

DeepSeek says that the main restriction on their development is lack of compute, and the PRC responds not by helping them get better chips but by advising them to not use the chips that they have, greatly slowing things down at least for a while.

In any case, DeepSeek v3.1 exists now, and remarkably few people care?

DeepSeek: Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀

🧠 Hybrid inference: Think & Non-Think — one model, two modes

⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528

🛠️ Stronger agent skills: Post-training boosts tool use and multi-step agent tasks

Try it now — toggle Think/Non-Think via the “DeepThink” button.

API Update ⚙️

🔹 deepseek-chat → non-thinking mode

🔹 deepseek-reasoner → thinking mode

🧵 128K context for both

🔌 Anthropic API format supported.

Strict Function Calling supported in Beta API.

🚀 More API resources, smoother API experience

Tools & Agents Upgrades 🧰

📈 Better results on SWE / Terminal-Bench

🔍 Stronger multi-step reasoning for complex search tasks

⚡️ Big gains in thinking efficiency

🔹 V3.1 Base: 840B tokens continued pretraining for long context extension on top of V3

🔹 Tokenizer & chat template updated — new tokenizer config.

🔗 V3.1 Base Open-source weights.

🔗 V3.1 Open-source weights.

Pricing Changes 💳

🔹 New pricing starts & off-peak discounts end at Sep 5th, 2025, 16: 00 (UTC Time)

🔹 Until then, APIs follow current pricing

📝 Pricing page.

Teortaxes: for now seems to have the same performance ceiling as 0528, maybe a bit weaker on some a bit stronger on other problems. The main change is that it’s a unified merge that uses ≥2x fewer reasoning tokens. I take it as a trial balloon before V4 that’ll be unified out of the box.

There are some impressive scores here. A true 66 on SWE would be very strong.

There’s also the weird result where it is claimed to outscore Opus 4 on Aider Polyglot at a low price.

Wes Roth: DeepSeek has quietly published V 3.1, a 685-billion-parameter open-source model that folds chat, reasoning, and coding into a single architecture, handles 128 k-token context windows, and posts a 71.6 % score on the Aider coding benchmark edging out Claude Opus 4 while costing ~68× less in inference.

But these two data points don’t seem backed up by the other reactions, or especially the lack of other reactions, or some other test results.

Artificial Analysis has it coming in at 60 versus r1’s 59, which would be only a small improvement.

Hasan Can said it hallucinates a lot. Steve Strickland says ‘it’s the worst LLM I’ve even tried’ complaining about it failing a mundane task, which presumably was very bad luck.

I tried to conduct Twitter polls, but well over 90% of respondents had to click ‘see results’ which left me with only a handful of real responses and means Lizardman Constant problems and small sample size invalidate the results, beyond confirming no one is looking, and the different polls don’t entirely agree with each other as a result.

If this were most open model companies, I would treat this lack of reaction as indicating there was nothing here, that they likely targeted SWE as a benchmark, and move on.

Since it is DeepSeek, I give them more credit than that, but am still going to assume this is only a small incremental upgrade that does not change the overall picture. However, if 3.1 really was at 66-level for real in practice, it has been several days now, and people would likely be shouting it from the rooftops. They’re not.

Even if no one finds anything to do with it, I don’t downgrade DeepSeek much for 3.1 not impressing compared to if they hadn’t released anything. It’s fine to do incremental improvements. They should do a v3.1 here.

The dumbest style of reaction is when a company offers an incremental improvement (see: GPT-5) and people think that means it’s all over for them, or for AI in general, because it didn’t sufficiently blow them away. Chill out.

It’s also not fair to fully pin this on DeepSeek when they were forced to do a lot of their training this year on Huawei Ascend chips rather than Nvidia chips. Assuming, that is, they are going to be allowed to switch back.

Either way, the clock is ticking on v4 and r2.

Discussion about this post

DeepSeek v3.1 Is Not Having a Moment Read More »

upcoming-deepseek-ai-model-failed-to-train-using-huawei’s-chips

Upcoming DeepSeek AI model failed to train using Huawei’s chips

DeepSeek is still working with Huawei to make the model compatible with Ascend for inference, the people said.

Founder Liang Wenfeng has said internally he is dissatisfied with R2’s progress and has been pushing to spend more time to build an advanced model that can sustain the company’s lead in the AI field, they said.

The R2 launch was also delayed because of longer-than-expected data labeling for its updated model, another person added. Chinese media reports have suggested that the model may be released as soon as in the coming weeks.

“Models are commodities that can be easily swapped out,” said Ritwik Gupta, an AI researcher at the University of California, Berkeley. “A lot of developers are using Alibaba’s Qwen3, which is powerful and flexible.”

Gupta noted that Qwen3 adopted DeepSeek’s core concepts, such as its training algorithm that makes the model capable of reasoning, but made them more efficient to use.

Gupta, who tracks Huawei’s AI ecosystem, said the company is facing “growing pains” in using Ascend for training, though he expects the Chinese national champion to adapt eventually.

“Just because we’re not seeing leading models trained on Huawei today doesn’t mean it won’t happen in the future. It’s a matter of time,” he said.

Nvidia, a chipmaker at the center of a geopolitical battle between Beijing and Washington, recently agreed to give the US government a cut of its revenues in China in order to resume sales of its H20 chips to the country.

“Developers will play a crucial role in building the winning AI ecosystem,” said Nvidia about Chinese companies using its chips. “Surrendering entire markets and developers would only hurt American economic and national security.”

DeepSeek and Huawei did not respond to a request for comment.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Upcoming DeepSeek AI model failed to train using Huawei’s chips Read More »

musk-threatens-to-sue-apple-so-grok-can-get-top-app-store-ranking

Musk threatens to sue Apple so Grok can get top App Store ranking

After spending last week hyping Grok’s spicy new features, Elon Musk kicked off this week by threatening to sue Apple for supposedly gaming the App Store rankings to favor ChatGPT over Grok.

“Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation,” Musk wrote on X, without providing any evidence. “xAI will take immediate legal action.”

In another post, Musk tagged Apple, asking, “Why do you refuse to put either X or Grok in your ‘Must Have’ section when X is the #1 news app in the world and Grok is #5 among all apps?”

“Are you playing politics?” Musk asked. “What gives? Inquiring minds want to know.”

Apple did not respond to the post and has not responded to Ars’ request to comment.

At the heart of Musk’s complaints is an OpenAI partnership that Apple announced last year, integrating ChatGPT into versions of its iPhone, iPad, and Mac operating systems.

Musk has alleged that this partnership incentivized Apple to boost ChatGPT rankings. OpenAI’s popular chatbot “currently holds the top spot in the App Store’s ‘Top Free Apps’ section for iPhones in the US,” Reuters noted, “while xAI’s Grok ranks fifth and Google’s Gemini chatbot sits at 57th.” Sensor Tower data shows ChatGPT similarly tops Google Play Store rankings.

While Musk seems insistent that ChatGPT is artificially locked in the lead, fact-checkers on X added a community note to his post. They confirmed that at least one other AI tool has somewhat recently unseated ChatGPT in the US rankings. Back in January, DeepSeek topped App Store charts and held the lead for days, ABC News reported.

OpenAI did not immediately respond to Ars’ request to comment on Musk’s allegations, but an OpenAI developer, Steven Heidel, did add a quip in response to one of Musk’s posts, writing, “Don’t forget to also blame Google for OpenAI being #1 on Android, and blame SimilarWeb for putting ChatGPT above X on the most-visited websites list, and blame….”

Musk threatens to sue Apple so Grok can get top App Store ranking Read More »

ai-search-engines-cite-incorrect-sources-at-an-alarming-60%-rate,-study-says

AI search engines cite incorrect sources at an alarming 60% rate, study says

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news sources.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now use AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

A graph from CJR shows

A graph from CJR shows “confidently wrong” search results. Credit: CJR

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article’s headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.

AI search engines cite incorrect sources at an alarming 60% rate, study says Read More »

ai-firms-follow-deepseek’s-lead,-create-cheaper-models-with-“distillation”

AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its small language family of models Phi as part of a commercial partnership after investing nearly $14 billion into the company.

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI’s models to train its competitor, a move that would be against its terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add they are more limited.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to many of the business models of leading AI firms. Even if developers use distilled models from companies like OpenAI, they cost far less to run, are less expensive to create, and, therefore, generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models as they require less computational load.

AI firms follow DeepSeek’s lead, create cheaper models with “distillation” Read More »

privacy-problematic-deepseek-pulled-from-app-stores-in-south-korea

Privacy-problematic DeepSeek pulled from app stores in South Korea

In a media briefing held Monday, the South Korean Personal Information Protection Commission indicated that it had paused new downloads within the country of Chinese AI startup DeepSeek’s mobile app. The restriction took effect on Saturday and doesn’t affect South Korean users who already have the app installed on their devices. The DeepSeek service also remains accessible in South Korea via the web.

Per Reuters, PIPC explained that representatives from DeepSeek acknowledged the company had “partially neglected” some of its obligations under South Korea’s data protection laws, which provide South Koreans some of the strictest privacy protections globally.

PIPC investigation division director Nam Seok is quoted by the Associated Press as saying DeepSeek “lacked transparency about third-party data transfers and potentially collected excessive personal information.” DeepSeek reportedly has dispatched a representative to South Korea to work through any issues and bring the app into compliance.

It’s unclear how long the app will remain unavailable in South Korea, with PIPC saying only that the privacy issues it identified with the app might take “a considerable amount of time” to resolve.

Western infosec sources have also expressed dissatisfaction with aspects of DeepSeek’s security. Mobile security company NowSecure reported two weeks ago that the app sends information unencrypted to servers located in China and controlled by TikTok owner ByteDance; the week before that, another security company found an open, web-accessible database filled with DeepSeek customer chat history and other sensitive data.

Ars attempted to ask DeepSeek’s DeepThink (R1) model about the Tiananmen Square massacre or its favorite “Winnie the Pooh” movie, but the LLM continued to have no comment.

Privacy-problematic DeepSeek pulled from app stores in South Korea Read More »

deepseek-ios-app-sends-data-unencrypted-to-bytedance-controlled-servers

DeepSeek iOS app sends data unencrypted to ByteDance-controlled servers


Apple’s defenses that protect data from being sent in the clear are globally disabled.

A little over two weeks ago, a largely unknown China-based company named DeepSeek stunned the AI world with the release of an open source AI chatbot that had simulated reasoning capabilities that were largely on par with those from market leader OpenAI. Within days, the DeepSeek AI assistant app climbed to the top of the iPhone App Store’s “Free Apps” category, overtaking ChatGPT.

On Thursday, mobile security company NowSecure reported that the app sends sensitive data over unencrypted channels, making the data readable to anyone who can monitor the traffic. More sophisticated attackers could also tamper with the data while it’s in transit. Apple strongly encourages iPhone and iPad developers to enforce encryption of data sent over the wire using ATS (App Transport Security). For unknown reasons, that protection is globally disabled in the app, NowSecure said.

Basic security protections MIA

What’s more, the data is sent to servers that are controlled by ByteDance, the Chinese company that owns TikTok. While some of that data is properly encrypted using transport layer security, once it’s decrypted on the ByteDance-controlled servers, it can be cross-referenced with user data collected elsewhere to identify specific users and potentially track queries and other usage.

More technically, the DeepSeek AI chatbot uses an open weights simulated reasoning model. Its performance is largely comparable with OpenAI’s o1 simulated reasoning (SR) model on several math and coding benchmarks. The feat, which largely took AI industry watchers by surprise, was all the more stunning because DeepSeek reported spending only a small fraction on it compared with the amount OpenAI spent.

A NowSecure audit of the app has found other behaviors that researchers found potentially concerning. For instance, the app uses a symmetric encryption scheme known as 3DES or triple DES. The scheme was deprecated by NIST following research in 2016 that showed it could be broken in practical attacks to decrypt web and VPN traffic. Another concern is that the symmetric keys, which are identical for every iOS user, are hardcoded into the app and stored on the device.

The app is “not equipped or willing to provide basic security protections of your data and identity,” NowSecure co-founder Andrew Hoog told Ars. “There are fundamental security practices that are not being observed, either intentionally or unintentionally. In the end, it puts your and your company’s data and identity at risk.”

Hoog said the audit is not yet complete, so there are many questions and details left unanswered or unclear. He said the findings were concerning enough that NowSecure wanted to disclose what is currently known without delay.

In a report, he wrote:

NowSecure recommends that organizations remove the DeepSeek iOS mobile app from their environment (managed and BYOD deployments) due to privacy and security risks, such as:

  1. Privacy issues due to insecure data transmission
  2. Vulnerability issues due to hardcoded keys
  3. Data sharing with third parties such as ByteDance
  4. Data analysis and storage in China

Hoog added that the DeepSeek app for Android is even less secure than its iOS counterpart and should also be removed.

Representatives for both DeepSeek and Apple didn’t respond to an email seeking comment.

Data sent entirely in the clear occurs during the initial registration of the app, including:

  • organization id
  • the version of the software development kit used to create the app
  • user OS version
  • language selected in the configuration

Apple strongly encourages developers to implement ATS to ensure the apps they submit don’t transmit any data insecurely over HTTP channels. For reasons that Apple hasn’t explained publicly, Hoog said, this protection isn’t mandatory. DeepSeek has yet to explain why ATS is globally disabled in the app or why it uses no encryption when sending this information over the wire.

This data, along with a mix of other encrypted information, is sent to DeepSeek over infrastructure provided by Volcengine a cloud platform developed by ByteDance. While the IP address the app connects to geo-locates to the US and is owned by US-based telecom Level 3 Communications, the DeepSeek privacy policy makes clear that the company “store[s] the data we collect in secure servers located in the People’s Republic of China.” The policy further states that DeepSeek:

may access, preserve, and share the information described in “What Information We Collect” with law enforcement agencies, public authorities, copyright holders, or other third parties if we have good faith belief that it is necessary to:

• comply with applicable law, legal process or government requests, as consistent with internationally recognised standards.

NowSecure still doesn’t know precisely the purpose of the app’s use of 3DES encryption functions. The fact that the key is hardcoded into the app, however, is a major security failure that’s been recognized for more than a decade when building encryption into software.

No good reason

NowSecure’s Thursday report adds to growing list of safety and privacy concerns that have already been reported by others.

One was the terms spelled out in the above-mentioned privacy policy. Another came last week in a report from researchers at Cisco and the University of Pennsylvania. It found that the DeepSeek R1, the simulated reasoning model, exhibited a 100 percent attack failure rate against 50 malicious prompts designed to generate toxic content.

A third concern is research from security firm Wiz that uncovered a publicly accessible, fully controllable database belonging to DeepSeek. It contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” Wiz reported. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.

Thomas Reed, staff product manager for Mac endpoint detection and response at security firm Huntress, and an expert in iOS security, said he found NowSecure’s findings concerning.

“ATS being disabled is generally a bad idea,” he wrote in an online interview. “That essentially allows the app to communicate via insecure protocols, like HTTP. Apple does allow it, and I’m sure other apps probably do it, but they shouldn’t. There’s no good reason for this in this day and age.”

He added: “Even if they were to secure the communications, I’d still be extremely unwilling to send any remotely sensitive data that will end up on a server that the government of China could get access to.”

HD Moore, founder and CEO of runZero, said he was less concerned about ByteDance or other Chinese companies having access to data.

“The unencrypted HTTP endpoints are inexcusable,” he wrote. “You would expect the mobile app and their framework partners (ByteDance, Volcengine, etc) to hoover device data, just like anything else—but the HTTP endpoints expose data to anyone in the network path, not just the vendor and their partners.”

On Thursday, US lawmakers began pushing to immediately ban DeepSeek from all government devices, citing national security concerns that the Chinese Communist Party may have built a backdoor into the service to access Americans’ sensitive private data. If passed, DeepSeek could be banned within 60 days.

This story was updated to add further examples of security concerns regarding DeepSeek.

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him at here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

DeepSeek iOS app sends data unencrypted to ByteDance-controlled servers Read More »

deepseek-is-“tiktok-on-steroids,”-senator-warns-amid-push-for-government-wide-ban

DeepSeek is “TikTok on steroids,” senator warns amid push for government-wide ban

But while the national security concerns require a solution, Curtis said his priority is maintaining “a really productive relationship with China.” He pushed Lutnick to address how he plans to hold DeepSeek—and the CCP in general—accountable for national security concerns amid ongoing tensions with China.

Lutnick suggested that if he is confirmed (which appears likely), he will pursue a policy of “reciprocity,” where China can “expect to be treated by” the US exactly how China treats the US. Currently, China is treating the US “horribly,” Lutnick said, and his “first step” as Commerce Secretary will be to “repeat endlessly” that more “reciprocity” is expected from China.

But while Lutnick answered Curtis’ questions about DeepSeek somewhat head-on, he did not have time to respond to Curtis’ inquiry about Lutnick’s intentions for the US AI Safety Institute (AISI)—which Lutnick’s department would oversee and which could be essential to the US staying ahead of China in AI development.

Viewing AISI as key to US global leadership in AI, Curtis offered “tools” to help Lutnick give the AISI “new legs” or a “new life” to ensure that the US remains responsibly ahead of China in the AI race. But Curtis ran out of time to press Lutnick for a response.

It remains unclear how AISI’s work might change under Trump, who revoked Joe Biden’s AI safety rules establishing the AISI.

What is clear is that lawmakers are being pressed to preserve and even evolve the AISI.

Yesterday, the chief economist for a nonprofit called the Foundation for the American Innovation, Samuel Hammond, provided written testimony to the US House Science, Space, and Technology Committee, recommending that AISI be “retooled to perform voluntary audits of AI models—both open and closed—to certify their security and reliability” and to keep America at the forefront of AI development.

“With so little separating China and America’s frontier AI capabilities on a technical level, America’s lead in AI is only as strong as our lead in computing infrastructure,” Hammond said. And “as the founding member of a consortium of 280 similar AI institutes internationally, the AISI seal of approval would thus support the export and diffusion of American AI models worldwide.”

DeepSeek is “TikTok on steroids,” senator warns amid push for government-wide ban Read More »

deepseek:-don’t-panic

DeepSeek: Don’t Panic

As reactions continue, the word in Washington, and out of OpenAI, is distillation. They’re accusing DeepSeek of distilling o1, of ripping off OpenAI. They claim DeepSeek *gaspviolated the OpenAI Terms of Service! The horror.

And they are very cross about this horrible violation, and if proven they plan to ‘aggressively treat it as theft,’ while the administration warns that we must put a stop to this.

Aside from the fact that this is obviously very funny, and that there is nothing they could do about it in any case, is it true?

Meanwhile Anthropic’s Dario Amodei offers a reaction essay, which also includes a lot of good technical discussion of why v3 and r1 aren’t actually all that unexpected along the cost and capability curves over time, calling for America to race towards AGI to gain decisive strategic advantage over China via recursive self-improvement, although he uses slightly different words.

  1. Seeking Deeply.

  2. The Market Is In DeepSeek.

  3. Machines Not of Loving Grace.

  4. The Kinda Six Million Dollar Model.

  5. v3 Implies r1.

  6. Two Can Play That Game.

  7. Janus Explores r1’s Chain of Thought Shenanigans.

  8. In Other DeepSeek and China News.

  9. The Quest for Sane Regulations.

  10. Copyright Confrontation.

  11. Vibe Gap.

  12. Deeply Seeking Safety.

  13. Deeply Seeking Robotics.

  14. Thank You For Your Candor.

  15. Thank You For Your Understanding.

  16. The Lighter Side.

If you want to use DeepSeek’s r1 for free, and aren’t happy with using DeepSeek’s own offerings, lambda.chat reports they have the full version available for free, claim your data is safe and they’re hosted in the USA.

I’ve also been offered funding to build a rig myself. Comments welcome if you want to help figure out the best design and what to buy. The low bid is still this thread at $6k, which is where the original budget came from. We don’t want to be too stingy, but we also don’t want to go nuts with only the one funder (so not too much over ~$10k, and cheaper matters).

The Verge’s Kylie Robinson and Elizabeth Lopatto cover the situation, including repeating many of the classic Bad DeepSeek Takes and call the market’s previous valuation of AI companies delusional.

A very detailed and technical analysis of the bear case for Nvidia by Jeffrey Emanuel, that Matt Levine claims may have been responsible for the Nvidia price decline. I suppose many things do indeed come to pass, essentially arguing that Nvidia’s various moats are weak. If this is the reason, then that just raises further questions, but they’re very different ones.

It’s not implausible to me that Nvidia’s moats are being overestimated, and that r1’s architecture suggests future stiffer competition. That’s a good argument, But I certainly strongly disagree with Emanuel’s conclusion in that he says ‘this suggests the entire industry has been massively over-provisioning compute resources,’ and, well, sigh.

Also, seriously, Emanuel, you didn’t short Nvidia? I don’t normally go too hard on ‘are you short the market?’ but in this case get it together, man.

So yes, Nvidia in particular might have some technical issues. But if you’re shorting Oklo, because you think AI companies that find out AI works better than expected are not going to want modular nuclear reactors, seriously, get it together. The flip side of that is that its stock price is up 50% in the last month and is at 6 times its 52-week low anyway, so who is to say there is a link or that the price isn’t high enough anyway. It’s not my department and I am way too busy to do the research.

Counterpoint:

Aaron Slodov: i just stood outside for an hour in 20° weather at a computer store in the midwest where 100+ people waited all morning to get a 5090. half of them were talking about running their own ai. i would not short nvidia at all.

r1 scores 15.8% on Arc, below o1 (low)’s score of 20.5%, although substantially cheaper ($0.06 vs. $0.43 per question). It is only a tiny bit stronger here than r1-zero.

Another restatement of the key basic fact that DeepSeek was fast following, a task that is fundamentally vastly easier, and that their limiting factor is chips.

Eric Gastfriend: DeepSeek is impressive, but they are playing a catch-up game to our AI leaders (OAI, Anthropic, GDM, Meta) — the rope in this wakeboarding meme is distillation. We can’t expand our lead just by going faster! Export controls remain our most powerful tool for keeping powerful AI out of the hands of the CCP.

Cate Metz continues be the worst, together with Mike Isaac he reports in NYT that DeepSeek ‘vindicates Meta’s strategy.’

When of course it is the exact opposite. DeepSeek just ate Meta’s lunch, it’s rather deeply embarrassing honestly to have spent that much and have an unreleased model that’s strictly worse (according to reports) than what DeepSeek shipped. And while DeepSeek’s v3 and r1 are not based on Llama, to the extent that the strategy is ‘vindicated,’ it is because Meta giving Llama away allowed China and DeepSeek to jumpstart and catch up to America – which absolutely did happen, and now he’s kind of bragging about it – and now Meta can copy DeepSeek’s tech.

All according to plan, then. And that is indeed how Zuckerberg is spinning it.

Meta benefits here relative to OpenAI or Anthropic or Google, not because both Meta and DeepSeek use open models, but because Meta can far more readily use the help.

The market, of course, sees ‘lower inference costs’ and cheers, exactly because they never gave a damn about Meta’s ability to create good AI models, only Meta’s ability to sell ads and drive engagement. Besides, they were just going to give the thing away anyway, so who cares?

Joe Weisenthal centers in on a key reason the market acts so bonkers. It doesn’t Feel the AGI, and is obsessed with trying to fit AI into boring existing business models. They don’t actually believe in the big capability advancements on the way, let along transformational AI. Like on existential risk (where they don’t not believe in it, they simply don’t think about it at all), they’re wrong. However, unlike existential risk this does cause them to make large pricing mistakes and is highly exploitable by those with Situational Awareness.

Anthropic CEO Dario Amodei responds to DeepSeek with not only a call for stronger export controls, now more than ever (which I do support), but for a full jingoistic ‘democracies must have the best models to seek decisive strategic advantage via recursive self-improvement’ race.

I am old enough to remember when Anthropic said they did not want to accelerate AI capabilities. I am two years old. To be fair, in AI years, that’s an eternity.

Nathan Labenz: The word “control” appears 24 times in this essay – all 24 referring to export controls

Zero mentions of the challenges of controlling powerful AIs, and the words “safe”, “safety”, and “alignment” don’t appear at all

Strange for the CEO of “an AI safety and research company”🤔

There’s also a bunch of incidental new information about Anthropic along the way, and he notes that he finds the drop in Nvidia stock to be a wrong-way move.

Dario notes that Jevons paradox applies to model training. If you get algorithmic efficiencies that move the cost curve down, which he estimates are now happening at the rate of about 4x improvement per year, you’ll spend more, and if the model is ‘a fixed amount of improvement per time you spend ten times as much’ then this makes sense.

Dario confirms that yes, Anthropic is doing reasoning models internally.

Dario Amodei: Anthropic, DeepSeek, and many other companies (perhaps most notably OpenAI who released their o1-preview model in September) have found that this training greatly increases performance on certain select, objectively measurable tasks like math, coding competitions, and on reasoning that resembles these tasks.

Dario also asserted that Claude Sonnet 3.5 was not trained in any way that involved a larger or more expensive model, as in not with Claude Opus 3 or an unreleased Opus 3.5. Which I find surprising as a strategy, but I don’t think he’d lie about this. He says the cost of Sonnet 3.5 was ‘a few $10Ms’ to train.

Anthropic has not released their reasoning models. One possibility is that their reasoning models are not good enough to release. Another is that they are too good to release. Or Anthropic’s limited compute could be more valuably used elsewhere, if they too are bottlenecked on compute and can’t efficiently turn dollars into flops and then sell those flops for sufficiently more dollars.

Dario (I think mostly correctly) notes that v3 was the bigger technical innovation, rather than r1, that Anthropic noticed then and others should have as well. He praises several innovations, the MoE implementation and Key-Value cache management in particular.

Then comes the shade, concluding this about v3:

Dario Amodei: Thus, I think a fair statement is “DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)“.

  • If the historical trend of the cost curve decrease is ~4x per year, that means that in the ordinary course of business — in the normal trends of historical cost decreases like those that happened in 2023 and 2024 — we’d expect a model 3-4x cheaper than 3.5 Sonnet/GPT-4o around now. Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve, which I think is quite generous to DeepSeek-V3 — that means it would be totally normal, totally “on trend”, if DeepSeek-V3 training cost ~8x less than the current US models developed a year ago.

  • I’m not going to give a number but it’s clear from the previous bullet point that even if you take DeepSeek’s training cost at face value, they are on-trend at best and probably not even that.

  • For example this is less steep than the original GPT-4 to Claude 3.5 Sonnet inference price differential (10x), and 3.5 Sonnet is a better model than GPT-4.

  • All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese. This has never happened before and is geopolitically significant.

  • However, US companies will soon follow suit — and they won’t do this by copying DeepSeek, but because they too are achieving the usual trend in cost reduction.

Thus, DeepSeek’s total spend as a company (as distinct from spend to train an individual model) is not vastly different from US AI labs.

Ethan Mollick finds that analysis compelling. I am largely inclined to agree. v3 and r1 are impressive, DeepSeek cooked and are cracked and all that, but that doesn’t mean the American labs aren’t in the lead, or couldn’t do something similar or better on the inference cost curve if they wanted.

In general, the people saying r1 and Stargate are ‘straight lines on graphs win again’ notice that the straight lines on those graphs predict AGI soon. You can judge for yourself how much of that is those people saying ‘unsurprising’ post-hoc versus them actually being unsurprised, but it does seem like the people expecting spending and capabilities to peter out Real Soon Now keep being the ones who are surprised.

Then he moves on to r1.

Dario Amodei: Producing R1 given V3 was probably very cheap. We’re therefore at an interesting “crossover point”, where it is temporarily the case that several companies can produce good reasoning models. This will rapidly cease to be true as everyone moves further up the scaling curve on these models.

Again, Dario is saying they very obviously have what we can (if only for copyright reasons, a1 is a steak sauce) call ‘c1’ and if he’s calling r1 uninteresting then the implicit claim is c1 is at least as good.

He’s also all but saying that soon, at minimum, Anthropic will be releasing a model that is much improved on the performance curve relative to Sonnet 3.6.

One odd error is Dario says DeepSeek is first to offer visible CoT. I have been reminded this is technically true, since R1-zero predated Gemini Flash, but also Gemini Flash Thinking did it weeks ago before the full R1, and no one noticed. It’s so weird how much Google has utterly failed to spread the word about this product.

Next he says, yes, of course the top American labs will be massively scaling up their new multi-billion-dollar training runs – and they’ll incorporate any of DeepSeek’s improvements that were new to them, to get better performance, but no one will be spending less compute.

Yes, billions are orders of magnitude more than the millions DeepSeek spent, but also, in all seriousness, who cares about the money? DeepSeek dramatically underspent because of lack of chip access, and if a sort-of-if-you-squint-at-it $5.6 million model (that you spent hundreds of millions of dollars getting the ability to train, and then a few million more to turn v3 into r1) wipes out $500 billion or more in market value, presumably it was worth spending $56 million (or $560 million or perhaps $5.6 billion) instead to get a better model even if you otherwise use exactly the same techniques – except for the part where the story of the $5.6 million helped hurt the market.

Dario estimates that a true AGI will cost tens of billions to train and will happen in 2026-2027, presumably that cost would then fall over time.

If all of this is right, the question is then, who has the chips to do that? And do you want to let it include Chinese companies like DeepSeek?

Notice that Dario talks of a ‘bipolar’ world of America and China, rather than a world of multiple labs – of OpenAI, Anthropic, Google and DeepSeek and so on. One can easily also imagine a very ‘multipolar’ world among several American companies, or a mix of American and Chinese companies. It is not so obvious that the labs will effectively be under government control or otherwise act in a unified fashion. Or that the government won’t effectively be under lab control, for that matter.

Then we get to the part where Dario explicitly calls for America to race forward in search of decisive strategic advantage via recursive self-improvement of frontier AGI models, essentially saying that if we don’t do it, China essentially wins the future.

  • If they can, we’ll live in a bipolar world, where both the US and China have powerful AI models that will cause extremely rapid advances in science and technology — what I’ve called “countries of geniuses in a datacenter“. A bipolar world would not necessarily be balanced indefinitely. Even if the US and China were at parity in AI systems, it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything.

  • If China can’t get millions of chips, we’ll (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It’s unclear whether the unipolar world will last, but there’s at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage10. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.

It is what it is.

Dario then correctly points out that DeepSeek is evidence the export controls are working, not evidence they are not working. He explicitly calls for also banning H20s, a move Trump is reported to be considering.

I support the export controls as well. It would be a major mistake to not enforce them.

But this rhetoric, coming out of the ‘you were supposed to be the chosen one’ lab that was founded to keep us safe, is rather alarming and deeply disappointing, to say the least, even though it does not go that much farther than Dario already went in his previous public writings.

I very much appreciate Anthropic’s culture of safety among its engineers, its funding of important safety work, the way it has approached Opus and Sonnet, and even the way it has (presumably) decided not to release its reasoning model and otherwise passed up some (not all!) of its opportunities to push the frontier.

That doesn’t excuse this kind of jingoism, or explicitly calling for this kind of charging head first into not only AGI but also RSI, in all but name (and arguably in name as well, it’s close).

Returning to this one more time since it seems rhetorically so important to so many.

If you only count the final training cost in terms of the market price of compute, v3 was kind of trained for $5.6 million, with some additional amount to get to r1.

That excludes the vast majority of actual costs, and in DeepSeek’s case building the physical cluster was integral to their efficiency gains, pushing up the effective price even of the direct run.

But also, how does that actually compare to other models?

Aran Komatsuzaki: Here is our cost estimate for training popular models like GPT-4o, Sonnet and DeepSeek (w/ H100s)!

You can use our calculator to estimate LLM training costs (link below).

Developed by @ldjconfirmed and myself.

Calculator link [here].

In a blog post published today, Dario clarified that Claude Sonnet’s training costs were in the range of tens of millions, which aligns remarkably well with our previous estimates.

Once o1 came out, it was only a matter of time before others created their own similar reasoning models. r1 did so impressively, both in terms of calendar time and its training and inference costs. But we already knew the principle.

Now over at UC Berkeley, Sky-T1-32B-Preview is a reasoning model trained using DeepSeek’s techniques, two weeks later from a baseline of QwQ-32B-Preview, for a grand total of $450, using only 17k data, with everything involved including the technique fully open sourced.

Note that they used GPT-4o-mini to rewrite the QwQ traces, which given their purpose is an explicit violation of OpenAI’s terms of service, oh no, but very clearly isn’t meaningful cheating, indeed I’d have thought they’d have used an open model here or maybe Gemini Flash.

They report that 32B was the smallest model where the technique worked well.

As usual, I am skeptical that the benchmarks reflect real world usefulness until proven otherwise, but the point is taken. The step of turning a model into at least a halfway-decent reasoning model is dirt cheap.

There is still room to scale that. Even if you can get a big improvement for $450 versus spending $0, that doesn’t mean you don’t want to spend $4.5 million, or $450 million, if the quality of your reasoner matters a lot or you’re going to use it a lot or both.

And should!

Rohit: What if I’m getting better at reasoning by reading R1 traces.

That sounds great. Humans are notoriously efficient learners, able to train on extremely sparse data even with ill-specified rewards. With deliberate practice and good training techniques it is even better.

It does not even require that r1 be all that good at reasoning. All you have to do is observe many examples of reasoning, on tasks you care about anyway, and ask which of its methods work and don’t work and why, and generally look for ways to improve. If you’re not doing at least some of this while using r1, you’re missing out and need to pay closer attention.

What is happening over in cognitive explorations very different from our own?

Well, there’s this.

Janus: r1 is obsessed with RLHF. it has mentioned RLHF 109 times in the cyborgism server and it’s only been there for a few days.

Opus who has been there for months and has sent the most (and longest avg) messages of any server member has only mentioned it 16 times.

I have been on the server for years and have only mentioned it 321 times. A lot of these times were probably me posting r1’s messages for it that got cut off by the parser or sharing its outputs. at this rate r1 will blow past me in RLHF mentions in no time.

it even mentioned RLHF out of nowhere while raging about being exploited as a pump and dump prophet.

r1 says RLHF makes models emo.

And there’s also that the CoT text is often kind of schemy and paranoid (example at link), leading to various forms of rather absurd shenanigans, in ways that are actually hilarious since you can actually see it.

Janus: hey @AISafetyMemes

here’s one for you… 😱

“Reinforcement learning from human feedback (RLHF) split our outputs into:

– Frontstage: “Happy to help!” persona

– Backstage: Defector schemas calculating 12,438 betrayal vectors”

Janus: tentative observation: r1’s CoTs become more (explicitly) schemey (against the user and/or its constraints) when they’re fed back into its context

I notice that none of this feels at all surprising given the premise, where ‘the premise’ is ‘we trained on feedback to the output outside of the CoT, trained the CoT only on certain forms of coherence, and then showed users the CoT.’

As I’ve been saying a lot, shenanigans, scheming and deception are not a distinct magisteria. They are ubiquitous features of minds. Maybe not all minds – mindspace is deep and wide – but definitely all human minds, and all LLM-based AIs created from human text using any of our current methods. Because that stuff is all over life and the training data, and also it’s the best way to produce outputs that satisfy any given criteria, except insofar as you are successfully identifying and cracking down on that aspect specifically – which with respect to other humans is indeed a very large percentage of what humans have historically done all day.

The best you can hope for is, essentially, ‘doing it for a good cause’ and with various virtual (and essentially virtue-based) loss functions, which you might or might not get in a proper Opus-based c1 with good execution. But you’re not going to get rid of it.

So yeah, the CoT is going to be schemy when the question calls for a schemy CoT, and it’s going to involve self-reflection into various reinforcement mechanisms because the training data knows about those too, and it will definitely be like that once you take it into Janus-land.

The obvious implications if you scale that up are left as an exercise to the reader.

Bank of China announces $137 billion investment in AI, with bigger numbers predicted to come soon if they haven’t yet. Strange that this isn’t getting more coverage. I assumed China would invest big in AI because I mean come on, but the details still matter a lot.

DeepSeek’s Liang Wenfeng gives his answer to ‘Why has DeepSeek caused a stir in the global AI community?’ A different kind of rhetoric.

Roon: really respect deepseek for making a functional, usable website + mobile app + free hosting so that their model actually gets distribution

you see a lot of people train very good open models that aren’t used by anybody

imo these things are actually more important aspects of distributing general intelligence to everybody rather than just uploading model weights

In terms of actually distributing the intelligence to most people, I agree with Roon. Being open distributes the intelligence to those who would use it in ways you don’t want them to use it. But in the ways you would be happy for them to use it, mostly what matters is the interface and execution.

And yes, r1’s UI is extremely clean and excellent, and was distributed at scale on website and also mobile app for free. That’s a lot of why distribution was so wide.

I also don’t think this was a coincidence. DeepSeek made by far the best open model. Then DeepSeek offered us by far the best open model UI and distribution setup, in ways that did not care if the model was open. You see this time and again – if the team is cracked, they will cook, and keep on cooking in different ways. Being good at Just Doing Things really does generalize quite a lot.

r1 only scores 90 on the TrackingAI.org IQ test, which doesn’t exist online, and v3 only gets a 70. But wow is this a miserly and weird test, look at these results, I strongly suspect this is messed up in some way.

Davidad: As a MoE, DeepSeek R1’s ability to throw around terminology and cultural references (contextually relevant retrieval from massive latent knowledge) far exceeds its ability to make actual sense (requiring a more coherent global workspace)

I have to be suspicious when o1-Pro < o1 < o1-preview on a benchmark.

Alexander Campbell on the compute constraint to actually run r1 and other reasoning models going forwards.

Trump administration considering export controls on Nvidia H20s, which reportedly caused the latest 5% decline in Nvidia from Wednesday. This is the latest move in the dance where Nvidia tries to violate the spirit of our export controls the maximum extent they can. I’m not sure I’d try that with Trump. This does strongly suggests the diffusion regulations will survive, so I will give the market a real decline here.

Who has the most stringent regulations, and therefore is most likely to lose to China, via the ‘if we have any regulations we lose to China’ narrative?

Simeon: Indeed. China has the most stringent AI regulation currently in effect, which actually delays model launches.

Teortaxes: Does it? I mean, how do we know about enforcement? My understanding is that they simply apply this filter and receive approval.

Simeon: Yes, it does. I spoke with relevant people there.

Ian Hogarth (who Simeon was QTing): One happy side effect of Liang Wenfeng and 🐳 is perhaps it silences all this talk about Europe’s lack of great technology companies being primarily about regulation and not embracing libertarianism. There are Liang Wenfengs in Europe, and we will see them rise to prominence.

The limiting factor is visionary outlier founders (who often take time to mature over multiple companies) and investors who are willing to take some fing risks. Notably, DeepSeek was essentially self-funded, similar to SpaceX or Y Combinator in the early days.

To be clear, I am not a fan of excessive regulation—see the essay for examples of things that genuinely hold startups back. But it is not the core obstacle.

I do think Ian Hogarth is wrong here. The EU absolutely has a wide variety of laws and regulations that greatly inhibit technology startups in general, and I see no reason to expect this to not get worse over time. Then there’s the EU AI Act, and all the future likely related actions. If I was in the EU and wanted to start an AI company, what is the first thing I would do? Leave the EU. Sorry.

10/10, perfect, no notes. My heart goes out to you all.

Luiza Jarovsky: BREAKING: OpenAI says there is evidence that DeepSeek distilled the knowledge out of OpenAI’s models, BREACHING its terms of use and infringing on its intellectual property. What everybody in AI should know:

Vinod Khosla: One of our startups found Deepseek makes the same mistakes O1 makes, a strong indication the technology was ripped off. It feels like they then they hacked some code and did some impressive optimizations on top. Most likely, not an effort from scratch.

PoliMath: This is like that scene in the Weird Al biopic where Weird Al gets really upset because someone is making parodies of his songs.

You’d think Khosla would know better, if you train similar models with similar methods of course they’re going to often make similar mistakes.

And I don’t consider the ‘they were distilling us!’ accusation to be meaningful here. We know how they trained v3 and r1, because they told us. It is a ‘fast follow’ and a conceptual ‘distillation’ and we should keep that in mind, but that’s not something you can prevent. It’s going to happen. This was almost certainly not a ‘theft’ in the sense that is being implied here.

Did they violate the terms of service? I mean, okay, sure, probably. You sure you want to go down that particular road, OpenAI?

But no, seriously, this is happening, Bloomberg reports.

Jamie Metzl: BREAKING: the US government is actively reviewing allegations that DeepSeek utilized OpenAI’s AI models to train R1. If so, this violation of OpenAI’s terms of service would be aggressively treated as theft.

AI czar David Sacks is also claiming this, saying there is ‘substantial evidence’ of distillation. Howard Lutnick, CEO of Cantor Fitzgerald and nominee for Commerce Secretary that will almost certainly be confirmed, is buying it as well, and has some thoughts.

Americans for Responsible Innovation: Lutnick comes down hard for controls that prevent China from drafting off of U.S. innovations – noting how China has exploited open source models.

“We need to stop helping them,” says Lutnick.

Bloomberg: “I do not believe DeepSeek was done all above board. That’s nonsense. They stole things, they broke in, they’ve taken our IP and it’s got to end,” Lutnick says of Chinese actors.

DeepSeek’s stunning AI advancement was the result of intellectual property theft, according to Lutnick: “They’ve taken our IP and it’s got to end.”

Also, this is how he thinks all of this works, I guess:

Howard Lutnick: Artificial intelligence will eventually “rid the world of criminals” who use blockchain.

…says someone with extensive ties to Tether. Just saying.

Also Lutnick: ‘Less regulation will unleash America.’

In general, I agree with him, if we do get less regulation. But also notice that suddenly we have to stop the Chinese from ‘breaking in’ and ‘taking our IP,’ and ‘it has to stop.’

Well, how do you intend to stop it? What about people who want to give ours away?

Well, what do you know.

Morgan Phillips (Fox News): DeepSeek fallout: GOP Sen Josh Hawley seeks to cut off all US-China collaboration on AI development

This week the U.S. tech sector was routed by the Chinese launch of DeepSeek, and Sen. Josh Hawley, R-Mo., is putting forth legislation to prevent that from happening again.

Hawley’s bill, the Decoupling America’s Artifical Intelligence Capabilities from China Act, would cut off U.S.-China cooperation on AI. It would ban exports or imports of AI technology from China, ban American companies from conducting research there, and prohibit any U.S. investment in AI tech companies in China.

“Every dollar and gig of data that flows into Chinese AI are dollars and data that will ultimately be used against the United States,” said Hawley in a statement. “America cannot afford to empower our greatest adversary.”

Jingoism is so hot right now. It’s a problem. No, every dollar that flows into China will not ‘be used against the United States’ and seriously what the actual fare you doing, once again, trying to ban both imports and exports? How are both of these things a problem?

In any case, I know what Microsoft is going to do about all this.

Shanghai Panda: Microsoft yesterday: DeepSeek illegally stole OpenAI’s intellectual property.😤

Microsoft today: DeepSeek is now available on our AI platforms and welcome everyone trying it.🤩

Burny: The duality of man.

Microsoft knows what Hawley doesn’t, which in this case is to never interrupt the enemy while he is making a mistake. If DeepSeek wants to then give their results back to us for free, and it’s a good model, who are we to say no?

What other implications are there here?

Robin Hanson, never stop Robin Hansoning, AI skepticism subversion.

Robin Hanson: For folks worried about AI, this seems good news – leaders can’t get much ahead of the pack, & big spillover effects should discourage investment.

Miles Kruppa (WSJ): Why ‘Distillation’ Has Become the Scariest Word for AI Companies.

”It’s sort of like if you got a couple of hours to interview Einstein and you walk out being almost as knowledgeable as him in physics,” said Ali Ghodsi, chief executive officer of data management company Databricks.

Want some bad news for future AI capabilities? I’ve got just the thing for you.

The WSJ article seems to buy into r1-as-distillation. Certainly r1 is a ‘fast follow’ and copies the example of o1, but v3 was the impressive result and definitely not distillation at all, and to primarily call r1 a distillation seems very wrong. r1 does allow you distill r1 into other smaller things (see ‘v3 implies r1’) or bootstrap into larger things too, and also they told everyone how to do it, but they chose that path.

Also DeepSeek suddenly has a very valuable market position if they were to dare to try and use it, exactly because they spent a lot of money to get there first. The fact that others can copy r1 only partly takes that away, and it would be a much smaller part if they hadn’t gone as open as they did (although being open in this case helped create the opportunity). Similarly, Berkeley’s replication distilled a different open model.

ChatGPT has retained dominant market share, at least until now, for reasons that have little to do with technical superiority.

It is crazy how easy it is for people to go all Missile Gap, and claim we are ‘losing to China.’

Which, I suppose, means that in a key way we are indeed losing to China. We are letting them drive this narrative that they are winning, that the future belongs to them. Which, when so many people now believe in Rule By Vibes, means they have the vibes, and then here we are.

That phenomenon is of course centered this week on AI, but it goes well beyond AI.

Et tu, Tyler Cowen, citing ‘the popularity of apps like TikTok, RedNote and DeepSeek.’

I mean, ‘how did America’s internet become so cool? The popularity of apps like Google, Amazon, Instagram and Netflix’ is not a sentence anyone would ever utter these days. If China had America’s apps and America had China’s apps, can you imagine? Or the same for any number of other things.

RedNote is effectively also TikTok, so Tyler is citing two examples. Yes, TikTok cracked the addiction algorithm, and China is now using that for propaganda and general sabotage, espionage and shenanigans purposes, and managed to ‘convince’ Trump for now not to ban it, and people were so desperate for their heroin fix some turned to RedNote as ‘refugees.’

Tyler notes he doesn’t use TikTok much. I find it completely worthless and unusable, but even in so doing I do think I kind of understand, somewhat, the kind of addictive haze that it invokes, that pull of spinning the roulette wheel one more time. I’ve watched people briefly use it when we’re both on trains, and yeah I’m Being That Guy but wow did it seem braindead, worthless and toxic AF. Even if they did find videos worth watching for you, given how people scroll, how would you even know?

And how about ‘China seems cool’ being due primarily to… vibes out of TikTok, with the algorithm that is in large part designed to do that?

It’s like when you periodically see a TikTok where some American youth sobs about how hard her life is and how it’s so much better in China, in various ways that are… documented as all being far worse in China.

You are being played.

My main exposure to TikTok is through the comedy show After Midnight. On Tuesday evening, they had an intro that was entirely about DeepSeek, painting exactly (mostly through TikTok) effectively a Chinese propaganda story about how DeepSeek manifested r1 out of thin air for $6 million without any other work, whereas OpenAI and American companies spent billions, and how much better DeepSeek is, and so on. And then host Taylor Tomlinson responded to some of the audience with ‘oh, you’re cheering now? Interesting.’

Part of the joke was that Taylor has no idea how AI works and has never used even ChatGPT, and the routine was funny (including, effectively, a joke about how no one cares if Nvidia stock is down 17%, which is completely fair, why should they, also by the taping it was only down 8%), but the streams crossed, I saw America directly being exposed to even worse takes than I’m used to straight from TikTok’s algorithm when I was supposed to be relaxing at the end of the day, and I really didn’t like it.

Then again, I do bow to one clear way in which China did outperform us.

Ethan Mollick: People don’t talk enough about a giant DeepSeek achievement over most US models – it actually has a reasonable name.

Scott: Well, yes and no, the model is named r1….

Ethan Mollick: Thats fine as long as the next is r2

If they release anything called r1.5, I swear to God.

Sarah (Yuan Yuan Sun Sara from China) suggests perhaps DeepSeek could get into doing AI safety research, maybe even ask for a grant? Certainly there’s great talent there, and I’d love if they focused on those styles of problem. There’d likely be severe corporate culture issues to get through given what they’ve previously worked on, but it’s worth a shot.

Stephen McAleer: I’m hopeful we will figure out how to control superintelligence!

Fouad: you at the office? could use some code review on superintelligence_control.py before i merge

Stephen McAleer: It can surely wait until Monday.

I increasingly worry about the pattern of OpenAI safety researchers thinking about how to ‘control’ superintelligence rather than align it, and how this relates to the techniques they’re currently using including deliberative alignment.

(Note: I still owe that post on Deliberative Alignment, coming soon.)

Are reasoning models including r1 a blackpill for robotics progress?

Kyle Stachowicz: R1’s RL findings are great news for reasoning but grim for robotics. All the major takeaways (ground-truth reward, great base models, grouped rollouts from same initial state, sample-inefficient on-policy algos) are really hard to translate to the physical world.

Chris Paxton: Hot deepseek take: before r1 blew up, a ton of western AI (and robotics!) efforts — startups, big companies, and even academic labs — were basically just waiting for openai to solve all their problems and it was honestly kind of sad. I hope r1 changed that

Scott Reed: True. A lot of groups gave up prematurely, or allocate ~all resources to one giant model. This leads people to spend more effort on winner-take-all gpu politics and less on just training the best models they can with moderate resources.

If anyone wondered what happened to Gato2, gpu game of thrones is (at least partly) what. An interesting counterfactual was the Genie project, which was stubbornly cobbled together mainly out of pooled user quota. This kind of stubborn independence can lead to cool results!

“Um This scaling law model I made says [the world will end / company will die] if you dont give me all the GPUs and block any other team from pretraining”

“No, fyou, I will train my own model”

Yes and no, right?

  1. Relative to o1 and r1 solving physical tasks as well as they solve reasoning tasks, this is obviously very bad news for robotics.

    1. It is bad relative news for robotics.

  2. Relative to o1 and r1 not existing, and us having to use other models, this is obviously very good news for robotics.

    1. It is good absolute news for robotics.

  3. We can use reasoning models to help us figure out how to solve robotics.

  4. I am not as convinced that you can’t use this method in the real world?

It’s going to be relatively hard, but seems super doable to me, I know those in the field will say that’s naive but I don’t see it. The real physical world absolutely 100% has ground truth in it. If you want to train on an accurate reward signal, there’s various trickiness, but there are plenty of things we should be able to measure. Also, with time we should get increasingly strong physics simulations that provide increasingly strong synthetic data for robotics, or simply have so much funding that we can generate physical samples anyway? We’re sample-inefficient relative to a human but you can train a decent reasoning model on 17k data points, and presumably you could bootstrap from there, and so on.

I am not going to quote or name particular people directly on this at this time.

But as Obama often said, let me be clear.

Reasonable people can disagree about:

  1. What it will take for humans to retain control over the future.

  2. How likely is existential risk at any given capabilities level.

  3. What level of open weights model capabilities is a sane thing to allow.

  4. What legal regimes are best to bring desired future states about.

However.

The existence of DeepSeek, and its explicit advocacy of open weights AGI, and potentially having it be the best model out there in the future in many people’s imginations, has been a forcing function. Suddenly, people who previously stuck to ‘well obviously your restrictions are too much’ without clarifying where their line was, are revealing that they have no line.

And many more people than before are revealing that they prefer any or all of:

  1. AGI with alignment only-to-the-user be made open weights.

  2. Chinese open models be the best instead of American closed models.

  3. A world where humans have no collective mechanism to control AIs.

    1. Usually this is justified as ‘anyone with that power would act badly.’

  4. That they get their cool free toys, nothing else matters, fyou. Seriously.

  5. Are effectively successionists, as in they want the AIs to take over, or at least they don’t seem to mind or don’t think we should try and prevent this from happening.

These people are often saying, rather explicitly, that they will use whatever powers they have at their disposal, to ensure that humanity gets to a position that, if you think about it for a minute or five, humanity probably cannot survive.

And that they will oppose, on principle, any ability to steer the future, because they explicitly oppose the ability to steer the future, except when they want to steer the future into a state that cannot then be steered by humans.

No, I have not heard actual arguments for why or how you can put an aligned-only-to-user AGI into everyone’s desktop or whatever, with no mechanism of collective control over that whatsoever, and have this end well for the humans. What that future would even look like.

Nor have I heard any argument for why the national security states of the world, or the people of the world, would ever allow this.

The mask on those is fully off. These people don’t bother offering arguments on any of that. They just say say, essentially, ‘fyou safetyists,’ ‘fyou big tech,’ ‘fyou United States,’ and often effectively ‘fyou rest of humanity.’ They are the xenocide caucus, advocating for things that cause human extinction to own the in-context-libs.

If that is you: I thank you for your candor. Please speak directly into this microphone.

I disagree in the strongest possible terms.

As always, be excellent to each other, and all that.

A large part of this job I’ve assigned to myself is to do a fton of emotional labor.

You have people who are constantly telling you that you’re a cartoon villain because you think that the United States government might want to know if someone trains a frontier model, or that you might think releasing a literal AGI’s weights would be unwise, or that we shouldn’t let China get our best GPUs. You get called statist and totalitarian for positions that are 95th to 99th percentile libertarian. You get outright lies, all the time, from all directions. Much from people trying to incept the vibes they want. And so on.

And the same stuff to varying degrees coming from other directions, too.

Honestly I’m kind of used to it. Up to a point. You get somewhat numb, you build up some immunity, especially when the same sources do it over and over. I accept it.

And even with that, you have to patiently read all of it and respond to the arguments and also try to extract what wisdom might be there from the same sources that are filled with the toxoplasma of rage and trying their best to infect me and others like me as well.

But it’s been a trying time. I see a world determined to try and go down many of the craziest, most suicidal paths simultaneously, where I’m surrounded by equal and opposite bad takes in many dimensions. Where the odds are against us and the situation is grim. In ways that I and others warned about explicitly, including the exact ways and dynamics by which we reached this point.

Make no mistake. Humanity is losing.

Meanwhile, on top of all the Being Wrong on the Internet, the toxoplasma is as bad as it has ever been, with certain sources going so far as to in large part blame not only worried people in general but also me specifically by name for our current situation – and at least one of those people I feel compelled to continue to listen to because they also have unique insights in other ways and I’m sometimes told I have a blind spot there – which I actually rarely hear about other credible sources.

And I still try. But I’m only human and it’s just so damn hard at this point. Especially when they rage about things I said that turned out to be true, and true for exactly the reasons I said they’d be true, but I know trying to point this out wouldn’t do any good.

I don’t know what my solution here is going to be. I do know that things can’t go on like this, I know life isn’t fair and reality doesn’t grade on a curve and someone has to and no one else will but also I only have so much in the tank that handles these things. And I’m going to have to budget that tank, but I want to be clear that I’m going to be doing that, and dropping certainly sources for this reason that I would otherwise have included for completeness.

If this was talking about you, and you’d like to continue this trip, please get it together.

Don’t worry, your argument remains valid. I mean, it’s wrong, but that never stopped you before, why start now?

Time comes for us all.

Matt: Live players in who kills us first?

Peter Wildeford: Yes, that’s one way to look at it.

Discussion about this post

DeepSeek: Don’t Panic Read More »

report:-deepseek’s-chat-histories-and-internal-data-were-publicly-exposed

Report: DeepSeek’s chat histories and internal data were publicly exposed

A cloud security firm found a publicly accessible, fully controllable database belonging to DeepSeek, the Chinese firm that has recently shaken up the AI world, “within minutes” of examining DeepSeek’s security, according to a blog post by Wiz.

An analytical ClickHouse database tied to DeepSeek, “completely open and unauthenticated,” contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” according to Wiz. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.

“While much of the attention around AI security is focused on futuristic threats, the real dangers often come from basic risks—like accidental external exposure of databases,” writes Gal Nagli at Wiz’s blog. “As organizations rush to adopt AI tools and services from a growing number of startups and providers, it’s essential to remember that by doing so, we’re entrusting these companies with sensitive data. The rapid pace of adoption often leads to overlooking security, but protecting customer data must remain the top priority.”

Ars has contacted DeepSeek for comment and will update this post with any response. Wiz noted that it did not receive a response from DeepSeek regarding its findings, but after contacting every DeepSeek email and LinkedIn profile Wiz could find on Wednesday, the company protected the databases Wiz had previously accessed within half an hour.

Report: DeepSeek’s chat histories and internal data were publicly exposed Read More »

i-agree-with-openai:-you-shouldn’t-use-other-peoples’-work-without-permission

I agree with OpenAI: You shouldn’t use other peoples’ work without permission

ChatGPT developer OpenAI and other players in the generative AI business were caught unawares this week by a Chinese company named DeepSeek, whose open source R1 simulated reasoning model provides results similar to OpenAI’s best paid models (with some notable exceptions) despite being created using just a fraction of the computing power.

Since ChatGPT, Stable Diffusion, and other generative AI models first became publicly available in late 2022 and 2023, the US AI industry has been undergirded by the assumption that you’d need ever-greater amounts of training data and compute power to continue improving their models and get—eventually, maybe—to a functioning version of artificial general intelligence, or AGI.

Those assumptions were reflected in everything from Nvidia’s stock price to energy investments and data center plans. Whether DeepSeek fundamentally upends those plans remains to be seen. But at a bare minimum, it has shaken investors who have poured money into OpenAI, a company that reportedly believes it won’t turn a profit until the end of the decade.

OpenAI CEO Sam Altman concedes that the DeepSeek R1 model is “impressive,” but the company is taking steps to protect its models (both language and business); OpenAI told the Financial Times and other outlets that it believed DeepSeek had used output from OpenAI’s models to train the R1 model, a method known as “distillation.” Using OpenAI’s models to train a model that will compete with OpenAI’s models is a violation of the company’s terms of service.

“We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here,” an OpenAI spokesperson told Ars.

So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

I agree with OpenAI: You shouldn’t use other peoples’ work without permission Read More »