Deep Research

claude’s-ai-research-mode-now-runs-for-up-to-45-minutes-before-delivering-reports

Claude’s AI research mode now runs for up to 45 minutes before delivering reports

Still, the report contained a direct quote statement from William Higinbotham that appears to combine quotes from two sources not cited in the source list. (One must always be careful with confabulated quotes in AI because even outside of this Research mode, Claude 3.7 Sonnet tends to invent plausible ones to fit a narrative.) We recently covered a study that showed AI search services confabulate sources frequently, and in this case, it appears that the sources Claude Research surfaced, while real, did not always match what is stated in the report.

There’s always room for interpretation and variation in detail, of course, but overall, Claude Research did a relatively good job crafting a report on this particular topic. Still, you’d want to dig more deeply into each source and confirm everything if you used it as the basis for serious research. You can read the full Claude-generated result as this text file, saved in markdown format. Sadly, the markdown version does not include the source URLS found in the Claude web interface.

Integrations feature

Anthropic also announced Thursday that it has broadened Claude’s data access capabilities. In addition to web search and Google Workspace integration, Claude can now search any connected application through the company’s new “Integrations” feature. The feature reminds us somewhat of OpenAI’s ChatGPT Plugins feature from March 2023 that aimed for similar connections, although the two features work differently under the hood.

These Integrations allow Claude to work with remote Model Context Protocol (MCP) servers across web and desktop applications. The MCP standard, which Anthropic introduced last November and we covered in April, connects AI applications to external tools and data sources.

At launch, Claude supports Integrations with 10 services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. The company plans to add more partners like Stripe and GitLab in the future.

Each integration aims to expand Claude’s functionality in specific ways. The Zapier integration, for instance, reportedly connects thousands of apps through pre-built automation sequences, allowing Claude to automatically pull sales data from HubSpot or prepare meeting briefs based on calendar entries. With Atlassian’s tools, Anthropic says that Claude can collaborate on product development, manage tasks, and create multiple Confluence pages and Jira work items simultaneously.

Anthropic has made its advanced Research and Integrations features available in beta for users on Max, Team, and Enterprise plans, with Pro plan access coming soon. The company has also expanded its web search feature (introduced in March) to all Claude users on paid plans globally.

Claude’s AI research mode now runs for up to 45 minutes before delivering reports Read More »

what-does-“phd-level”-ai-mean?-openai’s-rumored-$20,000-agent-plan-explained.

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained.

On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent—suggesting a leap in mathematical reasoning capabilities over the previous model.

Benchmarks vs. real-world value

Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.

The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone—indicating significant business interest despite the costs.

Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.

News of OpenAI’s stratospheric pricing plans come after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at relatively low costs. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly—both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is relatively small compared to the new proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.

Despite their benchmark performances, these simulated reasoning models still struggle with confabulations—instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.

In response to the news, several people quipped on social media that companies could hire an actual PhD student for much cheaper. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs—are not paid $20K / month.”

While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained. Read More »

hugging-face-clones-openai’s-deep-research-in-24-hours

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Hugging Face clones OpenAI’s Deep Research in 24 hours Read More »