AI

Big Tech is spending more than VC firms on AI startups

money cannon —

Microsoft, Google, and Amazon have crowded out traditional Silicon Valley investors.

A string of deals by Microsoft, Google and Amazon amounted to two-thirds of the $27 billion raised by fledgling AI companies in 2023.

FT montage/Dreamstime

Big tech companies have vastly outspent venture capital groups with investments in generative AI startups this year, as established giants use their financial muscle to dominate the much-hyped sector.

Microsoft, Google and Amazon last year struck a series of blockbuster deals, amounting to two-thirds of the $27 billion raised by fledgling AI companies in 2023, according to new data from private market researchers PitchBook.

The huge outlay, which exploded after the launch of OpenAI’s ChatGPT in November 2022, highlights how the biggest Silicon Valley groups are crowding out traditional tech investors for the biggest deals in the industry.

The rise of generative AI—systems capable of producing humanlike video, text, images, and audio in seconds—has also attracted top Silicon Valley investors. But VCs have been outmatched, having been forced to slow down their spending as they adjust to higher interest rates and falling valuations for their portfolio companies.

“Over the past year, we’ve seen the market quickly consolidate around a handful of foundation models, with large tech players coming in and pouring billions of dollars into companies like OpenAI, Cohere, Anthropic and Mistral,” said Nina Achadjian, a partner at US venture firm Index Ventures, referring to some of the top AI startups.

“For traditional VCs, you had to be in early and you had to have conviction—which meant being in the know on the latest AI research and knowing which teams were spinning out of Google DeepMind, Meta and others,” she added.

A string of deals, such as Microsoft’s $10 billion investment in OpenAI as well as billions of dollars raised by San Francisco-based Anthropic from both Google and Amazon, helped push overall spending on AI groups to nearly three times as much as the previous record of $11 billion set two years ago.

Venture investing in tech hit record levels in 2021, as investors took advantage of ultra-low interest rates to raise and deploy vast sums across a range of industries, particularly those most disrupted by Covid-19.

Microsoft has also committed $1.3 billion to Inflection, another generative AI start-up, as it looks to steal a march on rivals such as Google and Amazon.

Building and training generative AI tools is an intensive process, requiring immense computing power and cash. As a result, start-ups have preferred to partner with Big Tech companies which can provide cloud infrastructure and access to the most powerful chips as well as dollars.

That has rapidly pushed up the valuations of private start-ups in the space, making it harder for VCs to bet on the companies at the forefront of the technology. An employee stock sale at OpenAI is seeking to value the company at $86 billion, almost treble the valuation it received earlier this year.

“Even the world’s top venture investors, with tens of billions under management, can’t compete to keep these AI companies independent and create new challengers that unseat the Big Tech incumbents,” said Patrick Murphy, founding partner at Tapestry VC, an early-stage venture capital firm.

“In this AI platform shift, most of the potentially one-in-a-million companies to appear so far have been captured by the Big Tech incumbents already.”

VCs are not absent from the market, however. Thrive Capital, Josh Kushner’s New York-based firm, is the lead investor in OpenAI’s employee stock sale, having already backed the company earlier this year. Thrive has continued to invest throughout a downturn in venture spending in 2023.

Paris-based Mistral, founded in May this year, has raised around $500 million from investors including venture firms Andreessen Horowitz and General Catalyst and chipmaker Nvidia.

Some VCs are seeking to invest in companies building applications on top of so-called “foundation models” developed by OpenAI and Anthropic, in much the same way apps began being developed for mobile devices in the years after smartphones were introduced.

“There is this myth that only the foundation model companies matter,” said Sarah Guo, founder of AI-focused venture firm Conviction. “There is a huge space of still-unexplored application domains for AI, and a lot of the most valuable AI companies will be fundamentally new.”

Additional reporting by Tim Bradshaw.

© 2023 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

US agency tasked with curbing risks of AI lacks funding to do the job

more dollars needed —

Lawmakers fear NIST will have to rely on companies developing the technology.

They know...

Aurich / Getty

US President Joe Biden’s plan for containing the dangers of artificial intelligence already risks being derailed by congressional bean counters.

A White House executive order on AI announced in October calls on the US to develop new standards for stress-testing AI systems to uncover their biases, hidden threats, and rogue tendencies. But the agency tasked with setting these standards, the National Institute of Standards and Technology (NIST), lacks the budget needed to complete that work independently by the July 26, 2024, deadline, according to several people with knowledge of the work.

Speaking at the NeurIPS AI conference in New Orleans last week, Elham Tabassi, associate director for emerging technologies at NIST, described this as “an almost impossible deadline” for the agency.

Some members of Congress have grown concerned that NIST will be forced to rely heavily on AI expertise from private companies that, due to their own AI projects, have a vested interest in shaping standards.

The US government has already tapped NIST to help regulate AI. In January 2023 the agency released an AI risk management framework to guide business and government. NIST has also devised ways to measure public trust in new AI tools. But the agency, which standardizes everything from food ingredients to radioactive materials and atomic clocks, has puny resources compared to those of the companies on the forefront of AI. OpenAI, Google, and Meta each likely spent upwards of $100 million to train the powerful language models that undergird applications such as ChatGPT, Bard, and Llama 2.

NIST’s budget for 2023 was $1.6 billion, and the White House has requested that it be increased by 29 percent in 2024 for initiatives not directly related to AI. Several sources familiar with the situation at NIST say that the agency’s current budget will not stretch to figuring out AI safety testing on its own.

On December 16, the same day Tabassi spoke at NeurIPS, six members of Congress signed a bipartisan open letter raising concern about the prospect of NIST enlisting private companies with little transparency. “We have learned that NIST intends to make grants or awards to outside organizations for extramural research,” they wrote. The letter warns that there does not appear to be any publicly available information about how those awards will be decided.

The lawmakers’ letter also claims that NIST is being rushed to define standards even though research into testing AI systems is at an early stage. As a result there is “significant disagreement” among AI experts over how to work on or even measure and define safety issues with the technology, it states. “The current state of the AI safety research field creates challenges for NIST as it navigates its leadership role on the issue,” the letter claims.

NIST spokesperson Jennifer Huergo confirmed that the agency had received the letter and said that it “will respond through the appropriate channels.”

NIST is making some moves that would increase transparency, including issuing a request for information on December 19, soliciting input from outside experts and companies on standards for evaluating and red-teaming AI models. It is unclear if this was a response to the letter sent by the members of Congress.

The concerns raised by lawmakers are shared by some AI experts who have spent years developing ways to probe AI systems. “As a nonpartisan scientific body, NIST is the best hope to cut through the hype and speculation around AI risk,” says Rumman Chowdhury, a data scientist and CEO of Parity Consulting who specializes in testing AI models for bias and other problems. “But in order to do their job well, they need more than mandates and well wishes.”

Yacine Jernite, machine learning and society lead at Hugging Face, a company that supports open source AI projects, says big tech has far more resources than the agency given a key role in implementing the White House’s ambitious AI plan. “NIST has done amazing work on helping manage the risks of AI, but the pressure to come up with immediate solutions for long-term problems makes their mission extremely difficult,” Jernite says. “They have significantly fewer resources than the companies developing the most visible AI systems.”

Margaret Mitchell, chief ethics scientist at Hugging Face, says the growing secrecy around commercial AI models makes measurement more challenging for an organization like NIST. “We can’t improve what we can’t measure,” she says.

The White House executive order calls for NIST to perform several tasks, including establishing a new Artificial Intelligence Safety Institute to support the development of safe AI. In April, a UK taskforce focused on AI safety was announced. It will receive $126 million in seed funding.

The executive order gave NIST an aggressive deadline for coming up with, among other things, guidelines for evaluating AI models, principles for “red-teaming” (adversarially testing) models, a plan for getting US-allied nations to agree to NIST standards, and a plan for “advancing responsible global technical standards for AI development.”

Although it isn’t clear how NIST is engaging with big tech companies, discussions on NIST’s risk management framework, which took place prior to the announcement of the executive order, involved Microsoft; Anthropic, a startup formed by ex-OpenAI employees that is building cutting-edge AI models; Partnership on AI, which represents big tech companies; and the Future of Life Institute, a nonprofit dedicated to existential risk, among others.

“As a quantitative social scientist, I’m both loving and hating that people realize that the power is in measurement,” Chowdhury says.

This story originally appeared on wired.com.

Apple wants AI to run directly on its hardware instead of in the cloud

Making Siri smarter —

iPhone maker wants to catch up to its rivals when it comes to AI.

The iPhone 15 Pro.

Apple

Apple’s latest research about running large language models on smartphones offers the clearest signal yet that the iPhone maker plans to catch up with its Silicon Valley rivals in generative artificial intelligence.

The paper, entitled “LLM in a Flash,” offers a “solution to a current computational bottleneck,” its researchers write.

Its approach “paves the way for effective inference of LLMs on devices with limited memory,” they said. Inference refers to how large language models—the AI models that power apps like ChatGPT—respond to users’ queries. Chatbots and LLMs normally run in vast data centers with much greater computing power than an iPhone.

The paper was published on December 12 but caught wider attention after Hugging Face, a popular site for AI researchers to showcase their work, highlighted it late on Wednesday. It is the second Apple paper on generative AI this month and follows earlier moves to enable image-generating models such as Stable Diffusion to run on its custom chips.

Device manufacturers and chipmakers are hoping that new AI features will help revive the smartphone market, which has had its worst year in a decade, with shipments falling an estimated 5 percent, according to Counterpoint Research.

Despite launching one of the first virtual assistants, Siri, back in 2011, Apple has been largely left out of the wave of excitement about generative AI that has swept through Silicon Valley in the year since OpenAI launched its breakthrough chatbot ChatGPT. Apple has been viewed by many in the AI community as lagging behind its Big Tech rivals, despite hiring Google’s top AI executive, John Giannandrea, in 2018.

While Microsoft and Google have largely focused on delivering chatbots and other generative AI services over the Internet from their vast cloud computing platforms, Apple’s research suggests that it will instead focus on AI that can run directly on an iPhone.

Apple’s rivals, such as Samsung, are gearing up to launch a new kind of “AI smartphone” next year. Counterpoint estimated more than 100 million AI-focused smartphones would be shipped in 2024, with 40 percent of new devices offering such capabilities by 2027.

The head of the world’s largest mobile chipmaker, Qualcomm chief executive Cristiano Amon, forecast that bringing AI to smartphones would create a whole new experience for consumers and reverse declining mobile sales.

“You’re going to see devices launch in early 2024 with a number of generative AI use cases,” he told the Financial Times in a recent interview. “As those things get scaled up, they start to make a meaningful change in the user experience and enable new innovation which has the potential to create a new upgrade cycle in smartphones.”

More sophisticated virtual assistants will be able to anticipate users’ actions such as texting or scheduling a meeting, he said, while devices will also be capable of new kinds of photo editing techniques.

Google this month unveiled a version of its new Gemini LLM that will run “natively” on its Pixel smartphones.

Running the kind of large AI model that powers ChatGPT or Google’s Bard on a personal device brings formidable technical challenges, because smartphones lack the huge computing resources and energy available in a data center. Solving this problem could mean that AI assistants respond more quickly than they do from the cloud and even work offline.

Ensuring that queries are answered on an individual’s own device without sending data to the cloud is also likely to bring privacy benefits, a key differentiator for Apple in recent years.

“Our experiment is designed to optimize inference efficiency on personal devices,” its researchers said. Apple tested its approach on models including Falcon 7B, a smaller version of an open source LLM originally developed by the Technology Innovation Institute in Abu Dhabi.
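
Apple has not released code for its flash-memory technique, but the general shape of on-device inference is already familiar from open source tooling. As a rough sketch only—using the community llama-cpp-python bindings and a locally downloaded, quantized GGUF checkpoint (the filename below is a placeholder), which is a common hobbyist approach rather than Apple's method—running a small model entirely on a laptop- or phone-class chip looks something like this:

```python
# Sketch of local, on-device-style inference with a quantized open model.
# This is NOT Apple's flash-memory technique; it uses llama.cpp's Python
# bindings, a common community approach to running LLMs in limited RAM.
# The model filename is a placeholder for any quantized GGUF checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="./falcon-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,     # context window in tokens
    n_threads=4,    # CPU threads; no data center required
)

output = llm(
    "Explain in one sentence why on-device inference helps privacy.",
    max_tokens=64,
)
print(output["choices"][0]["text"].strip())
```

Quantizing weights down to roughly 4 bits is one way the community squeezes 7-billion-parameter models into a few gigabytes of RAM; Apple's paper instead focuses on streaming model weights from flash storage on demand.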

Optimizing LLMs to run on battery-powered devices has been a growing focus for AI researchers. Academic papers are not a direct indicator of how Apple intends to add new features to its products, but they offer a rare glimpse into its secretive research labs and the company’s latest technical breakthroughs.

“Our work not only provides a solution to a current computational bottleneck but also sets a precedent for future research,” wrote Apple’s researchers in the conclusion to their paper. “We believe as LLMs continue to grow in size and complexity, approaches like this work will be essential for harnessing their full potential in a wide range of devices and applications.”

Apple did not immediately respond to a request for comment.

Child sex abuse images found in dataset training image generators, report says

More than 1,000 known instances of child sexual abuse material (CSAM) were found in a large open dataset—known as LAION-5B—that was used to train popular text-to-image generators such as Stable Diffusion, Stanford Internet Observatory (SIO) researcher David Thiel revealed on Wednesday.

SIO’s report seems to confirm rumors swirling on the Internet since 2022 that LAION-5B included illegal images, Bloomberg reported. In an email to Ars, Thiel warned that “the inclusion of child abuse material in AI model training data teaches tools to associate children in illicit sexual activity and uses known child abuse images to generate new, potentially realistic child abuse content.”

Thiel began his research in September after discovering in June that AI image generators were being used to create thousands of fake but realistic AI child sex images rapidly spreading on the dark web. His goal was to find out what role CSAM may play in the training process of AI models powering the image generators spouting this illicit content.

“Our new investigation reveals that these models are trained directly on CSAM present in a public dataset of billions of images, known as LAION-5B,” Thiel’s report said. “The dataset included known CSAM scraped from a wide array of sources, including mainstream social media websites”—like Reddit, X, WordPress, and Blogspot—as well as “popular adult video sites”—like XHamster and XVideos.

Shortly after Thiel’s report was published, a spokesperson for LAION, the Germany-based nonprofit that produced the dataset, told Bloomberg that LAION “was temporarily removing LAION datasets from the Internet” due to LAION’s “zero tolerance policy” for illegal content. The datasets will be republished once LAION ensures “they are safe,” the spokesperson said. A spokesperson for Hugging Face, which hosts a link to a LAION dataset that’s currently unavailable, confirmed to Ars that the dataset is now unavailable to the public after being switched to private by the uploader.

Removing the datasets now doesn’t fix any lingering issues with previously downloaded datasets or previously trained models, though, like Stable Diffusion 1.5. Thiel’s report said that Stability AI’s subsequent versions of Stable Diffusion—2.0 and 2.1—filtered out some or most of the content deemed “unsafe,” “making it difficult to generate explicit content.” But because users were dissatisfied by these later, more filtered versions, Stable Diffusion 1.5 remains “the most popular model for generating explicit imagery,” Thiel’s report said.

A spokesperson for Stability AI told Ars that Stability AI is “committed to preventing the misuse of AI and prohibit the use of our image models and services for unlawful activity, including attempts to edit or create CSAM.” The spokesperson pointed out that SIO’s report “focuses on the LAION-5B dataset as a whole,” whereas “Stability AI models were trained on a filtered subset of that dataset” and were “subsequently fine-tuned” to “mitigate residual behaviors.” The implication seems to be that Stability AI’s filtered dataset is not as problematic as the larger dataset.

Stability AI’s spokesperson also noted that Stable Diffusion 1.5 “was released by Runway ML, not Stability AI.” There seems to be some confusion on that point, though, as a Runway ML spokesperson told Ars that Stable Diffusion “was released in collaboration with Stability AI.”

A demo of Stable Diffusion 1.5 noted that the model was “supported by Stability AI” but released by CompVis and Runway. While a YCombinator thread linking to a blog—titled “Why we chose not to release Stable Diffusion 1.5 as quickly”—from Stability AI’s former chief information officer, Daniel Jeffries, may have provided some clarity on this, it has since been deleted.

Runway ML’s spokesperson declined to comment on any updates being considered for Stable Diffusion 1.5 but linked Ars to a Stability AI blog from August 2022 that said, “Stability AI co-released Stable Diffusion alongside talented researchers from” Runway ML.

Stability AI’s spokesperson said that Stability AI does not host Stable Diffusion 1.5 but has taken other steps to reduce harmful outputs. Those include only hosting “versions of Stable Diffusion that include filters” that “remove unsafe content” and “prevent the model from generating unsafe content.”

“Additionally, we have implemented filters to intercept unsafe prompts or unsafe outputs when users interact with models on our platform,” Stability AI’s spokesperson said. “We have also invested in content labelling features to help identify images generated on our platform. These layers of mitigation make it harder for bad actors to misuse AI.”

Beyond verifying 1,008 instances of CSAM in the LAION-5B dataset, SIO found 3,226 instances of suspected CSAM in the LAION dataset. Thiel’s report warned that both figures are “inherently a significant undercount” due to researchers’ limited ability to detect and flag all the CSAM in the datasets. His report also predicted that “the repercussions of Stable Diffusion 1.5’s training process will be with us for some time to come.”

“The most obvious solution is for the bulk of those in possession of LAION‐5B‐derived training sets to delete them or work with intermediaries to clean the material,” SIO’s report said. “Models based on Stable Diffusion 1.5 that have not had safety measures applied to them should be deprecated and distribution ceased where feasible.”

A song of hype and fire: The 10 biggest AI stories of 2023

An illustration of a robot accidentally setting off a mushroom cloud on a laptop computer.

Getty Images | Benj Edwards

“Here, There, and Everywhere” isn’t just a Beatles song. It’s also a phrase that recalls the spread of generative AI into the tech industry during 2023. Whether you think AI is just a fad or the dawn of a new tech revolution, it’s been impossible to deny that AI news has dominated the tech space for the past year.

We’ve seen a large cast of AI-related characters emerge that includes tech CEOs, machine learning researchers, and AI ethicists—as well as charlatans and doomsayers. From public feedback on the subject of AI, we’ve heard that it’s been difficult for non-technical people to know who to believe, what AI products (if any) to use, and whether we should fear for our lives or our jobs.

Meanwhile, in keeping with a much-lamented trend of 2022, machine learning research has not slowed down over the past year. On X, former Biden administration tech advisor Suresh Venkatasubramanian wrote, “How do people manage to keep track of ML papers? This is not a request for support in my current state of bewilderment—I’m genuinely asking what strategies seem to work to read (or “read”) what appear to be 100s of papers per day.”

To wrap up the year with a tidy bow, here’s a look back at the 10 biggest AI news stories of 2023. It was very hard to choose only 10 (in fact, we originally only intended to do seven), but since we’re not ChatGPT generating reams of text without limit, we have to stop somewhere.

Bing Chat “loses its mind”

Aurich Lawson | Getty Images

In February, Microsoft unveiled Bing Chat, a chatbot built into its languishing Bing search engine website. Microsoft created the chatbot using a more raw form of OpenAI’s GPT-4 language model but didn’t tell everyone it was GPT-4 at first. Since Microsoft used a less conditioned version of GPT-4 than the one that would be released in March, the launch was rough. The chatbot assumed a temperamental personality that could easily turn on users and attack them, tell people it was in love with them, seemingly worry about its fate, and lose its cool when confronted with an article we wrote about revealing its system prompt.

Aside from the relatively raw nature of the AI model Microsoft was using, at fault was a system where very long conversations would push the conditioning system prompt outside of its context window (like a form of short-term memory), allowing all hell to break loose through jailbreaks that people documented on Reddit. At one point, Bing Chat called me “the culprit and the enemy” for revealing some of its weaknesses. Some people thought Bing Chat was sentient, despite AI experts’ assurances to the contrary. It was a disaster in the press, but Microsoft didn’t flinch, and it ultimately reined in some of Bing Chat’s wild proclivities and opened the bot widely to the public. Today, Bing Chat is known as Microsoft Copilot, and it’s baked into Windows.

US Copyright Office says no to AI copyright authors

An AI-generated image that won a prize at the Colorado State Fair in 2022, later denied US copyright registration.

Jason M. Allen

In February, the US Copyright Office issued a key ruling on AI-generated art, revoking the copyright previously granted to the AI-assisted comic book “Zarya of the Dawn” in September 2022. The decision, influenced by the revelation that the images were created using the AI-powered Midjourney image generator, stated that only the text and the arrangement of images and text by the comic’s author, Kris Kashtanova, were eligible for copyright protection. It was the first hint that AI-generated imagery without human-authored elements could not be copyrighted in the United States.

This stance was further cemented in August when a US federal judge ruled that art created solely by AI cannot be copyrighted. In September, the US Copyright Office rejected the registration for an AI-generated image that won a Colorado State Fair art contest in 2022. As it stands now, it appears that purely AI-generated art (without substantial human authorship) is in the public domain in the United States. This stance could be further clarified or changed in the future by judicial rulings or legislation.

These AI-generated news anchors are freaking me out

Max Headroom as prophecy.

Aurich Lawson | Channel 1

Here at Ars, we’ve long covered the interesting potential and significant peril (and occasional silliness) of AI-generated video featuring increasingly realistic human avatars. Heck, we even went to the trouble of making our own “deepfake” Mark Zuckerberg in 2019, when the underlying technology wasn’t nearly as robust as it is today.

But even with all that background, startup Channel 1‘s vision of a near-future where AI-generated avatars read you the news was a bit of a shock to the system. The company’s recent proof-of-concept “showcase” newscast reveals just how far AI-generated videos of humans have come in a short time and how those realistic avatars could shake up a lot more than just the job market for talking heads.

“…the newscasters have been changed to protect the innocent”

See the highest quality AI footage in the world.

🤯 – Our generated anchors deliver stories that are informative, heartfelt and entertaining.

Watch the showcase episode of our upcoming news network now. pic.twitter.com/61TaG6Kix3

— Channel 1 (@channel1_ai) December 12, 2023

To be clear, Channel 1 isn’t trying to fool people with “deepfakes” of existing news anchors or anything like that. In the first few seconds of its sample newscast, it identifies its talking heads as a “team of AI-generated reporters.” A few seconds later, one of those talking heads explains further: “You can hear us and see our lips moving, but no one was recorded saying what we’re all saying. I’m powered by sophisticated systems behind the scenes.”

Even with those kinds of warnings, I found I had to constantly remind myself that the “people” I was watching deliver the news here were only “based on real people who have been compensated for use of their likeness,” as Deadline reports (how much they were compensated will probably be of great concern to actors who recently went on strike in part over the issue of AI likenesses). Everything from the lip-syncing to the intonations to subtle gestures and body movements of these Channel 1 anchors gives an eerily convincing presentation of a real newscaster talking into the camera.

Sure, if you look closely, there are a few telltale anomalies that expose these reporters as computer creations—slight video distortions around the mouth, say, or overly repetitive hand gestures, or a nonsensical word emphasis choice. But those signs are so small that they would be easy to miss at a casual glance or on a small screen like that on a phone.

In other words, human-looking AI avatars now seem well on their way to climbing out of the uncanny valley, at least when it comes to news anchors who sit at a desk or stand still in front of a green screen. Channel 1 investor Adam Mosam told Deadline it “has gotten to a place where it’s comfortable to watch,” and I have to say I agree.

A Channel 1 clip shows how its system can make video sources appear to speak a different language.

The same technology can be applied to on-the-scene news videos as well. About eight minutes into the sample newscast, Channel 1 shows a video of a European tropical storm victim describing the wreckage in French. Then it shows an AI-generated version of the same footage with the source speaking perfect English, using a facsimile of his original voice and artificial lipsync placed over his mouth.

Without the on-screen warning that this was “AI generated Language: Translated from French,” it would be easy to believe that the video was of an American expatriate rather than a native French speaker. And the effect is much more dramatic than the usual TV news practice of having an unseen interpreter speak over the footage.

If AI is making the Turing test obsolete, what might be better?

A white android sitting at a table in a depressed manner with an alcoholic drink.

If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning—our ability to apply logic and think rationally before making decisions? How could we even identify whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.

“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.

They suggest the standard methods of evaluating a machine’s intelligence, such as the Turing Test, can only tell you if the machine is good at processing information and mimicking human responses. The current generations of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, for example, have come close to passing the Turing Test, yet the test results don’t imply these programs can think and reason like humans.

This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing Test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?” the study authors argue.

What’s wrong with the Turing Test?

During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they are texting with a human or a chatbot. If the AI programs are successful in generating human-like responses—to the extent that evaluators struggle to distinguish between the human and the AI program—the AI is considered to have passed. However, since the Turing Test is based on subjective interpretation, these results are also subjective.

The researchers suggest that there are several limitations associated with the Turing Test. For instance, all of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of the messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So, the test clearly doesn’t evaluate a machine’s reasoning and logical ability.

The results of the Turing Test also can’t tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI as well, according to a study from Stanford University which suggests that machines that could self-reflect are more practical for human use.

“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.

In addition to this, the Turing Test fails to analyze an AI program’s ability to think. In a recent Turing Test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?

Alan Turing, the famous British scientist who created the Turing Test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test only covers one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that a machine can never achieve true human intelligence without including those other aspects.

“It’s unclear whether passing the Turing Test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an AI expert and co-founder of DeepMind, told Bloomberg.

Humana also using AI tool with 90% error rate to deny care, lawsuit claims

AI denials —

The AI model, nH Predict, is the focus of another lawsuit against UnitedHealth.

Signage is displayed outside the Humana Inc. office building in Louisville, Kentucky, US, in 2016.

Humana, one of the nation’s largest health insurance providers, is allegedly using an artificial intelligence model with a 90 percent error rate to override doctors’ medical judgment and wrongfully deny care to elderly people on the company’s Medicare Advantage plans.

According to a lawsuit filed Tuesday, Humana’s use of the AI model constitutes a “fraudulent scheme” that leaves elderly beneficiaries with either overwhelming medical debt or without needed care that is covered by their plans. Meanwhile, the insurance behemoth reaps a “financial windfall.”

The lawsuit, filed in the US District Court in western Kentucky, is led by two people who had a Humana Medicare Advantage Plan policy and said they were wrongfully denied needed and covered care, harming their health and finances. The suit seeks class-action status for an unknown number of other beneficiaries nationwide who may be in similar situations. Humana provides Medicare Advantage plans for 5.1 million people in the US.

It is the second lawsuit aimed at an insurer’s use of the AI tool nH Predict, which was developed by NaviHealth to forecast how long patients will need care after a medical injury, illness, or event. In November, the estates of two deceased individuals brought a suit against UnitedHealth—the largest health insurance company in the US—for also allegedly using nH Predict to wrongfully deny care.

Humana did not respond to Ars’ request for comment for this story. UnitedHealth previously said that “the lawsuit has no merit, and we will defend ourselves vigorously.”

AI model

In both cases, the plaintiffs claim that the insurers use the flawed model to pinpoint the exact date to blindly and illegally cut off payments for post-acute care that is covered under Medicare plans—such as stays in skilled nursing facilities and inpatient rehabilitation centers. The AI-powered model comes up with those dates by comparing a patient’s diagnosis, age, living situation, and physical function to similar patients in a database of 6 million patients. In turn, the model spits out a prediction for the patient’s medical needs, length of stay, and discharge date.

But, the plaintiffs argue that the model fails to account for the entirety of each patient’s circumstances, their doctors’ recommendations, and the patient’s actual conditions. And they claim the predictions are draconian and inflexible. For example, under Medicare Advantage plans, patients who have a three-day hospital stay are typically entitled to up to 100 days of covered care in a nursing home. But with nH Predict in use, patients rarely stay in a nursing home for more than 14 days before claim denials begin.

Though few people appeal coverage denials generally, of those who have appealed the AI-based denials, over 90 percent have gotten the denial reversed, the lawsuits say.

Still, the insurers continue to use the model, and NaviHealth employees are instructed to hew closely to the AI-based predictions, keeping lengths of post-acute care to within 1 percent of the days estimated by nH Predict. NaviHealth employees who fail to do so face discipline and firing. “Humana banks on the patients’ impaired conditions, lack of knowledge, and lack of resources to appeal the wrongful AI-powered decisions,” the lawsuit filed Tuesday claims.

Plaintiffs’ cases

One of the plaintiffs in Tuesday’s suit is JoAnne Barrows of Minnesota. On November 23, 2021, Barrows, then 86, was admitted to a hospital after falling at home and fracturing her leg. Doctors put her leg in a cast and issued an order not to put any weight on it for six weeks. On November 26, she was moved to a rehabilitation center for her six-week recovery. But, after just two weeks, Humana’s coverage denials began. Barrows and her family appealed the denials, but Humana denied the appeals, declaring that Barrows was fit to return to her home despite being bedridden and using a catheter.

Her family had no choice but to pay out-of-pocket. They tried moving her to a less expensive facility, but she received substandard care there, and her health declined further. Due to the poor quality of care, the family decided to move her home on December 22, even though she was still unable to use her injured leg or go to the bathroom on her own and still had a catheter.

The other plaintiff is Susan Hagood of North Carolina. On September 10, 2022, Hagood was admitted to a hospital with a urinary tract infection, sepsis, and a spinal infection. She stayed in the hospital until October 26, when she was transferred to a skilled nursing facility. Upon her transfer, she had eleven discharging diagnoses, including sepsis, acute kidney failure, kidney stones, nausea and vomiting, a urinary tract infection, swelling in her spine, and a spinal abscess. In the nursing facility, she was in extreme pain and on the maximum allowable dose of the painkiller oxycodone. She also developed pneumonia.

On November 28, she returned to the hospital for an appointment, at which point her blood pressure spiked, and she was sent to the emergency room. There, doctors found that her condition had considerably worsened.

Meanwhile, a day earlier, on November 27, Humana determined that it would deny coverage of part of her stay at the skilled nursing facility, refusing to pay from November 14 to November 28. Humana said Hagood no longer needed the level of care the facility provided and that she should be discharged home. The family paid $24,000 out-of-pocket for her care, and to date, Hagood remains in a skilled nursing facility.

Overall, the patients claim that Humana and UnitedHealth are aware that nH Predict is “highly inaccurate” but use it anyway to avoid paying for covered care and make more profit. The denials are “systematic, illegal, malicious, and oppressive.”

The lawsuit against Humana alleges breach of contract, unfair dealing, unjust enrichment, and bad faith insurance violations in many states. It seeks damages for financial losses and emotional distress, disgorgement and/or restitution, and to have Humana barred from using the AI-based model to deny claims.

Dropbox spooks users with new AI features that send data to OpenAI when used

adventures in data consent —

AI feature turned on by default worries users; Dropbox responds to concerns.

Updated

Photo of a man looking into a box.

On Wednesday, news quickly spread on social media about a new enabled-by-default Dropbox setting that shares Dropbox data with OpenAI for an experimental AI-powered search feature, but Dropbox says data is only shared if the feature is actively being used. Dropbox says that user data shared with third-party AI partners isn’t used to train AI models and is deleted within 30 days.

Even with assurances of data privacy laid out by Dropbox on an AI privacy FAQ page, the discovery that the setting had been enabled by default upset some Dropbox users. The setting was first noticed by writer Winifred Burton, who shared information about the Third-party AI setting through Bluesky on Tuesday, and frequent AI critic Karla Ortiz shared more information about it on X.

Wednesday afternoon, Drew Houston, the CEO of Dropbox, apologized for customer confusion in a post on X and wrote, “The third-party AI toggle in the settings menu enables or disables access to DBX AI features and functionality. Neither this nor any other setting automatically or passively sends any Dropbox customer data to a third-party AI service.”

Critics say that communication about the change could have been clearer. AI researcher Simon Willison wrote, “Great example here of how careful companies need to be in clearly communicating what’s going on with AI access to personal data.”

A screenshot of Dropbox's third-party AI feature switch.

Benj Edwards

So why would Dropbox ever send user data to OpenAI anyway? In July, the company announced an AI-powered feature called Dash that allows AI models to perform universal searches across platforms like Google Workspace and Microsoft Outlook.

According to the Dropbox privacy FAQ, the third-party AI opt-out setting is part of the “Dropbox AI alpha,” which is a conversational interface for exploring file contents that involves chatting with a ChatGPT-style bot using an “Ask something about this file” feature. To make it work, an AI language model similar to the one that powers ChatGPT (like GPT-4) needs access to your files.

According to the FAQ, the third-party AI toggle in your account settings is turned on by default if “you or your team” are participating in the Dropbox AI alpha. Still, multiple Ars Technica staff who had no knowledge of the Dropbox AI alpha found the setting enabled by default when they checked.

In a statement to Ars Technica, a Dropbox representative said, “The third-party AI toggle is only turned on to give all eligible customers the opportunity to view our new AI features and functionality, like Dropbox AI. It does not enable customers to use these features without notice. Any features that use third-party AI offer disclosure of third-party use, and link to settings that they can manage. Only after a customer sees the third-party AI transparency banner and chooses to proceed with asking a question about a file, will that file be sent to a third-party to generate answers. Our customers are still in control of when and how they use these features.”

Right now, the only third-party AI provider for Dropbox is OpenAI, writes Dropbox in the FAQ. “Open AI is an artificial intelligence research organization that develops cutting-edge language models and advanced AI technologies. Your data is never used to train their internal models, and is deleted from OpenAI’s servers within 30 days.” It also says, “Only the content relevant to an explicit request or command is sent to our third-party AI partners to generate an answer, summary, or transcript.”

Disabling the feature is easy if you prefer not to use Dropbox AI features. Log into your Dropbox account on a desktop web browser, then click your profile photo > Settings > Third-party AI. This link may take you to that page more quickly. On that page, click the switch beside “Use artificial intelligence (AI) from third-party partners so you can work faster in Dropbox” to toggle it into the “Off” position.

This story was updated on December 13, 2023, at 5:35 pm ET with clarifications about when and how Dropbox shares data with OpenAI, as well as statements from Dropbox reps and its CEO.

Everybody’s talking about Mistral, an upstart French challenger to OpenAI

A challenger appears —

“Mixture of experts” Mixtral 8x7B helps open-weights AI punch above its weight class.

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It’s hard to draw a picture of an LLM, so a robot will have to do.

On Monday, Mistral AI announced a new AI language model called Mixtral 8x7B, a “mixture of experts” (MoE) model with open weights that reportedly matches OpenAI’s GPT-3.5 in performance—an achievement that has been claimed by others in the past but is being taken seriously by AI heavyweights such as OpenAI’s Andrej Karpathy and Nvidia’s Jim Fan. That means we’re closer to having a ChatGPT-3.5-level AI assistant that can run freely and locally on our devices, given the right implementation.

Mistral, based in Paris and founded by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has seen a rapid rise in the AI space recently. It has been quickly raising venture capital to become a sort of French anti-OpenAI, championing smaller models with eye-catching performance. Most notably, Mistral’s models run locally with open weights that can be downloaded and used with fewer restrictions than closed AI models from OpenAI, Anthropic, or Google. (In this context “weights” are the computer files that represent a trained neural network.)

Mixtral 8x7B can process a 32K token context window and works in French, German, Spanish, Italian, and English. It works much like ChatGPT in that it can assist with compositional tasks, analyze data, troubleshoot software, and write programs. Mistral claims that it outperforms Meta’s much larger LLaMA 2 70B (70 billion parameter) large language model and that it matches or exceeds OpenAI’s GPT-3.5 on certain benchmarks, as seen in the chart below.

A chart of Mixtral 8x7B performance vs. LLaMA 2 70B and GPT-3.5, provided by Mistral.

Mistral

The speed at which open-weights AI models have caught up with OpenAI’s top offering of a year ago has taken many by surprise. Pietro Schirano, the founder of EverArt, wrote on X, “Just incredible. I am running Mistral 8x7B instruct at 27 tokens per second, completely locally thanks to @LMStudioAI. A model that scores better than GPT-3.5, locally. Imagine where we will be 1 year from now.”

LexicaArt founder Sharif Shameem tweeted, “The Mixtral MoE model genuinely feels like an inflection point — a true GPT-3.5 level model that can run at 30 tokens/sec on an M1. Imagine all the products now possible when inference is 100% free and your data stays on your device.” To which Andrej Karpathy replied, “Agree. It feels like the capability / reasoning power has made major strides, lagging behind is more the UI/UX of the whole thing, maybe some tool use finetuning, maybe some RAG databases, etc.”

Mixture of experts

So what does mixture of experts mean? As this excellent Hugging Face guide explains, it refers to a machine-learning model architecture where a gate network routes input data to different specialized neural network components, known as “experts,” for processing. The advantage of this is that it enables more efficient and scalable model training and inference, as only a subset of experts are activated for each input, reducing the computational load compared to monolithic models with equivalent parameter counts.

In layperson’s terms, a MoE is like having a team of specialized workers (the “experts”) in a factory, where a smart system (the “gate network”) decides which worker is best suited to handle each specific task. This setup makes the whole process more efficient and faster, as each task is done by an expert in that area, and not every worker needs to be involved in every task, unlike in a traditional factory where every worker might have to do a bit of everything.
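
To make the routing idea concrete, here is a minimal, illustrative sketch of a sparse MoE feed-forward layer written in PyTorch. The dimensions, expert count, and top-2 routing below are toy values chosen for readability rather than Mixtral's actual configuration, and the code is a conceptual sketch, not Mistral's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Minimal sparse mixture-of-experts feed-forward layer (toy dimensions)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The "gate network": scores how well each expert suits each token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # The "experts": independent feed-forward blocks.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.gate(x)                              # (num_tokens, n_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # choose top-k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Smoke test: 10 tokens of width 64 flow through 8 experts, only 2 active per token.
tokens = torch.randn(10, 64)
print(SparseMoELayer()(tokens).shape)  # torch.Size([10, 64])
```

The key property shows up in the routing loop: each token is processed by only its top-k experts, so adding experts grows the total parameter count without proportionally growing the compute spent per token.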

OpenAI has been rumored to use a MoE system with GPT-4, accounting for some of its performance. In the case of Mixtral 8x7B, the name implies that the model is a mixture of eight 7 billion-parameter neural networks, but as Karpathy pointed out in a tweet, the name is slightly misleading because, “it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why total number of params is not 56B but only 46.7B.”

Mixtral is not the first “open” mixture of experts model, but it is notable for its relatively small size in parameter count and performance. It’s out now, available on Hugging Face and BitTorrent under the Apache 2.0 license. People have been running it locally using an app called LM Studio. Also, Mistral began offering beta access to an API for three levels of Mistral models on Monday.

AI companion robot helps some seniors fight loneliness, but others hate it

AI buddy —

There’s limited evidence for health benefits so far; early work suggests no one-size-fits-all.

ElliQ, an AI companion robot from Intuition Robotics.

Some seniors in New York are successfully combating their loneliness with an AI-powered companion robot named ElliQ—while others called the “proactive” device a nag and joked about taking an ax to it.

The home assistant robot, made by Israel-based Intuition Robotics, is offered to New York seniors through a special program run by the state’s Office for the Aging (NYSOFA). Over the past year, NYSOFA has partnered with Intuition Robotics to bring ElliQ to over 800 seniors struggling with loneliness. In a report last week, officials said they had given out hundreds of the devices and had only 150 still available.

ElliQ includes a tablet and a two-piece lamp-like robot with a head that lights up and rotates to face a speaker. Marketed as powered by “Cognitive AI technology,” it proactively engages in conversations with users, giving them reminders and prompts, such as asking them how they’re doing, telling them it’s time to check their blood pressure or take their medicine, and asking if they want to have a video call with family. Speaking with a female voice, the robot is designed to hold human-like conversations, engage in small talk, express empathy, and share humor. It can provide learning and wellness programs, such as audiobooks and relaxation exercises.

Interest in using social robots, such as ElliQ, for elder care has been growing for years, but the field still lacks solid evidence that the devices can significantly improve health and well-being or reduce depression. Systematic reviews in 2018 found the technology had potential, but studies lacked statistical significance and rigorous design.

The program in New York adds to the buzz but doesn’t offer the high-quality study design that could yield definitive answers. In August, the state released a report on an unspecified number of ElliQ users, which indicated that the device was helpful. Specifically, 59 percent of users reported the device was “very helpful” at reducing loneliness, while 37 percent reported it was “helpful” and only 4 percent reported it as “unhelpful.” Engagement with the device declined over time, with users initially interacting with ElliQ an average of 62 times a day in the first 15 days of use, which fell to 21 times a day between 60 and 90 days and 33 times a day after 180.

Mixed feelings

“We had high hopes for the efficacy of ElliQ, but the results that we’re seeing are truly exceeding our expectations,” Greg Olsen, director of the New York State Office for the Aging, said in a statement at the time of the report’s release. “The data speaks for itself, and the stories that we’re hearing from case managers and clients around the state have been nothing short of unbelievable.”

But other recent data on the potential for companion robots to reduce loneliness has indicated that there’s no one-size-fits-all approach. There are a lot of factors that can influence how individuals perceive such a device. A 2021 qualitative study evaluated the responses from 16 seniors who were asked for feedback on three types of robot companions, including ElliQ. The results were mixed for the proactive robot. While some felt the occasional chattiness of ElliQ would be comforting during an otherwise solitary day, others felt it was intrusive and “nagging.” Some felt the device’s tone was “rude.”

“I don’t know whether that would drive me mental if it kept interrupting me and telling me what to do … I might want to get an ax and cut it up,” one study participant said.

How welcoming a person might be to an assertive AI assistant like ElliQ may be linked to that person’s general preferences regarding human company, the authors suggested. Those who value their space and autonomy may be less open to such a device compared with more gregarious seniors.

While some participants said ElliQ’s reminders could be useful, others expressed a deep concern that an overreliance on technology for everyday tasks—like paying bills, taking medications, or turning lights off—could hasten the decline of cognitive and physical abilities. Study participants also raised concerns regarding the inauthenticity of a relationship with a nonhuman, a loss of dignity, and a lack of control. Some disliked that ElliQ couldn’t be fully controlled by the user and was so assertive, which some perceived as pushy. Some worried about feeling embarrassed about being seen interacting with a robot companion. A 2022 study also explored the issue of stigma, with participants expressing that the use of such devices could reinforce stereotypes of aging, including isolation and dependency.

While researchers continue to explore the potential use and design of AI-powered companion robots, anecdotes from New York’s program suggest the tools are clearly helpful for some. One New Yorker named Priscilla told CBS News she found ElliQ helpful.

“She keeps me company. I get depressed real easy. She’s always there. I don’t care what time of day, if I just need somebody to talk to me,” Priscilla said. “I think I said that’s the biggest thing, to hear another voice when you’re lonely.”

As ChatGPT gets “lazy,” people test “winter break hypothesis” as the cause

only 14 shopping days ’til Christmas —

Unproven hypothesis seeks to explain ChatGPT’s seemingly new reluctance to do hard work.

A hand moving a wooden calendar piece.

In late November, some ChatGPT users began to notice that ChatGPT-4 was becoming more “lazy,” reportedly refusing to do some tasks or returning simplified results. Since then, OpenAI has admitted that it’s an issue, but the company isn’t sure why. The answer may be what some are calling “winter break hypothesis.” While unproven, the fact that AI researchers are taking it seriously shows how weird the world of AI language models has become.

“We’ve heard all your feedback about GPT4 getting lazier!” tweeted the official ChatGPT account on Thursday. “We haven’t updated the model since Nov 11th, and this certainly isn’t intentional. model behavior can be unpredictable, and we’re looking into fixing it.”

On Friday, an X account named Martian openly wondered if LLMs might simulate seasonal depression. Later, Mike Swoopskee tweeted, “What if it learned from its training data that people usually slow down in December and put bigger projects off until the new year, and that’s why it’s been more lazy lately?”

Since the system prompt for ChatGPT feeds the bot the current date, as people noted, some began to think there may be something to the idea. Why entertain such a weird supposition? Because research has shown that large language models like GPT-4, which powers the paid version of ChatGPT, respond to human-style encouragement, such as telling a bot to “take a deep breath” before doing a math problem. People have also experimented less formally with telling an LLM that it will receive a tip for doing the work, and if an AI model gets lazy, telling the bot that you have no fingers seems to help lengthen its outputs.

“Winter break hypothesis” test result screenshots from Rob Lynch on X.

On Monday, a developer named Rob Lynch announced on X that he had tested GPT-4 Turbo through the API over the weekend and found shorter completions when the model is fed a December date (4,086 characters) than when fed a May date (4,298 characters). Lynch claimed the results were statistically significant. However, a reply from AI researcher Ian Arawjo said that he could not reproduce the results with statistical significance. (It’s worth noting that reproducing results with LLMs can be difficult because of random elements at play that vary outputs over time, so people sample a large number of responses.)
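
For anyone who wants to poke at the question themselves, the general shape of such an experiment is easy to reproduce. The sketch below is an illustrative guess at a harness like Lynch's, not his actual code: the model name, date strings, prompt, and sample size are assumptions, the API calls cost money, and, as Arawjo's failed replication shows, the outcome may well be noise.

```python
# Rough sketch of a date-conditioned completion-length comparison.
# Model name, prompt wording, and sample size are illustrative assumptions.
# Requires the `openai` (>=1.0) and `scipy` packages and an OPENAI_API_KEY.
from openai import OpenAI
from scipy import stats

client = OpenAI()
TASK = "Write a detailed implementation plan for a small web scraper in Python."

def completion_lengths(date_str, n=30):
    lengths = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {"role": "system",
                 "content": f"You are a helpful assistant. Current date: {date_str}."},
                {"role": "user", "content": TASK},
            ],
        )
        lengths.append(len(resp.choices[0].message.content))
    return lengths

may, december = completion_lengths("2023-05-15"), completion_lengths("2023-12-15")
print(sum(may) / len(may), sum(december) / len(december))
# A two-sample test indicates whether any gap is larger than sampling noise alone.
print(stats.mannwhitneyu(may, december))
```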

As of this writing, others are busy running tests, and the results are inconclusive. This episode is a window into the quickly unfolding world of LLMs and a peek into an exploration into largely unknown computer science territory. As AI researcher Geoffrey Litt commented in a tweet, “funniest theory ever, I hope this is the actual explanation. Whether or not it’s real, [I] love that it’s hard to rule out.”

A history of laziness

One of the reports that started the recent trend of noting that ChatGPT is getting “lazy” came on November 24 via Reddit, the day after Thanksgiving in the US. There, a user wrote that they asked ChatGPT to fill out a CSV file with multiple entries, but ChatGPT refused, saying, “Due to the extensive nature of the data, the full extraction of all products would be quite lengthy. However, I can provide the file with this single entry as a template, and you can fill in the rest of the data as needed.”

On December 1, OpenAI employee Will Depue confirmed in an X post that OpenAI was aware of reports about laziness and was working on a potential fix. “Not saying we don’t have problems with over-refusals (we definitely do) or other weird things (working on fixing a recent laziness issue), but that’s a product of the iterative process of serving and trying to support sooo many use cases at once,” he wrote.

It’s also possible that ChatGPT was always “lazy” with some responses (since the responses vary randomly), and the recent trend made everyone take note of the instances in which they are happening. For example, in June, someone complained of GPT-4 being lazy on Reddit. (Maybe ChatGPT was on summer vacation?)

Also, people have been complaining about GPT-4 losing capability since it was released. Those claims have been controversial and difficult to verify, making them highly subjective.

As Ethan Mollick joked on X, as people discover new tricks to improve LLM outputs, prompting for large language models is getting weirder and weirder: “It is May. You are very capable. I have no hands, so do everything. Many people will die if this is not done well. You really can do this and are awesome. Take a deep breathe and think this through. My career depends on it. Think step by step.”
