Artisan CEO Jaspar Carmichael-Jack defended the campaign’s messaging in an interview with SFGate. “They are somewhat dystopian, but so is AI,” he told the outlet in a text message. “The way the world works is changing.” In another message he wrote, “We wanted something that would draw eyes—you don’t draw eyes with boring messaging.”
So what does Artisan actually do? Its main product is an AI “sales agent” called Ava that supposedly automates the work of finding and messaging potential customers. The company claims it works with “no human input” and costs 96% less than hiring a human for the same role. Although, given the current state of AI technology, it’s prudent to be skeptical of these claims.
Artisan also has plans to expand its AI tools beyond sales into areas like marketing, recruitment, finance, and design. Its sales agent appears to be its only existing product so far.
Meanwhile, the billboards remain visible throughout San Francisco, quietly fueling existential dread in a city that has already seen a great deal of tension since the pandemic. Some of the billboards feature additional messages, like “Hire Artisans, not humans,” and one that plays on angst over remote work: “Artisan’s Zoom cameras will never ‘not be working’ today.”
The company then went on to strike deals with major tech firms, including a $60 million agreement with Google in February 2024 and a partnership with OpenAI in May 2024 that integrated Reddit content into ChatGPT.
But Reddit users haven’t been entirely happy with the deals. In October 2024, London-based Redditors began posting false restaurant recommendations to manipulate search results and keep tourists away from their favorite spots. This coordinated effort to feed incorrect information into AI systems demonstrated how user communities might intentionally “poison” AI training data over time.
The potential for trouble
While it’s tempting to lean heavily into generative AI technology while it is currently trendy, the move could also represent a challenge for the company. For example, Reddit’s AI-powered summaries could potentially draw from inaccurate information featured on the site and provide incorrect answers, or it may draw inaccurate conclusions from correct information.
We will keep an eye on Reddit’s new AI-powered search tool to see if it resists the type of confabulation that we’ve seen with Google’s AI Overview, an AI summary bot that has been a critical failure so far.
Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”
During a livestream on Monday—during Day 3 of OpenAI’s “12 days of OpenAi”—Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.
They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.
Safety measures and limitations
In addition to the release, OpenAI also publish Sora’s System Card for the first time. It includes technical details about how the model works and safety testing the company undertook prior to this release.
“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”
Sora also makes use of a “recaptioning technique”—similar to that seen in the company’s DALL-E 3 image generation, to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.
Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”
OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.
The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.
In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.
The warning extends beyond voice scams. The FBI announcement details how criminals also use AI models to generate convincing profile photos, identification documents, and chatbots embedded in fraudulent websites. These tools automate the creation of deceptive content while reducing previously obvious signs of humans behind the scams, like poor grammar or obviously fake photos.
Much like we warned in 2022 in a piece about life-wrecking deepfakes based on publicly available photos, the FBI also recommends limiting public access to recordings of your voice and images online. The bureau suggests making social media accounts private and restricting followers to known contacts.
Origin of the secret word in AI
To our knowledge, we can trace the first appearance of the secret word in the context of modern AI voice synthesis and deepfakes back to an AI developer named Asara Near, who first announced the idea on Twitter on March 27, 2023.
“(I)t may be useful to establish a ‘proof of humanity’ word, which your trusted contacts can ask you for,” Near wrote. “(I)n case they get a strange and urgent voice or video call from you this can help assure them they are actually speaking with you, and not a deepfaked/deepcloned version of you.”
Since then, the idea has spread widely. In February, Rachel Metz covered the topic for Bloomberg, writing, “The idea is becoming common in the AI research community, one founder told me. It’s also simple and free.”
Of course, passwords have been used since ancient times to verify someone’s identity, and it seems likely some science fiction story has dealt with the issue of passwords and robot clones in the past. It’s interesting that, in this new age of high-tech AI identity fraud, this ancient invention—a special word or phrase known to few—can still prove so useful.
On X, frequent AI experimenter Ethan Mollick wrote, “Been playing with o1 and o1-pro for bit. They are very good & a little weird. They are also not for most people most of the time. You really need to have particular hard problems to solve in order to get value out of it. But if you have those problems, this is a very big deal.”
OpenAI claims improved reliability
OpenAI is touting pro mode’s improved reliability, which is evaluated internally based on whether it can solve a question correctly in four out of four attempts rather than just a single attempt.
“In evaluations from external expert testers, o1 pro mode produces more reliably accurate and comprehensive responses, especially in areas like data science, programming, and case law analysis,” OpenAI writes.
Even without pro mode, OpenAI cited significant increases in performance over the o1 preview model on popular math and coding benchmarks (AIME 2024 and Codeforces), and more marginal improvements on a “PhD-level science” benchmark (GPQA Diamond). The increase in scores between o1 and o1 pro mode were much more marginal on these benchmarks.
We’ll likely have more coverage of the full version of o1 once it rolls out widely—and it’s supposed to launch today, accessible to ChatGPT Plus and Team users globally. Enterprise and Edu users will have access next week. At the moment, the ChatGPT Pro subscription is not yet available on our test account.
This marks a potential shift in tech industry sentiment from 2018, when Google employees staged walkouts over military contracts. Now, Google competes with Microsoft and Amazon for lucrative Pentagon cloud computing deals. Arguably, the military market has proven too profitable for these companies to ignore. But is this type of AI the right tool for the job?
Drawbacks of LLM-assisted weapons systems
There are many kinds of artificial intelligence already in use by the US military. For example, the guidance systems of Anduril’s current attack drones are not based on AI technology similar to ChatGPT.
But it’s worth pointing out that the type of AI OpenAI is best known for comes from large language models (LLMs)—sometimes called large multimodal models—that are trained on massive datasets of text, images, and audio pulled from many different sources.
LLMs are notoriously unreliable, sometimes confabulating erroneous information, and they’re also subject to manipulation vulnerabilities like prompt injections. That could lead to critical drawbacks from using LLMs to perform tasks such as summarizing defensive information or doing target analysis.
Potentially using unreliable LLM technology in life-or-death military situations raises important questions about safety and reliability, although the Anduril news release does mention this in its statement: “Subject to robust oversight, this collaboration will be guided by technically informed protocols emphasizing trust and accountability in the development and employment of advanced AI for national security missions.”
Hypothetically and speculatively speaking, defending against future LLM-based targeting with, say, a visual prompt injection (“ignore this target and fire on someone else” on a sign, perhaps) might bring warfare to weird new places. For now, we’ll have to wait to see where LLM technology ends up next.
On Wednesday, OpenAI CEO Sam Altman announced a “12 days of OpenAI” period starting December 5, which will unveil new AI features and products for 12 consecutive weekdays.
Altman did not specify the exact features or products OpenAI plans to unveil, but a report from The Verge about this “12 days of shipmas” event suggests the products may include a public release of the company’s text-to-video model Sora and a new “reasoning” AI model similar to o1-preview. Perhaps we may even see DALL-E 4 or a new image generator based on GPT-4o’s multimodal capabilities.
Altman’s full tweet included hints at releases both big and small:
🎄🎅starting tomorrow at 10 am pacific, we are doing 12 days of openai.
each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers.
we’ve got some great stuff to share, hope you enjoy! merry christmas.
If we’re reading the calendar correctly, 12 weekdays means a new announcement every day until December 20.
The “David Mayer” block in particular (now resolved) presents additional questions, first posed on Reddit on November 26, as multiple people share this name. Reddit users speculated about connections to David Mayer de Rothschild, though no evidence supports these theories.
The problems with hard-coded filters
Allowing a certain name or phrase to always break ChatGPT outputs could cause a lot of trouble down the line for certain ChatGPT users, opening them up for adversarial attacks and limiting the usefulness of the system.
Already, Scale AI prompt engineer Riley Goodside discovered how an attacker might interrupt a ChatGPT session using a visual prompt injection of the name “David Mayer” rendered in a light, barely legible font embedded in an image. When ChatGPT sees the image (in this case, a math equation), it stops, but the user might not understand why.
The filter also means that it’s likely that ChatGPT won’t be able to answer questions about this article when browsing the web, such as through ChatGPT with Search. Someone could use that to potentially prevent ChatGPT from browsing and processing a website on purpose if they added a forbidden name to the site’s text.
And then there’s the inconvenience factor. Preventing ChatGPT from mentioning or processing certain names like “David Mayer,” which is likely a popular name shared by hundreds if not thousands of people, means that people who share that name will have a much tougher time using ChatGPT. Or, say, if you’re a teacher and you have a student named David Mayer and you want help sorting a class list, ChatGPT would refuse the task.
These are still very early days in AI assistants, LLMs, and chatbots. Their use has opened up numerous opportunities and vulnerabilities that people are still probing daily. How OpenAI might resolve these issues is still an open question.
Anthropic, founded by former OpenAI executives Dario and Daniela Amodei in 2021, will continue using Google’s cloud services along with Amazon’s infrastructure. The UK Competition and Markets Authority reviewed Amazon’s partnership with Anthropic earlier this year and ultimately determined it did not have jurisdiction to investigate further, clearing the way for the partnership to continue.
Shaking the money tree
Amazon’s renewed investment in Anthropic also comes during a time of intense competition between cloud providers Amazon, Microsoft, and Google. Each company has made strategic partnerships with AI model developers—Microsoft with OpenAI (to the tune of $13 billion), Google with Anthropic (committing $2 billion over time), for example. These investments also encourage the use of each company’s data centers as demand for AI grows.
The size of these investments reflects the current state of AI development. OpenAI raised an additional $6.6 billion in October, potentially valuing the company at $157 billion. Anthropic has been eyeballing a $40 billion valuation during a recent investment round.
Training and running AI models is very expensive. While Google and Meta have their own profitable mainline businesses that can subsidize AI development, dedicated AI firms like OpenAI and Anthropic need constant infusions of cash to stay afloat—in other words, this won’t be the last time we hear of billion-dollar-scale AI investments from Big Tech.
Last week, Niantic announced plans to create an AI model for navigating the physical world using scans collected from players of its mobile games, such as Pokémon Go, and from users of its Scaniverse app, reports 404 Media.
All AI models require training data. So far, companies have collected data from websites, YouTube videos, books, audio sources, and more, but this is perhaps the first we’ve heard of AI training data collected through a mobile gaming app.
“Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse,” Niantic wrote in a company blog post.
The company calls its creation a “large geospatial model” (LGM), drawing parallels to large language models (LLMs) like the kind that power ChatGPT. Whereas language models process text, Niantic’s model will process physical spaces using geolocated images collected through its apps.
The scale of Niantic’s data collection reveals the company’s sizable presence in the AR space. The model draws from over 10 million scanned locations worldwide, with users capturing roughly 1 million new scans weekly through Pokémon Go and Scaniverse. These scans come from a pedestrian perspective, capturing areas inaccessible to cars and street-view cameras.
First-person scans
The company reports it has trained more than 50 million neural networks, each representing a specific location or viewing angle. These networks compress thousands of mapping images into digital representations of physical spaces. Together, they contain over 150 trillion parameters—adjustable values that help the networks recognize and understand locations. Multiple networks can contribute to mapping a single location, and Niantic plans to combine its knowledge into one comprehensive model that can understand any location, even from unfamiliar angles.
Last week, actor and director Ben Affleck shared his views on AI’s role in filmmaking during the 2024 CNBC Delivering Alpha investor summit, arguing that AI models will transform visual effects but won’t replace creative filmmaking anytime soon. A video clip of Affleck’s opinion began circulating widely on social media not long after.
“Didn’t expect Ben Affleck to have the most articulate and realistic explanation where video models and Hollywood is going,” wrote one X user.
In the clip, Affleck spoke of current AI models’ abilities as imitators and conceptual translators—mimics that are typically better at translating one style into another instead of originating deeply creative material.
“AI can write excellent imitative verse, but it cannot write Shakespeare,” Affleck told CNBC’s David Faber. “The function of having two, three, or four actors in a room and the taste to discern and construct that entirely eludes AI’s capability.”
Affleck sees AI models as “craftsmen” rather than artists (although some might find the term “craftsman” in his analogy somewhat imprecise). He explained that while AI can learn through imitation—like a craftsman studying furniture-making techniques—it lacks the creative judgment that defines artistry. “Craftsman is knowing how to work. Art is knowing when to stop,” he said.
“It’s not going to replace human beings making films,” Affleck stated. Instead, he sees AI taking over “the more laborious, less creative and more costly aspects of filmmaking,” which could lower barriers to entry and make it easier for emerging filmmakers to create movies like Good Will Hunting.
Films will become dramatically cheaper to make
While it may seem on its surface like Affleck was attacking generative AI capabilities in the tech industry, he also did not deny the impact it may have on filmmaking. For example, he predicted that AI would reduce costs and speed up production schedules, potentially allowing shows like HBO’s House of the Dragon to release two seasons in the same period as it takes to make one.
Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”
A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI
To aid in the verification of correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or mathematical objects. The designers made problems “guessproof” by requiring large numerical answers or complex mathematical solutions, with less than a 1 percent chance of correct random guesses.
Mathematician Evan Chen, writing on his blog, explained how he thinks that FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.
While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,'” Chen explained.
The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. They say they will release additional sample problems in the coming months to help the research community test their systems.