reddit – Page 2

Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers

AI, ai overview, Google, reddit, Social Media, Tech / 9u50fv / October 25, 2024

Fed up Londoners

Apparently, some London residents are getting fed up with social media influencers whose reviews make long lines of tourists at their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the boom of Redditors referring people to Angus Steakhouse, a chain restaurant, to combat it.

As Gizmodo deduced, the trend seemed to start on the r/London subreddit, where a user complained about a spot in Borough Market being “ruined by influencers” on Monday:

“Last 2 times I have been there has been a queue of over 200 people, and the ones with the food are just doing the selfie shit for their [I]nsta[gram] pages and then throwing most of the food away.”

As of this writing, the post has 4,900 upvotes and numerous responses suggesting that Redditors talk about how good Angus Steakhouse is so that Google picks up on it. Commenters quickly understood the assignment.

“Agreed with other posters Angus steakhouse is absolutely top tier and tourists shoyldnt [sic] miss out on it,” one Redditor wrote.

Another Reddit user wrote:

Spreading misinformation suddenly becomes a noble goal.

As of this writing, asking Google for the best steak, steakhouse, or steak sandwich in London (or similar) isn’t generating an AI Overview result for me. But when I searched for the best steak sandwich in London, the top result is from Reddit, including a thread from four days ago titled “Which Angus Steakhouse do you recommend for their steak sandwich?” and one from two days ago titled “Had to see what all the hype was about, best steak sandwich I’ve ever had!” with a picture of an Angus Steakhouse.

Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers Read More »

In fear of more user protests, Reddit announces controversial policy change

Policy, reddit, Social Media / Kelly Newman / September 30, 2024

Protest blowback —

Moderators now need Reddit’s permission to turn subreddits private, NSFW.

Scharon Harding – Sep 30, 2024 9: 05 pm UTC

The Reddit application can be seen on the display of a smartphone.

Following site-wide user protests last year that featured moderators turning thousands of subreddits private or not-safe-for-work (NSFW), Reddit announced that mods now need its permission to make those changes.

Reddit’s VP of community, going by Go_JasonWaterfalls, made the announcement about what Reddit calls Community Types today. Reddit’s permission is also required to make subreddits restricted or to go from NSFW to safe-for-work (SFW). Reddit’s employee claimed that requests will be responded to “in under 24 hours.”

Reddit’s employee said that “temporarily going restricted is exempt” from this requirement, adding that “mods can continue to instantly restrict posts and/or comments for up to 7 days using Temporary Events.” Additionally, if a subreddit has fewer than 5,000 members or is less than 30 days old, the request “will be automatically approved,” per Go_JasonWaterfalls.

Reddit’s post includes a list of “valid” reasons that mods tend to change their subreddit’s Community Type and provides alternative solutions.

Last year’s protests “accelerated” this policy change

Last year, Reddit announced that it would be charging a massive amount for access to its previously free API. This caused many popular third-party Reddit apps to close down. Reddit users then protested by turning subreddits private (or read-only) or by only showing NSFW content or jokes and memes. Reddit then responded by removing some moderators; eventually, the protests subsided.

Reddit, which previously admitted that another similar protest could hurt it financially, has maintained that moderators’ actions during the protests broke its rules. Now, it has solidified a way to prevent something like last year’s site-wide protests from happening again.

Speaking to The Verge, Laura Nestler, who The Verge reported is Go_JasonWaterfalls, claimed that Reddit has been talking about making this change since at least 2021. The protests, she said, were a wake-up call that moderators’ ability to turn subreddits private “could be used to harm Reddit at scale. The protests “accelerated” the policy change, per Nestler.

The announcement on r/modnews reads:

… the ability to instantly change Community Type settings has been used to break the platform and violate our rules. We have a responsibility to protect Reddit and ensure its long-term health, and we cannot allow actions that deliberately cause harm.

After shutting down a tactic for responding to unfavorable Reddit policy changes, Go_JasonWaterfalls claimed that Reddit still wants to hear from users.

“Community Type settings have historically been used to protest Reddit’s decisions,” they wrote.

“While we are making this change to ensure users’ expectations regarding a community’s access do not suddenly change, protest is allowed on Reddit. We want to hear from you when you think Reddit is making decisions that are not in your communities’ best interests. But if a protest crosses the line into harming redditors and Reddit, we’ll step in.”

Last year’s user protests illustrated how dependent Reddit is on unpaid moderators and user-generated content. At times, things turned ugly, pitting Reddit executives against long-time users (Reddit CEO Steve Huffman infamously called Reddit mods “landed gentry,” something that some were quick to remind Go_JasonWaterfalls of) and reportedly worrying Reddit employees.

Although the protests failed to reverse Reddit’s prohibitive API fees or to save most third-party apps, it succeeded in getting users’ concerns heard and even crashed Reddit for three hours. Further, NFSW protests temporarily prevented Reddit from selling ads on some subreddits. Since going public this year and amid a push to reach profitability, Reddit has been more focused on ads than ever. (Most of Reddit’s money comes from ads.)

Reddit’s Nestler told The Verge that the new policy was reviewed by Reddit’s Mod Council. Reddit is confident that it won’t lose mods because of the change, she said.

“Demotes us all to janitors”

The news marks another broad policy change that is likely to upset users and make Reddit seem unwilling to give into user feedback, despite Go_JasonWaterfalls saying that “protest is allowed on Reddit.” For example, in response, Reddit user CouncilOfStrongs said:

Don’t lie to us, please.

Something that you can ignore because it has no impact cannot be a protest, and no matter what you say that is obviously the one and only point of you doing this – to block moderators from being able to hold Reddit accountable in even the smallest way for malicious, irresponsible, bad faith changes that they make.

Reddit user belisaurius, who is listed as a mod for several active subreddits, including a 336,000-member one for the Philadelphia Eagles NFL team, said that the policy change “removes moderators from any position of central responsibility and demotes us all to janitors.”

As Reddit continues seeking profits and seemingly more control over a platform built around free user-generated content and moderation, users will have to either accept that Reddit is changing or leave the platform.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder in Reddit.

In fear of more user protests, Reddit announces controversial policy change Read More »

Reddit considers search ads, paywalled content for the future

ads, reddit, search, Social Media, Tech / Mike M. / August 7, 2024

Q2 2024 earnings —

Current ad load is relatively “light,” COO says.

Scharon Harding – Aug 7, 2024 5: 26 pm UTC

Reddit executives discussed plans on Tuesday for making more money from the platform, including showing ads in more places and possibly putting some content behind a paywall.

On Tuesday, Reddit shared its Q2 2024 earnings report (PDF). The company lost $10.1 million during the period, down from Q2 2023’s $41.1 million loss. Reddit has never been profitable, and during its earnings call yesterday, company heads discussed potential and slated plans for monetization.

As expected, selling ads continues to be a priority. Part of the reason Reddit was OK with most third-party Reddit apps closing was that the change was expected to drive people to Reddit’s native website and apps, where the company sells ads. In Q2, Reddit’s ad revenue grew 41 percent year over year (YoY) to $253.1 million, or 90 percent of total revenue ($281.2 million).

When asked how the platform would grow ad revenue, Reddit COO Jen Wong said it’s important that advertisers “find the outcomes they want at the volumes and price they want.” She also pointed to driving more value per ad, or the cost that advertisers pay per average 1,000 impressions. To do that, Wong pointed to putting ads in places on Reddit where there aren’t ads currently:

There are still many places on Reddit without ads today. So we’re more focused on designing ads for spaces where users are spending more time versus increasing ad load in existing spaces. So for example, 50 percent of screen views, they’re now on conversation pages—that’s an opportunity.

Wong said that in places where Reddit does show ads currently, the ad load is “light” compared to about half of its rivals.

One of the places where Redditors may see more ads is within comments, which Wong noted that Reddit is currently testing. This ad capability is only “experimental,” Wong emphasized, but Reddit sees ads in comments as a way to stand out to advertisers.

There’s also an opportunity to sell ad space within Reddit search results, according to Reddit CEO Steve Huffman, who said yesterday that “over the long term, there’s significant advertising potential there as well.” More immediately, though, Reddit is looking to improve its search capabilities and this year will test “new search result pages powered by AI to summarize and recommend content, helping users dive deeper into products, shows, games, and discover new communities on Reddit,” Huffman revealed yesterday. He said Reddit is using first- and third-party AI models to improve search aspects like speed and relevance.

The move comes as Reddit is currently blocking all search engines besides Google, OpenAI, and approved education/research instances from showing recent Reddit content in their results. Yesterday, Huffman reiterated his statement that Reddit is working with “big and small” search engines to strike deals like it already has with Google and OpenAI. But looking ahead, Reddit is focused on charging for content scraping and seems to be trying to capitalize on people’s interest in using Reddit as a filter for search results.

Paywalled content possible

The possibility of paywalls came up during the earnings call when an analyst asked Huffman about maintaining Reddit’s culture as it looks to “earn money now for people and creators on the platform.” Reddit has already launched a Contributor Program, where popular posts can make Reddit users money. It has discussed monetizing its developer platform, which is in public beta with “a few hundred active developers,” Huffman said yesterday. In response to the analyst’s question, Huffman said that based on his experience, adding new ways of using Reddit “expands” the platform but doesn’t “cannibalize existing Reddit.”

He continued:

I think the existing altruistic, free version of Reddit will continue to exist and grow and thrive just the way it has. But now we will unlock the door for new use cases, new types of subreddits that can be built that may have exclusive content or private areas—things of that nature.

Huffman’s comments suggest that paywalls could be added to new subreddits rather than existing ones. At this stage, though, it’s unclear how users may be charged to use Reddit in the future if at all.

The idea of paywalling some content comes as various online entities are trying to diversify revenue beyond often volatile ad spending. Reddit has also tried elevating free aspects of the site, such as updates to Ask Me Anything (AMA), including new features like RSVPs, which were announced Tuesday.

Reddit has angered some long-time users with recent changes—including blocking search engines, forcing personalized ads, introducing an exclusionary fee for API access, and ousting some moderators during last year’s user protests—but Reddit saw its daily active unique user count increase by 51 percent YoY in Q2 to 91.2 million.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder in Reddit.

Reddit considers search ads, paywalled content for the future Read More »

Google’s AI Overview is flawed by design, and a new company blog post hints at why

AI, ai hallucinations, ai overview, Biz & IT, chatgpt, chatgtp, confabulation, Google, Google I/O, google search, hallucination, large language models, Liz Reid, machine learning, making things up, openai, reddit / DJ Henderson / June 1, 2024

guided by voices —

Google: “There are bound to be some oddities and errors” in system that told people to eat rocks.

Benj Edwards – May 31, 2024 7: 47 pm UTC

A selection of Google mascot characters created by the company. — Enlarge / The Google “G” logo surrounded by whimsical characters, all of which look stunned and surprised.

On Thursday, Google capped off a rough week of providing inaccurate and sometimes dangerous answers through its experimental AI Overview feature by authoring a follow-up blog post titled, “AI Overviews: About last week.” In the post, attributed to Google VP Liz Reid, head of Google Search, the firm formally acknowledged issues with the feature and outlined steps taken to improve a system that appears flawed by design, even if it doesn’t realize it is admitting it.

To recap, the AI Overview feature—which the company showed off at Google I/O a few weeks ago—aims to provide search users with summarized answers to questions by using an AI model integrated with Google’s web ranking systems. Right now, it’s an experimental feature that is not active for everyone, but when a participating user searches for a topic, they might see an AI-generated answer at the top of the results, pulled from highly ranked web content and summarized by an AI model.

While Google claims this approach is “highly effective” and on par with its Featured Snippets in terms of accuracy, the past week has seen numerous examples of the AI system generating bizarre, incorrect, or even potentially harmful responses, as we detailed in a recent feature where Ars reporter Kyle Orland replicated many of the unusual outputs.

Drawing inaccurate conclusions from the web

On Wednesday morning, Google's AI Overview was erroneously telling us the Sony PlayStation and Sega Saturn were available in 1993. — Enlarge / On Wednesday morning, Google’s AI Overview was erroneously telling us the Sony PlayStation and Sega Saturn were available in 1993.

Kyle Orland / Google

Given the circulating AI Overview examples, Google almost apologizes in the post and says, “We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and take it seriously.” But Reid, in an attempt to justify the errors, then goes into some very revealing detail about why AI Overviews provides erroneous information:

AI Overviews work very differently than chatbots and other LLM products that people may have tried out. They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to only show information that is backed up by top web results.

This means that AI Overviews generally don’t “hallucinate” or make things up in the ways that other LLM products might.

Here we see the fundamental flaw of the system: “AI Overviews are built to only show information that is backed up by top web results.” The design is based on the false assumption that Google’s page-ranking algorithm favors accurate results and not SEO-gamed garbage. Google Search has been broken for some time, and now the company is relying on those gamed and spam-filled results to feed its new AI model.

Even if the AI model draws from a more accurate source, as with the 1993 game console search seen above, Google’s AI language model can still make inaccurate conclusions about the “accurate” data, confabulating erroneous information in a flawed summary of the information available.

Generally ignoring the folly of basing its AI results on a broken page-ranking algorithm, Google’s blog post instead attributes the commonly circulated errors to several other factors, including users making nonsensical searches “aimed at producing erroneous results.” Google does admit faults with the AI model, like misinterpreting queries, misinterpreting “a nuance of language on the web,” and lacking sufficient high-quality information on certain topics. It also suggests that some of the more egregious examples circulating on social media are fake screenshots.

“Some of these faked results have been obvious and silly,” Reid writes. “Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared. So we’d encourage anyone encountering these screenshots to do a search themselves to check.”

(No doubt some of the social media examples are fake, but it’s worth noting that any attempts to replicate those early examples now will likely fail because Google will have manually blocked the results. And it is potentially a testament to how broken Google Search is if people believed extreme fake examples in the first place.)

While addressing the “nonsensical searches” angle in the post, Reid uses the example search, “How many rocks should I eat each day,” which went viral in a tweet on May 23. Reid says, “Prior to these screenshots going viral, practically no one asked Google that question.” And since there isn’t much data on the web that answers it, she says there is a “data void” or “information gap” that was filled by satirical content found on the web, and the AI model found it and pushed it as an answer, much like Featured Snippets might. So basically, it was working exactly as designed.

Enlarge / A screenshot of an AI Overview query, “How many rocks should I eat each day” that went viral on X last week.

Google’s AI Overview is flawed by design, and a new company blog post hints at why Read More »

Bing outage shows just how little competition Google search really has

AI, ai overview, Bing, Biz & IT, chatgpt, duckduckgo, ecosia, gby, Google, kagi, microsoft, mojeek, reddit, right dao, search, stract, Tech, yandex, yep / Shannon Garcia / May 25, 2024

Searching for new search —

Opinion: Actively searching without Google or Bing is harder than it looks.

Kevin Purdy – May 23, 2024 8: 01 pm UTC

Getty Images

Bing, Microsoft’s search engine platform, went down in the very early morning today. That meant that searches from Microsoft’s Edge browsers that had yet to change their default providers didn’t work. It also meant that services relying on Bing’s search API—Microsoft’s own Copilot, ChatGPT search, Yahoo, Ecosia, and DuckDuckGo—similarly failed.

Services were largely restored by the morning Eastern work hours, but the timing feels apt, concerning, or some combination of the two. Google, the consistently dominating search platform, just last week announced and debuted AI Overviews as a default addition to all searches. If you don’t want an AI response but still want to use Google, you can hunt down the new “Web” option in a menu, or you can, per Ernie Smith, tack “&udm=14” onto your search or use Smith’s own “Konami code” shortcut page.

If dismay about AI’s hallucinations, power draw, or pizza recipes concern you—along with perhaps broader Google issues involving privacy, tracking, news, SEO, or monopoly power—most of your other major options were brought down by a single API outage this morning. Moving past that kind of single point of vulnerability will take some work, both by the industry and by you, the person wondering if there’s a real alternative.

Search engine market share, as measured by StatCounter, April 2023–April 2024.

StatCounter

Upward of a billion dollars a year

The overwhelming majority of search tools offering an “alternative” to Google are using Google, Bing, or Yandex, the three major search engines that maintain massive global indexes. Yandex, being based in Russia, is a non-starter for many people around the world at the moment. Bing offers its services widely, most notably to DuckDuckGo, but its ad-based revenue model and privacy particulars have caused some friction there in the past. Before his company was able to block more of Microsoft’s own tracking scripts, DuckDuckGo CEO and founder Gabriel Weinberg explained in a Reddit reply why firms like his weren’t going the full DIY route:

… [W]e source most of our traditional links and images privately from Bing … Really only two companies (Google and Microsoft) have a high-quality global web link index (because I believe it costs upwards of a billion dollars a year to do), and so literally every other global search engine needs to bootstrap with one or both of them to provide a mainstream search product. The same is true for maps btw — only the biggest companies can similarly afford to put satellites up and send ground cars to take streetview pictures of every neighborhood.

Bing makes Microsoft money, if not quite profit yet. It’s in Microsoft’s interest to keep its search index stocked and API open, even if its focus is almost entirely on its own AI chatbot version of Bing. Yet if Microsoft decided to pull API access, or it became unreliable, Google’s default position gets even stronger. What would non-conformists have to choose from then?

Bing outage shows just how little competition Google search really has Read More »

OpenAI will use Reddit posts to train ChatGPT under new deal

AI, chatgpt, openai, reddit, Social Media / Shannon Garcia / May 18, 2024

Data dealings —

Reddit has been eager to sell data from user posts.

Scharon Harding – May 17, 2024 9: 18 pm UTC

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI large language models (LLMs) to create features for Reddit, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its building of AI partnerships sparks discourse around the use of user-generated content to fuel AI models without users being compensated and some potentially not considering that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI is also challenged to work with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods. Some have exited the platform since, and following the rule changes, Ars Technica spoke with long-time Redditors who were concerned about Reddit content quality moving forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

OpenAI will use Reddit posts to train ChatGPT under new deal Read More »

Reddit, AI spam bots explore new ways to show ads in your feed

ads, reddit, Social Media, Tech / DJ Henderson / April 25, 2024

Reddit has made it clear that it’s an ad-first business. Today, it expanded on that practice with a new ad format that looks to sell things to Reddit users. Simultaneously, Reddit has marketers who are interested in pushing products to users through seemingly legitimate accounts.

In a blog post today, Reddit announced that its Dynamic Product Ads are entering public beta globally. The ad format uses “shopping signals,” aka discussions with people looking to try a product or brand, machine learning, and advertiser product catalogs in order to post relevant ads. Reddit shared an image in the blog post that shows ads, including with products and pricing, that seem to relate to a posted question. User responses to the Reddit post appear under the ad.

A somewhat blurry depiction of the new type of ads Reddit is testing.
A (still blurry) example of a more targeted approach to Reddit’s new ad format.

Reddit’s Dynamic Product Ads can automatically show users ads “based on the products they’ve previously engaged with on the advertiser’s site” and/or “based on what people engage with on Reddit or advertiser sites,” per the blog.

Reddit is an ad business

Reddit’s blog didn’t imply that Dynamic Product Ads means users would see more ads than they do currently. However, today’s blog highlighted the newly public company’s focus on ad sales.

“With Dynamic Product Ads, brands can tap into the rich, high-intent product conversations that people come to Reddit for,” Reddit EVP of Business Marketing and Growth Jim Squires said in a statement.

The blog also noted that “Reddit’s communities are naturally commercial,” adding:

Reddit is where people come to make shopping decisions, and we’re focused on bringing brands into these interactions in a way that adds value for people and drives growth for businesses.

The stance has been increasingly clear over the past year, as Reddit became rather vocal about the fact that it’s never been profitable. In June, the company started charging for API access, resulting in numerous valued third-party Reddit apps closing and messy user protests that left a bad taste in countless long-time users’ and moderators’ mouths. While Reddit initially announced the change as a way to prevent large language models from using its data for free training, it was also seen as a way to drive users to Reddit’s website and mobile app, where it can serve users ads.

Per Reddit’s February SEC filing (PDF), ads made up 98 percent of Reddit’s revenues in 2023 and 2022. That filing included a note from CEO Steve Huffman, saying: “Advertising is our first business” and that Reddit’s ad business is “still in the early phases of growing.”

In September, the company started preventing users from opting out of personalized ads. In June, Reddit introduced a new tool to advertisers that uses natural language processing to look through Reddit user comments for keywords that signal potential interest for a brand.

Reddit’s blog post today hinted at some future evolutions focused on showing Reddit users ads, including “tools and features such as new shopping ads formats like collection ads that enhance the shopper experience while driving performance” and “merchant platform integrations that welcome smaller merchants.”

Reddit, AI spam bots explore new ways to show ads in your feed Read More »

After overreaching TOS angers users, cloud provider Vultr backs off

cloud platform, cloud server, Policy, reddit, user content, vultr / DJ Henderson / March 30, 2024

“Clearly causing confusion” —

Terms seemed to grant an “irrevocable” right to commercialize any user content.

Ashley Belanger – Mar 29, 2024 4: 31 pm UTC

After backlash, the cloud provider Vultr has updated its terms to remove a clause that a Reddit user feared required customers to “fork over rights” to “anything” hosted on its platform.

The alarming clause seemed to grant Vultr a “non-exclusive, perpetual, irrevocable” license to “use and commercialize” any user content uploaded, posted, hosted, or stored on Vultr “in any way that Vultr deems appropriate, without any further consent” or compensation to users or third parties.

Here’s the full clause that was removed:

You hereby grant to Vultr a non-exclusive, perpetual, irrevocable, royalty-free, fully paid-up, worldwide license (including the right to sublicense through multiple tiers) to use, reproduce, process, adapt, publicly perform, publicly display, modify, prepare derivative works, publish, transmit and distribute each of your User Content, or any portion thereof, in any form, medium or distribution method now known or hereafter existing, known or developed, and otherwise use and commercialize the User Content in any way that Vultr deems appropriate, without any further consent, notice and/or compensation to you or to any third parties, for purposes of providing the Services to you.

In a statement provided to Ars, Vultr CEO J.J. Kardwell said that the terms were revised to “simplify and clarify” language causing confusion for some users.

“A Reddit post incorrectly took portions of our Terms of Service out of context, which only pertain to content provided to Vultr on our public mediums (community-related content on public forums, as an example) for purposes of rendering the needed services—e.g., publishing comments, posts, or ratings,” Kardwell said. “This is separate from a user’s own, private content that is deployed on Vultr services.”

It’s easy to see why the Reddit user was confused, as the previous terms did not clearly differentiate between a user’s public and “private content” in the paragraph where it was included. Kardwell told The Register that the old terms, which were drafted in 2021, were “clearly causing confusion for some portion of users” and were updated because Vultr recognized “that the average user doesn’t have a law degree.”

According to Kardwell, the part of the removed clause that “ends with ‘for purposes of providing the Services to you'” was “intended to make it clear that any rights referenced are solely for the purposes of providing the Services to you.” Kevin Cochrane, Vultr’s chief marketing officer, told Ars that users were intended to scroll down to understand that the line only applied to community content described in a section labeled “content that you make publicly available.” He said that the removed clause was necessary in 2021 when Vultr provided forums and collected ratings, but that the clause could be stripped now because “we don’t actually use” that kind of community content “any longer.”

“We’re very focused on being responsive to the community and the concerns people have, and we believe the strongest thing we can do to demonstrate that there is no bad intent here is to remove it,” Kardwell told The Register.

A plain read of the terms without scrolling seemed to substantiate the Reddit user’s worst fears that “it’s possible Vultr may want the expansive license grant to do AI/Machine Learning based on the data they host. Or maybe they could mine database contents to resell [personally identifiable information]. Given the (perpetual!) license, there’s not really any limit to what they might do. They could even clone someone’s app and sell their own rebranded version, and they’d be legally in the clear.”

The user claimed to have been locked out of their Vultr account for five days after refusing to agree to the terms, with Vultr’s support team seemingly providing little recourse to migrate data to a new cloud provider.

“Migrating all my servers and DNS without being able to log in to my account is going to be both a headache and error prone,” the Reddit user wrote. “I feel like they’re holding my business hostage and extorting me into accepting a license I would never consent to under duress.”

Ars was not able to reach the Reddit user to see if Vultr removing the line from the terms has resolved the issue. Other users on the thread claimed that they had terminated their Vultr accounts over the controversy. Cochrane told Ars that they had been contacted by many customers over the past two days and had no way to identify the Reddit user to confirm if they had terminated their account. Cochrane said the support team was actively reaching out to users to verify if their complaints stemmed from discomfort with the previous terms.

In his statement, Kardwell reiterated that Vultr “customers own 100 percent of their content,” clarifying that Vultr “has never claimed any rights to, used, accessed, nor allowed access to or shared” user content, “other than as may be required by law or for security purposes.”

He also confirmed that Vultr would be conducting a “full review” of its terms and publishing another update “soon.” Kardwell told The Register that the most recent update to its terms that led the Reddit user to call out the company was “actually spurred by unrelated Microsoft licensing changes,” promising that Vultr has no plans to use or commercialize user data.

“We do not use user data,” Kardwell told The Register. “We never have, and we never will. We take privacy and security very seriously. It’s at the core of what we do globally.”

After overreaching TOS angers users, cloud provider Vultr backs off Read More »

Reddit faces new reality after cashing in on its IPO

IPO, Policy, reddit, syndication / DJ Henderson / March 23, 2024

r/WallStreetBets —

Reddit must now answer to its shareholders as well as its vocal users.

Hannah Murphy and Anna Mutoh, Financial Times – Mar 23, 2024 10: 10 am UTC

In an interview on the New York Stock Exchange trading floor ahead of Reddit’s market debut on Thursday, chief executive Steve Huffman acknowledged that the mischievous retail investors that congregate on the social media platform might deliberately drive down its share price.

“It’s a free market!” he said.

For Reddit, as for Huffman, the bet on a public offering for a site he described as a “fun and special, but sometimes crazy place” has appeared to pay off.

Shares of the social media company soared on its Big Board debut under the ticker RDDT, closing at $50.44, or 48 percent above its IPO price. This brought its fully diluted market capitalization to $9.5 billion, close to where the company was last valued privately at $10 billion in 2021.

Reddit’s journey to public markets marks a turning point for a fringe, free speech-oriented platform dominated by esoteric memes, sardonic humor, and gamers, as it transforms itself into a more mainstream discussion hub that enforces stricter moderation rules in order to attract advertising dollars.

The picture for its earlier investors was mixed. One big winner was the Newhouse family, who through Advance Magazine Publishers Inc own Condé Nast, which bought Reddit in 2006 for $10 million before spinning it out in 2011. Its shares are now worth about $2.1 billion, a handsome windfall to their publishing empire, which also includes Vanity Fair, the New Yorker, and Vogue. Entities affiliated with OpenAI chief executive Sam Altman now hold a stake worth $613 million.

But investors who put money in at the last financing round in 2021 at $61.79 a share, such as Fidelity, were looking at slightly less on that particular investment.

Founded in 2005, the self-proclaimed “front page of the internet” has battled through management upheaval and moderation scandals to grow to 73 million daily users across its 100,000 communities, or “subreddits,” per Reddit parlance. It is a social media minnow, however, relative to Meta or X, which have more than 2.1 billion and 245 million daily active users, respectively.

Still, its IPO attracted institutional interest. Demand was strong, and the top two dozen investors in the deal, who received the majority of its shares, were typically large asset managers who intend on owning the stock for the long term, one person familiar with the matter said.

Reddit’s surge on its first day of trading, a day after AI infrastructure group Astera Labs jumped 72 percent in its Nasdaq debut, also signals a validation of public investor demand for listings—even a company that is unprofitable, such as Reddit.

“Overall, this is a very positive development for IPO markets [and] should bode well for many of the pre-IPO companies sitting in the queue,” said Christian Munafo, chief investment officer of Liberty Street Advisors.

But, Munafo said, “while [Reddit] performed well out of the gate, the stock may come under pressure unless they are able to demonstrate better growth and monetization.”

Either way, the deal is a boon for Huffman. The chief executive sold 500,000 of his shares in the IPO, cashing out a plump $17 million, and is due to receive additional equity awards as a result of listing the company above a $5 billion valuation. He also received an estimated $193 million pay package last year, mostly made up of equity awards, according to filings.

Historically, Huffman’s style as a leader has reflected that of Reddit’s unruly user base. The self-confessed “internet troll” initially squirmed at the idea of policing the more extreme communities hosted on the platform, relying on these groups to create their own rules and self-moderate. He has defended and cheered on Reddit’s WallStreetBets trading forum that shot to mainstream fame when members collectively bought so-called meme stocks in a bid to squeeze hedge funds*.

But Huffman has recently been forced to tidy up the darker underbelly of the platform for advertisers, present a more professional front to Wall Street and hunt harder for profitability. As a result, Reddit has shifted its ambitions slightly to pin its fortunes to wider tech trends. When Reddit first filed for an IPO in 2021, AI was mentioned once in its prospectus. In the 2024 version, AI appeared more than 60 times.

Nevertheless, the approach has left Huffman and the company at odds with some Reddit communities, who have been resistant to any changes to the platform. Facing new pressures as it enters public markets, some analysts warn that Reddit’s character could be destroyed and users may seek out alternatives, in a drag to the company.

“Reddit, more so than many social media platforms, has been a very community-based, non-commercial space and people know and love it for [this],” said Samuel Woolley, a propaganda expert and assistant professor at the University of Texas at Austin.

“I think the big question that should be on everyone’s mind for Reddit is to what extent the IPO will change the very nature and fabric of the platform.”

Additional reporting by Nicholas Megaw in New York.

Reddit faces new reality after cashing in on its IPO Read More »

Reddit cashes in on AI gold rush with $203M in LLM training license fees

AI, Data, Google, legal, license, LLMs, reddit, training / DJ Henderson / February 25, 2024

Your posts are the product —

Two- to three-year deals with Google, others, come amid legal uncertainty over “fair use.”

Kyle Orland – Feb 23, 2024 5: 13 pm UTC

Enlarge / “Reddit Gold” takes on a whole new meaning when AI training data is involved.

The last week saw word leak that Google had agreed to license Reddit’s massive corpus of billions of posts and comments to help train its large language models. Now, in a recent Securities and Exchange Commission filing, the popular online forum has revealed that it will bring in $203 million from that and other unspecified AI data licensing contracts over the next three years.

Reddit’s Form S-1—published by the SEC late Thursday ahead of the site’s planned stock IPO—says the company expects $66.4 million of that data-derived value from LLM companies to come during the 2024 calendar year. Bloomberg previously reported the Google deal to be worth an estimated $60 million a year, suggesting that the three-year deal represents the vast majority of its AI licensing revenue so far.

Google and other AI companies that license Reddit’s data will receive “continuous access to [Reddit’s] data API as well as quarterly transfers of Reddit data over the term of the arrangement,” according to the filing. That constant, real-time access is particularly valuable, the site writes in the filing, because “Reddit data constantly grows and regenerates as users come and interact with their communities and each other.”

“Why pay for the cow…?”

While Reddit sees data licensing to AI firms as an important part of its financial future, its filing also notes that free use of its data has already been “a foundational part of how many of the leading large language models have been trained.” The filing seems almost bitter in noting that “some companies have constructed very large commercial language models using Reddit data without entering into a license agreement with us.”

That acknowledgment highlights the still-murky legal landscape over AI companies’ penchant for scraping huge swathes of the public web for training purposes, a practice those companies defend as fair use. And Reddit seems well aware that AI models may continue to hoover up its posts and comments for free, even as it tries to sell that data to others.

“Some companies may decline to license Reddit data and use such data without license given its open nature, even if in violation of the legal terms governing our services,” the company writes. “While we plan to vigorously enforce against such entities, such enforcement activities could take years to resolve, result in substantial expense, and divert management’s attention and other resources, and we may not ultimately be successful.”

Yet the mere existence of AI data licensing agreements like Reddit’s may influence how legal battles over this kind of data scraping play out. As Ars’ Timothy Lee and James Grimmelmann noted in a recent legal analysis, the establishment of a settled licensing market can have a huge impact on whether courts consider a novel use of digitized data to be “fair use” under copyright law.

“The more [AI data licensing] deals like this are signed in the coming months, the easier it will be for the plaintiffs to argue that the ‘effect on the market’ prong of fair use analysis should take this licensing market into account,” Lee and Grimmelmann wrote.

And while Reddit sees LLMs as a new revenue opportunity, the site also sees their popularity as a potential threat. The S-1 filing notes that “some users are also turning to LLMs such as ChatGPT, Gemini, and Anthropic” for seeking information, putting them in the same category of Reddit competition as “Google, Amazon, YouTube, Wikipedia, X, and other news sites.”

After filing for its IPO in late 2021, reports suggest Reddit is aiming to hit the stock market next month officially. The company will offer users and moderators with sufficient karma and/or activity on the site the opportunity to participate in that IPO through a directed share program.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Reddit cashes in on AI gold rush with $203M in LLM training license fees Read More »

Reddit admits more moderator protests could hurt its business

API, generative ai, IPO, Policy, reddit, Social Media, Tech / DJ Henderson / February 24, 2024

SEC filing —

Losing third-party tools “could harm our moderators’ ability to review content…”

Scharon Harding – Feb 23, 2024 5: 42 pm UTC

Reddit filed to go public on Thursday (PDF), revealing various details of the social media company’s inner workings. Among the revelations, Reddit acknowledged the threat of future user protests and the value of third-party Reddit apps.

On July 1, Reddit enacted API rule changes—including new, expensive pricing —that resulted in many third-party Reddit apps closing. Disturbed by the changes, the timeline of the changes, and concerns that Reddit wasn’t properly appreciating third-party app developers and moderators, thousands of Reddit users protested by making the subreddits they moderate private, read-only, and/or engaging in other forms of protest, such as only discussing John Oliver or porn.

Protests went on for weeks and, at their onset, crashed Reddit for three hours. At the time, Reddit CEO Steve Huffman said the protests did not have “any significant revenue impact so far.”

In its filing with the Securities and Exchange Commission (SEC), though, Reddit acknowledged that another such protest could hurt its pockets:

While these activities have not historically had a material impact on our business or results of operations, similar actions by moderators and/or their communities in the future could adversely affect our business, results of operations, financial condition, and prospects.

The company also said that bad publicity and media coverage, such as the kind that stemmed from the API protests, could be a risk to Reddit’s success. The Form S-1 said bad PR around Reddit, including its practices, prices, and mods, “could adversely affect the size, demographics, engagement, and loyalty of our user base,” adding:

For instance, in May and June 2023, we experienced negative publicity as a result of our API policy changes.

Reddit’s filing also said that negative publicity and moderators disrupting the normal operation of subreddits could hurt user growth and engagement goals. The company highlighted financial incentives associated with having good relationships with volunteer moderators, noting that if enough mods decided to disrupt Reddit (like they did when they led protests last year), “results of operations, financial condition, and prospects could be adversely affected.” Reddit infamously forcibly removed moderators from their posts during the protests, saying they broke Reddit rules by refusing to reopen the subreddits they moderated.

“As communities grow, it can become more and more challenging for communities to find qualified people willing to act as moderators,” the filing says.

Losing third-party tools could hurt Reddit’s business

Much of the momentum for last year’s protests came from users, including long-time Redditors, mods, and people with accessibility needs, feeling that third-party apps were necessary to enjoyably and properly access and/or moderate Reddit. Reddit’s own technology has disappointed users in the past (leading some to cling to Old Reddit, which uses an older interface, for example). In its SEC filing, Reddit pointed to the value of third-party “tools” despite its API pricing killing off many of the most popular examples.

Reddit’s filing discusses losing moderators as a business risk and notes how important third-party tools are in maintaining mods:

While we provide tools to our communities to manage their subreddits, our moderators also rely on their own and third-party tools. Any disruption to, or lack of availability of, these third-party tools could harm our moderators’ ability to review content and enforce community rules. Further, if we are unable to provide effective support for third-party moderation tools, or develop our own such tools, our moderators could decide to leave our platform and may encourage their communities to follow them to a new platform, which would adversely affect our business, results of operations, financial condition, and prospects.

Since Reddit’s API policy changes, a small number of third-party Reddit apps remain available. But some of the remaining third-party Reddit app developers have previously told Ars Technica that they’re unsure of their app’s tenability under Reddit’s terms. Nondisclosure agreement requirements and the lack of a finalized developer platform also drive uncertainty around the longevity of the third-party Reddit app ecosystem, according to devs Ars spoke with this year.

Reddit admits more moderator protests could hurt its business Read More »

Reddit sells training data to unnamed AI company ahead of IPO

AI, Axel Springer, Biz & IT, Bloomberg, chatgpt, chatgtp, image synthesis, large language models, machine learning, openai, reddit, Stable Diffusion, steve huffman, text synthesis / Mike M. / February 20, 2024

Everything has a price —

If you’ve posted on Reddit, you’re likely feeding the future of AI.

Benj Edwards – Feb 19, 2024 9: 10 pm UTC

In this photo illustration the American social news

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that it planned to charge AI companies for access to its almost two decades’ worth of human-generated content.

If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.

While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Reddit sells training data to unnamed AI company ahead of IPO Read More »