children’s online privacy

ai-trains-on-kids’-photos-even-when-parents-use-strict-privacy-settings

AI trains on kids’ photos even when parents use strict privacy settings

“Outrageous” —

Even unlisted YouTube videos are used to train AI, watchdog warns.

AI trains on kids’ photos even when parents use strict privacy settings

Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators—even when platforms prohibit scraping and families use strict privacy settings.

Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids that were linked in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging 190 photos of children from all of Australia’s states and territories, including indigenous children who may be particularly vulnerable to harms.

These photos are linked in the dataset “without the knowledge or consent of the children or their families.” They span the entirety of childhood, making it possible for AI image generators to generate realistic deepfakes of real Australian children, Han’s report said. Perhaps even more concerning, the URLs in the dataset sometimes reveal identifying information about children, including their names and locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online.

That puts children in danger of privacy and safety risks, Han said, and some parents thinking they’ve protected their kids’ privacy online may not realize that these risks exist.

From a single link to one photo that showed “two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural,” Han could trace “both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia.” And perhaps most disturbingly, “information about these children does not appear to exist anywhere else on the Internet”—suggesting that families were particularly cautious in shielding these boys’ identities online.

Stricter privacy settings were used in another image that Han found linked in the dataset. The photo showed “a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating” during the week after their final exams, Han reported. Whoever posted that YouTube video adjusted privacy settings so that it would be “unlisted” and would not appear in searches.

Only someone with a link to the video was supposed to have access, but that didn’t stop Common Crawl from archiving the image, nor did YouTube policies prohibiting AI scraping or harvesting of identifying information.

Reached for comment, YouTube’s spokesperson, Jack Malon, told Ars that YouTube has “been clear that the unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse.” But Han worries that even if YouTube did join efforts to remove images of children from the dataset, the damage has been done, since AI tools have already trained on them. That’s why—even more than parents need tech companies to up their game blocking AI training—kids need regulators to intervene and stop training before it happens, Han’s report said.

Han’s report comes a month before Australia is expected to release a reformed draft of the country’s Privacy Act. Those reforms include a draft of Australia’s first child data protection law, known as the Children’s Online Privacy Code, but Han told Ars that even people involved in long-running discussions about reforms aren’t “actually sure how much the government is going to announce in August.”

“Children in Australia are waiting with bated breath to see if the government will adopt protections for them,” Han said, emphasizing in her report that “children should not have to live in fear that their photos might be stolen and weaponized against them.”

AI uniquely harms Australian kids

To hunt down the photos of Australian kids, Han “reviewed fewer than 0.0001 percent of the 5.85 billion images and captions contained in the data set.” Because her sample was so small, Han expects that her findings represent a significant undercount of how many children could be impacted by the AI scraping.

“It’s astonishing that out of a random sample size of about 5,000 photos, I immediately fell into 190 photos of Australian children,” Han told Ars. “You would expect that there would be more photos of cats than there are personal photos of children,” since LAION-5B is a “reflection of the entire Internet.”

LAION is working with HRW to remove links to all the images flagged, but cleaning up the dataset does not seem to be a fast process. Han told Ars that based on her most recent exchange with the German nonprofit, LAION had not yet removed links to photos of Brazilian kids that she reported a month ago.

LAION declined Ars’ request for comment.

In June, LAION’s spokesperson, Nathan Tyler, told Ars that, “as a nonprofit, volunteer organization,” LAION is committed to doing its part to help with the “larger and very concerning issue” of misuse of children’s data online. But removing links from the LAION-5B dataset does not remove the images online, Tyler noted, where they can still be referenced and used in other AI datasets, particularly those relying on Common Crawl. And Han pointed out that removing the links from the dataset doesn’t change AI models that have already trained on them.

“Current AI models cannot forget data they were trained on, even if the data was later removed from the training data set,” Han’s report said.

Kids whose images are used to train AI models are exposed to a variety of harms, Han reported, including a risk that image generators could more convincingly create harmful or explicit deepfakes. In Australia last month, “about 50 girls from Melbourne reported that photos from their social media profiles were taken and manipulated using AI to create sexually explicit deepfakes of them, which were then circulated online,” Han reported.

For First Nations children—”including those identified in captions as being from the Anangu, Arrernte, Pitjantjatjara, Pintupi, Tiwi, and Warlpiri peoples”—the inclusion of links to photos threatens unique harms. Because culturally, First Nations peoples “restrict the reproduction of photos of deceased people during periods of mourning,” Han said the AI training could perpetuate harms by making it harder to control when images are reproduced.

Once an AI model trains on the images, there are other obvious privacy risks, including a concern that AI models are “notorious for leaking private information,” Han said. Guardrails added to image generators do not always prevent these leaks, with some tools “repeatedly broken,” Han reported.

LAION recommends that, if troubled by the privacy risks, parents remove images of kids online as the most effective way to prevent abuse. But Han told Ars that’s “not just unrealistic, but frankly, outrageous.”

“The answer is not to call for children and parents to remove wonderful photos of kids online,” Han said. “The call should be [for] some sort of legal protections for these photos, so that kids don’t have to always wonder if their selfie is going to be abused.”

AI trains on kids’ photos even when parents use strict privacy settings Read More »

ai-trained-on-photos-from-kids’-entire-childhood-without-their-consent

AI trained on photos from kids’ entire childhood without their consent

AI trained on photos from kids’ entire childhood without their consent

Photos of Brazilian kids—sometimes spanning their entire childhood—have been used without their consent to power AI tools, including popular image generators like Stable Diffusion, Human Rights Watch (HRW) warned on Monday.

This act poses urgent privacy risks to kids and seems to increase risks of non-consensual AI-generated images bearing their likenesses, HRW’s report said.

An HRW researcher, Hye Jung Han, helped expose the problem. She analyzed “less than 0.0001 percent” of LAION-5B, a dataset built from Common Crawl snapshots of the public web. The dataset does not contain the actual photos but includes image-text pairs derived from 5.85 billion images and captions posted online since 2008.

Among those images linked in the dataset, Han found 170 photos of children from at least 10 Brazilian states. These were mostly family photos uploaded to personal and parenting blogs most Internet surfers wouldn’t easily stumble upon, “as well as stills from YouTube videos with small view counts, seemingly uploaded to be shared with family and friends,” Wired reported.

LAION, the German nonprofit that created the dataset, has worked with HRW to remove the links to the children’s images in the dataset.

That may not completely resolve the problem, though. HRW’s report warned that the removed links are “likely to be a significant undercount of the total amount of children’s personal data that exists in LAION-5B.” Han told Wired that she fears that the dataset may still be referencing personal photos of kids “from all over the world.”

Removing the links also does not remove the images from the public web, where they can still be referenced and used in other AI datasets, particularly those relying on Common Crawl, LAION’s spokesperson, Nate Tyler, told Ars.

“This is a larger and very concerning issue, and as a nonprofit, volunteer organization, we will do our part to help,” Tyler told Ars.

Han told Ars that “Common Crawl should stop scraping children’s personal data, given the privacy risks involved and the potential for new forms of misuse.”

According to HRW’s analysis, many of the Brazilian children’s identities were “easily traceable,” due to children’s names and locations being included in image captions that were processed when building the LAION dataset.

And at a time when middle and high school-aged students are at greater risk of being targeted by bullies or bad actors turning “innocuous photos” into explicit imagery, it’s possible that AI tools may be better equipped to generate AI clones of kids whose images are referenced in AI datasets, HRW suggested.

“The photos reviewed span the entirety of childhood,” HRW’s report said. “They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home, students giving a presentation at school, and teenagers posing for photos at their high school’s carnival.”

There is less risk that the Brazilian kids’ photos are currently powering AI tools since “all publicly available versions of LAION-5B were taken down” in December, Tyler told Ars. That decision came out of an “abundance of caution” after a Stanford University report “found links in the dataset pointing to illegal content on the public web,” Tyler said, including 3,226 suspected instances of child sexual abuse material.

Han told Ars that “the version of the dataset that we examined pre-dates LAION’s temporary removal of its dataset in December 2023.” The dataset will not be available again until LAION determines that all flagged illegal content has been removed.

“LAION is currently working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION-5B,” Tyler told Ars. “We are grateful for their support and hope to republish a revised LAION-5B soon.”

In Brazil, “at least 85 girls” have reported classmates harassing them by using AI tools to “create sexually explicit deepfakes of the girls based on photos taken from their social media profiles,” HRW reported. Once these explicit deepfakes are posted online, they can inflict “lasting harm,” HRW warned, potentially remaining online for their entire lives.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” Han said. “The government should urgently adopt policies to protect children’s data from AI-fueled misuse.”

Ars could not immediately reach Stable Diffusion maker Stability AI for comment.

AI trained on photos from kids’ entire childhood without their consent Read More »