AI Models Accused of Training on Children's Data Despite Privacy Settings

AI models are being accused of using images of real-life children as training datasets despite parents enabling data privacy settings.

Human Rights Watch researcher Hye Jung Han found more than 190 photos of Australian children, including Indigenous children, that were collected without consent through web crawlers.



Many of the collected images even included identifying URLs that could allow strangers to track down a child's full name, address, and other personal information.

The report came after Han earlier spotted nearly 170 photos of Brazilian children, including still images taken from YouTube videos.

All of the images were discovered in LAION-5B, a popular dataset used to train AI models such as Stable Diffusion.

LAION, a "nonprofit, volunteer organization," previously told Ars Technica that it is already doing its part to resolve the "larger and very concerning issue" and prevent its dataset from being misused to exploit children.


Concerns Rise Amid Surge of AI-Generated Explicit Images

Because such AI tools are often used to generate deepfakes, there is a high risk that users could exploit the children's likenesses without consent to create deepfakes that damage their reputations or encourage bullying.

The Federal Bureau of Investigation has previously warned the public to be vigilant against malicious actors using AI to extort victims with manipulated or wholly generated explicit photos.

Han did not confirm whether the collected photos were being used in criminal schemes, but noted that even if LAION deletes the photos, the images remain embedded in AI models that have already been trained on the dataset.

How to Protect Likenesses From AI Web Crawlers

The best way to prevent web crawlers from accessing sensitive images, especially pictures of children, is to limit how many of these photos appear on public online spaces and social platforms.

As these platforms are common targets for AI training dataset collection, it is advisable to blur or exclude images of children in online posts to protect their privacy.
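For readers who host family photos on their own websites rather than social platforms, one additional step is a robots.txt file asking known crawlers to skip those pages. This is only a sketch: compliance is voluntary, the `/photos/` path is a hypothetical example, and CCBot and Common Crawl's role as a source for datasets like LAION-5B are based on the crawlers' own published documentation.

```
# robots.txt — ask known AI/dataset crawlers not to fetch the photo directory.
# Note: honoring these rules is voluntary; misbehaving crawlers may ignore them.

User-agent: CCBot
# Common Crawl's crawler; its archives have been used to build AI training datasets.
Disallow: /photos/

User-agent: GPTBot
# OpenAI's web crawler.
Disallow: /photos/

User-agent: *
Allow: /
```

This does not remove images already collected, so it works best alongside the other precautions above.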

Adding "AI poison" filters to these images could also help prevent AI models from training on them, although AI companies are reportedly developing tools to bypass such filters.


© 2024 iTech Post All rights reserved. Do not reproduce without permission.
