AI video generator Runway was caught scraping thousands of YouTube videos and films without consent to train its AI models, according to leaked internal documents obtained by 404 Media.
According to the report, the multibillion-dollar AI firm was secretly using illegally obtained popular YouTube videos and paywalled media to train its latest AI models.
Although the report was unable to confirm which AI model was trained on the scraped data sets, 404 Media suggested that it could be the company's latest Gen-3 model, based on how the company evaded previous inquiries.
In an earlier TechCrunch interview, Runway co-founder Anastasis Germanidis claimed that its AI only uses "internal datasets" with an "in-house research team that oversees all of our training."
If the statement is proven false, the AI firm would likely be in violation of YouTube's policy against unauthorized data scraping, a policy Google has enforced against OpenAI.
It is worth noting that Runway is a Google-backed firm, valued at $1.5 billion after a funding round in June last year, and that Google allows its cloud storage system to be used to train Runway's new AI models.
AI Firms Accused of Scraping YouTube Videos for Data Training
Runway is not the only AI firm being accused of illegally using licensed and protected content to train its new models.
Tech industry giants like Apple, Anthropic, Nvidia, and Salesforce have reportedly used scraped popular YouTube videos for AI training.
Even Google itself has notably taken user-generated content on its video-streaming social platform to train text-based AI models.
How to Protect User Content on YouTube from AI Data Scrapers?
One way to keep AI data scrapers from making use of user-generated content on social platforms like YouTube is to apply tools that embed invisible "AI poisons" in the video.
These tools add an imperceptible layer to the footage that is meant to render the video or data unusable when it is processed for AI training.
Most of these tools are available for free, although users need to request access before they can protect their content.
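For readers curious about what an "invisible layer" means in practice, the following is a minimal Python sketch of the general idea: every pixel in a frame is shifted by an amount too small to notice. It is not the method of any specific tool. The names `add_invisible_layer`, `max_pixel_change`, and `epsilon` are hypothetical, and the random offsets are only a stand-in. Real poisoning tools are understood to compute their changes deliberately so that AI models misread the content, and random noise like this would not by itself disrupt training.

```python
import random


def add_invisible_layer(frame, epsilon=2, seed=0):
    """Return a copy of `frame` with every pixel shifted by at most `epsilon`.

    `frame` is a list of rows of 0-255 grayscale values. The random offsets
    are a placeholder (an assumption for illustration): an actual poisoning
    tool would choose its offsets on purpose rather than at random.
    """
    rng = random.Random(seed)
    perturbed = []
    for row in frame:
        new_row = []
        for value in row:
            delta = rng.randint(-epsilon, epsilon)
            # Clamp so the result is still a valid 8-bit pixel value.
            new_row.append(min(255, max(0, value + delta)))
        perturbed.append(new_row)
    return perturbed


def max_pixel_change(original, perturbed):
    """Largest absolute per-pixel difference between two frames."""
    return max(
        abs(a - b)
        for row_o, row_p in zip(original, perturbed)
        for a, b in zip(row_o, row_p)
    )


frame = [[0, 64, 128], [192, 255, 32]]
protected = add_invisible_layer(frame)
print(max_pixel_change(frame, protected))  # never larger than epsilon
```

Keeping `epsilon` small is what makes the layer invisible to viewers: each pixel moves by only a couple of brightness levels out of 255, so the frame looks unchanged to the human eye.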