Apple, NVIDIA, Anthropic Accused of Using YouTube Transcripts for AI Training Without Permission

Apple, NVIDIA, and Anthropic allegedly used transcripts from more than 173,000 YouTube videos to train their AI models without permission.

A recent investigation by Proof News found that the companies obtained the dataset from EleutherAI, a nonprofit AI research organization.

YouTube Video Transcripts Allegedly Help Train AI Models

According to Proof News, the dataset contains subtitles from 173,536 YouTube videos taken from more than 48,000 channels. Silicon Valley giants such as Apple, NVIDIA, Salesforce, and Anthropic have allegedly used the data.

The dataset includes transcripts from a wide range of YouTube sources, including Khan Academy, MIT, NPR, BBC, "The Late Show With Stephen Colbert," and "Jimmy Kimmel Live." Videos from popular creators such as MrBeast, Marques Brownlee, PewDiePie, and Jacksepticeye also appear in it.

Proof News also clarified that the dataset consists only of plain-text subtitles and does not include any video imagery. Many of the subtitles are accompanied by translations into languages such as Japanese, German, and Arabic.

AI Companies Face Scrutiny From Content Creators

AI companies have faced complaints and copyright infringement lawsuits over data scraping. According to YouTube, using its data to train AI models violates the platform's terms of service.

Marques Brownlee, the creator behind MKBHD, said that data scraping is "going to be an evolving problem for a long time." He also noted that Apple seemingly avoids blame because it did not scrape the data directly.

Similarly, Anthropic spokesperson Jennifer Martinez directed questions about potential violations to the creators of the dataset, known as "The Pile." The company argued that YouTube's terms cover only direct use of the platform and that using The Pile is a separate matter.

Apple, NVIDIA, and other companies have not responded to the issue.
